Udacity Data Scientist Nanodegree Prerequisite: Practical Statistics (L1, L2)

Lesson 1 / 2: Descriptive Statistics

What is data?

Data Types

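Quantitative data takes on numeric values that allow us to perform mathematical operations (like the number of dogs). Quantitative data can be either continuous, taking any value in a range (like a dog’s weight), or discrete, taking only separate, countable values (like the number of dogs).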

Categorical data are used to label a group or set of items (like dog breeds: Collies, Labs, Poodles, etc.). We can divide categorical data further into two types: ordinal data, which have a natural order (like letter grades), and nominal data, which do not (like dog breeds).
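To make the four data types concrete, here is a small Python sketch. All variable names and values are hypothetical and made up for illustration. The point is which operations make sense for each type.

```python
from statistics import mean

# Hypothetical example data, invented for illustration only.
dogs_per_household = [0, 1, 2, 1, 3]           # quantitative, discrete (counts)
dog_weights_kg = [12.5, 30.2, 8.7, 24.0]       # quantitative, continuous (measurements)
breeds = ["Collie", "Lab", "Poodle", "Lab"]    # categorical, nominal (no order)
sizes = ["large", "small", "medium", "small"]  # categorical, ordinal (ordered)

# Hypothetical ranking that gives the ordinal labels their order.
SIZE_ORDER = ["small", "medium", "large"]

# Arithmetic such as averaging is meaningful for quantitative data.
avg_dogs = mean(dogs_per_household)
avg_weight = mean(dog_weights_kg)

# Ordinal data can be ranked using its natural order...
ranked_sizes = sorted(sizes, key=SIZE_ORDER.index)

# ...while nominal data can only be counted per category.
breed_counts = {breed: breeds.count(breed) for breed in sorted(set(breeds))}

print(avg_dogs, avg_weight)
print(ranked_sizes)
print(breed_counts)
```

Averaging `breeds` or `sizes` would not be meaningful, even if the labels were encoded as numbers.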

Quantitative variables

Measures of Centre (3M’s: Mean, Median, Mode)
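A minimal sketch of the three measures of centre, using Python’s built-in `statistics` module on a small hypothetical sample:

```python
from statistics import mean, median, mode

data = [1, 2, 2, 3, 7]  # hypothetical sample

center_mean = mean(data)      # sum of values / number of values = 15 / 5
center_median = median(data)  # middle value once the data are sorted
center_mode = mode(data)      # most frequently occurring value

# With an even number of values, the median averages the two middle values.
even_median = median([1, 2, 3, 4])  # (2 + 3) / 2

print(center_mean, center_median, center_mode, even_median)
```

If several values tie for most frequent, `statistics.mode` (Python 3.8+) returns the first one it encounters. `statistics.multimode` returns all of them.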




What is Notation?

Random Variables

Capital vs. Lowercase

A random variable (R.V.) is written with a capital letter, such as X. Its observed values are written with lowercase letters, such as x.
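As a worked example of the notation, let X be the number of dogs in a household; this choice of variable is hypothetical. X is the random variable. The n values we actually observe are written in lowercase, and the sample mean is built from those lowercase values:

```latex
\begin{aligned}
X &: \text{number of dogs in a household (a random variable)} \\
x_1, x_2, \ldots, x_n &: \text{the } n \text{ observed values of } X \\
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i
\end{aligned}
```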



Measures of Spread: how far points are from one another

Histogram — The most common visual for quantitative data
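A histogram groups quantitative values into equal-width bins and shows how many values fall in each bin. The sketch below counts the bins by hand using only the standard library. In practice, a plotting library such as matplotlib (`matplotlib.pyplot.hist`) would normally draw the chart. The function name `histogram_counts` is hypothetical, and so is the bin convention: each bin includes its left edge and excludes its right edge.

```python
import math


def histogram_counts(data, bin_width):
    """Return (left_edge, count) pairs for equal-width bins starting at min(data).

    Each bin covers [left_edge, left_edge + bin_width).
    """
    low = min(data)
    n_bins = math.floor((max(data) - low) / bin_width) + 1
    counts = [0] * n_bins
    for x in data:
        counts[math.floor((x - low) / bin_width)] += 1
    return [(low + i * bin_width, count) for i, count in enumerate(counts)]


data = [1, 2, 2, 3, 3, 3, 4, 4, 5]  # hypothetical sample
width = 2
bins = histogram_counts(data, bin_width=width)

# A text-only "histogram": one row per bin, one '#' per value.
for left, count in bins:
    print(f"[{left}, {left + width}): {'#' * count}")
```

Changing `width` can change the apparent shape noticeably. That is why it is worth trying a few bin widths before drawing conclusions from a histogram.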

5 Number Summary
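The five-number summary consists of the minimum, the first quartile (Q1), the median, the third quartile (Q3), and the maximum. Two measures of spread follow directly from it: the range (maximum minus minimum) and the interquartile range, IQR (Q3 minus Q1).

Quartiles can be computed in several slightly different ways. This sketch assumes the common "median of each half" convention, and the function name is hypothetical.

```python
from statistics import median


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum).

    Q1 and Q3 are the medians of the lower and upper halves of the sorted
    data; when n is odd, the middle value is left out of both halves.
    """
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = xs[:half]
    upper = xs[half + n % 2:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


data = [7, 1, 9, 3, 5, 2, 8, 4, 6]  # hypothetical sample, unsorted on purpose
minimum, q1, q2, q3, maximum = five_number_summary(data)

data_range = maximum - minimum  # spread of the whole dataset
iqr = q3 - q1                   # spread of the middle 50% of the data

print((minimum, q1, q2, q3, maximum), data_range, iqr)
```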

Standard Deviation and Variance
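Variance is the average squared distance of each value from the mean, and the standard deviation is its square root. Taking the square root puts the spread back in the original units of the data.

The sketch below uses the population form, which divides by n. The standard library’s `statistics.variance` and `statistics.stdev` divide by n - 1 instead (the sample form). The helper function names are hypothetical.

```python
import math
import statistics


def variance(data):
    """Population variance: the average squared deviation from the mean."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / n


def standard_deviation(data):
    """Square root of the variance, in the same units as the data."""
    return math.sqrt(variance(data))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample with mean 5

var = variance(data)  # (9 + 1 + 1 + 1 + 0 + 0 + 4 + 16) / 8
sd = standard_deviation(data)

# Cross-check against the standard library's population versions.
assert math.isclose(var, statistics.pvariance(data))
assert math.isclose(sd, statistics.pstdev(data))

print(var, sd)
```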


Important Final Points

Shape of Distribution

In the most common symmetric distribution (the bell-shaped normal distribution), mean = median = mode.
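One quick way to see how shape relates to the three measures of centre is to compute them on two small hypothetical samples: one symmetric and one with a long right tail. In a right-skewed sample like this one, the mean is typically pulled above the median. That is a common pattern rather than a guarantee.

```python
from statistics import mean, median, mode

# Hypothetical samples chosen to illustrate each shape.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
right_skewed = [1, 1, 1, 2, 2, 3, 10]  # long tail to the right

for name, data in [("symmetric", symmetric), ("right-skewed", right_skewed)]:
    print(f"{name}: mean={mean(data):.2f}, median={median(data)}, mode={mode(data)}")
```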


Common Techniques

When outliers are present, we should consider the following points.

1. Note that they exist and their impact on summary statistics.

2. If they are typos, remove or fix them.

3. Understand why they exist and their impact on the questions we are trying to answer about our data.

4. Reporting the 5 number summary values is often a better indication than measures like the mean and standard deviation when we have outliers.

5. Be careful in reporting. Know how to ask the right questions.
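To see why noting outliers and their impact matters, the sketch below compares summary statistics for a hypothetical dataset with and without one extreme value. The mean and standard deviation shift dramatically, while the median barely moves.

```python
from statistics import mean, median, pstdev

clean = [10, 12, 13, 14, 15]  # hypothetical data
with_outlier = clean + [100]  # same data plus one extreme value

for label, data in [("without outlier", clean), ("with outlier", with_outlier)]:
    print(f"{label}: mean={mean(data):.2f}, median={median(data)}, sd={pstdev(data):.2f}")
```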

Outliers Advice

1. Plot your data to identify whether you have outliers.

2. Handle outliers accordingly via the methods above.

3. If there are no outliers and your data follow a normal distribution, use the mean and standard deviation to describe your dataset, and report that the data are normally distributed.

Side note

If you aren’t sure whether your data are normally distributed, normal quantile plots and statistical tests such as the Kolmogorov-Smirnov test can help you check. Implementing this test is beyond the scope of this class, but it is useful to know that it exists.

4. If you have skewed data or outliers, use the five number summary to summarize your data and report the outliers.
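The lesson says to report outliers but does not fix a rule for flagging them. One common rule of thumb, Tukey’s fences, is used in the sketch below purely as an illustration. It flags any value more than 1.5 × IQR below Q1 or above Q3.

The quartiles come from `statistics.quantiles` with `method="inclusive"` (Python 3.8+). Quartile conventions differ between textbooks and libraries, so other tools may give slightly different fences. The function name is hypothetical.

```python
from statistics import quantiles


def summarize_with_outliers(data, k=1.5):
    """Return the five-number summary and the values outside Tukey's fences.

    Quartiles use statistics.quantiles(method="inclusive"); other quartile
    conventions can give slightly different fences.
    """
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    summary = (min(data), q1, q2, q3, max(data))
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return summary, outliers


data = [10, 12, 13, 14, 15, 100]  # hypothetical data
summary, outliers = summarize_with_outliers(data)
print("five-number summary:", summary)
print("outliers:", outliers)
```

A flagged value is a prompt to investigate, not an instruction to delete it.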


Descriptive Statistics

Descriptive statistics is about describing our collected data.

Inferential Statistics

Inferential statistics is about using our collected data to draw conclusions about a larger population.

We looked at specific examples that allowed us to identify the