Udacity Data Scientist Nanodegree : Prerequisite — Practical Statistics(L1, L2)

Lesson 1 / 2: Descriptive Statistics

What is data?

Data Types

Quantitative data takes on numeric values that allow us to perform mathematical operations (like the number of dogs). We can think of quantitative data as being either continuous or discrete.

Categorical data are used to label a group or set of items (like dog breeds: Collies, Labs, Poodles, etc.). We can divide categorical data further into two types: Ordinal and Nominal.

Quantitative variables

Measures of Centre (3M’s: Mean, Median, Mode)

Mean

Median

Mode
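As a rough illustration of the three measures of center, here is a minimal sketch using Python's standard `statistics` module. The `dogs_seen` list is made-up data chosen for this example, not something from the course.

```python
import statistics

# Hypothetical data: number of dogs seen on each of seven walks.
dogs_seen = [1, 2, 2, 3, 4, 5, 18]

mean = statistics.mean(dogs_seen)      # sum of the values divided by their count
median = statistics.median(dogs_seen)  # middle value once the data are sorted
mode = statistics.mode(dogs_seen)      # most frequently occurring value

print("mean:", mean)
print("median:", median)
print("mode:", mode)
```

Note how the single large value (18) pulls the mean above the median, while the median and mode stay near the bulk of the data.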

What is Notation?

Random Variables

Capital vs. Lower

R.V. — Capital letter, Observed values — Lowercase letters

Summation

This is junior high school math, but I'll include it here anyway.
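To connect the notation to something concrete, the sketch below treats a Python list as the observed values x_1, ..., x_n of a random variable X and writes the summation as a loop. The numbers are invented for illustration.

```python
# Observed values x_1, ..., x_n of a random variable X (hypothetical numbers).
x = [10, 20, 45, 15, 10]
n = len(x)

# Sigma notation: the sum of x_i for i = 1 to n.
# Python indexes from 0, so i runs over 0 .. n-1 here.
total = sum(x[i] for i in range(n))

# The mean written with a summation: (1 / n) * (sum of x_i).
x_bar = total / n

print("sum of x_i:", total)
print("x_bar:", x_bar)
```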

Measures of Spread — How far are points from one another

Histogram — The most common visual for quantitative data

5 Number Summary
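The five number summary consists of the minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum. Below is a minimal sketch of one way to compute it. It assumes the "median of each half" convention for quartiles (leaving the middle value out of both halves when the count is odd); other tools use different quartile conventions and may return slightly different Q1 and Q3 values. The function name `five_number_summary` and the data are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of at least two numbers.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    with the middle value excluded from both halves when the count is odd.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return (
        data[0],
        statistics.median(lower),
        statistics.median(data),
        statistics.median(upper),
        data[-1],
    )

# Hypothetical data set with an even number of values.
summary = five_number_summary([1, 2, 3, 3, 5, 8, 10, 12])
print(summary)
```

The range is the maximum minus the minimum, and the interquartile range (IQR) is Q3 minus Q1; both can be read directly off the returned tuple.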

Standard Deviation and Variance

The population standard deviation.
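As a worked example, the population variance is the average squared distance of each observation from the mean, and the population standard deviation is its square root. The sketch below spells out the steps on made-up numbers and cross-checks the result against `statistics.pstdev` from the standard library.

```python
import math
import statistics

# Hypothetical observations.
x = [10, 14, 10, 6]
n = len(x)

mean = sum(x) / n                                   # 10.0
squared_deviations = [(xi - mean) ** 2 for xi in x]  # [0.0, 16.0, 0.0, 16.0]

# Population variance: divide by n (a sample variance would divide by n - 1).
variance = sum(squared_deviations) / n
std_dev = math.sqrt(variance)

print("variance:", variance)
print("standard deviation:", std_dev)
print("matches statistics.pstdev:", math.isclose(std_dev, statistics.pstdev(x)))
```

The standard deviation is expressed in the same units as the original data, which is why it is usually easier to interpret than the variance.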

Important Final Points

Shape of Distribution

In the most common symmetric distribution: mean = median = mode

Outliers

Common Techniques

When outliers are present we should consider the following points.

1. Noting they exist and the impact on summary statistics.

2. If typo — remove or fix

3. Understanding why they exist, and the impact on questions we are trying to answer about our data.

4. Reporting the 5 number summary values is often a better indication than measures like the mean and standard deviation when we have outliers.

5. Be careful in reporting. Know how to ask the right questions.
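The points above about the impact of outliers on summary statistics can be seen in a small sketch. The values are invented, and the single extreme value stands in for something like a data entry typo.

```python
import statistics

# Hypothetical measurements, then the same data with one extreme value added.
typical = [30, 32, 35, 36, 38]
with_outlier = typical + [300]

for label, data in [("typical", typical), ("with outlier", with_outlier)]:
    print(
        label,
        "| mean:", statistics.mean(data),
        "| median:", statistics.median(data),
        "| population std dev:", round(statistics.pstdev(data), 1),
    )
```

The mean and standard deviation change a lot when the outlier is added, while the median barely moves, which is why the 5 number summary is often the safer thing to report.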

Outliers Advice

1. Plot your data to identify if you have outliers.

2. Handle outliers accordingly via the methods above.

3. If no outliers and your data follow a normal distribution — use the mean and standard deviation to describe your dataset, and report that the data are normally distributed.

Side note

If you aren’t sure whether your data are normally distributed, plots such as normal quantile plots and statistical tests such as the Kolmogorov-Smirnov test can help you check. Implementing this test is beyond the scope of this class, but it is worth knowing that it exists.

4. If you have skewed data or outliers, use the five number summary to summarize your data and report the outliers.

Descriptive Statistics and Inferential Statistics

Descriptive Statistics

Descriptive statistics is about describing our collected data.

Inferential Statistics

Inferential statistics is about using our collected data to draw conclusions about a larger population.

We looked at specific examples that allowed us to identify the

An accountant whose soul is interwoven with science and art. I love theater and photography, and I also love data science.