Udacity Data Scientist Nanodegree : Prerequisite — SQL(L3, L4, L5)

Lesson 3: Aggregating / Lesson 4: Subquery, temporary tables / Lesson 5: Data Cleaning

NULL — No data

WHERE variable IS NULL    

COUNT

  • 僅回傳 Non-Null data
  • 算總共有幾個
SELECT COUNT(*)
FROM accounts;

SUM

  • 不能用 SUM(*)
  • Unlike COUNT, you can only use SUM on numeric columns. However, SUM will ignore NULL values.
  • SUM treats NULL as 0

Aggregation Reminder

An important thing to remember: aggregators only aggregate vertically — the values of a column. If you want to perform a calculation across rows, you would do this with simple arithmetic.

MIN / MAX

  • 也無視 NULL values
  • Depending on the column type, MIN will return the lowest number, earliest date, or non-numerical value as early in the alphabet as possible. As you might suspect, MAX does the opposite — it returns the highest number, the latest date, or the non-numerical value closest alphabetically to “Z.”

AVG — Average

  • This aggregate function again ignores the NULL values in both the numerator and the denominator.
  • If you want to count NULLs as zero, you will need to use SUM and COUNT.

Tips — Median

  • 通常是拿來計算中央趨勢的最好參數
  • 但是 SQL 計算中位數很困難,難到會被當作是面試題的程度
SELECT *
FROM (SELECT total_amt_usd
FROM orders
ORDER BY total_amt_usd
LIMIT 3457) AS Table1
ORDER BY total_amt_usd DESC
LIMIT 2;

GROUP BY

  • Allow creating segments that will aggregate independent from one another
  • 當你在聚合(aggregate)的時候,其實是在有效的折疊(collapse)返回的行數
所以這個程式碼會有錯,因為只有後三個被折疊到
  • 夾在 WHERE 跟 ORDER BY 中間

GROUP BY — Expert Tip (看不懂)

Before we dive deeper into aggregations using GROUP BY statements, it is worth noting that SQL evaluates the aggregations before the LIMIT clause. If you don’t group by any columns, you’ll get a 1-row result — no problem there. If you group by a column with enough unique values that it exceeds the LIMIT number, the aggregates will be calculated, and then some rows will simply be omitted from the results.

DISTINCT

DISTINCT is always used in SELECT statements, and it provides the unique rows for all columns written in the SELECT statement. Therefore, you only use DISTINCT once in any particular SELECT statement.

SELECT DISTINCT column1, column2, column3
FROM table1;
SELECT DISTINCT column1, DISTINCT column2, DISTINCT column3
FROM table1;

DISTINCT — Expert Tip

It’s worth noting that using DISTINCT, particularly in aggregations, can slow your queries down quite a bit.

HAVING

  • WHERE 不允許你用 aggregating function 去當條件,這時我們就需要 HAVING

HAVING — Expert Tip

HAVING is the “clean” way to filter a query that has been aggregated, but this is also commonly done using a subquery. Essentially, any time you want to perform a WHERE on an element of your query that was created by an aggregate, you need to use HAVING instead.

DATE function

  • GROUPing BY a date column is not usually very useful in SQL, as these columns tend to have transaction data down to a second

Lesson 4: Subquery

Subquery

WITH — CTE

  • The WITH statement is often called a Common Table Expression or CTE.
跟上面那張圖一樣,更快

Lesson 5: Data cleaning

LEFT & RIGHT

Here we looked at three new functions:

  1. RIGHT
  2. LENGTH

POSITION, STRPOS, & SUBSTR

In this lesson, you learned about:

  1. STRPOS
  2. LOWER
  3. UPPER

CONCAT

In this lesson you learned about:

  1. Piping ||

CAST

In this video, you saw additional functionality for working with dates including:

  1. CAST
  2. Casting with ::

COALESCE = IFNULL

In general, COALESCE returns the first non-NULL value passed for each row.

理科與藝術交織成靈魂的會計人,喜愛戲劇與攝影,但也喜歡資料科學。