Udacity Data Scientist Nanodegree : Prerequisite — SQL(L1, L2)

Lesson 1: Basic SQL / Lesson 2: SQL joins

Entity Relationship Diagrams

  1. The names of the tables.
  2. The columns in each table.
  3. The way the tables work together.

What to Notice

  1. web_events
  2. accounts
  3. orders
  4. sales_reps
  5. region

Why Do Data Analysts Use SQL?

  • SQL is easy to understand.
  • Traditional databases allow us to access data directly.
  • Traditional databases allow us to audit and replicate our data.
  • SQL is a great tool for analyzing multiple tables at once.
  • SQL allows you to analyze more complex questions than dashboard tools like Google Analytics.

Video: How Databases Store Data

  1. Data in databases is stored in tables that can be thought of just like Excel spreadsheets.
  2. All the data in the same column must match in terms of data type.
  3. Consistent column types are one of the main reasons working with databases is fast.
    Often databases hold a LOT of data. So, knowing that the columns are all of the same type of data means that obtaining data from a database can still be fast.

Small Differences

Types of Statements

  1. CREATE TABLE is a statement that creates a new table in a database.
  2. DROP TABLE is a statement that removes a table in a database.
  3. SELECT allows you to read data and display it. This is called a query.

SELECT & FROM

  • What data do you want to pull from?
  • Which elements from the database do you want to pull?
  1. SELECT indicates which column(s) you want to be given the data for.
  2. FROM specifies from which table(s) you want to select the columns. Notice the columns need to exist in this table.
  • SELECT * FROM orders

Semicolons

LIMIT

  • The LIMIT statement is useful when you want to see just the first few rows of a table. This can be much faster for loading than if we load the entire dataset.
  • The LIMIT command is always the very last part of a query. An example of showing just the first 10 rows of the orders table with all of the columns might look like the following:
SELECT *
FROM orders
LIMIT 10;

ORDER BY

Pro Tip

SELECT account_id, total_amt_usd
FROM orders
ORDER BY total_amt_usd DESC, account_id;

Solutions to previous ORDER BY Questions

  • Write a query that displays the order ID, account ID, and total dollar amount for all the orders, sorted first by the account ID (in ascending order), and then by the total dollar amount (in descending order).
SELECT id, account_id, total_amt_usd
FROM orders
ORDER BY account_id, total_amt_usd DESC;
  • Now write a query that again displays order ID, account ID, and total dollar amount for each order, but this time sorted first by total dollar amount (in descending order), and then by account ID (in ascending order).
SELECT id, account_id, total_amt_usd
FROM orders
ORDER BY total_amt_usd DESC, account_id;

WHERE

  • allows you to filter a set of results based on specific criteria
  1. > (greater than), < (less than)
  2. >= (greater than or equal to), <= (less than or equal to)
  3. = (equal to), != (not equal to)
SELECT *
FROM orders
WHERE account_id = 4251
ORDER BY occurred_at
LIMIT 1000;

Arithmetic Operators

Derived Columns

  1. * (Multiplication)
  2. + (Addition)
  3. - (Subtraction)
  4. / (Division)
SELECT id, (standard_amt_usd / total_amt_usd) * 100 AS std_percent
FROM orders
LIMIT 10;

Order of Operations

  1. Standard_qty / standard_qty + gloss_qty + poster_qty
  2. standard_qty / (standard_qty + gloss_qty + poster_qty)

Introduction to Logical Operators

  1. LIKE This allows you to perform operations similar to using WHERE and =, but for cases when you might not know exactly what you are looking for.
  2. IN This allows you to perform operations similar to using WHERE and =, but for more than one condition.
  3. NOT This is used with IN and LIKE to select all of the rows NOT LIKE or NOT IN a certain condition.
  4. AND & BETWEEN These allow you to combine operations where all combined conditions must be true.
  5. OR This allows you to combine operations where at least one of the combined conditions must be true.

LIKE

SELECT *
FROM orders
WHERE referrer_url LIKE '%google%' > 會找到「夾雜」著 google 的資料

IN

  • Allows you to filter data based on several possible values

Expert Tip

SELECT name, primary_poc, sales_rep_id
FROM accounts
WHERE name IN ('Walmart', 'Target', 'Nordstrom');

NOT

AND

BETWEEN Operator

WHERE column >= 6 AND column <= 10
WHERE column BETWEEN 6 AND 10
SELECT name
FROM accounts
WHERE name NOT LIKE 'C%' AND name LIKE '%s';

OR

Lesson 2: SQL joins

Why Would We Want to Split Data Into Separate Tables?

  • Orders and accounts(訂單和漲戶) are different types of objects
  • This allows queriers to execute more quickly

Database Normalization

  1. Are the tables storing logical groupings of the data?
  2. Can I make changes in a single location, rather than in many tables for the same information?
  3. Can I access and manipulate data quickly and efficiently?

Introduction to JOINs(Inner JOIN)

  • 首先,他一樣需要 SELECT / FROM
  • 可以把它想成第二條 FROM 子句。FROM 子句拉的是第一張我們想要拉資料的 table,JOIN 則是第二張 Table。
  • It identifies the table where the data that we want to join lives.
  • 再來需要 ON:指出兩個表格之間的關係。The ON clause specifies the column on which you’d like to merge the two tables together.
SELECT table_column1, table_column2...
FROM table_name1
INNER JOIN table_name2
ON table_name1.column_name=table_name2.column_name;

例子

這是一個客戶資料表(customers)
而這是產品訂單的資料表 orders
此為查詢結果

ERD Reminder(entity relationship diagram)

後面課程會用到
  • PK: Primary key. A primary key exists in every table, and it is a column that has a unique value for every row.
  • If you look at the first few rows of any of the tables in our database, you will notice that this first, PK, column is always unique. For this database it is always called id, but that is not true of all databases.

Primary and Foreign Keys

Primary Key (PK)

Foreign Key (FK)

  1. region_id
  2. account_id
  3. sales_rep_id

Primary — Foreign Key Link

  1. The region_id is the foreign key.
  2. The region_id is linked to id — this is the primary-foreign key link that connects these two tables.
  3. The crow’s foot shows that the FK can actually appear in many rows in the sales_reps table.
  4. While the single line is telling us that the PK shows that id appears only once per row in this table.

JOIN Revisited

SELECT orders.*
FROM orders
JOIN accounts
ON orders.account_id = accounts.id;

Notice

JOIN More than Two Tables

The Code

SELECT *
FROM web_events
JOIN accounts
ON web_events.account_id = accounts.id
JOIN orders
ON accounts.id = orders.account_id
SELECT web_events.channel, accounts.name, orders.total

Alias

SELECT o.*
a.*
FROM orders o # alias 空白鍵加縮寫
JOIN accounts a
ON o.account_id = a.id
FROM tablename AS t1
JOIN tablename2 AS t2
SELECT col1 + col2 AS total, col3
FROM tablename t1
JOIN tablename2 t2
SELECT col1 + col2 total, col3

Aliases for Columns in Resulting Table

Select t1.column1 aliasname, t2.column2 aliasname2
FROM tablename AS t1
JOIN tablename2 AS t2

Some code

SELECT r.name region, s.name rep, a.name account  # 把r.name這欄名字改成region
FROM sales_reps s
JOIN region r
ON s.region_id = r.id
JOIN accounts a
ON a.sales_rep_id = s.id
ORDER BY a.name;

LEFT and RIGHT JOINs

Inner join: only return rows that appear in both tables

如果出現尚未下訂單的客戶,inner join 就幫不上忙了
inner join 是中間的交集
left join
right join
right join result
left join 跟 right join 可以交換

JOIN Check In

INNER JOINs

OUTER JOINS

JOINs and Filtering

  • Logic in the ON clause reduces the roes before combining the tables
  • Logic in the WHERE clause occurs after the join occurs
SELECT orders.*, account.*
FROM orders
LEFT JOIN accounts
ON orders.account_id = accounts.id
AND accounts.sales_rep_id = 321500

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Joe Chao

理科與藝術交織成靈魂的會計人,喜愛戲劇與攝影,但也喜歡資料科學。