[C2, Week1] The Bits and Bytes of Computer Networking (Google IT Support): Introduction to Networking

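I'm about halfway through this course and finding it very hard. I understand every word on its own, but put together they turn into English I can't follow. There is a Chinese translation, but it isn't very good, so I hope these notes will help me work out a complete picture.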

Week 1 is a basic introduction to TCP/IP, plus a detailed look at the Physical layer and the Data Link layer.

The TCP/IP Five-Layer Network Model

The TCP/IP model has five layers (Layer name / Protocol / Protocol Data Unit / Addressing):

  1. Physical layer / 10 Base T, 802.11 / Bits / N/A

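Hard to follow, right? It matters, though, so keep it in mind for now; everything is explained later. Below are some simple definitions and paraphrases to start with.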

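By definition, the layers closest to the hardware are the lower layers (layer 1), and the layers closest to the application are the higher layers (layer 7 in the OSI model, layer 5 in the five-layer model used here). On both the sending and the receiving side, each layer only understands the data of the same layer on the other side. The whole transfer is a bit like a prank gift of boxes nested inside boxes: the application puts the data into the layer-5 package, the layer-5 package goes into the layer-4 package, and so on until everything sits inside the outermost, largest layer-1 package, which is then sent to the receiver. The receiving host starts from that outermost package, opens the packages one by one, and hands each to the layer responsible for inspecting it.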

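Since these are packages, we know the outside of each one carries important information: where it came from, where it is going, who the recipient is, and so on. The actual data is inside. In the same way, each layer in the seven-layer OSI model (and likewise in the five-layer model here) has its own header that tells the other side what is inside, and the real data follows the header.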
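The nested-package idea can be sketched in a few lines of Python. This is only a toy: the bracketed header strings and the function names `encapsulate` and `decapsulate` are made up for illustration and are not real protocol headers.

```python
# Toy illustration of encapsulation: each layer prepends its own header.
# The header strings are placeholders, not real protocol headers.
LAYERS = ["application", "transport", "network", "data link"]

def encapsulate(data: bytes) -> bytes:
    """Wrap the data layer by layer, from the top of the stack downward."""
    packet = data
    for layer in LAYERS:
        header = f"[{layer}]".encode()
        packet = header + packet          # the lower layer wraps the upper one
    return packet

def decapsulate(packet: bytes) -> bytes:
    """Strip the headers in the opposite order, outermost first."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]".encode()
        if not packet.startswith(header):
            raise ValueError(f"missing {layer} header")
        packet = packet[len(header):]
    return packet

wire = encapsulate(b"GET /")
print(wire)               # b'[data link][network][transport][application]GET /'
print(decapsulate(wire))  # b'GET /'
```

The Physical layer is left out on purpose: it carries the resulting bytes as bits and adds no header of its own.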

So how does TCP/IP actually work? Take the Yahoo portal you visit all the time as an example. The whole connection looks like this:

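  1. Application stage: you open a browser, type an address into the address bar, and press [Enter]. The browser packs the address and related information into a piece of data and passes it down to the TCP/IP application layer;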

Physical layer: Represents the physical devices that interconnect computers, e.g. cables

Data Link layer: Responsible for defining a common way of interpreting these signals, so network devices can communicate.

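The definition of the Physical layer is very clear: it represents all the equipment that physically connects computers, such as cables and connectors. The Physical layer is responsible for defining the voltages and signals of the media in use. It also has to know how data frames are encoded into bit streams, and finally it connects to the physical medium and sends and receives those bit streams.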

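The Data Link layer defines rules for interpreting these signals so that network devices can communicate with each other. The most common Data Link layer protocols are Ethernet and Wi-Fi. About Ethernet, the Google course says: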

The Ethernet standards also define a protocol responsible for getting data to nodes on the same network or link.

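Before explaining this sentence, one concept needs explaining first: the node. A node is usually a device acting as a server or a client; either end can be a node. In other words, the Ethernet standards define a protocol for delivering data to nodes on the same network. The key point is that it only handles data transfer between nodes on the same network. What about different networks? That is where the third layer comes in.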

Network layer: Allows different networks to communicate with each other through devices known as routers.

Internetwork: A collection of networks connected together through routers, the most famous of these being the Internet.

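Next is the third layer, which handles connections between different networks; they communicate through a device called a router. A group of networks connected together by routers is called an internetwork, and the most famous internetwork is the Internet. The most common protocol at this layer is IP (Internet Protocol), which is at the heart of the Internet and of most smaller networks around the world.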

Transport layer: sorts out which client and server programs are supposed to get that data.

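With the networks connected to each other, two more roles appear: client and server. The client sends requests and the server responds to them. That is why the fourth layer, the Transport layer, exists: it sorts out which client and server programs should get the data. The most common protocol here is the Transmission Control Protocol (TCP). TCP runs on top of IP, hence the term TCP/IP, but other protocols also use IP, such as UDP (User Datagram Protocol). The biggest difference between the two is that TCP has mechanisms to ensure data is delivered reliably, while UDP does not. In short, this layer makes sure data reaches the right application on the node.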

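The five layers are like sending a parcel. The Physical layer is the truck and the road; the Data Link layer is how the truck gets from one intersection to the next; the Network layer works out the route from A to B; the Transport layer is the courier knocking on your door to tell you a parcel has arrived; and the Application layer is the contents of the parcel itself.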

The Basics of Networking Devices

This part is about hardware, so I won't explain much.

Cables: Cables are what connect different devices to each other, allowing data to be transmitted over them and allowing you to form point-to-point networking connections. They come in two main kinds, copper and fiber. The most common forms of copper twisted pair cables used in networking are Cat 5 (older and more prone to crosstalk; mostly replaced by Cat 5e), Cat 5e, and Cat 6 cables.

Crosstalk: when an electrical pulse on one wire is accidentally detected on another wire.

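Hub: a physical layer device that allows for connections from many computers at once. It works by sending every signal to all the computers connected to the hub at the same time; each computer then decides whether the data was meant for it. This is also what creates a collision domain.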

A collision domain is a network segment where only one device can communicate at a time. If multiple systems try sending data at the same time, the electrical pulses sent across the cable can interfere with each other. Because of this, hubs are rarely used today.

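Most people now use a network switch (aka switching hub). The two look similar, but a hub works at layer 1 of the TCP/IP model while a switch works at layer 2, so a switch can inspect the Ethernet protocol data to determine which device the data is actually meant for.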
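To make the hub/switch difference concrete, here is a minimal sketch. The `Hub` and `Switch` classes, the port numbers, and the MAC addresses are all invented for illustration. The address-learning table is how switches are commonly described, but it goes beyond what these notes cover, so treat it as an assumption.

```python
class Hub:
    """Layer 1: repeats every frame out of every port except the one it arrived on."""

    def __init__(self, num_ports):
        self.num_ports = num_ports

    def forward(self, in_port, src_mac, dst_mac):
        # A hub never looks at addresses; it just repeats the signal.
        return [p for p in range(self.num_ports) if p != in_port]


class Switch:
    """Layer 2: remembers which MAC address was seen on which port."""

    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.mac_table = {}                       # MAC address -> port number

    def forward(self, in_port, src_mac, dst_mac):
        self.mac_table[src_mac] = in_port         # learn where the sender lives
        if dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]      # known destination: one port only
        # Unknown destination: fall back to sending everywhere, like a hub.
        return [p for p in range(self.num_ports) if p != in_port]


A, B = "aa:aa:aa:aa:aa:aa", "bb:bb:bb:bb:bb:bb"
hub, switch = Hub(4), Switch(4)
print(hub.forward(0, A, B))      # [1, 2, 3]  every other port, always
print(switch.forward(0, A, B))   # [1, 2, 3]  B has not been seen yet
print(switch.forward(1, B, A))   # [0]        A was learned on port 0
print(switch.forward(0, A, B))   # [1]        B was learned on port 1
```

Once the switch has seen traffic from both machines, frames between them no longer reach the other ports, which is why a switch shrinks the collision domain.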

Hubs and switches: the primary devices used to connect computers on a single network, usually referred to as a LAN, or local area network.

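Router: sends data to other networks. A router is a device that knows how to forward data between independent networks. It works at layer 3: just as a switch inspects Ethernet data, a router inspects IP addresses to decide where to send data.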

Routers share data with each other via a protocol known as BGP, or Border Gateway Protocol, that lets them learn about the most optimal paths to forward traffic.

In a home or small office, the router's job is to take traffic from the LAN and forward it on to the ISP (Internet Service Provider).

The Physical Layer

Most of this section is quoted directly from the course.

The physical layer consists of devices and means of transmitting bits across computer networks. A bit is the smallest representation of data that a computer can understand; it’s a one or zero.

A standard copper network cable, once connected to devices on both ends, will carry a constant electrical charge. Ones and zeros are sent across those network cables through a process called modulation. Modulation is a way of varying the voltage of this charge moving across the cable. When used for computer networks, this kind of modulation is more specifically known as line coding.

The most common type of cabling used for connecting computing devices is known as twisted pair. Cables allow for duplex communication: the concept that information can flow in both directions across the cable. The related terms are simplex, half-duplex, and full-duplex communication.

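These cables allow duplex communication, meaning information can flow in both directions across the cable. Simplex communication is one-way only. Full-duplex means data can flow in both directions at the same time, while half-duplex means both sides can send to each other, but only one side can send at any given moment.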

The Data Link Layer

Ethernet and MAC Addresses

The protocol most widely used to send data across individual links is known as Ethernet. One of the primary purposes of this layer is to essentially abstract away the need for any other layers to care about the physical layer and what hardware is in use. By dumping this responsibility on the data link layer, the Internet, transport and application layers can all operate the same no matter how the device they’re running on is connected.

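As mentioned above, the most common protocol at this layer is Ethernet. The main purpose of this layer is to take care of the Physical layer and whatever hardware is in use, so that the other layers don't have to. With that responsibility handed to the Data Link layer, the layers above it no longer need to worry about how the underlying hardware is connected. (This is the best paraphrase I can come up with for now; corrections are welcome, because Coursera's own translation is quite odd.)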

Ethernet, as a protocol, solved this problem by using a technique known as carrier sense multiple access with collision detection (CSMA/CD).

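Some historical background: at the time, switches and hubs had not yet been invented, so computers on the same network segment were very likely to share one collision domain. Ethernet as a protocol dealt with the collision domain problem by using CSMA/CD.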

CSMA/CD is used to determine when the communications channels are clear and when the device is free to transmit data.

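The idea is that if no data is being transmitted on the network segment, a node is free to send. If two transmissions collide, the sending nodes stop, wait a random interval, and then try again.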
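The listen, send, back off, retry loop can be sketched as follows. The notes only say that a node waits for a while after a collision; the doubling back-off window and the 16-attempt limit used here are my assumption, based on how classic Ethernet is usually described. The function name and the two callables are hypothetical stand-ins for real hardware signals.

```python
import random

def send_with_csma_cd(channel_is_idle, collision_happened, max_attempts=16, rng=None):
    """Return the back-off waits (in slot times) used before a successful send.

    channel_is_idle and collision_happened are caller-supplied callables that
    stand in for the real hardware signals.
    """
    rng = rng or random.Random()
    waits = []
    for attempt in range(1, max_attempts + 1):
        while not channel_is_idle():     # carrier sense: wait until the segment is quiet
            pass
        if not collision_happened():     # transmit while watching for a collision
            return waits                 # no collision: the send succeeded
        # Collision: stop, pick a random wait, and try again.
        # Assumption: the window doubles after each collision, capped at 2**10 slots.
        wait = rng.randint(0, 2 ** min(attempt, 10) - 1)
        waits.append(wait)
    raise RuntimeError("too many collisions, giving up")


# Example: the first two attempts collide, the third one gets through.
collisions = iter([True, True, False])
waits = send_with_csma_cd(lambda: True, lambda: next(collisions), rng=random.Random(0))
print(waits)   # two random waits, the first from 0..1 and the second from 0..3
```

The randomness is the important part: if both colliding nodes waited the same fixed time, they would simply collide again.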

When a network segment is a collision domain, it means that all devices on that segment receive all communication across the entire segment. This means we need a way to identify which node the transmission was actually meant for. This is where something known as a media access control address or MAC address comes into play.

MAC address (media access control address): a globally unique identifier attached to an individual network interface. It's a 48-bit number normally represented by six groupings of two hexadecimal numbers.

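When a network segment is a collision domain, every device on that segment receives all the traffic on the segment. So we need a way to tell which node a transmission is meant for, and that is where the MAC address comes in. A MAC address is globally unique. It is a 48-bit number, normally written as six groups of two hexadecimal digits.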

Octet, in computer networking, is any number that can be represented by 8 bits. In this case, two hexadecimal digits can represent the same numbers that 8 bits can.

Another way to refer to each group of digits in a MAC address is as an octet.

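A MAC address is split into two parts. The first three octets of a MAC address are known as the organizationally unique identifier or OUI, which identifies the manufacturer of the interface. The last three octets can be assigned by the manufacturer however it likes, as long as each address is only assigned once.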
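A short sketch of the structure just described: six octets, 48 bits, with the first three octets forming the OUI. The helper names `parse_mac` and `split_mac` and the example address are made up for illustration.

```python
def parse_mac(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' (or the '-' separated form) into 6 raw bytes."""
    parts = mac.replace("-", ":").split(":")
    if len(parts) != 6 or any(len(p) != 2 for p in parts):
        raise ValueError(f"not a MAC address: {mac!r}")
    return bytes(int(p, 16) for p in parts)   # each pair of hex digits is one octet

def split_mac(mac):
    """Return (OUI, device-specific part) as colon-separated hex strings."""
    raw = parse_mac(mac)
    return raw[:3].hex(":"), raw[3:].hex(":")

raw = parse_mac("00:1A:2B:3C:4D:5E")
print(len(raw) * 8)                     # 48 bits in total
print(split_mac("00:1A:2B:3C:4D:5E"))   # ('00:1a:2b', '3c:4d:5e')
```

Two hexadecimal digits cover the values 0 to 255, exactly what 8 bits can hold, which is why each group maps to one octet.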

Ethernet uses MAC addresses to ensure that the data it sends has both an address for the machine that sent the transmission, as well as the one that the transmission was intended for.

Unicast, Multicast, and Broadcast

Unicast: Ways for one device to transmit data to one other device. A unicast transmission is always meant for just one receiving address.

At the Ethernet level, this is done by looking at a special bit in the destination MAC address. If the least significant bit in the first octet of a destination address is set to zero, it means that Ethernet frame is intended for only the destination address.

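First, unicast: one device sends data to exactly one other device. How is that done? At the Ethernet level, we look at a special bit in the destination MAC address: if the least significant bit of the first octet of the destination address is 0, the Ethernet frame is intended only for that destination address. The frame is still sent to every device in the collision domain, but only the intended destination actually accepts and processes it.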

If the least significant bit in the first octet of a destination address is set to one, it means you're dealing with a multicast frame. A multicast frame is similarly sent to all devices on the local network segment. What's different is that it will be accepted or discarded by each device depending on criteria aside from their own hardware MAC address.

Broadcast: sent to every single device on a LAN. This is accomplished by using a special destination known as a broadcast address. The Ethernet broadcast address is all Fs. Ethernet broadcasts are used so that devices can learn more about each other.
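The three cases can be told apart with a few lines of code. This sketch follows the rules quoted above (all Fs is broadcast, otherwise the least significant bit of the first octet decides). The function name and the sample addresses are only illustrative.

```python
def classify_destination(mac):
    """Classify a destination MAC address as unicast, multicast, or broadcast."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if octets == b"\xff" * 6:
        return "broadcast"        # the all-Fs address
    if octets[0] & 1:
        return "multicast"        # least significant bit of the first octet is 1
    return "unicast"              # least significant bit of the first octet is 0

print(classify_destination("ff:ff:ff:ff:ff:ff"))   # broadcast
print(classify_destination("01:00:5e:00:00:fb"))   # multicast
print(classify_destination("00:1a:2b:3c:4d:5e"))   # unicast
```

The broadcast check comes first because the all-Fs address also has its least significant bit set to one.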

Dissecting an Ethernet Frame

Data packet: an all-encompassing term that represents any single set of binary data being sent across a network link.(just a concept)

Data packets at the Ethernet level are known as Ethernet frames. An Ethernet frame is a highly structured collection of information presented in a specific order.

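Data packet is an all-encompassing term: it doesn't belong to any particular layer and simply represents a concept. At the Data Link layer, that is, at the Ethernet level, data packets are called Ethernet frames. This structure is what lets the receiver turn the bits delivered by the Physical layer into meaningful data.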

Ethernet frame order: Preamble (8 bytes, of which the last byte is the SFD), Destination address (6 bytes), Source address (6 bytes), VLAN header (4 bytes, optional), EtherType (2 bytes), Payload (46 to 1500 bytes), FCS (4 bytes).

Preamble: 8 bytes or 64 bits long and can itself be split into two sections. The first section (7 bytes) acts as a buffer between frames, and the second section is the SFD (1 byte).

Start frame delimiter(SFD): Signals to a receiving device that the preamble is over and that the actual frame contents will now follow.

Destination address: The hardware address of the intended recipient.

VLAN header: Indicates that the frame itself is what’s called a VLAN frame. If a VLAN header is present, the EtherType field follows it.

VLAN stands for virtual LAN. It’s a technique that lets you have multiple logical LANs operating on the same physical equipment. Any frame with a VLAN tag will only be delivered out of a switch interface configured to relay that specific tag.

EtherType field: It’s 16 bits long and used to describe the protocol of the contents of the frame.

Payload: in networking terms is the actual data being transported, which is everything that isn’t a header.

Frame Check Sequence: This is a 4-byte or 32-bit number that represents a checksum value for the entire frame. This checksum value is calculated by performing what’s known as a cyclical redundancy check against the frame.
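The field layout can be turned into a small parser. This is a sketch under a few assumptions: the preamble and SFD have already been stripped (software usually never sees them), a VLAN header is recognized by the value 0x8100 in the position where the EtherType would otherwise be, and the VLAN ID sits in the low 12 bits of the tag. Those last two details are not in these notes. The function name and the sample frame are made up.

```python
import struct

VLAN_TPID = 0x8100   # assumed marker value for a VLAN header

def parse_frame(frame):
    """Split a frame (preamble and SFD already removed) into its fields."""
    # 6-byte destination, 6-byte source, 2-byte type, all in network byte order.
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    offset = 14
    vlan_id = None
    if ethertype == VLAN_TPID:
        # A VLAN header is present; the real EtherType follows it.
        tag, ethertype = struct.unpack("!HH", frame[offset:offset + 4])
        vlan_id = tag & 0x0FFF
        offset += 4
    return {
        "destination": dst.hex(":"),
        "source": src.hex(":"),
        "vlan_id": vlan_id,
        "ethertype": ethertype,
        "payload": frame[offset:-4],   # everything between the headers and the FCS
        "fcs": frame[-4:],             # last 4 bytes
    }

# Hand-built sample frame; the FCS bytes are a placeholder, not a real checksum.
sample = (bytes.fromhex("ffffffffffff") + bytes.fromhex("001a2b3c4d5e")
          + struct.pack("!H", 0x0800) + b"hello" + b"\x00" * 4)
fields = parse_frame(sample)
print(fields["destination"], fields["source"], hex(fields["ethertype"]))
# ff:ff:ff:ff:ff:ff 00:1a:2b:3c:4d:5e 0x800
print(fields["payload"])   # b'hello'
```

The sample payload is shorter than a real frame would allow; it is kept tiny so the output is easy to read.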

A cyclical redundancy check or CRC, is an important concept for data integrity and is used all over computing, not just network transmissions. A CRC is basically a mathematical transformation that uses polynomial division to create a number that represents a larger set of data. Anytime you perform a CRC against a set of data, you should end up with the same checksum number. The reason it’s included in the Ethernet frame is so that the receiving network interface can infer if it received uncorrupted data.

If the checksum computed by the receiving end doesn’t match the checksum in the frame check sequence field, the data is thrown out. This is because some amount of data must have been lost or corrupted during transmission. It’s then up to a protocol at a higher layer to decide if that data should be retransmitted.

Ethernet itself only reports on data integrity. It doesn’t perform data recovery.
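The check-and-discard behavior can be sketched with Python's standard `zlib.crc32`, which computes a 32-bit CRC. The helper names and the byte order used to append the checksum are assumptions made for this sketch, not something taken from the course.

```python
import zlib

def add_fcs(frame_without_fcs):
    """Append a 4-byte CRC-32 checksum to the frame contents."""
    checksum = zlib.crc32(frame_without_fcs)
    return frame_without_fcs + checksum.to_bytes(4, "little")   # byte order is an assumption

def fcs_ok(frame):
    """Recompute the checksum on the receiving side and compare it with the FCS field."""
    body, fcs = frame[:-4], frame[-4:]
    return zlib.crc32(body).to_bytes(4, "little") == fcs

good = add_fcs(b"some frame contents")
print(fcs_ok(good))        # True

# Flip one bit in the first byte to imitate corruption on the wire.
damaged = bytes([good[0] ^ 0x01]) + good[1:]
print(fcs_ok(damaged))     # False: the receiver would throw this frame away
```

Note that `fcs_ok` only says whether the data survived intact. Asking for the data again is left to whatever runs on top, which matches the point that Ethernet reports on integrity but does not recover data.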

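I think the English here is clear enough that it needs no further paraphrase. That's the end of Week 1. Corrections are welcome, and thank you for reading.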

References:

鳥哥的 Linux 私房菜
