This influential paper from Google on Content Delivery Networks (CDNs) was authored by Rupa Krishnan et. al. The paper came out in 2009, the time when computer networks were constantly being improved for the next generation of the Internet. The paper explores Content Delivery Networks (CDNs), which became hot in the subsequent decade. Several internet-giants like Google, Facebook, and Netflix rely on CDNs to deliver content to the end users.
The paper will familiarize readers with the idea of CDNs. In that process, the paper delves into several advanced network engineering concepts. In this insight, we will start with OASIS - the inspiration behind CDN architectures. Then we will delve into some networking concepts relevant to the paper and finally look into the mitigation techniques applied by the authors to reduce end-to-end latency of Google’s CDN. As a bonus, we will also go through the architecture of Coral CDN - which introduced a multi-layered architecture to CDNs.
1. OASIS
OASIS (2006) was developed by M. Freedman, K. Lakshminarayanan, and D. Mazieres. It addresses the critical challenge for Internet: locating the service replica with the lowest latency for a given client.
Prior to OASIS, clients naively pinged every service replica to determine the fastest one based on round-trip time (RTT). This was highly accurate, however, this approach suffered from excessive probing and computationally expensive comparisons.
1.1. Architecture
OASIS introduced a two-tier architecture as shown in illustration 1. OASIS maintains replicas that redirect a client to the nearest replica of any service.
Illustration 1: The OASIS architecture.
The OASIS replicas:
- Possess global membership information about all service replicas.
- Employ epidemic gossiping to detect and handle failures.
- Utilize consistent hashing for efficient replica placement.
1.2. Algorithm
Step 1
/n suffix in CIDR indicates the number of bits in the subnet mask that are set to 1. In this specific case, /8 signifies a subnet mask of 255.0.0.0. To derive the network prefix from an IP address, we need to perform a bitwise AND operation between the IP address and the subnet mask.Step 2
If multiple replicas of service reside in the same geographic region, the client probes only those replicas to select the one with the lowest RTT, significantly reducing the probe space.
2. Google CDN Architecture
Google's CDN architecture shares similarities with OASIS, but with a key distinction: it utilizes the IP prefix of the DNS server initiating the request for redirection, rather than the client's IP address.
2.1 Goal
The authors contend that solely relying on RTT (a.k.a. end-to-end path information) for redirection, may not consistently deliver the best possible Quality of Service (QoS) to the client.
They delve into various factors that can contribute to suboptimal QoS, beyond simple RTT. The paper then explores methods for identifying and mitigating these contributing factors.
Before delving deeper into their proposed techniques, it's crucial to establish a common understanding of the key network terminologies employed throughout the paper.
3. Networking Architecture
3.1. Routing
Computers within a Local Area Network (LAN), such as your school network or a cluster of machines in a data center, communicate directly. These networks typically employ network switches to efficiently forward packets to their intended destinations based on their unique Media Access Control (MAC) addresses, which operate at the Ethernet layer.
However, when a packet needs to traverse beyond the confines of a local network, routing becomes essential. By examining the destination IP address of each packet, routers consult their routing tables to determine the optimal next hop in the network, sending the packet closer to its ultimate destination. These routing tables are dynamically calculated and maintained through sophisticated distributed algorithms.
3.2. Point of Presence
A Point of Presence (PoP) is a designated entry point where end users can connect to an Internet Service Provider (ISP). Each ISP maintains a network of PoPs.
Within a single PoP, multiple routers may be present, allowing for various paths for data packets to travel from the end users to PoP.
3.3. Autonomous System
An Autonomous System (AS) is a collection of interconnected networks under the control of a single administrative entity. Each AS operates with its own distinct routing policies, determining how traffic flows within its boundaries. Common examples of entities operating ASes include ISPs, large corporations (like Google), universities, and government agencies.
Google's internal network, connecting its vast infrastructure (like the Borg cluster), functions as an AS with its own internal routing policies.Border Gateway Routers (BGP Routers) are specialized routers reside at the edges of an AS, responsible for exchanging routing information with other ASes.
3.4. Traceroute
The performance of Internet traffic can be impacted by the stability and condition of routers along the path, some of which may be older or less reliable. Traceroute is a widely used network diagnostic tool that reveals the sequence of routers traversed by packets traveling from a source to a destination. T
Traceroute operates by sending a series of Internet Control Message Protocol (ICMP) packets with increasing Time-to-Live (TTL) values as shown in illustration 2.
Illustration 2: Example working of Traceroute.
- A → X → Y → B
- B → Y → Z → X → A
Traceroute can only measure the RTT for each hop, not the one-way travel time. Thus, while traceroute effectively reveals the forward path, determining the reverse path requires more complex analysis.
4. iPlane
iPlane is a system designed to predict path properties between any two points on the Internet. iPlane uses traceroutes to all Internet prefixes from numerous vantage points (e.g., PlanetLab servers).
By analyzing the collected traceroute information, iPlane clusters routers into PoPs. Routers are grouped together if they exhibit the following characteristics:
- Respond to traceroutes with the same source IP address.
- Display similar RTT values across different vantage points.
5. Latency Cause Detection Techniques
This paper explores several techniques for detecting latency issues in Google's CDN:
1. Redirection
2. Prefix Latency Inflation
Detection: This is identified by performing traceroute analysis. For instance, if the route path is:
3. Queueing Delays
- Changes in routes throughout the day.
- Changes in the PoP paths (determined using iPlane data).
- Changes in the AS path.
6. Mitigation
The paper explores several approaches to address the observed inflated RTT issue:
- Direct Peering: Establishing direct peering connections between Google and the relevant ASes for the affected prefixes. This solution is exemplified in Case 1, where a new peering link was added between Google and PhilISP1.
- Increased Peering Capacity: Enhancing the capacity of existing peering links between Google and ASes. This is demonstrated in Case 3, where a peering connection between Google and PhilISP2 existed but suffered from insufficient capacity.
- Routing Configuration Optimization: Correcting routing configurations on border routers within Google's or the AS's network. Case 4 illustrates this approach, where Google's network administrators adjusted routing configurations to enable shorter reverse paths between Google and a JapanISP.
- Traffic Engineering Techniques: Implementing various traffic engineering approaches to optimize network traffic flow.
7. Bonus: Coral CDN
Among the related works discussed in the paper, Coral CDN stands out. While I recall using Coral CDN in the 2000s, I never delved into its internal workings. Coral CDN was developed by the authors of OASIS. It was a free service, designed to mirror web content.
Coral CDN's primary use case was to mitigate slashdotting, a phenomenon where a website, particularly smaller ones, experiences a sudden and overwhelming surge in traffic after being linked by a popular website.
The most distinctive feature of Coral CDN is its utilization of a Distributed Sloppy Hash Table (DSHT). Unlike traditional Distributed Hash Tables (DHTs) that rely on a single hash ring, DSHT employs multiple concentric Chord hash rings arranged hierarchically. DSHT can be adapted to work with other hashing protocols like Kademlia.
7.1. Interface
Coral provides a simple interface for higher-level applications:
- put (key, val, ttl): Inserts a key-value mapping with a specified TTL for the entry.
- get (key): Retrieves a subset of values associated with the given key.
7.2. Implementation
The Coral CDN implementation utilizes a three-tiered hierarchical structure, as shown in illustration 3:
- Regional Layer: Connects nodes within the same geographical region.
- Continental Layer: Connects nodes within a continent.
- Global Layer: Connects nodes globally.
Nodes have the same Id across layers, but their assigned ranges vary due to variation in the hashing modulo. During the put operation, the value is replicated across all three layers, ensuring redundancy. Conversely, during the get operation, the system prioritizes retrieving the value from the local regional layer.
8. Paper Remarks
The paper does not propose novel networking architectures; instead, it offers a robust methodology for network analysis. Furthermore, it provides significant insights into Google’s extensive global infrastructure. The analysis of queuing delays and the characterization of circuitous reverse paths are especially compelling aspects of the work.



Comments
Post a Comment