This highly influential paper on Content Delivery Networks (CDNs) was authored by Rupa Krishnan et. al, including Sushant Jain, who was listed fourth among the authors. Sushant was a valued colleague of mine at Google Ads Infrastructure, where he served as Senior Engineering Director for many years.
Before delving into the paper's concepts, which are generally straightforward to grasp, let's explore some relevant background information.
OASIS (2006)
OASIS, developed by M. Freedman, K. Lakshminarayanan, and my former Distributed Systems (CS244b) professor at Stanford, D. Mazieres, elegantly addresses the critical challenge for Internet: locating the service replica with the lowest latency for a given client.
Prior to OASIS
Clients naively pinged every service replica to determine the fastest one based on round-trip time (RTT). While highly accurate, this approach suffered from excessive probing and computationally expensive comparisons.
OASIS Architecture
OASIS introduced a two-tier architecture. OASIS maintains replicas that redirect a client to the nearest replica of any service.
The OASIS replicas:
- Possess global membership information about all service replicas.
- Employ epidemic gossiping to detect and handle failures.
- Utilize consistent hashing for efficient replica placement.
OASIS Algorithm
Step 1
/n
suffix in CIDR indicates the number of bits in the subnet mask that are set to 1. In this specific case, /8 signifies a subnet mask of 255.0.0.0. To derive the network prefix from an IP address, we need to perform a bitwise AND operation between the IP address and the subnet mask.Step 2
If multiple replicas of service reside in the same geographic region, the client probes only those replicas to select the one with the lowest RTT, significantly reducing the probe space.
Integration and Impact
OASIS was seamlessly integrated into existing protocols like DNS and HTTP, demonstrating its practicality and potential for widespread adoption.
Google CDN Architecture
Google's CDN architecture shares similarities with OASIS, but with a key distinction: it utilizes the IP prefix of the DNS server initiating the request for redirection, rather than the client's IP address.
Goal
The authors contend that solely relying on RTT (a.k.a. end-to-end path information) for redirection, may not consistently deliver the best possible Quality of Service (QoS) to the client.
They delve into various factors that can contribute to suboptimal QoS, beyond simple RTT. The paper then explores methods for identifying and mitigating these contributing factors.
Before delving deeper into their proposed techniques, it's crucial to establish a common understanding of the key network terminologies employed throughout the paper.
Networking Architecture
Routing
Computers within a Local Area Network (LAN), such as your school network or a cluster of machines in a data center, communicate directly. These networks typically employ network switches to efficiently forward packets to their intended destinations based on their unique Media Access Control (MAC) addresses, which operate at the Ethernet layer.
However, when a packet needs to traverse beyond the confines of a local network, routing becomes essential. By examining the destination IP address of each packet, routers consult their routing tables to determine the optimal next hop in the network, sending the packet closer to its ultimate destination. These routing tables are dynamically calculated and maintained through sophisticated distributed algorithms.
Point of Presence (PoP)
A PoP is a designated entry point where end users can connect to an Internet Service Provider (ISP). Each ISP maintains a network of PoPs.
Within a single PoP, multiple routers may be present, allowing for various paths for data packets to travel from the end users to PoP.
Autonomous System (AS)
An AS is a collection of interconnected networks under the control of a single administrative entity. Each AS operates with its own distinct routing policies, determining how traffic flows within its boundaries. Common examples of entities operating ASes include:
- ISPs
- Large corporations (like Google)
- Universities
- Government agencies
Border Gateway Routers (BGP Routers) are specialized routers reside at the edges of an AS, responsible for exchanging routing information with other ASes.
Traceroute
Traceroute is a widely used network diagnostic tool that reveals the sequence of routers traversed by packets traveling from a source to a destination. T
It's important to note that the performance of Internet traffic can be impacted by the stability and condition of routers along the path, some of which may be older or less reliable.
Traceroute operates by sending a series of Internet Control Message Protocol (ICMP) packets with increasing Time-to-Live (TTL) values.
- A -> X -> Y -> B
- B -> Y -> Z -> X -> A
Traceroute can only measure the RTT for each hop, not the one-way travel time. Thus, while traceroute effectively reveals the forward path, determining the reverse path requires more complex analysis.
iPlane
iPlane is a system designed to predict path properties between any two points on the Internet. iPlane uses traceroutes to all Internet prefixes from numerous vantage points (e.g., PlanetLab servers).
By analyzing the collected traceroute information, iPlane clusters routers into PoPs. Routers are grouped together if they exhibit the following characteristics:
- Respond to traceroutes with the same source IP address.
- Display similar RTT values across different vantage points.
Latency Cause Detection Techniques
This paper explores several techniques for detecting latency issues in Google's CDN:
1. Redirection
2. Prefix Latency Inflation
Detection: This is identified by performing traceroute analysis. For instance, if the route path is:
3. Queueing Delays:
- Changes in routes throughout the day.
- Changes in the PoP paths (determined using iPlane data).
- Changes in the AS path.
Mitigation
The paper explores several approaches to address the observed inflated RTT issue:
- Direct Peering: Establishing direct peering connections between Google and the relevant ASes for the affected prefixes. This solution is exemplified in Case 1, where a new peering link was added between Google and PhilISP1.
- Increased Peering Capacity: Enhancing the capacity of existing peering links between Google and ASes. This is demonstrated in Case 3, where a peering connection between Google and PhilISP2 existed but suffered from insufficient capacity.
- Routing Configuration Optimization: Correcting routing configurations on border routers within Google's or the AS's network. Case 4 illustrates this approach, where Google's network administrators adjusted routing configurations to enable shorter reverse paths between Google and a JapanISP.
- Traffic Engineering Techniques: Implementing various traffic engineering approaches to optimize network traffic flow.
Coral CDN
Among the related works discussed in the paper, Coral CDN stands out. While I recall using Coral CDN in the 2000s, I never delved into its internal workings. Developed by the authors of OASIS, Coral CDN was a free, designed to mirror web content.
Coral CDN's primary use case was to mitigate "slashdotting", a phenomenon where a website, particularly smaller ones, experiences a sudden and overwhelming surge in traffic after being linked by a popular website.
The most distinctive feature of Coral CDN is its utilization of a Distributed Sloppy Hash Table (DSHT). Unlike traditional Distributed Hash Tables (DHTs) that rely on a single hash ring, DSHT employs multiple concentric Chord hash rings arranged hierarchically. DSHT can be adapted to work with other hashing protocols like Kademlia.
Coral CDN Interface
Coral provides a simple interface for higher-level applications:
- put (key, val, ttl): Inserts a key-value mapping with a specified TTL for the entry.
- get (key): Retrieves a subset of values associated with the given key.
Hierarchical Implementation
The Coral CDN implementation utilizes a three-tiered hierarchical structure:
- Regional Layer: Connects nodes within the same geographical region.
- Continental Layer: Connects nodes within a continent.
- Global Layer: Connects nodes globally.
Nodes have the same Id across layers, but their assigned ranges vary due to variation in the hashing modulo. During the put
operation, the value is replicated across all three layers, ensuring redundancy. Conversely, during the get
operation, the system prioritizes retrieving the value from the local regional layer.
Paper Review
The paper does introduce readers to design concepts related to CDNs. Additionally, fixing the issues discovered by these methods could have a very significant impact, potentially saving Google millions of dollars in network expenditure. However, the paper does not introduce any truly novel concepts. Rather, it essentially provides a strong approach to Internet analysis. I especially loved the part where they determined the queueing delay and circuitous reverse paths.
Comments
Post a Comment