Posts

Paper Insights #35 - Microsecond Consensus for Microsecond Applications

Presented at USENIX OSDI '20, this influential paper from VMware Research, with contributions from EPFL (Switzerland), has since received significant attention within the distributed systems community. It was authored by Marcos Aguilera, a researcher at VMware.

Recent posts

Paper Insights #34 - CRDTs: Consistency without Concurrency Control

Authored in 2009, this noteworthy paper from the French National Institute for Research in Computer Science and Automation (INRIA) was presented at the prestigious IEEE ICDCS. It introduces compelling design concepts for distributed systems.

Paper Insights #33 - Photon: Fault-tolerant and Scalable Joining of Continuous Data Streams

Presented at SIGMOD 2013, this paper from Google details another innovation stemming from Google Ads, a platform known for its planet-scale data processing. Notable authors include Ashish Gupta, a senior engineering leader within Google Ads, and Manpreet Singh, a principal engineer at Google. The year 2013 marked a significant period for stream processing, as Google was concurrently developing MillWheel and Dataflow, foundational technologies that influenced the creation of Apache Flink and Apache Beam.

Paper Insights #32 - Napa: Powering Scalable Data Warehousing with Robust Query Performance at Google

Napa represents the next generation of planet-scale data warehousing at Google, following Mesa. Napa is a key system for analytics workloads that stores enormous datasets for various tenants within Google. The extensive authorship of the paper underscores the collaborative effort behind its creation. This paper was presented at VLDB 2021.

Paper Insights #31 - F1 Query: Declarative Querying at Scale

We shift our focus from databases to a query engine. Google presented this paper at VLDB, the premier global database conference, in 2018. Notably, this paper has a number of authors and is incredibly dense. With so many parts, the paper only provides a high-level idea of its different components.

Paper Insights #30 - Autopilot: Workload Autoscaling at Google Scale

This paper from Google was presented at EuroSys 2020. It is heavy on statistics, but it addresses one of the most important concepts in cluster/cloud computing, scaling, and those concepts are worth exploring in system design.

Paper Insights #29 - Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center

This paper, presented at NSDI in 2011, comes from the UC Berkeley Systems Lab, with authorship by influential figures like Matei Zaharia (Spark's creator and CTO of Databricks), Ali Ghodsi (CEO of Databricks), Scott Shenker, and Ion Stoica. UC Berkeley's Systems Lab is a powerhouse in computer systems research. Their most recent work on Sky Computing, which envisions a cloud computing marketplace, is truly groundbreaking.

Paper Insights #28 - TAO: Facebook's Distributed Data Store for the Social Graph

Following our discussion of causal consistency in COPS, this paper presents an eventually consistent database designed for graph storage. It was presented at USENIX ATC 2013, a prestigious venue in the field of computer science.

Paper Insights #27 - Don't Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS

This work provides a strong foundation for understanding causality, both within distributed systems and more broadly. Its principles underpin systems achieving causal consistency, a powerful form of consistency that ensures high availability. Presented at SOSP 2011, this paper features contributions from prominent distributed systems researchers Wyatt Lloyd and Michael Freedman.

Paper Insights #26 - CliqueMap: Productionizing an RMA-Based Distributed Caching System

Memcached is a popular in-memory cache, but I'd like to discuss CliqueMap, Google's caching solution. Having worked closely with CliqueMap, I have a deep understanding of its architecture. One major difference from Memcached is CliqueMap's use of RMA for reads. We'll also take a closer look at RDMA, a crucial cloud technology that emerged in the 2010s.

Paper Insights #25 - Eliminating Receive Livelock in an Interrupt-driven Kernel

Jeff Mogul, a pioneering figure in computer science, authored this influential paper presented at USENIX ATC 1996. Since then, it has become a seminal work, sparking extensive discussion in academic circles. I also found it particularly engaging during Stanford's CS240, partly due to its clarity compared to other readings, but primarily because of my deep interest in networking.

Paper Insights #24 - Spanner: Google's Globally-Distributed Database

This landmark paper, presented at OSDI '12, has become one of Google's most significant contributions to distributed computing. It didn't solve the long-standing core problem of 2PC scalability in distributed systems; rather, it introduced TrueTime, which revolutionized system assumptions. Authored by J.C. Corbett, with contributions from pioneers like Jeff Dean and Sanjay Ghemawat, this paper effectively ended my exploration of distributed SQL databases. It represents the leading edge of the field.