
Preface

Welcome to Pico, your guide to understanding computer systems through the lens of influential research papers!

Pico delves into the core of modern computing by exploring key areas like datastores, big data, operating systems, and distributed systems. Unlike AI summaries, the articles here are crafted from my extensive notes, taken while reading hundreds of research papers and engaging in numerous discussions within academic and industry circles. The papers represent a blend of academic and industry perspectives, offering insights into both theoretical foundations and practical implementations: academic papers tend to emphasize theory and scientific rigor, while industry papers focus on practical considerations and implementation details.

To maximize your learning, I suggest following this order for each article:
  1. Read the research paper at least twice from start to end (including any appendices).
  2. Then, read the corresponding Pico article.
  3. Finally, read the research paper one more time.

The key point is that thoroughly reading the papers is essential; the notes merely offer supplementary explanations. Without engaging with the papers themselves, these clarifications will do little to deepen your understanding. This iterative approach helps you progressively build your understanding, moving from the original source to a guided explanation and back again. Though not strictly required, a basic foundation in computer systems concepts, perhaps from your college studies, will be beneficial. If it has been a while, a quick refresher is a good starting point.

Keep in mind that the ideas presented in these papers are interconnected. You'll often find that concepts from different papers relate to and build upon each other. Don't hesitate to revisit previous articles as new connections emerge. It's perfectly normal if some details aren't immediately clear. By working through all the articles, you'll gradually develop a robust and interconnected understanding of these critical concepts. Expect to reread articles as your knowledge deepens. Ultimately, you'll gain a strong, holistic grasp of the subject matter.

Pico is not intended to replace system design courses. Instead, it aims to offer something more profound. If you're just beginning your journey into system design, I would highly recommend prioritizing the reading of these foundational papers over enrolling in online courses. Here's why:
  1. Detail-Oriented: Reading the original papers provides a complete picture, explaining not just the "what" and "how" of a system, but also the crucial "why" behind its design choices. Research papers offer a deep dive into system design, evaluation, and related work, backed by the rigorous review of leading researchers. While their detailed nature can initially make them challenging to navigate, the effort pays off. As you become more comfortable with reading them, your understanding of the system will align with the insights of its very creators.
  2. Beyond Mediocrity: Engaging with these fundamental ideas will push you beyond surface-level understanding and foster a deeper, more nuanced perspective.
  3. Critical Thinking Enhancement: Analyzing research papers actively cultivates your critical thinking skills, enabling you to evaluate and synthesize complex information.
  4. Reading Habits and Patience: This process will significantly improve your reading comprehension and develop patience, essential skills for any technical discipline.

If you're new to reading research papers, expect to spend around one week on each paper, assuming 1-2 hours of reading per day. Keep in mind that some articles involve reading multiple papers. Factoring in off-weeks and busier periods, completing all the material will likely take approximately one year (or 800 hours). While this time commitment may seem significant, the long-term benefits to your understanding and career will be well worth the investment.

To find answers to questions about these papers, I recommend starting with a Google search, as many queries can be resolved by reading relevant web articles. Exploring other people's explanations of the same paper on forums like Stack Overflow can also provide valuable alternative perspectives.

For specific content-related questions, please leave a comment, and I will respond as soon as possible. If that doesn't yield the information you need, feel free to reach out to me directly, and I will do my best to answer your questions. While I've read and discussed these papers extensively, I don't claim to be an expert. My familiarity comes from repeated readings and group discussions. Therefore, please understand that I may not be able to answer every detailed question about specific paragraphs within the papers.

Lastly, Pico is free, reflecting my belief that high-quality education should be freely available on the internet. I often find that commercially driven education caters to quick success at the expense of depth.

When you are ready, start here! There is no better time to start than now. I wish you the best on your journey.

-- Sushant

Popular Posts

Paper Insights #25 - CliqueMap: Productionizing an RMA-Based Distributed Caching System

Memcached is a popular in-memory cache, but I'd like to discuss CliqueMap, Google's caching solution. Having worked closely with CliqueMap, I have a deep understanding of its architecture. One major difference from Memcached is CliqueMap's use of RMA (remote memory access) for reads. We'll also take a closer look at RDMA, a crucial technology that gained wide adoption in cloud datacenters in the 2010s. Let's begin with some basic concepts. Network Interface Card (NIC): The NIC facilitates data reception and transmission. Understanding its operation requires examining the fundamental interaction between the CPU and memory. CPU <-> Memory Communication: In a Von Neumann Architecture, the CPU and memory are core components, enabling Turing computation. Their communication relies on the system bus (e.g. PCIe), a set of electrical pathways connecting the CPU, memory, and I/O devices. The system bus comprises three primary logical components: Data Bus: Bidirectional, carrying the actual data being tran...

Paper Insights #26 - Don't Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS

This work provides a strong foundation for understanding causality, both within distributed systems and more broadly. Its principles underpin systems that achieve causal consistency, a powerful consistency model that remains compatible with high availability. Presented at SOSP 2011, this paper features contributions from prominent distributed systems researchers Wyatt Lloyd and Michael Freedman. Let's begin with some basic concepts. Causal Ordering: In 1978, Leslie Lamport published Time, Clocks, and the Ordering of Events in a Distributed System, a seminal paper that significantly impacted distributed system design. This work, alongside Paxos and TLA+, stands as one of Lamport's most influential contributions. A fundamental challenge in distributed systems is clock synchronization. Perfect synchronization is unattainable, a fact rooted in both computer science and physics. However, the goal isn't perfect synchronization itself, but rather the ability to totally order even...

Paper Insights #24 - Spanner: Google's Globally-Distributed Database

This landmark paper, presented at OSDI '12, has become one of Google's most significant contributions to distributed computing. It didn't solve the long-standing core problem of 2PC scalability in distributed systems; rather, it introduced TrueTime, which revolutionized system assumptions. Authored by J.C. Corbett, with contributions from pioneers like Jeff Dean and Sanjay Ghemawat, this paper effectively ended my exploration of distributed SQL databases. It represents the leading edge of the field. I would highly recommend reading the following before jumping into this article: 1. Practical Uses of Synchronized Clocks in Distributed Systems, where I explained why clock synchronization is necessary but not sufficient for external consistency. 2. A New Presumed Commit Optimization for Two Phase Commit, where I introduced two-phase commit (2PC) and how it is solved in a distributed system. 3. Amazon Aurora: Design Considerations for High Th...