Posts

Paper Insights #38 - Availability in Globally Distributed Storage Systems

In data centers, failures are inevitable. Jobs are relatively easy to restore after a failure, since binaries can simply be executed on different machines. Protecting data, however, is more complex. How do we ensure data availability during outages, such as a single machine becoming unavailable or, in worse cases, an entire region becoming unreachable due to network faults? This is crucial for organizations like cloud providers, as maintaining user trust depends heavily on the high availability of data. This insight addresses these challenges and the solutions used to ensure data resilience. The paper, published by Google and presented at the USENIX Symposium on Operating Systems Design and Implementation (OSDI) 2010, is quite challenging due to its highly mathematical nature. It covers two independent main topics. The first section examines the availability and Mean Time To Failure (MTTF) of individual hardware components like disks and nodes. It also delves into failure bursts a...
Recent posts

Paper Insights #37 - The Honey Badger of BFT Protocols

In the previous insights, we explored several consensus algorithms; however, they were all designed for non-Byzantine faults. In this insight, we will focus on protocols designed for public or permissioned environments where Byzantine faults can occur. Many researchers have explored this complex field, motivated by its critical impact on blockchain and cryptocurrency economics. Developing solutions here is particularly challenging, as the literature demands a deep understanding of logical constructs and advanced cryptography. This paper was presented at the ACM SIGSAC Conference on Computer and Communications Security (CCS) 2016, a premier conference in computer security. Written by Andrew Miller et al., whose authors are affiliated with leading global universities, the paper is inherently technical and academically focused. In this insight, we will first revisit the different failure types and network models in distributed systems. We will also revisit consensus and briefly summarize the di...