Large-Scale Cluster Management and Operations Automation
All posts in reverse chronological order.
13 Entries
This article discusses core issues in large-scale cluster operations automation, including automatic fault detection, automatic remediation…
This article outlines the design goals of Microsoft's Autopilot cluster management system, as well as its mechanisms for machine lifecycle …
Starting from the CAP theorem and ACID properties, this article discusses the trade-offs among consistency, availability, and transaction d…
This article summarizes common issues in hash tables, such as collisions and resizing, and discusses advanced topics such as perfect hashin…
This article summarizes a practical subset of C++ features commonly used in systems programming, including recommendations on RAII, smart p…
This post explains how the Anna key-value store uses the actor model and lattice-based conflict resolution to achieve high performance and …
Using lower_bound as an example, this article explains how to derive and verify a binary search implementation with safety and liveness ana…
This article summarizes common design methods for single-node storage engines from the perspectives of data and index layout, hash tables, …
This article summarizes the architectural design of Facebook F4 for warm data object storage, along with its erasure coding and cross-data-…
This post summarizes the design and optimization ideas behind Facebook Haystack's single-machine object storage engine for hot photo storag…
This article introduces Raft's leader election and log replication mechanisms, and discusses how it differs from Paxos in understandability…
This article summarizes Paxos's core safety properties, proposal process, and the basic derivation of the single-decree consensus problem.
This article reviews the consistency, replication, conflict handling, and routing designs in the Amazon Dynamo paper, and summarizes the key engineering…
1 Entry
This article reviews three common uses of const in C++, analyzes its semantic confusion and limitations, and discusses alternatives such as…
2 Entries
Many people do not understand that different programming languages have different expressive power. That is why, after assembly language, w…
In practice, it is sometimes necessary to detect in advance whether arithmetic overflow will occur. C# provides the checked keyword to solv…
6 Entries
This question often comes up in interviews, and real-world projects also frequently need to address the same issue: how to determine whethe…
Overloading and overriding are two concepts frequently tested in written exams and interviews. These two concepts differ from the concepts …
Interviews often ask about the three major characteristics of object-oriented programming, but no single book explains all three thoroughly…
Exactly one node in a binary search tree violates the binary search tree property. Find that node.
Images exported from PDF files with ImageMagick are usually not sharp enough. Because Ghostscript can parse the underlying PDF data, it can…
This article introduces tail recursion and continuation-passing style (CPS), and shows how to apply both to convert recursion into loops.
1 Entry
This article discusses the meaning of reference types through a C++ written-test question, and how to define types with polymorphic behavio…