Building a Distributed Storage System from Scratch, Beginner Series: System High-Level Design

End-User API

The simplest key-value store only needs to implement Put and Get. However, if only these two APIs are provided, callers carry a significant burden in more complex scenarios. Where consistency guarantees are required, callers may even be unable to implement the functionality on their own. For example, a race-free "set if the key does not exist" cannot be built from separate Get and Put calls, because another writer can slip in between them.

Redis is a mature and widely used NoSQL store. We can refer to Redis’s interface design and implement some of its more frequently used APIs.

| Method Name | Meaning |
|-------------|---------|
| GET | Get by key |
| GETSET | Set, returning old value |
| MGET | Get multiple |
| SET | Set by key |
| SETNX | Set if doesn’t exist |
| SETXX | Set if exists |
| MSET | Set multiple |

In addition, we add EXISTS, which reports whether a given key exists without returning its value.
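To make the API surface concrete, here is a minimal single-node, in-memory sketch of the methods listed above. The class name `KVStore`, the method signatures, and the boolean return values of `setnx` and `setxx` are assumptions made for illustration; the series itself may define them differently, and a real implementation would sit behind the API Server rather than in one process.

```python
from typing import Dict, List, Optional


class KVStore:
    """Single-node, in-memory sketch of the end-user API (hypothetical names)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        # GET: return the value, or None if the key is absent.
        return self._data.get(key)

    def getset(self, key: str, value: bytes) -> Optional[bytes]:
        # GETSET: store the new value and return the previous one (or None).
        old = self._data.get(key)
        self._data[key] = value
        return old

    def mget(self, keys: List[str]) -> List[Optional[bytes]]:
        # MGET: one result per requested key, in request order.
        return [self._data.get(k) for k in keys]

    def set(self, key: str, value: bytes) -> None:
        # SET: unconditional write.
        self._data[key] = value

    def setnx(self, key: str, value: bytes) -> bool:
        # SETNX: write only if the key does not exist; report whether it wrote.
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def setxx(self, key: str, value: bytes) -> bool:
        # SETXX: write only if the key already exists; report whether it wrote.
        if key not in self._data:
            return False
        self._data[key] = value
        return True

    def mset(self, items: Dict[str, bytes]) -> None:
        # MSET: write several keys in one call.
        self._data.update(items)

    def exists(self, key: str) -> bool:
        # EXISTS: membership test that never returns the value itself.
        return key in self._data
```

Because everything lives in one dictionary in one process, the conditional writes here are trivially race-free; the interesting work in the rest of the design is preserving that behavior once the data is spread over several servers.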

One point that requires special attention is that MSET should be atomic: either all of its writes take effect or none do. MGET and MSET, as a pair of operations, should also satisfy certain consistency principles. For example, fractured reads [1], in which an MGET observes only some of the writes of a concurrent MSET, should not occur. However, implementing this is relatively difficult, so we will not consider it for now.
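A small simulation may help show what a fractured read looks like. The two dictionaries below stand in for two Data Servers, and the interleaving is written out by hand rather than produced by real concurrency; the names `shard_a`, `shard_b`, and `mget` are illustrative only.

```python
from typing import Dict, List

# Two hypothetical shards, each a plain dict standing in for a Data Server.
shard_a: Dict[str, int] = {"x": 0}
shard_b: Dict[str, int] = {"y": 0}


def mget() -> List[int]:
    """Read x and y, one from each shard."""
    return [shard_a["x"], shard_b["y"]]


# A non-atomic MSET {x: 1, y: 1} applied one shard at a time.
shard_a["x"] = 1
fractured = mget()  # a reader happens to run between the two writes
shard_b["y"] = 1
final = mget()

print(fractured)  # [1, 0]: the new x together with the old y
print(final)      # [1, 1]
```

The first read sees half of the MSET, which is exactly the anomaly the text says should not occur. Preventing it requires some form of cross-shard coordination, which is why it is deferred.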

Overall Architecture

The architecture of the target system is shown in Figure 1:

Figure 1. System Architecture Diagram

All user requests first enter the API Server, which then handles the system’s internal logic. The API Server periodically synchronizes data topology information with the Metadata Server, such as which Data Servers hold which data, and then forwards user requests to the corresponding Data Servers. The Data Server is the actual owner of the data and the actual provider of the service. Each Data Server also synchronizes periodically with the Metadata Server: it reports its own status and receives control instructions in return.
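The two periodic exchanges described above can be sketched roughly as follows. This is an in-process illustration under several assumptions: the topology is modeled as a map from shard id to a list of Data Server ids, a version counter is used to detect changes, and the field and method names (`assign`, `heartbeat`, `sync`, `pending`) are hypothetical. Real servers would exchange these messages over the network on a timer.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MetadataServer:
    # shard id -> ids of the Data Servers holding that shard (assumed layout)
    topology: Dict[int, List[str]] = field(default_factory=dict)
    version: int = 0
    # latest status report per Data Server
    status: Dict[str, dict] = field(default_factory=dict)
    # control instructions queued for each Data Server
    pending: Dict[str, List[str]] = field(default_factory=dict)

    def assign(self, shard: int, servers: List[str]) -> None:
        """Change the topology and bump the version so caches can notice."""
        self.topology[shard] = list(servers)
        self.version += 1

    def heartbeat(self, server_id: str, report: dict) -> List[str]:
        """Record a Data Server's status and hand back its queued instructions."""
        self.status[server_id] = report
        return self.pending.pop(server_id, [])


@dataclass
class ApiServer:
    meta: MetadataServer
    cached: Dict[int, List[str]] = field(default_factory=dict)
    cached_version: int = -1

    def sync(self) -> bool:
        """Called periodically; returns True if the cached topology changed."""
        if self.meta.version == self.cached_version:
            return False
        self.cached = {s: list(v) for s, v in self.meta.topology.items()}
        self.cached_version = self.meta.version
        return True


meta = MetadataServer()
meta.assign(0, ["ds1", "ds2"])
api = ApiServer(meta)
api.sync()                                     # API Server pulls the topology
meta.pending["ds1"] = ["rebalance"]            # hypothetical control instruction
instructions = meta.heartbeat("ds1", {"disk_used": 0.4})
```

Note that the API Server routes requests from its cached copy, so between two syncs it may briefly act on a stale topology; how stale routing is detected and retried is a separate design question.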

Main Flows

User MGET Request

Figure 2. User MGET Request Sequence Diagram

When a user makes an MGET request, the request is first sent to the API Server. The API Server periodically synchronizes with the Metadata Server to obtain the topology of the entire cluster, which records which data is stored on which Data Servers. Using this topology, the API Server sends the request to the one or more Data Servers that hold the requested keys, aggregates the retrieved data, and returns the response to the user.
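The routing and aggregation step can be sketched as a scatter-gather. The sketch assumes keys are mapped to a fixed number of shards by a CRC32 hash and that each shard is served by a single Data Server; neither choice comes from the article. Dictionaries stand in for the Data Servers, and a dictionary lookup stands in for a network call.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Optional

NUM_SHARDS = 4  # assumed fixed shard count, for illustration only


def shard_of(key: str) -> int:
    """Map a key to a shard with a stable hash (an assumed partitioning scheme)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS


def mget(
    keys: List[str],
    topology: Dict[int, str],              # shard id -> Data Server id
    servers: Dict[str, Dict[str, bytes]],  # Data Server id -> its local data
) -> List[Optional[bytes]]:
    # Scatter: group the requested keys by the Data Server that owns them.
    by_server: Dict[str, List[str]] = defaultdict(list)
    for key in keys:
        by_server[topology[shard_of(key)]].append(key)

    # One request per Data Server (here just a dictionary lookup).
    found: Dict[str, bytes] = {}
    for server_id, server_keys in by_server.items():
        store = servers[server_id]
        for key in server_keys:
            if key in store:
                found[key] = store[key]

    # Gather: return values in the order the caller asked for them.
    return [found.get(key) for key in keys]
```

In a real API Server the per-server requests would normally be issued in parallel, so the latency of an MGET is bounded by the slowest Data Server involved rather than by the sum of all of them.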

User PUT Request

Figure 3. User PUT Request Sequence Diagram

When a user makes a PUT request, the request is first sent to the API Server. The API Server writes the data to the corresponding Data Servers, and then asks the primary node of that data shard to commit the request. After the primary and secondary nodes coordinate internally, they return the coordinated result to the API Server. The API Server returns that result to the user.

If the value in a PUT is very small, it can be carried along with the control instructions during primary-secondary coordination, which saves some message exchanges. In most cases, however, the value is not that small. Having the API Server replicate the data is then more reasonable than having the primary node do it, because the number of API Servers grows with request load, while the number of primary nodes does not. This approach therefore reduces pressure on the primary node.
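The PUT flow described above can be sketched as two steps: the API Server ships the value to every replica, then sends a small control message asking the primary to commit. The staging area, the method names (`stage`, `apply`, `commit`), and the rule that the primary commits only when every replica has the value staged are assumptions made for this sketch; the article does not specify the coordination protocol.

```python
from typing import Dict, List


class DataServer:
    def __init__(self, name: str) -> None:
        self.name = name
        self.staged: Dict[str, bytes] = {}     # received but not yet visible
        self.committed: Dict[str, bytes] = {}  # visible to readers

    def stage(self, key: str, value: bytes) -> None:
        """Accept the data sent directly by the API Server."""
        self.staged[key] = value

    def apply(self, key: str) -> bool:
        """Make a staged value visible; returns False if nothing was staged."""
        if key not in self.staged:
            return False
        self.committed[key] = self.staged.pop(key)
        return True

    def commit(self, key: str, secondaries: List["DataServer"]) -> bool:
        """Run on the primary: commit only if every replica has the value staged."""
        replicas = [self] + secondaries
        if not all(key in r.staged for r in replicas):
            return False
        for replica in replicas:
            replica.apply(key)
        return True


def put(key: str, value: bytes, primary: DataServer,
        secondaries: List[DataServer]) -> bool:
    # Step 1: the API Server sends the (possibly large) value to each replica
    # itself, so the primary does not have to fan the data out.
    for replica in [primary] + secondaries:
        replica.stage(key, value)
    # Step 2: a small control message asks the primary to commit.
    return primary.commit(key, secondaries)


primary = DataServer("ds1")
secondaries = [DataServer("ds2"), DataServer("ds3")]
ok = put("k", b"v", primary, secondaries)
```

With this split, the primary only handles small control messages regardless of value size, which is the load argument made above. The sketch ignores failures partway through either step; handling those is where the real coordination protocol does its work.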

1. See the Read Atomic consistency model in Paper Notes: [CONCUR'15] A Framework for Transactional Consistency Models with Atomic Visibility.