Paper Note: [EDBT'16] Designing Access Methods: The RUM Conjecture
This post introduces the RUM Conjecture's view on the trade-offs among read amplification, write amplification, and memory overhead, and discusses its implicat…
The RUM Conjecture states that among Read Overhead, Update Overhead, and Memory (or Storage) Overhead, optimizing any two simultaneously requires degrading the remaining one as the cost. The paper’s authors further explain that optimization within a certain range—before reaching the optimum—does not follow the RUM Conjecture; however, after a certain threshold is reached, further optimization requires paying a cost. Here, Update Overhead considers only write amplification, not the cost of addressing during writes.
The RUM Conjecture: Read, Update, Memory – Optimize Two at the Expense of the Third.
designing access methods that set an upper bound for two of the RUM overheads, leads to a hard lower bound for the third overhead which cannot be further reduced.
The paper’s authors explain that proposing this conjecture does not mean there is nothing left for everyone to do. Rather, it means that after reaching the optimization threshold, if we do not want to pay the cost of degrading one performance metric, we should consider approaches such as adaptive adjustment, balancing these three important parameters according to the characteristics of the data.
RUM-Aware Access Method Design. Accepting that a perfect access method does not exist, does not mean the research community should stop striving to improve; quite the opposite. The RUM Conjecture opens the path for exciting research challenges towards the goal of creating RUM-aware and RUM-adaptive access methods.
P.S. The same authors also published nearly identical content at SIGMOD'16[2].
First, consider a few extreme cases:
\(\min(RO) = 1.0 \implies UO = 2.0 \,\text{and}\, MO \to \infty\)
\(\min(UO) = 1.0 \implies RO \to \infty \,\text{and}\, MO \to \infty\)
\(\min(MO) = 1.0 \implies RO = N \,\text{and}\, UO = 1.0\)
The first case optimizes read performance to the extreme. It is similar to the extreme case of a hash table: we enumerate the entire possible key space and allocate a fixed space for each key. In this case, storage-space usage is unbounded—or effectively unbounded in practice—because enumerating the entire key space and reserving value-sized space for each key is impractical in most cases.
The second case optimizes write performance to the extreme. Simply put, it is an append-only log of write requests. Because we never actually delete anything, the storage space is unbounded. A write request also needs to scan the entire log, so it is unbounded as well.
The third case optimizes storage space to the extreme. It is similar to the second case, except that all update operations are performed in place.
Clearly, these three extreme cases, each optimizing only one objective, are impractical. Next, consider the case of optimizing two objectives simultaneously. The paper’s authors propose the following hypothesis: in RUM, optimizing any two objectives simultaneously requires paying the cost of degrading the remaining metric.
The RUM Conjecture. An access method that can set an upper bound for two out of the read, update, and memory overheads, also sets a lower bound for the third overhead.
The RUM perspective on several common access methods is shown in RUM perspective on common access methods, and their time complexities (Big O notation) are shown in Time complexities of common access methods.


Consider this problem intuitively. Data structures optimized for read performance follow an idea similar to the first extreme case above: do more work during writes, while also paying for some redundant storage space to record auxiliary query information. Data structures optimized for write performance follow an idea similar to the second extreme case above: do less work during writes, increasing the burden during queries. To reduce the query burden, they further amortize work over time by splitting the work that should have been completed in a single write request into an upper half and a lower half, executing the lower half off the critical path and reorganizing and merging the write logs. The main idea for optimizing storage space is compression or sacrificing precision, both of which generally require losing some read and write performance at the same time.
References
[1] ATHANASSOULIS M, KESTER M, MAAS L, et al. Designing Access Methods: The RUM Conjecture[C]//International Conference on Extending Database Technology (EDBT). Bordeaux, France: 2016.
[2] ATHANASSOULIS M, IDREOS S. Design Tradeoffs of Data Access Methods[C]//Proceedings of the 2016 International Conference on Management of Data. New York, NY, USA: Association for Computing Machinery, 2016: 2195–2200.