Data Structure

A Summary of Intrusive Linked Lists in C++

Summarizing the main pros and cons of intrusive linked lists and a reference implementation in C++

Published

Recently I started looking into performance-related optimizations and came across intrusive linked lists, so I am writing a brief summary here.

First, let me briefly explain what an intrusive linked list is, so readers can quickly decide whether they need to keep reading. A common non-intrusive linked list looks like this:

cxx
template <typename T>
struct ListNode {
  ListNode* prev;
  ListNode* next;
  T value;
};

In other words, a non-intrusive linked list is implemented by having the linked-list node contain the data. An intrusive linked list is implemented in the opposite way: the business data structure contains the linked-list node structure:

cxx
template <typename T>
struct IntrusiveListNode {
  IntrusiveListNode* prev;
  IntrusiveListNode* next;
  T* owner;
};

struct UserData {
  // ...
  InstruiveListNode list;
};

A common use case for intrusive linked lists is as an alternative to an approach such as std::list<T*>; other scenarios are generally not suitable for intrusive linked lists. For example, when implementing an LRU, you need to add an entry to both a HashMap and a List, using the former to provide $O(1)$ lookup complexity and the latter to maintain recency order. Because a value object cannot belong to two non-intrusive containers at the same time, one of the containers must store pointers or iterators. In this case, once you have an Entry, you can operate on the linked list that contains that Entry, which is very convenient and also relatively efficient.

Benefits of Intrusive Lists

For a comparison of intrusive and non-intrusive approaches, see https://www.boost.org/doc/libs/1_79_0/doc/html/intrusive/intrusive_vs_nontrusive.html

Generally speaking, developers prefer to use non-intrusive implementations first. This is because intrusive implementations require coupling some logic into application code, which makes them less appealing. However, in the scenarios mentioned in the background introduction, intrusive implementations have significant benefits, which is why they are widely used. Here we will no longer emphasize the differences between intrusive and non-intrusive approaches, and will mainly consider the advantages of intrusive linked lists.

Better Data Locality

When traversing std::list<T*>, you still need to dereference T* before you can access the data inside T. In an intrusive linked list, however, next and the data inside T are stored together, so no extra dereference is needed. Also, because the memory layout is colocated, you get better Data Locality.

A Friendlier API

With an intrusive linked list, once we have the data, we can remove this node from the list without first locating its iterator and then finding the corresponding container to erase that iterator.

Managing Lifetimes Independently of Containers

The main use case is when the same object needs to be shared across multiple containers, such as the LRU implementation scenario mentioned in the background introduction, or when the same data needs to be added to multiple linked lists. Therefore, the fact that intrusive linked lists require users to manage the lifetime of data nodes themselves becomes an advantage here.

Analysis of Some Implementations

Linux #

https://github.com/torvalds/linux/blob/v5.18/include/linux/list.h

As its comments state, it implements a circular doubly linked list.

circular doubly linked list

The definition of the intrusive linked-list structure is at https://github.com/torvalds/linux/blob/v5.18/include/linux/types.h

c
struct list_head {
  struct list_head *next, *prev;
};

The related methods for using it are at https://github.com/torvalds/linux/blob/v5.18/include/linux/list.h

For usage, take the scheduler module as an example:

c
// https://github.com/torvalds/linux/blob/v5.18/kernel/sched/sched.h#L376
struct task_group {
  // Omit some business logic
  struct list_head list;
};

// https://github.com/torvalds/linux/blob/v5.18/kernel/sched/core.c

/*
 * Default task group.
 * Every task in system belongs to this group at bootup.
 */
struct task_group root_task_group;
LIST_HEAD(task_groups);

list_add(&root_task_group.list, &task_groups);

The question here is: since the pointers in struct list_head are also of type struct list_head, how do we get the user type’s data? Since all fields inside a C struct are arranged according to a defined layout, we can use offsetof to compute the offset, find the starting position of the user struct, and then perform a type cast.

c
// https://github.com/torvalds/linux/blob/v5.18/include/linux/list.h#L519

/**
 * list_entry - get the struct for this entry
 * @ptr:        the &struct list_head pointer.
 * @type:       the type of the struct this is embedded in.
 * @member:     the name of the list_head within the struct.
 */
#define list_entry(ptr, type, member) \
        container_of(ptr, type, member)

// https://github.com/torvalds/linux/blob/v5.18/include/linux/container_of.h#L17

/**
 * container_of - cast a member of a structure out to the containing structure
 * @ptr:          the pointer to the member.
 * @type:         the type of the container struct this is embedded in.
 * @member:       the name of the member within the struct.
 *
 */
#define container_of(ptr, type, member) ({                            \
        void *__mptr = (void *)(ptr);                                 \
        static_assert(__same_type(*(ptr), ((type *)0)->member) ||     \
                      __same_type(*(ptr), void),                      \
                      "pointer type mismatch in container_of()");     \
        ((type *)(__mptr - offsetof(type, member))); })

Note that offsetof only works on Standard Layout types. For example, it is not applicable in virtual inheritance scenarios. Therefore, you must be very careful when using this technique in C++.

https://stackoverflow.com/questions/1129894/why-cant-you-use-offsetof-on-non-pod-structures-in-c

Doom3 #

https://github.com/Edgarins29/Doom3/blob/d80c4d8341a35a8d9bc71324dcfc2be583295c03/neo/idlib/containers/LinkList.h

We have already introduced how Linux implements an intrusive linked list, and we pointed out that this method has some limitations for C++ and cannot be used with non-standard-layout data structures. In most cases we will not encounter such limitations, but what should we do if we do? The Doom codebase also has a similar implementation, which solves this problem by adding an extra owner member.

cxx
template< class type >
class idLinkList {
 public:
  // omitted

 private:
  idLinkList* head;
  idLinkList* next;
  idLinkList* prev;
  type*       owner;
};

Usage looks roughly like this:

cxx
class idEntity {
  idLinkList<idEntity> spawnNode;  // for being linked into spawnedEntities list
};

idEntity::idEntity() {
  spawnNode.SetOwner( this );
}

/*
================
idLinkList<type>::InsertBefore

Places the node before the existing node in the list.  If the existing node is the head,
then the new node is placed at the end of the list.
================
*/
template< class type >
void idLinkList<type>::InsertBefore(idLinkList &node) {
  Remove();

  next       = &node;
  prev       = node.prev;
  node.prev  = this;
  prev->next = this;
  head       = node.head;
}

/*
================
idLinkList<type>::AddToEnd

Adds node at the end of the list
================
*/
template< class type >
void idLinkList<type>::AddToEnd(idLinkList &node) {
  InsertBefore(*node.head);
}

ent->spawnNode.AddToEnd(spawnedEntities);

Boost Intrusive #

https://www.boost.org/doc/libs/1_79_0/doc/html/intrusive/list.html

Boost’s intrusive containers make many considerations for generality, which makes the code especially complicated. If your scenario is relatively simple, I suggest being cautious about using Boost intrusive containers directly. If you need to mix multiple types of non-intrusive containers (linked lists, hash tables, B-trees, and so on), then consider whether to use Boost.

Boost provides two ways to build intrusive containers. One is similar to the approach mentioned above: using a member hook. Its drawback is that it cannot handle virtual inheritance. The other is to use a base hook, injecting the related variables for the intrusive container through inheritance, and then distinguishing different intrusive containers by configuring different tags during inheritance.

Overall, I feel that Boost’s approach is cleaner than the method above that forcibly adds an owner field. This is because users must use member hooks for Standard Layout objects and base hooks for non-Standard Layout objects. A base hook can access the this pointer, so it does not need an owner field to store this.

Google QUICHE

https://github.com/google/quiche/blob/977f5bf82a47fb833ab8e5a5cea2c8a593fb6add/quiche/spdy/core/spdy_intrusive_list.h

This implementation is similar to Boost’s base hook, but because it does not need to handle as many cases, its implementation is much “cleaner”, and at a glance it seems more compatible with the STL.

LLVM #

https://llvm.org/docs/ProgrammersManual.html?highlight=ilist#llvm-adt-ilist-h

https://github.com/llvm/llvm-project/blob/llvmorg-14.0.4/llvm/include/llvm/ADT/ilist.h

This implementation has the same idea as Boost’s base hook. However, it extracts some logic into ilist_traits, and it also handles whether to set up sentinel elements.

Other Thoughts #

Unrolled linked lists #

https://www.wikiwand.com/en/Unrolled_linked_list

Not Using Linked Lists #

vector plus deletion markers, with delayed compaction. However, it cannot guarantee that iterators remain valid.

References #

  1. https://www.boost.org/doc/libs/1_79_0/doc/html/intrusive/intrusive_vs_nontrusive.html
  2. https://www.gamedeveloper.com/programming/in-depth-intrusive-lists
  3. https://news.ycombinator.com/item?id=8795745 They moved away from intrusive lists to vectors in the BFG edition for performance reasons.
  4. https://stackoverflow.com/questions/1129894/why-cant-you-use-offsetof-on-non-pod-structures-in-c