Intrusive Data Structures in C and C++

11 minute read Published: 2025-07-22

This article discusses the benefits and shortcomings of intrusive data structures and the implementation of a type-safe container_of in c++.

Table of Contents

Basic Data Structure Layout

We know from basic data structures 101 that we typically have something like the following. Let's take a doubly linked list as the canonical example using C pseudocode:

struct udata{
    uint32_t id;
    uint64_t value;
};

struct list_node{
    list_node *next;
    list_node *prev;

    void *data;  /* udata* */
};

This clearly shows that a doubly-linked list has a pointer to the next node, a pointer to the previous node and a pointer to the data. The links themselves are not actually the subject of this article. Rather, it is the way the data is accessed.

The problem with the above is it would normally incur two dynamic allocations: one to allocate the list_node, another to allocate the data. This is not ideal since it introduces undesirable indirection ('pointer-chasing'): to get to a field in the data (assuming data points to a dynamically-allocated udata instance) we would have to dereference a list_node pointer, and a data (after appropriate casting).

C++ of course, unlike C, makes it possible to avoid this shortcoming (along with some others as well):

template<typename T>
struct list_node{
    list_node *next = nullptr;
    list_node *prev = nullptr;
    T data;
};

Minutiae aside, the thing to notice here is that data is contained by the linked list node. The linked list node is a container of user data. This is how C++ STL data structures ('containers') are designed e.g. std::list, std::unordered_map etc.

This is a very intuitive way of organizing data. Not only that, but it fits naturally within the C++ ecosystem. For example, one of the elements at the core of modern C++ design is RAII -- resource acquisition is initialization. In this case, removing (erasing) an element from a data structure will invoke the destructor of that object. If the object is a std::unique_ptr, the object's dyanamic memory gets deallocated. No need for C-like manual memory management in most cases. Smart pointers, in other words, and other such constructs relying on RAII integrate naturally with STL containers.

However, there are also drawbacks:

Intrusive Data Structures

While STL containers embed user data inside, this relationship can in fact be reversed: instead of a linked data structure node containing the user data, the user data can instead contain one or more linked data structure nodes.

struct list_node{
    list_node *next;
    list_node *prev;
};

struct udata{
    uint32_t id;
    uint64_t value;
    list_node list;
};

This accomplishes the same goal: a udata object is linked into a doubly-linked list. This is called an intrusive data structure, since the user data object itself embeds a linked data structure node.

What benefits could this sort of layout possibly bring? As it turns out, quite a few. In fact, the intrusive approach is in many ways a mirror to the traditional approach. It excels where the traditional approach falls short; but also vice versa.

Some of the advantages should be immediately apparent:

However, there is also a set of drawbacks.

First -- how do we actually get the user data object from a node? We could of course do the following:

template<typename T>
struct list_node{
    list_node *next = nullptr;
    list_node *prev = nullptr;
    T *parent = nullptr;
};

But this means every single linked data structure node must pay with a pointer's worth of overhead. Imagine we have the following:

struct udata{
    struct list_node lista;
    struct list_node listb;
    struct list_node listc;
    struct list_node listd;
    struct list_node liste;
};

If every single one of those nodes required such a back-reference to the parent container, suddenly this would not look very efficient. As we will see later, however, the pointer penalty is in fact unnecessary.

Second, intrusive data structures are by definition non-owning: they do not contain the user data object so they do not own it. Instead they themselved are contained and owned by the user data object. This reversal however is backward to how data structures are used in C++. If you erase from an intrusive linked list, the element user data object does not get destructed. It is simply unlinked. This means there is normally some other owning structure that the lifetime of the user data object is bound to, whether this be an STL container, the stack, or smart or raw pointers to dynamic memory stored in some registry.

This requires care in ensuring the intrusive data structure does not hold onto references to the destructed object, risking use-after-free memory violations. When the user data object is erased from the owning structure, it must be unlinked from any non-owning intrusive data structure as well.

The above makes intrusive data structures (due to their efficiency, performance, low-overhead, and non-owning semantics) particularly suited to implementing multidimensional views of the same user data set. For example, indices in a database giving different sorts.

Boost.Intrusive is a boost library1 that implements a collection of such intrusive data structures. Their documentation provides interesting coverage of the pros and cons of intrusive and non-intrusive data structures as well as their relative performance characteristics.

The Linux Kernel container_of

container_of solves one of the problems of intrusive containers pointed out earlier. Given a pointer to a linked data structure node, we want to transform it into a pointer to user data, remembering that the node itself is contained within the user data object, and does not store a back pointer to it.

template<typename T>
struct list_node{
    list_node *next = nullptr;
    list_node *prev = nullptr;

    // we do NOT have this.
    // T *parent = nullptr;
};

This is a Linux Kernel macro ubiquitous in Linux kernel code2.

It looks like this:

/**
 * container_of - cast a member of a structure out to the containing structure
 * @ptr:	the pointer to the member.
 * @type:	the type of the container struct this is embedded in.
 * @member:	the name of the member within the struct.
 *
 * WARNING: any const qualifier of @ptr is lost.
 */
#define container_of(ptr, type, member) ({				\
	void *__mptr = (void *)(ptr);					\
	static_assert(__same_type(*(ptr), ((type *)0)->member) ||	\
		      __same_type(*(ptr), void),			\
		      "pointer type mismatch in container_of()");	\
	((type *)(__mptr - offsetof(type, member))); })

If this is the first time you have seen this it may well look incomprehensible. In actuality it is simple enough to understand. This relies on the offsetof macro from the C standard library that given the name of a structure (type) and the name of a member of that structure, it returns the offset of the member from the start of that structure3, while taking into account any implementation-specific details. Given this offset, and given a pointer to such a member field, basic pointer arithmetic (subtracting the offset from the pointer) will produce a pointer to the structure containing the member field. In other words, this macro turns a pointer to a member object into a pointer to the container of that object.

Not only is the macro itself seen everywhere in Linux kernel code, but it is also used in arguably the best-known example of an intrusive data structure: the Linux kernel's intrusive circular doubly-linked list.

The macro itself has been primarily popularized as an idiom/pattern by the Linux Kernel but can also be found in BSD kernel code even though the BSD kernels have traditionally been more reliant on macro code expansion for their data structures rather than instrusion. Both of these can be seen in the FreeBSD kernel queue implementations for example.

I implemented a few such examples of intrusive data structures in C myself a few years ago, including singly-linked lists, stacks, AVL tree, etc.

The C++ container_of

Perhaps somewhat unexpectedly, container_of can in fact be fully implemented in c++ in a more type_safe way and without using any macros at all.

The idea is the same: given a pointer to a member object, we can get a pointer to the containing object through basic pointer arithmetic by subtracting an offset from the member object pointer.

There is an important limitation, however: this only works (reliably/safely) for standard-layout classes. Beyond that we are not guaranteed to be able to get back the pointer to the container by simply applying a pointer offset subtraction. For example, in the case of virtual inheritance there is a virtual table (vtable) and possibly pointer indirection involved rather than a simple pointer offset adjustment.

With this said, standard-layout classes in C++ are quite a bit more than simple C-structs, so the limitation is not quite as restrictive as it sounds. For example, you can still have constructors, member functions, and overloads. On the other hand it is easy for standard-layout-ness to break with small changes. For example a standard-layout class inheriting from a standard-layout class can become non-standard-layout, easily demonstrable through a static_assert.

If the standard-layout restriction proves limiting, one can move the complexity to a non-standard layout class that inherits from a simple CRTP standard-layout class, which will in turn be used to store intrusive data structure links. This enables arbitrary complexity, and the static downcast is cheap and always safe thanks to CRTP. Here's a quic example:

template<typename T>
struct base{
    dlnode taskq;
    dlnode txq;
};

class complexclass: public base<complexclass>{
private:
    std::mutex_t m_mtx;
    asio::steady_timer m_timer;
    // ... other complex fields
};

In c++, the container_of mechanism can be implemented by using the somewhat arcane pointer-to-member. This allows us to compute an offset in much the same way as offsetof, but without resorting to any macros. Once we have the offset, we can then perform the offset subtraction performed in the usual Linux/BSD container_of to recover a pointer to the container.

An example of a c++ container_of implementation can be found here, and is reproduced below:

// intrusive.hpp

#pragma once

#include <cstdlib>
#include <type_traits>

namespace tarp {

// Non-standard layout objects are not supported because in their
// case we cannot necessarily get a pointer to the container from
// a pointer to a member simply by adjusting an offset.
// (Instead, conceivably multiple indirections may be required, etc).
template<typename Parent, typename Member, Member Parent::*Ptr>
struct is_safely_offsettable {
    static constexpr bool value =
      std::is_object_v<Member> &&
      std::is_member_object_pointer_v<decltype(Ptr)> &&
      std::is_standard_layout_v<Member> && std::is_standard_layout_v<Parent>;
};

template<typename Parent, typename Member, Member Parent::*Ptr>
[[maybe_unused]] static inline constexpr bool is_safely_offsettable_v =
  is_safely_offsettable<Parent, Member, Ptr>::value;

template<typename Parent, typename Member, Member Parent::*Ptr>
concept SafelyOffsettable = is_safely_offsettable<Parent, Member, Ptr>::value;

//

template<typename Parent, typename Member, Member Parent::*member_ptr>
constexpr std::ptrdiff_t member_offset() {
    static_assert(is_safely_offsettable<Parent, Member, member_ptr>::value,
                  "Unsafe use of member_offset: illegal layout");

    // Compute offset using pointer-to-member.
    // For the ->* syntax, see
    // https://en.cppreference.com/w/cpp/language/operator_member_access.html#Built-in_pointer-to-member_access_operators.
    // What we are doing here is:
    // 1. cast 0 (nullptr) to a Parent*. In other words, we imagine a Parent
    // exists that starts at address 0.
    // 2. We take the address of the member pointer, residing at some offset
    // inside this parent object. Because the Parent object starts at address 0,
    // the address of the member in this _particular_ case gives us
    // the **offset** in Parent of that member in the _general_ case.
    // Note we do not dereference any pointers. E.g. ->*member_ptr here is
    // in an unevaluated context (because it is the operand of the unary &
    // address-of operator). Since the reinterpret_cast result is never
    // dereferenced, the behavior is not undefined.
    return reinterpret_cast<std::ptrdiff_t>(
      &(reinterpret_cast<Parent *>(0)->*member_ptr));
}

// Get a pointer to the container (parent) of Member.
//
// Given a pointer to a member field of a given object,
// return a pointer to the containing object itself.
// If the input pointer is nullptr, nullptr is returned.
//
// NOTE:
// This accomplishes the same purpose as the classic container_of
// implementation in C (e.g. see libtarp/include/tarp/container.h
// and the notes there), often seen in kernel code.
// However, the C-based container_of implementation relies on macros
// and the use of offsetof. However, littering a c++ codebase with
// such macros is highly undesirable. As it turns out, however, the
// same thing is in fact achievable in c++ --- in a type safe manner
// and without resorting to any macros. Instead, the somewhat
// arcane pointer-to-member provides us the mechanism for calculating
// the required offset used to get back the pointer of the containing
// object (as mentioned, this is only safe for standard-layout types).
//
// NOTE: member_ptr uses pointer to member syntax,
// see https://en.cppreference.com/w/cpp/language/pointer.html.
template<typename Parent, typename Member, Member Parent::*member_ptr>
Parent *container_of(Member *member) {
    // Subtract to get parent pointer
    const auto member_addr = reinterpret_cast<char *>(member);
    return reinterpret_cast<Parent *>(
      member_addr - member_offset<Parent, Member, member_ptr>());
}

// Get a pointer to the container (parent) of Member.
// See non-const overload for details.
template<typename Parent, typename Member, Member Parent::*member_ptr>
const Parent *container_of(const Member *member) {
    const auto member_addr = reinterpret_cast<const char *>(member);
    return reinterpret_cast<const Parent *>(
      member_addr - member_offset<Parent, Member, member_ptr>());
}

};  // namespace tarp

An example

What does using this c++ container_of look like in practice? It's actually very convenient. Unlike the macro-hell the FreeBSD kernel has to deal with, here it can be seamlessly combined with templates.

A few demonstrative bits and pieces from an intrusive doubly linked list implementation are shown below.

// intrusive doubly linked list node.
// Any data structure that wants to be linked into
// one or more intrusive doubly linked lists must embed
// one or more nodes of this type.
struct dlnode {
    bool is_linked() const { return prev != nullptr || next != nullptr; }

    struct dlnode *next = nullptr;
    struct dlnode *prev = nullptr;
};

// Intrusive doubly linked list template class.
// This is templated in order to avoid having to
// manually specify container_of calls everywhere which
// would be both more inconvenient and more error prone.
//
// -> Parent
// The container type that embeds a dlnode which makes it
// part of this list.
//
// -> MemberPtr
// A pointer to the dlnode member of Parent (pointer to member).
//
// These 2 parameters are used internally in container_of calls.
template<typename Parent, auto MemberPtr>
class dllist final {
public:
    // Given a pointer to a node, get a pointer to its parent container.
    Parent *get_container(dlnode *node) {
        return tarp::container_of<Parent, dlnode, MemberPtr>(node);
    }

    // Append the element to the back of the list.
    void push_back(Parent &);

    // Remove and return the last element in the list.
    // Null if the list is empty.
    Parent *pop_back();
...

private:
    void push_back(dlnode &);
    dlnode *pop_back_node();

...
};

template<typename Parent, auto MemberPtr>
void dllist<Parent, MemberPtr>::push_back(dlnode &node) {
    node.next = nullptr;
    node.prev = m_back;
    m_back = &node;
    if (m_count == 0) m_front = m_back;
    else m_back->prev->next = m_back;
    m_count++;
}

template<typename Parent, auto MemberPtr>
void dllist<Parent, MemberPtr>::push_back(Parent &obj) {
    auto &node = obj.*MemberPtr;
    push_back(node);
}

template<typename Parent, auto MemberPtr>
dlnode *dllist<Parent, MemberPtr>::pop_back_node() {
    dlnode *node = m_back;

    switch (m_count) {
    case 0: return nullptr;
    case 1: m_front = m_back = nullptr; break;
    case 2:
        m_front = m_back = node->prev;
        m_front->prev = m_front->next = nullptr;
        break;
    default:
        m_back = node->prev;
        m_back->next = nullptr;
        break;
    }
    m_count--;
    return node;
}

template<typename Parent, auto MemberPtr>
Parent *dllist<Parent, MemberPtr>::pop_back() {
    return this->get_container(pop_back_node());
}

And here's a small example of using it:

/*
 * Push large number of nodes;
 * With each pushed node:
 *  - rotate the list 100 times to the back/bottom
 *  - upend the whole list
 *  - rotate the list 100 times to the front/top
 */
TEST_CASE("perf") {
    constexpr size_t num = 80 * 1000;

    dllist q;
    std::vector<std::unique_ptr<testnode>> owner;

    for (size_t i = 0; i < num; i++) {
        auto node = testnode::make(i);
        q.push_back(*node);
        owner.push_back(std::move(node));
        q.rotate(1, 100);
        q.upend();
        q.rotate(-1, 100);
    }

    for (const auto &x : q){
        std::cerr << "x.num=" << x.num << std::endl;
    }
    
    for (size_t i = 0; i < num; ++i) {
        q.pop_front();
    }

    q.clear();
}

  1. https://www.boost.org/doc/libs/1_75_0/doc/html/intrusive.html

  2. https://www.kernel.org/doc/Documentation/driver-model/design-patterns.txt

  3. https://man7.org/linux/man-pages/man3/offsetof.3.html