CPython Internals: Optimization in Garbage Collection


Without proper optimizations, the basic CPython garbage collector covered in the previous post may exhibit quadratic time complexity. Two optimization techniques are currently applied: organizing objects into generations, and untracking some objects during GC.

All code in this post is based on CPython v3.10.

Generations

In reality, most objects have a very short lifespan, which in turn implies that the older an object is, the less likely it is to become unreachable. To take advantage of this fact, CPython manages all tracked objects in three generations.

Initially, all objects are placed into generation 0, and each time they survive a collection, they are promoted to the next generation. GC is executed separately for each generation, and a collection of a given generation scans only the objects belonging to that generation and all younger ones.

typedef struct _gc_runtime_state {
    /* ... */
    // all generations with supplementary info; see below
    struct gc_generation generations[NUM_GENERATIONS];
    // shortcut for the youngest generation
    PyGC_Head *generation0;

    // # of objects that survived the last full collection
    Py_ssize_t long_lived_total;
    // # of objects that survived all non-full collections
    // since the last full one
    Py_ssize_t long_lived_pending;
} GCState;

static Py_ssize_t gc_collect_main(PyThreadState *tstate,
                                  int generation,
                                  Py_ssize_t *n_collected,
                                  Py_ssize_t *n_uncollectable,
                                  int nofail) {
    // get runtime GC info
    GCState *gcstate = &tstate->interp->gc;

    /* ... */

    // update runtime info; see below.
    if (generation+1 < NUM_GENERATIONS)
        gcstate->generations[generation+1].count += 1;
    for (i = 0; i <= generation; i++)
        gcstate->generations[i].count = 0;

    // merge all younger generations into the collected one, and find the next older generation
    PyGC_Head *young, *old;
    for (i = 0; i < generation; i++)
        gc_list_merge(GEN_HEAD(gcstate, i), GEN_HEAD(gcstate, generation));
    young = GEN_HEAD(gcstate, generation);
    if (generation < NUM_GENERATIONS-1)
        old = GEN_HEAD(gcstate, generation+1);
    else  // already oldest
        old = young;

    PyGC_Head unreachable;
    deduce_unreachable(young, &unreachable);  // cycle detection on young gen

    untrack_tuples(young);          // untrack tuples; see below
    if (young != old)               // non-full collection
        gc_list_merge(young, old);  // move reachable objects to next gen
    else                            // full collection
        untrack_dicts(young);       // untrack dicts; see below

    // update runtime info; see below.
    if (young != old) {  // non-full collection
        if (generation == NUM_GENERATIONS - 2)
            gcstate->long_lived_pending += gc_list_size(young);
    } else {             // full collection
        gcstate->long_lived_pending = 0;
        gcstate->long_lived_total = gc_list_size(young);
    }

    /* ... */
}

The execution schedule is determined by runtime information stored in each generation. Whenever the criterion for a generation is met, GC is executed on that generation. The current criteria are:

  • Generation 0: allocation count exceeds 700.
  • Generation 1: GC execution count on generation 0 exceeds 10.
  • Generation 2 (a.k.a. full collection): GC execution count on generation 1 exceeds 10, and long_lived_pending / long_lived_total > 25%.

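These thresholds and counters are exposed at the Python level through the documented gc module, so the criteria above can be inspected directly:

```python
import gc

# Default thresholds match the criteria above: 700 allocations for
# generation 0, and 10 collections of the next-younger generation
# for generations 1 and 2.
thresholds = gc.get_threshold()
print(thresholds)  # (700, 10, 10)

# Current counters, reset whenever the corresponding generation
# (or an older one) is collected.
counts = gc.get_count()
print(counts)
```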
struct gc_generation {
    PyGC_Head head;
    int threshold;
    int count;      // generation 0: # of allocations
                    // generation n: # of collections of younger generation
};

void _PyGC_InitState(GCState *gcstate) {
    struct gc_generation generations[NUM_GENERATIONS] = {
        //                                        PyGC_Head,  threshold,  count
        // youngest
        {{(uintptr_t)_GEN_HEAD(0), (uintptr_t)_GEN_HEAD(0)},  700,        0},
        {{(uintptr_t)_GEN_HEAD(1), (uintptr_t)_GEN_HEAD(1)},  10,         0},
        {{(uintptr_t)_GEN_HEAD(2), (uintptr_t)_GEN_HEAD(2)},  10,         0},
        // oldest
    };

    /* ... */
}

// may be invoked during object allocation
static Py_ssize_t gc_collect_generations(PyThreadState *tstate) {
    GCState *gcstate = &tstate->interp->gc;
    Py_ssize_t n = 0;
    for (int i = NUM_GENERATIONS-1; i >= 0; i--) {
        // the oldest generation whose count exceeds its threshold
        if (gcstate->generations[i].count > gcstate->generations[i].threshold) {

            // prevents full collections from being executed too frequently; see below
            if (i == NUM_GENERATIONS - 1
                && gcstate->long_lived_pending < gcstate->long_lived_total / 4)
                continue;

            // thin wrapper of `gc_collect_main`
            n = gc_collect_with_callback(tstate, i);
            break;
        }
    }
    return n;
}

In CPython, the objects in each generation can be examined by calling gc.get_objects(generation=NUM), and GC can be triggered manually by calling gc.collect(generation=NUM).
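For example (assuming no automatic collection intervenes between the two statements), a newly created container starts in generation 0 and is promoted once it survives a collection:

```python
import gc

class Node:
    """A GC-tracked container type (hypothetical example class)."""

node = Node()              # newly created objects start in generation 0
gc.collect(generation=0)   # collect gen 0; node survives
# node should now appear in generation 1
promoted = any(obj is node for obj in gc.get_objects(generation=1))
print(promoted)  # True
```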

Collecting the Oldest Generation

It has been remarked that periodically performing a full collection may entail performance degradation (e.g., quadratic time complexity). The reason is that the cost of a full collection is proportional to the total number of long-lived objects, which is virtually unbounded.

CPython mitigates this issue by reducing the number of full collections, using an additional criterion that in practice yields amortized linear performance in the total number of objects.
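A toy simulation of this heuristic (illustrative names, not CPython's API) shows why full collections become rarer as the long-lived population grows:

```python
# Toy model of the full-collection criterion: a full collection is
# only allowed once objects that survived non-full collections
# ("pending") reach 25% of the long-lived population ("total").
def should_full_collect(pending: int, total: int) -> bool:
    return pending >= total / 4

total = pending = 0
full_collections = 0
for _ in range(100):        # simulate 100 generation-1 collections,
    pending += 1000         # each promoting 1000 survivors
    if should_full_collect(pending, total):
        full_collections += 1
        total += pending    # survivors join the long-lived set
        pending = 0
print(full_collections)     # far fewer than 100: fulls become rarer
```

Each full collection grows the long-lived total by at least 25%, so the gap between consecutive full collections grows geometrically.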

More details can be found in the proposal on the python-dev mailing list.

Delay Tracking Containers

Certain container instances cannot participate in a reference cycle (e.g., a tuple whose elements are all atomic objects such as integers), so untracking these objects reduces the cost of GC. However, determining which objects may be untracked is not free.
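Whether a given object is currently tracked can be checked from Python with gc.is_tracked:

```python
import gc

class Point:
    pass

print(gc.is_tracked(3.14))     # False: atomic types are never tracked
print(gc.is_tracked("text"))   # False
print(gc.is_tracked([1, 2]))   # True: containers are tracked
print(gc.is_tracked(Point()))  # True: user-defined class instances
```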

As a general rule, instances of atomic types aren't tracked, while instances of non-atomic types (containers, instances of user-defined classes, etc.) are. In addition, some type-specific optimizations are implemented:

Tuples

  • All tuples except the empty tuple are tracked when created.

  • A tuple can then be untracked during a collection if none of its contents is tracked. Note that it may take several collections to untrack a tuple.

    void _PyTuple_MaybeUntrack(PyObject *op) {
        PyTupleObject *t = (PyTupleObject *) op;
        Py_ssize_t i;
        for (i = 0; i < Py_SIZE(t); i++) {
            PyObject *elt = PyTuple_GET_ITEM(t, i);
    
            /* Tuples with NULL elements aren't fully constructed;
               don't untrack them yet. */
            if (!elt || _PyObject_GC_MAY_BE_TRACKED(elt)) return;
        }
        _PyObject_GC_UNTRACK(op);
    }
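This behavior is observable from Python: a tuple built at runtime starts tracked, and a collection untracks it once all of its elements are atomic:

```python
import gc

t = tuple([1, 2, 3])           # built at runtime, so tracked at creation
tracked_before = gc.is_tracked(t)
gc.collect()                   # _PyTuple_MaybeUntrack runs during collection
tracked_after = gc.is_tracked(t)
print(tracked_before, tracked_after)  # True False

u = ([],)                      # holds a tracked list
gc.collect()
print(gc.is_tracked(u))        # True: cannot be untracked
```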
    

Dictionaries

  • Empty dictionaries are untracked.

  • If a tracked item is inserted either as a key or value, the dictionary becomes tracked.

  • During a full collection, the collector untracks any dictionary whose contents are not tracked. Note that dictionaries can be untracked only during a full collection, in order to avoid quadratic build-up time.

    #define _PyDict_HasSplitTable(d) ((d)->ma_values != NULL)
    void _PyDict_MaybeUntrack(PyObject *op) {
        PyDictObject *mp = (PyDictObject *) op;
        PyDictKeyEntry *ep0;
        PyObject *value;
        Py_ssize_t i, numentries;
    
        ep0 = DK_ENTRIES(mp->ma_keys);
        numentries = mp->ma_keys->dk_nentries;
        if (_PyDict_HasSplitTable(mp)) {  // key in ma_keys and value in ma_values
            for (i = 0; i < numentries; i++) {
                if ((value = mp->ma_values[i]) == NULL)
                    continue;
                if (_PyObject_GC_MAY_BE_TRACKED(value)) {
                    assert(!_PyObject_GC_MAY_BE_TRACKED(ep0[i].me_key));
                    return;
                }
            }
        } else {  // key and value are in ma_keys
            for (i = 0; i < numentries; i++) {
                if ((value = ep0[i].me_value) == NULL)
                    continue;
                if (_PyObject_GC_MAY_BE_TRACKED(value) ||
                    _PyObject_GC_MAY_BE_TRACKED(ep0[i].me_key))
                    return;
            }
        }
        _PyObject_GC_UNTRACK(op);
    }
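These rules can likewise be verified with gc.is_tracked (as observed on CPython 3.10, the version this post targets):

```python
import gc

d = {}
empty_tracked = gc.is_tracked(d)    # False: empty dicts start untracked
d["answer"] = 42
atomic_tracked = gc.is_tracked(d)   # still False: atomic key and value
d["items"] = []
now_tracked = gc.is_tracked(d)      # True: a tracked value was inserted
del d["items"]
gc.collect()                        # full collection runs untrack_dicts
finally_tracked = gc.is_tracked(d)  # False again
print(empty_tracked, atomic_tracked, now_tracked, finally_tracked)
```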
    
