Ode to a Flat Set

In my previous post I discussed the most important changes in hardware since the STL was created. These changes are the increasing cost of malloc and the increasing importance of caching considerations.

I’ll group the standard containers into contiguous containers (vector, string, and deque) and node-based containers (list, the *set and *map families, and the unordered_* containers). The implication is that we want to favor contiguous containers over node-based containers, both because they make fewer allocations per item and because they are more cache friendly.

This isn’t earth shaking news. Alex Stepanov, the STL’s creator, has been quoted as saying, “Use vectors whenever you can. If you cannot use vectors, redesign your solution so that you can use vectors.”

Using vector has always been good advice, and it is now even better advice. It has always been true that for small values of N, vector beats containers with better asymptotic complexity. Do we care about performance if N is small? If we don’t have much data, why worry about which container might be faster? It is true that a small data set may make the cost of any one operation insignificant, but we may still care about container performance. If our lookups are in a small data set, but we are doing them in a loop, performance is still a consideration. With modern hardware, the values of N for which contiguous containers can beat node-based containers are larger than ever. How large? Follow Andrei Alexandrescu’s advice: measure.

If your entire vector fits in the cache, it will be hard to beat by a node-based container such as list, set, or map and perhaps even for unordered containers (with their constant-time accesses).

Scott Meyers

That is, if your vector is sorted.

In Item 23 of Effective STL, Scott Meyers discussed replacing associative containers with sorted vectors. There are trade-offs, of course. If you are doing inserts, erases, and lookups in an unpredictable pattern, then sorted vectors look less attractive. They also look less attractive if the container holds a very large number of items. A large number of items makes cache misses likely even for vector and also favors the constant-time access of hash containers.

They look more attractive as the ratio of lookups to modifications gets larger or if you can segregate the container modification phase from the lookup phase. The ideal scenario for sorted vector is that you populate the vector, sort it once, then read it many times.

Lookups on a sorted vector use lower_bound, upper_bound, or equal_range, which have the same logarithmic complexity as associative container lookups. Although the complexity is the same, the sorted vector will outperform tree-based containers because the tree-based containers are dereferencing pointers, often getting cache misses, while the sorted vector is calculating offsets and getting cache hits.
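Here is a minimal sketch of the populate-sort-lookup pattern; the values are made up for illustration, but the calls are the standard algorithms named above.

    #include <algorithm>
    #include <iostream>
    #include <vector>

    int main() {
        // Populate, then sort once; after this the vector supports log-time lookups.
        std::vector<int> v{42, 7, 19, 3, 25};
        std::sort(v.begin(), v.end());

        // equal_range is a binary search over contiguous memory: the same
        // O(log N) comparisons as std::set::find, but no pointer chasing.
        auto range = std::equal_range(v.begin(), v.end(), 19);
        if (range.first != range.second) {
            std::cout << "found " << *range.first << '\n';
        }

        // lower_bound alone answers "where would this value go?"
        auto pos = std::lower_bound(v.begin(), v.end(), 20);
        std::cout << "20 would be inserted at index " << (pos - v.begin()) << '\n';
    }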

The story for populating the container is more nuanced. An arbitrary insert into a sorted vector is order N because the vector must move all the elements following the insertion point. The log order of associative container inserts is better. On the other hand, the associative container has to allocate a node, and vector can move a lot of elements in the time it takes to allocate a node.

Of course, the interface of vector isn’t the same as that of an associative container. In order for lookups to be log order, the vector must be sorted and it must stay sorted (or be re-sorted) in the face of modifications. Insertions must be preceded by a call to lower_bound to find the insertion point, as in the sketch below. A single lapse can create a hard-to-find bug.
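One way to keep the invariant is to route every insertion through a small helper; sorted_insert is a hypothetical name used here only for illustration.

    #include <algorithm>
    #include <vector>

    // Hypothetical helper: insert while preserving the sorted invariant.
    // Skipping this step anywhere in the code base silently breaks every
    // subsequent lower_bound/upper_bound/equal_range lookup.
    template <typename T>
    void sorted_insert(std::vector<T>& v, const T& value) {
        auto pos = std::lower_bound(v.begin(), v.end(), value);
        v.insert(pos, value);   // O(N): elements after pos are shifted right
    }

    int main() {
        std::vector<int> v;
        sorted_insert(v, 19);
        sorted_insert(v, 3);
        sorted_insert(v, 42);   // v is now {3, 19, 42}
    }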


Enter flat_set.

The Boost Container library has a family of flat_* containers that have associative container interfaces and semantics, but are implemented as sorted vectors.

In addition to faster lookup, the flat_* containers have much faster iteration, less memory overhead, fewer allocations, and improved cache performance.
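As a minimal sketch, flat_set is close to a drop-in replacement for std::set: only the header and the type name change, and you gain vector-style capacity control such as reserve.

    #include <boost/container/flat_set.hpp>
    #include <iostream>

    int main() {
        // Same interface and semantics as std::set, but backed by a sorted vector.
        boost::container::flat_set<int> s;
        s.reserve(3);        // vector-style capacity control, not available on std::set
        s.insert(42);
        s.insert(7);
        s.insert(19);

        // Lookups are binary searches over contiguous storage.
        if (s.find(19) != s.end()) {
            std::cout << "found 19\n";
        }

        // Iteration walks the underlying vector in sorted order.
        for (int x : s) {
            std::cout << x << ' ';
        }
        std::cout << '\n';
    }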

However, as with all things, there are trade-offs. Items in an associative container are allocated into their own nodes and are never moved or copied (and thus never throw while being moved or copied). Items in a flat_* container, by contrast, must be movable or copyable, and they may throw while being moved or copied during insertion or erasure. Further, insertion and erasure invalidate iterators and may be slower (particularly for non-movable types).

In short, the smaller your container and the closer your usage is to read-only, the better flat_* containers look as replacements for associative containers.

In a future post I’ll discuss an alternative to consider when the sorted vector’s limitations make it unattractive.

Please comment on Google Plus or reddit.