Choosing the Right Container: Associative Containers

I recently provided an overview of the C++ standard container types as well as high-level guidelines for selecting the right container. Today we will be taking a look at the tradeoffs of the STL associative containers.

In general, there are two types of associative containers:

  1. AssociativeContainer, which represents an ordered container utilizing key-based lookup
  2. UnorderedAssociativeContainer, which utilizes hashing for key-based lookup

The ordered associative containers are:

  • std::set, which implements a collection of unique keys, sorted by keys
  • std::map, which implements a collection of key-value pairs, sorted by unique keys
  • std::multiset, which implements a set that allows multiple entries each key
  • std::multimap, which implements a map that allows multiple entries for each key

The unordered associative containers are:

Selecting between Ordered and Unordered Associative Containers

One of the biggest questions I've had is "when should I use a map vs. a unordered_map"? There are certainly plenty of opinions around the internet, and I haven't achieved full clarity yet. However, I have distilled some general details and rules of thumb to help you decide between ordered and unordered associative containers.

If you are only worried about performance, I'll save you some time: it seems that unordered_map has better performance in general.

Ordered Containers

Each of the associative containers sorts keys upon insertion, allowing for O(log n) search complexity.

The ordered associative containers use a node-based allocation scheme. Iterators to all elements are stable, and in C++17 you can even move elements from one map to another without invalidating their iterators. Node-based memory allocation keeps ordered container function timings more consistent, since they never need to make large allocations.

By using an ordered container, you are able to implement efficient range searches or iterate over subsets (e.g. operate on all elements where key > 100).

If you want to use a user-defined object for a key, you'll need to define a less-than operator or implement a new comparison function. This can be much simpler than implementing a hashing function for your custom data type.

You should use an ordered associative container if:

  • You prefer a tree structure
  • You don't want the memory overhead of storing the hash table
  • Ordering or ordered traversal of your data set matters
  • A good hash of your key data is not possible or is to slow
  • You prefer to write a comparison operator for your custom data type instead of a hashing operator
  • You need guaranteed performance (e.g. embedded systems)
    • Unordered containers have O(n) worst-case complexity in collision cases

Unordered Containers

Each of the unordered associative containers uses hashed data structures that can be searched in amortized constant time (O(1) amortized). This means that most operations take constant time, though collisions can increase the operational time to O(n) in the worst case.

Unordered containers require additional memory to store the hash table.

If you want to use a user-defined object for a key, you'll need to implement a definition of std::hash<T> for your key type T and override the == operator.

You should use an unordered associative container if:

  • You have the memory for a hash table
  • Can absorb the occasional long operation
  • Have a good hashing function to reduce collisions
  • You are using string data as a key

Differentiating the Containers

Most differences between the various types of associative containers are connected to the OrderedAssociative and UnorderedAssociative container concepts. Since we've already reviewed these two major categories, let's briefly review the subtypes: set, multiset, map, and multimap. The notes below apply to both ordered and unordered associative containers.

The map containers support key-value pairs. If you're just looking for a simple lookup table implementation (or a Perl-like associative array), the map subtype is the container for you. The map containers support unique keys: writing to a key which already exists in the structure will overwrite the stored value.

The set containers support key-only entries. Like map, set only supports unique keys, though writing to a key which already exists is a wasted operation. A set is useful for storing a collection of unique things, such as "topics we are subscribed to" or "types of food that our restaurant sells". If you need to store a value with your key, you need a map instead.

The multimap and multiset containers operate the same as map and set but relax the requirement that keys must be unique.

The multimap containers are useful for grouping objects that are related together by key. For example:

  • ZIP code for the key, and people who reside at that ZIP code as the value
  • Word as the key, and definition(s) as the value

A multiset is similar in concept to a key with an associated integer count. You can get the count for each time that an entry appears in your multiset.

In general, you will stick to the map and set types in most cases.

Further Reading