Ditch Those Built-in Arrays for C++ Containers

In June I introduced you to the C++ containers std::vector and std::array. You should ditch your standard built-in arrays for these container types.

I expect most readers of this article to be familiar with the basics of C-style arrays. I will only be providing a brief summary of built-in array features and pitfalls.

Built-in Arrays

There are a few ways to create built-in arrays. We could declare a static array of a specific size:

int x[256];

Or we could declare an array and size it to the initialized data:

const char str[] = 'This is a string";

We can also use a pointer and dynamically allocate memory with new or malloc:

int * arr = static_cast<int*>(malloc(256 * sizeof(int)));

Each of these forms allows us to access individual elements with the [] operator:

int t = x[28];
arr[0] = t;

You'll note that this same syntax applies to all pointers, and this leads us to one of the primary problems with built-in arrays: they are just dressed up pointers. You must always keep these facts in mind when using built-in arrays:

  • When arrays are passed into functions, they decay into a simple pointer
  • Built-in arrays and pointers provide no bounds checking. The [] operator will allow you to write to any offset you want.
  • You must keep track of array size separately
    • sizeof() only returns the correct size for statically allocated arrays when used in the same context you allocated them in
    • Do you keep track of size in bytes or elements? Will you remember the correct interpretation throughout your entire call chain?
  • You must supply the size as an argument to functions which require array input
  • You must remember whether your array is statically or dynamically allocated
  • You must remember to free your dynamically allocated arrays

That's a pretty significant list of overhead that we need to manage and keep in mind. Unfortunately, many of these details are easy to forget. Even minor mistakes, like indexing an array past its bounds, forgetting to free memory, or misinterpreting the output of sizeof(), can have catastrophic effects on your program. In their worst form, array usage bugs also result in serious security issues.

Since C does not provide templating, the idea of building an array manager or container structure is not terribly appealing: a new implementation is required for each array type you want to manage. Yuck!

Advantages of Using C++ Containers

C++ containers like std::array and std::vector become very appealing when viewed in the context of built-in array deficiencies. There are many benefits to using C++ containers over built-in arrays:

  • The container automatically keeps track of size for you.
    • The size is always specified as number of elements.
  • Containers don't automatically decay into pointers
    • You can access the data pointer for interoperability with APIs expecting pointers
  • Containers give you the option of using bounds checking with data accesses (or not!)
  • You can utilize design patterns such as Scope Bound Resource Management (SBRM) to write cleaner functions and reduce the chance that you will forget to free dynamically allocated memory
  • Containers utilize a common interface
    • You can utilize any algorithm designed to work on containers

std::vector or std::array: Which one should I use?

You should utilize a std::vector container when:

  • The size of your array is not known in advance
  • You need the ability for your array to grow and shrink depending on runtime conditions
  • Your memory needs to be shared out-of-scope (e.g. moved between threads, returned up the function call chain)
  • Your size is known in advance but is too large to be placed in static storage.
    • You can use the reserve() function to allocate space just once, avoiding later reallocations during runtime.

You should utilize a std::array structure when:

  • The size of your array is known in advance
  • You want to skip heap allocation and use the stack or static global storage
  • You want to reduce memory fragmentation (no heap usage)
  • You care about the overhead storage requirements of std::vector (e.g. when creating a multi-dimensional container)
    • On my system with clang, this is currently 24 bytes

Note that if you want to share a std::array that isn't in a global static scope, you will need to use a copy to preserve the memory. Once the original scope is exited, the std::array will be destructed.

Multi-dimensional Arrays

Built-in arrays have a usage advantage over std::vector and std::array when you need to utilize a multi-dimensional array.

Our built-in multi-dimensional array is very simple to create:

int marr[5][20];

This is much prettier than the equivalent std::array syntax:

std::array<std::array<int,5>,20> sarr;

And std::vector is even more unwieldy since we have to manually set the size. Two options for intializing a std::vector are shown below. Note that the first example, which allocates space in the constructor, is much faster than the second example and should be preferred.

//Option 1
std::vector<std::vector<int>  > varr(4, std::vector<int>(4));

//Option 2: Manually initialize and size
std::vector< std::vector<int> > varr;

for (int i = 0; i < 4; i++)
{
    //declare a new row
    std::vector<int> row;
    for (int j = 0; j < 4; j++)
    {
        // Add a column element
        row.push_back(0);
    }

    //Add the initialized row to the main vector
    varr.push_back(row);
}

The syntax of creating multi-dimensional containers is not the only disadvantage: don't forget std::vector's storage overhead! Each sub-vector will also allocate its own memory separately. Both of these factors mean that you can never rely on your multi-dimensional std::vector being packed, so be careful passing the underlying data pointer to an API expecting packed memory.

Due to std::vector storage overhead, note that a multi-dimensional std::vector of dimensions [5][1000] will be much smaller than one with dimensions[1000][5]. To demonstrate the difference using my system's 24 bytes of overhead:

  • In the first case, you end up with 120 bytes of overhead.
  • In the second case, you end up with 24,000 bytes of overhead!

Unlike std::vector, a multi-dimensional std::array will have the elements packed in memory just as a built-in array would. Also, due to having no container storage overhead, a multi-dimensional std::array will always take up less space than a multi-dimensional std::vector.

Further Reading

Acknowledgements

Thanks to Patrice R. for sending me some very helpful notes and corrections!