Ditch Those Built-in Arrays for C++ Containers

I previously introduced the C++ containers std::vector and std::array. You should ditch your standard built-in arrays for these container types.

I expect most readers of this article to be familiar with the basics of C-style arrays. I will only be providing a brief summary of built-in array features and pitfalls.

Table of Contents:

  1. Built-in Arrays
  2. Advantages of Using C++ Containers
  3. std::vector or std::array: Which one should I use?
  4. Multi-dimensional Arrays
  5. Further Reading

Built-in Arrays

There are a few ways to create built-in arrays. We could declare a static array of a specific size:

int x[256];

Or we could declare an array and size it to the initialized data:

const char str[] = 'This is a string";

We can also use a pointer and dynamically allocate memory with new or malloc:

int * arr = static_cast<int*>(malloc(256 * sizeof(int)));

Each of these forms allows us to access individual elements with the [] operator:

int t = x[28];
arr[0] = t;

You’ll note that this same syntax applies to all pointers, and this leads us to one of the primary problems with built-in arrays: they are just dressed up pointers. You must always keep these facts in mind when using built-in arrays:

  • When arrays are passed into functions, they decay into a simple pointer
  • Built-in arrays and pointers provide no bounds checking. The [] operator will allow you to write to any offset you want.
  • You must keep track of array size separately
    • sizeof() only returns the correct size for statically allocated arrays when used in the same context you allocated them in
    • Do you keep track of size in bytes or elements? Will you remember the correct interpretation throughout your entire call chain?
  • You must supply the size as an argument to functions which require array input
  • You must remember whether your array is statically or dynamically allocated
  • You must remember to free your dynamically allocated arrays

That’s a pretty significant list of overhead that we need to manage and keep in mind. Unfortunately, many of these details are easy to forget. Even minor mistakes, like indexing an array past its bounds, forgetting to free memory, or misinterpreting the output of sizeof(), can have catastrophic effects on your program. In their worst form, array usage bugs also result in serious security issues.

Since C does not provide templating, the idea of building an array manager or container structure is not terribly appealing: a new implementation is required for each array type you want to manage. Yuck!

Advantages of Using C++ Containers

C++ containers like std::array and std::vector become very appealing when viewed in the context of built-in array deficiencies. There are many benefits to using C++ containers over built-in arrays:

  • The container automatically keeps track of size for you.
    • The size is always specified as number of elements.
  • Containers don’t automatically decay into pointers
    • You can access the data pointer for interoperability with APIs expecting pointers
  • Containers give you the option of using bounds checking with data accesses (or not!)
  • You can utilize design patterns such as Scope Bound Resource Management (SBRM) to write cleaner functions and reduce the chance that you will forget to free dynamically allocated memory
  • Containers utilize a common interface
    • You can utilize any algorithm designed to work on containers

std::vector or std::array: Which one should I use?

You should utilize a std::vector container when:

  • The size of your array is not known in advance
  • You need the ability for your array to grow and shrink depending on runtime conditions
  • Your memory needs to be shared out-of-scope (e.g. moved between threads, returned up the function call chain)
  • Your size is known in advance but is too large to be placed in static storage.
    • You can use the reserve() function to allocate space just once, avoiding later reallocations during runtime.

You should utilize a std::array structure when:

  • The size of your array is known in advance
  • You want to skip heap allocation and use the stack or static global storage
  • You want to reduce memory fragmentation (no heap usage)
  • You care about the overhead storage requirements of std::vector (e.g. when creating a multi-dimensional container)
    • On my system with clang, this is currently 24 bytes

Note that if you want to share a std::array that isn’t in a global static scope, you will need to use a copy to preserve the memory. Once the original scope is exited, the std::array will be destructed.

Multi-dimensional Arrays

Built-in arrays have a usage advantage over std::vector and std::array when you need to utilize a multi-dimensional array.

Our built-in multi-dimensional array is very simple to create:

int marr[5][20];

This is much prettier than the equivalent std::array syntax:

std::array<std::array<int,20>,5> sarr;

And std::vector is even more unwieldy since we have to manually set the size. Two options for intializing a std::vector are shown below. Note that the first example, which allocates space in the constructor, is much faster than the second example and should be preferred.

//Option 1
std::vector<std::vector<int>  > varr(4, std::vector<int>(4));

//Option 2: Manually initialize and size
std::vector< std::vector<int> > varr;

for (int i = 0; i < 4; i++)
{
    //declare a new row
    std::vector<int> row;
    for (int j = 0; j < 4; j++)
    {
        // Add a column element
        row.push_back(0);
    }

    //Add the initialized row to the main vector
    varr.push_back(row);
}

The syntax of creating multi-dimensional containers is not the only disadvantage: don’t forget std::vector’s storage overhead! Each sub-vector will also allocate its own memory separately. Both of these factors mean that you can never rely on your multi-dimensional std::vector being packed, so be careful passing the underlying data pointer to an API expecting packed memory.

Due to std::vector storage overhead, note that a multi-dimensional std::vector of dimensions [5][1000] will be much smaller than one with dimensions [1000][5]. To demonstrate the difference using my system’s 24 bytes of overhead:

  • In the first case, you end up with 120 bytes of overhead.
  • In the second case, you end up with 24,000 bytes of overhead!

Unlike std::vector, a multi-dimensional std::array will have the elements packed in memory just as a built-in array would. Also, due to having no container storage overhead, a multi-dimensional std::array will always take up less space than a multi-dimensional std::vector.

Further Reading

Acknowledgements

Thanks to Patrice R. for sending me some very helpful notes and corrections!

Migrating from C to C++ Articles

3 Replies to “Ditch Those Built-in Arrays for C++ Containers”

  1. To pass std::array to a function, the function need to be a template function if the function intends to handle multiple length std::array. In the worst case, it might force to templatize all the functions called from the first function. There are few solutions, e.g. pass begin and end iterator insteasd of std::array<>&, but these approaches are not very elegant.

  2. I agree that passing std::arrays around quickly becomes annoying. Rather than the inelegant solutions that you mention, do you utilize a different construct instead? Or stick to raw pointers in that case?

  3. Usually I try to avoid raw pointers as this looks like c-style code. I try to avoid iterators as parameters (though <algorithms> uses iterators approach), but this approach clutters the function call with too many parameters. If array size is fixed through out the program life or if I need an array local to a function, I use std::array<size>. If the array size is variable through out the program life or need to pass acorss functions, I prefer using std::vector with reserve(size).

Share Your Thoughts

This site uses Akismet to reduce spam. Learn how your comment data is processed.