Adding Delimiters Between String Elements in C++

A system I’m working on involves collecting data and logging it to an SD card in a CSV-like file format. Each line is formatted with a start-of-line character (STX) and an end-of-line character (ETX), and fields within the message are separated by a delimiting character (DELIMITER, in this case a tab).

Formatted log lines are being manually constructed throughout the program in the following manner (and often more verbosely than this example):

std::string line = Logger::STX + loggerID + Logger::DELIMITER + systime::getTimestamp().data() + Logger::DELIMITER +
                   getCurrentStateLabel() + Logger::DELIMITER + ".eng file opened " + profileFileName + Logger::ETX;
Logger::logLine(VERBOSE, line);

Repeated manual operations like the above catch my attention: they should happen within a function. This way we can change the formatting in one place if needed, rather than in thousands of lines spread throughout a program.

Further reading
This is an example of information hiding, since we are hiding the assembly of formatted log strings behind an API. For more on this, and other related ideas, see our course Designing Embedded Software for Change.

When I first encountered this usage, I immediately thought that this was a job for a variadic template function. This way, I could specify each field and let the logger implementation perform the assembly.

Logger::logEngLine(VERBOSE, loggerID, systime::getTimestamp().data(), getCurrentStateLabel() , (std::string(".eng file opened") + profileFileName));

How naive I was! This turned out to be a much more challenging job than I expected. I went through several variations before I settled on one that I thought was workable. Let’s take a look at each in turn.

Table of Contents:

  1. The Basic Structure
  2. Option 1: Pack Expansion inside a Braced Initialization List
  3. Option 2: Fold Expression
  4. Option 3: Initializer List (the chosen route)
Note
If you have a potential solution that is better than what is shown below, or suggestions for improving how we handle this case, we’d love to hear from you. Please leave a comment below or send us an email. Compiler Explorer links are greatly appreciated!

The Basic Structure

For evaluating the different solutions, I created a basic skeleton structure in Compiler Explorer.

  • There is a logLine function which simply outputs the formatted line to std::cout (in the real program, this would be a file append operation).
    • This is what the templated function will invoke after the various fields have been properly assembled.
  • There is a missing logEngLine function which will take a variadic list of input arguments and create a properly formatted string, invoking logLine with the assembled output.
    • Output should be assembled with STX at the start of the line, followed by interleaved values and DELIMITER characters, with an ETX character at the end
    • There should be no extra DELIMITER between the final value and ETX
  • There is a main() function which invokes logEngLine with a few different variations of input parameters so I can see the overall impact of the solution on generated code output

Here’s the basic skeleton code: (Compiler Explorer)

#include <iostream>
#include <string>
#include <string_view>

constexpr char STX = '$';
constexpr char ETX = '\n';
constexpr char DELIMITER = '\t';

void logLine(std::string_view line)
{
   std::cout << line;
}

template<typename... Values>
void logEngLine(Values const&... values)
{
    std::string formatted_line;

    // TODO: logEngLine implementation

    logLine(formatted_line);
}

int main()
{
    std::string line_a{"This is line A"};
    std::string_view line_b{"This is line B"};
    logEngLine("this","is","a","string");
    logEngLine(line_a, line_b);
    logEngLine(line_b, line_a);

    return 0;
}

Option 1: Pack Expansion inside a Braced Initialization List

My first attempt, based on this StackOverflow answer, used pack expansion inside a braced initialization list of an array declaration. (Compiler Explorer)

template<typename... Values>
void logEngLine(Values const&... values)
{
    std::string formatted_line{STX};

    int unpacker[]{0, (formatted_line += values, formatted_line += DELIMITER, 0)...};
    static_cast<void>(unpacker);
    formatted_line += ETX;

    logLine(formatted_line);
}
Note
An ostringstream version of this was also prototyped.

Example output:

$this is  a   string  
$This is line A This is line B  
$This is line B This is line A  

This approach is clever as far as implementations go, but it has two significant downsides:

  1. I didn’t see a good way of eliminating the extra DELIMITER after the final value without going back and modifying the string to remove it.
  2. This generates a large amount of code for each variation of logEngLine.
    • We only have three variations of argument formats in the example, and that will pale in comparison to the number of variations found in the actual program.
    • I suppose there are more optimization opportunities available if I were able to convince the compiler to generate functions of homogeneous string_view parameters, but I did not put effort into that area.
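
For reference, the "go back and modify the string" workaround mentioned in the first downside could look roughly like the sketch below. It is not from the original prototype: formatEngLineTrimmed is a hypothetical name, the function returns the line instead of calling logLine so the result is easy to inspect, and DELIMITER is assumed to be a tab.

```cpp
#include <string>
#include <string_view>

constexpr char STX = '$';
constexpr char ETX = '\n';
constexpr char DELIMITER = '\t';  // assumption: the delimiter is a tab

// Hypothetical variant of Option 1 that trims the trailing delimiter.
template<typename... Values>
std::string formatEngLineTrimmed(Values const&... values)
{
    std::string formatted_line{STX};

    // Same array-initializer trick: append each value followed by a delimiter.
    int unpacker[]{0, (formatted_line += values, formatted_line += DELIMITER, 0)...};
    static_cast<void>(unpacker);

    // Go back and drop the final delimiter, but only if a value was appended.
    if constexpr (sizeof...(Values) > 0)
    {
        formatted_line.pop_back();
    }
    formatted_line += ETX;

    return formatted_line;
}
```

This fixes the formatting, but it does nothing for the second downside: every distinct combination of argument types still produces its own instantiation.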

Option 2: Fold Expression

Another StackOverflow answer inspired me to test out a version that used a fold expression. (Compiler Explorer)

template<typename... Values>
void logEngLine(Values const&... values)
{
    constexpr std::size_t num_inputs = sizeof...(Values);
    std::size_t input = 0;
    std::ostringstream formatted_line;  // requires #include <sstream>
    formatted_line << STX;

    ([&]{
        input++;
        formatted_line << values;
        formatted_line << (input == num_inputs ? ETX : DELIMITER);
    }
    (), ...);

    logLine(formatted_line.str());
}

Example output:

$this is  a   string
$This is line A This is line B
$This is line B This is line A

This version produces superior output in that we are able to control the addition of the last DELIMITER character, albeit somewhat inefficiently, since every value triggers a counter increment and a comparison. Still, it generates properly formatted log lines, which is something.

However, its downside is that it still generates a lot of code for each variation, which will not be acceptable in the final program.
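
As an aside, the per-value counter check can be avoided by peeling off the first argument and folding over the rest with the delimiter placed in front of each value. This is a common idiom rather than something taken from the prototypes above. The name formatEngLineFolded is hypothetical, the function returns the string for easy inspection, and DELIMITER is assumed to be a tab.

```cpp
#include <sstream>
#include <string>
#include <string_view>

constexpr char STX = '$';
constexpr char ETX = '\n';
constexpr char DELIMITER = '\t';  // assumption: the delimiter is a tab

// Hypothetical variant of Option 2: requires at least one value.
template<typename First, typename... Rest>
std::string formatEngLineFolded(First const& first, Rest const&... rest)
{
    std::ostringstream formatted_line;
    formatted_line << STX << first;

    // Unary fold over the comma operator: each remaining value is
    // written with a delimiter in front of it, so none trails the last one.
    ((formatted_line << DELIMITER << rest), ...);

    formatted_line << ETX;
    return formatted_line.str();
}
```

This is a sketch only. It tidies the delimiter handling, but it is still a template instantiated once per combination of argument types, so the code-size concern remains.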

Option 3: Initializer List

The best option I thought of was using std::initializer_list<std::string_view> as an input type. This would enable the use of one assembly function implementation that looped over the elements in the list to create a string. We could also use iterators to more easily control the addition of the final DELIMITER/ETX character.
Here’s the implementation: (Compiler Explorer)

void logLine(std::initializer_list<std::string_view> args)
{
    std::ostringstream formatted_line;
    formatted_line << STX;
    
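    // We stop one before the end to avoid the extra delimiter.
    // Note: this assumes args is non-empty; decrementing end() on an empty list is undefined behavior.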
    auto end_target = args.end();
    end_target--;
    for(auto value = args.begin(); value != end_target; value++)
    {
        formatted_line << *value << DELIMITER;
    }
    
    // Then we add the end and close out the string
    formatted_line << *end_target << ETX;

    logLine(formatted_line.str());
}
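
One thing to watch for in the loop above is an empty argument list, where stepping back from end() is not valid. A minimal sketch of an alternative loop that tolerates an empty list is shown below. The name formatFields is hypothetical, the function returns the string instead of logging it, and DELIMITER is assumed to be a tab.

```cpp
#include <initializer_list>
#include <sstream>
#include <string>
#include <string_view>

constexpr char STX = '$';
constexpr char ETX = '\n';
constexpr char DELIMITER = '\t';  // assumption: the delimiter is a tab

// Hypothetical helper: one non-template routine that joins all fields.
std::string formatFields(std::initializer_list<std::string_view> args)
{
    std::ostringstream formatted_line;
    formatted_line << STX;

    bool first = true;
    for (std::string_view value : args)
    {
        // Write the delimiter before every value except the first,
        // so an empty or single-element list needs no special case.
        if (!first)
        {
            formatted_line << DELIMITER;
        }
        formatted_line << value;
        first = false;
    }

    formatted_line << ETX;
    return formatted_line.str();
}
```

A call such as formatFields({"Arg1", "Arg2"}) joins the two fields with a single delimiter, and formatFields({}) yields just STX followed by ETX.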

One tradeoff here is that the arguments would need to be passed as an initializer list, like so:

Logger::logEngLine(VERBOSE, {"Arg1","Arg2", a_variable});

For my own purposes, this would work fine, but I could see it causing problems as other developers start using the Logger – especially those not as experienced with C++. To eliminate that headache, I opted to take a slightly non-optimal approach of creating a templated function that would assemble the various arguments into an initializer_list.

template<typename... Values>
void logEngLine (Values const&... values )
{
    logLine({values...});
}

Even with this inefficiency, the std::initializer_list variation generates about half as much code as the other versions, making it the best choice of the three. The optimizer also appears to elide the templated functions in this case, simply generating the initializer_list at the call site.

In general, I’m interested in other optimization ideas here. Log string formatting will be a non-trivial part of this system, and I feel as if I am not well versed enough in string interactions or templates to find the optimal solution here.

One concern I have is the overall variation in arguments passed to logEngLine. While the chosen solution appears to elide these functions with optimizations on, there is still a concern with debug builds that have no optimizations enabled. We could reduce the number of generated functions if we could somehow convince the compiler to generate ONLY functions with std::string_view arguments – it seemed the best I could manage in C++17 was to restrict the templated generation to types that are convertible to std::string_view, which does not achieve the optimization goal.
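
For illustration, restricting the template to types convertible to std::string_view in C++17 might look like the sketch below. The names all_string_like_v, joinFields, and formatEngLine are hypothetical, the functions return the string rather than logging it, and DELIMITER is assumed to be a tab. As noted above, this only rejects unsuitable types with a clearer error. It does not reduce the number of instantiations.

```cpp
#include <initializer_list>
#include <sstream>
#include <string>
#include <string_view>
#include <type_traits>

constexpr char STX = '$';
constexpr char ETX = '\n';
constexpr char DELIMITER = '\t';  // assumption: the delimiter is a tab

// Hypothetical non-template assembly routine shared by all call sites.
inline std::string joinFields(std::initializer_list<std::string_view> args)
{
    std::ostringstream out;
    out << STX;
    bool first = true;
    for (std::string_view value : args)
    {
        if (!first) { out << DELIMITER; }
        out << value;
        first = false;
    }
    out << ETX;
    return out.str();
}

// True when every type in the pack converts implicitly to std::string_view.
template<typename... Values>
constexpr bool all_string_like_v =
    std::conjunction_v<std::is_convertible<Values const&, std::string_view>...>;

// Thin wrapper: still instantiated per type combination, but it rejects
// arguments (such as int) that cannot be viewed as strings.
template<typename... Values>
std::string formatEngLine(Values const&... values)
{
    static_assert(all_string_like_v<Values...>,
                  "every argument must be convertible to std::string_view");
    return joinFields({values...});
}
```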
