The Problems with Global Variables

We try to avoid issuing rules and decrees when giving programming advice. There are tradeoffs to every decision, and sometimes the tradeoffs one needs to make completely invalidate general advice. Still, we cannot ignore the collective experience of our entire industry across many decades. What makes us any wiser than those who came before us?

One hill that we will proudly stand and die on is that global variables should be avoided unless absolutely necessary, and their use is rarely absolutely necessary. Instead, we often find that developers introduce global variables and global state into a program out of convenience, time pressure, or ignorance of other possibilities.

In this entry, we will discuss the problems with global variables, alternative approaches for sharing data, refactoring techniques, and guidelines for whenever global variables must be used.

Table of Contents:

  1. The Problems with Global Variables
  2. Alternatives to Global Variables
  3. Refactoring Techniques
  4. Rules for When Globals Must Be Used
  5. References

The Problems with Global Variables

Global variables are subject to a number of problems:

  1. Violates Information Hiding
  2. Negates the Open-Closed Principle
  3. Introduces Implicit Coupling and Side Effects
  4. Difficult to Trace
  5. Shadowing
  6. Aliasing
  7. Race Conditions
  8. Global Initialization Ordering Problems
  9. Comments Don’t Help
  10. Testing Difficulties

Violates Information Hiding

The use of global variables indicates poor modularity. It also violates the Information Hiding principle. The implementation of the global data is available to the rest of the program – there is no easy way to change how data is stored or encoded without changing every single reference to that variable throughout the program. Should a redesign happen in the future, changes will cascade throughout the program.

Negates the Open-Closed Principle

The use of global variables negates the possibility of applying the Open-Closed Principle (OCP). Any component that depends upon a global variable cannot truly be closed to modifications – a change in one component that causes the variable to be used in an unexpected way will break other components using the variable.

Introduces Implicit Coupling and Side Effects

Functions and modules that share global data are tightly coupled together. However, this type of coupling is easy to miss, because it is not reflected in the software design, and it is not made visible except through source code analysis and searching for names. Essentially, you are introducing hidden dependencies between functions and modules.

Often, the functions and modules that are coupled together with global data have no conceptual reason to be – the data could easily be shared through another mechanism, such as a function parameter or callback.
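As a minimal sketch of the difference, the two functions below compute the same result. The names `g_offset`, `scale_with_global`, and `scale_with_param` are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Hypothetical global that any part of the program may modify. */
int32_t g_offset = 0;

/* Hidden dependency: the signature gives no hint that the result
 * depends on g_offset. */
int32_t scale_with_global(int32_t raw)
{
    return raw * 2 + g_offset;
}

/* Explicit dependency: everything the function needs appears in
 * its parameter list. */
int32_t scale_with_param(int32_t raw, int32_t offset)
{
    return raw * 2 + offset;
}
```

A reader of a call to `scale_with_param` can see every input at the call site, while a call to `scale_with_global` can return different results for the same argument depending on what other code has done to `g_offset`.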

Difficult to Trace

Since the system does not reflect good encapsulation, it can be difficult to figure out what exactly is happening in the system. Additionally, there are no access controls for global variables, meaning that they can be modified anywhere – it is not clear who has the responsibility for changing the variable.

The end result of these factors is that the consequences of a change cannot be determined at the site of the change – instead, all uses of the variable must be analyzed to determine the effect of the change. Side effects can also arise from two different parts of the program modifying the variable in an unanticipated sequence, because the distributed nature of the variable makes it difficult to properly reason about its use. You can certainly perform an analysis of where each global variable is written and read, but this is often time consuming and error prone. In addition, aliasing (discussed below) makes it likely that simple searches are incorrect. Any change to a global variable will invalidate this analysis and require it be redone.

The practical effect is that developers make a local change in one area, which introduces a bug in another section of the code. The complexity of the system increases the likelihood of errors and the difficulty in tracking them down – the variable can be read and written to all over the system. Eventually, developers stop making changes to the software out of fear of breaking it, drastically increasing the aging effect.

Shadowing

It is possible to enter a situation where you have “global variable shadowing”, which happens when you declare a local variable with the same name as a global variable. This can cause unexpected behavior, as you may think you’re modifying the global variable when instead you’re working with the local variable. Thankfully, modern compilers can generate a warning for this, but we have worked on projects where this warning isn’t enabled and shadowing occurred without anyone realizing it, resulting in erroneous behavior.
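The following sketch shows how shadowing can silently redirect a write. The names `error_count`, `record_error_shadowed`, and `record_error` are hypothetical. With GCC and Clang, the `-Wshadow` option reports the inner declaration.

```c
/* Hypothetical global error counter. */
int error_count = 0;

/* The local declaration shadows the global, so the increment
 * is applied to the local copy and is lost on return. */
void record_error_shadowed(void)
{
    int error_count = 0; /* shadows the global of the same name */
    error_count++;       /* modifies the local only */
    (void)error_count;   /* silence unused-value warnings */
}

/* Without the local declaration, the global is updated as intended. */
void record_error(void)
{
    error_count++;
}
```

After a call to `record_error_shadowed`, the global `error_count` is unchanged, even though the function body reads as if it recorded an error.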

Aliasing

There is no way to prevent someone from making an alias to a global variable, such as passing a pointer to a global variable as a function input. If global variables are considered harmful, pointers to global variables are pure evil. Aliasing makes it difficult to track what is actually happening to a variable – you may search for all instances where a global variable is written to, completely missing the fact that there are functions that modify the variable through a pointer. This makes it even harder to track down modifications to a global variable in a program.
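A small sketch of the search problem, using hypothetical names (`g_speed_limit`, `clear_value`, `reset_limits`): the only write to the global happens inside a function that never mentions it by name.

```c
#include <stdint.h>

/* Hypothetical global. */
uint32_t g_speed_limit = 100;

/* This function writes through its pointer argument. Nothing in
 * its body refers to g_speed_limit. */
void clear_value(uint32_t *value)
{
    *value = 0;
}

/* The global escapes here as a pointer. A search for assignments
 * such as "g_speed_limit =" finds no write in this file. */
void reset_limits(void)
{
    clear_value(&g_speed_limit);
}
```

To find this modification, you would need to search for every place the address of the global is taken and then follow each pointer through the call graph.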

Race Conditions

In multi-threaded systems, global variables introduce race conditions. Synchronization methods (such as locks and critical sections) become a requirement, but they are too often overlooked. When global variable access is distributed throughout a system (rather than contained in a single module), it can be easy to improperly protect a variable or miss a location where the variable is updated – there is no way to guarantee that synchronization will be used at every usage site other than developer discipline (which can be fickle).

The mere presence of global variables in sufficient numbers can effectively limit a program to being single-threaded due to the work required to properly protect reads and writes of each variable.
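To make the hazard concrete, here is a deterministic, single-threaded simulation of a lost update. It is not real concurrency: the "preemption" is forced by calling a stand-in function between the read and the write of an unprotected increment. All names (`g_event_count`, `simulated_isr`, `increment_with_preemption`) are hypothetical.

```c
#include <stdint.h>

/* Hypothetical global counter shared by two contexts. */
uint32_t g_event_count = 0;

/* Stand-in for an ISR or a second thread that also increments
 * the counter. */
void simulated_isr(void)
{
    g_event_count++;
}

/* An unprotected increment is a read-modify-write sequence. Here
 * the sequence is written out step by step, and the other context
 * is made to run in the middle of it. */
void increment_with_preemption(void)
{
    uint32_t tmp = g_event_count; /* read */
    simulated_isr();              /* other context runs here */
    tmp = tmp + 1u;               /* modify the stale copy */
    g_event_count = tmp;          /* write: the ISR's update is lost */
}
```

Two increments were performed, but the counter only advances by one. In a real system the interleaving depends on timing, which is why this class of bug tends to appear rarely and is hard to reproduce.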

Global Initialization Ordering Problems

In C++, the initialization order of global and file static variables is a common source of problems in a system. The standard states that global variables within a single source file are created and initialized in the order in which they appear in the file. However, there is no guarantee for global ordering across source files. When a global in one file references a global in another file, the initialization order may not be as expected, resulting in a crash. These crashes can change depending on how the object files are linked in the build, leading to spurious problems that appear and disappear over time.

Comments Don’t Help

Some argue that the downsides to global variables can be reduced through judicious commenting explaining the intent and operations of a variable. However, these types of comments do not resolve the problems with globals. Due to their nature, you may even realize that explanations about what a variable represents and how it is used and accessed are completely wrong. This can often arise due to the complexity introduced by reading/writing the variable in multiple locations – for example, subtle race conditions may exist in the execution due to the nature of the generated code, but may not be noticed by any developers reading the code. Comments also do not resolve the problem because there is always the possibility for the information to become outdated due to the distributed nature of the variable.

Testing Difficulties

Global data makes testing difficult. In many cases, it means that individual components of the system cannot be tested in a standalone manner. Due to the coupling introduced by global variables, an entire subsystem or system must be put under test.

Global variables also make it difficult to restore the state of the system under test between test cases. Sometimes, the only way to cleanly reset state is to reset the entire system, lengthening the testing process and making it more brittle.
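The sketch below illustrates the state-restoration problem with a hypothetical running-average function written two ways. The names `g_sum`, `g_samples`, `global_average`, `struct averager`, and `averager_update` are assumptions made for the example.

```c
/* Version 1: state lives in globals, so whatever one test case
 * leaves behind is visible to the next one. */
int g_sum = 0;
int g_samples = 0;

int global_average(int sample)
{
    g_sum += sample;
    g_samples++;
    return g_sum / g_samples;
}

/* Version 2: state lives in a caller-owned structure, so each
 * test case can start from a fresh instance. */
struct averager {
    int sum;
    int samples;
};

int averager_update(struct averager *a, int sample)
{
    a->sum += sample;
    a->samples++;
    return a->sum / a->samples;
}
```

With the global version, a test that feeds in 20 after an earlier test fed in 10 sees an average of 15 unless something resets `g_sum` and `g_samples` in between. With the structure version, each test creates its own zero-initialized `struct averager` and the results do not depend on test order.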

Alternatives to Global Variables

Those who are accustomed to sharing data through global variables may not know what other tools are at their disposal. Whenever you have the temptation to reach for global data, instead consider one of the following options.

  1. Reduce Scope
  2. Access Data Through Interfaces
  3. Message Passing
  4. Callbacks

Reduce Scope

The simplest course of action is to reduce the scope of a global variable as much as possible. By default, you should use the smallest practical scope for variables and functions:

  • Prefer to work with local data or data that is passed by parameter whenever possible
  • Work with function local static variables (they cannot be accessed outside of the function)
  • Prefer to work with file static variables (they cannot be accessed outside of the source file)

Analyzed in the opposite direction, you can:

  • Change global variables to file static variables
  • Change file static variables to local static variables

Code can be restructured to pass data through function parameters instead of sharing data through global variables. If you need persistence within a function, you can use a local static variable.
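As a minimal sketch of both ideas, the first function below keeps its persistent state in a local static variable, and the second receives all of its data by parameter. The function names are hypothetical.

```c
#include <stdint.h>

/* Persistence within a single function: a local static variable
 * keeps its value between calls but cannot be named anywhere else. */
uint32_t next_sequence_number(void)
{
    static uint32_t sequence = 0;
    return sequence++;
}

/* Data passed by parameter: the function works on whatever buffer
 * the caller provides instead of reaching for a shared global. */
uint32_t checksum(const uint8_t *data, uint32_t length)
{
    uint32_t sum = 0;
    for (uint32_t i = 0; i < length; i++) {
        sum += data[i];
    }
    return sum;
}
```

Neither function exposes a name that other parts of the program could read or write directly.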

Note

Global variable scope can often be reduced by rethinking how you organize your source code. See the Refactoring Techniques section for more information.

Access Data Through Interfaces

Ideally, global data should not be written to or read from directly. Instead, you can move the data into a module and make it available only through an API. Aside from reducing scope, this ensures that users of the data can only perform intended operations on the variable. This eliminates the potential to set an erroneous arbitrary value that has a negative impact on the program. Another benefit to localized control is that it becomes much easier to guarantee that the variable is properly protected in multi-threaded situations.
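A minimal sketch of this approach, assuming a hypothetical brightness setting with a valid range of 0 to 100 percent. The names `s_brightness`, `brightness_set`, and `brightness_get` are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

#define BRIGHTNESS_MAX 100u

/* The data is private to this source file. */
static uint8_t s_brightness = 50;

/* The setter is the only way to change the value, so it can
 * reject anything outside the intended range. */
bool brightness_set(uint8_t percent)
{
    if (percent > BRIGHTNESS_MAX) {
        return false;
    }
    s_brightness = percent;
    return true;
}

uint8_t brightness_get(void)
{
    return s_brightness;
}
```

Because every access goes through these two functions, adding a lock or critical section later would mean changing this one file rather than every usage site.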

Note

See the Refactoring Techniques section for more information.

Message Passing

Often, embedded systems software employs global variables to share information across contexts, such as reading data in an ISR and making it available in the main loop or passing data between multiple threads. A queue can be used instead (e.g., to pass messages, events, or new data values). Such queues are provided by many RTOSes, but they can also be created out of circular buffers. The scope of the queue can be reduced so that it is only available to the necessary producer and consumer(s), whether it is through a file static variable or made available by passing the queue handle into a module as a function parameter.
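The following is a sketch of a queue built on a circular buffer, assuming a single producer and a single consumer. The type and function names (`struct sample_queue`, `sample_queue_push`, `sample_queue_pop`) are hypothetical. The sketch does not address memory ordering or atomicity of the index updates, which depend on the target platform, so treat it as an illustration of the structure rather than a drop-in implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One slot is kept empty to tell "full" apart from "empty",
 * so the queue holds QUEUE_CAPACITY - 1 items. */
#define QUEUE_CAPACITY 8u

struct sample_queue {
    uint16_t items[QUEUE_CAPACITY];
    volatile size_t head; /* next slot to write; producer only */
    volatile size_t tail; /* next slot to read; consumer only */
};

/* Producer side (e.g., called from an ISR). Returns false if full. */
bool sample_queue_push(struct sample_queue *q, uint16_t sample)
{
    size_t next = (q->head + 1u) % QUEUE_CAPACITY;
    if (next == q->tail) {
        return false;
    }
    q->items[q->head] = sample;
    q->head = next;
    return true;
}

/* Consumer side (e.g., called from the main loop). Returns false
 * if there is nothing to read. */
bool sample_queue_pop(struct sample_queue *q, uint16_t *out)
{
    if (q->tail == q->head) {
        return false;
    }
    *out = q->items[q->tail];
    q->tail = (q->tail + 1u) % QUEUE_CAPACITY;
    return true;
}
```

The queue instance itself can be a file static variable or can be handed to the producer and consumer as a parameter, so no other part of the program can touch the data in flight.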

Further reading

For more on this topic, see the Message Passing entry.

Callbacks

Another way to share a new data value with interested clients is through the use of callbacks (or, more generally, the Observer pattern). Whenever new data is available, the producer can share that data by invoking all registered callbacks and passing the new data by parameter.
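A minimal sketch of callback registration, assuming a hypothetical temperature producer with a fixed-size listener table. All names (`temperature_cb`, `temperature_register`, `temperature_publish`, `store_latest`) are invented for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LISTENERS 4u

/* Callback signature: the new value plus an opaque pointer that
 * is handed back to the client unchanged. */
typedef void (*temperature_cb)(int16_t celsius, void *context);

struct listener {
    temperature_cb cb;
    void *context;
};

/* The listener table is private to the producer module. */
static struct listener s_listeners[MAX_LISTENERS];
static size_t s_listener_count = 0;

/* Returns false if the callback is NULL or the table is full. */
bool temperature_register(temperature_cb cb, void *context)
{
    if (cb == NULL || s_listener_count >= MAX_LISTENERS) {
        return false;
    }
    s_listeners[s_listener_count].cb = cb;
    s_listeners[s_listener_count].context = context;
    s_listener_count++;
    return true;
}

/* Called by the producer whenever a new reading is available. */
void temperature_publish(int16_t celsius)
{
    for (size_t i = 0; i < s_listener_count; i++) {
        s_listeners[i].cb(celsius, s_listeners[i].context);
    }
}

/* Example client callback: stores the value in client-owned
 * storage supplied through the context pointer. */
void store_latest(int16_t celsius, void *context)
{
    *(int16_t *)context = celsius;
}
```

A client registers with `temperature_register(store_latest, &latest)`, where `latest` is an `int16_t` that the client owns. The producer never needs to know who is listening, and clients never read the producer's data directly.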

Refactoring Techniques

If you are trying to untangle a design that relies heavily on global data, the following techniques can be employed:

  • Refactoring Global Data to Support Multiple Instances describes a refactoring approach that can address a common global data challenge, which is the need to migrate from supporting a single instance of something to supporting multiple instances. The same technique can be used to localize global data – first, encapsulate the data within a structure, then reduce the scope of that structure (whether within a module, or passing information through function parameters).
  • Better Embedded System SW: Getting Rid of Global Variables describes the process for converting a global variable and the associated functions that use it into a single module. This is preferable for a number of reasons – it localizes access (making changes and debugging easier), hides the implementation, and we can actually enable multi-threading by being sure we’ve locked properly.
  • Reorganize source code based on data access
    • Shared global data can be reduced in scope by rethinking how your source code is organized: arrange your source files based on access to data. Functions that modify the same global data can be grouped within one module. The global variable can then be reduced in scope to a file static variable serving that group of functions.
    • To use an example given by Phil Koopman, consider a time-of-day variable that is updated by an ISR. You can create a TimeOfDay.c file which contains the following:
      • A file static variable timeOfDay
      • The timer ISR function implementation
      • An API to read the time of day (returns the value of timeOfDay).
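The TimeOfDay.c layout described above might look like the sketch below. Only the variable name `timeOfDay` comes from the description. The function names `TimerISR` and `TimeOfDay_Get`, the choice of seconds since midnight as the unit, and the one-tick-per-second rate are assumptions. How the ISR is attached to a hardware timer is platform-specific and is omitted.

```c
/* TimeOfDay.c */
#include <stdint.h>

#define SECONDS_PER_DAY 86400u

/* File static: no other source file can name this variable.
 * Assumed to hold seconds since midnight. */
static volatile uint32_t timeOfDay = 0;

/* Timer ISR, assumed to fire once per second. This is the only
 * writer of timeOfDay. */
void TimerISR(void)
{
    timeOfDay = (timeOfDay + 1u) % SECONDS_PER_DAY;
}

/* Read-only API for the rest of the program. This assumes a
 * 32-bit read is atomic on the target. If it is not, the read
 * would need to be protected, e.g., by briefly disabling the
 * timer interrupt. */
uint32_t TimeOfDay_Get(void)
{
    return timeOfDay;
}
```

With this structure, any question about who writes the time of day has a one-file answer, and any protection the read requires is implemented in exactly one place.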

Rules for When Globals Must Be Used

Ideally, no variables in a system will be global (special globals such as mathematical constants and compile-time configuration information excepted). However, if global variables are used, we suggest the following rules:

  • Each global variable or category of global variables should be justified as required for effective software construction, and the rationale for their use should be documented for other developers.
    • Efficiency is not a reason for using global variables (e.g., “accessing this data through a function introduces overhead” is rarely a sufficient justification). If this is truly what is bottlenecking your system, get a faster processor.
  • In a multi-tasking system, each global should be declared volatile and protected in some way (e.g., by a mutex, within a critical section).

When global data is used for a configuration value, mathematical constant, or other value that does not change at runtime, ensure:

  • The variable is declared using the const keyword to prevent modifications
  • Visibility is limited to need-to-know, rather than making it visible to the entire system in a generic globals.h type file.
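As a small sketch of both points, the constant below is declared `const` and is also `static`, so it is visible only to the source file that needs it. The names `MOTOR_MAX_RPM` and `motor_clamp_rpm` and the value 6000 are hypothetical.

```c
#include <stdint.h>

/* const prevents modification; static limits visibility to this
 * source file rather than the whole program. */
static const uint16_t MOTOR_MAX_RPM = 6000u;

/* The only code that needs the limit lives in the same file. */
uint16_t motor_clamp_rpm(uint16_t requested)
{
    return (requested > MOTOR_MAX_RPM) ? MOTOR_MAX_RPM : requested;
}
```

An assignment such as `MOTOR_MAX_RPM = 7000u;` is rejected by the compiler, and code in other source files cannot refer to the constant at all.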

References

  • Global Variables Considered Harmful by Wulf and Shaw

    We claim that the non-local variable is a major contributing factor in programs which are difficult to understand. […] Roughly, however, we mean any variable which is accessed, and particularly modified, over a relatively large span of a program text.

    We must admit certain limitations on our intellectual powers – in particular, that we are better able to cope with static relations among objects than dynamically evolving ones. Since the text of a program is static but its execution is dynamic, anything which destroys or obscures the mapping between textual relations and execution relations magnifies the difficulty in doing that which we are least equipped to do.

  • The Art of Readable Code by Dustin Boswell and Trevor Foucher

    We’ve all heard the advice to “avoid global variables.” This is good advice, because it’s hard to keep track of where and how all those global variables are being used. And by “polluting the namespace” (putting a bunch of names there that might conflict with your local variables), code might accidentally modify a global variable when it intended to use a local variable, or vice versa.

  • Patterns in the Machine : A Software Engineering Guide to Embedded Development by John Taylor and Wayne Taylor

    Here is a partial list in no particular order of reasons why the global variable is the bête noir of programming:

    • Non-CONST global variables are dangerous because their values can be changed at any time and there is no easy way for the developer to know when, how, or if this will happen.
    • Global variables are not inherently thread safe.
    • Clients have to spend time searching for all the places a global variable is referenced.
    • Global variables can bring in hidden dependencies and make testing code in any predictable fashion extremely difficult.
    • No module that depends upon a global variable can be closed against any other module that might write to that variable.
    • Global variables pollute the standard namespace.

  • Better Embedded System SW: Global Variables Are Evil sample chapter by Phil Koopman

    The problem with using globals is that different parts of the software are coupled in ways that increase complexity and can lead to subtle bugs.

    Pointers to globals are even more evil.

    The main problem with using global variables is that they create implicit couplings among various pieces of the program (various routines might set or modify a variable, while several more routines might read it). Those couplings are not well represented in the software design, and are not explicitly represented in the implementation language. This type of opaque data coupling among modules results in difficult to find and hard to understand bugs.

  • Global Variables are Evil! Lecture Slides by Phil Koopman

    Use of globals indicates poor modularity

  • Better Embedded System SW: Getting Rid of Global Variables by Phil Koopman

  • Better Embedded System SW: Minimize Use of Global Variables by Phil Koopman

    Critical embedded software should use the minimum practicable variable scope for each variable, and should minimize use of global variables.

    Over-use of globals can reasonably be expected to result in significantly increased defect rates and the presence of difficult-to-find software defects that are likely to render a system unsafe.

    Consequences: Using too many global variables increases software complexity and can be reasonably expected to increase the number of bugs, as well as make the code difficult to maintain properly. Defining variables as globals that could instead be defined as locals can be reasonably expected to significantly increase the risk of the data being improperly used globally, as well as to make it more difficult to track and analyze the few variables that should legitimately be global. In short, excessive use of globals leads to an increased number of software defects.

    A significant minority of variables (or fewer) should be global. Ideally zero variables should be global. (Special globals such as mathematical constants and configuration information might be excluded from this metric.) The exact number varies with the system, but an expected range would be from less than 1% to perhaps 10% of all statically allocated variables (even this is probably too high), with an extremely strong preference for the lower side of that range. Exceeding this range can reasonably be expected to lead to an increase in software defects.

    The need for each global or category of globals should be specifically justified as required for effective software construction. Speed increases are generally not sufficient justification, nor is limited memory space.

    In any system with more than one task (including systems that just have a main task plus interrupts), every non-constant global should be declared volatile, and every access to a global should be protected by a form of concurrency management (e.g., disabling interrupts or using a mutex). Failing to do either can normally be expected to result in concurrency defects somewhere in your code.

    Each variable should be declared locally if possible. Variables used by a collection of functions in the same C programming language module should be declared as top-level “static” within the corresponding .c file to limit visibility to functions declared within that .c file. Variables used by only one function should be declared within that function and thus not be visible to any other function.

    One reason to avoid globals is that use of many globals can be reasonably expected to lead to high coupling among many disparate portions of a program. Any variable that is globally visible might be read or written from anywhere in the code, increasing program complexity and thus the chance for software defects. While analysis might be performed to determine where globals actually are read and written, doing so for a large number of globals is time consuming, and must be re-done any time any substantive change is made to the code. (For example, it would be expected that such analysis would have to be re-done for each software release.)

    Another reason to avoid globals is that they introduce concurrency hazards (see the discussion on concurrency in an upcoming post). Because it is difficult to keep track of what parts of the program are reading and writing a global, safe code must assume that other tasks can access the global and use full concurrency protection each time a global is referenced. This includes both locking access to the global when making a change and declaring the global “volatile” to ensure any changes propagate throughout the software.

    Even if concurrency hazards are generally avoided, if more than one place in the program modifies a global it is easy to have unexpected software behavior due to two portions of the program modifying the globals’ value in an unanticipated sequence. This can be reasonably expected to lead to infrequent and subtle (but potentially severe) concurrency defects.

    Using too many globals can be thought of as the data flow version of spaghetti code. With code “control” flow (conditional “if” statements and the like) that is tangled, it is difficult to follow the flow of control of the software. Similarly, with too many globals it is difficult to follow the flow of data through the program – you get “spaghetti data.” In both cases (tangled data flow and tangled control flow) designers can reasonably be expected that the spaghetti software will have elevated levels of software defects. Excessive use of global variables makes unit testing difficult, because it requires identifying and setting specific values in all the globals referenced by a module that is being unit tested.

  • “The Open-Closed Principle” by Robert Martin

    No Global Variables — Ever.

    The argument against global variables is similar to the argument against public member variables. No module that depends upon a global variable can be closed against any other module that might write to that variable. Any module that uses the variable in a way that the other modules don’t expect, will break those other modules. It is too risky to have many modules be subject to the whim of one badly behaved one.

    On the other hand, in cases where a global variable has very few dependents, or cannot be used in an inconsistent way, they do little harm. The designer must assess how much closure is sacrificed to a global and determine if the convenience offered by the global is worth the cost.

    Again, there are issues of style that come into play. The alternatives to using globals are usually very inexpensive. In those cases it is bad style to use a technique that risks even a tiny amount of closure over one that does not carry such a risk. However, there are cases where the convenience of a global is significant. The global variables cout and cin are common examples. In such cases, if the open-closed principle is not violated, then the convenience may be worth the style violation.

  • C++: The Case Against Global Variables

  • C2 Wiki: Global Variables Are Bad

  • Code Complete by Steve McConnell, cited by Koopman in Minimize Use of Global Variables

    McConnell says: “Global data is like a love letter between routines – it might go where you want it to go, or it might get lost in the mail.” (McConnell 1993, pg. 88). McConnell also says: “global-data coupling is undesirable because the connection between routines is neither intimate nor visible. The connection is so easy to miss that you could refer to it as information hiding’s evil cousin – ‘information losing.’” (McConnell 1993, pg. 90).

  • [Solved] When are global variables actually considered good/recommended practice? – Local Coder

    Global variables aren’t generally bad because of their performance, they’re bad because in significantly sized programs, they make it hard to encapsulate everything – there’s information “leakage” which can often make it very difficult to figure out what’s going on.

  • CppCoreGuidelines/CppCoreGuidelines.md at master · isocpp/CppCoreGuidelines

    I.2: Avoid non-const global variables
    Reason: Non-const global variables hide dependencies and make the dependencies subject to unpredictable changes.

  • Coding Standards, C++ FAQ

    FAQ: What’s a good coding standard for using global variables?
    The names of global variables should start with //.

  • 6.8 — Why (non-const) global variables are evil – Learn C++

  • Why Functional Programming Should Be the Future of Software Development – IEEE Spectrum

    In designing hardware for a computer, you can’t have a resistor shared by, say, both the keyboard and the monitor’s circuitry. But programmers do this kind of sharing all the time in their software. It’s called shared global state: Variables are owned by no one process but can be changed by any number of processes, even simultaneously.

    Now, imagine that every time you ran your microwave, your dishwasher’s settings changed from Normal Cycle to Pots and Pans. That, of course, doesn’t happen in the real world, but in software, this kind of thing goes on all the time. Programmers write code that calls a function, expecting it to perform a single task. But many functions have side effects that change the shared global state, giving rise to unexpected consequences.
