Leaky Abstraction

Description

26 January 2022 by Phillip Johnston

An abstraction is “leaky” when it exposes to its users details of the underlying implementation that should ideally be hidden.

The term was coined by Joel Spolsky in The Law of Leaky Abstractions, where he states:

All non-trivial abstractions, to some degree, are leaky.

Abstractions fail. Sometimes a little, sometimes a lot. There’s leakage. Things go wrong. It happens all over the place when you have abstractions.

The existence of leaky abstractions means that abstractions do not always simplify our work in the intended ways. While we can usually work through the abstraction, we are not freed from having to understand the implementation behind it. Eventually, a problem will appear, and we will need to look behind the curtain and learn the underlying details.

As Spolsky points out, one implication of this is that abstractions save us time working, but they don’t save us time learning. One tradeoff here is that we can build more complex systems more quickly, but debugging problems that leak from the abstractions can be a lengthy process.

Examples of Leaky Abstractions

Spolsky cites a number of examples in his article. For embedded developers, the most relevant is memory: it is abstracted as one big flat address space, but that abstraction often leaks. As Spolsky points out, iterating over a large two-dimensional array can perform very differently depending on the order of iteration, because of page faults, cache misses, and other details of the underlying processor and memory system.
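
The array example can be sketched in C++ as follows. This is a minimal illustration, not code from Spolsky's article: the function names sum_row_major and sum_column_major are hypothetical, and the matrix is assumed to be stored in row-major order in a single contiguous buffer. Both functions compute the same sum; only the order in which memory is touched differs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums a rows x cols matrix stored row-major in one contiguous buffer.
// Consecutive accesses are adjacent in memory.
long long sum_row_major(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);  // precondition: buffer matches the dimensions
    long long total = 0;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            total += m[r * cols + c];
        }
    }
    return total;
}

// Same result, but consecutive accesses are `cols` elements apart,
// so each step may land on a different cache line or page.
long long sum_column_major(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);  // precondition: buffer matches the dimensions
    long long total = 0;
    for (std::size_t c = 0; c < cols; ++c) {
        for (std::size_t r = 0; r < rows; ++r) {
            total += m[r * cols + c];
        }
    }
    return total;
}
```

Nothing in the flat-address-space abstraction distinguishes the two functions, yet on a large matrix the second one will often run slower on hardware with caches or paged memory. How much slower depends on the platform, so it is worth measuring on the actual target.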

Many hardware abstractions in embedded systems are leaky because they encode details of one specific part that do not actually apply across different hardware components. When a component is swapped, the designers may discover that their abstraction leaked hardware details that do not hold for the new component, forcing changes to the interface, the application, or both.
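
A hedged sketch of this kind of leak, using a temperature sensor as the example. Everything here is hypothetical and chosen only for illustration: the class names, the PartA and PartB drivers, their scale factors, and the over_temperature helper do not describe any real device. The raw value passed to each constructor stands in for a register read.

```cpp
#include <cstdint>

// Leaky: callers get raw counts and must know the chip's scale factor
// to make sense of them. Swapping the chip breaks every caller.
class LeakyTemperatureSensor {
public:
    virtual ~LeakyTemperatureSensor() = default;
    virtual std::uint16_t read_raw_counts() = 0;  // meaning depends on the part
};

// Less leaky: the interface speaks in application units, so the
// conversion stays inside each driver.
class TemperatureSensor {
public:
    virtual ~TemperatureSensor() = default;
    virtual float read_celsius() = 0;
};

// Hypothetical part A: 0.0625 degrees C per count, no offset.
class PartA : public TemperatureSensor {
public:
    explicit PartA(std::uint16_t raw) : raw_(raw) {}
    float read_celsius() override { return raw_ * 0.0625f; }
private:
    std::uint16_t raw_;  // stands in for a register read
};

// Hypothetical part B: 0.5 degrees C per count, offset by -40 degrees C.
class PartB : public TemperatureSensor {
public:
    explicit PartB(std::uint16_t raw) : raw_(raw) {}
    float read_celsius() override { return raw_ * 0.5f - 40.0f; }
private:
    std::uint16_t raw_;  // stands in for a register read
};

// Application code depends only on the interface, not on either part.
bool over_temperature(TemperatureSensor& sensor, float limit_celsius) {
    return sensor.read_celsius() > limit_celsius;
}
```

With the first interface, an application written against part A would carry the 0.0625 scale factor in its own code and would need changes when part B is substituted. With the second, over_temperature works unchanged with either driver. Even this interface can still leak in other ways, such as read latency or resolution.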

References

Wikipedia: Leaky Abstraction

A leaky abstraction is an abstraction that exposes details and limitations of its underlying implementation to its users that should ideally be hidden away. Leaky abstractions are considered problematic, since the purpose of abstractions is to manage complexity by concealing unnecessary details from the user.
Towards a New Model of Abstraction in Software Engineering by Gregor Kiczales
The Law of Leaky Abstractions by Joel Spolsky

That is, approximately, the magic of TCP. It is what computer scientists like to call an abstraction: a simplification of something much more complicated that is going on under the covers. As it turns out, a lot of computer programming consists of building abstractions. What is a string library? It’s a way to pretend that computers can manipulate strings just as easily as they can manipulate numbers. What is a file system? It’s a way to pretend that a hard drive isn’t really a bunch of spinning magnetic platters that can store bits at certain locations, but rather a hierarchical system of folders-within-folders containing individual files that in turn consist of one or more strings of bytes.

Back to TCP. Earlier for the sake of simplicity I told a little fib, and some of you have steam coming out of your ears by now because this fib is driving you crazy. I said that TCP guarantees that your message will arrive. It doesn’t, actually. If your pet snake has chewed through the network cable leading to your computer, and no IP packets can get through, then TCP can’t do anything about it and your message doesn’t arrive. If you were curt with the system administrators in your company and they punished you by plugging you into an overloaded hub, only some of your IP packets will get through, and TCP will work, but everything will be really slow.

This is what I call a leaky abstraction. TCP attempts to provide a complete abstraction of an underlying unreliable network, but sometimes, the network leaks through the abstraction and you feel the things that the abstraction can’t quite protect you from. This is but one example of what I’ve dubbed the Law of Leaky Abstractions:

All non-trivial abstractions, to some degree, are leaky.

Abstractions fail. Sometimes a little, sometimes a lot. There’s leakage. Things go wrong. It happens all over the place when you have abstractions.

One reason the law of leaky abstractions is problematic is that it means that abstractions do not really simplify our lives as much as they were meant to. When I’m training someone to be a C++ programmer, it would be nice if I never had to teach them about char*’s and pointer arithmetic. It would be nice if I could go straight to STL strings. But one day they’ll write the code “foo” + “bar”, and truly bizarre things will happen, and then I’ll have to stop and teach them all about char*’s anyway. Or one day they’ll be trying to call a Windows API function that is documented as having an OUT LPTSTR argument and they won’t be able to understand how to call it until they learn about char*’s, and pointers, and Unicode, and wchar_t’s, and the TCHAR header files, and all that stuff that leaks up.

The law of leaky abstractions means that whenever somebody comes up with a wizzy new code-generation tool that is supposed to make us all ever-so-efficient, you hear a lot of people saying “learn how to do it manually first, then use the wizzy tool to save time.” Code generation tools which pretend to abstract out something, like all abstractions, leak, and the only way to deal with the leaks competently is to learn about how the abstractions work and what they are abstracting. So the abstractions save us time working, but they don’t save us time learning.

And all this means that paradoxically, even as we have higher and higher level programming tools with better and better abstractions, becoming a proficient programmer is getting harder and harder.

Ten years ago, we might have imagined that new programming paradigms would have made programming easier by now. Indeed, the abstractions we’ve created over the years do allow us to deal with new orders of complexity in software development that we didn’t have to deal with ten or fifteen years ago, like GUI programming and network programming. And while these great tools, like modern OO forms-based languages, let us get a lot of work done incredibly quickly, suddenly one day we need to figure out a problem where the abstraction leaked, and it takes 2 weeks.

Categories: Field Atlas

Tags: Public Entry, Software Engineering
