Note: this is an unpublished draft post

The Test Diamond

There are dozens, if not hundreds, of different takes on the Test Pyramid, and while I heartily agree with the main premise, there are some things that have always bothered me about the pyramids themselves. Testers and developers tend to interpret the pyramid differently, and each group seems to miss some of the important concepts. Additionally, the test pyramid still encourages optimizing some of the wrong things, so the proper shape really needs to be a diamond, not a triangle.

You may have heard of this pithy tweet by Guillermo Rauch:

There is wisdom in these words that is not sufficiently captured in our current conception of the Test Pyramid. Let me explain how I interpret this idea as I walk through how I transitioned from thinking about the test pyramid to the test diamond. To clarify, though, the test diamond is explicitly about tests that are regularly run as part of a “regression” suite to evaluate whether previously functioning features are no longer working as desired. Acceptance testing and exploratory or manual testing can be discussed at another time in another context.

  1. The Traditional Test Pyramid

    Traditional Test Pyramid

    The basic story for this diagram is that too many teams focus on End to End tests, which are “expensive,” and not enough on integration or unit tests, which are “cheaper.” I’m using these particular colors in this diagram to represent the types of test. The black background is for “black box” tests that do not have information about the application’s implementation details, the white background is for “white box” tests that require information about the application’s implementation details, and the gray background is for “gray box” tests that typically have some information about implementation details but often interact with the system as a consumer would.

  2. The Traditional Test Roles

    Traditional Test Roles

    We need to better describe the top layer. With the rise of JavaScript web frameworks, UI tests can be created at the unit level with stubs and mocks of their own, so the label “UI tests” is no longer sufficiently descriptive. As for the often-used term “End to End” for testing, I’ve long opposed its usage since it means very different things to different audiences in ways that they often don’t even realize. My alternative is to call this top layer “DOM to Database” tests. This is where the entire system is evaluated from the user’s perspective. These are typically tests executed with Selenium or Appium.

    I’ve also added a bold orange line here to delineate what Dev teams and QA teams typically feel they have responsibility for. From the QA side, testers, especially those with manual backgrounds, often only think in terms of what they can do from the UI. This is with good reason since testers are often isolated from the developers and rarely given access to the code that would allow them to do anything other than black box tests.

    On the other end of the pyramid, developers, when they do write tests, tend to focus exclusively on unit testing. This is often because of the arbitrary “code coverage” metrics imposed by misguided managers. Also, it is legitimately hard to create useful integration tests for legacy monolithic applications with the proverbial “spaghetti code.”

    So tests that are more comprehensive than unit tests but more granular than full DOM to Database tests are frequently ignored by both testers and developers.

  3. What Are Integration Tests, Anyway?

    Traditional Test Roles

    “Integration tests” is an overloaded term and has been ascribed all kinds of meanings. To make the Test Pyramid more relatable to both testers and developers, differentiate between black box and gray box tests in the middle layer.

    With the rise of Single Page Applications and JavaScript frameworks, the responsibility for the site’s presentation layer has moved into the DOM with multiple discrete requests to an API. Since the API only requires specific blobs of serialized data provided to a specific endpoint in a specific format, you can perform the same functionality in code with an HTTP library or REST client as you can through the browser. Since the calls are made from outside the system, they are still “black box” tests, but they avoid all of the complexity of interacting with the browser and the DOM.
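
    As a minimal sketch of a black box API test, the example below posts JSON straight to an endpoint using only Python's standard library, with no browser or DOM involved. The `/api/todos` endpoint, its payload shape, and the `FakeApiHandler` server are all hypothetical stand-ins so the example can run on its own; in practice the base URL would point at a deployed instance of your application.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeApiHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the real application's API."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        if self.path == "/api/todos" and payload.get("title"):
            body = {"id": 1, "title": payload["title"], "done": False}
            self.send_response(201)
        else:
            body = {"error": "title is required"}
            self.send_response(400)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps(body).encode("utf-8"))

    def log_message(self, *args):
        pass  # keep test output quiet


def post_json(url, data):
    """POST a JSON body and return (status_code, decoded_json_response)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(data).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # An empty ProxyHandler keeps local requests from being routed via a proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(request) as response:
            return response.status, json.loads(response.read())
    except urllib.error.HTTPError as error:
        return error.code, json.loads(error.read())


def run_black_box_checks():
    """Exercise the API from the outside, exactly as any client would."""
    server = HTTPServer(("127.0.0.1", 0), FakeApiHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base_url = f"http://127.0.0.1:{server.server_address[1]}"
    try:
        created = post_json(f"{base_url}/api/todos", {"title": "write tests"})
        rejected = post_json(f"{base_url}/api/todos", {})
    finally:
        server.shutdown()
        server.server_close()
    return created, rejected


if __name__ == "__main__":
    created, rejected = run_black_box_checks()
    assert created[0] == 201 and created[1]["title"] == "write tests"
    assert rejected[0] == 400
```

    The test knows nothing about how the server is implemented; it only knows the endpoint, the request format, and the expected response.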

    From the developer side, we need to be thinking beyond testing inputs and outputs of class methods and be more holistic about the system. I like the term Service Tests because regardless of whether the application is coded in distinct microservices or mashed into a monolith in some fashion, it is still providing discrete services. The move toward microservices means that we have a more explicit way of testing the interactions between these services. The rise of contract testing is an excellent example of this. Further, it is important to test the interactions of classes within the services, in whatever way that needs to be accomplished.
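
    To make the contract testing idea concrete, here is a deliberately simplified sketch. It is not how any particular contract testing tool works; the `USER_CONTRACT` format, the `provider_get_user` function, and the field names are all hypothetical. The consumer states which fields and types it relies on, and the check runs against the provider's real response rather than a mock.

```python
# Hypothetical contract published by the consumer: field name -> expected type.
USER_CONTRACT = {"id": int, "email": str, "active": bool}


def provider_get_user(user_id):
    """Stand-in for the provider service's real handler (hypothetical)."""
    return {"id": user_id, "email": "pat@example.com", "active": True, "plan": "free"}


def contract_violations(contract, response):
    """Return readable problems; an empty list means the contract holds.

    Extra fields in the response are allowed, since the consumer ignores them.
    """
    problems = []
    for field, expected_type in contract.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            problems.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(response[field]).__name__}"
            )
    return problems


if __name__ == "__main__":
    # The provider satisfies the consumer's expectations.
    assert contract_violations(USER_CONTRACT, provider_get_user(7)) == []
    # A response that drifted from the contract is reported field by field.
    print(contract_violations(USER_CONTRACT, {"id": "7", "email": "x"}))
```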

    Note that to demonstrate the importance of communication, and to recognize that the roles of testers and developers within the industry are merging, I’ve changed this line from solid to dotted.

  4. It’s a Diamond not a Triangle

    Traditional Test Roles

    Next let’s talk about our over-reliance on unit tests. There are three major problems with unit testing.

    Firstly, if you have an effective suite of service layer tests, you shouldn’t need as many unit tests. If the only way for a unit test to fail would also cause a higher level test to fail, then the unit test is no longer providing unique and useful information. Finding ways to replace unit tests with integration tests that provide more overall insight into the actual functioning of your system is a good thing! This also minimizes writing unnecessary unit tests for scenarios that are impossible to encounter in the real world.

    Secondly, unit tests are often entirely dependent on the implementation of your code. Many unit tests will always fail when the code is changed, but also the only way for them to fail is for that same code to change. These tests can’t tell you anything useful about the state of the system, only whether or not you’ve changed your code. Hopefully the fact that you’ve changed your code will sufficiently alert you to the fact that you have indeed changed your code.
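
    A small illustration of the difference, using hypothetical `PriceCalculator` classes: the first test asserts on how the result is computed, the second on what the result is. After a refactoring that inlines a private helper, the behavior is unchanged, yet the implementation-coupled test breaks.

```python
from unittest import mock


class PriceCalculator:
    """Original implementation, with a private helper."""

    def total(self, prices, tax_rate):
        return round(self._subtotal(prices) * (1 + tax_rate), 2)

    def _subtotal(self, prices):
        return sum(prices)


class InlinedPriceCalculator:
    """Same behavior after a refactoring that inlines the helper."""

    def total(self, prices, tax_rate):
        return round(sum(prices) * (1 + tax_rate), 2)


def coupled_test(calculator_cls):
    """Asserts on HOW the total is computed: that a private helper is called."""
    calc = calculator_cls()
    with mock.patch.object(calc, "_subtotal", return_value=30.0) as subtotal:
        calc.total([10.0, 20.0], 0.1)
    subtotal.assert_called_once_with([10.0, 20.0])


def behavior_test(calculator_cls):
    """Asserts on WHAT the total is, regardless of how it is computed."""
    assert calculator_cls().total([10.0, 20.0], 0.1) == 33.0


if __name__ == "__main__":
    behavior_test(PriceCalculator)
    behavior_test(InlinedPriceCalculator)  # still passes after the refactoring
    coupled_test(PriceCalculator)
    try:
        coupled_test(InlinedPriceCalculator)
    except AttributeError as error:
        # The helper no longer exists, so the test breaks although nothing
        # the user can observe has changed.
        print(f"coupled test broke: {error}")
```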

    Thirdly, unit tests typically test your assumptions about the state of the system rather than the actual state of the system. Mocks, stubs, and doubles all serve to ensure that your code will respond properly under specific conditions. Most teams do not do anything to ensure that those conditions continue to represent what is seen during actual usage. Your tests can pass even though the system is broken, and a false sense of security is more dangerous than a healthy dose of uncertainty.
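
    Here is a sketch of how that plays out, with hypothetical names throughout. The code under test was written when an inventory lookup returned a bare integer. The dependency has since changed to return a dictionary, but the stub still encodes the old assumption, so the unit test keeps passing.

```python
from unittest import mock


def real_inventory_lookup(sku):
    """Stand-in for the real dependency (hypothetical); it now returns a dict."""
    return {"sku": sku, "quantity": 0}


def can_ship(sku, lookup):
    """Code under test, written when the lookup returned a bare integer."""
    return lookup(sku) > 0


def unit_test_with_stub():
    """Passes, because the stub still encodes the old assumption (an int)."""
    stub = mock.Mock(return_value=3)
    return can_ship("ABC-1", stub)


def check_against_real_dependency():
    """Runs the same code against the real lookup and reports what happens."""
    try:
        return can_ship("ABC-1", real_inventory_lookup)
    except TypeError as error:
        return f"broken: {error}"


if __name__ == "__main__":
    print(unit_test_with_stub())            # the stubbed unit test is green
    print(check_against_real_dependency())  # the real interaction is broken
```

    The stubbed test only confirms that the code handles the condition the stub describes; it says nothing about whether that condition still occurs.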

  5. Putting it into Practice

    Traditional Test Roles

    The traditional test pyramid encourages developers to ignore integration tests in favor of more unit tests, and it encourages testers to do “integration tests” when they aren’t empowered to do so.

    The main argument for continuing to run more unit tests in spite of their problems is that they are “cheaper.” You’ll often see this on the axes next to a test pyramid. Unit tests are only cheaper than other tests when just considering execution costs; they can easily be more expensive when you factor in the more important maintenance costs. Because maintenance is the largest cost of testing, we need to optimize maintenance costs, not execution costs. If you are rewriting tests every time you change your code’s implementation, those are maintenance costs. If you have to track down real bugs in your application because you (or your favorite coworker) made assumptions that are no longer valid, those are maintenance costs.

    So instead of referring to expense in the axes of the diamond, I’ve changed it to “more assumptions” vs “more fidelity.” I started to use “more accurate” for D2D tests, but obviously that isn’t necessarily the case. I decided on “fidelity” because the further up you go, the more faithful the tests will be to the actual experience of the end user. Conversely, as you traverse down the diamond, each layer must make additional assumptions about the actual state of the system for the information it is providing.

    As I mentioned in my article on The Valley of Success, we should be constantly thinking about whether each choice we make is going to increase or decrease the likelihood of future failures. In this context that means that testers should minimize relying on the complexities of browser interactions and focus on faster, more reliable black box API tests. For developers that means less reliance on unit tests and finding ways to ensure the lower level code is behaving correctly in more real-world scenarios.

Fun fact: I actually gave a lightning talk at the Selenium Conference in Portland back in 2015 about this idea, based on some conversations I was having with Mike Lueders when I was working at Blackbaud.
