Want to increase your quality and reduce your deployment times? Meet App-level Integration Testing
In my last blog post I teased a new pattern for integration testing that we started following at HMH over the last year. I’m so excited to share this approach with you now — so let’s dive in!
What problems are we trying to solve?
Alright folks, you know the drill. Let’s shout out some problems that I plan to address in this article.
“The time to deploy a feature to production takes too long!”
Oh yes, you might hear that one a lot from all levels of the organisation. At the end of the day, everybody wants to get new features out to customers ASAP to keep ’em happy, meet our goals, and positively impact the profit margin. (And if you don’t hear this complaint in your organisation, that’s great! But it’s still worth asking yourself, are you sure it’s not something you can improve?)
“Our automation suites are unreliable because the services go down! I can’t test my new stuff!”
This is a common gripe at the water cooler in the UI engineering department. Backend services have their own deployments and performance testing in different environments, just like the UI teams do. Even if you’ve got a very good line of communication, coordination, and service monitoring on your environments, some of your automated end-to-end suites will fail for reasons unrelated to the UI deployment. This can be super frustrating and introduce unnecessary context-switching for engineers working on getting their features through the pipeline. To make matters worse, context-switching was acknowledged as the number one blocker to development efficiency at Spotify in the recent ThoughtWorks Technology podcast, “Making developer experience a reality”.
“The data setup for this particular scenario is crazy, it’s a total edge case — and it’s too hard to maintain in our end-to-end suites. I’m wasting so much time here!”
It can be really hard to get your automation to a mature enough stage that it dynamically creates the users and data that your test scenario needs, and cleans up those users and data afterwards. In many cases you might find yourself relying on static data in each environment as a stop-gap until you get to that point of maturity. That static data can be inherently unreliable — it could expire, someone could accidentally delete it, or tests could conflict with each other if they rely on the same data set.
Additionally, as engineers we need to think about all the possibilities — empty data states, loading states, network failures, weird user configurations, all sorts of things. If you start testing all those possibilities in your end-to-end suites, they will get extremely bloated and unwieldy very quickly, resulting in a potentially long-running e2e suite that runs on every deployment and fails more often than it passes.
Any of that sound familiar? All of the examples I mentioned above have happened at HMH — and some still do. Now, before I jump into describing app-level integration tests, let’s set the scene for how things work in our tests right now.
First of all, what do you mean by apps?
You might remember in my last blog post that I spoke about our Ed product being made up of a collection of independent React applications as we move towards a microfrontend architecture. When I speak about “apps” from here on out, that’s what I mean — discrete areas of the Ed platform, with some examples like student assignments, reporting, student scores, and all of those sorts of product areas you can expect from a learning platform.
How were these apps tested before this magic pattern you keep dangling in front of us?
Generally speaking, we used to have an automation or e2e suite (written in Protractor or more recently, Codecept) for each app. Our engineers would run these suites locally whenever they updated functionality, relative to their changes — so if I made a change in the scores app, I’d run the scores e2e suite. We also run all of the e2e suites we have on deployments to our test and production environments.
This e2e suite would typically test all of the features of the app, including any edge case bugs that may have cropped up over the years. Our scores app is fairly simple, so the e2e suite doesn’t take too long to run, but it still adds up on every deployment. As time to deploy kept creeping up, it made us stop and ask ourselves — do we really need to test all this stuff all the time? Is it absolutely critical to the end user or the platform as a whole?
Some examples of non-critical features our e2es test would be pagination or sorting on our scores table, which is handled server-side — by having these tests on every UI deployment, we’re wasting a lot of time when we should be relying on service tests to keep track of that functionality.
Another non-critical test for the scores app would be the messaging that we show to students when they have no scores. It’s a classic edge case, and something that doesn’t last long as a real-life scenario for our students out there! We also have to maintain static data and users to keep testing this in an e2e — bringing us right back to our problems that we mentioned earlier.
How do we reduce the scope of our e2es but still maintain the same quality?
To slim down our expensive e2e suites and tackle these problems, we now approach testing with a multi-tier strategy.
There’s a lot of stuff mentioned there, with one very special and important tool being PACT testing (to help us tackle those server-side issues) that Brendan Donegan spoke about at a recent TestHeads Dublin meetup, and Francislâiny Campos covered in a previous blog post. For now, let’s focus on Integration Tests.
What is App-level integration testing?
First of all, application-level integration tests don’t run in the browser. They run locally when developers make changes, and in our continuous integration pipelines, at the same time as our unit tests. They mount the entire application as part of the test, and mock only the minimum amount of functionality necessary to get the app to perform like it would in a browser.
This will usually mean mocking any API calls that the app makes, and not much more. We do this to adhere to the guiding principles of the test framework we use — React Testing Library, which state:
The more your tests resemble the way your software is used, the more confidence they can give you.
In a nutshell, you want to load your whole application in the test and perform interactions in the same way as a user would. This means clicking on buttons and entering text in input fields — not calling functions directly or mocking pieces of the underlying UI.
What’s wrong with mocking lots of stuff?
Let’s be clear, outside of the context of app-level tests, nothing! If you’re at all familiar with unit testing or writing local tests, you might already know how painful it can be to test things in isolation. Mocking things makes it easier to avoid complexity unrelated to your specific scenario, and often your unit test files can end up with a lot of jest.mock(...) functions at the start for all sorts of inner components and files. You might also be familiar with ways to mock app state — wrapping inner components in all sorts of test utilities and providers that your App.js file would normally supply, letting you mock an imaginary state for different scenarios.
We don’t want to do that here though. The point of app-level testing is to try and replicate some of the scenarios you might have traditionally tested in an e2e suite in the browser — and to do that, you need to get your app looking as close to “real” as possible. For example, would an e2e test, or a user for that matter, ever look at the scores page where the header component is mocked and just shows a random <div>Mock header</div>? Absolutely not. So we don’t do it in our integration tests!
What can be tested in app-level integration tests?
It’s worth noting that as per our pyramid, our expectation of what tests we ship with each feature has shifted somewhat. Instead of writing e2es for every single bug and feature, we expect that the majority of testing will be performed at the application integration level, and the e2e suites are reserved for high level user flows that might traverse many different applications or parts of Ed.
So, when we ask ourselves at the outset of a feature or story what testing we need, a better way to phrase this question is to describe what can’t be tested in integration tests!
- A full user flow that traverses through multiple apps (e.g., a teacher creates an assignment, a student completes the assignment, and a teacher views the results in reporting). This kind of scenario needs an e2e test, both from a technical point of view (as you can only mount one app at a time in an integration test) and from the point of view of the value a test like that gives us.
- Any scenario that, if it broke, would cause a major incident for our customers. That assignments scenario above could potentially cause an MI if any of the individual parts broke. I’m also thinking of things here like login, and accessing educational content — those definitely qualify for being covered by e2e tests.
But there are plenty of features that live under those flows that can be very easily tested in integration tests. Different flavours of the assignments pages for example (with pagination / filtering / sorting / different statuses), or what we show to users if and when we ever have network failures.
By writing tests at this level, you will have more confidence your code is correct and overall have fewer tests to write, as you will only need to write the few unit tests which are needed to cover edge cases in your code.
Alright, enough talk! Show me an example!
I’m getting there I promise! Here’s what you’ll need.
What frameworks should I use?
We use the following frameworks for our app integration tests:
- @testing-library/react — We use this to mount our apps and interact with them. Sarah Flanagan gave a great example on how we use this already in our unit and component testing in her blog post Testing React apps: an introduction to React Testing Library. The test utilities it provides for us involve querying the DOM for nodes in a way that is similar to how a user would find elements on a page (like input fields).
- Mock Service Worker — This is what we use for mocking API calls. It lets you define API responses as if you had an actual server running beside your tests — like what should be returned by a GET request to /studentAssignments/:studentId. It simplifies the way we mock service endpoints in our tests so that we don’t need to know anything about how the endpoint is called from the UI (whether by something like axios or fetch) — we just need to know what the request is and what the response should be.
You may have used Apollo Mocking Provider or even plain old jest to mock the components that make data call-outs in the past — however, we highly recommend Mock Service Worker. Why? Because it’s easier to write, more readable, and goes hand in hand with @testing-library/react.
Imagine, for example, that in your functional UI code, you want to swap out your API-calling framework from Apollo to something else, or you decide to rename the module you were using in your jest.mock calls. If you do that, your tests will also need to be updated to reflect the new implementation if you use Apollo Mocking Provider or jest.mock. However, if you use Mock Service Worker to mock the endpoints, your tests will be decoupled from the implementation detail of your code, and will only need to be updated if the actual endpoint you call changes!
Got it — let’s dive in.
Say you have a React app that does something fairly simple, like one that shows a button you can click to fetch some data. After the user clicks the button, a loader icon will appear while the GET request is fetching some data (https://jsonplaceholder.typicode.com/ is a useful free site for this), and then the app displays the data on screen.
I used create-react-app to whip up some quick and dirty code for this if you want to follow along. Make sure you have Node >= 10.16 and npm >= 5.6 installed locally, then run the following in a new directory:
$ npx create-react-app app-integration-testing
$ cd app-integration-testing
$ yarn add axios
$ yarn add msw --dev
$ yarn start
Open up your App.js file and update it with the following markup to add the button and some error handling. (For the purposes of readability and brevity, completely disregard the fact that I’ve crammed a bunch of code into the top-level file instead of breaking it out into reusable pieces, or doing any sort of styling, internationalisation, responsiveness, or accessibility testing. I’m feeling reckless!)
Next up, let’s create a reusable helper file here to get Mock Service Worker running in our tests to intercept any API requests that our app might make.
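Here’s a sketch of what that helper might look like, using setupServer from msw/node. The getAndSetupServer name is the one we’ll reference later on; the exact shape of the file is up to you:

// src/mockServiceWorkerHelper.js
import { setupServer } from 'msw/node';

// Spins up an MSW server with the given request handlers and hooks it into
// the standard jest lifecycle, so every test file gets the same setup/teardown.
export const getAndSetupServer = (handlers = []) => {
  const server = setupServer(...handlers);

  beforeAll(() => server.listen({ onUnhandledRequest: 'warn' }));
  afterEach(() => server.resetHandlers());
  afterAll(() => server.close());

  return server;
};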
Now, update your App.test.js file to render the app, fire up Mock Service Worker, and interact with it like a user would — clicking the button and checking if the table renders on the page.
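A first pass might look something like this, deliberately with no handlers registered yet, and assuming the button and data from the sketches above:

// src/App.test.js
import { render, screen } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
import App from './App';
import { getAndSetupServer } from './mockServiceWorkerHelper';

// No handlers yet: MSW will warn about any unmocked requests
getAndSetupServer();

test('displays the fetched posts when the button is clicked', async () => {
  render(<App />);

  userEvent.click(screen.getByRole('button', { name: 'Fetch posts' }));

  // This will fail (and MSW will complain in the console) until we mock
  // the endpoint. toBeInTheDocument comes from @testing-library/jest-dom,
  // which create-react-app wires up for you.
  expect(await screen.findByText('My first post')).toBeInTheDocument();
});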
For this super simple app, and indeed any app at all, that’s all you need to get started — start Mock Service Worker, render the app, and write out the assertions you expect for a successful interaction. Once you run this test locally with yarn test (or yarn test --watchAll=false, if you want to avoid running the tests in watch mode), the magic of MSW will start to complain at you in the console about any endpoints being called in your test that aren’t mocked.
Look at that! Your test is basically telling you exactly what to do — it’s picked up that your app is making a GET request to https://jsonplaceholder.typicode.com/ that you haven’t mocked a response for. React Testing Library is also very helpfully rendering the DOM when your queries fail to find the table elements on the page, showing you exactly what’s happening to the app in your test.
Now that you know what endpoints you need to mock, you can work backwards — do the scenario you want to automate manually in the browser, and check what responses your API endpoints are providing. I decided to mock a response with just one result, and to make it clearer what the test is testing, I’ve renamed the test file to DisplaysButtonAndPosts.integration.test.js.
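With the /posts endpoint from the sketch above, that works out to something like this; the single post is made-up data, just enough for the scenario:

// src/DisplaysButtonAndPosts.integration.test.js
import { rest } from 'msw';
import { render, screen } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
import App from './App';
import { getAndSetupServer } from './mockServiceWorkerHelper';

getAndSetupServer([
  // Mock the response with a single, made-up post
  rest.get('https://jsonplaceholder.typicode.com/posts', (req, res, ctx) =>
    res(ctx.json([{ id: 1, title: 'My first post' }]))
  ),
]);

test('displays the fetched posts when the button is clicked', async () => {
  render(<App />);

  userEvent.click(screen.getByRole('button', { name: 'Fetch posts' }));

  expect(await screen.findByText('My first post')).toBeInTheDocument();
});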
I can also add a test for network failure in almost exactly the same way, in another file called DisplaysFailureMessage.integration.test.js.
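That one only differs in the handler and the assertion. A sketch, assuming the error message from our App.js sketch:

// src/DisplaysFailureMessage.integration.test.js
import { rest } from 'msw';
import { render, screen } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
import App from './App';
import { getAndSetupServer } from './mockServiceWorkerHelper';

getAndSetupServer([
  // Return a 500 so the app takes its error path
  rest.get('https://jsonplaceholder.typicode.com/posts', (req, res, ctx) =>
    res(ctx.status(500))
  ),
]);

test('displays a failure message when the request fails', async () => {
  render(<App />);

  userEvent.click(screen.getByRole('button', { name: 'Fetch posts' }));

  expect(
    await screen.findByText('Something went wrong, please try again.')
  ).toBeInTheDocument();
});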
Awesome, our tests pass! Now what?
That’s it! Now that you’re up and running, the possibilities are endless:
- If you have a more complex app with lots of different API requests, you just need to define more handlers for those endpoints and pass them into your getAndSetupServer method. At HMH we generally put those in something like a mockServiceWorkerHandlers.data.js file so that we can reuse them across tests (unit tests too!) in an app.
- You can mock all sorts of data scenarios now very easily — imagine you use this app with something like a student scores endpoint and you’ve implemented pagination on the table. You can now test this with another TableWithPagination.integration.test.js file rather than having to set up a user with lots of data and worry about the integrity of that data and the user in a real environment.
- Our tests are now completely dissociated from the actual real-life services, which, as we mentioned at the start, often have their own deployments / performance issues that can impact unrelated UI testing.
- If you ever want to swap out axios for a different API mechanism, you can do so without having to update the mocks in your tests — and your tests will scream at you in the console if anything goes wrong. A much quicker feedback loop when you’re caffeinated and in the zone than having to stop and lock up your machine with browser e2e tests.
- You can write tests for all your network failure scenarios to make sure your users always have a great experience, even when your services go down — something that is very difficult to achieve in regular e2es.
And the best part? If you’ve already got e2e tests that are testing things that could be achieved with this pattern, you can delete them and rewrite them as app-level integration tests, reducing the scope of your e2es and speeding up every single e2e run in every environment! 🏃‍♀️
What about using this in more mature apps — any other tips?
Seeing as we’ve been using this pattern at HMH for quite a while now, we’ve amassed quite a few best practices in our internal documentation for more complex scenarios, and some general patterns that we find helpful for the monorepo & microfrontend architecture that we have on the Ed learning platform.
Use semantic locators
Accessibility is a cornerstone of our UI development here at HMH. Using semantic locators such as getByRole('button', { name: 'Assign' }) in your tests helps you identify interactable elements by their accessible role, and therefore gives that little bit of extra assurance that your app will interact with screen readers in the way that you expect it to. It’s also another way for you to interact with your app like a user would — a screen-reader user is never going to find an element by its data-testid, but will most certainly use link navigation to move through the page.
Our friends at React Testing Library very helpfully maintain a cheatsheet on which query to use. Additionally, there is a very useful Chrome extension that gives you a context menu option to tell you the right query to use for a particular element with @testing-library/react.
Another useful trick is to put screen.logTestingPlaygroundURL(); in your test, run the test, grab the URL output to your console, and paste it into your browser. It will show you an un-styled version of your app’s DOM in the browser that you can click around to see which queries you should use (useful for identifying buttons by their aria-label, cell data in a table, etc.).
Be careful with your CI — these tests can be expensive
Our first piece of advice was (and still is in most cases) to use semantic locators. However, we did find that these kinds of locators can make tests take longer and use more memory due to having to traverse the DOM to find elements by role, something that has been reported in GitHub too.
You might have quite complicated applications as your microfrontends, and loading the entirety of those apps in jsdom can slow down the tests. This is particularly important to be conscious of when working in a monorepo. We had to do a lot of experimentation with our jest threads to balance memory issues in our continuous integration (CI) with the ability to continue running these somewhat heavier tests. There’s not a huge amount of concrete advice we can give here, other than to be judicious with your queries and to make sure you invest time in the efficiency of your tooling (which you are hopefully already doing anyway!).
Test the number of API calls in your app for performance
We use Gatling for service load and performance testing at HMH, but it’s still a little bit of a manual process to keep the Gatling tests up to date with the actual API calls made on every page throughout a user flow in Ed. One way to mitigate against your app accidentally introducing repeated API calls (through bad use of a hook or re-rendering — it happens!) is to add some assertions in your integration tests about the number of network requests received by MSW. You create a jest spy on your server, and in your tests you can expect(requestSpy).toHaveBeenCalledTimes(1), or whatever your heart desires.
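Here’s a sketch of how that could be wired up, assuming a version of msw recent enough to expose life-cycle events on the server object, and reusing the hypothetical helper and app from earlier:

import { rest } from 'msw';
import { render, screen } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
import App from './App';
import { getAndSetupServer } from './mockServiceWorkerHelper';

const server = getAndSetupServer([
  rest.get('https://jsonplaceholder.typicode.com/posts', (req, res, ctx) =>
    res(ctx.json([{ id: 1, title: 'My first post' }]))
  ),
]);

const requestSpy = jest.fn();
beforeAll(() => server.events.on('request:start', requestSpy));
afterEach(() => requestSpy.mockClear());

test('only fetches the posts once per click', async () => {
  render(<App />);

  userEvent.click(screen.getByRole('button', { name: 'Fetch posts' }));
  await screen.findByText('My first post');

  // A rogue hook or re-render that double-fires the request would fail here
  expect(requestSpy).toHaveBeenCalledTimes(1);
});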
Render your apps once per test file with @testing-library/react/pure functions
This one might be a little bit contentious — it has its pros and cons. Using the pure functions allows us to render the app once per test file, interact with it sequentially using screen and userEvent, and then cleanup once the test is complete. It would look something like this:
import { render, cleanup } from '@testing-library/react/pure';

// ...

beforeAll(() => {
  render(<App />);
});

afterAll(cleanup);
Reasons you might want to follow this pattern:
- Think about getting your app integration tests as close to an e2e, or a real user interaction, as possible. If your tests are related closely enough to be in the same file or scenario, then they should all be operating on the same instance of the app, just like a user would!
- This also helps improve the performance of the integration test if you’re rendering a “heavy” app.
- One school of thought is that it makes these tests easier to read and write from a QE or automation engineer mindset, rather than a development engineer mindset — if you believe in the distinction between the two. (Although in my opinion everyone is capable of doing both — but that’s a discussion for another time!)
Reasons why you might not want to follow this pattern:
- It breaks the cardinal rule of independent tests — you now have it or test blocks in the same file that depend on each other to run in sequence, making it a little bit harder to debug failures.
- You run the risk of forgetting to cleanup after yourself and leaking memory across tests.
Test React routing with the window object, and not with MemoryRouter
This one is best explained with an example. In reality, our apps look a little bit more complicated than the simple button / fetch demo we went through earlier. In our microfrontend architecture, we have apps that load on certain URLs and might need to use react-router with a Router wrapper around itself to be able to route to other URLs in the platform.
When you first write the test for an app like this, you might find that nothing loads when you call render, because the Routes component in the app only shows content if you are on the scores URL. We have BrowserRouter at the App.js level in our application, so we can’t use MemoryRouter to mock the window location here (MemoryRouter only works as a replacement for the router, and doesn’t work as a wrapper — so <MemoryRouter initialEntries={['/scores']}><App/></MemoryRouter> wouldn’t work in this situation).
Instead, you can replace the current state of the window object in the test, and it won’t leak across to other tests. You can also use this to test routing out from one application to another (seeing as we can only mount one application in a test at a time).
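Here’s a sketch, reusing the /scores URL from above. The link name and target path are made up for illustration, and we’re assuming the link is a react-router navigation rather than a direct window update:

import { render, screen } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
import App from './App';

test('loads the scores app and routes out to assignments', () => {
  // Point the real window.location at the scores URL before rendering;
  // jest gives each test file a fresh jsdom, so this won't leak elsewhere
  window.history.pushState({}, '', '/scores');
  render(<App />);

  // Hypothetical react-router <Link> in the scores app
  userEvent.click(screen.getByRole('link', { name: 'Assignments' }));

  // We can only mount one app at a time, so assert on the URL we'd land on
  expect(window.location.pathname).toBe('/assignments');
});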
Note that jest / JSDOM is a trickster when it comes to mocking routing in the DOM. If your app uses the window object directly to update the URL, you won’t be able to see the routing changes in your rendered app. We found that it does perform well with react-router and react-router-dom for route changes within an app, though.
Query parameters and emulating server-side logic
Sometimes you find yourself needing to verify the contents of an API request (like a real server would) before sending back a response — perhaps you have multiple requests to the same endpoint in your test, or you are trying to provide as much test coverage for minor bugs as you can without writing an expensive e2e.
One example we had recently was in our create-assignments app. We had a bug that only appeared if a user interacted with our form in a very specific way, where the wrong number of students were being assigned to an assignment. Instead of writing an e2e to replicate this and then check the newly created assignments were given to the right students, we performed some logic on the request in our mock handler to give a success response if the request was correct, and a failure response if the request was incorrect to cover this bug.
Likewise for multiple requests to the same endpoint (perhaps with different query parameters), you mock the endpoint without the query parameters, and then you can extract the parameters in the request if/when you need them.
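Here’s a sketch of both ideas in one set of handlers, assuming the msw v1 rest API (where req.json() is available); the endpoint paths, data, and the validation rule are invented for illustration:

import { rest } from 'msw';

const allAssignments = [
  { id: 1, status: 'assigned' },
  { id: 2, status: 'completed' },
];

export const handlers = [
  // Emulate the server-side validation that caught our bug: only respond
  // with success if at least one student is being assigned
  rest.post('/api/assignments', async (req, res, ctx) => {
    const { studentIds } = await req.json();
    return studentIds && studentIds.length > 0
      ? res(ctx.status(201))
      : res(ctx.status(400));
  }),

  // Mock the endpoint without query parameters, then pull them off the
  // request if/when you need them
  rest.get('/api/assignments', (req, res, ctx) => {
    const status = req.url.searchParams.get('status');
    const results = status
      ? allAssignments.filter((assignment) => assignment.status === status)
      : allAssignments;
    return res(ctx.json(results));
  }),
];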
Don’t forget to actually eyeball your applications of course!
You might notice one glaring gap in this approach — it just doesn’t give you that visual verification that your app still looks OK as it traverses through your user flow. The exciting news, however, is that if you start using Mock Service Worker, you can reuse your mock handlers to surface apps and components on Storybook too! This is something we literally just started doing at HMH to speed up development, help us assess the consistency of our applications for different edge-case states, and get us ready for visual regression testing with Chromatic. Lots of potential there, and definitely another blog post coming down the line.
And that’s all folks!
Hopefully that curated breakdown gave you an idea of the advantages app-level integration testing can bring to the development cycle, and how thrilled I am that it’s taken off so well here at HMH. Now go ahead and use your new-found knowledge out there in the wild — and feel free to let me know how you get on in the comments!
Does quality and automated testing give you a thrill? Have you got ideas for improving our approach? We need people like you to come join us at HMH!