Testing In Production

In April I’m giving a talk at RailsConf titled “Testing in Production.” Today I thought I would write about a straightforward type of testing in production: running your existing tests against production.

There’s a good chance you have tests of some kind for your code. You probably have unit tests, and you may have higher-level integration tests. Maybe your integration tests are written in something like Cucumber or a record/replay test automation tool. Now, the hard question: do you run these tests against production? Most folks I talk to don’t. They think of testing as something that happens in pre-production environments. If you have a QA/test team working on your project, you may even think of testing as something that delays the push from development to production. As a former tester, I’d like to propose that you run at least some of your tests against production.

The Story

I started my career in QA doing both black box manual testing and automated testing using record/replay tools. While most record/replay tools create brittle tests, the program I was using at the time allowed you to parameterize the recording, so I was able to create a small set of tests that were pretty resilient. We used these tests whenever we pushed code, which was approximately once a month. Once we had confidence that the scripts worked, I got permission to run them against the live site on a spare computer I had in my office (I was using it partly as a space heater). I set up the tests to run every four hours and email me the results. Then I set up a filter in my email to file away all the successful runs. I set this up mostly as a way to know when the tests needed fixing. I didn’t expect to find actual bugs.

These tests ran without incident for several months, but one day a failure came through. I assumed the tests were broken, so I manually retried the scenario. It turned out that the tests had detected an actual error. The site I was working on at the time relied on several external partners to provide inventory, and the test detected that one of those partners was returning empty responses to our queries. The requests weren’t failing, so the monitoring systems hadn’t noticed the problem. The website’s code was designed to handle poorly formed responses gracefully, so customers hadn’t noticed either. But there was an issue, and the test had found it before any other part of our system detected the outage. I notified the appropriate staff, they notified the partner, and the issue was fixed quickly with no impact on real users.

The How

Tooling

Running tests against production sounds easy, but some test frameworks make it easier than others. Unit and functional testing frameworks often assume that the code under test and the tests themselves are running on the same computer. Some integration test suites let you specify a URL to run against; others do not. Where possible, I believe the integration tests you run against production should be the same ones you run against staging and development environments. When that hasn’t been possible, I’ve used a tool like mechanize (which has equivalents in at least Python, Node, and Perl) to script the production website interactions I want to verify.
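
If your integration suite happens to be built on Capybara, pointing it at a deployed environment can be as small as a configuration switch. Here is a minimal sketch, assuming an RSpec/Capybara setup; the TEST_TARGET_URL environment variable is a name I made up for this example, and headless Chrome is just one driver choice among several.

```ruby
# spec/support/production_target.rb -- a minimal sketch for an RSpec/Capybara
# suite. TEST_TARGET_URL is a hypothetical environment variable name.
require "capybara/rspec"

if ENV["TEST_TARGET_URL"]
  Capybara.run_server = false                           # don't boot the app locally
  Capybara.app_host   = ENV["TEST_TARGET_URL"]          # e.g. https://www.example.com
  Capybara.default_driver = :selenium_chrome_headless   # rack_test can't reach remote hosts
end
```

With something like this in place, running `TEST_TARGET_URL=https://www.example.com rspec spec/features` exercises the same feature specs against the live site, while the default local behavior stays untouched.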

What To Verify

I try to limit my production test suite to the core workflows of the website. In the past, I’ve prioritized tests for authentication, purchase, and any other endpoints that drive the majority of the site’s traffic. I’ve also run tests in production for features that we wanted to demonstrate to potential investors or users. As with all testing, it is essential that your tests verify something concrete. If your tests only verify that a URL doesn’t return a 500, they aren’t particularly useful. I’ve always picked critical text or links on the page and verified that they appear in the HTML response. That doesn’t guarantee the page rendered successfully, but it tells you it was at least close and that nothing major malfunctioned.
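
When I can’t reuse an existing suite, a mechanize script along these lines covers the same ground. The URL, headline text, and link text below are placeholders; the point is that the script asserts on specific content rather than just a status code.

```ruby
#!/usr/bin/env ruby
# A small production check -- the URL and the expected text are placeholders.
require "mechanize"

agent = Mechanize.new
page  = agent.get("https://www.example.com/")   # mechanize raises on 4xx/5xx responses

# Assert on concrete content, not just "the request didn't blow up".
abort "Missing headline"     unless page.body.include?("Today's Deals")
abort "Missing sign-in link" unless page.link_with(text: "Sign in")
```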

Other Considerations

In my experience, it is helpful to have a way to distinguish test data from actual user data. That lets the team exclude test data from usage reports and lets you purge it from the database occasionally. There are many ways to flag test data: a URL parameter, a database field, a special naming scheme, or even all three. The other way I avoid filling a production system with test data is by choosing tests that either don’t create records or that clean up after themselves. If I need to search for five products to cover the breadth of my site, I can write five search tests that create nothing. Then I can write a single, separate purchase test, limiting the amount of data created. After that, I can even use the transaction from the purchase test to run a cancellation test, cleaning up the purchase. Although I dislike interdependent tests, I make an exception here because the benefits of a test suite that cleans up after itself outweigh the downsides of two well-documented but interdependent tests.
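
To make the cleanup idea concrete, here is a sketch of a chained purchase-and-cancel check written with RSpec and Capybara (matching the configuration sketch above). The paths, field labels, element id, and the “+prodtest” email suffix are all hypothetical placeholders for whatever your site actually uses; the suffix is just one way to implement the naming-scheme flag described above.

```ruby
# spec/features/purchase_smoke_spec.rb -- hypothetical paths, labels, and ids.
require "securerandom"

RSpec.describe "Purchase and cancellation smoke test", type: :feature do
  # A "+prodtest" suffix marks the record as test data so it can be
  # excluded from usage reports or purged from the database later.
  let(:buyer_email) { "qa+prodtest-#{SecureRandom.hex(4)}@example.com" }

  it "purchases a product and then cancels the order" do
    visit "/products/sample-widget"            # hypothetical product page
    fill_in "Email", with: buyer_email
    click_button "Buy now"
    expect(page).to have_content("Order confirmed")

    order_number = find("#order-number").text  # hypothetical element id

    # Reuse the transaction we just created to clean up after ourselves.
    visit "/orders/#{order_number}/cancel"
    click_button "Confirm cancellation"
    expect(page).to have_content("Order cancelled")
  end
end
```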

Wrapping up

This is just one part of what people consider testing in production, but it is easy to implement. If you have an existing integration test suite, you can choose a few specific tests to run against your production environment right away. And this technique does find bugs. It is especially likely to find bugs in integrations, bugs that only appear at certain times, and bugs that result from configuration errors. I’m not an expert on testing, and I’ve only had a chance to try these techniques out on three different products during my career, so I’m curious if you’ve tried this and how it worked for you. Let me know in the comments.