Jazoon 2012: Large scale testing in an Agile world

Alan Ogilvie is working at a division of IBM responsible for testing IBM’s Java SE product. Some numbers from his presentation:

A build for testing is about 500MB (takes 17 min to download to a test machine)
There are 20 different versions (AIX, Linux, Windows, z/OS * x86, power, zSeries)
The different teams create 80..200 builds every day
The tests run on heaps from 32MB to 500GB
They use hardware with 1 to 128+ cores
4 GC policies
More than 1000 different combinations of command line options
Some tests have to be repeated a lot of time to catch “1 out of 100” failures that happen only very rarely

That amounts to millions of test cases that run every month.

1% of them fail.

To tame this beast, the team uses two approaches:

Automated failure analysis that can match error messages from the test case to known bugs
Not all of the tests are run every time

The first approach makes sure that most test failures can be handled automatically. If some test is there to trigger a known bug, that shouldn’t take any time from a human – unless the test suddenly succeeds.

The second approach is more interesting: They run only a small fraction of the tests every time the test suite is started. How can that possibly work?

If you run a test today and it succeeds, you will have some confidence that it still works today. You’re not 100% sure but, well, maybe 99.5%. So you might skip this test today and mark it as “light green” in the test results (as opposed to “full green” for a test that has been run this time).

What about the next day? You’re still 98% sure. And the day after that? Well, our confidence is waning fast, so we’re still pretty sure – 90%.

The same goes for tests that fail. Unless someone did something about them (and requested that this specific test is run again), you can be pretty sure that the test would fail again. So it gets light red unlike the tests that failed today.

This way, most tests only have to be run once every 4-5 days during development.

Why would they care?

For a release, all tests need to be run. That takes three weeks.

They really can’t possibly run all tests all the time.

This entry was posted on Thursday, June 28th, 2012 at 21:06 and is filed under Conference, Software. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

M	T	W	T	F	S	S
				1	2	3
4	5	6	7	8	9	10
11	12	13	14	15	16	17
18	19	20	21	22	23	24
25	26	27	28	29	30

Dark Views