cannot determine order of test execution in a parallel worker

Bug #974622 reported by Gary Poster
This bug affects 1 person
Affects          Status        Importance  Assigned to     Milestone
Testrepository   Fix Released  High        Benji York
testtools        Fix Released  Medium      Jonathan Lange

Bug Description

In working on making the Launchpad tests work with the testr run --parallel feature, we have encountered many tests that fail intermittently. Many of these have turned out to be tests that fail or succeed depending on the order in which they run. For a given intermittently failing test, some previous test is not isolated: it leaves the environment in an unclean state. In some cases a previous test must run before another test succeeds; more often a previous test causes a following test to fail.

While the tests run, the order of execution is available via one file per process in a temporary directory. When testr runs successfully to completion, these files are deleted.

In the same way that one can interrogate testr for the tests that failed, I would like to be able to interrogate it for the ordered collection of tests that included the failing test. Given that, we should be able to run those tests again. It gives an easy way to verify whether or not test ordering affects the outcome, and a reduced starting set of tests for determining which ones are affected.

summary: - --parallel should provide easy access to the order in which tests were
- run
+ cannot determine order of test execution in a parallel worker
Changed in testrepository:
status: New → Triaged
importance: Undecided → High
Revision history for this message
Robert Collins (lifeless) wrote :

I'm not sure that the order is available during execution, as the workers stream directly into the multiplexer.

I'd solve this by tagging tests coming in with a worker id - the sequence is still implicit within a worker.

One special case would be to identify tests already tagged this way, and use a nested tag (this will handle nested parallelisation where each parallel worker is a separate machine which internally parallelises further).

e.g. the logic would be something like:
when starting a parallel worker: add tag worker-%d
if an incoming tag of worker-%d is seen, translate it to worker-%d-%d, where the leftmost %d is the id of this worker and the rightmost %d is the one from the incoming stream. (And make this generic, so worker-%d-%d -> worker-%d-%d-%d too.)
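
To make that rule concrete, here is a minimal Python sketch of the translation; the function name and regex below are illustrative only, not testrepository code:

    import re

    # Matches worker-1, worker-1-3, worker-1-3-2, ...
    WORKER_TAG = re.compile(r'^worker(-\d+)+$')

    def translate_worker_tags(worker_id, incoming_tags):
        """Return the tags this worker should emit for a test.

        Incoming worker-N (or worker-N-M...) tags from a nested stream are
        prefixed with this worker's id; a plain worker-<worker_id> tag is
        always added so every test is attributable to this worker.
        """
        translated = set()
        for tag in incoming_tags:
            if WORKER_TAG.match(tag):
                # e.g. worker-3 seen inside worker 1 becomes worker-1-3.
                translated.add('worker-%d%s' % (worker_id, tag[len('worker'):]))
            else:
                translated.add(tag)
        translated.add('worker-%d' % worker_id)
        return translated

    # translate_worker_tags(1, {'worker-3'}) == {'worker-1', 'worker-1-3'}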

Revision history for this message
Gary Poster (gary) wrote :

At least in our case, the order is available during execution by looking at files in the temp directory. These are the files passed to bin/test --load-list. We can duplicate the test run outside of the parallel machinery by getting the appropriate file and using --load-list. This has provided a valuable analysis tool, and it is the use case I want to support.

Your approach certainly sounds more general, and could be the basis of nice general purpose support. As you'd expect, a full solution for our own use case needs to provide easy access to the filtered list of tests for a particular worker in a format that we can then use bin/test --load-list on.

Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 974622] Re: cannot determine order of test execution in a parallel worker

You have --shuffle in use, so --load-list does not determine the order
of execution. Certainly for other test runners, or for test runs
without --shuffle, you could use the load-list files as an
approximation, *as long as* the test runner in question cooperates.
For your scenario here though, of being able to reconstruct what an
arbitrary run did, you can't depend on them. (Due to shuffle).

> Your approach certainly sounds more general, and could be the basis of
> nice general purpose support.  As you'd expect, a full solution for our
> own use case needs to provide easy access to the filtered list of tests
> for a particular worker in a format that we can then use bin/test
> --load-list on.

Yes, you'd be able to filter the run down quite easily I think.
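
As a rough illustration of the filtering in question (not testrepository code; it assumes you have already extracted ordered (test id, tags) pairs from the subunit stream):

    def tests_for_worker(test_tags, worker_tag):
        """Filter ordered (test_id, tags) pairs down to one worker's tests.

        test_tags: iterable of (test_id, set_of_tags) pairs in stream order.
        worker_tag: e.g. 'worker-0'.
        Returns the test ids in order, ready to be written to a file for a
        --load-list style re-run.
        """
        return [test_id for test_id, tags in test_tags if worker_tag in tags]

    # Example: write worker 0's tests out for an ordered re-run.
    ordered = tests_for_worker(
        [('test_a', {'worker-0'}), ('test_b', {'worker-1'}),
         ('test_c', {'worker-0'})],
        'worker-0')
    with open('worker-0.list', 'w') as f:
        f.write('\n'.join(ordered) + '\n')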

Revision history for this message
Gary Poster (gary) wrote :

ack Robert, sounds good.

Benji and I looked at this today and it looks like the cleanest place to put this in is testtools' ConcurrentTestSuite. Benji is adding tests and implementation there and will propose an MP.
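
For reference, a rough sketch of what that could look like from the runner's side; it assumes ConcurrentTestSuite gains a hook (called wrap_result below) for wrapping each worker's result, and that Tagger decorates a result with extra tags - both the hook name and the exact signatures should be checked against the testtools release containing the fix:

    import unittest

    from testtools import ConcurrentTestSuite, Tagger

    def split_suite(suite):
        # Hypothetical splitter: run each child suite in its own worker.
        return list(suite)

    def wrap_result(result, worker_number):
        # Tag everything from this worker so its ordering can be
        # reconstructed from the result stream afterwards.
        return Tagger(result, set(['worker-%d' % worker_number]), set())

    suite = unittest.TestLoader().discover('.')
    concurrent = ConcurrentTestSuite(suite, split_suite, wrap_result=wrap_result)
    # concurrent.run(result)  # where result is a tag-aware TestResult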

Revision history for this message
Jonathan Lange (jml) wrote :

All of the necessary changes for implementing this with testtools have been put in place.

Changed in testtools:
status: New → Fix Committed
importance: Undecided → Medium
assignee: nobody → Jonathan Lange (jml)
Revision history for this message
Robert Collins (lifeless) wrote :

@jml I'm not sure it has - does your tagger nest properly? (That is, if I set up two layers of parallelisation, independently, will we be able to reconstruct it?)

Revision history for this message
Robert Collins (lifeless) wrote :

This is, however, at least minimally fixed today in trunk testrepository.

Changed in testrepository:
status: Triaged → Fix Committed
Revision history for this message
Jonathan Lange (jml) wrote :

I'm not even sure what nesting would mean for ``Tagger`` proper. It just tags things.

The thing in testrepository that comes up with tags for workers can't nest tags properly as is. If it could fetch current tags from the result, that would work.

Revision history for this message
Robert Collins (lifeless) wrote :

I sketched earlier in this bug how it might work: shifting incoming
tags out of the way by prefixing them.

e.g. all seen worker(-X)+ tags get replaced with worker-N\1.

-Rob

Changed in testrepository:
milestone: none → next
Changed in testtools:
milestone: none → next
Changed in testrepository:
assignee: nobody → Benji York (benji)
Changed in testtools:
status: Fix Committed → Fix Released
Changed in testrepository:
status: Fix Committed → Fix Released