cannot determine order of test execution in a parallel worker

Bug #974622 reported by Gary Poster
This bug affects 1 person
Affects          Status        Importance  Assigned to     Milestone
Testrepository   Fix Released  High        Benji York
testtools        Fix Released  Medium      Jonathan Lange

Bug Description

In working on making the Launchpad tests work with the testr run --parallel feature, we have encountered many tests that fail intermittently. Many of these have turned out to be tests that fail or succeed depending on the order in which they run. For a given intermittently failing test, some previous test is not isolated: it leaves the environment in an unclean state. In some cases a previous test must run before another test succeeds; more often a previous test causes a following test to fail.

While the tests run, the order of execution is available via one file per process in a temporary directory. When testr runs successfully to completion, these files are deleted.

In the same way that one can interrogate testr for the tests that failed, I would like to be able to interrogate it for the ordered collection of tests that included the failing test. Given that, we should be able to run those tests again. It gives an easy way to verify whether or not test ordering affects the outcome, and a reduced starting set of tests for determining which ones are affected.

summary: - --parallel should provide easy access to the order in which tests were
- run
+ cannot determine order of test execution in a parallel worker
Changed in testrepository:
status: New → Triaged
importance: Undecided → High
Revision history for this message
Robert Collins (lifeless) wrote :

I'm not sure that the order is available during execution, as the workers stream directly into the multiplexer.

I'd solve this by tagging tests coming in with a worker id - the sequence is still implicit within a worker.

One special case would be to identify tests already tagged this way, and use a nested tag (this will handle nested parallelisation where each parallel worker is a separate machine which internally parallelises further).

e.g. the logic would be something like:
when starting a parallel worker: add tag worker-%d
if an incoming tag of worker-%d is seen, translate it to worker-%d-%d, where the leftmost %d is the id of this worker and the rightmost %d is the one from the incoming stream. (And make this generic, so worker-%d-%d -> worker-%d-%d-%d too.)
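
To make that rule concrete, here is a minimal Python sketch of the translation; the function name and regex below are illustrative only, not testrepository code:

    import re

    # Matches worker-1, worker-1-3, worker-1-3-2, ...
    WORKER_TAG = re.compile(r'^worker(-\d+)+$')

    def translate_worker_tags(worker_id, incoming_tags):
        """Return the tags this worker should emit for a test.

        Incoming worker-N (or worker-N-M...) tags from a nested stream are
        prefixed with this worker's id; a plain worker-<worker_id> tag is
        always added so every test is attributable to this worker.
        """
        translated = set()
        for tag in incoming_tags:
            if WORKER_TAG.match(tag):
                # e.g. worker-3 seen inside worker 1 becomes worker-1-3.
                translated.add('worker-%d%s' % (worker_id, tag[len('worker'):]))
            else:
                translated.add(tag)
        translated.add('worker-%d' % worker_id)
        return translated

    # translate_worker_tags(1, {'worker-3'}) == {'worker-1', 'worker-1-3'}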

Revision history for this message
Gary Poster (gary) wrote :

At least in our case, the order is available during execution by looking at files in the temp directory. These are the files passed to bin/test --load-list. We can duplicate the test run outside of the parallel machinery by getting the appropriate file and using --load-list. This has provided a valuable analysis tool, and it is the use case I want to support.

Your approach certainly sounds more general, and could be the basis of nice general purpose support. As you'd expect, a full solution for our own use case needs to provide easy access to the filtered list of tests for a particular worker in a format that we can then use bin/test --load-list on.

Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 974622] Re: cannot determine order of test execution in a parallel worker

You have --shuffle in use, so --load-list does not determine the order
of execution. Certainly for other test runners, or for test runs
without --shuffle, you could use the load-list files as an
approximation, *as long as* the test runner in question cooperates.
For your scenario here though, of being able to reconstruct what an
arbitrary run did, you can't depend on them. (Due to shuffle).

> Your approach certainly sounds more general, and could be the basis of
> nice general purpose support.  As you'd expect, a full solution for our
> own use case needs to provide easy access to the filtered list of tests
> for a particular worker in a format that we can then use bin/test
> --load-list on.

Yes, you'd be able to filter the run down quite easily I think.
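
As a rough illustration of the filtering in question (not testrepository code; it assumes you have already extracted ordered (test id, tags) pairs from the subunit stream):

    def tests_for_worker(test_tags, worker_tag):
        """Filter ordered (test_id, tags) pairs down to one worker's tests.

        test_tags: iterable of (test_id, set_of_tags) pairs in stream order.
        worker_tag: e.g. 'worker-0'.
        Returns the test ids in order, ready to be written to a file for a
        --load-list style re-run.
        """
        return [test_id for test_id, tags in test_tags if worker_tag in tags]

    # Example: write worker 0's tests out for an ordered re-run.
    ordered = tests_for_worker(
        [('test_a', {'worker-0'}), ('test_b', {'worker-1'}),
         ('test_c', {'worker-0'})],
        'worker-0')
    with open('worker-0.list', 'w') as f:
        f.write('\n'.join(ordered) + '\n')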

Revision history for this message
Gary Poster (gary) wrote :

ack Robert, sounds good.

Benji and I looked at this today and it looks like the cleanest place to put this in is testtools' ConcurrentTestSuite. Benji is adding tests and implementation there and will propose an MP.
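
For reference, a rough sketch of what that could look like from the runner's side; it assumes ConcurrentTestSuite gains a hook (called wrap_result below) for wrapping each worker's result, and that Tagger decorates a result with extra tags - both the hook name and the exact signatures should be checked against the testtools release containing the fix:

    import unittest

    from testtools import ConcurrentTestSuite, Tagger

    def split_suite(suite):
        # Hypothetical splitter: run each child suite in its own worker.
        return list(suite)

    def wrap_result(result, worker_number):
        # Tag everything from this worker so its ordering can be
        # reconstructed from the result stream afterwards.
        return Tagger(result, set(['worker-%d' % worker_number]), set())

    suite = unittest.TestLoader().discover('.')
    concurrent = ConcurrentTestSuite(suite, split_suite, wrap_result=wrap_result)
    # concurrent.run(result)  # where result is a tag-aware TestResult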

Revision history for this message
Jonathan Lange (jml) wrote :

All of the necessary changes for implementing this with testtools have been put in place.

Changed in testtools:
status: New → Fix Committed
importance: Undecided → Medium
assignee: nobody → Jonathan Lange (jml)
Revision history for this message
Robert Collins (lifeless) wrote :

@jml I'm not sure it has - does your tagger nest properly? (That is, if I set up two layers of parallelisation, independently, will we be able to reconstruct it?)

Revision history for this message
Robert Collins (lifeless) wrote :

This is, however, at least minimally fixed today in trunk testrepository.

Changed in testrepository:
status: Triaged → Fix Committed
Revision history for this message
Jonathan Lange (jml) wrote :

I'm not even sure what nesting would mean for ``Tagger`` proper. It just tags things.

The thing in testrepository that comes up with tags for workers can't nest tags properly as is. If it could fetch current tags from the result, that would work.

Revision history for this message
Robert Collins (lifeless) wrote :

I sketched earlier in this bug how it might work: shifting incoming
tags out of the way by prefixing them.

e.g. all seen worker(-X)+ tags get replaced with worker-N\1.

-Rob

Changed in testrepository:
milestone: none → next
Changed in testtools:
milestone: none → next
Changed in testrepository:
assignee: nobody → Benji York (benji)
Changed in testtools:
status: Fix Committed → Fix Released
Changed in testrepository:
status: Fix Committed → Fix Released