selftest --parallel on windows fails with 'lost connection'

Bug #551332 reported by Gordon Tyler
This bug affects 1 person
Affects: Bazaar
Status: Fix Released
Importance: Medium
Assigned to: Gordon Tyler
Milestone: 2.2b3

Bug Description

When I run selftest on bzr.dev using --parallel=subprocess, the tests run but I get 8 errors (8 CPUs) at the end, all complaining about a lost connection.

Command line: bzr selftest --no-plugins --parallel=subprocess cmdline commands diff rules win32utils

Example error:

======================================================================
ERROR: bzrlib.tests.blackbox.test_cat.TestCat.test_cat_different_id
----------------------------------------------------------------------
_StringException: lost connection during test 'bzrlib.tests.blackbox.test_cat.TestCat.test_cat_different_id'

I can run the same command without the --parallel option and it completes normally.

OS: Windows 7 64-bit

C:\dev\bzr\bzr.dev>bzrdev version
Bazaar (bzr) 2.2.0dev1
  from bzr checkout C:/dev/bzr/bzr.dev
    revision: 5121
    revid: <email address hidden>
    branch nick: bzr.dev
  Python interpreter: C:\Python26\python.exe 2.6.4
  Python standard library: C:\Python26\lib
  Platform: Windows-post2008Server-6.1.7600
  bzrlib: C:\dev\bzr\bzr.dev\bzrlib
  Bazaar configuration: C:\Users\Owner\AppData\Roaming\bazaar\2.0
  Bazaar log file: C:\Users\Owner\Documents\.bzr.log

Tags: selftest win32

Gordon Tyler (doxxx)
description: updated
Revision history for this message
Martin Pool (mbp) wrote :

I wonder, just as a stab in the dark, whether this has to do with Windows' quirky use of ECONNRESET vs. file closed.

tags: added: selftest win32
Changed in bzr:
importance: Undecided → Low
status: New → Confirmed
Revision history for this message
Gordon Tyler (doxxx) wrote :

As a semi-consequence of this, I am unable to reasonably run the full test suite for bzr. Running it non-parallel takes many hours. Tests are good but not if you can't run them.

Revision history for this message
Martin Pool (mbp) wrote :

I'll bump this up a bit then.

Changed in bzr:
importance: Low → Medium
Revision history for this message
Martin Pool (mbp) wrote :

Can you get a traceback for this at all? Maybe through BZR_PDB=1?

Revision history for this message
Gordon Tyler (doxxx) wrote :

I'm not sure I used BZR_PDB=1 correctly. I set it in my command shell's environment and then ran the test again. No perceivable difference in output, nothing in bzr.log.

Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 551332] Re: selftest --parallel on windows fails with 'lost connection'

On Wed, 2010-03-31 at 02:49 +0000, Gordon Tyler wrote:
> I'm not sure I used BZR_PDB=1 correctly. I set it in my command shell's
> environment and then ran the test again. No perceivable difference in
> output, nothing in bzr.log.

BZR_PDB=1 won't do much, because bzr isn't failing - the exceptions are
being caught internally.

The key questions are:
 - are the subprocesses generating valid subunit streams
 - or is the parent getting confused.

Could you try:
 bzr selftest --parallel=subprocess test_selftest

Which will run a small number of pretty portable tests, but should
generate a reasonable sized stream.

If it fails in the same way, I'm going to suspect the parent side. If it
fails differently, the subprocesses.

You could hack tests/__init__.py to decorate the subprocess streams with
something that will record the output to a file, which we can look at
later.

-Rob
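
A minimal sketch of the kind of recording decorator suggested above. The class name RecordingStream and the in-memory stand-ins are hypothetical, not part of bzrlib or subunit; a real run would wrap each child's stdout pipe and log to a file on disk.

```python
import io


class RecordingStream(object):
    """Hypothetical wrapper: passes reads through and copies the bytes to a log."""

    def __init__(self, stream, log):
        self._stream = stream
        self._log = log

    def read(self, size=-1):
        data = self._stream.read(size)
        self._log.write(data)
        return data

    def readline(self):
        line = self._stream.readline()
        self._log.write(line)
        return line

    def __iter__(self):
        # Yield lines until readline() returns the empty bytes sentinel (EOF).
        return iter(self.readline, b"")

    def close(self):
        self._stream.close()


# Stand-in for a child's stdout pipe (assumption: the real stream is binary).
child_output = io.BytesIO(b"test: example\nsuccess: example\n")
log = io.BytesIO()
recorded = RecordingStream(child_output, log)
lines = list(recorded)
```

Because the log receives exactly the bytes the parent read, it can be inspected afterwards for stray characters that the parser would choke on.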

Revision history for this message
Gordon Tyler (doxxx) wrote :

On 3/30/2010 11:14 PM, Robert Collins wrote:
> Could you try:
> bzr selftest --parallel=subprocess test_selftest
>
> Which will run a small number of pretty portable tests, but should
> generate a reasonable sized stream.
>
> If it fails in the same way, I'm going to suspect the parent side. If it
> fails differently, the subprocesses.

It fails the same way.

> You could hack tests/__init__.py to decorate the subprocess streams with
> something that will record the output to a file, which we can look at
> later.

Running the test generates a lot of what looks like subunit streams to
stdout. I've redirected stdout to a file and I'll upload it once
Launchpad becomes writable again.

Ciao,
Gordon

Revision history for this message
Gordon Tyler (doxxx) wrote :

I've attached the stdout of that command.

Revision history for this message
Gordon Tyler (doxxx) wrote :

So I think I found something which might be related. On win32, a '\n' written to sys.stdout in Python becomes '\r\n'. The HTTP chunked encoder in subunit writes lines ending in '\r\n' to its output, which in the case of --parallel=subprocess is each subprocess's stdout, so they come out as '\r\r\n'.

Would this affect the parsing of the subprocesses' subunit streams by the parent process?
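
The doubling described above can be reproduced on any platform with a small sketch. It assumes that an io.TextIOWrapper opened with newline="\r\n" is a fair stand-in for a Windows text-mode stdout; the sample payload is made up and only imitates chunk-style framing.

```python
import io

raw = io.BytesIO()
# newline="\r\n" imitates Windows text mode: every "\n" written becomes "\r\n".
text_mode_stdout = io.TextIOWrapper(raw, encoding="ascii", newline="\r\n")

# The writer already terminates its lines with "\r\n", as a chunked encoder would.
text_mode_stdout.write("5\r\nhello\r\n")
text_mode_stdout.flush()

doubled = raw.getvalue()
```

The bytes that reach the pipe contain '\r\r\n' where the writer intended '\r\n', so a reader that expects exact framing sees an extra byte on every line.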

Revision history for this message
Gordon Tyler (doxxx) wrote :

I found this recipe for forcing stdout into binary mode on win32: http://code.activestate.com/recipes/65443-sending-binary-data-to-stdout-under-windows/

import sys

if sys.platform == "win32":
    import os, msvcrt
    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)

Revision history for this message
Gordon Tyler (doxxx) wrote :

Another solution is to run each python subprocess with the '-u' flag for unbuffered stdout/stdin, which apparently disables the \n -> \r\n conversion.
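
A rough sketch of launching a child interpreter with '-u', assuming sys.executable really is a Python interpreter. The child code string is a placeholder, not what bzr actually runs. Note that the binary-mode side effect of '-u' on Windows is the Python 2 behaviour this thread relies on; under Python 3 the text layer may still translate newlines.

```python
import subprocess
import sys

# Placeholder child program; a real run would start the test runner instead.
child_code = "import sys; sys.stdout.write('ok\\n')"

process = subprocess.Popen(
    [sys.executable, "-u", "-c", child_code],  # -u: unbuffered standard streams
    stdout=subprocess.PIPE,
)
output, _ = process.communicate()
```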

Changed in bzr:
assignee: nobody → Gordon Tyler (doxxx)
status: Confirmed → In Progress
Revision history for this message
Gordon Tyler (doxxx) wrote :

It looks like the end-of-line corruption was indeed the problem. Merge proposal incoming.

Revision history for this message
Martin Pool (mbp) wrote : Re: [Bug 551332] Re: selftest --parallel on windows fails with 'lost connection'

OK, we should probably do that.

There's another bug about the .. unattractiveness of mandating a mix
of lf and crlf.

--
Martin <http://launchpad.net/~mbp/>

Revision history for this message
Martin Pool (mbp) wrote :

The problem is bug 505078.

--
Martin <http://launchpad.net/~mbp/>

Revision history for this message
Gordon Tyler (doxxx) wrote :

As pointed out by bialix in the merge proposal, the '-u' option doesn't work for bzr.exe. I'll investigate the msvcrt hack as a replacement.

Revision history for this message
Alexander Belchenko (bialix) wrote : Re: [Bug 551332] Re: selftest --parallel on windows fails with 'lost connection'

Gordon Tyler wrote:
> As pointed out by bialix in the merge proposal, the '-u' option doesn't
> work for bzr.exe. I'll investigate the msvcrt hack as a replacement.

It's not a hack. We're using the same approach in commands.py
Commands._setup_outf (or at least that code was there in the past).

--

All the dude wanted was his rug back

Revision history for this message
Alexander Belchenko (bialix) wrote :

Gordon Tyler wrote:
> As pointed out by bialix in the merge proposal, the '-u' option doesn't
> work for bzr.exe. I'll investigate the msvcrt hack as a replacement.

There is no need to replace -u with msvcrt-based code. bzr.exe is not a
Python interpreter, so the --parallel option should simply be disabled (or
at least produce a meaningful error) when selftest is run from bzr.exe.

--

All the dude wanted was his rug back

Revision history for this message
Gordon Tyler (doxxx) wrote :

If selftest is meant to work with bzr.exe why should all its options not work as well?

Revision history for this message
Alexander Belchenko (bialix) wrote :

Gordon Tyler wrote:
> If selftest is meant to work with bzr.exe why should all its options not
> work as well?

I don't know how the --parallel option is supposed to work. But right now,
IIUC, your patch assumes that sys.executable is a Python interpreter.
bzr.exe does not ship with a python.exe interpreter executable, only the
pythonXY.dll interpreter library.

So now you tell me: "why should all its options not work as well?" ;-)

--

All the dude wanted was his rug back
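
One way to produce the meaningful error suggested earlier in the thread, sketched under assumptions: the helper name child_command is hypothetical, and the check relies on py2exe-style frozen executables setting sys.frozen, which should be verified for the actual bzr.exe build.

```python
import sys


def child_command(script_args, executable=sys.executable, frozen=None):
    """Hypothetical helper: build the argv for a '-u' child interpreter."""
    if frozen is None:
        # Assumption: frozen builds such as py2exe ones set sys.frozen.
        frozen = getattr(sys, "frozen", False)
    if frozen:
        # A frozen executable is not a Python interpreter and cannot take -u.
        raise RuntimeError(
            "parallel subprocess mode needs a Python interpreter, "
            "but %r is a frozen executable" % (executable,))
    return [executable, "-u"] + list(script_args)
```

Calling child_command(["run_tests.py"]) from a normal interpreter returns an argv list, while a frozen build fails early with an explanation instead of a confusing 'lost connection' later.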

Revision history for this message
Gordon Tyler (doxxx) wrote :

I understand the problem. My point is that if we can make it work for bzr.exe using the msvcrt method, why not do it? If we're going to start dropping features from selftest when run using bzr.exe, it makes me wonder why selftest is available in bzr.exe at all.

Revision history for this message
Alexander Belchenko (bialix) wrote :

Gordon Tyler wrote:
> I understand the problem. My point is that if we can make it work for
> bzr.exe using the msvcrt method, why not do it? If we're going to start
> dropping features from selftest when run using bzr.exe, it makes me
> wonder why selftest is available in bzr.exe at all.

The full test suite of bzr itself does not pass, and does not even work correctly, under bzr.exe.

But! Running the tests for most plugins does work. I use bzr.exe while working on several plugins,
and being able to use selftest is a big win.

So please keep bzr.exe selftest working, but don't spend too much time trying to fix it. Almost
nobody needs it for bzr.exe itself. Some people need it for plugins, but complicated things like
--parallel usually aren't required there. I could be wrong, of course.

--
All the dude wanted was his rug back

Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 551332] Re: selftest --parallel on windows fails with 'lost connection'

I think it would be great to have --parallel work with bzr.exe, but like
Alexander I don't think it's hugely important. Gordon, I'd say: if you're
interested in making it work, great. If not, that's OK too!

Vincent Ladeuil (vila)
Changed in bzr:
milestone: none → 2.2b3
status: In Progress → Fix Released