bzr+http mod_python wsgi issues

Bug #119330 reported by Martin Packman
Affects: Bazaar
Status: Fix Released
Importance: High
Assigned to: Unassigned

Bug Description

This is a bit of a cascade of problems. It started when I tried to adapt to my setup the mod_python config for a bzr+http server documented here:
http://doc.bazaar-vcs.org/bzr.dev/http_smart_server.htm
The server to smarten:
Server: Apache/2.2.4 (Win32) mod_python/3.3.1 Python/2.4.1

However, the first attempt (see the py attachment to follow) yields:
> bzr branch bzr+http://localhost/bzr/stst/ stst.clone
bzr: ERROR: Not a branch: bzr+http://localhost/bzr/stst/

As the logs suggested it did not like the bzr at the start of the path, I attempted to serve from the base instead, but that failed (after creating some of the branch):
> bzr branch bzr+http://localhost/stst/ stst.clone
bzr: ERROR: exceptions.AssertionError: unexpected response code ('error', 'not a bzip2 file')
and a traceback

Hacked in some logging to try and debug the problem, but after finding out a few interesting things (see coming logs), got lost.

Martin Packman (gz) wrote :
Martin Packman (gz) wrote :
Martin Packman (gz) wrote :

Note that RelpathSetter *is* doing its job, it's just getting ignored somewhere else later

Martin Packman (gz) wrote :

And it was doing so well...

Martin Packman (gz) wrote :

Ah, and a comment that might deserve a separate bug, if the answer's not RTFM: some way to turn on the default bzr logging for calls made within the server (without conflicting with the bzr client talking to the server) would have made it much easier to work out what was going on.

Martin Packman (gz)
description: updated
Martin Packman (gz) wrote :

Okay, I have resolved the second issue. First I hacked out some unhelpful catches so I could follow the actual problem:

--- bzrlib/smart/protocol.py 2007-06-04 18:40:51.890625000 +0100
+++ bzrlib-mod/smart/protocol.py 2007-06-08 19:40:41.937500000 +0100
@@ -106,7 +106,7 @@
                     self._send_response(self.request.response)
             except KeyboardInterrupt:
                 raise
-            except Exception, exception:
+            except None: # except Exception, exception: # expose traceback
                 # everything else: pass to client, flush, and quit
                 self._send_response(request.FailedSmartServerResponse(
                     ('error', str(exception))))

--- Lib/tarfile.py 2007-06-08 20:15:22.734375000 +0100
+++ Lib-mod/tarfile.py 2007-06-08 19:39:44.234375000 +0100
@@ -997,7 +997,7 @@

         try:
             t = cls.taropen(tarname, mode, bz2.BZ2File(name, mode, compresslevel=compresslevel))
-        except IOError:
+        except None: # except IOError: # Bad catch, gets non-bz2-read-fail related exceptions
             raise ReadError, "not a bzip2 file"
         t._extfileobj = False
         return t

This gives a nice traceback in the server error log, attached.

Martin Packman (gz) wrote :

Which leads to trying the following in the console:

>>> import tempfile, tarfile
>>> temp = tempfile.NamedTemporaryFile()
>>> tarball = tarfile.open(temp.name, mode='w:bz2')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "C:\Python24\lib\tarfile.py", line 896, in open
    return func(name, filemode, fileobj)
  File "C:\Python24\lib\tarfile.py", line 999, in bz2open
    t = cls.taropen(tarname, mode, bz2.BZ2File(name, mode, compresslevel=compresslevel))
IOError: [Errno 13] Permission denied: 'c:\\windows\\temp\\tmphahxhi'

Patch that works under my setup attached, though not sure of full implications of change. Now uses a non-random-seeking mode of operation, which may or may not matter.
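For anyone following along, here is a minimal sketch of what a "non-random-seeking" approach can look like. It is not the attached patch: the function name `tarball_of_dir` and its arguments are made up for illustration, and it is written for a current Python rather than 2.4. The idea is to hand tarfile the already-open temporary file and use stream mode, so nothing tries to reopen the file by name (which is what fails on Windows while the temp file is still open).

```python
import os
import tarfile
import tempfile


def tarball_of_dir(src_dir, arcname):
    """Write src_dir into a bz2 tarball held in an already-open temp file.

    Hypothetical helper for illustration only, not bzrlib code.
    """
    temp = tempfile.TemporaryFile()
    # 'w|bz2' (pipe, not colon) selects stream mode: tarfile only writes
    # forward into fileobj and never opens a file by name itself.
    tarball = tarfile.open(fileobj=temp, mode='w|bz2')
    try:
        tarball.add(src_dir, arcname=arcname)
    finally:
        tarball.close()
    # Rewind so the caller can read the finished tarball back out.
    temp.seek(0)
    return temp
```

The trade-off is the one mentioned above: a stream-mode archive cannot be seeked around in, which may or may not matter to the caller.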

Martin Packman (gz) wrote :

Right, better write up my resolutions from the other day to the first issue, in case anyone wants to have a look at this at some point.

For simplicity, the walkthrough below uses the path names from the current http_smart_server.htm doc.

So, branch is intended to be:
bzr branch bzr+http://example.com/code/someproject

Where url requested works out as:
http://example.com/code/someproject/.bzr/smart
'--scheme+domain-''---givenpath---''--added--'
                  '-------actualpath---------'
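As a rough illustration of that mapping (the helper name `smart_endpoint` is hypothetical, not something in bzrlib), the client-side URL can be derived like this:

```python
def smart_endpoint(bzr_url):
    """Return the plain HTTP URL a bzr+http branch URL is POSTed to.

    Hypothetical helper, sketched from the description in this report.
    """
    scheme = 'bzr+'
    if not bzr_url.startswith(scheme + 'http'):
        raise ValueError('not a bzr+http URL: %r' % (bzr_url,))
    # Drop the 'bzr+' marker, then add the fixed smart-server suffix.
    http_url = bzr_url[len(scheme):]
    return http_url.rstrip('/') + '/.bzr/smart'


print(smart_endpoint('bzr+http://example.com/code/someproject'))
# prints http://example.com/code/someproject/.bzr/smart
```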

The server location is
/srv/example.com/www/code
'---baselocation---''-----webvisible-----

Note that although the /code/... paths used to access the server correspond to a ./code/... directory in the filesystem in this example, there's no actual need for any *real* 'code' directory in the server's filesystem at all. It could be various Bazaar repos from different locations, all exposed via aliases in the server configuration.

The script location (though again, this could really be anywhere) is given as:
/srv/example.com/scripts/bzr-smart.py

In the script, the config for smart_server_app sets:
    root='/srv/example.com/www/code', prefix='/code/'

This maps the path:
/srv/example.com/www/code
to:
chroot-xxx:///
*and* establishes a prefix that the wsgi server will strip before dealing with request urls.

So, a real request will come in over HTTP in the form:
POST /code/someproject/.bzr/smart
With attached body with instructions for the smart server.
The wsgi script strips the leading "/code/" and the trailing "/.bzr/smart" to give "someproject", then adds that to the chroot to give chroot-xxx:///someproject, the base that request paths are later resolved against.
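The stripping step can be sketched roughly as follows. This is only an illustration under the assumptions of this walkthrough; `strip_relpath` is a made-up name and is not how bzrlib's wsgi code is actually written.

```python
def strip_relpath(path, prefix='/code/', suffix='/.bzr/smart'):
    """Reduce a request path to the branch-relative part.

    Hypothetical helper: removes the configured prefix from the front
    and the smart-server suffix from the back.
    """
    if not path.startswith(prefix) or not path.endswith(suffix):
        raise ValueError('not a smart request path: %r' % (path,))
    return path[len(prefix):len(path) - len(suffix)]


relpath = strip_relpath('/code/someproject/.bzr/smart')
print('chroot-xxx:///' + relpath)
# prints chroot-xxx:///someproject
```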
The problem comes from the strings that are posted (and passed straight through to the bzr server untouched by wsgi). When requests are given with relative paths:
'get\x01.bzr/branch-format'
Join chroot-xxx:///someproject and .bzr/branch-format to form chroot-xxx:///someproject/.bzr/branch-format which reduces to /srv/example.com/www/code/someproject/.bzr/branch-format - which is right.
However, requests are also made with absolute paths from the server url namespace:
'BzrDir.open\x01/code/someproject/'
Join chroot-xxx:///someproject and /code/someproject/ (abs. path clobbers existing part) to make chroot-xxx:///code/someproject/ which reduces to /srv/example.com/www/code/code/someproject/ - which is not right.
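The two cases can be reproduced with plain path joining. This is a sketch only: `resolve` and `to_filesystem` are hypothetical stand-ins for what the chroot transport does, using the paths from this walkthrough.

```python
import posixpath

ROOT = '/srv/example.com/www/code'   # what chroot-xxx:/// maps to
BASE = '/someproject'                # path inside the chroot for this request


def resolve(base_path, request_path):
    """Join a request path onto the base path inside the chroot."""
    # posixpath.join throws away base_path when request_path is absolute,
    # which is the clobbering described above.
    return posixpath.normpath(posixpath.join(base_path, request_path))


def to_filesystem(chroot_path):
    """Map a path inside the chroot back onto the real filesystem."""
    return posixpath.join(ROOT, chroot_path.lstrip('/'))


print(to_filesystem(resolve(BASE, '.bzr/branch-format')))
# prints /srv/example.com/www/code/someproject/.bzr/branch-format
print(to_filesystem(resolve(BASE, '/code/someproject/')))
# prints /srv/example.com/www/code/code/someproject
```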

Note, you can 'fix' this particular example by rolling the root= of the smart_server_app up one level and using an empty prefix. This only works because of the coincidence that the absolute server path maps directly to the filesystem (which is not guaranteed), and it also gives the smart server a jail containing the whole webserver, rather than just the code dir. At any rate, the config as given in the example doc appears simply not to work.

I have a patch that actually fixes this behaviour, but by subclassing chroot and just changing the behaviour of abspath to kill the prefix. Which is cute, but obtuse and really not The Right Thing. Basically, either the smart server needs to understand that it might be passed absolute paths to some scheme it's not using, or the client needs to not issue requests that use absolute pa...
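To illustrate the "kill the prefix" idea only (this is not the patch itself, which subclasses chroot; `resolve_stripping_prefix` is a hypothetical name), the behaviour amounts to something like:

```python
import posixpath


def resolve_stripping_prefix(base_path, request_path, prefix='/code/'):
    """Resolve a request path, treating the URL prefix as the chroot root.

    Hypothetical sketch: an absolute path from the server's URL namespace
    has the configured prefix removed before it is joined.
    """
    if request_path.startswith(prefix):
        # '/code/someproject/' becomes '/someproject/' inside the chroot.
        request_path = '/' + request_path[len(prefix):]
    return posixpath.normpath(posixpath.join(base_path, request_path))


print(resolve_stripping_prefix('/someproject', '/code/someproject/'))
# prints /someproject
print(resolve_stripping_prefix('/someproject', '.bzr/branch-format'))
# prints /someproject/.bzr/branch-format
```

Relative paths are left alone, and absolute ones land back inside the jail instead of doubling up the code directory.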


John A Meinel (jameinel) wrote :

This makes it very difficult to use bzr+http, so I'm marking it High priority.

Basically, I think that once we have a connection to the remote server, all requests (passed paths) should be relative. We may need the ability to allow '..' when searching for a containing repository, etc. But honestly that should be taken care of by the abstraction. In other words, we shouldn't have to search for the remote repository by changing directories, it should be a simple RPC "find the repository".
I believe we already have such a request written, but we are not using it correctly ATM.

Changed in bzr:
importance: Undecided → High
status: New → Triaged
Martin Packman (gz) wrote :

I notice Bug #124089 covers the final issue here, and a fix is now committed? I'll update my local bzr later and test.

Had a patch to http_smart_server doc to simplify the config (remove need for mod_rewrite etc) and recognise mod_python 3.3.1 changes. If bzr+http is now usable, that may be worth committing.

Jelmer Vernooij (jelmer)
Changed in bzr:
status: Triaged → Fix Released