> I've just found these code bits and just wanted to put a quick update
> here. As far as I can see, this mechanism kicks in in the following
> circumstances (_readv in http/__init__.py):
> - errors.ShortReadvError: short read; the range is actually incomplete
> - errors.InvalidRange
> - errors.InvalidHttpRange: when the range headers are malformed.
> However, none of these applies to reading the boundary marker
> (read_boundary in http/response.py) when it is missing.
Correct.
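For reference, a minimal sketch of what raising a dedicated exception in read_boundary-style parsing could look like. The names (InvalidBoundary, read_boundary's signature) are illustrative, not bzrlib's actual API:

```python
# Hypothetical sketch: signal a missing multipart boundary marker with
# a specific exception instead of failing obscurely later on.

class InvalidBoundary(Exception):
    """Raised when the expected multipart boundary marker is absent."""


def read_boundary(line, expected_boundary):
    # Each part of a multipart/byteranges body starts with "--<boundary>".
    marker = "--" + expected_boundary
    if not line.startswith(marker):
        raise InvalidBoundary(
            "expected boundary %r, got %r" % (marker, line))
    return line


# A proxy that rewrites the response would trip this immediately:
try:
    read_boundary("HTTP/1.1 200 OK", "fbd-3ed1")
except InvalidBoundary as e:
    print("degrade to single-range requests:", e)
```

The caller that catches InvalidBoundary can then degrade to single-range requests, mirroring what _readv already does for the three errors listed above.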
> I will first attempt to fit the necessary adaptation there.
Great.
> I'll see if I can use the existing mechanism for issues in
> Content-Range headers (_degrade_range_hint in http/__init__.py).
> According to its documentation, the degradation flag propagates
> whenever the connection object is cloned - not sure when this is
> done, though.
Hmm, yeah, it's a bit unclear and hairy (and there are edge cases where
it's bogus):
- it starts at None,
- it's set once the first multi-range request succeeds (in whatever form
including retries),
- it propagates by cloning (see __init__ in HttpTransportBase).
So if cloning occurs before the first successful request, each clone
has to rediscover it.
A better way would be to make this attribute part of the connection
parameters (which contains only the authentication data so far). Don't
worry about that too much in a first step if you feel it's too complex.
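To make the suggestion concrete, here is a sketch (hypothetical class and attribute names, not bzrlib's actual code) of how sharing the hint through a mutable connection-parameters object makes it visible to every clone, instead of each clone rediscovering it:

```python
# Sketch: the range hint lives in a dict shared *by reference* between
# a transport and its clones, so one degradation is seen by all.

class Transport:
    def __init__(self, base, _params=None):
        self.base = base
        if _params is None:
            # in bzr this dict would also carry the authentication data
            _params = {"range_hint": "multi"}
        self._params = _params

    def clone(self, relpath):
        # pass the *same* dict, not a copy
        return Transport(self.base + relpath, _params=self._params)

    def degrade_range_hint(self):
        self._params["range_hint"] = "single"


root = Transport("http://example.com/")
child = root.clone("branch/")
child.degrade_range_hint()
print(root._params["range_hint"])  # "single": the root sees it too
```

The key design point is that clone() hands over the same dictionary object rather than copying it, so the hint behaves like a per-connection property rather than a per-transport one.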
Anyway, the place where you should detect the bug and raise a specific
exception is probably http/response.py; catching it should occur
where the hint is handled.
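The overall detect-degrade-retry shape being described might look like the following. All names here are illustrative stand-ins, not the real bzrlib functions:

```python
# Sketch of the pattern: the response parser raises a specific
# exception; the _readv-level caller catches it, degrades the hint
# once, and retries with individual single-range requests.

class InvalidHttpResponse(Exception):
    pass


def readv(ranges, request, state):
    try:
        if state["range_hint"] == "multi":
            return request(ranges)              # one multi-range request
        return [request([r])[0] for r in ranges]
    except InvalidHttpResponse:
        state["range_hint"] = "single"          # degrade exactly once...
        return [request([r])[0] for r in ranges]  # ...then retry


# Fake request function standing in for the real transport; it mimics
# a proxy that mangles multi-range (multipart) replies only.
def fake_request(ranges):
    if len(ranges) > 1:
        raise InvalidHttpResponse("proxy mangled the multipart reply")
    return ["data-%d-%d" % ranges[0]]


state = {"range_hint": "multi"}
print(readv([(0, 10), (20, 10)], fake_request, state))
# -> ['data-0-10', 'data-20-10']
print(state["range_hint"])  # -> single
```

As discussed below, the cost is one failed request per client before the hint flips; after that, every further request goes out as a single range over the kept-alive connection.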
> Even if it doesn't propagate and we're not doing requests in
> parallel, we'll be using the same HTTP request object for the
> remaining requests --- and that's the one who's keeping the HTTP
> connection open.
You mean connection, not request, here, right? The connection attribute
in the request object is the one that is shared between the transports,
and this is achieved via the request objects, but we create a request
object for... each request.
>> This is also bad for the client and in the end for the user as it
>> means the connection will have to be re-established incurring
>> latency penalty.
> Regarding latency I think flag on fail is a good solution. A user
> not affected by evil proxies will not take a performance hit. For
> the rest of us this re-establishment will happen exactly once per
> HTTP client object and then the degraded mode gets activated and
> successive requests over the same connection will request
> individual ranges and the connection will be kept alive. This is
> definitely preferable over a broken checkout. :)
Yes, that's the idea.
> Regarding a config option I have a few reasons to refrain from
> doing so:
> 1. For an ordinary user it will be very difficult to determine
> whether he's affected by this issue or not. He'd have to be aware
> of a transparent proxy being in place and he'd have to be aware
> that those bazaar requests are quite uncommon. Most of the web
> works very well without HTTP multipart responses. ;)
But it's almost always used by bzr when supported by the server. And
once it's established that a server is affected, the option can be set
in such a case.
> 2. Affected are only users who fetch with bazaar through
> HTTP. That means that, amongst the people behind evil proxies, only
> people are hit that are *not* fetching sources using typical
> developer transport connections (like https or ssh).
Oh, you're right, https is immune!
> Most projects offer .tar.gz's and you'd ordinarily download
> these. I only stumbled over the issue myself because I was
> writing a live ebuild for Gentoo that checks out another project
> using bazaar.
> 3. /me would have to fiddle with two probably quite distant (in
> terms of cross references) parts of bazaar. I leave that to you
> pros. :-P
No problem with that. Correctly detecting the behavior is the most
important part, we can help you for the rest or even handle it
ourselves.
>>>>> Miriup writes:
<snip/>