Comment 20 for bug 380504

Bryce Harrington (bryce) wrote : Re: [Bug 380504] Re: Handle HTTP Error 502: Bad Gateway automatically

On Thu, Mar 25, 2010 at 02:19:00PM -0000, Gary Poster wrote:
> On Mar 24, 2010, at 6:31 PM, Bryce Harrington wrote:
> > On Wed, Mar 24, 2010 at 08:41:26PM -0000, Leonard Richardson wrote:
> >>
> >> How transient are these transient exceptions? I'm open to the idea of
> >> having launchpadlib deal with 5xx errors by retrying a configurable
> >> number of times using exponential backoff.
> >
> > I generally get them one or two times a day on scripts that are run
> > hourly, so say less than 5% of the time.
>
>
> That production would have daily problems like this is an unpleasant surprise.
>
> When we last looked into these reports, Francis and the LOSAs found that
> the reports coincided with nightly edge updates, deployments and other
> planned downtimes.

Yes, when they pointed that out I noticed a pattern in the times when
the failures occurred. I subsequently adjusted my scripts to not run
during that 1-2 hour period. But there were also some outliers that
didn't fit the pattern.

Then I thought maybe the issues were due to edge being restarted and so
on, so I moved from edge to lpnet. It took some time to update all my
scripts to parameterize this rather than reference the edge URL
directly, but all scripts have been changed over now.

> Have you noticed any potentially helpful patterns for these? For
> instance, are they often at a particular time, or on particular
> launchpadlib calls?

Unfortunately no. Sometimes they happen when setting up the Launchpad
connection initially, but most of the time they occur on some later call
almost arbitrarily. Typically I'll run through a loop making the same
sequence of calls many times before the error triggers.
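
For reference, the retry idea Leonard raises above (retrying 5xx errors a
configurable number of times with exponential backoff) could look roughly
like the sketch below on the script side. This is only an illustration,
not launchpadlib's behaviour: `TransientServerError`, `retry_on_5xx`, and
their parameters are hypothetical names invented here, and the exception
class merely stands in for whatever exception the client library actually
raises on a 502.

```python
import random
import time


class TransientServerError(Exception):
    """Hypothetical stand-in for the exception a client raises on an HTTP error."""

    def __init__(self, status):
        super().__init__("HTTP Error %d" % status)
        self.status = status


def retry_on_5xx(call, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run call(); on a 5xx error, wait and retry with exponential backoff.

    All parameter names and defaults are assumptions made for this sketch.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientServerError as err:
            # Only server-side (5xx) errors are treated as transient.
            if not 500 <= err.status < 600:
                raise
            # Give up once the configured number of attempts is used up.
            if attempt == max_attempts - 1:
                raise
            # The delay doubles each time: base, 2*base, 4*base, ...
            delay = base_delay * (2 ** attempt)
            # A little random jitter keeps parallel scripts from retrying in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

A script would wrap each remote call, for example
`retry_on_5xx(lambda: do_one_lookup())`, where `do_one_lookup` is a
placeholder for one of the calls made inside the loop. With the defaults
above, a call that keeps failing is attempted four times, with waits of
roughly 1, 2, and 4 seconds in between, before the error is re-raised.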

I have a number of scripts which run for long periods of time (due, I
think, to slowness in attachment handling on the launchpadlib side, which
I've reported separately). I might speculate that the longer a
Launchpad connection is kept open, the higher the chance it has of
triggering this error. But for all I know it's just random.

> I suspect that we'll want to ask you to institute httplib2 logging
> again, as Leonard described earlier. Leonard, could you confirm?

Can do. (Might take a few days to collect it though.)

Bryce