Comment 31 for bug 380504

Gary Poster (gary) wrote :

The following concerns the underlying problem of the frequent 502 errors, rather than the lazr.restfulclient changes we've been discussing, which I still favor.

James offered yesterday to set up a dedicated launchpadlib instance to catch the errors if I could tell him what logs to capture. I had in mind the logs that Leonard had suggested in comment #2, but I wanted to verify that these would be useful, so I said I'd get back to him.

I then talked with Francis about his earlier investigation described in comment #10, and what we might be able to investigate now. He proposed a reasonable hypothesis, which we could test.

HAProxy will queue requests for up to 30 seconds, and will send no more than 8 at once to the appservers. If the appservers are busy, HAProxy may therefore drop requests. This could be the cause of the 502 errors.

Things to test for this hypothesis that come to mind:
- does a 502 typically indicate HAProxy complaining, and 503 indicate Apache complaining? Can we find logs for HAProxy and Apache that correlate errors there with errors reported in the clients?
- when clients get 502s and 503s, are they typically accompanied by request times of >30 seconds?
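
The second check could be sketched roughly as follows, assuming the client-side logs can be reduced to (status, elapsed seconds) pairs. The record format, the function name, and the sample data are all made up for illustration; only the 30-second figure comes from the hypothesis above.

```python
from collections import defaultdict

# Taken from the 30-second HAProxy queue limit in the hypothesis above.
QUEUE_TIMEOUT_SECONDS = 30.0


def summarize_by_status(records, threshold=QUEUE_TIMEOUT_SECONDS):
    """Return {status: (total, slow)}; slow counts requests over threshold.

    `records` is an iterable of (http_status, elapsed_seconds) pairs,
    a hypothetical format, not something launchpadlib emits today.
    """
    summary = defaultdict(lambda: [0, 0])
    for status, elapsed in records:
        summary[status][0] += 1
        if elapsed > threshold:
            summary[status][1] += 1
    return {status: tuple(counts) for status, counts in summary.items()}


# Invented sample data, only to show the shape of the result.
sample = [(200, 0.4), (502, 30.2), (502, 31.0), (503, 1.1), (200, 2.5)]
for status, (total, slow) in sorted(summarize_by_status(sample).items()):
    print(status, total, slow)
```

If most 502s land in the "slow" column and 503s do not, that would be consistent with the queue-timeout hypothesis, though it would not prove it without the server-side logs.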

Therefore, for client-side logs for this problem, I'd be inclined to get the logs that Leonard requested (which is what wgrant shows in comment #4, I think) along with logs of how long each request took (which I do not see in that example output). Is that reasonable, and reasonably easy to gather?
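
For the timing part, something as small as the following might be enough. This is only a sketch assuming Python 3 (for `time.monotonic`); the `timed_call` helper and the logger name are hypothetical, and it wraps an arbitrary callable rather than any specific launchpadlib API.

```python
import logging
import time

log = logging.getLogger("client_timing")  # hypothetical logger name


def timed_call(label, func, *args, **kwargs):
    """Call func, log how long it took, and re-raise any exception."""
    start = time.monotonic()
    outcome = "ok"
    try:
        return func(*args, **kwargs)
    except Exception as exc:
        # Record the failure so errors can be matched to their durations.
        outcome = "error: %r" % (exc,)
        raise
    finally:
        elapsed = time.monotonic() - start
        log.info("%s took %.3fs (%s)", label, elapsed, outcome)
```

Wrapping each client request in a call like `timed_call("some label", some_request_function)` would put the duration next to any error in the same log line, which is the correlation we are missing.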

Additionally and separately, we should investigate HAProxy logs and see what we can discover.