importds and DB authentication

Bug #510490 reported by Steve McInerney
This bug affects 1 person
Affects: Launchpad itself
Status: Fix Released
Importance: High
Assigned to: Michael Hudson-Doyle

Bug Description

The three importd servers seem to be causing problems for oidentd, which is used for DB authentication.

They cause oidentd to spawn large numbers of child processes, which is not good.

We don't see this behaviour anywhere else, except on the importds.

Discussion suggests a possible cause: the importds holding DB connections open for LONG periods of time.

Tags: lp-code qa-ok


Tom Haddon (mthaddon) wrote :

Steve, I've only seen this on galapagos. Have you seen it on the other importd servers as well?

Tom Haddon (mthaddon) wrote :

Hmm, I stand corrected (galapagos seems to be the worst offender, but by no means the only importd server with issues):

importd@galapagos:~$ ps auwwxx | grep oident | wc -l
181

mthaddon@russkaya:~$ ps auwwxx | grep oident | wc -l
101

mthaddon@neumayer:~$ ps auwwxx | grep oident | wc -l
44

For comparison:

mthaddon@gangotri:~$ ps auwwxx | grep oident | wc -l
2
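A note on the counts above: `ps auwwxx | grep oident` usually also matches the `grep oident` process itself, so each figure is probably one higher than the real number of oidentd processes. A minimal sketch of a tighter count, assuming the daemon and its forked children are all named exactly `oidentd` (as with the Debian/Ubuntu package) and a Linux procps `ps`:

```shell
# Print only each process's command name (no header line), then count exact
# whole-line matches for "oidentd". The grep process is named "grep", so it
# is not counted.
# Assumption: the daemon and all of its children run as "oidentd".
# "|| true" keeps the command from failing when nothing matches
# (grep -c still prints 0 in that case).
count=$(ps -eo comm= | grep -cx oidentd || true)
echo "oidentd processes: $count"
```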

Steve McInerney (spm) wrote :

Yeah, seen it on all.

My current working theory (hereafter referred to as "guess") is that this is somehow related to the load issues we're seeing, where galapagos is getting the lion's share.

That theory is somewhat busted by the differing counts between neumayer and russkaya; perhaps if we stab all 3 at the same time and watch from there... :-/

Michael Hudson-Doyle (mwhudson) wrote :

How much of an actual problem is this? Happy to bump the priority but basically have no idea what's happening...

Changed in launchpad-code:
status: New → Incomplete
importance: Undecided → Medium
Tom Haddon (mthaddon) wrote :

I would say this is definitely a high-priority problem: it means a lot of hand-holding for the importd servers. They hit their process-limit alerts in Nagios, so we need to quiesce them, restart oidentd and then set them online again to fix it. Lots of unnecessary manual intervention.

Michael Hudson-Doyle (mwhudson) wrote :

OK, bumping priority then.

The only way I can think of dealing with this is to not talk to the database directly at all from the importd machines, to route everything through the internal xml-rpc server -- luckily I think this is probably a good idea anyway.

Changed in launchpad-code:
status: Incomplete → Triaged
importance: Medium → High
Changed in launchpad-code:
assignee: nobody → Michael Hudson (mwhudson)
milestone: none → 10.03
Tom Haddon (mthaddon) wrote :

Just got a processes alert for galapagos:

importd@galapagos:~$ ps auwwxxx | grep oident | wc -l
216

Michael Hudson-Doyle (mwhudson) wrote :

Well, I spent basically all of today working on the fix :-)

We can look at cherry-picking it, I guess, but it's quite large, so we'll need to test the bejeezus out of it on staging first.

Ursula Junque (ursinha) wrote : Bug fixed by a commit
Changed in launchpad-code:
status: Triaged → Fix Committed
tags: added: qa-needstesting
tags: added: qa-ok
removed: qa-needstesting
Tim Penhey (thumper)
Changed in launchpad-code:
status: Fix Committed → Fix Released