App servers dying and leaving stale pidfile
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Launchpad itself | Fix Released | High | Unassigned |
Bug Description
Since the move to Storm, we've noticed that production app servers occasionally die and leave a stale pidfile. Here are the relevant entries from our incident log:
2008-07-09 10:43 UTC - lpnet4 had died and left stale pidfile - restarted
2008-07-07 08:57 UTC - lpnet7 had died and left stale pidfile - restarted
2008-07-07 04:57 UTC - lpnet8 had died and left stale pidfile - restarted
2008-07-05 06:13 UTC - lpnet2 had died and left stale pidfile - restarted
2008-07-02 17:00 UTC - lpnet1 had died and left pidfile hanging around - restarted
This now also appears to affect staging, which is restarted every day as part of the upgrade: for the last two days, the staging restore has failed to start the staging app server because the app server had died and left a stale pidfile.
We can't yet see anything obvious in the logs or nohup files that suggests why the app servers are dying, but I've attached some access logs and app server logs from some of the app servers that died.
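As a stopgap for the failed restarts described above, a startup script could check whether the pid recorded in the pidfile still belongs to a live process and clear the file if not. The following is only a minimal sketch, assuming a POSIX system and a pidfile that contains a single integer pid; the function names `pid_is_running` and `remove_if_stale` are hypothetical and are not part of Launchpad. It does not address why the app servers die in the first place.

```python
import errno
import os


def pid_is_running(pid):
    """Return True if a process with this pid currently exists (POSIX only)."""
    if pid <= 0:
        # 0 and negative values address process groups, not a single process.
        return False
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check, nothing is sent
    except OSError as e:
        if e.errno == errno.ESRCH:
            return False  # no such process
        if e.errno == errno.EPERM:
            return True  # process exists but belongs to another user
        raise
    return True


def remove_if_stale(pidfile_path):
    """Delete the pidfile if it does not name a live process.

    Returns True if a stale pidfile was removed, False otherwise.
    Assumption: an empty or unparsable pidfile is treated as stale.
    """
    try:
        with open(pidfile_path) as f:
            content = f.read().strip()
    except FileNotFoundError:
        return False
    try:
        pid = int(content)
    except ValueError:
        pid = -1  # unparsable content, handled as stale below
    if pid_is_running(pid):
        return False
    os.remove(pidfile_path)
    return True
```

Note that a pid can be reused by an unrelated process after the original one exits, so a check like this can report a stale pidfile as live; it is a convenience for restart scripts, not a reliable liveness test.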
tags: added: canonical-losa-lp
Changed in launchpad-foundations:
status: Incomplete → Fix Released
visibility: private → public
I have observed this when I run make schema: the running instance of Launchpad dies silently.