Launchpad itself

Comment 4 for bug 427990

Stuart Bishop (stub) wrote on 2009-09-12:

The first reference I can find is this from the PostgreSQL master's log file:

<@:24515> 2009-09-10 20:23:19 BST LOG: server process (PID 19692) was terminated by signal 9: Killed

Looks like somebody kill -9'd a backend process, which is bad: when a backend process terminates without cleanup, PostgreSQL has to restart everything because shared resources are left in an unknown state. It's basically an instant outage button.

The OOPS reports are not useful to us, as we now log an OOPS for a disconnection error before we reconnect and retry the transaction. Seeing thousands of OOPSes like this when the database is in recovery mode is normal. We may want to reconsider this decision.

The appservers should reconnect once the database is back on its feet. We even have tests for this behavior, so I'm not sure what is going wrong there.
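
For reference, the log-then-reconnect-and-retry pattern described above can be sketched roughly as follows. This is only an illustration, not Launchpad's actual code: DisconnectionError, run_with_reconnect, the reconnect callback and max_attempts are all hypothetical names, and a real appserver would catch the database driver's own disconnection exception instead.

```python
import logging

logger = logging.getLogger("oops")


class DisconnectionError(Exception):
    """Hypothetical stand-in for the driver's lost-connection error."""


def run_with_reconnect(transaction, reconnect, max_attempts=3):
    """Run transaction(); on a disconnection, log it, reconnect, and retry.

    transaction and reconnect are caller-supplied callables (assumed names).
    The last failure is re-raised once max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DisconnectionError as exc:
            # The report is logged before the retry, which is why a database
            # in recovery mode produces one report per failed attempt.
            logger.warning(
                "disconnected (attempt %d/%d): %s", attempt, max_attempts, exc
            )
            if attempt == max_attempts:
                raise
            reconnect()
```

Because the report is written before the retry, every failed attempt leaves an entry even when a later attempt succeeds. Logging only when the final attempt fails would be one way to cut the noise, at the cost of hiding transient disconnections.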