production loggerhead branch leaks memory

Bug #156453 reported by Max Kanat-Alexander
20
This bug affects 1 person
Affects Status Importance Assigned to Milestone
loggerhead
Fix Released
Critical
Max Kanat-Alexander

Bug Description

I'm running mwhudson's "production" branch of loggerhead on my server. I have three auto-published folders and one set of non-auto-published folders.

After running for over a week, the process was consuming 1.1GB of RAM and bzr had become so slow on that server that I couldn't do remote checkouts. (Possibly because the server only has 2GB of RAM and it was swapping.)

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

I've observed this too. Our solution has been to restart loggerhead every so often :/

Roughly how many branches total do you publish (i.e. how many auto-published branches are there)?

Revision history for this message
Max Kanat-Alexander (mkanat) wrote :

I auto-publish three folders for a total of six branches, and I directly publish three more branches. The install is at http://bzr.everythingsolved.com/ when it's working.

Changed in loggerhead:
importance: Undecided → Critical
status: New → Confirmed
Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

I think revision 156 should fix this. Sorry for taking so long to achieve anything here!

Changed in loggerhead:
status: Confirmed → Fix Committed
Revision history for this message
Max Kanat-Alexander (mkanat) wrote : Re: [Bug 156453] Re: production loggerhead branch leaks memory

 Thank you so much, you rock. :-) I'll let you know if there are
still any memory leaks that I notice.

 -Max

On Mon, 21 Apr 2008 04:58:18 -0000 Michael Hudson
<email address hidden> wrote:
> I think revision 156 should fix this. Sorry for taking so long to
> achieve anything here!
>
> ** Changed in: loggerhead
> Status: Confirmed => Fix Committed
>

--
http://www.everythingsolved.com/
Competent, Friendly Bugzilla and Perl Services. Everything Else, too.

Revision history for this message
Max Kanat-Alexander (mkanat) wrote :

  Hm, it's a lot better now but the process is still 650MB after running for a few days. Is that just the nature of TurboGears, or can it actually be fixed?

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

I think it's a kid thing, really: when it renders large, complicated pages it just chews ram for breakfast. My change made the rendering of large diffs much simpler, so I suspect the main driver of memory increases is now the annotation of large files. But I haven't profiled this yet. If you want to confirm, try restarting loggerhead, note the memory usage, find the largest file in the branches you are publishing in terms of lines of code, annotate that and check loggerhead's usage again.

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

I should also say that I have no evidence that this is a leak in the normal sense: it's just that the RAM isn't being returned to the system after it's being used, so loggerhead nearly always ends up using as much memory as required to process the largest file it's rendered so far -- which can be a ludicrous amount, thanks to kid, but if you have enough ram to annotate the largest file you have, you should be ok.

Revision history for this message
Max Kanat-Alexander (mkanat) wrote :

Hmm. I just loaded the largest file in my repo, and loggerhead is only using 184MB. But when I killed it earlier today, it was using 641MB.

The issue is that other services are running on this box. Is it just the nature of Python (similar to Perl) to never return memory to the system?

Revision history for this message
Robert Collins (lifeless) wrote :

On Fri, 2008-05-02 at 06:09 +0000, Max Kanat-Alexander wrote:
> Hmm. I just loaded the largest file in my repo, and loggerhead is only
> using 184MB. But when I killed it earlier today, it was using 641MB.
>
> The issue is that other services are running on this box. Is it just the
> nature of Python (similar to Perl) to never return memory to the system?

Older pythons had that behaviour, but it is largely fixed these days.
There are fundamental bugs in loggerhead but some progress is being
made; it will unfortunately change the runtime dependencies, as the
template engine is one of the culprits causing the bloat.

-Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.

Revision history for this message
Martin Albisetti (beuno) wrote :

Max, Loggerhead has had a lot of work done in the past weeks, and we've changed quite a lot of the code base, including removing turbogears, cherrypy and KID in exchange for python-paste + python-simpletal.

It uses a *lot* less resources in general. Could you update your instances and give us some feedback?

Thanks!

Revision history for this message
Max Kanat-Alexander (mkanat) wrote :

Okay, with mwhudson's help, I got trunk working, and at the moment (after running for several hours) it's consuming 469M RES and 820M VIRT (according to top, at least). It has grown in the past few hours, but the growth seems to be logarithmic this time instead of linear (like the last one was)--it's growing less over time. It's still huge (the biggest process on my machine by far, unless you count all my mod_perl httpds combined), but it at least seems a little less leaky. I'm slightly concerned, though, because it's about 250MB when it starts, and 469MB now.

However, it's definitely much faster. :-)

Revision history for this message
Max Kanat-Alexander (mkanat) wrote :

Sadly, today my installation was using 1.2GB of RAM, larger than the old one ever got.

Changed in loggerhead:
status: Fix Committed → Confirmed
Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

Yes, we're still having issues too. Grr!

It is interesting that you are having them too though, as one of the excuses we can use is the very large number of branches we serve.

I guess it's time to enter the land of python memory debuggers...

Revision history for this message
Paul Hummer (rockstar) wrote :

We started work on finding some of the issues with Loggerhead during the Epic, I think we have a good idea what needs to be fixed.

Changed in loggerhead:
assignee: nobody → rockstar
status: Confirmed → Triaged
Paul Hummer (rockstar)
Changed in loggerhead:
status: Triaged → In Progress
Revision history for this message
Paul Hummer (rockstar) wrote :

A profiling branch has landed in Loggerhead trunk, which we've been using to profile Loggerhead. I have a branch that uses a memory middleware to monitor memory usage. We believe this issue has to do with how we generate a log for a file. If we can use bzr's way of doing it, I think we can benefit from their optimizations.

Revision history for this message
Robert Collins (lifeless) wrote :

On Thu, 2008-12-04 at 15:34 +0000, Paul Hummer wrote:
> A profiling branch has landed in Loggerhead trunk, which we've been
> using to profile Loggerhead. I have a branch that uses a memory
> middleware to monitor memory usage. We believe this issue has to do
> with how we generate a log for a file. If we can use bzr's way of doing
> it, I think we can benefit from their optimizations.

Where is your memory middleware branch? (Can you land it in trunk too?)

-Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.

Revision history for this message
Martin Albisetti (beuno) wrote :

Partially addressed in revno 252, monitoring it's performance on Launchpad to see if it's reduced it significantly.

Revision history for this message
Pete Deremer (sportman1280-deactivatedaccount) wrote :

I am experiencing this in version 1.10. Over time... it more than doubles the amount of memory it takes and never seems to go back down. :(

Paul Hummer (rockstar)
Changed in loggerhead:
assignee: rockstar → nobody
status: In Progress → Triaged
Revision history for this message
Andrea Corbellini (andrea.corbellini) wrote :

Maybe there are some circular references and not all unused objects are deleted.

Revision history for this message
Paul Hummer (rockstar) wrote :

So, a status report that will also be brought up in the Launchpad Production Meeting:

I have a branch that adds memory profiling to loggerhead (which will land in trunk as soon as I stop being so fascinated with it). Based on the information received, not all objects are being deleted. I've successfully gotten Loggerhead to climb in memory on my local machine consistently, which is huge progress. Using both of these new developments, I have a starting place and a way to measure my progress in reducing memory usage on Loggerhead.

Changed in loggerhead:
assignee: nobody → rockstar
status: Triaged → In Progress
Revision history for this message
Tim Penhey (thumper) wrote :

Paul, that is really great news. Keep us informed.

Revision history for this message
Martin Albisetti (beuno) wrote :

I think that this problem is mostly dealt with, thanks to John, Michael and Paul.

Changed in loggerhead:
assignee: Paul Hummer (rockstar) → nobody
status: In Progress → Fix Committed
Paul Hummer (rockstar)
Changed in loggerhead:
status: Fix Committed → Fix Released
Revision history for this message
Martin Albisetti (beuno) wrote :

Leaving at Fix committed, as we haven't released a version with this fix.

Changed in loggerhead:
status: Fix Released → Fix Committed
Herb McNew (herb)
Changed in loggerhead:
status: Fix Committed → Confirmed
Revision history for this message
Herb McNew (herb) wrote :

So, I've changed the status of this from Fix Committed back to Confirmed. We're still seeing (or seeing again) this issue on Launchpad. I don't know if there has been a regression or if this has been lurking the entire time. I'm also not sure if this is related to any LP-specific changes we have added to loggerhead (if indeed there are any). The LOSAs would be happy to help debug this in any way we can. Relevant process listing attached.

Revision history for this message
Herb McNew (herb) wrote :
Changed in loggerhead:
assignee: nobody → Max Kanat-Alexander (mkanat)
Revision history for this message
Max Kanat-Alexander (mkanat) wrote :

Okay, my investigation shows that this leak is really, really slow. I ran htdig on my local loggerhead instance, and at first it grew really fast, up until it got to about 115MB. Then it just grew really slowly over several hours. I think we're leaking a few KB, but only on certain requests--the process would grow and shrink and then grow permanently, across requests.

Revision history for this message
Max Kanat-Alexander (mkanat) wrote :

Okay, I've been working quite a bit on tracing this down.

When comment 24 was written, was bazaar.launchpad.net running Python 2.4 or 2.5? Is it running 2.5 (or higher) now? I'm wondering because Python 2.5 had some changes in memory management that could be relevant.

Now that codebrowse actually stays up--are you still seeing this memory leak?

On Launchpad, is the leak slow like I said in comment 26, or does it get really huge by, say, the end of a few hours or a day?

It's possible that, on Launchpad, the issue is (or is related to) bug 436406.

Revision history for this message
Matt Nordhoff (mnordhoff) wrote :

Max Kanat-Alexander wrote:
> It's possible that, on Launchpad, the issue is (or is related to) bug
> 436406.

Does Launchpad even have bzr-svn installed and plugins enabled in
Loggerhead? (OK, they are by default, but...)

Revision history for this message
Max Kanat-Alexander (mkanat) wrote :

Okay, I've been investigating this for about 10 hours just tonight, and I haven't gotten anywhere significant yet. I actually can't even make the process grow consistently. Dozer and objgraph both tell me nothing interesting.

To help figure this out, I've updated bzr.bugzilla.org (one place I was seeing this) to the latest code, and given it its own Apache log, and I'm going to let it run for a few days to see if it gets very large, and then I can replay the log against a local install and attempt to trace down the leak that way, if there is indeed still a leak.

I do still want to know if the leak is still happening on Launchpad. If it is, then it might be something specific to the launchpad-loggerhead glue.

Revision history for this message
Martin Pool (mbp) wrote :

@max you should talk to jam about meliae
<https://launchpad.net/meliae>. It might help.

--
Martin <http://launchpad.net/~mbp/>

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

Matt Nordhoff wrote:
> Max Kanat-Alexander wrote:
>> It's possible that, on Launchpad, the issue is (or is related to) bug
>> 436406.
>
> Does Launchpad even have bzr-svn installed and plugins enabled in
> Loggerhead? (OK, they are by default, but...)

No. We want to fix this so the foreign vcs icons show up (see some
other bug...) but currently, bzr-svn et al are not loaded. loom and
lpserve are loaded (i.e. things in $launchpad_root/bzrplugins are loaded).

Cheers,
mwh

Revision history for this message
Max Kanat-Alexander (mkanat) wrote :

Okay, I'm changing this back to "fix committed" after talking to mwhudson about it today. It sounds like the loggerhead process on launchpad is simply taking up a lot of memory because it's got so many branches to look at, not because it's leaking. I actually couldn't get a loggerhead process to grow consistently or leak in my recent testing, which was far more thorough than my previous testing when I thought that there was a leak. It certainly gets really large, and we could certainly look at how to reduce memory usage, but we don't seem to be leaking.

Changed in loggerhead:
status: Confirmed → Fix Committed
Tom Haddon (mthaddon)
tags: added: canonical-losa-lp
Revision history for this message
Robert Collins (lifeless) wrote :

Really should have been fix released, not fix committed.

tags: added: oops
tags: removed: oops
Changed in loggerhead:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Bug attachments

Remote bug watches

Bug watches keep track of this bug in other bug trackers.