empty ppa builders should allow more builds from a single ppa

Bug #478691 reported by Micah Gersten
This bug affects 2 people
Affects           Status        Importance  Assigned to     Milestone
Launchpad itself  Fix Released  High        Julian Edwards

Bug Description

I think this is fallout from Bug #393546. Now some of the daily build PPAs like https://edge.launchpad.net/~ubuntu-mozilla-daily/+archive/ppa have trouble completing in 24 hours. While it's nice not to monopolize, empty builders shouldn't be allowed to go unused.

Revision history for this message
Julian Edwards (julian-edwards) wrote :

I'm not sure how to best deal with this. The fix from bug 393546 has not worked as well as hoped.

The best idea so far is to limit the number of concurrent long-running builds that a single PPA can have, and to set that limit dynamically; that is, allow the PPA's concurrency to increase or decrease based on a few factors like:

 * empty queues on some builders
 * length of build (e.g. 1 fewer concurrent build for every 30 minutes of build time queued)
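
The two factors above could be combined roughly as follows. This is only a minimal sketch, not Launchpad code: the function name, the `base_limit` parameter, and the choice of one extra slot per idle builder are all assumptions made for illustration. Only the "1 fewer per 30 minutes queued" step comes from the comment.

```python
def max_concurrent_builds(base_limit, idle_builders, queued_minutes,
                          minutes_per_step=30):
    """Return a hypothetical per-PPA cap on concurrent long-running builds.

    All names here are illustrative assumptions, not Launchpad APIs.
    """
    # One fewer slot for every full 30 minutes of build time the PPA has queued.
    penalty = queued_minutes // minutes_per_step
    # Assumption: one extra slot for every builder that currently sits idle.
    bonus = idle_builders
    # Never drop below one slot, so the PPA can always make some progress.
    return max(1, base_limit + bonus - penalty)
```

For example, a PPA with a base limit of 4, no idle builders and 90 minutes of queued build time would be capped at a single concurrent build under these assumptions.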

I welcome more suggestions if you can think of anything.

Changed in soyuz:
status: New → Triaged
importance: Undecided → High
tags: added: soyuz-build
Revision history for this message
Julian Edwards (julian-edwards) wrote :

BTW it is right that empty builders should be left unused if the only option is to put a 3 hour build on them. We need to be clever and keep some back for other incoming builds.

Revision history for this message
Alexander Sack (asac) wrote :

It might feel like chromium and mozilla are the ones to blame ... however, note that if there are only 3 builders available, there will be a backlog even without our dailies.

also i dont buy the "empty builders should be left unused" from above. maybe you could say "if only one buildd is empty it should be left unused" ... but as long as there is still one builder slot free all should be fine.

Maybe you could say if > 20% of builders are idle, just go ahead and schedule.
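
A minimal sketch of that 20% check, assuming the scheduler knows how many builders exist and how many are idle for a given architecture. The function and parameter names are hypothetical; nothing here is taken from the actual Soyuz code.

```python
def may_schedule(idle_builders, total_builders, threshold_percent=20):
    """Return True if more than threshold_percent of builders are idle.

    Hypothetical helper; integer arithmetic avoids float rounding issues.
    """
    if total_builders <= 0:
        # No builders at all: nothing can be scheduled.
        return False
    # Strictly "greater than", matching the "> 20%" wording of the suggestion.
    return idle_builders * 100 > total_builders * threshold_percent
```

With 13 builders, 5 idle ones would pass the check and 2 would not; exactly 20% idle (1 of 5) does not pass, because the comparison is strict.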

Revision history for this message
Julian Edwards (julian-edwards) wrote :

Hi asac,

> also i dont buy the "empty builders should be left unused" from above.

I didn't say how many should be left unused, just that *some* should be. Working out how many should be left unused is the hard part, so that we get a good balance between long and short builds.

>but as long as there is still one builder slot free all should be fine. Maybe you could say if > 20% of builders are idle, just go ahead and schedule.

I don't think it's that simple. We need to look at how long something takes to build and compare that to the number of available builders and how many builds the PPA has queued up.

However, ultimately, we need more builders!

Revision history for this message
Fabien Tassin (fta) wrote :

While i understand the previous situation was unfair, the new algorithm is even more unfair for active teams; it's even counterproductive. (if it has been done because dailies are no longer welcome, just tell me, i'll shut down the ones i manage). IMHO, the 1st task should have been to prune off the packages that have been building in loops for months/years waiting for something that will never happen (depwaits, mostly).
Anyway..

Processor  Builders  Queue
amd64      13        38 jobs (1 hour 30 minutes)
i386       16        45 jobs (1 hour 40 minutes)
lpia       12        36 jobs (1 hour 40 minutes)

with resp. 5, 8 and 7 builders idle. So it seems several teams/people are impacted, not just mozilla & chromium.
It's a waste of resources and time for everyone. That number of machines should be more than enough to handle the current load.

IMHO, *no* machine should be idle if the corresponding arch queue is not empty.
It would be better for everyone to queue new entries by interleaving them after those waiting for a builder, but in front of those that are owned by a user/team already building something, disregarding what has already been built.

Say you have 3 builders M1, M2, M3 and 5 Teams/Users at work A, B, C, D, E.

A pushes A1 & A2 => M1 (A1), M2 (A2), M3 (idle) + Q (empty)
B pushes B1 => M1 (A1), M2 (A2), M3 (B1) + Q (empty)
C pushes C1 & C2 & C3 => M1 (A1), M2 (A2), M3 (B1) + Q (C1, C2, C3)
A pushes A3 & A4 => M1 (A1), M2 (A2), M3 (B1) + Q (C1, A3, C2, A4, C3)
C pushes C4 & C5 => M1 (A1), M2 (A2), M3 (B1) + Q (C1, A3, C2, A4, C3, C4, C5)
D pushes D1 => M1 (A1), M2 (A2), M3 (B1) + Q (C1, D1, A3, C2, A4, C3, C4, C5)
M1 completes A1 => M1 (C1), M2 (A2), M3 (B1) + Q (D1, A3, C2, A4, C3, C4, C5)
M3 completes B1 => M1 (C1), M2 (A2), M3 (D1) + Q (A3, C2, A4, C3, C4, C5)
M2 completes A2 => M1 (C1), M2 (A3), M3 (D1) + Q (C2, A4, C3, C4, C5)
B pushes B2 => M1 (C1), M2 (A3), M3 (D1) + Q (B2, C2, A4, C3, C4, C5)
E pushes E1 => M1 (C1), M2 (A3), M3 (D1) + Q (B2, E1, C2, A4, C3, C4, C5)
etc.

it's just a matter of creating the queue in a fair way, builders just have to get the tip as usual.
Even if someone pushes 200 packages while all the builders are idle (think of langpacks, or test rebuilds), if someone arrives just after to build something, he just has to wait for 1 machine to complete its current build. Sounds reasonable to me.
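
The trace above can be reproduced with a small simulation. This is a sketch under one possible reading of the rule, not an actual implementation: each queued job is ranked by its position among its owner's queued jobs, and within the same rank, owners with nothing on a builder go ahead of owners that are already building. The class and method names are hypothetical.

```python
class FairQueue:
    """Toy model of the interleaved queue (illustrative names only)."""

    def __init__(self, builders):
        # builder name -> (owner, job) currently building, or None if idle
        self.building = {name: None for name in builders}
        self.queue = []  # ordered list of (owner, job) waiting for a builder

    def _is_building(self, owner):
        return any(j is not None and j[0] == owner
                   for j in self.building.values())

    def _keys(self):
        # Key of a queued job: (its rank among the owner's queued jobs,
        # whether the owner currently occupies a builder).
        seen, keys = {}, []
        for owner, _ in self.queue:
            rank = seen.get(owner, 0)
            seen[owner] = rank + 1
            keys.append((rank, self._is_building(owner)))
        return keys

    def push(self, owner, job):
        # An idle builder takes the job immediately.
        for builder, current in self.building.items():
            if current is None:
                self.building[builder] = (owner, job)
                return
        rank = sum(1 for o, _ in self.queue if o == owner)
        new_key = (rank, self._is_building(owner))
        # Insert in front of the first queued job that ranks strictly later.
        index = len(self.queue)
        for i, key in enumerate(self._keys()):
            if key > new_key:
                index = i
                break
        self.queue.insert(index, (owner, job))

    def complete(self, builder):
        # The freed builder just takes the tip of the queue, as usual.
        self.building[builder] = self.queue.pop(0) if self.queue else None

    def queued(self):
        return [job for _, job in self.queue]
```

Pushing A1, A2, B1, C1, C2, C3, A3, A4 in that order onto three builders leaves the queue as C1, A3, C2, A4, C3, matching the fourth line of the trace. Builders never reorder anything; all the fairness logic lives in `push`.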

btw, queue times are now nonsensical as they now depend on the team/user Q. For mozilla, it's close to 20h, it will be more than 24h once i add lucid, meaning the last packages will never be published.

Revision history for this message
William Grant (wgrant) wrote :

It's not that simple -- we need to actively discriminate against long builds or daily PPAs and prevent them from ever taking up all of the builders. Imagine a situation (that happens daily) where there are enough two-hour daily builds from various PPAs to fill all of the builders. Queuing other builds ahead of those will work for a while, but as soon as the non-daily queue empties the dailies will take over every builder. Once that happens, nothing else can build for a couple of hours.

Revision history for this message
Fabien Tassin (fta) wrote :

William, I understand from this that dailies are no longer welcome. I will shut my bot and let someone else take over the PPAs.

Revision history for this message
Alexander Sack (asac) wrote :

William, we maintain those PPAs to supplement the service of the ubuntu distribution as a whole. It's not like many teams will run mozilla dailies just for the fun of it. They are an important resource in fixing/verifying bugs and keeping track of regressions in ubuntu and upstream. Also some of the PPAs have plenty of users whose service would be busted by such discrimination.

So if you start talking about discriminating by build time on a per-PPA/package basis, remember to also take into account the usefulness of the PPAs/packages in the ubuntu context (or even overall).

Anyway, the change that led to this should not have been done before talking to the ones running those dailies, meaning me or fta. We should be well known to the soyuz team by now.

Now we need to act as swiftly as you landed this ... which is why i urge you to do something (that is not a complete rollback!) like I suggested in comment 3. We can still get to a better, even fairer approach when we know how to tackle this problem for real.

Revision history for this message
Alexander Sack (asac) wrote :

Also, from what I understood after discussion with Julian this afternoon, he seemed to be OK with doing something like in comment 3 ...

Revision history for this message
Alexander Sack (asac) wrote :

oh ... to avoid confusion ... in comment #8 last paragraph: s/you/one/ or we

Revision history for this message
Max Bowsher (maxb) wrote :

Alexander, Fabien: I'm inclined to believe that when William says "discriminate" he means it in the purely neutral technical sense - i.e. "do something different" - rather than with intent to imply daily builds are undesirable.

Fabien: The problem with your queuing suggestion in comment #5 is that it still permits a single PPA to monopolize the entire supply of builders for the length of a build, i.e. potentially hours - which is the problem the original change was introduced to avoid.

Revision history for this message
William Grant (wgrant) wrote :

Fabien, I meant nothing of the sort (nor am I in a position to make any such statement). "discriminate" does not mean discourage. My opinion is, however, that the queue algorithm has no option but to treat daily builds slightly differently. I cannot see a way around it.

It is a simple fact that daily builds take up a lot of builder time, and I suspect that the timeliness of such builds is less important than that of other, less regular PPAs. With the old system, daily PPAs could block up the build queue for hours. We need to devise a system where they can still build (i.e. a less insane restriction than the one implemented in 3.1.10), but do not block the shorter, more time-critical builds for hours.

Revision history for this message
Micah Gersten (micahg) wrote :

It seems like the only build that takes hours is chromium. Is it possible to take into account how long the last build took and use that as a basis to determine priority?
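
One way to read this suggestion is as a shortest-expected-build-first ordering, using each package's previous build time as the estimate. The sketch below is an assumption-laden illustration: the function name, the `last_duration_minutes` mapping and the 30-minute default for never-built packages are all made up for the example.

```python
def order_by_last_duration(pending, last_duration_minutes, default_minutes=30):
    """Order pending builds so that historically short builds go first.

    pending: list of package names waiting to build.
    last_duration_minutes: hypothetical mapping of package -> minutes the
        previous build took; unknown packages get default_minutes.
    """
    # sorted() is stable, so packages with equal estimates keep arrival order.
    return sorted(
        pending,
        key=lambda pkg: last_duration_minutes.get(pkg, default_minutes),
    )
```

A pure ordering like this can keep pushing long builds back for as long as short ones keep arriving, so it would presumably need to be combined with some form of ageing or a per-PPA limit.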

Revision history for this message
William Grant (wgrant) wrote :

If someone can come up with a working algorithm that doesn't involve my suggested discrimination, I would have no problem with that. I just doubt that it's doable.

(I also see that Max completed his reply while I was penning mine.)

Revision history for this message
Julian Edwards (julian-edwards) wrote :

Fabien, please don't shut down your bot, the dailies are a very important part of what Launchpad needs to do and trust me when I say I will do everything I can to sort out this situation.

It's obvious that the recent change has not worked very well, but c'est la vie, we live and learn. I am going to make a change to the scheduling this week to rectify things. It won't be a complete solution, but I will attempt to strike a balance between the needs of daily builds and the needs of everyone else.

Longer term, I can promise you that this will be done much, much better. We aim to have ~200 daily builds within a few months.

Revision history for this message
Julian Edwards (julian-edwards) wrote :

I am implementing asac's suggestion for the 20% rule as an interim measure. It should help a lot though.

Changed in soyuz:
milestone: none → 3.1.11
status: Triaged → In Progress
assignee: nobody → Julian Edwards (julian-edwards)
Revision history for this message
Diogo Matsubara (matsubara) wrote : Bug fixed by a commit
Changed in soyuz:
status: In Progress → Fix Committed
Revision history for this message
Julian Edwards (julian-edwards) wrote :

Let me know how things go, if you're watching the build farm. It should be a bit fairer now.

Changed in soyuz:
status: Fix Committed → Fix Released
Revision history for this message
Alexander Sack (asac) wrote :

thanks a lot for the quick fix julian!
