queue builder is way too slow

Bug #31777 reported by Daniel Silverstone
Affects           Status   Importance  Assigned to      Milestone
Launchpad itself  Invalid  High        Celso Providelo

Bug Description

The buildd queue builder is currently extremely slow, resulting in pauses of up to 15 minutes in the build queues.

Since the queue builder only runs when cron.daily isn't running, and cron.daily currently eats 30 minutes of every hour anyway, the queue builder ends up running only once per hour and taking upwards of ten minutes to do its job.

Mostly this is because the queue builder considers *every* source package release in each distrorelease it is pondering. As a result things take a very long time even when there is nothing for the queue builder to do.

The following is a query which lists all the sourcepackagepublishing records for which there are in theory insufficient builds for a given distrorelease:

EXPLAIN ANALYZE
SELECT SourcePackagePublishing.id
  FROM SourcePackagePublishing, SourcePackageRelease
 WHERE SourcePackagePublishing.SourcePackageRelease = SourcePackageRelease.id
   AND SourcePackagePublishing.DistroRelease = <DISTRORELEASEID>
   AND (SELECT COUNT(*)
         FROM Build
        WHERE Build.SourcePackageRelease = SourcePackageRelease.id
        ) <= CASE SourcePackageRelease.ArchitectureHintList
             WHEN 'any' THEN <CHROOTCOUNT>
             ELSE (char_length(SourcePackageRelease.ArchitectureHintList) -
                   char_length(
                     replace(SourcePackageRelease.ArchitectureHintList,
                             ' ', '')))
             END
   AND SourcePackagePublishing.status = <PUBLISHEDSTATUS>
;

DISTRORELEASEID is the numeric ID of the distrorelease in question, CHROOTCOUNT is the number of chroots for that distrorelease that we could use for building, and PUBLISHEDSTATUS is the dbschema value of the published status (2).

We should probably include a SourcePackagePublishing.Pocket = <BLAH> for the pocket in question too, but I've not looked at the code in a while.
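
As a reading aid, here is a minimal Python sketch of the comparison the query above makes for each source package release. It assumes ArchitectureHintList is either 'any' or a list of architecture names separated by single spaces; the function names build_threshold and is_selected are hypothetical and do not come from the Launchpad code.

```python
def build_threshold(arch_hint_list: str, chroot_count: int) -> int:
    """Right-hand side of the query's comparison (the CASE expression)."""
    if arch_hint_list == "any":
        return chroot_count
    # Number of spaces in the hint list, i.e. one less than the number of
    # listed architectures when names are separated by single spaces.
    return len(arch_hint_list) - len(arch_hint_list.replace(" ", ""))


def is_selected(build_count: int, arch_hint_list: str, chroot_count: int) -> bool:
    """True when the query would list the release as short of builds."""
    return build_count <= build_threshold(arch_hint_list, chroot_count)
```

Under this reading, a hint list naming n architectures is selected while it has fewer than n builds, whereas the 'any' branch still selects a release that already has exactly CHROOTCOUNT builds, so the result is best treated as a list of candidates to check.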

Daniel Silverstone (dsilvers) wrote :

Currently this really slows down distro work, sometimes leading to a delay of up to three hours between source upload and binaries hitting the archive mirrors.

Celso Providelo (cprov) wrote :

Will address it as soon as the next soyuz release rollout finishes.

Changed in launchpad-buildd:
assignee: nobody → cprov
Celso Providelo (cprov) wrote :

Let's try to use this smarter approach.

Changed in launchpad-buildd:
status: Unconfirmed → Confirmed
Daniel Silverstone (dsilvers) wrote :

Is it helping?

Celso Providelo (cprov) wrote :

We may go for a more 'package-centric' approach.
However, the proposed query might help to speed up the "overall checker", which we can run less often than hourly (maybe daily).

Changed in soyuz:
assignee: cprov → nobody
Joey Stanford (joey)
Changed in soyuz:
assignee: nobody → julian-edwards
Christian Reis (kiko) wrote :

Pushing off to 1.1.8 and giving back to Celso for now -- he has a better idea of what's necessary, and I want to make sure the commercial-repo work gets done.

Changed in soyuz:
assignee: julian-edwards → cprov
Christian Reis (kiko) wrote :

So my suggestion here is to avoid rescoring completely and instead, when deciding which build to process, order by the [original, process-upload defined] score and the build record's age.

If you want to preserve the behaviour where old records for non-priority components end up with high scores, you can take the age in seconds, fudge it with a divisor, add it to the score, and order by that. For instance, a divisor of 100 gives you 36 points per hour, so a build that is 10 hours old gets 360 points; if main's score is 1500 and universe's score is 500, a universe build would be prioritized over a fresh main build after roughly 28 hours (1000 points at 36 points per hour) -- which I think is close to the behaviour we have today.
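
The arithmetic in this suggestion can be checked with a short Python sketch. The function names effective_score and hours_to_overtake are hypothetical, and the scores 1500 and 500 are the example figures from the comment, not values taken from the code.

```python
SECONDS_PER_HOUR = 3600


def effective_score(base_score: int, age_seconds: float, divisor: float = 100) -> float:
    """Static score plus an age bonus of age_seconds / divisor points."""
    return base_score + age_seconds / divisor


def hours_to_overtake(low_score: int, high_score: int, divisor: float = 100) -> float:
    """Hours a low-score build must age to match a brand-new high-score build."""
    return (high_score - low_score) * divisor / SECONDS_PER_HOUR
```

With a divisor of 100, effective_score(500, 10 * SECONDS_PER_HOUR) gives 860 (500 plus 360 points of age), and hours_to_overtake(500, 1500) comes out a little under 28 hours.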

With the rescoring step removed, the queue builder becomes necessary only to build stuff which comes in through minority cases (as most build records are today created at upload time) -- for instance, new architectures, PAS changes, source syncs, and maybe another one or two cases I can't recall right now.

It can be run a lot less frequently if that's all that it's handling -- maybe once or twice a day.

Adam Conrad (adconrad) wrote :

I was with kiko right up until the "we can run it less frequently" bit. If we rewrite this flow to make sure new uploads always get scored when accepted, and that the queue-builder is only creating new records for P-a-s changes and the like, or rescoring based on age, it will end up so fast that we could run it every 5 minutes for all we care (obviously, every 15 or so is fine).

The basic idea of scoring at creation time (and only at creation time), and rescoring based on age is a good one, and I tend to like it. Note that we can also do a couple of things about the initial scoring to make it saner:

- Drop the available-deps check entirely. It's pointless, it becomes stale and inaccurate very soon after the record is created, and we have auto-dep-waiting on the buildd side specifically to deal with missing deps.
- Try to score in a similar fashion to dak/wanna-build, taking into account suite, component, priority, and section (libs first, then everything else), in that order.
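
One way to picture the ordering in the second point is a tuple sort key, sketched below in Python. This is not how dak/wanna-build or Soyuz implement scoring; the rank tables, the suite names, the dictionary layout and the function names are all assumptions made for illustration. Only the order of the criteria (suite, component, priority, section with libs first) comes from the comment.

```python
# Hypothetical rank tables: lower rank sorts first. The orderings are
# illustrative assumptions, not values from Launchpad or wanna-build.
SUITE_RANK = {"security": 0, "updates": 1, "release": 2}
COMPONENT_RANK = {"main": 0, "restricted": 1, "universe": 2, "multiverse": 3}
PRIORITY_RANK = {"required": 0, "important": 1, "standard": 2, "optional": 3, "extra": 4}


def sort_key(build: dict) -> tuple:
    """Compare by suite, then component, then priority, then section."""
    return (
        SUITE_RANK.get(build["suite"], len(SUITE_RANK)),
        COMPONENT_RANK.get(build["component"], len(COMPONENT_RANK)),
        PRIORITY_RANK.get(build["priority"], len(PRIORITY_RANK)),
        0 if build["section"] == "libs" else 1,  # libs first, then everything else
    )


def order_builds(builds: list) -> list:
    """Return the builds in the order they would be considered."""
    return sorted(builds, key=sort_key)
```

Because tuples compare element by element, a later criterion only matters when all earlier ones tie, which matches the "in that order" wording.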

If you guys want my input on this at any point during the development of it, please ask. This is something pretty near and dear to my heart, and my daily work.

Christian Reis (kiko) wrote : Re: [Bug 31777] Re: queue builder is way too slow

On Wed, Aug 08, 2007 at 11:45:48AM -0000, Adam Conrad wrote:
> I was with kiko right up until the "we can run it less frequently" bit.

Because it is potentially expensive. We need to find out exactly how
expensive it will end up once we've done the changes suggested here, but
the essential problem is that doing the whole-archive check for records
that need creating is expensive.

> The basic idea of scoring at creation time (and only at creation time),
> and rescoring based on age is a good one, and I tend to like it. Note

We can dynamically "rescore" based on age; there is no need to actually
run a script to update the score. Fundamentally, this decision can be
made when dispatching builds -- it doesn't need to be persisted.
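
A minimal Python sketch of that idea, assuming each pending build
carries a static score and a creation time. PendingBuild and
pick_next_build are hypothetical names, and the divisor of 100 is the
example value from earlier in this thread. The age bonus is computed
when choosing a build and is never written back.

```python
from dataclasses import dataclass


@dataclass
class PendingBuild:
    name: str
    score: int        # static score assigned at creation time
    created: float    # creation time, in seconds since the epoch


def pick_next_build(pending, now, divisor=100):
    """Return the build with the highest score plus age bonus, or None."""
    if not pending:
        return None
    # The bonus exists only inside this comparison; nothing is persisted.
    return max(pending, key=lambda b: b.score + (now - b.created) / divisor)
```

The dispatcher would call pick_next_build whenever a builder becomes
free, so no separate rescoring script has to update stored scores.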

> - Drop the available-deps check entirely. It's pointless, it becomes stale and inaccurate very soon after the record is created, and we have auto-dep-waiting on the buildd side specifically to deal with missing deps.
> - Try to score in a similar fashion to dak/wanna-build, taking into account suite, component, priority, and section (libs first, then everything else), in that order.

That'd be cool and definitely something we could roll into this work.
--
Christian Robottom Reis | http://async.com.br/~kiko/ | [+55 16] 3376 0125

Celso Providelo (cprov)
Changed in soyuz:
status: Confirmed → In Progress
Christian Reis (kiko)
Changed in soyuz:
milestone: 1.1.10 → 1.1.11
Changed in soyuz:
milestone: 1.1.11 → 1.1.12
Celso Providelo (cprov) wrote :

Short update:

 * RF 5242 allows builddmaster tasks to run in parallel (slave-scanner from sequencer and queue-builder & retry-depwait from cron).

Next step is to create and score (statically) build records synchronously for source-only uploads.

Martin Pitt (pitti) wrote :

Confirmed that creating build records straight on package uploads, and having q-b just doing the rescoring based on age, etc. would be wonderful. This is currently a major cause of release engineering slowdown.

Changed in soyuz:
milestone: 1.1.12 → 1.2.1
Celso Providelo (cprov) wrote :

The remaining issue mentioned in this bug will be fixed during the implementation of https://blueprints.edge.launchpad.net/soyuz/+spec/soyuz-buildd-improvements

Changed in soyuz:
status: In Progress → Invalid