Comment 3 for bug 260140

Revision history for this message
Stuart Bishop (stub) wrote :

The current approach cannot be done efficiently. We are joining three tables - Revision, BranchRevision and Branch. Going right to left, we first get the set of branches that might be interesting. We then join with BranchRevision with no filters, so get all rows for all those branches (Nearly 4.5 million for bzr). We then join with Revision, ordering by revision_date so we can pick the most recent. This is slow no matter how we juggle things. Going from left to right is worse as we start with all 3.5 million revisions, join with BranchRevision giving us all 15 million rows in our set before finally filtering this by joining with Branch. We then need to order and pull out the interesting rows at the top.

We need to filter on BranchRevision. We could mirror Revision.revision_date into Branch, but I suspect that using the existing BranchRevision.id will actually give us better results.

A Revision needs to appear in the feed when Launchpad becomes aware of it, not possibly skipped altogether because the actual commit was made some time in the past and the branch only just pushed to somewhere Launchpad can see it. So the feed should display the most recently uploaded revisions rather the most recent commits that happen to have been mirrored to Launchpad. This is Revision.date_created, and we can achieve the same ordering by relying on Revision.id or BranchRevision.revision.

So if we are happy showing the last 25 revisions pushed/pulled into Launchpad then we can do:

SELECT DISTINCT ON (BranchRevision.revision) Revision.*
FROM Revision, BranchRevision, Branch
WHERE BranchRevision.branch = Branch.id
    AND BranchRevision.revision = Revision.id
    AND Branch.private IS FALSE AND Branch.product = 1186
ORDER BY BranchRevision.revision DESC LIMIT 25;

or

SELECT Revision.*
FROM Revision, (
    SELECT DISTINCT ON (BranchRevision.revision) BranchRevision.revision
    FROM BranchRevision, Branch
    WHERE BranchRevision.branch = Branch.id
        AND Branch.private IS FALSE AND Branch.product = 1186
    ORDER BY BranchRevision.revision) AS BR
WHERE Revision.id = BR.revision
ORDER BY Revision.revision_date DESC LIMIT 25;

The first is preferable, but the latter might be easier to put into Storm syntax - I can't recall if there is DISTINCT ON support there yet.