Comment 10 for bug 328302

Tim Penhey (thumper) wrote:

I spent some time with spm today looking at this. It seems more likely that our optimisation is no longer an optimisation, because of the sheer size of the BranchRevision table. I also spent some time thinking about how we could do without the BranchRevision table. Its primary use cases are feeds, unmerged revisions, and summary information.

The use case that seems to be hurting us the most is the summary information for projects. We'd like to have it for project groups and users too, but the query is too cumbersome. As of today we have 84 million rows in the BranchRevision table, which is too big to run ad-hoc queries across.

I propose that we create a table whose sole purpose is to provide quick access to the summary information.

If we had something like:

create table RevisionSummaryCache
(
revision_date timestamp without time zone,
product int references Product,
distroseries int references DistroSeries, -- can't forget source package branches
sourcepackage int references SourcePackageName, -- or whatever it is
author int, -- no references: positive numbers are person ids, negative numbers are revision_author ids where there isn't a person
revision int references Revision
);
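The signed `author` column packs two kinds of id into one integer. A minimal sketch of that convention follows; `encode_author` and `decode_author` are hypothetical helper names, and it assumes both person ids and revision_author ids start at 1, so 0 is never a valid stored value.

```python
from typing import Optional, Tuple


def encode_author(person_id: Optional[int], revision_author_id: int) -> int:
    """Value to store in the hypothetical RevisionSummaryCache.author column.

    Positive -> Person id; negative -> revision_author id with no linked
    Person. Assumes both kinds of id start at 1, so 0 never appears.
    """
    if person_id is not None:
        if person_id <= 0:
            raise ValueError("person ids are assumed to be positive")
        return person_id
    if revision_author_id <= 0:
        raise ValueError("revision_author ids are assumed to be positive")
    return -revision_author_id


def decode_author(value: int) -> Tuple[str, int]:
    """Map a stored author value back to its kind and original id."""
    if value > 0:
        return ("person", value)
    if value < 0:
        return ("revision_author", -value)
    raise ValueError("0 is not a valid author value")


print(encode_author(42, 7))    # 42
print(encode_author(None, 7))  # -7
print(decode_author(-7))       # ('revision_author', 7)
```

Because a linked person always wins, counts "by an individual" only need `author = person_id`. Unlinked authors stay distinguishable without a second nullable column.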

We pick some arbitrary time cutoff, such as 30 days, and remove any entries in this table that are older than that.
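The expiry step is a single range delete. The sketch below uses an in-memory SQLite database as a stand-in for the real one, with a trimmed-down copy of the proposed table; `prune_cache` is a hypothetical name, and dates are stored as ISO-8601 text so that string comparison matches chronological order.

```python
import sqlite3
from datetime import datetime, timedelta

CUTOFF = timedelta(days=30)  # the arbitrary window from the proposal


def prune_cache(conn: sqlite3.Connection, now: datetime) -> int:
    """Delete cache rows older than the cutoff; return how many were removed."""
    threshold = (now - CUTOFF).isoformat(sep=" ")
    cur = conn.execute(
        "DELETE FROM RevisionSummaryCache WHERE revision_date < ?", (threshold,)
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE RevisionSummaryCache ("
    " revision_date TEXT NOT NULL, product INTEGER,"
    " author INTEGER NOT NULL, revision INTEGER NOT NULL)"
)
now = datetime(2009, 3, 1)
rows = [
    ((now - timedelta(days=2)).isoformat(sep=" "), 1, 10, 100),   # kept
    ((now - timedelta(days=45)).isoformat(sep=" "), 1, 10, 101),  # expired
]
conn.executemany("INSERT INTO RevisionSummaryCache VALUES (?, ?, ?, ?)", rows)
print(prune_cache(conn, now))  # 1
```

In PostgreSQL the same prune could be written roughly as `DELETE FROM RevisionSummaryCache WHERE revision_date < now() - interval '30 days'` and run from a periodic job. Because the table only ever holds one window's worth of rows, this stays cheap.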

When we scan a branch, we make sure a row exists for the product (or the distroseries/sourcepackage) for each revision from the last 30 days.
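The scanner-side step can be made idempotent with a uniqueness constraint, so re-scanning a branch, or scanning a second branch of the same project that shares revisions, never double-counts. This is a minimal sketch assuming SQLite as a stand-in database and product branches only. `create_cache`, `record_scanned_revisions` and the `(revision_id, revision_date, author)` tuple shape are all hypothetical, not the real scanner API.

```python
import sqlite3
from datetime import datetime, timedelta
from typing import Iterable, Tuple

CUTOFF = timedelta(days=30)  # the arbitrary window from the proposal


def create_cache(conn: sqlite3.Connection) -> None:
    # Simplified schema: product branches only. The UNIQUE constraint is
    # what makes re-scanning a branch idempotent.
    conn.execute(
        "CREATE TABLE RevisionSummaryCache ("
        " revision_date TEXT NOT NULL,"
        " product INTEGER NOT NULL,"
        " author INTEGER NOT NULL,"
        " revision INTEGER NOT NULL,"
        " UNIQUE (revision, product))"
    )


def record_scanned_revisions(
    conn: sqlite3.Connection,
    product_id: int,
    revisions: Iterable[Tuple[int, datetime, int]],
    now: datetime,
) -> int:
    """Ensure a cache row exists for each recent (revision, product) pair.

    `revisions` yields (revision_id, revision_date, author) tuples, as a
    hypothetical branch scanner might produce. Returns rows actually added.
    """
    threshold = now - CUTOFF
    added = 0
    for revision_id, revision_date, author in revisions:
        if revision_date < threshold:
            continue  # outside the window, so not cached at all
        cur = conn.execute(
            "INSERT OR IGNORE INTO RevisionSummaryCache VALUES (?, ?, ?, ?)",
            (revision_date.isoformat(sep=" "), product_id, author, revision_id),
        )
        added += cur.rowcount  # 0 when the row already existed
    conn.commit()
    return added


conn = sqlite3.connect(":memory:")
create_cache(conn)
now = datetime(2009, 3, 1)
branch = [
    (100, now - timedelta(days=1), 10),
    (101, now - timedelta(days=5), 11),
    (102, now - timedelta(days=60), 10),  # too old to cache
]
print(record_scanned_revisions(conn, 1, branch, now))  # 2
print(record_scanned_revisions(conn, 1, branch, now))  # 0, already cached
```

Source package branches would need an equivalent key on `(revision, distroseries, sourcepackage)`. They are left out here to keep the sketch short.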

With good indices on this table, we should be able to get very quick counts of revisions across projects, as well as across source packages, distroseries and distributions. We should also be able to get quick counts of all commits within the last 30 days across all of Launchpad (which admittedly isn't hard now, though the other counts are), and quick counts of commits by an individual across different subsections.
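The counts then become small aggregate queries over a table that only ever holds 30 days of data. A minimal sketch follows, again using SQLite as a stand-in. The index names, the helper functions and the sample rows are all made up for illustration, and the composite `(product, revision_date)` and `(author, revision_date)` indices are one plausible choice, not a tested design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE RevisionSummaryCache (
    revision_date TEXT NOT NULL,
    product INTEGER,
    author INTEGER NOT NULL,
    revision INTEGER NOT NULL
);
-- Hypothetical indices for per-project and per-person counts.
CREATE INDEX rsc_product_idx ON RevisionSummaryCache (product, revision_date);
CREATE INDEX rsc_author_idx ON RevisionSummaryCache (author, revision_date);
""")
conn.executemany(
    "INSERT INTO RevisionSummaryCache VALUES (?, ?, ?, ?)",
    [
        ("2009-02-20 10:00:00", 1, 10, 100),
        ("2009-02-21 11:00:00", 1, 11, 101),
        ("2009-02-22 12:00:00", 2, 10, 102),
    ],
)


def commits_per_product(conn: sqlite3.Connection) -> dict:
    """Commit counts per project within the cached window."""
    return dict(conn.execute(
        "SELECT product, COUNT(*) FROM RevisionSummaryCache GROUP BY product"
    ).fetchall())


def commits_by_author(conn: sqlite3.Connection, author: int) -> dict:
    """One person's commit counts, broken down by project."""
    return dict(conn.execute(
        "SELECT product, COUNT(*) FROM RevisionSummaryCache"
        " WHERE author = ? GROUP BY product",
        (author,),
    ).fetchall())


def total_commits(conn: sqlite3.Connection) -> int:
    """All cached commits across every project."""
    return conn.execute("SELECT COUNT(*) FROM RevisionSummaryCache").fetchone()[0]


print(commits_per_product(conn))    # {1: 2, 2: 1}
print(commits_by_author(conn, 10))  # {1: 1, 2: 1}
print(total_commits(conn))          # 3
```

Project-group or distribution totals would follow the same pattern, joining the cached `product` or `distroseries` ids to their parent.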

Comments?