Comment 1 for bug 607960

Revision history for this message
Robert Collins (lifeless) wrote : Re: timeouts on Distribution:+bugs doing searches

slow query:
 -- SELECT BugTask.assignee, BugTask.bug, BugTask.bugwatch, BugTask.date_assigned, BugTask.date_closed, BugTask.date_confirmed, BugTask.date_fix_committed, BugTask.date_fix_released, BugTask.date_incomplete, BugTask.date_inprogress, BugTask.date_left_closed, BugTask.date_left_new, BugTask.date_triaged, BugTask.datecreated, BugTask.distribution, BugTask.distroseries, BugTask.id, BugTask.importance, BugTask.milestone, BugTask.owner, BugTask.product, BugTask.productseries, BugTask.sourcepackagename, BugTask.status, BugTask.statusexplanation, BugTask.targetnamecache, "_prejoin1".owner, "_prejoin1".active, "_prejoin1".autoupdate, "_prejoin1".bug_reported_acknowledgement, "_prejoin1".bug_reporting_guidelines, "_prejoin1".bug_supervisor, "_prejoin1".bugtracker, "_prejoin1".date_next_suggest_packaging, "_prejoin1".datecreated, "_prejoin1".description, "_prejoin1".development_focus, "_prejoin1".displayname, "_prejoin1".downloadurl, "_prejoin1".driver, "_prejoin1".enable_bug_expiration, "_prejoin1".freshmeatproject, "_prejoin1".homepage_content, "_prejoin1".homepageurl, "_prejoin1".icon, "_prejoin1".id, "_prejoin1".lastdoap, "_prejoin1".license_approved, "_prejoin1".license_info, "_prejoin1".reviewed, "_prejoin1".logo, "_prejoin1".max_bug_heat, "_prejoin1".mugshot, "_prejoin1".name, "_prejoin1".official_answers, "_prejoin1".official_blueprints, "_prejoin1".official_malone, "_prejoin1".official_rosetta, "_prejoin1".private_bugs, "_prejoin1".programminglang, "_prejoin1".project, "_prejoin1".registrant, "_prejoin1".remote_product, "_prejoin1".reviewer_whiteboard, "_prejoin1".screenshotsurl, "_prejoin1".security_contact, "_prejoin1".sourceforgeproject, "_prejoin1".summary, "_prejoin1".title, "_prejoin1".translation_focus, "_prejoin1".translationgroup, "_prejoin1".translationpermission, "_prejoin1".wikiurl, "_prejoin2".id, "_prejoin2".name, Bug.date_last_message, Bug.date_last_updated, Bug.date_made_private, Bug.datecreated, Bug.description, Bug.duplicateof, Bug.heat, Bug.heat_last_updated, Bug.id, Bug.latest_patch_uploaded, Bug.message_count, Bug.name, Bug.number_of_duplicates, Bug.owner, Bug.private, Bug.security_related, Bug.title, Bug.users_affected_count, Bug.users_unaffected_count, Bug.who_made_private FROM Bug, BugTask LEFT JOIN Product AS "_prejoin1" ON BugTask.product = "_prejoin1".id LEFT JOIN SourcePackageName AS "_prejoin2" ON BugTask.sourcepackagename = "_prejoin2".id WHERE Bug.id = BugTask.bug AND BugTask.distribution = 1 AND ((BugTask.status = 10) OR (BugTask.status = 15) AND (Bug.date_last_message IS NOT NULL AND BugTask.date_incomplete <= Bug.date_last_message) OR (BugTask.status = 15) AND (Bug.date_last_message IS NULL OR BugTask.date_incomplete > Bug.date_last_message) OR (BugTask.status = 20) OR (BugTask.status = 21) OR (BugTask.status = 22) OR (BugTask.status = 25)) AND Bug.duplicateof is NULL AND (Bug.fti @@ ftq('blu-ray') OR BugTask.fti @@ ftq('blu-ray') OR BugTask.targetnamecache ILIKE '%' || 'blu-ray' || '%') AND (Bug.private = FALSE OR EXISTS ( SELECT BugSubscription.bug FROM BugSubscription, TeamParticipation WHERE TeamParticipation.person = 15177 AND BugSubscription.person = TeamParticipation.team AND BugSubscription.bug = Bug.id)) AND (1=1) ORDER BY BugTask.importance DESC, BugTask.id LIMIT 9 OFFSET 0
-------------
Bug.fti @@ ftq(\\'blu-ray\\') OR BugTask.fti @@ ftq(\\'blu-ray\\') OR BugTask.targetnamecache ILIKE \\'%\\' || \\'blu-ray\\' || \\'%\\') is interesting in this query.

Some thoughts (timings are best of 3 to get hot stats):

lpmain_staging=> select count(*) from bugtask where BugTask.targetnamecache ILIKE ('%' || 'blu-ray' || '%') ;
 count
-------
     0
(1 row)

Time: 1020.885 ms

That is, targetnamecache + ILIKE == epic pain. We should put this in a fti index; if we need to replace it frequently, then we could use a multi column gist index, or even a separate index.

lpmain_staging=> select count(*) from bug, bugtask where bug.id=bugtask.bug and( Bug.fti @@ ftq('blu-ray') OR BugTask.fti @@ ftq('blu-ray') );
 count
-------
    41
(1 row)

Time: 8666.260 ms

lpmain_staging=> select count(*) from bug, bugtask where bug.id=bugtask.bug and( Bug.fti @@ ftq('blu-ray') );
 count
-------
    41
(1 row)

Time: 393.130 ms

lpmain_staging=> select count(*) from bug, bugtask where bug.id=bugtask.bug and( BugTask.fti @@ ftq('blu-ray') );
 count
-------
     0
(1 row)

Time: 88.948 ms

So - querying multiple separate vectors is disproportionately slow.

lpmain_staging=> explain analyze select count(*) from bug, bugtask where (Bug.fti @@ ftq('blu-ray') OR BugTask.fti @@ ftq('blu-ray')) and bug.id=bugtask.bug;
                                                                             QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate (cost=309877.32..309877.33 rows=1 width=0) (actual time=8793.012..8793.013 rows=1 loops=1)
   -> Hash Join (cost=201501.09..309873.82 rows=1399 width=0) (actual time=3865.313..8792.883 rows=41 loops=1)
         Hash Cond: (bugtask.bug = bug.id)
         Join Filter: (((bug.fti)::tsvector @@ '''blu'' & ''ray'' | ''bluray'''::tsquery) OR ((bugtask.fti)::tsvector @@ '''blu'' & ''ray'' | ''bluray'''::tsquery))
         -> Seq Scan on bugtask (cost=0.00..35328.14 rows=699814 width=125) (actual time=0.016..558.606 rows=699823 loops=1)
         -> Hash (cost=163550.93..163550.93 rows=599693 width=390) (actual time=2228.090..2228.090 rows=599698 loops=1)
               -> Seq Scan on bug (cost=0.00..163550.93 rows=599693 width=390) (actual time=0.008..921.580 rows=599698 loops=1)
 Total runtime: 8794.783 ms

------
Stuart - any particular ideas here? The main one that jumps out at me is to ditch the bugtask fti column and instead index all these things directly into bug.fti.