+filebug dupefinder produces less relevant results after performance tuning changes

Bug #626656 reported by Martin Pool
This bug affects 2 people
Affects: Launchpad itself
Status: Triaged
Importance: High
Assigned to: Unassigned

Bug Description

I typed "pull from one ssh branch into another fails with TooManyConcurrentRequests" into <https://bugs.edge.launchpad.net/bzr/+filebug>. I would expect this to find me some other bugs with the TooManyConcurrentRequests exception name in their subject, since they are the most likely dupes.

My impression is that the code active in June would have found the likely dupes, so this is a regression in functionality, arising from a performance-oriented change that requires a match on N-1 terms.
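To illustrate why an N-1 rule misses this case, here is a minimal sketch. It assumes simple whitespace tokenization and set overlap, which is not necessarily how Launchpad's search works; `tokenize`, `matches_n_minus_one`, `matches_any` and the example summary are all hypothetical.

```python
def tokenize(text):
    """Lower-case the text and split on whitespace (an assumed tokenizer)."""
    return set(text.lower().split())


def matches_n_minus_one(query, summary):
    """True if the summary contains at least N-1 of the N distinct query terms."""
    terms = tokenize(query)
    required = max(len(terms) - 1, 1)
    return len(terms & tokenize(summary)) >= required


def matches_any(query, summary):
    """True if the summary shares at least one term with the query."""
    return bool(tokenize(query) & tokenize(summary))


query = ("pull from one ssh branch into another fails with "
         "TooManyConcurrentRequests")
# Hypothetical summary of an existing bug; not the real title of any bug.
summary = "TooManyConcurrentRequests during push"

print(matches_n_minus_one(query, summary))
print(matches_any(query, summary))
```

With a ten-word summary, the N-1 rule demands nine shared terms, so a candidate that shares only the distinctive exception name is rejected even though a looser any-term match would surface it.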

As a workaround, you can progressively remove terms from your summary: if I enter only "TooManyConcurrentRequests" then I do see the bugs I would have expected, including bug 483661 which is in fact the bug I hit.
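The manual workaround could in principle be automated by relaxing the match requirement step by step until something is found. The following is only a sketch under assumptions: the tokenizer, the in-memory list of summaries, and the names `search` and `relaxed_search` are hypothetical and are not part of Launchpad.

```python
def tokenize(text):
    """Lower-case the text and split on whitespace (an assumed tokenizer)."""
    return set(text.lower().split())


def search(query, summaries, min_matches):
    """Return summaries sharing at least min_matches terms with the query."""
    terms = tokenize(query)
    return [s for s in summaries if len(terms & tokenize(s)) >= min_matches]


def relaxed_search(query, summaries):
    """Start at N-1 required terms and lower the bar until there are hits.

    Returns (required_terms, hits), or (0, []) if nothing matches at all.
    """
    n = len(tokenize(query))
    for required in range(max(n - 1, 1), 0, -1):
        hits = search(query, summaries, required)
        if hits:
            return required, hits
    return 0, []


query = ("pull from one ssh branch into another fails with "
         "TooManyConcurrentRequests")
# Hypothetical bug summaries, for illustration only.
summaries = [
    "TooManyConcurrentRequests during push",
    "crash on startup",
]
required, hits = relaxed_search(query, summaries)
print(required, hits)
```

Each relaxation step is an extra query, so this trades some of the performance gain back for relevance; whether that cost is acceptable is exactly the trade-off discussed in this bug.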

Ideally we would just do the right thing. If we expect users to trim their terms to find the right thing, then we could change the UI here; but because the text they enter is going to be used as the default bug subject, we typically don't expect or want them to be terse.

Deryck Hodge (deryck) wrote :

Hi, Martin.

Thanks for the bug report and the useful example. I agree that we should not expect users to remove terms, but I appreciate you stating the workaround.

We knew there was a trade-off here between performance and relevance. This is certainly a regression and should be marked of high importance; however, I don't think we can do anything about it until we better deal with the search story, i.e. have something that allows for both fast and relevant searches. I, of course, welcome Robert's further thoughts on this.

Cheers,
deryck

Changed in malone:
status: New → Triaged
importance: Undecided → High
tags: added: dupefinder search
summary: - dupefinder now over-tight
+ +filebug dupefinder produces less relevant results after performance
+ tuning changes
Robert Collins (lifeless) wrote : Re: [Bug 626656] Re: dupefinder now over-tight

Hi Deryck, Martin.

I think there is both a big picture and a little picture here.

Big picture, we need to ask how we want the dupefinder to really work.
Google-like magic qualities would be wonderful.

Little picture, we can see about tuning this to expand the width of
searches when the domain is small.

E.g. on bzr - small bug db - search as we did.
On ubuntu - big bug db - search as we do now.
On ubuntu/package-Foo, if it has many bugs, search somewhere
in between, if not search like we did.

I don't know if this is worth doing.
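A rough sketch of that tuning idea, choosing how many terms must match from the size of the bug database being searched. The thresholds, the function name `required_matches`, and the reading of "search as we did" as an any-term match are all assumptions made for illustration.

```python
def required_matches(num_terms, bug_count, small=10_000, large=100_000):
    """Pick how many query terms a candidate bug must contain.

    small and large are hypothetical thresholds on the number of bugs in
    the target: at or below small, any single term is enough; at or above
    large, N-1 terms are required; in between, interpolate linearly.
    """
    if num_terms <= 1:
        return max(num_terms, 0)
    if bug_count <= small:
        return 1
    if bug_count >= large:
        return num_terms - 1
    fraction = (bug_count - small) / (large - small)
    return 1 + round(fraction * (num_terms - 2))


# Hypothetical bug counts for a small, a mid-sized and a large target.
for bug_count in (5_000, 55_000, 1_000_000):
    print(bug_count, required_matches(10, bug_count))
```

For a ten-term summary this yields 1, 5 and 9 required terms respectively, so small projects keep wide searches while large ones keep the tighter, faster query.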

Martin Pool (mbp) wrote :

It seems to me the little picture here is to just add text to the page
saying something like "try removing words from the search to get better
matches".

eventually, yes, sufficiently smart text searching could help a lot in
not just making this page fast but also in reducing ubuntu's dupe
burden.
