Make it possible to use feature flags to override the global timeout for specific pages

Bug #627701 reported by Gary Poster
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Launchpad itself
Fix Released
High
Robert Collins

Bug Description

Robert said something very close to this:

"I want to lower the global timeout more but we have same pages which are all of:
 - infrequently used
 - extremely hard to optimise today
 - slow
so I've proposed and I think you were ok, with making the timeout overridable per page but this needs a unique-enough key to identify the page in the appserver. I had been thinking to use the page id, but I'm not entirely sure now whether its unique enough or even appropriate."

I agreed, pageids were not designed for that. Robert continued:

"I was actually thinking to lookup (key) as a feature flag scope. As long as (key) shows up in oopses, that would be a very low friction means to change this. As in, a sysadmin could just do it, when a page starts blowing up in future, file a critical bug, we fix, rule gets removed returning it to the default."

I mentioned that we could not try to change pageid for now, and that we could maybe use the full view class name for a key, but it might be tricky, because of dynamically created classes from some old Zope code we still use, for one thing.

Robert proceeded.

"The key constraints in the short term are:
 - deterministic key;
 - lookup in features;
 - control per-request timeout if a feature rule is found; and
 - key shown in oops somehow.
The uniqueness needed is that we should be identifying a code path reasonably precisely. Adding stuff to the oops is easy via the request vars dict, we can wedge it in any old how."

I questioned whether looking them up in feature flags were a short-term requirement. Robert replied:

"It is because:
 - we don't know -completely reliably- whats going to go *boom* when we change the global timeout
  - when we change it, we'll be flooded with issues, and the config system takes about 1.5 hours to make a change."

I conceded the point.

Related branches

Revision history for this message
Robert Collins (lifeless) wrote :

I'd like to note that I've added a pageid scope feature already, so all thats needed is something like this:

07:51 <lifeless> === modified file 'lib/canonical/launchpad/webapp/adapter.py'
07:51 <lifeless> --- lib/canonical/launchpad/webapp/adapter.py 2010-09-14 04:52:39 +0000
07:51 <lifeless> +++ lib/canonical/launchpad/webapp/adapter.py 2010-09-17 19:50:58 +0000
07:51 <lifeless> @@ -272,6 +272,13 @@
07:51 <lifeless> if not getattr(_local, 'enable_timeout', True):
07:51 <lifeless> return None
07:51 <lifeless> if timeout is None:
07:51 <lifeless> + timeout_str = getFeatureFlag('hard_timeout')
07:51 <lifeless> + if timeout_str:
07:51 <lifeless> + try:
07:51 <lifeless> + timeout = float(timeout_str)
07:51 <lifeless> + except ValueError:
07:51 <lifeless> + logging.error('invalid hard timeout flag %r', timeout_str)
07:51 <lifeless> + if timeout is None:
07:51 <lifeless> timeout = config.database.db_statement_timeout
07:51 <lifeless> if not timeout:
07:51 <lifeless> return None

(and tests) grepping for 'lp.services.features' in existing doctests will show (a way) to do it.

Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 627701] Re: Make it possible to use feature flags to override the global timeout for specific pages

Oh, also, please add the new flag (I suggest calling it hard_timeout
or hard-timeout) to the table at the bottom of
https://dev.launchpad.net/LEP/FeatureFlags when it lands.

Revision history for this message
Stuart Bishop (stub) wrote :

This feature can be used to keep the timeout high for person merge requests - the lower the timeout, the more of these will fail per Bug #162510. We can keep a reasonably high timeout for user requested merges, and a very high timeout for admin merges.

Changed in launchpad-foundations:
status: Triaged → In Progress
assignee: nobody → Robert Collins (lifeless)
milestone: none → 10.10
Revision history for this message
Launchpad QA Bot (lpqabot) wrote : Bug fixed by a commit
tags: added: qa-needstesting
Changed in launchpad-foundations:
status: In Progress → Fix Committed
Revision history for this message
Robert Collins (lifeless) wrote :

This is now working - I successfully changed the staging bugtask index page timeout to 5 seconds and things timed out like crazy :)

tags: added: qa-ok
removed: qa-needstesting
Curtis Hovey (sinzui)
Changed in launchpad-foundations:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.