Launchpad itself

Incorrect nonce processing in API

Bug #319710 reported by Brad Crittenden on 2009-01-21

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Launchpad itself	Fix Released	High	Gary Poster	Launchpad itself 2.2.2

Bug Description

Using lplib in a long-running script that made tens of thousands of requests I was occasionally seeing 401-Unauthorized errors that are actually shown to be "Invalid nonce/timestamp" errors as shown below[1].

Reviewing the code in db/oath.py for ensureNonce vs the OAuth spec[2] it seems our implementation is exactly backwards. Rather than require nonce uniqueness only for the period of a given timestamp (1 second), our implementation allows nonce reuse for a period of [t-60, t+60] seconds for any timestamp t and then raises an error if the nonce is used again outside of this window. As described in the spec, re-using a nonce with different timestamps should be allowed, not an error condition.

At the top of db/oath.py there is a rationale given for the use of this expanded window but the justification is not strong. It would be better to follow the spec strictly until the need to deviate is demonstrated and well understood.

[1] https://pastebin.canonical.com/12941/
[2] http://oauth.net/core/1.0/#nonce

See original description

Tags:

Brad Crittenden (bac) on 2009-01-21

description:

updated

Revision history for this message

Guilherme Salgado (salgado) wrote on 2009-01-21:

IIRC, ensureNonce() was implemented this way for the use case described in db/oauth.py and things like http://groups.google.com/group/oauth/msg/387fdafcf0be322a, yet still preventing replay attacks. But now I have the feeling these are two conflicting use cases, so ensureNonce() won't actually prevent replay attacks if they're made less than 60 seconds after the original request, no?

Anyway, regardless of what we decide about ensureNonce(), we must start deleting nonces older than a given number of hours/days, to avoid the clashes we're seeing.

Revision history for this message

Brian Murray (brian-murray) wrote on 2009-01-21:

I have seen these errors too when trying to find information about all the bug tasks assigned to particular people.

Changed in launchpad:
status:	New → Confirmed

Francis J. Lacoste (flacoste) on 2009-01-23

Changed in launchpad:
importance:	Undecided → High
status:	Confirmed → Triaged

Revision history for this message

Francis J. Lacoste (flacoste) wrote on 2009-01-23:

Because of https, we don't have to worry about replay attack for now.

But it's true that this implementation is terrible.

We should:

1- Rename the ensureNonce() method to checkNonceAndTimestamp().
2- Look for the nonce using both nonce and timestamp. If there is one, we reject the request because that's not allowed.
3- Look at the latest timestamp/nonce for the request and only allow it if the timestamp is within our window of that last timestamp received. (We allow the window because of pipelining). So we deviate from the spec here.
4- The rationale for this deviation should be explained in way that doesn't make you puzzled for half an hour.

5 (optional) clear out old nonce information. With the above logic that's only to save on cruft, because it's not needed per-se.

Changed in launchpad-foundations:
milestone:	none → 2.2.2

Revision history for this message

Brad Crittenden (bac) wrote on 2009-01-29:

The initial report noted that the error occurred in a long-running script. Just now I ran a script that only made three requests to the API and I got a 401 while attempting to fetch a public project. So it appears the nonce space is being filled based on the access token associated with a user's credentials and cumulative for the life of those credentials, which increases the chance of collision and failure over time.

Revision history for this message

Bryce Harrington (bryce) wrote on 2009-01-31:

I've noticed in the past several days the frequency of this has increased significantly. Whereas before it seemed like I got a 401 every hundred or so operations, now it is more on the order of every 10-20 operations or so.

Brad Crittenden (bac) on 2009-01-31

description:

updated

Revision history for this message

Brad Crittenden (bac) wrote on 2009-01-31:

Francis can you explain the rationale for this "pipelining" exception? It is referred to obliquely in the code, and you make an unexplained reference above, but I do not understand the use case and why we need to turn the spec upside down to support it.

In my understanding the spec says the nonce must be unique only for a window of one second, the granularity of a timestamp.

The pipelining exception seems to indicate the nonce can be non-unique for a window of +/- 60 seconds. That exception completely destroys the utility of using a nonce in the first place as it now must not be unique for a given timestamp.

Francis J. Lacoste (flacoste) on 2009-02-02

Changed in launchpad-foundations:
assignee:	nobody → flacoste
status:	Triaged → In Progress

Revision history for this message

Björn Tillenius (bjornt) wrote on 2009-02-18:

Francis, this has been in progress for more than two weeks. What's the current status of it? I also get those Unauthorized errors quite frequently.

Revision history for this message

Björn Tillenius (bjornt) wrote on 2009-02-18:

As a data point, I have a script that runs every 10 minutes, and it does maybe 30 requests. About every second run, the script fails with an Unauthorized error.

Revision history for this message

Francis J. Lacoste (flacoste) wrote on 2009-02-18:

I haven't basically been able to start concretely on it. Gary will take care of it once he finishes the librarian bug.

Changed in launchpad-foundations:
assignee:	flacoste → garyposter
status:	In Progress → Triaged

Gary Poster (gary) on 2009-02-24

Changed in launchpad-foundations:
status:	Triaged → In Progress

Revision history for this message

Gary Poster (gary) wrote on 2009-02-28:

#10

The necessary database changes were rolled out as a part of 2.2.2. The code changes were committed to trunk after the deployment, but they are only pertinent to edge, and so they will be available very soon (the next time edge is updated from stable).

Changed in launchpad-foundations:
status:	In Progress → Fix Released

Revision history for this message

Bryce Harrington (bryce) wrote on 2009-03-02:

#11

I can confirm this bug seems to be gone.

(Unfortunately, now I'm seeing a worse bug, #336866, but the symptoms are different so guess it's an unrelated problem.)

Revision history for this message

Leonard Richardson (leonardr) wrote on 2009-03-03:

#12

Judging from the log given by Markus in bug 318112, I believe this is a threading problem. httplib2 is not thread safe:
if you write a multithreaded application using launchpadlib, each thread needs to have its own Launchpad object.

Basically, if two threads try to make a request simultaneously there's a good chance httplib will raise CannotSendRequest. httplib2 will try to make the request again, without re-signing it, and you'll get a 401. I don't have a picture of exactly how this works, but unless you have counterevidence I think it's very likely this is the problem.

Report a bug

This report contains Public information

Everyone can see this information.

Duplicates of this bug

Bug #318112

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.