Incorrect nonce processing in API

Bug #319710 reported by Brad Crittenden
38
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Launchpad itself
Fix Released
High
Gary Poster

Bug Description

Using lplib in a long-running script that made tens of thousands of requests I was occasionally seeing 401-Unauthorized errors that are actually shown to be "Invalid nonce/timestamp" errors as shown below[1].

Reviewing the code in db/oath.py for ensureNonce vs the OAuth spec[2] it seems our implementation is exactly backwards. Rather than require nonce uniqueness only for the period of a given timestamp (1 second), our implementation allows nonce reuse for a period of [t-60, t+60] seconds for any timestamp t and then raises an error if the nonce is used again outside of this window. As described in the spec, re-using a nonce with different timestamps should be allowed, not an error condition.

At the top of db/oath.py there is a rationale given for the use of this expanded window but the justification is not strong. It would be better to follow the spec strictly until the need to deviate is demonstrated and well understood.

[1] https://pastebin.canonical.com/12941/
[2] http://oauth.net/core/1.0/#nonce

Brad Crittenden (bac)
description: updated
Revision history for this message
Guilherme Salgado (salgado) wrote :

IIRC, ensureNonce() was implemented this way for the use case described in db/oauth.py and things like http://groups.google.com/group/oauth/msg/387fdafcf0be322a, yet still preventing replay attacks. But now I have the feeling these are two conflicting use cases, so ensureNonce() won't actually prevent replay attacks if they're made less than 60 seconds after the original request, no?

Anyway, regardless of what we decide about ensureNonce(), we must start deleting nonces older than a given number of hours/days, to avoid the clashes we're seeing.

Revision history for this message
Brian Murray (brian-murray) wrote :

I have seen these errors too when trying to find information about all the bug tasks assigned to particular people.

Changed in launchpad:
status: New → Confirmed
Changed in launchpad:
importance: Undecided → High
status: Confirmed → Triaged
Revision history for this message
Francis J. Lacoste (flacoste) wrote :

Because of https, we don't have to worry about replay attack for now.

But it's true that this implementation is terrible.

We should:

1- Rename the ensureNonce() method to checkNonceAndTimestamp().
2- Look for the nonce using both nonce and timestamp. If there is one, we reject the request because that's not allowed.
3- Look at the latest timestamp/nonce for the request and only allow it if the timestamp is within our window of that last timestamp received. (We allow the window because of pipelining). So we deviate from the spec here.
4- The rationale for this deviation should be explained in way that doesn't make you puzzled for half an hour.

5 (optional) clear out old nonce information. With the above logic that's only to save on cruft, because it's not needed per-se.

Changed in launchpad-foundations:
milestone: none → 2.2.2
Revision history for this message
Brad Crittenden (bac) wrote :

The initial report noted that the error occurred in a long-running script. Just now I ran a script that only made three requests to the API and I got a 401 while attempting to fetch a public project. So it appears the nonce space is being filled based on the access token associated with a user's credentials and cumulative for the life of those credentials, which increases the chance of collision and failure over time.

Revision history for this message
Bryce Harrington (bryce) wrote :

I've noticed in the past several days the frequency of this has increased significantly. Whereas before it seemed like I got a 401 every hundred or so operations, now it is more on the order of every 10-20 operations or so.

Brad Crittenden (bac)
description: updated
Revision history for this message
Brad Crittenden (bac) wrote :

Francis can you explain the rationale for this "pipelining" exception? It is referred to obliquely in the code, and you make an unexplained reference above, but I do not understand the use case and why we need to turn the spec upside down to support it.

In my understanding the spec says the nonce must be unique only for a window of one second, the granularity of a timestamp.

The pipelining exception seems to indicate the nonce can be non-unique for a window of +/- 60 seconds. That exception completely destroys the utility of using a nonce in the first place as it now must not be unique for a given timestamp.

Changed in launchpad-foundations:
assignee: nobody → flacoste
status: Triaged → In Progress
Revision history for this message
Björn Tillenius (bjornt) wrote :

Francis, this has been in progress for more than two weeks. What's the current status of it? I also get those Unauthorized errors quite frequently.

Revision history for this message
Björn Tillenius (bjornt) wrote :

As a data point, I have a script that runs every 10 minutes, and it does maybe 30 requests. About every second run, the script fails with an Unauthorized error.

Revision history for this message
Francis J. Lacoste (flacoste) wrote :

I haven't basically been able to start concretely on it. Gary will take care of it once he finishes the librarian bug.

Changed in launchpad-foundations:
assignee: flacoste → garyposter
status: In Progress → Triaged
Gary Poster (gary)
Changed in launchpad-foundations:
status: Triaged → In Progress
Revision history for this message
Gary Poster (gary) wrote :

The necessary database changes were rolled out as a part of 2.2.2. The code changes were committed to trunk after the deployment, but they are only pertinent to edge, and so they will be available very soon (the next time edge is updated from stable).

Changed in launchpad-foundations:
status: In Progress → Fix Released
Revision history for this message
Bryce Harrington (bryce) wrote :

I can confirm this bug seems to be gone.

(Unfortunately, now I'm seeing a worse bug, #336866, but the symptoms are different so guess it's an unrelated problem.)

Revision history for this message
Leonard Richardson (leonardr) wrote :

Judging from the log given by Markus in bug 318112, I believe this is a threading problem. httplib2 is not thread safe:
if you write a multithreaded application using launchpadlib, each thread needs to have its own Launchpad object.

Basically, if two threads try to make a request simultaneously there's a good chance httplib will raise CannotSendRequest. httplib2 will try to make the request again, without re-signing it, and you'll get a 401. I don't have a picture of exactly how this works, but unless you have counterevidence I think it's very likely this is the problem.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.