AttributeError: 'NoneType' object has no attribute 'utf_8_decode'

Bug #491705 reported by Robert Collins
16
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Launchpad itself
Fix Released
High
Gary Poster

Bug Description

We're seeing operational issues in the app servers when they are brought up in r/o mode.

I'll attach more backtraces to the bug, but briefly:
encodings.utf_8 has its global variable codecs set to None, which is something the VM does when it thinks a module is being finalised, to make things cleanup safely and racefree.

This patch http://bugs.python.org/file15318/modcleanup.patch in http://bugs.python.org/issue6551 shows a similar issue that Python itself had in its test suite with near-identical symptoms.

My theory is:
14:11 < lifeless> I'm thinking a thread that starts up and does some naughty 'probe'
14:12 < lifeless> and thus sometimes it wins, and the system breaks
14:12 < lifeless> and other times it loses and the system doesn't break

Which sounded plausible to spiv, so I think bears investigating.

It happened with several (but not all app servers) and restarting them manually usually work.

Related branches

Revision history for this message
Robert Collins (lifeless) wrote :
Download full text (4.6 KiB)

Traceback excerpt

2009-12-03T02:31:46 ERROR root Exception during task
Traceback (most recent call last):
  File "/srv/launchpad.net/production/login-rev-8648/eggs/zope.server-3.4.3-py2.4.egg/zope/server/taskthreads.py", line 48, in handlerThread
    task.service()
  File "/srv/launchpad.net/production/login-rev-8648/eggs/zope.server-3.4.3-py2.4.egg/zope/server/serverchannelbase.py", line 207, in service
    task.service()
  File "/srv/launchpad.net/production/login-rev-8648/eggs/zope.server-3.4.3-py2.4.egg/zope/server/http/httptask.py", line 71, in service
    self.channel.server.executeRequest(self)
  File "/srv/launchpad.net/production/login-rev-8648/eggs/zc.zservertracelog-1.1.5-py2.4.egg/zc/zservertracelog/tracelog.py", line 114, in executeRequest
    response = self.application(env, start_response)
  File "/srv/launchpad.net/production/login-rev-8648/eggs/zope.app.wsgi-3.4.1-py2.4.egg/zope/app/wsgi/__init__.py", line 49, in __call__
    request = self.requestFactory(environ['wsgi.input'], environ)
  File "/srv/launchpad.net/production/login-rev-8648/eggs/zope.app.publication-3.4.4-py2.4.egg/zope/app/publication/httpfactory.py", line 69, in __call__
    request = request_class(input_stream, env)
  File "/srv/launchpad.net/production/login-rev-8648/lib/canonical/launchpad/webapp/servers.py", line 573, in __init__
    BasicLaunchpadRequest.__init__(self, body_instream, environ, response)
  File "/srv/launchpad.net/production/login-rev-8648/lib/canonical/launchpad/webapp/servers.py", line 502, in __init__
    super(BasicLaunchpadRequest, self).__init__(
  File "/srv/launchpad.net/production/login-rev-8648/eggs/zope.publisher-3.5.6-py2.4.egg/zope/publisher/browser.py", line 234, in __init__
    super(BrowserRequest, self).__init__(body_instream, environ, response)
  File "/srv/launchpad.net/production/login-rev-8648/eggs/zope.publisher-3.5.6-py2.4.egg/zope/publisher/http.py", line 311, in __init__
    environ = sane_environment(environ)
  File "/srv/launchpad.net/production/login-rev-8648/eggs/zope.publisher-3.5.6-py2.4.egg/zope/publisher/http.py", line 79, in sane_environment
    dict['PATH_INFO'] = dict['PATH_INFO'].decode('utf-8')
  File "/usr/lib/python2.4/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
AttributeError: 'NoneType' object has no attribute 'utf_8_decode'
------
2009-12-03T02:31:47 ERROR root Exception during task
Traceback (most recent call last):
  File "/srv/launchpad.net/production/login-rev-8648/eggs/zope.server-3.4.3-py2.4.egg/zope/server/taskthreads.py", line 48, in handlerThread
    task.service()
  File "/srv/launchpad.net/production/login-rev-8648/eggs/zope.server-3.4.3-py2.4.egg/zope/server/serverchannelbase.py", line 207, in service
    task.service()
  File "/srv/launchpad.net/production/login-rev-8648/eggs/zope.server-3.4.3-py2.4.egg/zope/server/http/httptask.py", line 71, in service
    self.channel.server.executeRequest(self)
  File "/srv/launchpad.net/production/login-rev-8648/eggs/zc.zservertracelog-1.1.5-py2.4.egg/zc/zservertracelog/tracelog.py", line 114, in executeRequest
    response = self.application(env, start_response)
  File "/srv/launchpad.net/productio...

Read more...

security vulnerability: yes → no
affects: canonical-identity-provider → launchpad-foundations
visibility: private → public
description: updated
description: updated
Changed in launchpad-foundations:
status: New → Triaged
importance: Undecided → Critical
Revision history for this message
Francis J. Lacoste (flacoste) wrote :

Downgrading to High, since it didn't occur on the new python2.5 code.

It's not clear what is garbage collecting the encodings.utf_8 module, but keeping a reference to it at the top of bin/run allowed us to restart several times without problems.

In 2.4, the encodings.utf_8 module is not in the clean_modules list.

In 2.5, it is.

It probably makes sense for zc.recipe.egg to also not delete any module starting with encodings. because of that problem.

Also there were questions on how safe it was to modify sys.modules while iterating over its keys.

Changed in launchpad-foundations:
importance: Critical → High
Gary Poster (gary)
Changed in launchpad-foundations:
assignee: nobody → Gary Poster (gary)
Revision history for this message
Gary Poster (gary) wrote :

= Previous shipit occurrence =

We encountered a similar problem in shipit earlier. ``encodings.utf_8`` was not in the bin/run clean_modules list initially. This was working fine for some period of time (weeks). Suddenly, this began causing a problem exactly like the one described here when restarting with Python 2.4. We eliminated _pythonpath.py from the possible causes by making it into an empty file. We isolated the same problem identified here (``encodings.utf_8`` being removed via the code in bin/run). This should theoretically only happen if the system Python has changed somehow (e.g., site.py is changed to, directly or indirectly, import encodings.utf_8). We ran ``make clean`` and ``make`` and the problem went away--bin/run now included ``encodings.utf8`` in clean_modues.

I currently see a few possible causes.

(1) The system Python changed somehow between the two starts. For instance, site.py was changed to directly or indirectly import encodings.utf_8, as described above.

(2) Python sometimes includes encodings.utf_8, and sometimes doesn't (e.g., a race condition, like what Robert describes). If the build is created when encoding.utf_8 does not stick around, and then bin/run is called when the module *does* stick around, things break.

The symptoms we saw in the shipit incident seemed to indicate the first cause, while the symptoms here seem to indicate the second. Perhaps there is a combination of the two causes: (3) something could have changed in the system Python to make a race condition change the likely winner.

= Modifying sys.modules while iterating keys =

Francis mentioned that this was a concern.

We are iterating over a snapshot of the keys (i.e., the list from ``keys()`` rather than ``iterkeys()`` or, effectively, ``iter(...)``) so that aspect of the mutation should not be an issue.

That said, unsurprisingly, mutating sys.modules also makes me uncomfortable.

= Possible fixes =

(1) We could hardcode names of packages not to remove from sys.modules (as suggested, do not delete any module starting with ``encodings.``). This maintains the risky sys.modules mutation, and only slightly loosens the very tight whitelist approach to the allowed modules.

(2) We could prevent removal of any packages that live in "standard lib" paths, rather than maintaining a whitelist. These paths are already available to us. This maintains the risky sys.modules mutation. However, it does not hardcode values. It significantly loosens the allowed modules, and would be more generally forgiving than what we have now, or than the approach described in the first option.

(3) We could not mutate sys.modules at all, and use an os.exec variant calling python -S to restart the script. This also has made me uncomfortable, but experience is suggesting that it may be the better of the two basic approaches (that is, in comparison with the sys.modules mutation).

(Note that another option, simply making the shebang line call python -S, is not an option because buildout needs to support Windows.)

My current preference is option 3, followed by option 2. I'll pursue the third, and fall back to other options as necessary.

Gary Poster (gary)
Changed in launchpad-foundations:
status: Triaged → In Progress
Revision history for this message
Gary Poster (gary) wrote :

The following is a branch with the workaround that the LOSAs have had to do manually:

https://code.launchpad.net/~gary/launchpad/bug-491705-hack

It is approved for merging and in EC2 in preparation for landing.

The proper fix is in svn+ssh://svn.zope.org/repos/main/zc.buildout/branches/gary-4 and is near completion.

Revision history for this message
Gary Poster (gary) wrote :

Linked branch addresses the problem. I'll land after 10.02 so we have plenty of time to QA.

Changed in launchpad-foundations:
milestone: none → 10.03
Gary Poster (gary)
Changed in launchpad-foundations:
milestone: 10.03 → 10.04
Gary Poster (gary)
Changed in launchpad-foundations:
status: In Progress → Fix Committed
Revision history for this message
Ursula Junque (ursinha) wrote : Bug fixed by a commit
tags: added: qa-needstesting
Revision history for this message
Ursula Junque (ursinha) wrote :
Gary Poster (gary)
Changed in launchpad-foundations:
status: Fix Committed → Fix Released
Gary Poster (gary)
tags: added: qa-ok
removed: qa-needstesting
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.