Eucalyptus cloud controller stops working suddenly

Bug #428010 reported by Gustavo Niemeyer
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
eucalyptus (Ubuntu) | Confirmed | High | Unassigned |

Bug Description

I have already had to reinstall Eucalyptus from the ground up several times. After some time with little to no activity (besides running euca-describe-images and logging in to the admin interface), it refuses new actions through the interface with:

 $ euca-describe-images
Warning: failed to parse error message from AWS: <unknown>:1:0: syntax error
EC2ResponseError: 403 Forbidden
Failure: 403 Forbidden

This same command with exactly the same settings was working just moments before.

Then, if I try to download a new credentials zip file from the admin interface, I get back an empty file (0 bytes).

Once this happens, restarting Eucalyptus doesn't help. I have to wipe the whole installation (purging packages, configuration, databases, etc.) and reinstall everything. It then works again for a short period before the problem comes back.

Chuck Short (zulcss) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. This bug did not have a package associated with it, which is important for ensuring that it gets looked at by the proper developers. You can learn more about finding the right package at https://wiki.ubuntu.com/Bugs/FindRightPackage. I have classified this bug as a bug in eucalyptus.

When reporting bugs in the future please use apport, either via the appropriate application's "Help -> Report a Problem" menu or using 'ubuntu-bug' and the name of the package affected. You can learn more about this functionality at https://wiki.ubuntu.com/ReportingBugs.

affects: ubuntu → eucalyptus (Ubuntu)
Gustavo Niemeyer (niemeyer) wrote :

Oops.. sorry for missing the package.

Btw, this was with 1.6~bzr645-0ubuntu2.

Etienne Goyer (etienne-goyer-outlands) wrote :

I confirm I had exactly the same problem happen to me last week.

When looking at /var/log/eucalyptus/cloud-error.log, there was some verbiage about a corrupted index or somesuch. Do you see something similar?

Also, is it possible that you started the eucalyptus-cloud service from within an SSH session that suffers from bug #407428?

Gustavo Niemeyer (niemeyer) wrote :

Possibly on some occasions, but signals seem to be working fine now, and it happened to me again today.

Gustavo Niemeyer (niemeyer) wrote :

I'm attaching my cloud-output.log file. Running grep -i "corrupt" on it doesn't find anything, but there are several other errors.

Trevor Ellermann (trevor-ellermann) wrote :

I am getting a similar error, though for me it only happens when I actually try to run an instance. Here is the exact command line and error message:

$ euca-run-instances emi-67DC1321 -k mykey
Warning: failed to parse error message from AWS: <unknown>:1:0: syntax error
EC2ResponseError: 403 Forbidden
Failure: 403 Forbidden

Here is the only error message produced when I try to run the instance. It is from cloud-output.log:

com.eucalyptus.ws.AuthenticationException: Missing required parameter: AWSAccessKeyId
        at com.eucalyptus.ws.handlers.HmacV2Handler.incomingMessage(HmacV2Handler.java:110)
        at com.eucalyptus.ws.handlers.MessageStackHandler.handleUpstream(MessageStackHandler.java:115)
        at com.eucalyptus.ws.server.FilteredPipeline$StageBottomHandler.handleUpstream(FilteredPipeline.java:171)
        at com.eucalyptus.ws.server.NioServerHandler.messageReceived(NioServerHandler.java:119)
        at org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:114)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:385)
        at org.jboss.netty.handler.codec.replay.ReplayingDecoder.unfoldAndfireMessageReceived(ReplayingDecoder.java:459)
        at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:443)
        at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:381)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:342)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:329)
        at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:330)
        at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:282)
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:203)
        at org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:53)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:636)
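
The trace above shows the server rejecting a request that arrived without an AWSAccessKeyId parameter. One client-side condition worth ruling out is that the shell running the euca2ools command has no credentials exported. The following is a minimal sketch, assuming the credentials come from a eucarc-style file that exports EC2_URL, EC2_ACCESS_KEY and EC2_SECRET_KEY; the variable list and the helper name missing_credentials are illustrative assumptions, not something taken from this report.

```python
import os

# Variable names that a eucarc-style credentials file is assumed to export.
# This list is an assumption for illustration; adjust it to your setup.
REQUIRED_VARS = ("EC2_URL", "EC2_ACCESS_KEY", "EC2_SECRET_KEY")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing or empty: " + ", ".join(missing))
    else:
        print("All expected credential variables are set.")
```

If every variable is set and the 403 persists, the cause is more likely on the server side, as the rest of this report suggests.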

Etienne Goyer (etienne-goyer-outlands) wrote :

Trevor,

Can you have a look at bug #430093 and see if it is the same bug that you are hitting? If so, could you add a comment there with the relevant details (such as the above log snippet)?

Thanks!

Dustin Kirkland (kirkland) wrote :

Marking confirmed, since multiple reports seem to indicate that this issue is widespread.

High priority, as this is core to the functionality of a working cloud.

:-Dustin

Changed in eucalyptus (Ubuntu):
status: New → Confirmed
importance: Undecided → High
Gustavo Niemeyer (niemeyer) wrote :

From the description of bug #436885, this doesn't feel like a duplicate, since in my case the deadlock condition did not follow any significant usage. Then again, the other bug is quite short on details, so that difference may not be relevant.
