1.6~bzr854-0ubuntu13 fails to run instances

Bug #439288 reported by Thierry Carrez
This bug affects 1 person
Affects              Status   Importance  Assigned to     Milestone
eucalyptus (Ubuntu)  Invalid  High        Thierry Carrez
Karmic               Invalid  High        Thierry Carrez

Bug Description

When starting up an instance on 1.6~bzr854-0ubuntu13, it fails to run on the NC.

nc.log shows:
[EUCAINFO ] retrieving images for instance i-5BA00998 (disk limit=20480MB)...
[EUCAINFO ] walrus_request(): downloading /var/lib/eucalyptus/instances/admin/i-5BA00998/kernel-digest
[EUCAINFO ] from http://127.0.0.1:8773/services/Walrus/ueckernel/vmlinuz-2.6.31-11-generic.manifest.xml
[EUCADEBUG ] walrus_request(): writing GET output to /var/lib/eucalyptus/instances/admin/i-5BA00998/kernel-digest
[EUCADEBUG ] walrus_request(): wrote 0 bytes in 0 writes
[EUCAERROR ] walrus_request(): couldn't connect to host (7)

For some reason it believes it should connect to 127.0.0.1 to retrieve things.
Cluster is apparently correctly registered with external IP addresses... I couldn't find an easy workaround.
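
One way to confirm which Walrus endpoint the NC is actually trying to reach is to scan nc.log for the download URLs and flag any that point at a loopback address. The following is a minimal sketch, assuming the log lines look like the excerpt above; the function name find_loopback_walrus_urls is made up for illustration and is not part of Eucalyptus.

```python
import ipaddress
import re
from urllib.parse import urlparse

# Matches the URL in lines such as:
#   [EUCAINFO ] from http://127.0.0.1:8773/services/Walrus/...
URL_RE = re.compile(r"\bfrom (https?://\S+)")


def find_loopback_walrus_urls(log_lines):
    """Return the URLs in log_lines whose host is a loopback address."""
    suspicious = []
    for line in log_lines:
        match = URL_RE.search(line)
        if not match:
            continue
        url = match.group(1)
        host = urlparse(url).hostname or ""
        try:
            is_loopback = ipaddress.ip_address(host).is_loopback
        except ValueError:
            # Not a literal IP; only the plain name "localhost" is treated
            # as loopback here (hostnames are not resolved).
            is_loopback = host == "localhost"
        if is_loopback:
            suspicious.append(url)
    return suspicious


sample = [
    "[EUCAINFO ] walrus_request(): downloading /var/lib/eucalyptus/instances/admin/i-5BA00998/kernel-digest",
    "[EUCAINFO ] from http://127.0.0.1:8773/services/Walrus/ueckernel/vmlinuz-2.6.31-11-generic.manifest.xml",
    "[EUCAERROR ] walrus_request(): couldn't connect to host (7)",
]
for url in find_loopback_walrus_urls(sample):
    print("loopback Walrus URL:", url)
```

Run against the full nc.log (for example by passing an open file object), this would show at a glance whether every download attempt targets 127.0.0.1 or only some of them.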

I'm considering reverting beta candidate to 20090930.

Tags: iso-testing
Thierry Carrez (ttx)
Changed in eucalyptus (Ubuntu):
importance: Undecided → High
milestone: none → ubuntu-9.10-beta
Thierry Carrez (ttx) wrote :

Needs to be reverted to code from ubuntu12 and uploaded as ubuntu14.

Changed in eucalyptus (Ubuntu Karmic):
milestone: ubuntu-9.10-beta → none
status: New → Triaged
Thierry Carrez (ttx)
Changed in eucalyptus (Ubuntu Karmic):
assignee: nobody → Dustin Kirkland (kirkland)
Thierry Carrez (ttx) wrote :

Some more information.
I did a 20090930.1 CD-based install.
Cluster : no issue. Looking at the registration logs, I remember seeing some errors, but the final lines all show "success" with the external IP used, so I figured everything was ok and fixed.
Node: no issue.
Stopped/started eucalyptus on cluster to take care of bug 439251
Ran nodes discovery and added the node.
On the node, nc.log shows it's getting polled.
Bundled EMI, ran instance: nc.log shows 127.0.0.1 walrus download errors (see above)
Going back to the cluster, I tried:
- restarting everything
- manually registering
- manually registering and restart
- manually deregister, register and restart
Every time the external IP address is shown and the tools answer "success".
But every time I run the instance I get the error on the NC.

Daniel Nurmi (nurmi) wrote :

Is it possible that, at some point, a walrus registration was attempted with 'localhost'? If so, you'll need to deregister walrus:

euca_conf --deregister-walrus

and try registering again with a valid public IP. This error is implying that the system believes that a registered walrus lives on 127.0.0.1
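
A quick sanity check before re-registering is to make sure the address about to be handed to the registration command is not a loopback one. This is an illustrative sketch only: is_loopback_host is a hypothetical helper, not something euca_conf provides, and it does not resolve hostnames.

```python
import ipaddress


def is_loopback_host(host):
    """Return True if host is a loopback IP literal or the name 'localhost'."""
    name = host.strip().lower()
    try:
        return ipaddress.ip_address(name).is_loopback
    except ValueError:
        # Hostnames are not resolved here; only the obvious local names are caught.
        return name == "localhost" or name.endswith(".localhost")


for candidate in ("localhost", "127.0.0.1", "192.168.0.136"):
    print(candidate, "->", "loopback" if is_loopback_host(candidate) else "ok")
```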

Thierry Carrez (ttx) wrote :

This is an install from scratch, using CD install. First test was done with the autoregistration stuff from Dustin. The logs showed the external IP to be used.

See above, I tried deregister/manual register. It also showed the external IP being used, and also failed. I think there is something on the NC side that makes it believe walrus is at 127.0.0.1; whatever you do on the Cluster side doesn't seem to help.

btw, I also tried deregistering the node and reregistering it, fwiw :)

Dustin Kirkland  (kirkland) wrote :

Thierry,

I *definitely* am not reproducing the 127.0.0.1 error. In my nc.log, I'm seeing:

[Wed Sep 30 17:37:49 2009][008286][EUCAINFO ] from http://192.168.0.136:8773/services/Walrus/ueckernel/vmlinuz-2.6.31-11-generic.manifest.xml
[Wed Sep 30 17:37:49 2009][008286][EUCAINFO ] from http://192.168.0.136:8773/services/Walrus/uecramdisk/initrd.img-2.6.31-11-generic.manifest.xml
[Wed Sep 30 17:37:49 2009][008286][EUCAINFO ] from http://192.168.0.136:8773/services/Walrus/uecimage/ubuntu-uec-jaunty-amd64.img.manifest.xml
[Wed Sep 30 17:37:49 2009][008286][EUCAINFO ] from http://192.168.0.136:8773/services/Walrus/uecimage/ubuntu-uec-jaunty-amd64.img.manifest.xml

Where 192.168.0.136 is my CC, and 192.168.0.121 is my NC.

I am also seeing my network traffic flooded for ~15 minutes as it downloads 10GB+, and I see
root@node:/var/lib/eucalyptus/instances# find /var/lib/eucalyptus/instances/ -type f
/var/lib/eucalyptus/instances/admin/i-3BEB06F3/kernel
/var/lib/eucalyptus/instances/admin/i-3BEB06F3/ramdisk
/var/lib/eucalyptus/instances/admin/i-3BEB06F3/disk-digest
/var/lib/eucalyptus/instances/admin/i-3BEB06F3/disk
/var/lib/eucalyptus/instances/eucalyptus/cache/eki-8F0A139D/kernel
/var/lib/eucalyptus/instances/eucalyptus/cache/eki-8F0A139D/kernel-digest
/var/lib/eucalyptus/instances/eucalyptus/cache/eri-E64F14E8/ramdisk
/var/lib/eucalyptus/instances/eucalyptus/cache/eri-E64F14E8/ramdisk-digest
/var/lib/eucalyptus/instances/eucalyptus/cache/emi-D5871527/disk-staging

However, I never see my instance move from "pending" to "running", so there is a bug here, but perhaps different from whatever you're seeing on your end.
RESERVATION r-31500705 admin default
INSTANCE i-3BEB06F3 emi-D5871527 192.168.0.250 172.19.1.2 pending mykey 0 m1.xlarge 2009-09-30T22:37:49.042Z canyonedge eki-8F0A139D eri-E64F14E8

:-Dustin

Dustin Kirkland  (kirkland) wrote :

Actually, the last part of the previous comment about "never" moving pending->running might be premature.

It's still working on creating the instance, I think. On the NC, I can see:

root 17311 17278 3 17:53 ? 00:00:16 /bin/dd if=/var/lib/eucalyptus/instances/admin/i-3BEB06F3/disk of=/dev/loop1 bs=512k

So I'll give it a bit longer. (Note that I'm doing this on an external USB hard drive, so my throughput is very slow, around 10MB/s.)

:-Dustin
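
For a rough sense of how long that dd copy should take, the arithmetic can be sketched as below. The figures are assumptions: an image of about 10 GB (the "10GB+" mentioned earlier referred to download volume, not necessarily the disk size) and the roughly 10MB/s quoted above.

```python
def transfer_minutes(size_gb, rate_mb_per_s):
    """Estimate minutes needed to copy size_gb gigabytes at rate_mb_per_s MB/s."""
    size_mb = size_gb * 1024
    return size_mb / rate_mb_per_s / 60


# Assumed figures: ~10 GB image, ~10 MB/s on the USB disk.
print(f"{transfer_minutes(10, 10):.0f} minutes")
```

Under those assumptions the copy alone takes on the order of a quarter of an hour, on top of the download time, which is consistent with an instance sitting in "pending" for a long while.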

Dustin Kirkland  (kirkland) wrote :

Okay, yeah, so my instance runs after a long time:

RESERVATION r-31500705 admin default
INSTANCE i-3BEB06F3 emi-D5871527 192.168.0.250 172.19.1.2 running mykey 0 m1.xlarge 2009-09-30T22:37:49.042Z canyonedge eki-8F0A139D eri-E64F14E8

The console output is attached. Networking isn't quite working correctly.

:-Dustin
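
When waiting for the pending-to-running transition, the state can be pulled out of the euca-describe-instances text rather than read by eye. This is a small sketch that assumes the field order seen in the output above (state in the sixth column of the INSTANCE line); instance_states is a hypothetical name, and other euca2ools versions may lay the columns out differently.

```python
def instance_states(describe_output):
    """Map instance ID to state from euca-describe-instances style text."""
    states = {}
    for line in describe_output.splitlines():
        fields = line.split()
        # Assumed layout (from the sample):
        # INSTANCE <id> <emi> <public-ip> <private-ip> <state> ...
        if len(fields) >= 6 and fields[0] == "INSTANCE":
            states[fields[1]] = fields[5]
    return states


sample_output = """RESERVATION r-31500705 admin default
INSTANCE i-3BEB06F3 emi-D5871527 192.168.0.250 172.19.1.2 running mykey 0 m1.xlarge 2009-09-30T22:37:49.042Z canyonedge eki-8F0A139D eri-E64F14E8"""
print(instance_states(sample_output))
```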

Dustin Kirkland  (kirkland) wrote :

I'm punting this one back over to you, Thierry, as I was able to run instances (and my day is ending here).

:-Dustin

Changed in eucalyptus (Ubuntu Karmic):
assignee: Dustin Kirkland (kirkland) → Thierry Carrez (ttx)
Dustin Kirkland  (kirkland) wrote :

The lack of networking appears to be a problem in the particular image I tried (daily UEC build).

When I bundle and run an image from Eucalyptus, such as:
 * http://eucalyptussoftware.com/downloads/eucalyptus-images/euca-ubuntu-9.04-x86_64.tar.gz
networking comes up just fine!

I'm now running my *first* network connected Eucalyptus instance!

:-Dustin

Dustin Kirkland  (kirkland) wrote :

I was able to do this with these instructions:

 wget http://eucalyptussoftware.com/downloads/eucalyptus-images/euca-ubuntu-9.04-x86_64.tar.gz
 gunzip euca-ubuntu-9.04-x86_64.tar.gz
 tar xvf euca-ubuntu-9.04-x86_64.tar
 euca-bundle-image -i euca-ubuntu-9.04-x86_64/kvm-kernel/vmlinuz-2.6.28-11-generic --kernel true
 euca-upload-bundle -b ueckernel -m /tmp/vmlinuz-2.6.28-11-generic.manifest.xml
 euca-register ueckernel/vmlinuz-2.6.28-11-generic.manifest.xml
 euca-bundle-image -i euca-ubuntu-9.04-x86_64/kvm-kernel/initrd.img-2.6.28-11-generic --ramdisk true
 euca-upload-bundle -b uecramdisk -m /tmp/initrd.img-2.6.28-11-generic.manifest.xml
 euca-register uecramdisk/initrd.img-2.6.28-11-generic.manifest.xml
 euca-bundle-image -i euca-ubuntu-9.04-x86_64/ubuntu.9-04.x86-64.img --kernel eki-XXXXXXXX --ramdisk eri-XXXXXXXX
 euca-upload-bundle -b uecimage -m /tmp/ubuntu.9-04.x86-64.img.manifest.xml
 euca-register uecimage/ubuntu.9-04.x86-64.img.manifest.xml
 euca-run-instances -k mykey emi-XXXXXXXX
 watch -n 5 euca-describe-instances
 euca-get-console-output i-XXXXXXXX
 touch mykey.priv
 chmod 600 mykey.priv
 euca-add-keypair mykey > mykey.priv
 ssh -i mykey.priv ubuntu@192.168.0.250
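
The kernel, ramdisk and image steps above all follow the same bundle/upload/register pattern, so they can be generated rather than typed. The sketch below only builds the command lines and does not run them; bundle_commands is a hypothetical helper, and it assumes the manifest lands in /tmp under the input file's base name, as it does in the instructions above.

```python
import posixpath


def bundle_commands(path, bucket, extra_args=()):
    """Return the bundle/upload/register commands for one file, as argument lists."""
    manifest = posixpath.basename(path) + ".manifest.xml"
    return [
        ["euca-bundle-image", "-i", path, *extra_args],
        # Assumes euca-bundle-image wrote its manifest to /tmp, as in the steps above.
        ["euca-upload-bundle", "-b", bucket, "-m", "/tmp/" + manifest],
        ["euca-register", bucket + "/" + manifest],
    ]


kernel = "euca-ubuntu-9.04-x86_64/kvm-kernel/vmlinuz-2.6.28-11-generic"
for cmd in bundle_commands(kernel, "ueckernel", ["--kernel", "true"]):
    print(" ".join(cmd))
```

The same call with the "uecramdisk" bucket and "--ramdisk true" covers the ramdisk. For the disk image, the extra arguments would carry the eki and eri IDs printed by the two earlier euca-register calls, which are left as placeholders (eki-XXXXXXXX, eri-XXXXXXXX) in the instructions.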

Thierry Carrez (ttx)
Changed in eucalyptus (Ubuntu Karmic):
assignee: Thierry Carrez (ttx) → Dustin Kirkland (kirkland)
Thierry Carrez (ttx) wrote :

I won't be able to spend more time on this this morning.

I validated that this bug does *not* happen in this scenario:
- Install amd64 20090930.2 ISO (with 0ubuntu12)
- manual registration
- upgrade to 0ubuntu13

I also validated that this bug doesn't occur in this scenario:
- Install i386 20090930.2 ISO (with 0ubuntu12)
- upgrade to 0ubuntu13

Two things to consider:
1/ Maybe the issue is introduced by direct installation of ubuntu13 from the ISO tasksel. It introduces subtle differences (like not starting the services while still running the postinsts) that may explain it...
2/ Maybe it was a screwup in my test. Definitely a possibility, since I'm the only one that tested that ISO and I ran the full test only once (you know how much time it takes :)

In both cases we should work on post-beta daily ISOs (shipping ubuntu13) to validate or invalidate my findings.

Changed in eucalyptus (Ubuntu Karmic):
status: Triaged → Incomplete
Thierry Carrez (ttx) wrote :

I need to reproduce it with today's 20091002 ISO.

Changed in eucalyptus (Ubuntu Karmic):
assignee: Dustin Kirkland (kirkland) → Thierry Carrez (ttx)
milestone: none → ubuntu-9.10
Thierry Carrez (ttx) wrote :

Testing with a 20091004 ISO containing -0ubuntu13:
Autoregistration failed after CD install/reboot; the registration logs only show:
ERROR: you need to be on the CLC host and the CLC needs to be running.

Stopping/starting eucalyptus, autoregistration worked.

Trying to run an instance, it worked alright.

So while autoregistration still needs more work, this precise bug (NC believes walrus is on 127.0.0.1 whatever you do) is invalid (or not reproducible).

Changed in eucalyptus (Ubuntu Karmic):
milestone: ubuntu-9.10 → none
status: Incomplete → Invalid
tags: added: iso-testing