Preface

I was recently asked an interesting question during an interview: one of the virtual machines is in an error state, how will you troubleshoot it? It is a general question, and most of us have run into it at least once. The real question is where to start. Before you begin troubleshooting, you need to understand which core issue caused the error status. Here I try to explain a few scenarios that can put a guest VM into an error state.

Core services may not be running

nova service-list
neutron agent-list
rabbitmqctl cluster_status
cinder service-list
systemctl status keystone.service

All of them should give you a smiley :-). If not, check the reason for the failure.
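To avoid eyeballing every table, the output of the list commands can be filtered for rows that are not healthy. This is only a minimal sketch: down_services is a hypothetical helper name, and it assumes the usual client table layout (cells separated by pipes, with a dead service marked as down, or as xxx in smiley-style output).

```shell
# down_services (hypothetical helper): read a service table on stdin and
# print only the rows where some cell is exactly "down", "xxx" or "XXX".
# Border lines such as +----+----+ contain no pipes and are skipped.
down_services() {
  awk -F'|' '
    NF > 2 {
      for (i = 2; i < NF; i++) {
        cell = $i
        gsub(/^[ \t]+|[ \t]+$/, "", cell)   # trim padding around the cell
        if (cell == "down" || cell == "xxx" || cell == "XXX") {
          print
          break
        }
      }
    }'
}
```

For example, nova service-list | down_services prints nothing when every service is up, and prints the offending rows otherwise.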

Two different host types

When you live migrate a guest VM from one host to another that has different compute capabilities, for example fewer CPU cores, the VM may end up in an error state.
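One way to spot such a mismatch before migrating is to compare what the two hosts report about their CPUs. The sketch below is my own illustration, not something Nova runs: missing_cpu_flags is a hypothetical helper, and it assumes you have already collected the space-separated flag list from each host (for example from the flags line of /proc/cpuinfo).

```shell
# missing_cpu_flags (hypothetical helper):
#   $1 = space-separated CPU flags of the source host
#   $2 = space-separated CPU flags of the destination host
# Prints every flag the source has but the destination lacks, one per line.
missing_cpu_flags() {
  for flag in $1; do
    case " $2 " in
      *" $flag "*) ;;                 # destination also has this flag
      *) printf '%s\n' "$flag" ;;     # only the source has it
    esac
  done
}
```

An empty result does not prove the migration will succeed; it only rules out this one difference. Comparing the output of nproc on both hosts covers the core count case mentioned above.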

No shared volume

Unless you use block migration, live migration is possible only when a shared volume is mounted on both the source and destination hosts. If no shared volume is available, the virtual machine will fall into an error state.

Not enough quota

A user is spinning up virtual machines, but their resource quota, for example for Neutron ports, is exhausted. The virtual machine will end up in an error state.
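The check itself is simple arithmetic once you have the two numbers. In the sketch below, quota_remaining is a hypothetical helper; I assume you read the limit from openstack quota show and count the ports currently in use with openstack port list for the same project. Neutron reports a limit of -1 for an unlimited quota.

```shell
# quota_remaining (hypothetical helper):
#   $1 = quota limit for a resource (-1 means unlimited)
#   $2 = number of that resource currently in use
# Prints how many more can be created, or "unlimited".
quota_remaining() {
  limit=$1
  used=$2
  if [ "$limit" -lt 0 ]; then
    echo unlimited
  else
    echo $((limit - used))
  fi
}
```

A result of 0 or less means the next boot that needs a new port is likely to fail.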

AppArmor in place

AppArmor (Application Armor) is a Linux kernel security module that allows the system administrator to restrict programs' capabilities with per-program profiles. If you did not implement a proper profile for libvirt, the virtual machine may end up in an error state during live migration.
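AppArmor logs every blocked operation to the kernel log with apparmor="DENIED", so that is the first place I would look. The filter below is a small sketch: libvirt_denials is a hypothetical helper name, and it assumes the relevant profile or process name contains the string libvirt.

```shell
# libvirt_denials (hypothetical helper): read kernel log lines on stdin
# and keep only AppArmor denials that mention libvirt.
libvirt_denials() {
  grep 'apparmor="DENIED"' | grep 'libvirt'
}
```

It can be fed from the kernel log, for example dmesg | libvirt_denials. No output means no matching denial was logged in the lines you passed in.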

The source compute host is swapping

The source compute host where the guest VM resides is already swapping, and the guest VM has started using swap. If you try to live migrate the instance to a different host in this situation, the virtual machine may end up in an error state.
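A quick way to see whether the source host is in this situation is to compute the swap in use from /proc/meminfo. This is a minimal sketch; swap_used_kb is a hypothetical helper name, and it only relies on the SwapTotal and SwapFree lines.

```shell
# swap_used_kb (hypothetical helper): read /proc/meminfo-style text on
# stdin and print the amount of swap in use, in kB.
swap_used_kb() {
  awk '
    /^SwapTotal:/ { total = $2 }
    /^SwapFree:/  { free = $2 }
    END           { print total - free }'
}
```

Run it on the source compute host as swap_used_kb < /proc/meminfo. Anything well above 0 is a reason to look at memory pressure before attempting the migration.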

No valid host found

Sometimes you have enough resources in your compute inventory, but a metadata entry is missing from your aggregate configuration. In this case, nova-scheduler looks for an aggregate whose metadata matches the flavor properties and cannot find one. Nova then marks the virtual machine as being in an error state with the message No valid host was found.
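The comparison the scheduler makes can be reproduced by hand. The sketch below is my own illustration and not Nova code: unmatched_specs is a hypothetical helper, and it assumes the flavor properties and the aggregate metadata have been copied out as space-separated key=value pairs (from openstack flavor show and openstack aggregate show). Flavor keys scoped with the aggregate_instance_extra_specs: prefix are compared without the prefix.

```shell
# unmatched_specs (hypothetical helper):
#   $1 = flavor properties as space-separated key=value pairs
#   $2 = aggregate metadata as space-separated key=value pairs
# Prints every flavor pair that the aggregate metadata does not contain.
unmatched_specs() {
  for spec in $1; do
    pair=${spec#aggregate_instance_extra_specs:}   # drop the scope prefix
    case " $2 " in
      *" $pair "*) ;;                 # aggregate carries this pair
      *) printf '%s\n' "$pair" ;;     # missing from the aggregate
    esac
  done
}
```

Any pair it prints is a candidate for the missing metadata entry. The fault recorded on the instance, visible in openstack server show, tells you whether scheduling was really the step that failed.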

Further reading