Unable to boot RHEL 7.6 on baremetal nodes

Environment

Red Hat OpenStack Platform 13

Red Hat Enterprise Linux 7.6

Issue

When deploying an image built from the RHEL 7.6 QCOW2 image to bare-metal nodes, the image fails to boot.

The double quotes in the GRUB_CMDLINE_LINUX line of /etc/default/grub are in the wrong place. Because the closing quote follows crashkernel=auto, the shell parses the remaining words on the line as a command instead of as kernel command-line arguments.

Resolution

Edit /etc/default/grub inside the image with guestfish:

sudo guestfish -a rhel-7.6.qcow2
> run
> mount /dev/sda /
> edit /etc/default/grub

Move the closing double quote from after crashkernel=auto to the end of the line. That is, change:

GRUB_CMDLINE_LINUX="console=tty0 crashkernel=auto" console=ttyS0,115200n8 no_timer_check net.ifnames=0

to:

GRUB_CMDLINE_LINUX="console=tty0 crashkernel=auto console=ttyS0,115200n8 no_timer_check net.ifnames=0"

Save the file and close the editor, then exit guestfish:

> exit
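If the same fix has to be applied to several images, the edit itself can be scripted. The following is a minimal Python sketch, not a supported tool: it assumes /etc/default/grub has first been copied out of the image (for example with guestfish's download command) and is copied back afterwards with upload, and that the GRUB_CMDLINE_LINUX value contains no double quotes that need to be kept. fix_grub_cmdline is a hypothetical helper name.

```python
import re

# Matches the GRUB_CMDLINE_LINUX assignment at the start of a line.
# Assumption: the key is not indented or prefixed with "export".
_CMDLINE_RE = re.compile(r'^GRUB_CMDLINE_LINUX=(.*)$')


def fix_grub_cmdline(text: str) -> str:
    """Wrap the whole GRUB_CMDLINE_LINUX value in a single pair of double quotes.

    Assumption: the value contains no double quotes that must be preserved;
    every double quote in it is dropped before re-quoting. Other lines are
    returned unchanged.
    """
    fixed_lines = []
    for line in text.splitlines():
        match = _CMDLINE_RE.match(line)
        if match:
            # Drop every double quote, then trim leading/trailing whitespace.
            value = match.group(1).replace('"', '').strip()
            line = f'GRUB_CMDLINE_LINUX="{value}"'
        fixed_lines.append(line)
    # Keep the trailing newline if the input had one.
    trailing = '\n' if text.endswith('\n') else ''
    return '\n'.join(fixed_lines) + trailing


if __name__ == '__main__':
    broken = ('GRUB_CMDLINE_LINUX="console=tty0 crashkernel=auto" '
              'console=ttyS0,115200n8 no_timer_check net.ifnames=0\n')
    print(fix_grub_cmdline(broken), end='')
```

Running the function on an already-correct line returns it unchanged, so applying it more than once is harmless.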

Root Cause

This issue is caused by the double quotes (") being in the wrong place in the GRUB_CMDLINE_LINUX line of /etc/default/grub.

By default, the GRUB_CMDLINE_LINUX line in the image looks like this:

GRUB_CMDLINE_LINUX="console=tty0 crashkernel=auto" console=ttyS0,115200n8 no_timer_check net.ifnames=0

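The closing double quote comes right after crashkernel=auto. grub2-mkconfig sources /etc/default/grub as a shell script, so the shell treats console=ttyS0,115200n8 as one more variable assignment and tries to run no_timer_check as a command, with net.ifnames=0 as its argument. The line should be: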

GRUB_CMDLINE_LINUX="console=tty0 crashkernel=auto console=ttyS0,115200n8 no_timer_check net.ifnames=0"
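The way the shell splits the broken line can be approximated in Python. This sketch uses shlex.split for quote removal and word splitting, then skips leading NAME=value words as variable assignments; the first word that is not an assignment is what the shell tries to run. It is an approximation for illustration only (a real shell also performs expansions and recognizes assignments before quote removal), and command_word is a hypothetical helper name.

```python
import re
import shlex
from typing import Optional

# A shell assignment word looks like NAME=value, where NAME is a valid identifier.
_ASSIGNMENT_RE = re.compile(r'^[A-Za-z_][A-Za-z0-9_]*=')


def command_word(line: str) -> Optional[str]:
    """Return the word a POSIX shell would run as a command, or None.

    Approximation: shlex.split does quote removal and word splitting,
    then leading NAME=value words are skipped as variable assignments.
    """
    for word in shlex.split(line):
        if not _ASSIGNMENT_RE.match(word):
            return word
    return None


broken = ('GRUB_CMDLINE_LINUX="console=tty0 crashkernel=auto" '
          'console=ttyS0,115200n8 no_timer_check net.ifnames=0')
fixed = ('GRUB_CMDLINE_LINUX="console=tty0 crashkernel=auto '
         'console=ttyS0,115200n8 no_timer_check net.ifnames=0"')

print(shlex.split(broken))
print(command_word(broken))  # no_timer_check
print(command_word(fixed))   # None
```

For the broken line the result is no_timer_check, which matches the "command not found" error that Ironic reports further down in this article; for the corrected line every word is part of a single assignment, so nothing is executed.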

Diagnostic Steps

Working backwards from the Nova logs, nova-conductor.log shows the instance build failing once all retries are exhausted:

req-6f866160-cdf6-4a2a-a6ad-60cf1e541b21 6cc770ec85812266b3f 063b25ed7c094053be7a64c4f3caace0 - default default] 
Failed to compute_task_build_instances: Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance 74627e47-4c0f-442e-a7e0-76595ef1eb7ee.:
MaxRetriesExceeded: Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance 74628f44-4c0f-442e-a7e0-76595ef1fd3e.

Searching nova-scheduler.log for the same request ID shows that the scheduler successfully selected a node:

2018-11-26 02:06:58.244 1 DEBUG nova.scheduler.utils [req-6f866160-cdf6-4a2a-a6ad-60cf1e541b21 6cc770e121914a658ab3c85812266b3f 063b25ed7c094053be7a64c4f3caace0 - default default]
Attempting to claim resources in the placement API for instance 74628f44-4c0f-442e-a7e0-76595ef1fd3e claim_resources /usr/lib/python2.7/site-packages/nova/scheduler/utils.py:786

2018-11-26 02:06:58.809 1 DEBUG nova.scheduler.filter_scheduler [req-6f866160-cdf6-4a2a-a6ad-60cf1e541b21 6cc770e121914a658ab3c85812266b3f 063b25ed7c094053be7a64c4f3caace0 - default
default] Selected host: (overcloud-controller-0.example.com, b94daa75-82f9-4381-9aa4-52ed77de3431) ram: 768000MB disk: 184320MB io_ops: 0 instances: 0 _consume_selected_host
/usr/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py:325

The selected node ID is b94daa75-82f9-4381-9aa4-52ed77de3431.
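Tracing a request ID from nova-conductor.log into nova-scheduler.log and picking out the selected node can also be done with a small script. This is a minimal Python sketch, assuming log lines in the format shown above, where request IDs appear as req-<uuid> and the filter scheduler logs Selected host: (<host>, <node>); the helper names are hypothetical. With the Ironic driver, the node in that tuple is the Ironic node UUID, which is the value to search for in the Ironic logs.

```python
import re
from typing import Iterable, List, Optional, Tuple

# OpenStack request IDs look like req-<uuid>.
_REQ_RE = re.compile(
    r'req-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}')
# The filter scheduler logs its choice as "Selected host: (<host>, <node>)".
_SELECTED_RE = re.compile(r'Selected host: \(([^,]+), ([^)]+)\)')


def request_id(line: str) -> Optional[str]:
    """Return the first req-<uuid> found in a log line, or None."""
    match = _REQ_RE.search(line)
    return match.group(0) if match else None


def lines_for_request(lines: Iterable[str], req_id: str) -> List[str]:
    """Keep only the log lines that mention req_id."""
    return [line for line in lines if req_id in line]


def selected_node(lines: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Return (host, node) from the first 'Selected host' entry, or None."""
    for line in lines:
        match = _SELECTED_RE.search(line)
        if match:
            return match.group(1), match.group(2)
    return None


# Trimmed excerpts of the nova-conductor.log and nova-scheduler.log lines above.
conductor_line = (
    'req-6f866160-cdf6-4a2a-a6ad-60cf1e541b21 6cc770ec85812266b3f '
    '063b25ed7c094053be7a64c4f3caace0 - default default] '
    'Failed to compute_task_build_instances: Exceeded maximum number of retries.')
scheduler_lines = [
    '2018-11-26 02:06:58.244 1 DEBUG nova.scheduler.utils '
    '[req-6f866160-cdf6-4a2a-a6ad-60cf1e541b21 6cc770e121914a658ab3c85812266b3f '
    '063b25ed7c094053be7a64c4f3caace0 - default default] '
    'Attempting to claim resources in the placement API for instance '
    '74628f44-4c0f-442e-a7e0-76595ef1fd3e',
    '2018-11-26 02:06:58.809 1 DEBUG nova.scheduler.filter_scheduler '
    '[req-6f866160-cdf6-4a2a-a6ad-60cf1e541b21 6cc770e121914a658ab3c85812266b3f '
    '063b25ed7c094053be7a64c4f3caace0 - default default] '
    'Selected host: (overcloud-controller-0.example.com, '
    'b94daa75-82f9-4381-9aa4-52ed77de3431) ram: 768000MB disk: 184320MB',
]

req = request_id(conductor_line)
if req is not None:
    print(req)  # req-6f866160-cdf6-4a2a-a6ad-60cf1e541b21
    print(selected_node(lines_for_request(scheduler_lines, req)))
    # ('overcloud-controller-0.example.com', 'b94daa75-82f9-4381-9aa4-52ed77de3431')
```

In practice the scheduler lines would come from reading nova-scheduler.log line by line rather than from an inline list.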

Searching the Ironic logs for this node ID shows why the deployment failed:

2018-11-26 02:25:26.828 1 ERROR ironic.drivers.modules.agent_base_vendor [req-98b72aaf-5185-4ab0-8df2-5c1618629210 - - - - -] Asynchronous exception: Node failed to deploy. Exception: Failed to install a bootloader when deploying node b94daa75-82f9-4381-9aa4-52ed77de3431. Error: {u'message': u'Command execution failed: Installing GRUB2 boot loader to device /dev/sda failed with Unexpected error while running command.\nCommand: chroot /tmp/tmpoRdOMm /bin/sh -c "grub2-mkconfig -o /boot/grub2/grub.cfg"\nExit code: 127\nStdout: u\'\'\nStderr: u\'/etc/default/grub: line 6: no_timer_check: command not found\\n\'.', u'code': 500, u'type': u'CommandExecutionError', u'details': u'Installing GRUB2 boot loader to device /dev/sda failed with Unexpected error while running command.\nCommand: chroot /tmp/tmpoRdOMm /bin/sh -c "grub2-mkconfig -o /boot/grub2/grub.cfg"\nExit code: 127\nStdout: u\'\'\nStderr: u\'/etc/default/grub: line 6: no_timer_check: command not found\\n\'.'} for node b94daa75-82f9-4381-9aa4-52ed77de3431: InstanceDeployFailure: Failed to install a bootloader when deploying node b94daa75-82f9-4381-9aa4-52ed77de3431. Error: {u'message': u'Command execution failed: Installing GRUB2 boot loader to device /dev/sda failed with Unexpected error while running command.\nCommand: chroot /tmp/tmpoRdOMm /bin/sh -c "grub2-mkconfig -o /boot/grub2/grub.cfg"\nExit code: 127\nStdout: u\'\'\nStderr: u\'/etc/default/grub: line 6: no_timer_check: command not found\\n\'.', u'code': 500, u'type': u'CommandExecutionError', u'details': u'Installing GRUB2 boot loader to device /dev/sda failed with Unexpected error while running command.\nCommand: chroot /tmp/tmpoRdOMm /bin/sh -c "grub2-mkconfig -o /boot/grub2/grub.cfg"\nExit code: 127\nStdout: u\'\'\nStderr: u\'/etc/default/grub: line 6: no_timer_check: command not found\\n\'.'}

The key part is that grub2-mkconfig, run in a chroot of the deployed image, exits with code 127 because the shell cannot find a command named no_timer_check:

Installing GRUB2 boot loader to device /dev/sda failed with Unexpected error while running command.\nCommand: chroot /tmp/tmpoRdOMm /bin/sh -c "grub2-mkconfig -o /boot/grub2/grub.cfg"\nExit code: 127\nStdout: u\'\'\nStderr: u\'/etc/default/grub: line 6: no_timer_check: command not found\\n\'.'}
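When many nodes fail the same way, the file, line number, and command name can be pulled out of the Ironic error message instead of reading the whole entry. This is a minimal Python sketch, assuming the bash "<file>: line <n>: <name>: command not found" wording and the "Exit code:" field seen above; grub_failure is a hypothetical helper name. Exit code 127 is the shell's status for a command that cannot be found.

```python
import re
from typing import Optional, Tuple

# bash reports a missing command as "<file>: line <n>: <name>: command not found".
_NOT_FOUND_RE = re.compile(r'(/\S+): line (\d+): ([^:\s]+): command not found')
# The Ironic message records the failing command's status as "Exit code: <n>".
_EXIT_RE = re.compile(r'Exit code: (\d+)')


def grub_failure(message: str) -> Optional[Tuple[str, int, str, int]]:
    """Return (file, line, command, exit_code) from a bootloader error, or None."""
    not_found = _NOT_FOUND_RE.search(message)
    exit_code = _EXIT_RE.search(message)
    if not_found is None or exit_code is None:
        return None
    return (not_found.group(1), int(not_found.group(2)),
            not_found.group(3), int(exit_code.group(1)))


# Trimmed excerpt of the Ironic error above, with the \n escapes decoded.
excerpt = (
    'Command: chroot /tmp/tmpoRdOMm /bin/sh -c "grub2-mkconfig -o /boot/grub2/grub.cfg"\n'
    'Exit code: 127\n'
    "Stderr: u'/etc/default/grub: line 6: no_timer_check: command not found\n'"
)
print(grub_failure(excerpt))
# ('/etc/default/grub', 6, 'no_timer_check', 127)
```

The regular expressions do not rely on real newlines, so the same function also works on the escaped single-line form that appears in the log.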

The /etc/default/grub file inside the QCOW2 image shows where this comes from:

GRUB_CMDLINE_LINUX="console=tty0 crashkernel=auto" console=ttyS0,115200n8 no_timer_check net.ifnames=0