HowTo: Workaround for HP hosts stuck on reboot and showing lost connectivity after an iLO upgrade


This short blog post shows a method to overcome the issues that an iLO 4 firmware upgrade causes on HP DL380G9 servers. The method was found and tested by Cris van den Dungen, an IT admin I often work with, so all credit goes to him for finding this workaround.

The issue

After upgrading iLO from 2.40 to 2.54, all ESXi 6.0 hosts running specific builds report the following error:

Lost connectivity to the device mpx.vmhba32:C0:T0:L0 backing the boot filesystem /vmfs/devices/disks/mpx.vmhba32:C0:T0:L0
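If you want to check collected log text for this message across several hosts, a minimal Python sketch could look like the one below. It assumes the log text has already been read into a string. The function name `find_lost_boot_devices` is made up for illustration, and the pattern is derived only from the message quoted above, so other wordings of the alert would not be caught.

```python
import re

# Pattern built from the message quoted in this post; the device name varies per host.
LOST_BOOT_DEVICE = re.compile(
    r"Lost connectivity to the device (?P<device>\S+) backing the boot filesystem"
)


def find_lost_boot_devices(log_text):
    """Return the distinct device names reported as lost, in order of first appearance."""
    seen = []
    for match in LOST_BOOT_DEVICE.finditer(log_text):
        device = match.group("device")
        if device not in seen:
            seen.append(device)
    return seen


# Sample input: the exact message from this post.
sample = (
    "Lost connectivity to the device mpx.vmhba32:C0:T0:L0 backing the boot "
    "filesystem /vmfs/devices/disks/mpx.vmhba32:C0:T0:L0"
)
print(find_lost_boot_devices(sample))  # ['mpx.vmhba32:C0:T0:L0']
```

An empty list means the message was not found in the text that was passed in. It does not prove the host is healthy.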

The hosts seem to work fine, but once you reboot them (we tried with one to see if the error would go away) they fail to boot and show this error:

“Error loading /conrep.v00

Compressed MD5: (md5hash)

Decompressed MD5: (only 0s)

Fatal error: 6 (Buffer too small)”

 

Steps to resolve it

Of course, steps were taken to resolve it: the iLO version was downgraded to 2.40, but that didn’t solve it.

We noted that ESXi 6.0 hosts with builds 3825889, 4510822 and 3620759 seem to be affected, but the host with build 4600944 is not. It has not been rebooted, but it doesn’t show the error. We couldn’t find any help in knowledge base articles, and calls with VMware did not solve it. From the error it looked as if the SD cards were damaged somehow. The advice from the KB articles was to reinstall ESXi.
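The build observations above can be captured in a small lookup if you need to sort a larger list of hosts. This is only a sketch: the build sets contain nothing more than the numbers reported in this post (not an official list), the function name `classify_build` is hypothetical, and the input is assumed to be a version line containing `build-<number>`, such as the one `vmware -v` prints on an ESXi host.

```python
import re

# Build numbers as reported in this post (ESXi 6.0 only); not an official list.
AFFECTED_BUILDS = {3825889, 4510822, 3620759}
NO_ERROR_SEEN_BUILDS = {4600944}


def classify_build(version_line):
    """Classify a version line such as 'VMware ESXi 6.0.0 build-3620759'."""
    match = re.search(r"build-(\d+)", version_line)
    if match is None:
        return "unknown"
    build = int(match.group(1))
    if build in AFFECTED_BUILDS:
        return "affected"
    if build in NO_ERROR_SEEN_BUILDS:
        return "no error seen"
    return "unknown"


print(classify_build("VMware ESXi 6.0.0 build-3620759"))  # affected
print(classify_build("VMware ESXi 6.0.0 build-4600944"))  # no error seen
```

Any build not mentioned in this post comes back as "unknown", which is deliberately not the same as "safe".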

Workaround

The customer dug deep into this, as a reinstall would be a huge task and in the meantime they could not afford a power issue: it would cripple the whole environment instantly. Cris, the IT admin, found a workaround I think you should know about. Based on his testing, this is how to get your hosts back online without disrupting production.

  1. Downgrade iLO to the last stable release (2.40)
  2. Power down (not gracefully) via iLO
  3. Unplug both power cords
  4. Eject and re-insert the SD card
  5. Take a long coffee break (5-10 minutes)
  6. Plug the power cords back in
  7. Boot
  8. Enjoy your ESXi host

Just thought I would share this. Hope it helps.


2 Responses

  1. The anonymous colleague of Cris says:

    Additional note: the iLO upgrade was from 2.40 to 2.54. At the time of writing there is also a 2.55 version, but its release notes mention nothing about resolving the issue described above. Earlier versions of the iLO firmware had the same problem, and 2.40 fixed that. My assumption is that this earlier bug may have found its way back into the 2.54 release (maybe HPE devs created a new firmware from the wrong branch in their repository?). Also, from other posts, the problem with the pre-2.40 iLO firmware seems to affect only Gen8 and Gen9 ProLiants.
