
Friday, May 25, 2018

Homelab Cluster: Growing pains. NVMe boot with the X10DAi

This post is entirely about trying to boot from an NVMe SSD on an X10DAi motherboard.

Apparently installing Linux to an NVMe drive and booting from it is a common problem. The NVMe drive, motherboard BIOS, and OS all have to have compatible drivers for it to work. My workstation motherboard is an X10DAi, and the NVMe I'm trying to use is the Samsung 960 Evo. This FAQ says to just enable EFI Option ROM for the PCIe slot the NVMe drive (well, the adapter it's attached to) is in, then boot from a UEFI DVD/USB installer and install. However, this did not work for me with CentOS 7.3 because CentOS couldn't see the NVMe. I activated a shell from the CentOS installer (Ctrl + Alt + F2) and ran "lspci", but it didn't list the NVMe drive, which means the motherboard/BIOS isn't seeing it either.
The BIOS was a version out of date (2.0a instead of 3.0a), so I downloaded the new BIOS image and followed the instructions to update it, which went smoothly. I loaded the default BIOS settings, changed the EFI OPROM setting on the appropriate slot again, then booted to the CentOS UEFI installer. But it still didn't see the NVMe drive. Back to the shell...nope, nothing. "modprobe nvme" doesn't help either. "lsblk" doesn't see it. Well this sucks.
So either the motherboard doesn't work with NVMe drives (despite the FAQ), the PCIe adapter is bad, the drive is bad, or the drive is incompatible with Linux (unlikely). To test whether it's CentOS, I tried installing Ubuntu 18. It also did not see the drive. So then I took the drive out of the workstation and put it in the desktop. The desktop's motherboard has an M.2 PCIe drive slot, so it should be compatible with PCIe SSDs in PCIe slots. Bingo, it showed up in "lsblk" on Ubuntu 16.04 without my having to do anything. "lspci" shows the Samsung controller. I was able to partition it and write files to it. So the drive and the adapter are fine. The X10DAi is not compatible with most NVMe PCIe SSDs for boot, despite what the FAQ says. It may only work with certain ones, or it may only boot with Windows, which is annoying.
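The "does lspci see the drive" check above can be scripted. Here is a minimal sketch, assuming the output of `lspci -nn`, where NVMe controllers carry the PCI class code `[0108]`. The function name `find_nvme_controllers` and the sample text are made up for illustration; the sample is not output captured from the machines in this post.

```python
import re
from typing import List

# PCI class 01 (mass storage), subclass 08 (non-volatile memory) = NVMe controller.
NVME_CLASS = "0108"


def find_nvme_controllers(lspci_nn_output: str) -> List[str]:
    """Return the `lspci -nn` lines whose class code marks an NVMe controller."""
    pattern = re.compile(r"\[" + NVME_CLASS + r"\]:")
    return [
        line.strip()
        for line in lspci_nn_output.splitlines()
        if pattern.search(line)
    ]


# Illustrative sample only (hypothetical bus addresses and devices).
SAMPLE = """\
00:1f.2 SATA controller [0106]: Intel Corporation C610/X99 series chipset 6-Port SATA Controller [AHCI mode] [8086:8d02] (rev 05)
03:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961 [144d:a804]
"""

if __name__ == "__main__":
    hits = find_nvme_controllers(SAMPLE)
    if hits:
        for line in hits:
            print("NVMe controller found:", line)
    else:
        print("No NVMe controller on the PCI bus; check the slot, adapter, or BIOS.")
```

To run it against a live system, the sample string could be swapped for `subprocess.run(["lspci", "-nn"], capture_output=True, text=True).stdout`. An empty result means the device never made it onto the PCI bus, which points at the slot, adapter, or firmware and not at the Linux driver.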
This thread suggests the BIOS might have to be modded, which I really want to avoid. A post in that thread said that the 950 Pro works with an X10 and the latest BIOS. This thread has an interesting post in reply to someone else saying a 950 Pro worked in their X10DAX (which is in the same line as my X10DAi):
Contrary to nearly all other NVMe SSDs the Samsung 950 Pro has an NVMe Option ROM in the box. That is why you can boot off that SSD in LEGACY mode.
Generally you need a suitable NVMe EFI module within the mainboard BIOS, if you want to be able to boot off an NVMe SSD in UEFI mode.
I guess that explains it. The fancy Intel drives they suggest buying probably have the NVMe Option ROM, too. Here's another good post, this time about the X10DRi-T. He managed to get Supermicro to send him a BIOS with NVMe support for the 960 Evo. Unfortunately, it seems they haven't updated the X10DAi's BIOS with the same code. Moving forward, I'll have to either:
  1. Mod BIOS
  2. Get a Samsung 950 Pro
  3. Get a different motherboard, like an ASUS Z10PE-D8/D16
I looked into the ASUS Z10PE-D8/D16. According to a few threads on servethehome, unless you buy one made after Nov. 2015, they do not support dual E5 V4 CPUs, even with a BIOS update. It turns out a chip has to be replaced on the motherboard to enable dual E5 V4s. Since most of the used boards were probably produced before that, and given my luck so far with this project, I think I won't do that. Unfortunately, the X10DRi-Ts are expensive. None of the other dual-socket Supermicro X10 motherboards are confirmed to work with the 960 Evo or other consumer NVMe SSDs. That pretty much leaves getting a Samsung 950 Pro. For now, I'll have to use a regular SATA SSD for the OS. So I installed the OS, did updates, etc. I also contacted Supermicro Support to see if they could do anything.

Then I had a derp moment. I only had one CPU installed, which meant that the PCIe slot I was trying to use for the NVMe was not active. (Oops.) I put the drive in another slot and booted to the SATA SSD I had already installed CentOS on. "lsblk" showed the NVMe drive, and "lspci" showed the Samsung controller, but oddly identified it as the PM961. I suppose the PM961, like the 950 Pro, is supported then? Maybe the PM951 is, too? Who knows. I shut down, disconnected the SATA drive, put the CentOS installer USB back in, and booted to the UEFI installer, which also saw the drive. I then installed CentOS 7 minimal to it, shut down, and rebooted. However, the NVMe drive was not listed as a boot option. So close!
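One way to double-check the "not listed as a boot option" part from a running Linux system is to look at the UEFI boot entries that `efibootmgr` prints. The sketch below parses that output, assuming the usual `BootXXXX* label` line format. The helper names (`parse_boot_entries`, `has_entry`) and the sample text are hypothetical, not output from my board.

```python
import re
from typing import Dict

# Matches lines such as "Boot0001* UEFI: Some Device"; the "*" marks an active entry.
BOOT_ENTRY = re.compile(r"^Boot([0-9A-Fa-f]{4})(\*?)\s+(.*)$")


def parse_boot_entries(efibootmgr_output: str) -> Dict[str, str]:
    """Map each boot entry number (e.g. "0001") to its label."""
    entries = {}
    for line in efibootmgr_output.splitlines():
        match = BOOT_ENTRY.match(line)
        if match:
            entries[match.group(1)] = match.group(3).strip()
    return entries


def has_entry(entries: Dict[str, str], keyword: str) -> bool:
    """True if any boot entry label contains the keyword (case-insensitive)."""
    return any(keyword.lower() in label.lower() for label in entries.values())


# Illustrative sample only (hypothetical entries).
SAMPLE = """\
BootCurrent: 0001
Timeout: 1 seconds
BootOrder: 0001,0000
Boot0000* UEFI: Built-in EFI Shell
Boot0001* UEFI: Example USB Stick
"""

if __name__ == "__main__":
    entries = parse_boot_entries(SAMPLE)
    for number, label in sorted(entries.items()):
        print(number, label)
    print("CentOS entry present:", has_entry(entries, "centos"))
```

The header lines (`BootCurrent`, `BootOrder`) are skipped because the four characters after `Boot` must be hexadecimal digits. Even when an entry for the installed OS exists, the firmware still needs its own NVMe support to load the bootloader from that drive, so a listed entry alone would not prove the board can boot from it.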

So I tried again, but this time with an extra USB drive inserted. I did custom standard partitioning and placed the /boot and /boot/efi partitions (1 GiB each) only on the USB drive, but placed everything else only on the NVMe drive (root 50 GiB, swap 8 GiB, home ~407 GiB). I set the USB drive as the boot device, installed, and rebooted to the USB drive. THIS WORKED. CentOS minimal booted from the USB to the NVMe. Heck yes. It's ugly, but it works, which is what matters. I'll probably get a low-profile 4-8 GB USB 3.0 drive for the boot partitions and just leave it in my computer forever.
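To confirm that a split layout like this ended up where intended, the mount points can be mapped back to their parent disks. This is a rough sketch that assumes the JSON printed by `lsblk -J -o NAME,MOUNTPOINT`; JSON output needs a newer util-linux than some older distro releases ship, so it may not be available everywhere. The device names `sda` and `nvme0n1`, the sample data, and the function name `mountpoints_by_disk` are assumptions for illustration.

```python
import json
from typing import Dict


def mountpoints_by_disk(lsblk_json: str) -> Dict[str, str]:
    """Map each mount point to the name of the top-level disk that holds it."""
    result = {}

    def walk(node: dict, disk: str) -> None:
        mountpoint = node.get("mountpoint")
        if mountpoint:
            result[mountpoint] = disk
        for child in node.get("children", []):
            walk(child, disk)

    for device in json.loads(lsblk_json)["blockdevices"]:
        walk(device, device["name"])
    return result


# Illustrative sample only: USB stick assumed to be sda, NVMe drive nvme0n1.
SAMPLE = """
{"blockdevices": [
  {"name": "sda", "mountpoint": null, "children": [
    {"name": "sda1", "mountpoint": "/boot/efi"},
    {"name": "sda2", "mountpoint": "/boot"}
  ]},
  {"name": "nvme0n1", "mountpoint": null, "children": [
    {"name": "nvme0n1p1", "mountpoint": "/"},
    {"name": "nvme0n1p2", "mountpoint": "[SWAP]"},
    {"name": "nvme0n1p3", "mountpoint": "/home"}
  ]}
]}
"""

if __name__ == "__main__":
    layout = mountpoints_by_disk(SAMPLE)
    for mountpoint, disk in sorted(layout.items()):
        print(f"{mountpoint:10s} -> {disk}")
```

With this layout the firmware only ever reads the USB stick, which it already knows how to boot from. The kernel and initramfs loaded from /boot bring their own NVMe driver, so the root filesystem on the NVMe drive can be mounted without any NVMe support in the BIOS.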

Hopefully I'll be able to enable SATA RAID in the BIOS so I can use the hardware RAID controller for the storage drives. There seems to be some incompatibility with the RAID setting (it's on AHCI now).

Updates:

I've been in contact with Supermicro support. The X10DAi's BIOS simply doesn't have the right code to boot from most consumer PCIe SSDs. I'd have to purchase a custom BIOS from them to enable booting from the 960 Evo.

That's kind of irrelevant though, because I happened upon a crazy good deal for a Z10PE-D8 manufactured after Oct 2015, meaning it has the correct BIOS chip to handle dual v4 Xeons. So I bought it. It also has a PCIe 3.0 x4 M.2 slot that works with NVMe drives, as well as 7 x16 slots. So overall, major upgrade. I'll be selling the X10DAi.

1 comment:

  1. The PM961 is the OEM version of the 960 Pro, the PM951 is the OEM version of the 950 Pro. Same drive, slightly different firmware - OEM versions usually use a little bit less power in exchange for slightly lower write speeds.
