Deploying VMware Cloud Foundation 3.0.1 on EoL servers

In my company’s lab I found a couple of quite old x86 servers, which were not in use anymore. The rack servers are in fact so old, that the original manufacturer (Sun) doesn’t exist anymore. The model is named “X4270 M2” and labeled end-of-life by Oracle for a while now. They are equipped with Intel Xeon processors released in 2011 (!), code name Westmere EP. That is in fact the oldest dual socket CPU generation by Intel which is supported by ESXi 6.7 (needed soon for VCF 3.5 upgrade).
I found some more servers, but those are equipped with Nehalem CPUs, so not hypervisor material; One possibility to give them a new purpose could be as a baremetal NSX-T edge

The main concerns whether VCF could be successfully deployed on old hardware
like that (when vSAN Ready Nodes, as required by VCF, were not even a thing yet) were compatibility with VMware’s HCL (especially HDDs, SSDs & raid controller), lack of 10 GbE adapters and not enough RAM.
Preparing the five servers (four for the management domain and another for the Cloud Builder VM) with ESXi was by the book, except for a well known workaround needed on old Sun servers.
For NTP, DNS and DHCP the OPNsense distribution was used once more.
After uploading the filled-out Deployment Parameter Sheet the Cloud Builder VM started its validation, resulting only in one warning/error regarding cache/capacity tier ratio which can be acknowledged. In fact the same message was displayed at a customer´s site with Dell PowerEdge R640 nodes with 4TB/800GB SSDs. This seems to be related with a known issue

VCF Configuration File Validation

However after hitting Retry another error was displayed saying that no SSDs available for vSAN were found. This could be confirmed when logging into any of the hosts ESXi interface.
The Intel SSDs were marked as hard disks and could not be marked as Flash via the GUI. The reason for this is the RAID controller by LSI which does not have a SATA bypass mode, meaning you have to create a RAID 0 virtual disk for each pass-through drive, so that the hypervisor has no clue about which hardware device lies underneath.
Upon investigating further in VMware´s KB a storage filter for local devices can be added via the CLI so that after a reboot that device will be marked correctly as SSD:

esxcli storage core device list
[Find the SSD which is supposed to be marked as such, e.g. "naa.600605b00411be5021404f8240529589"]
esxcli storage nmp satp rule add --satp=VMW_SATP_LOCAL --device naa.600605b00411be5021404f8240529589 --option "enable_local enable_ssd"

Finally the Cloud Foundation Bring-Up Process could be initiated.
Still no luck however, as an error deploying the NSX manager was displayed:

SDDC Bring Up

As the error was in that stage meant that the platform services controllers, SDDC controller and vCenter were already successfully deployed and reachable. After logging into the latter it was clear all VMs were put on the first host and so no more RAM space was available for the NSX manager. 

The first attempt to fix this problem was to migrate all VMs which were deployed so far onto the other three hosts. Afterwards the Bring Up process could be picked up by hitting Retry, but eventually the same error came up again.
It became apparent, that the four hosts were not equipped with a sufficient amount of RAM (24 GB) after all.
After  shutting down the hosts in the correct order more RAM was added (however still less than the amount described as required minimum: 72 GB vs. 192 GB) and then started up again.

Now the Bring Up went through, resulting in an up-to-date private cloud SDDC automated deployment on >8 year old hardware…

Of course this setup is only valid for lab tests as not respecting VMware’s minimum requirements and design recommendations is not supported and not  suited for production.

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.