Deploying two vRLI 4.7.1 clusters with vRealize Suite LCM 2.0 & setting up forwarding with SSL

After deploying vROPS using the vRSLCM yesterday, today's task was to deploy two separate instances of vRealize Log Insight. Both instances should consist of a cluster of one master and three workers (deployment type “Medium with HA”) and be placed on different hypervisor clusters, each managed by its own vCenter and separated by a third-party firewall. Finally, the “outer” vRLI cluster would forward its received telemetry to the “inner” cluster, which will function as part of a central SIEM platform.

The first step is to deploy both clusters, again using the “Create Environment” screen:

vRealize Suite Lifecycle Manager – Create Environment screen

After all deployment parameters have been entered, the pre-check runs but fails, reporting that the provided IP addresses could not be resolved. Since correctly configured Active Directory DNS servers with the corresponding A and (reverse) PTR records were set up and reachable, the warnings were ignored:

vRealize Suite Lifecycle Manager – Create Environment screen (Pre-check)
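Before ignoring such warnings, it can be worth confirming the A and PTR records independently. The following is a minimal sketch, assuming a Linux machine with bash and glibc's “getent” that queries the same DNS servers as the LCM appliance; the host name and IP in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: cross-check forward (A) and reverse (PTR) records for a planned node.

lookup() {
  # First IPv4 address for a name ("getent ahostsv4" prints "<ip> STREAM <name>" lines).
  getent ahostsv4 "$1" | awk 'NR == 1 { print $1 }'
}

reverse_lookup() {
  # Canonical name for an IP ("getent hosts <ip>" prints "<ip> <name> [aliases]").
  getent hosts "$1" | awk 'NR == 1 { print $2 }'
}

check_node() {
  local fqdn="${1,,}" ip="$2" fwd rev
  fwd="$(lookup "$fqdn")"
  rev="$(reverse_lookup "$ip")"
  rev="${rev%.}"   # drop a trailing dot, if any
  rev="${rev,,}"   # DNS names are case-insensitive
  if [ "$fwd" = "$ip" ] && [ "$rev" = "$fqdn" ]; then
    echo "OK   $fqdn <-> $ip"
  else
    echo "FAIL $fqdn -> ${fwd:-<none>} / $ip -> ${rev:-<none>}"
    return 1
  fi
}
```

Usage, once per node: check_node vrli-outer-01.example.local 192.168.10.11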

The environment creation is initiated:

vRealize Suite Lifecycle Manager – Create Environment screen (Initiated)

After the master has been deployed, the three workers are deployed in parallel:

vRealize Suite Lifecycle Manager – Create Environment screen (In progress)

Once the three workers are up, the LCM fails to configure the supplied NTP servers for some unknown reason:

vRealize Suite Lifecycle Manager – Create Environment screen (Error)

At this point you have two options. The first is to delete the environment (including the VMs, via the checkbox shown below) and start over, e.g. if you actually made a mistake:

vRealize Suite Lifecycle Manager – Delete Environment screen

The other option is to resume the request. (The arrow on the right disappeared after clicking, so I drew one where it was.)

vRealize Suite Lifecycle Manager – Resume Request

This time the step, and eventually the entire request, finished successfully. From the vCenter perspective, the result looks like this:

vSphere Client – vRealize Log Insight cluster VMs

This process is repeated for the second cluster / environment, leaving us with two environments, each with a vRealize Log Insight cluster:

vRealize Suite Lifecycle Manager – Two environments with vRealize Log Insight

The next step is to set up message forwarding, so that the “inner” cluster also receives the messages from the devices logging to the “outer” cluster, while the firewall between the clusters only allows SSL-secured traffic from the “outer” to the “inner” cluster.
Before configuring the two vRLI clusters, we first need to export the certificate of the “inner” cluster, which was created separately using the vRSLCM:
(If the same certificate is used for both environments, e.g. with the subject alternative name *.parent.domain, you can skip this step.)

vRealize Suite Lifecycle Manager – Settings / Certificate

The certificate is imported into all four nodes of the forwarding (“outer”) cluster, one after another, as shown below or as described in the official documentation, followed by a reboot:

SSH to vRealize Log Insight cluster VM
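The per-node import can be scripted along the following lines. This is a sketch, not the documented procedure: the node names, the certificate file name and especially the keytool and truststore paths on the vRLI appliance are assumptions to verify against the official documentation for your version. By default it only prints the commands (DRY_RUN=1).

```shell
#!/usr/bin/env bash
# Sketch: copy the exported "inner" certificate to each "outer" node and import it
# into the Java truststore there, one node at a time.
# ASSUMPTIONS: all names and paths below; check them against the vRLI documentation.
NODES=(vrli-outer-01 vrli-outer-02 vrli-outer-03 vrli-outer-04)  # hypothetical
CERT_LOCAL="inner-cluster.pem"                     # exported from vRSLCM
CERT_REMOTE="/tmp/inner-cluster.pem"
KEYTOOL="/usr/java/default/bin/keytool"            # assumed location on the appliance
KEYSTORE="/usr/java/default/lib/security/cacerts"  # assumed truststore
DRY_RUN="${DRY_RUN:-1}"                            # 1 = only print the commands

run() {
  if [ "$DRY_RUN" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

import_on_node() {
  local node="$1"
  run scp "$CERT_LOCAL" "root@${node}:${CERT_REMOTE}"
  # -t gives keytool a terminal to ask for the keystore password.
  run ssh -t "root@${node}" \
    "$KEYTOOL -import -noprompt -alias inner-vrli -file $CERT_REMOTE -keystore $KEYSTORE"
}

import_all() {
  local node
  for node in "$@"; do import_on_node "$node"; done
}

import_all "${NODES[@]}"   # reboot the nodes afterwards, as described above
```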

Optionally, the receiving (“inner”) cluster can be configured to accept only SSL-encrypted traffic:

vRealize Log Insight – SSL Configuration

Finally, the FQDN of the “inner” cluster's virtual IP is added as an event forwarding destination on the configuration page of the “outer” cluster. The protocol drop-down should be left at “Ingestion API”, as switching to “Syslog” overwrites the original source IPs of the log entries. After checking the “Use SSL” box, verify the connection with the “Test” button:

vRealize Log Insight – Event Forwarding

If no filters are added here, all events received by the “outer” cluster will also be available on the “inner” one.
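For the firewall rule between the clusters, it helps to know which destination port the forwarding actually uses. The sketch below maps protocol and SSL setting to a port; the numbers are the vRLI default listener ports as I understand them, not values taken from this setup, so treat them as assumptions. It also contains a simple TCP reachability check to run from an “outer” node.

```shell
#!/usr/bin/env bash
# Sketch: destination port to allow for a given forwarding protocol and SSL setting.
# ASSUMPTION: vRLI defaults (Ingestion API 9000 / 9543 with SSL,
# syslog 514 / 6514 with SSL); verify them for your version.
forwarding_port() {
  case "$1:$2" in
    ingestion-api:ssl)   echo 9543 ;;
    ingestion-api:plain) echo 9000 ;;
    syslog:ssl)          echo 6514 ;;
    syslog:plain)        echo 514 ;;
    *) echo "unknown combination: $1/$2" >&2; return 1 ;;
  esac
}

port_open() {
  # TCP reachability test using bash's /dev/tcp pseudo-device,
  # limited to 3 seconds with coreutils "timeout".
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# The setup described above: Ingestion API with "Use SSL" checked.
echo "firewall: allow TCP outer-cluster -> inner VIP port $(forwarding_port ingestion-api ssl)"
```

Usage from an “outer” node (hypothetical FQDN): port_open vrli-inner.example.local 9543 && echo reachable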

To test the setup, I configured an NSX-T Manager placed in the “inner” management cluster to log directly to the “inner” cluster, as well as a couple of edge VMs deployed to the “outer” edge cluster, as described here.

Deploying vROPS 7.0 with vRealize LCM 2.0

In my previous post I described how to deploy the vRealize Lifecycle Manager 2.0 and import product binaries and patches.
Now it is time to make use of it to deploy the first vRealize product: vRealize Operations Manager.
There are some more steps you need to complete first, like generating a certificate or certificate signing request, as well as some optional tasks, like adding an identity manager or an Active Directory association. As they are described quite well in the official documentation, I will skip them here.

Before you can add an environment (the term used for deploying vRealize products), a vCenter has to be added. The documentation describes how to add a user with only the necessary roles, but for testing purposes you can also use the default SSO administrator account.

Add a Data Center to vRealize Suite Lifecycle Manager

In an isolated environment, the request to add a vCenter will look like the screenshot above, as the LCM can’t get patches from the internet, but it still works.
In the “Create Environment” screen you can select which products you want to deploy. For each product you need to select the version and the deployment type:

vRealize Suite Lifecycle Manager – Create Environment screen

Next to the deployment type, each product has a small “info” icon. Clicking it displays the details of each type:

vRealize Suite Lifecycle Manager – Create Environment screen (vROPS deployment types)

After selecting your desired products you have to accept the license agreements and fill in details like license keys, deployment options, IP addresses, host names etc.

vRealize Suite Lifecycle Manager – Create Environment screen (EULA & deployment parameters)

After putting in all necessary information a pre-check is performed:

vRealize Suite Lifecycle Manager – Create Environment screen (Pre-check)

The pre-check verifies the availability of your DNS servers, datastores and so on:

vRealize Suite Lifecycle Manager – Create Environment screen (Pre-check tasks)

After you submit, the LCM creates the environment according to your input:

vRealize Suite Lifecycle Manager – Create Environment screen (Submitted)

Because I had made a mistake in the DNS server configuration, the request failed.

vRealize Suite Lifecycle Manager – Create Environment screen (Failed)

Clicking “View Request Details” presents a more detailed view (see screenshot below).
Before deleting the environment and giving it another shot once the mistake is fixed, you should export the configuration. Two options are offered: Simple or Advanced. I picked Simple, which lets you download most of the parameters you entered as a JSON file.

vRealize Suite Lifecycle Manager – Create Environment screen (Failed, details)

The red info icon in the lower left corner gives even more details. In my case the successfully deployed master node was not reachable because of the DNS misconfiguration mentioned above.

In the “Create Environment” screen, you can paste the contents of the saved JSON file (see above) to speed up the process. This brings you directly to the pre-check step. However, you still need to go back one step and select your NTP servers, as they don’t seem to be included in the JSON configuration.
While the environment creation request is in progress, you can also view its details:

vRealize Suite Lifecycle Manager – Create Environment screen (In progress, details)

Finally the request finished successfully. Some steps were left out, probably because this is a single node deployment and not a “real” cluster…

vRealize Suite Lifecycle Manager – Create Environment screen (Finished, details)

After the environment is created, you can (and should) enable health checks via the menu that opens when you click the three dots in the upper right corner of the request box. This menu also lets you download logs and export the configuration, as done before.

vRealize Suite Lifecycle Manager – Create Environment screen (Enable health checks)

The first task I am going to do with the newly deployed vROPS is to install the HF3 security fix imported earlier:

vRealize Suite Lifecycle Manager – Environment details screen

Just select the patch, click “Next” to review and install:

vRealize Suite Lifecycle Manager – Environment details screen (Install Patch)

You can monitor the patch installation progress:

vRealize Suite Lifecycle Manager – Environment details screen (Installing patch in progress)

To use the integrated Content Management, you have to configure the environment as an endpoint. Just click the “Edit” link, which appears when you click the three dots next to the list entry:

vRealize Suite Lifecycle Manager – Content management: Endpoints screen

First confirm or modify the credentials entered earlier and test the connection:

vRealize Suite Lifecycle Manager – Content management: Edit Content Endpoint

Finally, there are four checkboxes to select your desired policy settings:

vRealize Suite Lifecycle Manager – Content management: Edit Content Endpoint (Policy Settings)

I will pick up the Content Management section in another blog post.
Until then, the vROPS deployed using the vRealize Suite LCM can be used as usual via its web GUI. It asks you to set your currency (this can’t be changed later on!) and is ready to fill its dashboards with data as soon as you configure the parameters and credentials for the solutions you want to monitor, e.g. vCenter:

vRealize Operations Manager – Configure Solutions
vRealize Operations Manager – Configure currency
vRealize Operations Manager – Configure currency (e.g. set to EUR)

Reusing storage devices for vSAN

Sometimes a storage device (SSD or HDD) that was used in a previous vSAN deployment, or that carries other leftovers, cannot be reused right away, either for vSAN or for a local VMFS datastore. When you try to erase the partitions on the drive as shown below, the error message “Cannot change the host configuration” is displayed:

Erase partitions highlighted in the Storage Devices view of an ESXi host in vSphere Client 6.7U1

The easiest workaround, which has been described in the community before, is to change the partition scheme from GPT to msdos via the CLI (and back via the GUI).
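On the ESXi shell, the relabel itself boils down to the “partedUtil” tool. A minimal sketch, assuming you know the device's naa ID from the Storage Devices view (the ID below is only an example); it prints the commands unless DRY_RUN=0 is set, because “mklabel” wipes the partition table.

```shell
#!/bin/sh
# Sketch for the ESXi shell: relabel a disk from GPT to msdos.
# WARNING: "mklabel" destroys the existing partition table.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

relabel_msdos() {
  dev="/vmfs/devices/disks/$1"
  run partedUtil getptbl "$dev"         # show current label (gpt/msdos) and partitions
  run partedUtil mklabel "$dev" msdos   # write an empty msdos label
}

# Example device (naa ID as shown in the Storage Devices view):
relabel_msdos naa.6006016045502500c20a2b3ccecfe011
```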

However, even that may fail, e.g. with the error “Read-only file system during write”. This can occur if the ESXi hypervisor finds traces of old vSAN deployments on the drive and refuses to overwrite them. In that case, you first have to remove those traces manually: log in to the host in question as root and issue the necessary vSAN commands. The following commands list all known vSAN disks, remove an SSD (cache device) and remove a (capacity) disk:

esxcli vsan storage list                                             # list all devices known to vSAN
esxcli vsan storage remove -s naa.6006016045502500c20a2b3ccecfe011  # remove an SSD (cache device)
esxcli vsan storage remove -d naa.58ce38ee2056991e                  # remove a (capacity) disk
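With several disk groups, typing the remove commands by hand gets error-prone. This sketch turns the output of “esxcli vsan storage list” into the matching remove commands, capacity devices (-d) first, then cache devices (-s), and only prints them for review. The field names it parses (“Device”, “Is SSD”, “Is Capacity Tier”) are assumed from the usual output format and may differ between ESXi builds.

```shell
#!/bin/sh
# Sketch: read "esxcli vsan storage list" on stdin, print (not run) remove commands.
# ASSUMPTION: an unindented line starts a device block; fields look like "   Key: value".
vsan_remove_commands() {
  awk -F': ' '
    function finish_device() {
      if (dev == "") return
      if (ssd == "true" && cap != "true")      # SSD that is not capacity tier = cache device
        cache = cache "esxcli vsan storage remove -s " dev "\n"
      else
        capacity = capacity "esxcli vsan storage remove -d " dev "\n"
    }
    /^[^ \t]/ { finish_device(); dev = ssd = cap = ""; next }
    {
      key = $1; sub(/^[ \t]+/, "", key)
      if (key == "Device") dev = $2
      else if (key == "Is SSD") ssd = $2
      else if (key == "Is Capacity Tier") cap = $2
    }
    END { finish_device(); printf "%s%s", capacity, cache }
  '
}

# On the host: esxcli vsan storage list | vsan_remove_commands
```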

Afterwards, repeat the steps described in the link above to correctly (re)claim the entire disk space, then use it as planned.

Deploying VMware Cloud Foundation 3.0.1 on EoL servers

In my company’s lab I found a couple of quite old x86 servers that were no longer in use. The rack servers are in fact so old that the original manufacturer (Sun) doesn’t exist anymore. The model is named “X4270 M2” and has been labeled end-of-life by Oracle for a while now. They are equipped with Intel Xeon processors released in 2011 (!), code name Westmere EP. This is in fact the oldest Intel dual-socket CPU generation supported by ESXi 6.7 (needed soon for the VCF 3.5 upgrade).
I found some more servers, but those are equipped with Nehalem CPUs, so they are not hypervisor material; one way to give them a new purpose could be as bare-metal NSX-T edges.

The main concerns about whether VCF could be successfully deployed on hardware this old (from a time when vSAN Ready Nodes, as required by VCF, were not even a thing yet) were compatibility with VMware’s HCL (especially HDDs, SSDs & RAID controller), the lack of 10 GbE adapters and insufficient RAM.
Preparing the five servers (four for the management domain and another for the Cloud Builder VM) with ESXi was by the book, except for a well-known workaround needed on old Sun servers.
For NTP, DNS and DHCP the OPNsense distribution was used once more.
After the filled-out Deployment Parameter Sheet was uploaded, the Cloud Builder VM started its validation, which resulted in only one warning/error regarding the cache/capacity tier ratio; it can be acknowledged. In fact, the same message was displayed at a customer’s site with Dell PowerEdge R640 nodes with 4 TB/800 GB SSDs. This seems to be related to a known issue.

VCF Configuration File Validation

However, after hitting Retry, another error was displayed saying that no SSDs available for vSAN were found. This could be confirmed by logging in to the ESXi interface of any of the hosts.
The Intel SSDs were marked as hard disks and could not be marked as Flash via the GUI. The reason is the LSI RAID controller, which does not offer a pass-through mode, meaning you have to create a RAID 0 virtual disk for each drive, so the hypervisor has no idea which hardware device lies underneath.
Further investigation in VMware’s KB showed that a SATP claim rule for local devices can be added via the CLI, so that after a reboot the device is correctly marked as SSD:

esxcli storage core device list   # list all devices and their properties, including "Is SSD"
# Find the SSD that is supposed to be marked as such, e.g. "naa.600605b00411be5021404f8240529589"
esxcli storage nmp satp rule add --satp=VMW_SATP_LOCAL --device naa.600605b00411be5021404f8240529589 --option "enable_local enable_ssd"
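With more than a handful of drives, the candidates can be picked from the device list with a small filter. A sketch under assumptions: it parses the “Is Local”, “Is SSD” and “Size” fields of “esxcli storage core device list” and prints, for one device, the rule from above plus “esxcli storage core claiming reclaim”, which as far as I know can let the rule take effect without a reboot (if it fails because the device is in use, reboot as described). Since the controller hides the hardware, the HDD volumes show up too, so pick the SSDs manually, e.g. by size.

```shell
#!/bin/sh
# Sketch: list local devices reported as "Is SSD: false" (with size in MB),
# and print (not run) the commands to mark one of them as SSD.
# ASSUMPTION: field names as printed by "esxcli storage core device list".
non_ssd_local_devices() {
  awk -F': ' '
    function finish_device() {
      if (dev != "" && islocal == "true" && ssd == "false") print dev, size
    }
    /^[^ \t]/ { finish_device(); dev = $0; sub(/[ \t\r]+$/, "", dev); islocal = ssd = size = ""; next }
    {
      key = $1; sub(/^[ \t]+/, "", key)
      if (key == "Is Local") islocal = $2
      else if (key == "Is SSD") ssd = $2
      else if (key == "Size") size = $2
    }
    END { finish_device() }
  '
}

mark_as_ssd_commands() {
  # $1 = device name selected from the list above
  echo "esxcli storage nmp satp rule add --satp=VMW_SATP_LOCAL --device $1 --option \"enable_local enable_ssd\""
  echo "esxcli storage core claiming reclaim -d $1"
  echo "esxcli storage core device list -d $1"   # check "Is SSD: true" afterwards
}

# On the host: esxcli storage core device list | non_ssd_local_devices
```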

Finally, the Cloud Foundation bring-up process could be initiated.
Still no luck, however: an error deploying the NSX Manager was displayed:

SDDC Bring Up

The fact that the error occurred at that stage meant that the Platform Services Controllers, the SDDC Manager and vCenter had already been successfully deployed and were reachable. After logging in to vCenter, it was clear that all VMs had been placed on the first host, so no RAM was left for the NSX Manager.

The first attempt to fix this problem was to migrate all VMs deployed so far to the other three hosts. Afterwards, the bring-up process could be resumed by hitting Retry, but eventually the same error came up again.
It became apparent that the four hosts were not equipped with enough RAM (24 GB) after all.
After shutting down the hosts in the correct order, more RAM was added (though still less than the documented minimum: 72 GB vs. 192 GB), and the hosts were started up again.

This time the bring-up went through, resulting in an automated deployment of an up-to-date private cloud SDDC on hardware more than 8 years old…

Of course this setup is only suitable for lab tests: disregarding VMware’s minimum requirements and design recommendations is unsupported and not suited for production.

Manually downloading the VMware Cloud Foundation Update Bundle 3.0.1.1

If your VCF SDDC deployment does not have internet connectivity, you can manually download update bundles on another machine and import them afterwards.
Here are the necessary steps on a Windows workstation.

Password fields for SDDC Manager “vcf” and “root” users.

First, use PuTTY to connect to the SDDC Manager as user “vcf”, with the password set in the Cloud Foundation deployment parameter spreadsheet (red circle in the image above), and run the following commands:

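cd /opt/vmware/vcf/lcm/
su                                             # switch to root
[enter root password; see green circle in top image]
mkdir bundleimport                             # target directory for the bundles imported later
chown vcf:vcf bundleimport
exit                                           # back to user "vcf"
cd lcm-tools/bin/
./lcm-bundle-transfer-util --generateMarker    # creates markerFile and markerFile.md5 in /home/vcf/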

Create a folder on your Windows machine (e.g. “C:\…\bundleupdate”) and use WinSCP to copy the remote files “markerFile” and “markerFile.md5” from “/home/vcf/” into it, as well as the entire “/opt/vmware/vcf/lcm/lcm-tools/” directory structure. In that folder, create another subfolder; I called mine “downloadedBundles”.
Make sure you have a current version of Java (JRE) installed.
Open a command prompt and run the following commands (enter your my.vmware.com password when asked):

cd C:\...\bundleupdate
./lcm-bundle-transfer-util -download -outputDirectory C:\...\bundleupdate\downloadedBundles -depotUser {your my.vmware.com username} -markerFile C:\...\bundleupdate\markerFile -markerMd5File C:\...\bundleupdate\markerFile.md5

WinSCP (above) and lcm-bundle-transfer-util (below)

Once the download has completed, unplug your internet cable and connect to your VCF deployment once more. Using WinSCP, copy the contents of your local folder “C:\…\bundleupdate\downloadedBundles” to “/opt/vmware/vcf/lcm/bundleimport”. Then use PuTTY again to run these commands:

cd /opt/vmware/vcf/lcm/lcm-tools/bin
chmod -R 777 ../../bundleimport    # open up permissions on the copied bundles
./lcm-bundle-transfer-util -upload -bundleDirectory /opt/vmware/vcf/lcm/bundleimport

Successfully imported update bundle listed in the Repository section.

Updating VMware Cloud Foundation 3.0.1 to 3.0.1.1

Here are a couple of screenshots of the SDDC Manager GUI showing the update process of a VCF deployment at a customer site (hostnames are edited out)…

Successfully downloaded update bundles listed in the Repository section.
Available updates for the Management Domain in the “Update/Patches” pane after completing the Precheck.
Starting the update by clicking “Update now” or “Schedule Update”.
Update in progress…
Detailed update steps in Progress / Queued.
ESX build number 10175896 shown in vCenter before update
ESX build number 1079125 shown in vCenter after update
The date on which the update completed is shown under “Update History”.
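The build numbers can also be checked on the hosts themselves. A small sketch, assuming SSH is enabled on the hosts and that “vmware -v” prints a line like “VMware ESXi 6.5.0 build-10175896”; the host names in the example are hypothetical.

```shell
#!/bin/sh
# Sketch: extract the ESXi build number from "vmware -v" output.
esxi_build() {
  sed -n 's/.*build-\([0-9][0-9]*\).*/\1/p' | head -n 1
}

report_builds() {
  # Prints "<host> <build>" for each host given as an argument.
  for host in "$@"; do
    printf '%s %s\n' "$host" "$(ssh "root@$host" vmware -v | esxi_build)"
  done
}

# Example: report_builds esxi-mgmt-01 esxi-mgmt-02 esxi-mgmt-03 esxi-mgmt-04
```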

Deploying and patching vRealize Suite Lifecycle Manager 2.0

Another customer, another project – again the need to deploy a couple of vRealize components (Log Insight, Network Insight, Operations Manager, Automation & more).
Why not use the same helper tool that VMware Cloud Foundation uses to deploy “vROPS” and “vRA”?

VMware describes this management appliance as follows:

vRealize Suite Lifecycle Manager automates install, configuration, upgrade, patch, configuration management, drift remediation and health from within a single pane of glass, thereby freeing IT Managers/Cloud admin resources to focus on business-critical initiatives, while improving time to value (TTV), reliability and consistency. Automates Day 0 to Day 2 operations of the entire vRealize Suite, enabling simplified operational experience for customers.

https://blogs.vmware.com/management/2018/09/vrealize-suite-lifecycle-manager-2-0-whats-new.html

Downloading and deploying the appliance’s OVA file is pretty straightforward, as with most of VMware’s current products. After powering on the newly created VM in the vCenter client, you can log in with the default credentials “admin@localhost” / “vmware”, as described in the documentation.

Some patches are available and can be downloaded from my.vmware.com and applied to the VM via the web GUI pretty easily.

Patches available in December 2018 for VMware vRealize Suite Lifecycle Manager 2.0

To use the current versions of “vRA” and “vRLI”, you also need to install a product support pack available on the VMware Marketplace. To download it, click the “Try” button on the right-hand side. The screenshot there shows how to install the “.pspak” file.
After the pack is applied, the product versions shown in the following screenshot are supported:

vRealize Product versions supported by vRLCM 2.0.0.2

The vRealize Suite LCM first needs to import the binaries of the products that are to be deployed. At a site with internet access, you can use the integrated “My VMware downloads” option.
At an isolated site, however, the easiest way for me was to upload the required OVA files to the LCM VM, e.g. with WinSCP. After connecting as “root” (you need to set a password for this user first), change into the “/data” folder, create a new directory (e.g. called “binary_import”) and copy everything into it.
Afterwards, import the binaries from the web GUI as described in the documentation (local location type, base location = “/data/binary_import”, discover, add).
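On a Linux or macOS workstation, OpenSSH's scp can replace WinSCP for this upload. A sketch with a hypothetical LCM host name; it prints the commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch: upload product binaries to the vRSLCM appliance via scp.
# LCM_HOST is a hypothetical FQDN; root's password must have been set first.
LCM_HOST="${LCM_HOST:-vrslcm.example.local}"
IMPORT_DIR="/data/binary_import"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands

run() {
  if [ "$DRY_RUN" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

upload_binaries() {
  run ssh "root@${LCM_HOST}" "mkdir -p ${IMPORT_DIR}"
  run scp "$@" "root@${LCM_HOST}:${IMPORT_DIR}/"
}

# Example: upload_binaries ./*.ova
```

Afterwards, the web GUI import works as described, with base location /data/binary_import.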
When the LCM is finished with discovering and mapping the product binaries and importing the patches the GUI should look like this:

Successfully mapped most recent product binaries of vROPS, vRLI & vRNI supported by vRLCM 2.0.0.2 (above) and critical product patches (below)

After the holiday break the next steps will be to deploy and manage the vRealize Suite components needed…

Keeping up with the Cloud Foundations

I am currently helping a customer build an infrastructure platform to run a couple of virtualized applications. The decision to use VMware products had already been made before I joined the project, but at that stage (the middle of the year) it was still uncertain whether the deployment and networking would both be “old school” (setting up everything by hand / VLANs separated by physical firewalls) or whether new approaches should be applied.
My experience with NSX and some articles I read about a new way of deploying VMware-based SDDCs, namely VMware Cloud Foundation (VCF), laid the foundation (see what I did there…) for our new private cloud.

After diving deeper into the VCF stack and its ideas (this free fundamentals course is great for starters), it quickly became clear that this could drastically reduce the resources spent on deploying and operating the project’s infrastructure and also prevent human errors, as entire batches of tasks are automated following the VMware Validated Designs.

While the environment was being planned, the latest VMware Cloud Foundation version available was 2.3.2. For this version the hardware compatibility list (both compute and networking equipment) was rather short, so Dell components were chosen. Some time passed while more workshops were conducted and until the boxes finally arrived, so a lot happened in the meantime…

During VMworld US 2018 the new version 3.0 was announced, and it was released shortly after. The big change in this major update was the focus on VMware’s own products. While pre-3.0 versions also included the networking stack, supporting only certain models from a handful of vendors (Cisco, Juniper, QCT, Dell), now any underlay network supporting 1600-byte MTUs and 10 Gbps Ethernet, and any vSAN Ready Node (> 20 vendors) meeting the required minimums, could be used, making even brownfield scenarios possible.
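A quick way to verify the 1600-byte MTU requirement end to end is a ping with the don't-fragment bit set between two VTEPs. The payload has to be the MTU minus 28 bytes (20 bytes IPv4 header plus 8 bytes ICMP header), which the sketch below computes; the vmkernel interface and IP are hypothetical, and the “++netstack=vxlan” option assumes NSX-V VTEPs.

```shell
#!/bin/sh
# Largest unfragmented ICMP payload over IPv4: MTU - 20 (IP header) - 8 (ICMP header).
ping_payload() {
  echo $(( $1 - 28 ))
}

vmkping_cmd() {
  # $1 = VTEP vmkernel interface, $2 = remote VTEP IP, $3 = MTU (default 1600).
  # ++netstack=vxlan selects the NSX-V VXLAN TCP/IP stack; -d sets "don't fragment".
  echo "vmkping ++netstack=vxlan -I $1 -d -s $(ping_payload "${3:-1600}") $2"
}

# Prints: vmkping ++netstack=vxlan -I vmk3 -d -s 1572 172.16.20.12
vmkping_cmd vmk3 172.16.20.12
```

Run the printed command on an ESXi host; if it fails while the same ping with a small payload succeeds, some hop in the underlay has a smaller MTU.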

In the project, version 3.0 never got further than a test deployment of its Cloud Builder VM (to download the deployment parameter spreadsheet and prerequisite checklist), as by the time the hardware was installed, 3.0.1 was already available for download. This minor version jump brought some bug fixes and improvements. For example, it was no longer necessary to convert the Excel spreadsheet containing the deployment parameters (IP addresses/networks, license details, passwords) into JSON format yourself with the included Python script. The 3.0.1 Cloud Builder VM web GUI accepts the Excel file directly. Very nice!

The entire VCF 3.0.1 deployment took less than two hours from uploading the parameter spreadsheet to finishing the bring-up, leaving us with a ready-to-use environment with vCenter, two Platform Services Controllers, vSAN, NSX, a vRealize Log Insight cluster and, of course, the new SDDC Manager.
The preparation of our hosts (Dell PowerEdge vSAN ReadyNodes) with ESX 6.5 was pretty easy. For DHCP (VXLAN transport VLAN), DNS & NTP, I set up an HA cluster of OPNsense gateways. Some pictures from the deployment process will follow in a separate post.

Shortly after this, another new version came out (3.0.1.1). As it only contains the current security patches for ESX 6.5, there is only an update bundle, no OVA download.

Last week the next long-awaited major release was published: 3.5. It is again available both as an upgrade and as a fresh OVA deployment, and it includes a lot of changes. These had already been announced at this year’s VMworld Europe, which I was fortunate to attend for the first time. Besides more bug fixes, the jump to the current 6.7 releases of ESX, vCenter & vSAN is the biggest news (finally no need for the Flash client; long live HTML5!), along with NSX 6.4.4 and updated versions of vRLI, SDDC Manager and so on. Now also included is NSX-T 2.3.0, but only for workload domains; the management domain continues to rely on NSX(-V). This is supposed to pave the way for container-based workloads like PKS/Kubernetes.

After the holidays I will continue the story with both results from upgrading the customer’s 3.0.1.1 site to 3.5 and also deploying 3.5 at my company’s lab on older hardware, so stay tuned…

How I got my VCIX-NV certification

Yesterday I took the 3.5-hour VCAP6-NV Deploy exam. Unlike in other tracks, there is (still) no accompanying Design exam, so passing it automatically earns you the VCIX status.
I wasn’t sure I had reached the minimum passing score, as the lab shut down while I was still working on the last tasks. Just as I had heard from multiple vExperts (more on that below), the time really is very short, so time management and a lot of practice are imperative.
Luckily today I received an email from Acclaim with the subject “VMware issued you a new badge”, making this the third VMware exam I passed on the first attempt.

At this point I would like to thank my former colleagues at Accenture, where my VMware NSX journey began with the NSX-ICM and -Ninja courses, my colleagues and leadership team at Seven Principles, giving me support and resources, everyone from VMware I met in Staines, Barcelona or on projects, and of course the vExpert community, assisting me during preparation.

The following articles were especially helpful:

Trying out netbox

After reading a recommendation by Greg Ferro, known of course for the Packet Pushers podcasts, for a tool claiming to offer both IP address management and data center inventory management, I decided to give it a try.

The tool was written by the network engineering team at DigitalOcean before being published as open source software on GitHub.

A fast way for a demo deployment was via docker-compose.
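A sketch of such a demo deployment, assuming Docker with docker-compose and the netbox-docker project on GitHub (repository URL and branch as currently published by that project; its README may list further steps, such as publishing the web port, so check it first). It prints the commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch: netbox demo via docker-compose.
# ASSUMPTION: repository URL and "release" branch of the netbox-docker project.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands

run() {
  if [ "$DRY_RUN" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

deploy_netbox_demo() {
  run git clone -b release https://github.com/netbox-community/netbox-docker.git
  run cd netbox-docker
  run docker-compose pull
  run docker-compose up -d
}

deploy_netbox_demo
```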

As the great interface (see screenshots) and the idea of a more dynamic approach to a single source of truth convinced my superiors too, we are going to use it to document our lab environment, maybe even as a showcase for customers.