Tuesday, May 4, 2021

VMware vCenter Server High Availability (VCHA)

Recently, I had a good discussion with my colleague about vCenter Server High Availability (VCHA) and the areas where it could be of use.

Before we start, I would like to summarize a few questions that I am always asked in the course of my work, as well as when I teach the vSphere Install, Configure and Manage course.

What is VCHA used for?
It is really meant for local site availability: cases where the loss of vCenter Server would cause an outage to other management components, where the recovery time offered by vSphere HA is not sufficient, or where vSphere HA is not able to bring vCenter Server back up.

Can VCHA be used in a stretched cluster setup, and how should we plan to place the nodes?
Yes. You will definitely have two sites in a stretched setup. If you would like to implement VCHA in such a setup, you will need a third site or, at minimum, a separate cluster at the passive node site. Typically you will have the active node at site 1, and the passive node and witness node at site 2, where the witness node is at minimum on a separate cluster, with a third site recommended. If the active node loses its connection to the witness node, the passive node will be promoted to active.

What are the requirements for VCHA?
VCHA uses continuous replication, which means its network requirements are fairly high. The minimum is a 1 Gbps link with at most 10 ms RTT. You will also need a separate VCHA network between the active, passive and witness nodes; it must not be part of the management network and must not have a default gateway. Reference: 1, 2
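
The requirements above can be written down as a small pre-check. This is only a sketch: the `VchaLink` class and `check_vcha_link` function are hypothetical names made up for illustration, reading "1G" as 1 Gbps is an assumption, and the measured values would have to come from your own network tests.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the answer above; "1G" is read as 1 Gbps (assumption).
MIN_BANDWIDTH_GBPS = 1.0
MAX_RTT_MS = 10.0


@dataclass
class VchaLink:
    """Hypothetical description of the VCHA network between the nodes."""
    bandwidth_gbps: float
    rtt_ms: float
    on_management_network: bool
    has_default_gateway: bool


def check_vcha_link(link: VchaLink) -> List[str]:
    """Return a list of problems; an empty list means the link passes."""
    problems = []
    if link.bandwidth_gbps < MIN_BANDWIDTH_GBPS:
        problems.append("bandwidth below 1 Gbps")
    if link.rtt_ms > MAX_RTT_MS:
        problems.append("RTT above 10 ms")
    if link.on_management_network:
        problems.append("VCHA network shares the management network")
    if link.has_default_gateway:
        problems.append("VCHA network has a default gateway")
    return problems
```

For example, `check_vcha_link(VchaLink(10.0, 2.0, False, False))` returns an empty list, while a 100 Mbps link with 25 ms RTT would be flagged on both counts.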

So what are the failure scenarios? Reference: 1

  1. Active node is isolated from the Witness node.
    Failover happens. The Passive node becomes the Active node. VCHA stays in a degraded state until a passive node resumes.
  2. Passive node is isolated from the Witness node.
    VCHA is in a degraded state. No failover.
  3. Witness node is isolated from the Active and Passive nodes.
    VCHA is in a degraded state. No failover.
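
To make the three scenarios easier to compare, here is a minimal sketch that encodes them as a lookup. It only mirrors the list above and is not VMware's actual failover logic; the `Isolation` enum and `vcha_outcome` function are hypothetical names.

```python
from enum import Enum


class Isolation(Enum):
    """The three isolation scenarios listed above."""
    ACTIVE_FROM_WITNESS = 1
    PASSIVE_FROM_WITNESS = 2
    WITNESS_FROM_BOTH = 3


def vcha_outcome(isolation: Isolation) -> dict:
    """Return whether a failover happens and whether VCHA is degraded."""
    # Per the list above, only the first scenario triggers a failover.
    failover = isolation is Isolation.ACTIVE_FROM_WITNESS
    # All three scenarios leave the VCHA cluster in a degraded state.
    return {"failover": failover, "degraded": True}
```

In all three cases the cluster is degraded; the only difference is whether the passive node takes over.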

What configurations are possible in a VCHA setup?
You can refer to this doc.

How can we monitor VCHA?
You can set up the available alarms, such as one for when the database replication changes from sync to async. Refer to doc.

The use of VCHA has become less common since we have vSphere HA. The main remaining reason for it is to protect against a component failure from which vCenter Server cannot be recovered; that is where VCHA is highly useful. One example is guarding against storage failure, with the Active node on one storage system and the Passive node on another. If the storage behind the active node fails, the passive node will keep vCenter Server available.

If you are leveraging vSphere HA together with VCHA, you are doubly protecting vCenter Server, which is definitely recommended. In fact, when using vSphere HA, you should always give vCenter Server a high restart priority, so that if resources are constrained you have a higher probability of bringing up vCenter Server. Upgrade to vSphere 7.0 Update 1, or at least vCenter Server 7.0 Update 1 (where the ESXi hosts are on a version supported by vCenter Server 7.0 Update 1), where the DRS service and vSphere HA remain available when vCenter Server is not present, via vCLS. More from KB.

However, if your RPO requirement in a failure is not that strict, a typical vCenter Server native backup can come into the picture: you deploy a new vCenter Server and restore from the backup set. However, this does not restore the vDS configuration. If you are using vDS, you will also need to back up its configuration and restore it after restoring from the vCenter Server backup set. You can also use a PowerShell script to do a scheduled backup as shown here.

Tuesday, April 20, 2021

Next Change, Moving Forward

The idea for this blog started when I was doing technical implementation and design back in my IBM days. The term commonly used is post-sales engineer/consultant. It was held up, and I never got to start anything until I left and joined BT Frontline, which is now British Telecom.

I started this blog in Apr 2011 with one main purpose: paying forward what I have benefited from the community. I remember clearly that during my early years, when I was building VMware View in my home lab, someone from the LinkedIn VCP group willingly helped me over messages and an overseas call to troubleshoot, and guided me in configuring a Vyatta virtual router. Since then, I decided to share what I have learned; otherwise I would not have gained as much as I have.

That very year, I got an opportunity at VMware and joined the company as a Systems Engineer, which the industry terms a presales engineer. It was my dream company and I have never looked back since. I carried on sharing what I learned without any plan for my career, just doing what I was good at, from commercial customers to government customers over a good 9 years.

I was active in the community back then, starting with the VCP Club in Singapore, which is today VMUG. I went from attending VMUG Singapore events to running them as the VMware technical liaison, doing content development for various VMware exams such as VCP, VCAP and badge exams, and becoming a certified VMware Instructor to get a taste of teaching a few classes. Getting accepted into the VMware CTO Ambassador (CTOA) program and learning how to do well as an individual contributor has given me much rewarding knowledge and the chance to work with many peers around the world.

Early this year, I was offered an SE manager role to manage the public sector team. It was never something that had crossed my mind. After consideration, I decided to take up the role.

It has been a good 10 years as an SE, with all the blog writing, YouTube video publishing and community discussion.

You might ask what made me decide on this choice. I looked back at what I have accomplished over the years. I have ventured as an SE to places many would not have gone before. Having journeyed through all this, I believe that experience can be had by anyone who is keen. If I am able to multiply what I do through a team, it will extend much further than I can myself, helping the community as well as benefiting my customers.

Hopefully with my 9 years of experience, I can shape and nurture the .Next ready architects in my team. The result is no longer my own but the outcome of the team I manage.

It is definitely something new to me but I am game for the challenge.

My blog will still be updated, but perhaps I can add more interesting management-related experience that I gain along the way, for those who are keen to explore and decide for themselves whether it is the path for them.

Let the challenge and fun begin!

I was asked to read an article. For those who are pursuing an SE leadership role, this blog article does help shine some light.

Tuesday, April 6, 2021

Cross vCenter Migration in vCenter 7.0 Update 1

In case you are not aware, when vSphere 7.0 Update 1 was released, there was one improvement made to Cross vCenter Migration.

For those who didn't follow: in the past, Cross vCenter Migration could only be done between vCenter Servers within the same SSO domain. This was a limitation, especially when one company merged with or acquired another; they were unable to move the workloads directly and had to resort to traditional methods such as backup and restore.

With vSphere 7.0 Update 1, vCenter Server improves this function. There is no longer a requirement for both vCenter Servers to be part of the same SSO domain. This resolves lots of use cases out there in the field.

This feature started as a Fling and eventually became part of the official product.

However, there are some things to be clear about regarding the requirements.

You need to make sure the vSphere edition is at least Enterprise Plus. Standard and the old Enterprise edition (if you didn't upgrade from it; it has reached end of life as a sold product) do not support Cross vCenter Migration.

You need to make sure the source and destination vCenter versions are supported. There is a KB documented just for this. The minimum supported source at the time of writing is vSphere 6.0 U3, which can migrate to vSphere 7.0 U1.

When planning for migration, do take note of the above.

Saturday, March 13, 2021

vSAN 7 Update 2 What's so Sexy?

There are so many blogs and articles posted by many. You can refer to some of the official ones below.

Here I am going to list some of the great features found in vSAN 7 U2 which will help in everyday operations or use cases.

vSphere Lifecycle Manager
One feature, which was covered in the vSphere 7 U2 post, is the ability to upgrade or patch ESXi using Suspend to Memory with Quick Boot. In vSAN, this reduces resynchronization effort.

We also mentioned that update support for more vendors' hardware is now available.

With the new vLCM, you can now dictate a desired state with an image and a prescribed outcome as the desired result.

vSAN Data Persistence Platform (DPp)
In vSAN 7 U1, a new framework for integrating stateful apps that work with Kubernetes Operators, such as MinIO, DataStax, etc., was introduced: the vSAN Data Persistence platform.

vSAN 7 U2 does not just provide easy deployment: when the service is enabled, it will automatically spin up in the Supervisor Cluster. This update also allows migration from the legacy vSphere Cloud Provider (VCP) to the Container Storage Interface (CSI) to use DPp.

vSAN File Services
More granular support has been added with SMB v2.1 and v3 paired with Active Directory, Kerberos for NFS, and scaling up to 32 hosts for file services processing. Cormac wrote an article on this.

The support for 2-Node and stretched cluster setups is much welcome, especially for remote offices, where a small setup is common and such services are often required. In terms of security, File Services now supports Data-in-Transit encryption.

Hope the above gives you a useful summary; discover more from the links provided.

Update 16th Mar 2021
One last item is HCI Mesh, which allows you to split compute and storage clusters instead of requiring the compute cluster to also be part of a vSAN cluster. This allows more flexibility, and there is no vSAN licensing requirement for the compute cluster. Check out what's new in vSAN 7 U2 on HCI Mesh here. Check out more on vSAN HCI Mesh here.

Thursday, March 11, 2021

vSphere 7.0 Update 2 What's so great?

There were multiple What's New posts and overviews when vSphere 7.0 Update 2 was released on 9th Mar 2021. I am not here to list those; however, you can check them out below.

What I would like to pinpoint here is what I find useful for an architect choosing the right solution for the right use case, and what helps customers run it after deployment.

I will break this down into three areas: vSphere with Tanzu, AI/ML Platform and vSphere improvement.

vSphere with Tanzu
As you know, vSphere with Tanzu, or TKGs, was introduced when vSphere 7.0 was released. With Update 2, it is now able to leverage NSX Advanced Load Balancer (previously known as Avi), an enterprise-grade load balancer, for the Supervisor Cluster, guest clusters (TKG) and Kubernetes Services of type LoadBalancer deployed in TKG clusters. Check out this article to know more.

AI/ML Platform
If you have not followed, VMware and NVIDIA have established a new partnership in support of the latest Ampere family of NVIDIA GPUs and also the NVIDIA AI Enterprise software suite.

With support for Multi-Instance GPU (MIG), previously released in U1 as a tech preview, this replaces the older method of time-slicing the use of vGPU. This requires assigning a GPU profile to the VM and also requires SR-IOV to be turned on in the server BIOS. You can check out this article to know more.

At the same time, the vSphere Bitfusion 3.0 update includes improvements to support newer CUDA versions and GPU-to-GPU communication via the NVIDIA Collective Communications Library (NCCL). Other minor improvements involve adding subsequent servers and users to a cluster and adding more network adapters to Bitfusion servers.

This has now made vSphere a preferred platform and the only hypervisor on the market able to support NVIDIA's new technology.

vSphere Improvement
vSphere Lifecycle Manager has increased its ecosystem support, which now includes HPE iLO and Dell OMIVV. In this release, vLCM supports the vSphere with Tanzu and NSX-T lifecycle. This, I would say, now covers the SDDC infrastructure components. You can also now import a cluster image from an existing host or from a new host. Lastly, the best part is that when doing upgrades, you can now specify whether to vMotion the VMs off the host or simply suspend them to memory. This will definitely reduce the time taken compared to vMotioning the VMs away and back after the upgrade.
Check out the post and video here.

The next feature is vMotion Auto Scale. Prior to this release, vMotion would not utilize the full link on 40 GbE or 100 GbE. We needed to apply multiple configurations to enable multiple streams, etc., to maximize its use. In this release, vMotion automatically spins up the number of streams needed to utilize the bandwidth given. This not only saves you configuration effort but also gives you faster live migrations when needed.

Lastly, there is further optimization for AMD EPYC CPUs, which gives customers more options in terms of CPU.

In the area of security, there is vSphere Native Key Provider. This is a good feature in my opinion, as it applies to a big group of customers who do not have a KMS, or who would love to use one without having to implement a full-fledged KMS at a remote site or in their environment. It allows customers to use not only vSphere VM Encryption but also vSAN Encryption. Check this out to find out how easy it is to set up.

Another area is safely disposing of your hardware, with no more worries about exposing confidential data such as passwords, certificates, etc., thanks to ESXi Configuration Encryption. This leverages the physical TPM on the server to ensure things such as boot volumes are encrypted, so the hardware can be disposed of safely or sent for warranty exchange.

Big changes will also be coming to the vSphere Security Configuration Guide, to guide customers on what is used for certain standards such as PCI DSS or HIPAA.

Last but not least, VMware Tools can now help with Guest Content Distribution, which allows customers to share content to VMs via any type of shared datastore and apply granular access policies. This also regulates both the vSphere and VM sides of things. Also added is a VMware Time Provider, which uses a low-jitter channel to synchronize time directly with hosts. This reduces latency compared to the traditional methods.

Hope this sums up the useful items for planning and designing, taking into consideration what is possible in your solution.

Monday, March 1, 2021

Critical: vCenter Server Vulnerability VMSA-2021-0002

Many might have been alerted to the recent vCenter Server vulnerability, which was rated 9.8 out of 10. One report of it, from Feb 23rd, can be found here.

If you have subscribed to VMware Security Advisories, you would have received this information as VMSA-2021-0002.

I would strongly encourage anyone who is using VMware solutions to subscribe to VMware Security Advisories so as to be kept informed of any security information.

If you refer to VMSA-2021-0002, vCenter Server 7.0 U1c was released on Dec 17th, 6.7 U3l on Nov 19th and lastly 6.5 U3n on Feb 23rd, one day after the report. If you had been up to date, you would have been protected well before the report was announced. The only exception was 6.5, which was released a day after, but based on the report, that is a one-day turnaround, which is still impressive.

This is also very critical for vCenter Servers that are connected to the internet. However, such cases should be minimal, as most customers would not have placed their management server facing the internet; it would normally be fronted by a proxy server to start with. Nevertheless, do keep yourself updated with any critical security patches to be safe from a compromise.

Tuesday, February 16, 2021

VMware Tools Missing!

Recently, I was in a Facebook group, VMware vExpert, and one member posted this.

He was running a VDI environment and noticed that his VMware Tools got uninstalled and could not be installed successfully after several attempts. This looks like a VMware issue, but let's look more into it.

On further checking, the user had updated their ESXi host, and vSphere auto-updated VMware Tools on every virtual machine that got rebooted. The installation, whether automatic or manually triggered by the user, failed. After an investigation by the member, it seems his anti-virus had blocked the installation.

But wait right here: how did vSphere auto-update VMware Tools? Isn't that normally triggered using vSphere Update Manager (prior to vSphere 7.0) or vSphere Lifecycle Manager (vSphere 7.0 onwards)?

It is a good thing the member found this article by one of our VCDXs.

It seems that there is an option to auto-update VMware Tools to match the patched ESXi host, if you turn it on, as shown by vMiss.net.

vSphere 6.7U1 Source: https://vmiss.net/

vSphere 7.0 Source: http://labs.hol.vmware.com/

From the above screenshots, you can set up auto update to match the host. It sounds great, but it may not be particularly useful, especially when you are running a virtual desktop infrastructure (VDI) environment.

Similarly, I encountered a customer issue at the same time and managed to link both cases together.

Let me explain here. You have a master image for VDI which has most of the base software, such as the anti-virus agent, installed. Every desktop created from the master image typically starts from a reboot, and in this case the VMware Tools installation is triggered. This happens often, especially when the policy is set to create a new desktop or refresh one when a user logs off. It also happens during regular maintenance to refresh a desktop pool.

Now the problem comes: AV is typically set up to block system installations, specifically to prevent any intrusion. In this case, VMware Tools is one such installation, and it got blocked. This causes unnecessary panic, where users start to relate this to a vSphere or VDI solution issue.

This often does not relate to a solution issue, but to a setting used based on a design. One needs to assess whether the setting makes sense for the solution and understand the caveats that come with it. Someone who takes over an environment without prior knowledge will not be aware of it and will waste a lot of time troubleshooting.

There is one more caveat: when upgrading your ESXi host and working with a VDI solution such as Horizon, you need to ensure compatibility even for VMware Tools. The VMware Product Interoperability Matrices site is an especially useful one to bookmark. An example is Horizon 7.7 with vSphere 6.7U3. In this case, we do not upgrade VMware Tools as part of the host upgrade and do not enable Auto Update.

Horizon 7.7 supports up to VMware Tools 11.0.5

vSphere ESXi 6.7U3 supports up to VMware Tools 11.2.5
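
The reasoning behind that decision is simply "take the lowest of the maximum supported versions". Below is a minimal sketch using the two limits quoted above; the function names are hypothetical, it only looks at upper limits (not minimum versions), and the real source of truth remains the interoperability matrix.

```python
def parse_version(text):
    """Turn a dotted version such as '11.0.5' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def highest_common_tools(max_by_product):
    """Return the newest VMware Tools version every product still supports."""
    return min(max_by_product.values(), key=parse_version)


def is_tools_supported(tools_version, max_by_product):
    """True if tools_version does not exceed any product's upper limit."""
    ceiling = parse_version(highest_common_tools(max_by_product))
    return parse_version(tools_version) <= ceiling


# Upper limits as quoted in the example above.
limits = {"Horizon 7.7": "11.0.5", "ESXi 6.7U3": "11.2.5"}
```

With these limits, `highest_common_tools(limits)` returns `"11.0.5"`, and `is_tools_supported("11.2.5", limits)` is `False`, which is why letting the host upgrade pull VMware Tools forward would break Horizon support here.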

In summary, communicating design decisions and caveats during an environment handover is an important process. They should be communicated to all the members who are involved.
