What’s New in NSX 4.1.1 (On-Prem NSX)

These are some of the features added or enhanced for on-premises use in NSX 4.1.1:

Layer 2 Networking

  • Filter VLAN on VLAN Transport Zone: It is now possible to define a list of VLANs authorized for a given VLAN-based transport zone, which prevents NSX tenants from using some VLANs already used for general connectivity in the datacenter.
  • ESX Observability Enhancements: The NSX API now offers statistics for the ESX overlay, ESX distributed routing, and switch security (IP discovery and SpoofGuard) modules, which were available only through the CLI in previous releases. This gives you more options for monitoring the NSX modules running on ESX.
  • Enhanced Data Path (EDP) now supports the following improvements:
    • Support for double overlay encapsulated traffic from VMs and containers to a host switch, offering better performance for Antrea deployments.
    • Support for MAC and VLAN filtering, which allows a physical NIC driver to program (MAC, VLAN) pairs into physical NIC Rx queues.
    • Optimization of the flow cache for Geneve overlay traffic, reducing the impact of large numbers of flows on forwarding performance.
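To make the VLAN filtering idea concrete, here is a minimal Python sketch of an allow-list check for one VLAN transport zone. The `ALLOWED_VLANS` ranges and the `vlan_is_authorized()` function are hypothetical illustrations, not the NSX API schema; NSX performs this validation itself once the list is defined on the transport zone.

```python
from typing import Iterable, Tuple

# Hypothetical allow-list for one VLAN transport zone, as inclusive (start, end) ranges.
# Illustrative values only; this is not an NSX API payload.
ALLOWED_VLANS: Tuple[Tuple[int, int], ...] = ((100, 199), (300, 300))


def vlan_is_authorized(vlan_id: int,
                       allowed: Iterable[Tuple[int, int]] = ALLOWED_VLANS) -> bool:
    """Return True if vlan_id falls inside any authorized range of the transport zone."""
    if not 0 <= vlan_id <= 4094:
        raise ValueError(f"invalid VLAN ID: {vlan_id}")
    return any(start <= vlan_id <= end for start, end in allowed)


if __name__ == "__main__":
    # A tenant segment on VLAN 150 is accepted; VLAN 10 (datacenter infrastructure) is not.
    for vlan in (150, 300, 10):
        print(vlan, vlan_is_authorized(vlan))
```

Keeping the list as ranges mirrors how VLAN sets are usually expressed, so a reserved infrastructure VLAN simply never appears in any range.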

Layer 3 Networking

IPv6 TEP (tunnel endpoint) support for Transport Nodes: This release introduces support for IPv6 TEPs with Geneve encapsulation for Transport Nodes (Edge Nodes and ESXi hosts). With this feature, you can create overlay Transport Zones that use IPv6 as the underlay transport protocol.

DPU-based Acceleration

NVIDIA BlueField-2 (100Gbps) is now supported

Edge Platform

  • Bare Metal Edge supports Intel 810: expands the list of supported NICs with 25Gbps/100Gbps Intel NICs.
  • Disabled Flow Cache Alarm on Edge Node: With VMware NSX 4.1.1, NSX triggers an alarm if the flow cache is disabled on an Edge Transport Node. This helps ensure that Edge Transport Nodes deliver the best performance.
  • Packet drop alarm: The alarm now provides more specific, granular information about packet drops.
  • Maximum supported cores for Bare Metal Edge: With VMware NSX 4.1.1, the Bare Metal Edge node can use up to 80 cores, maximizing the performance of the Edge node.

Physical Server

Added support for Windows Server 2012 R2 as a physical server transport node.

Container Networking and Security

  • Scale Improvements for TKGi: NSX 4.1.1 brings scale parity for TKGi customers shifting from Manager Mode APIs to the declarative Policy APIs.
  • Support for more than 1000 OpenShift routes with NCP 4.1.1.

Installation and Upgrade

  • Run NSX pre-upgrade checks at any time, independently of the upgrade process. Check upgrade readiness and fix any underlying issues ahead of time, so you can use your maintenance window for the actual upgrade. You also benefit from the latest pre-checks, which NSX adds dynamically. NOTE: Although this capability allows you to run pre-checks in advance, they must be run again before starting an NSX upgrade, to ensure an accurate, up-to-date assessment of your deployment right before the upgrade.
  • NSX Federation now supports the same upgrade interoperability of N+2 (N = minor version number in product series) as NSX on-premises. 
  • Reduced downtime in rolling upgrade of NSX Manager cluster. 
  • Revamped NSX Upgrade UI for better performance. Additional user experience enhancements in the NSX upgrade process. 
  • A search <uuid> command is added to look up resource details by UUID, returning the IP address, host name, display name and resource type.
  • The NSX Manager UI now displays a banner when the NSX Manager is deployed by VMware Cloud Foundation. The NSX Upgrade splash page also reminds you to upgrade NSX from SDDC Manager when NSX is deployed by VMware Cloud Foundation.

Operations and Monitoring

  • LTA (Live Traffic Analysis) support on the ESXi ENS fastpath
    • A counter action is introduced in this release and is available through the API, enabling users to trace ENS fastpath and slow-path traffic on a port.
  • Status metrics in string format can now be persisted in NAPP (NSX Application Platform) as time-series metrics. For example, the cluster status metric, which shows the health of the NSX Manager cluster, takes string values such as stable, unstable or degraded, and can now be persisted in NAPP. This lets you monitor string-valued status metrics over time with historical context.
  • Online Diagnostic System
    • Thirteen new runbooks are added in this release to help with troubleshooting across Edge, Host and NSX Manager. See the Administration Guide for more information.
  • NSX CLI Enhancements
    • More than one filter can be applied to a CLI command, which helps narrow down large outputs.
    • New modifiers sed, awk, uniq are supported to filter and format the CLI output.
    • From the central CLI, users can now execute commands on a remote node by specifying its IP address, host name or display name. This provides flexibility to run CLI commands on remote nodes using any of these identifiers.
    • The search <input-str> CLI command is added. It returns information such as the IP address, display name and resource type for any <input-str>; a resource-type parameter can be added alongside the search string to filter by resource type.
  • Transport Node APIs are updated to provide various details to aid in monitoring.
    • Transport Node Status API
      • MP connection status can be retrieved in real time by using the source=realtime query parameter in the API call.
      • For the CCP connection, a status description field is added that describes the reason for a disconnection, and a last status changed time field shows when the status was last updated.
      • For pNIC status, details of the pNICs that are down and their last status change are added.
      • For cfgagent and opsagent, component and last status change fields are added.
      • The aggregated status now includes a status description, which explains why the aggregated status is not UP, a last aggsvc heartbeat field, which shows when the MP connection was last active, and a last status changed time field, which shows when the status was last updated.
    • PNIC bond Status API
      • pNIC bond details, and a type field indicating whether the bond is used for NSX or non-NSX traffic, are added.
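To show how the source=realtime option is used, here is a minimal Python sketch that builds and sends a Transport Node Status request with only the standard library. It assumes the NSX Manager API path `/api/v1/transport-nodes/<node-id>/status` with a `source` query parameter accepting `realtime` or `cached`, and HTTP basic authentication; the manager FQDN, node ID and credentials are placeholders, so confirm the path against the NSX API guide for your version.

```python
import base64
import json
import ssl
import urllib.request
from urllib.parse import quote, urlencode


def transport_node_status_url(manager: str, node_id: str, realtime: bool = True) -> str:
    """Build the status URL; source=realtime asks for live status instead of cached data."""
    path = f"/api/v1/transport-nodes/{quote(node_id, safe='')}/status"
    query = urlencode({"source": "realtime" if realtime else "cached"})
    return f"https://{manager}{path}?{query}"


def get_transport_node_status(manager: str, node_id: str, user: str, password: str) -> dict:
    """GET the status document with HTTP basic auth and return the parsed JSON body."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        transport_node_status_url(manager, node_id),
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )
    # Verifies TLS against the system CA store; a self-signed manager CA must be trusted first.
    context = ssl.create_default_context()
    with urllib.request.urlopen(request, context=context, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical manager FQDN and transport node ID, for illustration only.
    print(transport_node_status_url("nsx-mgr.example.com", "tn-0001"))
```

The returned JSON is where fields such as the status description and last status changed time described above would appear; the sketch deliberately returns the raw document rather than assuming exact field names.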

VPN

With NSX 4.1.1, you can now allocate more cores to VPN services.

Platform Security

Local User Account Management: Adds an NSX API for listing local user accounts on NSX appliances (NSX Manager, NSX Edge, NSX Application Platform, NSX Cloud Service Manager).

Here are some of the NSX 4.1.1 features used to connect to NSX+ and to support the cloud consumption model:

  • Securely connect to VMware NSX+ Services: NSX 4.1.1 allows on-premises NSX Manager deployments to securely connect to NSX+ Services. Details of NSX+ Services are available in the NSX+ Release Notes.
  • Cloud Consumption Model with NSX VPCs: The new NSX VPC allows self-service consumption of networking, security and services through on-demand isolated environments aligned with cloud consumption standards. It offers a second level of tenancy below Projects, with a streamlined UI and API that let teams easily deploy networking and security in the cloud environment.
  • Multi-tenant Distributed IDS/IPS: Distributed IDS/IPS now offers multi-tenant consumption with the ability to configure it under Projects. It allows multiple users to apply IDS/IPS rules to their own VMs without risks of overlap.

Reference: VMware NSX 4.1.1 Release Notes

What’s New in vSphere 8 Update 2

Update vCenter Server with Minimal Downtime

With this update, you can upgrade vCenter Server using a migration-based upgrade method: a new vCenter appliance is deployed and the existing vCenter data is migrated to it, so the only downtime is during the vCenter switchover (approximately 5 minutes).

This significantly reduces the time vCenter Server is unavailable during the upgrade.

The migration-based approach is offered as an option alongside the regular vCenter Server upgrade method.

There are some limitations to this approach in this initial release:

1. This method is NOT supported for vCenter Servers in Enhanced Linked Mode.

2. The on-premises version does not support vCenter Servers in a vCenter HA configuration.

Resilient vCenter Patching

In this update, vCenter automatically takes a snapshot before patching starts and gives you the option to roll back if patching fails.

An automatic Logical Volume Manager (LVM) snapshot is taken before the vCenter Server is patched. This is an OS-level snapshot, not a file-based snapshot.

The LVM snapshot is taken regardless of whether a VM snapshot is present on the vCenter VM.

Updates to the Certificate Management in vCenter Server

In vSphere 8 U2, you can renew or replace vCenter Server certificates without restarting services.

There is no need to schedule downtime to restart the services on the vCenter Server after the certificates have been renewed.

Restoring VDS Switch Configuration

A distributed key-value store has been added to the ESXi hosts in the cluster, and it reconciles its information with vCenter when vCenter is restored from backup. As a result, the restored vCenter receives the latest distributed switch configuration rather than the potentially outdated state captured when the backup was taken.
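Conceptually, the reconciliation behaves like a per-switch "newest version wins" merge between the restored vCenter state and the host-side store. The Python sketch below models only that idea; it is not VMware's implementation, and the `version` counter, the `SwitchConfig` type and `reconcile()` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SwitchConfig:
    version: int    # hypothetical, monotonically increasing configuration generation
    settings: str   # simplified stand-in for the real VDS configuration


def reconcile(restored: Dict[str, SwitchConfig],
              host_store: Dict[str, SwitchConfig]) -> Dict[str, SwitchConfig]:
    """Merge restored vCenter state with the host-side store, keeping the newer entry per switch."""
    merged = dict(restored)
    for switch, host_cfg in host_store.items():
        current = merged.get(switch)
        if current is None or host_cfg.version > current.version:
            merged[switch] = host_cfg
    return merged


if __name__ == "__main__":
    # The backup captured version 3; the hosts have since applied version 5.
    from_backup = {"vds-01": SwitchConfig(3, "mtu=1500")}
    from_hosts = {"vds-01": SwitchConfig(5, "mtu=9000")}
    print(reconcile(from_backup, from_hosts)["vds-01"])
```

The point of the model is the direction of trust: after a restore, state that the hosts have seen more recently overrides the older snapshot held in the backup.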

Additional Identity Providers

With vSphere 8 U2, Azure AD has been added as an identity provider for direct federation.

This option helps customers use consistent single sign-on across their organization.

The existing identity options, such as Microsoft AD over LDAPS, Microsoft ADFS and Okta, are still available; Azure AD is a new addition to them.

Updates to the vSphere Security Configuration & Hardening Guide with vSphere 8 U2

Lifecycle Manager can now manage shared vSAN witness nodes

vSphere Lifecycle Manager can now manage the image of shared vSAN witness nodes independently of the vSAN clusters in vSphere 8 U2.

End-to-End UI has been added to Configuration Management in vCenter Server

Configuration Management, introduced in vSphere 8, has been improved with an end-to-end UI in the vCenter Server Configuration Management menu, where you can create a draft and import a configuration either from a file (as a JSON document) or from a host.

Streamlined Windows Guest Customization

You can now specify the Active Directory OU path for Windows VMs in a VM customization specification in vSphere 8 U2.

Descriptive Error Messages when Files are Locked

With vSphere 8 U2, vCenter Server shows which ESXi host holds the lock on a particular file, so you can identify the source of locked VM files directly from the vSphere Client without running CLI commands or reviewing logs. The vSphere Client shows the IP address and MAC address of the host holding the file lock.

Expanding the Vendor Partnership for DPUs

With vSphere 8 U2, Fujitsu systems with NVIDIA DPUs have been added to the ecosystem of DPU vendors.

Improved Placement for GPU Workloads

With this update, DRS makes smarter placement decisions for vGPU-enabled VMs: it places them more efficiently in the cluster and can automatically migrate existing vGPU VMs to accommodate larger ones.

Quality of Service for GPU Workloads

A new option introduced in this update is the Estimated Max Stun Time, calculated based on the vGPU profile assigned to the VM. Administrators can now define the maximum acceptable stun time for a VM that has a vGPU profile.

VM Hardware Version 21

This update introduces VM hardware version 21, which adds the following:

  • Up to 16 vGPUs per VM
  • Up to 256 vNVMe disks per VM (64 disks × 4 vNVMe adapters)
  • NVMe 1.3 support for Windows 11 and Windows Server 2022
  • WSFC support using NVMe adapters
  • Support for the latest guest operating systems (RHEL 10, Oracle Linux 10, Debian 13 & FreeBSD 15)

Streamlining Supervisor Cluster Deployments

You can reuse a Supervisor configuration by exporting and importing it, and you can now also clone a Supervisor configuration to a new cluster.

The screenshot above shows a sample JSON document.

Increased Flexibility of DevOps Deployments

More enhancements help customers deploy Windows-based VMs in TKG namespaces.

These are some of the major updates coming in vSphere 8 U2, which is expected to be released in Q3 2023.

Excited about some of these changes!

“Failed to get tasks data. Something went wrong.”: SDDC Manager 4.5 Password Management Shows an Error Message in Its UI

Recently, after upgrading our SDDC Manager from 3.11.1 to 4.5, I came across the following UI message on the SDDC Manager Password Management tab:

Issue Description: The error message shown in the screenshot above is “Failed to get tasks data. Something went wrong. Please retry or contact the service provider and provide the reference token.”

Root Cause: Digging deeper, I found that the password manager itself was working fine, but it could not retrieve the complete task list data and was therefore throwing this error.

After looking at the operationsmanager log at

/var/log/vmware/vcf/operationsmanager/operationsmanager.log

we found the following error:

2023-08-08T16:02:57.768+0000 ERROR [vcf_om,0000000000000000,0000] [o.a.c.c.C.[.[.[.[dispatcherServlet],http-nio-127.0.0.1-7300-exec-6] Servlet.service() for servlet [dispatcherServlet] in context with path [/operationsmanager] threw exception [Request processing failed; nested exception is java.lang.IllegalArgumentException: No enum constant com.vmware.vcf.passwordmanager.exception.PasswordManagerErrorCode.PASSWORD_MANAGER_VRA_ENDPOINT_FAILED] with root cause
java.lang.IllegalArgumentException: No enum constant com.vmware.vcf.passwordmanager.exception.PasswordManagerErrorCode.PASSWORD_MANAGER_VRA_ENDPOINT_FAILED

This means that after the SDDC Manager upgrade, the error code “PASSWORD_MANAGER_VRA_ENDPOINT_FAILED” stored in its internal database could no longer be resolved: the upgraded PasswordManagerErrorCode enum has no constant with that name.
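The “No enum constant” text is the message Java's `Enum.valueOf()` produces when a string does not match any constant of the enum. Assuming the operations manager converts the stored code with a lookup of that kind, the following Python sketch reproduces the failure: a code string persisted by the older release no longer maps to any member. The enum below is a hypothetical stand-in that contains only the replacement code used in the fix.

```python
from enum import Enum


class PasswordManagerErrorCode(Enum):
    # Hypothetical stand-in: only this member's name comes from the fix in this post;
    # any other members of the real enum are omitted.
    PASSWORD_UPDATE_VRA_ENDPOINT_FAILED = 1


def parse_error_code(stored: str) -> PasswordManagerErrorCode:
    """Look up a stored code by name, mirroring what Java's Enum.valueOf() does."""
    try:
        return PasswordManagerErrorCode[stored]
    except KeyError:
        # Java's Enum.valueOf() throws IllegalArgumentException with this message format.
        raise ValueError(
            "No enum constant com.vmware.vcf.passwordmanager.exception."
            f"PasswordManagerErrorCode.{stored}"
        ) from None


if __name__ == "__main__":
    print(parse_error_code("PASSWORD_UPDATE_VRA_ENDPOINT_FAILED"))
    try:
        parse_error_code("PASSWORD_MANAGER_VRA_ENDPOINT_FAILED")
    except ValueError as err:
        print(err)
```

This is why rewriting the stored string, rather than changing any code, is enough to make the lookup succeed again.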

Solution: After consulting with VMware Engineering, we resolved the issue by connecting to the operationsmanager database on the SDDC Manager and executing the following statement, which replaces the stored error code with one that SDDC Manager recognizes.

Disclaimer: Do this at your own risk. I highly recommend contacting VMware GSS if you have the same issue, to get an official resolution.

operationsmanager=# update passwordmanager.master_password_transaction set diagnostic_message = replace (diagnostic_message, 'PASSWORD_MANAGER_VRA_ENDPOINT_FAILED','PASSWORD_UPDATE_VRA_ENDPOINT_FAILED');

Once this is executed, you will get a message of the form UPDATE XX, where XX is the number of rows the statement updated.
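To see what the statement does before touching the real database, you can replay it against a throwaway copy. The Python sketch below uses the standard library's in-memory `sqlite3` as a stand-in for PostgreSQL (both provide `replace(string, from, to)`); the flat table name stands in for the `passwordmanager` schema, and the two sample rows are hypothetical.

```python
import sqlite3
from typing import List, Tuple

OLD_CODE = "PASSWORD_MANAGER_VRA_ENDPOINT_FAILED"
NEW_CODE = "PASSWORD_UPDATE_VRA_ENDPOINT_FAILED"


def replay_fix() -> Tuple[int, List[str]]:
    """Apply the same replace() UPDATE to sample rows; return (row count, resulting messages)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE master_password_transaction "
            "(id INTEGER PRIMARY KEY, diagnostic_message TEXT)"
        )
        # Hypothetical sample rows: one carries the stale code, one does not.
        conn.executemany(
            "INSERT INTO master_password_transaction (diagnostic_message) VALUES (?)",
            [(f"Password update failed: {OLD_CODE}",), ("Password update succeeded",)],
        )
        # Same shape as the PostgreSQL statement: no WHERE clause, so every row is rewritten.
        cur = conn.execute(
            "UPDATE master_password_transaction "
            "SET diagnostic_message = replace(diagnostic_message, ?, ?)",
            (OLD_CODE, NEW_CODE),
        )
        updated = cur.rowcount  # psql reports this number as "UPDATE <n>"
        rows = conn.execute(
            "SELECT diagnostic_message FROM master_password_transaction ORDER BY id"
        ).fetchall()
        return updated, [message for (message,) in rows]
    finally:
        conn.close()


if __name__ == "__main__":
    print(replay_fix())
```

Because the statement has no WHERE clause, the count (2 here, XX in psql) covers every row in the table, including rows whose text did not change, so a large XX does not by itself mean that many rows contained the old code.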

Conclusion: Now you can refresh the SDDC Manager UI, and the error message no longer appears in Password Management.

Hope this helps if you see a similar message after your VCF upgrade from 3.11.x to 4.5.