How to troubleshoot vSAN issues?

A great, comprehensive post on how to check for vSAN issues in your environment.

VirtuallyVTrue

Hope you are all doing great. For today’s post I wanted to put together some of the commands and troubleshooting steps I’ve used with VMware vSAN.

Identify a partitioned node in a vSAN cluster (hosts)

What a single partitioned node looks like:

~ # esxcli vsan cluster get
Cluster Information
 Enabled: true
 Current Local Time: 2020-10-25T10:35:19Z
 Local Node UUID: 507e7bd5-ad2f-6424-66cb-1cc1de253de4
 Local Node State: MASTER
 Local Node Health State: HEALTHY
 Sub-Cluster Master UUID: 507e7bd5-ad2f-6424-66cb-1cc1de253de4
 Sub-Cluster Backup UUID:
 Sub-Cluster UUID: 52e4fbe6-7fe4-9e44-f9eb-c2fc1da77631
 Sub-Cluster Membership Entry Revision: 7
 Sub-Cluster Member UUIDs: 507e7bd5-ad2f-6424-66cb-1cc1de253de4
 Sub-Cluster Membership UUID: ba45d050-2e84-c490-845f-1cc1de253de4
~ #

What a full 4-node cluster looks like (no partition)

~ # esxcli vsan cluster get
Cluster Information
 Enabled: true
 Current Local Time: 2020-10-25T10:35:19Z
 Local Node UUID: 54188e3a-84fd-9a38-23ba-001b21168828
 Local Node State: MASTER
 Local Node Health State: HEALTHY
 Sub-Cluster Master UUID: 54188e3a-84fd-9a38-23ba-001b21168828
 Sub-Cluster Backup UUID: 545ca9af-ff4b-fc84-dcee-001f29595f9f
 Sub-Cluster UUID: 529ccbe4-81d2-89bc-7a70-a9c69bd23a19
 Sub-Cluster Membership Entry Revision: 3
 Sub-Cluster Member UUIDs: 54188e3a-84fd-9a38-23ba-001b21168828, 545ca9af-ff4b-fc84-dcee-001f29595f9f…
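A quick way to spot a partition is to count the entries on the "Sub-Cluster Member UUIDs" line and compare that with the number of hosts you expect. The following is only a minimal sketch: the function name count_vsan_members is made up for illustration, and it assumes the output layout shown in the two examples above.

```shell
# Count the UUIDs listed on the "Sub-Cluster Member UUIDs" line.
# Reads `esxcli vsan cluster get` output from stdin.
# count_vsan_members is a hypothetical helper name, not a VMware tool.
count_vsan_members() {
  grep 'Sub-Cluster Member UUIDs:' \
    | sed 's/.*Sub-Cluster Member UUIDs:[[:space:]]*//' \
    | tr ',' '\n' \
    | grep -c '[0-9a-f]'
}

# Usage on a host (a healthy 4-node cluster should print 4):
#   esxcli vsan cluster get | count_vsan_members
```

A result of 1 on a host that should belong to a larger cluster matches the single partitioned node example above.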

Obtain the placement of the physical disk by NAA id on the ESXi Hosts

This is a great post on how to find the physical location of the disks on an ESXi host.

VirtuallyVTrue

Here is a simple script to obtain the placement of the physical disk by naa on ESXi hosts

Copy the script below and save it on the ESXi host as disk.sh

# Script to obtain the placement of the physical disk by naa on ESXi hosts
# Do not change anything below this line
# --------------------------------------

echo "=============Physical disks placement=============="
echo ""
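
# List every device, keep only the lines whose first field is a naa identifier,
# and query the physical location of each one
esxcli storage core device list | grep "naa" | awk '{print $1}' | grep "naa" | while read -r naa; do
    echo "$naa"
    # Print the enclosure and slot reported for this device
    esxcli storage core device physical get -d "$naa"
    sleep 1
    echo "===================================================="
done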

Run the script:

[root@esxi1:~] sh disk.sh

The output will look similar to this, depending on your environment:

[root@esxi:~] sh disk.sh
=============Physical disks placement==============

naa.5002538a9823d020
   Physical Location: enclosure 1, slot 6
====================================================
naa.5002538a9823d1c0
   Physical Location: enclosure 1, slot 3
====================================================
naa.58ce38ee204ccd59
   Physical Location: enclosure 1, slot 7
====================================================
naa.5002538a9823d070
   Physical Location: enclosure 1, slot 1
====================================================
naa.5002538a9823d040
   Physical…
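If you prefer one line per disk, the script output can be piped through a small filter. This is only a sketch: summarize_disks is a hypothetical name, and it assumes the output layout shown above (a naa line followed by a "Physical Location:" line).

```shell
# Turn the disk.sh output into "<naa id><TAB><location>" lines.
# summarize_disks is a hypothetical helper; it reads the script output on stdin.
summarize_disks() {
  awk '
    /^naa\./ { dev = $1 }                       # remember the current device id
    /Physical Location:/ {
      sub(/^.*Physical Location:[[:space:]]*/, "")  # keep only the location text
      print dev "\t" $0
    }
  '
}

# Usage on a host:
#   sh disk.sh | summarize_disks
```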


Workaround instructions to address CVE-2021-44228 in vCenter Server 6.7.x – For VCF 3.10.x

UPDATE: VMware has updated KB 87081 to include the remove_log4j_class.py script.

I have taken these Workaround Instructions from the KB article 87081 and KB article 87095

For a vCenter 6.7.x appliance in a VCF 3.10.x setup, some of the instructions in article 87081 don’t work. Also, since VCF 3.10.x uses external PSCs, the order in which to execute the instructions is as follows.

I am calling on the VMware team to amend the steps in article 87081 for the vCenter 6.7.x appliance in a non-HA configuration, especially for VCF 3.10.x installations.

For vCenter 6.7.x: steps to execute

vMON Service

  1. Backup the existing java-wrapper-vmon file

cp -rfp /usr/lib/vmware-vmon/java-wrapper-vmon /usr/lib/vmware-vmon/java-wrapper-vmon.bak

  1. Update the java-wrapper-vmon file with a text editor such as vi

vi /usr/lib/vmware-vmon/java-wrapper-vmon

  1. At the very bottom of the file, replace the very last line with 2 new lines
    • Original
      exec $java_start_bin $jvm_dynargs "$@"
    • Updated
      log4j_arg="-Dlog4j2.formatMsgNoLookups=true"
      exec $java_start_bin $jvm_dynargs $log4j_arg "$@"
  2. Restart vCenter Services

service-control --stop --all
service-control --start --all

Note: If the services do not start, ensure the file permissions are set correctly with these commands:

  • chown root:cis /usr/lib/vmware-vmon/java-wrapper-vmon
  • chmod 754 /usr/lib/vmware-vmon/java-wrapper-vmon
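Before editing the file by hand, it can help to preview what the patched wrapper would look like. The sketch below is not part of the KB: patched_wrapper is a hypothetical helper that only prints to stdout and never modifies the file, and it assumes the last line of the file is exactly the original exec line quoted above.

```shell
# Print FILE with its final exec line replaced by the two workaround lines.
# Hypothetical preview helper: writes to stdout only, the file is not touched.
patched_wrapper() {
  file=$1
  last=$(tail -n 1 "$file")
  # Refuse to continue if the last line is not the one the steps describe
  if [ "$last" != 'exec $java_start_bin $jvm_dynargs "$@"' ]; then
    echo "unexpected last line, not patching: $last" >&2
    return 1
  fi
  sed '$d' "$file"   # everything except the last line
  echo 'log4j_arg="-Dlog4j2.formatMsgNoLookups=true"'
  echo 'exec $java_start_bin $jvm_dynargs $log4j_arg "$@"'
}

# Usage: review the difference before making the change in vi
#   patched_wrapper /usr/lib/vmware-vmon/java-wrapper-vmon | diff /usr/lib/vmware-vmon/java-wrapper-vmon -
```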

Analytics Service

NOTE: The workaround below (Analytics service) applies to vCenter Server Appliance 6.7 Update 3o and older versions only. vCenter Server Appliance 6.7 Update 3p is covered by the vMON Service workaround by default.

  1. Back up the log4j-core-2.8.2.jar file

cp -rfp /usr/lib/vmware/common-jars/log4j-core-2.8.2.jar /usr/lib/vmware/common-jars/log4j-core-2.8.2.jar.bak

  1. Run the zip command to disable the class

zip -q -d /usr/lib/vmware/common-jars/log4j-core-2.8.2.jar org/apache/logging/log4j/core/lookup/JndiLookup.class

  1. Restart the Analytics service

service-control --restart vmware-analytics

CM Service

  1. Back up the log4j-core.jar file

cp -rfp /usr/lib/vmware-cm/lib/log4j-core.jar /usr/lib/vmware-cm/lib/log4j-core.jar.bak

  1. Run the zip command to disable the class

zip -q -d /usr/lib/vmware-cm/lib/log4j-core.jar org/apache/logging/log4j/core/lookup/JndiLookup.class

  1. Restart the CM service

service-control --restart vmware-cm

Run the remove_log4j_class.py script

1. Download the script attached to this KB (remove_log4j_class.py)

2. Login to the vCSA using an SSH Client (using Putty.exe or any similar SSH Client)

3. Transfer the file to /tmp folder on vCenter Server Appliance using WinSCP
Note: It’s necessary to enable the bash shell before WinSCP will work

4. From the /tmp folder, execute the script transferred in step 3:

python remove_log4j_class.py

The script will stop all vCenter services, proceed with removing the JndiLookup.class from all jar files on the appliance and finally start all vCenter services. The files that the script modifies will be reported as “VULNERABLE FILE” as the script runs.

Verify the changes

Once all sections are complete, use the following steps to confirm if they were implemented successfully.

  1. Verify if the stsd, idmd, and vMon controlled services were started with the new -Dlog4j2.formatMsgNoLookups=true parameter:

ps auxww | grep formatMsgNoLookups

Check if the processes include -Dlog4j2.formatMsgNoLookups=true

  1. Verify the Analytics Service changes:

grep -i jndilookup /usr/lib/vmware/common-jars/log4j-core-2.8.2.jar | wc -l
 This should return 0 lines

  1. Verify the CM Service changes:

grep -i jndilookup /usr/lib/vmware-cm/lib/log4j-core.jar | wc -l

This should return 0 lines
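The two jar checks can also be wrapped in one small function that reports each file and returns a non-zero status if the class name is still found. This is only a sketch built on the same grep as above; check_jars_clean is a hypothetical name.

```shell
# Report whether each given jar still contains a JndiLookup entry.
# Hypothetical helper: returns 0 only if no jar matches.
check_jars_clean() {
  rc=0
  for jar in "$@"; do
    if grep -qi 'jndilookup' "$jar"; then
      echo "STILL PRESENT: $jar"
      rc=1
    else
      echo "clean: $jar"
    fi
  done
  return $rc
}

# Usage on the vCenter appliance:
#   check_jars_clean /usr/lib/vmware/common-jars/log4j-core-2.8.2.jar \
#                    /usr/lib/vmware-cm/lib/log4j-core.jar
```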

The remaining steps, for the Secure Token Service and the Identity Management Service, don’t work on vCenter 6.7.x in a VCF 3.10.x (3.10.2.1) environment.

So, after this step, we have to SSH into the external PSC and follow the steps below.

CM Service

  1. Back up the log4j-core.jar file

cp -rfp /usr/lib/vmware-cm/lib/log4j-core.jar /usr/lib/vmware-cm/lib/log4j-core.jar.bak

  1. Run the zip command to disable the class

zip -q -d /usr/lib/vmware-cm/lib/log4j-core.jar org/apache/logging/log4j/core/lookup/JndiLookup.class

  1. Restart the CM service

service-control --restart vmware-cm


Secure Token Service

  1. Back up and edit the vmware-stsd file

cp /etc/rc.d/init.d/vmware-stsd /root/vmware-stsd.bak
vi /etc/rc.d/init.d/vmware-stsd

  1. Find the section labeled start_service(). Insert a new line near line 266, just before “$DAEMON_CLASS start” with “-Dlog4j2.formatMsgNoLookups=true \” as seen in the example:

start_service()
{
  perform_pre_startup_actions

  local retval
  JAVA_MEM_ARGS=`/usr/sbin/cloudvm-ram-size -J vmware-stsd`
  $JSVC_BIN -procname $SERVICE_NAME \
            -home $JAVA_HOME \
            -server \
            <snip>
            -Dauditlog.dir=/var/log/audit/sso-events  \
            -Dlog4j2.formatMsgNoLookups=true \
            $DAEMON_CLASS start

  1. Restart the vmware-stsd service

service-control --stop vmware-stsd
service-control --start vmware-stsd

Identity Management Service

  1. Back up and edit the vmware-sts-idmd file

cp /etc/rc.d/init.d/vmware-sts-idmd /root/vmware-sts-idmd.bak
vi /etc/rc.d/init.d/vmware-sts-idmd

  1. Insert a new line near line 177 before “$DEBUG_OPTS \” with “-Dlog4j2.formatMsgNoLookups=true \” as seen in the example:

$JSVC_BIN -procname $SERVICE_NAME \
          -wait 120 \
          -server \
          <snip>
          -Dlog4j.configurationFile=file://$PREFIX/share/config/log4j2.xml \
          -Dlog4j2.formatMsgNoLookups=true \
          $DEBUG_OPTS \
          $DAEMON_CLASS

  1. Restart the vmware-sts-idmd service

service-control --stop vmware-sts-idmd
service-control --start vmware-sts-idmd

Verify the changes

Once all sections are complete, use the following steps to confirm if they were implemented successfully.

  1. Verify if the stsd, idmd, psc-client, and vMon controlled services were started with the new -Dlog4j2.formatMsgNoLookups=true parameter:

ps auxww | grep formatMsgNoLookups

Check if the processes include -Dlog4j2.formatMsgNoLookups=true

  1. Verify the CM Service changes:

grep -i jndilookup /usr/lib/vmware-cm/lib/log4j-core.jar | wc -l

This should return 0 lines

The steps in VMware KB article 87081 are for vCenter with an embedded PSC, and the steps above are for vCenter Server 6.7 with an external PSC.

I hope this article helps engineers who are working on this log4j vulnerability. If you have VCF 3.10.x with an external PSC configuration, you can follow the steps above.

NSX Plugin 1.2 in VRA 7.6 Not Generating NSX Security Groups in a Page (NOT SOLVED YET)

We have an ongoing issue where the NSX Plugin in VRO does not populate one page out of four, and this breaks our VRO code that creates and applies Security Tags (NSX-V) on our VMs.

Below is a screenshot of the issue

This shows that the other pages have security groups in them, but page-1 under one of the NSX Managers (NSX-V version 6.4.x) is not populated.

I have already deleted and re-installed the NSX-V Plugin using the VRO Control Center, but that did not resolve it.

The issue is not resolved yet, and I will update this post once there is a resolution.

How to Find the NIC Driver Version on ESXI Host and get the Correct Driver from VMware

Recently, I had to search for a QLogic 2x25GE QL41262HMCU CNA NIC driver to update it on multiple Dell R740XD hosts. It’s been a while since I used Update Manager (vSphere 6.7 environment), hence this post.

The first step is to SSH into an ESXi host and execute the following command to check the firmware and driver version of the vmnic you want to update (in my case all my vmnics are QLogic CNA NICs):

esxcli network nic get -n vmnic2

Output of the above esxcli command

The things to note are the driver name/type, the firmware version (the first part of it is sufficient) and the version (this is the actual driver version on the ESXi host).

In the Above screenshot the driver is ‘qedentv’, the firmware version is 8.53.3.0 and the version is 3.11.16.0
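To pull just those three values out of the command output, a small filter like the one below can be used. It is only a sketch: nic_driver_summary is a hypothetical name, and the field labels "Driver:", "Firmware Version:" and "Version:" are assumed from typical esxcli network nic get output, so check them against what your host prints.

```shell
# Print the driver name, firmware version and driver version.
# Hypothetical helper; reads `esxcli network nic get -n <vmnic>` output on stdin.
# Field labels are an assumption about the output layout.
nic_driver_summary() {
  awk -F': *' '
    /^[[:space:]]*Driver:/           { print "driver=" $2 }
    /^[[:space:]]*Firmware Version:/ { print "firmware=" $2 }
    /^[[:space:]]*Version:/          { print "version=" $2 }
  '
}

# Usage on a host:
#   esxcli network nic get -n vmnic2 | nic_driver_summary
```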

Now we need to find the IDs used to search for the exact driver on the VMware Compatibility website.

Execute the following command in the ESXi SSH session:

vmkchdev -l | grep vmnic2

The highlighted portion is what we need to search for the driver on the VMware Compatibility website.

Let us go to the IO section of the VMware Compatibility website.

We need to fill in the following values to get the exact driver for the NIC: VID, DID, SVID and Max SSID.

Let us fill in the values from our vmkchdev output:

  • VID 1077
  • DID 8070
  • SVID 1077
  • Max SSID 000b
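The four values can be split out of the vmkchdev line with a short filter. This is a sketch only: pci_ids_for_nic is a hypothetical name, and it assumes each vmkchdev -l line has the form "PCI address, VID:DID, SVID:SSID, owner, vmnic name", which is how the values above were read.

```shell
# Print VID, DID, SVID and SSID for one vmnic.
# Hypothetical helper; reads `vmkchdev -l` output on stdin.
# Assumes lines like: 0000:3b:00.0 1077:8070 1077:000b vmkernel vmnic2
pci_ids_for_nic() {
  nic=$1
  awk -v nic="$nic" '$NF == nic {
    split($2, a, ":")   # VID:DID
    split($3, b, ":")   # SVID:SSID
    printf "VID %s\nDID %s\nSVID %s\nSSID %s\n", a[1], a[2], b[1], b[2]
  }'
}

# Usage on a host:
#   vmkchdev -l | pci_ids_for_nic vmnic2
```

Matching on the last field exactly avoids picking up vmnic20 when you ask for vmnic2, which a plain grep would do.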
Input the values in VMware IO Compatibility list website
Qlogic Adapter and its versions by vSphere version

Select the vSphere version and click on the version to display the different driver versions we can download

I have selected vSphere version 6.7 U3 in this case and the screenshot is below

The ESXi NIC driver version and the physical adapter firmware version are different on my Dell server

As you can see, the ESXi NIC driver version and the physical NIC adapter firmware version are different on this host. (As a best practice, you should typically update the ESXi NIC driver once you upgrade the physical NIC firmware.)

In this case my ESXi NIC driver version is 3.11.16.0 and the QLogic NIC physical firmware version is 8.53.x.x

To download the correct driver, make sure that the ESXi NIC driver matches the physical NIC firmware for best compatibility. We will need to download the ‘qedentv’ driver.

We download the driver that matches the physical NIC firmware version and the ESXi NIC driver name, which is qedentv in this case

Download the driver.zip file using your My VMware credentials. You can import this zip file as an offline patch in Update Manager and create a baseline for your ESXi hosts so this driver can be updated.

NOTE: Put the host in maintenance mode before you update the NIC driver, as this will reboot the ESXi host.

How to Unregister a VM which is missing in VRA 7.6

Recently I had to get rid of multiple VMs through VRA. However, I found that the status of some of the VMs was showing as missing. This happens if the VM has already been deleted through vCenter and VRA can’t find that VM in vCenter.

To see the missing status, go to the Deployments tab and check whether the Status is ON, OFF or Missing (?), as the screenshot below shows:

The missing status is displayed next to the VM Name

Some of the info in the screenshot has been removed to protect my organization’s data, and the VM names have been changed for the same reason.

In VRA 7.6, you can unregister it easily using the GUI. Click on the Deployment Name.

Then click on the VM Name itself (in this case it’s DC1Test001), click on the small gear icon, and then click on the option “Unregister” in the drop-down menu, as in the screenshot below:
The unregister option will remove this VM from the VRA internal DB so that it doesn’t show up in VRA.

Hope this post helps, as I was not able to see any blog posts regarding this simple unregister procedure in VRA 7.6

VMKPING and its uses in ESXI

I have recently been decommissioning ESXi hosts and recommissioning them into new projects, and had to use the vmkping command to test the MTU of certain types of vmkernel ports such as VMOTION, VSAN, VTEPs etc.

Here is a refresher on the vmkping commands, which are very useful in the day-to-day work of a Virtual Administrator.

Command to check an MTU of 9000 with a given number of packets, at a given interval, through a given vmkernel port:

vmkping -I vmk3 -d -s 8972 -c 1000 -i 0.005 <IP_Address>

vmkping -d -s 1472 <IP_Address>

In the first command above, the vmkernel port is vmk3 and, for MTU 9000, we use 8972 as the packet size; -c is the packet count and -i is the interval between packets (0.005 seconds in the example above).

The second command tests MTU 1500 against the given IP. You can also add -I (interface) with the vmkernel port through which you want to ping the IP.
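The sizes 8972 and 1472 come from subtracting the headers from the MTU, because -s sets only the ICMP data bytes: 20 bytes for the IPv4 header (assuming no IP options) and 8 bytes for the ICMP header. A small sketch of that arithmetic, with icmp_payload_size as a hypothetical helper name:

```shell
# Compute the -s value for a given MTU over IPv4.
# Assumes a 20-byte IPv4 header (no options) and the 8-byte ICMP header.
# icmp_payload_size is a hypothetical helper name.
icmp_payload_size() {
  echo $(( $1 - 20 - 8 ))
}

# Usage:
#   vmkping -I vmk3 -d -s "$(icmp_payload_size 9000)" <IP_Address>
```

With -d set, a ping of that size only succeeds if every hop on the path accepts the full MTU without fragmenting.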

Command to check communication with an IP address through a vmkernel port

vmkping -I vmk# <IP_Address_of_the_host>

Command to list all the vmkernel network adapters and the TCP/IP stack (netstack) assigned to each of them

esxcfg-vmknic -l

Using the above command you can check the netstack name, which is needed in the command below to ping through a vmotion vmkernel port

vmkping -S vmotion -I vmk1 <IP_Address_to_ping>

The -S option takes the netstack name, such as vmotion. This is the form of the command to use when the vmkernel port is on its own netstack.

List of arguments:

vmkping [args] [host/IP_Address]

args:

  -4                use IPv4 (default)
  -6                use IPv6
  -c <count>        set packet count
  -d                set DF bit (IPv4) or disable fragmentation (IPv6)
  -D                vmkernel TCP stack debug mode
  -i <interval>     set interval (secs)
  -I <interface>    outgoing interface - for IPv6 scope or IPv4 bypasses routing lookup
  -N <next_hop>     set IP*_NEXTHOP - bypasses routing lookup
                    for IPv4, -I option is required
  -s <size>         set the number of ICMP data bytes to be sent.
                    The default is 56, which translates to a 64 byte
                    ICMP frame when added to the 8 byte ICMP header.
                    (Note: these sizes does not include the IP header).
  -t <ttl>          set IPv4 Time To Live or IPv6 Hop Limit
  -v                verbose
  -W <timeout>      set timeout to wait if no responses are received (secs)
  -X                XML output format for esxcli framework.
  -S                The network stack instance name. If unspecified the default netstack instance is used.

New Product Lifecycle Matrix from VMware

VMware has released a new Product Lifecycle Matrix website where we can check lifecycle dates for all VMware software, such as General Availability, End of General Support and End of Availability, on one page.

Previously, checking the end of support cycle for some of the VMware products was a tedious process, but now it’s all in one place!

The link is:

https://lifecycle.vmware.com/#/

You can even filter by Product or Filter on any of the columns on the site.

This website can definitely help the Virtual Administrator to check for End of Life/End of mainstream support in their environments.

Visio Diagrams for VMware Validated Design for SDDC 6.0

It looks like the new Visio diagrams and stencils for VMware SDDC 6.0 are out at communities.vmware.com, created by the author Gary JBlake.

The link to download the diagrams is below:

https://communities.vmware.com/t5/Documents/Visio-Diagrams-for-VMware-Validated-Design-for-SDDC-6-0/ta-p/2782683

This has been referenced from communities.vmware.com forum under VMTN > SDDC > VMware Validated Designs > Validated Designs for SDDC 6.x > Documents


VMware Cloud Foundation (VCF) API Reference Guides

Here is the direct link to the API Reference Guide for VMware Cloud Foundation (VCF)

https://vdc-download.vmware.com/vmwb-repository/dcr-public/2d4955d7-fb6f-4a61-be78-64d95b951ccd/c6e26ae1-9438-4da0-bfc7-2e21d9046820/index.html#_overview

This is the generic API Reference Guide for VCF rather than a version-specific one.

Updated – 2/16/2022

Starting with VCF 4.3.1 (non-VxRail), VMware has moved its API guides to a new location, with a better and more user-friendly format than before.

VCF 4.3.1 New API Reference Guide

VCF 4.4.0 New API Reference Guide

Updated – 1/25/2022

For Version Centric API Guides

VCF 3.10 API Reference Guide

VCF 4.0 API Reference Guide

VCF 4.1 API Reference Guide

VCF 4.2 API Reference Guide

VCF 4.3 API Reference Guide

VCF 4.3.1 API Reference Guide

NOTE: These reference guides and their versions are for non-VxRail implementations. They are valid for regular VCF implementations with vSAN Ready Nodes.