How to troubleshoot vSAN issues?

Great Comprehensive Post on How to Check for VSAN Issues in your Environment !!

VirtuallyVTrue

Hope you are doing all great, for today’s post I wanted to put together some of the commands/troubleshooting I’ve had used with VMware vSAN,

Identify a partitioned node from a VSAN Cluster(Hosts)

What a single partitioned node looks like:

~ # esxcli vsan cluster get
Cluster Information
 Enabled: true
 Current Local Time: 2020-10-25T10:35:19Z
 Local Node UUID: 507e7bd5-ad2f-6424-66cb-1cc1de253de4
 Local Node State: MASTER
 Local Node Health State: HEALTHY
 Sub-Cluster Master UUID: 507e7bd5-ad2f-6424-66cb-1cc1de253de4
 Sub-Cluster Backup UUID:
 Sub-Cluster UUID: 52e4fbe6-7fe4-9e44-f9eb-c2fc1da77631
 Sub-Cluster Membership Entry Revision: 7
 Sub-Cluster Member UUIDs: 507e7bd5-ad2f-6424-66cb-1cc1de253de4
 Sub-Cluster Membership UUID: ba45d050-2e84-c490-845f-1cc1de253de4
~ #

What a full 4-node cluster looks like (no partition)

~ # esxcli vsan cluster get Cluster Information Enabled: true Current Local Time: 2020-10-25T10:35:19Z Local Node UUID: 54188e3a-84fd-9a38-23ba-001b21168828 Local Node State: MASTER Local Node Health State: HEALTHY Sub-Cluster Master UUID: 54188e3a-84fd-9a38-23ba-001b21168828 Sub-Cluster Backup UUID: 545ca9af-ff4b-fc84-dcee-001f29595f9f Sub-Cluster UUID: 529ccbe4-81d2-89bc-7a70-a9c69bd23a19 Sub-Cluster Membership Entry Revision: 3 Sub-Cluster Member UUIDs: 54188e3a-84fd-9a38-23ba-001b21168828, 545ca9af-ff4bfc84-dcee-001f29595f9f…

View original post 1,361 more words

Obtain the placement of the physical disk by NAA id on the ESXi Hosts

This is a great Post on How to find the Physical Location of the disks on an esxi host.

VirtuallyVTrue

Here is a simple script to obtain the placement of the physical disk by naa on ESXi hosts

Copy below script and save it on the ESXi host

# Script to obtain the placement of the physical disk by naa on ESXi hosts
# Do not change anything below this line
# --------------------------------------

echo "=============Physical disks placement=============="
echo ""
	
esxcli storage core device list | grep "naa" | awk '{print $1}' | grep "naa" | while read in; do

echo "$in"
esxcli storage core device physical get -d "$in"
sleep 1

echo "===================================================="

done

Run the script:

[root@esxi1:~] sh disk.sh

You will get similar output as per your environment.
Output:

[root@esxi:~] sh disk.sh =============Physical disks placement============== naa.5002538a9823d020 Physical Location: enclosure 1, slot 6 ==================================================== naa.5002538a9823d1c0 Physical Location: enclosure 1, slot 3 ==================================================== naa.58ce38ee204ccd59 Physical Location: enclosure 1, slot 7 ==================================================== naa.5002538a9823d070 Physical Location: enclosure 1, slot 1 ==================================================== naa.5002538a9823d040 Physical…

View original post 33 more words

How to Find the NIC Driver Version on ESXI Host and get the Correct Driver from VMware

Recently, I had to Search for an QLogic 2x25GE QL41262HMCU CNA NIC driver to update it on multiple Dell R740XD hosts. It’s been a while since I used the Update Manager (vSphere 6.7 environment) and hence writing this post.

First thing is to SSH into an esxi host and then execute the following commands to check the firmware/driver version of the vmnic you want to update (In my case all my vmnics are Qlogic CNA NIC’s)

esxcli network nic get -n vmnic2

Output to the above esxcli command

Things to note is the Driver Name/Type, Firmware Version (First Part of it is sufficient), Version (This is the actual driver version on the esxi host).

In the Above screenshot the driver is ‘qedentv’, the firmware version is 8.53.3.0 and the version is 3.11.16.0

Now, we need to find the entries/numbers to search for the exact driver on the VMware compatibility website.

Execute the following command on the ESXI SSH session

vmkchdev -l | grep vmnic2

The highlighted portion is the one we require to search for the driver on VMware Compatibility website

Let us go to the VMware Compatibility website and IO section

We need to fill in the following values —

VID, DID, SVID and Max SSID to get the exact driver for your nic.

Let us fill in the values from our vmkchdev output

  • VID 1077
  • DID 8070
  • SVID 1077
  • Max SSID 000b
Input the values in VMware IO Compatibility list website
Qlogic Adapter and its versions by vSphere version

Select the vSphere version and click on the version to display the different driver versions we can download

I have selected vSphere version 6.7 U3 in this case and the screenshot is below

The esxi nic driver version and the physical adapter firmware version is different on my Dell server

As you can see, the esxi nic driver version and the physical nic adapter firmware versions are different on this Host. (Typically you should update the esxi nic driver once you upgrade the physical nic firmware as a best practice)

In this case my esxi nic driver version is 3.11.16.0 and the Qlogic NIC Physical firmware version is 8.53.x.x

To download the correct driver, you need to make sure that the esxi nic driver coincides with the Physical nic driver firmware for best compatibility. We will need to download the ‘qedentv’ driver.

We download the driver equal to the physical nic firmware version and the esxi nic driver name which is qedentv in this case

Download the driver.zip file using your my vmware credentials and you can use this zip file in the offline patches in Update Manager to create a baseline for your esxi hosts so this driver can be updated.

NOTE: Put the Host in Maintenance mode before you update the nic driver as this will reboot the esxi host.

VMKPING and its uses in ESXI

I have recently been working with esxi hosts and to decommission them and recommission them into new projects and had to use the command vmkping to test the MTU of certain types of vmkernel ports like VMOTION, VSAN, VTEPs etc.

Here is a refresher for the vmkping commands which are very useful for a day to day Virtual Administrator

Command to check the MTU of 9000 with a certain amount of packets and with a certain interval and using a certain vmkernel port

vmkping -I vmk3 -d -s 8972 -c 1000 -i 0.005

vmkping -d -s 1472 <IP_Address>

In one of the above command vmkernel port is vmk3, for MTU 9000, we will be using 8972 as the packet size , -c is the count of packets and -i is the interval for which the ping will work (In the above example it is 0.005 seconds)

The second command is to test the MTU 1500 and the IP to test. You can also add -I (Interface) and vmkernel port through which you want to ping the IP

Command to check the communication of an IP address through an vmkernel port

vmkping -I vmk# IP address of the host

Command to get all the network adapters and the type of tcp/ip stack assigned to the nics

esxcfg-vmknic -l

Using the above command you can check the netstack which will be used in the below command to ping a vmotion vmkernel port

vmkping -S vmotion -I vmk1 <IP_Address_to_ping>

The -S is for netstack name like vmotion and this is the only command to be used if we use a NetStack

List of arguments:

vmkping [args] [host/IP_Address]

args:

  -4                            use IPv4 (default)

  -6                            use IPv6

  -c <count>            set packet count

  -d                           set DF bit (IPv4) or disable fragmentation (IPv6)

  -D                           vmkernel TCP stack debug mode

  -i <interval>           set interval (secs)

  -I <interface>         outgoing interface – for IPv6 scope or IPv4 bypasses routing lookup

  -N <next_hop>       set IP*_NEXTHOP – bypasses routing lookup

                                  for IPv4, -I option is required

  -s <size>                 set the number of ICMP data bytes to be sent.

                                  The default is 56, which translates to a 64 byte

                                  ICMP frame when added to the 8 byte ICMP header.

                                 (Note: these sizes does not include the IP header).

  -t <ttl>                   set IPv4 Time To Live or IPv6 Hop Limit

  -v                            verbose

  -W <timeout>        set timeout to wait if no responses are received (secs)

  -X                            XML output format for esxcli framework.

  -S                           The network stack instance name. If unspecified the default netstack instance is used.

Disable vSphere Managed Object Browser (MOB)

To harden your ESXi 6.0 hosts, we disable the MOB service so that any attacker can’t get to the web browser and access the MOB of the ESXi host (ex: https://esxi01.lab.com/mob), this setting will disable one of the attack vectors of theESXi hosts in the environment.

to do this, you SSH into the ESXi host where you want to disable the mob service and perform the following commands

esxi01# vim-cmd proxysvc/remove_service "/mob" "httpsWithRedirect"

to verify if the mob service has been removed from the ESXi host, use the following command

esxi01# vim-cmd proxysvc/service_list

the above command will list all the services on the ESXi host, look for the service “/mob”, if you don’t see this service, then it has been removed. if it is still there, then you will have to perform the first command and reboot the ESXi host to disable the mob service from the host.

 

 

Can’t fork error on ESi 6 host on UCS Blade

Recently, I was working on an UCS blade firmware upgrade along with esxi upgrade from esxi 5.5 to 6.0 and came across this error where the esxi host became unresponsive with an error “can’t fork” on its DCUI.

here is a little background on this story, this particular blade was B240 blade which was being used as SAP HANA blade by the customer and the firmware upgrade and esxi upgrade went fine and two days later the host became unresponsive and we couldn’t connect to it using SSH, DCUI, etc, connecting to the kvm console revealed the below screen when we went to its Alt+F1 command interface

Esxi_Cant_fork_error

we had to bounce the box and we had to reduce the linux vm memory which was hosting SAP HANA on it to be 10% less than the memory of the esxi host.

Conclusion: The HANA VM (linux) on the esxi host should have 10% less memory than the overall memory of the esxi host to avoid this problem.

How to Install Cisco VEM vib on an ESXi 6 host

Recently, I had to install the Cisco vem module onto an esxi 6 host as it was not installed and i couldn’t join the esxi host to the cisco nexus 1000v distributed switch. here is the process on how to first check if the vem module is installed on the esxi host.

SSH into the esxi host and run the following commands to check if the VEM module is installed

host# esxcli software vib list | grep -i vem

the above command will display the cisco vem module installed on the esxi host, if nothing is displayed then you will have to install the vem module by downloading the vem vib from the nexus 1kv in the environment.

i did it by going to https://nexus1kv_hostname    in a web browser which will display you the vibs which you can download from nexus 1000v, download the vem vib associated with your environment and run the following command to install the vib onto the esxi host

upload the vem vib file onto a datastore on the host

SSH into the esxi host where you want to install the vem module

host# esxcli software vib install -v /vmfs/volumes/<directory_path>/cross_cisco-vem-version_x.x.x.x.x.vib

NOTE: directory_path in the above command is the place where the cisco vem vib is stored. (name of the datastore/volume)

Once the vib is installed you can check the status of the vem by using the command

host# vem status

The above command will display that the VEM agent (vemdpa) is running