Fixing SDDC Manager Inventory Sync Issues for ESXi Hosts

I recently ran into an issue in my lab while patching my ESXi hosts from 8.0U3b/3d to 8.0U3e/f using SDDC Manager and an imported LCM image (a Dell custom ESXi image). The task failed at the post-check stage, as shown in the screenshot below.

Digging a little deeper, I found that the SDDC Manager inventory sync was the problem. The ESXi host upgrade had completed, but SDDC Manager did not register that all the ESXi hosts in the cluster had finished upgrading, so the post-check failed.

As you can see in the image above, SDDC Manager does not see the correct host version. This issue affects all the ESXi hosts in the same cluster.

I verified that all 4 hosts are on the same version (in this instance, 8.0.3-24784735).
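One way to double-check this from a shell is to collect the output of `vmware -v` from each host and count the distinct build strings. This is only a minimal sketch: it assumes SSH is enabled on the hosts, and the helper name `distinct_builds` and the host names in the usage comment are hypothetical.

```shell
# Hypothetical helper: reads `vmware -v` output lines (one per host) on stdin
# and prints each distinct version/build string with the number of hosts on it.
distinct_builds() {
    sort | uniq -c | sed 's/^ *//'
}

# Usage sketch (host names are placeholders; SSH must be enabled on each host):
#   for h in esxi-1 esxi-2 esxi-3 esxi-4; do ssh "root@$h" vmware -v; done | distinct_builds
# A single output line means every host reports the same build.
```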

This issue can be resolved by performing an inventory sync from within SDDC Manager using the asyncPatchTool, which you can download from the Broadcom website. Here are the instructions on how to download the async patch tool from the Broadcom website.

** You need to have an active entitlement to get this tool. **

Once you have downloaded the asyncPatchTool, transfer the archive (vcf-async-patch-tool-1.2.0.0.tar.gz) to the /home/vcf directory on the SDDC Manager using a tool such as WinSCP.
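Once the archive is on the appliance, it has to be unpacked into its own folder. The sketch below is an assumption about what that preparation looks like, not a copy of the official instructions: the helper name `prepare_async_patch_tool` is hypothetical, and the folder name /home/vcf/asyncPatchTool in the usage comment is inferred from the folder mentioned in this article.

```shell
# Hypothetical helper: unpack the tool archive into its own folder.
# $1 = path to the .tar.gz, $2 = destination folder (created if missing).
prepare_async_patch_tool() {
    tarball=$1
    dest=$2
    [ -f "$tarball" ] || { echo "archive not found: $tarball" >&2; return 1; }
    mkdir -p "$dest" || return 1
    tar -xzf "$tarball" -C "$dest" || return 1
    # Make the extracted files readable and executable.
    chmod -R 755 "$dest"
}

# Usage sketch on the SDDC Manager, as the vcf user (paths are assumptions):
#   prepare_async_patch_tool /home/vcf/vcf-async-patch-tool-1.2.0.0.tar.gz /home/vcf/asyncPatchTool
```

If the official instructions differ on folder name, ownership, or permissions, follow them rather than this sketch.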

Make sure you follow the instructions in this document regarding the asyncPatchTool folder. Then open an SSH session to the SDDC Manager and use the following command to perform an inventory sync using the asyncPatchTool:

./vcf-async-patch-tool --sync --sddcSSOUser administrator@vsphere.local --sddcSSHUser vcf

(This assumes your SDDC Manager SSO account is administrator@vsphere.local.)

As you can see from the screenshots above, after an inventory sync with the asyncPatchTool, the correct versions of the ESXi hosts and other products appear in the output.

In the screenshot below, you can see that after I ran the asyncPatchTool inventory sync and checked SDDC Manager, my ESXi hosts all show the correct version.

This concludes this article.

Deploying Workload Domain in Holodeck Toolkit 5.2x

In this post, I will go over how to deploy a workload domain in the Holodeck lab when you have only deployed the management domain, with an NSX Edge Cluster configured in it, using the VLC GUI.

In my lab, my first attempt at getting the VLC GUI to deploy the workload domain with an NSX Edge Cluster in it was unsuccessful, so I deployed only the management domain and then configured the workload domain using SDDC Manager.

First, use “add_4_big_hosts_ESXi5-8.json” or “add_4_hosts_ESXi5-8.json” in the VLC GUI to provision 4 nested ESXi hosts (esxi5 to esxi8) in the lab environment.

Once the hosts are created, use the Commission Hosts option under Hosts in SDDC Manager to bring the 4 ESXi hosts into SDDC Manager. Once the 4 hosts show as unassigned in SDDC Manager, you can start creating the workload domain from SDDC Manager.

NOTE: SDDC Manager will deploy only 1 NSX Manager appliance (nsx1-wld), even though you provide the network details for all 3 managers.

The next post will cover how to add an NSX Edge Cluster to the workload domain.

502 Bad gateway when trying to access SDDC Manager Web UI

Introduction:

Recently, I received a 502 Bad Gateway error when trying to access the SDDC Manager UI. This happened after upgrading our VCF environment from 4.5.0 to 5.1.1.

After going through multiple troubleshooting steps, here is what I did to resolve the issue.

Troubleshooting:

First, check the volumes on your SDDC Manager appliance.

SSH into the appliance as the vcf user, switch to root, and run the following command:

df -h

The screenshot above shows the df -h output for all the volumes on the SDDC Manager appliance.

In this case, the /data volume was at 100% capacity, which was preventing the UI from loading. I dug a little deeper into this volume to see what was taking up all the space and found the file that was filling it.
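The same check can be scripted. The following is a minimal sketch, not part of the KB procedure: the helper name `full_volumes` is hypothetical, and it relies on the POSIX `df -P` column layout (capacity in column 5, mount point in column 6).

```shell
# Hypothetical helper: reads `df -P` output on stdin and prints the mount
# points whose usage is at or above the given percentage (default 90).
full_volumes() {
    awk -v limit="${1:-90}" 'NR > 1 { use = $5; sub(/%/, "", use); if (use + 0 >= limit) print $6, $5 }'
}

# Usage sketch on the appliance:
#   df -P | full_volumes 90
# To see what is filling a volume such as /data (largest entries last):
#   du -xk /data 2>/dev/null | sort -n | tail -n 20
```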

Resolution:

After investigating the /data volume, I did some research on the volume being full and found Broadcom KB article KB311989, which shows how to increase the capacity of the /data partition and then reboot the appliance to get it working.

I followed the steps in the KB article. Once I had increased the /data volume and rebooted the SDDC Manager, all its services came back up and the UI was available again.

Conclusion:

The issue was caused by the /data volume being full. Once we increased the capacity of this volume by following the KB above, the UI issue was resolved.

Hope this article helps.

An Unexpected Error Occurred When trying to access the settings in Aria Suite Lifecycle Manager (8.12.x)

Recently, I patched our Aria Suite Lifecycle Manager from 8.12.0 to 8.12.x Patch 2 from its web UI and ran into a strange issue: we got the following error when trying to open any of the settings pages (System Patches, System Upgrade, System Settings, DNS, NTP Servers, Binary Mapping, etc.) in the web UI.

Upon closer inspection, I couldn’t find anything wrong with the appliance itself or its services.

I even restarted the appliance, with no success.

I checked the vpostgres and vrlcm-server services, and both were active.

Resolution:

I stopped and started the vrlcm-server service, and this resolved the issue. I am now able to get into all the settings in the VRSLCM web UI.

The screenshot above shows the commands to stop and start the service in the VRSLCM SSH session.
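Since the screenshot is not reproduced here, the following is a sketch of what a stop-and-start looks like, assuming the service is managed by systemd under the name vrlcm-server (the name used earlier in this article). The helper name `restart_service` and the `SYSTEMCTL` override are hypothetical conveniences for a dry run.

```shell
# Sketch: stop and then start a systemd service.
# SYSTEMCTL can be overridden (e.g. SYSTEMCTL=echo) to print the commands instead.
restart_service() {
    svc=$1
    "${SYSTEMCTL:-systemctl}" stop "$svc" && "${SYSTEMCTL:-systemctl}" start "$svc"
}

# Usage sketch, as root on the appliance:
#   restart_service vrlcm-server
#   systemctl status vrlcm-server
```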

Hope this helps if you come across this issue.

Enable Certificate Validation in SDDC Manager (VCF 4.5.x)

Recently, I had to use the asyncPatchTool in SDDC Manager to patch our vCenter to 7.0U3o because of the critical security advisory VMSA-2023-0023, and I came across this issue when running the precheck for the management domain in SDDC Manager.

If you expand “Sddc Security Configuration”, the error is on the check “VMware Cloud Foundation certificate validation check”.

If you come across this issue, run the following commands to enable the certificate validation check in SDDC Manager.

Review the Certificate Validation Setting

Command --
root@sddcmgr1# curl localhost/appliancemanager/securitySettings

Output --
{"fipsMode":false,"certificateValidationEnabled":false}

Enable Certificate Validation

Command --
root@sddcmgr1# curl 'http://localhost/appliancemanager/securitySettings' -X POST -H 'Content-Type: application/json' -H 'Accept: application/json' -d '{"fipsMode":false,"certificateValidationEnabled":true}'

Check the Certificate Validation Setting after Enabling the Certificate Validation

Command --
root@sddcmgr1# curl localhost/appliancemanager/securitySettings

Output --
{"fipsMode":false,"certificateValidationEnabled":true}

You can see from the output above that certificateValidationEnabled is now true.

Now you can retry the precheck, and it will go through.

The final precheck, which is green, is shown in the screenshot below.
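If you want to script this check rather than read the JSON by eye, a minimal sketch could look like the following. The helper name `cert_validation_enabled` is hypothetical, and the pattern match is a simple text check on the JSON shown above, not a full JSON parser.

```shell
# Hypothetical helper: reads the securitySettings JSON on stdin and succeeds
# only when certificateValidationEnabled is true.
cert_validation_enabled() {
    grep -Eq '"certificateValidationEnabled"[[:space:]]*:[[:space:]]*true'
}

# Usage sketch on the SDDC Manager (same endpoint as the commands above):
#   if curl -s localhost/appliancemanager/securitySettings | cert_validation_enabled; then
#       echo "certificate validation is enabled"
#   else
#       echo "certificate validation is NOT enabled"
#   fi
```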

How to Change the admin@local account password in VCF 4.5.x (UNOFFICIAL)

You might run into a situation where the password of the VCF REST API user account admin@local is lost or needs to be reset.

NOTE: This is NOT an official way to do it according to VMware and is meant to be done under the supervision of VMware Support.

Log in to the SDDC Manager as vcf and switch to the root user prompt

sddcmanager#

Type the following commands to reset the password of the admin@local account on the SDDC Manager:

mkdir -p /etc/security/local
chown root:vcf_services /etc/security/local
chmod 650 /etc/security/local
echo -n "" > /etc/security/local/.localuserpasswd
chown root:vcf_services /etc/security/local/.localuserpasswd
chmod 660 /etc/security/local/.localuserpasswd

Type the following command to set a new password (in this example, NewP@SSW0rd010):

echo -n 'NewP@SSW0rd010' | openssl dgst -sha512 -binary | openssl enc -base64 | tr -d '\n' > /etc/security/local/.localuserpasswd
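To confirm that the file now holds the hash of the password you intended, the same pipeline can be wrapped in a small check. This is only a sketch: the function names `local_user_hash` and `local_user_hash_matches` are hypothetical, and the hashing steps are exactly those of the command above.

```shell
# Same pipeline as above, wrapped in a function: SHA-512 digest, base64, no newlines.
local_user_hash() {
    printf '%s' "$1" | openssl dgst -sha512 -binary | openssl enc -base64 | tr -d '\n'
}

# Hypothetical check: succeeds when the file content equals the hash of the password.
local_user_hash_matches() {
    [ "$(cat "$1")" = "$(local_user_hash "$2")" ]
}

# Usage sketch, as root:
#   local_user_hash_matches /etc/security/local/.localuserpasswd 'NewP@SSW0rd010' && echo "hash matches"
```

Keep in mind that a password typed on the command line may end up in the shell history.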

Once you have changed the password, I recommend restarting the SDDC Manager services using the following command:

/opt/vmware/vcf/operationsmanager/scripts/cli/sddcmanager_restart_services.sh

Once the services have restarted, you can verify that the password for admin@local was changed successfully by running lookup_passwords.

In lookup_passwords, use the admin@local account to check whether it can pull the passwords from the SDDC Manager password manager.

Failed to get tasks data. Something went wrong. SDDC Manager 4.5 Password Management shows Error Message on its UI

Recently, after upgrading our SDDC Manager from 3.11.1 to 4.5, I came across a UI message on the SDDC Manager Password Management tab, as follows:

Issue Description: The error message shown in the screenshot above is “Failed to get tasks data. Something went wrong. Please retry or contact the service provider and provide the reference token.”

Root Cause: As I dug deeper into this, I found that the password manager itself was working fine, but it was unable to retrieve the complete task list data and therefore threw this error.

After looking at the operationsmanager log at

/var/log/vmware/vcf/operationsmanager/operationsmanager.log

we found the following error in the log:

2023-08-08T16:02:57.768+0000 ERROR [vcf_om,0000000000000000,0000] [o.a.c.c.C.[.[.[.[dispatcherServlet],http-nio-127.0.0.1-7300-exec-6] Servlet.service() for servlet [dispatcherServlet] in context with path [/operationsmanager] threw exception [Request processing failed; nested exception is java.lang.IllegalArgumentException: No enum constant com.vmware.vcf.passwordmanager.exception.PasswordManagerErrorCode.PASSWORD_MANAGER_VRA_ENDPOINT_FAILED] with root cause
java.lang.IllegalArgumentException: No enum constant com.vmware.vcf.passwordmanager.exception.PasswordManagerErrorCode.PASSWORD_MANAGER_VRA_ENDPOINT_FAILED

This means that, after the upgrade, SDDC Manager could not map the value “PASSWORD_MANAGER_VRA_ENDPOINT_FAILED” stored in its internal database to an error code it knows.
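To find out which constant names the log is complaining about without scrolling through it, the relevant lines can be filtered and counted. This is a minimal sketch; the helper name `missing_enum_constants` is hypothetical, and it only assumes the "No enum constant" wording shown in the log excerpt above.

```shell
# Hypothetical helper: reads log lines on stdin and prints each distinct
# "No enum constant" name (the part after the last dot) with its count.
missing_enum_constants() {
    grep -o 'No enum constant [A-Za-z0-9_.]*' | sed 's/.*\.//' | sort | uniq -c | sed 's/^ *//'
}

# Usage sketch on the SDDC Manager:
#   missing_enum_constants < /var/log/vmware/vcf/operationsmanager/operationsmanager.log
```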

Solution: After consulting with VMware Engineering, the issue was resolved by going into the operationsmanager db on the sddc manager and executing the following command to replace the message with something which the SDDC Manager could understand.

Disclaimer: Do this at your own risk. I highly recommend contacting VMware GSS for an official resolution if you have the same issue.

operationsmanager=# update passwordmanager.master_password_transaction set diagnostic_message = replace (diagnostic_message, 'PASSWORD_MANAGER_VRA_ENDPOINT_FAILED','PASSWORD_UPDATE_VRA_ENDPOINT_FAILED');

Once this is executed successfully, you will get a message of the form UPDATE XX, where XX is the number of rows that were updated.

Conclusion: Now, you can refresh the SDDC Manager UI and you will not see the error message on the UI in Password Management as before.

Hope this helps if you see a similar message after your VCF upgrade from 3.11.x to 4.5.

SDDC Manager 3.x [3.11.x] Issues & Solutions

Hello All,

Recently, I came across a few issues while preparing for a VCF upgrade from 3.11.x to 4.x in our environment.

Below are a few of the issues and how to resolve them using commands on the SDDC Manager.

Issue: SDDC Manager UI shows the message “Password Manager option failed in pre-validation stage”, or when you go to the security tab for password management, it shows that one of the password tasks has failed.

Issue: Deployment locked by password manager when you try to rotate the passwords of PSC, VCENTER from the security -> password management tab

Solution/s:

Find the deployment lock ID by logging in to the SDDC Manager over SSH as vcf, switching to the root user, and running the following command:

psql --host=localhost -U postgres -d platform -c "select * from lock"

This will display the IDs of the locked tasks in the SDDC Manager.

Delete the locked task by using the following command

psql --host=localhost -U postgres -d platform -c "delete from lock where id=<ID displayed above>"

Once the task is deleted, the lock is released. You can now refresh the SDDC Manager UI and then continue with the password update or rotate options
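Because the delete cannot be undone, it may be worth saving the contents of the lock table first and building the statement carefully. The sketch below is an assumption on my part, not part of the original procedure: the helper name `lock_delete_sql` and the output file path are hypothetical, and the ID is passed as a quoted SQL literal because the column type is not shown in this article (PostgreSQL converts a quoted literal to the column's type).

```shell
# Hypothetical helper: builds the DELETE statement for one lock ID.
# The ID is emitted as a quoted SQL literal with embedded single quotes doubled.
lock_delete_sql() {
    id=$(printf '%s' "$1" | sed "s/'/''/g")
    printf "delete from lock where id='%s'" "$id"
}

# Usage sketch on the SDDC Manager, as root (the lock ID is a placeholder):
#   psql --host=localhost -U postgres -d platform -c "select * from lock" > /tmp/lock_before_delete.txt
#   psql --host=localhost -U postgres -d platform -c "$(lock_delete_sql '<ID displayed above>')"
```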

Issue: Remove Failed Tasks in SDDC Manager

Solution:

Reference https://www.martingustafsson.com/removing-failed-tasks-in-sddc-manager/

Multiple Useful Commands for SDDC 3.x

The Unofficial VCF Troubleshooting Guide v2 – https://www.lab2prod.com.au/2021/03/the-unofficial-vcf-troubleshooting-guide.html

Reference

MANAGING VMWARE CLOUD FOUNDATION – FIRST LOOK

Password Operation Failed to Change the SSO password on an external PSC in VCF 3.11

Recently, I came across an issue when trying to change the SSO account (administrator@vsphere.local) password from SDDC Manager using the Rotate password option under Security in VCF 3.11.

I tried to Rotate the SSO password using the SDDC Manager, and got the following error:

Interestingly, however, SDDC Manager did change the SSO password in the backend.

To investigate this error, I dug a little deeper and saw the following error in the password rotate task:

I used the following command to check operationsmanager.log on the SDDC Manager:

less /var/log/vmware/vcf/operationsmanager/operationsmanager.log

The log also shows that SDDC Manager is trying to change the SSO credential (administrator@vsphere.local) on the vRA endpoints.

I had to open a VMware Support ticket and here is the answer I received:

“As per the Engineering team this issue is due to a misconfiguration of vRA endpoints. SDDC Manager is trying to change the administrator@vsphere.local on the VRA endpoints but VRA endpoints are configured with a different user (vcf-secured-user@vsphere.local).  This issue is addressed in VCF 4.x”

What the VMware Engineering team is saying is that in VCF 3.10.x and 3.11, vRA is typically configured with a different tenant admin account, rather than administrator@vsphere.local, for its endpoints. SDDC Manager nevertheless tries to change the administrator@vsphere.local credential on the vRA endpoints, which causes this error. According to the support response, the issue is addressed in VCF 4.x.

This closes the issue for us for now, as we will be working on upgrading our VCF to 4.x soon.