I was recently tasked with an issue where our CIM probe was failing during CIM requests to the new VMware ESXi 6.5 servers we had deployed. Our probes were getting connection rejected failures, so no useful data was being returned. We followed the breadcrumbs, which led us back to the ESXi host. We opened the web UI, checked the health monitor and found it was showing “No sensor data available”. The first thing we checked was whether the sfcbd-watchdog service was running, and it was not. By default, this service was turned off, or so we thought! We turned on the service and the UI reported that the service was now running.
Even after several refreshes the UI still showed the service as running, but we still received connection rejected errors. We rebooted the ESXi host, and after it came back we tested the connections again; they were still failing. We reopened the web UI, looked at the services again, and there was our watchdog service, stopped. We had set the service to start with the host, so this led us to believe it must be dying at some point.
The best way to see what a service doesn’t like is to log in to the ESXi host using SSH, start the process manually and look at its output. A quick /etc/init.d/sfcbd-watchdog start showed us that the service was “Administratively disabled”.
After digging around Google for some reference to this new data we came across a blurb about setting an option to allow CIM manager to run.
The command esxcli system wbem set --enable true followed by /etc/init.d/sfcbd-watchdog start allowed the sfcb-HTTPS-Daem process to start. This process is the TCP listener that takes CIM requests from probes like ours and returns the health of the hardware.
You should get output like the following:
/etc/init.d/sfcbd-watchdog start
sfcbd-init: Getting Exclusive access, please wait…
sfcbd-init: Exclusive access granted.
sfcbd-init: Request to start sfcbd-watchdog, pid 69438
sfcbd-config[69448]: No third party cim providers installed
sfcbd-init: snmp has not been enabled.
sfcbd-init: starting sfcbd
sfcbd-init: Waiting for sfcb to start up.
sfcbd-init: Program started normally.
Invoking lsof -nPV | awk '{count[$2]++}END{for(i in count)print count[i], i}' | sort -n in the SSH console will produce a list of running processes minus all the junk. You can use this list to determine what is running on the ESXi host.
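If you would rather do the counting off the host, here is a minimal Python sketch of what the awk stage does: it tallies rows by the value in the second whitespace-separated column and sorts by count. The sample rows are hypothetical and only loosely shaped like lsof output; real output on ESXi has its own header and columns.

```python
from collections import Counter

def count_by_second_column(lsof_text):
    """Count rows per value of the second whitespace-separated column."""
    counts = Counter()
    for line in lsof_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:          # skip blank or one-column lines
            counts[fields[1]] += 1
    # Ascending by count, like piping through `sort -n`
    return sorted(counts.items(), key=lambda item: item[1])

# Hypothetical sample rows, for illustration only.
sample = """\
1001 hostd-worker FILE 4 /var/log/hostd.log
1001 hostd-worker SOCKET 5 tcp
2002 sfcb-HTTPS-Daem SOCKET 3 tcp
"""

for name, n in count_by_second_column(sample):
    print(n, name)
```

Paste the text captured from your SSH session in place of the sample to get the same kind of summary.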
We also used esxcli network ip connection list to get a list of the ports the ESXi host was listening on, which helped us determine whether port 5989 was active.
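You can also test the listener from the probe side. The following is a minimal sketch, not part of our probe, that simply tries to open a TCP connection to port 5989. The function name and the timeout value are my own choices for illustration.

```python
import socket

def port_is_open(host, port=5989, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection raises OSError on refusal, timeout or DNS failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling port_is_open with the address of your ESXi host should return False while the connection is being rejected and True once sfcb-HTTPS-Daem is listening. It only proves the port accepts connections, not that CIM queries will succeed.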
If you are deploying VMware ESXi 6.5 in your environment and need CIM health data, remember to enable it, and do not assume it is active just because the web UI says so.
Plugins4LabTech has created a new plugin that will monitor the CIM data announced by the hardware running VMware ESX software. We can report on any hardware that follows CIM standards and makes its hardware statuses known to VMware. This simple-to-use plugin deploys in just minutes and can be set up by anyone with little or no knowledge of LabTech functions and processes. If you want to see the easiest ESX Hardware Health Monitor available for LabTech in action, have a peek at the video on our YouTube channel.
Never be in the dark again over hardware health!
Get alerts and tickets when hardware failures are detected. Using the monitor agent supplied with the plugin, you get quick responses to failures and warnings directly from VMware. When alarms happen, you get emails, tickets and messages alerting you to the failure that is taking place on the hardware.
Plugins4LabTech’s VMware Health Monitor plugin for LabTech uses an agent to talk to VMware ESX hosts and retrieve the CIM data for the hardware the ESX host is running on. The plugin processes this data and stores it in LabTech’s database to be used in the views and alarms the plugin issues when failures are seen. You will see data on your RAID arrays and SCSI controllers, power supply failures and system overheats. If the hardware is reporting it to VMware, we can see it.
Main ESX Health Monitor Console
The main view lets you see all the ESX hosts under management sorted by Client. In this view you are able to turn on and off the collection of data, set the interval of how often the data is collected and enable alerting for any seen failures.
ESX Health Monitor Client Console Tab
The Client Console tab is where you would add and delete the ESX Hosts you want to monitor. You are able to force the update of ESX CIM data by selecting to rescan ESX hosts and you can view the full CIM data list of any ESX Host.
ESX Health Monitor CIM Data
The CIM Data view shows all the collected CIM data on a given ESX host. If any data is in a state not considered OK, those lines will be marked with warning and alarm icons. You will also get the hardware manufacturer’s data back; in the case of Dell, that is the hardware type and Service Tag.
ESX Health Monitor Internal Agent
Under the main view you can select to enable alerts. When you do this, an agent is created in the internal monitors of LabTech. You can use this monitor to email, ticket, alert or message anyone when failures are picked up.
A simple plugin, fully self-contained and easy to deploy. Just a few clicks to configure and you have data. It is so easy to use that anyone can set it up and have it working in just a few minutes.
The plugin is currently with the Squid Squad getting a final review before we release the new plugin. Follow our release notes at http://support.plugins4labtech.com
Well it’s finally here, the plugin we have all been waiting for: the VMware ESXi Health Monitor plugin for LabTech. This plugin installs into your LabTech system as a Location plugin to monitor the CIM data available in most hardware. Easy-to-configure controls and a full view of the collected CIM data are just part of what this plugin can do. We have incorporated new functions into this plugin that differ starkly from our earlier version 1 and version 2 ESXi Health Monitors. We now support multiple usernames and passwords per location for ESX hosts and require only one probe system to monitor all the ESX hosts at a single location. I could talk all day about the plugin but maybe it’s better if I show you.
Here is the main view of the ESXi Health Monitor plugin.
The hosts are listed with a status face based on their current status, and when you select a host its CIM data is displayed so you can see all the reported statuses of the hardware.
This is the ESXi host configuration tab of the ESXi health Monitor plugin.
This is where you add and edit the host systems to monitor, and where you set the system you want to be responsible for probing the ESXi hosts for CIM data. When you select a system and “Set” it as the probe, the plugin launches a script against it to prep it for monitoring automatically. You do not need to “install” the probe software manually, as this is handled by the plugin when the probe system is selected.
The ESXi Health Monitor plugin uses a custom group to locate all the available probes, using a custom search that finds all systems with the EDF “VMware Master CIM Scanner” selected, as seen in the example below. You will find this setting under the Info tab -> VMWare on any system console. Just checking it will not install the probe software, so you must “Set” the probe via the plugin.
The custom group should look like this and have the custom search set up as seen in this example.
The custom group also has a scheduled script that runs every 2-4 hours, and I use the exclude time range because I do not need this data so badly that I can’t sleep without it running every hour or two. This is just my personal preference, but it saves CPU cycles during backup windows and maintenance schedules. You will need to reset this when you import your group, as it always seems to get reset to nothing during imports.
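To make the idea of an exclude time range concrete, here is a small Python sketch of how such a window can be evaluated, including a window that runs past midnight. This is only an illustration of the logic, not how LabTech implements its scheduler, and the times used are made-up examples.

```python
from datetime import time

def in_exclude_window(now, start, end):
    """True if `now` is inside [start, end), including windows that cross midnight."""
    if start <= end:
        # Same-day window, e.g. 01:00 to 05:00
        return start <= now < end
    # Window wraps past midnight, e.g. 22:00 to 06:00
    return now >= start or now < end

# Example: a 22:00 to 06:00 backup window (hypothetical times)
print(in_exclude_window(time(23, 30), time(22, 0), time(6, 0)))
print(in_exclude_window(time(12, 0), time(22, 0), time(6, 0)))
```

A probe run that falls inside the window would simply be skipped until the window closes.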
You can see your probe run by watching the script logs on the probe system’s console. This helps when troubleshooting common issues.
This version is in Beta! This is the first release of this plugin and as such it may behave oddly or may not work for you as expected. I seriously doubt you will have any issues, but as this is Beta, expect a few minor glitches. We are actively working on updates, and with your help we can make this a great plugin!
You can run version 2 and version 3 side by side and they will not affect each other.
Updates:
———————————————————————————————————-
Changes in New Version 0.3.0.3
Changed the version number back to what it should have started at. Starting with (3) was a typo.
Added Internal monitor called CIM – VMWare ESXi Health Monitor
Added Client View and Global View of System Statuses
Corrected a few minor coding mistakes
Added Linux Probe Support
Updated the data views
Changes in New Version 0.3.0.5
Added Last Scan Time Stamps
Added color coded data views
Added new images
Changes in New Version 0.3.0.6
Fixed Scan Time not updating
Changes in Version 3.0.7
Updated Python Packages to 2.7.8
Fixed several SQL issues with table creation.
Minor enhancements
Changes in Version 3.0.8
Added new instant host probe from the configure tab
Fixed minor issues in plugin
Fixed issues with installer script
Changes in Version 3.0.9
Fixed CIM Monitor in LT
Fixed Versioning context
Fixed password bug when resaving the same password for esx probe.
Changes in Version 3.0.10
Fixed Issues with Installer where latest PY script requires several new Python modules.
New Client Level View
This view gives you a look at all systems under the client’s control. You can select a system and review the CIM data returned.
This plugin monitors the condition of all the systems and will alarm when a system is found to be in error. You can customize the alarms and the alert methods through the Monitors management interface in LabTech.
You can download the latest version here.
Version 3.0.10 Now Available
If you like what we are doing then please donate to our cause, help keep our software free.
How to install plugin
Please post here any comments and issues you may have so we can get them fixed and out in the next release.
Squidworks’ ESX Hardware monitor is a set of scripts, a custom group and a search for LabTech that will monitor the CIM data provided through the VMware API for ESX 4 and 5. The probe will launch hourly and report any hardware failure or warning back to LabTech. The script will email an alarm to any email address you would like. The script can also be modified to set an alarm, create a ticket or do anything else LabTech scripting will let you do.
New in version 2.1:
We added several checks for false alarms and socket errors to prevent alarms and emails on non-failures.
We added alarm flood control: once an email goes out, the script will not send another until the system has reported an “all OK”. Alerts are then reset to go out on the next failure.
Added extra EDFs to control processes.
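The flood control described in the list above boils down to a small state machine. Here is a minimal Python sketch of that behavior. The class and method names are hypothetical, and the in-memory flag stands in for whatever persistent value the real LabTech script keeps (presumably an EDF, but that is my assumption).

```python
class AlertGate:
    """Send one alert per failure episode; re-arm only after an all-OK report."""

    def __init__(self):
        self.armed = True

    def should_alert(self, status_ok):
        if status_ok:
            self.armed = True    # all OK seen: the next failure may alert again
            return False
        if self.armed:
            self.armed = False   # first failure of this episode: alert once
            return True
        return False             # still failing: suppress repeat alerts

gate = AlertGate()
results = [gate.should_alert(ok) for ok in [True, False, False, True, False]]
print(results)  # [False, True, False, False, True]
```

Two consecutive failed probes produce a single email, and the second failure episode alerts again only because an OK result came in between.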
Here is how it works:
Download the zip file, extract and import the XML files into your Labtech system.
Addendum Update:
After you import “all” scripts in Version 2.1, download this zip and import this script. This script should then be used in your group scheduled script probe instead of the v2.1 one. This is v2.2 of that one script. Download update here
Download extra files directly if import fails for any reason here.
After the import you should have a VMware script group that has 3 scripts in it.
Script #1 (The Installer) will install the monitor to a Windows system. You will need to provide the FQDN or IP of the ESX host you want to monitor when you execute the installer script. When the scheduler pops up make sure to add your ESX host. The only thing you should need to do is execute the installer on a Windows system. The installer will configure the system and add the system to the custom group and search. You do not need to configure anything else at this point. The ESX user and password will be fetched from the Locations password menu for VMWare.
The next script (The Monitor) is assigned to the custom group “Systems that monitor ESX hosts” to run every hour. You can modify this to run at whatever interval you like. The Monitor will query your “Locations” passwords database table and retrieve the VMWare user listed, just like the original LabTech probes do. The Monitor gets the CIM health data and returns it to LabTech. It also checks whether the result is “Not OK” and fires off an email if failures are picked up.
After the Monitor runs you should see data on the Info Tab -> VMWare sub tab
You will need to edit the monitor script to update the email address it reports to when failures are found. You can also modify the monitor script to create a ticket, fire off an alarm, set an alert or anything your heart desires. Line 39 of the script ESX Hardware Monitor V2-1 needs to be edited and the example@example.com address changed to the email address where you want to receive the alerts.
The third script is an updater script that fetches the latest build of the Nagios plugin “check_esxi_hardware.py”, which is maintained by www.claudiokuenzler.com. You can run this script against any Windows box that has the monitor installed and it will get the latest version of the plugin and deploy it to that system. This way you can keep up with all the fixes made to it. You may want to run this script on the group once or twice a year just to make sure you have the latest fixes and updates.
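Since check_esxi_hardware.py is a Nagios plugin, it is worth knowing the standard Nagios exit codes if you plan to customize the Monitor. The sketch below maps those codes to state names. How the Monitor script itself reads the result is not shown in this post, so treat the helper names and the alert rule as my own illustration.

```python
# Standard Nagios plugin exit codes
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}

def classify(exit_code):
    """Translate a plugin exit code into its Nagios state name."""
    # Any code outside 0-3 is treated as UNKNOWN
    return NAGIOS_STATES.get(exit_code, "UNKNOWN")

def needs_alert(exit_code):
    """Illustrative rule: alert only on WARNING or CRITICAL."""
    # UNKNOWN is left out here so that probe or socket errors do not page anyone
    return classify(exit_code) in ("WARNING", "CRITICAL")

for code in (0, 1, 2, 3):
    print(code, classify(code), needs_alert(code))
```

Leaving UNKNOWN out of the alert rule is one way to avoid emails on non-failures; you may prefer to ticket those separately.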
Wouldn’t it be really cool if you could safely access any VMware vSphere ESX 5 host directly, using just the local vSphere 5 Client installed on your workstation, without port forwarding and NATing traffic through your customers’ firewalls? With LabTech that is no problem. By setting up an Application Redirector you can create a new proxy that passes all your traffic to any ESX or vCenter host and lets you fully manage that host using the vSphere Client installed on your own workstation, through the LabTech server and an agent. And wow, is it fast.
Let me show you how to do it.
I want to thank the guys over at www.labtechgeek.com for creating the outline I am following here.
1.) You will need to create a new redirected app named vSphere Client.
Go to [System Dashboard] -> [Config] -> [Redirected Apps]
2.) In the program field, enter the location of your local vSphere client, like -> C:\Program Files (x86)\VMware\Infrastructure\Virtual Infrastructure Client\Launcher\VpxClient.exe
3.) Now we will redirect the following ports.
Local Port: 443 Local IP: 127.0.0.1 Remote Port: 443 Remote IP: %RemoteIP% Type: TCP Local Listen
Local Port: 902 Local IP: 127.0.0.1 Remote Port: 902 Remote IP: %RemoteIP% Type: TCP Local Listen
Local Port: 903 Local IP: 127.0.0.1 Remote Port: 903 Remote IP: %RemoteIP% Type: TCP Local Listen
Local Port: 8080 Local IP: 127.0.0.1 Remote Port: 8080 Remote IP: %RemoteIP% Type: TCP Local Listen
Local Port: 9443 Local IP: 127.0.0.1 Remote Port: 9443 Remote IP: %RemoteIP% Type: TCP Local Listen
Local Port: 10080 Local IP: 127.0.0.1 Remote Port: 10080 Remote IP: %RemoteIP% Type: TCP Local Listen
Local Port: 10443 Local IP: 127.0.0.1 Remote Port: 10443 Remote IP: %RemoteIP% Type: TCP Local Listen
Local Port: 902 Local IP: 127.0.0.1 Remote Port: 902 Remote IP: %RemoteIP% Type: UDP Local Listen
The redirector type should have the “Computer” box checked. This makes the redirector show up on computer consoles alongside the other redirectors.
4.) Now we need to create an entry in your hosts file for the redirector to work. Add -> “127.0.0.1 vsphere-redir” to your local hosts file. If you just use 127.0.0.1 or localhost when the client pops up, the client may actually try to connect to the NetBIOS name of your PC, which will not work.
5.) Reload your system cache and you should see the redirector show up under the Redirectors menu of computer consoles.
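If the client refuses to connect later, the hosts entry from step 4 is the first thing to verify. Here is a minimal Python sketch that checks whether hosts-file text maps vsphere-redir to 127.0.0.1. The function name is hypothetical and the sample text is only an example.

```python
def hosts_maps(hosts_text, name, ip="127.0.0.1"):
    """Return True if the hosts-file text maps `name` to `ip`."""
    wanted = name.lower()
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split "ip name [alias ...]" on whitespace
        entry = line.split("#", 1)[0].split()
        if len(entry) >= 2 and entry[0] == ip:
            if wanted in (alias.lower() for alias in entry[1:]):
                return True
    return False

# Example hosts content (illustrative)
sample = """\
# local entries
127.0.0.1   localhost
127.0.0.1   vsphere-redir   # redirector alias
"""
print(hosts_maps(sample, "vsphere-redir"))
```

On Windows the hosts file normally lives at C:\Windows\System32\drivers\etc\hosts, so you can read that file and pass its contents to the function.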
To connect directly to an ESXi host, hold Shift and click the vSphere Client redirector in the menu. This will prompt you for the IP address of the remote ESX host you want to control. Enter the IP of the ESXi host.
Once you have placed the IP of the ESX host in the IP for Redirection box, click OK. Give it a few seconds for the proxy to come up and the app to launch, and you should see your VMware client pop up. Now type “vsphere-redir” into the IP address / Name field as the host, followed by the user and password needed to log in to the ESX host.
***Note – When you’re connecting directly to a host, or to vCenter, you must always enter vsphere-redir as the IP Address/Name in the VMWare vSphere client.
***Note – If you’re connecting to a vCenter server, you won’t be able to view the console of any VMs (MKS) – this is because the vSphere Client makes a direct connection to the ESXi server on port 902. If you connect directly to the ESXi host, MKS works fine.
We have several customers on sketchy hardware, and on occasion a VM crashes due to a SCSI card issue with the motherboard used. That aside, from time to time we need to force a hard reboot of a server running in a VM. Sometimes it works great, and sometimes it locks up at 95% and we have to kill the process that runs the VM to get it to restart.
So here is the process we take to get this to free up and reboot the VM on ESXi 5.0 and later VMware hosts.
If SSH is not already turned on for the ESXi host, turn it on. This can be done via [Configuration] -> [Security Profile] using the VMware client.
Using your favorite SSH client (PuTTY), connect to your VMware ESXi 5 host.
We now need to find and kill the process group for the VM that has failed. To do this we will look up the process group ID using this command: execute -> ps -g | grep "VMName" You should get a return that looks similar to this.
We are looking for the common number across all processes and in this case that would be “3368” as seen near the end of each line.
Now we need to kill the process. To do this, execute -> kill -9 3368 (replace “3368” with the ID number from your system).
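If you have trouble spotting the shared ID by eye, the lookup can be sketched in a few lines of Python: find the numeric value that appears on every line of the grep output. The sample rows below are hypothetical and only loosely shaped like ps -g output, so the real columns on your host may differ.

```python
def common_numbers(ps_text):
    """Return the numeric tokens shared by every line that contains a number."""
    common = None
    for line in ps_text.splitlines():
        nums = {tok for tok in line.split() if tok.isdigit()}
        if not nums:
            continue  # skip lines without any numeric column
        common = nums if common is None else common & nums
    return sorted(common or [], key=int)

# Hypothetical rows for illustration; paste your own `ps -g | grep` output instead.
sample = """\
3370 3368 vmx 3368 3368 /bin/vmx
3372 3368 vmm0:MyVM 3368 3368 /bin/vmx
3374 3368 vmx-vcpu-0:MyVM 3368 3368 /bin/vmx
"""
print(common_numbers(sample))  # ['3368']
```

If the function returns more than one value, check the output by hand before killing anything.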
Now we need to do some cleanup. We need to delete the swap file in the directory where the VM is stored. To get to where the swap file is stored, execute -> cd /vmfs/volumes/<YourDataStore>/<VMName>
Next we need to find out what our swap file is named, so execute -> ls
This will give you a directory listing. Find your swap file by looking for the file extension “.vswp”. Now we will remove it with this command.
execute -> rm -r <YourSwapFile.vswp>
Now let’s restart our VM services. This will not affect any running VM and is safe to run while VMs are active on the host. execute -> /sbin/services.sh restart
Reconnect your VMware client to the host and complete the process to power on the VM by first removing the VM from the inventory (Do not Delete from Disk) and re-adding it back in. This will reset the VM fully and allow you to restart it. After you remove your VM from inventory you can re-add it by browsing the datastore in your VMWare client finding your VM directory and right clicking on the “.vmx” file. A menu will pop up and you can click “Add to Inventory” which will place VM back into the available VMs list. Now select the VM and click the Boot up arrow button to get started again.
From the skunk works here at Squidworks comes another great monitoring script for Kaseya.
This script uses the SDK provided by VMware to query the ESX host and return a good or bad variable. If the hardware test fails, the script grabs the log of the test, uploads it to the Kaseya server and places it under the “Get File” area for the host that ran the test. You can run this script on any Windows box. I have also included the current vSphere SDK installer and a Kaseya script to install it if it is not found on the Windows host.
Upload the SDK installer and import the scripts to your public files area in Kaseya under the directory “VMWare”. If you place the files anywhere else you will need to edit the script for the new location.
The script then writes a unique entry into the Windows Application Event Log that can be picked up by Kaseya’s Event Log Monitor. When Kaseya picks up this event you can instruct the monitor to create an alarm, create a ticket, run another agent procedure or email the alarm to one or more addresses. Just schedule the agent procedure to run a couple of times a day to keep an eye on your customers’ VMware vSphere ESXi hardware health.
This script links to the CIM information provided by the hardware to the ESX host. You will see CPU, Memory, Fans, RAID and Controller Health. The log file that is uploaded will only show failures and will tell you what failed and on what ESX host.
Grab the Hardware Health of your VMware vSphere ESXi Host
Here is RC1 of my ESX Health Script for the Xymon monitoring server, not to be confused with the ESXi script going around for VM and snapshot health. ESXHealth monitors the physical health of the hardware running the ESX hypervisor. This script uses xymoncgimsg.cgi to send status reports from a remote network to your Xymon display server, allowing you to monitor ESX hosts from anywhere really easily. Using this CGI lets you run the script from any Windows box and send the notifications through HTTP-friendly firewalls.
The only prerequisites are that you must have installed the VMware vSphere Perl SDK and either have installed cURL or placed the curl-nossl.exe provided in the zip in your PATH on Windows. Your Xymon server must have xymoncgimsg.cgi moved from the xymon/server/bin folder to your xymon-cgi folder to allow web-based status messages.
I have included the curl-nossl portable EXE in the zip. Just drop it in the Windows path so you can call it by name. You will need to edit the script and update the URL that the notifications are sent to. I show an example in the script of how to use it with an htpasswd file protecting your web CGI, for those who use that layer of security. If you don’t, just enter the standard URL for your xymoncgimsg.cgi and you’re good to go.
The script is very simple.
If all is good you get an “All’s OK”; otherwise it spits out any relevant info about the issue. The last error I got was for redundant power lost to Supply 1, and it was reported with plenty of detail to know what was wrong.
Just schedule Windows to run it every 15 minutes or so. See the script for the command line syntax for Task Scheduler.
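For anyone curious what actually gets sent to the Xymon server, here is a small Python sketch that builds a status message in the standard Xymon format (status, then host.test, then a color, then free text). This is not the ESXHealth script itself. The test name esxhealth and the host name are placeholders I made up for the example.

```python
def xymon_status(hostname, test, ok, detail):
    """Build a Xymon status message line for one host and test."""
    color = "green" if ok else "red"
    # Xymon expects dots in the host name to be written as commas
    host = hostname.replace(".", ",")
    return f"status {host}.{test} {color} {detail}"

print(xymon_status("esx01.example.com", "esxhealth", True, "All's OK"))
# status esx01,example,com.esxhealth green All's OK
```

The resulting text is what you would post to your xymoncgimsg.cgi URL with cURL or any other HTTP client.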
VMware vSphere vCenter Server 5.1 now uses a new SSO (Single Sign On) service to authenticate with Microsoft Active Directory when deploying vCenter. If you do not install this service and configure it for AD, you will not be able to use your domain accounts with vCenter 5.1. During the initial install you may get errors when installing SSO. KB 2034374 reports that “Error 29155 Identity source discovery error” is due to a failed attempt to automatically discover your Active Directory domain. Verify that the domain name and DNS are set up correctly.
Now let’s set up an AD server in vCenter to allow our domain accounts. First we will log in to the vCenter Web Client (https://127.0.0.1:9443 if you used the default ports for the web client install). The default login is admin@system-domain with the password you set for SSO during the install process. Once you are logged in to the web client you can continue.
Now Select [Administration]
Now Select [Sign-On and Discovery] -> [Configuration]
Under the Identity Sources tab in the right pane, select the plus symbol to add a new AD source. This will pop up an “Add Identity Source” window. Select the Active Directory radio button and fill out the requested information with your AD domain name and the “OU” that holds your users and groups.
Here is the generic information you will need. Just replace sesenviron.local with your domain, then place your AD credentials at the bottom.
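When you swap in your own domain, the part that usually trips people up is writing it in distinguished-name form for the users and groups fields. As a quick sketch, assuming the dialog wants a base DN, the usual Active Directory convention turns each dot-separated label of the domain into a DC= component. The function name is my own.

```python
def domain_to_base_dn(domain):
    """Convert a DNS domain name into an LDAP base DN made of DC= components."""
    labels = domain.strip(".").split(".")
    return ",".join(f"DC={label}" for label in labels)

print(domain_to_base_dn("sesenviron.local"))  # DC=sesenviron,DC=local
```

If your users and groups live in a specific OU, that OU goes in front of the DC components, but the exact container names depend on your directory layout.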
Now that we have an AD server assigned as a source, we must add this newly created connector to our “Default Domains” list.
Now that we have it in our Default Domains list, let’s move it up to be our primary source. To do this, highlight the AD domain name and use the blue up arrow to move it to the top of the list.
Now let’s select the small floppy disk icon to save the changes to the default domain list. Once this is complete we should be able to open the vSphere Client application and log in with domain credentials. You should be using a domain-level admin to access vCenter.