Category Archives: Administrative Tasks

Find base class, hosting class, and all properties using PowerShell

A few years ago, I wrote a PowerShell script that returns the entire base and host class path for a given class, including all available properties on each of the classes. This can be useful in a few different scenarios, including management pack development.

I haven’t used the script in a while, but the other day I ran into a situation where it was quite handy. I had to update it to work with SCOM 2012, however, so I thought I would post the updated script here for future reference.

You can find the SCOM 2007 version here.

 

##This script accepts a class name, and returns the entire
##Base class path.  It also returns Host Class for each
##Base class returned.  You'll see the entire
##class path for the given class.
##Author: Jonathan Almquist, Scomskills
##version 2.0 (for SCOM 2012)
##Original: 11-01-2008
##Updated: 07-07-2014

##Usage = getClassPath.ps1 <class_system_name>
##Example = getClassPath.ps1 Microsoft.Windows.Computer

param($classname)
$ast = "-"
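$class = get-scomclass | where {$_.name -eq $classname}
##Stop here if the class name was not found
if ($class -eq $null)
    {
    Write-Host "Class not found:" $classname
    return
    }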
Write-Host ($ast * 50)
Write-Host "TARGET CLASS" $class
Write-Host ($ast * 50)`n
while ($class -ne "False")
    {
    $property = $class | foreach-object {$_.getProperties()} | Select-Object name
    ##Print each property name, or a note when the class defines none
    if ($property)
        {
        foreach ($value in $property)
            {
            write-host `t`t`t`t $value.name
            }
        }
        else
        {
        Write-Host `t`t`t`t "No properties"
        }
    write-host `n
    Write-Host ($ast * 50)
    Write-Host "BASE CLASS PATH for" $class
    Write-Host ($ast * 50)`n
    $baseclass = get-scomclass | where {$_.id -eq $class.base.id.tostring()}
    While ($baseclass.base.id -ne $NULL)
        {
        $baseclass.name
        $property = $baseclass | foreach-object {$_.getProperties()} | Select-Object name
        foreach ($value in $property)
            {
            write-host `t`t`t`t $value.name
            }
        $baseclass = get-scomclass | where {$_.id -eq $baseclass.base.id.tostring()}
        }
    if ($class.hosted -eq "True")
        {
        $hostclass = get-scomclass | where {$_.name -eq $class.Name} | ForEach-Object {$_.findHostClass()}
        write-host `n
        Write-Host ($ast * 50)
        Write-Host "HOST CLASS for" $class
        Write-Host ($ast * 50)`n
        ##Reuse the host class found above instead of querying again
        $class = $hostclass
        Write-Host $class
        }
        else
        {
        write-host `t`t "*Not Hosted*" `n`n
        $class = "False"
        }
    }

 

🙂

Agent Management–List Primary and Failover Configuration

Something I don’t like about using the SDK (PowerShell) to manage agents is the get* member cmdlets that return information: large-scale queries take too long! The SDK is typically pretty slow in this regard, and that’s a shame because I find myself writing TSQL to accomplish tasks that the SDK should be able to handle promptly.

Recently I wrote some TSQL that returns all agents with their associated primary and failover management servers. This is very informative when the question is "where does this agent fail over to?", and it’s a speedy way to feed some sort of automation process to expedite agent assignment.

 

Here you go!

 

SELECT rgv.SourceObjectPath AS [Agent], rgv.TargetObjectPath AS [ManagementServer], 
       CASE
              WHEN rtv.DisplayName = 'Health Service Communication' THEN 'Primary'
              ELSE 'Failover'
       END AS [Type]
FROM ManagedTypeView mt INNER JOIN
       ManagedEntityGenericView AS meg ON meg.MonitoringClassId = mt.Id INNER JOIN
       RelationshipGenericView rgv ON rgv.SourceObjectId = meg.Id INNER JOIN
       RelationshipTypeView rtv ON rtv.Id = rgv.RelationshipId
WHERE mt.Name = 'Microsoft.SystemCenter.Agent' AND
       rtv.Name like 'Microsoft.SystemCenter.HealthService%Communication' AND
       rgv.IsDeleted = 0
ORDER BY rgv.SourceObjectPath ASC, rtv.DisplayName ASC

 

Something like this would take several minutes through the SDK in small to medium sized environments, and maybe upwards of 15-30 minutes in larger environments. This little bit of TSQL returns in 1-2 seconds. Eat that!
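If the result set feeds an automation process, one option is to group the rows per agent first. Here is a minimal Python sketch of that idea, assuming the rows have already been fetched as (Agent, ManagementServer, Type) tuples in the query's column order. The function name and the host names are hypothetical and only there for illustration.

```python
from collections import defaultdict

def group_assignments(rows):
    """Group (agent, management_server, type) rows by agent."""
    result = defaultdict(lambda: {"Primary": None, "Failover": []})
    for agent, server, kind in rows:
        if kind == "Primary":
            result[agent]["Primary"] = server
        else:
            # Anything that is not 'Primary' is a failover server in the query
            result[agent]["Failover"].append(server)
    return dict(result)

# Hypothetical sample rows, in the column order returned by the query
rows = [
    ("agent01.contoso.local", "ms01.contoso.local", "Primary"),
    ("agent01.contoso.local", "ms02.contoso.local", "Failover"),
    ("agent02.contoso.local", "ms02.contoso.local", "Primary"),
]
assignments = group_assignments(rows)
```

With the data shaped like this, spotting agents that have no failover server is a simple check for an empty "Failover" list.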

 

🙂

Troubleshooting network device discovery–snmputil

UPDATE (09/04/2015): SNMPUtil is just one method for checking connectivity, but these days I prefer to use SNMPWalk or some other utility. I’m keeping this post here for archival purposes.

One of the first things you might want to check while troubleshooting network device discovery in OpsMgr 2012 is whether the network discovery server can connect to the SNMP agent on the device. There are a few reasons why a network discovery server cannot connect to the device via SNMP, and one of the easiest ways to test this is to use the SNMPUtil tool. This tool was included with the Windows 2000 resource kit (which is becoming increasingly difficult to find).

Here is a simple command to use to test whether a device is reachable from the network discovery server:

snmputil getnext <IP Address or FQDN> <community string> .1.3

Just replace <IP Address or FQDN> and <community string> with your own values.

If the device is reachable via SNMP, you will receive a message similar to the following:

[Image: snmputil output for a reachable device]

If the device is not reachable, you will receive a message similar to the following:

[Image: snmputil output for an unreachable device]

Health Service Heartbeat Failure

I’ve seen plenty of questions come up in the forums and from customers regarding the Health Service Heartbeat Failure monitor, and its associated diagnostics and recoveries. I spent a little time digging further into these workflows and thought I’d share what I found here. Hope this helps those curious about what’s happening under the hood.

Communication Channel Basics

After an Operations Manager Agent is installed on a Windows computer, and after it is approved to establish a communication channel with an Operations Manager 2007 management group, the communication channel is maintained by the Health Service. If this communication channel is interrupted or dropped between the Agent and its primary Management Server (MS) for any reason, the Agent will make three attempts to re-establish communication with its primary MS, by default.

If the Agent is not able to re-establish the channel to its primary MS, it fails over to the next available MS. Failover configuration and the order of failover is another topic, and will not be covered here.

While the Agent is failed over to a secondary MS, it will attempt to re-establish communication with its primary MS every 60 seconds, by default. As soon as the Agent can establish communication with its primary MS again, it will disconnect from the secondary MS and fail back to its primary MS.
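The failover and failback behavior described above can be sketched as a small state machine. This is only a simplified Python model under stated assumptions, not how the Health Service is implemented: the class and method names are hypothetical, each call stands for one connection attempt (while failed over, one attempt stands for one 60-second retry of the primary), and a single secondary MS is assumed.

```python
class AgentChannel:
    """Simplified model of an agent's primary/secondary MS selection."""

    def __init__(self, primary, secondary, max_attempts=3):
        self.primary = primary
        self.secondary = secondary
        self.max_attempts = max_attempts  # three attempts by default, per the text
        self.current = primary
        self.failed_attempts = 0

    def attempt(self, primary_reachable):
        """Process one connection attempt and return the server in use."""
        if primary_reachable:
            # Primary is available: use it (fail back if needed) and reset the counter
            self.current = self.primary
            self.failed_attempts = 0
        elif self.current == self.primary:
            self.failed_attempts += 1
            if self.failed_attempts >= self.max_attempts:
                # Attempts exhausted: fail over to the next available MS
                self.current = self.secondary
        return self.current

channel = AgentChannel("MS1", "MS2")
history = [channel.attempt(up) for up in [True, False, False, False, False, True]]
```

In this run the agent stays on MS1 through two failed attempts, fails over to MS2 on the third, and fails back as soon as MS1 answers again.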

Health Service Heartbeat Failure Monitor

To briefly summarize the Heartbeat process, there are two configurable settings that control Heartbeat behavior: the Heartbeat interval and the number of missed Heartbeats. If the MS fails to receive a Heartbeat from an Agent computer for more than the specified number of intervals, the Health Service Heartbeat Failure monitor will change to a critical state and generate an alert.

Read more about Heartbeat and configuration here.
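To make the relationship between the two settings concrete, here is a minimal Python sketch of the threshold check. It is an illustration of the rule as summarized above, not the monitor's actual implementation; the function name is hypothetical and the values used (60-second interval, 3 missed Heartbeats) are example inputs only.

```python
def heartbeat_failed(seconds_since_last_heartbeat, interval_seconds, missed_allowed):
    """Return True when the silence exceeds the allowed number of intervals."""
    return seconds_since_last_heartbeat > interval_seconds * missed_allowed

# Example values only: 60-second interval, 3 missed Heartbeats allowed
still_ok = heartbeat_failed(180, interval_seconds=60, missed_allowed=3)
critical = heartbeat_failed(181, interval_seconds=60, missed_allowed=3)
```

Raising either setting widens the window before the monitor changes state, which is the usual trade-off between faster detection and fewer false alerts.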

Diagnostic and Recovery Tasks

There are a couple of diagnostic tasks that run when the Health Service Heartbeat Failure monitor changes to a critical state: Ping Computer on Heartbeat Failure and Check If Health Service Is Running.

Ping Computer on Heartbeat Failure

This diagnostic is defined in the Operations Manager 2007 Agent Management Library and is enabled by default. This workflow uses the Automatic Agent Management Account, which runs under the context of the Management Server Action Account by default, to execute a probe action named WmiProbe, which is defined in the Microsoft System Center Library.

This probe is initiated on the Health Service Watcher. Since the Health Service Watcher is a perspective class hosted by the Root Management Server, this is where the WMI query is executed when the Health Service Heartbeat Failure monitor changes to a critical state. Even though the agent may be reporting to another MS, it is the RMS that sends the ICMP packet to the agent.

Unlike the traditional Ping.exe program we are all accustomed to, which sends four ICMP packets to the target host by default, the WMI query is executed only once and sends a single ICMP packet, so there is no calculation of percentage of lost packets one would expect to see with Ping.exe.

Following is the WMI query executed on the RMS.

SELECT * FROM Win32_PingStatus WHERE Address = '$Config/NetworkTargetToPing$'

To verify the number of ICMP packets sent, I ran a traditional Ping.exe test and the WMI query used in this workflow and traced these using Netmon. The first two entries in the image below were captured from the WMI query, and the last eight entries captured were from a Ping.exe test using default parameters (four packets).

[Image: Netmon trace, WMI query vs. Ping.exe]

The WMI query results are passed to a condition detection module, which filters on StatusCode and executes the appropriate write action. If StatusCode <> 0, the write action ComputerDown will set state to reflect that the computer is down. If StatusCode = 0, the write action ComputerUp will set state to reflect that the computer is up.

The condition detection modules that filter StatusCode are actually the recovery tasks shown in the Health Service Heartbeat Failure monitor. These are the reserved recoveries, Reserved (Computer Not Reachable – Critical) and Reserved (Computer Not Reachable – Success), respectively.

Under the covers, these reserved recoveries are actually setting state of the Computer Not Reachable monitor, which is defined in the System Center Core Monitoring MP. Ultimately, if StatusCode <> 0, the Computer Not Reachable monitor will change to a critical state and generate the Failed to Connect to Computer alert.
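The StatusCode handling described above boils down to a two-way branch. The following Python sketch is only an illustration of that logic: the function name and the returned dictionary are hypothetical, and the "Healthy" label for the non-critical state is my assumption. The write action names and the alert name are the ones mentioned in the text.

```python
def evaluate_ping(status_code):
    """Map a Win32_PingStatus StatusCode to the outcome described in the post."""
    if status_code == 0:
        # Ping succeeded: ComputerUp sets state to reflect the computer is up
        return {"write_action": "ComputerUp",
                "monitor_state": "Healthy",  # assumed label for the 'up' state
                "alert": None}
    # Any non-zero StatusCode: ComputerDown sets Computer Not Reachable to critical
    return {"write_action": "ComputerDown",
            "monitor_state": "Critical",
            "alert": "Failed to Connect to Computer"}

reachable = evaluate_ping(0)
unreachable = evaluate_ping(11010)  # example non-zero code
```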

Since this is a diagnostic task which runs during a degraded state change event, the Agent will only be pinged once when the Health Service Heartbeat Failure monitor changes to a critical state. If there are any network related problems after this monitor has changed to critical and the diagnostic task has run, there will be no further monitoring regarding the ping status of this Agent and no “Failed to Connect to Computer” alert will be generated.

We can understand the root cause better based on whether the Health Service Heartbeat Failure alert was generated along with the Failed to Connect to Computer alert. If the Health Service Heartbeat Failure alert was generated without the Failed to Connect to Computer alert, logic would tell us that the issue is related neither to a loss of network connectivity nor to the server having shut down or become unresponsive. Both alerts together generally indicate the server is completely unreachable due to a network outage, or that the server is down or unresponsive.
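That reasoning can be written down as a small triage helper. This is a hedged Python sketch of the rule of thumb above, not a diagnostic tool: the function name and the wording of the returned hints are mine, and the case where only the Failed to Connect to Computer alert exists is not covered by the reasoning above.

```python
def likely_cause(heartbeat_alert, connect_alert):
    """Suggest where to look first, based on which of the two alerts exist."""
    if heartbeat_alert and connect_alert:
        return "unreachable: network outage, or server down or unresponsive"
    if heartbeat_alert:
        # The single ping succeeded, so network and server were up at that moment
        return "reachable at ping time: look at the agent rather than network or server"
    if connect_alert:
        return "not covered by this rule of thumb"
    return "no heartbeat failure"

both = likely_cause(heartbeat_alert=True, connect_alert=True)
heartbeat_only = likely_cause(heartbeat_alert=True, connect_alert=False)
```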

Check if Health Service is Running

This diagnostic is defined in the Operations Manager 2007 Agent Management Library and is enabled by default. This workflow uses the Automatic Agent Management Account, which runs under the context of the Management Server Action Account by default, to initiate a probe action named QueryRemoteHS, which is defined in the Operations Manager 2007 Agent Management Library.

Specifically, this probe is initiated on the Health Service Watcher and queries Health Service state and configuration on the Agent, when the Health Service Heartbeat Failure monitor changes to a critical state. This probe module type is further defined in the Windows Core Library. It takes computer name and service name as configuration, and passes the query results through an expression filter and returns the startup type and current state of the Health Service.

If the service doesn’t exist or the computer cannot be contacted, state will reflect this. Depending on output of the diagnostic task, optional recovery workflows may be initialized (i.e., reinstall agent, enable and start Health Service, and continue Health Service if paused), but these recoveries are not enabled by default.