Best Practices–Logging Script Events

Scripts are part of monitoring, and they can fail for any number of reasons. When a monitoring script fails, it is essential to capture that failure. In this post, I will briefly go over two types of events (exception and debug) that every management pack developer should consider for script-based modules.

Exception Event

These events (exceptions) are usually generated when a resource is not accessible for some reason: the script cannot authenticate to the resource, cannot connect to it, the resource does not exist, and so on. In these cases, the script cannot continue as expected.

Capturing the context of a script failure and generating a meaningful alert when an exception occurs lets an operator understand the problem more quickly, without having to put on a developer hat. If exceptions are not captured, a monitor can stop working without anyone knowing about it, at least until a catastrophic failure occurs and everyone is asking why the monitoring tool did not catch it.

Debug Event

Debug events can optionally be handled within a script, and they provide useful information to a monitoring administrator who is tracing a script-based workflow. Log them wherever it makes sense in the script. Read in order, they should tell a short story, so that anyone viewing the sequence of events in the Operations Manager log can follow what is happening in the script.

Debug events typically do not generate an alert, and writing them should be an optional setting that is disabled by default.

Writing Events

Events are written to the Operations Manager log with the LogScriptEvent method of the MOM.ScriptAPI object.

Example:

Dim oAPI
Set oAPI = CreateObject("MOM.ScriptAPI")
' LogScriptEvent(script name, event ID, severity, description); severity 1 = Error
Call oAPI.LogScriptEvent("YourScriptName.vbs", 101, 1, "Something bad happened!")

Here is an example of a PowerShell function that writes events and includes some debugging logic:

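function WriteToEventLog ($id, $level, $param0, $message) {
    # $momapi (a MOM.ScriptAPI object) and $debugFlag are expected at script scope.
    # Level 4 (Information) is treated as debug output, written only when $debugFlag is set.
    if ($debugFlag -or $level -ne 4) {
        $momapi.LogScriptEvent($param0, $id, $level, $message)
    }
}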

In the example above, $param0 is commonly used for the script name, but it can be anything that makes sense to you or an operator.
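To see the debug gating in action without an Operations Manager agent, here is a minimal sketch that reuses the WriteToEventLog logic above (its two branches combined into one condition) with a hypothetical stand-in for MOM.ScriptAPI that records calls instead of writing to the log. On an agent, $momapi would come from New-Object -ComObject "MOM.ScriptAPI" instead. Severity 1 is Error and 4 is Information; the script name, event IDs, and messages are placeholders.

```powershell
# Hypothetical stand-in for MOM.ScriptAPI that records each LogScriptEvent call.
# On an agent: $momapi = New-Object -ComObject "MOM.ScriptAPI"
$momapi = [PSCustomObject]@{ Events = [System.Collections.Generic.List[string]]::new() }
$momapi | Add-Member -MemberType ScriptMethod -Name LogScriptEvent -Value {
    param($scriptName, $eventId, $severity, $description)
    $this.Events.Add("$scriptName|$eventId|$severity|$description")
}

function WriteToEventLog ($id, $level, $param0, $message) {
    # Level 4 (Information) is debug output; write it only when $debugFlag is set
    if ($debugFlag -or $level -ne 4) {
        $momapi.LogScriptEvent($param0, $id, $level, $message)
    }
}

$scriptName = "YourScriptName.ps1"

# Debug events disabled (the default): only the error is written
$debugFlag = $false
WriteToEventLog 100 4 $scriptName "Starting. Connecting to resource."
WriteToEventLog 101 1 $scriptName "Cannot connect to resource."

# Debug events enabled: every step is written, in order
$debugFlag = $true
WriteToEventLog 100 4 $scriptName "Starting. Connecting to resource."
WriteToEventLog 102 4 $scriptName "Connected. Collecting data."
WriteToEventLog 103 4 $scriptName "Finished."

$momapi.Events
```

With $debugFlag off, only the severity 1 event is recorded; once it is on, the informational steps are written too, so an administrator reading the log sees the whole sequence.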

Taking this a step further, consider wrapping code that could throw an exception, such as a connection to a resource, in a try-catch block. This is an excellent way to provide additional context in the event log and, ideally, to bubble the problem up into an alert in the console.

This example uses the Write-EventLog cmdlet, which writes to a classic Windows event log such as Application.

Example:

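try {
    # ...do something, such as connecting to the monitored resource...
} catch [System.Exception] {
    # Exception.ToString() includes the exception type, message, and stack trace
    $message = $_.Exception.ToString()
    # The source must already be registered, for example with New-EventLog
    Write-EventLog -LogName Application -Source YourSource -EventId 101 -EntryType Error -Message $message
}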
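The same try-catch pattern can also write to the Operations Manager log, where a rule could pick the event up and raise an alert. Here is a minimal sketch, assuming a hypothetical Connect-Resource function that throws when the resource is unreachable and the same kind of recording stand-in for MOM.ScriptAPI so it runs off-agent. The alert rule itself is not shown.

```powershell
# Hypothetical stand-in for MOM.ScriptAPI that records each LogScriptEvent call.
# On an agent: $momapi = New-Object -ComObject "MOM.ScriptAPI"
$momapi = [PSCustomObject]@{ Events = [System.Collections.Generic.List[string]]::new() }
$momapi | Add-Member -MemberType ScriptMethod -Name LogScriptEvent -Value {
    param($scriptName, $eventId, $severity, $description)
    $this.Events.Add("$scriptName|$eventId|$severity|$description")
}

# Hypothetical helper that simulates an unreachable resource
function Connect-Resource ($name) {
    throw "Cannot connect to resource '$name'."
}

$resource = "server01"
try {
    Connect-Resource $resource
} catch {
    # Say what the script was doing and include the exception text, so an
    # operator can act on the event without reading the script
    $message = "Failed to connect to '$resource'. Exception: $($_.Exception.Message)"
    # Severity 1 = Error; a rule matching event ID 101 could raise the alert
    $momapi.LogScriptEvent("YourScriptName.ps1", 101, 1, $message)
}

$momapi.Events
```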

Grey Agents With Reason (gray agents)

A few years ago I wrote some TSQL to return all grey agents with their reason code. It worked fine in SCOM 2007, but for some reason it doesn’t work in 2012 environments. I basically just modified the WHERE clause, removing a bunch of SELECT statements. I’m not sure why I added those additional SELECT statements, but there must have been a reason.

I am reposting the TSQL here, updated for SCOM 2012. I also fixed a small bug in the outage days column: my initial query did not use UTC time in the DATEDIFF calculation. Because the outage start time is stored in UTC, this could produce a negative value for newly grey agents, and the result was off by as many hours as your local time zone is offset from UTC.
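To see how the local-time version could go negative, here is a small worked illustration in PowerShell. It mimics DATEDIFF(DD, start, end), which counts the day boundaries crossed between two timestamps, and uses hypothetical times: an agent that went grey at 02:00 UTC, queried one hour later from a server whose clock runs at UTC-5.

```powershell
# Mimics T-SQL DATEDIFF(DD, start, end): counts midnight boundaries crossed
function Get-DayBoundaryDiff ([datetime]$start, [datetime]$end) {
    ($end.Date - $start.Date).Days
}

# Hypothetical outage start, stored in UTC
$outageStartUtc = [datetime]'2014-02-24 02:00'

# One hour later, as seen by a server at UTC-5 (local time) and in UTC
$nowLocal = [datetime]'2014-02-23 22:00'
$nowUtc   = [datetime]'2014-02-24 03:00'

Get-DayBoundaryDiff $outageStartUtc $nowLocal   # -1, like DATEDIFF(DD, ..., GETDATE())
Get-DayBoundaryDiff $outageStartUtc $nowUtc     # 0, like DATEDIFF(DD, ..., GETUTCDATE())
```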

/*
Gray agents with reason
Jonathan Almquist (jonathan@scomskills.com)
Updated 02-24-2014
*/
USE OperationsManagerDW
SELECT
    ME.Path,
    HSO.StartDateTime AS OutageStartDateTime,
    -- Outage times are stored in UTC, so compare against GETUTCDATE()
    DATEDIFF (DD, HSO.StartDateTime, GETUTCDATE()) AS OutageDays,
    HSO.ReasonCode,
    DS.Name AS ReasonString
FROM vManagedEntity AS ME
    INNER JOIN vHealthServiceOutage AS HSO ON HSO.ManagedEntityRowId = ME.ManagedEntityRowId
    -- Match the reason code to its string resource: drop the last dot-separated
    -- segment of the resource name, then strip the common prefix
    INNER JOIN vStringResource AS SR ON HSO.ReasonCode =
        REPLACE(LEFT(SR.StringResourceSystemName, LEN(SR.StringResourceSystemName)
        - CHARINDEX('.', REVERSE(SR.StringResourceSystemName))),
        'System.Availability.StateData.Reasons.', '')
    INNER JOIN vDisplayString AS DS ON DS.ElementGuid = SR.StringResourceGuid
WHERE (HSO.EndDateTime IS NULL) -- outage still open, so the agent is currently grey
    AND (SR.StringResourceSystemName LIKE 'System.Availability.StateData.Reasons.[0-9]%')
    AND DS.LanguageCode = 'ENU'
ORDER BY OutageStartDateTime
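The REPLACE/LEFT/CHARINDEX/REVERSE expression in the join is the least obvious part of the query. This PowerShell sketch applies the same two steps to a single string, so you can see what the join compares against HSO.ReasonCode. The sample resource name is hypothetical; check vStringResource in your own data warehouse for the real values.

```powershell
# Applies the same two steps as the join expression in the query to one string
function Get-ReasonCode ([string]$systemName) {
    $prefix = 'System.Availability.StateData.Reasons.'
    # LEFT(s, LEN(s) - CHARINDEX('.', REVERSE(s))): keep everything before the last dot
    # (CHARINDEX returns 0 when there is no dot, which keeps the whole string)
    $lastDot = $systemName.LastIndexOf('.')
    if ($lastDot -lt 0) { $lastDot = $systemName.Length }
    $withoutLastSegment = $systemName.Substring(0, $lastDot)
    # REPLACE(..., prefix, ''): drop the common prefix, leaving the reason code
    $withoutLastSegment.Replace($prefix, '')
}

# Hypothetical resource name, for illustration only
Get-ReasonCode 'System.Availability.StateData.Reasons.17.Message'   # returns 17
```

One difference to keep in mind: .NET String.Replace is case-sensitive, whereas T-SQL REPLACE compares according to the collation of its input.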
 
 
 
🙂