
Suppressing Module Events (event id 21405 example)

I was recently faced with an uncommon scenario, where I needed a script-based discovery to exit without submitting data. Exiting a script-based discovery without submitting discovery data, even if it’s an empty data item, is not considered a good practice. But in this particular scenario, I understood the reasons and the resulting instance space was minimal, so I was open to finding a trick that would accomplish this without having 21405 events logged on the agent-managed computer at each interval if the instance did not pass the test.

What I discovered was the CommandExecuterEventPolicyType in the System.CommandExecuterSchema. This is a Schema Type defined in the System.Library.

All script-based probes have a default event policy that describes how exit codes, standard error, and standard out are handled by the system. If any of these data streams match an event policy expression, the system will log an event to the Operations Manager log on that agent-managed computer.

For example, the Microsoft.Windows.ScriptProbeDiscoveryBase has a default policy as follows:

<DefaultEventPolicy> 
<StdOutMatches Operator="DoesNotMatchRegularExpression">{&lt;DataItem.+/DataItem\b*&gt;}|{&lt;DataItem.*/&gt;}</StdOutMatches>
<StdErrMatches>\a+</StdErrMatches>
<ExitCodeMatches>[^0]+</ExitCodeMatches>
</DefaultEventPolicy>
 
In my particular scenario, this default policy was capturing an exited discovery script as an error and writing the following event 21405 on those agents.
 
The process started at 2:04:14 PM failed to create System.Discovery.Data, no errors detected in the output
 
By overriding the default event policy with the configuration below, and without changing anything else in the module, these built-in errors can be suppressed.
 
<EventPolicy> 
<StdOutMatches>^suppress_event_21405$</StdOutMatches>
<StdErrMatches>^suppress_event_21405$</StdErrMatches>
</EventPolicy>

In this example, the expression serves two purposes. First, it will never match actual standard out or standard error, so no event is logged. Second, its wording documents why the policy was overridden, so anyone reading the module configuration can see the intent.
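As a rough illustration of why the anchored expression never matches, here is a minimal Python sketch. It assumes Python's `re` module as a stand-in for the Operations Manager expression engine, which is a different implementation, and the sample outputs are hypothetical.

```python
import re

# Pattern copied from the overridden event policy above.
SUPPRESS_PATTERN = re.compile(r"^suppress_event_21405$")

def policy_matches(stream_text: str) -> bool:
    """Return True if the stream text would match the suppression pattern."""
    return SUPPRESS_PATTERN.search(stream_text) is not None

# Hypothetical outputs a discovery script might write to standard out or standard error.
samples = [
    "",  # script exited without writing anything
    '<DataItem type="System.DiscoveryData"></DataItem>',
    "Access denied",
]

results = [policy_matches(sample) for sample in samples]
print(results)
```

Only a stream consisting of exactly the text suppress_event_21405 would match, and a real script has no reason to print that.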

Overriding a default event policy like this is very uncommon, and I do not recommend it unless you have a good understanding of how it may impact your workflow – but it is a nice trick to use if you find yourself in a unique situation that calls for it.

As a side note – this scenario probably could be handled in a more sophisticated way, by composing a new module that would filter the data stream before it even reaches a discovery module. This way would produce no errors, so overriding event policies would not be required. Food for thought!

Coupling time offset to monitoring interval

The requirements gathering phase of the management pack development lifecycle is critically important to the success of the project. One thing that often comes out of this phase is a set of existing company health check scripts, and this is an excellent opportunity to incorporate familiar company knowledge into a new monitoring solution.

These scripts might be used to check for some condition that may have occurred in the past n minutes or hours – n is referred to as a time offset in this case. This article briefly describes a simple concept, and a best practice, for implementing this type of script in a custom data source.

In its simplest form, the concept is that n and the monitoring interval share the same configuration value.

For example, a script executes the following SQL query:

SELECT COUNT(Column1) as [Count], Name 
FROM MyDatabase
WHERE Timestamp BETWEEN DATEADD(minute,-60,GETDATE()) AND GETDATE()
GROUP BY Name

The part I want to draw your attention to is the WHERE clause in the SQL query, because this is where time offset comes into the picture – it is how time offset is identified, and allows for the implementation of this coupling concept.

The query above returns records written in the 60 minutes before “now”. When the script is plugged into a data source, “now” is the moment the scheduler triggers the script, which happens once per monitoring interval.

So, as long as the time offset equals IntervalSeconds on the simple scheduler module, each execution picks up exactly the records written since the previous execution.

Now that we know we can couple time offset with monitoring interval, we can easily use the same value for both by sharing the same configuration. In order to do this, two minor changes need to be made in any script you plan to incorporate using this concept:

1. Ensure time offset is in seconds.
2. Replace the time offset value with the IntervalSeconds configuration.

In this scenario, we cover points 1 and 2 above by updating the first and second arguments of the DATEADD function (the date part and the number) like this:
 
WHERE Timestamp BETWEEN DATEADD(second,-$Config/IntervalSeconds$,GETDATE()) AND GETDATE()
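To make the substitution concrete, the following Python sketch shows what the clause looks like once a configured interval has been filled in. The function name `build_where_clause` is hypothetical and only stands in for the value replacement; it is not part of any Operations Manager API.

```python
def build_where_clause(interval_seconds: int) -> str:
    """Build the WHERE clause with the time offset taken from the monitoring interval."""
    # Reject anything that is not a positive whole number of seconds.
    if isinstance(interval_seconds, bool) or not isinstance(interval_seconds, int):
        raise ValueError("interval_seconds must be an integer")
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (
        "WHERE Timestamp BETWEEN "
        f"DATEADD(second,-{interval_seconds},GETDATE()) AND GETDATE()"
    )

# A 60 minute interval expressed in seconds, matching the original query's offset.
clause = build_where_clause(3600)
print(clause)
```

With an interval of 3600 seconds the clause covers the same 60 minutes as the original query, but it now follows the interval whenever that value is overridden.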

Now compose the module as usual…

Why is using this concept a good practice?

Monitoring interval is a standard override parameter, and inevitably it will be overridden – maybe not on this particular monitor, and maybe not until you’re long gone. But don’t assume the customer is going to keep the default interval – ever.

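By coupling script time offsets to monitoring intervals, a basic interval override will not cause monitor state skewing. If the offset stayed hard-coded at 60 minutes and the interval were overridden to 30 minutes, consecutive runs would count the same records twice. If the interval were overridden to 120 minutes, records written in the first hour of each interval would never be counted.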
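The skew can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the record timestamps are made up, times are plain seconds from an arbitrary start, and the window is treated as half-open, whereas SQL BETWEEN is inclusive at both ends. The function name `count_seen` is hypothetical.

```python
def count_seen(event_times, interval, lookback, horizon):
    """Sum the records counted across all runs up to horizon.

    A run happens every `interval` seconds and counts records whose
    timestamp lies in (run_time - lookback, run_time].
    """
    total = 0
    run_time = interval
    while run_time <= horizon:
        window_start = run_time - lookback
        total += sum(1 for ts in event_times if window_start < ts <= run_time)
        run_time += interval
    return total

# Hypothetical data: one record every 10 minutes over two hours (12 records).
events = list(range(300, 7200, 600))
HORIZON = 7200

default_run = count_seen(events, interval=3600, lookback=3600, horizon=HORIZON)
# Interval overridden to 30 minutes, offset left hard-coded at 60 minutes.
short_fixed = count_seen(events, interval=1800, lookback=3600, horizon=HORIZON)
# Interval overridden to 120 minutes, offset left hard-coded at 60 minutes.
long_fixed = count_seen(events, interval=7200, lookback=3600, horizon=HORIZON)
# Same overrides with the offset coupled to the interval.
short_coupled = count_seen(events, interval=1800, lookback=1800, horizon=HORIZON)
long_coupled = count_seen(events, interval=7200, lookback=7200, horizon=HORIZON)

print(default_run, short_fixed, long_fixed, short_coupled, long_coupled)
```

In this sketch the hard-coded offset over-counts when the interval is shortened and under-counts when it is lengthened, while the coupled offset counts each of the 12 records exactly once in both cases.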