Category Archives: Best Practices

Coupling time offset to monitoring interval

The requirements gathering phase of the management pack development lifecycle is critically important to the success of the project. One thing that may come out of this phase is a set of existing company health check scripts, and this is an excellent opportunity to incorporate familiar company knowledge into a new monitoring solution.

These scripts might check for some condition that occurred in the past n minutes or hours – n is referred to here as a time offset. This article briefly describes a simple best practice for implementing this type of script in a custom data source.

In its simplest terms, the concept is this: the time offset n and the monitoring interval should share the same configuration.

For example, a script executes the following SQL query:

SELECT COUNT(Column1) as [Count], Name 
FROM MyDatabase
WHERE Timestamp BETWEEN DATEADD(minute,-60,GETDATE()) AND GETDATE()
GROUP BY Name

The part I want to draw your attention to is the WHERE clause, because this is where the time offset comes into the picture – it is how the time offset is identified, and it is what allows the coupling concept to be implemented.

The query above returns records written in the past 60 minutes. When the script is plugged into a data source, that 60-minute lookback window should correspond to the monitoring interval, which is configured on the scheduler that triggers script execution.

So, we conclude that the time offset should equal IntervalSeconds on the simple scheduler module.

Now that we know we can couple the time offset with the monitoring interval, we can use the same value for both by sharing the same configuration. To do this, two minor changes need to be made in any script you plan to incorporate using this concept:

1. Ensure time offset is in seconds.
2. Replace the time offset value with the IntervalSeconds configuration.

In this scenario, we cover points 1 and 2 above by updating the first and second arguments of the DATEADD function like this:
 
WHERE Timestamp BETWEEN DATEADD(second,-$Config/IntervalSeconds$,GETDATE()) AND GETDATE()
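
If you prefer to pass the value into the script as a parameter rather than substituting $Config/IntervalSeconds$ directly into the query text, the script side might look roughly like the sketch below. This is a minimal illustration, not production code – the parameter name and query are carried over from the example above.

# Minimal sketch (PowerShell): the lookback window is driven by the same
# IntervalSeconds value the scheduler uses, so the two can never drift apart.
param([int]$IntervalSeconds)  # assumed to be supplied from $Config/IntervalSeconds$ in the module configuration

$query = @"
SELECT COUNT(Column1) AS [Count], Name
FROM MyDatabase
WHERE Timestamp BETWEEN DATEADD(second,-$IntervalSeconds,GETDATE()) AND GETDATE()
GROUP BY Name
"@

# ...execute $query and return property bags as usual...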

Now compose the module as usual…

Why is using this concept a good practice?

Monitoring interval is a standard override parameter, and inevitably it will be overridden – maybe not on this particular monitor, and maybe not until you’re long gone. But don’t assume the customer is going to keep the default interval – ever.

By coupling script time offsets to monitoring intervals, a basic interval override will not cause monitor state skewing.

Best Practices – Alert Priority

It’s been a while since I’ve written a best practices article. I thought of an interesting one the other day that I wanted to share. Alert priority is one of those configuration elements that seems to stay tucked under the covers in the vast majority of environments I’ve worked in, but it can be an integral piece of operations – it’s also something to consider when developing management packs.

Because of this, I think it’s a good idea to take a look at alert priority from two perspectives: development and operations.

Development Perspective

This is a well-known practice for management pack development, but I’ve actually seen vendors not adhere to it in their management packs (including Microsoft). That is – alert priority should ALMOST ALWAYS be set to medium in a management pack. A developer should not presume the priority of an alert, for a couple of reasons:

1. Developers do not know what a customer considers high priority; and

2. Customers should be able to freely use alert priority when processing alerts, without adverse effects.

The only case in which a developer should set alert priority to anything other than medium is to set it to low. In no case should a management pack be sealed with any alert priority set to high.

Operations Perspective

Alert priority can be an integral part of processing alerts. Whether a command channel is implemented, another System Center product is used (Orchestrator or Service Manager), or a 3rd-party product connects to your environment, alert priority should be one of the top three criteria in determining whether an alert should be processed.
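
As a rough illustration of processing on priority, the sketch below uses the OperationsManager PowerShell module to pull only the alerts that would be forwarded. This is a sketch under assumptions – the numeric values are the usual enumerations for these properties, and the criteria should be adapted to your environment.

# Rough sketch: retrieve only new, critical, high-priority alerts so a notification
# script or connector forwards just what the application team asked for.
# Assumed enumerations: Priority 0=Low, 1=Medium, 2=High; Severity 2=Critical; ResolutionState 0=New
Import-Module OperationsManager
Get-SCOMAlert -Criteria "ResolutionState = 0 AND Severity = 2 AND Priority = 2" |
    Select-Object Name, MonitoringObjectDisplayName, TimeRaised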

Fortunately, most vendor management packs do set default alert priority to medium (or low). Hopefully the days are gone when some developers set default alert priority to high.

This means that you, as a SCOM administrator, can easily add alert priority as a criterion for alert processing. This can be accomplished by setting overrides on select alert-generating unit monitors and rules. In my opinion, the best way to go about this is to use MPViewer, because you can export the management pack to an Excel spreadsheet and hand it to the group that monitors the application for review.

Once the application group has reviewed the spreadsheet and selected the alerts they want to be notified on, they send it back and the SCOM team can set overrides on those workflows by updating alert priority to high. If your company employs the Advanced Operators role, there is really nothing you need to do but provide the spreadsheet, because you can assign a member of the application team the privilege to update alert priority based on their requirements, whenever they want. The latter dramatically reduces administrative overhead and, overall, reduces the complexity of subscriptions.

Finally, the benefit of using alert priority as a processing criterion is enormous. For example, if you simply select all classes of the SQL management pack to send email notifications to the database administration team, the potential for them to ignore all email notifications is high; however, if alert priority is implemented correctly, you enhance the user experience by sending only the notifications the application team actually wants to receive. This is a win-win situation.

 

If you have any best practices suggestions you’d like me to write about, let me know in the comments section or send me an email. Thanks for your readership 🙂

 


Best Practices – Logging Script Events

Scripts are a part of monitoring, and those scripts may sometimes fail for any number of reasons. When a monitoring script fails, it is essential to capture those failures. In this post, I will briefly go over two types of events (exception and debug) that every management pack developer should consider for script-based modules.

Exception Event

These types of events (or exceptions) are usually generated when a resource is not accessible for some reason: the script cannot authenticate to the resource, cannot connect to it, the resource does not exist, and so on. In these cases, the script cannot continue as expected.

Capturing script event context and generating a meaningful alert when there is an exception will enable an operator to more quickly understand a problem without having to put on a developer hat. Not capturing exceptions may result in a situation where a monitor may not be working and nobody ever knows about it, at least until a catastrophic failure occurs and everyone is asking why the monitoring tool did not catch it.

Debug Event

Debug events can optionally be handled within a script, and can provide useful information to a monitoring administrator while tracing a script-based workflow. These events can be logged anywhere within the script that makes sense to you. It should read like a short story to an administrator (sequentially), so anyone can follow what’s happening in the script when viewing the sequence of events in the Operations Manager log.

Debug events typically will not generate an alert, and writing debug events should be an optional setting and disabled by default.

Writing Events

Writing events to the Operations Manager log using the LogScriptEvent method is described here.

Example:

Dim oAPI, oBag 
Set oAPI = CreateObject("MOM.ScriptAPI")
Call oAPI.LogScriptEvent("YourScriptName.vbs",101,1,"Something bad happened!")

Here is an example of a PowerShell function that writes events and includes some debugging logic:

function WriteToEventLog ($id, $level, $param0, $message) {
    # Assumes $momapi = New-Object -ComObject "MOM.ScriptAPI" and $debugFlag were set earlier in the script
    if ($debugFlag) {
        # Debugging enabled: log everything, including debug/informational (level 4) events
        $momapi.LogScriptEvent($param0, $id, $level, $message)
    } elseif ($level -ne 4) {
        # Debugging disabled: log only non-debug events (errors and warnings)
        $momapi.LogScriptEvent($param0, $id, $level, $message)
    }
}

In the above example, Param0 is a common placeholder for the script name, but it can be anything that makes sense to you or an operator.

Taking this a step further, consider also implementing a try-catch where you think there is potential for an exception in the script, like a problem connecting to a resource. This is an excellent way to provide additional context in the event log, and optionally (ideally) bubble up into an alert in the console.

This example uses the Write-EventLog cmdlet, which is described here.

Example:

try {
    # ...do something...
} catch [System.Exception] {
    $message = $_.Exception
    # The event source must already be registered with the Application log
    Write-EventLog -LogName Application -Source YourSource -EventID 101 -EntryType Error -Message $message
}

Best Practices – Display Names

Adding to the best practices category, I will discuss the entity display name.

Note: Check back if you find yourself thinking about this stuff later, because I may update these pages at any time!

 

Entity Display Name

 

The first time I realized entity display name was so important was when I was authoring a monitoring pack some time ago, and while testing some monitors I noticed the alerts had a source of the class system name. I wrote about my experience in a previous article.

I learned a valuable lesson by not assigning a value to entity display name, and it got me thinking about so many other monitoring packs that have the same issue – and some of these packs are from Microsoft, by the way.

The biggest offenders to me are the Windows Operating System packs. There are many monitors in those packs that target types with entity display names of something like "Microsoft Windows Operating System XXX".

Ever notice all the Windows alerts in the Operations console that have a source of "Microsoft Windows Operating System XXX"? When a Windows server is low on free memory, is it useful in any way to draw the operator’s focus to the version of Windows? I don’t think so – and the source column is the all-important default focus column in the console.

The point here is, ALWAYS assign computer name (PrincipalName is preferable in most cases) to entity display name whenever possible – and this is accomplished in the discovery of that object, not in the monitor. I’m not sure why Microsoft and some others haven’t fixed this problem, but I have a hunch it has something to do with versioning – in other words, it was baked into the pack from the beginning, and changing this would break upgrade compatibility for future packs.

I say "whenever possible" because sometimes it’s really not very possible. But 99% of the time, it is. The only time it can be a real challenge is when we’re dealing with distributed applications, client-side monitoring (perspectives), or a connector-based management pack – even then there are ways, but I won’t get into that here.

 

The nuts and bolts of a good system entity display name look like this.

 

Define entity display name for the class type (not required)

<DiscoveryTypes>
  <DiscoveryClass TypeID="YourPack.YourClass">
    <Property TypeID="System!System.Entity" PropertyID="DisplayName"/>
  </DiscoveryClass>
</DiscoveryTypes>

 
Set entity display name property in a registry discovery (suggesting PrincipalName)

<InstanceSettings>
  <Settings>
    <Setting>
      <Name>$MPElement[Name="System!System.Entity"]/DisplayName$</Name>
      <Value>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$</Value>
    </Setting>
  </Settings>
</InstanceSettings>

 
Set entity display name property in a script (programmatically passing a name)

call oInst.AddProperty("$MPElement[Name='System!System.Entity']/DisplayName$", PrincipalName)

 
Set entity display name property in a script (auto-assign a name – netbios example)

oInstance.AddProperty "$MPElement[Name='System!System.Entity']/DisplayName$", "$Target/Property[Type='Windows!Microsoft.Windows.Computer']/NetbiosComputerName$"

 

Moving on to other types of display names…

 

Regular Display Names

 

Other types of display names are pretty straightforward throughout the pack – these are just user friendly names your operators and administrators see in the UI. These display names do affect user experience, and are just as important as the entity display name.

Be sure you are familiar with namespaces before getting into display names, because namespace is equally important.

Display names are pretty easy to understand. It’s a short snippet in the pack that assigns a user friendly name to any element. It looks like this:

 

<DisplayString ElementID="YourPack.YourElement">
  <Name>I'm a display name!</Name>
  <Description>I'm a description of this thing!</Description>
</DisplayString>

 

It’s simple. In the LanguagePacks section of the pack (or fragment), just add a few lines to give a user friendly name to the elements that you’ve created – right? Don’t forget a description, because that’s always helpful for others that may be looking at your stuff – don’t pull a "superwall" (inside joke).

For the name – be succinct: two to five words that briefly describe the discovery, rule, or monitor.

For the description – go to town. This is an opportunity to document what the workflow does, and it’s always good to be verbose here.

At a minimum, ALWAYS provide a display string for classes, discoveries, rules, and monitors. For classes, be minimalistic – three or fewer words, if possible. Keep in mind that classes in other packs might have the same name, so differentiate your class display name from those other classes. An example of a poor name is "Logical Disk", because this name is already assigned to a base class in the Microsoft packs.

Other elements, like relationships, data sources, and monitor types, aren’t quite as visible to most operators or administrators. Although it is still imperative to use a proper namespace, missing a display name on these elements probably will not degrade usability much, if at all.

 

This is an open forum – your comments are welcome. I do not check or moderate comments (I support freedom of speech).

If you have a particular subject you’d like me to talk about, or to call me out on anything I write, send me a message at https://projects.scomskills.com/messenger/suggestion.

 

😉

Best Practices – Targeting Introduction

UPDATED 03/15/2015

Targets need to be carefully selected for everything from overrides, to creating new monitors and discoveries. In my experience, this is one of the most difficult concepts to grasp. A crash course in object-oriented programming (OOP) concepts is a great primer to understanding the service model framework in SCOM.

Understanding classes and how inheritance works is essential in selecting or creating targets. Every class may have one or more instances, and each class may be typed (or specialized). When a class is typed, it can be considered a concrete class in SCOM. Concrete classes are typically the best target for monitoring.

For example, Windows Computer is not considered a concrete class, and in most cases should not be targeted for monitoring. See this article for an example of how the Windows Computer class is typed.

Windows Computer is typed because there are several different versions, and each version may have different or additional monitoring requirements. Typing allows the developer to easily implement these monitoring differences, and also allows the administrator to select the correct type for new monitoring or overrides.
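
If you want to verify how a class is typed before choosing a target, the OperationsManager PowerShell module offers a quick way to poke around. This is only a sketch – the display name below is an example, so substitute whichever typed class you are evaluating.

# Rough sketch: inspect a typed (concrete) class and its discovered instances before targeting it
Import-Module OperationsManager
# Example display name; substitute the typed class you are considering as a target
$class = Get-SCOMClass -DisplayName "Windows Server 2012 R2 Operating System"
# Listing the instances confirms the target covers exactly the objects you expect
Get-SCOMClassInstance -Class $class | Select-Object DisplayName, HealthState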

 

Here are some general rules for targeting:

  • Target the most derived class (typed class) for new rules and monitors
  • Avoid targeting Windows Computer for monitors* (see below)
  • Target new rules to Windows Computer only if it applies to all versions of Windows
  • Avoid creating new rules or monitors in a disabled state, only to enable for a group* (see below)
  • If there is not a good target for a new monitor, then create a new class and discovery

 

*The first best practice in this series:

Avoid creating monitors in a disabled state that target Windows Computer. There are only very specific situations that call for a monitor to be disabled by default. Not having a valid target is not one of them – it indicates the need to create a new class.

Targeting Windows Computer for new monitors will distribute that pack to every Windows instance in the environment. This in itself is not a detrimental problem – but if the monitor is disabled by default, the result is an empty white circle in the Operations Console, and this is confusing to operators.

Imagine if this was a regular practice – Health Explorer could eventually turn into a list of empty white circles and monitors that do not apply to that instance.

Understanding targeting is a preface to service models, which is the framework for application health rollup. Omitting the service model will inhibit the ability to create useful dashboards in the future.

If you have a particular subject you’d like me to talk about, or to call me out on anything I write, send me a message at https://projects.scomskills.com/messenger/suggestion.