Best Practices–Logging Script Events

February 28, 2014 Jonathan Almquist

Scripts are a part of monitoring, and those scripts can fail for any number of reasons. When a monitoring script fails, it is essential to capture that failure. In this post, I will briefly go over two types of events (exception and debug) that every management pack developer should consider for script-based modules.

Exception Event

Exception events are usually generated when a resource is not accessible for some reason: the script cannot authenticate to the resource, cannot connect to it, the resource does not exist, and so on. In this case, the script cannot continue as expected.

Capturing the script’s event context and generating a meaningful alert when an exception occurs lets an operator understand a problem more quickly, without having to put on a developer hat. Not capturing exceptions can leave a monitor silently broken with nobody aware of it, at least until a catastrophic failure occurs and everyone is asking why the monitoring tool did not catch it.

Debug Event

Debug events can optionally be handled within a script, and they can provide useful information to a monitoring administrator who is tracing a script-based workflow. These events can be logged anywhere within the script that makes sense to you. Taken together, they should read sequentially like a short story, so anyone can follow what’s happening in the script when viewing the sequence of events in the Operations Manager log.

Debug events typically will not generate an alert, and writing debug events should be an optional setting and disabled by default.

Writing Events

Writing events to the Operations Manager log uses the LogScriptEvent method of the MOM.ScriptAPI object, which is described in the MSDN documentation for that method.

Example:

Dim oAPI
Set oAPI = CreateObject("MOM.ScriptAPI")
' Arguments: script name, event ID, severity (1 = Error), description
Call oAPI.LogScriptEvent("YourScriptName.vbs", 101, 1, "Something bad happened!")

Here is an example of a PowerShell function that writes events and includes some debugging logic:

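# $momapi is the MOM.ScriptAPI object; $debugFlag is assumed to be a script
# parameter that defaults to $false (debug logging disabled).
$momapi = New-Object -ComObject "MOM.ScriptAPI"

function WriteToEventLog ($id, $level, $param0, $message) {
    # Level 4 (Information) events are treated as debug events and written only
    # when $debugFlag is set; errors (1) and warnings (2) are always written.
    if ($debugFlag -or $level -ne 4) {
        $momapi.LogScriptEvent($param0, $id, $level, $message)
    }
}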

In the above example, $param0 is a common placeholder for the script name, but it can be anything that makes sense to you or an operator.
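The filtering rule can be checked in isolation. The following is a language-neutral model in Python, not SCOM code. Like WriteToEventLog, it assumes that level 4 (Information) marks a debug event. A plain list stands in for the Operations Manager log, and the script name and event IDs are made up for illustration.

```python
# Language-neutral model of the WriteToEventLog filter (not SCOM code).
# Assumption: level 4 (Information) marks a debug event.
INFORMATION = 4


def should_log(level: int, debug_flag: bool) -> bool:
    """Errors and warnings are always logged; level 4 only when debugging is on."""
    return debug_flag or level != INFORMATION


def write_to_event_log(events, event_id, level, script_name, message, debug_flag=False):
    # debug_flag defaults to False, i.e. debug logging is disabled by default.
    if should_log(level, debug_flag):
        events.append((script_name, event_id, level, message))


events = []
write_to_event_log(events, 100, 4, "MyScript.ps1", "Script started")              # dropped
write_to_event_log(events, 101, 1, "MyScript.ps1", "Cannot connect to resource")  # written
write_to_event_log(events, 102, 4, "MyScript.ps1", "Connected", debug_flag=True)  # written
print([e[1] for e in events])  # prints [101, 102]
```

Because the flag defaults to off, a script built this way writes only errors and warnings until an operator turns debugging on.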

Taking this a step further, consider also implementing a try/catch block wherever there is potential for an exception in the script, such as a problem connecting to a resource. This is an excellent way to provide additional context in the event log and, ideally, to bubble that context up into an alert in the console.

This example uses the Write-EventLog cmdlet, which is described in the Windows PowerShell cmdlet documentation.

Example:

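try {
    # ...do something, such as connecting to a resource...
} catch [System.Exception] {
    # Capture the exception text. The event source must already be registered
    # (for example, with New-EventLog) or Write-EventLog will itself fail.
    $message = $_.Exception.Message
    Write-EventLog -LogName Application -Source YourSource -EventId 101 -EntryType Error -Message $message
}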
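The try/catch pattern above boils down to this: attempt the risky call, and on failure write an error event that says what failed and why. Here is a minimal Python model of that flow, not SCOM code. `connect_to_resource` is a hypothetical stand-in that always fails, and the list of dictionaries stands in for the event log.

```python
# Language-neutral model of the try/catch logging pattern (not SCOM code).
ERROR = 1  # same severity value used for errors in the examples above


def connect_to_resource(name):
    # Hypothetical stand-in for any call that can fail.
    raise ConnectionError(f"cannot connect to {name}")


def run_check(events, resource):
    try:
        connect_to_resource(resource)
    except Exception as exc:
        # Record what failed and why, so an operator can act without reading the script.
        events.append({
            "id": 101,
            "level": ERROR,
            "message": f"Check failed for resource '{resource}': {exc}",
        })
        return False
    return True


events = []
ok = run_check(events, "SQL01")
print(ok, events[0]["message"])
```

Including the resource name in the message is what turns a generic failure into something an operator can act on from an alert.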

TSQL

Grey Agents With Reason (gray agents)

February 24, 2014 Jonathan Almquist

A few years ago I wrote some TSQL to return all grey agents with their reason codes. It worked fine in SCOM 2007, but for some reason it doesn’t work in 2012 environments. I modified the WHERE clause, removing a number of SELECT statements; I’m not sure why I originally added those statements, but there must have been a reason.

I am reposting the TSQL here, updated for SCOM 2012. I also fixed a small bug in the OutageDays column: my original query did not use UTC time in the DATEDIFF calculation, which produced a negative value for newly grey agents and was off by n hours depending on your local time zone.
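To see why local time produced negative values, here is a small Python model of DATEDIFF(DD, ...). DATEDIFF counts calendar-day boundaries crossed, not elapsed 24-hour periods. The model assumes StartDateTime is stored in UTC, which is what the fix implies. It also assumes a hypothetical server clock at UTC-8, and the timestamps are made up for illustration.

```python
from datetime import datetime, timedelta


def datediff_days(start: datetime, end: datetime) -> int:
    """Model of T-SQL DATEDIFF(DD, start, end): counts day boundaries crossed."""
    return (end.date() - start.date()).days


outage_start_utc = datetime(2014, 2, 24, 2, 0)  # hypothetical outage start (UTC)
now_utc = datetime(2014, 2, 24, 3, 0)           # one hour later
now_local = now_utc - timedelta(hours=8)        # assumption: server at UTC-8

buggy = datediff_days(outage_start_utc, now_local)  # comparable to using GETDATE()
fixed = datediff_days(outage_start_utc, now_utc)    # comparable to using GETUTCDATE()
print(buggy, fixed)  # prints -1 0
```

The agent has been grey for one hour, but comparing a UTC start time against local "now" puts "now" on the previous calendar day.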

/*
Gray agents with reason
Jonathan Almquist (jonathan@scomskills.com)
Updated 02-24-2014
*/
USE OperationsManagerDW
SELECT
    ME.Path,
    HSO.StartDateTime AS OutageStartDateTime,
    DATEDIFF (DD, HSO.StartDateTime, GETUTCDATE()) AS OutageDays,
    HSO.ReasonCode,
    DS.Name AS ReasonString
FROM  vManagedEntity AS ME INNER JOIN
    vHealthServiceOutage AS HSO ON HSO.ManagedEntityRowId = ME.ManagedEntityRowId INNER JOIN
    vStringResource AS SR ON HSO.ReasonCode = 
    REPLACE(LEFT(SR.StringResourceSystemName, LEN(SR.StringResourceSystemName)
    - CHARINDEX('.', REVERSE(SR.StringResourceSystemName))), 
    'System.Availability.StateData.Reasons.', '') INNER JOIN
    vDisplayString AS DS ON DS.ElementGuid = SR.StringResourceGuid
WHERE (HSO.EndDateTime IS NULL)
    AND (SR.StringResourceSystemName LIKE 'System.Availability.StateData.Reasons.[0-9]%')
    AND DS.LanguageCode = 'ENU'
ORDER BY OutageStartDateTime
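The join to vStringResource derives the reason code from the string resource’s system name. It trims the last dotted segment and then strips the namespace prefix. The Python function below mirrors that expression so you can reason about it. The sample system name is hypothetical, chosen only to match the LIKE filter in the WHERE clause.

```python
PREFIX = "System.Availability.StateData.Reasons."


def reason_code(system_name: str) -> str:
    """Mirror of the join expression on vStringResource.StringResourceSystemName."""
    # LEN(s) - CHARINDEX('.', REVERSE(s)) is the length up to the last '.',
    # so LEFT(...) drops the final dotted segment. With no '.', CHARINDEX is 0
    # and LEFT returns the whole string.
    last_dot = system_name.rfind(".")
    left = system_name if last_dot == -1 else system_name[:last_dot]
    # REPLACE(..., PREFIX, '') strips the namespace, leaving the numeric code.
    return left.replace(PREFIX, "")


print(reason_code("System.Availability.StateData.Reasons.17.Message"))  # prints 17
```

One difference to keep in mind: under a case-insensitive collation, T-SQL REPLACE ignores case, while Python’s str.replace does not.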

:)

Administrative Tasks, TSQL

Agent Management–List Primary and Failover Configuration

January 9, 2014 Jonathan Almquist

Something I don’t like about using the SDK (PowerShell) to manage agents is the Get-* cmdlets used to return information: large-scale queries take too long! The SDK is typically pretty slow in this regard, and that’s a shame, because I find myself writing TSQL to accomplish tasks that the SDK should be able to handle promptly.

Recently I wrote some TSQL that returns all agents with their associated primary and failover management servers. This is very informative when someone asks "where does this agent fail over to?", and it’s a speedy foundation for an automation process that expedites agent assignment.

Here you go!

-- Runs against the operational database (default name shown; change it if yours differs)
USE OperationsManager
SELECT rgv.SourceObjectPath AS [Agent], rgv.TargetObjectPath AS [ManagementServer], 
       CASE
              WHEN rtv.DisplayName = 'Health Service Communication' THEN 'Primary'
              ELSE 'Failover'
       END AS [Type]
FROM ManagedTypeView mt INNER JOIN
       ManagedEntityGenericView AS meg ON meg.MonitoringClassId = mt.Id INNER JOIN
       RelationshipGenericView rgv ON rgv.SourceObjectId = meg.Id INNER JOIN
       RelationshipTypeView rtv ON rtv.Id = rgv.RelationshipId
WHERE mt.Name = 'Microsoft.SystemCenter.Agent' AND
       rtv.Name LIKE 'Microsoft.SystemCenter.HealthService%Communication' AND
       rgv.IsDeleted = 0
ORDER BY rgv.SourceObjectPath ASC, rtv.DisplayName ASC
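If you feed this output into an automation process, it usually helps to collapse the rows into one record per agent. Here is a minimal Python sketch. It assumes rows of (agent, management server, relationship display name) as returned by the query. The server names and the failover display name are hypothetical, and the classification reuses the same rule as the CASE expression.

```python
from collections import defaultdict


def classify(relationship_display_name: str) -> str:
    # Same rule as the CASE expression in the query.
    if relationship_display_name == "Health Service Communication":
        return "Primary"
    return "Failover"


def summarize(rows):
    """rows: (agent, management_server, relationship_display_name) tuples."""
    summary = defaultdict(lambda: {"Primary": None, "Failover": []})
    for agent, ms, rel in rows:
        if classify(rel) == "Primary":
            summary[agent]["Primary"] = ms
        else:
            summary[agent]["Failover"].append(ms)
    return dict(summary)


rows = [  # hypothetical query output
    ("agent01.contoso.com", "ms01.contoso.com", "Health Service Communication"),
    ("agent01.contoso.com", "ms02.contoso.com", "Hypothetical Failover Display Name"),
    ("agent02.contoso.com", "ms02.contoso.com", "Health Service Communication"),
]
result = summarize(rows)
print(result["agent01.contoso.com"])
```

An agent with an empty failover list is an easy target for the kind of assignment automation mentioned above.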

Getting this information through the SDK could take several minutes in small to medium-sized environments, and maybe upwards of 15 to 30 minutes in larger environments. This little bit of TSQL returns in 1 to 2 seconds. Eat that!

🙂

Authoring

New MatchCount Configuration in SCOM 2012

January 8, 2014 Jonathan Almquist

I’ve been meaning to write about this for a while, because I was thrilled when I found this new configuration element in the expression filter module when SCOM 2012 hit the press.

For reference, here are the differences in the expression filters:

SCOM 2007: http://msdn.microsoft.com/en-us/library/ee692962.aspx

SCOM 2012: http://msdn.microsoft.com/en-us/library/jj129836.aspx

Previously, the System.ExpressionFilter did not include suppression – today it does!

What this means is that we can now count the number of passes through a condition detection, and the module will pass data to the next module only once the number of matches reaches the MatchCount value provided in the configuration.

It doesn’t sound like a big deal really – but it is. I’ve had cases where I needed to count condition passes, and the only way to do it before was to include a consolidation module. This was not fun and it turned out to be a lot more work than was necessary – and it was confusing to the customer when they looked at the code.

What I don’t like so much is that Microsoft doesn’t expose this new configuration in their base monitoring at this time. For example, it’s not possible to override the match count, or even the interval, for a service monitor that you created using the service monitoring template. To me, it doesn’t make sense to introduce a new configuration element without providing a way to override it, especially one as valuable as this.

The default monitoring for Windows services (at this time) samples every 30 seconds with a match count of 2. This equates to a state change within about 60 seconds of service downtime.
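Here is a rough Python model of the suppression behavior and the 60-second arithmetic. Two assumptions are mine and not taken from the module documentation: the matches must be consecutive, and the counter resets after data is passed. Each boolean represents one sample, where True means the service is not running.

```python
def passed_samples(matches, match_count):
    """Return indices of samples the filter would pass to the next module.

    Assumptions (not from the SCOM documentation): matches must be
    consecutive, and the counter resets after each pass.
    """
    passed, streak = [], 0
    for i, matched in enumerate(matches):
        streak = streak + 1 if matched else 0
        if streak == match_count:
            passed.append(i)
            streak = 0
    return passed


def worst_case_detection_seconds(interval_seconds, match_count):
    # A service that stops just after a sample is first seen at the next sample,
    # and the filter passes data only after match_count such samples.
    return interval_seconds * match_count


print(passed_samples([True, False, True, True], 2))  # prints [3]
print(worst_case_detection_seconds(30, 2))           # prints 60
```

A single failed sample followed by a recovery (the first two samples above) never reaches the next module, which is exactly the flapping protection MatchCount buys you.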

What I am providing here is a Windows service monitoring VSAE fragment that allows you to override both the interval and the match count. I’ve also included an additional state value (8, alongside 4 for Running) in the healthy expression to account for service-not-found conditions. I added this condition because sometimes a pack needs to account for upgrade scenarios where a service name changes; you don’t want an alert on a service that was renamed during an upgrade!

By the way, MatchCount has nothing to do with service monitoring – it’s a part of the expression filter, and can be used anywhere. This is just a working example of how you can use it in a custom service monitor type.

Here you go!

<ManagementPackFragment SchemaVersion="2.0" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <TypeDefinitions>
    <MonitorTypes>
      <UnitMonitorType ID="Example.CustomeModuleLibrary.MonitorType.CheckServiceState" Accessibility="Public">
        <MonitorTypeStates>
          <MonitorTypeState ID="MTS_Running" />
          <MonitorTypeState ID="MTS_NotRunning" />
        </MonitorTypeStates>
        <Configuration>
          <xsd:element name="ComputerName" type="xsd:string" />
          <xsd:element name="ServiceName" type="xsd:string" />
          <xsd:element name="IntervalSeconds" type="xsd:integer" />
          <xsd:element name="MatchCount" type="xsd:integer" />
        </Configuration>
        <OverrideableParameters>
          <OverrideableParameter ID="IntervalSeconds" Selector="$Config/IntervalSeconds$" ParameterType="int" />
          <OverrideableParameter ID="MatchCount" Selector="$Config/MatchCount$" ParameterType="int" />
        </OverrideableParameters>
        <MonitorImplementation>
          <MemberModules>
            <DataSource ID="DS" TypeID="Windows!Microsoft.Windows.Win32ServiceInformationProvider">
              <ComputerName>$Config/ComputerName$</ComputerName>
              <ServiceName>$Config/ServiceName$</ServiceName>
              <Frequency>$Config/IntervalSeconds$</Frequency>
            </DataSource>
            <ProbeAction ID="Probe" TypeID="Windows!Microsoft.Windows.Win32ServiceInformationProbe">
              <ComputerName>$Config/ComputerName$</ComputerName>
              <ServiceName>$Config/ServiceName$</ServiceName>
            </ProbeAction>
            <ConditionDetection ID="CD_ServiceRunning" TypeID="System!System.ExpressionFilter">
              <Expression>
                <RegExExpression>
                  <ValueExpression>
                    <XPathQuery Type="Integer">Property[@Name='State']</XPathQuery>
                  </ValueExpression>
                  <Operator>MatchesRegularExpression</Operator>
                  <Pattern>^(4|8)$</Pattern> 
                </RegExExpression>
              </Expression>
            </ConditionDetection>
            <ConditionDetection ID="CD_ServiceNotRunning" TypeID="System!System.ExpressionFilter">
              <Expression>
                <RegExExpression>
                  <ValueExpression>
                    <XPathQuery Type="Integer">Property[@Name='State']</XPathQuery>
                  </ValueExpression>
                  <Operator>DoesNotMatchRegularExpression</Operator>
                  <Pattern>^(4|8)$</Pattern>
                </RegExExpression>
              </Expression>
              <SuppressionSettings>
                <MatchCount>$Config/MatchCount$</MatchCount>
              </SuppressionSettings>
            </ConditionDetection>
          </MemberModules>
          <RegularDetections>
            <RegularDetection MonitorTypeStateID="MTS_Running">
              <Node ID="CD_ServiceRunning">
                <Node ID="DS" />
              </Node>
            </RegularDetection>
            <RegularDetection MonitorTypeStateID="MTS_NotRunning">
              <Node ID="CD_ServiceNotRunning">
                <Node ID="DS" />
              </Node>
            </RegularDetection>
          </RegularDetections>
          <OnDemandDetections>
            <OnDemandDetection MonitorTypeStateID="MTS_Running">
              <Node ID="CD_ServiceRunning">
                <Node ID="Probe" />
              </Node>
            </OnDemandDetection>
            <OnDemandDetection MonitorTypeStateID="MTS_NotRunning">
              <Node ID="CD_ServiceNotRunning">
                <Node ID="Probe" />
              </Node>
            </OnDemandDetection>
          </OnDemandDetections>
        </MonitorImplementation>
      </UnitMonitorType>
    </MonitorTypes>
  </TypeDefinitions>
  <LanguagePacks>
    <LanguagePack ID="ENU" IsDefault="true">
      <DisplayStrings>
        <DisplayString ElementID="Example.CustomeModuleLibrary.MonitorType.CheckServiceState" SubElementID="IntervalSeconds">
          <Name>Interval (seconds)</Name>
          <Description>Check service state interval.</Description>
        </DisplayString>
        <DisplayString ElementID="Example.CustomeModuleLibrary.MonitorType.CheckServiceState" SubElementID="MatchCount">
          <Name>Match Count</Name>
          <Description>Number of intervals service is not running before changing monitor state.</Description>
        </DisplayString>
      </DisplayStrings>
    </LanguagePack>
  </LanguagePacks>
</ManagementPackFragment>

Now you can implement new unit monitors that use this monitor type, and extend to your operators the ability to override interval and match count. You might want to replace "Example" with your company name before implementing in your library.

🙂

Authoring, Reporting, VSAE Fragment

Report Fragment–Visual Studio Authoring Extensions

January 7, 2014 Jonathan Almquist

Developing reports in SCOM is quite a bit different from developing any type of monitoring workflow. You really need to ramp up your skills on a couple of different tools and languages to become a good report developer.

In this post, I will cover a typical VSAE fragment for deploying report and stored procedure files: the report files are deployed to the report server, and the stored procedures are installed in the data warehouse.

This post covers the fragment essentials; it does not get into report or stored procedure development. It is intended as a quick reference that lets developers plug in the elements needed to push the .rdl and .sql resource files with their management pack.

At the end, I will provide some essential elements that need to be included in your .sql files to satisfy "install", "upgrade", and "uninstall" functionality, and to set the execution permissions that allow the data reader account to run the report in a standard environment.

In this example, there is a main report and a detail report. The detail report may be launched by clicking on an element in the main report – consider this a linked report.

Here’s the entire fragment:

<ManagementPackFragment SchemaVersion="2.0" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Reporting>
    <DataWarehouseScripts>
      <DataWarehouseScript ID="MyReports.Deploy.MyReportAvailabilityDataDetailGet.Script" Accessibility="Public">
        <InstallScript>Res.MyReports.Deploy.MyReportAvailabilityDataDetailGet.Script.Install</InstallScript>
        <UninstallScript>Res.MyReports.Deploy.MyReportAvailabilityDataDetailGet.Script.Uninstall</UninstallScript>
        <UpgradeScript>Res.MyReports.Deploy.MyReportAvailabilityDataDetailGet.Script.Upgrade</UpgradeScript>
      </DataWarehouseScript>
      <DataWarehouseScript ID="MyReports.Deploy.MyReportAvailabilityDataGet.Script" Accessibility="Public">
        <InstallScript>Res.MyReports.Deploy.MyReportAvailabilityDataGet.Script.Install</InstallScript>
        <UninstallScript>Res.MyReports.Deploy.MyReportAvailabilityDataGet.Script.Uninstall</UninstallScript>
        <UpgradeScript>Res.MyReports.Deploy.MyReportAvailabilityDataGet.Script.Upgrade</UpgradeScript>
      </DataWarehouseScript>
    </DataWarehouseScripts>
    <Reports>
      <Report ID="MyReports.Availability.Main" Accessibility="Public" Visible="true">
        <Dependencies>
          <DataWarehouseScript>MyReports.Deploy.MyReportAvailabilityDataGet.Script</DataWarehouseScript>
        </Dependencies>
        <ReportDefinition>Res.MyReports.Availability.Main</ReportDefinition>
      </Report>
      <Report ID="MyReports.Availability.Detail" Accessibility="Public" Visible="true">
        <Dependencies>
          <DataWarehouseScript>MyReports.Deploy.MyReportAvailabilityDataDetailGet.Script</DataWarehouseScript>
          <Report>MyReports.Availability.Main</Report>
        </Dependencies>
        <ReportDefinition>Res.MyReports.Availability.Detail</ReportDefinition>
      </Report>
    </Reports>
  </Reporting>
  <LanguagePacks>
    <LanguagePack ID="ENU" IsDefault="true">
      <DisplayStrings>
        <DisplayString ElementID="MyReports">
          <Name>MyReports</Name>
          <Description>This management pack contains all data warehouse and reporting elements for custom MyReports.</Description>
        </DisplayString>
        <DisplayString ElementID="MyReports.Availability.Main">
          <Name>Availability Main</Name>
          <Description>Availability MyReport Main</Description>
        </DisplayString>
        <DisplayString ElementID="MyReports.Availability.Detail">
          <Name>Availability Detail</Name>
          <Description>Availability MyReport Detail</Description>
        </DisplayString>
        <DisplayString ElementID="MyReports.Deploy.MyReportAvailabilityDataGet.Script">
          <Name>Deploy MyReport Availability Data Get Script</Name>
        </DisplayString>
        <DisplayString ElementID="MyReports.Deploy.MyReportAvailabilityDataDetailGet.Script">
          <Name>Deploy MyReport Availability Data Detail Get Script</Name>
        </DisplayString>
      </DisplayStrings>
    </LanguagePack>
  </LanguagePacks>
  <Resources>
    <Resource ID="Res.MyReports.Deploy.MyReportAvailabilityDataGet.Script.Install" Accessibility="Public" FileName="Res.MyReports.Deploy.MyReportAvailabilityDataGet.Script.Install.sql" HasNullStream="false" />
    <Resource ID="Res.MyReports.Deploy.MyReportAvailabilityDataGet.Script.Uninstall" Accessibility="Public" FileName="Res.MyReports.Deploy.MyReportAvailabilityDataGet.Script.Uninstall.sql" HasNullStream="false" />
    <Resource ID="Res.MyReports.Deploy.MyReportAvailabilityDataGet.Script.Upgrade" Accessibility="Public" FileName="Res.MyReports.Deploy.MyReportAvailabilityDataGet.Script.Upgrade.sql" HasNullStream="false" />
    <Resource ID="Res.MyReports.Deploy.MyReportAvailabilityDataDetailGet.Script.Install" Accessibility="Public" FileName="Res.MyReports.Deploy.MyReportAvailabilityDataDetailGet.Script.Install.sql" HasNullStream="false" />
    <Resource ID="Res.MyReports.Deploy.MyReportAvailabilityDataDetailGet.Script.Uninstall" Accessibility="Public" FileName="Res.MyReports.Deploy.MyReportAvailabilityDataDetailGet.Script.Uninstall.sql" HasNullStream="false" />
    <Resource ID="Res.MyReports.Deploy.MyReportAvailabilityDataDetailGet.Script.Upgrade" Accessibility="Public" FileName="Res.MyReports.Deploy.MyReportAvailabilityDataDetailGet.Script.Upgrade.sql" HasNullStream="false" />
    <ReportResource ID="Res.MyReports.Availability.Detail" Accessibility="Public" FileName="Res.MyReports.Availability.Detail.rdl" HasNullStream="false" MIMEType="application/octet-stream" />
    <ReportResource ID="Res.MyReports.Availability.Main" Accessibility="Public" FileName="Res.MyReports.Availability.Main.rdl" HasNullStream="false" MIMEType="application/octet-stream" />
  </Resources>
</ManagementPackFragment>

Let’s break it down.

Data Warehouse Scripts

<DataWarehouseScript ID="MyReports.Deploy.MyReportAvailabilityDataDetailGet.Script" Accessibility="Public">
  <InstallScript>Res.MyReports.Deploy.MyReportAvailabilityDataDetailGet.Script.Install</InstallScript>
  <UninstallScript>Res.MyReports.Deploy.MyReportAvailabilityDataDetailGet.Script.Uninstall</UninstallScript>
  <UpgradeScript>Res.MyReports.Deploy.MyReportAvailabilityDataDetailGet.Script.Upgrade</UpgradeScript>
</DataWarehouseScript>

This is where the pointers to your resource files (your SQL stored procedures) are defined. You need a separate version of the stored procedure script to install, upgrade, and uninstall the stored procedure. These pointers reference the actual .sql file resources defined later in the fragment.

Note: I have yet to see the uninstall script work (I think this is a bug in the SDK, but I won’t go there now).

Reports

<Report ID="MyReports.Availability.Main" Accessibility="Public" Visible="true">
  <Dependencies>
    <DataWarehouseScript>MyReports.Deploy.MyReportAvailabilityDataGet.Script</DataWarehouseScript>
  </Dependencies>
  <ReportDefinition>Res.MyReports.Availability.Main</ReportDefinition>
</Report>

This section defines the report ID, the report dependencies, and the report definition resource (which points to the actual .rdl file later).

Resource Files (skipping language packs, as we know what that’s for)

<Resource ID="Res.MyReports.Deploy.MyReportAvailabilityDataGet.Script.Install" Accessibility="Public" FileName="Res.MyReports.Deploy.MyReportAvailabilityDataGet.Script.Install.sql" HasNullStream="false" />
<Resource ID="Res.MyReports.Deploy.MyReportAvailabilityDataGet.Script.Uninstall" Accessibility="Public" FileName="Res.MyReports.Deploy.MyReportAvailabilityDataGet.Script.Uninstall.sql" HasNullStream="false" />
<Resource ID="Res.MyReports.Deploy.MyReportAvailabilityDataGet.Script.Upgrade" Accessibility="Public" FileName="Res.MyReports.Deploy.MyReportAvailabilityDataGet.Script.Upgrade.sql" HasNullStream="false" />
<Resource ID="Res.MyReports.Deploy.MyReportAvailabilityDataDetailGet.Script.Install" Accessibility="Public" FileName="Res.MyReports.Deploy.MyReportAvailabilityDataDetailGet.Script.Install.sql" HasNullStream="false" />
<Resource ID="Res.MyReports.Deploy.MyReportAvailabilityDataDetailGet.Script.Uninstall" Accessibility="Public" FileName="Res.MyReports.Deploy.MyReportAvailabilityDataDetailGet.Script.Uninstall.sql" HasNullStream="false" />
<Resource ID="Res.MyReports.Deploy.MyReportAvailabilityDataDetailGet.Script.Upgrade" Accessibility="Public" FileName="Res.MyReports.Deploy.MyReportAvailabilityDataDetailGet.Script.Upgrade.sql" HasNullStream="false" />
<ReportResource ID="Res.MyReports.Availability.Detail" Accessibility="Public" FileName="Res.MyReports.Availability.Detail.rdl" HasNullStream="false" MIMEType="application/octet-stream" />
<ReportResource ID="Res.MyReports.Availability.Main" Accessibility="Public" FileName="Res.MyReports.Availability.Main.rdl" HasNullStream="false" MIMEType="application/octet-stream" />

This section ties the resource IDs from above to the actual physical files you include in your solution: the .rdl and .sql files.
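Every DataWarehouseScript in this fragment needs three Resource elements, and the IDs and file names follow a fixed naming pattern. That pattern is a convention of this example, not something SCOM requires. A hypothetical Python helper like this one can generate the elements so the IDs never drift out of sync with the script references.

```python
import xml.etree.ElementTree as ET

ACTIONS = ("Install", "Uninstall", "Upgrade")


def script_resources(script_id: str):
    """Build Install/Uninstall/Upgrade Resource elements for one DataWarehouseScript ID.

    Naming convention (from this example only): ID = "Res." + script ID + "." + action,
    and FileName = ID + ".sql".
    """
    elements = []
    for action in ACTIONS:
        res_id = f"Res.{script_id}.{action}"
        elements.append(ET.Element("Resource", {
            "ID": res_id,
            "Accessibility": "Public",
            "FileName": f"{res_id}.sql",
            "HasNullStream": "false",
        }))
    return elements


for el in script_resources("MyReports.Deploy.MyReportAvailabilityDataGet.Script"):
    print(ET.tostring(el, encoding="unicode"))
```

The same pattern extends to the ReportResource elements if you also add the MIMEType attribute.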

Stored Procedure Necessities

I mentioned earlier that I would discuss a couple of things you will need in your stored procedure scripts. Namely, you need to specify the action of each script (install, upgrade, uninstall), and you need to grant permission to execute the procedure (otherwise the report will not work and you’ll get errors).

MSDN has a mediocre description for deploying stored procedures, but it falls short with real world examples. So I’ll give an example of each here.

Install

First, create a stub procedure if it does not already exist, and then alter that procedure to its real definition, as follows. Don’t worry about the parameter declarations; they’re just an example in case you have them.

IF NOT EXISTS (SELECT * FROM sysobjects WHERE type = 'P' AND name = 'MyReport_AvailabilityDataGet')
BEGIN
EXECUTE ('CREATE PROCEDURE dbo.MyReport_AvailabilityDataGet AS RETURN 1')
END
GO

ALTER PROCEDURE dbo.MyReport_AvailabilityDataGet
    @StartDate datetime,
    @EndDate datetime,
    @ObjectList xml,
    @MonitorName nvarchar(256),
    @DataAggregation tinyint = 0,
    @LanguageCode varchar(3) = 'ENU'
AS
BEGIN
SET NOCOUNT ON

-- {your procedure goes here}
END
GO

Upgrade

The only change needed for the upgrade script is to remove the entire first section of the install script (the first 5 lines). Everything else stays the same; just start your script with the ALTER PROCEDURE section.
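Since the upgrade script is just the install script minus the create-if-missing stub, you could derive it at build time rather than maintain two copies by hand. Here is a minimal Python sketch of a hypothetical build helper. It assumes the install script follows the stub-then-ALTER layout of the install example, and the sample script is trimmed for brevity.

```python
import re


def upgrade_from_install(install_sql: str) -> str:
    """Drop everything before ALTER PROCEDURE (the create-if-missing stub)."""
    match = re.search(r"^\s*ALTER\s+PROCEDURE\b", install_sql, re.IGNORECASE | re.MULTILINE)
    if match is None:
        raise ValueError("install script has no ALTER PROCEDURE statement")
    return install_sql[match.start():].lstrip()


install = """IF NOT EXISTS (SELECT * FROM sysobjects WHERE type = 'P' AND name = 'MyReport_AvailabilityDataGet')
BEGIN
EXECUTE ('CREATE PROCEDURE dbo.MyReport_AvailabilityDataGet AS RETURN 1')
END
GO

ALTER PROCEDURE dbo.MyReport_AvailabilityDataGet
AS
BEGIN
SET NOCOUNT ON
END
GO
"""
upgrade = upgrade_from_install(install)
print(upgrade.splitlines()[0])  # prints ALTER PROCEDURE dbo.MyReport_AvailabilityDataGet
```

Deriving one file from the other means a change to the procedure body only has to be made once.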

Uninstall

This is actually very simple – just perform a drop.

IF EXISTS (SELECT * FROM sysobjects WHERE type = 'P' AND name = 'MyReport_AvailabilityDataGet')
BEGIN
        DROP PROCEDURE dbo.[MyReport_AvailabilityDataGet]
END
GO

Assign Permissions to the DataReader account

This is required at the end of both the install and upgrade scripts. OpsMgrReader is a standard database role created during setup, so unless you have some custom configuration in your environment, this will work for you.

GRANT EXECUTE ON MyReport_AvailabilityDataGet TO OpsMgrReader
GO

And that’s about it. Now go write some reports and deploy them with your management pack, like a pro!