Xplat Xperts

Writing Management Packs for SCOM: Adding your own Monitors

In the first part of this series we got our toolset set up so that we can write our own MP, created a root folder, and added a state view that duplicates the Hyper-V Server Role state view.

Now it's time to start adding our own extensions to the base Hyper-V MP. Keep in mind that what I'm doing here will work with any management pack, including the BridgeWays MPs, where we may be exposing data but only consuming it through rules and you want to add a monitor.

So open up the Authoring Console and go to the Health Model frame.  You'll see that we have the option to add Discoveries (we don't need these right now because the base Hyper-V MP is doing this for us), Monitors, Rules, Service Levels, etc.

Hyper-V Monitors Frame

Let's create a new monitor:

  1. Right-click the view frame and select New->Windows Performance->Static Threshold->Average Threshold to create a new monitor that takes the average of a series of polls and determines the Health/Error state from that average value.
  2. Provide a new name. I'm going to build an alert for the number of Virtual Machines being reported as in a critical state, so my name is VMsCritical.
  3. Give the monitor a display name, "Virtual Machines are Critical".
  4. Specify the target by clicking the ellipsis, selecting the list, and searching for "hyper".  I set the target to Microsoft.Windows.HyperV.ServerRole so that the monitor will run against any Hyper-V Server.
  5. Set the Parent Monitor to be System.Health.PerformanceState.  I do this because I feel VMs in a critical state are a resource issue causing bad performance; it's not necessarily an Availability issue because the VMs are running.
  6. Click Next.
  7. Now we set the object we're going to monitor.  You can either hit the Select button and search for the appropriate perfmon counter through the dialog that pops up, or enter the data directly.  For this counter it is:  Object = Hyper-V Virtual Machine Health Summary, Counter = Health Critical, Instance is blank.
  8. Set an appropriate interval for the monitor; 5 minutes is the lowest you should go.
  9. Click Next.
  10. Set the threshold. I use 1 because I want to know if even a single VM is critical.
  11. Set the number of samples. I use 2: if a VM goes critical and then clears on the next poll, I'm fine with that, the problem was a blip.
  12. Click Finish and we're almost done.
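
To make the Average Threshold idea a bit more concrete, here is a rough Python sketch of the logic the steps above describe: keep the last few polls, average them, and compare the average with the threshold. This is only an illustration, not how SCOM implements the monitor. The class and method names are made up, and treating an average equal to the threshold as "over" is my assumption, so check how the real monitor behaves at the boundary.

```python
from collections import deque


class AverageThresholdSketch:
    """Toy model of an average-threshold check (not SCOM's implementation)."""

    def __init__(self, threshold, num_samples):
        self.threshold = threshold
        # Only the most recent num_samples polls are kept.
        self.samples = deque(maxlen=num_samples)

    def add_sample(self, value):
        """Record one poll and return the resulting state."""
        self.samples.append(value)
        if len(self.samples) < self.samples.maxlen:
            # Not enough polls yet to form a full average.
            return "Healthy"
        average = sum(self.samples) / len(self.samples)
        # Assumption: an average equal to the threshold counts as over it.
        return "Critical" if average >= self.threshold else "Healthy"


# Threshold 1 with 2 samples, as in the walkthrough.
monitor = AverageThresholdSketch(threshold=1, num_samples=2)
states = [monitor.add_sample(v) for v in [0, 1, 0, 1, 1]]
print(states)
```

With these inputs a single critical poll that clears on the next poll never changes the state; only two critical polls in a row do, which is the "ignore a blip" behaviour I'm after.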

We've created the base monitor, now we just need to do a little tweaking to make it everything we want it to be.  To finish things off, select the monitor and click Properties.

Hyper-V VMs Critical 

  1. Go to the Health tab and change the Over Threshold Health state to be Critical. I do this because I view VMs being critical as a Critical alarm.
  2. Click on the Alerting tab and select Generate alerts for this monitor. You can also tweak the priority and severity, which helps you sort alerts in custom alert views where you may only want to see Critical alerts shown.
  3. Click on the Options tab and change Accessibility to "Public" so that you can make future MPs that extend this MP and people can add Diagnostic and Recovery tasks to the monitor without having to be part of this MP directly.
  4. Click on Product Knowledge, then Edit. This will launch Word, if you have it and Visual Studio installed, so you can create a KB article.
  5. Go to the Word document that launched and add some knowledge base information about the monitor, why it would fire, and how to fix it.

Now that we've done that, save your MP.  Import the new version into SCOM and see how the monitor shows up under Performance in the health model:

Hyper-V Health Model

Repeat this process to add any additional monitors you may wish to have.

Part 1: Getting Started with writing a more robust Hyper-V MP
Part 2: Adding your own Monitors
Part 3: Adding Rules and Performance Views
Part 4: Adding Dashboards

Posted on February 12, 2010 at 04:02 PM in Operations Manager, Technology

Writing Management Packs for SCOM: A more robust Hyper-V MP

Have you ever looked at a management pack and wished it was collecting just a little more information for you?  You're not the only one. There are quite a few people who write MPs from scratch to support applications they are running, but there are more who are still a little leery of going down the path of MP Authoring.  In the next series of posts I'm going to walk through extending the base Hyper-V management pack so that it provides more information around how healthy the servers really are.

Tools of the trade:

  1. System Center Operations Manager 2007 R2 Authoring Console: I'm going to focus on writing the MP using the Authoring Console as opposed to raw XML. It's a good tool to use when you want to build out quick Windows based monitors, validate the XML schema, and see all the views you have created. http://www.microsoft.com/downloads/details.aspx?FamilyID=9104af8b-ff87-45a1-81cd-b73e6f6b51f0&displaylang=en
  2. XML editor: things like Dashboards, Performance views and a few other components still require that you write raw XML, so you'll need an editor. It can be nothing more than Notepad if you wish; I use Visual Studio.
  3. Authoring Guide: keep it handy so you can refer to it when some of the XML layout makes you want to scream. http://technet.microsoft.com/en-us/opsmgr/bb498235.aspx
  4. Install the SCOM console on the system you will be doing the authoring on.
  5. To make writing KB articles a bit easier, install Visual Studio and Word on the system you will be using to author the MPs.

Now let's run through the steps to get things set up:

  1. Get a copy of the MP you want to extend. You do this by downloading the manual install package as opposed to installing the MP via the SCOM catalog.  For Hyper-V you can get the installer at: http://www.microsoft.com/DownLoads/details.aspx?familyid=502E7A26-2FEA-4052-89FD-8F75142DE4F2&displaylang=en
  2. Run the installer and extract the sealed MPs to a local path like c:\authoring\dependencies
  3. Launch the Authoring Console
  4. Select File->New.. Empty Management Pack
  5. Give your MP a name, I used BridgeWays.Windows.HyperV
  6. Give your new MP a Display Name, such as "BridgeWays Windows Hyper-V"
  7. Create

You now have an unsaved shell Management Pack that does nothing.  So let's set it up to extend the core Hyper-V MP:

  1. Click Tools->Options...
  2. Go to the references tab and add in the System Center directory and the directory where you put the Hyper-V MP.

    References
  3. Next, go File->Management Pack Properties and go to the References tab
  4. Click Add Reference
  5. Select the MPs that you want to extend and use, I selected the Hyper-V MPs
    MPs

Now we're ready to start adding our rules and monitors.

Let's start by checking to see if we can extend the MP directly, i.e. add our own folders and views to the Hyper-V folder in the SCOM Monitoring hierarchy.  This is determined by whether the root folder is flagged as 'internal' or 'public'.

  1. In the Authoring Console, click the Presentation tab.
  2. Right-Click the Microsoft.Windows.HyperV.RootFolder and select New->Folder
  3. Give your new folder an object name such as BridgeWays.Windows.HyperV.Performance and a display name like Performance
  4. Click the Folder tab and make sure the selected folder is Microsoft.Windows.HyperV.RootFolder
  5. Click OK

RootFolder

No luck with the Hyper-V MP: the root folder is internal and it doesn't have any other folders, so in this case we'll have to build out a parallel MP that has our new information in it.  For Hyper-V it's not a big deal since there are only the 3 state views. For an MP like SQL Server, luck is on our side: the root folder is public, so we can extend that MP directly.

Another way to check whether the root folder is internal or public is to use the MP Viewer tool that you can get from here: http://blogs.msdn.com/boris_yanushpolsky/archive/2008/01/31/mpviewer-1-3.aspx

To create the new folder, simply follow the previous steps, but give your folder a more precise name like "BridgeWays Windows Hyper-V" and set the parent folder to be Microsoft.SystemCenter.Monitoring.ViewFolder.Root.  Now we have a new root folder for any views we want to create.

Now let's create a duplicate state view within our new folder.  This state view will simply duplicate the same information as the Hyper-V MP shows in the Server Role View.  I'm doing this because I know I'll be adding a number of monitors as well as performance views and I don't want to have to switch from one folder to another when I'm watching the environment.  The Hyper-V MP only has 3 state views so it's not a big deal to duplicate them.

  1. Right click on the new folder (BridgeWays.Windows.HyperV.Root in my case) and select New->State View
  2. Give it a new ID postfix, such as ServerRole.State
  3. Give it a Display name, such as "Server Role State"
  4. Click the ellipsis by Target and select the target we want to show in the state view, in this case Microsoft.Windows.HyperV.ServerRole
  5. Click Finish

That's it, we've just duplicated the Server Role View from the Hyper-V MP to our new custom MP.  The beauty of this is we don't have to redo the discoveries or anything because the data is already being populated by the core MP.

For the next post, we'll look at adding our own monitors to the discovered servers.

Part 1: Getting Started with writing a more robust Hyper-V MP
Part 2: Adding your own Monitors
Part 3: Adding Rules and Performance Views
Part 4: Adding Dashboards

Posted on February 04, 2010 at 03:43 PM in Operations Manager, Technology

SCOM and Distributed Applications: Building Performance Views


With the last post I showed you how to build a Dashboard using some of the basic view types. Now let's go into building performance views.  With performance views you need to know a bit more about the MPs you are using with your distributed applications, because you need to know the names of the rules you want to see and what object they are associated with.

For anyone not familiar with the terminology, SCOM makes use of Rules to gather performance data like CPU usage, Memory usage, etc.  The data collected is then typically stored in the Data Warehouse so that you can build reports using the information.  Monitors are a different entity: they check the values of specific metrics, compare those values to defined thresholds, and raise alerts if the thresholds are exceeded (or not met, depending on the type of comparison being made).  State Views are used to expose the monitors, Performance Views are used to expose the rules, and Dashboards are used to pull them all together.

The next question you may have is... why bother building custom performance views when you can simply reference a performance view directly in the tree?  There are two reasons for this:

  1. If the performance view you want is in a dashboard and not a single view, you can't directly reference it.
  2. If you reference an existing performance view, you can't scope it to show only the objects that are part of the distributed application, and any changes made to the visible data are reflected in both views.

So when it comes to distributed applications, it's best to specifically create any performance views you want so that you can have full control over them.

We're now ready to go ahead and create a performance view. I'll do my example around building a performance view for the Buffer Cache Hit % for Oracle.

In the monitoring pane, find the metric you want to build a view for, using Oracle as an example:

 Oracle-DatabaseSize

  1. Look at the original MP and locate the dashboard or performance view showing the metric you want to expose.
  2. Copy down the rule name and the object name
  3. Go to your distributed application in the tree and right click the folder where you want to place the performance view
  4. Select New->Performance View
  5. Give the view a Name and Description (if you like), then click the ellipsis (...) under Show data related to:
  6. Select the appropriate Object from the dialog. This will match the Object name you copied down, and you may have to select the radio button for "View all Targets" to see it.
  7. Check the "collected by specific rule" check box in Select conditions
  8. Click the blue underlined "specific" in Criteria description
  9. Select the rule name that you copied down
  10. Optionally, you can also scope it to a specific discovered instance.  This isn't absolutely necessary since you will be able to do that through the performance view later as well.
  11. Click OK

Oracle-PerfView 
We now have a new performance view and this view can be included in higher level dashboards and more importantly customized, tweaked, and consumed as a distinct part of our distributed application.

PetStore2-Size 

You now have all the tools necessary to build out customized distributed applications that tie different management packs together into logical entities.  This can help simplify troubleshooting because it provides full context around the problems that are detected.  Having a full service stack monitored as a single entity gets rid of the distractions from metrics that don't affect the service in question, and you can now see when an alert in one tier may be causing an alert in another.

For the next post, we'll look at expanding the distributed application through Service Level Objectives that can be used to build out SLAs around our service.

Part 1 - Associating Components
Part 2 - Building the Service Model
Part 3 - Building Custom Views
Part 4 - Building Performance Views
Part 5 - Service Level Objectives

Posted on January 13, 2010 at 09:49 AM in Operations Manager

SCOM - BridgeWays Management Pack Videos: VMware ESX, JBoss AS and Oracle DB

We've recently started to build out some nice overview videos of the BridgeWays management packs. You can find the first three on YouTube, and higher resolution versions will be available at www.bridgeways.ca soon.

Introduction to monitoring Oracle Databases


Introduction to monitoring VMware ESX 3.x and vSphere

Introduction to monitoring JBoss Application Servers

Posted on December 22, 2009 at 07:52 PM in BridgeWays, Operations Manager, Technology

Planning Your Operations Manager 2007 R2 Deployment

As you start to plan out the deployment of Operations Manager 2007 R2 in your organization, you will likely be faced with many questions.  How many management servers do I need?  What sort of hardware do I need for them?  How much storage do I need for the data warehouse?  These questions are important to understand (and to be able to answer) to maximize the return on your hardware investment.  You'll want to ensure you have room to grow, but at the same time you don't want to waste money by going "overboard" and end up with significant idle resources.

One key thing to look out for is the difference between monitoring agent-based Windows machines and monitoring Unix/Linux machines.  An agent-based Windows machine can manage its workflows locally, but since a Unix/Linux machine does not have a HealthService locally, the workflows will run on the management server that owns that Unix/Linux machine.  As a general guideline, a management server can handle about 5 times as many agent-based Windows machines as Unix/Linux machines.  A quad proc machine with 8GB of RAM and a 4 disk RAID 10 array can handle about 2500 agent-based Windows machines, but only about 500 Unix/Linux machines.

Remember that this isn't a cut-and-dried number.  As the hardware varies, so will the capabilities of the management server, so adjust those numbers accordingly.  These numbers assume that you are monitoring more than just the base OS (AD, Exchange, SQL, IIS, MySQL, Apache, Oracle, etc.).  However, the application layers can significantly influence the number of running workflows.  If a machine has a huge Oracle deployment with many instances and hundreds of tablespaces, sessions, and processes, this is going to affect the number of machines a management server can handle.  In this example, the original capacity of 500 Unix/Linux machines may have now fallen to 250 or fewer.
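
As a back-of-the-envelope aid, the guideline figures above can be turned into a small Python estimate. This is just a sketch under loud assumptions: it treats Windows and Unix/Linux load as adding up linearly on the same class of management server, and the unix_load_factor parameter is a hypothetical knob I've added to model heavy application layers (2.0 corresponds to the 500 -> 250 example). It is no substitute for the sizing guidance from Microsoft.

```python
import math

# Guideline figures from the post for one management server
# (quad proc, 8GB of RAM, 4 disk RAID 10 array).
WINDOWS_AGENTS_PER_SERVER = 2500
UNIX_LINUX_PER_SERVER = 500


def estimate_management_servers(windows_agents, unix_linux_hosts, unix_load_factor=1.0):
    """Return a rough management server count for a mixed environment.

    unix_load_factor is a hypothetical multiplier: 2.0 means each
    Unix/Linux machine costs twice the baseline number of workflows.
    """
    windows_share = windows_agents / WINDOWS_AGENTS_PER_SERVER
    unix_share = unix_linux_hosts * unix_load_factor / UNIX_LINUX_PER_SERVER
    # Always plan for at least one management server.
    return max(1, math.ceil(windows_share + unix_share))


print(estimate_management_servers(1000, 400))
print(estimate_management_servers(0, 500, unix_load_factor=2.0))
```

Adjust the two constants for your own hardware before trusting any number that comes out of it.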

Luckily, Microsoft has provided some guidance to help out with planning deployments. They have created an interactive document that walks you through various scenarios and outlines the required hardware for that deployment.  You can get the document here:

http://blogs.technet.com/momteam/archive/2009/08/12/operations-manager-2007-r2-sizing-helper.aspx

Posted on December 08, 2009 at 10:53 AM in Operations Manager

Operations Manager R2 and Service Level Dashboard 2.0

The Service Level Dashboard is one of those features that can be very helpful in terms of providing a high level view of a service or environment.  For this post I'm going to point you in the right direction for installing and configuring the accelerator, because there are a couple of things you need to watch when setting it up and I've been seeing a fair number of people getting tripped up and losing a lot of hours to redoing the installation.

ServiceLevelDashboard

Setting the SLD up involves a few steps, and it is important to do them in the correct order:

  1. Download the Service Level Dashboard 2.0 zip file from Microsoft: http://www.microsoft.com/downloads/details.aspx?FamilyId=1d9d709f-9628-46a8-952b-a78f5dd2bdd9&displaylang=en
  2. Install SQL Server
  3. Install SharePoint using a full SQL Server database for the backend.  By default SharePoint installs using the Office Embedded version of SQL Server, and this is not robust enough for the Dashboard server.
  4. Import the Service Level Dashboard MP in OpsMgr
  5. Run the Service Level Dashboard installer on the SharePoint server.

Installing SQL Server and SharePoint

The main trick here is to do a Complete install instead of a Stand-Alone install. The Complete install allows you to specify the database to use with your MOSS 2007 installation.

To set this part up I found a nice post by Andreas Glaser that walked through all the necessary steps to get this working with Win2K8, SQL 2008 and MOSS 2007: http://andreasglaser.net/post/2008/08/14/Installing-MOSS-2007-on-Windows-Server-2008-and-SQL-Server-2008-Part-1-Overview.aspx

Installing the Service Level Dashboard

For this side of things the User Guide included in the zip file is quite good. Just make sure you run the installer as an administrator and that you import the Management Pack before running the SLD installer, as the MP import will create some stored procedures in the database.

Posted on September 16, 2009 at 09:50 AM in Operations Manager

Gateway Servers with Operations Manager 2007 R2

Working with Gateway servers can be pretty frustrating the first time out. You really need to read through all the documented steps before you dive in.

The main steps to watch are:

  1. Use Microsoft.EnterpriseManagement.GatewayApprovalTool.exe before you install the gateway
  2. Generate the certificates through your CA for each management server and the gateway server, then import them using MOMCertImport.exe
  3. After importing the certificates, you may need to restart the Health Services in order to pick up the new certificates

The other thing to keep in mind is that for any managed object being handled by a Gateway server, the Gateway is responsible for handling the workflows, so this means you need to:

  1. Configure winrm to handle basic authentication (winrm set winrm/config/client/auth @{Basic="true"})
  2. Ensure the Gateway can resolve a DNS name for the IP of the server being discovered
  3. If you are using the BridgeWays Unix/Linux deployment mechanism, ensure you run the installer on the Gateway server so that the deployment packages are available for use.
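
The DNS requirement in the list above is easy to check before you start a discovery. Here is a minimal Python sketch of a reverse lookup; the function name is made up for illustration, and it assumes you can run Python on the Gateway server (or on a machine that uses the same DNS servers). The IP address in the example is a placeholder.

```python
import socket


def reverse_lookup(ip_address):
    """Return the DNS name registered for ip_address, or None if there is none."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip_address)
        return hostname
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return None


if __name__ == "__main__":
    # Placeholder address: replace with the server you plan to discover.
    print(reverse_lookup("127.0.0.1"))
```

If this comes back as None for the server you want to discover, fix the DNS record (or the hosts file on the Gateway) first.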

The entire deployment procedure is available through TechNet at http://technet.microsoft.com/en-us/library/bb432149.aspx and once you get things running it works very well for monitoring your Unix/Linux (and Windows, of course) servers running in the DMZ or across the WAN.

Posted on September 03, 2009 at 08:00 AM in BridgeWays, Operations Manager

Bringing Unix/Linux Scripts into Operations Manager

There are a few blog posts out there that point to the Cross Platform Authoring Guide available on Technet (http://technet.microsoft.com/en-us/library/dd919155.aspx) but I didn't find any that point you to the sample MP that is available.  You can find a downloadable version of the Guide and the sample MP at  http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=19bd0eb5-7ca0-41be-8c0f-2d95fe7ec636 or by searching for the main document name which is OM2007R2_CrossPlatformMPAuthoringGuide.docx

It's a good sample of how you can discover and use scripts to get additional data into OpsMgr.

Posted on August 27, 2009 at 10:03 AM in Operations Manager

Operations Manager and Unix/Linux Agents

Something to keep in mind when you're rolling out System Center Operations Manager R2 is how the Agents work when you're dealing with Unix and Linux servers.

UnixAgents

On the Windows side of things, when you deploy an agent to the server you are in effect deploying a Health Service that will run the workflows related to the providers that target that machine.  So if you deploy an agent and then discover SQL Server on the machine, most of the work of gathering information from that SQL Server instance will be done on the host machine itself.  This helps OpsMgr scale out in large environments and works because the calls to gather information are generally script or WMI based.

For Unix and Linux agents, this changes.  The Unix and Linux agent is just a CIM server. It sits around waiting for queries to be sent from a Management Server, looks up what class is being requested, maps that to a provider, and returns the data that the provider serves up.

The big difference here is that the Unix and Linux agent does not handle workflows; the management server must be able to handle all of the workflows being used to gather the information.  So for the Unix/Linux side of things, you need to ensure that you have enough Management Servers to handle the amount of data collected and that you have properly balanced the Unix/Linux servers across your Management Servers.
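
As a simple illustration of what "balanced" can mean, here is a Python sketch that spreads a list of Unix/Linux hosts across management servers round-robin. The function and the host and server names are made up, and it assumes every host costs roughly the same number of workflows. In a real environment you would want to weigh hosts with heavy application layers more than plain OS monitoring.

```python
from itertools import cycle


def balance_hosts(hosts, management_servers):
    """Assign hosts to management servers in round-robin order."""
    assignment = {server: [] for server in management_servers}
    for host, server in zip(hosts, cycle(management_servers)):
        assignment[server].append(host)
    return assignment


plan = balance_hosts(["rhel01", "sles01", "sol01"], ["ms1", "ms2"])
print(plan)
```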

This is something to be aware of when you're planning out how you are going to pull your Unix/Linux infrastructure into OpsMgr.

Posted on August 19, 2009 at 01:32 PM in Operations Manager

OpsMgr - Cross Platform Discovery Errors

The key to being able to monitor a server is being able to discover that server :). Until you can get the server into Operations Manager you aren't going to be able to do much with it.  While the discovery process for Unix and Linux servers seems simple enough, there is a lot going on behind the scenes that is hidden by the wizard.  In a previous entry I went over a successful discovery path (OpsMg and Cross Plat-Getting Started); for this post I'm going to go over some of the errors that can occur and how to resolve them.

The first one I'll talk about is Not Enough Entropy; this one required a little digging to figure out what was wrong.  The exact error is "Failed to allocate resource of type random data: Failed to get random data - not enough entropy".

Entropy 

I've had this issue when discovering both RHEL and SLES servers and it is related to certificate generation. 

There are two ways to solve this problem: you can recreate the /dev/random file or do a manual agent install.

For both fixes, clean off the partially installed agent using the commands:

  1. rpm -e scx
  2. rm -rf /etc/opt/microsoft/scx

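Then, if you want discovery to work from the wizard, recreate /dev/random with the device numbers normally used by /dev/urandom (1, 9) so that certificate generation is not starved of entropy: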

  1. rm /dev/random
  2. mknod -m 644 /dev/random c 1 9
  3. chown root:root /dev/random

A manual install requires copying the appropriate package from %ProgramFiles%\System Center Operations Manager 2007\AgentManagement\UnixAgents to the Unix/Linux machine and installing it directly.

If you recreated /dev/random, switch it back to the real random device (1, 8) once the install issue is fixed, using the commands:

  1. rm /dev/random
  2. mknod -m 644 /dev/random c 1 8
  3. chown root:root /dev/random
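
The numbers passed to mknod are the Linux character device major and minor numbers: major 1, minor 8 is the random device and major 1, minor 9 is the urandom device. If you want to confirm which one /dev/random currently is (for example, to make sure you switched it back), a small Python sketch like this will tell you. The function names are my own and the lookup only knows about these two devices.

```python
import os
import stat

# Linux memory devices (major 1): minor 8 is random, minor 9 is urandom.
KNOWN_DEVICES = {(1, 8): "random", (1, 9): "urandom"}


def classify_device(major, minor):
    """Name the device for a major/minor pair, or 'unknown'."""
    return KNOWN_DEVICES.get((major, minor), "unknown")


def describe_dev_random(path="/dev/random"):
    """Report what kind of device currently sits at path."""
    if not os.path.exists(path):
        return "missing"
    info = os.stat(path)
    if not stat.S_ISCHR(info.st_mode):
        return "not a character device"
    return classify_device(os.major(info.st_rdev), os.minor(info.st_rdev))


if __name__ == "__main__":
    print(describe_dev_random())
```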

Next, let's look at Unspecified Problem. This is one where I am sure there is a whole gamut of reasons why it occurs.  The text is "Starting Microsoft SCX CIM Server:  Unspecified Problem".

Unspecified 

The key here is that we can see the certificate was generated, from the statement "Generating certificate with hostname...", so we know we need to look at things after the certificate creation.  The only reason I have found for this error is the firewall: after installation and certificate generation there is a validation step.  If you watch the steps through the wizard, the error pops up almost immediately, so the wizard is unable to verify the agent, which suggests a communication issue.  Ensure that port 1270 has been opened on the firewall and try to discover again.
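
A quick way to test the firewall theory is to try a plain TCP connection to port 1270 from the management server. Here is a minimal Python sketch; the function name and the host name in the example are made up, and a successful connection only shows the port is reachable, not that the agent is healthy.

```python
import socket


def port_is_open(host, port=1270, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical host name: replace with the server being discovered.
    print(port_is_open("linuxserver01.example.com"))
```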

Some of the other errors I've run into over time are:

Access is Denied: this one pops up from time to time when an agent installation failed for some reason, you fixed the underlying reason, and tried again. The problem is that the partially installed agent is blocking the re-install. The fix is to clean off the agent and do a fresh install, the same way we did for Not Enough Entropy.

Cannot connect to port 1270: this one typically occurs when there is a library path issue on the monitored server.  If you go to the server, you'll likely see that the service failed to start. Trying to restart the service will give you the name of the library that cannot be found.

The typical resolution path for Linux is:

  1. scxadmin -restart all
  2. See what library is missing 
  3. find / -name <missing library>  
  4. vi /etc/ld.so.conf 
  5. add path to missing library  
  6. ldconfig to reload dynamic loader  
  7. scxadmin -restart all   

The path for Solaris is the same for steps 1 - 3 but differs when it comes to setting the library path:

  1. crle to see the current path
  2. crle -l to update the path (include the old path plus the new path because the command is a replacement, not an append) 
  3. scxadmin -restart all  
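
For the Linux case, the first three steps can be sketched in Python: pull the library name out of the loader error and then look for it on disk. The pattern below matches the message glibc's dynamic loader normally prints ("error while loading shared libraries: ...: cannot open shared object file"); Solaris words the error differently, so treat this as a Linux-only illustration. The function names and the sample message are made up.

```python
import os
import re

# Message format used by glibc's dynamic loader for a missing library.
MISSING_LIB = re.compile(
    r"error while loading shared libraries: (\S+): cannot open shared object file"
)


def missing_library(output):
    """Return the library name from a loader error message, or None."""
    match = MISSING_LIB.search(output)
    return match.group(1) if match else None


def find_library_dirs(name, root="/"):
    """Rough equivalent of 'find <root> -name <name>', returning directories."""
    return sorted(
        {dirpath for dirpath, _dirs, files in os.walk(root) if name in files}
    )


# Sample text only; the library named here is an example, not a known SCX dependency.
sample = (
    "scxcimserver: error while loading shared libraries: libexample.so.1: "
    "cannot open shared object file: No such file or directory"
)
print(missing_library(sample))
```

Whatever directory find_library_dirs turns up is the path you would add to /etc/ld.so.conf before running ldconfig.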

Can not resign certificate, /etc/opt/microsoft/ssl/scx-host-<hostname>.pem already exists: in this situation the re-creation of a certificate was attempted but failed because there was a previously generated certificate on the target host.  If you want to generate a new certificate, simply delete the contents of the /etc/opt/microsoft/ssl directory.  Alternatively, you can export the certificate and trust it on the management server.

winrm failed to connect in a timely manner: this can happen if the target server is overloaded. OpenPegasus will time out after 20 seconds or so, and this can result in a failure to validate that the agent was properly installed.  The fix here is to confirm the agent was in fact installed by running scxcimcli ei -n root/scx CIM_ManagedElement on the target server, and then retry the discovery.

There are many other things that could go wrong during discovery, but in most cases the error message you receive should help you determine how to fix the problem. One thing to watch is the phase in which the error occurred: Initial discovery (name resolution issues), Installation (user account issues), Signing (certificate issues), or Validation (configuration issues). Knowing where to start looking is half the battle in getting our servers successfully discovered.
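
If you run into these often enough, it can be handy to keep the errors from this post in a small lookup. The Python sketch below simply maps a fragment of each wizard error to the fix described above. The function name is made up, the matching is a plain case-insensitive substring check, and the list is only as complete as this post.

```python
# Error fragments from this post, mapped to the fix described for each.
KNOWN_ERRORS = {
    "not enough entropy":
        "Clean off the partial agent, then recreate /dev/random or install the agent manually.",
    "unspecified problem":
        "Check that port 1270 is open on the firewall, then rediscover.",
    "access is denied":
        "Clean off the partially installed agent and do a fresh install.",
    "cannot connect to port 1270":
        "Run scxadmin -restart all and fix the path of any missing library.",
    "can not resign certificate":
        "Delete the contents of /etc/opt/microsoft/ssl, or export and trust the certificate.",
    "failed to connect in a timely manner":
        "Verify the agent with scxcimcli on the target server, then retry discovery.",
}


def suggest_fix(wizard_error):
    """Return the fix for a known discovery error, or a generic hint."""
    text = wizard_error.lower()
    for fragment, fix in KNOWN_ERRORS.items():
        if fragment in text:
            return fix
    return "Unknown error: note which phase failed and start looking there."


print(suggest_fix("Failed to get random data - not enough entropy"))
```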

Posted on August 10, 2009 at 08:00 AM in Debugging, Operations Manager