The key to being able to monitor a server is being able to discover that server. Until you can get the server into Operations Manager, you aren't going to be able to do much with it. While the discovery process for Unix and Linux servers seems simple enough, there is a lot going on behind the scenes that is hidden by the wizard. In a previous entry I went over a successful discovery path (OpsMg and Cross Plat-Getting Started); for this post I'm going to go over some of the errors that can occur and how to resolve them.
The first one I'll talk about is Not Enough Entropy, which required a little digging to figure out what was wrong. The exact error is: Failed to allocate resource of type random data: Failed to get random data - not enough entropy.
I've had this issue when discovering both RHEL and SLES servers and it is related to certificate generation.
There are two ways to solve this problem: you can recreate the /dev/random file or do a manual agent install.
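Before changing anything, it can help to confirm that the host really is short on entropy. The sketch below reads the standard Linux kernel counter at /proc/sys/kernel/random/entropy_avail and compares it to a threshold. The function name entropy_status and the 200-bit threshold are assumptions made for illustration, not values taken from the agent or its documentation.

```shell
#!/bin/sh
# Classify an entropy reading (in bits) against a threshold.
# The 200-bit default is an illustrative assumption, not a documented limit.
entropy_status() {
    _reading=$1
    _threshold=${2:-200}
    if [ "$_reading" -lt "$_threshold" ]; then
        echo "low"
    else
        echo "ok"
    fi
}

# Standard Linux kernel file that reports the current entropy estimate.
ENTROPY_FILE=/proc/sys/kernel/random/entropy_avail

if [ -r "$ENTROPY_FILE" ]; then
    avail=$(cat "$ENTROPY_FILE")
    echo "entropy_avail=$avail ($(entropy_status "$avail"))"
else
    echo "cannot read $ENTROPY_FILE on this system"
fi
```

A result of "low" just before a failed discovery would be consistent with the certificate generation step running out of random data.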
For both fixes, clean off the partially installed agent using the commands
Then, if you want discovery to work from the wizard, point /dev/random at the urandom device using the commands:
rm -rf /dev/random
mknod -m 644 /dev/random c 1 9
chown root:root /dev/random
A manual install requires copying the appropriate package from %Program Files%\System Center Operations Manager 2007\AgentManagement\UnixAgents to the Unix/Linux machine and installing it directly.
After fixing the install issue, switch /dev/random back to the original random device using the commands:
rm -rf /dev/random
mknod -m 644 /dev/random c 1 8
chown root:root /dev/random
Next, let's look at Unspecified Problem. I am sure there is a whole gamut of reasons why this one occurs. The text is: Starting Microsoft SCX CIM Server: Unspecified Problem. The key here is the statement "Generating certificate with hostname...", which tells us the certificate was generated, so we need to look at what happens after certificate creation. The only cause I have found for this error is the firewall. After installation and certificate generation there is a validation step. If you watch the steps in the wizard, the error pops up almost immediately, which means the wizard is unable to verify the agent and suggests a communication issue. Ensure that port 1270 has been opened on the firewall and try to discover again.
Some of the other errors I've run into over time are:
Access is Denied. This one pops up from time to time when an agent installation failed for some reason, you fixed the underlying reason, and then tried again. The problem is that the partially installed agent is blocking the re-install. The fix is to clean off the agent and do a fresh install, the same way we did for Not Enough Entropy.
Cannot connect to port 1270. This one typically occurs when there is a library path issue on the monitored server. If you go to the server, you'll likely see that the service failed to start. Trying to restart the service will give you the name of the library that cannot be found. The typical resolution path for Linux is:
The path for Solaris is the same for steps 1 - 3 but differs when it comes to setting the library path:
Can not resign certificate, /etc/opt/microsoft/ssl/scx-host-<hostname>.pem already exists. In this situation the re-creation of a certificate was attempted but failed because there was a previously generated certificate on the target host. If you want to generate a new certificate, simply delete the contents of the /etc/opt/microsoft/ssl directory. Alternatively, you can export the certificate and trust it on the management server.
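Before deleting anything, you may want to look at the certificate that is already on the host. Here is a minimal sketch using the standard openssl x509 command. The helper names scx_cert_path and show_scx_cert are made up for this example, and it assumes the file name uses whatever the hostname command returns, which may not match the name the agent used when it generated the certificate.

```shell
#!/bin/sh
# Build the certificate path named in the error message for a given hostname.
scx_cert_path() {
    echo "/etc/opt/microsoft/ssl/scx-host-$1.pem"
}

# Print the subject and validity dates of the agent certificate, if present.
show_scx_cert() {
    _cert=$(scx_cert_path "$1")
    if [ -f "$_cert" ]; then
        openssl x509 -noout -subject -dates -in "$_cert"
    else
        echo "no certificate at $_cert"
    fi
}

# Assumption: the certificate was generated with the name hostname reports.
show_scx_cert "$(hostname)"
```

If the subject does not match the name the management server uses for the host, that is a hint that regenerating the certificate is the better of the two options.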
The "not enough entropy" one is a good one, I had not seen that one yet.
There are a couple more of these errors, with possible troubleshooting steps, here:
http://technet.microsoft.com/en-us/library/dd891011.aspx
and here
http://technet.microsoft.com/en-us/library/dd891009.aspx
Posted by: Daniele Muscetta | August 10, 2009 at 01:19 AM
there is some more info about "not enough entropy" here
http://social.technet.microsoft.com/Forums/en-US/crossplatformsles/thread/f94ec905-23ac-4444-b9f8-644fec3ae357
Posted by: Daniele Muscetta | August 22, 2009 at 01:11 PM
Hi,
Thanks for this advice.
Your technique of remapping /dev/urandom onto /dev/random leaves /dev/random in a writable state:
# rm -rf /dev/random
# mknod /dev/random c 1 9
# ls -al /dev/random
crw-r--r-- 1 root root 1, 9 Aug 25 16:28 /dev/random
You might be better off running:
# mknod -m 0444 /dev/random c 1 9
# ls -al /dev/random
cr--r--r-- 1 root root 1, 9 Aug 25 16:31 /dev/random
After the installation, you should replace the existing /dev/random. I believe it is different from /dev/urandom for a reason.
Posted by: Spelunker | August 25, 2009 at 11:33 AM
Spelunker, you are correct: the permissions should be set correctly if you are going to maintain the file. It should be root-writable though, so the commands would be:
mknod -m 644 /dev/random c 1 9
chown root:root /dev/random
To go back to the original state, you can use the commands:
mknod -m 644 /dev/random c 1 8
chown root:root /dev/random
I've updated the post to specify the new commands.
Thanks for pointing that out.
Posted by: Michael Guthrie | August 25, 2009 at 05:02 PM