a random aisle in the supermarket of life

August 10, 2006

SNMP on a Cisco 6509 and InterMapper

Filed under: Administration, Computers — emjaydee @ 8:20 am

At my work we use InterMapper to monitor all our equipment. I was trying to get its SNMP probe for Cisco equipment to work with our Cisco 6509 switch, but apparently Cisco decided that it would be fun to use completely different OIDs for that line of switches, so I spent hours yesterday trying to get it to work.

Sure, Cisco has a nice repository of all the MIBs for all their equipment, but they are all uncompiled and don't spell out the actual numeric OIDs.

Granted I am not nearly as familiar with SNMP stuff as I would like to be, but come on.

Look at the number of MIBs available just for the 6500 series:
ftp://ftp-sj.cisco.com/pub/mibs/supportlists/wsc6000/wsc6000-supportlist-ios.html

All I am looking for is the CPU load and the amount of memory available. For the 5-second CPU load, according to the MIB file, this is what I need:

cpmCPUTotal5sec OBJECT-TYPE
SYNTAX Gauge32 (1..100)
MAX-ACCESS read-only
STATUS deprecated
DESCRIPTION
"The overall CPU busy percentage in the last 5 second
period. This object obsoletes the busyPer object from
the OLD-CISCO-SYSTEM-MIB. This object is deprecated
by cpmCPUTotal5secRev which has the changed range of
value (0..100)."
::= { cpmCPUTotalEntry 3 }

Part of the fun is the deprecation chain. As you can see in the MIB excerpt, cpmCPUTotal5sec was deprecated by cpmCPUTotal5secRev. If you go to the cpmCPUTotal5secRev section, it says that one was deprecated in turn by cpmCPUTotalMonInterval. But of course the only one of those that is actually in our version of the 6509 is cpmCPUTotal5sec.

Anyway, it sure would be nice if the OID were listed in that MIB file. Then I found this file:
ftp://ftp.cisco.com/pub/mibs/oid/CISCO-PROCESS-MIB.oid

One of the lines says:
"cpmCPUTotal5sec" "1.3.6.1.4.1.9.9.109.1.1.1.1.3"

So I should be all set now, right? No.

This might be an issue with our version of InterMapper, because if I use snmpwalk like this:

snmpwalk -v 2c -c CommunityName IPAddress 1.3.6.1.4.1.9.9.109.1.1.1.1.3

I get this result:
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.3.9 = Gauge32: 21

It sure looks like that should work. I get a value and everything! So I wrote the custom SNMP probe for InterMapper with the 3 OIDs I want to watch. But none of them worked; InterMapper claimed none of those OIDs were available on the switch. Of course snmpwalk disagrees. So I figured I had just completely messed up writing the probe.

So this morning I came into work figuring I would give it a fresh go. I happened to be looking through the options for snmpwalk and noticed the “-O n” option, which prints out the OID numerically. With that option, the same walk returns:
.1.3.6.1.4.1.9.9.109.1.1.1.1.3.9 = Gauge32: 21

So apparently, my problem the whole time was that InterMapper wants the OID to look like this:
.1.3.6.1.4.1.9.9.109.1.1.1.1.3.9
Instead of this:
1.3.6.1.4.1.9.9.109.1.1.1.1.3

Not sure what the .9 at the end does, but go figure… It sure would be nice to just make the OID available in the first place, without jumping through so many hoops.
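As far as I can tell, the trailing .9 is a row index: the MIB excerpt defines cpmCPUTotal5sec under cpmCPUTotalEntry, which makes it a column in a table, and SNMP addresses a single value as the column OID plus the row's index. A walk can start at the column, but a plain GET needs the full instance OID. Here is a minimal sketch of gluing the two together; the function name instance_oid is made up for illustration, and the index 9 is just what snmpwalk reported on this particular switch.

```sh
#!/bin/sh
# Build a fully qualified instance OID from a table column OID and a row index.
# instance_oid is a hypothetical helper, not part of net-snmp or InterMapper.
instance_oid() {
    column=$1
    index=$2
    # Drop any leading dot so we never end up with two, then add exactly one.
    column=${column#.}
    printf '.%s.%s\n' "$column" "$index"
}

# The 5-second CPU column from CISCO-PROCESS-MIB.oid, row 9.
instance_oid 1.3.6.1.4.1.9.9.109.1.1.1.1.3 9

# The result could then be handed to a GET, for example (not run here):
#   snmpget -v 2c -c CommunityName -On IPAddress "$(instance_oid 1.3.6.1.4.1.9.9.109.1.1.1.1.3 9)"
```

The index may well differ on other chassis or supervisor modules, so it is probably safer to walk the column first and copy whatever suffix comes back.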

For anyone who cares, these are the OIDs that seem to make the most sense:

cpmCPUTotal5sec .1.3.6.1.4.1.9.9.109.1.1.1.1.3.9
cpmCPUTotal1min .1.3.6.1.4.1.9.9.109.1.1.1.1.4.9
cpmCPUTotal5min .1.3.6.1.4.1.9.9.109.1.1.1.1.5.9
ciscoMemoryPoolFree 1.3.6.1.4.1.9.9.48.1.1.1.6
DRAM .1.3.6.1.4.1.9.9.48.1.1.1.6.1
FLASH .1.3.6.1.4.1.9.9.48.1.1.1.6.6
NVRAM .1.3.6.1.4.1.9.9.48.1.1.1.6.7
MBUF .1.3.6.1.4.1.9.9.48.1.1.1.6.8
CLUSTER .1.3.6.1.4.1.9.9.48.1.1.1.6.9
MALLOC .1.3.6.1.4.1.9.9.48.1.1.1.6.10

Memory stuff
ftp://ftp.cisco.com/pub/mibs/v2/CISCO-MEMORY-POOL-MIB.my
ftp://ftp.cisco.com/pub/mibs/oid/CISCO-MEMORY-POOL-MIB.oid

CPU/Process stuff
ftp://ftp.cisco.com/pub/mibs/v2/CISCO-PROCESS-MIB.my
ftp://ftp.cisco.com/pub/mibs/oid/CISCO-PROCESS-MIB.oid

InterMapper Cisco 6500 Probe:
http://aisle10.net/intermapper-snmp.cisco6500.txt

April 11, 2006

expensive equipment, a hammer, backups, and disaster recovery; A good mix

Filed under: Administration, Computers, Hardware — emjaydee @ 9:01 am

I found out yesterday that apparently using a hammer and a Phillips-head screwdriver to drive a SCSI cable through a maybe 1/8 inch opening between my desk and the cube wall it is screwed into is a bad idea.

I spent a couple of hours between yesterday afternoon, later on that night, and some time this morning trying to figure out why my Linux box refused to acknowledge the existence of the Sun StorEdge L8 LTO tape autoloader I hooked up to it. I didn't think the screwdriver actually went into the cable at all. It just looked like it busted into the magnet that surrounds the cable near the end. That thing really needed to be driven through the desk. On the good side, it gave Bill and me a good excuse to use a hammer and a bunch of prying tools to "install" a tape autoloader.

I have been trying to implement a fairly reliable backup system for a few small file servers we have at the office. The previous group of people that managed the backups for these systems had a disaster recovery plan that involved having a rotation of backups that traveled through 3 separate physical locations. It seemed like a bit of overkill, but then again, it is better to be safe.

The funny thing is that the backups were all on a bunch of 4mm 20 gig (uncompressed) tapes. The 3 servers that were being backed up totaled somewhere around 500 gigs…maybe a bit less. The best part was that between the 3 servers there were only 2 tape drives. 2 very slow tape drives. Plus, the majority of the data that was being backed up was incompressible: movies, audio, and pictures mostly. So this involved a lot of tapes. It took a good 3 hours for 1 tape to get filled. They would get no notification it was ready for the next tape, so every couple of hours they would go and log into the machine, or just check if the tape drive had ejected a tape, then switch it, and rinse and repeat for the 2 day (or more) long backup. Luckily incremental backups weren't as bad, but most of the time I don't think they could even happen given how long a full backup would take. If you forgot to change the tape for a while, you just might have wasted a whole day's worth of time that the backup could have been chugging along. The tapes would get put into a plastic tape case that looked like it was supposed to be rushed to the hospital for a life saving organ transplant. Then that would get carted off to the first off-site location in the big 3 location backup plan.

Then the group that had been handling these backups, plus a bunch of other tasks, got moved to another location because of "streamlining" how their group worked. That is when my co-worker and I got stuck with all the fun. Neither one of us had the time to keep checking to see when the next tape needed to be changed, so a full backup would take maybe 2 weeks to finish.

Anyway, that is a bunch of back story that doesn't really matter. I really wanted to just complain about Backup Exec, and some oddness associated with the Arkeia trial installation I have been working on.

The whole old backup system for these 3 machines used Backup Exec. I really really really don't like Backup Exec. The UI was poorly designed, the server has to run on a Windows machine, and Veritas/Symantec decided to screw over their customer base and not offer any encryption option unless you upgraded to their $20,000 Enterprise "we screwed you" 2.0 package (I made that price up). I didn't realize that until I was going to upgrade the 3 client installs and the 1 Backup Exec server to their most recent version.

But I did get a chance to try out the Sun StorEdge L8 autoloader we have had lying around for who knows how long. The L8 uses 200 gig LTO tapes (400 compressed), and when I tried the first backup on the trial of the new Backup Exec, the entire backup of the 3 systems took around 4 hours to finish, and everything fit on a tape and a half. On the bad side, the L8 only holds 8 tapes, one of which is a cleaning tape, so really 7. That isn't a safe number for a full mostly automated backup strategy, but it is still much better than the previous setup.

After I found out about the lack of encryption support, that got weighed in with the crappy UI and the need for a Windows 2003 server, and we decided to try something else. Since my co-worker loved Arkeia so much, I figured I would give that a try.

For a test install, I hooked the StorEdge autoloader up to a Sun V120 running Solaris 10, and got a bunch of trial licenses for Arkeia. The installation was completely painless, and everything was pretty straightforward. The only part that took any time was getting the V120 to recognize the autoloader, but that can't be blamed on the software. It was more my lack of knowledge.

Arkeia has a really well thought out X interface that everything can be set up from, and you can install the server on a variety of platforms: Solaris, Linux, FreeBSD, etc. Most installs involve just typing rpm -i, or dpkg -i, or ./install, depending on the packaging system on the server. I was pretty surprised at how well thought out everything was.

After I got everything going, I tried the first backup. I left encryption off, and figured I would try the best (compression wise) compression method, which was LZ3. The backup got started, and I looked at the fun little speedometer the X interface displays during an interactive backup. You can see a bunch of different metrics, like MB/h, MB/min, MB/s, KB/s for both the network and the backup speed.

This is when things started to go downhill. The max backup speed I was getting was 5 gigs an hour. Then I thought maybe the compression was too much for a V120. The load on the machine was a little over 1, but still, something didn't seem right. I tried the backup again with no compression this time, and left work for the weekend (this was on Friday). Some time Saturday I logged in to see how things were going, and in 33 hours it had backed up a whopping 144 gigs. This was never going to finish.

I tried a bunch of different things, then on Monday, we tried doing an scp of a large file from the V120 to various other machines. I was getting the same crappy throughput. The port on the switch was set to auto-negotiate, so I tried forcing it to 100/full duplex, but no difference. It must be a misconfiguration of some kind either on the switch or with the interface on the server, but it was happening on a couple of the other servers on that same bank of switches, so I figured I would just try a more localized test install on my Sun Ultra 20, which is running openSUSE 10.0/64bit. Arkeia had an rpm for SUSE Enterprise 64 bit, and that installed without a problem.

I really didn't want to shove the autoloader under my desk, and I found a SCSI cable that was long enough to let me put the autoloader on the corner of my cube against a wall. The only problem was that the hole in the desk for cables to pass through can't fit the whole SCSI cable end. That left me with 2 options: leave it under the desk, or figure out a way to get the cable up behind the desk. Which is where the hammer and a bunch of large screwdrivers came in. My co-worker pried from the top, and I was prying with another screwdriver from the bottom while trying to push the cable through the little opening. I was thinking how funny it would be if we ripped the desk out of the cube wall by accident and the whole thing crashed on top of me (including my co-worker), but the cable got through. Except for that damn metal cylinder at the end of the cable. This was going to take some finesse. After trying everything, I decided to use a Phillips head as a wedge, and just smacked it as hard as I could until the stupid metal/plastic/rubber thing went up through the crack….with the screwdriver inside. The cable looked fine, but apparently it wasn't.

This morning, after trying everything I could think of to get my system to recognize the new SCSI device, I figured I would try another cable, but all I could find was a little 3 foot long cable. So under the desk the autoloader went. It is actually just balancing on top of a little TeraStation NAS device. If I touch it with my foot by accident, I am sure it will flip on its side, but that is part of the fun.

So, I plugged in the autoloader, reloaded the SCSI card module, and lo and behold, there it was in all its glory. So I set Arkeia up real quick and got a backup going, with no compression or encryption, which is the same as the last backup I did on the Solaris install. The backup speed now is averaging 30-40 gigs an hour.
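To put the two setups side by side, here is a rough back-of-the-envelope sketch using only the numbers quoted in this post (144 gigs in 33 hours on the V120, 30-40 gigs an hour on the Ultra 20, and roughly 500 gigs of data in total). The helper names gb_per_hour and hours_for are made up for illustration, and the 500 gig total is itself only an estimate.

```sh
#!/bin/sh
# Rough backup-time arithmetic using the figures quoted in this post.
# gb_per_hour and hours_for are hypothetical helper names.
gb_per_hour() {   # usage: gb_per_hour <gigabytes> <hours>
    awk -v gb="$1" -v h="$2" 'BEGIN { printf "%.1f\n", gb / h }'
}
hours_for() {     # usage: hours_for <gigabytes> <gigabytes_per_hour>
    awk -v gb="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", gb / rate }'
}

gb_per_hour 144 33    # the V120 run: 144 gigs in 33 hours
hours_for 500 4.4     # a full backup of roughly 500 gigs at that rate
hours_for 500 30      # the same backup at the low end of 30-40 gigs an hour
```

That works out to roughly 4.4 gigs an hour on the V120, so well over 100 hours for a full backup, against something like 17 hours at 30 gigs an hour.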

I have no idea what was up with the v120, but if you saw our network closet, our network…actually, any of our stuff, you would run in horror. So now I can add that to my never decreasing list of tasks.

"figure out why throughput on half the equipment sucks"

The funny part, I guess, is that my Ultra 20 is my main workstation. I wrote this post on it, in KDE, with a bunch of other stuff running all during the backup.

March 2, 2006

Sun is out to get me, and God told them to do it

Filed under: Administration, Computers — emjaydee @ 9:27 am

After mucking around with it for 3 days off and on, I came into work 2 hours early today to get a head start on getting the Sun Java Enterprise System (with LDAP/Messaging support) running and populated so that my work can finally move off of NIS and 40 other authentication systems.

I have it to the point where all that is left is to run the various post-deployment configuration scripts and steps, which I find odd in the first place. Why are there configuration steps that you have to do after you finish the configuration? What is the point of having a configuration wizard with a product if, after you complete using it, the wizard then says “yeah, uh, you still have things to do….I don’t know what, but there is stuff, and it is in document 819-2328.”

The fun part is that document 819-2328 is on Sun’s docs.sun.com website, which gives the good ol’

Server Error

This server has encountered an internal error which prevents it from fulfilling your request. The most likely cause is a misconfiguration. Please ask the administrator to look for messages in the server’s error log.

message. That doesn’t look like post deployment instructions to me. You know, I always thought that generic 500 error was stupid. So I am supposed to just go and contact “the administrator” at Sun? I am sure Sun only has one administrator..just one. Not only that, but I am sure he is just sitting at his desk…twiddling his fingers just waiting for the phone to ring for me to say “hey, uhh…your website is down…you probably didn’t get 500 calls, pages and emails about it, but yeah…I just wanted you to know. Could you get it back up soon?”

Someone out there really does not like me. It must be because I didn’t pay much attention to Ash Wednesday. Now god is smiting me.

I think that this is what happened to the server:

[Image: melted computer]

On a related note, aside from the massiveness of the entire Java Enterprise System, it actually is fairly cool. The web mail client that comes with the messaging server is not the best thing in the world, but it is fairly decent, and adding info to the LDAP directory with their Java interface is beyond easy. I don’t know why it took around 4 years for us to finally set one up. I guess it is probably because of the 300 other projects that are always going on.

[tags]Sun,JES,LDAP,Messaging Server,Java Enterprise Server,Solaris,web mail,500 errors[/tags]

November 8, 2005

whois and traceroute suck. WhoB, and LFT are where the party is

Filed under: Administration — emjaydee @ 11:54 am

Last night I was trying to track down why all these odd HTTP requests were going to a server I am working on. It looked like the server got listed on some web proxy list or something, because basically every request that came in was in the form of

GET http://randomsitename.com

What was even weirder was that every one of those crazy requests was for either a random little search engine, or one of a bunch of popular 3rd party ad servers.

Either way, the end result was that I had about 280 IP addresses that all these requests came from, and I was trying to find some kind of link to explain why all these IPs were sending requests to this one random server that hasn’t even been put into production yet.
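For what it is worth, a list like that can be pulled out of the access log with a few lines of awk. This is only a sketch: it assumes the log is in the Apache common/combined format (client IP in field 1, request target in field 7), the helper name proxy_request_ips is made up, and the sample lines use documentation-range addresses rather than anything from the real log.

```sh
#!/bin/sh
# List each client IP that sent a proxy-style request (an absolute URL in the
# request line), with a hit count, busiest first. Assumes Apache
# common/combined log format: field 1 is the client IP, field 7 is the
# request target. proxy_request_ips is a hypothetical helper name.
proxy_request_ips() {
    awk '$7 ~ /^http:\/\// { hits[$1]++ }
         END { for (ip in hits) print hits[ip], ip }' | sort -k1,1nr -k2,2
}

# Made-up sample lines, not real log data.
proxy_request_ips <<'EOF'
192.0.2.10 - - [07/Nov/2005:22:01:03 -0500] "GET http://randomsitename.com/ HTTP/1.0" 404 208
192.0.2.10 - - [07/Nov/2005:22:01:09 -0500] "GET http://randomsitename.com/ads HTTP/1.0" 404 208
198.51.100.7 - - [07/Nov/2005:22:02:11 -0500] "GET /index.html HTTP/1.1" 200 1043
203.0.113.5 - - [07/Nov/2005:22:03:40 -0500] "GET http://randomsitename.com/ HTTP/1.0" 404 208
EOF
```

Against a real log the function would be fed the file instead of the sample, and cutting the second column out of the result gives a plain list of addresses to look up.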

Looking at whois output gets real boring after a while, plus most whois clients don’t handle bulk processing very well, and I wasn’t really interested in sitting around and either manually running whois queries on 280 IPs or staring at the output of all those whois records going by.

Then I found this little tool called WhoB. WhoB is a really handy little command line whois client that is designed to produce all its output on 1 pipe-delimited line, which makes it really easy to use with grep or awk. Also, WhoB uses a variety of sources to get its data. It primarily looks up information derived from the global internet routing table, as opposed to the standard whois client, which sucks unless you specify which whois database to use (and you need to know its address), which makes things really inconvenient if the addresses you are researching are scattered internationally.

You can look at the WhoB manual for how to use it, but by just typing this line:

for ii in $(cat fulllist); do whob -o "$ii"; sleep 10; done | tee ./whoisoutput

I was able to save all the output to a file, watch the results scroll by in the meantime, and have some nice, easily greppable output, which, after it finished, told me that all the requests were from 2 very large networks in China. Also, in case you were wondering, I added the “sleep 10” part because the ARIN database apparently cut me off when I was querying it at least once a second, and apparently they don’t like that.

Here is a sample of the output:

222.79.29.118 | origin-as 4134 (222.76.0.0/14) | CHINANET fujian province network

The -o option tells WhoB to display the organization name that the relevant registrar has on file for whoever owns that IP.
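Since every result is one pipe-delimited line, boiling a few hundred of them down to "which networks are these" is a short awk job. This is a sketch that assumes every line has the same three fields as the sample above (address, origin AS with prefix, organization name); the helper name summarize_whob is made up.

```sh
#!/bin/sh
# Count how many looked-up addresses fall under each origin AS / network name.
# Assumes each line has the three pipe-delimited fields shown in the sample
# output; lines with any other shape are skipped.
# summarize_whob is a hypothetical helper name.
summarize_whob() {
    awk -F ' [|] ' 'NF == 3 { count[$2 " | " $3]++ }
        END { for (net in count) print count[net], net }' | sort -k1,1nr
}

# Run on the single sample line from this post.
summarize_whob <<'EOF'
222.79.29.118 | origin-as 4134 (222.76.0.0/14) | CHINANET fujian province network
EOF

# On the real results it would be something like (not run here):
#   summarize_whob < ./whoisoutput
```

Grouping on both the AS field and the name keeps separate prefixes from the same provider apart; grouping on the name alone would lump them together.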

Also, WhoB comes in the same package as another really useful tool that I found last night as well called LFT. LFT is …

short for Layer Four Traceroute, is a sort of ‘traceroute’ that often works much faster (than the commonly-used Van Jacobson method) and goes through many configurations of packet-filter based firewalls. More importantly, LFT implements numerous other features including AS number lookups through several reliable sources, loose source routing, netblock name lookups, et al. What makes LFT unique? Rather than launching UDP probes in an attempt to elicit ICMP “TTL exceeded” from hosts in the path, LFT accomplishes substantively the same effect using TCP SYN or FIN probes. Then, LFT listens for “TTL exceeded” messages, TCP RST (reset), and various other interesting heuristics from firewalls or other gateways in the path. LFT also distinguishes between TCP-based protocols (source and destination), which make its statistics slightly more realistic, and gives a savvy user the ability to trace protocol routes, not just layer-3 (IP) hops.

LFT is a lot more useful than the normal traceroute command. I won’t say it actually ran any faster, though.

Also, LFT/WhoB is available as a package in Debian. If you’re using Ubuntu, you need to tell the package manager to use the “universe” package database, otherwise you will have to go to the LFT/WhoB website and download the Debian package from there.
