a random aisle in the supermarket of life

April 11, 2006

expensive equipment, a hammer, backups, and disaster recovery; A good mix

Filed under: Administration, Computers, Hardware — emjaydee @ 9:01 am

I found out yesterday that using a hammer and a Phillips-head screwdriver to drive a [tag]SCSI[/tag] cable through an opening of maybe 1/8 inch between my desk and the cube wall it is screwed into is, apparently, a bad idea.

I spent a couple of hours between yesterday afternoon, later that night, and this morning trying to figure out why my Linux box refused to acknowledge the existence of the Sun StorEdge L8 [tag]LTO[/tag] tape [tag]autoloader[/tag] I hooked up to it. I didn't think the screwdriver actually went into the cable at all. It just looked like it busted into the magnet that surrounds the cable near the end. That thing really needed to be driven through the desk. On the good side, it gave Bill and me a good excuse to use a hammer and a bunch of prying tools to "install" a tape autoloader.

I have been trying to implement a fairly reliable backup system for a few small file servers we have at the office. The previous group of people that managed the backups for these systems had a [tag]disaster recovery[/tag] plan that involved a rotation of backups that traveled through 3 separate physical locations. It seemed like a bit of overkill, but then again, it is better to be safe. The funny thing is that the [tag]backups[/tag] were all on a bunch of 4mm 20 gig (uncompressed) tapes. The 3 servers that were being backed up totaled somewhere around 500 gigs…maybe a bit less. The best part was that between the 3 servers there were only 2 tape drives. 2 very slow tape drives. Plus, the majority of the data being backed up was incompressible: movies, audio, and pictures mostly. So this involved a lot of tapes.

It took a good 3 hours for 1 tape to get filled. They would get no notification that the drive was ready for the next tape, so every couple of hours they would log into the machine, or just check whether the tape drive had ejected a tape, then switch it, and rinse and repeat for the 2 day (or more) long backup. Luckily incremental backups weren't as bad, but most of the time I don't think they could even happen given how long a full backup would take. If you forgot to change the tape for a while, you just might have wasted a whole day's worth of time that the backup could have been chugging along. The tapes would get put into a plastic tape case that looked like it was supposed to be rushed to the hospital for a life saving organ transplant. Then that would get carted off to the first off site location in the big 3 location backup plan.
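
For a rough sense of why that old setup hurt, the numbers above can be plugged into a few lines of Python. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and it assumes about 500 gigs of incompressible data, both drives running in parallel, and someone swapping tapes the instant each one fills.

```python
import math


def full_backup_estimate(total_gb, tape_gb, hours_per_tape, drives):
    """Return (tapes needed, hours of wall-clock time) for one full backup.

    Assumes no compression gain and zero delay between tape swaps,
    which is the best case, not what actually happened.
    """
    tapes = math.ceil(total_gb / tape_gb)
    # Each drive works through its share of the tapes one after another.
    tapes_per_drive = math.ceil(tapes / drives)
    hours = tapes_per_drive * hours_per_tape
    return tapes, hours


tapes, hours = full_backup_estimate(total_gb=500, tape_gb=20,
                                    hours_per_tape=3, drives=2)
print(f"{tapes} tapes, about {hours} hours of tape time")
# prints "25 tapes, about 39 hours of tape time"
```

That is the best case. Add in the time a full tape sat waiting for somebody to notice it, and the 2 day (or more) figure is easy to believe.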

Then the group that had been handling these backups, plus a bunch of other tasks, got moved to another location because of "streamlining" how their group worked. That is when my co-worker and I got stuck with all the fun. Neither one of us had the time to keep checking to see when the next tape needed to be changed, so a full backup would take maybe 2 weeks to finish.

Anyway, that is a bunch of back story that doesn't really matter. I really wanted to just complain about [tag]Backup Exec[/tag], and some oddness associated with the Arkeia trial installation I have been working on. The whole old backup system for these 3 machines used Backup Exec. I really really really don't like Backup Exec. The UI was poorly designed, the server has to run on a Windows machine, and [tag]Veritas[/tag]/[tag]Symantec[/tag] decided to screw over their customer base and not offer any encryption option unless you upgraded to their $20,000 Enterprise "we screwed you" 2.0 package (I made that price up). I didn't realize that until I went to upgrade the 3 client installs and the 1 Backup Exec server to their most recent version. But I did get a chance to try out the Sun StorEdge L8 autoloader we have had lying around for who knows how long. The L8 uses 200 gig LTO tapes (400 compressed), and when I tried the first backup on the trial of the new Backup Exec, the entire backup of the 3 systems took around 4 hours to finish, and everything fit on a tape and a half. On the bad side, the L8 only holds 8 tapes, one of which is a cleaning tape, so really 7. That isn't a safe number for a full, mostly automated backup strategy, but it is still much better than the previous setup.

After I found out about the lack of encryption support, that got weighed in with the crappy UI and the need for a Windows 2003 server, and we decided to try something else. Since my co-worker loved [tag]Arkeia[/tag] so much, I figured I would give that a try.

For a test install, I hooked the StorEdge autoloader up to a [tag]Sun V120[/tag] running [tag]Solaris[/tag] 10, and got a bunch of trial licenses for Arkeia. The installation was completely painless, and everything was pretty straightforward. The only part that took any time was getting the V120 to recognize the autoloader, but that can't be blamed on the software. It was more my lack of knowledge.

Arkeia has a really well thought out X interface that everything can be set up from, and you can install the server on a variety of platforms: Solaris, Linux, FreeBSD, etc. Most installs involve just typing rpm -i, or dpkg -i, or ./install, depending on the packaging system on the server. I was pretty surprised at how well thought out everything was.

After I got everything going, I tried the first backup. I left encryption off, and figured I would try the best (compression wise) compression method, which was [tag]LZ3[/tag]. The backup got started, and I looked at the fun little speedometer the X interface displays during an interactive backup. You can see a bunch of different metrics, like MB/h, MB/min, MB/s, and KB/s, for both the network and the backup speed. This is when things started to go downhill. The max backup speed I was getting was 5 gigs an hour. Then I thought maybe the compression was too much for a V120. The load on the machine was a little over 1, but still, something didn't seem right. I tried the backup again with no compression this time, and left work for the weekend (this was on Friday). Some time Saturday I logged in to see how things were going, and in 33 hours it had backed up a whopping 144 gigs. This was never going to finish.

I tried a bunch of different things, then on Monday we tried doing an scp of a large file from the V120 to various other machines. I was getting the same crappy throughput. The port on the switch was set to auto-negotiate, so I tried forcing it to 100/full duplex, but it made no difference. It must be a misconfiguration of some kind, either on the switch or with the interface on the server, but it was happening on a couple of the other servers on that same bank of switches, so I figured I would just try a more localized test install on my [tag]Sun Ultra 20[/tag], which is running [tag]OpenSuse[/tag] 10.0/64bit. Arkeia had an rpm for SUSE Enterprise 64 bit, and that installed without a problem.
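
To put the V120 numbers in perspective, here is a small sketch that converts the 144 gigs in 33 hours into the units the speedometer shows, and compares it to what a 100 Mbit link could move in theory. The helper names are mine, and it assumes decimal units (1 gig = 1000 MB) and ignores protocol overhead, so treat the result as a ballpark.

```python
def gb_per_hour(gigabytes, hours):
    """Average rate in GB/h over the whole run."""
    return gigabytes / hours


def gb_per_hour_to_mb_per_s(rate_gb_h):
    """Convert GB/h to MB/s, assuming 1 GB = 1000 MB."""
    return rate_gb_h * 1000 / 3600


def link_mb_per_s(megabits_per_s):
    """Theoretical payload ceiling of a link, ignoring all overhead."""
    return megabits_per_s / 8


observed = gb_per_hour(144, 33)                     # the weekend run on the V120
observed_mb_s = gb_per_hour_to_mb_per_s(observed)
link = link_mb_per_s(100)                           # port forced to 100/full duplex

print(f"{observed:.1f} GB/h is about {observed_mb_s:.2f} MB/s, "
      f"roughly {observed_mb_s / link:.0%} of a 100 Mbit link")
```

Under those assumptions the backup was using something like a tenth of the link, which fits with the scp test pointing at the network rather than at Arkeia.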

I really didn't want to shove the autoloader under my desk, and I found a SCSI cable that was long enough to let me put the autoloader on the corner of my cube against a wall. The only problem was that the hole in the desk for cables to pass through couldn't fit the whole SCSI cable end. That left me with 2 options: leave it under the desk, or figure out a way to get the cable up behind the desk. Which is where the [tag]hammer[/tag] and a bunch of large screwdrivers came in. My co-worker pried from the top, and I was prying with another screwdriver from the bottom while trying to push the cable through the little opening. I was thinking how funny it would be if we ripped the desk out of the cube wall by accident and the whole thing crashed on top of me (including my co-worker), but the cable got through. Except for that damn metal cylinder at the end of the cable. This was going to take some finesse. After trying everything, I decided to use a Phillips head as a wedge, and just smacked it as hard as I could until the stupid metal/plastic/rubber thing went up through the crack…with the screwdriver inside. The cable looked fine, but apparently it wasn't.

This morning, after trying everything I could think of to get my system to recognize the new SCSI device, I figured I would try another cable, but all I could find was a little 3 foot long one. So under the desk the autoloader went. It is actually just balancing on top of a little [tag]terastation[/tag] NAS device. If I touch it with my foot by accident, I am sure it will flip on its side, but that is part of the fun.

So, I plugged in the autoloader, reloaded the SCSI card module, and lo and behold, there it was in all its glory. So I set Arkeia up real quick and got a backup going. No compression or encryption, the same as the last backup I did on the Solaris install. The backup speed is now averaging 30-40 gigs an hour.
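
Just for fun, here is what those two speeds would mean for a full backup of roughly 500 gigs. It is a sketch only: the function name is made up, the 500 gig total is the rough figure from earlier in the post, and it assumes the rate stays constant for the whole run, which it never does.

```python
def hours_for(total_gb, rate_gb_per_hour):
    """Hours needed to move total_gb at a constant rate."""
    return total_gb / rate_gb_per_hour


TOTAL_GB = 500  # rough size of the 3 servers, per the estimate above

rates = [
    ("V120 (144 gigs in 33 hours)", 144 / 33),
    ("Ultra 20, low end", 30),
    ("Ultra 20, high end", 40),
]

for label, rate in rates:
    print(f"{label}: {rate:.1f} GB/h, about {hours_for(TOTAL_GB, rate):.1f} hours")
```

So the same job goes from the better part of 5 days on the V120 to somewhere between 12 and 17 hours on the Ultra 20, if the averages hold.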

I have no idea what was up with the v120, but if you saw our network closet, our network…actually, any of our stuff, you would run in horror. So now I can add that to my never decreasing list of tasks.

"figure out why throughput on half the equipment sucks"

The funny part I guess is that my Ultra 20 is my main workstation. I wrote this post on it, in [tag]KDE[/tag], with a bunch of other stuff running all during the backup.

