Wednesday, December 31, 2008

Update on the Zune issue

Who called it? Who da man?

The Zune freeze issue is a result of the leap year.

OK, enough gloating. Hopefully they'll fix the firmware before 2012 ;-)

Note to anyone selling equipment

ALWAYS wipe your equipment before you sell it to anyone.

This includes things like hard drives and network devices.

I can't mention any names at all, or specifics, but I ordered a couple of refurb routers a few days ago, and I was very surprised today when I saw full router configs in place, complete with IPSec settings, ACLs, and plaintext read/write SNMP strings.

Always wipe your configs before you sell the devices. Always.

Zune: Y2K + 8 11.9/12

If your Zune doesn't work this morning, it's not just you.

Apparently, Zunes all over the world froze at or near midnight (local time) last night.

The fact that this follows an iPhone post isn't me gloating, that's just a coincidence. Promise. ;-)

[Edit] It just occurred to me that since this is a leap year, today is day 366. It would be hilarious if they screwed that up.

The mobile device is dead! Long live the mobile device!

My blackberry died at a particularly inopportune moment over the holiday weekend. Specifically sometime between when I went to bed on the 25th and when the firewall cluster members decided to kill each other on the 26th. In any event, I didn't find out about it until nearly noon, which is Not Good(tm).

To rectify the situation, I got permission to go buy another phone, and I cleared it with the president to pick up an iPhone, since there were a lot of positive responses when I asked, way back in November. Matt's response in particular swayed my opinion.

In the intervening week, I have to say that I've become pretty attached to the thing. With some additional apps from the App Store (if you're considering getting an iPhone, prepare to hear that phrase a lot), it becomes much easier to type mail (get Firemail for landscape typing), there are free RDP, VNC, SSH (TouchTerm), and other apps as necessary, and best of all, there are built-in settings for VPN (IPSec, PPTP, and L2TP). Browsing the web using Safari in landscape mode makes even Opera Mini on the Blackberry look like masochism.

In fact, the only complaint that I have is that the notification options suck. Apple really, really dropped the ball with the configuration options for notifications. You can change ringtones for phone calls and text messages, but you cannot change ringtones or adjust volume for incoming email. At all. And the default notification is a quiet, polite "blip", which doesn't wake me up at 3am. And that's a deal breaker.

Before I took my iPhone back, I wanted to try everything, so I decided to jailbreak it (I used QuickPwn for Windows) and see what I could change. The process went very, very smoothly (as soon as I realized that the "power" button is the one on the top right that you click to lock the phone).

I used Cydia (the jailbreak equivalent of the App Store) to install OpenSSH on the phone:

Matt-Simmons:/usr root# uname -a
Darwin Matt-Simmons 9.4.1 Darwin Kernel Version 9.4.1: Sat Nov 1 19:09:48 PDT 2008; root:xnu-1228.7.36~2/RELEASE_ARM_S5L8900X iPhone1,2 arm N82AP Darwin

and used that to sftp in and copy the ringtone to my desktop, which I then modified using Audacity to increase the volume and add a double beat to the beginning, so that it now goes "chi-ching!". I saved that, exported it as an AIFF file, renamed it to the original new-mail.caf, and then dragged it back across the sftp pipe to the phone. I sent an email to myself, and I'm now guaranteed to be woken up if I get mail at night.
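
For anyone who wants to repeat the trick, the transfer itself is just sftp in both directions. Roughly, it looks like the following; the phone's address and the sound file path are from memory, so treat them as placeholders and verify them on your own device (and change the default root password, which is "alpine", while you're in there):

# pull the stock sound off the phone
sftp root@192.168.1.50
sftp> get /System/Library/Audio/UISounds/new-mail.caf
sftp> quit

# ...edit in Audacity, export, rename back to new-mail.caf...

# push the louder version back into place
sftp root@192.168.1.50
sftp> put new-mail.caf /System/Library/Audio/UISounds/new-mail.caf
sftp> quit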

I should really look into getting a dev kit for the phone. It would be really handy to support actual profiles and to use the GUI to set things like this up. There is a local terminal app available, but it doesn't appear to be supported on my firmware. I'm sure it'll be updated shortly.

Anyone else have any neat tricks for a jailbroken iPhone?

Tuesday, December 30, 2008

Scare Tactics and Security Warnings!

I like looking at big scary apocalyptic events. There's just something...calming...about it. Watching movies where the Earth gets destroyed makes me feel better about the real world and how comparatively un-screwed-up it is. This tendency of mine has spread to the internet, I think. I talked about some craziness a while back, but today's news is much more fun.

Hackers at the Chaos Communication Congress announced today that they have managed to break SSL's chain of trust by using 200 PS3s to compute MD5 collisions. Not just that they can spy on communications between hosts communicating over SSL, but that they can brute-force create a "trusted" certificate for whatever they want.

So let me posit a quick scenario. Hackers use the BGP flaw to redirect your bank's traffic to their server, where they've installed a freshly created fake trusted certificate and they man-in-the-middle till the cows come home. Not even two-way authentication can help you then. The best part is that these aren't "bugs" in the applicable protocols as much as flaws in their design.

I suppose in the beginning banks and other lucrative targets can filter known-offenders from their access lists, but the use of botnets will stop that from being an effective tactic.

I wonder if [EDIT] two-way PKI will start being cost-effective to implement in that case, since (as I understand it?) the keys and certs aren't being recreated byte-for-byte; they're creating a rogue certificate authority and using that to issue certs. There's a large difference between that and replicating someone's 2048-bit private key. At least, I'm pretty sure. IANAC (I am not a cryptologist).
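
In the meantime, one practical thing you can do is check what hash your own certs (and the ones your vendors hand you) are signed with, since the demonstrated attack depends on a CA that still signs with MD5. A quick sketch, with a placeholder hostname:

openssl s_client -connect www.example.com:443 </dev/null 2>/dev/null \
  | openssl x509 -noout -text | grep "Signature Algorithm"
# md5WithRSAEncryption is the scary answer; sha1WithRSAEncryption is what you want to see right now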

If the large institutions decide not to do anything, it might get really interesting. Maybe we'll have to go back to writing checks. ;-)

Monday, December 29, 2008

Adventures in VOIP part 2

This is a continuation of Adventures in VoIP part 1


Elastix

The harder half of this endeavor has been the configuration of Elastix. I missed most of the operating system install, but I have been doing a lot of the work getting the extensions set up and configuring the operator panel. My boss got to set up the inbound and outbound routes and configure the trunk lines on the server. Being a Windows guy (and my DOS days being long behind me), I am not all that comfortable working straight from a command line anymore. Thus I attempted to use the web GUI supplied with the software.

The web GUI is not actually all that bad. I can honestly say they spent some time working on it, but there is one thing they did that drives me absolutely batty. What the hell is up with the red bar? You go in and edit an extension. At the bottom of the form is your standard-issue submit button. You think you've made your change, you go and check, and nope! It's still the same. You must have missed the red bar. Check out the image. As you can see, the red bar isn't all that red and looks very much like part of the natural background, up until you look closely, see the pale blue text that says "Apply Configuration", and proceed to facepalm. Unembedded FreePBX (the Elastix form is actually a front end for it) does this right. Notice the orange on blue: completely contrasting, it smacks you right upside the head and tells you that you need to do something. It's noticeable.

Another annoyance encountered dealt with the batch upload. Rather than manually setting up 40+ extensions, you can load a simple CSV file and get all of them in at once. After loading (I did remember to click the red button), only some of my extensions worked properly. Oddly enough, only the ones manually entered. I checked to make sure the settings were exactly the same and, on a whim, I decided to just hit submit and reload the config. Of course, the previously non-working extension started to work. I then proceeded to manually re-submit all the extensions to get them working. I am certain I could have done that from the command line, I just didn't know the way, and with my luck I would have killed something by accident (yes, I have that kind of luck. Ask me about my dead RAID unit sometime, and try not to laugh too hard at me).

With that out of the way, the next task was getting the operator panel online. One thing we noticed is that it could only display 39 extensions in its default configuration. So after a bit of googling, I came across instructions for altering the operator panel. And there is no GUI for this. Off to the command line I go. One way a lot of users decide to show more buttons is to physically change how big they are. This option is a no-go for me. Firstly, getting them to look good is a pain in the ass, from what I have read. Secondly, our receptionist is somewhere around 70 and her eyes aren't what they used to be. (She is surprisingly good at working a computer, as far as receptionists go, at any rate. She calls in a timely manner when there is a problem and is polite when something needs fixing.) So to change button positioning, there is a text file to edit (op_buttons.cfg) and a perl script to edit (retrieve_op_conf_from_mysql.pl).

The buttons config defines the area, in pixels, that the buttons will take up on the screen. You can also change column headings, column colors, and a few other options. The perl file is where you actually change where the buttons go. Apparently you edit the perl file so it can generate a buttons config file (op_buttons_additional.cfg), and that file is included by op_buttons.cfg to get the buttons and their placement. Any manual changes to op_buttons_additional.cfg get nuked whenever Asterisk restarts or the panel reloads. My first attempt at editing the two files was a complete disaster. I found out that it will not automatically extend its default four columns downward, but it will certainly add more to the right, off the screen. So that was a dismal failure. I ended up removing the entry for queues (for call queues, if you are running a call center) and extending my extensions into that space.

With that issue solved I moved on to the next one: I was not getting all of my parking lot extensions. For those who have not dealt with larger phone systems (namely me, before this job), a parking lot is a set of extensions used for holding calls for other users. That way you can transfer a call there and someone can pick it up anywhere in the shop, instead of racing to their extension before the voicemail gets it. Anyway, we have nine parking lots set up, 51-59, and the operator panel was only displaying five of them. I double-checked my configuration and I had set up nine, so I delved into the mysterious perl file again and found this:
for (my $i = 1 ; $i <= $numberlots && $i <= 5 ; $i++ )
Now I don't know perl, but I am pretty damn sure I can recognize a for loop when I see it. Two seconds and a reload later and I am in business with all the lots.
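
If you hit the same wall, the whole fix amounts to raising (or removing) that hard-coded 5 and then kicking Asterisk so op_buttons_additional.cfg gets regenerated. I'm reconstructing this from memory, so double-check the paths on your own Elastix box:

# in retrieve_op_conf_from_mysql.pl, the loop becomes (for nine lots):
#   for (my $i = 1 ; $i <= $numberlots && $i <= 9 ; $i++ )
# then restart so the generated button config is rebuilt:
amportal restart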

And that is pretty much where I stand now. I'll publish further update(s) and anecdotes from the whole process when the system actually comes online.

This was written back on the 18th, and since then the system has come online. There will be more forthcoming in this series as soon as I get time to actually write it.

If you're looking for documentation, Craig Borysowich might be your man

I tout documentation quite a lot, but the specifics behind the actual documents have been a little fuzzy. For instance, having an internal wiki is invaluable, but it's just as easy to create crap documentation as good documentation (probably easier). What goes into a good document? What form should it take? There are no easy answers.

As with many things, one of the best ways to learn is to examine what other people have done, and that's where Craig Borysowich comes in. If you don't read his blog on IT Toolbox, you should. He consistently produces excellent examples of documentation with his Deliverables series. He also completed posting an example of a system blueprint. For anyone who hasn't already done something like this (like me), it's an amazing time saver, since Craig has done all the hard work.

Like I said, if you're looking for excellent examples of documentation, you owe it to yourself to check out his blog.

Sunday, December 28, 2008

Backup (and Recovery)

I'm reading through an enjoyable book called Backup and Recovery by Curtis Preston, and I thought I'd recommend it to any of you who are looking for more information on backup (and more importantly, recovery) schemes. Curtis runs a site called backupcentral.com, which hosts a wiki and forum about backup solutions, commercial and opensource.

I hadn't heard of it, and I figured some of you might not have either.

Friday, December 26, 2008

IT Admin groups on social networking sites

Social networking sites are on the rise, that much is apparent. Tom Anderson sold myspace.com for $580 million. Current estimates are guessing over 140 million users on Facebook. And before you think that social networks are just for kids, LinkedIn hosts 30 million profiles of experienced professionals who are looking to network with others. Clearly, these sites are tools which can be used to learn and grow in a professional capacity.

I've had my LinkedIn account for a long time, and initially I resisted the others. Eventually I succumbed to Facebook, then myspace, mostly due to peer pressure. Since I have accounts on those three networks, I figured I'd check to see if there were any groups put together by IT administrators. And how.

Most of these groups feature discussions on various topics that you might find interesting. Check them out, and let me know what you think.

LinkedIn
IT Management
System Administrator (Mac, Win, and Linux)
System Administrators
Nagios Administrators

Facebook
Unix Sysadmin
Linux Administrators
Cisco Systems
Appreciate your sysadmin
*NIX
Network/Security/System Administrators
System Administrator Appreciation Day

MySpace
Sysadmin Superstars
Network Admins / Engineers / System Specialists
Computer / Network Administrators
Sysadmins
Network / Sysadmin / Comptechs
Network Engineers

If you know of any other social networking sites (or other types) that you'd recommend, let us know in the comments. I'm always looking for other sources of information, and I know lots of other people are too.

[EDIT]
Talk about coincidence. I wrote this last night and scheduled it for this morning for 8:30am. Before it could go live, Dru posted a link to some BSD Certification groups created on LinkedIn. Funny how things happen sometimes :-)

Thursday, December 25, 2008

Systems Administration Advent Calendar

I really, really thought I linked to this already, but I can't find it, and I'm very sorry! Anyway, it's complete now, so you get to read it all at once. Jordan Sissel put forth superhuman effort this year to start up an Advent Calendar for Systems Administrators. He wanted it to be in the same vein as the Perl Advent Calendar.

Out of the 25 entries, Jordan completed 23 of them. Ben Rockwood wrote Day 17, Time Management, and I wrote Day 23, Change Management. Other than those two, Jordan wrote one a day, each a long, detailed piece on a different subject, and he did an amazing job. Read through the articles, and I know you'll find them useful and interesting.

Everyone who gets something from Sysadvent owes Jordan a thanks. Please comment on the blog there and let him know you appreciate the work, because he did an amazing job!

Merry Christmas!

First, a little joke I found:

I was musing on similarities between Santa Claus and system administrators. Consider:

  • Santa is bearded, corpulent, and dresses funny.
  • When you ask Santa for something, the odds of receiving what you wanted are infinitesimal.
  • Santa seldom answers your mail.
  • When you ask Santa where he gets all the stuff he's got, he says, "Elves make it for me."
  • Santa doesn't care about your deadlines.
  • Your parents ascribed supernatural powers to Santa, but did all the work themselves.
  • Nobody knows who Santa has to answer to for his actions.
  • Santa laughs entirely too much.
  • Santa thinks nothing of breaking into your $HOME.
  • Only a lunatic says bad things about Santa in his presence.



Also, here's Admin's Night Before Christmas. Also, the 12 Days of Admin Christmas. There's a Unix Christmas Carol, too.

I hope you have a festive holiday, whichever it is that you celebrate, and stay safe during the ensuing weeks.

Tuesday, December 23, 2008

"I'm sure it's nothing..just a random nagios timeout error. I'll be right back"

I said that to my wife 3 1/2 hours ago as I climbed out of bed to troubleshoot what I thought was a temporary network latency issue. Hah.

I've spent hours on the phone with Juniper support tonight trying to nail down why my NetScreen SSG5 firewalls randomly attack the other cluster members. Tonight was the worst so far. Not only did they fight over cluster master, ns1 actually cratered. It refuses to talk on the network. It won't even see ns2 in the NSRP cluster, even though there's a direct cable connection.

Juniper is sending me the latest firmware, and the colo is going to ship me my firewall so I can re-image it and try again.

Of course, all of that is going to happen at some point after I wake up.

Friday, December 19, 2008

More undersea cables cut...this is not a repeat of a few months ago

Yes, three undersea cables have been severed. It's not like this is unprecedented or anything.

We talked about undersea cables a while back.

Thursday, December 18, 2008

Adventures in VOIP Part 1

In the interest of trying to avoid overload, this has been broken up over a couple of days. Read on, and I look forward to your commentary.

Hello again, it's time for another infrequent post by Jim the Windows admin. Since about Thanksgiving, here at our shop we have been getting a crash course in VOIP setup. Our current setup is an old Executone system that had been a workhorse here at the shop for at least 8 years before I even got here. Unfortunately, like all old horses, it needs to be put out to pasture. We initially looked at going to VOIP almost a year ago, and to paraphrase the CFO, "Limp the current system along as long as possible". So of course when November rolls around and the system is restarting itself and dumping all calls 4-5 times a day, we suddenly have funding for a VOIP setup. Go figure. So luckily for us, we have a server we just freed up, and our adventures with Asterisk can begin!

The project kicked off with the boss installing CentOS 5.0 and Elastix on a newly spare server. The last 3 weeks have been spent with the 3-man IT department here testing various phones and trying to emulate our current functionality as closely as possible in the new system. For the most part the process hasn't been too bad. We've made a couple of rookie mistakes here and there, but we have a mostly operational system. Here are a few things I've discovered while working on it.

Phones

With the rushed timetable we were handed, we have looked at 4 different brands of phones for use here in the shop. We looked at the Linksys SPA942, a Polycom SoundPoint IP 330, a Polycom SoundPoint IP 430, a Cisco 7941, and a little later an Aastra 480i (make a note of this model number, it will come into play later). The dead simplest to set up (and the one the brass here liked the most) was the Linksys SPA942. If you don't need to do a firmware upgrade, you can go from box to working in about 5 minutes, which we in IT thoroughly enjoyed. Of course, after using the phone a bit more, we found that it is not as configurable as some of the other phones on the market (say, the Aastra) but is perfectly serviceable for what we need. Not to mention it doesn't require text files for configuration like the Polycom and Aastra phones. As for the Cisco, I'm not sure if we ever got it working. The boss said he would get to it, and I never felt like dealing with it. Sometimes it is good to be the underling.

After doing a little playing around with the setup on the Linksys, we decided to try out its functionality with PoE and connecting the phone inline between the computer and the wall. To do this, we used a Dell PowerConnect 3448P to provide the PoE, and we also configured a VLAN for the phones. All was well, up until I decided to change something on one of the phones and my internet connection dropped. Apparently, when the phone reboots (which it does whenever you make a change on its web interface), the power to its ethernet ports is cut temporarily and you lose packet forwarding through the phone. Granted, it's only momentary, but suddenly dropping your SSH connection to the phone server is very annoying when you are editing a file.

The other thing I found during this process concerns our network map of the shop. The network map of the shop is old. And by old, I mean somewhere in the neighborhood of 6-8 years old. In that time, a new area in the shop was wired up and ports have been added in new places. And apparently no one bothered to label these new ports on the map. Proceed immediately to crawling under desks and trying to read someone's chicken-scratch handwriting, since they didn't bother with a label maker either. And to top it off, if the port was a low number (i.e. 6), you had to figure out if it was panel A or panel B. (Apparently starting the numbering where the first panel left off would have been too easy.) All said and done, flashing phones, installing them on desks, and moving ports around took about 3-4 hours and ate most of the Sunday after Thanksgiving.

Remember that 480i I mentioned earlier? Well, my boss wanted to get a phone for our photo studio that would have multiple handsets, since it is a rather large work area. And the 480i seemed to fit the bill. Of course, he failed to notice one minor thing. Aastra has a 480i and a 480i CT. The two phones are completely identical in look and function (and manual, according to the site), excepting of course the ability to pair to a cordless handset. I am still waiting for the right phone to come in.

Work status update

I figure this is a blog, so every once in a while I should just talk about what's going on.

We decided to go with Apptix for our hosted email solution. They seem to have a pretty straightforward interface for adding users and administering the server. I am a little disappointed that they don't offer an Active Directory import tool, but I'll live. Adding users is relatively painless. Getting all the strange little groups and rules set up will take a bit more time.

My boss is currently in NJ. Since I didn't get the snort machine ready in time for my trip to NJ last week, he's putting it in the rack for me, which I really appreciate. I'll get that into service before too long.

I've been getting behind on tape backups, so I've been concentrating on them this week. I managed to free up a few hundred gigs on an array by finally getting some archives in the tape safe. The freed up space will promptly be used by more archive logs, I'm sure.

It seems like the more vendors I work with, the more sites I have to use that require Internet Explorer. It's a PITA. I've got to load up a VM of some type, log in, load the browser, etc etc, or I've got to remote desktop into another machine and do the same thing. I finally broke down and tried IE4Linux again (which Nick reminded me of). It's improved a lot since I last tried it, and it lets me work on IE only sites. I haven't tried anything too advanced, but it does what I need. The underlying issue is that web client programmers don't understand that a large contingent of technical individuals don't have IE available to them (at least easily). At least I don't think they do.

I'm getting a new tape drive (hooray!). Right now I've got an Exabyte (really Tandberg now) 1x10 VXA-3 packet loader. We'll be getting a Quantum Superloader with 16 slots and an LTO-3 drive. It's not as modern as the LTO-4, but it's bigger and faster than our VXA-3 and has 16 slots rather than our 12. The media is actually cheaper, too. This means I'll be needing to re-engineer my backup solution. I'm still deciding between Bacula and Amanda.

My first article has been approved for publication in Simple Talk: Exchange. It'll be in the January issue, which of course I'll link to. Many thanks to my editor Michael Francis for his patience with my horrible writing :-)

And in further good news, I have learned from my boss that they're considering getting more manpower to help me cover the infrastructure here. I don't know what shape or form it will come in, but I'll have a hand in the decision. I'm keeping the name of the blog, regardless of what happens ;-)

Also, Jim is working on a long blog post detailing his recent arguments with his VoIP mess. Look forward to that soon!


So that's been my recent life in a nutshell.

Tuesday, December 16, 2008

IE Vulnerability - We're stopping use

If you haven't heard, there's a serious IE Flaw which is causing lots of people to recommend temporarily switching from IE to Firefox (or Safari, if you're on a Mac).

Our company just went one step further. We're ending the permitted use of IE. With the exception of those sites which require it, or which we control, we are not permitting our users to browse with IE. Between this and the last few issues (mouse pointer vulnerability, anyone?), IE isn't good enough to make us risk the loss of corporate data over it.

Mounting disks by UUID rather than /dev/device

Most everyone who has to do it knows how to mount a disk in Linux. It's easy: mount /dev/whatever /mnt/whatever.

What becomes hard is when I've got a bunch of external drives hooked up to a machine and it reboots: how do you make sure everything gets mounted where it belongs? I used a hack. Every disk has a .disk file that holds the directory name under /mnt/hd where the drive will be mounted. Upon boot up, the server mounts each of the USB disks it finds into a staging directory, checks the contents of the file if it exists, and then remounts the disk to the intended location. Like I said, a hack.

That is because when I was creating that horrible scheme, I didn't know about Universally Unique Identifiers (UUIDs). Each disk has one, and you can find it by looking in /dev/disk/by-uuid/, where each disk shows up as a file named after its UUID, symbolically linked back to the /dev device.
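
Seeing them is as simple as listing that directory (blkid will also tell you), and fstab happily accepts a UUID in place of a device name. A quick sketch, with a made-up UUID and mount point:

ls -l /dev/disk/by-uuid/
blkid /dev/sdb1

# then, in /etc/fstab:
# UUID=3e6be9de-8139-4a6f-9106-a43f08d823a6  /mnt/hd/archive1  ext3  defaults  0 2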

It wouldn't be too hard to write a script that looped through that directory, checked if a disk was mounted, and if not, mounted the disk according to either fstab or some other database of known disks.
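
Something along these lines is what I have in mind. This is an untested sketch, and it assumes every disk you care about already has a UUID-keyed line in /etc/fstab:

#!/bin/bash
# mount any disk that fstab knows by UUID but that isn't mounted yet
for link in /dev/disk/by-uuid/*; do
    uuid=$(basename "$link")
    dev=$(readlink -f "$link")
    # only touch disks we've listed in fstab by UUID
    grep -q "^UUID=$uuid" /etc/fstab || continue
    # skip anything that's already mounted
    grep -q "^$dev " /proc/mounts && continue
    mount "UUID=$uuid"
done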

Anyone using these methods? They strike me as much much more flexible than mounting by device name.

Survey Complete

Well, the IT Admin Job (dis)Satisfaction survey is done, and I'm currently reviewing the results. There were 334 responses to the survey, which is just tremendous. We should definitely get a feel for what other admins go through, and what our work environments are like. To whet your appetite, here are the most popular answers:

1) I deal with:

Server software - 92.8% (309 responses)

2) I am on call

24x7 365 (all the time) - 43.5% (145 responses)

3) I am the primary point of contact in the event of any failure of the IT infrastructure

True - 59.8% (196 responses)

4) The number of users (total) supported in your organization

200+ - 52.1% (174 responses)

5) The number of people providing support to those users

2-4 - 41.0% (137 responses)

6) The number of servers (physical and virtual) in your organization that are administered

200+ - 21.0% (70 responses)

7) Number of people administering those servers

2-4 - 43.4% (145 responses)

8) Total number of WAN network connections in your organization

2-4 - 30.8% (102 responses)

9) I am...

a) overworked
Agree (118 responses)

b) unappreciated
Agree (107 responses)

c) paid sufficiently
Disagree (129)

d) Happy that I am in my job role
Agree (179 responses)

e) Enjoying my job
Agree (176)

f) Seeking other employment
Disagree (112)

10) Do you think that most people in your position have it better or worse than you do?

Worse - 64.0% (208 responses)


Now, it's important to keep in mind that the raw survey results are just that: raw. Looking at these results in isolation might lead you to some false conclusions. For instance, you can see that 52% of people have over 200 users in their organization. You also see that 41% of respondents say that there are 2-4 people supporting their user base. This might lead you to believe that it's very common for 2-4 people to support 200+ users, but in reality, when you filter for people who have 200+ users, you see that over 67% of them have 5 or more support people for that user base. 24% of the 200+ group have 10 or more support personnel for their user base.

One thing that struck me was that, almost without exception, people thought that others had it worse off than they themselves do. My asking that question was my equivalent to the computer asking Spock how he felt. I wanted to end with a question that might throw people off a little bit, and now I'm glad I did. It's an interesting metric: people consistently felt that, despite how overworked and under-appreciated they were, other people had it worse.

Anyway, I'm working on a much more in-depth report on the results with all sorts of interesting findings (including the one metric that might change an unhappy person into a happy one, and it has nothing to do with money!). Thanks, everyone, for your responses!

Thursday, December 11, 2008

Some Campfire Stories

Here are some old stories passed around and down and across. Some are probably tall tales, but they're all interesting in some way, and involve sysadmins trying to recover from problems.

http://www.cavecanen.org/linux/humour/horrorstories.txt

Wednesday, December 10, 2008

Never thought to check there...

I never thought about it, but there's a Wikipedia entry for Systems Administrators.

It's an interesting view from inside the cage, so to speak.


Also, completely on topic with the subject of this post...

There's sort of an unwritten rule in networking and systems that states that the stranger a problem is, the more likely it is to be DNS-based.

We spent a week tracking down an issue with one of our users who could see a file on one server, but not on another, even when both were pulling from the same fileshare.

We made no headway until I tried to access the file myself. I could log into one of the servers with my newly minted account, but not the other. Using dig, we figured out that one of the machine names was an old DNS entry that should have gotten updated but didn't. So when she logged into that name, she was actually landing on an old testing machine: files newer than what was on that box didn't show up, though they did on the production machine.
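
The check itself was nothing fancy: compare what the name resolves to against what the box at that address actually is, and a stale record jumps right out. The hostname and address here are placeholders:

# what does DNS say the name points to?
dig +short fileserver.example.com

# and does the reverse record for that address agree?
dig +short -x 10.1.2.3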

Always suspect DNS for weird issues.

Tuesday, December 9, 2008

What? SSH stuff AGAIN?!?!?

Apparently the SSH fiasco isn't done. I didn't believe it either, but there are still things that haven't been covered!


Daniel, at Bonetree Blog, wrote an overview of a great tool to have in your toolbox: SSH tunnels. Completely aside from the inherent security that an SSH tunnel provides, I've got lots of random hardware (usually cheap routers, APs, and the like) that will only allow an administrator to log in from the same subnet the device is on. That's a pain in the butt when you're a couple of states away! To remedy this, I connect to a server that IS on the same network as the device and I create an SSH tunnel through that server to get to the appliance. Daniel explains it better than I'm doing, and he actually uses it to make a SOCKS proxy. Just read his article.
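
If you just want the flavor of it before you click through, the two variants I lean on most look like this (hosts and ports are placeholders):

# forward local port 8080 through a server to the web interface of a device
# that only answers to its own subnet, then browse to http://localhost:8080
ssh -L 8080:192.168.10.1:80 admin@server.example.com

# or turn that same server into a SOCKS proxy on localhost:1080 and point
# your browser's proxy settings at it, which is what Daniel describes
ssh -D 1080 admin@server.example.com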

Monday, December 8, 2008

Layers of (non)abstraction

I couldn't sleep the other night, and to try to put myself to sleep, I figured I'd try some remedial programming. I grabbed the Linux 3d Graphics Programming book because it's A) relatively interesting, and B) actually a pretty good primer if you want to refresh the basics of how programming, windowed interfaces, and 3d graphics in general work.

While reading through the section on object factories, a realization hit me. There is an inherent difference in the way that programmers work and systems administrators work. Take the idea of objects, for example. As a programmer, as long as you know the interfaces to an object, you don't have to know anything else at all about the object. You don't need to know its inner structure, you don't care how its variables are defined inside, you just want to know how to access it.

I can't think of a single thing in the whole IT infrastructure that a sysadmin can look at as a black box. Even a server may need to be taken apart and repaired. Heck, I know guys who replace blown capacitors on motherboards. From the base electronics up through logical network design, sysadmins have to cover it all, and the smaller the shop, the more you have to know. I sometimes laugh at (and other times lament) the fact that I can be interrupted from enterprise storage design to fix someone's broken mouse, and both are equally valid parts of my job.

I was talking to someone a while back about an open position that they had, and they asked me how wide ranging my experience was. I thought for a second, and said “Do you see that lightswitch? At my company, if it was broke, I'd have to fix it”.

Such is the life of a sysadmin, I suppose. I have wondered if I would even be happy at a place where I didn't have so many different things to do. I think I could somehow manage.


By the way, I'm in the NY/NJ area working this week, so updates may be (even more) sporadic. Just fair warning :-)

Thursday, December 4, 2008

Great comment on Bruce Schneier's blog

Bruce put up a blog entry today talking about a one time password generator built into a credit card. It sounds neat, sort of like an RSA security token.

Anyway, in the comments, there was this gem that made me laugh out loud:


I've never understood how adding 3 more digits to a 16-digit number makes it more secure in the first place. Is this so that if you only managed to steal a copy of the FRONT of a credit card, then you don't reduce the length of the staff of Ra by the right amount and dig in the wrong room? Talk about movie-plot threat!

Posted by: bob at December 4, 2008 7:00 AM


You always get bonus points in my book for referencing Indiana Jones.

User Support, Intrusion Detection, and Broken Firewalls. Kids, don't try this at home

I want to start by thanking everyone who has taken the IT Admin Job (dis)Satisfaction Survey. I've gotten around 250 responses in the couple of days it's been up, and that's great. I'm seeing some interesting trends, and I hope to continue to receive responses for the next couple of weeks before publishing results. I'm leaving a link to the survey in the top right hand corner of the blog's homepage (http://standalone-sysadmin.blogspot.com for those of you who use RSS readers), so if you haven't taken it yet, please take a moment and go for it.

I've been very busy at work in the past couple of days, which explains the lack of blog entries. I've been hip-deep in user issues while I've been trying to work on building a Network Intrusion Detection System (NIDS) using Snort. Then I had a firewall cluster member die and try to take the remaining member down with it. It's just been a fun week so far ;-)

If I don't get to write another entry till the end of the week, I apologize, it's just that I usually write my blog entries the night before, and I've been beat and haven't had the energy.

If any of you have killer Snort tips, my ears are open. I'm using the extensive documentation that is available on the Snort site, and also Network Security Hacks, a very fun book to leaf through, and absolutely worth the $20 sale price.

Tuesday, December 2, 2008

IT Administrator Job (dis)Satisfaction Survey

I drove a lot today, returning home from my holiday weekend. While I was driving, my mind wandered to my job. It was supposed to be my day off, but all I did today was fix things and provide support to users. I was bitter, because I felt like I never have time off, even when I'm off, and that I'm never away from work, even when I'm at home.

Then I started wondering exactly why I was whining so much. It's not like these aspects of the job were unknown to me when I joined the company. At our Christmas party two years ago, my boss told me that part of the reason he hired me was because I told him about the other times I've shrugged off personal life to fix things for work. I guess I should have seen it coming.

I thought back to some other conversations I've had with people, and what their experiences are as admins, and really, the ones I've talked to don't have a clue how the rest of the world works, in terms of numbers of supported devices and supported users, hours worked, etc etc.

To fill this gap, and hopefully to provide some transparency to our profession, I drew up a quick 10-question survey at SurveyMonkey (a great site for building surveys). I call it the IT Administrator Job (dis)Satisfaction Survey. Please take it; it will only take a few moments of your time, and every result helps to add to the shared knowledge of our positions.

The questions aren't perfect, but I think that they should shed some light. I'll leave the survey open for 2 weeks, till December 16th. After that I'll examine the results, produce some graphs, and post the results for you to see.

Thanks for your help!

Monday, December 1, 2008

RSS aggregation of blog entries and the like

The other day, I read a post on the MySysAd blog about people copying blog entries verbatim and passing them off as their own. esofthub added a signature to the end of his RSS feed to automatically credit the source of the material, which I think is a great solution.

I wasn't too worried about it until I google'd for a random string from one of my blog entries. The results led me to http://www.melonjuice.com/planet, which appears to be an RSS aggregator for technology blogs.

The owner of the site doesn't claim at all to be the author of the material, and I love that my feed is getting aggregated there, but if you look at the site, there aren't any sources listed, except a small link at the bottom with the first name of the author and a link back to the source post.

Like I said, I don't mind, but I have added a 'signature' to my RSS feed which displays the source domain. If it bothers anyone, please drop me a line at standalone.sysadmin@gmail.com and let me know. Otherwise, I'll keep it on. It doesn't seem too intrusive and provides confirmation of the source of the data.

And lest there be any doubt, if you have an aggregator on your site, feel free to aggregate this blog. You're very much welcome to do it, and you don't even have to let me know, though it's neat when I hear from a new site that's doing it :-) Thanks for your support and interest in my material!

Sunday, November 30, 2008

Gift ideas for you and the other sysadmins in your life

I meant to post this on Friday, but I was busy recovering from standing in line for a couple of hours to buy a new TV for $400.

Every Christmas people ask me what I want, and I always give the kneejerk response, "I don't know, nothing?". I usually can't come up with something that I genuinely need or want, though there are lots of things I'd be pleasantly surprised by. I'm really not hard to shop for, but I think that people think I am. I'd be happy getting nothing, or even just a card.

Anyway, I set out this year to try to compile a list of things that they can get without worrying whether I will like it. I found some neat stuff online, and thought that you might be interested in the same things I am, so I've compiled a list of stuff, or more like a list of lists. If someone wants to know what to get a geek, just hand them this page.

First, since I just got my brand new HDTV:



I'm a Browncoat and not ashamed of it. I've got the series on DVD and I watch it pretty often. Watching it on Blu-ray will be pretty sweet, and if that special geek in your life digs SciFi, you can't go wrong with Firefly. (click the box to go to the Fox store)


If there's anyone reading this who isn't familiar with ThinkGeek, you should click the logo and check it out. It's Geek nirvana. Everything is great there, and at one point most of my wardrobe was from their T-shirt section. I was going to give a couple of categories, but really, if it's on that site, there's a good chance your geek will like it.

Sysadmins as a general rule really like to learn. A lot. To that end, here's a link to every product on Amazon with the tag of sysadmin. Lots of great books. If you find yourself getting lots of stuff from Amazon, it's probably cost effective to subscribe to Amazon Prime, so your shipping is free or much cheaper.

Even Wired is getting into the season with this list of Geeky toys that will make you a Christmas hero. Some of these are on the expensive side, but anyone who does manage to hack the triceratops gets kudos from me.

I'll end with one of the coolest lists I found: Make's Open Source Hardware 2008. If you're a hardware tinkerer, then this is your list. I'm *not* a hardware hacker, and I want to get some of these. Excellent stuff.

Anyway, just some Sunday fluff to fill space. Hope you're having a good weekend.

[UPDATE]
Now with smaller inline image. Sorry!

Saturday, November 29, 2008

More LVM information

I talked about Logical Volume Manager in my Intro to LVM in Linux.

Tonight I came across an article on backing up LAMP stacks with LVM snapshots.

I knew LVM could do it, but I wasn't aware of the particulars. Justin Ellison's article on it clarifies many of the difficulties with the process. His particular howto is geared towards LAMP (Linux, Apache, MySQL, PHP) setups, but it is by no means limited to them.
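
For anyone who hasn't tried it, the general shape of an LVM snapshot backup is roughly this (volume group and LV names are made up, and the MySQL table-locking step from his article is what actually makes the copy consistent):

lvcreate --size 1G --snapshot --name data_snap /dev/vg0/data
mkdir -p /mnt/snap
mount -o ro /dev/vg0/data_snap /mnt/snap
tar czf /backup/data-$(date +%Y%m%d).tar.gz -C /mnt/snap .
umount /mnt/snap
lvremove -f /dev/vg0/data_snap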

Read through his write up and let me know what you think.

One thing I am interested in seeing is how well it scales. He mentions 500MB of data, which is around 1/600th of the size I'm dealing with. I do have to wonder how quickly I could create a snapshot of that amount of data.

Anyone have more experience with this?

Friday, November 28, 2008

Quality Assurance vs Quality Control

Are you good at finding faults in your infrastructure, or are you good at making sure there are no faults? As Jason Cohen relates, Quality Assurance is not Quality Control.

Like many other topics, this is written for programmers, but it's a good lesson for sysadmins as well.

Doing sysadminy things with Windows PreInstalled Environments

I was, until recently, unfamiliar with the concept of a Windows PreInstalled Environment. For those of us who are primarily Unix based, this is basically like a live CD that boots straight into Windows.

There are a few of these PE CDs available. You can use Microsoft's Preinstallation Environment, or maybe the UBCD4WIN (Ultimate Boot CD for Windows), but the one that seems to get the lion's share of attention is BartPE. There's also REAtoGo, which seems to be a customized BartPE disc. To be completely honest, I haven't used any of these yet, but I'm looking forward to trying them.

Whichever you go with, building the CD seems to be a similar process. You use your own Windows install disc and customize the software through slipstreaming.

Once you've got the disc set up the way you want, it becomes easy to administer your Windows server using it as a known-clean boot. Virus cleansing is risk free, you've got the full gamut of useful Windows recovery tools at your service, and Earnest Oporto used it to update his firmware. What a great idea. How often do you see stuff like that which requires Windows? Sure, there are ways to update that particular firmware without Windows, but for lots of hardware, there isn't. This is a viable solution in that case.

Since I'm woefully inexperienced in this department, I'll appeal to you. Have you ever used a Windows PreInstalled Environment? What types of things do you do with it? Any tips or tricks?

Thanks for sharing!

Thursday, November 27, 2008

Happy Thanksgiving!

Here in the United States, it's Thanksgiving today, and I'm off on holiday to visit my wife's family in Cincinnati.

It's customary to reflect on the things that we're thankful for, so I thought that I'd share some here. Hopefully you've got some things that you're thankful for as well.

I'm thankful for:

My family and friends, even if I don't get to see them all as often as I'd like
My health, while not being the best, is better than a lot of people's
My profession, because I get to learn and grow in it

There are lots of other things that are small in comparison with those, but I really do appreciate the blessings that have allowed me to become who I am and do what I do.

Today, whether you get to spend time with the people important to you or not, reflect on what you're thankful for, and consider those who are less fortunate.

Happy Thanksgiving!

Tuesday, November 25, 2008

The case of the 500-mile email

This is a great story about someone who had a user who couldn't email anyone over 500 miles away.

This is why being a sysadmin can drive people crazy.

Monday, November 24, 2008

Infrastructure Switchover

Well, the big event that I've been building to for the past few months is done. All that's left is sweeping up the dust.

Our previous primary/production/main site was in a colocation in central Ohio. It's not a bad facility, but it's geographically non-ideal (the company is recentering in NorthCentral NJ), and the infrastructure there isn't the best. Far better than we can provide, but it can't touch what we've got now.

We relocated to a Tier 3-to-Tier 4 [pdf link] datacenter. The new colocation features world-class infrastructure, from multifactor security to N+2 generators. It's hot.

I am now able to say that I'm not at all worried about the physical plant of the primary site. If there's a problem, we're probably going to cause it ourselves. This is both a relief and a curse ;-)

All I'm doing right now is going around the network making sure that things are running alright after the change. Host keys and things of that nature are all fine, since those were extensively tested prior to the switchover. The things I'm concerned about now are processes which weren't fully tested because they couldn't be, due to the architecture change.

This week is pretty much over Wednesday, thankfully. After that, I'm looking forward to a nice relaxing break where maybe I'll finally get to finish polishing my Simple Talk: Exchange article.

Interesting bug in fresh CentOS install (or why I'm glad I didn't pay for RHEL support on all my servers) from The Life of a Sysadmin

StAardvarktheCarpeted ran into a really interesting bug the other day, and wrote about it. Apparently on his CentOS 5.2 machines, users who were authenticated against an LDAP server couldn't pipe commands.

Right. 'ls' would work, 'grep' would work, but 'ls | grep' wouldn't work. The problem came down to a bug in the distributed nss_ldap software, and as StAardvark alludes to, the bugzilla discussion is well worth reading.

It's sort of interesting to note that the original bug was filed in May of this year, but an actual fixed package wasn't available until the end of July, even though the upstream software was repaired 5 days after the bug was submitted.

Even CentOS (the free rebuild of RHEL) fixed the bug in June, while RedHat support-paying customers didn't get a fix unless they called support for help. The instructions that support was giving out weren't published until a couple of days before the updated package was released.
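
If you're on CentOS and want to know whether you're carrying the broken package, the check is quick, and the symptom is trivial to reproduce with any LDAP-backed account (the username is a placeholder):

# which nss_ldap build is installed?
rpm -q nss_ldap

# the telltale symptom: plain commands work, pipes fail, but only for LDAP users
su - ldapuser -c 'ls | grep .'

# pull in the fixed package once your mirror has it
yum update nss_ldap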

I've heard that RedHat support wasn't worth buying, but jeez. To actually punish users by making them wait longer for a fix than the free version is pretty bad. I'll stick with CentOS at this point.

Saturday, November 22, 2008

Imagine a beowu...wait, I mean RAID array of these...

So I'm sitting here browsing through Reddit and I find a list of interesting USB devices.

And on that list is this:


60 USB ports with flash drives in them.



My eyes light up! My God! 60 drives....I know 32GB flash drives are getting cheaper...32x60...almost 2TB of raw fast storage...

Of course, bringing me down somewhat was the fact that it costs $8,000...not to mention that it's a duplicator, not a hub. Alas, these dreams...


I still think that if you had critical information that only three or four people were allowed to have access to, it would be neat to set up software RAID across one device for each person and encrypt the partition. When you're done accessing the information, everyone gets their drive back, and no one person can look at the data.
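
Just to sketch the idea (device names made up, strictly back-of-the-napkin): stripe the drives so that every single one is required, then encrypt on top of the stripe.

# four flash drives, striped so no subset of them is readable on its own
mdadm --create /dev/md0 --level=0 --raid-devices=4 /dev/sdb /dev/sdc /dev/sdd /dev/sde
cryptsetup luksFormat /dev/md0
cryptsetup luksOpen /dev/md0 secrets
mkfs.ext3 /dev/mapper/secrets
mount /dev/mapper/secrets /mnt/secrets

When you're done, unmount, luksClose, stop the array, and hand the drives back out.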

Friday, November 21, 2008

Configuring the Netgear SSL VPN Concentrators (SSL312/FVS336G) with Active Directory / LDAP

If you're a chronic Standalone Sysadmin reader, you might remember that a while back, I started implementing an Active Directory infrastructure with which to centralize authentication for my Linux hosts. Well, getting the Linux machines authenticated has been completed, and now I'm on to further integration.

I've talked about my VPN issues before, and that I picked up some SSL VPN concentrators. An added feature of the ones I got is the ability to authenticate against Active Directory and/or LDAP. I figured it was time to put it to use.

Now, I'm no old hand at Active Directory...I've got the kind of knowledge that comes from reading several books but never touching it; in other words, the kind that leads to pain and gnashing of teeth. When I started creating my AD users, I just had employees, so I created users in the "Users" folder. Straightforward enough. We've got several classifications of users, and many users are in multiple groups, which in Windows is easy enough. I used a pretty much 1-to-1 mapping of Unix groups to Windows security groups (not Windows distribution groups, which are only for email (thank you, Sams' Active Directory Unleashed)). The group assignment was simple.

Then I thought about it, and remembered that we really have a couple of different bodies of users. For example, some of our clients have FTP accounts that they connect to in order to drop off or pick up files. There wasn't any sort of hierarchy under "Users", so I created two new security groups: the first, employees, which contains all of our employees, and the second, clients, which contains all of our clients. I restricted the accounts in AD so that the clients could only log in to the FTP servers, and I set up Likewise-Open so that only accounts in the "clients" group could FTP in, and only accounts in the "employees" group could connect to the rest of the machines. Theoretically, all of the other machines were inside the network and behind the firewall, so there's no chance a client would be logging in anyway, but there's no sense not being thorough.

All was well and good until I went to set up these VPN boxes. The only fields I could fill out were "authentication server", which was the local domain controller, and domain. Well, both answers were straightforward enough, except that if I set it up that way, all of the "client" users would be able to log in and get a VPN connection to the inside network. I tested it, and was right. Not a good thing.

I read a few help files and some documents on the devices, and found a suggestion for limiting group access. It suggested pointing to the OU (which is sort of like a folder in LDAP terms) that contained the appropriate users via LDAP authentication, rather than a direct "Active Directory" connection. Erm. Okay?

As an aside, I knew that Active Directory was essentially a gussied-up LDAP server, but I didn't (and don't) know all that much about LDAP. I have a really big LDAP book that I've skimmed part of, but to say I have a mastery of it would be laughable. I know that "DN" is distinguished name, and "OU" is "organizational unit", and there's some sort of hierarchy for when you are building DNs. Or something like that.

So I read, and researched, and played, and installed the ldap tools package, and researched some more. And made liberal use of the "ldapsearch" command, and found this post which taught me how to query the Active Directory server from the command line. And it was good.
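
For reference, the kind of query that finally made things click looks roughly like this; the server, the domain components, and the account are placeholders, and AD wants a simple bind as an account that's allowed to read the directory:

ldapsearch -x -H ldap://dc1.mydomain.tld \
  -D "msimmons@mydomain.tld" -W \
  -b "DC=mydomain,DC=tld" \
  "(sAMAccountName=msimmons)" dn memberOf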

I read some more, and played some more, and came to the sad realization that I couldn't make my VPN boxes authenticate against the AD LDAP unless I modified it to create an OU to hold the accounts that I wanted to allow access to.

When you're faced with a problem that you know little to nothing about, and you want to test an idea that you suspect might work, but might also break the entire infrastructure you've spent the last few months of your life building, it's a good idea to get a second opinion.

That's why I gave my good friend Ryan a call (he's on hiatus from his blog while he assembles a datacenter from a box of erector sets), who knows far more about AD than I do, and explained the situation. I said that the manual suggested pointing to an OU, and that my research suggested that I might want to create another OU for the accounts to live in, but I was concerned that there was some sort of "Windows Magic" that would be broken if I just started to move accounts to this new "OU" all willy-nilly.

Ryan suggested making two OUs, one for "internal" accounts, and one for "external" accounts. Then, and when he said this, I smacked myself in the forehead, he suggested making "test" accounts in the Users folder, verifying that they worked, and then moving them, and seeing if they still worked. Ryan is a brilliant guy, and I owe him a few more beers now :-)

So I followed his suggestions: I created the OUs, created the test user, and it worked fine; tested transitioning my own account, and it worked fine; and then I tested moving the client FTP accounts. They worked fine too. I had created the OUs, moved accounts, and nothing broke. Glorious.

Time to get the VPN machines to authenticate. I created a new domain, using LDAP authentication, and it asked for the server address and the base DN. The server address was just the IP, and I had gotten good enough to know that the base DN was going to be "OU=Internal,DC=mydomain,DC=TLD". I saved it, opened another window and tried to log in with my domain credentials. And failed.

I thought about it, and remembered from doing the command line LDAP queries that my Distinguished Name (DN) actually started with "CN=Matt Simmons" rather than msimmons@mydomain.tld. On a hunch, I tried logging in with a username of "Matt Simmons" (without the quotes) and my domain password. Light shone from the heavens, choirs of angels sang, and I got the VPN portal.

That, my friends, was my experience Thursday. I've learned a lot, and I feel a lot more confident about LDAP and Active Directory. And I'm able to continue to centralize user administration. It feels pretty good.

I'm really interested in what other people are doing with Windows servers and Active Directory. Are there tips that you've picked up on the job and want to share? I'm really open to suggestions on what I've been working on too. I know so little that almost everything I hear is new information. It's an exciting phase for me.

Thursday, November 20, 2008

Default vsftpd on CentOS is dumb

This is pretty ironic, since the vs stands for "very secure".

From the top of /etc/vsftpd/vsftpd.conf:

#
# Allow anonymous FTP? (Beware - allowed by default if you comment this out).
anonymous_enable=YES
#

This is the default configuration. Now, one of a couple things is going on here. Either the comment is lying, or the configuration flag is lying, or I'm terribly confused about what these words mean.

I figured that I'd check to see which was the case:

msimmons@newcastle:~$ ftp ftp1.test
Connected to ftp1.test.
220 (vsFTPd 2.0.5)
Name (ftp1.int.ia:msimmons): anonymous
331 Please specify the password.
Password:
230 Login successful.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp>

OKAY! This isn't good. In fact, it's a Bad Thing(tm).

Let's fix that. Ignoring the utterly stupid comment, I switch the flag to "NO" and restart the daemon. I try again, and I fail. Hooray. Let's see what else I can find.

I log in as a pretend user, and I authenticate fine. I 'cd' to .. and do a directory listing, and what do I find, but all of the various client accounts. Our clients are confidential, which means that them seeing each other would be a Bad Thing(tm). I dig into the config again, and find this gem:


# You may specify an explicit list of local users to chroot() to their home
# directory. If chroot_local_user is YES, then this list becomes a list of
# users to NOT chroot().
#chroot_list_enable=YES
# (default follows)
#chroot_list_file=/etc/vsftpd/chroot_list


Great, so apparently I just need to find and flip the chroot_local_user flag. Of course, it doesn't exist in the file. I create it, set it to "YES", restart the daemon, and things are working the way they should.
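
For anyone following along at home, the relevant chunk of my /etc/vsftpd/vsftpd.conf ended up looking roughly like this (trimmed to just the lines I care about here; restart vsftpd after editing):

# no anonymous logins, local users allowed, everyone locked into their home directory
anonymous_enable=NO
local_enable=YES
chroot_local_user=YES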

The question in my mind is why an FTP daemon that bills itself as Very Secure comes with such an asinine configuration. There are occasions where chrooting the ftp users isn't called for, but there are relatively few occasions that require anonymous FTP access. I can't understand why they wouldn't have shipped a secure config and then made people unsecure it, as opposed to the way it is now. Really hard to believe.

I suppose it is possible that the distro maintainers are responsible, but it's still stupid.

Wednesday, November 19, 2008

Unix Rosetta Stone

I just found the Unix Rosetta Stone, which seems simplified but still probably handy if you've got a really heterogeneous network, or if an AIX machine should suddenly spring up in the middle of the server room.

Judging from the number of Delicious bookmarks it has, it's pretty well known, but I figured I couldn't be the only person in the dark, and someone else might get some use out of it.

Wacky SSH Authorized Keys Tricks

You may have caught my blog post last week about setting up host to host ssh keys.

What you might not have caught was in the comments, where Ben Cotton mentioned a trick I hadn't heard of, namely specifying the allowed remote commands right in the authorized_keys line. He said there were even more features available, just waiting in the manpage. I replied that if he wrote it up, I'd link to it.
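
To give a taste of what he's talking about, an authorized_keys entry with a forced command and a few restrictive options might look something like this (the command and the truncated key are purely illustrative, not from my actual config):

command="/usr/bin/uptime",no-port-forwarding,no-X11-forwarding,no-pty ssh-dss AAAAB3NzaC1kc3M...= msimmons@machineA

With that in place, the holder of that key can connect, but all they ever get back is the output of uptime.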

Well, Ben put his money where his mouth is. He goes into nice detail and provides some good links and suggestions. This is really fascinating stuff, and I'm looking forward to using it in my own organization.

Therek over at Unix Sysadmin jumped into the fray, too. He's got three neat tricks for your ssh needs that you should really check out. I had no idea SSH key auth could be bent in these directions!

I've said it before, but I'll keep saying it. I love having visitors to my blog who enjoy what I write, and it really brings it home to interact with everyone like this. I couldn't ask for a better bunch of readers, though to be honest, I'm worried about Ben's longevity. I can't imagine what his cholesterol level must be ;-)

Ben, Therek, thank you both very much! I know my readers will really enjoy these articles. And as for everyone else, the same offer goes for you. If you've got something to share, let me know, I'll be happy to link to your blog entry or host it here if you've got the urge to write.

Tuesday, November 18, 2008

Great tool for network diagramming

I'm getting ready to implement a new Nagios monitoring system at our soon-to-be-production server, and I'm using Nagios v3 this time. Because I sort of figured out the configuration on my own last time, and it grew in a very organic (read: unplanned) way, the config is a mess. That is going to be different this time, thanks to Nagios 3 Enterprise Monitoring. It's not an intro guide to Nagios, that's for sure. The first chapter deals with what's new, and the second deals with streamlining the configuration for large installations. It's been very educational in teaching me how hostgroups and servicegroups can work together to really make life easier when configuring monitoring.

After reading this book pretty much cover to cover, I decided that I needed to logically map out the various relationships of my services, to figure out the inheritance policies (Nagios supports multiple inheritance in configuration objects).
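
As a rough sketch of what that buys you (the object names below are invented for illustration, not pulled from my actual config), you define a template once and let real hosts inherit from it:

define host{
        name                    linux-server    ; template, not a real host
        use                     generic-host
        check_command           check-host-alive
        max_check_attempts      3
        register                0
        }

define host{
        host_name               web01
        alias                   Web server 01
        address                 192.168.10.21
        use                     linux-server    ; inherit everything from the template
        hostgroups              web-servers
        }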

I started looking for a good free diagramming tool, first on Windows then on Linux. Windows was hopeless. I found lots that looked promising, but ended up being shareware. I don't have MS Office Pro on my personal laptop, so I didn't have Visio handy, and I wasn't going to buy a piece of software when I was sure that something good and free existed.

Giving up, I booted into Linux to see if anything I didn't know about was in synaptic. Of course not. The best diagramming solution in Linux is Dia, and I'm sorry to say it, but it's ugly. Really ugly. I'll use it if that's the only thing available and I'm just looking for something quick, but I won't like it.

I kept looking, and finally out of desperation I did a search for online applications, and I hit the jackpot. I found Gliffy. It's a flash diagramming application with built in stencils for all sorts of things, and the ability to add your own clipart. It'll even export to Visio.

I was impressed. It's free for personal use up to 5 public diagrams. You can pay $5/month for unlimited drawings and removing the ads, and there are corporate versions that have built in collaboration. It's easy to use, and it helped me a lot. Here's a drawing of some of my nagios groups:



If you're in the market for a cross-platform diagramming solution, you could do a lot worse than Gliffy.

Monday, November 17, 2008

Building and designing systems: Is the cart pulling the horse?

There's a really interesting post over on Code Monkeyism about test driven development of code, and how it's related to the design of the space shuttle engines.

The short of it is that, as opposed to typical complex engine designs, where each individual part is tested independently, then in subassemblies, and then again when the unit is complete, the space shuttle's engines were pretty much designed, assembled, and then tested. The incremental method has the advantage of weeding out all the really bad decisions at the small scale, so that by the time you put everything together, it generally works rather than flying apart at high speed.

While Code Monkeyism is primarily centered on software development, the points that Stephan makes are readily applicable to us as infrastructure engineers, particularly in a growth phase where we're engineering new solutions and trying to implement them.

I'm as guilty of putting the cart before the horse as anyone. My debacle with the cluster was a prime example. When you're given a job to do, the equipment to do it with, and no time to learn, these kinds of things happen. Particularly when you're working with shoddy tools anyway.

I shouldn't have made the very first cluster I ever built a production system. More due diligence in researching solutions was called for, and I probably would have learned beforehand that RHCS wasn't ready for prime time. I have learned from the experience, though, so all is not lost. Using the knowledge and experience I've gained, the next time will be more solid.

Is this something that everyone has to learn on the job, or was there a class or memo that I didn't get?

Friday, November 14, 2008

Host to host security with SSH Keys

I have a lot of Linux hosts. Somewhere in the vicinity of 70 servers, all but 3 run some variant of Linux. Lots of them have to communicate seamlessly through cronjobs and monitoring daemons. To pull this off, I've implemented SSH key authentication between the applicable accounts. The method is pretty easy.

Check the ~/.ssh directory for the user you want to ssh as. There's probably a "known_hosts" file, which keeps track of the machines that user has contacted previously, and there are probably an id_dsa and an id_dsa.pub. These are the private and public keys of the user, respectively. You might instead see similar files, but with "rsa" instead of "dsa"; those are keys that were created with a different algorithm. (If no keys exist yet, ssh-keygen will create a pair for you.) See more information here.

We have the keys now, so what we want to do is make the remote machine aware of them, so that our account on the source machine which has the private key can connect without authenticating with a password. To do this, we install the public key (the id_dsa.pub) in the ~/.ssh/authorized_keys of the remote account we want to connect to, on the remote host. So, we have

Machine A:
User: msimmons

Machine B:
User: msimmons

machineA$ cat ~/.ssh/id_dsa.pub
[text output]

machineB$ vi ~/.ssh/authorized_keys
[insert text output from machineA]
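
If you'd rather not copy and paste by hand, a one-liner along these lines accomplishes the same thing (this assumes you can still log into machineB with a password):

machineA$ cat ~/.ssh/id_dsa.pub | ssh msimmons@machineB 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'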

Ensure that neither the authorized_keys file nor the ~/.ssh directory is group- or world-writable, or the ssh daemon will probably refuse to use the key. It should also be noted that your sshd config (probably in /etc/ssh/sshd_config) should be set up to allow key-based authentication. The manpage should help you there.
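
In practice, locking down those permissions usually amounts to something like this on machineB:

machineB$ chmod 700 ~/.ssh
machineB$ chmod 600 ~/.ssh/authorized_keys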

At this point, you should be able to connect from one account to the other without a password. This allows you to use rsync to transfer things automatically, through the cron. It would look a bit like this:

machineA$ rsync -e ssh -av /home/msimmons/myDirectory/ msimmons@machineB:/home/msimmons/myDirectory/

Read the manpage for (many) more rsync options.

There is a weakness to this method, though. Anyone who obtains a copy of the private key (the one on machineA called id_dsa) can pretend to be you, and authenticate as you to machineB (or any other machine that has your public key listed in authorized_keys). This is potentially a very bad thing, particularly if you keep your private key on your laptop and the laptop gets stolen. You wouldn't want a thief to get their hands on your private key and compromise the rest of your network. So how do you avoid typing a password without letting anyone who grabs a copy of your private key use it? The answer is to put a passphrase on your private key.

Through proper use of the ssh-agent and ssh-add commands, you can set up passwordless communication from one machine to another. I could explain the common usage of these, but it would just be duplicating this fine effort from Brian Hatch: SSH and ssh-agent. He talks about setting up ssh-agent and ssh-add, but if you're like me, you've already got existing SSH keys laying around without passphrases. The answer to that is to simply run ssh-keygen -f [keyfile] -p and give the key a passphrase.
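
For reference, the whole dance on an existing key looks something like this (the paths and the agent PID are from my setup and will differ on yours):

machineA$ ssh-keygen -f ~/.ssh/id_dsa -p
machineA$ eval `ssh-agent`
Agent pid 11267
machineA$ ssh-add ~/.ssh/id_dsa
Enter passphrase for /home/msimmons/.ssh/id_dsa:
Identity added: /home/msimmons/.ssh/id_dsa (/home/msimmons/.ssh/id_dsa)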

Now that you've got a working secure key and a way of not having to type your passphrase every time, let's figure out how to get your servers to take advantage of the same technique. At the very least, you're going to have to type the user's passphrase once, either the first time you want to connect or (more likely) when the machine boots up. That is not to say that you'll need a password to boot the server, just that before your cron jobs run, you'll need to start the ssh-agent and add the key.

Once you start the ssh-agent on the server and add the key (per the instructions above), how do we keep that information available? Well, remember those variables that ssh-agent set up to tell 'ssh' the socket and PID to talk to the agent with? It turns out that you can put those (and any other variables you need to be static and universal) at the top of the crontab:

msimmons@newcastle:~$ crontab -l
SSH_AUTH_SOCK=/tmp/ssh-sBrpd11266/agent.11266
SSH_AGENT_PID=11267
48 10 * * * ssh root@testserver.mydomain uptime > ~/uptime.txt 2>&1

This will allow any of the scripts being called by the cron daemon to access the variables SSH_AUTH_SOCK and SSH_AGENT_PID, which in turn allows your scripts to ssh without using the passphrase. All that is required is updating the crontab when you reboot the machine and/or restart the agent.

On my desktop, since I ssh a lot, I add the same variables to my .profile in my home directory so that I only need to type in the passphrase once. If you find yourself connecting to other machines frequently from the server, you might want to do the same thing.
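
One way to avoid pasting the variables around by hand (just a sketch of how I might automate it, not the One True Way) is to have ssh-agent write its output to a file when you start it, and then source that file from .profile:

# run this once, when starting the agent after a reboot:
ssh-agent > ~/.ssh/agent-env
. ~/.ssh/agent-env
ssh-add

# and put this in ~/.profile:
if [ -f ~/.ssh/agent-env ]; then
        . ~/.ssh/agent-env > /dev/null
fi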

I'm sure I messed up the explanation in some parts, so if you have any questions, please don't be afraid to ask in the comments. I hope this helps someone set up their key-based authentication in a more secure manner.

[UPDATE]
See the followup to this article!

Datacenter that could belong to S.P.E.C.T.R.E.

This datacenter is dripping with "evil lair" vibes.

The World's Most Super-Designed Data Center


Tips for an initial buildout

St Aardvark the Carpeted (best. name. EVER.) has been working on building out a new data site for one of the companies he works for. He's got some great tips on things to remember and take into account before you do one of these yourself.

I know the installers that built my most recent rack at the colocation really appreciated the layout that I made to show what was going where in the rack. I also prepared spreadsheets listing all the cables and where they went. The colocation also needed the serial numbers of all the equipment I was bringing in, which is good for me to have anyway and is probably a good practice in general.

Has anyone else got any tips for a one-time build out that would help?

Tuesday, November 11, 2008

Sysadmin Extorts company for better severance package

I just read this over at TaoSecurity. Apparently a recently laid-off sysadmin was arrested because he threatened to bring down the IT infrastructure if his severance package wasn't improved.

This isn't the first disgruntled sysadmin story we've seen this year. Please, I beg of you, spare the servers in your rampage. ;-)

Where to put your system monitoring

I'm getting ready to implement my new Nagios monitoring system, and I've been researching best practices.

My current setup is that I have 3 “data sites”, which I consider to be physical locations where servers are kept. The primary site, the backup site, and the soon-to-be-primary site. When the new site becomes primary, the current primary will become backup, and the backup site will go away. Here's how they're setup:



They are geographically diverse, and as you can see, there is limited bandwidth between them.

Nagios is currently set up at the Backup site, and has remained unchanged for the most part since the backup site was the primary (and only) data site. This is not ideal, for a bunch of reasons.

Because of the way Nagios queries things, it is at the mercy of the networking devices between it and the target. If the router in-between goes down, then Nagios sees everything beyond that router as down. You can alleviate the most annoying side effect (dozens or hundreds of alerts) by assigning things beyond the router to be "children" of the router, in which case Nagios will only let you know that the parent is unavailable.
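
The parent/child relationship is just a directive in the host definition. For example (these hostnames are invented for illustration, not from my config):

define host{
        host_name       fileserver01
        alias           File server at the branch office
        address         10.20.30.40
        parents         branch-router   ; when branch-router is down, this host shows as UNREACHABLE, not DOWN
        }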

Aside from not having status checking on entire segments of our network in the event of an outage, what if the segment with no network access hosts your mail server? I've had this happen before, and it's disturbing to suddenly receive 2 hours worth of 'down' notifications at 3am. Not a good thing.

To circumvent this type of behavior, I'm going to be employing one Nagios instance at each location:


In the event that one of my sites loses network access, I've still got another host that can send notifications.

If you monitor, how do you guys arrange your monitoring? If you don't, any plans to start?

Monday, November 10, 2008

Encrypted Filesystems out of the box on CentOS

Like many people that have multiple locations, I sometimes have to get in my car and sneaker-net a hard drive to another facility. Sometimes I ship them via FedEx. In any event, whenever I take a hard drive out of my business, I run the risk of becoming another statistic. These days, it seems a month doesn't pass without some high-profile data breach. It happens frequently enough that there's a blog devoted to it.

Anyway, I've been looking for ways to encrypt the drives I transport. It looks like the "best" way is to use TrueCrypt for encrypting the entire device. It's cross-platform (Windows, MacOS, and Linux), has a great interface, and is pretty easy to script.

My problem is that it is a comparative pain in the butt to get running on my platform of choice (CentOS/RHEL5). If you look, the only supported Linux versions are Ubuntu and SLES. Yes, I could compile from source and test it, but I don't want to have to manually recompile things on production servers. I suppose I could compile it once and package an RPM if I had the time and knowledge (and the time to acquire the knowledge). Instead, I decided that it wasn't the solution for me unless it was the only solution available. So I kept searching.

Today I chanced upon what I think is a great solution. Using the dm-crypt software along with built in loop devices, it's possible to encrypt a device without using any non-native software.

In the (hopefully) unlikely event that the link I pointed to goes away, here is the (much abridged) process:

If you're using a file, rather than a device (to have an encrypted volume sitting on an otherwise unencrypted filesystem), create the file, here using 'dd':

dd if=/dev/zero of=/path/to/secretfs bs=1G count=0 seek=8    # creates an 8GB sparse file

Setup the loop to point to your file/device:
losetup /dev/loop0 /path/to/secretfs

Create the encrypted volume with cryptsetup (the -y flag makes it ask for the passphrase twice to verify):
cryptsetup -y create secretfs /dev/loop0

Create the filesystem on the device:
mkfs.ext3 /dev/mapper/secretfs

Mount the encrypted filesystem:
mount /dev/mapper/secretfs /mnt/cryptofs/secretfs

And now you have access.

To remove the filesystem, perform the last few steps in reverse:
umount /mnt/cryptofs/secretfs
cryptsetup remove secretfs
losetup -d /dev/loop0


Whenever you want to remount the device, just follow all the steps above that don't use dd or create filesystems.
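
In other words, remounting boils down to these three commands (it will prompt for the passphrase you set when you created the volume):

losetup /dev/loop0 /path/to/secretfs
cryptsetup create secretfs /dev/loop0
mount /dev/mapper/secretfs /mnt/cryptofs/secretfs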

There you go, an easy way to have encrypted volumes on your CentOS/RHEL machines.

Saturday, November 8, 2008

Best. Bug. Ever.

The other day we talked about mobile devices for administrators. Today I read about a particularly amazing bug on the Google Android platform that you might be interested in seeing.

According to that article, certain firmwares (1.0 TC4-RC29 and below) spawn a hidden terminal window that accepts all keystrokes as commands. *ALL* keystrokes.

The person who discovered the bug was in the middle of explaining to his girlfriend why he had to reboot his phone when his phone rebooted again. Because he typed the word reboot. Good thing he wasn't explaining the various uses of the 'rm' command.

Now THAT is a bug. By the way, the best workaround (aside from updating the firmware) is to edit init.rc and take out the lines at the end that spawn the shell.

Friday, November 7, 2008

Justify the existence of IT (from Slashdot)

If you've ever wondered how to value your time, or justify your hours, there's a lot of input going on at Slashdot.

I've been lucky to escape this particular issue, but I know there are a lot of people who have to constantly fight for what little resources they're given. Maybe this discussion can help someone out there.

The Creative Admin

Curious. Intelligent. Technical. Detail oriented. Stubborn.

Are these words that describe you, or do they describe the traits required for the position you hold? Is there a difference, after a while?

What if, in addition to those, you had other traits?

Imaginative. Creative. Inventive.

Are these traits (and others like them) that you think would be useful in your job? Would being inspired by your creative side yield positive or negative results? As Tom Limoncelli asked the other day, do you really want your electricians getting creative?

Does your position inspire, require, or even permit you to be creative? It might not, but I know that it can. Art isn't just something that lives in a museum, or hangs on a wall. Art is sometimes intrinsic to science (great example), but art can also happen when science is transcended.

Your work can be your creative outlet. There are flickr groups full of examples. Don't get burned out and lose your will to innovate. It might seem like administration is work-by-rote sometimes, but don't lose sight of the bigger picture, and stay creative.

Thursday, November 6, 2008

WPA TKIP Cracked

Well, hell.

I caught on Slashdot today that WPA using TKIP has been compromised. At the moment, only communication from the router to the host is vulnerable, but I can't imagine that it will stay that way for long.

I'm really considering moving my wireless APs to the external network, as opposed to the internal access they have now. That would require anyone on wireless to use a VPN, which has superior encryption anyway, I believe.

Any thoughts?

Wednesday, November 5, 2008

Stupid Unix Tricks (from Slashdot)

I figure everyone who reads this blog knows about Slashdot, but in case you missed it, here's an entire thread of people contributing their Stupid but Useful Unix Tricks. There's enough to Unix that we could all stand to learn some more.

The balance of security and usability

You read it everywhere, from all the security analysts. Security is a process, not a goal. As the implementers and administrators of the control mechanisms, we need to be especially cognizant of that concept.

If you're anything like me, you tend to work on things in waves, or spurts. I'll go for a while, concentrating on one thing for as long as it takes to achieve my goal, then move to the next (probably unrelated) task. When it comes to improving the security of a particular segment of the infrastructure, though, if we tarry too long in one spot, we run the risk of becoming a bit too fervent in our decisions and winding up draconian.

Rather than becoming like Mordac, we need to view ourselves as enablers of technology. There is a balance to be struck, and that's the hard part. The line is sometimes fuzzy between information security and infrastructure usability. Where you draw it will depend on the importance of the data you are protecting and the organization you're a part of.

Where do you draw that line in your organization? Do you get to decide, or are you at the mercy of policy makers who ignore an entire side of the equation?

Tuesday, November 4, 2008

Scalability in the face of the incoming onslaught

As you may know, today is election day in the United States. If you're a US citizen and registered to vote, go do it.

Because this is a big election, lots of political sites are going to feel the squeeze. To counter this, many of them are beefing up their facilities ahead of time. High Scalability is taking a look at techniques to improve response.

If you're not familiar with High Scalability (the site), you should check it out. They frequently link to very interesting studies in reliability and to techniques that very large sites (Google, Facebook, etc.) use to manage and balance load.

Sysadmin Mobile Devices


Manage any network of sufficient complexity, and eventually you'll want to be alerted when something breaks. I've mentioned this in general, but not all devices are created equal. What should you look for?

In my devices, I need a full qwerty keyboard. I really do, even if I'm only replying to email. I've seen people texting with a number pad, but my brain is now hardwired to qwerty. Of course, if you were hardwired to Dvorak like some people are, you might feel differently.

By far, the most important service my phone provides me is email. Since we don't have a Blackberry Enterprise Server, I have a rule on my corporate mail that forwards email to my blackberry. It's actually a combination of rules, crafted to get it to work the way I want. Since I subscribe to all manner of lists and newsletters, those things get sent out around the clock. I don't want to be woken up at 3am because someone on the Likewise Open list can't authenticate their AIX machine. For this reason, I have my mail rules set up to forward everything (excluding some high-traffic lists) to the blackberry, and then from 10pm till 8am, only emails originating from our externally-facing domain are forwarded. Since all of my internal cron job notifications are sent from the imaginary domain we use for internal resolution, they don't get forwarded. I have specifically set up Nagios to send emails from an external-domain account, so I get them all the time. This ensures that my bosses can get a hold of me, and that I'm aware of any critical weirdness happening at any hour of the day or night.

Also important to me is an SSH client. I don't make full use of mine yet, for reasons I'll explain, but I can administer my firewalls from outside with my phone. I have heard, and someone please correct me if I'm wrong, that if your corporation has a Blackberry Enterprise Server, you can use that connection to reach internal hosts. I don't know that I'm going to be running my own BES anytime soon, but that's a strong argument for it. There appear to be lots of remote desktop solutions available too.

All in all, my blackberry provides me with sufficient access to resources. I wish there was a VPN solution for it that I was convinced would work with my Netscreen setup, but I suppose you can't have everything.

Of course, I'm not suggesting that the blackberry is the bee's knees, as it were. I'm sure there are better solutions out there. I'd like to think the iPhone would be amazing, but I don't know how typing commands on the keyboard would go. I doubt the spelling auto-correct would like some of the unix commands I'd be typing.

What do you use for a mobile device? Can you do any remote administration through it, or is it just for communication, and you fall back to your laptop in emergencies?

Friday, October 31, 2008

Re: System Administration needs more PhDs

Tom Limoncelli has a post up today entitled System administration needs more PhDs.

He makes some great observations and brings up a lot of interesting questions. The one that I think the others flow from is "Why are good practices so rarely adopted?"

My opinion, gained through observation, is that sysadmins arise from one of two places. Either they start out in relative isolation, or they come from an environment with multiple systems administrators.


The former develop their own ways of doing things through trial and error and/or research. This leads to endless ways of accomplishing the same or similar tasks. The utter heterogeneity of possible platform combinations lends itself to having each admin reinvent the wheel.


The latter typically have an established infrastructure in place, a well defined set of hardware, and a much more rigid structure of procedures and usually a bona fide methodology for change management.


The reason that the standalone sysadmin almost never resembles the well-trained sysadmin is that best practices all seem to be vendor-driven, reliant on a subset of devices and situations, and hidden as well as possible behind support and contract agreements.


Those are hurdles the lone sysadmin faces AFTER he has discovered the "optimal solution", whatever that is. You mention puppet. Should you use cfengine or puppet? Unless you know about puppet, you'll use cfengine, unless you haven't heard of that either, in which case you'll roll your own. In my experience, you'll find $betterSolution right as you're implementing $bestSolutionYouKnowAbout.


I don't know whether there are more sysadmins in a single environment than in a plurality, but there are a _lot_ of sysadmins out there by themselves.


By themselves, sysadmins rely on their own cleverness, but together you get a synergy of ideas. The whole becomes smarter than the sum of the individuals, but most sysadmins never get to experience that. That's one of the reasons I started my blog: to shed light on what other people are doing, how they operate in their organizations, and so on.


Your books are a great resource for sysadmins, but the lone sysadmins of the world need to start communicating among themselves, and with the "institutional" admins out there. The same solution won't always work, but the sharing will go a long way toward a meritocracy of ideas.