There's a really interesting post over on Code Monkeyism about test-driven development of code, and how it's related to the design of the space shuttle engines.
The short of it is that, unlike typical complex engine designs, where each individual part was tested independently, then together in subassemblies, and then again when the unit was complete, the space shuttle engines were pretty much designed, assembled, then tested. The incremental method has the advantage of weeding out the really bad decisions at small scale, so that when you finally put the parts together, the whole generally works rather than flying apart at high speed.
While Code Monkeyism is primarily centered on software development, the points that Stephan makes are readily applicable to us as infrastructure engineers, particularly in a growth phase where we're engineering new solutions and trying to implement them.
I'm as guilty of putting the cart in front of the horse as anyone. My debacle with the cluster was a prime example. When you're given a job to do, the equipment to do it with, and no time to learn, these kinds of things happen. Particularly when you're working with shoddy tools anyway.
I shouldn't have made the very first cluster I ever created a production system. More due diligence in researching solutions was called for, and I probably would have learned beforehand that RHCS wasn't ready for prime time. I have learned from the experience, though, so all is not lost. Using the knowledge and experience I've gained, the next time will be more solid.
Is this something that everyone has to learn on the job, or was there a class or memo that I didn't get?
Monday, November 17, 2008

6 comments:
I think some of the most important lessons are learned through unexpected hitches and tripping over yourself. Simple things, like when I replaced our Server 2003 file/print box with a 64-bit Windows Server 2003 box without thinking about obtaining any 64-bit printer drivers. Or replacing a bunch of Cisco 1721 routers running a flavor of IOS that supported SNA with 1841 routers running an IOS that didn't (which broke some IBM workstation controllers I didn't even know we were using :D). Well, there is more, but it gets increasingly pathetic, hah.
@rick
Oh, I understand how painful it gets looking back on the mistakes I've made. Thanks for sharing your experiences. Did you get the workstation controllers fixed?
No, instead I replaced the one or two working 5250 terminals with Windows boxes that were just sitting around. This allowed the sales people to use Outlook Web Access and allowed me to turn off the workstation controller and a couple of other old switches that only interfaced with the controller/terminals. We've completely dumped our twinax/SNA stuff at all of our branches and only have 1 left at HQ! Hope to get rid of it by the end of the week :)
Nice! Sounds like a great way to turn a problem into an advantage!
Sorry for running around commenting on old entries, but I've just discovered your blog, and, well, I empathize (though frankly I rather enjoy my job).
In any case...how does "buy a NetApp" fix the problem? If you wanted to use NFS for access to the storage, then you could just as easily do that with Red Hat and wouldn't need a cluster FS (you could use Linux-HA with DRBD or SCSI-reserve), and if you want to connect with iSCSI then....you still need a cluster filesystem if you want multiple servers to have the volume mounted. What am I missing here?
@Ryan
Hey, thanks for commenting, even on older entries. It's nice to hear from people!
I probably have given the wrong impression over the past couple of weeks. I really, really do like my job. My issue with my situation is that after spending large amounts of money to make sure that there is no technological single point of failure, there still is one, namely me. I don't mind the responsibility, but it does wear on a person after a while, as I'm sure you know. I go through phases, I suppose.
Anyway, regarding the NetApp, it's not a magical solution to everything, but it is widely considered to be the most reliable NAS/SAN appliance available, and to have the best performance.
Depending on the model, I believe, the same appliance is capable of acting as a NAS and SAN server at the same time (though for different slices, I'm sure).
Rather than running a cluster of servers attached to a SAN, your primary storage can just be the NetApp. Sort of like a Snap appliance on steroids. And by the way, lest there be any confusion, I've had two Snap appliances. Either they were both lemons, or it's a very bad choice.