There's a really interesting post over on Code Monkeyism about test-driven development, and how it relates to the design of the space shuttle engines.
The short of it is that, as opposed to typical complex engine designs, where each individual part is tested independently, then together in subassemblies, and then again when the unit is complete, the space shuttle was pretty much designed, assembled, and then tested. The incremental method has the advantage of weeding out the really bad decisions at the small scale, so that when you finally put the parts together, the whole generally works rather than flying apart at high speed.
While Code Monkeyism is primarily centered on software development, the points that Stephan makes are readily applicable to us as infrastructure engineers, particularly in a growth phase where we're engineering new solutions and trying to implement them.
I'm as guilty of putting the cart before the horse as anyone. My debacle with the cluster was a prime example. When you're given a job to do, the equipment to do it with, and no time to learn, these kinds of things happen. Particularly when you're working with shoddy tools anyway.
I shouldn't have attempted to make the very first cluster I created a production system. More due diligence in researching solutions was called for, and I probably would have learned beforehand that RHCS wasn't ready for prime time. I have learned from the experience, though, so all is not lost. With the knowledge and experience I've gained, the next attempt will be more solid.
Is this something that everyone has to learn on the job, or was there a class or memo that I didn't get?