Tuesday, October 7, 2008

Big Picture: How do you store old email?

The biggest hurdle that we, as systems administrators, have to deal with is that our time is finite, and thus our knowledge is incomplete. The development and growth of the internet has been a boon to people like ourselves, who seek to gain information. The advent of effective search engines goes far in presenting this information usefully and keeps it at our fingertips.

I've found that there are some things search engines don't handle well. You can search Google for how, technically, to archive email, but you can't search for what the best policy is. You can't grep the experience of other administrators unless they've written about what they've done, and you can't download memories unless they've been recorded and put online.

That's a large part of why I started this blog.

None of us has as much experience as all of us, and by working together, we amass a pool of information and experiences that can help others learn what we have learned, and hopefully by standing on shoulders, heads, and feet, eclipse us and our knowledge. And then we do the same to them.

It's a type of bootstrapping where we architect our own usurpers, then try to follow their example and surpass their achievements. Collective learning and pushing and growing. It's great to watch and experience, and if you're reading this blog, you're a part of it.

You know that I ask for help all the time: password retention, project management software, even best practices for security policies. I'm not afraid to admit that I don't know something, or that I don't know how to do something. There's no shame in ignorance; there's just an opportunity to learn.

In that same vein, I thank you for all the help you have given me previously, and ask again for advice and experience.

I have users who currently store, and semi-regularly read, email up to 8-9 years old. All told, one of my users weighs in at 11GB of mail.

If you're a regular reader, you know I asked about hosted email providers the other day. We're going to be limited to 4GB of mail per user. Storing every email that touches the server, as suggested by this email compliance document, is completely infeasible. We have several users who all receive the same tens-of-megabytes files every day of the week from several clients. I would need a $100,000 storage system with data deduplication just to store a few years' worth of mail.

Aside from being completely out of the financial ballpark, it would violate the principle of keeping your least volatile data off your most expensive storage. Old mail that rarely changes belongs on the cheapest tier, not the priciest one.
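To put a rough shape on why deduplication matters for our case, here's a minimal sketch in Python. It's only an illustration of the idea, not how any particular storage product does it: it assumes attachments are deduplicated whole by SHA-256 hash, and the function name `dedup_savings`, the user names, and the file sizes are all made up for the example.

```python
import hashlib


def dedup_savings(deliveries):
    """Compare raw vs. deduplicated storage for delivered attachments.

    deliveries: iterable of (recipient, attachment_bytes) pairs.
    Returns (raw_bytes, stored_bytes).
    """
    raw_bytes = 0
    unique = {}  # SHA-256 digest -> size of the single stored copy
    for _recipient, payload in deliveries:
        raw_bytes += len(payload)
        digest = hashlib.sha256(payload).hexdigest()
        # Only the first copy of identical content costs any space.
        unique.setdefault(digest, len(payload))
    stored_bytes = sum(unique.values())
    return raw_bytes, stored_bytes


# Hypothetical day: 5 users each receive the same 3 distinct 1 MB files.
files = [bytes([i]) * 1_000_000 for i in range(3)]
deliveries = [(f"user{u}", f) for u in range(5) for f in files]

raw, stored = dedup_savings(deliveries)
print(f"raw: {raw} bytes, deduplicated: {stored} bytes")
```

With those made-up numbers, the raw total is five times the deduplicated total, one copy per recipient versus one copy per file. Real mail is messier (re-encoded attachments, slightly different versions), so treat this as a best case.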

This article has what sounds like good advice, but much of it is provided by people with vested interests in selling solutions. I want to hear from other admins who deal with this. Even if you don't deal with it, what ideas do you have as to how it should be done?