Backup vs. Archival and Thoughts on Archival
Archival often gets confused with backup. The activities are (technically) very similar and invite such confusion. Both are the action of moving bits from "the place where everybody looks" (the mailbox, the current database, the file share, the intranet etc.) to some other place (a backup tape, cheaper storage, a CD-ROM, /dev/null etc.).
The sole purpose of backup is to keep data available in case the main storage area is no longer available (due to accidental deletion or software or hardware problems).
Archival is the removal of data from the "main area" to an "archive area" for later retrieval for historic or compliance reasons. A secondary motive for archival is to remove obsolete or less relevant data from the active work area to improve performance, shorten search time or save on storage in the system hosting the active work area. To confuse matters further: quite often technologies designed for backup are successfully used for archival (e.g. copying data to removable storage such as tape or optical disk).
In other words: you don't expect to ever restore a backup unless something went wrong, while accessing an archive can be part of a regular business process. There are a few perceptions about archival that need to be put into perspective:
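A minimal Python sketch of that difference, assuming plain files on disk: backup copies, so the original stays in the work area, while archival moves, so the original leaves it. The function names `backup` and `archive` and the directory layout are hypothetical and only serve as illustration.

```python
import shutil
from pathlib import Path


def backup(source: Path, backup_dir: Path) -> Path:
    """Copy the file to the backup area; the original stays where it is."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves timestamps
    return target


def archive(source: Path, archive_dir: Path) -> Path:
    """Move the file to the archive area; the work area no longer holds it."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / source.name
    shutil.move(str(source), str(target))
    return target
```

After `backup(...)` the file exists in two places; after `archive(...)` it exists only in the archive area.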
Archival does not save any storage space!
At least not when you look at all storage across the enterprise. However, it can help save storage in your active work area (which is most likely the most expensive one) and so help reduce storage cost. IMHO the biggest advantage of archival is the reduction of data a user has to look through, since the current work area would contain only relevant data. This is also the greatest peril of archival: when data gets archived too early, the archival location turns into yet-another-work-area-to-check. (OK, your archive might use better compression than your live system, but are you sure that it isn't just a backup?)
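To put rough numbers on the storage argument, here is a small sketch. All figures are made up for illustration (1000 GB of data, active storage at 10 cost units per GB, archive storage at 1 cost unit per GB), and `storage_summary` is a hypothetical helper.

```python
def storage_summary(active_gb: int, archive_gb: int,
                    active_cost_per_gb: int, archive_cost_per_gb: int):
    """Return (total GB stored, total cost) across both storage tiers."""
    total_gb = active_gb + archive_gb
    total_cost = (active_gb * active_cost_per_gb
                  + archive_gb * archive_cost_per_gb)
    return total_gb, total_cost


# Assumed prices: active tier 10 units/GB, archive tier 1 unit/GB.
before = storage_summary(1000, 0, 10, 1)   # everything in the work area
after = storage_summary(400, 600, 10, 1)   # 600 GB moved to the archive

print(before)  # (1000, 10000)
print(after)   # (1000, 4600)
```

The total volume stays at 1000 GB in both cases; only the cost and the size of the area users have to search change.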
Archival needs information life cycle management
Every piece of information has a certain life cycle. Like food items, information has a "best before" date (which varies depending on the purpose). It roughly follows this pattern:
- New: freshly created, might not be relevant yet (e.g. upcoming policy change)
- Current: data supports one or more business processes and is actively used
- Reference: data is no longer actively used, but is regularly required for reports or comparison
- Compliance: data is obsolete but needs to be kept for compliance (e.g. business records in Singapore: 7 years)
- Historic: the data doesn't need to be kept, it doesn't serve any active business process, but might be of historic interest. This state of information is a field of tension between (corporate) lawyers and historians: historians like to keep everything, while lawyers see a potential discovery risk (cost and content) in every piece of data kept. When analyzing the archival policies of any organization one can find out who won this conflict.
- Obsolete: In 2050 really nobody cares how many rolls of toilet paper you bought at what price (while the price and volume of toilet paper might still be of historic interest, as a curiosity about how wasteful mankind was with resources before it had the self-cleaning buttock nano coating)
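One way to make such a life cycle operational is to map each stage to an action. The sketch below is only one possible policy, not a recommendation: the mapping itself (which stages stay active, which get archived) is my assumption, and the names `Stage`, `Action`, `action_for` and the `keep_historic` switch are hypothetical. The switch stands for the outcome of the lawyers-versus-historians conflict.

```python
from enum import Enum


class Stage(Enum):
    NEW = "new"
    CURRENT = "current"
    REFERENCE = "reference"
    COMPLIANCE = "compliance"
    HISTORIC = "historic"
    OBSOLETE = "obsolete"


class Action(Enum):
    KEEP_ACTIVE = "keep in active work area"
    ARCHIVE = "move to archive"
    DELETE = "delete"


def action_for(stage: Stage, keep_historic: bool = True) -> Action:
    """Assumed policy: what to do with data in a given life cycle stage."""
    if stage in (Stage.NEW, Stage.CURRENT, Stage.REFERENCE):
        # Still used regularly; archiving now would be "too early".
        return Action.KEEP_ACTIVE
    if stage is Stage.COMPLIANCE:
        return Action.ARCHIVE
    if stage is Stage.HISTORIC:
        # keep_historic=True: the historians won; False: the lawyers won.
        return Action.ARCHIVE if keep_historic else Action.DELETE
    return Action.DELETE  # Stage.OBSOLETE
```

For example, `action_for(Stage.HISTORIC, keep_historic=False)` returns `Action.DELETE`, while the default returns `Action.ARCHIVE`.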
What's your Retention/Archival policy?
Posted by Stephan H Wissel on 27 May 2010 | Comments (1) | categories: Business Software