Data archiving is a quiet market with significant potential. Given the growth in data volumes and intolerance for response time delays, interest in the capabilities of archiving, retention and restore software is reaching take-off speed. This has produced some mixed messages. Indeed, the confusion between data archiving and data warehousing in the market, among vendors and end-user firms alike, is so pervasive that it requires clarification and even debunking. A data warehouse stores data for decision support and business intelligence (BI) and may itself require an archive process to remain healthy and make optimal use of the storage technology at its disposal. In the modern sense of the word, “archiving” is not a substitute for database backup, whether incremental or global, though backup may be involved at some point in the process of archiving. Nor is it a substitute for “data warehousing.” Finally, archiving is neutral with regard to any particular hardware: it does not mean tape archive, though automated tape libraries (silos) or write once, read many (WORM) optical jukeboxes may be used in modern, policy-based data archiving systems. Data archiving, in the modern sense of the word, performs the following functions:
- Offloading seldom or never-used production data on a record-by-record (or individual object) basis from either transactional or BI systems;
- Retaining the business context of the production data and the offloaded records;
- Finding critical offloaded records within a defined service level;
- Restoring offloaded records efficiently to their business context; and
- Operating in a policy-based and policy-driven manner, with policies that represent classes of business transactions (objects) and time periods, within a framework of information lifecycle management (ILM).
This is completely different from a data warehouse function. Data warehousing is designed to answer basic questions such as “What products or services are customers buying or using?” An archive is a copy of production in the same schema. This must be emphasized: the archive and the item being archived have the same database schema, so that when a record is restored, it is restored to a consistent data (and business) context. A star schema is not an archive of the transactions it aggregates and transforms; it is a fundamentally different representation of the data and serves a different business purpose (decision support versus transaction processing, for example). Simply throwing disk space at the problem is not viable in the long term. It is not practical to restore hundreds of gigabytes of data (or more) to get at a single customer record or a small set of archival records in response to an audit question, a legal action (e-discovery) or a selective recovery operation.
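The functions listed above (policy-driven offloading, an archive that shares the production schema, and record-level restore) can be illustrated with a minimal sketch. It is an illustration only, not a description of any archiving product: the `orders` and `orders_archive` tables, their columns, the date-cutoff policy, and the function names are all assumptions, and SQLite stands in for a production database.

```python
import sqlite3

# Hypothetical column list shared by production and archive, so a restored
# record lands back in the same data context it came from.
COLUMNS = "(id INTEGER PRIMARY KEY, customer TEXT, amount REAL, order_date TEXT)"


def create_tables(conn):
    """Create a production table and an archive table with the same schema."""
    conn.execute("CREATE TABLE orders " + COLUMNS)
    conn.execute("CREATE TABLE orders_archive " + COLUMNS)


def archive_older_than(conn, cutoff_date):
    """Assumed policy: offload every order dated before cutoff_date (ISO format).

    Records are copied to the archive and removed from production in one
    transaction. Returns the number of records offloaded.
    """
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO orders_archive SELECT * FROM orders WHERE order_date < ?",
            (cutoff_date,),
        )
        deleted = conn.execute(
            "DELETE FROM orders WHERE order_date < ?", (cutoff_date,)
        )
        return deleted.rowcount


def restore_record(conn, order_id):
    """Restore a single archived record to production, without touching the rest.

    Returns True if the record was found in the archive and restored.
    """
    with conn:
        conn.execute(
            "INSERT INTO orders SELECT * FROM orders_archive WHERE id = ?",
            (order_id,),
        )
        deleted = conn.execute(
            "DELETE FROM orders_archive WHERE id = ?", (order_id,)
        )
        return deleted.rowcount == 1


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_tables(conn)
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [
            (1, "Acme", 120.0, "2005-03-14"),
            (2, "Globex", 75.5, "2006-11-02"),
            (3, "Acme", 310.0, "2008-01-20"),
        ],
    )
    print("offloaded:", archive_older_than(conn, "2007-01-01"))
    print("restored order 2:", restore_record(conn, 2))
```

Because both tables share one schema, the selective restore is a single-row copy rather than a bulk reload, which is the contrast the text draws with restoring hundreds of gigabytes to reach one record. A real system would also need to carry related records (the business context) along with each offloaded row; this sketch omits that.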
Source: Data Archiving - A Quiet Market Reaches Take-off Velocity, Data Strategy Adviser, By Lou Agosta, DM Review Online, February 14, 2008