I recently had the opportunity to talk with a data warehouse expert who manages over six petabytes of data. For those who are not familiar with petabytes: 1024 bytes is a kilobyte (KB), 1024 kilobytes is a megabyte (MB), 1024 megabytes is a gigabyte (GB), and 1024 gigabytes is a terabyte (TB). Follow so far? We can get hard drives in terabytes now, so hopefully that’s a point of reference. And, finally, 1024 terabytes is a petabyte (PB). So basically that expert manages the equivalent of more than 6,000 of those 1 TB hard drives! That’s a lot of data.
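If you want to check that arithmetic yourself, here is a small Python sketch. It assumes the 1024-based units described above, and the names `bytes_in` and `one_tb_drives_for` are just illustrative helpers I made up for this post.

```python
# Units in ascending order; each one is 1024 times the previous one.
UNITS = ["KB", "MB", "GB", "TB", "PB"]


def bytes_in(unit):
    """Return the number of bytes in one KB, MB, GB, TB or PB (1024-based)."""
    return 1024 ** (UNITS.index(unit) + 1)


def one_tb_drives_for(petabytes):
    """Return how many 1 TB drives hold the given number of petabytes."""
    return petabytes * bytes_in("PB") // bytes_in("TB")


print(one_tb_drives_for(6))  # 6 PB expressed as a count of 1 TB drives
```

With 1024-based units, six petabytes works out to 6,144 one-terabyte drives, which is where the "more than 6,000" figure comes from.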
When we started talking, the very first item he mentioned is something we’ve always said and experienced first-hand: a backup of your data is only half of the process. Make sure you can restore! While many organizations put a lot of emphasis on having a good backup schedule with off-site or cloud storage of the media or data, very few organizations actually practice the restore process. Eventually, when it is time to restore a critical file, you might find that your storage media is bad, your media reader is bad, or your software has a fault. You might even discover that your backup jobs have the wrong parameters and you are not truly getting the backup you thought you were getting.
Personally, I’ve experienced a faulty tape drive that gave the software a "backup completed successfully" message even though every tape it wrote was unreadable, both on that drive and on other drives. The issue wasn’t discovered for some time, so years’ worth of backup tapes were worthless and the precious data we needed was unrecoverable. I learned that lesson early in my career, and it has kept me keenly aware that testing recovery is a critical part of any backup procedure.
So, how do you find out if you can restore data?
A simple way to build a test case is to create a folder, add copies of some files (try several file extensions), intentionally delete them, and then try to restore them both to an alternate location and to the original location. I once discovered a backup program that skipped certain types of files, which would have been a real disaster had we needed to recover. A more thorough test procedure is to restore a full backup to a spare disk and use a file comparison tool (e.g., diff or WinDiff) to make sure the restore is actually complete. The original and restored files can be compared manually, but that has a high potential for error and could take a large amount of time depending on the number of files.
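If you don’t have a comparison tool handy, the same check can be scripted. The following is a minimal Python sketch, not a finished utility: it assumes the original files and the test restore are both reachable as ordinary folders, and the function names (`file_digest`, `tree_digests`, `compare_trees`) are hypothetical ones chosen for this example. It compares file contents by SHA-256 hash and reports files that are missing from the restore, files that only exist in the restore, and files whose contents differ.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root):
    """Map each file's path (relative to root) to its content digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in root.rglob("*")
        if path.is_file()
    }


def compare_trees(original, restored):
    """Return (missing, extra, changed) lists of relative file paths.

    missing: present in the original but absent from the restore
    extra:   present in the restore but absent from the original
    changed: present in both, but with different contents
    """
    orig = tree_digests(original)
    rest = tree_digests(restored)
    missing = sorted(set(orig) - set(rest))
    extra = sorted(set(rest) - set(orig))
    changed = sorted(
        name for name in orig.keys() & rest.keys() if orig[name] != rest[name]
    )
    return missing, extra, changed
```

Calling `compare_trees("D:/data", "E:/restore_test")` (both paths are placeholders) should return three empty lists for a complete restore. Anything in `missing` is worth a close look, because that is exactly how a backup program that silently skips certain file types would show up. Note that this sketch only compares file contents, not permissions or timestamps, and files that changed on the original after the backup ran will naturally appear as differences.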
Another gem of wisdom that was imparted to me is that backups are not Disaster Recovery! [from Wikipedia: “Disaster recovery (DR) is the process, policies and procedures that are related to preparing for recovery or continuation of technology infrastructure which are vital to an organization after a natural or human-induced disaster.”] Backup and restore is just one part of a full DR plan. We’ll revisit Disaster Recovery in a future post, so stay tuned!
The post Data Warehousing: You Might Backup, But Can You Restore? appeared first on Open Sky Group.