With technology changing rapidly, it is difficult for credit union IT managers to keep up. The entire landscape has changed over the past 10 years as virtualization has become widely adopted. Additionally, with the rapid growth of data and the demand for uptime driven by internet and mobile banking, backup windows are shrinking.
Many credit unions are still backing up to tape. However, tape is unreliable and inefficient, and it raises serious compliance concerns. If you are one of the many credit unions still using tape, or are just struggling with the challenges described above, this article might help you better navigate the complexities of backup technology.
In the world of tape backups, you copy all files and databases to tape. To back up more efficiently, you might perform an incremental backup. An incremental backup compares files against the prior backup and copies only the ones that have changed. If there is a database on the server and even a single record is written, an incremental will need to back up the entire database. If a single word is added to an existing Word document, an incremental will need to back up the entire file. Should you need to recover, you will need to restore the original full backup and then each subsequent incremental backup.
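To make the file-level behavior concrete, here is a minimal Python sketch of how an incremental backup might pick files. It assumes change detection by content hash; real products often use timestamps or archive bits instead, and the file names and the `incremental_backup` helper are hypothetical.

```python
import hashlib


def file_fingerprint(content: bytes) -> str:
    """Hash of the whole file; any change at all produces a new value."""
    return hashlib.sha256(content).hexdigest()


def incremental_backup(current_files: dict[str, bytes],
                       prior_manifest: dict[str, str]):
    """Return (files_to_copy, new_manifest) for a file-level incremental."""
    files_to_copy = {}
    new_manifest = {}
    for name, content in current_files.items():
        fingerprint = file_fingerprint(content)
        new_manifest[name] = fingerprint
        if prior_manifest.get(name) != fingerprint:
            # The file changed (or is new), so the ENTIRE file is copied.
            files_to_copy[name] = content
    return files_to_copy, new_manifest


# Monday: the first run has no prior manifest, so it behaves like a full backup.
monday = {"members.db": b"A" * 1_000_000, "policy.docx": b"B" * 50_000}
_, manifest = incremental_backup(monday, {})

# Tuesday: four bytes are appended to one document.
tuesday = dict(monday)
tuesday["policy.docx"] = monday["policy.docx"] + b" new"
changed, _ = incremental_backup(tuesday, manifest)
bytes_copied = sum(len(content) for content in changed.values())
```

In this sketch, Tuesday's run copies all 50,004 bytes of policy.docx even though only four bytes were added, which is the inefficiency described above.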
In order to understand deduplication, you need to forget everything you know about tape. Enterprise data is highly redundant, with identical files or data stored within and across systems. Traditional backup methods magnify this by storing all of this redundant data over and over again. Deduplication is the process of analyzing files and databases at the block level and storing only the unique blocks of data, eliminating redundancy. Sounds easy, right? Well, not so fast. First, you need to know that not all deduplication is the same. There are two types of deduplication: inline and post-process. Inline deduplication identifies duplicate blocks as they are written to disk. Post-process deduplication deduplicates data after it has been written to disk.
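The block-level idea can be sketched in a few lines of Python. This is only an illustration under simple assumptions: fixed 4 KB blocks and SHA-256 fingerprints. Commercial products commonly use variable-length blocks and their own indexing, and the `deduplicate` function and sample data here are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size for this sketch


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut the data into consecutive fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def deduplicate(data: bytes, store: dict[str, bytes]) -> list[str]:
    """Store only unseen blocks; return the list of fingerprints (the recipe)."""
    recipe = []
    for block in split_blocks(data):
        fingerprint = hashlib.sha256(block).hexdigest()
        store.setdefault(fingerprint, block)  # kept only if not already stored
        recipe.append(fingerprint)
    return recipe


store: dict[str, bytes] = {}
file_a = b"X" * 8192 + b"Y" * 4096  # two identical blocks plus one unique block
file_b = b"X" * 8192 + b"Z" * 4096  # shares its first two blocks with file_a
recipe_a = deduplicate(file_a, store)
recipe_b = deduplicate(file_b, store)
stored_bytes = sum(len(block) for block in store.values())
```

The two sample files total 24,576 bytes, but the store ends up holding only three unique blocks (12,288 bytes). Each file is represented by a recipe of fingerprints that points back into the shared store.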
Inline deduplication is considered more efficient in terms of overall storage requirements because non-unique or duplicate blocks are eliminated before they’re written to disk. Because duplicate blocks are eliminated, you don’t need to allocate enough storage to write the entire data set for later deduplication. However, inline deduplication requires more processing power because it happens “on the fly”; this can potentially affect storage performance, which is a very important consideration when implementing deduplication on primary storage. On the other hand, post-process deduplication doesn’t have an immediate impact on storage performance because deduplication can be scheduled to take place after the data is written. However, unlike inline deduplication, post-process deduplication requires the allocation of sufficient data storage to hold an entire data set before it’s reduced via deduplication.
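The storage trade-off between the two approaches can be shown with a simplified Python model. It only counts blocks on disk and ignores processing cost, and it assumes the post-process landing area must hold the whole data set at its peak. The function names are hypothetical.

```python
import hashlib


def fingerprint(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


def inline_dedup(blocks: list[bytes]) -> tuple[int, int]:
    """Check each block before writing. Returns (final_blocks, peak_blocks)."""
    store: dict[str, bytes] = {}
    peak = 0
    for block in blocks:
        store.setdefault(fingerprint(block), block)  # duplicates never hit disk
        peak = max(peak, len(store))
    return len(store), peak


def post_process_dedup(blocks: list[bytes]) -> tuple[int, int]:
    """Write everything first, reduce later. Returns (final_blocks, peak_blocks)."""
    landing_area = list(blocks)   # the entire data set is written to disk first
    peak = len(landing_area)      # so the peak is at least the full data set
    store: dict[str, bytes] = {}
    for block in landing_area:    # the scheduled pass that removes duplicates
        store.setdefault(fingerprint(block), block)
    return len(store), peak


backup_stream = [b"A", b"B", b"A", b"A", b"C", b"B"]
inline_result = inline_dedup(backup_stream)
post_process_result = post_process_dedup(backup_stream)
```

Both approaches finish with the same three unique blocks. The difference is the peak: three blocks for inline versus all six for post-process in this model.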
In order to remain competitive, many tape-based backup software providers have stepped into the deduplication arena. Most write to disk just as they do to tape and then run post-process deduplication to minimize the disk footprint.
A primary concern with both inline and post-process deduplication is that they require streaming the data across the LAN or WAN to disk (the target), which consumes a considerable amount of bandwidth. As deduplication has evolved beyond target-based deduplication alone, a few vendors now offer source-based deduplication. This is the process of deduplicating at the client (the source server) and then streaming only the unique blocks of data to the target (the backup server). Taking it a step further, once the data reaches the target, the target can perform global deduplication (inline deduplication), comparing the incoming blocks with the blocks already written to its disk and writing only the ones that are new. Rather than performing inline deduplication on 100% of the data, the target may only need to compare 1% or less, eliminating the concern about processing power. As you can imagine, streaming and writing only the unique blocks of data significantly reduces the daily network bandwidth and storage required.
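Here is a rough Python sketch of one way source-based deduplication with a global target index could work. It assumes the client sends fingerprints first and then ships only the blocks the target reports as missing; actual vendor protocols differ, and the `BackupTarget` class and `source_backup` function are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size for this sketch


class BackupTarget:
    """Stands in for the backup server and its global block index."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def missing(self, fingerprints) -> set[str]:
        """Report which fingerprints the target has never stored."""
        return {fp for fp in fingerprints if fp not in self.blocks}

    def store(self, new_blocks: dict[str, bytes]) -> None:
        self.blocks.update(new_blocks)


def source_backup(data: bytes, target: BackupTarget) -> int:
    """Deduplicate at the client, send only new blocks, return bytes sent."""
    local: dict[str, bytes] = {}
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        local[hashlib.sha256(block).hexdigest()] = block  # source-side dedup
    needed = target.missing(local.keys())                 # global comparison
    payload = {fp: local[fp] for fp in needed}
    target.store(payload)
    return sum(len(block) for block in payload.values())


target = BackupTarget()
server_1 = b"O" * 8192 + b"1" * 4096  # e.g. shared OS blocks plus unique data
server_2 = b"O" * 8192 + b"2" * 4096
sent_1 = source_backup(server_1, target)
sent_2 = source_backup(server_2, target)
```

In this example each server holds 12,288 bytes, yet the first backup sends 8,192 bytes and the second sends only 4,096, because the shared block is already on the target.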
VMware changes your server and application IT environment. Server utilization has commonly run as low as 5 percent to 20 percent. Because virtualization can make a single physical server act like multiple logical servers, it can improve server utilization by combining numerous computing resources on a single server. VMware allows users to run 10 or more virtual machines on a single server, increasing server utilization to 70 percent or more.
Virtual server backups can be accomplished using a traditional approach with conventional backup software. The backup software is simply installed and configured on each virtual machine, and backups will run normally to any conventional backup target, including tape drives, virtual tape libraries, or disk storage. However, applying traditional backup tactics to virtual server backups does have drawbacks. The most significant challenge is resource contention. Backups demand significant processing power, and the added resources needed to execute a backup may compromise the performance of that virtual machine and all virtual machines running on the system—constraining the VMware host server’s CPU, memory, disk, and network components—and often making it impossible to back up within available windows.
Backup processes have evolved to deliver greater efficiencies in your highly consolidated environment. How is this possible with larger workloads and shared resources?
The key to making VMware infrastructure backup as efficient as possible is source-based global deduplication.
Backing up at the source can quickly and efficiently protect virtual machines by sending only the changed segments of data each day, providing up to a 500-fold daily reduction in network resource consumption compared to traditional full backups. Source-based deduplication also reduces the traditional backup load, from up to 200 percent weekly to as little as 2 percent weekly, dramatically reducing backup times.
Some of the more sophisticated backup solutions can back up at the guest level (an individual virtual machine) or at a VMware Consolidated Backup server. In addition, disk-based deduplication software negates the need to transport tapes to offsite repositories for disaster-recovery or compliance purposes by providing immediate remote backup offsite via the cloud.
Source-based deduplication also provides the granularity needed to find changes anywhere within a virtual machine disk format (VMDK) file, and this is where target-based deduplication alone fails to deliver.
As I have stated in prior blogs, backups are the means to recovery. Once data has been deduplicated, it has to go through what is called a rehydration process before it can be recovered. This is the process of putting all of the pieces back together again and, as you can imagine, some software performs it much more efficiently than others.
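Conceptually, rehydration is just the reverse of deduplication. The Python sketch below is a toy illustration, assuming a backup is kept as an ordered list of block fingerprints (a "recipe") plus a store of unique blocks. The tiny 4-byte block size and the function names are hypothetical.

```python
import hashlib


def dedupe(data: bytes, store: dict[str, bytes], block_size: int = 4) -> list[str]:
    """Store unique blocks and return the ordered recipe of fingerprints."""
    recipe = []
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        key = hashlib.sha256(block).hexdigest()
        store.setdefault(key, block)
        recipe.append(key)
    return recipe


def rehydrate(recipe: list[str], store: dict[str, bytes]) -> bytes:
    """Put the pieces back together by looking up every block in order."""
    return b"".join(store[key] for key in recipe)


store: dict[str, bytes] = {}
original = b"ABCDABCDABCDWXYZ"
recipe = dedupe(original, store)
restored = rehydrate(recipe, store)
```

Even in this toy case, the restore has to perform one lookup per block in the recipe (four here) although only two unique blocks are stored. At real data volumes, those lookups are a large part of why rehydration speed varies so much between products.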
Some target-based solutions store multiple revisions before deduplicating so that, in the event of a recovery, they do not have to rehydrate, since the rehydration process can take so long. If you are considering backing up to the cloud, remember that even once your initial backup has been seeded (fully written to disk) and your daily backups are running reasonably fast, a recovery still requires rehydrating the entire backup and pulling it across the internet. This can add hours or even days to your recovery depending on factors such as rehydration time and bandwidth. If it is your core system, waiting several hours or even days to recover is not an option.
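A quick back-of-the-envelope calculation shows why. The Python sketch below assumes rehydration and transfer happen one after the other, uses decimal units (1 GB = 8,000 megabits), and ignores protocol overhead. The data size and link speeds are hypothetical examples, not measurements.

```python
def transfer_hours(data_gb: float, link_mbps: float, utilization: float = 1.0) -> float:
    """Hours needed to move data_gb across a link of link_mbps megabits/second."""
    if link_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("link_mbps must be positive and utilization in (0, 1]")
    megabits = data_gb * 8000              # 1 GB = 8,000 megabits (decimal units)
    seconds = megabits / (link_mbps * utilization)
    return seconds / 3600


def recovery_hours(data_gb: float, link_mbps: float,
                   rehydration_hours: float = 0.0,
                   utilization: float = 1.0) -> float:
    """Sequential estimate: rehydrate first, then pull the data across the link."""
    return rehydration_hours + transfer_hours(data_gb, link_mbps, utilization)


# Hypothetical 2 TB restore over a 100 Mbps internet link versus a 1 Gbps link.
over_internet = recovery_hours(2000, 100)
over_gigabit = recovery_hours(2000, 1000)
```

Under these assumptions the 100 Mbps restore takes roughly 44 hours for the transfer alone, while the same data over a 1 Gbps link takes roughly 4.4 hours, before any rehydration time is added.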
For this very reason, many vendors are now offering a hybrid approach. A hybrid approach places a backup appliance locally (at the credit union) to allow for much faster recovery. The appliance also replicates off-site to the cloud provider.
Backing up to The Cloud
Credit unions have been slower than most to adopt the cloud. That is no surprise, since credit unions are by nature very conservative. However, we have passed the tipping point: more and more credit unions are moving services in that direction, and backups are no exception. When selecting a backup provider, it is important to understand how the majority of cloud providers price their service. Since deduplication creates a much smaller footprint, pricing is typically based on the amount of data stored in the cloud. The issue is that nobody truly knows what that number will be until all of your data has been backed up. This is where it gets complex.
There are two types of data: structured and unstructured. Unstructured data consists of typical file system files, such as Word and Excel documents. Structured data is primarily databases: Exchange, domain controller, SQL, Oracle, etc. On average, roughly 70% of data at most businesses is unstructured. Unstructured data deduplicates much more efficiently than structured data. To estimate your deduplication footprint, the service provider must gather details on your data to calculate the percentages of structured and unstructured data.
Additionally, retention is a key factor: once the seed is calculated, you have to take the average daily change rate and multiply it by the retention period your policies define. You also need to factor in average annual data growth. As you can see, this becomes highly complex. If it is not calculated accurately, you can sign on expecting to pay one amount and end up paying another. Additionally, some software deduplicates much more efficiently than others. Although one vendor may have a lower price per GB or TB than another, its software may end up storing two to three times more data, ultimately costing you more. It is very important to demo the software before making a long-term commitment and, ideally, to choose a vendor that understands credit unions.
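The estimate described above can be roughed out in Python. This is a deliberately simplified model, not any vendor's pricing formula: the deduplication ratios, change rate, retention period, and prices are all hypothetical placeholders, and the daily change is applied to the deduplicated seed.

```python
def estimate_cloud_footprint_gb(raw_gb: float,
                                unstructured_share: float,
                                unstructured_ratio: float,
                                structured_ratio: float,
                                daily_change_rate: float,
                                retention_days: int,
                                annual_growth: float = 0.0,
                                years: int = 0) -> float:
    """Seed plus retained daily changes, after optional compound data growth."""
    grown_gb = raw_gb * (1 + annual_growth) ** years
    # Seed: each data type shrinks by its own (assumed) deduplication ratio.
    seed_gb = (grown_gb * unstructured_share / unstructured_ratio
               + grown_gb * (1 - unstructured_share) / structured_ratio)
    # Daily change, kept for every day of the retention period.
    retained_gb = seed_gb * daily_change_rate * retention_days
    return seed_gb + retained_gb


# Hypothetical inputs: 10 TB raw, 70% unstructured, 10:1 and 4:1 ratios,
# 1% daily change, 30 days of retention.
footprint = estimate_cloud_footprint_gb(10_000, 0.7, 10, 4, 0.01, 30)

# Hypothetical vendor comparison: B is cheaper per GB but stores 2.5x the data.
monthly_cost_a = footprint * 0.10
monthly_cost_b = footprint * 2.5 * 0.06
```

With these placeholder numbers the footprint comes to about 1,885 GB, and the vendor with the lower per-GB price ends up with the higher monthly bill because it stores more data.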
One common challenge credit unions encounter after selecting a cloud backup provider is how to transport their data to their disaster recovery service provider in a timely manner. Also, will the DR provider know what to do with it once it arrives?
More and more disaster recovery service providers are offering backup solutions. It just makes sense to have your data stored at the site where the recovery will be performed, avoiding a logistics nightmare. Not to mention, the last thing you want is to have them fumbling around trying to figure out how to use someone else’s software. They need to be experts on the tools they will be using to perform the recovery. The key is to ensure they are capable of meeting all of your recovery needs, they are security conscious, and they undergo a regular SSAE examination.
As you can see, technology is rapidly changing and backup software is evolving to keep up with the pace. If you are still using tape, struggling with up-time or just unsatisfied overall with your current backup, I hope this article helps guide you in the right direction.