Get an overview of the available mechanisms for backing up data stored in Apache HBase, and how to restore that data in the event of various data recovery/failover scenarios.

With increased adoption and integration of HBase into critical business systems, many enterprises need to protect this important business asset by building out robust backup and disaster recovery (BDR) strategies for their HBase clusters. As daunting as it may sound to quickly and easily back up and restore potentially petabytes of data, HBase and the Apache Hadoop ecosystem provide many built-in mechanisms to accomplish just that.

In this post, you will get a high-level overview of the available mechanisms for backing up data stored in HBase, and how to restore that data in the event of various data recovery/failover scenarios. After reading this post, you should be able to make an educated decision on which BDR strategy is best for your business needs. You should also understand the pros, cons, and performance implications of each mechanism. (The details herein apply to CDH 4.3.0/HBase 0.94.6 and later.)

Note: At the time of this writing, Cloudera Enterprise 4 offers production-ready backup and disaster recovery functionality for HDFS and the Hive Metastore via Cloudera BDR 1.0 as an individually licensed feature. HBase is not included in that GA release; therefore, the various mechanisms described in this blog are required. (Cloudera Enterprise 5, currently in beta, offers HBase snapshot management via Cloudera BDR.)

Backup

HBase is a log-structured merge-tree distributed data store with complex internal mechanisms to assure data accuracy, consistency, versioning, and so on. So how in the world can you get a consistent backup copy of this data that resides in a combination of HFiles and Write-Ahead-Logs (WALs) on HDFS and in memory on dozens of region servers?

Let’s start with the least disruptive, smallest data footprint, least performance-impactful mechanism and work our way up to the most disruptive, forklift-style tool. The following table provides an overview for quickly comparing these approaches, which I’ll describe in detail below.

As of CDH 4.3.0, HBase snapshots are fully functional, feature rich, and require no cluster downtime during their creation. My colleague Matteo Bertozzi covered snapshots very well in his blog entry and subsequent deep dive. Here I will provide only a high-level overview.

Snapshots simply capture a moment in time for your table by creating the equivalent of UNIX hard links to your table’s storage files on HDFS (Figure 1).
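To make the hard-link analogy for snapshots concrete, here is a minimal sketch on a local filesystem. It is only an illustration of the idea, not how HBase implements snapshots on HDFS: the function names (`snapshot_by_hard_link`, `demo`), the directory layout, and the file name `storefile-0001` are all hypothetical. It shows that linking a storage file into a "snapshot" directory copies no data, and that the snapshot still sees the contents after the live table drops its own reference to the file.

```python
import os
import tempfile


def snapshot_by_hard_link(table_dir, snapshot_dir):
    """Hard-link every regular file in table_dir into snapshot_dir.

    No file contents are copied; each link is a second name for the same data.
    """
    os.makedirs(snapshot_dir, exist_ok=True)
    linked = []
    for name in sorted(os.listdir(table_dir)):
        src = os.path.join(table_dir, name)
        if os.path.isfile(src):
            os.link(src, os.path.join(snapshot_dir, name))
            linked.append(name)
    return linked


def demo():
    """Create one fake storage file, snapshot it, then delete the original."""
    with tempfile.TemporaryDirectory() as root:
        table_dir = os.path.join(root, "table")        # hypothetical live table directory
        snap_dir = os.path.join(root, "snapshot")      # hypothetical snapshot directory
        os.makedirs(table_dir)

        live_file = os.path.join(table_dir, "storefile-0001")  # hypothetical file name
        with open(live_file, "wb") as f:
            f.write(b"row1=value1\n")

        names = snapshot_by_hard_link(table_dir, snap_dir)
        snap_file = os.path.join(snap_dir, names[0])

        # Both names point at the same underlying data (same inode).
        same_inode = os.stat(live_file).st_ino == os.stat(snap_file).st_ino

        # Simulate the live table discarding the file (for example, after a rewrite).
        os.remove(live_file)

        # The snapshot's link keeps the data reachable.
        with open(snap_file, "rb") as f:
            surviving = f.read()

        return names, same_inode, surviving


if __name__ == "__main__":
    print(demo())
```

Because only links are created, taking the "snapshot" costs almost nothing in time or space, which matches the low-disruption, small-footprint description of snapshots given above.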