
HDFS High Availability Setup - CCA-500 Exam Notes

Why an HDFS cluster needs High Availability:

By the end of this topic you will understand the idea behind setting up High Availability and how to prepare for the related CCA-500 exam questions. The goal is that no trick question in the exam should confuse you: if you are clear on the concepts and the design in depth, you are much less likely to get these questions wrong.

If you are familiar with HDFS cluster components, you know that HDFS needs two types of nodes to run:
  • NameNode
  • DataNode
In a standard configuration, the single NameNode is a single point of failure in an HDFS cluster. Each cluster has only one NameNode, and if that host or process becomes unavailable, the cluster as a whole is unavailable until the NameNode is either restarted or brought up on a new host.

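We can remove this single point of failure by running two NameNodes in High Availability mode: one as the "Active" node and the other as the "Standby" (passive) node. Note that the Standby NameNode is not the same thing as the Secondary NameNode, which only performs checkpointing and cannot take over from a failed NameNode.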

The CCA-500 exam covers questions on the topics listed below:

 HDFS (17%)
  • Describe the function of HDFS daemons
  • Describe the normal operation of an Apache Hadoop cluster, both in data storage and in data processing
  • Identify current features of computing systems that motivate a system like Apache Hadoop
  • Classify major goals of HDFS Design
  • Given a scenario, identify appropriate use case for HDFS Federation
  • Identify components and daemon of an HDFS HA-Quorum cluster
  • Analyze the role of HDFS security (Kerberos)
  • Determine the best data serialization choice for a given scenario
  • Describe file read and write paths
  • Identify the commands to manipulate files in the Hadoop File System Shell




 CCA-500 Exam Preparation Guide

Cloudera Manager 5 uses Quorum-based storage as its HA implementation, whereas CDH4 supports both Quorum-based and shared NFS-based storage.


Before you upgrade from CDH4 to CDH5:

1. If you don't disable the HA configuration, HA will continue to run, with a warning to switch to Quorum-based storage.
2. After the upgrade from CDH4 you will not be able to enable NFS storage-based HA; the only option you will have is to configure Quorum-based storage.

Quorum-based Storage - uses JournalNode daemons to keep the standby NameNode synchronized with the active NameNode. When the active NameNode performs any namespace modification, it durably logs a record of the modification to a majority of the JournalNodes, which act as the shared edit log. The standby NameNode constantly watches the JournalNodes for new edits, and when edits occur, it applies them to its own namespace.
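
To make the quorum idea concrete, here is a minimal toy sketch in Python. It is not Hadoop code: the names JournalNode, log_edit and tail_edits are hypothetical and exist only for illustration. It assumes an edit counts as durably logged once more than half of the JournalNodes acknowledge it, and that the standby reads any edits newer than the last transaction it applied.

```python
from dataclasses import dataclass, field


@dataclass
class JournalNode:
    """Toy stand-in for a JournalNode daemon (hypothetical, in-memory only)."""
    up: bool = True
    edits: list = field(default_factory=list)

    def write(self, txid, op):
        if not self.up:
            return False  # an unreachable node cannot acknowledge the write
        self.edits.append((txid, op))
        return True


def log_edit(journal_nodes, txid, op):
    """Active NameNode side: True if a majority of nodes acknowledged the edit."""
    acks = sum(1 for jn in journal_nodes if jn.write(txid, op))
    return acks > len(journal_nodes) // 2


def tail_edits(journal_nodes, last_applied_txid):
    """Standby side: collect edits newer than last_applied_txid, in txid order."""
    seen = {}
    for jn in journal_nodes:
        if jn.up:
            for txid, op in jn.edits:
                if txid > last_applied_txid:
                    seen[txid] = op
    return [(txid, seen[txid]) for txid in sorted(seen)]


if __name__ == "__main__":
    jns = [JournalNode(), JournalNode(), JournalNode()]
    jns[2].up = False                       # one of three JournalNodes is down
    print(log_edit(jns, 1, "mkdir /data"))  # still succeeds with 2 of 3 acks
    print(tail_edits(jns, 0))               # what the standby would apply
```

With three JournalNodes the toy cluster tolerates one failure; if a second node goes down, log_edit no longer reaches a majority. This is why JournalNodes are normally deployed in odd numbers.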

NameNode Failover - In the event of a failover, the standby NameNode ensures that it has read all of the edits from the shared storage (the JournalNodes, in a Quorum-based setup) before promoting itself to the active state. This guarantees that the namespace state is fully synchronized before the standby takes over.
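
The failover rule above can be sketched the same way. This is again a hedged toy model rather than the real Hadoop implementation: the NameNode class and the promote function are hypothetical names, and the shared edit log is assumed to be a plain list of (txid, operation) pairs.

```python
class NameNode:
    """Toy NameNode that tracks its HA state and the last edit it applied."""

    def __init__(self, name):
        self.name = name
        self.state = "standby"
        self.last_applied_txid = 0
        self.namespace = []  # applied operations, in order

    def apply(self, txid, op):
        self.namespace.append(op)
        self.last_applied_txid = txid


def promote(standby, shared_edits):
    """Apply every outstanding edit first, and only then become active."""
    for txid, op in sorted(shared_edits):
        if txid > standby.last_applied_txid:
            standby.apply(txid, op)
    standby.state = "active"
    return standby


if __name__ == "__main__":
    nn2 = NameNode("nn2")
    nn2.apply(1, "mkdir /a")  # the standby has already applied edit 1
    shared = [(1, "mkdir /a"), (2, "mkdir /b"), (3, "rm /b")]
    promote(nn2, shared)
    print(nn2.state, nn2.last_applied_txid, nn2.namespace)
```

The ordering inside promote is the point of the sketch: the state only changes to "active" after the loop has caught up on every edit, which mirrors the guarantee described above.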
