Why an HDFS Cluster Needs High Availability:
By the end of this topic you will understand the idea behind setting up High Availability and how to prepare for the related CCA-500 exam questions. The goal is that trick questions in the exam should not confuse you: if you understand the concepts and the design in depth, you will be well placed to answer them.
If you are familiar with the HDFS cluster components, you know that HDFS needs two types of nodes to run:
- NameNode
- DataNode
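A cluster with only one NameNode has a single point of failure: if that NameNode goes down, HDFS is unavailable until it is back. We can prevent this single point of failure by running two NameNodes in High Availability mode: one as the "Active" node and the other as the "Standby" (passive) node. Note that the standby NameNode is not the same thing as the Secondary NameNode, which only performs checkpointing and does not provide failover.
The CCA-500 exam covers questions on the topics listed below: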
HDFS (17%)
- Describe the function of HDFS daemons
- Describe the normal operation of an Apache Hadoop cluster, both in data storage and in data processing
- Identify current features of computing systems that motivate a system like Apache Hadoop
- Classify major goals of HDFS Design
- Given a scenario, identify appropriate use case for HDFS Federation
- Identify components and daemon of an HDFS HA-Quorum cluster
- Analyze the role of HDFS security (Kerberos)
- Determine the best data serialization choice for a given scenario
- Describe file read and write paths
- Identify the commands to manipulate files in the Hadoop File System Shell
Cloudera Manager 5 uses Quorum-based Storage as its only HA implementation, whereas CDH4 supports both Quorum-based Storage and shared storage using NFS.
Before you upgrade from CDH4 to CDH5, keep the following in mind:
1. If you don't disable the HA configuration before upgrading, HA will continue to run, with a warning recommending that you switch to Quorum-based storage.
2. After upgrading from CDH4, you will not be able to enable NFS-based HA; the only option you will have is to configure Quorum-based storage.
Quorum-based Storage - uses JournalNode daemons to synchronise the standby NameNode with the active NameNode. When the active NameNode performs any namespace modification, it durably logs a record of the modification to a majority of the JournalNodes, which together act as the shared edit log. The standby NameNode constantly watches the JournalNodes for edits, and when edits occur, it applies them to its own namespace.
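To make the quorum idea concrete, here is a minimal Python sketch of an edit being acknowledged only when a majority of JournalNodes store it. This is a toy model, not Hadoop's actual code: the `JournalNode` class, the `log_edit` function and the sample operations are all hypothetical names chosen for illustration.

```python
# Toy model of quorum-based edit logging. All names here are illustrative
# assumptions; this is not how Hadoop implements its JournalNodes internally.

class JournalNode:
    def __init__(self, name, up=True):
        self.name = name
        self.up = up          # whether this daemon is reachable
        self.edits = []       # (txid, operation) records it has stored

    def journal(self, txid, op):
        """Store one edit and return True as an acknowledgement."""
        if not self.up:
            return False
        self.edits.append((txid, op))
        return True


def log_edit(journal_nodes, txid, op):
    """Send the edit to every JournalNode; succeed only on a majority of acks."""
    acks = sum(1 for jn in journal_nodes if jn.journal(txid, op))
    majority = len(journal_nodes) // 2 + 1
    return acks >= majority


jns = [JournalNode("jn1"), JournalNode("jn2"), JournalNode("jn3")]

ok_all_up = log_edit(jns, 1, "mkdir /data")            # 3 of 3 acks

jns[2].up = False
ok_one_down = log_edit(jns, 2, "create /data/a.txt")   # 2 of 3 acks

jns[1].up = False
ok_two_down = log_edit(jns, 3, "delete /data/a.txt")   # 1 of 3 acks

print(ok_all_up, ok_one_down, ok_two_down)  # True True False
```

With three JournalNodes the majority is two, so the sketch tolerates one failed JournalNode but not two. This is the kind of arithmetic an exam question on quorum sizing is likely to test.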
NameNode Failover - In the event of a failover, the standby NameNode makes sure it has read all of the edits from the shared storage (the JournalNodes in a Quorum-based setup) before promoting itself to the active state. This ensures that the namespace state is fully synchronized before the failover completes.
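The catch-up step during failover can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the `StandbyNameNode` class, its methods and the list standing in for the shared edit log are hypothetical, and a real NameNode tracks far more state than a set of paths.

```python
# Toy model of a standby NameNode catching up before it becomes active.
# Class, method and variable names are illustrative assumptions only.

class StandbyNameNode:
    def __init__(self):
        self.namespace = set()        # paths currently known to this node
        self.last_applied_txid = 0    # highest transaction id applied so far
        self.state = "standby"

    def apply_new_edits(self, edit_log):
        """Apply every edit that has not been applied yet, in order."""
        for txid, op, path in edit_log:
            if txid <= self.last_applied_txid:
                continue              # already applied during earlier tailing
            if op == "create":
                self.namespace.add(path)
            elif op == "delete":
                self.namespace.discard(path)
            self.last_applied_txid = txid

    def become_active(self, edit_log):
        """Read all outstanding edits first, then promote to active."""
        self.apply_new_edits(edit_log)
        self.state = "active"


# A plain list stands in for the shared edit log.
shared_edits = [(1, "create", "/a"), (2, "create", "/b")]

standby = StandbyNameNode()
standby.apply_new_edits(shared_edits)       # routine tailing of the log

shared_edits.append((3, "delete", "/a"))    # logged just before the active node fails
standby.become_active(shared_edits)         # catches up on txid 3, then promotes

print(standby.state, standby.last_applied_txid, sorted(standby.namespace))
# active 3 ['/b']
```

The point of the ordering in `become_active` is that the promotion happens only after the last outstanding edit has been applied, so the new active node never serves a stale namespace.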