Here are the configuration settings for our Hadoop cluster that relate to NameNode HA:
core-site.xml:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://ns1</value>
</property>
<property>
  <name>ha.zookeeper.quorum</name>
  <value>644v4.mzhen.cn:2181,644v5.mzhen.cn:2181,644v6.mzhen.cn:2181</value>
</property>

hdfs-site.xml:

<property>
  <name>dfs.nameservices</name>
  <value>ns1</value>
</property>
<property>
  <name>dfs.ha.namenodes.ns1</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns1.nn1</name>
  <value>644v1.mzhen.cn:9000</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns1.nn2</name>
  <value>644v2.mzhen.cn:9000</value>
</property>
<property>
  <name>dfs.namenode.http-address.ns1.nn1</name>
  <value>644v1.mzhen.cn:10001</value>
</property>
<property>
  <name>dfs.namenode.http-address.ns1.nn2</name>
  <value>644v2.mzhen.cn:10001</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://644v4.mzhen.cn:8485;644v5.mzhen.cn:8485;644v6.mzhen.cn:8485/ns1</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/home/data/hdfsdir/journal</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.ns1</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/supertool/.ssh/id_rsa</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
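As a quick sanity check on the config above (not part of the official tooling), the qjournal URI in dfs.namenode.shared.edits.dir can be split into its JournalNode host:port list with a one-liner; the value is copied inline here so the snippet is self-contained:

```shell
# Value of dfs.namenode.shared.edits.dir from hdfs-site.xml above
edits_dir='qjournal://644v4.mzhen.cn:8485;644v5.mzhen.cn:8485;644v6.mzhen.cn:8485/ns1'

# Strip the qjournal:// scheme and the trailing journal ID, then split on ';'
echo "$edits_dir" | sed -e 's|^qjournal://||' -e 's|/ns1$||' | tr ';' '\n'
```

These three hosts are exactly the nodes that must run a JournalNode process, which the checkpoints below verify.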
After the Hadoop cluster is fully started, verify the following checkpoints to make sure NameNode HA is fully in effect:
1. Nodes listed in the "dfs.ha.namenodes.ns1" property in hdfs-site.xml should each have "NameNode" and "DFSZKFailoverController" processes.
2. Nodes listed in the "dfs.namenode.shared.edits.dir" property in hdfs-site.xml should each have a "JournalNode" process.
3. Nodes listed in the "ha.zookeeper.quorum" property in core-site.xml should each have a "QuorumPeerMain" process.
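The checkpoints above can be scripted by grepping `jps` output for the expected daemon names. In this sketch the `jps` output is mocked with canned text so the snippet runs anywhere; on a real NameNode host, replace the variable with `jps_output=$(jps)`:

```shell
# Mocked `jps` output from a NameNode host (PIDs are made up);
# on a live node use: jps_output=$(jps)
jps_output='12345 NameNode
12346 DFSZKFailoverController
12347 Jps'

# Daemons expected on a NameNode host per checkpoint 1 above
for proc in NameNode DFSZKFailoverController; do
  if echo "$jps_output" | grep -qw "$proc"; then
    echo "$proc: running"
  else
    echo "$proc: MISSING"
  fi
done
```

The same loop works for checkpoints 2 and 3 by swapping in JournalNode or QuorumPeerMain on the corresponding hosts.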
If any of these processes needs to be launched by hand, the relevant commands are listed below:
QuorumPeerMain: The service for ZooKeeper.
bin/zkServer.sh start
bin/zkServer.sh status
bin/zkServer.sh stop
bin/zkServer.sh restart
JournalNode: In order for the Standby node to keep its state synchronized with the Active node, both nodes communicate with a group of separate daemons called "JournalNodes" (JNs).
./sbin/hadoop-daemon.sh stop journalnode
./sbin/hadoop-daemon.sh start journalnode
NameNode:
./sbin/hadoop-daemon.sh stop namenode
./sbin/hadoop-daemon.sh start namenode
DFSZKFailoverController:
./sbin/hadoop-daemon.sh stop zkfc
./sbin/hadoop-daemon.sh start zkfc
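Putting the pieces together, the daemons must come up in dependency order: ZooKeeper first, then the JournalNodes, then the NameNodes, and finally the ZKFCs. A sketch of that sequence is below; the `run` helper only prints each command, so the snippet is safe to execute anywhere (replace its body with `"$@"` to run the commands for real from the right install root on each host):

```shell
# Print-only runner; on a live cluster replace the body with: run() { "$@"; }
run() { echo "+ $*"; }

run bin/zkServer.sh start                      # on each ZooKeeper node
run ./sbin/hadoop-daemon.sh start journalnode  # on each JournalNode
run ./sbin/hadoop-daemon.sh start namenode     # on nn1 and nn2
run ./sbin/hadoop-daemon.sh start zkfc         # on nn1 and nn2
```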
Note that if the above command fails to start the daemon without reporting an explicit error, you can run `./bin/hdfs zkfc` in the foreground to retrieve detailed diagnostics.
Lastly, some common commands relevant to NameNode HA are listed here:
## Get the status of a NameNode, active or standby.
hdfs haadmin -getServiceState nn1

## Manually transition a NameNode to active, which requires 'dfs.ha.automatic-failover.enabled' to be set to 'false'.
hdfs haadmin -transitionToActive nn1
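To see the role of both NameNodes at a glance, the -getServiceState call can be looped over nn1 and nn2. In the sketch below the call is stubbed with canned output so it runs anywhere; on the cluster, replace the function body with `hdfs haadmin -getServiceState "$1"`:

```shell
# Stub for `hdfs haadmin -getServiceState <nn>`; the canned states are
# illustrative only -- swap in the real command on a live cluster.
get_service_state() {
  case "$1" in
    nn1) echo active ;;
    nn2) echo standby ;;
  esac
}

for nn in nn1 nn2; do
  echo "$nn is $(get_service_state "$nn")"
done
```

In a healthy HA pair exactly one NameNode reports active and the other standby.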
References:
1. High Availability for Hadoop - Hortonworks
2. HDFS High Availability Using the Quorum Journal Manager - Apache Hadoop