Thursday, December 4, 2014

A Record On The Process Of Adding DataNode To Hadoop Cluster

Procedure for checking and configuring a Linux node


1. Check the hostname

Verify that the current hostname is exactly what it should be.
hostname
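
If it is not, the hostname can be set for the current session as below (new-datanode-01 is just a placeholder; on a RHEL/CentOS-style system, persist it in /etc/sysconfig/network as well):
hostname new-datanode-01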

2. Check the file system type

Make sure it is the same as on all the other nodes in the Hadoop cluster.
df -T
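
For instance, assuming the cluster standard is ext4, the data disks can be spot-checked with:
df -T | grep /home/data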


3. Change the owner of all data disks to the hadoop user

chown -R hadoop:hadoop /home/data*
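
Verify the ownership afterwards with:
ls -ld /home/data*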

4. Check and back up disk mount info

We can simply execute the output of mount.sh to restore the mounts in case some mounted file paths are lost after a node restart.

su root

--mount.sh--
# Print a mount command for each data disk (/dev/sda through /dev/sdl), keyed by UUID
n=1
for i in a b c d e f g h i j k l ; do
  uuid=$(/sbin/blkid -s UUID | grep "^/dev/sd$i" | awk '{print $2}')
  echo mount $uuid /home/data$n
  n=$((n+1))
done

> bash mount.sh
mount UUID="09c42017-9308-45c3-9509-e77a2e99c732" /home/data1
mount UUID="72461da2-b0c0-432a-9b65-0ac5bc5bc69a" /home/data2
mount UUID="6d447f43-b2db-4f69-a3b2-a4f69f2544ea" /home/data3
mount UUID="37ca4fb8-377c-493d-9a4c-825f1500ae52" /home/data4
mount UUID="53334c93-13ff-41f5-8688-07023bd6f11a" /home/data5
mount UUID="10fa31f7-9c29-4190-8ecd-ec893d59634c" /home/data6
mount UUID="fe28b8dd-ff3b-49d9-87c6-6eee9f389966" /home/data7
mount UUID="5201d24b-9310-4cff-b3ad-5b09e47780a5" /home/data8
mount UUID="d3b85455-8b94-4817-b43e-69481f9c13c4" /home/data9
mount UUID="6f2630f1-7cfe-4cac-b52d-557f46779539" /home/data10
mount UUID="bafc742d-1477-439a-ade4-29711c5db840" /home/data11
mount UUID="bf6e36d8-1410-4547-853c-f541c9a07e52" /home/data12

We can append the output of mount.sh to /etc/rc.local, so that the mount commands are invoked automatically every time the node starts up.
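
Assuming the generated commands have been reviewed, a one-liner such as the following (run as root) does the appending:
bash mount.sh >> /etc/rc.local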

5. Check the operating system version

lsb_release -a

6. Optimize TCP parameters via sysctl

Append the following content to /etc/sysctl.conf.
fs.file-max = 800000
net.core.rmem_default = 12697600
net.core.wmem_default = 12697600
net.core.rmem_max = 873800000
net.core.wmem_max = 655360000
net.ipv4.tcp_rmem = 8192 262144 4096000
net.ipv4.tcp_wmem = 4096 262144 4096000
net.ipv4.tcp_mem = 196608 262144 3145728
net.ipv4.tcp_max_orphans = 300000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.ip_local_port_range = 1025 65535
net.ipv4.tcp_max_syn_backlog = 100000
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 1200
net.ipv4.tcp_max_tw_buckets = 5000
net.ipv4.netfilter.ip_conntrack_tcp_timeout_established = 1500
net.core.somaxconn = 32768
vm.swappiness = 0

Issue the following command to apply the above settings.
/sbin/sysctl -p

# List all current parameters to double-check
/sbin/sysctl -a
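
An individual parameter can also be queried by name to spot-check, for example:
/sbin/sysctl net.ipv4.tcp_tw_reuse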

7. Max number of open files

Check the current limit on open file descriptors:
ulimit -n

Append the following content to /etc/security/limits.conf to raise it to 100000.
*      soft    nofile  100000
*      hard    nofile  100000
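
Note that limits.conf takes effect at login, so log in again and re-check to confirm the new value:
ulimit -n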

8. Check and sync /etc/hosts

Fetch the current /etc/hosts file from one of the existing nodes in the Hadoop cluster (hereafter, the canonical node). Append the newly-added node's host entry to this file and synchronize it to all nodes, e.g. with the loop sketched below.
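
A minimal sketch of the synchronization, assuming a hypothetical nodes.txt that lists every node's hostname, one per line, and root SSH access:
for h in $(cat nodes.txt) ; do scp /etc/hosts root@$h:/etc/hosts ; done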

9. Set the locale to en_US.UTF-8

Appending the following to /etc/profile will do the work.
export LANG=en_US.UTF-8
export LC_CTYPE=en_US.UTF-8
export LC_NUMERIC=en_US.UTF-8
export LC_TIME=en_US.UTF-8
export LC_COLLATE=en_US.UTF-8
export LC_MONETARY=en_US.UTF-8
export LC_MESSAGES=en_US.UTF-8
export LC_PAPER=en_US.UTF-8
export LC_NAME=en_US.UTF-8
export LC_ADDRESS=en_US.UTF-8
export LC_TELEPHONE=en_US.UTF-8
export LC_MEASUREMENT=en_US.UTF-8
export LC_IDENTIFICATION=en_US.UTF-8
export LC_ALL=en_US.UTF-8
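
To verify, reload the profile and inspect the resulting settings:
source /etc/profile
locale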

10. Transfer the .ssh directory to the newly-added node

Since all nodes share one .ssh directory, passwordless SSH among all the nodes is simple to achieve: append the shared id_rsa.pub to the current authorized_keys and distribute the .ssh directory to all nodes, including the new one.
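
A minimal sketch, run from the canonical node as the hadoop user (new-datanode-01 is a placeholder for the new node's hostname; this first copy still prompts for a password):
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
scp -r ~/.ssh hadoop@new-datanode-01:~/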

11. Deploy Java & Hadoop environment

Copy the java directory from the canonical node to the newly-added node, then append the following environment variables to /etc/profile:
export JAVA_HOME=/usr/java/jdk1.7.0_11
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin

Likewise, do the same for the hadoop directory:
export HADOOP_HOME=/home/workspace/hadoop
export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:$HADOOP_HOME/lib/native/
export PATH=$PATH:$HADOOP_HOME/bin
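
To confirm both environments are in place, reload the profile and check the versions:
source /etc/profile
java -version
hadoop version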

At the same time, create (mkdir) and set permissions (chmod) on the paths configured by the dfs.datanode.data.dir parameter in hdfs-site.xml:
su hadoop
for i in {1..12}
do
  mkdir -p /home/data$i/hdfsdir/data
  chmod -R 755 /home/data$i/hdfsdir
done

Lastly, append the newly-added host to $HADOOP_HOME/etc/hadoop/slaves and synchronize the file to all nodes.
su hadoop
for i in $(grep -v "^#" $HADOOP_HOME/etc/hadoop/slaves)
do
 echo '';
 echo $i;
 scp $HADOOP_HOME/etc/hadoop/slaves hadoop@$i:/home/workspace/hadoop/etc/hadoop/;
done

12. Open firewall ports in iptables

This is well-explained in another post: Configure Firewall In Iptables For Hadoop Cluster.

13. Launch Hadoop services

Finally, we simply invoke the following two commands to start the DataNode and NodeManager services.
su hadoop
$HADOOP_HOME/sbin/hadoop-daemon.sh start datanode
$HADOOP_HOME/sbin/yarn-daemon.sh start nodemanager

After that, verify that the above two processes are running and look through the corresponding logs in the $HADOOP_HOME/logs directory to double-check.
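
For instance, jps (shipped with the JDK) lists running JVM processes; both daemon names should appear:
su hadoop
jps | grep -E 'DataNode|NodeManager'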


The Python script implementing all of the procedures listed above can be found in my project on GitHub.


© 2014-2017 jason4zhu.blogspot.com All Rights Reserved
If reposting, please credit the source: Jason4Zhu
