Best practices for Namenode and Datanode restarts

by robin · Published October 10, 2016 · Updated December 9, 2016

Problems

The following are some problems we might come across while working with large Hadoop clusters:

Namenode restarts taking a long time (progress is visible at http://nn-host:50070/dfshealth.html#tab-startup-progress)
Namenode staying in safemode for a long time after a restart

Best practices for Namenode & Datanode restarts

DO NOT restart all services at once. Instead, do the following in order:

Go to the standby namenode first, and restart it
Then restart the active namenode
Do a rolling restart of the datanodes: restart 2 datanodes at a time and wait 3-4 minutes between batches. It is safer that way, as running jobs should not be impacted. With a replication factor of 3, at least one copy of every block stays alive.
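The rolling restart described above can be sketched as a small shell function. This is only an illustration: restart_datanode is a hypothetical placeholder that just prints the host name, and BATCH_SIZE and PAUSE_SECONDS are made-up variable names. Replace the placeholder with whatever your setup uses to restart a datanode (Ambari, an init script over ssh, and so on).

```shell
# Rolling datanode restart sketch: BATCH_SIZE hosts per batch, with a pause
# between batches. All names here are illustrative, not part of Hadoop.

BATCH_SIZE="${BATCH_SIZE:-2}"          # datanodes restarted per batch
PAUSE_SECONDS="${PAUSE_SECONDS:-180}"  # 3 minutes between batches

restart_datanode() {
  # Hypothetical placeholder: only prints what would be restarted.
  echo "restart datanode on $1"
}

rolling_restart() {
  count=0
  total=$#
  for host in "$@"; do
    restart_datanode "$host"
    count=$((count + 1))
    # Pause after every full batch, but not after the last host.
    if [ $((count % BATCH_SIZE)) -eq 0 ] && [ "$count" -lt "$total" ]; then
      echo "sleeping ${PAUSE_SECONDS}s"
      sleep "$PAUSE_SECONDS"
    fi
  done
}
```

Calling rolling_restart dn1 dn2 dn3 dn4 would handle dn1 and dn2, pause, then handle dn3 and dn4.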

Faster namenode startup

Most of the time, startup is slow because the namenode has a large number of edit logs to load. It is recommended to save the namespace periodically (once a month or so) so that the edits are merged into a fresh fsimage. Make sure no jobs are running, since HDFS does not accept writes while in safemode.

# For all namenodes
hdfs dfsadmin -safemode enter 
hdfs dfsadmin -saveNamespace 
hdfs dfsadmin -safemode leave

# For specific namenode in case of HA (start with Standby first. Port is usually 8020 or 9000)
hdfs dfsadmin -fs hdfs://<namenode-host>:<port> -safemode enter
hdfs dfsadmin -fs hdfs://<namenode-host>:<port> -saveNamespace 
hdfs dfsadmin -fs hdfs://<namenode-host>:<port> -safemode leave
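If the three commands are scripted, one thing to watch for is a failed -saveNamespace leaving the namenode stuck in safemode. A minimal sketch of a wrapper that always attempts to leave safemode is shown here. HDFS_CMD and save_namespace are illustrative names, not part of Hadoop; only the dfsadmin options come from the commands above.

```shell
# Checkpoint sketch: enter safemode, save the namespace, and always try to
# leave safemode afterwards, even when the save step fails.
# HDFS_CMD and save_namespace are illustrative names, not part of Hadoop.

HDFS_CMD="${HDFS_CMD:-hdfs}"

save_namespace() {
  nn_uri="$1"   # e.g. hdfs://<namenode-host>:<port>
  "$HDFS_CMD" dfsadmin -fs "$nn_uri" -safemode enter || return 1
  "$HDFS_CMD" dfsadmin -fs "$nn_uri" -saveNamespace
  status=$?
  # We entered safemode ourselves, so leave it regardless of the save result.
  "$HDFS_CMD" dfsadmin -fs "$nn_uri" -safemode leave
  return "$status"
}
```

In an HA setup this would be called once per namenode, standby first, and the return code tells you whether the save itself succeeded.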

Exiting namenode safemode manually

DO NOT try to force the namenode out of safemode manually after a restart using the command below 😀

hdfs dfsadmin -safemode leave

This could result in missing or under-replicated blocks on the namenode. Instead, go to the namenode UI, check for the datanodes that have not reported their blocks to the namenode, and restart them individually. An easy way to find those datanodes is the number of blocks reported in the UI: they will be the ones with an oddly low number of blocks.
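On a cluster with many datanodes, spotting the oddly low counts by eye gets tedious. The sketch below assumes you have copied the host names and block counts from the UI into a text file with one "hostname block_count" pair per line. That file format, the function name low_block_nodes, and the 50% cutoff are all assumptions made for illustration.

```shell
# Flags datanodes whose block count is well below the cluster average.
# Assumed input: one "hostname block_count" pair per line.

low_block_nodes() {
  input_file="$1"
  threshold="${2:-0.5}"   # flag nodes below 50% of the average by default
  awk -v t="$threshold" '
    { host[NR] = $1; blocks[NR] = $2 + 0; sum += $2 }
    END {
      if (NR == 0) exit
      avg = sum / NR
      for (i = 1; i <= NR; i++)
        if (blocks[i] < avg * t) print host[i], blocks[i]
    }
  ' "$input_file"
}
```

The output is a list of candidate datanodes to look at and restart one at a time. A datanode that was recently added will also show up here, so treat the result as a hint rather than a verdict.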

Switching Namenodes

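Use the command below to fail over gracefully instead of bouncing the active namenode. The first argument is the ID of the currently active namenode and the second is the ID of the standby that should become active (here nn1 is active and nn2 is standby).

hdfs haadmin -failover nn1 nn2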
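Since hdfs haadmin -failover expects the active namenode ID first and the target second, it helps to check the current roles before running it. The following is a rough sketch using hdfs haadmin -getServiceState, which prints active or standby. The function name failover_from_active and the HDFS_CMD variable are illustrative, and nn1/nn2 stand for whatever namenode IDs your nameservice defines.

```shell
# Failover sketch: look up the current HA state of both namenodes and
# fail over from whichever one is active to the other.
# HDFS_CMD and failover_from_active are illustrative names.

HDFS_CMD="${HDFS_CMD:-hdfs}"

failover_from_active() {
  nn_a="$1"
  nn_b="$2"
  state_a="$("$HDFS_CMD" haadmin -getServiceState "$nn_a")"
  state_b="$("$HDFS_CMD" haadmin -getServiceState "$nn_b")"
  if [ "$state_a" = "active" ] && [ "$state_b" = "standby" ]; then
    "$HDFS_CMD" haadmin -failover "$nn_a" "$nn_b"
  elif [ "$state_b" = "active" ] && [ "$state_a" = "standby" ]; then
    "$HDFS_CMD" haadmin -failover "$nn_b" "$nn_a"
  else
    # Refuse to act on anything other than one active plus one standby.
    echo "unexpected HA states: $nn_a=$state_a $nn_b=$state_b" >&2
    return 1
  fi
}
```

Running failover_from_active nn1 nn2 swaps the roles whichever way they currently are, and does nothing if the pair is not in a clean active/standby state.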

HTH

Tags: bigdata hadoop hortonworks
