Uninstall Hortonworks HDP 2.2

Source: https://pulsatingtaurus.wordpress.com/2015/02/15/clean-uninstall-hortonworks-hdp-2-2/

I love Hadoop, and Hortonworks is one of my favorite Hadoop distributions. However, while experimenting with Hadoop installations, I have had many occasions where I needed to start afresh on a set of physical as well as virtual cluster nodes. Hortonworks provides great documentation, but as of today I found it incomplete when it comes to uninstalling their distribution. This post aims to be a small guide to everything you might need to uninstall or clean in order to bring the cluster back to the state it was in before you installed Hadoop for the first time. I have tried these steps on my Linux cluster running HDP 2.2 (which was earlier upgraded from HDP 2.1).

Software/Hardware Specifications:

Operating System: Red Hat Enterprise Linux 6 (kernel 2.6.32-504.1.3.el6)

Apache Ambari Version: 1.7

Hadoop Distribution: HDP 2.2

Tools used: yum, pdsh, various Linux commands

Tip:- Use pdsh so you do not have to run commands on each server individually, and at the same time you get a nice history of the process.
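If pdsh is not available, the same fan-out can be approximated with a plain ssh loop. A minimal sketch, assuming a hypothetical host list; the echo stands in for the real ssh call so the sketch is safe to try locally:

```shell
# Hypothetical host list -- replace with your cluster node FQDNs.
NODES="node1.example.com node2.example.com node3.example.com"

# Run a command "everywhere". The echo stands in for the real ssh call so
# this sketch is harmless as-is; for the real run, replace the echo with:
#   ssh "root@$node" "$@"
run_everywhere() {
  for node in $NODES; do
    echo "[$node] $*"
  done
}

run_everywhere ambari-agent stop
```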


STEP I – Stop all services

For this, navigate to your Ambari UI at “http://<Ambari-Server FQDN>:<Ambari-Server Port>“ and use the “Stop All Services” option. Ensure that all of the service indicator icons turn RED, which means the services are no longer running. If there are issues while stopping services, refer to the error logs in the Ambari UI to find and fix the problem.
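The RED icons can also be cross-checked from the command line via Ambari's REST API, where a fully stopped service reports the state INSTALLED. A minimal sketch; the host, port, cluster name, and the admin/admin credentials in the comment are placeholders, not values from this cluster:

```shell
# Print any service state that is NOT "INSTALLED" (i.e. not fully stopped),
# reading Ambari service JSON on stdin. Empty output means all stopped.
check_stopped() {
  grep -o '"state" *: *"[A-Z_]*"' | grep -v '"INSTALLED"' || true
}

# Against a live Ambari server (placeholders -- adjust for your environment):
#   curl -s -u admin:admin \
#     "http://<Ambari-Server>:8080/api/v1/clusters/<cluster>/services?fields=ServiceInfo/state" \
#     | check_stopped
```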


STEP II – Stop Ambari Server & Clients

$ ssh root@<Ambari-Server>

$ ambari-server stop

$ pdsh -a ambari-agent stop | dshbak

Note:- Ensure from the output that ambari-server and the ambari-agents have been stopped on all of the servers.


STEP III – Remove Hadoop packages

$ pdsh -a yum -y remove `yum list installed | grep -i hadoop | cut -d. -f1 | sed -e :a -e '$!N; s/\n/ /; ta'` | dshbak

$ pdsh -a yum -y remove ambari* | dshbak

$ pdsh -a yum -y remove `yum list installed | grep -w 'HDP' | egrep -v -w 'pdsh|dshbak' | cut -d. -f1 | grep -v "^[ ]" | sed -e :a -e '$!N; s/\n/ /; ta'`| dshbak

$ pdsh -a yum -y remove `yum list installed | egrep -w 'hcatalog|hive|hbase|zookeeper|oozie|pig|snappy|hadoop-lzo|knox|hadoop|hue' | cut -d. -f1 | grep -v "^[ ]" | sed -e :a -e '$!N; s/\n/ /; ta'`|dshbak
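A note on how these one-liners work: the backquoted pipeline is expanded by your local shell before pdsh runs, so the package list is built on the node you launch the command from (which is fine when that node carries the same packages as the rest). The trailing sed expression simply joins the newline-separated package names into one space-separated argument list. A minimal local demonstration on hypothetical package names:

```shell
# Join newline-separated names into one space-separated list, exactly as the
# sed expression in the commands above does.
printf 'hadoop-hdfs\nhadoop-yarn\nzookeeper\n' \
  | sed -e :a -e '$!N; s/\n/ /; ta'

# An equivalent, shorter spelling using paste:
printf 'hadoop-hdfs\nhadoop-yarn\nzookeeper\n' | paste -sd' ' -

# Both print: hadoop-hdfs hadoop-yarn zookeeper
```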


STEP IV – Uninstall databases used by HDP

Note:- You may skip this step if you want to retain the databases for any specific reason. I demonstrate uninstalling the MySQL and PostgreSQL databases; in case you use any other database (such as Oracle), please refer to that database's manual for uninstalling it.

# I had multiple MySQL servers in my Hadoop cluster, so I used pdsh to remove all server and client components at once

$ pdsh -a yum -y remove mysql mysql-server | dshbak

# I did not want to back up the old MySQL data, hence deleting all of it. You might want to save a copy.

$ pdsh -a rm -rf /var/lib/mysql

$ pdsh -a yum -y remove postgre* | dshbak


STEP V – Remove all Hadoop related folders/logs/etc

$ pdsh -a rm -r `find /etc -maxdepth 1 | egrep -wi 'mysql|hcatalog|hive|hbase|zookeeper|oozie|pig|snappy|hadoop|knox|hue|ambari|tez|flume|storm|accumulo|spark|kafka|falcon|slider|ganglia|nagios|phoenix' | sed -e :a -e '$!N; s/\n/ /; ta'`

$ pdsh -a rm -r `find /var/log -maxdepth 1 | egrep -wi 'mysql|hcatalog|hive|hbase|zookeeper|oozie|pig|snappy|hadoop|knox|hue|ambari|tez|flume|storm|accumulo|spark|kafka|falcon|slider|ganglia|nagios|phoenix' | sed -e :a -e '$!N; s/\n/ /; ta'`

$ pdsh -a rm -r `find /tmp -maxdepth 1 | egrep -wi 'hadoop' | sed -e :a -e '$!N; s/\n/ /; ta'`

# You will have defined a Hadoop datanode/namenode folder or partition. Please ensure you delete it from all of the nodes. In my case it was /hadoop

$ pdsh -a rm -r /hadoop
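After the removals, it is worth scanning for anything left behind. A minimal sketch: a filter that picks HDP-related names out of a directory listing; the pdsh invocation in the comment is illustrative:

```shell
# Filter a listing (one path per line on stdin) down to entries matching the
# HDP-related component names removed above. Empty output means clean.
hdp_leftovers() {
  egrep -wi 'mysql|hcatalog|hive|hbase|zookeeper|oozie|pig|snappy|hadoop|knox|hue|ambari|tez|flume|storm|accumulo|spark|kafka|falcon|slider|ganglia|nagios|phoenix' || true
}

# e.g. check /etc, /var/log and /tmp on every node:
#   pdsh -a 'find /etc /var/log /tmp -maxdepth 1 2>/dev/null' | hdp_leftovers
```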


STEP VI – Reboot all servers

This step ensures that any runaway processes are cleaned up and the system returns to a sane state. Repeat the step for each server in your cluster. If the server from which you run the following command is itself part of the Hadoop cluster, reboot it last, after all of the others have been rebooted.

$ ssh root@<Cluster-Node-FQDN> shutdown -r now

# Wait for some time before checking whether all the servers have rebooted

$ pdsh -a uptime | dshbak


STEP VII – Update software packages on each node to the latest

This step ensures that your OS packages are up to date with the repository, so there are no surprises during the next round of installation. Repeat the following command for each node in the Hadoop cluster.

$ ssh root@<Cluster-Node-FQDN> yum -y update


I hope I have covered all the steps required to uninstall Hortonworks Hadoop HDP 2.2. If you find any discrepancies or issues after following these steps, please let me know and I will be glad to update this post.

The very next step you are probably looking at is installing Hadoop, and I believe the Hortonworks team has done a great job documenting the detailed process at,

Installing Hortonworks Hadoop HDP 2.2

Enjoy Hadooping …
