Set up a local Hadoop dev environment on macOS
It is always convenient to have a local environment for learning and for quick testing of a scenario. If you are on macOS and looking to learn or set up Hadoop locally, you are in the right place. I will keep the install steps to a minimum with no BS 😉
Pre-requisites
- brew
- passwordless ssh
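If you don't have passwordless ssh to localhost yet, a minimal sketch (assuming the default RSA key path; adjust if you already keep keys elsewhere):

```shell
# Generate a key pair if you don't already have one (empty passphrase)
ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa

# Authorize the key for loopback logins
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

# Verify: this should drop you in without a password prompt
ssh localhost exit
```

On macOS you also need Remote Login enabled (System Preferences → Sharing), otherwise sshd is not listening at all.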
Installation
Hadoop
brew install hadoop
Pig
brew install pig
Hive
brew install hive
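A quick sanity check that the installs landed on your PATH (version numbers will vary with your brew revision):

```shell
hadoop version   # note: no dash for hadoop
pig -version
hive --version
```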
The above setup will enable you to run Hadoop, Hive and Pig. If you want the data to persist and survive reboots, configure it further as described below; in particular, set dfs.namenode.name.dir and dfs.datanode.data.dir in hdfs-site.xml.
Configuration
Hadoop
- Create folder for hadoop data
mkdir -p /opt/hadoop/data
(On macOS /opt is usually root-owned, so you may need sudo mkdir -p /opt/hadoop/data followed by sudo chown $(whoami) /opt/hadoop/data so the Hadoop daemons can write there.)
- Modify core-site.xml in /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/core-site.xml to look like (for a local single-node setup, fs.defaultFS should point at localhost):
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop/tmp/hadoop-${user.name}</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
- Modify hdfs-site.xml in /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/hdfs-site.xml to look like,
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/hadoop/data/dfs.namenode.name.dir</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/opt/hadoop/data/dfs.datanode.data.dir</value>
  </property>
</configuration>
- Modify yarn-site.xml in /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/yarn-site.xml to look like,
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
- Modify mapred-site.xml in /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/mapred-site.xml to look like (yarn-tez requires the Tez setup below; use plain yarn if you skip Tez):
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn-tez</value>
  </property>
</configuration>
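With the XML files in place, the namenode must be formatted once before the first start, otherwise start-dfs.sh comes up without a usable filesystem. Note that this initializes dfs.namenode.name.dir and erases any existing HDFS data, so only run it on a fresh setup:

```shell
hdfs namenode -format
```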
Setting up Tez on Hive
- Download Tez from –
- Compile using –
- Add Tez conf to hadoop environment (/usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/hadoop-env.sh)
export TEZ_CONF_DIR=/opt/tez/conf/
export TEZ_JARS=/opt/tez/tez/
export HADOOP_CLASSPATH=${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*:${HADOOP_CLASSPATH}:${JAVA_JDBC_LIBS}:${MAPREDUCE_LIBS}
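Tez also expects its jars to be reachable from HDFS via tez.lib.uris. A minimal tez-site.xml sketch for ${TEZ_CONF_DIR} — the /apps/tez HDFS path here is an assumption, adjust it to wherever you upload the build output:

```xml
<configuration>
  <property>
    <name>tez.lib.uris</name>
    <!-- assumption: the tez tarball from the build was uploaded to this HDFS path -->
    <value>hdfs://localhost:9000/apps/tez/tez.tar.gz</value>
  </property>
</configuration>
```

Upload the tarball with hdfs dfs -mkdir -p /apps/tez and hdfs dfs -put before running any Tez job, and keep the uploaded version in sync with the client jars.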
Well, it pretty much runs out of the box. Try executing start-dfs.sh and start-yarn.sh; if that fails for you, you will need to set up your PATH in .bash_profile. Add export PATH="/usr/local/sbin:$PATH" to .bash_profile.
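Once the daemons are up, jps (shipped with the JDK) is a quick way to confirm everything actually started:

```shell
jps
# Among the listed Java processes you should see:
#   NameNode, DataNode, SecondaryNameNode   (from start-dfs.sh)
#   ResourceManager, NodeManager            (from start-yarn.sh)
```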
Testing Tez
hdfs dfs -rm -r -f /input
hdfs dfs -mkdir /input
echo "Please note that the tarball version should match the version of the client jars used when submitting Tez jobs to the cluster. Please refer to the Version Compatibility Guide for more details on version compatibility and detecting mismatches." > /tmp/input.txt
hdfs dfs -put /tmp/input.txt /input
hadoop jar /opt/tez/tez/tez-examples-0.8.4.jar orderedwordcount /input /output
# Inspect the word counts before cleaning up
hdfs dfs -cat /output/*
hdfs dfs -rm -r -f /input
hdfs dfs -rm -r -f /output
rm /tmp/input.txt