Set up a local Hadoop dev environment on macOS
It is always convenient to have a local environment for learning and for quick testing of a scenario. If you are on macOS and looking to learn or set up Hadoop locally, you are in the right place. I will keep the install steps to a minimum with no BS 😉
Pre-requisites
- brew
- passwordless ssh
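If you don't have passwordless ssh to localhost yet, a minimal sketch (assuming the default RSA key path; adjust if you already keep keys elsewhere):

```shell
# Generate a key pair if you don't already have one (empty passphrase)
ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa

# Authorize the key for loopback logins
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

# Verify: this should drop you in without a password prompt
ssh localhost exit
```

On macOS you also need Remote Login enabled (System Preferences → Sharing), otherwise sshd is not listening at all.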
Installation
Hadoop
brew install hadoop
Pig
brew install pig
Hive
brew install hive
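A quick sanity check that the installs landed on your PATH (version numbers will vary with your brew revision):

```shell
hadoop version   # note: no dash for hadoop
pig -version
hive --version
```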
The above setup will enable you to run Hadoop, Hive and Pig. If you want the data to persist and survive reboots, configure it further as described below; in particular, set dfs.namenode.name.dir and dfs.datanode.data.dir in hdfs-site.xml.
Configuration
Hadoop
- Create folder for hadoop data
mkdir -p /opt/hadoop/data
(On macOS /opt is usually root-owned, so you may need sudo mkdir -p /opt/hadoop/data followed by sudo chown $(whoami) /opt/hadoop/data so the Hadoop daemons can write there.)
- Modify core-site.xml in /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/core-site.xml to look like (for a local single-node setup, fs.defaultFS should point at localhost):
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop/tmp/hadoop-${user.name}</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
- Modify hdfs-site.xml in /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/hdfs-site.xml to look like,
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/hadoop/data/dfs.namenode.name.dir</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/opt/hadoop/data/dfs.datanode.data.dir</value>
  </property>
</configuration>
- Modify yarn-site.xml in /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/yarn-site.xml to look like,
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
- Modify mapred-site.xml in /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/mapred-site.xml to look like (yarn-tez requires the Tez setup below; use plain yarn if you skip Tez):
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn-tez</value>
  </property>
</configuration>
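With the XML files in place, the namenode must be formatted once before the first start, otherwise start-dfs.sh comes up without a usable filesystem. Note that this initializes dfs.namenode.name.dir and erases any existing HDFS data, so only run it on a fresh setup:

```shell
hdfs namenode -format
```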
Setting up Tez on Hive
- Download Tez from –
- Compile using –
- Add Tez conf to hadoop environment (/usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/hadoop-env.sh)
export TEZ_CONF_DIR=/opt/tez/conf/
export TEZ_JARS=/opt/tez/tez/
export HADOOP_CLASSPATH=${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*:${HADOOP_CLASSPATH}:${JAVA_JDBC_LIBS}:${MAPREDUCE_LIBS}
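Tez also expects its jars to be reachable from HDFS via tez.lib.uris. A minimal tez-site.xml sketch for ${TEZ_CONF_DIR} — the /apps/tez HDFS path here is an assumption, adjust it to wherever you upload the build output:

```xml
<configuration>
  <property>
    <name>tez.lib.uris</name>
    <!-- assumption: the tez tarball from the build was uploaded to this HDFS path -->
    <value>hdfs://localhost:9000/apps/tez/tez.tar.gz</value>
  </property>
</configuration>
```

Upload the tarball with hdfs dfs -mkdir -p /apps/tez and hdfs dfs -put before running any Tez job, and keep the uploaded version in sync with the client jars.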
Well, it pretty much runs out of the box. Try executing start-dfs.sh and start-yarn.sh; if that fails for you, you will need to set up your PATH in .bash_profile. Add export PATH="/usr/local/sbin:$PATH" to .bash_profile.
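Once the daemons are up, jps (shipped with the JDK) is a quick way to confirm everything actually started:

```shell
jps
# Among the listed Java processes you should see:
#   NameNode, DataNode, SecondaryNameNode   (from start-dfs.sh)
#   ResourceManager, NodeManager            (from start-yarn.sh)
```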
Testing Tez
hdfs dfs -rm -r -f /input
hdfs dfs -mkdir /input
echo "Please note that the tarball version should match the version of the client jars used when submitting Tez jobs to the cluster. Please refer to the Version Compatibility Guide for more details on version compatibility and detecting mismatches." > /tmp/input.txt
hdfs dfs -put /tmp/input.txt /input
hadoop jar /opt/tez/tez/tez-examples-0.8.4.jar orderedwordcount /input /output
# Inspect the word counts before cleaning up
hdfs dfs -cat /output/*
hdfs dfs -rm -r -f /input
hdfs dfs -rm -r -f /output
rm /tmp/input.txt