Setup local Hadoop dev environment on macOS

It is always convenient to have a local environment for learning and quickly testing a scenario. If you are on macOS and looking to learn or set up Hadoop locally, you are in the right place. I will keep the install steps to a minimum with no BS 😉

Pre-requisites

  1. Homebrew (brew)
  2. passwordless ssh to localhost (see the sketch below)

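If you don't already have passwordless ssh to localhost, something along these lines should do it (a minimal sketch; on macOS the ssh server also has to be enabled, either under System Preferences > Sharing > Remote Login or via systemsetup as shown):

    # generate a key pair if you don't have one yet
    ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa
    # authorize it for logins to this machine
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    chmod 600 ~/.ssh/authorized_keys
    # enable Remote Login (the ssh server) on macOS
    sudo systemsetup -setremotelogin on
    # verify: this should not prompt for a password
    ssh localhost exit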

Installation

Hadoop

brew install hadoop

Pig

brew install pig

Hive

brew install hive
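
If you want to sanity-check the installs before moving on, each tool can report its version (the exact versions depend on what brew resolved at install time):

    hadoop version
    pig -version
    hive --version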

The above is enough to run Hadoop, Hive and Pig. Note that by default HDFS keeps its data under hadoop.tmp.dir, which resolves to /tmp and gets wiped on reboot. If you want the data to persist and survive reboots, configure it further as shown below; the key pieces are dfs.namenode.name.dir and dfs.datanode.data.dir in hdfs-site.xml.

Configuration

Hadoop

  1. Create folders for the Hadoop data and tmp dirs: mkdir -p /opt/hadoop/data /opt/hadoop/tmp (you may need sudo, plus a chown to your user)
  2. Modify core-site.xml in /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/core-site.xml to look like this:
    <configuration>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/opt/hadoop/tmp/hadoop-${user.name}</value>
            <description>A base for other temporary directories.</description>
        </property> 
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://localhost:9000</value>
        </property>
    </configuration>
  3. Modify hdfs-site.xml in /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/hdfs-site.xml to look like this:
    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>/opt/hadoop/data/dfs.namenode.name.dir</value>
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>/opt/hadoop/data/dfs.datanode.data.dir</value>
        </property>
    </configuration>
    
  4. Modify yarn-site.xml in /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/yarn-site.xml to look like this:
    <configuration>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
    </configuration>
    
  5. Modify mapred-site.xml in /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/mapred-site.xml to look like this:
    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn-tez</value>
        </property>
    </configuration>
    
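Before starting HDFS for the first time with these new directories, format the NameNode once (a one-time step that initializes dfs.namenode.name.dir; don't rerun it later or you will wipe the HDFS metadata):

    hdfs namenode -format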


Setting up Tez on Hive

  1. Download Tez from –
  2. Compile using –
  3. Add the Tez conf to the Hadoop environment (/usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/hadoop-env.sh):
    export TEZ_CONF_DIR=/opt/tez/conf/
    export TEZ_JARS=/opt/tez/tez/
    export HADOOP_CLASSPATH=${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*:${HADOOP_CLASSPATH}:${JAVA_JDBC_LIBS}:${MAPREDUCE_LIBS}
    
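The download and build steps above are left open, so treat the following as a sketch under assumptions: a source build of Tez drops a tarball under tez-dist/target/ (the exact file name depends on the version you built), and Tez's tez.lib.uris property must point at a copy of that tarball in HDFS. The /apps/tez path below is an arbitrary choice, not something Tez mandates:

    # upload the built Tez tarball to HDFS (path and file name are assumptions)
    hdfs dfs -mkdir -p /apps/tez
    hdfs dfs -put tez-dist/target/tez-0.8.4.tar.gz /apps/tez/tez.tar.gz

A minimal /opt/tez/conf/tez-site.xml then only needs to reference that upload:

    <configuration>
        <property>
            <name>tez.lib.uris</name>
            <value>hdfs://localhost:9000/apps/tez/tez.tar.gz</value>
        </property>
    </configuration>

To point Hive at it, switch the execution engine inside a Hive session:

    hive> set hive.execution.engine=tez;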


Well, it pretty much runs out of the box. Try executing start-dfs.sh and start-yarn.sh. If those commands are not found, you will need to set up your .bash_profile: add export PATH="/usr/local/sbin:$PATH" to it.
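
Once both scripts run cleanly, jps (which ships with the JDK) should list the daemons:

    start-dfs.sh
    start-yarn.sh
    # expect NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager
    jps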


Testing Tez

# start from a clean slate in HDFS
hdfs dfs -rm -r -f /input
hdfs dfs -mkdir /input
# create a small text file to count words in
echo "Please note that the tarball version should match the version of the client jars used when submitting Tez jobs to the cluster. Please refer to the Version Compatibility Guide for more details on version compatibility and detecting mismatches." > /tmp/input.txt
hdfs dfs -put /tmp/input.txt /input
# run the Tez ordered word count example
hadoop jar /opt/tez/tez/tez-examples-0.8.4.jar orderedwordcount /input /output
# inspect the result first (e.g. hdfs dfs -cat /output/part*), then clean up
hdfs dfs -rm -r -f /input
hdfs dfs -rm -r -f /output
rm /tmp/input.txt

