Adding a compression codec to the Hortonworks Data Platform

Lately I tried installing the xz/LZMA codec on my local VM setup. The compression ratios are pretty awesome. I won't benchmark it here; try it out yourself 😉
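If you want a quick feel for the ratio before touching the cluster, the plain xz CLI on a local file gives a rough idea (the file path and contents here are purely illustrative):

```shell
# Build a ~10 MB repetitive text file (illustrative content)
yes "2023-01-01 12:00:00 INFO sample log line for compression testing" | head -c 10485760 > /tmp/xz_demo.txt

# Compress with xz; -k keeps the original, -f overwrites any old .xz
xz -kf /tmp/xz_demo.txt

# Compare the sizes: repetitive text shrinks dramatically
ls -l /tmp/xz_demo.txt /tmp/xz_demo.txt.xz
```

Real-world data won't compress this well, but xz typically still beats gzip and Snappy on ratio, at the cost of CPU time.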


Steps

  1. Download the codec JAR from https://github.com/yongtang/hadoop-xz or https://mvnrepository.com/artifact/io.sensesecure/hadoop-xz
  2. Copy the downloaded JAR into HDP's lib folders. One way is to drop it next to every existing Snappy codec JAR:
    find /usr/hdp/ -name '*snappy*jar' | xargs -L1 dirname | sort -u | xargs -L1 sudo cp ~/hadoop-xz-1.4.jar
  3. Set up compression in the HDFS config using Ambari
    Ambari -> HDFS -> Configs -> Advanced core-site -> io.compression.codecs -> add 'io.sensesecure.hadoop.xz.XZCodec'
  4. Make Hive pick up the new JAR
    Create the auxlib folder below on the server running HiveServer2, make the hive user its owner, and copy the hadoop-xz JAR into it.

    mkdir /usr/hdp/<version>/hive/auxlib
    chown -R hive /usr/hdp/<version>/hive/auxlib
    cp ~/hadoop-xz-1.4.jar /usr/hdp/<version>/hive/auxlib/
  5. Restart HiveServer2
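For reference, after step 3 the io.compression.codecs property in core-site.xml should end up looking something like this (the exact list depends on which codecs your cluster already has enabled):

```xml
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.SnappyCodec,io.sensesecure.hadoop.xz.XZCodec</value>
</property>
```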


Testing with Hive

Create a big sample file at /tmp/sample.txt on the local filesystem.
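One quick way to generate such a file (the contents are arbitrary; any large text file will do):

```shell
# ~50 MB of repetitive CSV-ish text, written to the path used below
yes "alpha,beta,gamma,delta,epsilon" | head -c 52428800 > /tmp/sample.txt
wc -c /tmp/sample.txt
```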

Operations in Hive

create table orig_sample(val string);
!sh hdfs dfs -put /tmp/sample.txt /tmp;
LOAD DATA INPATH '/tmp/sample.txt' OVERWRITE INTO TABLE orig_sample;

-- test lzma
set hive.exec.compress.output=true;
set io.seqfile.compression.type=BLOCK;
set mapreduce.output.fileoutputformat.compress.codec=io.sensesecure.hadoop.xz.XZCodec;

drop table if exists test_table_lzma;
CREATE TABLE test_table_lzma
ROW FORMAT DELIMITED FIELDS TERMINATED BY "," 
LINES TERMINATED BY "\n" 
STORED AS TEXTFILE 
LOCATION "/tmp/test_table_lzma" as 
select * from orig_sample;
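A quick sanity check, in the same Hive session, that the compressed copy didn't lose any rows:

```sql
-- the two counts should match exactly
select count(*) from orig_sample;
select count(*) from test_table_lzma;
```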

Checking results

hdfs dfs -du -s -h /tmp/sample.txt
hdfs dfs -du -s -h /tmp/test_table_lzma
