Category: Bigdata

September 28, 2016

Setting up password-less ssh across all nodes in a cluster

Pre-requisites: The user account for which passwordless ssh will be set up should be present on all nodes. The password of the account should be the same across all nodes. The pdsh and ssh-copy-id commands should be available. Prepare 2 files: file_of_hosts.txt – containing all...
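As a rough illustration of the key-distribution step, here is a minimal dry-run sketch. It only prints the ssh-copy-id command for each host rather than running it. The helper name print_copy_id_commands is hypothetical, the one-host-per-line layout of file_of_hosts.txt is an assumption, and it presumes an SSH key pair already exists for the account.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the ssh-copy-id command for each host instead of running it.
# file_of_hosts.txt is named in the post; one host per line is an assumption.

# Hypothetical helper: emit one ssh-copy-id command per non-empty, non-comment line.
print_copy_id_commands() {
  local hosts_file="$1" user="$2" host
  # The "|| [ -n ... ]" clause handles a last line without a trailing newline.
  while IFS= read -r host || [ -n "$host" ]; do
    # Skip blank lines and lines starting with '#'.
    case "$host" in
      ''|'#'*) continue ;;
    esac
    printf 'ssh-copy-id %s@%s\n' "$user" "$host"
  done < "$hosts_file"
}

# Example (not executed here): print_copy_id_commands file_of_hosts.txt "$USER"
```

Reviewing the printed commands before running them makes it easier to spot a wrong hostname. Each real ssh-copy-id call prompts for the account password, which is why the pre-requisites ask for the same password on every node.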

Bigdata / Hadoop

September 22, 2016

Hive on Tez Performance Tuning – Determining Reducer Counts

Source: https://community.hortonworks.com/articles/22419/hive-on-tez-performance-tuning-determining-reducer.html Short description: Some practical steps in Hive Tez tuning. How does Tez determine the number of reducers? How can I control this for performance? In this article, I will attempt to answer this while executing and tuning...

Bigdata / Hadoop

September 20, 2016

Hive query tips

Date operations; Data operations; Headers in Beeline; Unlock hive tables; Check partitions used in hive query; Debugging Hive; Long (query length) queries submitted to Hive; Occurrence of thread printing in hiveserver2 log file; Capture classes used in hiveserver2 log...

Bigdata / Hadoop

September 20, 2016

Adding compression codec to Hortonworks data platform

Lately I tried installing the xz/lzma codec on my local VM setup. The compression ratios are pretty awesome. Won’t do a benchmark here, try it out yourself 😉 Steps: Download codec JAR – https://github.com/yongtang/hadoop-xz or https://mvnrepository.com/artifact/io.sensesecure/hadoop-xz. Copy downloaded JAR to HDPs’ libs...

Bigdata / Hadoop

September 16, 2016

Good looking .hiverc file

Following is the .hiverc from one of the hadoop environments I work on:
-- additional .jar includes like the one below --
add jar hdfs://ualprod/tmp/json-serde-1.3.7-jar-with-dependencies.jar;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.auto.convert.join.noconditionaltask=true;
set hive.optimize.sort.dynamic.partition=true;
set hive.exec.max.dynamic.partitions=100000;
set hive.exec.max.dynamic.partitions.pernode=10000;
-- large mem??
set hive.tez.container.size=10240;...

Bigdata / Mac

September 13, 2016

Kafka on OSX / macOS

Source: https://dtflaneur.wordpress.com/2015/10/05/installing-kafka-on-mac-osx/ Apache Kafka is a highly scalable publish-subscribe messaging system that can serve as the data backbone in distributed applications. With Kafka’s Producer-Consumer model it becomes easy to implement multiple data consumers that do live monitoring as well as persistent...

Bigdata / Hadoop

September 12, 2016

Apache drill – No current connection

After reading multiple posts, it seems that this is a problem of conflicting jars. My current setup has Apache Drill installed using $ brew install apache-drill, and upon executing $ drill-embedded or $ drill-localhost, I see the error below (line 10): robin@MacBook-Pro:~$ drill-localhost Java HotSpot(TM)...

Bigdata / Hadoop

September 8, 2016

Hive ORC files – Pro Tips

Extract text from ORC files (source)
Hive (0.11 and up) comes with an ORC file dump utility. The dump can be invoked with the following command:
$ hive --orcfiledump <location-of-orc-file>
Create hive table definition using ORC files on HDFS
$ hive --orcfiledump hdfs:///data/location/of/the/ORC/file.orc...

Bigdata / Hadoop

September 2, 2016

Deleting users from Ranger database (mysql)

Once you sync users in Apache Ranger, they stay in the database even if you later sync Ranger users from a different source. All those users clutter up the Ranger user interface. The following two scripts will help in deleting...

Bigdata / Hadoop

August 17, 2016

Hadoop security practices

References
http://hortonworks.com/hadoop-tutorial/manage-security-policy-hive-hbase-knox-ranger/
http://hortonworks.com/blog/best-practices-in-hdfs-authorization-with-apache-ranger/
http://hortonworks.com/blog/best-practices-for-hive-authorization-using-apache-ranger-in-hdp-2-2/
http://hortonworks.com/blog/author/balajiganesan03/
http://www.slideshare.net/hortonworks/ops-workshop-asrunon20150112
