Category: Hadoop

Compiling Hue on CentOS

Tested on CentOS 6.8 Minimal ISO install with Hue 3.11

Downloads
Hue 3.11

Steps
Download Hue tarball
Install dependencies
yum install python-devel libffi-devel gcc openldap-devel openssl-devel libxml2-devel libxslt-devel mysql-devel gmp-devel sqlite-devel openldap-devel gcc-c++ rsync
Compile
You can either compile...

Performance of Hive tables with Parquet & ORC

Source: http://stackoverflow.com/questions/32373460/parquet-vs-orc-vs-orc-with-snappy

Datasets
Table A – Text File Format – 2.5GB
Table B – ORC – 652MB
Table C – ORC with Snappy – 802MB
Table D – Parquet – 1.9GB

Parquet was worst as far as compression for my table...
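To put the sizes above in perspective, the short Python sketch below expresses each table's size as a fraction of the text-format baseline. It assumes 1 GB = 1024 MB, which the source does not state, and the variable names are illustrative only.

```python
# Table sizes quoted in the excerpt, converted to MB.
# Assumption: 1 GB = 1024 MB (the source does not say which convention it uses).
MB_PER_GB = 1024

sizes_mb = {
    "Text": 2.5 * MB_PER_GB,
    "ORC": 652,
    "ORC with Snappy": 802,
    "Parquet": 1.9 * MB_PER_GB,
}

# Size of each format relative to the uncompressed text table.
baseline = sizes_mb["Text"]
ratios = {name: size / baseline for name, size in sizes_mb.items()}

for name, ratio in ratios.items():
    print(f"{name:<16} {ratio:.0%} of the text size")
```

Under that assumption, ORC comes out at roughly a quarter of the text size and Parquet at roughly three quarters.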

Hadoop security practices

References
http://hortonworks.com/hadoop-tutorial/manage-security-policy-hive-hbase-knox-ranger/
http://hortonworks.com/blog/best-practices-in-hdfs-authorization-with-apache-ranger/
http://hortonworks.com/blog/best-practices-for-hive-authorization-using-apache-ranger-in-hdp-2-2/
http://hortonworks.com/blog/author/balajiganesan03/
http://www.slideshare.net/hortonworks/ops-workshop-asrunon20150112

Related posts:
Deleting users from Ranger database (mysql)
Apache Ranger tips and tid bits
Remotely debug hadoop
Good looking .hiverc file

Can’t connect Excel to Hive using ODBC driver on Mac

So you have done everything right and still can’t connect Excel to Hive using the ODBC driver on macOS? Let’s see what is going on. Are you running El Capitan or Sierra? Well, I was running Sierra and tried connecting before while...

Connecting SQuirreL SQL to Hive

Prerequisites
To connect the SQuirreL SQL client to Hive, we need the following:
Client – http://squirrel-sql.sourceforge.net/
Hive connection JARs (found in lib directories)
Hive JDBC JAR – hive-jdbc-1.2.1-standalone.jar
Hadoop common JAR (for ) – hadoop-common-2.7.2.jar
Running HiveServer2 instance

For connections use the following...
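The connection details are cut off in this excerpt. As a rough sketch, HiveServer2 JDBC URLs generally follow the pattern jdbc:hive2://host:port/database, and the driver class shipped in the Hive JDBC JAR is org.apache.hive.jdbc.HiveDriver. The helper name hive_jdbc_url, the example host, and the default port 10000 are assumptions for illustration, not values taken from the post.

```python
# Driver class to register in the SQL client for the Hive JDBC JAR.
HIVE_DRIVER_CLASS = "org.apache.hive.jdbc.HiveDriver"


def hive_jdbc_url(host: str, port: int = 10000, database: str = "default") -> str:
    """Build a HiveServer2 JDBC URL (hypothetical helper).

    The port and database defaults are assumptions; use your cluster's values.
    """
    return f"jdbc:hive2://{host}:{port}/{database}"


# Example host name is made up for illustration.
print(HIVE_DRIVER_CLASS)
print(hive_jdbc_url("hiveserver.example.com"))
```

The resulting URL and driver class are what a JDBC client typically asks for when a new driver and alias are defined.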

Creating Hive tables on compressed files

Stuck with creating Hive tables on compressed files? Well, the documentation on apache.org suggests that Hive natively supports compressed files – https://cwiki.apache.org/confluence/display/Hive/CompressedStorage Let’s try that out. Store a snappy compressed file on HDFS. … thinking, I do not have such a file… Wait!...

Query escaped JSON string in Hive

There are times when we want to parse a string that is actually JSON. Usually that can be done with Hive’s built-in functions such as get_json_object(). In my experience, though, get_json_object() cannot parse a JSON array. These array...
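The post's Hive approach is truncated here, so the following is only a Python illustration of the underlying problem, not the author's method. It assumes "escaped JSON string" means a JSON document stored as a quoted string with its inner quotes escaped; the sample value is made up.

```python
import json

# Hypothetical column value: a JSON document stored as an escaped string.
raw_column_value = '"{\\"id\\": 7, \\"tags\\": [\\"a\\", \\"b\\"]}"'

# First pass: decode the outer string, which removes the escaping
# and yields the inner JSON text.
inner_text = json.loads(raw_column_value)

# Second pass: parse the inner JSON text into a Python structure,
# including the nested array.
record = json.loads(inner_text)

print(inner_text)
print(record["tags"])
```

The two passes show why a single JSON lookup on the raw value does not reach the nested fields: the first decode only returns a string.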

Using JSON SerDe in Hive

Using JsonSerDe in Hive
Download JSON SerDe – https://github.com/rcongiu/Hive-JSON-Serde
Compile command for Hive 1.2.1 – "mvn -Pcdh5 -Dcdh5.hive.version=1.2.1 clean package". Change the Hive version to match your environment.
Copy json-serde/target/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar (or similar) to hive/lib
Restart Hive
Sample JSON with test HiveQLs...

HDFS disk consumption – Find what is taking hdfs space

Source: https://community.hortonworks.com/articles/16846/how-to-identify-what-is-consuming-space-in-hdfs.html

Script
#!/usr/bin/env bash
max_depth=5
largest_root_dirs=$(hdfs dfs -du -s '/*' | sort -nr | perl -ane 'print "$F[1] "')
printf "%15s %s\n" "bytes" "directory"
for ld in $largest_root_dirs; do
printf "%15.0f %s\n" $(hdfs dfs -du -s $ld| cut -d' '...
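The script is cut off in this excerpt. As a separate, minimal sketch of the same idea (rank directories by the byte counts that hdfs dfs -du reports), the Python below parses du-style text and sorts it. The function name largest_paths and the sample input are made up for illustration; the parser assumes the size is the first column and the path is the last, with no spaces in paths.

```python
def largest_paths(du_output: str, top: int = 3) -> list[tuple[int, str]]:
    """Return the `top` largest (bytes, path) pairs from du-style text.

    Assumes the first column is the size in bytes and the last column is
    the path; lines that do not fit that shape are skipped.
    """
    rows = []
    for line in du_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        rows.append((int(fields[0]), fields[-1]))
    # Largest sizes first, like `sort -nr` in the shell script.
    return sorted(rows, reverse=True)[:top]


# Made-up sample input, not real cluster output.
sample = """\
1024 /tmp
52428800 /user
2048 /apps
"""

print(f"{'bytes':>15} directory")
for size, path in largest_paths(sample):
    print(f"{size:>15} {path}")
```

On a real cluster the input text could come from running the hdfs command through subprocess and passing its standard output to the function.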

Use SSH Tunneling to access Ambari web UI, ResourceManager, JobHistory, NameNode, Oozie, and other web UIs

Source: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-linux-ambari-ssh-tunnel/
Original Author: Larry Franks

Excerpts
ssh tunnel command
ssh -C2qTnNf -D 9876 user-name@machine-name
This creates a connection that routes traffic sent to local port 9876 to the cluster over SSH. The options are:
D 9876 – The local port that will route...
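For scripting the tunnel, a small Python sketch can assemble the same command as an argument list. The helper name tunnel_command is hypothetical, and the user and host values are the placeholders from the excerpt, not real machines.

```python
import shlex


def tunnel_command(user: str, host: str, local_port: int = 9876) -> list[str]:
    """Build the ssh argument list for a dynamic (SOCKS) tunnel.

    Hypothetical helper: it mirrors the flags quoted in the excerpt and
    does not open the connection itself.
    """
    return ["ssh", "-C2qTnNf", "-D", str(local_port), f"{user}@{host}"]


# Placeholder user and host names, as in the excerpt.
cmd = tunnel_command("user-name", "machine-name")
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would start the tunnel; a browser configured to use a SOCKS proxy on localhost at the chosen port can then reach the cluster's web UIs.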