Tagged: bigdata

Hadoop streaming with Python

Want to write a Hadoop program in less than 5 minutes? Get in here for a quick check on how it’s done. We use Python and Hadoop streaming to complete the task.
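
As a rough idea of what such a program can look like, here is a minimal word-count sketch for Hadoop streaming. It is not taken from the linked post: the function names and the "map"/"reduce" command-line argument are hypothetical conventions chosen for illustration. Streaming feeds input lines on stdin and expects tab-separated key/value records on stdout.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts per word.

    Relies on the input being sorted by key, which Hadoop streaming
    does between the map and reduce stages.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical convention: first argument selects the stage.
    stage = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in stage(sys.stdin):
        print(record)
```

Saved as, say, wc.py, the same file could be passed to the streaming jar as both -mapper ("python wc.py map") and -reducer ("python wc.py reduce"); the jar location depends on the distribution.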

Cleanup hdfs directory having too many files and directories

At times a directory on HDFS holds so many inodes (files and folders) that it is really hard to delete. Some cases also lead to out-of-memory (OOM) errors such as the following: INFO retry.RetryInvocationHandler: java.io.IOException: com.google.protobuf.ServiceException: java.lang.OutOfMemoryError:...

Querying Hive Metastore

Querying the Hive metastore tables can provide more in-depth details on the tables sitting in Hive. This article is a collection of queries that probe a Hive metastore configured with MySQL to get details such as the list of transactional tables, etc. More...
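
To illustrate the kind of query meant here, the sketch below lists transactional tables by joining TBLS, DBS and TABLE_PARAMS. It is an assumption-laden stand-in, not a query from the article: an in-memory SQLite database mocks only the columns the query needs, the sample rows are invented, and the real metastore schema can differ between Hive versions.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL metastore database (assumption);
# only the three metastore tables touched by the query are mocked.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DBS (DB_ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, DB_ID INTEGER, TBL_NAME TEXT);
CREATE TABLE TABLE_PARAMS (TBL_ID INTEGER, PARAM_KEY TEXT, PARAM_VALUE TEXT);
INSERT INTO DBS VALUES (1, 'sales');
INSERT INTO TBLS VALUES (10, 1, 'orders');
INSERT INTO TBLS VALUES (11, 1, 'customers');
INSERT INTO TABLE_PARAMS VALUES (10, 'transactional', 'true');
INSERT INTO TABLE_PARAMS VALUES (11, 'numFiles', '4');
""")

# Tables whose 'transactional' parameter is set to true.
TRANSACTIONAL_TABLES = """
SELECT d.NAME, t.TBL_NAME
FROM TBLS t
JOIN DBS d ON d.DB_ID = t.DB_ID
JOIN TABLE_PARAMS p ON p.TBL_ID = t.TBL_ID
WHERE p.PARAM_KEY = 'transactional' AND LOWER(p.PARAM_VALUE) = 'true'
ORDER BY d.NAME, t.TBL_NAME
"""

rows = conn.execute(TRANSACTIONAL_TABLES).fetchall()
print(rows)
```

Against a real metastore, the same SELECT would be run through a MySQL client instead of SQLite.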

Hadoop like a PRO

Look up the YARN queue of a user from bash. This bash function searches the capacity scheduler XML and returns the queues for the user:
    getYarnQueue() {
        grep "$1" -B 1 /etc/hadoop/conf/capacity-scheduler.xml | awk -F'.' '/name/{print $(NF-1)}'
    }
Works on Hortonworks HDP. Usage:...
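
For comparison, the same lookup can be sketched in Python with an XML parser instead of grep. This is an illustration under assumptions: the sample XML, queue names and users are made up, and unlike the bash version (which matches the user string anywhere in the file) it only inspects acl_submit_applications properties and matches whole user names.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the shape of capacity-scheduler.xml.
SAMPLE = """<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.analytics.acl_submit_applications</name>
    <value>alice,bob</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.etl.acl_submit_applications</name>
    <value>carol</value>
  </property>
</configuration>"""

ACL_SUFFIX = ".acl_submit_applications"


def yarn_queues_for(user, xml_text):
    """Return the queues whose submit ACL lists `user` explicitly."""
    queues = []
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name", default="")
        value = prop.findtext("value", default="")
        if not name.endswith(ACL_SUFFIX):
            continue
        # ACL values have the form "user1,user2 group1,group2".
        users = [u.strip() for u in value.split(" ")[0].split(",")]
        if user in users:
            # Second-to-last dotted segment, like awk's $(NF-1).
            queues.append(name.split(".")[-2])
    return queues


print(yarn_queues_for("alice", SAMPLE))
```

On a cluster, the text would be read from /etc/hadoop/conf/capacity-scheduler.xml rather than from the sample string.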

Hive msck repair not working

Okay, so msck repair is not working and you saw something like the following:
    0: jdbc:hive2://hive_server:10000> msck repair table mytable;
    Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1)
It seems to appear because of higher...

DistCp Between HA Clusters

Source: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_administration/content/distcp_between_ha_clusters.html
To copy data between HA clusters, use the dfs.internal.nameservices property in the hdfs-site.xml file to explicitly specify the name services belonging to the local cluster, while continuing to use the dfs.nameservices property to specify all of the name services in...

Hive export to CSV

Bash function to export Hive table data to a local CSV file.
Usage: hive_export_csv <db.table> <output.csv> [queue]
Recommendation: Add to .bash_profile
    hive_export_csv () {
        if [ -z "$2" ]; then
            echo "Bad arguments. Usage: ${FUNCNAME[0]} <db.table> <output.csv> [queue]"
        else
            uuid=$(uuidgen)...

How a newline can ruin your Hive

Source: http://marcel.is/how-newline-can-ruin-your-hive/
If you do not fully understand how Hive/Impala stores your data, it might cost you badly.
Symptom #1: Weird values in ingested Hive table. You double-checked with select distinct(gender) from customers that the gender column in your source RDBMS really contains only values male, female and NULL....

Fastest way of compressing file(s) in Hadoop

Compressing files in Hadoop. Okay, well... it may or may not be the fastest. Email me if you find a better alternative 😉 Short background: the technique uses a simple Pig script. Make Pig use the Tez engine (set the queue name...

Ambari REST Api

Ambari configuration over the REST API. You need to log in to Ambari, then access the URL below: http://ambari-host:8080/api/v1/services/AMBARI/components/AMBARI_SERVER
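
The same URL can also be requested from a script. The sketch below only builds the request and assumes HTTP Basic authentication in place of a browser login; the helper name is hypothetical, and the host name and admin/admin credentials are placeholders, not values from the post.

```python
import base64
import urllib.request


def ambari_request(host, user, password, port=8080):
    """Build a GET request for the Ambari server component endpoint."""
    url = f"http://{host}:{port}/api/v1/services/AMBARI/components/AMBARI_SERVER"
    # Assumption: credentials are sent as an HTTP Basic auth header.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


# Placeholder host and credentials.
req = ambari_request("ambari-host", "admin", "admin")
print(req.full_url)

# Sending it requires a reachable Ambari server:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```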