distcp from Hadoop cluster to AWS S3 bucket
This article may help you copy data between an on-premises Hadoop cluster and an AWS S3 bucket.
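For a sense of what such a copy looks like, here is a minimal sketch using DistCp over the S3A connector; the bucket name, paths, and access keys are placeholders, and in practice credentials are better kept in core-site.xml or supplied through an instance profile rather than on the command line.

# Copy an HDFS directory to S3 with DistCp over s3a://.
# Bucket, paths, and keys below are hypothetical placeholders.
hadoop distcp \
  -Dfs.s3a.access.key=YOUR_ACCESS_KEY \
  -Dfs.s3a.secret.key=YOUR_SECRET_KEY \
  -update \
  hdfs:///data/mydir \
  s3a://my-bucket/mydir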
Data import using Sqoop Teradata example $ sqoop import --connect jdbc:teradata://server_name/DATABASE=db1,LOGMECH=LDAP,CHARSET=UTF8 --driver "com.teradata.jdbc.TeraDriver" --username <user1> --password <password1> --query "select a.*, b.* from a inner join b on a.a_id=b.a_id where \$CONDITIONS AND a.f1 in ('A','B','C') group by 1" --null-string '\\N' --null-non-string...
At times some directories on HDFS have too many inodes (files and folders), and they become really hard to delete. Some instances also lead to out-of-memory (OOM) errors such as the following: INFO retry.RetryInvocationHandler: java.io.IOException: com.google.protobuf.ServiceException: java.lang.OutOfMemoryError:...
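One commonly suggested workaround (not necessarily the exact fix in the full post) is to give the HDFS client JVM a larger heap before issuing the delete; the heap size and path below are illustrative.

# Raise the client heap so listing/deleting a huge tree does not OOM.
# Heap size and path are hypothetical placeholders.
export HADOOP_CLIENT_OPTS="-Xmx8g"
hdfs dfs -rm -r -skipTrash /path/with/many/inodes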
Querying the Hive metastore tables can provide more in-depth details on the tables sitting in Hive. This article is a collection of queries that probe a Hive metastore configured with MySQL to get details like the list of transactional tables, etc. More...
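As an illustration of the kind of query involved, the following lists transactional tables from a MySQL-backed metastore; the metastore database name and credentials are assumptions, while TBLS, DBS, and TABLE_PARAMS are standard metastore schema tables.

# List transactional tables straight from the metastore database.
# Database name "hive" and the credentials are hypothetical.
mysql -u hiveuser -p hive -e "
  SELECT d.NAME AS db_name, t.TBL_NAME
  FROM TBLS t
  JOIN DBS d ON t.DB_ID = d.DB_ID
  JOIN TABLE_PARAMS p ON t.TBL_ID = p.TBL_ID
  WHERE p.PARAM_KEY = 'transactional' AND p.PARAM_VALUE = 'true';"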
Got into a situation where you were asked to extract Hive queries and the time they took to execute? Steps: on the log files, run the below 2 extracts. awk 'match($0, "^([^ ]+).*Completed executing command\\(queryId=([0-9a-z_-]+)\\); Time taken: (.*)", a) {print "COMPLETE\t" a[1]...
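For context, a runnable version of the first extract could look like the following; GNU awk is needed for the three-argument match(), and the log file name is a placeholder.

# Pull completed query IDs and execution times from a HiveServer2 log.
# Requires gawk for match(str, regex, arr); the file name is hypothetical.
gawk 'match($0, "Completed executing command\\(queryId=([0-9a-z_-]+)\\); Time taken: (.*)", a) {
  print "COMPLETE\t" a[1] "\t" a[2]
}' hiveserver2.log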
Lookup yarn queue of a user from bash This bash function will look up the capacity scheduler XML and return queues for the user. getYarnQueue() { grep $1 -B 1 /etc/hadoop/conf/capacity-scheduler.xml | awk -F'.' '/name/{print $(NF-1)}' } Works on Hortonworks HDP. Usage:...
Okay, so msck repair is not working and you saw something like the below: 0: jdbc:hive2://hive_server:10000> msck repair table mytable; Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1) It seems to appear because of higher...
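The full post goes into the cause; one mitigation frequently suggested for this failure on tables with very many partitions is to let MSCK add partitions in batches, roughly as below. The batch size is illustrative, and hive.msck.repair.batch.size is only honored on Hive releases that ship that property.

# Run MSCK in batches instead of one huge metastore call.
# JDBC URL, table name, and batch size are hypothetical placeholders.
beeline -u "jdbc:hive2://hive_server:10000" -e "
  SET hive.msck.repair.batch.size=500;
  MSCK REPAIR TABLE mytable;"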
Source: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_administration/content/distcp_between_ha_clusters.html DistCp Between HA Clusters To copy data between HA clusters, use the dfs.internal.nameservices property in the hdfs-site.xml file to explicitly specify the name services belonging to the local cluster, while continuing to use the dfs.nameservices property to specify all of the name services in...
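Once both name services are declared this way, DistCp can address each cluster by its logical nameservice ID; in the sketch below the nameservice names and paths are placeholders.

# Copy between two HA clusters using logical nameservice IDs,
# assuming hdfs-site.xml defines both as described above.
hadoop distcp hdfs://ns-local/apps/data hdfs://ns-remote/apps/data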
Bash function to export Hive table data to a local CSV file Usage: hive_export_csv <db.table> <output.csv> [queue] Recommendation: Add to .bash_profile hive_export_csv () { if [ -z "$2" ]; then echo "Bad arguments. Usage: ${FUNCNAME[0]} <db.table> <output.csv> [queue]" else uuid=$(uuidgen)...
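The excerpt above is truncated; as a rough sketch of how such a function can work (the JDBC URL, queue property, and function name are assumptions rather than the full post's implementation):

# Minimal sketch: run the query through beeline and write CSV locally.
# JDBC URL, queue property, and names are hypothetical placeholders.
hive_export_csv_sketch () {
  local table="$1" out="$2" queue="${3:-default}"
  beeline -u "jdbc:hive2://hive_server:10000" \
    --hiveconf tez.queue.name="$queue" \
    --outputformat=csv2 --showHeader=true --silent=true \
    -e "SELECT * FROM ${table};" > "$out"
}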
Source: https://community.hortonworks.com/questions/112754/insert-overwrite-directory-beeline.html Error Error: Error while compiling statement: FAILED: HiveAccessControlException Permission denied: user [user] does not have [WRITE] privilege on [/tmp/*] (state=42000,code=40000) The above error appears even though you've set up Ranger policies and HDFS policies. You've checked everything and...
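For context, the error above typically shows up on a statement of this shape run through beeline; the output directory and table name are placeholders.

# The kind of statement that triggers the WRITE-privilege error above.
# Export directory and table name are hypothetical placeholders.
beeline -u "jdbc:hive2://hive_server:10000" -e "
  INSERT OVERWRITE DIRECTORY '/tmp/my_export'
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  SELECT * FROM db1.mytable;"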