Hive query tips

 

Debugging Hive

 

Date operations

When date is stored is YYYY-MM-DD it makes date operations very easy. For example the date comparisons can use string comparison!.

Date comparison

where to_date(your_date_field) >= '2016-07-01' AND to_date(your_date_field) <= '2016-08-31'

Last X days

Use something similar like below in where clause

where to_date(your_date_field) >= date_sub(current_date, 7)

 

Data operations

Changing delimiter upon export

INSERT OVERWRITE DIRECTORY '/user/hadoop/output'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY "sq|"
SELECT * FROM graph_edges;

 

Headers in Beeline

To enable headers > !set showHeader false

To set header width > !set headerInterval 50

 

Unlock hive table

set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager; 
unlock table tablename;

 

Check partitions used in hive query

> explain dependency <query that runs on a table with partitions>

 

Long queries submitted to Hive

Search in hiveserver2.log files in bash using awk

$ awk -F": " '/Starting command\(/  {print length ($3) "\t" $2}' hiveserver2.log | sort -nr | head -5

 

Occurence of thread printing in hiveserver2.log file

$ awk 'match($0,"(Thread-[0-9]+)",a) {print a[1]}' hiveserver2.log | sort | uniq -c | sort -n | less

 

Capture classes used in hiveserver2 log files

$ awk '/^20/ && $4 ~ /^org/ {print $1,$2,$3,$4 }'

 

Fetch Hive queries executed from hiveserver2.log file

$ awk '/Starting command\(/{print}' hiveserver2.log

 

Hive query commands by count, by user, by type

$ awk '/org.apache.hadoop.hive.metastore.HiveMetaStore.audit/ &&  match($0,/^([^ ]+) ([0-9][0-9]).* ugi=([a-z0-9]+).*cmd=([^:]+).*$/,a) { print a[1],a[2], a[3], a[4] }' hiveserver2.log  | head -n 1000 | sort | uniq -c

 

Errors and warning counts by type

$ awk '/ERROR|WARN/ &&  match($0,/^([^ ]+) ([0-9][0-9])[^ ]+ ([A-Z]+)/,a) {print a[3],a[1], a[2]}' hiveserver2.log | sort | uniq -c

 

 

More tricks

  1. http://www.wmanalytics.io/blog/four-useful-tricks-working-hive
  2. Set number of reducers – http://stackoverflow.com/questions/8762064/hive-unable-to-manually-set-number-of-reducers
  3. Performance tuning Hive on Tez – https://community.hortonworks.com/articles/22419/hive-on-tez-performance-tuning-determining-reducer.html

 

 

You may also like...

Leave a Reply

Your email address will not be published. Required fields are marked *