Hive ORC files – Pro Tips

Extract text from ORC files (source)

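Hive (0.11 and up) ships with an ORC file dump utility. By default it prints the file's metadata (schema, row count, compression, stripe statistics); in Hive 1.1.0 and later, adding the -d option dumps the row data instead. The utility can be invoked with the following command: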

$ hive --orcfiledump <location-of-orc-file>

Create Hive table definition using ORC files on HDFS

$ hive --orcfiledump hdfs:///data/location/of/the/ORC/file.orc  2>/dev/null | head -n 6 | grep struct | sed -e 's/\([^0-9]\),/\1,\n/g' |sed  's/:/ /g' |sed  's/Type  struct</\n---------------\ncreate external table (\n/' | sed 's/>/)\nstored as orc\nlocation '\'\''\n---------------\n/'
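The one-liner above is easier to follow when broken into named steps. Here is a minimal sketch of the same idea as a shell function. It assumes GNU sed (for `\n` in the replacement text), a schema line of the form `Type: struct<name:type,...>`, and a flat schema with no nested struct or map columns. The function name `struct_to_ddl`, the default table name `my_table`, and the sample schema are made up for illustration.

```shell
# Hypothetical helper: reads ORC metadata dump output on stdin and prints a
# CREATE EXTERNAL TABLE skeleton. Assumes GNU sed and a flat schema.
struct_to_ddl() {
  # Keep only the first schema line, e.g. "Type: struct<id:int,name:string>"
  grep -m 1 '^Type: struct<' |
    sed -e "s/^Type: struct</create external table ${1:-my_table} (\n/" \
        -e 's/>$/\n)\nstored as orc\nlocation '"''"';/' \
        -e 's/\([^0-9]\),/\1,\n/g' \
        -e 's/:/ /g'
  # 1st expression: swap the "Type: struct<" prefix for the DDL header
  # 2nd expression: swap the closing ">" for the DDL footer (location left empty)
  # 3rd expression: one column per line; commas after a digit, as in
  #                 decimal(10,2), are left alone
  # 4th expression: "name:type" becomes "name type"
}

# Usage with a made-up schema line instead of a real dump
printf 'Rows: 3\nType: struct<id:int,name:string,price:decimal(10,2)>\n' |
  struct_to_ddl items
```

With a real file, the input would come from `hive --orcfiledump <location-of-orc-file> 2>/dev/null | struct_to_ddl my_table`. The location is left as an empty string to be filled in by hand, just like in the one-liner.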

Viewing ORC files without Hive

The Apache ORC project provides a set of Java-based command-line tools (orc-tools) for inspecting ORC files. You can install them with Homebrew.

$ brew install orc-tools

# to get the metadata: column names, types and file statistics
$ orc-tools meta local_orc_file.orc

# to get the actual data stored in the ORC file, in JSON format
$ orc-tools data local_orc_file.orc 2> /dev/null

Get Hive table definition using ORC files on local filesystem

$ orc-tools meta rtd_bundles_reporting_003_20190215151003 | head -n 6 | grep struct | sed -e 's/\([^0-9]\),/\1,\n/g' |sed  's/:/ /g' |sed  's/Type  struct</create table ( /' | sed 's/>/)/' | less
