Hive ORC files – Pro Tips
Extract text from ORC files
Hive (0.11 and up) ships with an ORC file dump utility. It can be invoked with the following command:
$ hive --orcfiledump <location-of-orc-file>
Create hive table definition using ORC files on HDFS
$ hive --orcfiledump hdfs:///data/location/of/the/ORC/file.orc 2>/dev/null | head -n 6 | grep struct | sed -e 's/\([^0-9]\),/\1,\n/g' | sed 's/:/ /g' | sed 's/Type *struct</\n---------------\ncreate external table (\n/' | sed 's/>/)\nstored as orc\nlocation '\'\''\n---------------\n/'
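To see what each stage of that pipeline does without a Hive installation, here is a minimal sketch that runs the same kind of substitutions on a hard-coded sample line. The sample line and its column names are invented for illustration, and the `\n` in the sed replacements assumes GNU sed.

```shell
#!/bin/sh
# Hypothetical sample of the "Type:" line printed by the dump utility;
# the column names are invented for illustration.
sample='Type: struct<id:int,name:string,price:decimal(10,2)>'

# Stage 1: add a line break after every comma that does not follow a digit,
# so the comma inside decimal(10,2) is left alone. (Assumes GNU sed for \n.)
stage1=$(printf '%s\n' "$sample" | sed -e 's/\([^0-9]\),/\1,\n/g')

# Stage 2: replace every colon with a space ("name:type" becomes "name type").
# This also turns "Type: struct<" into "Type  struct<" with two spaces.
stage2=$(printf '%s\n' "$stage1" | sed 's/:/ /g')

# Stage 3: swap the struct wrapper for DDL keywords.
# "Type *struct<" tolerates any number of spaces after "Type".
ddl=$(printf '%s\n' "$stage2" |
  sed -e 's/^Type *struct</create external table (\n/' \
      -e 's/>$/)\nstored as orc/')

printf '%s\n' "$ddl"
```

The printed skeleton still needs a table name and a location before it can be run in Hive.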
Viewing ORC files without hive
The Apache ORC project provides Java tools for viewing ORC files. You can install them with brew.
$ brew install orc-tools
# to get meta information: column names, types and other file metadata
$ orc-tools meta local_orc_file.orc
# to get the actual data stored in the ORC file in JSON format
$ orc-tools data local_orc_file.orc 2> /dev/null
Get hive table definition using ORC files on local filesystem
$ orc-tools meta rtd_bundles_reporting_003_20190215151003 | head -n 6 | grep struct | sed -e 's/\([^0-9]\),/\1,\n/g' | sed 's/:/ /g' | sed 's/Type *struct</create table ( /' | sed 's/>/)/' | less
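Some non-GNU sed implementations (for example the BSD sed that ships with macOS) may not expand `\n` in the replacement text. As a hedged alternative, the sketch below does the same conversion with awk, which handles the newline portably. The function name `orc_type_to_ddl` and the table name `my_table` are placeholders, not anything produced by orc-tools. Like the sed version, it assumes a flat struct with no nested types.

```shell
#!/bin/sh
# Read dump output on stdin and print a CREATE TABLE skeleton built from
# the first "Type: struct<...>" line. my_table is a placeholder name.
orc_type_to_ddl() {
  awk '
    /^Type: *struct</ {
      sub(/^Type: *struct</, "")   # drop the "Type: struct<" prefix
      sub(/>$/, "")                # drop the closing bracket of the outer struct
      gsub(/[^0-9],/, "&\n")       # newline after commas that do not follow a digit
      gsub(/:/, " ")               # "name:type" becomes "name type"
      print "create table my_table ("
      print $0
      print ")"
      exit                         # only the first matching line is used
    }'
}
```

It can be used in place of the sed chain, for example `orc-tools meta local_orc_file.orc 2> /dev/null | orc_type_to_ddl`.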