Category: Bigdata

Troubleshooting Hadoop services

Hive: look up what killed HiveServer2
$ grep --color=always -nrE -B 1 'Exception|Service:HiveServer2 is started|java.lang.OutOfMemoryError' /var/log/hive/hiveserver2.log | less -RN
The above command searches the HiveServer2 log for exceptions, OutOfMemoryErrors and startup messages, and prints one line of context above each match....
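If grep isn't handy, or you want to post-process the hits, the same search can be sketched in Python. This is a minimal sketch, not part of the original post: `scan_log` is a hypothetical helper that keeps each matching line plus one line of leading context (like `grep -B 1`), without grep's `--` separators between groups.

```python
import re
from typing import Iterable, List, Tuple

# Same three alternatives as the grep -E pattern above (dots escaped here).
PATTERN = re.compile(
    r"Exception|Service:HiveServer2 is started|java\.lang\.OutOfMemoryError"
)

def scan_log(lines: Iterable[str], before: int = 1) -> List[Tuple[int, str]]:
    """Return (1-based line number, text) for every match plus `before` lines above it."""
    lines = list(lines)
    keep = set()
    for i, line in enumerate(lines):
        if PATTERN.search(line):
            # Keep the matching line and up to `before` preceding lines.
            keep.update(range(max(0, i - before), i + 1))
    return [(i + 1, lines[i].rstrip("\n")) for i in sorted(keep)]
```

Usage: `with open("/var/log/hive/hiveserver2.log", errors="replace") as fh: hits = scan_log(fh)`, then print or filter `hits` as needed.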

Nested collections in Hive

1, 2 & 3... Let's go!
1. SHELL
echo "1345653,110909316904:1341894546|221065796761:1341887508" > /tmp/20170317_array_inputfile.txt
hdfs dfs -mkdir -p /tmp/20170317/array_test/input
hdfs dfs -put /tmp/20170317_array_inputfile.txt /tmp/20170317/array_test/input
rm /tmp/20170317_array_inputfile.txt
2. HIVE
drop table SAMPLE;
CREATE external TABLE SAMPLE( id BIGINT, record array<struct<col1:string,col2:string>> )row format...
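The sample line packs an id and an array of structs into a single text record. The Python sketch below models how that line splits into the table's columns. It assumes ',' between fields, '|' between array elements and ':' between the two struct fields; these delimiters are inferred from the sample data, since the post's ROW FORMAT clause is cut off. In Hive DDL that would typically correspond to FIELDS TERMINATED BY ',', COLLECTION ITEMS TERMINATED BY '|' and MAP KEYS TERMINATED BY ':' (the next nesting level, which Hive's delimited text format also uses for struct fields inside an array), but treat that mapping as an assumption too.

```python
from typing import List, Tuple

# Assumed delimiters, inferred from the sample line (the post's ROW FORMAT is truncated).
FIELD_SEP = ","   # between id and record
ITEM_SEP = "|"    # between array elements
STRUCT_SEP = ":"  # between col1 and col2 inside one struct

def parse_record(line: str) -> Tuple[int, List[Tuple[str, str]]]:
    """Split one line into (id BIGINT, record array<struct<col1:string,col2:string>>)."""
    id_part, record_part = line.rstrip("\n").split(FIELD_SEP, 1)
    record = []
    for item in record_part.split(ITEM_SEP):
        col1, col2 = item.split(STRUCT_SEP, 1)
        record.append((col1, col2))
    return int(id_part), record
```

Once the table reads the file correctly, individual elements can be addressed in HiveQL as, for example, `record[0].col1`.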

Hive Vertex failure

Vertex failure while running Hive queries? Let's see what can be done... Not sure? Change hive.fetch.task.conversion=more; to hive.fetch.task.conversion=none;
Was the data on HDFS stored in ORC files, and was the error similar to the one below?
Vertex failed, vertexName= at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java: )
Try changing...
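Before applying either change, it can help to triage which kind of vertex failure you are looking at. A minimal, heuristic Python sketch: the function name and return labels are made up for illustration, and it only checks an error message for the two markers quoted above, the `Vertex failed, vertexName=` prefix and ORC split generation in `OrcInputFormat.generateSplitsInfo`.

```python
import re

# Markers taken from the error quoted above; anything else is "other".
VERTEX_FAILED = re.compile(r"Vertex failed, vertexName=")
ORC_SPLITS = re.compile(
    r"org\.apache\.hadoop\.hive\.ql\.io\.orc\.OrcInputFormat\.generateSplitsInfo"
)

def classify_vertex_failure(error_text: str) -> str:
    """Rough triage of a Hive-on-Tez error message (heuristic, not exhaustive)."""
    if not VERTEX_FAILED.search(error_text):
        return "not a vertex failure"
    if ORC_SPLITS.search(error_text):
        return "orc split generation"
    return "other vertex failure"
```

For an ORC-split hit, the ORC-specific advice applies; for other vertex failures, start with the hive.fetch.task.conversion change.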

Fixing large MySQL ibdata1 resulting from Ranger audits

Table Partitioning in MySQL (version 5.1.6 or above)
Note: Before starting the backup/restore, stop all running applications that use the XA_ACCESS_AUDIT table. This helps keep a consistent snapshot of XA_ACCESS_AUDIT as of a particular timestamp.
Table Partitioning in MySQL: Partitioned tables created...
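The post's partitioning steps are cut off, so as a hedged illustration only: a common layout for an append-only audit table is one RANGE partition per month on TO_DAYS() of a date column, which lets old months be dropped cheaply. The Python sketch below just generates that DDL text. The column name is a hypothetical placeholder (check the actual XA_ACCESS_AUDIT schema), and note that MySQL requires the partitioning column to be part of every unique key on the table, including the primary key, so the keys may need changing before this ALTER succeeds.

```python
from datetime import date
from typing import List

def _next_month(d: date) -> date:
    """First day of the month after d (d is a first-of-month date)."""
    return date(d.year + d.month // 12, d.month % 12 + 1, 1)

def monthly_partitions(start: date, months: int) -> List[str]:
    """One RANGE partition clause per month starting at start's month, plus a catch-all."""
    clauses = []
    current = date(start.year, start.month, 1)
    for _ in range(months):
        nxt = _next_month(current)
        # Rows dated before the first day of the next month land in this partition.
        clauses.append(
            f"PARTITION p{current:%Y%m} VALUES LESS THAN (TO_DAYS('{nxt:%Y-%m-%d}'))"
        )
        current = nxt
    clauses.append("PARTITION pmax VALUES LESS THAN MAXVALUE")
    return clauses

def partition_ddl(table: str, column: str, start: date, months: int) -> str:
    """ALTER TABLE statement text; `column` is a caller-chosen (hypothetical) date column."""
    body = ",\n  ".join(monthly_partitions(start, months))
    return f"ALTER TABLE {table}\nPARTITION BY RANGE (TO_DAYS({column})) (\n  {body}\n);"
```

For example, `print(partition_ddl("XA_ACCESS_AUDIT", "event_time", date(2017, 1, 1), 12))`, where `event_time` is an assumed column name; review the generated statement before running it against the Ranger database.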

Developing A Custom Apache Nifi Processor (JSON)

Source: http://www.nifi.rocks/developing-a-custom-apache-nifi-processor-json/
Feb 7, 2015 • Phillip Grenier
The list of available Apache NiFi processors is extensive, as documented in this post. There is still a need to develop your own; to pull...

Apache Ranger tips and tidbits

1. Error syncing users
Observation: org/apache/commons/httpclient/URIException in the Ranger log:
ERROR UserGroupSync [UnixUserSyncThread] - Failed to synchronize UserGroup information. Error details:
java.lang.NoClassDefFoundError: org/apache/commons/httpclient/URIException
    at org.apache.ranger.unixusersync.process.PolicyMgrUserGroupBuilder.delXUserGroupInfo(PolicyMgrUserGroupBuilder.java:615)
    at org.apache.ranger.unixusersync.process.PolicyMgrUserGroupBuilder.delXUserGroupInfo(PolicyMgrUserGroupBuilder.java:600)
    at org.apache.ranger.unixusersync.process.PolicyMgrUserGroupBuilder.addOrUpdateUser(PolicyMgrUserGroupBuilder.java:326)
    at org.apache.ranger.unixusersync.process.FileSourceUserGroupBuilder.updateSink(FileSourceUserGroupBuilder.java:97)
    at org.apache.ranger.usergroupsync.UserGroupSync.syncUserGroup(UserGroupSync.java:113)
    at org.apache.ranger.usergroupsync.UserGroupSync.run(UserGroupSync.java:87)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.httpclient.URIException
    at...
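A NoClassDefFoundError caused by a ClassNotFoundException means the JVM running usersync could not find org.apache.commons.httpclient.URIException on its classpath; that package belongs to the Apache Commons HttpClient 3.x library. The actual fix is cut off above, so as a hedged aid only: this Python sketch (a hypothetical helper, not a Ranger tool) lists which JARs under a directory you choose contain the class, which can help locate a copy on the node and check whether the usersync library directory has one. No Ranger install path is assumed; pass whatever directory you want to search.

```python
import zipfile
from pathlib import Path
from typing import List

MISSING_CLASS = "org.apache.commons.httpclient.URIException"  # from the stack trace above

def class_entry(class_name: str) -> str:
    """Convert a binary class name to its entry path inside a JAR."""
    return class_name.replace(".", "/") + ".class"

def jars_containing(class_name: str, search_dir: str) -> List[Path]:
    """Return every *.jar under search_dir (recursively) that contains class_name."""
    entry = class_entry(class_name)
    hits = []
    for jar in sorted(Path(search_dir).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar)
        except zipfile.BadZipFile:
            continue  # skip corrupt or non-zip files that happen to end in .jar
    return hits
```

An empty result for the usersync lib directory, alongside a hit elsewhere on the node, points at a classpath problem rather than a missing package.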

Hadoop Hive UDTF Tutorial – Extending Apache Hive with Table Functions

Source: http://beekeeperdata.com/posts/hadoop/2015/07/26/Hive-UDTF-Tutorial.html
Author: Matthew Rathbone
Co-author: Elena Akhmatova
While working with both primitive types and embedded data structures was discussed in part one, the UDF interfaces are limited to...

How to create a Hive UDF in Scala

Source: https://community.hortonworks.com/articles/42695/how-to-create-a-hive-udf-in-scala.html
This article will focus on creating a custom Hive UDF in the Scala programming language. IntelliJ IDEA 2016 was used to create the project and artifacts. Creation and testing of the UDF was performed on the Hortonworks...

Permanently add JARs to Hadoop

Looking to add a custom SerDe and custom or third-party codecs to Hortonworks HDP? Only the auxlib folder trick worked for me, after trying a lot of alternatives. The places where we need to add the auxlib folder containing the JARs are,...
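Because the auxlib JARs must be present in every location that needs them, a quick per-node consistency check can save a restart cycle. The post's list of locations is cut off, so this Python sketch takes the directories as input rather than guessing them; `missing_jars` and the JAR names in the usage note are hypothetical.

```python
from pathlib import Path
from typing import Dict, Iterable, List

def missing_jars(jar_names: Iterable[str], auxlib_dirs: Iterable[str]) -> Dict[str, List[str]]:
    """Map each auxlib directory to the JARs it lacks; an empty dict means all are present."""
    wanted = sorted(set(jar_names))
    report = {}
    for d in auxlib_dirs:
        path = Path(d)
        # A missing directory counts as missing every JAR.
        present = {p.name for p in path.glob("*.jar")} if path.is_dir() else set()
        missing = [j for j in wanted if j not in present]
        if missing:
            report[d] = missing
    return report
```

For example, `missing_jars(["my-serde.jar", "my-codec.jar"], ["/path/to/auxlib"])`, run on each node after copying the JARs.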

Best practices for Namenode and Datanode restarts

Problems
Following are some problems we might come across while working with large Hadoop cluster setups:
1. NameNode restarts taking a long time (http://nn-host:50070/dfshealth.html#tab-startup-progress)
2. NameNode staying in safe mode for a long time after a restart
Best practices for Namenode &...
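The startup-progress page above is the interactive view. For a scripted restart runbook, one option (not from the post) is to parse `hdfs dfsadmin -safemode get`, which prints `Safe mode is ON` or `Safe mode is OFF` (once per NameNode in an HA pair). A minimal Python sketch under that assumption; `read_safemode` needs an HDFS client on the PATH and suitable credentials.

```python
import subprocess

def safemode_on(dfsadmin_output: str) -> bool:
    """True if any NameNode in `hdfs dfsadmin -safemode get` output reports safe mode ON."""
    return any("Safe mode is ON" in line for line in dfsadmin_output.splitlines())

def read_safemode() -> bool:
    """Run the real CLI and parse its output (raises CalledProcessError on failure)."""
    out = subprocess.run(
        ["hdfs", "dfsadmin", "-safemode", "get"],
        capture_output=True, text=True, check=True,
    ).stdout
    return safemode_on(out)
```

If you only need to block until safe mode ends, `hdfs dfsadmin -safemode wait` does that directly.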