Category: Hadoop

Understanding HDFS Quotas and Hadoop Fs and Fsck Tools

Source: http://www.michael-noll.com/blog/2011/10/20/understanding-hdfs-quotas-and-hadoop-fs-and-fsck-tools/ References: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsQuotaAdminGuide.html In my experience, Hadoop users often confuse the file size numbers reported by commands such as hadoop fsck, hadoop fs -dus, and hadoop fs -count -q when reasoning about HDFS space quotas. Here is...
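
One common source of that confusion is that, per the HDFS quota guide referenced above, every replica of a block counts against a space quota, so the quota charge for a file is larger than its plain length. The following is a minimal arithmetic sketch of that rule, not code from the linked post; the function name, the 1 GiB file size, and the replication factor of 3 are assumptions chosen for illustration.

```python
def space_quota_charge(file_size_bytes, replication):
    """Bytes charged against an HDFS space quota.

    Assumes the rule from the HDFS quota guide that each replica
    of a block counts against the quota.
    """
    return file_size_bytes * replication


# Hypothetical example: a 1 GiB file stored with replication factor 3.
file_size = 1 * 1024**3
charged = space_quota_charge(file_size, 3)

print(f"file length:  {file_size} bytes")
print(f"quota charge: {charged} bytes")  # 3221225472 bytes, i.e. 3 GiB
```

A directory with a 2 GiB space quota would therefore reject this file even though its length is only 1 GiB.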

How to identify what is consuming space in HDFS

Source: https://community.hortonworks.com/articles/16846/how-to-identify-what-is-consuming-space-in-hdfs.html Find the directories using the most space in HDFS: for a UI showing the biggest consumers of space in HDFS, install and configure Twitter’s HDFS-DU. For a quick visual representation of HDFS disk usage with no extra tools required,...

Setup local Hadoop dev environment on macOS

It is always convenient to have a local environment for learning and for quickly testing a scenario. If you work on macOS and want to learn or set up Hadoop locally, you are in the right place. I...

Hive statistics using beeline and expect script

The following expect script uses the beeline interface to fetch statistics for the tables within a database. Replace username and queuename with the values for your environment. #!/usr/bin/expect -f # hive_statistics, v0.1, 2016-05, [email protected] # Usage: ./hive_statistics [database_name] set _database [lindex $argv 0] if {...

Hortonworks Data Platform Installation errata – Missing manual

Pre-requisites; Creating service users and databases in MySQL; JDBC connector error during ambari-server setup with MySQL; MySQL connection failing during ambari automated installation. Some useful bash scripts: Create database and user on mysql for services like ambari, oozie, hue,...

Configure log files on HDP platform

Kafka, Storm, Ranger, HDFS, Zookeeper, Oozie, Knox, Hive & Hive metastore. 1. Kafka: Kafka currently uses org.apache.log4j.DailyRollingFileAppender, which does not allow us to specify the max backup index or the max file size. By default, it rolls every hour, creating 24...

Insightful Hadoop administration commands

Tip #1: Quick list of operations. Where? Example: /var/log/hadoop/hdfs cat hdfs-audit.log | awk '{cmds[$9]++}END{for (i in cmds)printf "%s %d\n",i,cmds[i]}' Results: [user@server hdfs]$ cat hdfs-audit.log | awk '{cmds[$9]++}END{for (i in cmds)printf "%s %d\n",i,cmds[i]}' cmd=setTimes 52 cmd=listStatus 47422 cmd=create 36932 cmd=getfileinfo 7431182...
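
The awk one-liner above tallies the ninth whitespace-separated field of each audit line, which is where the cmd=... token sits in that log layout. The same tally can be sketched in Python as follows. This is an illustrative equivalent rather than part of the original tip: the function name and the sample lines are hypothetical, and the sketch finds the token by its cmd= prefix instead of by field position.

```python
from collections import Counter


def count_audit_commands(lines):
    """Tally the cmd=... token of each HDFS audit log line."""
    counts = Counter()
    for line in lines:
        for token in line.split():
            if token.startswith("cmd="):
                counts[token] += 1
                break  # at most one command is counted per line
    return counts


# Hypothetical sample lines shaped like hdfs-audit.log entries.
sample = [
    "2016-05-01 10:00:00,000 INFO FSNamesystem.audit: allowed=true ugi=alice "
    "(auth:SIMPLE) ip=/10.0.0.1 cmd=getfileinfo src=/tmp/a dst=null perm=null",
    "2016-05-01 10:00:01,000 INFO FSNamesystem.audit: allowed=true ugi=bob "
    "(auth:SIMPLE) ip=/10.0.0.2 cmd=listStatus src=/tmp dst=null perm=null",
    "2016-05-01 10:00:02,000 INFO FSNamesystem.audit: allowed=true ugi=alice "
    "(auth:SIMPLE) ip=/10.0.0.1 cmd=getfileinfo src=/tmp/b dst=null perm=null",
]

# Print the operations, most frequent first.
for cmd, n in count_audit_commands(sample).most_common():
    print(cmd, n)
```

To run it against a real log, pass an open file object instead of the sample list, for example `count_audit_commands(open("hdfs-audit.log"))`.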

Lookup YARN Acls capacity scheduler queue users from /etc/passwd

The following is an awk script that I use in TextWrangler as a Text Filter. This script generates the awk and grep commands required to look up the /etc/passwd file. #!/bin/sh # gawk '{match($0,"([a-zA-Z]+).acl_submit_applications=(.*)",a); if(a[1] != "") print a[1] "\t" a[2] }' #...

A Secure HDFS Client Example

Source: http://henning.kropponline.de/2016/02/14/a-secure-hdfs-client-example/ It takes about 3 lines of Java code to write a simple HDFS client that can further be used to upload, read or list files. Here is an example: Configuration conf = new Configuration(); conf.set("fs.defaultFS","hdfs://one.hdp:8020"); FileSystem fs = FileSystem.get(conf);...

Clean UNinstall Hortonworks HDP 2.2

Source: https://pulsatingtaurus.wordpress.com/2015/02/15/clean-uninstall-hortonworks-hdp-2-2/ I love Hadoop, and Hortonworks is one of my favored Hadoop distributions. However, while experimenting with the Hadoop installation, I often needed to start afresh on a set of physical as well as virtual Hadoop clusters. Hortonworks provide...