Enterprise Hadoop solutions distributed by key Hadoop vendors

Let's start with the Cloudera Enterprise Data Hub:

Cloudera-Ehadoop

Here is the offering from Hortonworks:

HW-enterprizehadoop

And this is how MapR packages Enterprise Hadoop:

mapr-hadoop

And finally, the Pivotal Enterprise Hadoop offering:

Pivotal-hadoop

Keywords: Apache Hadoop, Cloudera, Hortonworks, Pivotal, MapR, Big Data


Open Source Distributed Analytics Engine with SQL interface and OLAP on Hadoop by eBay – Kylin

What is Kylin?

  • Kylin is an open source distributed analytics engine, contributed by eBay, with a SQL interface and multi-dimensional analysis (OLAP) support for extremely large datasets on Hadoop.

kylin

Key Features:

  • Extremely Fast OLAP Engine at Scale:
    • Kylin is designed to reduce query latency on Hadoop for 10+ billion rows of data
  • ANSI-SQL Interface on Hadoop:
    • Kylin offers ANSI-SQL on Hadoop and supports most ANSI-SQL query functions
  • Interactive Query Capability:
    • Users can interact with Hadoop data via Kylin at sub-second latency, better than Hive queries on the same dataset
  • MOLAP Cube:
    • Users can define a data model and pre-build cubes in Kylin from 10+ billion raw data records
  • Seamless Integration with BI Tools:
    • Kylin currently offers integration capability with BI Tools like Tableau.
  • Other Highlights:
    • Job Management and Monitoring
    • Compression and Encoding Support
    • Incremental Refresh of Cubes
    • Leverages HBase coprocessors to reduce query latency
    • Approximate query capability for distinct counts (HyperLogLog)
    • Easy Web interface to manage, build, monitor and query cubes
    • Security capability to set ACL at Cube/Project Level
    • Support for LDAP integration
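As a rough sketch of what the ANSI-SQL interface looks like in practice, here is the kind of aggregation query Kylin answers from its pre-built cubes. The table, column, and project names are illustrative assumptions, not taken from this post; Kylin also publishes a query REST API, but check your deployment's documentation for the exact endpoint and credentials.

```shell
# A sample ANSI-SQL aggregation of the kind Kylin serves from a
# pre-built MOLAP cube. Table and column names are hypothetical.
SQL='SELECT part_dt, SUM(price) AS total_sold
FROM kylin_sales
GROUP BY part_dt'

# Hypothetical REST call (endpoint, port, and project are assumptions):
#   curl -X POST -H "Content-Type: application/json" \
#        -d "{\"sql\": \"$SQL\", \"project\": \"learn_kylin\"}" \
#        http://localhost:7070/kylin/api/query

echo "$SQL"
```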

Keywords: Kylin, Big Data, Hadoop, Jobs, OLAP, SQL, Query

A collection of Big Data Books from Packt Publication

I found that Packt Publishing has a few great books on Big Data, and here is a collection of the ones I found most useful:

Screen Shot 2014-09-30 at 11.50.08 AM

Packt is giving its readers a chance to dive into its comprehensive catalog of over 2,000 books and videos for the next 7 days with its LevelUp program:

packt

Packt is offering all of its eBooks and videos at just $10 each or less.

The more EXP customers want to gain, the more they save:

  • Any 1 or 2 eBooks/Videos – $10 each
  • Any 3 to 5 eBooks/Videos – $8 each
  • Any 6 or more eBooks/Videos – $6 each

More Information is available at bit.ly/Yj6oWq  |  bit.ly/1yu4679

For more information, please visit www.packtpub.com/packt/offers/levelup

Finding Hadoop specific processes running in a Hadoop Cluster

Recently I was asked to provide information on all the Hadoop-specific processes running in a Hadoop cluster. I ran a few commands, shown below, to gather that information.

Hadoop 2.0.x on Linux (CentOS 6.3) – Single Node Cluster

First, list all Java processes running in the cluster:

[cloudera@localhost usr]$ ps -A | grep java
1768 ?        00:00:28 java
2197 ?        00:00:54 java
2439 ?        00:00:30 java
2507 ?        00:01:19 java
2654 ?        00:00:35 java
2784 ?        00:00:52 java
2911 ?        00:00:56 java
3028 ?        00:00:31 java
3239 ?        00:00:59 java
3344 ?        00:01:11 java
3446 ?        00:00:27 java
3551 ?        00:00:30 java
3644 ?        00:00:22 java
3878 ?        00:01:08 java
4142 ?        00:02:16 java
4201 ?        00:00:36 java
4223 ?        00:00:25 java
4259 ?        00:00:21 java
4364 ?        00:00:29 java
4497 ?        00:11:11 java
4561 ?        00:00:44 java
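Since `ps -A` reports every JVM as just `java`, a quicker way to get named listings (assuming the JDK's `jps` tool is on the PATH) is `jps -l`, which prints each JVM's PID and main class, so Hadoop daemons such as the NameNode or DataNode are identifiable at a glance:

```shell
# jps ships with the JDK and lists JVM processes by PID and main class.
# Run it as root (or as each daemon's user) to see every user's JVMs.
if command -v jps >/dev/null 2>&1; then
    jps -l
else
    # Fall back to a hint when jps is not on the PATH.
    echo "jps not found; try the full JDK path, e.g. /usr/java/jdk1.6.0_31/bin/jps -l"
fi
```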

Next, inspect each Java process to see which Hadoop-specific application is running inside it:

[cloudera@localhost usr]$ ps -aef | grep java

499       1768     1  0 08:29 ?        00:00:29 /usr/java/jdk1.6.0_31/bin/java -Dzookeeper.datadir.autocreate=false -Dzookeeper.log.dir=/var/log/zookeeper -********

yarn 2197 1 0 08:29 ? 00:00:55 /usr/java/jdk1.6.0_31/bin/java -Dproc_resourcemanager -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-yarn -Dyarn.log.dir=/var/log/hadoop-yarn ********

sqoop2 2439 1 0 08:29 ? 00:00:31 /usr/java/jdk1.6.0_31/bin/java -Djava.util.logging.config.file=/usr/lib/sqoop2/sqoop-server/conf/logging.properties -Dsqoop.config.dir=/etc/sqoop2/conf ****************

yarn 2507 1 0 08:29 ? 00:01:21 /usr/java/jdk1.6.0_31/bin/java -Dproc_nodemanager -Xmx1000m -server -Dhadoop.log.dir=/var/log/hadoop-yarn -Dyarn.log.dir=/var/log/hadoop-yarn **********

mapred 2654 1 0 08:30 ? 00:00:36 /usr/java/jdk1.6.0_31/bin/java -Dproc_historyserver -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-mapreduce -Dhadoop.log.file=yarn-mapred-historyserver-localhost.localdomain.log -Dhadoop.home.dir=/usr/lib/hadoop ********

hdfs 2784 1 0 08:30 ? 00:00:53 /usr/java/jdk1.6.0_31/bin/java -Dproc_datanode -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-hdfs -Dhadoop.log.file=hadoop-hdfs-datanode-localhost.localdomain.log ********

hdfs 2911 1 0 08:30 ? 00:00:57 /usr/java/jdk1.6.0_31/bin/java -Dproc_namenode -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-hdfs -Dhadoop.log.file=hadoop-hdfs-namenode-localhost.localdomain.log *********

hdfs 3028 1 0 08:30 ? 00:00:31 /usr/java/jdk1.6.0_31/bin/java -Dproc_secondarynamenode -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-hdfs -Dhadoop.log.file=hadoop-hdfs-secondarynamenode-localhost.localdomain.log -Dhadoop.home.dir=/usr/lib/hadoop ********

hbase 3239 1 0 08:31 ? 00:01:00 /usr/java/jdk1.6.0_31/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m -XX:+UseConcMarkSweepGC -XX:+UseConcMarkSweepGC -Dhbase.log.dir=/var/log/hbase -Dhbase.log.file=hbase-hbase-master-localhost.localdomain.log *******

hbase 3344 1 0 08:31 ? 00:01:13 /usr/java/jdk1.6.0_31/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m -XX:+UseConcMarkSweepGC -XX:+UseConcMarkSweepGC ****

hbase 3446 1 0 08:31 ? 00:00:28 /usr/java/jdk1.6.0_31/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m -XX:+UseConcMarkSweepGC -XX:+UseConcMarkSweepGC -Dhbase.log.dir=/var/log/hbase -Dhbase.log.file=hbase-hbase-rest-localhost.localdomain.log -Dhbase.home.dir=/usr/lib/hbase/bin/*******

hbase 3551 1 0 08:31 ? 00:00:31 /usr/java/jdk1.6.0_31/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m -XX:+UseConcMarkSweepGC -XX:+UseConcMarkSweepGC -Dhbase.log.dir=/var/log/hbase -Dhbase.log.file=hbase-hbase-thrift-localhost.localdomain.log *******

flume 3644 1 0 08:31 ? 00:00:23 /usr/java/jdk1.6.0_31/bin/java -Xmx20m -cp /etc/flume-ng/conf:/usr/lib/flume-ng/lib/*:/etc/hadoop/conf:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/asm-3.2.jar *******

root 3865 1 0 08:31 ? 00:00:00 su mapred -s /usr/java/jdk1.6.0_31/bin/java — -Dproc_jobtracker -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-0.20-mapreduce -Dhadoop.log.file=hadoop-hadoop-jobtracker-localhost.localdomain.log ********

mapred 3878 3865 0 08:31 ? 00:01:09 java -Dproc_jobtracker -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-0.20-mapreduce -Dhadoop.log.file=hadoop-hadoop-jobtracker-localhost.localdomain.log -Dhadoop.home.dir=/usr/lib/hadoop-0.20-mapreduce -Dhadoop.id.str=hadoop **********

root 4139 1 0 08:31 ? 00:00:00 su mapred -s /usr/java/jdk1.6.0_31/bin/java — -Dproc_tasktracker -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-0.20-mapreduce -Dhadoop.log.file=hadoop-hadoop-tasktracker-localhost.localdomain.log ************

mapred 4142 4139 1 08:31 ? 00:02:19 java -Dproc_tasktracker -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-0.20-mapreduce -Dhadoop.log.file=hadoop-hadoop-tasktracker-localhost.localdomain.log ***************

httpfs 4201 1 0 08:31 ? 00:00:37 /usr/java/jdk1.6.0_31/bin/java -Djava.util.logging.config.file=/usr/lib/hadoop-httpfs/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager ******

hive 4223 1 0 08:31 ? 00:00:26 /usr/java/jdk1.6.0_31/bin/java -Xmx256m -Dhive.log.dir=/var/log/hive -Dhive.log.file=hive-metastore.log -Dhive.log.threshold=INFO -Dhadoop.log.dir=//usr/lib/hadoop/logs *********

hive 4259 1 0 08:31 ? 00:00:22 /usr/java/jdk1.6.0_31/bin/java -Xmx256m -Dhive.log.dir=/var/log/hive -Dhive.log.file=hive-server.log -Dhive.log.threshold=INFO -Dhadoop.log.dir=//usr/lib/hadoop/logs *****

hue 4364 4349 0 08:31 ? 00:00:30 /usr/java/jdk1.6.0_31/bin/java -Xmx1000m -Dlog4j.configuration=log4j.properties -Dhadoop.log.dir=//usr/lib/hadoop/logs -Dhadoop.log.file=hadoop.log *******

oozie 4497 1 6 08:31 ? 00:11:27 /usr/bin/java -Djava.util.logging.config.file=/usr/lib/oozie/oozie-server-0.20/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Xmx1024m -Doozie.https.port=11443 *********

sqoop 4561 1 0 08:31 ? 00:00:45 /usr/java/jdk1.6.0_31/bin/java -Xmx1000m -Dhadoop.log.dir=/usr/lib/hadoop/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/lib/hadoop *******

cloudera 15657 8150 0 11:26 pts/4 00:00:00 grep java

Note: The above output is trimmed, as each process prints its full classpath along with other process-specific details.
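As the listing above shows, most CDH daemons identify themselves with a `-Dproc_<name>` JVM flag, so the PID-to-daemon mapping can be extracted directly without scrolling through classpaths. A small sketch (the `-Dproc_` convention is visible in the output above; the field handling is my own):

```shell
# Extract PID, user, and daemon name from each -Dproc_<name> flag.
# The [D] in the grep pattern keeps grep from matching its own command line.
ps -eo pid,user,args | grep '[D]proc_' | \
    awk '{
        for (i = 3; i <= NF; i++)
            if ($i ~ /^-Dproc_/) {
                name = $i
                sub(/^-Dproc_/, "", name)
                print $1, $2, name
            }
    }'
```

Note that a few JVMs in the listing (ZooKeeper, Flume, Hue, Oozie) do not use the `-Dproc_` flag, so this one-liner only covers the core Hadoop/HDFS/MapReduce daemons.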

HDInsight On Windows – Single Node Cluster

On Windows, the Hadoop-specific processes run as Windows services, listed here from the Services console:

Apache Hadoop datanode Running Automatic .hadoop
Apache Hadoop historyserver Running Automatic .hadoop
Apache Hadoop isotopejs Running Automatic .hadoop
Apache Hadoop jobtracker Running Automatic .hadoop
Apache Hadoop namenode Running Automatic .hadoop
Apache Hadoop secondarynamenode Running Automatic .hadoop
Apache Hadoop tasktracker Running Automatic .hadoop
Apache Hive Derbyserver Running Automatic .hadoop
Apache Hive hiveserver Running Automatic .hadoop
Apache Hive hwi Running Automatic .hadoop