Apache Weave: Big Data Application runtime and development framework by Continuuity

Continuuity built Weave to help take Apache YARN to the next level of usability and functionality. Continuuity has used Weave extensively to support its products and has seen the benefit and power of Apache YARN and Weave combined. It has released Weave under the Apache 2.0 license to collaborate with members of the community, broaden the set of applications and patterns that Weave supports, and further the overall adoption of Apache YARN.

Weave is NOT a replacement for Apache YARN.  It is instead a value-added framework that operates on top of Apache YARN.

What is Weave:  Weave is a simple set of libraries that allows you to easily manage distributed applications through an abstraction layer built on Apache YARN. Weave allows you to use YARN’s distributed capabilities with a programming model that is similar to running threads.

Features of Weave:
– Simple API for specifying, running and managing application lifecycle
– An easy way to communicate with an application or parts of an application
– A generic Application Master to better support simple applications
– Simplified archive management and local file transport
– Improved control over application logs, metrics and errors
– Discovery service
– And many more…

Weave source code is available on GitHub at http://github.com/continuuity/weave under the Apache 2.0 License.

Learn more at http://www.continuuity.com/.

Keywords: Hadoop, YARN, MapReduce, Big Data

Upgrading Pycrypto using pip in Ubuntu

Here are the steps to upgrade the pycrypto library on an Ubuntu machine:

Step 1: Check the pycrypto version

ubuntu@ip-***:~$ pip show pycrypto

Name: pycrypto
Version: 2.4.1
Location: /usr/local/lib/python2.7/dist-packages

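Note: If pip is not working, try installing it first (on Ubuntu the development package is named python-dev, and easy_install is provided by python-setuptools):

$ sudo apt-get install python-dev python-setuptools

$ sudo easy_install pip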


Step 2: Upgrade pycrypto using pip

ubuntu@ip-10-254-71-179:~$ pip install --upgrade pycrypto

Downloading/unpacking pycrypto from https://pypi.python.org/packages/source/p/pycrypto/pycrypto-2.6.tar.gz#md5=88dad0a270d1fe83a39e0467a66a22bb
Downloading pycrypto-2.6.tar.gz (443kB): 443kB downloaded
Running setup.py egg_info for package pycrypto

Installing collected packages: pycrypto
Found existing installation: pycrypto 2.4.1
Uninstalling pycrypto:


Successfully installed pycrypto
Cleaning up…


Step 3: Verify the upgrade

ubuntu@ip-10-254-71-179:~$ pip show pycrypto

Name: pycrypto
Version: 2.6
Location: /usr/local/lib/python2.7/dist-packages
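If you want to script this check instead of comparing the two outputs by eye, the following is a minimal sketch in Python. It only parses the text that `pip show` printed in Steps 1 and 3; the function names (`parse_pip_show`, `version_tuple`, `was_upgraded`) are hypothetical helpers made up for this example, not part of pip, and the version comparison assumes plain dotted numeric versions such as 2.4.1.

```python
def parse_pip_show(output):
    """Turn the 'Key: value' lines printed by `pip show` into a dict."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip blank lines and anything without a colon
            fields[key.strip()] = value.strip()
    return fields


def version_tuple(version):
    """Convert '2.4.1' into (2, 4, 1). Assumes purely numeric parts."""
    return tuple(int(part) for part in version.split("."))


def was_upgraded(before_output, after_output):
    """Return True if the version in the second output is newer."""
    before = version_tuple(parse_pip_show(before_output)["Version"])
    after = version_tuple(parse_pip_show(after_output)["Version"])
    return after > before


# Output copied from Step 1 and Step 3 of this post.
BEFORE = """Name: pycrypto
Version: 2.4.1
Location: /usr/local/lib/python2.7/dist-packages"""

AFTER = """Name: pycrypto
Version: 2.6
Location: /usr/local/lib/python2.7/dist-packages"""

print(was_upgraded(BEFORE, AFTER))  # prints True
```

Comparing tuples rather than raw strings avoids mistakes such as treating "2.10" as older than "2.6".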



Amazon EC2 Security Group (Firewall) settings for Hadoop Cluster

When setting up a Hadoop cluster on Amazon EC2, you need to configure the security group (firewall) so that you can access the cluster directly. The following settings are for the Cloudera CDH4 Hadoop distribution on EC2:




Port 22 for SSH, ports 7180 and 7182 for Cloudera Manager, port 7432 for PostgreSQL, port 8888 for Hue, and finally ports 50000-50100 for the Hadoop JobTracker and HDFS.
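The port list above can also be written down as data, which makes it easier to review or reuse. This is a rough sketch under a few assumptions: it reads "7180/82" as ports 7180 and 7182, the source CIDR 203.0.113.0/24 is a placeholder documentation range you would replace with your own address range, and the function names are hypothetical. The dictionaries follow the IpPermissions shape that the EC2 API uses for ingress rules, but the code does not call AWS; it only builds and checks the rules locally.

```python
# Port ranges taken from the post; the labels are for readability only.
CDH4_PORT_RANGES = [
    ("SSH", 22, 22),
    ("Cloudera Manager", 7180, 7180),
    ("Cloudera Manager", 7182, 7182),
    ("PostgreSQL", 7432, 7432),
    ("Hue", 8888, 8888),
    ("Hadoop JobTracker and HDFS", 50000, 50100),
]

# Placeholder source range (assumption): replace with your own network.
ALLOWED_CIDR = "203.0.113.0/24"


def build_ip_permissions(port_ranges, cidr):
    """Build one TCP ingress rule per port range, limited to one CIDR."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": first,
            "ToPort": last,
            "IpRanges": [{"CidrIp": cidr}],
        }
        for _label, first, last in port_ranges
    ]


def is_port_allowed(permissions, port):
    """Return True if any rule covers the given TCP port."""
    return any(p["FromPort"] <= port <= p["ToPort"] for p in permissions)


permissions = build_ip_permissions(CDH4_PORT_RANGES, ALLOWED_CIDR)
for rule in permissions:
    print(rule["FromPort"], rule["ToPort"], rule["IpRanges"][0]["CidrIp"])
```

Restricting the rules to a known CIDR rather than 0.0.0.0/0 is a deliberate choice in this sketch, since these ports expose cluster administration interfaces.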


US Mass-shootings data visualization from 1996-2012

Here are data visualizations of mass shootings in the US between 1996 and 2012. The statistics are based on newspaper reports, and the visualizations were built using Platfora.

Graph 1: Mass shootings in USA between 1996-2012

Graph 2: Month with the maximum number of mass shootings

Graph 3: School or Place Name and Casualty Count

Graph 4: Month with the maximum number of mass shootings

Graph 5: Mass shootings in a particular month

Keywords: Hadoop, Big Data, Data Visualization, Platfora, HadoopBI, BigDataBI