ABC of Data Science

HAPPY NEW YEAR to all my readers!!


A: ACID – Atomicity, Consistency, Isolation and Durability
B: Big Data – Volume, Velocity, Variety
C: Columnar (or Column-Oriented) Database
D: Data Warehousing – Relevant and very useful
E: ETL – Extract, transform and load
F: Flume – A framework for populating Hadoop with data
G: Geospatial Analysis – A picture worth 1,000 words or more
H: Hadoop, HDFS, HBase – Do you really want to know?
I: In-Memory Database – A new definition of superfast access
J: Java – Hadoop gave it its biggest push in recent years to stay in the enterprise market
K: Kafka – High-throughput, distributed messaging system originally developed at LinkedIn
L: Latency – Low Latency and High Latency
M: MapReduce – Programming model that processes large data sets in parallel map and reduce steps
N: NoSQL Databases – "No SQL" or "Not Only SQL" databases
O: Oozie – Open-source workflow engine managing Hadoop job processing
P: Pig – Platform for analyzing huge data sets
Q: Quantitative Data Analysis
R: Relational Database – Still relevant and will be for some time
S: Sharding (Database Partitioning) and Sqoop (SQL Database to Hadoop)
T: Text Analysis – The larger the information, the more analysis is needed
U: Unstructured Data – Growing faster than the speed of thought
V: Visualization – Important to keep the information relevant
W: Whirr – Libraries for running Big Data cloud services, e.g. Hadoop clusters on cloud providers
X: XML – Still eXtensible and no introduction needed
Y: Yottabyte – Equal to 1,000 zettabytes, 1 million exabytes, 1 billion petabytes and 1 trillion terabytes
Z: ZooKeeper – Helps manage Hadoop nodes across a distributed network
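
To make the M entry a little more concrete, here is a minimal, single-process sketch of the map, shuffle and reduce steps, written as a word count in plain Python. It does not use Hadoop at all, and the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this illustration; a real job would spread these steps across many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key (the word)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single sum."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run map over every document, then shuffle and reduce the results."""
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle(pairs))


# Hypothetical input: two tiny "documents".
counts = word_count(["big data big volume", "data velocity"])
print(counts)
```

The point of the split is that each map call only looks at its own document and each reduce call only looks at one key, which is what lets a framework run them in parallel.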

Here are some visualizations from Wordle:

[Wordle word cloud images]
