6 reasons why 2012 could be the year of Hadoop
Defining Hadoop: the Players, Technologies and Challenges of 2011
The Hadoop project includes these subprojects:
- Hadoop Common: The common utilities that support the other Hadoop subprojects.
- Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
- Hadoop MapReduce: A software framework for distributed processing of large data sets on compute clusters.
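To make the MapReduce description above more concrete, here is a minimal single-process sketch of the classic word-count job. It only models the three logical phases (map, shuffle/group by key, reduce) in plain Python. It does not use the Hadoop API. Hadoop distributes the mappers and reducers across a cluster and performs the shuffle over the network, and none of that is modeled here. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit a (word, 1) pair for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: sum all the counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # On a real cluster each phase runs in parallel on many nodes;
    # this sketch runs them sequentially in one process.
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["Hadoop stores data in HDFS", "Hadoop processes data with MapReduce"]
    print(word_count(docs))
```

The key design point is that mappers and reducers never share state. Each mapper sees only its own records, and each reducer sees only the values for its own keys. That independence is what lets the framework spread the work over many machines.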
Other Hadoop-related projects at Apache include:
- Avro™: A data serialization system.
- Cassandra™: A scalable multi-master database with no single point of failure.
- Chukwa™: A data collection system for managing large distributed systems.
- HBase™: A scalable, distributed database that supports structured data storage for large tables.
- Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying.
- Mahout™: A scalable machine learning and data mining library.
- Pig™: A high-level data-flow language and execution framework for parallel computation.
- ZooKeeper™: A high-performance coordination service for distributed applications.
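The phrase "data-flow language" in the Pig entry means a script is written as a chain of named transformations over relations. The sketch below is an illustrative assumption, not Pig's implementation. The comments show a small Pig Latin script, and the Python beneath each comment mirrors that step over an in-memory list. The input file `logs.txt`, the `(user, size)` schema, the 1000-byte threshold and the Python names are all hypothetical. Real Pig compiles such a script into MapReduce jobs and runs them on the cluster.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class LogRecord(NamedTuple):
    user: str
    size: int


def total_large_transfers(records: List[LogRecord], threshold: int) -> Dict[str, int]:
    # logs = LOAD 'logs.txt' AS (user:chararray, size:int);
    # (here the caller passes the already-loaded records)

    # big = FILTER logs BY size > 1000;
    big = [r for r in records if r.size > threshold]

    # by_user = GROUP big BY user;
    by_user: Dict[str, List[LogRecord]] = defaultdict(list)
    for r in big:
        by_user[r.user].append(r)

    # totals = FOREACH by_user GENERATE group, SUM(big.size);
    return {user: sum(r.size for r in rows) for user, rows in by_user.items()}


if __name__ == "__main__":
    logs = [
        LogRecord("alice", 1500),
        LogRecord("bob", 200),
        LogRecord("alice", 3000),
        LogRecord("bob", 5000),
    ]
    print(total_large_transfers(logs, threshold=1000))
```

Each Pig statement names a new relation derived from an earlier one. That structure makes it easy for Pig to turn the GROUP step into a shuffle between the map and reduce phases.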
- BigDataUniversity is a free resource for learning Big Data with Hadoop through self-paced online training material.
- Cloudera offers some free online material you can use:
- MapR has a few excellent videos to help you dive in:
- MapReduce on Wikipedia: http://en.wikipedia.org/wiki/Mapreduce
- MapReduce.org: http://mapreduce.org/
- Five-part lecture on MapReduce (2007): http://www.youtube.com/watch?v=yjPBkvYh-ss&noredirect=1
- Google Code University on parallel computing and MapReduce: http://code.google.com/edu/parallel/mapreduce-tutorial.html
- App Engine and MapReduce (Google I/O 2011): http://www.youtube.com/watch?v=EIxelKcyCC0
- MapReduce by Google Israel: http://www.youtube.com/watch?v=zVSSsJ_ua4Q