Which one to choose between Pig and Hive?

Technically, both will do the job. If you are still asking "Hive or Pig?" in the abstract, you probably have not yet pinned down what you are trying to do. Once you define the data source, the scope, and how the result should be represented, the two stop looking interchangeable for your job, and choosing one over the other brings extra benefits. Finally, both Hive and Pig can be extended with user-defined functions, including aggregate ones (UDFs and UDAFs), which brings them closer together again, so it is worth revisiting the choice once you know which extensions you need.

For someone with roots in databases and SQL, Hive is the natural fit; for scripters and programmers, Pig will feel more familiar.

Hive provides a SQL-like interface and a relational model over your data; if your data is really unstructured, Pig is the better choice. Hive expects you to define a proper schema, which makes it closer in concept to an RDBMS. You can also say that in Hive you write SQL, while in Pig you write a sequence of steps that is executed as a plan. Both Pig and Hive are abstractions on top of MapReduce, so when you really need control and performance you have to use MapReduce directly. You can start with Pig and drop down to MapReduce when you want to go deeper.
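
To make the "write SQL" versus "sequence of steps" difference concrete, here is a small sketch in plain Python. It is only an analogy and does not use Hive or Pig: the standard-library sqlite3 module stands in for the Hive style (declare a schema, then state the result you want), and an ordinary step-by-step pipeline stands in for the Pig style. The sample data, the column names and the function names are all made up for illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (user, page, seconds_on_page)
visits = [
    ("alice", "home", 12),
    ("bob", "home", 7),
    ("alice", "docs", 40),
    ("bob", "docs", 3),
    ("carol", "home", 25),
]


def declarative_style(rows):
    """Hive-like analogy: define a schema, then describe the result in SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (user TEXT, page TEXT, seconds INTEGER)")
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT page, SUM(seconds) FROM visits "
        "WHERE seconds >= 5 GROUP BY page ORDER BY page"
    ).fetchall()
    conn.close()
    return result


def dataflow_style(rows):
    """Pig-like analogy: the same job written as explicit, ordered steps."""
    # Step 1: drop short visits (comparable to a FILTER step)
    filtered = [r for r in rows if r[2] >= 5]
    # Step 2: collect the seconds per page (comparable to a GROUP step)
    grouped = defaultdict(list)
    for _user, page, seconds in filtered:
        grouped[page].append(seconds)
    # Step 3: total each group (comparable to a FOREACH ... GENERATE step)
    totals = [(page, sum(secs)) for page, secs in grouped.items()]
    # Step 4: sort by page name (comparable to an ORDER step)
    return sorted(totals)


if __name__ == "__main__":
    print(declarative_style(visits))
    print(dataflow_style(visits))
```

Both functions return the same totals per page. The difference is where the thinking goes: in the first you describe the table and the answer, in the second you spell out each transformation in order, which is closer to how a Pig script reads.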


