Sparkling Water – Tips and Tricks

First, set SPARK_HOME to the Spark installation you want to use:

$ export SPARK_HOME=/home/ec2-user/spark-1.6.2-bin-hadoop2.6

Launch the Spark shell as follows:

$ bin/spark-shell               (from inside the Spark directory)
$ $SPARK_HOME/bin/spark-shell   (from anywhere)
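Before launching, it can help to confirm that the directory you are about to point SPARK_HOME at actually looks like a Spark installation. A minimal sketch (check_spark_home is a hypothetical helper, not part of Spark):

```shell
# check_spark_home: succeeds only if the given directory contains an
# executable bin/spark-shell launcher (a quick sanity check, not a full
# validation of the install).
check_spark_home() {
  [ -x "$1/bin/spark-shell" ]
}

# Usage:
#   check_spark_home "$SPARK_HOME" || echo "SPARK_HOME does not look like a Spark install" >&2
```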

Once the command succeeds, you will see output like the following on stdout:

----
  Spark master (MASTER)   : local[*]
  Spark home (SPARK_HOME) : /home/ec2-user/spark-1.6.2-bin-hadoop2.6
  H2O build version       : 3.10.0.6 (turing)
  Spark build version     : 1.6.2
----

16/09/20 21:04:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.2
      /_/

Using Scala version 2.10.5 (OpenJDK 64-Bit Server VM, Java 1.7.0_111)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
16/09/20 21:05:10 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/09/20 21:05:10 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
16/09/20 21:05:16 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/09/20 21:05:16 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
SQL context available as sqlContext.

To get verbose output, pass the --verbose option:

 $ $SPARK_HOME/bin/spark-shell --verbose

In some cases you may get an error like this:

16/09/20 21:03:41 WARN Utils: Service 'sparkDriver' could not bind on port 7815. Attempting port 7816.
16/09/20 21:03:41 ERROR SparkContext: Error initializing SparkContext.

It means Spark is trying to bind to a hostname it cannot actually bind on.

Note: On an EC2 instance, if the hostname is set to the public DNS name (e.g. hostname ec2-x-x-x-x.compute-1.amazonaws.com) you may see this problem.

The workaround is to force Spark to bind to localhost:

$ export SPARK_LOCAL_IP="localhost"
$ $SPARK_HOME/bin/spark-shell
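If you hit this on every login, you can make the fix persistent instead of exporting it each time. Spark sources conf/spark-env.sh from the installation directory at launch, so one option is to put SPARK_LOCAL_IP there. A sketch (persist_spark_local_ip is a hypothetical helper, not part of Spark):

```shell
# persist_spark_local_ip: append the SPARK_LOCAL_IP override to
# conf/spark-env.sh under the given Spark home, creating the conf
# directory if needed. Spark's launch scripts source this file.
persist_spark_local_ip() {
  conf="$1/conf/spark-env.sh"
  mkdir -p "$1/conf"
  echo 'export SPARK_LOCAL_IP="localhost"' >> "$conf"
}

# Usage:
#   persist_spark_local_ip "$SPARK_HOME"
```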

Once the Spark shell is up, you can create an H2OContext as below (Sparkling Water 1.6.7):

scala> import org.apache.spark.h2o._
import org.apache.spark.h2o._
scala> val hc = H2OContext.getOrCreate(sc)

Evaluate hc in the REPL to see the H2O cluster information:

scala> hc
res0: org.apache.spark.h2o.H2OContext =

Sparkling Water Context:
 * H2O name: sparkling-water-ec2-user_1664047936
 * cluster size: 1
 * list of used nodes:
  (executorId, host, port)
  ------------------------
  (driver,localhost,54321)
  ------------------------

Open H2O Flow in browser: http://127.0.0.1:54321 (CMD + click in Mac OSX)
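If you want to script against Flow rather than click the link, you can poll the URL until its web server answers. A sketch (wait_for_flow is a hypothetical helper; assumes curl is available):

```shell
# wait_for_flow: poll a URL once per second until it responds with a
# successful HTTP status, or give up after the given number of tries
# (default 30).
wait_for_flow() {
  url=$1
  tries=${2:-30}
  while [ "$tries" -gt 0 ]; do
    if curl -sf -o /dev/null "$url"; then
      echo "H2O Flow is up at $url"
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  echo "timed out waiting for $url" >&2
  return 1
}

# Usage:
#   wait_for_flow http://127.0.0.1:54321
```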