Using Kyro library with Sparkling Water

Start sparkling water from spark shell:

$ bin/sparkling-shell

Start sparkling water from spark shell and add 3rd party jar:

$ bin/sparkling-shell --jars kryo-4.0.0.jar
Note: Make sure the jar file is accessible from this path

Verify if jar is added:

scala> sc.listJars
res1: Seq[String] = ArrayBuffer(spark://172.16.2.123:53523/jars/kryo-4.0.0.jar)
Access spark web Interface:

Verify on Spark UI that environment is set and jobs are listed as below:

http://localhost:4040/jobs/
http://localhost:4040/environment/
Note: Above you can look for jars, path, configuration etc to verify things.

To look available functions/method for an object type it at Scala console append . and then press TAB

scala> sc.[TAB]
Note: Above will give you the list of methods supported with SparkContext (sc)
Getting Spark Configuration:

scala> spark.conf.getAll
 res2: Map[String,String] = Map(spark.serializer -> org.apache.spark.serializer.KryoSerializer,
 spark.driver.host -> 172.16.2.123, spark.driver.port -> 53817, 
 hive.metastore.warehouse.dir -> /Volumes/OSxexT/tools/sparkling-water-2.0.0/spark-warehouse,
 spark.repl.class.uri -> spark://172.16.2.123:53817/classes, 
 spark.jars -> file:/Volumes/OSxexT/tools/sparkling-water-2.0.0/kryo-4.0.0.jar, 
 spark.repl.class.outputDir -> /private/var/folders/x7/331tvwcd6p17jj9zdmhnkpyc0000gn/T/avkashchauhan/spark/work/spark-551682b2-b75b-44c1-b57e-b5741b29f367/repl-1986c79c-aa93-4f3d-a7ad-3c83ad796d67, 
 spark.app.name -> MyApp, 
 spark.driver.memory -> 3G,
 spark.logConf -> true
 spark.executor.id -> driver, 
 spark.kryo.registrationRequired -> true, 
 spark.driver.extraJavaOptions -> " -XX:MaxPermSize=384m", 
 spark....

Referencing 3rd Party jar i.e. kryo (we added with –jars)

scala> import com.esotericsoftware.kryo.Kryo
 import com.esotericsoftware.kryo.Kryo
 scala> val k:Kryo = new Kryo()
 k: com.esotericsoftware.kryo.Kryo = com.esotericsoftware.kryo.Kryo@4626f584
 scala> k.[TAB]

Now using SparkSession object:

scala> import org.apache.spark.sql.SparkSession
 import org.apache.spark.sql.SparkSession
 scala> val spark = SparkSession.builder()
 .appName("appNameX")
 .enableHiveSupport()
 .config("spark.serializer","org.apache.spark.serializer.KryoSerializer")
 .config("spark.kryo.registrationRequired", "true")
 .config("spark.logConf", "true")
 .getOrCreate()
 16/10/21 14:56:04 WARN SparkSession$Builder: Use an existing SparkSession, some configuration may not take effect.
 spark: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession@2fee69a1

If you wanted to run a quick test from the source you can do the following:

$ SPARK_HOME/bin/spark-submit --class org.apache.spark.examples.h2o.HamOrSpamDemo ./assembly/build/libs/sparkling-water-assembly_2.11-2.0.0-all.jar --jars kryo-4.0.0.jar

Note: Above a built-in example call is called as an application with sparkling water and a 3rd party jar is also passed.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s