Using Sparkling water and PySpark to log console output

Here is the command Option #1:

./pyspark --deploy-mode client --conf spark.dynamicAllocation.enabled=false --packages com.databricks:spark-csv_2.11:1.4.0 --py-files ../../sparkling-water-1.6.7/py/dist/h2o_pysparkling_1.6-1.6.7-py2.7.egg

Here is the command Option #2:

./pyspark --deploy-mode client --conf spark.dynamicAllocation.enabled=false --packages com.databricks:spark-csv_2.11:1.4.0,ai.h2o:sparkling-water-core_2.10:1.6.7 --py-files ../../sparkling-water-1.6.7/py/dist/h2o_pysparkling_1.6-1.6.7-py2.7.egg

We must make sure that both h2o backend and python version are calling same Version of API.

This parameter is using H2O API backend version 1.6.7
ai.h2o:sparkling-water-core_2.10:1.6.7

This parameter is using 1.6.7 version of Python API:
–py-files /mnt/app/sparkling-water-1.6.7/py/dist/h2o_pysparkling_1.6-1.6.7-py2.7.egg

Here is the script to test overall scenario:

>>> from pysparkling import *
>>> from pyspark import SparkContext
>>> from pyspark.sql import SQLContext
>>> import h2o
>>> sqlContext = SQLContext(sc)
>>> hc = H2OContext.getOrCreate(sc)
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s