Sparkling Water 2.0 Walkthrough with pysparkling



Launching the pysparkling shell:

$ bin/pysparkling --num-executors 2 --executor-memory 2g --driver-memory 2g --conf spark.dynamicAllocation.enabled=false
 Python 2.7.10 (default, Jul 30 2016, 18:31:42)
 [GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)] on darwin
 Type "help", "copyright", "credits" or "license" for more information.
 Using Spark's default log4j profile: org/apache/spark/
 Setting default log level to "WARN".
 To adjust logging level use sc.setLogLevel(newLevel).
 16/10/20 09:29:02 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
 Welcome to
       ____              __
      / __/__  ___ _____/ /__
     _\ \/ _ \/ _ `/ __/  '_/
    /__ / .__/\_,_/_/ /_/\_\   version 2.0.1
 Using Python version 2.7.10 (default, Jul 30 2016 18:31:42)
 SparkSession available as 'spark'.
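
When the shell is started from a wrapper script rather than typed by hand, the same launch options can be assembled programmatically. The following is a minimal sketch using only plain Python; the helper name `build_pysparkling_command` is hypothetical, and the flags are just the ones that appear in the command above.

```python
def build_pysparkling_command(num_executors, executor_memory, driver_memory, conf=None):
    """Return the bin/pysparkling argument list for the given resources.

    Hypothetical helper: it only formats the flags used in this walkthrough
    and does not validate them against Spark.
    """
    args = [
        "bin/pysparkling",
        "--num-executors", str(num_executors),
        "--executor-memory", executor_memory,
        "--driver-memory", driver_memory,
    ]
    # Each Spark property is passed as its own "--conf key=value" pair.
    for key, value in sorted((conf or {}).items()):
        args += ["--conf", "{}={}".format(key, value)]
    return args


command = build_pysparkling_command(
    2, "2g", "2g", conf={"spark.dynamicAllocation.enabled": "false"}
)
print(" ".join(command))
```

The startup warnings in this walkthrough also recommend passing `spark.scheduler.minRegisteredResourcesRatio=1`; with this helper that would simply be one more entry in the `conf` dictionary.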

Now enter the following commands:

>>> from pysparkling import *
>>> from pyspark import SparkContext
>>> from pyspark.sql import SQLContext
>>> import h2o
>>> sqlContext = SQLContext(sc)
>>> sqlContext

>>> hc = H2OContext.getOrCreate(sc)

Here is the successful output:

 16/10/20 09:31:10 WARN InternalH2OBackend: Increasing 'spark.locality.wait' to value 30000
 16/10/20 09:31:10 WARN InternalH2OBackend: The property 'spark.scheduler.minRegisteredResourcesRatio' is not specified!
 We recommend to pass `--conf spark.scheduler.minRegisteredResourcesRatio=1`
 Warning: if you don't want to start local H2O server, then use of `h2o.connect()` is preferred.
 Checking whether there is an H2O instance running at connected.
 -------------------------- ----------------------------------------
 H2O cluster uptime: 09 secs
 H2O cluster version:
 H2O cluster version age: 1 month
 H2O cluster name: sparkling-water-avkashchauhan_2132345410
 H2O cluster total nodes: 3
 H2O cluster free memory: 2.364 Gb
 H2O cluster total cores: 24
 H2O cluster allowed cores: 24
 H2O cluster status: accepting new members, healthy
 H2O connection url:
 H2O connection proxy:
 Python version: 2.7.10 final
 -------------------------- ----------------------------------------
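
If the summary above has been captured as text (for example, copied from the console into a log), a few lines of Python can check it. This is only a sketch: `parse_cluster_summary` is a hypothetical helper that splits each row on its first colon, and the sample rows are copied from the output above. It does not call any H2O API.

```python
def parse_cluster_summary(text):
    """Parse the 'key: value' rows of the printed summary into a dict."""
    summary = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the dashed table borders.
        if not line or line.startswith("-"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            summary[key.strip()] = value.strip()
    return summary


# Rows copied from the summary printed by H2OContext.getOrCreate(sc).
sample = """
 H2O cluster uptime: 09 secs
 H2O cluster total nodes: 3
 H2O cluster free memory: 2.364 Gb
 H2O cluster total cores: 24
 H2O cluster status: accepting new members, healthy
"""

summary = parse_cluster_summary(sample)
nodes = int(summary["H2O cluster total nodes"])
healthy = "healthy" in summary["H2O cluster status"]
print(nodes, healthy)
```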

Now verify the Sparkling Water package: make sure the h2o module is loaded from the h2o_pysparkling_2.0-2.0.0 egg, as the module path below shows.

>>> h2o

<module 'h2o' from '/private/var/folders/x7/331tvwcd6p17jj9zdmhnkpyc0000gn/T/avkashchauhan/spark/work/spark-28af708d-a149-435a-9a53-41e63d9ba7f5/userFiles-cf7c7aaf-610f-439d-9037-2dcddca73524/h2o_pysparkling_2.0-2.0.0-py2.7.egg/h2o/__init__.pyc'>
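
The same check can be done in code instead of by reading the path. The sketch below assumes the egg is named `h2o_pysparkling_<version>-py<python>.egg`, as in the path above; `pysparkling_egg_version` is a hypothetical helper, and the sample path is a shortened, made-up stand-in for the real one.

```python
import re


def pysparkling_egg_version(module_path):
    """Return the version embedded in an h2o_pysparkling egg path, or None.

    Assumes the naming pattern h2o_pysparkling_<version>-py<python>.egg;
    other packaging layouts are not handled.
    """
    match = re.search(r"h2o_pysparkling_(.+?)-py\d+\.\d+\.egg", module_path)
    return match.group(1) if match else None


# Shortened, illustrative path; the real one lives under Spark's userFiles directory.
path = "/tmp/userFiles/h2o_pysparkling_2.0-2.0.0-py2.7.egg/h2o/__init__.pyc"
print(pysparkling_egg_version(path))
```

Inside the pysparkling shell, the path of the loaded module is available as `h2o.__file__`, so `pysparkling_egg_version(h2o.__file__)` should return the Sparkling Water version when the module comes from the egg, and `None` when a separately installed h2o package is picked up instead.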

Getting Help for h2o:

>>> help(h2o)

Getting Cluster Status:

>>> h2o.cluster_status()

