"No module named h2o" error with spark-submit

When you submit a Sparkling Water Python example to a Spark cluster with spark-submit, as in the command below, you may get an import error:

$ spark-submit --packages ai.h2o:sparkling-water-core_2.10:1.6.1 --py-files

Here is the error:

:: retrieving :: org.apache.spark#spark-submit-parent 
confs: [default] 
9 artifacts copied, 0 already retrieved (3050kB/16ms)
Traceback (most recent call last): 
File "/home/test12/SparklingWater/sparkling-water-1.6.5/py/examples/scripts/", line 1, in <module> 
import h2o
ImportError: No module named h2o
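To confirm the diagnosis, you can check whether the Python interpreter that spark-submit launches can find `h2o` at all. The sketch below uses only the standard library (`importlib.util.find_spec`); the script name `check_h2o.py` and the helper `module_available` are hypothetical, chosen for illustration.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if a top-level module `name` can be found by this interpreter."""
    # find_spec returns None when no finder on sys.path can locate the module.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Report which interpreter ran the check; under spark-submit this is the
    # driver's Python, which may differ from the one you use interactively.
    print("interpreter:", sys.executable)
    print("h2o importable:", module_available("h2o"))
```

Spark places files passed with `--py-files` on the `PYTHONPATH` of Python applications, so running `spark-submit check_h2o.py` with the same `--py-files` argument you use for the example should show whether that egg makes `h2o` importable.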



The problem happens because we did not bundle the H2O Python package in the Sparkling Water 1.6.1 egg file, so `import h2o` fails in the submitted script. If you download Sparkling Water 1.6.5 and pass its egg file as shown below, the H2O package is available to the same script and you will not have this problem. The correct command is:

$ spark-submit --py-files
--conf spark.dynamicAllocation.enabled=false
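Because an egg file is a zip archive, you can check whether a given egg actually bundles the `h2o` package before submitting it. This is a minimal sketch, assuming the package sits at the root of the egg as `h2o/__init__.py` (the usual layout for a top-level package); the script name `check_egg.py` and the function `egg_bundles_package` are hypothetical.

```python
import sys
import zipfile


def egg_bundles_package(egg_path: str, package: str = "h2o") -> bool:
    """Return True if the egg (a zip archive) contains a top-level `package`."""
    # A top-level package in an egg is a directory at the archive root
    # containing __init__.py, e.g. "h2o/__init__.py".
    target = f"{package}/__init__.py"
    with zipfile.ZipFile(egg_path) as egg:
        return target in egg.namelist()


if __name__ == "__main__":
    # Usage (hypothetical): python check_egg.py path/to/file.egg [more.egg ...]
    for path in sys.argv[1:]:
        status = "bundles" if egg_bundles_package(path) else "does NOT bundle"
        print(f"{path} {status} the h2o package")
```

If the egg you pass to `--py-files` reports that it does not bundle `h2o`, the import error above is the expected result.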


