"No module named h2o" error with spark-submit

When you submit a Sparkling Water Python example to a Spark cluster with spark-submit, using a command like the one below, the script may fail with an import error:

$ spark-submit --packages ai.h2o:sparkling-water-core_2.10:1.6.1 \
    --py-files /home/test12/SparklingWater/sparkling-water-1.6.5/py/dist/pySparkling-1.6.1-py2.7.egg \
    /home/test12/SparklingWater/sparkling-water-1.6.5/py/examples/scripts/ChicagoCrimeDemo.py

Here is the error:

:: retrieving :: org.apache.spark#spark-submit-parent
        confs: [default]
        9 artifacts copied, 0 already retrieved (3050kB/16ms)
Traceback (most recent call last):
  File "/home/test12/SparklingWater/sparkling-water-1.6.5/py/examples/scripts/ChicagoCrimeDemo.py", line 1, in <module>
    import h2o
ImportError: No module named h2o

 

Solution:

The problem occurs because H2O was not bundled in the egg file for Sparkling Water 1.6.1, so the script cannot find the h2o module when it runs. If you download Sparkling Water 1.6.5 and use its egg file as shown below, the H2O package is available to the script and the error goes away. The correct command is:

$ spark-submit --py-files $YOUR_PATH/sparkling-water-1.6.5/py/dist/h2o_pysparkling_1.6-1.6.5-py2.7.egg \
    --conf spark.dynamicAllocation.enabled=false your_py_file.py
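
If you are not sure whether a given egg bundles H2O, you can inspect it before submitting the job, since a zipped egg file is an ordinary zip archive. The sketch below is only an illustration: the helper name egg_bundles_package is hypothetical, and it assumes the bundled package shows up as a top-level h2o/ directory (or an h2o.py module) inside the egg, which I have not verified against the Sparkling Water build.

```python
import zipfile


def egg_bundles_package(egg_path, package="h2o"):
    """Return True if the zipped egg contains a top-level `package`.

    Assumption (not taken from the Sparkling Water docs): a bundled
    package appears as `<package>/...` or `<package>.py` at the root
    of the archive.
    """
    prefix = package + "/"   # entries of a package directory
    module = package + ".py"  # a single-file module
    with zipfile.ZipFile(egg_path) as egg:
        return any(
            name.startswith(prefix) or name == module
            for name in egg.namelist()
        )
```

Calling egg_bundles_package with the path to the egg you pass to --py-files returns False when the h2o package is missing from the archive, which is the situation that leads to the ImportError above.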

 
