TensorFlow 0.12.1, CUDA 8.0 and cuDNN 5.1 on Ubuntu 16.04 with Titan X

OS: Ubuntu 16.04. GPU: NVIDIA Titan X (Pascal)

Problem:

>>> import tensorflow as tf

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:119] Couldn't open CUDA library libcudnn.so. LD_LIBRARY_PATH:
I tensorflow/stream_executor/cuda/cuda_dnn.cc:3459] Unable to load cuDNN DSO
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally

Note: Not all of the CUDA libraries were loaded. libcudnn.so could not be opened (LD_LIBRARY_PATH is empty), so cuDNN is unavailable to TensorFlow.
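
The empty LD_LIBRARY_PATH in the log suggests checking where, if anywhere, a cuDNN library is installed. The following is a minimal sketch using only the standard library. The helper names `find_shared_libs` and `candidate_dirs` are made up for illustration, and `/usr/local/cuda/lib64` is assumed to be the toolkit's default library directory.

```python
import os


def find_shared_libs(prefix, directories):
    """Return sorted paths of files whose name starts with `prefix`."""
    matches = []
    for directory in directories:
        if not directory or not os.path.isdir(directory):
            continue  # skip empty entries and directories that do not exist
        for name in os.listdir(directory):
            if name.startswith(prefix):
                matches.append(os.path.join(directory, name))
    return sorted(matches)


def candidate_dirs(environ=None):
    """Directories from LD_LIBRARY_PATH plus the assumed default CUDA lib dir."""
    environ = os.environ if environ is None else environ
    dirs = environ.get("LD_LIBRARY_PATH", "").split(":")
    dirs.append("/usr/local/cuda/lib64")  # assumption: default toolkit location
    return [d for d in dirs if d]


if __name__ == "__main__":
    hits = find_shared_libs("libcudnn.so", candidate_dirs())
    print(hits if hits else "libcudnn.so not found in the searched directories")
```

If the script finds nothing, cuDNN has most likely not been copied into the CUDA directories yet. This sketch only inspects the two places named above, not every path the dynamic loader may consult.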

>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

E tensorflow/stream_executor/cuda/cuda_driver.cc:509] failed call to cuInit: CUresult(-1)
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:158] retrieving CUDA diagnostic information for host: mr-dl8
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: mr-dl8
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: Invalid argument: expected %d.%d or %d.%d.%d form for driver version; got "1"
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:363] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 370.28 Thu Sep 1 19:45:04 PDT 2016
GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4)
"""
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: 370.28.0
Device mapping: no known devices.
I tensorflow/core/common_runtime/direct_session.cc:255] Device mapping:

Note: TensorFlow could not initialize the CUDA driver (the cuInit call failed), so no GPU device is available.
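
In the log above, the kernel module reports version 370.28 while the version reported by libcuda could not be parsed, which hints at a mismatch between the kernel driver and the user-space libraries. A small sketch for reading the kernel module version yourself follows. It assumes the "driver version file" is `/proc/driver/nvidia/version`, the usual location on Linux, and `parse_nvrm_version` is an illustrative name.

```python
import re


def parse_nvrm_version(text):
    """Extract the driver version (e.g. '370.28') from the NVRM line."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", text)
    return match.group(1) if match else None


if __name__ == "__main__":
    try:
        # Assumed location of the driver version file on Linux.
        with open("/proc/driver/nvidia/version") as handle:
            print(parse_nvrm_version(handle.read()))
    except IOError:
        print("NVIDIA kernel module not loaded (version file missing)")
```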

Solution:

We are going to install the NVIDIA 370 driver together with CUDA 8.0 and cuDNN 5.1.

  • Remove all NVIDIA drivers
    • sudo apt-get purge 'nvidia*'
  • Search for NVIDIA drivers in the package cache
    • apt-cache search nvidia
    • apt search nvidia
  • Add the graphics drivers PPA
    • sudo add-apt-repository ppa:graphics-drivers/ppa
  • Update the package index
    • sudo apt-get update
  • Search for the NVIDIA 370 driver
    • sudo apt search nvidia | grep 370
    • Note: The results should include the nvidia-370 package
  • Install the driver
    • sudo apt-get install nvidia-370 nvidia-settings
  • Reboot (required)
    • sudo reboot

After the reboot, nvidia-smi should report driver version 370:

$ nvidia-smi

Tue Jan 10 19:05:18 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 370.28                 Driver Version: 370.28                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN X (Pascal)    Off  | 0000:81:00.0      On |                  N/A |
| 23%   32C    P8    10W / 250W |  11590MiB / 12188MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1162    G   /usr/lib/xorg/Xorg                              60MiB |
+-----------------------------------------------------------------------------+
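
To check the driver version from a script instead of reading the table by eye, nvidia-smi can emit CSV through its `--query-gpu` and `--format` options. The sketch below assumes those options are available in your driver release. The names `QUERY_FIELDS`, `parse_gpu_rows` and `query_gpus` are illustrative, and the sample line in the comment is made up to show the expected shape rather than captured from a real run.

```python
import csv
import subprocess

# Fields requested from nvidia-smi; the parser maps CSV columns to these names.
QUERY_FIELDS = ["name", "driver_version", "memory.total"]


def parse_gpu_rows(csv_text):
    """Turn 'csv,noheader' output into a list of dicts, one per GPU."""
    rows = []
    lines = csv_text.strip().splitlines()
    for record in csv.reader(lines, skipinitialspace=True):
        if record:
            cells = [cell.strip() for cell in record]
            rows.append(dict(zip(QUERY_FIELDS, cells)))
    return rows


def query_gpus():
    """Run nvidia-smi and parse its CSV output (requires the NVIDIA driver)."""
    command = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader",
    ]
    output = subprocess.check_output(command).decode("utf-8", "replace")
    return parse_gpu_rows(output)


if __name__ == "__main__":
    # Illustrative shape of one line: "TITAN X (Pascal), 370.28, 12188 MiB"
    for gpu in query_gpus():
        print(gpu["name"], gpu["driver_version"], gpu["memory.total"])
```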

Disable unattended upgrades so that the system does not update the driver automatically.

$ sudo apt-get remove unattended-upgrades

Now install the CUDA Toolkit for Ubuntu 16.04 as follows:

$ wget https://developer.nvidia.com/compute/cuda/8.0/prod/local_installers/cuda_8.0.44_linux-run
$ bash ./cuda_8.0.44_linux-run -override

Note: When the installer asks, do not install the bundled driver. Install only the CUDA 8.0 toolkit and samples, at the default location.

Download and install cuDNN 5.1:

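  • Create an account on the NVIDIA website and download cudnn-8.0-linux-x64-v5.1.tgz
  • Extract the archive: tar -xzf cudnn-8.0-linux-x64-v5.1.tgz
  • Copy the cuDNN libraries and header into the CUDA directories as below (the commands assume the extracted directory is named cudnn; NVIDIA's archive usually extracts to a directory named cuda, so adjust the source paths if needed):
    • sudo cp cudnn/lib64/* /usr/local/cuda/lib64/
    • sudo cp cudnn/include/* /usr/local/cuda/include/
    • sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*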
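
After copying the files, one way to confirm which cuDNN release ended up in place is to read the version macros from the installed header. The sketch below assumes `cudnn.h` defines `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL`, as the 5.x headers do to my knowledge. The function name `parse_cudnn_version` is illustrative.

```python
import re


def parse_cudnn_version(header_text):
    """Read the CUDNN_MAJOR/MINOR/PATCHLEVEL #defines; return a tuple or None."""
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#define\s+" + macro + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None  # macro missing: not a header this sketch understands
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    # Path taken from the copy commands above.
    with open("/usr/local/cuda/include/cudnn.h") as handle:
        print(parse_cudnn_version(handle.read()))
```

For this guide, the first two components should come out as 5 and 1.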

Reboot the machine.

Install TensorFlow 0.12.1 (prebuilt binaries)

export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-0.12.1-cp27-none-linux_x86_64.whl
sudo pip install --upgrade $TF_BINARY_URL
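
The wheel URL above carries the tag cp27, which means the binary targets CPython 2.7, and pip will refuse it under a different interpreter. The following sketch splits a wheel filename into its components and compares the Python tag with the running interpreter. It handles only simple CPython tags such as cp27, and the helper names are illustrative.

```python
import sys


def parse_wheel_filename(filename):
    """Split 'name-version[-build]-python-abi-platform.whl' into a dict."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        del parts[2]  # drop the optional build tag
    if len(parts) != 5:
        raise ValueError("unexpected wheel filename: " + filename)
    keys = ("name", "version", "python_tag", "abi_tag", "platform_tag")
    return dict(zip(keys, parts))


def matches_interpreter(python_tag, version_info=None):
    """True if a tag like 'cp27' matches the given (or running) CPython."""
    version_info = sys.version_info if version_info is None else version_info
    return python_tag == "cp%d%d" % (version_info[0], version_info[1])


if __name__ == "__main__":
    url = ("https://storage.googleapis.com/tensorflow/linux/gpu/"
           "tensorflow_gpu-0.12.1-cp27-none-linux_x86_64.whl")
    info = parse_wheel_filename(url.rsplit("/", 1)[-1])
    print(info["python_tag"], matches_interpreter(info["python_tag"]))
```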

Test now:

>>> import tensorflow as tf

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally

>>> a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
>>> b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
>>> c = tf.matmul(a, b)
>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: TITAN X (Pascal)
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:81:00.0
Total memory: 11.90GiB
Free memory: 11.69GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: TITAN X (Pascal), pci bus id: 0000:81:00.0)
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: TITAN X (Pascal), pci bus id: 0000:81:00.0
I tensorflow/core/common_runtime/direct_session.cc:255] Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: TITAN X (Pascal), pci bus id: 0000:81:00.0

>>> print sess.run(c)

MatMul: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:827] MatMul: (MatMul)/job:localhost/replica:0/task:0/gpu:0
b: (Const): /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:827] b: (Const)/job:localhost/replica:0/task:0/gpu:0
a: (Const): /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:827] a: (Const)/job:localhost/replica:0/task:0/gpu:0
[[ 22.  28.]
 [ 49.  64.]]
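
To confirm that the GPU result is numerically right, the same product can be computed without TensorFlow. This is a plain-Python sketch. `reshape` and `matmul` are small illustrative helpers, not TensorFlow functions.

```python
def reshape(values, rows, cols):
    """Arrange a flat list into `rows` lists of `cols` items (row-major)."""
    if len(values) != rows * cols:
        raise ValueError("size mismatch")
    return [values[r * cols:(r + 1) * cols] for r in range(rows)]


def matmul(left, right):
    """Multiply two matrices given as lists of rows."""
    inner = len(right)
    if any(len(row) != inner for row in left):
        raise ValueError("inner dimensions do not match")
    return [
        [sum(row[k] * right[k][j] for k in range(inner))
         for j in range(len(right[0]))]
        for row in left
    ]


values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
a = reshape(values, 2, 3)  # same data and shape as tf.constant 'a'
b = reshape(values, 3, 2)  # same data and shape as tf.constant 'b'
c = matmul(a, b)
print(c)  # [[22.0, 28.0], [49.0, 64.0]]
```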

2 thoughts on “TensorFlow 0.12.1, CUDA 8.0 and cuDNN 5.1 on Ubuntu 16.04 with Titan X”

  1. In the text above:
    “Note: Do not install driver above and only install cuda 8.0 tool kit and sample at default location.”
    The instruction is not clear. Could you please be more specific?

    • The reason not to install the driver again is that you already have the 370.x driver, while the toolkit bundles an older one (367.x, I think). If you choose to install it, it will downgrade your NVIDIA driver, so skip it.
