Compiling the neon (Nervana Systems) deep learning library from source


Neon is Nervana Systems' open-source, Python-based framework and set of libraries for developing deep learning models. Neon is super fast, super powerful, and super easy to use!

github: https://github.com/NervanaSystems/neon

I built this source on my machine with the setup below:

  • OS: Ubuntu 16.04
  • CUDA: 8.0
  • NVIDIA driver (nvidia-smi): 375.20

Build Source:

  • $ git clone https://github.com/NervanaSystems/neon
  • $ cd neon
  • $ make

How to run:

  • $ pwd
    • Note: Make sure you are in the neon folder
  • neon$ . .venv/bin/activate
    • This activates the Python virtualenv session for you
    • You can run jupyter notebook here as
      • $ jupyter notebook
  • To close the active neon virtualenv session, do the following:
    • neon$ deactivate

Here are a few problems I hit, along with their solutions:

PROBLEM [1]

c++: error: unrecognized command line option ‘-Wdate-time’
c++: error: unrecognized command line option ‘-fstack-protector-strong’
c++: error: unrecognized command line option ‘-Wdate-time’
c++: error: unrecognized command line option ‘-fstack-protector-strong’
error: command ‘c++’ failed with exit status 1


Command "/home/avkash/toolkit/neon/.venv2/bin/python2.7 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-P52OJy/pycuda/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-71rowS-record/install-record.txt --single-version-externally-managed --compile --install-headers /home/avkash/toolkit/neon/.venv2/include/site/python2.7/pycuda" failed with error code 1 in /tmp/pip-build-P52OJy/pycuda/
Makefile:116: recipe for target '.venv2/bin/activate' failed
make: *** [.venv2/bin/activate] Error 1

SOLUTION:

  • You need to upgrade gcc/g++ to 5.0 or above; older compilers do not recognize the flags above
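The version requirement above can be checked before kicking off the build. A minimal sketch (hypothetical helper, comparing dotted version strings such as the output of `gcc -dumpversion`):

```python
# Hypothetical helper: decide whether an installed gcc/g++ version (e.g. the
# output of `gcc -dumpversion`) meets the minimum the build needs.
def needs_upgrade(installed, minimum="5.0"):
    # Compare dotted version strings numerically, not lexically,
    # so that "4.10" is correctly treated as newer than "4.9".
    as_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return as_tuple(installed) < as_tuple(minimum)

print(needs_upgrade("4.8.4"))  # True  -> upgrade before building
print(needs_upgrade("5.4.0"))  # False -> good to go
```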

PROBLEM [2]

unable to execute ‘clang++’: No such file or directory
unable to execute ‘clang++’: No such file or directory
unable to execute ‘clang++’: No such file or directory


Command "/home/avkash/toolkit/neon/.venv2/bin/python2.7 -u -c "import setuptools, tokenize;__file__='/tmp/pip-U3oScE-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-le8M0S-record/install-record.txt --single-version-externally-managed --compile --install-headers /home/avkash/toolkit/neon/.venv2/include/site/python2.7/nervana-aeon" failed with error code 1 in /tmp/pip-U3oScE-build/
Makefile:157: recipe for target 'aeon_install' failed

SOLUTION:

You need to install clang to solve this problem:
$ sudo apt-get install clang

PROBLEM [3]

loader/src/util.hpp:24:10: fatal error: ‘sox.h’ file not found
#include <sox.h>

SOLUTION:

You need to install the SoX libraries:

  • $ sudo apt-get install libsox-fmt-all libsox-dev sox

Successful Build:

neon$ make

Installed /home/avkash/toolkit/neon
Processing dependencies for neon==1.7.0
Finished processing dependencies for neon==1.7.0

…..
make[1]: Entering directory ‘/home/avkash/toolkit/neon/loader’
make[1]: ‘bin/loader.so’ is up to date.
make[1]: Leaving directory ‘/home/avkash/toolkit/neon/loader’

Have fun!!!

 

DS101: Distribution types and classes of problems

The "binomial" and "multinomial" distributions work only with CLASSIFICATION problems; all other distributions are for REGRESSION problems.

Here is a generic table of common distribution types:

Distribution Type – Problem Class
  • Binomial – Classification
  • Multinomial – Classification
  • Bernoulli – Regression
  • Gaussian – Regression
  • Poisson – Regression
  • Gamma – Regression
  • Tweedie – Regression
  • Laplace – Regression
  • Quantile – Regression
  • Huber – Regression

So if your response column is numeric and you choose binomial or multinomial, you will get an error. Your response column must be an enum to use the binomial or multinomial distribution. Convert the numeric column to ENUM first and then run the algorithm to build a classification model.
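The table above is easy to encode. A small sketch (pure Python, entries follow the table as given) that looks up the problem class and mirrors the enum-response rule:

```python
# The distribution table above as a lookup (entries follow the post's table).
DISTRIBUTION_CLASS = {
    "binomial": "classification",
    "multinomial": "classification",
    "bernoulli": "regression",
    "gaussian": "regression",
    "poisson": "regression",
    "gamma": "regression",
    "tweedie": "regression",
    "laplace": "regression",
    "quantile": "regression",
    "huber": "regression",
}

def check_response(distribution, response_type):
    # Mirrors the rule above: classification distributions need an enum response.
    problem = DISTRIBUTION_CLASS[distribution]
    if problem == "classification" and response_type != "enum":
        raise ValueError("%s needs an enum response column; "
                         "convert the numeric column to ENUM first" % distribution)
    return problem

print(check_response("gaussian", "numeric"))  # regression
print(check_response("binomial", "enum"))     # classification
```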

Hot reads for this week in machine learning and deep learning


December – 27th – 31st December

December – 19th – 26th December

December – 12th – 18th December

November – 5th – 11th December

November – 28th – 4th December

MIT Tech Review – Python Image Research

November – 21st – 27th

November – 14th – 20th

Shoot like an artist – Using imagination, artificial intelligence, Tensorflow (& GPU)


After I got OpenCV, MXNet, and TensorFlow working with CUDA, I went looking for a TensorFlow implementation of the "A Neural Algorithm of Artistic Style" research paper.

I found a TensorFlow implementation by Anish for the above paper and started from there.

Why Tensorflow:

  • TensorFlow supports automatic differentiation and has a clean API
  • Research paper steps are translated into code here
  • It has support for GPU (CUDA) so I can get work done faster (Time is $$)

 

Pre-requisite:

  • Ubuntu 16.04
  • Python 2.7
  • Tensorflow with GPU support

Commands:

  • Command for help:
    • $ python neural_style.py --help
  • Basic command:
    • $ python neural_style.py --content your_content.jpg --style your_style.jpg --output output_file_name.png --iteration 500
  • You can also feed a previous output back in as the style image to get improved results:
    • $ python neural_style.py --content your_content.jpg --style your_previous_output.png --output new_output_file_name.png --iteration 500
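The feed-the-output-back-in trick above is easy to script. A hedged sketch that only builds the command lines (script name and flags are the ones used above; the `output_N.png` naming is made up for illustration):

```python
# Sketch: build neural_style.py command lines for repeated refinement,
# where each pass reuses the previous output as the next style image.
def style_commands(content, style, passes=3, iterations=500):
    commands = []
    for i in range(passes):
        output = "output_%d.png" % (i + 1)   # hypothetical naming scheme
        commands.append(["python", "neural_style.py",
                         "--content", content,
                         "--style", style,
                         "--output", output,
                         "--iteration", str(iterations)])
        style = output  # this pass's output becomes the next pass's style
    return commands

for cmd in style_commands("your_content.jpg", "your_style.jpg"):
    print(" ".join(cmd))
```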

Few things I found:

  • If both the content and style images are PNGs, you may get the following error:
    • ValueError: Dimensions must be equal, but are 4 and 3
  • To solve it, use JPGs for both the content and style images.
  • If your machine has less memory, keep both the content and style images under 480×480.
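The "4 and 3" in that error are channel counts: PNGs often carry an alpha channel (RGBA) while JPGs are plain RGB. Re-saving as JPG fixes it; equivalently you could strip the alpha channel yourself. A toy sketch on nested pixel lists (not the script's actual image loader):

```python
# Drop a PNG-style alpha channel: rows of (R, G, B, A) -> rows of (R, G, B).
def drop_alpha(image):
    return [[pixel[:3] for pixel in row] for row in image]

rgba_row = [(255, 0, 0, 128), (0, 255, 0, 255)]
print(drop_alpha([rgba_row]))  # [[(255, 0, 0), (0, 255, 0)]]
```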

Example:

Top left after 500 iterations, top right after 2000 iterations, and bottom image after 3500 iterations:

 

Tensorflow with CUDA/cuDNN on Ubuntu 16.04


Environment:

  • OS: Ubuntu 16.04
  • Python 2.7
  • CUDA 8.0.27
  • CuDNN v5.1
  • Note: For TensorFlow with GPU support, both NVIDIA's CUDA Toolkit (>= 7.0) and cuDNN (>= v3) need to be installed.

GPU verification:

$ nvidia-smi
Tue Nov 22 04:28:59 2016
+------------------------------------------------------------------------------+
| NVIDIA-SMI 370.28                 Driver Version: 370.28                     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID K520           Off  | 0000:00:03.0     Off |                  N/A |
| N/A   43C    P0     1W / 125W |      0MiB /  4036MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+------------------------------------------------------------------------------+
| Processes:                                                       GPU Memory  |
|  GPU       PID  Type  Process name                               Usage       |
|==============================================================================|
|  No running processes found                                                  |
+------------------------------------------------------------------------------+

CUDA Toolkit verification:

$ cat /usr/local/cuda/version.txt
CUDA Version 8.0.27

CuDNN Verification:

Download cudnn-8.0-linux-x64-v5.1.tgz from Nvidia developer site.

  • $ tar -xvzf cudnn-8.0-linux-x64-v5.1.tgz
  • Note: the archive extracts into a local cuda/ folder:
  • cuda
    • include/
      • cudnn.h
    • lib64/
      • libcudnn.so -> libcudnn.so.5*
      • libcudnn.so.5 -> libcudnn.so.5.1.5*
      • libcudnn.so.5.1.5*

You just need to merge the cuDNN cudnn.h and lib64 files into the CUDA toolkit at /usr/local/cuda as below:

sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*

Setting the CUDA binaries into the path:

export PATH=${PATH}:/usr/local/cuda/bin
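The same idea, if you are launching a process from Python instead of a shell: prepend the CUDA directories to its environment. A sketch (the LD_LIBRARY_PATH entry is the one the CUDA install notes also ask for):

```python
import os

# Sketch: extend an environment dict with the CUDA bin and lib64 paths,
# e.g. for passing to subprocess.Popen(..., env=...).
def with_cuda_paths(env, cuda_home="/usr/local/cuda"):
    env = dict(env)  # do not mutate the caller's mapping
    env["PATH"] = env.get("PATH", "") + os.pathsep + os.path.join(cuda_home, "bin")
    env["LD_LIBRARY_PATH"] = (os.path.join(cuda_home, "lib64") + os.pathsep +
                              env.get("LD_LIBRARY_PATH", ""))
    return env

env = with_cuda_paths({"PATH": "/usr/bin"})
print(env["PATH"])             # on Linux: /usr/bin:/usr/local/cuda/bin
print(env["LD_LIBRARY_PATH"])
```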

Tensorflow Install:

$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.11.0-cp27-none-linux_x86_64.whl
$ sudo pip install --upgrade $TF_BINARY_URL

Tensorflow Verification:

>>> import tensorflow as tf

I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
>>>
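Those dso_loader lines are the quickest sign that GPU support is wired up. If you want to check them programmatically, a small sketch that pulls the opened library names out of such log text:

```python
import re

# Sample log lines, as printed by TensorFlow's dso_loader at import time.
LOG = """\
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
"""

def opened_cuda_libs(log_text):
    # Pull the shared-library names out of the "successfully opened" lines.
    return re.findall(r"successfully opened CUDA library (\S+) locally", log_text)

print(opened_cuda_libs(LOG))  # ['libcublas.so', 'libcudnn.so', 'libcurand.so']
```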

Have fun !!

Building opencv with Nvidia CUDA 8.0 extensions on Ubuntu 16.04

cuda

Note: To build OpenCV without CUDA, just follow this blog:

https://aichamp.wordpress.com/2016/11/11/compiling-opencv-in-ubuntu-16-04-with-gcc-4-9/

Here are the OpenCV libs without CUDA/GPU:

$ pkg-config --libs opencv

-L/usr/local/lib -lopencv_stitching -lopencv_superres -lopencv_videostab -lopencv_aruco -lopencv_bgsegm -lopencv_bioinspired -lopencv_ccalib -lopencv_cvv -lopencv_dnn -lopencv_dpm -lopencv_fuzzy -lopencv_hdf -lopencv_line_descriptor -lopencv_optflow -lopencv_plot -lopencv_reg -lopencv_saliency -lopencv_stereo -lopencv_structured_light -lopencv_phase_unwrapping -lopencv_rgbd -lopencv_surface_matching -lopencv_tracking -lopencv_datasets -lopencv_text -lopencv_face -lopencv_xfeatures2d -lopencv_shape -lopencv_video -lopencv_ximgproc -lopencv_calib3d -lopencv_features2d -lopencv_flann -lopencv_xobjdetect -lopencv_objdetect -lopencv_ml -lopencv_xphoto -lopencv_highgui -lopencv_videoio -lopencv_imgcodecs -lopencv_photo -lopencv_imgproc -lopencv_core

Get NVIDIA CUDA installed

Getting Source:

Get opencv-master and opencv_contrib from git and keep them in the same path:

Build Process (Step 1) – Config Build:

  • $ mkdir build;cd build
  • cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local -D WITH_TBB=ON -D WITH_V4L=ON -D WITH_QT=ON -D INSTALL_C_EXAMPLES=ON -D INSTALL_PYTHON_EXAMPLES=ON -D OPENCV_EXTRA_MODULES_PATH=/mnt/avkash/opencv_contrib/modules -D BUILD_EXAMPLES=ON -DCMAKE_C_COMPILER=gcc-4.9 -DCMAKE_CXX_COMPILER=g++-4.9 -DCUDA_CUDA_LIBRARY=/usr/local/cuda -DWITH_CUDA=ON -DENABLE_FAST_MATH=1 -D CUDA_FAST_MATH=1 -D WITH_CUBLAS=1 -D WITH_OPENGL=ON  ..

Note: In the above step, if the CUDA configuration is correct you will see the following:

  • CUDA detected: 8.0
-- CUDA NVCC target flags: -gencode;arch=compute_20,code=sm_20;-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-D_FORCE_INLINES

Error if the CUDA path is not set correctly:

CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
CUDA_CUDA_LIBRARY (ADVANCED)

Verify CUDA path (-DCUDA_CUDA_LIBRARY=/usr/local/cuda)
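Before re-running cmake, it is worth sanity-checking that the expected CUDA directories actually exist. A hypothetical helper (adjust cuda_home to your install):

```python
import os

# Check the directories cmake expects under the CUDA install root.
def cuda_dirs_present(cuda_home="/usr/local/cuda"):
    return {name: os.path.isdir(os.path.join(cuda_home, name))
            for name in ("bin", "include", "lib64")}

print(cuda_dirs_present())  # e.g. {'bin': True, 'include': True, 'lib64': True}
```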

Build Process (Step 2) – Compiling the code:

  • Make sure you are in opencv-master/build folder and step-1 was successful.
  • $ make -j8

Note: It will take about 1-2 hours and about 10GB space so make sure you have enough patience and space 🙂

Possible Errors:

[One] If your build fails due to an "opencv_cudaimgproc.dir" missing error

make[2]: *** [modules/cudaimgproc/CMakeFiles/cuda_compile.dir/src/cuda/cuda_compile_generated_gftt.cu.o] Error 1
make[2]: *** Waiting for unfinished jobs….
Scanning dependencies of target opencv_cudafeatures2d
[ 72%] Building CXX object modules/cudafeatures2d/CMakeFiles/opencv_cudafeatures2d.dir/src/orb.cpp.o
[ 72%] Building CXX object modules/cudafeatures2d/CMakeFiles/opencv_cudafeatures2d.dir/src/feature2d_async.cpp.o
[ 72%] Building CXX object modules/cudafeatures2d/CMakeFiles/opencv_cudafeatures2d.dir/src/brute_force_matcher.cpp.o
[ 72%] Building CXX object modules/cudafeatures2d/CMakeFiles/opencv_cudafeatures2d.dir/src/fast.cpp.o
[ 72%] Linking CXX shared library ../../lib/libopencv_cudafeatures2d.so
[ 72%] Built target opencv_cudafeatures2d
Scanning dependencies of target opencv_test_cudafeatures2d
Scanning dependencies of target opencv_perf_cudafeatures2d
[ 72%] Building CXX object modules/cudafeatures2d/CMakeFiles/opencv_test_cudafeatures2d.dir/test/test_features2d.cpp.o
[ 72%] Building CXX object modules/cudafeatures2d/CMakeFiles/opencv_test_cudafeatures2d.dir/test/test_main.cpp.o
[ 72%] Building CXX object modules/cudafeatures2d/CMakeFiles/opencv_perf_cudafeatures2d.dir/perf/perf_features2d.cpp.o
[ 72%] Building CXX object modules/cudafeatures2d/CMakeFiles/opencv_perf_cudafeatures2d.dir/perf/perf_main.cpp.o
[ 72%] Linking CXX executable ../../bin/opencv_perf_cudafeatures2d
CMakeFiles/Makefile2:3590: recipe for target ‘modules/cudaimgproc/CMakeFiles/opencv_cudaimgproc.dir/all’ failed
make[1]: *** [modules/cudaimgproc/CMakeFiles/opencv_cudaimgproc.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs….
[ 72%] Built target opencv_perf_cudafeatures2d
[ 72%] Linking CXX executable ../../bin/opencv_test_cudafeatures2d
[ 72%] Built target opencv_test_cudafeatures2d
Makefile:160: recipe for target ‘all’ failed
make: *** [all] Error 2

Solution:

  • Visit: https://github.com/opencv/opencv/issues/6632
  • Steps:
    • $ git clone https://github.com/thrust/thrust.git
    • $ cp -r thrust/thrust /usr/local/cuda/include
    • This replaces the Thrust headers under cuda/include with the updated ones
  • Rebuild code again
    • $ make -j8

[Two] fatal error: stdlib.h: No such file or directory

You may get the following error during the build:

/usr/include/c++/6/cstdlib:75:25: fatal error: stdlib.h: No such file or directory
#include_next <stdlib.h>
^
compilation terminated.

Solution:  Add the following parameter with cmake builder:

-DENABLE_PRECOMPILED_HEADERS=OFF

Updated cmake command as below:

  • cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local -D WITH_TBB=ON -D WITH_V4L=ON -D WITH_QT=ON -D INSTALL_C_EXAMPLES=ON -D INSTALL_PYTHON_EXAMPLES=ON -D OPENCV_EXTRA_MODULES_PATH=/mnt/avkash/opencv_contrib/modules -D BUILD_EXAMPLES=ON -DCMAKE_C_COMPILER=gcc-4.9 -DCMAKE_CXX_COMPILER=g++-4.9 -DCUDA_CUDA_LIBRARY=/usr/local/cuda -DWITH_CUDA=ON -DENABLE_FAST_MATH=1 -D CUDA_FAST_MATH=1 -D WITH_CUBLAS=1 -D WITH_OPENGL=ON  -DENABLE_PRECOMPILED_HEADERS=OFF ..

Success Story:

[100%] Built target tutorial_imageSegmentation
[100%] Linking CXX executable ../../bin/cpp-tutorial-pnp_registration
[100%] Built target cpp-tutorial-pnp_registration
[100%] Linking CXX executable ../../bin/cpp-example-stitching_detailed
[100%] Built target example_stitching_detailed
[100%] Linking CXX shared module ../../lib/cv2.so
[100%] Built target opencv_python2

Build installer:

opencv-master/build$ sudo make install

Test OpenCV with CUDA in Python:

  • >>> import cv2
  • >>> print(cv2.__version__)
  • >>> print(cv2.cuda)
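A slightly more defensive version of the check above, guarded so it also runs where cv2 is missing (getCudaEnabledDeviceCount is probed via hasattr since it is only present in CUDA-enabled builds):

```python
# Report the OpenCV version and, if the CUDA module was compiled in,
# the number of CUDA-capable devices. Safe on machines without cv2.
def opencv_cuda_report():
    try:
        import cv2
    except ImportError:
        return "cv2 not installed"
    device_count = None
    cuda = getattr(cv2, "cuda", None)
    if cuda is not None and hasattr(cuda, "getCudaEnabledDeviceCount"):
        device_count = cuda.getCudaEnabledDeviceCount()
    return "OpenCV %s, CUDA devices: %s" % (cv2.__version__, device_count)

print(opencv_cuda_report())
```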

Compiling & Installing OpenBLAS in Ubuntu 16.04

Pre-requisite:

  • $ sudo apt-get install gfortran

Getting OpenBLAS Source:

  • $ git clone https://github.com/xianyi/OpenBLAS.git
Building Source:

  • $ cd OpenBLAS
  • $ make FC=gfortran

make[1]: Leaving directory '/mnt/avkash/OpenBLAS/exports'
OpenBLAS build complete. (BLAS CBLAS LAPACK LAPACKE)
OS … Linux
Architecture … x86_64
BINARY … 64bit
C compiler … GCC (command line : gcc)
Fortran compiler … GFORTRAN (command line : gfortran)
Library Name … libopenblas_sandybridgep-r0.2.20.dev.a (Multi threaded; Max num-threads is 8)
To install the library, you can run "make PREFIX=/path/to/your/installation install".

Installation:

  • $ sudo make PREFIX=/usr/local install

make -j 8 -f Makefile.install install
make[1]: Entering directory '/mnt/avkash/OpenBLAS'
Generating openblas_config.h in /usr/local/include
Generating f77blas.h in /usr/local/include
Generating cblas.h in /usr/local/include
Copying LAPACKE header files to /usr/local/include
Copying the static library to /usr/local/lib
Copying the shared library to /usr/local/lib
Generating OpenBLASConfig.cmake in /usr/local/lib/cmake/openblas
Generating OpenBLASConfigVersion.cmake in /usr/local/lib/cmake/openblas
Install OK!
make[1]: Leaving directory ‘/mnt/avkash/OpenBLAS’

Error: If you hit the following error: /usr/bin/ld: cannot find -lgfortran

gcc -O2 -DMAX_STACK_ALLOC=2048 -Wall -m64 -DF_INTERFACE_GFORT -fPIC -DSMP_SERVER -DNO_WARMUP -DMAX_CPU_NUMBER=8 -DASMNAME= -DASMFNAME=_ -DNAME=_ -DCNAME= -DCHAR_NAME=\"_\" -DCHAR_CNAME=\"\" -DNO_AFFINITY -I.. -shared -o ../libopenblas_sandybridgep-r0.2.20.dev.so \
-Wl,--whole-archive ../libopenblas_sandybridgep-r0.2.20.dev.a -Wl,--no-whole-archive \
-Wl,-soname,libopenblas.so.0 -lm -lpthread -lgfortran -lm -lpthread -lgfortran
/usr/bin/ld: cannot find -lgfortran
collect2: error: ld returned 1 exit status
Makefile:133: recipe for target ‘../libopenblas_sandybridgep-r0.2.20.dev.so’ failed
make[1]: *** [../libopenblas_sandybridgep-r0.2.20.dev.so] Error 1
make[1]: Leaving directory ‘/home/ubuntu/toolkit/OpenBLAS/exports’
Makefile:85: recipe for target ‘shared’ failed
make: *** [shared] Error 2

Solution: You just need to use the NO_LAPACK=1 parameter along with make.

$ make FC=gfortran NO_LAPACK=1

That's it!!

Installing the NVIDIA driver and toolkit in Ubuntu 16.04 with toolkit samples

Pre-requisite:

  • Make sure you have gcc and g++ on your machine
  • A CUDA-capable graphics card

Installation:

Get CUDA 8.0 from NVIDIA:

 

Download:

  • Download the CUDA 8.0 runfile installer from the NVIDIA developer site
  • You will get the file as cuda_8.0.44_linux-run

Execution:

Run $ bash ./cuda_8.0.44_linux-run


Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 367.48?
(y)es/(n)o/(q)uit: y
Do you want to install the OpenGL libraries?
(y)es/(n)o/(q)uit [ default is yes ]: y
Do you want to run nvidia-xconfig?
This will update the system X configuration file so that the NVIDIA X driver
is used. The pre-existing X configuration file will be backed up.
This option should not be used on systems that require a custom
X configuration, such as systems with multiple GPU vendors.
(y)es/(n)o/(q)uit [ default is no ]: y
Install the CUDA 8.0 Toolkit?
(y)es/(n)o/(q)uit: y
Enter Toolkit Location
[ default is /usr/local/cuda-8.0 ]: /usr/local/cuda
Cannot install toolkit in /usr/local/cuda.
Enter Toolkit Location
[ default is /usr/local/cuda-8.0 ]:
Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y
Install the CUDA 8.0 Samples?
(y)es/(n)o/(q)uit: y
Enter CUDA Samples Location
[ default is /home/ubuntu ]: /mnt/avkash/cuda-samples
Installing the NVIDIA display driver…
Error: unsupported compiler: 5.4.1. Use --override to override this check.
Installing the CUDA Samples in /mnt/avkash/cuda-samples …
sh: 1: /usr/local/cuda-8.0/bin/cuda-install-samples-8.0.sh: not found
chown: failed to get attributes of ‘/mnt/avkash/cuda-samples’: No such file or directory
===========
= Summary =
===========
Driver: Installed
Toolkit: Installation Failed. Using unsupported Compiler.
Samples: Installed in /mnt/avkash/cuda-samples
To uninstall the NVIDIA Driver, run nvidia-uninstall
Logfile is /tmp/cuda_install_6246.log

 

Problem:

  • The toolkit installation failed due to a compiler issue.
  • Looking at the log, the gcc/g++ compilers are 5.4.x, and versions over 5.3 are not supported.

The solution is to downgrade gcc/g++ to 4.9:

Solution: https://aichamp.wordpress.com/2016/11/10/downgrading-gcc-from-5-4-to-4-9-in-ubuntu-16-04/

Try again:

Do you accept the previously read EULA?
accept/decline/quit: accept
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 367.48?
(y)es/(n)o/(q)uit: n
Install the CUDA 8.0 Toolkit?
(y)es/(n)o/(q)uit: y
Enter Toolkit Location
[ default is /usr/local/cuda-8.0 ]:
Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y
Install the CUDA 8.0 Samples?
(y)es/(n)o/(q)uit: y
Enter CUDA Samples Location
[ default is /home/ubuntu ]: /mnt/avkash/cuda-samples
Installing the CUDA Toolkit in /usr/local/cuda-8.0 …
Installing the CUDA Samples in /mnt/avkash/cuda-samples …
Copying samples to /mnt/avkash/cuda-samples/NVIDIA_CUDA-8.0_Samples now…
Finished copying samples.
===========
= Summary =
===========
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-8.0
Samples: Installed in /mnt/avkash/cuda-samples
Please make sure that
- PATH includes /usr/local/cuda-8.0/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-8.0/lib64, or, add /usr/local/cuda-8.0/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-8.0/bin
Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-8.0/doc/pdf for detailed information on setting up CUDA.
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 361.00 is required for CUDA 8.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run -silent -driver

 

Cuda Samples:

Compilation:

  • $/mnt/avkash/cuda-samples/NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery$ make

Listing:

  • $/mnt/avkash/cuda-samples/NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery$ ll
    total 640
    drwxr-xr-x 2 root root 4096 Nov 21 01:24 ./
    drwxr-xr-x 7 root root 4096 Nov 21 01:17 ../
    -rwxr-xr-x 1 root root 581960 Nov 21 01:24 deviceQuery*
    -rw-r--r-- 1 root root 12174 Nov 21 01:17 deviceQuery.cpp
    -rw-r--r-- 1 root root 21264 Nov 21 01:24 deviceQuery.o
    -rw-r--r-- 1 root root 9077 Nov 21 01:17 Makefile
    -rw-r--r-- 1 root root 1737 Nov 21 01:17 NsightEclipse.xml
    -rw-r--r-- 1 root root 168 Nov 21 01:17 readme.txt

Execution:

  • $/mnt/avkash/cuda-samples/NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery$ ./deviceQuery
    ./deviceQuery Starting…
    CUDA Device Query (Runtime API) version (CUDART static linking)
    Detected 1 CUDA Capable device(s)
    Device 0: “GRID K520”
    CUDA Driver Version / Runtime Version 8.0 / 8.0
    CUDA Capability Major/Minor version number: 3.0
    Total amount of global memory: 4036 MBytes (4232052736 bytes)
    ( 8) Multiprocessors, (192) CUDA Cores/MP: 1536 CUDA Cores
    GPU Max Clock rate: 797 MHz (0.80 GHz)
    Memory Clock rate: 2500 Mhz
    Memory Bus Width: 256-bit
    L2 Cache Size: 524288 bytes
    Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
    Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
    Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
    Total amount of constant memory: 65536 bytes
    Total amount of shared memory per block: 49152 bytes
    Total number of registers available per block: 65536
    Warp size: 32
    Maximum number of threads per multiprocessor: 2048
    Maximum number of threads per block: 1024
    Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
    Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
    Maximum memory pitch: 2147483647 bytes
    Texture alignment: 512 bytes
    Concurrent copy and kernel execution: Yes with 2 copy engine(s)
    Run time limit on kernels: No
    Integrated GPU sharing Host Memory: No
    Support host page-locked memory mapping: Yes
    Alignment requirement for Surfaces: Yes
    Device has ECC support: Disabled
    Device supports Unified Addressing (UVA): Yes
    Device PCI Domain ID / Bus ID / location ID: 0 / 0 / 3
    Compute Mode:
    < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GRID K520
    Result = PASS