CUDA Musing: TensorFlow 0.8 on Jetson TK1

This post gives updated instructions on how to build TensorFlow 0.8 on the Jetson TK1, now that NVIDIA has released a new compiler that can handle variadic templates without internal compiler errors.

If you just want to install the prebuilt wheel, here is a direct link: tensorflow-0.8.0-cp27-none-linux_armv7l.whl

I am going to use the same approach as in the previous post: use the CUDA 6.5 runtime and cuDNN v2, but compile the code with the newer 7.0 compiler.

Install the 7.0.76 compiler:

Before starting, you need to download the new compiler. NVIDIA does not make it easy to find the link (they would like you to use JetPack, but I don't like to reformat a working system unless absolutely needed), but you can download the .deb package directly on your Jetson with:

wget http://developer.download.nvidia.com/embedded/L4T/r24_Release_v1.0/CUDA/cuda-repo-l4t-7-0-local_7.0-76_armhf.deb

Now we can install it as usual:

sudo dpkg -i cuda-repo-l4t-7-0-local_7.0-76_armhf.deb
sudo apt-get update
sudo apt-get install cuda-toolkit-7-0

At this point we need to restore the standard 6.5 toolchain as the default one (we only want the 7.0 compiler to generate the object files), since the current driver on the Jetson TK1 only works with the 6.5 runtime. Go to the /usr/local directory, remove the cuda symlink (which now points to cuda-7.0), and create a new one pointing to cuda-6.5:

ubuntu@tegra-ubuntu:/usr/local$ sudo rm cuda
ubuntu@tegra-ubuntu:/usr/local$ sudo ln -s cuda-6.5/ cuda

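The default nvcc should now be the 6.5 one, while the 7.0 compiler remains available under /usr/local/cuda-7.0. You should see this output: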

ubuntu@tegra-ubuntu:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Built on Fri_Dec_12_11:12:07_CST_2014
Cuda compilation tools, release 6.5, V6.5.35

ubuntu@tegra-ubuntu:~$ /usr/local/cuda-7.0/bin/nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Built on Mon_Feb_22_15:38:26_CST_2016
Cuda compilation tools, release 7.0, V7.0.74
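If you prefer to script this check instead of reading the nvcc -V output, here is a minimal C++17 sketch. The helper name cudaSymlinkPointsTo is hypothetical (it is not part of any NVIDIA tool), and it only assumes the layout used above, where /usr/local/cuda is a symlink created with "ln -s cuda-6.5/ cuda".

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: returns true if `link` is a symlink whose target's last
// path component equals `expected_dir` (for example "cuda-6.5").
bool cudaSymlinkPointsTo(const fs::path& link, const std::string& expected_dir) {
  std::error_code ec;
  fs::file_status st = fs::symlink_status(link, ec);
  if (ec || !fs::is_symlink(st)) return false;

  fs::path target = fs::read_symlink(link, ec);
  if (ec) return false;

  // `ln -s cuda-6.5/ cuda` stores "cuda-6.5/"; a trailing separator gives an
  // empty filename(), so strip it before comparing.
  std::string t = target.string();
  while (t.size() > 1 && t.back() == '/') t.pop_back();
  return fs::path(t).filename().string() == expected_dir;
}
```

For example, cudaSymlinkPointsTo("/usr/local/cuda", "cuda-6.5") should return true after the ln command above. The trailing slash is stripped because std::filesystem treats "cuda-6.5/" as having an empty filename.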

Install protobuf and Bazel:

For protobuf, you can follow the instructions from the previous blog post (the only change is the location of protobuf-java-3.0.0-beta-x.jar, which is now in the java/core/target subdirectory).

The procedure for Bazel is also similar; the only change is the version. TF 0.8 requires Bazel 0.1.4, so after cloning Bazel you need to check out the matching tag:

$ git clone https://github.com/bazelbuild/bazel.git
$ cd bazel
$ git checkout tags/0.1.4

Install TensorFlow 0.8:

The first thing to do is to check out the source code and select the proper version:

$ git clone --recurse-submodules https://github.com/tensorflow/tensorflow
$ cd tensorflow
$ git checkout r0.8

TensorFlow expects a 64-bit system, so we need to change all references from lib64 to lib. We can find all the files containing the string and apply the changes with these commands:

$ cd tensorflow
$ grep -Rl "lib64" | xargs sed -i 's/lib64/lib/g'
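The sed command is a purely textual, global substitution. A minimal C++ sketch of the same s/lib64/lib/g behaviour on a single string (replaceAll is a hypothetical helper, not part of the build) shows that every occurrence is rewritten, including any that appear inside longer names, so it can be worth checking the result with git diff.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper mirroring sed 's/from/to/g' for a literal pattern:
// every non-overlapping occurrence of `from` is replaced, left to right.
std::string replaceAll(std::string text, const std::string& from, const std::string& to) {
  if (from.empty()) return text;  // nothing sensible to replace
  std::string::size_type pos = 0;
  while ((pos = text.find(from, pos)) != std::string::npos) {
    text.replace(pos, from.size(), to);
    pos += to.size();  // resume after the replacement
  }
  return text;
}
```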

TensorFlow officially supports CUDA devices with compute capability 3.5 and 5.2, but the GPU on the Jetson TK1 has compute capability 3.2.

We can target it through TensorFlow's unofficial settings by running configure with the TF_UNOFFICIAL_SETTING variable set.

When prompted, specify that you only want to build for compute capability 3.2.

ubuntu@tegra-ubuntu:~/tensorflow$ TF_UNOFFICIAL_SETTING=1 ./configure
Please specify the location of python. [Default is /usr/bin/python]: 
Do you wish to build TensorFlow with GPU support? [y/N] y
GPU support will be enabled for TensorFlow
Please specify which gcc nvcc should use as the host compiler. [Default is /usr/bin/gcc]: 
Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 
Please specify the location where CUDA  toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 
Please specify the Cudnn version you want to use. [Leave empty to use system default]: 
Please specify the location where cuDNN  library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]: 3.2
Setting up Cuda include
Setting up Cuda lib
Setting up Cuda bin
Setting up Cuda nvvm
Configuration finished
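Why the default "3.5,5.2" is not enough: the sketch below models the CUDA binary-compatibility rules as I understand them from NVIDIA's documentation (machine code built for X.y runs only on X.z with z >= y; PTX built for X.y can be JIT-compiled for any capability >= X.y). The struct and function names are hypothetical and this is not TensorFlow code. Under this model, nothing built for 3.5 or 5.2 can run on the TK1's 3.2 GPU.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of CUDA compatibility rules, not TensorFlow code.
struct ComputeCapability {
  int major;
  int minor;
};

// Machine code (SASS) built for X.y runs only on devices X.z with z >= y.
bool sassRunsOn(ComputeCapability target, ComputeCapability device) {
  return target.major == device.major && target.minor <= device.minor;
}

// PTX built for X.y can be JIT-compiled for any device with capability >= X.y.
bool ptxRunsOn(ComputeCapability target, ComputeCapability device) {
  return device.major > target.major ||
         (device.major == target.major && device.minor >= target.minor);
}

// True if at least one requested capability yields code the device can run.
bool buildCoversDevice(const std::vector<ComputeCapability>& targets,
                       ComputeCapability device) {
  for (const auto& t : targets) {
    if (sassRunsOn(t, device) || ptxRunsOn(t, device)) return true;
  }
  return false;
}
```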

Now that the initial setup is done, it is time to change the compiler used by Bazel, replacing the bin and nvvm directories in third_party/gpus/cuda with the ones from CUDA 7.0:

ubuntu@tegra-ubuntu:~/tensorflow$ cd third_party/gpus/cuda/
ubuntu@tegra-ubuntu:~/tensorflow/third_party/gpus/cuda$ rm -fr bin nvvm
ubuntu@tegra-ubuntu:~/tensorflow/third_party/gpus/cuda$ cp -R /usr/local/cuda-7.0/bin/ bin
ubuntu@tegra-ubuntu:~/tensorflow/third_party/gpus/cuda$ cp -R /usr/local/cuda-7.0/nvvm/ nvvm

Before starting the build (which is going to take a very long time), we need to modify a few files.

tensorflow/core/kernels/conv_ops_gpu_2.cu.cc:

To avoid a duplicate explicit instantiation, guard the second instantiation of the InflatePadAndShuffle functor with:

/* On ARMv7 Eigen::DenseIndex is typedefed to int */
#ifndef __arm__
template struct functor::InflatePadAndShuffle<GPUDevice, float, 4,
                                              Eigen::DenseIndex>;
#endif

tensorflow/core/kernels/conv_ops_gpu_3.cu.cc:

To avoid a duplicate explicit instantiation, guard the second instantiation of the ShuffleAndReverse functor with:

/* On ARMv7 Eigen::DenseIndex is typedefed to int */
#ifndef __arm__
template struct functor::ShuffleAndReverse<GPUDevice, float, 4,
                                           Eigen::DenseIndex>;
#endif
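The reason for these guards: on 32-bit ARM, Eigen::DenseIndex (std::ptrdiff_t by default) is the same type as int, so the two explicit instantiations name the same specialization, and a duplicate explicit instantiation definition is a compile error. The self-contained sketch below reproduces the pattern with a hypothetical ShuffleFunctor template instead of the real TensorFlow functors. Instead of #ifndef __arm__, it compares PTRDIFF_MAX with INT_MAX, my own assumption for a guard that also compiles on other 32-bit targets.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Stand-in for Eigen::DenseIndex, whose default type is std::ptrdiff_t.
using DenseIndex = std::ptrdiff_t;

// Hypothetical functor template standing in for InflatePadAndShuffle /
// ShuffleAndReverse; only the template parameters matter here.
template <typename T, int Dims, typename Index>
struct ShuffleFunctor {
  static constexpr int dims() { return Dims; }
};

// First explicit instantiation: always valid.
template struct ShuffleFunctor<float, 4, int>;

// Second explicit instantiation. Where ptrdiff_t is int (32-bit ARM, for
// example) this names the same specialization again, which is a compile error.
// The post guards it with #ifndef __arm__; this sketch skips it whenever
// ptrdiff_t has the same range as int, so the duplicate can never occur.
#if PTRDIFF_MAX != INT_MAX
template struct ShuffleFunctor<float, 4, DenseIndex>;
#endif
```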

tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:

On ARMv7 there is no numa_node file, so this function should return 0, not -1; otherwise TensorFlow will crash at runtime. You can use the modification from the previous post or the following code:

static int TryToReadNumaNode(const string &pci_bus_id, int device_ordinal) {
#ifdef __arm__
  LOG(INFO) << "ARMV7 does not support NUMA - returning NUMA node zero";
  return 0;
#else
 ........
  return kUnknownNumaNode;
#endif
}
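A slightly more general variant of the same idea, written as a hypothetical standalone helper rather than the TensorFlow function: read the NUMA node from a sysfs-style file (on Linux, typically /sys/bus/pci/devices/<bus id>/numa_node) and fall back to node 0 when the file is missing or reports -1. The name ReadNumaNodeOrZero is my own; in TensorFlow the logic would still live inside TryToReadNumaNode.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper, not TensorFlow's TryToReadNumaNode. Reads an integer
// NUMA node id from `numa_node_path`; returns 0 if the file cannot be read or
// holds a negative value (Linux writes -1 when no node is known).
int ReadNumaNodeOrZero(const std::string& numa_node_path) {
  std::ifstream in(numa_node_path);
  int node = -1;
  if (!(in >> node) || node < 0) {
    return 0;  // no NUMA information (e.g. on the TK1): use node 0
  }
  return node;
}
```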

tensorflow/core/common_runtime/gpu/process_state.cc:

This file sets up a new memory allocator that causes a floating point exception unless you change the following code:

    if (kCudaHostMemoryUseBFC) {
      allocator =
#ifdef __arm__
          new BFCAllocator(new CUDAHostAllocator(se), 1LL << 31 /*2GB max*/,
                           true /*allow_growth*/, "cuda_host_bfc" /*name*/);
#else
          new BFCAllocator(new CUDAHostAllocator(se), 1LL << 36 /*64GB max*/,
                           true /*allow_growth*/, "cuda_host_bfc" /*name*/);
#endif
    } else {
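My understanding of why the smaller limit matters is an assumption, not something I traced through the allocator: if the BFCAllocator memory limit is a size_t (as in the TensorFlow sources I have seen), it is only 32 bits wide on ARMv7, so 1LL << 36 is truncated to 0 while 1LL << 31 still fits. A zero limit is a plausible trigger for the floating point exception, for example through an integer division by zero. The sketch below uses uint32_t to stand in for the ARMv7 size_t so the truncation can be checked on any host; AsArmv7Size is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Illustration only: uint32_t stands in for size_t on ARMv7 so the effect is
// reproducible on a 64-bit host. The conversion keeps the low 32 bits.
constexpr std::uint32_t AsArmv7Size(long long bytes) {
  return static_cast<std::uint32_t>(bytes);
}

constexpr long long kDesktopLimit = 1LL << 36;  // 64 GB, the default branch
constexpr long long kTk1Limit = 1LL << 31;      // 2 GB, the __arm__ branch

static_assert(AsArmv7Size(kDesktopLimit) == 0u, "64 GB truncates to zero");
static_assert(AsArmv7Size(kTk1Limit) == 2147483648u, "2 GB still fits");
```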

We are now ready to build. The only thing left to do is to remove the check that disables variadic templates in Eigen when compiling with nvcc on ARM. I have not found a clean way to do it (someone with better Bazel skills may have a better idea). Since Bazel only downloads Eigen into its cache once the build starts, my solution is to start the build and then wait for the first failure:

$ bazel build -c opt --local_resources 2048,0.5,1.0 --verbose_failures -s --config=cuda //tensorflow/cc:tutorials_example_trainer

If the first build of TensorFlow fails with the following error:

ERROR: /home/ubuntu/tensorflow/tensorflow/cc/BUILD:61:1: error loading package 'tensorflow/core': Extension file not found. Unable to load package for '//google/protobuf:protobuf.bzl': BUILD file not found on package path and referenced by '//tensorflow/cc:tutorials_example_trainer'.

you need to initialize and update the submodules in the tensorflow repository to fetch the google/protobuf clone:

git submodule update --init

After the first failure, you can edit Eigen's Macros.h file.

Bazel has downloaded it into the .cache directory:

ubuntu@tegra-ubuntu:~/.cache$ find . -name Macros.h -print
./bazel/_bazel_ubuntu/ad1e09741bb4109fbc70ef8216b59ee2/external/eigen_archive/eigen-eigen-3f653ace7d28/Eigen/src/Core/util/Macros.h

Remove the nvcc check by deleting the two lines marked with "-":

-#if !defined(__NVCC__) || !defined(EIGEN_ARCH_ARM_OR_ARM64)
 #define EIGEN_HAS_VARIADIC_TEMPLATES 1
 #endif
-#endif
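If you have to redo this edit (for example after clearing the Bazel cache), it can be automated. Below is a minimal sketch that removes the guard and its matching #endif from the text of Macros.h. The function name RemoveNvccArmGuard is hypothetical, it works on an in-memory string (reading and writing the file is left out), and it assumes the guard line appears exactly as in the diff above.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: removes the nvcc/ARM guard line and the #endif that
// closes it from the text of Eigen's Macros.h, keeping everything in between.
std::string RemoveNvccArmGuard(const std::string& source) {
  const std::string guard =
      "#if !defined(__NVCC__) || !defined(EIGEN_ARCH_ARM_OR_ARM64)";
  std::istringstream in(source);
  std::ostringstream out;
  std::string line;
  bool inside = false;  // between the guard and its closing #endif
  int depth = 0;        // nested #if depth inside the guard
  while (std::getline(in, line)) {
    if (!inside && line == guard) {
      inside = true;
      depth = 0;
      continue;  // drop the guard line
    }
    if (inside) {
      if (line.rfind("#if", 0) == 0) {
        ++depth;  // #if, #ifdef, #ifndef
      } else if (line.rfind("#endif", 0) == 0) {
        if (depth == 0) {
          inside = false;
          continue;  // drop the #endif that closes the guard
        }
        --depth;
      }
    }
    out << line << '\n';
  }
  return out.str();
}
```

Tracking #if nesting makes the helper drop the #endif that belongs to the guard rather than simply the first #endif it sees, which matters if the guarded block ever contains its own #if. The exact lines around the guard in your copy of Macros.h may differ.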

Now restart the build with the same bazel command, and it will go through.

Once the build finishes, you can test it with:

$ bazel-bin/tensorflow/cc/tutorials_example_trainer --use_gpu

You should see output similar to this:

# Lots of output. This tutorial iteratively calculates the major eigenvalue of
# a 2x2 matrix, on GPU. The last few lines look like this.
000009/000005 lambda = 2.000000 x = [0.894427 -0.447214] y = [1.788854 -0.894427]
000006/000001 lambda = 2.000000 x = [0.894427 -0.447214] y = [1.788854 -0.894427]
000009/000009 lambda = 2.000000 x = [0.894427 -0.447214] y = [1.788854 -0.894427]
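The trainer iteratively estimates the major eigenvalue of a 2x2 matrix. As an illustration only (not the trainer's source), here is a plain power iteration in C++. It assumes the matrix is A = [[3, 2], [-1, 0]], which is consistent with the printed values: A x = [1.788854, -0.894427] = 2 x for x = [0.894427, -0.447214]. The names PowerIteration and PowerResult are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec2 = std::array<double, 2>;

struct PowerResult {
  double lambda;  // estimate of the dominant eigenvalue
  Vec2 x;         // unit-length estimate of its eigenvector
};

// Plain power iteration on a 2x2 matrix (illustration, not the trainer's code).
PowerResult PowerIteration(const double a[2][2], Vec2 x, int iterations) {
  // Start from a unit vector.
  double n = std::hypot(x[0], x[1]);
  x = Vec2{{x[0] / n, x[1] / n}};
  double lambda = 0.0;
  for (int i = 0; i < iterations; ++i) {
    // y = A x
    Vec2 y{{a[0][0] * x[0] + a[0][1] * x[1],
            a[1][0] * x[0] + a[1][1] * x[1]}};
    // Since |x| = 1, x . (A x) tends to the eigenvalue as x converges.
    lambda = x[0] * y[0] + x[1] * y[1];
    // Normalize y to get the next x.
    n = std::hypot(y[0], y[1]);
    x = Vec2{{y[0] / n, y[1] / n}};
  }
  return PowerResult{lambda, x};
}
```

Starting from x = [1, 1], 60 iterations give lambda close to 2 and x close to [0.894427, -0.447214], the same numbers the trainer prints. The other eigenvalue of A is 1, so the error shrinks by roughly a factor of 2 per iteration.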

We are now ready to create the pip package and install it:

# To build with GPU support:
$ bazel build -c opt --local_resources 2048,0.5,1.0 --verbose_failures --config=cuda //tensorflow/tools/pip_package:build_pip_package
$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
# The name of the .whl file will depend on your platform.
$ sudo pip install /tmp/tensorflow_pkg/tensorflow-0.8.0-cp27-none-linux_armv7l.whl

Congratulations, TensorFlow is now installed on your system.

Most of the tests pass, but the image classification example gives wrong results. Now that the community can build TensorFlow and experiment with it, hopefully someone can find the source of the error(s).

I downloaded the Python files from TensorFlow-Tutorials and they seem to work:

git clone https://github.com/nlintz/TensorFlow-Tutorials.git

16 comments:

Unknown, June 17, 2016 at 1:16 PM
Thanks for the great work!

You mentioned on GitHub a while ago that this would also work for a Jetson TX1. If I'm using a TX1, do I still restore the CUDA 6.5 toolchain?
Massimiliano, June 17, 2016 at 1:31 PM
For TX1, you can use the 7.0 toolchain. I think there is also a new cuDNN in the latest JetPack that will work on TX1.
Nate, June 21, 2016 at 6:26 PM
Great work on this. Just an FYI for some that may run into this issue...

If on your first compile of tensorflow you get the following error:

ERROR: /home/ubuntu/tensorflow/tensorflow/cc/BUILD:61:1: error loading package 'tensorflow/core': Extension file not found. Unable to load package for '//google/protobuf:protobuf.bzl': BUILD file not found on package path and referenced by '//tensorflow/cc:tutorials_example_trainer'.

You need to init update in the tensorflow repository to get the google/protobuf clone. I'm not sure why, but I double and triple checked that I ran --recurse-submodules when I checked out tensorflow but it never got the protobuf submodules. You have to force it in some instances using:

git submodule update --init

Not sure if anyone else ran into this issue, but it drove me nuts for days!
Massimiliano, June 22, 2016 at 11:37 AM
I may have encountered the same issue; I had to run the same "git submodule update --init". I will update the installation instructions.
Alexander, June 24, 2016 at 6:55 AM
Hi, thanks for updating this tutorial! I've been struggling to get this working for some time now -- could you provide a link to the CUDA runtime 6.5 that you mentioned we need to install (at the top of this page). Thanks!!
Massimiliano, June 24, 2016 at 9:31 AM
You can get it from:
http://developer.download.nvidia.com/compute/cuda/6_5/rel/installers/cuda-repo-l4t-r21.2-6-5-prod_6.5-34_armhf.deb

Unknown, July 14, 2016 at 11:54 AM
Does anyone know if this will work with Tensorflow 0.9?
Unknown, July 18, 2016 at 11:07 PM
Hi, I have installed TensorFlow on the Jetson TK1, but I have two problems.

1. In the Bazel install instructions, I have no 'protoc-linux-arm32.exe' file.

2. In third_party/protobuf/, I renamed java-3.0.0-beta-1.jar to java-3.0.0-alpha-3.jar.

When Bazel compiled, the compile failed (error message: no input file ~~).

I think Bazel wants the file name beta-1.jar, but I renamed it to alpha-3.jar.

What should I do?
Unknown, October 12, 2016 at 11:34 PM
I keep running into errors with the stream_executor during the tensorflow build.

ERROR: /home/ubuntu/tensorflow/tensorflow/stream_executor/BUILD:5:1: C++ compilation of rule '//tensorflow/stream_executor:stream_executor' failed: crosstool_wrapper_driver_is_not_gcc failed: error executing command

cuda_rng.cc and an error with dso_loader.
Esteban, January 19, 2017 at 5:58 PM
Impressive!!!

Any luck with 0.12?
Anonymous, February 20, 2018 at 12:05 AM
Can you please write the same for CPU?
Unknown, March 19, 2018 at 10:46 AM
I have an error while compiling Bazel:
Worker process did not return a correct WorkResponse. This is probably caused by a bug in the worker
Does anyone know how to fix it?

Friday, June 17, 2016