Friday, June 17, 2016

TensorFlow 0.8 on Jetson TK1

This post gives updated instructions on how to build TensorFlow 0.8 on Jetson TK1, now that NVIDIA has released a new compiler that can handle variadic templates without internal compiler errors.

If you just want to install the prebuilt whl file, here is a direct link: tensorflow-0.8.0-cp27-none-linux_armv7l.whl

I am going to use the same approach as in the previous post: keep the CUDA 6.5 runtime and cuDNN v2, but compile the code with the newer 7.0 compiler.


Install the 7.0.76 compiler:

Before starting, you will need to download the new compiler. NVIDIA does not make your life easy in finding the link (they would like you to use Jetpack, but I don't like to reformat a working system if not absolutely needed) but you can download the .deb package directly on your Jetson with:



wget http://developer.download.nvidia.com/embedded/L4T/r24_Release_v1.0/CUDA/cuda-repo-l4t-7-0-local_7.0-76_armhf.deb

Now we can install it as usual:

sudo dpkg -i cuda-repo-l4t-7-0-local_7.0-76_armhf.deb 
sudo apt-get update
sudo apt-get install cuda-toolkit-7-0

At this point we need to restore the standard 6.5 toolchain as the default one (we just want the 7.0 compiler to generate the object files), since the current driver on the Jetson TK1 will only work with the 6.5 runtime. Go to the /usr/local directory, remove the cuda symlink to cuda-7.0, and make a new one for 6.5:

ubuntu@tegra-ubuntu:/usr/local$ sudo rm cuda
ubuntu@tegra-ubuntu:/usr/local$ sudo ln -s cuda-6.5/ cuda

You should see this output:

ubuntu@tegra-ubuntu:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2014 NVIDIA Corporation
Built on Fri_Dec_12_11:12:07_CST_2014
Cuda compilation tools, release 6.5, V6.5.35

ubuntu@tegra-ubuntu:~$ /usr/local/cuda-7.0/bin/nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Mon_Feb_22_15:38:26_CST_2016
Cuda compilation tools, release 7.0, V7.0.74
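
The symlink swap above can also be done in one step. The following is a minimal sketch, not part of the original instructions: the function name point_cuda_at is made up for illustration, and it assumes an ln that supports -n (as GNU ln does) and that both toolkits sit side by side under the same prefix, as they do under /usr/local here. The demonstration runs in a scratch directory so nothing on the system is touched.

```shell
#!/bin/sh
# Hypothetical helper: repoint <prefix>/cuda at <prefix>/cuda-<version>.
point_cuda_at() {
    prefix="$1"
    version="$2"
    # Refuse to create a dangling link if that toolkit is not installed.
    [ -d "$prefix/cuda-$version" ] || return 1
    # -s: symbolic, -f: replace an existing link, -n: do not follow the old link.
    ln -sfn "cuda-$version" "$prefix/cuda"
}

# Demonstration in a scratch directory that mimics /usr/local.
scratch=$(mktemp -d)
mkdir "$scratch/cuda-6.5" "$scratch/cuda-7.0"
ln -s cuda-7.0 "$scratch/cuda"      # state right after installing 7.0
point_cuda_at "$scratch" 6.5        # restore 6.5 as the default
readlink "$scratch/cuda"            # prints the link target
```

On the Jetson itself, the equivalent single command would be "sudo ln -sfn cuda-6.5 /usr/local/cuda".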

Install protobuf and Bazel:
For protobuf you can follow the instructions from the previous blog post (the only change is the location of protobuf-java-3.0.0-beta-x.jar, now in the java/core/target subdirectory).
For Bazel the procedure is also similar. The only change required is the version: TF 0.8 requires Bazel 0.1.4, so after cloning Bazel you will need to check out the proper tag:

$ git clone https://github.com/bazelbuild/bazel.git
$ cd bazel
$ git checkout tags/0.1.4

Install TensorFlow 0.8:
The first thing to do is to check out the source code and select the proper version:

$ git clone --recurse-submodules https://github.com/tensorflow/tensorflow
$ cd tensorflow
$ git checkout r0.8


TensorFlow expects a 64-bit system, so we need to change all the references from lib64 to lib. We can find all the files containing the string and apply the change with these commands:

$ cd tensorflow
$ grep -Rl "lib64"| xargs sed -i 's/lib64/lib/g'
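
Since the sed -i pass above rewrites files in place, it can be worth previewing what it touches first. The sketch below wraps the same substitution in a hypothetical function, replace_lib64, and tries it on a scratch directory. It assumes GNU grep, xargs and sed (for -Z, -0 -r and -i).

```shell
#!/bin/sh
# Hypothetical helper: replace every "lib64" with "lib" under a directory.
replace_lib64() {
    dir="$1"
    # -Z / -0 pass file names NUL-separated, so names with spaces survive;
    # -r stops xargs from running sed when grep finds nothing.
    grep -RlZ "lib64" "$dir" | xargs -0 -r sed -i 's/lib64/lib/g'
}

# Demonstration on throwaway files.
scratch=$(mktemp -d)
printf 'path = "/usr/local/cuda/lib64"\n' > "$scratch/cuda config.cfg"
printf 'nothing to change\n' > "$scratch/other.txt"

grep -Rl "lib64" "$scratch"         # preview: lists the one affected file
replace_lib64 "$scratch"
cat "$scratch/cuda config.cfg"      # the path now ends in /lib
```

Running the grep -Rl line on its own in the source tree gives the list of files that the original one-liner would modify.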

TensorFlow officially supports CUDA devices with compute capabilities 3.5 and 5.2. We want to target a GPU with compute capability 3.2.
This can be done through TensorFlow's unofficial settings, enabled by setting the TF_UNOFFICIAL_SETTING variable when running "configure".
When prompted, specify that you only want a 3.2 compute capability device.


ubuntu@tegra-ubuntu:~/tensorflow$ TF_UNOFFICIAL_SETTING=1 ./configure
Please specify the location of python. [Default is /usr/bin/python]: 
Do you wish to build TensorFlow with GPU support? [y/N] y
GPU support will be enabled for TensorFlow
Please specify which gcc nvcc should use as the host compiler. [Default is /usr/bin/gcc]: 
Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 
Please specify the location where CUDA  toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 
Please specify the Cudnn version you want to use. [Leave empty to use system default]: 
Please specify the location where cuDNN  library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]: 3.2
Setting up Cuda include
Setting up Cuda lib
Setting up Cuda bin
Setting up Cuda nvvm
Configuration finished

Now that the initial setup is done, it is time to change the compiler used by Bazel by replacing the bin and nvvm directories with the ones from CUDA 7.0:

ubuntu@tegra-ubuntu:~/tensorflow$ cd third_party/gpus/cuda/
ubuntu@tegra-ubuntu:~/tensorflow/third_party/gpus/cuda$ rm -fr bin nvvm
ubuntu@tegra-ubuntu:~/tensorflow/third_party/gpus/cuda$ cp -R /usr/local/cuda-7.0/bin/ bin
ubuntu@tegra-ubuntu:~/tensorflow/third_party/gpus/cuda$ cp -R /usr/local/cuda-7.0/nvvm/ nvvm

Before starting the build (which is going to take a very long time), we need to modify a few files.

tensorflow/core/kernels/conv_ops_gpu_2.cu.cc:
 To avoid double instantiation, guard the second functor for InflatePadAndShuffle with:

/* On ARMv7 Eigen::DenseIndex is typedefed to int */
#ifndef __arm__
template struct functor::InflatePadAndShuffle<GPUDevice, float, 4,
                                              Eigen::DenseIndex>;
#endif

tensorflow/core/kernels/conv_ops_gpu_3.cu.cc:
 To avoid double instantiation, guard the second functor for ShuffleAndReverse with:

/* On ARMv7 Eigen::DenseIndex is typedefed to int */
#ifndef __arm__
template struct functor::ShuffleAndReverse<GPUDevice, float, 4,
                                           Eigen::DenseIndex>;
#endif

tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:
 ARMv7 has no numa_node file. The function should return 0, not -1, otherwise TensorFlow will crash at runtime. You can use the modification from the previous post or the following code:

static int TryToReadNumaNode(const string &pci_bus_id, int device_ordinal) {
#ifdef __arm__
  LOG(INFO) << "ARMV7 does not support NUMA - returning NUMA node zero";
  return 0;
#else
 ........
  return kUnknownNumaNode;
#endif
}

tensorflow/core/common_runtime/gpu/process_state.cc:
This is a new memory allocator that will cause a floating point exception unless you change the following code:

if (kCudaHostMemoryUseBFC) {
      allocator =
#ifdef __arm__
          new BFCAllocator(new CUDAHostAllocator(se), 1LL << 31,
                           true /*allow_growth*/, "cuda_host_bfc" /*name*/);
#else
          new BFCAllocator(new CUDAHostAllocator(se), 1LL << 36 /*64GB max*/,
                           true /*allow_growth*/, "cuda_host_bfc" /*name*/);
#endif
    } else {
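
The post does not say why the original size crashes. One plausible explanation, offered here as an assumption rather than something the source confirms, is that the size ends up in a 32-bit size_t on ARMv7, where 1LL << 36 wraps around to zero while 1LL << 31 still fits. The arithmetic itself can be checked from the shell by masking each value to its low 32 bits (low32 is a made-up helper name):

```shell
#!/bin/sh
# Keep only the low 32 bits of a value, as a 32-bit unsigned type would.
# (Assumes the shell evaluates arithmetic in at least 64 bits, as bash and dash do.)
low32() {
    echo $(( $1 & 0xFFFFFFFF ))
}

low32 $(( 1 << 36 ))   # 64 GB limit from the original code
low32 $(( 1 << 31 ))   # 2 GB limit used in the __arm__ branch
```

The first call prints 0 and the second prints 2147483648, so only the smaller limit survives a conversion to 32 bits.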


We are now ready to build. The only thing left to do is to remove the check that disables the use of variadic templates in Eigen. I have not found a clean way to do it (someone with better Bazel skills may have a better idea). My solution is to start the build and then wait for the first failure:

$ bazel build -c opt --local_resources 2048,0.5,1.0 --verbose_failures -s --config=cuda //tensorflow/cc:tutorials_example_trainer

If on your first compile of tensorflow you get the following error:

ERROR: /home/ubuntu/tensorflow/tensorflow/cc/BUILD:61:1: error loading package 'tensorflow/core': Extension file not found. Unable to load package for '//google/protobuf:protobuf.bzl': BUILD file not found on package path and referenced by '//tensorflow/cc:tutorials_example_trainer'.

You need to initialize and update the submodules in the tensorflow repository to fetch the google/protobuf clone:

git submodule update --init 

At this point, I can edit the file Macros.h in Eigen.
This file is located in the .cache directory:

ubuntu@tegra-ubuntu:~/.cache$ find . -name Macros.h -print
./bazel/_bazel_ubuntu/ad1e09741bb4109fbc70ef8216b59ee2/external/eigen_archive/eigen-eigen-3f653ace7d28/Eigen/src/Core/util/Macros.h

The nvcc check needs to be eliminated:
-#if !defined(__NVCC__) || !defined(EIGEN_ARCH_ARM_OR_ARM64)
 #define EIGEN_HAS_VARIADIC_TEMPLATES 1
 #endif
-#endif
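
Because the hash in the cache path differs from machine to machine, it can help to script the lookup and confirm whether the guard is still present before restarting the build. This is a rough sketch: eigen_macros_with_guard is a made-up name, and the search root is passed in so it can be tried on a scratch directory first.

```shell
#!/bin/sh
# Hypothetical helper: print every Eigen Macros.h under a root directory
# that still contains the nvcc/ARM guard line quoted in the post.
eigen_macros_with_guard() {
    root="$1"
    find "$root" -path '*/Eigen/src/Core/util/Macros.h' \
        -exec grep -lF '!defined(__NVCC__) || !defined(EIGEN_ARCH_ARM_OR_ARM64)' {} +
}

# Demonstration on a fake cache layout.
scratch=$(mktemp -d)
mkdir -p "$scratch/external/eigen_archive/Eigen/src/Core/util"
macros="$scratch/external/eigen_archive/Eigen/src/Core/util/Macros.h"
printf '#if !defined(__NVCC__) || !defined(EIGEN_ARCH_ARM_OR_ARM64)\n' > "$macros"
eigen_macros_with_guard "$scratch"   # lists the file: guard still present
```

On the Jetson the root would be ~/.cache/bazel. An empty result after the edit suggests the guard line is gone.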


We can now restart the build and it will go through. 
After you are done, you can test it with:

$ bazel-bin/tensorflow/cc/tutorials_example_trainer --use_gpu

You should see output similar to this:

# Lots of output. This tutorial iteratively calculates the major eigenvalue of
# a 2x2 matrix, on GPU. The last few lines look like this.
000009/000005 lambda = 2.000000 x = [0.894427 -0.447214] y = [1.788854 -0.894427]
000006/000001 lambda = 2.000000 x = [0.894427 -0.447214] y = [1.788854 -0.894427]
000009/000009 lambda = 2.000000 x = [0.894427 -0.447214] y = [1.788854 -0.894427]

We are now ready to create the pip package and install it:
# To build with GPU support:
$ bazel build -c opt --local_resources 2048,0.5,1.0 --verbose_failures --config=cuda //tensorflow/tools/pip_package:build_pip_package
$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
# The name of the .whl file will depend on your platform.
$ sudo pip install /tmp/tensorflow_pkg/tensorflow-0.8.0-cp27-none-linux_armv7l.whl

Congratulations, TensorFlow is now installed on your system.


Most of the tests pass, but the image classification example gives the wrong results. Now that the community can build it and play with it, hopefully someone can find the source of the error(s).

I downloaded the Python files from TensorFlow-Tutorials and they seem to work:

git clone https://github.com/nlintz/TensorFlow-Tutorials.git


13 comments:

  1. Thanks for the great work!

    You mentioned on github a while ago that this would also work for a Jetson TX1. If I'm using a TX1, do I still restore the CUDA 6.5 toolchain?

  2. For TX1, you can use the 7.0 toolchain. I think there is also a new cuDNN in the latest Jetpack that will work on TX1.

  3. Great work on this. Just an FYI for some that may run into this issue...

    If on your first compile of tensorflow you get the following error:

    ERROR: /home/ubuntu/tensorflow/tensorflow/cc/BUILD:61:1: error loading package 'tensorflow/core': Extension file not found. Unable to load package for '//google/protobuf:protobuf.bzl': BUILD file not found on package path and referenced by '//tensorflow/cc:tutorials_example_trainer'.

    You need to init update in the tensorflow repository to get the google/protobuf clone. I'm not sure why, but I double and triple checked that I ran --recurse-submodules when I checked out tensorflow but it never got the protobuf submodules. You have to force it in some instances using:

    git submodule update --init

    Not sure if anyone else ran into this issue, but it drove me nuts for days!

  4. I may have encountered the same issue: I had to run the same "git submodule update --init". I will update the installation instructions.

  5. Hi, thanks for updating this tutorial! I've been struggling to get this working for some time now -- could you provide a link to the CUDA runtime 6.5 that you mentioned we need to install (at the top of this page). Thanks!!

  6. You can get it from:
    http://developer.download.nvidia.com/compute/cuda/6_5/rel/installers/cuda-repo-l4t-r21.2-6-5-prod_6.5-34_armhf.deb

  7. Does anyone know if this will work with Tensorflow 0.9?

  8. This comment has been removed by the author.

  9. Hi, I have installed TensorFlow on Jetson TK1, but I have two problems.

    1. In bazel install instruction, I have no 'protoc-linux-arm32.exe' file.

    2. In third_party/protobuf/, I renamed java-3.0.0-beta-1.jar to java-3.0.0-alpha-3.jar.

    When Bazel compiled, it failed (error message: no input file ~~).

    I think Bazel wants the file to be named beta-1.jar, but I renamed it to alpha-3.jar.

    what should I do ?

    1. I'm having the same issue here: an error when compiling Bazel. Did you find out how to solve it?

  10. I keep running into errors with the stream_executor during the tensorflow build.

    ERROR: /home/ubuntu/tensorflow/tensorflow/stream_executor/BUILD:5:1: C++ compilation of rule '//tensorflow/stream_executor:stream_executor' failed: crosstool_wrapper_driver_is_not_gcc failed: error executing command

    cuda_rng.cc and an error with dso_loader.

    1. This is possibly due to the hardware constraints of the Jetson TK1. Try adjusting the --local_resources values.

      This issue has been discussed in here: https://github.com/tensorflow/tensorflow/issues/349

  11. Impressive!!!

    Any luck with 0.12?
