Sunday, September 30, 2012

Compiling for CARMA

In few days, CARMA will be finally available to the general public. If you are not familiar with the CARMA project, it is the first ARM platform supporting CUDA.
It has a Tegra 3 with 4 cores and 2 GB of memory, ethernet, USB ports and a Quadro 1000M GPU (GF108 with 2 GB of memory, 96 CUDA cores, compute capability 2.1).
It has full OpenGL and CUDA support, but at the moment, no CUDA compiler.

The developer needs to cross-compile from a Linux x86 machine. This blog shows how easy it is to cross-compile once we follow some simple instructions. I strongly suggest that you start with an Ubuntu machine, the cross-compiler are easily available under this platform.

The first thing to do, it is to install the cross-compilers:

sudo apt-get install g++-arm-linux-gnueabi gcc-arm-linux-gnueabi

At this point, we will have the cross-compilers installed under  /usr/bin/arm-linux-gnueabi-gcc and  /usr/bin/arm-linux-gnueabi-g++.

The second step is to install the CUDA Toolkit for ARM on the x86. If you choose the default location,
the installer will create a directory /usr/local/cuda.

If you need to use other libraries for ARM, you will also need to copy the libraries and corresponding header files from CARMA to the x86 machine.  You can place them under /usr/local/arm_lib and /usr/local/arm_include or you can just put them under /usr/local/cuda/lib and /usr/local/cuda/include (my preference will be for the first option to not pollute the CUDA installation).

We are now ready to compile our code, taking care of using the cross compiler and the special nvcc in the CARMA toolkit.  The following makefile will show how to compile a simple c++ code that calls a CUBLAS function and a simple CUDA code.


############################
#  Makefile for cross-compile #
############################
all : dgemm_cublas simple_cuda

CUDA_HOME=/usr/local/cuda
CC=/arm-linux-gnueabi-gcc
NVCC=$(CUDA_HOME)/bin/nvcc -target-cpu-arch ARM --compiler-bindir /usr/bin/arm-linux-gnueabi-gcc-4.5 -m32


# For a standard c++ code, we use CC and the CUDA ARM libraries
dgemm_cublas : gemm_test.cpp
$(CC)   gemm_test.cpp -I$(CUDA_HOME)/include -o dgemm_cublas -L/$(CUDA_HOME)/lib -lcudart -lcublas

# For a standard CUDA code, we just invoke nvcc
simple_cuda: file.cu
$(NVCC) -o simple_cuda file.cu

clean :
rm -f *.o dgemm_cublas simple_cuda


Once we generate the executable, since they are for ARM, we will not be able to execute them until we move them on CARMA.