Monday, October 29, 2012

Setting up a CARMA kit

I just received a brand new CARMA kit and I am going to post all the steps I took to get a working set-up.

Let's start with the x86 development system. I am using a virtual machine on my Mac as my development system.

I started by installing a fresh Ubuntu 11.04 distro and then proceeded to:
  • Update the packages: 
    • sudo apt-get update
  • Install the basic developer tools: 
    • sudo apt-get install build-essential
  • Install the 32-bit development libraries (CARMA is 32-bit):
    • sudo apt-get install ia32-libs
  • Install the ARM cross compilers: 
    • sudo apt-get install gcc-4.5-arm-linux-gnueabi g++-4.5-arm-linux-gnueabi
  • Install Fortran for both x86 and ARM (real developers use Fortran....):
    • sudo apt-get install gfortran-4.5-*
  • Install the CUDA Toolkit (available from http://www.seco.com/carmakit under the downloads tab): 
    • sudo sh cuda-linux-ARMv7-rel-4.2.10-13489154.run
  • Edit .bashrc to add nvcc to the path. With your favorite editor add a line at the end of the file:
    • export PATH=/usr/local/cuda/bin:$PATH
  • Source the .bashrc to refresh the path (it will be executed automatically the next time you log in or open a terminal):
    • . ~/.bashrc
We can check that nvcc is now in our path by invoking the compiler with the -V flag, which prints the version:


max@ubuntu:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2012 NVIDIA Corporation
Built on Tue_Jul_17_14:48:12_PDT_2012
Cuda compilation tools, release 4.2, V0.2.1221
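
Before moving on, it can be handy to confirm in one go that everything installed so far is on the path. The following is only a minimal sketch and not part of the original set-up: check_tools is a hypothetical helper name, and it simply asks the shell whether each command you pass can be found.

```shell
# check_tools: hypothetical helper that reports which of the given
# commands can be found on the current PATH.
# Returns non-zero if at least one of them is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return $missing
}
```

On the development system it could be called as: check_tools nvcc arm-linux-gnueabi-gcc-4.5 arm-linux-gnueabi-g++-4.5 make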

We are now ready to compile our first CUDA code, a comparison between multiplications on CPU and GPU.


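#include <stdio.h>

// Multiply the single float stored in device memory by 1.02.
// The loop index i is passed in but not used by the kernel.
__global__ void kernel(int i, float *d_n)
{
  *d_n *= 1.02f;
}

int main(void)
{
  float n = 1.0f, *d_n;   // running GPU result and its device copy
  float n_ref = 1.0f;     // CPU reference value
  int i;
  cudaMalloc((void **)&d_n, sizeof(float));
  for (i = 1; i <= 10; i++)
  {
    cudaMemcpy(d_n, &n, sizeof(float), cudaMemcpyHostToDevice);
    kernel<<<1, 1>>>(i, d_n);
    cudaMemcpy(&n, d_n, sizeof(float), cudaMemcpyDeviceToHost);
    // Print the iteration, the GPU result and the CPU result side by side.
    printf("%d\t\t%42.41f\t%42.41f\n", i, n, n_ref *= 1.02f);
  }
  cudaFree(d_n);
  return 0;
}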


We are going to use a Makefile similar to the one posted in the previous blog post.


max@ubuntu:~$ cat Makefile 
############################
#  Makefile for cross-compile #
############################
all : gpu_test

CUDA_HOME=/usr/local/cuda
CC=/usr/bin/arm-linux-gnueabi-gcc-4.5
NVCC=$(CUDA_HOME)/bin/nvcc -target-cpu-arch ARM --compiler-bindir /usr/bin/arm-linux-gnueabi-gcc-4.5 -m32

gpu_test : gpu_test.cu
	$(NVCC)  -o gpu_test gpu_test.cu 

clean:
	rm gpu_test



When we type make, we should see output similar to this:


max@ubuntu:~$ make
/usr/local/cuda/bin/nvcc -target-cpu-arch ARM --compiler-bindir /usr/bin/arm-linux-gnueabi-gcc-4.5 -m32  -o gpu_test gpu_test.cu 
/usr/lib/gcc/arm-linux-gnueabi/4.5.2/../../../../arm-linux-gnueabi/bin/ld: warning: libc.so, needed by /usr/arm-linux-gnueabi/lib//libgcc_s.so.1, not found (try using -rpath or -rpath-link)



Don't worry about the warning. This is caused by a bogus DT_NEEDED entry in the shared libgcc file /usr/arm-linux-gnueabi/lib/libgcc_s.so.1. "readelf -a" shows:
 0x00000001 (NEEDED) Shared library: [libc.so]

Before we can use the machine for any real CUDA development, there is one extra step to perform. The CUDA Toolkit is missing libcuda.so (on the x86 platform it usually comes with the driver; don't ask me why it was not included in the ARM toolkit). We will not be able to link any CUDA code until we bring this library over to the x86 machine. We will do this step once we have the CARMA up and running.


Unpack the CARMA, plug in a keyboard and mouse, and connect the HDMI cable to the middle connector.
Plug in the power and Ethernet cables and you are ready to go.
The first boot may be slow because the system is building the NVIDIA driver. It is a blind boot: there is no console output until the GUI comes up, so you need a little bit of patience.

Once the CARMA system boots, it will auto-login and start a terminal. It should also pick up an IP address (use ifconfig to find it). The default username/password is ubuntu/ubuntu.

We are ready to check whether our cross-compilation worked.
From inside the virtual machine, we will copy the file gpu_test to the CARMA (ifconfig is reporting 172.16.174.185):

   scp gpu_test ubuntu@172.16.174.185:~

Either from the CARMA terminal or from a remote shell, we can run gpu_test and check that the CPU and GPU results are the same.

ubuntu@tegra-ubuntu:~$ ./gpu_test 
1 1.01999998092651367187500000000000000000000 1.01999998092651367187500000000000000000000
2 1.04039990901947021484375000000000000000000 1.04039990901947021484375000000000000000000
3 1.06120789051055908203125000000000000000000 1.06120789051055908203125000000000000000000
4 1.08243203163146972656250000000000000000000 1.08243203163146972656250000000000000000000
5 1.10408067703247070312500000000000000000000 1.10408067703247070312500000000000000000000
6 1.12616229057312011718750000000000000000000 1.12616229057312011718750000000000000000000
7 1.14868545532226562500000000000000000000000 1.14868545532226562500000000000000000000000
8 1.17165911197662353515625000000000000000000 1.17165911197662353515625000000000000000000
9 1.19509232044219970703125000000000000000000 1.19509232044219970703125000000000000000000
10 1.21899414062500000000000000000000000000000 1.21899414062500000000000000000000000000000
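
Rather than comparing the two columns by eye, the check can be automated. The sketch below assumes the output format shown above (iteration number, GPU value, CPU value, separated by whitespace); same_columns is a hypothetical helper name, not something that ships with the kit.

```shell
# same_columns: hypothetical helper. Reads gpu_test-style output on stdin
# and succeeds only if columns 2 (GPU) and 3 (CPU) match on every line.
same_columns() {
    awk '
        # Appending "" forces a string comparison, so the digits must
        # match exactly rather than merely round to the same double.
        ($2 "") != ($3 "") { printf "mismatch on line %d\n", NR; bad = 1 }
        END { exit bad + 0 }
    '
}
```

On the CARMA it could be used as: ./gpu_test | same_columns && echo "CPU and GPU agree"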

The CARMA filesystem is quite bare, so let's add a few useful packages:
  • Install Fortran:
    • sudo apt-get install gfortran
We need to install OpenMPI from source, since the default packages don't seem to work.
The latest source (1.6.2) has support for ARM. The installation is very simple, but it will take a while.

Get the latest stable version:
wget http://www.open-mpi.org/software/ompi/v1.6/downloads/openmpi-1.6.2.tar.gz

Unpack it (tar xvfz openmpi-1.6.2.tar.gz) and change into the directory (cd openmpi-1.6.2).

We are now ready to build and install:
./configure
sudo make -j 4 install

Add /usr/local/bin to your PATH and /usr/local/lib to your LD_LIBRARY_PATH
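
For reference, these two settings could be written as the lines below, for example at the end of .bashrc, as was done for nvcc on the development system. This is a sketch that assumes the default install prefix /usr/local, which is what ./configure uses when no --prefix is given.

```shell
# Put the OpenMPI binaries (mpicc, mpirun, ...) on the search path.
export PATH=/usr/local/bin:$PATH
# Let the dynamic loader find the OpenMPI shared libraries.
# The ${VAR:+...} form avoids a trailing ':' when the variable was unset.
export LD_LIBRARY_PATH=/usr/local/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
```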