Let's start with the x86 development system. I am using a virtual machine on my Mac as my development system.
I started by installing a fresh Ubuntu 11.04 distro and then proceeded to:
- Update the packages:
- sudo apt-get update
- Install the basic developer tools:
- sudo apt-get install build-essential
- Install the 32-bit development libraries (CARMA is 32-bit):
- sudo apt-get install ia32-libs
- Install the ARM cross compilers:
- sudo apt-get install gcc-4.5-arm-linux-gnueabi g++-4.5-arm-linux-gnueabi
- Install Fortran for both x86 and ARM (real developers use Fortran....):
- sudo apt-get install gfortran-4.5-*
- Install the CUDA Toolkit (available from http://www.seco.com/carmakit under the downloads tab):
- sudo sh cuda-linux-ARMv7-rel-4.2.10-13489154.run
- Edit .bashrc to add nvcc to the path. With your favorite editor add a line at the end of the file:
- export PATH=/usr/local/cuda/bin:$PATH
- Source the .bashrc to refresh the path (it will be executed automatically the next time you log in or open a terminal):
- . ~/.bashrc
- Check that nvcc is now found:
max@ubuntu:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2012 NVIDIA Corporation
Built on Tue_Jul_17_14:48:12_PDT_2012
Cuda compilation tools, release 4.2, V0.2.1221
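Before moving on, it can help to confirm that everything installed above is actually on the path. The following is a minimal sketch, not part of the original setup: missing_tools is a hypothetical helper name, and the command names in the usage line assume the Ubuntu packages install the cross compilers as arm-linux-gnueabi-gcc-4.5 and arm-linux-gnueabi-g++-4.5.

```shell
# Print the name of every command in the argument list that is not on PATH.
# missing_tools is a hypothetical helper, not a standard utility.
missing_tools() {
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the name
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

Running "missing_tools nvcc arm-linux-gnueabi-gcc-4.5 arm-linux-gnueabi-g++-4.5" should print nothing if all three were installed and the path was refreshed; any name it prints still needs attention.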
We are now ready to compile our first CUDA code, a comparison of repeated multiplications on the CPU and the GPU. Save it as gpu_test.cu:
#include <stdio.h>

/* Multiply the single float in device memory by 1.02 (i is unused). */
__global__ void kernel(int i, float *d_n)
{
    *d_n *= 1.02f;
}

int main(void)
{
    float n = 1.0f, *d_n;   /* value round-tripped through the GPU */
    float n_ref = 1.0f;     /* same product computed on the CPU */
    int i;

    cudaMalloc((void **)&d_n, sizeof(float));
    for (i = 1; i <= 10; i++)
    {
        cudaMemcpy(d_n, &n, sizeof(float), cudaMemcpyHostToDevice);
        kernel<<<1, 1>>>(i, d_n);
        cudaMemcpy(&n, d_n, sizeof(float), cudaMemcpyDeviceToHost);
        n_ref *= 1.02f;
        printf("%d\t\t%42.41f\t%42.41f\n", i, n, n_ref);
    }
    cudaFree(d_n);
    return 0;
}
We are going to use a Makefile similar to the one posted in the previous blog post.
max@ubuntu:~$ cat Makefile
############################
# Makefile for cross-compile #
############################
all : gpu_test
CUDA_HOME=/usr/local/cuda
CC=/arm-linux-gnueabi-gcc
NVCC=$(CUDA_HOME)/bin/nvcc -target-cpu-arch ARM --compiler-bindir /usr/bin/arm-linux-gnueabi-gcc-4.5 -m32
gpu_test : gpu_test.cu
	$(NVCC) -o gpu_test gpu_test.cu
clean:
	rm gpu_test
When we type make, we should see output similar to this:
max@ubuntu:~$ make
/usr/local/cuda/bin/nvcc -target-cpu-arch ARM --compiler-bindir /usr/bin/arm-linux-gnueabi-gcc-4.5 -m32 -o gpu_test gpu_test.cu
/usr/lib/gcc/arm-linux-gnueabi/4.5.2/../../../../arm-linux-gnueabi/bin/ld: warning: libc.so, needed by /usr/arm-linux-gnueabi/lib//libgcc_s.so.1, not found (try using -rpath or -rpath-link)
Don't worry about the warning. It is caused by a bogus DT_NEEDED entry in the shared libgcc file /usr/arm-linux-gnueabi/lib/libgcc_s.so.1. "readelf -a" shows:
0x00000001 (NEEDED) Shared library: [libc.so]
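To look at only the dependency entries instead of the full "readelf -a" dump, the output can be filtered. This is a small sketch under my own assumptions: needed_libs is a hypothetical helper name, and it expects the bracketed layout shown in the line above.

```shell
# Read "readelf -d" (or "readelf -a") output on stdin and print only the
# library names taken from the (NEEDED) entries, one per line.
# needed_libs is a hypothetical helper, not part of binutils.
needed_libs() {
    sed -n 's/.*(NEEDED).*\[\(.*\)\].*/\1/p'
}
```

For example, "readelf -d /usr/arm-linux-gnueabi/lib/libgcc_s.so.1 | needed_libs" should list libc.so among its results, which is the entry the linker complains about.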
Before we can use the machine for any real CUDA development, there is an extra step that we need to perform. The CUDA Toolkit is missing libcuda.so (on the x86 platform it usually comes with the driver; don't ask me why it was not included in the ARM toolkit), so we will not be able to link any CUDA code until we bring this library over to the x86 machine. We will do this step once we have the CARMA up and running.
Unpack the CARMA, plug in a keyboard and mouse, and connect the HDMI cable to the middle connector.
Plug in the power and Ethernet cables and you are ready to go.
The first boot may be slow because the system is building the NVIDIA driver. It is a blind boot: there is no console output until the GUI comes up, so you need a little patience.
Once the CARMA system boots, it will auto-login and start a terminal. It should also pick up an IP address (use ifconfig to find out the IP address). The default username/password is ubuntu/ubuntu.
We are ready to check if our cross-compilation worked.
From inside the virtual machine, we will copy the file gpu_test to the CARMA (ifconfig is reporting 172.16.174.185):
scp gpu_test ubuntu@172.16.174.185:~
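Before copying, it is cheap to confirm that the binary really was built for ARM and not for the x86 host. The usual way is "file gpu_test"; the sketch below does the same check by reading the ELF header directly. It is only an illustration: is_arm_elf is a hypothetical helper name, and it assumes a little-endian ELF file, where the machine field sits at byte offset 18 and the value for ARM is 40 (0x28).

```shell
# Succeed if the file is a little-endian ELF binary whose machine field
# says ARM. is_arm_elf is a hypothetical helper, not a standard tool.
is_arm_elf() {
    # First four bytes of every ELF file: 0x7f 'E' 'L' 'F'
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    # e_machine: two bytes at offset 18; EM_ARM = 40 = 0x0028, stored as 28 00
    machine=$(od -An -tx1 -j18 -N2 "$1" | tr -d ' \n')
    [ "$magic" = "7f454c46" ] && [ "$machine" = "2800" ]
}
```

A possible use is "is_arm_elf gpu_test && scp gpu_test ubuntu@172.16.174.185:~", so that a binary accidentally built with the host compiler is never copied.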
Either from the CARMA terminal or from a remote shell, we can run gpu_test and check that the CPU and GPU results are the same.
ubuntu@tegra-ubuntu:~$ ./gpu_test
1 1.01999998092651367187500000000000000000000 1.01999998092651367187500000000000000000000
2 1.04039990901947021484375000000000000000000 1.04039990901947021484375000000000000000000
3 1.06120789051055908203125000000000000000000 1.06120789051055908203125000000000000000000
4 1.08243203163146972656250000000000000000000 1.08243203163146972656250000000000000000000
5 1.10408067703247070312500000000000000000000 1.10408067703247070312500000000000000000000
6 1.12616229057312011718750000000000000000000 1.12616229057312011718750000000000000000000
7 1.14868545532226562500000000000000000000000 1.14868545532226562500000000000000000000000
8 1.17165911197662353515625000000000000000000 1.17165911197662353515625000000000000000000
9 1.19509232044219970703125000000000000000000 1.19509232044219970703125000000000000000000
10 1.21899414062500000000000000000000000000000 1.21899414062500000000000000000000000000000
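Comparing 41 decimal digits by eye is tedious, so the check can be automated. This is a minimal sketch of my own, assuming the output layout shown above (iteration number, GPU value, CPU value, separated by whitespace); columns_match is a hypothetical helper name.

```shell
# Read the gpu_test output on stdin and succeed only if the second and
# third columns are identical on every line. The fields are compared as
# strings, so no digits are lost to numeric conversion.
# columns_match is a hypothetical helper.
columns_match() {
    awk '($2 "") != ($3 "") { bad = 1 } END { exit bad + 0 }'
}
```

On the CARMA, "./gpu_test | columns_match && echo same" prints "same" only when the GPU and CPU columns agree on every line.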
The CARMA filesystem is quite bare, so let's add a few useful packages:
- Install Fortran:
- sudo apt-get install gfortran
We need to install OpenMPI from source; the default packages don't seem to work.
The latest source (1.6.2) has support for ARM. The installation is very simple, but it will take a while.
Get the latest stable version:
wget http://www.open-mpi.org/software/ompi/v1.6/downloads/openmpi-1.6.2.tar.gz
Unpack it (tar xvfz openmpi-1.6.2.tar.gz) and change into the directory (cd openmpi-1.6.2).
We are now ready to build and install:
./configure
sudo make -j 4 install
Add /usr/local/bin to your PATH and /usr/local/lib to your LD_LIBRARY_PATH
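One way to do this, for example at the end of .bashrc, is sketched below. The helper name prepend_dir is hypothetical; the only thing it adds over a plain export is that it avoids duplicate entries and avoids a trailing colon when LD_LIBRARY_PATH starts out empty (an empty entry would make the loader search the current directory).

```shell
# Print "dir" followed by the existing colon-separated list, unless the
# list already contains it. prepend_dir is a hypothetical helper.
prepend_dir() {
    dir=$1
    current=$2
    case ":$current:" in
        *":$dir:"*) printf '%s\n' "$current" ;;              # already present
        *) printf '%s\n' "$dir${current:+:$current}" ;;      # no stray colon
    esac
}

PATH=$(prepend_dir /usr/local/bin "$PATH")
LD_LIBRARY_PATH=$(prepend_dir /usr/local/lib "${LD_LIBRARY_PATH:-}")
export PATH LD_LIBRARY_PATH
```

After opening a new terminal, "command -v mpirun" should then report /usr/local/bin/mpirun, assuming the default install prefix was used.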