Mtj with nvblas

(The following is based on a post from Eibe Frank on the Weka mailing list.)

Here is an example of running MTJ with NVBLAS (NVIDIA's BLAS wrapper) on Ubuntu:

Installed

https://prdownloads.sourceforge.net/weka/weka-3-8-6-azul-zulu-linux.zip

in
```
/home/eibe/Desktop
```

Ran

~/Desktop/weka-3-8-6/weka.sh -main weka.core.WekaPackageManager -install-package netlibNativeLinux

To install CPU-based system BLAS/LAPACK, ran

sudo apt-get install libblas-dev liblapack-dev
sudo ln -s /usr/lib/x86_64-linux-gnu/libblas.so.3 /usr/lib/libblas.so.3
sudo ln -s /usr/lib/x86_64-linux-gnu/liblapack.so.3 /usr/lib/liblapack.so.3

Downloaded and installed CUDA 11.6 from https://developer.nvidia.com/cuda-downloads
Copied example nvblas.conf from https://docs.nvidia.com/cuda/nvblas/ into local directory using
```
cat > nvblas.conf
```

Edited nvblas.conf to have

NVBLAS_CPU_BLAS_LIB  /usr/lib/x86_64-linux-gnu/blas/libblas.so.3

Now, by adapting what's given at https://github.com/fommil/netlib-java/wiki/NVBLAS, issued

export LD_LIBRARY_PATH=/usr/local/cuda-11.6/lib64:/usr/lib/x86_64-linux-gnu/blas/libblas.so.3

Then,

~/Desktop/weka-3-8-6/weka.sh -main weka.Run .RandomRBF -a 5000 > RandomRBF.a5000.arff
LD_PRELOAD=libnvblas.so ~/Desktop/weka-3-8-6/weka.sh -main weka.Run .attributeSelection.PrincipalComponents -i RandomRBF.a5000.arff

Observation: Memory is being allocated on the GPU. Looking at nvblas.log, the GPU is used, but only for some dgemm operations. However, according to https://docs.nvidia.com/cuda/nvblas/, the tremm operation (which is executed on the CPU) should also be supported by the GPU.