LibSVM

Description#

Wrapper class for the LibSVM library by Chih-Chung Chang and Chih-Jen Lin. The original wrapper, named WLSVM, was developed by Yasser EL-Manzalawy. The current version is a complete rewrite of the wrapper that uses reflection to avoid compilation errors when libsvm.jar is not in the CLASSPATH.

Important note: As of WEKA 3.7.2, installing and using LibSVM in WEKA has been simplified by a LibSVM package that can be installed with either the graphical or the command-line package manager.

Reference (Weka <= 3.6.8)#

LibSVM
WLSVM

Package#

weka.classifiers.functions

Download#

The wrapper class has been part of WEKA since version 3.5.2. LibSVM itself, however, is a third-party tool and must be downloaded separately. Upgrading to a post-3.5.3 version (or git) is recommended for bug fixes and extensions (it now contains the distributionForInstance method).

CLASSPATH#

Add the libsvm.jar from the LibSVM distribution to your CLASSPATH to make it available.

Note: Do NOT then start WEKA with java -jar weka.jar. The -jar option overrides the CLASSPATH rather than augmenting it (a very common trap to fall into). Instead, use something like this on Linux:

 java -classpath $CLASSPATH:weka.jar:libsvm.jar weka.gui.GUIChooser

or this on Win32 (if you're starting it from the command line):

 java -classpath "%CLASSPATH%;weka.jar;libsvm.jar" weka.gui.GUIChooser

If you're starting WEKA from the Start Menu on Windows, you'll have to add the libsvm.jar to your CLASSPATH environment variable. The following steps are for Windows XP (unfortunately, the GUI changes among the different Windows versions):

right-click on My Computer and select Properties from the menu
choose the Advanced tab and click on Environment variables at the bottom
either add or modify a variable called CLASSPATH and add the libsvm.jar with full path to it
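
Once the jar has been added, one way to confirm that the running JVM can actually see the LibSVM classes is to look them up by name via reflection. The following is only a minimal sketch: the class name LibSvmClasspathCheck and the helper isClassAvailable are made up for this illustration and are not part of WEKA or LibSVM; the only name taken from the text is the LibSVM class libsvm.svm.

```java
/** Hypothetical helper: checks whether a class is visible to the running JVM. */
final class LibSvmClasspathCheck {

    /** Returns true if the named class can be located by this class's class loader. */
    static boolean isClassAvailable(String className) {
        try {
            // initialize = false: only locate the class, do not run its static initializers
            Class.forName(className, false, LibSvmClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // not on the classpath, or present but not loadable
            return false;
        }
    }

    public static void main(String[] args) {
        // The classpath the JVM was actually started with
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
        // libsvm.svm is the main class inside libsvm.jar
        System.out.println("libsvm.svm found: " + isClassAvailable("libsvm.svm"));
    }
}
```

Run it with the same -classpath setting you use to start WEKA; if the last line reports false, the jar is not being picked up.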

Troubleshooting#

LibSVM classes not in CLASSPATH!
- Check whether the libsvm.jar is really in your CLASSPATH. Execute the following command in the SimpleCLI:

java weka.core.SystemInfo

The property java.class.path must list the libsvm.jar. If it is listed, check whether the path is correct.

If you're on Windows and you find %CLASSPATH% there, see next bullet point to fix this.

- On Windows, even if you added the libsvm.jar to your CLASSPATH environment variable, WEKA may still pop up the error message that the LibSVM classes are not in your CLASSPATH. This happens when %CLASSPATH% is not expanded to its actual value while WEKA starts up. You can inspect the CLASSPATH that WEKA was started with from the SimpleCLI (see previous bullet point). If %CLASSPATH% is listed there literally, your system has this problem. In that case, you can explicitly add the .jar file to RunWeka.ini.

Note: backslashes have to be escaped, not only once, but twice (they get interpreted by Java twice!). In other words, instead of one backslash you have to use four: C:\some\where then turns into C:\\\\some\\\\where.
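
The double escaping described in the note can be sketched as a small helper that turns every backslash into four. The names IniPathEscaper and escapeTwice are hypothetical and exist only for this example; the helper does nothing but the string substitution and does not read or write RunWeka.ini.

```java
/** Hypothetical helper: quadruples backslashes in a Windows path. */
final class IniPathEscaper {

    /** Replaces each single backslash with four backslashes (escaped twice). */
    static String escapeTwice(String path) {
        // "\\" is one literal backslash; "\\\\\\\\" is four literal backslashes
        return path.replace("\\", "\\\\\\\\");
    }

    public static void main(String[] args) {
        // prints C:\\\\some\\\\where
        System.out.println(escapeTwice("C:\\some\\where"));
    }
}
```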

Issues with libsvm.jar that were discussed on the Weka list in April 2007 (and may no longer be relevant)#

The following changes were not incorporated in WEKA, since they would also require modifying the LibSVM Java code, which (I think) is autogenerated from the C code. The authors of LibSVM might have to consider such an update. Incorporating these changes is left to the reader.

libsvm.svm uses Math.random#

libsvm.svm calls Math.random, so the model it returns usually differs from run to run, even for the same training set and SVM parameters.

Obviously, if you call libsvm.svm from weka.classifiers.functions.LibSVM and then call it again from libsvm.svm_train, the results also differ.

You can use libsvm.svm_save_model to write the SVMs to files and then compare the model file from WEKA's LibSVM with the model file used with libsvm.svm_predict. You will then see that the ProbA values tend to differ.

The WEKA Experimenter relies on always using the same random sequences, so that repeated experiments give the same results. So, I'm afraid some important design changes are required in libsvm.jar and weka.classifiers.functions.LibSVM to keep that behaviour. We made a quick fix by adding a static Random attribute to the libsvm.svm class:

 static java.util.Random ranGen = new java.util.Random(0);

We changed all Math.random() invocations to ranGen.nextDouble(). After that, we obtained the same SVM from WEKA's LibSVM as from LibSVM's svm_train.

However, WEKA's accuracy results on the primary_tumor data were still worse, so something goes wrong when WEKA uses the SVM model in the testing step.
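
The idea behind the seeded-generator fix can be illustrated in isolation. This is only a sketch of why a java.util.Random with a fixed seed gives repeatable sequences while Math.random() does not; the class SeededRandomDemo and the method draw are hypothetical names for this example and do not touch the LibSVM code.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical demo: a fixed seed yields the same sequence on every run. */
final class SeededRandomDemo {

    /** Draws n values from a generator freshly created with the given seed. */
    static double[] draw(long seed, int n) {
        Random ranGen = new Random(seed);
        double[] values = new double[n];
        for (int i = 0; i < n; i++) {
            values[i] = ranGen.nextDouble();
        }
        return values;
    }

    public static void main(String[] args) {
        double[] first = draw(0L, 5);
        double[] second = draw(0L, 5);
        // prints: seeded runs identical: true
        System.out.println("seeded runs identical: " + Arrays.equals(first, second));
        // Math.random() uses an internally seeded generator, so this value varies between JVM runs
        System.out.println("Math.random() sample: " + Math.random());
    }
}
```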

Classes without instances#

ARFF format provides some meta-information (i.e. attribute names and types, the set of possible values for nominal attributes), but LibSVM format doesn't. So if the dataset declares classes that occur in none of the instances, LibSVM assumes that these classes don't exist, whereas WEKA knows they exist.

For example, there is a class in the primary tumor dataset that never appears. When the WEKA Experimenter runs testing, it calls:

 public static double svm_predict_probability(svm_model model, svm_node[] x, double[] prob_estimates)

passing the array prob_estimates filled with zeros (array cells are initialized to zero). The size of the array is equal to the number of classes (= 22). If, on the other hand, this method is invoked from libsvm.svm_predict, the class that never appears is ignored, so the array has only 21 elements.

So accuracy results differ depending on where svm_predict_probability is invoked from. I think that better results are obtained if classes without instances are ignored, but I don't know whether that is entirely fair. In fact, accuracies from WEKA's LibSVM and from libsvm.svm_predict seem to be the same if the class that never appears is removed from the ARFF file.

Note that this problem only appears when testing, because the training code always uses the svm_group_classes method to compute the number of classes, so the Instances.numClasses() value is never used for training. Moreover, the mismatch between the number of classes at training time and at testing time may be the reason behind the worse accuracy results when svm_predict_probability is invoked from WEKA, but I haven't verified that yet.

Note that this problem also occurs when a class has fewer examples than the number of folds: for some folds, that class will have no training examples.

We also made a quick fix for this problem:

Add this public method to the libsvm.svm_model class:

 public int getNr_class() { return nr_class; }

Make the following changes to the distributionForInstance method in weka.classifiers.functions.LibSVM.

First line of the method:

 int[] labels = new int[instance.numClasses()];

could be changed to

 int[] labels = new int[((svm_model) m_Model).getNr_class()];

Last line in the "if (m_ProbabilityEstimates)" block:

 prob_estimates = new double[instance.numClasses()];

could be changed to

 prob_estimates = new double[((svm_model) m_Model).getNr_class()];
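
To make the size mismatch concrete, here is a self-contained sketch of how probability estimates sized to the model's class count could be mapped back onto the full set of classes declared in the dataset header. It is not WEKA's or LibSVM's actual code: DistributionMapper, toFullDistribution and the sample numbers are all hypothetical, and it assumes that the labels array holds the class indices the model was trained on, in the same order as the probability estimates.

```java
import java.util.Arrays;

/** Hypothetical helper: expands model-sized probabilities to all declared classes. */
final class DistributionMapper {

    /**
     * @param numClasses    number of class values declared in the dataset header
     * @param labels        class indices the model was actually trained on
     * @param probEstimates probabilities in the same order as labels
     * @return distribution over all declared classes; unseen classes get 0.0
     */
    static double[] toFullDistribution(int numClasses, int[] labels, double[] probEstimates) {
        if (labels.length != probEstimates.length) {
            throw new IllegalArgumentException("labels and probEstimates must have the same length");
        }
        double[] result = new double[numClasses]; // cells start at 0.0
        for (int k = 0; k < labels.length; k++) {
            result[labels[k]] = probEstimates[k];
        }
        return result;
    }

    public static void main(String[] args) {
        // Four classes are declared, but class 2 never occurs in the training data
        int[] labels = {0, 1, 3};
        double[] probEstimates = {0.5, 0.25, 0.25};
        // prints [0.5, 0.25, 0.0, 0.25]
        System.out.println(Arrays.toString(toFullDistribution(4, labels, probEstimates)));
    }
}
```

With arrays sized to the model's class count, a class without training instances simply ends up with probability zero instead of shifting the other entries.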