LibSVM
Description#
Wrapper class for the LibSVM library by Chih-Chung Chang and Chih-Jen Lin. The original wrapper, named WLSVM, was developed by Yasser EL-Manzalawy. The current version is a complete rewrite of the wrapper that uses Reflection in order to avoid compilation errors in case libsvm.jar is not in the CLASSPATH.
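The idea behind the Reflection approach can be illustrated with a minimal sketch. It is not the wrapper's actual code: the class ReflectionLookupSketch and the method isAvailable are hypothetical names, and the only assumption about LibSVM is that its main Java class is libsvm.svm, as referenced elsewhere on this page. Because the class is looked up by name at run time, the code compiles even when libsvm.jar is absent.

```java
class ReflectionLookupSketch {

    /** Returns true if the named class can be loaded from the current classpath. */
    static boolean isAvailable(String className) {
        try {
            // Looked up by name at run time, so no compile-time dependency exists.
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // The class (or something it depends on) is missing at run time.
            return false;
        }
    }

    public static void main(String[] args) {
        String className = "libsvm.svm";
        if (isAvailable(className)) {
            System.out.println(className + " found in CLASSPATH");
        } else {
            System.out.println(className + " not found; is libsvm.jar in the CLASSPATH?");
        }
    }
}
```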
Important note: As of WEKA 3.7.2, installation and use of LibSVM in WEKA have been simplified by the creation of a LibSVM package that can be installed using either the graphical or the command-line package manager.
Reference (Weka <= 3.6.8)#
Package#
weka.classifiers.functions
Download#
The wrapper class has been part of WEKA since version 3.5.2, but LibSVM, as a third-party tool, needs to be downloaded separately. It is recommended to upgrade to a post-3.5.3 version (or git) for bug fixes and extensions (it now contains the distributionForInstance method).
CLASSPATH#
Add the libsvm.jar from the LibSVM distribution to your CLASSPATH to make it available.
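Note: Do NOT start WEKA with java -jar weka.jar in that case. The -jar option overrides the CLASSPATH rather than augmenting it (a very common trap to fall into). Instead use something like this on Linux (assuming weka.jar and libsvm.jar are in the current directory):
java -classpath $CLASSPATH:weka.jar:libsvm.jar weka.gui.GUIChooser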
If you're starting WEKA from the Start Menu on Windows, you'll have to add the libsvm.jar to your CLASSPATH environment variable. The following steps are for Windows XP (unfortunately, the GUI differs between Windows versions):
- right-click on My Computer and select Properties from the menu
- choose the Advanced tab and click on Environment variables at the bottom
- either add or modify a variable called CLASSPATH and add libsvm.jar with its full path to it
Troubleshooting#
- LibSVM classes not in CLASSPATH!
- Check whether libsvm.jar is really in your CLASSPATH. Execute the following command in the SimpleCLI:
java weka.core.SystemInfo
The property java.class.path must list libsvm.jar. If it is listed, check whether the path is correct. If you're on Windows and you find %CLASSPATH% there, see the next bullet point to fix this.
- On Windows, if you added libsvm.jar to your CLASSPATH environment variable, WEKA can still pop up the error message that the LibSVM classes are not in your CLASSPATH. This happens when %CLASSPATH% is not expanded to its actual value while WEKA starts up. You can inspect the CLASSPATH that WEKA was started with in the SimpleCLI (see previous bullet point). If %CLASSPATH% is listed there, your system has this problem. To work around it, you can explicitly add the .jar file to RunWeka.ini. Note: backslashes have to be escaped, not only once, but twice (they get interpreted by Java twice!). In other words, instead of one backslash you have to use four: C:\some\where then turns into C:\\\\some\\\\where.
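The checks described in the bullets above can also be sketched in Java. This is a minimal sketch, not part of WEKA: the class ClasspathTroubleshootingSketch and its method names are hypothetical. It reads the same java.class.path property that weka.core.SystemInfo reports, looks for libsvm.jar and for an unexpanded %CLASSPATH%, and shows the one-to-four backslash replacement described for RunWeka.ini.

```java
import java.io.File;
import java.util.regex.Pattern;

class ClasspathTroubleshootingSketch {

    /** True if any entry of the given class path ends with the given jar name. */
    static boolean listsJar(String classPath, String jarName) {
        // Entries are separated by ':' on Linux and ';' on Windows.
        for (String entry : classPath.split(Pattern.quote(File.pathSeparator))) {
            if (entry.endsWith(jarName)) {
                return true;
            }
        }
        return false;
    }

    /** True if the class path still contains the literal, unexpanded %CLASSPATH%. */
    static boolean hasUnexpandedVariable(String classPath) {
        return classPath.contains("%CLASSPATH%");
    }

    /** Replaces every single backslash with four backslashes. */
    static String escapeBackslashes(String windowsPath) {
        // "\\" is one backslash in Java source; "\\\\\\\\" is four.
        return windowsPath.replace("\\", "\\\\\\\\");
    }

    public static void main(String[] args) {
        String classPath = System.getProperty("java.class.path");
        System.out.println("java.class.path = " + classPath);
        System.out.println("lists libsvm.jar: " + listsJar(classPath, "libsvm.jar"));
        System.out.println("unexpanded %CLASSPATH%: " + hasUnexpandedVariable(classPath));
        // Prints C:\\\\some\\\\where
        System.out.println(escapeBackslashes("C:\\some\\where"));
    }
}
```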
Issues with libsvm.jar that were discussed on the Weka list in April 2007 (and may no longer be relevant)#
The following changes were not incorporated in WEKA, since they also mean modifying the LibSVM Java code, which (I think) is autogenerated from the C code. The authors of LibSVM might have to consider that update. It's left to the reader to incorporate these changes.
libsvm.svm uses Math.random#
libsvm.svm calls Math.random, so the model it returns usually differs between runs, even for the same training set and SVM parameters.
Consequently, if you call libsvm.svm from weka.classifiers.functions.LibSVM and then again from libsvm.svm_train, the results also differ.
You can use libsvm.svm_save_model to write the SVMs to files, and then compare the model file from WEKA LibSVM with the model file from libsvm.svm_predict. You will then see that the ProbA values are usually different.
The WEKA Experimenter relies on always using the same random sequences so that experiments can be repeated with the same results. So, I'm afraid some important design changes are required in libsvm.jar and weka.classifiers.functions.LibSVM to keep that behaviour. We made a quick fix by adding a static Random attribute, ranGen, to the libsvm.svm class and changing all Math.random() invocations to ranGen.nextDouble().
With that change, we obtained the same SVM from WEKA LibSVM as from LibSVM train_svm. However, WEKA accuracy results on the primary_tumor data were still worse, so something goes wrong when WEKA uses the SVM model at the testing step.
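Why a shared, seeded generator makes runs repeatable can be shown with a small self-contained sketch. It does not modify libsvm.svm itself: the class SeededRandomSketch, the method draw and the seed value are hypothetical, and only the name ranGen is taken from the text above.

```java
import java.util.Arrays;
import java.util.Random;

class SeededRandomSketch {

    // Hypothetical seed; the value used in the quick fix is not stated here.
    static final long SEED = 1L;

    /** Draws n values from a generator created with the given seed. */
    static double[] draw(long seed, int n) {
        Random ranGen = new Random(seed);
        double[] values = new double[n];
        for (int i = 0; i < n; i++) {
            // Stands in for a former Math.random() call.
            values[i] = ranGen.nextDouble();
        }
        return values;
    }

    public static void main(String[] args) {
        double[] first = draw(SEED, 3);
        double[] second = draw(SEED, 3);
        // Same seed, same sequence: prints true
        System.out.println(Arrays.equals(first, second));
    }
}
```

Math.random() offers no way to set the seed, which is why two otherwise identical training runs can draw different sequences.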
Classes without instances#
The ARFF format provides some meta-information (i.e. attribute names and types, and the set of possible values for nominal attributes), but the LibSVM format doesn't. So if a class declared in the dataset has zero occurrences across all instances, LibSVM thinks that this class doesn't exist, whereas WEKA knows it exists.
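The difference between declared and observed classes can be made concrete with a small sketch. The data and the names ObservedClassesSketch and countObserved are hypothetical; the sketch only counts distinct labels in the data, which is the information available to a format without a header.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class ObservedClassesSketch {

    /** Counts the distinct class indices that actually occur in the data. */
    static int countObserved(int[] classIndices) {
        TreeSet<Integer> seen = new TreeSet<>();
        for (int c : classIndices) {
            seen.add(c);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Class values declared in a hypothetical ARFF header.
        List<String> declared = Arrays.asList("a", "b", "c");
        // Class index of each instance; index 2 ("c") never occurs.
        int[] labels = {0, 1, 1, 0, 1};
        System.out.println("declared: " + declared.size());       // declared: 3
        System.out.println("observed: " + countObserved(labels)); // observed: 2
    }
}
```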
For example, there is a class in the primary tumor dataset that never appears. When the WEKA Experimenter performs testing, it calls:
public static double svm_predict_probability(svm_model model, svm_node[] x, double[] prob_estimates)
So accuracy results differ depending on where the svm_predict_probability method is invoked from. I think that better results are obtained if classes without instances are ignored, but I don't know whether that is entirely fair. In fact, accuracies from weka.libsvm and from libsvm.predict_svm seem to be the same if the class that never appears is removed from the ARFF file.
Note that this problem only appears when testing, because the training code always uses the svm_group_classes method to compute the number of classes, so the Instances.numClasses() value is never used for training. Moreover, the mismatch between the number of classes at training time and at testing time may be the reason behind the worse accuracy results when svm_predict_probability is invoked from WEKA, but I haven't proved it yet.
Note that this problem also occurs when you have a class with fewer examples than the number of folds. For some folds, the class will not have training examples.
We also made a quick fix for this problem:
- Add this public method to the libsvm.svm_model class:
public int getNr_class(){return nr_class;}
- Make the following changes to the distributionForInstance method of weka.classifiers.functions.LibSVM:
First line of the method:
could be changed to
Last line in "if(m_ProbabilityEstimates)" block:
could be changed to
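The intent of this quick fix appears to be that buffers in distributionForInstance are sized by the number of classes the trained model knows, rather than by the number the dataset declares. The following self-contained sketch illustrates that idea under this assumption. ModelStub is a hypothetical stand-in for libsvm.svm_model with the accessor added above, and the other names are not WEKA code.

```java
class ModelSizedArraysSketch {

    /** Hypothetical stand-in for libsvm.svm_model with the added accessor. */
    static class ModelStub {
        int nr_class;

        ModelStub(int nrClass) {
            this.nr_class = nrClass;
        }

        public int getNr_class() {
            return nr_class;
        }
    }

    /** Allocates the probability buffer from the model, not from the dataset header. */
    static double[] newProbEstimates(ModelStub model) {
        return new double[model.getNr_class()];
    }

    public static void main(String[] args) {
        int declaredClasses = 3;            // hypothetical count from the dataset header
        ModelStub model = new ModelStub(2); // hypothetical count seen during training
        System.out.println("declared classes: " + declaredClasses);
        System.out.println("buffer length: " + newProbEstimates(model).length);
    }
}
```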