Using Weka from Jython
Jython is an implementation of the high-level, dynamic, object-oriented language Python written in 100% Pure Java, and seamlessly integrated with the Java platform. It thus allows you to run Python on any Java platform.
-- taken from the Jython homepage
This article explains how to use Weka classes from within Jython and how to write a classifier in Jython that can be used within the Weka framework.
Accessing Weka classes from Jython#
Requirements#
In order for Jython to find the Weka classes, they must be listed in your CLASSPATH. The following example adds the weka.jar located in the directory /some/where to the CLASSPATH in a bash shell under Linux:
export CLASSPATH=$CLASSPATH:/some/where/weka.jar
Note: Windows users must use the backslash ("\") instead of the slash ("/") in paths in the command prompt.
Implementation#
Once a class has been imported in a Jython module, it can be used just like in Java. For example, to use the J48 classifier, one only needs to import it as follows:
import weka.classifiers.trees.J48 as J48
Here's a complete Jython module (UsingJ48.py):
import java.io.FileReader as FileReader
import weka.core.Instances as Instances
import weka.classifiers.trees.J48 as J48
# load data file (the Instances constructor reads the whole dataset)
reader = FileReader("/some/where/file.arff")
data = Instances(reader)
reader.close()
# use the last attribute as the class attribute
data.setClassIndex(data.numAttributes() - 1)
# create the model
j48 = J48()
j48.buildClassifier(data)
# print out the built model
print j48
A slightly more elaborate example can be found in UsingJ48Ext.py, which uses more methods of the weka.classifiers.Evaluation class.
NB: The example UsingJ48Ext.py needs Weka 3.6.x to run, due to some changes in the API.
Implementing a Jython classifier#
Requirements#
- Weka >3.5.6
- Jython 2.2rc2 (later versions should work as well)
Implementation#
This section covers the implementation of weka.classifiers.rules.ZeroR in Python, JeroR.py:
- Subclass an abstract superclass of Weka classifiers (in this case weka.classifiers.Classifier):
class JeroR(Classifier, JythonSerializableObject):
Note: the JythonSerializableObject interface is necessary for serialization purposes (Weka creates copies of classifiers via serialization).
- You have to implement the following methods:
def listOptions(self):
Returns a java.util.Enumeration of weka.core.Option objects of all available options. Calling the superclass method is done with <superclass>.listOptions(self), e.g., Classifier.listOptions(self).
def setOptions(self, options):
Sets the command-line options, with the parameter options being an array of strings.
def getOptions(self):
Returns an array of strings containing all the currently set options (to be used with setOptions(self, options)).
def getCapabilities(self):
Returns a weka.core.Capabilities object with information about what attributes and classes can be processed by this algorithm.
def buildClassifier(self, instances):
This method builds the actual model based on the data provided. The first statements in this method should be the ones checking the capabilities of the algorithm against the data and removing all instances with a missing class value:
# can the classifier handle the data?
self.getCapabilities().testWithFail(instances)
# work on a copy and remove instances with a missing class value
instances = Instances(instances)
instances.deleteWithMissingClass()
- You also have to implement at least one of the following two methods:
def classifyInstance(self, instance):
Returns either the index of the predicted class label (for nominal classes) or the regression result (for numeric classes).
def distributionForInstance(self, instance):
This method returns an array of doubles containing the probabilities for all class labels. In case of a numeric class attribute, the length of this array is 1. In Jython, you can use the [jarray](http://www.jython.org/docs/jarray.html) module to generate a double array. With the following line you can create the correct array to be returned by this method (you still need to fill it with values):
result = jarray.zeros(instance.numClasses(), 'd')
Of course, for a nominal class the elements of this array must sum up to 1.
- In addition:
def toString(self):
Returns a string describing the not-yet-built or built model.
- The following code snippet simulates the "main" method; it creates an instance of the classifier and passes it on to the Classifier.runClassifier method:
# requires "import sys" at the top of the module
if __name__ == "__main__":
    Classifier.runClassifier(JeroR(), sys.argv[1:])
This doesn't work out of the box, since Jython cannot access protected static methods in superclasses. To make it work, one has to set the following value in the Jython registry (taken from this FAQ):
python.security.respectJavaAccessibility = false
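To make the contract of buildClassifier, classifyInstance and distributionForInstance more concrete, here is a minimal plain-Python sketch of the ZeroR idea for a nominal class: count the class labels, return the normalized counts as the distribution, and predict the most frequent label. This is not the code of JeroR.py. The class name ZeroRSketch, the use of a plain list of class indices instead of weka.core.Instances, and ignoring instance weights are all assumptions made so that the snippet runs without Weka. In a real Jython classifier, the distribution would be copied into an array created with jarray.zeros(instance.numClasses(), 'd').

```python
class ZeroRSketch(object):
    """Hypothetical stand-in for a ZeroR-style classifier (nominal class only)."""

    def __init__(self, num_classes):
        self.num_classes = num_classes
        self.counts = [0.0] * num_classes

    def buildClassifier(self, class_indices):
        # class_indices holds one class index per training instance;
        # None marks a missing class value and is skipped
        self.counts = [0.0] * self.num_classes
        for index in class_indices:
            if index is not None:
                self.counts[index] += 1.0

    def distributionForInstance(self, instance=None):
        # the instance is ignored: every prediction is the class prior
        total = sum(self.counts)
        if total == 0:
            # nothing seen during training: fall back to a uniform distribution
            return [1.0 / self.num_classes] * self.num_classes
        return [count / total for count in self.counts]

    def classifyInstance(self, instance=None):
        # index of the most frequent class label
        dist = self.distributionForInstance(instance)
        return dist.index(max(dist))


model = ZeroRSketch(3)
model.buildClassifier([0, 2, 2, None, 1, 2])
dist = model.distributionForInstance()
print(dist)
print(model.classifyInstance())
```

Deriving classifyInstance from distributionForInstance mirrors the "at least one of the following two" rule: once the distribution is available, the predicted label is simply the index of its largest entry.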
Documentation#
Documentation in Python is written as so-called doc strings placed inside the class or method they document. Using HappyDoc, one can write them as structured text and generate nice HTML output, similar to Javadoc.
- Class doc string:
class JeroR(Classifier, JythonSerializableObject):
    """
    JeroR is a Jython implementation of the Weka classifier ZeroR
    'author' -- FracPete (fracpete at waikato dot ac dot nz)
    'version' -- $Revision$
    """
Note: the $Revision$ tag is filled in by a source control system like CVS or Subversion.
- Method doc string:
def classifyInstance(self, instance):
    """
    returns the prediction for the given instance
    Parameter(s):
        'instance' -- the instance to predict the class value for
    Return:
        the prediction for the given instance
    """
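Doc strings are ordinary attributes, so besides being processed by a tool like HappyDoc they can be inspected at runtime through __doc__. The sketch below uses a trimmed variant of the JeroR class; leaving out the Weka superclasses and returning a placeholder value are assumptions made purely so that the snippet runs without Weka in the CLASSPATH.

```python
class JeroR(object):
    """
    JeroR is a Jython implementation of the Weka classifier ZeroR
    """

    def classifyInstance(self, instance):
        """
        returns the prediction for the given instance
        """
        return 0.0  # placeholder, not the real prediction logic


# doc strings are available at runtime via the __doc__ attribute
print(JeroR.__doc__.strip())
print(JeroR.classifyInstance.__doc__.strip())
```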
Execution#
Note: The commands listed here are for a Linux/Unix bash shell. On Windows, remove all the backslashes ("\") at the end of the lines and assemble the command on a single line. Under Windows, the path separator ":" used in the CLASSPATH needs to be replaced with ";" as well.
Jython#
The Jython classifier, e.g., FunkyClassifier.py, can be run like this from the command line, with only the weka.jar and the jython.jar in the CLASSPATH:
java -classpath weka.jar:jython.jar \
org.python.util.jython \
/some/place/FunkyClassifier.py \
-t /some/where/file.arff
Weka#
In order to execute the Jython classifier FunkyClassifier.py with Weka, one basically only needs to have the weka.jar and the jython.jar in the CLASSPATH and call the weka.classifiers.JythonClassifier classifier with the Jython classifier, i.e., FunkyClassifier.py, as parameter ("-J"):
java -classpath weka.jar:jython.jar \
weka.classifiers.JythonClassifier \
-J /some/place/FunkyClassifier.py \
-t /some/where/file.arff
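The two invocations shown in the Execution section differ only in the main class and the -J flag, and the CLASSPATH separator depends on the platform. As an illustration, the following plain-Python sketch assembles both argument lists. The helper name build_command and its parameters are hypothetical and not part of Weka or Jython; the class names, flags and paths are the ones used in this article.

```python
import os


def build_command(script, arff, via_weka=False,
                  jars=("weka.jar", "jython.jar"), sep=os.pathsep):
    """Hypothetical helper: returns the java command as a list of arguments."""
    # os.pathsep is ":" on Linux/Unix and ";" on Windows
    command = ["java", "-classpath", sep.join(jars)]
    if via_weka:
        # let Weka's JythonClassifier wrapper load the script via -J
        command += ["weka.classifiers.JythonClassifier", "-J", script]
    else:
        # run the script directly with the Jython interpreter
        command += ["org.python.util.jython", script]
    # -t names the training file in both cases
    return command + ["-t", arff]


script = "/some/place/FunkyClassifier.py"
arff = "/some/where/file.arff"
direct = build_command(script, arff, sep=":")
wrapped = build_command(script, arff, via_weka=True, sep=":")
print(" ".join(direct))
print(" ".join(wrapped))
```

Passing sep explicitly, as done here, reproduces the Linux form of the commands; leaving it at its default picks the separator of the platform the script runs on.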
Downloads#
- UsingJ48.py
- UsingJ48Ext.py
- JeroR.py - weka.classifiers.rules.ZeroR implemented as a Jython script
See also#
- Use Weka in your Java code - for general information on how to use the Weka API
- Using Weka via Jepp - using the javax.script approach to interface Java and Python
Links#
- Jython
- Python
- Java
- Eclipse