Using Weka from Jython

Jython is an implementation of the high-level, dynamic, object-oriented language Python written in 100% Pure Java, and seamlessly integrated with the Java platform. It thus allows you to run Python on any Java platform.

-- taken from the Jython homepage

This article explains how to use Weka classes from within Jython and how to write a classifier in Jython that can be used within the Weka framework.

Accessing Weka classes from Jython

Requirements

In order for Jython to find the Weka classes, you must add them to your CLASSPATH. Here is an example of adding the weka.jar located in the directory /some/where to the CLASSPATH in a bash shell under Linux:

export CLASSPATH=$CLASSPATH:/some/where/weka.jar

Note: Windows users must use the backslash ("\") instead of the slash ("/") in paths at the command prompt.
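
The CLASSPATH entries are joined with ":" on Linux and ";" on Windows. As a minimal sketch of that rule in plain Python (the helper name append_to_classpath is hypothetical and not part of Weka or Jython), os.pathsep can supply the separator for the current platform:

```python
import os


def append_to_classpath(current, jar_path, sep=os.pathsep):
    """Return the CLASSPATH string `current` with `jar_path` appended.

    Empty entries are dropped and a jar that is already listed is not
    added a second time. `sep` defaults to the platform's separator.
    """
    entries = [entry for entry in current.split(sep) if entry]
    if jar_path not in entries:
        entries.append(jar_path)
    return sep.join(entries)


# Print the value that could be assigned to CLASSPATH before starting Jython.
print(append_to_classpath(os.environ.get("CLASSPATH", ""), "/some/where/weka.jar"))
```

The printed value still has to be exported in the shell that launches Jython; changing the variable from inside an already running interpreter does not alter the classpath of its JVM.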

Implementation

As soon as a class has been imported in a Jython module, it can be used just like in Java. E.g., if one wants to use the J48 classifier, one only needs to import it as follows:

 import weka.classifiers.trees.J48 as J48

Here's a Jython module (UsingJ48.py):

 import sys

 import java.io.FileReader as FileReader
 import weka.core.Instances as Instances
 import weka.classifiers.trees.J48 as J48

 # load data file
 file = FileReader("/some/where/file.arff")
 data = Instances(file)
 data.setClassIndex(data.numAttributes() - 1)

 # create the model
 j48 = J48()
 j48.buildClassifier(data)

 # print out the built model
 print j48

A slightly more elaborate example can be found in UsingJ48Ext.py, which uses more methods of the weka.classifiers.Evaluation class.

NB: The example UsingJ48Ext.py needs Weka 3.6.x to run, due to some changes in the API.

Implementing a Jython classifier

Requirements

  • Weka >3.5.6
  • Jython 2.2rc2 (later versions should work as well)

Implementation

This section covers the implementation of weka.classifiers.rules.ZeroR in Python, JeroR.py:

  • Subclass an abstract superclass of Weka classifiers (in this case weka.classifiers.Classifier):

class JeroR (Classifier, JythonSerializableObject):

Note: the JythonSerializableObject interface is necessary for serialization purposes (Weka creates copies of classifiers via serialization).

  • You have to implement the following methods:

    • def listOptions(self):

      Returns a java.util.Enumeration of weka.core.Option objects of all available options. Calling the superclass method is done with <superclass>.listOptions(self), e.g., Classifier.listOptions(self).

    • def setOptions(self, options):

      Sets the command-line options, with the parameter options being an array of strings.

    • def getOptions(self):

      Returns an array of strings, containing all the currently set options (to be used with setOptions(self,options)).

    • def getCapabilities(self):

      Returns a weka.core.Capabilities object with information about what attributes and classes can be processed by this algorithm.

    • def buildClassifier(self, instances):

      This method builds the actual model based on the data provided. The first statements in this method should be the ones checking the capabilities of the algorithm against the data and removing all instances with a missing class value:

      # check the capabilities
      self.getCapabilities().testWithFail(instances)
      # remove instances with missing class
      instances = Instances(instances)
      instances.deleteWithMissingClass()
      

  • You also have to implement at least one of the following two methods:

    • def classifyInstance(self, instance):

      Returns either the index of the predicted class label (for nominal classes) or the regression result (for numeric classes)

    • def distributionForInstance(self, instance):

      This method returns an array of doubles containing the probabilities for all class labels. In case of a numeric class attribute, the length of this array is 1. In Jython, you can use the [jarray](http://www.jython.org/docs/jarray.html) module to generate a double array. With the following line you can create the correct array to be returned by this method (you still need to fill it with values):

result = jarray.zeros(instance.numClasses(), 'd')

Of course, for a nominal class the elements of this array must sum up to 1.

  • def toString(self):

    Returns a string describing the not-yet-built or built model.

  • The following code snippet simulates the "main" method; it creates an instance of the classifier and passes it on to the Classifier.runClassifier method:

if __name__ == "__main__":
    # sys must be imported at the top of the module
    Classifier.runClassifier(JeroR(), sys.argv[1:])

This doesn't work out of the box, since by default Jython cannot access protected static methods in superclasses. To make it work, one has to set the following value in the Jython registry (taken from this FAQ):

python.security.respectJavaAccessibility=false
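
To make the contract of distributionForInstance and classifyInstance described above more concrete, here is a minimal sketch in plain Python of a majority-class rule in the spirit of ZeroR. It is not the code of JeroR.py, the function names are hypothetical, and details such as smoothing of the counts may differ from Weka's ZeroR:

```python
def class_distribution(class_indices, num_classes):
    """Return relative class frequencies as a list of floats summing to 1.

    `class_indices` holds one class-label index per training instance.
    If no instance is given, a uniform distribution is returned.
    """
    counts = [0.0] * num_classes
    for index in class_indices:
        counts[index] += 1.0
    total = sum(counts)
    if total == 0:
        return [1.0 / num_classes] * num_classes
    return [count / total for count in counts]


def most_likely_class(distribution):
    """Return the index of the largest probability (first one on ties)."""
    return distribution.index(max(distribution))


# Five training instances with class indices 0, 1, 1, 2, 1 and three classes.
distribution = class_distribution([0, 1, 1, 2, 1], 3)
print(distribution)
print(most_likely_class(distribution))
```

In a Jython classifier, the distribution would typically be computed once in buildClassifier, copied into the double array created with jarray.zeros, and returned from distributionForInstance; the index of the largest entry is what classifyInstance would return for a nominal class.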

Documentation

Documentation in Python is done with the so-called doc strings within the class or method the documentation is for. Using HappyDoc, one can use structured text to output nice HTML, similar to Javadoc.

  • Class doc string:

 class JeroR (Classifier, JythonSerializableObject):
     """
     JeroR is a Jython implementation of the Weka classifier ZeroR

     'author' -- FracPete (fracpete at waikato dot ac dot nz)

     'version' -- $Revision$
     """

Note: the $Revision$ tag is filled in by a source control system like CVS or Subversion.

  • Method doc string:

 def classifyInstance(self, instance):
     """
     returns the prediction for the given instance

     Parameter(s):

         'instance' -- the instance to predict the class value for

     Return:

         the prediction for the given instance
     """

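Doc strings are stored on the documented object itself, so they can also be inspected at runtime. The following sketch uses a hypothetical stand-in class (DocDemo, not part of Weka) so that it runs without Weka on the CLASSPATH; it reads the doc strings through the __doc__ attribute and through inspect.getdoc, which additionally strips the indentation:

```python
import inspect


class DocDemo(object):
    """
    DocDemo is a placeholder class used to illustrate doc strings

    'version' -- $Revision$
    """

    def classifyInstance(self, instance):
        """
        returns the prediction for the given instance

        Parameter(s):

            'instance' -- the instance to predict the class value for
        """
        return 0


# Raw doc string, including the original indentation.
print(DocDemo.__doc__)
# Cleaned-up doc string of a method.
print(inspect.getdoc(DocDemo.classifyInstance))
```
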
Execution

Note: The commands listed here are for a Linux/Unix bash shell. On Windows, remove the backslashes ("\") at the end of the lines and write each command on a single line. The path separator ":" used in the CLASSPATH must also be replaced with ";" under Windows.

Jython

The Jython classifier, e.g., FunkyClassifier.py, can be run from the command line like this, with only the weka.jar and the jython.jar in the CLASSPATH:

 java -classpath weka.jar:jython.jar \
   org.python.util.jython \
   /some/place/FunkyClassifier.py \
   -t /some/where/file.arff

Weka

In order to execute the Jython classifier FunkyClassifier.py with Weka, one only needs to have the weka.jar and the jython.jar in the CLASSPATH and call the weka.classifiers.JythonClassifier classifier with the Jython classifier, i.e., FunkyClassifier.py, as the "-J" parameter:

 java -classpath weka.jar:jython.jar \
   weka.classifiers.JythonClassifier \
   -J /some/place/FunkyClassifier.py \
   -t /some/where/file.arff
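
The two command lines above differ only in the class that is started and in the "-J" option. As a sketch of how they could be assembled in a platform-independent way (the helper build_java_command is hypothetical, and the jar locations are assumed to be relative to the current directory as in the commands above), the classpath separator can be taken from os.pathsep:

```python
import os


def build_java_command(script, arff, via_weka=False,
                       jars=("weka.jar", "jython.jar"), sep=os.pathsep):
    """Assemble the java call for a Jython classifier as an argument list.

    With via_weka=False the script is run by the Jython interpreter class;
    with via_weka=True it is handed to weka.classifiers.JythonClassifier
    through the -J option.
    """
    command = ["java", "-classpath", sep.join(jars)]
    if via_weka:
        command += ["weka.classifiers.JythonClassifier", "-J", script]
    else:
        command += ["org.python.util.jython", script]
    command += ["-t", arff]
    return command


# Show the command for running the script directly with Jython.
print(" ".join(build_java_command("/some/place/FunkyClassifier.py",
                                  "/some/where/file.arff")))
```

Because the result is a list of arguments rather than a single string, no line-continuation backslashes are needed, and the list could be passed to a process launcher such as subprocess.call where that module is available.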
