The following sections show how to obtain predictions/classifications from the command line, without writing your own Java code.
After a model has been saved, one can make predictions for a test set, whether that set contains valid class values or not. The output will contain both the actual and predicted class. (Note that if the test set contains simply '?' as the class label of each instance, the "actual" class label for each instance will not contain useful information, but the predicted class label will.) The -T <test_set> command-line switch specifies the dataset of instances whose classes are to be predicted, while the -p <attribute_range> switch allows the user to write out a range of attributes (examples: "1-2" for the first and second attributes, or "0" for no attributes). Sample command line:
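As a rough sketch of how these switches fit together, the following Java helper assembles an argument list of the kind one might type at a shell prompt. The classifier weka.classifiers.trees.J48 and the file names j48.model and unlabeled.arff are placeholders chosen for illustration, not values taken from this article. The -l switch is what Weka classifiers use to load a saved model, and weka.jar is assumed to be on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

public class PredictionCommand {

    /**
     * Assembles the arguments for predicting with a previously saved model.
     * All four parameters are supplied by the caller; nothing is validated here.
     */
    public static List<String> build(String classifier, String modelFile,
                                     String testSet, String attributeRange) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add(classifier);      // e.g. weka.classifiers.trees.J48 (assumed example)
        cmd.add("-l");            // load a saved model
        cmd.add(modelFile);
        cmd.add("-T");            // dataset whose classes are to be predicted
        cmd.add(testSet);
        cmd.add("-p");            // attribute range to write out, "0" for none
        cmd.add(attributeRange);
        return cmd;
    }

    public static void main(String[] args) {
        // Hypothetical file names, used only to show the shape of the command.
        List<String> cmd = build("weka.classifiers.trees.J48",
                                 "j48.model", "unlabeled.arff", "0");
        System.out.println(String.join(" ", cmd));
    }
}
```

Joined with spaces, the list reads java weka.classifiers.trees.J48 -l j48.model -T unlabeled.arff -p 0, which can be typed directly at a prompt or handed to a ProcessBuilder.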
The format of the output is as follows:
<test_instance_index> <actual_class_index>:<actual_class_val> <pred_class_index>:<pred_class_val> [+| ] <prob_of_pred_class_val>
where "+" occurs only for those items that were mispredicted. Note that if the actual class label is always "?" (i.e., the dataset does not include known class labels), the error column will always be empty.
inst#  actual  predicted  error  prediction
    1     1:?        1:0         0.757
    2     1:?        1:0         0.824
    3     1:?        1:0         0.807
    4     1:?        1:0         0.807
    5     1:?        1:0         0.79
    6     1:?        2:1         0.661
  ...
In this case, taken directly from a test dataset where all class values were marked by "?", the "actual" column, which can be ignored, simply states that each instance belongs to an unknown class. The "predicted" column shows that instances 1 through 5 are predicted to be of class 1, whose value is 0, and instance 6 is predicted to be of class 2, whose value is 1. The error field is empty; if predictions were being performed on a labeled test set, each instance where the prediction failed to match the label would contain a "+". The estimated probability that instance 1 actually belongs to class 1 (value 0) is 0.757.
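For post-processing such output in a script, a line in the format above can be split into its fields. The following is a minimal sketch, not part of Weka: the class PredictionLine and its field names are invented for illustration, and it assumes class values contain no whitespace and that exactly one prediction line is passed in (no header line).

```java
public class PredictionLine {
    public final int instance;
    public final int actualIndex;
    public final String actualValue;
    public final int predictedIndex;
    public final String predictedValue;
    public final boolean error;        // true when the "+" marker is present
    public final double probability;   // probability of the predicted class value

    private PredictionLine(int instance, String[] actual, String[] predicted,
                           boolean error, double probability) {
        this.instance = instance;
        this.actualIndex = Integer.parseInt(actual[0]);
        this.actualValue = actual[1];
        this.predictedIndex = Integer.parseInt(predicted[0]);
        this.predictedValue = predicted[1];
        this.error = error;
        this.probability = probability;
    }

    /** Splits "<index>:<value>" into its two parts. */
    private static String[] splitPair(String token) {
        String[] parts = token.split(":", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected index:value but got " + token);
        }
        return parts;
    }

    /** Parses one line such as "6 1:? 2:1 0.661" or "3 1:0 2:1 + 0.9". */
    public static PredictionLine parse(String line) {
        String[] t = line.trim().split("\\s+");
        boolean hasError = t.length == 5 && t[3].equals("+");
        if (t.length != 4 && !hasError) {
            throw new IllegalArgumentException("Unexpected prediction line: " + line);
        }
        return new PredictionLine(Integer.parseInt(t[0]), splitPair(t[1]),
                splitPair(t[2]), hasError, Double.parseDouble(t[t.length - 1]));
    }

    public static void main(String[] args) {
        PredictionLine p = parse("6 1:? 2:1 0.661");
        System.out.println(p.instance + " -> " + p.predictedValue + " (" + p.probability + ")");
    }
}
```

Because the error column is simply absent when a prediction is correct or the label is unknown, the sketch decides by token count whether a "+" is present.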
- Since Weka 3.5.4 you can also output the complete class distribution, not just the prediction, by using the parameter -distribution in conjunction with the -p option. In this case, "*" is placed beside the probability in the distribution that corresponds to the predicted class value.
- If you have an ID attribute in your dataset as first attribute (you can always add one with the AddID filter), you could output it with -p 1 instead of using -p 0. This works only for explicit train/test sets, but you can use the Explorer for cross-validation.
- Using the -classifications option instead of -p ... you can also use different output formats, like CSV: -classifications "weka.classifiers.evaluation.output.prediction.CSV -p ..." (the -p option takes the indices of the additional attributes to output).
The AddClassification filter (package weka.filters.supervised.attribute) can either train a classifier on the input data and use it to transform that data, or load a serialized model to transform the input data. (Although the filter was introduced in 3.5.4, a bug in the command-line option handling makes it advisable to download a version >3.5.5 or a snapshot from the Weka homepage.)
This filter can add the classification, class distribution and the error per row as extra attributes to the dataset.
- training the classifier, e.g., J48, on the input data and replacing the class values with the ones of the trained classifier:
  java \
    weka.filters.supervised.attribute.AddClassification \
    -W "weka.classifiers.trees.J48" \
    -classification \
    -remove-old-class \
    -i train.arff \
    -o train_classified.arff \
    -c last
- using a serialized model, e.g., a J48 model, to replace the class values with the ones predicted by that model:
  java \
    weka.filters.supervised.attribute.AddClassification \
    -serialized /some/where/j48.model \
    -classification \
    -remove-old-class \
    -i train.arff \
    -o train_classified.arff \
    -c last
The Weka GUI also lets you output predictions based on a previously saved model.
See the Explorer section of the Saving and loading models article to set up the Explorer. Additionally, you need to check the Output predictions option in the More options dialog. Right-clicking on the respective results history item and selecting Re-evaluate model on current test set will then output the predictions as well (the statistics will be useless due to missing class values in the test set, so just ignore them). The output is similar to the one produced by the command line.
Example output for the anneal UCI dataset:
== Predictions on test set ==

inst#, actual, predicted, error, probability distribution
    1       ?      3:3      +    0 0 *1 0 0 0
    2       ?      3:3      +    0 0 *1 0 0 0
    3       ?      3:3      +    0 0 *1 0 0 0
  ...
   17       ?      6:U      +    0 0 0 0 0 *1
   18       ?      6:U      +    0 0 0 0 0 *1
   19       ?      3:3      +    0 0 *1 0 0 0
   20       ?      3:3      +    0 0 *1 0 0 0
  ...
The Explorer can also output additional attributes, like the command-line -p option. In the More options... dialog you can specify those attribute indices with Output additional attributes, e.g., first or 1-7. In contrast to the command line, this output also works for cross-validation.
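The starred probability distribution in the example output above can be read back programmatically. The sketch below is not part of Weka: the class ClassDistribution and its method names are invented for illustration, and it assumes the distribution entries are separated by whitespace or commas and that the string is non-empty.

```java
public class ClassDistribution {

    /** Splits a distribution such as "0 0 *1 0 0 0" into its entries. */
    private static String[] entries(String distribution) {
        return distribution.trim().split("[,\\s]+");
    }

    /**
     * Returns the 0-based position of the entry marked with "*",
     * i.e. the predicted class, or -1 if no entry is marked.
     */
    public static int predictedPosition(String distribution) {
        String[] parts = entries(distribution);
        for (int i = 0; i < parts.length; i++) {
            if (parts[i].startsWith("*")) {
                return i;
            }
        }
        return -1;
    }

    /** Returns the probabilities as numbers, with the "*" marker removed. */
    public static double[] probabilities(String distribution) {
        String[] parts = entries(distribution);
        double[] result = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            String p = parts[i].startsWith("*") ? parts[i].substring(1) : parts[i];
            result[i] = Double.parseDouble(p);
        }
        return result;
    }

    public static void main(String[] args) {
        String dist = "0 0 *1 0 0 0";
        System.out.println("predicted position: " + predictedPosition(dist));
    }
}
```

Note that the returned position is 0-based, whereas the class index printed in the predicted column (for example the 3 in 3:3) is 1-based.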
Using the PredictionAppender
With the PredictionAppender (from the Evaluation toolbar) you cannot use an already saved model, but you can train a classifier on a dataset and output an ARFF file with the predictions appended as an additional attribute. Here's an example setup:
              /---dataSet--> TrainingSetMaker ---trainingSet--\
ArffLoader --<                                                 >--> J48...
              \---dataSet--> TestSetMaker -------testSet------/

...J48 --batchClassifier--> PredictionAppender --testSet--> ArffSaver
Using the AddClassification filter
The AddClassification filter can be used in the KnowledgeFlow as well, either for training a model, or for using a serialized model to perform the predictions. An example setup could look like this:
If you want to perform the classification within your own code, see the classifying instances section of this article, explaining the Weka API in general.
- Saving and loading models
- Use Weka in your Java code - general information about using the Weka API
- Using ID attributes
The developer version shortly before the release of 3.5.6 was used as basis for this article.