Synopsis
The Pruned Sets method (PS). PS removes examples with P-infrequent labelsets from the training data, then subsamples these labelsets N times to produce N new examples with P-frequent labelsets. A standard LC classifier is then trained on the result. The idea is to reduce the number of unique class values that LC would otherwise need to learn. PS is best used in an ensemble (e.g., EnsembleML).

For more information, see: Jesse Read, Bernhard Pfahringer, Geoff Holmes: Multi-label Classification Using Ensembles of Pruned Sets. In: ICDM'08: International Conference on Data Mining (ICDM 2008), Pisa, Italy, 2008.
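The pruning and subsampling steps described in the synopsis can be illustrated with a minimal Python sketch. This is not MEKA's implementation: the function name `pruned_sets_transform`, the data layout (pairs of features and a `frozenset` of labels), the exclusion of the empty labelset, and the ranking of candidate subsets (larger first, then more frequent) are all assumptions made for illustration.

```python
from collections import Counter


def pruned_sets_transform(examples, p, n):
    """Return a pruned training set (hypothetical helper, not MEKA code).

    examples: list of (features, labelset) pairs, labelset being a frozenset.
    p: a labelset is infrequent if it occurs <= p times.
    n: maximum number of frequent sub-labelsets reintroduced per pruned example.
    """
    counts = Counter(labels for _, labels in examples)
    frequent = {ls for ls, c in counts.items() if c > p}

    # Keep every example whose labelset is P-frequent.
    result = [(x, ls) for x, ls in examples if ls in frequent]

    for x, ls in examples:
        if ls in frequent:
            continue
        # Frequent, non-empty labelsets that are proper subsets of the pruned one.
        candidates = [f for f in frequent if f and f < ls]
        # Assumed ranking: larger subsets first, then more frequent ones,
        # then alphabetical order so the result is deterministic.
        candidates.sort(key=lambda f: (-len(f), -counts[f], sorted(f)))
        for sub in candidates[:n]:
            result.append((x, sub))
    return result


if __name__ == "__main__":
    data = [
        ("a", frozenset({"x"})),
        ("b", frozenset({"x"})),
        ("c", frozenset({"y"})),
        ("d", frozenset({"y"})),
        ("e", frozenset({"x", "y", "z"})),  # occurs once, pruned when p = 1
    ]
    for features, labels in pruned_sets_transform(data, p=1, n=2):
        print(features, sorted(labels))
```

The transformed data would then be handed to an LC-style learner, which treats each remaining distinct labelset as a single class value.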
BibTeX
@inproceedings{JesseRead2008,
  author = {Jesse Read and Bernhard Pfahringer and Geoff Holmes},
  booktitle = {ICDM'08: International Conference on Data Mining (ICDM 2008). Pisa, Italy.},
  title = {Multi-label Classification Using Ensembles of Pruned Sets},
  year = {2008}
}
Options
- -P <value>: Sets the pruning value, defining an infrequent labelset as one which occurs <= P times in the data (P = 0 defaults to LC). Default: 0 (LC).
- -N <value>: Sets the (maximum) number of frequent labelsets to subsample from the infrequent labelsets. Default: 0 (none). Accepted forms:
  - n sets N = n
  - -n sets N = n, or 0 if LCard(D) >= 2
  - n-m sets N = random(n, m)
- -S <value>: The seed value for randomization. Default: 0.
- -W <classifier name>: Full name of base classifier. (default: weka.classifiers.trees.J48)
- -output-debug-info: If set, classifier is run in debug mode and may output additional info to the console.
- -do-not-check-capabilities: If set, classifier capabilities are not checked before classifier is built (use with caution).
- -num-decimal-places: The number of decimal places for the output of numbers in the model (default 2).
- -batch-size: The desired batch size for batch prediction (default 100).
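The three accepted forms of the -N value can be made concrete with a short Python sketch. It is only an illustration of the rules listed for -N, not MEKA's parser: the names `label_cardinality` and `resolve_n` are hypothetical, LCard(D) is taken to be the average number of labels per example, and random(n, m) is assumed to include both endpoints.

```python
import random


def label_cardinality(labelsets):
    """LCard(D): average number of labels per example (assumed definition)."""
    return sum(len(ls) for ls in labelsets) / len(labelsets)


def resolve_n(spec, lcard, rng):
    """Turn an -N value string into a concrete N (hypothetical helper)."""
    spec = spec.strip()
    if spec.startswith("-"):
        # Form "-n": N = n, or 0 if LCard(D) >= 2.
        n = int(spec[1:])
        return 0 if lcard >= 2 else n
    if "-" in spec:
        # Form "n-m": N drawn at random between n and m (assumed inclusive).
        low, high = (int(part) for part in spec.split("-", 1))
        return rng.randint(low, high)
    # Form "n": N = n.
    return int(spec)


if __name__ == "__main__":
    rng = random.Random(0)  # plays the role of the -S seed
    lcard = label_cardinality([{"a"}, {"a", "b"}, {"a", "b", "c"}])
    for spec in ("3", "-3", "2-5"):
        print(spec, "->", resolve_n(spec, lcard, rng))
```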
Options specific to classifier weka.classifiers.trees.J48:
- -U: Use unpruned tree.
- -O: Do not collapse tree.
- -C <pruning confidence>: Set confidence threshold for pruning. (default 0.25)
- -M <minimum number of instances>: Set minimum number of instances per leaf. (default 2)
- -R: Use reduced error pruning.
- -N <number of folds>: Set number of folds for reduced error pruning. One fold is used as pruning set. (default 3)
- -B: Use binary splits only.
- -S: Do not perform subtree raising.
- -L: Do not clean up after the tree has been built.
- -A: Laplace smoothing for predicted probabilities.
- -J: Do not use MDL correction for info gain on numeric attributes.
- -Q <seed>: Seed for random data shuffling (default 1).
- -doNotMakeSplitPointActualValue: Do not make split point actual value.
- -output-debug-info: If set, classifier is run in debug mode and may output additional info to the console.
- -do-not-check-capabilities: If set, classifier capabilities are not checked before classifier is built (use with caution).
- -num-decimal-places: The number of decimal places for the output of numbers in the model (default 2).
- -batch-size: The desired batch size for batch prediction (default 100).
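Because PS and J48 both define -N and -S with different meanings, the two option groups have to be kept apart on the command line. The Python sketch below assembles such a command as an argument list. Several details are assumptions not stated on this page: the class name `meka.classifiers.multilabel.PS`, the `-t` flag for the training file, the usual Weka convention of passing base classifier options after a `--` separator, and MEKA and Weka already being on the Java classpath. The helper `build_ps_command` and the dataset name are hypothetical.

```python
import shlex


def build_ps_command(dataset, p, n, seed=0,
                     base="weka.classifiers.trees.J48", base_options=()):
    """Assemble a PS command line as an argument list (hypothetical helper)."""
    cmd = [
        "java", "meka.classifiers.multilabel.PS",  # assumed class name
        "-t", dataset,                             # assumed training-file flag
        "-P", str(p),
        "-N", str(n),
        "-S", str(seed),
        "-W", base,
    ]
    if base_options:
        # Assumed Weka convention: everything after "--" goes to the base classifier.
        cmd.append("--")
        cmd.extend(base_options)
    return cmd


if __name__ == "__main__":
    command = build_ps_command(
        "data.arff", p=1, n=2,
        base_options=["-C", "0.25", "-M", "2"],
    )
    # Print a shell-quoted version; run it with subprocess.run(command) if desired.
    print(shlex.join(command))
```

Keeping the arguments in a list rather than a single string avoids quoting problems and makes it explicit which -N belongs to PS (before the separator) and which would belong to J48 (after it).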