Synopsis

A specified multi-label classifier is built on the training data. This model is then used to classify the test data, and the confidence of each prediction is used to reweight the classified instances. The reweighted data is then used to retrain the classifier. This cycle continues (EM-style) for I iterations, and the final model is used to produce the official classification of the test data. Because of the weighting, it is advisable to use a base classifier that gives good confidence (probabilistic) outputs.
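The cycle described above can be illustrated with a minimal, self-contained sketch. It is not MEKA's implementation: it assumes a single binary label instead of multiple labels, a toy weighted-mean model on one numeric feature, and retraining on the training data plus the confidence-weighted test data. The names `EmSketch`, `fit`, `prob1`, and `run` are hypothetical and exist only for this example.

```java
import java.util.Arrays;

// Illustrative only: a toy EM-style reweighting loop, not MEKA code.
final class EmSketch {

    // Toy "classifier": the weighted mean of the feature for each class (0 and 1).
    static double[] fit(double[] x, int[] y, double[] w) {
        double[] sum = new double[2];
        double[] total = new double[2];
        for (int i = 0; i < x.length; i++) {
            sum[y[i]] += w[i] * x[i];
            total[y[i]] += w[i];
        }
        return new double[] { sum[0] / total[0], sum[1] / total[1] };
    }

    // Confidence that x belongs to class 1, based on distance to the two means.
    static double prob1(double[] means, double x) {
        double d0 = Math.abs(x - means[0]);
        double d1 = Math.abs(x - means[1]);
        return (d0 + d1 == 0.0) ? 0.5 : d0 / (d0 + d1);
    }

    static int[] run(double[] trainX, int[] trainY, double[] testX, int iterations) {
        int nTrain = trainX.length;
        int nTest = testX.length;
        double[] x = new double[nTrain + nTest];
        int[] y = new int[nTrain + nTest];
        double[] w = new double[nTrain + nTest];
        System.arraycopy(trainX, 0, x, 0, nTrain);
        System.arraycopy(testX, 0, x, nTrain, nTest);
        System.arraycopy(trainY, 0, y, 0, nTrain);
        Arrays.fill(w, 0, nTrain, 1.0); // labelled instances keep full weight

        // Initial model: built on the training data only.
        double[] model = fit(trainX, trainY, Arrays.copyOf(w, nTrain));

        for (int it = 0; it < iterations; it++) {
            // Classify the test data and use the confidence as instance weight.
            for (int j = 0; j < nTest; j++) {
                double p = prob1(model, testX[j]);
                y[nTrain + j] = (p >= 0.5) ? 1 : 0;
                w[nTrain + j] = Math.max(p, 1.0 - p);
            }
            // Retrain on the reweighted data (assumption: train + weighted test).
            model = fit(x, y, w);
        }

        // The final model produces the reported predictions for the test data.
        int[] predictions = new int[nTest];
        for (int j = 0; j < nTest; j++) {
            predictions[j] = (prob1(model, testX[j]) >= 0.5) ? 1 : 0;
        }
        return predictions;
    }

    public static void main(String[] args) {
        int[] predictions = run(new double[] { 0, 1, 9, 10 }, new int[] { 0, 0, 1, 1 },
                new double[] { 0.5, 2, 8, 9.5 }, 10);
        System.out.println(Arrays.toString(predictions));
    }
}
```

The sketch also shows why probabilistic outputs matter: a base model that only returns hard 0/1 decisions would give every test instance the same weight, and the reweighting step would have no effect.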

BibTeX

@article{Nigam2000,
   author = {Nigam, Kamal and McCallum, Andrew K. and Thrun, Sebastian and Mitchell, Tom M.},
   journal = {Machine Learning},
   number = {2/3},
   pages = {103--134},
   title = {Text Classification from Labeled and Unlabeled Documents using EM},
   volume = {39},
   year = {2000}
}

Options

  • -I <value>

    The number of EM iterations to carry out. (default: 10)

  • -W <classifier name>

    Full name of base classifier. (default: meka.classifiers.multilabel.CC)

  • -output-debug-info

    If set, classifier is run in debug mode and may output additional info to the console.

  • -do-not-check-capabilities

    If set, classifier capabilities are not checked before classifier is built (use with caution).

  • -num-decimal-places

    The number of decimal places for the output of numbers in the model (default 2).

  • -batch-size

    The desired batch size for batch prediction (default 100).

Options specific to classifier meka.classifiers.multilabel.CC:

  • -S <value>

    The seed value for randomizing the data. (default: 0)

  • -W <classifier name>

    Full name of base classifier. (default: weka.classifiers.trees.J48)

  • -output-debug-info

    If set, classifier is run in debug mode and may output additional info to the console.

  • -do-not-check-capabilities

    If set, classifier capabilities are not checked before classifier is built (use with caution).

  • -num-decimal-places

    The number of decimal places for the output of numbers in the model (default 2).

  • -batch-size

    The desired batch size for batch prediction (default 100).

Options specific to classifier weka.classifiers.trees.J48:

  • -U

    Use unpruned tree.

  • -O

    Do not collapse tree.

  • -C <pruning confidence>

    Set confidence threshold for pruning. (default 0.25)

  • -M <minimum number of instances>

    Set minimum number of instances per leaf. (default 2)

  • -R

    Use reduced error pruning.

  • -N <number of folds>

    Set number of folds for reduced error pruning. One fold is used as pruning set. (default 3)

  • -B

    Use binary splits only.

  • -S

    Do not perform subtree raising.

  • -L

    Do not clean up after the tree has been built.

  • -A

    Laplace smoothing for predicted probabilities.

  • -J

    Do not use MDL correction for info gain on numeric attributes.

  • -Q <seed>

    Seed for random data shuffling (default 1).

  • -doNotMakeSplitPointActualValue

    Do not make split point actual value.

  • -output-debug-info

    If set, classifier is run in debug mode and may output additional info to the console.

  • -do-not-check-capabilities

    If set, classifier capabilities are not checked before classifier is built (use with caution).

  • -num-decimal-places

    The number of decimal places for the output of numbers in the model (default 2).

  • -batch-size

    The desired batch size for batch prediction (default 100).
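
The three option groups above are nested: EM wraps CC, which in turn wraps J48. In Weka-style option arrays, everything after a `--` separator is conventionally handed to the base classifier. The sketch below assumes that convention applies here and mimics the splitting in plain Java so that it runs without MEKA or Weka on the classpath. The class `NestedOptions` and its methods are hypothetical helpers written for this example, not library API.

```java
import java.util.Arrays;

// Illustrative only: shows how a nested option array splits at "--" separators.
final class NestedOptions {

    // Options that belong to the current (outer) classifier.
    static String[] own(String[] options) {
        for (int i = 0; i < options.length; i++) {
            if (options[i].equals("--")) {
                return Arrays.copyOfRange(options, 0, i);
            }
        }
        return options.clone();
    }

    // Options after the first "--", intended for the base classifier.
    static String[] forBase(String[] options) {
        for (int i = 0; i < options.length; i++) {
            if (options[i].equals("--")) {
                return Arrays.copyOfRange(options, i + 1, options.length);
            }
        }
        return new String[0];
    }

    public static void main(String[] args) {
        String[] all = {
            "-I", "10", "-W", "meka.classifiers.multilabel.CC",   // EM level
            "--", "-S", "0", "-W", "weka.classifiers.trees.J48",  // CC level
            "--", "-C", "0.25", "-M", "2"                         // J48 level
        };
        String[] ccAndBelow = forBase(all);
        System.out.println("EM:  " + String.join(" ", own(all)));
        System.out.println("CC:  " + String.join(" ", own(ccAndBelow)));
        System.out.println("J48: " + String.join(" ", forBase(ccAndBelow)));
    }
}
```

The option values used here are the defaults listed above, so the array describes the default configuration written out explicitly.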