
Multi-instance classification

Multi-instance (MI) classification is a supervised learning technique that differs from standard supervised learning in two ways:

  • each example (a bag) contains multiple instances
  • only a single class label is observable per example, shared by all of its instances

Classifiers

Multi-instance classifiers were originally available through a separate software package, the Multi-Instance Learning Kit (MILK). Weka has handled relational attributes natively since version 3.5.3. The multi-instance classifiers are now available through the multiInstanceLearning package, and the corresponding filters through the multiInstanceFilters package. Once the packages have been installed, the classifiers can be found in the following Java package:

 weka.classifiers.mi

Data format

The data format for multi-instance classifiers is fairly simple and consists of three attributes:

  • bag-id - nominal attribute; unique identifier for each bag
  • bag - relational attribute; contains the instances of an example
  • class - the class label for the examples

Weka offers two filters to convert between the flat (or propositional) format, which is normally used in supervised classification, and the multi-instance format:

  • weka.filters.unsupervised.attribute.PropositionalToMultiInstance
  • weka.filters.unsupervised.attribute.MultiInstanceToPropositional
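
To illustrate what these two conversions amount to conceptually, here is a minimal Python sketch. It is not Weka's implementation. The function names `rows_to_bags` and `bags_to_rows` are hypothetical, the rows are assumed to have the bag-id in the first column and the class in the last, and rejecting bags with conflicting class labels is an assumption of this sketch rather than documented filter behaviour.

```python
def rows_to_bags(rows):
    """Group flat rows (bag_id, f1..fn, class) into bags keyed by bag id."""
    bags = {}  # dicts keep insertion order, so bags appear in first-seen order
    for row in rows:
        bag_id, *features, label = row
        if bag_id not in bags:
            bags[bag_id] = {"instances": [], "label": label}
        elif bags[bag_id]["label"] != label:
            # Assumption of this sketch: one class label per bag.
            raise ValueError(f"conflicting class labels in bag {bag_id!r}")
        bags[bag_id]["instances"].append(features)
    return bags


def bags_to_rows(bags):
    """Flatten bags back into one row per instance, repeating id and label."""
    return [
        [bag_id, *instance, bag["label"]]
        for bag_id, bag in bags.items()
        for instance in bag["instances"]
    ]


# Illustrative rows with only three features each (not real dataset rows).
example_rows = [
    ["bag-a", 42, -198, -109, "1"],
    ["bag-a", 42, -191, -142, "1"],
    ["bag-b", 7, 3, -5, "0"],
]
example_bags = rows_to_bags(example_rows)
print(len(example_bags["bag-a"]["instances"]))
```

The round trip `bags_to_rows(rows_to_bags(rows))` reproduces the input only when the rows of each bag are contiguous, since the sketch groups all instances of a bag together.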

Here is an example based on the musk1 UCI dataset, which is used quite often in publications covering MI learning (note: ... denotes omission):

  • propositional format:

    This ARFF file lists all the attributes: molecule_name (which is the bag-id), f1 to f166 (containing the actual data of the instances), and the class attribute.

     @relation musk1
    
     @attribute molecule_name {MUSK-jf78,MUSK-jf67,MUSK-jf59,...,NON-MUSK-199}
     @attribute f1 numeric
     @attribute f2 numeric
     @attribute f3 numeric
     @attribute f4 numeric
     @attribute f5 numeric
     ...
     @attribute f166 numeric
     @attribute class {0,1}
    
     @data
     MUSK-188,42,-198,-109,-75,-117,11,23,-88,-28,-27,...,48,-37,6,30,1
     MUSK-188,42,-191,-142,-65,-117,55,49,-170,-45,5,...,48,-37,5,30,1
     ...
    

  • multi-instance format:

    With the relational attribute, only three attributes remain at the top level: molecule_name, bag and class. The relational attribute contains the instances of each example, each described by the attributes f1 to f166. The value of the relational attribute is enclosed in quotes, and the individual instances inside the bag are separated by line feeds (= \n).

     @relation musk1
    
     @attribute molecule_name {MUSK-jf78,MUSK-jf67,MUSK-jf59,...,NON-MUSK-199}
     @attribute bag relational
       @attribute f1 numeric
       @attribute f2 numeric
       @attribute f3 numeric
       @attribute f4 numeric
       @attribute f5 numeric
       ...
       @attribute f166 numeric
     @end bag
     @attribute class {0,1}
    
     @data
     MUSK-188,"42,-198,-109,-75,-117,11,23,-88,-28,-27,...,48,-37,6,30\n42,-191,-142,-65,-117,55,49,-170,-45,5,...,48,-37,5,30\n...",1
     ...
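
The quoting and separator rules described above can be sketched in a few lines of Python. This is only an illustration for purely numeric bags: the helper names `format_bag_value`, `format_data_line` and `parse_bag_value` are hypothetical, and the sketch assumes the separator is written in the file as the two characters `\` and `n`, as in the example row. It does not handle missing values or nominal and string values that need their own quoting.

```python
SEPARATOR = "\\n"  # backslash followed by 'n', as written in the ARFF text


def format_bag_value(instances):
    """Join the instances of one bag into a single quoted relational value."""
    body = SEPARATOR.join(",".join(str(v) for v in inst) for inst in instances)
    return '"' + body + '"'


def format_data_line(bag_id, instances, label):
    """Build one @data line: bag-id, quoted bag, class label."""
    return f"{bag_id},{format_bag_value(instances)},{label}"


def parse_bag_value(value):
    """Split a quoted relational value back into lists of numbers."""
    body = value.strip()
    if len(body) >= 2 and body[0] == '"' and body[-1] == '"':
        body = body[1:-1]  # drop the surrounding quotes
    return [[float(v) for v in line.split(",")] for line in body.split(SEPARATOR)]


# Illustrative bag with three features per instance (not a real dataset row).
example_instances = [[42, -198, -109], [42, -191, -142]]
print(format_data_line("bag-a", example_instances, 1))
```

Parsing returns floats because the inner attributes in the example are declared numeric; a fuller reader would consult the nested attribute declarations instead.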
    

See also