2017-05-24 21:12:18 - Updated on 2017-05-24 21:12:54

Mldatagen - A multi-label dataset generator

Synthetic Dataset Generator for Multi-label Learning (Mldatagen)

This framework, which is described in an ICMC-USP technical report, can generate synthetic multi-label datasets using one of two strategies: hyperspheres or hypercubes. For each label in a dataset, the chosen strategy randomly generates a geometric shape (hypersphere or hypercube) and populates it with randomly generated points (instances or examples). Afterwards, each instance is labeled according to the shapes it belongs to, which defines the instance's multi-label.
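The hypersphere strategy described above can be illustrated with a minimal Python sketch. This is not the Mldatagen implementation: the function names, the [-1, 1] coordinate range for centers, the default radius bounds and the way an instance picks the shape it is drawn from are all assumptions made for illustration.

```python
import math
import random


def random_sphere(dim, min_radius, max_radius, rng):
    """Draw a random center and radius (ranges are illustrative assumptions)."""
    center = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    radius = rng.uniform(min_radius, max_radius)
    return center, radius


def sample_inside(center, radius, rng):
    """Draw a point uniformly inside the hypersphere."""
    # Random direction from an isotropic Gaussian, then a radius scaled by
    # u^(1/d) so that points are uniform in volume rather than in distance.
    direction = [rng.gauss(0.0, 1.0) for _ in center]
    norm = math.sqrt(sum(x * x for x in direction)) or 1.0
    scale = radius * rng.random() ** (1.0 / len(center)) / norm
    return [c + scale * x for c, x in zip(center, direction)]


def contains(center, radius, point):
    """Return True if the point lies inside or on the hypersphere."""
    squared = sum((c - p) ** 2 for c, p in zip(center, point))
    return squared <= radius * radius


def generate(n_labels, n_features, n_instances,
             min_radius=0.2, max_radius=0.8, seed=0):
    """One hypersphere per label; label each point by every shape holding it."""
    rng = random.Random(seed)
    spheres = [random_sphere(n_features, min_radius, max_radius, rng)
               for _ in range(n_labels)]
    dataset = []
    for _ in range(n_instances):
        # Assumption: each instance is drawn from one shape chosen at random.
        center, radius = rng.choice(spheres)
        point = sample_inside(center, radius, rng)
        labels = [int(contains(c, r, point)) for c, r in spheres]
        dataset.append((point, labels))
    return dataset
```

Calling `generate(n_labels=3, n_features=2, n_instances=50)` returns a list of `(point, labels)` pairs, where `labels` holds one 0/1 entry per hypersphere. Where hyperspheres overlap, a point receives more than one label, which is what makes the dataset multi-label.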

After choosing the strategy to be applied, the user must set the mandatory parameters: the number of relevant features, the number of irrelevant features, the number of redundant features, the number of labels and the number of instances in the dataset. Optional parameters, which have default values, can also be set: the maximum and minimum size of the internal hyperspheres/hypercubes, the noise level(s) and the dataset name.
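As a rough illustration of how these parameters fit together, the sketch below groups them in a Python dataclass. The class name, field names, validation rules and especially the default values are hypothetical placeholders; the framework's actual defaults are not stated here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GeneratorConfig:
    # Strategy and mandatory parameters (no defaults).
    strategy: str                # "hypersphere" or "hypercube"
    relevant_features: int
    irrelevant_features: int
    redundant_features: int
    labels: int
    instances: int
    # Optional parameters; these defaults are placeholders, not Mldatagen's.
    min_size: float = 0.1
    max_size: float = 0.5
    noise_levels: Tuple[float, ...] = ()
    name: str = "synthetic"

    def __post_init__(self):
        if self.strategy not in ("hypersphere", "hypercube"):
            raise ValueError("strategy must be 'hypersphere' or 'hypercube'")
        if self.relevant_features < 1 or self.labels < 1 or self.instances < 1:
            raise ValueError("relevant features, labels and instances must be >= 1")
        if self.irrelevant_features < 0 or self.redundant_features < 0:
            raise ValueError("feature counts cannot be negative")
        if not 0 < self.min_size <= self.max_size:
            raise ValueError("expected 0 < min_size <= max_size")
        if any(not 0.0 <= level <= 1.0 for level in self.noise_levels):
            raise ValueError("noise levels are assumed to lie in [0, 1]")

    @property
    def total_features(self):
        """Total number of feature columns in the generated dataset."""
        return (self.relevant_features + self.irrelevant_features
                + self.redundant_features)
```

For example, `GeneratorConfig("hypersphere", 3, 2, 1, 4, 100)` describes a dataset with six feature columns, four labels and 100 instances, leaving every optional parameter at its placeholder default.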

The framework output consists of a synthetic dataset without noise, as well as one synthetic dataset per noise level considered, all in the Mulan format. This format consists of an ARFF file and an XML file per dataset. These files can be submitted directly to the Mulan library, which provides several methods for multi-label learning.
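To make the file pair concrete, the following Python sketch builds the text of an ARFF file and of the accompanying XML label file. The function names and the example attribute names are hypothetical, and the layout follows the commonly documented Mulan convention (binary label attributes declared as `{0,1}` in the ARFF file, label names listed in the XML file); it is not taken from the Mldatagen source.

```python
from xml.sax.saxutils import quoteattr


def to_arff(relation, feature_names, label_names, rows):
    """Build ARFF text: numeric features followed by binary label attributes."""
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} numeric" for name in feature_names]
    lines += [f"@attribute {name} {{0,1}}" for name in label_names]
    lines += ["", "@data"]
    for features, labels in rows:
        values = [repr(float(v)) for v in features]
        values += [str(int(b)) for b in labels]
        lines.append(",".join(values))
    return "\n".join(lines) + "\n"


def to_mulan_xml(label_names):
    """Build the XML text that tells Mulan which attributes are labels."""
    lines = ['<?xml version="1.0" encoding="utf-8"?>',
             '<labels xmlns="http://mulan.sourceforge.net/labels">']
    lines += [f"  <label name={quoteattr(name)}></label>"
              for name in label_names]
    lines.append("</labels>")
    return "\n".join(lines) + "\n"


def write_dataset(basename, feature_names, label_names, rows):
    """Write basename.arff and basename.xml side by side."""
    with open(f"{basename}.arff", "w", encoding="utf-8") as handle:
        handle.write(to_arff(basename, feature_names, label_names, rows))
    with open(f"{basename}.xml", "w", encoding="utf-8") as handle:
        handle.write(to_mulan_xml(label_names))
```

Each row is a `(features, labels)` pair, so `to_arff("demo", ["x1", "x2"], ["y1"], [([0.25, 0.5], [1])])` produces a single data line holding two feature values followed by one label value.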

Attention! Original content hosted at: http://sites.labic.icmc.usp.br/mldatagen.