Tool for training Doc2Vec models.
Document representations based on domain expressions.
Tool for generating document representations based on semantic roles.
RotuLabic is a system developed to support the manual labeling of documents. It uses a transductive learning algorithm to recommend labels to the user, thereby easing the manual labeling work. Currently, the system interface is available only in Portuguese.
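The description does not say which transductive algorithm RotuLabic uses. As a hedged illustration only, the sketch below uses classic label propagation over a Gaussian similarity graph, a common transductive method: labels the user has already assigned stay fixed, and every unlabeled document repeatedly takes the similarity-weighted average of its neighbours' label distributions. The function name propagate_labels and the sigma and iterations parameters are hypothetical and not part of RotuLabic.

```python
import math


def propagate_labels(features, labels, sigma=1.0, iterations=50):
    """Recommend a label for every document by label propagation (illustrative).

    features: one numeric vector per document (e.g. a row of a document-term matrix).
    labels: the user's label for each document, or None if not yet labeled.
    """
    n = len(features)
    classes = sorted({y for y in labels if y is not None})
    # Gaussian similarity graph over all documents (no self-loops).
    weights = [[0.0 if i == j else
                math.exp(-math.dist(features[i], features[j]) ** 2 / (2 * sigma ** 2))
                for j in range(n)] for i in range(n)]
    # One-hot distributions for labeled documents, uniform for unlabeled ones.
    dist = [[1.0 / len(classes)] * len(classes) if y is None
            else [1.0 if c == y else 0.0 for c in classes] for y in labels]
    for _ in range(iterations):
        new_dist = []
        for i in range(n):
            total = sum(weights[i])
            if labels[i] is not None or total == 0.0:
                # Labeled documents stay clamped; isolated ones keep their distribution.
                new_dist.append(dist[i])
            else:
                # Weighted average of the neighbours' label distributions.
                new_dist.append([sum(weights[i][j] * dist[j][k] for j in range(n)) / total
                                 for k in range(len(classes))])
        dist = new_dist
    # Recommend the most probable class for each document.
    return [classes[max(range(len(classes)), key=lambda k: d[k])] for d in dist]
```

For example, with features [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]] and labels ["sports", None, "politics", None], each unlabeled document receives the label of the labeled document closest to it.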
Inductive Classification Tool was developed in Java and aims to generate results for datasets represented in ARFF format using traditional inductive algorithms with different parameter settings.
This is a Java tool that transforms text files into a document-term matrix.
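A minimal sketch of the transformation, written in Python rather than Java purely for brevity: each text becomes a row, each distinct term a column, and each cell holds the raw count of that term in that text. The tokenizer (lowercased runs of letters, so accented Portuguese words are kept) and the function name document_term_matrix are assumptions; the actual tool may apply other preprocessing or weighting schemes.

```python
import re
from collections import Counter


def document_term_matrix(texts):
    """Build a document-term matrix of raw term frequencies (illustrative).

    Returns (vocabulary, matrix) where matrix[i][j] is the number of times
    vocabulary[j] occurs in texts[i].
    """
    # Lowercase and keep runs of letters (Unicode-aware, digits and '_' excluded).
    tokenized = [re.findall(r"[^\W\d_]+", text.lower()) for text in texts]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    index = {term: j for j, term in enumerate(vocabulary)}
    matrix = []
    for tokens in tokenized:
        row = [0] * len(vocabulary)
        for term, count in Counter(tokens).items():
            row[index[term]] = count
        matrix.append(row)
    return vocabulary, matrix
```

For instance, document_term_matrix(["the cat sat", "the cat saw the dog"]) yields the vocabulary ["cat", "dog", "sat", "saw", "the"] and the rows [1, 0, 1, 0, 1] and [1, 1, 0, 1, 2].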
This tool extracts keywords from single documents using statistical methods.
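The specific statistical methods the tool implements are not named here. As the simplest statistical baseline for a single document, the hedged sketch below ranks terms by frequency after discarding stopwords and very short tokens. The stopword list, the min_length threshold and the function name extract_keywords are illustrative assumptions.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "are", "as", "by", "for", "from", "in", "is",
             "it", "of", "on", "or", "that", "the", "to", "with"}


def extract_keywords(text, top_k=3, min_length=3):
    """Return the top_k most frequent non-stopword terms of a single document.

    Ties are broken alphabetically so the output is deterministic.
    """
    tokens = [t for t in re.findall(r"[^\W\d_]+", text.lower())
              if t not in STOPWORDS and len(t) >= min_length]
    counts = Counter(tokens)
    # Sort by descending frequency, then alphabetically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_k]]
```

Frequency alone favours terms that are merely common in the text; more refined single-document methods weight terms by how they are distributed or co-occur within the document, which is a natural extension of this baseline.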
This tool implements the steps required to generate the bag-of-related-words. It also provides functionalities to analyse the generated bag-of-related-words.
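The description does not detail how related words are identified. Assuming, for illustration only, that two words are related when they co-occur in at least min_support documents (a deliberately simplified, pair-only co-occurrence criterion), the generation step and the resulting document representation could be sketched as follows. All function names and the min_support parameter are hypothetical.

```python
import re
from itertools import combinations


def _terms(text):
    """Set of lowercased letter-only tokens in a text."""
    return set(re.findall(r"[^\W\d_]+", text.lower()))


def mine_related_pairs(texts, min_support=2):
    """Find word pairs that co-occur in at least min_support documents."""
    support = {}
    for terms in map(_terms, texts):
        for pair in combinations(sorted(terms), 2):
            support[pair] = support.get(pair, 0) + 1
    return sorted(pair for pair, s in support.items() if s >= min_support)


def bag_of_related_words(texts, min_support=2):
    """Represent each document by binary features, one per related-word pair."""
    pairs = mine_related_pairs(texts, min_support)
    doc_terms = [_terms(text) for text in texts]
    # A feature is 1 when both words of the pair occur in the document.
    matrix = [[1 if a in terms and b in terms else 0 for a, b in pairs]
              for terms in doc_terms]
    return pairs, matrix
```

With texts ["machine learning model", "machine learning data", "data mining model"] and min_support=2, only ("learning", "machine") qualifies, and the three documents are represented as [1], [1] and [0].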
This framework, which is described in an ICMC-USP technical report, can generate synthetic multi-label datasets using two strategies: hyperspheres or hypercubes. For each label in a dataset, these strategies randomly generate a geometric shape (hypersphere or hypercube), which is then populated with randomly generated points (instances or examples). Afterwards, each instance is labeled according to the shapes it belongs to, which defines the instance's set of labels.
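The hypersphere strategy can be sketched as below, assuming centres drawn uniformly from [0, 1]^d, radii drawn from [0.1, 0.5], and instances spread evenly across the shapes. These ranges and the function names are illustrative assumptions, not the framework's actual parameters. Points are sampled uniformly inside a hypersphere by normalizing a Gaussian vector (a uniformly random direction) and scaling the radius by u^(1/d).

```python
import math
import random


def random_point_in_hypersphere(centre, radius, rng):
    """Sample a point uniformly inside the hypersphere (centre, radius)."""
    d = len(centre)
    while True:
        # A normalized Gaussian vector gives a uniformly random direction.
        direction = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in direction))
        if norm > 0.0:
            break
    # Scaling the radius by u ** (1 / d) makes the point uniform in volume.
    r = radius * rng.random() ** (1.0 / d)
    return [c + r * x / norm for c, x in zip(centre, direction)]


def generate_hypersphere_dataset(num_labels, num_attributes, num_instances, seed=0):
    """Return (instances, labelsets); labelsets[i] holds the label indices
    whose hypersphere contains instances[i]."""
    rng = random.Random(seed)
    # One random hypersphere per label.
    spheres = []
    for _ in range(num_labels):
        centre = [rng.uniform(0.0, 1.0) for _ in range(num_attributes)]
        radius = rng.uniform(0.1, 0.5)
        spheres.append((centre, radius))

    instances, labelsets = [], []
    for i in range(num_instances):
        # Populate the shapes in turn so every label receives points.
        owner = i % num_labels
        centre, radius = spheres[owner]
        point = random_point_in_hypersphere(centre, radius, rng)
        # Label the point with every hypersphere that contains it.
        labels = {k for k, (c, r) in enumerate(spheres) if math.dist(point, c) <= r}
        labels.add(owner)  # guard against floating-point rounding at the boundary
        instances.append(point)
        labelsets.append(labels)
    return instances, labelsets
```

Points that fall where hyperspheres overlap receive several labels, which is what makes the dataset multi-label. The hypercube strategy would differ only in how points are sampled and in the membership test (every coordinate within half the side length of the centre).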