Harpia - Hierarchical Classification Framework
Harpia is an open-source Java library for developing machine learning algorithms that learn from hierarchically labelled examples. It is named after the harpy eagle (Harpia harpyja). The framework is built on the Weka Machine Learning API, which widens the range of machine learning algorithms that can serve as base classifiers in "local" hierarchical classification algorithms. The evaluation module contains a variety of measures in three categories: example-based, label-based and level-based. Micro- and macro-averaged measures are also available.
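The difference between micro and macro averaging can be illustrated with a minimal sketch. This is not Harpia's API: the function name and the input layout (true-positive and false-positive counts per label) are assumptions made only for this example, and precision stands in for whichever measure is being averaged.

```python
def micro_macro_precision(counts):
    """Return (micro, macro) precision.

    counts: dict mapping each label to a (true_positives, false_positives)
    pair. This layout is hypothetical, chosen only for illustration.
    """
    # Macro average: compute precision per label, then average the results,
    # so every label weighs the same regardless of how often it is predicted.
    per_label = []
    for tp, fp in counts.values():
        per_label.append(tp / (tp + fp) if (tp + fp) > 0 else 0.0)
    macro = sum(per_label) / len(per_label)

    # Micro average: pool the counts over all labels first, then compute a
    # single precision, so frequently predicted labels dominate.
    total_tp = sum(tp for tp, _ in counts.values())
    total_fp = sum(fp for _, fp in counts.values())
    pooled = total_tp + total_fp
    micro = total_tp / pooled if pooled > 0 else 0.0
    return micro, macro
```

With a frequent label that is predicted well and a rare label that is predicted poorly, the micro average stays close to the frequent label's precision, while the macro average is pulled down by the rare one.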
IESystem - Information Extraction System
IESystem extracts metadata from scientific articles, even when they come from different sources or are written in different languages. Metadata extraction is based on models that describe the relative positions of the content to be extracted, or indicators of that content. A set of functionalities for pre-processing and for assisting the user in constructing models is also available.
KNN-WEKA - A new k-nearest neighbor implementation for Weka
KNN-WEKA extends the current Weka k-nearest neighbor implementation by adding an example weighting function based on the distance from each stored example to the query example. KNN-WEKA also provides a distance function, the Heterogeneous Euclidean-VDM Metric (HVDM), which aims to make better use of the information provided by nominal attributes.
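As a rough illustration of how HVDM combines numeric and nominal attributes, the sketch below follows one common formulation: numeric differences are divided by four standard deviations, nominal differences use a normalized value difference metric over class-conditional probabilities, and missing values contribute a distance of 1. It is not the KNN-WEKA code; the function names and data layout are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

def fit_hvdm(X, y, nominal):
    """Collect per-attribute statistics from training rows X with labels y.

    nominal: set of column indices holding nominal attributes (hypothetical
    layout). Missing values are represented by None.
    """
    stats = {}
    for a in range(len(X[0])):
        if a in nominal:
            # Class counts for every observed value of the nominal attribute.
            counts = defaultdict(Counter)
            for row, label in zip(X, y):
                if row[a] is not None:
                    counts[row[a]][label] += 1
            stats[a] = counts
        else:
            # Population standard deviation of the numeric attribute.
            values = [row[a] for row in X if row[a] is not None]
            mean = sum(values) / len(values)
            stats[a] = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return stats, sorted(set(y))

def hvdm(u, v, stats, classes, nominal):
    """Distance between rows u and v under the formulation described above."""
    total = 0.0
    for a, (x, z) in enumerate(zip(u, v)):
        if x is None or z is None:
            d = 1.0  # missing value: maximal per-attribute distance
        elif a in nominal:
            cx, cz = stats[a][x], stats[a][z]
            nx, nz = sum(cx.values()), sum(cz.values())
            if nx == 0 or nz == 0:
                d = 1.0  # value never seen in training (a choice of this sketch)
            else:
                d = math.sqrt(sum((cx[c] / nx - cz[c] / nz) ** 2 for c in classes))
        else:
            std = stats[a]
            d = abs(x - z) / (4 * std) if std > 0 else 0.0
        total += d * d
    return math.sqrt(total)
```

Two nominal values are close under this metric when they are associated with similar class distributions, which is how class information enters the distance for attributes that have no natural ordering.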
PRETEXT - Text preprocessing
PRETEXT is a computational tool, implemented in Perl using the object-oriented paradigm, that automatically performs most Text Mining pre-processing tasks on a collection of documents. The documents may be written in any of three languages: Portuguese, Spanish and English. In addition, the tool includes facilities to reduce the dimensionality of any pre-processed text data set by using Zipf's law and Luhn cut-offs.
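The idea behind Luhn cut-offs can be sketched as follows: terms are counted over the whole collection, and only those whose frequency falls between a lower and an upper threshold are kept, discarding both very rare and very common terms. This is an illustrative Python sketch, not PRETEXT's Perl implementation; the function name, the tokenized input format and the threshold parameters are assumptions.

```python
from collections import Counter

def apply_luhn_cutoffs(documents, lower, upper):
    """Keep only terms whose collection frequency lies in [lower, upper].

    documents: list of token lists (hypothetical input format).
    Returns the filtered documents and the set of retained terms.
    """
    # Collection-wide term frequencies; sorting these in decreasing order
    # would give the rank-frequency curve described by Zipf's law.
    freq = Counter(token for doc in documents for token in doc)

    # Terms above the upper cut-off are too common to discriminate between
    # documents; terms below the lower cut-off are too rare to be reliable.
    kept = {term for term, count in freq.items() if lower <= count <= upper}

    filtered = [[token for token in doc if token in kept] for doc in documents]
    return filtered, kept
```

The thresholds are usually chosen by inspecting the rank-frequency curve of the collection, so suitable values depend on the data set.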
TaXEm - a tool for helping evaluate domain topics
TaXEm (Taxonomia em XML da Embrapa) is a fast and efficient tool to organize, retrieve, browse and extract knowledge from textual documents. To organize information from a specific domain, TaXEm builds a taxonomy that can be evaluated automatically or semi-automatically. This evaluation can be carried out using objective measures or through a subjective analysis based on the domain specialist's judgment.
Torch - Topic Hierarchies
Torch helps users to "see hidden topics" in text collections. The tool can be used in a wide variety of applications, such as digital libraries, web directories and document engineering. Torch is based on the IHTC (Incremental Hierarchical Term Cluster) method, which aims to build topic hierarchies from growing text collections.
Gráficos Labic
Gráficos Labic is a very simple tool that helps the user quickly choose from a set of graphs commonly used in the pre-processing phase of the Data Mining process, and retrieve the corresponding R code for the chosen graph.
CoAL and Co-Training Java Implementation
CoAL is a new algorithm that merges Co-Training, a well-known multi-view semi-supervised machine learning algorithm, with Co-Testing, a multi-view active learning algorithm. Both CoAL and Co-Training were implemented in Java using the Weka API. In addition to these algorithms, the implementation offers an abstract class that eases the task of implementing new Co-Training-style algorithms.
FEATuRE - Features gEnerator based on AssociaTion RulEs
This tool implements the steps required to generate the bag-of-related-words. It also provides functionalities to analyse the generated bag-of-related-words.
ICT - Inductive Classification Tool
Inductive Classification Tool was developed in Java and aims to generate results using traditional inductive algorithms, with their different parameters, for datasets represented in ARFF format.
RotuLabic is a system developed to support manual labeling of documents. The system uses a transductive learning algorithm to recommend labels to the user and, thus, supports the manual labeling work. Currently, the system interface is available only in Portuguese.