DATA MINING WITH ROUGH SETS TECHNIQUES / MINERAÇÃO DE DADOS COM TÉCNICAS DE ROUGH SETS

AUTHOR(S)
PUBLICATION DATE

2000

ABSTRACT

This dissertation investigates the application of Rough Sets to Knowledge Discovery in Databases (KDD). The main goal of the work was to evaluate the performance of Rough Sets techniques on the classification problem. Classification is a task of the data mining step of the KDD process that discovers the decision rules that best represent a group of records in a database. The work had five major steps: a study of the KDD process; a study of Rough Sets techniques applied to data mining; an evaluation of existing data mining tools; the development of the Bramining project; and the execution of case studies to evaluate Bramining. The study of the KDD process covered all of its steps: transformation, cleaning, selection, data mining, and post-processing. The results obtained served as a basis for the enhancement of Bramining. The study of Rough Sets techniques covered the theory's concepts and their applicability in the KDD context. Rough Sets theory was introduced by Zdzislaw Pawlak in the early 1980s as a mathematical approach to the analysis of vague and uncertain data. This research made it possible to implement the technique within the developed tool. The analysis of existing data mining tools involved studying and testing software based on different techniques, which enriched the background used in evaluating the research. The evolution of the Bramining project consisted of enhancing the KDD environment developed in previous works, including the addition of Rough Sets techniques. The case studies were run in parallel on Bramining and on a commercial mining tool, for comparison. The quality of the knowledge generated by Bramining was considered equivalent to that of the commercial tool, with both providing good decision rules in most cases.
Nevertheless, Bramining proved better suited to the complete KDD process, thanks to its many features for preparing data for the data mining step. The results achieved with the developed application demonstrated the suitability of Rough Sets concepts for the data classification task. Some weaknesses of the technique were identified, such as the need for prior attribute reduction and the inability to deal with continuous-domain data. However, once the technique was embedded in a more complete KDD environment such as the Bramining project, those weaknesses were overcome. The data preparation features available in the Bramining environment, particularly the attribute reduction and attribute coding options, enable the user to adapt the database to the use of Rough Sets algorithms. Data mining is a highly relevant topic today, and many methods have been proposed for the different tasks it involves. Compared to other techniques, Rough Sets theory brought no significant advantages or disadvantages to the process, but it was of great value in showing that there are alternative paths to knowledge discovery.
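
As a rough illustration of the Rough Sets idea the abstract refers to, the sketch below computes the lower and upper approximations of a decision class over a small table. It is not taken from Bramining: the table, the attribute names (income, debt, approved), and the function names are hypothetical, and the data is assumed to be already discretized, in line with the limitation on continuous-domain data noted above.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into indiscernibility classes over the chosen attributes."""
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].add(i)
    return list(classes.values())


def approximations(rows, attrs, target):
    """Return the (lower, upper) approximations of the index set `target`."""
    lower, upper = set(), set()
    for block in partition(rows, attrs):
        if block <= target:   # every row in the class belongs to the target
            lower |= block
        if block & target:    # at least one row in the class belongs to the target
            upper |= block
    return lower, upper


# Hypothetical toy table (assumption: values are already discretized).
rows = [
    {"income": "high", "debt": "low",  "approved": "yes"},
    {"income": "high", "debt": "low",  "approved": "yes"},
    {"income": "low",  "debt": "high", "approved": "no"},
    {"income": "low",  "debt": "low",  "approved": "yes"},
    {"income": "low",  "debt": "low",  "approved": "no"},
]

# Decision class: rows whose outcome is "yes".
target = {i for i, r in enumerate(rows) if r["approved"] == "yes"}
lower, upper = approximations(rows, ["income", "debt"], target)
boundary = upper - lower  # rows the condition attributes cannot classify with certainty
```

Rows in the lower approximation would support certain decision rules (here, high income and low debt imply approval), while rows in the boundary region would only support possible rules.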

SUBJECT(S)

data mining; database
