ESTUDO COMPARATIVO DE ESTRATÉGIAS DE CLASSIFICAÇÃO DE PÁGINAS WEB / A COMPARATIVE STUDY OF WEB PAGE CLASSIFICATION STRATEGIES

AUTOR(ES)
DATA DE PUBLICAÇÃO

2009

RESUMO

The amount of information on the Internet increases every day. Even though this proliferation increases the chances that the subject being searched for by an user is on the Web, it also makes finding the desired information much harder. The automated classification of pages is, therefore, an important tool for organizing Web content, with specific applications on the improvement of results displayed by search engines. In this dissertation, a comparative study of different attribute sets and classification methods for the functional classification of web pages was made, focusing on 4 classes: Blogs, Blog Posts, News Portals and News. Throughout the experiments, it became evident the best approach for this task is to employ attributes that come both from the structure and the text of the web pages. We also presented a new strategy for extracting and building text attribute sets, that takes into account the different writing styles for each page class.

ASSUNTO(S)

web aprendizado de maquina classification web blogs blogs classificacao machine learning

Documentos Relacionados