ESTUDO COMPARATIVO DE ESTRATÉGIAS DE CLASSIFICAÇÃO DE PÁGINAS WEB / A COMPARATIVE STUDY OF WEB PAGE CLASSIFICATION STRATEGIES
AUTHOR(S)
THORAN ARAGUEZ RODRIGUES
PUBLICATION DATE
2009
ABSTRACT
The amount of information on the Internet increases every day. Although this growth raises the chances that the subject a user is searching for is on the Web, it also makes finding the desired information much harder. The automated classification of pages is therefore an important tool for organizing Web content, with specific applications in improving the results displayed by search engines. This dissertation presents a comparative study of different attribute sets and classification methods for the functional classification of web pages, focusing on four classes: Blogs, Blog Posts, News Portals and News. Throughout the experiments, it became evident that the best approach for this task is to employ attributes drawn from both the structure and the text of the web pages. We also present a new strategy for extracting and building text attribute sets that takes into account the different writing styles of each page class.
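The idea of drawing attributes from both the structure and the text of a page can be illustrated with a minimal sketch. It is not the dissertation's actual attribute extraction: the class `PageFeatureParser`, the function `extract_features`, and the `struct:`/`text:` feature prefixes are hypothetical names chosen for this example, and raw tag counts and word counts are assumed stand-ins for whatever attributes the study used.

```python
from collections import Counter
from html.parser import HTMLParser
import re


class PageFeatureParser(HTMLParser):
    """Collects tag counts (structure) and visible text from an HTML page."""

    # Text inside these elements is not visible to a reader, so it is ignored.
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.tag_counts = Counter()
        self.text_parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.text_parts.append(data)


def extract_features(html):
    """Return one feature dict mixing structural and textual attributes."""
    parser = PageFeatureParser()
    parser.feed(html)
    parser.close()

    # Textual attributes: lowercase word counts over the visible text.
    words = re.findall(r"[a-z']+", " ".join(parser.text_parts).lower())

    features = {}
    for tag, count in parser.tag_counts.items():
        features["struct:" + tag] = count
    for word, count in Counter(words).items():
        features["text:" + word] = count
    return features
```

A dictionary of this kind could be passed to any classifier that accepts sparse feature vectors; prefixing the keys keeps a tag name such as `a` from colliding with the word "a" in the text.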
SUBJECT(S)
web; machine learning; classification; blogs