Seven Editora


Contact

  • Seven Publicações Ltda, CNPJ: 43.789.355/0001-14, Rua: Travessa Aristides Moleta, 290, São José dos Pinhais/PR, CEP: 83045-090
  • Principal Contact
  • Nathan Albano Valente
  • (41) 9 8836-2677
  • editora@sevenevents.com.br
  • Support Contact
  • contato@sevenevents.com.br

Web Scraping: Data search and retrieval methodology in information science

Machado G

Gilnei Machado


Keywords

Information retrieval
Data scraping
Programming in Information Science
Python

Abstract

Web scraping is the automated collection of data from websites by bots or programs. It matters today because databases keep growing and the need for timely information is pressing. The technique makes it possible to extract text, numerical data, images, files and tables, both from a site's home page and from its various tabs. The aim of this chapter is to present the potential of web scraping with Python for collecting data from websites, and its importance for information retrieval in Information Science. We ran lines of code written by Lisa Tagliaferri to check whether they work and whether they return the desired information. The source was a Web Archive page listing all the artists whose works are in the National Gallery of Art in the USA, from which we retrieved the artists' names and the links attached to those names. The code proved extremely useful, since it retrieves a large amount of information in a short time. We conclude that scraping data from websites with Python is perfectly feasible and that the information retrieved was fully satisfactory.
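
To make the kind of extraction described above concrete, here is a minimal sketch of pulling names and their links out of an HTML listing. It is not the code used in the chapter: it relies only on Python's standard `html.parser` module, and the sample markup, the class names `BodyText` and `AlphaNav`, the artist names, and the helper `extract_artists` are all hypothetical stand-ins for a downloaded index page.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for a downloaded artist index page.
# The class names and artist entries are illustrative assumptions.
SAMPLE_HTML = """
<html><body>
<div class="BodyText">
  <a href="/artist/1">Example, Artist A</a>
  <a href="/artist/2">Example, Artist B</a>
</div>
<div class="AlphaNav"><a href="/index/a">A</a></div>
</body></html>
"""


class ArtistLinkParser(HTMLParser):
    """Collect (name, href) pairs for links inside a div of a given class."""

    def __init__(self, container_class):
        super().__init__()
        self.container_class = container_class
        self.depth = 0            # div nesting depth inside the target container
        self.current_href = None  # href of the link currently being read
        self.current_text = []    # text fragments of that link
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth > 0:
                self.depth += 1
            elif self.container_class in (attrs.get("class") or "").split():
                self.depth = 1
        elif tag == "a" and self.depth > 0 and attrs.get("href"):
            self.current_href = attrs["href"]
            self.current_text = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
        elif tag == "a" and self.current_href is not None:
            name = "".join(self.current_text).strip()
            self.links.append((name, self.current_href))
            self.current_href = None


def extract_artists(html, container_class="BodyText"):
    """Return the (name, link) pairs found inside the chosen container."""
    parser = ArtistLinkParser(container_class)
    parser.feed(html)
    parser.close()
    return parser.links


artists = extract_artists(SAMPLE_HTML)
for name, href in artists:
    print(f"{name}\t{href}")
```

Restricting the search to one container keeps navigation links, such as the alphabetical index in the sample, out of the results. With a live page, the HTML string would come from an HTTP request rather than an inline constant.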

 

DOI: https://doi.org/10.56238/sevened2023.008-002


Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Copyright (c) 2023 Gilnei Machado