Web Scraping: Data search and retrieval methodology in information science

Gilnei Machado

Keywords:

Information retrieval, Data scraping, Programming in Information Science, Python

Abstract

Web scraping is the automated collection of data from websites by bots or programs. It matters today because databases keep growing and information is often needed quickly. The technique makes it possible to extract text, numerical data, images, files and tables, both from a site's home page and from its other tabs. The aim of this chapter is to present the potential of web scraping with Python for collecting data from websites, and its importance for information retrieval in Information Science. We ran code written by Lisa Tagliaferri to check whether it works and whether it returns the desired information. The data source was Web Archive, which lists all the artists whose works are in the National Gallery of Art in the USA; from it we retrieved the artists' names and the links associated with them. We found the code extremely useful for obtaining information, since it retrieves a large amount of data in a short time. We conclude that scraping data from websites with Python is perfectly feasible and that the information retrieved was fully satisfactory.
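
To make the described task concrete, the following is a minimal sketch of extracting artist names and their links from an index page. It is not the code by Lisa Tagliaferri that the chapter tested: it swaps in Python's standard-library `html.parser` so that it runs without extra packages, and it parses an inline HTML fragment instead of fetching a live page. The markup, the `artist-list` class name, the `https://example.org` base URL, and the names `ArtistLinkParser` and `extract_artists` are all hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical markup standing in for an artist index page (not the real site).
SAMPLE_HTML = """
<div class="artist-list">
  <a href="/artist/1">Example, Artist One</a>
  <a href="/artist/2">Example, Artist Two</a>
</div>
<a href="/about">About</a>
"""


class ArtistLinkParser(HTMLParser):
    """Collect (name, absolute URL) pairs for links inside one container div."""

    def __init__(self, base_url, container_class):
        super().__init__()
        self.base_url = base_url
        self.container_class = container_class
        self.depth = 0            # div nesting depth inside the target container
        self.current_href = None  # set while we are inside a matching <a>
        self.current_text = []
        self.artists = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth > 0:
                self.depth += 1
            elif self.container_class in (attrs.get("class") or "").split():
                self.depth = 1
        elif tag == "a" and self.depth > 0 and attrs.get("href"):
            # Resolve relative links against the page's base URL.
            self.current_href = urljoin(self.base_url, attrs["href"])
            self.current_text = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.current_href is not None:
            name = "".join(self.current_text).strip()
            if name:
                self.artists.append((name, self.current_href))
            self.current_href = None
        elif tag == "div" and self.depth > 0:
            self.depth -= 1


def extract_artists(html, base_url="https://example.org",
                    container_class="artist-list"):
    """Return a list of (artist name, link) tuples found in the container."""
    parser = ArtistLinkParser(base_url, container_class)
    parser.feed(html)
    parser.close()
    return parser.artists


artists = extract_artists(SAMPLE_HTML)
for name, link in artists:
    print(f"{name}\t{link}")
```

Restricting the search to one container keeps navigation links such as "About" out of the result. For a real page, the HTML would first have to be downloaded and the container class adapted to that page's actual structure.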

DOI: https://doi.org/10.56238/sevened2023.008-002