Abstract
The use of text mining techniques has increased considerably in recent years, driven by the large volume of textual information produced and stored by electronic systems and by organizations' need to turn this data into usable information. In this context, the Court of Auditors of the State of Rio Grande do Norte (Tribunal de Contas do Estado do Rio Grande do Norte, TCE-RN) receives a large number of electronic invoices every day, containing product purchase data that needs to be analyzed for the benefit of society. However, these documents allow some fields to be filled in freely, and often erroneously, by the sellers who issue the invoices. As a result, the documents do not follow a pattern, which makes it impossible to analyze them in a practical and efficient way with common tools for obtaining and filtering data. Automated processing is therefore needed to standardize the data, make it available quickly, and enable its use as information for audit purposes. This work presents a solution based on text mining and machine learning techniques for the problem of identifying products commercialized in the state of Rio Grande do Norte from the description field of electronic invoices, so that these products can be classified into unique products.
DOI: https://doi.org/10.56238/tfisdwv1-078