Book2speech: a low-cost machine for converting textual content into audible feedback via synthesized speech
dc.contributor.advisor1 | SAMPAIO NETO, Nelson Cruz | |
dc.contributor.advisor1Lattes | http://lattes.cnpq.br/9756167788721062 | pt_BR |
dc.contributor.advisor1ORCID | https://orcid.org/0000-0003-0408-4187 | pt_BR |
dc.creator | CANAVARRO, João Victor da Silva Dias | |
dc.creator.Lattes | http://lattes.cnpq.br/1994255852377245 | pt_BR |
dc.date.accessioned | 2025-03-07T15:11:24Z | |
dc.date.available | 2025-03-07T15:11:24Z | |
dc.date.issued | 2021 | |
dc.description.abstract | Despite great advances in the digital dissemination of information, printed media remains an important and frequently used means of conveying knowledge. For the visually impaired, however, physically distributed content still poses a major barrier: alternative reading methods such as Braille are often unavailable, and literacy in these writing systems is lacking among the blind population. To overcome these difficulties, Assistive Technology solutions have been proposed in both academic and commercial settings, aiming to increase the independence, quality of life, and inclusion of this portion of the population. The present work seeks to develop a standalone reading machine that recognizes the textual content of books and similar printed material and converts it into audible feedback via synthesized speech. Based on the Raspberry Pi 3, an embedded microcomputer, and equipped with the Pi NoIR (InfraRed) camera module, Book2Speech performs image acquisition, Optical Character Recognition, and Text-to-Speech, using image and text processing modules to improve the representativeness of the synthesized voice, which is reproduced through an external speaker. Due to the lack of available document-image datasets in Brazilian Portuguese, the ICDAR2015 dataset, composed mostly of English text, was used to evaluate Book2Speech's performance. Processing time and error rate were used as metrics, the latter calculated from the difference between the text recognized by the machine and the reference. Regarding the results, the dewarping method using the L-BFGS-B algorithm as the optimizer obtained the lowest word error rate (12.32%), while the average threshold pipeline followed by dewarping with L-BFGS-B obtained the lowest character error rate (13.27%). 
On the other hand, the spelling correction methods evaluated did not lead to good results, often increasing the Book2Speech error rate. Finally, it is worth mentioning that the system developed, along with the tools and resources used, is freely available. | pt_BR |
dc.identifier.citation | CANAVARRO, João Victor da Silva Dias. Book2speech: a low-cost machine for converting textual content into audible feedback via synthesized speech. Advisor: Roberto Samarone dos Santos Araújo. 2021. 63 leaves. Undergraduate thesis (Bachelor's degree in Computer Science) – Faculdade de Computação, Instituto de Ciências Exatas e Naturais, Universidade Federal do Pará, Belém, 2021. Available at:. Accessed on:. | pt_BR |
dc.identifier.uri | https://bdm.ufpa.br/jspui/handle/prefix/7777 | |
dc.rights | Open Access | pt_BR |
dc.source | 1 CD-ROM | pt_BR |
dc.subject | Assistive technology | pt_BR |
dc.subject | Image processing | pt_BR |
dc.subject | Spelling correction | pt_BR |
dc.subject | Optical character recognition | pt_BR |
dc.subject | Text-to-speech | pt_BR |
dc.subject | Embedded systems | pt_BR |
dc.subject.cnpq | CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO | pt_BR |
dc.title | Book2speech: a low-cost machine for converting textual content into audible feedback via synthesized speech | pt_BR |
dc.type | Coursework - Undergraduate - Monograph | pt_BR |
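
The abstract reports word and character error rates computed from the difference between the recognized text and the reference. The record does not give the exact formula, so the sketch below assumes the conventional definition: the Levenshtein edit distance divided by the length of the reference, counted in words for the word error rate and in characters for the character error rate. The function names (`edit_distance`, `error_rate`, `wer`, `cer`) and the sample strings are illustrative and are not taken from the Book2Speech code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_units, hyp_units):
    """Edit distance normalized by the reference length (assumed definition)."""
    if len(ref_units) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_units, hyp_units) / len(ref_units)


def wer(reference, hypothesis):
    """Word error rate: units are whitespace-separated tokens."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis):
    """Character error rate: units are individual characters."""
    return error_rate(reference, hypothesis)


if __name__ == "__main__":
    # Hypothetical OCR output with one misrecognized character.
    reference = "the quick brown fox"
    hypothesis = "the quick brown fax"
    print(f"WER: {wer(reference, hypothesis):.2%}")
    print(f"CER: {cer(reference, hypothesis):.2%}")
```

Under this definition a single wrong character counts as one full word error, so the two metrics can rank processing pipelines differently, which may be why the abstract reports a different best configuration for each metric.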