Book2speech: a low-cost machine for converting textual content into audible feedback via synthesized speech

dc.contributor.advisor1: SAMPAIO NETO, Nelson Cruz
dc.contributor.advisor1Lattes: http://lattes.cnpq.br/9756167788721062 [pt_BR]
dc.contributor.advisor1ORCID: https://orcid.org/0000-0003-0408-4187 [pt_BR]
dc.creator: CANAVARRO, João Victor da Silva Dias
dc.creator.Lattes: http://lattes.cnpq.br/1994255852377245 [pt_BR]
dc.date.accessioned: 2025-03-07T15:11:24Z
dc.date.available: 2025-03-07T15:11:24Z
dc.date.issued: 2021
dc.description.abstract: Despite great advances in the digital dissemination of information, printed media remains an important and frequently used means of conveying knowledge. For the visually impaired, however, physically distributed content still poses a major barrier: alternative reading methods such as Braille are often unavailable, and literacy in these writing systems is low among the blind population. To overcome these difficulties, Assistive Technology solutions have been proposed in both academic and commercial settings, aiming to increase the independence, quality of life, and inclusion of this portion of the population. The present work develops a standalone reading machine that recognizes the textual content of books and similar printed material and converts it into audible feedback via synthesized speech. Built on the Raspberry Pi 3, an embedded microcomputer, and equipped with the Pi NoIR (InfraRed) camera module, Book2Speech performs image acquisition, Optical Character Recognition, and Text-to-Speech, using image and text processing modules to improve the representativeness of the synthesized voice, which is played through an external speaker. Due to the lack of available document-image datasets in Brazilian Portuguese, the ICDAR2015 dataset, composed mostly of English text, was used to evaluate Book2Speech's performance. The evaluation considered processing time and error rate metrics, the latter calculated from the difference between the text recognized by the machine and the reference. In the results, the dewarping method using the L-BFGS-B algorithm as the optimizer obtained the lowest word error rate (12.32%), while the average threshold pipeline followed by dewarping with L-BFGS-B obtained the lowest character error rate (13.27%). On the other hand, the spelling correction methods evaluated did not lead to good results, often increasing Book2Speech's error rate. Finally, the system developed, along with the tools and resources used, is freely available. [pt_BR]
dc.identifier.citation: CANAVARRO, João Victor da Silva Dias. Book2speech: a low-cost machine for converting textual content into audible feedback via synthesized speech. Advisor: Roberto Samarone dos Santos Araújo. 2021. 63 leaves. Undergraduate thesis (Bachelor's in Computer Science) – Faculdade de Computação, Instituto de Ciências Exatas e Naturais, Universidade Federal do Pará, Belém, 2021. Available at:. Accessed on:. [pt_BR]
dc.identifier.uri: https://bdm.ufpa.br/jspui/handle/prefix/7777
dc.rights: Open Access [pt_BR]
dc.source: 1 CD-ROM [pt_BR]
dc.subject: Assistive technology [pt_BR]
dc.subject: Image processing [pt_BR]
dc.subject: Spelling correction [pt_BR]
dc.subject: Optical character recognition [pt_BR]
dc.subject: Text-to-speech [pt_BR]
dc.subject: Embedded systems [pt_BR]
dc.subject.cnpq: CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO [pt_BR]
dc.title: Book2speech: a low-cost machine for converting textual content into audible feedback via synthesized speech [pt_BR]
dc.type: Course Work - Undergraduate - Monograph [pt_BR]
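
Note on the error rate metrics: the abstract reports a word error rate and a character error rate computed from the difference between the recognized text and the reference. The record does not give the exact formula, so the following is only a minimal sketch of the common definition (Levenshtein edit distance divided by the reference length), not the thesis's own implementation. The function names `edit_distance`, `wer`, and `cer` are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalized by the length of the reference."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def wer(reference, hypothesis):
    """Word error rate: compare whitespace-separated tokens."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis):
    """Character error rate: compare the raw character sequences."""
    return error_rate(reference, hypothesis)


if __name__ == "__main__":
    reference = "the quick brown fox"
    recognized = "the quick brown box"  # made-up OCR output for illustration
    print(f"WER = {wer(reference, recognized):.2%}")
    print(f"CER = {cer(reference, recognized):.2%}")
```

Under this definition an error rate can exceed 100% when the recognized text contains many spurious insertions, which is one reason a preprocessing step can lower one metric more than the other.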

Files

Original bundle
Name: TCC_Boo2speechLowCost.pdf
Size: 9.64 MB
Format: Adobe Portable Document Format