IRIT - Université Toulouse III Paul Sabatier
Conference paper, Year: 2021

Simulating reading mistakes for child speech Transformer-based phone recognition

Abstract

Current performance of automatic speech recognition (ASR) for children is below that of the latest systems dedicated to adult speech. Child speech is particularly difficult to recognise, and substantial corpora are missing to train acoustic models. Furthermore, in the scope of our reading assistant for 5-8-year-old children learning to read, models need to cope with disfluencies and reading mistakes, which remain considerable challenges even for state-of-the-art ASR systems. In this paper, we adapt an end-to-end Transformer acoustic model to speech from children learning to read. Transfer learning (TL) with a small amount of child speech improves the phone error rate (PER) by 48.7% relative over an adult model and outperforms a TL-adapted DNN-HMM model by 21.0% relative PER. Multi-objective training with a Connectionist Temporal Classification (CTC) function further reduces the PER by 4.8% relative. We propose a method of reading mistakes data augmentation, where we simulate word-level repetitions and substitutions with phonetically or graphically close words. Combining these two types of reading mistakes reaches a 19.9% PER, with a 13.1% relative improvement over the baseline. A detailed analysis shows that both the CTC multi-objective training and the augmentation with synthetic repetitions help the attention mechanisms better detect children's disfluencies.
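The reading-mistake augmentation described in the abstract can be illustrated at the text level with a minimal sketch. This is not the authors' implementation: the function name `simulate_reading_mistakes`, the probabilities `p_repeat` and `p_substitute`, and the use of Python's `difflib` string similarity as a stand-in for "graphically close" words are all assumptions made for illustration. The sketch does not model phonetic closeness, and it leaves out the audio side of the augmentation entirely.

```python
import difflib
import random


def simulate_reading_mistakes(words, vocabulary, p_repeat=0.1,
                              p_substitute=0.1, rng=None):
    """Return a copy of `words` with synthetic word-level reading mistakes.

    Hypothetical helper: each word is independently repeated with
    probability `p_repeat`, or replaced by a graphically close word from
    `vocabulary` with probability `p_substitute`.
    """
    rng = rng or random.Random()
    augmented = []
    for word in words:
        draw = rng.random()
        if draw < p_repeat:
            # Repetition: the reader says the same word twice.
            augmented.extend([word, word])
        elif draw < p_repeat + p_substitute:
            # Substitution: pick a different word whose spelling is similar
            # (assumption: difflib similarity approximates graphic closeness).
            close = difflib.get_close_matches(word, vocabulary, n=5, cutoff=0.6)
            candidates = [w for w in close if w != word]
            # Keep the original word when no close neighbour exists.
            augmented.append(rng.choice(candidates) if candidates else word)
        else:
            augmented.append(word)
    return augmented
```

For example, calling the function with `p_repeat=0.0`, `p_substitute=1.0` and the vocabulary `["chat", "chant", "chien"]` turns `["chat"]` into `["chant"]`, because "chant" is the only vocabulary entry above the similarity cutoff other than the word itself. In a real pipeline the augmented word sequence would also need matching audio and phone-level targets, which this sketch leaves out.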
Main file
Paper_Interspeech2021_LucileGelin.pdf (309.26 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03257870, version 1 (11-06-2021)

Identifiers

  • HAL Id: hal-03257870, version 1

Cite

Lucile Gelin, Thomas Pellegrini, Julien Pinquier, Morgane Daniel. Simulating reading mistakes for child speech Transformer-based phone recognition. Annual Conference of the International Speech Communication Association (INTERSPEECH), Aug 2021, Brno, Czech Republic. ⟨hal-03257870⟩
