Vocal development in a large-scale crosslinguistic corpus.

Margaret Cychosz, Alejandrina Cristia, Elika Bergelson, Marisa Casillas, Gladys Baudet, Anne S Warlaumont, Camila Scaff, Lisa D Yankowitz, Amanda Seidl
Published in: Developmental science (2021)
This study evaluates whether early vocalizations develop in similar ways in children across diverse cultural contexts. We analyze data from daylong audio recordings of 49 children (1-36 months) from five different language/cultural backgrounds. Citizen scientists annotated these recordings to determine if child vocalizations contained canonical transitions or not (e.g., "ba" vs. "ee"). Results revealed that the proportion of clips reported to contain canonical transitions increased with age. Furthermore, this proportion exceeded 0.15 by around 7 months, replicating and extending previous findings on canonical vocalization development but using data from the natural environments of a culturally and linguistically diverse sample. This work explores how crowdsourcing can be used to annotate corpora, helping establish developmental milestones relevant to multiple languages and cultures. Lower inter-annotator reliability on the crowdsourcing platform, relative to more traditional in-lab expert annotators, means that a larger number of unique annotators and/or annotations are required, and that crowdsourcing may not be a suitable method for more fine-grained annotation decisions. Audio clips used for this project are compiled into a large-scale infant vocalization corpus that is available for other researchers to use in future work.
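
As a rough illustration of the measure described in the abstract, the sketch below computes, for each child, the proportion of clips judged to contain a canonical transition and compares it with the 0.15 value mentioned above. The abstract does not say how multiple crowdsourced judgments per clip were combined, so the strict majority vote used here is an assumption, as are the function name, the label strings, and the `(child_id, clip_id, label)` input layout.

```python
from collections import Counter, defaultdict

# Proportion mentioned in the abstract as being exceeded by around 7 months.
CANONICAL_THRESHOLD = 0.15


def canonical_proportion(annotations):
    """Return {child_id: share of that child's clips labeled canonical}.

    `annotations` is an iterable of (child_id, clip_id, label) tuples, where
    label is "canonical" or "non-canonical". This layout is hypothetical.
    """
    # Tally every annotator's label for each (child, clip) pair.
    votes = defaultdict(Counter)
    for child_id, clip_id, label in annotations:
        votes[(child_id, clip_id)][label] += 1

    canonical = Counter()
    total = Counter()
    for (child_id, _clip_id), counts in votes.items():
        total[child_id] += 1
        # Assumption: a clip counts as canonical only on a strict majority;
        # ties are treated as non-canonical.
        if counts["canonical"] > sum(counts.values()) / 2:
            canonical[child_id] += 1

    return {child: canonical[child] / total[child] for child in total}


# Made-up example data: two children, three annotators per clip.
example = [
    ("c1", "a", "canonical"), ("c1", "a", "canonical"), ("c1", "a", "non-canonical"),
    ("c1", "b", "non-canonical"), ("c1", "b", "non-canonical"), ("c1", "b", "canonical"),
    ("c2", "a", "non-canonical"), ("c2", "a", "non-canonical"), ("c2", "a", "non-canonical"),
]
proportions = canonical_proportion(example)
above_threshold = {child: p > CANONICAL_THRESHOLD for child, p in proportions.items()}
```

Collecting more judgments per clip, as the abstract recommends given the lower inter-annotator reliability of crowdsourcing, would make a vote of this kind less sensitive to any single annotator.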