Efficient Estimation of Children's Language Exposure in Two Bilingual Communities.
Margaret Cychosz, Anele Villanueva, Adriana Weisleder. Published in: Journal of Speech, Language, and Hearing Research (JSLHR), 2021.
Purpose: The language that children hear early in life is associated with their speech-language outcomes. This line of research relies on naturalistic observations of children's language input, often captured with daylong audio recordings. However, the large quantity of data that daylong recordings generate requires novel analytical tools to feasibly parse thousands of hours of naturalistic speech. This study outlines a new approach to efficiently process and sample from daylong audio recordings made in two bilingual communities, Spanish-English in the United States and Quechua-Spanish in Bolivia, to derive estimates of children's language exposure.

Method: We employed a general sampling with replacement technique to efficiently estimate two key elements of children's early language environments: (a) proportion of child-directed speech (CDS) and (b) dual language exposure. Proportions estimated from random sampling of 30-s segments were compared to those from annotations over the entire daylong recording (every other segment), as well as parental report of dual language exposure.

Results: Results showed that approximately 49 min from each recording or just 7% of the overall recording was required to reach a stable proportion of CDS and bilingual exposure. In both speech communities, strong correlations were found between bilingual language estimates made using random sampling and all-day annotation techniques. A strong association was additionally found for CDS estimates in the United States, but this was weaker at the Bolivian site, where CDS was less frequent. Dual language estimates from the audio recordings did not correspond well to estimates derived from parental report collected months apart.

Conclusions: Daylong recordings offer tremendous insight into children's daily language experiences, but they will not become widely used in developmental research until data processing and annotation time substantially decrease. We show that annotation based on random sampling is a promising approach to efficiently estimate ambient characteristics from daylong recordings that cannot currently be estimated via automated methods.
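
The sampling idea described in the Method can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the simulated recording (1,400 thirty-second segments with 25% containing CDS), and the stability rule (the last 20 running estimates vary by at most 0.02) are all assumptions made for illustration, since the abstract does not specify how stability was defined.

```python
import random


def running_proportion(labels, n_draws, seed=0):
    """Draw n_draws segments with replacement and return the running
    proportion of segments labeled True (e.g., containing CDS)."""
    rng = random.Random(seed)
    hits = 0
    running = []
    for i in range(1, n_draws + 1):
        hits += rng.choice(labels)  # True counts as 1, False as 0
        running.append(hits / i)
    return running


def draws_until_stable(running, window=20, tol=0.02):
    """Return the first draw count at which the last `window` running
    estimates span at most `tol`, or None if that never happens.
    This stopping rule is an illustrative assumption."""
    for end in range(window, len(running) + 1):
        recent = running[end - window:end]
        if max(recent) - min(recent) <= tol:
            return end
    return None


# Hypothetical recording: 1,400 segments of 30 s each, about 25% with CDS.
sim = random.Random(1)
segments = [sim.random() < 0.25 for _ in range(1400)]

running = running_proportion(segments, n_draws=300, seed=42)
stable_at = draws_until_stable(running)
if stable_at is not None:
    minutes = stable_at * 30 / 60
    print(f"stable after {stable_at} draws (about {minutes:.1f} min of audio)")
print(f"sampled estimate: {running[-1]:.3f}")
print(f"full-recording proportion: {sum(segments) / len(segments):.3f}")
```

In practice the `segments` list would hold human annotations of the sampled 30-s clips rather than simulated labels, and the same procedure could be applied to a per-segment language label to estimate dual language exposure.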