Intensity variation signals metric structure in child-directed poetic speech 

Ahren Fitzroy & Mara Breen, Mount Holyoke College

Poster presented at Society for Music Perception and Cognition Meeting, San Diego, CA (2017)
(for reprint, contact …)

Regular stress in music guides attention to important moments, much as word onsets do during speech segmentation. Our overarching hypothesis is that metric structure in speech helps guide children’s segmentation during early language learning. We investigated whether intensity variation in child-directed poetic speech signals metric structure as it does in music. We modeled intensity variation in a corpus of productions of The Cat in the Hat (Dr. Seuss, 1957) using a metric accent model derived from music performance (Drake & Palmer, 1993). Using linear mixed-effects regression, we modeled the maximum intensity (dB) of each word as a function of metric strength. To isolate meter from linguistic properties known to affect intensity, we included control parameters for segment number, lexical frequency, repetition, word class, syntactic structure, and capitalization. Intensity was greater for words with fewer phonemes, lower lexical frequency, first mentions, open-class words, words not aligned with syntactic boundaries, and capitalized words. Consistent with the music performance model, metric structure further predicted word intensity: words aligned with beat one in a 6/8 metric structure (e.g., down in (A)*) were produced with the greatest intensity, and words aligned with beat four (e.g., fish) were produced with less intensity than beat-one words but more than all others. Consistent with prior work showing intensity reduction for predictable speech, words aligned with beat four were reduced when they completed a couplet (fall). That speakers use intensity variation to signal metric structure is a novel finding in the speech production literature and demonstrates strong connections between hierarchical timing processes in speech and music.

(A) “Put5 me6 | down1!” said2 the3 fish4.
“This5 is6 | no1 fun2 at3 all4!
Put5 me6 | down1!” said2 the3 fish4.
“I5 do6 | NOT1 wish2 to3 fall4.”

*Subscripts in (A) indicate beat number in 6/8 meter, pipes indicate measure boundaries.
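The beat coding in (A) can be sketched programmatically. This is a minimal illustration in which the strength ordering (beat 1 strongest, beat 4 next, all other beats weak) mirrors the reported intensity pattern; the numeric strength values are placeholders, not fitted estimates.

```python
# Words from the first line of (A), paired with their beat number
# in the 6/8 meter (the subscripts in the excerpt above).
line = [("Put", 5), ("me", 6), ("down", 1),
        ("said", 2), ("the", 3), ("fish", 4)]

# Hypothetical metric-strength coding for the 6/8 hierarchy:
# beat 1 > beat 4 > all other beats (placeholder values).
STRENGTH = {1: 2, 4: 1, 2: 0, 3: 0, 5: 0, 6: 0}

# Attach a strength value to each word for use as a model predictor.
coded = [(word, beat, STRENGTH[beat]) for word, beat in line]
for word, beat, s in coded:
    print(f"{word:>5}  beat {beat}  strength {s}")
```

Coded this way, each word in the corpus contributes one row to the regression, with its metric strength as the predictor of interest.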

Audio examples:
Synthesized Cat in the Hat excerpt, normal order

Synthesized Cat in the Hat excerpt, random order
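The word-level regression described in the abstract can be sketched as follows. This is a minimal ordinary-least-squares stand-in for the reported linear mixed-effects model, using synthetic data; the predictor names and effect sizes are illustrative assumptions (chosen only to follow the reported directions of effect), not the authors' corpus or estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical word-level predictors: metric strength (0 = weak beat,
# 1 = beat 4, 2 = beat 1), phoneme count, and log lexical frequency.
metric_strength = rng.integers(0, 3, n)
n_phonemes = rng.integers(1, 8, n)
log_freq = rng.normal(0, 1, n)

# Synthetic "maximum intensity" (dB) with made-up coefficients that
# follow the reported pattern: stronger metric positions are louder;
# more phonemes and higher frequency predict reduction.
intensity = (70 + 1.5 * metric_strength - 0.4 * n_phonemes
             - 0.8 * log_freq + rng.normal(0, 1, n))

# Least-squares fit of intensity on the predictors (plus an intercept).
X = np.column_stack([np.ones(n), metric_strength, n_phonemes, log_freq])
beta, *_ = np.linalg.lstsq(X, intensity, rcond=None)
print(dict(zip(["intercept", "metric", "phonemes", "freq"], beta.round(2))))
```

A full mixed-effects version would additionally include random intercepts (e.g., per speaker) and the remaining control predictors named in the abstract; the fixed-effect design matrix would be built the same way.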

Astheimer, L. B., & Sanders, L. D. (2009). Listeners modulate temporally selective attention during natural speech processing. Biological Psychology, 80(1), 23–34.

Breen, M. (2017). Word durations in The Cat in the Hat are affected by metrical hierarchy and rhyme predictability. Talk presented at the 30th Annual CUNY Conference on Human Sentence Processing, Boston, MA.

Drake, C., & Palmer, C. (1993). Accent structures in music performance. Music Perception: An Interdisciplinary Journal, 10(3), 343–378.

Dr. Seuss. (1957). The Cat in the Hat. New York, NY: Random House.

Fitzroy, A. B., & Sanders, L. D. (2015). Musical meter modulates the allocation of attention across time. Journal of Cognitive Neuroscience, 27(12), 2339–2351.

Jusczyk, P. W., Houston, D. M., & Newsome, M. (1999). The beginnings of word segmentation in English-learning infants. Cognitive Psychology, 39(3–4), 159–207.

Leong, V., & Goswami, U. (2015). Acoustic-emergent phonology in the amplitude envelope of child-directed speech. PLOS ONE, 10(12), e0144411.

Lieberman, P. (1963). Some effects of semantic and grammatical context on the production and perception of speech. Language and Speech, 6(3), 172–187.

Todd, N. (1985). A model of expressive timing in tonal music. Music Perception: An Interdisciplinary Journal, 3(1), 33–57.