On the generative mechanisms underlying the cortical tracking of natural speech: a position paper

Abstract

Speech is central to human life. Yet how the human brain converts patterns of acoustic speech energy into meaning remains unclear. This is particularly true for natural, continuous speech, which requires us to efficiently parse and process speech at multiple timescales in the context of our ongoing conversation and situational knowledge. Much recent progress has been made on this problem by the realization that the dynamics of natural speech stimuli and of the cortical responses they produce share a close correspondence – a phenomenon known as cortical speech tracking. This has led to the development of new methods to study the neurophysiology of speech processing in more naturalistic paradigms. However, the field still lacks consensus regarding the precise physiological mechanisms and neurostructural origins of this tracking. In particular, two contrasting theories have been advanced that attempt to explain the genesis of this phenomenon. The first proposes that the quasi-rhythmic nature of continuous speech “entrains” intrinsic, endogenous oscillations in the brain as a way to parse that continuous speech into smaller units for further (linguistic) processing. Meanwhile, the second proposes that the cortical tracking of speech reflects the summation of a series of transient evoked responses from hierarchically organized neural networks that are tuned to the different acoustic and linguistic features of speech. The contrast between these two ideas is reflected in the emergence of two almost completely non-overlapping literatures in the field of speech electrophysiology. The goal of the present article is to take a strong position on this debate – by making some falsifiable claims and questioning some existing assumptions and interpretations in the field – as a step towards increased clarity on the topic of cortical speech tracking. 
Our position is based on – and likely biased by – the fact that we and others have spent almost 20 years fitting models to relate natural speech stimuli to neural activity. These models show properties that are highly reminiscent of evoked potentials and are entirely consistent with their having derived from summed evoked responses. This, combined with what we see as a weak body of evidence for a role of entrained oscillations in brain responses to natural speech, leads us to assert that the cortical tracking of speech is dominated by evoked responses, with a likely very limited contribution from oscillatory entrainment.
