4 Stages of Speech Production

By Spencer Coffman

Humans produce speech on a daily basis. People are social creatures and are always talking to one another. Whether it is through social media, live conversation, texting, chat, or otherwise, we are always producing some form of speech. We produce this speech without thought.

That is, without thought of how we produce it. Of course, we think about what we are going to say and how to say it so that other people will listen, but we don't think about what it is made of or how our mind and body actually produce speech.

If you have been following my other language-related articles, then you will not be surprised to find out that there are four stages of speech production. It seems that those who classified this material did so in groups of fours and fives. There are…

Five Methods to Learn a Language

Four Ways to Assess Student Knowledge

Five Language Learning Strategies

Four Properties of Spoken Language

The list goes on! Now we have four stages of speech production. These are the processes by which humans produce speech. All of the ways that we come up with the words we say have been compiled into four stages. Unlike typical scientific stages, these are not consecutive.

This means that they are not stages you pass through developmentally. Rather, they are simply different ways in which you may produce speech. I'll describe each one so you can learn what they are and understand exactly how you come up with everything you say.


Stage 1 – Conceptualization

The first one is called the Conceptualization Stage. This is when a speaker spontaneously thinks of what he or she is going to say. It is an immediate reaction to external stimuli and is often based on prior knowledge of the particular subject. No premeditation goes into these words and they are all formulated based upon the speaker’s knowledge and experience at hand. It is spontaneous speech. Examples of this can range from answering questions to the immediate verbiage produced as a result of stubbing your toe.

Stage 2 – Formulation

The second stage is called the Formulation Stage. This is when the speaker thinks of the particular words that are going to express their thoughts. It occurs almost simultaneously with the conceptualization stage. However, this time the speaker thinks about the response before responding. The speaker is formulating his or her words and deciding how best to reply to the external stimuli. Where conceptualization is more of an instant and immediate response, formulation is a little delayed.

Stage 3 – Articulation

The third stage is the Articulation Stage. This is when the speaker physically says what he or she has thought of saying. This is prepared speech or planned wording. In addition, the words may have been rehearsed, such as when someone practices a presentation or rehearses a lie.

It involves coordinating the physical actions of several motor speech organs, such as the lungs, larynx, tongue, lips, and other vocal apparatus. Of course, the first two stages also involve these organs; however, the articulation stage uses them repeatedly for the same word patterns.

Stage 4 – Self-Monitoring

The fourth stage is called the Self-Monitoring Stage. This is when the speaker reflects on what he or she has said and makes an effort to correct any errors in his or her speech. Oftentimes this is done in a rebuttal or when getting in the last word of an argument.

In addition, it could also be done during a conversation when the speaker realizes that he or she slipped up. This is the action of reflecting on what you said and making sure that what you said is what you meant.


There you have it. Those are the four stages of speech production. Think about this and start to notice each time you are in each stage. Of course, you won’t be able to consciously notice what stage you are in all of the time. However, once in a while it may be amusing for you to reflect on these stages and see how they coincide with the words you speak.

For more great information, take a look at the supplemental content on this website and check out these great blog posts. In addition, feel free to connect with me on social media.



The Source–Filter Theory of Speech

Isao Tokuda, Ritsumeikan University
https://doi.org/10.1093/acrefore/9780199384655.013.894
Published online: 29 November 2021

In the source-filter theory, the mechanism of speech production is described as a two-stage process: (a) The airflow coming from the lungs induces tissue vibrations of the vocal folds (i.e., two small muscular folds located in the larynx) and generates the "source" sound. Turbulent airflow at the glottis or in the vocal tract generates additional noisy sound sources. (b) Spectral structures of these source sounds are shaped by the vocal tract "filter." Through the filtering process, frequency components corresponding to the vocal tract resonances are amplified, while the other frequency components are diminished. The source sound mainly characterizes the vocal pitch (i.e., the fundamental frequency), while the filter forms the timbre. The source-filter theory provides a very accurate description of normal speech production and has been applied successfully to speech analysis, synthesis, and processing. Separate control of the source (phonation) and the filter (articulation) is advantageous for acoustic communication, especially for human language, which requires expression of various phonemes realized by flexible maneuvering of the vocal tract configuration. Based on this idea, articulatory phonetics focuses on the positions of the vocal organs to describe the produced speech sounds.

The source-filter theory also elucidates the mechanism of "resonance tuning," a specialized way of singing. To increase the efficiency of vocalization, soprano singers adjust the vocal tract filter to tune one of its resonances to the vocal pitch. Consequently, the main source sound is strongly amplified to produce a loud voice, which is well perceived in a large concert hall over the orchestra.

It should be noted that the source–filter theory rests on the assumption that the source and the filter are independent of each other. Under certain conditions, however, the source and the filter interact. The source sound is influenced by the vocal tract geometry and by the acoustic feedback from the vocal tract. Such source–filter interaction induces various voice instabilities, for example, sudden pitch jumps, subharmonics, resonance, quenching, and chaos.

Keywords: source–filter theory, speech production, vocal fold vibration, turbulent air flow, vocal tract acoustics, resonance tuning, source–filter interaction

1. Background

Human speech sounds are generated by a complex interaction of components of human anatomy. Most speech sounds begin with the respiratory system, which expels air from the lungs (figure 1). The air goes through the trachea and enters the larynx, where two small muscular folds, called "vocal folds," are located. As the vocal folds are brought together to form a narrow air passage, the airstream causes them to vibrate in a periodic manner (Titze, 2008). The vocal fold vibrations modulate the air pressure and produce a periodic sound. Sounds produced while the vocal folds are vibrating are called "voiced sounds," while those in which the vocal folds do not vibrate are called "unvoiced sounds." The air passages above the larynx are called the "vocal tract." Turbulent airflow generated at constricted parts of the glottis or the vocal tract also contributes aperiodic source sounds distributed over a wide range of frequencies. The shape of the vocal tract, and consequently the positions of the articulators (i.e., jaw, tongue, velum, lips, mouth, teeth, and hard palate), is a crucial factor in determining the acoustical characteristics of the speech sounds. The state of the vocal folds, as well as the positions, shapes, and sizes of the articulators, changes over time to produce various phonetic sounds sequentially.

Figure 1. Concept of the source-filter theory. Airflow from the lungs induces vocal fold vibrations, where the glottal source sound is created. The vocal tract filter shapes the spectral structure of the source sound. The filtered speech sound is finally radiated from the mouth.

To systematically understand the mechanism of speech production, the source-filter theory divides this process into two stages (Chiba & Kajiyama, 1941; Fant, 1960) (see figure 1): (a) The airflow coming from the lungs induces tissue vibration of the vocal folds that generates the "source" sound. Turbulent noise sources are also created at constricted parts of the glottis or the vocal tract. (b) Spectral structures of these source sounds are shaped by the vocal tract "filter." Through the filtering process, frequency components which correspond to the resonances of the vocal tract are amplified, while the other frequency components are diminished. The source sound characterizes mainly the vocal pitch, while the filter forms the overall spectral structure.

The source-filter theory provides a good approximation of normal human speech, under which the source sounds are only weakly influenced by the vocal tract filter, and has been applied successfully to speech analysis, synthesis, and processing (Atal & Schroeder, 1978; Markel & Gray, 2013). Independent control of the source (phonation) and the filter (articulation) is advantageous for acoustic communication with language, which requires expression of various phonemes with flexible maneuvering of the vocal tract configuration (Fitch, 2010; Lieberman, 1977).

2. Source-Filter Theory

There are four main types of sound sources that provide an acoustic input to the vocal tract filter: glottal source, aspiration source, frication source, and transient source (Stevens, 1999, 2005).

The glottal source is generated by the vocal fold vibrations. The vocal folds are muscular folds located in the larynx. The opening between the left and right vocal folds is called the "glottal area." When the vocal folds are close to each other, the airflow coming from the lungs can cause the vocal fold tissues to vibrate. Through the combined effects of pressure, airflow, tissue elasticity, and collision between the left and right vocal folds, the vocal folds give rise to vibrations, which periodically modulate the acoustic air pressure at the glottis. The number of glottal vibration cycles per second is called the "fundamental frequency (f_o)" and is expressed in Hz, or cycles per second. In the spectral domain, the glottal source sound determines the strengths of the fundamental frequency and its integer multiples (harmonics). The glottal wave provides the source for voiced sounds such as vowels (e.g., [a], [e], [i], [o], [u]), diphthongs (i.e., combinations of two vowel sounds), and voiced consonants (e.g., [b], [d], [ɡ], [v], [z], [ð], [ʒ], [ʤ], [h], [w], [n], [m], [r], [j], [ŋ], [l]).
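The harmonic structure described above is easy to see numerically: the spectrum of any periodic pulse train has energy only at the fundamental frequency and its integer multiples. The sketch below is purely illustrative, with Python/NumPy as the assumed environment, an impulse train as a crude stand-in for the glottal waveform, and f_o = 100 Hz and a 16 kHz sample rate as assumed values.

```python
import numpy as np

fs = 16000                          # sample rate in Hz (assumed)
f0 = 100                            # assumed fundamental frequency in Hz
n = fs                              # analyze one second of signal

glottal = np.zeros(n)
glottal[::fs // f0] = 1.0           # periodic pulses every 1/f0 seconds

spectrum = np.abs(np.fft.rfft(glottal))
freqs = np.fft.rfftfreq(n, d=1.0 / fs)
harmonics = freqs[spectrum > 0.5 * spectrum.max()]
print(harmonics[:5])                # -> [0. 100. 200. 300. 400.] (DC plus harmonics of f0)
```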

In addition to the glottal source, noisy signals also serve as sound sources for consonants. Here, air turbulence developed at constricted or obstructed parts of the airway contributes random (aperiodic) pressure fluctuations over a wide range of frequencies. Among such noisy signals, the one generated through the glottis or immediately above the glottis is called "aspiration noise." It is characterized by a strong burst of breath that accompanies either the release or the closure of some obstruents. "Frication noise," on the other hand, is generated by forcing air through a supraglottal constriction created by placing two articulators close together (e.g., constrictions between the lower lip and upper teeth, between the back of the tongue and the soft palate, and between the side of the tongue and the molars) (Shadle, 1985, 1991). When an airway in the vocal tract is completely closed and then released, "transient noise" is generated. When a closure is formed in the vocal tract, pressure builds up in the mouth behind it. As the closure is released, a brief burst of turbulence is produced, which lasts for a few milliseconds.

Some speech sounds may involve more than one sound source. For instance, a voiced fricative combines the glottal source and the frication noise. A breathy voice may come from the glottal source and the aspiration noise, whereas voiceless fricatives can combine two noise sources generated at the glottis and at the supralaryngeal constriction. These sound sources are fed into the vocal-tract filter to create speech sounds.

In the source-filter theory, the vocal tract acts as an acoustic filter that modifies the source sound. Through this acoustic filter, certain frequency components are passed to the output speech, while the others are attenuated. The characteristics of the filter depend upon the shape of the vocal tract. As a simple case, consider the acoustic characteristics of a uniform tube of length L = 17.5 cm, a standard length for a male vocal tract (see figure 2). At one end, the tube is closed (the glottis), while at the other end it is open (the mouth). Inside the tube, longitudinal sound waves travel either toward the mouth or toward the glottis. The wave propagates by alternately compressing and expanding the air in the tube segments. Through this compression and expansion, the air molecules are slightly displaced from their rest positions. Accordingly, the acoustic air pressure inside the tube changes in time, depending upon the longitudinal displacement of the air along the direction of the traveling wave. The pressure profile inside the tube is determined by the traveling waves going toward the mouth and toward the glottis. What is formed is a "standing wave," whose peak amplitude profile does not move in space. The locations at which the absolute value of the amplitude is minimum are called "nodes," whereas the locations at which it is maximum are called "antinodes." Since the air molecules cannot vibrate much at the closed end of the tube, the closed end becomes a node. The open end of the tube, on the other hand, becomes an antinode, since the air molecules can move freely there. Various standing waves that satisfy these boundary conditions can be formed. In figure 2, the 1/4 (purple), 3/4 (green), and 5/4 (sky blue) waves indicate the first, second, and third resonances, respectively. Depending upon the number of nodes in the tube, the wavelengths of the standing waves are λ = 4L, 4L/3, and 4L/5. The corresponding frequencies are f = c/λ = 490, 1470, and 2450 Hz, where c = 343 m/s is the speed of sound. These resonant frequencies are called "formants" in phonetics.
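The quarter-wave relation implied above, f_n = (2n - 1) c / (4L) for a tube closed at one end and open at the other, can be checked with a few lines of code. This is a minimal sketch (Python assumed); the constants are the ones quoted in the text and the function name is illustrative.

```python
# Resonances (formants) of a uniform tube closed at the glottis and open at the lips.
C = 343.0    # speed of sound in m/s
L = 0.175    # vocal tract length in m (17.5 cm)

def tube_resonances(length_m, n_formants=3, c=C):
    """First n resonance frequencies (Hz) of a closed-open tube: (2n-1)c/(4L)."""
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, n_formants + 1)]

print(tube_resonances(L))   # -> [490.0, 1470.0, 2450.0], matching the formants above
```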

Figure 2. Standing waves of a uniform tube. For a tube with one closed end (glottis) and one open end (mouth), only odd-numbered harmonics are available. The 1/4 (purple), 3/4 (green), and 5/4 (sky blue) waves correspond to the first, second, and third resonances ("1/4 wave" means that 1/4 of a one-cycle waveform fits inside the tube).

Next, consider a source sound input to this acoustic tube. In the source sound (voiced source, noise, or both), acoustic energy is distributed over a broad range of frequencies. The source sound induces vibrations of the air column inside the tube and produces a sound wave in the external air as the output. The strength at which an input frequency is output from this acoustic filter depends upon the characteristics of the tube. If the input frequency component is close to one of the formants, the tube resonates with the input and propagates the corresponding vibration. Consequently, frequency components near the formant frequencies are passed to the output at their full strength. If the input frequency component is far from any of these formants, however, the tube does not resonate with the input. Such frequency components are strongly attenuated and achieve only low oscillation amplitudes in the output. In this way, the acoustic tube, or the vocal tract, filters the source sound. This filtering process can be characterized by a transfer function, which describes how the amplification ratio between the input and output acoustic signals depends on frequency. Physically, the transfer function is determined by the shape of the vocal tract.

Finally, the sound wave is radiated from the mouth and the nose. These radiation characteristics are also included in the vocal-tract transfer function.

2.3 Convolution of the Source and the Filter

Humans are able to control phonation (source generation) and articulation (filtering process) largely independently. The speech sounds are therefore considered as the response of the vocal-tract filter, into which a sound source is fed. To model such source-filter systems for speech production, the sound source, or excitation signal x(t), is often implemented as a periodic impulse train for voiced speech, while white noise is used as a source for unvoiced speech. If the vocal-tract configuration does not change in time, the vocal-tract filter becomes a linear time-invariant (LTI) system, and the output signal y(t) can be expressed as a convolution of the input signal x(t) and the impulse response of the system h(t) as

y(t) = x(t) * h(t),    (1)

where the asterisk denotes the convolution. Equation (1), which is described in the time domain, can also be expressed in the frequency domain as

Y(ω) = X(ω) H(ω).    (2)

The frequency-domain formula states that the speech spectrum Y(ω) is modeled as a product of the source spectrum X(ω) and the spectrum of the vocal-tract filter H(ω). The spectrum of the vocal-tract filter H(ω) is represented by the product of the vocal-tract transfer function T(ω) and the radiation characteristics of the mouth and the nose R(ω), that is, H(ω) = T(ω) R(ω).
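As a numerical sanity check of Equations (1) and (2), the sketch below convolves a toy excitation with a toy impulse response and verifies that the result equals the inverse transform of the spectral product. The signals are arbitrary stand-ins (not a speech model); only the convolution/product identity is being illustrated, in Python/NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(256)            # stand-in source signal x(t)
h = np.exp(-np.arange(64) / 8.0)        # stand-in impulse response h(t)

y_time = np.convolve(x, h)              # Eq. (1): y = x * h

n = len(x) + len(h) - 1                 # zero-pad so the DFT product gives linear convolution
Y = np.fft.rfft(x, n) * np.fft.rfft(h, n)   # Eq. (2): Y(w) = X(w) H(w)
y_freq = np.fft.irfft(Y, n)

print(np.allclose(y_time, y_freq))      # True: both routes give the same output signal
```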

There exist several ways to estimate the vocal-tract filter H(ω). The most popular approach is inverse filtering, in which autoregressive parameters are estimated from an acoustic speech signal by the method of least squares (Atal & Schroeder, 1978; Markel & Gray, 2013). The transfer function can then be recovered from the estimated autoregressive parameters. In practice, however, inverse filtering is limited to non-nasalized or slightly nasalized vowels. An alternative approach is based upon measurement of the vocal tract shape. For a human subject, the cross-sectional area of the vocal tract can be measured by X-ray photography or magnetic resonance imaging (MRI). Once the area function of the vocal tract is obtained, the corresponding transfer function can be computed by the so-called transmission line model, which assumes one-dimensional plane-wave propagation inside the vocal tract (Sondhi & Schroeter, 1987; Story et al., 1996).
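A minimal sketch of the inverse-filtering idea, assuming a windowed speech frame is available as a NumPy array: autoregressive coefficients are estimated by least squares from the frame's autocorrelation, and the magnitude of the resulting all-pole transfer function approximates the vocal-tract filter, with peaks near the formants. This is the generic autocorrelation (Yule-Walker) method, not the transmission-line computation mentioned above, and the model order of 12 is an assumed typical value.

```python
import numpy as np

def lpc_autocorr(frame, order=12):
    """Estimate AR coefficients a[1..p] of H(z) = 1 / (1 - sum_k a_k z^-k)."""
    x = frame * np.hamming(len(frame))
    r = np.correlate(x, x, mode="full")[len(x) - 1:]          # autocorrelation r[0], r[1], ...
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])                  # Yule-Walker normal equations

def all_pole_magnitude(a, n_freq=512):
    """|H(e^jw)| on n_freq points in [0, pi); its peaks are formant estimates."""
    w = np.linspace(0.0, np.pi, n_freq, endpoint=False)
    denom = 1.0 - sum(ak * np.exp(-1j * w * (k + 1)) for k, ak in enumerate(a))
    return np.abs(1.0 / denom)
```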

Figure 3. (a) Vocal tract area function for a male speaker’s vowel [a]. (b) Transfer function calculated from the area function of (a). (c) Power spectrum of the source sound generated from Liljencrants-Fant model. (d) Power spectrum of the speech signal generated from the source-filter theory.

As an example illustrating source-filter modeling, a sound of the vowel /a/ is synthesized in figure 3. The vocal tract area function of figure 3(a) was measured from a male subject by MRI (Story et al., 1996). With the transmission line model, the transfer function H(ω) is obtained as in figure 3(b). The first and second formants are located at F1 = 805 Hz and F2 = 1205 Hz. The impulse response of the vocal tract system h(t) is derived by the inverse Fourier transform. As a glottal source sound, the Liljencrants-Fant synthesis model (Fant et al., 1985) is utilized. The fundamental frequency is set to f_o = 100 Hz, which gives rise to a sharp peak in the power spectrum in figure 3(c). Except for the peaks appearing at the higher harmonics of f_o, the spectral structure of the glottal source is rather flat. As shown in figure 3(d), convolution of the source signal with the vocal tract impulse response amplifies the higher harmonics of f_o located close to the formants.
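The figure-3 example can be mimicked very roughly by exciting a chain of second-order resonators with a 100 Hz impulse train, as in the sketch below. The two formant frequencies (805 Hz and 1205 Hz) come from the text; the sample rate, the formant bandwidths, and the impulse-train source (used in place of the Liljencrants-Fant waveform) are simplifying assumptions.

```python
import numpy as np
from scipy.signal import lfilter

fs = 16000                              # sample rate in Hz (assumed)
f0 = 100                                # fundamental frequency from the text
dur = 0.5                               # duration in seconds

source = np.zeros(int(fs * dur))
source[::fs // f0] = 1.0                # impulse train standing in for the glottal source

def formant_resonator(freq_hz, bw_hz, fs):
    """Coefficients (b, a) of a second-order all-pole resonator for one formant."""
    r = np.exp(-np.pi * bw_hz / fs)
    theta = 2.0 * np.pi * freq_hz / fs
    a = np.array([1.0, -2.0 * r * np.cos(theta), r * r])
    return np.array([a.sum()]), a       # numerator chosen for unity gain at zero frequency

speech = source
for freq, bw in [(805, 80), (1205, 90)]:    # F1, F2 of /a/; bandwidths are assumed
    b, a = formant_resonator(freq, bw, fs)
    speech = lfilter(b, a, speech)
# `speech` now has spectral peaks near the two formants, as in figure 3(d).
```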

Since source-filter modeling captures the essence of speech production, it has been successfully applied to speech analysis, synthesis, and processing (Atal & Schroeder, 1978; Markel & Gray, 2013). It was Chiba and Kajiyama (1941) who first explained the mechanisms of speech production based on the concepts of phonation (source) and articulation (filter). Their idea was combined with Fant's filter theory (Fant, 1960), which led to the "source-filter theory of vowel production" in studies of speech production.

So far, source-filter modeling has been discussed only for the glottal source, in which the vocal fold vibrations provide the main source sound. There are other sound sources, such as frication noise, in which air turbulence develops at constricted (or obstructed) parts of the airway. Such a random source also excites the resonances of the vocal tract in a similar manner as the glottal source (Stevens, 1999, 2005). Its marked difference from the glottal source is that the filter property is determined by the vocal tract shape downstream from the constriction (or obstruction). For instance, if the constriction is at the lips, there is no cavity downstream from the constriction, and therefore the acoustic source is radiated directly from the mouth opening with no filtering. When the constriction is upstream from the lips, the shape of the airway between the constriction and the lips determines the filter properties. It should also be noted that the turbulent source generated at the constriction depends sensitively on the three-dimensional geometry of the vocal tract. Therefore, the three-dimensional shape of the vocal tract (not the one-dimensional area function) should be taken into account to model frication noise (Shadle, 1985, 1991).

3. Resonance Tuning

As an interesting application of the source-filter theory, "resonance tuning" (Sundberg, 1989) is illustrated. In female speech, the first and second formants lie between 300 and 900 Hz and between 900 and 2,800 Hz, respectively. In soprano singing, the vocal pitch can reach these two ranges. To increase the efficiency of vocalization at high f_o, a soprano singer adjusts the shape of the vocal tract to tune the first or second resonance (R1 or R2) to the fundamental frequency f_o. When one of the harmonics of f_o coincides with a formant resonance, the resulting acoustic power (and musical success) is enhanced.

Figure 4. Resonance tuning. (a) The same transfer function as figure 3(b). (b) Power spectrum of the source sound, whose fundamental frequency f_o is tuned to the first resonance R1 of the vocal tract. (c) Power spectrum of the speech signal generated from the source-filter theory. (d) Dependence of the amplification rate (i.e., the power ratio between the output speech and the input source) on the fundamental frequency f_o.

Figure 4 shows an example of resonance tuning, in which the fundamental frequency is tuned to the first resonance R1 of the vowel /a/ as f_o = 805 Hz. As seen in the output speech spectrum (figure 4(c)), the vocal tract filter strongly amplifies the fundamental frequency component of the vocal source, while the other harmonics are attenuated. Since only a single frequency component is emphasized, the output speech sounds like a pure tone. Figure 4(d) shows the dependence of the amplification ratio (i.e., the power ratio between the output speech and the input source) on the fundamental frequency f_o. Indeed, the power of the output speech is maximized at the resonance tuning point of f_o = 805 Hz. Without losing source power, loud voices can be produced with less effort from the singers and, moreover, they are well perceived in a large concert hall over the orchestra (Joliveau et al., 2004).
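The trend of figure 4(d) can be approximated with the same kind of second-order resonator used above: evaluating the filter's gain at the fundamental frequency as f_o sweeps across R1 shows the amplification peaking at the tuning point. This sketch shares the earlier simplifying assumptions (a single resonator at 805 Hz with an assumed bandwidth), so it only illustrates the shape of the curve, not the article's exact amplification rate.

```python
import numpy as np
from scipy.signal import freqz

fs = 16000                               # sample rate in Hz (assumed)
r = np.exp(-np.pi * 80 / fs)             # assumed 80 Hz bandwidth for R1
theta = 2.0 * np.pi * 805 / fs           # R1 = 805 Hz from the text
a = [1.0, -2.0 * r * np.cos(theta), r * r]
b = [sum(a)]

f0_sweep = np.linspace(200.0, 1400.0, 25)        # candidate vocal pitches in Hz
_, h = freqz(b, a, worN=f0_sweep, fs=fs)         # filter gain evaluated at each f0
for f0, gain in zip(f0_sweep, np.abs(h)):
    print(f"f0 = {f0:6.1f} Hz   gain at f0 = {gain:5.2f}")   # peaks near 805 Hz
```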

Despite the significant increase in loudness, comprehensibility is sacrificed. With a strong enhancement of the fundamental frequency f_o, its higher harmonics are weakened considerably, making it difficult to perceive the formant structure (figure 4(c)). This explains why it is difficult to identify words sung in the high range by sopranos.

The resonance tuning discussed here has been based on the linear convolution of the source and the filter, which are assumed to be independent of each other. In reality, however, the source and the filter interact. Depending upon the acoustic properties of the vocal tract, this interaction can facilitate the vocal fold oscillations and make the vocal source stronger. Consequently, the source-filter interaction can make the output speech even louder, in addition to the linear resonance effect. Such interaction is explained in more detail in section 4.

It is of interest to note that some animals, such as songbirds and gibbons, utilize the technique of resonance tuning in their vocalizations (Koda et al., 2012; Nowicki, 1987; Riede et al., 2006). It has been found through X-ray filming as well as heliox experiments that these animals adjust the vocal tract resonance to track the fundamental frequency f_o. This may facilitate acoustic communication by increasing the loudness of their vocalizations. Again, the higher harmonic components, which are needed to emphasize the formants in human language communication, are suppressed. Whether animals utilize formant information in their communication is under debate (Fitch, 2010; Lieberman, 1977), but, at least in this context, production of a loud sound is more advantageous for long-distance alarm calls and the pure-tone singing of animals.

4. Source-Filter Interaction

The linear source–filter theory, under which speech is represented as a convolution of the source and the filter, is based upon the assumption that the vocal fold vibrations as well as the turbulent noise sources are only weakly influenced by the vocal tract. Such an assumption is, however, valid mostly for adult male speech. The actual process of speech production is nonlinear. The vocal fold oscillations are due to combined effects of pressure, airflow, tissue elasticity, and tissue collision. It is natural that such a complex system obeys nonlinear equations of motion. Strictly speaking, the aerodynamics inside the glottis and the vocal tract is also governed by nonlinear equations. Moreover, there exists a mutual interaction between the source and the filter (Flanagan, 1968; Lucero et al., 2012; Rothenberg, 1981; Titze, 2008; Titze & Alipour, 2006). First, the source sound generated by the vocal folds is influenced by the vocal tract, since the vocal tract determines the pressure above the vocal folds and thereby changes the aerodynamics of the glottal flow. As described in section 2.3, the turbulent source is also very sensitive to the vocal tract geometry. Second, the source sound, which then propagates through the vocal tract, is not only radiated from the mouth but is also partially reflected back to the glottis through the vocal tract. Such reflection can influence the vocal fold oscillations, especially when the fundamental frequency or one of its harmonics lies close to one of the vocal tract resonances, for instance, in singing. The strong acoustic feedback makes the interrelation between the source and the filter nonlinear and induces various voice instabilities, for example, sudden pitch jumps, subharmonics, resonance, quenching, and chaos (Hatzikirou et al., 2006; Lucero et al., 2012; Migimatsu & Tokuda, 2019; Titze et al., 2008).

Figure 5. Example of a glissando singing. A male subject glided the fundamental frequency (f_o) from 120 Hz to 350 Hz and then back. The first resonance (R1 = 270 Hz) is indicated by a black bold line. The pitch jump occurred when f_o crossed R1.

Figure 5 shows a spectrogram that demonstrates such a pitch jump. The horizontal axis represents time, while the vertical axis represents frequency, with the spectral power of the singing voice shown as intensity. In this recording, a male singer glided his pitch over a certain frequency range. Accordingly, the fundamental frequency increases from 120 Hz to 350 Hz and then decreases back to 120 Hz. Around 270 Hz, the fundamental frequency or one of its higher harmonics crosses one of the resonances of the vocal tract (black bold line of figure 5), and the pitch jumps abruptly. At such a frequency-crossing point, the acoustic reflection from the vocal tract to the vocal folds becomes very strong and non-negligible. The source-filter interaction has two aspects (Story et al., 2000). On one hand, the vocal tract acoustics facilitates the vocal fold oscillations and contributes to the production of a loud vocal sound, as discussed for resonance tuning (section 3). On the other hand, the vocal tract acoustics can inhibit the vocal fold oscillations and consequently induce a voice instability. For instance, the vocal fold oscillation can stop suddenly or spontaneously jump to another fundamental frequency, as exemplified by the glissando singing of figure 5. To avoid such voice instabilities, singers must weaken the level of the acoustic coupling, possibly by adjusting the epilarynx, whenever a frequency crossing takes place (Lucero et al., 2012; Titze et al., 2008).

5. Conclusions

In summary, the source-filter theory has been described as a basic framework for modeling human speech production. The source is generated by the vocal fold oscillations and/or turbulent airflow developed above the glottis. The vocal tract functions as a filter that modifies the spectral structure of the source sounds. This filtering mechanism has been explained in terms of the resonances of an acoustic tube. Independence between the source and the filter is vital for language-based acoustic communication in humans, which requires flexible maneuvering of the vocal tract configuration to express various phonemes sequentially and smoothly (Fitch, 2010; Lieberman, 1977). As an application of the source-filter theory, resonance tuning was explained as a technique utilized by soprano singers and some animals. Finally, the existence of source-filter interaction has been described. It is inevitable that the source sound is aerodynamically influenced by the vocal tract, since the two are located close to each other. Moreover, the acoustic pressure wave reflected back from the vocal tract to the glottis influences the vocal fold oscillations and can induce various voice instabilities. The source-filter interaction may become strong when the fundamental frequency or one of its higher harmonics crosses one of the vocal tract resonances, for example, in singing.

Further Reading

  • Atal, B. S., & Schroeder, M. (1978). Linear prediction analysis of speech based on a pole-zero representation. The Journal of the Acoustical Society of America, 64(5), 1310–1318.
  • Chiba, T., & Kajiyama, M. (1941). The vowel: Its nature and structure. Tokyo, Japan: Kaiseikan.
  • Fant, G. (1960). Acoustic theory of speech production. The Hague, The Netherlands: Mouton.
  • Lieberman, P. (1977). Speech physiology and acoustic phonetics: An introduction. New York: Macmillan.
  • Markel, J. D., & Gray, A. J. (2013). Linear prediction of speech (Vol. 12). New York: Springer Science & Business Media.
  • Stevens, K. N. (1999). Acoustic phonetics. Cambridge, MA: MIT Press.
  • Sundberg, J. (1989). The science of singing voice. DeKalb, IL: Northern Illinois University Press.
  • Titze, I. R. (1994). Principles of voice production. Englewood Cliffs, NJ: Prentice Hall.
  • Titze, I. R., & Alipour, F. (2006). The myoelastic aerodynamic theory of phonation. Iowa, IA: National Center for Voice and Speech.
  • Fant, G., Liljencrants, J., & Lin, Q. (1985). A four-parameter model of glottal flow. Speech Transmission Laboratory Quarterly Progress and Status Report, 26(4), 1–13.
  • Fitch, W. T. (2010). The evolution of language. Cambridge, UK: Cambridge University Press.
  • Flanagan, J. L. (1968). Source-system interaction in the vocal tract. Annals of the New York Academy of Sciences, 155(1), 9–17.
  • Hatzikirou, H., Fitch, W. T., & Herzel, H. (2006). Voice instabilities due to source-tract interactions. Acta Acustica united with Acustica, 92, 468–475.
  • Joliveau, E., Smith, J., & Wolfe, J. (2004). Acoustics: Tuning of vocal tract resonance by sopranos. Nature, 427(6970), 116.
  • Koda, H., Nishimura, T., Tokuda, I. T., Oyakawa, C., Nihonmatsu, T., & Masataka, N. (2012). Soprano singing in gibbons. American Journal of Physical Anthropology, 149(3), 347–355.
  • Lucero, J. C., Lourenço, K. G., Hermant, N., Van Hirtum, A., & Pelorson, X. (2012). Effect of source–tract acoustical coupling on the oscillation onset of the vocal folds. The Journal of the Acoustical Society of America, 132(1), 403–411.
  • Migimatsu, K., & Tokuda, I. T. (2019). Experimental study on nonlinear source–filter interaction using synthetic vocal fold models. The Journal of the Acoustical Society of America, 146(2), 983–997.
  • Nowicki, S. (1987). Vocal tract resonances in oscine bird sound production: Evidence from birdsongs in a helium atmosphere. Nature, 325(6099), 53–55.
  • Riede, T., Suthers, R. A., Fletcher, N. H., & Blevins, W. E. (2006). Songbirds tune their vocal tract to the fundamental frequency of their song. Proceedings of the National Academy of Sciences, 103(14), 5543–5548.
  • Rothenberg, M. (1981). The voice source in singing. In J. Sundberg (Ed.), Research aspects on singing (pp. 15–33). Stockholm, Sweden: Royal Swedish Academy of Music.
  • Shadle, C. H. (1985). The acoustics of fricative consonants [Doctoral thesis]. Cambridge, MA: Massachusetts Institute of Technology, released as MIT-RLE Technical Report No. 506.
  • Shadle, C. H. (1991). The effect of geometry on source mechanisms of fricative consonants. Journal of Phonetics, 19(3–4), 409–424.
  • Sondhi, M., & Schroeter, J. (1987). A hybrid time-frequency domain articulatory speech synthesizer. IEEE Transactions on Acoustics, Speech, and Signal Processing, 35(7), 955–967.
  • Stevens, K. N. (2005). The acoustic/articulatory interface. Acoustical Science and Technology, 26(5), 410–417.
  • Story, B. H., Laukkanen, A. M., & Titze, I. R. (2000). Acoustic impedance of an artificially lengthened and constricted vocal tract. Journal of Voice, 14(4), 455–469.
  • Story, B. H., Titze, I. R., & Hoffman, E. A. (1996). Vocal tract area functions from magnetic resonance imaging. The Journal of the Acoustical Society of America, 100(1), 537–554.
  • Titze, I. R. (2008). Nonlinear source–filter coupling in phonation: Theory. The Journal of the Acoustical Society of America, 123(4), 1902–1915.
  • Titze, I., Riede, T., & Popolo, P. (2008). Nonlinear source–filter coupling in phonation: Vocal exercises. The Journal of the Acoustical Society of America, 123(4), 1902–1915.


Speech Production

By Eryk Walczak. Last reviewed: 17 April 2023. Last modified: 22 February 2018. DOI: 10.1093/obo/9780199772810-0217

Speech production is one of the most complex human activities. It involves coordinating numerous muscles and complex cognitive processes. The area of speech production is related to Articulatory Phonetics, Acoustic Phonetics, and Speech Perception, which all study various elements of language and are part of the broader field of Linguistics. Because of the interdisciplinary nature of the topic, it is usually studied on several levels: neurological, acoustic, motor, evolutionary, and developmental. Each of these levels has its own literature, but the vast majority of speech production literature touches on each of these elements. The large body of relevant literature is covered in the speech perception entry, on which this bibliography builds. This entry covers general speech production mechanisms and speech disorders. However, speech production in second language learners and bilinguals has special features, which are described in a separate bibliography on Cross-Language Speech Perception and Production. Speech produces sounds, and sounds are a topic of study for Phonology.

As mentioned in the introduction, speech production tends to be described in relation to acoustics, speech perception, neuroscience, and linguistics. Because of this interdisciplinarity, there are not many published textbooks focusing exclusively on speech production. Guenther 2016 and Levelt 1993 are the exceptions. The former has a stronger focus on the neuroscientific underpinnings of speech. Auditory neuroscience is also extensively covered by Schnupp, et al. 2011 and in the extensive textbook Hickok and Small 2015. Rosen and Howell 2011 is a textbook focusing on signal processing and acoustics, which any speech scientist needs to understand. A historical approach to psycholinguistics that also covers speech research is Levelt 2013.

Guenther, F. H. 2016. Neural control of speech. Cambridge, MA: MIT.

This textbook provides an overview of the neural processes responsible for speech production. Large sections describe speech motor control, especially the DIVA model (co-developed by Guenther). It includes extensive coverage of behavioral and neuroimaging studies of speech, as well as speech disorders, and ties them together with a unifying theoretical framework.

Hickok, G., and S. L. Small. 2015. Neurobiology of language. London: Academic Press.

This voluminous textbook edited by Hickok and Small covers a wide range of topics related to the neurobiology of language. It includes a section devoted to speaking, which covers the neurobiology of speech production, a motor control perspective, neuroimaging studies, and aphasia.

Levelt, W. J. M. 1993. Speaking: From intention to articulation. Cambridge, MA: MIT.

A seminal textbook, Speaking is worth reading particularly for its detailed explanation of the author's speech model, which is part of his broader language model. The book is slightly dated, as it was released in 1993, but chapters 8–12 are especially relevant to readers interested in phonetic plans, articulating, and self-monitoring.

Levelt, W. J. M. 2013. A history of psycholinguistics: The pre-Chomskyan era. Oxford: Oxford University Press.

Levelt published another important book detailing the development of psycholinguistics. As its title suggests, it focuses on the early history of the discipline, so readers interested in historical research on speech can find an abundance of speech-related research in this book. It covers a wide range of psycholinguistic specializations.

Rosen, S., and P. Howell. 2011. Signals and systems for speech and hearing. 2d ed. Bingley, UK: Emerald.

Rosen and Howell provide a low-level explanation of speech signals and systems. The book includes informative charts explaining the basic acoustic and signal processing concepts useful for understanding speech science.

Schnupp, J., I. Nelken, and A. King. 2011. Auditory neuroscience: Making sense of sound. Cambridge, MA: MIT.

A general introduction to speech concepts with a main focus on neuroscience. The textbook is linked with a website that provides demonstrations of the described phenomena.


Psycholinguistics/Development of Speech Production


Introduction

Speech production is an important part of the way we communicate. We indicate intonation through stress and pitch while communicating our thoughts, ideas, requests, or demands, and while maintaining grammatically correct sentences. However, we rarely consider how this ability develops. We know infants often begin producing one-word utterances, such as "mama," eventually move to two-word utterances, such as "gimme toy," and finally come to sound like adults. However, the process itself involves development not only of the vocal sounds (phonology), but also of semantics (the meaning of words) and of morphology and syntax (rules and structure). How do children learn this complex ability? Considering that an infant goes from an inability to speak to two-word utterances within two years, this accelerated development is incredible and deserves attention. When we ponder children's speech production development more closely, we begin to ask more questions. How does a child who says "tree" for "three" eventually learn to correct him/herself? How does a child know "nana" (banana) is the yellow, boat-shaped fruit he/she enjoys eating? Why does a child call all four-legged animals "horsie"? Why does a child say "I goed to the kitchen"? What causes a child to learn words such as "doggie" before "hand"? This chapter will address these questions and focus on the four areas of speech development mentioned: phonology, semantics, morphology, and syntax.

Prelinguistic Speech Development

Throughout infancy, vocalizations develop from automatic, reflexive vocalizations with no linguistic meaning to articulated words with meaning and intonation. In this section, we will examine the various stages an infant goes through while developing speech. In general, researchers seem to agree that as infants develop, they increase their speech-like vocalizations and decrease their non-speech vocalizations (Nathani, Ertmer, & Stark) [1]. Many researchers (Oller; [2] Stark, as cited in Nathani, Ertmer, & Stark) [1] have documented this development and suggest growth through the following five stages: reflexive vocalizations, cooing and laughing, vocal play (the expansion stage), canonical babbling, and finally, the integration stage.

Stage 1: Reflexive Vocalization

As newborns, infants make noises in response to their environment and current needs. These reflexive vocalizations may consist of crying or vegetative sounds such as grunting, burping, sneezing, and coughing (Oller) [2]. Although it is often thought that infants of this age do not show evidence of linguistic abilities, a recent study found that newborns' cries follow the melody of their surrounding language input (Mampe, Friederici, Christophe, & Wermke) [3]. The researchers discovered that the French newborns' cry pattern was a rising contour, where the melody of the cry rose slowly and then quickly decreased. In comparison, the German newborns' cry pattern rose quickly and slowly decreased. These patterns match the intonation patterns found in each of the respective spoken languages. The findings suggest that infants' vocalizations are perhaps not exclusively reflexive and may contain patterns of their native language.

Stage 2: Gooing, Cooing and Laughing

Between 2 and 4 months, infants begin to produce "cooing" and "gooing" sounds to demonstrate their comfort states. These sounds often take the form of vowel-like sounds such as "aah" or "oooh." This stage is often associated with a happy infant, as laughing and giggling begin and crying is reduced. Infants will also engage in more face-to-face interactions with their caregivers, smiling and attempting to make eye contact (Oller) [2].

Stage 3: Vocal Play

From 4 to 6 months, infants will attempt to vary the sounds they can produce using their developing vocal apparatus. They show a desire to explore and develop new sounds, which may include yells, squeals, growls, and whispers (Oller) [2]. Face-to-face interactions are still important at this stage, as they promote the development of conversational abilities. Beebe, Alson, Jaffe et al. [4] found that even at this young age, infants' vocal expressions show a "dialogic structure," meaning that, during interactions with caregivers, infants were able to take turns vocalizing.

Stage 4: Canonical Babbling

After 6 months, infants begin to make and combine sounds that are found in their native language, sometimes known as "well-formed syllables," which are often replicated in their first words (Oller) [2]. During this stage, infants combine consonants and vowels and replicate them over and over; these are thus called reduplicated babbles. For example, an infant may produce 'ga-ga' over and over. Eventually, infants will begin to string together multiple varied syllables, such as 'gabamaga', called variegated babbles. Other times, infants will move right into the variegated babbling stage without evidence of reduplicated babbles (Oller) [2]. Early in this stage, infants do not produce these sounds for communicative purposes. As they move closer to pronouncing their first words, they may begin to use sounds for rudimentary communicative purposes (Oller) [2].

Stage 5: Integration


In the final stage of prelinguistic speech, 10-month-old infants use intonation and stress patterns in their babbled syllables, imitating adult-like speech. This stage is sometimes known as conversational babble or gibberish because infants may also use gestures and eye movements that resemble conversation (Oller) [2]. Interestingly, their vocalizations also show acoustic differences depending on the purpose of the communication. Papaeliou and Trevarthen [5] found that when infants were communicating for social purposes, they used a higher pitch and were more expressive in their vocalizations and gestures than when exploring and investigating their surroundings. The transition from gibberish to real words is not obvious (Oller) [2], as this stage often overlaps with the acquisition of an infant’s first words. These words begin when an infant understands that the sounds produced are associated with an object. During this stage, infants develop vocal motor schemes: the consistent production of certain consonants over a certain period of time. Keren-Portnoy and Majorano’s [6] study showed that these vocal motor schemes play a significant part in the development of first words, as children who mastered them earlier produced words earlier. The consistent consonants used in babble and vocal motor schemes were also present in a child’s first words. Evidence that a child may understand the connection between context and sounds appears when the child makes consistent sound patterns in certain contexts (Oller) [2]. For example, a child may begin to call his favorite toy “mub.” These phonetically consistent sound patterns, known as protowords or quasi-words, do not always reflect real words, but they are an important step toward achieving adult-like speech (Otomo [7]; Oller [2]). Infants may also use their protowords to represent an entire sentence (Vetter) [8]. For example, the child may say “mub” but may be expressing “I want my toy,” “Give me back my toy,” “Where is my toy?”, and so on.

Phonological Development

When a child explicitly pronounces their first word, they have understood the association between sounds and their meaning. Yet their pronunciation may be poor, they produce phonetic errors, and they have yet to produce all the sound combinations in their language. Researchers have proposed many theories about the patterns and rules children and infants use while developing their language. In this section, we will examine some frequent error patterns and basic rules children use to articulate words. We will also look at how phonological development can be enhanced.

Patterns of Speech

Depending on their personalities and individual development, infants develop speech production slightly differently. Some children, productive learners, attempt any word regardless of proper pronunciation (Rabagliati, Marcus, & Pylkkänen) [9]. Conservative learners (Rabagliati, Marcus, & Pylkkänen) [9] are hesitant until they are confident in their pronunciation. Other differences include a preference for using nouns and naming things versus using language in a more social way (Bates et al., as cited in Smits-Bandstra) [10]. Although infants vary in their first words and in the development of their phonology, researchers have extracted many similar patterns by examining the sound patterns found in early language. For example, McIntosh and Dodd [11] examined these patterns in 2-year-olds and found that they could produce multiple phonemes but lacked [ʃ, θ, tʃ, dʒ, r]. They were also able to produce complex syllables. Vowel errors also occurred, although consonant errors were much more prevalent. The development of phonemes continues throughout childhood, and many are not completely developed until age 8 (Vetter) [8].

Phonological Errors

As a child pronounces new words and phonemes, he or she may produce various errors that follow patterns. All of these errors decrease with age (McIntosh & Dodd) [11]. Although each child does not necessarily produce the same errors, errors can typically be sorted into groups. For example, there are multiple kinds of consonant errors. A cluster reduction involves reducing a sequence of consonants (e.g., the ‘sk’ in ‘skate’). Most often, a child will skip the first consonant (so ‘skate’ becomes ‘kate’), or they may leave out the second, stop consonant (consonant deletion; Wyllie-Smith, McLeod, & Ball) [12] (so ‘skate’ becomes ‘sate’). This type of error was documented by McIntosh and Dodd [11]. For words with multiple syllables, a child may skip an unstressed syllable at the beginning of the word (‘potato’ becomes ‘tato’) or in the middle of the word (‘telephone’ becomes ‘tephone’) (Ganger & Brent) [13]. This omission may simply be due to the properties of unstressed syllables: they are more difficult to perceive, and the child may simply not attend to them. As a child grows more aware of the unstressed syllable, he or she may insert a dummy syllable in its place to lengthen the utterance (Aoyama, Peters, & Winchester) [14]. For example, a child may say [ə hat] (‘ə hot’) (Clark, as cited in Smits-Bandstra) [10]. Such a replacement shows that the child understands that some sound should be there, but has inserted the wrong one. Another common phonological error pattern is assimilation: a child pronounces a word such that a phoneme within it sounds more like another phoneme near it (McIntosh & Dodd) [11]. For example, a child may say “gug” instead of “bug.” This kind of error is also seen with vowels; it is common in 2-year-olds but decreases with age (Newton) [15].
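To make these patterns concrete, here is a minimal Python sketch. It is not drawn from the cited studies; the orthographic rules and word spellings are simplified assumptions, used only to show what cluster reduction and assimilation do to a target word.

```python
# Minimal sketch of two error patterns described above: cluster reduction
# and assimilation. The orthographic rules are simplified assumptions,
# not a model from the cited studies.

VOWELS = set("aeiou")

def cluster_reduction(word, keep="second"):
    """Reduce a word-initial two-consonant cluster to a single consonant.
    keep="second" drops the first consonant ("skate" -> "kate");
    keep="first" drops the second ("skate" -> "sate")."""
    if len(word) > 2 and word[0] not in VOWELS and word[1] not in VOWELS:
        return word[1:] if keep == "second" else word[0] + word[2:]
    return word

def assimilate_onset(word):
    """Replace the first consonant with the next consonant in the word,
    a crude stand-in for assimilation (e.g. "bug" -> "gug")."""
    later_consonants = [c for c in word[1:] if c not in VOWELS]
    if word and word[0] not in VOWELS and later_consonants:
        return later_consonants[0] + word[1:]
    return word

print(cluster_reduction("skate"))           # -> kate
print(cluster_reduction("skate", "first"))  # -> sate
print(assimilate_onset("bug"))              # -> gug
```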


Factors Affecting the Development of Phonology


Because adequate phonology is an important aspect of effective communication, researchers are interested in factors that can enhance it. In a study by Goldstein and Schwade [16], interactions with caregivers gave 8- to 10-month-old infants opportunities to increase their babbling of language sounds (consonant-vowel syllables and vowels). This study also found that infants were not simply imitating their caregivers’ vocalizations: they produced a variety of phonological patterns and longer vocalizations. Thus, it seems that social feedback from caregivers advances infants’ phonological development. On the other hand, factors such as hearing impairment can negatively affect phonological development (Nicolaidis) [17]. When a Greek-speaking population with hearing impairments was compared to a control group, they showed a different pattern of phoneme pronunciation, displaying substitutions (e.g., [x] for target /k/), distortions (e.g., of place of articulation), and epenthesis/cluster production (e.g., [ʃtʃ] or [jθ] for /s/).

Semantic Development

When children purposefully use words, they are trying to express a desire or a refusal, to label something, or to communicate socially (Ninio & Snow) [18]. As a child begins to understand that each word has a specific purpose, they will inevitably need to learn the meanings of many words. Their vocabulary expands rapidly as they experience various social contexts, sing songs, practice routines, and receive direct instruction at school (Smits-Bandstra) [19]. In this section, we will examine children’s first words, the vocabulary spurt, and what their semantic errors are like.

First Words

Many studies have analyzed the types of words found in early speech. Overall, children’s first words are usually short in syllabic length, easy to pronounce, and frequent in everyday speech (Storkel) [20]. Whether early vocabularies have a noun bias tends to divide researchers. Some researchers argue that children’s tendency to produce names for objects, people, and animals is sufficient evidence of such a bias (Gillette et al.) [21]. However, this bias may not be universal. Recently, Tardif [22] studied first words cross-culturally among English-, Cantonese-, and Mandarin-learning 8- to 16-month-old infants and found interesting differences. Although all children used terms for people, there was much variation between languages for animals and objects. This suggests that there may be language differences in which types of words children acquire first.

Vocabulary Spurt


Around the age of 18 months, many infants undergo a vocabulary spurt, or vocabulary explosion, in which they learn new words at an increasingly rapid rate (Smits-Bandstra [10]; Mitchell & McMurray [23]). Before the onset of this spurt, the first 50 words a child learns are usually acquired at a gradual rate (Plunkett, as cited in Smits-Bandstra) [10]. After the spurt, some studies have found upwards of 20 words learned per week (Mitchell & McMurray) [23]. There has been much speculation about the process underlying the vocabulary spurt, and there are three main theories. First, it has been suggested that the vocabulary spurt results from the naming insight (Reznick & Goldfield) [24]: children begin to understand that referents can be labeled, either out of context or in place of the object. Second, this period seems to coincide with Piaget’s sensorimotor stage, in which children are expanding their understanding of how to categorize concepts and objects; children would then need to expand their vocabulary to label those categories (Gopnik) [25]. Finally, it has been suggested that leveraged learning may facilitate the vocabulary explosion (Mitchell & McMurray) [23]. Learning begins slowly: one word is learned, which acts as ‘leverage’ to learn the next word, then each of those two words can facilitate learning a new word, and so on, so learning becomes progressively easier. It is possible, however, that not all children experience a vocabulary spurt. Some researchers have tested whether there truly is an accelerated learning process. Interestingly, Ganger and Brent [13] used a mathematical model and found that only a minority of the infants studied fit the criteria for a growth spurt. Thus, the growth spurt may not be as common as once believed.
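The leveraged-learning account lends itself to a small simulation. The sketch below is only an illustration of the idea, not the model from Mitchell and McMurray; the learning rule and all parameter values are arbitrary assumptions. Each word already known slightly raises the probability of learning another, which is enough to produce an accelerating, spurt-like growth curve.

```python
import random

def simulate_vocabulary(days=300, base_rate=0.02, leverage=0.002, seed=1):
    """Toy leveraged-learning simulation: each day offers several learning
    opportunities, and the chance of succeeding grows with the number of
    words already known. All parameter values are arbitrary."""
    random.seed(seed)
    vocab, history = 0, []
    for _ in range(days):
        p = min(1.0, base_rate + leverage * vocab)
        for _ in range(10):          # learning opportunities per day
            if random.random() < p:
                vocab += 1
        history.append(vocab)
    return history

growth = simulate_vocabulary()
print(growth[::50])  # vocabulary size sampled every 50 days: growth accelerates
```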

Semantic Errors

Even after a child has developed a large vocabulary, errors are made in selecting words to convey the desired meaning. One type of improper word selection occurs when children invent a word (called lexical innovation). This usually happens because they have not yet learned a word for the meaning they are trying to express, or because they cannot retrieve it. Although made-up words are not real words, it is usually easy to figure out what the child means, and they are sometimes easier to remember than the conventional words (Clarke, as cited in Swan) [26]. For example, a child may say “pourer” for “cup” (Clarke, as cited in Swan) [26]. These lexical innovations show that the child understands derivational morphology and can use it creatively and productively (Swan) [26].

Sometimes children use a word in an inappropriate context, either extending or restricting its use. For example, a child who says “doggie” while pointing to any four-legged animal is producing an overextension, which is most common in 1- to 2-year-olds (McGregor et al. [27]; Bloomquist [28]; Bowerman [29]; Jerger & Damian [30]). At other times, children may use a word only in one specific context; this is called underextension (McGregor et al. [27]; Bloomquist [28]; Bowerman [29]; Jerger & Damian [30]). For example, they may say “baba” only for their own bottle and not for another infant’s bottle. Semantic errors show up in naming tasks and provide an opportunity to examine how children organize semantic representations. In McGregor et al.’s [27] picture-naming task with 3- to 5-year-olds, errors were most often related to functional or physical properties (e.g., saying ‘chair’ for ‘saddle’). McGregor et al. [27] proposed three reasons for such errors.

Grammatical and Morphological Development

As children develop larger lexicons, they begin to combine words into sentences that become progressively longer and more complex, demonstrating their syntactic development. Longer utterances are evidence that children are reaching an important milestone: the beginning of morphosyntactic development (Aoyama et al.) [14]. Brown [31] developed a measure of syntactic growth called mean length of utterance (MLU). It is determined by recording or listening to a 30-minute sample of a child’s speech, counting the number of meaningful morphemes (semantic roles), and dividing that number by the number of utterances. Meaningful morphemes can be function words (e.g., “of”), content words (e.g., “cat”), or grammatical inflections (e.g., -s). Each utterance conveys a separate thought; repetitions, filler words, recitations, titles, and compound words are counted as a single utterance. Brown described five stages of syntactic development: Stage I (MLU 1.0-2.0), Stage II (MLU 2.0-2.5), Stage III (MLU 2.5-3.0), Stage IV (MLU 3.0-3.5), and Stage V (MLU 3.5-4.0).
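Because MLU is just a ratio of morphemes to utterances, it is easy to compute once a transcript has been segmented. The sketch below assumes, for illustration, that bound morphemes have already been marked with a hyphen; real CHAT/CLAN transcripts use richer coding, and Brown's counting rules have more exceptions than shown here.

```python
def mean_length_of_utterance(utterances):
    """MLU = total morphemes / number of utterances.
    Each utterance is a list of words; bound morphemes are assumed to be
    pre-segmented with '-', e.g. "dog-s", "walk-ed"."""
    total_morphemes = sum(len(word.split("-"))
                          for utt in utterances for word in utt)
    return total_morphemes / len(utterances)

sample = [
    ["more", "cookie"],     # 2 morphemes
    ["Mommy", "fix"],       # 2 morphemes
    ["dog-s", "bark-ing"],  # 4 morphemes
]
print(round(mean_length_of_utterance(sample), 2))  # 2.67
```

A child averaging 2.67 morphemes per utterance would fall into Brown's Stage III (MLU 2.5-3.0) in the scheme above.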



Two-Word Utterances

Around the age of 18 months, children’s utterances usually take two-word forms such as “want that,” “mommy do,” or “doll fall” (Vetter) [8]. In English, these forms are dominated by content words such as nouns, verbs, and adjectives and are restricted to concepts the child is learning in the sensorimotor stage suggested by Piaget (Brown) [31]. Thus, they express relations between objects, actions, and people. This type of speech is called telegraphic speech. During this developmental stage, children are combining words to convey various meanings. They also display evidence of grammatical structure, with consistent word orders and inflections (Behrens & Gut [32]; Vetter [8]).

Once the child moves beyond Stage I, simple sentences begin to form and the child begins to use inflections and function words (Aoyama et al.) [14]. At this time, the child develops grammatical morphemes (Brown) [31], which Brown classified into 14 categories ordered by acquisition. These morphemes modify the meaning of the utterance, marking tense, plurality, possession, and so on. There are two main accounts of why this particular order occurs. The frequency hypothesis suggests that children acquire the morphemes they hear most frequently in adult speech. Brown argued against this account by analyzing adult speech: articles were the most common word form, yet children did not acquire articles quickly. He suggested instead that linguistic complexity may account for the order of acquisition, with less complex morphemes acquired first. The complexity of a morpheme was determined by its semantics (meaning) and/or its syntax (rules). In other words, a morpheme with only one meaning, such as the plural -s, is easier to learn than the copula “is” (which encodes both number and the time of the action). Brown also suggested that for a child to have mastered a grammatical morpheme, they must use it correctly 90% of the time.
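Brown's 90% criterion is equally easy to operationalize once each obligatory context for a morpheme has been coded as supplied or omitted. A minimal sketch, assuming the coding itself has already been done by hand:

```python
def mastered(supplied_in_context, criterion=0.90):
    """Brown's mastery rule: a grammatical morpheme counts as mastered when
    it is supplied in at least `criterion` of its obligatory contexts.
    `supplied_in_context` is a list of booleans, one per obligatory context."""
    if not supplied_in_context:
        return False
    return sum(supplied_in_context) / len(supplied_in_context) >= criterion

plural_s = [True] * 19 + [False]   # supplied in 19 of 20 obligatory contexts
print(mastered(plural_s))          # True (0.95 >= 0.90)
```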

Syntactic Errors

As children begin to produce more complex sentences, they must also learn to use grammatical rules appropriately. This is difficult in English because of the many irregular forms. For example, a child may say, “I buyed my toy from the store.” This is known as an overregularization error: the child has understood that there are syntactic patterns and rules to follow, but overuses them, failing to realize that there are exceptions. In this example, the child applied the regular past-tense rule (-ed) to an irregular verb. Why do these errors occur? It may be that the child does not have a complete understanding of the word’s meaning and thus selects it incorrectly (Pinker et al.) [33]. Brooks et al. [34] suggested that these may be categorization errors; for example, intransitive and transitive verbs appear in different contexts, so the child must learn that certain verbs appear only in certain contexts (Brooks et al.) [34]. Interestingly, Hartshorne and Ullman [35] found a gender difference for overregularization errors: girls were more than three times as likely as boys to produce them. They concluded that girls were more likely to overgeneralize associatively, whereas boys overgeneralized only through rule-governed methods. In other words, girls, who remember regular forms better than boys, quickly extend those forms to similar-sounding words (e.g., fold-folded and mold-molded lead to hold becoming “holded”), whereas boys apply the regular rule when they have difficulty retrieving the irregular form (e.g., adding the past-tense -ed so that run becomes “runed”) (Hartshorne & Ullman) [35].
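The contrast described here (retrieving a stored irregular form versus falling back on the regular rule) can be sketched as a toy dual-route process. Everything below, from the mini-lexicon to the retrieval probability, is an illustrative assumption rather than Hartshorne and Ullman's model:

```python
import random

# Toy store of irregular past-tense forms; retrieval is probabilistic to
# mimic occasional failures that fall through to the regular -ed rule.
IRREGULAR_PAST = {"hold": "held", "run": "ran", "buy": "bought", "go": "went"}

def past_tense(verb, retrieval_prob=0.7):
    """Dual-route sketch: use the stored irregular form when retrieval
    succeeds; otherwise overregularize by concatenating -ed (which yields
    forms like "holded" or "buyed"; spelling doubling is ignored)."""
    if verb in IRREGULAR_PAST and random.random() < retrieval_prob:
        return IRREGULAR_PAST[verb]
    return verb + "ed"

random.seed(3)
print([past_tense(v) for v in ["hold", "run", "buy", "walk"]])
```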

Another common error committed by children is the omission of words from an utterance. These errors are especially prevalent in early speech production, which frequently lacks function words (Gerken, Landau, & Remez) [36]. For example, a child may say “dog eat bone,” omitting the function words “the” and “a.” This type of error has been studied frequently, and researchers have proposed three main theories to account for omissions. First, children may focus on words that have referents (Brown) [31]; for example, a child may focus on “car” or “ball” rather than “jump” or “happy.” The second theory suggests children simply recognize content words because they carry greater stress and emphasis (Brown) [31]. The final theory, suggested by Gerken et al. [36], involves an immature production system: in their study, children could perceive function words and classify them into syntactic categories, yet still omitted them from their speech production.

Summary

In this chapter, the development of speech production was examined in the areas of prelinguistics, phonology, semantics, syntax, and morphology. As an infant develops, their vocalizations undergo a transition from reflexive vocalizations to speech-like sounds and finally words. However, linguistic development does not end there. Infants’ underdeveloped speech apparatus keeps them from producing all phonemes properly, so they produce errors such as consonant cluster reduction, omission of syllables, and assimilation. At 18 months, many children seem to undergo a vocabulary spurt. Even with a larger vocabulary, children may overextend (calling a horse a doggie) or underextend (not calling the neighbors’ dog doggie) their words. When a child begins to combine words, they are developing syntax and morphology. Syntactic development is measured using mean length of utterance (MLU), which is divided into five stages (Brown) [31]. After Stage II, children begin to use grammatical morphemes (e.g., -ed, -s, is), which encode tense, plurality, and so on. As in other areas of linguistic development, children also produce errors such as overregularization (e.g., “I buyed it”) or omissions (e.g., “dog eat bone”). In spite of these early error patterns, children eventually develop adult-like speech with few errors. Understanding and studying child language development is important because it may give us insight into the underlying processes of language as well as how we might facilitate it or treat individuals with language difficulties.

Learning Exercise

1. Watch the video clips of a young boy, CC, provided below.

Video 1 Video 2 Video 3 Video 4 Video 5

2. The following is a transcription of conversations between a mother (*MOT) and a child (*CHI) from Brown's (1970) corpus. You can ignore the # symbol as it represents unintelligible utterances. Use the charts found in the section on " Grammatical and Morphological Development " to help answer this question.

  • Possessive morphemes ('s)
  • Present progressive (-ing)
  • MOT: let me see .
  • MOT: over here +...
  • MOT: you have tapioca on your finger .
  • CHI: tapioca finger .
  • MOT: here you go .
  • CHI: more cookie .
  • MOT: you have another cookie right on the table .
  • CHI: Mommy fix .
  • MOT: want me to fix it ?
  • MOT: alright .
  • MOT: bring it here .
  • CHI: bring it .
  • CHI: that Kathy .
  • MOT: yes # that's Kathy .
  • CHI: op(en) .
  • MOT: no # we'll leave the door shut .
  • CHI: why ?
  • MOT: because I want it shut .
  • CHI: Mommy .
  • MOT: I'll fix it once more and that's all .
  • CHI: Mommy telephone .
  • MOT: well # go and get your telephone .
  • MOT: yes # he gave you your telephone .
  • MOT: who are you calling # Eve ?
  • CHI: my telephone .
  • CHI: Kathy cry .
  • MOT: yes # Kathy was crying .
  • MOT: Kathy was unhappy .
  • MOT: what is that ?
  • CHI: letter .
  • MOT: Eve's letter .
  • CHI: Mommy letter .
  • MOT: there's Mommy's letter .
  • CHI: Eve letter .
  • CHI: a fly .
  • MOT: yes # a fly .
  • MOT: why don't you go in the room and kill a fly ?
  • MOT: you go in the room and kill a fly .
  • MOT: yes # you get a fly .
  • MOT: oh # what's that ?
  • MOT: I'm going to go in the basement # Eve .

3. Below are examples of children's speech. These children display some characteristics of the terms we have covered in this chapter. The specific terms found in each video are provided. Find examples of these terms within their associated video and indicate which type of development (phonological, semantic, or syntactic) is associated with each term.

5. The following are examples of children’s speech errors. Name the error and the type of development it is associated with (phonological, syntactic, morphological, or semantic). Can you explain why such an error occurs?


References

1. Nathani, S., Ertmer, D. J., & Stark, R. E. (2006). Assessing vocal development in infants and toddlers. Clinical Linguistics & Phonetics, 20(5), 351-369.
2. Oller, D. K. (2000). The Emergence of the Speech Capacity. NJ: Lawrence Erlbaum Associates.
3. Mampe, B., Friederici, A. D., Christophe, A., & Wermke, K. (2009). Newborns' cry melody is shaped by their native language. Current Biology, 19(23), 1994-1997.
4. Beebe, B., Alson, D., Jaffe, J., Feldstein, S., & Crown, C. (1988). Vocal congruence in mother-infant play. Journal of Psycholinguistic Research, 17(3), 245-259.
5. Papaeliou, C. F., & Trevarthen, C. (2006). Prelinguistic pitch patterns expressing "communication" and "apprehension." Journal of Child Language, 33(1), 163.
6. Keren-Portnoy, T., Majorano, M., & Vihman, M. M. (2009). From phonetics to phonology: The emergence of first words in Italian. Journal of Child Language, 36(2), 235-267.
7. Otomo, K. (2001). Maternal responses to word approximations in Japanese children's transition to language. Journal of Child Language, 28(1), 29-57.
8. Vetter, H. J. (1971). Theories of language acquisition. Journal of Psycholinguistic Research, 1(1), 31.
9. Rabagliati, H., Marcus, G. F., & Pylkkänen, L. (2010). Shifting senses in lexical semantic development. Cognition, 117(1), 17-37.
10. Smits-Bandstra, S. (2006). The role of segmentation in lexical acquisition in children. Audiology, 30(3), 182-191.
11. McIntosh, B., & Dodd, B. J. (2008). Two-year-olds' phonological acquisition: Normative data. International Journal of Speech-Language Pathology, 10(6), 460-469.
12. Wyllie-Smith, L., McLeod, S., & Ball, M. J. (2006). Typically developing and speech-impaired children's adherence to the sonority hypothesis. Clinical Linguistics & Phonetics, 20(4), 271-291.
13. Ganger, J., & Brent, M. R. (2004). Reexamining the vocabulary spurt. Developmental Psychology, 40(4), 621-632.
14. Aoyama, K., Peters, A. M., & Winchester, K. S. (2010). Phonological changes during the transition from one-word to productive word combination. Journal of Child Language, 37(1), 145-157.
15. Newton, C., & Wells, B. (2002). Between-word junctures in early multi-word speech. Journal of Child Language.
16. Goldstein, M. H., & Schwade, J. A. (2008). Social feedback to infants' babbling facilitates rapid phonological learning. Psychological Science, 19(5), 515-523.
17. Nicolaidis, K. (2004). Articulatory variability during consonant production by Greek speakers with hearing impairment: An electropalatographic study. Clinical Linguistics & Phonetics, 18(6-8), 419-432.
18. Ninio, A., & Snow, C. (1996). Pragmatic Development. Boulder, CO: Westview Press.
19. Smits-Bandstra, S. (2006). The role of segmentation in lexical acquisition in children. Audiology, 30(3), 182-191.
20. Storkel, H. L. (2004). Do children acquire dense neighborhoods? An investigation of similarity neighborhoods in lexical acquisition. Applied Psycholinguistics, 25(2), 201-221.
21. Gillette, J., Gleitman, H., Gleitman, L., & Lederer, A. (1999). Human simulations of vocabulary learning. Cognition, 73(2), 135-176.
22. Tardif, T., Fletcher, P., Liang, W., Zhang, Z., Kaciroti, N., & Marchman, V. A. (2008). Baby's first 10 words. Developmental Psychology, 44(4), 929-938.
23. Mitchell, C., & McMurray, B. (2009). On leveraged learning in lexical acquisition and its relationship to acceleration. Cognitive Science, 33(8), 1503-1523.
24. Reznick, J. S., & Goldfield, B. A. (1992). Rapid change in lexical development in comprehension and production. Developmental Psychology, 28(3), 406-413.
25. Gopnik, A., & Meltzoff, A. (1987). The development of categorization in the second year and its relation to other cognitive and linguistic developments. Child Development, 58(6), 1523.
26. Swan, D. W. (2000). How to build a lexicon: A case study of lexical errors and innovations. First Language, 20(59), 187-204.
27. McGregor, K. K., Friedman, R. M., Reilly, R. M., & Newman, R. M. (2002). Semantic representation and naming in young children. Journal of Speech, Language, and Hearing Research, 45(2), 332-346.
28. Bloomquist, J. (2007). Developmental trends in semantic acquisition: Evidence from over-extensions in child language. First Language, 27(4), 407-420.
29. Bowerman, M. (1978). Systematizing semantic knowledge: Changes over time in the child's organization of word meaning. Child Development, 49, 977-987.
30. Jerger, S., & Damian, M. F. (2005). What's in a name? Typicality and relatedness effects in children. Journal of Experimental Child Psychology, 92(1), 46-75.
31. Brown, R. (1973). A First Language: The Early Stages. Cambridge, MA: Harvard University Press.
32. Behrens, H., & Gut, U. (2005). The relationship between prosodic and syntactic organization in early multiword speech. Journal of Child Language, 32(1), 1-34.
33. Marcus, G. F., Pinker, S., Ullman, M., Hollander, M., Rosen, T. J., & Xu, F. (1992). Overregularization in language acquisition. Monographs of the Society for Research in Child Development, 57(4).
34. Brooks, P. J., Tomasello, M., Dodson, K., & Lewis, L. B. (1999). Young children's overgeneralizations with fixed transitivity verbs. Child Development, 70(6), 1325-1337.
35. Hartshorne, J. K., & Ullman, M. T. (2006). Why girls say "holded" more than boys. Developmental Science, 9(1), 21-32.
36. Gerken, L., Landau, B., & Remez, R. E. (1990). Function morphemes in young children's speech perception and production. Developmental Psychology, 26(2), 204-216.


Physiology of Speech Production

Nobuhiko Isshiki

Speech production at the peripheral level consists of three stages: exhalation, phonation, and articulation (Table 2.1). Exhalatory movement of the respiratory organ provides the subglottal air flow (direct current). The air flow is cut into puffs (alternating current) at the closed glottis as the vocal cords vibrate. The sound thereby produced at the glottis is referred to as the primary laryngeal tone or glottal sound (source) . Through the resonance of the vocal tract, the glottal sound is modified so that some frequency components are amplified and others are attenuated.
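The source-filter account in this passage can be illustrated with a few lines of signal processing. The sketch below is a deliberately crude, self-contained illustration, not code from the chapter: an impulse train standing in for the glottal pulses is passed through second-order resonators standing in for the first two formants of the vocal tract. The formant frequencies and bandwidths are rough, assumed values for an /a/-like vowel.

```python
import numpy as np
from scipy.signal import lfilter

def resonator(freq_hz, bandwidth_hz, fs):
    """Klatt-style second-order IIR resonator approximating one formant,
    normalized for unity gain at 0 Hz."""
    r = np.exp(-np.pi * bandwidth_hz / fs)
    c = 2 * r * np.cos(2 * np.pi * freq_hz / fs)
    b = [1 - c + r ** 2]
    a = [1, -c, r ** 2]
    return b, a

def synthesize_vowel(f0=120, formants=((700, 130), (1200, 70)),
                     fs=16000, dur=0.5):
    """Crude source-filter synthesis: an impulse train at f0 (the 'glottal
    source') filtered through one resonator per formant (the 'vocal tract')."""
    n = int(fs * dur)
    source = np.zeros(n)
    source[::int(fs / f0)] = 1.0          # glottal pulses at roughly f0
    signal = source
    for freq, bw in formants:
        b, a = resonator(freq, bw, fs)
        signal = lfilter(b, a, signal)
    return signal / np.max(np.abs(signal))

wave = synthesize_vowel()
print(wave.shape)   # (8000,) samples of a rough vowel-like waveform
```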




Isshiki, N. (1989). Physiology of Speech Production. In: Phonosurgery. Springer, Tokyo. https://doi.org/10.1007/978-4-431-68358-2_2


Identifying the Speech Production Stages in Early and Late Adulthood by Using Electroencephalography


  • 1 International Doctorate in Experimental Approaches to Language and Brain (IDEALAB, Universities of Groningen, Potsdam, Newcastle, Trento and Macquarie University), Sydney, NSW, Australia
  • 2 Center for Language and Cognition Groningen (CLCG), University of Groningen, Groningen, Netherlands
  • 3 Clinical and Experimental Neurolinguistics (CLIEN), Vrije Universiteit Brussel, Brussels, Belgium
  • 4 Department of Neurology and Memory Clinic, ZNA Middelheim General Hospital, Antwerp, Belgium
  • 5 Center for Language and Brain, National Research University Higher School of Economics, Moscow, Russia

Structural changes in the brain take place throughout one’s life. Changes related to cognitive decline may delay the stages of the speech production process in the aging brain. For example, semantic memory decline and poor inhibition may delay the retrieval of a concept from the mental lexicon. Electroencephalography (EEG) is a valuable method for identifying the timing of speech production stages. So far, studies using EEG mainly focused on a particular speech production stage in a particular group of subjects. Differences between subject groups and between methodologies have complicated identifying time windows of the speech production stages. For the current study, the speech production stages lemma retrieval, lexeme retrieval, phonological encoding, and phonetic encoding were tracked using a 64-channel EEG in 20 younger adults and 20 older adults. Picture-naming tasks were used to identify lemma retrieval, using semantic interference through previously named pictures from the same semantic category, and lexeme retrieval, using words with varying age of acquisition. Non-word reading was used to target phonological encoding (using non-words with a variable number of phonemes) and phonetic encoding (using non-words that differed in spoken syllable frequency). Stimulus-locked and response-locked cluster-based permutation analyses were used to identify the timing of these stages in the full time course of speech production from stimulus presentation until 100 ms before response onset in both subject groups. It was found that the timing of each speech production stage could be identified. Even though older adults showed longer response times for every task, only the timing of the lexeme retrieval stage was later for the older adults compared to the younger adults, while no such delay was found for the timing of the other stages. The results of a second cluster-based permutation analysis indicated that clusters that were observed in the timing of the stages for one group were absent in the other subject group, which was mainly the case in stimulus-locked time windows. A z -score mapping analysis was used to compare the scalp distributions related to the stages between the older and younger adults. No differences between both groups were observed with respect to scalp distributions, suggesting that the same groups of neurons are involved in the four stages, regardless of the adults’ age, even though the timing of the individual stages is different in both groups.

Introduction

Effects of Aging on the Brain

Structural changes in the brain, such as a reduction in cortical thickness (Freeman et al., 2008; Zheng et al., 2018), a decrease in the number of cortical folds (Zheng et al., 2018), and a reduction in gray (Freeman et al., 2008) and white matter (Marner et al., 2003), take place throughout one’s lifetime. In addition, the connectivity within the cingulo-opercular network [CON; including the dorsal anterior cingulate, medial superior frontal cortex, anterior insula, frontal operculum, and anterior prefrontal cortex (Dosenbach et al., 2007)] and the frontoparietal control network [FPCN; including the lateral prefrontal cortex, anterior cingulate cortex, and inferior parietal lobule (Vincent et al., 2008)] is reduced with aging (Geerligs et al., 2015). These networks modulate higher cognitive functions involved in language processing, such as working memory and reading. While the global efficiency of these networks is the same in older and younger adults, their local efficiency and modularity decrease with aging, and this decrease may delay the speech production process. The efficiency of the visual network, which is used when viewing pictures, is maintained, however, and no aging-related delay in information processing has been observed in that network.

Age-related changes in the brain are also reflected in its oscillations, which can be measured using electroencephalography (EEG). The amplitude of components (peaks in the processed signal that are related to a particular process in the brain, observed when many neurons fire together) is reduced in older individuals (Wlotko et al., 2010). There are two reasons why this reduction may occur: (1) the neurons that fire together are geometrically less well aligned and no longer fire synchronously, and (2) the latency of the component is more variable. Delays in the latency of the N400 component have also been observed in older individuals. According to the global slowing hypothesis (Brinley, 1965), older adults are slower in every process, which should be reflected in the EEG. Slower processing speed may thus be observed in older adults carrying out a cognitive task because they cannot focus on speed while they are focusing on responding as accurately as possible, a phenomenon known as the “speed–accuracy tradeoff” (Ratcliff et al., 2007). This inability to focus on both speed and accuracy is possibly related to a decrease in the strength of the tract between the presupplementary motor area and the striatum in older adults (Forstmann et al., 2011).

Effects of Aging on the Speech Production Process

Between 25 and 100% of the structural and functional changes in the brain are related to cognitive decline ( Fjell and Walhovd, 2011 ). Cognitive decline caused by aging may have an effect on the speech production process. For example, older adults are less accurate in picture naming than younger adults ( Connor et al., 2004 ). Decline in object naming is accompanied by a reduction in white and gray matter in the left temporal lobe ( Cardenas et al., 2011 ). The temporal lobe has been associated with semantic memory, in which concepts are stored. When a concept activates a lemma (the word meaning) in the lexicon, semantically related lemmas get coactivated. The correct lemma is retrieved from the mental lexicon when lemmas that are semantically related to the target are sufficiently inhibited. Both semantic memory and inhibition decline with aging ( Harada et al., 2013 ).

After the lemma retrieval stage, the lexical word form, the lexeme, is retrieved. When there is insufficient information available about the lexeme, the phonological form of the word cannot be retrieved, and the speaker experiences a temporary failure to produce a word even though the word is well known to him or her. This so-called tip-of-the-tongue phenomenon is observed more frequently in older adults, particularly in those with atrophy in the left insula (Shafto et al., 2007).

In the next stage of object naming, phonological encoding, the phonemes corresponding to the lexeme are retrieved and ordered and the phonological rules are applied. No aging effects have been reported for phonological encoding. Finally, the string of phonemes is phonetically encoded into an articulation plan. This plan specifies how the muscles of the mouth and throat will interact during the articulation of the word. Older individuals have a longer response duration for the production of both sequential and alternating syllable strings, which is associated with reduced cortical thickness in the right dorsal anterior insula and in the left superior temporal sulcus and gyrus ( Tremblay and Deschamps, 2016 ).

In sum, delayed lemma retrieval can be observed in older individuals ( Cardenas et al., 2011 ) due to reduced semantic memory and poorer inhibition abilities ( Harada et al., 2013 ). A delay at the lemma level may delay the onset of lexeme retrieval. Lexeme retrieval may be delayed due to tip-of-the-tongue states ( Shafto et al., 2007 ). In this study, lemma and lexeme retrieval are studied in picture-naming tasks, while phonological and phonetic encoding are studied in non-word production tasks. Since lemma and lexeme retrieval do not play a role in non-word production tasks, delays in these stages cannot delay the onset of phonological and phonetic encoding. Aging is not expected to have an effect on these two stages, because no aging effects on phonological encoding have been reported. Also, the task used to study phonetic encoding is different from the task used by Tremblay and Deschamps (2016) . An overview of the stages in spoken word and non-word production that may change in later adulthood is provided in Figure 1 .


Figure 1. Stages in the model of spoken word and non-word production based on Levelt et al. (1999) and how they may change in later compared to earlier adulthood.

Current Study

The hypothesis that the lemma and lexeme retrieval stages are delayed in older compared to younger individuals, whereas phonological and phonetic encoding are similar in both groups, can be tested using EEG. Since each speech production stage has its own timing ( Indefrey, 2011 ), it is possible to identify the individual stages using tasks in which more processing is required at the particular stage. Lemma retrieval requires more effort when the number of previously retrieved lemmas from neighboring nodes increases. This effect is referred to as the “cumulative semantic interference effect” ( Howard et al., 2006 ). Two EEG studies have used this effect to target the stage of lemma retrieval, which has been identified from 150 to 225 ms ( Maess et al., 2002 ) and from 200 to 380 ms after stimulus presentation ( Costa et al., 2009 ).

Lexeme retrieval requires more effort when the age of acquisition (AoA) of words increases (Laganaro and Perret, 2011; Laganaro et al., 2012; Valente et al., 2014). This stage has been identified in a time window from 120 to 350 ms after stimulus presentation and around 280 and 150 ms before response onset (Laganaro and Perret, 2011), from 380 to 400 ms after stimulus presentation and up to 200 ms before response onset (Laganaro et al., 2012), and from 380 ms after stimulus presentation up to 100 ms before response onset (Valente et al., 2014).

Phonological encoding requires more effort when the number of phonemes increases. So far, word length effects have not been identified in EEG studies, meaning that the time frame of phonological encoding has not been identified yet using this manipulation ( Valente et al., 2014 ; Hendrix et al., 2017 ). However, other tasks, such as comparing overt and covert production of nouns and verbs, have been used to track phonological encoding ( Sahin et al., 2009 ). In the current study, non-word length is used, which may lead to different findings.

Syllable frequency is known to have an effect on phonetic encoding: when syllable frequency decreases, phonetic encoding requires more effort ( Levelt and Wheeldon, 1994 ). In a task in which phonemes were inserted into non-words with varying frequencies in a non-word reading task, the syllable frequency effect has been identified using EEG from 170 to 100 ms before response onset ( Bürki et al., 2015 ). Our methodology is different because participants were asked to read the non-words, not to insert phonemes. It is, therefore, unclear what to expect.

Hence, for the current study, the cumulative semantic interference effect, the AoA effect, the effect of non-word length in phonemes, and the syllable frequency effect will be used to track the speech production stages in a group of younger adults and in a group of older adults. The time windows of the stages in both groups will be identified. If the time windows of the stages differ between the two groups, that does not mean that the processing mechanisms are different ( Nieuwenhuis et al., 2011 ). Therefore, a direct comparison of both groups will be made in the time windows of the relevant stages that were identified in the younger adults and the older adults. Additionally, the scalp distributions of the stages will be compared between the two groups.

Materials and Methods

Participants

For the group of younger adults, 20 native speakers of Dutch in early adulthood (5 males) participated. The mean age of these participants was 21.8 years (range: 17–28 years). The group of older adults consisted of 20 native speakers of Dutch in late adulthood (7 males). Their average age was 55.4 years (range: 40–65 years). The early adulthood participants are referred to as “younger adults,” and the late adulthood participants are referred to as “older adults.” The younger adults’ data form the basis of this study and will be compared to those of the older adults.

All participants were right handed, measured using the short version of the Edinburgh Handedness Inventory ( Oldfield, 1971 ). They reported no problems in hearing, and their vision was normal or corrected to normal. Also, they reported no reading difficulties. All participants were financially compensated and gave informed consent. The study was approved by the Ethics Committee of Humanities of the University of Groningen.

Lemma Retrieval

The materials used in the lemma retrieval task were black-and-white drawings. The pictures originated from the Auditief Taalbegripsprogramma ( ATP ; Bastiaanse, 2010 ) and the Verb and Action Test ( VAT ; see Bastiaanse et al., 2016 ) for individuals with aphasia. The order in which the depicted nouns were presented was manipulated for the cumulative semantic interference effect. The pictures were grouped in sets of five semantically related neighbors (e.g., bed, couch, cradle, closet, and chair) that fit into a particular category (e.g., furniture, clothes, and insects). The five nouns within one category had the same number of syllables and the same stress pattern and were controlled for logarithmic lemma frequency in Dutch ( Baayen et al., 1995 ). The depicted nouns were all mono- or disyllabic in Dutch.

For the selection of the final item list, a picture-naming task was carried out by four participants (one male) with a mean age of 22 years (age range: 21–23 years). Items that were named incorrectly by more than one participant were removed. The 125 selected items had an overall name agreement of 91.4%. The overall mean logarithmic lemma frequency was 1.28 (range: 0–2.91). The same set of pictures was used in two lists with reversed conditions to avoid an order of appearance effect. The lists were presented in three blocks of 30 items and one block of 35 items.

The pictures were presented on a computer screen, and participants were asked to name them as quickly and accurately as possible. Before each picture was presented, a black fixation cross on a white background was shown for 500 ms; its function was to draw attention and to signal that a picture would appear soon. The picture was then shown for 5 s. Items from the same category were never presented directly after one another.

Lexeme Retrieval

The pictures for this test originated from the same sources as the materials on the first test and represented mono- and disyllabic nouns in Dutch. Items were controlled for AoA ( Brysbaert et al., 2014 ) and lexeme frequency ( Baayen et al., 1995 ).

Four participants (one male) with a mean age of 20.7 years (age range: 19–22) took part in a picture-naming task for pretesting the materials. These participants had not taken part in the lemma retrieval task. Items that were named incorrectly by more than one participant were omitted.

The 140 selected items had an overall name agreement of 93.9%. AoA ranged from 4.01 years for the noun “book” to 9.41 years for the noun “anchor,” with a mean of 5.96 years. The mean logarithmic lexeme frequency was 1.02 (range: 0–2.44). The correlation between AoA and lexeme frequency in the items is significant [ r (138) = −0.28, p < 0.001]. Therefore, in the analysis, only AoA has been taken into account. The items were organized in one list including four blocks of 35 items. The order of the items was randomized per block, so that every participant named the items in a different order.

The procedure of the lexeme retrieval task was the same as the procedure of the lemma retrieval task. Since there was some item overlap between the lemma and lexeme retrieval tasks, the two tasks were never administered consecutively. A non-word task was always administered in between.

Phonological and Phonetic Encoding

To identify the stages of phonological and phonetic encoding, a non-word reading task was used. All non-words were disyllabic and composed of existing Dutch syllables. The combination of the two syllables resulted in a non-word, e.g., “kikkels” or “raalkro.” The non-words were controlled for spoken syllable frequency (Nederlandse Taalunie, 2004). Two lists of non-words were developed in written form for the reading task. The two lists contained the same syllables, but the syllables were combined differently; thus, the non-words were unique.

The non-words were pretested in a reading task by four participants who took part in pretesting the picture-naming tasks as well. Each list was pretested with two participants. The 140 selected items for list 1 had an accuracy rate of 100%; 8% of the non-words in list 2 were produced incorrectly. The syllables used in these items were combined into new non-words. These non-words were pretested again with two other participants. Their accuracy was 100%.

For each non-word, the average spoken syllable frequency was computed over its two syllables. For list 1, the mean frequency was 1,136 (range: 257–4,514) and 1,077 (range: 257–4,676) for list 2. Also, the number of phonemes in the non-words was controlled for, because the duration of phonological encoding may increase with the number of phonemes. For both lists, the number of phonemes in the non-words ranged from 3 to 8. The average number of phonemes was 5.33 for list 1 and 5.29 for list 2.

The non-words were presented in white letters on a black background in Trebuchet MS Regular, font size 64. Each stimulus was presented for 5 s and preceded by a fixation cross, which was shown for 500 ms. Participants read either list 1 or list 2. Each list was divided into four blocks of 35 items. The order in which the non-words were presented was randomized per block, so none of the participants read the non-words in the same order. The instruction was to read the non-words aloud as quickly and accurately as possible.

General Procedure

During the experiments, participants were seated approximately 70 cm from the screen. E-Prime 2.0 (2012) was used to present the stimuli and to record the response times and the responses. A voice key was used to detect the response times. The responses were recorded using a microphone that was attached to a headset. Before the experiment started, participants practiced the task with five items for the picture-naming tasks and with eight items for the non-word reading task. Participants had the opportunity to take a short break between the four blocks of the experiments.

EEG Data Recording

Electroencephalography data were recorded with a 128-channel (older adults) or a 64-channel (younger adults) Ag/AgCl scalp electrode cap (WaveGuard), using the EEGO and ASA-lab systems (ANT Neuro Inc., Enschede, Netherlands). These systems are fully compatible; EEGO is the more recent version. For the older adults, only the 64 channels that were also recorded in the younger group were analyzed; the full set of 128 electrodes was used in a different study. The electrode sites were distributed over the scalp according to the 10-10 system (Jasper, 1958) for the 64-electrode cap and according to the 10-5 system for the 128-electrode cap. Bipolar electrodes were used to record vertical ocular movements, such as eye blinks; these electrodes were vertically aligned with the pupil and located above and below the left eye. Skin impedance was kept below 20 kΩ and was checked before every experiment. Data were acquired at a sampling rate of 512 Hz, and the reference was recorded from the mastoids.

Data Processing and Analysis

Behavioral Data

The audio recordings of the participants’ responses were used to determine the speech onset time. The speech onset time in each audio file was determined manually using the waveform and the spectrogram in Praat (Boersma and Weenink, 2018). The speech onset times based on the audio files were used as response events in the response-locked EEG analysis. R was used for the statistical analysis of the behavioral and item data (R Core Team, 2017).
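
Speech onsets were marked manually in Praat in this study. Purely as an illustration, a rough automated first-pass estimate could be computed from the recorded audio with a simple energy threshold, as in the MATLAB sketch below; the file name, frame length, and threshold factor are arbitrary choices and not part of the authors’ procedure.

```matlab
% Hypothetical first-pass speech onset estimate from one trial's audio file.
% The onset is taken as the first point at which the smoothed RMS energy
% exceeds a threshold relative to the pre-speech noise floor.
[y, fs]  = audioread('trial_001.wav');       % placeholder file name
y        = y(:, 1);                          % use the first channel only
frameLen = round(0.010 * fs);                % 10-ms smoothing window
rmsEnv   = sqrt(movmean(y.^2, frameLen));    % running RMS envelope
noise    = median(rmsEnv(1:round(0.1*fs)));  % noise floor from first 100 ms
onsetIdx = find(rmsEnv > 10 * noise, 1);     % first sample well above the floor
onsetMs  = 1000 * onsetIdx / fs;             % estimated speech onset in ms
fprintf('Estimated speech onset: %.1f ms\n', onsetMs);
```

Such an estimate would still need to be verified against the waveform and spectrogram, as was done manually here.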

Trials to which participants responded incorrectly were excluded from the analysis (lemma retrieval: 7.8%; lexeme retrieval: 7.3%; phonological and phonetic encoding: 1.9%). Responses that included hesitations or self-corrections also qualified as errors (lemma retrieval: 2.6%; lexeme retrieval: 2.6%; phonological and phonetic encoding: 0.8%). Items to which many participants responded extraordinarily fast or slow were excluded from the EEG analysis (lemma retrieval: 8%; lexeme retrieval: 18.6%; phonological and phonetic encoding: 12.1%). The average response time was computed over all accepted trials, and trials deviating from this average by more than 1.4 standard deviations were disregarded.
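
A minimal sketch of this response-time trimming is given below, assuming a vector rt with the response times (in ms) of the accepted trials and interpreting the criterion as a deviation of more than 1.4 standard deviations in either direction; the values shown are illustrative only.

```matlab
% Sketch of the trial-trimming criterion; rt holds response times (ms) of
% accepted (correct) trials. Values below are made up for illustration.
rt     = [612 745 590 980 1450 705 660];
mu     = mean(rt);
sigma  = std(rt);
keep   = abs(rt - mu) <= 1.4 * sigma;   % disregard trials deviating by more
rtKept = rt(keep);                      % than 1.4 SD from the mean
fprintf('%d of %d trials retained\n', numel(rtKept), numel(rt));
```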

EEG Data

The EEG data were preprocessed using EEGLAB (Delorme and Makeig, 2004) as an extension to MATLAB (2015). After rereferencing to the average of the mastoids, the data were filtered with a 50-Hz notch filter to remove line noise and bandpass filtered from 0.2 to 30 Hz. The data were then resampled to 128 Hz. Independent component analysis on all channels was used for artifact detection. Artifact components, such as eye blinks, were removed after visual inspection, and the effect of component removal on the data was inspected visually as well. The continuous data were segmented per trial from 200 ms before until 2 s after stimulus onset. A baseline correction was applied to the data epochs, using the 200 ms before stimulus onset as the baseline. Then, the events of disregarded trials were removed. To study the time window from stimulus onset until response onset, both stimulus-locked analyses, in which the time window after stimulus onset is analyzed, and response-locked analyses, in which the backward time window before response onset is analyzed, were carried out. For the stimulus-locked analysis, the data epochs were segmented from stimulus onset until one sampling point (8 ms) after the earliest response time; this extra sampling point was removed before the analysis. The start of the response-locked analysis was determined by subtracting the stimulus-locked time window from the response onset. Depending on the task, accepted trials were coded into two or three conditions for the statistical analysis; the conditions are specified below per experiment. These data were exported from EEGLAB into the format used in FieldTrip (Oostenveld et al., 2011), which was used for the statistical analysis. Finally, the structure of the data files was prepared for a cluster-based permutation analysis (Maris and Oostenveld, 2007).
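
The preprocessing was carried out in EEGLAB; the sketch below shows how such a pipeline could look using standard EEGLAB functions, restricted to the stimulus-locked epoching. The file name, mastoid channel indices, event code 'stim', and rejected component numbers are placeholders, not the authors’ values.

```matlab
% Hedged sketch of the preprocessing steps in EEGLAB; all concrete values
% (file, channels, event code, components) are hypothetical.
EEG = pop_loadset('filename', 'subject01_raw.set');  % load continuous EEG
EEG = pop_reref(EEG, [65 66]);               % re-reference to the two mastoids
EEG = pop_eegfiltnew(EEG, 48, 52, [], 1);    % 50-Hz notch (band-stop) filter
EEG = pop_eegfiltnew(EEG, 0.2, 30);          % 0.2-30 Hz band-pass filter
EEG = pop_resample(EEG, 128);                % resample to 128 Hz
EEG = pop_runica(EEG, 'icatype', 'runica');  % ICA for artifact detection
EEG = pop_subcomp(EEG, [1 3]);               % remove components marked as blinks
EEG = pop_epoch(EEG, {'stim'}, [-0.2 2]);    % epochs from -200 ms to 2 s
EEG = pop_rmbase(EEG, [-200 0]);             % baseline correction (pre-stimulus)
```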

The aims of the analyses were to identify the time window of lemma retrieval via the cumulative semantic interference effect, the time window of lexeme retrieval via the AoA effect, the time window of phonological encoding via the effect of non-word length in phonemes, and the time window of phonetic encoding via the syllable frequency effect. These time windows were identified in the group of older adults and in the group of younger adults using group-level cluster-based permutation analyses carried out over all participants per group. The cumulative semantic interference effect was computed as the difference between the first and the fifth presented item within a category. The AoA effect was computed from the difference between words with an AoA of around 5 years and words with an AoA of around 6 years, as well as the difference between words with an AoA of around 5 years and words with an AoA of around 7 years. The effect of non-word length in phonemes was computed as the difference between non-words consisting of four phonemes and non-words consisting of five phonemes, as well as the difference between non-words consisting of four phonemes and non-words consisting of six phonemes. The syllable frequency effect was computed from the difference between non-words with a high syllable frequency (1,000–1,500) and non-words with a moderate syllable frequency (500–1,000), as well as the difference between non-words with a high syllable frequency and non-words with a low syllable frequency (250–500). In every analysis, 5,000 permutations were computed. The Monte Carlo method was used to compute significance probability, using a two-sided dependent samples t-test (α = 0.025). In the first analysis of every experiment, the entire time window from stimulus onset until 100 ms before response onset was tested. When an effect was revealed in this large time window, a smaller time window around the effect was tested once, so that a more specific timing of the effect could be reported. Finally, the time windows of the stages in older and younger adults were compared. Comparing time windows alone cannot establish whether the two groups differ statistically (Nieuwenhuis et al., 2011). Therefore, the EEGs of both groups were compared within the time windows of the stages for every single condition using a cluster-based permutation analysis. Again, the Monte Carlo method was used to compute significance probability, but now with a two-sided independent samples t-test (α = 0.025) to compare the two participant groups.
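
The statistical analyses were run in FieldTrip. As a minimal sketch, a within-group dependent-samples cluster-based permutation test (here comparing the 1st and 5th ordinal positions) could be configured as below; the variable names tlock1st and tlock5th, the example latency window, and the electrode-based neighbour definition are assumptions for illustration, not the authors’ exact configuration.

```matlab
% tlock1st and tlock5th: cell arrays of per-participant timelocked averages
% for the two conditions (placeholders); electrode positions are assumed to
% be stored in the data (field .elec).
nSubj = numel(tlock1st);

cfgN        = [];
cfgN.method = 'distance';                 % define spatial neighbours
cfgN.elec   = tlock1st{1}.elec;
neighbours  = ft_prepare_neighbours(cfgN);

cfg                  = [];
cfg.method           = 'montecarlo';
cfg.statistic        = 'ft_statfun_depsamplesT';   % paired t-test
cfg.correctm         = 'cluster';
cfg.numrandomization = 5000;              % 5,000 permutations
cfg.alpha            = 0.025;             % two-sided test
cfg.tail             = 0;
cfg.neighbours       = neighbours;
cfg.latency          = [0.1 0.265];       % example time window in seconds
cfg.design           = [1:nSubj, 1:nSubj; ones(1,nSubj), 2*ones(1,nSubj)];
cfg.uvar             = 1;                 % unit (participant) variable
cfg.ivar             = 2;                 % independent (condition) variable

stat = ft_timelockstatistics(cfg, tlock1st{:}, tlock5th{:});
```

The between-group comparisons would use the same setup with an independent-samples statistic instead of the dependent-samples one.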

Additionally, a z-score mapping analysis (Thatcher et al., 2002) was carried out to compare the scalp distributions of the older adults to those of the younger adults during the speech production stages. For each experiment, the data were analyzed in the relevant time windows and conditions for which significant clusters were found in the cluster-based permutation analyses of the older and the younger adults. The length of these time windows varied between the participant groups, which would have led to a different number of time points entering the analysis. To avoid this, only the time points centered around the midpoint of the longer time window were included, so that the number of time points was equal to that of the shorter time window. For each time point, z-scores were computed per electrode: the mean computed over the younger adults’ data was subtracted from each data point of the older adults’ data individually, and this difference was divided by the standard deviation computed over the younger adults’ data. Mean z-scores were computed per condition. When the mean z-score deviated more than one standard deviation from zero, the difference between the age groups qualified as significant.
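
As an illustration of this z-score mapping, the sketch below assumes two hypothetical arrays, older and younger, of size electrodes × time points × participants for one condition (with the time points already equated between groups), and one plausible reading of the significance criterion.

```matlab
% older, younger: electrodes x timepoints x participants (placeholder names).
% Implicit expansion (MATLAB R2016b or newer) is used for the subtraction.
muY   = mean(younger, 3);              % mean over younger participants
sdY   = std(younger, 0, 3);            % SD over younger participants
z     = (older - muY) ./ sdY;          % z-score each older-adult data point
                                       % against the younger group
zMean = squeeze(mean(mean(z, 3), 2));  % mean z per electrode, averaged over
                                       % time points and older participants
% One reading of the criterion: the group difference counts as significant
% when the mean z deviates from zero by more than one SD of the z-scores.
sig   = abs(mean(zMean)) > std(zMean);
```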

The mean, standard deviation, and range of the response time data from the three experiments are provided per participant group in Table 1. For all analyses of response time, only the correct responses were used.


Table 1. Response times of the younger and older adults.

Behavioral Results

Younger Adults

On all tasks, the younger adults performed at ceiling. The percentages of correct responses were 92.4% for lemma retrieval, 92.9% for lexeme retrieval, and 98% for the non-word reading task targeting phonological and phonetic encoding. On the lemma retrieval task, a cumulative semantic interference effect was found on the response time [F(1, 765) = 13.38, p < 0.001]: response times were longer for pictures within a category presented at the fifth ordinal position than for pictures presented at the first ordinal position. An AoA effect on the response time was identified on the lexeme retrieval task [F(1, 2,205) = 104.01, p < 0.001]: response time increased as AoA increased. Non-word length in number of phonemes, which is relevant at the level of phonological encoding, turned out to be a significant factor: response times increased when non-words consisted of more phonemes [F(1, 2,096) = 5.71, p = 0.017]. The frequency of the syllables was varied to tap into phonetic encoding; response times decreased when syllable frequency increased [F(1, 2,320) = 6.35, p = 0.01].

Older Adults

Like the younger adults, the older adults performed at ceiling on all tasks. The percentages of correct responses were 86.8% for lemma retrieval, 87.6% for lexeme retrieval, and 96.5% for the non-word reading task. A cumulative semantic interference effect was found on the lemma retrieval task [F(1, 721) = 7.60, p = 0.006]: response times were longer for pictures within a category presented at the fifth ordinal position than for those presented at the first ordinal position. Response times also increased for items with a later AoA on the task targeting lexeme retrieval [F(1, 2,061) = 43.38, p < 0.001]. In the non-word reading task, response times increased with non-word length in number of phonemes, which was used as a marker for phonological encoding [F(1, 1,943) = 5.60, p = 0.018]. Furthermore, targeting phonetic encoding, response times increased as the syllable frequency of the non-words decreased [F(1, 2,146) = 11.68, p < 0.001].

Differences Between Younger and Older Adults

On all tasks, differences in response times between the two age groups were found. The older adults responded more slowly than the younger adults on the lemma retrieval task [F(1, 1,488) = 4.81, p = 0.028], the lexeme retrieval task [F(1, 4,268) = 7.14, p = 0.007], and the non-word reading task targeting phonological and phonetic encoding [F(1, 4,468) = 28.58, p < 0.001]. Moreover, an interaction effect of AoA and participant age was found [F(1, 4,268) = 4.51, p = 0.034]: the group of older adults showed a smaller AoA effect [F(1, 2,061) = 43.38, p < 0.001] than the group of younger adults [F(1, 2,205) = 104.01, p < 0.001].

EEG Results

We first report the results of the cluster-based permutation analysis for each task, in the younger adults and then in the older adults, to identify the time windows of the effects in each group. We then report the differences between the two groups in these time windows, computed with cluster-based permutation analyses, along with the comparisons of the scalp distributions of both age groups. The EEG statistics are given in Appendix 1A (younger adults), Appendix 1B (older adults), and Appendix 1C (comparison of older and younger adults).

In the younger adults, a difference between the first and fifth ordinal positions, taken as evidence for the stage of lemma retrieval, was revealed in the latency range from 100 to 265 ms after stimulus onset (p = 0.005). The difference was most pronounced over right central and posterior sensors. In the response-locked analysis, an effect was found from 445 to 195 ms before response onset (p = 0.004). The effect was most pronounced over central and posterior sensors bilaterally and over the right frontal electrodes. The scalp distribution of the stimulus-locked effect and the waveforms of the grand averages for the first and fifth ordinal positions are shown in Figure 2.


Figure 2. Left: The cluster related to the cumulative semantic interference effect in the younger adults that was revealed in the stimulus-locked analysis of the lemma retrieval task. Electrodes included in the cluster are marked in red. Right: The waveforms of the grand averages for the 1st (in blue) and 5th ordinal positions (in red) for electrode PO6 in the younger adults.

Testing for an AoA effect targeting lexeme retrieval in the latency range from 100 to 300 ms after stimulus onset in the younger adults, the cluster-based permutation test revealed a difference between the items with an early AoA and items with a moderate AoA ( p = 0.002). The difference was most pronounced on bilateral frontal and central sensors, as shown in Figure 3 . Figure 3 also shows the waveforms of the grand averages for the early and moderate AoA conditions. In the response-locked cluster-based permutation analysis, a difference between items with an early AoA and items with a late AoA was revealed from 475 to 330 ms before response onset. The response-locked AoA effect was most pronounced on bilateral frontal and bilateral central electrodes ( p < 0.001).


Figure 3. Left: The cluster related to the AoA effect in the younger adults that was revealed in the stimulus-locked analysis of the lexeme retrieval task. Electrodes included in the cluster are marked in red. Right: Waveforms of the grand averages for an AoA of ca. 5 years (in blue) and 6 years (in red) for electrode F1 in the younger adults.

A stimulus-locked length effect was revealed from 350 to 415 ms for the comparison of non-words consisting of four and five phonemes ( p = 0.0032) targeting phonological encoding, which is shown in Figure 4 . The waveforms of the grand averages for non-word length in four and five phonemes are provided in Figure 4 as well. Also, a stimulus-locked length effect was revealed as a difference between non-words consisting of four and six phonemes in a time window from 390 to 425 ms after stimulus presentation ( p = 0.0046). Both stimulus-locked effects were most pronounced over the bilateral centro-posterior electrodes. In the response-locked analysis, a length effect was identified as a difference between four and five phonemes from 335 to 320 ms before response onset, which was most pronounced over bilateral central and left posterior electrodes ( p = 0.0084). Also, a length effect for the difference between four and six phonemes was revealed from 330 to 320 ms before response onset ( p = 0.0084). This effect was most pronounced in right central and bilateral posterior electrodes.


Figure 4. Left: The cluster related to the effect of non-word length in the younger adults that was revealed in the stimulus-locked analysis of the task targeting phonological encoding. Electrodes included in the cluster are marked in red. Right: Waveforms of the grand averages for a non-word length of four (in blue) and five phonemes (in red) for electrode C1 in the younger adults.

Testing for a syllable frequency effect targeting phonetic encoding in the latency range from 400 to 450 ms after stimulus onset in the younger adults, the cluster-based permutation test revealed a difference between items with a high syllable frequency and items with a moderate syllable frequency ( p = 0.020). In this latency range, the difference was most pronounced over the central sensors bilaterally. Another stimulus-locked syllable frequency effect was found as a difference between items with a high syllable frequency and items with a low syllable frequency in a time window from 350 to 450 ms after stimulus onset ( p = 0.012), which is shown in Figure 5 . The difference was most pronounced at the frontal and central sensors bilaterally. In Figure 5 , the waveforms of the grand averages for the high and low syllable frequency items are provided as well. In the response-locked analysis, a difference between items with a high syllable frequency and items with a low syllable frequency was revealed in a time window from 250 to 200 ms before response onset ( p = 0.021). The effect was most pronounced at bilateral central sensors.


Figure 5. Left: The cluster related to the syllable frequency effect in the younger adults that was revealed in the stimulus-locked analysis of the task targeting phonetic encoding. Electrodes included in the cluster are marked in red. Right: Waveforms of the grand averages for high (in blue) and low syllable frequency (in red) for electrode F2 in the younger adults.

In the older adults, testing for a cumulative semantic interference effect in the latency range from 540 to 450 ms before response onset, the cluster-based permutation test revealed a difference between the first and fifth ordinal positions ( p = 0.006) that was taken as evidence for the stage of lemma retrieval. The difference was most pronounced over left posterior electrodes during the first 60 ms and most pronounced over the right posterior electrodes during the last 50 ms of the effect. No effect was found in the stimulus-locked analysis. The scalp distribution and the waveforms of the first and fifth ordinal position’s grand average are shown in Figure 6 .


Figure 6. Left: The cluster related to the cumulative semantic interference effect in the older adults that was revealed in the response-locked analysis of the lemma retrieval task. Electrodes included in the cluster are marked in red. Right: Waveforms of the grand averages for the 1st (in blue) and 5th ordinal positions (in red) for electrode CP4 in the older adults.

For lexeme retrieval, an AoA effect was revealed in the cluster-based permutation analysis in three response-locked time windows as a difference between items with an early AoA (of around 5 years) and items with a moderate AoA (of around 6 years). The AoA effect was most pronounced over centro-posterior electrodes in the earliest cluster from 430 to 420 ms ( p = 0.012) before response onset. In the second cluster, from 210 to 195 ms ( p = 0.009) before response onset, the effect was most evident over the right frontal electrodes. The AoA effect was most distinct over right central electrodes in the last cluster with the longest duration from 165 to 140 ms ( p = 0.013) before response onset, which is depicted in Figure 7 . In Figure 7 , the waveforms of the grand averages for the early and moderate AoA items are provided as well. No differences were found between items with an early AoA and items with a late AoA (of around 7 years). Also, no AoA effect was found in the stimulus-locked analysis.


Figure 7. Left: The cluster related to the AoA effect in the older adults that was revealed in the response-locked analysis of the lexeme retrieval task. Electrodes included in the cluster are marked in red. Right: Waveforms of the grand averages for an AoA of ca. 5 years (in blue) and 6 years (in red) for electrode FC2 in the older adults.

For phonological encoding, the effect of the length in the number of phonemes on non-word reading was used in the cluster-based permutation analysis. In the older adults, a length effect was revealed as a difference between non-words with a length of four and six phonemes in the time windows from 100 to 135 ms ( p = 0.019) and from 280 to 300 ms ( p = 0.0038) after stimulus onset. In the first time window, the length effect was most pronounced over the right posterior electrodes, as shown in Figure 8 . The waveforms of the grand averages for items consisting of four and six phonemes are provided in Figure 8 as well. The effect was most pronounced over bilateral frontal and central electrodes in the second time window. No effects were found for the comparison of non-words with a length of four and five phonemes. Also, no length effects were found in the response-locked analysis.


Figure 8. Left: The cluster related to the effect of non-word length in phonemes in the older adults that was revealed in the stimulus-locked analysis of the task targeting phonological encoding. Electrodes included in the cluster are marked in red. Right: Waveforms of the grand averages for a non-word length of four (in blue) and six phonemes (in red) for electrode P1 in the older adults.

For tapping into phonetic encoding, the effect of syllable frequency on the non-word reading task was used. The stimulus-locked cluster-based permutation analysis revealed a syllable frequency effect for reading non-words with a high syllable frequency (ranging from 1,000 to 1,500) as compared to reading non-words with a moderate syllable frequency (ranging from 500 to 1,000) in a time window from 280 to 300 ms ( p = 0.0094) and in a time window from 365 to 375 ms ( p = 0.022) after stimulus presentation. The earliest effect was most pronounced over electrodes covering the right hemisphere, the later effect over the posterior electrodes. Furthermore, the comparison of non-words with a high syllable frequency to non-words with a low syllable frequency (ranging from 250 to 500) revealed effects from 280 to 290 ms ( p = 0.0196) and from 420 to 455 ms ( p = 0.0078) after stimulus onset. The effect starting at 280 ms was most pronounced over right-posterior electrodes, while the later effect shown in Figure 9 was most pronounced over bilateral posterior electrodes. The waveforms of the high- and low-frequency items’ grand averages are shown in Figure 9 as well. Also, the syllable frequency effect was revealed from 455 to 435 ms ( p = 0.016) before response onset. This effect was most pronounced over bilateral frontal and central electrodes.


Figure 9. Left: The cluster related to the syllable frequency effect in the older adults that was revealed in the stimulus-locked analysis of the task targeting phonetic encoding. Electrodes included in the cluster are marked in red. Right: Waveforms of the grand averages for a high (in blue) and low syllable frequency (in red) for electrode P1 in the older adults.

Comparing the older and younger adults in the time window for lemma retrieval in younger adults from 100 to 265 ms after stimulus presentation in the fifth ordinal position, the cluster-based permutation analysis showed that both groups differed. In this time window, two effects were identified: a positive ( p = 0.0026) and a negative one ( p = 0.0022). The electrodes over which the positive effect was most pronounced were located in frontal regions bilaterally. The negative effect was most pronounced in bilateral posterior regions. Also, in the time window for lemma retrieval in older adults from 540 to 450 ms before response onset, both groups were found to differ. Differences were observed as a positive ( p = 0.023) effect that was most pronounced over bilateral frontal electrodes and a negative effect ( p = 0.013) that was most pronounced over bilateral posterior electrodes. Furthermore, a difference between the groups was observed in the response-locked time window for lemma retrieval in the younger adults from 445 to 195 ms before response onset ( p = 0.0044). This difference was most pronounced in the posterior regions bilaterally. The clusters are shown in Figure 10A along with the waveforms of the grand averages for younger and older adults.


Figure 10. (A) Difference between younger and older adults identified in the stimulus-locked (top) and response-locked analysis (bottom) for the 5th ordinal position in the lemma retrieval task, showing a positive cluster over frontal electrode sites and a negative cluster over posterior electrode sites. Electrodes included in the clusters are marked in red. Waveforms of the grand averages for the younger (in blue) and older adults (in red) of the frontal electrodes F1 (top left) and F5 (bottom left) and the posterior electrode O1 (right). (B) Scalp distributions per ordinal position showing the z-scores of the older adults compared to the younger adults.

Based on the results from the cluster-based permutation analysis, a time window from 540 to 450 ms before response onset in the older adults was compared to a time window from 365 to 275 ms before response onset in the younger adults. The z-scores computed for the first (M = 0.03, SD = 0.15, range = −0.37 to 0.27) and the fifth ordinal positions (M = −0.12, SD = 0.15, range = −0.41 to 0.19) indicated no differences in scalp distributions between the older and the younger adults. Figure 10B shows the z-scores of the individual electrodes mapped onto the scalp distribution per ordinal position.

In the time window for lexeme retrieval identified for the younger adults, from 100 to 300 ms after stimulus presentation, a difference between the older and younger adults was found for items with a moderate AoA (p = 0.0022). The difference was most pronounced in frontocentral regions bilaterally, as shown in Figure 11A. The waveforms of the younger and older adults’ grand averages are also provided in Figure 11A. The response-locked time windows for lexeme retrieval, from 430 to 140 ms before response onset identified in the older adults and from 475 to 330 ms before response onset identified in the younger adults, did not reveal any differences between the groups.


Figure 11. (A) Left: Cluster related to the difference between younger and older adults identified in the stimulus-locked analysis for an AoA of ca. 6 years in the lexeme retrieval task. Electrodes included in the cluster are marked in red. Right: Waveforms of the grand averages for the younger (in blue) and older adults (in red) for electrode F3. (B) Scalp distributions per AoA showing the z-scores of the older adults compared to the younger adults.

The cluster-based permutation analysis targeting lexeme retrieval revealed no difference between the early and late AoA conditions in the older adults; thus, the scalp distributions of the age groups could not be compared on these conditions. The age groups were compared on the early AoA and the moderate AoA conditions. A time window from 175 to 225 ms after stimulus presentation in the younger adults was compared to time windows from 430 to 420 ms, from 210 to 195 ms, and from 165 to 140 ms before response onset in the older adults. Based on the z-scores of the electrodes, no differences in scalp distributions were found between the older and the younger adults for the early AoA (M = 0.15, SD = 0.26, range = −0.64 to 0.64) and the moderate AoA conditions (M = 0.29, SD = 0.33, range = −0.64 to 0.89). This is shown in Figure 11B.

The cluster-based permutation analysis for phonological encoding showed differences between older and younger adults for non-words consisting of five phonemes in a time window from 350 to 415 ms after stimulus presentation ( p = 0.015). Also, for the non-words consisting of six phonemes, a difference between both age groups was found from 390 to 425 ms after stimulus presentation ( p = 0.014). Both time windows were identified for phonological encoding in the young adults. The differences were most pronounced in bilateral posterior regions, as shown in Figure 12A . Figure 12A also shows the waveforms of the grand averages of the younger and the older adults. In the time windows identified for the older adults, no differences between the groups were found. This result was also the case for the response-locked time windows identified for phonological encoding in the younger adults.


Figure 12. (A) Left: Clusters related to the difference between younger and older adults identified in the stimulus-locked analysis for a non-word length of five (top) and six (bottom) phonemes in the task targeting phonological encoding. Electrodes included in the clusters are marked in red. Right: Waveforms of the grand averages for the younger (in blue) and older adults (in red) for electrode P4. (B) Scalp distributions per non-word length in phonemes showing the z-scores of the older adults compared to the younger adults.

For the older adults, no difference was found between non-words composed of four and five phonemes in the cluster-based analysis targeting phonological encoding, so the age groups could not be compared on these conditions. The conditions with four and six phonemes were included in the scalp distribution analysis. Time windows from 390 to 425 ms after stimulus presentation and from 330 to 320 ms before response onset in the younger adults were compared to time windows from 105 to 135 ms and from 280 to 295 ms after stimulus presentation in the older adults. The z-scores revealed no differences in scalp distributions between the older and the younger adults for the four-phoneme condition (M = −0.24, SD = 0.20, range = −0.74 to 0.12) and the six-phoneme condition (M = −0.21, SD = 0.20, range = −0.74 to 0.11). The scalp distributions are shown in Figure 12B.

For phonetic encoding, the cluster-based permutation analyses showed a difference between the older and the younger adults for moderate frequency non-words from 280 to 375 ms after stimulus presentation ( p = 0.007). This range corresponds to the time window identified for phonetic encoding in the older adults. The groups did not differ in the time window for the younger adults. For low-frequency non-words, a difference between both groups was found from 280 to 455 ms after stimulus presentation ( p = 0.011). This time window corresponds to the time window identified for phonetic encoding in older adults and also includes the time window in which phonetic encoding was identified in younger adults. Both effects were most pronounced in bilateral posterior regions, as shown in Figure 13A . This figure also shows the waveforms of the grand averages for the younger and older adults. No differences between the groups were found in the response-locked time windows.


Figure 13. (A) Left: Clusters related to the difference between younger and older adults identified in the stimulus-locked analysis for a moderate (top) and high syllable frequency (bottom) in the reading task targeting phonetic encoding. Electrodes included in the clusters are marked in red. Right: Waveforms of the grand averages for the younger (in blue) and older adults (in red) for electrode P2. (B) Scalp distributions for high and moderate syllable frequency (top) and for high and low syllable frequency (bottom) showing the z-scores of the older adults compared to the younger adults.

For non-words with a high syllable frequency and a moderate syllable frequency, a time window from 410 to 440 ms after stimulus presentation in younger adults was compared to time windows from 280 to 300 ms and from 365 to 375 ms after stimulus presentation in older adults. Based on the z -scores, no differences in scalp distributions were found between the older and the younger adults for both high frequency ( M = −0.15, SD = 0.11, range = −0.33 to 0.10) and moderate frequency conditions ( M = −0.11, SD = 0.11, range = −0.36 to 0.12). Also, z -scores for non-words with a high syllable frequency and a low syllable frequency were computed to compare a time window from 385 to 440 ms after stimulus presentation in younger adults to time windows from 280 to 290 ms and from 420 to 455 ms after stimulus presentation and from 450 to 460 ms before response onset in older adults. For the high-frequency ( M = −0.15, SD = 0.12, range = −0.36 to 0.18) and the low-frequency conditions ( M = −0.11, SD = 0.14, range = −0.44 to 0.17), no differences in scalp distributions based on the z -scores were found between older and younger adults. The scalp distributions are shown in Figure 13B .

Discussion

The current study had two aims, which are addressed in this discussion. The first was to identify the speech production stages in a group of older adults and in a group of younger adults. The second was to test whether the stages change with age, either in their timing or in the neural configuration observed in the scalp distributions.

Identification of Speech Production Stages

To identify the stages of the speech production process, an EEG protocol was developed with three tasks tapping into four speech production stages. The manipulations used in the tasks to identify the stages affected the response times in both the older and the younger adults. In the lemma retrieval task, the cumulative semantic interference effect increased response times for items belonging to the same category when they were presented at the fifth ordinal position compared to the first ordinal position. In the lexeme retrieval task, longer response times were found for items with a later AoA compared to items with an earlier AoA. In the non-word reading task, non-words consisting of more phonemes (used to track phonological encoding) and non-words with a lower syllable frequency (used to tap into phonetic encoding) both increased response times. The cluster-based permutation analysis of the EEG data revealed that the manipulations used in the tasks of the protocol had an effect in particular time windows. The time windows in the younger adults will be discussed first, followed by the time windows in the older adults.

In the younger adults, the timing of the cumulative semantic interference effect was revealed from 100 to 265 ms after stimulus presentation and from 445 to 195 ms before response onset. Response-locked cumulative semantic interference effects have not been reported in previous EEG studies. The stimulus-locked timing, however, largely corresponded to the timing of this effect found by Maess et al. (2002), from 150 to 225 ms after stimulus presentation, and only partially overlapped with the timing found by Costa et al. (2009), from 200 to 380 ms after stimulus presentation. Like our materials, the items used by Maess et al. (2002) depicted mono- and disyllabic high-frequency words. The materials used by Costa et al. (2009) also included longer and less frequent words, which may explain the later latency of their cumulative semantic interference effect.

The timing of the AoA effect for the younger adults appeared from 100 to 300 ms after stimulus presentation. This corresponds to the timing of this effect from 120 to 350 ms after stimulus presentation found by Laganaro and Perret (2011). Also, the response-locked effect for the younger adults, from 475 to 330 ms before response onset, overlaps with previously reported time windows of this stage from 380 ms after stimulus presentation up to 200 ms (Laganaro et al., 2012) or up to 100 ms before response onset (Valente et al., 2014).

Non-word length in phonemes was found to have an effect from 350 to 425 ms after stimulus presentation and from 335 to 320 ms before response onset for the younger adults. No previous speech production studies using EEG have reported non-word length effects. Word length effects have been studied using picture-naming tasks, but no effects were identified (Valente et al., 2014; Hendrix et al., 2017). In our study, a length effect was identified with a non-word reading task. The input for phonological encoding of a word differs from the input for phonological encoding of a non-word, which may explain why the effect was found for non-words but not for words: the phonological encoding of a familiar lexeme likely requires less effort than the phonological encoding of an unfamiliar string of phonemes.

The syllable frequency effect in the non-word reading task was identified from 350 to 450 ms after stimulus presentation for the younger adults, and from 250 to 200 ms before response onset. Bürki et al. (2015), using a syllable frequency manipulation in a non-word reading task, identified this effect from 170 to 100 ms before response onset. That effect was later than the effect found in the current study, most likely because their task required participants to insert a phoneme into the non-word as they read it, which complicated the task.

The time windows described in the previous paragraphs correspond to the speech production stages identified by Levelt et al. (1999) and Indefrey (2011) . In the speech production model, lemma retrieval precedes lexeme retrieval. In the younger adults, the cumulative semantic interference effect and the AoA effect started at the same time in the stimulus-locked analysis, but the AoA effect lasted longer than the cumulative semantic interference effect. In the response-locked analysis, the cumulative semantic interference effect lasted longer than the AoA effect. The time window for lexeme retrieval started before and ended during the time window for lemma retrieval. In the lexeme retrieval task, lemma retrieval was not manipulated, and thus, lemma retrieval was less demanding (and, hence, faster) in the lexeme retrieval task than in the lemma retrieval task. Therefore, the time window for lexeme retrieval in the lexeme retrieval task may have started earlier than the time window for lemma retrieval in the lemma retrieval task.

Lexeme retrieval is followed by phonological encoding in the model. For picture naming, the lexical route is used, whereas for non-word reading, the sublexical route should be recruited. Thus, the timing of the lexeme retrieval stage in the picture-naming task and the timing of the phonological encoding stage in the non-word reading task cannot be compared using our method. Phonological encoding precedes phonetic encoding in the model. In the stimulus-locked analysis, the non-word length effect started at the same time as the syllable frequency effect, but the length effect ended earlier. In the response-locked analysis, the non-word length in phonemes effect preceded the syllable frequency effect. Thus, the protocol can be used to identify the stages using EEG in the younger adults.

In the older adults, the cumulative semantic interference effect was found from 540 to 450 ms before response onset. Since no response-locked cumulative semantic interference effects have been reported previously, the response-locked effect revealed in the older adults cannot be compared to other studies.

AoA effects have previously been identified in response-locked time windows until 200 ms ( Laganaro et al., 2012 ) or 100 ms before response onset ( Valente et al., 2014 ). These time windows overlap with the response-locked effects for the older adults from 430 to 140 ms before response onset.

The effect of non-word length in phonemes was identified from 100 to 135 ms and from 280 to 300 ms after stimulus presentation for the older adults. To our knowledge, this is the first EEG study to report effects of non-word length in number of phonemes.

The second effect tested in the non-word reading task was syllable frequency, which was identified from 280 to 455 ms after stimulus presentation. This effect was also found from 455 to 435 ms before response onset. The timing of these effects is earlier than the timing of the syllable frequency effect reported by Bürki et al. (2015). As noted above, their task was more demanding, which may explain these differences.

In the older adults, the response-locked cumulative semantic interference effect preceded the response-locked AoA effect. This corresponds to the speech production process described by Levelt et al. (1999) and Indefrey (2011), in which lemma retrieval precedes lexeme retrieval. In the older adults, the effect of non-word length in phonemes was identified before the syllable frequency effect, with an overlap of 20 ms in the stimulus-locked analysis. This finding is also in agreement with the model, because phonological encoding precedes phonetic encoding. Thus, the protocol can be used to identify the stages using EEG in the older adults as well.

Aging Effects on Speech Production Stages

The behavioral data showed that both the younger and the older adults performed at ceiling on every task. Thus, in contrast to the study by Connor et al. (2004), no reduced picture-naming accuracy was found for older adults. This can be explained by a major difference in the age range of the participants in the two studies: it was larger in the study by Connor et al. (2004; 30–94 years) than in the current study (17–65 years). A behavioral difference between the groups was found in the response times: the older adults responded later than the younger adults on every task. It was hypothesized that the later response times of the older adults should be reflected in the timing of the speech production stages in the EEG.

Differences in Timing Between Younger and Older Adults

Lemma retrieval requires semantic memory to activate the target lemma node along with its semantically related neighbors. These neighbors are inhibited to select the target lemma. Since both semantic memory ( Cardenas et al., 2011 ; Harada et al., 2013 ) and inhibition ( Harada et al., 2013 ) decline with aging, the duration of the lemma retrieval stage was expected to be increased in older adults. This hypothesis was not confirmed, because the lemma retrieval stage lasted 90 ms in the older adults, while in the younger adults, its duration was 165 ms in the stimulus-locked analysis and 250 ms in the response-locked analysis. However, all time windows of the effects that were found in the older adults were shorter than the time windows of the effects found in the younger adults. In older adults, neurons that fire together are possibly less synchronous in their timing, less aligned regarding their geometry, or the effect has a more variable latency ( Wlotko et al., 2010 ). Therefore, the time window in which all participants show an effect is shorter.

Since the duration of lemma retrieval was expected to be increased, the onset of the next stage, lexeme retrieval, was expected to be delayed in the older adults. This hypothesis was confirmed. The response-locked effect started 45 ms later for the older adults compared to the younger adults. Also, an increased duration of the lexeme retrieval stage was hypothesized, because of the tip-of-the-tongue phenomenon, which is observed more frequently in older adults ( Shafto et al., 2007 ). No increased duration was found, which again can be explained by the reduction in the effect caused by the effect’s variability within and between the older adults ( Wlotko et al., 2010 ).

The stages of the sublexical route were not expected to be delayed in older adults. There have been no previous studies on the effect of aging on phonological encoding. Also, older adults have not shown longer response times for producing alternating syllable strings, which require more effort during phonetic encoding, than for producing sequential syllable strings (Tremblay and Deschamps, 2016). However, both the effect of non-word length in phonemes related to phonological encoding and the syllable frequency effect targeting phonetic encoding started earlier in the older adults than in the younger adults. The difference in the onset of these stages between the groups is quite large; hence, it cannot be explained by the effect’s variability in older adults.

Neurophysiological Differences Between Younger and Older Adults

There were differences between the younger and the older adults regarding the time windows in which effects that were related to the stages were found. Results of the cluster-based permutation analyses showed that for every stage in at least one time window, differences between younger and older adults were found. In the time windows in which the younger adults showed a cumulative semantic interference effect, an AoA effect, or an effect of non-word length in number of phonemes, no such effect was observed in the older adults. This finding shows that the older adults had a different timing for the speech production stages than the younger adults. Despite partially overlapping time windows for the syllable frequency effect in the younger and older adults, a difference between both groups was found. The overlap in timing was possibly too short, so both groups differed during the majority of the time window, or the neural configuration of the syllable frequency effect differed between the groups. Except for the response-locked time windows identified using the cumulative semantic interference effect, differences between younger and older adults were generally identified in stimulus-locked time windows. When the stimulus is presented, the first process is the visual analysis of the picture or the non-word. This process is assumed to be identical in both age groups, because the efficiency of the visual network is not expected to change with age ( Geerligs et al., 2015 ). After that, higher cognitive function networks, such as CON and FPCN are involved in the speech production stages. A decrease in the local efficiency of these networks may alter their neural signature or change their timing, which is reflected in the EEG. Even though the older participants in the study by Geerligs et al. were, on average, almost a decade older than the older adults in our study, our older participants may have a mild decrease in local efficiency and modularity in the CON and the FPCN compared to the younger adults, because the decrease is not linear with age ( Geerligs et al., 2015 ).

An overview of the timing of the stages in the younger and older adults and the timing of significant differences between the two groups is provided in Figure 14 .


Figure 14. Timing of the stages in the model of spoken word and non-word production based on the results of the younger and the older adults and their differences.

Apart from the timing of the speech production stages, the neural configurations of the scalp distributions of the stages were compared between the older and the younger adults. It was hypothesized that the scalp distributions do not change with age, because the same groups of neurons are expected to be involved in the stages of speech production in neurologically healthy adults, regardless of their age. Even though the effects related to each stage were found in different time windows in the two groups, the scalp distributions did not differ between the older and younger adults. This was the case for each speech production stage. Therefore, it can be concluded that older adults used the same neuronal processes as younger adults during the speech production stages. This conclusion is also supported by our behavioral results. Like the younger adults, the older adults performed at ceiling on the tasks, and the response times showed that the manipulations used in the tasks had the same effects in older and younger adults. Thus, the same factors influenced the speech production stages in both age groups.

The question remains why the response times of the older adults were later than the response times of the younger adults, even though the timing of the effects used to target the speech production stages was not generally delayed in the older adults. In the lexical route, lexeme retrieval was found to be delayed in older compared to younger adults. Since both picture-naming tasks required lexeme retrieval, the delay before this stage may have resulted in longer response times on the lemma and lexeme retrieval tasks. This is in line with the findings in the study by Laganaro et al. (2012) revealing differences between slow and fast speakers before the time window in which the AoA effect was found.

Lexeme retrieval is not involved in non-word production. Therefore, delayed lexeme retrieval cannot explain the later response times of older adults on the non-word task, for which no delay was observed at the phonological and phonetic encoding stages. Perhaps older adults respond later because they are generally slower, as suggested by the Global Slowing Hypothesis (e.g., Brinley, 1965). However, this should have been reflected in the EEG as a longer duration and a later onset for every speech production stage, because neurophysiological measures are more sensitive than response time measures. Participants were asked to name the items as quickly and accurately as possible. The tasks were fairly easy, so the accuracy of all participants was at ceiling. While younger adults can respond quickly and accurately at the same time, older adults are known to focus on either speed or accuracy (Ratcliff et al., 2007). Perhaps the older adults in our study focused more on accuracy and, therefore, needed to collect more information before they were ready to respond (Rabbitt, 1979). In that case, the processes may not have been delayed in general; only the decision whether the response was accurate may have been delayed. Thus, after the speech production process had been planned up to its final stage, articulation, the older adults may have waited longer than the younger adults before responding. Such an effect is not visible in the EEG but is only reflected in longer response times. If older adults wait before responding, the response-locked effects should be identified earlier in the older adults than in the younger adults. This, indeed, was the case for the cumulative semantic interference effect and the syllable frequency effect, but not for the AoA effect. However, individual differences are known to modulate the time window of the AoA effect (Laganaro et al., 2012). A possible modulation of the AoA effect is supported by our response time data, in which the older adults showed a smaller AoA effect than the younger adults.

To conclude, the stages of the speech production process have been successfully identified in older and younger adults using the tasks of the protocol with EEG. The manipulations in the tasks had the same effect on the response time in both age groups; thus, the same factors influenced the speech production stages. Also, the scalp distributions related to the speech production stages did not differ between the older and the younger adults. This shows that the same neural processes are used during the speech production stages.

However, behaviorally, the comparison of the older and the younger adults showed that the older adults required longer response times on all tasks. Yet, the EEG results showed that the speech production stages do not generally start later or last longer in the older adults compared to the younger adults.

Limitations

The study is subject to two potential limitations. First, we included older adults of 40–65 years old, whereas it is common practice to compare younger adults (i.e., university students) to a group of elderly adults (usually over 70 years old). Thus, the age difference between the younger and older adults was smaller than in other studies comparing language production across age groups, and the aging effects found in the current study are therefore potentially not as large as when younger adults are compared to elderly adults. On the other hand, a comparison with individuals with aphasia is now possible: individuals with aphasia and without concomitant cognitive disorders are usually within the age range of our group of older adults. However, it would be very interesting to compare the performance of both age groups of the current study with healthy elderly adults and individuals with dementia, who are usually above 70 years old.

Second, the non-word reading skills of the two groups included in the present study were not assessed prior to the experiment. Reading was only assessed through self-report, which cannot detect potential variation in reading skills. This potential variation may have had an effect at the phonological and phonetic encoding stages. We do not think this caveat influenced the results, however, because all participants performed at ceiling on the non-word reading task.

Ethics Statement

This study was approved by the Research Ethics Committee of the Faculty of Arts of the University of Groningen.

Author Contributions

JH is working on this Ph.D. project, carried out the studies, and wrote the largest part of the text. RB is the promotor and PI of this project and wrote a large part of the manuscript. RJ is the daily supervisor of JH. PM initiated this project.

Funding

This research was supported by an Erasmus Mundus Joint Doctorate (EMJD) Fellowship for the “International Doctorate for Experimental Approaches to Language And Brain” (IDEALAB) of the University of Groningen (Netherlands), University of Newcastle (United Kingdom), University of Potsdam (Germany), University of Trento (Italy), and Macquarie University, Sydney (Australia), under Framework Partnership Agreement 2012-0025, specific grant agreement number 2015-1603/001-001-EMJD, awarded to JH by the European Commission. RB is partially supported by the Center for Language and Brain, National Research University Higher School of Economics, RF Government grant, agreement number 14.641.31.0004.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Footnotes

^1 In fact, two non-word tasks were administered: reading and repetition. Since reading is more closely related to object naming (a visually presented stimulus evoking a spoken output), the data of the repetition task will be ignored.

References

Baayen, H. R., Piepenbrock, R., and Gulikers, L. (1995). The CELEX Lexical Database. Linguistic Data Consortium. Philadelphia, PA: University of Pennsylvania.


Bastiaanse, R. (2010). Auditief Taalbegripsprogramma (ATP). Houten: Bohn Stafleu van Loghum.

Bastiaanse, R., Wieling, M., and Wolthuis, N. (2016). The role of frequency in the retrieval of nouns and verbs in aphasia. Aphasiology 30, 1221–1239. doi: 10.1080/02687038.2015.1100709


Boersma, P., and Weenink, D. (2018). Praat: Doing Phonetics by Computer. Available at: http://www.praat.org/

Brinley, J. F. (1965). “Cognitive sets, speed and accuracy of performance in the elderly,” in Behavior, Aging, and the Nervous System , eds A. T. Welford and J. E. Birren (Springfield, IL: Thomas), 114–149.

Brysbaert, M., Stevens, M., De Deyne, S., Voorspoels, W., and Storms, G. (2014). Norms of age of acquisition and concreteness for 30,000 Dutch words. Acta Psychol. 150, 80–84. doi: 10.1016/j.actpsy.2014.04.010


Bürki, A., Pellet-Cheneval, P., and Laganaro, M. (2015). Do speakers have access to a mental syllabary? ERP comparison of high frequency and novel syllable production. Brain Lang. 150, 90–102. doi: 10.1016/j.bandl.2015.08.006

Cardenas, V. A., Chao, L. L., Studholme, C., Yaffe, K., Miller, B. L., Madison, C., et al. (2011). Brain atrophy associated with baseline and longitudinal measures of cognition. Neurobiol. Aging 32, 572–580. doi: 10.1016/j.neurobiolaging.2009.04.011

Connor, L. T., Spiro, A., Obler, L. K., and Albert, M. L. (2004). Change in object naming ability during adulthood. J. Gerontol. Ser. B. 59, 203–209. doi: 10.1093/geronb/59.5.P203

Costa, A., Strijkers, K., Martin, C., and Thierry, G. (2009). The time course of word retrieval revealed by event-related brain potentials during overt speech. Proc. Natl. Acad. Sci. U.S.A. 106, 21442–21446. doi: 10.1073/pnas.0908921106

Delorme, A., and Makeig, S. (2004). EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J. Neurosci. Methods 134, 9–21. doi: 10.1016/j.jneumeth.2003.10.009

Dosenbach, N. U. F., Fair, D. A., Miezin, F. M., Cohen, A. L., Wenger, K. K., Dosenbach, R. A. T., et al. (2007). Distinct brain networks for adaptive and stable task control in humans. Proc. Natl. Acad. Sci. U.S.A. 104, 11073–11078. doi: 10.1073/pnas.0704320104

E-Prime 2.0 (2012). Psychology Software Tools. Sharpsburg, PA: Psychology Software Tools, Inc.

Fjell, A. M., and Walhovd, K. B. (2011). Structural brain changes in aging: courses, causes and cognitive consequences. Rev. Neurosci. 21:187.


Forstmann, B. U., Tittgemeyer, M., Wagenmakers, E.-J., Derrfuss, J., Imperati, D., and Brown, S. (2011). The speed–accuracy tradeoff in the elderly brain: a structural model-based approach. J. Neurosci. 31, 17242–17249. doi: 10.1523/JNEUROSCI.0309-11.2011

Freeman, S. H., Kandel, R., Cruz, L., Rozkalne, A., Newell, K., Frosch, M. P., et al. (2008). Preservation of neuronal number despite age-related cortical brain atrophy in elderly subjects without Alzheimer disease. J. Neuropathol. Exp. Neurol. 67, 1205–1212. doi: 10.1097/NEN.0b013e31818fc72f

Geerligs, L., Renken, R. J., Saliasi, E., Maurits, N. M., and Lorist, M. M. (2015). A brain-wide study of age-related changes in functional connectivity. Cereb. Cortex 25, 1987–1999. doi: 10.1093/cercor/bhu012

Harada, C. N., Natelson Love, M. C., and Triebel, K. L. (2013). Normal cognitive aging. Clin. Geriatr. Med. 29, 737–752. doi: 10.1016/j.cger.2013.07.002

Hendrix, P., Bolger, P., and Baayen, H. (2017). Distinct ERP signatures of word frequency, phrase frequency, and prototypicality in speech production. J. Exp. Psychol. 43, 128–173. doi: 10.1037/a0040332

Nederlandse Taalunie (2004). Corpus Gesproken Nederlands, Version 2.0. Leiden: TST-Centrale INL.

Howard, D., Nickels, L., Coltheart, M., and Cole-Virtue, J. (2006). Cumulative semantic inhibition in picture naming: experimental and computational studies. Cognition 100, 464–482. doi: 10.1016/j.cognition.2005.02.006

Indefrey, P. (2011). The spatial and temporal signatures of word production components: a critical update. Front. Psychol. 2:255. doi: 10.3389/fpsyg.2011.00255

Jasper, H. H. (1958). Report of the committee on methods of clinical examination in electroencephalography. Electroencephalogr. Clin. Neurophysiol. 10, 370–375. doi: 10.1016/0013-4694(58)90053-1

Laganaro, M., and Perret, C. (2011). Comparing electrophysiological correlates of word production in immediate and delayed naming through the analysis of word age of acquisition effects. Brain Topogr. 24, 19–29. doi: 10.1007/s10548-010-0162-x

Laganaro, M., Valente, A., and Perret, C. (2012). Time course of word production in fast and slow speakers: a high density ERP topographic study. Neuroimage 59, 3881–3888. doi: 10.1016/j.neuroimage.2011.10.082

Levelt, W. J., Roelofs, A., and Meyer, A. S. (1999). A theory of lexical access in speech production. Behav. Brain Sci. 22, 38–75.

Levelt, W. J. M., and Wheeldon, L. (1994). Do speakers have access to a mental syllabary? Cognition 50, 239–269. doi: 10.1016/0010-0277(94)90030-2

Maess, B., Friederici, A. D., Damian, M., Meyer, A. S., and Levelt, W. J. M. (2002). Semantic category interference in overt picture naming: sharpening current density localization by PCA. J. Cogn. Neurosci. 14, 455–462. doi: 10.1162/089892902317361967

Maris, E., and Oostenveld, R. (2007). Nonparametric statistical testing of EEG- and MEG-data. J. Neurosci. Methods 164, 177–190. doi: 10.1016/j.jneumeth.2007.03.024

Marner, L., Nyengaard, J. R., Tang, Y., and Pakkenberg, B. (2003). Marked loss of myelinated nerve fibers in the human brain with age. J. Comp. Neurol. 462, 144–152. doi: 10.1002/cne.10714

MATLAB (2015). MATLAB. Natick, MA: The MathWorks Inc.

Nieuwenhuis, S., Forstmann, B. U., and Wagenmakers, E.-J. (2011). Erroneous analyses of interactions in neuroscience: a problem of significance. Nat. Neurosci. 14, 1105–1107. doi: 10.1038/nn.2886

Oldfield, R. C. (1971). The assessment and analysis of handedness: the edinburgh inventory. Neuropsychologia 9, 97–113. doi: 10.1016/0028-3932(71)90067-4

Oostenveld, R., Fries, P., Maris, E., and Schoffelen, J.-M. (2011). FieldTrip: open source software for advanced analysis of MEG, EEG, and invasive electrophysiological data. Comput. Intell. Neurosci. 2011, 1–9. doi: 10.1155/2011/156869

R Core Team (2017). R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing.

Rabbitt, P. (1979). How old and young subjects monitor and control responses for accuracy and speed. Br. J. Psychol. 70, 305–311. doi: 10.1111/j.2044-8295.1979.tb01687.x

Ratcliff, R., Thapar, A., and McKoon, G. (2007). Application of the diffusion model to two-choice tasks for adults 75-90 years old. Psychol. Aging 22, 56–66. doi: 10.1037/0882-7974.22.1.56

Sahin, N. T., Pinker, S., Cash, S. S., Schomer, D., and Halgren, E. (2009). Sequential processing of lexical, grammatical, and phonological information within Broca’s area. Science 326, 445–449. doi: 10.1126/science.1174481

Shafto, M. A., Burke, D. M., Stamatakis, E. A., Tam, P. P., and Tyler, L. K. (2007). On the tip-of-the-tongue: neural correlates of increased word-finding failures in normal aging. J. Cogn. Neurosci. 19, 2060–2070. doi: 10.1162/jocn.2007.19.12.2060

Thatcher, R. W., Biver, C. J., and North, D. M. (2002). Z Score EEG Biofeedback: Technical Foundations. Seminole, FL: Applied Neuroscience, Inc.

Tremblay, P., and Deschamps, I. (2016). Structural brain aging and speech production: a surface-based brain morphometry study. Brain Struct. Funct. 221, 3275–3299. doi: 10.1007/s00429-015-1100-1

Valente, A., Bürki, A., and Laganaro, M. (2014). ERP correlates of word production predictors in picture naming: a trial by trial multiple regression analysis from stimulus onset to response. Front. Neurosci. 8:390. doi: 10.3389/fnins.2014.00390

Vincent, J. L., Kahn, I., Snyder, A. Z., Raichle, M. E., and Buckner, R. L. (2008). Evidence for a frontoparietal control system revealed by intrinsic functional connectivity. J. Neurophysiol. 100, 3328–3342. doi: 10.1152/jn.90355.2008

Wlotko, E. W., Lee, C. L., and Federmeier, K. D. (2010). Language of the aging brain: event-related potential studies of comprehension in older adults: language of the aging brain. Lang. Linguist. Compass 4, 623–638. doi: 10.1111/j.1749-818X.2010.00224.x

Zheng, F., Liu, Y., Yuan, Z., Gao, X., He, Y., Liu, X., et al. (2018). Age-related changes in cortical and subcortical structures of healthy adult brains: a surface-based morphometry study: age-related study in healthy adult brain structure. J. Magn. Reson. Imaging 49, 152–163. doi: 10.1002/jmri.26037


APPENDIX 1A EEG statistics for the younger adults.


APPENDIX 1B EEG statistics for the older adults.


APPENDIX 1C EEG statistics for the comparison of the older and younger adults.

Keywords: speech production, aging, electroencephalography, word retrieval, articulation

Citation: den Hollander J, Jonkers R, Mariën P and Bastiaanse R (2019) Identifying the Speech Production Stages in Early and Late Adulthood by Using Electroencephalography. Front. Hum. Neurosci. 13:298. doi: 10.3389/fnhum.2019.00298

Received: 30 January 2019; Accepted: 12 August 2019; Published: 10 September 2019.


Copyright © 2019 den Hollander, Jonkers, Mariën and Bastiaanse. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Roelien Bastiaanse, [email protected]

† Peter Mariën passed away on November 01, 2017. He took the initiative for this project. Without him the current study could not have been performed.

This article is part of the Research Topic

Brain-Behaviour Interfaces in Linguistic Communication


Speech and Language Developmental Milestones

On this page:

  • How do speech and language develop?
  • What are the milestones for speech and language development?
  • What is the difference between a speech disorder and a language disorder?
  • What should I do if my child’s speech or language appears to be delayed?
  • What research is being conducted on developmental speech and language problems?
  • Your baby's hearing and communicative development checklist
  • Where can I find additional information about speech and language developmental milestones?

The first 3 years of life, when the brain is developing and maturing, is the most intensive period for acquiring speech and language skills. These skills develop best in a world that is rich with sounds, sights, and consistent exposure to the speech and language of others.

There appear to be critical periods for speech and language development in infants and young children when the brain is best able to absorb language. If these critical periods are allowed to pass without exposure to language, it will be more difficult to learn.

The first signs of communication occur when an infant learns that a cry will bring food, comfort, and companionship. Newborns also begin to recognize important sounds in their environment, such as the voice of their mother or primary caretaker. As they grow, babies begin to sort out the speech sounds that compose the words of their language. By 6 months of age, most babies recognize the basic sounds of their native language.

Children vary in their development of speech and language skills. However, they follow a natural progression or timetable for mastering the skills of language. A checklist of milestones for the normal development of speech and language skills in children from birth to 5 years of age is included below. These milestones help doctors and other health professionals determine if a child is on track or if he or she may need extra help. Sometimes a delay may be caused by hearing loss, while other times it may be due to a speech or language disorder.

Children who have trouble understanding what others say (receptive language) or difficulty sharing their thoughts (expressive language) may have a language disorder. Developmental language disorder  (DLD) is a language disorder that delays the mastery of language skills. Some children with DLD may not begin to talk until their third or fourth year.

Children who have trouble producing speech sounds correctly or who hesitate or stutter when talking may have a speech disorder. Apraxia of speech is a speech disorder that makes it difficult to put sounds and syllables together in the correct order to form words.

Talk to your child’s doctor if you have any concerns. Your doctor may refer you to a speech-language pathologist, who is a health professional trained to evaluate and treat people with speech or language disorders. The speech-language pathologist will talk to you about your child’s communication and general development. He or she will also use special spoken tests to evaluate your child. A hearing test is often included in the evaluation because a hearing problem can affect speech and language development. Depending on the result of the evaluation, the speech-language pathologist may suggest activities you can do at home to stimulate your child’s development. They might also recommend group or individual therapy or suggest further evaluation by an audiologist (a health care professional trained to identify and measure hearing loss), or a developmental psychologist (a health care professional with special expertise in the psychological development of infants and children).

The National Institute on Deafness and Other Communication Disorders (NIDCD) sponsors a broad range of research to better understand the development of speech and language disorders, improve diagnostic capabilities, and fine-tune more effective treatments. An ongoing area of study is the search for better ways to diagnose and differentiate among the various types of speech delay. A large study following approximately 4,000 children is gathering data as the children grow to establish reliable signs and symptoms for specific speech disorders, which can then be used to develop accurate diagnostic tests. Additional genetic studies are looking for matches between different genetic variations and specific speech deficits.

Researchers sponsored by the NIDCD have discovered one genetic variant, in particular, that is linked to developmental language disorder (DLD), a disorder that delays children’s use of words and slows their mastery of language skills throughout their school years. The finding is the first to tie the presence of a distinct genetic mutation to any kind of inherited language impairment. Further research is exploring the role this genetic variant may also play in dyslexia, autism, and speech-sound disorders.

A long-term study looking at how deafness impacts the brain is exploring how the brain “rewires” itself to accommodate deafness. So far, the research has shown that adults who are deaf react faster and more accurately than hearing adults when they observe objects in motion. This ongoing research continues to explore the concept of “brain plasticity”—the ways in which the brain is influenced by health conditions or life experiences—and how it can be used to develop learning strategies that encourage healthy language and speech development in early childhood.

A recent workshop convened by the NIDCD drew together a group of experts to explore issues related to a subgroup of children with autism spectrum disorders who do not have functional verbal language by the age of 5. Because these children are so different from one another, with no set of defining characteristics or patterns of cognitive strengths or weaknesses, development of standard assessment tests or effective treatments has been difficult. The workshop featured a series of presentations to familiarize participants with the challenges facing these children and helped them to identify a number of research gaps and opportunities that could be addressed in future research studies.

What are voice, speech, and language?

Voice, speech, and language are the tools we use to communicate with each other.

Voice is the sound we make as air from our lungs is pushed between vocal folds in our larynx, causing them to vibrate.

Speech is talking, which is one way to express language. It involves the precisely coordinated muscle actions of the tongue, lips, jaw, and vocal tract to produce the recognizable sounds that make up language.

Language is a set of shared rules that allow people to express their ideas in a meaningful way. Language may be expressed verbally or by writing, signing, or making other gestures, such as eye blinking or mouth movements.

Your baby’s hearing and communicative development checklist

The checklist covers the following age ranges: birth to 3 months, 4 to 6 months, 7 months to 1 year, 1 to 2 years, 2 to 3 years, 3 to 4 years, and 4 to 5 years.

This checklist is based upon How Does Your Child Hear and Talk?, courtesy of the American Speech–Language–Hearing Association.

The NIDCD maintains a directory of organizations that provide information on the normal and disordered processes of hearing, balance, taste, smell, voice, speech, and language.

Use the following keywords to help you find organizations that can answer questions and provide information on speech and language development:

  • Early identification of hearing loss in children
  • Speech-language pathologists

For more information, contact us at:

NIDCD Information Clearinghouse
1 Communication Avenue
Bethesda, MD 20892-3456
Toll-free voice: (800) 241-1044
Toll-free TTY: (800) 241-1055
Email: [email protected]

NIH Publication No. 00-4781 September 2010



9.1 Evidence for Speech Production

Dinesh Ramoo

The evidence used by psycholinguists to understand speech production is varied and interesting. It includes speech errors, reaction time experiments, neuroimaging, computational modelling, and the analysis of patients with language disorders. Until recently, the most prominent source of evidence for understanding how we speak came from speech errors. These are spontaneous mistakes we sometimes make in casual speech. Ordinary speech is far from perfect, and we often notice how we slip up. These slips of the tongue can be transcribed and analyzed for broad patterns. The most common method is to collect a large corpus of speech errors by recording all the errors one comes across in daily life.

Perhaps the most famous example of this type of analysis is what is termed the ‘Freudian slip.’ Freud (1901/1975) proposed that slips of the tongue were a way to understand repressed thoughts. According to his theories about the subconscious, certain thoughts may be too uncomfortable to be processed by the conscious mind and can be repressed. However, sometimes these unconscious thoughts may surface in dreams and slips of the tongue. Even before Freud, Meringer and Mayer (1895) analysed slips of the tongue (although not in terms of psychoanalysis).

Speech errors can be categorized into a number of subsets in terms of the linguistic units or mechanisms involved. Linguistic units involved in speech errors could be phonemes, syllables, morphemes, words or phrases. The mechanisms of the errors can involve the deletion, substitution, insertion, or blending of these units in some way. Fromkin (1971; 1973) argued that the fact that these errors involve some definable linguistic unit established their mental existence at some level in speech production. We will consider these in more detail in discussing the various stages of speech production.
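To make the unit-by-mechanism classification concrete, here is a minimal sketch of how an entry in a speech-error corpus might be coded. The scheme, the field names, and the example error are purely illustrative and do not reproduce any published coding system.

from dataclasses import dataclass
from enum import Enum

# Hypothetical coding scheme for one speech-error corpus entry, following the
# unit-by-mechanism classification described in the text.

class Unit(Enum):
    PHONEME = "phoneme"
    SYLLABLE = "syllable"
    MORPHEME = "morpheme"
    WORD = "word"
    PHRASE = "phrase"

class Mechanism(Enum):
    DELETION = "deletion"
    SUBSTITUTION = "substitution"
    INSERTION = "insertion"
    BLEND = "blend"

@dataclass
class SpeechError:
    target: str        # what the speaker intended to say
    produced: str      # what was actually said
    unit: Unit         # linguistic unit involved
    mechanism: Mechanism

# A hypothetical error in which two word-onset phonemes trade places;
# here it is coded simply as a phoneme-level substitution.
error = SpeechError(target="left hemisphere",
                    produced="heft lemisphere",
                    unit=Unit.PHONEME,
                    mechanism=Mechanism.SUBSTITUTION)
print(error)

A corpus coded this way can then be tallied by unit and mechanism to reveal the broad patterns mentioned above.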

Speech error: An error in the production of speech.

Freudian slip: An unintentional speech error hypothesized by Sigmund Freud as indicating subconscious feelings.

9.1 Evidence for Speech Production Copyright © 2021 by Dinesh Ramoo is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.



9.3 Speech Production Models

The Dell Model

Speech error analysis has been used as the basis for the model developed by Dell (1986, 1988). Dell’s spreading activation model (as seen in Figure 9.3) has features that are informed by the nature of speech errors that respect syllable position constraints. This is based on the observation that when segmental speech errors occur, they usually involve exchanges between onsets, peaks, or codas but rarely between different syllable positions. Dell (1986) states that word-forms are represented in a lexical network composed of nodes that represent morphemes, segments, and features. These nodes are connected by weighted bidirectional vertices.

Figure 9.3 A depiction of Dell’s spreading activation model, composed of nodes illustrating the morphemes, segments, and features in a lexical network.

As seen in Figure 9.3, when the morpheme node is activated, activation spreads through the lexical network, with each node transmitting a proportion of its activation to its direct neighbour(s). The morpheme is mapped onto its associated segments with the highest level of activation. The selected segments are encoded for particular syllable positions, which can then be slotted into a syllable frame. This means that the /p/ phoneme encoded for syllable onset is stored separately from the /p/ phoneme encoded for syllable coda position. This also accounts for the phonetic level: instead of having two separate levels for segments (phonological and phonetic), there is only one segmental level. At this level, the onset /p/ is stored with its characteristic aspiration as [pʰ] and the coda /p/ is stored in its unaspirated form [p]. Although this means that segments need to be stored twice, for onset and coda positions, it simplifies the syllabification process because the segments automatically slot into their respective positions. Dell’s model ensures the preservation of syllable constraints in that onset phonemes can only fit into onset slots in the syllable template (the same being true for peaks and codas). The model also has an implicit competition between phonemes that belong to the same syllable position, and this explains tongue-twisters such as the following:

  • “She sells sea shells by the seashore” ʃiː sɛlz siːʃɛlz baɪ ðiː siːʃɔː
  • “Betty Botter bought a bit of butter” bɛtiː bɒtə bɔːt ə bɪt ɒv bʌtə

In these examples, speakers are assumed to make errors because of competition between segments that share the same syllable position. As seen in Figure 9.3, Dell (1988) proposes a word-shape header node that contains the CV specifications for the word-form. This node activates the segment nodes one after the other. This is supported by the serial effects seen in implicit priming studies (Meyer, 1990; 1991) as well as some findings on the influence of phonological similarity on semantic substitution errors (Dell & Reich, 1981). For example, the model assumes that semantic errors (errors based on shared meaning) arise in lemma nodes. The word cat shares more segments with a target such as mat (/æ/ in the nucleus and /t/ in the coda) than with sap (only /æ/ in the nucleus). Therefore, the lemma node of mat will have a higher activation level than the one for sap, creating the opportunity for a substitution error. In addition, the feedback from morpheme nodes leads to a bias towards producing word rather than non-word errors. The model also takes into account the effect of speech rate on error probability (Dell, 1986) and the frequency distribution of anticipation, perseveration, and transposition errors (Nooteboom, 1969). The model accounts for differences between various error types by having an in-built bias for anticipation. Activation spreads through time; therefore, upcoming words receive activation (at a lower level than the current target). Speech rate also has an influence on errors because higher speech rates may leave nodes without enough time to reach a specified level of activation (leading to more errors).
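A toy sketch may help make the two core mechanisms just described concrete: segments coded for syllable position, and spreading activation that lets a competing segment capture a slot in the syllable frame. This is an illustration in the spirit of Dell (1986), not his published simulation; the network, weights, and the "noise" term are invented for the example.

# Toy spreading-activation sketch: segments are stored separately per syllable
# position, and the most active segment in each position fills the frame slot.
# Node names and weights are illustrative only.

network = {
    # morpheme node -> position-coded segment nodes with connection weights
    "cat": {("onset", "k"): 1.0, ("peak", "ae"): 1.0, ("coda", "t"): 1.0},
    "mat": {("onset", "m"): 1.0, ("peak", "ae"): 1.0, ("coda", "t"): 1.0},
}

def activate(morpheme, boost=1.0, noise=None):
    """Spread activation from a morpheme node to its segment nodes."""
    activations = {}
    for node, weight in network[morpheme].items():
        activations[node] = activations.get(node, 0.0) + boost * weight
    if noise:
        for node, extra in noise.items():
            activations[node] = activations.get(node, 0.0) + extra
    return activations

def encode(activations):
    """Pick the most active segment for each slot of a CVC syllable frame."""
    frame = {}
    for position in ("onset", "peak", "coda"):
        candidates = {seg: act for (pos, seg), act in activations.items()
                      if pos == position}
        frame[position] = max(candidates, key=candidates.get)
    return frame["onset"] + frame["peak"] + frame["coda"]

# Intended target "cat"; residual activation of mat's onset /m/ (e.g., from an
# upcoming word) can win the onset slot, yielding an onset substitution error.
print(encode(activate("cat")))                                    # "kaet"
print(encode(activate("cat", noise={("onset", "m"): 1.5})))       # "maet"

Because only segments coded for the same position compete for a slot, errors in this sketch always respect syllable position constraints, which is exactly the property of the error corpus that motivated the design.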

While the Dell model has a lot of support for its architecture, there have been criticisms. The main evidence used for the model, speech errors, has itself been questioned as a useful source of evidence for informing speech production models (Cutler, 1981). For instance, the listener might misinterpret the units involved in the error and may have a bias towards locating errors at the beginning of words (accounting for the large number of word-onset errors). Evidence for the CV header node is limited, as segment insertions usually create clusters when the target word also has a cluster, and CV similarities are not found for peaks.

The model also has an issue with storage and retrieval, as segments need to be stored for each syllable position. For example, the /l/ in English needs to be stored as [l] for syllable onset, [ɫ] for coda, and [ḷ] when it appears as a syllabic consonant in the peak (as in bottle). However, while this may seem redundant and inefficient, recent calculations of storage costs based on information theory by Ramoo and Olson (2021) suggest that the Dell model may actually be more storage efficient than previously thought. They suggest that one of the main inefficiencies of the model arises during syllabification across word and morpheme boundaries. During the production of connected speech or polymorphemic words, segments from one morpheme or word will move to another (Chomsky & Halle, 1968; Selkirk, 1984; Levelt, 1989). For example, when we say “walk away” /wɔk.ə.weɪ/, we produce [wɔ.kə.weɪ], where the /k/ moves from coda to onset in the next syllable. As the Dell model codes segments for syllable position, it may not be possible for such segments to move from coda to onset position during resyllabification. These and other limitations have led researchers such as Levelt (1989) and his colleagues (Meyer, 1992; Roelofs, 2000) to propose a new model based on reaction time experiments.

The Levelt, Roelofs, and Meyer (LRM) Model

The Levelt, Roelofs, and Meyer (LRM) model is one of the most popular models of speech production in psycholinguistics. It is also one of the most comprehensive, in that it takes into account all stages from conceptualization to articulation (Levelt et al., 1999). The model is based on reaction time data from naming experiments and is a top-down model where information flows from more abstract levels to more concrete stages. The Word-form Encoding by Activation and VERification (WEAVER) model is the computational implementation of the LRM model developed by Roelofs (1992, 1996, 1997a, 1997b, 1998, 1999). It is a spreading activation model inspired by Dell’s (1986) ideas about word-form encoding. It accounts for the syllable frequency effect and ambiguous syllable priming data (although the computational implementation has been more successful in illustrating syllable frequency effects than priming effects).

Figure 9.4 An illustration of the Levelt, Roelofs, and Meyer model. It shows the lexical level, the lemma level, and the lexeme level within the upper, “lexicon” portion of the diagram, with the syllabary and articulatory buffer contained below under “post-lexical”.

As we can see in Figure 9.4, the lemma node is connected to segment nodes. These vertices are specified for serial position, and the segments are not coded for syllable position. Indeed, the only syllabic information stored in this model is a set of syllable templates that indicate the stress pattern of each word (which syllable in the word is stressed and which is not). These syllabic templates are used during speech production to syllabify the segments using the principle of onset-maximization (as many segments as can legally form a syllable onset in the language are put into the onset, and the leftover segments go into the coda of the preceding syllable). This kind of syllabification during production accounts for resyllabification (which is a problem for the Dell model). The model also has a mental syllabary, which is hypothesized to contain the articulatory programs that are used to plan articulation.
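The onset-maximization procedure described above can be sketched in a few lines. The fragment below is illustrative only: the inventory of legal onsets is a tiny, made-up subset and the transcriptions are simplified, but it shows how post-lexical syllabification naturally handles resyllabification (e.g., "walk away").

# Minimal onset-maximization syllabifier, in the spirit of the post-lexical
# syllabification assumed in the LRM model. The onset inventory below is a
# tiny illustrative fragment, not a description of English phonotactics.

LEGAL_ONSETS = {"", "w", "k", "p", "pr", "t", "tr", "s", "st", "str"}
VOWELS = {"a", "e", "i", "o", "u", "@"}  # "@" stands in for schwa

def syllabify(segments):
    """Group a flat list of segments into syllables, maximizing onsets."""
    peaks = [i for i, seg in enumerate(segments) if seg in VOWELS]
    syllables = []
    start = 0
    for n, peak in enumerate(peaks):
        if n + 1 < len(peaks):
            consonants = segments[peak + 1:peaks[n + 1]]
            # Give the next syllable the longest legal onset; whatever is
            # left over becomes the current syllable's coda.
            split = 0
            for k in range(len(consonants) + 1):
                if "".join(consonants[k:]) in LEGAL_ONSETS:
                    split = k
                    break
            end = peak + 1 + split
        else:
            end = len(segments)
        syllables.append("".join(segments[start:end]))
        start = end
    return syllables

# "walk away" /w o k @ w e/ resyllabifies as [wo.k@.we]:
# the /k/ moves from the coda of "walk" to the onset of the next syllable.
print(syllabify(list("wok@we")))   # ['wo', 'k@', 'we']
print(syllabify(list("print")))    # ['print']

Because segments carry no stored position, nothing blocks the /k/ from surfacing as an onset here, which is the contrast with the Dell model that the text draws.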

The model is interesting in that syllabification only becomes relevant at the time of production. Phonemes are defined within the lexicon with regard to their serial position in the word or lemma. This allows for resyllabification across morpheme and word boundaries without any difficulties. Roelofs and Meyer (1998) investigated whether syllable structures are stored in the mental frame. They employed an implicit priming paradigm in which participants produced one word out of a set of words in rapid succession. The words were either homogeneous (all words had the same onsets) or heterogeneous. They found that priming depended on the targets having the same number of syllables and the same stress pattern, but not the same syllable structure. This led them to conclude that syllable structure is not a stored component of speech production but is computed during speech (Cholin et al., 2004). Costa and Sebastian-Gallés (1998) employed a picture-word interference paradigm to investigate this further. They asked participants to name a picture while a word was presented 150 ms later. They found that participants were faster to name a picture when it shared the same syllable structure with the word. These results challenge the view that syllable structure is absent as an abstract encoding within the lexicon. A newer proposal, the Lexicon with Syllable Structure (LEWISS) model, has challenged the LRM model’s assumptions on this point.

The Lexicon with Syllable Structure (LEWISS) Model

Proposed by Romani et al. (2011), the Lexicon with Syllable Structure (LEWISS) model explores the possibility of stored syllable structure in phonological encoding. As seen in Figure 9.5, the organisation of segments in this model is based on a syllable structure framework (similar to proposals by Selkirk, 1982; Cairns & Feinstein, 1982). However, unlike in the Dell model, the segments are not coded for syllable position. The syllable structural hierarchy is composed of syllable constituent nodes (onset, peak, and coda), with the vertices having different weights based on their relative positions. This means that the peak (the most important part of a syllable) has a very strongly weighted vertex compared to onsets and codas. Within onsets and codas, the core positions are more strongly weighted than satellite positions. This weighting is based on the fact that there are positional variations in speech errors: onsets and codas are more vulnerable to errors than vowels (peaks), and within onsets and codas, the satellite positions are more vulnerable than core positions. For example, in a word like print, the /r/ and /n/ in onset and coda satellite positions are more likely to be the subjects of errors than the /p/ and /t/, which occupy core positions. The main evidence for the LEWISS model comes from the speech errors of aphasic patients (Romani et al., 2011). It was observed that not only did these patients’ errors affect the different syllable positions to different degrees, the errors also preserved the syllable structure of the targets.

Figure 9.5 A diagram of the Lexicon with Syllable Structure model, which illustrates how the organization of segments can be based on syllable structure.

In terms of syllabification, the LEWISS model syllabifies at morpheme and word edges instead of having to syllabify the entire utterance each time it is produced. The evidence from speech errors supports the idea of syllable position constraints. While Romani et al. (2011) presented data from Italian, speech error analysis in Spanish also supports this view (Garcia-Albea et al., 1989). The evidence from Spanish is also interesting in that the errors are mostly word-medial rather than word-initial, as is the case for English (Shattuck-Hufnagel, 1987, 1992). Stemberger (1990) hypothesised that structural frames for CV structure encoding may be compatible with phonological systems proposed by Clements and Keyser (1983) as well as Goldsmith (1990). This was supported by speech errors from German and Swedish (Stemberger, 1984); however, such patterns were not observed in English. Costa and Sebastian-Gallés (1998) found that primed picture-naming was facilitated by primes that shared CV structure with the targets, and Sevald, Dell, and Cole (1995) found similar effects in repeated pronunciation tasks in English. Romani et al. (2011) brought these ideas to the fore with their analysis of speech errors made by Italian aphasic and apraxic patients, who performed repetition, reading, and picture-naming tasks. Both groups of patients produced errors that targeted vulnerable syllable positions such as onset and coda satellites, consistent with previous findings (Den Ouden, 2002). A large proportion of the errors also preserved syllable structure, as noted in previous work (Wilshire, 2002). Earlier, Romani and Calabrese (1996) had found that Italian patients replaced geminates with heterosyllabic clusters rather than homosyllabic clusters. For example, /ʤi.raf.fa/ became /ʤi.rar.fa/ rather than /ʤi.ra.fra/, preserving the original syllable structure of the target. While the Dell model’s position-coded segments can also explain such errors, they cannot account for errors in which a segment moves from one syllable position to another. More recent computational calculations by Ramoo and Olson (2021) found that resyllabification rates in English and Hindi, as well as storage costs predicted by information theory, do not rule out LEWISS on the grounds of storage or computational cost.
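The idea of weighted syllable positions can also be illustrated with a small sketch. The numerical weights below are invented; only their ordering (peak strongest, core positions stronger than satellites) reflects the LEWISS proposal, and the noisy "production" rule is merely a stand-in showing why weakly weighted satellite positions would be more error-prone.

import random

# Toy illustration of LEWISS-style weighted syllable positions for "print"
# /p r i n t/. The numbers are invented; only the ordering
# peak > core > satellite reflects the model.
WEIGHTS = {"peak": 1.0, "onset_core": 0.8, "coda_core": 0.8,
           "onset_satellite": 0.5, "coda_satellite": 0.5}

word = [("onset_core", "p"), ("onset_satellite", "r"),
        ("peak", "i"), ("coda_satellite", "n"), ("coda_core", "t")]

def noisy_produce(word, noise_sd=0.3, rng=random):
    """Drop any segment whose noisy, weighted activation falls below threshold."""
    output = []
    for position, segment in word:
        activation = WEIGHTS[position] + rng.gauss(0.0, noise_sd)
        if activation > 0.5:          # arbitrary production threshold
            output.append(segment)
    return "".join(output)

random.seed(1)
# Over many noisy productions, the satellites /r/ and /n/ are lost far more
# often than the core positions /p/ and /t/ or the peak /i/.
print([noisy_produce(word) for _ in range(10)])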

Language Production Models

The Dell Model

  • This is the non-verbal concept of the object that is elicited when we see a picture, read the word or hear it.
  • An abstract conceptual form of a word that has been mentally selected for utterance.
  • The meaningful unit (or units) of the lemma attached to specific segments.
  • Syllable nodes are created using the syllable template.
  • Segment nodes are specified for syllable position. So, [p onset] will be a separate segment from [p coda].
  • This node indicates that the word is singular.
  • This node specifies the CV structure and order of the word.
  • A syllable template is used in the syllabification process to indicate which segments can go where.
  • The segment category nodes are specified for syllable position. So, they only activate segments that are for onset, peak or coda syllable positions. Activation will be higher for the appropriate segment.

The LRM Model

  • Segment nodes are connected to the morpheme node specified for serial position.
  • The morpheme is connected to a syllable template that indicates how many syllables are contained within the phonological word. It also indicates which syllables are stressed and unstressed.
  • Post-lexical syllabification uses the syllable template to syllabify the phonemes. This is also the place where phonological rules can be implemented. For example, in English, unvoiced stops will be aspirated in the output.
  • Syllabified representations are used to access a Mental Syllabary of articulatory motor programs.
  • The final output.

LEWISS Model


  • The syllable structure nodes indicate the word’s syllable structure. They also specify syllable stress or tone. In addition, the connections are weighted: core positions and peak positions are more strongly weighted than satellite positions.
  • Segment nodes are connected to the morpheme node. They are also connected to a syllable structure that keeps them in place.
  • Post-lexical syllabification syllabifies the phonemes at morpheme and word boundaries. This is also the place where phonological rules can be implemented. For example, in English, unvoiced stops will be aspirated in the output.


Media Attributions

  • Figure 9.3 The Dell Model by Dinesh Ramoo, the author, is licensed under a  CC BY 4.0 licence .
  • Figure 9.4 The LRM Model by Dinesh Ramoo, the author, is licensed under a  CC BY 4.0 licence .
  • Figure 9.5 The LEWISS Model by Dinesh Ramoo, the author, is licensed under a CC BY 4.0 licence.

Syllabification: The process of putting individual segments into syllables based on language-specific rules.

Resyllabification: The process by which segments that belong to one syllable move to another syllable during morphological changes and connected speech.

Syllable structure: The structure of the syllable in terms of onset, peak (or nucleus), and coda.

Psychology of Language Copyright © 2021 by Dinesh Ramoo is licensed under a Creative Commons Attribution 4.0 International License , except where otherwise noted.



Speech planning: How our brains prepare to speak

In a fascinating exploration of speech planning, research led by the NYU Grossman School of Medicine reveals how our brains prepare to speak before we actually verbalize our thoughts.

The study, conducted among individuals undergoing surgery for epilepsy treatment, is pivotal for advancing our understanding of speech production. It unveils significant insights into the brain's mechanisms behind speech planning.

The brain's blueprint for speech

The study delves into the roles of the inferior frontal gyrus and motor cortex - two regions situated in the outer layers of the cerebral cortex. These areas are pivotal for controlling the muscles involved in speech production, influencing our selection of words and sounds.

The researchers examined brain-mapping recordings from 16 patients, aged 14 to 43, at NYU Langone Health over the period from 2018 to 2021. These individuals were undergoing pre-surgical evaluations for epilepsy.

The process involved stimulating brain regions to pinpoint and conserve areas critical to speech while targeting seizure-inducing tissue for removal.

Importantly, a key innovation of this research lies in measuring the brief time intervals, less than two seconds, between brain stimulation and its impact on speech. This approach offers fresh perspectives on the cortex's role in speech planning.

Exploring speech production

The researchers have unveiled new insights into the organization of speech within the brain. They found that the delays before speech disruption, known as latencies, vary across different brain regions. This discovery highlights the brain's complex role in speech production and its elaborate mechanisms.

"Our study adds evidence for the role of the brain's motor cortex and inferior frontal gyrus in planning speech and determining what people are preparing to say, not just voicing words using the vocal cords or mouthing the words by moving the tongue and lips," stated Dr. Heather Kabakoff, a speech pathologist at NYU Langone.

Furthermore, Dr. Adeen Flinker, the study's senior investigator and a neuroscientist, explained: "Our results show that mapping out the millisecond time intervals, or latencies, between electrical stimulation in parts of the brain to the disruption or slurring of words and eventual inability to speak can be used to better understand how the human brain works and the roles played by different brain regions in human speech."

Dr. Flinker emphasized the clinical implications of these findings. Additionally, he suggested that this research could pave the way for improved surgical techniques to protect speech functions during brain surgeries.

The next frontier in speech and brain research

The team is now expanding their focus, setting their sights on broader horizons. Consequently, they aim to unravel the roles of other brain parts in speech and auditory processing.

The researchers are studying real-time brain corrections of speech errors to enhance our understanding of speech control and modification. This endeavor marks another step forward in decoding the complexities of human communication.

The process of speech planning

The process of speech planning in the brain is a complex, multi-step process involving various brain regions working in coordination. It includes several key stages:

Conceptualization

This is the initial stage where the intent or idea of what one wants to communicate is formed. It involves abstract thinking and decision-making processes in the prefrontal cortex, where the brain decides on the message it wants to convey.

Lexical selection

Once the concept is formed, the brain selects the appropriate words to express the idea. This involves the temporal lobe, particularly the left temporal lobe for most people, where language comprehension and vocabulary are managed.

Syntactic processing

After selecting the words, the brain organizes them into a grammatically correct structure. This involves Broca's area, located in the left frontal lobe, which is responsible for speech production and the grammatical aspects of language.

Phonological processing 

This stage involves planning the sounds that need to be produced to articulate the words. This involves the interaction between Wernicke's area, which is involved in language comprehension and the processing of phonological (sound) information, and Broca's area for the motor aspects of speech production.

Motor planning

Before speech occurs, the brain must plan and coordinate the specific movements of the mouth, tongue, vocal cords, and lungs that produce speech. 

This involves the motor cortex, which controls voluntary muscle movements, and the cerebellum, which coordinates the timing and precision of these movements.

Finally, the motor cortex sends signals through the nervous system to the speech organs, executing the planned movements and producing speech. This involves intricate coordination of muscles and breathing to articulate words and modulate voice.

Throughout this speech planning process, the brain also relies on feedback mechanisms involving auditory and somatosensory systems to monitor and adjust speech production in real-time. This ensures accuracy in articulation and intonation.
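Read as a pipeline, the stages above can be summarized schematically. The sketch below only fixes the ordering of the stages and the final feedback check; the function bodies are placeholders and make no claim about how any stage is actually computed in the brain.

# Schematic pipeline for the speech-planning stages described above. Stage
# names follow the article; the bodies are simple stand-ins, not mechanisms.

def conceptualize(intent):        # prefrontal cortex: decide on the message
    return {"message": intent}

def select_words(concept):        # temporal lobe: pick words for the idea
    return concept["message"].split()

def build_syntax(words):          # Broca's area: order words grammatically
    return " ".join(words)

def plan_phonology(sentence):     # Wernicke's/Broca's areas: plan the sounds
    return list(sentence)

def plan_motor(sounds):           # motor cortex + cerebellum: movement plan
    return [("articulate", s) for s in sounds]

def execute(plan):                # signals sent to the speech organs
    return "".join(sound for _, sound in plan)

def speak(intent):
    plan = plan_motor(plan_phonology(build_syntax(select_words(conceptualize(intent)))))
    produced = execute(plan)
    # Stand-in for the auditory/somatosensory feedback loop: compare what was
    # produced with what was intended; a mismatch would trigger a correction.
    assert produced == intent, "feedback loop would trigger a correction here"
    return produced

print(speak("the cat sat"))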

The exact neural pathways and interactions between these areas are still a subject of ongoing research, as the brain's processes for managing speech are incredibly complex and vary from person to person.

The study is published in the journal Brain .



Editorial: Models and Theories of Speech Production

Adamantios gafos.

1 Department of Linguistics and Excellence Area of Cognitive Sciences, University of Potsdam, Potsdam, Germany

Pascal van Lieshout

2 Department of Speech-Language Pathology, Oral Dynamics Laboratory, University of Toronto, Toronto, ON, Canada

Spoken language is conveyed via well-coordinated speech movements, which act as coherent units of control referred to as gestures. These gestures and their underlying movements show several distinctive properties in terms of lawful relations among the parameters of duration, relative timing, range of motion, target accuracy, and speed. However, currently, no existing theory successfully accounts for all properties of these movements. Even though models in speech motor control in the last 40 years have consistently taken inspiration from general movement science, some of the comparisons remain ill-informed. For example, our present knowledge on whether widely known principles that apply to limb movements (e.g., the speed-accuracy trade off known as Fitts' law) also hold true for speech movements is still very limited. An understanding of the principles that apply to speech movements is key to defining the somewhat elusive concept of speech motor skill and to assessing and interpreting different levels of that skill in populations with and without diagnosed speech disorders. The latter issue taps into fundamental debates about whether speech pathology assessment paradigms need to be restricted to control regimes that are specific to those underlying typical speech productions. Resolution of such debates crucially relies on our understanding of the nature of speech processes and the underlying control units.

Unlike movements in locomotion or oculomotor function, speech movements when combined into gestures are not mere physical instantiations of organs moving in space and time but, also, have intrinsic symbolic function. Language-particular systems, or phonological grammars, are involved in the patterning of these gestures. Grammar constraints regulate the permissible symbolic combinations as evidenced via eliciting judgments on whether any given sequence is well-formed in any particular language (the same sequence can be acceptable in one, but not the other language). In what ways these constraints shape speech gestures and how these fit with existing general principles of motor control is, also, not clearly understood.

Furthermore, speech gestures are parts of words, and thus one window into understanding the nature of the speech production[1] system is to observe speech movements as parts of words or larger chunks of speech such as phrases or sentences. The intention to produce a lexical item involves activating sequences of gestures that are part of the lexical item. The regulation in time of the units in such sequences raises major questions for speech motor control theories (but also for theories of cognition and sequential action in general). Major challenges arise from the inter-dependence among different time scales related to gestural planning, movement execution, and coordination within and across domains of individual lexical items. How these different time scales interact and how their interaction affects the observed movement properties is for the most part still unknown.

In this special issue, we present a variety of theoretical and empirical contributions which explore the nature of the dynamics of speech motor control. For practical purposes, we separate these contributions in two major themes:

  • 1) Models and theories of speech production.
  • 2) Applications.

Following is a short description of each paper as listed under these themes.

  • 1) Models and theories of speech production

The speech signal is simultaneously expressed in two information-encoding systems: articulation and acoustics. Goldstein's contribution addresses the relation between representations in these two parallel manifestations of speech while focusing not on static properties but on patterns of change over time (temporal co-modulation) in these two channels. To do so, Goldstein quantifies the relation between rates of change in the parallel acoustic and articulatory representations of the same utterance, produced by various speakers, based on x-ray microbeam data. Analysis of this relation indicates that the two representations are correlated via a pulse-like modulation structure, with local correlations being stronger than global ones. This modulation seems linked to the fundamental unit of the syllable.

It is widely assumed that acoustic parameters for vowels are normally distributed, but it is rarely demonstrated that this might be the case. Whalen and Chen quantified the distributions of F1 and F2 values of /i/ and /o/ in the English words “heed,” “geek,” “ode”/“owed,” and “dote” produced by a single speaker on three different days. Analysis based on a high number of repetitions of these vowels in different consonantal contexts indicates that distributions are generally normal, which in turn suggests consistent vowel-specific targets across different contextual environments. The results add weight to the widely-held assumption that speech targets follow a normal distribution and the authors discuss the implications for theories of speech targets.

Turk and Shattuck-Hufnagel address the nature of timing in speech, with special attention given to movement endpoints, which as they argue relate to the goals of these movements. The argument is presented that these points require dedicated control regimes. Evidence for this argument is derived from work in both speech and non-speech motor control. It is also argued that in contrast to the Articulatory Phonology/Task Dynamics view, where gestural durations are determined by an intrinsic dynamics, duration must be an independently controlled variable in speech. A phonology-extrinsic component is thus proposed to be necessary and a call is made for developing and testing models of speech where a component of abstract, symbolic phonological representations is kept apart from the way(s) in which these representations are implemented in quantitative terms which include surface duration specifications and attendant timing mechanisms for achieving these.

Shaw and Chen investigated to what degree timing between gestures is stable across variations in the spatial positions of individual articulators, as predicted in Articulatory Phonology. Using Electromagnetic Articulography with a group of Mandarin speakers producing CV monosyllables, they found a correlation between the initial position of the tongue gesture for the vowel and C-V timing. In contrast to the original hypothesis, this indicates that inter-gestural timing is sensitive to the position of the articulators, suggesting a critical role for somatosensory feedback.

Roessig and Mücke study tonal and kinematic profiles of different degrees of prominence (unaccented, broad, narrow and contrastive focus) from 27 speakers of German. Parameters in both the tonal and kinematic dimensions are shown to vary systematically across degrees of prominence. A dynamical approach is put forward in modeling these findings. This approach embraces the multidimensionality of prosody while at the same time showing how both discrete and continuous modifications in focus marking can be expressed within one formal language. The model captures qualitatively the observed patterns in the data by tuning of an abstract control variable which shapes the attractor landscape over the parameter space of kinematic and tonal dimensions considered in this work.

Iskarous provides a computational approach to explain the nature of spatiotemporal particulation of the vocal tract, as evidenced in the production of speech gestures. Based on a set of reaction-diffusion equations with simultaneous Turing and Hopf patterns, the critical characteristics of speech gestures related to vocal tract constrictions can be replicated, supporting the notion that motor processes can be seen as the emergence of low-degree-of-freedom descriptions from high-degree-of-freedom systems.

Patri et al. address individual differences in responses to auditory or somatosensory perturbation in speech production. Two accounts are entertained. The first reduces individual differences to differences in acuity of the sensory specifications while the second leaves sensory specifications intact and, instead, modulates the sensitivity of match between motor commands and their auditory consequences. While simulation results show that both accounts lead to similar results, it is argued that maintaining intact sensory specifications is more flexible, enabling a more encompassing approach to speech variability where cognitive, attentional and other factors can modulate responses to perturbations.

One of the foundational ideas of phonology and phonetics is that produced and perceived utterances are decomposed into sequences of discrete units. However, evidence from development indicates that in child speech utterances are holistic rather than segmented. The contribution by Davis and Redford offers a theoretical demonstration along with attendant modeling that the posited units can emerge from a stage of speech where words or phrases start off as time-aligned motoric and perceptual trajectories. As words are added and repeatedly rehearsed by the learner, motoric trajectories begin to develop recurrent articulatory configurations which, when coupled with their corresponding perceptual representations, give rise to perceptual-motor units claimed to characterize mature speech production.

In their contribution, Kearney et al. present a simplified version of the DIVA model, focusing on three fitting parameters related to auditory feedback control, somatosensory feedback control, and feedforward control. The model is tested through computer simulations that identify optimal model fits to six existing sensorimotor adaptation datasets, showing excellent fits to real data across different types of perturbations and experimental paradigms.
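The three-parameter structure described here (gains for auditory feedback control, somatosensory feedback control, and feedforward control) can be illustrated with a toy adaptation loop. The sketch below is not the authors' model or the DIVA equations; the gains, units, perturbation, and update rule are invented purely to show how feedback-weighted errors can drive adaptation of a feedforward command.

# Toy sensorimotor-adaptation sketch with a feedforward command corrected by
# auditory and somatosensory feedback, each scaled by its own gain. All
# numbers are illustrative, not fitted parameters.

G_AUD, G_SOM, LEARN_RATE = 0.3, 0.2, 1.0

target = 1000.0        # intended auditory outcome (arbitrary units)
feedforward = 1000.0   # current feedforward command

for trial in range(6):
    produced = feedforward
    heard = produced + 100.0                             # +100 auditory perturbation
    aud_correction = G_AUD * (target - heard)            # auditory feedback term
    som_correction = G_SOM * (feedforward - produced)    # somatosensory term (zero here)
    # Online correction within the trial...
    corrected = produced + aud_correction + som_correction
    # ...and adaptation of the feedforward command across trials.
    feedforward += LEARN_RATE * (aud_correction + som_correction)
    print(trial, round(produced, 1), round(corrected, 1))

Run over trials, the produced value drifts downward, partially compensating for the perturbation, which is the qualitative signature of the adaptation datasets the model was fit to.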

An active area in phonological theory is the investigation of long-distance assimilation where features of a phoneme assimilate to features of another non-adjacent phoneme. Tilsen seeks to identify mechanisms for the emergence of such non-local assimilations in speech planning and production models. Two mechanisms are proposed. The first is one where a gesture is either anticipatorily selected in an earlier epoch or is not suppressed (after being selected) so that its influence extends to later epochs. The second is one where gestures which may be active in one epoch of a planning-level dynamics, even though not selected during execution, may still influence production in a different epoch. Evidence for these mechanisms is found in both speech and non-speech movement preparation paradigms. The existence of these two mechanisms is argued to account for the major dichotomy between assimilation phenomena that have been described as involving the extension of an assimilating property vs. those that cannot be so described.

Xu and Prom-on contrast two principles assumed to underlie the dynamics of movement control: economy of effort and maximum rate of information. They present data from speakers of American English on repetitive syllable sequences who were asked to imitate recordings of the same sequences that had been artificially accelerated and to produce meaningful sentences containing the same syllables at normal and fast speaking rates. The results show that the characteristics of the formant trajectories they analyzed fit best the notion of the maximum rate of information principle.

Kröger et al.'s contribution offers a demonstration that a learning model based on self-organizing maps can serve as a bridge between models of the mental lexicon and models of sensorimotor control, and that such a model can learn (from semantic, auditory, and somatosensory information) representational units akin to phonetic-phonological features. At a broad level, few efforts have been made to bridge theory and modeling of the lexicon and motor control. The proposed model aims at addressing that gap and makes predictions about the specificity and rate of growth of such representational features under different training conditions (auditory only vs. auditory and somatosensory training modes).

Parrell and Lammert develop a synthesis of the dynamic movement primitives model of motor control (Schaal et al., 2007 ; Ijspeert et al., 2013 ) with the task dynamics model of speech production (Saltzman and Munhall, 1989 ). A key element in achieving this synthesis is the incorporation of a learnable forcing term into the task dynamics' point-attractor system. The presence of such a tunable term endows task dynamics with flexibility in movement trajectories. The proposed synthesis also establishes a link to optimization approaches to motor control where the forcing term can be seen to minimize a cost function over the timespan of the movement under consideration (e.g., minimizing total energy expended during a reaching movement). The dynamics of the proposed synthesis model are explicitly described and their effects are demonstrated in the form of proof of concept simulations showing the consequences of perturbations on jaw movement trajectories.
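In task dynamics, each gesture is modeled as a damped mass-spring (point-attractor) system; the synthesis described here adds a learnable forcing term to that system. The minimal numerical sketch below uses arbitrary parameters and a made-up forcing pulse simply to show how a forcing term reshapes the trajectory while the attractor still brings the articulator to its target; it is not the authors' implementation.

# Point-attractor (critically damped mass-spring) dynamics for one gesture,
# plus an additive forcing term. Parameters and the forcing pulse are
# arbitrary illustrations.
import math

def simulate(target, x0, forcing=lambda t: 0.0, k=100.0, dt=0.001, T=1.0):
    """Integrate  x'' = k (target - x) - b x' + f(t)  with critical damping."""
    b = 2.0 * math.sqrt(k)          # critical damping: no overshoot
    x, v = x0, 0.0
    trajectory = []
    t = 0.0
    while t < T:
        a = k * (target - x) - b * v + forcing(t)
        v += a * dt
        x += v * dt
        trajectory.append(x)
        t += dt
    return trajectory

# Plain point attractor: smooth, monotonic approach to the target.
plain = simulate(target=1.0, x0=0.0)

# Same attractor with a transient forcing pulse early in the movement:
# the trajectory is reshaped, but it still settles at the target.
forced = simulate(target=1.0, x0=0.0,
                  forcing=lambda t: 40.0 * math.exp(-((t - 0.1) / 0.05) ** 2))

print(round(plain[-1], 3), round(forced[-1], 3))   # both end near 1.0

The tunable forcing term is what gives the synthesized model its extra flexibility in movement shape, and, as the contribution notes, it is the natural place to connect with optimization-based accounts of motor control.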

  • 2) Applications

Noiray et al. present a study in which they examined whether phonemic awareness correlates with coarticulation degree, commonly used as a metric for estimating the size of children's production units. A speech production task was designed to test for developmental differences in intra-syllabic coarticulation degree in 41 German children from 4 to 7 years of age, using ultrasound imaging. The results suggest that the process of developing spoken language fluency involves dynamical interactions between cognitive and speech motor domains.

Tiede et al. describe a study in which they tracked movements of the head and speech articulators during an alternating word pair production task driven by an accelerating rate metronome. The results show that as production effort increased, so did speaker head nodding, and that nodding increased abruptly following errors. The strongest entrainment between head and articulators was observed at the fastest rate under coda alternation conditions.

Namasivayam et al. present an Articulatory Phonology approach to understanding the nature of Speech Sound Disorders (SSDs) in children, aiming to reconcile the traditional phonetics-phonology dichotomy with the concept of interconnectedness between these levels. They present evidence supporting the notion of articulatory gestures at the level of speech production and show how these are reflected in control processes in the brain. They also provide an overview of how an articulatory, gesture-based approach can account for articulatory behaviors in typical and disordered speech production, concluding that the Articulatory Phonology approach offers a productive strategy for further research in this area.

Heyne et al. address the relation between speech and another oral motor skill, trombone playing. Using ultrasound, they recorded midsagittal tongue shapes from New Zealand English-speaking and Tongan-speaking trombone players while these speakers/players produced vowels in their native languages and sustained notes at different pitches and intensities; tongue shapes for the two language groups were estimated via fits with generalized additive mixed models. The results indicate that, while airflow production and the requisite acoustics largely constrain vocal tract configuration during trombone playing, a secondary influence from speech motor configurations can be discerned: the two groups tended to use different tongue configurations, resembling distinct vocalic monophthongs in their respective languages.
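
To give a rough sense of what the group comparison involves (a much-simplified stand-in for the generalized additive mixed models, using ordinary smoothing splines on invented contours), one can fit a smooth curve to each group's pooled tongue contours and compare the estimated curves over the same front-to-back positions:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Hypothetical midsagittal tongue contours: x = front-to-back position,
# y = tongue height (arbitrary units), pooled over many ultrasound frames.
rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 1, 300))
group_a = 0.4 * np.sin(np.pi * x) + 0.05 * rng.standard_normal(x.size)        # e.g., one language group
group_b = 0.4 * np.sin(np.pi * x) ** 2 + 0.05 * rng.standard_normal(x.size)   # e.g., the other group

# Smoothing splines as a (much simplified) stand-in for the GAMM smooths:
# one smooth curve per language group, evaluated on a common grid.
grid = np.linspace(0, 1, 100)
fit_a = UnivariateSpline(x, group_a, s=len(x) * 0.05 ** 2)(grid)
fit_b = UnivariateSpline(x, group_b, s=len(x) * 0.05 ** 2)(grid)
print("max group difference in estimated tongue height:", np.abs(fit_a - fit_b).max())
```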

The papers assembled for this Research Topic attest to the advantages of combining theoretical and empirical approaches to the study of speech production. They also attest to the value of formal modeling in addressing long-standing issues in speech development and in the relationship between motor control and phonological patterns; to the importance of somatosensory and auditory feedback in planning and monitoring speech production, and of integrating speech production models with other aspects of cognition; and, finally, to the potential of theoretical models to inform applications to disordered speech and to motor skills in other oral activities, such as playing musical instruments.

Author Contributions

All authors listed have made equal contributions to the work and approved it for publication.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

AG's work has been supported by the European Research Council (AdG 249440) and the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - Project ID 317633480 - SFB 1287, Project C04.

1 One of our reviewers notes that in the field of psycholinguistics the term speech production is used more broadly than in the contributions to this Research Topic, and points out the need, aptly stated, "to bridge the gap between psycholinguistically informed phonetics and phonetically informed psycholinguistics." We fully concur and look forward to future research efforts, and perhaps Research Topics, devoted to such bridging. For a recent special issue on psycholinguistic approaches to speech production, see Meyer et al. (2019); for a more focused review of the issues pertinent to "phonetic encoding" (a term in psycholinguistics roughly equivalent to our use of the term speech production in the present Research Topic), see Laganaro (2019).

  • Ijspeert, A. J., Nakanishi, J., Hoffmann, H., Pastor, P., and Schaal, S. (2013). Dynamical movement primitives: learning attractor models for motor behaviors. Neural Computation 25, 328–373. doi: 10.1162/NECO_a_00393
  • Laganaro, M. (2019). Phonetic encoding in utterance production: a review of open issues from 1989 to 2018. Language, Cognition and Neuroscience 34, 1193–1201. doi: 10.1080/23273798.2019.1599128
  • Meyer, A. S., Roelofs, A., and Brehm, L. (2019). Thirty years of speaking: an introduction to the Special Issue. Language, Cognition and Neuroscience 34, 1073–1084. doi: 10.1080/23273798.2019.1652763
  • Saltzman, E. L., and Munhall, K. G. (1989). A dynamical approach to gestural patterning in speech production. Ecological Psychology 1, 333–382. doi: 10.1207/s15326969eco0104_2
  • Schaal, S., Mohajerian, P., Ijspeert, A. J., Cisek, P., Drew, T., and Kalaska, J. F. (2007). Dynamics systems vs. optimal control: a unifying view. Progress in Brain Research 165, 425–445. doi: 10.1016/S0079-6123(06)65027-9
