Research Zeroes In on a Barrier to Reading (Plus, Tips for Teachers)

How much background knowledge is needed to understand a piece of text? New research appears to pinpoint the tipping point.

By now, you’ve probably heard of the baseball experiment. It’s several decades old but has experienced a resurgence in popularity since Natalie Wexler highlighted it in her best-selling book, The Knowledge Gap.

In the 1980s, researchers Donna Recht and Lauren Leslie asked middle school students to read a passage describing a baseball game, then reenact it with wooden figures on a miniature baseball field. They were surprised by the results: Even the best readers struggled to re-create the events described in the passage. 

“Prior knowledge creates a scaffolding for information in memory,” they explained after seeing the results. “Students with high reading ability but low knowledge of baseball were no more capable of recall or summarization than were students with low reading ability and low knowledge of baseball.”

That modest experiment kicked off 30 years of research into reading comprehension, and study after study confirmed Recht and Leslie’s findings: Without background knowledge, even skilled readers labor to make sense of a topic. But those studies left a lot of questions unanswered: How much background knowledge is needed for better comprehension? Is there a way to quantify and measure prior knowledge?

A 2019 study published in Psychological Science is finally shedding light on those mysteries. The researchers discovered a “knowledge threshold” for reading comprehension: When students were familiar with fewer than 59 percent of the terms in a topic, their ability to understand the text was “compromised.”

In the study, 3,534 high school students were presented with a list of 44 terms and asked to identify whether each was related to the topic of ecology. Researchers then analyzed the responses to generate a background-knowledge score for each student, representing his or her familiarity with the topic.

Without any interventions, students then read about ecosystems and took a test measuring how well they understood what they had read.

Students who scored less than 59 percent on the background-knowledge test also performed relatively poorly on the subsequent test of reading comprehension. But researchers noted a steep improvement in comprehension above the 59 percent threshold—suggesting both that a lack of background knowledge can be an obstacle to reading comprehension, and that there is a baseline of knowledge that rapidly accelerates comprehension.
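
The study’s design reduces to a simple score-and-threshold check, sketched below in Python. Everything here except the 59 percent cutoff — the example terms, the answer key, the scoring rule, and the function names — is an invented illustration, not the instrument the researchers used.

```python
# Hypothetical sketch of the screen-then-flag logic described above.
# Only the 59% threshold comes from the study; terms and scoring
# rule are illustrative assumptions.
KNOWLEDGE_THRESHOLD = 0.59

def background_knowledge_score(responses, answer_key):
    """Fraction of topic-relevance judgments that match the answer key."""
    correct = sum(1 for term, judged in responses.items()
                  if answer_key.get(term) == judged)
    return correct / len(answer_key)

def comprehension_at_risk(score, threshold=KNOWLEDGE_THRESHOLD):
    """Below the knowledge threshold, comprehension is likely compromised."""
    return score < threshold

# Example: 3 of 4 judgments match the key, so the score is 0.75.
key = {"biome": True, "guitar": False, "food web": True, "sonnet": False}
responses = {"biome": True, "guitar": False, "food web": False, "sonnet": False}
score = background_knowledge_score(responses, key)
```

A teacher-facing version of this idea would swap the invented term list for vocabulary drawn from the actual unit being taught.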

Why does background knowledge matter? Reading is more than just knowing the words on the page, the researchers point out. It’s also about making inferences about what’s left off the page—and the more background knowledge a reader has, the better able he or she is to make those inferences.

“Collectively, these results may help identify who is likely to have a problem comprehending information on a specific topic and, to some extent, what knowledge is likely required to comprehend information on that topic,” conclude Tenaha O'Reilly, the lead author of the study, and his colleagues.

5 Ways Teachers Can Build Background Knowledge 

Spending a few minutes making sure that students meet the knowledge threshold for a topic can yield outsized results. Here’s what teachers can do:

  • Mind the gap: You may be an expert in Civil War history, but be mindful that your students will bring a wide range of background knowledge to the topic. Similarly, take note of the cultural, social, economic, and racial diversity in your classroom. You may think it’s cool to teach physics using a trebuchet, but not all students have been exposed to the same ideas that you have.
  • Identify common terms in the topic. Ask yourself, “What are the main ideas in this topic? Can I connect what we’re learning to other big ideas for students?” If students are learning about earthquakes, for example, take a step back and look at what else they should know about—perhaps Pangaea, the ancient supercontinent, or what tectonic plates are. Understanding these concepts can anchor more complex ideas like P and S waves. And don’t forget to go over some broad-stroke ideas—such as history’s biggest earthquakes—so that students are more familiar with the topic.
  • Incorporate low-stakes quizzes. Before starting a lesson, use formative assessment strategies such as entry slips or participation cards to quickly identify gaps in knowledge.
  • Build concept maps. Consider leading students in the creation of visual models that map out a topic’s big ideas—and connect related ideas that can provide greater context and address knowledge gaps. Visual models provide another way for students to process and encode information, before they dive into reading.
  • Sequence and scaffold lessons. When introducing a new topic, try to connect it to previous lessons: Reactivating knowledge the students already possess will serve as a strong foundation for new lessons. Also, consider your sequencing carefully before you start the year to take maximum advantage of this effect.  

The Comprehension Problems of Children with Poor Reading Comprehension despite Adequate Decoding: A Meta-Analysis

The purpose of this meta-analysis was to examine the comprehension problems of children who have a specific reading comprehension deficit (SCD), which is characterized by poor reading comprehension despite adequate decoding. The meta-analysis included 86 studies of children with SCD who were assessed in reading comprehension and oral language (vocabulary, listening comprehension, storytelling ability, and semantic and syntactic knowledge). Results indicated that children with SCD had deficits in oral language ( d = −0.78, 95% CI [−0.89, −0.68]), but these deficits were not as severe as their deficit in reading comprehension ( d = −2.78, 95% CI [−3.01, −2.54]). When compared to reading comprehension age-matched normal readers, the oral language skills of the two groups were comparable ( d = 0.32, 95% CI [−0.49, 1.14]), which suggests that the oral language weaknesses of children with SCD represent a developmental delay rather than developmental deviance. Theoretical and practical implications of these findings are discussed.
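
The effect sizes quoted in the abstract are Cohen's d values with 95% confidence intervals. As a reference for how such numbers are produced, here is a sketch of the standard two-group computation; the group statistics in the example are invented, not data from this meta-analysis.

```python
import math

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2)
                          / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

def d_ci95(d, n1, n2):
    """Approximate large-sample 95% confidence interval for d."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return (d - 1.96 * se, d + 1.96 * se)

# Invented example: SCD group mean 100 (SD 15, n = 30) vs. controls at 110.
d = cohens_d(100, 15, 30, 110, 15, 30)  # negative: SCD group scores lower
```

By the sign convention used in the abstract, a negative d indicates that the SCD group performed below the comparison group.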

Reading comprehension, or the process of engaging text for the purpose of extracting and constructing meaning ( Snow, 2002 ), has paramount importance to academic success and future life outcomes ( National Institute of Child Health and Human Development [NICHD], 2000 ; Snow, 2002 ). Yet only about 36% of fourth graders and 34% of eighth graders in the United States have reading comprehension scores at or above proficiency by the end of the academic year ( U.S. Department of Education, 2015 ). Furthermore, nearly 31% of fourth graders and nearly 24% of eighth graders continue to attain reading comprehension scores that are below even the basic level. This indicates that a substantial proportion of fourth and eighth graders would have problems with more complex activities that extend beyond the text itself (e.g., comparing and contrasting ideas or making inferences beyond the text). This is particularly troubling given the importance of comprehension skills for success in school, in the workplace, and in daily life (e.g., understanding newspapers and forms and contracts to be signed).

Given the importance of decoding to reading comprehension, it is not surprising that decoding deficits often result in comprehension difficulties ( Perfetti, 1985 ; Perfetti & Hart, 2001 ; Perfetti & Hogaboam, 1975 ; Perfetti, Landi & Oakhill, 2005 ; Shankweiler et al., 1999 ; Snow, Burns, & Griffin, 1998 ). However, it is estimated that between 10 and 15% of 7- to 8-year-old children have normal performance on decoding measures yet still experience deficits in reading comprehension ( Nation & Snowling, 1997 ; Stothard & Hulme, 1995 ; Yuill & Oakhill, 1991 ); that is, these children are characterized as having a specific reading comprehension deficit (SCD). Although this estimate varies depending on the criteria used to identify children with SCD (see Rønberg & Petersen, 2015 ), large-scale identification studies have shown that the prevalence of SCD is most likely around 8% for children between the ages of 9 and 14 years ( Keenan et al., 2014 ). Even an 8% prevalence rate would mean an average of two students in a classroom could meet the criteria for SCD.

Reading comprehension is a complex process, involving a variety of cognitive and linguistic skills. As a result, deficits in any cognitive ability important to the comprehension process can potentially lead to deficits in reading comprehension performance. Perfetti and colleagues ( Perfetti et al., 2005 ; Perfetti & Stafura, 2014 ) provide a comprehensive framework for understanding the processes and skills involved in reading comprehension; deficits in comprehension could result from a variety of sources beyond decoding, including differences in sensitivity to story structure, inference making, comprehension monitoring, syntactic processing, verbal working memory, and oral language skills ( Cain & Oakhill, 1996 , 1999 ; Cain, Oakhill, Barnes, & Bryant, 2001 ; Nation, Adams, Bowyer-Crane, & Snowling, 1999 ; Nation & Snowling, 1998b , 1999 ; Oakhill, Hartt, & Samols, 2005 ; Pimperton & Nation, 2010a ; Snowling & Hulme, 2012 ).

Existing studies of children with SCD show that they perform poorly on a range of oral language assessments ( Cain, 2003 ; Cain, 2006 ; Cain et al., 2005 ; Cain & Oakhill, 1996 ; Carretti et al., 2014 ; Nation & Snowling, 2000 ; Oakhill et al., 1986 ; Stothard & Hulme, 1996 ; Tong, Deacon, & Cain, 2014 ; Tong, Deacon, Kirby, Cain, & Parrila, 2011 ; Yuill & Oakhill, 1991 ). However, relatively little is known about whether the comprehension problems of children with SCD are the result of their oral language deficits. Although it is possible that the documented deficits in oral language account for the observed deficits in reading comprehension, they may only be a contributing factor. A better understanding of the comprehension problems for children with SCD may be a first step towards better identification and remediation.

We briefly describe relevant theories of reading comprehension because existing theories may inform our understanding of the comprehension problems of children with SCD and understanding the comprehension problems of children with SCD in turn may inform theories of comprehension.

Several theories of reading comprehension have emerged over the years. These include the bottom-up view, the top-down view, the interactive view, the metacognitive view, and the simple view of reading comprehension. Each of these theories is relevant within the present context. Thus, we briefly discuss each theory below.

According to the bottom-up view of reading comprehension, readers move from an understanding of parts of language (e.g., letters, words) to an understanding of meaning or the whole (e.g., phrases, passages; Gough, 1972 ; Holmes, 2009 ; LaBerge & Samuels, 1974 ). Comprehension is thought to be a product of the acquisition of hierarchically arranged subskills ( Dole et al., 1991 ). Thus, lower-level word recognition skills precede the development of more complex skills that lead to an eventual understanding of phrases, sentences, and paragraphs. Automaticity in processing and understanding written text is also thought to affect text comprehension ( LaBerge & Samuels, 1974 ). Automaticity refers to the fact that proficient readers can read text automatically and that they do not need to focus consciously on lower-level word recognition. Thus, children with decoding problems allot greater cognitive resources to word recognition – and less to comprehension – whereas proficient readers are able to devote greater cognitive resources to higher-level cognitive processes (e.g., working memory; Daneman & Carpenter, 1980; Perfetti, 1985 ; Perfetti & Hogaboam, 1975 ).

Based on the top-down (i.e., conceptually-driven) view of reading comprehension, readers move from meaning down to the component parts of words as they engage with text ( Rumelhart, 1980 ; Schank & Abelson, 1977 ). According to this view, a reader's mental frameworks or schemas are the driving force behind successful reading comprehension ( Rumelhart, 1980 ). Readers actively integrate new information encountered in the text with information they have already stored within their previously established mental representations (i.e., background knowledge).

Top-down and bottom-up aspects are combined in the interactive view of reading comprehension. Based on this view, reading comprehension requires the reader to devote attentional resources to the more basic features of the text (e.g., letters, words) while simultaneously focusing on the more general aspects (e.g., syntax, semantics) and actively interpreting what is being read ( Perfetti et al., 2005 ). Proficient readers are those who successfully engage with multiple sources of information provided within the text and information that is not readily available from the text (Kintsch, 1998; Perfetti & Stafura, 2014 ; van Dijk & Kintsch, 1983 ). Good readers are able to recognize and interact with key features of the text, such as lexical characteristics, at the same time that they are more broadly identifying the purpose of a passage or a paragraph ( Rayner, 1986 ; Rayner et al., 2001 ).

The simple view of reading asserts that reading comprehension is the product of decoding ability and language comprehension ( Gough & Tunmer, 1986 ; Hoover & Gough, 1990 ). The simple view also has substantial empirical validation. For example, decoding has emerged as a reliable predictor of reading comprehension ability in a variety of instances (e.g., Kendeou, van den Broek, White, & Lynch, 2009 ; Shankweiler et al., 1999 ). In fact, poor decoding skills are associated with reading comprehension problems ( Perfetti, 1985 ). Additionally, oral language skills remain a robust and unique predictor of reading comprehension over and above word reading skills ( Nation & Snowling, 2004 ).

Oral language is defined as the ability to comprehend spoken language ( National Early Literacy Panel, 2008 ) and includes a wide variety of skills, such as expressive and receptive vocabulary knowledge, grammar, morphology, syntactic knowledge, conceptual knowledge, and knowledge about narrative structure ( Beck, Perfetti, & McKeown, 1982 ; Bishop & Adams, 1990 ; Bowey, 1986 ; Perfetti, 1985 ; Roth, Speece, & Cooper, 2002 ). Oral language skills impact reading comprehension directly, such as through the understanding of the words presented in a text, as well as indirectly via other literacy-related skills (e.g., phonological awareness; NICHD, 2000 ; Wagner & Torgesen, 1987 ). Furthermore, the unique contribution of oral language to reading comprehension remains even after accounting for word recognition ( Ouellette, 2006 ).

The simple view provides a potential explanation for the reading comprehension problems of children with SCD that is consistent with their observed oral language deficits: Reading comprehension requires both adequate decoding and adequate oral language comprehension. This would explain the observation that children with SCD have adequate decoding but not adequate oral language comprehension. Catts, Adlof, and Weismer (2006) and Nation and Norbury (2005) applied this simple view of reading framework to identify different types of reading problems in eighth graders and 8-year-old children, respectively. According to this classification system, children with good decoding and good comprehension are adequate readers; children with poor decoding and poor comprehension are garden-variety poor readers; children with good comprehension and poor decoding meet criteria for dyslexia; and children with good decoding and poor comprehension have SCD. Thus, a mastery of both decoding and language comprehension is necessary for reading proficiency.
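
The classification described in the preceding paragraph is a 2×2 decision table, which can be made explicit in a few lines. The boolean inputs and label strings below are editorial shorthand for the cut-off-based group definitions the cited studies actually applied.

```python
def classify_reader(decoding_ok: bool, comprehension_ok: bool) -> str:
    """Simple-view 2x2 reader classification (after Catts et al., 2006;
    Nation & Norbury, 2005). Inputs are simplified booleans; real studies
    defined "ok" via test-score cutoffs."""
    if decoding_ok and comprehension_ok:
        return "adequate reader"
    if not decoding_ok and not comprehension_ok:
        return "garden-variety poor reader"
    if not decoding_ok:  # good comprehension, poor decoding
        return "dyslexia"
    return "SCD"  # good decoding, poor comprehension
```

The fourth cell, good decoding with poor comprehension, is the SCD profile this meta-analysis targets.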

Developmental Delay or Developmental Deficit?

Developmental delay and developmental deficit are two hypotheses that are often discussed in relation to the nature of reading disability (e.g., dyslexia; see Francis, Shaywitz, Stuebing, Shaywitz, & Fletcher, 1996 ). The developmental delay hypothesis asserts that poor reading performance results from a delayed acquisition of reading-related skills; these children follow the same developmental trajectory as typical readers ( Francis et al., 1996 ). The developmental deficit hypothesis, on the other hand, states that the underlying skill shows a different or deviant developmental trajectory ( Francis et al., 1996 ). For the case of reading disability, the underlying skill examined was phonological processing. We are interested in determining whether an oral language weakness represents a developmental delay or deficit for children with SCD. These competing hypotheses can be tested within studies that matched children with SCD to a younger group of typically-developing children (comprehension-age matching; see Cain, Oakhill, & Bryant, 2000 ). If children with SCD demonstrated similar performance to the comprehension-age matched group, this would support developmental delay. If children with SCD had worse performance than the comprehension-age matched group, this outcome would support developmental deviance.
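
The comprehension-age-matching logic can be written as a decision rule over a group-comparison effect size. In this sketch, the sign convention (negative d means the SCD group scored lower) and the 0.2 tolerance for "similar performance" are illustrative assumptions, not published cutoffs.

```python
def delay_or_deviance(d_scd_vs_matched, tolerance=0.2):
    """Interpret an SCD vs. comprehension-age-matched comparison.

    A d near zero means the groups perform alike, supporting developmental
    delay; a clearly negative d means the SCD group is worse, supporting
    developmental deviance. The tolerance is an illustrative choice.
    """
    if abs(d_scd_vs_matched) <= tolerance:
        return "developmental delay"
    if d_scd_vs_matched < 0:
        return "developmental deviance"
    return "SCD group outperformed the matched group (unexpected)"
```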

The distinction between developmental delay and developmental deficit matters because a skill characterized as a developmental deficit is more likely to be a contributing factor in the development of the reading problem. Developmental delay implies that the skill is consistent with the observed delay in reading and is therefore less likely to be a contributing factor. To our knowledge, an empirical examination of these two hypotheses has not yet been conducted for the observed oral language deficits in children with SCD.

Below, we describe a study conducted by Cain and Oakhill (2006) that has several characteristics that are typical of studies involving children with SCD. In this investigation, the authors were interested in the cognitive profiles of 7- to 8-year-old children with SCD; this age range is very common for investigations of children with SCD (e.g., Cain, 2003 ; Cain & Oakhill, 1996 , 2007; Jerman, 2007; Oakhill, 1982 ). Children were selected based on their performance on measures of reading comprehension and word reading accuracy and were followed longitudinally. In this case, the Neale Analysis of Reading Ability was used to categorize children into groups of good and poor comprehenders. Age-appropriate word reading accuracy was defined as a word reading accuracy age within 6 months (lower limit) to 12 months (upper limit) of chronological age (e.g., Clarke, 2009 ). Poor reading comprehension was defined as a 12-month discrepancy between comprehension age and both chronological age and reading accuracy age (e.g., Nation & Snowling, 1999 , 2000 ; Weekes, Hamilton, Oakhill, & Holliday, 2008 ). Typical readers were defined as attaining reading comprehension scores at or above word reading accuracy performance. Due to one-to-one matching and the low proportion of SCD in the population, final groups were small (23 children per group); this is typical of many studies involving children with SCD (e.g., Ehrlich & Remond, 1997 ; Geva & Massey-Garrison, 2012 ; Nation & Snowling, 1998a , 1998b ). In this study, children were given a battery of assessments that included a combination of standardized and experimenter-created measures (e.g., Nation et al., 1999 ; Nation & Snowling, 2000 ). A unique aspect of this investigation is that children were followed longitudinally; many studies involving children with SCD are single time point studies (e.g., Cain & Oakhill, 1999 ; Oakhill, 1983 ).

SCD has been defined in a variety of ways across different studies. Although researchers tend to agree on the need for a discrepancy between an individual's decoding ability and their reading comprehension skills, individuals with SCD (also referred to as poor comprehenders or less-skilled comprehenders in the literature) have been identified using one of four criteria:

  • A discrepancy between reading comprehension and decoding (e.g., Isakson & Miller, 1976 ; Nation & Snowling, 1998a ; Oakhill, Yuill, & Parkin, 1986 ; Pimperton & Nation, 2010a );
  • A discrepancy between reading comprehension and decoding with an additional requirement that decoding skills are within the normal range (e.g., Cain et al., 2001 ; Cataldo & Oakhill, 2000 ; Cragg & Nation, 2006 ; Torppa et al., 2009);
  • Discrepancies between reading comprehension, decoding, and chronological age with an additional requirement that decoding skills are within the normal range ( Cain, 2003 ; Cain, 2006 ; Cain et al., 2000 ; Cain & Oakhill, 2006 , 2011 ; Cain, Oakhill, & Elbro, 2003 ; Cain, Oakhill, & Lemmon, 2004 ; Cain & Towse, 2008 ; Clarke, 2009 ; Marshall & Nation, 2003 ; Nation & Snowling, 1997 , 2000 ; Nation et al., 2001 ; Oakhill et al., 2005 ; Spooner, Gathercole, & Baddeley 2006 ; Stothard & Hulme, 1995 ; Yuill, 2009 ; Yuill & Oakhill, 1991 );
  • A discrepancy between reading comprehension and word-level decoding with additional requirements that decoding skills are within the normal range and that comprehension scores fall below a given percentile or cut point ( Cain & Towse, 2008 ; Carretti, Motta, & Re, 2014 ; Catts et al., 2006 ; Henderson, Snowling, & Clarke, 2013 ; Kasperski & Katzir, 2012; Megherbi, Seigneuric, & Ehrlich, 2006 ; Nation, Clarke, Marshall, & Durand, 2004 ; Nation, Snowling & Clark, 2007 ; Nesi, Levorato, Roch & Cacciari, 2006 ; Pelegrina, Capodieci, Carretti, & Cornoldi, 2014 ; Pimperton & Nation, 2014 ; Ricketts, Nation, & Bishop, 2007 ; Shankweiler et al., 1999 ; Tong et al., 2011 ; Tong et al., 2014 ).
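
The fourth criterion, the most stringent in the list, can be expressed as a conjunction of three checks. In this sketch the standard-score scale (mean 100, SD 15), the discrepancy size, and the cut point are illustrative assumptions rather than values drawn from any one of the cited studies.

```python
def meets_scd_criterion(decoding_ss, comprehension_ss,
                        min_discrepancy=15, decoding_normal_low=85,
                        comprehension_cut=90):
    """Criterion 4 as three checks on standard scores (mean 100, SD 15).
    All numeric defaults are illustrative assumptions."""
    discrepant = (decoding_ss - comprehension_ss) >= min_discrepancy
    decoding_in_normal_range = decoding_ss >= decoding_normal_low
    comprehension_below_cut = comprehension_ss < comprehension_cut
    return discrepant and decoding_in_normal_range and comprehension_below_cut
```

Dropping the final check recovers criterion 2, and dropping the normal-range check as well recovers criterion 1, which is why the four definitions identify overlapping but non-identical samples.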

Despite the fact that differences in identification criteria influence the percentage of children identified as having SCD (see Rønberg & Petersen, 2015 ), children with SCD likely represent a small but significant proportion of struggling readers. Moreover, across studies included within the present review, SCD was identified using all of these different criteria. Therefore, our findings provide an overall estimate of the nature of children's comprehension problems regardless of identification method.

The purpose of the present meta-analysis is to better understand the comprehension deficits of children who have SCD. The framework for the present meta-analysis grew out of a recent investigation that tested three hypotheses regarding the nature of the comprehension problem in a large sample of over 425,000 first-, second-, and third graders with SCD ( Spencer, Quinn, & Wagner, 2014 ). The three hypotheses tested whether comprehension problems for these children were largely specific to reading, general to oral language, or both (i.e., a mixture). Children were obtained from a statewide database, and prevalence of SCD was calculated based on percentile cutoffs. The results indicated that over 99 percent of children in each grade who had SCD also had deficits in vocabulary knowledge, suggesting that these children's comprehension deficits were general to reading and to at least one important aspect of oral language.

Although these results provide compelling evidence that comprehension problems are general to at least one aspect of oral language (i.e., vocabulary knowledge), three limitations of the study need to be noted. First, participants included mostly children attending Reading First schools, a federal program for improving reading performance for students from low socioeconomic backgrounds. Because poverty is a risk factor for delayed development of oral language, the results may not generalize to students not living in poverty. Second, the assessments were brief, and receptive vocabulary knowledge served as the only measure of oral language comprehension, when in fact oral language potentially comprises a variety of different skills that might affect reading comprehension. Third, the study did not compare the relative magnitudes of the deficits observed in reading comprehension and vocabulary, a potentially important new source of data that could be used to compare alternative hypotheses about the nature of the comprehension problems of children with SCD.

These limitations suggest the need for a comprehensive review of the literature on the nature of the comprehension problems of children who have SCD. Such a review could incorporate results from studies with more representative samples and using a variety of measures. By examining magnitudes as well as the existence of deficits in reading versus oral-language comprehension, it would be possible to test a previously neglected hypothesis in Spencer et al. (2014) , namely that children with SCD could have deficits in oral language that are not as severe as their deficits in reading comprehension.

Thus, in addition to testing two hypotheses from Spencer et al. (2014) – (a) children with SCD have comprehension deficits that are specific to reading, such that they demonstrate impaired reading comprehension but no impairments in oral language, and (b) children with SCD have comprehension deficits that are general to reading and oral language, such that they demonstrate equal impairment in reading comprehension and oral language – we also test a third hypothesis in the present meta-analysis: (c) children with SCD have comprehension deficits that extend beyond reading to oral language, but they demonstrate greater impairment in reading comprehension than in oral language.

Hypothesis one: children with SCD have comprehension problems that are specific to reading

Theoretical support for this hypothesis comes from the bottom-up view of reading comprehension and from the automaticity of reading ( Gough, 1972 ; Holmes, 2009 ; LaBerge & Samuels, 1974 ). Children's decoding may be adequate yet still consume processing resources that are then unavailable for comprehension while reading. If this were the case, comprehension would be impaired for reading, where decoding is required, but not for oral language.

Empirical support for this hypothesis comes from studies that demonstrate the existence of individuals who have been identified as having SCD in the presence of intact or relatively intact vocabulary knowledge ( Cain, 2006 ; Nation, et al., 2010 ). Moreover, some studies that compared children with and without SCD matched them on vocabulary performance (e.g., Cain, 2003 , Cain, 2006 ; Spooner et al., 2006 ; Tong et al., 2014 ). That it was possible to do this match supports the possibility that comprehension problems are specific to the domain of reading.

Hypothesis two: children with SCD have comprehension problems that are general to reading and oral language

Several theoretical perspectives provide a rationale for this hypothesis, including the simple view, top-down view, and interactive views of reading comprehension. The simple view ( Gough & Tunmer, 1986 ; Hoover & Gough, 1990 ) provides support for this hypothesis because it explains SCD as resulting from a deficit in oral language comprehension ( Catts et al., 2006 ; Nation et al., 2004 ). The top-down and interactive views are in line with this hypothesis because both frameworks emphasize the readers' mental frameworks ( Rumelhart, 1980 ; Schank & Abelson, 1977 ). The top-down processing highlighted in both frameworks would affect comprehension regardless of whether the context is written or oral language.

Empirical support for this hypothesis comes from studies showing that oral language ability is a predictor of future reading comprehension success and failure ( Nation & Snowling, 2004 ; Snow et al., 1998 ); children with reading comprehension problems tend to have deficits in oral language ( Catts, Fey, Tomblin, & Zhang, 2002 ). For example, Catts, Fey, Zhang, and Tomblin (1999) investigated relations between oral language and reading comprehension skills in second graders. Results indicated that children with reading comprehension deficits were significantly more likely to have had oral language weaknesses in kindergarten compared to students with more typical comprehension development (see also Elwer, Keenan, Olson, Byrne, & Samuelsson, 2013 ).

The view that comprehension problems are general to oral language and reading is supported by multiple investigations. Children with SCD have demonstrated weaknesses related to a variety of oral language domains, such as semantic processing, listening comprehension, and syntactic ability ( Carretti, Motta, & Re, 2014 ; Nation & Snowling, 2000 ; see Cain & Oakhill, 2011 and Justice, Mashburn, & Petscher, 2013 for longitudinal evidence). When compared to typical readers, these children also tend to perform significantly worse on measures tapping verbal working memory skills (see Carretti, Borella, Cornoldi, & De Beni, 2009 ). Differences between typically-developing readers and individuals with SCD have also been reported using a wide variety of behavioral and EEG/ERP measures (e.g., Landi & Perfetti, 2007 ).

Hypothesis three: children with SCD have comprehension problems that extend to oral language but are less severe for oral language than for reading

Theoretical support for this hypothesis is provided by a combination of theoretical rationales discussed for the previous two hypotheses. Specifically, a deficit that is general to oral language as well as reading comprehension is assumed, combined with additional deficits that are specific to reading. For example, a deficit in vocabulary would impair performance in reading comprehension and oral language. Simultaneously, decoding and orthographic processing could require attention and cognitive resources that are not required by listening, such as visual processing. The combined result would be impairments in both oral language and reading comprehension, but the impairment would be greater for reading comprehension.

Empirical support for this hypothesis comes from studies showing that these children demonstrate differential performance across various oral language tasks ( Cain, 2003 ; Cain, 2006 ; Cain, Oakhill, & Lemmon, 2005 ; Stothard & Hulme, 1992 ; Tong et al., 2014 ). For example, Cain (2003) examined language and literacy skills in children with SCD who were matched to typical readers based on vocabulary; however, these same children exhibited significantly poorer performance on other oral language tasks, such as listening comprehension and a story structure task. Similarly, Tong et al. (2014) included children with SCD who were vocabulary-matched to typical readers, yet these children exhibited poor performance on a morphological awareness task. Therefore, it may be that the comprehension problems of children with SCD affect some but not all aspects of oral language.

Additionally, we were interested in examining the effect of several potential moderators of effect size outcomes, specifically the effects of (a) publication type, (b) participant age, and (c) type of oral language measure. The rationale for these moderators is as follows: First, if publication type (e.g., published journal article versus unpublished dissertation) significantly predicts effect size outcomes, we would attribute this, at least partially, to publication bias. Thus, we wanted to include this variable within each meta-analysis. Second, we were interested in participant age as a moderator of effect sizes ( Catts et al., 2006 ; Elwer et al., 2013 ; Nation, Cocksey, Taylor, & Bishop, 2010 ). Previous research has also indicated that younger children with SCD tend to have weaker reading comprehension skills compared to older children ( Authors, 2017 ). We sought to investigate whether this finding would be replicated within a different sample and also whether these differences transfer to oral language skills as well. Finally, type of oral language measure was included as a potential moderator because oral language measures vary greatly in the skills that they assess ( Cain & Oakhill, 1999 ; Nation et al., 2004 , 2010 ; Tong et al., 2011 ). For instance, a receptive vocabulary assessment is likely to be much less difficult for a child with SCD compared with a syntactic or morphological task. Therefore, examining the potential effects of type of oral language measure may provide additional insight into which tasks may be best to use for identifying children with SCD.

Across four decades, multiple systematic reviews of reading comprehension have been conducted. These reviews have examined a variety of topics, including the component skills of reading comprehension and intervention research for struggling readers (e.g., Bus & van Ijzendoorn, 1999; Ehri, Nunes, Stahl, & Willows, 2001; Swanson, Tranin, Necoechea, & Hammill, 2003). In more recent years, there have been several narrative reviews focusing specifically on children with SCD (Hulme & Snowling, 2011; Nation & Norbury, 2005; Oakhill, 1993), but only one known meta-analysis to date has investigated the cognitive skills of these individuals (Carretti et al., 2009). However, Carretti et al. (2009) focused exclusively on working memory skills, whereas the present investigation examines the performance of children with and without SCD on a wide array of oral language tasks in addition to verbal working memory.

In the present review, we examine studies using five methods. First, we conducted between-group meta-analyses comparing the reading comprehension performance of children with SCD with the reading comprehension performance of typically-developing readers. Second, we conducted between-group analyses comparing the oral language performance (as indexed by measures of vocabulary, listening comprehension, storytelling ability, morphological awareness, and semantic and syntactic knowledge) of children with SCD with the oral language performance of typically-developing readers. Third and fourth, we conducted the same meta-analyses for reading comprehension and oral language performance for studies that included a comprehension-age matched group (see Cain et al., 2000 ). The existence of such studies makes it possible to determine whether impaired oral language performance represents developmental delay (i.e., performance similar to younger normal comprehenders) or a developmental difference (i.e., performance different than that of younger normal comprehenders; Francis et al., 1996 ). Finally, we conducted a separate meta-analysis for studies reporting performance on standardized reading comprehension and oral language measures for the same participants (i.e., a within-child comparison of reading comprehension and oral language) because we were interested in the comparability of oral language skills to reading comprehension within children who have SCD.

Study Collection

The current meta-analysis includes studies published in English from January 1, 1970 to February 20, 2016. Several electronic databases and keywords were used to locate relevant studies. These databases included PsycINFO, ERIC, Medline, and ProQuest Dissertations. In an effort to reduce the likelihood of publication bias within the present review, we also searched several gray literature databases (i.e., SIGLE, ESRC, and Web of Science ). We used title-based keywords related to reading comprehension and reading disabilities ( specific comprehension deficit*, poor comprehender*, comprehension difficult*, less-skilled comprehen*, comprehension failure, reading difficult*, difficulty comprehending, poor comprehension, struggling reader*, specific reading comprehension difficult*, specific reading comprehension disabilit*, low comprehender*, weak reading comprehen*, reading comprehension disab*, poor reading comprehension ) in combination with other reading-related keywords ( reader*, reading, subtype*, subgroup ). Our search spanned peer-reviewed and non-peer-reviewed journal articles, dissertations and theses, book chapters, reports, and conference proceedings. The references of relevant articles were also hand searched, and we contacted researchers who had at least three relevant publications (first authored or not) as a way of including unpublished data within the present review. We conducted additional searches for these same researchers using author- and abstract-based keyword searches [au( author ) AND ab( comprehen* )].

Inclusionary criteria

Several inclusionary criteria were used to select studies to be included within the present synthesis. Studies were required to: (a) report original data (i.e., sample means, standard deviations, correlations, sample sizes, t -tests, and/or F -tests); (b) include native speakers of a language; (c) assess children between the ages of 4 and 12 years; (d) contain at least one measure of reading comprehension, decoding ability, and oral language; (e) include a sample of children with SCD based on their performance on measures of reading comprehension and decoding ability; and (f) include a typically-developing group of readers for comparisons 2 .

We applied the language-based criterion because we wanted to be able to investigate the relation between poor reading comprehension and oral language skills separate from language status because language status is known to affect reading comprehension (e.g., Kieffer, 2008 ). However, studies could include monolingual samples that spoke a language other than English (e.g., Italian) provided that the study was reported in English. Acceptable measures of reading comprehension included assessments that measured individuals' comprehension of the text beyond word reading ability; acceptable measures of decoding ability included assessments that measured real word decoding, nonword decoding, and/or reading accuracy; and acceptable measures of oral language included tasks that assessed vocabulary knowledge, syntactic and semantic processing, listening comprehension, and/or storytelling ability.

Exclusionary criteria

Three exclusionary criteria were applied: (a) studies that identified children with SCD solely via teacher or parent ratings were excluded, (b) samples of non-native speakers were excluded, and (c) samples could not contain children characterized as having intellectual disability, attention deficit hyperactivity disorder (ADHD), oppositional defiant disorder (ODD), aphasia, hydrocephalus, or hearing or vision impairments.

Final study selection

The initial search yielded approximately 3,050 results. After eliminating duplicates, studies that did not adhere to our inclusion/exclusion criteria, and studies reporting results from identical participants, a total of 86 studies remained.

A random sample of 10% of the studies was coded twice by the first author and a graduate student in order to establish inter-coder reliability; studies were coded based on study features (i.e., study type, sample size, operational definition of SCD, matching variables, language spoken, and sample age) and reading comprehension- and oral language-related constructs (i.e., reported reliabilities, correlations with oral language measures, means and standard deviations for each assessment, and reported t values or F ratios). We additionally coded participant age, type of oral language measure (i.e., vocabulary knowledge, narrative, listening comprehension, syntactic/grammar, semantic knowledge, and figurative language), and type of publication (i.e., journal article, book chapter, theses/dissertations, and unpublished data). Cohen's kappa was used to measure inter-coder reliability (.96 for study features; .98 for reading comprehension-related constructs; .94 for oral language-related constructs). Overall reliability (kappa = .96) exceeded the acceptability criterion of kappa ≥ .70. Discrepancies were resolved through discussion or by referring to the article.
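Cohen's kappa adjusts raw percent agreement between two coders for the agreement expected by chance alone. A minimal Python sketch of the computation (illustrative only; not the coding software used in this review) might look like:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters over the same items:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed proportion of agreement
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category proportions
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n)
              for c in set(rater_a) | set(rater_b))
    return (p_o - p_e) / (1 - p_e)
```

Perfect agreement yields kappa = 1, while agreement at chance level yields kappa = 0, which is why a threshold such as ≥ .70 is a stricter bar than raw percent agreement.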

The final sample included 84 studies for between-group (k_brc = 152 effect sizes for reading comprehension; k_bol = 309 effect sizes for oral language) and within-child analyses (k_wrc = 97 effect sizes). The between-group analyses were twofold: one compared children with SCD to typical readers, and another compared children with SCD to a comprehension-age matched group of children. Between-group comparisons of children with SCD to typical readers allowed for a test of the three hypotheses outlined previously: (a) children with SCD have comprehension problems that are specific to reading; (b) children with SCD have comprehension problems that are general to reading and oral language; or (c) children with SCD have comprehension problems that extend to oral language but are less severe for oral language than for reading. Between-group comparisons of children with SCD to a comprehension-age matched group allowed for a test of the delay versus deficit hypotheses for the anticipated oral language difficulties. A subsample of the original study sample (n = 4) included comprehension-age matched groups for additional analyses (k_brc = 4 effect sizes for reading comprehension; k_bol = 30 effect sizes for oral language).

Within-child analyses require that both measures within a single study use the same scale. Thus, in order to be included within the within-child analysis, studies had to include standardized measures of reading comprehension and oral language and report standard scores, scaled scores, z -scores, or t -scores. Our within-child analyses allowed us to test the robustness of the pattern of results observed in the between-group comparison. That is, we were able to compare the reading comprehension and oral language skills within children who had SCD.

Meta-Analytic Methods

All analyses were conducted using Microsoft Excel (Version 14.0) and the Metafor (Viechtbauer, 2010) and Robumeta (Fisher & Tipton, 2015) packages in R. Effect sizes were calculated using Hedges' g (Hedges, 1981), which is Cohen's d (Cohen, 1977) with a correction for small sample sizes. Negative effect size values indicate that children with SCD had a lower group mean than typically developing readers. In several instances, groups were vocabulary-matched (i.e., children with SCD were selected on the basis of having average vocabulary performance compared to a group of typical readers). 3
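Hedges' g is computed from group means, standard deviations, and sample sizes. The following Python sketch illustrates the formula (it is an illustration of the standard computation, not the authors' analysis code, which used R):

```python
import math

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Hedges' g: Cohen's d with a small-sample bias correction.
    Treating group 1 as the SCD group, a negative g means the SCD
    group scored below the comparison group."""
    # Pooled standard deviation across the two groups
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / sp
    # Small-sample correction factor (Hedges, 1981)
    j = 1 - 3 / (4 * (n1 + n2 - 2) - 1)
    return j * d

def g_variance(g, n1, n2):
    """Approximate sampling variance of g, used to weight studies."""
    return (n1 + n2) / (n1 * n2) + g**2 / (2 * (n1 + n2))
```

For example, an SCD group scoring one pooled standard deviation below its comparison group (with 20 children per group) yields g of about −0.98 rather than −1.00, reflecting the small-sample shrinkage.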

Average weighted effect sizes for each meta-analysis were calculated using random-effects models, which assume all parameters to be random as opposed to fixed (Shadish & Haddock, 2013). We used random-effects models in the present investigation because Q (i.e., the test of homogeneity of effect sizes; Hedges & Olkin, 1985) was rejected across most comparisons. For the one comparison in which Q was not rejected, we used a fixed-effects model. We also estimated I², the percentage of variance due to heterogeneity. We used random-effects models to calculate a 95% confidence interval (CI) in order to determine whether each average weighted effect size was statistically significant (i.e., different from zero). A CI within random-effects models assumes systematic study variability (i.e., that differences across studies do not result solely from random sampling error; Shadish & Haddock, 2013). We additionally conducted an Egger test for funnel plot asymmetry within each meta-analysis as a means of testing whether publication bias was present (significant asymmetry) or absent (non-significant asymmetry; Egger, Smith, Schneider, & Minder, 1997).
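The quantities reported throughout the Results (Q, I², τ², and the random-effects 95% CI) can be illustrated with a minimal DerSimonian-Laird pooling sketch. This is a simplified assumption (the actual analyses used Metafor in R, which offers several τ² estimators), not the authors' implementation:

```python
import math

def random_effects_summary(effects, variances, z_crit=1.96):
    """DerSimonian-Laird random-effects pooling with Q and I^2
    heterogeneity statistics for a set of independent effect sizes."""
    k = len(effects)
    w = [1 / v for v in variances]  # fixed-effect (inverse-variance) weights
    mean_fe = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
    # Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (g - mean_fe) ** 2 for wi, g in zip(w, effects))
    df = k - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # DerSimonian-Laird between-study variance estimate
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights incorporate tau^2
    w_re = [1 / (v + tau2) for v in variances]
    mean_re = sum(wi * g for wi, g in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    ci = (mean_re - z_crit * se, mean_re + z_crit * se)
    return mean_re, ci, q, i2, tau2
```

When Q does not exceed its degrees of freedom, τ² is truncated at zero and the model reduces to the fixed-effects case, which mirrors the decision rule described above.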

Across meta-analyses, there were several instances in which a single study yielded multiple effect size estimates. We used robust variance estimation with the small-sample correction to handle dependent effect sizes (Hedges, Tipton, & Johnson, 2010; Tipton, 2015). This relatively recent approach has advantages over alternatives such as including only one effect size per study, averaging effect sizes within studies, or modeling the dependency with multivariate approaches. Robust variance estimation allows one to use all effect sizes, including multiple effect sizes from the same sample, when estimating average weighted effect sizes and testing possible moderators, and then corrects for the dependencies in the significance testing. Although robust variance estimation can be implemented via macros for common statistical packages such as SPSS, an efficient way of doing so is with the Robumeta package in R (Fisher & Tipton, 2015). We carried out meta-regression analyses of potential moderators using Robumeta when there were dependent effect sizes. For meta-analyses that did not demonstrate dependency among effect size estimates (i.e., the between-group comparison of reading comprehension for children with SCD and comprehension-age matched children), we calculated the average weighted effect size estimate using traditional methods in Metafor.
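The logic of robust variance estimation with correlated effects can be sketched as follows. This simplified Python illustration uses the correlated-effects weights of Hedges, Tipton, and Johnson (2010), where every effect in study j receives weight 1 / (k_j × (mean variance + τ²)), together with a sandwich variance built from study-level residual sums; it omits the small-sample correction of Tipton (2015) that the analyses applied and is not the Robumeta implementation:

```python
import math

def rve_correlated_effects(studies, tau2=0.0):
    """Sketch of robust variance estimation for dependent effect sizes.
    studies: list of studies, each a list of (effect, variance) pairs."""
    weights, flat = [], []
    for study in studies:
        k_j = len(study)
        v_bar = sum(v for _, v in study) / k_j
        w_j = 1 / (k_j * (v_bar + tau2))  # same weight for all effects in study j
        for g, _ in study:
            weights.append(w_j)
            flat.append(g)
    total_w = sum(weights)
    pooled = sum(w * g for w, g in zip(weights, flat)) / total_w
    # Robust (sandwich) variance: square the weighted residual sum per study,
    # so within-study dependence cannot deflate the standard error
    var_r, idx = 0.0, 0
    for study in studies:
        resid = sum(weights[idx + i] * (study[i][0] - pooled)
                    for i in range(len(study)))
        var_r += resid ** 2
        idx += len(study)
    var_r /= total_w ** 2
    return pooled, math.sqrt(var_r)
```

Because the study-level weight divides by k_j, a study contributing many dependent effect sizes does not dominate the pooled estimate, which is the core advantage over treating all effects as independent.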

A total of 86 independent studies were included within the analyses. Effect sizes for each comparison are reported in Table 1 (see also Appendices A, B, and C). A substantial portion of studies included English-speaking samples (Study n = 72). Fourteen studies included children who spoke Italian ( n = 5), French ( n = 3), Finnish ( n = 1), Hebrew ( n = 1), Chinese ( n = 2), Portuguese ( n = 1), and Spanish ( n = 1). Across studies, children were between the ages of 4 and 12 years.

Table 1

SCD and typical readers
  Reading comprehension: k = 152, d = −2.78, 95% CI [−3.01, −2.54], I² = 94.39
  Reading comprehension (studies overlapping with the oral language comparisons): k = 137, d = −2.80, 95% CI [−3.05, −2.55], I² = 94.68
  Reading comprehension (studies overlapping with the within-child comparisons): k = 57, d = −2.73, 95% CI [−3.05, −2.42], I² = 96.82
  Oral language: k = 309, d = −0.78, 95% CI [−0.89, −0.68], I² = 85.55
  Oral language (studies overlapping with the reading comprehension comparisons): k = 304, d = −0.79, 95% CI [−0.90, −0.68], I² = 85.50
  Oral language (studies overlapping with the within-child comparisons): k = 133, d = −0.95, 95% CI [−1.06, −0.83], I² = 91.00
  Oral language (including verbal working memory): k = 400, d = −0.77, 95% CI [−0.87, −0.67], I² = 85.12

SCD and comprehension-age match
  Reading comprehension: k = 4, d = −0.31, 95% CI [−0.63, 0.02], I² < 1.00
  Oral language: k = 30, d = 0.32, 95% CI [−0.49, 1.14], I² = 77.13

SCD only
  Reading comprehension vs. oral language: k = 97, d = −0.84, 95% CI [−1.06, −0.62], I² = 96.06

Note. k = number of effect sizes; d = average weighted effect size estimate; CI = confidence interval; I² = percentage of variance attributable to heterogeneity; SCD = children with specific reading comprehension deficits.

Effect Size Analyses

Comparisons of children with SCD to typical readers.

We compared children with SCD to typical readers on measures of reading comprehension and oral language. These analyses served as a means to test whether: (a) children with SCD have comprehension problems that are specific to reading; (b) children with SCD have comprehension problems that are general to reading and oral language; or (c) children with SCD have comprehension problems that extend to oral language but are less severe for oral language than for reading.

Reading comprehension

One hundred and fifty-two comparisons were made between the reading comprehension of children with SCD and that of typically-developing readers (Study n = 84). Across studies, there were 17,600 children with SCD ( M = 209.53; SD = 703.14; range: 7-3,236) who were compared with 155,874 typically developing children ( M = 1,855.64; SD = 6,737.96; range: 8-29,676). The average weighted effect size was negative, large, and statistically significant (random-effects robust variance estimation: d = −2.78, 95% CI [−3.01, −2.54]); because the CI does not include zero, the estimate differs significantly from zero. As expected, children with SCD performed substantially worse on measures of reading comprehension than their typically developing peers. Study-specific effect sizes for reading comprehension, participant ages, and sample sizes for these comparisons are reported in Appendix A; effect sizes are reported in descending order. There was large variability in effect size estimates across studies due to heterogeneity, I² = 94.39 (see Table 1). Sensitivity analyses indicated that varying values of rho (ρ) from 0 to 1 in .20 increments did not affect tau squared (τ²), the subsequent weights, or the average weighted effect size estimate, suggesting that the observed effect size is fairly robust. An Egger test of funnel plot asymmetry was significant, z = −7.09, p < .0001 (see Figure 1), indicating asymmetry in effect size estimates across studies.


Figure 1. Funnel plots for between- and within-group comparisons. Note. RC = Reading comprehension; OL = Oral language; WM = Working memory; CAM = Reading-comprehension age-match.

Oral language

Three hundred and nine comparisons were made between the oral language skills of children with SCD and those of typically-developing children (Study n = 76). There were 16,494 children with SCD ( M = 219.93; SD = 706.39; range: 7-3,016) who were compared with 144,857 typically developing children ( M = 1,931.43; SD = 6,676.47; range: 8-28,970). The average weighted effect size was also negative, large, and statistically significant (random-effects robust variance estimation: d = −0.78, 95% CI [−0.89, −0.68]). Thus, when compared to children without comprehension problems, children with SCD additionally exhibit difficulty completing oral language tasks; however, this deficit was not as severe as that for reading comprehension. Study-specific effect sizes for oral language, participant ages, and sample sizes for these comparisons are reported in Appendix A; effect sizes are reported in descending order. Variability due to heterogeneity was large across studies, I² = 85.55 (see Table 1). Sensitivity analyses indicated that the observed effect size is fairly robust; varying values of ρ resulted in no differences. An Egger test of funnel plot asymmetry was significant, z = −2.11, p < .05 (see Figure 1), suggesting some asymmetry in estimates. We additionally examined verbal working memory for studies already included in the analysis, which added 91 comparisons. The average weighted effect size remained negative, large, and statistically significant (random-effects robust variance estimation: d = −0.77, 95% CI [−0.87, −0.67]; I² = 85.12; see Table 1).

It is important to note that different studies were available for the reading comprehension and oral language comparisons; however, when we analyzed only overlapping studies (Study n = 74), the effects for reading comprehension (random-effects robust variance estimation: d = −2.80, 95% CI [−3.05, −2.55]; I² = 94.68) and oral language (random-effects robust variance estimation: d = −0.79, 95% CI [−0.90, −0.68]; I² = 85.50) were nearly identical to those reported above.

Comparisons of children with SCD to comprehension-age matched readers

Given that we found evidence that children with SCD do exhibit deficits in oral language, we were additionally interested in how such deficits were best characterized. Thus, we conducted a between-groups meta-analysis that compared the performance of children with SCD to younger comprehension-age matched readers. Children in the comprehension-age matched group were selected on the basis of having performance equivalent to children with SCD (see Cain et al., 2000 ). 4 Across studies, children within the comprehension-age matched group were approximately two years younger than children with SCD.

Four comparisons were made between the reading comprehension skills of children with SCD and comprehension-age matched control children (Study n = 4). There were 73 children with SCD ( M = 18.25; SD = 7.23; range: 14-29) compared with 68 typically-developing children across studies ( M = 17.00; SD = 6.78; range: 14-27). Study-specific effect sizes for reading comprehension, participant ages, and sample sizes for these comparisons are reported in Appendix B; effect sizes are reported in descending order. The average weighted effect size was moderate and negative, but it was not statistically significant (fixed-effects: d = −0.31, 95% CI [−0.63, 0.02]; Q (3) = 0.38, p = .94, I² < 1%; see Table 1). This outcome was expected given that the two groups were matched on reading comprehension performance. An Egger test of funnel plot asymmetry was non-significant, z = −0.13, p = .90 (see Figure 1).

Thirty comparisons were made between the oral language skills of children with SCD and children within comprehension-age matched groups (Study n = 4). There were 73 children with SCD ( M = 18.25; SD = 7.23; range: 14-29) and 68 typically-developing children across studies ( M = 17.00; SD = 6.78; range: 14-27). The average weighted effect size was moderate and in favor of comprehension age-matched readers, but it was not statistically significant (random-effects robust variance estimation: d = 0.32, 95% CI [−0.49, 1.14]). These findings suggest that the oral language performance of children with SCD is similar to that of younger typical readers; in other words, there is a developmental delay in the oral language skills of children with SCD. Study-specific effect sizes for oral language, participant ages, and sample sizes for these comparisons are reported in Appendix B; effect sizes are reported in descending order. Across studies, the variability due to heterogeneity was relatively high, I² = 77.13 (see Table 1). Sensitivity analyses indicated that the observed effect size was quite robust; varying values of ρ produced a minimal difference of .02 in τ² (τ² = .402 when ρ = 0; .423 when ρ = 1). However, because the degrees of freedom for these analyses were less than four, these results should be interpreted cautiously (Fisher & Tipton, 2015). An Egger test of funnel plot asymmetry was non-significant, z = −0.71, p = .48 (see Figure 1).

Within-child comparisons of reading comprehension and oral language for children with SCD

In addition to comparing the language and literacy skills of children with SCD to those of typically-developing readers and comprehension age-matched readers, we also compared oral language skills to reading comprehension within children who have SCD. The aim of this meta-analysis was to test the robustness of the results (i.e., whether the same pattern of findings would emerge if comparisons were made for the same group of children [within-child] rather than across different groups [between-group]).

Ninety-seven comparisons were included within the analysis (Study n = 32). There were 12,711 children with SCD ( M = 397.22; SD = 822.21; range: 7-2,830). Because these analyses included only children with SCD, we corrected correlations for range restriction using Thorndike's (1949) correction equation. 5 The average weighted effect size was moderate, negative, and statistically significant (random-effects robust variance estimation: d = −0.84, 95% CI [−1.06, −0.62]), indicating that the reading comprehension skills of children with SCD are significantly weaker than their oral language skills. These results can be found in Table 1. Study-specific effect sizes, participant ages, and sample sizes for these comparisons are reported in Appendix C; effect sizes are reported in descending order. Across studies, the variability due to heterogeneity was substantial, I² = 96.06. However, sensitivity analyses indicated that the observed effect size was fairly robust; varying values of ρ resulted in no difference in estimates of τ². An Egger test of funnel plot asymmetry was non-significant for these comparisons, z = 1.33, p = .18 (see Figure 1).
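Thorndike's (1949) correction is not reproduced in the text; assuming the commonly used Case 2 formula for direct range restriction, the computation can be sketched as follows (illustrative Python, not the authors' code):

```python
import math

def thorndike_case2(r_restricted, u):
    """Thorndike's (1949) Case 2 correction for direct range restriction.
    r_restricted: correlation observed in the restricted (e.g., SCD-only) sample;
    u: ratio of the unrestricted to the restricted standard deviation on the
    selection variable (u > 1 when the sample's range has been narrowed)."""
    num = r_restricted * u
    den = math.sqrt(1 - r_restricted**2 + (r_restricted * u) ** 2)
    return num / den
```

Because selecting only children with SCD narrows the range of comprehension scores, the observed correlations are attenuated; the correction scales them back up toward what would be expected in an unrestricted sample, and reduces to the identity when u = 1.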

It is important to note that different sets of studies were included within our between-group and within-child comparisons. This may explain why the difference between reading comprehension and oral language performance within children ( d = −0.84) was not equivalent to the difference found between groups for reading comprehension and oral language (the difference between −2.78 and −0.78 is −2.00). We tested this empirically by analyzing only those studies that were included in both the between-group reading comprehension (random-effects robust variance estimation: d = −2.73, 95% CI [−3.05, −2.42]; I² = 96.82) and oral language comparisons (random-effects robust variance estimation: d = −0.95, 95% CI [−1.06, −0.83]; I² = 91.00) and the within-child comparisons. Applying this method produced a noticeable reduction in the effect size differences across comparisons (the difference between −2.73 and −0.95 is −1.78). This outcome may be partially due to the absence of publication bias within the within-child comparisons relative to the potential presence of publication bias within the between-group reading comprehension and oral language comparisons.

Moderator Analyses

Metaregressions of study type, age, and oral language measures for comparisons of children with SCD to typical readers.

Due to the substantial amount of heterogeneity across studies, we examined three possible moderators that may explain effect size differences across studies: age, type of oral language measure, and study type (i.e., published journal article, book chapter, thesis/dissertation, or unpublished data; see Table 1 and Appendices D and E). Because of the dependency of effect sizes across studies, we used robust variance estimation to conduct these moderator analyses.

Study type, β = .14, p > .05, t (11.8) = 1.05, was not a significant moderator of effect size differences in reading comprehension for comparisons of children with SCD to typical readers. However, age, β = −.47, p < .05, t (23.9) = −2.53, was a significant moderator. Next, we examined moderators for the oral language comparisons. Neither study type nor age was a significant moderator of effect size differences for oral language: β = −.04, p > .05, t (17) = −0.77 for study type; β = −.06, p > .05, t (20.1) = −0.85 for age. Because oral language was assessed using different measures across studies, we also conducted a metaregression to examine whether type of oral language measure moderated effect size outcomes. Because type of oral language measure varied both within and across studies, it is important to include both the study mean (i.e., a between-study covariate) and mean-centered values (i.e., a within-study covariate) within the moderator analyses to account for the potentially hierarchical structure of the effect size dependencies (Fisher & Tipton, 2015). Using this method, type of oral language measure was not a significant moderator of effect size across studies, β_m = −.05, p > .05, t (16.5) = −0.91; β_mc = .00, p > .05, t (16.9) = 0.02.
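The splitting of a covariate into a between-study part (the study mean) and a within-study part (mean-centered values), as recommended by Fisher and Tipton (2015), can be sketched as follows (illustrative Python; the actual metaregressions were run in Robumeta):

```python
def split_covariate(values_by_study):
    """Split a per-effect covariate into a between-study component (each
    study's mean, constant within the study) and a within-study component
    (each value minus its study mean), for metaregression with
    hierarchically dependent effect sizes.
    values_by_study: list of lists, one inner list per study."""
    between, within = [], []
    for study in values_by_study:
        m = sum(study) / len(study)
        between.append([m] * len(study))         # between-study covariate
        within.append([v - m for v in study])    # within-study covariate
    return between, within
```

The within-study component sums to zero inside each study by construction, so the two components separate cleanly into the β_m and β_mc coefficients reported in the text.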

Metaregressions of study type, age, and oral language measures for comparisons of children with SCD to comprehension-age matched readers

We also examined potential moderators within our comprehension-age matched comparisons (see Table 2). Similar to our between-group comparisons, type of oral language measure, β_m = −.10, p > .05, t (1.08) = −0.18; β_mc = −.23, p > .05, t (1.20) = −1.05, was not a significant moderator of effect size for the oral language comparisons. 6 However, because the degrees of freedom were less than four, this finding should be interpreted cautiously. Study type and the age range of participants were constant across studies, negating the need to conduct moderator analyses for these constructs for the reading comprehension and oral language comparisons.

Table 2

SCD and typical readers
  Reading comprehension
    Study type: β = .14, t = 1.05
    Age: β = −.47, t = −2.53
  Oral language
    Study type: β = −.04, t = −0.77
    Age: β = −.06, t = −0.85
    Oral language measure (mean): β = −.05, t = −0.91
    Oral language measure (mean-centered): β = .00, t = 0.05

SCD and comprehension-age match
  Reading comprehension
    Age: constant across studies
    Study type: constant across studies
  Oral language
    Age: constant across studies
    Study type: constant across studies
    Oral language measure (mean): β = −.10, t = −0.18
    Oral language measure (mean-centered): β = −.23, t = −1.05

SCD only
  Reading comprehension, oral language
    Study type: β = −.24, t = −2.77
    Age: β = −.00, t = −0.02
    Oral language measure (mean): β = .20, t = 2.35
    Oral language measure (mean-centered): β = −.03, t = −0.85

Note. SCD = Children with specific reading comprehension deficits; β = metaregression coefficient; t = t statistic.

Metaregressions of study type, age, and oral language measures for within-child comparisons

We examined the moderators of study type, age, and oral language measure for the within-child comparisons as well; these results are summarized in Table 2. Study type was a significant predictor of effect size differences, β = −.24, p < .01, t (15.3) = −2.77. Type of oral language measure was a significant predictor at the mean, β_m = .20, p < .01, t (15.40) = 2.35, but not when mean-centered, β_mc = −.03, p > .05, t (8.30) = −0.85. Age, however, was not a significant predictor in the model, β = −.00, p > .05, t (12.9) = −0.02.

The aim of the present meta-analysis was to determine the nature of the comprehension problems of children with SCD. The investigation was guided by three competing hypotheses: (a) children with SCD have comprehension deficits that are specific to reading; (b) children with SCD have comprehension deficits that are general to reading and oral language; or (c) children with SCD have comprehension problems that extend beyond reading but are more severe for reading than for oral language. The findings support the third hypothesis. Children's weakness in oral language was substantial ( d = −0.78) but not as severe as their deficit in reading comprehension ( d = −2.78). The effect size estimates for oral language were comparable regardless of whether verbal working memory was included in the analysis ( d = −0.77). Within-child comparisons also indicated that performance in reading comprehension was worse than in oral language ( d = −0.84). The pattern of poorer performance in reading comprehension compared to oral language was consistent across all analyses.

When compared to comprehension age-matched readers, children with SCD tended to have comparable oral language ( d = 0.32, ns ) and reading comprehension skills ( d = −0.31, ns ). The fact that older children with SCD did not differ from younger normal readers on reading comprehension was expected rather than informative because the groups were matched on reading comprehension. However, the fact that they did not differ in oral language is informative. It supports the idea that the oral language weaknesses for children with SCD are best characterized as arising from a developmental delay as opposed to a developmental deviance ( Francis et al., 1996 ). A developmental deviance would have been supported had the oral language performance of the older children with SCD been worse than that of the younger comprehension-age matched normal readers.

Overall, our results are consistent with previous investigations. Children with SCD perform poorly on a range of oral language assessments including receptive and expressive vocabulary knowledge, listening comprehension, story structure, knowledge of idioms, awareness of syntactic structure, and morphological awareness among others ( Cain, 2003 ; Cain, 2006 ; Cain & Oakhill, 1996 ; Cain et al., 2005 ; Carretti et al., 2014 ; Nation & Snowling, 2000 ; Oakhill et al., 1986 ; Stothard & Hulme, 1996 ; Tong et al., 2011 , 2014 ; Yuill & Oakhill, 1991 ). These weaknesses emerged despite children's adequate decoding and seemingly intact phonological processing abilities ( Nation & Snowling, 2000 ; Nation et al., 2007 ; Stothard & Hulme, 1992 ). Yet, this pattern makes sense given that phonological processing appears to underlie decoding ability ( Nation et al., 2007 ; Shankweiler et al., 1999 ; Stothard & Hulme, 1996 ).

Explanations for Greater Deficits in Reading Comprehension than in Oral Language

A number of possible explanations for the observed discrepancies between reading comprehension and oral language exist. Although it is not possible to test alternative explanations in the context of the present meta-analysis, they could be tested in future studies.

A latent decoding deficit

At first glance, it seems counterintuitive that a decoding deficit would explain comprehension differences in children with SCD. However, in several studies, only decoding accuracy was used to categorize children (e.g., Cain & Oakhill, 2006 ). It is possible to be adequate in decoding accuracy yet inadequate in decoding fluency. In fact, this is a common outcome of intervention studies (e.g., de Jong & van der Leij, 2003; Torgesen & Hudson, 2006 ). The effortful application of phonics rules or other decoding strategies can result in accurate but slow decoding. This could impair reading comprehension because children's reading would be less automatic ( LaBerge & Samuels, 1974 ) and/or because fewer cognitive resources would be available for comprehension (e.g., Perfetti, 1985 ). This possible explanation could be tested in future studies by using measures of decoding fluency as well as accuracy. A dual-task paradigm could also be used to determine whether the cognitive resources required by decoding were comparable for children with and without SCD.

Differences between written and oral language

Written language differs from oral language in important ways (Perfetti et al., 2005). Written language often contains more complex sentence structures and more difficult vocabulary than spoken language (Akinnaso, 1982; Halliday, 1989). Thus, if children have difficulty completing tasks that require the use of syntactic knowledge, for instance, they will most likely have difficulty reading grammatically complex texts. Fundamental differences between written and spoken text may also extend to increased demands on background knowledge (e.g., Wolfe & Woodwyck, 2010). Background knowledge has been identified as a critical component within several models of reading comprehension (Kintsch, 1988; Kintsch & van Dijk, 1983; Rumelhart, 1980). For instance, Kintsch and van Dijk's (1983) situation model describes the comprehension process as arising from an interaction of three mental representations: the reader's text representation, a semantic or meaning-based representation, and a situational representation (i.e., prior knowledge, experiences, and interest).

There is also empirical evidence for the importance of background knowledge in reading comprehension (e.g., Stahl, Hare, Sinatra, & Gregory, 1991). This may explain why children with SCD also have problems with elaborative inference making and comprehension monitoring (Cain et al., 2001; Oakhill, 1984, 1993; Oakhill & Yuill, 1996). Further, differences in the amount of background knowledge required across oral language and reading comprehension tasks may explain the present pattern of skill deficits. This explanation could be tested in future studies by having children perform reading comprehension and listening comprehension tasks on identical passages, with the tasks counterbalanced across two groups. However, deficits in background knowledge may not sufficiently explain why children have SCD. In some instances, children with SCD continue to perform below expectations even after background knowledge is controlled (e.g., Cain & Oakhill, 1999; Cain et al., 2001).

Regression to the mean

Another potential explanation for the discrepancy between the reading comprehension and oral language skills of children with SCD is regression to the mean. Across studies, children were selected on the basis of poor reading comprehension. This design can lead to an over-representation of children whose observed reading comprehension score is below their true score. Consequently, they will regress to their true score on almost any subsequent measure that is correlated with the original measure. In the present context, children who were selected on the basis of poor reading comprehension may perform less poorly on oral language due to regression to the mean. Future studies could test this hypothesis by administering a second reading comprehension measure and then comparing performance on this measure to oral language. Using another design that does not involve selection based on poor reading comprehension performance would also be helpful to rule out this explanation.
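The selection artifact described above is easy to demonstrate with a small simulation. The sketch below is purely illustrative: the population size, score reliabilities, and 10% selection cutoff are arbitrary assumptions, not values drawn from any study in the meta-analysis. Children selected on an extreme first measurement score less extremely, on average, on a second parallel measurement.

```python
import numpy as np

rng = np.random.default_rng(42)

n = 100_000
true_score = rng.normal(0.0, 1.0, n)          # latent reading comprehension
# Two noisy, parallel reading comprehension measurements of the same children
rc_test_1 = true_score + rng.normal(0.0, 0.6, n)
rc_test_2 = true_score + rng.normal(0.0, 0.6, n)

# Select "poor comprehenders" using test 1 only (bottom 10%)
cutoff = np.quantile(rc_test_1, 0.10)
selected = rc_test_1 <= cutoff

mean_t1 = rc_test_1[selected].mean()
mean_t2 = rc_test_2[selected].mean()
print(f"Selected group, test 1 mean: {mean_t1:.2f}")
print(f"Selected group, test 2 mean: {mean_t2:.2f}")  # less extreme on retest
```

Because the selected group's test-1 scores contain an over-representation of negative measurement error, their scores on any correlated second measure (a retest, or an oral language task) drift back toward the population mean even though nothing about the children changed.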

Theoretical and Practical Implications of the Findings

We began this article with a review of theories of reading comprehension. We now consider the implications of our results for those theories, beginning with the simple view of reading framework (Gough & Tunmer, 1986; Hoover & Gough, 1990). According to this framework, reading comprehension is the product of decoding and oral language comprehension. Our results are not consistent with the common version of the simple view in which reading comprehension is predicted by additive effects (i.e., main effects) of decoding and oral language comprehension. If the simple view is operationalized as the interaction (i.e., multiplicative effects) between decoding and oral language comprehension, however, the results could be considered consistent with this framework. Essentially, the oral language deficit of children with SCD interacts with their decoding to produce reading comprehension that is more impaired than would be accounted for by the simple main effects. This same logic would apply to interactive activation models of reading to the extent that the interactive activation is truly interactive.
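The additive-versus-multiplicative distinction can be made concrete with a toy simulation. The sketch below uses invented data, not a reanalysis of any study reviewed here: it generates comprehension scores under the multiplicative simple view, RC = D × LC, and shows that a purely additive regression underfits relative to a model that includes the interaction term.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

decoding = rng.uniform(0.0, 1.0, n)   # proportion-correct-like scores
oral_lang = rng.uniform(0.0, 1.0, n)
# Generate comprehension under the multiplicative simple view: RC = D x LC
reading_comp = decoding * oral_lang + rng.normal(0.0, 0.05, n)

def r_squared(X, y):
    """R^2 from an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

ones = np.ones(n)
additive = np.column_stack([ones, decoding, oral_lang])
interactive = np.column_stack([ones, decoding, oral_lang, decoding * oral_lang])

r2_additive = r_squared(additive, reading_comp)
r2_interactive = r_squared(interactive, reading_comp)
print(f"Additive model R^2:    {r2_additive:.3f}")
print(f"Interactive model R^2: {r2_interactive:.3f}")
```

Under a multiplicative data-generating process, the main-effects model leaves systematic variance unexplained; the interaction term captures the pattern in which a low score on either component disproportionately depresses comprehension.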

As is emphasized by the simple view and interactive models of reading comprehension, oral language is a critical component of reading comprehension. This assertion is supported by the current findings and previous studies (Kendeou et al., 2009; Roth et al., 2002). For instance, two studies included within the present meta-analysis, Catts et al. (2006) and Nation et al. (2004), found that a substantial portion of children identified as having specific language impairment (SLI) also have coexisting reading comprehension difficulties. In both investigations, roughly one-third (about 30%) of children with SCD were eligible for SLI identification. Even children who were not identified as having SLI showed subclinical levels of poor language comprehension (Catts et al., 2006). Children with SCD performed very poorly on the vocabulary measure and the grammatical understanding task. Catts et al. and Nation et al. referred to this subclinical poor language comprehension as hidden language impairment because these children are not typically classified as having SLI. Yet, these impairments could still potentially lead to the comprehension problems observed in these children.

If we allow for the possibility of a latent decoding problem, then nearly all of the theories of reading comprehension could account for the pattern of results that was obtained. Similarly, if we allow for the possibility of differences between written and oral language, the results would be consistent with multiple theories of reading. It will be important to carry out research to determine the best explanation for the pattern of a greater deficit in reading comprehension than in oral language, because the outcome of this research will have implications for theories of reading. For example, if the pattern of a greater deficit in reading comprehension than in oral language is found when (a) groups are matched on decoding fluency as well as accuracy, (b) the reading and oral language tasks use equivalent material, and (c) the study design eliminates the possible confound of regression to the mean, the results would only be consistent with a theory of reading that includes an interactive component in addition to whatever main effects might be represented.

The implications for practice are threefold. First, the results suggest that early oral language measures may serve as a means of identifying children who are at risk for later reading comprehension problems (Cain & Oakhill, 2011; Justice et al., 2013; Kendeou et al., 2009; Nation & Snowling, 2004; Nation et al., 2010; Roth et al., 2002). Oral language weaknesses in children with SCD are evident fairly early, are relatively stable over time, and are predictive of future reading comprehension performance (e.g., Cain & Oakhill, 2011; Justice et al., 2013; Nation et al., 2010). Thus, oral language measures can potentially serve as a screening method to identify which children have weaknesses in language skills. However, this must be approached cautiously because not all oral language measures are equally predictive of a child's future reading comprehension status. For instance, Tong et al. (2011) gave children tasks assessing derivational morphological awareness. Performance on these tasks in Grade 3 did not significantly differentiate children with SCD from those with normal reading comprehension in Grade 5, yet performance on the same tasks in Grade 5 did differentiate the two groups. This suggests that measures of derivational morphological awareness, for instance, may not be ideal for assessing early oral language skills (see Nippold & Sun, 2008) and that this should be considered when selecting potential screening measures.

Second, the findings suggest that children with deficits in critical oral language skills should receive targeted oral language instruction and intervention. Intervention studies focusing specifically on children with SCD have indicated that interventions containing an oral language component are more effective. For example, Clarke, Snowling, Truelove, and Hulme (2010) randomly assigned 8- and 9-year-olds with SCD to one of three interventions: text comprehension training, oral language training (without reading or writing), or a combined text comprehension and oral language training format. All three groups showed reliable and statistically significant gains in reading comprehension compared to the control group; however, the group that received the oral language training maintained the greatest gains at an 11-month follow-up (for a review, see Snowling & Hulme, 2012). These outcomes align with the findings of the present review. Thus, classroom instruction and intervention that incorporate elements that encourage comprehension proficiency, such as reading fluency (NICHD, 2000) and oral language (Snow et al., 1998), will likely be more effective at remediating reading comprehension difficulties.

Third, the current investigation highlights the need to develop a consistent operational definition of SCD (see Rønberg & Petersen, 2015). Across the studies included in the present investigation, children with SCD were identified in multiple ways. Differences in identification criteria are potentially problematic because they can lead to over- or under-identification, and they can result in different groups of children being identified as having SCD over time. Yet, variability in identification criteria is not exclusive to the present population of poor readers; there remains much discourse about this issue more broadly within the field of learning disabilities (Mellard, Deshler, & Barth, 2004).

Limitations and Future Directions

There are several limitations of the present meta-analysis that must be acknowledged. First, the present review focused specifically on monolingual school-age children; consequently, the results may not apply to second-language learners or adult populations. Second, several studies included in the present review used the Neale Analysis of Reading to assess reading comprehension and decoding ability without incorporating an additional measure of either skill. This is potentially problematic because both decoding and comprehension scores are obtained simultaneously as children read passages, so decoding problems could affect comprehension scores (see Spooner et al., 2004). Third, we did not examine the effect of IQ on the obtained effect size estimates; variability in IQ may have affected the effect-size outcomes. Fourth, it is important to acknowledge the potential presence of some publication bias in the between-group comparisons of reading comprehension and oral language, which may contribute to the larger deficits observed between these skills.

Another limitation of this meta-analysis is that it does not address possible causal relations between the deficits in oral language and reading comprehension. It is certainly possible that poor oral language skills contribute to the deficits in reading comprehension; children must know a substantial portion of the words in a text in order to comprehend it (Hu & Nation, 2000; Kendeou et al., 2009). However, it is also possible that poor reading comprehension constrains future vocabulary growth because text reading provides a basis for incidental word learning (Cain et al., 2004). These relations may also be reciprocal (e.g., Wagner, Muse, & Tannenbaum, 2007). Additionally, the general absence of longitudinal data did not allow for a more comprehensive examination of the developmental delay versus deficit hypotheses. A final limitation of the present study is that it was limited to children who were monolingual speakers of their native language. It is increasingly common for children to know more than one language. Would the results of the present meta-analysis generalize to children who are second-language learners? We addressed this question by carrying out a similar meta-analysis of children with poor reading comprehension yet adequate decoding who were second-language learners (Authors, 2017). Sixteen studies were identified that met the inclusionary and exclusionary criteria. Hedges' g was used as the effect-size measure, random-effects models were used, and robust variance estimation was used to correct significance testing for dependent effect sizes. The results were remarkably consistent with those of the present meta-analysis. A deficit in oral language was replicated, with an average weighted effect size of −0.80. The pattern of the deficit in oral language being only about a third as large as the deficit in reading comprehension was also replicated, with an average weighted effect size for reading comprehension of −2.47.
In summary, the pattern of results found in the present meta-analysis of studies whose participants were monolingual children appears to generalize to children who are second-language learners.
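For reference, Hedges' g, the effect-size metric used in both meta-analyses, can be computed from group summary statistics as Cohen's d multiplied by a small-sample bias correction. The sketch below uses invented group statistics for illustration; it is not a recomputation of any effect size reported above.

```python
import math

def hedges_g(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Hedges' g: Cohen's d with the small-sample bias correction J."""
    pooled_sd = math.sqrt(
        ((n_1 - 1) * sd_1**2 + (n_2 - 1) * sd_2**2) / (n_1 + n_2 - 2)
    )
    cohens_d = (mean_1 - mean_2) / pooled_sd
    correction_j = 1.0 - 3.0 / (4.0 * (n_1 + n_2) - 9.0)
    return correction_j * cohens_d

# Hypothetical SCD vs. typical-reader comparison on a standardized test
g = hedges_g(mean_1=85.0, sd_1=15.0, n_1=20, mean_2=100.0, sd_2=15.0, n_2=20)
print(f"Hedges' g = {g:.2f}")  # -0.98
```

With the small samples typical of SCD studies (often 10 to 20 children per group), the correction J noticeably shrinks the raw standardized mean difference, which is why g rather than d is preferred in meta-analyses like this one.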

In conclusion, children who have SCD are typically impaired in oral language, but not to the degree they are impaired in reading comprehension. Consequently, the oral language impairment is not sufficient to explain the impairment in reading comprehension. Possible explanations for this pattern of results were considered, including a latent decoding deficit, differences between written and oral language, regression to the mean, and interactive effects. Testing these alternative explanations and others that might be considered represents a critical next step to advance our understanding of an important problem in reading.

Acknowledgments

This research was supported by Grant Numbers P50 HD52120 and 1F31HD087054-01 from the National Institute of Child Health and Human Development, Grant Numbers R305F100005 and R305F100027 from the Institute of Education Sciences, and Predoctoral Interdisciplinary Training Grant Number R305B090021 from the Institute of Education Sciences.

Appendix A. 

Study descriptions and effect size estimates for children with specific reading comprehension deficits and typical readers (Study n = 86).

[Table omitted: the study author column was lost during text extraction, so individual rows can no longer be attributed and the remaining columns are fused. For each of the 86 studies, the original table reported participant age or grade, sample sizes (SCD; TR), reading comprehension effect-size estimates, and oral language effect-size estimates.]

Note. RC = Reading comprehension; OL = Oral language; SCD = Children with specific reading comprehension deficits; TR = Typical readers.

Appendix B. 

Study descriptions and effect sizes for children with specific reading comprehension deficits compared with comprehension-age matched readers (Study n = 4).

[Table omitted: study author names were lost during text extraction. For each of the 4 studies, the original table reported participant age, sample sizes (SCD; comprehension-age matched readers), reading comprehension effect sizes, and oral language effect sizes.]

Appendix C. 

Study descriptions and effect size estimates for within-child comparisons (Study n = 32).

[Table omitted: study author names were lost during text extraction. For each of the 32 studies, the original table reported participant age or grade, sample size, and within-child effect-size estimates.]

Appendix D. 

Coding scheme for study type, participant age, and type of oral language measure.

Study Type
  Journal article = 1
  Thesis/Dissertation = 2
  Unpublished data = 3
  Book chapter = 4

Age
  Predominantly including participants who were 6 years and below = 1
  Predominantly including participants who were 6 to 8/9 years (if including years 7-9) = 2
  Predominantly including participants who were 8/9 years and above (if including years 10-12) = 3

Oral Language
  Vocabulary = 1
  Narrative (e.g., storytelling) = 2
  Listening Comprehension = 3
  Syntactic/Grammar = 4
  Semantic Knowledge (e.g., morphological awareness) = 5
  Figurative Language (e.g., idioms, expressions) = 6
  Verbal working memory (e.g., verbal recall) = 7

Appendix E. 

Types of oral language skills assessed across studies (Study n = 86).

[Table omitted: most study author names were lost during text extraction, so the rows can no longer be attributed. For each of the 86 studies, the original table indicated with an X which types of oral language measures were assessed: vocabulary knowledge, narrative skills, listening comprehension, syntax/grammar, semantic knowledge, figurative language, and verbal working memory.]

Note. For some studies, oral language was assessed but not explicitly reported.

2 For some comparisons, the comparison group included skilled comprehenders.

3 Although groups were matched, correlations for the same measure between the two groups were not reported in most instances; thus, independent effect sizes were calculated.

4 Although groups were matched, correlations for the same measure between the two groups were not reported in most instances; thus, independent effect sizes were calculated.

5 In several instances, studies did not report correlations. For these studies, an estimated correlation was substituted.

6 We also conducted moderator analyses for type of oral language measure without accounting for hierarchical structure and the results remained the same [β = −0.31, p > .05, t(1.40) = −0.98].

References marked with an asterisk indicate studies included in the meta-analysis. The in-text citations to studies selected for meta-analysis are not preceded by asterisks.


An IERI – International Educational Research Institute Journal

  • Open access
  • Published: 28 October 2021

The achievement gap in reading competence: the effect of measurement non-invariance across school types

  • Theresa Rohm (ORCID: orcid.org/0000-0001-9203-327X),
  • Claus H. Carstensen,
  • Luise Fischer &
  • Timo Gnambs

Large-scale Assessments in Education, volume 9, Article number: 23 (2021)


After elementary school, students in Germany are separated into different school tracks (i.e., school types) with the aim of creating homogeneous student groups in secondary school. Consequently, the development of students’ reading achievement diverges across school types. Findings on this achievement gap have been criticized as depending on the quality of the administered measure. Therefore, the present study examined to what degree differential item functioning affects estimates of the achievement gap in reading competence.

Using data from the German National Educational Panel Study, reading competence was investigated across three timepoints during secondary school: in grades 5, 7, and 9 ( N  = 7276). First, using the invariance alignment method, measurement invariance across school types was tested. Then, multilevel structural equation models were used to examine whether a lack of measurement invariance between school types affected the results regarding reading development.

Our analyses revealed some measurement non-invariant items that did not alter the patterns of competence development found among school types in the longitudinal modeling approach. However, misleading conclusions about the development of reading competence in different school types emerged when the hierarchical data structure (i.e., students being nested in schools) was not taken into account.

Conclusions

We assessed the relevance of measurement invariance and accounting for clustering in the context of longitudinal competence measurement. Even though differential item functioning between school types was found for each measurement occasion, taking these differences in item estimates into account did not alter the parallel pattern of reading competence development across German secondary school types. However, ignoring the clustered data structure of students being nested within schools led to an overestimation of the statistical significance of school type effects.

Introduction

Evaluating measurement invariance is a premise for the meaningful interpretation of differences in latent constructs between groups or over time (Brown, 2006). Assessing measurement invariance ensures that observed changes reflect true change rather than differences in the interpretation of items. The present study investigates measurement invariance between secondary school types for student reading competence, which is a cornerstone of learning. Reading competence develops in secondary school from reading simple texts, retrieving information, and making inferences from what is explicitly stated, up to fluent reading of longer and more complex texts and inferring what is not explicitly stated in the text (Chall, 1983). In particular, students' reading competence is essential for the comprehension of educational content in secondary school (Edossa et al., 2019; O'Brien et al., 2001). Reading development is often investigated either from a school-level perspective or by focusing on individual-level differences. When taking a school-level perspective on reading competence growth within the German secondary school system, the high degree of segregation after the end of primary school must be considered. Most students are separated into different school tracks on the basis of their fourth-grade achievement level to obtain homogeneous student groups in secondary school (Köller & Baumert, 2002). This homogenization based on proficiency levels is supposed to optimize teaching and education by accounting for students' preconditions, enhancing learning for all students (Baumert et al., 2006; Gamoran & Mare, 1989). Consequently, divergence in competence attainment already exists at the beginning of secondary school and might increase across the school tracks over the school years.
Previous studies comparing reading competence development between different German secondary school types have presented ambiguous results, finding either a comparable increase in reading competence (e.g., Retelsdorf & Möller, 2008; Schneider & Stefanek, 2004) or a widening gap between upper, middle, and lower academic school tracks (e.g., Pfost & Artelt, 2013) for the same schooling years. Increasing performance differences in reading over time are termed "Matthew effects", after the biblical analogy of the rich getting richer and the poor getting poorer (e.g., Bast & Reitsma, 1998; Walberg & Tsai, 1983). The Matthew effect hypothesis was first used in the educational context by Stanovich (1986) to examine individual differences in reading competence development. Besides the widening pattern described by the Matthew effect, parallel or compensatory patterns of reading development are also possible. Development is parallel when groups initially diverge in reading competence and then increase at similar rates over time. A compensatory pattern describes development in which initially diverging reading competence between groups converges over time.

Moreover, findings on the divergence in competence attainment have been criticized as being dependent on the quality of the measurement construct (Pfost et al., 2014; Protopapas et al., 2016). More precisely, the psychometric properties of the administered tests, such as the measurement (non-)invariance of items, can distort individual- or school-level differences. A core assumption of many measurement models is comparable item functioning across groups, meaning that differences between item parameters are zero across groups or, in the case of approximate measurement invariance, approximately zero. In practice, this often holds for only a subset of items; partial invariance can then be applied, where some item parameters (i.e., intercepts) are held constant across groups while others are freely estimated (Van de Schoot et al., 2013). Using data from the German National Educational Panel Study (NEPS; Blossfeld et al., 2011), we focus on school-level differences in reading competence across three timepoints. We aim to examine the degree to which measurement non-invariance distorts comparisons of competence development across school types. We therefore compare a model that assumes partial measurement invariance across school types with a model that ignores differences in item estimates between school types. Finally, we demonstrate the need to account for clustering (i.e., students nested in schools) in longitudinal reading competence measurement when German secondary school types are compared.

School segregation and reading competence development

Ability tracking of students can take place within schools (e.g., differentiation through course assignment, as in U.S. high schools) or between schools, with curricular differentiation between school types and distinct learning certificates offered by each school track, as in the German case (Heck et al., 2004; LeTendre et al., 2003; Oakes & Wells, 1996). The curricula of the different school types are tailored to the prerequisites of the students and provide different learning opportunities. German students are assigned to school types based on primary school recommendations that take fourth-grade performance into account, though factors such as support within the family are also considered (Cortina & Trommer, 2009; Pfost & Artelt, 2013; Retelsdorf et al., 2012). Nevertheless, this recommendation is not equally binding across German federal states, leaving room for parents to decide on their children's school track. Consequently, student achievement in secondary school is associated not only with students' cognitive abilities but also with their social characteristics and family background (Baumert et al., 2006; Ditton et al., 2005). This explicit between-school tracking after fourth grade has consequences for students' attainment of reading competence in secondary school.

There might be several reasons why different trajectories of competence attainment are observed in the tracked secondary school system (Becker et al., 2006). First, students might already differ in their initial achievement and learning rates at the beginning of secondary school. This is related to curricular differentiation, as early separation aims to create student groups that are homogeneous in proficiency and, in effect, to enhance learning for all students by providing targeted learning opportunities (Baumert et al., 2003; Köller & Baumert, 2002; Retelsdorf & Möller, 2008). Hence, different learning rates are expected due to selection at the beginning of secondary school (Becker et al., 2006). Second, learning and teaching methods differ among the school tracks, as learning settings are targeted towards students' preconditions. Differences among school types relate to cognitive activation, the amount of teacher support in problem solving, and the demands placed on students (Baumert et al., 2003). Third, composition effects due to the different socioeconomic and ethnic compositions of schools can shape student achievement. Student achievement is determined not only by attending a particular school type but also by individual student characteristics, and the mixture of student characteristics might have decisive effects (Neumann et al., 2007). For example, average achievement rates and the characteristics of students' social backgrounds were found to affect competence attainment in secondary school beyond mere school track affiliation and individual characteristics (Baumert et al., 2006). Accordingly, schools of the same school type were found to differ greatly from each other in their attainment levels and social compositions (Baumert et al., 2003).

Findings from the cross-sectional Programme for International Student Assessment (PISA) studies, conducted on behalf of the OECD every three years since 2000, consistently show large differences between school tracks in reading competence for German students in ninth grade (Baumert et al., 2001, 2003; Nagy et al., 2017; Naumann et al., 2010; Weis et al., 2016, 2020). Students in upper academic track schools have, on average, higher reading achievement scores than students in the middle and lower academic tracks. Reading competence is highly correlated with other assessed competencies, such as mathematics and science, where these differences between school tracks hold as well.

A few studies have also examined between-school-track differences in the development of reading competence in German secondary schools, with most studies focusing on fifth and seventh grade in selected German federal states (e.g., Bos et al., 2009; Lehmann & Lenkeit, 2008; Lehmann et al., 1999; Pfost & Artelt, 2013; Retelsdorf & Möller, 2008). While some studies reported parallel developments in reading competence from fifth to seventh grade between school types (Retelsdorf & Möller, 2008; Schneider & Stefanek, 2004), others found a widening gap (Pfost & Artelt, 2013; Pfost et al., 2010). A widening gap between school types was also found for other competence domains, such as mathematics (Baumert et al., 2003, 2006; Becker et al., 2006; Köller & Baumert, 2001), while parallel developments were rarely observed (Schneider & Stefanek, 2004).

In summary, the processes of selection into secondary school, together with the social and ethnic origins of the students, might create different school milieus (Baumert et al., 2003). This has consequences for reading competence development during secondary school, which can follow a parallel, widening, or compensatory pattern across school types. The cross-sectional PISA study regularly indicates large differences among German school types in ninth grade but offers no insight into whether these differences already existed at the beginning of secondary school or how they developed throughout secondary school. Longitudinal studies, in comparison, have traced reading competence development through secondary school, but past studies were regionally limited and presented inconsistent findings on reading competence development among German secondary school types. In addition to differences in curricula, learning and teaching methods, students' social backgrounds, family support, and student composition, the manner in which competence development during secondary school is measured and analyzed might contribute to the observed pattern in reading competence development.

Measuring differences in reading development

A meaningful longitudinal comparison of reading competence between school types and across grades requires a scale with a common metric. More specifically, the relationship between the latent trait score and each observed item should not depend on group membership. The interpretability of scales has been questioned due to scaling issues (Protopapas et al., 2016). While item response theory (IRT) calibration is theoretically invariant, in practice it depends on the sample, item fit, and the equivalence of item properties (e.g., discrimination and difficulty) among test takers and compared groups. Hence, empirically discovered between-group differences might be confounded with the psychometric properties of the administered tests. For example, Pfost et al. (2014) concluded from a meta-analysis of 28 studies on Matthew effects in primary school (i.e., the longitudinally widening achievement gap between good and poor readers) that low measurement precision (e.g., constructs exhibiting floor or ceiling effects) is strongly linked with compensatory patterns in reading achievement. Consequently, changes measured with reading competence scores might depend on the quality of the measurement. Regarding competence development in secondary school, measurement precision is enhanced by accounting for measurement error, for the multilevel data structure, and for measurement invariance across groups. A biased measurement model might result when measurement error or the multilevel data structure is ignored, while the presence of differential item functioning (DIF) can be evidence of test-internal item bias. Moreover, statistical item bias might also contribute to test unfairness and, thus, invalid systematic disadvantages for specific groups (Camilli, 2006).

Latent variable modeling of reading competence, such as latent change models (Raykov, 1999; Steyer et al., 2000), can be advantageous compared to using composite scores. When composite scores represent latent competences, measurement error is ignored (Lüdtke et al., 2011). Hence, biased estimates might be obtained if the construct is represented by composite scores instead of a latent variable measured by multiple indicators with measurement error taken into account (Lüdtke et al., 2008). Investigating student competence growth in secondary school poses a further challenge, as the clustered structure of the data needs to be taken into account. This can be achieved, for example, using cluster robust standard error estimation or hierarchical linear modeling (cf. McNeish et al., 2017). If the school is the primary sampling unit, students are nested within schools and classes. Ignoring this hierarchical structure during estimation might result in inaccurate standard errors and biased significance tests: standard errors would be underestimated, and the statistical significance of the effects would in turn be overestimated (Finch & Bolin, 2017; Hox, 2002; Raudenbush & Bryk, 2002; Silva et al., 2019). As one solution, multilevel structural equation modeling (MSEM) takes the hierarchical structure of the data into account while allowing for the estimation of latent variables with dichotomous and ordered categorical indicators (Kaplan et al., 2009; Marsh et al., 2009; Rabe-Hesketh et al., 2007). Although explicitly modeling the multilevel structure (as compared to cluster robust standard error estimation) involves additional assumptions about the distribution and covariance structure of the random effects, it allows variance to be partitioned across hierarchical levels and permits cluster-specific inferences (McNeish et al., 2017).

Furthermore, regarding the longitudinal modeling of performance divergence, any interpretation of growth relies on the assumption that the same attributes are measured at all timepoints (Williamson et al., 1991) and that the administered instrument (e.g., reading competence test items) is measurement invariant across groups (Jöreskog, 1971; Schweig, 2014). The assumption of measurement invariance presupposes that all items discriminate comparably and are equally difficult across groups and timepoints, independent of group membership and measurement occasion. Hence, the item parameters of a measurement model have to be constant across groups, meaning that the probability of answering an item correctly should be the same for members of different groups and at different timepoints when they have equal ability levels (Holland & Wainer, 1993; Millsap & Everson, 1993). When an item parameter is not independent of group membership, DIF is present.
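This condition can be made concrete with a two-parameter logistic (2PL) item response function. The sketch below uses purely hypothetical item parameters: under invariance, two groups share the same discrimination and difficulty, so equally able students have equal success probabilities, whereas a group-specific difficulty shift produces uniform DIF.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response for ability theta,
    discrimination a, and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

theta = 0.5  # two students with the same ability, one from each group

# Invariant item: both groups share a = 1.2, b = 0.0 (hypothetical values).
p_group_1 = p_correct(theta, a=1.2, b=0.0)
p_group_2 = p_correct(theta, a=1.2, b=0.0)

# Uniform DIF: the same item is harder for group 2 (b shifted to 0.4),
# so equally able students no longer have equal success probabilities.
p_group_2_dif = p_correct(theta, a=1.2, b=0.4)
```

Non-uniform DIF would correspond to a group-specific discrimination a instead of, or in addition to, the difficulty shift.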

The aim of our study is to investigate the effects of measurement non-invariance among school types on the achievement gap in reading competence development in German secondary schools. Measurement invariance between secondary school types is investigated at each measurement occasion to test whether items are biased among the school types. We then embed the detected DIF into the longitudinal estimation of reading competence development between school types. A model with school-type-specific item discrimination and difficulty for items exhibiting non-invariance between school types is therefore compared to a model that does not consider these school-type specificities. To achieve measurement precision for this longitudinal competence measurement, we account for measurement error and the clustered data structure through multilevel latent variable modeling. Finally, we present the same models without consideration of the clustered data structure and compare school type effects on reading competence development.

Our goal is to investigate whether the longitudinal development of reading competence is sensitive to the consideration of measurement non-invariance between the analyzed groups and of the clustered data structure. This has practical relevance for all studies on reading competence development in which comparisons between school types are of interest and schools were the primary sampling unit. Such evaluations increase the certainty that observed changes between school types reflect true changes.

Sample and procedure

The sample consisted of N = 7276 German secondary school students, repeatedly tested and interviewed in 2010 and 2011 (grade 5), 2012 and 2013 (grade 7), and 2014 and 2015 (grade 9) as part of the NEPS. Approximately half of the sample was female (48.08%), and 25.46% had a migration background (defined as either the student or at least one parent born abroad). Please note that migration background was unequally distributed across school types: 22.1% of high school students, 26.9% of middle secondary school students, 38.5% of lower secondary school students, 31.2% of comprehensive school students, and 15.2% of students from schools offering all tracks of secondary education except the high school track had a migration background. In fifth grade, the students' ages ranged from 9 to 15 years (M = 11.17, SD = 0.54). Students were tested within their class context through written questionnaires and achievement tests. For the first timepoint in grade 5, immediately after students were assigned to different school tracks, a representative sample of German secondary schools was drawn using a stratified multistage sampling design (Aßmann et al., 2011). First, schools teaching at the secondary level were randomly drawn; second, two grade 5 classes were randomly selected within these schools. Five school types were distinguished and served as strata in the first step: high schools ("Gymnasium"), middle secondary schools ("Realschule"), lower secondary schools ("Hauptschule"), comprehensive schools ("Gesamtschule"), and schools offering all tracks of secondary education except the high school track ("Schule mit mehreren Bildungsgängen"). Schools were drawn from these strata proportional to their number of classes. Finally, all students of the selected classes for whom parental consent was obtained before panel participation were asked to take part in the study.
At the second measurement timepoint in 2012 and 2013, when students attended grade 7, a refreshment sample was drawn due to federal-state-specific differences in the timing of the transition to lower secondary education (N = 2170; 29.82% of the total sample). The sampling design of the refreshment sample resembled that of the original sample (Steinhauer & Zinn, 2016). The ninth-grade sample in 2014 and 2015, at the third measurement timepoint, was a follow-up survey of the students from regular schools in both the original and the refreshment sample. Students were tested at their schools, but N = 1797 students (24.70% of the total sample) had to be tested at at least one measurement timepoint through an individual follow-up within their home context. In both cases, the competence assessments were conducted by a professional survey institute that sent test administrators to the participating schools or households. For an overview of the students tested per measurement timepoint and school type, within the school or home context, as well as information on temporary and final sample attrition, see Table 1.

To group students into their corresponding school type, we used information on the survey wave in which the students were sampled (original sample in grade 5, refreshment sample in grade 7). Overall, most of the sampled students attended high schools (N = 3224; 44.31%), 23.65% attended middle secondary schools (N = 1721), 13.95% attended lower secondary schools (N = 1015), 11.96% attended schools offering all tracks of secondary education except the high school track (N = 870), and 6.13% attended comprehensive schools (N = 446). Altogether, the students attended 299 different schools, with a median of 24 students per school. Further details on the survey and the data collection process are presented on the project website (http://www.neps-data.de/).

Instruments

During each assessment, reading competence was measured with a paper-based achievement test, comprising 32 items in fifth grade, 40 items in seventh grade administered in easy (27 items) and difficult (29 items) booklet versions, and 46 items in ninth grade administered in easy (30 items) and difficult (32 items) booklet versions. The items were specifically constructed for the NEPS, and each item was administered only once (Krannich et al., 2017; Pohl et al., 2012; Scharl et al., 2017). Because memory effects might distort responses if items are repeatedly administered, the linking of the reading measurements in the NEPS is based on an anchor-group design (Fischer et al., 2016). With two independent link samples (one to link the grade 5 and grade 7 reading competence tests, the other to link the grade 7 and grade 9 tests), drawn from the same population as the original sample, a mean/mean linking was performed (Loyd & Hoover, 1980). In addition, the unidimensionality of the tests and the measurement invariance of the items across grade levels as well as relevant sample characteristics (i.e., gender and migration background) were demonstrated (Fischer et al., 2016; Krannich et al., 2017; Pohl et al., 2012; Scharl et al., 2017). Marginal reliabilities were good, at 0.81 in grade 5, 0.83 in grade 7, and 0.81 in grade 9.
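Under a Rasch-type model, the additive constant from such a mean/mean linking can be sketched as the difference between the mean item difficulties of the two forms, both calibrated on the same link sample. The difficulty values below are purely illustrative, not the actual NEPS parameters.

```python
def mean_mean_link(b_old_form, b_new_form):
    """Additive linking constant: the difference of the mean item
    difficulties of two test forms calibrated on a common link sample."""
    mean_old = sum(b_old_form) / len(b_old_form)
    mean_new = sum(b_new_form) / len(b_new_form)
    return mean_new - mean_old

# The (hypothetical) grade 7 form is on average 0.5 logits harder than
# the grade 5 form, so 0.5 is added when placing grade 7 scores on the
# grade 5 metric.
constant = mean_mean_link([-0.5, 0.0, 0.5], [0.0, 0.5, 1.0])
```

Because the constant is a single additive shift, it moves all scores on the new form by the same amount and leaves within-occasion group differences untouched.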

Each administered test consisted of five text types (domains: information, instruction, advertising, commenting, and literary text) with subsequent questions in either a simple or complex multiple-choice format or a matching response format. In addition, and unrelated to the five text types, the questions covered three types of cognitive requirements (finding information in the text, drawing text-related conclusions, and reflecting and assessing), which had to be activated to answer the respective question types. These dimensional concepts and question types are linked to the frameworks of other large-scale assessment studies, such as PISA (OECD, 2017) or the International Adult Literacy Survey (IALS/ALL; e.g., OECD & Statistics Canada, 1995). Further details on the reading test construction and development are presented by Gehrer et al. (2003).

Statistical analysis

We adopted the multilevel structural equation modeling framework for modeling student reading competence development and fitted a two-level factor model with categorical indicators (Kamata & Vaughn, 2010) to the reading competence tests. Each of the three measurement occasions was modeled as a latent factor. Please note that MSEM is the more general framework for fitting multilevel item response theory models (Fox, 2010; Fox & Glas, 2001; Kamata & Vaughn, 2010; Lu et al., 2005; Muthén & Asparouhov, 2012); each factor in our model therefore corresponds to a unidimensional two-parameter IRT model. The model setup was the same at the student and school levels: discrimination parameters (i.e., item loadings) were constrained to be equal at the within and between levels, while difficulty estimates (i.e., item thresholds) and item residual variances were estimated at the between (i.e., school) level. School type variables were included as binary predictors of latent abilities at the school level.

The multilevel structural equation models for longitudinal competence measurement were estimated using Bayesian MCMC methods in the Mplus software program (version 8.0; Muthén & Muthén, 1998–2020). Two Markov chains were run for each parameter, and chain convergence was assessed using the potential scale reduction (PSR; Gelman & Rubin, 1992) criterion, where values below 1.10 indicate convergence (Gelman et al., 2004). Successful convergence of the estimates was additionally evaluated based on trace plots for each parameter, and autocorrelation plots were inspected to determine whether the estimated models delivered reliable estimates. The mean of the posterior distribution and the Bayesian 95% credibility interval were used to evaluate the model parameters. Using the Kolmogorov–Smirnov test, the hypothesis that both MCMC chains have an equal distribution was evaluated with 100 draws from each of the two chains per parameter. For all estimated models, the PSR criterion (i.e., the Gelman and Rubin diagnostic) indicated that convergence was achieved, which was confirmed by a visual inspection of the trace plots for each model parameter.
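For a single parameter, the PSR criterion compares the between- and within-chain variances of the post-burn-in draws. A minimal sketch (the actual computation is handled internally by Mplus):

```python
from statistics import mean, variance

def psr(chains):
    """Potential scale reduction (Gelman & Rubin, 1992) for one parameter,
    given equally long MCMC chains of post-burn-in draws."""
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    w = mean(variance(c) for c in chains)  # within-chain variance
    b = n * variance(chain_means)          # between-chain variance
    var_hat = (n - 1) / n * w + b / n      # pooled variance estimate
    return (var_hat / w) ** 0.5

# Well-mixed chains yield PSR close to 1 (below the 1.10 cutoff);
# chains stuck in different regions of the posterior yield PSR >> 1.10.
```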

Diffuse priors were used: normal distributions with mean zero and infinite variance, N(0, ∞), for the intercepts, loading parameters, and regression slopes of continuous indicators; normal priors with mean zero and a variance of 5, N(0, 5), for the thresholds of categorical indicators; inverse-gamma priors IG(−1, 0) for residual variances; and inverse-Wishart priors IW(0, −4) for variances and covariances.

Model fit was assessed using the posterior predictive p-value (PPP), obtained through a fit statistic based on the likelihood-ratio \({\chi }^{2}\) test of an \({H}_{0}\) model against an unrestricted \({H}_{1}\) model, as implemented in Mplus. A low PPP indicates poor fit; fit is considered acceptable when PPP > 0.05, and an excellent-fitting model has a PPP value of approximately 0.5 (Asparouhov & Muthén, 2010).

Differential item functioning was examined using the invariance alignment method (IA; Asparouhov & Muthén, 2014; Kim et al., 2017; Muthén & Asparouhov, 2014). These models were estimated with maximum likelihood estimation using numerical integration, with the nested data structure taken into account through cluster robust estimation. The alignment optimization can be run either with one group fixed or with free estimation. As fixed alignment slightly outperformed free alignment in a simulation study (Kim et al., 2017), we applied fixed alignment and ran several models, fixing each of the five school types once. Item information for items exhibiting DIF between school types was then split between the respective non-aligning group and the remaining student groups. Hence, new pseudo-items were introduced for the models that take school-type-specific item properties into account.
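The pseudo-item construction can be sketched as follows: each item flagged as non-invariant is replaced by two pseudo-items, one scored only for the non-aligning school types and one for all other students, with the remaining entries set to missing. The function name and data layout here are hypothetical.

```python
def split_dif_item(responses, school_types, non_aligning):
    """Split one DIF item into two pseudo-items. Entries not belonging
    to a pseudo-item are set to None (treated as missing in estimation)."""
    pseudo_dif, pseudo_rest = [], []
    for resp, st in zip(responses, school_types):
        if st in non_aligning:
            pseudo_dif.append(resp)
            pseudo_rest.append(None)
        else:
            pseudo_dif.append(None)
            pseudo_rest.append(resp)
    return pseudo_dif, pseudo_rest
```

In the measurement model, each pseudo-item then receives its own discrimination and threshold parameters, which is how school-type-specific item properties enter the estimation.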

In the multilevel structural equation models, for the students selected as part of the refreshment sample at the second measurement timepoint, we treated the missing information from the first measurement occasion as missing completely at random (Rubin, 1987). Please note that student attrition from the seventh- and ninth-grade samples can be related to features of the sample, even though the multilevel SEM accounts for cases with missing values at the second and third measurement occasions. We fixed the latent factor intercept for the seventh- and ninth-grade assessments to the value of the respective link constant. The average changes in item difficulty relative to the original sample were computed from the link samples, yielding an additive linking constant for the overall sample. Please note that this (additive) linking constant does not change the relations among school type effects per measurement occasion.

Furthermore, we applied weighted effect coding to the school type variables, which is preferable to simple effect coding because the categorical variable school type has categories of different sizes (Sweeney & Ulveling, 1972; Te Grotenhuis et al., 2017). This procedure is advantageous for observational studies, as the data are not balanced, in contrast to data collected via experimental designs. First, we set the high school type as the reference category. Second, to obtain an estimate for this group, we re-estimated the model using middle secondary school as the reference category. Furthermore, we report Cohen's (1969) d effect size per school type estimate. This effect size was calculated as the difference between a school type's effect and the average of all other school type effects per measurement occasion, divided by the square root of the factor variance (hence the standard deviation) of the respective latent factor. For models accounting for the multilevel structure, the within- and between-level components of the respective factor variance were summed for the calculation of Cohen's d.
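Both steps can be sketched as follows, with hypothetical group labels, effect values, and variance components:

```python
def weighted_effect_codes(labels, reference):
    """Weighted effect coding for an unbalanced categorical predictor.
    Each non-reference category gets a column: 1 for its members,
    -n_category / n_reference for the reference group, 0 otherwise,
    so every column sums to zero over all observations."""
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    categories = [c for c in counts if c != reference]
    rows = []
    for lab in labels:
        row = []
        for cat in categories:
            if lab == cat:
                row.append(1.0)
            elif lab == reference:
                row.append(-counts[cat] / counts[reference])
            else:
                row.append(0.0)
        rows.append(row)
    return categories, rows

def cohens_d(effect, other_effects, var_within, var_between):
    """Effect minus the mean of the other school type effects, scaled by
    the total latent factor SD (within- plus between-level variance)."""
    baseline = sum(other_effects) / len(other_effects)
    return (effect - baseline) / (var_within + var_between) ** 0.5
```

With weighted effect coding, the model intercept equals the (size-weighted) grand mean, so each coefficient is a deviation of its school type from that mean despite unequal group sizes.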

Data availability and analysis syntax

The data analyzed in this study and their documentation are available at https://doi.org/10.5157/NEPS:SC3:9.0.0. Moreover, the syntax used to generate the reported results is provided in an online repository at https://osf.io/5ugwn/?view_only=327ba9ae72684d07be8b4e0c6e6f1684 .

We first tested for measurement invariance between school types and subsequently probed the sensitivity of school type comparisons to accounting for measurement non-invariance. In our analyses, sufficient convergence of the parameter estimation was indicated for all models by inspection of the trace and autocorrelation plots. Furthermore, the PSR criterion fell below 1.10 for all parameters after 8000 iterations. Hence, adequate posterior quality was assumed for all parameters at the between and within levels.

DIF between school types

Measurement invariance of the reading competence test items across the school types was assessed using IA. Items with non-aligning, and hence measurement non-invariant, item parameters between these higher-level groups were found at each measurement occasion (see the third, sixth, and last columns of Table 2). For the reading competence measurement in fifth grade, 11 of the 32 administered items showed measurement non-invariance in either discrimination or threshold parameters across school types. Most non-invariance occurred for the lowest (lower secondary school) and highest (high school) school types. For 5 of the 11 non-invariant items, the school types exhibiting non-invariance were the same for both the discrimination and threshold parameters. In seventh grade, non-invariance across school types was found for 11 of the 40 test items in either discrimination or threshold parameters. Non-invariance occurred six times in discrimination parameters and seven times in threshold parameters, and it occurred most often for the high school type (10 of the 11 non-invariant items). Applying the IA to the competence test administered in ninth grade showed non-invariance for 11 of the 44 test items. Nearly all non-invariance was between the lowest and highest school types, and most non-invariance in discrimination and threshold parameters occurred for the last test items.

Consequences of DIF for school type effects

Comparisons of competence development across school types were estimated using MSEM. Each timepoint was modeled as a latent factor, and the between-level component of each latent factor was regressed on school type. Furthermore, the latent factors were correlated at both the within and between levels. Please note that the within- and between-level model setups were the same, with each factor measured by several categorical indicators. In Models 1a and 1b, no school-type-specific item discrimination or difficulty estimates were included, while in Models 2a and 2b, school-type-specific item discrimination and difficulty estimates were included for items exhibiting DIF. The amount of variance attributable to the school level (intraclass correlation) was high in both longitudinal models, amounting to 43.0% (Model 1a)/42.4% (Model 2a) in grade 5, 40.3% (Model 1a)/40.6% (Model 2a) in grade 7, and 43.4% (Model 1a)/43.3% (Model 2a) in grade 9. After the school type covariates were included (Models 1b and 2b), the variance of the school-level random effects was reduced by approximately two-thirds for each school-level factor, while the variance of the student-level random effects remained nearly the same.
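The intraclass correlation reported here is simply the share of the latent factor variance located at the school level. The variance components below are normalized to sum to one for illustration; they are not the estimated model quantities.

```python
def intraclass_correlation(var_between, var_within):
    """Share of total latent factor variance located between schools."""
    return var_between / (var_between + var_within)

# A between-school variance share of 0.43 (with components normalized to
# sum to 1) corresponds to an ICC of 43%, as reported for grade 5.
icc_grade5 = intraclass_correlation(0.43, 0.57)
```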

The development of reading competence from fifth to ninth grade appeared to be almost parallel between school types. The results of the first model (see Model 1b in Table 3) show quite similar differences in reading competence between school types at each measurement occasion. The highest reading competence was achieved by students attending high schools, followed by middle secondary schools, comprehensive schools, and schools offering all school tracks except high school. Students in lower secondary schools had the lowest achievement at all timepoints. As the 95 percent posterior probability intervals overlap for the middle secondary school type, the comprehensive school type, and the school type offering all tracks except high school (see Models 1b and 2b in Table 3), three distinct groups of school types, as defined by reading competence achievement, remain. Furthermore, the differences in competence development from fifth to ninth grade across these school types were quite stable. The Cohen's d effect sizes per school type estimate and per estimated model are presented in Table 4 and support this finding. A large positive effect relative to the average reading competence of the other school types is found for high school students across all grades, and a large negative effect is found across all grades for lower secondary school students relative to the other school types. The other three school types show overall small effect sizes across all grades relative to the averages of the other school types.

The results of the second model (see Model 2b in Table 3) show differences between the school types similar to those of the first model, and the effect sizes are likewise similar between the two models. Hence, the differences in reading competence development across school types are parallel, and this pattern is robust to the discovered school-type-specific DIF in item discrimination and difficulty estimates. With regard to model fit, only the two models that accounted for school-type-specific item discrimination and difficulty estimates for items exhibiting DIF (Models 2a and 2b) showed an acceptable fit with PPP > 0.05. Furthermore, single-level regression analyses with cluster robust standard error estimation using the robust maximum likelihood (MLR) estimator were performed to investigate whether the findings were robust to an alternative estimation method for hierarchical data. Please note that result tables for these analyses are presented in Additional file 1. The main findings remain unaltered: a parallel pattern of reading competence development between the school types was found, as were three distinct school type groups.

Consequences when ignoring clustering effects

Finally, we estimated the same models without accounting for the clustered data structure (see Table 5 ). In comparison to the previous models, Model 3a and Model 4a show that in seventh and ninth grade the comprehensive school type performed significantly better than the middle secondary schools and schools offering all school tracks except high school.

Additionally, we replicated the analyses of longitudinal reading competence development using point estimates of student reading competence. The point estimates are the linked weighted maximum likelihood estimates (WLE; Warm, 1989) as provided by NEPS, and we performed linear growth modelling with and without cluster robust standard error estimation. Results are presented in Additional file 1: Tables S3–S5. As before, these results support our main findings on the pattern of competence development between German secondary school types and the three distinct school type groups. When the clustered data structure was not accounted for, the misleading finding again emerged that comprehensive schools performed significantly better in seventh and ninth grade than middle secondary schools and schools offering all school tracks except high school.

Discussion

We evaluated measurement invariance between German secondary school types and tested the sensitivity of longitudinal comparisons to the observed measurement non-invariance. Differences in reading competence between German secondary school types from fifth to ninth grade were investigated, with reading competence modeled as a latent variable so that measurement error was taken into account. Multilevel modeling was employed to account for the clustered data structure, and measurement invariance between school types was assessed. Based on our results, partial invariance between school types can be assumed (i.e., more than half of the items were measurement invariant, that is, free of DIF; Steenkamp & Baumgartner, 1998; Vandenberg & Lance, 2000).
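The partial-invariance criterion referenced above (more than half of the items free of DIF) amounts to a simple count over DIF flags. The flags in this sketch are hypothetical, not the NEPS item results:

```python
def partial_invariance_holds(dif_flags):
    """True if more than half of the items are DIF-free — the rule of thumb
    cited from Steenkamp & Baumgartner (1998)."""
    n_invariant = sum(1 for has_dif in dif_flags if not has_dif)
    return n_invariant > len(dif_flags) / 2

# hypothetical: 3 of 10 items flagged for school-type DIF
flags = [False] * 7 + [True] * 3
print(partial_invariance_holds(flags))  # True
```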

The results of the longitudinal estimation of reading competence revealed a parallel pattern between German secondary school types, and that pattern remained when school-type-specific item estimates were included for items exhibiting DIF. Nevertheless, estimating the same models without consideration of the clustered data structure led to misleading conclusions about the pattern of longitudinal reading competence development: in these models, students attending the comprehensive school type were estimated to perform significantly better in seventh and ninth grade than students attending the middle secondary school type and those attending schools offering all school tracks except high school. For research focusing on school type comparisons of latent competence, we emphasize the use of hierarchical modeling when a nested data structure is present.

Furthermore, although we recommend the assessment of measurement invariance, whether an item induces bias for group comparisons is not (or not only) a statistical question. Rather, procedures for measurement invariance testing are best embedded in the test development process, including expert reviews of items exhibiting DIF (Camilli, 1993). Items that are measurement non-invariant and judged to be associated with construct-irrelevant factors are revised or replaced throughout the test development process. Robitzsch and Lüdtke (2020) provide a thoughtful discussion of the reasoning behind (partial) measurement invariance for group comparisons under construct-relevant DIF and DIF caused by construct-irrelevant factors. Information about the amount of item bias in a developed test is also useful for quantifying the uncertainty in group comparisons, analogous to the reporting of linking errors in longitudinal large-scale assessments (cf. Robitzsch & Lüdtke, 2020). While the assumption of exact item parameter invariance across groups is quite strict, we presented a method to assess the less strict approach of partial measurement invariance. Even when a measured construct is only partially invariant, comparisons of school types can be valid. Nevertheless, no statistical method alone can establish construct validity without further theoretical reasoning and expert evaluation. As demonstrated in this study, the sensitivity of longitudinal reading competence development to partial measurement invariance between school types can be assessed.

Implications for research on the achievement gap in reading competence

Studies on reading competence development have reported either parallel development (e.g., Retelsdorf & Möller, 2008; Schneider & Stefanek, 2004) or a widening gap (e.g., Pfost & Artelt, 2013) among secondary school types. In these studies, samples were drawn from different regions (i.e., German federal states), and different methods of statistical analysis were used. We argued that group differences, such as school type effects, can be distorted by measurement non-invariance of test items. Because these previous studies did not report analyses of measurement invariance such as DIF, it is unknown whether the differences found relate to the psychometric properties of the administered tests. In our analyses, we found no indication that the pattern of competence development is affected by DIF. As a prerequisite for group-mean comparisons, studies should present evidence of measurement invariance between the investigated groups and, in the longitudinal case, across measurement occasions, or refer to the sources where these analyses are presented. To enhance the comparability of results across studies on reading competence development, researchers should also discuss whether the construct has the same meaning for all groups and across all measurement occasions. Furthermore, the previous analyses were regionally limited, covering only one or two German federal states. In comparison, the sample we used is representative at the national level, and we encourage future research to strive to include more regions. Note that the clustered data structure was always accounted for in previous analyses of reading competence development through cluster robust maximum likelihood estimation.
When the focus is on regression coefficients, and variance partitioning or inference at the cluster level is not of interest, researchers need to make fewer assumptions about their data when choosing cluster robust maximum likelihood estimation compared to hierarchical linear modeling (McNeish et al., 2017; Stapleton et al., 2016). As mentioned before, inaccurate standard errors and biased significance tests can result when hierarchical structures are ignored during estimation (Hox, 2002; Raudenbush & Bryk, 2002): standard errors are underestimated, confidence intervals are narrower than they should be, and effects become statistically significant too easily. As our results showed, ignoring the clustered data structure can lead to misleading conclusions about the pattern of longitudinal reading competence development in comparisons of German secondary school types.
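The underestimation described above can be demonstrated with a minimal OLS sandwich estimator. This is an illustrative sketch using the CR0 cluster-robust variance (no small-sample correction), not the MLR estimator used in the analyses; the data are constructed so that all residual variation lies at the school level:

```python
import numpy as np

def ols_with_se(X, y, groups=None):
    """OLS point estimates with naive or CR0 cluster-robust standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    if groups is None:
        # naive variance: assumes homoskedastic, independent errors
        sigma2 = resid @ resid / (len(y) - X.shape[1])
        cov = sigma2 * XtX_inv
    else:
        # cluster-robust "sandwich": sum score outer products per cluster
        meat = np.zeros((X.shape[1], X.shape[1]))
        for g in np.unique(groups):
            score = X[groups == g].T @ resid[groups == g]
            meat += np.outer(score, score)
        cov = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(cov))

# 4 hypothetical schools of 5 students each; residuals are entirely school-level
effects = np.array([-3.0, -1.0, 1.0, 3.0])
groups = np.repeat(np.arange(4), 5)
y = effects[groups]                  # intercept-only model, true mean 0
X = np.ones((20, 1))
_, se_naive = ols_with_se(X, y)
_, se_cluster = ols_with_se(X, y, groups)
# se_cluster is roughly sqrt(cluster size) times larger than se_naive,
# so naive inference would overstate significance
```

In this extreme case (intra-class correlation of 1) the naive standard error is too small by roughly the square root of the cluster size, which is the design-effect logic behind the biased significance tests mentioned above.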

Limitations

One focus of our study was to investigate the consequences for longitudinal measurements of latent competence when partial invariance is taken into account in the estimation model. It was assumed that the psychometric properties of the scale and the underlying relationships among variables can be affected when some items are non-invariant and thus unfair between school types. With the NEPS study design for reading competence measurement, this assumption cannot be tested entirely, as a completely new set of items is administered at each measurement occasion to circumvent memory effects. The three measurement occasions are linked through a mean/mean linking approach based on an anchor-group design (Fischer et al., 2016, 2019). Hence, a unique linking constant is assumed to hold for all school types. The computation of the linking constant relies on the assumption that items are invariant across all groups under investigation (e.g., school types). Because the data from the additional linking studies are not published by NEPS, we could not investigate the effect of item non-invariance across school types on the computation of the linking constants. Therefore, we cannot test the assumption that the scale score metric, upon which the linking constant is computed, holds across measurement occasions for the school clusters and the school types under study. Overall, we assume that high effort was invested in item and test construction for the NEPS. However, we can conclude that the longitudinal competence measurement is quite robust to the measurement non-invariance between school types found here, as the same measurement instruments are used to create the linking constants. Whenever possible, we encourage researchers to additionally assess measurement invariance across repeated measurements.
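The mean/mean linking step mentioned above can be sketched as a single shift constant derived from the anchor items shared between forms. The anchor difficulties below are invented for illustration, and the actual NEPS linking procedure (Fischer et al., 2016) involves additional design and weighting steps:

```python
def mean_mean_link(anchor_ref, anchor_focal, focal_items):
    """Shift focal-form item difficulties onto the reference metric using
    the mean difference of the shared anchor items (mean/mean linking)."""
    constant = (sum(anchor_ref) / len(anchor_ref)
                - sum(anchor_focal) / len(anchor_focal))
    return [b + constant for b in focal_items]

# hypothetical anchor-item difficulties on each form
linked = mean_mean_link(anchor_ref=[0.2, 0.8, 1.4],
                        anchor_focal=[-0.3, 0.3, 0.9],
                        focal_items=[0.0, 0.5, 1.0])
# constant = 0.8 - 0.3 = 0.5, so linked ≈ [0.5, 1.0, 1.5]
```

Because a single constant is applied to all items, the procedure implicitly assumes the anchor items function identically in every group — exactly the invariance assumption discussed above.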

On a more general note, and looking beyond issues of statistical modeling, the available information on school types for our study is not exhaustive, as the German secondary school system is very complex and offers several options for students regarding schooling trajectories. A detailed variable on secondary school types and an identification of students who change school types between measurement occasions is desired but difficult to provide for longitudinal analyses (Bayer et al., 2014 ). As we use the school type information that generated the strata for the sampling of students, this information is constant over measurement occasions, but the comparability for later measurement timepoints (e.g., ninth grade) is rather limited.

In summary, it was assumed that school-level differences in the measured construct may impact the longitudinal measurement of reading competence development. Therefore, we assessed measurement invariance between school types. Differences in item estimates between school types were found at each of the three measurement occasions. Nevertheless, taking these differences in item discrimination and difficulty estimates into account did not alter the parallel pattern of reading competence development when comparing German secondary school types from fifth to ninth grade. Furthermore, the necessity of taking the hierarchical data structure into account when comparing competence development across school types was demonstrated: ignoring in the estimation that students are nested within schools by sampling design led to an overestimation of the statistical significance of the effects for the comprehensive school type in seventh and ninth grade.

Availability of data and materials

The data analyzed in this study and documentation are available at doi: https://doi.org/10.5157/NEPS:SC3:9.0.0 . Moreover, the syntax used to generate the reported results is provided in an online repository at https://osf.io/5ugwn/?view_only=327ba9ae72684d07be8b4e0c6e6f1684 .

This paper uses data from the National Educational Panel Study (NEPS): Starting Cohort Grade 5, doi: https://doi.org/10.5157/NEPS:SC3:9.0.0 . From 2008 to 2013, NEPS data was collected as part of the Framework Program for the Promotion of Empirical Educational Research funded by the German Federal Ministry of Education and Research (BMBF). As of 2014, NEPS is carried out by the Leibniz Institute for Educational Trajectories (LIfBi) at the University of Bamberg in cooperation with a nationwide network.

Asparouhov, T., & Muthén, B. (2010). Bayesian analysis using Mplus: Technical implementation (Mplus Technical Report). http://statmodel.com/download/Bayes3.pdf . Accessed 12 November 2020.

Asparouhov, T., & Muthén, B. (2014). Multiple-group factor analysis alignment. Structural Equation Modeling: A Multidisciplinary Journal, 21 (4), 495–508. https://doi.org/10.1080/10705511.2014.919210


Aßmann, C., Steinhauer, H. W., Kiesl, H., Koch, S., Schönberger, B., Müller-Kuller, A., Rohwer, G., Rässler, S., & Blossfeld, H.-P. (2011). 4 Sampling designs of the National Educational Panel Study: Challenges and solutions. Zeitschrift Für Erziehungswissenschaft, 14 (S2), 51–65. https://doi.org/10.1007/s11618-011-0181-8

Bast, J., & Reitsma, P. (1998). Analyzing the development of individual differences in terms of Matthew effects in reading: Results from a Dutch longitudinal study. Developmental Psychology, 34 (6), 1373–1399. https://doi.org/10.1037/0012-1649.34.6.1373

Baumert, J., Klieme, E., Neubrand, M., Prenzel, M., Schiefele, U., Schneider, W., Stanat, P., Tillmann, K.-J., & Weiß, M. (2001). PISA 2000: Basiskompetenzen von Schülerinnen und Schülern im internationalen Vergleich . Leske + Budrich. https://doi.org/10.1007/978-3-322-83412-6


Baumert, J., Stanat, P., & Watermann, R. (2006). Schulstruktur und die Entstehung differenzieller Lern- und Entwicklungsmilieus. In J. Baumert, P. Stanat, & R. Watermann (Eds.), Herkunftsbedingte Disparitäten im Bildungssystem (pp. 95–188). VS Verlag für Sozialwissenschaften.


Baumert, J., Trautwein, U., & Artelt, C. (2003). Schulumwelten—institutionelle Bedingungen des Lehrens und Lernens. In J. Baumert, C. Artelt, E. Klieme, M. Neubrand, M. Prenzel, U. Schiefele, W. Schneider, K.-J. Tillmann, & M. Weiß (Eds.), PISA 2000. Ein differenzierter Blick auf die Länder der Bundesrepublik Deutschland (pp. 261–331). Leske u. Budrich.


Bayer, M., Goßmann, F., & Bela, D. (2014). NEPS technical report: Generated school type variable t723080_g1 in Starting Cohorts 3 and 4 (NEPS Working Paper No. 46). Bamberg: Leibniz Institute for Educational Trajectories, National Educational Panel Study. https://www.neps-data.de/Portals/0/Working%20Papers/WP_XLVI.pdf . Accessed 12 November 2020.

Becker, M., Lüdtke, O., Trautwein, U., & Baumert, J. (2006). Leistungszuwachs in Mathematik. Zeitschrift Für Pädagogische Psychologie, 20 (4), 233–242. https://doi.org/10.1024/1010-0652.20.4.233

Blossfeld, H.-P., Roßbach, H.-G., & von Maurice, J. (Eds.), (2011). Education as a lifelong process: The German National Educational Panel Study (NEPS) [Special Issue]. Zeitschrift für Erziehungswissenschaft , 14.

Bos, W., Bonsen, M., & Gröhlich, C. (2009). KESS 7 Kompetenzen und Einstellungen von Schülerinnen und Schülern an Hamburger Schulen zu Beginn der Jahrgangsstufe 7. HANSE—Hamburger Schriften zur Qualität im Bildungswesen (Vol. 5). Waxmann.

Brown, T. A. (2006). Confirmatory factor analysis for applied research . Guilford Press.

Camilli, G. (1993). The case against item bias detection techniques based on internal criteria: Do item bias procedures obscure test fairness issues? In P. W. Holland & H. Wainer (Eds.), Differential item functioning: Theory and practice (pp. 397–417). Erlbaum.

Camilli, G. (2006). Test fairness. In R. Brennan (Ed.), Educational measurement (4th ed., pp. 221–256). American Council on Education and Praeger.

Chall, J. S. (1983). Stages of reading development . McGraw-Hill.

Cohen, J. (1969). Statistical power analysis for the behavioral sciences . Academic Press.

Cortina, K. S., & Trommer, L. (2009). Bildungswege und Bildungsbiographien in der Sekundarstufe I. Das Bildungswesen in der Bundesrepublik Deutschland: Strukturen und Entwicklungen im Überblick . Waxmann.

Ditton, H., Krüsken, J., & Schauenberg, M. (2005). Bildungsungleichheit—der Beitrag von Familie und Schule. Zeitschrift Für Erziehungswissenschaft, 8 (2), 285–304. https://doi.org/10.1007/s11618-005-0138-x

Edossa, A. K., Neuenhaus, N., Artelt, C., Lingel, K., & Schneider, W. (2019). Developmental relationship between declarative metacognitive knowledge and reading comprehension during secondary school. European Journal of Psychology of Education, 34 (2), 397–416. https://doi.org/10.1007/s10212-018-0393-x

Finch, W. H., & Bolin, J. E. (2017). Multilevel Modeling using Mplus . Chapman and Hall—CRC.

Fischer, L., Gnambs, T., Rohm, T., & Carstensen, C. H. (2019). Longitudinal linking of Rasch-model-scaled competence tests in large-scale assessments: A comparison and evaluation of different linking methods and anchoring designs based on two tests on mathematical competence administered in grades 5 and 7. Psychological Test and Assessment Modeling, 61 , 37–64.

Fischer, L., Rohm, T., Gnambs, T., & Carstensen, C. H. (2016). Linking the data of the competence tests (NEPS Survey Paper No. 1). Bamberg: Leibniz Institute for Educational Trajectories, National Educational Panel Study. https://www.lifbi.de/Portals/0/Survey%20Papers/SP_I.pdf . Accessed 12 November 2020.

Fox, J.-P. (2010). Bayesian item response modeling: Theory and applications . Springer.

Fox, J.-P., & Glas, C. A. W. (2001). Bayesian estimation of a multilevel IRT model using Gibbs sampling. Psychometrika, 66, 271–288.

Gamoran, A., & Mare, R. D. (1989). Secondary school tracking and educational inequality: Compensation, reinforcement, or neutrality? American Journal of Sociology, 94 (5), 1146–1183. https://doi.org/10.1086/229114

Gehrer, K., Zimmermann, S., Artelt, C., & Weinert, S. (2003). NEPS framework for assessing reading competence and results from an adult pilot study. Journal for Educational Research Online, 5 , 50–79.

Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (2004). Bayesian data analysis (2nd ed.). Chapman & Hall.

Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7, 457–472.

Heck, R. H., Price, C. L., & Thomas, S. L. (2004). Tracks as emergent structures: A network analysis of student differentiation in a high school. American Journal of Education, 110 (4), 321–353. https://doi.org/10.1086/422789

Holland, P. W., & Wainer, H. (1993). Differential item functioning . Routledge. https://doi.org/10.4324/9780203357811

Hox, J. J. (2002). Multilevel analysis: Techniques and applications. Quantitative methodology series . Erlbaum.

Jak, S., & Jorgensen, T. (2017). Relating measurement invariance, cross-level invariance, and multilevel reliability. Frontiers in Psychology, 8 , 1640. https://doi.org/10.3389/fpsyg.2017.01640

Jöreskog, K. G. (1971). Simultaneous factor analysis in several populations. Psychometrika, 36 (4), 409–426. https://doi.org/10.1007/BF02291366

Kamata, A., & Vaughn, B. K. (2010). Multilevel IRT modeling. In J. J. Hox & J. K. Roberts (Eds.), Handbook of advanced multilevel analysis (pp. 41–57). Routledge.

Kaplan, D., Kim, J.-S., & Kim, S.-Y. (2009). Multilevel latent variable modeling: Current research and recent developments. In R. E. Millsap & A. Maydeu-Olivares (Eds.), The Sage handbook of quantitative methods in psychology (pp. 592–612). Sage Publications Ltd. https://doi.org/10.4135/9780857020994.n24

Kim, E., Cao, C., Wang, Y., & Nguyen, D. (2017). Measurement invariance testing with many groups: A comparison of five approaches. Structural Equation Modeling: A Multidisciplinary Journal . https://doi.org/10.1080/10705511.2017.1304822

Köller, O., & Baumert, J. (2001). Leistungsgruppierungen in der Sekundarstufe I. Ihre Konsequenzen für die Mathematikleistung und das mathematische Selbstkonzept der Begabung. Zeitschrift Für Pädagogische Psychologie, 15 , 99–110. https://doi.org/10.1024//1010-0652.15.2.99

Köller, O., & Baumert, J. (2002). Entwicklung von Schulleistungen. In R. Oerter & L. Montada (Eds.), Entwicklungspsychologie (pp. 735–768). Beltz/PVU.

Krannich, M., Jost, O., Rohm, T., Koller, I., Carstensen, C. H., Fischer, L., & Gnambs, T. (2017). NEPS Technical report for reading—scaling results of starting cohort 3 for grade 7 (NEPS Survey Paper No. 14). Bamberg: Leibniz Institute for Educational Trajectories, National Educational Panel Study. https://www.neps-data.de/Portals/0/Survey%20Papers/SP_XIV.pdf . Accessed 12 November 2020.

Lehmann, R., Gänsfuß, R., & Peek, R. (1999). Aspekte der Lernausgangslage und der Lernentwicklung von Schülerinnen und Schülern an Hamburger Schulen: Klassenstufe 7; Bericht über die Untersuchung im September 1999 . Hamburg: Behörde für Schule, Jugend und Berufsbildung, Amt für Schule.

Lehmann, R. H., & Lenkeit, J. (2008). ELEMENT. Erhebung zum Lese- und Mathematikverständnis. Entwicklungen in den Jahrgangsstufen 4 bis 6 in Berlin . Berlin: Senatsverwaltung für Bildung, Jugend und Sport.

LeTendre, G. K., Hofer, B. K., & Shimizu, H. (2003). What is tracking? Cultural expectations in the United States, Germany, and Japan. American Educational Research Journal, 40 (1), 43–89. https://doi.org/10.3102/00028312040001043

Loyd, B. H., & Hoover, H. D. (1980). Vertical equating using the Rasch model. Journal of Educational Measurement, 17 , 179–193.

Lu, I. R. R., Thomas, D. R., & Zumbo, B. D. (2005). Embedding IRT in structural equation models: A comparison with regression based on IRT scores. Structural Equation Modeling: A Multidisciplinary Journal, 12 (2), 263–277. https://doi.org/10.1207/s15328007sem1202_5

Lüdtke, O., Marsh, H. W., Robitzsch, A., Trautwein, U., Asparouhov, T., & Muthén, B. (2008). The multilevel latent covariate model: A new, more reliable approach to group-level effects in contextual studies. Psychological Methods, 13 , 203–229.

Lüdtke, O., Marsh, H. W., Robitzsch, A., & Trautwein, U. (2011). A 2x2 taxonomy of multilevel latent contextual model: Accuracy-bias trade-offs in full and partial error correction models. Psychological Methods, 16 , 444–467.

Marsh, H. W., Lüdtke, O., Robitzsch, A., Trautwein, U., Asparouhov, T., Muthén, B., & Nagengast, B. (2009). Doubly-latent models of school contextual effects: Integrating multilevel and structural equation approaches to control measurement and sampling error. Multivariate Behavioral Research, 44 , 764–802.

McNeish, D., Stapleton, L. M., & Silverman, R. D. (2017). On the unnecessary ubiquity of hierarchical linear modeling. Psychological Methods, 22 (1), 114–140. https://doi.org/10.1037/met0000078

Millsap, R. E., & Everson, H. T. (1993). Methodology review: Statistical approaches for assessing measurement bias. Applied Psychological Measurement, 17 (4), 297–334. https://doi.org/10.1177/014662169301700401

Muthén, B., & Asparouhov, T. (2012). Bayesian SEM: A more flexible representation of substantive theory. Psychological Methods, 17 , 313–335.

Muthén, B., & Asparouhov, T. (2014). IRT studies of many groups: The alignment method. Frontiers in Psychology, 5 , 978. https://doi.org/10.3389/fpsyg.2014.00978

Muthén, L. K., & Muthén, B. O. (1998–2020). Mplus user's guide (8th ed.). Los Angeles, CA: Muthén & Muthén.

Nagy, G., Retelsdorf, J., Goldhammer, F., Schiepe-Tiska, A., & Lüdtke, O. (2017). Veränderungen der Lesekompetenz von der 9. zur 10. Klasse: Differenzielle Entwicklungen in Abhängigkeit der Schulform, des Geschlechts und des soziodemografischen Hintergrunds? Zeitschrift Für Erziehungswissenschaft, 20 (S2), 177–203. https://doi.org/10.1007/s11618-017-0747-1

Naumann, J., Artelt, C., Schneider, W. & Stanat, P. (2010). Lesekompetenz von PISA 2000 bis PISA 2009. In E. Klieme, C. Artelt, J. Hartig, N. Jude, O. Köller, M. Prenzel (Eds.), PISA 2009. Bilanz nach einem Jahrzehnt. Münster: Waxmann. https://www.pedocs.de/volltexte/2011/3526/pdf/DIPF_PISA_ISBN_2450_PDFX_1b_D_A.pdf . Accessed 12 November 2020.

Neumann, M., Schnyder, I., Trautwein, U., Niggli, A., Lüdtke, O., & Cathomas, R. (2007). Schulformen als differenzielle Lernmilieus. Zeitschrift Für Erziehungswissenschaft, 10 (3), 399–420. https://doi.org/10.1007/s11618-007-0043-6

O’Brien, D. G., Moje, E. B., & Stewart, R. A. (2001). Exploring the context of secondary literacy: Literacy in people’s everyday school lives. In E. B. Moje & D. G. O’Brien (Eds.), Constructions of literacy: Studies of teaching and learning in and out of secondary classrooms (pp. 27–48). Erlbaum.

Oakes, J., & Wells, A. S. (1996). Beyond the technicalities of school reform: Policy lessons from detracking schools . UCLA Graduate School of Education & Information Studies.

OECD. (2017). PISA 2015 assessment and analytical framework: science, reading, mathematic, financial literacy and collaborative problem solving . OECD Publishing. https://doi.org/10.1787/9789264281820-en

OECD & Statistics Canada. (1995). Literacy, economy and society: Results of the first international adult literacy survey . OECD Publishing.

Pfost, M., & Artelt, C. (2013). Reading literacy development in secondary school and the effect of differential institutional learning environments. In M. Pfost, C. Artelt, & S. Weinert (Eds.), The development of reading literacy from early childhood to adolescence empirical findings from the Bamberg BiKS longitudinal studies (pp. 229–278). Bamberg: University of Bamberg Press.

Pfost, M., Hattie, J., Dörfler, T., & Artelt, C. (2014). Individual differences in reading development: A review of 25 years of empirical research on Matthew effects in reading. Review of Educational Research, 84 (2), 203–244. https://doi.org/10.3102/0034654313509492

Pfost, M., Karing, C., Lorenz, C., & Artelt, C. (2010). Schereneffekte im ein- und mehrgliedrigen Schulsystem: Differenzielle Entwicklung sprachlicher Kompetenzen am Übergang von der Grund- in die weiterführende Schule? Zeitschrift Für Pädagogische Psychologie, 24 (3–4), 259–272. https://doi.org/10.1024/1010-0652/a000025

Pohl, S., Haberkorn, K., Hardt, K., & Wiegand, E. (2012). NEPS technical report for reading—scaling results of starting cohort 3 in fifth grade (NEPS Working Paper No. 15). Bamberg: Otto-Friedrich-Universität, Nationales Bildungspanel.

Protopapas, A., Parrila, R., & Simos, P. G. (2016). In Search of Matthew effects in reading. Journal of Learning Disabilities, 49 (5), 499–514. https://doi.org/10.1177/0022219414559974

Rabe-Hesketh, S., Skrondal, A., & Zheng, X. (2007). Multilevel Structural Equation Modeling. In S.-Y. Lee (Ed.), Handbook of Latent Variable and Related Models (pp. 209–227). Elsevier.

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods. Advanced quantitative techniques in the social sciences, (Vol. 1). Thousand Oaks, CA.: Sage Publ.

Raykov, T. (1999). Are simple change scores obsolete? An approach to studying correlates and predictors of change. Applied Psychological Measurement, 23 (2), 120–126. https://doi.org/10.1177/01466219922031248

Retelsdorf, J., Becker, M., Köller, O., & Möller, J. (2012). Reading development in a tracked school system: A longitudinal study over 3 years using propensity score matching. The British Journal of Educational Psychology, 82 (4), 647–671. https://doi.org/10.1111/j.2044-8279.2011.02051.x

Retelsdorf, J., & Möller, J. (2008). Entwicklungen von Lesekompetenz und Lesemotivation: Schereneffekte in der Sekundarstufe? Zeitschrift Für Entwicklungspsychologie Und Pädagogische Psychologie, 40 (4), 179–188. https://doi.org/10.1026/0049-8637.40.4.179

Robitzsch, A., & Lüdtke, O. (2020). A review of different scaling approaches under full invariance, partial invariance, and noninvariance for cross-sectional country comparisons in large-scale assessments. Psychological Test and Assessment Modeling , 62(2), 233–279. https://www.psychologie-aktuell.com/fileadmin/Redaktion/Journale/ptam-2020-2/03_Robitzsch.pdf

Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys . Wiley. https://doi.org/10.1002/9780470316696

Scharl, A., Fischer, L., Gnambs, T., & Rohm, T. (2017). NEPS Technical report for reading: scaling results of starting cohort 3 for grade 9 (NEPS Survey Paper No. 20). Bamberg: Leibniz Institute for Educational Trajectories, National Educational Panel Study. https://www.neps-data.de/Portals/0/Survey%20Papers/SP_XX.pdf . Accessed 12 November 2020.

Schneider, W., & Stefanek, J. (2004). Entwicklungsveränderungen allgemeiner kognitiver Fähigkeiten und schulbezogener Fertigkeiten im Kindes- und Jugendalter. Zeitschrift Für Entwicklungspsychologie Und Pädagogische Psychologie, 36 (3), 147–159. https://doi.org/10.1026/0049-8637.36.3.147

Schweig, J. (2014). Cross-level measurement invariance in school and classroom environment surveys: Implications for policy and practice. Educational Evaluation and Policy Analysis, 36 (3), 259–280. https://doi.org/10.3102/0162373713509880

Silva, C., Bosancianu, B. C. M., & Littvay, L. (2019). Multilevel Structural Equation Modeling . Sage.

Stanovich, K. E. (1986). Matthew effects in reading: Some consequences of individual differences in the acquisition of literacy. Reading Research Quarterly, 21 (4), 360–407. https://doi.org/10.1598/RRQ.21.4.1

Stapleton, L. M., McNeish, D. M., & Yang, J. S. (2016). Multilevel and single-level models for measured and latent variables when data are clustered. Educational Psychologist, 51 (3–4), 317–330. https://doi.org/10.1080/00461520.2016.1207178

Steenkamp, J. E. M., & Baumgartner, H. (1998). Assessing measurement invariance in cross-national consumer research. Journal of Consumer Research, 25 , 78–90. https://doi.org/10.1086/209528

Steinhauer, H. W. & Zinn, S. (2016). NEPS technical report for weighting: Weighting the sample of starting cohort 3 of the national educational panel study (Waves 1 to 3) (NEPS Working Paper No. 63). Bamberg: Leibniz Institute for Educational Trajectories, National Educational Panel Study. https://www.neps-data.de/Portals/0/Working%20Papers/WP_LXIII.pdf . Accessed 12 November 2020.

Steyer, R., Partchev, I., & Shanahan, M. J. (2000). Modeling True Intraindividual Change in Structural Equation Models: The Case of Poverty and Children’s Psychosocial Adjustment. In T. D. Little, K. U. Schnabel, & J. Baumert (Eds.),  Modeling longitudinal and multilevel data: Practical issues, applied approaches and specific examples  (pp. 109–26). Mahwah, N.J.: Lawrence Erlbaum Associates. https://www.metheval.uni-jena.de/materialien/publikationen/steyer_et_al.pdf . Accessed 12 November 2020.


Acknowledgements

The authors would like to thank David Kaplan for helpful suggestions on the analysis of the data. We would also like to thank Marie-Ann Sengewald for consultation on latent variable modelling.

This research project was partially funded by the Deutsche Forschungsgemeinschaft (DFG; http://www.dfg.de ) within Priority Programme 1646 entitled “A Bayesian model framework for analyzing data from longitudinal large-scale assessments” under Grant No. CA 289/8–2 (awarded to Claus H. Carstensen). The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author information

Authors and Affiliations

Leibniz Institute for Educational Trajectories, Wilhelmsplatz 3, 96047, Bamberg, Germany

Theresa Rohm, Luise Fischer & Timo Gnambs

University of Bamberg, Bamberg, Germany

Theresa Rohm & Claus H. Carstensen

Johannes Kepler University Linz, Linz, Austria

Timo Gnambs


Contributions

TR analyzed and interpreted the data used in this study. TR conducted the literature review and drafted significant parts of the manuscript. CHC, LF and TG substantially revised the manuscript and provided substantial input regarding the statistical analyses. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Theresa Rohm .

Ethics declarations

Competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Table S1. Results of models for longitudinal competence measurement (N = 7276) with cluster-robust standard error estimation. Table S2. Effect sizes (Cohen's d) for school type covariates per estimated model. Table S3. Results of models for longitudinal competence development using WLEs (N = 7276) with cluster-robust standard error estimation. Table S4. Results of models for longitudinal competence development using WLEs (N = 7276) without cluster-robust standard error estimation. Table S5. Effect sizes (Cohen's d) for school type covariates per estimated model.
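Tables S2 and S5 report the school-type effect sizes as Cohen's d, i.e., the difference between two group means divided by their pooled standard deviation. As an illustrative sketch only (the helper function and the sample scores below are ours, not values from the article), the statistic can be computed like this:

```python
import math

def cohens_d(group_a, group_b):
    """Cohen's d with a pooled standard deviation (illustrative helper)."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    # Unbiased sample variances (denominator n - 1)
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (n_b - 1)
    # Pool the variances, weighted by degrees of freedom
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean_a - mean_b) / pooled_sd

# Hypothetical competence scores for two school types
print(cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0]))  # → 0.5
```

By common convention, d ≈ 0.2 is read as a small, 0.5 as a medium, and 0.8 as a large effect.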

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Rohm, T., Carstensen, C.H., Fischer, L. et al. The achievement gap in reading competence: the effect of measurement non-invariance across school types. Large-scale Assess Educ 9 , 23 (2021). https://doi.org/10.1186/s40536-021-00116-2

Download citation

Received : 07 December 2020

Accepted : 15 October 2021

Published : 28 October 2021

DOI : https://doi.org/10.1186/s40536-021-00116-2


  • Alignment method
  • Competence development
  • Measurement invariance
  • Multilevel item response theory
  • Multilevel structural equation modeling


Inferencing in Reading Comprehension: Examining Variations in Definition, Instruction, and Assessment

  • Original research
  • Published: 04 June 2023


  • Marianne Rice   ORCID: orcid.org/0000-0001-8935-4734 1 ,
  • Kausalai Wijekumar   ORCID: orcid.org/0000-0002-0768-5693 2 ,
  • Kacee Lambright   ORCID: orcid.org/0000-0002-8955-4135 2 &
  • Abigail Bristow   ORCID: orcid.org/0009-0009-7093-3678 2  


Inferencing is an important and complex process required for successful reading comprehension. Previous research has suggested that instruction in inferencing is effective at improving reading comprehension. However, varying definitions of inferencing are likely impacting how inferencing instruction is implemented in practice and how inferencing ability is measured. The goal of this study was, first, to systematically review the literature on inference instruction to compile a list of definitions used to describe inferences, and second, to review textbooks used in instruction and assessments used in research and practice to measure inferencing skills. A systematic literature search identified studies that implemented inferencing instruction with learners of all ages, from preschool to adulthood. After screening and elimination, 75 studies were identified and reviewed for inference definitions, instructional practices, and assessments used. A widely used reading textbook and two reading comprehension assessments were reviewed for grade 4 (elementary school) and grade 7 (middle school) to connect the inferences taught and measured with the identified definitions. Reviewing the 75 studies suggested 3 broad categories of inferences and 9 definitions of specific inference types. The textbook and assessment reviews revealed differences between the types of inference questions practiced and tested. The large variation in inference types and definitions may create difficulties for schools implementing inference instruction and/or attempting to measure students' inference abilities. More alignment is needed between research studies on inference instruction and the textbooks and assessments used in schools to teach and assess inference skills.


Applegate, M. D., Quinn, K. B., & Applegate, A. (2002). Levels of thinking required by comprehension questions in informal reading inventories. The Reading Teacher, 56 , 174–180.


Beerwinkle, A. L., Owens, J., & Hudson, A. (2021). An analysis of comprehension strategies and skills covered within grade 3–5 reading textbooks in the United States. Technology, Knowledge, and Learning, 26 (2), 311–338. https://doi.org/10.1007/s10758-020-09484-0


Cain, K., & Oakhill, J. V. (1999). Inference making and its relation to comprehension failure. Reading and Writing, 11 , 489–503. https://doi.org/10.1023/A:1008084120205

Cain, K., Oakhill, J. V., Barnes, M. A., & Bryant, P. E. (2001). Comprehension skill, inference-making ability, and their relation to knowledge. Memory and Cognition, 29 (6), 850–859. https://doi.org/10.3758/BF03196414

Cain, K., Oakhill, J. V., & Bryant, P. E. (2004). Children’s reading comprehension ability: Concurrent prediction by working memory, verbal ability, and component skill. Journal of Educational Psychology, 96 , 671–681. https://doi.org/10.1037/0022-0663.96.1.31

Clinton, V., Taylor, T., Bajpayee, S., Davison, M. L., Carlson, S. E., & Seipel, B. (2020). Inferential comprehension differences between narrative and expository texts: A systematic review and meta-analysis. Reading and Writing: An Interdisciplinary Journal, 33 , 2223–2248. https://doi.org/10.1007/s11145-020-10044-2

Common Core State Standards Initiative. (2010). Common core state standards for English language arts & literacy in history/social studies, science, and technical subjects . Retrieved from: https://learning.ccsso.org/wp-content/uploads/2022/11/ELA_Standards1.pdf

Dewitz, P., Jones, J., & Leahy, S. (2009). Comprehension strategy instruction in core reading programs. Reading Research Quarterly, 44 (2), 102–126. https://doi.org/10.1598/RRQ.44.2.1

Elleman, A. M. (2017). Examining the impact of inference instruction on the literal and inferential comprehension of skilled and less skilled readers: A meta-analytic review. Journal of Educational Psychology, 109 (6), 761–781. https://doi.org/10.1037/edu0000180

Gauche, G., & Pfeiffer Flores, E. (2022). The role of inferences in reading comprehension: A critical analysis. Theory and Psychology, 32 (2), 326–343. https://doi.org/10.1177/09593543211043805

Graesser, A. C., Singer, M., & Trabasso, T. (1994). Constructing inferences during narrative text comprehension. Psychological Review, 101 (3), 371–395. https://doi.org/10.1037/0033-295X.101.3.371

Hall, C. S. (2016). Inference instruction for struggling readers: A synthesis of intervention research. Educational Psychology Review, 28 (1), 1–22. https://doi.org/10.1007/s10648-014-9295-x

Houghton Mifflin Harcourt. (2023). About us . https://www.hmhco.com/about-us . Retrieved on June 2, 2023.

Jones, J. S., Conradi, K., & Amendum, S. J. (2016). Matching interventions to reading needs: A case for differentiation. The Reading Teacher, 70 (3), 307–316. https://doi.org/10.1002/trtr.1513

Kendeou, P. (2015). A general inference skill. In E. J. O’Brien, A. E. Cook, & R. F. Lorch (Eds.), Inferences during reading (pp. 160–181). Cambridge University Press. https://doi.org/10.1017/CBO9781107279186.009

Kendeou, P., Bohn-Gettler, C., White, M., & van den Broek, P. (2008). Children’s inference generation across different media. Journal of Research in Reading, 31 , 259–272. https://doi.org/10.1111/j.1467-9817.2008.00370.x

Leslie, L., & Caldwell, J. (2017). Formal and informal measures of reading comprehension. In S. E. Israel (Ed.), Handbook of research on reading comprehension 2nd ed. (pp. 427–451). Routledge.

MacGinitie, W. H., MacGinitie, R. K., Maria, K., & Dreyer, L. G. (2002). Gates-MacGinitie Reading Tests (4th ed.). Riverside Publishing.

Mar, R. A., Li, J., Nguyen, A. T., & Ta, C. P. (2021). Memory and comprehension of narrative versus expository texts: A meta-analysis. Psychonomic Bulletin and Review, 28 , 732–749. https://doi.org/10.3758/s13423-020-01853-1

Nash, H., & Heath, J. (2011). The role of vocabulary, working memory and inference making ability in reading comprehension in Down syndrome. Research in Developmental Disabilities, 32 (5), 1782–1791. https://doi.org/10.1016/j.ridd.2011.03.007

Ouzzani, M., Hammady, H., Fedorowicz, Z., & Elmagarmid, A. (2016). Rayyan—a web and mobile app for systematic reviews. Systematic Reviews, 5 (1), 1–10. https://doi.org/10.1186/s13643-016-0384-4

Perfetti, C. A., & Stafura, J. Z. (2015). Comprehending implicit meanings in text without making inferences. In E. J. O’Brien, A. E. Cook, & R. F. Lorch (Eds.), Inferences during reading (pp. 1–18). Cambridge University Press. https://doi.org/10.1017/CBO9781107279186.002

Schmidt, W. H., & McKnight, C. C. (2012).  Inequality for all . Teachers College Press.

Schwartz, S. (2019). The most popular reading programs aren’t backed by science. Education Week . https://www.edweek.org/teaching-learning/the-most-popular-reading-programs-arent-backed-by-science/2019/12 .

Texas Education Agency (2022). STAAR released test questions . Retrieved from: https://tea.texas.gov/student-assessment/testing/staar/staar-released-test-questions .

Texas Education Agency (2017). Texas essential knowledge and skills for English language arts and reading . Retrieved from: https://tea.texas.gov/about-tea/laws-and-rules/texas-administrative-code/19-tac-chapter-110 .

van Dijk, T. A., & Kintsch, W. (1983). Strategies of discourse comprehension . Academic Press.

Wijekumar, K. K., Meyer, B. J., & Lei, P. (2017). Web-based text structure strategy instruction improves seventh graders’ content area reading comprehension. Journal of Educational Psychology, 109 (6), 741. https://doi.org/10.1037/edu0000168


The research reported here was supported by the Institute of Education Sciences, U.S. Department of Education, through grant U423A180074 to Texas A&M University. The opinions expressed are those of the authors and do not represent views of the Institute or the U.S. Department of Education.

Author information

Authors and Affiliations

Department of Educational Psychology, Texas A&M University, 4225 TAMU, College Station, TX, 77843, USA

Marianne Rice

Department of Teaching, Learning, and Culture, Texas A&M University, College Station, USA

Kausalai Wijekumar, Kacee Lambright & Abigail Bristow


Corresponding author

Correspondence to Marianne Rice .

Ethics declarations

Conflict of interest.

The authors declare that they have no conflict of interest.


Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 50 kb)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Rice, M., Wijekumar, K., Lambright, K. et al. Inferencing in Reading Comprehension: Examining Variations in Definition, Instruction, and Assessment. Tech Know Learn (2023). https://doi.org/10.1007/s10758-023-09660-y

Download citation

Accepted : 26 May 2023

Published : 04 June 2023

DOI : https://doi.org/10.1007/s10758-023-09660-y


  • Inferencing
  • Reading comprehension
  • Inferencing instruction
  • Multiple age groups
  • Inference and integrative processes

