
The Emergence of Scientific Reasoning

Submitted: 31 March 2012 Published: 14 November 2012

DOI: 10.5772/53885

From the Edited Volume

Current Topics in Children's Learning and Cognition

Edited by Heidi Kloos, Bradley J. Morris and Joseph L. Amaral


Author Information

Bradley J. Morris

  • Kent State University, USA

Steve Croker

  • Illinois State University, USA

Corinne Zimmerman

  • Illinois State University, USA

Amy M. Masnick

  • Hofstra University, USA

*Address all correspondence to:

1. Introduction

Scientific reasoning encompasses the reasoning and problem-solving skills involved in generating, testing and revising hypotheses or theories, and in the case of fully developed skills, reflecting on the process of knowledge acquisition and knowledge change that results from such inquiry activities. Science, as a cultural institution, represents a “hallmark intellectual achievement of the human species” and these achievements are driven by both individual reasoning and collaborative cognition ( Feist, 2006 , p. ix).

Our goal in this chapter is to describe how young children build from their natural curiosity about their world to having the skills for systematically observing, predicting, and understanding that world. We suggest that scientific reasoning is a specific type of intentional information seeking, one that shares basic reasoning mechanisms and motivation with other types of information seeking ( Kuhn, 2011a ). For example, curiosity is a critical motivational component that underlies information seeking ( Jirout & Klahr, 2012 ), yet only in scientific reasoning is curiosity sated by deliberate data collection and formal analysis of evidence. In this way, scientific reasoning differs from other types of information seeking in that it requires additional cognitive resources as well as an integration of cultural tools. To that end, we provide an overview of how scientific reasoning emerges from the interaction between internal factors (e.g., cognitive and metacognitive development) and cultural and contextual factors.

The current state of empirical research on scientific reasoning presents seemingly contradictory conclusions. Young children are sometimes deemed “little scientists” because they appear to have abilities that are used in formal scientific reasoning (e.g., causal reasoning; Gopnik et al., 2004 ). At the same time, many studies show that older children (and sometimes adults) have difficulties with scientific reasoning. For example, children have difficulty in systematically designing controlled experiments, in drawing appropriate conclusions based on evidence, and in interpreting evidence (e.g., Croker, 2012 ; Chen & Klahr, 1999 ; Kuhn, 1989 ; Zimmerman, 2007 ).

In the following account, we suggest that despite the early emergence of many of the precursors of skilled scientific reasoning, its developmental trajectory is slow and requires instruction, support, and practice. In Section 2 of the chapter, we discuss cognitive and metacognitive factors. We focus on two mechanisms that play a critical role in all cognitive processes (i.e., encoding and strategy acquisition/selection). Encoding involves attention to relevant information; it is foundational in all reasoning. Strategy use involves intentional approaches to seeking new knowledge and synthesizing existing knowledge. These two mechanisms are key components for any type of intentional information seeking, yet they follow a slightly different developmental trajectory in the development of scientific reasoning skills. We then discuss the analogous development of metacognitive awareness of what is being encoded, and metastrategic skills for choosing and deploying hypothesis-testing and inference strategies. In Section 3, we describe the role of contextual factors such as direct and scaffolded instruction, and the cultural tools that support the development of the cognitive and metacognitive skills required for the emergence of scientific thinking.

2. The development of scientific reasoning

Effective scientific reasoning requires both deductive and inductive skills. Individuals must understand how to assess what is currently known or believed, develop testable questions, test hypotheses, and draw appropriate conclusions by coordinating empirical evidence and theory. Such reasoning also requires the ability to attend to information systematically and draw reasonable inferences from patterns that are observed. Further, it requires the ability to assess one’s reasoning at each stage in the process. Here, we describe some of the key issues in developing these cognitive and metacognitive scientific reasoning skills.

2.1. Cognitive processes and mechanisms

The main task for developmental researchers is to explain how children build on their intuitive curiosity about the world to become skilled scientific reasoners. Curiosity, defined as “the threshold of desired uncertainty in the environment that leads to exploratory behavior” (Jirout & Klahr, 2012, p. 150), leads to information seeking. Information seeking activates a number of basic cognitive mechanisms that are used to extract (encode) information from the environment; children (and adults) can then act on this information in order to achieve a goal (i.e., use a strategy; Klahr, 2001; Kuhn, 2010). We turn our discussion to two such mechanisms and discuss how they underlie the development of a specific type of information seeking: scientific reasoning.

A mechanistic account of the development of scientific reasoning includes information about the processes by which this change occurs, and how these processes lead to change over time (Klahr, 2001). Mechanisms can be described at varying levels (e.g., neurological, cognitive, interpersonal) and over different time scales. For example, neurological mechanisms (e.g., inhibition) operate at millisecond time scales (Burle, Vidal, Tandonnet, & Hasbroucq, 2004), while learning mechanisms may operate over the course of minutes (e.g., inhibiting irrelevant information during problem solving; Becker, 2010). Many of the cognitive processes and mechanisms that account for learning and for problem solving across a variety of domains are important to the development of scientific reasoning skills and science knowledge acquisition. Many cognitive mechanisms have been identified as underlying scientific reasoning and other high-level cognition (e.g., analogy, statistical learning, categorization, imitation, inhibition; Goswami, 2008). However, due to space limitations we focus on what we argue are the two most critical mechanisms – encoding and strategy development – to illustrate the importance of individual-level cognitive abilities.

2.1.1. Encoding

Encoding is the process of representing information and its context in memory as a result of attention to stimuli ( Chen, 2007 ; Siegler, 1989 ). As such, it is a central mechanism in scientific reasoning because we must represent information before we can reason about it, and the quality and process of representation can affect reasoning. Importantly, there are significant developmental changes in the ability to encode the relevant features that will lead to sound reasoning and problem solving ( Siegler, 1983 ; 1985 ). Encoding abilities improve with the acquisition of encoding strategies and with increases in children’s domain knowledge ( Siegler, 1989 ). Young children often encode irrelevant features due to limited domain knowledge (Gentner, Loewenstein, & Thompson, 2003). For example, when solving problems to make predictions about the state of a two-arm balance beam (i.e., tip left, tip right, or balance), children often erroneously encode distance to the fulcrum and amount of weight as a single factor, decreasing the likelihood of producing a correct solution (which requires weight and distance to be encoded and considered separately as causal factors, while recognizing non-causal factors such as color; Amsel, Goodman, Savoie, & Clark, 1996; Siegler, 1983 ). Increased domain knowledge helps children assess more effectively what information is and is not necessary to encode. Further, children’s encoding often improves with the acquisition of encoding strategies. For example, if a child is attempting to recall the location of an item in a complex environment, she may err in encoding only the features of the object itself without encoding its relative position. With experience, she may encode the relations between the target item and other objects (e.g., the star is in front of the box), a strategy known as cue learning. Encoding object position and relative position increases the likelihood of later recall and is an example of how encoding better information is more important than simply encoding more information ( Chen, 2007 ; Newcombe & Huttenlocher, 2000 ).
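
To make the encoding point concrete, here is a minimal sketch (ours, not taken from the studies cited above; the weights, distances, and function names are illustrative) contrasting the standard torque rule, which encodes weight and distance separately, with a faulty encoding that collapses them into a single undifferentiated "amount" cue:

```python
# Illustrative sketch: predicting a two-arm balance beam (tip left, tip right, or balance).

def predict_torque(left_weight, left_dist, right_weight, right_dist):
    """Correct rule: compare weight x distance on each side of the fulcrum."""
    left, right = left_weight * left_dist, right_weight * right_dist
    if left > right:
        return "tip left"
    if right > left:
        return "tip right"
    return "balance"

def predict_collapsed(left_weight, left_dist, right_weight, right_dist):
    """Faulty encoding: weight and distance merged into one undifferentiated cue."""
    left, right = left_weight + left_dist, right_weight + right_dist
    if left > right:
        return "tip left"
    if right > left:
        return "tip right"
    return "balance"

# One weight far from the fulcrum (left) vs. four weights close to it (right):
trial = dict(left_weight=1, left_dist=6, right_weight=4, right_dist=2)
print(predict_torque(**trial))     # tip right (6 vs. 8)
print(predict_collapsed(**trial))  # tip left  (7 vs. 6) -- the collapsed encoding misleads
```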

Effective encoding is dependent on directing attention to relevant information, which in turn leads to accurate representations that can guide reasoning. Across a variety of tasks, experts are more likely to attend to critical elements in problem solving, and less likely to attend to irrelevant information, compared to novices ( Gobet, 2005 ). Domain knowledge plays an important role in helping to guide attention to important features. Parents often direct a child’s attention to critical problem features during problem solving. For example, a parent may keep track of which items have been counted in order to help a child organize counting ( Saxe, Guberman, & Gearhart, 1987 ). Instructional interventions in which children were directed towards critical elements in problem solving improved their attention to these features ( Kloos & VanOrden, 2005 ). Although domain knowledge is helpful in directing attention to critical features, it may sometimes limit novel reasoning in a domain and limit the extent to which attention is paid to disconfirming evidence ( Li & Klahr, 2006 ). Finally, self-generated activity improves encoding. Self-generation of information from memory, rather than passive attention, is associated with more effective encoding because it recruits greater attentional resources than passive encoding ( Chi, 2009 ).

2.1.2. Strategy development

Strategies are sequences of procedural actions used to achieve a goal (Siegler, 1996). In the context of scientific reasoning, strategies are the steps that guide children from their initial state (e.g., a question about the effects of weight and distance in balancing a scale) to a goal state (e.g., understanding the nature of the relationship between variables). We will briefly examine two components of strategy development: strategy acquisition and strategy selection. Strategies are particularly important in the development of scientific reasoning. Children often actively explore objects in a manner that is like hypothesis testing; however, these exploration strategies are not systematic investigations in which variables are manipulated and controlled as in formal hypothesis-testing strategies (Klahr, 2000). The acquisition of increasingly optimal strategies for hypothesis testing, inference, and evidence evaluation leads to more effective scientific reasoning that allows children to construct more veridical knowledge.

New strategies are added to the repertoire of possible strategies through discovery, instruction, or other social interactions ( Chen, 2007 ; Gauvain, 2001 ; Siegler, 1996 ). There is evidence that children can discover strategies on their own ( Chen, 2007 ). Children often discover new strategies when they experience an insight into a new way of solving a familiar problem. For example, 10- and 11-year-olds discovered new strategies for evaluating causal relations between variables in a computerized task only after creating different cars (e.g., comparing the effects of engine size) and testing them ( Schauble, 1990 ). Similarly, when asked to determine the cause of a chemical reaction, children discovered new experimentation strategies only after several weeks ( Kuhn & Phelps, 1982 ). Over time, existing strategies may be modified to reduce time and complexity of implementation (e.g., eliminating redundant steps in a problem solving sequence; Klahr, 1984 ). For example, determining causal relations among variables requires more time when experimentation is unsystematic. In order to identify which variables resulted in the fastest car, children often constructed up to 25 cars, whereas an adult scientist identified the fastest car after constructing only seven cars ( Schauble, 1990 ).

Children also gain new strategies through social interaction, by being explicitly taught a strategy, imitating a strategy, or by collaborating in problem solving ( Gauvain, 2001 ). For example, when a parent asks a child questions about events in a photograph, the parent evokes memories of the event and helps to structure the child’s understanding of the depicted event, a process called conversational remembering ( Middleton, 1997 ). Conversational remembering improves children’s recall of events and often leads to children spontaneously using this strategy. Parent conversations about event structures improved children’s memory for these structures; for example, questions about a child’s day at school help to structure this event and improved recall ( Nelson, 1996 ). Children also learn new strategies by solving problems cooperatively with adults. In a sorting task, preschool children were more likely to improve their classification strategies after working with their mothers ( Freund, 1990 ). Further, children who worked with their parents on a hypothesis-testing task were more likely to identify causal variables than children who worked alone because parents helped children construct valid experiments, keep data records, and repeat experiments ( Gleason & Schauble, 2000 ).

Children also acquire strategies by interacting with an adult modeling a novel strategy. Middle-school children acquired a reading comprehension strategy (e.g., anticipating the ending of a story) after seeing it modeled by their teacher (Palincsar, Brown, & Campione, 1993). Additionally, children can acquire new strategies from interactions with other children. Monitoring other children during problem solving improves a child’s understanding of the task and appears to improve how they evaluate their own performance (Brownell & Carriger, 1991). Elementary school children who collaborated with other students to solve the balance-scale task outperformed students who worked alone (Pine & Messer, 1998). Ten-year-olds working in dyads were more likely to discuss their strategies than children working alone, and these discussions were associated with generating better hypotheses (Teasley, 1995).

More than one strategy may be useful for solving a problem, which requires a means to select among candidate strategies. One suggestion is that this process occurs by adaptive selection. In adaptive selection, strategies that match features of the problem are candidates for selection. One component of selection is that newer strategies tend to have a slightly higher priority for use when compared to older strategies ( Siegler, 1996 ). Successful selection is made on the basis of the effectiveness of the strategy and its cost (e.g., speed), and children tend to choose the fastest, most accurate strategy available (i.e., the most adaptive strategy).
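
The logic of adaptive selection can be sketched as a toy model (our construction for illustration only, not Siegler's formal model; the strategies, accuracy values, costs, and novelty bonus are invented): candidate strategies are scored by expected accuracy relative to cost, with a small bump for newer strategies.

```python
# Toy sketch of adaptive strategy selection: candidates are scored on
# accuracy per unit cost, with a small bonus for more recently acquired strategies.

from dataclasses import dataclass

@dataclass
class Strategy:
    name: str
    accuracy: float   # expected probability of a correct answer (0-1)
    time_cost: float  # expected solution time, in arbitrary units
    recency: int      # how recently the strategy was acquired (higher = newer)

def select_strategy(candidates, novelty_bonus=0.05):
    """Pick the strategy with the best accuracy-per-cost, nudged toward newer ones."""
    def score(s):
        return s.accuracy / s.time_cost + novelty_bonus * s.recency
    return max(candidates, key=score)

repertoire = [
    Strategy("guess-and-check", accuracy=0.4, time_cost=1.0, recency=0),
    Strategy("test-one-variable", accuracy=0.8, time_cost=2.0, recency=1),
    Strategy("exhaustive-comparison", accuracy=0.9, time_cost=6.0, recency=2),
]
print(select_strategy(repertoire).name)  # test-one-variable
```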

Cognitive mechanisms provide the basic investigation and inferential tools used in scientific reasoning. The ability to reason about knowledge and the means for obtaining and evaluating knowledge provide powerful tools that augment children’s reasoning. Metacognitive abilities such as these may help explain some of the discrepancies between early scientific reasoning abilities and limitations in older children, as well as some of the developmental changes in encoding and strategy use.

2.2. Metacognitive and metastrategic processes

Sodian, Zaitchik, and Carey (1991) argue that two basic skills related to early metacognitive acquisitions are needed for scientific reasoning. First, children need to understand that inferences can be drawn from evidence. The theory of mind literature (e.g., Wellman, Cross, & Watson, 2001) suggests that it is not until the age of 4 that children understand that beliefs and knowledge are based on perceptual experience (i.e., evidence). As noted earlier, experimental work demonstrates that preschoolers can use evidence to make judgments about simple causal relationships (Gopnik, Sobel, Schulz, & Glymour, 2001; Schulz & Bonawitz, 2007; Schulz & Gopnik, 2004; Schulz, Gopnik, & Glymour, 2007). Similarly, several classic studies show that children as young as 6 can succeed in simple scientific reasoning tasks. Children between 6 and 9 can discriminate between a conclusive and an inconclusive test of a simple hypothesis (Sodian et al., 1991). Children as young as 5 can form a causal hypothesis based on a pattern of evidence, and even 4-year-olds seem to understand some of the principles of causal reasoning (Ruffman, Perner, Olson, & Doherty, 1993).

Second, according to Sodian et al. (1991 ), children need to understand that inference is itself a mechanism with which further knowledge can be acquired. Four-year-olds base their knowledge on perceptual experiences, whereas 6-year-olds understand that the testimony of others can also be used in making inferences ( Sodian & Wimmer, 1987 ). Other research suggests that children younger than 6 can make inferences based on testimony, but in very limited circumstances ( Koenig, Clément, & Harris, 2004 ). These findings may explain why, by the age of 6, children are able to succeed on simple causal reasoning, hypothesis testing, and evidence evaluation tasks.

Research with older children, however, has revealed that 8- to 12-year-olds have limitations in their abilities to (a) generate unconfounded experiments, (b) disconfirm hypotheses, (c) keep accurate and systematic records, and (d) evaluate evidence ( Klahr, Fay, & Dunbar, 1993 ; Kuhn, Garcia-Mila, Zohar, & Andersen, 1995; Schauble, 1990 , 1996 ; Zimmerman, Raghavan, & Sartoris, 2003 ). For example, Schauble (1990 ) presented children aged 9-11 with a computerized task in which they had to determine which of five factors affect the speed of racing cars. Children often varied several factors at once (only 22% of the experiments were classified as valid) and they often drew conclusions consistent with belief rather than the evidence generated. They used a positive test strategy, testing variables believed to influence speed (e.g., engine size) and not testing those believed to be non-causal (e.g., color). Some children recorded features without outcomes, or outcomes without features, but most wrote down nothing at all, relying on memory for details of experiments carried out over an eight-week period.
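
The criterion that separated the 22% of valid comparisons from the confounded ones can be stated mechanically: a pair of trials is informative about a target variable only if the trials differ on that variable and match on every other variable. A minimal sketch follows (the car features below are illustrative stand-ins, not Schauble's actual factor set):

```python
# Minimal control-of-variables check: a pair of trials is an unconfounded test
# of `target` only when the trials differ on `target` and match on everything else.

def is_unconfounded(trial_a, trial_b, target):
    differs_on_target = trial_a[target] != trial_b[target]
    others_match = all(trial_a[k] == trial_b[k] for k in trial_a if k != target)
    return differs_on_target and others_match

car_1 = {"engine": "large", "wheels": "big", "color": "red"}
car_2 = {"engine": "small", "wheels": "big", "color": "red"}
car_3 = {"engine": "small", "wheels": "small", "color": "blue"}

print(is_unconfounded(car_1, car_2, "engine"))  # True: only engine varies
print(is_unconfounded(car_1, car_3, "engine"))  # False: wheels and color vary too
```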

Although the performance differences between younger and older children may be interpreted as potentially contradictory, the differing cognitive and metacognitive demands of tasks used to study scientific reasoning at different ages may account for some of the disconnect in conclusions. The simple tasks given to preschoolers and young children require them to understand evidence as a source of knowledge and draw on the cognitive abilities of induction and pattern recognition, but they make only limited metacognitive demands. In contrast, the tasks used to study the development of scientific reasoning in older children (and adults) are more demanding and focused on hypothetico-deductive reasoning; they include more variables, involve more complex causal structures, require varying levels of domain knowledge, and are negotiated across much longer time scales. Moreover, the tasks given to older children and adults involve the acquisition, selection, and coordination of investigation strategies, combining background knowledge with empirical evidence. The results of investigation activities are then used in the acquisition, selection, and coordination of evidence evaluation and inference strategies. With respect to encoding, increases in task complexity require attending to more information and making judgments about which features are relevant. This encoding happens in the context of prior knowledge and, in many cases, it is also necessary to inhibit prior knowledge (Zimmerman & Croker, in press).

Sodian and Bullock (2008) also argue that mature scientific reasoning involves the metastrategic process of being able to think explicitly about hypotheses and evidence, and that this skill is not fully mastered until adolescence at the very earliest. According to Amsel et al. (2008), metacognitive competence is important for hypothetical reasoning. These conclusions are consistent with Kuhn’s (1989, 2005, 2011a) argument that the defining feature of scientific thinking is the set of cognitive and metacognitive skills involved in differentiating and coordinating theory and evidence. Kuhn argues that the effective coordination of theory and evidence depends on three metacognitive abilities: (a) the ability to encode and represent evidence and theory separately, so that relations between them can be recognized; (b) the ability to treat theories as independent objects of thought (i.e., rather than a representation of “the way things are”); and (c) the ability to recognize that theories can be false, setting aside the acceptance of a theory so evidence can be assessed to determine the veridicality of a theory. When we consider these cognitive and metacognitive abilities in the larger social context, it is clear that skills that are highly valued by the scientific community may be at odds with the cultural and intuitive views of the individual reasoner (Lemke, 2001). Thus, it often takes time for conceptual change to occur; evidence is not just evaluated in the context of the science investigation and science classroom, but within personal and community values. Conceptual change also takes place in the context of an individual’s personal epistemology, which can undergo developmental transitions (e.g., Sandoval, 2005).

2.2.1. Encoding and strategy use

Returning to the encoding and retrieval of information relevant to scientific reasoning tasks, many studies demonstrate that both children and adults are not always aware of their memory limitations while engaged in investigation tasks (e.g., Carey, Evans, Honda, Jay, & Unger, 1989; Dunbar & Klahr, 1989 ; Garcia-Mila & Andersen, 2007 ; Gleason & Schauble, 2000 ; Siegler & Liebert, 1975 ; Trafton & Trickett, 2001 ). Kanari and Millar (2004 ) found that children differentially recorded the results of experiments, depending on familiarity or strength of prior beliefs. For example, 10- to 14-year-olds recorded more data points when experimenting with unfamiliar items (e.g., using a force-meter to determine the factors affecting the force produced by the weight and surface area of boxes) than with familiar items (e.g., using a stopwatch to experiment with pendulums). Overall, children are less likely than adults to record experimental designs and outcomes, or to review notes they do keep, despite task demands that clearly necessitate a reliance on external memory aids.

Children are often asked to judge their memory abilities, and memory plays an important role in scientific reasoning. Children’s understanding of memory as a fallible process develops over middle childhood (Jaswal & Dodson, 2009; Kreutzer, Leonard, & Flavell, 1975). Young children view all strategies on memory tasks as equally effective, whereas 8- to 10-year-olds start to discriminate between strategies, and 12-year-olds know which strategies work best (Justice, 1986; Schneider, 1986). The development of metamemory continues through adolescence (Schneider, 2008), so there may not be a particular age at which memory and metamemory limitations are no longer a consideration for children and adolescents engaged in complex scientific reasoning tasks. However, it seems likely that metamemory limitations are more profound for children under 10-12 years.

Likewise, the acquisition of other metacognitive and metastrategic skills is a gradual process. Early strategies for coordinating theory and evidence are replaced with better ones, but there is not a stage-like change from using an older strategy to a newer one. Multiple strategies are concurrently available, so the process of change is very much like Siegler’s (1996) overlapping waves model (Kuhn et al., 1995). However, metastrategic competence does not appear to routinely develop in the absence of instruction. Kuhn and her colleagues have incorporated the use of specific practice opportunities and prompts to help children develop these types of competencies. For example, Kuhn, Black, Keselman, and Kaplan (2000) incorporated performance-level practice and metastrategic-level practice for sixth- to eighth-grade students. Performance-level practice consisted of standard exploration of the task environment, whereas metalevel practice consisted of scenarios in which two individuals disagreed about the effect of a particular feature in a multivariable situation. Students then evaluated different strategies that could be used to resolve the disagreement. Such scenarios were provided twice a week over the course of ten weeks. Although no performance differences were found between the two types of practice with respect to the number of valid inferences, there were more sizeable differences in measures of understanding of task objectives and strategies (i.e., metastrategic understanding).

Similarly, Zohar and Peled (2008 ) focused instruction in the control-of-variables strategy (CVS) on metastrategic competence. Fifth-graders were given a computerized task in which they had to determine the effects of five variables on seed germination. Students in the control group were taught about seed germination, and students in the experimental group were given a metastrategic knowledge intervention over several sessions. The intervention consisted of describing CVS, discussing when it should be used, and discussing what features of a task indicate that CVS should be used. A second computerized task on potato growth was used to assess near transfer. A physical task in which participants had to determine which factors affect the distance a ball will roll was used to assess far transfer. The experimental group showed gains on both the strategic and the metastrategic level. The latter was measured by asking participants to explain what they had done. These gains were still apparent on the near and far transfer tasks when they were administered three months later. Moreover, low-academic achievers showed the largest gains. It is clear from these studies that although meta-level competencies may not develop routinely, they can certainly be learned via explicit instruction.

Metacognitive abilities are necessary precursors to sophisticated scientific thinking, and represent one of the ways in which children, adults, and professional scientists differ. In order for children’s behavior to go beyond demonstrating the correctness of one’s existing beliefs (e.g., Dunbar & Klahr, 1989 ) it is necessary for meta-level competencies to be developed and practiced ( Kuhn, 2005 ). With metacognitive control over the processes involved, children (and adults) can change what they believe based on evidence and, in doing so, are aware not only that they are changing a belief, but also know why they are changing a belief. Thus, sophisticated reasoning involves both the use of various strategies involved in hypothesis testing, induction, inference, and evidence evaluation, and a meta-level awareness of when, how, and why one should engage in these strategies.

3. Scientific reasoning in context

Much of the existing laboratory work on the development of scientific thinking has not overtly acknowledged the role of contextual factors. Although internal cognitive and metacognitive processes have been a primary focus of past work, and have helped us learn tremendously about the processes of scientific thinking, we argue that many of these studies focused on individual cognition have, in fact, included both social factors (in the form of, for example, collaborations with other students, or scaffolds by parents or teachers) and cultural tools that support scientific reasoning.

3.1. Instructional and peer support: The role of others in supporting cognitive development

Our goal in this section is to re-examine our two focal mechanisms (i.e., encoding and strategy) and show how the development of these cognitive acquisitions and metastrategic control of them are facilitated by both the social and physical environment.

3.1.1. Encoding

Children must learn to encode effectively, by knowing what information is critical to pay attention to. They do so in part with the aid of their teachers, parents, and peers. Once school begins, teachers play a clear role in children’s cognitive development. An ongoing debate in the field of science education concerns the relative value of having children learn and discover how the world works on their own (often called “discovery learning”) and having an instructor guide the learning more directly (often called “direct instruction”). Different researchers interpret these labels in divergent ways, which adds fuel to the debate (see, e.g., Bonawitz et al., 2011; Hmelo-Silver, Duncan, & Chinn, 2007; Kirschner, Sweller, & Clark, 2006; Klahr, 2010; Mayer, 2004; Schmidt, Loyens, van Gog, & Paas, 2007). Regardless of definitions, though, this issue illustrates the core idea that learning takes place in a social context, with guidance that varies from minimal to didactic.

Specifically, this debate is about the ideal role for adults in helping children to encode information. In direct instruction, there is a clear role for a teacher, often actively pointing out effective examples as compared to ineffective ones, or directly teaching a strategy to apply to new examples. And, indeed, there is evidence that more direct guidance to test variables systematically can help students in learning, particularly in the ability to apply their knowledge to new contexts (e.g., Klahr & Nigam, 2004; Lorch et al., 2010; Strand-Cary & Klahr, 2008). There is also evidence that scaffolded discovery learning can be effective (e.g., Alfieri, Brooks, Aldrich, & Tenenbaum, 2011). Those who argue for discovery learning often do so because they note that pedagogical approaches commonly labeled as “discovery learning,” such as problem-based learning and inquiry learning, are in fact highly scaffolded, providing students with a structure in which to explore (Alfieri et al., 2011; Hmelo-Silver et al., 2007; Schmidt et al., 2007). Even in microgenetic studies in which children are described as engaged in “self-directed learning,” researchers ask participants questions along the way that serve as prompts, hints, dialogue, and scaffolds that facilitate learning (Klahr & Carver, 1995). What there appears to be little evidence for is “pure discovery learning,” in which students are given little or no guidance and expected to discover rules of problem solving or other skills on their own (Alfieri et al., 2011; Mayer, 2004). Thus, it is clear that formal education includes a critical role for a teacher to scaffold children’s scientific reasoning.

A common goal in science education is to correct the many misconceptions students bring to the classroom. Chinn and Malhotra (2002 ) examined the role of encoding evidence, interpreting evidence, generalization, and retention as possible impediments to correcting misconceptions. Over four experiments, they concluded that the key difficulty faced by children is in making accurate observations or properly encoding evidence that does not match prior beliefs. However, interventions involving an explanation of what scientists expected to happen (and why) were very effective in mediating conceptual change when encountering counterintuitive evidence. That is, with scaffolds, children made observations independent of theory, and changed their beliefs based on observed evidence. For example, the initial belief that a thermometer placed inside a sweater would display a higher temperature than a thermometer outside a sweater was revised after seeing evidence that disconfirmed this belief and hearing a scientist’s explanation that the temperature would be the same unless there was something warm inside the sweater. Instructional supports can play a crucial role in improving the encoding and observational skills required for reasoning about science.

In laboratory studies of reasoning, there is direct evidence of the role of adult scaffolding. Butler and Markman (2012a ) demonstrate that in complex tasks in which children need to find and use evidence, causal verbal framing (i.e., asking whether one event caused another) led young children to more effectively extract patterns from scenes they observed, which in turn led to more effective reasoning. In further work demonstrating the value of adult scaffolding in children’s encoding, Butler and Markman (2012b ) found that by age 4, children are much more likely to explore and make inductive inferences when adults intentionally try to teach something than when they are shown an “accidental” effect.

3.1.2. Strategy development and use

As discussed earlier in this chapter, learning which strategies are available and useful is a fundamental part of developing scientific thinking skills. Much research has looked at the role of adults in teaching strategies to children in both formal (i.e., school) and informal settings (e.g., museums, home; Fender & Crowley, 2007 ; Tenenbaum, Rappolt-Schlichtmann, & Zanger, 2004).

A central task in scientific reasoning involves the ability to design controlled experiments. Chen and Klahr (1999 ) found that directly instructing 7- to 10-year-old children in the strategies for designing unconfounded experiments led to learning in a short time frame. More impressively, the effectiveness of the training was shown seven months later, when older students given the strategy training were much better at correctly distinguishing confounded and unconfounded designs than those not explicitly trained in the strategy. In another study exploring the role of scaffolded strategy instruction, Kuhn and Dean (2005 ) worked with sixth graders on a task to evaluate the contribution of different factors to earthquake risk. All students given the suggestion to focus attention on just one variable were able to design unconfounded experiments, compared to only 11% in the control group given their typical science instruction. This ability to design unconfounded experiments increased the number of valid inferences in the intervention group, both immediately and three months later. Extended engagement alone resulted in minimal progress, confirming that even minor prompts and suggestions represent potentially powerful scaffolds. In yet another example, when taught to control variables either with or without metacognitive supports, 11-year-old children learned more when guided in thinking about how to approach each problem and evaluate the outcome ( Dejonckheere, Van de Keere, & Tallir, 2011 ). Slightly younger children did not benefit from the same manipulation, but 4- to 6-year-olds given an adapted version of the metacognitive instruction were able to reason more effectively about simpler physical science tasks than those who had no metacognitive supports (Dejonckheere, Van de Keere, & Mestdagh, 2010).

3.2. Cultural tools that support scientific reasoning

Clearly, even with the number of studies that have focused on individual cognition, a picture is beginning to emerge to illustrate the importance of social and cultural factors in the development of scientific reasoning. Many of the studies we describe highlight that even “controlled laboratory studies” are actually scientific reasoning in context. To illustrate, early work by Siegler and Liebert (1975) includes both an instructional context (a control condition plus two types of instruction: conceptual framework, and conceptual framework plus analogs) and the role of cultural supports. In addition to traditional instruction about variables (factors, levels, tree diagrams), one type of instruction included practice with analogous problems. Moreover, 10- and 13-year-olds were provided with paper and pencil to keep track of their results. A key finding was that record keeping was an important mediating factor in success. Children who had the metacognitive awareness of memory limitations and therefore used the provided paper for record keeping were more successful at producing all possible combinations necessary to manipulate and isolate variables to test hypotheses.

3.2.1. Cultural resources to facilitate encoding and strategy use

The sociocultural perspective highlights the role that language, speech, symbols, signs, number systems, objects, and tools play in individual cognitive development ( Lemke, 2001 ). As highlighted in previous examples, adult and peer collaboration, dialogue, and other elements of the social environment are important mediators. In this section, we highlight some of the verbal, visual, and numerical elements of the physical context that support the emergence of scientific reasoning.

Most studies of scientific reasoning include some type of verbal and pictorial representation as an aid to reasoning. As encoding is the first step in solving problems and reasoning, the use of such supports reduces cognitive load. In studies of hypothesis testing strategies with children (e.g., Croker & Buchanan, 2011 ; Tschirgi, 1980 ), for example, multivariable situations are described both verbally and with the help of pictures that represent variables (e.g., type of beverage), levels of the variable (e.g., cola vs. milk), and hypothesis-testing strategies (see Figure 1 , panel A). In classic work by Kuhn, Amsel, and O’Loughlin (1988 ), a picture is provided that includes the outcomes (children depicted as healthy or sick) along with the levels of four dichotomous variables (e.g., orange/apple, baked potato/French fries, see Kuhn et al., 1988 , pp. 40-41). In fact, most studies that include children as participants provide pictorial supports (e.g., Ruffman et al., 1993 ; Koerber, Sodian, Thoermer, & Nett, 2005). Even at levels of increasing cognitive development and expertise, diagrams and visual aids are regularly used to support reasoning (e.g., Schunn & Dunbar, 1996 ; Trafton & Trickett, 2001 ; Veermans, van Joolingen, & de Jong, 2006).

Figure 1. Panel A illustrates the type of pictorial support that accompanies the verbal description of a hypothesis-testing task (from Croker & Buchanan, 2011). Panel B shows an example of a physical apparatus (from Triona & Klahr, 2007). Panel C shows a screenshot from an intelligent tutor designed to teach how to control variables in experimental design (Siler & Klahr, 2012; see http://tedserver.psy.cmu.edu/demo/ted4.html for a demonstration of the tutor).

Various elements of number and number systems are extremely important in science. Sophisticated scientific reasoning requires an understanding of data and the evaluation of numerical data. Early work on evidence evaluation (e.g., Shaklee, Holt, Elek, & Hall, 1988 ) included 2 x 2 contingency tables to examine the types of strategies children and adults used (e.g., comparing numbers in particular cells, the “sums of diagonals” strategy). Masnick and Morris (2008 ) used data tables to present evidence to be evaluated, and varied features of the presentation (e.g., sample size, variability of data). When asked to make decisions without the use of statistical tools, even third- and sixth-graders had rudimentary skills in detecting trends, overlapping data points, and the magnitude of differences. By sixth grade, participants had developing ideas about the importance of variability and the presence of outliers for drawing conclusions from numerical data.
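
As a worked illustration of what evaluating such a table involves (the cover story and cell counts below are invented for illustration), the "sums of diagonals" heuristic can be contrasted with a conditional-probability comparison (delta-P), one common normative benchmark:

```python
# A 2 x 2 contingency table for a hypothetical treatment/outcome question.
#                  improved   not improved
#   plant fed          18           6
#   plant not fed       8           4

a, b = 18, 6   # fed:     improved, not improved
c, d = 8, 4    # not fed: improved, not improved

# "Sums of diagonals" heuristic often reported for children and adults:
diagonal_evidence = (a + d) - (b + c)   # (18 + 4) - (6 + 8) = 8 -> "feeding helps"

# Conditional-probability comparison (delta-P), a normative benchmark:
delta_p = a / (a + b) - c / (c + d)     # 0.75 - 0.667 = 0.083 -> only a weak relation

print(diagonal_evidence, round(delta_p, 3))
```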

Although language, symbols, and number systems are used as canonical examples of cultural tools and resources within the socio-cultural tradition ( Lemke, 2001 ), recent advances in computing and computer simulation are having a huge impact on the development and teaching of scientific reasoning. Although many studies have incorporated the use of physical systems ( Figure 1 , panel B) such as the canal task ( Gleason & Schauble, 2000 ), the ramps task (e.g., Masnick & Klahr, 2003 ), mixing chemicals ( Kuhn & Ho, 1980 ), and globes (Vosniadou, Skopeliti, & Ikospentaki, 2005), there is an increase in the use of interactive computer simulations (see Figure 1 , panel C). Simulations have been developed for electric circuits (Schauble, Glaser, Raghavan, & Reiner, 1992), genetics ( Echevarria, 2003 ), earthquakes ( Azmitia & Crowley, 2001 ), flooding risk ( Keselman, 2003 ), human memory ( Schunn & Anderson, 1999 ), and visual search (Métrailler, Reijnen, Kneser, & Opwis, 2008). Non-traditional science domains have also been used to develop inquiry skills. Examples include factors that affect TV enjoyment ( Kuhn et al., 1995 ), CD catalog sales ( Dean & Kuhn, 2007 ), athletic performance ( Lazonder, Wilhelm, & Van Lieburg, 2009 ), and shoe store sales ( Lazonder, Hagemans, & de Jong, 2010 ).

Computer simulations allow visualization of phenomena that are not directly observable in the classroom (e.g., atomic structure, planetary motion). Other advantages include that they are less prone to measurement error in apparatus set up, and that they can be programmed to record all actions taken (and their latencies). Moreover, many systems include a scaffolded method for participants to keep and consult records and notes. Importantly, there is evidence that simulated environments provide the same advantages as isomorphic “hands on” apparatus ( Klahr, Triona, & Williams, 2007 ; Triona & Klahr, 2007 ).

New lines of research are taking advantage of advances in computing and intelligent computer systems. Kuhn (2011b) recently examined how to facilitate reasoning about multivariable causality, and the problems associated with the visualization of outcomes resulting from multiple causes (e.g., the causes for different cancer rates by geographical area). Participants had access to software that produces a visual display of data points that represent main effects and their interactions. Similarly, Klahr and colleagues (Siler, Mowery, Magaro, Willows, & Klahr, 2010) have developed an intelligent tutor to teach experimentation strategies (see Figure 1, panel C). The use of intelligent tutors provides the unique opportunity of personally tailored learning and feedback experiences, dependent on each student’s pattern of errors. This immediate feedback can be particularly useful in helping develop metacognitive skills (e.g., Roll, Aleven, McLaren, & Koedinger, 2011) and in facilitating effective student collaboration (Diziol, Walker, Rummel, & Koedinger, 2010).

Tweney, Doherty, and Mynatt (1981 ) noted some time ago that most tasks used to study scientific thinking were artificial because real investigations require aided cognition. However, as can be seen by several exemplars, even lab studies include support and assistance for many of the known cognitive limitations faced by both children and adults.

4. Summary and conclusions

Determining the developmental trajectory of scientific reasoning has been challenging, in part because scientific reasoning is not a unitary construct. Our goal was to outline how the investigation, evidence evaluation, and inference skills that constitute scientific reasoning emerge from intuitive information seeking via the interaction of individual and contextual factors. We describe the importance of (a) cognitive processes and mechanisms, (b) metacognitive and metastrategic skills, (c) the role of direct and scaffolded instruction, and (d) a context in which scientific activity is supported and which includes cultural tools (literacy, numeracy, technology) that facilitate the emergence of scientific reasoning. At the outset, we intended to keep section boundaries clean and neat. What was apparent to us, and may now be apparent to the reader, is that these elements are highly intertwined. It was difficult to discuss pure encoding in early childhood without noting the role that parents play. Likewise, it was difficult to discuss individual discovery of strategies, without noting such discovery takes place in the presence of peers, parents, and teachers. Similarly, discussing the teaching and learning of strategies is difficult without noting the role of cultural tools such as language, number, and symbol systems.

There is far more to a complete account of scientific reasoning than has been discussed here, including other cognitive mechanisms such as formal hypothesis testing, retrieval, and other reasoning processes. There are also relevant non-cognitive factors such as motivation, disposition, personality, argumentation skills, and personal epistemology, to name a few (see Feist, 2006 ). These additional considerations do not detract from our assertion that encoding and strategy use are critical to the development of scientific reasoning, and that we must consider cognitive and metacognitive skills within a social and physical context when seeking to understand the development of scientific reasoning. Scientific knowledge acquisition and, importantly, scientific knowledge change is the result of individual and social cognition that is mediated by education and cultural tools. The cultural institution of science has taken hundreds of years to develop. As individuals, we may start out with the curiosity and disposition to be little scientists, but it is a long journey from information seeking to skilled scientific reasoning, with the help of many scaffolds along the way.

Acknowledgements

All authors contributed equally to the manuscript. The authors thank Eric Amsel, Deanna Kuhn, and Jamie Jirout for comments on a previous version of this chapter.

© 2012 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Continue reading from the same book

Edited by Heidi Kloos

Published: 14 November 2012

By Daisy A. Segovia and Angela M. Crossman

6885 downloads

By Joseph L. Amaral, Susan Collins, Kevin T. Bohache ...

4783 downloads

By Mieczyslaw Pokorski, Lukasz Borecki and Urszula Je...

3725 downloads

  • DOI: 10.1007/S10763-004-3224-2
  • Corpus ID: 55376211

The Nature and Development of Scientific Reasoning: A Synthetic View

  • Published 1 September 2004
  • Philosophy, Education
  • International Journal of Science and Mathematics Education

149 Citations

Connecting science and mathematics: the nature of scientific and statistical hypothesis testing, elementary students’ reasoning in drawn explanations based on a scientific theory, establishing assessment scales using a novel disciplinary rationale for scientific reasoning, intertwining evidence- and model-based reasoning in physics sensemaking: an example from electrostatics., insights from coherence in students’ scientific reasoning skills, an investigation of reasoning skills through problem based learning, a hypothetical learning progression for quantifying phenomena in science, cognitive basis and semantic structure of phenomenological reasoning on science among lower secondary school students: a case of indonesia, (icrsme) reasoning at the intersection of science and mathematics in elementary school: a systematic literature review, scientific reasoning: theory evidence coordination in physics-based and non-physics-based tasks, 75 references, the nature and development of hypothetico‐predictive argumentation with implications for science teaching, how do humans acquire knowledge and what does that imply about the nature of knowledge, mental models: towards a cognitive science of language, inference, and consciousness, how good are students at testing alternative explanations of unseen entities, deductive reasoning, brain maturation, and science concept acquisition: are they linked, self‐generated analogies as a tool for constructing and evaluating explanations of scientific phenomena, philosophy of natural science, piaget’s theory, the neurological basis of learning, development and discovery: implications for science and mathematics instruction / anton e. lawson, development of scientific reasoning in college biology: do two levels of general hypothesis-testing skills exist., related papers.

Showing 1 through 3 of 0 Related Papers

Physical Review Physics Education Research

  • Collections
  • Editorial Team
  • Open Access

Documenting the use of expert scientific reasoning processes by high school physics students

A. lynn stephens and john j. clement, phys. rev. st phys. educ. res. 6 , 020122 – published 24 november 2010.

  • Citing Articles (20)

Supplemental Material

  • INTRODUCTION
  • PREVIOUS RESEARCH AND THEORETICAL…
  • METHODOLOGY
  • EPISODES OF REASONING IDENTIFIED FROM…
  • NUMBER OF IDENTIFIED STUDENT-GENERATED…
  • LIMITATIONS
  • SOME WIDER IMPLICATIONS OF THE…
  • ACKNOWLEDGEMENTS

We describe a methodology for identifying evidence for the use of three types of scientific reasoning. In two case studies of high school physics classes, we used this methodology to identify multiple instances of students using analogies, extreme cases, and Gedanken experiments. Previous case studies of expert scientists have indicated that these processes can be central during scientific model construction; here we code for their spontaneous use by students. We document evidence for numerous instances of these forms of reasoning in these classes. Most of these instances were associated with motion- and force-indicating depictive gestures, which we take as one kind of evidence for the use of animated mental imagery. Altogether, this methodology shows promise for use in highlighting the role of nonformal reasoning in student learning and for investigating the possible association of animated mental imagery with scientific reasoning processes.

Figure

  • Received 13 January 2010

DOI: https://doi.org/10.1103/PhysRevSTPER.6.020122

case study of scientific reasoning

This article is available under the terms of the Creative Commons Attribution 3.0 License . Further distribution of this work must maintain attribution to the author(s) and the published article’s title, journal citation, and DOI.

Authors & Affiliations

Article Text

Vol. 6, Iss. 2 — July - December 2010

case study of scientific reasoning

Authorization Required

Other options.

  • Buy Article »
  • Find an Institution with the Article »

Download & Share

Sign up to receive regular email alerts from Physical Review Physics Education Research

Reuse & Permissions

It is not necessary to obtain permission to reuse this article or its components as it is available under the terms of the Creative Commons Attribution 3.0 License . This license permits unrestricted use, distribution, and reproduction in any medium, provided attribution to the author(s) and the published article's title, journal citation, and DOI are maintained. Please note that some figures may have been included with permission from other third parties. It is your responsibility to obtain the proper permission from the rights holder directly for these figures.

  • Forgot your username/password?
  • Create an account

Article Lookup

Paste a citation or doi, enter a citation.

Academia.edu no longer supports Internet Explorer.

To browse Academia.edu and the wider internet faster and more securely, please take a few seconds to  upgrade your browser .

Enter the email address you signed up with and we'll email you a reset link.

  • We're Hiring!
  • Help Center

paper cover thumbnail

Teaching Scientific Reasoning Skills: A Case Study of a Microcomputer-Based Curriculum

Profile image of Yael Friedler

1989, School Science and Mathematics

Page 1. 58 Teaching Scientific Reasoning Skills: A Case Study of a Microcomputer-Based Curriculum Yael Friedler Rafi Nachmias Nancy Butler Songer School of Education University of California Berkeley, California 94720 ...

Related Papers

Linus Kambeyo

case study of scientific reasoning

Asian Journal of University Education

zulinda zulkipli

The article discusses a study carried out to investigate scientific reasoning skills among 82 science pre-service teachers at the Faculty of Education, Universiti Teknologi MARA (UiTM), one of the public universities in Malaysia. The development of general scientific abilities is critical to enable students of science, technology, engineering, and mathematics (STEM) to successfully handle open-ended real-world tasks in their future careers. Teaching goals in STEM education include fostering content knowledge and developing general scientific abilities. One such ability is scientific reasoning. Scientific reasoning encompasses critical thinking skill is an important and vital learning outcome in modern science education. Lawson (1978) categorized scientific reasoning into four domains: Conservative Concept, Proportional Concept, Control Variable and Probabilistic Thinking, and Hypothetical-Deductive Reasoning. An instrument by Lawson (1978) was adapted for the study. The findings sho...

Argumentation

Eugene Garver

merve kocagul

Current educational reforms consider scientific reasoning skills as significant to engage students into generate scientific explanations. The primary aim of this study is to evaluate the effectiveness of a teacher-training program, which is based on gaining knowledge, instructional strategies and skills for science teachers to promote students' scientific reasoning skills. The participants of this research, which was in holistic single case study design, were an in-service science teacher who had attended to Scientific Reasoning Skills Training Program (SRSTP) and his thirty-two 5th grade students. Teaching Scientific Reasoning Skills Observation Form (TSROF), and Force and Motion Scientific Reasoning Skills Test (FMSRT) were used as data collection tools. Results indicated that the trained teacher showed success at the rate of maximum %61,06 of observed phenomena with %47,76 of them in behaviors dimension and the students showed significant developments both in total score of FMSRT and especially in inductive, deductive, causal, analogical reasoning skills and control of variables strategy.

The International Journal of Science, Mathematics and Technology Learning

anna karelina

… & Evaluation in …

Briana Timmerman

European Journal for Philosophy of Science

Krist Vaesen

About three decades ago, the late Ronald Giere introduced a new framework for teaching scientific reasoning to science students. Giere’s framework presents a model-based alternative to the traditional statement approach—in which scientific inferences are reconstructed as explicit arguments, composed of (single-sentence) premises and a conclusion. Subsequent research in science education has shown that model-based approaches are particularly effective in teaching science students how to understand and evaluate scientific reasoning. One limitation of Giere’s framework, however, is that it covers only one type of scientific reasoning, namely the reasoning deployed in hypothesis-driven research practices. In this paper, we describe an extension of the framework. More specifically, we develop an additional model-based scheme that captures reasoning in application-oriented practices (which are very well represented in contemporary science). Our own teaching experience suggests that this e...



ANALYSIS OF TWO-TIER QUESTION SCORING METHODS: A CASE STUDY ON THE LAWSON’S CLASSROOM TEST OF SCIENTIFIC REASONING

February 2021, Journal of Baltic Science Education 20(1):146-159

Qiaoyi Liu (University of Colorado Boulder), Kathleen Koenig (University of Cincinnati), and Qiu Ye Li (South China Normal University)

Figure: Percent Correct on the Two Item Pairs in LCTSR across Different Grade Levels

Discover the world's research

  • 25+ million members
  • 160+ million publication pages
  • 2.3+ billion citations
  • Noly Shofiyah
  • Budi Jatmiko

Nadi Suprapto

  • Hamidita Putri Ristia

Ulin Nuha

  • Supeno Supeno
  • Heather Douglas
  • Dianne Anderson

Vincentas Lamanauskas

  • Tereza Hrouzková

Lukas Richterek

  • Rhischa Assabet Shilla
  • Andrea Fatkur Rahman
  • Dewi Zahrotul Afida
  • Diana Gissell Martínez-Suárez

Eshetu Desalegn Alemneh

  • Belete Bedemo Beyene

Zainal Arifin

  • PHYS REV SPEC TOP-PH

Yang Xiao

  • Think Skills Creativ

Shaona Zhou

  • CHEM EDUC RES PRACT
  • A. L. Chandrasegaran

David F. Treagust

  • Recruit researchers
  • Join for free
  • Login Email Tip: Most researchers use their institutional email address as their ResearchGate login Password Forgot password? Keep me logged in Log in or Continue with Google Welcome back! Please log in. Email · Hint Tip: Most researchers use their institutional email address as their ResearchGate login Password Forgot password? Keep me logged in Log in or Continue with Google No account? Sign up



Case study : analysis of senior high school students scientific creative, critical thinking and its correlation with their scientific reasoning skills on the sound concept

M Mustika 1 , J Maknun 2 and S Feranie 3

Published under licence by IOP Publishing Ltd. Journal of Physics: Conference Series, Volume 1157, Issue 3. Citation: M Mustika et al 2019 J. Phys.: Conf. Ser. 1157 032057. DOI: 10.1088/1742-6596/1157/3/032057


Author affiliations

1 Departemen Pendidikan Fisika, Sekolah Pasca Sarjana, Universitas Pendidikan Indonesia, Jl. Dr. Setiabudi No. 229, Bandung 40154, Indonesia

2 Departemen Pendidikan Arsitektur, Fakultas Pendidikan Teknologi dan Kejuruan, Universitas Pendidikan Indonesia, Jl. Dr. Setiabudi No. 229, Bandung 40154, Indonesia

3 Departemen Pendidikan Fisika, Fakultas Pendidikan Matematika dan Ilmu Pengetahuan Alam, Universitas Pendidikan Indonesia, Jl. Dr. Setiabudi No. 229, Bandung 40154, Indonesia


This research aims to analyze the correlation between students' scientific creative and critical thinking and their scientific reasoning skills related to the sound concept. The participants were 42 eleventh-grade science students from a private school in Bandung. We used one package of scientific creative-critical thinking questions on a sound problem, in open-ended format, and a scientific reasoning skills instrument in multiple-choice format related to the sound concept. The results showed that the students' average scores for scientific creative and critical thinking were 23.67 and 17.36 out of maximum scores of 64 and 48, respectively; both are in the low achievement category. The average score for scientific reasoning skills was 36.70 out of a maximum of 100, also in the low achievement category. The correlation between scientific creative-critical thinking skills and scientific reasoning skills was 0.57.




Frontiers in Psychology

How emotions affect logical reasoning: evidence from experiments with mood-manipulated participants, spider phobics, and people with exam anxiety

Recent experimental studies show that emotions can have a significant effect on the way we think, decide, and solve problems. This paper presents a series of four experiments on how emotions affect logical reasoning. In two experiments different groups of participants first had to pass a manipulated intelligence test. Their emotional state was altered by giving them feedback that they had performed excellently, poorly, or on average. Then they completed a set of logical inference problems (with if p, then q statements) either in a Wason selection task paradigm or as problems from the logical propositional calculus. Problem content also had either a positive, negative or neutral emotional value. Results showed a clear effect of emotions on reasoning performance. Participants in a negative mood performed worse than participants in a positive mood, but both groups were outperformed by the neutral mood reasoners. Problem content also had an effect on reasoning performance. In a second set of experiments, participants with exam or spider phobia solved logical problems with contents that were related to their anxiety disorder (spiders or exams). Spider phobic participants' performance was lowered by the spider-related content, while exam-anxious participants were not affected by the exam-related problem content. Overall, unlike some previous studies, no evidence was found that performance is improved when emotion and content are congruent. These results have consequences for cognitive reasoning research and also for cognitively oriented psychotherapy and the treatment of disorders like depression and anxiety.

Introduction

In the field of experimental psychology, for a long time the predominant approach was a “divide and conquer” account in which cognition and emotion have been studied in strict isolation (e.g., Ekman and Davidson, 1994 ; Wilson and Keil, 2001 ; Holyoak and Morrison, 2005 ). Yet, in the last decade many researchers have realized that this is a quite artificial distinction and have regarded both systems as distinct but interacting (Dalgleish and Power, 1999 ; Martin and Clore, 2001 ). This new line of research resulted in many interesting findings and showed that emotions can have an influence on how we think and how successful we are at solving cognitive tasks (e.g., Schwarz and Clore, 1983 ; Bless et al., 1996 ; Schwarz and Skurnik, 2003 ). Such findings are not only relevant for basic cognitive research, such as reasoning (e.g., Blanchette, 2014 ), but may also have implications for cognitively oriented psychotherapy and the treatment of disorders like depression and anxiety.

In the present paper we explore the effect of emotion on a cognitive task that is often considered to be a test of rational thinking par excellence: logical reasoning. We start with a brief description of the logical problems that were used in our study. Then we summarize what is currently known about the connection between logical reasoning and emotional states. In the main part of the paper, we describe our hypotheses concerning the connection between logical reasoning and emotional states and then report a series of four experiments, two with a mood induction and two with participants who have a fear of either exams or spiders. In the final section we discuss the connection between logical reasoning and emotions and draw some general conclusions.

Logical reasoning problems

Logical reasoning goes back to the ancient Greek philosopher Aristotle and is today considered to be essential for the success of people in school and daily life and all kinds of scientific discoveries (Johnson-Laird, 2006). In the psychological lab it is often investigated by means of conditional reasoning tasks. Such tasks are composed of a first premise, a second premise and a conclusion. The first premise consists of an "if p, then q" statement that posits q to be true if p is true. The second premise refers to the truth of the antecedent ("if" part) or the consequent ("then" part). The participants' task is to decide whether the conclusion follows logically from the two given premises. In this regard, two inferences are valid and two are invalid (given they are interpreted as implications and not as biconditionals, i.e., as "if and only if"). The two valid inferences are modus ponens (MP; "if p, then q, and p is true, then q is true") and modus tollens (MT; "if p, then q, and q is false, then p is false"), whereas the two invalid inferences are affirmation of consequent (AC; "if p, then q, and q is true, then p is true") and denial of antecedent (DA; "if p, then q, and p is false, then q is false"). This type of reasoning task was used for Experiments 2–4 while a Wason selection task (Wason, 1966) was used for Experiment 1. The classical Wason selection task (WST) consists of a conditional rule (e.g., "If a card has a vowel on one side, then it has an even number on the other side.") accompanied by four cards marked with a letter or number, visible only from one side (e.g., A, D, 2, 3). Thus, one side of the card presents the truth or falsity of the antecedent (e.g., A, D) and the other side the truth or falsity of the consequent (e.g., 2, 3). This task requires turning over only those cards which are needed in order to check the validity of the rule. The logically correct response is to turn over the A-card (to check whether the other side is marked with an even number, MP) and the 3-card (because this is not an even number and therefore no vowel should be on the other side, MT). For reasons of brevity, the reader is referred to Johnson-Laird (2006) and Knauff (2007) for a detailed overview of the different types of reasoning problems used in the present paper. We used these tasks in the present work since sentential conditional tasks and the Wason selection task are the best understood problems of logical reasoning research (overview in Johnson-Laird and Byrne, 2002).
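For readers who prefer a computational illustration, the selection logic just described can be made concrete in a short sketch. This is illustrative only and not part of the original studies; the card encoding and function name are hypothetical stand-ins for the materials described above.

```python
# Illustrative sketch of the Wason selection task logic described above.
# The rule is "if p then q"; each card shows either the antecedent side or
# the consequent side. Only the p card and the not-q card can falsify the rule.

def cards_to_turn(cards):
    """Return the cards that must be turned over to test 'if p then q'.

    `cards` is a list of dicts with keys 'label', 'side' ('antecedent' or
    'consequent'), and 'value' (True if the visible side makes p / q true).
    """
    selected = []
    for card in cards:
        if card["side"] == "antecedent" and card["value"]:
            selected.append(card["label"])   # p card: the back must show q (MP)
        if card["side"] == "consequent" and not card["value"]:
            selected.append(card["label"])   # not-q card: the back must not show p (MT)
    return selected

# Example with the classic vowel/even-number rule:
# "If a card has a vowel on one side, then it has an even number on the other side."
cards = [
    {"label": "A", "side": "antecedent", "value": True},    # vowel: p true
    {"label": "D", "side": "antecedent", "value": False},   # consonant: p false
    {"label": "2", "side": "consequent", "value": True},    # even: q true
    {"label": "3", "side": "consequent", "value": False},   # odd: q false
]

print(cards_to_turn(cards))  # ['A', '3'] is the logically correct selection
```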

Previous studies and main hypotheses

Several studies on logical reasoning found that participants' performance is modulated by their emotional state. In several experiments, participants underwent a mood induction or were recruited based on their pre-existing emotional state. In both conditions, the emotional state often resulted in a deterioration of reasoning performance (Oaksford et al., 1996). In another study participants were recruited because they reported being depressed (Channon and Baker, 1994). They were presented with categorical syllogisms and their performance was worse than that of non-depressed participants. One possible explanation is that emotionally congruent information (e.g., sad content in the case of depression) puts additional load on working memory (e.g., Baddeley, 2003). Other explanations are that different emotional states affect people's motivation to solve rather complex cognitive tasks (Melton, 1995) or that the emotional state affects how attention is allocated (e.g., Gable and Harmon-Jones, 2012), even with positive material (e.g., Gable and Harmon-Jones, 2013).

The content of the reasoning task can also have an effect on performance. For instance, the content can trigger a stereotypical reaction which negatively affects performance on a conditional reasoning task (Lefford, 1946; see also De Jong et al., 1998). Other studies have shown that negative as well as positive content has a detrimental effect on conditional reasoning performance as opposed to neutral content, which may be due to reduced working memory resources (Blanchette and Richards, 2004; Blanchette, 2006). The problem content can also be freed from any semantic value by using non-words, so that the effect of non-semantic emotional material on reasoning performance can be investigated. Classical conditioning has been used to give non-words and neutral words a negative or positive emotional value, which resulted in participants providing fewer logically valid answers in a conditional reasoning task (Blanchette and Richards, 2004; Blanchette, 2006). The hypothesis that emotions affect how conditional reasoning tasks are interpreted could not be confirmed (Blanchette, 2006).

The literature review shows that mood and emotional problem content negatively affect logical reasoning performance. However, the effects on reasoning performance are still ambiguous, in particular when mood is combined with a problem content that is relevant to the mood, e.g., a participant in a sad mood is presented with a sad reasoning problem about bereavement (mood and content are congruent). Some studies have shown that such a combination results in worse performance. Health-anxiety patients, when reasoning about health-threats in a Wason selection task, have a threat-confirming strategy (Smeets et al., 2000); for example, they are likely to interpret a tremor as a sign of Parkinson's disease or chest pain as an indicator of cardiac infarction, even though other, less dangerous, causes are much more likely. Thus, threat-confirming participants select the card that confirms (rather than falsifies) their fears about the anticipated illness. Controls who do not have health-anxiety do not show such a bias when reasoning about health-threats. These findings are similar to another study that also used a Wason selection task, in which spider-phobic participants confirmed danger rules and falsified safety rules more often for phobia-relevant information than controls (De Jong et al., 1997a). Furthermore, socially anxious participants performed worse in relational inference tasks when the content was relevant to social anxiety as opposed to neutral content (Vroling and de Jong, 2009). However, spider phobic patients compared to non-phobic controls performed worse when the reasoning problem's content was specifically related to their phobia as well as when it contained general threat material (De Jong et al., 1997b).

Other studies found no difference between control participants in a neutral mood, participants with health-anxiety (De Jong et al., 1998), or participants who were not recruited from a clinical population but nevertheless reported anxiety symptoms (Vroling and de Jong, 2010). Participants in a neutral mood as well as anxious participants performed worse in the threat condition. Lastly, some studies even found a beneficial effect of emotions on logical reasoning performance. After the bombing in London in 2005, a study was carried out to investigate whether the increased fear related to the bombing had an impact on participants' performance when solving conditional reasoning tasks whose content was related to the bombing (Blanchette et al., 2007). The results showed that fearful participants provided more correct responses on a reasoning task with fear-related content than participants who did not report a high level of fear. In another study, participants who had been primed to be angry or who remembered an incident in which they had been cheated on performed better when the reasoning task involved detecting cheaters (Chang and Wilson, 2004). This mood-congruent effect was not found when participants who remembered an altruistic incident had to detect altruists. An evolutionary psychology explanation is offered for these findings, as the authors suggest that the ability to detect cheaters provides an evolutionary advantage (Chang and Wilson, 2004).

The ambiguous results in the literature motivated us to bring together the effect of the reasoners' emotional state and the effect of the reasoning problems' emotionally-laden content. Based on this combination we formulated and tested the following hypotheses:

  • Positive and negative emotion will result in a reduction of reasoning performance.
  • Positive and negative problem content will result in a reduction of reasoning performance.
  • There will be an interaction between the person's emotional state and the emotional content of the problem.

To test these hypotheses, four experiments have been carried out to investigate the effect of emotion, problem content and the combination of the two on reasoning performance. The experiments are:

  • Experiment 1: Positive, negative or neutral emotion (induced) paired with a Wason selection task that had positive, negative or neutral problem content.
  • Experiment 2: Positive, negative or neutral emotion (induced) paired with conditional reasoning tasks that had positive, negative or neutral problem content.
  • Experiment 3: Anxious or neutral emotion (spider-phobic or non-phobic participants) paired with conditional reasoning tasks that had neutral, negative or anxious (phobia-relevant) content.
  • Experiment 4: Anxious or neutral emotion (exam anxiety or confidence) paired with conditional reasoning tasks that had neutral, negative or anxious (exam anxiety-relevant) content.

Experiment 1: emotions in the Wason selection task

This experiment was designed in order to test the hypotheses that emotion and emotional content have a disrupting effect on reasoning performance. The participants' emotion was either neutral or induced to be positive or negative and then they had to solve Wason selection tasks. The content of the reasoning tasks which all participants had to solve was positive, negative or neutral as well.

Participants

Thirty students from the University of Giessen participated in this study (mean age: 22.93 years; range: 19–30 years; 18 female, 12 male). They did not participate in any previous investigations on conditional reasoning and they received a monetary compensation of eight Euro. The participants came from a range of disciplines and none of them were psychology students. They were all native German speakers and provided informed written consent.

Design and materials

First, the emotional state of the participants was measured with the German version of the positive and negative affect schedule (PANAS; Watson et al., 1988 ; Krohne et al., 1996 ) with which a score for negative and one for positive affect can be computed. Then the participants' emotional state was altered by a manipulated IQ-test. The procedure is described below. However, participants were not told that their emotional state was to be altered with a success-failure-method and they were randomly assigned to the “success group,” “neutral group,” and “failure group.” This method has high reliability and ecological validity (Nummenmaa and Niemi, 2004 ).

During the logical reasoning task participants had to solve 24 Wason selection tasks based on the three types of content (positive, negative, and neutral). While Wason selection tasks with positive emotional value described success situations, the negative ones described failure situations. This was done to create a link between emotion and the content of the reasoning material. Table 1 shows examples of the positive, negative, and neutral logical reasoning problems. The sentences were presented in German. Each problem was presented by means of four different virtual cards on a computer screen, as can be seen in Figure 1. The participants were told that each card contained one part of the rule on one side and the other part of the rule on the other side. On one set of cards, for example, one side of the card contained the information about whether somebody succeeds or not and the other side whether somebody is glad or not (the correct answer in our example is card 1 and card 4, i.e., verifying and falsifying the rule; the empirically most frequent answer, card 1 and card 3, means that participants try to verify the rule in both cases). The order of cards on the screen was pseudo-randomized and the order of Wason selection problems was completely randomized across participants.

Table 1. Examples of negative (mirroring failure situations), positive (mirroring success situations), and neutral rules (words and sentences were presented in German in all experiments).

Positive: When somebody passes an exam, then he is happy / When somebody triumphed, then he is lucky
Negative: When somebody feels overstrained, then he is sad / When somebody has self-doubts, then he is depressed
Neutral: When somebody is a cabinet maker, then he works with wood / When somebody showers, then he uses shampoo

Figure 1. Example of a WST problem with the four corresponding cards.

Participants were tested individually in a quiet laboratory room at the Department of Psychology of the University of Giessen. Prior to the experiment they were informed about the procedure. The emotional state of the participants was measured with the German version of the positive and negative affect schedule (PANAS; Watson et al., 1988 ; Krohne et al., 1996 ) with which a score for negative and for positive affect can be computed. This scale is based on 10 positive and 10 negative adjectives. Participants are required to state the emotional intensity of each word on a five point scale: 1 = “not at all,” 2 = “a little,” 3 = “moderately,” 4 = “quite a bit,” and 5 = “very much.” Thus, for the positive as well as for the negative affect a score ranging between 10 and 50 points could be computed. Examples of the test adjectives are: afraid, guilty, inspired, proud, etc. This emotion measurement schedule has been validated in several studies (e.g., Krohne et al., 1996 ; Crawford and Henry, 2004 ).
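As a rough illustration of the scoring just described, the following minimal sketch sums the two PANAS subscales; the item names and ratings are invented placeholders rather than the actual German adjectives or data.

```python
# Minimal sketch of PANAS scoring as described above: 10 positive and 10
# negative adjectives, each rated 1-5, summed into a positive-affect score
# and a negative-affect score (each ranging from 10 to 50).
# The item names and ratings below are invented placeholders.

POSITIVE_ITEMS = [f"pos_{i}" for i in range(1, 11)]   # e.g., "inspired", "proud", ...
NEGATIVE_ITEMS = [f"neg_{i}" for i in range(1, 11)]   # e.g., "afraid", "guilty", ...

def panas_scores(ratings):
    """ratings: dict mapping item name to a rating from 1 ('not at all') to 5 ('very much')."""
    positive_affect = sum(ratings[item] for item in POSITIVE_ITEMS)
    negative_affect = sum(ratings[item] for item in NEGATIVE_ITEMS)
    return positive_affect, negative_affect

example = {item: 3 for item in POSITIVE_ITEMS + NEGATIVE_ITEMS}
print(panas_scores(example))  # (30, 30) for uniformly 'moderate' ratings
```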

After that the participants carried out a subset of items from the IST2000R (Amthauer et al., 2001 ), which is a popular IQ-test in psychological research and practice. This subtest consisted of 13 items from three different categories: sentence completion, calculation and matrix tasks. These items were selected from all items by using the norming data from the intelligence test. For one group we selected the 13 problems that are most difficult, for the second group we selected items with moderate difficulty according to the norms and for the third group we used the easiest items from the IST2000R. Here is one example for the calculation tasks: (24/144) × 96 = ? (difficult), (3/6) + (20/8) = ? (moderate), and 8 × 123 = ? (easy). The items were presented on a sheet of paper and had to be solved by the participants in a limited time. In order to boost the effect of the emotion manipulation we also told them that the test was especially developed to predict academic success and that an average student solves approximately 50% of the items correctly. The time limit was 15 min. After finishing the test the participants received a manipulated verbal feedback on their performance to influence their emotional state. The feedback for the negative emotion group with the difficult problems was: “We are sorry to say that the analysis of your data showed that your performance was below the average student performance.” The feedback for the neutral emotion group with moderate item difficulty was: “The analysis of your data showed that your performance was on average student performance.” The participants from the positive emotion group with the easy items were told that their performance was above the average of student performance. Please note that this feedback did not reflect their real performance, because even if participants managed to solve the difficult problems they got the negative feedback. Accordingly, the participants in the positive emotional group got positive feedback even if they failed to solve the problems.

After this the emotional state was assessed again to see whether the mood induction was successful. Finally they were given the Wason selection tasks. In order to hide the real purpose of our study, we told the participants they had to do the PANAS since current emotions could influence their performance on intelligence tests and that we wanted to control for this. All our experiments were approved by the ethics committee of the German Psychological Association (DGPs).

The experiment then started with the emotion induction sequence [PANAS (t1), intelligence test items, feedback, and PANAS (t2)], followed by the 24 Wason selection tasks. A computer administered the Wason selection problems using the SuperLab 4.0 software (Cedrus Corporation, San Pedro, CA) and recorded participants' answers (in all experiments). A self-paced design was used for data collection. When the problem with the four cards was presented on the screen participants had to decide which of the cards they would like to turn over in order to check the validity of the given rule. They were asked to pick one or more cards by pressing the corresponding keys on the labeled keyboard. For instance, to turn over card 1 they had to press the "1" key, which was clearly labeled "card 1." The problems were separated by the instruction to press the <spacebar> whenever ready for the next problem. At the beginning one practice problem was presented to familiarize participants with the task but no feedback was given. At the end of the experiment all participants were informed about the true nature, the intention and the manipulations of the experiment. In all experiments data was analyzed with SPSS 19 (IBM) using analyses of variance (ANOVAs) and t-tests (details are given in each of the Results sections).

The emotion manipulation was successful, as can be seen in Figure 2. The success group showed a significant increase in positive affect from t1 to t2 [t(9) = −4.906, p = 0.001], while the negative affect decreased. The failure group showed a significant decrease in positive affect [t(9) = 5.471, p < 0.001] and a significant increase in negative affect [t(9) = −4.226, p < 0.01]. For the neutral group no differences were found, neither for positive nor for negative affect. A one-way ANOVA on the positive difference scores with the between-subject factor group (neutral, success, or failure) revealed significant group differences [F(2, 27) = 23.964, Mean Squared Error (MSE) = 6.511, p < 0.001]. A second one-way ANOVA on the negative difference scores also showed group differences [F(2, 27) = 7.975, MSE = 6.407, p < 0.01]. Planned t-tests for independent samples revealed significant differences in positive difference scores for the success and neutral group [t(18) = 4.618, p < 0.001] and for the success and failure group [t(18) = 7.069, p < 0.001]. Significant differences in the negative difference scores were observed for the comparison between success group and failure group [t(18) = −3.192, p < 0.01], as well as for the comparison between failure group and neutral group [t(18) = 4.024, p = 0.001].
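The analysis strategy reported above (difference scores t2 − t1, a one-way ANOVA across groups, and planned comparisons) can be sketched as follows. The affect scores are invented placeholders and the original analyses were run in SPSS rather than Python, so this is only an illustration of the procedure.

```python
# Sketch of the manipulation-check analysis described above, using SciPy.
# All scores are simulated placeholders; the original analysis used SPSS.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
groups = ("success", "neutral", "failure")

# Simulated positive-affect scores at t1, and at t2 with an induced shift
pa_t1 = {g: rng.integers(20, 40, size=10).astype(float) for g in groups}
pa_t2 = {
    "success": pa_t1["success"] + 5 + rng.normal(0, 2, 10),
    "neutral": pa_t1["neutral"] + rng.normal(0, 2, 10),
    "failure": pa_t1["failure"] - 5 + rng.normal(0, 2, 10),
}

# Within-group change from t1 to t2 (paired t-test per group)
for group in groups:
    t, p = stats.ttest_rel(pa_t2[group], pa_t1[group])
    print(group, "t =", round(t, 3), "p =", round(p, 4))

# One-way ANOVA on the positive difference scores across the three groups
diffs = {g: pa_t2[g] - pa_t1[g] for g in groups}
F, p = stats.f_oneway(diffs["success"], diffs["neutral"], diffs["failure"])
print("ANOVA: F =", round(F, 3), "p =", round(p, 4))

# Planned comparison of two groups (independent samples t-test)
t, p = stats.ttest_ind(diffs["success"], diffs["neutral"])
print("success vs neutral: t =", round(t, 3), "p =", round(p, 4))
```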

Figure 2. Difference scores (t2 − t1) for the emotion induction of Experiment 1 for each group. **p ≤ 0.01, ***p ≤ 0.001.

On average, participants only solved 5% of the problems correctly (turning card 1 and card 4; every other selection was incorrect, which occurred in 95% of the cases). We are aware that this performance is very low. Therefore, we initially thought that it might be useful to statistically test whether this performance significantly differs from chance level. We then decided, however, not to follow this idea, because our results agree with the entire literature on the Wason selection task (Wason and Johnson-Laird, 1972; overview in Manktelow, 2004). Moreover, the usual way of dealing with these low performance rates is to use the "falsification index" and the "confirmation index," which were introduced by Oaksford et al. (1996). These indices give a better performance measurement than just comparing correct answers in the Wason selection tasks. The indices can range from +2 to −2 and provide a measure of whether an individual tried to verify or to falsify a given rule by turning over certain cards or card combinations (Oaksford et al., 1996; Chang and Wilson, 2004). The falsification index (FI) is computed with the formula FI = (p + not q) − (not p + q) and stands for the participants' tendency to choose the p and not q cards in order to falsify the rule. Note that a score of +2 is equivalent to full logicality. The confirmation index (CI) is the "complement" of the falsification index; it stands for the degree to which participants choose the p and q cards in order to confirm the rule. It is calculated with the formula CI = (p + q) − (not p + not q) (Oaksford et al., 1996). Note that a score of +2 is equivalent to a confirming strategy without falsifying the given rule. The mean falsification index is shown in Figure 3.
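A minimal sketch of the index computation, following the two formulas above; the example card selections are invented, and only the formulas themselves come from the description in the text.

```python
# Sketch of the falsification and confirmation indices described above.
# A participant's choices for one WST are coded as a set of selected cards
# out of {"p", "not_p", "q", "not_q"}. The example choices are invented.

def falsification_index(choices):
    # FI = (p + not q) - (not p + q); ranges from -2 to +2, where +2 = full logicality
    return (("p" in choices) + ("not_q" in choices)) - (("not_p" in choices) + ("q" in choices))

def confirmation_index(choices):
    # CI = (p + q) - (not p + not q); +2 = pure confirmation strategy
    return (("p" in choices) + ("q" in choices)) - (("not_p" in choices) + ("not_q" in choices))

logical = {"p", "not_q"}     # turning the p and not-q cards (MP and MT)
confirming = {"p", "q"}      # the empirically most frequent selection

print(falsification_index(logical), confirmation_index(logical))        # 2 0
print(falsification_index(confirming), confirmation_index(confirming))  # 0 2
```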

Figure 3. Falsification index (ranging from −2 to 2) for the WSTs for each group. It represents the choices of p and not-q in order to falsify the rule (modus tollens). **p ≤ 0.01.

Falsification indices were then used in an ANOVA including the within-subject factor content (positive, negative, neutral) and the between-subject factor group (success, failure, neutral). This analysis showed that the participants' emotion resulted in a significant difference, F(2, 27) = 6.033, MSE = 0.574, p < 0.01, but the content of the reasoning problem did not. Post-hoc t-tests showed that the falsification index of the failure group differed significantly from those of the neutral group [t(3.737) = −3.435, p < 0.01] and the success group [t(10.353) = 3.14, p = 0.01]. Overall, the neutral group [Mean Falsification Index (MFI) = 0.636, Standard Error (SE) = 0.19] performed better than the success group (MFI = 0.426, SE = 0.14), and the success group in turn was better than the failure group (MFI = −0.029, SE = 0.038). A more detailed descriptive analysis showed that this effect is due to a specific type of error: participants in the failure group chose the p and q cards most frequently (Figure 4).

Figure 4. Choices of the p and q cards of the WSTs in relative frequencies (%) for each group. With p and q (modus ponens) participants tried to confirm the rule.

The results indicate that the emotions of an individual have an effect on reasoning performance independent of task content. In particular, a negative emotion resulted in a lower falsification index, meaning that participants in a negative emotional state were more likely to deviate from logical norms. The participants in a positive state were also not as good as the neutral group, but this difference was less pronounced. Overall, participants in a neutral emotional state performed best. However, no interaction was found between participants' emotion and the emotional task content, neither for the falsification index nor for the confirmation index. Thus, it was not easier for individuals in a positive (negative) emotional state to solve Wason selection tasks with positive (negative) content. The reason for this might be that the Wason selection task overall is too difficult to solve and that there is no generally accepted theory about what makes the tasks so complex. A recent overview of such approaches can be found in Klauer et al. (2007). For our studies the reasons for the difficulty of the Wason selection task are not particularly essential. A concern, however, is that participants' low performance could produce a "floor effect," so that existing effects of the emotional content might not be visible in the data. To control for this possible deficit, a paradigm known to produce better performance was chosen for the subsequent experiment.

Experiment 2: emotions and conditional reasoning tasks

The intention of this experiment was to use a reasoning task which participants find easier to solve than a Wason selection task. We therefore used a conditional reasoning paradigm. If such a task is easier, any difference between groups' performance should be much clearer and such differences can be more readily attributed to the experimental manipulation. Again, the conditional reasoning tasks had a positive, negative or neutral content and like in the previous experiment, participants' emotions were either induced (positive or negative) or neutral.

Thirty students from the University of Giessen participated in this study (mean age: 22.6 years; range: 20–27 years; 22 female, 8 male). They did not participate in any of the other investigations. They received an eight Euro compensation for participation. All participants were naïve with respect to the aim of the study, none were psychology students. All were native German speakers and provided informed written consent.

The same success-failure-method which was used in the previous experiment was used for the emotion induction. Reasoning problems consisted of pairs of premises that were followed by a to-be-validated conclusion. Four premise-pairs had a positive, four a neutral and four a negative content. These 12 problems were combined with the four possible inferences: modus ponens (MP), modus tollens (MT), denial of antecedent (DA) and affirmation of consequence (AC), resulting in 48 conditional inferences per participant. All problems were randomized for each participant. Half of the presented conclusions were valid; the other half were invalid. Here are two examples of inferences with a valid conclusion:

  • Modus ponens/positive emotional content
  • Premise 1: When a person succeeds, then the person is glad.
  • Premise 2: A person succeeds.
  • Conclusion: This person is glad.
  • Modus tollens/negative emotional content
  • Premise 1: When a person performs poorly, then this person is angry.
  • Premise 2: A person is not angry.
  • Conclusion: This person did not perform poorly.
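As an illustration of how such an item set can be assembled, the following sketch crosses placeholder premise pairs with the four inference forms and records the logical validity of each form. The premise pairs and templates are invented stand-ins, not the original German materials, and with four pairs per content the crossing yields the 48 problems described above.

```python
# Sketch of the conditional-inference design described above: contents x
# premise pairs x 4 inference forms. MP and MT are logically valid; AC and
# DA are not. The premise pairs below are invented placeholders.
from itertools import product

# (inference form, second premise template, conclusion template, valid?)
FORMS = [
    ("MP", "{p}.",     "Therefore, {q}.",     True),
    ("MT", "Not {q}.", "Therefore, not {p}.", True),
    ("AC", "{q}.",     "Therefore, {p}.",     False),
    ("DA", "Not {p}.", "Therefore, not {q}.", False),
]

# One placeholder premise pair per content; the experiment used four per content.
PREMISES = {
    "positive": [("a person succeeds", "the person is glad")],
    "negative": [("a person performs poorly", "the person is angry")],
    "neutral":  [("a person showers", "the person uses shampoo")],
}

problems = []
for content, pairs in PREMISES.items():
    for (p, q), (name, second, conclusion, valid) in product(pairs, FORMS):
        problems.append({
            "content": content,
            "form": name,
            "premise1": f"If {p}, then {q}.",
            "premise2": second.format(p=p, q=q),
            "conclusion": conclusion.format(p=p, q=q),
            "valid": valid,
        })

print(len(problems))   # 12 with one pair per content; 48 with four pairs per content
print(problems[0])
```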

The participants were tested individually in a quiet laboratory room at the Department of Psychology of the University of Giessen. Prior to the experiment, the participants were again instructed about the procedure of the experiment. Subsequently, the emotion induction started and resulted in a "success group," a "failure group," and a "neutral group." Then the inferences were presented on a computer screen. A self-paced design was used. After reading the first premise on the screen, participants had to press the space bar to reach the next premise, then again the space bar to reach the conclusion. While both premises were presented in black letters, the conclusion was presented in red. The task required an evaluation of whether the conclusion followed necessarily from the premises (conditionals were not to be interpreted as biconditionals). Participants responded by pressing either a "Yes" key or a "No" key on the keyboard. There were two practice trials at the beginning of the experiment but no feedback was given. At the end of the experiment there was a debriefing and a detailed explanation of the true purpose of the experiment.

The emotion induction was again successful. In the success group positive affect increased and negative affect decreased (similar to the mood induction in the previous experiment). In the failure group positive affect decreased and negative affect increased (although the latter was not significant, due to a large standard error). No change in positive or negative affect was found in the neutral group. The ANOVA revealed significant group differences [F(2, 27) = 15.964, MSE = 13.607, p < 0.001] and the t-tests for independent samples showed that the success and neutral group differed in the difference scores of positive affect [t(18) = 2.146, p < 0.05], as did the success and failure group [t(18) = 5.666, p < 0.001] and the failure and neutral group [t(18) = −3.854, p < 0.01].

Performance for the sentential conditional inference problems was better than for the Wason selection tasks as 61.46% of the problems were correctly solved. Error rates were compared using an ANOVA for the emotionality of the participants (success, failure, and neutral group) and the emotional content (positive, negative, neutral). Significant differences were found for both factors.

With respect to the emotional state, the performance of the participants in the three groups was reliably different [F(2, 27) = 3.68, MSE = 2.492, p < 0.05], and follow-up t-tests showed that error rates for the failure group were significantly higher than for the neutral group [t(18) = 2.622, p < 0.05]. The neutral group showed the best performance [Mean (M) = 0.310, SE = 0.046], followed by the success group (M = 0.402, SE = 0.035) and the failure group, which committed the most errors (M = 0.446, SE = 0.024). These results are presented in Figure 5.

Figure 5. Error rates in relative frequencies (%) for the conditional reasoning task for each group. *p ≤ 0.05.

The difference between positive and negative content of the reasoning problems was also significant. The ANOVA showed a significant main effect [F(2, 54) = 3.159, MSE = 0.555, p = 0.05] and post-hoc paired sample t-tests revealed a significant difference in error rates between positive and negative content [t(29) = 2.491, p < 0.05]. The fewest errors were made with negative content (M = 0.356, SE = 0.029), followed by neutral content (M = 0.385, SE = 0.022), and positive content (M = 0.417, SE = 0.028). This is visualized in Figure 6. However, no interaction was found between emotional state and task content.

Figure 6. Error rates in relative frequencies (%) for the conditional reasoning task for each type of content. *p ≤ 0.05.

The reported findings show that several factors can influence reasoning performance. Performance can be affected either by the emotion of the individual or the content of the problem or the type of inference.

The effect of emotion might be due to the fact that emotion results in representations in working memory that occupy the same subsystems that are also needed for reasoning (Oaksford et al., 1996 ). The content effect is also interesting since it challenges previous findings. While we found fewer errors in inferences with negative content, Blanchette and Richards ( 2004 ) found that emotions impair reasoning performance no matter whether they are positive or negative.

Experiment 3: spider-phobic participants and conditional reasoning

In contrast to the previous experiments, the sample for this experiment was selected from a population with spider phobia. Therefore, it was not necessary to induce emotions, as participants were selected for their anxiety, with high ecological validity. This was done to extend the findings of the previous experiments and to see whether a difference in performance can be found for participants who bring a pre-existing emotional state to certain situations, without any mood induction. Additionally, we were interested in whether content relevant to the condition of such participants has any effect on their reasoning abilities.

Nine spider phobic students (mean age: 22.33 years; range: 20–26 years; 7 female, 2 male) and seven non-phobic control students (mean age: 22.86 years; range: 20–26 years; 7 female) from the University of Giessen participated in the experiment. Participants were selected from a larger sample by means of scores on the Spider Phobia Questionnaire (SPQ; Klorman et al., 1974). SPQ scores of spider fearful students (M = 20.22; SE = 0.878) were significantly higher than those of the non-fearful control students (M = 2.00; SE = 0.873) [t(14) = −14.459; p < 0.001]. Each participant received five Euro or a course credit for participation. Moreover, none of the participants were psychology students (and thus none had prior experience with logical reasoning tasks) and all were native German speakers. All participants provided informed written consent.

Design and procedure were similar to those of Experiment 2. The 48 reasoning problems consisted of pairs of premises that were followed by a to-be-validated conclusion. However, the content differed: four statements had spider phobia relevant content, four were generally negative, and four were neutral. The presentation of the 48 three-term problems was randomized across participants. Examples of the statements are presented in Table 2.

Table 2. Examples of statements with different content.

Spider phobia relevant: When a person sees a toy spider, then the person is scared witless
Negative: When a person is anorexic, then the person has to be force-fed
Neutral: When a person is a craftsman, then the person has served an apprenticeship

All participants were tested individually in a quiet room at the Department of Psychology of the University of Giessen. At the beginning participants filled out the SPQ. Afterwards the logical reasoning tasks had to be solved. Presentation of problems and recording of responses was identical to Experiment 2.

Error rates of the conditional reasoning task were compared using an ANOVA with the between-subject factor group and the two within-subject factors content and type of reasoning.

For the content of the reasoning problems a significant main effect was obtained [F(2, 28) = 4.645; p < 0.05]. Further paired t-tests showed that spider phobia relevant problems (M = 36.72%; SE = 4.30%) resulted in significantly more errors than neutral ones (M = 30.47%; SE = 4.41%) [t(15) = 2.928; p = 0.01]. This was due to spider phobics performing worse on phobia relevant content, and the interaction between problem content and group was significant [F(2, 28) = 6.807; p < 0.01]. A post-hoc paired t-test revealed that spider phobics performed significantly worse on inference problems with spider phobia relevant content (M = 43.06%; SE = 4.47%) compared to negative ones (M = 34.72%; SE = 5.01%) [t(8) = 2.667; p < 0.05]. Furthermore, phobia relevant problems resulted in more errors than neutral ones (M = 36.81%; SE = 4.71%), but this difference marginally failed to reach significance [t(8) = 2.268; p = 0.053]. However, non-phobics made significantly more errors on inferences with negative content (M = 33.93%; SE = 6.38%) compared to spider phobia relevant (M = 28.57%; SE = 7.20%) [t(6) = −2.521; p < 0.05] and neutral problems (M = 22.32%; SE = 7.33%) [t(6) = −3.653; p < 0.05]. This interaction pattern between the groups and the task content of the conditional reasoning task is visualized in Figure 7.

Figure 7. Error rates in relative frequencies (%) for the spider phobic and non-phobic participants. *p ≤ 0.05.

Our results show that spider phobics' performance was worst on problems related to spider phobia. We are aware that our sample size is rather small. One reason is that it is difficult to find spider phobics, because they usually avoid situations where they are confronted with spiders. However, our control group was also small. The reason is that we initially tested nine participants in the control group (the same number as in the experimental group), but we then had to exclude two participants (due to response strategy and incomplete data recording) and could not replace them with two new participants for technical reasons. However, we do not think that this is a serious problem, because even with this small sample size our differences reached the level of statistical significance. Given these considerations, we think that our results reliably show that illness-related tasks impair reasoning for anxiety patients.

There are a couple of possible explanations of how (positive and negative) emotions impede reasoning performance. One explanation is that all kinds of emotions have negative effects on the motivation or effort of the participants (e.g., Lefford, 1946). Other explanations are based on dual process models (System or Type 1: automatic, fast, intuitive, based on prior knowledge; System or Type 2: effortful, slow, explicit, rule-based; e.g., Stanovich, 2010). A good overview of the different theories is provided in Blanchette (2014). However, we believe that the most reasonable explanation for the current findings is provided by the suppression theory (Oaksford et al., 1996): processing phobia relevant material involves confrontation with the phobic object, which causes fear. This yields a strong emotional response that pre-loads working memory resources. Moreover, there is evidence that spider phobia could change reasoning patterns. De Jong et al. (1997a) showed that spider phobics tend to rely on a danger-confirming reasoning strategy while solving phobia relevant Wason selection tasks. While spider phobics performed worst on phobia relevant problems in our study, non-phobics showed the worst performance on problems with negative content. These results are in line with Blanchette and Richards (2004) and Blanchette (2006). Overall, affirmation of consequent and denial of antecedent with spider phobia relevant and negative content resulted in more errors, which is similar to the findings of Blanchette and Richards (2004).

Experiment 4: exam-anxious participants and conditional reasoning tasks

This experiment was designed to investigate if the effect found in Experiment 3 extends to other anxiety related conditions such as exam-anxiety. Therefore, participants were also selected based on their anxious state and some of the problems had an emotional content which was relevant to exam-anxiety while others were neutral or generally negative.

The sample consisted of 17 students with exam anxiety and 17 students without exam anxiety. They were selected from a larger sample (N = 47) based on their scores on a measure of exam anxiety (Hodapp, 1991). They were all female because exam anxiety is more prevalent amongst women (Zeidner and Safir, 1989; Chapell et al., 2005; Wacker et al., 2008). The age range was 20–29 years (mean age for participants with exam anxiety: 24.24 years, without exam anxiety: 23.12 years). For remuneration they could choose to receive five Euro or a course credit. Psychology students and people who had already taken part in experiments on this topic were excluded. All participants were native German speakers and provided informed written consent.

Participants were assessed with the TAI-G (Hodapp, 1991), a measure of exam anxiety, in order to differentiate between exam-anxious and non-anxious participants. The TAI-G consists of 30 statements which describe emotions and thoughts in exam situations. Participants are asked how well those statements describe them when they have to take exams. Statements were rated on a scale from "never" (1), "sometimes" (2), "often" (3) to "almost always" (4).

Examples of such statements are:

“I have a strange sensation in my stomach.”

“Thoughts suddenly start racing through my head that block me.”

“I worry that something could go wrong.”

Scores of the TAI-G range from 30 to 120. In order to be classified as exam-anxious a minimum score of 84 is necessary while a score below 54 is classified as non-exam-anxious. Those limits were obtained in a study with 730 students (Wacker et al., 2008 ) in which one standard deviation ( SD = 14.8) was subtracted from the mean score ( m = 69.1) to obtain the lower limit and added to obtain the upper limit.
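A small worked sketch of these cutoffs, using only the normative mean and standard deviation reported above; the classification function is a hypothetical illustration of the rule just described.

```python
# Worked sketch of the TAI-G cutoffs described above: one standard deviation
# below / above the normative mean from Wacker et al. (2008).
NORM_MEAN = 69.1
NORM_SD = 14.8

lower_limit = NORM_MEAN - NORM_SD   # ~54.3: scores below 54 count as non-exam-anxious
upper_limit = NORM_MEAN + NORM_SD   # ~83.9: scores of 84 or above count as exam-anxious

def classify(taig_score):
    """Classify a TAI-G total score (range 30-120) using the cutoffs above."""
    if taig_score >= 84:
        return "exam-anxious"
    if taig_score < 54:
        return "non-exam-anxious"
    return "unclassified (between cutoffs)"

print(round(lower_limit, 1), round(upper_limit, 1))  # 54.3 83.9
print(classify(97), classify(48))                    # exam-anxious non-exam-anxious
```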

Once participants finished the TAI-G, they were given the conditional inference problems. The 48 conditional inference problems consisted of "if, then" statements of which one third were exam-anxiety-related, one third generally negative and one third emotionally neutral. Examples are given in Table 3. Presentation of the problems and recording of answers was identical to Experiments 2 and 3.

Table 3. Examples of statements with different content.

Exam-related: If a person is waiting in front of the exam room, then the person is nervous
Negative: If a person has breast cancer, then the person has lumps in her breasts
Neutral: If a person is thirteen years old, then the person is still a child

The selection of exam-anxious and non-exam-anxious groups of participants was successful. The group of exam-anxious participants had TAI-G scores that ranged from 84 to 107 with a mean of 97 (SE = 1.586). The group of non-exam-anxious participants had scores between 39 and 54 with a mean of 48 (SE = 1.047). A t-test for independent samples showed a significant difference between groups [t(32) = 25.788, p < 0.001].

Moreover, as expected, the ANOVA revealed a significant main effect of content [F(2, 64) = 8.058; p = 0.001]. Post-hoc t-tests showed that conditional inference problems with fear-related content (M = 44.67%; SE = 2.52%) resulted in more errors than generally negative (M = 36.58%; SE = 2.53%) [t(33) = 3.703; p = 0.001] and neutral problems (M = 37.87%; SE = 2.80%) [t(33) = 2.626; p < 0.05]. A repeated measures ANOVA was carried out on error rates with type of inference (MP, MT, AC, and DA), content (fear-related, negative, and neutral) and exam anxiety as factors. However, no significant interaction was found between content and group. This means that exam-anxious and non-exam-anxious participants performed similarly across fear-relevant, negative, and neutral problems.

Our results show that exam-anxious and non-exam-anxious participants performed similarly across fear-relevant, negative, and neutral problems. Inferences about exam anxiety resulted in reduced performance in both groups. This may be because all participants were currently enrolled at university and so could relate to exam anxiety. Moreover, physiological changes have been observed in people who are high in exam anxiety as well as in those who are low in exam anxiety (Holroyd et al., 1978). Therefore, associations with exam situations can be triggered which reduce working memory resources and subsequently performance on reasoning problems (Oaksford et al., 1996; Blanchette and Richards, 2004). In contrast to previous findings (Lefford, 1946; De Jong et al., 1998; Blanchette and Richards, 2004; Blanchette, 2006), negative problems did not result in a reduction of performance. Even though these problems were emotional and negative (e.g., "if a person has a miscarriage, then this person will get depressed"), participants may not have been able to relate to the content as it was not as personally relevant to students as the exam-related content.

General discussion

We conducted two experiments with participants who underwent a mood induction and two with participants that were either anxious about spiders or exams. Experiment 1 showed that the emotions of an individual have an effect on reasoning performance independent from task content. In Experiment 2, we found that reasoning performance can be affected either by the emotion of the individual or the content of the problem or the type of inference. In Experiment 3, spider-phobic participants showed lower reasoning performance in spider-related inferences, but in Experiment 4, exam-anxious participants did not perform worse on inferences with an exam-related content.

The results agree with some of our hypotheses but not with all of our initial assumptions. Our first hypothesis was that positive and negative emotion will result in a reduction of logical reasoning performance. This was confirmed: in the first and second experiment, participants in a neutral emotional state outperformed those in negative or positive emotional states independent of the task (WST and conditionals). These findings are consistent with previous research (Channon and Baker, 1994; Melton, 1995; Oaksford et al., 1996). When a negative or positive emotional state is induced in participants, this results in a deterioration of performance on a Wason selection task compared to participants in a neutral emotional state (Oaksford et al., 1996). In another study participants were recruited because they reported being depressed (Channon and Baker, 1994). They were presented with categorical syllogisms and their performance was worse than that of non-depressed participants. An explanation that has been offered is that as emotionally congruent information gets retrieved and processed, this takes away resources from working memory (e.g., Baddeley, 2003) that should have been used to process the reasoning task. In addition, positive emotional states also result in poorer performance (Melton, 1995), as it is assumed that people in a positive mood pursue more global reasoning strategies, pay less attention, and are therefore more prone to errors than people in a negative, analytic mood.

Our results concerning the second hypothesis (predicting a detrimental effect of positive and negative problem content on performance) are mixed. It was confirmed by the third experiment, in which non-phobic participants performed best when the content was neutral. On the other hand, content had no effect on performance in the first experiment, and in the second experiment the best performance was measured with negative content, whereas most errors were committed with positive content. In the fourth experiment there was no difference between negative and neutral content, and performance was worst with exam-anxiety-related content. These findings partially agree with previous research showing that performance is affected when the content is related to general threats, because participants then tend to select threat-confirming and safety-falsifying strategies in a Wason selection task (De Jong et al., 1998). Other studies have shown that negative as well as positive content has a detrimental effect on conditional reasoning performance compared with neutral content, which may be due to reduced working memory resources (Blanchette and Richards, 2004; Blanchette, 2006). Furthermore, if the content is controversial, it can stir up emotions that result in a stereotypical reaction that negatively affects performance on a conditional reasoning task (Lefford, 1946). In that study, participants made more errors when the content was controversial (eliciting stereotypical responses such as "homeless persons are lazy") as opposed to neutral.

The third hypothesis, that there may be an effect on performance when positive and negative mood is combined with positive and negative problem content, was only supported by Experiment 3, which found the expected interaction. Nonetheless, the absence of the suggested interaction in three of four experiments is in line with some previous findings (e.g., De Jong et al., 1998, health anxiety; Vroling and de Jong, 2010, anxiety symptoms in a non-clinical population).

Only in the third experiment did participants who are afraid of spiders perform worse on problems with spider-phobia-relevant content compared to negative content, which strengthens other findings (De Jong et al., 1997a, b; Smeets et al., 2000; Vroling and de Jong, 2009). A similar trend was observed for performance on spider-phobia-relevant problems compared to neutral ones, yet this difference was not significant; a larger sample might have yielded clearer results. A previous study showed that, when reasoning about health threats in a Wason selection task, health-anxiety patients adopt a threat-confirming strategy (Smeets et al., 2000). Controls without health anxiety do not show such a bias when reasoning about health threats. These findings are similar to another study that also used a Wason selection task, in which spider-phobic participants confirmed danger rules and falsified safety rules more often for phobia-relevant information than controls did (De Jong et al., 1997a). Furthermore, socially anxious participants performed worse on relational inference tasks when the content was relevant to social anxiety as opposed to neutral (Vroling and de Jong, 2009). However, spider-phobic patients, compared to non-phobic controls, performed worse when the content of the reasoning problem was specifically related to their phobia as well as when it contained general threat material (De Jong et al., 1997b).

Why did we find no evidence that performance improves when emotion and content are congruent? In Blanchette et al. (2007), fearful participants provided more correct responses on a reasoning task with fear-related content than participants who did not report a high level of fear. In another study, participants who had been primed to be angry, or who remembered an incident in which they had been cheated, performed better when the reasoning task involved detecting cheaters (Chang and Wilson, 2004).

We think that the ambiguity in previous findings (Channon and Baker, 1994; Melton, 1995; Oaksford et al., 1996; Chang and Wilson, 2004; Blanchette et al., 2007) and in our own experiments may be due to differences between samples. The first two experiments induced emotions in participants who were primarily sad and frustrated, whereas the participants in the last two experiments were anxious; hence one is not comparing like with like. The latter two experiments can be further differentiated: the third experiment selected people who are not afraid of spiders for the control group, whereas most students experience some form of exam anxiety and the sample of the fourth experiment was made up entirely of students. This may explain why participants who reported exam anxiety and those who reported none both performed poorly when the content was related to exam anxiety.

According to the suppression theory (Oaksford et al., 1996), emotion has a detrimental effect on performance because resources are allocated elsewhere and are not available to solve the task at hand. This means that emotional participants should perform worse than those in a neutral state, which was confirmed in Experiments 1 and 2. Content may give rise to emotion, so similar results due to reduced working memory resources should also be found in experiments with emotional content. In Experiment 3, the best performance was with neutral content, possibly because spider-related content triggered a response (e.g., an avoidance strategy) that used working memory resources that would otherwise have been used to solve the task. Anxiety-related content in Experiment 4 resulted in the worst performance, possibly for the same reason.

Thus far we have focused on working memory resources, but it is also possible that attentional processes are of major relevance in this context. For example, correct decisions and decision times may be compromised during emotional (especially negative) processing, since emotional processing (in addition to reasoning) requires attentional resources (see, for instance, the work of Harmon-Jones and colleagues). However, we cannot fully resolve the question of working memory vs. attention with these experiments.

The findings of Experiment 3 contrast with those of Experiments 1 and 2, where no content or interaction effects were found. People with a phobia may perform worse on problems whose content is related to their phobia because they try to avoid anxiety-provoking stimuli (American Psychiatric Association, 2000). This avoidance is not necessarily found in depressed participants, who tend to ruminate on depressive material (American Psychiatric Association, 2000). While participants in Experiments 1 and 2 were not clinically depressed, the induced emotion had a depressive quality, which may explain why no interaction was found in these experiments. In addition, perhaps only anxiogenic stimuli have a depleting effect on working memory, and previous research was largely based on anxiety (De Jong et al., 1998; Blanchette and Richards, 2004; Blanchette, 2006). In contrast, Lefford's (1946) material was not anxiogenic, but he found an effect, which he argued was due to a stereotypical response. However, if people do not relate to the content, it will not elicit a stereotypical response.

The reason why no effect was found in Experiments 1 and 2 might be that the material was not as personally relevant and therefore did not trigger sufficient emotion for an effect to show. This does not explain why in Experiments 2 and 4 the best performance was with negative content. One could argue that because this content is negative, participants are more deliberate in order to avoid negative consequences (if the content is personally relevant for them). Furthermore, a more analytic processing style has been proposed for depression (Edwards and Weary, 1993), so this content may have triggered such a processing style compared with the more global processing strategy associated with positive emotion. On this account, one would have expected superior performance for negative emotion in Experiments 1 and 2, which was not the case.

Therefore, more clarity might be achieved if experiments compared personally relevant emotional content with emotional content that is not personally relevant. Content should also be differentiated according to whether it is anxiogenic or depressive. Furthermore, anxious participants should be compared with depressed participants, and a distinction has to be made between avoidance caused by anxiety and rumination caused by depression. If a detrimental effect on performance is found in both groups, it has to be investigated whether it has the same cause, namely depleted working memory (or attentional) resources.

From a psychotherapeutic point of view our studies are interesting, as they show that spider-phobic patients do not only show inadequate emotional responses to spiders; they also show a decrement in performing cognitive tasks, such as logical reasoning, when the tasks have to do with spiders. The study shows an apparent connection between reported fear on the SPQ (Klorman et al., 1974) and behavior during the experiments (error rates). Experiments 1 and 2 suggest that it is neither misery nor happiness but "common unhappiness" (Freud, 1895, p. 322) that is desirable, because participants in a negative or positive mood did not perform well. This insight has guided some therapeutic approaches for decades, which recognize that being freed from misery better equips one to deal with life's adversities (Freud, 1895). People appear to find it easiest to process neutral (non-emotional) information (Experiments 1 and 2), but ideally sessions work with hot cognitions and elicit key emotions and cognitions (Safran and Greenberg, 1982; Beck, 1995). If neutral information became the sole focus, sessions would elicit fewer key emotions and cognitions and turn into a pleasant chat that the patient remembers fondly. The balance is that the patient should not be overwhelmed with emotional material, which would impair reasoning; instead, emotional material can be introduced bit by bit (e.g., as in systematic desensitization in cognitive behavioral therapy).

It is worthwhile for patients to remember what has been discussed in sessions, because new behaviors and alternative viewpoints developed collaboratively in sessions may be easily forgotten, especially when the patient is suffering from depression, which often reduces concentration. Some therapists recommend that their patients take notes during sessions (Beck, 1995), but if only things that are easily remembered are discussed, this problem is circumvented. Therefore, if the patient wishes to be stabilized, non-emotional material may be best; if they want to work through distressing material, however, emotional content cannot be avoided. Emotions and cognitions are thus related and influence each other, and they have to be combined according to the goal of therapy.

Thus far the key finding is that emotional state and content may interact to modulate logical reasoning. This is, however, only the case if (mood) state and (task) content are related (Experiment 3; spider-related content among spider phobics). So far it does not generalize to other contexts, since it could, for example, not be found in a sample with exam anxiety (Experiment 4; exam anxiety in combination with exam content). These ambiguities, and the roles of working memory and attentional processes, need to be addressed in future studies in order to explain the influence of emotional content and emotion on human reasoning performance.

Author contributions

Nadine Jung did the statistical analysis and wrote the paper. Christina Wranke designed and conducted the experiments, and did the statistical analysis. Kai Hamburger designed the experiments and wrote the paper. Markus Knauff designed the experiments and wrote the paper.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

This research was supported by DFG-Graduate Program ”Neuronal Representation and Action Control—NeuroAct” (DFG 885/2) to Christina Wranke and by DFG Grant KN465/9-1 to Markus Knauff. We thank Luzie Jung and Nadja Hehr for carrying out some of the experiments. We further thank Sarah Jane Abbott and Carolina Anna Bosch for proofreading the manuscript. Finally, we thank the reviewers for their valuable comments.

1 For reasons of simplicity the term "emotion" is also used to represent "mood" (emotional state). The distinction between emotion and mood will only be pointed out where necessary.

References

  • American Psychiatric Association. (2000). Diagnostic and Statistical Manual of Mental Disorders—IV—Text Revision, 4th Edn., Rev. Washington, DC: American Psychiatric Publishing
  • Amthauer R., Brocke B., Liepmann D., Beauducel A. (2001). Intelligenz-Struktur-Test 2000R. Göttingen: Hogrefe
  • Baddeley A. D. (2003). Working memory: looking back and looking forward. Nat. Rev. Neurosci. 4, 829–839. doi: 10.1038/nrn1201
  • Beck J. (1995). Cognitive Therapy: Basics and Beyond. New York, NY: Guilford
  • Blanchette I. (2006). The effect of emotion on interpretation and logic in a conditional reasoning task. Mem. Cogn. 34, 1112–1125. doi: 10.3758/BF03193257
  • Blanchette I. (ed.). (2014). Emotion and Reasoning. New York, NY: Psychology Press
  • Blanchette I., Richards A. (2004). Reasoning about emotional and neutral materials: is logic affected by emotion? Psychol. Sci. 15, 745–752. doi: 10.1111/j.0956-7976.2004.00751.x
  • Blanchette I., Richards A., Melnyk L., Lavda A. (2007). Reasoning about emotional contents following shocking terrorist attacks: a tale of three cities. J. Exp. Psychol. Appl. 13, 47–56. doi: 10.1037/1076-898X.13.1.47
  • Bless H., Clore G. L., Schwarz N., Golisano V., Rabe C., Wölk M. (1996). Mood and the use of scripts: does a happy mood really lead to mindlessness? J. Pers. Soc. Psychol. 71, 665–679. doi: 10.1037/0022-3514.71.4.665
  • Chang A., Wilson M. (2004). Recalling emotional experiences affects performance on reasoning problems. Evol. Hum. Behav. 25, 267–276. doi: 10.1016/j.evolhumbehav.2004.03.007
  • Channon S., Baker J. (1994). Reasoning strategies in depression: effects of depressed mood on a syllogism task. Pers. Indiv. Differ. 17, 707–711. doi: 10.1016/0191-8869(94)90148-1
  • Chapell M. S., Blanding B., Silverstein M. E., Takahashi M., Newman B., Gubi A., et al. (2005). Test anxiety and academic performance in undergraduate and graduate students. J. Educ. Psychol. 97, 268–274. doi: 10.1037/0022-0663.97.2.268
  • Crawford J. R., Henry J. D. (2004). The positive and negative affect schedule (PANAS): construct validity, measurement properties and normative data in a large non-clinical sample. Br. J. Clin. Psychol. 43, 245–265. doi: 10.1348/0144665031752934
  • Dalgleish T., Power M. (eds.). (1999). Handbook of Cognition and Emotion. New York, NY: John Wiley and Sons Ltd
  • De Jong P. J., Mayer B., van den Hout M. A. (1997a). Conditional reasoning and phobic fear: evidence for a fear-confirming reasoning pattern. Behav. Res. Ther. 35, 507–516. doi: 10.1016/S0005-7967(96)00124-6
  • De Jong P. J., Haenen M.-A., Schmidt A., Mayer B. (1998). Hypochondriasis: the role of fear-confirming reasoning. Behav. Res. Ther. 36, 65–74. doi: 10.1016/S0005-7967(97)10009-2
  • De Jong P. J., Weertman A., Horselenberg R., van den Hout M. A. (1997b). Deductive reasoning and pathological anxiety: evidence for a relatively strong belief bias in phobic subjects. Cogn. Ther. Res. 21, 647–662. doi: 10.1023/A:1021856223970
  • Edwards J. A., Weary G. (1993). Depression and the impression-formation continuum: piecemeal processing despite the availability of category information. J. Pers. Soc. Psychol. 64, 636–645. doi: 10.1037/0022-3514.64.4.636
  • Ekman P., Davidson R. J. (eds.). (1994). The Nature of Emotion: Fundamental Questions. New York, NY: Oxford University Press
  • Freud S. (1895). Zur Psychotherapie der Hysterie, in Studien über Hysterie, eds Breuer J., Freud S. (Frankfurt: Fischer), 271–322
  • Gable P. A., Harmon-Jones E. (2012). Reducing attentional capture of emotion by broadening attention: increased global attention reduces early electrophysiological responses to negative stimuli. Biol. Psychol. 90, 150–153. doi: 10.1016/j.biopsycho.2012.02.006
  • Gable P. A., Harmon-Jones E. (2013). Does arousal per se account for the influence of appetitive stimuli on attentional scope and the late positive potential? Psychophysiology 50, 344–350. doi: 10.1111/psyp.12023
  • Hodapp V. (1991). Das Prüfungsängstlichkeitsinventar TAI-G: eine erweiterte und modifizierte Version mit vier Komponenten. Z. Pädagog. Psychol. 5, 121–130
  • Holroyd K., Westbrook T., Wolf M., Badhorn E. (1978). Performance, cognition and physiological responding in test anxiety. J. Abnorm. Psychol. 87, 442–451. doi: 10.1037/0021-843X.87.4.442
  • Holyoak K. J., Morrison R. G. (eds.). (2005). The Cambridge Handbook of Thinking and Reasoning. New York, NY: Cambridge University Press
  • Johnson-Laird P. (2006). How We Reason. Oxford: Oxford University Press
  • Johnson-Laird P. N., Byrne R. M. J. (2002). Conditionals: a theory of meaning, pragmatics, and inference. Psychol. Rev. 109, 646–678. doi: 10.1037/0033-295X.109.4.646
  • Klauer K. C., Stahl C., Erdfelder E. (2007). The abstract selection task: new data and an almost comprehensive model. J. Exp. Psychol. Learn. Mem. Cogn. 33, 680–703. doi: 10.1037/0278-7393.33.4.680
  • Klorman R., Weerts T., Hastings J., Melamed B., Lang P. (1974). Psychometric description of some fear-specific questionnaires. Behav. Ther. 5, 401–409. doi: 10.1016/S0005-7894(74)80008-0
  • Knauff M. (2007). How our brains reason logically. Topoi 26, 19–36. doi: 10.1007/s11245-006-9002-8
  • Krohne H. W., Egloff B., Kohlmann C.-W., Tausch A. (1996). Untersuchungen mit einer deutschen Form der Positive and Negative Affect Schedule (PANAS). Diagnostica 42, 139–156
  • Lefford A. (1946). The influence of emotional subject matter on logical reasoning. J. Gen. Psychol. 34, 127–151. doi: 10.1080/00221309.1946.10544530
  • Manktelow K. (2004). Reasoning and Thinking. Hove: Psychology Press
  • Martin L. L., Clore G. L. (eds.). (2001). Theories of Mood and Cognition: A User's Guidebook. Mahwah, NJ: Lawrence Erlbaum
  • Melton R. J. (1995). The role of positive affect in syllogism performance. Pers. Soc. Psychol. Bull. 21, 788–794. doi: 10.1177/0146167295218001
  • Nummenmaa L., Niemi P. (2004). Inducing affective states with success-failure manipulations: a meta-analysis. Emotion 4, 207–214. doi: 10.1037/1528-3542.4.2.207
  • Oaksford M., Morris F., Grainger B., Williams J. M. G. (1996). Mood, reasoning, and central executive processes. J. Exp. Psychol. Learn. Mem. Cogn. 22, 476–492. doi: 10.1037/0278-7393.22.2.476
  • Safran J. D., Greenberg L. S. (1982). Eliciting "hot cognitions" in cognitive behavior therapy: rationale and procedural guidelines. Can. Psychol. 23, 83–87. doi: 10.1037/h0081247
  • Schwarz N., Clore G. L. (1983). Mood, misattribution, and judgments of well-being: informative and directive functions of affective states. J. Pers. Soc. Psychol. 45, 513–523. doi: 10.1037/0022-3514.45.3.513
  • Schwarz N., Skurnik I. (2003). Feeling and thinking: implications for problem solving, in The Psychology of Problem Solving, eds Davidson J. E., Sternberg R. (Cambridge: Cambridge University Press), 263–292
  • Smeets G., de Jong P. J., Mayer B. (2000). If you suffer from a headache, then you have a brain tumour: domain-specific reasoning "bias" and hypochondriasis. Behav. Res. Ther. 38, 763–776. doi: 10.1016/S0005-7967(99)00094-7
  • Stanovich K. E. (2010). Decision Making and Rationality in the Modern World. New York, NY: Oxford University Press
  • Vroling M. S., de Jong P. J. (2009). Deductive reasoning and social anxiety: evidence for a fear-confirming belief bias. Cogn. Ther. Res. 33, 633–644. doi: 10.1007/s10608-008-9220-z
  • Vroling M. S., de Jong P. J. (2010). Threat-confirming belief bias and symptoms of anxiety disorders. J. Behav. Ther. Exp. Psychiatry 41, 110–116. doi: 10.1016/j.jbtep.2009.11.002
  • Wacker A., Jaunzeme J., Jaksztat S. (2008). Eine Kurzform des Prüfungsängstlichkeitsinventars TAI-G. Z. Pädagog. Psychol. 22, 73–81. doi: 10.1024/1010-0652.22.1.73
  • Wason P. C. (1966). Reasoning, in New Horizons in Psychology I, ed Foss B. M. (Harmondsworth: Penguin), 135–151
  • Wason P. C., Johnson-Laird P. N. (1972). Psychology of Reasoning: Structure and Content. Cambridge: Harvard University Press
  • Watson D., Clark L. A., Tellegen A. (1988). Development and validation of brief measures of positive and negative affect: the PANAS scales. J. Pers. Soc. Psychol. 54, 1063–1070. doi: 10.1037/0022-3514.54.6.1063
  • Wilson R. A., Keil F. C. (eds.). (2001). The MIT Encyclopedia of the Cognitive Sciences. Cambridge, MA: MIT Press
  • Zeidner M., Safir M. P. (1989). Sex, ethnic, and social differences in test anxiety among Israeli adolescents. J. Genet. Psychol. 150, 175–185. doi: 10.1080/00221325.1989.9914589

A case study of scientific reasoning

  • Published: December 1993
  • Volume 23 , pages 199–207, ( 1993 )


  • Campbell McRobbie 1 &
  • Lyn English 1  


Concern is increasingly being expressed about the teaching of higher order thinking skills in schools and the levels of understanding of scientific concepts by students. Metaphors for the improvement of science education have included science as exploration and science as process skills for experimentation. As a result of a series of studies on how children relate evidence to their theories or beliefs, Kuhn (1993a) has suggested that changing the metaphor to science as argument may be a fruitful way to increase the development of higher order thinking skills and understanding in science instruction. This report is of a case study into the coordination of evidence and theories by a grade 7 primary school student. This student was not able to coordinate these elements in a way that would enable her to rationally consider evidence in relation to her theories. It appeared that the thinking skills associated with science as argument were similar for her in different domains of knowledge and context.


Ash, A., Torrance, N., & Olson, D. (1993, April). The development of children's understanding of necessary and sufficient evidence. Paper presented at the annual conference of the American Educational Research Association, Atlanta, Georgia.

Carey, S. (1986). Cognitive science and science education. American Psychologist, 41 , 1123–1130.


Chinn, C.A., & Brewer, W.F. (1993). The role of anomalous data in knowledge acquisition: A theoretical framework and implications for science instruction. Review of Educational Research, 63 (1), 1–49.


Duschl, R. A., & Gitomer, D. H. (1991). Epistemological perspectives on conceptual change: Implications for educational practice. Journal of Research in Science Teaching, 28 , 839–858.

Galotti, K. M. (1989). Approaches to studying formal and everyday reasoning. Psychological Bulletin, 105 , 331–351.

Glaser, R. (1984). Education and thinking: The role of knowledge. American Psychologist, 39 (2), 93–104.

Holland, J.H., Holyoak, K.J., Nisbett, R.E., & Thagard, P.R. (1986). Induction: Processes of inference, learning, and discovery. Cambridge, MA: The MIT Press.

Inhelder, B., & Piaget, J. (1958). The growth of logical thinking from childhood to adolescence . New York: Basic Books.

Klahr, D., & Dunbar, K. (1988). Dual space search during scientific reasoning. Cognitive Science, 12 , 1–48.

Kuhn, D. (1989). Children and adults as intuitive scientists. Psychological Review, 96 (4), 674–689.

Kuhn, D. (1993a). Connecting scientific and informal reasoning. Merrill-Palmer Quarterly, 39 , 74–103.

Kuhn, D. (1993b). Science as argument: Implications for teaching and learning scientific thinking. Science Education, 77 (3), 319–337.

Kuhn, D., Amsel, E., & O'Loughlin, M. (1988). The development of scientific thinking skills. New York: Academic Press.

Kuhn, D., Schauble, L., & Garcia-Mila, M. (1992). Cross-domain development of scientific reasoning. Cognition and Instruction, 9 (4), 285–327.

Linn, M.C., & Songer, N.B. (1993). How do students make sense of science? Merrill-Palmer Quarterly, 39 , 47–73.

Mayer Committee. (1992). Employment-related key competencies: A proposal for consultation . Melbourne: Australian Education Council.

O'Brien, D. (1987). The development of conditional reasoning: An iffy proposition. In H. Reese (Ed.), Advances in Child Development and Behaviour (Vol. 20, pp. 61–90). Orlando, FL: Academic Press.

Reif, F., & Larkin, J.H. (1991). Cognition in scientific and everyday domains: Comparison and learning implications. Journal of Research in Science Teaching, 28 (9), 733–760.

Schauble, L. (1990). Belief revision in children: The role of prior knowledge and strategies for generating evidence. Journal of Experimental Child Psychology, 49, 31–57.

Schauble, L., Klopfer, L., & Raghavan, K. (1991). Students' transition from an engineering model to a science model of experimentation. Journal of Research in Science Teaching, 28 , 859–882.

Sodian, B., Zaitchik, D., & Carey, S. (1991). Young children's differentiation of hypothetical beliefs from evidence. Child Development, 62 , 753–766.

Tobin, K., & Gallagher, J. (1987). What happens in high school science classrooms? Journal of Curriculum Studies, 19 , 549–560.


Author information

Authors and affiliations.

Centre for Mathematics and Science Education, Queensland University of Technology, Locked Bag 2, Red Hill, 4059, Brisbane, QLD

Campbell McRobbie ( Acting Director ) &  Lyn English ( Associate Professor )


Additional information

Specializations : science learning, scientific reasoning, learning environments, science teacher education.

Specializations: cognition, reasoning in science and mathematics.


About this article

McRobbie, C., English, L. A case study of scientific reasoning. Research in Science Education 23 , 199–207 (1993). https://doi.org/10.1007/BF02357061


  • Primary School
  • Science Education
  • Scientific Concept
  • Thinking Skill
  • Scientific Reasoning

Safeguarding Demand Forecasting with Causal Graphs

Causal AI: exploring the integration of causal reasoning into machine learning

Ryan O'Sullivan

Towards Data Science

What is this series of articles about?

Welcome to my series on Causal AI, where we will explore the integration of causal reasoning into machine learning models. Expect to explore a number of practical applications across different business contexts.

In the last article we covered enhancing marketing mix modelling with Causal AI. In this article we will move on to safeguarding demand forecasting with causal graphs.

If you missed the last article on marketing mix modelling, check it out here:

Enhancing Marketing Mix Modelling with Causal AI

towardsdatascience.com

Introduction

In this article we will delve into how you can safeguard demand forecasting (or any forecasting use case to be honest) with causal graphs.

The following areas will be explored:

  • A quick forecasting 101.
  • What is demand forecasting?
  • A refresher on causal graphs.
  • How can causal graphs safeguard demand forecasting?
  • A Python case study illustrating how causal graphs can safeguard your forecasts from spurious correlations.

The full notebook can be found here:

causal_ai/notebooks/safeguarding demand forecasting with causal graphs.ipynb at main ·…

This project introduces causal ai and how it can drive business value. - causal_ai/notebooks/safeguarding demand…

Forecasting 101

Time series forecasting involves predicting future values based on historical observations.

To start us off, there are a number of terms worth getting familiar with (a short code sketch follows the list):

  • Auto-correlation — The correlation of a series with its previous values at different time lags. Helps identify if there is a trend present.
  • Stationary — This is when the statistical properties of a series are constant over time (e.g. mean, variance). Some forecasting methods assume stationarity.
  • Differencing — This is when we subtract the previous observation from the current observation to transform a non-stationary series into a stationary one. An important step for models which assume stationarity.
  • Seasonality — A regular repeating cycle which occurs at a fixed interval (e.g. daily, weekly, yearly).
  • Trend — The long term movement in a series.
  • Lag — The number of time steps between an observation and a previous value.
  • Residuals — The difference between predicted and actual values.
  • Moving average — Used to smooth out short term fluctuations by averaging a fixed number of past observations.
  • Exponential smoothing — Weights are applied to past observations, with more emphasis placed on recent values.
  • Seasonal decomposition — This is when we separate a time series into seasonal, trend and residual components.
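To make a few of these terms concrete, here is a small pandas sketch (illustrative only, not code from the original post; the data file is hypothetical):

```python
import pandas as pd

# Hypothetical daily sales series indexed by date
sales = pd.read_csv("sales.csv", parse_dates=["date"], index_col="date")["units"]

lag_1_autocorr = sales.autocorr(lag=1)       # auto-correlation at lag 1
differenced = sales.diff().dropna()          # differencing towards stationarity
weekly_ma = sales.rolling(window=7).mean()   # 7-day moving average to smooth fluctuations
smoothed = sales.ewm(alpha=0.3).mean()       # exponential smoothing (recent values weighted more)
```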

There are a number of different methods that can be used for forecasting (again, a brief sketch follows the list):

  • ETS (Error, Trend, Seasonal) — An exponential smoothing method that models error, trend and seasonality components.
  • Autoregressive models (AR models) — Models the current value of the series as a linear combination of its previous values.
  • Moving average models (MA models) — Models the current value of the series as a linear combination of past forecast errors.
  • Autoregressive integrated moving average (ARIMA models) — Combines AR and MA models with the incorporation of differencing to make the series stationary.
  • State space models — Deconstructs the timeseries into individual components such as trend and seasonality.
  • Hierarchical models — A method which handles data structured in a hierarchy such as regions.
  • Linear regression — Uses one or more independent variable (feature) to predict the dependent variable (target).
  • Machine learning (ML) — Uses more flexible algorithms like boosting to capture complex relationships.
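And a minimal ARIMA example with statsmodels, reusing the `sales` series from the sketch above (the (1, 1, 1) order is an arbitrary illustration, not a recommendation):

```python
from statsmodels.tsa.arima.model import ARIMA

# AR(1) + one round of differencing + MA(1)
model = ARIMA(sales, order=(1, 1, 1))
fitted = model.fit()
print(fitted.forecast(steps=14))  # predict the next 14 periods
```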

If you want to dive further into this topic, I highly recommend the following resource which is well known as the go-to guide for forecasting (the version below is free 😀):

Forecasting: Principles and Practice (3rd ed)


In terms of applying some of the forecasting models using Python, I’d recommend exploring Nixtla which has an extensive list of models implemented and an easy to use API:

Open Source Time Series Ecosystem. Nixtla has 35 repositories available. Follow their code on GitHub.

Demand forecasting.

Predicting the demand for your product is important.

  • It can help manage your inventory, avoiding over- or understocking.
  • It can keep your customers satisfied, ensuring products are available when they want them.
  • It reduces holding costs and minimises waste, which is cost efficient.
  • It is essential for strategic planning.

Keeping demand forecasts accurate is essential — In the next section let’s start to think about how causal graphs could safeguard our forecasts…

Causal graphs

Causal graph refresher.

I’ve covered causal graphs a few times in my series, but just in case you need a refresher check out my first article where I cover it in detail:

Using Causal Graphs to answer causal questions

Taking the graph below as an example, let’s say we want to forecast our target variable. We find we have 3 variables which are correlated with it, so we use them as features. Why would including the spurious correlation be a problem? The more features we include the better our forecast right?

Well, not really….

When it comes to demand forecasting, one of the major problems is data drift. Data drift in itself isn't a problem if the relationship between the feature of interest and the target remains constant. But when the relationship doesn't remain constant, our forecasting accuracy will deteriorate.

But how is a causal graph going to help us… The idea is that spurious correlations are much more likely to drift, and much more likely to cause problems when they do.

Not convinced? OK it’s time to jump into the case study then!

Your friend has bought an ice cream van. They paid a consultant a lot of money to build them a demand forecast model. It worked really well for the first few months, but in the last couple of months your friend has been understocking ice cream! They remember that your job title was “data something or other” and come to you for advice.

Creating the case study data

Let me start by explaining how I created the data for this case study. I created a simple causal graph with the following characteristics:

  • Ice cream sales is the target node (X0)
  • Coastal visits is a direct cause of ice cream sales (X1)
  • Temperature is an indirect cause of ice cream sales (X2)
  • Shark attacks are a spurious correlation (X3)

I then used the following data generating process:
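The original post specifies this process with Tigramite link coefficients; as a stand-in, here is a plain-numpy sketch of a process with the same shape (the coefficients, lags, and noise scales are illustrative, not the post's actual parameters):

```python
import numpy as np

rng = np.random.default_rng(42)
T = 365
X = np.zeros((T, 4))  # columns: X0 sales, X1 coastal visits, X2 temperature, X3 shark attacks

for t in range(1, T):
    X[t, 2] = 0.8 * X[t - 1, 2] + rng.normal()                      # temperature: autoregressive
    X[t, 1] = 0.5 * X[t - 1, 1] + 0.7 * X[t - 1, 2] + rng.normal()  # coastal visits driven by temperature
    X[t, 3] = 0.4 * X[t - 1, 3] + 0.6 * X[t - 1, 2] + rng.normal()  # shark attacks also driven by temperature (spurious for sales)
    X[t, 0] = 0.3 * X[t - 1, 0] + 0.9 * X[t - 1, 1] + rng.normal()  # ice cream sales caused by coastal visits
```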

You can see that each node is influenced by past values of itself and a noise term, as well as its direct parents. To create the data I use a handy module from the time series causal analysis python package Tigramite:

GitHub - jakobrunge/tigramite: Tigramite is a python package for causal inference with a focus on…

Tigramite is a python package for causal inference with a focus on time series data. the tigramite documentation is at….

Tigramite is a great package but I am not going to cover it in detail this time around, as it deserves its own article! Below we use the structural_causal_process module following the data generating process above:

We can then visualise our time series:

Now you understand how I have created the data, let's get back to the case study in the next section!

Understanding the data generating process

You start by trying to understand the data generating process by taking the data used in the model. There are 3 features included in the model:

  • Coastal visits
  • Temperature
  • Shark attacks

To get an understanding of the causal graph, you use PCMCI (which has a great implementation in Tigramite), a method which is suitable for causal time series discovery. I am not going to cover PCMCI this time round as it needs its own dedicated article. However, if you are unfamiliar with causal discovery in general, use my previous article to get a good introduction:

Making Causal Discovery work in real-world business settings
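For orientation, a rough outline of a PCMCI run with Tigramite on the generated array `X` looks something like the following (import paths differ slightly between Tigramite versions, so treat this as a sketch to check against the package documentation rather than a recipe):

```python
from tigramite import data_processing as pp
from tigramite.pcmci import PCMCI
from tigramite.independence_tests.parcorr import ParCorr  # older versions: tigramite.independence_tests

var_names = ["ice cream sales", "coastal visits", "temperature", "shark attacks"]
dataframe = pp.DataFrame(X, var_names=var_names)

pcmci = PCMCI(dataframe=dataframe, cond_ind_test=ParCorr())
results = pcmci.run_pcmci(tau_max=2, pc_alpha=0.05)
# results["p_matrix"] and results["val_matrix"] hold the p-values and test
# statistics for every (variable, lag) link; the significant links form the graph.
```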

The causal graph output from PCMCI can be seen above. The following things jump out:

  • Coastal visits is a direct cause of ice cream sales
  • Temperature is an indirect cause of ice cream sales
  • Shark attacks are a spurious correlation

You question why anyone with any common sense would include shark attacks as a feature! Looking at the documentation it seems that the consultant used ChatGPT to get a list of features to consider for the model and then used autoML to train the model.

So if ChatGPT and autoML think shark attacks should be in the model, surely it can’t be doing any harm?

Pre-processing the case study data

Next let’s visit how I pre-processed the data to make it suitable for this case study. To create our features we need to pick up the lagged values for each column (look back at the data generating process to understand why the features need to be the lagged values):

We could use these lagged features to predict ice cream sales, but before we do let’s introduce some data drift to the spurious correlation:
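One illustrative way to inject this drift (not the post's exact code) is to add an upward trend to the spurious feature over the final stretch of the series:

```python
import numpy as np

drift_start = int(len(df) * 0.8)
n_drift = len(df) - drift_start
# Gradually inflate shark attacks towards the end, mimicking the surge described below
df.loc[df.index[drift_start:], "shark_attacks_lag1"] += np.linspace(0, 3, n_drift)
```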

Let’s go back to the case study and understand what we are seeing. Why has the number of shark attacks drifted? You do some research and find out that one of the causes of shark attacks is the number of people surfing. In recent months there has been a huge rise in the popularity of surfing, causing an increase in shark attacks. So how did this effect the ice cream sales forecasting?

Model training

You decide to recreate the model using the same features as the consultant and then using just the direct causes:
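A sketch of that comparison (illustrative; the post does not reproduce its exact model code) using a chronological train/test split and a gradient-boosted model:

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

split = int(len(df) * 0.8)
train, test = df.iloc[:split], df.iloc[split:]

feature_sets = {
    "all features (consultant's model)": ["coastal_visits_lag1", "temperature_lag1", "shark_attacks_lag1"],
    "direct cause only": ["coastal_visits_lag1"],
}

for name, cols in feature_sets.items():
    model = GradientBoostingRegressor(random_state=0)
    model.fit(train[cols], train["sales"])
    mae = mean_absolute_error(test["sales"], model.predict(test[cols]))
    print(f"{name}: test MAE = {mae:.2f}")
```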

The model trained on just the direct causes looks good on both the train and test set.

However, when you train the model using all of the features, you see that the model performs well on the train set but not the test set. Seems like you identified the problem!

When we compare the predictions from both models of the test set we can see why your friend has been understocking on ice cream!

Closing thoughts

Today we explored how harmful including spurious correlations in your forecasting models can be. Let’s finish off with some closing thoughts:

  • The aim of this article was to start you thinking about how understanding the causal graph can improve your forecasts.
  • I know the example was a little over-exaggerated (I would hope common sense would have helped in this scenario!) but it hopefully illustrates the point.
  • Another interesting point to mention is that the coefficient for shark attacks was negative. This is another pitfall as logically we would have expected this spurious correlation to be positive.
  • Medium- to long-term demand forecasting is very hard — you often need a forecasting model for each feature to be able to forecast multiple timesteps ahead. Interestingly, causal graphs (specifically structural causal models) lend themselves well to this problem.

Follow me if you want to continue this journey into Causal AI — In the next article we see how we can use encouragement design to estimate the effect of product features which need to be fully rolled out (no AB test).

Written by Ryan O'Sullivan

Experienced Data Scientist | Causal AI | Optimisation | Machine Learning | Forecasting | www.linkedin.com/in/ryan-o-sullivan-18488560


  • Open access
  • Published: 26 June 2024

Comparative accuracy of ChatGPT-4, Microsoft Copilot and Google Gemini in the Italian entrance test for healthcare sciences degrees: a cross-sectional study

  • Giacomo Rossettini   ORCID: orcid.org/0000-0002-1623-7681 1 , 2 ,
  • Lia Rodeghiero 3 ,
  • Federica Corradi 4 ,
  • Chad Cook   ORCID: orcid.org/0000-0001-8622-8361 5 , 6 , 7 ,
  • Paolo Pillastrini   ORCID: orcid.org/0000-0002-8396-2250 8 , 9 ,
  • Andrea Turolla   ORCID: orcid.org/0000-0002-1609-8060 8 , 9 ,
  • Greta Castellini   ORCID: orcid.org/0000-0002-3345-8187 10 ,
  • Stefania Chiappinotto   ORCID: orcid.org/0000-0003-4829-1831 11 ,
  • Silvia Gianola   ORCID: orcid.org/0000-0003-3770-0011 10   na1 &
  • Alvisa Palese   ORCID: orcid.org/0000-0002-3508-844X 11   na1  

BMC Medical Education volume  24 , Article number:  694 ( 2024 ) Cite this article


Artificial intelligence (AI) chatbots are emerging educational tools for students in healthcare science. However, assessing their accuracy is essential prior to adoption in educational settings. This study aimed to assess the accuracy of predicting the correct answers from three AI chatbots (ChatGPT-4, Microsoft Copilot and Google Gemini) in the Italian entrance standardized examination test of healthcare science degrees (CINECA test). Secondarily, we assessed the narrative coherence of the AI chatbots’ responses (i.e., text output) based on three qualitative metrics: the logical rationale behind the chosen answer, the presence of information internal to the question, and presence of information external to the question.

An observational cross-sectional design was used in September 2023. The accuracy of the three chatbots was evaluated on the CINECA test, where questions are formatted using a multiple-choice structure with a single best answer. The outcome was binary (correct or incorrect). A chi-squared test and a post hoc analysis with Bonferroni correction assessed differences in accuracy among the chatbots. A p-value of < 0.05 was considered statistically significant. A sensitivity analysis was performed, excluding answers that were not applicable (e.g., images). Narrative coherence was analyzed by absolute and relative frequencies of correct answers and errors.

Overall, of the 820 CINECA multiple-choice questions inputted into all chatbots, 12 questions were not imported into ChatGPT-4 (n = 808) and Google Gemini (n = 808) due to technical limitations. We found statistically significant differences in the ChatGPT-4 vs Google Gemini and Microsoft Copilot vs Google Gemini comparisons (p-value < 0.001). The narrative coherence of the AI chatbots revealed "Logical reasoning" as the most prevalent category for correct answers (n = 622, 81.5%) and "Logical error" as the most prevalent category for incorrect answers (n = 40, 88.9%).

Conclusions

Our main findings reveal that: (A) AI chatbots performed well; (B) ChatGPT-4 and Microsoft Copilot performed better than Google Gemini; and (C) their narrative coherence is primarily logical. Although AI chatbots showed promising accuracy in predicting the correct answer in the Italian entrance university standardized examination test, we encourage candidates to cautiously incorporate this new technology to supplement their learning rather than a primary resource.

Trial registration

Not required.


Being enrolled in a healthcare science degree in Italy requires a university examination, which is a highly competitive and selective process that demands intensive preparation worldwide [ 1 ]. Conventional preparation methods involve attending classes, studying textbooks, and completing practical exercises [ 2 ]. However, with the emergence of artificial intelligence (AI), digital tools like AI chatbots to assist in exam preparation are becoming more prevalent, presenting novel opportunities for candidates [ 2 ].

AI chatbots such as ChatGPT, Microsoft Bing, and Google Bard are advanced language models that can produce responses similar to humans through a user-friendly interface [ 3 ]. These chatbots are trained using vast amounts of data and deep learning algorithms, which enable them to generate coherent responses and predict text by identifying the relationships between words [ 3 ]. Since their introduction, AI chatbots have gained considerable attention and sparked discussions in medical and health science education and clinical practice [ 4 , 5 , 6 , 7 ]. AI chatbots can provide simulations with digital patients, personalized feedback, and help eliminate language barriers; they also present biases, ethical and legal concerns, and content quality issues [ 8 , 9 ]. As such, the scientific community recommends evaluating the AI chatbot’s accuracy of predicting the correct answer (e.g., passing examination tests) to inform students and academics of their value [ 10 , 11 ].

Several studies have assessed the accuracy of AI chatbots to pass medical education tests and exams. A recent meta-analysis found that ChatGPT-3.5 correctly answered most multiple-choice questions across various medical educational fields [ 12 ]. Further research has shown that newer versions of AI chatbots, such as ChatGPT-4, have surpassed their predecessors in passing Specialty Certificate Examinations in dermatology [ 13 , 14 ], neurology [ 15 ], ophthalmology [ 16 ], rheumatology [ 17 ], general medicine [ 18 , 19 , 20 , 21 ], and nursing [ 22 ]. Others have reported mixed results when comparing the accuracy of multiple AI chatbots (e.g., ChatGPT-4 vs Microsoft Bing, ChatGPT-4 vs Google Bard) in several medical examinations tests [ 23 , 24 , 25 , 26 , 27 , 28 , 29 ]. Recently, two studies observed the superiority of ChatGPT-3.5 over Microsoft Copilot and Google Bard in hematology [ 30 ] and physiology [ 31 ] case solving. Recent work has also observed that ChatGPT-4 outperformed other AI Chatbots in clinical dentistry-related questions [ 32 ], whereas another revealed that ChatGPT-4 and Microsoft Bing outperformed Google Bard and Claude in the Peruvian National Medical Licensing Examination [ 33 ].

These findings suggest a potential hierarchy in the accuracy of AI chatbots, although continued study in medical education is certainly warranted [ 3 ]. Further, current studies are limited by predominantly investigating: (A) a single AI chatbot rather than multiple ones; (B) examination tests for students and professionals already in training rather than newcomers to the university; and (C) examination tests for medical specialities rather than for healthcare science (e.g., rehabilitation and nursing). Only two studies [ 34 , 35 ] have attempted to address these limitations, identifying ChatGPT-3.5 as a promising, supplementary tool to pass several standardised admission tests in universities in the UK [ 34 ] and in France [ 35 ]. To our knowledge, no study has examined admission tests for healthcare science degree programs. Healthcare Science is a profession that includes over 40 areas of applied science that support the diagnosis, rehabilitation and treatment of several clinical conditions [ 36 ]. Moreover, the only studies conducted in Italy concerned ChatGPT's accuracy in passing the Italian Residency Admission National Exam for medical graduates [ 37 , 38 ], leaving room for further research in this setting.

Accordingly, to overcome existing knowledge gaps, this study aimed to assess the comparative accuracy of predicting the correct answer of three updated AI chatbots (ChatGPT-4, Microsoft Copilot and Google Gemini) in the Italian entrance university standardized examination test of healthcare science. The secondary aim was to assess the narrative coherence of the text responses offered by the AI chatbots. Narrative coherence was defined as the internal consistency and sensibility of the internal or external explanation provided by the chatbot.

Study design and ethics

We conducted an observational cross-sectional study following the Strengthening of Reporting of Observational Studies in Epidemiology (STROBE) high-quality reporting standards [ 39 ]. Because no human subjects were included, ethical approval was not required [ 40 ].

This study was developed by an Italian multidisciplinary group of healthcare science educators. The group included professors, lecturers, and educators actively involved in university education in different healthcare disciplines (e.g., rehabilitation, physiotherapy, speech therapy, nursing).

In Italy, the university’s process of accessing the healthcare professions is regulated by the laws according to short- and long-term workforce needs [ 41 ]. Consequently, the placements available for each degree are established in advance; to be enrolled in an academic year, candidates should take a standardized examination test occurring on the same day for all universities. This process, in most Italian universities, is annually managed by the CINECA (Consorzio Interuniversitario per il Calcolo Automatico dell'Italia Nord Orientale), a governmental organization composed of 70 Italian universities, 45 national public research centers, the Italian Ministry of University and Research, and the Italian Ministry of Education [ 42 ]. CINECA prepares the standardized test common to all healthcare disciplines (e.g., nursing and midwifery, rehabilitation, diagnostics and technical, and prevention) for entrance to University [ 43 ]. The test assesses basic knowledge useful as a prerequisite for their future education [ 44 ], in line with the expected knowledge possessed by candidates that encompass students at the end of secondary school, including those from high schools, technical, and professional institutes [ 45 ].

For this study, we adopted the official CINECA Tests from the past 13 years (2011–2023), obtained from freely available public repositories [ 46 , 47 ]. The CINECA Test provided between 60 and 80 independent questions per year, for a total of 820 multiple-choice questions considered for the analysis. Every question presents five multiple-choice options, with only one being the correct answer and the remaining four being incorrect [ 44 ]. According to the law, over the years the CINECA test has consisted of multiple-choice questions covering four areas: (1) logical reasoning and general culture, (2) biology, (3) chemistry, and (4) physics and mathematics. The accuracy of each AI chatbot was evaluated as the proportion of correct answers among all possible responses for each area and for the total test. In Additional file 1, we report all the standardized examination tests used, in the Italian language, and an example of the question stem that was exactly replicated.

Variable and measurements

We assessed the accuracy of three AI chatbots in providing accurate responses for the Italian entrance university standardized examination test for healthcare disciplines. We utilized the latest versions of ChatGPT-4 (OpenAI Incorporated, Mission District, San Francisco, United States) [ 48 ], Microsoft Copilot (Microsoft Corporation, WA, US) [ 49 ] and Google Gemini (Alphabet Inc., CA, US) [ 50 ] that were updated in September 2023. We considered the following variables: (A) the accuracy of predicting the correct answer of the three AI chatbots in the CINECA Test and (B) the narrative coherence and errors of the three AI chatbots responses.

The accuracy of three AI chatbots was assessed by comparing their responses to the correct answers from the CINECA Test. AI Chatbots’ answers were entered into an Excel sheet and categorized as correct or incorrect. Ambiguous or multiple responses were marked as incorrect [ 51 ]. Since none of the three chatbots has integrated multimodal input at this point, questions containing imaging data were evaluated based solely on the text portion of the question stem. However, technical limitations can be present, and a sensitivity analysis was performed, excluding answers that were not applicable (e.g., images).

The narrative coherence and errors [ 52 ] of AI chatbot answers for each question were assessed using a standardized system for categorization [ 53 ]. Correct answers were classified as [ 53 ]: (A) “Logical reasoning”, if they clearly demonstrated the logic presented in the response; (B) “Internal information”, if they included information from the question itself; and (C) “External information”, if they referenced information external to the question.

On the other hand, incorrect answers were categorized as [ 53 ]: (A) "Logical error", when they correctly identify the relevant information but fail to convert it into an appropriate answer; (B) "Information error", if AI chatbots fail to recognize a key piece of information, whether present in the question stem or through external information; and (C) "Statistical error", for arithmetic mistakes. An example of categorisation is displayed in Additional file 2. Two authors (L.R., F.C.) independently analyzed the narrative coherence, with a third (G.R.) resolving uncertainties. Inter-rater agreement was measured using Cohen's Kappa, according to the scale offered by Landis and Koch: < 0.00 "poor", 0–0.20 "slight", 0.21–0.40 "fair", 0.41–0.60 "moderate", 0.61–0.80 "substantial", 0.81–1.00 "almost perfect" [ 54 ].

We used each multiple-choice question of the CINECA Test, formatted for proper structure and readability. Because prompt engineering significantly affects generative output, we standardized the input formats of the questions following the Prompt-Engineering-Guide [ 55 , 56 ]. First, we manually entered each question in a Word file, left one line of space and then inserted the five answer options one below the other on different lines. If the questions presented text-based answers, they were directly inputted into the 3 AI chatbots. If the questions were presented as images containing tables or mathematical formulae, they were faithfully rewritten for AI chatbot processing [ 57 ]. If the answers had images with graphs or drawings, they were imported only into Microsoft Copilot because ChatGPT-4 and Google Gemini only accept textual input in their current form and could not process and interpret the meaning of complex images, as present in the CINECA Test, at the time of our study [ 58 ].

On 26th of September 2023, the research group copied and pasted each question onto each of the 3 AI chatbots in the same order in which it was presented in the CINECA Test [ 59 ] and without translating it from the original Italian language to English because the AIs are language-enabled [ 60 ]. To avoid learning bias and that the AI chatbots could learn or be influenced by conversations that existed before the start of the study, we: (A) created and used a new account [ 2 , 51 ], (B) always asked each question only once [ 61 , 62 ], (C) did not provide positive or negative feedback on the answer given [ 60 ], and (D) deleted conversations with the AI chatbots before entering each new question into a new chat (with no previous conversations). We presented an example of a question and answer in Additional file 3.

Statistical analyses

Categorical variables are presented as absolute frequencies with percentages and continuous variables as means with 95% confidence intervals (CI) or medians with interquartile ranges (IQR). The answers were collected as binomial outcomes for each AI chatbot with respect to the reference (the CINECA Tests). A chi-square test was used to ascertain whether the percentage of correct answers on the CINECA Test differed among the three AI chatbots according to the different taxonomic subcategories (logical reasoning and general culture, biology, chemistry, and physics and mathematics). A sensitivity analysis was performed excluding answers that were not applicable (e.g., if the answers had images with graphs or drawings). A p-value of < 0.05 was considered significant. Since we compared three chatbots, a Bonferroni adjustment (a familywise correction for multiple comparisons) was applied. Regarding narrative coherence and errors, we calculated the overall correct answers as the relative proportion of correct answers among the overall test answers for each AI chatbot. A descriptive analysis of the reasons for the logical argumentation of correct answers and the categorization of error types is reported as percentages in tables. Statistical analyses were performed with STATA/MP 16.1 software.
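As a rough illustration only (the authors used STATA, not Python, and do not publish their code), the overall and pairwise comparisons could be sketched as follows, using the correct/incorrect totals reported in the Results:

```python
from scipy.stats import chi2_contingency
from statsmodels.stats.multitest import multipletests

# Correct/incorrect answer counts per chatbot, taken from the Results tables
counts = {
    "ChatGPT-4": [763, 45],
    "Microsoft Copilot": [737, 83],
    "Google Gemini": [574, 234],
}

# Overall chi-square test across the three chatbots
chi2, p, dof, _ = chi2_contingency(list(counts.values()))
print(f"overall: chi2 = {chi2:.1f}, p = {p:.3g}")

# Pairwise 2x2 tests with Bonferroni correction
pairs = [("ChatGPT-4", "Microsoft Copilot"),
         ("ChatGPT-4", "Google Gemini"),
         ("Microsoft Copilot", "Google Gemini")]
pvals = [chi2_contingency([counts[a], counts[b]])[1] for a, b in pairs]
_, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="bonferroni")
for (a, b), padj in zip(pairs, p_adj):
    print(f"{a} vs {b}: Bonferroni-adjusted p = {padj:.3g}")
```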

AI chatbots’ multiple-choice questions

From our original sample, all the multiple-choice questions were entered into Microsoft Copilot (n = 820). Twelve multiple-choice questions were not entered into ChatGPT-4 (n = 808) or Google Gemini (n = 808) because they consisted of images with graphs or drawings. The flowchart of the study is shown in Fig. 1.

Fig. 1 The study flow chart

AI chatbots’ accuracy

Overall, we found a statistically significant difference in accuracy among the answers of the three chatbots (p < 0.001). The results of the pairwise tests with Bonferroni (familywise) adjustment are presented in Table 1. We found statistically significant differences in the ChatGPT-4 vs Google Gemini (p < 0.001) and Microsoft Copilot vs Google Gemini (p < 0.001) comparisons, indicating that ChatGPT-4 and Microsoft Copilot were more accurate than Google Gemini (Table 1). A sensitivity analysis excluding non-applicable answers (e.g., answer options consisting of images with graphs or drawings) showed similar results (Additional file 4).
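A sketch of how such Bonferroni-adjusted pairwise comparisons can be carried out is shown below: three 2 × 2 chi-square tests with the significance threshold divided by the number of comparisons. The counts reuse the correct/incorrect totals reported in the narrative coherence analysis below, so the exact figures may differ from the published Table 1.

```python
# Pairwise chi-square tests between chatbots with a Bonferroni-adjusted alpha.
from itertools import combinations
from scipy.stats import chi2_contingency

counts = {
    "ChatGPT-4": (763, 45),          # (correct, incorrect)
    "Microsoft Copilot": (737, 83),
    "Google Gemini": (574, 234),
}

pairs = list(combinations(counts, 2))
alpha_adj = 0.05 / len(pairs)        # Bonferroni: 0.05 / 3 comparisons

for a, b in pairs:
    chi2, p, _, _ = chi2_contingency([counts[a], counts[b]])
    verdict = "significant" if p < alpha_adj else "not significant"
    print(f"{a} vs {b}: p = {p:.2g} ({verdict} at alpha = {alpha_adj:.3f})")
```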

AI chatbots’ narrative coherence: correct answers and errors

The inter-rater agreement on the AI chatbots' narrative coherence was "almost perfect", with kappa values ranging from 0.84 to 0.88 for internal and logical answers (Additional file 5). The narrative coherence of the AI chatbots is reported in Tables 2 and 3. We excluded all non-applicable answers from these analyses (ChatGPT-4: n = 12, Microsoft Copilot: n = 0, Google Gemini: n = 12).

Regarding the categories of correct answers (Table 2), in ChatGPT-4 (total = 763) the most frequent category was "Logical reasoning" (n = 622, 81.5%), followed by "Internal information" (n = 141, 18.5%). In Microsoft Copilot (total = 737), the most frequent category was "Logical reasoning" (n = 405, 55%), followed by "External information" (n = 195, 26.4%) and "Internal information" (n = 137, 18.6%). In Google Gemini (total = 574), the most frequent category was "Logical reasoning" (n = 567, 98.8%), followed by a few cases of "Internal information" (n = 7, 1.2%).

Regarding the categories of errors (Table 3), in ChatGPT-4 (total = 45) the most frequent category was "Logical error" (n = 40, 88.9%), followed by a few cases of "Information error" (n = 4, 8.9%) and "Statistical error" (n = 1, 2.2%). In Microsoft Copilot (total = 83), the most frequent category was "Logical error" (n = 66, 79.1%), followed by a few cases of "Information error" (n = 9, 11.1%) and "Statistical error" (n = 8, 9.8%). In Google Gemini (total = 234), the most frequent category was "Logical error" (n = 233, 99.6%), followed by a few cases of "Information error" (n = 1, 0.4%).
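The percentages above are simple relative frequencies, so the distribution for any chatbot can be recomputed from its raw counts; the short check below does this for the ChatGPT-4 error counts reported above.

```python
# Recomputing a percentage distribution from the raw ChatGPT-4 error counts.
error_counts = {"Logical error": 40, "Information error": 4, "Statistical error": 1}
total = sum(error_counts.values())   # 45 incorrect answers

for category, n in error_counts.items():
    print(f"{category}: n = {n} ({100 * n / total:.1f}%)")
# -> Logical error 88.9%, Information error 8.9%, Statistical error 2.2%
```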

Main findings

The main findings reveal that: (A) the AI chatbots showed an overall high accuracy in predicting the correct answer; (B) ChatGPT-4 and Microsoft Copilot performed better than Google Gemini; and (C) with respect to narrative coherence, the most prevalent categories for both correct and incorrect answers were the "Logical" ones ("Logical reasoning" and "Logical error", respectively).

Comparing our study with the existing literature poses a challenge because few studies have examined the accuracy of multiple AI chatbots [ 30 , 31 , 32 , 33 ]. Our research shows that AI chatbots can accurately answer questions from the CINECA Test regardless of the topic (logical reasoning and general culture, biology, chemistry, physics and mathematics). This differs from the fluctuating accuracy found in other studies [ 34 , 35 ]. Our findings support Torres-Zegarra et al.'s observation that ChatGPT-4 and Microsoft Bing (the predecessor of Microsoft Copilot) outperformed Google Bard (the predecessor of Google Gemini) [ 33 ], whereas other research groups did not confirm this [ 30 , 31 , 32 ]. This discrepancy may be due to differences in the tests used (e.g., medical specialty vs university entrance examinations), the types of questions targeted at different stakeholders (e.g., professionals vs students), and the versions of the AI chatbots used (e.g., ChatGPT-3.5 vs 4).

The accuracy ranking of the AI chatbots in our study might be due to differences in their neural network architectures. ChatGPT-4 and Microsoft Copilot both use the GPT (Generative Pre-trained Transformer) architecture, whereas Google Gemini was initially built on LaMDA (Language Model for Dialogue Applications) and later on PaLM 2 (Pathways Language Model), in combination with web search [ 32 ]. Differences in the quality, variety, and quantity of the training data, in the optimization strategies adopted (e.g., fine-tuning), and in the techniques applied to create the models could also account for the accuracy differences between AI chatbots [ 63 ]. These variations could therefore lead to different responses to the same questions, affecting overall accuracy.

In our study, the analysis of narrative coherence shows that the AI chatbots mainly offered a broader perspective on the topic under discussion through logical processes, rather than just providing a bare answer [ 53 ]. This can be explained by the computational abilities of AI chatbots and their capacity to understand and analyze text by recognizing word connections and predicting subsequent words in a sentence [ 63 ]. However, our findings are preliminary, and more research is needed to investigate how narrative coherence changes as AI chatbot technology advances and is updated.

Implications and future perspective

Our study identifies two contrasting implications of using AI chatbots in education: a positive one, in which AI chatbots are a valuable resource, and a negative one, in which they are a potential threat. First, our study sheds light on the potential role of AI chatbots as supportive tools to help candidates prepare for the Italian standardized entrance examination for healthcare science degrees. They can complement traditional learning methods such as textbooks and in-person courses [ 10 ]. AI chatbots can facilitate self-directed learning, provide explanations and insights on the topics studied, select and filter materials, and can be personalized to meet the needs of individual students [ 10 ]. In addition to the knowledge components, these instruments can contribute to developing competencies as defined by the World Health Organization [ 64 ]. Virtual simulation scenarios could facilitate the development of targeted skills and attitudes, with students interacting with a dynamic, human-like virtual interlocutor driven by AI. However, these tools cannot replace the value of reflection and discussion with peers and teachers, which are crucial for developing the meta-competencies of today's students and tomorrow's healthcare professionals [ 10 ]. Conversely, candidates must be prevented from simply using these tools to answer questions while exams are being administered. Encouraging honesty by prohibiting the presence and use of devices (e.g., mobile phones, tablets) in examination rooms is important. Candidates must be encouraged to answer on the basis of their own preparation and knowledge, given that they are mostly applying for professions in which honesty and ethical principles are imperative.

Strengths and limitations

As a strength, we evaluated the comparative accuracy of three AI chatbots on a large sample of questions from the last 13 years of the Italian healthcare sciences university admission test, also considering the narrative coherence of their responses. This enriches the international debate on this topic and provides valuable insights into the strengths and limitations of AI chatbots in the context of university education [ 2 , 3 , 8 , 9 , 11 ].

However, limitations exist and offer opportunities for future study. Firstly, we only used the CINECA Test, while other universities in Italy adopt different tests (e.g., CASPUR and SELECTA). Secondly, we studied three AI chatbots without considering others on the market (e.g., Claude, Perplexity) [ 31 ]. Thirdly, we adopted both paid (ChatGPT-4) and free (Microsoft Copilot and Google Gemini) versions of the AI chatbots. Although this choice may be a limitation, we aimed to use the most up-to-date versions available when the study was performed. Fourthly, although we entered all queries into the AI chatbots, not all of them could be processed, because only Microsoft Copilot was able to analyse the complex images contained in the CINECA Tests at the time of our study [ 65 , 66 , 67 ]. Fifthly, we entered each test question only once, to simulate the conditions under which the test is taken in real educational contexts [ 32 ], although previous studies have prompted test questions multiple times to obtain better results [ 68 ]. However, an AI language model operates differently from regular, deterministic software: these models are probabilistic in nature, forming responses by estimating the probability of the next word according to statistical patterns in their training data [ 69 ]. Consequently, posing the same question twice may not always yield identical answers (see the sketch after this paragraph). Sixthly, we did not measure the response time of the AI chatbots, since this variable is affected by the speed of the internet connection and data traffic [ 51 ]. Seventhly, we assessed the accuracy of the AI chatbots in a single country by prompting questions in Italian, which may limit the generalizability of our findings to other contexts and languages [ 70 , 71 ]. Finally, we did not compare the responses of the AI chatbots with those of human students, since there is no national admission ranking in Italy and each university compiles its own ranking.
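The toy example below illustrates the probabilistic point made above: if the next token is sampled from a probability distribution rather than chosen deterministically, two runs of the same prompt need not agree. The vocabulary and probabilities are invented for illustration and are not tied to any specific chatbot.

```python
# Toy illustration of non-determinism in a probabilistic language model:
# the next token is sampled from a distribution, so repeated runs can differ.
import random

next_token_probs = {"B)": 0.62, "A)": 0.20, "D)": 0.10, "C)": 0.05, "E)": 0.03}

def sample_next_token(probs: dict[str, float]) -> str:
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights, k=1)[0]

# Two runs of the "same question" may produce different answers.
print(sample_next_token(next_token_probs))
print(sample_next_token(next_token_probs))
```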

The AI chatbots showed promising accuracy in quickly predicting correct answers and produced writing that was grammatically correct and conversationally coherent for the Italian standardized entrance examination for healthcare science degrees. The study provides data on the overall performance of different AI chatbots on the standardized examinations administered over the last 13 years to all candidates seeking admission to a healthcare science degree in Italy. These findings should therefore be read in the context of a research exercise and may support the current debate on the use of AI chatbots in academic settings. Further research is needed to explore the potential of AI chatbots in other educational contexts and to address their limitations as an innovative tool for education and test preparation.

Availability of data and materials

The datasets generated and/or analysed during the current study are available in the Open Science Framework (OSF) repository, https://osf.io/ue5wf/ .

Abbreviations

  • AI: Artificial intelligence
  • CI: Confidence interval
  • CINECA: Consorzio Interuniversitario per il Calcolo Automatico dell'Italia Nord Orientale
  • GPT: Generative Pre-trained Transformer
  • IQR: Interquartile range
  • LaMDA: Language Model for Dialogue Applications
  • PaLM: Pathways Language Model
  • STROBE: Strengthening the Reporting of Observational Studies in Epidemiology

References

Redazione. Test d'ammissione professioni sanitarie, il 14 settembre 2023. Sanità Informazione. 2023. https://www.sanitainformazione.it/professioni-sanitarie/1settembre-test-dammissione-alle-professioni-sanitarie-fissato-per-il-14-settembre-2023-alle-ore-13-in-tutta-italia/ . Accessed 6 May 2024.

Kung TH, Cheatham M, Medenilla A, Sillos C, De Leon L, Elepaño C, et al. Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models. PLOS Digit Health. 2023;2:e0000198.


Rossettini G, Cook C, Palese A, Pillastrini P, Turolla A. Pros and cons of using artificial intelligence Chatbots for musculoskeletal rehabilitation management. J Orthop Sports Phys Ther. 2023;53:1–17.

Fütterer T, Fischer C, Alekseeva A, Chen X, Tate T, Warschauer M, et al. ChatGPT in education: global reactions to AI innovations. Sci Rep. 2023;13:15310.

Mohammadi S, SeyedAlinaghi S, Heydari M, Pashaei Z, Mirzapour P, Karimi A, et al. Artificial intelligence in COVID-19 Management: a systematic review. J Comput Sci. 2023;19:554–68.

Mehraeen E, Mehrtak M, SeyedAlinaghi S, Nazeri Z, Afsahi AM, Behnezhad F, et al. Technology in the Era of COVID-19: a systematic review of current evidence. Infect Disord Drug Targets. 2022;22:e240322202551.

SeyedAlinaghi S, Abbaspour F, Mehraeen E. The Challenges of ChatGPT in Healthcare Scientific Writing. Shiraz E-Med J. 2024;25(2):e141861. https://doi.org/10.5812/semj-141861 .

Karabacak M, Ozkara BB, Margetis K, Wintermark M, Bisdas S. The advent of generative language models in medical education. JMIR Med Educ. 2023;9:e48163.

Mohammad B, Supti T, Alzubaidi M, Shah H, Alam T, Shah Z, et al. The pros and cons of using ChatGPT in medical education: a scoping review. Stud Health Technol Inform. 2023;305:644–7.


Abd-Alrazaq A, AlSaad R, Alhuwail D, Ahmed A, Healy PM, Latifi S, et al. Large language models in medical education: opportunities, challenges, and future directions. JMIR Med Educ. 2023;9:e48291.

Azer SA, Guerrero APS. The challenges imposed by artificial intelligence: are we ready in medical education? BMC Med Educ. 2023;23:680.

Levin G, Horesh N, Brezinov Y, Meyer R. Performance of ChatGPT in medical examinations: a systematic review and a meta-analysis. BJOG Int J Obstet Gynaecol. 2023. https://doi.org/10.1111/1471-0528.17641 .

Passby L, Jenko N, Wernham A. Performance of ChatGPT on Specialty Certificate Examination in Dermatology multiple-choice questions. Clin Exp Dermatol. 2023:llad197. https://doi.org/10.1093/ced/llad197 .

Lewandowski M, Łukowicz P, Świetlik D, Barańska-Rybak W. ChatGPT-3.5 and ChatGPT-4 dermatological knowledge level based on the Specialty Certificate Examination in Dermatology. Clin Exp Dermatol. 2023:llad255. https://doi.org/10.1093/ced/llad255 .

Giannos P. Evaluating the limits of AI in medical specialisation: ChatGPT’s performance on the UK Neurology Specialty Certificate Examination. BMJ Neurol Open. 2023;5:e000451.

Teebagy S, Colwell L, Wood E, Yaghy A, Faustina M. Improved performance of ChatGPT-4 on the OKAP examination: a comparative study with ChatGPT-3.5. J Acad Ophthalmol. 2023;15:e184–7.

Madrid-García A, Rosales-Rosado Z, Freites-Nuñez D, Pérez-Sancristóbal I, Pato-Cour E, Plasencia-Rodríguez C, et al. Harnessing ChatGPT and GPT-4 for evaluating the rheumatology questions of the Spanish access exam to specialized medical training. Sci Rep. 2023;13:22129.

Haze T, Kawano R, Takase H, Suzuki S, Hirawa N, Tamura K. Influence on the accuracy in ChatGPT: Differences in the amount of information per medical field. Int J Med Inf. 2023;180:105283.

Yanagita Y, Yokokawa D, Uchida S, Tawara J, Ikusaka M. Accuracy of ChatGPT on medical questions in the national medical licensing examination in Japan: evaluation study. JMIR Form Res. 2023;7:e48023.

Rosoł M, Gąsior JS, Łaba J, Korzeniewski K, Młyńczak M. Evaluation of the performance of GPT-3.5 and GPT-4 on the polish medical final examination. Sci Rep. 2023;13:20512.

Brin D, Sorin V, Vaid A, Soroush A, Glicksberg BS, Charney AW, et al. Comparing ChatGPT and GPT-4 performance in USMLE soft skill assessments. Sci Rep. 2023;13:16492.

Kaneda Y, Takahashi R, Kaneda U, Akashima S, Okita H, Misaki S, et al. Assessing the performance of GPT-3.5 and GPT-4 on the 2023 Japanese nursing examination. Cureus. 2023;15:e42924.

Kleinig O, Gao C, Bacchi S. This too shall pass: the performance of ChatGPT-3.5, ChatGPT-4 and new bing in an Australian medical licensing examination. Med J Aust. 2023;219:237.

Roos J, Kasapovic A, Jansen T, Kaczmarczyk R. Artificial intelligence in medical education: comparative analysis of ChatGPT, Bing, and medical students in Germany. JMIR Med Educ. 2023;9:e46482.

Ali R, Tang OY, Connolly ID, Fridley JS, Shin JH, Zadnik Sullivan PL, et al. Performance of ChatGPT, GPT-4, and google bard on a neurosurgery oral boards preparation question bank. Neurosurgery. 2023. https://doi.org/10.1227/neu.0000000000002551 .

Patil NS, Huang RS, van der Pol CB, Larocque N. Comparative Performance of ChatGPT and Bard in a Text-Based Radiology Knowledge Assessment. Can Assoc Radiol J. 2024;75(2):344–50. https://doi.org/10.1177/08465371231193716 .

Toyama Y, Harigai A, Abe M, Nagano M, Kawabata M, Seki Y, et al. Performance evaluation of ChatGPT, GPT-4, and bard on the official board examination of the Japan radiology society. Jpn J Radiol. 2023. https://doi.org/10.1007/s11604-023-01491-2 .

Fowler T, Pullen S, Birkett L. Performance of ChatGPT and Bard on the official part 1 FRCOphth practice questions. Br J Ophthalmol. 2023:bjo-2023-324091. https://doi.org/10.1136/bjo-2023-324091 . Online ahead of print.

Meo SA, Al-Khlaiwi T, AbuKhalaf AA, Meo AS, Klonoff DC. The Scientific Knowledge of Bard and ChatGPT in Endocrinology, Diabetes, and Diabetes Technology: Multiple-Choice Questions Examination-Based Performance. J Diabetes Sci Technol. 2023:19322968231203987. https://doi.org/10.1177/19322968231203987 . Online ahead of print.

Kumari A, Kumari A, Singh A, Singh SK, Juhi A, Dhanvijay AKD, et al. Large language models in hematology case solving: a comparative study of ChatGPT-3.5, google bard, and microsoft bing. Cureus. 2023;15:e43861.

Dhanvijay AKD, Pinjar MJ, Dhokane N, Sorte SR, Kumari A, Mondal H. Performance of large language models (ChatGPT, Bing Search, and Google Bard) in solving case vignettes in physiology. Cureus. 2023;15:e42972.

Giannakopoulos K, Kavadella A, Aaqel Salim A, Stamatopoulos V, Kaklamanos EG. Evaluation of generative artificial intelligence large language models ChatGPT, google bard, and microsoft bing chat in supporting evidence-based dentistry: a comparative mixed-methods study. J Med Internet Res. 2023. https://doi.org/10.2196/51580 .

Torres-Zegarra BC, Rios-Garcia W, Ñaña-Cordova AM, Arteaga-Cisneros KF, Chalco XCB, Ordoñez MAB, et al. Performance of ChatGPT, Bard, Claude, and Bing on the Peruvian National licensing medical examination: a cross-sectional study. J Educ Eval Health Prof. 2023;20.

Giannos P, Delardas O. Performance of ChatGPT on UK standardized admission tests: insights from the BMAT, TMUA, LNAT, and TSA examinations. JMIR Med Educ. 2023;9:e47737.

Guigue P-A, Meyer R, Thivolle-Lioux G, Brezinov Y, Levin G. Performance of ChatGPT in French language Parcours d’Accès Spécifique Santé test and in OBGYN. Int J Gynaecol Obstet Off Organ Int Fed Gynaecol Obstet. 2023. https://doi.org/10.1002/ijgo.15083 .

Healthcare Science. NSHCS. https://nshcs.hee.nhs.uk/healthcare-science/ . Accessed 6 May 2024.

Alessandri Bonetti M, Giorgino R, Gallo Afflitto G, De Lorenzi F, Egro FM. How does ChatGPT perform on the Italian residency admission national exam compared to 15,869 medical graduates? Ann Biomed Eng. 2023. https://doi.org/10.1007/s10439-023-03318-7 .

Scaioli G, Moro GL, Conrado F, Rosset L, Bert F, Siliquini R. Exploring the potential of ChatGPT for clinical reasoning and decision-making: a cross-sectional study on the Italian medical residency exam. Ann Ist Super Sanità. 2023;59:267–70.

von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP, et al. The strengthening the reporting of observational studies in epidemiology (STROBE) statement: guidelines for reporting observational studies. PLoS Med. 2007;4:e296.

Nowell J. Guide to ethical approval. BMJ. 2009;338:b450.

Accesso programmato a livello nazionale. Mi - Ministero dell’istruzione. https://www.miur.gov.it/accesso-programmato-a-livello-nazionale . Accessed 6 May 2024.

Il Consorzio. Cineca. http://www.cineca.it/chi-siamo/il-consorzio . Accessed 6 May 2024.

Salute M della. Professioni sanitarie. https://www.salute.gov.it/portale/professioniSanitarie/dettaglioContenutiProfessioniSanitarie.jsp?lingua=italiano&id=808&area=professioni-sanitarie&menu=vuoto&tab=1 . Accessed 6 May 2024.

Test d’ingresso ai corsi ad accesso programmato e alle scuole di specializzazione. Cineca. http://www.cineca.it/sistemi-informativi-miur/studenti-carriere-offerta-formativa-e-altri-servizi/test-dingresso-ai . Accessed 6 May 2024.

Scuola secondaria di secondo grado. Mi - Ministero dell’istruzione. https://www.miur.gov.it/scuola-secondaria-di-secondo-grado . Accessed 6 May 2024.

Test ammissione professioni sanitarie anni precedenti. TaxiTest. https://taxitest.it/test-ingresso-professioni-sanitarie-anni-passati/ . Accessed 6 May 2024.

Soluzioni dei Test d’Ingresso per Professioni Sanitarie 2023. https://www.studentville.it/app/uploads/2023/09/soluzioni-test-cineca-professioni-sanitarie-2023.pdf . Accessed 6 May 2024.

ChatGPT. https://chat.openai.com . Accessed 6 May 2024.

Microsoft Copilot: il tuo AI Companion quotidiano. https://ceto.westus2.binguxlivesite.net/ . Accessed 6 May 2024.

Gemini: chatta per espandere le tue idee. Gemini. https://gemini.google.com . Accessed 6 May 2024.

Mihalache A, Popovic MM, Muni RH. Performance of an artificial intelligence chatbot in ophthalmic knowledge assessment. JAMA Ophthalmol. 2023;141:589–97.

Trabasso T. The Development of Coherence in Narratives by Understanding Intentional Action. In: Stelmach GE, Vroon PA, editors. Advances in Psychology. Vol. 79. North-Holland; 1991. p. 297–314. ISSN 0166-4115, ISBN 9780444884848. https://doi.org/10.1016/S0166-4115(08)61559-9 .

Gilson A, Safranek CW, Huang T, Socrates V, Chi L, Taylor RA, et al. How does ChatGPT perform on the United States medical licensing examination? the implications of large language models for medical education and knowledge assessment. JMIR Med Educ. 2023;9:e45312.

Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159–74.

Saravia E. Prompt Engineering Guide. https://github.com/dair-ai/Prompt-Engineering-Guide . 2022. Accessed 6 May 2024.

Giray L. Prompt engineering with ChatGPT: a guide for academic writers. Ann Biomed Eng. 2023. https://doi.org/10.1007/s10439-023-03272-4 .

Massey PA, Montgomery C, Zhang AS. Comparison of ChatGPT-3.5, ChatGPT-4, and orthopaedic resident performance on orthopaedic assessment examinations. J Am Acad Orthop Surg. 2023;31:1173.

Guerra GA, Hofmann H, Sobhani S, Hofmann G, Gomez D, Soroudi D, et al. GPT-4 artificial intelligence model outperforms ChatGPT, medical students, and neurosurgery residents on neurosurgery written board-like questions. World Neurosurg. 2023;S1878–8750(23):01144.

Cuthbert R, Simpson AI. Artificial intelligence in orthopaedics: can chat generative pre-trained transformer (ChatGPT) pass Section 1. Postgrad Med J. 2023;99:1110–4.

Friederichs H, Friederichs WJ, März M. ChatGPT in medical school: how successful is AI in progress testing? Med Educ Online. 2023;28.

Weng T-L, Wang Y-M, Chang S, Chen T-J, Hwang S-J. ChatGPT failed Taiwan’s family medicine board exam. J Chin Med Assoc JCMA. 2023;86:762–6.

Bhayana R, Krishna S, Bleakney RR. Performance of ChatGPT on a radiology board-style examination: insights into current strengths and limitations. Radiology. 2023;307:e230582.

Thirunavukarasu AJ, Ting DSJ, Elangovan K, Gutierrez L, Tan TF, Ting DSW. Large language models in medicine. Nat Med. 2023;29:1930–40.

Global competency framework for universal health coverage. https://www.who.int/publications-detail-redirect/9789240034686 . Accessed 6 May 2024.

ChatGPT — Release Notes | OpenAI Help Center. https://help.openai.com/en/articles/6825453-chatgpt-release-notes . Accessed 6 May 2024.

Microsoft. Visual Search API | Microsoft Bing. Bingapis. https://www.microsoft.com/en-us/bing/apis/bing-visual-search-api . Accessed 6 May 2024.

What’s ahead for Bard: More global, more visual, more integrated. Google. 2023. https://blog.google/technology/ai/google-bard-updates-io-2023/ . Accessed 6 May 2024.

Zhu L, Mou W, Yang T, Chen R. ChatGPT can pass the AHA exams: Open-ended questions outperform multiple-choice format. Resuscitation. 2023;188:109783.

Probabilistic machine learning and artificial intelligence | Nature. https://www.nature.com/articles/nature14541 . Accessed 6 May 2024.

Ebrahimian M, Behnam B, Ghayebi N, Sobhrakhshankhah E. ChatGPT in Iranian medical licensing examination: evaluating the diagnostic accuracy and decision-making capabilities of an AI-based model. BMJ Health Care Inform. 2023;30:e100815.

Seghier ML. ChatGPT: not all languages are equal. Nature. 2023;615:216.


Acknowledgements

The authors thank the Sanitätsbetrieb der Autonomen Provinz Bozen/Azienda Sanitaria della Provincia Autonoma di Bolzano for covering the open access publication costs.

Funding

The authors declare that they received funding from the Department of Innovation, Research, University and Museums of the Autonomous Province of Bozen/Bolzano to cover the open access publication costs of this study.

Author information

Silvia Gianola and Alvisa Palese contributed equally to this work.

Authors and Affiliations

School of Physiotherapy, University of Verona, Verona, Italy

Giacomo Rossettini

Department of Physiotherapy, Faculty of Sport Sciences, Universidad Europea de Madrid, Villaviciosa de Odón, 28670, Spain

Department of Rehabilitation, Hospital of Merano (SABES-ASDAA), Teaching Hospital of Paracelsus Medical University (PMU), Merano-Meran, Italy

Lia Rodeghiero

School of Speech Therapy, University of Verona, Verona, Italy

Federica Corradi

Department of Orthopaedics, Duke University, Durham, NC, USA

Duke Clinical Research Institute, Duke University, Durham, NC, USA

Department of Population Health Sciences, Duke University, Durham, NC, USA

Department of Biomedical and Neuromotor Sciences (DIBINEM), Alma Mater University of Bologna, Bologna, Italy

Paolo Pillastrini & Andrea Turolla

Unit of Occupational Medicine, IRCCS Azienda Ospedaliero-Universitaria Di Bologna, Bologna, Italy

Unit of Clinical Epidemiology, IRCCS Istituto Ortopedico Galeazzi, Milan, Italy

Greta Castellini & Silvia Gianola

Department of Medical Sciences, University of Udine, Udine, Italy

Stefania Chiappinotto & Alvisa Palese


Contributions

GR, SG and AP conceived and designed the research and wrote the first draft. LR and FC managed the acquisition of data. SG, GC, SC, CC, PP and AT managed the analysis and interpretation of data. All authors read, revised, wrote and approved the final version of the manuscript.

Authors' information

A multidisciplinary group of healthcare science educators promoted and developed this study in Italy. The group consisted of professors, lecturers, and tutors actively involved in university education in different healthcare science disciplines (e.g., rehabilitation, physiotherapy, speech therapy, nursing).

Corresponding authors

Correspondence to Giacomo Rossettini , Lia Rodeghiero , Stefania Chiappinotto , Silvia Gianola or Alvisa Palese .

Ethics declarations

Ethics approval and consent to participate

Not applicable; no human participants or patients were involved in the study.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Material 1.

Supplementary Material 2.

Supplementary Material 3.

Supplementary Material 4.

Supplementary Material 5.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Cite this article.

Rossettini, G., Rodeghiero, L., Corradi, F. et al. Comparative accuracy of ChatGPT-4, Microsoft Copilot and Google Gemini in the Italian entrance test for healthcare sciences degrees: a cross-sectional study. BMC Med Educ 24 , 694 (2024). https://doi.org/10.1186/s12909-024-05630-9

Download citation

Received : 24 January 2024

Accepted : 04 June 2024

Published : 26 June 2024

DOI : https://doi.org/10.1186/s12909-024-05630-9


Keywords

  • Health occupations
  • Physical therapy modalities
  • Speech therapy

BMC Medical Education

ISSN: 1472-6920



Powerful QTL mapping and favorable allele mining in an all-in-one population: a case study of heading date


Pengfei Wang, Ying Yang, Daoyang Li, Zhichao Yu, Bo Zhang, Xiangchun Zhou, Lizhong Xiong, Jianwei Zhang, Yongzhong Xing, Powerful QTL mapping and favorable allele mining in an all-in-one population: a case study of heading date, National Science Review, 2024, nwae222, https://doi.org/10.1093/nsr/nwae222


The multiparent advanced generation intercross (MAGIC) population offers great potential for the power and resolution of QTL mapping, but SNP-based GWAS does not fully exploit that potential. In this study, a MAGIC population of 1021 lines was developed from four Xian and four Geng varieties from five subgroups of rice. A total of 44,000 genes showed functional polymorphisms among the eight parents, including frameshift variations or premature stop codon variations, which provides the potential to map almost all genes of the MAGIC population. Principal component analysis showed that the MAGIC population had a weak population structure. A high-density bin map of 24,414 bins was constructed. Segregation distortion occurred in the regions harbouring the genes underlying genetic incompatibility and gamete development. SNP-based association analysis and bin-based linkage analysis identified 25 significant loci and 47 QTLs for heading date, including 14 known heading date genes. The mapping resolution of genes depends on their genetic effects, with offset distances of less than 55 kb for major-effect genes and less than 123 kb for moderate-effect genes. Four causal variants and noncoding structural variants were identified to be associated with heading date. Three to four types of alleles with strong, intermediate, weak, and no genetic effects were identified from the eight parents, providing flexibility for the improvement of rice heading date. In most cases, japonica rice carries weak alleles, and indica rice carries strong alleles and nonfunctional alleles. These results confirmed that the MAGIC population provides an exceptional opportunity to detect QTLs, and its use is encouraged for mapping genes and mining favorable alleles for breeding.
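The abstract notes that principal component analysis (PCA) revealed only weak population structure among the MAGIC lines. The sketch below shows, under stated assumptions, how such a check could be run on a genotype matrix (lines × markers coded 0/1/2); the random matrix is a stand-in for real genotype data, and scikit-learn's PCA is an assumed tool choice rather than the authors' actual pipeline.

```python
# Hypothetical sketch: assessing population structure with PCA on a genotype matrix.
# The random 0/1/2 matrix below is a placeholder for real marker data.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(1021, 5000)).astype(float)  # 1021 MAGIC lines

pca = PCA(n_components=2)          # PCA centers the columns internally
coords = pca.fit_transform(genotypes)
print("variance explained by PC1 and PC2:", pca.explained_variance_ratio_)
# Low, evenly spread variance across components is consistent with weak structure.
```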

