• Open access
  • Published: 18 June 2024

Using GPT-4 to write a scientific review article: a pilot evaluation study

  • Zhiping Paul Wang 1 ,
  • Priyanka Bhandary 1 ,
  • Yizhou Wang 1 &
  • Jason H. Moore 1  

BioData Mining volume 17, Article number: 16 (2024)


GPT-4, as the most advanced version of OpenAI’s large language models, has attracted widespread attention, rapidly becoming an indispensable AI tool across various areas. This includes its exploration by scientists for diverse applications. Our study focused on assessing GPT-4’s capabilities in generating text, tables, and diagrams for biomedical review papers. We also assessed the consistency in text generation by GPT-4, along with potential plagiarism issues when employing this model for the composition of scientific review papers. Based on the results, we suggest the development of enhanced functionalities in ChatGPT, aiming to meet the needs of the scientific community more effectively. This includes enhancements in uploaded document processing for reference materials, a deeper grasp of intricate biomedical concepts, more precise and efficient information distillation for table generation, and a further refined model specifically tailored for scientific diagram creation.


Introduction

A comprehensive review of a research field can significantly aid researchers in quickly grasping the nuances of a specific domain, leading to well-informed research strategies, efficient resource utilization, and enhanced productivity. However, the process of writing such reviews is intricate, involving multiple time-intensive steps. These include the collection of relevant papers and materials, the distillation of key points from potentially hundreds or even thousands of sources into a cohesive overview, the synthesis of this information into a meaningful and impactful knowledge framework, and the illumination of potential future research directions within the domain. Given the breadth and depth of biomedical research—one of the most expansive and dynamic fields—crafting a literature review in this area can be particularly challenging and time-consuming, often requiring months of dedicated effort from domain experts to sift through the extensive body of work and produce a valuable review paper [ 1 , 2 ].

The swift progress in Natural Language Processing (NLP) technology, particularly with the rise of Generative Pre-trained Transformers (GPT) and other Large Language Models (LLMs), has equipped researchers with a potent tool for swiftly processing extensive literature. A recent survey indicates that ChatGPT has become an asset for researchers across various fields [ 3 ]. For instance, a PubMed search for “ChatGPT” yielded over 1,400 articles with ChatGPT in their titles as of November 30th, 2023, marking a significant uptake just one year after ChatGPT’s introduction.

The exploration of NLP technology’s capability to synthesize scientific publications into comprehensive reviews is ongoing. The interest in ChatGPT’s application across scientific domains is evident. Studies have evaluated ChatGPT’s potential in clinical and academic writing [ 3 , 4 , 5 , 6 , 7 , 8 , 9 , 10 ], and discussions are underway about its use as a scientific review article generator [ 11 , 12 , 13 ]. However, many of these studies predate the release of the more advanced GPT-4, which may render their findings outdated. In addition, there is no study specifically evaluating ChatGPT (GPT-4) for writing biomedical review papers.

As the applications of ChatGPT are explored, the scientific community is also examining the evolving role of AI in research. Unlike any tool previously utilized in the history of science, ChatGPT has been accorded a role akin to that of a scientist, even being credited as an author in scholarly articles [ 14 ]. This development has sparked ethical debates. While thorough evaluations of the quality of AI-generated scientific review articles are yet to be conducted, some AI tools, such as Scopus AI [ 15 ], are already being employed to summarize and synthesize knowledge from scientific literature databases. However, these tools often come with disclaimers cautioning users about the possibility of AI generating erroneous or offensive content. Concurrently, as ChatGPT’s potential contributions to science are probed, concerns about the possible detrimental effects of ChatGPT and other AI tools on scientific integrity have been raised [ 16 ]. These considerations highlight the necessity for more comprehensive evaluations of ChatGPT from various perspectives.

In this study, we hypothesized that ChatGPT can compose the text, tables, and figures of a biomedical review paper, using two published cancer papers as benchmarks. To test this hypothesis, we used the first paper [ 17 ] to prompt ChatGPT to generate main points and summary text. Next, we used the second paper [ 18 ] to assess its ability to create tables and figures/graphs. We simulated the steps a scientist would take in writing a cancer research review and assessed GPT-4’s performance at each stage. Our findings are presented across four dimensions: the ability to summarize insights from reference papers on specific topics, the semantic similarity of GPT-4 generated text to benchmark texts, the projection of future research directions based on current publications, and the synthesis of content in the form of tables and graphs. We conclude with a discussion of our overall experience and the insights gained from this study.

Review text content generation by ChatGPT

The design of this study aims to replicate the process a scientist undergoes when composing a biomedical review paper. This involves the meticulous collection, examination, and organization of pertinent references, followed by the articulation of key topics of interest into a structured format of sections, subsections, and main points. The scientist then synthesizes information from the relevant references to develop a comprehensive narrative. A primary objective of this study is to assess ChatGPT’s proficiency in distilling insights from references into coherent text. To this end, a review paper on sex differences in cancer [ 17 ] was chosen as a benchmark, referred to as BRP1 (Benchmark Review Paper 1). Using BRP1 for comparison, ChatGPT’s content generation was evaluated across three dimensions: (1) summarization of main points; (2) generation of review content for each main point; and (3) synthesis of information from references to project future research directions.

Main point summarization

The effectiveness of GPT-4 in summarizing information was tested by providing it with the 113 reference articles from BRP1 to generate a list of potential sections for a review paper. The generated sections were then compared with BRP1’s actual section titles for coverage evaluation (Fig.  1 (A)). Additionally, GPT-4 was tasked with creating possible subsections using the BRP1 section titles and reference articles, which were compared with the actual subsection titles in BRP1.

Review content generation

The review content generation test involved comparing GPT-4’s ability to summarize a given point with the actual text content from BRP1 (Fig. 1 (B)). BRP1 comprises three sections with seven subsections, presenting a total of eight main points. The corresponding text content for each point was manually extracted from BRP1. Three strategies were employed for GPT-4 to generate detailed elaborations of these main points: (1) providing a point only in a prompt, for baseline content generation; (2) feeding all references used by BRP1 to GPT-4, for reference-based content generation; (3) using only the references corresponding to a main point, i.e., the articles cited in the relevant subsection of BRP1, to generate content for that point. The semantic similarity of the text generated by these strategies was then compared with the manually extracted content from BRP1.

Fig. 1 (A) GPT-4 summarizes sections and subsections; (B) GPT-4 generated review content evaluation

Projections on future research

The section on “outstanding questions” in the Concluding Remarks of BRP1 serves a dual purpose: it summarizes conclusions and sets a trajectory for future research into sex differences in cancer. This is a common feature in biomedical review papers, where a forward-looking analysis is synthesized from the main discussions within the paper. The pivotal inquiry is whether ChatGPT, without further refinement, can emulate this forward projection using all referenced articles. The relevance of such a projection is contingent upon its alignment with the main points and references of the review. Moreover, it raises the question of whether the baseline GPT-4 LLM would perform comparably.

To address these queries, all references from BRP1 were inputted into GPT-4 to generate a section akin to Concluding Remarks, encompassing a description of sex differences in cancer, future work, and potential research trajectories. Additionally, three distinct strategies were employed to assess GPT-4’s ability to formulate specific “outstanding questions,” thereby evaluating ChatGPT’s predictive capabilities for future research. These strategies involved uploading all BRP1 reference articles to GPT-4 for projection: (1) without any contextual information; (2) with the inclusion of BRP1’s main points; (3) with a brief description of broad areas of interest. The outputs from these strategies, along with the base model’s output—GPT-4 without reference articles—were juxtaposed with BRP1’s original “outstanding questions” for comparison.

Data processing

ChatGPT query

In initiating this study, we utilized the ChatGPT web application ( https://chat.openai.com/ ). However, we encountered several limitations that impeded our progress:

A cap of ten file uploads, which restricts the analysis of content synthesized from over ten articles.

A file size limit of 50 MB, hindering the consolidation of multiple articles into a single file to circumvent the upload constraint.

Inconsistencies in text file interpretation when converted from PDF format, rendering the conversion of large PDFs to smaller text files ineffective.

Anomalies in file scanning, where ChatGPT would occasionally process only one of several uploaded files, despite instructions to utilize all provided files.

Due to these constraints, we transitioned to using GPT-4 API calls for all tests involving document processing. The GPT-4 API accommodates up to twenty file uploads simultaneously, efficiently processes text files converted from PDFs, and demonstrates reliable file scanning for multiple documents. The Python code, ChatGPT prompts, and outputs pertinent to this study are available in the supplementary materials.

The web version of ChatGPT cannot read all of the uploaded PDFs and processes only a subset of them. The API version, by contrast, was set up to upload and process 20 PDFs at a time. Several validation tests were carried out to make sure that it could read all of them equally well. One such test was to ask ChatGPT to reiterate the Methods section of, say, the 18th PDF; this was repeated multiple times with a different randomly chosen PDF each time to confirm that ChatGPT truly uploads and processes every file.
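To make this workflow concrete, here is a minimal Python sketch of document-based querying, assuming the OpenAI Python SDK (v1) and the beta Assistants API with the retrieval tool that were available at the time of the study; the file names, model string, and prompt text are illustrative placeholders rather than the exact values used (the actual code is in the supplementary materials).

    import time
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Upload up to twenty reference articles (converted to text/PDF as needed).
    file_ids = [
        client.files.create(file=open(path, "rb"), purpose="assistants").id
        for path in ["ref01.pdf", "ref02.pdf"]  # ... up to 20 files
    ]

    # An assistant instructed to answer only from the uploaded documents.
    assistant = client.beta.assistants.create(
        model="gpt-4-1106-preview",  # placeholder model name
        instructions="Use only the uploaded articles; do not use external knowledge.",
        tools=[{"type": "retrieval"}],
        file_ids=file_ids,
    )

    thread = client.beta.threads.create()
    client.beta.threads.messages.create(
        thread_id=thread.id,
        role="user",
        content="Summarize the main points on sex differences in cancer "
                "supported by the uploaded articles.",
    )
    run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)

    # Poll until the run finishes, then print the assistant's reply.
    while run.status not in ("completed", "failed", "cancelled", "expired"):
        time.sleep(2)
        run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)
    for message in client.beta.threads.messages.list(thread_id=thread.id).data:
        if message.role == "assistant":
            print(message.content[0].text.value)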

Text similarity comparison

To assess text content similarity, we employed a transformer network-based pre-trained model [ 19 ] to calculate the semantic similarity between the original text in BRP1 and the text generated by GPT-4. We utilized the util.pytorch_cos_sim function from the sentence_transformers package to compute the cosine similarity of semantic content. Additionally, we conducted a manual validation where one of the authors compared the two texts and then categorized the similarity between the GPT-4 generated content and the original BRP1 content into three distinct levels: semantically very similar (Y), partially similar (P), and not similar (N).
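A minimal sketch of this comparison is shown below; the paper specifies only a pre-trained Sentence-BERT model [ 19 ], so the checkpoint name here is an assumption.

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed checkpoint

    original = "Text manually extracted from BRP1 for one main point."
    generated = "Text produced by GPT-4 for the same main point."

    # Encode both texts and compute the cosine similarity of their embeddings.
    emb_original = model.encode(original, convert_to_tensor=True)
    emb_generated = model.encode(generated, convert_to_tensor=True)
    score = util.pytorch_cos_sim(emb_original, emb_generated).item()

    print(f"semantic similarity: {score:.3f}")  # ~0.7 indicated strong similarity here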

Reproducibility and plagiarism evaluation

The inherent randomness in ChatGPT’s output, attributable to the probabilistic nature of large language models (LLMs), necessitates the validation of reproducibility for results derived from ChatGPT outputs. To obtain relatively consistent responses from ChatGPT, it is advantageous to provide detailed context within the prompt, thereby guiding the model towards the desired response. Consequently, we replicated two review content generation tests, as depicted in Fig.  1 (B)—one based on point references and the other on the GPT-4 base model—one week apart using identical reference articles and prompts via API calls to GPT-4. The first test aimed to evaluate the consistency of file-based content generation by GPT-4, while the second assessed the base model. We compared the outputs from the subsequent run with those from the initial run to determine the reproducibility of the text content generated by ChatGPT.

Prior to considering the utilization of ChatGPT for generating content suitable for publication in a review paper, it is critical to address potential plagiarism concerns. The pivotal question is whether text produced by GPT-4 would be flagged as plagiarized by anti-plagiarism software. In this study, GPT-4 generated a substantial volume of text, particularly for the text content comparison test (Fig.  1 (B)). We subjected both the base model-generated review content and the reference-based GPT-4 review content to scrutiny using iThenticate to ascertain the presence of plagiarism.

Table and figure generation by ChatGPT

Review papers often distill the content from references into tables and further synthesize this information into figures. In this study, we evaluated ChatGPT’s proficiency in generating content in tabular and diagrammatic formats, using benchmark review paper 2 (BRP2) [ 18 ] as a reference, as illustrated in Fig. 2. The authors of BRP2 developed the seminal Cancer-Immunity Cycle concept, encapsulated in a cycle diagram, which has since become a structural foundation for research in cancer immunotherapy.

Table content generation

Analogous to the file scan anomaly, ChatGPT may disproportionately prioritize one task over others when presented with multiple tasks simultaneously. To mitigate this in the table generation test, we adopted a divide-and-conquer approach, submitting separate GPT-4 prompts to generate content for each column of the table. This strategy facilitated the straightforward assembly of the individual outputs into a comprehensive table, either through GPT-4 or manual compilation.

In BRP2, eleven reference articles were utilized to construct a table (specifically, Table 1 of BRP2) that categorized positive and negative regulators at each stage of the Cancer-Immunity Cycle. These articles were compiled and provided to GPT-4, which was prompted to summarize information for the corresponding table columns: Steps, Stimulators, Inhibitors, Other Considerations, and Example References. The content for each column was generated through separate GPT-4 API calls and subsequently compared manually with the content of the original BRP2 table. Semantic similarity and manual validations were carried out for each row of Table 1 from BRP2. With the API version, we uploaded the references cited within the corresponding row of the table and used them to generate the row’s contents.
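The sketch below illustrates the divide-and-conquer strategy as one Chat Completions call per column; the model string and prompt wording are assumptions, and whereas the actual tests uploaded the row-specific reference articles as files, here the reference text is pasted inline for brevity.

    from openai import OpenAI

    client = OpenAI()
    columns = ["Steps", "Stimulators", "Inhibitors",
               "Other Considerations", "Example References"]
    reference_text = "..."  # content drawn from the 11 BRP2 reference articles

    # One prompt per column; the outputs are assembled into a table afterwards,
    # either by GPT-4 or manually.
    column_outputs = {}
    for column in columns:
        response = client.chat.completions.create(
            model="gpt-4",  # placeholder model name
            messages=[
                {"role": "system",
                 "content": "You summarize cancer immunology references into table columns."},
                {"role": "user",
                 "content": f"From the references below, list the '{column}' entries for "
                            f"each step of the Cancer-Immunity Cycle.\n\n{reference_text}"},
            ],
        )
        column_outputs[column] = response.choices[0].message.content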

Diagram creation

ChatGPT is primarily designed for text handling, yet its capabilities in graph generation are increasingly being explored [ 20 ]. DALL-E, the model utilized by ChatGPT for diagram creation, has been trained on a diverse array of images, encompassing various subjects, styles, contexts, and including scientific and technical imagery. To direct ChatGPT towards producing a diagram that closely aligns with the intended visualization, a precise and succinct description of the diagram is essential. Like the approach for table generation, multiple prompts may be required to facilitate incremental revisions in the drawing process.

In this evaluation, we implemented three distinct strategies for diagram generation, as demonstrated in Fig.  2 . Initially, the 11 reference articles used for table generation were also employed by GPT-4 to generate a description for the cancer immunity cycle, followed by the creation of a diagrammatic representation of the cycle by GPT-4. This approach not only tested the information synthesis capability of GPT-4 but also its diagram drawing proficiency. Secondly, we extracted the paragraph under the section titled ‘The Cancer-Immunity Cycle’ from BRP2 to serve as the diagram description. Terms indicative of a cyclical structure, such as ‘cycle’ and ‘step 1 again,’ were omitted from the description prior to its use as a prompt for diagram drawing. This tested GPT-4’s ability to synthesize the provided information into an innovative cyclical structure for cancer immunotherapy. Lastly, the GPT-4 base model was tasked with generating a cancer immunity mechanism and its diagrammatic representation without any given context. The diagrams produced through these three strategies were scrutinized and compared with the original cancer immunity cycle figure in BRP2 to assess the scientific diagram drawing capabilities of GPT-4.
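As a rough illustration of the drawing step, the sketch below submits a diagram description to the OpenAI Images API (DALL-E); the model name, image size, and prompt text are assumptions rather than the exact settings used.

    from openai import OpenAI

    client = OpenAI()
    # Description generated by GPT-4 from the references, or taken from BRP2,
    # with cycle-indicating terms removed.
    diagram_description = "Antigen release, antigen presentation, T cell priming, ..."

    image = client.images.generate(
        model="dall-e-3",  # placeholder model name
        prompt="Draw a clear scientific diagram of the following cancer "
               "immunity mechanism with labeled steps: " + diagram_description,
        size="1024x1024",
        n=1,
    )
    print(image.data[0].url)  # URL of the generated diagram for review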

Fig. 2 GPT-4 table generation and figure creation

Results and discussion

Main point summary

As depicted in Fig.  1 A, GPT-4 generated nine potential sections for a proposed paper entitled ‘The Spectrum of Sex Differences in Cancer,’ utilizing the 113 reference articles uploaded, which encompassed all three sections in BRP1. Upon request to generate possible subsections using BRP1 section titles and references, GPT-4 produced four subsections for each section, totaling twelve subsections that encompassed all seven subsections in BRP1. Detailed information regarding GPT-4 prompts, outputs, and comparisons with BRP1 section and subsection titles is provided in the supplementary materials.

The results suggest that ChatGPT can effectively summarize the key points from a comprehensive list of documents, which is particularly beneficial when composing a review paper that references hundreds of articles. With ChatGPT’s assistance, authors can swiftly summarize a list of main topics for further refinement, organization, and editing. Once the topics are finalized, GPT-4 can easily summarize different aspects for each topic, aiding authors in organizing the subsections. This indicates a novel approach to review paper composition that could be more efficient and productive than traditional methods. It represents a collaborative effort between ChatGPT and the review writer, with ChatGPT sorting and summarizing articles, and the author conducting high-level and creative analysis and editing.

During this evaluation, one limitation of GPT-4 was identified: its inability to provide an accurate list of articles referenced for point generation. This presents a challenge in developing an automated pipeline that enables both information summarization and file classification.

Figure 3 illustrates a sample of the text content generation, including the original BRP1 text, the prompt, and ChatGPT’s output. The evaluation results for GPT-4’s review content generation are presented in Table 1 (refer to Fig. 1 B). When generating review content using corresponding references as in BRP1, GPT-4 achieved an average similarity score of 0.748 with the original content in BRP1 across all main points. Manual similarity validation confirmed that GPT-4 generated content that was semantically similar for all 8 points, with 6 points matching very well (Y) and 2 points matching partially (P). When utilizing all reference articles for GPT-4 to generate review content for a point, the mean similarity score was slightly lower at 0.699, with a manual validation result of 5Y3P. The results from the GPT-4 base model were comparable to the corresponding reference-based results, with a mean similarity score of 0.755 and a 6Y2P manual validation outcome.

Fig. 3 Text generation using GPT-4 with specific references: (A) original section in BRP1; (B) prompt for the same section; (C) response from GPT-4

As the GPT-4 base model has been trained on an extensive corpus of scientific literature, including journals and articles that explore sex differences in cancer, it is plausible for it to generate text content similar to the original review paper, even for a defined point without any contextual input. The performance when using corresponding references is notably better than when using all references, suggesting that GPT-4 processes information more effectively with relevant and less noisy input.

The similarity score represents only the level of semantic similarity between the GPT-4 output and the original review paper text. It should not be construed as a measure of the quality of the text content generated by GPT-4. While it is relatively straightforward to assess the relevance of content for a point, gauging comprehensiveness is nearly impossible without a gold standard. However, scientific review papers are often required in research areas where such standards do not yet exist. Consequently, this review content similarity test merely indicates whether GPT-4 can produce text content that is semantically akin to that of a human scholar. Based on the results presented in Table  1 , GPT-4 has demonstrated adequate capability in this regard.

In this evaluation, GPT-4 initially synthesized content analogous to the Concluding Remarks section of BRP1 by utilizing all reference articles, further assessing its capability to integrate information into coherent conclusions. Subsequently, GPT-4 projected future research directions using three distinct methodologies. The findings, as detailed in Table 2, reveal that GPT-4’s similarity score increased significantly, from 0.45 to 0.71, upon the integration of all pertinent references, indicating that the provision of relevant information markedly enhances the model’s guidance. Consequently, although GPT-4 may face challenges in precisely replicating future research directions due to thematic discrepancies, equipping it with a distinct theme can empower it to produce content that more accurately represents the intended research trajectory. In contrast, the performance of the GPT-4 base model remained comparably stable, regardless of additional contextual cues. Manual verification confirmed GPT-4’s ability to synthesize information from the provided documents and to make reasonably accurate predictions about future research trajectories.

Reproducibility

The comparative analysis of GPT-4 outputs from different runs is presented in Table  3 . Based on previous similarity assessments, a similarity score of 0.7 is generally indicative of a strong semantic correlation in the context of this review paper. In this instance, GPT-4 outputs using corresponding references exhibited an average similarity score of 0.8 between two runs, while the base model scored 0.9. A manual review confirmed that both outputs expressed the same semantic meaning at different times. Consequently, it can be concluded that GPT-4 consistently generates uniform text responses when provided with identical prompts and reference materials.

An intriguing observation is that the GPT-4 base model appears to be more stable than when utilizing uploaded documents. This may suggest limitations in GPT-4’s ability to process external documents, particularly those that are unstructured or highly specialized in scientific content. This limitation aligns with our previous observation regarding GPT-4’s deficiency in cataloging citations within its content summaries.

Plagiarism check

The plagiarism assessment conducted via iThenticate ( https://www.ithenticate.com/ ) yielded a percentage score of 34% for reference-based GPT-4 content generation and 10% for the base model. Of these percentages, only 2% and 3%, respectively, were attributed to matches with the original review paper (BRP1), predominantly due to title similarities, as we maintained the same section and subsection titles. A score of 34% is typically indicative of significant plagiarism concerns, whereas 10% is considered minimal. These results demonstrate the GPT-4 base model’s capacity to expound upon designated points in a novel manner, minimally influenced by the original paper. However, the reference-based content generation raised concerns due to a couple of instances of ‘copy-paste’ style matches from two paragraphs in BRP1 references [ 21 , 22 ], which contributed to the elevated 34% score. In summary, while the overall content generated by ChatGPT appears to be novel, the occurrence of sporadic close matches warrants scrutiny.

This finding aligns with the theoretical low risk of direct plagiarism by ChatGPT, as AI-generated text responses are based on learned patterns and information, rather than direct ‘copy-paste’ from specific sources. Nonetheless, the potential for plagiarism and related academic integrity issues are of serious concern in academia. Researchers have been exploring appropriate methods to disclose ChatGPT’s contributions in publications and strategies to detect AI-generated content [ 23 , 24 , 25 ].

Table generation

Table construction in scientific publications often necessitates a more succinct representation of relationships and key terms than text content summarization and synthesis. This requires ChatGPT to extract information with greater precision. Of the five columns of information compiled by GPT-4 for Table 1 in BRP2, the Steps column is akin to summarizing section and subsection titles in BRP1. ‘Stimulators’ and ‘Inhibitors’ involve listing immune regulation factors, demanding more concise and precise information extraction. ‘Other Considerations’ encompasses additional relevant information, while ‘Example References’ lists citations.

For the Steps column, GPT-4 partially succeeded but struggled to accurately summarize information into numbered steps. For the remaining columns, GPT-4 was unable to extract the corresponding information accurately. Extracting concise and precise information from uploaded documents for specific scientific categories remains a significant challenge for GPT-4, which also lacks the ability to provide reference citations, as observed in previous tests. All results, including GPT prompts, outputs, and evaluations, are detailed in the supplementary materials.

In summary, GPT-4 has not yet achieved the capability to generate table content with the necessary conciseness and accuracy for information summary and synthesis.

Figure creation

In the diagram drawing test, we removed all terms indicative of a cyclical graph from the diagram description in the prompt to evaluate whether GPT-4 could independently recreate the original, pioneering depiction of the cancer immune system cycle. We employed three strategies for diagram generation, as depicted in Fig.  2 , which included: (1) using a diagram description generated from references and incorporated into the drawing prompt; (2) using the description from BRP2; (3) relying on the GPT-4 base model. The resulting diagrams produced by GPT-4 are presented in Fig.  4 , with detailed information provided in the supplementary materials.

Fig. 4 (A) Original figure; (B) diagram from the reference-based description; (C) diagram from the BRP2 description; (D) diagram from the GPT-4 base model

These diagrams highlight common inaccuracies in GPT-4’s drawings, such as misspelled words, omitted numbers, and a lack of visual clarity due to superfluous icons and cluttered labeling. Despite these issues, GPT-4 demonstrated remarkable proficiency in constructing an accurate cycle architecture, even without explicit instructions to do so.

In conclusion, while GPT-4 can serve as a valuable tool for conceptualizing diagrams for various biomedical reactions, mechanisms, or systems, professional graph drawing tools are essential for the actual creation of diagrams.

Conclusions

In this study, we evaluated the capabilities of the language model GPT-4 within ChatGPT for composing a biomedical review article. We focused on four key areas: (1) summarizing insights from reference papers; (2) generating text content based on these insights; (3) suggesting avenues for future research; and (4) creating tables and graphs. GPT-4 exhibited commendable performance in the first three tasks but was unable to fulfill the fourth.

ChatGPT’s design is centered around text generation, with its language model finely tuned for this purpose through extensive training on a wide array of sources, including scientific literature. Consequently, GPT-4’s proficiency in text summarization and synthesis is anticipated. When the API model’s performance on a section is compared between receiving only that section’s references and receiving all references from the entire paper, the model does better with section-specific references, because supplying all references can introduce considerable noise. Note also that the prompt explicitly instructs the model not to use external knowledge, so with all references it must process over a hundred publications and discover the information relevant to the section before composing a reply; this further explains why specific references improve performance. Remarkably, the GPT-4 base model’s performance is on par with, or in some cases slightly surpasses, that of reference-based text content generation, owing to its training on a diverse collection of research articles and web text: given a prompt and some basic points, it performs well because it already possesses the information needed to generate an appropriate response. Furthermore, reproducibility tests demonstrated GPT-4’s ability to generate consistent text content, whether utilizing references or relying solely on its base model.

In addition, we assessed GPT-4’s proficiency in extracting precise and pertinent information for the construction of research-related tables. GPT-4 encountered difficulties with this task, indicating that ChatGPT’s language model requires additional training to enhance its ability to discern and comprehend specialized scientific terminology from literature. This improvement necessitates addressing complex scientific concepts and integrating knowledge across various disciplines.

Moreover, GPT-4’s capability to produce scientific diagrams does not meet the standards required for publication. This shortfall may stem from its associated image generation module, DALL-E, being trained on a broad spectrum of images that encompass both scientific and general content. However, with ongoing updates and targeted retraining to include a greater volume of scientific imagery, the prospect of a more sophisticated language model with improved diagrammatic capabilities could be a foreseeable advancement.

To advance the assessment of ChatGPT’s utility in publishing biomedical review articles, we executed a plagiarism analysis on the text generated by GPT-4. This analysis revealed potential issues when references were employed, with GPT-4 occasionally producing outputs that closely resemble content from reference articles. Although GPT-4 predominantly generates original text, we advise conducting a plagiarism check on ChatGPT’s output before any formal dissemination. Moreover, despite the possibility that the original review paper BRP1 was part of GPT-4’s training dataset, the plagiarism evaluation suggests that the output does not unduly prioritize it, considering the extensive data corpus used for training the language model.

Our study also highlights the robust performance of the GPT-4 base model, which shows adeptness even without specific reference articles. This observation leads to the conjecture that incorporating the entirety of scientific literature into the training of a future ChatGPT language model could facilitate the on-demand extraction of review materials. Thus, it posits the potential for ChatGPT to eventually author comprehensive summary and synthesis-based scientific review articles. At the time this work was written, ChatGPT did not offer citations for the PDFs provided to it. In such a situation, it is therefore advisable to proceed section by section: supply a single paper, obtain a summary of that publication alone, and write a few sentences for that section that properly credit the paper. Alternatively, for commonly recognized knowledge, the user can supply all articles at once to produce a well-rounded set of statements supported by a shared set of citations.

ChatGPT’s power and versatility warrant additional exploration of various facets. While these are beyond the scope of the current paper, we highlight selected topics that are instrumental in fostering a more science-oriented ChatGPT environment. Holistically speaking, to thoroughly assess ChatGPT’s proficiency in generating biomedical review papers, it is imperative to include a diverse range of review paper types in the evaluation process. For instance, ChatGPT is already equipped to devise data analysis strategies and perform data science tasks in real time. This capability suggests potential for generating review papers that include performance comparisons and benchmarks of computational tools. However, this extends beyond the scope of our pilot study, which serves as a foundational step toward more extensive research endeavors.

Ideally, ChatGPT would conduct essential statistical analyses of uploaded documents, such as ranking insights, categorizing documents per insight, and assigning relevance weights to each document. This functionality would enable scientists to quickly synthesize the progression and extensively studied areas within a field. When it comes to mitigating hallucination, employing uploaded documents as reference material can reduce the occurrence of generating inaccurate or ‘hallucinated’ content. However, when queries exceed the scope of these documents, ChatGPT may still integrate its intrinsic knowledge base. In such cases, verifying ChatGPT’s responses against the documents’ content is vital. A feasible method is to cross-reference responses with the documents, although this may require significant manual effort. Alternatively, requesting ChatGPT to annotate its output with corresponding references from the documents could be explored, despite being a current limitation of GPT-4.

To address academic integrity concerns, as the development of LLMs progresses towards features that could potentially expedite or even automate the creation of scientific review papers, the establishment of a widely accepted ethical practice guide becomes paramount. Until such guidelines are in place, it remains essential to conduct plagiarism checks on AI-generated content and transparently disclose the extent of AI’s contribution to the published work. The advent of large language models like Google’s Gemini AI [ 26 ] and Perplexity.ai has showcased NLP capabilities comparable to those of GPT-4. This, coupled with the emergence of specialized models such as BioBert [ 27 ], BioBART [ 28 ], and BioGPT [ 29 ] for biomedical applications, highlights the imperative for in-depth comparative studies. These assessments are vital for identifying the optimal AI tool for particular tasks, taking into account aspects such as multimodal functionalities, domain-specific precision, and ethical considerations. Conducting such comparative analyses will not only aid users in making informed choices but also promote the ethical and efficacious application of these sophisticated AI technologies across diverse sectors, including healthcare and education.

Data availability

All source code, GPT-4 generated content, and results are available in the GitHub repository ( https://github.com/EpistasisLab/GPT4_and_Review ).

References

1. Dhillon P. How to write a good scientific review article. FEBS J. 2022;289:3592–602.

2. Health sciences added to the Nature Index. Nature Index. https://www.nature.com/nature-index/news/health-sciences-added-to-nature-index.

3. Van Noorden R, Perkel JM. AI and science: what 1,600 researchers think. Nature. 2023;621:672–5.

4. Ariyaratne S, Iyengar KP, Nischal N, Chitti Babu N, Botchu R. A comparison of ChatGPT-generated articles with human-written articles. Skeletal Radiol. 2023;52:1755–8.

5. Kumar AH. Analysis of ChatGPT tool to assess the potential of its utility for academic writing in biomedical domain. Biology Eng Med Sci Rep. 2023;9:24–30.

6. Alkaissi H, McFarlane SI. Artificial hallucinations in ChatGPT: implications in scientific writing. Cureus. 2023;15:e35179.

7. Meyer JG, et al. ChatGPT and large language models in academia: opportunities and challenges. BioData Min. 2023;16:20.

8. Mondal H, Mondal S. ChatGPT in academic writing: maximizing its benefits and minimizing the risks. Indian J Ophthalmol. 2023;71:3600.

9. Misra DP, Chandwar K. ChatGPT, artificial intelligence and scientific writing: what authors, peer reviewers and editors should know. J R Coll Physicians Edinb. 2023;53:90–3.

10. Ali SR, Dobbs TD, Hutchings HA, Whitaker IS. Using ChatGPT to write patient clinic letters. Lancet Digit Health. 2023;5:e179–81.

11. Alshami A, Elsayed M, Ali E, Eltoukhy AEE, Zayed T. Harnessing the power of ChatGPT for automating systematic review process: methodology, case study, limitations, and future directions. Systems. 2023;11:351.

12. Huang J, Tan M. The role of ChatGPT in scientific communication: writing better scientific review articles. Am J Cancer Res. 2023;13:1148–54.

13. Haman M, Školník M. Using ChatGPT to conduct a literature review. Account Res. 2023:1–3.

14. ChatGPT listed as author on research papers: many scientists disapprove. Nature. https://www.nature.com/articles/d41586-023-00107-z.

15. Scopus AI: trusted content, powered by responsible AI. Elsevier. https://www.elsevier.com/products/scopus/scopus-ai.

16. Conroy G. How ChatGPT and other AI tools could disrupt scientific publishing. Nature. 2023;622:234–6.

17. Rubin JB. The spectrum of sex differences in cancer. Trends Cancer. 2022;8:303–15.

18. Chen DS, Mellman I. Oncology meets immunology: the Cancer-Immunity Cycle. Immunity. 2013;39:1–10.

19. Reimers N, Gurevych I. Sentence-BERT: sentence embeddings using Siamese BERT-networks. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Hong Kong, China: Association for Computational Linguistics; 2019. p. 3982–92. https://doi.org/10.18653/v1/D19-1410.

20. Jin B, et al. Large language models on graphs: a comprehensive survey. Preprint at https://doi.org/10.48550/arXiv.2312.02783 (2024).

21. Polkinghorn WR, et al. Androgen receptor signaling regulates DNA repair in prostate cancers. Cancer Discov. 2013;3:1245–53.

22. Broestl L, Rubin JB. Sexual differentiation specifies cellular responses to DNA damage. Endocrinology. 2021;162:bqab192.

23. ChatGPT and academic integrity concerns: detecting artificial intelligence generated content. Language Education and Technology. https://www.langedutech.com/letjournal/index.php/let/article/view/49.

24. Gao CA, et al. Comparing scientific abstracts generated by ChatGPT to original abstracts using an artificial intelligence output detector, plagiarism detector, and blinded human reviewers. Preprint at https://doi.org/10.1101/2022.12.23.521610 (2022).

25. Homolak J. In reply: we do not stand a ghost of a chance of detecting plagiarism with ChatGPT employed as a ghost author. Croat Med J. 2023;64:293–4.

26. Introducing Gemini: Google’s most capable AI model yet. https://blog.google/technology/ai/google-gemini-ai/#sundar-note.

27. Lee J, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. 2020;36:1234–40.

28. Yuan H, et al. BioBART: pretraining and evaluation of a biomedical generative language model. In: Proceedings of the 21st Workshop on Biomedical Language Processing. Dublin, Ireland: Association for Computational Linguistics; 2022. p. 97–109. https://doi.org/10.18653/v1/2022.bionlp-1.9.

29. Luo R, et al. BioGPT: generative pre-trained transformer for biomedical text generation and mining. Brief Bioinform. 2022;23:bbac409.


Funding

This research received no specific grant from any funding agency.

Author information

Authors and affiliations

Department of Computational Biomedicine, Cedars Sinai Medical Center, 700 N. San Vicente Blvd, Pacific Design Center, Suite G-541, West Hollywood, CA, 90069, USA

Zhiping Paul Wang, Priyanka Bhandary, Yizhou Wang & Jason H. Moore


Contributions

Z. Wang authored the main manuscript text, while P. Bhandary conducted most of the tests and collected the results. Y. Wang and J. Moore contributed scientific suggestions and collaborated on the project. All authors reviewed the manuscript.

Corresponding author

Correspondence to Jason H. Moore .

Ethics declarations

Supplements

All supplementary materials are available at https://github.com/EpistasisLab/GPT4_and_Review .

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Wang, Z.P., Bhandary, P., Wang, Y. et al. Using GPT-4 to write a scientific review article: a pilot evaluation study. BioData Mining 17, 16 (2024). https://doi.org/10.1186/s13040-024-00371-3


Received: 26 April 2024

Accepted: 11 June 2024

Published: 18 June 2024

DOI: https://doi.org/10.1186/s13040-024-00371-3




ChatGPT for Research and Publication: A Step-by-Step Guide

Som S. Biswas

Department of Pediatric Radiology, Le Bonheur Children’s Hospital, The University of Tennessee Health Science Center, Memphis, TN.

Introduction

This commentary provides a concise step-by-step guide on using ChatGPT, an advanced natural language processing (NLP) model, for research and publication purposes. The guide addresses crucial aspects, including data preprocessing, fine-tuning techniques, prompt engineering, and ethical considerations. By addressing challenges related to biases, interpretability, and plagiarism, this commentary offers insights and recommendations for the responsible and ethical use of ChatGPT. The guide empowers researchers to integrate ChatGPT into their workflows effectively and ethically, enhancing productivity and improving the quality of their scientific publications. Through clear instructions and guidelines, researchers can tap into the transformative potential of ChatGPT, driving scientific progress in a language-driven world.

In recent years, the field of NLP has witnessed remarkable advancements, bringing us closer to the realm of humanlike language generation. Among these advancements, ChatGPT, based on the groundbreaking GPT-3.5 architecture developed by OpenAI, stands as an impressive language model capable of generating coherent and contextually relevant text responses. With its ability to understand and respond to user inputs, ChatGPT has opened up new possibilities for various applications, including research and publication. 1 – 3

The traditional process of conducting research and publishing scientific papers has been significantly transformed by the emergence of ChatGPT. Researchers and authors can now leverage the power of this sophisticated language model to streamline and enhance their workflow, leading to improved efficiency and higher-quality publications. However, using ChatGPT effectively in the research and publication domain requires a comprehensive understanding of its capabilities, limitations, and best practices.

In this commentary I offer my thoughts for a step-by-step guide for researchers and authors who seek to harness the potential of ChatGPT in their research endeavors and publication efforts. By exploring various aspects, such as data preprocessing, fine-tuning techniques, prompt engineering, and ethical considerations, this guide will equip researchers with the necessary knowledge to harness the full potential of ChatGPT in their scientific pursuits. Moreover, this commentary will delve into the challenges associated with using ChatGPT for research and publication, including biases, interpretability, and concerns regarding plagiarism. By addressing these challenges directly, I aim to provide researchers with valuable insights and recommendations to navigate these important issues and ensure the responsible and ethical use of ChatGPT as a research tool. 4

The significance of my guide lies in its potential to bridge the gap between the rapid progress of language models like ChatGPT and the research and publication process. By elucidating the intricacies of integrating ChatGPT into scientific workflows, researchers will be empowered to leverage this advanced technology effectively, thereby enhancing the overall quality and impact of their research output. 5 In the following sections, I present a comprehensive overview of the steps involved in using ChatGPT for research and publication.

Step 1: Title and Title Page Creation by ChatGPT

ChatGPT can be a valuable tool in generating titles for research papers. Its ability to understand and generate humanlike text allows it to analyze and synthesize information provided by researchers to craft concise and impactful titles. By leveraging its vast knowledge base and language capabilities, ChatGPT can assist in capturing the essence of a research paper, conveying the main focus and contributions succinctly. Researchers can collaborate with ChatGPT by providing relevant information, such as the subject, objectives, methodology, and key findings of their study. ChatGPT can then generate multiple title options, offering different perspectives and angles that researchers can consider. This collaboration with ChatGPT can save time and stimulate creativity, helping researchers refine their titles to accurately represent their work and engage potential readers. ChatGPT can then be used to create the entire title page and then can also customize based on each journal’s recommendations.

For example:

[Figure 1: example of a ChatGPT-generated title page]

Thus, we see that ChatGPT can write an entire title page based on just the title and author details. We notice that ChatGPT has created an email address that is incorrect and needs manual rectification. However, the rest of the title page, including keywords and the running title, is appropriate.

Step 2: Abstract/Summary Creation by ChatGPT

ChatGPT can assist in condensing complex information into a clear and engaging abstract/summary, helping researchers communicate the significance and novelty of their research to a wider audience. By leveraging the language proficiency of ChatGPT, researchers can save time and effort in crafting abstracts while ensuring that the key aspects of their study are accurately represented.

In this example, we demonstrate that ChatGPT can create an entire abstract from the title alone. However, the more information researchers provide (preferably, the entire body of the paper should be entered into ChatGPT), the more accurate the abstract becomes.

[Figure 2: example of a ChatGPT-generated abstract]

Step 3: Introduction Creation by ChatGPT

By collaborating with ChatGPT, researchers can provide key information, such as the background, significance, and objectives of their study. ChatGPT can then generate a well-structured introduction that sets the context, highlights the relevance of the research, and outlines the paper’s objectives. Also, ChatGPT can be used to generate keywords and generate an abbreviations list from the article by using prompts. However, it is important to note that the generated introduction should be reviewed, customized, and refined by the researchers to align with their specific study and writing style.

In the example below, we note that ChatGPT has not only created an introduction but also the objectives of the study, which can then be edited by the human author.

[Figure 3: example of a ChatGPT-generated introduction with study objectives]

Step 4: Can ChatGPT Create a Literature Review?

Yes, ChatGPT can help generate a literature review, but it is important to note that it may not have access to the most up-to-date research articles and studies due to copyrights and limited access to some journals. Additionally, a literature review typically requires a comprehensive analysis of multiple sources, so the generated response may not cover all relevant studies. Nonetheless, it can assist in providing a basic literature review on a given topic, which will need human authors to add to and edit it.

[Figure 4: example of a ChatGPT-generated literature review]

As we can see, ChatGPT is not as good at giving a detailed review of the literature as it is at summarizing contents or creating an introduction. Thus, its usefulness in this section of the paper is limited, if it has any at all.

Step 5: Can ChatGPT Assist in Brainstorming the Methodology of Studies?

ChatGPT can be a helpful tool in conceptualizing the methodology for research papers. By engaging in a conversation with ChatGPT, researchers can discuss their research objectives, study design, data collection methods, and data analysis techniques. ChatGPT’s natural language understanding allows it to provide suggestions and insights based on its knowledge base and understanding of research methodologies. Although ChatGPT can assist in generating ideas and providing guidance, it is important for researchers to critically evaluate and adapt the suggestions to align with their specific research goals and requirements.

Although the methodology is something that is unique to each paper and needs a human researcher to conceptualize it, we see in this example that ChatGPT can assist by giving ideas and examples based on the input of the title by the human researcher. Thus, ChatGPT can be part of brainstorming sessions when conceptualizing a study, although this section needs significant editing by a human, unlike the introduction or summary.

[Figure 5: example of ChatGPT-assisted methodology brainstorming]

Step 6: Do Not Use ChatGPT for Fabricating Patient Data or Results!

This section of the paper must be authentic, and ChatGPT has a limited role, if any, because patient data have to be original. ChatGPT also currently cannot analyze clinical data the way statistical software such as SPSS Statistics and Base SAS can. However, Microsoft appears to be developing an Excel copilot that uses AI to create graphs and plots, and its use should be evaluated once it is released to the public. 6

Step 7: Discussion and Conclusions

This section of the paper can be generated by ChatGPT if all results are pasted as input; however, this section also needs manual editing because inaccuracies are common. By discussing their research with ChatGPT, researchers can also identify potential limitations, discuss the broader implications of their findings, and propose future research directions. Although ChatGPT can generate suggestions and facilitate the thought process, it is important for researchers to critically evaluate the information provided and ensure that the Discussion and Conclusion sections align with the specific research objectives and findings of their study. Ultimately, ChatGPT can serve as a supportive tool in developing a comprehensive and well-rounded discussion and conclusion for research papers.

Step 8: References

In the author's experience, although ChatGPT is capable of creating references for an article, most of them are incorrect, so using ChatGPT to generate references is not recommended. However, if existing references are entered into ChatGPT, it can convert them into any specific journal's citation style on request.

Disadvantages of Using ChatGPT in Research

Although ChatGPT offers numerous advantages for assisting in the writing of research papers, there are also some important potential disadvantages to consider:

  • Lack of domain expertise: ChatGPT is a general-purpose language model trained on a diverse range of Internet text, which means it may lack the specific domain expertise required for certain research topics. It may generate responses that are not accurate or well informed in specialized fields, potentially leading to incorrect or misleading information in research papers.
  • Inconsistency and variability: ChatGPT’s responses can be inconsistent and vary depending on the input phrasing or prompt formulation. This can lead to unpredictability in generating reliable and coherent content, requiring additional effort to refine and ensure accuracy in research papers.
  • Limited control over output: Although researchers can guide the model’s responses through prompts, ChatGPT’s generation process is still primarily autonomous. Researchers have limited control over the precise content and structure of the generated text, which may require careful editing and review to align with specific research goals, standards, and above all, accuracy.
  • Biases and ethical considerations: Language models like ChatGPT can inadvertently reflect biases present in the training data. These biases may perpetuate existing societal or cultural biases in research papers, potentially leading to unfair or discriminatory content. The careful examination and mitigation of biases are crucial to ensure ethical and unbiased research output. 7
  • Lack of interpretability: ChatGPT’s decision-making process is complex and not easily interpretable. Researchers may struggle to understand the reasoning behind the model’s generated responses, making it challenging to assess the reliability and credibility of the information provided. Ensuring transparency and interpretability in research papers becomes more challenging with such models. ChatGPT should cite the sources for its data, like Google Bard does.
  • Plagiarism concerns: Because of its vast training data from the Internet, ChatGPT may inadvertently generate text that resembles or replicates existing content without proper citation or attribution. Researchers must be cautious about unintentional plagiarism and ensure that generated content is appropriately referenced and original. All ChatGPT-generated articles therefore need to be double-checked using antiplagiarism software.

In this commentary, I have provided a comprehensive step-by-step guide for researchers and authors on harnessing the power of ChatGPT in the realm of research and publication. By exploring crucial aspects, such as data preprocessing, fine-tuning techniques, prompt engineering, and ethical considerations, the guide equips researchers with the necessary knowledge and tools to effectively integrate ChatGPT into their scientific workflows. [8]

Through clear instructions, examples, and guidelines, researchers can navigate the complexities of using ChatGPT, leading to enhanced productivity and improved quality in their research output. Moreover, I address the challenges associated with biases, interpretability, and plagiarism concerns, ensuring the responsible and ethical usage of ChatGPT as a research tool.

The significance of this research lies in its ability to bridge the gap between the rapid advancements in language models like ChatGPT and the research and publication process. By empowering researchers with the skills to leverage ChatGPT effectively, this guide fosters innovation, drives scientific progress, and opens up new possibilities for transformative contributions to various fields. [9]

As language-driven technologies continue to evolve, researchers must stay abreast of the latest advancements and best practices. The step-by-step guide presented in this commentary serves as a valuable resource, providing researchers with the knowledge and guidance necessary to maximize the potential of ChatGPT in their research endeavors. By embracing the capabilities of ChatGPT and ensuring its responsible and ethical use, researchers can revolutionize the way research and publications are conducted. With ChatGPT as a powerful tool in their arsenal, researchers are poised to make significant strides in their respective fields, pushing the boundaries of scientific knowledge and ushering in a new era of language-driven innovation. [10]

However, and to reiterate, I cannot overemphasize that ChatGPT has, at present, many disadvantages, including inconsistencies, bias, and plagiarism concerns, that must be addressed by the human author before the article is submitted to a journal, because the human author(s) remain solely responsible for research integrity and accurate reporting.

In conclusion, I have attempted to provide researchers with a comprehensive understanding of how to effectively leverage ChatGPT for research and publication purposes, and I have highlighted the problems and precautions that the human author(s) must address before publishing ChatGPT-generated content. By embracing this step-by-step guide, researchers can unlock the full potential of ChatGPT, driving scientific progress and shaping the future of research and publications.

  • Please use ChatGPT only if allowed by your institution, research lab, and the journal in question.
  • Please acknowledge ChatGPT within your manuscript/published paper wherever you are using it.
  • Please do not fabricate or plagiarize data. ChatGPT can be used only for summarizing texts, improving English writing, and brainstorming ideas, never for fabricating raw research data.

Acknowledgment

The author acknowledges that this article was partially generated by ChatGPT (powered by OpenAI’s language model, GPT-3; http://openai.com). The editing was performed by the human author.

Disclosures. The author declares no conflicts of interest or financial interest in any product or service mentioned in the manuscript, including grants, equipment, medications, employment, gifts, and honoraria.


Title: GPT-4 Technical Report

Abstract: We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based model pre-trained to predict the next token in a document. The post-training alignment process results in improved performance on measures of factuality and adherence to desired behavior. A core component of this project was developing infrastructure and optimization methods that behave predictably across a wide range of scales. This allowed us to accurately predict some aspects of GPT-4's performance based on models trained with no more than 1/1,000th the compute of GPT-4.

Researchers Use GPT-4 To Generate Feedback on Scientific Manuscripts

Combining a large language model and open-source peer-reviewed scientific papers, researchers at Stanford built a tool they hope can help other researchers polish and strengthen their drafts.


Scientific research has a peer problem. There simply aren’t enough qualified peer reviewers to review all the studies. This is a particular challenge for young researchers and those at less well-known institutions who often lack access to experienced mentors who can provide timely feedback. Moreover, many scientific studies get “desk rejected” — summarily denied without peer review.

Sensing a growing crisis in an era of increasing scientific study, AI researchers at Stanford University have used the large language model GPT-4 and a dataset of thousands of previously published papers — replete with their reviewer comments — to create a tool that can “pre-review” draft manuscripts.

“Our hope is that researchers can use this pipeline to improve their drafts prior to official submission to conferences and journals,” said James Zou, an assistant professor of biomedical data science at Stanford and a member of the Stanford Institute for Human-Centered AI (HAI). Zou is the senior author of the study, recently published on the preprint service arXiv.

Numbers Don’t Lie

The researchers began by comparing comments made by a large language model against those of human peer reviewers. Fortunately, one of the foremost scientific journals, Nature, and its fifteen sub-journals (Nature Medicine, etc.) not only publish hundreds of studies a year but also include reviewer comments for some of those papers. And Nature is not alone. The International Conference on Learning Representations (ICLR) does the same with all papers — both accepted and rejected — for its annual machine learning conference.

“Between the two, we curated almost 5,000 peer-reviewed studies and comments to compare with GPT-4’s generated feedback,” Zou says. “The model did surprisingly well.”

The numbers resemble a Venn diagram of overlapping comments. Among the roughly 3,000 Nature-family papers in the study, the overlap between GPT-4 and human comments was almost 31 percent. For ICLR, the numbers were even higher: almost 40 percent of GPT-4 and human comments overlapped. What’s more, when looking only at ICLR’s rejected papers (i.e., less mature papers), the overlap between GPT-4 and human comments grew to almost 44 percent, with nearly half of all GPT-4 and human comments overlapping.

The significance of these numbers comes into sharper focus in light of the fact that, even among humans, there is considerable variation among the comments of any given paper’s multiple reviewers. Human-to-human overlap was 28 percent for Nature journals and about 35 percent for ICLR. By these metrics, GPT-4 performed comparably to humans.
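For intuition about what these percentages mean, the overlap can be pictured as a set intersection. The study itself matched comments semantically rather than verbatim, so the snippet below is only a simplified, hypothetical illustration of how such an overlap fraction is computed:

```python
# Simplified illustration of an overlap metric between two sets of review
# comments. The Stanford study matched comments semantically; here matching
# is reduced to exact string membership purely to show the arithmetic.
def overlap_fraction(comments_a: set, comments_b: set) -> float:
    """Fraction of A's comments that also appear in B."""
    if not comments_a:
        return 0.0
    return len(comments_a & comments_b) / len(comments_a)

gpt4_comments = {"add ablation study", "clarify dataset size", "discuss limitations"}
human_comments = {"add ablation study", "discuss limitations", "fix figure labels"}
print(f"{overlap_fraction(gpt4_comments, human_comments):.0%}")  # prints 67%
```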

But while computer-to-human comparisons are instructive, the real test is whether the reviewed paper’s authors valued the comments provided by either review method. Zou’s team conducted a user study in which researchers from over 100 institutions submitted their papers, including many preprints, and received GPT-4’s comments. More than half of the participating researchers found GPT-4’s feedback “helpful/very helpful,” and 82 percent found it “more beneficial” than feedback they had received from at least some human reviewers.

Limits and Horizons

There are caveats to the approach, Zou is quick to highlight in the paper. Notably, GPT-4’s feedback can sometimes be more “generic” and may not pinpoint the deeper technical challenges in the paper. GPT-4 also has a tendency to focus on only a few aspects of scientific feedback (e.g., “add experiments on more datasets”) and comes up short on in-depth insights into the authors’ methods.

Zou was further careful to emphasize that the team is not suggesting that GPT-4 take the “peer” out of peer review and replace human review. Human expert review “is and should continue to be” the basis of rigorous science, he asserts.

“But we believe AI feedback can benefit researchers in early stages of their paper writing, particularly when considering the growing challenges of getting timely expert feedback on drafts,” Zou concludes. “In that light, we think GPT-4 and human feedback complement one another quite well.”


How To Use ChatGPT To Write A Literature Review: Prompts & References

In the rapidly evolving world of academic research, the integration of AI tools like ChatGPT has transformed the traditional approach to literature reviews. As a researcher, you should leverage this tool to make your research work easier.

In this post, we explore how ChatGPT can enhance the literature review process and how specific prompts can effectively guide this advanced AI model to generate insightful content, while ensuring accuracy, relevance, and academic integrity in your scholarly work.

How to Use ChatGPT for Writing a Literature Review

  • Understand ChatGPT’s limitations: it relies on existing datasets and may miss the latest research, may lack depth, and carries a risk of generating plagiarized content.
  • Define your research objective: state your research questions or hypotheses; ChatGPT can summarize current research, identify relevant literature, and assist with keyword identification and context.
  • Identify keywords and search terms: ChatGPT generates relevant keywords from its extensive dataset but requires clear, concise prompts.
  • Create an initial literature review outline: ChatGPT can draft a preliminary structure, which you should then refine with detailed research.
  • Use the right prompts: craft precise prompts for relevant content; start with a broad understanding, then focus on specifics.
  • Review ChatGPT’s responses: cross-reference them with actual research for accuracy, evaluate the AI-generated text for coherence and depth, and ensure originality to avoid plagiarism.
  • Ensure coherence and flow: use ChatGPT as a starting point and refine its output; review and edit for narrative flow and academic standards.
  • Edit and proofread: improve coherence and logical progression, check for plagiarism, ensure correct citations, and attend to grammar, spelling, and academic language.

Understanding ChatGPT’s Limitations

While it can efficiently generate content, streamline the research process, and provide a comprehensive understanding of relevant literature, its capabilities are not without constraints. Here are some for you to consider:

Dependence On Pre-Existing Datasets

Since ChatGPT is a language model trained on available data, it may not include the most recent research papers or cutting-edge findings in a specific field. This gap can lead to a lack of current state-of-research insights, particularly crucial in fields like technology and science where advancements happen rapidly.

May Lack Depth And Context

ChatGPT, while able to produce summaries and synthesize information, might not fully grasp the nuanced arguments or complex theories specific to a research topic. This limitation necessitates that researchers critically evaluate and supplement AI-generated text with thorough analysis and insights from recent systematic reviews and primary sources.

Risk Of Plagiarism

Although ChatGPT can generate human-like text, it’s vital to ensure that the content for your literature review is original and properly cited. Relying solely on ChatGPT to write a literature review defeats the purpose of engaging deeply with the material and developing a personal understanding of the literature.

Not A Total Replacement of A Researcher

While ChatGPT can assist non-native English speakers in crafting clear and concise academic writing, it’s not a replacement for the human ability to contextualize and interpret research findings. Researchers must guide the AI model with specific prompts and leverage it as a tool rather than a substitute for comprehensive analysis.

By keeping these limitations in mind, ChatGPT can be a valuable aid in the literature review process, but it should be used judiciously and in conjunction with traditional research methods.

Defining Research Objective

When starting a literature review, the first step is to use ChatGPT to help define your research question or hypothesis.

The AI model’s ability to respond with a summary of the current state of research in your field can provide a comprehensive understanding, especially for systematic reviews or research papers.

For example, by inputting a prompt related to your research topic, ChatGPT can generate human-like text, summarizing prior research and highlighting relevant literature.

One insider tip for effectively using ChatGPT in the literature review process is to leverage its natural language processing capabilities to identify relevant keywords.

These keywords are crucial for non-native English speakers or those new to a research field, as they streamline the search for pertinent academic writing. Additionally, ChatGPT can guide you in understanding the context of your research topic, offering insights that are often challenging to find.

Using AI language models like ChatGPT for generating content for your literature review is efficient and effective, saving valuable time. However, it’s vital to critically evaluate the generated text to ensure it aligns with your research objectives and to avoid plagiarism.


ChatGPT’s ability to synthesize large amounts of information can aid in developing a clear and concise outline, but remember, it’s a guide, not a replacement for human analysis.

Despite these limitations, ChatGPT provides a unique advantage in conducting literature reviews. It can automate mundane tasks, allowing researchers to focus on analysis and critical thinking.

Identifying Keywords and Search Terms

Using ChatGPT to identify relevant keywords related to your research topic can significantly streamline your workflow.

For instance, when you input a summary of your research question into ChatGPT, the AI model can generate a list of pertinent keywords.

These keywords are not just randomly selected; they are based on the vast amounts of information in ChatGPT’s dataset, making them highly relevant and often inclusive of terms that are current in your research field.

An insider tip for leveraging ChatGPT effectively is to guide the AI with clear and concise prompts.

For example, asking ChatGPT to: “summarize key themes in [specific field] research papers from the last five years” can yield a list of keywords and phrases that are not only relevant but also reflective of the current state of research.
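The same kind of prompt can also be scripted rather than typed into the chat window, which is handy when you want to rerun it for several topics. The sketch below uses OpenAI's official Python package (v1.x) and assumes an OPENAI_API_KEY environment variable is set; the model name and prompt wording are placeholders to adapt to your field.

```python
# Minimal sketch: send a keyword-generation prompt through the OpenAI API.
# Assumes the `openai` package (v1.x) is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",  # placeholder -- substitute whichever model you use
    messages=[{
        "role": "user",
        "content": (
            "Summarize key themes in organic photovoltaics research papers "
            "from the last five years, then list 10 search keywords I could "
            "use in a literature database."
        ),
    }],
)
print(response.choices[0].message.content)
```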

This approach is particularly beneficial for conducting systematic reviews or for non-native English speakers who might be unfamiliar with specific academic jargon.

While ChatGPT can provide a comprehensive understanding of relevant literature and help automate the identification of keywords, it’s important to critically evaluate the generated content.

Researchers should use ChatGPT as a tool to augment their research process, not as a replacement for human insight.

It’s crucial to mind the limitations of the AI model and ensure that the keywords identified align with the research topic and objectives.

Creating an Initial Literature Review Outline

The key to using ChatGPT effectively in crafting an initial outline lies in its ability to generate content based on specific prompts.

For instance, a researcher working on organic photovoltaic devices can input a prompt into ChatGPT, such as “Help me create a structure for a literature review on organic photovoltaic devices.”

The AI model, using its comprehensive understanding of the research topic, can then produce a preliminary structure, including sections like:

  • Introduction
  • Advances in materials and technology, performance, and efficiency.

This generated outline serves as a valuable starting point. It helps in organizing thoughts and determining the key areas that the literature review should cover.

Importantly, researchers can refine and expand this initial outline as they delve deeper into their topic, ensuring it aligns with their specific research question and the current state of research.

However, while ChatGPT can streamline the review process and save valuable time in creating an initial outline, researchers should not solely rely on it.


The content generated by ChatGPT must be critically evaluated and supplemented with in-depth research. This involves:

  • Reading systematic reviews
  • Reading research papers, and
  • Summarizing relevant literature to ensure the review is comprehensive and up-to-date.

Get ChatGPT To Help You During Research, Using The Right Prompts

The key to effectively using ChatGPT in this process lies in crafting the right prompts, guiding the AI to generate relevant and useful content. 

When initiating a literature review, the prompt should aim for a broad understanding of the research topic. For instance, asking ChatGPT to:

  • “Give a brief overview of research done on [topic]”
  • “What are some of the recent findings on the [topic] in research?” or 
  • “Summarize the historical development of [topic] in academia”

helps in capturing the general landscape of the field. These prompts assist in identifying key theories, methodologies, and authors within the research area. As the review progresses, more specific prompts are necessary to delve deeper into individual studies. Queries like:

  • “Summarize the main arguments and findings of [specific paper]” or
  • “What are the strengths and weaknesses of [specific paper]?”

enable ChatGPT to provide detailed insights into particular research papers, aiding in understanding their contribution to the broader field. Comparative prompts are also crucial in synthesizing information across multiple works. Asking ChatGPT to:

  • “Compare and contrast the methodologies of [paper 1] and [paper 2]” or
  • “How do the findings of [paper 1] and [paper 2] agree or disagree?”

helps in discerning the nuances and disparities in the literature. In the final stages of the literature review, prompts should focus on summarizing findings and identifying emerging trends or gaps. For example:

  • “What trends or patterns have emerged from the literature on [topic]?” or
  • “What future research directions are suggested by the literature on [topic]?”

We will share more of these ChatGPT prompts later in this post; read on.

Reviewing ChatGPT’s Responses

When using ChatGPT to write a literature review, it’s crucial to critically evaluate its responses.

Firstly, researchers should cross-reference the information provided by ChatGPT with actual research papers.

This step ensures the accuracy of the data and helps in identifying any discrepancies or outdated information, given that ChatGPT’s dataset may not include the most recent studies.

Another essential aspect is assessing the coherence and depth of the AI-generated text. ChatGPT can summarize and synthesize information efficiently, but it might not capture the nuances of complex theories or research arguments.

Researchers should ensure that the content aligns with their research question and covers the topic systematically and comprehensively. This is where a researcher’s value comes in.

Additionally, verifying the originality of the content is vital to avoid plagiarism. While ChatGPT can generate human-like text, researchers must ensure that the AI-generated content is used as a guide rather than a verbatim source. 

Proper citations and references are essential to maintain the integrity of the literature review. Avoid torpedoing your own research by committing plagiarism.

Ensuring Coherence and Flow

One of the challenges when using such advanced AI language models is ensuring the coherence and flow of the final document. This aspect is crucial as it determines the readability and academic rigor of the literature review.

ChatGPT can generate vast amounts of content on a wide range of topics, responding efficiently to prompts and synthesizing information from its extensive dataset.

However, the content generated by ChatGPT, while informative, might not always align seamlessly with the specific research question or maintain a consistent narrative flow.


To tackle this, researchers need to take an active role in guiding ChatGPT and subsequently refining its output.

A practical approach is to use ChatGPT as a starting point, leveraging its ability to quickly provide summaries, synthesize relevant literature, and identify key references and keywords related to the research topic. For example, prompts like:

  • “Summarize the current research on [topic]” or
  • “Identify key debates in [topic]”

can yield valuable initial insights.

Once this foundational information is obtained, the crucial task is to carefully review and edit the AI-generated content.

This involves connecting the dots between different sections, ensuring that each part contributes meaningfully to addressing the research question, and refining the language to maintain academic standards.

It’s also essential to check for and avoid plagiarism, ensuring that all sources are correctly cited.

In addition, considering the vast amounts of information ChatGPT can access, it’s vital to verify the accuracy and relevance of the content.

Researchers should cross-reference AI-generated summaries with actual research papers, especially the most recent ones, as ChatGPT’s dataset may not include the latest studies.

Editing and Proofreading

Now that your literature review is mostly written out, focus on editing and proofreading. The content generated by ChatGPT needs to be meticulously reviewed and edited. Here are the steps:

  • Verify the accuracy of the information: cross-check the AI-generated content against actual research papers and systematic reviews to ensure that the latest studies are accurately represented.
  • Improve coherence and flow: restructure sentences, ensure a logical progression of ideas, and maintain a consistent academic tone throughout the document.
  • Check for plagiarism: despite ChatGPT’s ability to generate human-like text, ensure that all sources are correctly cited and that the review does not inadvertently replicate existing material.
  • Check grammar and spelling: editing should encompass grammar checks, vocabulary refinement, and language appropriate for an academic audience.
  • Update citations: review the citations and reference list to ensure everything is cited correctly and formatted to your required standard, be it MLA, Chicago, or APA.

What ChatGPT Prompts To Use When Writing A Literature Review?

There are many ways to use ChatGPT to write a literature review, usually by using the right prompts. Here’s how specific types of prompts can be effectively employed, with multiple examples for each category:

  • “Provide a comprehensive overview of the latest research on [topic].”
  • “Summarize the current understanding and key findings in the field of [topic].”
  • “Detail the dominant theoretical frameworks currently used in [topic].”
  • “Describe the evolution of theoretical approaches in [topic] over the past decade.”
  • “Identify and discuss the major debates or controversies in [topic].”
  • “What are the conflicting viewpoints or schools of thought in [topic]?”
  • “List the leading researchers in [topic] and summarize their key contributions.”
  • “Who are the emerging authors in [topic], and what unique perspectives do they offer?”
  • “Explain the most common research methodologies used in studies about [topic].”
  • “How have the methodologies in [topic] research evolved recently?”
  • “Trace the historical development and major milestones in [topic].”
  • “Provide a timeline of the key discoveries and shifts in understanding in [topic].”
  • “What significant paradigm shifts have occurred in [topic] in the last twenty years?”
  • “How has the focus of research in [topic] changed over time?”
  • “Analyze the methodology and conclusions of [specific paper].”
  • “Discuss the impact and reception of [specific paper] in the field of [topic].”
  • “Compare the results and methodologies of [paper 1] and [paper 2] in [topic].”
  • “How do [paper 1] and [paper 2] differ in their approach to [topic]?”
  • “Based on current literature, what are the suggested future research directions in [topic]?”
  • “Identify gaps in the literature of [topic] that could be explored in future studies.”

By using these types of prompts, researchers can guide ChatGPT to produce content that is not only relevant to their literature review but also rich in detail and scope.
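If you plan to run these prompts across several topics or papers, a small templating loop saves retyping. In the sketch below, ask_chatgpt is a hypothetical helper (for instance, a thin wrapper around the API call shown earlier in this document), and the templates are drawn from the list above:

```python
# Sketch: fill the [topic] placeholder in a few of the templates above and
# collect the model's answers. `ask_chatgpt` is a hypothetical helper that
# takes a prompt string and returns the model's reply.
TEMPLATES = [
    "Provide a comprehensive overview of the latest research on {topic}.",
    "Identify and discuss the major debates or controversies in {topic}.",
    "Identify gaps in the literature of {topic} that could be explored in future studies.",
]

def run_review_prompts(topic: str, ask_chatgpt) -> list:
    """Fill each template with the topic and collect the model's answers."""
    return [ask_chatgpt(t.format(topic=topic)) for t in TEMPLATES]

# Example usage with a stub in place of a real model call:
answers = run_review_prompts("distance learning", lambda p: f"[answer to: {p}]")
print(answers[0])
```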

Wrapping Up: Use Other AI Tools Too, Not Just ChatGPT

In conclusion, while ChatGPT serves as a powerful ally in the literature review process, it’s important to recognize it as just one of many AI tools available to researchers. Diversifying your AI toolkit can enhance the depth and breadth of your review, offering varied perspectives and methodologies.

As AI continues to evolve, embracing a range of these tools can lead to more comprehensive, nuanced, and innovative academic writing, expanding the horizons of research and scholarly exploration beyond what we currently envision.


Career Column, 08 April 2024

Three ways ChatGPT helps me in my academic writing

Dritjon Gruda

Dritjon Gruda is an invited associate professor of organizational behavior at the Universidade Católica Portuguesa in Lisbon, the Católica Porto Business School and the Research Centre in Management and Economics.


Confession time: I use generative artificial intelligence (AI). Despite the debate over whether chatbots are positive or negative forces in academia, I use these tools almost daily to refine the phrasing in papers that I’ve written, and to seek an alternative assessment of work I’ve been asked to evaluate, as either a reviewer or an editor. AI even helped me to refine this article.


doi: https://doi.org/10.1038/d41586-024-01042-3

This is an article from the Nature Careers Community, a place for Nature readers to share their professional experiences and advice.

Competing Interests

The author declares no competing interests.



How to make ChatGPT provide sources and citations

David Gewirtz

One of the biggest complaints about ChatGPT is that it provides information that is difficult to check for accuracy. Those complaints exist because ChatGPT doesn't provide the sources, footnotes, or links from which it derived the information in its answers.

While that is true for the GPT-3.5 model, GPT-4 and GPT-4o provide more citation resources. GPT-4 is only for paid subscribers, while GPT-4o is available to both free and paid subscribers, although free users get fewer citations and less detail than users with a ChatGPT Plus subscription.


Here's how ChatGPT describes the approach: "GPT-4o in free mode provides basic and essential citations, focusing on quick and concise references to ensure information is traceable. In contrast, GPT-4o in paid mode offers enhanced, detailed, and frequent citations, including multiple sources and contextual annotations to provide comprehensive verification and understanding of the information. This ensures a robust and reliable experience, especially beneficial for users requiring in-depth information and thorough source verification."

Even with the provided citations in GPT-4o, there are ways to improve your results.

1. Write a query and ask ChatGPT

To start, you need to ask ChatGPT something that needs sources or citations. I've found it's better to ask a question with a longer answer, so there's more "meat" for ChatGPT to chew on. 


Keep in mind that ChatGPT can't provide any information after January 2022 for GPT-3.5, April 2023 for GPT-4, and October 2023 for GPT-4o, and requests for information pre-internet (say, for a paper on Ronald Reagan's presidency) will have far fewer available sources.

Here's an example of a prompt I wrote on a topic that I worked on a lot when I was in grad school:

Describe the learning theories of cognitivism, behaviorism, and constructivism

2. Ask ChatGPT to provide sources

This is where a bit of prompt engineering comes in. A good starting point is with this query:

Please provide sources for the previous answer

I've found that this prompt often provides offline sources, books, papers, etc. The problem with offline sources is you can't check their veracity. Still, it's a starting point. A better query is this:

Please provide URL sources

This prompt specifically tells ChatGPT that you want clickable links to sources. You can also tweak this prompt by asking for a specific quantity of sources, although your mileage might vary in terms of how many you get back:

Please provide 10 URL sources

3. Push ChatGPT to give you higher-quality sources

Most large language models respond well to detail and specificity. So if you're asking for sources, you can push for higher-quality sources. You'll need to specify that you need reliable and accurate sources. While this approach won't necessarily work, it may remind the AI chatbot to give you more useful responses. For example:

Please provide me with reputable sources to support my argument on... (whatever the topic is you're looking at)

You can also tell ChatGPT the kinds of sources you want. If you're looking for scholarly articles, peer-reviewed journals, books, or authoritative websites, mention these preferences explicitly. For example:

Please recommend peer-reviewed journals that discuss... (and here, repeat what you discussed earlier in your conversation)

When dealing with abstract concepts or theories, request that ChatGPT provide a conceptual framework and real-world examples. Here's an example:

Can you describe the principles of Vygotsky's Social Development Theory and provide real-world examples where these principles were applied, including sources for these examples?

This approach gives you a theoretical explanation and practical instances to trace the original sources or case studies.


Another idea is to steer clear of sources that have succumbed to link rot (that is, pages that are no longer online at the URL ChatGPT knows). Be careful with this idea, though, because ChatGPT doesn't know about anything after its training cutoff: January 2022 for GPT-3.5, April 2023 for GPT-4, and October 2023 for GPT-4o. So, rather than a prompt like this:

Please provide me with sources published within the past five years.

consider using a prompt like this instead:

Please provide sources published from 2019 through April 2023.

And, as always, don't assume that whatever output ChatGPT gives you is accurate. It's still quite possible the AI will completely fabricate answers, even to the point of making up the names of what seem like academic journals. It's a sometimes helpful tool, but it's also a fibber.

4. Attempt to verify/validate the provided sources

Keep this golden rule in mind about ChatGPT-provided sources: ChatGPT is more often wrong than right.

Across the many times I've asked ChatGPT for URL sources, roughly half were just plain bad links. Another 25% or more of the links went to topics completely or somewhat unrelated to the one I was trying to source. GPT-4 and GPT-4o are slightly more reliable, but not by much.


For example, I asked for sources on a backgrounder for the phrase "trust but verify,"  generally popularized by US President Ronald Reagan. I got a lot of sources back, but most didn't exist. I got some back that correctly took me to active pages on the Reagan Presidential Library site, but the page topic had nothing to do with the phrase.

I had better luck with my learning theory question from step 1. There, I got back offline texts from people I knew from my studies who had worked on those theories. I also got back URLs. Once again, only about two in 10 worked or were accurate.


Don't despair. The idea isn't to expect ChatGPT to provide sources that you can immediately use. If you instead think of ChatGPT as a research assistant, it will give you some great starting places. Use the names of the articles (which may be completely fake or just not accessible) and drop them into Google. That process will give you some interesting search queries that probably lead to interesting material that can legitimately go into your research.
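Before dropping titles into Google one by one, you can also triage a batch of ChatGPT-supplied URLs automatically. The sketch below is an illustration rather than a robust checker: it issues a HEAD request per link with Python's requests library. Some servers reject HEAD requests, so a failure here means "check by hand," not proof that the source is fake.

```python
# Sketch: quickly triage ChatGPT-supplied URLs with HEAD requests.
# A failed request flags a link for manual review; it does not prove
# the source is fabricated (some servers simply reject HEAD).
import requests

def triage_urls(urls: list) -> dict:
    """Map each URL to 'alive', an HTTP error code, or 'check by hand'."""
    results = {}
    for url in urls:
        try:
            status = requests.head(url, allow_redirects=True, timeout=10).status_code
            results[url] = "alive" if status < 400 else f"HTTP {status}"
        except requests.RequestException as exc:
            results[url] = f"check by hand ({type(exc).__name__})"
    return results

for url, verdict in triage_urls([
    "https://www.reaganlibrary.gov/",
    "https://example.com/nonexistent-page",
]).items():
    print(url, "->", verdict)
```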

Also, keep in mind that you're not limited to using ChatGPT. Don't forget all the tools available to researchers and students. Do your own web searches. Check with primary sources and subject-matter experts if they're available. If you're in school, you can even ask your friendly neighborhood librarian for help.


Don't forget that there are many excellent traditional sources. For example, Google Scholar and JSTOR  provide access to a wide range of academically acceptable resources you can cite with reasonable confidence.

One final point: if you merely cut and paste ChatGPT sources into your research, you're likely to get stung. Use the AI for clues, not as a way to avoid the real work of research.

How do you put sources in APA format? 

APA style is a citation style that's often required in academic programs. APA stands for American Psychological Association. I've often thought they invented these style rules to get more customers. The definitive starting point for APA style is the Purdue OWL , which provides a wide range of style guidelines.


Be careful: online style formatters might not do a complete job, and you may get your work returned by your professor. It pays to do the work yourself -- and be careful doing it.

How can I make ChatGPT provide more reliable sources for its responses?

This is a good question. I have found that sometimes -- sometimes -- if you ask ChatGPT to give you more sources or re-ask for sources, it will give you new listings. If you tell ChatGPT the sources it provided were erroneous, it will sometimes give you better ones. The bot may also apologize and give excuses. Another approach is to re-ask your original question with a different focus or direction, and then ask for sources for the new answer.


Once again, my best advice is to avoid treating ChatGPT as a tool that writes for you and more as a writing assistant. Asking for sources to cut and paste a ChatGPT response is pretty much plagiarism. That said, using ChatGPT's responses, and any sources you can tease out, as clues for further research and writing is a legitimate way to use this tool.

Why are ChatGPT sources often so wrong? 

For some links, it's just link rot. Some links may have changed, since many sources are more than three years old. Other sources are of indeterminate age. Since we don't have a full listing of ChatGPT's sources, it's impossible to tell how valid they are or were. 

Also:   How does ChatGPT actually work?  

Since ChatGPT was trained mostly without human supervision , we know that most of its sources weren't vetted and could be wrong, made up, or completely non-existent.

Trust, but verify.





How to Write an Essay with ChatGPT | Tips & Examples

Published on June 22, 2023 by Koen Driessen. Revised on November 16, 2023.

Passing off AI-generated text as your own work is widely considered plagiarism. However, when used correctly, AI tools like ChatGPT can legitimately help guide your writing process.

These tools are especially helpful in the preparation and revision stages of your essay writing.

You can use ChatGPT to:

  • Write a research question
  • Develop an outline
  • Find relevant source information
  • Summarize or paraphrase text
  • Get feedback



You can use ChatGPT to brainstorm potential research questions or to narrow down your thesis statement. Begin by inputting a description of the research topic or assigned question. Then include a prompt like “Write 3 possible research questions on this topic.”

You can make the prompt as specific as you like. For example, you can include the writing level (e.g., high school essay, college essay), perspective (e.g., first person), and the type of essay you intend to write (e.g., argumentative, descriptive, expository, or narrative).

You can also mention any facts or viewpoints you’ve gathered that should be incorporated into the output.

If the output doesn’t suit your topic, you can click “Regenerate response” to have the tool generate a new response. You can do this as many times as you like, and you can try making your prompt more specific if you struggle to get the results you want.

Example: Using ChatGPT to write a research question

  • Generate three possible research questions for an argumentative high school essay on the following topic: “The long-term impact of the Covid-19 pandemic.”
  • To what extent has the Covid-19 pandemic exacerbated existing social inequalities and what measures can be taken to address these disparities in the long term?
  • How has the Covid-19 pandemic transformed education systems, and what are the potential long-term implications for students’ learning outcomes and educational equity?
  • What lessons can be learned from the Covid-19 pandemic to enhance global preparedness and response to future public health crises, and how can these lessons be effectively implemented in the long term?
  • Based on the second question, develop three further research questions with a specific focus on distance learning.
  • What are the long-term effects of widespread adoption of distance learning during the Covid-19 pandemic on students’ academic performance, motivation, and overall educational outcomes?
  • How has the shift to distance learning impacted the social and emotional well-being of students, and what strategies can be implemented to address any negative consequences in the long term?
  • What are the key factors that contribute to the success of distance learning programs, and how can educational institutions and policymakers ensure equitable access to quality education in the long term, especially for marginalized and underserved student populations?


Once you’ve decided on a research question, you can use ChatGPT to develop an essay outline. This can help keep you on track by giving you a clear idea of what topics you want to discuss and in what order.

Do this by prompting ChatGPT to create an outline for a specific research question, mentioning any topics or points you want to discuss. You can also mention the writing level and the intended length of your essay so that the tool generates an appropriate outline.

You can then refine this by further prompting ChatGPT or editing the outline manually until it meets your requirements.

Examples: Using ChatGPT to generate an essay outline

  • Develop an outline for an argumentative high school essay with the following research question: “What are the long-term effects of widespread adoption of distance learning during the Covid-19 pandemic on students’ academic performance and overall educational outcomes?” The essay will be about 4 pages long.
  • Background information on the Covid-19 pandemic and the shift to distance learning
  • Thesis statement: The widespread adoption of distance learning during the Covid-19 pandemic has had both positive and negative long-term effects on students’ academic performance and overall educational outcomes.
  • Increased flexibility and personalized learning opportunities
  • Access to a wider range of resources and educational materials
  • Development of self-directed learning skills
  • Critiques and limitations of previous studies
  • Lack of face-to-face interaction and reduced social engagement
  • Increased potential for distractions and lack of structure
  • Technological barriers and unequal access to resources
  • Examination of studies and data on academic achievement during distance learning
  • Analysis of long-term effects on educational attainment and future opportunities
  • Discussion of potential implications for college admissions and career readiness
  • Strategies for promoting student engagement and motivation in distance learning
  • Importance of teacher-student communication and support
  • Addressing technological disparities and ensuring equitable access to resources
  • Recap of the main points discussed
  • Emphasize the need for further research and long-term planning to optimize distance learning experiences
  • Restate the thesis and provide a final thought on the potential long-term effects of distance learning on students’ academic performance and overall educational outcomes

Once you know the scope of your essay, you can find relevant primary and secondary sources to support your argument.

However, we don’t recommend prompting ChatGPT to generate a list of sources as it occasionally makes mistakes (like listing nonexistent sources). Instead, it’s a good idea to use ChatGPT to get suggestions for the types of sources relevant to your essay and track them down using a credible research database or your institution’s library.

When you have found relevant sources, use a specialized tool like the Scribbr Citation Generator to cite them in your essay.

Example: Using ChatGPT to find sources

  • What types of sources can I use to write an essay on the following research question? “What are the long-term effects of widespread adoption of distance learning during the Covid-19 pandemic on students’ academic performance and overall educational outcomes?”
  • Academic Journals and Research Articles: Look for scholarly articles and research studies published in peer-reviewed academic journals. These sources often provide in-depth analysis, empirical evidence, and statistical data on the long-term effects of distance learning.
  • Educational and Government Reports: Reports published by educational institutions, government agencies, or educational organizations can offer valuable insights into the impact of distance learning on academic performance and educational outcomes. These reports may include surveys, case studies, and data analysis specific to the Covid-19 pandemic and distance learning.
  • Official Websites and Publications: Explore the official websites of educational institutions, education ministries, or educational technology companies. They often publish articles, white papers, or research findings related to distance learning and its effects on students’ academic performance.
  • Expert Interviews or Opinion Pieces: Seek out interviews or opinion pieces from experts in the field of education, distance learning, or pedagogy. These sources can provide valuable perspectives and insights on the long-term effects of distance learning on students’ educational outcomes.
  • Online Databases and Digital Libraries: Utilize online databases and digital libraries such as JSTOR, Google Scholar, or academic databases provided by your educational institution. These platforms offer a wide range of scholarly articles, research papers, and publications that can support your arguments.
  • Surveys and Studies Conducted by Educational Institutions: Some schools or universities may have conducted their own surveys or studies to evaluate the impact of distance learning on students. Check if your educational institution has published any research findings or survey results that are relevant to your topic.

During your writing process, you can use ChatGPT as a summarizing tool to condense text to its essential ideas or as a paraphraser to rephrase text in clear, accessible language. Using ChatGPT in these ways can help you to understand complex material, express your own ideas more clearly, and avoid repetition.

Simply input the relevant text and prompt the tool to summarize or paraphrase it. Alternatively, you can use Scribbr’s free text summarizer and Scribbr’s free paraphrasing tool, which are specifically designed for these purposes.

Example: Using ChatGPT to summarize/paraphrase

  • Paraphrase the following text using clear language: “Engaging in an extensive examination of the intricate interplay between remote instructional methods and academic achievement necessitates consideration of diverse factors, including the pedagogical efficacy of virtual platforms, the socioemotional ramifications of reduced face-to-face interactions, the equitable accessibility of technological resources, and the potential long-lasting implications for students’ educational trajectories and future opportunities.”
  • To thoroughly explore the connection between remote teaching methods and academic success, it is crucial to take into account various factors. These factors include evaluating how well virtual platforms support effective teaching, understanding the impact of limited in-person interactions on students’ emotional well-being, ensuring fair access to technology resources, and recognizing the potential long-term effects on students’ educational paths and future prospects.

Once you’ve written your essay, you can prompt ChatGPT to provide feedback and recommend improvements.

You can indicate how the tool should provide feedback (e.g., “Act like a university professor examining papers”) and include the specific points you want to receive feedback on (e.g., consistency of tone, clarity of argument, appropriateness of evidence).

While this is not an adequate substitute for an experienced academic supervisor, it can help you with quick preliminary feedback.

You can also use ChatGPT to check for grammar mistakes. However, ChatGPT sometimes misses errors and on rare occasions may even introduce new grammatical mistakes. We suggest using a tool like Scribbr’s free grammar checker, which is designed specifically for this purpose. Or, for more in-depth feedback, try Scribbr’s proofreading and editing service.

Example: Using ChatGPT for feedback

  • Feedback the above text with regard to consistency of tone, clarity of argument, appropriateness of evidence, and coherence of structure.
  • Clarity of Argument: The argument could be further clarified by explicitly connecting the discussed factors to their impact on academic performance. For example, explicitly stating how the proficiency of virtual platforms and equitable access to technology affect academic performance would strengthen the clarity of the argument.
  • Appropriateness of Evidence: The evidence provided is generally appropriate and relevant. To enhance the strength of the evidence, consider incorporating specific examples, research studies, or statistics that directly support the discussed factors and their impact on academic performance. This would provide stronger evidence for the claims made.
  • Coherence of Structure: The structure of the text is coherent, with a clear introduction, body paragraphs addressing different factors, and a conclusion. To improve coherence, consider adding transitional phrases or sentences at the beginning or end of paragraphs to facilitate the smooth flow of ideas between paragraphs. This would enhance the overall structure and make it easier for the reader to follow the argument.

Overall, the text demonstrates a consistent tone, a clear argument, appropriate evidence, and a coherent structure. Clarifying the argument by explicitly connecting the factors to their impact, incorporating stronger evidence, and adding transitional phrases for better coherence would further enhance the text’s effectiveness.

Note: Passing off AI-generated text as your own work is generally considered plagiarism (or at least academic dishonesty) and may result in an automatic fail and other negative consequences. An AI detector may be used to detect this offense.

If you want more tips on using AI tools, understanding plagiarism, and citing sources, make sure to check out some of our other articles with explanations, examples, and formats.

  • Citing ChatGPT
  • Best grammar checker
  • Best paraphrasing tool
  • ChatGPT in your studies
  • Is ChatGPT trustworthy?
  • Types of plagiarism
  • Self-plagiarism
  • Avoiding plagiarism
  • Academic integrity
  • Best plagiarism checker

Citing sources

  • Citation styles
  • In-text citation
  • Citation examples
  • Annotated bibliography

Frequently asked questions

Yes, you can use ChatGPT to summarize text. This can help you understand complex information more easily, summarize the central argument of your own paper, or clarify your research question.

You can also use Scribbr’s free text summarizer, which is designed specifically for this purpose.

Yes, you can use ChatGPT to paraphrase text to help you express your ideas more clearly, explore different ways of phrasing your arguments, and avoid repetition.

However, it’s not specifically designed for this purpose. We recommend using a specialized tool like Scribbr’s free paraphrasing tool, which will provide a smoother user experience.

No, it’s not a good idea to have ChatGPT write your essay for you. First, it’s normally considered plagiarism or academic dishonesty to represent someone else’s work as your own (even if that “someone” is an AI language model). Even if you cite ChatGPT, you’ll still be penalized unless this is specifically allowed by your university. Institutions may use AI detectors to enforce these rules.

Second, ChatGPT can recombine existing texts, but it cannot really generate new knowledge. And it lacks specialist knowledge of academic topics. Therefore, it is not possible to obtain original research results, and the text produced may contain factual errors.

However, you can usually still use ChatGPT for assignments in other ways, as a source of inspiration and feedback.

Cite this Scribbr article

If you want to cite this source, you can copy and paste the citation or click the “Cite this Scribbr article” button to automatically add the citation to our free Citation Generator.

Driessen, K. (2023, November 16). How to Write an Essay with ChatGPT | Tips & Examples. Scribbr. Retrieved August 26, 2024, from https://www.scribbr.com/ai-tools/chatgpt-essay/



How to Use ChatGPT to Write a Research Paper: Tips and Tricks to Get You Started

Jordan Kruszynski

January 4, 2024

If you’re an academic, you’ve probably spent a lot of time planning and writing research papers. It’s a fine art, and one that requires a fair amount of skill, precision and preparation. But whether you’re an old master in the field, or a rookie embarking on your first serious research paper, there are tools that can help you to simplify the planning stages and organise your thoughts more effectively.

One of those tools is the (in)famous ChatGPT , and it’s this that we’ll be focusing on in our article. In the right hands, ChatGPT can become a powerful research tool that will make your paper-writing that little bit easier. So sit back, relax, and discover our tips and tricks for using ChatGPT to write a research paper.

What is ChatGPT?

Just in case you don’t already know, ChatGPT is an artificial intelligence tool developed by OpenAI that can help you with your research. It uses natural language processing to understand what you’re looking for and provide you with relevant information. You can ask it questions, and it will provide you with answers in a conversational style, as well as offer sources to back up its information.

One of the biggest advantages of ChatGPT is that it can save you time. Instead of spending hours searching for sources, you can simply ask ChatGPT for help. This can bring you a reliable list of sources for further investigation fairly quickly. It’s crucial to note however that the AI shouldn’t be exploited to do the actual writing of the paper for you. This could see you accused of plagiarism or misconduct, and besides, as a researcher, you’re probably rightfully proud of your ability to write a compelling paper.

Another advantage of ChatGPT is that it’s always available (even when libraries or other sources of information might be inaccessible) so you can work on your research paper at any time of day or night.

Interested in learning more about how AI programs like ChatGPT are changing the academic landscape? Listen to Oxford researcher Samantha-Kaye Johnston’s views from the frontline in this exciting episode of The Research Beat podcast.

The benefits of using ChatGPT to write a research paper

There are many benefits to using ChatGPT for research papers. Firstly, as we mentioned earlier, it can save you time. A slow drag of several hours looking for specific sources can be reduced to just a few minutes with the AI’s help.

Secondly, it can help you find sources that you might not have found otherwise. ChatGPT has access to a wide range of sources, including academic journals and books.

Thirdly, it can help you organise your research. ChatGPT can provide you with a summary of the information you’ve gathered, making it easier to analyse and integrate into your research paper.

How to use ChatGPT to write a research paper – a step by step guide

  • Start by creating a list of questions that you want to answer in your research paper.
  • Open ChatGPT and ask it one of the questions on your list, for example, ‘What is the critical history of feminist literature in Europe?’
  • ChatGPT will provide you with a list of sources to check out.
  • Read through the sources and take notes on the information that is relevant to your research question.
  • Repeat steps 2-4 for each question on your list.
  • Once you’ve gathered all of your information, organise it into an outline for your research paper.
  • Use the information you’ve gathered to write your research paper.

Working with your sources

Once you have your sources in order, you might want to use prompts to get help from ChatGPT with other parts of the writing process. A prompt is a specific instruction to the AI that can give you tailored information or responses. For example, if you’re struggling to understand part of another research paper, you could use the following prompt:

‘Please explain the following paragraphs in simple words. I am having trouble understanding (insert concept here).’

Input the prompt along with the relevant passage from the source, and ChatGPT will provide a summary that could help you to unlock your understanding of the tricky concept.

Looking for ChatGPT prompts tailor-made for academics? Check out Audemic’s list of over 50 prompts to help you with your work and research!

Writing tips and tricks for using ChatGPT

When using ChatGPT to write a research paper, it’s important to keep a few things in mind. Firstly, make sure that you’re using reliable sources. ChatGPT can provide you with a list of sources, but it’s up to you to determine which ones are reliable.

Secondly, make sure that you’re paraphrasing the information you’ve gathered in your own words. You don’t want to cheat or be accused of it.

Finally, make sure that you’re using the information you’ve gathered to answer your research questions. Everything you uncover through ChatGPT should be used to feed your own understanding and improve the quality and precision of your answers.

Common mistakes to avoid when using ChatGPT to write a research paper

While ChatGPT is an excellent tool for research papers, there are some common mistakes to avoid:

  • Crucially, don’t rely too heavily on ChatGPT. It’s essential to do some of your research on your own and use ChatGPT to supplement it.
  • Don’t forget to cite your sources correctly. Just because ChatGPT provided you with the information doesn’t mean that you don’t need to cite it. Moreover, ChatGPT cannot reliably produce accurate academic citations for you.
  • Always remember to proofread your research paper carefully, especially if you’ve used AI elements to construct it.

ChatGPT vs. traditional research methods

While traditional research methods have their advantages and always will, ChatGPT, as we’ve seen, has some of its own. We think one of the best uses for AI programs like ChatGPT is to accelerate parts of the paper-writing process that would otherwise take hours. If you can use the AI to produce a list of interesting and relevant sources, then you can get to work quickly as an academic, studying and analysing those sources to determine their value within your paper. In general, if you approach ChatGPT with an attitude of maintaining quality and integrity, then it can only enhance your work.

Final Thoughts

ChatGPT is everywhere at the moment, and while it has stirred up a great deal of controversy thanks to its implications for academic integrity, it can be an excellent tool for helping to write research papers. The key is using it correctly and not relying too heavily on it. Focusing on how it can enhance your already-sharp academic writing skills will allow you to save time, find plenty of valuable sources, and organise your paper’s structure more effectively. And that’s how to use ChatGPT to write a research paper!

ChatGPT is not the only AI tool that can help with these parts of the paper-writing puzzle. If you want to listen to academic papers, break them down into digestible pieces and freely take notes on them, then Audemic is waiting for you. Try it for free today!

Keep striving, researchers! ✨


Teaching GPT-4 to write code from research papers

Ethan Steininger

A new research paper on building recommendation systems came out, authored by the ByteDance team: https://arxiv.org/pdf/2209.07663.pdf

I’m no scientist but I love to understand the latest and greatest in research, which is a perfect task for GPT-4.

Prepping the content

Since GPT-4 has a token size limit, we first split the paper into pages:
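(The original code block did not survive the move to this page, so what follows is a rough reconstruction rather than the author’s exact code. It assumes the paper has been downloaded locally and uses PyPDF2 for the page split.)

```python
# Sketch: split the downloaded paper into one PDF per page so that each
# page stays within GPT-4's token limit. PyPDF2 is an assumption here.
from PyPDF2 import PdfReader, PdfWriter

reader = PdfReader("2209.07663.pdf")  # the ByteDance paper, saved locally
num_pages = len(reader.pages)
for i, page in enumerate(reader.pages):
    writer = PdfWriter()
    writer.add_page(page)
    with open(f"page_{i}.pdf", "wb") as f:
        writer.write(f)
```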

Now we create our gpt function to simply accept a prompt as a string and return the response:
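(Again reconstructed, since the original snippet is missing: a thin wrapper over the chat completions endpoint of the pre-1.0 openai library. The model name and API style are assumptions.)

```python
# Sketch: minimal GPT-4 helper in the openai<1.0 API style.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

def gpt(prompt: str) -> str:
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```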

Now we put it together: extract each page’s contents using tika and send the contents as a prompt to ChatGPT with a quick qualifier:
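(Reconstructed as well; the exact qualifier used in the original post is unknown, so the one below is illustrative.)

```python
# Sketch: extract each page's text with tika and ask GPT-4 to write code
# implementing whatever the page describes. The qualifier prompt is
# illustrative, not the original.
from tika import parser

for i in range(num_pages):
    contents = parser.from_file(f"page_{i}.pdf").get("content") or ""
    print(gpt(
        "You are a software engineer. Write Python code implementing the "
        "methods described on this page of a research paper:\n\n" + contents
    ))
```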

Generating the code

If you want all the code together: GitHub

Now, this code obviously isn’t runnable right out of the gate, but it gives you a starting point for implementing something interesting. We can also ask it to summarize the implementation:

Title: “Spectrum-Preserving Generative Adversarial Networks for Anomaly Detection in High Dimensional Data”

Summary: This paper introduces Spectrum-Preserving Generative Adversarial Networks (SP-GANs), a novel approach to anomaly detection in high-dimensional data. The key idea is to preserve the spectral structure of the training data while generating synthetic samples, which allows for improved anomaly detection. SP-GANs consist of a generator and a discriminator, like standard GANs, but with a few important differences. The generator is designed to generate samples in the spectral domain rather than the spatial domain, and the discriminator is trained to distinguish between real and synthetic samples based on their spectral properties. This enables the preservation of the spectral structure of the training data, which is important for accurately capturing the characteristics of normal data and detecting anomalies. The paper presents experimental results on several high-dimensional datasets, including image and sensor data, demonstrating the effectiveness of SP-GANs in anomaly detection tasks. The results show that SP-GANs outperform other state-of-the-art anomaly detection methods, particularly in high-dimensional settings. As a software engineer, you can appreciate the novelty of the SP-GAN architecture and its potential applications in various domains that require effective anomaly detection, such as cybersecurity, fraud detection, and quality control.

Remarkable.


Comparative analysis of GPT-4-based ChatGPT’s diagnostic performance with radiologists using real-world radiology reports of brain tumors

  • Imaging Informatics and Artificial Intelligence
  • Open access
  • Published: 28 August 2024


  • Yasuhito Mitsuyama 1 ,
  • Hiroyuki Tatekawa 1 ,
  • Hirotaka Takita 1 ,
  • Fumi Sasaki 1 ,
  • Akane Tashiro 1 ,
  • Satoshi Oue 1 ,
  • Shannon L. Walston 1 ,
  • Yuta Nonomiya 2 ,
  • Ayumi Shintani 2 ,
  • Yukio Miki 1 &
  • Daiju Ueda   ORCID: orcid.org/0000-0002-3878-3616 1 , 3  

Large language models like GPT-4 have demonstrated potential for diagnosis in radiology. Previous studies investigating this potential primarily utilized quizzes from academic journals. This study aimed to assess the diagnostic capabilities of GPT-4-based Chat Generative Pre-trained Transformer (ChatGPT) using actual clinical radiology reports of brain tumors and compare its performance with that of neuroradiologists and general radiologists.

We collected brain MRI reports written in Japanese from preoperative brain tumor patients at two institutions from January 2017 to December 2021. The MRI reports were translated into English by radiologists. GPT-4 and five radiologists were presented with the same textual findings from the reports and asked to suggest differential and final diagnoses. The pathological diagnosis of the excised tumor served as the ground truth. McNemar’s test and Fisher’s exact test were used for statistical analysis.

In a study analyzing 150 radiological reports, GPT-4 achieved a final diagnostic accuracy of 73%, while radiologists’ accuracy ranged from 65 to 79%. GPT-4’s final diagnostic accuracy using reports from neuroradiologists was higher at 80%, compared to 60% using those from general radiologists. In the realm of differential diagnoses, GPT-4’s accuracy was 94%, while radiologists’ fell between 73 and 89%. Notably, for these differential diagnoses, GPT-4’s accuracy remained consistent whether reports were from neuroradiologists or general radiologists.

GPT-4 exhibited good diagnostic capability, comparable to neuroradiologists in differentiating brain tumors from MRI reports. GPT-4 can be a second opinion for neuroradiologists on final diagnoses and a guidance tool for general radiologists and residents.

Clinical relevance statement

This study evaluated GPT-4-based ChatGPT’s diagnostic capabilities using real-world clinical MRI reports from brain tumor cases, revealing that its accuracy in interpreting brain tumors from MRI findings is competitive with radiologists.

Key points

We investigated the diagnostic accuracy of GPT-4 using real-world clinical MRI reports of brain tumors.

GPT-4 achieved final and differential diagnostic accuracy that is comparable with neuroradiologists.

GPT-4 has the potential to improve the diagnostic process in clinical radiology.


Introduction

The emergence and subsequent advancements of large language models (LLMs) like the GPT series have recently dominated global technology discourse [ 1 ]. These models represent a new frontier in artificial intelligence, using machine learning techniques to process and generate language in a way that rivals human-level complexity and nuance. The rapid evolution and widespread impact of LLMs have become a global phenomenon, prompting discussions on their potential applications and implications [ 2 , 3 , 4 , 5 ]. Moreover, the introduction of chatbots like Chat Generative Pre-trained Transformer (ChatGPT), which use these large language models to generate conversations, has made it easier to utilize these models in a conversational format.

Within the realm of LLMs, the GPT series, in particular, has gained significant attention. Many applications have been explored within the field of radiology [ 6 , 7 , 8 , 9 , 10 , 11 , 12 , 13 , 14 , 15 , 16 , 17 , 18 , 19 , 20 , 21 ]. Among these, the potential of GPT to assist in diagnosis from image findings is noteworthy [ 18 , 19 , 20 ] because such capabilities could complement the essential aspects of daily clinical practice and education. Two studies show the potential of GPT-4 to generate differential diagnosis in the field of neuroradiology [ 19 , 20 ]. One study utilizes the “Case of the Week” from the American Journal of Neuroradiology [ 19 ], and the other study utilizes “Freiburg Neuropathology Case Conference” cases from the Clinical Neuroradiology journal [ 20 ]. Additionally, large language models like GPT-4 have shown differential diagnostic potential in subspecialties beyond the field of neuroradiology [ 6 ].

Although these pioneering investigations suggest that GPT-4 could play an important role in radiological diagnosis, there are no studies reporting evaluation using real-world radiology reports. Unlike quizzes [ 19 , 20 ], which tend to present carefully curated, typical cases and are created by individuals already aware of the correct diagnosis, real-world radiology reports may contain less structured and more diverse information. This difference might lead to biased evaluations that do not reflect the complex nature of clinical radiology [ 22 , 23 ].

To address this gap, our study examines the diagnostic abilities of GPT-4 using only real-world clinical radiology reports. Large language models like GPT-4 are often utilized in various fields through chatbots such as ChatGPT. Therefore, we specifically evaluated the diagnostic capabilities of GPT-4-based ChatGPT in real-world clinical settings to see how effectively it can diagnose medical conditions. We zeroed in on MRI reports pertaining to brain tumors, given the pivotal role radiological reports play in determining treatment routes such as surgery, medication, or monitoring; and that pathological outcomes offer a definitive ground truth for brain tumors [ 24 ]. We compare the performance of GPT-4 with that of neuroradiologists and general radiologists, aiming to provide a more comprehensive view. Through this investigation, we aim to uncover the capabilities and potential limitations of GPT-4 as a diagnostic tool in a real-world clinical setting. In our daily clinical practice, thinking through differential and final diagnoses can be challenging and time-consuming. If GPT-4 can excel in this diagnostic process, it indicates potential value in clinical scenarios.

Materials and methods

Study design

In this retrospective study, GPT-4-based ChatGPT was presented with imaging findings from our real reports and asked to suggest differential and final diagnoses. For a fair comparison, we also presented the same image findings in text form to radiologists and requested differential diagnoses and a final diagnosis. The protocol of this study was reviewed and approved (approval no. 2023-015) by the Ethical Committee of Osaka Metropolitan University Graduate School of Medicine. This study was conducted in accordance with the Declaration of Helsinki. The requirement for informed consent was waived because the radiology reports had been acquired during daily clinical practice. The design of this study is based on the Standards for Reporting of Diagnostic Accuracy Studies (STARD) guideline [ 25 ].

Radiology experts

In this study, three neuroradiologists and four general radiologists were selected. Neuroradiologists were radiologists certified by the Japanese Society of Radiology as specialists in diagnostic imaging, specializing in the central nervous system. General radiologists were defined as radiology residents or radiologists who specialize in areas other than imaging diagnosis. One neuroradiologist and one general radiologist reviewed the collected findings, while the other two neuroradiologists and three general radiologists conducted the reading test.

Data collection

In this study, we consecutively collected brain MRI image findings of preoperative brain tumors from radiological reports taken at Osaka Metropolitan University Hospital (Institution A) from January 2021 to December 2021 and National Hospital Organization Osaka Minami Medical Center (Institution B) from January 2017 to December 2021. These imaging findings were subsequently verified by a neuroradiologist (a board-certified radiologist with 8 years of experience) and a general radiologist (a radiology resident with 4 years of experience). When a diagnosis was described in the imaging findings, it was also removed to avoid data leakage. Any descriptions related to previous imaging findings and unrelated image descriptors (such as ‘Image 1’), were deleted. The report writer (neuroradiologist or general radiologist) was noted.

In- and output procedure for GPT-4-based ChatGPT

All MRI reports were originally written in Japanese and translated into English by a general radiologist (a radiology resident with 4 years of experience). A neuroradiologist (a board-certified radiologist with 8 years of experience) verified that there was no loss of information in the translation. Both radiologists use English in their daily practice. Before each MRI report, the same prompt was processed consecutively in a single conversation. This prompt uses a closed-ended and zero-shot prompting approach. Based on the prompts from previous studies [ 18 , 26 ], our prompt has been modified to specify that the input findings are from head MRI and to request three differential diagnoses ranked in order of likelihood. A neuroradiologist (a board-certified radiologist with 8 years of experience) and a general radiologist (a radiology resident with 4 years of experience) verified that ChatGPT, when given this prompt, ranks three differential diagnoses. We input the following premise into ChatGPT based on the GPT-4 architecture (May 24 version; OpenAI, California, USA; https://chat.openai.com/ ): “List three possible differential diagnoses in order of likelihood from the following head MRI findings.” Then, we input the imaging findings created during clinical practice and received three differential diagnoses from GPT-4. The diagnosis listed highest among the three differential diagnoses was determined to be the final diagnosis. An example of the input to ChatGPT and the output of ChatGPT is shown in Fig.  1 . The information of the report writers was not provided to GPT-4.

Fig. 1 Examples of interface with ChatGPT: input texts to ChatGPT and output texts generated by ChatGPT. The diagnosis listed highest among the three differential diagnoses was determined to be the final diagnosis
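For readers who wish to script this input procedure, a minimal sketch is given below. Note that the study itself used the ChatGPT web interface rather than the API, so the function name, model identifier, and API usage here are illustrative assumptions; only the premise text is quoted from the study.

```python
# Illustrative only: the study used the ChatGPT web interface (May 24
# version), not the API. The premise below is quoted from the paper.
import openai

PREMISE = ("List three possible differential diagnoses in order of "
           "likelihood from the following head MRI findings.")

def differential_diagnoses(findings: str) -> str:
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": PREMISE + "\n\n" + findings}],
    )
    # Per the study's rule, the highest-ranked of the three differential
    # diagnoses is taken as the final diagnosis.
    return response.choices[0].message.content
```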

Radiologist reading test

We provided the same image findings that were input into GPT-4 to two neuroradiologists (A: a board-certified radiologist with 13 years of experience, B: a board-certified radiologist with 8 years of experience) and three general radiologists (C: a radiology resident with 4 years of experience, D: a radiology resident with 3 years of experience, and E: a radiology resident with 2 years of experience). Readers’ years of experience and specialty certification are shown in Table  1 . These two neuroradiologists and three general radiologists were different from the radiologists who verified the image findings during data collection. They read only these text findings and provided three differential diagnoses including one final diagnosis. Two neuroradiologists and three general radiologists were blind to the information of the report writers.

Output evaluation

We utilized the pathological diagnosis of the tumor that was excised in neurosurgery as the ground truth. A neuroradiologist (a board-certified radiologist with 8 years of experience) and a general radiologist (a radiology resident with 4 years of experience) confirmed whether the differential diagnoses and final diagnosis suggested by both the LLM output and the interpretations of the neuroradiologists and general radiologists were aligned with the pathological diagnosis.

Statistical analysis

We computed the accuracy of both the differential and final diagnoses made by GPT-4 and those of the two neuroradiologists and three general radiologists. To compare the diagnostic accuracy of the differential and final diagnoses between GPT-4 and each radiologist, we conducted McNemar’s test. Additionally, we calculated these accuracies separately for when the reporter was a neuroradiologist and when the reporter was a general radiologist, to examine how the quality of input (image findings) affects the diagnoses both by GPT-4 and radiologists. Moreover, Fisher’s exact test was performed to compare the diagnostic accuracy of both GPT-4 and the five radiologists, resulting from the reports by neuroradiologist or general radiologist reporters. p-values less than 0.05 were considered significant. p-values were not corrected for multiple comparisons. These statistical tests were performed using R (version 4.3.1, 2023; R Foundation for Statistical Computing; https://R-project.org). We measured word counts before and after simplifying MRI findings by both reporter and institution. The mean byte count of MRI reports was assessed for each reporter and institution. We calculated Cohen’s kappa coefficient between GPT-4 and each radiologist. We grouped cases based on the number of radiologists (from 0 to 5) who correctly diagnosed each case and analyzed the accuracy of ChatGPT for the cases in each of the six groups.
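For illustration only, the same battery of tests can be expressed in a few lines of Python, although the study itself used R; the correctness arrays and counts below are placeholders, not the study’s data.

```python
# Illustrative only: the paper ran these tests in R 4.3.1. The arrays here
# are placeholders standing in for per-case diagnostic outcomes.
import numpy as np
from scipy.stats import fisher_exact
from statsmodels.stats.contingency_tables import mcnemar
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)
gpt4_correct = rng.random(150) < 0.73    # placeholder: GPT-4 final diagnoses
reader_correct = rng.random(150) < 0.65  # placeholder: one radiologist

# McNemar's test on the paired correct/incorrect calls (GPT-4 vs. one reader)
table = [
    [np.sum(gpt4_correct & reader_correct), np.sum(gpt4_correct & ~reader_correct)],
    [np.sum(~gpt4_correct & reader_correct), np.sum(~gpt4_correct & ~reader_correct)],
]
print("McNemar p =", mcnemar(table).pvalue)

# Fisher's exact test comparing GPT-4 accuracy by report writer
# (placeholder counts: correct/incorrect on neuroradiologist-written vs.
# general-radiologist-written reports)
_, p = fisher_exact([[72, 18], [36, 24]])
print("Fisher p =", p)

# Cohen's kappa between two sets of per-case outcomes
print("kappa =", cohen_kappa_score(gpt4_correct, reader_correct))
```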

Results

A total of 150 radiological reports were included in this research after excluding 96 reports according to the exclusion criteria. A data collection flowchart is shown in Fig. 2. Demographics of brain MRI cases are shown in Table 2. The word count of MRI findings by reporter and institution is shown in Supplementary Appendix Table 1. The average byte count of MRI reports by reporter and institution is shown in Supplementary Appendix Table 2.

Fig. 2 Flowchart of data collection

The accuracy for final and differential diagnoses by GPT-4, Neuroradiologists A, B, and General radiologists C, D, and E are shown in Table  3 and Fig.  3 . In the final diagnosis, GPT-4 demonstrated diagnostic accuracy comparable to those of neuroradiologists and general radiologists. The accuracy rates were as follows: GPT-4: 73% (95% CI: 65, 79%), Neuroradiologist A: 65% (95% CI: 57, 72%), Neuroradiologist B: 79% (95% CI: 72, 85%), General radiologist C: 65% (95% CI: 57, 72%), General radiologist D: 73% (95% CI: 66, 80%), and General radiologist E: 65% (95% CI: 57, 72%). In differential diagnoses, GPT-4 showed diagnostic accuracy that surpassed those of both neuroradiologists and general radiologists. The accuracy rates were as follows: GPT-4: 94% (95% CI: 89, 97%), Neuroradiologist A: 87% (95% CI: 80, 91%), Neuroradiologist B: 89% (95% CI: 83, 93%), General radiologist C: 76% (95% CI: 69, 82%), General radiologist D: 83% (95% CI: 77, 88%), and General radiologist E: 73% (95% CI: 66, 80%).

Fig. 3 Accuracy of GPT-4 and radiologists. The point plots with 95% confidence intervals represent the accuracy of GPT-4 and radiologists for the final and differential diagnoses, respectively. The blue, orange, and green plots indicate the accuracy of total report reading, neuroradiologist-written report reading, and general radiologist-written report reading, respectively

The accuracy per reporter for final and differential diagnoses by GPT-4, the two neuroradiologists, and the three general radiologists are shown in Table  4 and Fig.  3 . In the final diagnosis, GPT-4 showed a statistically significant difference in diagnostic accuracy when using reports created by neuroradiologists and general radiologists. The accuracy rates were as follows: Neuroradiologist’s report: 80% (95% CI: 71, 87%), General radiologist’s report: 60% (95% CI: 47, 72%), p -value: 0.013. The Cohen’s Kappa scores presented in Table  5 indicate varying levels of agreement among different radiologists. Neuroradiologists A and B showed a higher agreement with each other (0.46) compared to their agreement rates with general radiologists, which ranged from 0.36 to 0.43. Among the general radiologists (C, D, E), the highest agreement was seen between radiologists C and D (0.49), and C and E (0.46), indicating that general radiologists also tend to have higher agreement rates with each other than with neuroradiologists. The accuracy of GPT-4, ranked by the number of correct responses from the five radiologists, is shown in Supplementary Appendix Table  3 .

Discussion

GPT-4 and five radiologists were presented with preoperative brain MRI findings from 150 cases and asked to list differential and final diagnoses. For final diagnoses, GPT-4’s accuracy was 73% (95% CI: 65, 79%). In comparison, neuroradiologists A through general radiologist E had accuracies of 65% (95% CI: 57, 72%), 79% (95% CI: 72, 85%), 65% (95% CI: 57, 72%), 73% (95% CI: 66, 80%), and 65% (95% CI: 57, 72%), respectively. For differential diagnoses, GPT-4 achieved 94% (95% CI: 89, 97%) accuracy, while the radiologists’ accuracies ranged from 73% (95% CI: 66, 80%) to 89% (95% CI: 83, 93%). In the final diagnoses, GPT-4 showed an accuracy of 80% (95% CI: 71, 87%) with reports from neuroradiologists, compared to 60% (95% CI: 47, 72%) with those from general radiologists, a statistically significant difference (p-value: 0.013). On the other hand, GPT-4’s differential diagnostic accuracy was 95% (95% CI: 88, 98%) with reports from neuroradiologists and 93% (95% CI: 83, 97%) with reports from general radiologists, not a statistically significant difference (p-value: 0.73). Cohen’s Kappa scores indicated an overall fair to moderate agreement rate. This suggests that even among neuroradiologists, there may have been many tasks prone to diagnostic disagreements. Additionally, it showed slightly higher agreement rates among physicians of the same specialty. That is, neuroradiologists had a higher agreement rate among themselves than with general radiologists, and general radiologists had a higher agreement rate among themselves than with neuroradiologists.

This study is the first attempt to evaluate GPT-4’s ability to interpret actual clinical radiology reports, rather than from settings like image diagnosis quizzes. The majority of previous research [ 6 , 7 , 8 , 9 , 10 , 11 , 12 , 17 , 18 , 19 , 20 , 21 ] suggested the utility of GPT-4 in diagnostics, but these relied heavily on hypothetical environments such as quizzes from academic journals or examination questions [ 27 ]. This approach can lead to a cognitive bias since the individuals formulating the imaging findings or exam questions also possess the answers. In these simulated scenarios, there’s also a propensity to leave out minor findings. Such minor findings, while often deemed insignificant in an experimental setup, are frequently encountered in real-world clinical practice and can have implications for diagnosis. In contrast, our study deviates from this previous methodology by using actual clinical findings, generated in a state of diagnostic uncertainty. This approach facilitates a more robust and practical evaluation of GPT-4’s accuracy, keeping in mind its potential applications in real-world clinical settings.

The diagnostic accuracy of GPT-4 varied depending on whether the input report was written by a neuroradiologist or a general radiologist. Specifically, for the final diagnosis, using reports from the neuroradiologists yielded higher accuracy than using those from general radiologists. However, for differential diagnoses, there was no difference in accuracy, regardless of whether the report was from a neuroradiologist or a general radiologist. Neuroradiologists, due to their experience and specialized knowledge, are more likely to include comprehensive, detailed information necessary for a final diagnosis in their reports [ 28 , 29 , 30 ]. Such high-quality reports likely enhanced GPT-4’s accuracy for final diagnoses. Conversely, GPT-4 was able to provide accurate differential diagnoses even with the general radiologists’ reports, because these reports still capture certain information crucial for a diagnosis. From these findings, a beneficial application of GPT-4 in clinical and educational settings is for neuroradiologists to use it as a second opinion to assist with final and differential diagnoses. For general radiologists, GPT-4 can be particularly useful for understanding diagnostic cues and learning about differential diagnoses, which can sometimes be time-consuming. When general radiologists encounter complex or unfamiliar cases, consulting GPT-4 could guide their diagnostic direction. Of course, any advice or suggestions from GPT-4 should be considered as just one of many references. General radiologists should prioritize consultation with experts when determining the final diagnosis. In this paper, we compared the diagnostic capabilities from radiologist report texts between GPT-4 and radiologists themselves, and found that generic LLMs have significant potential as diagnostic decision support systems in radiology. If this potential were incorporated into a standard workflow, missed findings could be reduced by consulting the ChatGPT output. This is a valuable future research opportunity.

There are several limitations. This study only used the wording of actual clinical radiology reports and did not evaluate the effect of including other information such as patient history and the image itself, meaning the radiologists’ performance might not match their real-world diagnostic abilities. Furthermore, recent advancements in large language models have enabled the input of not only text but also images. Evaluating the performance of large language models that combine both radiology report texts and images could provide deeper insights into their potential usefulness in radiology diagnostics. Among the two institutions where MRI reports were collected, institution A and the five radiologist readers (neuroradiologists A and B, general radiologists C, D, and E) were from the same institution, which could result in bias due to familiarity with the report style and writing. We have only evaluated the diagnostic performance of GPT-4 in a single language and would like to see it evaluated in multiple language reports. We did not assess MRI reports for diseases other than brain tumors.

Conclusion

GPT-4 showcased strong diagnostic ability, demonstrating performance comparable to that of neuroradiologists in the task of diagnosing brain tumors from MRI reports. The implications of these findings are far-reaching, suggesting potential real-world utility, particularly in the generation of differential diagnoses for general radiologists in a clinical setting. The encouraging results of this study invite further evaluations of the LLM’s accuracy across a myriad of medical fields and imaging modalities. The end goal of such exploration is to pave the way for the development of more versatile, reliable, and powerful tools for healthcare.

Abbreviations

ChatGPT: Chat Generative Pre-trained Transformer

LLMs: Large language models

References

OpenAI (2023) GPT-4 technical report. Preprint at https://doi.org/10.48550/arXiv.2303.08774

Bubeck S, Chandrasekaran V, Eldan R et al (2023) Sparks of artificial general intelligence: early experiments with GPT-4. Preprint at https://doi.org/10.48550/arXiv.2303.12712

Ueda D, Walston SL, Matsumoto T et al (2024) Evaluating GPT-4-based ChatGPT's clinical potential on the NEJM quiz. BMC Digit Health 2:4

Eloundou T, Manning S, Mishkin P, Rock D (2023) GPTs are GPTs: an early look at the labor market impact potential of large language models. Preprint at https://doi.org/10.48550/arXiv.2303.10130

Brown TB, Mann B, Ryder N et al (2020) Language models are few-shot learners. Preprint at https://doi.org/10.48550/arXiv.2005.14165

Kottlors J, Bratke G, Rauen P et al (2023) Feasibility of differential diagnosis based on imaging patterns using a large language model. Radiology 308:e231167


Haver HL, Ambinder EB, Bahl M et al (2023) Appropriateness of breast cancer prevention and screening recommendations provided by ChatGPT. Radiology 307:e230424

Rao A, Kim J, Kamineni M et al (2023) Evaluating GPT as an adjunct for radiologic decision making: GPT-4 versus GPT-3.5 in a breast imaging pilot. J Am Coll Radiol 20:990–997

Gertz RJ, Bunck AC, Lennartz S et al (2023) GPT-4 for automated determination of radiological study and protocol based on radiology request forms: a feasibility study. Radiology 307:e230877

Sun Z, Ong H, Kennedy P et al (2023) Evaluating GPT-4 on impressions generation in radiology reports. Radiology 307:e231259

Mallio CA, Sertorio AC, Bernetti C, Beomonte Zobel B (2023) Large language models for structured reporting in radiology: performance of GPT-4, ChatGPT-3.5, Perplexity and Bing. Radiol Med 128:808–812

Li H, Moon JT, Iyer D et al (2023) Decoding radiology reports: potential application of OpenAI ChatGPT to enhance patient understanding of diagnostic reports. Clin Imaging 101:137–141

Ariyaratne S, Iyengar KP, Nischal N et al (2023) A comparison of ChatGPT-generated articles with human-written articles. Skeletal Radiol 52:1755–1758

McCarthy CJ, Berkowitz S, Ramalingam V, Ahmed M (2023) Evaluation of an artificial intelligence chatbot for delivery of interventional radiology patient education material: a comparison with societal website content. J Vasc Interv Radiol 34:1760–1768.E32

Bhayana R, Krishna S, Bleakney RR (2023) Performance of ChatGPT on a radiology board-style examination: insights into current strengths and limitations. Radiology 307:e230582

Rau A, Rau S, Zoeller D et al (2023) A context-based chatbot surpasses trained radiologists and generic ChatGPT in following the ACR appropriateness guidelines. Radiology 308:e230970

Ray PP (2023) The need to re-evaluate the role of GPT-4 in generating radiology reports. Radiology 308:e231696

Ueda D, Mitsuyama Y, Takita H et al (2023) ChatGPT’s diagnostic performance from patient history and imaging findings on the Diagnosis Please quizzes. Radiology 308:e231040

Suthar PP, Kounsal A, Chhetri L et al (2023) Artificial intelligence (AI) in radiology: a deep dive into ChatGPT 4.0’s accuracy with the American Journal of Neuroradiology’s (AJNR) “Case of the Month.” Cureus 15:e43958

Horiuchi D, Tatekawa H, Oura T et al (2024) Comparing the Diagnostic Performance of GPT-4-based ChatGPT, GPT-4V-based ChatGPT, and Radiologists in Challenging Neuroradiology Cases. Clin Neuroradiol. https://doi.org/10.1007/s00062-024-01426-y

Nakaura T, Yoshida N, Kobayashi N et al (2023) Preliminary assessment of automated radiology report generation with generative pre-trained transformers: comparing results to radiologist-generated reports. Jpn J Radiol 42:190–200

Gray BR, Mutz JM, Gunderman RB (2020) Radiology as personal knowledge. AJR Am J Roentgenol 214:237–238

Medina LS, Blackmore CC (2007) Evidence-based radiology: review and dissemination. Radiology 244:331–336

Gao H, Jiang X (2013) Progress on the diagnosis and evaluation of brain tumors. Cancer Imaging 13:466–481


Bossuyt PM, Reitsma JB, Bruns DE et al (2015) STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies. Radiology 277:826–832

Li D, Gupta K, Chong J (2023) Evaluating diagnostic performance of ChatGPT in radiology: delving into methods. Radiology 308:e232082

Ueda D, Kakinuma T, Fujita S et al (2023) Fairness of artificial intelligence in healthcare: review and recommendations. Jpn J Radiol 42:3–15

Wang W, van Heerden J, Tacey MA, Gaillard F (2017) Neuroradiologists compared with non-neuroradiologists in the detection of new multiple sclerosis plaques. AJNR Am J Neuroradiol 38:1323–1327


Zan E, Yousem DM, Carone M, Lewin JS (2010) Second-opinion consultations in neuroradiology. Radiology 255:135–141

Briggs GM, Flynn PA, Worthington M et al (2008) The role of specialist neuroradiology second opinion reporting: is there added value? Clin Radiol 63:791–795



Acknowledgements

We are grateful to the National Hospital Organization Osaka Minami Medical Center for providing the data for this study.

Funding

The authors state that this work has not received any funding.

Author information

Authors and Affiliations

Department of Diagnostic and Interventional Radiology, Graduate School of Medicine, Osaka Metropolitan University, 1-4-3 Asahi-machi, Abeno-ku, Osaka, 545-8585, Japan

Yasuhito Mitsuyama, Hiroyuki Tatekawa, Hirotaka Takita, Fumi Sasaki, Akane Tashiro, Satoshi Oue, Shannon L. Walston, Yukio Miki & Daiju Ueda

Department of Medical Statistics, Graduate School of Medicine, Osaka Metropolitan University, 1-4-3 Asahi-machi, Abeno-ku, Osaka, 545-8585, Japan

Yuta Nonomiya & Ayumi Shintani

Center for Health Science Innovation, Osaka Metropolitan University, 1-4-3, Asahi-machi, Abeno-ku, Osaka, 545-8585, Japan

Daiju Ueda

Corresponding author

Correspondence to Daiju Ueda .

Ethics declarations

Guarantor

The scientific guarantor of this publication is D.U.

Conflict of interest

There is no conflict of interest.

Statistics and biometry

No complex statistical methods were necessary for this paper.

Informed consent

Written informed consent was waived by the Institutional Review Board.

Ethical approval

Institutional Review Board approval was obtained. This work was previously available as a preprint: https://doi.org/10.1101/2023.10.27.23297585. We have used ChatGPT to generate a portion of the manuscript, but the output was confirmed by the authors.

Study subjects or cohorts overlap

There is no overlap in this study cohort.

Methodology

Retrospective

Observational

Performed at two institutions

Additional information

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Electronic supplementary material

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Mitsuyama, Y., Tatekawa, H., Takita, H. et al. Comparative analysis of GPT-4-based ChatGPT’s diagnostic performance with radiologists using real-world radiology reports of brain tumors. Eur Radiol (2024). https://doi.org/10.1007/s00330-024-11032-8


Received: 15 December 2023

Revised: 08 July 2024

Accepted: 06 August 2024

Published: 28 August 2024

DOI: https://doi.org/10.1007/s00330-024-11032-8


Keywords

  • Artificial intelligence
  • Natural language processing
  • Magnetic resonance imaging
  • Brain tumor


How to Write a Paper with ChatGPT | Tips & Examples

Published on 17 August 2023 by Koen Driessen.

A good research paper demonstrates your academic writing skills and knowledge of your chosen topic.

Your research paper should be based on in-depth independent research. However, generative AI tools like ChatGPT can be effectively used throughout the research process to:

  • Brainstorm research questions
  • Develop a methodology
  • Create an outline
  • Find sources
  • Summarise and paraphrase text
  • Provide feedback


Table of contents

  • Brainstorming research questions
  • Developing a methodology
  • Creating an outline
  • Finding sources
  • Summarising or paraphrasing text
  • Receiving feedback
  • Other interesting articles
  • Frequently asked questions

Brainstorming research questions

You can use ChatGPT to help formulate research questions, brainstorm potential thesis statements, or narrow down a broad subject area to a specific topic that interests you.

However, make sure that the outputs make sense and meet the requirements of your assignment. You can adapt these outputs manually, or use further prompts to ensure they meet your needs. For more information, see our tips for creating effective ChatGPT prompts.

  • Generate three possible research questions for a research paper on the following topic: “The role of virtual reality in enhancing training and skill development in various industries.”
  • “What are the cognitive and skill development benefits of virtual reality-based training compared to traditional training methods across different industries?”
  • “How does the level of immersion in virtual reality training impact knowledge retention and skill transfer in fields such as medicine, aviation, and manufacturing?”
  • “What are the key factors that influence the successful integration and adoption of virtual reality technology for training purposes in diverse industries, and what challenges hinder its widespread implementation?”


Developing a methodology

Your paper should clearly indicate the data collection and analysis methods you used in your research.

You can use ChatGPT to help decide what kind of research you will conduct (e.g., qualitative or quantitative), how you will analyse the data, and how you will mitigate potential research bias.

Bear in mind that you should be able to rationalise your chosen methodology. For example, you should ensure that it is either a standard methodology in the field or can be justified.

  • Suggest three quantitative research methods appropriate to the following question: “What are the cognitive and skill development benefits of virtual reality-based training compared to traditional training methods in the field of healthcare?”
  • Experimental Study: Conduct a randomized controlled trial (RCT) where healthcare professionals are randomly assigned to either a virtual reality-based training group or a traditional training group. Both groups would receive training in specific healthcare tasks, and their cognitive and skill development outcomes would be measured and compared. Performance metrics, such as accuracy, speed, and efficiency, could be quantitatively assessed to determine the benefits of virtual reality-based training over traditional methods.
  • Survey Research: Design a comprehensive survey targeting healthcare professionals who have undergone both virtual reality-based training and traditional training. The survey could include Likert-scale questions and structured response options to gauge their perceived cognitive and skill development benefits from each training method. Additionally, open-ended questions could be included to gather qualitative insights on specific strengths and weaknesses of each training approach.
  • Task Performance Analysis: Implement a task-based assessment that simulates real-life healthcare scenarios, where participants are evaluated on their ability to handle critical situations. Participants would go through both virtual reality-based training and traditional training before being assessed in the simulated scenarios. Their performance in the assessment could be objectively measured using predefined criteria, allowing for a quantitative comparison of skill development between the two training methods.

Creating an outline

A clear outline can help structure your paper, organise your arguments effectively, and make your research more efficient.

You can use generative AI tools like ChatGPT to formulate a potential outline to help guide your research. You can be as specific as you like, giving an overview of your research objectives, main arguments, and any evidence you intend to include in your paper.

However, ChatGPT outputs sometimes contain factual inaccuracies. It’s important to check the headings and subheadings suggested in the output and adapt them to meet your needs.

  • Create an outline for an academic research paper with the following research question: “What are the cognitive and skill development benefits of virtual reality-based training compared to traditional training methods in the field of healthcare?” The paper will be about 10 pages long.

B. Existing studies on cognitive and skill development benefits of virtual reality training

C. Comparison of virtual reality-based training with traditional training methods

D. Gaps in the literature and research limitations

III. Theoretical Framework

A. Theoretical basis for cognitive and skill development in training

B. Application of cognitive theories to virtual reality-based training

C. The role of immersive learning environments in skill acquisition

D. Conceptual framework for the study

IV. Research Methodology

A. Research design and approach (e.g., quasi-experimental, pre-post intervention)

B. Selection of participants and sample size calculation

C. Description of the virtual reality-based training and traditional training programs

D. Instruments and measures for assessing cognitive and skill development

E. Data collection procedures and ethical considerations

V. Data Analysis

A. Overview of statistical methods used for data analysis

B. Treatment of data and data cleaning

C. Quantitative analysis of cognitive and skill development outcomes

D. Presentation of findings with charts and tables

VI. Results

A. Presentation of the results of the cognitive and skill development assessments

B. Comparison of outcomes between virtual reality-based training and traditional training

C. Statistical significance and effect size analysis

D. Interpretation of the results and implications

VII. Discussion

A. Interpretation of findings in the context of the research question

B. Comparison of results with existing literature

C. Explanation of how virtual reality-based training impacts cognitive processes and skill development

D. Limitations of the study and areas for future research

VIII. Practical Implications

A. Practical applications of virtual reality-based training in healthcare settings

B. Recommendations for integrating virtual reality training in healthcare education

C. Benefits of virtual reality for continuous professional development in healthcare

IX. Conclusion

A. Summary of key findings

B. Reiteration of the research question and hypothesis

C. Contributions of the study to the field of healthcare training

D. Concluding remarks

X. References

Finding sources

You can use ChatGPT to help find sources relevant to your research. However, it’s important to be aware of the limitations of ChatGPT (for example, it occasionally cites nonexistent sources).

Instead of asking ChatGPT for specific source recommendations, try asking it for recommendations on the types of sources relevant to your research topic. You can also use it to generate a list of relevant keywords to use in your literature search.

  • What types of sources could I use to write a research paper on the following research question? “What are the cognitive and skill development benefits of virtual reality-based training compared to traditional training methods in the field of healthcare?”
  • Academic Journals: Peer-reviewed journals are valuable sources of scholarly research. Look for articles in medical education, cognitive psychology, virtual reality applications in healthcare, and technology-enhanced learning.
  • Research Studies: Seek out published research studies that specifically investigate the impact of virtual reality-based training on cognitive and skill development in healthcare professionals.
  • Systematic Reviews and Meta-Analyses: These studies provide comprehensive overviews of existing research on a specific topic, making them a valuable source for summarizing and analyzing the overall evidence.
  • Books and Book Chapters: Academic books and book chapters authored by experts in the fields of medical education, virtual reality, cognitive science, or instructional design can provide in-depth insights and theoretical foundations.
  • Conference Proceedings: Proceedings from relevant conferences and symposiums may present the latest findings and advancements in virtual reality-based training and healthcare education.
  • Educational Institution Websites: Universities and medical schools often publish research papers, case studies, and whitepapers related to medical education and virtual reality training.
  • Industry Reports: Reports and whitepapers from healthcare organizations or technology companies exploring the benefits of virtual reality-based training in the healthcare sector.
  • Government and Non-profit Organisations: Look for reports and publications from government agencies and non-profit organisations that focus on medical education and healthcare training.
  • Online Databases: Utilize academic databases such as PubMed, Google Scholar, IEEE Xplore, and ScienceDirect to search for peer-reviewed research articles and conference papers.
  • Expert Interviews: If possible, conduct interviews with experts in medical education, cognitive science, and virtual reality to gain valuable insights and perspectives.
  • Surveys and Polls: Refer to reputable surveys and polls related to virtual reality-based training in the healthcare industry for additional statistical data.

You can use ChatGPT to paraphrase or summarise text. This can help you to condense sources to their most important points and explore new ways of expressing your ideas.

Alternatively, you can use the more specialised tools featured on Scribbr’s AI writing resources page, which are designed specifically for these purposes and provide a smoother user experience.

  • Provide a short summary of the following text: “One of the most profound applications of VR in healthcare is its use in surgical training. Traditionally, surgical training has relied on a combination of didactic lectures, observation of procedures, and hands-on experience in the operating room. However, these methods may not provide the ideal environment for novices to develop the necessary dexterity and spatial understanding essential for complex surgeries. VR overcomes these limitations by offering immersive, true-to-life simulations that allow surgeons to navigate through intricate anatomical structures, practice incisions, and learn to manage unexpected challenges that may arise during surgeries. This hands-on experience in a risk-free environment not only enhances technical skills but also builds confidence and reduces the learning curve for new surgical techniques.”
  • Virtual Reality (VR) has a profound impact on healthcare, particularly in surgical training, as it provides immersive simulations that help surgeons develop dexterity, spatial understanding, and confidence for complex surgeries in a risk-free environment, ultimately enhancing their technical skills and reducing the learning curve.
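If you need to condense many sources, you might wrap the summarisation prompt in a small helper function. This is a hedged sketch under the same assumptions as above (the `openai` v1.x package and the `gpt-4` model name); the word limit is an illustrative parameter, not something this article prescribes.

```python
# Minimal sketch: a reusable summarisation helper. Assumes the `openai`
# v1.x package and an OPENAI_API_KEY environment variable; the model
# name and word limit are illustrative choices.
from openai import OpenAI

client = OpenAI()

def summarise(text: str, max_words: int = 60) -> str:
    """Ask the model for a short summary of the given text."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": (
                f"Provide a summary of the following text in at most "
                f"{max_words} words:\n\n{text}"
            ),
        }],
    )
    return response.choices[0].message.content

passage = "One of the most profound applications of VR in healthcare is ..."
print(summarise(passage))
```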

When you’ve finished writing your research paper, you can use ChatGPT to receive feedback. You can be as specific as you like, selecting particular aspects the output should focus on (e.g., tone, clarity of structure, appropriateness of evidence to support your arguments).

You can also use ChatGPT to check grammar, spelling, and punctuation. However, it’s not designed for this purpose and occasionally misses errors. We recommend using a more specialised tool like Scribbr’s free grammar checker or, for more comprehensive feedback, Scribbr’s proofreading and editing service.

  • Give feedback on the above text with regard to appropriateness of evidence and coherence of structure.
  • The research paper demonstrates a well-structured and coherent presentation of the cognitive and skill development benefits of VR-based surgical training. The introduction effectively sets the context and rationale, while the literature review incorporates relevant historical and empirical evidence. The theoretical framework strengthens the foundation for the analysis. The methodology section could provide more details on the research design and study selection criteria. However, the sections on cognitive and skill development benefits present compelling evidence, supported by appropriate sub-sections. The exploration of improved patient outcomes lacks concrete examples but effectively outlines potential benefits. The section on challenges and future directions raises relevant points, but suggestions for solutions could enhance practicality. Overall, the paper combines theoretical, empirical, and practical evidence, offering a well-rounded approach. It establishes the significance of VR-based surgical training and concludes with a concise summary of key findings.
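Feedback requests can also be scripted, which makes it easy to rerun the same review criteria across successive drafts. The sketch below is again a minimal, illustrative example under the same assumptions (the `openai` v1.x package, the `gpt-4` model name); the system message and the focus-area parameter are hypothetical choices, not something this article specifies.

```python
# Minimal sketch: requesting targeted feedback on a draft. Assumes the
# `openai` v1.x package and an OPENAI_API_KEY environment variable; the
# model name, system message, and focus areas are illustrative.
from openai import OpenAI

client = OpenAI()

def get_feedback(draft: str, focus: list[str]) -> str:
    """Request feedback on a draft, limited to the listed aspects."""
    aspects = ", ".join(focus)
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You are a constructive reviewer of academic writing."},
            {"role": "user",
             "content": (
                 f"Give feedback on the following draft with regard to "
                 f"{aspects}:\n\n{draft}"
             )},
        ],
    )
    return response.choices[0].message.content

print(get_feedback("...your draft text...",
                   ["appropriateness of evidence", "coherence of structure"]))
```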

If you want more tips on using AI tools, understanding plagiarism, and citing sources, make sure to check out some of our other articles with explanations, examples, and formats.

  • Citing ChatGPT
  • Best grammar checker
  • Best paraphrasing tool
  • ChatGPT in your studies
  • Is ChatGPT trustworthy?
  • Types of plagiarism
  • Self-plagiarism
  • Avoiding plagiarism
  • Academic integrity
  • Best plagiarism checker

Citing sources

  • Citation styles
  • In-text citation
  • Citation examples
  • Annotated bibliography

Yes, you can use ChatGPT to summarise text. This can help you understand complex information more easily, summarise the central argument of your own paper, or clarify your research question.

You can also use Scribbr’s free text summariser, which is designed specifically for this purpose.

Yes, you can use ChatGPT to paraphrase text to help you express your ideas more clearly, explore different ways of phrasing your arguments, and avoid repetition.

However, it’s not specifically designed for this purpose. We recommend using a specialised tool like Scribbr’s free paraphrasing tool, which will provide a smoother user experience.

No, having ChatGPT write your college essay can negatively impact your application in numerous ways. ChatGPT outputs are unoriginal and lack personal insight.

Furthermore, passing off AI-generated text as your own work is considered academically dishonest. AI detectors may be used to detect this offence, and it’s highly unlikely that any university will accept you if you are caught submitting an AI-generated admission essay.

However, you can use ChatGPT to help write your college essay during the preparation and revision stages (e.g., for brainstorming ideas and generating feedback).

Cite this Scribbr article

If you want to cite this source, you can copy and paste the citation below or add it to our free Reference Generator.

Driessen, K. (2023, August 17). How to Write a Paper with ChatGPT | Tips & Examples. Scribbr. Retrieved 26 August 2024, from https://www.scribbr.co.uk/using-ai-tools/chatgpt-paper/

