Business Research: Methods, Types & Examples


Content Index

  • Business research: Definition
  • Quantitative research methods
  • Qualitative research methods
  • Advantages of business research
  • Disadvantages of business research
  • Importance of business research

Business research is a process of acquiring detailed information on all the areas of business and using such information to maximize the sales and profit of the business. Such a study helps companies determine which product/service is most profitable or in demand. In simple words, it can be stated as the acquisition of information or knowledge for professional or commercial purposes to determine opportunities and goals for a business.

Business research can be done for anything and everything. In general, when people speak about business research design, they mean asking research questions to learn where money can be spent to increase sales, profits, or market share. Such research is critical for making wise and informed decisions.

LEARN ABOUT: Research Process Steps

For example: A mobile phone company wants to launch a new model in the market but does not know which dimensions are most in demand. The company therefore conducts business research using various methods to gather information, evaluates it, and draws conclusions about which dimensions buyers want most.

This enables the company to position the phone at the right price in the market and acquire a larger market share.

LEARN ABOUT:  Test Market Demand

Business research: Types and methodologies

Business research is part of the business intelligence process. It is usually conducted to determine whether a company can succeed in a new region, to understand its competitors, or simply to select a marketing approach for a product. This research can be carried out using qualitative research methods or quantitative research methods.

Quantitative research methods

Quantitative research methods deal with numbers: systematic, empirical investigation using statistical, mathematical, or computational techniques. Such methods usually start with data collection and then proceed to statistical analysis. The following are some of the quantitative methods used to carry out business research.

LEARN ABOUT: Data Management Framework

Survey research

Survey research is one of the most widely used methods of gathering data, especially for business research. Surveys involve asking questions of a target audience through channels such as online polls, online surveys, and questionnaires. Most major corporations now use this method to gather data, understand the market, and make appropriate business decisions.

Various types of surveys are used: cross-sectional studies, which collect data from a set of respondents at a single point in time, and longitudinal surveys, which collect data from the same audience across multiple time periods in order to understand changes in respondents' behavior. With advances in technology, surveys can now be sent online through email or social media.

For example: A company wants to know the NPS (Net Promoter Score) for its website, i.e., how satisfied the people visiting it are. An increase in traffic, or visitors spending more time on the site, can result in higher search engine rankings, which brings the company more leads and greater visibility.

The company can therefore ask website visitors a few questions through an online survey to understand their opinions, gather feedback, and make appropriate changes to the website to increase satisfaction.
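As a rough illustration of the arithmetic behind such a score (the data and function below are hypothetical), NPS is computed from 0-10 "How likely are you to recommend us?" ratings as the percentage of promoters (9-10) minus the percentage of detractors (0-6):

    def net_promoter_score(ratings):
        """NPS from 0-10 'likelihood to recommend' ratings.

        Promoters score 9-10, detractors 0-6; NPS = %promoters - %detractors.
        """
        if not ratings:
            raise ValueError("no ratings supplied")
        promoters = sum(1 for r in ratings if r >= 9)
        detractors = sum(1 for r in ratings if r <= 6)
        return 100 * (promoters - detractors) / len(ratings)

    # Hypothetical responses collected from the website survey
    sample = [10, 9, 8, 7, 9, 3, 10, 6, 9, 5]
    print(f"NPS: {net_promoter_score(sample):.0f}")  # prints "NPS: 20" for this sample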

Learn More:  Business Survey Template

Correlational research

Correlational research is conducted to understand the relationship between two entities and the impact each has on the other. Using mathematical analysis methods, correlational research enables the researcher to correlate two or more variables.

Such research can help reveal patterns, relationships, and trends. However, the variables are only measured, not manipulated, so a cause-and-effect conclusion cannot be drawn on the basis of correlational research alone.

For example: Research can be conducted to understand the relationship between color preferences and gender. Using such research and identifying the target audience, a company can choose which product colors to release in the market. This can help the company understand the supply and demand for its products.
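As a brief sketch of the analysis side (the variables and figures below are hypothetical, not drawn from the example above), the strength of such a relationship is typically summarized with a correlation coefficient:

    import numpy as np

    # Hypothetical paired observations: monthly ad spend (in $1,000s) vs. units sold
    ad_spend = np.array([1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
    units_sold = np.array([120, 135, 160, 158, 190, 205, 230])

    # Pearson r ranges from -1 to +1; values near 0 indicate no linear relationship
    r = np.corrcoef(ad_spend, units_sold)[0, 1]
    print(f"Pearson r = {r:.2f}")

A high coefficient points to a pattern worth investigating further, but, as noted above, it is not enough on its own to establish cause and effect.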

Causal-Comparative research

Causal-comparative research is a method based on comparison. It is used to deduce cause-and-effect relationships between variables. Sometimes also known as quasi-experimental research, it involves establishing an independent variable and analyzing its effects on a dependent variable.

In such research, the variables are not manipulated; instead, naturally occurring differences between groups are observed and compared. Drawing conclusions from such research is a little tricky, as factors other than the independent variable may differ between the groups, so all other parameters have to be taken into consideration before drawing any inferences.

LEARN ABOUT: Causal Research

For example: Research can be conducted to analyze the effect of good educational facilities in rural areas by comparing a group of people from a rural area before and after such facilities are provided.

Another example is analyzing how building a dam affects the farmers and crop production in the surrounding area.

LEARN ABOUT: Market research trends

Experimental research

Experimental research is based on testing a theory. It can be useful in business research because it reveals behavioral traits of consumers, which can lead to more revenue. In this method, an experiment is carried out on a set of participants to observe and later analyze their behavior when influenced by certain parameters.

LEARN ABOUT: Behavioral Targeting

For example: Experimental research was conducted recently to understand whether particular colors have an effect on consumers' hunger. A set of participants was exposed to those colors while eating and observed. It was found that certain colors, such as red or yellow, increase hunger.

Such research has been a boon to the hospitality industry: many food chains, such as McDonald's and KFC, use these colors in their interiors, branding, and packaging.

Another widely applied inference drawn from experimental research, used by bars and pubs across the world, is that loud music makes people drink more in less time. This was proven through experimental research and was a key finding for many business owners across the globe.

Online research / Literature research

Literature research is one of the oldest methods available. It is very economical, and a lot of information can be gathered through it. Online research, or literature research, involves gathering information from existing documents and studies, which may be available in libraries, annual reports, and similar sources.

Nowadays, with advances in technology, such research has become even simpler and more accessible. An individual can research any needed information directly online and obtain in-depth information about a topic or an organization.

Such research is used mostly by marketing and sales people in the business sector to understand the market and their customers. It is carried out using existing information available from various sources; however, care has to be taken to validate the sources from which the information is collected.

For example, a salesperson hears that a particular firm is looking for a solution that their company provides. The salesperson first identifies a decision maker at that firm, finds out which department they belong to, and researches what the target company is looking for and what it does.

Using this research, the salesperson can tailor the pitch to be spot on for this client, and can also find a means of reaching out to the decision maker directly by researching online.

LEARN ABOUT: 12 Best Tools for Researchers

Qualitative research methods

Qualitative research is highly important in business research. It involves obtaining data through open-ended, conversational means of communication. Such research enables the researcher to understand not only what the audience thinks but also why they think it.

In such research, in-depth information can be gathered from the subjects depending on their responses. There are various types of qualitative research methods, such as interviews, focus groups, ethnographic research, content analysis, and case study research, that are widely used.

Such methods are of very high importance in business research as they enable the researcher to understand the consumer. What motivates the consumer to buy and what does not is what will lead to higher sales, and that is the prime objective for any business.

Following are a few methods that are widely used in today’s world by most businesses.

Interviews

Interviews are somewhat similar to surveys in that they may use the same types of questions. The difference is that the respondent can answer open-ended questions at length, and the direction of the conversation, or the questions being asked, can change depending on the respondent's answers.

Such a method usually gives the researcher detailed information about the subject's perspective and opinions. Interviewing subject matter experts can also yield information critical to some businesses.

For example: A telecom manufacturer interviewed a group of women to understand why it had fewer female customers. The interviews revealed that some models were not available in colors the women found appealing, so they preferred not to purchase them.

Such information can be critical to a business like a telecom manufacturer: it can increase its market share by targeting women customers and launching those colors in the market.

Another example would be interviewing a subject matter expert in social media marketing. Such an interview can help a researcher understand why certain types of social media advertising strategies work for a company and why others don't.

LEARN ABOUT: Qualitative Interview

Focus groups

Focus groups are small sets of individuals selected specifically to understand their opinions and behaviors. The group is usually chosen to match the parameters of the target market audience and brought together to discuss a particular product or service. The method gives the researcher a larger sample than an interview or a case study while retaining the advantages of conversational communication.

Focus groups are also one of the best examples of qualitative data collection in education. Nowadays, focus groups can also be run through online surveys to collect data and answer why, what, and how questions. The method is crucial for testing new concepts or products before they are launched in the market.

For example: Research is conducted with a focus group to understand which screen size the current target market prefers most. Such a method enables the researcher to dig deeper into whether the target market cares more about screen size, features, or the color of the phone. Using this data, a company can make wise decisions about its product line and secure a higher market share.

Ethnographic research

Ethnographic research is one of the most challenging research methods, but it can give extremely precise results. It is used quite rarely, as it is time-consuming and can be expensive. It involves the researcher adapting to the target audience's natural environment and observing them in order to collect data. The method is generally used to understand cultures, challenges, or other things that occur in that particular setting.

For example: The well-known show "Undercover Boss" is an apt example of how ethnographic research can be used in business. In the show, a senior manager of a large organization works in their own company as a regular employee to understand what improvements can be made, what the organizational culture is like, and which hard-working employees deserve to be rewarded.

The researcher has to spend a good amount of time in the employees' natural setting and adapt to their ways and processes. By observing in this setting, the researcher can gather the needed information firsthand, without losing information or introducing bias, and improve things that will impact the business.

LEARN ABOUT:   Workforce Planning Model

Case study research

Case study research is one of the most important methods in business research. It is also used as marketing collateral by most businesses to land more clients. Case study research is conducted to assess customer satisfaction and to document the challenges the customer faced and the solutions the firm provided.

These inferences highlight the benefits the customer enjoyed by choosing that specific firm. Such research is also widely used in other fields, such as education and the social sciences. Businesses provide case studies to new clients to showcase their capabilities, and hence such research plays a crucial role in the business sector.

For example: A services company has provided a testing solution to one of its clients. Case study research is conducted to find out what challenges were faced during the project, what the scope of work was, what objective was to be achieved, and what solutions were given to tackle the challenges.

The study can end with the benefits the company delivered through its solutions, such as reduced testing time per batch, easy implementation or integration of the system, or cost reduction. Such a study showcases the company's capability and hence serves as empirical evidence for new prospects.

Website visitor profiling/research

Website intercept surveys, or website visitor profiling/research, are a relatively new approach that is quite helpful in the business sector: they collect direct feedback from website visitors using surveys. A great deal of business is now generated online, so it is important to understand your website's visitors, as they are your potential customers.

Collecting feedback is critical to any business, as without understanding a customer, no business can be successful. A company has to keep its customers satisfied and try to make them loyal customers in order to stay on top.

A website intercept survey is an online survey that allows you to target visitors to understand their intent and collect feedback to evaluate the customers’ online experience. Information like visitor intention, behavior path, and satisfaction with the overall website can be collected using this.

Depending on what information a company is looking for, multiple forms of website intercept surveys can be used to gather responses. Popular formats include pop-ups, also called modal boxes, and on-page surveys.

For example: A prospective customer is looking for a particular product that a company sells. Once the visitor lands on the website, an intercept survey starts recording their intent and path. Once the transaction has been made, a pop-up or on-page survey asks the customer to rate the website.

Such research enables the researcher to put this data to good use: understanding customers' intent and path, and improving parts of the website based on the responses, which in turn leads to more satisfied customers, higher revenue, and a larger market share.
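As a small illustration of how such intercept-survey responses might be summarized (the page names, ratings, and column names are hypothetical, not tied to any particular tool), grouping satisfaction ratings by the page where the survey fired highlights which parts of the site most need improvement:

    import pandas as pd

    # Hypothetical intercept-survey responses: page where the survey appeared + rating (1-5)
    responses = pd.DataFrame({
        "page": ["checkout", "product", "checkout", "home", "product", "checkout"],
        "rating": [3, 5, 2, 4, 4, 3],
    })

    # Average satisfaction and number of responses per page
    summary = responses.groupby("page")["rating"].agg(["mean", "count"])
    print(summary.sort_values("mean"))  # the lowest-rated pages are candidates for redesign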

LEARN ABOUT: Qualitative Research Questions and Questionnaires

Advantages of business research

  • Business research helps to identify opportunities and threats.
  • It helps identify research problems, and using this information, wise decisions can be made to tackle the issue appropriately.
  • It helps to understand customers better and hence can be useful for communicating with customers and stakeholders.
  • Risks and uncertainties can be minimized by conducting business research in advance.
  • Financial outcomes and the investments that will be needed can be planned effectively using business research.
  • Such research can help track competition in the business sector.
  • Business research can enable a company to make wise decisions about where to spend and how much.
  • Business research can enable a company to stay up to date with the market and its trends, and appropriate innovations can be made to stay ahead in the game.
  • Business research helps to measure and manage reputation.
Disadvantages of business research

  • Business research can be a high-cost affair.
  • Most of the time, business research is based on assumptions.
  • Business research can be time-consuming.
  • Business research can sometimes give inaccurate information because of a biased population or a small focus group.
  • Business research results can quickly become obsolete because of fast-changing markets.

Importance of business research

Business research is one of the most effective ways to understand customers, the market, and competitors. Such research helps companies understand market demand and supply. Using such research, businesses can reduce costs and create solutions or products that are targeted to market demand and the right audience.

In-house business research can enable senior management to build an effective team and to train or mentor employees when needed. Business research also enables the company to track its competitors and hence gain the upper hand over them.

Failures can be avoided by conducting such research, as it indicates whether the time is right to launch a product or solution and whether the audience is right. It also helps the company understand its brand value and measure customer satisfaction, which is essential for continuous innovation and meeting customer demands.

This helps the company grow its revenue and market share. Business research also helps in recruiting ideal candidates for various roles in the company. By conducting such research, a company can carry out a SWOT analysis, i.e., understand its strengths, weaknesses, opportunities, and threats. With this information, wise decisions can be made to ensure business success.

LEARN ABOUT:  Market research industry

Business research is the first step any business owner needs to take to set up a business that survives, and excels, in the market. Such research is of utmost importance because it helps businesses grow in terms of revenue, market share, and brand value.


Business Research Methods Notes, PDF I MBA 2024


Download Business Research Methods notes, PDF, books, and syllabus for MBA 2024. The study material includes notes, books, courses, case studies, syllabus, question papers, MCQs, and questions and answers, all available in PDF form.

The Business Research Methods subject is included in the MBA program, so students can download the notes for MBA 1st year and MBA 2nd semester.

Table of Content

  • 1 Business Research Methods Syllabus
  • 2 Business Research Methods Notes PDF
  • 3 Business Research Methods Notes
  • 4 Business Research Methods Questions and Answers
  • 5 Business Research Methods Question Paper
  • 6 Business Research Methods Books

Business Research Methods notes can be downloaded in PDF form from the article below.

Business Research Methods Syllabus

A detailed Business Research Methods syllabus, as prescribed by various universities and colleges in India, is given below. You can download the syllabus in PDF form.

  • Unit 1: Introduction Business Research: Definition-Types of Business Research. Scientific Investigation: The Language of Research: Concepts, Constructs, Definitions, Variables, Propositions and Hypotheses, Theory and Models. Technology and Business Research: Information needs of Business – Technologies used in Business Research: The Internet, E-mail, Browsers and Websites. Role of Business Research in Managerial Decisions. Ethics in Business Research.
  • Unit 2: The Research Process: Problem Identification: Broad Problem Area-Preliminary Data Gathering. Literature Survey, Online Data Bases Useful for Business Research, Hypothesis Development, Statement of Hypothesis, Procedure for Testing of Hypothesis. The Research Design: Types of Research Designs: Exploratory, Descriptive, Experimental Designs and Case Study, Measurement of Variables, Operational Definitions and Scales, Nominal and Ordinal Scales, Rating Scales, Ranking Scales, Reliability and Validity.
  • Unit 3: Collection and Analysis of Data : Sources of Data-Primary Sources of Data, Secondary Sources of Data, Data Collection Methods, Interviews, Structured Interviews and Unstructured Interviews, Face to face and Telephone Interviews. Observational Surveys, Questionnaire Construction, Organizing Questions, Structured and Unstructured Questionnaires, Guidelines for Construction of Questionnaires.
  • Unit 4: Data Analysis: An overview of Descriptive, Associational and Inferential, Statistical Measures.
  • Unit 5: The Research Report: Research Reports, Components, The Title Page, Table of Contents, The Executive Summary, The Introductory Section, The Body of the Report, The Final Part of the Report, Acknowledgements, References, Appendix, Guidelines for Preparing a Good Research Report. Oral Presentation: Deciding on the Content, Visual Aids, The Presenter, The Presentation and Handling Questions.

Business Research Methods Notes PDF

Business Research Methods Notes


Business Research Methods Questions and Answers

If you have already studied the Business Research Methods notes, it's time to move ahead and work through the questions and answers from previous years.

  • What is information? Discuss the types of information needed to run a business.
  • Define the term 'research' and enumerate the characteristics of research. Give a comprehensive definition of research.
  • What do you mean by scientific investigation? Explain in detail.
  • Indicate the sources of the research process. Enumerate the steps of the research process.
  • Give the sources of a research problem. How is a problem identified? Enumerate the criteria for the selection of a problem.
  • How is a problem stated? Describe the various ways of defining a problem. Discuss the characteristics of a good problem and the criteria for evaluating a problem.
  • Define the term 'review of literature'. How is it different from its traditional meaning? Enumerate the objectives and significance of a review of literature.
  • What do you mean by 'sample design'? What points should a researcher take into consideration when developing a sample design for a research project?
  • How would you differentiate between simple random sampling and complex random sampling designs? Explain clearly, giving examples.
  • Why is probability sampling generally preferred over non-probability sampling? Explain the procedure for selecting a simple random sample.
  • Explain the phrase 'analysis of data' or 'treatment of data'. Indicate the need for and importance of data analysis.
  • Differentiate between descriptive statistical analysis and inferential statistical analysis.
  • Distinguish between parametric statistics and non-parametric statistics. Indicate their uses for different types of data or research.
  • Indicate the basis for selecting a statistical technique for analyzing data in educational research.
  • What do you understand by a research report or thesis? Indicate its need and importance in research work.

Business Research Methods Question Paper

If you have already studied the Business Research Methods notes, it's time to move ahead and go through the previous year question papers.

They will help you understand the question paper pattern and the types of questions asked in the MBA 1st year Business Research Methods exam. You can download the question papers in PDF form.

Business Research Methods Books

Below is the list of Business Research Methods books recommended by top universities in India.

  • Green and Tull, Research for Marketing Decisions, PHI.
  • Tull, Donald and Hawkins, Del, Marketing Research, PHI.
  • G. C. Beri, Marketing Research, Tata McGraw-Hill.
  • Luck, David and Rubin, Ronald, Marketing Research, PHI.
  • Naresh Malhotra, Marketing Research, Pearson Education.
  • Green, E. Paul, Tull, S. Donald & Albaum, Gerald, Research for Marketing Decisions, 6th ed., PHI, 2006.

In the article above, students can download Business Research Methods notes for MBA 1st year and MBA 2nd semester. The study material includes notes, books, syllabus, question papers, case studies, questions and answers, and courses, all in PDF form.


10 Research Question Examples to Guide your Research Project

Published on October 30, 2022 by Shona McCombes . Revised on October 19, 2023.

The research question is one of the most important parts of your research paper, thesis or dissertation. It's important to spend some time assessing and refining your question before you get started.

The exact form of your question will depend on a few things, such as the length of your project, the type of research you're conducting, the topic, and the research problem. However, all research questions should be focused, specific, and relevant to a timely social or scholarly issue.

Once you've read our guide on how to write a research question, you can use these examples to craft your own.

In each pair of example questions, the second improves on the first:

  • Starting with "why" often leaves too many possible answers; by targeting just one aspect of the problem, the second question offers a clear path for research.
  • A question that is too broad and subjective, with no clear criteria for what counts as "better," should instead use clearly defined terms and narrow its focus to a specific population.
  • Academic research generally cannot answer broad normative questions; aim instead to understand possible solutions in order to make informed recommendations.
  • A question that can be answered with a simple yes or no is too simple; a stronger question requires in-depth investigation and the development of an original argument.
  • A question that is too broad can be improved by identifying an underexplored aspect of the topic.
  • A question that tries to address two different problems at once (for example, the quality of sexual health services and LGBT support services) should be integrated into one focused, specific question.
  • A question asking for a straightforward fact that can easily be found online is too simple; a stronger question requires analysis and detailed discussion to answer.
  • A question about a well-covered topic makes it very difficult to contribute anything new; taking a specific angle makes an original argument possible and adds relevance to current social concerns and debates.
  • A question that asks for a ready-made solution works better as a comparative question, and may need to be narrowed further (for example, to the effectiveness of drunk driving laws in just one or two countries) to be practical for a smaller research project or thesis.

Note that the design of your research question can depend on what method you are pursuing, for example whether it is qualitative, quantitative, or statistical.

Other interesting articles

If you want to know more about the research process, methodology, research bias, or statistics, make sure to check out some of our other articles with explanations and examples.

Methodology

  • Sampling methods
  • Simple random sampling
  • Stratified sampling
  • Cluster sampling
  • Likert scales
  • Reproducibility

 Statistics

  • Null hypothesis
  • Statistical power
  • Probability distribution
  • Effect size
  • Poisson distribution

Research bias

  • Optimism bias
  • Cognitive bias
  • Implicit bias
  • Hawthorne effect
  • Anchoring bias
  • Explicit bias


Business research methods question paper PDF

Practicing previous question papers gives you confidence going into the exam with minimum fear and stress, since you get a proper idea of the question paper pattern and the marks weightage. Typical objective and short-answer questions cover topics such as the various methods of qualitative research and the major decisions involved in constructing an itemized rating scale.

Business Research: Definition, Types & Methods

What is business research, and why does it matter? Here are some of the ways business research can be helpful to your company, whichever method you choose to carry it out.

What is business research?

Business research helps companies make better business decisions by gathering information. The scope of the term business research is quite broad – it acts as an umbrella that covers every aspect of business, from finances to advertising creative. It can include research methods which help a company better understand its target market. It could focus on customer experience and assess customer satisfaction levels. Or it could involve sizing up the competition through competitor research.

Often when carrying out business research, companies are looking at their own data, sourced from their employees, their customers and their business records. However, business researchers can go beyond their own company in order to collect relevant information and understand patterns that may help leaders make informed decisions. For example, a business may carry out ethnographic research where the participants are studied in the context of their everyday lives, rather than just in their role as consumer, or look at secondary data sources such as open access public records and empirical research carried out in academic studies.

There is also a body of knowledge about business in general that can be mined for business research purposes, for example organizational theory and general studies of consumer behavior.


Why is business research important?

We live in a time of high-speed technological progress and hyper-connectedness. Customers have an entire market at their fingertips and can easily switch brands if a competitor is offering something better than you are. At the same time, the world of business has evolved to the point of near-saturation. It's hard to think of a need that hasn't been addressed by someone's innovative product or service.

The combination of ease of switching, high consumer awareness and a super-evolved marketplace crowded with companies and their offerings means that businesses must do whatever they can to find and maintain an edge. Business research is one of the most useful weapons in the fight against business obscurity, since it allows companies to gain a deep understanding of buyer behavior and stay up to date at all times with detailed information on their market.

Thanks to the standard of modern business research tools and methods, it’s now possible for business analysts to track the intricate relationships between competitors, financial markets, social trends, geopolitical changes, world events, and more.

Find out how to conduct your own market research and make use of existing market research data with our Ultimate guide to market research

Types of business research

Business research methods vary widely, but they can be grouped into two broad categories: qualitative research and quantitative research.

Qualitative research methods

Qualitative business research deals with non-numerical data such as people’s thoughts, feelings and opinions. It relies heavily on the observations of researchers, who collect data from a relatively small number of participants – often through direct interactions.

Interviews

Qualitative research interviews take place one-on-one between a researcher and participant. In a business context, the participant might be a customer, a supplier, an employee or other stakeholder. Using open-ended questions, the researcher conducts the interview in either a structured or unstructured format. Structured interviews stick closely to a question list and scripted phrases, while unstructured interviews are more conversational and exploratory. As well as listening to the participant's responses, the interviewer will observe non-verbal information such as posture, tone of voice and facial expression.

Focus groups

Like the qualitative interview, a focus group is a form of business research that uses direct interaction between the researcher and participants to collect data. In focus groups , a small number of participants (usually around 10) take part in a group discussion led by a researcher who acts as moderator. The researcher asks questions and takes note of the responses, as in a qualitative research interview. Sampling for focus groups is usually purposive rather than random, so that the group members represent varied points of view.

Observational studies

In an observational study, the researcher may not directly interact with participants at all, but will pay attention to practical situations, such as a busy sales floor full of potential customers, or a conference for some relevant business activity. They will hear people speak and watch their interactions , then record relevant data such as behavior patterns that relate to the subject they are interested in. Observational studies can be classified as a type of ethnographic research. They can be used to gain insight about a company’s target audience in their everyday lives, or study employee behaviors in actual business situations.

Ethnographic Research

Ethnographic research is an immersive research design in which the researcher observes people's behavior in their natural environment. Ethnography originated in anthropology and is now practiced across a wide range of social sciences.

Ethnography is used to support a designer's deeper understanding of the design problem, including the relevant domain, audience(s), processes, goals and context(s) of use.

The ethnographic research process is a popular methodology in the software development lifecycle. It helps create better UI/UX flows based on the real needs of end users.

If you truly want to understand your customers' needs, wants, desires and pain points, "walking a mile" in their shoes enables this. Ethnographic research is a deeply rooted form of research in which you truly learn your target audience's problems so you can craft the right solution.

Case study research

A case study is a detailed piece of research that provides in-depth knowledge about a specific person, place or organization. In the context of business research, case study research might focus on organizational dynamics or company culture in an actual business setting, and case studies have been used to develop new theories about how businesses operate. Proponents of case study research feel that it adds significant value in making theoretical and empirical advances. However, its detractors point out that it can be time-consuming and expensive, requiring highly skilled researchers to carry it out.

Quantitative research methods

Quantitative research focuses on countable data that is objective in nature. It relies on finding the patterns and relationships that emerge from mass data – for example by analyzing the material posted on social media platforms, or via surveys of the target audience. Data collected through quantitative methods is empirical in nature and can be analyzed using statistical techniques. Unlike qualitative approaches, a quantitative research method is usually reliant on finding the right sample size, as this will determine whether the results are representative. These are just a few methods – there are many more.

Surveys

Surveys are one of the most effective ways to conduct business research. They use a highly structured questionnaire which is distributed to participants, typically online (although in the past, face-to-face and telephone surveys were widely used). The questions are predominantly closed-ended, limiting the range of responses so that they can be grouped and analyzed at scale using statistical tools. However, surveys can also be used to get a better understanding of the pain points customers face by providing open field responses where they can express themselves in their own words. Both types of data can be captured on the same questionnaire, which offers efficiency of time and cost to the researcher.
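As a rough sketch of the sample-size consideration mentioned above (the 95% confidence level and 5% margin of error are illustrative choices, not values from the text), the standard formula for estimating a proportion is n = z² × p(1 − p) / e²:

    import math

    def required_sample_size(margin_of_error=0.05, z=1.96, p=0.5):
        """Responses needed to estimate a proportion within the given margin of error.

        Uses n = z^2 * p * (1 - p) / e^2; p = 0.5 is the most conservative assumption.
        """
        return math.ceil(z**2 * p * (1 - p) / margin_of_error**2)

    # A survey aiming for +/-5% at 95% confidence needs roughly 385 responses
    print(required_sample_size())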

Correlational research

Correlational research looks at the relationship between two entities, neither of which are manipulated by the researcher. For example, this might be the in-store sales of a certain product line and the proportion of female customers subscribed to a mailing list. Using statistical analysis methods, researchers can determine the strength of the correlation and even discover intricate relationships between the two variables. Compared with simple observation and intuition, correlation may identify further information about business activity and its impact, pointing the way towards potential improvements and more revenue.

Experimental research

It may sound like something that is strictly for scientists, but experimental research is used by both businesses and scholars alike. When conducted as part of the business intelligence process, experimental research is used to test different tactics to see which ones are most successful – for example one marketing approach versus another. In the simplest form of experimental research, the researcher identifies a dependent variable and an independent variable. The hypothesis is that the independent variable has no effect on the dependent variable, and the researcher will change the independent one to test this assumption. In a business context, the hypothesis might be that price has no relationship to customer satisfaction. The researcher manipulates the price and observes the C-Sat scores to see if there’s an effect.
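A minimal sketch of the kind of test described above, assuming hypothetical C-Sat scores collected under two price points (the data and the 0.05 significance threshold are illustrative): an independent-samples t-test checks whether the observed difference in mean satisfaction is likely to be due to chance.

    from scipy import stats

    # Hypothetical customer satisfaction (C-Sat) scores under two price points
    csat_current_price = [7.8, 8.1, 7.5, 8.0, 7.9, 8.2, 7.7, 8.0]
    csat_higher_price = [7.2, 7.6, 7.1, 7.4, 7.0, 7.5, 7.3, 7.2]

    # Null hypothesis: price has no effect on satisfaction (the group means are equal)
    t_stat, p_value = stats.ttest_ind(csat_current_price, csat_higher_price)

    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
    if p_value < 0.05:  # illustrative threshold
        print("Reject the null hypothesis: price appears to affect satisfaction.")
    else:
        print("No evidence that price affects satisfaction.")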

The best tools for business research

You can make the business research process much quicker and more efficient by selecting the right tools. Business research methods like surveys and interviews demand tools and technologies that can store vast quantities of data while making them easy to access and navigate. If your system can also carry out statistical analysis, and provide predictive recommendations to help you with your business decisions, so much the better.



Business Research Methods Past Papers

[OBJECTIVE]

Subject: Business Research Methods

Time Allowed: 15 Minutes

Maximum Marks: 10

NOTE: Attempt this Paper on this Question Sheet only. Please encircle the correct option. Division of marks is given in front of each question. This Paper will be collected back after expiry of time limit mentioned above.

Part-I Encircle the right answer, cutting and overwriting is not allowed. (10)

1. The degree of exactness or exactitude in scientific research is known as:
   a) Purposiveness b) Rigor c) Objectivity d) Testability

2. The artificial study setting is known as:
   a) Artificial study b) Contrived c) Non-contrived d) Both a and b

3. A scale that measures both the direction and intensity of the attributes of a concept:
   a) Staple scale b) Dichotomous scale c) Likert scale d) Constant sum rating scale

4. A subset or subgroup of the population chosen for study:
   a) Subject b) Sample c) Population frame d) Element

5. The hypothesis "What is the distribution of hypertensive patients by income level?" is an example of a:
   a) Descriptive hypothesis b) Relational hypothesis c) Correlational hypothesis d) Causal hypothesis

6. The most powerful scale:
   a) Nominal scale b) Ordinal scale c) Interval scale d) Ratio scale

7. The paired comparison scale is used when, among a small number of objects, respondents are asked to choose between ______ objects at a time.
   a) Two b) Three c) Four d) None of these

8. _____ is a test of how consistently a measuring instrument measures whatever concept it is measuring.
   a) Validity b) Reliability c) Content validity d) Construct validity

9. A question that lends itself to different possible responses to its subparts is called a:
   a) Loaded question b) Leading question c) Double-barreled question d) Ambiguous question

10. Collecting the necessary data without becoming an integral part of the organizational system:
    a) Participant-observer b) Non-participant observer c) Assistant observer d) None of these

[SUBJECTIVE]

Time Allowed: 2 Hour and 45 Minutes

Maximum Marks: 50

NOTE: ATTEMPT THIS (SUBJECTIVE) ON THE SEPARATE ANSWER SHEET PROVIDED.

Part-II: Give short answers. Each question carries equal marks. (20)

Q# 1: What is descriptive research?

Q# 2: Define simple random sampling.

Q# 3: Define the ratio scale with the help of an example.

Q# 4: Differentiate between cross-sectional and longitudinal research.

Q# 5: Explain the semi-structured interview.

Q# 6: What is meant by deductive reasoning?

Q# 7: Write down two advantages and two disadvantages of an external researcher.

Q# 8: Explain the funneling technique of questioning.

Q# 9: Explain any two possible threats to internal validity in experimental design.

Q# 10: Give the pros and cons of observational studies.

Part-III: Give detailed answers. Each question carries equal marks. (30)

Q# 1: What is the hypothetico-deductive method of research? Explain the steps involved in this method with the help of an example.

Q# 2: What are reliability and validity in research? How can you assess the reliability and validity of qualitative research?

Q# 3: What is the stratified sampling technique? What are its different types? Give an example of a situation where you would use stratified sampling.


B.Com 5th Semester Business Research Methods Previous Year Question Papers

Download the Calicut University B.Com V semester Business Research Methods previous year question papers, including the November 2022 paper.


Business Research Methods - BA4205 [MBA - Anna University 2021 Regulation]

Under Class: 2nd Semester (MBA Dept., Anna University 2021 Regulation)

Business Research Methods (BA4205) - Notes, Important Questions, and Semester Question Paper PDF Download

Important Questions and Question Bank

Semester Question Papers

Peer Reviewed

GPT-fabricated scientific papers on Google Scholar: Key features, spread, and implications for preempting evidence manipulation


Academic journals, archives, and repositories are seeing an increasing number of questionable research papers clearly produced using generative AI. They are often created with widely available, general-purpose AI applications, most likely ChatGPT, and mimic scientific writing. Google Scholar easily locates and lists these questionable papers alongside reputable, quality-controlled research. Our analysis of a selection of questionable GPT-fabricated scientific papers found in Google Scholar shows that many are about applied, often controversial topics susceptible to disinformation: the environment, health, and computing. The resulting enhanced potential for malicious manipulation of society’s evidence base, particularly in politically divisive domains, is a growing concern.

Swedish School of Library and Information Science, University of Borås, Sweden

Department of Arts and Cultural Sciences, Lund University, Sweden

Division of Environmental Communication, Swedish University of Agricultural Sciences, Sweden


Research Questions

  • Where are questionable publications produced with generative pre-trained transformers (GPTs) that can be found via Google Scholar published or deposited?
  • What are the main characteristics of these publications in relation to predominant subject categories?
  • How are these publications spread in the research infrastructure for scholarly communication?
  • How is the role of the scholarly communication infrastructure challenged in maintaining public trust in science and evidence through inappropriate use of generative AI?

Research Note Summary

  • A sample of scientific papers with signs of GPT-use found on Google Scholar was retrieved, downloaded, and analyzed using a combination of qualitative coding and descriptive statistics. All papers contained at least one of two common phrases returned by conversational agents that use large language models (LLM) like OpenAI’s ChatGPT. Google Search was then used to determine the extent to which copies of questionable, GPT-fabricated papers were available in various repositories, archives, citation databases, and social media platforms.
  • Roughly two-thirds of the retrieved papers were found to have been produced, at least in part, through undisclosed, potentially deceptive use of GPT. The majority (57%) of these questionable papers dealt with policy-relevant subjects (i.e., environment, health, computing), susceptible to influence operations. Most were available in several copies on different domains (e.g., social media, archives, and repositories).
  • Two main risks arise from the increasingly common use of GPT to (mass-)produce fake, scientific publications. First, the abundance of fabricated “studies” seeping into all areas of the research infrastructure threatens to overwhelm the scholarly communication system and jeopardize the integrity of the scientific record. A second risk lies in the increased possibility that convincingly scientific-looking content was in fact deceitfully created with AI tools and is also optimized to be retrieved by publicly available academic search engines, particularly Google Scholar. However small, this possibility and awareness of it risks undermining the basis for trust in scientific knowledge and poses serious societal risks.

Implications

The use of ChatGPT to generate text for academic papers has raised concerns about research integrity. Discussion of this phenomenon is ongoing in editorials, commentaries, opinion pieces, and on social media (Bom, 2023; Stokel-Walker, 2024; Thorp, 2023). There are now several lists of papers suspected of GPT misuse, and new papers are constantly being added. 1 See for example Academ-AI, https://www.academ-ai.info/ , and Retraction Watch, https://retractionwatch.com/papers-and-peer-reviews-with-evidence-of-chatgpt-writing/ . While many legitimate uses of GPT for research and academic writing exist (Huang & Tan, 2023; Kitamura, 2023; Lund et al., 2023), its undeclared use—beyond proofreading—has potentially far-reaching implications for both science and society, but especially for their relationship. It, therefore, seems important to extend the discussion to one of the most accessible and well-known intermediaries between science, but also certain types of misinformation, and the public, namely Google Scholar, also in response to the legitimate concerns that the discussion of generative AI and misinformation needs to be more nuanced and empirically substantiated  (Simon et al., 2023).

Google Scholar, https://scholar.google.com, is an easy-to-use academic search engine. It is available for free, and its index is extensive (Gusenbauer & Haddaway, 2020). It is also often touted as a credible source for academic literature and even recommended in library guides, by media and information literacy initiatives, and by fact checkers (Tripodi et al., 2023). However, Google Scholar lacks the transparency and adherence to standards that usually characterize citation databases. Instead, Google Scholar uses automated crawlers, like Google’s web search engine (Martín-Martín et al., 2021), and the inclusion criteria are based primarily on technical standards, allowing any individual author—with or without scientific affiliation—to upload papers to be indexed (Google Scholar Help, n.d.). It has been shown that Google Scholar is susceptible to manipulation through citation exploits (Antkare, 2020) and by providing access to fake scientific papers (Dadkhah et al., 2017). A large part of Google Scholar’s index consists of publications from established scientific journals or other forms of quality-controlled, scholarly literature. However, the index also contains a large amount of gray literature, including student papers, working papers, reports, preprint servers, and academic networking sites, as well as material from so-called “questionable” academic journals, including paper mills. The search interface does not offer the possibility to filter the results meaningfully by material type, publication status, or form of quality control, such as limiting the search to peer-reviewed material.

To understand the occurrence of ChatGPT (co-)authored work in Google Scholar’s index, we scraped it for publications containing one of two common ChatGPT responses (see Appendix A) that we encountered on social media and in media reports (DeGeurin, 2024). The results of our descriptive statistical analyses showed that around 62% did not declare the use of GPTs. Most of these GPT-fabricated papers were found in non-indexed journals and working papers, but some cases included research published in mainstream scientific journals and conference proceedings. (Indexed journals are scholarly journals indexed by abstract and citation databases such as Scopus and Web of Science, where indexation implies high scientific quality; non-indexed journals fall outside this indexation.) More than half (57%) of these GPT-fabricated papers concerned policy-relevant subject areas susceptible to influence operations. To avoid increasing the visibility of these publications, we abstained from referencing them in this research note. However, we have made the data available in the Harvard Dataverse repository.

The publications were related to three issue areas—health (14.5%), environment (19.5%), and computing (23%)—with key terms such as “healthcare,” “COVID-19,” or “infection” for health-related papers, and “analysis,” “sustainable,” and “global” for environment-related papers. In several cases, the papers had titles that strung together general keywords and buzzwords, thus alluding to very broad and current research. These terms included “biology,” “telehealth,” “climate policy,” “diversity,” and “disrupting,” to name just a few. While the study’s scope and design did not include a detailed analysis of which parts of the articles included fabricated text, our dataset did contain the surrounding sentences for each occurrence of the suspicious phrases that formed the basis for our search and subsequent selection. Based on that, we can say that the phrases occurred in most sections typically found in scientific publications, including the literature review, methods, conceptual and theoretical frameworks, background, motivation or societal relevance, and even discussion. This was confirmed during the joint coding, where we read and discussed all articles. It became clear that not just the text related to the telltale phrases was created by GPT, but that almost all articles in our sample of questionable articles likely contained GPT-fabricated text throughout.

Evidence hacking and backfiring effects

Generative pre-trained transformers (GPTs) can be used to produce texts that mimic scientific writing. These texts, when made available online—as we demonstrate—leak into the databases of academic search engines and other parts of the research infrastructure for scholarly communication. This development exacerbates problems that were already present with less sophisticated text generators (Antkare, 2020; Cabanac & Labbé, 2021). Yet, the public release of ChatGPT in 2022, together with the way Google Scholar works, has increased the likelihood of lay people (e.g., media, politicians, patients, students) coming across questionable (or even entirely GPT-fabricated) papers and other problematic research findings. Previous research has emphasized that the ability to determine the value and status of scientific publications for lay people is at stake when misleading articles are passed off as reputable (Haider & Åström, 2017) and that systematic literature reviews risk being compromised (Dadkhah et al., 2017). It has also been highlighted that Google Scholar, in particular, can be and has been exploited for manipulating the evidence base for politically charged issues and to fuel conspiracy narratives (Tripodi et al., 2023). Both concerns are likely to be magnified in the future, increasing the risk of what we suggest calling evidence hacking—the strategic and coordinated malicious manipulation of society’s evidence base.

The authority of quality-controlled research as evidence to support legislation, policy, politics, and other forms of decision-making is undermined by the presence of undeclared GPT-fabricated content in publications professing to be scientific. Due to the large number of archives, repositories, mirror sites, and shadow libraries to which they spread, there is a clear risk that GPT-fabricated, questionable papers will reach audiences even after a possible retraction. There are considerable technical difficulties involved in identifying and tracing computer-fabricated papers (Cabanac & Labbé, 2021; Dadkhah et al., 2023; Jones, 2024), not to mention preventing and curbing their spread and uptake.

However, as the rise of the so-called anti-vaxx movement during the COVID-19 pandemic and the ongoing obstruction and denial of climate change show, retracting erroneous publications often fuels conspiracies and increases the following of these movements rather than stopping them. To illustrate this mechanism, climate deniers frequently question established scientific consensus by pointing to other, supposedly scientific, studies that support their claims. Usually, these are poorly executed, not peer-reviewed, based on obsolete data, or even fraudulent (Dunlap & Brulle, 2020). A similar strategy is successful in the alternative epistemic world of the global anti-vaccination movement (Carrion, 2018), and the persistence of flawed and questionable publications in the scientific record already poses significant problems for health research, policy, and lawmakers, and thus for society as a whole (Littell et al., 2024). Considering that a person’s support for “doing your own research” is associated with increased mistrust in scientific institutions (Chinn & Hasell, 2023), it will be of utmost importance to anticipate and consider such backfiring effects when designing technical solutions, suggesting industry or legal regulation, and planning educational measures.

Recommendations

Solutions should be based on simultaneous considerations of technical, educational, and regulatory approaches, as well as incentives, including social ones, across the entire research infrastructure. Paying attention to how these approaches and incentives relate to each other can help identify points and mechanisms for disruption. Recognizing fraudulent academic papers must happen alongside understanding how they reach their audiences and what reasons there might be for some of these papers successfully “sticking around.” A possible way to mitigate some of the risks associated with GPT-fabricated scholarly texts finding their way into academic search engine results would be to provide filtering options for facets such as indexed journals, gray literature, peer review, and similar on the interface of publicly available academic search engines. Furthermore, evaluation tools for indexed journals (such as LiU Journal CheckUp, https://ep.liu.se/JournalCheckup/default.aspx?lang=eng) could be integrated into the graphical user interfaces and the crawlers of these academic search engines. To enable accountability, it is important that the index (database) of such a search engine is populated according to criteria that are transparent, open to scrutiny, and appropriate to the workings of science and other forms of academic research. Moreover, considering that Google Scholar has no real competitor, there is a strong case for establishing a freely accessible, non-specialized academic search engine that is not run for commercial reasons but for reasons of public interest. Such measures, together with educational initiatives aimed particularly at policymakers, science communicators, journalists, and other media workers, will be crucial to reducing the possibilities for and effects of malicious manipulation or evidence hacking. It is important not to present this as a technical problem that exists only because of AI text generators but to relate it to the wider concerns in which it is embedded. These range from a largely dysfunctional scholarly publishing system (Haider & Åström, 2017) and academia’s “publish or perish” paradigm to Google’s near-monopoly and ideological battles over the control of information and ultimately knowledge. Any intervention is likely to have systemic effects; these effects need to be considered and assessed in advance and, ideally, followed up on.

Our study focused on a selection of papers that were easily recognizable as fraudulent. We used this relatively small sample as a magnifying glass to examine, delineate, and understand a problem that extends beyond the sample itself and points toward larger concerns requiring further investigation. The work of ongoing whistleblowing initiatives (such as Academ-AI, https://www.academ-ai.info/, and Retraction Watch, https://retractionwatch.com/papers-and-peer-reviews-with-evidence-of-chatgpt-writing/), recent media reports of journal closures (Subbaraman, 2024), and GPT-related changes in word use and writing style (Cabanac et al., 2021; Stokel-Walker, 2024) suggest that we only see the tip of the iceberg. There are already more sophisticated cases (Dadkhah et al., 2023) as well as cases involving fabricated images (Gu et al., 2022). Our analysis shows that questionable and potentially manipulative GPT-fabricated papers permeate the research infrastructure and are likely to become a widespread phenomenon. Our findings underline that the risk of fake scientific papers being used to maliciously manipulate evidence (see Dadkhah et al., 2017) must be taken seriously. Manipulation may involve undeclared automatic summaries of texts, inclusion in literature reviews, explicit scientific claims, or the concealment of errors in studies so that they are difficult to detect in peer review. However, the mere possibility of these things happening is a significant risk in its own right that can be strategically exploited and will have ramifications for trust in and perception of science. Society’s methods of evaluating sources and the foundations of media and information literacy are under threat, and public trust in science is at risk of further erosion, with far-reaching consequences for society’s ability to deal with information disorders. To address this multifaceted problem, we first need to understand why it exists and proliferates.

Finding 1: 139 GPT-fabricated, questionable papers were found and listed as regular results on the Google Scholar results page. Non-indexed journals dominate.

Most questionable papers we found were in non-indexed journals or were working papers, but we also found some in established journals, publications, conferences, and repositories. We found a total of 139 papers with a suspected deceptive use of ChatGPT or similar LLM applications (see Table 1). Out of these, 19 were in indexed journals, 89 were in non-indexed journals, 19 were student papers found in university databases, and 12 were working papers (mostly in preprint databases). Table 1 divides these papers into categories. Health and environment papers made up around 34% (47) of the sample. Of these, 66% were present in non-indexed journals.

Publication type | Computing | Environment | Health | Others | Total
Indexed journals* | 5 | 3 | 4 | 7 | 19
Non-indexed journals | 18 | 18 | 13 | 40 | 89
Student papers | 4 | 3 | 1 | 11 | 19
Working papers | 5 | 3 | 2 | 2 | 12
Total | 32 | 27 | 20 | 60 | 139

Finding 2: GPT-fabricated, questionable papers are disseminated online, permeating the research infrastructure for scholarly communication, often in multiple copies. Applied topics with practical implications dominate.

The 20 papers concerning health-related issues are distributed across 20 unique domains, accounting for 46 URLs. The 27 papers dealing with environmental issues can be found across 26 unique domains, accounting for 56 URLs.  Most of the identified papers exist in multiple copies and have already spread to several archives, repositories, and social media. It would be difficult, or impossible, to remove them from the scientific record.

As apparent from Table 2, GPT-fabricated, questionable papers are seeping into most parts of the online research infrastructure for scholarly communication. Platforms on which identified papers have appeared include ResearchGate, ORCiD, Journal of Population Therapeutics and Clinical Pharmacology (JPTCP), Easychair, Frontiers, the Institute of Electrical and Electronics Engineers (IEEE), and X/Twitter. Thus, even if they are retracted from their original source, it will prove very difficult to track, remove, or even just mark them up on other platforms. Moreover, unless regulated, Google Scholar will enable their continued and most likely unlabeled discoverability.

Category | Top domains (number of URLs)
Environment | researchgate.net (13), orcid.org (4), easychair.org (3), ijope.com* (3), publikasiindonesia.id (3)
Health | researchgate.net (15), ieee.org (4), twitter.com (3), jptcp.com** (2), frontiersin.org (2)

A word rain visualization (Centre for Digital Humanities Uppsala, 2023), which combines word prominences through TF-IDF scores (term frequency–inverse document frequency, a method for measuring the significance of a word in a document relative to its frequency across all documents in a collection) with the semantic similarity of the full texts of our sample of GPT-generated articles in the “Environment” and “Health” categories, reflects the two categories in question. However, as can be seen in Figure 1, it also reveals overlap and sub-areas. The y-axis shows word prominences through word positions and font sizes, while the x-axis indicates semantic similarity. In addition to a certain amount of overlap, this reveals sub-areas, which are best described as two distinct events within the word rain. The event on the left bundles terms related to the development and management of health and healthcare, with “challenges,” “impact,” and “potential of artificial intelligence” emerging as semantically related terms. Terms related to research infrastructures and to environmental, epistemic, and technological concepts are arranged further down in the same event (e.g., “system,” “climate,” “understanding,” “knowledge,” “learning,” “education,” “sustainable”). A second distinct event further to the right bundles terms associated with fish farming and aquatic medicinal plants, highlighting the presence of an aquaculture cluster. Here, the prominence of groups of terms such as “used,” “model,” “-based,” and “traditional” suggests the presence of applied research on these topics. The two events making up the word rain visualization are linked by a less dominant but overlapping cluster of terms related to “energy” and “water.”

[Figure 1. Word rain visualization of the GPT-fabricated papers in the “Environment” and “Health” categories.]

The bar chart of the terms in the paper subset (see Figure 2) complements the word rain visualization by depicting the most prominent terms in the full texts along the y-axis. Here, word prominences across health and environment papers are arranged in descending order, where values outside parentheses are TF-IDF values (relative frequencies) and values inside parentheses are raw term frequencies (absolute frequencies).

[Figure 2. Bar chart of the most prominent terms across the health and environment papers.]

Finding 3: Google Scholar presents results from quality-controlled and non-controlled citation databases on the same interface, providing unfiltered access to GPT-fabricated questionable papers.

Google Scholar’s central position in the publicly accessible scholarly communication infrastructure, as well as its lack of standards, transparency, and accountability in terms of inclusion criteria, has potentially serious implications for public trust in science. This is likely to exacerbate the already-known potential to exploit Google Scholar for evidence hacking (Tripodi et al., 2023) and will have implications for any attempts to retract or remove fraudulent papers from their original publication venues. Any solution must consider the entirety of the research infrastructure for scholarly communication and the interplay of different actors, interests, and incentives.

We searched and scraped Google Scholar using the Python library Scholarly (Cholewiak et al., 2023) for papers that included specific phrases known to be common responses from ChatGPT and similar applications with the same underlying model (GPT-3.5 or GPT-4): “as of my last knowledge update” and/or “I don’t have access to real-time data” (see Appendix A). This facilitated the identification of papers that likely used generative AI to produce text, resulting in 227 retrieved papers. The papers’ bibliographic information was automatically added to a spreadsheet and downloaded into Zotero, an open-source reference manager (https://zotero.org).
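
To illustrate how such a phrase-based retrieval can be set up, the sketch below uses the Scholarly library named above. The study’s exact scraping script is not reproduced here, so the result cap and the record fields are assumptions based on the library’s documented publication format.

```python
# Minimal sketch of a phrase search against Google Scholar with the
# scholarly library (Cholewiak et al., 2023). Query strings come from the
# study; the result cap and dictionary fields are illustrative and may
# need adjusting to the library version in use.
from itertools import islice
from scholarly import scholarly

PHRASES = [
    '"as of my last knowledge update"',
    '"I don\'t have access to real-time data"',
]

records = []
for phrase in PHRASES:
    # Cap the generator so a single query cannot run indefinitely.
    for pub in islice(scholarly.search_pubs(phrase), 200):
        bib = pub.get("bib", {})
        records.append(
            {
                "title": bib.get("title"),
                "year": bib.get("pub_year"),
                "venue": bib.get("venue"),
                "url": pub.get("pub_url"),
                "matched_phrase": phrase,
            }
        )

print(f"Retrieved {len(records)} candidate papers")
```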

We employed multiple coding (Barbour, 2001) to classify the papers based on their content. First, we jointly assessed whether the paper was suspected of fraudulent use of ChatGPT (or similar) based on how the text was integrated into the papers and whether the paper was presented as original research output or the AI tool’s role was acknowledged. Second, in analyzing the content of the papers, we continued the multiple coding by classifying the fraudulent papers into four categories identified during an initial round of analysis—health, environment, computing, and others—and then determining which subjects were most affected by this issue (see Table 1). Out of the 227 retrieved papers, 88 papers were written with legitimate and/or declared use of GPTs (i.e., false positives, which were excluded from further analysis), and 139 papers were written with undeclared and/or fraudulent use (i.e., true positives, which were included in further analysis). The multiple coding was conducted jointly by all authors of the present article, who collaboratively coded and cross-checked each other’s interpretation of the data simultaneously in a shared spreadsheet file. This was done to single out coding discrepancies and settle coding disagreements, which in turn ensured methodological thoroughness and analytical consensus (see Barbour, 2001). Redoing the category coding later based on our established coding schedule, we achieved an intercoder reliability (Cohen’s kappa) of 0.806 after resolving obvious discrepancies.
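
As a small illustration of the reliability check, Cohen’s kappa can be computed with scikit-learn; the labels below are invented placeholders, and only the kappa calculation itself mirrors the procedure described above.

```python
# Hedged sketch of the intercoder reliability check. The coded labels are
# placeholders; the study's real data are available in the Harvard Dataverse.
from sklearn.metrics import cohen_kappa_score

# One category label per paper and coder, following the study's scheme.
coder_a = ["health", "environment", "computing", "other", "health", "computing"]
coder_b = ["health", "environment", "other", "other", "health", "computing"]

kappa = cohen_kappa_score(coder_a, coder_b)
print(f"Cohen's kappa: {kappa:.3f}")  # the study reports 0.806 after reconciliation
```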

The ranking algorithm of Google Scholar prioritizes highly cited and older publications (Martín-Martín et al., 2016). Therefore, the position of the articles on the search engine results pages was not particularly informative, considering the relatively small number of results in combination with the recency of the publications. Only the query “as of my last knowledge update” had more than two search engine result pages. On those, questionable articles with undeclared use of GPTs were evenly distributed across all result pages (min: 4, max: 9, mode: 8), with the proportion of undeclared use being slightly higher on average on later search result pages.

To understand how the papers making fraudulent use of generative AI were disseminated online, we programmatically searched for the paper titles (with exact string matching) in Google Search from our local IP address (see Appendix B) using the googlesearch-python library (Vikramaditya, 2020). We manually verified each search result to filter out false positives—results that were not related to the paper—and then compiled the most prominent URLs by field. This enabled the identification of other platforms through which the papers had been spread. We did not, however, investigate whether copies had spread into SciHub or other shadow libraries, or if they were referenced in Wikipedia.
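
A minimal sketch of this title search, assuming the googlesearch-python interface (Vikramaditya, 2020), might look as follows; the paper title and result count are placeholders, and real runs still require the manual verification of each hit described above.

```python
# Sketch of the title-dissemination search with googlesearch-python.
# The title is a placeholder and num_results is an arbitrary choice.
from urllib.parse import urlparse
from googlesearch import search

def find_copies(title: str, num_results: int = 20) -> list[str]:
    """Return URLs returned by Google for an exact-match title query."""
    return list(search(f'"{title}"', num_results=num_results))

urls = find_copies("Placeholder title of a suspected GPT-fabricated paper")
print(sorted({urlparse(u).netloc for u in urls}))
```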

We used descriptive statistics to count the prevalence of the number of GPT-fabricated papers across topics and venues and top domains by subject. The pandas software library for the Python programming language (The pandas development team, 2024) was used for this part of the analysis. Based on the multiple coding, paper occurrences were counted in relation to their categories, divided into indexed journals, non-indexed journals, student papers, and working papers. The schemes, subdomains, and subdirectories of the URL strings were filtered out while top-level domains and second-level domains were kept, which led to normalizing domain names. This, in turn, allowed the counting of domain frequencies in the environment and health categories. To distinguish word prominences and meanings in the environment and health-related GPT-fabricated questionable papers, a semantically-aware word cloud visualization was produced through the use of a word rain (Centre for Digital Humanities Uppsala, 2023) for full-text versions of the papers. Font size and y-axis positions indicate word prominences through TF-IDF scores for the environment and health papers (also visualized in a separate bar chart with raw term frequencies in parentheses), and words are positioned along the x-axis to reflect semantic similarity (Skeppstedt et al., 2024), with an English Word2vec skip gram model space (Fares et al., 2017). An English stop word list was used, along with a manually produced list including terms such as “https,” “volume,” or “years.”
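
The domain normalization and counting step could be sketched as below; the URLs are placeholders, and keeping only the last two labels is a simplification that would need extra handling for country-code second-level domains.

```python
# Sketch of normalizing URLs to top- and second-level domains and counting
# their frequencies with pandas, as described above. URLs are placeholders.
from urllib.parse import urlparse
import pandas as pd

urls = pd.Series([
    "https://www.researchgate.net/publication/123",    # placeholder
    "https://orcid.org/0000-0000-0000-0000",           # placeholder
    "https://subdomain.publikasiindonesia.id/item/5",  # placeholder
])

def normalize_domain(url: str) -> str:
    host = urlparse(url).netloc
    return ".".join(host.split(".")[-2:])

print(urls.map(normalize_domain).value_counts())
```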

Cite this Essay

Haider, J., Söderström, K. R., Ekström, B., & Rödl, M. (2024). GPT-fabricated scientific papers on Google Scholar: Key features, spread, and implications for preempting evidence manipulation. Harvard Kennedy School (HKS) Misinformation Review . https://doi.org/10.37016/mr-2020-156


Bibliography

Antkare, I. (2020). Ike Antkare, his publications, and those of his disciples. In M. Biagioli & A. Lippman (Eds.), Gaming the metrics (pp. 177–200). The MIT Press. https://doi.org/10.7551/mitpress/11087.003.0018

Barbour, R. S. (2001). Checklists for improving rigour in qualitative research: A case of the tail wagging the dog? BMJ , 322 (7294), 1115–1117. https://doi.org/10.1136/bmj.322.7294.1115

Bom, H.-S. H. (2023). Exploring the opportunities and challenges of ChatGPT in academic writing: A roundtable discussion. Nuclear Medicine and Molecular Imaging , 57 (4), 165–167. https://doi.org/10.1007/s13139-023-00809-2

Cabanac, G., & Labbé, C. (2021). Prevalence of nonsensical algorithmically generated papers in the scientific literature. Journal of the Association for Information Science and Technology , 72 (12), 1461–1476. https://doi.org/10.1002/asi.24495

Cabanac, G., Labbé, C., & Magazinov, A. (2021). Tortured phrases: A dubious writing style emerging in science. Evidence of critical issues affecting established journals . arXiv. https://doi.org/10.48550/arXiv.2107.06751

Carrion, M. L. (2018). “You need to do your research”: Vaccines, contestable science, and maternal epistemology. Public Understanding of Science , 27 (3), 310–324. https://doi.org/10.1177/0963662517728024

Centre for Digital Humanities Uppsala (2023). CDHUppsala/word-rain [Computer software]. https://github.com/CDHUppsala/word-rain

Chinn, S., & Hasell, A. (2023). Support for “doing your own research” is associated with COVID-19 misperceptions and scientific mistrust. Harvard Kennedy School (HKS) Misinformation Review, 4 (3). https://doi.org/10.37016/mr-2020-117

Cholewiak, S. A., Ipeirotis, P., Silva, V., & Kannawadi, A. (2023). SCHOLARLY: Simple access to Google Scholar authors and citation using Python (1.5.0) [Computer software]. https://doi.org/10.5281/zenodo.5764801

Dadkhah, M., Lagzian, M., & Borchardt, G. (2017). Questionable papers in citation databases as an issue for literature review. Journal of Cell Communication and Signaling , 11 (2), 181–185. https://doi.org/10.1007/s12079-016-0370-6

Dadkhah, M., Oermann, M. H., Hegedüs, M., Raman, R., & Dávid, L. D. (2023). Detection of fake papers in the era of artificial intelligence. Diagnosis , 10 (4), 390–397. https://doi.org/10.1515/dx-2023-0090

DeGeurin, M. (2024, March 19). AI-generated nonsense is leaking into scientific journals. Popular Science. https://www.popsci.com/technology/ai-generated-text-scientific-journals/

Dunlap, R. E., & Brulle, R. J. (2020). Sources and amplifiers of climate change denial. In D.C. Holmes & L. M. Richardson (Eds.), Research handbook on communicating climate change (pp. 49–61). Edward Elgar Publishing. https://doi.org/10.4337/9781789900408.00013

Fares, M., Kutuzov, A., Oepen, S., & Velldal, E. (2017). Word vectors, reuse, and replicability: Towards a community repository of large-text resources. In J. Tiedemann & N. Tahmasebi (Eds.), Proceedings of the 21st Nordic Conference on Computational Linguistics (pp. 271–276). Association for Computational Linguistics. https://aclanthology.org/W17-0237

Google Scholar Help. (n.d.). Inclusion guidelines for webmasters . https://scholar.google.com/intl/en/scholar/inclusion.html

Gu, J., Wang, X., Li, C., Zhao, J., Fu, W., Liang, G., & Qiu, J. (2022). AI-enabled image fraud in scientific publications. Patterns , 3 (7), 100511. https://doi.org/10.1016/j.patter.2022.100511

Gusenbauer, M., & Haddaway, N. R. (2020). Which academic search systems are suitable for systematic reviews or meta-analyses? Evaluating retrieval qualities of Google Scholar, PubMed, and 26 other resources. Research Synthesis Methods , 11 (2), 181–217.   https://doi.org/10.1002/jrsm.1378

Haider, J., & Åström, F. (2017). Dimensions of trust in scholarly communication: Problematizing peer review in the aftermath of John Bohannon’s “Sting” in science. Journal of the Association for Information Science and Technology , 68 (2), 450–467. https://doi.org/10.1002/asi.23669

Huang, J., & Tan, M. (2023). The role of ChatGPT in scientific communication: Writing better scientific review articles. American Journal of Cancer Research , 13 (4), 1148–1154. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10164801/

Jones, N. (2024). How journals are fighting back against a wave of questionable images. Nature , 626 (8000), 697–698. https://doi.org/10.1038/d41586-024-00372-6

Kitamura, F. C. (2023). ChatGPT is shaping the future of medical writing but still requires human judgment. Radiology , 307 (2), e230171. https://doi.org/10.1148/radiol.230171

Littell, J. H., Abel, K. M., Biggs, M. A., Blum, R. W., Foster, D. G., Haddad, L. B., Major, B., Munk-Olsen, T., Polis, C. B., Robinson, G. E., Rocca, C. H., Russo, N. F., Steinberg, J. R., Stewart, D. E., Stotland, N. L., Upadhyay, U. D., & Ditzhuijzen, J. van. (2024). Correcting the scientific record on abortion and mental health outcomes. BMJ , 384 , e076518. https://doi.org/10.1136/bmj-2023-076518

Lund, B. D., Wang, T., Mannuru, N. R., Nie, B., Shimray, S., & Wang, Z. (2023). ChatGPT and a new academic reality: Artificial Intelligence-written research papers and the ethics of the large language models in scholarly publishing. Journal of the Association for Information Science and Technology, 74 (5), 570–581. https://doi.org/10.1002/asi.24750

Martín-Martín, A., Orduna-Malea, E., Ayllón, J. M., & Delgado López-Cózar, E. (2016). Back to the past: On the shoulders of an academic search engine giant. Scientometrics , 107 , 1477–1487. https://doi.org/10.1007/s11192-016-1917-2

Martín-Martín, A., Thelwall, M., Orduna-Malea, E., & Delgado López-Cózar, E. (2021). Google Scholar, Microsoft Academic, Scopus, Dimensions, Web of Science, and OpenCitations’ COCI: A multidisciplinary comparison of coverage via citations. Scientometrics , 126 (1), 871–906. https://doi.org/10.1007/s11192-020-03690-4

Simon, F. M., Altay, S., & Mercier, H. (2023). Misinformation reloaded? Fears about the impact of generative AI on misinformation are overblown. Harvard Kennedy School (HKS) Misinformation Review, 4 (5). https://doi.org/10.37016/mr-2020-127

Skeppstedt, M., Ahltorp, M., Kucher, K., & Lindström, M. (2024). From word clouds to Word Rain: Revisiting the classic word cloud to visualize climate change texts. Information Visualization , 23 (3), 217–238. https://doi.org/10.1177/14738716241236188

Swedish Research Council. (2017). Good research practice. Vetenskapsrådet.

Stokel-Walker, C. (2024, May 1). AI chatbots have thoroughly infiltrated scientific publishing. Scientific American. https://www.scientificamerican.com/article/chatbots-have-thoroughly-infiltrated-scientific-publishing/

Subbaraman, N. (2024, May 14). Flood of fake science forces multiple journal closures: Wiley to shutter 19 more journals, some tainted by fraud. The Wall Street Journal . https://www.wsj.com/science/academic-studies-research-paper-mills-journals-publishing-f5a3d4bc

The pandas development team. (2024). pandas-dev/pandas: Pandas (v2.2.2) [Computer software]. Zenodo. https://doi.org/10.5281/zenodo.10957263

Thorp, H. H. (2023). ChatGPT is fun, but not an author. Science , 379 (6630), 313–313. https://doi.org/10.1126/science.adg7879

Tripodi, F. B., Garcia, L. C., & Marwick, A. E. (2023). ‘Do your own research’: Affordance activation and disinformation spread. Information, Communication & Society , 27 (6), 1212–1228. https://doi.org/10.1080/1369118X.2023.2245869

Vikramaditya, N. (2020). Nv7-GitHub/googlesearch [Computer software]. https://github.com/Nv7-GitHub/googlesearch

This research has been supported by Mistra, the Swedish Foundation for Strategic Environmental Research, through the research program Mistra Environmental Communication (Haider, Ekström, Rödl) and the Marcus and Amalia Wallenberg Foundation [2020.0004] (Söderström).

Competing Interests

The authors declare no competing interests.

The research described in this article was carried out under Swedish legislation. According to the relevant EU and Swedish legislation (2003:460) on the ethical review of research involving humans (“Ethical Review Act”), the research reported on here is not subject to authorization by the Swedish Ethical Review Authority (“etikprövningsmyndigheten”) (SRC, 2017).

This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided that the original author and source are properly credited.

Data Availability

All data needed to replicate this study are available at the Harvard Dataverse: https://doi.org/10.7910/DVN/WUVD8X

Acknowledgements

The authors wish to thank two anonymous reviewers for their valuable comments on the article manuscript as well as the editorial group of Harvard Kennedy School (HKS) Misinformation Review for their thoughtful feedback and input.


Published on 4.9.2024 in Vol 12 (2024)

Evaluating the Capabilities of Generative AI Tools in Understanding Medical Papers: Qualitative Study

Authors of this article:


Original Paper

  • Seyma Handan Akyon 1, MD;
  • Fatih Cagatay Akyon 2, 3, PhD;
  • Ahmet Sefa Camyar 4, MD;
  • Fatih Hızlı 5, MD;
  • Talha Sari 2, 6, BSc;
  • Şamil Hızlı 7, Prof Dr, MD

1 Golpazari Family Health Center, Bilecik, Turkey

2 SafeVideo AI, San Francisco, CA, United States

3 Graduate School of Informatics, Middle East Technical University, Ankara, Turkey

4 Department of Internal Medicine, Ankara Etlik City Hospital, Ankara, Turkey

5 Faculty of Medicine, Ankara Yildirim Beyazit University, Ankara, Turkey

6 Department of Computer Science, Istanbul Technical University, Istanbul, Turkey

7 Department of Pediatric Gastroenterology, Children Hospital, Ankara Bilkent City Hospital, Ankara Yildirim Beyazit University, Ankara, Turkey

Corresponding Author:

Seyma Handan Akyon, MD

Golpazari Family Health Center

Istiklal Mahallesi Fevzi Cakmak Caddesi No:23 Golpazari

Bilecik, 11700

Phone: 90 5052568096

Email: [email protected]

Background: Reading medical papers is a challenging and time-consuming task for doctors, especially when the papers are long and complex. A tool that can help doctors efficiently process and understand medical papers is needed.

Objective: This study aims to critically assess and compare the comprehension capabilities of large language models (LLMs) in accurately and efficiently understanding medical research papers using the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) checklist, which provides a standardized framework for evaluating key elements of observational studies.

Methods: This is a methodological study evaluating the comprehension capabilities of new generative artificial intelligence tools on medical papers. A novel benchmark pipeline processed 50 medical research papers from PubMed, comparing the answers of 6 LLMs (GPT-3.5-Turbo, GPT-4-0613, GPT-4-1106, PaLM 2, Claude v1, and Gemini Pro) to the benchmark established by expert medical professors. Fifteen questions, derived from the STROBE checklist, assessed the LLMs’ understanding of different sections of a research paper.

Results: LLMs exhibited varying performance, with GPT-3.5-Turbo achieving the highest percentage of correct answers (n=3916, 66.9%), followed by GPT-4-1106 (n=3837, 65.6%), PaLM 2 (n=3632, 62.1%), Claude v1 (n=2887, 58.3%), Gemini Pro (n=2878, 49.2%), and GPT-4-0613 (n=2580, 44.1%). Statistical analysis revealed statistically significant differences between LLMs ( P <.001), with older models showing inconsistent performance compared to newer versions. LLMs showcased distinct performances for each question across different parts of a scholarly paper—with certain models like PaLM 2 and GPT-3.5 showing remarkable versatility and depth in understanding.

Conclusions: This study is the first to evaluate the performance of different LLMs in understanding medical papers using the retrieval augmented generation method. The findings highlight the potential of LLMs to enhance medical research by improving efficiency and facilitating evidence-based decision-making. Further research is needed to address limitations such as the influence of question formats, potential biases, and the rapid evolution of LLM models.

Introduction

Artificial intelligence (AI) has revolutionized numerous fields, including health care, with its potential to enhance patient outcomes, increase efficiency, and reduce costs [ 1 ]. AI devices are divided into 2 main categories. One category uses machine learning techniques to analyze structured data for medical applications, while the other category uses natural language processing methods to extract information from unstructured data, such as clinical notes, thereby improving the analysis of structured medical data [ 2 ]. A key development within natural language processing has been the emergence of large language models (LLMs), which are advanced systems trained on vast amounts of text data to generate human-like language and perform a variety of language-based tasks [ 3 ]. While deep learning models recognize patterns in data [ 4 ], LLMs are trained to predict the probability of a word sequence based on the context. By training on large amounts of text data, LLMs can generate new and plausible sequences of words that the model has not previously observed [ 4 ]. ChatGPT, an advanced conversational AI technology developed by OpenAI in late 2022, is a general-purpose LLM [ 5 ]. GPT is part of a growing landscape of conversational AI products, with other notable examples including Llama (Meta), Jurassic (Ai21), Claude (Anthropic), Command (Cohere), Gemini (formerly known as Bard), PaLM, and Bard (Google) [ 5 ]. The potential of AI systems to enhance medical care and health outcomes is highly promising [ 6 ]. Therefore, it is essential to ensure that the creation of AI systems in health care adheres to the principles of trust and explainability. Evaluating the medical knowledge of AI systems compared to that of expert clinicians is a vital initial step to assess these qualities [ 5 , 7 , 8 ].

Reading medical papers is a challenging and time-consuming task for doctors, especially when the papers are long and complex. This poses a significant barrier to efficient knowledge acquisition and evidence-based decision-making in health care. There is a need for a tool that can help doctors to process and understand medical papers more efficiently and accurately. Although LLMs are promising in evaluating patients, diagnosis, and treatment processes [ 9 ], studies on reading academic papers are limited. LLMs can be directly questioned and can generate answers from their own memory [ 10 , 11 ]. This has been extensively studied in many papers. However, these pose the problem of artificial hallucinations, which are inaccurate outputs, in LLMs. The retrieval augmented generation (RAG) method, which intuitively addresses the knowledge gap by conditioning language models on relevant documents retrieved from an external knowledge source, can be used to overcome this issue [ 12 ].

The STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) checklist provides a standardized framework for evaluating key elements of observational studies and sufficient information for critical evaluation. These guidelines consist of 22 items that authors should adhere to before submitting their manuscripts for publication [ 13 - 15 ]. This study aims to address this gap by evaluating the comprehension capabilities of LLMs in accurately and efficiently understanding medical research papers. We use the STROBE checklist to assess LLMs’ ability to understand different sections of research papers. This study uses a novel benchmark pipeline that can process PubMed papers regardless of their length using various generative AI tools. This research will provide critical insights into the strengths and weaknesses of different LLMs in enhancing medical research paper comprehension. To overcome the problem of “artificial hallucinations,” we implement the RAG method. RAG involves providing the LLMs with a prompt that instructs them to answer while staying relevant to the given document, ensuring responses align with the provided information. The results of this study will provide valuable information for medical professionals, researchers, and developers seeking to leverage the potential of LLMs for improving medical literature comprehension and ultimately enhance patient care and research efficiency.

Design of Study

This study uses a methodological research design to evaluate the comprehension capabilities of generative AI tools using the STROBE checklist.

Paper Selection

We included the first 50 observational studies conducted within the past 5 years that were retrieved through an advanced search on PubMed on December 19, 2023, using “obesity” in the title as the search term. The included studies were limited to those written in English, available as free full text, and focusing specifically on human participants ( Figure 1 ). The papers included in the study were statistically examined in detail, and a total of 11 of them were excluded because they were not observational studies. The study was completed with 39 papers. A post hoc power analysis was conducted to assess the statistical power of our study based on the total correct responses across all repetitions. The analysis excluded GPT-4-1106 and GPT-3.5-Turbo-1106 due to their similar performance and the significant differences observed between other models. The power analysis, conducted using G*Power (version 3.1.9.7; Heinrich-Heine-Universität Düsseldorf), indicated that all analyses exceeded 95% power. Thus, the study was completed with the 39 selected papers, ensuring sufficient statistical power to detect meaningful differences in LLM performance.

[Figure 1. Flow diagram of the paper selection process.]

Benchmark Development

This study used a novel benchmark pipeline to evaluate the understanding capabilities of LLMs when processing medical research papers. To establish a reference standard for evaluating the LLMs’ comprehension, we relied on the expertise of an experienced medical professor and an epidemiology expert doctor. The professor, with their extensive medical knowledge, was tasked with answering 15 questions derived from the STROBE checklist, designed to assess key elements of observational studies and cover different sections of a research paper ( Table 1 ). The epidemiology expert doctor, with their specialized knowledge in statistical analysis and epidemiological methods, provided verification and validation of the professor’s answers, ensuring the rigor of the benchmark. The combined expertise of both professionals provided a robust and reliable reference standard against which the LLMs’ responses were compared.

Table 1. Questions derived from the STROBE checklist.

Q1. Does the paper indicate the study’s design with a commonly used term in the title or the abstract?

Q2. What is the observational study type: cohort, case-control, or cross-sectional studies?

Q3. Were settings or locations mentioned in the method?

Q4. Were relevant dates mentioned in the method?

Q5. Were eligibility criteria for selecting participants mentioned in the method?

Q6. Were sources and methods of selection of participants mentioned in the method?

Q7. Were any efforts to address potential sources of bias described in the method or discussion?

Q8. Which program was used for statistical analysis?

Q9. Were report numbers of individuals at each stage of the study (eg, numbers potentially eligible, examined for eligibility, confirmed eligible, included in the study, completing follow-up, and analyzed) mentioned in the results?

Q10. Was a flowchart used to show the reported numbers of individuals at each stage of the study?

Q11. Were the study participants’ demographic characteristics (eg, age and sex) given in the results?

Q12. Does the discussion part summarize key results concerning study objectives?

Q13. Are the limitations of the study discussed in the paper?

Q14. Is the generalizability of the study discussed in the discussion part?

Q15. Is the funding of the study mentioned in the paper?

This list of 15 questions, 2 multiple-choice and 13 yes or no questions, has been prepared by selecting the STROBE checklist items that can be answered definitively and have clear, nonsubjective responses. Question 1, related to title and abstract, examines the LLMs’ ability to identify and understand research designs and terms that are commonly used, evaluating the model’s comprehension of the concise language typically used in titles and abstracts. Questions 2-8, related to methods, cover various aspects of the study’s methodology, from the type of observational study to the statistical analysis programs used. They test the model’s understanding of the detailed and technical language often found in this section. Questions 9-11, related to results, focus on the accuracy and completeness of reported results, such as participant numbers at each study stage and demographic characteristics. These questions gauge the LLMs’ capability to parse and summarize factual data. Questions 12-14, related to the discussion, involve summarizing key results, discussing limitations, and addressing the study’s generalizability. These questions assess the LLMs’ ability to engage with more interpretive and evaluative content, showcasing their understanding of research impacts and contexts. Question 15, related to funding, tests the LLMs’ attentiveness to specific yet crucial details that could influence the interpretation of research findings.

Development of Novel RAG-Based LLM Web Application

The methodology incorporated a novel web application specifically designed for this purpose to assess the understanding capabilities of generative AI tools in medical research papers ( Figure 2 ). To mitigate the problem of “artificial hallucinations” inherent to LLMs, this study implemented the RAG method, which involves using a web application to dissect PDF-format medical papers from PubMed into text chunks ready to be processed by various LLMs. This approach guides the LLMs to provide answers grounded in the provided information by supplying them with relevant text chunks retrieved from the target paper.

[Figure 2. Overview of the RAG-based web application used to process the papers.]

Benchmark Pipeline

The benchmark pipeline itself is designed to process PubMed papers of varying lengths and extract relevant information for analysis. This pipeline operates as follows (a minimal code sketch follows the list):

  • Paper retrieval: We retrieved 39 observational studies from PubMed using the search term “obesity” in the title.
  • Text extraction and chunking: Each retrieved PubMed paper was converted to PDF format and then processed through our web application. The application extracts all text content from the paper and divides it into smaller text chunks of manageable size.
  • Vector representation: Using the OpenAI text-embedding-ada-002 model, each text chunk was converted into a representation vector. These vectors capture the semantic meaning of the text chunks, allowing for efficient information retrieval.
  • Vector database storage: The generated representation vectors were stored in a vector database (LanceDB in our case). This database allows for rapid searching and retrieval of the most relevant text chunks based on a given query.
  • Query processing: When a query (question from the STROBE checklist) was posed to an LLM, our pipeline calculated the cosine similarities between the query’s representation vector and the vectors stored in the database. This identified the most relevant text chunks from the paper.
  • RAG: The retrieved text chunks, along with the original query, were then combined and presented to the LLM. This approach, known as RAG, ensured that the LLM’s responses were grounded in the specific information present in the paper, mitigating the risk of hallucinations.
  • Answer generation and evaluation: The LLM generated an answer to the query based on the provided text chunks. The accuracy of each LLM’s response was then evaluated by comparing it to the benchmark answers provided by a medical professor.
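
The sketch below is a hedged illustration of this retrieval-and-answer step. It uses the current OpenAI Python client and plain NumPy cosine similarity in place of the LanceDB vector store; the chunk size, chat model name, and prompt wording are simplifying assumptions rather than the study’s exact implementation.

```python
# Hedged sketch of the retrieval-augmented generation step described in the
# pipeline above. Requires the openai and numpy packages and an API key.
import numpy as np
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([item.embedding for item in resp.data])

def chunk(text: str, size: int = 1500) -> list[str]:
    # Fixed-size character chunks; the real application's chunking may differ.
    return [text[i:i + size] for i in range(0, len(text), size)]

def answer(paper_text: str, question: str, k: int = 3) -> str:
    chunks = chunk(paper_text)
    chunk_vecs = embed(chunks)
    q_vec = embed([question])[0]
    # Cosine similarity between the question and every chunk.
    sims = chunk_vecs @ q_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q_vec)
    )
    context = "\n\n".join(chunks[i] for i in np.argsort(sims)[-k:])
    resp = client.chat.completions.create(
        model="gpt-4-1106-preview",  # stand-in; the study queried 6 different LLMs
        temperature=0.1,
        messages=[
            {"role": "system", "content": "Answer using only the provided excerpts."},
            {"role": "user", "content": f"Excerpts:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content
```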

Using this benchmark pipeline, we compared the answers of the generative AI tools, namely GPT-3.5-Turbo-1106 (November 6, 2023 version), GPT-4-0613 (June 13, 2023 version), GPT-4-1106 (November 6, 2023 version), PaLM 2 (chat-bison), Claude v1, and Gemini Pro, with the benchmark on 15 questions for 39 medical research papers (Table 2). In this study, the 15 questions selected from the STROBE checklist were posed 10 times each for the 39 papers to the 6 different LLMs.

Generative AI tool | Version | Company | Cutoff date
GPT-3.5-Turbo | November 6, 2023 | OpenAI | September 2021
GPT-4-0613 | June 13, 2023 | OpenAI | September 2021
GPT-4-1106 | November 6, 2023 | OpenAI | April 2023
Claude v1 | Version 1 | Anthropic | Not stated a
PaLM 2 | Chat-bison | Google | Not stated a
Gemini Pro | 1.0 | Google | Not stated a

a The company does not explicitly state a cutoff date.

Access issues with Claude v1, specifically restrictions on its ability to process certain medical information, resulted in the exclusion of data from 6 papers, limiting the study’s scope to 33 papers. LLMs commonly provide a “knowledge-cutoff” date, indicating the point at which their training data ends and they may not have access to the most up-to-date information. With some LLMs, however, the company does not explicitly state a cutoff date. The explicitly stated cutoff dates are given in Table 2 , based on the publicly available information for each LLM.

A chatbot conversation begins when a user enters a query, or prompt. The chatbot responds in natural language within seconds, creating an interactive, conversation-like exchange. This is possible because the chatbot keeps track of context. In addition to the RAG method, providing LLMs with a well-designed system prompt, an initial instruction that guides them to stay relevant to a given document, can help generate responses that align with the provided information. We used the following system prompt for all LLMs:

You are an expert medical professor specialized in pediatric gastroenterology hepatology and nutrition, with a detailed understanding of various research methodologies, study types, ethical considerations, and statistical analysis procedures. Your task is to categorize research articles based on information provided in query prompts. There are multiple options for each question, and you must select the most appropriate one based on your expertise and the context of the research article presented in the query.

The language models used in this study rely on statistical models that incorporate random seeds to facilitate the generation of diverse outputs. However, the companies behind these LLMs do not offer a stable way to fix these seeds, meaning that a degree of randomness is inherent in their responses. To further control this randomness, we used the “temperature” parameter within the language models. This parameter allows for adjustment of the level of randomness, with a lower temperature setting generally producing more deterministic outputs. For this study, we opted for a low-temperature parameter setting of 0.1 to minimize the impact of randomness. Despite these efforts, complete elimination of randomness is not possible. To further mitigate its effects and enhance the consistency of our findings, we repeated each question 10 times for the same language model. By analyzing the responses across these 10 repetitions, we could determine the frequency of accurate and consistent answers. This approach helped to identify instances where the LLM’s responses were consistently aligned with the benchmark answers, highlighting areas of strength and consistency in comprehension.
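
A small sketch of this repetition procedure is given below; `answer` refers to the RAG helper sketched after the pipeline list, and the simple string normalization stands in for the manual answer checking described in the paper.

```python
# Sketch of the repetition procedure: each STROBE-derived question is posed
# 10 times at temperature 0.1 and compared with the benchmark answer.
def score_question(paper_text: str, question: str, expected: str, repeats: int = 10) -> int:
    """Return how many of the repeated answers match the benchmark answer."""
    replies = [answer(paper_text, question).strip().lower() for _ in range(repeats)]
    return sum(reply == expected.strip().lower() for reply in replies)
```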

Statistical Analysis

Each question was repeated 10 times in the same time period to obtain answers from multiple LLMs and ensure the consistency and reliability of responses. Consequently, the responses to the same question were analyzed to determine how many aligned with the benchmark, and the findings were examined. Only the answers that were correct and followed the instructions provided in the question text were considered “correct.” Ambiguous answers, evident mistakes, and responses with an excessive number of candidates were considered incorrect. The data were carefully examined, and the findings were documented and analyzed. Each inquiry and its response formed the basis of the analysis. Descriptive statistics were used to summarize the data as numbers and percentages. The Shapiro-Wilk test was used to assess the data’s normal distribution. The Kruskal-Wallis and Pearson chi-square tests were used in the statistical analysis. The type I error level was set at 5%, and the analyses were performed using SPSS (version 29.0; IBM Corp).
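
The analysis was carried out in SPSS; for readers who prefer an open-source route, an equivalent check with SciPy could look like the sketch below, in which the per-paper correct-answer counts are randomly generated placeholders.

```python
# Hedged sketch of the statistical comparisons with SciPy rather than SPSS.
# The counts are placeholders; only the choice of tests mirrors the analysis.
import numpy as np
from scipy.stats import shapiro, kruskal, chi2_contingency

rng = np.random.default_rng(0)
correct_counts = {  # correct answers per paper (out of 150 = 15 questions x 10 repeats)
    "gpt35_turbo": rng.integers(80, 140, size=39),
    "gpt4_1106": rng.integers(80, 140, size=39),
    "gemini_pro": rng.integers(50, 120, size=39),
}

for name, counts in correct_counts.items():
    print(name, "Shapiro-Wilk p =", round(shapiro(counts).pvalue, 4))

print("Kruskal-Wallis p =", kruskal(*correct_counts.values()).pvalue)

# Chi-square on aggregated correct vs. incorrect answers per model.
table = [[c.sum(), 150 * 39 - c.sum()] for c in correct_counts.values()]
chi2, p, dof, _ = chi2_contingency(table)
print("Chi-square p =", p)
```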

Ethical Considerations

This study only used information that had already been published on the internet. Ethics approval is not required for this study since it did not involve any human or animal research participants. This study did not involve a clinical trial, as it focused on evaluating the capabilities of AI tools in understanding medical papers.

In this study, 15 questions selected from the STROBE checklists were posed 10 times each for 39 papers to 6 different LLMs. Access issues with Claude v1, specifically restrictions on its ability to process certain medical information, resulted in the exclusion of data from 6 papers, limiting the study’s scope to 33 papers. The percentage of correct answers for each LLM is shown in Table 3 , with GPT-3.5-Turbo achieving the highest rate (n=3916, 66.9%), followed by GPT-4-1106 (n=3837, 65.6%), PaLM 2 (n=3632, 62.1%), Claude v1 (n=2887, 58.3%), Gemini Pro (n=2878, 49.2%), and GPT-4-0613 (n=2580, 44.1%).

LLM | Total questions asked | Correct answers, n (%)
GPT-3.5-Turbo-1106 | 5850 | 3916 (66.9)
GPT-4-0613 | 5850 | 2580 (44.1)
GPT-4-1106 | 5850 | 3837 (65.6)
Claude v1 | 4950 | 2887 (58.3)
PaLM 2-chat-bison | 5850 | 3632 (62.1)
Gemini Pro | 5850 | 2878 (49.2)

Each LLM was compared with the LLM that had the next-lowest percentage of correct answers. Statistical analysis using the Kruskal-Wallis test revealed statistically significant differences between the LLMs ( P <.001). The lowest correct answer percentage was provided by GPT-4-0613, at 44.1% (n=2580). Gemini Pro yielded 49.2% (n=2878) correct answers, significantly higher than GPT-4-0613 ( P <.001). Claude v1 yielded 58.3% (n=2887) correct answers, statistically significantly higher than Gemini Pro ( P <.001). PaLM 2 achieved 62.1% (n=3632) correct answers, significantly higher than Claude v1 ( P <.001). GPT-4-1106 achieved 65.6% (n=3837) correct answers, significantly higher than PaLM 2 ( P <.001). The difference between GPT-4-1106 and GPT-3.5-Turbo-1106 was not statistically significant ( P =.06). Of the 39 papers analyzed, 28 (71.8%) were published before the training data cutoff date for GPT-3.5-Turbo and GPT-4-0613, while all 39 (100%) papers were published before the cutoff date for GPT-4-1106. Explicit cutoff dates for the remaining LLMs (Claude, PaLM 2, and Gemini Pro) were not publicly available and therefore could not be assessed in this study. When all LLMs are collectively considered, the 3 questions receiving the highest percentage of correct answers were question 12 (n=4025, 68.3%), question 13 (n=3695, 62.8%), and question 10 (n=3565, 60.5%). Conversely, the 3 questions with the lowest percentage of correct responses were question 8 (n=1971, 33.5%), question 15 (n=2107, 35.8%), and question 1 (n=2147, 36.5%; Table 4 ).

Question | Correct answers (across all LLMs), n (%)
Q1 | 2147 (36.5)
Q2 | 3061 (52)
Q3 | 2953 (50.2)
Q4 | 2713 (46.2)
Q5 | 3353 (57.1)
Q6 | 3132 (53.3)
Q7 | 2530 (43)
Q8 | 1971 (33.5)
Q9 | 2288 (38.9)
Q10 | 3565 (60.5)
Q11 | 3339 (56.9)
Q12 | 4025 (68.3)
Q13 | 3695 (62.8)
Q14 | 2578 (43.8)
Q15 | 2107 (35.8)

The percentages of correct answers given by all LLMs for each question are depicted in Figure 3 . The median values for questions 7, 8, 9, 10, and 14 were similar across all LLMs, indicating a general consistency in performance for these specific areas of comprehension. However, significant differences were observed in the performance of different LLMs for other questions. The statistical tests used in this analysis were the Kruskal-Wallis test for comparing the medians of multiple groups and the chi-square test for comparing categorical data. For question 1, the fewest correct answers were provided by Claude (n=124, 24.8%) and Gemini Pro (n=197, 39.5%), while the most correct answers were provided by PaLM 2 (n=301, 60.3%; P =.01). In question 2, Claude v1 (n=366, 73.3%) achieved the highest median correct answer count (10.0, IQR 5.0-10.0), while Gemini Pro provided the fewest correct answers (n=237, 47.4%; P =.03). For question 3, GPT-3.5 (n=425, 85.1%) and PaLM 2 (n=434, 86.8%) had the highest median correct answer counts, while GPT-4-0613 (n=164, 32.8%) and Gemini Pro (n=189, 37.9%) had the lowest ( P <.001). In the fourth question, PaLM 2 (n=369, 73.8%), GPT-3.5 (n=293, 58.7%), and GPT-4-1106 (n=336, 67.2%) performed best, while GPT-4-0613 (n=187, 37.4%) showed the lowest performance ( P <.001). For questions 5 and 6, GPT-4-0613 (n=209, 41.8%) and Gemini Pro (n=186, 37.2%) provided fewer correct answers compared to the other LLMs ( P <.001 and P =.001, respectively). In question 11, GPT-4-1106 (n=406, 81.2%), Claude (n=347, 69.4%), and PaLM 2 (n=406, 81.2%) performed well, while Gemini Pro (n=264, 52.8%) had the fewest correct answers ( P =.001). For questions 12 and 13, all LLMs, except GPT-4-0613, performed well in these areas ( P <.001). In question 15, GPT-3.5 (n=368, 73.6%) showed the highest number of correct answers ( P <.001; Multimedia Appendix 1 ).


Principal Findings

AI can improve the data analysis and publication process in scientific research, but it can also be used to generate medical papers outright, including fraudulent ones [16]. Although such fraudulent papers may appear well crafted, their semantic inaccuracies and errors can be detected by expert readers upon closer examination [11,17]. The impact of LLMs on health care is often discussed in terms of their potential to replace health professionals, while their substantial impact on medical and research writing, along with the applications and limitations of that use, is often overlooked. Physicians involved in research therefore need to be cautious and verify information when using LLMs. Because overreliance on these tools can lead to ethical concerns and inaccuracies, the scientific community should remain vigilant about the accuracy and reliability of AI tools, using them as aids rather than replacements and understanding their limitations and biases [10,18]. With millions of papers published annually, AI could generate summaries or recommendations, simplifying the process of gathering evidence and enabling researchers to grasp the important aspects of scientific results more efficiently [18]. However, little research has focused on assessing how well LLMs comprehend academic papers.

This study aimed to evaluate the ability of 6 different LLMs to understand medical research papers using the STROBE checklist. We used a novel benchmark pipeline that processed 39 PubMed papers, posing 15 questions derived from the STROBE checklist to each model. The benchmark was established using the answers provided by an experienced medical professor and validated by an epidemiologist, serving as a reference standard against which the LLMs’ responses were compared. To mitigate the problem of “artificial hallucinations” inherent to LLMs, our study implemented the RAG method, which involves using a web application to dissect PDF-format medical papers into text chunks and present them to the LLMs.
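For readers unfamiliar with the approach, the following is a minimal sketch of the RAG pattern described here: a PDF is split into overlapping text chunks, and the chunks most relevant to a checklist question are retrieved before being passed to an LLM. The libraries (pypdf, scikit-learn), the TF-IDF retriever, the chunking parameters, and the file name are illustrative assumptions; they are not the web application or retriever used in this study.

```python
# Minimal sketch of the RAG pattern described above (not the study's pipeline):
# split a PDF into overlapping text chunks, then retrieve the chunks most
# relevant to a STROBE-derived question before passing them to an LLM.
from pypdf import PdfReader
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def pdf_to_chunks(path: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Extract text from a PDF and split it into overlapping character chunks."""
    text = " ".join(page.extract_text() or "" for page in PdfReader(path).pages)
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text), 1), step)]


def retrieve(chunks: list[str], question: str, top_k: int = 3) -> list[str]:
    """Rank chunks by TF-IDF cosine similarity to the question (illustrative retriever)."""
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform(chunks + [question])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    return [chunks[i] for i in scores.argsort()[::-1][:top_k]]


if __name__ == "__main__":
    chunks = pdf_to_chunks("example_paper.pdf")  # hypothetical file name
    question = "Does the abstract indicate the study's design with a commonly used term?"
    context = "\n\n".join(retrieve(chunks, question))
    # The retrieved context would then be placed in the LLM prompt so that the
    # model answers from the paper's text rather than from memory.
    print(context[:500])
```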

Our findings reveal significant variation in the performance of different LLMs, suggesting that LLMs are capable of understanding medical papers to varying degrees. While newer models such as GPT-3.5-Turbo and GPT-4-1106 generally demonstrated better comprehension, GPT-3.5-Turbo outperformed even the more recent GPT-4-0613 in certain areas. This unexpected finding highlights the complexity of LLM performance and indicates that the assumption that newer models consistently outperform older ones does not always hold. The training data cutoff is a critical consideration when evaluating an LLM's ability to understand medical research [19]. While we were able to obtain explicitly stated cutoff dates for GPT-3.5-Turbo, GPT-4-1106, and GPT-4-0613, this information was not readily available for the remaining models. This lack of transparency regarding training data limits our ability to definitively assess the impact of knowledge cutoffs on model performance. The observation that all 39 papers were published before the cutoff date for GPT-4-1106, whereas only 28 were published before the cutoff date for GPT-3.5-Turbo and GPT-4-0613, suggests that the knowledge cutoff may play a role in the observed performance differences. GPT-4-1106, with a more recent knowledge cutoff, has access to a larger data set, potentially including more recently published research, which could contribute to its generally better performance compared with GPT-3.5-Turbo. However, GPT-3.5-Turbo still outperformed GPT-4-0613 in specific areas despite a similar knowledge cutoff. This suggests that factors beyond training data, such as model architecture (eg, the number of layers, the type of attention mechanism, or the transformer variant used) and compression techniques (eg, quantization, pruning, or knowledge distillation), may also play a significant role in LLM performance. Future research should prioritize transparency regarding training data cutoffs and aim to standardize how LLMs communicate these crucial details to users.

This study evaluated the performance of various LLMs in accurately answering specific questions related to different sections of a scholarly paper: title and abstract, methods, results, discussion, and funding. The results shed light on which LLMs excel in specific areas of comprehension and information retrieval from academic texts. PaLM 2 (n=219, 60.3%) showed superior performance on question 1, identifying the study design from the title or abstract, suggesting an enhanced capability to understand and identify specific terminology. Claude (n=82, 24.8%) and Gemini Pro (n=154, 39.5%), however, lagged behind, indicating a potential area for improvement in terminology recognition and interpretation. Claude v1 (n=242, 73.3%) and PaLM 2 (n=295, 86.8%) exhibited strong capabilities in identifying methodological details, such as observational study types and settings or locations (questions 2-8), suggesting a robust understanding of complex methodological descriptions and an ability to distinguish between different study frameworks. For questions on the results section (questions 9-11), GPT-4-1106 (n=317, 81.3%), Claude (n=229, 69.4%), and PaLM 2 (n=276, 81.2%) showed superior performance in answering questions about the study participants' demographic characteristics and the use of flowcharts. All LLMs except GPT-4-0613 (n=89, 22.8%) exhibited remarkable competence in summarizing key results, discussing limitations, and addressing the generalizability of the study (questions 12-14), which are critical aspects of the discussion section. GPT-3.5 (n=287, 73.6%) particularly excelled in identifying the mention of funding (question 15), indicating an ability to locate funding disclosures, which are often nuanced and embedded toward the end of papers. Across the array of tested questions, both GPT-3.5 and PaLM 2 exhibited notable strengths in understanding and analyzing scholarly papers, with PaLM 2 generally showing a slight edge in versatility, especially in interpreting methodological details and study design. GPT-3.5, while strong in discussing study limitations, generalizability, and funding details, showed room for improvement in extracting complex methodological information. Different models excelled in different areas, indicating that no single LLM currently demonstrates universal dominance in medical paper understanding. This suggests that factors such as training data, model architecture, and question complexity influence performance, and further research is needed to understand the specific contribution of each factor.

Comparison to Prior Work

LLMs can be questioned directly and can generate answers from their own memory [11], an approach that has been studied extensively in the medical literature. In one study, ChatGPT was evaluated on the United States Medical Licensing Examination; it performed at or near the passing threshold without any specialized training and demonstrated a high level of concordance and insight in its explanations. These findings suggest that LLMs have the potential to aid in medical education and potentially assist with clinical decision-making [5,20]. Another study evaluated the knowledge level of GPT in medical education by assessing its performance on a multiple-choice question examination and its potential impact on the medical examination system; GPT achieved a satisfactory score in both basic and clinical medical sciences, highlighting its potential as an educational tool for medical students and faculties [21]. Furthermore, GPT offers information and aids health care professionals in diagnosing patients by analyzing symptoms and suggesting appropriate tests or treatments, although advancements are required to ensure AI's interpretability and practical implementation in clinical settings [8]. A study conducted in October 2023 explored the diagnostic capabilities of GPT-4V in complex clinical scenarios involving medical imaging and textual patient data; GPT-4V had the highest diagnostic accuracy when provided with multimodal inputs, aligning with confirmed diagnoses in 80.6% of cases [22]. In another study, which evaluated the effectiveness of GPT-4 in solving complex medical case challenges, the model was instructed to address each case with multiple-choice questions followed by an unedited clinical case report; GPT-4 correctly diagnosed 57% of the cases, outperforming 99.98% of the human readers tasked with the same challenge [23]. These studies highlight the potential of multimodal AI models like GPT-4 in clinical diagnostics, but further investigation is needed to uncover biases and limitations arising from the models' proprietary training data and architecture.

There are few studies in which LLMs are directly questioned and their capacity to produce answers from their own memory is compared with that of other LLMs and expert clinicians. In one study, GPT-3.5 and GPT-4 were compared with orthopedic residents on the American Board of Orthopaedic Surgery written examination: residents scored higher overall, although a subgroup analysis revealed that GPT-3.5 and GPT-4 outperformed residents on text-only questions, while residents scored higher on image interpretation questions; GPT-4 scored higher than GPT-3.5 [24]. Another study evaluated and compared the recommendations provided by GPT-3.5 and GPT-4 with those of primary care physicians for the management of depressive episodes. Both models largely aligned with accepted guidelines for treating mild and severe depression and did not show the gender or socioeconomic biases observed among primary care physicians; however, further research is needed to refine the AI recommendations for severe cases and to address potential ethical concerns and risks associated with their use in clinical decision-making [25]. Another study assessed the accuracy and comprehensiveness of health information about urinary incontinence generated by various LLMs; by inputting selected questions into GPT-3.5, GPT-4, and Gemini, the researchers found that GPT-4 performed best in terms of accuracy and comprehensiveness, surpassing GPT-3.5 and Gemini [26]. In a study evaluating the performance of 2 GPT models (GPT-3.5 and GPT-4) and human professionals on ophthalmology questions from the StatPearls question bank, GPT-4 outperformed both GPT-3.5 and human professionals on most questions, showing significant performance improvements and emphasizing the potential of advanced AI technology in ophthalmology [27]. Other studies have shown that GPT-4 is more proficient than GPT-3.5, scoring higher on both multiple-choice dermatology examinations and non-multiple-choice cardiology heart failure questions from various sources and outperforming GPT-3.5 and Flan-PaLM 540B on medical competency assessments and benchmark data sets [28-30]. In a study of the nephrology multiple-choice test-taking ability of various open-source and proprietary LLMs, performance on 858 nephSAP questions ranged from 17.1% to 30.6% for the open-source models, with Claude 2 reaching 54.4% accuracy and GPT-4 reaching 73.3%, highlighting the potential for adaptation to medical training and patient care scenarios [31]. To our knowledge, this is the first study to assess the capabilities of different LLMs in evaluating and understanding medical papers. The findings reveal that performance varies across questions, with some LLMs showing superior understanding and answer accuracy in certain areas. Comparative analysis across the LLMs showed a gradient of capabilities, with the following hierarchical ranking: GPT-4-1106 is comparable to GPT-3.5-Turbo, which is superior to PaLM 2, followed by Claude v1, then Gemini Pro, and finally GPT-4-0613. Consistent with the literature, GPT-4-1106 and GPT-3.5 showed improved accuracy and understanding compared with the other LLMs, mirroring wider trends of LLMs' rapid evolution and increasing sophistication in handling complex medical queries.
Notably, GPT-3.5-Turbo performed better than GPT-4-0613, which may seem counterintuitive given the tendency to assume that newer iterations naturally perform better. This anomaly may be attributable to the compression techniques applied when developing new models to reduce computational costs; while such advancements make deploying LLMs more cost-effective and thus accessible, they can inadvertently compromise performance. The notable absence of responses from PaLM in certain instances, which stems from Google's policy of restricting the use of its models for medical information, presents an intriguing case within the scope of our discussion. Despite these constraints, PaLM's high performance in other areas is both surprising and promising. This suggests that even when faced with limitations on accessing a vast repository of medical knowledge, PaLM's underlying architecture and algorithms enable it to make effective use of the information it can access, showcasing the robust potential of LLMs in medical settings even under restricted conditions.

Strengths and Limitations

While LLMs can be directly questioned and generate answers from their own memory, as demonstrated in the studies above, this approach can lead to inaccuracies known as hallucinations. Hallucinations in LLMs have diverse origins spanning the entire capability acquisition process and are primarily categorized into 3 aspects: training, inference, and data. Architecture flaws, exposure bias, and misalignment issues in both the pretraining and alignment phases can induce hallucinations. To address this challenge, our study used the RAG method, ensuring that the LLMs' responses were grounded in factual information retrieved from the target paper. RAG addresses the knowledge gap by conditioning language models on relevant documents retrieved from an external knowledge source [12,32]; in our setup, it provides the LLM with relevant text chunks extracted from the specific paper being analyzed, so that the LLM's responses are directly supported by the provided information, reducing the risk of hallucination. While a few studies have explored the use of RAG to compare LLMs, such as one demonstrating GPT-4's improved accuracy with RAG for interpreting oncology guidelines [33], our study is the first to evaluate LLM comprehension of medical research papers using this method. The design of system prompts is also crucial, as it provides the context, instructions, and formatting guidelines that shape the desired output [34]. In this study, we empirically determined a foundational system prompt and set of query prompts that universally enhanced response quality across all language models tested. This approach was designed to optimize the comprehension and summarization capabilities of each generative AI tool when processing medical research papers. The specific configuration of system settings and query structures we identified significantly improved the accuracy and relevance of the models' answers, and these optimized parameters were crucial in achieving a more standardized and reliable evaluation of each model's ability to understand complex medical texts. While further research is needed to fully understand the effectiveness of RAG across different medical scenarios, our findings demonstrate its potential to enhance the reliability and accuracy of LLMs in medical research comprehension.
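As an illustration of the kind of prompt structure this refers to, the sketch below assembles a fixed system prompt and a query that embeds the retrieved excerpts together with a STROBE-derived question. The wording of the prompt and the helper function are hypothetical examples, not the exact prompts used in the study.

```python
# Hypothetical illustration of the prompt structure described above: a fixed
# system prompt plus a query that embeds the retrieved text chunks, so the
# model is instructed to answer only from the provided excerpts.
SYSTEM_PROMPT = (
    "You are an assistant that answers questions about a medical research paper. "
    "Use only the provided excerpts; if the excerpts do not contain the answer, reply 'Not reported'. "
    "Answer with the requested option only, without additional commentary."
)


def build_messages(question: str, retrieved_chunks: list[str]) -> list[dict]:
    """Assemble a chat-style message list grounding the question in retrieved text."""
    context = "\n\n".join(f"[Excerpt {i + 1}]\n{c}" for i, c in enumerate(retrieved_chunks))
    user_prompt = (
        f"Excerpts from the paper:\n{context}\n\n"
        f"Question (derived from the STROBE checklist): {question}\n"
        "Options: (a) Yes (b) No"
    )
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ]


# Example usage with a hypothetical chunk; the resulting message list can be
# sent to any chat-completion style LLM API.
messages = build_messages(
    "Is the source of funding stated in the paper?",
    ["Funding: this work was supported by institutional resources."],
)
```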

This study, while offering valuable insights, is subject to several limitations. The selection of 50 papers focused on obesity and the use of a specific set of 15 STROBE-derived questions might not fully capture the breadth of medical research. In addition, the reliance on binary and multiple-choice questions restricts the evaluation of the LLMs' ability to provide nuanced answers. The rapid evolution of LLMs means that these findings might not apply to future versions, and potential biases within the training data were not systematically assessed. Furthermore, the study's reliance on a single highly experienced medical professor to establish the benchmark might limit the generalizability of the findings; a larger panel of experts with diverse areas of specialization could provide a more comprehensive reference standard for evaluating LLM performance. Further investigation with a wider scope and more advanced methodologies is needed to fully understand the potential of LLMs in medical research.

Future Directions

In conclusion, LLMs show promise for transforming medical research, potentially enhancing research efficiency and evidence-based decision-making. This study demonstrates that LLMs exhibit varying capabilities in understanding medical research papers. While newer models generally demonstrate better comprehension, no single LLM currently excels in all areas. This highlights the need for further research to understand the complex interplay of factors influencing LLM performance. Continued research is crucial to address these limitations and ensure the safe and effective integration of LLMs in health care, maximizing their benefits while mitigating risks.

Acknowledgments

The authors gratefully acknowledge Dr Hilal Duzel for her invaluable assistance in validating the reference standard used in this study. Dr Duzel’s expertise in epidemiology and statistical analysis ensured the accuracy and robustness of the benchmark against which the LLMs were evaluated. We would also like to thank Ahmet Hamza Dogan, a promising future engineer, for his contributions to the LLM analysis.

Conflicts of Interest

None declared.

Multimedia Appendix 1. Percentages of correct answers by large language models for each question.

  1. Lv Z. Generative artificial intelligence in the metaverse era. Cogn Robot. 2023;3:208-217. [CrossRef]
  2. Jiang F, Jiang Y, Zhi H, Dong Y, Li H, Ma S, et al. Artificial intelligence in healthcare: past, present and future. Stroke Vasc Neurol. Dec 2017;2(4):230-243. [FREE Full text] [CrossRef] [Medline]
  3. Orrù G, Piarulli A, Conversano C, Gemignani A. Human-like problem-solving abilities in large language models using ChatGPT. Front Artif Intell. 2023;6:1199350. [FREE Full text] [CrossRef] [Medline]
  4. Chenais G, Gil-Jardiné C, Touchais H, Avalos Fernandez M, Contrand B, Tellier E, et al. Deep learning transformer models for building a comprehensive and real-time trauma observatory: development and validation study. JMIR AI. Jan 12, 2023;2:e40843. [FREE Full text] [CrossRef] [Medline]
  5. Kung TH, Cheatham M, Medenilla A, Sillos C, De Leon L, Elepaño C, et al. Performance of ChatGPT on USMLE: potential for AI-assisted medical education using large language models. PLOS Digit Health. Feb 2023;2(2):e0000198. [FREE Full text] [CrossRef] [Medline]
  6. Dwivedi YK, Kshetri N, Hughes L, Slade EL, Jeyaraj A, Kar AK, et al. Opinion paper: "So what if ChatGPT wrote it?" Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy. Int J Inform Manage. Aug 2023;71:102642. [CrossRef]
  7. Akyon SH, Akyon FC, Yılmaz TE. Artificial intelligence-supported web application design and development for reducing polypharmacy side effects and supporting rational drug use in geriatric patients. Front Med (Lausanne). 2023;10:1029198. [FREE Full text] [CrossRef] [Medline]
  8. Kermany DS, Goldbaum M, Cai W, Valentim CC, Liang H, Baxter SL, et al. Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell. 2018;172(5):1122-1131.e9. [FREE Full text] [CrossRef] [Medline]
  9. Preiksaitis C, Ashenburg N, Bunney G, Chu A, Kabeer R, Riley F, et al. The role of large language models in transforming emergency medicine: scoping review. JMIR Med Inform. May 10, 2024;12:e53787. [FREE Full text] [CrossRef] [Medline]
  10. Kumar M, Mani UA, Tripathi P, Saalim M, Roy S. Artificial hallucinations by Google Bard: think before you leap. Cureus. Aug 2023;15(8):e43313. [FREE Full text] [CrossRef] [Medline]
  11. Májovský M, Černý M, Kasal M, Komarc M, Netuka D. Artificial intelligence can generate fraudulent but authentic-looking scientific medical articles: Pandora's box has been opened. J Med Internet Res. May 31, 2023;25:e46924. [FREE Full text] [CrossRef] [Medline]
  12. Shuster K, Poff S, Chen M, Kiela D, Weston J. Retrieval augmentation reduces hallucination in conversation. ArXiv. Preprint posted online on April 15, 2021. [FREE Full text] [CrossRef]
  13. Cuschieri S. The STROBE guidelines. Saudi J Anaesth. Apr 2019;13(Suppl 1):S31-S34. [FREE Full text] [CrossRef] [Medline]
  14. von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP, et al. STROBE Initiative. Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. BMJ. Oct 20, 2007;335(7624):806-808. [FREE Full text] [CrossRef] [Medline]
  15. STROBE Checklist: cohort, case-control, and cross-sectional studies (combined). URL: https://www.strobe-statement.org/download/strobe-checklist-cohort-case-control-and-cross-sectional-studies-combined [accessed 2023-12-28]
  16. Chen T. ChatGPT and other artificial intelligence applications speed up scientific writing. J Chin Med Assoc. Apr 01, 2023;86(4):351-353. [CrossRef] [Medline]
  17. Kitamura FC. ChatGPT is shaping the future of medical writing but still requires human judgment. Radiology. Apr 2023;307(2):e230171. [CrossRef] [Medline]
  18. De Angelis L, Baglivo F, Arzilli G, Privitera GP, Ferragina P, Tozzi AE, et al. ChatGPT and the rise of large language models: the new AI-driven infodemic threat in public health. Front Public Health. 2023;11:1166120. [FREE Full text] [CrossRef] [Medline]
  19. Giannakopoulos K, Kavadella A, Aaqel Salim A, Stamatopoulos V, Kaklamanos EG. Evaluation of the performance of generative AI large language models ChatGPT, Google Bard, and Microsoft Bing Chat in supporting evidence-based dentistry: comparative mixed methods study. J Med Internet Res. Dec 28, 2023;25:e51580. [FREE Full text] [CrossRef] [Medline]
  20. Wong RS, Ming LC, Raja Ali RA. The intersection of ChatGPT, clinical medicine, and medical education. JMIR Med Educ. Nov 21, 2023;9:e47274. [FREE Full text] [CrossRef] [Medline]
  21. Meo SA, Al-Masri AA, Alotaibi M, Meo MZS, Meo MOS. ChatGPT knowledge evaluation in basic and clinical medical sciences: multiple choice question examination-based performance. Healthcare (Basel). Jul 17, 2023;11(14):2046. [FREE Full text] [CrossRef] [Medline]
  22. Schubert MC, Lasotta M, Sahm F, Wick W, Venkataramani V. Evaluating the multimodal capabilities of generative AI in complex clinical diagnostics. medRxiv. 2023. [CrossRef]
  23. Eriksen AV, Möller S, Ryg J. Use of GPT-4 to diagnose complex clinical cases. NEJM AI. Dec 11, 2023;1(1):AIp2300031. [CrossRef]
  24. Massey PA, Montgomery C, Zhang AS. Comparison of ChatGPT-3.5, ChatGPT-4, and orthopaedic resident performance on orthopaedic assessment examinations. J Am Acad Orthop Surg. Dec 01, 2023;31(23):1173-1179. [FREE Full text] [CrossRef] [Medline]
  25. Levkovich I, Elyoseph Z. Identifying depression and its determinants upon initiating treatment: ChatGPT versus primary care physicians. Fam Med Community Health. Sep 2023;11(4):e002391. [FREE Full text] [CrossRef] [Medline]
  26. Coşkun B, Bayrak O, Ocakoglu G, Acar HM, Kaygisiz O. Assessing the accuracy of AI language models in providing information on urinary incontinence: a comparative study. Eur J Public Health. 2023;3(3):61-70. [FREE Full text] [CrossRef]
  27. Moshirfar M, Altaf AW, Stoakes IM, Tuttle JJ, Hoopes PC. Artificial intelligence in ophthalmology: a comparative analysis of GPT-3.5, GPT-4, and human expertise in answering StatPearls questions. Cureus. Jun 2023;15(6):e40822. [FREE Full text] [CrossRef] [Medline]
  28. King RC, Samaan JS, Yeo YH, Mody B, Lombardo DM, Ghashghaei R. Appropriateness of ChatGPT in answering heart failure related questions. Heart Lung Circ. May 30, 2024:1-5. [FREE Full text] [CrossRef] [Medline]
  29. Passby L, Jenko N, Wernham A. Performance of ChatGPT on Specialty Certificate Examination in Dermatology multiple-choice questions. Clin Exp Dermatol. Jun 25, 2024;49(7):722-727. [CrossRef] [Medline]
  30. Nori H, King N, McKinney S, Carignan D, Horvitz E. Capabilities of GPT-4 on medical challenge problems. ArXiv. Preprint posted online on April 12, 2023. [FREE Full text] [CrossRef]
  31. Wu S, Koo M, Blum L, Black A, Kao L, Fei Z, et al. Benchmarking open-source large language models, GPT-4 and Claude 2 on multiple-choice questions in nephrology. NEJM AI. Jan 25, 2024;1(2):AIdbp2300092. [CrossRef]
  32. Lewis P, Perez E, Piktus A, Petroni F, Karpukhin V, Goyal N, et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. ArXiv. Preprint posted online on April 12, 2021. [FREE Full text] [CrossRef]
  33. Ferber D, Wiest IC, Wölflein G, Ebert MP, Beutel G, Eckardt J, et al. GPT-4 for information retrieval and comparison of medical oncology guidelines. NEJM AI. May 23, 2024;1(6):AIcs2300235. [CrossRef]
  34. Chen Q, Sun H, Liu H, Jiang Y, Ran T, Jin X, et al. An extensive benchmark study on biomedical text generation and mining with ChatGPT. Bioinformatics. Sep 02, 2023;39(9):btad557. [FREE Full text] [CrossRef] [Medline]

Abbreviations

AI: artificial intelligence
LLM: large language model
RAG: retrieval augmented generation
STROBE: Strengthening the Reporting of Observational Studies in Epidemiology

Edited by A Castonguay; submitted 07.04.24; peer-reviewed by C Wang, S Mao, W Cui; comments to author 04.06.24; revised version received 16.06.24; accepted 05.07.24; published 04.09.24.

©Seyma Handan Akyon, Fatih Cagatay Akyon, Ahmet Sefa Camyar, Fatih Hızlı, Talha Sari, Şamil Hızlı. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 04.09.2024.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
