Data collection in research: Your complete guide

Last updated: 31 January 2023

Reviewed by: Cathy Heath


In the late 16th century, Francis Bacon coined the phrase "knowledge is power," which implies that knowledge is a powerful force, like physical strength. In the 21st century, knowledge in the form of data is unquestionably powerful.

But data isn't something you just have - you need to collect it. This means utilizing a data collection process and turning the collected data into knowledge that you can leverage into a successful strategy for your business or organization.

Believe it or not, there's more to data collection than just conducting a Google search. In this complete guide, we shine a spotlight on data collection, outlining what it is, types of data collection methods, common challenges in data collection, data collection techniques, and the steps involved in data collection.


What is data collection?

Data collection is the process of gathering information from relevant sources so that it can be analyzed and acted upon. There are two broad approaches: primary and secondary data collection. Primary data collection is the process of gathering data directly from sources. It's often considered the most reliable data collection method, as researchers can collect information directly from respondents.

Secondary data collection involves using data that has already been collected by someone else and is readily available. This data is usually less expensive and quicker to obtain than primary data.

What are the different methods of data collection?

There are several data collection methods, which can be either manual or automated. Manual data collection typically involves recording data by hand, with pen and paper, while automated data collection uses software to gather data from online sources, such as social media, websites, and transaction records.

Here are the five most popular methods of data collection:

Surveys

Surveys are a very popular method of data collection that organizations can use to gather information from many people. Researchers can conduct multi-mode surveys that reach respondents in different ways, including in person, by mail, over the phone, or online.

As a method of data collection, surveys have several advantages. For instance, they are relatively quick and easy to administer, you can be flexible in what you ask, and they can be tailored to collect data on various topics or from certain demographics.

However, surveys also have several disadvantages. For instance, they can be expensive to administer, and the results may not represent the population as a whole. Additionally, survey data can be challenging to interpret. It may also be subject to bias if the questions are not well-designed or if the sample of people surveyed is not representative of the population of interest.

Interviews

Interviews are a common method of collecting data in social science research. You can conduct interviews in person, over the phone, or even via email or online chat.

Interviews are a great way to collect qualitative and quantitative data. Qualitative interviews are likely your best option if you need to collect detailed information about your subjects' experiences or opinions. If you need to collect more generalized data about your subjects' demographics or attitudes, then quantitative interviews may be a better option.

Interviews are relatively quick and very flexible, allowing you to ask follow-up questions and explore topics in more depth. The downside is that interviews can be time-consuming and expensive due to the amount of information to be analyzed. They are also prone to bias, as both the interviewer and the respondent may have certain expectations or preconceptions that may influence the data.

Direct observation

Observation is a direct way of collecting data. It can be structured (with a specific protocol to follow) or unstructured (simply observing without a particular plan).

Organizations and businesses use observation as a data collection method to gather information about their target market, customers, or competition. Businesses can learn about consumer behavior, preferences, and trends by observing people using their products or services.

There are two types of observation: participatory and non-participatory. In participatory observation, the researcher is actively involved in the observed activities. This type of observation is used in ethnographic research, where the researcher wants to understand a group's culture and social norms. Non-participatory observation is when researchers observe from a distance and do not interact with the people or environment they are studying.

There are several advantages to using observation as a data collection method. It can provide insights that may not be apparent through other methods, such as surveys or interviews. Researchers can also observe behavior in a natural setting, which can provide a more accurate picture of what people do and how and why they behave in a certain context.

There are some disadvantages to using observation as a method of data collection. It can be time-consuming, intrusive, and expensive to observe people for extended periods. Observations can also be tainted if the researcher is not careful to avoid personal biases or preconceptions.

Automated data collection

Business applications and websites are increasingly collecting data electronically to improve the user experience or for marketing purposes.

There are a few different ways that organizations can collect data automatically. One way is through cookies, which are small pieces of data stored on a user's computer. They track a user's browsing history and activity on a site, measuring levels of engagement with a business’s products or services, for example.

Another way organizations can collect data automatically is through web beacons. Web beacons are small images embedded on a web page to track a user's activity.

Finally, organizations can also collect data through mobile apps, which can track user location, device information, and app usage. This data can be used to improve the user experience and for marketing purposes.

Automated data collection is a valuable tool for businesses, helping improve the user experience or target marketing efforts. Businesses should aim to be transparent about how they collect and use this data.

Sourcing data through information service providers

Organizations need to be able to collect data from a variety of sources, including social media, weblogs, and sensors. The process of collecting this data, and then acting on it, needs to be efficient, targeted, and meaningful.

In the era of big data, organizations are increasingly turning to information service providers (ISPs) and other external data sources to help them collect data to make crucial decisions. 

Information service providers help organizations collect data by offering personalized services that suit the specific needs of the organizations. These services can include data collection, analysis, management, and reporting. By partnering with an ISP, organizations can gain access to the newest technology and tools to help them to gather and manage data more effectively.

There are also several tools and techniques that organizations can use to collect data from external sources, such as web scraping, which collects data from websites, and data mining, which involves using algorithms to extract data from large data sets. 

Organizations can also use APIs (application programming interfaces) to collect data from external sources. APIs allow organizations to access data stored in another system and share and integrate it into their own systems.
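For illustration, here is a minimal sketch of API-based collection using Python's requests library. The endpoint, query parameters, and field names are hypothetical; a real integration would follow the provider's documented API.

```python
# Minimal sketch: pulling records from a hypothetical external API and saving them locally.
# The endpoint, parameters, and field names below are illustrative, not a real provider's API.
import csv
import requests

API_URL = "https://api.example-provider.com/v1/records"  # hypothetical endpoint
PARAMS = {"topic": "customer-feedback", "page_size": 100}

def fetch_records(url: str, params: dict) -> list[dict]:
    """Request one page of JSON records and return them as a list of dicts."""
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json().get("results", [])

def save_records(records: list[dict], path: str) -> None:
    """Write the collected records to a CSV file for later analysis."""
    if not records:
        return
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=records[0].keys())
        writer.writeheader()
        writer.writerows(records)

if __name__ == "__main__":
    records = fetch_records(API_URL, PARAMS)
    save_records(records, "collected_records.csv")
    print(f"Collected {len(records)} records")
```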

Finally, organizations can also use manual methods to collect data from external sources, such as contacting companies or individuals directly to request the data they need.

What are common challenges in data collection?

There are many challenges that researchers face when collecting data. Here are five common examples:

Big data environments

Data collection can be a challenge in big data environments for several reasons. First, data can be located in many different places, such as archives, libraries, or online, and the sheer volume of data can make it difficult to identify the most relevant data sets.

Second, the complexity of data sets can make it challenging to extract the desired information. Third, the distributed nature of big data environments can make it difficult to collect data promptly and efficiently.

It is therefore important to have a well-designed data collection strategy that considers the specific needs of the organization and identifies the most relevant data sets. Alongside this, consideration should be given to the tools and resources available to support data collection and to protect the data from unintended use.

Data bias

Data bias is a common challenge in data collection. It occurs when data is collected from a sample that is not representative of the population of interest.

There are different types of data bias, but some common ones include selection bias, self-selection bias, and response bias. Selection bias occurs when the way the sample is chosen means the collected data does not represent the population being studied. For example, if a study only includes data from people who volunteer to participate, that data may not represent the general population.

Self-selection bias can also occur when people self-select into a study, such as by taking part only if they think they will benefit from it. Response bias happens when people respond in a way that is not honest or accurate, such as by only answering questions that make them look good. 

These types of data bias present a challenge because they can lead to inaccurate results and conclusions about behaviors, perceptions, and trends. Data bias can be avoided by identifying potential sources or themes of bias and setting guidelines for eliminating them.

Lack of quality assurance processes

One of the biggest challenges in data collection is the lack of quality assurance processes. This can lead to several problems, including incorrect data, missing data, and inconsistencies between data sets.

Quality assurance is important because there are many data sources, and each source may have different levels of quality or corruption. There are also different ways of collecting data, and data quality may vary depending on the method used. 

There are several ways to improve quality assurance in data collection. These include developing clear and consistent goals and guidelines for data collection, implementing quality control measures, using standardized procedures, and employing data validation techniques. By taking these steps, you can ensure that your data is of adequate quality to inform decision-making.
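As a minimal sketch of what automated data validation might look like (assuming a tabular dataset with illustrative column names and ranges), the following flags missing values, duplicate records, and implausible values:

```python
# Minimal sketch of automated data validation checks on a collected dataset.
# Column names and the valid age range are illustrative assumptions, not a fixed standard.
import pandas as pd

def validate(df: pd.DataFrame) -> dict:
    """Run basic quality checks and return a summary of problems found."""
    return {
        "missing_values": df.isna().sum().to_dict(),                        # incomplete records
        "duplicate_rows": int(df.duplicated().sum()),                        # repeated entries
        "age_out_of_range": int((~df["age"].dropna().between(0, 120)).sum()),  # implausible values
    }

df = pd.DataFrame({
    "respondent_id": [1, 2, 2, 4],
    "age": [34, 150, 150, None],
    "response": ["yes", "no", "no", "yes"],
})
print(validate(df))
```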

Limited access to data

Another challenge in data collection is limited access to data. This can happen for several reasons, including privacy concerns, the sensitive nature of the data, security concerns, or simply the fact that the data is not readily available.

Legal and compliance regulations

Most countries have regulations governing how data can be collected, used, and stored. In some cases, data collected in one country may not be used in another. This means gaining a global perspective can be a challenge. 

For example, if a company is required to comply with the EU General Data Protection Regulation (GDPR), it may not be able to collect data from individuals in the EU without their explicit consent. This can make it difficult to collect data from a target audience.

Legal and compliance regulations can be complex, and it's important to ensure that all data collected is done so in a way that complies with the relevant regulations.

What are the key steps in the data collection process?

There are five steps involved in the data collection process. They are:

1. Decide what data you want to gather

Have a clear understanding of the questions you are asking, and then consider where the answers might lie and how you might obtain them. This saves time and resources by avoiding the collection of irrelevant data, and helps maintain the quality of your datasets. 

2. Establish a deadline for data collection

Establishing a deadline for data collection helps you avoid collecting too much data, which can be costly and time-consuming to analyze. It also allows you to plan for data analysis and prompt interpretation. Finally, it helps you meet your research goals and objectives and allows you to move forward.

3. Select a data collection approach

The data collection approach you choose will depend on different factors, including the type of data you need, available resources, and the project timeline. For instance, if you need qualitative data, you might choose a focus group or interview methodology. If you need quantitative data, then a survey or observational study may be the most appropriate form of collection.

4. Gather information

When collecting data for your business, identify your business goals first. Once you know what you want to achieve, you can start collecting data to reach those goals. The most important thing is to ensure that the data you collect is reliable and valid. Otherwise, any decisions you make using the data could result in a negative outcome for your business.

5. Examine the information and apply your findings

As a researcher, it's important to examine the data you're collecting and analyzing before you apply your findings, because misleading data can result in inaccurate conclusions. Ask yourself: is the data what you were expecting? Is it similar to other datasets you have looked at?

There are many scientific ways to examine data, but some common methods include:

  • looking at the distribution of data points
  • examining the relationships between variables
  • looking for outliers

By taking the time to examine your data and noticing any patterns, strange or otherwise, you can avoid making mistakes that could invalidate your research.
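As a minimal sketch of these checks, assuming a small tabular dataset with illustrative column names, pandas can summarize distributions, compute pairwise correlations, and flag outliers with a simple interquartile-range rule:

```python
# Minimal sketch of the three checks above using pandas; the columns and values are illustrative.
import pandas as pd

df = pd.DataFrame({
    "satisfaction": [7, 8, 6, 9, 7, 2, 8, 7, 95],   # one suspicious value
    "spend": [120, 150, 110, 160, 130, 40, 155, 125, 140],
})

# 1. Distribution of data points: summary statistics per variable
print(df.describe())

# 2. Relationships between variables: pairwise correlation
print(df.corr())

# 3. Outliers: flag values more than 1.5 * IQR outside the quartiles
q1, q3 = df["satisfaction"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["satisfaction"] < q1 - 1.5 * iqr) | (df["satisfaction"] > q3 + 1.5 * iqr)]
print(outliers)
```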

How qualitative analysis software streamlines the data collection process

Knowledge derived from data does indeed carry power. However, if you don't convert the knowledge into action, it will remain a resource of unexploited energy and wasted potential.

Luckily, data collection tools enable organizations to streamline their data collection and analysis processes and leverage the derived knowledge to grow their businesses. For instance, qualitative analysis software can be highly advantageous in data collection by streamlining the process, making it more efficient and less time-consuming.

Qualitative analysis software also provides a structure for data collection and analysis, helping to ensure that data is of high quality. It can help to uncover patterns and relationships that would otherwise be difficult to discern. Moreover, you can use it to replace more expensive data collection methods, such as focus groups or surveys.

Overall, qualitative analysis software can be valuable for any researcher looking to collect and analyze data. By increasing efficiency, improving data quality, and providing greater insights, qualitative software can help to make the research process much more efficient and effective.


Chapter 5: Collecting data

Tianjing Li, Julian PT Higgins, Jonathan J Deeks

Key Points:

  • Systematic reviews have studies, rather than reports, as the unit of interest, and so multiple reports of the same study need to be identified and linked together before or after data extraction.
  • Because of the increasing availability of data sources (e.g. trials registers, regulatory documents, clinical study reports), review authors should decide on which sources may contain the most useful information for the review, and have a plan to resolve discrepancies if information is inconsistent across sources.
  • Review authors are encouraged to develop outlines of tables and figures that will appear in the review to facilitate the design of data collection forms. The key to successful data collection is to construct easy-to-use forms and collect sufficient and unambiguous data that faithfully represent the source in a structured and organized manner.
  • Effort should be made to identify data needed for meta-analyses, which often need to be calculated or converted from data reported in diverse formats.
  • Data should be collected and archived in a form that allows future access and data sharing.

Cite this chapter as: Li T, Higgins JPT, Deeks JJ (editors). Chapter 5: Collecting data. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.4 (updated August 2023). Cochrane, 2023. Available from www.training.cochrane.org/handbook .

5.1 Introduction

Systematic reviews aim to identify all studies that are relevant to their research questions and to synthesize data about the design, risk of bias, and results of those studies. Consequently, the findings of a systematic review depend critically on decisions relating to which data from these studies are presented and analysed. Data collected for systematic reviews should be accurate, complete, and accessible for future updates of the review and for data sharing. Methods used for these decisions must be transparent; they should be chosen to minimize biases and human error. Here we describe approaches that should be used in systematic reviews for collecting data, including extraction of data directly from journal articles and other reports of studies.

5.2 Sources of data

Studies are reported in a range of sources, which are detailed later. As discussed in Section 5.2.1, it is important to link together multiple reports of the same study. The relative strengths and weaknesses of each type of source are discussed in Section 5.2.2. For guidance on searching for and selecting reports of studies, refer to Chapter 4.

Journal articles are the source of the majority of data included in systematic reviews. Note that a study can be reported in multiple journal articles, each focusing on some aspect of the study (e.g. design, main results, and other results).

Conference abstracts are commonly available. However, the information presented in conference abstracts is highly variable in reliability, accuracy, and level of detail (Li et al 2017).

Errata and letters can be important sources of information about studies, including critical weaknesses and retractions, and review authors should examine these if they are identified (see MECIR Box 5.2.a).

Trials registers (e.g. ClinicalTrials.gov) catalogue trials that have been planned or started, and have become an important data source for identifying trials, for comparing published outcomes and results with those planned, and for obtaining efficacy and safety data that are not available elsewhere (Ross et al 2009, Jones et al 2015, Baudard et al 2017).

Clinical study reports (CSRs) contain unabridged and comprehensive descriptions of the clinical problem, design, conduct and results of clinical trials, following a structure and content guidance prescribed by the International Conference on Harmonisation (ICH 1995). To obtain marketing approval of drugs and biologics for a specific indication, pharmaceutical companies submit CSRs and other required materials to regulatory authorities. Because CSRs also incorporate tables and figures, with appendices containing the protocol, statistical analysis plan, sample case report forms, and patient data listings (including narratives of all serious adverse events), they can be thousands of pages in length. CSRs often contain more data about trial methods and results than any other single data source (Mayo-Wilson et al 2018). CSRs are often difficult to access, and are usually not publicly available. Review authors could request CSRs from the European Medicines Agency (Davis and Miller 2017). The US Food and Drug Administration had historically avoided releasing CSRs but launched a pilot programme in 2018 whereby selected portions of CSRs for new drug applications were posted on the agency’s website. Many CSRs are obtained through unsealed litigation documents, repositories (e.g. clinicalstudydatarequest.com), and other open data and data-sharing channels (e.g. The Yale University Open Data Access Project) (Doshi et al 2013, Wieland et al 2014, Mayo-Wilson et al 2018).

Regulatory reviews such as those available from the US Food and Drug Administration or European Medicines Agency provide useful information about trials of drugs, biologics, and medical devices submitted by manufacturers for marketing approval (Turner 2013). These documents are summaries of CSRs and related documents, prepared by agency staff as part of the process of approving the products for marketing, after reanalysing the original trial data. Regulatory reviews often are available only for the first approved use of an intervention and not for later applications (although review authors may request those documents, which are usually brief). Using regulatory reviews from the US Food and Drug Administration as an example, drug approval packages are available on the agency’s website for drugs approved since 1997 (Turner 2013); for drugs approved before 1997, information must be requested through a freedom of information request. The drug approval packages contain various documents: approval letter(s), medical review(s), chemistry review(s), clinical pharmacology review(s), and statistical review(s).

Individual participant data (IPD) are usually sought directly from the researchers responsible for the study, or may be identified from open data repositories (e.g. www.clinicalstudydatarequest.com). These data typically include variables that represent the characteristics of each participant, intervention (or exposure) group, prognostic factors, and measurements of outcomes (Stewart et al 2015). Access to IPD has the advantage of allowing review authors to reanalyse the data flexibly, in accordance with the preferred analysis methods outlined in the protocol, and can reduce the variation in analysis methods across studies included in the review. IPD reviews are addressed in detail in Chapter 26.

MECIR Box 5.2.a Relevant expectations for conduct of intervention reviews

Examining errata

Some studies may have been found to be fraudulent or may for other reasons have been retracted since publication. Errata can reveal important limitations, or even fatal flaws, in included studies. All of these may potentially lead to the exclusion of a study from a review or meta-analysis. Care should be taken to ensure that this information is retrieved in all database searches by downloading the appropriate fields together with the citation data.

5.2.1 Studies (not reports) as the unit of interest

In a systematic review, studies rather than reports of studies are the principal unit of interest. Since a study may have been reported in several sources, a comprehensive search for studies for the review may identify many reports from a potentially relevant study (Mayo-Wilson et al 2017a, Mayo-Wilson et al 2018). Conversely, a report may describe more than one study.

Multiple reports of the same study should be linked together (see MECIR Box 5.2.b). Some authors prefer to link reports before they collect data, and collect data from across the reports onto a single form. Other authors prefer to collect data from each report and then link together the collected data across reports. Either strategy may be appropriate, depending on the nature of the reports at hand. It may not be clear that two reports relate to the same study until data collection has commenced. Although sometimes there is a single report for each study, it should never be assumed that this is the case.

MECIR Box 5.2.b Relevant expectations for conduct of intervention reviews

Collating multiple reports

It is wrong to consider multiple reports of the same study as if they are multiple studies. Secondary reports of a study should not be discarded, however, since they may contain valuable information about the design and conduct. Review authors must choose and justify which report to use as a source for study results.

It can be difficult to link multiple reports from the same study, and review authors may need to do some ‘detective work’. Multiple sources about the same trial may not reference each other, may not share common authors (Gøtzsche 1989, Tramèr et al 1997), or may report discrepant information about the study design, characteristics, outcomes, and results (von Elm et al 2004, Mayo-Wilson et al 2017a).

Some of the most useful criteria for linking reports are:

  • trial registration numbers;
  • authors’ names;
  • sponsor for the study and sponsor identifiers (e.g. grant or contract numbers);
  • location and setting (particularly if institutions, such as hospitals, are named);
  • specific details of the interventions (e.g. dose, frequency);
  • numbers of participants and baseline data; and
  • date and duration of the study (which also can clarify whether different sample sizes are due to different periods of recruitment), length of follow-up, or subgroups selected to address secondary goals.

Review authors should use as many trial characteristics as possible to link multiple reports. When uncertainties remain after considering these and other factors, it may be necessary to correspond with the study authors or sponsors for confirmation.
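As a simple illustration of the first criterion, reports that share a trial registration number can be grouped programmatically as a first pass, leaving the remaining criteria for manual checking. The record structure below is a hypothetical example, not part of the Handbook's guidance.

```python
# Minimal sketch: grouping report records by trial registration number as a first pass
# at linking multiple reports of the same study. The records are hypothetical examples.
from collections import defaultdict

reports = [
    {"report_id": "r1", "registration": "NCT00000001", "first_author": "Smith"},
    {"report_id": "r2", "registration": "NCT00000001", "first_author": "Jones"},
    {"report_id": "r3", "registration": None, "first_author": "Lee"},  # needs manual linking
]

studies = defaultdict(list)
for report in reports:
    key = report["registration"] or f"unlinked:{report['report_id']}"
    studies[key].append(report["report_id"])

for study, linked_reports in studies.items():
    print(study, linked_reports)
```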

5.2.2 Determining which sources might be most useful

A comprehensive search to identify all eligible studies from all possible sources is resource-intensive but necessary for a high-quality systematic review (see Chapter 4 ). Because some data sources are more useful than others (Mayo-Wilson et al 2018), review authors should consider which data sources may be available and which may contain the most useful information for the review. These considerations should be described in the protocol. Table 5.2.a summarizes the strengths and limitations of different data sources (Mayo-Wilson et al 2018). Gaining access to CSRs and IPD often takes a long time. Review authors should begin searching repositories and contact trial investigators and sponsors as early as possible to negotiate data usage agreements (Mayo-Wilson et al 2015, Mayo-Wilson et al 2018).

Table 5.2.a Strengths and limitations of different data sources for systematic reviews

 

Journal articles

Strengths:
  • Found easily
  • Data extracted quickly
  • Include useful information about methods and results

Limitations:
  • Available for some, but not all studies (with a risk of reporting biases)
  • Contain limited study characteristics and methods
  • Can omit outcomes, especially harms

Conference abstracts

Strengths:
  • Identify unpublished studies

Limitations:
  • Include little information about study design
  • Include limited and unclear information for meta-analysis
  • May result in double-counting studies in meta-analysis if not correctly linked to other reports of the same study

Trials registers

Strengths:
  • Identify otherwise unpublished trials
  • May contain information about design, risk of bias, and results not included in other public sources
  • Link multiple sources about the same trial using a unique registration number

Limitations:
  • Limited to more recent studies that comply with registration requirements
  • Often contain limited information about trial design and quantitative results
  • May report only harms (adverse events) occurring above a threshold (e.g. 5%)
  • May be inaccurate or incomplete for trials whose methods have changed during the conduct of the study, or results not kept up to date

Regulatory reviews

Strengths:
  • Identify studies not reported in other public sources
  • Describe details of methods and results not found in other sources

Limitations:
  • Available only for studies submitted to regulators
  • Available for approved indications, but not ‘off-label’ uses
  • Not always in a standard format
  • Not often available for old products

Clinical study reports

Strengths:
  • Contain detailed information about study characteristics, methods, and results
  • Can be particularly useful for identifying detailed information about harms
  • Describe aggregate results, which are easy to analyse and sufficient for most reviews

Limitations:
  • Do not exist or are difficult to obtain for most studies
  • Require more time to obtain and analyse than public sources

Individual participant data

Strengths:
  • Allow review authors to use contemporary statistical methods and to standardize analyses across studies
  • Permit additional analyses that the review authors desire (e.g. subgroup analyses)

Limitations:
  • Require considerable expertise and time to obtain and analyse
  • May lead to the same results that can be found in the aggregate report
  • May not be necessary if one has a CSR

5.2.3 Correspondence with investigators

Review authors often find that they are unable to obtain all the information they seek from available reports about the details of the study design, the full range of outcomes measured and the numerical results. In such circumstances, authors are strongly encouraged to contact the original investigators (see MECIR Box 5.2.c ). Contact details of study authors, when not available from the study reports, often can be obtained from more recent publications, from university or institutional staff listings, from membership directories of professional societies, or by a general search of the web. If the contact author named in the study report cannot be contacted or does not respond, it is worthwhile attempting to contact other authors.

Review authors should consider the nature of the information they require and make their request accordingly. For descriptive information about the conduct of the trial, it may be most appropriate to ask open-ended questions (e.g. how was the allocation process conducted, or how were missing data handled?). If specific numerical data are required, it may be more helpful to request them specifically, possibly providing a short data collection form (either uncompleted or partially completed). If IPD are required, they should be specifically requested (see also Chapter 26 ). In some cases, study investigators may find it more convenient to provide IPD rather than conduct additional analyses to obtain the specific statistics requested.

MECIR Box 5.2.c Relevant expectations for conduct of intervention reviews

Obtaining unpublished data

Contacting study authors to obtain or confirm data makes the review more complete, potentially enhances precision and reduces the impact of reporting biases. Missing information includes details to inform risk of bias assessments, details of interventions and outcomes, and study results (including breakdowns of results by important subgroups).

5.3 What data to collect

5.3.1 What are data?

For the purposes of this chapter, we define ‘data’ to be any information about (or derived from) a study, including details of methods, participants, setting, context, interventions, outcomes, results, publications, and investigators. Review authors should plan in advance what data will be required for their systematic review, and develop a strategy for obtaining them (see MECIR Box 5.3.a ). The involvement of consumers and other stakeholders can be helpful in ensuring that the categories of data collected are sufficiently aligned with the needs of review users ( Chapter 1, Section 1.3 ). The data to be sought should be described in the protocol, with consideration wherever possible of the issues raised in the rest of this chapter.

The data collected for a review should adequately describe the included studies, support the construction of tables and figures, facilitate the risk of bias assessment, and enable syntheses and meta-analyses. Review authors should familiarize themselves with reporting guidelines for systematic reviews (see online Chapter III and the PRISMA statement (Liberati et al 2009)) to ensure that relevant elements and sections are incorporated. The following sections review the types of information that should be sought, and these are summarized in Table 5.3.a (Li et al 2015).

MECIR Box 5.3.a Relevant expectations for conduct of intervention reviews

Describing studies

Basic characteristics of each study will need to be presented as part of the review, including details of participants, interventions and comparators, outcomes and study design.

Table 5.3.a Checklist of items to consider in data collection

Name of data extractors, date of data extraction, and identification features of each report from which data are being extracted

Confirm eligibility of the study for the review

Reason for exclusion

Study design:

Recruitment and sampling procedures used (including at the level of individual participants and clusters/sites if relevant)

Enrolment start and end dates; length of participant follow-up

Details of random sequence generation, allocation sequence concealment, and masking for randomized trials, and methods used to prevent and control for confounding, selection biases, and information biases for non-randomized studies*

Methods used to prevent and address missing data*

Statistical analysis:

Unit of analysis (e.g. individual participant, clinic, village, body part)

Statistical methods used if computed effect estimates are extracted from reports, including any covariates included in the statistical model

Likelihood of reporting and other biases*

Source(s) of funding or other material support for the study

Authors’ financial relationship and other potential conflicts of interest

Setting

Region(s) and country/countries from which study participants were recruited

Study eligibility criteria, including diagnostic criteria

Characteristics of participants at the beginning (or baseline) of the study (e.g. age, sex, comorbidity, socio-economic status)

Description of the intervention(s) and comparison intervention(s), ideally with sufficient detail for replication:

For each pre-specified outcome domain (e.g. anxiety) in the systematic review:


For each group, and for each outcome at each time point: number of participants randomly assigned and included in the analysis; and number of participants who withdrew, were lost to follow-up or were excluded (with reasons for each)

Summary data for each group (e.g. 2×2 table for dichotomous data; means and standard deviations for continuous data)

Between-group estimates that quantify the effect of the intervention on the outcome, and their precision (e.g. risk ratio, odds ratio, mean difference)

If subgroup analysis is planned, the same information would need to be extracted for each participant subgroup

Key conclusions of the study authors

Reference to other relevant studies

Correspondence required

Miscellaneous comments from the study authors or by the review authors

*Full description required for assessments of risk of bias (see Chapter 8, Chapter 23 and Chapter 25).

5.3.2 Study methods and potential sources of bias

Different research methods can influence study outcomes by introducing different biases into results. Important study design characteristics should be collected to allow the selection of appropriate methods for assessment and analysis, and to enable description of the design of each included study in a table of ‘Characteristics of included studies’, including whether the study is randomized, whether the study has a cluster or crossover design, and the duration of the study. If the review includes non-randomized studies, appropriate features of the studies should be described (see Chapter 24 ).

Detailed information should be collected to facilitate assessment of the risk of bias in each included study. Risk-of-bias assessment should be conducted using the tool most appropriate for the design of each study, and the information required to complete the assessment will depend on the tool. Randomized studies should be assessed using the tool described in Chapter 8 . The tool covers bias arising from the randomization process, due to deviations from intended interventions, due to missing outcome data, in measurement of the outcome, and in selection of the reported result. For each item in the tool, a description of what happened in the study is required, which may include verbatim quotes from study reports. Information for assessment of bias due to missing outcome data and selection of the reported result may be most conveniently collected alongside information on outcomes and results. Chapter 7 (Section 7.3.1) discusses some issues in the collection of information for assessments of risk of bias. For non-randomized studies, the most appropriate tool is described in Chapter 25 . A separate tool also covers bias due to missing results in meta-analysis (see Chapter 13 ).

A particularly important piece of information is the funding source of the study and potential conflicts of interest of the study authors.

Some review authors will wish to collect additional information on study characteristics that bear on the quality of the study’s conduct but that may not lead directly to risk of bias, such as whether ethical approval was obtained and whether a sample size calculation was performed a priori.

5.3.3 Participants and setting

Details of participants are collected to enable an understanding of the comparability of, and differences between, the participants within and between included studies, and to allow assessment of how directly or completely the participants in the included studies reflect the original review question.

Typically, aspects that should be collected are those that could (or are believed to) affect presence or magnitude of an intervention effect and those that could help review users assess applicability to populations beyond the review. For example, if the review authors suspect important differences in intervention effect between different socio-economic groups, this information should be collected. If intervention effects are thought constant over such groups, and if such information would not be useful to help apply results, it should not be collected. Participant characteristics that are often useful for assessing applicability include age and sex. Summary information about these should always be collected unless they are not obvious from the context. These characteristics are likely to be presented in different formats (e.g. ages as means or medians, with standard deviations or ranges; sex as percentages or counts for the whole study or for each intervention group separately). Review authors should seek consistent quantities where possible, and decide whether it is more relevant to summarize characteristics for the study as a whole or by intervention group. It may not be possible to select the most consistent statistics until data collection is complete across all or most included studies. Other characteristics that are sometimes important include ethnicity, socio-demographic details (e.g. education level) and the presence of comorbid conditions. Clinical characteristics relevant to the review question (e.g. glucose level for reviews on diabetes) also are important for understanding the severity or stage of the disease.

Diagnostic criteria that were used to define the condition of interest can be a particularly important source of diversity across studies and should be collected. For example, in a review of drug therapy for congestive heart failure, it is important to know how the definition and severity of heart failure was determined in each study (e.g. systolic or diastolic dysfunction, severe systolic dysfunction with ejection fractions below 20%). Similarly, in a review of antihypertensive therapy, it is important to describe baseline levels of blood pressure of participants.

If the settings of studies may influence intervention effects or applicability, then information on these should be collected. Typical settings of healthcare intervention studies include acute care hospitals, emergency facilities, general practice, and extended care facilities such as nursing homes, offices, schools, and communities. Sometimes studies are conducted in different geographical regions with important differences that could affect delivery of an intervention and its outcomes, such as cultural characteristics, economic context, or rural versus city settings. Timing of the study may be associated with important technology differences or trends over time. If such information is important for the interpretation of the review, it should be collected.

Important characteristics of the participants in each included study should be summarized for the reader in the table of ‘Characteristics of included studies’.

5.3.4 Interventions

Details of all experimental and comparator interventions of relevance to the review should be collected. Again, details are required for aspects that could affect the presence or magnitude of an effect or that could help review users assess applicability to their own circumstances. Where feasible, information should be sought (and presented in the review) that is sufficient for replication of the interventions under study. This includes any co-interventions administered as part of the study, and applies similarly to comparators such as ‘usual care’. Review authors may need to request missing information from study authors.

The Template for Intervention Description and Replication (TIDieR) provides a comprehensive framework for full description of interventions and has been proposed for use in systematic reviews as well as reports of primary studies (Hoffmann et al 2014). The checklist includes descriptions of:

  • the rationale for the intervention and how it is expected to work;
  • any documentation that instructs the recipient on the intervention;
  • what the providers do to deliver the intervention (procedures and processes);
  • who provides the intervention (including their skill level), how (e.g. face to face, web-based) and in what setting (e.g. home, school, or hospital);
  • the timing and intensity;
  • whether any variation is permitted or expected, and whether modifications were actually made; and
  • any strategies used to ensure or assess fidelity or adherence to the intervention, and the extent to which the intervention was delivered as planned.

For clinical trials of pharmacological interventions, key information to collect will often include routes of delivery (e.g. oral or intravenous delivery), doses (e.g. amount or intensity of each treatment, frequency of delivery), timing (e.g. within 24 hours of diagnosis), and length of treatment. For other interventions, such as those that evaluate psychotherapy, behavioural and educational approaches, or healthcare delivery strategies, the amount of information required to characterize the intervention will typically be greater, including information about multiple elements of the intervention, who delivered it, and the format and timing of delivery. Chapter 17 provides further information on how to manage intervention complexity, and how the intervention Complexity Assessment Tool (iCAT) can facilitate data collection (Lewin et al 2017).

Important characteristics of the interventions in each included study should be summarized for the reader in the table of ‘Characteristics of included studies’. Additional tables or diagrams such as logic models ( Chapter 2, Section 2.5.1 ) can assist descriptions of multi-component interventions so that review users can better assess review applicability to their context.

5.3.4.1 Integrity of interventions

The degree to which specified procedures or components of the intervention are implemented as planned can have important consequences for the findings from a study. We describe this as intervention integrity; related terms include adherence, compliance and fidelity (Carroll et al 2007). The verification of intervention integrity may be particularly important in reviews of non-pharmacological trials such as behavioural interventions and complex interventions, which are often implemented in conditions that present numerous obstacles to idealized delivery.

It is generally expected that reports of randomized trials provide detailed accounts of intervention implementation (Zwarenstein et al 2008, Moher et al 2010). In assessing whether interventions were implemented as planned, review authors should bear in mind that some interventions are standardized (with no deviations permitted in the intervention protocol), whereas others explicitly allow a degree of tailoring (Zwarenstein et al 2008). In addition, the growing field of implementation science has led to an increased awareness of the impact of setting and context on delivery of interventions (Damschroder et al 2009). (See Chapter 17, Section 17.1.2.1 for further information and discussion about how an intervention may be tailored to local conditions in order to preserve its integrity.)

Information about integrity can help determine whether unpromising results are due to a poorly conceptualized intervention or to an incomplete delivery of the prescribed components. It can also reveal important information about the feasibility of implementing a given intervention in real life settings. If it is difficult to achieve full implementation in practice, the intervention will have low feasibility (Dusenbury et al 2003).

Whether a lack of intervention integrity leads to a risk of bias in the estimate of its effect depends on whether review authors and users are interested in the effect of assignment to intervention or the effect of adhering to intervention, as discussed in more detail in Chapter 8, Section 8.2.2 . Assessment of deviations from intended interventions is important for assessing risk of bias in the latter, but not the former (see Chapter 8, Section 8.4 ), but both may be of interest to decision makers in different ways.

An example of a Cochrane Review evaluating intervention integrity is provided by a review of smoking cessation in pregnancy (Chamberlain et al 2017). The authors found that process evaluation of the intervention occurred in only some trials and that the implementation was less than ideal in others, including some of the largest trials. The review highlighted how the transfer of an intervention from one setting to another may reduce its effectiveness when elements are changed, or aspects of the materials are culturally inappropriate.

5.3.4.2 Process evaluations

Process evaluations seek to evaluate the process (and mechanisms) between the intervention’s intended implementation and the actual effect on the outcome (Moore et al 2015). Process evaluation studies are characterized by a flexible approach to data collection and the use of numerous methods to generate a range of different types of data, encompassing both quantitative and qualitative methods. Guidance for including process evaluations in systematic reviews is provided in Chapter 21 . When it is considered important, review authors should aim to collect information on whether the trial accounted for, or measured, key process factors and whether the trials that thoroughly addressed integrity showed a greater impact. Process evaluations can be a useful source of factors that potentially influence the effectiveness of an intervention.

5.3.5 Outcomes

An outcome is an event or a measurement value observed or recorded for a particular person or intervention unit in a study during or following an intervention, and that is used to assess the efficacy and safety of the studied intervention (Meinert 2012). Review authors should indicate in advance whether they plan to collect information about all outcomes measured in a study or only those outcomes of (pre-specified) interest in the review. Research has shown that trials addressing the same condition and intervention seldom agree on which outcomes are the most important, and consequently report on numerous different outcomes (Dwan et al 2014, Ismail et al 2014, Denniston et al 2015, Saldanha et al 2017a). The selection of outcomes across systematic reviews of the same condition is also inconsistent (Page et al 2014, Saldanha et al 2014, Saldanha et al 2016, Liu et al 2017). Outcomes used in trials and in systematic reviews of the same condition have limited overlap (Saldanha et al 2017a, Saldanha et al 2017b).

We recommend that only the outcomes defined in the protocol be described in detail. However, a complete list of the names of all outcomes measured may allow a more detailed assessment of the risk of bias due to missing outcome data (see Chapter 13 ).

Review authors should collect all five elements of an outcome (Zarin et al 2011, Saldanha et al 2014):

1. outcome domain or title (e.g. anxiety);

2. measurement tool or instrument (including definition of clinical outcomes or endpoints); for a scale, name of the scale (e.g. the Hamilton Anxiety Rating Scale), upper and lower limits, and whether a high or low score is favourable, definitions of any thresholds if appropriate;

3. specific metric used to characterize each participant’s results (e.g. post-intervention anxiety, or change in anxiety from baseline to a post-intervention time point, or post-intervention presence of anxiety (yes/no));

4. method of aggregation (e.g. mean and standard deviation of anxiety scores in each group, or proportion of people with anxiety);

5. timing of outcome measurements (e.g. assessments at end of eight-week intervention period, events occurring during eight-week intervention period).
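For review teams that record these elements electronically, one way to keep all five together is a small structured record per outcome. This is an illustrative sketch only; the field values are invented and the exact structure will depend on the data collection form being used.

```python
# Minimal sketch: capturing the five elements of an outcome as one structured record.
# The example values are illustrative; they do not come from a real study.
from dataclasses import dataclass

@dataclass
class OutcomeDefinition:
    domain: str       # 1. outcome domain or title
    instrument: str   # 2. measurement tool or instrument
    metric: str       # 3. specific metric used per participant
    aggregation: str  # 4. method of aggregation
    timing: str       # 5. timing of outcome measurement

outcome = OutcomeDefinition(
    domain="anxiety",
    instrument="Hamilton Anxiety Rating Scale (lower scores favourable)",
    metric="change from baseline to end of intervention",
    aggregation="mean and standard deviation per group",
    timing="end of eight-week intervention period",
)
print(outcome)
```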

Further considerations for economics outcomes are discussed in Chapter 20 , and for patient-reported outcomes in Chapter 18 .

5.3.5.1 Adverse effects

Collection of information about the harmful effects of an intervention can pose particular difficulties, discussed in detail in Chapter 19 . These outcomes may be described using multiple terms, including ‘adverse event’, ‘adverse effect’, ‘adverse drug reaction’, ‘side effect’ and ‘complication’. Many of these terminologies are used interchangeably in the literature, although some are technically different. Harms might additionally be interpreted to include undesirable changes in other outcomes measured during a study, such as a decrease in quality of life where an improvement may have been anticipated.

In clinical trials, adverse events can be collected either systematically or non-systematically. Systematic collection refers to collecting adverse events in the same manner for each participant using defined methods such as a questionnaire or a laboratory test. For systematically collected outcomes representing harm, data can be collected by review authors in the same way as efficacy outcomes (see Section 5.3.5 ).

Non-systematic collection refers to collection of information on adverse events using methods such as open-ended questions (e.g. ‘Have you noticed any symptoms since your last visit?’), or reported by participants spontaneously. In either case, adverse events may be selectively reported based on their severity, and whether the participant suspected that the effect may have been caused by the intervention, which could lead to bias in the available data. Unfortunately, most adverse events are collected non-systematically rather than systematically, creating a challenge for review authors. The following pieces of information are useful and worth collecting (Nicole Fusco, personal communication):

  • any coding system or standard medical terminology used (e.g. COSTART, MedDRA), including version number;
  • name of the adverse events (e.g. dizziness);
  • reported intensity of the adverse event (e.g. mild, moderate, severe);
  • whether the trial investigators categorized the adverse event as ‘serious’;
  • whether the trial investigators identified the adverse event as being related to the intervention;
  • time point (most commonly measured as a count over the duration of the study);
  • any reported methods for how adverse events were selected for inclusion in the publication (e.g. ‘We reported all adverse events that occurred in at least 5% of participants’); and
  • associated results.

Different collection methods lead to very different accounting of adverse events (Safer 2002, Bent et al 2006, Ioannidis et al 2006, Carvajal et al 2011, Allen et al 2013). Non-systematic collection methods tend to underestimate how frequently an adverse event occurs. It is particularly problematic when the adverse event of interest to the review is collected systematically in some studies but non-systematically in other studies. Different collection methods introduce an important source of heterogeneity. In addition, when non-systematic adverse events are reported based on quantitative selection criteria (e.g. only adverse events that occurred in at least 5% of participants were included in the publication), use of reported data alone may bias the results of meta-analyses. Review authors should be cautious of (or refrain from) synthesizing adverse events that are collected differently.

Regardless of the collection methods, precise definitions of adverse effect outcomes and their intensity should be recorded, since they may vary between studies. For example, in a review of aspirin and gastrointestinal haemorrhage, some trials simply reported gastrointestinal bleeds, while others reported specific categories of bleeding, such as haematemesis, melaena, and proctorrhagia (Derry and Loke 2000). The definition and reporting of severity of the haemorrhages (e.g. major, severe, requiring hospital admission) also varied considerably among the trials (Zanchetti and Hansson 1999). Moreover, a particular adverse effect may be described or measured in different ways among the studies. For example, the terms ‘tiredness’, ‘fatigue’ or ‘lethargy’ may all be used in reporting of adverse effects. Study authors also may use different thresholds for ‘abnormal’ results (e.g. hypokalaemia diagnosed at a serum potassium concentration of 3.0 mmol/L or 3.5 mmol/L).

If a trial report makes no mention of adverse events, this does not necessarily mean that none occurred; it is usually safest to assume that adverse events may have occurred but were not reported. Quality of life measures are sometimes used as a measure of the participants’ experience during the study, but these are usually general measures that do not look specifically at particular adverse effects of the intervention. While quality of life measures are important and can be used to gauge overall participant well-being, they should not be regarded as substitutes for a detailed evaluation of safety and tolerability.

5.3.6 Results

Results data arise from the measurement or ascertainment of outcomes for individual participants in an intervention study. Results data may be available for each individual in a study (i.e. individual participant data; see Chapter 26 ), or summarized at arm level, or summarized at study level into an intervention effect by comparing two intervention arms. Results data should be collected only for the intervention groups and outcomes specified to be of interest in the protocol (see MECIR Box 5.3.b ). Results for other outcomes should not be collected unless the protocol is modified to add them. Any modification should be reported in the review. However, review authors should be alert to the possibility of important, unexpected findings, particularly serious adverse effects.

MECIR Box 5.3.b Relevant expectations for conduct of intervention reviews

Choosing intervention groups in multi-arm studies

There is no point including irrelevant interventions in the review. Authors should, however, make it clear in the table of ‘Characteristics of included studies’ that these interventions were present in the study.

Reports of studies often include several results for the same outcome. For example, different measurement scales might be used, results may be presented separately for different subgroups, and outcomes may have been measured at different follow-up time points. Variation in the results can be very large, depending on which data are selected (Gøtzsche et al 2007, Mayo-Wilson et al 2017a). Review protocols should be as specific as possible about which outcome domains, measurement tools, time points, and summary statistics (e.g. final values versus change from baseline) are to be collected (Mayo-Wilson et al 2017b). A framework should be pre-specified in the protocol to facilitate making choices between multiple eligible measures or results. For example, a hierarchy of preferred measures might be created, or plans articulated to select the result with the median effect size, or to average across all eligible results for a particular outcome domain (see also Chapter 9, Section 9.3.3 ). Any additional decisions or changes to this framework made once the data are collected should be reported in the review as changes to the protocol.
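
For instance, if the protocol pre-specifies selecting the result closest to the median effect size among eligible measures, the selection step is simple to implement. The following Python sketch is purely illustrative (the scale names and effect sizes are invented) and is not part of Cochrane guidance:

```python
# Illustrative only: choosing among multiple eligible results for one outcome
# domain, following a pre-specified rule (here: the result closest to the
# median effect size). Scale names and values are invented.
from statistics import median

eligible_results = {"HAM-D": 0.42, "BDI": 0.35, "MADRS": 0.51}  # hypothetical SMDs

median_effect = median(eligible_results.values())
chosen_measure = min(eligible_results, key=lambda k: abs(eligible_results[k] - median_effect))
print(f"Median effect size: {median_effect:.2f} (measure selected: {chosen_measure})")

# An alternative pre-specified rule could average across all eligible results:
average_effect = sum(eligible_results.values()) / len(eligible_results)
```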

Section 5.6 describes the numbers that will be required to perform meta-analysis, if appropriate. The unit of analysis (e.g. participant, cluster, body part, treatment period) should be recorded for each result when it is not obvious (see Chapter 6, Section 6.2 ). The type of outcome data determines the nature of the numbers that will be sought for each outcome. For example, for a dichotomous (‘yes’ or ‘no’) outcome, the number of participants and the number who experienced the outcome will be sought for each group. It is important to collect the sample size relevant to each result, although this is not always obvious. A flow diagram as recommended in the CONSORT Statement (Moher et al 2001) can help to determine the flow of participants through a study. If one is not available in a published report, review authors can consider drawing one (available from www.consort-statement.org ).

The numbers required for meta-analysis are not always available. Often, other statistics can be collected and converted into the required format. For example, for a continuous outcome, it is usually most convenient to seek the number of participants, the mean and the standard deviation for each intervention group. These are often not available directly, especially the standard deviation. Alternative statistics enable calculation or estimation of the missing standard deviation (such as a standard error, a confidence interval, a test statistic (e.g. from a t-test or F-test) or a P value). These should be extracted if they provide potentially useful information (see MECIR Box 5.3.c ). Details of recalculation are provided in Section 5.6 . Further considerations for dealing with missing data are discussed in Chapter 10, Section 10.12 .
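
As a purely illustrative sketch (not part of the Handbook’s formal guidance), the following Python snippet shows two of these common conversions with invented numbers; Section 5.6 and Chapter 6 give the full details and the assumptions behind them:

```python
# Illustration of two common conversions for a missing standard deviation
# (numbers invented; normal approximation assumed for the confidence interval).
import math

n = 40                       # participants in one intervention group
se = 1.2                     # reported standard error of the group mean
sd_from_se = se * math.sqrt(n)             # SD = SE * sqrt(n)

ci_lower, ci_upper = 4.5, 9.3              # reported 95% CI for the group mean
se_from_ci = (ci_upper - ci_lower) / 3.92  # large-sample approximation;
                                           # use the t distribution for small samples
sd_from_ci = se_from_ci * math.sqrt(n)

print(f"SD from SE: {sd_from_se:.2f}; SD from CI: {sd_from_ci:.2f}")
```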

MECIR Box 5.3.c Relevant expectations for conduct of intervention reviews

Making maximal use of data

Data entry into RevMan is easiest when 2×2 tables are reported for dichotomous outcomes, and when means and standard deviations are presented for continuous outcomes. Sometimes these statistics are not reported, but some manipulations of the reported data can be performed to obtain them. For instance, 2×2 tables can often be derived from sample sizes and percentages, while standard deviations can often be computed using confidence intervals or P values. Furthermore, the inverse-variance data entry format can be used even if the detailed data required for dichotomous or continuous data are not available, for instance if only odds ratios and their confidence intervals are presented. The RevMan calculator facilitates many of these manipulations.

Checking accuracy of numeric data in the review

This is a reasonably straightforward way for authors to check a number of potential problems, including typographical errors in studies’ reports, accuracy of data collection and manipulation, and data entry into RevMan.  For example, the direction of a standardized mean difference may accidentally be wrong in the review. A basic check is to ensure the same qualitative findings (e.g. direction of effect and statistical significance) between the data as presented in the review and the data as available from the original study. Results in forest plots should agree with data in the original report (point estimate and confidence interval) if the same effect measure and statistical model is used.
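
The two expectations above can be illustrated together with a small, hypothetical example: rebuilding a 2×2 table from a reported sample size and event percentage, and then recomputing the odds ratio and its confidence interval to check them against the published values before data entry. The numbers below are invented and the snippet is only a sketch of the kind of check intended:

```python
# Illustration (invented numbers): rebuild a 2x2 table from reported sample
# sizes and event percentages, then recompute the odds ratio and 95% CI to
# check that they match the values reported in the paper.
import math

n_treat, pct_treat = 120, 25.0      # e.g. "30/120 (25%) had the event"
n_ctrl, pct_ctrl = 118, 39.8

a = round(n_treat * pct_treat / 100)   # events, treatment
b = n_treat - a                        # no events, treatment
c = round(n_ctrl * pct_ctrl / 100)     # events, control
d = n_ctrl - c                         # no events, control

odds_ratio = (a * d) / (b * c)
se_ln_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_ln_or)
ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_ln_or)

print(f"2x2 table: [[{a}, {b}], [{c}, {d}]]")
print(f"OR = {odds_ratio:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
# Compare the direction of effect, point estimate and CI with the published
# result before entering the data into RevMan.
```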

5.3.7 Other information to collect

We recommend that review authors collect the key conclusions of the included study as reported by its authors. It is not necessary to report these conclusions in the review, but they should be used to verify the results of analyses undertaken by the review authors, particularly in relation to the direction of effect. Further comments by the study authors, for example any explanations they provide for unexpected findings, may be noted. References to other studies that are cited in the study report may be useful, although review authors should be aware of the possibility of citation bias (see Chapter 7, Section 7.2.3.2 ). Documentation of any correspondence with the study authors is important for review transparency.

5.4 Data collection tools

5.4.1 Rationale for data collection forms

Data collection for systematic reviews should be performed using structured data collection forms (see MECIR Box 5.4.a ). These can be paper forms, electronic forms (e.g. Google Form), or commercially or custom-built data systems (e.g. Covidence, EPPI-Reviewer, Systematic Review Data Repository (SRDR)) that allow online form building, data entry by several users, data sharing, and efficient data management (Li et al 2015). All different means of data collection require data collection forms.

MECIR Box 5.4.a Relevant expectations for conduct of intervention reviews

Using data collection forms

Review authors often have different backgrounds and levels of systematic review experience. Using a data collection form ensures some consistency in the process of data extraction, and is necessary for comparing data extracted in duplicate. The completed data collection forms should be available to the CRG on request. Piloting the form within the review team is highly desirable. At minimum, the data collection form (or a very close variant of it) must have been assessed for usability.

The data collection form is a bridge between what is reported by the original investigators (e.g. in journal articles, abstracts, personal correspondence) and what is ultimately reported by the review authors. The data collection form serves several important functions (Meade and Richardson 1997). First, the form is linked directly to the review question and criteria for assessing eligibility of studies, and provides a clear summary of these that can be used to identify and structure the data to be extracted from study reports. Second, the data collection form is the historical record of the provenance of the data used in the review, as well as the multitude of decisions (and changes to decisions) that occur throughout the review process. Third, the form is the source of data for inclusion in an analysis.

Given the important functions of data collection forms, ample time and thought should be invested in their design. Because each review is different, data collection forms will vary across reviews. However, there are many similarities in the types of information that are important. Thus, forms can be adapted from one review to the next. Although we use the term ‘data collection form’ in the singular, in practice it may be a series of forms used for different purposes: for example, a separate form could be used to assess the eligibility of studies for inclusion in the review to assist in the quick identification of studies to be excluded from or included in the review.

5.4.2 Considerations in selecting data collection tools

The choice of data collection tool is largely dependent on review authors’ preferences, the size of the review, and resources available to the author team. Potential advantages and considerations of selecting one data collection tool over another are outlined in Table 5.4.a (Li et al 2015). A significant advantage that data systems have is in data management ( Chapter 1, Section 1.6 ) and re-use. They make review updates more efficient, and also facilitate methodological research across reviews. Numerous ‘meta-epidemiological’ studies have been carried out using Cochrane Review data, resulting in methodological advances which would not have been possible if thousands of studies had not all been described using the same data structures in the same system.

Some data collection tools facilitate automatic import of extracted data into RevMan (Cochrane’s authoring tool), for example via CSV (Excel) files or directly from Covidence. Details are available at https://documentation.cochrane.org/revman-kb/populate-study-data-260702462.html.

Table 5.4.a Considerations in selecting data collection tools

Paper forms

  • Examples: forms developed using word processing software
  • Suitable review type and team sizes: small-scale reviews (<10 included studies); small team with 2 to 3 data extractors in the same physical location
  • Resource needs: low
  • Advantages: do not rely on access to a computer and network or internet connectivity; can record notes and explanations easily; require minimal software skills
  • Disadvantages: inefficient and potentially unreliable because data must be entered into software for analysis and reporting; susceptible to errors; data collected by multiple authors must be manually collated; difficult to amend as the review progresses; if the papers are lost, all data will need to be re-created

Electronic forms

  • Examples: Microsoft Access; Google Forms
  • Suitable review type and team sizes: small- to medium-scale reviews (10 to 20 studies); small to moderate-sized team with 4 to 6 data extractors
  • Resource needs: low to medium
  • Advantages: allow extracted data to be processed electronically for editing and analysis; allow electronic data storage, sharing and collation; easy to expand or edit forms as required; can automate data comparison with additional programming; can copy data to analysis software without manual re-entry, reducing errors
  • Disadvantages: require familiarity with software packages to design and use forms; susceptible to changes in software versions

Data systems

  • Examples: Covidence; EPPI-Reviewer; Systematic Review Data Repository (SRDR); DistillerSR (Evidence Partners); Doctor Evidence
  • Suitable review type and team sizes: small-, medium-, and especially large-scale reviews (>20 studies), as well as reviews that need constant updating; all team sizes, especially large teams (i.e. >6 data extractors)
  • Resource needs: low (open-access tools such as Covidence or SRDR, or tools for which authors have institutional licences) to high (commercial data systems with no access via an institutional licence)
  • Advantages: specifically designed for data collection for systematic reviews; allow online data storage, linking, and sharing; easy to expand or edit forms as required; can be integrated with title/abstract, full-text screening and other functions; can link data items to locations in the report to facilitate checking; can readily automate data comparison between independent data collection for the same study; allow easy monitoring of progress and performance of the author team; facilitate coordination among data collectors, such as allocation of studies for collection and monitoring of team progress; allow simultaneous data entry by multiple authors; can export data directly to analysis software; in some cases, improve public accessibility through open data sharing
  • Disadvantages: upfront investment of resources to set up the form and train data extractors; structured templates may not be as flexible as electronic forms; cost of commercial data systems; require familiarity with data systems; susceptible to changes in software versions

5.4.3 Design of a data collection form

Regardless of whether data are collected using a paper or electronic form, or a data system, the key to successful data collection is to construct easy-to-use forms and collect sufficient and unambiguous data that faithfully represent the source in a structured and organized manner (Li et al 2015). In most cases, a document format should be developed for the form before building an electronic form or a data system. This can be distributed to others, including programmers and data analysts, and can serve as a guide for creating an electronic form and any guidance or codebook to be used by data extractors. Review authors also should consider compatibility of any electronic form or data system with analytical software, as well as mechanisms for recording, assessing and correcting data entry errors.

Data described in multiple reports (or even within a single report) of a study may not be consistent. Review authors will need to describe how they work with multiple reports in the protocol, for example, by pre-specifying which report will be used when sources contain conflicting data that cannot be resolved by contacting the investigators. Likewise, when there is only one report identified for a study, review authors should specify the section within the report (e.g. abstract, methods, results, tables, and figures) for use in case of inconsistent information.

If review authors wish to automatically import their extracted data into RevMan, it is advisable that their data collection forms match the data extraction templates available via the RevMan Knowledge Base. Details are available at https://documentation.cochrane.org/revman-kb/data-extraction-templates-260702375.html.

A good data collection form should minimize the need to go back to the source documents. When designing a data collection form, review authors should involve all members of the team, that is, content area experts, authors with experience in systematic review methods and data collection form design, statisticians, and persons who will perform data extraction. Here are suggested steps and some tips for designing a data collection form, based on the informal collation of experiences from numerous review authors (Li et al 2015).

Step 1. Develop outlines of tables and figures expected to appear in the systematic review, considering the comparisons to be made between different interventions within the review, and the various outcomes to be measured. This step will help review authors decide the right amount of data to collect (not too much or too little). Collecting too much information can lead to forms that are longer than original study reports, and can be very wasteful of time. Collection of too little information, or omission of key data, can lead to the need to return to study reports later in the review process.

Step 2. Assemble and group data elements to facilitate form development. Review authors should consult Table 5.3.a , in which the data elements are grouped to facilitate form development and data collection. Note that it may be more efficient to group data elements in the order in which they are usually found in study reports (e.g. starting with reference information, followed by eligibility criteria, intervention description, statistical methods, baseline characteristics and results).

Step 3. Identify the optimal way of framing the data items. Much has been written about how to frame data items for developing robust data collection forms in primary research studies. We summarize a few key points and highlight issues that are pertinent to systematic reviews.

  • Ask closed-ended questions (i.e. questions that define a list of permissible responses) as much as possible. Closed-ended questions do not require post hoc coding and provide better control over data quality than open-ended questions. When setting up a closed-ended question, one must anticipate and structure possible responses and include an ‘other, specify’ category, because the anticipated list may not be exhaustive. Avoid asking data extractors to summarize data into uncoded text, no matter how short it is (a minimal sketch of a structured, closed-ended item appears after this list).
  • Avoid asking a question in a way that the response may be left blank. Include ‘not applicable’, ‘not reported’ and ‘cannot tell’ options as needed. The ‘cannot tell’ option tags uncertain items that may prompt review authors to contact study authors for clarification, especially on data items critical to reaching conclusions.
  • Remember that the form will focus on what is reported in the article rather than on what was done in the study. The study report may not fully reflect how the study was actually conducted. For example, a question ‘Did the article report that the participants were masked to the intervention?’ is more appropriate than ‘Were participants masked to the intervention?’
  • Where a judgement is required, record the raw data (i.e. quote directly from the source document) used to make the judgement. It is also important to record the source of information collected, including where it was found in a report or whether information was obtained from unpublished sources or personal communications. As much as possible, questions should be asked in a way that minimizes subjective interpretation and judgement to facilitate data comparison and adjudication.
  • Incorporate flexibility to allow for variation in how data are reported. It is strongly recommended that outcome data be collected in the format in which they were reported and transformed in a subsequent step if required. Review authors also should consider the software they will use for analysis and for publishing the review (e.g. RevMan).
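
As a minimal sketch of what a structured, closed-ended item might look like, one possible representation follows; it is not an official Cochrane template, and all field names and response options are illustrative only:

```python
# Minimal sketch of a structured, closed-ended data collection item
# (not an official Cochrane template); field names are illustrative only.
blinding_item = {
    "item_id": "B3",
    "question": "Did the article report that participants were masked to the intervention?",
    "responses": ["yes", "no", "cannot tell", "not applicable"],
    "other_specify_allowed": True,   # free text used only to qualify a coded answer
    "record_source": True,           # e.g. page or table where the information was found
    "record_quote": True,            # verbatim text supporting the judgement
}
```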

Step 4. Develop and pilot-test data collection forms, ensuring that they provide data in the right format and structure for subsequent analysis. In addition to data items described in Step 2, data collection forms should record the title of the review as well as the person who is completing the form and the date of completion. Forms occasionally need revision; forms should therefore include the version number and version date to reduce the chances of using an outdated form by mistake. Because a study may be associated with multiple reports, it is important to record the study ID as well as the report ID. Definitions and instructions helpful for answering a question should appear next to the question to improve quality and consistency across data extractors (Stock 1994). Provide space for notes, regardless of whether paper or electronic forms are used.

All data collection forms and data systems should be thoroughly pilot-tested before launch (see MECIR Box 5.4.a ). Testing should involve several people extracting data from at least a few articles. The initial testing focuses on the clarity and completeness of questions. Users of the form may provide feedback that certain coding instructions are confusing or incomplete (e.g. a list of options may not cover all situations). The testing may identify data that are missing from the form, or likely to be superfluous. After initial testing, accuracy of the extracted data should be checked against the source document or verified data to identify problematic areas. It is wise to draft entries for the table of ‘Characteristics of included studies’ and complete a risk of bias assessment ( Chapter 8 ) using these pilot reports to ensure all necessary information is collected. A consensus between review authors may be required before the form is modified to avoid any misunderstandings or later disagreements. It may be necessary to repeat the pilot testing on a new set of reports if major changes are needed after the first pilot test.

Problems with the data collection form may surface after pilot testing has been completed, and the form may need to be revised after data extraction has started. When changes are made to the form or coding instructions, it may be necessary to return to reports that have already undergone data extraction. In some situations, it may be necessary to clarify only coding instructions without modifying the actual data collection form.

5.5 Extracting data from reports

5.5.1 Introduction

In most systematic reviews, the primary source of information about each study is published reports of studies, usually in the form of journal articles. Despite recent developments in machine learning models to automate data extraction in systematic reviews (see Section 5.5.9 ), data extraction is still largely a manual process. Electronic searches for text can provide a useful aid to locating information within a report. Examples include using search facilities in PDF viewers, internet browsers and word processing software. However, text searching should not be considered a replacement for reading the report, since information may be presented using variable terminology and presented in multiple formats.

5.5.2 Who should extract data?

Data extractors should have at least a basic understanding of the topic, and have knowledge of study design, data analysis and statistics. They should pay attention to detail while following instructions on the forms. Because errors that occur at the data extraction stage are rarely detected by peer reviewers, editors, or users of systematic reviews, it is recommended that more than one person extract data from every report to minimize errors and reduce introduction of potential biases by review authors (see MECIR Box 5.5.a ). As a minimum, information that involves subjective interpretation and information that is critical to the interpretation of results (e.g. outcome data) should be extracted independently by at least two people (see MECIR Box 5.5.a ). In common with implementation of the selection process ( Chapter 4, Section 4.6 ), it is preferable that data extractors are from complementary disciplines, for example a methodologist and a topic area specialist. It is important that everyone involved in data extraction has practice using the form and, if the form was designed by someone else, receives appropriate training.

Evidence in support of duplicate data extraction comes from several indirect sources. One study observed that independent data extraction by two authors resulted in fewer errors than data extraction by a single author followed by verification by a second (Buscemi et al 2006). A high prevalence of data extraction errors (errors in 20 out of 34 reviews) has been observed (Jones et al 2005). A further study of data extraction to compute standardized mean differences found that a minimum of seven out of 27 reviews had substantial errors (Gøtzsche et al 2007).

MECIR Box 5.5.a Relevant expectations for conduct of intervention reviews

Extracting study characteristics in duplicate

Duplicating the data extraction process reduces both the risk of making mistakes and the possibility that data selection is influenced by a single person’s biases. Dual data extraction may be less important for study characteristics than it is for outcome data, so it is not a mandatory standard for the former.

Extracting outcome data in duplicate

Duplicating the data extraction process reduces both the risk of making mistakes and the possibility that data selection is influenced by a single person’s biases. Dual data extraction is particularly important for outcome data, which feed directly into syntheses of the evidence and hence to conclusions of the review.

5.5.3 Training data extractors

Training of data extractors is intended to familiarize them with the review topic and methods, the data collection form or data system, and issues that may arise during data extraction. Results of the pilot testing of the form should prompt discussion among review authors and extractors of ambiguous questions or responses to establish consistency. Training should take place at the onset of the data extraction process and periodically over the course of the project (Li et al 2015). For example, when data related to a single item on the form are present in multiple locations within a report (e.g. abstract, main body of text, tables, and figures) or in several sources (e.g. publications, ClinicalTrials.gov, or CSRs), the development and documentation of instructions to follow an agreed algorithm are critical and should be reinforced during the training sessions.

It has been proposed that certain information in a report, such as the names of its authors, be blinded to the review author prior to data extraction and assessment of risk of bias (Jadad et al 1996). However, blinding of review authors to aspects of study reports generally is not recommended for Cochrane Reviews as there is little evidence that it alters the decisions made (Berlin 1997).

5.5.4 Extracting data from multiple reports of the same study

Studies frequently are reported in more than one publication or in more than one source (Tramèr et al 1997, von Elm et al 2004). A single source rarely provides complete information about a study; on the other hand, multiple sources may contain conflicting information about the same study (Mayo-Wilson et al 2017a, Mayo-Wilson et al 2017b, Mayo-Wilson et al 2018). Because the unit of interest in a systematic review is the study and not the report, information from multiple reports often needs to be collated and reconciled. It is not appropriate to discard any report of an included study without careful examination, since it may contain valuable information not included in the primary report. Review authors will need to decide between two strategies:

  • Extract data from each report separately, then combine information across multiple data collection forms.
  • Extract data from all reports directly into a single data collection form.

The choice of which strategy to use will depend on the nature of the reports and may vary across studies and across reports. For example, when a full journal article and multiple conference abstracts are available, it is likely that the majority of information will be obtained from the journal article; completing a new data collection form for each conference abstract may be a waste of time. Conversely, when there are two or more detailed journal articles, perhaps relating to different periods of follow-up, then it is likely to be easier to perform data extraction separately for these articles and collate information from the data collection forms afterwards. When data from all reports are extracted into a single data collection form, review authors should identify the ‘main’ data source for each study when sources include conflicting data and these differences cannot be resolved by contacting authors (Mayo-Wilson et al 2018). Flow diagrams such as those modified from the PRISMA statement can be particularly helpful when collating and documenting information from multiple reports (Mayo-Wilson et al 2018).

5.5.5 Reliability and reaching consensus

When more than one author extracts data from the same reports, there is potential for disagreement. After data have been extracted independently by two or more extractors, responses must be compared to assure agreement or to identify discrepancies. An explicit procedure or decision rule should be specified in the protocol for identifying and resolving disagreements. Most often, the source of the disagreement is an error by one of the extractors and is easily resolved. Thus, discussion among the authors is a sensible first step. More rarely, a disagreement may require arbitration by another person. Any disagreement that cannot be resolved should be addressed by contacting the study authors; if this is unsuccessful, the disagreement should be reported in the review.

The presence and resolution of disagreements should be carefully recorded. Maintaining a copy of the data ‘as extracted’ (in addition to the consensus data) allows assessment of reliability of coding. Examples of ways in which this can be achieved include the following:

  • Use one author’s (paper) data collection form and record changes after consensus in a different ink colour.
  • Enter consensus data onto an electronic form.
  • Record original data extracted and consensus data in separate forms (some online tools do this automatically).

Agreement of coded items before reaching consensus can be quantified, for example using kappa statistics (Orwin 1994), although this is not routinely done in Cochrane Reviews. If agreement is assessed, this should be done only for the most important data (e.g. key risk of bias assessments, or availability of key outcomes).
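
If agreement is quantified, Cohen’s kappa can be computed directly from the two extractors’ codes. The snippet below is an illustration with invented risk-of-bias judgements and assumes the scikit-learn package is available:

```python
# Illustration of quantifying agreement between two data extractors on a key
# coded item (e.g. a risk-of-bias judgement); the codes are invented.
from sklearn.metrics import cohen_kappa_score

extractor_1 = ["low", "high", "unclear", "low", "low", "high", "low", "unclear"]
extractor_2 = ["low", "high", "low",     "low", "low", "high", "unclear", "unclear"]

kappa = cohen_kappa_score(extractor_1, extractor_2)
print(f"Cohen's kappa before consensus: {kappa:.2f}")
```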

Throughout the review process informal consideration should be given to the reliability of data extraction. For example, if after reaching consensus on the first few studies, the authors note a frequent disagreement for specific data, then coding instructions may need modification. Furthermore, an author’s coding strategy may change over time, as the coding rules are forgotten, indicating a need for retraining and, possibly, some recoding.

5.5.6 Extracting data from clinical study reports

Clinical study reports (CSRs) obtained for a systematic review are likely to be in PDF format. Although CSRs can be thousands of pages in length and very time-consuming to review, they typically follow the content and format required by the International Conference on Harmonisation (ICH 1995). Information in CSRs is usually presented in a structured and logical way. For example, numerical data pertaining to important demographic, efficacy, and safety variables are placed within the main text in tables and figures. Because of the clarity and completeness of information provided in CSRs, data extraction from CSRs may be clearer and conducted more confidently than from journal articles or other short reports.

To extract data from CSRs efficiently, review authors should familiarize themselves with the structure of the CSRs. In practice, review authors may want to browse or create ‘bookmarks’ within a PDF document that record section headers and subheaders and search key words related to the data extraction (e.g. randomization). In addition, it may be useful to utilize optical character recognition software to convert tables of data in the PDF to an analysable format when additional analyses are required, saving time and minimizing transcription errors.
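
As an illustration of the OCR step, the sketch below converts a single (hypothetical) page of a scanned CSR to text; it assumes the pdf2image and pytesseract packages and a local Tesseract installation, and any OCR output should be checked carefully against the source document:

```python
# Illustration only: convert one page of a scanned PDF to text with OCR.
# The file name and page number are hypothetical; verify output against the source.
from pdf2image import convert_from_path
import pytesseract

pages = convert_from_path("csr_study_1234.pdf", dpi=300, first_page=87, last_page=87)
text = pytesseract.image_to_string(pages[0])
print(text)
```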

CSRs may contain many outcomes and present many results for a single outcome (due to different analyses) (Mayo-Wilson et al 2017b). We recommend review authors extract results only for outcomes of interest to the review (Section 5.3.6 ). With regard to different methods of analysis, review authors should have a plan and pre-specify preferred metrics in their protocol for extracting results pertaining to different populations (e.g. ‘all randomized’, ‘all participants taking at least one dose of medication’), methods for handling missing data (e.g. ‘complete case analysis’, ‘multiple imputation’), and adjustment (e.g. unadjusted, adjusted for baseline covariates). It may be important to record the range of analysis options available, even if not all are extracted in detail. In some cases it may be preferable to use metrics that are comparable across multiple included studies, which may not be clear until data collection for all studies is complete.

CSRs are particularly useful for identifying outcomes assessed but not presented to the public. For efficacy outcomes and systematically collected adverse events, review authors can compare what is described in the CSRs with what is reported in published reports to assess the risk of bias due to missing outcome data ( Chapter 8, Section 8.5 ) and in selection of reported result ( Chapter 8, Section 8.7 ). Note that non-systematically collected adverse events are not amenable to such comparisons because these adverse events may not be known ahead of time and thus not pre-specified in the protocol.

5.5.7 Extracting data from regulatory reviews

Data most relevant to systematic reviews can be found in the medical and statistical review sections of a regulatory review. Both of these are substantially longer than journal articles (Turner 2013). A list of all trials on a drug usually can be found in the medical review. Because trials are referenced by a combination of numbers and letters, it may be difficult for the review authors to link the trial with other reports of the same trial (Section 5.2.1 ).

Many of the documents downloaded from the US Food and Drug Administration’s website for older drugs are scanned copies and are not searchable because of redaction of confidential information (Turner 2013). Optical character recognition software can convert most of the text. Reviews for newer drugs have been redacted electronically; documents remain searchable as a result.

Compared to CSRs, regulatory reviews contain less information about trial design, execution, and results. They provide limited information for assessing the risk of bias. In terms of extracting outcomes and results, review authors should follow the guidance provided for CSRs (Section 5.5.6 ).

5.5.8 Extracting data from figures with software

Sometimes numerical data needed for systematic reviews are only presented in figures. Review authors may request the data from the study investigators, or alternatively, extract the data from the figures either manually (e.g. with a ruler) or by using software. Numerous tools are available, many of which are free. Those available at the time of writing include tools called Plot Digitizer, WebPlotDigitizer, Engauge, Dexter, ycasd, GetData Graph Digitizer. The software works by taking an image of a figure and then digitizing the data points off the figure using the axes and scales set by the users. The numbers exported can be used for systematic reviews, although additional calculations may be needed to obtain the summary statistics, such as calculation of means and standard deviations from individual-level data points (or conversion of time-to-event data presented on Kaplan-Meier plots to hazard ratios; see Chapter 6, Section 6.8.2 ).
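
Once individual data points have been digitized, the required summary statistics can be computed directly. The following sketch uses invented values for a single group:

```python
# Illustration: individual data points digitized from a scatter plot (values
# invented) converted to the group mean and standard deviation needed for
# meta-analysis.
import statistics

digitized_points = [12.1, 14.3, 9.8, 15.0, 13.2, 11.7, 10.9, 14.8]  # one group

n = len(digitized_points)
mean = statistics.mean(digitized_points)
sd = statistics.stdev(digitized_points)   # sample standard deviation
print(f"n = {n}, mean = {mean:.2f}, SD = {sd:.2f}")
```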

It has been demonstrated that software is more convenient and accurate than visual estimation or use of a ruler (Gross et al 2014, Jelicic Kadic et al 2016). Review authors should consider using software for extracting numerical data from figures when the data are not available elsewhere.

5.5.9 Automating data extraction in systematic reviews

Because data extraction is time-consuming and error-prone, automating or semi-automating this step may make the extraction process more efficient and accurate. The state of science relevant to automating data extraction is summarized here (Jonnalagadda et al 2015).

  • At least 26 studies have tested various natural language processing and machine learning approaches for facilitating data extraction for systematic reviews.

  • Each tool focuses on only a limited number of data elements (ranging from one to seven). Most of the existing tools focus on the PICO information (e.g. number of participants, their age, sex, country, recruiting centres, intervention groups, outcomes, and time points). A few are able to extract study design and results (e.g. objectives, study duration, participant flow), and two extract risk of bias information (Marshall et al 2016, Millard et al 2016). To date, well over half of the data elements needed for systematic reviews have not been explored for automated extraction.

  • Most tools highlight the sentence(s) that may contain the data elements as opposed to directly recording these data elements into a data collection form or a data system.
  • There is no gold standard or common dataset to evaluate the performance of these tools, limiting our ability to interpret the significance of the reported accuracy measures.

At the time of writing, we cannot recommend a specific tool for automating data extraction for routine systematic review production. There is a need for review authors to work with experts in informatics to refine these tools and evaluate them rigorously. Such investigations should address how the tool will fit into existing workflows. For example, the automated or semi-automated data extraction approaches may first act as checks for manual data extraction before they can replace it.

5.5.10 Suspicions of scientific misconduct

Systematic review authors can uncover suspected misconduct in the published literature. Misconduct includes fabrication or falsification of data or results, plagiarism, and research that does not adhere to ethical norms. Review authors need to be aware of scientific misconduct because the inclusion of fraudulent material could undermine the reliability of a review’s findings. Plagiarism of results data in the form of duplicated publication (either by the same or by different authors) may, if undetected, lead to study participants being double counted in a synthesis.

It is preferable to identify potential problems before, rather than after, publication of the systematic review, so that readers are not misled. However, empirical evidence indicates that the extent to which systematic review authors explore misconduct varies widely (Elia et al 2016). Text-matching software and systems such as CrossCheck may be helpful for detecting plagiarism, but they can detect only matching text, so data tables or figures need to be inspected by hand or using other systems (e.g. to detect image manipulation). Lists of data such as in a meta-analysis can be a useful means of detecting duplicated studies. Furthermore, examination of baseline data can lead to suspicions of misconduct for an individual randomized trial (Carlisle et al 2015). For example, Al-Marzouki and colleagues concluded that a trial report was fabricated or falsified on the basis of highly unlikely baseline differences between two randomized groups (Al-Marzouki et al 2005).

Cochrane Review authors are advised to consult with Cochrane editors if cases of suspected misconduct are identified. Searching for comments, letters or retractions may uncover additional information. Sensitivity analyses can be used to determine whether the studies arousing suspicion are influential in the conclusions of the review. Guidance for editors for addressing suspected misconduct will be available from Cochrane’s Editorial and Publishing Policy Resource (see community.cochrane.org ). Further information is available from the Committee on Publication Ethics (COPE; publicationethics.org ), including a series of flowcharts on how to proceed if various types of misconduct are suspected. Cases should be followed up, typically including an approach to the editors of the journals in which suspect reports were published. It may be useful to write first to the primary investigators to request clarification of apparent inconsistencies or unusual observations.

Because investigations may take time, and institutions may not always be responsive (Wager 2011), articles suspected of being fraudulent should be classified as ‘awaiting assessment’. If a misconduct investigation indicates that the publication is unreliable, or if a publication is retracted, it should not be included in the systematic review, and the reason should be noted in the ‘excluded studies’ section.

5.5.11 Key points in planning and reporting data extraction

In summary, the methods section of both the protocol and the review should detail:

  • the data categories that are to be extracted;
  • how extracted data from each report will be verified (e.g. extraction by two review authors, independently);
  • whether data extraction is undertaken by content area experts, methodologists, or both;
  • pilot testing, training and existence of coding instructions for the data collection form;
  • how data are extracted from multiple reports from the same study; and
  • how disagreements are handled when more than one author extracts data from each report.

5.6 Extracting study results and converting to the desired format

In most cases, it is desirable to collect summary data separately for each intervention group of interest and to enter these into software in which effect estimates can be calculated, such as RevMan. Sometimes the required data may be obtained only indirectly, and the relevant results may not be obvious. Chapter 6 provides many useful tips and techniques to deal with common situations. When summary data cannot be obtained from each intervention group, or where it is important to use results of adjusted analyses (for example to account for correlations in crossover or cluster-randomized trials), effect estimates may be available directly.
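
As a purely illustrative sketch of the generic inverse-variance approach that applies when adjusted effect estimates (rather than arm-level data) are collected, the snippet below pools three invented log odds ratios; in practice these numbers would simply be entered into RevMan, which performs the calculation:

```python
# Minimal illustration of fixed-effect inverse-variance pooling of adjusted
# effect estimates (log odds ratios with standard errors; numbers invented).
import math

studies = [  # (log odds ratio, standard error)
    (-0.35, 0.18),
    (-0.12, 0.25),
    (-0.40, 0.15),
]

weights = [1 / se**2 for _, se in studies]
pooled_ln_or = sum(w * est for (est, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

print(f"Pooled OR = {math.exp(pooled_ln_or):.2f} "
      f"(95% CI {math.exp(pooled_ln_or - 1.96*pooled_se):.2f} "
      f"to {math.exp(pooled_ln_or + 1.96*pooled_se):.2f})")
```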

5.7 Managing and sharing data

When data have been collected for each individual study, it is helpful to organize them into a comprehensive electronic format, such as a database or spreadsheet, before entering data into a meta-analysis or other synthesis. When data are collated electronically, all or a subset of them can easily be exported for cleaning, consistency checks and analysis.

Tabulation of collected information about studies can facilitate classification of studies into appropriate comparisons and subgroups. It also allows identification of comparable outcome measures and statistics across studies. It will often be necessary to perform calculations to obtain the required statistics for presentation or synthesis. It is important through this process to retain clear information on the provenance of the data, with a clear distinction between data from a source document and data obtained through calculations. Statistical conversions, for example from standard errors to standard deviations, ideally should be undertaken with a computer rather than using a hand calculator to maintain a permanent record of the original and calculated numbers as well as the actual calculations used.
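
One simple way to preserve provenance is to store the reported statistic, the derived statistic, and the conversion used together in a single structured record. The sketch below is illustrative only; the study, outcome, and numbers are invented:

```python
# Illustration of recording both the original reported statistic and the derived
# value (with the conversion used) so the provenance of every number is clear.
import math

record = {
    "study_id": "Smith 2004",          # hypothetical study
    "outcome": "pain score at 6 weeks",
    "reported": {"n": 52, "mean": 4.1, "se": 0.31, "source": "Table 2"},
}
record["derived"] = {
    "sd": record["reported"]["se"] * math.sqrt(record["reported"]["n"]),  # SD ≈ 2.24
    "conversion": "SD = SE * sqrt(n)",
}
print(record["derived"])
```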

Ideally, data only need to be extracted once and should be stored in a secure and stable location for future updates of the review, regardless of whether the original review authors or a different group of authors update the review (Ip et al 2012). Standardizing and sharing data collection tools as well as data management systems among review authors working in similar topic areas can streamline systematic review production. Review authors have the opportunity to work with trialists, journal editors, funders, regulators, and other stakeholders to make study data (e.g. CSRs, IPD, and any other form of study data) publicly available, increasing the transparency of research. When legal and ethical to do so, we encourage review authors to share the data used in their systematic reviews to reduce waste and to allow verification and reanalysis because data will not have to be extracted again for future use (Mayo-Wilson et al 2018).

5.8 Chapter information

Editors: Tianjing Li, Julian PT Higgins, Jonathan J Deeks

Acknowledgements: This chapter builds on earlier versions of the Handbook . For details of previous authors and editors of the Handbook , see Preface. Andrew Herxheimer, Nicki Jackson, Yoon Loke, Deirdre Price and Helen Thomas contributed text. Stephanie Taylor and Sonja Hood contributed suggestions for designing data collection forms. We are grateful to Judith Anzures, Mike Clarke, Miranda Cumpston and Peter Gøtzsche for helpful comments.

Funding: JPTH is a member of the National Institute for Health Research (NIHR) Biomedical Research Centre at University Hospitals Bristol NHS Foundation Trust and the University of Bristol. JJD received support from the NIHR Birmingham Biomedical Research Centre at the University Hospitals Birmingham NHS Foundation Trust and the University of Birmingham. JPTH received funding from National Institute for Health Research Senior Investigator award NF-SI-0617-10145. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.

5.9 References

Al-Marzouki S, Evans S, Marshall T, Roberts I. Are these data real? Statistical methods for the detection of data fabrication in clinical trials. BMJ 2005; 331 : 267-270.

Allen EN, Mushi AK, Massawe IS, Vestergaard LS, Lemnge M, Staedke SG, Mehta U, Barnes KI, Chandler CI. How experiences become data: the process of eliciting adverse event, medical history and concomitant medication reports in antimalarial and antiretroviral interaction trials. BMC Medical Research Methodology 2013; 13 : 140.

Baudard M, Yavchitz A, Ravaud P, Perrodeau E, Boutron I. Impact of searching clinical trial registries in systematic reviews of pharmaceutical treatments: methodological systematic review and reanalysis of meta-analyses. BMJ 2017; 356 : j448.

Bent S, Padula A, Avins AL. Better ways to question patients about adverse medical events: a randomized, controlled trial. Annals of Internal Medicine 2006; 144 : 257-261.

Berlin JA. Does blinding of readers affect the results of meta-analyses? University of Pennsylvania Meta-analysis Blinding Study Group. Lancet 1997; 350 : 185-186.

Buscemi N, Hartling L, Vandermeer B, Tjosvold L, Klassen TP. Single data extraction generated more errors than double data extraction in systematic reviews. Journal of Clinical Epidemiology 2006; 59 : 697-703.

Carlisle JB, Dexter F, Pandit JJ, Shafer SL, Yentis SM. Calculating the probability of random sampling for continuous variables in submitted or published randomised controlled trials. Anaesthesia 2015; 70 : 848-858.

Carroll C, Patterson M, Wood S, Booth A, Rick J, Balain S. A conceptual framework for implementation fidelity. Implementation Science 2007; 2 : 40.

Carvajal A, Ortega PG, Sainz M, Velasco V, Salado I, Arias LHM, Eiros JM, Rubio AP, Castrodeza J. Adverse events associated with pandemic influenza vaccines: Comparison of the results of a follow-up study with those coming from spontaneous reporting. Vaccine 2011; 29 : 519-522.

Chamberlain C, O'Mara-Eves A, Porter J, Coleman T, Perlen SM, Thomas J, McKenzie JE. Psychosocial interventions for supporting women to stop smoking in pregnancy. Cochrane Database of Systematic Reviews 2017; 2 : CD001055.

Damschroder LJ, Aron DC, Keith RE, Kirsh SR, Alexander JA, Lowery JC. Fostering implementation of health services research findings into practice: a consolidated framework for advancing implementation science. Implementation Science 2009; 4 : 50.

Davis AL, Miller JD. The European Medicines Agency and publication of clinical study reports: a challenge for the US FDA. JAMA 2017; 317 : 905-906.

Denniston AK, Holland GN, Kidess A, Nussenblatt RB, Okada AA, Rosenbaum JT, Dick AD. Heterogeneity of primary outcome measures used in clinical trials of treatments for intermediate, posterior, and panuveitis. Orphanet Journal of Rare Diseases 2015; 10 : 97.

Derry S, Loke YK. Risk of gastrointestinal haemorrhage with long term use of aspirin: meta-analysis. BMJ 2000; 321 : 1183-1187.

Doshi P, Dickersin K, Healy D, Vedula SS, Jefferson T. Restoring invisible and abandoned trials: a call for people to publish the findings. BMJ 2013; 346 : f2865.

Dusenbury L, Brannigan R, Falco M, Hansen WB. A review of research on fidelity of implementation: implications for drug abuse prevention in school settings. Health Education Research 2003; 18 : 237-256.

Dwan K, Altman DG, Clarke M, Gamble C, Higgins JPT, Sterne JAC, Williamson PR, Kirkham JJ. Evidence for the selective reporting of analyses and discrepancies in clinical trials: a systematic review of cohort studies of clinical trials. PLoS Medicine 2014; 11 : e1001666.

Elia N, von Elm E, Chatagner A, Popping DM, Tramèr MR. How do authors of systematic reviews deal with research malpractice and misconduct in original studies? A cross-sectional analysis of systematic reviews and survey of their authors. BMJ Open 2016; 6 : e010442.

Gøtzsche PC. Multiple publication of reports of drug trials. European Journal of Clinical Pharmacology 1989; 36 : 429-432.

Gøtzsche PC, Hróbjartsson A, Maric K, Tendal B. Data extraction errors in meta-analyses that use standardized mean differences. JAMA 2007; 298 : 430-437.

Gross A, Schirm S, Scholz M. Ycasd - a tool for capturing and scaling data from graphical representations. BMC Bioinformatics 2014; 15 : 219.

Hoffmann TC, Glasziou PP, Boutron I, Milne R, Perera R, Moher D, Altman DG, Barbour V, Macdonald H, Johnston M, Lamb SE, Dixon-Woods M, McCulloch P, Wyatt JC, Chan AW, Michie S. Better reporting of interventions: template for intervention description and replication (TIDieR) checklist and guide. BMJ 2014; 348 : g1687.

ICH. ICH Harmonised tripartite guideline: Structure and content of clinical study reports E3. ICH; 1995. www.ich.org/fileadmin/Public_Web_Site/ICH_Products/Guidelines/Efficacy/E3/E3_Guideline.pdf.

Ioannidis JPA, Mulrow CD, Goodman SN. Adverse events: The more you search, the more you find. Annals of Internal Medicine 2006; 144 : 298-300.

Ip S, Hadar N, Keefe S, Parkin C, Iovin R, Balk EM, Lau J. A web-based archive of systematic review data. Systematic Reviews 2012; 1 : 15.

Ismail R, Azuara-Blanco A, Ramsay CR. Variation of clinical outcomes used in glaucoma randomised controlled trials: a systematic review. British Journal of Ophthalmology 2014; 98 : 464-468.

Jadad AR, Moore RA, Carroll D, Jenkinson C, Reynolds DJM, Gavaghan DJ, McQuay H. Assessing the quality of reports of randomized clinical trials: Is blinding necessary? Controlled Clinical Trials 1996; 17 : 1-12.

Jelicic Kadic A, Vucic K, Dosenovic S, Sapunar D, Puljak L. Extracting data from figures with software was faster, with higher interrater reliability than manual extraction. Journal of Clinical Epidemiology 2016; 74 : 119-123.

Jones AP, Remmington T, Williamson PR, Ashby D, Smyth RL. High prevalence but low impact of data extraction and reporting errors were found in Cochrane systematic reviews. Journal of Clinical Epidemiology 2005; 58 : 741-742.

Jones CW, Keil LG, Holland WC, Caughey MC, Platts-Mills TF. Comparison of registered and published outcomes in randomized controlled trials: a systematic review. BMC Medicine 2015; 13 : 282.

Jonnalagadda SR, Goyal P, Huffman MD. Automating data extraction in systematic reviews: a systematic review. Systematic Reviews 2015; 4 : 78.

Lewin S, Hendry M, Chandler J, Oxman AD, Michie S, Shepperd S, Reeves BC, Tugwell P, Hannes K, Rehfuess EA, Welch V, McKenzie JE, Burford B, Petkovic J, Anderson LM, Harris J, Noyes J. Assessing the complexity of interventions within systematic reviews: development, content and use of a new tool (iCAT_SR). BMC Medical Research Methodology 2017; 17 : 76.

Li G, Abbade LPF, Nwosu I, Jin Y, Leenus A, Maaz M, Wang M, Bhatt M, Zielinski L, Sanger N, Bantoto B, Luo C, Shams I, Shahid H, Chang Y, Sun G, Mbuagbaw L, Samaan Z, Levine MAH, Adachi JD, Thabane L. A scoping review of comparisons between abstracts and full reports in primary biomedical research. BMC Medical Research Methodology 2017; 17 : 181.

Li TJ, Vedula SS, Hadar N, Parkin C, Lau J, Dickersin K. Innovations in data collection, management, and archiving for systematic reviews. Annals of Internal Medicine 2015; 162 : 287-294.

Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JPA, Clarke M, Devereaux PJ, Kleijnen J, Moher D. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. PLoS Medicine 2009; 6 : e1000100.

Liu ZM, Saldanha IJ, Margolis D, Dumville JC, Cullum NA. Outcomes in Cochrane systematic reviews related to wound care: an investigation into prespecification. Wound Repair and Regeneration 2017; 25 : 292-308.

Marshall IJ, Kuiper J, Wallace BC. RobotReviewer: evaluation of a system for automatically assessing bias in clinical trials. Journal of the American Medical Informatics Association 2016; 23 : 193-201.

Mayo-Wilson E, Doshi P, Dickersin K. Are manufacturers sharing data as promised? BMJ 2015; 351 : h4169.

Mayo-Wilson E, Li TJ, Fusco N, Bertizzolo L, Canner JK, Cowley T, Doshi P, Ehmsen J, Gresham G, Guo N, Haythornthwaite JA, Heyward J, Hong H, Pham D, Payne JL, Rosman L, Stuart EA, Suarez-Cuervo C, Tolbert E, Twose C, Vedula S, Dickersin K. Cherry-picking by trialists and meta-analysts can drive conclusions about intervention efficacy. Journal of Clinical Epidemiology 2017a; 91 : 95-110.

Mayo-Wilson E, Fusco N, Li TJ, Hong H, Canner JK, Dickersin K, MUDS Investigators. Multiple outcomes and analyses in clinical trials create challenges for interpretation and research synthesis. Journal of Clinical Epidemiology 2017b; 86 : 39-50.

Mayo-Wilson E, Li T, Fusco N, Dickersin K. Practical guidance for using multiple data sources in systematic reviews and meta-analyses (with examples from the MUDS study). Research Synthesis Methods 2018; 9 : 2-12.

Meade MO, Richardson WS. Selecting and appraising studies for a systematic review. Annals of Internal Medicine 1997; 127 : 531-537.

Meinert CL. Clinical trials dictionary: Terminology and usage recommendations . Hoboken (NJ): Wiley; 2012.


The Beginner's Guide to Statistical Analysis | 5 Steps & Examples

Statistical analysis means investigating trends, patterns, and relationships using quantitative data . It is an important research tool used by scientists, governments, businesses, and other organizations.

To draw valid conclusions, statistical analysis requires careful planning from the very start of the research process . You need to specify your hypotheses and make decisions about your research design, sample size, and sampling procedure.

After collecting data from your sample, you can organize and summarize the data using descriptive statistics . Then, you can use inferential statistics to formally test hypotheses and make estimates about the population. Finally, you can interpret and generalize your findings.

This article is a practical introduction to statistical analysis for students and researchers. We’ll walk you through the steps using two research examples. The first investigates a potential cause-and-effect relationship, while the second investigates a potential correlation between variables.

Table of contents

  • Step 1: Write your hypotheses and plan your research design
  • Step 2: Collect data from a sample
  • Step 3: Summarize your data with descriptive statistics
  • Step 4: Test hypotheses or make estimates with inferential statistics
  • Step 5: Interpret your results
  • Other interesting articles

Step 1: Write your hypotheses and plan your research design

To collect valid data for statistical analysis, you first need to specify your hypotheses and plan out your research design.

Writing statistical hypotheses

The goal of research is often to investigate a relationship between variables within a population . You start with a prediction, and use statistical analysis to test that prediction.

A statistical hypothesis is a formal way of writing a prediction about a population. Every research prediction is rephrased into null and alternative hypotheses that can be tested using sample data.

While the null hypothesis always predicts no effect or no relationship between variables, the alternative hypothesis states your research prediction of an effect or relationship.

  • Null hypothesis: A 5-minute meditation exercise will have no effect on math test scores in teenagers.
  • Alternative hypothesis: A 5-minute meditation exercise will improve math test scores in teenagers.
  • Null hypothesis: Parental income and GPA have no relationship with each other in college students.
  • Alternative hypothesis: Parental income and GPA are positively correlated in college students.

Planning your research design

A research design is your overall strategy for data collection and analysis. It determines the statistical tests you can use to test your hypothesis later on.

First, decide whether your research will use a descriptive, correlational, or experimental design. Experiments directly influence variables, whereas descriptive and correlational studies only measure variables.

  • In an experimental design , you can assess a cause-and-effect relationship (e.g., the effect of meditation on test scores) using statistical tests of comparison or regression.
  • In a correlational design , you can explore relationships between variables (e.g., parental income and GPA) without any assumption of causality using correlation coefficients and significance tests.
  • In a descriptive design , you can study the characteristics of a population or phenomenon (e.g., the prevalence of anxiety in U.S. college students) using statistical tests to draw inferences from sample data.

Your research design also concerns whether you’ll compare participants at the group level or individual level, or both.

  • In a between-subjects design , you compare the group-level outcomes of participants who have been exposed to different treatments (e.g., those who performed a meditation exercise vs those who didn’t).
  • In a within-subjects design , you compare repeated measures from participants who have participated in all treatments of a study (e.g., scores from before and after performing a meditation exercise).
  • In a mixed (factorial) design , one variable is altered between subjects and another is altered within subjects (e.g., pretest and posttest scores from participants who either did or didn’t do a meditation exercise).
Example: Experimental research design

First, you’ll take baseline test scores from participants. Then, your participants will undergo a 5-minute meditation exercise. Finally, you’ll record participants’ scores from a second math test. In this experiment, the independent variable is the 5-minute meditation exercise, and the dependent variable is the math test score from before and after the intervention.

Example: Correlational research design

In a correlational study, you test whether there is a relationship between parental income and GPA in graduating college students. To collect your data, you will ask participants to fill in a survey and self-report their parents’ incomes and their own GPA.

Measuring variables

When planning a research design, you should operationalize your variables and decide exactly how you will measure them.

For statistical analysis, it’s important to consider the level of measurement of your variables, which tells you what kind of data they contain:

  • Categorical data represents groupings. These may be nominal (e.g., gender) or ordinal (e.g. level of language ability).
  • Quantitative data represents amounts. These may be on an interval scale (e.g. test score) or a ratio scale (e.g. age).

Many variables can be measured at different levels of precision. For example, age data can be quantitative (8 years old) or categorical (young). If a variable is coded numerically (e.g., level of agreement from 1–5), it doesn’t automatically mean that it’s quantitative instead of categorical.

Identifying the measurement level is important for choosing appropriate statistics and hypothesis tests. For example, you can calculate a mean score with quantitative data, but not with categorical data.

In a research study, along with measures of your variables of interest, you’ll often collect data on relevant participant characteristics.

Variable Type of data
Age Quantitative (ratio)
Gender Categorical (nominal)
Race or ethnicity Categorical (nominal)
Baseline test scores Quantitative (interval)
Final test scores Quantitative (interval)
Parental income Quantitative (ratio)
GPA Quantitative (interval)


Step 2: Collect data from a sample

In most cases, it’s too difficult or expensive to collect data from every member of the population you’re interested in studying. Instead, you’ll collect data from a sample.

Statistical analysis allows you to apply your findings beyond your own sample as long as you use appropriate sampling procedures . You should aim for a sample that is representative of the population.

Sampling for statistical analysis

There are two main approaches to selecting a sample.

  • Probability sampling: every member of the population has a chance of being selected for the study through random selection.
  • Non-probability sampling: some members of the population are more likely than others to be selected for the study because of criteria such as convenience or voluntary self-selection.

In theory, for highly generalizable findings, you should use a probability sampling method. Random selection reduces several types of research bias , like sampling bias , and ensures that data from your sample is actually typical of the population. Parametric tests can be used to make strong statistical inferences when data are collected using probability sampling.

But in practice, it’s rarely possible to gather the ideal sample. While non-probability samples are more at risk for biases like self-selection bias , they are much easier to recruit and collect data from. Non-parametric tests are more appropriate for non-probability samples, but they result in weaker inferences about the population.

If you want to use parametric tests for non-probability samples, you have to make the case that:

  • your sample is representative of the population you’re generalizing your findings to.
  • your sample lacks systematic bias.

Keep in mind that external validity means that you can only generalize your conclusions to others who share the characteristics of your sample. For instance, results from Western, Educated, Industrialized, Rich and Democratic samples (e.g., college students in the US) aren’t automatically applicable to all non-WEIRD populations.

If you apply parametric tests to data from non-probability samples, be sure to elaborate on the limitations of how far your results can be generalized in your discussion section .

Create an appropriate sampling procedure

Based on the resources available for your research, decide on how you’ll recruit participants.

  • Will you have resources to advertise your study widely, including outside of your university setting?
  • Will you have the means to recruit a diverse sample that represents a broad population?
  • Do you have time to contact and follow up with members of hard-to-reach groups?

Example: Sampling (experimental study)

Your participants are self-selected by their schools. Although you’re using a non-probability sample, you aim for a diverse and representative sample.

Example: Sampling (correlational study)

Your main population of interest is male college students in the US. Using social media advertising, you recruit senior-year male college students from a smaller subpopulation: seven universities in the Boston area.

Calculate sufficient sample size

Before recruiting participants, decide on your sample size either by looking at other studies in your field or using statistics. A sample that’s too small may be unrepresentative of the population, while a sample that’s too large will be more costly than necessary.

There are many sample size calculators online. Different formulas are used depending on whether you have subgroups or how rigorous your study should be (e.g., in clinical research). As a rule of thumb, a minimum of 30 units per subgroup is generally recommended.

To use these calculators, you have to understand and input these key components:

  • Significance level (alpha): the risk of rejecting a true null hypothesis that you are willing to take, usually set at 5%.
  • Statistical power : the probability of your study detecting an effect of a certain size if there is one, usually 80% or higher.
  • Expected effect size : a standardized indication of how large the expected result of your study will be, usually based on other similar studies.
  • Population standard deviation: an estimate of the population parameter based on a previous study or a pilot study of your own.
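
If you’d rather compute a sample size in code than with an online calculator, power-analysis libraries take the same inputs. The sketch below is a minimal example using Python’s statsmodels package, assuming a two-group comparison of means; the effect size, alpha, and power values are illustrative placeholders, not recommendations.

```python
# A minimal sketch of an a priori sample size calculation for a two-group
# comparison using statsmodels. The effect size, alpha, and power values
# below are illustrative placeholders, not recommendations.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.5,          # expected standardized effect size (Cohen's d)
    alpha=0.05,               # significance level
    power=0.80,               # desired statistical power
    alternative="two-sided",
)
print(f"Required sample size per group: {n_per_group:.0f}")
```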

Step 3: Summarize your data with descriptive statistics

Once you’ve collected all of your data, you can inspect them and calculate descriptive statistics that summarize them.

Inspect your data

There are various ways to inspect your data, including the following:

  • Organizing data from each variable in frequency distribution tables .
  • Displaying data from a key variable in a bar chart to view the distribution of responses.
  • Visualizing the relationship between two variables using a scatter plot .

By visualizing your data in tables and graphs, you can assess whether your data follow a skewed or normal distribution and whether there are any outliers or missing data.
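
As a rough illustration of this inspection step, the Python sketch below builds a frequency table, a bar chart, and a scatter plot with pandas and matplotlib. The small data frame and its column names ("group", "income", "gpa") are hypothetical stand-ins for real survey data.

```python
# A rough sketch of inspecting data with pandas and matplotlib.
# The small data frame below stands in for real survey data; the column
# names ("group", "income", "gpa") are hypothetical.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "group":  ["meditation", "control", "meditation", "control", "meditation", "control"],
    "income": [42, 58, 61, 47, 83, 55],          # hypothetical, in thousands of USD
    "gpa":    [3.1, 2.9, 3.4, 3.0, 3.8, 3.2],
})

# Frequency distribution table for a categorical variable
print(df["group"].value_counts())

# Bar chart showing the distribution of responses
df["group"].value_counts().plot(kind="bar")
plt.show()

# Scatter plot to visualize the relationship between two quantitative variables
df.plot(kind="scatter", x="income", y="gpa")
plt.show()
```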

A normal distribution means that your data are symmetrically distributed around a center where most values lie, with the values tapering off at the tail ends.

Figure: Mean, median, mode, and standard deviation in a normal distribution.

In contrast, a skewed distribution is asymmetric and has more values on one end than the other. The shape of the distribution is important to keep in mind because only some descriptive statistics should be used with skewed distributions.

Extreme outliers can also produce misleading statistics, so you may need a systematic approach to dealing with these values.

Calculate measures of central tendency

Measures of central tendency describe where most of the values in a data set lie. Three main measures of central tendency are often reported:

  • Mode : the most popular response or value in the data set.
  • Median : the value in the exact middle of the data set when ordered from low to high.
  • Mean : the sum of all values divided by the number of values.

However, depending on the shape of the distribution and level of measurement, only one or two of these measures may be appropriate. For example, many demographic characteristics can only be described using the mode or proportions, while a variable like reaction time may not have a mode at all.

Calculate measures of variability

Measures of variability tell you how spread out the values in a data set are. Four main measures of variability are often reported:

  • Range : the highest value minus the lowest value of the data set.
  • Interquartile range : the range of the middle half of the data set.
  • Standard deviation : the average distance between each value in your data set and the mean.
  • Variance : the square of the standard deviation.

Once again, the shape of the distribution and level of measurement should guide your choice of variability statistics. The interquartile range is the best measure for skewed distributions, while standard deviation and variance provide the best information for normal distributions.
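
These measures of central tendency and variability are straightforward to compute in code. Here is a minimal Python sketch using numpy and scipy on a small, made-up set of test scores.

```python
# A minimal sketch of descriptive statistics on a small, made-up set of scores.
import numpy as np
from scipy import stats

scores = np.array([55, 61, 68, 68, 70, 72, 75, 78, 84, 90])

print("Mean:     ", np.mean(scores))
print("Median:   ", np.median(scores))
print("Mode:     ", stats.mode(scores, keepdims=False).mode)
print("Range:    ", np.ptp(scores))               # highest value minus lowest value
print("IQR:      ", stats.iqr(scores))            # interquartile range
print("Std dev:  ", np.std(scores, ddof=1))       # sample standard deviation
print("Variance: ", np.var(scores, ddof=1))       # square of the standard deviation
```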

Using your table, you should check whether the units of the descriptive statistics are comparable for pretest and posttest scores. For example, are the variance levels similar across the groups? Are there any extreme values? If there are, you may need to identify and remove extreme outliers in your data set or transform your data before performing a statistical test.

                     Pretest scores    Posttest scores
Mean                 68.44             75.25
Standard deviation   9.43              9.88
Variance             88.96             97.96
Range                36.25             45.12
Sample size (n)      30

From this table, we can see that the mean score increased after the meditation exercise, and the variances of the two scores are comparable. Next, we can perform a statistical test to find out if this improvement in test scores is statistically significant in the population.

Example: Descriptive statistics (correlational study)

After collecting data from 653 students, you tabulate descriptive statistics for annual parental income and GPA.

It’s important to check whether you have a broad range of data points. If you don’t, your data may be skewed towards some groups more than others (e.g., high academic achievers), and only limited inferences can be made about a relationship.

                     Parental income (USD)    GPA
Mean                 62,100                   3.12
Standard deviation   15,000                   0.45
Variance             225,000,000              0.16
Range                8,000–378,000            2.64–4.00
Sample size (n)      653

Step 4: Test hypotheses or make estimates with inferential statistics

A number that describes a sample is called a statistic , while a number describing a population is called a parameter . Using inferential statistics , you can make conclusions about population parameters based on sample statistics.

Researchers often use two main methods (simultaneously) to make inferences in statistics.

  • Estimation: calculating population parameters based on sample statistics.
  • Hypothesis testing: a formal process for testing research predictions about the population using samples.

You can make two types of estimates of population parameters from sample statistics:

  • A point estimate : a value that represents your best guess of the exact parameter.
  • An interval estimate : a range of values that represent your best guess of where the parameter lies.

If your aim is to infer and report population characteristics from sample data, it’s best to use both point and interval estimates in your paper.

You can consider a sample statistic a point estimate for the population parameter when you have a representative sample (e.g., in a wide public opinion poll, the proportion of a sample that supports the current government is taken as the population proportion of government supporters).

There’s always error involved in estimation, so you should also provide a confidence interval as an interval estimate to show the variability around a point estimate.

A confidence interval uses the standard error and the z score from the standard normal distribution to convey where you’d generally expect to find the population parameter most of the time.
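
As a minimal illustration, the Python sketch below computes a point estimate and a 95% confidence interval from a made-up sample, using the standard error and the z score from the standard normal distribution.

```python
# A minimal sketch of a point estimate and a 95% confidence interval,
# computed from made-up scores using the standard error and a z score.
import numpy as np
from scipy import stats

sample = np.array([68, 72, 75, 70, 66, 74, 71, 69, 73, 77])

mean = sample.mean()                 # point estimate of the population mean
se = stats.sem(sample)               # standard error of the mean
z = stats.norm.ppf(0.975)            # z score for a 95% interval
lower, upper = mean - z * se, mean + z * se
print(f"Point estimate: {mean:.2f}, 95% CI: [{lower:.2f}, {upper:.2f}]")
```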

Hypothesis testing

Using data from a sample, you can test hypotheses about relationships between variables in the population. Hypothesis testing starts with the assumption that the null hypothesis is true in the population, and you use statistical tests to assess whether the null hypothesis can be rejected or not.

Statistical tests determine where your sample data would lie on an expected distribution of sample data if the null hypothesis were true. These tests give two main outputs:

  • A test statistic tells you how much your data differs from the null hypothesis of the test.
  • A p value tells you the likelihood of obtaining your results if the null hypothesis is actually true in the population.

Statistical tests come in three main varieties:

  • Comparison tests assess group differences in outcomes.
  • Regression tests assess cause-and-effect relationships between variables.
  • Correlation tests assess relationships between variables without assuming causation.

Your choice of statistical test depends on your research questions, research design, sampling method, and data characteristics.

Parametric tests

Parametric tests make powerful inferences about the population based on sample data. But to use them, some assumptions must be met, and only some types of variables can be used. If your data violate these assumptions, you can perform appropriate data transformations or use alternative non-parametric tests instead.

A regression models the extent to which changes in a predictor variable result in changes in an outcome variable (or variables).

  • A simple linear regression includes one predictor variable and one outcome variable.
  • A multiple linear regression includes two or more predictor variables and one outcome variable.
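
To illustrate both forms, here is a short Python sketch of a simple and a multiple linear regression using statsmodels. The variables (gpa, income, study_hours) and their values are made up purely for illustration.

```python
# A minimal sketch of simple and multiple linear regression with statsmodels.
# The variables (gpa, income, study_hours) and values are made up.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "gpa":         [2.8, 3.1, 3.4, 2.9, 3.7, 3.2, 3.9, 3.0],
    "income":      [40, 55, 65, 48, 80, 60, 95, 52],     # in thousands of USD
    "study_hours": [10, 12, 15, 9, 20, 14, 22, 11],
})

simple = smf.ols("gpa ~ income", data=df).fit()                  # one predictor
multiple = smf.ols("gpa ~ income + study_hours", data=df).fit()  # two predictors
print(simple.summary())
print(multiple.summary())
```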

Comparison tests usually compare the means of groups. These may be the means of different groups within a sample (e.g., a treatment and control group), the means of one sample group taken at different times (e.g., pretest and posttest scores), or a sample mean and a population mean.

  • A t test is for exactly 1 or 2 groups when the sample is small (30 or fewer).
  • A z test is for exactly 1 or 2 groups when the sample is large.
  • An ANOVA is for 3 or more groups.

The z and t tests have subtypes based on the number and types of samples and the hypotheses:

  • If you have only one sample that you want to compare to a population mean, use a one-sample test .
  • If you have paired measurements (within-subjects design), use a dependent (paired) samples test .
  • If you have completely separate measurements from two unmatched groups (between-subjects design), use an independent (unpaired) samples test .
  • If you expect a difference between groups in a specific direction, use a one-tailed test .
  • If you don’t have any expectations for the direction of a difference between groups, use a two-tailed test .
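
The following Python sketch shows how these t test variants look in scipy. The score arrays and the hypothesized population mean are invented purely for illustration.

```python
# A minimal sketch of the t test variants above using scipy.
# All scores, and the hypothesized population mean, are made up.
import numpy as np
from scipy import stats

pretest  = np.array([64, 70, 68, 72, 66, 75, 69, 71])
posttest = np.array([70, 74, 71, 78, 70, 80, 73, 76])
group_a  = np.array([64, 70, 68, 72, 66, 75])
group_b  = np.array([70, 74, 71, 78, 70, 80])

# One-sample test against a hypothesized population mean of 70
print(stats.ttest_1samp(pretest, popmean=70))

# Dependent (paired) samples, one-tailed: posttest expected to be greater
print(stats.ttest_rel(posttest, pretest, alternative="greater"))

# Independent (unpaired) samples, two-tailed
print(stats.ttest_ind(group_a, group_b))
```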

The only parametric correlation test is Pearson’s r . The correlation coefficient ( r ) tells you the strength of a linear relationship between two quantitative variables.

However, to test whether the correlation in the sample is strong enough to be important in the population, you also need to perform a significance test of the correlation coefficient, usually a t test, to obtain a p value. This test uses your sample size to calculate how much the correlation coefficient differs from zero in the population.
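
In code, Pearson’s r and its significance test are usually computed together. The minimal sketch below uses scipy’s pearsonr, which returns the correlation coefficient along with a two-tailed p value; the income and GPA values are made up.

```python
# A minimal sketch of Pearson's r and its significance test using scipy.
# pearsonr returns the correlation coefficient and a two-tailed p value;
# the income and GPA values are made up.
import numpy as np
from scipy import stats

income = np.array([40, 55, 65, 48, 80, 60, 95, 52])       # hypothetical, in thousands
gpa    = np.array([2.8, 3.1, 3.4, 2.9, 3.7, 3.2, 3.9, 3.0])

r, p_value = stats.pearsonr(income, gpa)
print(f"r = {r:.2f}, p = {p_value:.4f}")
```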

Example: Hypothesis testing (experimental study)

You use a dependent-samples, one-tailed t test to assess whether the meditation exercise significantly improved math test scores. The test gives you:

  • a t value (test statistic) of 3.00
  • a p value of 0.0028

Although Pearson’s r is a test statistic, it doesn’t tell you anything about how significant the correlation is in the population. You also need to test whether this sample correlation coefficient is large enough to demonstrate a correlation in the population.

A t test can also determine how significantly a correlation coefficient differs from zero based on sample size.

Example: Hypothesis testing (correlational study)

Since you expect a positive correlation between parental income and GPA, you use a one-sample, one-tailed t test. The t test gives you:

  • a t value of 3.08
  • a p value of 0.001


Step 5: Interpret your results

The final step of statistical analysis is interpreting your results.

Statistical significance

In hypothesis testing, statistical significance is the main criterion for forming conclusions. You compare your p value to a set significance level (usually 0.05) to decide whether your results are statistically significant or non-significant.

Statistically significant results are considered unlikely to have arisen solely due to chance. There is only a very low chance of such a result occurring if the null hypothesis is true in the population.

Example: Interpret your results (experimental study)

You compare your p value of 0.0028 to your significance threshold of 0.05. With a p value under this threshold, you can reject the null hypothesis. This means that you believe the meditation intervention, rather than random factors, directly caused the increase in test scores.

Example: Interpret your results (correlational study)

You compare your p value of 0.001 to your significance threshold of 0.05. With a p value under this threshold, you can reject the null hypothesis. This indicates a statistically significant correlation between parental income and GPA in male college students.

Note that correlation doesn’t always mean causation, because there are often many underlying factors contributing to a complex variable like GPA. Even if one variable is related to another, this may be because of a third variable influencing both of them, or indirect links between the two variables.

Effect size

A statistically significant result doesn’t necessarily mean that there are important real life applications or clinical outcomes for a finding.

In contrast, the effect size indicates the practical significance of your results. It’s important to report effect sizes along with your inferential statistics for a complete picture of your results. You should also report interval estimates of effect sizes if you’re writing an APA style paper .

Example: Effect size (experimental study)

With a Cohen’s d of 0.72, there’s medium to high practical significance to your finding that the meditation exercise improved test scores.

Example: Effect size (correlational study)

To determine the effect size of the correlation coefficient, you compare your Pearson’s r value to Cohen’s effect size criteria.
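
As a rough illustration of the experimental example, the Python sketch below computes one common variant of Cohen’s d using a pooled standard deviation. The pretest and posttest scores are made up, and other formulas exist, particularly for paired (within-subjects) designs.

```python
# A rough sketch of one common Cohen's d formula, using a pooled standard
# deviation across two sets of made-up scores. Other variants exist,
# especially for paired (within-subjects) designs.
import numpy as np

pretest  = np.array([64, 70, 68, 72, 66, 75, 69, 71])
posttest = np.array([70, 74, 71, 78, 70, 80, 73, 76])

pooled_sd = np.sqrt((np.var(pretest, ddof=1) + np.var(posttest, ddof=1)) / 2)
cohens_d = (posttest.mean() - pretest.mean()) / pooled_sd
print(f"Cohen's d = {cohens_d:.2f}")
```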

Decision errors

Type I and Type II errors are mistakes made in research conclusions. A Type I error means rejecting the null hypothesis when it’s actually true, while a Type II error means failing to reject the null hypothesis when it’s false.

You can aim to minimize the risk of these errors by selecting an optimal significance level and ensuring high power . However, there’s a trade-off between the two errors, so a fine balance is necessary.

Frequentist versus Bayesian statistics

Traditionally, frequentist statistics emphasizes null hypothesis significance testing and always starts with the assumption of a true null hypothesis.

However, Bayesian statistics has grown in popularity as an alternative approach in the last few decades. In this approach, you use previous research to continually update your hypotheses based on your expectations and observations.

Bayes factor compares the relative strength of evidence for the null versus the alternative hypothesis rather than making a conclusion about rejecting the null hypothesis or not.

Other interesting articles

If you want to know more about statistics , methodology , or research bias , make sure to check out some of our other articles with explanations and examples.

Statistics

  • Student’s t-distribution
  • Normal distribution
  • Null and Alternative Hypotheses
  • Chi square tests
  • Confidence interval

Methodology

  • Cluster sampling
  • Stratified sampling
  • Data cleansing
  • Reproducibility vs Replicability
  • Peer review
  • Likert scale

Research bias

  • Implicit bias
  • Framing effect
  • Cognitive bias
  • Placebo effect
  • Hawthorne effect
  • Hostile attribution bias
  • Affect heuristic



Data Analysis in Research: Types & Methods


Content Index

  • What is data analysis in research?
  • Why analyze data in research?
  • Types of data in research
  • Finding patterns in the qualitative data
  • Methods used for data analysis in qualitative research
  • Preparing data for analysis
  • Methods used for data analysis in quantitative research
  • Considerations in research data analysis

Definition of data analysis in research: According to LeCompte and Schensul, research data analysis is a process used by researchers to reduce data to a story and interpret it to derive insights. The data analysis process helps reduce a large chunk of data into smaller fragments that make sense.

Three essential things occur during the data analysis process. The first is data organization. The second is data reduction through summarization and categorization, which helps find patterns and themes in the data for easy identification and linking. The third is data analysis itself, which researchers carry out in both top-down and bottom-up fashion.


On the other hand, Marshall and Rossman describe data analysis as a messy, ambiguous, and time-consuming but creative and fascinating process through which a mass of collected data is brought to order, structure and meaning.

We can say that “the data analysis and data interpretation is a process representing the application of deductive and inductive logic to the research and data analysis.”

Researchers rely heavily on data as they have a story to tell or research problems to solve. It starts with a question, and data is nothing but an answer to that question. But, what if there is no question to ask? Well! It is possible to explore data even without a problem – we call it ‘Data Mining’, which often reveals some interesting patterns within the data that are worth exploring.

Irrelevant to the type of data researchers explore, their mission and audiences’ vision guide them to find the patterns to shape the story they want to tell. One of the essential things expected from researchers while analyzing data is to stay open and remain unbiased toward unexpected patterns, expressions, and results. Remember, sometimes, data analysis tells the most unforeseen yet exciting stories that were not expected when initiating data analysis. Therefore, rely on the data you have at hand and enjoy the journey of exploratory research. 


Every kind of data describes something once a specific value has been assigned to it. For analysis, you need to organize these values and process and present them in a given context to make them useful. Data can come in different forms; here are the primary data types.

  • Qualitative data: When the data presented consists of words and descriptions, we call it qualitative data . Although you can observe this data, it is subjective and harder to analyze in research, especially for comparison. Example: anything describing taste, experience, texture, or an opinion is qualitative data. This type of data is usually collected through focus groups, personal qualitative interviews , qualitative observation, or open-ended questions in surveys.
  • Quantitative data: Any data expressed in numbers or numerical figures is called quantitative data . This type of data can be distinguished into categories, grouped, measured, calculated, or ranked. Example: questions about age, rank, cost, length, weight, scores, etc. all produce this type of data. You can present such data in graphical formats or charts, or apply statistical analysis methods to it. The OMS (Outcomes Measurement Systems) questionnaires in surveys are a significant source of numeric data.
  • Categorical data: This is data presented in groups, where an item cannot belong to more than one group. Example: a person responding to a survey by reporting their living style, marital status, smoking habit, or drinking habit provides categorical data. A chi-square test is a standard method used to analyze this data (a short sketch follows this list).
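
As a minimal sketch of the chi-square test mentioned above, the Python example below runs a test of independence on a made-up contingency table of marital status by smoking habit, using scipy.

```python
# A minimal sketch of a chi-square test of independence on categorical data,
# using a made-up contingency table of marital status by smoking habit.
import pandas as pd
from scipy.stats import chi2_contingency

table = pd.DataFrame(
    {"Smoker": [20, 35], "Non-smoker": [80, 65]},
    index=["Married", "Single"],
)

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}, dof = {dof}")
```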


Data analysis in qualitative research

Data analysis in qualitative research works a little differently from numerical data, as qualitative data is made up of words, descriptions, images, objects, and sometimes symbols. Getting insight from such complex information is an involved process, which is why it is typically used for exploratory research and data analysis .

Although there are several ways to find patterns in textual information, a word-based method is the most relied-upon and widely used technique for research and data analysis. Notably, the data analysis process in qualitative research is largely manual. Here the researchers usually read the available data and find repetitive or commonly used words.

For example, while studying data collected from African countries to understand the most pressing issues people face, researchers might find  “food”  and  “hunger” are the most commonly used words and will highlight them for further analysis.


The keyword context is another widely used word-based technique. In this method, the researcher tries to understand the concept by analyzing the context in which the participants use a particular keyword.  

For example , researchers conducting research and data analysis for studying the concept of ‘diabetes’ amongst respondents might analyze the context of when and how the respondent has used or referred to the word ‘diabetes.’

The scrutiny-based technique is another highly recommended  text analysis  method used to identify patterns in qualitative data. Compare and contrast is the most widely used method under this technique, differentiating how specific texts are similar to or different from each other.

For example: To find out the “importance of a resident doctor in a company,” the collected data is divided into people who think it is necessary to hire a resident doctor and those who think it is unnecessary. Compare and contrast is the best method for analyzing polls with single-answer question types .

Metaphors can be used to reduce the data pile and find patterns in it so that it becomes easier to connect data with theory.

Variable Partitioning is another technique used to split variables so that researchers can find more coherent descriptions and explanations from the enormous data.


There are several techniques to analyze the data in qualitative research, but here are some commonly used methods:

  • Content Analysis:  It is widely accepted and the most frequently employed technique for data analysis in research methodology. It can be used to analyze the documented information from text, images, and sometimes from the physical items. It depends on the research questions to predict when and where to use this method.
  • Narrative Analysis: This method is used to analyze content gathered from various sources such as personal interviews, field observation, and  surveys . The majority of times, stories, or opinions shared by people are focused on finding answers to the research questions.
  • Discourse Analysis:  Similar to narrative analysis, discourse analysis is used to analyze the interactions with people. Nevertheless, this particular method considers the social context under which or within which the communication between the researcher and respondent takes place. In addition to that, discourse analysis also focuses on the lifestyle and day-to-day environment while deriving any conclusion.
  • Grounded Theory:  When you want to explain why a particular phenomenon happened, then using grounded theory for analyzing quality data is the best resort. Grounded theory is applied to study data about the host of similar cases occurring in different settings. When researchers are using this method, they might alter explanations or produce new ones until they arrive at some conclusion.


Data analysis in quantitative research

The first stage in research and data analysis is to prepare the data for analysis so that the nominal data can be converted into something meaningful. Data preparation consists of the phases below.

Phase I: Data Validation

Data validation is done to understand whether the collected data sample meets the pre-set standards or is a biased data sample. It is divided into four different stages:

  • Fraud: To ensure an actual human being records each response to the survey or the questionnaire
  • Screening: To make sure each participant or respondent is selected or chosen in compliance with the research criteria
  • Procedure: To ensure ethical standards were maintained while collecting the data sample
  • Completeness: To ensure that the respondent has answered all the questions in an online survey. Else, the interviewer had asked all the questions devised in the questionnaire.

Phase II: Data Editing

More often than not, an extensive research data sample comes loaded with errors. Respondents sometimes fill in some fields incorrectly or skip them accidentally. Data editing is a process wherein the researchers confirm that the provided data is free of such errors. They need to conduct the necessary checks, including outlier checks, to edit the raw data and make it ready for analysis.

Phase III: Data Coding

Out of all three, this is the most critical phase of data preparation, associated with grouping and assigning values to the survey responses . If a survey is completed with a sample size of 1,000, the researcher will create age brackets to distinguish the respondents based on their age (see the sketch below). It is easier to analyze small data buckets than to deal with a massive data pile.
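
Here is a minimal Python sketch of that coding step using pandas; the ages and bracket edges are illustrative only.

```python
# A minimal sketch of the data coding step: grouping respondents' ages into
# brackets with pandas. The ages and bracket edges are illustrative only.
import pandas as pd

responses = pd.DataFrame({"age": [19, 24, 31, 45, 52, 38, 27, 63]})

responses["age_bracket"] = pd.cut(
    responses["age"],
    bins=[18, 25, 35, 50, 65],
    labels=["18-25", "26-35", "36-50", "51-65"],
)
print(responses["age_bracket"].value_counts())
```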


After the data is prepared for analysis, researchers are open to using different research and data analysis methods to derive meaningful insights. Statistical analysis is the most favored approach for analyzing numerical data. In statistical analysis, distinguishing between categorical data and numerical data is essential, as categorical data involves distinct categories or labels, while numerical data consists of measurable quantities. The methods are classified into two groups: descriptive statistics, used to describe data, and inferential statistics, which help in comparing the data.

Descriptive statistics

This method is used to describe the basic features of versatile types of data in research. It presents the data in such a meaningful way that patterns in the data start making sense. Nevertheless, descriptive analysis does not go beyond summarizing the data; any conclusions are still based on the hypotheses researchers have formulated so far. Here are a few major types of descriptive analysis methods.

Measures of Frequency

  • Count, Percent, Frequency
  • It is used to denote how often a particular event occurs.
  • Researchers use it when they want to showcase how often a response is given.

Measures of Central Tendency

  • Mean, Median, Mode
  • The method is widely used to demonstrate distribution by various points.
  • Researchers use this method when they want to showcase the most commonly or averagely indicated response.

Measures of Dispersion or Variation

  • Range, Variance, Standard deviation
  • The range equals the difference between the highest and lowest points.
  • Variance and standard deviation reflect how far observed scores fall from the mean.
  • It is used to identify the spread of scores by stating intervals.
  • Researchers use this method to showcase how spread out the data is and how much that spread affects the mean.

Measures of Position

  • Percentile ranks, Quartile ranks
  • It relies on standardized scores helping researchers to identify the relationship between different scores.
  • It is often used when researchers want to compare scores with the average count.

For quantitative research, descriptive analysis often gives absolute numbers, but those numbers alone are rarely sufficient to demonstrate the rationale behind them. Nevertheless, it is necessary to think of the best method for research and data analysis suiting your survey questionnaire and the story researchers want to tell. For example, the mean is the best way to demonstrate students’ average scores in schools. It is better to rely on descriptive statistics when researchers intend to keep the research or outcome limited to the provided  sample  without generalizing it. For example, when you want to compare the average voting done in two different cities, descriptive statistics are enough.

Descriptive analysis is also called a ‘univariate analysis’ since it is commonly used to analyze a single variable.

Inferential statistics

Inferential statistics are used to make predictions about a larger population after research and data analysis of a collected sample representing that population. For example, you can ask about 100 audience members at a movie theater if they like the movie they are watching. Researchers then use inferential statistics on the collected  sample  to reason that about 80-90% of people like the movie.

Here are two significant areas of inferential statistics.

  • Estimating parameters: It takes statistics from the sample research data and demonstrates something about the population parameter.
  • Hypothesis test: It’s about sampling research data to answer the survey research questions. For example, researchers might be interested to understand if the new shade of lipstick recently launched is good or not, or if the multivitamin capsules help children to perform better at games.

These are sophisticated analysis methods used to showcase the relationship between different variables instead of describing a single variable. It is often used when researchers want something beyond absolute numbers to understand the relationship between variables.

Here are some of the commonly used methods for data analysis in research.

  • Correlation: When researchers are not conducting experimental or quasi-experimental research but are interested in understanding the relationship between two or more variables, they opt for correlational research methods.
  • Cross-tabulation: Also called contingency tables,  cross-tabulation  is used to analyze the relationship between multiple variables.  Suppose provided data has age and gender categories presented in rows and columns. A two-dimensional cross-tabulation helps for seamless data analysis and research by showing the number of males and females in each age category.
  • Regression analysis: To understand the strength of the relationship between two variables, researchers rely on the primary and commonly used regression analysis method, which is also a type of predictive analysis. In this method, you have an essential factor called the dependent variable and one or more independent variables. You undertake efforts to find out the impact of the independent variables on the dependent variable. The values of both independent and dependent variables are assumed to have been ascertained in an error-free random manner.
  • Frequency tables: A frequency table records how often each value or category occurs in the data. It is useful for summarizing responses and spotting unusual distributions before applying further tests.
  • Analysis of variance: The statistical procedure is used for testing the degree to which two or more vary or differ in an experiment. A considerable degree of variation means research findings were significant. In many contexts, ANOVA testing and variance analysis are similar.
  • Researchers must have the necessary research skills to analyze and manipulate the data , and be trained to demonstrate a high standard of research practice. Ideally, researchers should possess more than a basic understanding of the rationale for selecting one statistical method over another to obtain better data insights.
  • Usually, research and data analytics projects differ by scientific discipline; therefore, getting statistical advice at the beginning of analysis helps design a survey questionnaire, select data collection methods , and choose samples.


  • The primary aim of data research and analysis is to derive unbiased insights. Any mistake in collecting data, selecting an analysis method, or choosing an  audience  sample with a biased mindset is likely to produce a biased inference.
  • No amount of sophistication in research data and analysis can rectify poorly defined objectives or outcome measurements. Whether the design is at fault or the intentions are unclear, a lack of clarity can mislead readers, so avoid the practice.
  • The motive behind data analysis in research is to present accurate and reliable data. As far as possible, avoid statistical errors, and find a way to deal with everyday challenges like outliers, missing data, data altering, data mining , or developing graphical representation.
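
To illustrate the analysis of variance (ANOVA) method listed above, here is a minimal Python sketch using scipy; the three groups of scores are made up for illustration.

```python
# A minimal sketch of a one-way analysis of variance (ANOVA) comparing a
# numeric outcome across three groups with scipy. The scores are made up.
import numpy as np
from scipy import stats

group_a = np.array([72, 75, 70, 78, 74])
group_b = np.array([68, 71, 66, 73, 69])
group_c = np.array([80, 77, 82, 79, 84])

f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```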

The sheer amount of data generated daily is staggering, especially now that data analysis has taken center stage. In 2018 alone, the total data supply amounted to 2.8 trillion gigabytes. Hence, it is clear that enterprises willing to survive in a hypercompetitive world must possess an excellent capability to analyze complex research data, derive actionable insights, and adapt to new market needs.


QuestionPro is an online survey platform that empowers organizations in data analysis and research and provides them a medium to collect data by creating appealing surveys.


Case Western Reserve University

  • Research Data Lifecycle Guide

Data Collection

Data collection is the process of gathering and measuring information used for research. Collecting data is one of the most important steps in the research process, and is part of all disciplines, including the physical and social sciences, humanities, business, etc. Data comes in many forms, with different ways to store and record it, whether written in a lab notebook or recorded digitally on a computer system.

While methods may differ across disciplines,  good data management processes begin with accurately and clearly describing the information recorded, the process used to collect the data, practices that ensure the quality of the data, and sharing data to enable reproducibility. This section breaks down different topics that need to be addressed while collecting and managing data for research.

Learn more about what’s required for data collection as a researcher at Case Western Reserve University. 

Ensuring Accurate and Appropriate Data Collection

Accurate data collection is vital to ensure the integrity of research . When planning and executing a research project, it is important to consider the methods of collection and the storage of data so that results can be used for publications and reporting. The consequences of improper data collection include:

  • inability to answer research questions accurately
  • inability to repeat and validate the study
  • distorted findings resulting in wasted resources
  • misleading other researchers to pursue fruitless avenues of investigation
  • compromising decisions for public policy
  • causing harm to human participants and animal subjects

While the degree of impact from inaccurate data may vary by discipline, there is a potential to cause disproportionate harm when data is misrepresented and misused. This includes fraud or scientific misconduct.

Any data collected in the course of your research should follow RDM best practices to ensure accurate and appropriate data collection. This includes as appropriate, developing data collection protocols and processes to ensure inconsistencies and other errors are caught and corrected in a timely manner.

Examples of Research Data

Research data is any information that has been collected, observed, generated or created in association with research processes and findings.

Much research data is digital in format, but research data can also extend to non-digital formats such as laboratory notebooks, diaries, or written responses to surveys. Examples may include (but are not limited to):

  • Excel spreadsheets that contain instrument data
  • Documents (text, Word) containing study results
  • Laboratory notebooks, field notebooks, diaries
  • Questionnaires, transcripts, codebooks
  • Audiotapes, videotapes
  • Photographs, films
  • Protein or genetic sequences
  • Test responses
  • Slides, artifacts, specimens, samples
  • Collection of digital objects acquired and generated during the process of research
  • Database contents (video, audio, text, images)
  • Models, algorithms, scripts
  • Contents of an application (input, output, logfiles for analysis software, simulation software, schemas)
  • Source code used in application development

To ensure reproducibility of experiments and results, be sure to include and document information such as: 

  • Methodologies and workflows
  • Standard operating procedures and protocols

Data Use Agreements 

When working with data, it is important to understand any restrictions that need to be addressed due to the sensitivity of the data. This includes how you download and share the data with other collaborators, and how it needs to be properly secured.

Datasets can include potentially sensitive data that needs to be protected and not openly shared. In this case, the dataset cannot be shared or downloaded without permission from CWRU Research Administration and may require an agreement between collaborators and their institutions. All parties will need to abide by the agreement terms, including the destruction of data once the collaboration is complete.

Storage Options 

UTech provides cloud and on-premise storage to support the university research mission. This includes Google Drive , Box , Microsoft 365 , and various on-premise solutions for high speed access and mass storage. A listing of supported options can be found on UTech’s website .

In addition to UTech-supported storage solutions, CWRU also maintains an institutional subscription to OSF (Open Science Framework) . OSF is a cloud-based data storage, sharing, and project collaboration platform that connects to many other cloud services like Drive, Box, and Github to amplify your research and data visibility and discoverability. OSF storage is functionally unlimited.

When selecting a storage platform it is important to understand how you plan to analyze and store your data. Cloud storage provides the ability to store and share data effortlessly and provides capabilities such as revisioning and other means to protect your data. On-premise storage is useful when you have large storage demands and require a high speed connection to instruments that generate data and systems that process data. Both types of storage have their advantages and disadvantages that you should consider when planning your research project.

Data Security

Data security is a set of processes and ongoing practices designed to protect information and the systems used to store and process data. This includes computer systems, files, databases, applications, user accounts, networks, and services on institutional premises, in the cloud, and remotely at the location of individual researchers. 

Effective data security takes into account the confidentiality, integrity, and availability of the information and its use. This is especially important when data contains personally identifiable information, intellectual property, trade secrets, and or technical data supporting technology transfer agreements (before public disclosure decisions have been made).

Data Categorization 

CWRU uses a 3-tier system to categorize research data based on information types and sensitivity . Determination is based upon risk to the University in the areas of confidentiality, integrity, and availability of data in support of the University's research mission. In this context, confidentiality measures to what extent information can be disclosed to others, integrity is the assurance that the information is trustworthy and accurate, and availability is a guarantee of reliable access to the information by authorized users.

Information (or data) owners are responsible for determining the impact levels of their information, i.e. what happens if the data is improperly accessed or lost accidentally, implementing the necessary security controls, and managing the risk of negative events including data loss and unauthorized access.

Loss, corruption, or inappropriate access to information can interfere with CWRU's mission, interrupt business and damage reputations or finances. 

Securing Data

The classification of data requires certain safeguards or countermeasures, known as controls, to be applied to systems that store data. This can include restricting access to the data, detecting unauthorized access, preventative measures to avoid loss of data, encrypting the transfer and storage of data, keeping the system and data in a secure location, and receiving training on best practices for handling data. Controls are classified according to their characteristics, for example:

  • Physical controls e.g. doors, locks, climate control, and fire extinguishers;
  • Procedural or administrative controls e.g. policies, incident response processes, management oversight, security awareness and training;
  • Technical or logical controls e.g. user authentication (login) and logical access controls, antivirus software, firewalls;
  • Legal and regulatory or compliance controls e.g. privacy laws, policies and clauses.

Principal Investigator (PI) Responsibilities

The CWRU Faculty Handbook provides guidelines for PIs regarding the custody of research data. This includes, where applicable, appropriate measures to protect confidential information. It is everyone’s responsibility to ensure that our research data is kept secure and remains available for reproducibility and future research opportunities.

University Technology provides many services and resources related to data security including assistance with planning and securing data. This includes processing and storing restricted information used in research. 

Data Collected as Part of Human Subject Research 

To ensure the privacy and safety of the individuals participating in a human subject research study, additional rules and processes are in place that describe how the collected data can be used and disclosed. The Office of Research Administration provides information relevant to conducting this type of research. This includes:

  • Guidance on data use agreements and processes for agreements that involve human-related data or human-derived samples coming in or going out of CWRU.
  • Compliance with human subject research rules and regulations.

According to 45 CFR 46, a human subject is "a living individual about whom an investigator (whether professional or student) conducting research:

  • Obtains information or biospecimens through intervention or interaction with the individual, and uses, studies, or analyzes the information or biospecimens; or
  • Obtains, uses, studies, analyzes, or generates identifiable private information or identifiable biospecimens."

The CWRU Institutional Review Board reviews social science/behavioral studies and low-risk biomedical research not conducted in a hospital setting for all faculty, staff, and students of the University. This includes data collected and used for human subjects research.

Research conducted in a hospital setting, including University Hospitals, requires IRB protocol approval.

Questions regarding the management of human subject research data should be addressed to the CWRU Institutional Review Board.

Getting Help With Data Collection

If you are looking for datasets and other resources for your research, you can contact your subject area librarian for assistance.

  • Kelvin Smith Library

If you need assistance with administrative items, such as data use agreements or finding the appropriate storage solution, please contact the following offices.

  • Research Administration
  • UTech Research Computing
  • Information Security Office

Guidance and Resources

  • Information Security Policy
  • Research Data Protection
  • CWRU Faculty Handbook
  • CWRU IRB Guidance

SurveyCTO

A Guide to Data Collection: Methods, Process, and Tools


Whether your field is development economics, international development, the nonprofit sector, or myriad other industries, effective data collection is essential. It informs decision-making and increases your organization’s impact. However, the process of data collection can be complex and challenging. If you’re in the beginning stages of creating a data collection process, this guide is for you. It outlines tested methods, efficient procedures, and effective tools to help you improve your data collection activities and outcomes.

At SurveyCTO, we’ve used our years of experience and expertise to build a robust, secure, and scalable mobile data collection platform. It’s trusted by respected institutions like The World Bank, J-PAL, Oxfam, and the Gates Foundation, and it’s changed the way many organizations collect and use data. With this guide, we want to share what we know and help you get ready to take the first step in your data collection journey.

Main takeaways from this guide

  • Before starting the data collection process, define your goals and identify data sources, which can be primary (first-hand research) or secondary (existing resources).
  • Your data collection method should align with your goals, resources, and the nature of the data needed. Surveys, interviews, observations, focus groups, and forms are common data collection methods. 
  • Sampling involves selecting a representative group from a larger population. Choosing the right sampling method to gather representative and relevant data is crucial.
  • Crafting effective data collection instruments like surveys and questionnaires is key. Instruments should undergo rigorous testing for reliability and accuracy.
  • Data collection is an ongoing, iterative process that demands real-time monitoring and adjustments to ensure high-quality, reliable results.
  • After data collection, data should be cleaned to eliminate errors and organized for efficient analysis. The data collection journey further extends into data analysis, where patterns and useful information that can inform decision-making are discovered.
  • Common challenges in data collection include data quality and consistency issues, data security concerns, and limitations with offline surveys . Employing robust data validation processes, implementing strong security protocols, and using offline-enabled data collection tools can help overcome these challenges.
  • Data collection, entry, and management tools and data analysis, visualization, reporting, and workflow tools can streamline the data collection process, improve data quality, and facilitate data analysis.

What is data collection?


The traditional definition of data collection might lead us to think of gathering information through surveys, observations, or interviews. However, the modern-age definition of data collection extends beyond conducting surveys and observations. It encompasses the systematic gathering and recording of any kind of information through digital or manual methods. Data collection can be as routine as a doctor logging a patient’s information into an electronic medical record system during each clinic visit, or as specific as keeping a record of mosquito nets delivered to a rural household.

Getting started with data collection


Before starting your data collection process, you must clearly understand what you aim to achieve and how you’ll get there. Below are some actionable steps to help you get started.

1. Define your goals

Defining your goals is a crucial first step. Engage relevant stakeholders and team members in an iterative and collaborative process to establish clear goals. It’s important that projects start with the identification of key questions and desired outcomes to ensure you focus your efforts on gathering the right information. 

Start by understanding the purpose of your project: what problem are you trying to solve, or what change do you want to bring about? Think about your project’s potential outcomes and obstacles and try to anticipate what kind of data would be useful in these scenarios. Consider who will be using the data you collect and what data would be the most valuable to them. Think about the long-term effects of your project and how you will measure these over time. Lastly, leverage any historical data from previous projects to help you refine key questions that may have been overlooked previously.

Once questions and outcomes are established, your data collection goals may still vary based on the context of your work. To demonstrate, let’s use the example of an international organization working on a healthcare project in a remote area.

  • If you’re a researcher , your goal will revolve around collecting primary data to answer specific questions. This could involve designing a survey or conducting interviews to collect first-hand data on patient improvement, disease or illness prevalence, and behavior changes (such as an increase in patients seeking healthcare).
  • If you’re part of the monitoring and evaluation (M&E) team, your goal will revolve around measuring the success of your healthcare project. This could involve collecting primary data through surveys or observations and developing a dashboard to display real-time metrics like the number of patients treated, the percentage reduction in disease incidence, and average patient wait times. Your focus would be using this data to implement any needed program changes and ensure your project meets its objectives.
  • If you’re part of a field team, your goal will center around the efficient and accurate execution of project plans. You might be responsible for using data collection tools to capture pertinent information in different settings, such as in interviews taken directly from the sample community or over the phone. The data you collect and manage will directly influence the operational efficiency of the project and assist in achieving the project’s overarching objectives.

2. Identify your data sources

The crucial next step in your research process is determining your data source. Essentially, there are two main data types to choose from: primary and secondary.

  • Primary data is the information you collect directly from first-hand engagements. It’s gathered specifically for your research and tailored to your research question. Primary data collection methods can range from surveys and interviews to focus groups and observations. Because you design the data collection process, primary data can offer precise, context-specific information directly related to your research objectives. For example, suppose you are investigating the impact of a new education policy. In that case, primary data might be collected through surveys distributed to teachers or interviews with school administrators dealing directly with the policy’s implementation.
  • Secondary data, on the other hand, is derived from resources that already exist. This can include information gathered for other research projects, administrative records, historical documents, statistical databases, and more. While not originally collected for your specific study, secondary data can offer valuable insights and background information that complement your primary data. For instance, continuing with the education policy example, secondary data might involve academic articles about similar policies, government reports on education or previous survey data about teachers’ opinions on educational reforms.

While both types of data have their strengths, this guide will predominantly focus on primary data and the methods to collect it. Primary data is often emphasized in research because it provides fresh, first-hand insights that directly address your research questions. Primary data also allows for more control over the data collection process, ensuring data is relevant, accurate, and up-to-date.

However, secondary data can offer critical context, allow for longitudinal analysis, save time and resources, and provide a comparative framework for interpreting your primary data. It can be a crucial backdrop against which your primary data can be understood and analyzed. While we focus on primary data collection methods in this guide, we encourage you not to overlook the value of incorporating secondary data into your research design where appropriate.

3. Choose your data collection method

When choosing your data collection method, there are many options at your disposal. Data collection is not limited to methods like surveys and interviews. In fact, many of the processes in our daily lives serve the goal of collecting data, from intake forms to automated endpoints, such as payment terminals and mass transit card readers. Let us dive into some common types of data collection methods: 

Surveys and Questionnaires

Surveys and questionnaires are tools for gathering information about a group of individuals, typically by asking them predefined questions. They can be used to collect quantitative and qualitative data and be administered in various ways, including online, over the phone, in person (offline), or by mail.

  • Advantages : They allow researchers to reach many participants quickly and cost-effectively, making them ideal for large-scale studies. The structured format of questions makes analysis easier.
  • Disadvantages : They may not capture complex or nuanced information as participants are limited to predefined response choices. Also, there can be issues with response bias, where participants might provide socially desirable answers rather than honest ones.

Interviews

Interviews involve a one-on-one conversation between the researcher and the participant. The interviewer asks open-ended questions to gain detailed information about the participant’s thoughts, feelings, experiences, and behaviors.

  • Advantages : They allow for an in-depth understanding of the topic at hand. The researcher can adapt the questioning in real time based on the participant’s responses, allowing for more flexibility.
  • Disadvantages : They can be time-consuming and resource-intensive, as they require trained interviewers and a significant amount of time for both conducting and analyzing responses. They may also introduce interviewer bias if not conducted carefully, due to how an interviewer presents questions and perceives the respondent, and how the respondent perceives the interviewer. 

Observations

Observations involve directly observing and recording behavior or other phenomena as they occur in their natural settings.

  • Advantages : Observations can provide valuable contextual information, as researchers can study behavior in the environment where it naturally occurs, reducing the risk of artificiality associated with laboratory settings or self-reported measures.
  • Disadvantages : Observational studies may suffer from observer bias, where the observer’s expectations or biases could influence their interpretation of the data. Also, some behaviors might be altered if subjects are aware they are being observed.

Focus Groups

Focus groups are guided discussions among selected individuals to gain information about their views and experiences.

  • Advantages : Focus groups allow for interaction among participants, which can generate a diverse range of opinions and ideas. They are good for exploring new topics where there is little pre-existing knowledge.
  • Disadvantages : Dominant voices in the group can sway the discussion, potentially silencing less assertive participants. They also require skilled facilitators to moderate the discussion effectively.

Forms

Forms are standardized documents with blank fields for collecting data in a systematic manner. They are often used in fields like Customer Relationship Management (CRM) or Electronic Medical Records (EMR) data entry. Surveys may also be referred to as forms.

  • Advantages : Forms are versatile, easy to use, and efficient for data collection. They can streamline workflows by standardizing the data entry process.
  • Disadvantages : They may not provide in-depth insights as the responses are typically structured and limited. There is also potential for errors in data entry, especially when done manually.

Selecting the right data collection method should be an intentional process, taking into consideration the unique requirements of your project. The method selected should align with your goals, available resources, and the nature of the data you need to collect.

If you aim to collect quantitative data, surveys, questionnaires, and forms can be excellent tools, particularly for large-scale studies. These methods are suited to providing structured responses that can be analyzed statistically, delivering solid numerical data.

However, if you’re looking to uncover a deeper understanding of a subject, qualitative data might be more suitable. In such cases, interviews, observations, and focus groups can provide richer, more nuanced insights. These methods allow you to explore experiences, opinions, and behaviors deeply. Some surveys can also include open-ended questions that provide qualitative data.

The cost of data collection is also an important consideration. If you have budget constraints, in-depth, in-person conversations with every member of your target population may not be practical. In such cases, distributing questionnaires or forms can be a cost-saving approach.

Additional considerations include language barriers and connectivity issues. If your respondents speak different languages, consider translation services or multilingual data collection tools. If your target population resides in areas with limited connectivity and your method will be to collect data using mobile devices, ensure your tool provides offline data collection, which will allow you to carry out your data collection plan without internet connectivity.

4. Determine your sampling method

Now that you’ve established your data collection goals and how you’ll collect your data, the next step is deciding whom to collect your data from. Sampling involves carefully selecting a representative group from a larger population. Choosing the right sampling method is crucial for gathering representative and relevant data that aligns with your data collection goal.

Consider the following guidelines to choose the appropriate sampling method for your research goal and data collection method:

  • Understand Your Target Population: Start by conducting thorough research of your target population. Understand who they are, their characteristics, and subgroups within the population.
  • Anticipate and Minimize Biases: Anticipate and address potential biases within the target population to help minimize their impact on the data. For example, will your sampling method accurately reflect all ages, gender, cultures, etc., of your target population? Are there barriers to participation for any subgroups? Your sampling method should allow you to capture the most accurate representation of your target population.
  • Maintain Cost-Effective Practices: Consider the cost implications of your chosen sampling methods. Some sampling methods will require more resources, time, and effort. Your chosen sampling method should balance the cost factors with the ability to collect your data effectively and accurately. 
  • Consider Your Project’s Objectives: Tailor the sampling method to meet your specific objectives and constraints, such as M&E teams requiring real-time impact data and researchers needing representative samples for statistical analysis.

By adhering to these guidelines, you can make informed choices when selecting a sampling method, maximizing the quality and relevance of your data collection efforts.
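
To make the trade-off between sampling approaches concrete, here is a minimal Python sketch contrasting simple random sampling with proportional stratified sampling. The sampling frame, the "district" strata, and the sample size are all invented for illustration and are not drawn from any particular study.

```python
import random
from collections import defaultdict

# Hypothetical sampling frame: (respondent_id, district) pairs standing in for
# your target population. Districts act as strata.
random.seed(0)
population = [(i, random.choice(["north", "south", "east"])) for i in range(1, 1001)]

def simple_random_sample(frame, n, seed=42):
    """Draw n units from the frame, each with equal probability."""
    return random.Random(seed).sample(frame, n)

def stratified_sample(frame, n, seed=42):
    """Draw a sample whose strata mirror their share of the frame (proportional allocation)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[unit[1]].append(unit)
    sample = []
    for units in strata.values():
        k = round(n * len(units) / len(frame))
        sample.extend(rng.sample(units, min(k, len(units))))
    return sample

print(len(simple_random_sample(population, 100)))
print(len(stratified_sample(population, 100)))  # roughly 100; rounding can shift it slightly
```

Stratified sampling is often preferred when every subgroup must be represented in proportion to its share of the population, while simple random sampling is the simpler default when no such guarantee is needed.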

5. Identify and train collectors

Not every data collection use case requires data collectors, but training individuals responsible for data collection becomes crucial in scenarios involving field presence.

The SurveyCTO platform supports both self-response survey modes and surveys that require a human field worker to do in-person interviews. Whether you’re hiring and training data collectors, utilizing an existing team, or training existing field staff, we offer comprehensive guidance and the right tools to ensure effective data collection practices.  

Here are some common training approaches for data collectors:

  • In-Class Training: Comprehensive sessions covering protocols, survey instruments, and best practices empower data collectors with skills and knowledge.
  • Tests and Assessments: Assessments evaluate collectors’ understanding and competence, highlighting areas where additional support is needed.
  • Mock Interviews: Simulated interviews refine collectors’ techniques and communication skills.
  • Pre-Recorded Training Sessions: Accessible reinforcement and self-paced learning to refresh and stay updated.

Training data collectors is vital for successful data collection techniques. Your training should focus on proper instrument usage and effective interaction with respondents, including communication skills, cultural literacy, and ethical considerations.

Remember, training is an ongoing process. Knowledge gaps and issues may arise in the field, necessitating further training.

Moving Ahead: Iterative Steps in Data Collection


Once you’ve established the preliminary elements of your data collection process, you’re ready to start your data collection journey. In this section, we’ll delve into the specifics of designing and testing your instruments, collecting data, and organizing data while embracing the iterative nature of the data collection process, which requires diligent monitoring and making adjustments when needed.

6. Design and test your instruments

Designing effective data collection instruments like surveys and questionnaires is key. It’s crucial to prioritize respondent consent and privacy to ensure the integrity of your research. Thoughtful design and careful testing of survey questions are essential for optimizing research insights. Other critical considerations are: 

  • Clear and Unbiased Question Wording: Craft unambiguous, neutral questions free from bias to gather accurate and meaningful data. For example, instead of asking, “Shouldn’t we invest more into renewable energy that will combat the effects of climate change?” ask your question in a neutral way that allows the respondent to voice their thoughts. For example: “What are your thoughts on investing more in renewable energy?”
  • Logical Ordering and Appropriate Response Format: Arrange questions logically and choose response formats (such as multiple-choice, Likert scale, or open-ended) that suit the nature of the data you aim to collect (a minimal sketch of an instrument definition follows this list).
  • Coverage of Relevant Topics: Ensure that your instrument covers all topics pertinent to your data collection goals while respecting cultural and social sensitivities. Make sure your instrument avoids assumptions, stereotypes, and languages or topics that could be considered offensive or taboo in certain contexts. The goal is to avoid marginalizing or offending respondents based on their social or cultural background.
  • Collect Only Necessary Data: Design survey instruments that focus solely on gathering the data required for your research objectives, avoiding unnecessary information.
  • Language(s) of the Respondent Population: Tailor your instruments to accommodate the languages your target respondents speak, offering translated versions if needed. Similarly, take into account accessibility for respondents who can’t read by offering alternative formats like images in place of text.
  • Desired Length of Time for Completion: Respect respondents’ time by designing instruments that can be completed within a reasonable timeframe, balancing thoroughness with engagement. Having a general timeframe for the amount of time needed to complete a response will also help you weed out bad responses. For example, a response that was rushed and completed outside of your response timeframe could indicate a response that needs to be excluded.
  • Collecting and Documenting Respondents’ Consent and Privacy: Ensure a robust consent process, transparent data usage communication, and privacy protection throughout data collection.
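
As a rough illustration of how several of these considerations (neutral wording, response formats, translations, and an expected completion time) can be captured before fieldwork begins, here is a minimal, tool-agnostic Python sketch. The field names and questions are hypothetical and are not any platform’s schema.

```python
# Hypothetical, tool-agnostic representation of a small instrument as plain data.
instrument = {
    "max_expected_minutes": 15,  # later used to flag rushed responses
    "questions": [
        {
            "name": "energy_opinion",
            "type": "open_text",
            "label": {
                "en": "What are your thoughts on investing more in renewable energy?",
                "fr": "Que pensez-vous d'investir davantage dans les énergies renouvelables ?",
            },
        },
        {
            "name": "household_size",
            "type": "integer",
            "label": {"en": "How many people live in your household?"},
            "constraint": lambda value: 1 <= value <= 30,  # simple plausibility check
        },
    ],
}

# Example: validate one answer against its question's constraint.
q = instrument["questions"][1]
print(q["constraint"](4))   # True
print(q["constraint"](99))  # False
```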

Perform Cognitive Interviewing

Cognitive interviewing is a method used to refine survey instruments and improve the accuracy of survey responses by evaluating how respondents understand, process, and respond to the instrument’s questions. In practice, cognitive interviewing involves an interview with the respondent, asking them to verbalize their thoughts as they interact with the instrument. By actively probing and observing their responses, you can identify and address ambiguities, ensuring accurate data collection.  

Thoughtful question wording, well-organized response options, and logical sequencing enhance comprehension, minimize biases, and ensure accurate data collection. Iterative testing and refinement based on respondent feedback improve the validity, reliability, and actionability of insights obtained.

Put Your Instrument to the Test

Through rigorous testing, you can uncover flaws, ensure reliability, maximize accuracy, and validate your instrument’s performance. This can be achieved by:

  • Conducting pilot testing to enhance the reliability and effectiveness of data collection. Administer the instrument, identify difficulties, gather feedback, and assess performance in real-world conditions.
  • Making revisions based on pilot testing to enhance clarity, accuracy, usability, and participant satisfaction. Refine questions, instructions, and format for effective data collection.
  • Continuously iterating and refining your instrument based on feedback and real-world testing. This ensures reliable, accurate, and audience-aligned methods of data collection. Additionally, this ensures your instrument adapts to changes, incorporates insights, and maintains ongoing effectiveness.

7. Collect your data

Now that you have your well-designed survey, interview questions, observation plan, or form, it’s time to implement it and gather the needed data. Data collection is not a one-and-done deal; it’s an ongoing process that demands attention to detail. Imagine spending weeks collecting data, only to discover later that a significant portion is unusable due to incomplete responses, improper collection methods, or falsified responses. To avoid such setbacks, adopt an iterative approach.

Leverage data collection tools with real-time monitoring to proactively identify outliers and issues. Take immediate action by fine-tuning your instruments, optimizing the data collection process, addressing concerns like additional training, or reevaluating personnel responsible for inaccurate data (for example, a field worker who sits in a coffee shop entering fake responses rather than doing the work of knocking on doors).

SurveyCTO’s Data Explorer was specifically designed to fulfill this requirement, empowering you to monitor incoming data, gain valuable insights, and know where changes may be needed. Embracing this iterative approach ensures ongoing improvement in data collection, resulting in more reliable and precise results.
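
The following is a minimal, generic Python sketch of the kind of check such monitoring performs; it is not SurveyCTO’s implementation. It flags submissions that were completed implausibly fast or that contain values outside a plausible range, using invented field names and thresholds.

```python
# Hypothetical incoming submissions: completion time plus one numeric answer.
submissions = [
    {"id": "s1", "enumerator": "A", "duration_min": 14, "household_size": 5},
    {"id": "s2", "enumerator": "A", "duration_min": 2,  "household_size": 4},   # suspiciously fast
    {"id": "s3", "enumerator": "B", "duration_min": 16, "household_size": 60},  # implausible value
    {"id": "s4", "enumerator": "B", "duration_min": 13, "household_size": 3},
]

def flag_submissions(records, min_duration=5, max_household=30):
    """Return (id, enumerator, reasons) for records completed too quickly or with implausible values."""
    flags = []
    for r in records:
        reasons = []
        if r["duration_min"] < min_duration:
            reasons.append("completed faster than the minimum expected duration")
        if not 1 <= r["household_size"] <= max_household:
            reasons.append("household_size outside the plausible range")
        if reasons:
            flags.append((r["id"], r["enumerator"], reasons))
    return flags

for flag in flag_submissions(submissions):
    print(flag)  # s2 and s3 are flagged for follow-up with enumerators A and B
```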

8. Clean and organize your data

After data collection, the next step is to clean and organize the data to ensure its integrity and usability.

  • Data Cleaning: This stage involves sifting through your data to identify and rectify any errors, inconsistencies, or missing values. It’s essential to maintain the accuracy of your data and ensure that it’s reliable for further analysis. Data cleaning can uncover duplicates, outliers, and gaps that could skew your results if left unchecked (see the sketch after this list). With real-time data monitoring, this continuous cleaning process keeps your data precise and current throughout the data collection period. Similarly, review and corrections workflows allow you to monitor the quality of your incoming data.
  • Organizing Your Data: Post-cleaning, it’s time to organize your data for efficient analysis and interpretation. Labeling your data using appropriate codes or categorizations can simplify navigation and streamline the extraction of insights. When you use a survey or form, labeling your data is often not necessary because you can design the instrument to collect in the right categories or return the right codes. An organized dataset is easier to manage, analyze, and interpret, ensuring that your collection efforts are not wasted but lead to valuable, actionable insights.
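
Here is a minimal cleaning sketch using the pandas library (assuming it is installed), with an invented raw export that contains a duplicate submission, inconsistent labels, and a missing value. Real projects would define their own cleaning rules.

```python
import pandas as pd

# Hypothetical raw export: a duplicate, a missing value, and inconsistent labels.
raw = pd.DataFrame({
    "respondent_id": ["r1", "r2", "r2", "r3"],
    "district": ["North", "north ", "north ", None],
    "visits": [3, 1, 1, 2],
})

clean = (
    raw.drop_duplicates(subset="respondent_id")                              # remove duplicate submissions
       .assign(district=lambda d: d["district"].str.strip().str.lower())    # harmonise labels
       .dropna(subset=["district"])                                         # drop rows missing a key field
)

print(clean)
```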

Remember, each stage of the data collection process, from design to cleaning, is iterative and interconnected. By diligently cleaning and organizing your data, you are setting the stage for robust, meaningful analysis that can inform your data-driven decisions and actions.

What happens after data collection?


The data collection journey takes us next into data analysis, where you’ll uncover patterns, empowering informed decision-making for researchers, evaluation teams, and field personnel.

Process and Analyze Your Data

Explore data through statistical and qualitative techniques to discover patterns, correlations, and insights during this pivotal stage. It’s about extracting the essence of your data and translating numbers into knowledge. Whether applying descriptive statistics, conducting regression analysis, or using thematic coding for qualitative data, this process drives decision-making and charts the path toward actionable outcomes.
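
As a small illustration, the sketch below applies basic descriptive statistics, a group comparison, and a simple correlation to an invented dataset loosely based on the healthcare example used earlier in this guide; it assumes the pandas library is installed.

```python
import pandas as pd

# Hypothetical cleaned survey data for two clinics.
df = pd.DataFrame({
    "clinic": ["A", "A", "B", "B", "B"],
    "wait_minutes": [35, 42, 18, 25, 22],
    "satisfaction": [3, 2, 5, 4, 4],   # 1-5 rating
})

print(df.describe())                                 # descriptive statistics
print(df.groupby("clinic")["wait_minutes"].mean())   # compare groups
print(df["wait_minutes"].corr(df["satisfaction"]))   # simple correlation
```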

Interpret and Report Your Results

Interpreting and reporting your data brings meaning and context to the numbers. Translating raw data into digestible insights for informed decision-making and effective stakeholder communication is critical.

The approach to interpretation and reporting varies depending on the perspective and role:

  • Researchers often lean heavily on statistical methods to identify trends, extract meaningful conclusions, and share their findings in academic circles, contributing to their knowledge pool.
  • M&E teams typically produce comprehensive reports, shedding light on the effectiveness and impact of programs. These reports guide internal and sometimes external stakeholders, supporting informed decisions and driving program improvements.

Field teams provide a first-hand perspective. Since they are often the first to see the results of the practical implementation of data, field teams are instrumental in providing immediate feedback loops on project initiatives. Field teams do the work that provides context to help research and M&E teams understand external factors like the local environment, cultural nuances, and logistical challenges that impact data results.

Safely store and handle data

Throughout the data collection process, and after it has been collected, it is vital to follow best practices for storing and handling data to ensure the integrity of your research. While the specifics of how to best store and handle data will depend on your project, here are some important guidelines to keep in mind:

  • Use cloud storage to hold your data if possible, since this is safer than storing data on hard drives and keeps the data more accessible.
  • Periodically back up and purge old data from your system, since it’s safer not to retain data longer than necessary.
  • If you use mobile devices to collect and store data, use private, app-specific internal storage when possible.
  • Restrict access to stored data to only those who need to work with that data.

Further considerations for data safety are discussed below in the section on data security.

Remember to uphold ethical standards in interpreting and reporting your data, regardless of your role. Clear communication, respectful handling of sensitive information, and adhering to confidentiality and privacy rights are all essential to fostering trust, promoting transparency, and bolstering your work’s credibility.

Common Data Collection Challenges


Data collection is vital to data-driven initiatives, but it comes with challenges. Addressing common challenges such as poor data quality, privacy concerns, inadequate sample sizes, and bias is essential to ensure the collected data is reliable, trustworthy, and secure. 

In this section, we’ll explore three major challenges: data quality and consistency issues, data security concerns, and limitations with offline data collection, along with strategies to overcome them.

Data Quality and Consistency

Data quality and consistency refer to data accuracy and reliability throughout the collection and analysis process. 

Challenges such as incomplete or missing data, data entry errors, measurement errors, and data coding/categorization errors can impact the integrity and usefulness of the data. 

To navigate these complexities and maintain high standards, consistency, and integrity in the dataset:

  • Implement robust data validation processes, 
  • Ensure proper training for data entry personnel, 
  • Employ automated data validation techniques (see the sketch after this list), and
  • Conduct regular data quality audits.
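
A minimal sketch of the automated-validation idea referenced above, assuming invented field names and rules; a real project would derive its rules from the survey instrument itself.

```python
# Illustrative validation rules keyed by field name.
RULES = {
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
    "consent": lambda v: v in ("yes", "no"),
    "visit_date": lambda v: bool(v),  # required, non-empty field
}

def validate(record):
    """Return a list of (field, problem) pairs for any rule the record fails."""
    problems = []
    for field, rule in RULES.items():
        if field not in record:
            problems.append((field, "missing"))
        elif not rule(record[field]):
            problems.append((field, "invalid value"))
    return problems

print(validate({"age": 34, "consent": "yes", "visit_date": "2024-01-15"}))  # []
print(validate({"age": 400, "consent": "maybe"}))                           # three problems
```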

Data security

Data security encompasses safeguarding data through ensuring data privacy and confidentiality, securing storage and backup, and controlling data sharing and access.

Challenges include the risk of potential breaches, unauthorized access, and the need to comply with data protection regulations.

To address these setbacks and maintain privacy, trust, and confidence during the data collection process: 

  • Use encryption and authentication methods (see the sketch after this list),
  • Implement robust security protocols, 
  • Update security measures regularly, 
  • Provide employee training on data security, and 
  • Adopt secure cloud storage solutions.
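
To illustrate the encryption point above, here is a minimal sketch of encrypting a record at rest with symmetric encryption, assuming the third-party cryptography package is installed (pip install cryptography). The record contents are invented, and in practice the key would live in a secrets manager rather than beside the data.

```python
import json
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # store this securely, separate from the data
fernet = Fernet(key)

# Hypothetical sensitive record.
record = {"respondent_id": "r1", "phone": "+1-555-0100", "diagnosis": "positive"}
token = fernet.encrypt(json.dumps(record).encode("utf-8"))

# Only holders of the key can recover the original record.
print(token[:20], "...")
print(json.loads(fernet.decrypt(token)))
```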

Offline Data Collection

Offline data collection refers to the process of gathering data using modes like mobile device-based computer-assisted personal interviewing (CAPI) when there is an inconsistent or unreliable internet connection and the data collection tool being used for CAPI has the functionality to work offline.

Challenges associated with offline data collection include synchronization issues, difficulty transferring data, and compatibility problems between devices and data collection tools.

To overcome these challenges and enable efficient and reliable offline data collection processes, employ the following strategies: 

  • Leverage offline-enabled data collection apps or tools that let you survey respondents even when there’s no internet connection and upload data to a central repository at a later time (a minimal sketch of this pattern follows the list).
  • Include times for periodic data synchronization in your data collection plan, for when connectivity is available.
  • Use offline, device-based storage for seamless data transfer and compatibility.
  • Provide clear instructions to field personnel on handling offline data collection scenarios.
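
Here is a minimal sketch of the offline pattern described above: cache submissions on the device while disconnected, then push anything pending once connectivity returns. The upload function is a stand-in for whatever API your data collection tool actually exposes.

```python
import json
from pathlib import Path

CACHE = Path("pending_submissions.jsonl")

def save_locally(record):
    """Append a submission to the on-device cache (works with no connectivity)."""
    with CACHE.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

def sync(upload, connected):
    """Upload cached submissions once a connection is available, then clear the cache."""
    if not connected or not CACHE.exists():
        return 0
    lines = CACHE.read_text(encoding="utf-8").splitlines()
    for line in lines:
        upload(json.loads(line))   # send to the central repository
    CACHE.unlink()                 # remove the cache after a successful sync
    return len(lines)

save_locally({"respondent_id": "r7", "consent": "yes"})
print(sync(upload=lambda rec: print("uploaded", rec), connected=True))
```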

Utilizing Technology in Data Collection


Embracing technology throughout your data collection process can help you overcome many challenges described in the previous section. Data collection tools can streamline your data collection, improve the quality and security of your data, and facilitate the analysis of your data. Let’s look at two broad categories of tools that are essential for data collection:

Data Collection, Entry, & Management Tools

These tools help with data collection, input, and organization. They can range from digital survey platforms to comprehensive database systems, allowing you to gather, enter, and manage your data effectively. They can significantly simplify the data collection process, minimize human error, and offer practical ways to organize and manage large volumes of data. Some of these tools are:

  • Microsoft Office
  • Google Docs
  • SurveyMonkey
  • Google Forms

Data Analysis, Visualization, Reporting, & Workflow Tools

These tools assist in processing and interpreting the collected data. They provide a way to visualize data in a user-friendly format, making it easier to identify trends and patterns. These tools can also generate comprehensive reports to share your findings with stakeholders and help manage your workflow efficiently. By automating complex tasks, they can help ensure accuracy and save time. Tools for these purposes include:

  • Google Sheets

Data collection tools like SurveyCTO often have integrations to help users seamlessly transition from data collection to data analysis, visualization, reporting, and managing workflows.

Master Your Data Collection Process With SurveyCTO

As we bring this guide to a close, you now possess a wealth of knowledge to develop your data collection process. From understanding the significance of setting clear goals to the crucial process of selecting your data collection methods and addressing common challenges, you are equipped to handle the intricate details of this dynamic process.

Remember, you’re not venturing into this complex process alone. At SurveyCTO, we offer not just a tool but an entire support system committed to your success. Beyond troubleshooting support, our success team serves as research advisors and expert partners, ready to provide guidance at every stage of your data collection journey.

With SurveyCTO , you can design flexible surveys in Microsoft Excel or Google Sheets, collect data online and offline with above-industry-standard security, monitor your data in real time, and effortlessly export it for further analysis in any tool of your choice. You also get access to our Data Explorer, which allows you to visualize incoming data at both individual survey and aggregate levels instantly.

In the iterative data collection process, our users tell us that SurveyCTO stands out with its capacity to establish review and correction workflows. It enables you to monitor incoming data and configure automated quality checks to flag error-prone submissions.

Finally, data security is of paramount importance to us. We ensure best-in-class security measures like SOC 2 compliance, end-to-end encryption, single sign-on (SSO), GDPR-compliant setups, customizable user roles, and self-hosting options to keep your data safe.

As you embark on your data collection journey, you can count on SurveyCTO’s experience and expertise to be by your side every step of the way. Our team would be excited and honored to be a part of your research project, offering you the tools and processes to gain informative insights and make effective decisions. Partner with us today and revolutionize the way you collect data.

Better data, better decision making, better world.



Research-Methodology

Data Collection Methods

Data collection is a process of collecting information from all the relevant sources to find answers to the research problem, test the hypothesis (if you are following a deductive approach), and evaluate the outcomes. Data collection methods can be divided into two categories: secondary methods of data collection and primary methods of data collection.

Secondary Data Collection Methods

Secondary data is a type of data that has already been published in books, newspapers, magazines, journals, online portals, etc. There is an abundance of data available in these sources about almost any research area in business studies. Therefore, applying an appropriate set of criteria to select the secondary data to be used in the study plays an important role in increasing the validity and reliability of the research.

These criteria include, but are not limited to, the date of publication, the credentials of the author, the reliability of the source, the quality of discussion, the depth of analysis, and the extent of the text's contribution to the development of the research area. Secondary data collection is discussed in greater depth in the Literature Review chapter.

Secondary data collection methods offer a range of advantages, such as saving time, effort, and expense. However, they have a major disadvantage: secondary research does not contribute to the expansion of the literature by producing fresh (new) data.

Primary Data Collection Methods

Primary data is data that did not exist before your study; it comprises the unique findings of your research. Primary data collection and analysis typically require more time and effort than secondary research. Primary data collection methods can be divided into two groups: quantitative and qualitative.

Quantitative data collection methods are based on mathematical calculations in various formats. Methods of quantitative data collection and analysis include questionnaires with closed-ended questions and techniques such as correlation and regression, as well as the mean, mode, and median, among others.
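
For instance, the basic quantitative measures just mentioned can be computed with Python's standard statistics module (the correlation function requires Python 3.10 or later); the study-hours and score figures below are invented for illustration.

```python
from statistics import mean, median, mode, correlation

# Hypothetical closed-ended questionnaire results: hours studied and test scores.
hours = [2, 4, 4, 6, 8, 9]
scores = [55, 60, 62, 70, 80, 85]

print(mean(scores), median(scores), mode(hours))
print(correlation(hours, scores))  # strength of the linear relationship
```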

Quantitative methods are cheaper and quicker to apply than qualitative methods. Moreover, due to the high level of standardisation of quantitative methods, it is easy to compare findings.

Qualitative research methods, by contrast, do not involve numbers or mathematical calculations. Qualitative research is closely associated with words, sounds, feelings, emotions, colours, and other non-quantifiable elements.

Qualitative studies aim to ensure a greater depth of understanding; qualitative data collection methods include interviews, questionnaires with open-ended questions, focus groups, observation, games or role-playing, case studies, etc.

Your choice between quantitative or qualitative methods of data collection depends on the area of your research and the nature of research aims and objectives.

My e-book, The Ultimate Guide to Writing a Dissertation in Business Studies: a step-by-step assistance, offers practical assistance to complete a dissertation with minimum or no stress. The e-book covers all stages of writing a dissertation, starting from the selection of the research area to submitting the completed version of the work within the deadline.

John Dudovskiy


7 Data Collection Methods in Business Analytics


  • 02 Dec 2021

Data is being generated at an ever-increasing pace. According to Statista, the total volume of data was 64.2 zettabytes in 2020; it’s predicted to reach 181 zettabytes by 2025. This abundance of data can be overwhelming if you aren’t sure where to start.

So, how do you ensure the data you use is relevant and important to the business problems you aim to solve? After all, a data-driven decision is only as strong as the data it’s based on. One way is to collect data yourself.

Here’s a breakdown of data types, why data collection is important, what to know before you begin collecting, and seven data collection methods to leverage.


What Is Data Collection?

Data collection is the methodological process of gathering information about a specific subject. It’s crucial to ensure your data is complete during the collection phase and that it’s collected legally and ethically. If not, your analysis won’t be accurate and could have far-reaching consequences.

In general, there are three types of consumer data:

  • First-party data , which is collected directly from users by your organization
  • Second-party data , which is data shared by another organization about its customers (or its first-party data)
  • Third-party data , which is data that’s been aggregated and rented or sold by organizations that don’t have a connection to your company or users

Although there are use cases for second- and third-party data, first-party data (data you’ve collected yourself) is more valuable because you receive information about how your audience behaves, thinks, and feels—all from a trusted source.

Data can be qualitative (meaning contextual in nature) or quantitative (meaning numeric in nature). Many data collection methods apply to either type, but some are better suited to one over the other.

In the data life cycle, data collection is the second step. After data is generated, it must be collected to be of use to your team. After that, it can be processed, stored, managed, analyzed, and visualized to aid in your organization’s decision-making.

Chart showing the Data Lifecycle: Generation, collection, processing, storage, management, analysis, visualization, and interpretation

Before collecting data, there are several factors you need to define:

  • The question you aim to answer
  • The data subject(s) you need to collect data from
  • The collection timeframe
  • The data collection method(s) best suited to your needs

The data collection method you select should be based on the question you want to answer, the type of data you need, your timeframe, and your company’s budget.

The Importance of Data Collection

Collecting data is an integral part of a business’s success; it can enable you to ensure the data’s accuracy, completeness, and relevance to your organization and the issue at hand. The information gathered allows organizations to analyze past strategies and stay informed on what needs to change.

The insights gleaned from data can make you hyperaware of your organization’s efforts and give you actionable steps to improve various strategies—from altering marketing strategies to assessing customer complaints.

Basing decisions on inaccurate data can have far-reaching negative consequences, so it’s important to be able to trust your own data collection procedures and abilities. By ensuring accurate data collection, business professionals can feel secure in their business decisions.

Explore the options in the next section to see which data collection method is the best fit for your company.

7 Data Collection Methods Used in Business Analytics

1. Surveys

Surveys are physical or digital questionnaires that gather both qualitative and quantitative data from subjects. One situation in which you might conduct a survey is gathering attendee feedback after an event. This can provide a sense of what attendees enjoyed, what they wish was different, and areas in which you can improve or save money during your next event for a similar audience.

While physical copies of surveys can be sent out to participants, online surveys present the opportunity for distribution at scale. They can also be inexpensive; running a survey can cost nothing if you use a free tool. If you wish to target a specific group of people, partnering with a market research firm to get the survey in front of that demographic may be worth the money.

Something to watch out for when crafting and running surveys is the effect of bias, including:

  • Collection bias : It can be easy to accidentally write survey questions with a biased lean. Watch out for this when creating questions to ensure your subjects answer honestly and aren’t swayed by your wording.
  • Subject bias : Because your subjects know their responses will be read by you, their answers may be biased toward what seems socially acceptable. For this reason, consider pairing survey data with behavioral data from other collection methods to get the full picture.

Related: 3 Examples of Bad Survey Questions & How to Fix Them

2. Transactional Tracking

Each time your customers make a purchase, tracking that data can allow you to make decisions about targeted marketing efforts and understand your customer base better.

Often, e-commerce and point-of-sale platforms allow you to store data as soon as it’s generated, making this a seamless data collection method that can pay off in the form of customer insights.

3. Interviews and Focus Groups

Interviews and focus groups consist of talking to subjects face-to-face about a specific topic or issue. Interviews tend to be one-on-one, and focus groups are typically made up of several people. You can use both to gather qualitative and quantitative data.

Through interviews and focus groups, you can gather feedback from people in your target audience about new product features. Seeing them interact with your product in real-time and recording their reactions and responses to questions can provide valuable data about which product features to pursue.

As is the case with surveys, these collection methods allow you to ask subjects anything you want about their opinions, motivations, and feelings regarding your product or brand. It also introduces the potential for bias. Aim to craft questions that don’t lead them in one particular direction.

One downside of interviewing and conducting focus groups is they can be time-consuming and expensive. If you plan to conduct them yourself, it can be a lengthy process. To avoid this, you can hire a market research facilitator to organize and conduct interviews on your behalf.

4. Observation

Observing people interacting with your website or product can be useful for data collection because of the candor it offers. If your user experience is confusing or difficult, you can witness it in real-time.

Yet, setting up observation sessions can be difficult. You can use a third-party tool to record users’ journeys through your site or observe a user’s interaction with a beta version of your site or product.

While less accessible than other data collection methods, observations enable you to see firsthand how users interact with your product or site. You can leverage the qualitative and quantitative data gleaned from this to make improvements and double down on points of success.


5. Online Tracking

To gather behavioral data, you can implement pixels and cookies. These are both tools that track users’ online behavior across websites and provide insight into what content they’re interested in and typically engage with.

You can also track users’ behavior on your company’s website, including which parts are of the highest interest, whether users are confused when using it, and how long they spend on product pages. This can enable you to improve the website’s design and help users navigate to their destination.

Inserting a pixel is often free and relatively easy to set up. Implementing cookies may come with a fee but could be worth it for the quality of data you’ll receive. Once pixels and cookies are set, they gather data on their own and don’t need much maintenance, if any.
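
As a rough, non-production sketch of the server side of such tracking, the snippet below uses the Flask web framework (assuming it is installed) to log a page view whenever the pixel URL is requested; the endpoint name and query parameters are hypothetical, and any real deployment must observe the privacy considerations noted next.

```python
from datetime import datetime, timezone
from flask import Flask, request

app = Flask(__name__)

@app.route("/pixel")
def pixel():
    # The embedding page would request e.g. /pixel?page=/pricing&campaign=spring
    event = {
        "time": datetime.now(timezone.utc).isoformat(),
        "page": request.args.get("page", "unknown"),
        "campaign": request.args.get("campaign"),
        "user_agent": request.headers.get("User-Agent"),
    }
    app.logger.info("pageview %s", event)  # in practice, write to your analytics store
    return "", 204                         # empty response; real pixels often return a 1x1 image

if __name__ == "__main__":
    app.run(port=8000)
```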

It’s important to note: Tracking online behavior can have legal and ethical privacy implications. Before tracking users’ online behavior, ensure you’re in compliance with local and industry data privacy standards .

6. Online Forms

Online forms are beneficial for gathering qualitative data about users, specifically demographic data or contact information. They’re relatively inexpensive and simple to set up, and you can use them to gate content or registrations, such as webinars and email newsletters.

You can then use this data to contact people who may be interested in your product, build out demographic profiles of existing customers, and in remarketing efforts, such as email workflows and content recommendations.

Related: What Is Marketing Analytics?

7. Social Media Monitoring

Monitoring your company’s social media channels for follower engagement is an accessible way to track data about your audience’s interests and motivations. Many social media platforms have analytics built in, but there are also third-party social platforms that give more detailed, organized insights pulled from multiple channels.

You can use data collected from social media to determine which issues are most important to your followers. For instance, you may notice that the number of engagements dramatically increases when your company posts about its sustainability efforts.


Building Your Data Capabilities

Understanding the variety of data collection methods available can help you decide which is best for your timeline, budget, and the question you’re aiming to answer. When stored together and combined, multiple data types collected through different methods can give an informed picture of your subjects and help you make better business decisions.

Do you want to become a data-driven professional? Explore our eight-week Business Analytics course and our three-course Credential of Readiness (CORe) program to deepen your analytical skills and apply them to real-world business problems. Not sure which course is right for you? Download our free flowchart.

This post was updated on October 17, 2022. It was originally published on December 2, 2021.


Data Collection Methods | Step-by-Step Guide & Examples

Published on 4 May 2022 by Pritha Bhandari.

Data collection is a systematic process of gathering observations or measurements. Whether you are performing research for business, governmental, or academic purposes, data collection allows you to gain first-hand knowledge and original insights into your research problem .

While methods and aims may differ between fields, the overall process of data collection remains largely the same. Before you begin collecting data, you need to consider:

  • The aim of the research
  • The type of data that you will collect
  • The methods and procedures you will use to collect, store, and process the data

To collect high-quality data that is relevant to your purposes, follow these four steps.

Table of contents

  • Step 1: Define the aim of your research
  • Step 2: Choose your data collection method
  • Step 3: Plan your data collection procedures
  • Step 4: Collect the data
  • Frequently asked questions about data collection

Step 1: Define the aim of your research

Before you start the process of data collection, you need to identify exactly what you want to achieve. You can start by writing a problem statement : what is the practical or scientific issue that you want to address, and why does it matter?

Next, formulate one or more research questions that precisely define what you want to find out. Depending on your research questions, you might need to collect quantitative or qualitative data :

  • Quantitative data is expressed in numbers and graphs and is analysed through statistical methods .
  • Qualitative data is expressed in words and analysed through interpretations and categorisations.

If your aim is to test a hypothesis , measure something precisely, or gain large-scale statistical insights, collect quantitative data. If your aim is to explore ideas, understand experiences, or gain detailed insights into a specific context, collect qualitative data.

If you have several aims, you can use a mixed methods approach that collects both types of data.

For example, suppose you are studying how employees perceive their managers in a large organisation. You might use a mixed methods approach with two aims:

  • Your first aim is to assess whether there are significant differences in perceptions of managers across different departments and office locations.
  • Your second aim is to gather meaningful feedback from employees to explore new ideas for how managers can improve.


Step 2: Choose your data collection method

Based on the data you want to collect, decide which method is best suited for your research.

  • Experimental research is primarily a quantitative method.
  • Interviews , focus groups , and ethnographies are qualitative methods.
  • Surveys , observations, archival research, and secondary data collection can be quantitative or qualitative methods.

Carefully consider what method you will use to gather data that helps you directly answer your research questions.

Data collection methods

  • Experiment: Use to test a causal relationship. Collect data by manipulating variables and measuring their effects on others.
  • Survey: Use to understand the general characteristics or opinions of a group of people. Collect data by distributing a list of questions to a sample online, in person, or over the phone.
  • Interview/focus group: Use to gain an in-depth understanding of perceptions or opinions on a topic. Collect data by verbally asking participants open-ended questions in individual interviews or focus group discussions.
  • Observation: Use to understand something in its natural setting. Collect data by measuring or surveying a sample without trying to affect them.
  • Ethnography: Use to study the culture of a community or organisation first-hand. Collect data by joining and participating in a community and recording your observations and reflections.
  • Archival research: Use to understand current or historical events, conditions, or practices. Collect data by accessing manuscripts, documents, or records from libraries, depositories, or the internet.
  • Secondary data collection: Use to analyse data from populations that you can’t access first-hand. Collect data by finding existing datasets that have already been collected, from sources such as government agencies or research organisations.

Step 3: Plan your data collection procedures

When you know which method(s) you are using, you need to plan exactly how you will implement them. What procedures will you follow to make accurate observations or measurements of the variables you are interested in?

For instance, if you’re conducting surveys or interviews, decide what form the questions will take; if you’re conducting an experiment, make decisions about your experimental design .

Operationalisation

Sometimes your variables can be measured directly: for example, you can collect data on the average age of employees simply by asking for dates of birth. However, often you’ll be interested in collecting data on more abstract concepts or variables that can’t be directly observed.

Operationalisation means turning abstract conceptual ideas into measurable observations. When planning how you will collect data, you need to translate the conceptual definition of what you want to study into the operational definition of what you will actually measure.

For example, to operationalise the concept of leadership (a minimal scoring sketch follows these points):

  • You ask managers to rate their own leadership skills on 5-point scales assessing the ability to delegate, decisiveness, and dependability.
  • You ask their direct employees to provide anonymous feedback on the managers regarding the same topics.
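As a minimal sketch of that operational definition, assuming pandas is available and using hypothetical column names, the ratings on the three 5-point items can be combined into a single leadership score per manager:

import pandas as pd

# Hypothetical self-ratings: one row per manager, three 5-point scale items
ratings = pd.DataFrame({
    "manager_id": [101, 102, 103],
    "delegation": [4, 3, 5],
    "decisiveness": [3, 4, 4],
    "dependability": [5, 3, 4],
})

# Operational definition: leadership score = mean of the three item scores
items = ["delegation", "decisiveness", "dependability"]
ratings["leadership_score"] = ratings[items].mean(axis=1)

print(ratings[["manager_id", "leadership_score"]])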

You may need to develop a sampling plan to obtain data systematically. This involves defining a population , the group you want to draw conclusions about, and a sample, the group you will actually collect data from.

Your sampling method will determine how you recruit participants or obtain measurements for your study. To decide on a sampling method you will need to consider factors like the required sample size, accessibility of the sample, and time frame of the data collection.
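As a hedged illustration of one common approach, the Python sketch below draws a simple random sample from a hypothetical sampling frame of employee IDs; the frame, sample size, and seed are illustrative assumptions:

import random

# Hypothetical sampling frame: the population of employee IDs we can reach
population = [f"EMP{n:04d}" for n in range(1, 1201)]  # 1,200 employees

sample_size = 120        # e.g. 10% of the population
random.seed(42)          # fixed seed so the draw can be reproduced

# Simple random sampling without replacement
sample = random.sample(population, sample_size)

print(len(sample), sample[:5])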

Standardising procedures

If multiple researchers are involved, write a detailed manual to standardise data collection procedures in your study.

This means laying out specific step-by-step instructions so that everyone in your research team collects data in a consistent way – for example, by conducting experiments under the same conditions and using objective criteria to record and categorise observations.

This helps ensure the reliability of your data, and you can also use it to replicate the study in the future.

Creating a data management plan

Before beginning data collection, you should also decide how you will organise and store your data.

  • If you are collecting data from people, you will likely need to anonymise and safeguard the data to prevent leaks of sensitive information (e.g. names or identity numbers); a minimal pseudonymisation sketch follows this list.
  • If you are collecting data via interviews or pencil-and-paper formats, you will need to perform transcriptions or data entry in systematic ways to minimise distortion.
  • You can prevent loss of data by having an organisation system that is routinely backed up.
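Relating to the first point in the list above, here is a minimal, hedged Python sketch of pseudonymising names before storage by replacing them with salted hashes; the field names and salt handling are illustrative assumptions, not a complete privacy solution:

import hashlib

SALT = "replace-with-a-secret-salt"  # hypothetical; store separately from the data

def pseudonymise(value: str) -> str:
    """Return a stable, non-reversible identifier for a sensitive value."""
    return hashlib.sha256((SALT + value).encode("utf-8")).hexdigest()[:12]

records = [
    {"name": "Jane Doe", "response": 4},
    {"name": "John Smith", "response": 2},
]

# Replace the direct identifier with a pseudonym before the data is stored
anonymised = [
    {"participant": pseudonymise(r["name"]), "response": r["response"]}
    for r in records
]

print(anonymised)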

Step 4: Collect the data

Finally, you can implement your chosen methods to measure or observe the variables you are interested in.

In the manager-perception example above, the closed-ended survey questions ask participants to rate their manager’s leadership skills on scales from 1 to 5. The data produced is numerical and can be statistically analysed for averages and patterns.
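A minimal sketch of that kind of analysis, assuming pandas and hypothetical column names, might compute the average rating and response count per department:

import pandas as pd

# Hypothetical survey responses: 1-5 ratings of managers, tagged by department
responses = pd.DataFrame({
    "department": ["Sales", "Sales", "IT", "IT", "HR"],
    "rating": [4, 5, 2, 3, 4],
})

# Average rating and response count per department
summary = responses.groupby("department")["rating"].agg(["mean", "count"])
print(summary)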

To ensure that high-quality data is recorded in a systematic way, here are some best practices:

  • Record all relevant information as and when you obtain data. For example, note down whether or how lab equipment is recalibrated during an experimental study.
  • Double-check manual data entry for errors.
  • If you collect quantitative data, you can assess the reliability and validity to get an indication of your data quality.

Frequently asked questions about data collection

Data collection is the systematic process by which observations or measurements are gathered in research. It is used in many different contexts by academics, governments, businesses, and other organisations.

When conducting research, collecting original data has significant advantages:

  • You can tailor data collection to your specific research aims (e.g., understanding the needs of your consumers or user testing your website).
  • You can control and standardise the process for high reliability and validity (e.g., choosing appropriate measurements and sampling methods ).

However, there are also some drawbacks: data collection can be time-consuming, labour-intensive, and expensive. In some cases, it’s more efficient to use secondary data that has already been collected by someone else, but the data might be less reliable.

Quantitative research deals with numbers and statistics, while qualitative research deals with words and meanings.

Quantitative methods allow you to test a hypothesis by systematically collecting and analysing data, while qualitative methods allow you to explore ideas and experiences in depth.

Reliability and validity are both about how well a method measures something:

  • Reliability refers to the  consistency of a measure (whether the results can be reproduced under the same conditions).
  • Validity   refers to the  accuracy of a measure (whether the results really do represent what they are supposed to measure).

If you are doing experimental research , you also have to consider the internal and external validity of your experiment.
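One hedged way to check the reliability (internal consistency) of a multi-item quantitative measure is Cronbach's alpha; the sketch below computes it with NumPy on hypothetical item scores, which are illustrative assumptions:

import numpy as np

# Hypothetical scores: rows = respondents, columns = items of one scale
scores = np.array([
    [4, 5, 4],
    [2, 3, 3],
    [5, 5, 4],
    [3, 3, 2],
    [4, 4, 5],
])

k = scores.shape[1]                        # number of items
item_variances = scores.var(axis=0, ddof=1)
total_variance = scores.sum(axis=1).var(ddof=1)

# Cronbach's alpha: consistency of items intended to measure the same construct
alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(round(alpha, 3))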

In mixed methods research , you use both qualitative and quantitative data collection and analysis methods to answer your research question .

Operationalisation means turning abstract conceptual ideas into measurable observations.

For example, the concept of social anxiety isn’t directly observable, but it can be operationally defined in terms of self-rating scores, behavioural avoidance of crowded places, or physical anxiety symptoms in social situations.

Before collecting data , it’s important to consider how you will operationalise the variables that you want to measure.


Table of Contents

  • What is data collection?
  • Why do we need data collection?
  • What are the different data collection methods?
  • Data collection tools
  • The importance of ensuring accurate and appropriate data collection
  • Issues related to maintaining the integrity of data collection
  • What are common challenges in data collection?
  • What are the key steps in the data collection process?
  • Data collection considerations and best practices

What is Data Collection? Definition, Types, Tools, and Techniques

Data collection is the process of gathering and analyzing accurate data from various sources to find answers to research problems, identify trends and probabilities, and evaluate possible outcomes. Knowledge is power, information is knowledge, and data is information in digitized form, at least as defined in IT. Hence, data is power. But before you can leverage that data into a successful strategy for your organization or business, you need to gather it. That’s your first step.

So, to help you get the process started, we shine a spotlight on data collection. What exactly is it? Believe it or not, it’s more than just doing a Google search! Furthermore, what are the different types of data collection? And what kinds of data collection tools and data collection techniques exist?

If you want to get up to speed on the data collection process, you’ve come to the right place.


What is Data Collection?

Data collection is the process of collecting and evaluating information or data from multiple sources to find answers to research problems, answer questions, evaluate outcomes, and forecast trends and probabilities. It is an essential phase in all types of research, analysis, and decision-making, including that done in the social sciences, business, and healthcare.

Accurate data collection is necessary to make informed business decisions, ensure quality assurance, and maintain research integrity.

During data collection, the researchers must identify the data types, the sources of data, and what methods are being used. We will soon see that there are many different data collection methods . There is heavy reliance on data collection in research, commercial, and government fields.

Before an analyst begins collecting data, they must answer three questions first:

  • What’s the goal or purpose of this research?
  • What kinds of data are they planning on gathering?
  • What methods and procedures will be used to collect, store, and process the information?

Additionally, we can break up data into qualitative and quantitative types. Qualitative data covers descriptions such as color, size, quality, and appearance. Quantitative data, unsurprisingly, deals with numbers, such as statistics, poll numbers, percentages, etc.

Why Do We Need Data Collection?

Before a judge makes a ruling in a court case or a general creates a plan of attack, they must have as many relevant facts as possible. The best courses of action come from informed decisions, and information and data are synonymous.

The concept of data collection isn’t a new one, as we’ll see later, but the world has changed. There is far more data available today, and it exists in forms that were unheard of a century ago. The data collection process has had to change and grow with the times, keeping pace with technology.

Whether you’re in the world of academia, trying to conduct research, or part of the commercial sector, thinking of how to promote a new product, you need data collection to help you make better choices.

What Are the Different Data Collection Methods?

Now that you know what data collection is and why we need it, let's take a look at the different methods of data collection. While the phrase “data collection” may sound all high-tech and digital, it doesn’t necessarily entail things like computers, big data, and the internet. Data collection could mean a telephone survey, a mail-in comment card, or even some guy with a clipboard asking passersby some questions. But let’s see if we can sort the different data collection methods into a semblance of organized categories.

Primary and secondary methods of data collection are two approaches used to gather information for research or analysis purposes. Let's explore each data collection method in detail:

1. Primary Data Collection:

Primary data collection involves the collection of original data directly from the source or through direct interaction with the respondents. This method allows researchers to obtain firsthand information specifically tailored to their research objectives. There are various techniques for primary data collection, including:

a. Surveys and Questionnaires: Researchers design structured questionnaires or surveys to collect data from individuals or groups. These can be conducted through face-to-face interviews, telephone calls, mail, or online platforms.

b. Interviews: Interviews involve direct interaction between the researcher and the respondent. They can be conducted in person, over the phone, or through video conferencing. Interviews can be structured (with predefined questions), semi-structured (allowing flexibility), or unstructured (more conversational).

c. Observations: Researchers observe and record behaviors, actions, or events in their natural setting. This method is useful for gathering data on human behavior, interactions, or phenomena without direct intervention.

d. Experiments: Experimental studies involve the manipulation of variables to observe their impact on the outcome. Researchers control the conditions and collect data to draw conclusions about cause-and-effect relationships.

e. Focus Groups: Focus groups bring together a small group of individuals who discuss specific topics in a moderated setting. This method helps in understanding opinions, perceptions, and experiences shared by the participants.

2. Secondary Data Collection:

Secondary data collection involves using existing data collected by someone else for a purpose different from the original intent. Researchers analyze and interpret this data to extract relevant information. Secondary data can be obtained from various sources, including:

a. Published Sources: Researchers refer to books, academic journals, magazines, newspapers, government reports, and other published materials that contain relevant data.

b. Online Databases: Numerous online databases provide access to a wide range of secondary data, such as research articles, statistical information, economic data, and social surveys.

c. Government and Institutional Records: Government agencies, research institutions, and organizations often maintain databases or records that can be used for research purposes.

d. Publicly Available Data: Data shared by individuals, organizations, or communities on public platforms, websites, or social media can be accessed and utilized for research.

e. Past Research Studies: Previous research studies and their findings can serve as valuable secondary data sources. Researchers can review and analyze the data to gain insights or build upon existing knowledge.

Data Collection Tools

Now that we’ve explained the various techniques, let’s narrow our focus even further by looking at some specific tools. For example, we mentioned interviews as a technique, but we can further break that down into different interview types (or “tools”).

Word Association

The researcher gives the respondent a set of words and asks them what comes to mind when they hear each word.

Sentence Completion

Researchers use sentence completion to understand what kind of ideas the respondent has. This tool involves giving an incomplete sentence and seeing how the interviewee finishes it.

Role-Playing

Respondents are presented with an imaginary situation and asked how they would act or react if it was real.

In-Person Surveys

The researcher asks questions in person.

Online/Web Surveys

These surveys are easy to accomplish, but some users may be unwilling to answer truthfully, if at all.

Mobile Surveys

These surveys take advantage of the increasing proliferation of mobile technology. Mobile collection surveys rely on mobile devices like tablets or smartphones to conduct surveys via SMS or mobile apps.

Phone Surveys

No researcher can call thousands of people at once, so they need a third party to handle the chore. However, many people have call screening and won’t answer.

Observation

Sometimes, the simplest method is the best. Researchers who make direct observations collect data quickly and easily, with little intrusion or third-party bias. Naturally, it’s only effective in small-scale situations.

The Importance of Ensuring Accurate and Appropriate Data Collection

Accurate data collection is crucial to preserving the integrity of research, regardless of the field of study or whether the data is quantitative or qualitative. Errors are less likely to occur when the right data-gathering tools are used, whether they are brand-new, updated versions of existing tools, or already established instruments.

The effects of incorrectly collected data include the following:

  • Erroneous conclusions that squander resources
  • Decisions that compromise public policy
  • An inability to answer the research questions correctly
  • Harm to human or animal participants
  • Misleading other researchers into pursuing unproductive avenues of investigation
  • Findings that cannot be replicated or validated

Although the degree of influence from flawed data collection may vary by discipline and the type of investigation, the potential for disproportionate harm is greatest when such findings are used to support recommendations for public policy.

Let us now look at the various issues that we might face while maintaining the integrity of data collection.

Issues Related to Maintaining the Integrity of Data Collection

The main rationale for maintaining data integrity is to support the detection of errors in the data-gathering process, whether they are made deliberately (falsification) or unintentionally (systematic or random errors).

Quality assurance and quality control are two strategies that help protect data integrity and guarantee the scientific validity of study results.

Each strategy is used at a different stage of the research timeline:

  • Quality assurance - activities that happen before data gathering starts
  • Quality control - tasks that are performed both during and after data collection

Let us explore each of them in more detail now.

Quality Assurance

Since quality assurance comes before data collection, its primary goal is "prevention" (i.e., forestalling problems with data collection). Prevention is the best way to protect the accuracy of data collection. The clearest example of this proactive step is the uniformity of protocol created in a thorough and exhaustive procedures manual for data collection.

When guides are written poorly, the likelihood of failing to spot issues and mistakes early in the research effort increases. These shortcomings can show up in several ways:

  • Failure to identify the precise subjects and methods for training or retraining the staff members who will collect the data
  • An incomplete list of the items to be collected
  • No system in place to track modifications to procedures that may occur as the investigation continues
  • A vague description of the data collection instruments to be used, instead of detailed, step-by-step instructions on how to administer tests
  • Uncertainty about when, how, and by whom the data will be reviewed
  • Unclear guidelines for using, adjusting, and calibrating the data collection equipment

Now, let us look at how to ensure Quality Control.


Quality Control

Despite the fact that quality control actions (detection/monitoring and intervention) take place both after and during data collection, the specifics should be meticulously detailed in the procedures manual. Establishing monitoring systems requires a specific communication structure, which is a prerequisite. Following the discovery of data collection problems, there should be no ambiguity regarding the information flow between the primary investigators and staff personnel. A poorly designed communication system promotes slack oversight and reduces opportunities for error detection.

Detection or monitoring can take the form of direct staff observation during site visits or conference calls, or of frequent and routine assessments of data reports to spot discrepancies, outliers, or invalid codes. Site visits might not be appropriate for all disciplines. Still, without routine auditing of records, whether qualitative or quantitative, it will be challenging for investigators to confirm that data gathering is taking place in accordance with the methods defined in the manual. Additionally, quality control determines the appropriate solutions, or "actions," to fix flawed data-gathering procedures and reduce recurrences.

Problems with data collection, for instance, that call for immediate action include:

  • Fraud or misbehavior
  • Systematic mistakes, procedure violations 
  • Individual data items with errors
  • Issues with certain staff members or a site's performance 

In the social and behavioral sciences, where primary data collection involves human subjects, researchers are trained to include one or more secondary measures that can be used to verify the quality of the information being obtained from participants.

For instance, a researcher conducting a survey might be interested in learning more about the prevalence of risky behaviors among young adults, as well as the social conditions that influence the likelihood and frequency of those risky behaviors. Let us now explore the common challenges of data collection.

What Are Common Challenges in Data Collection?

There are some prevalent challenges faced while collecting data. Let us explore a few of them so that we can understand and avoid them.

Data Quality Issues

The main threat to the broad and successful application of machine learning is poor data quality. Data quality must be your top priority if you want to make technologies like machine learning work for you. Let's talk about some of the most prevalent data quality problems and how to fix them.

Inconsistent Data

When working with various data sources, it's conceivable that the same information will have discrepancies between sources. The differences could be in formats, units, or occasionally spellings. The introduction of inconsistent data might also occur during firm mergers or relocations. Inconsistencies in data have a tendency to accumulate and reduce the value of data if they are not continually resolved. Organizations that have heavily focused on data consistency do so because they only want reliable data to support their analytics.
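As a small, hedged sketch of resolving such inconsistencies, the Python code below normalises mixed company spellings, unit labels, and date formats from hypothetical merged sources; the field names and mappings are illustrative assumptions:

import pandas as pd

# Hypothetical records merged from two sources with inconsistent conventions
df = pd.DataFrame({
    "customer": ["Acme Ltd", "ACME LTD.", "Globex"],
    "signup_date": ["2023-01-15", "15/01/2023", "2023-02-01"],
    "revenue_unit": ["USD", "usd", "US Dollar"],
})

# Standardise spellings and casing so the same entity is represented one way
df["customer"] = df["customer"].str.upper().str.replace(".", "", regex=False)
df["revenue_unit"] = df["revenue_unit"].str.lower().map(
    {"usd": "USD", "us dollar": "USD"}
)

# Parse dates written in mixed formats into one consistent datetime type
df["signup_date"] = df["signup_date"].apply(pd.to_datetime, dayfirst=True)

print(df)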

Data Downtime

Data is the driving force behind the decisions and operations of data-driven businesses. However, there may be brief periods when their data is unreliable or not ready for use. Customer complaints and subpar analytical outcomes are only two of the ways this data unavailability can significantly impact businesses. A data engineer spends about 80% of their time updating, maintaining, and guaranteeing the integrity of the data pipeline. The lengthy operational lead time from data capture to insight also creates a high marginal cost for asking the next business question.

Schema modifications and migration problems are just two examples of the causes of data downtime. Data pipelines can be difficult due to their size and complexity. Data downtime must be continuously monitored, and it must be reduced through automation.
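One hedged sketch of that kind of monitoring is a simple freshness check that flags tables not updated within an agreed window; the table names and threshold below are hypothetical:

from datetime import datetime, timedelta, timezone

# Hypothetical metadata about when each pipeline table last received data
last_updated = {
    "orders": datetime.now(timezone.utc) - timedelta(hours=2),
    "customers": datetime.now(timezone.utc) - timedelta(hours=30),
}

FRESHNESS_THRESHOLD = timedelta(hours=24)  # agreed maximum staleness

def stale_tables(updates, threshold):
    """Return the tables whose data is older than the freshness threshold."""
    now = datetime.now(timezone.utc)
    return [name for name, ts in updates.items() if now - ts > threshold]

print(stale_tables(last_updated, FRESHNESS_THRESHOLD))  # e.g. ['customers']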

Ambiguous Data

Even with thorough oversight, some errors can still occur in massive databases or data lakes. For data streaming at a fast speed, the issue becomes more overwhelming. Spelling mistakes can go unnoticed, formatting difficulties can occur, and column heads might be deceptive. This unclear data might cause a number of problems for reporting and analytics.


Duplicate Data

Streaming data, local databases, and cloud data lakes are just a few of the sources of data that modern enterprises must contend with. They might also have application and system silos. These sources are likely to duplicate and overlap each other quite a bit. For instance, duplicate contact information has a substantial impact on customer experience. If certain prospects are ignored while others are engaged repeatedly, marketing campaigns suffer. The likelihood of biased analytical outcomes increases when duplicate data are present. It can also result in ML models with biased training data.
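A minimal, hedged deduplication sketch with pandas might look like the following; the contact fields and matching rule are illustrative assumptions:

import pandas as pd

contacts = pd.DataFrame({
    "email": ["a@example.com", "A@Example.com", "b@example.com"],
    "name": ["Ana", "Ana", "Ben"],
    "source": ["web form", "CRM import", "web form"],
})

# Normalise the matching key first, then keep one record per email address
contacts["email"] = contacts["email"].str.strip().str.lower()
deduplicated = contacts.drop_duplicates(subset="email", keep="first")

print(deduplicated)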

Too Much Data

While we emphasize data-driven analytics and its advantages, a data quality problem with excessive data exists. There is a risk of getting lost in an abundance of data when searching for information pertinent to your analytical efforts. Data scientists, data analysts, and business users devote 80% of their work to finding and organizing the appropriate data. With an increase in data volume, other problems with data quality become more serious, particularly when dealing with streaming data and big files or databases.

Inaccurate Data

For highly regulated industries like healthcare, data accuracy is crucial. Recent experience has shown that it is more important than ever to improve data quality for COVID-19 and future pandemics. Inaccurate information does not give you a true picture of the situation and cannot be used to plan the best course of action. Personalized customer experiences and marketing strategies also underperform if your customer data is inaccurate.

Data inaccuracies can be attributed to a number of things, including data degradation, human error, and data drift. Worldwide data decay occurs at a rate of about 3% per month, which is quite concerning. Data integrity can be compromised while being transferred between different systems, and data quality might deteriorate with time.

Hidden Data

The majority of businesses only utilize a portion of their data, with the remainder sometimes being lost in data silos or discarded in data graveyards. For instance, the customer service team might not receive client data from sales, missing an opportunity to build more precise and comprehensive customer profiles. Missing out on possibilities to develop novel products, enhance services, and streamline procedures is caused by hidden data.

Finding Relevant Data

Finding relevant data is not easy. There are several factors we need to consider while trying to find relevant data, including:

  • The relevant domain
  • The relevant demographics
  • The relevant time period

and many more. Data that is not relevant to our study on any of these factors is effectively obsolete, and we cannot proceed with its analysis. This could lead to incomplete research or analysis, repeated rounds of data collection, or shutting down the study.

Deciding the Data to Collect

Determining what data to collect is one of the most important decisions in the data collection process and should be addressed first. We must choose the subjects the data will cover, the sources we will use to gather it, and the quantity of information we will require. Our responses to these questions will depend on our aims, or what we expect to achieve using the data. As an illustration, we may choose to gather information on the categories of articles that website visitors between the ages of 20 and 50 most frequently access. We can also decide to compile data on the typical age of all the clients who made a purchase from our business over the previous month.

Not addressing this could lead to double work and collection of irrelevant data or ruining your study as a whole.

Dealing With Big Data

Big data refers to exceedingly massive data sets with more intricate and diversified structures. These traits typically create additional challenges for storing and analyzing the data and for applying further methods of extracting results. Big data refers especially to data sets that are so large or complex that conventional data processing tools are insufficient: the overwhelming amount of data, both structured and unstructured, that a business faces on a daily basis.

The amount of data produced by healthcare applications, the internet, social networking sites, sensor networks, and many other sources is growing rapidly as a result of recent technological advancements. Big data refers to the vast volume of data created from numerous sources in a variety of formats at extremely fast rates. Dealing with this kind of data is one of the many challenges of data collection and is a crucial step toward collecting effective data.

Low Response and Other Research Issues

Poor design and low response rates were shown to be two issues with data collecting, particularly in health surveys that used questionnaires. This might lead to an insufficient or inadequate supply of data for the study. Creating an incentivized data collection program might be beneficial in this case to get more responses.

Now, let us look at the key steps in the data collection process.

What Are the Key Steps in the Data Collection Process?

There are five key steps in the data collection process. They are explained briefly below.

1. Decide What Data You Want to Gather

The first thing that we need to do is decide what information we want to gather. We must choose the subjects the data will cover, the sources we will use to gather it, and the quantity of information that we would require. For instance, we may choose to gather information on the categories of products that an average e-commerce website visitor between the ages of 30 and 45 most frequently searches for. 
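A hedged sketch of that example, assuming we already hold visitor events with hypothetical age and search-category fields, might filter to the target demographic and count the categories searched:

import pandas as pd

# Hypothetical visitor search events with an age attribute
events = pd.DataFrame({
    "visitor_age": [25, 34, 41, 38, 52, 44],
    "category_searched": ["shoes", "laptops", "laptops", "furniture", "shoes", "laptops"],
})

# Keep only the target demographic, then count searches per product category
target = events[(events["visitor_age"] >= 30) & (events["visitor_age"] <= 45)]
category_counts = target["category_searched"].value_counts()

print(category_counts)  # laptops 3, furniture 1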

2. Establish a Deadline for Data Collection

The process of creating a strategy for data collection can now begin. We should set a deadline for our data collection at the outset of our planning phase. Some forms of data we might want to continuously collect. We might want to build up a technique for tracking transactional data and website visitor statistics over the long term, for instance. However, we will track the data throughout a certain time frame if we are tracking it for a particular campaign. In these situations, we will have a schedule for when we will begin and finish gathering data. 

3. Select a Data Collection Approach

We will select the data collection technique that will serve as the foundation of our data gathering plan at this stage. We must take into account the type of information that we wish to gather, the time period during which we will receive it, and the other factors we decide on to choose the best gathering strategy.

4. Gather Information

Once our plan is complete, we can put our data collection plan into action and begin gathering data. In our data management platform (DMP), we can store and arrange our data. We need to be careful to follow our plan and keep an eye on how it's doing. Especially if we are collecting data regularly, setting up a timetable for checking in on how our data gathering is going may be helpful. As circumstances change and we learn new details, we might need to amend our plan.

5. Examine the Information and Apply Your Findings

It's time to examine our data and arrange our findings after we have gathered all of our information. The analysis stage is essential because it transforms unprocessed data into insightful knowledge that can be applied to better our marketing plans, goods, and business judgments. The analytics tools included in our DMP can be used to assist with this phase. We can put the discoveries to use to enhance our business once we have discovered the patterns and insights in our data.

Let us now look at some data collection considerations and best practices that one might follow.

Data Collection Considerations and Best Practices

We must plan carefully before spending time and money traveling to the field to gather data. Effective data collection strategies can help us collect richer, more accurate data while saving time and resources.

Below, we will be discussing some of the best practices that we can follow for the best results -

1. Take Into Account the Price of Each Extra Data Point

Once we have decided on the data we want to gather, we need to make sure to take the expense of doing so into account. Our surveyors and respondents will incur additional costs for each additional data point or survey question.

2. Plan How to Gather Each Data Piece

There is a dearth of freely accessible data. Sometimes the data is there, but we may not have access to it. For instance, unless we have a compelling cause, we cannot openly view another person's medical information. It could be challenging to measure several types of information.

Consider how time-consuming and difficult it will be to gather each piece of information while deciding what data to acquire.

3. Think About Your Choices for Data Collecting Using Mobile Devices

Mobile-based data collecting can be divided into three categories -

  • IVRS (interactive voice response technology) -  Will call the respondents and ask them questions that have already been recorded. 
  • SMS data collection - Will send a text message to the respondent, who can then respond to questions by text on their phone. 
  • Field surveyors - Can directly enter data into an interactive questionnaire while speaking to each respondent, thanks to smartphone apps.

We need to make sure to select the appropriate tool for our survey and responders because each one has its own disadvantages and advantages.

4. Carefully Consider the Data You Need to Gather

It's all too easy to get information about anything and everything, but it's crucial to only gather the information that we require. 

It is helpful to consider these 3 questions:

  • What details will be helpful?
  • What details are available?
  • What specific details do you require?

5. Remember to Consider Identifiers

Identifiers, or details describing the context and source of a survey response, are just as crucial as the information about the subject or program that we are actually researching.

In general, adding more identifiers will enable us to pinpoint our program's successes and failures with greater accuracy, but moderation is the key.
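As a small, hedged sketch, each survey response could be stored together with identifier fields describing its context; the field names below are illustrative assumptions:

from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class SurveyResponse:
    # Substantive answer
    answer: str
    # Identifiers: context and source of the response
    respondent_id: str
    surveyor_id: str
    site: str
    collected_at: str

response = SurveyResponse(
    answer="Very satisfied",
    respondent_id="R-0042",
    surveyor_id="S-07",
    site="District clinic 3",
    collected_at=datetime.now(timezone.utc).isoformat(),
)

print(asdict(response))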

6. Data Collecting Through Mobile Devices is the Way to Go

Although collecting data on paper is still common, modern technology relies heavily on mobile devices. They enable us to gather many various types of data at relatively lower prices and are accurate as well as quick. There aren't many reasons not to pick mobile-based data collecting with the boom of low-cost Android devices that are available nowadays.


FAQs

1. What is data collection with example?

Data collection is the process of collecting and analyzing information on relevant variables in a predetermined, methodical way so that one can respond to specific research questions, test hypotheses, and assess results. Data collection can be either qualitative or quantitative. Example: A company collects customer feedback through online surveys and social media monitoring to improve their products and services.

2. What are the primary data collection methods?

As is well known, gathering primary data is costly and time intensive. The main techniques for gathering data are observation, interviews, questionnaires, schedules, and surveys.

3. What are data collection tools?

The term "data collecting tools" refers to the tools/devices used to gather data, such as a paper questionnaire or a system for computer-assisted interviews. Tools used to gather data include case studies, checklists, interviews, occasionally observation, surveys, and questionnaires.

4. What’s the difference between quantitative and qualitative methods?

While qualitative research focuses on words and meanings, quantitative research deals with figures and statistics. You can systematically measure variables and test hypotheses using quantitative methods. You can delve deeper into ideas and experiences using qualitative methodologies.

5. What are quantitative data collection methods?

The most typical and frequently employed quantitative data collection methods are probability sampling, interviews, questionnaires, observation, and document review, whether the information is collected offline or online; numerous other approaches also exist.

6. What is mixed methods research?

User research that includes both qualitative and quantitative techniques is known as mixed methods research. For deeper user insights, mixed methods research combines insightful user data with useful statistics.

7. What are the benefits of collecting data?

Collecting data offers several benefits, including:

  • Knowledge and Insight
  • Evidence-Based Decision Making
  • Problem Identification and Solution
  • Validation and Evaluation
  • Identifying Trends and Predictions
  • Support for Research and Development
  • Policy Development
  • Quality Improvement
  • Personalization and Targeting
  • Knowledge Sharing and Collaboration

8. What’s the difference between reliability and validity?

Reliability is about consistency and stability, while validity is about accuracy and appropriateness. Reliability focuses on the consistency of results, while validity focuses on whether the results are actually measuring what they are intended to measure. Both reliability and validity are crucial considerations in research to ensure the trustworthiness and meaningfulness of the collected data and measurements.




Journal of Graduate Medical Education, vol. 8, no. 2 (May 2016)

Design: Selection of Data Collection Methods


Editor's Note: The online version of this article contains resources for further reading and a table of strengths and limitations of qualitative data collection methods.

The Challenge

Imagine that residents in your program have been less than complimentary about interprofessional rounds (IPRs). The program director asks you to determine what residents are learning about in collaboration with other health professionals during IPRs. If you construct a survey asking Likert-type questions such as “How much are you learning?” you likely will not gather the information you need to answer this question. You understand that qualitative data deal with words rather than numbers and could provide the needed answers. How do you collect “good” words? Should you use open-ended questions in a survey format? Should you conduct interviews, focus groups, or conduct direct observation? What should you consider when making these decisions?

Introduction

Qualitative research is often employed when there is a problem and no clear solutions exist, as in the case above that elicits the following questions: Why are residents complaining about rounds? How could we make rounds better? In this context, collecting “good” information or words (qualitative data) is intended to produce information that helps you to answer your research questions, capture the phenomenon of interest, and account for context and the rich texture of the human experience. You may also aim to challenge previous thinking and invite further inquiry.

Coherence or alignment between all aspects of the research project is essential. In this Rip Out we focus on data collection, but in qualitative research, the entire project must be considered. 1 , 2 Careful design of the data collection phase requires the following: deciding who will do what, where, when, and how at the different stages of the research process; acknowledging the role of the researcher as an instrument of data collection; and carefully considering the context studied and the participants and informants involved in the research.

Types of Data Collection Methods

Data collection methods are important, because how the information collected is used and what explanations it can generate are determined by the methodology and analytical approach applied by the researcher. 1 , 2 Five key data collection methods are presented here, with their strengths and limitations described in the online supplemental material.

  1. Questions added to surveys to obtain qualitative data typically are open-ended with a free-text format. Surveys are ideal for documenting perceptions, attitudes, beliefs, or knowledge within a clear, predetermined sample of individuals. “Good” open-ended questions should be specific enough to yield coherent responses across respondents, yet broad enough to invite a spectrum of answers. Examples for this scenario include: What is the function of IPRs? What is the educational value of IPRs, according to residents? Qualitative survey data can be analyzed using a range of techniques.
  2. Interviews are used to gather information from individuals 1-on-1, using a series of predetermined questions or a set of interest areas. Interviews are often recorded and transcribed. They can be structured or unstructured; they can either follow a tightly written script that mimics a survey or be inspired by a loose set of questions that invite interviewees to express themselves more freely. Interviewers need to actively listen and question, probe, and prompt further to collect richer data. Interviews are ideal when used to document participants' accounts, perceptions of, or stories about attitudes toward and responses to certain situations or phenomena. Interview data are often used to generate themes, theories, and models. Many research questions that can be answered with surveys can also be answered through interviews, but interviews will generally yield richer, more in-depth data than surveys. Interviews do, however, require more time and resources to conduct and analyze. Importantly, because interviewers are the instruments of data collection, interviewers should be trained to collect comparable data. The number of interviews required depends on the research question and the overarching methodology used. Examples of these questions include: How do residents experience IPRs? What do residents' stories about IPRs tell us about interprofessional care hierarchies?
  3. Focus groups are used to gather information in a group setting, either through predetermined interview questions that the moderator asks of participants in turn or through a script to stimulate group conversations. Ideally, they are used when the sum of a group of people's experiences may offer more than a single individual's experiences in understanding social phenomena. Focus groups also allow researchers to capture participants' reactions to the comments and perspectives shared by other participants, and are thus a way to capture similarities and differences in viewpoints. The number of focus groups required will vary based on the questions asked and the number of different stakeholders involved, such as residents, nurses, social workers, pharmacists, and patients. The optimal number of participants per focus group, to generate rich discussion while enabling all members to speak, is 8 to 10 people. 3 Examples of questions include: How would residents, nurses, and pharmacists redesign or improve IPRs to maximize engagement, participation, and use of time? How do suggestions compare across professional groups?
  4. Observations are used to gather information in situ using the senses: vision, hearing, touch, and smell. Observations allow us to investigate and document what people do—their everyday behavior—and to try to understand why they do it, rather than focus on their own perceptions or recollections. Observations are ideal when used to document, explore, and understand, as they occur, activities, actions, relationships, culture, or taken-for-granted ways of doing things. As with the previous methods, the number of observations required will depend on the research question and overarching research approach used. Examples of research questions include: How do residents use their time during IPRs? How do they relate to other health care providers? What kind of language and body language are used to describe patients and their families during IPRs?
  5. Textual or content analysis is ideal when used to investigate changes in official, institutional, or organizational views on a specific topic or area, to document the context of certain practices, or to investigate the experiences and perspectives of a group of individuals who have, for example, engaged in written reflection. Textual analysis can be used as the main method in a research project or to contextualize findings from another method. The choice and number of documents has to be guided by the research question, but can include newspaper or research articles, governmental reports, organization policies and protocols, letters, records, films, photographs, art, meeting notes, or checklists. The development of a coding grid or scheme for analysis will be guided by the research question and will be iteratively applied to selected documents. Examples of research questions include: How do our local policies and protocols for IPRs reflect or contrast with the broader discourses of interprofessional collaboration? What are the perceived successful features of IPRs in the literature? What are the key features of residents' reflections on their interprofessional experiences during IPRs?

How You Can Start TODAY

  • Review medical education journals to find qualitative research in your area of interest and focus on the methods used as well as the findings.
  • When you have chosen a method, read several different sources on it.
  • From your readings, identify potential colleagues with expertise in your choice of qualitative method as well as others in your discipline who would like to learn more, and organize potential working groups to discuss challenges that arise in your work.

What You Can Do LONG TERM

  • Either locally or nationally, build a community of like-minded scholars to expand your qualitative expertise.
  • Use a range of methods to develop a broad program of qualitative research.


Qualitative Data – Types, Methods and Examples


Definition:

Qualitative data is a type of data that is collected and analyzed in a non-numerical form, such as words, images, or observations. It is generally used to gain an in-depth understanding of complex phenomena, such as human behavior, attitudes, and beliefs.

Types of Qualitative Data

There are various types of qualitative data that can be collected and analyzed, including:

  • Interviews : These involve in-depth, face-to-face conversations with individuals or groups to gather their perspectives, experiences, and opinions on a particular topic.
  • Focus Groups: These are group discussions where a facilitator leads a discussion on a specific topic, allowing participants to share their views and experiences.
  • Observations : These involve observing and recording the behavior and interactions of individuals or groups in a particular setting.
  • Case Studies: These involve in-depth analysis of a particular individual, group, or organization, usually over an extended period.
  • Document Analysis : This involves examining written or recorded materials, such as newspaper articles, diaries, or public records, to gain insight into a particular topic.
  • Visual Data : This involves analyzing images or videos to understand people’s experiences or perspectives on a particular topic.
  • Online Data: This involves analyzing data collected from social media platforms, forums, or online communities to understand people’s views and opinions on a particular topic.

Qualitative Data Formats

Qualitative data can be collected and presented in various formats. Some common formats include:

  • Textual data: This includes written or transcribed data from interviews, focus groups, or observations. It can be analyzed using various techniques such as thematic analysis or content analysis.
  • Audio data: This includes recordings of interviews or focus groups, which can be transcribed and analyzed using software such as NVivo.
  • Visual data: This includes photographs, videos, or drawings, which can be analyzed using techniques such as visual analysis or semiotics.
  • Mixed media data : This includes data collected in different formats, such as audio and text. This can be analyzed using mixed methods research, which combines both qualitative and quantitative research methods.
  • Field notes: These are notes taken by researchers during observations, which can include descriptions of the setting, behaviors, and interactions of participants.

Qualitative Data Analysis Methods

Qualitative data analysis refers to the process of systematically analyzing and interpreting qualitative data to identify patterns, themes, and relationships. Here are some common methods of analyzing qualitative data:

  • Thematic analysis: This involves identifying and analyzing patterns or themes within the data. It involves coding the data into themes and subthemes and organizing them into a coherent narrative.
  • Content analysis: This involves analyzing the content of the data, such as the words, phrases, or images used. It involves identifying patterns and themes in the data and examining the relationships between them (a crude keyword-counting sketch follows this list).
  • Discourse analysis: This involves analyzing the language and communication used in the data, such as the meaning behind certain words or phrases. It involves examining how the language constructs and shapes social reality.
  • Grounded theory: This involves developing a theory or framework based on the data. It involves identifying patterns and themes in the data and using them to develop a theory that explains the phenomenon being studied.
  • Narrative analysis : This involves analyzing the stories and narratives present in the data. It involves examining how the stories are constructed and how they contribute to the overall understanding of the phenomenon being studied.
  • Ethnographic analysis : This involves analyzing the culture and social practices present in the data. It involves examining how the cultural and social practices contribute to the phenomenon being studied.
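As a very rough, hedged sketch of the counting side of content analysis (not a substitute for careful interpretive coding), the Python code below tallies how many hypothetical interview excerpts touch each predefined theme keyword at least once:

# Hypothetical interview excerpts and a predefined coding scheme of theme keywords
excerpts = [
    "I felt supported by the nurses during rounds.",
    "There was no time to ask questions, the schedule felt rushed.",
    "The team communication made me feel supported.",
]

coding_scheme = {
    "support": ["supported", "support"],
    "time pressure": ["no time", "rushed", "schedule"],
    "communication": ["communication", "questions"],
}

# Count how many excerpts touch each theme at least once
theme_counts = {theme: 0 for theme in coding_scheme}
for text in excerpts:
    lowered = text.lower()
    for theme, keywords in coding_scheme.items():
        if any(keyword in lowered for keyword in keywords):
            theme_counts[theme] += 1

print(theme_counts)  # e.g. {'support': 2, 'time pressure': 1, 'communication': 2}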

Qualitative Data Collection Guide

Here are some steps to guide the collection of qualitative data:

  • Define the research question : Start by clearly defining the research question that you want to answer. This will guide the selection of data collection methods and help to ensure that the data collected is relevant to the research question.
  • Choose data collection methods : Select the most appropriate data collection methods based on the research question, the research design, and the resources available. Common methods include interviews, focus groups, observations, document analysis, and participatory research.
  • Develop a data collection plan : Develop a plan for data collection that outlines the specific procedures, timelines, and resources needed for each data collection method. This plan should include details such as how to recruit participants, how to conduct interviews or focus groups, and how to record and store data.
  • Obtain ethical approval : Obtain ethical approval from an institutional review board or ethics committee before beginning data collection. This is particularly important when working with human participants to ensure that their rights and interests are protected.
  • Recruit participants: Recruit participants based on the research question and the data collection methods chosen. This may involve purposive sampling, snowball sampling, or random sampling.
  • Collect data: Collect data using the chosen data collection methods. This may involve conducting interviews, facilitating focus groups, observing participants, or analyzing documents.
  • Transcribe and store data : Transcribe and store the data in a secure location. This may involve transcribing audio or video recordings, organizing field notes, or scanning documents.
  • Analyze data: Analyze the data using appropriate qualitative data analysis methods, such as thematic analysis or content analysis.
  • Interpret findings: Interpret the findings of the data analysis in the context of the research question and the relevant literature. This may involve developing new theories or frameworks, or validating existing ones.
  • Communicate results: Communicate the results of the research in a clear and concise manner, using appropriate language and visual aids where necessary. This may involve writing a report, presenting at a conference, or publishing in a peer-reviewed journal.

Qualitative Data Examples

Some examples of qualitative data in different fields are as follows:

  • Sociology: In sociology, qualitative data is used to study social phenomena such as culture, norms, and social relationships. For example, a researcher might conduct interviews with members of a community to understand their beliefs and practices.
  • Psychology: In psychology, qualitative data is used to study human behavior, emotions, and attitudes. For example, a researcher might conduct a focus group to explore how individuals with anxiety cope with their symptoms.
  • Education: In education, qualitative data is used to study learning processes and educational outcomes. For example, a researcher might conduct observations in a classroom to understand how students interact with each other and with their teacher.
  • Marketing: In marketing, qualitative data is used to understand consumer behavior and preferences. For example, a researcher might conduct in-depth interviews with customers to understand their purchasing decisions.
  • Anthropology: In anthropology, qualitative data is used to study human cultures and societies. For example, a researcher might conduct participant observation in a remote community to understand their customs and traditions.
  • Health Sciences: In health sciences, qualitative data is used to study patient experiences, beliefs, and preferences. For example, a researcher might conduct interviews with cancer patients to understand how they cope with their illness.

Applications of Qualitative Data

Qualitative data is used in a variety of fields and has numerous applications. Here are some common applications of qualitative data:

  • Exploratory research: Qualitative data is often used in exploratory research to understand a new or unfamiliar topic. Researchers use qualitative data to generate hypotheses and develop a deeper understanding of the research question.
  • Evaluation: Qualitative data is often used to evaluate programs or interventions. Researchers use qualitative data to understand the impact of a program or intervention on the people who participate in it.
  • Needs assessment: Qualitative data is often used in needs assessments to understand the needs of a specific population. Researchers use qualitative data to identify the most pressing needs of the population and develop strategies to address those needs.
  • Case studies: Qualitative data is often used in case studies to understand a particular case in detail. Researchers use qualitative data to understand the context, experiences, and perspectives of the people involved in the case.
  • Market research: Qualitative data is often used in market research to understand consumer behavior and preferences. Researchers use qualitative data to gain insights into consumer attitudes, opinions, and motivations.
  • Social and cultural research: Qualitative data is often used in social and cultural research to understand social phenomena such as culture, norms, and social relationships. Researchers use qualitative data to understand the experiences, beliefs, and practices of individuals and communities.

Purpose of Qualitative Data

The purpose of qualitative data is to gain a deeper understanding of social phenomena that cannot be captured by numerical or quantitative data. Qualitative data is collected through methods such as observation, interviews, and focus groups, and it provides descriptive information that can shed light on people’s experiences, beliefs, attitudes, and behaviors.

Qualitative data serves several purposes, including:

  • Generating hypotheses: Qualitative data can be used to generate hypotheses about social phenomena that can be further tested with quantitative data.
  • Providing context: Qualitative data provides a rich and detailed context for understanding social phenomena that cannot be captured by numerical data alone.
  • Exploring complex phenomena: Qualitative data can be used to explore complex phenomena such as culture, social relationships, and the experiences of marginalized groups.
  • Evaluating programs and interventions: Qualitative data can be used to evaluate the impact of programs and interventions on the people who participate in them.
  • Enhancing understanding: Qualitative data can be used to enhance understanding of the experiences, beliefs, and attitudes of individuals and communities, which can inform policy and practice.

When to use Qualitative Data

Qualitative data is appropriate when the research question requires an in-depth understanding of complex social phenomena that cannot be captured by numerical or quantitative data.

Here are some situations when qualitative data is appropriate:

  • Exploratory research: Qualitative data is often used in exploratory research to generate hypotheses and develop a deeper understanding of a research question.
  • Understanding social phenomena: Qualitative data is appropriate when the research question requires an in-depth understanding of social phenomena such as culture, social relationships, and experiences of marginalized groups.
  • Program evaluation: Qualitative data is often used in program evaluation to understand the impact of a program on the people who participate in it.
  • Needs assessment: Qualitative data is often used in needs assessments to understand the needs of a specific population.
  • Market research: Qualitative data is often used in market research to understand consumer behavior and preferences.
  • Case studies: Qualitative data is often used in case studies to understand a particular case in detail.

Characteristics of Qualitative Data

Here are some characteristics of qualitative data:

  • Descriptive: Qualitative data provides a rich and detailed description of the social phenomena under investigation.
  • Contextual: Qualitative data is collected in the context in which the social phenomena occur, which allows for a deeper understanding of the phenomena.
  • Subjective: Qualitative data reflects the subjective experiences, beliefs, attitudes, and behaviors of the individuals and communities under investigation.
  • Flexible: Qualitative data collection methods are flexible and can be adapted to the specific needs of the research question.
  • Emergent: Qualitative data analysis is often an iterative process, where new themes and patterns emerge as the data is analyzed.
  • Interpretive: Qualitative data analysis involves interpretation of the data, which requires the researcher to be reflexive and aware of their own biases and assumptions.
  • Non-standardized: Qualitative data collection methods are often non-standardized, which means that the data is not collected in a standardized or uniform way.

Advantages of Qualitative Data

Some advantages of qualitative data are as follows:

  • Richness: Qualitative data provides a rich and detailed description of the social phenomena under investigation, allowing for a deeper understanding of the phenomena.
  • Flexibility: Qualitative data collection methods are flexible and can be adapted to the specific needs of the research question, allowing for a more nuanced exploration of social phenomena.
  • Contextualization: Qualitative data is collected in the context in which the social phenomena occur, which allows for a deeper understanding of the phenomena and their cultural and social context.
  • Subjectivity: Qualitative data reflects the subjective experiences, beliefs, attitudes, and behaviors of the individuals and communities under investigation, allowing for a more holistic understanding of the phenomena.
  • New insights: Qualitative data can generate new insights and hypotheses that can be further tested with quantitative data.
  • Participant voice: Qualitative data collection methods often involve direct participation by the individuals and communities under investigation, allowing for their voices to be heard.
  • Ethical considerations: Qualitative data collection methods often prioritize ethical considerations such as informed consent, confidentiality, and respect for the autonomy of the participants.

Limitations of Qualitative Data

Here are some limitations of qualitative data:

  • Subjectivity: Qualitative data is subjective, and the interpretation of the data depends on the researcher’s own biases, assumptions, and perspectives.
  • Small sample size: Qualitative data collection methods often involve a small sample size, which limits the generalizability of the findings.
  • Time-consuming: Qualitative data collection and analysis can be time-consuming, as it requires in-depth engagement with the data and often involves iterative processes.
  • Limited statistical analysis: Qualitative data is often not suitable for statistical analysis, which limits the ability to draw quantitative conclusions from the data.
  • Limited comparability: Qualitative data collection methods are often non-standardized, which makes it difficult to compare findings across different studies or contexts.
  • Social desirability bias: Qualitative data collection methods often rely on self-reporting by the participants, which can be influenced by social desirability bias.
  • Researcher bias: The researcher’s own biases, assumptions, and perspectives can influence the data collection and analysis, which can limit the objectivity of the findings.

About the author


Muhammad Hassan

Researcher, Academic Writer, Web developer



SMU Simmons School of Education & Human Development

Qualitative vs. quantitative data analysis: How do they differ?


Learning analytics has become the cornerstone for personalizing student experiences and enhancing learning outcomes. In this data-informed approach to education, there are two distinct methodologies: qualitative and quantitative analytics. These methods, which are typical of data analytics in general, are crucial to the interpretation of learning behaviors and outcomes. This blog will explore the nuances that distinguish qualitative and quantitative research, while uncovering their shared roles in learning analytics, program design, and instruction.

What is qualitative data?

Qualitative data is descriptive and includes information that is non-numerical. Qualitative research is used to gather in-depth insights that can't easily be measured on a scale, such as opinions, anecdotes, and emotions. In learning analytics, qualitative data could include in-depth interviews, text responses to a prompt, or a video of a class period. 1

What is quantitative data?

Quantitative data is information that has a numerical value. Quantitative research is conducted to gather measurable data used in statistical analysis. Researchers can use quantitative studies to identify patterns and trends. In learning analytics, quantitative data could include test scores, student demographics, or the amount of time spent in a lesson. 2

Key difference between qualitative and quantitative data

It's important to understand the differences between qualitative and quantitative data, both to determine the appropriate research methods for studies and to gain insights that you can be confident in sharing.

Data Types and Nature

Examples of qualitative data types in learning analytics:

  • Observational data of human behavior from classroom settings such as student engagement, teacher-student interactions, and classroom dynamics
  • Textual data from open-ended survey responses, reflective journals, and written assignments
  • Feedback and discussions from focus groups or interviews
  • Content analysis from various media

Examples of quantitative data types:

  • Standardized test, assessment, and quiz scores
  • Grades and grade point averages
  • Attendance records
  • Time spent on learning tasks
  • Data gathered from learning management systems (LMS), including login frequency, online participation, and completion rates of assignments

Methods of Collection

Qualitative and quantitative research methods for data collection can occasionally seem similar, so it's important to note the differences to make sure you're creating a consistent data set and will be able to draw reliable conclusions from your data.

Qualitative research methods

Because of the nature of qualitative data (complex, detailed information), the research methods used to collect it are more involved. Qualitative researchers might do the following to collect data:

  • Conduct interviews to learn about subjective experiences
  • Host focus groups to gather feedback and personal accounts
  • Observe in person, or use audio or video recordings, to capture nuances of human behavior in a natural setting
  • Distribute surveys with open-ended questions

Quantitative research methods

Quantitative data collection methods are more diverse and more likely to be automated because of the objective nature of the data. A quantitative researcher could employ methods such as:

  • Surveys with closed-ended questions that gather numerical data like birthdates or preferences
  • Observational research that records measurable information, like the number of students in a classroom
  • Automated numerical data collection, such as information collected on the backend of a computer system like button clicks and page views (see the sketch after this list)
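As a rough illustration of that last, automated category, the sketch below tallies hypothetical click-stream events exported from a learning platform. The event format is an assumption made for this example, not a real LMS export or API.

```python
from collections import Counter

# Hypothetical exported events: (student_id, event_type, page); format is assumed for illustration.
events = [
    ("s01", "page_view", "lesson-1"),
    ("s02", "page_view", "lesson-1"),
    ("s01", "button_click", "quiz-1"),
    ("s03", "page_view", "lesson-2"),
    ("s02", "page_view", "lesson-2"),
]

# Count page views per page and button clicks per student
page_views = Counter(page for _, kind, page in events if kind == "page_view")
clicks_per_student = Counter(sid for sid, kind, _ in events if kind == "button_click")

print(page_views)           # Counter({'lesson-1': 2, 'lesson-2': 2})
print(clicks_per_student)   # Counter({'s01': 1})
```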

Analysis techniques

Qualitative and quantitative data can both be very informative. However, research studies require critical thinking for productive analysis.

Qualitative data analysis methods

Analyzing qualitative data takes a number of steps. When you first get all your data in one place, you can do a review and take notes on trends you think you're seeing or your initial reactions. Next, you'll want to organize all the qualitative data you've collected by assigning it categories. Your central research question will guide your data categorization, whether it's by date, location, type of collection method (interview vs. focus group, etc.), the specific question asked, or something else. Next, you'll code your data. Whereas categorizing data is focused on the method of collection, coding is the process of identifying and labeling themes within the data collected to get closer to answering your research questions. Finally comes data interpretation. To interpret the data, you'll take a look at the information gathered, including your coding labels, and see what results occur frequently or what other conclusions you can make. 3
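Here is a minimal sketch of the coding-and-counting step described above, assuming a simple keyword-to-theme code book. Real qualitative coding is interpretive and usually done in dedicated software, so treat this purely as an illustration of labeling segments and tallying themes; the responses and code book are invented for the example.

```python
from collections import Counter

# Hypothetical open-ended survey responses (the data to be coded)
responses = [
    "I felt anxious before the exam but the review session helped.",
    "Group work made the material easier to understand.",
    "The review session was rushed and I ran out of time.",
]

# Illustrative code book: keyword -> theme label (a real code book emerges from the data)
code_book = {
    "anxious": "exam anxiety",
    "review session": "review session",
    "group work": "peer learning",
    "time": "time pressure",
}

# Coding: attach every matching theme label to each response
coded = [
    (text, [theme for keyword, theme in code_book.items() if keyword in text.lower()])
    for text in responses
]

# Interpretation aid: which themes occur most frequently?
theme_counts = Counter(theme for _, themes in coded for theme in themes)
print(theme_counts.most_common())
```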

Quantitative analysis techniques

The process to analyze quantitative data can be time-consuming due to the large volume of data possible to collect. When approaching a quantitative data set, start by focusing on the purpose of your evaluation. Without making a conclusion, determine how you will use the information gained from analysis; for example: the answers to this survey about study habits will help determine what type of exam review session will be most useful to a class. 4

Next, you need to decide who is analyzing the data and set parameters for analysis. For example, if two different researchers are evaluating survey responses that rank preferences on a scale from 1 to 5, they need to be operating with the same understanding of the rankings. You wouldn't want one researcher to classify the value of 3 to be a positive preference while the other considers it a negative preference. It's also ideal to have some type of data management system to store and organize your data, such as a spreadsheet or database. Within the database, or via an export to data analysis software, the collected data needs to be cleaned of things like responses left blank, duplicate answers from respondents, and questions that are no longer considered relevant. Finally, you can use statistical software to analyze data (or complete a manual analysis) to find patterns and summarize your findings. 4
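A minimal pandas sketch of the cleaning and summarizing steps just described; the column names and the 1-to-5 rating scale are assumptions made for the example, not a specific survey tool's export format.

```python
import pandas as pd

# Hypothetical survey export: respondent ID and a 1-5 preference rating
raw = pd.DataFrame({
    "respondent_id": ["r1", "r2", "r2", "r3", "r4"],
    "review_format_rating": [4, 5, 5, None, 2],   # None = question left blank
})

cleaned = (
    raw.drop_duplicates(subset="respondent_id")   # duplicate answers from the same respondent
       .dropna(subset=["review_format_rating"])   # responses left blank
)

# Descriptive statistics to summarize the cleaned data
print(cleaned["review_format_rating"].describe())
```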

Qualitative and quantitative research tools

From the nuanced, thematic exploration enabled by tools like NVivo and ATLAS.ti, to the statistical precision of SPSS and R for quantitative analysis, each suite of data analysis tools offers tailored functionalities that cater to the distinct natures of different data types.

Qualitative research software:

NVivo: NVivo is qualitative data analysis software that can do everything from transcribing recordings to creating word clouds and evaluating uploads for different sentiments and themes. NVivo is just one tool from the company Lumivero, which offers whole suites of data processing software. 5

ATLAS.ti: Similar to NVivo, ATLAS.ti allows researchers to upload and import data from a variety of sources to be tagged and refined using machine learning, then presented with visualizations ready to insert into reports. 6

Quantitative research software:

SPSS: SPSS is a statistical analysis tool for quantitative research, appreciated for its user-friendly interface and comprehensive statistical tests, which make it ideal for educators and researchers. With SPSS, researchers can manage and analyze large quantitative data sets, use advanced statistical procedures and modeling techniques, predict customer behaviors, forecast market trends, and more. 7

R: R is a versatile and dynamic open-source tool for quantitative analysis. With a vast repository of packages tailored to specific statistical methods, researchers can perform anything from basic descriptive statistics to complex predictive modeling. R is especially useful for its ability to handle large datasets, making it ideal for educational institutions that generate substantial amounts of data. The programming language offers flexibility in customizing analysis and creating publication-quality visualizations to effectively communicate results. 8

Applications in Educational Research

Both quantitative and qualitative data can be employed in learning analytics to drive informed decision-making and pedagogical enhancements. In the classroom, quantitative data like standardized test scores and online course analytics create a foundation for assessing and benchmarking student performance and engagement. Qualitative insights gathered from surveys, focus group discussions, and reflective student journals offer a more nuanced understanding of learners' experiences and the contextual factors influencing their education. Additionally, feedback and practical engagement metrics blend these data types, providing a holistic view that informs curriculum development, instructional strategies, and personalized learning pathways. Through these varied data sets and uses, educators can piece together a more complete narrative of student success and the impacts of educational interventions.

Master Data Analysis with an M.S. in Learning Sciences From SMU

Whether it is the detailed narratives unearthed through qualitative data or the informative patterns derived from quantitative analysis, both qualitative and quantitative data can provide crucial information for educators and researchers to better understand and improve learning. Dive deeper into the art and science of learning analytics with SMU's online Master of Science in the Learning Sciences program. At SMU, innovation and inquiry converge to empower the next generation of educators and researchers. Choose the Learning Analytics Specialization to learn how to harness the power of data science to illuminate learning trends, devise impactful strategies, and drive educational innovation. You could also find out how advanced technologies like augmented reality (AR), virtual reality (VR), and artificial intelligence (AI) can revolutionize education, and develop the insight to apply embodied cognition principles to enhance learning experiences in the Learning and Technology Design Specialization, or choose your own electives to build a specialization unique to your interests and career goals.

For more information on our curriculum and to become part of a community where data drives discovery, visit SMU's MSLS program website or schedule a call with our admissions outreach advisors for any queries or further discussion. Take the first step towards transforming education with data today.

  1. Retrieved on August 8, 2024, from nnlm.gov/guides/data-glossary/qualitative-data
  2. Retrieved on August 8, 2024, from nnlm.gov/guides/data-glossary/quantitative-data
  3. Retrieved on August 8, 2024, from cdc.gov/healthyyouth/evaluation/pdf/brief19.pdf
  4. Retrieved on August 8, 2024, from cdc.gov/healthyyouth/evaluation/pdf/brief20.pdf
  5. Retrieved on August 8, 2024, from lumivero.com/solutions/
  6. Retrieved on August 8, 2024, from atlasti.com/
  7. Retrieved on August 8, 2024, from ibm.com/products/spss-statistics
  8. Retrieved on August 8, 2024, from cran.r-project.org/doc/manuals/r-release/R-intro.html#Introduction-and-preliminaries


Investigating Green Economy Studies Using a Bibliometric Analysis

  • Published: 26 August 2024


  • Keerti Manisha (ORCID: orcid.org/0000-0001-5362-5416) 1,2
  • Inderpal Singh (ORCID: orcid.org/0000-0002-4958-8744) 1

In recent years, the number of studies on sustainable development has increased dramatically. The green economy has been positioned as a focus for future research due to the rapid growth of sustainable development studies. A green economy improves living standards and conserves and promotes the sustainable utilization of natural resources. Although the concept of the green economy is continuously growing, there remains a gap in focused research on its evolution and conceptual landscape. In this context, a worldwide bibliometric analysis and visualization of green economy publications have been carried out to identify recent trends and hotspots using VOSviewer 1.6.10 software. A Web of Science (WoS) keyword search on “green economy” yielded 409 documents published between 2002 and 2022. The documents were analyzed in terms of year of publication, countries, institutions, keywords, journals, co-authorship, and co-citations. The analysis revealed tremendous growth in green economy research from 2014 to 2019, with 158 published documents. Peoples R China is the forerunner and the most significant contributor of published documents, while India stands in 23rd place. The most common research areas identified are green economy, transition, and energy. Although tourism is a critical driver of green transformation to facilitate global green growth, it is one of the least explored fields. India has considerable scope for conducting green economy research through collaborations. The study then compares the two main camps, mainstream and heterodox, toward a comprehensive analysis of sustainable development, and it recognizes that limitations and biases are integral to the research approach. Finally, it discusses the policy implications of the bibliometric analysis for international coalitions on the green economy and offers suggestions on scientific reproduction.
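The counting side of such a bibliometric analysis can be approximated outside VOSviewer. The sketch below tallies publications per year and keyword frequencies from a hypothetical Web of Science export; the column names and records are assumptions made purely for illustration, not the study's actual data.

```python
import pandas as pd

# Hypothetical rows from a Web of Science export (column names are assumptions)
records = pd.DataFrame({
    "year": [2014, 2016, 2016, 2019, 2019],
    "author_keywords": [
        "green economy; transition",
        "green economy; energy",
        "sustainable development; green growth",
        "green economy; tourism",
        "green economy; energy; transition",
    ],
})

# Publications per year (growth trend)
per_year = records["year"].value_counts().sort_index()

# Keyword frequencies (research hotspots)
keywords = (
    records["author_keywords"]
    .str.split(";")
    .explode()
    .str.strip()
    .value_counts()
)

print(per_year)
print(keywords.head())
```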



Data Availability

The data sample includes articles from the WoS database covering a wide range of publications on the green economy from various countries and institutions between 2002 and 2022. The data can be made available upon a reasonable request.

Notes

It is also known as the Brundtland Report (2017).

It is a system-wide coordination body of over 40 specialized agencies, programs, and organs of the United Nations.

VOSviewer—Van Eck and Waltman, Leiden University, Leiden, The Netherlands.

Abbreviations

Analytic Hierarchy Process

Association of Southeast Asian Nations

European Economic Area

European Union

Gross domestic product

Green Economy Research Centre

Global Green Economy Index

Global Green Growth Institute

Green Growth Knowledge Platform

Health-in-all policies

International Labour Organisation

Intergovernmental Panel on Climate Change

The Institut national de recherche en sciences et technologies pour l’environnement et l’agriculture

Millennium development goals

National Clean Energy Fund

National Institution for Transforming India

Organisation for Economic Cooperation and Development

Partnership for Action on Green Economy

Reducing Emissions from Deforestation and Degradation

Science Citation Index Expanded

Sustainable development goals

The Social Science Citation Index

United Kingdom

United Nations

United Nations Conference on Trade and Development

United Nations Department of Economic and Social Affairs

United Nations Environment Program

United States

United States of America

Web of Science

World Wide Fund for Nature

NITI Aayog. (2019). SDG India Index and Dashboard, 2019 – 20 . SDG INDIA INDEX 2.0, 1–292.  https://niti.gov.in/sites/default/files/2019-12/SDG-India-Index-2.0_27-Dec.pdf

Adamowicz, M. (2022). Green deal, green growth and green economy as a means of support for attaining the sustainable development goals. Sustainability, 14 , 5901. https://doi.org/10.3390/su14105901


Albort-Morant, G., Henseler, J., Leal-Millán, A., & Cepeda-Carrión, G. (2017). Mapping the field: A bibliometric analysis of green innovation. Sustainability (switzerland), 9 , 1–15. https://doi.org/10.3390/su9061011

Ali, E. B., Anufriev, V. P., & Amfo, B. (2021). Green economy implementation in Ghana as a road map for a sustainable development drive: A review. Scientific African, 12 , e00756. https://doi.org/10.1016/j.sciaf.2021.e00756

AlKhars, M. A., Alwahaishi, S., Fallatah, M. R., & Kayal, A. (2022). A literature review of the environmental Kuznets curve in GCC for 2010–2020. Environmental and Sustainability Indicators, 14 , 100181. https://doi.org/10.1016/j.indic.2022.100181

Astana. (2013). Concept for transition of the Republic of Kazakhstan to green economy until 2050. In  Decree of the President of the Republic of Kazakhstan.  https://policy.asiapacificenergy.org/sites/default/files/ConceptonTransitiontowardsGreenEconomyuntil2050%28EN%29.pdf

Barbier, E. B., & Ame, E. (2016). Building the Green Economy. https://doi.org/10.3138/cpp.2015-017

Barkin, D., & Lemus, B. (2013). Understanding progress: A heterodox approach. Sustainability (switzerland), 5 , 417–431. https://doi.org/10.3390/su5020417

Bhamra, A. (2018). India green economy barometer 2018 . European Union.


Bogovic, N. D., & Grdic, Z. S. (2020). Transitioning to a green economy—Possible effects on the Croatian economy. Sustainability (Switzerland), 12 , 1–19. https://doi.org/10.3390/su12229342

Boston, J. (2022). Living within biophysical limits: Green growth versus degrowth. Policy Quarterly, 18 , 81–92. https://doi.org/10.26686/pq.v18i2.7578

Brundtland, G. H. (2017). Our common future (‘The Brundtland Report’): World commission on environment and development. The Top 50 Sustainability Books , 52–55. https://doi.org/10.4324/9781351279086-15

Buseth, J. T. (2017). The green economy in Tanzania: From global discourses to institutionalization. Geoforum, 86 , 42–52. https://doi.org/10.1016/j.geoforum.2017.08.015

Cecere, G., & Mazzanti, M. (2017). Green jobs and eco-innovations in European SMEs. Resource and Energy Economics, 49 , 86–98. https://doi.org/10.1016/j.reseneeco.2017.03.003

Douai, A., Mearman, A., & Negru, I. (2012). Prospects for a heterodox economics of the environment and sustainability. Cambridge Journal of Economics, 36 , 1019–1032. https://doi.org/10.1093/cje/bes053

Droste, N., Hansjürgens, B., Kuikman, P., et al. (2016). Steering innovations towards a green economy: Understanding government intervention. Journal of Cleaner Production, 135 , 426–434. https://doi.org/10.1016/j.jclepro.2016.06.123

Econie, A., & Dougherty, M. L. (2019). Contingent work in the US recycling industry: Permatemps and precarious green jobs. Geoforum, 99 , 132–141. https://doi.org/10.1016/j.geoforum.2018.11.016

Enongene, K. E., & Fobissie, K. (2016). The potential of REDD+ in supporting the transition to a green economy in the Congo Basin. International Forestry Review, 18 , 29–43. https://doi.org/10.1505/146554816818206104

European Environment Agency. (2011). Green economy. In: Europe’s environment - An Assessment of Assessments (EE-AoA). https://www.eea.europa.eu/publications/europes-environment-aoa

Forfas. (2010). Future skills needs of enterprise within the green economy in Ireland . Expert Group on Future Skills Needs, November, 166.  https://www.skillsireland.ie/media/egfsn101129-green_skills_report.pdf

Georgeson, L., & Maslin, M. (2019). Estimating the scale of the US green economy within the global context. Palgrave Communications, 5 , 121. https://doi.org/10.1057/s41599-019-0329-3

Gibbs, D., & O’Neill, K. (2014). The green economy, sustainability transitions and transition regions: A case study of boston. Geografiska Annaler. Series a, Physical Geography, 96 , 201–216. https://doi.org/10.1111/geob.12046

Glänzel, W., & Schoepflin, U. (1999). A bibliometric study of reference literature in the sciences and social sciences. Information Processing & Management, 35 , 31–44. https://doi.org/10.1016/S0306-4573(98)00028-4

Gottschlich, D., Roth, S., Röhr, U., Hackfort, S., Segebart, D., König, C., & König, A. (2014). Doing sustainable economy at the crossroads of gender, care and the green economy: Debates – Common ground – Blind spots. CaGE Texts 4/2014. Translated by Kate Cahoon. Berlin/Lüneburg: Leuphana University.

Spelman, C., Cable, V., & Huhne, C. (2011). Enabling the transition to a green economy: Government and business working together . HMSO, London.

Graczyk, A. (2020). Sustainable energy – Definitions, scope and areas . In: K. Soliman (Ed.), Education excellence and innovation management: A 2025 vision to sustain economic development during global challenges . International Business Information Management Association: Seville, Spain.

Gyamfi, B. A., Adedoyin, F. F., Bein, M. A., et al. (2021). The anthropogenic consequences of energy consumption in E7 economies: Juxtaposing roles of renewable, coal, nuclear, oil and gas energy: Evidence from panel quantile method. Journal of Cleaner Production, 295 . https://doi.org/10.1016/j.jclepro.2021.126373

Hensher, M. (2023). The economics of the wellbeing economy: Understanding heterodox economics for health-in-all-policies and co-benefits. Health Promotion Journal of Australia, 34 , 651–659. https://doi.org/10.1002/hpja.764

Heshmati, A. (2018). An empirical survey of the ramifications of a green economy. International Journal of Green Economics, 12 , 53–85. https://doi.org/10.1504/IJGE.2018.092359

Jackson, J. (2024). Trading-off or trading-in? A critical political economy perspective of green growth’s policy framing. Globalizations , 1–21. https://doi.org/10.1080/14747731.2024.2348259

Jager, J., & Schmidt, L. (2020). The global political economy of green finance: A regulationist perspective. Journal fur Entwicklungspolitik, 36, 31–50. https://doi.org/10.20446/JEP-2414-3197-36-4-31

Jakobsen, O., & Storsletten, V. M. L. (2019). Beyond the green shift — Ecological economics. In J. S. Methi, A. Sergeev, M. Bieńkowska, & B. Nikiforova (Eds.), Borderology: Cross-disciplinary insights from the border zone . Springer Geography. Springer, Cham. https://doi.org/10.1007/978-3-319-99392-8_13

Kharkongor, N. W., & Singh Kanwar, A. V. (2018). Tragedy of commons from Garret Hardin to Elinor Ostrom: A governance perspective, drawing excerpts from India. International Journal of Green Economics, 12 , 182–191. https://doi.org/10.1504/IJGE.2018.097865

Klitgaard, K. (2013). Heterodox political economy and the degrowth perspective. Sustainability (Switzerland), 5 , 276–297. https://doi.org/10.3390/su5010276

Kostoff, R. N. (2002). Citation analysis of research performer quality. Scientometrics, 53 , 49–71. https://doi.org/10.1023/A:1014831920172

Kuhlman, T., & Farrington, J. (2010). What is sustainability? Sustainability, 2 , 3436–3448. https://doi.org/10.3390/su2113436

Laruffa, F. (2022). The dilemma of “sustainable welfare” and the problem of the future in capacitating social policy. Sustainability: Science Practice, and Policy, 18 , 822–836. https://doi.org/10.1080/15487733.2022.2143206

Law, A., DeLacy, T., & McGrath, G. M. (2017). A green economy indicator framework for tourism destinations. Journal of Sustainable Tourism, 25 , 1434–1455. https://doi.org/10.1080/09669582.2017.1284857

Liu, W., & Liao, H. (2017). A bibliometric analysis of fuzzy decision research during 1970–2015. International Journal of Fuzzy Systems, 19 , 1–14. https://doi.org/10.1007/s40815-016-0272-z

Magalhães, N. (2021). The green investment paradigm: Another headlong rush. Ecological Economics, 190 , 107209. https://doi.org/10.1016/j.ecolecon.2021.107209

Makoni, T., & Chikobvu, D. (2018). Modelling and forecasting Zimbabwe’s tourist arrivals using time series method: A case study of Victoria Falls Rainforest. Southern African Business Review, 22 . https://doi.org/10.25159/1998-8125/3791

Manioudis, M., & Meramveliotakis, G. (2022). Broad strokes towards a grand theory in the analysis of sustainable development: A return to the classical political economy. New Political Economy, 27 , 866–878. https://doi.org/10.1080/13563467.2022.2038114

Manisha, K., Singh, I., & Chettry, V. (2023). Investigating and analyzing the causality amid tourism, environment, economy, energy consumption, and carbon emissions using Toda-Yamamoto approach for Himachal Pradesh, India. Environment, Development and Sustainability, 36 . https://doi.org/10.1007/s10668-023-04252-3

Maphosa, M., & Maphosa, V. (2022). A bibliometric analysis of the effects of electronic waste on the environment. Global Journal of Environmental Science and Management, 8 , 589–606. https://doi.org/10.22034/GJESM.2022.04.10

Maynard, M. (2016). The green economy within an emerging new cosmology perspective: Rethinking sustainability, M.Phil diss. Faculty of Economic and Management Sciences, Stellenbosch University. https://scholar.sun.ac.za/server/api/core/bitstreams/a0fb9911-d83c-44f7-8f4c-8a1a6a092401/content

Meramveliotakis, G., & Manioudis, M. (2021). History, knowledge, and sustainable economic development: The contribution of John Stuart Mill’s grand stage theory. Sustainability (Switzerland), 13 , 1–17. https://doi.org/10.3390/su13031468

Merigó, J. M., Cancino, C. A., Coronado, F., & Urbano, D. (2016). Academic research in innovation: A country analysis. Scientometrics, 108 , 559–593. https://doi.org/10.1007/s11192-016-1984-4

Merigó, J. M., Mas-Tur, A., Roig-Tierno, N., & Ribeiro-Soriano, D. (2015). A bibliometric overview of the Journal of Business Research between 1973 and 2014. Journal of Business Research, 68 , 2645–2653. https://doi.org/10.1016/j.jbusres.2015.04.006

Merigó, J. M., & Yang, J. B. (2017). A bibliometric analysis of operations research and management science. Omega (United Kingdom), 73 , 37–48. https://doi.org/10.1016/j.omega.2016.12.004

Musah, M., Gyamfi, B. A., Kwakwa, P. A., & Agozie, D. Q. (2023). Realizing the 2050 Paris climate agreement in West Africa: The role of financial inclusion and green investments. Journal of Environmental Management, 340 , 117911. https://doi.org/10.1016/j.jenvman.2023.117911

Orago, N. (2021). Transboundaries - African heterodox ideologies for the realisation of sustainable development in the continent. Austrian Development Cooperation, July. https://www.vidc.org/fileadmin/martina/studien/transboundaries_nicholas_orago_july2021_final.pdf

Osareh, F. (1996). Bibliometrics, citation analysis and co-citation analysis: A review of literature I. Libri, 46 , 149–158. https://doi.org/10.1515/libr.1996.46.3.149

Pan, S. Y., Gao, M., Kim, H., et al. (2018). Advances and challenges in sustainable tourism toward a green economy. Science of the Total Environment, 635 , 452–469. https://doi.org/10.1016/j.scitotenv.2018.04.134

Pokhriyal, P., Rehman, S., Areendran, G., et al. (2020). Assessing forest cover vulnerability in Uttarakhand, India using analytical hierarchy process. Modeling Earth Systems and Environment, 6 , 821–831. https://doi.org/10.1007/s40808-019-00710-y

Potts, T., Niewiadomski, P., & Prager, K. (2019). The green economy research centre-positioning geographical research in Aberdeen to address the challenges of green economy transitions. Scottish Geographical Journal, 135 , 356–370. https://doi.org/10.1080/14702541.2019.1695907

Rosenberg, E., Lotz-Sisitka, H. B., & Ramsarup, P. (2018). The green economy learning assessment South Africa: Lessons for higher education, skills and work-based learning. Higher Education, Skills and Work-Based Learning, 8 , 243–258. https://doi.org/10.1108/HESWBL-03-2018-0041

Sarpong, K. A., Xu, W., Gyamfi, B. A., & Ofori, E. K. (2023). A step towards carbon neutrality in E7: The role of environmental taxes, structural change, and green energy. Journal of Environmental Management, 337 , 117556. https://doi.org/10.1016/j.jenvman.2023.117556

Schalatek, L. (2015). Merging care and green economy approaches to finance gender-equitable sustainable development in the post-2015 framework. In: ÖFSE (ed.) Austrian development policy, analyzes, reports, information “The Post-2015 Agenda. Reform or Transformation”, Vienna, pp. 31–36. https://www.econstor.eu/bitstream/10419/268193/1/OEPOL2014.pdf

Schock, R. N. (1997). What is sustainability and what influences it? This paper was prepared for submittal to International Electric Research Exchange, General Meeting San Francisco.

Söderholm, P. (2020). The green economy transition: The challenges of technological change for sustainability. Sustainable Earth, 3 , 6. https://doi.org/10.1186/s42055-020-00029-y

Stephan, P., Veugelers, R., & Wang, J. (2017). Reviewers are blinkered by bibliometrics. Nature, 544 , 411–412. https://doi.org/10.1038/544411a

Stroud, D., Fairbrother, P., Evans, C., & Blake, J. (2018). Governments matter for capitalist economies: Regeneration and transition to green and decent jobs. Economic and Industrial Democracy, 39 , 87–108. https://doi.org/10.1177/0143831X15601731

Subramanyam, K. (1983). Bibliometric studies of research collaboration: A review. Journal of Information Science, 6 , 33–38. https://doi.org/10.1177/016555158300600105

European Commission. (2019). Sustainable growth and development in the EU: Concepts and challenges (pp. 1–28). https://ec.europa.eu/social/BlobServlet?docId=21414&langId=en

Swainson, L., & Mahanty, S. (2018). Green economy meets political economy: Lessons from the “Aceh Green” initiative, Indonesia. Global Environmental Change, 53 , 286–295. https://doi.org/10.1016/j.gloenvcha.2018.10.009

Szydlo, W. (2020). Heterodox perspective on sustainable development in economic order - The case of Poland. Education excellence and innovation management: A 2025 vision to sustain economic development during global challenges 15159–15170 WE-Conference Proceedings Citation.

Szydło, W. (2023). Sustainable development, agenda 2030 and food security in historical perspective. Economics and Environment, 85 (154), 174. https://doi.org/10.34659/eis.2023.85.2.560

UNEP. (2015). Indicators for green economy policymaking – A synthesis report of studies in Ghana, Mauritius, and Uruguay . 36. https://www.greenpolicyplatform.org/research/indicators-green-economy-policy-makingsynthesis-report-studies-ghana-mauritius-and-uruguay

Union, E. (2017). Green economy in the Western Balkans - Analysis, best practices, and recommendations. Networking and Advocacy for Green Economy . https://doi.org/10.1108/9781787144996

Vukovic, N., Pobedinsky, V., Mityagin, S., et al. (2019). A study on green economy indicators and modeling: Russian context. Sustainability (Switzerland), 11 , 4629. https://doi.org/10.3390/su11174629

Vuola, M., Korkeakoski, M., Vähäkari, N., et al. (2020). What is a green economy? Review of national-level green economy policies in Cambodia and Lao PDR. Sustainability (switzerland), 12 , 1–20. https://doi.org/10.3390/su12166664

Xie, L., Chen, Z., Wang, H., et al. (2020). Bibliometric and visualized analysis of scientific publications on atlantoaxial spine surgery based on Web of Science and VOSviewer. World Neurosurgery, 137 , 435-442.e4. https://doi.org/10.1016/j.wneu.2020.01.171

Ying, L., Li, M., & Yang, J. (2021). Agglomeration and driving factors of regional innovation space based on intelligent manufacturing and green economy. Environmental Technology and Innovation, 22 , 101398. https://doi.org/10.1016/j.eti.2021.101398

York, T. A., Brent, A. C., Musango, J. K., & de Kock, I. H. (2017). Infrastructure implications of a green economy transition in the Western Cape Province of South Africa: A system dynamics modelling approach. Development Southern Africa, 34 , 529–547. https://doi.org/10.1080/0376835X.2017.1358601

Yu, Y., Li, Y., Zhang, Z., et al. (2020). A bibliometric analysis using VOSviewer of publications on COVID-19. Annals of Translational Medicine, 8 , 816–816. https://doi.org/10.21037/atm-20-4235

Zhang, H., Liang, Q., Li, Y., & Gao, P. (2023). Promoting eco-tourism for the green economic recovery in ASEAN, pp. 2021–2036. https://doi.org/10.1007/s10644-023-09492-x

Zhao, X., Zuo, J., Wu, G., & Huang, C. (2019). A bibliometric review of green building research 2000–2016. Architectural Science Review, 62 , 74–88.


Acknowledgements

The authors thank the editor of the Journal of the Knowledge Economy and anonymous reviewers for their valuable suggestions and comments and for carefully reviewing the manuscript for finalization. We are immensely grateful to the National Institute of Technology Tiruchirappalli, Tamil Nadu, for providing the infrastructure to conduct this research successfully.

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Author information

Authors and affiliations

Department of Architecture, National Institute of Technology Hamirpur, Hamirpur, Himachal Pradesh, India

Keerti Manisha & Inderpal Singh

Department of Architecture, National Institute of Technology Tiruchirappalli, Tiruchirappalli, Tamil Nadu, India

Keerti Manisha


Contributions

Keerti Manisha defined the research scope, described the theoretical framework, gathered data, performed analysis, extracted key findings, prepared tables and figures, wrote the main manuscript, made conclusive remarks, suggested policy implications, revised drafts, selected relevant references, and communicated the final manuscript. Dr. Inderpal Singh guided and reviewed the anonymous comments and suggested appropriate responses for compliance.

Corresponding author

Correspondence to Keerti Manisha .

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Manisha, K., Singh, I. Investigating Green Economy Studies Using a Bibliometric Analysis. J Knowl Econ (2024). https://doi.org/10.1007/s13132-024-02237-9

Download citation

Received : 31 October 2022

Accepted : 14 July 2024

Published : 26 August 2024

DOI : https://doi.org/10.1007/s13132-024-02237-9


Keywords

  • Bibliometric analysis
  • Green economy
  • Web of Science (WoS)

