Factor Analysis – Steps, Methods and Examples

Factor Analysis

Definition:

Factor analysis is a statistical technique that is used to identify the underlying structure of a relatively large set of variables and to explain these variables in terms of a smaller number of common underlying factors. It helps to investigate the latent relationships between observed variables.

Factor Analysis Steps

Here are the general steps involved in conducting a factor analysis:

1. Define the Research Objective:

Clearly specify the purpose of the factor analysis. Determine what you aim to achieve or understand through the analysis.

2. Data Collection:

Gather the data on the variables of interest. These variables should be measurable and related to the research objective. Ensure that you have a sufficient sample size for reliable results.

3. Assess Data Suitability:

Examine the suitability of the data for factor analysis. Check for the following aspects:

  • Sample size: Ensure that you have an adequate sample size to perform factor analysis reliably.
  • Missing values: Handle missing data appropriately, either by imputation or exclusion.
  • Variable characteristics: Verify that the variables are continuous or at least ordinal in nature. Categorical variables may require different analysis techniques.
  • Linearity: Assess whether the relationships among variables are linear.

4. Determine the Factor Analysis Technique:

There are different types of factor analysis techniques available, such as exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). Choose the appropriate technique based on your research objective and the nature of the data.

5. Perform Factor Analysis:

   a. Exploratory Factor Analysis (EFA):

  • Extract factors: Use factor extraction methods (e.g., principal component analysis or common factor analysis) to identify the initial set of factors.
  • Determine the number of factors: Decide on the number of factors to retain based on statistical criteria (e.g., eigenvalues, scree plot) and theoretical considerations.
  • Rotate factors: Apply factor rotation techniques (e.g., varimax, oblique) to simplify the factor structure and make it more interpretable.
  • Interpret factors: Analyze the factor loadings (correlations between variables and factors) to interpret the meaning of each factor.
  • Determine factor reliability: Assess the internal consistency or reliability of the factors using measures like Cronbach’s alpha.
  • Report results: Document the factor loadings, rotated component matrix, communalities, and any other relevant information.
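To make these EFA steps concrete, here is a minimal sketch in Python. It assumes the third-party factor_analyzer package and uses randomly generated stand-in data; the item names, the Kaiser-based factor count, and the four items picked for the reliability check are illustrative choices, not part of any particular study.

```python
# Minimal EFA sketch (assumes: pip install factor_analyzer pandas numpy).
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer

# Stand-in data: in practice `df` holds the real survey item responses.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(300, 10)),
                  columns=[f"item{i}" for i in range(1, 11)])

# 1. Eigenvalues of the correlation matrix help decide how many factors to keep
#    (Kaiser criterion: eigenvalues greater than 1; or inspect a scree plot).
eigenvalues = np.sort(np.linalg.eigvalsh(df.corr().values))[::-1]
n_factors = max(1, int((eigenvalues > 1.0).sum()))

# 2. Extract and rotate the factors (varimax keeps the factors uncorrelated).
fa = FactorAnalyzer(n_factors=n_factors, rotation="varimax")
fa.fit(df)

# 3. Interpret the loadings (item-factor correlations) and communalities.
loadings = pd.DataFrame(fa.loadings_, index=df.columns)
communalities = pd.Series(fa.get_communalities(), index=df.columns)

# 4. Reliability of one factor: Cronbach's alpha for the items loading on it.
items = loadings[0].abs().sort_values(ascending=False).head(4).index
X = df[items].values
k = X.shape[1]
alpha = k / (k - 1) * (1 - X.var(axis=0, ddof=1).sum() / X.sum(axis=1).var(ddof=1))
print(loadings.round(2), communalities.round(2), f"alpha = {alpha:.2f}", sep="\n")
```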

   b. Confirmatory Factor Analysis (CFA):

  • Formulate a theoretical model: Specify the hypothesized relationships among variables and factors based on prior knowledge or theoretical considerations.
  • Define measurement model: Establish how each variable is related to the underlying factors by assigning factor loadings in the model.
  • Test the model: Use statistical techniques like maximum likelihood estimation or structural equation modeling to assess the goodness-of-fit between the observed data and the hypothesized model.
  • Modify the model: If the initial model does not fit the data adequately, revise the model by adding or removing paths, allowing for correlated errors, or other modifications to improve model fit.
  • Report results: Present the final measurement model, parameter estimates, fit indices (e.g., chi-square, RMSEA, CFI), and any modifications made.
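For the CFA side, the sketch below assumes the third-party semopy package and a lavaan-style model string; the factor names (Satisfaction, Loyalty) and item names (q1–q6) are hypothetical, and a real analysis would substitute the hypothesized measurement model and actual data.

```python
# Minimal CFA sketch (assumes: pip install semopy pandas numpy).
import numpy as np
import pandas as pd
import semopy

# Stand-in data: in practice `df` holds the real item responses.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(300, 6)),
                  columns=["q1", "q2", "q3", "q4", "q5", "q6"])

# Measurement model: each latent factor is indicated by three observed items,
# and the two factors are allowed to covary.
model_spec = """
Satisfaction =~ q1 + q2 + q3
Loyalty =~ q4 + q5 + q6
Satisfaction ~~ Loyalty
"""

model = semopy.Model(model_spec)
model.fit(df)                     # maximum-likelihood-style estimation by default
print(model.inspect())            # parameter estimates (loadings, variances, covariances)
print(semopy.calc_stats(model))   # fit indices such as chi-square, RMSEA, CFI
```

Model modifications (the "Modify the model" step above) would be made by editing the model string, for example adding paths or correlated errors, and refitting.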

6. Interpret and Validate the Factors:

Once you have identified the factors, interpret them based on the factor loadings, theoretical understanding, and research objectives. Validate the factors by examining their relationships with external criteria or by conducting further analyses if necessary.

Types of Factor Analysis

Types of Factor Analysis are as follows:

Exploratory Factor Analysis (EFA)

EFA is used to explore the underlying structure of a set of observed variables without any preconceived assumptions about the number or nature of the factors. It aims to discover the number of factors and how the observed variables are related to those factors. EFA does not impose any restrictions on the factor structure and allows for cross-loadings of variables on multiple factors.

Confirmatory Factor Analysis (CFA)

CFA is used to test a pre-specified factor structure based on theoretical or conceptual assumptions. It aims to confirm whether the observed variables measure the latent factors as intended. CFA tests the fit of a hypothesized model and assesses how well the observed variables are associated with the expected factors. It is often used for validating measurement instruments or evaluating theoretical models.

Principal Component Analysis (PCA)

PCA is a dimensionality reduction technique that can be considered a form of factor analysis, although it has some differences. PCA aims to explain the maximum amount of variance in the observed variables using a smaller number of uncorrelated components. Unlike traditional factor analysis, PCA does not assume that the observed variables are caused by underlying factors but focuses solely on accounting for variance.

Common Factor Analysis

It assumes that the observed variables are influenced by common factors and unique factors (specific to each variable). It attempts to estimate the common factor structure by extracting the shared variance among the variables while also considering the unique variance of each variable.

Hierarchical Factor Analysis

Hierarchical factor analysis involves multiple levels of factors. It explores both higher-order and lower-order factors, aiming to capture the complex relationships among variables. Higher-order factors are based on the relationships among lower-order factors, which are in turn based on the relationships among observed variables.

Factor Analysis Formulas

Factor Analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors.

Here are some of the essential formulas and calculations used in factor analysis:

Correlation Matrix :

The first step in factor analysis is to create a correlation matrix, which calculates the correlation coefficients between pairs of variables.

Correlation coefficient (Pearson’s r) between variables X and Y is calculated as:

r(X,Y) = Σ[(xi – x̄)(yi – ȳ)] / [(n – 1) σx σy]

where: xi, yi are the data points, x̄, ȳ are the means of X and Y respectively, σx, σy are the standard deviations of X and Y respectively, n is the number of data points.

Extraction of Factors :

The extraction of factors from the correlation matrix is typically done by methods such as Principal Component Analysis (PCA) or other similar methods.

The formula used in PCA to calculate the principal components (factors) involves finding the eigenvalues and eigenvectors of the correlation matrix.

Let’s denote the correlation matrix as R. If λ is an eigenvalue of R, and v is the corresponding eigenvector, they satisfy the equation: Rv = λv

Factor Loadings :

Factor loadings are the correlations between the original variables and the factors. They can be calculated as the eigenvector elements scaled (multiplied) by the square roots of their corresponding eigenvalues.

Communality and Specific Variance :

Communality of a variable is the proportion of variance in that variable explained by the factors. It can be calculated as the sum of squared factor loadings for that variable across all factors.

The specific variance of a variable is the proportion of variance in that variable not explained by the factors, and it’s calculated as 1 – Communality.
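The calculations above can be traced end to end with NumPy. The sketch below uses random stand-in data; with real data you would substitute your own matrix X and choose the number of factors m on substantive grounds.

```python
# Correlation matrix -> eigendecomposition -> loadings -> communalities.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))            # 200 observations, 6 variables (stand-in)

R = np.corrcoef(X, rowvar=False)         # correlation matrix (Pearson's r)

eigvals, eigvecs = np.linalg.eigh(R)     # solves R v = lambda v
order = np.argsort(eigvals)[::-1]        # sort factors by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

m = 2                                    # number of factors retained
loadings = eigvecs[:, :m] * np.sqrt(eigvals[:m])    # loadings = v * sqrt(lambda)

communality = (loadings ** 2).sum(axis=1)           # variance explained per variable
specific_variance = 1 - communality                 # variance not explained
print(np.round(loadings, 2), np.round(communality, 2), sep="\n")
```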

Factor Rotation : Factor rotation, such as Varimax or Promax, is used to make the output more interpretable. It doesn’t change the underlying relationships but affects the loadings of the variables on the factors.

For example, in the Varimax rotation, the objective is to maximize the variance of the squared loadings of a factor (column) across the variables (rows) in the factor matrix. This pushes loadings toward either high or low values, making each factor easier to interpret.
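For completeness, here is a compact implementation of the standard SVD-based varimax algorithm, applied to a loadings matrix such as the one computed in the previous sketch; the gamma, iteration, and tolerance settings are conventional defaults, not requirements.

```python
# Compact varimax rotation (standard SVD-based algorithm); `loadings` is a
# p-by-m matrix of unrotated loadings, e.g. from the previous sketch.
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    p, m = loadings.shape
    rotation = np.eye(m)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion (maximize variance of squared loadings).
        grad = loadings.T @ (rotated ** 3 -
                             (gamma / p) * rotated @ np.diag((rotated ** 2).sum(axis=0)))
        u, s, vt = np.linalg.svd(grad)
        rotation = u @ vt
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation

# rotated_loadings = varimax(loadings)   # same communalities, simpler structure
```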

Examples of Factor Analysis

Here are some real-time examples of factor analysis:

  • Psychological Research: In a study examining personality traits, researchers may use factor analysis to identify the underlying dimensions of personality by analyzing responses to various questionnaires or surveys. Factors such as extroversion, neuroticism, and conscientiousness can be derived from the analysis.
  • Market Research: In marketing, factor analysis can be used to understand consumers’ preferences and behaviors. For instance, by analyzing survey data related to product features, pricing, and brand perception, researchers can identify factors such as price sensitivity, brand loyalty, and product quality that influence consumer decision-making.
  • Finance and Economics: Factor analysis is widely used in portfolio management and asset pricing models. By analyzing historical market data, factors such as market returns, interest rates, inflation rates, and other economic indicators can be identified. These factors help in understanding and predicting investment returns and risk.
  • Social Sciences: Factor analysis is employed in social sciences to explore underlying constructs in complex datasets. For example, in education research, factor analysis can be used to identify dimensions such as academic achievement, socio-economic status, and parental involvement that contribute to student success.
  • Health Sciences: In medical research, factor analysis can be utilized to identify underlying factors related to health conditions, symptom clusters, or treatment outcomes. For instance, in a study on mental health, factor analysis can be used to identify underlying factors contributing to depression, anxiety, and stress.
  • Customer Satisfaction Surveys: Factor analysis can help businesses understand the key drivers of customer satisfaction. By analyzing survey responses related to various aspects of product or service experience, factors such as product quality, customer service, and pricing can be identified, enabling businesses to focus on areas that impact customer satisfaction the most.

Factor analysis in Research Example

Here’s an example of how factor analysis might be used in research:

Let’s say a psychologist is interested in the factors that contribute to overall wellbeing. They conduct a survey with 1000 participants, asking them to respond to 50 different questions relating to various aspects of their lives, including social relationships, physical health, mental health, job satisfaction, financial security, personal growth, and leisure activities.

Given the broad scope of these questions, the psychologist decides to use factor analysis to identify underlying factors that could explain the correlations among responses.

After conducting the factor analysis, the psychologist finds that the responses can be grouped into five factors:

  • Physical Wellbeing : Includes variables related to physical health, exercise, and diet.
  • Mental Wellbeing : Includes variables related to mental health, stress levels, and emotional balance.
  • Social Wellbeing : Includes variables related to social relationships, community involvement, and support from friends and family.
  • Professional Wellbeing : Includes variables related to job satisfaction, work-life balance, and career development.
  • Financial Wellbeing : Includes variables related to financial security, savings, and income.

By reducing the 50 individual questions to five underlying factors, the psychologist can more effectively analyze the data and draw conclusions about the major aspects of life that contribute to overall wellbeing.

In this way, factor analysis helps researchers understand complex relationships among many variables by grouping them into a smaller number of factors, simplifying the data analysis process, and facilitating the identification of patterns or structures within the data.

When to Use Factor Analysis

Here are some circumstances in which you might want to use factor analysis:

  • Data Reduction : If you have a large set of variables, you can use factor analysis to reduce them to a smaller set of factors. This helps in simplifying the data and making it easier to analyze.
  • Identification of Underlying Structures : Factor analysis can be used to identify underlying structures in a dataset that are not immediately apparent. This can help you understand complex relationships between variables.
  • Validation of Constructs : Factor analysis can be used to confirm whether a scale or measure truly reflects the construct it’s meant to measure. If all the items in a scale load highly on a single factor, that supports the construct validity of the scale.
  • Generating Hypotheses : By revealing the underlying structure of your variables, factor analysis can help to generate hypotheses for future research.
  • Survey Analysis : If you have a survey with many questions, factor analysis can help determine if there are underlying factors that explain response patterns.

Applications of Factor Analysis

Factor Analysis has a wide range of applications across various fields. Here are some of them:

  • Psychology : It’s often used in psychology to identify the underlying factors that explain different patterns of correlations among mental abilities. For instance, factor analysis has been used to identify personality traits (like the Big Five personality traits), intelligence structures (like Spearman’s g), or to validate the constructs of different psychological tests.
  • Market Research : In this field, factor analysis is used to identify the factors that influence purchasing behavior. By understanding these factors, businesses can tailor their products and marketing strategies to meet the needs of different customer groups.
  • Healthcare : In healthcare, factor analysis is used in a similar way to psychology, identifying underlying factors that might influence health outcomes. For instance, it could be used to identify lifestyle or behavioral factors that influence the risk of developing certain diseases.
  • Sociology : Sociologists use factor analysis to understand the structure of attitudes, beliefs, and behaviors in populations. For example, factor analysis might be used to understand the factors that contribute to social inequality.
  • Finance and Economics : In finance, factor analysis is used to identify the factors that drive financial markets or economic behavior. For instance, factor analysis can help understand the factors that influence stock prices or economic growth.
  • Education : In education, factor analysis is used to identify the factors that influence academic performance or attitudes towards learning. This could help in developing more effective teaching strategies.
  • Survey Analysis : Factor analysis is often used in survey research to reduce the number of items or to identify the underlying structure of the data.
  • Environment : In environmental studies, factor analysis can be used to identify the major sources of environmental pollution by analyzing the data on pollutants.

Advantages of Factor Analysis

Advantages of Factor Analysis are as follows:

  • Data Reduction : Factor analysis can simplify a large dataset by reducing the number of variables. This helps make the data easier to manage and analyze.
  • Structure Identification : It can identify underlying structures or patterns in a dataset that are not immediately apparent. This can provide insights into complex relationships between variables.
  • Construct Validation : Factor analysis can be used to validate whether a scale or measure accurately reflects the construct it’s intended to measure. This is important for ensuring the reliability and validity of measurement tools.
  • Hypothesis Generation : By revealing the underlying structure of your variables, factor analysis can help generate hypotheses for future research.
  • Versatility : Factor analysis can be used in various fields, including psychology, market research, healthcare, sociology, finance, education, and environmental studies.

Disadvantages of Factor Analysis

Disadvantages of Factor Analysis are as follows:

  • Subjectivity : The interpretation of the factors can sometimes be subjective, depending on how the data is perceived. Different researchers might interpret the factors differently, which can lead to different conclusions.
  • Assumptions : Factor analysis assumes that there’s some underlying structure in the dataset and that all variables are related. If these assumptions do not hold, factor analysis might not be the best tool for your analysis.
  • Large Sample Size Required : Factor analysis generally requires a large sample size to produce reliable results. This can be a limitation in studies where data collection is challenging or expensive.
  • Correlation, not Causation : Factor analysis identifies correlational relationships, not causal ones. It cannot prove that changes in one variable cause changes in another.
  • Complexity : The statistical concepts behind factor analysis can be difficult to understand and require expertise to implement correctly. Misuse or misunderstanding of the method can lead to incorrect conclusions.


Factor analysis and how it simplifies research findings.

There are many forms of data analysis used to report on and study survey data. Factor analysis is best used to simplify complex data sets with many variables.

What is factor analysis?

Factor analysis is the practice of condensing many variables into just a few, so that your research data is easier to work with.

For example, a retail business trying to understand customer buying behaviours might consider variables such as ‘did the product meet your expectations?’, ‘how would you rate the value for money?’ and ‘did you find the product easily?’. Factor analysis can help condense these variables into a single factor, such as ‘customer purchase satisfaction’.

[Figure: 'customer purchase satisfaction' factor tree]

The theory is that there are deeper factors driving the underlying concepts in your data, and that you can uncover and work with them instead of dealing with the lower-level variables that cascade from them. Know that these deeper concepts aren’t necessarily immediately obvious – they might represent traits or tendencies that are hard to measure, such as extraversion or IQ.

Factor analysis is also sometimes called “dimension reduction”: you can reduce the “dimensions” of your data into one or more “super-variables,” also known as unobserved variables or latent variables. This process involves creating a factor model and often yields a factor matrix that organizes the relationship between observed variables and the factors they’re associated with.

As with any kind of process that simplifies complexity, there is a trade-off between the accuracy of the data and how easy it is to work with. With factor analysis, the best solution is the one that yields a simplification that represents the true nature of your data, with minimum loss of precision. This often means finding a balance between achieving the variance explained by the model and using fewer factors to keep the model simple.

Factor analysis isn’t a single technique, but a family of statistical methods that can be used to identify the latent factors driving observable variables. Factor analysis is commonly used in market research , as well as other disciplines like technology, medicine, sociology, field biology, education, psychology and many more.

What is a factor?

In the context of factor analysis, a factor is a hidden or underlying variable that we infer from a set of directly measurable variables.

Take ‘customer purchase satisfaction’ as an example again. This isn’t a variable you can directly ask a customer to rate, but it can be determined from the responses to correlated questions like ‘did the product meet your expectations?’, ‘how would you rate the value for money?’ and ‘did you find the product easily?’.

While not directly observable, factors are essential for providing a clearer, more streamlined understanding of data. They enable us to capture the essence of our data’s complexity, making it simpler and more manageable to work with, and without losing lots of information.


Key concepts in factor analysis

These concepts are the foundational pillars that guide the application and interpretation of factor analysis.

Variance

Central to factor analysis, variance measures how much numerical values differ from the average. In factor analysis, you’re essentially trying to understand how underlying factors influence this variance among your variables. Some factors will explain more variance than others, meaning they more accurately represent the variables they consist of.

Eigenvalue

The eigenvalue expresses the amount of variance a factor explains. If a factor solution (unobserved or latent variables) has an eigenvalue of 1 or above, it indicates that a factor explains more variance than a single observed variable, which can be useful in reducing the number of variables in your analysis. Factors with eigenvalues less than 1 account for less variability than a single variable and are generally not included in the analysis.

Factor score

A factor score is a numeric representation that tells us how strongly each variable from the original data is related to a specific factor. Also called the component score, it can help determine which variables are most influenced by each factor and are most important for each underlying concept.

Factor loading

Factor loading is the correlation coefficient for the variable and factor. Like the factor score, factor loadings give an indication of how much of the variance in an observed variable can be explained by the factor. High factor loadings (close to 1 or -1) mean the factor strongly influences the variable.
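A short scikit-learn sketch ties these concepts together: factor scores come from transforming the data, loadings from the fitted components, and eigenvalues of the correlation matrix feed the "greater than 1" rule. It assumes scikit-learn is installed (the rotation argument requires a reasonably recent version) and uses stand-in random data.

```python
# Loadings, factor scores, and eigenvalues with scikit-learn and NumPy.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
survey_data = rng.normal(size=(300, 8))          # stand-in for real survey responses

fa = FactorAnalysis(n_components=2, rotation="varimax")
scores = fa.fit_transform(survey_data)           # factor scores: one row per respondent
loadings = fa.components_.T                      # one row per variable, one column per factor

# Eigenvalues of the correlation matrix, used with Kaiser's criterion (> 1).
eigenvalues = np.sort(np.linalg.eigvalsh(np.corrcoef(survey_data, rowvar=False)))[::-1]
print(np.round(loadings, 2))
print("eigenvalues:", np.round(eigenvalues, 2))
```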

When to use factor analysis

Factor analysis is a powerful tool when you want to simplify complex data, find hidden patterns, and set the stage for deeper, more focused analysis.

It’s typically used when you’re dealing with a large number of interconnected variables, and you want to understand the underlying structure or patterns within this data. It’s particularly useful when you suspect that these observed variables could be influenced by some hidden factors.

For example, consider a business that has collected extensive customer feedback through surveys. The survey covers a wide range of questions about product quality, pricing, customer service and more. This huge volume of data can be overwhelming, and this is where factor analysis comes in. It can help condense these numerous variables into a few meaningful factors, such as ‘product satisfaction’, ‘customer service experience’ and ‘value for money’.

Factor analysis doesn’t operate in isolation – it’s often used as a stepping stone for further analysis. For example, once you’ve identified key factors through factor analysis, you might then proceed to a cluster analysis – a method that groups your customers based on their responses to these factors. The result is a clearer understanding of different customer segments, which can then guide targeted marketing and product development strategies.

By combining factor analysis with other methodologies, you can not only make sense of your data but also gain valuable insights to drive your business decisions.

Factor analysis assumptions

Factor analysis relies on several assumptions for accurate results. Violating these assumptions may lead to factors that are hard to interpret or misleading.

Linear relationships between variables

This ensures that changes in the values of your variables are consistent.

Sufficient variables for each factor

Because if only a few variables represent a factor, it might not be identified accurately.

Adequate sample size

The larger the ratio of cases (respondents, for instance) to variables, the more reliable the analysis.

No perfect multicollinearity and singularity

No variable is a perfect linear combination of other variables, and no variable is a duplicate of another.

Relevance of the variables

There should be some correlation between variables to make a factor analysis feasible.

[Figure: assumptions for factor analysis]

Types of factor analysis

There are two main factor analysis methods: exploratory and confirmatory. Here’s how they are used to add value to your research process.

Confirmatory factor analysis

In this type of analysis, the researcher starts out with a hypothesis about their data that they are looking to prove or disprove. Factor analysis will confirm – or not – where the latent variables are and how much variance they account for.

Principal component analysis (PCA) is a popular extraction method often used at this stage, although strictly speaking it is a dimension-reduction technique rather than a form of confirmatory factor analysis. Using this method, the researcher runs the analysis to obtain multiple possible solutions that split the data among a number of factors. Items that load onto a single particular factor are more strongly related to one another and can be grouped together by the researcher using their conceptual knowledge or pre-existing research.

Using PCA will generate a range of solutions with different numbers of factors, from simplified one-factor solutions to higher levels of complexity. However, the fewer factors employed, the less variance is accounted for in the solution.

Exploratory factor analysis

As the name suggests, exploratory factor analysis is undertaken without a hypothesis in mind. It’s an investigatory process that helps researchers understand whether associations exist between the initial variables, and if so, where they lie and how they are grouped.

How to perform factor analysis: A step-by-step guide

Performing a factor analysis involves a series of steps, often facilitated by statistical software packages like SPSS, Stata and the R programming language. Here’s a simplified overview of the process.

[Figure: how to perform factor analysis, step by step]

Prepare your data

Start with a dataset where each row represents a case (for example, a survey respondent), and each column is a variable you’re interested in. Ensure your data meets the assumptions necessary for factor analysis.

Create an initial hypothesis

If you have a theory about the underlying factors and their relationships with your variables, make a note of this. This hypothesis can guide your analysis, but keep in mind that the beauty of factor analysis is its ability to uncover unexpected relationships.

Choose the type of factor analysis

The most common type is exploratory factor analysis, which is used when you’re not sure what to expect. If you have a specific hypothesis about the factors, you might use confirmatory factor analysis.

Form your correlation matrix

After you’ve chosen the type of factor analysis, you’ll need to create the correlation matrix of your variables. This matrix, which shows the correlation coefficients between each pair of variables, forms the basis for the extraction of factors. This is a key step in building your factor analysis model.

Decide on the extraction method

Principal component analysis is the most commonly used extraction method. If you believe your factors are correlated, you might opt for principal axis factoring, a type of factor analysis that identifies factors based on shared variance.

Determine the number of factors

Various criteria can be used here, such as Kaiser’s criterion (eigenvalues greater than 1), the scree plot method or parallel analysis. The choice depends on your data and your goals.
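As one illustration, parallel analysis can be sketched in a few lines of NumPy: keep only the factors whose observed eigenvalues exceed those obtained from random data of the same size. The simulation count and threshold rule below are common choices, not fixed rules.

```python
# Parallel analysis: compare observed eigenvalues against random-data eigenvalues.
import numpy as np

def parallel_analysis(data, n_simulations=100, seed=0):
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    random_eigs = np.zeros((n_simulations, p))
    for i in range(n_simulations):
        noise = rng.normal(size=(n, p))
        random_eigs[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False)))[::-1]
    threshold = random_eigs.mean(axis=0)          # average eigenvalues of random data
    n_factors = int((observed > threshold).sum()) # retain factors above the random baseline
    return n_factors, observed, threshold

# n_factors, observed, threshold = parallel_analysis(survey_data)
```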

Interpret and validate your results

Each factor will be associated with a set of your original variables, so label each factor based on how you interpret these associations. These labels should represent the underlying concept that ties the associated variables together.

Validation can be done through a variety of methods, like splitting your data in half and checking if both halves produce the same factors.
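A rough split-half check might look like the sketch below: fit the same model to two random halves of the data and compare corresponding loading vectors with Tucker's congruence coefficient. It assumes scikit-learn and uses stand-in data; in practice you would also match factor order and sign before comparing.

```python
# Split-half validation of a factor solution using a congruence coefficient.
import numpy as np
from sklearn.decomposition import FactorAnalysis

def congruence(a, b):
    # Tucker's congruence coefficient between two loading vectors.
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

rng = np.random.default_rng(2)
data = rng.normal(size=(400, 8))                 # stand-in for real responses

idx = rng.permutation(len(data))
half1, half2 = data[idx[:200]], data[idx[200:]]

load1 = FactorAnalysis(n_components=2).fit(half1).components_.T
load2 = FactorAnalysis(n_components=2).fit(half2).components_.T

# Values near |1| indicate that the two halves recover similar factors.
for j in range(load1.shape[1]):
    print(f"factor {j}: congruence = {congruence(load1[:, j], load2[:, j]):.2f}")
```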

How factor analysis can help you

As well as giving you fewer variables to navigate, factor analysis can help you understand grouping and clustering in your input variables, since they’ll be grouped according to the latent variables.

Say you ask several questions all designed to explore different, but closely related, aspects of customer satisfaction:

  • How satisfied are you with our product?
  • Would you recommend our product to a friend or family member?
  • How likely are you to purchase our product in the future?

But you only want one variable to represent a customer satisfaction score. One option would be to average the three question responses. Another option would be to create a factor dependent variable. This can be done by running a principal component analysis (PCA) and keeping the first principal component (also known as a factor). The advantage of a PCA over an average is that it automatically weights each of the variables in the calculation.
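A minimal sketch of that approach, assuming scikit-learn and three stand-in rating items: standardize the items, run PCA, and keep the first component as the weighted satisfaction score.

```python
# First principal component as a single weighted customer-satisfaction score.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
# Stand-in for responses to the three questions (e.g., 1-5 ratings).
responses = rng.integers(1, 6, size=(250, 3)).astype(float)

z = StandardScaler().fit_transform(responses)        # put items on a common scale
pca = PCA(n_components=1).fit(z)

satisfaction_score = pca.transform(z)[:, 0]          # one score per respondent
item_weights = pca.components_[0]                    # data-driven weight for each item

print("item weights:", np.round(item_weights, 2))
print("variance explained:", round(float(pca.explained_variance_ratio_[0]), 2))
```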

Say you have a list of questions and you don’t know exactly which responses will move together and which will move differently; for example, purchase barriers of potential customers. The following are possible barriers to purchase:

  • Price is prohibitive
  • Overall implementation costs
  • We can’t reach a consensus in our organization
  • Product is not consistent with our business strategy
  • I need to develop an ROI, but cannot or have not
  • We are locked into a contract with another product
  • The product benefits don’t outweigh the cost
  • We have no reason to switch
  • Our IT department cannot support your product
  • We do not have sufficient technical resources
  • Your product does not have a feature we require
  • Other (please specify)

Factor analysis can uncover the trends of how these questions will move together. The following are loadings for 3 factors for each of the variables.

[Table: factor loadings on three factors for each purchase-barrier variable]

Notice how each of the principal components has high weights for a subset of the variables. Weight is used interchangeably with loading, and a high weight indicates the variables that are most influential for each principal component. +0.30 is generally considered to be a heavy weight.

The first component displays heavy weights for variables related to cost, the second weights variables related to IT, and the third weights variables related to organizational factors. We can give our new super variables clever names.

[Table: the same factor loadings with the three components named]

If we cluster the customers based on these three components, we can see some trends. Customers tend to be high in cost barriers or organizational barriers, but not both.

The red dots represent respondents who indicated they had higher organizational barriers; the green dots represent respondents who indicated they had higher cost barriers.

[Figure: respondents plotted on the cost-barrier and organizational-barrier components]

Considerations when using factor analysis

Factor analysis is a tool, and like any tool its effectiveness depends on how you use it. When employing factor analysis, it’s essential to keep a few key considerations in mind.

Oversimplification

While factor analysis is great for simplifying complex data sets, there’s a risk of oversimplification when grouping variables into factors. To avoid this you should ensure the reduced factors still accurately represent the complexities of your variables.

Subjectivity

Interpreting the factors can sometimes be subjective, and requires a good understanding of the variables and the context. Be mindful that multiple analysts may come up with different names for the same factor.

Supplementary techniques

Factor analysis is often just the first step. Consider how it fits into your broader research strategy and which other techniques you’ll use alongside it.

Examples of factor analysis studies

Factor analysis, including PCA, is often used in tandem with segmentation studies. It might be an intermediary step to reduce variables before using KMeans to make the segments.

Factor analysis provides simplicity after reducing variables. For long studies with large blocks of Matrix Likert scale questions, the number of variables can become unwieldy. Simplifying the data using factor analysis helps analysts focus and clarify the results, while also reducing the number of dimensions they’re clustering on.

Sample questions for factor analysis

Choosing exactly which questions to perform factor analysis on is both an art and a science. Choosing which variables to reduce takes some experimentation, patience and creativity. Factor analysis works well on Likert scale questions and Sum to 100 question types.

Factor analysis works well on matrix blocks of the following question genres:

Psychographics (Agree/Disagree):

  • I value family
  • I believe brand represents value

Behavioral (Agree/Disagree):

  • I purchase the cheapest option
  • I am a bargain shopper

Attitudinal (Agree/Disagree):

  • The economy is not improving
  • I am pleased with the product

Activity-Based (Agree/Disagree):

  • I love sports
  • I sometimes shop online during work hours

Behavioral and psychographic questions are especially suited for factor analysis.

Sample output reports

Factor analysis produces weights (called loadings) for each variable and, from these, a score on each factor for every respondent. These factor scores can be used like other responses in the survey.


Factor Analysis: a means for theory and instrument development in support of construct validity

Mohsen Tavakol

1 School of Medicine, Medical Education Centre, the University of Nottingham, UK

Angela Wetzel

2 School of Education, Virginia Commonwealth University, USA

Introduction

Factor analysis (FA) allows us to simplify a set of complex variables or items using statistical procedures to explore the underlying dimensions that explain the relationships between the multiple variables/items. For example, to explore inter-item relationships for a 20-item instrument, a basic analysis would produce 400 correlations; it is not an easy task to keep these matrices in our heads. FA simplifies a matrix of correlations so a researcher can more easily understand the relationship between items in a scale and the underlying factors that the items may have in common. FA is a commonly applied and widely promoted procedure for developing and refining clinical assessment instruments to produce evidence for the construct validity of the measure.

In the literature, the strong association between construct validity and FA is well documented, as the method provides evidence based on test content and evidence based on internal structure, key components of construct validity. 1 From FA, evidence based on internal structure and evidence based on test content can be examined to tell us what the instrument really measures - the intended abstract concept (i.e., a factor/dimension/construct) or something else. Establishing construct validity for the interpretations from a measure is critical to high quality assessment and subsequent research using outcomes data from the measure. Therefore, FA should be a researcher’s best friend during the development and validation of a new measure or when adapting a measure to a new population. FA is also a useful companion when critiquing existing measures for application in research or assessment practice. However, despite the popularity of FA, when applied in medical education instrument development, factor analytic procedures do not always match best practice. 2 This editorial article is designed to help medical educators use FA appropriately.

The Applications of FA

The applications of FA depend on the purpose of the research. Generally speaking, there are two main types of FA: Exploratory Factor Analysis (EFA) and Confirmatory Factor Analysis (CFA).

Exploratory Factor Analysis

Exploratory Factor Analysis (EFA) is widely used in medical education research in the early phases of instrument development, specifically for measures of latent variables that cannot be assessed directly. Typically, in EFA, the researcher, through a review of the literature and engagement with content experts, selects as many instrument items as necessary to fully represent the latent construct (e.g., professionalism). Then, using EFA, the researcher explores the results of factor loadings, along with other criteria (e.g., previous theory, Minimum average partial, 3 Parallel analysis, 4 conceptual meaningfulness, etc.) to refine the measure. Suppose an instrument consisting of 30 questions yields two factors - Factor 1 and Factor 2. A good definition of a factor as a theoretical construct is to look at its factor loadings. 5 The factor loading is the correlation between the item and the factor; a factor loading of more than 0.30 usually indicates a moderate correlation between the item and the factor. Most statistical software, such as SAS, SPSS and R, provide factor loadings. Upon review of the items loading on each factor, the researcher identifies two distinct constructs, with items loading on Factor 1 all related to professionalism, and items loading on Factor 2 related, instead, to leadership. Here, EFA helps the researcher build evidence based on internal structure by retaining only those items with appropriately high loadings on Factor 1 for professionalism, the construct of interest.

It is important to note that, often, Principal Component Analysis (PCA) is applied and described, in error, as exploratory factor analysis. 2, 6 PCA is appropriate if the study primarily aims to reduce the number of original items in the intended instrument to a smaller set. 7 However, if the instrument is being designed to measure a latent construct, EFA, using Maximum Likelihood (ML) or Principal Axis Factoring (PAF), is the appropriate method. 7 These exploratory procedures statistically analyze the interrelationships between the instrument items and domains to uncover the unknown underlying factorial structure (dimensions) of the construct of interest. PCA, by design, seeks to explain total variance (common, specific, and error variance) in the correlation matrix.

The sum of the squared loadings on a factor matrix for a particular item indicates the proportion of variance for that given item that is explained by the factors. This is called the communality. The higher the communality value, the more the extracted factors explain the variance of the item. Further, the mean of the sum of the squared factor loadings specifies the proportion of variance explained by each factor. For example, assume four items of an instrument load on Factor 1 with factor loadings of 0.86, 0.75, 0.66 and 0.58, respectively. Squaring the factor loading of an item gives the percentage of that item's variance explained by Factor 1; in this example, the variance explained for item1, item2, item3 and item4 is 74%, 56%, 44% and 34%, respectively. Summing the squared factor loadings of Factor 1 gives the eigenvalue, approximately 2.07, and dividing the eigenvalue by the number of items (2.07/4 ≈ 0.52) gives the proportion of variance accounted for by Factor 1, about 52%.

Since PCA does not separate specific variance and error variance, it often inflates factor loadings and limits the potential for the factor structure to be generalized and applied with other samples in subsequent studies. On the other hand, Maximum Likelihood and Principal Axis Factoring extraction methods separate common and unique variance (specific and error variance), which overcomes the issue attached to PCA. Thus, the proportion of variance explained by an extracted factor more precisely reflects the extent to which the latent construct is measured by the instrument items. This focus on shared variance among items explained by the underlying factor, particularly during instrument development, helps the researcher understand the extent to which a measure captures the intended construct. It is useful to mention that in PAF the initial communalities are not set at 1s, but are chosen based on the squared multiple correlation coefficient. Indeed, if you run a multiple regression to predict, say, item1 (dependent variable) from the other items (independent variables) and then look at the R-squared (R²), you will see that R² is equal to the communality of item1 derived from PAF.
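The arithmetic in this example is easy to verify with a few lines of NumPy:

```python
# Quick check of the squared-loadings / eigenvalue arithmetic above.
import numpy as np

loadings = np.array([0.86, 0.75, 0.66, 0.58])        # items loading on Factor 1

variance_explained_per_item = loadings ** 2           # ~0.74, 0.56, 0.44, 0.34
eigenvalue = variance_explained_per_item.sum()        # ~2.07
proportion_for_factor1 = eigenvalue / len(loadings)   # ~0.52, i.e. about 52%

print(np.round(variance_explained_per_item, 2),
      round(eigenvalue, 2), round(proportion_for_factor1, 2))
```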

Confirmatory Factor Analysis

When prior EFA studies are available for your intended instrument, Confirmatory Factor Analysis extends on those findings, allowing you to confirm or disconfirm the underlying factor structures, or dimensions, extracted in prior research. CFA is a theory or model-driven approach that tests how well the data “fit” to the proposed model or theory. CFA thus departs from EFA in that researchers must first identify a factor model before analysing the data. More fundamentally, CFA is a means for statistically testing the internal structure of instruments and relies on the maximum likelihood estimation (MLE) and a different set of standards for assessing the suitability of the construct of interest. 7 , 8

Factor analysts usually use the path diagram to show the theoretical and hypothesized relationships between items and the factors to create a hypothetical model to test using the ML method. In the path diagram, circles or ovals represent factors. A rectangle represents the instrument items. Lines (→ or ↔) represent relationships between items. No line, no relationship. A single-headed arrow shows a causal relationship (the variable that the arrowhead refers to is the dependent variable), and a double-headed arrow shows a covariance between variables or factors.

If CFA indicates the primary factors, or first-order factors, produced by the prior PAF are correlated, then the second-order factors need to be modelled and estimated to get a greater understanding of the data. It should be noted that if the prior EFA applied an orthogonal rotation to the factor solution, the factors produced would be uncorrelated. Hence, the analysis of the second-order factors is not possible. Generally, in social science research, most constructs assume inter-related factors, and therefore should apply an oblique rotation. The justification for analyzing the second-order factors is that when the correlations between the primary factors exist, CFA can then statistically model a broad picture of factors not captured by the primary factors (i.e., the first-order factors). 9 The analysis of the first-order factors is like surveying mountains with zoom-lens binoculars, while the analysis of the second-order factors uses a wide-angle lens. 10 Goodness-of-fit tests need to be conducted when evaluating the hypothetical model tested by CFA. The question is: does the new data fit the hypothetical model? However, the statistical models behind goodness-of-fit tests are complex, and extend beyond the scope of this editorial paper; thus, we strongly encourage readers to consult with factor analysts for resources and possible advice.

Conclusions

Factor analysis methods can be incredibly useful tools for researchers attempting to establish high quality measures of those constructs not directly observed and captured by observation. Specifically, the factor solution derived from an Exploratory Factor Analysis provides a snapshot of the statistical relationships of the key behaviors, attitudes, and dispositions of the construct of interest. This snapshot provides critical evidence for the validity of the measure based on the fit of the test content to the theoretical framework that underlies the construct. Further, the relationships between factors, which can be explored with EFA and confirmed with CFA, help researchers interpret the theoretical connections between underlying dimensions of a construct and even extending to relationships across constructs in a broader theoretical model. However, studies that do not apply recommended extraction, rotation, and interpretation in FA risk drawing faulty conclusions about the validity of a measure. As measures are picked up by other researchers and applied in experimental designs, or by practitioners as assessments in practice, application of measures with subpar evidence for validity produces a ripple effect across the field. It is incumbent on researchers to ensure best practices are applied or engage with methodologists to support and consult where there are gaps in knowledge of methods. Further, it remains important to also critically evaluate measures selected for research and practice, focusing on those that demonstrate alignment with best practice for FA and instrument development. 7 , 11

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Lesson 12: Factor Analysis

Factor Analysis is a method for modeling observed variables, and their covariance structure, in terms of a smaller number of underlying unobservable (latent) “factors.” The factors typically are viewed as broad concepts or ideas that may describe an observed phenomenon. For example, a basic desire of obtaining a certain social level might explain most consumption behavior. These unobserved factors are more interesting to the social scientist than the observed quantitative measurements.

Factor analysis is generally an exploratory/descriptive method that requires many subjective judgments. It is a widely used tool and often controversial because the models, methods, and subjectivity are so flexible that debates about interpretations can occur.

The method is similar to principal components although, as the textbook points out, factor analysis is more elaborate. In one sense, factor analysis is an inversion of principal components. In factor analysis, we model the observed variables as linear functions of the “factors.” In principal components, we create new variables that are linear combinations of the observed variables.  In both PCA and FA, the dimension of the data is reduced. Recall that in PCA, the interpretation of the principal components is often not very clean. A particular variable may, on occasion, contribute significantly to more than one of the components. Ideally, we like each variable to contribute significantly to only one component. A technique called factor rotation is employed toward that goal. Examples of fields where factor analysis is involved include physiology, health, intelligence, sociology, and sometimes ecology among others.

Upon completion of this lesson, you should be able to:

  • Understand the terminology of factor analysis, including the interpretation of factor loadings, specific variances, and communalities;
  • Understand how to apply both principal component and maximum likelihood methods for estimating the parameters of a factor model;
  • Understand factor rotation, and interpret rotated factor loadings.

12.1 - Notations and Terminology

Collect all of the variables X's into a vector \(\mathbf{X}\) for each individual subject. Let \(\mathbf{X_i}\) denote observable trait i. These are the data from each subject and are collected into a vector of traits.

\(\textbf{X} = \left(\begin{array}{c}X_1\\X_2\\\vdots\\X_p\end{array}\right) = \text{vector of traits}\)

This is a random vector, with a population mean. Assume that vector of traits \(\mathbf{X}\) is sampled from a population with population mean vector:

\(\boldsymbol{\mu} = \left(\begin{array}{c}\mu_1\\\mu_2\\\vdots\\\mu_p\end{array}\right) = \text{population mean vector}\)

Here, \(E(X_i) = \mu_i\) denotes the population mean of variable i.

Consider m unobservable common factors \(f_1, f_2, \dots, f_m\). The \(i^{th}\) common factor is \(f_i\). Generally, m is going to be substantially less than p.

The common factors are also collected into a vector,

\(\mathbf{f} = \left(\begin{array}{c}f_1\\f_2\\\vdots\\f_m\end{array}\right) = \text{vector of common factors}\)

Our factor model can be thought of as a series of multiple regressions, predicting each of the observable variables \(X_{i}\) from the values of the unobservable common factors \(f_{i}\) :

\begin{align} X_1 & =  \mu_1 + l_{11}f_1 + l_{12}f_2 + \dots + l_{1m}f_m + \epsilon_1\\ X_2 & =  \mu_2 + l_{21}f_1 + l_{22}f_2 + \dots + l_{2m}f_m + \epsilon_2 \\ &  \vdots \\ X_p & =  \mu_p + l_{p1}f_1 + l_{p2}f_2 + \dots + l_{pm}f_m + \epsilon_p \end{align}

Here, the variable means \(\mu_{1}\) through \(\mu_{p}\) can be regarded as the intercept terms for the multiple regression models.

The regression coefficients \(l_{ij}\) (the partial slopes) for all of these multiple regressions are called factor loadings. Here, \(l_{ij}\) = loading of the \(i^{th}\) variable on the \(j^{th}\) factor. These are collected into a matrix as shown here:

\(\mathbf{L} = \left(\begin{array}{cccc}l_{11}& l_{12}& \dots & l_{1m}\\l_{21} & l_{22} & \dots & l_{2m}\\ \vdots & \vdots & & \vdots \\l_{p1} & l_{p2} & \dots & l_{pm}\end{array}\right) = \text{matrix of factor loadings}\)

And finally, the errors \(\varepsilon _{i}\) are called the specific factors. Here, \(\varepsilon _{i}\) = specific factor for variable i . The specific factors are also collected into a vector:

\(\boldsymbol{\epsilon} = \left(\begin{array}{c}\epsilon_1\\\epsilon_2\\\vdots\\\epsilon_p\end{array}\right) = \text{vector of specific factors}\)

In summary, the basic model is like a regression model. Each of our response variables X is predicted as a linear function of the unobserved common factors \(f_{1}\), \(f_{2}\) through \(f_{m}\). Thus, our explanatory variables are \(f_{1}\) , \(f_{2}\) through \(f_{m}\). We have m unobserved factors that control the variation in our data.

We will generally reduce this into matrix notation as shown in this form here:

\(\textbf{X} = \boldsymbol{\mu} + \textbf{Lf}+\boldsymbol{\epsilon}\)
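A brief simulation helps fix the notation: generate factors and specific errors, build \(\textbf{X} = \boldsymbol{\mu} + \textbf{Lf} + \boldsymbol{\epsilon}\), and check the resulting covariance. The distributional choices below (zero-mean, unit-variance, uncorrelated factors and errors) anticipate the assumptions formalized in the next section; the dimensions and parameter values are arbitrary.

```python
# Small simulation of the factor model X = mu + L f + epsilon (NumPy).
import numpy as np

rng = np.random.default_rng(0)
p, m, n = 5, 2, 100_000                       # variables, factors, subjects

mu = np.array([1.0, 2.0, 0.5, -1.0, 3.0])     # population means
L = rng.uniform(-0.8, 0.8, size=(p, m))       # factor loadings
psi = rng.uniform(0.2, 0.5, size=p)           # specific variances

f = rng.normal(size=(n, m))                        # common factors (mean 0, variance 1)
eps = rng.normal(scale=np.sqrt(psi), size=(n, p))  # specific factors (mean 0, variance psi)
X = mu + f @ L.T + eps                             # the factor model

# The sample covariance of X should be close to L L' + Psi (see the next section).
print(np.round(np.cov(X, rowvar=False) - (L @ L.T + np.diag(psi)), 2))
```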

12.2 - Model Assumptions

The specific factors or random errors all have mean zero: \(E(\epsilon_i) = 0\); i = 1, 2, ... , p

The common factors, the f 's, also have mean zero: \(E(f_i) = 0\); i = 1, 2, ... , m

A consequence of these assumptions is that the mean response of the i th trait is \(\mu_i\). That is,

\(E(X_i) = \mu_i\)

The common factors have variance one: \(\text{var}(f_i) = 1\); i = 1, 2, ... , m  

Correlation

The common factors are uncorrelated with one another: \(\text{cov}(f_i, f_j) = 0\)   for i ≠ j

The specific factors are uncorrelated with one another: \(\text{cov}(\epsilon_i, \epsilon_j) = 0\)  for i ≠ j  

The specific factors are uncorrelated with the common factors: \(\text{cov}(\epsilon_i, f_j) = 0\);   i = 1, 2, ... , p; j = 1, 2, ... , m  

These assumptions are necessary to estimate the parameters uniquely. An infinite number of equally well-fitting models with different parameter values may be obtained unless these assumptions are made.

Under this model the variance for the i th observed variable is equal to the sum of the squared loadings for that variable and the specific variance:

The variance of trait i is: \(\sigma^2_i = \text{var}(X_i) = \sum_{j=1}^{m}l^2_{ij}+\psi_i\) 

This derivation is based on the previous assumptions. \(\sum_{j=1}^{m}l^2_{ij}\) is called the Communality for variable i. Later on, we will see how this is a measure of how well the model performs for that particular variable. The larger the communality, the better the model performance for the i th variable.

The covariance between pairs of traits i and j is: \(\sigma_{ij}= \text{cov}(X_i, X_j) = \sum_{k=1}^{m}l_{ik}l_{jk}\) 

The covariance between trait i and factor j is: \(\text{cov}(X_i, f_j) = l_{ij}\)

In matrix notation, our model for the variance-covariance matrix is expressed as shown below:

\(\Sigma = \mathbf{LL'} + \boldsymbol{\Psi}\)

This is the matrix of factor loadings times its transpose, plus a diagonal matrix containing the specific variances.

Here \(\boldsymbol{\Psi}\) equals:

\(\boldsymbol{\Psi} = \left(\begin{array}{cccc}\psi_1 & 0 & \dots & 0 \\ 0 & \psi_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \dots & \psi_p \end{array}\right)\)

A parsimonious (simplified) model for the variance-covariance matrix is obtained and used for estimation.

  • The model assumes that the data is a linear function of the common factors. However, because the common factors are not observable, we cannot check for linearity.

The variance-covariance matrix is going to have p ( p +1)/2 unique elements of \(\Sigma\) approximated by:

  • mp factor loadings in the matrix \(\mathbf{L}\), and
  • p specific variances

This means that there are mp plus p parameters in the variance-covariance matrix. Ideally,  mp + p is substantially smaller than p ( p +1)/2. However, if mp is too small, the mp + p parameters may not be adequate to describe \(\Sigma\). There may always be the case that this is not the right model and you cannot reduce the data to a linear combination of factors.

To see that the factor loadings are not unique, let \(\mathbf{T}\) be any \(m \times m\) orthogonal matrix, so that

\(\mathbf{T'T = TT' = I}\)

We can write our factor model in matrix notation:

\(\textbf{X} = \boldsymbol{\mu} + \textbf{Lf}+ \boldsymbol{\epsilon} = \boldsymbol{\mu} + \mathbf{LTT'f}+ \boldsymbol{\epsilon} = \boldsymbol{\mu} + \mathbf{L^*f^*}+\boldsymbol{\epsilon}\)

Note that this does not change the calculation, because the identity matrix times any matrix is the original matrix. This results in an alternative factor model, where the relationship between the new factor loadings and the original factor loadings is:

\(\mathbf{L^*} = \textbf{LT}\)

and the relationship between the new common factors and the original common factors is:

\(\mathbf{f^*} = \textbf{T'f}\)

This gives a model that fits equally well. Moreover, because there are infinitely many orthogonal matrices, there are infinitely many alternative models. This alternative model, as it turns out, satisfies all of the assumptions discussed earlier:

\(E(\mathbf{f^*}) = E(\textbf{T'f}) = \textbf{T'}E(\textbf{f}) = \mathbf{T'0} =\mathbf{0}\),

\(\text{var}(\mathbf{f^*}) = \text{var}(\mathbf{T'f}) = \mathbf{T'}\text{var}(\mathbf{f})\mathbf{T} = \mathbf{T'IT} = \mathbf{T'T} = \mathbf{I}\)

\(\text{cov}(\mathbf{f^*, \boldsymbol{\epsilon}}) = \text{cov}(\mathbf{T'f, \boldsymbol{\epsilon}}) = \mathbf{T'}\text{cov}(\mathbf{f, \boldsymbol{\epsilon}}) = \mathbf{T'0} = \mathbf{0}\)

So f* satisfies all of the assumptions, and hence f* is an equally valid collection of common factors.  There is a certain apparent ambiguity to these models. This ambiguity is later used to justify a factor rotation to obtain a more parsimonious description of the data.
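
To make this concrete, here is a minimal numpy sketch (the loadings, specific variances, and the orthogonal matrix \(\mathbf{T}\) are made-up values, not from any data set in this lesson) verifying that the rotated loadings \(\mathbf{L^*} = \mathbf{LT}\) reproduce exactly the same covariance structure:

```python
import numpy as np

# Hypothetical loadings for p = 4 variables on m = 2 factors (illustrative values only)
L = np.array([[0.8, 0.1],
              [0.7, 0.2],
              [0.2, 0.9],
              [0.1, 0.8]])
Psi = np.diag([0.3, 0.4, 0.2, 0.3])              # specific variances

theta = np.pi / 6                                # any orthogonal T works; here a 30-degree rotation
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

L_star = L @ T                                   # rotated loadings L* = LT

print(np.allclose(L @ L.T + Psi, L_star @ L_star.T + Psi))   # True: both models fit identically
```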

12.3 - Principal Component Method

We consider two different methods to estimate the parameters of a factor model:

  • Principal Component Method
  • Maximum Likelihood Estimation

A third method, the principal factor method, is also available but not considered in this class.

Let \(X_i\) be a vector of observations for the \(i^{th}\) subject:

\(\mathbf{X_i} = \left(\begin{array}{c}X_{i1}\\ X_{i2}\\ \vdots \\ X_{ip}\end{array}\right)\)

\(\mathbf{S}\) denotes our sample variance-covariance matrix and is expressed as:

\(\textbf{S} = \dfrac{1}{n-1}\sum\limits_{i=1}^{n}\mathbf{(X_i - \bar{x})(X_i - \bar{x})'}\)

We have p eigenvalues for this variance-covariance matrix as well as corresponding eigenvectors for this matrix.

 Eigenvalues of \(\mathbf{S}\):

\(\hat{\lambda}_1, \hat{\lambda}_2, \dots, \hat{\lambda}_p\)

Eigenvectors of \(\mathbf{S}\):

\(\hat{\mathbf{e}}_1, \hat{\mathbf{e}}_2, \dots, \hat{\mathbf{e}}_p\)

Recall that the variance-covariance matrix can be re-expressed in the following form as a function of the eigenvalues and the eigenvectors:

Spectral Decomposition of \(\Sigma\)

\(\Sigma = \sum_{i=1}^{p}\lambda_i \mathbf{e}_i\mathbf{e}'_i \cong \sum_{i=1}^{m}\lambda_i \mathbf{e}_i\mathbf{e}'_i = \left(\begin{array}{cccc}\sqrt{\lambda_1}\mathbf{e}_1 & \sqrt{\lambda_2}\mathbf{e}_2 &  \dots &  \sqrt{\lambda_m}\mathbf{e}_m\end{array}\right)  \left(\begin{array}{c}\sqrt{\lambda_1}\mathbf{e}'_1\\ \sqrt{\lambda_2}\mathbf{e}'_2\\ \vdots\\ \sqrt{\lambda_m}\mathbf{e}'_m\end{array}\right) = \mathbf{LL'}\)

The idea behind the principal component method is to approximate this expression. Instead of summing from 1 to p , we now sum from 1 to m , ignoring the last p - m terms in the sum, and obtain the third expression. We can rewrite this as shown in the fourth expression, which is used to define the matrix of factor loadings \(\mathbf{L}\), yielding the final expression in matrix notation.

This yields the following estimator for the factor loadings:

\(\hat{l}_{ij} = \hat{e}_{ji}\sqrt{\hat{\lambda}_j}\)

These estimates form the matrix \(\mathbf{L}\) of factor loadings in the factor analysis; the \(j^{th}\) column of \(\mathbf{L}\) is \(\sqrt{\hat{\lambda}_j}\hat{\mathbf{e}}_j\). To estimate the specific variances, recall that our factor model for the variance-covariance matrix is

\(\boldsymbol{\Sigma} = \mathbf{LL'} + \boldsymbol{\Psi}\)

in matrix notation. \(\Psi\) is now going to be equal to the variance-covariance matrix minus \(\mathbf{LL'}\).

\( \boldsymbol{\Psi} = \boldsymbol{\Sigma} - \mathbf{LL'}\)

This in turn suggests that the specific variances, the diagonal elements of \(\Psi\), are estimated with this expression:

\(\hat{\Psi}_i = s^2_i - \sum\limits_{j=1}^{m}\hat{\lambda}_j \hat{e}^2_{ji}\)

We take the sample variance for the i th variable and subtract the sum of the squared factor loadings (i.e., the communality).
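
The steps above translate almost line-for-line into code. The following is a small illustrative sketch (not the SAS program used later in this lesson): given a sample covariance or correlation matrix `S` and a chosen number of factors `m`, it returns the principal component estimates of the loadings, communalities, and specific variances.

```python
import numpy as np

def pc_factor_estimates(S, m):
    """Principal component estimates of the factor model from a covariance (or
    correlation) matrix S with m retained factors, following the formulas above."""
    eigvals, eigvecs = np.linalg.eigh(S)             # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]                # largest eigenvalues first
    lam, e = eigvals[order][:m], eigvecs[:, order][:, :m]

    L = e * np.sqrt(lam)                             # column j of L is sqrt(lambda_j) * e_j
    communality = (L ** 2).sum(axis=1)               # h_i^2 = sum_j l_ij^2
    psi = np.diag(S) - communality                   # specific variances
    return L, communality, psi

# Example with a small made-up correlation matrix (illustrative values only)
R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.4],
              [0.3, 0.4, 1.0]])
L, h2, psi = pc_factor_estimates(R, m=1)
```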

12.4 - Example: Places Rated Data - Principal Component Method

Example 12-1: places rated.

Let's revisit the Places Rated Example from Lesson 11 .  Recall that the Places Rated Almanac (Boyer and Savageau) rates 329 communities according to nine criteria:

  • Climate and Terrain
  • Housing
  • Health Care & Environment
  • Crime
  • Transportation
  • Education
  • The Arts
  • Recreation
  • Economics

Except for housing and crime, the higher the score the better. For housing and crime, the lower the score the better.

Our objective here is to describe the relationships among the variables.

Before carrying out a factor analysis we need to determine m . How many common factors should be included in the model? This requires a determination of how many parameters will be involved.

For p = 9, the variance-covariance matrix \(\Sigma\) contains

\(\dfrac{p(p+1)}{2} = \dfrac{9 \times 10}{2} = 45\)

unique elements or entries. For a factor analysis with m factors, the number of parameters in the factor model is equal to

\(p(m+1) = 9(m+1)\)

Taking m = 4, we would have 45 parameters in the factor model. This equals the number of unique elements in \(\Sigma\), so there would be no dimension reduction. We therefore select m = 3, yielding 36 parameters in the factor model and thus a dimension reduction in our analysis.
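
The parameter-count comparison is easy to tabulate; a small illustrative snippet for p = 9:

```python
p = 9
for m in range(1, 6):
    factor_params = p * (m + 1)         # mp loadings plus p specific variances
    sigma_params = p * (p + 1) // 2     # unique elements of Sigma
    print(m, factor_params, sigma_params)
# m = 3 gives 36 < 45 (a reduction); m = 4 gives 45 = 45 (no reduction)
```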

It is also common to look at the results of the principal components analysis. The output from Lesson 11.6 is below. The first three components explain 62% of the variation. We consider this to be sufficient for the current example and will base future analyses on three components.

We need to select m so that a sufficient amount of variation in the data is explained. What is sufficient is, of course, subjective and depends on the example at hand.

Alternatively, often in social sciences, the underlying theory within the field of study indicates how many factors to expect. In psychology, for example, a circumplex model suggests that mood has two factors: positive affect and arousal. So a two-factor model may be considered for questionnaire data regarding the subjects' moods. In many respects, this is a better approach because then you are letting the science drive the statistics rather than the statistics drive the science! If you can, use your or a field expert's scientific understanding to determine how many factors should be included in your model.


The factor analysis is carried out using the program as shown below:

Download the SAS Program here: places2.sas


Performing factor analysis (principal components extraction)

To perform factor analysis and obtain the communalities:

  • Open the ‘ places_tf.csv ’ data set in a new worksheet.
  • Calc > Calculator
  • Highlight and select ‘climate’ to move it to the Store result window.
  • In the Expression window, enter LOGTEN( 'climate') to apply the (base 10) log transformation to the climate variable.
  • Choose OK . The transformed values replace the originals in the worksheet under ‘climate’.
  • Repeat sub-steps 1) through 4) above for all variables housing through econ.
  • Highlight and select climate through econ to move all 9 variables to the Variables window.
  • Choose 3 for the number of factors to extract.
  • Choose Principal Components for the Method of Extraction.
  • Under Options, select Correlation as Matrix to Factor .
  • Under Graphs, select Scree Plot.
  • Choose OK and OK again. The numeric results are shown in the results area, along with the scree plot graph. The last column has the communality values.

Initially, we will look at the factor loadings. The vector of loadings for the \(i^{th}\) factor is obtained from the expression

\(\hat{\mathbf{e}}_{i}\sqrt{ \hat{\lambda}_{i}}\)

These are summarized in the table below. The factor loadings are only recorded for the first three factors because we set m =3. We should also note that the factor loadings are the correlations between the factors and the variables. For example, the correlation between the Arts and the first factor is about 0.86. Similarly, the correlation between climate and that factor is only about 0.28.

Interpreting factor loadings is similar to interpreting the coefficients for principal component analysis. We want to determine some inclusion criteria, which in many instances, may be somewhat arbitrary. In the above table, the values that we consider large are in boldface, using about .5 as the cutoff. The following statements are based on this criterion:

Factor 1 is correlated most strongly with Arts (0.861) and also correlated with Health, Housing, Recreation, and to a lesser extent Crime and Education. You can say that the first factor is primarily a measure of these variables.

Similarly, Factor 2 is correlated most strongly with Crime, Education, and Economics. You can say that the second factor is primarily a measure of these variables.

Likewise, Factor 3 is correlated most strongly with Climate and Economics. You can say that the third factor is primarily a measure of these variables.

The interpretation above is very similar to that obtained in the standardized principal component analysis.

12.5 - Communalities

Example 12-1: continued....

The communalities for the \(i^{th}\) variable are computed by taking the sum of the squared loadings for that variable. This is expressed below:

\(\hat{h}^2_i = \sum\limits_{j=1}^{m}\hat{l}^2_{ij}\)

To understand the computation of communalities, recall the table of factor loadings:

Let's compute the communality for Climate, the first variable. We square the factor loadings for climate (given in bold-face in the table above), then add the results:

\(\hat{h}^2_1 = 0.28682^2 + 0.07560^2 + 0.84085^2 = 0.7950\)

The communalities of the 9 variables can be obtained from page 4 of the SAS output as shown below:

The value 5.616885 (located just above the individual communalities) is the "Total Communality".


In summary, the communalities are placed into a table:

You can think of these values as multiple \(R^{2}\) values for regression models predicting the variables of interest from the 3 factors. The communality for a given variable can be interpreted as the proportion of variation in that variable explained by the three factors. In other words, if we perform multiple regression of climate against the three common factors, we obtain an \(R^{2} = 0.795\), indicating that about 79% of the variation in climate is explained by the factor model. The results suggest that the factor analysis does the best job of explaining variations in climate, the arts, economics, and health.

One assessment of how well this model performs can be obtained from the communalities.  We want to see values that are close to one. This indicates that the model explains most of the variation for those variables. In this case, the model does better for some variables than it does for others. The model explains Climate the best and is not bad for other variables such as Economics, Health, and the Arts. However, for other variables such as Crime, Recreation, Transportation, and Housing the model does not do a good job, explaining only about half of the variation.

The sum of all communality values is the total communality value:

\(\sum\limits_{i=1}^{p}\hat{h}^2_i = \sum\limits_{i=1}^{m}\hat{\lambda}_i\)

Here, the total communality is 5.617. The proportion of the total variation explained by the three factors is

\(\dfrac{5.617}{9} = 0.624\)

This is the percentage of variation explained in our model. This could be considered an overall assessment of the performance of the model. However, this percentage is the same as the proportion of variation explained by the first three eigenvalues, obtained earlier. The individual communalities tell how well the model is working for the individual variables, and the total communality gives an overall assessment of performance. These are two different assessments.
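
As a quick check of the arithmetic quoted above (a sketch; the three loadings are Climate's loadings from the factor loading table, and 5.617 is the total communality reported in the SAS output):

```python
climate_loadings = [0.28682, 0.07560, 0.84085]           # Climate's loadings on the three factors
print(round(sum(l ** 2 for l in climate_loadings), 4))   # 0.795, the communality for Climate

total_communality = 5.617                                # sum over all nine variables (SAS output)
print(round(total_communality / 9, 3))                   # 0.624, proportion of total variation explained
```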

Because the data are standardized, the variance for the standardized data is equal to one. The specific variances are computed by subtracting the communality from the variance as expressed below:

\(\hat{\Psi}_i = 1-\hat{h}^2_i\)

Recall that the data were standardized before analysis, so the variances of the standardized variables are all equal to one. For example, the specific variance for Climate is computed as follows:

\(\hat{\Psi}_1 = 1-0.795 = 0.205\)

The specific variances are found in the SAS output as the diagonal elements in the table on page 5 as seen below:

For example, the specific variance for housing is 0.482.

This model provides an approximation to the correlation matrix.  We can assess the model's appropriateness with the residuals obtained from the following calculation:

\(s_{ij}- \sum\limits_{k=1}^{m}l_{ik}l_{jk}; i \ne j = 1, 2, \dots, p\)

This is basically the difference between R and LL' , or the correlation between variables i and j minus the expected value under the model. Generally, these residuals should be as close to zero as possible. For example, the residual between Housing and Climate is -0.00924 which is pretty close to zero. However, there are some that are not very good. The residual between Climate and Economy is 0.217.  These values give an indication of how well the factor model fits the data.
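
A sketch of the residual computation, assuming `R` is the sample correlation matrix and `L_hat`, `psi_hat` are the estimates from the extraction step (hypothetical names used only for illustration):

```python
import numpy as np

def residual_matrix(R, L_hat, psi_hat):
    """Residuals R - (LL' + Psi); off-diagonal entries close to zero indicate
    that the factor model reproduces the observed correlations well."""
    resid = R - (L_hat @ L_hat.T + np.diag(psi_hat))
    np.fill_diagonal(resid, 0.0)        # the diagonal is reproduced exactly by construction
    return resid

# resid = residual_matrix(R, L_hat, psi_hat)   # using the estimates from the extraction step
# print(np.abs(resid).max())                    # largest absolute residual
```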

One disadvantage of the principal component method is that it does not provide a test for lack of fit. We can examine these numbers and determine if we think they are small or close to zero, but we really do not have a test for this.  Such a test is available for the maximum likelihood method.

12.6 - Final Notes about the Principal Component Method

Unlike competing methods such as maximum likelihood, the estimated factor loadings under the principal component method do not change as the number of factors is increased. However, the communalities and the specific variances do depend on the number of factors in the model. In general, as you increase the number of factors, the communalities increase toward one and the specific variances decrease toward zero.

The diagonal elements of the variance-covariance matrix \(\mathbf{S}\) (or \(\mathbf{R}\)) are equal to the diagonal elements of the model:

\(\mathbf{\hat{L}\hat{L}' + \mathbf{\hat{\Psi}}}\)

The off-diagonal elements are not exactly reproduced. This is in part due to variability in the data - just random chance. Therefore, we want to select the number of factors to make the off-diagonal elements of the residual matrix small:

\(\mathbf{S - (\hat{L}\hat{L}' + \hat{\Psi})}\)

Here, we have a trade-off between two conflicting desires. For a parsimonious model, we wish to select the number of factors m to be as small as possible, but for such a model, the residuals could be large. Conversely, by selecting m to be large, we may reduce the sizes of the residuals but at the cost of producing a more complex and less interpretable model (there are more factors to interpret).

Another result to note is that the sum of the squared elements of the residual matrix is never larger than the sum of the squared eigenvalues left out of the approximation:

\(\sum\limits_{j=m+1}^{p}\hat{\lambda}^2_j\)
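
A quick numerical illustration of this bound, using a small made-up correlation matrix and a one-factor principal component solution:

```python
import numpy as np

R = np.array([[1.0, 0.6, 0.3],                     # made-up correlation matrix, p = 3
              [0.6, 1.0, 0.4],
              [0.3, 0.4, 1.0]])
m = 1
lam, e = np.linalg.eigh(R)
lam, e = lam[::-1], e[:, ::-1]                     # largest eigenvalue first
L = e[:, :m] * np.sqrt(lam[:m])
psi = np.diag(R) - (L ** 2).sum(axis=1)
resid = R - (L @ L.T + np.diag(psi))

print((resid ** 2).sum(), (lam[m:] ** 2).sum())    # the first number never exceeds the second
```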

General Methods used in determining the number of Factors

Below are three common techniques used to determine the number of factors to extract:

  • Cumulative proportion of at least 0.80 (or 80% explained variance)
  • Eigenvalues of at least one
  • Scree plot is based on the "elbow" of the plot; that is, where the plot turns and begins to flatten out
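
A short illustrative sketch applying the first two rules to a vector of eigenvalues (the values are made up; the scree "elbow" is judged visually):

```python
import numpy as np

eigenvalues = np.array([3.3, 1.2, 1.0, 0.7, 0.5, 0.4, 0.4, 0.3, 0.2])   # illustrative values

cum_prop = np.cumsum(eigenvalues) / eigenvalues.sum()
m_cumulative = int(np.argmax(cum_prop >= 0.80)) + 1    # smallest m explaining at least 80%
m_kaiser = int((eigenvalues >= 1).sum())               # eigenvalues of at least one

print(m_cumulative, m_kaiser)
# The scree "elbow" is judged by plotting the eigenvalues against the component number.
```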

12.7 - Maximum Likelihood Estimation Method

Maximum Likelihood Estimation requires that the data are sampled from a multivariate normal distribution. This is a drawback of this method. Data is often collected on a Likert scale, especially in the social sciences. Because a Likert scale is discrete and bounded, these data cannot be normally distributed.

Using the Maximum Likelihood Estimation Method, we must assume that the data are independently sampled from a multivariate normal distribution with mean vector \(\mu\) and variance-covariance matrix of the form:

\(\boldsymbol{\Sigma} = \mathbf{LL' +\boldsymbol{\Psi}}\)

where \(\mathbf{L}\) is the matrix of factor loadings and \(\Psi\) is the diagonal matrix of specific variances.

We define additional notation: As usual, the data vectors for n subjects are represented as shown:

\(\mathbf{X_1},\mathbf{X_2}, \dots, \mathbf{X_n}\)

Maximum likelihood estimation involves estimating the mean, the matrix of factor loadings, and the specific variances.

The maximum likelihood estimators for the mean vector \(\mu\), the factor loadings \(\mathbf{L}\), and the specific variances \(\Psi\) are obtained by finding \(\hat{\mathbf{\mu}}\), \(\hat{\mathbf{L}}\), and \(\hat{\mathbf{\Psi}}\) that maximize the log-likelihood given by the following expression:

\(l(\mathbf{\mu, L, \Psi}) = - \dfrac{np}{2}\log{2\pi}- \dfrac{n}{2}\log{|\mathbf{LL' + \Psi}|} - \dfrac{1}{2}\sum_{i=1}^{n}\mathbf{(X_i-\mu)'(LL'+\Psi)^{-1}(X_i-\mu)}\)

The log of the joint probability distribution of the data is maximized. We want to find the values of the parameters (\(\mu\), \(\mathbf{L}\), and \(\Psi\)) that are most compatible with what we see in the data. As was noted earlier, the solutions for these factor models are not unique; equivalent models can be obtained by rotation. A unique solution is typically obtained by imposing the constraint that \(\mathbf{L'\Psi^{-1}L}\) is a diagonal matrix.

Computationally this process is complex. In general, there is no closed-form solution to this maximization problem so iterative methods are applied. Implementation of iterative methods can run into problems as we will see later.
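
This lesson carries out maximum likelihood estimation with SAS PROC FACTOR. As a rough open-source analogue, offered only as an illustration and not as the procedure used in this course, scikit-learn's FactorAnalysis fits the same \(\mathbf{LL'} + \boldsymbol{\Psi}\) model by maximum likelihood with an iterative EM algorithm:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 9))          # placeholder data; the lesson uses the nine Places Rated variables

fa = FactorAnalysis(n_components=3)    # m = 3 common factors, fit by iterative maximum likelihood (EM)
scores = fa.fit_transform(X)           # estimated common factor scores, one row per observation

L_hat = fa.components_.T               # p x m matrix of estimated factor loadings
psi_hat = fa.noise_variance_           # p estimated specific variances
```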

12.8 - Example: Places Rated Data

Example 12-2: places rated.

This method of factor analysis is being carried out using the program shown below:

Download the SAS Program here: places3.sas

Here we have specified the Maximum Likelihood Method by setting method=ml. Again, we need to specify the number of factors.

You will notice that this program produces errors and does not complete the factor analysis. We will start out without the Heywood or priors options discussed below to see the error that occurs and how to remedy it.

For m = 3 factors, maximum likelihood estimation fails to converge. An examination of the records of each iteration reveals that the communality of the first variable (climate) exceeds one during the first iteration. Because a communality must lie between 0 and 1, this is the cause of the failure.

SAS provides a number of different fixes for this kind of error. Most fixes adjust the initial guess, or starting value, for the communalities.

  • priors=smc: Sets the prior communality of each variable proportional to the \(R^2\) of that variable with all other variables as an initial guess.
  • priors=asmc: As above, with an adjustment so that the sum of the communalities is equal to the sum of the maximum absolute correlations.
  • priors=max: Sets the prior communality of each variable to its maximum absolute correlation with any other variable.
  • priors=random: Sets the prior communality of each variable to a random number between 0 and 1.

This option is added within the proc factor line of code (proc factor method=ml nfactors=3 priors=smc;). If we begin with better starting values, then we might have better luck at convergence. Unfortunately, in trying each of these options (including running the random option multiple times), we find that they are ineffective for our Places Rated data. A second option, the Heywood option, needs to be considered.

  • Attempt adding the Heywood option to the procedure (proc factor method=ml nfactors=3 heywood;). This sets communalities greater than one back to one, allowing iterations to proceed. In other words, if a communality value falls out of bounds, then it is replaced by a value of one. This will always yield a solution, but frequently the solution will not adequately fit the data.

We start with the same values for the communalities and then, at each iteration, we obtain new values for the communalities. The criterion is a value that we are trying to minimize in order to obtain our estimates. We can see that the convergence criterion decreases with each iteration of the algorithm.

You can see in the second iteration that rather than report a communality greater than one, SAS replaces it with the value one and then proceeds as usual through the iterations.

After five iterations the algorithm converges, as indicated by the statement on the second page of the output. The algorithm converged to a setting where the communality for Climate is equal to one.

To perform factor analysis using maximum likelihood

  • Choose Maximum Likelihood for the Method of Extraction.
  • Under Results, select All and MLE iterations , and choose OK .
  • Choose OK again . The numeric results are shown in the results area.

12.9 - Goodness-of-Fit

Before we proceed, we would like to determine if the model adequately fits the data. The goodness-of-fit test in this case compares the variance-covariance matrix under a parsimonious model to the variance-covariance matrix without any restriction, i.e. under the assumption that the variances and covariances can take any values. The variance-covariance matrix under the assumed model can be expressed as:

\(\mathbf{\Sigma = LL' + \Psi}\)

\(\mathbf{L}\) is the matrix of factor loadings, and the diagonal elements of \(\boldsymbol{\Psi}\) are equal to the specific variances. This is a very specific structure for the variance-covariance matrix. A more general structure would allow those elements to take any value. To assess goodness-of-fit, we use the Bartlett-corrected likelihood ratio test statistic:

\(X^2 = \left(n-1-\frac{2p+4m+5}{6}\right)\log \frac{|\mathbf{\hat{L}\hat{L}'}+\mathbf{\hat{\Psi}}|}{|\hat{\mathbf{\Sigma}}|}\)

The test is a likelihood ratio test, where two likelihoods are compared, one under the parsimonious model and the other without any restrictions. The constant in the statistic is called the Bartlett correction. The log is the natural log. In the numerator, we have the determinant of the fitted factor model for the variance-covariance matrix, and below, we have a sample estimate of the variance-covariance matrix assuming no structure where:

\(\hat{\boldsymbol{\Sigma}} = \frac{n-1}{n}\mathbf{S}\)

and \(\mathbf{S}\) is the sample variance-covariance matrix. This is just another estimate of the variance-covariance matrix, one that includes a small bias. If the factor model fits well, then these two determinants should be about the same and \(X^2\) will be small. However, if the model does not fit well, then the determinants will differ and \(X^2\) will be large.

Under the null hypothesis that the factor model adequately describes the relationships among the variables,

\(\mathbf{X}^2 \sim \chi^2_{\frac{(p-m)^2-p-m}{2}} \)

That is, under the null hypothesis the test statistic has a chi-square distribution with the unusual degrees of freedom shown above. The degrees of freedom are the difference in the number of unique parameters between the two models. We reject the null hypothesis that the factor model adequately describes the data if \(X^2\) exceeds the critical value from the chi-square table.
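
A sketch of how the statistic and its p-value could be computed from the maximum likelihood estimates (the argument names are placeholders; `Sigma_hat` is the biased estimate \((n-1)/n\,\mathbf{S}\) defined above):

```python
import numpy as np
from scipy.stats import chi2

def factor_gof(n, L_hat, psi_hat, Sigma_hat):
    """Bartlett-corrected likelihood ratio test of H0: Sigma = LL' + Psi.
    L_hat (p x m), psi_hat (length p), and Sigma_hat are placeholders for the
    maximum likelihood estimates discussed above."""
    p, m = L_hat.shape
    fitted = L_hat @ L_hat.T + np.diag(psi_hat)
    stat = (n - 1 - (2 * p + 4 * m + 5) / 6) * np.log(
        np.linalg.det(fitted) / np.linalg.det(Sigma_hat))
    df = ((p - m) ** 2 - p - m) / 2
    return stat, df, chi2.sf(stat, df)

# Reported for the Places Rated data with m = 3: X^2 = 92.67 on 12 d.f.
print(chi2.sf(92.67, 12))              # p-value far below 0.0001
```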

Back to the Output...

Looking just past the iteration results, we have....

For our Places Rated dataset, we find a significant lack of fit: \(X^2 = 92.67\), \(d.f. = 12\), \(p < 0.0001\). We conclude that the relationships among the variables are not adequately described by the factor model. This suggests that we do not have the correct model.

The only remedy that we can apply in this case is to increase the number m of factors until an adequate fit is achieved. Note, however, that m must satisfy

\(p(m+1) \le \frac{p(p+1)}{2}\)

In the present example, this means that m ≤ 4.

Let's return to the SAS program and change the "nfactors" value from 3 to 4:

We find that the factor model with m = 4 does not fit the data adequately either: \(X^2 = 41.69\), \(d.f. = 6\), \(p < 0.0001\). We cannot properly fit a factor model to describe this particular data and conclude that a factor model does not work with this particular dataset. There is something else going on here, perhaps some non-linearity. Whatever the case, it does not look like this yields a good-fitting factor model. The next step could be to drop variables from the data set to obtain a better-fitting model.

12.10 - Factor Rotations

From our experience with the Places Rated data, it does not look like the factor model works well. There is no guarantee that any model will fit the data well.

The first motivation of factor analysis was to try to discern some underlying factors describing the data. The Maximum Likelihood Method failed to find such a model to describe the Places Rated data. The second motivation is still valid, which is to try to obtain a better interpretation of the data. In order to do this, let's take a look at the factor loadings obtained before from the principal component method.

The problem with this analysis is that some of the variables are highlighted in more than one column. For instance, Education appears significant to Factor 1 AND Factor 2. The same is true for Economics in both Factors 2 AND 3. This does not provide a very clean, simple interpretation of the data. Ideally, each variable would appear as a significant contributor in one column.

In fact, the above table may indicate contradictory results. Looking at some of the observations, it is conceivable that we will find an observation that takes a high value on both Factors 1 and 2. If this occurs, a high value for Factor 1 suggests that the community has quality education, whereas a high value for Factor 2 suggests the opposite, that the community has poor education.

Factor rotation is motivated by the fact that factor models are not unique. Recall that the factor model for the data vector, \(\mathbf{X = \boldsymbol{\mu} + LF + \boldsymbol{\epsilon}}\), is a function of the mean \(\boldsymbol{\mu}\), plus a matrix of factor loadings times a vector of common factors, plus a vector of specific factors.

Moreover, we should note that this is equivalent to a rotated factor model, \(\mathbf{X = \boldsymbol{\mu} + L^*F^* + \boldsymbol{\epsilon}}\), where we have set \(\mathbf{L^* = LT}\) and \(\mathbf{f^* = T'f}\) for some orthogonal matrix \(\mathbf{T}\) where \(\mathbf{T'T = TT' = I}\). Note that there are an infinite number of possible orthogonal matrices, each corresponding to a particular factor rotation.

We plan to find an appropriate rotation, defined through an orthogonal matrix \(\mathbf{T}\) , that yields the most easily interpretable factors.

To understand this, consider a scatter plot of factor loadings. The orthogonal matrix \(\mathbf{T}\) rotates the axes of this plot. We wish to find a rotation such that each of the p variables has a high loading on only one factor.

We will return to the program below to obtain a plot.  In looking at the program, there are a number of options (marked in blue under proc factor) that we did not yet explain.

Download the SAS program here: places2.sas

One of the options above is labeled 'preplot'. We will use this to plot the values for factor 1 against factor 2.

In the output these values are plotted, the loadings for factor 1 on the y-axis, and the loadings for factor 2 on the x-axis.

For example, the second variable, labeled with the letter B, has a factor 1 loading of about 0.7 and a factor 2 loading of about 0.15. Each letter on the plot corresponds to a single variable. SAS provides plots of the other combinations of factors: factor 1 against factor 3, as well as factor 2 against factor 3.

Three factors appear in this model so we might consider a three-dimensional plot of all three factors together.

Obtaining a scree plot and loading plot

To perform factor analysis with scree and loading plots:

  • Transform variables. This step is optional but used in the steps below.  
  • Choose OK. The transformed values replace the originals in the worksheet under ‘climate’.
  • Stat > Multivariate > Factor Analysis
  • Under Graphs, select Scree plot and Loading plot for first two factors.
  • Choose OK and OK again . The numeric results are shown in the results area, along with both the scree plot and the loading plot.

The selection of the orthogonal matrix \(\mathbf{T}\) corresponds to a rotation of these axes. Think about rotating the axes about the origin: each rotation corresponds to an orthogonal matrix \(\mathbf{T}\). We want to rotate the axes to obtain a cleaner interpretation of the data. We would really like to define new coordinate systems so that, after rotation, the points fall close to the vertices (endpoints) of the new axes.

If we were only looking at two factors, then we would like to find each of the plotted points at the four tips (corresponding to all four directions) of the rotated axes. This is what rotation is about, taking the factor pattern plot and rotating the axes in such a way that the points fall close to the axes.

12.11 - Varimax Rotation

The varimax rotation chooses the orthogonal matrix \(\mathbf{T}\) that maximizes the quantity

\(V = \dfrac{1}{p}\sum\limits_{j=1}^{m}\left\{\sum\limits_{i=1}^{p}\left(\dfrac{\hat{l}^*_{ij}}{\hat{h}_i}\right)^4 - \dfrac{1}{p}\left[\sum\limits_{i=1}^{p}\left(\dfrac{\hat{l}^*_{ij}}{\hat{h}_i}\right)^2\right]^2\right\}\)

This is the sample variance of the squared standardized loadings (loadings scaled by the communalities) for each factor, summed over the m factors.
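
For reference, a common SVD-based implementation of the raw varimax criterion (a sketch that omits the Kaiser normalization by communalities, so it will not reproduce SAS's rotated loadings exactly):

```python
import numpy as np

def varimax(L, max_iter=100, tol=1e-8):
    """Raw varimax rotation of a p x m loading matrix L: iteratively finds the
    orthogonal matrix T that maximizes the variance of the squared loadings
    within each column, via the usual SVD update."""
    p, m = L.shape
    T = np.eye(m)
    d = 0.0
    for _ in range(max_iter):
        Lr = L @ T                                     # current rotated loadings
        B = L.T @ (Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(B)
        T = u @ vt                                     # updated orthogonal matrix
        d_new = s.sum()
        if d_new < d * (1 + tol):                      # stop when the criterion stops improving
            break
        d = d_new
    return L @ T, T

# L_rot, T = varimax(L_hat)    # rotated loadings and the corresponding orthogonal matrix T
```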

Returning to the options of the factoring procedure (marked in blue):

"rotate," asks for factor rotation and we specified the Varimax rotation of our factor loadings.

"plot," asks for the same kind of plot that we just looked at for the rotated factors. The result of our rotation is a new factor pattern given below (page 11 of SAS output):

Here is a copy of page 10 from the SAS output:

At the top of page 10 of the output, above, we have our orthogonal matrix T .

Using Varimax Rotation

To perform factor analysis with varimax rotation:

  • Choose Varimax for the Type of Rotation.
  • Under Graphs, select Loading plot for the first two factors.
  • Choose OK and OK again . The numeric results are shown in the results area, along with the loading plot.

The values of the rotated factor loadings are:

Let us now interpret the data based on the rotation. We highlighted the values that are large in magnitude and make the following interpretation.

  • Factor 1: primarily a measure of Health, but also increases with increasing scores for Transportation, Education, and the Arts.
  • Factor 2: primarily a measure of Crime, Recreation, the Economy, and Housing.
  • Factor 3: primarily a measure of Climate alone.

This is just the pattern that exists in the data and no causal inferences should be made from this interpretation. It does not tell us why this pattern exists. It could very well be that there are other essential factors that are not seen at work here.

Let us look at the amount of variation explained by our factors under the rotated model and compare it to the original model. Consider the variance explained by each factor under the original analysis and the rotated factors:

The total amount of variation explained by the 3 factors remains the same. Rotations, among a fixed number of factors, do not change how much of the variation is explained by the model. The fit is equally good regardless of what rotation is used.

However, notice what happened to the first factor. We see a fairly large decrease in the amount of variation explained by the first factor. We obtained a cleaner interpretation of the data but it costs us something somewhere. The cost is that the variation explained by the first factor is distributed among the latter two factors, in this case mostly to the second factor.

The total amount of variation explained by the rotated factor model is the same, but the contributions are not the same from the individual factors. We gain a cleaner interpretation, but the first factor does not explain as much of the variation. However, this would not be considered a particularly large cost if we are still interested in these three factors.

Rotation cleans up the interpretation. Ideally, we should find that the numbers in each column are either far away from zero or close to zero. Numbers close to +1 or -1 or 0 in each column give the ideal or cleanest interpretation. If a rotation can achieve this goal, then that is wonderful. However, observed data are seldom this cooperative!

Nevertheless, recall that the objective is data interpretation. The success of the analysis can be judged by how well it helps you to make sense of your data. If the result gives you some insight into the pattern of variability in the data, even without being perfect, then the analysis was successful.

12.12 - Estimation of Factor Scores

Factor scores are similar to the principal components in the previous lesson. Just as we plotted principal components against each other, a similar scatter plot of factor scores is also helpful. We also might use factor scores as explanatory variables in future analyses. It may even be of interest to use the factor score as the dependent variable in a future analysis.

The methods for estimating factor scores depend on the method used to carry out the principal components analysis. The vectors of common factors f are of interest. There are m unobserved factors in our model and we would like to estimate those factors. Therefore, given the factor model:

\(\mathbf{Y_i = \boldsymbol{\mu} + Lf_i + \boldsymbol{\epsilon_i}}; i = 1,2,\dots, n,\)

we may wish to estimate the vectors of factor scores

\(\mathbf{f_1, f_2, \dots, f_n}\)

for each observation.

There are a number of different methods for estimating factor scores from the data. These include:

  • Ordinary Least Squares
  • Weighted Least Squares
  • Regression method

Ordinary Least Squares

By default, this is the method that SAS uses if you use the principal component method. For each subject, the difference between the \(j^{th}\) observed variable and its value under the factor model is computed, where the \(l_{jk}\)'s are the factor loadings and the f 's are the unobserved common factors. The vector of common factors for subject i , \( \hat{\mathbf{f}}_i \), is found by minimizing the sum of the squared residuals:

\[\sum_{j=1}^{p}\epsilon^2_{ij} = \sum_{j=1}^{p}(y_{ij}-\mu_j-l_{j1}f_1 - l_{j2}f_2 - \dots - l_{jm}f_m)^2 = (\mathbf{Y_i - \boldsymbol{\mu} - Lf_i})'(\mathbf{Y_i - \boldsymbol{\mu} - Lf_i})\]

This is like a least squares regression, except in this case we already have estimates of the parameters (the factor loadings), but wish to estimate the explanatory common factors. In matrix notation the solution is expressed as:

\(\mathbf{\hat{f}_i = (L'L)^{-1}L'(Y_i-\boldsymbol{\mu})}\)

In practice, we substitute our estimated factor loadings into this expression as well as the sample mean for the data:

\(\mathbf{\hat{f}_i = \left(\hat{L}'\hat{L}\right)^{-1}\hat{L}'(Y_i-\bar{y})}\)

Using the principal component method with the unrotated factor loadings, this yields:

\[\mathbf{\hat{f}_i} = \left(\begin{array}{c} \frac{1}{\sqrt{\hat{\lambda}_1}}\mathbf{\hat{e}'_1(Y_i-\bar{y})}\\  \frac{1}{\sqrt{\hat{\lambda}_2}}\mathbf{\hat{e}'_2(Y_i-\bar{y})}\\ \vdots \\  \frac{1}{\sqrt{\hat{\lambda}_m}}\mathbf{\hat{e}'_m(Y_i-\bar{y})}\end{array}\right)\]

\(\hat{\mathbf{e}}_1\) through \(\hat{\mathbf{e}}_m\) are our first m eigenvectors.
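
A direct sketch of the OLS formula, assuming `Y` is the n × p data matrix, `L_hat` the p × m estimated loading matrix, and `ybar` the sample mean vector (hypothetical names used for illustration):

```python
import numpy as np

def ols_factor_scores(Y, L_hat, ybar):
    """OLS scores f_hat_i = (L'L)^{-1} L'(y_i - ybar), computed for every row of
    the n x p data matrix Y at once; L_hat is the p x m estimated loading matrix."""
    centered = Y - ybar
    return centered @ L_hat @ np.linalg.inv(L_hat.T @ L_hat)   # n x m matrix of factor scores
```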

Weighted Least Squares (Bartlett)

The difference between WLS and OLS is that the squared residuals are divided by the specific variances as shown below. This is going to give more weight, in this estimation, to variables that have low specific variances.  The factor model fits the data best for variables with low specific variances.  The variables with low specific variances should give us more information regarding the true values for the specific factors.

Therefore, for the factor model:

\(\mathbf{Y_i = \boldsymbol{\mu} + Lf_i + \boldsymbol{\epsilon_i}}\)

we want to find \(\boldsymbol{f_i}\) that minimizes

\( \sum\limits_{j=1}^{p}\frac{\epsilon^2_{ij}}{\Psi_j} = \sum\limits_{j=1}^{p}\frac{(y_{ij}-\mu_j - l_{j1}f_1 - l_{j2}f_2 -\dots - l_{jm}f_m)^2}{\Psi_j} = \mathbf{(Y_i-\boldsymbol{\mu}-Lf_i)'\Psi^{-1}(Y_i-\boldsymbol{\mu}-Lf_i)}\)

The solution is given by this expression where \(\mathbf{\Psi}\) is the diagonal matrix whose diagonal elements are equal to the specific variances:

\(\mathbf{\hat{f}_i = (L'\Psi^{-1}L)^{-1}L'\Psi^{-1}(Y_i-\boldsymbol{\mu})}\)

and can be estimated by substituting the following:

\(\mathbf{\hat{f}_i = (\hat{L}'\hat{\Psi}^{-1}\hat{L})^{-1}\hat{L}'\hat{\Psi}^{-1}(Y_i-\bar{y})}\)
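
The corresponding sketch for the weighted least squares (Bartlett) scores, with `psi_hat` holding the estimated specific variances (again, hypothetical names):

```python
import numpy as np

def bartlett_factor_scores(Y, L_hat, psi_hat, ybar):
    """Weighted least squares (Bartlett) scores
    f_hat_i = (L' Psi^{-1} L)^{-1} L' Psi^{-1} (y_i - ybar)."""
    Psi_inv = np.diag(1.0 / psi_hat)                           # inverse of the diagonal matrix Psi
    W = np.linalg.inv(L_hat.T @ Psi_inv @ L_hat) @ L_hat.T @ Psi_inv
    return (Y - ybar) @ W.T                                    # n x m matrix of factor scores
```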

Regression Method

This method is used with maximum likelihood estimates of the factor loadings. For the i th subject, the vector of observed data, supplemented by the vector of common factor scores, is considered.

The joint distribution of the data \(\boldsymbol{Y}_i\) and the factor \(\boldsymbol{f}_i\) is

\(\left(\begin{array}{c}\mathbf{Y_i} \\ \mathbf{f_i}\end{array}\right) \sim N \left[\left(\begin{array}{c}\mathbf{\boldsymbol{\mu}} \\ 0 \end{array}\right), \left(\begin{array}{cc}\mathbf{LL'+\Psi} & \mathbf{L} \\ \mathbf{L'} & \mathbf{I}\end{array}\right)\right]\)

Using this we can calculate the conditional expectation of the common factor score \(\boldsymbol{f}_i\) given the data \(\boldsymbol{Y}_i\) as expressed here:

\(E(\mathbf{f_i|Y_i}) = \mathbf{L'(LL'+\Psi)^{-1}(Y_i-\boldsymbol{\mu})}\)

This suggests the following estimator by substituting in the estimates for L and \(\mathbf{\Psi}\):

\(\mathbf{\hat{f}_i = \hat{L}'\left(\hat{L}\hat{L}'+\hat{\Psi}\right)^{-1}(Y_i-\bar{y})}\)

A small modification is often applied to reduce the effect of choosing the wrong number of factors: the model-based matrix \(\hat{\mathbf{L}}\hat{\mathbf{L}}' + \hat{\boldsymbol{\Psi}}\) is replaced by the sample variance-covariance matrix \(\mathbf{S}\). This tends to give results that are a bit more stable:

\(\mathbf{\tilde{f}_i = \hat{L}'S^{-1}(Y_i-\bar{y})}\)
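
And a sketch of the regression-method scores in the stabilized form above, where the sample variance-covariance matrix \(\mathbf{S}\) replaces \(\hat{\mathbf{L}}\hat{\mathbf{L}}' + \hat{\boldsymbol{\Psi}}\):

```python
import numpy as np

def regression_factor_scores(Y, L_hat, S, ybar):
    """Regression-method scores in the stabilized form
    f_tilde_i = L' S^{-1} (y_i - ybar), with S the sample covariance matrix."""
    W = L_hat.T @ np.linalg.inv(S)                             # m x p weight matrix
    return (Y - ybar) @ W.T                                    # n x m matrix of factor scores
```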

12.13 - Summary

In this lesson we learned about:

  • The interpretation of factor loadings;
  • The principal component and maximum likelihood methods for estimating factor loadings and specific variances
  • How communalities can be used to assess the adequacy of a factor model
  • A likelihood ratio test for the goodness-of-fit of a factor model
  • Factor rotation
  • Methods for estimating common factors

Institute for Digital Research and Education

A Practical Introduction to Factor Analysis: Exploratory Factor Analysis

This seminar is the first part of a two-part seminar that introduces central concepts in factor analysis. Part 1 focuses on exploratory factor analysis (EFA). Although the implementation is in SPSS, the ideas carry over to any software program. Part 2 introduces confirmatory factor analysis (CFA). Please refer to A Practical Introduction to Factor Analysis: Confirmatory Factor Analysis .

I. Exploratory Factor Analysis

  • Motivating example: The SAQ
  • Pearson correlation formula

Partitioning the variance in factor analysis

  • principal components analysis
  • principal axis factoring
  • maximum likelihood

Simple Structure

  • Orthogonal rotation (Varimax)
  • Oblique (Direct Oblimin)
  • Generating factor scores


Introduction.

Suppose you are conducting a survey and you want to know whether the items in the survey have similar patterns of responses: do these items “hang together” to create a construct? The basic assumption of factor analysis is that for a collection of observed variables there is a set of underlying variables, called factors, fewer in number than the observed variables, that can explain the interrelationships among those variables. Let’s say you conduct a survey and collect responses about people’s anxiety about using SPSS. Do all these items actually measure what we call “SPSS Anxiety”?


Motivating Example: The SAQ (SPSS Anxiety Questionnaire)

Let’s proceed with our hypothetical example of the survey which Andy Field terms the SPSS Anxiety Questionnaire. For simplicity, we will use the so-called “ SAQ-8 ” which consists of the first eight items in the SAQ . Click on the preceding hyperlinks to download the SPSS version of both files. The SAQ-8 consists of the following questions:

  • Statistics makes me cry
  • My friends will think I’m stupid for not being able to cope with SPSS
  • Standard deviations excite me
  • I dream that Pearson is attacking me with correlation coefficients
  • I don’t understand statistics
  • I have little experience of computers
  • All computers hate me
  • I have never been good at mathematics

Pearson Correlation of the SAQ-8

Let’s get the table of correlations in SPSS Analyze – Correlate – Bivariate:

From this table we can see that most items have some correlation with each other, ranging from \(r=-0.382\) for Items 3 and 7 to \(r=0.514\) for Items 6 and 7. Due to relatively high correlations among items, this would be a good candidate for factor analysis. Recall that the goal of factor analysis is to model the interrelationships between items with fewer (latent) variables. These interrelationships can be broken up into multiple components.

Since the goal of factor analysis is to model the interrelationships among items, we focus primarily on the variance and covariance rather than the mean. Factor analysis assumes that variance can be partitioned into two types of variance, common and unique.

  • Communality (also called \(h^2\)) is a definition of common variance that ranges between \(0 \) and \(1\). Values closer to 1 suggest that extracted factors explain more of the variance of an individual item.
  • Specific variance : is variance that is specific to a particular item (e.g., Item 4 “All computers hate me” may have variance that is attributable to anxiety about computers in addition to anxiety about SPSS).
  • Error variance:  comes from errors of measurement and basically anything unexplained by common or specific variance (e.g., the person got a call from her babysitter that her two-year old son ate her favorite lipstick).

The figure below shows how these concepts are related:


Performing Factor Analysis

As a data analyst, the goal of a factor analysis is to reduce the number of variables to explain and to interpret the results. This can be accomplished in two steps:

  • factor extraction
  • factor rotation

Factor extraction involves making a choice about the type of model as well the number of factors to extract. Factor rotation comes after the factors are extracted, with the goal of achieving  simple structure  in order to improve interpretability.

Extracting Factors

There are two approaches to factor extraction which stems from different approaches to variance partitioning: a) principal components analysis and b) common factor analysis.

Principal Components Analysis

Unlike factor analysis, principal components analysis (PCA) makes the assumption that there is no unique variance: the total variance is equal to the common variance. Recall that variance can be partitioned into common and unique variance. If there is no unique variance then common variance takes up total variance (see figure below). Additionally, if the total variance is 1, then the common variance is equal to the communality.

Running a PCA with 8 components in SPSS

The goal of a PCA is to replicate the correlation matrix using a set of components that are fewer in number and linear combinations of the original set of items. Although the following analysis defeats the purpose of doing a PCA we will begin by extracting as many components as possible as a teaching exercise and so that we can decide on the optimal number of components to extract later.

First go to Analyze – Dimension Reduction – Factor. Move all the observed variables over to the Variables: box to be analyzed.


Under Extraction – Method, pick Principal components and make sure to Analyze the Correlation matrix. We also request the Unrotated factor solution and the Scree plot. Under Extract, choose Fixed number of factors, and under Factor to extract enter 8. We also bumped up the Maximum Iterations of Convergence to 100.


The equivalent SPSS syntax is shown below:

Eigenvalues and Eigenvectors

Before we get into the SPSS output, let’s understand a few things about eigenvalues and eigenvectors.

Eigenvalues represent the total amount of variance that can be explained by a given principal component.  They can be positive or negative in theory, but in practice they explain variance which is always positive.

  • If eigenvalues are greater than zero, then it’s a good sign.
  • Since variance cannot be negative, negative eigenvalues imply the model is ill-conditioned.
  • Eigenvalues close to zero imply there is item multicollinearity, since all the variance can be taken up by the first component.

Eigenvalues are also the sum of squared component loadings across all items for each component, which represent the amount of variance in each item that can be explained by the principal component.

Eigenvectors represent a weight for each eigenvalue. The eigenvector element times the square root of the eigenvalue gives the component loading, which can be interpreted as the correlation of each item with the principal component. For this particular PCA of the SAQ-8, the element of the first eigenvector associated with Item 1 is \(0.377\), and the eigenvalue of the first component is \(3.057\). We can calculate the loading of Item 1 on the first component as

$$(0.377)\sqrt{3.057}= 0.659.$$
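
A one-line check of this arithmetic (the two numbers are taken from the SPSS output quoted above):

```python
import math

eigenvector_element = 0.377            # element of the first eigenvector for Item 1
first_eigenvalue = 3.057               # eigenvalue of the first component

print(round(eigenvector_element * math.sqrt(first_eigenvalue), 3))   # 0.659
```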

In this case, we can say that the correlation of the first item with the first component is \(0.659\). Let’s now move on to the component matrix.

Component Matrix

The components can be interpreted as the correlation of each item with the component. Each item has a loading corresponding to each of the 8 components. For example, Item 1 is correlated \(0.659\) with the first component, \(0.136\) with the second component and \(-0.398\) with the third, and so on.

The square of each loading represents the proportion of variance (think of it as an \(R^2\) statistic) explained by a particular component. For Item 1, \((0.659)^2=0.434\) or \(43.4\%\) of its variance is explained by the first component. Subsequently, \((0.136)^2 = 0.018\) or \(1.8\%\) of the variance in Item 1 is explained by the second component. The total variance explained by both components is thus \(43.4\%+1.8\%=45.2\%\). If you keep going on adding the squared loadings cumulatively down the components, you find that it sums to 1 or 100%. This is also known as the communality , and in a PCA the communality for each item is equal to the total variance.

Summing the squared component loadings across the components (columns) gives you the communality estimates for each item, and summing each squared loading down the items (rows) gives you the eigenvalue for each component. For example, to obtain the first eigenvalue we calculate:

$$(0.659)^2 +  (-0.300)^2 + (-0.653)^2 + (0.720)^2 + (0.650)^2 + (0.572)^2 + (0.718)^2 + (0.568)^2 = 3.057$$

You will get eight eigenvalues for eight components, which leads us to the next table.

Total Variance Explained in the 8-component PCA

Recall that the eigenvalue represents the total amount of variance that can be explained by a given principal component. Starting from the first component, each subsequent component is obtained from partialling out the previous component. Therefore the first component explains the most variance, and the last component explains the least. Looking at the Total Variance Explained table, you will get the total variance explained by each component. For example, Component 1 has an eigenvalue of \(3.057\), which is \(3.057/8 = 38.21\%\) of the total variance. Because we extracted the same number of components as the number of items, the Initial Eigenvalues column is the same as the Extraction Sums of Squared Loadings column.

Choosing the number of components to extract

Since the goal of running a PCA is to reduce our set of variables down, it would be useful to have a criterion for selecting the optimal number of components, which is of course smaller than the total number of items. One criterion is to choose components that have eigenvalues greater than 1. Under the Total Variance Explained table, we see the first two components have an eigenvalue greater than 1. This can be confirmed by the Scree Plot, which plots the eigenvalue (total variance explained) by the component number. Recall that we checked the Scree Plot option under Extraction – Display, so the scree plot should be produced automatically.


The first component will always have the highest total variance and the last component will always have the least, but where do we see the largest drop? If you look at Component 2, you will see an “elbow” joint. This is the marking point where it’s perhaps not too beneficial to continue further component extraction. There are some conflicting definitions of the interpretation of the scree plot, but some say to take the number of components to the left of the “elbow”. Following this criterion we would pick only one component. A more subjective interpretation of the scree plot suggests that any number of components between 1 and 4 would be plausible, and further corroborative evidence would be helpful.

Some criteria say that the total variance explained by all components should be between 70% to 80% variance, which in this case would mean about four to five components. The authors of the book say that this may be untenable for social science research where extracted factors usually explain only 50% to 60%. Picking the number of components is a bit of an art and requires input from the whole research team. Let’s suppose we talked to the principal investigator and she believes that the two component solution makes sense for the study, so we will proceed with the analysis.

Running a PCA with 2 components in SPSS

Running the two component PCA is just as easy as running the 8 component solution. The only difference is under Fixed number of factors – Factors to extract you enter 2.


We will focus on the differences in the output between the eight- and two-component solutions. Under Total Variance Explained, we see that the Initial Eigenvalues no longer equal the Extraction Sums of Squared Loadings. The main difference is that there are only two rows of eigenvalues, and the cumulative percent variance goes up to \(51.54\%\).

Similarly, you will see that the Component Matrix has the same loadings as the eight-component solution but instead of eight columns it’s now two columns.

Again, we interpret Item 1 as having a correlation of 0.659 with Component 1. From glancing at the solution, we see that Item 4 has the highest correlation with Component 1 and Item 2 the lowest. Similarly, we see that Item 2 has the highest correlation with Component 2 and Item 7 the lowest.

Quick check:

True or False

  • The elements of the Component Matrix are correlations of the item with each component.
  • The sum of the squared eigenvalues is the proportion of variance under Total Variance Explained.
  • The Component Matrix can be thought of as correlations and the Total Variance Explained table can be thought of as \(R^2\).

1.T, 2.F (sum of squared loadings), 3. T

Communalities of the 2-component PCA

The communality is the sum of the squared component loadings up to the number of components you extract. In the SPSS output you will see a table of communalities.

Since PCA is an iterative estimation process, it starts with 1 as an initial estimate of the communality (since this is the total variance across all 8 components), and then proceeds with the analysis until a final communality is extracted. Notice that the Extraction column is smaller than the Initial column because we only extracted two components. As an exercise, let’s manually calculate the first communality from the Component Matrix. The first ordered pair is \((0.659,0.136)\), which represents the correlations of the first item with Component 1 and Component 2. Recall that squaring the loadings and summing down the components (columns) gives us the communality:

$$h^2_1 = (0.659)^2 + (0.136)^2 = 0.453$$

Going back to the Communalities table, if you sum down all 8 items (rows) of the Extraction column, you get \(4.123\). If you go back to the Total Variance Explained table and sum the first two eigenvalues, you also get \(3.057+1.067=4.124\). Is that surprising? Basically it’s saying that summing the communalities across all items is the same as summing the eigenvalues across all components.
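
A quick check of both calculations (the numbers are taken from the SPSS output quoted above):

```python
h2_item1 = 0.659 ** 2 + 0.136 ** 2     # squared loadings of Item 1 on the two components
print(round(h2_item1, 3))              # 0.453

print(3.057 + 1.067)                   # 4.124, matching the 4.123 sum of communalities up to rounding
```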

1. In a PCA, when would the communality for the Initial column be equal to the Extraction column?

Answer : When you run an 8-component PCA.

  • The eigenvalue represents the communality for each item.
  • For a single component, the sum of squared component loadings across all items represents the eigenvalue for that component.
  • The sum of eigenvalues for all the components is the total variance.
  • The sum of the communalities down the components is equal to the sum of eigenvalues down the items.

1. F, the eigenvalue is the total communality across all items for a single component, 2. T, 3. T, 4. F (you can only sum communalities across items, and sum eigenvalues across components, but if you do that they are equal).

Common Factor Analysis

The partitioning of variance differentiates a principal components analysis from what we call common factor analysis. Both methods try to reduce the dimensionality of the dataset down to fewer unobserved variables, but whereas PCA assumes that common variance takes up all of the total variance, common factor analysis assumes that total variance can be partitioned into common and unique variance. It is usually more reasonable to assume that you have not measured your set of items perfectly. The unobserved or latent variable that makes up common variance is called a factor, hence the name factor analysis. The other main difference between PCA and factor analysis lies in the goal of your analysis. If your goal is to simply reduce your variable list down into a linear combination of smaller components then PCA is the way to go. However, if you believe there is some latent construct that defines the interrelationship among items, then factor analysis may be more appropriate. In this case, we assume that there is a construct called SPSS Anxiety that explains why you see a correlation among all the items on the SAQ-8. We acknowledge, however, that SPSS Anxiety cannot explain all the shared variance among items in the SAQ, so we model the unique variance as well. Based on the results of the PCA, we will start with a two-factor extraction.

Running a Common Factor Analysis with 2 factors in SPSS

To run a factor analysis, use the same steps as running a PCA (Analyze – Dimension Reduction – Factor) except under Method choose Principal axis factoring. Note that we continue to set Maximum Iterations for Convergence at 100 and we will see why later.


Pasting the syntax into the SPSS Syntax Editor we get:

Note the main difference is under /EXTRACTION we list PAF for Principal Axis Factoring instead of PC for Principal Components. We will get three tables of output, Communalities, Total Variance Explained and Factor Matrix. Let’s go over each of these and compare them to the PCA output.

Communalities of the 2-factor PAF

The most striking difference between this communalities table and the one from the PCA is that the initial extraction is no longer one. Recall that for a PCA, we assume the total variance is completely taken up by the common variance or communality, and therefore we pick 1 as our best initial guess. What principal axis factoring does instead of guessing 1 as the initial communality is to choose the squared multiple correlation coefficient \(R^2\). To see this in action for Item 1, run a linear regression where Item 1 is the dependent variable and Items 2 to 8 are independent variables. Go to Analyze – Regression – Linear and enter q01 under Dependent and q02 to q08 under Independent(s).


Pasting the syntax into the Syntax Editor gives us:

The output we obtain from this analysis is

Note that 0.293 (highlighted in red) matches the initial communality estimate for Item 1. We can do eight more linear regressions in order to get all eight communality estimates, but SPSS already does that for us. Like PCA, factor analysis also uses an iterative estimation process to obtain the final estimates under the Extraction column. Finally, summing all the rows of the Extraction column, we get 3.00. This represents the total common variance shared among all items for a two-factor solution.
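
The squared multiple correlations can also be obtained directly from the correlation matrix, without running eight separate regressions, via the identity \(SMC_i = 1 - 1/(\mathbf{R}^{-1})_{ii}\). A small sketch (the name `R_saq8` is a placeholder for the SAQ-8 correlation matrix):

```python
import numpy as np

def squared_multiple_correlations(R):
    """SMC of each variable with all the others, from the correlation matrix R,
    using SMC_i = 1 - 1 / (R^{-1})_{ii}."""
    return 1.0 - 1.0 / np.diag(np.linalg.inv(R))

# smc = squared_multiple_correlations(R_saq8)   # initial communality estimates used by PAF
```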

Total Variance Explained (2-factor PAF)

The next table we will look at is Total Variance Explained. Comparing this to the table from the PCA, we notice that the Initial Eigenvalues are exactly the same and include 8 rows for each “factor”. In fact, SPSS simply borrows the information from the PCA analysis for use in the factor analysis, and the factors are actually components in the Initial Eigenvalues column. The main difference now is in the Extraction Sums of Squared Loadings column. We notice that each corresponding row in the Extraction column is lower than in the Initial column. This is expected because we assume that total variance can be partitioned into common and unique variance, which means the common variance explained will be lower. Factor 1 explains 31.38% of the variance whereas Factor 2 explains 6.24% of the variance. Just as in PCA, the more factors you extract, the less variance is explained by each successive factor.

A subtle note that may be easily overlooked is that when SPSS plots the scree plot or applies the eigenvalues-greater-than-1 criterion (Analyze – Dimension Reduction – Factor – Extraction), it bases them on the Initial and not the Extraction solution. This is important because the criterion here assumes no unique variance as in PCA, which means that this is the total variance explained, not accounting for specific or measurement error. Note that in the Extraction Sums of Squared Loadings column the second factor has an eigenvalue that is less than 1 but is still retained because the Initial value is 1.067. If you want to apply this criterion to the common variance explained, you would need to modify the criterion yourself.

fig09

  • In theory, when would the percent of variance in the Initial column ever equal the Extraction column?
  • True or False, in SPSS when you use the Principal Axis Factor method the scree plot uses the final factor analysis solution to plot the eigenvalues.

Answers: 1. When there is no unique variance (PCA assumes this whereas common factor analysis does not, so this is in theory and not in practice), 2. F, it uses the initial PCA solution and the eigenvalues assume no unique variance.

Factor Matrix (2-factor PAF)

First note the annotation that 79 iterations were required. If we had simply used the default 25 iterations in SPSS, we would not have obtained an optimal solution. This is why in practice it’s always good to increase the maximum number of iterations. Now let’s get into the table itself. The elements of the Factor Matrix table are called loadings and represent the correlation of each item with the corresponding factor. Just as in PCA, squaring each loading and summing down the items (rows) gives the total variance explained by each factor. Note that they are no longer called eigenvalues as in PCA. Let’s calculate this for Factor 1:

$$(0.588)^2 +  (-0.227)^2 + (-0.557)^2 + (0.652)^2 + (0.560)^2 + (0.498)^2 + (0.771)^2 + (0.470)^2 = 2.51$$

This number matches the first row under the Extraction column of the Total Variance Explained table. We can repeat this for Factor 2 and get matching results for the second row. Additionally, we can get the communality estimates by summing the squared loadings across the factors (columns) for each item. For example, for Item 1:

$$(0.588)^2 +  (-0.303)^2 = 0.437$$

Note that these results match the value of the Communalities table for Item 1 under the Extraction column. This means that the sum of squared loadings across factors represents the communality estimates for each item.

The relationship between the three tables

To see the relationships among the three tables let’s first start from the Factor Matrix (or Component Matrix in PCA). We will use the term factor to represent components in PCA as well. These elements represent the correlation of the item with each factor. Now, square each element to obtain squared loadings or the proportion of variance explained by each factor for each item. Summing the squared loadings across factors you get the proportion of variance explained by all factors in the model. This is known as common variance or communality, hence the result is the Communalities table. Going back to the Factor Matrix, if you square the loadings and sum down the items you get Sums of Squared Loadings (in PAF) or eigenvalues (in PCA) for each factor. These now become elements of the Total Variance Explained table. Summing down the rows (i.e., summing down the factors) under the Extraction column we get \(2.511 + 0.499 = 3.01\) or the total (common) variance explained. In words, this is the total (common) variance explained by the two factor solution for all eight items. Equivalently, since the Communalities table represents the total common variance explained by both factors for each item, summing down the items in the Communalities table also gives you the total (common) variance explained, in this case

$$ 0.437 + 0.052 + 0.319 + 0.460 + 0.344 + 0.309 + 0.851 + 0.236 = 3.01$$

which is the same result we obtained from the Total Variance Explained table. Here is a table that may help clarify what we’ve talked about:

fig12b

In summary:

  • Squaring the elements in the Factor Matrix gives you the squared loadings
  • Summing the squared loadings of the Factor Matrix across the factors gives you the communality estimates for each item in the Extraction column of the Communalities table.
  • Summing the squared loadings of the Factor Matrix down the items gives you the Sums of Squared Loadings (PAF) or eigenvalue (PCA) for each factor across all items.
  • Summing the eigenvalues or Sums of Squared Loadings in the Total Variance Explained table gives you the total common variance explained.
  • Summing down all items of the Communalities table is the same as summing the eigenvalues or Sums of Squared Loadings down all factors under the Extraction column of the Total Variance Explained table.
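As a numeric check on the relationships summarized above, here is a minimal R sketch using the psych package; the object names are ours, and the exact values will only match the tables if the same two-factor principal axis extraction is run on the same SAQ-8 data.

# Minimal sketch: derive the Communalities and Total Variance Explained
# quantities directly from an unrotated loading (Factor) matrix.
library(psych)
fit <- fa(saq8, nfactors = 2, fm = "pa", rotate = "none")   # two-factor PAF
L <- unclass(fit$loadings)        # 8 x 2 matrix of factor loadings

communalities <- rowSums(L^2)     # Extraction column of the Communalities table
ss_loadings   <- colSums(L^2)     # Extraction Sums of Squared Loadings per factor
sum(communalities)                # total common variance explained...
sum(ss_loadings)                  # ...is the same either way (about 3.01 here)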

True or False (the following assumes a two-factor Principal Axis Factor solution with 8 items)

  • The elements of the Factor Matrix represent correlations of each item with a factor.
  • Each squared element of Item 1 in the Factor Matrix represents the communality.
  • Summing the squared elements of the Factor Matrix down all 8 items within Factor 1 equals the first Sums of Squared Loading under the Extraction column of Total Variance Explained table.
  • Summing down all 8 items in the Extraction column of the Communalities table gives us the total common variance explained by both factors.
  • The total common variance explained is obtained by summing all Sums of Squared Loadings of the Initial column of the Total Variance Explained table
  • The total Sums of Squared Loadings in the Extraction column under the Total Variance Explained table represents the total variance which consists of total common variance plus unique variance.
  • In common factor analysis, the sum of squared loadings is the eigenvalue.

Answers: 1. T, 2. F, the sum of the squared elements across both factors, 3. T, 4. T, 5. F, sum all eigenvalues from the Extraction column of the Total Variance Explained table, 6. F, the total Sums of Squared Loadings represents only the total common variance excluding unique variance, 7. F, eigenvalues are only applicable for PCA.

Maximum Likelihood Estimation (2-factor ML)

Since this is a non-technical introduction to factor analysis, we won’t go into detail about the differences between Principal Axis Factoring (PAF) and Maximum Likelihood (ML). The main concept to know is that ML also assumes a common factor analysis using the \(R^2\) to obtain initial estimates of the communalities, but uses a different iterative process to obtain the extraction solution. To run a factor analysis using maximum likelihood estimation under Analyze – Dimension Reduction – Factor – Extraction – Method choose Maximum Likelihood.

fig10

Although the initial communalities are the same between PAF and ML, the final extraction loadings will be different, which means you will have different Communalities, Total Variance Explained, and Factor Matrix tables (although the Initial columns will overlap). The other main difference is that you will obtain a Goodness-of-fit Test table, which gives you an absolute test of model fit. Non-significant values suggest a good fitting model. Here the p-value is less than 0.05, so we reject the two-factor model.

In practice, you would obtain chi-square values for multiple factor analysis runs, which we tabulate below from 1 to 8 factors. The table shows the number of factors extracted (or attempted to extract) as well as the chi-square, degrees of freedom, p-value, and iterations needed to converge. Note that as you increase the number of factors, the chi-square value and degrees of freedom decrease, but the iterations needed and p-value increase. Practically, you want to make sure the number of iterations you specify exceeds the iterations needed. Additionally, NS means no solution and N/A means not applicable. In SPSS, no solution is obtained when you run 5 to 7 factors because the degrees of freedom would be negative (which cannot happen). The eight-factor solution is not even applicable in SPSS because it will produce the warning “You cannot request as many factors as variables with any extraction method except PC. The number of factors will be reduced by one.” This means that if you try to extract an eight-factor solution for the SAQ-8, it will default back to the 7-factor solution. Now that we understand the table, let’s see if we can find the threshold at which the absolute fit indicates a good fitting model. It looks like the p-value becomes non-significant at a 3-factor solution. Note that this differs from the eigenvalues-greater-than-1 criterion, which chose 2 factors, and using percent of variance explained you would choose 4-5 factors. We talk to the Principal Investigator and, at this point, we still prefer the two-factor solution. Note that there is no “right” answer in picking the best factor model, only what makes sense for your theory. We will talk about interpreting the factor loadings when we talk about factor rotation, to further guide us in choosing the correct number of factors.
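Outside SPSS, the same kind of exact fit test is reported by base R's factanal(), which fits the factor model by maximum likelihood; the data frame name saq8 is again a placeholder.

# Minimal sketch: ML factor analysis with its chi-square test of fit,
# for several candidate numbers of factors.
for (k in 1:4) {
  fit <- factanal(saq8, factors = k, rotation = "none")
  cat(k, "factor(s): chi-square =", round(fit$STATISTIC, 2),
      ", df =", fit$dof, ", p =", signif(fit$PVAL, 3), "\n")
}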

  • The Initial column of the Communalities table for the Principal Axis Factoring and the Maximum Likelihood method are the same given the same analysis.
  • Since they are both factor analysis methods, Principal Axis Factoring and the Maximum Likelihood method will result in the same Factor Matrix.
  • In SPSS, both Principal Axis Factoring and Maximum Likelihood methods give chi-square goodness of fit tests.
  • You can extract as many factors as there are items when using ML or PAF.
  • When looking at the Goodness-of-fit Test table, a p -value less than 0.05 means the model is a good fitting model.
  • In the Goodness-of-fit Test table, the lower the degrees of freedom the more factors you are fitting.

Answers: 1. T, 2. F, the two use the same starting communalities but a different estimation process to obtain extraction loadings, 3. F, only Maximum Likelihood gives you chi-square values, 4. F, you can extract as many components as items in PCA, but SPSS will only extract up to the total number of items minus 1, 5. F, greater than 0.05, 6. T, we are taking away degrees of freedom but extracting more factors.

Comparing Common Factor Analysis versus Principal Components

As we mentioned before, the main difference between common factor analysis and principal components is that factor analysis assumes total variance can be partitioned into common and unique variance, whereas principal components assumes common variance takes up all of total variance (i.e., no unique variance). For both methods, when you assume total variance is 1, the common variance becomes the communality. The communality is unique to each item, so if you have 8 items, you will obtain 8 communalities; and it represents the common variance explained by the factors or components. However in the case of principal components, the communality is the total variance of each item, and summing all 8 communalities gives you the total variance across all items. In contrast, common factor analysis assumes that the communality is a portion of the total variance, so that summing up the communalities represents the total common variance and not the total variance. In summary, for PCA, total common variance is equal to total variance explained , which in turn is equal to the total variance, but in common factor analysis, total common variance is equal to total variance explained but does not equal total variance.

fig11c
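The distinction can also be seen numerically. A minimal sketch with the psych package (data frame name assumed) compares the two sets of communalities:

# Minimal sketch: communalities under a components model vs. a common factor model.
library(psych)
pc  <- principal(saq8, nfactors = 8, rotate = "none")      # PCA keeping all 8 components
paf <- fa(saq8, nfactors = 2, fm = "pa", rotate = "none")  # 2-factor common factor analysis
sum(pc$communality)    # equals 8, the total variance of the standardized items
sum(paf$communality)   # total common variance only, well below 8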

The following applies to the SAQ-8 when theoretically extracting 8 components or factors for 8 items:

  • For each item, when the total variance is 1, the common variance becomes the communality.
  • In principal components, each communality represents the total variance across all 8 items.
  • In common factor analysis, the communality represents the common variance for each item.
  • The communality is unique to each factor or component.
  • For both PCA and common factor analysis, the sum of the communalities represent the total variance explained.
  • For PCA, the total variance explained equals the total variance, but for common factor analysis it does not.

Answers: 1. T, 2. F, the total variance for each item, 3. T, 4. F, communality is unique to each item (shared across components or factors), 5. T, 6. T.

Rotation Methods

After deciding on the number of factors to extract and which analysis model to use, the next step is to interpret the factor loadings. Factor rotations help us interpret factor loadings. There are two general types of rotations, orthogonal and oblique.

  • orthogonal rotation assumes the factors are independent or uncorrelated with each other
  • oblique rotation assumes the factors are correlated (not independent)

The goal of factor rotation is to improve the interpretability of the factor solution by reaching simple structure. 

Simple structure

Without rotation, the first factor is the most general factor onto which most items load and explains the largest amount of variance. This may not be desired in all cases. Suppose you wanted to know how well a set of items load on each  factor; simple structure helps us to achieve this.

The definition of simple structure is that in a factor loading matrix:

  • Each row should contain at least one zero.
  • For m factors, each column should have at least m zeroes (e.g., three factors, at least 3 zeroes per factor).

For every pair of factors (columns),

  • there should be several items whose entries approach zero in one column but have large loadings in the other.
  • a large proportion of items should have entries approaching zero.
  • only a small number of items have two non-zero entries.

The following table is an example of simple structure with three factors:

Let’s go down the checklist of criteria to see why it satisfies simple structure:

  • each row contains at least one zero (exactly two in each row)
  • each column contains at least three zeros (since there are three factors)
  • for every pair of factors, most items have zero on one factor and non-zeros on the other factor (e.g., looking at Factors 1 and 2, Items 1 through 6 satisfy this requirement)
  • for every pair of factors, all items have zero entries
  • for every pair of factors, none of the items have two non-zero entries

An easier criterion from Pedhazur and Schmelkin (1991) states that

  • each item has high loadings on one factor only
  • each factor has high loadings for only some of the items.

For the following factor matrix, explain why it does not conform to simple structure using both the conventional and Pedhazur test.

Solution: Using the conventional test, although Criteria 1 and 2 are satisfied (each row has at least one zero, each column has at least three zeroes), Criterion 3 fails because for Factors 2 and 3, only 3/8 rows have 0 on one factor and non-zero on the other. Additionally, for Factors 2 and 3, only Items 5 through 7 have non-zero loadings, or 3/8 rows have non-zero coefficients (failing Criteria 4 and 5 simultaneously). Using the Pedhazur criteria, Items 1, 2, 5, 6, and 7 have high loadings on two factors (failing the first criterion) and Factor 3 has high loadings on a majority, or 5/8, of the items (failing the second criterion).

Orthogonal Rotation (2 factor PAF)

We know that the goal of factor rotation is to rotate the factor matrix so that it can approach simple structure in order to improve interpretability. Orthogonal rotation assumes that the factors are not correlated. The benefit of doing an orthogonal rotation is that loadings are simple correlations of items with factors, and standardized solutions can estimate unique contribution of each factor. The most common type of orthogonal rotation is Varimax rotation. We will walk through how to do this in SPSS.

Running a two-factor solution (PAF) with Varimax rotation in SPSS

The steps to running a two-factor Principal Axis Factoring are the same as before (Analyze – Dimension Reduction – Factor – Extraction), except that under Rotation – Method we check Varimax. Make sure under Display to check Rotated Solution and Loading plot(s), and under Maximum Iterations for Convergence enter 100.

fig13

Pasting the syntax into the SPSS editor you obtain:

Let’s first talk about what tables are the same or different from running a PAF with no rotation. First, we know that the unrotated factor matrix (Factor Matrix table) should be the same. Additionally, since the  common variance explained by both factors should be the same, the Communalities table should be the same. The main difference is that we ran a rotation, so we should get the rotated solution (Rotated Factor Matrix) as well as the transformation used to obtain the rotation (Factor Transformation Matrix). Finally, although the total variance explained by all factors stays the same, the total variance explained by  each  factor will be different.

Rotated Factor Matrix (2-factor PAF Varimax)

The Rotated Factor Matrix table tells us what the factor loadings look like after rotation (in this case Varimax). Kaiser normalization is a method to obtain stability of solutions across samples: each row of the loading matrix is rescaled to unit length before rotation and rescaled back to its proper size afterwards, so that equal weight is given to all items when performing the rotation. The only drawback is that if the communality is low for a particular item, Kaiser normalization will weight that item equally with items having high communality. As such, Kaiser normalization is preferred when communalities are high across all items. You can turn off Kaiser normalization by specifying NOKAISER on the /CRITERIA subcommand of the FACTOR syntax.

Here is what the Varimax rotated loadings look like without Kaiser normalization. Compared to the rotated factor matrix with Kaiser normalization, the patterns look similar if you flip Factors 1 and 2; this may be an artifact of the rescaling. Another possible reason for the differences is the low communalities for Item 2 (0.052) and Item 8 (0.236), since Kaiser normalization weights these items equally with the other, high-communality items.
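For readers working in R rather than SPSS, base R's varimax() exposes the same choice through its normalize argument. A minimal sketch, assuming L is the 8 x 2 unrotated loading matrix from the extraction step:

# Minimal sketch: Varimax rotation with and without Kaiser (row) normalization.
rot_kaiser <- varimax(L, normalize = TRUE)    # normalize rows before rotating (the SPSS default)
rot_raw    <- varimax(L, normalize = FALSE)   # rotate the raw loadings instead
rot_kaiser$loadings   # rotated factor matrix
rot_kaiser$rotmat     # the transformation matrix used for the rotation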

Interpreting the factor loadings (2-factor PAF Varimax)

In the table above, the absolute loadings that are higher than 0.4 are highlighted in blue for Factor 1 and in red for Factor 2. We can see that Items 6 and 7 load highly onto Factor 1 and Items 1, 3, 4, 5, and 8 load highly onto Factor 2. Item 2 does not seem to load highly on any factor. Looking more closely at Item 6 “My friends are better at statistics than me” and Item 7 “Computers are useful only for playing games”, we don’t see a clear construct that defines the two. Item 2, “I don’t understand statistics”, may be too general an item and isn’t captured by SPSS Anxiety. It’s debatable at this point whether to retain a two-factor or one-factor solution; at the very minimum, we should see whether Item 2 is a candidate for deletion.

Factor Transformation Matrix and Factor Loading Plot (2-factor PAF Varimax)

The Factor Transformation Matrix tells us how the Factor Matrix was rotated. In SPSS, you will see a matrix with two rows and two columns because we have two factors.

How do we interpret this matrix? Well, we can see it as the way to move from the Factor Matrix to the Rotated Factor Matrix. From the Factor Matrix we know that the loading of Item 1 on Factor 1 is \(0.588\) and the loading of Item 1 on Factor 2 is \(-0.303\), which gives us the pair \((0.588,-0.303)\); but in the Rotated Factor Matrix the new pair is \((0.646,0.139)\). How do we obtain this new transformed pair of values? We can do what’s called matrix multiplication. The steps are essentially to start with one column of the Factor Transformation matrix, view it as another ordered pair and multiply matching ordered pairs. To get the first element, we can multiply the ordered pair in the Factor Matrix \((0.588,-0.303)\) with the matching ordered pair \((0.773,-0.635)\) in the first column of the Factor Transformation Matrix.

$$(0.588)(0.773)+(-0.303)(-0.635)=0.455+0.192=0.647.$$

To get the second element, we can multiply the ordered pair in the Factor Matrix \((0.588,-0.303)\) with the matching ordered pair \((0.635, 0.773)\) from the second column of the Factor Transformation Matrix:

$$(0.588)(0.635)+(-0.303)(0.773)=0.373-0.234=0.139.$$

Voila! We have obtained the new transformed pair with some rounding error. The figure below summarizes the steps we used to perform the transformation

fig18

The Factor Transformation Matrix can also tell us the angle of rotation if we take the inverse cosine of the diagonal element. In this case, the angle of rotation is \(\cos^{-1}(0.773) = 39.4^{\circ}\). In the factor loading plot, you can see what that angle of rotation looks like, starting from \(0^{\circ}\) and rotating counterclockwise by \(39.4^{\circ}\). Notice here that the newly rotated x- and y-axes are still at \(90^{\circ}\) from one another, hence the name orthogonal (a non-orthogonal or oblique rotation means that the new axes are no longer \(90^{\circ}\) apart). The points do not move in relation to the axes but rotate with them.

fig17b
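The same arithmetic can be written out in a few lines of R; the numbers below are the Item 1 loadings and the transformation matrix entries quoted above.

# Minimal sketch: rotating Item 1's loadings with the Factor Transformation Matrix.
item1 <- c(0.588, -0.303)                  # unrotated loadings of Item 1 on Factors 1 and 2
Tm <- matrix(c(0.773,  0.635,
              -0.635,  0.773), nrow = 2, byrow = TRUE)   # Factor Transformation Matrix
item1 %*% Tm                               # about (0.647, 0.139), the rotated loadings
acos(Tm[1, 1]) * 180 / pi                  # angle of rotation, about 39.4 degrees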

Total Variance Explained (2-factor PAF Varimax)

The Total Variance Explained table contains the same columns as the PAF solution with no rotation, but adds another set of columns called “Rotation Sums of Squared Loadings”. This makes sense because if our rotated Factor Matrix is different, the square of the loadings should be different, and hence the Sum of Squared loadings will be different for each factor. However, if you sum the Sums of Squared Loadings across all factors for the Rotation solution,

$$ 1.701 + 1.309 = 3.01$$

and for the unrotated solution,

$$ 2.511 + 0.499 = 3.01,$$

you will see that the two sums are the same. This is because rotation does not change the total common variance. Looking at the Rotation Sums of Squared Loadings for Factor 1, it still has the largest total variance, but now that shared variance is split more evenly.

Other Orthogonal Rotations

Varimax is the most popular orthogonal rotation, but it is only one among several. The benefit of Varimax rotation is that it maximizes the variances of the loadings within the factors while maximizing differences between high and low loadings on a particular factor. Higher loadings are made higher while lower loadings are made lower. This makes Varimax rotation good for achieving simple structure but not as good for detecting an overall factor, because it splits up the variance of major factors among lesser ones. Quartimax may be a better choice for detecting an overall factor. It maximizes the squared loadings so that each item loads most strongly onto a single factor.

Here is the output of the Total Variance Explained table juxtaposed side-by-side for Varimax versus Quartimax rotation.

You will see that whereas Varimax distributes the variances evenly across both factors, Quartimax tries to consolidate more variance into the first factor.

Equamax is a hybrid of Varimax and Quartimax, but because of this may behave erratically and according to Pett et al. (2003), is not generally recommended.

Oblique Rotation

In oblique rotation, the factors are no longer orthogonal to each other (the x and y axes are no longer at \(90^{\circ}\) to each other). Like orthogonal rotation, the goal is to rotate the reference axes about the origin to achieve a simpler and more meaningful factor solution compared to the unrotated solution. In oblique rotation, you will see three unique tables in the SPSS output:

  • factor pattern matrix contains partial standardized regression coefficients of each item with a particular factor
  • factor structure matrix contains simple zero order correlations of each item with a particular factor
  • factor correlation matrix is a matrix of intercorrelations among factors

Suppose the Principal Investigator hypothesizes that the two factors are correlated, and wishes to test this assumption. Let’s proceed with one of the most common types of oblique rotations in SPSS, Direct Oblimin.

Running a two-factor solution (PAF) with Direct Quartimin rotation in SPSS

The steps to running a Direct Oblimin rotation are the same as before (Analyze – Dimension Reduction – Factor – Extraction), except that under Rotation – Method we check Direct Oblimin. The other parameter we have to put in is delta, which defaults to zero. Technically, when delta = 0, this is known as Direct Quartimin. Larger positive values of delta increase the correlation among factors; however, in general you don’t want the correlations to be too high, or else there is no reason to split your factors up. In fact, SPSS caps the delta value at 0.8 (the cap for negative values is -9999). Negative delta values may lead to near-orthogonal factor solutions. For the purposes of this analysis, we will leave delta = 0 and do a Direct Quartimin analysis.

fig14

All the questions below pertain to Direct Oblimin in SPSS.

  • When selecting Direct Oblimin, delta = 0 is actually Direct Quartimin.
  • Smaller delta values will increase the correlations among factors.
  • You typically want your delta values to be as high as possible.

Answers: 1. T, 2. F, larger delta values, 3. F, higher delta leads to higher factor correlations; in general you don’t want factors to be too highly correlated

Factor Pattern Matrix (2-factor PAF Direct Quartimin)

The factor pattern matrix represents partial standardized regression coefficients of each item with a particular factor. For example, \(0.740\) is the effect of Factor 1 on Item 1 controlling for Factor 2, and \(-0.137\) is the effect of Factor 2 on Item 1 controlling for Factor 1. Just as in orthogonal rotation, the square of the loading represents the contribution of the factor to the variance of the item, but here it excludes the overlap between correlated factors. Factor 1 uniquely contributes \((0.740)^2=0.405=40.5\%\) of the variance in Item 1 (controlling for Factor 2), and Factor 2 uniquely contributes \((-0.137)^2=0.019=1.9\%\) of the variance in Item 1 (controlling for Factor 1).

Factor Structure Matrix (2-factor PAF Direct Quartimin)

The factor structure matrix represents the simple zero-order correlations of the items with each factor (it’s as if you ran a simple regression of a single factor on the outcome). For example, \(0.653\) is the simple correlation of Factor 1 with Item 1, and \(0.333\) is the simple correlation of Factor 2 with Item 1. The more correlated the factors, the greater the difference between the pattern and structure matrices and the more difficult it is to interpret the factor loadings. From this we can see that Items 1, 3, 4, 5, and 8 load highly onto Factor 1 and Items 6 and 7 load highly onto Factor 2. Item 2 doesn’t seem to load well on either factor.

Additionally, we can look at the variance explained by each factor not controlling for the other factors. For example, Factor 1 contributes \((0.653)^2=0.426=42.6\%\) of the variance in Item 1, and Factor 2 contributes \((0.333)^2=0.111=11.1\%\) of the variance in Item 1. Notice that the contribution of Factor 2 is higher (\(11.1\%\) vs. \(1.9\%\)) because in the Pattern Matrix we controlled for the effect of Factor 1, whereas in the Structure Matrix we did not.

Factor Correlation Matrix (2-factor PAF Direct Quartimin)

Recall that the more correlated the factors, the more difference between pattern and structure matrix and the more difficult to interpret the factor loadings. In our case, Factor 1 and Factor 2 are pretty highly correlated, which is why there is such a big difference between the factor pattern and factor structure matrices.

Factor plot

The difference between an orthogonal versus oblique rotation is that the factors in an oblique rotation are correlated. This means not only must we account for the angle of axis rotation \(\theta\), we have to account for the angle of correlation \(\phi\). The angle of axis rotation is defined as the angle between the rotated and unrotated axes (blue and black axes). From the Factor Correlation Matrix, we know that the correlation is \(0.636\), so the angle of correlation is \(cos^{-1}(0.636) = 50.5^{\circ}\), which is the angle between the two rotated axes (blue x and blue y-axis). The sum of rotations \(\theta\) and \(\phi\) is the total angle rotation. We are not given the angle of axis rotation, so we only know that the total angle rotation is \(\theta + \phi = \theta + 50.5^{\circ}\).

fig19c

Relationship between the Pattern and Structure Matrix

The structure matrix is in fact a derivative of the pattern matrix. If you multiply the pattern matrix by the factor correlation matrix, you will get back the factor structure matrix. Let’s take the example of the ordered pair \((0.740,-0.137)\) from the Pattern Matrix, which represents the partial correlation of Item 1 with Factors 1 and 2 respectively. Performing matrix multiplication for the first column of the Factor Correlation Matrix we get

$$ (0.740)(1) + (-0.137)(0.636) = 0.740 - 0.087 = 0.653.$$

Similarly, we multiply the ordered pair with the second column of the Factor Correlation Matrix to get:

$$ (0.740)(0.636) + (-0.137)(1) = 0.471 -0.137 =0.333 $$

Looking at the first row of the Structure Matrix we get \((0.653,0.333)\) which matches our calculation! This neat fact can be depicted with the following figure:

fig21
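In R this is a one-line matrix product; the values below are the Item 1 pattern coefficients and the factor correlation quoted above.

# Minimal sketch: a Structure Matrix row is the Pattern Matrix row
# post-multiplied by the Factor Correlation Matrix.
pattern_item1 <- c(0.740, -0.137)          # Item 1 pattern coefficients for Factors 1 and 2
Phi <- matrix(c(1.000, 0.636,
                0.636, 1.000), nrow = 2)   # factor correlation matrix
pattern_item1 %*% Phi                      # about (0.653, 0.333), the Structure Matrix row for Item 1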

As a quick aside, suppose that the factors are orthogonal, which means that the factor correlation matrix has 1s on the diagonal and zeros on the off-diagonal. A quick calculation with the ordered pair \((0.740,-0.137)\) gives

$$ (0.740)(1) + (-0.137)(0) = 0.740$$

and similarly,

$$ (0.740)(0) + (-0.137)(1) = -0.137$$

and you get back the same ordered pair. This is called multiplying by the identity matrix (think of it as multiplying \(2*1 = 2\)).

  • Without changing your data or model, how would you make the factor pattern matrices and factor structure matrices more aligned with each other?
  • True or False, When you decrease delta, the pattern and structure matrix will become closer to each other.

Answers: 1. Decrease the delta values so that the correlation between factors approaches zero. 2. T, the correlations will become more orthogonal and hence the pattern and structure matrix will be closer.

Total Variance Explained (2-factor PAF Direct Quartimin)

The column Extraction Sums of Squared Loadings is the same as in the unrotated solution, but we have an additional column known as Rotation Sums of Squared Loadings. SPSS itself notes that “when factors are correlated, sums of squared loadings cannot be added to obtain total variance”. You will note that, compared to the Extraction Sums of Squared Loadings, the Rotation Sums of Squared Loadings are only slightly lower for Factor 1 but much higher for Factor 2. This is because, unlike orthogonal rotation, these are no longer the unique contributions of Factor 1 and Factor 2. How do we obtain the Rotation Sums of Squared Loadings? SPSS squares the Structure Matrix and sums down the items.

As a demonstration, let’s square and sum the Structure Matrix loadings for Factor 1:

$$ (0.653)^2 + (-0.222)^2 + (-0.559)^2 + (0.678)^2 + (0.587)^2 + (0.398)^2 + (0.577)^2 + (0.485)^2 = 2.318.$$

Note that \(2.318\) matches the Rotation Sums of Squared Loadings for the first factor. This means that the Rotation Sums of Squared Loadings represent the non-unique contribution of each factor to total common variance, and summing these squared loadings across all factors can lead to estimates that are greater than the total common variance.

Interpreting the factor loadings (2-factor PAF Direct Quartimin)

Finally, let’s conclude by interpreting the factor loadings more carefully. Let’s compare the Pattern Matrix and Structure Matrix tables side-by-side. First we highlight absolute loadings that are higher than 0.4 in blue for Factor 1 and in red for Factor 2. We see that the absolute loadings in the Pattern Matrix are in general higher for Factor 1 compared to the Structure Matrix and lower for Factor 2. This makes sense because the Pattern Matrix partials out the effect of the other factor. Looking at the Pattern Matrix, Items 1, 3, 4, 5, and 8 load highly on Factor 1, and Items 6 and 7 load highly on Factor 2. Looking at the Structure Matrix, Items 1, 3, 4, 5, 7, and 8 load highly onto Factor 1 and Items 3, 4, and 7 load highly onto Factor 2. Item 2 doesn’t seem to load on any factor. The results of the two matrices are somewhat inconsistent, but this can be explained by the fact that in the Structure Matrix Items 3, 4, and 7 load onto both factors fairly evenly, whereas they do not in the Pattern Matrix. For this particular analysis, it seems to make more sense to interpret the Pattern Matrix because it’s clear that Factor 1 contributes uniquely to most items in the SAQ-8 and Factor 2 contributes common variance only to two items (Items 6 and 7). There is an argument here that perhaps Item 2 should be eliminated from the survey and the factors consolidated into one SPSS Anxiety factor. We talk to the Principal Investigator and we think it’s feasible to accept SPSS Anxiety as the single factor explaining the common variance in all the items, but we choose to remove Item 2, so that the SAQ-8 is now the SAQ-7.

  • In oblique rotation, an element of the factor pattern matrix is the unique contribution of the factor to the item, whereas an element of the factor structure matrix is the non-unique contribution of the factor to an item.
  • In the Total Variance Explained table, the Rotation Sum of Squared Loadings represent the unique contribution of each factor to total common variance.
  • The Pattern Matrix can be obtained by multiplying the Structure Matrix with the Factor Correlation Matrix
  • If the factors are orthogonal, then the Pattern Matrix equals the Structure Matrix
  • In oblique rotations, the sum of squared loadings for each item across all factors is equal to the communality (in the SPSS Communalities table) for that item.

Answers: 1. T, 2. F, represent the non -unique contribution (which means the total sum of squares can be greater than the total communality), 3. F, the Structure Matrix is obtained by multiplying the Pattern Matrix with the Factor Correlation Matrix, 4. T, it’s like multiplying a number by 1, you get the same number back, 5. F, this is true only for orthogonal rotations, the SPSS Communalities table in rotated factor solutions is based off of the unrotated solution, not the rotated solution.

As a special note, did we really achieve simple structure? Although rotation helps us achieve simple structure, if the interrelationships among the items do not themselves hold up to simple structure, we can only modify our model. In this case we chose to remove Item 2 from our model.

Promax Rotation

Promax rotation begins with a Varimax (orthogonal) rotation and then raises the loadings to a power kappa, which shrinks the small loadings toward zero. Promax also runs faster than Direct Oblimin; in our example Promax took 3 iterations while Direct Quartimin (Direct Oblimin with delta = 0) took 5 iterations.

  • Varimax, Quartimax and Equamax are three types of orthogonal rotation and Direct Oblimin, Direct Quartimin and Promax are three types of oblique rotations.

Answers: 1. T.

Generating Factor Scores

Suppose the Principal Investigator is happy with the final factor analysis, which was the two-factor Direct Quartimin solution. She has a hypothesis that SPSS Anxiety and Attribution Bias predict student scores on an introductory statistics course, so she would like to use the factor scores as predictors in this new regression analysis. Since a factor is by nature unobserved, we need to first predict or generate plausible factor scores. In SPSS, there are three methods of factor score generation: Regression, Bartlett, and Anderson-Rubin.

Generating factor scores using the Regression Method in SPSS

In order to generate factor scores, run the same factor analysis model but click on Factor Scores (Analyze – Dimension Reduction – Factor – Factor Scores). Then check Save as variables, pick the Method and optionally check Display factor score coefficient matrix.

fig25

The code pasted in the SPSS Syntax Editor looks like this:

Here we picked the Regression approach after fitting our two-factor Direct Quartimin solution. After generating the factor scores, SPSS will add two extra variables to the end of your variable list, which you can view via Data View. SPSS names these FAC1_1 and FAC2_1 for the first and second factors; the figure below shows what this looks like for the first 5 participants. These are now ready to be entered in another analysis as predictors.

fig26

For those who want to understand how the scores are generated, we can refer to the Factor Score Coefficient Matrix. These are essentially the regression weights that SPSS uses to generate the scores. We know that the ordered pair of scores for the first participant is \(-0.880, -0.113\). We also know that the 8 scores for the first participant are \(2, 1, 4, 2, 2, 2, 3, 1\). However, what SPSS uses is actually the standardized scores, which can be easily obtained in SPSS by using Analyze – Descriptive Statistics – Descriptives – Save standardized values as variables. The standardized scores obtained are:   \(-0.452, -0.733, 1.32, -0.829, -0.749, -0.2025, 0.069, -1.42\). Using the Factor Score Coefficient matrix, we multiply the participant scores by the coefficient matrix for each column. For the first factor:

$$ \begin{eqnarray} &(0.284)(-0.452) + (-0.048)(-0.733) + (-0.171)(1.32) + (0.274)(-0.829) \\ &+ (0.197)(-0.749) + (0.048)(-0.2025) + (0.174)(0.069) + (0.133)(-1.42) \\ &= -0.880, \end{eqnarray} $$

which matches FAC1_1  for the first participant. You can continue this same procedure for the second factor to obtain FAC2_1.
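The weighting can be verified with a short calculation; the vectors below simply copy the standardized scores and the Factor 1 coefficients quoted above, using R as a calculator.

# Minimal sketch: a regression-method factor score is the weighted sum of the
# standardized item scores, with weights from the Factor Score Coefficient Matrix.
z  <- c(-0.452, -0.733, 1.32, -0.829, -0.749, -0.2025, 0.069, -1.42)   # standardized item scores
w1 <- c( 0.284, -0.048, -0.171, 0.274, 0.197, 0.048, 0.174, 0.133)     # Factor 1 coefficients
sum(z * w1)   # about -0.880, matching FAC1_1 for the first participant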

The second table is the Factor Score Covariance Matrix,

This table can be interpreted as the covariance matrix of the factor scores; however, it would only be equal to the raw covariance if the factors were orthogonal. For example, if we obtained the raw covariance matrix of the factor scores we would get

You will notice that these values are much lower. Let’s compare the same two tables but for Varimax rotation:

If you compare these elements to the Covariance table below, you will notice they are the same.

Note with the Bartlett and Anderson-Rubin methods you will not obtain the Factor Score Covariance matrix.

Regression, Bartlett and Anderson-Rubin compared

Among the three methods, each has its pluses and minuses. The regression method maximizes the correlation (and hence validity) between the factor scores and the underlying factor but the scores can be somewhat biased. This means even if you have an orthogonal solution, you can still have correlated factor scores. For Bartlett’s method, the factor scores highly correlate with its own factor and not with others, and they are an unbiased estimate of the true factor score. Unbiased scores means that with repeated sampling of the factor scores, the average of the scores is equal to the average of the true factor score. The Anderson-Rubin method perfectly scales the factor scores so that the factor scores are uncorrelated with other factors and uncorrelated with other factor scores . Since Anderson-Rubin scores impose a correlation of zero between factor scores, it is not the best option to choose for oblique rotations. Additionally, Anderson-Rubin scores are biased.

In summary, if you do an orthogonal rotation, you can pick any of the three methods. For orthogonal rotations, use Bartlett if you want unbiased scores, use the Regression method if you want to maximize validity, and use Anderson-Rubin if you want the factor scores themselves to be uncorrelated with other factor scores. If you do oblique rotations, it’s preferable to stick with the Regression method. Do not use Anderson-Rubin for oblique rotations.
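If you are working in R rather than SPSS, the psych package offers analogous choices. A minimal sketch, with the data frame name and the fitted model assumed (an orthogonal Varimax solution here, so that all three methods are appropriate):

# Minimal sketch: generating factor scores by the three methods discussed above.
library(psych)
fit <- fa(saq8, nfactors = 2, fm = "pa", rotate = "varimax")
sc_regression <- factor.scores(saq8, fit, method = "Thurstone")   # regression method
sc_bartlett   <- factor.scores(saq8, fit, method = "Bartlett")
sc_anderson   <- factor.scores(saq8, fit, method = "Anderson")    # Anderson-Rubin
head(sc_regression$scores)   # scores are returned as an n x 2 matrix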

  • If you want the highest correlation of the factor score with the corresponding factor (i.e., highest validity), choose the regression method.
  • Bartlett scores are unbiased whereas Regression and Anderson-Rubin scores are biased.
  • Anderson-Rubin is appropriate for orthogonal but not for oblique rotation because factor scores will be uncorrelated with other factor scores.

Answers: 1. T, 2. T, 3. T


A Primer on Factor Analysis in Research using Reproducible R Software

Abdisalam Hassan Muse (PhD)

Amoud University

This primer provides an overview of factor analysis in research, covering the meaning and assumptions of factor analysis, as well as the differences between exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). The procedure for conducting factor analysis is explained, with a focus on the role of the correlation matrix and a general model of the correlation matrix of individual variables. The paper covers methods for extracting factors, including principal component analysis (PCA) and determining the number of factors to be extracted, such as the comprehensibility, Kaiser Criterion, variance explained criteria, Cattell’s scree plot, and Horn’s parallel analysis (PA). The meaning and interpretation of communality and eigenvalues are discussed, as well as factor loading and rotation methods such as varimax. The paper also covers the meaning and interpretation of factor scores and their use in subsequent analyses. The R software is used throughout the paper to provide reproducible examples and code for conducting factor analysis.

Introduction

Factor analysis is a statistical technique commonly used in research to identify underlying dimensions or constructs that explain the variability among a set of observed variables. It is often used to reduce the complexity of a dataset by summarizing a large number of variables into a smaller set of factors that are easier to understand and analyze. Factor analysis is widely used in fields such as psychology, education, marketing, and social sciences to explore the relationships between variables and to identify underlying latent constructs.

In this tutorial paper, we will provide an overview of factor analysis, including its meaning and assumptions, the differences between exploratory factor analysis (EFA) and confirmatory factor analysis (CFA), and the procedure for conducting factor analysis. We will also cover the role of the correlation matrix and a general model of the correlation matrix of individual variables.

The paper will discuss methods for extracting factors, including principal component analysis (PCA), and determining the number of factors to be extracted using criteria such as the comprehensibility, Kaiser Criterion, variance explained criteria, Cattell’s scree plot, and Horn’s parallel analysis (PA). The meaning and interpretation of communality and eigenvalues will be discussed, as well as factor loading and rotation methods such as varimax.

Finally, the paper will cover the meaning and interpretation of factor scores and their use in subsequent analyses. The R software will be used throughout the paper to provide reproducible examples and code for conducting factor analysis. By the end of this tutorial paper, readers will have a better understanding of the fundamentals of factor analysis and how to apply it in their research.

Module I: Factor Analysis in Research

Meaning of Factor Analysis

Factor analysis is a statistical method that is widely used in research to identify the underlying factors that explain the variations in a set of observed variables. The method is particularly useful in fields such as psychology, sociology, marketing, and education, where researchers often deal with complex datasets that contain many variables. The basic idea behind factor analysis is to identify the common factors that underlie a set of observed variables. By identifying these factors, researchers can reduce the number of variables they need to analyze, simplify the data, and gain insights into the underlying structure of the data.

Factor analysis can be used in two main ways: exploratory and confirmatory.

Exploratory factor analysis is used when the researcher does not have a priori knowledge of the underlying factors and wants to identify them from the data.

Confirmatory factor analysis , on the other hand, is used when the researcher has a specific hypothesis about the underlying factors and wants to test this hypothesis using the data.

Factor analysis has several advantages over other statistical methods. It can help researchers identify the most important variables in a dataset, reduce the number of variables they need to analyze, and provide insights into the relationships between variables. However, it also has some limitations and assumptions that must be taken into account when applying the method.

In this primer or tutorial paper, we will provide an overview of factor analysis, its applications in research, and the steps involved in performing factor analysis. We will also discuss the assumptions and limitations of the method, as well as methods for interpreting and visualizing the results. Finally, we will provide several examples of factor analysis in different fields of research, illustrating how the method can be used to extract meaningful information from complex datasets.

Assumptions of Factor Analysis

Factor analysis is a statistical technique that is used to identify the underlying factors that explain the correlations between a set of observed variables. In order to obtain valid results from factor analysis, certain assumptions must be met. Here are some of the key assumptions of factor analysis:

Normality : Factor analysis assumes that the data is normally distributed. If the data is not normally distributed, then the results of the analysis may be biased or unreliable. Normality can be checked using statistical tests such as the Shapiro-Wilk test or the Kolmogorov-Smirnov test.

Linearity : Factor analysis assumes that the relationships between the observed variables and the underlying factors are linear. If the relationships are non-linear, then the results of the analysis may be biased or unreliable.

Sample size : Factor analysis assumes that the sample size is sufficient to obtain reliable estimates of the factor model. A rule of thumb is to have at least 10 observations per variable, although some researchers recommend a larger sample size.

Absence of multicollinearity : Factor analysis assumes that there is no multicollinearity among the observed variables. Multicollinearity occurs when two or more variables are highly correlated with each other, which can lead to unstable estimates of the factor model.

Adequate factor loading : Factor analysis assumes that there are strong associations (i.e., factor loadings) between the observed variables and the underlying factors. Weak factor loadings may indicate that the observed variables are not good indicators of the underlying factors, or that there are too few factors in the model.

In summary, factor analysis is a powerful technique for identifying the underlying factors that explain the correlations between a set of observed variables. However, the assumptions of normality, linearity, sample size, absence of multicollinearity, and adequate factor loading must be met in order to obtain valid results.
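As a minimal illustration of how a few of these checks might look in R (the data frame mydata and its columns are placeholders for your own item data):

# Minimal sketch of pre-analysis screening for factor analysis.
apply(mydata, 2, function(x) shapiro.test(x)$p.value)   # univariate normality per item
round(cor(mydata, use = "pairwise.complete.obs"), 2)    # inspect inter-item correlations
nrow(mydata) / ncol(mydata)                             # observations per variable (rule of thumb: at least 10)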

EFA and CFA Factor Analysis Procedures

Exploratory Factor Analysis (EFA) and Confirmatory Factor Analysis (CFA) are two types of factor analysis procedures that are used to identify the underlying factors that explain the correlations between a set of observed variables.

Exploratory Factor Analysis (EFA) is used when there is no prior theory about the underlying factors, and the goal is to identify the factors that explain the correlations between variables. In EFA, the researcher starts with a set of observed variables and uses statistical techniques to identify the most important factors that explain the correlations between them. The researcher does not have any preconceived notion about the number of factors or how they are related to each other. The aim is to identify the underlying structure of the data and to reduce the number of variables that need to be analyzed.

Confirmatory Factor Analysis (CFA), on the other hand, is used when there is a specific theory about the underlying factors, and the goal is to test this theory using the data. In CFA, the researcher starts with a pre-specified model that specifies the number of factors and how they are related to each other. The aim is to test the theory and to determine whether the observed data fit the model. The researcher tests the model using a variety of statistical techniques and evaluates the goodness-of-fit of the model.

EFA is an unsupervised learning method that is used to explore the data and identify the underlying structure. The goal of EFA is to identify the most important factors that explain the correlations between variables and to reduce the number of variables that need to be analyzed. The researcher does not have any preconceived notion about the number of factors or how they are related to each other. EFA involves several steps, such as selecting the appropriate method for factor extraction, determining the number of factors to retain, and selecting the method for factor rotation. The goal is to identify the simplest factor structure that best explains the data.

CFA, on the other hand, is a supervised learning method that is used to confirm or refute a pre-specified theory. The goal of CFA is to evaluate the degree to which the observed data fit the pre-specified model that specifies the proposed number of factors and their relationships. The researcher first creates a model that specifies the proposed number of factors and their relationships, and then tests the fit of the model to the observed data. The researcher can use a variety of statistical techniques to evaluate the goodness-of-fit of the model, such as chi-square tests, comparative fit index (CFI), Tucker-Lewis Index (TLI), and root mean square error of approximation (RMSEA).

Both EFA and CFA require the researcher to consider several assumptions of factor analysis, such as normality, linearity, absence of multicollinearity, and adequate factor loading. Violations of these assumptions can result in biased or unreliable results. Therefore, it is important to conduct appropriate data screening, model testing, and model modification to ensure that the assumptions are met.

In summary, EFA is an exploratory technique used to identify the underlying factors that explain the correlations between variables, while CFA is a confirmatory technique used to test a pre-specified theory about the underlying factors and their relationships. Both procedures require careful consideration of the assumptions of factor analysis and appropriate statistical techniques for model evaluation.

Comparison between EFA and CFA

Note that EFA and CFA are both types of factor analysis, but they differ in their goals, assumptions, and methods. EFA is used to explore the underlying structure of a set of observed variables, while CFA is used to test a specific model of the relationships between the observed variables and latent factors. EFA allows the number of factors to be determined by the data, while CFA requires the number of factors to be specified a priori. In EFA every item is free to load on every factor, while in CFA the pattern of loadings (which items load on which factors) is fixed a priori. EFA is exploratory and can be used to generate hypotheses, while CFA is confirmatory and can be used to test specific hypotheses.
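Since the primer uses R throughout, a minimal sketch of the two approaches is given below; the data frame, item names, and the CFA model string are illustrative placeholders rather than results from the paper.

# Minimal sketch: EFA with the psych package, CFA with the lavaan package.
library(psych)
library(lavaan)

# EFA: let the data suggest the number and composition of the factors
efa_fit <- fa(mydata, nfactors = 2, fm = "pa", rotate = "varimax")
print(efa_fit$loadings, cutoff = 0.4)

# CFA: test a pre-specified measurement model (factor and item names are hypothetical)
model <- '
  factor1 =~ item1 + item2 + item3
  factor2 =~ item4 + item5 + item6
'
cfa_fit <- cfa(model, data = mydata)
summary(cfa_fit, fit.measures = TRUE)   # chi-square, CFI, TLI, RMSEA, etc.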

Real-life Examples

Here are some real-life examples and research titles that illustrate the differences between exploratory factor analysis (EFA) and confirmatory factor analysis (CFA):

In summary, EFA and CFA are both useful techniques for analyzing data and exploring the underlying structure of observed variables. The choice between EFA and CFA depends on the research question, the goals of the analysis, and the availability of prior knowledge or theoretical frameworks. EFA is useful for exploratory analyses where the underlying structure of the observed variables is unknown or needs to be better understood, while CFA is useful for confirmatory analyses where a pre-specified model of the relationships between the observed variables and latent constructs is available or needs to be tested.

1. Education sector:

Real-life example : A researcher is interested in understanding the factors that influence student engagement in online learning. They collect data on various variables such as perceived usefulness, ease of use, and social presence.

Research title for EFA : “Exploring the underlying factors of student engagement in online learning: An exploratory factor analysis approach”

Research title for CFA : “Testing a model of student engagement in online learning: A confirmatory factor analysis approach”

Explanation : In this example, the researcher may use EFA to explore the underlying structure of the observed variables related to student engagement in online learning and generate hypotheses about the relationships between the observed variables and latent factors. Alternatively, the researcher may use CFA to test a pre-specified model of the relationships between the observed variables and latent constructs related to student engagement in online learning. In this case, the research title would reflect the approach used in the analysis.

2. Psychology sector:

Real-life example : A psychologist is interested in understanding the factors that contribute to anxiety in adolescents. They collect data on various variables such as stress, self-esteem, and social support.

Research title for EFA : “Identifying the underlying factors of anxiety in adolescents: An exploratory factor analysis approach”

Research title for CFA : “Testing a model of anxiety in adolescents: A confirmatory factor analysis approach”

Explanation : In this example, the psychologist may use EFA to identify the underlying structure of the observed variables related to anxiety in adolescents and generate hypotheses about the relationships between the observed variables and latent factors. Alternatively, the psychologist may use CFA to test a pre-specified model of the relationships between the observed variables and latent constructs related to anxiety in adolescents. In this case, the research title would reflect the approach used in the analysis.

3. Law sector:

Real-life example : A law firm is interested in understanding the factors that contribute to job satisfaction among their employees. They collect data on various variables such as work-life balance, compensation, and career advancement opportunities.

Research title for EFA : “Exploring the underlying factors of job satisfaction among law firm employees: An exploratory factor analysis approach”

Research title for CFA : “Testing a model of job satisfaction among law firm employees: A confirmatory factor analysis approach”

Explanation : In this example, the law firm may use EFA to explore the underlying structure of the observed variables related to job satisfaction among their employees and generate hypotheses about the relationships between the observed variables and latent factors. Alternatively, the law firm may use CFA to test a pre-specified model of the relationships between the observed variables and latent constructs related to job satisfaction among their employees. In this case, the research title would reflect the approach used in the analysis.

4. Medicine sector:

Real-life example : A physician is interested in understanding the factors that contribute to patient satisfaction with their healthcare experience. They collect data on various variables such as communication with healthcare providers, access to care, and quality of care.

Research title for EFA : “Identifying the underlying factors of patient satisfaction with healthcare: An exploratory factor analysis approach”

Research title for CFA : “Testing a model of patient satisfaction with healthcare: A confirmatory factor analysis approach”

Explanation : In this example, the physician may use EFA to identify the underlying structure of the observed variables related to patient satisfaction with healthcare and generate hypotheses about the relationships between the observed variables and latent factors. Alternatively, the physician may use CFA to test a pre-specified model of the relationships between the observed variables and latent constructs related to patient satisfaction with healthcare. In this case, the research title would reflect the approach used in the analysis.

5. Engineering sector:

Real-life example : A company is interested in understanding the factors that contribute to customer satisfaction with their products. They collect data on various variables such as product quality, design, and reliability.

Research title for EFA : “Exploring the underlying factors of customer satisfaction with engineering products: An exploratory factor analysis approach”

Research title for CFA : “Developing and validating a model of customer satisfaction with engineering products: A confirmatory factor analysis approach”

Explanation : In this example, the company may use EFA to explore the underlying structure of the observed variables related to customer satisfaction with their engineering products and generate hypotheses about the relationships between the observed variables and latent factors. They may then use the results of the EFA to develop a new customer satisfaction survey. Alternatively, the company may use CFA to test a pre-specified model of the relationships between the observed variables and latent constructs related to customer satisfaction with their engineering products, and validate the new customer satisfaction survey. In this case, the research title would reflect the approach used in the analysis.

6. Public health sector:

Real-life example : A public health researcher is interested in understanding the factors that contribute to health-related quality of life among older adults. They collect data on various variables such as physical functioning, mental health, and social support.

Research title for EFA : “Exploring the underlying factors of health-related quality of life among older adults: An exploratory factor analysis approach”

Research title for CFA : “Testing a model of health-related quality of life among older adults: A confirmatory factor analysis approach”

Explanation : In this example, the public health researcher may use EFA to explore the underlying structure of the observed variables related to health-related quality of life among older adults and generate hypotheses about the relationships between the observed variables and latent factors. Alternatively, the researcher may use CFA to test a pre-specified model of the relationships between the observed variables and latent constructs related to health-related quality of life among older adults. In this case, the research title would reflect the approach used in the analysis.

7. Finance sector:

Real-life example : A financial analyst is interested in understanding the factors that contribute to stock prices. They collect data on various variables such as earnings per share, market capitalization, and price-earnings ratio.

Research title for EFA : “Exploring the underlying factors of stock prices: An exploratory factor analysis approach”

Research title for CFA : “Testing a model of stock prices: A confirmatory factor analysis approach”

Explanation : In this example, the financial analyst may use EFA to explore the underlying structure of the observed variables related to stock prices and generate hypotheses about the relationships between the observed variables and latent factors. Alternatively, the analyst may use CFA to test a pre-specified model of the relationships between the observed variables and latent constructs related to stock prices. In this case, the research title would reflect the approach used in the analysis.

8. Project Management sector:

Real-life example : A project manager is interested in understanding the factors that contribute to project success. They collect data on various variables such as project scope, budget, and stakeholder engagement.

Research title for EFA : “Exploring the underlying factors of project success: An exploratory factor analysis approach”

Research title for CFA : “Testing a model of project success: A confirmatory factor analysis approach”

Explanation : In this example, the project manager may use EFA to explore the underlying structure of the observed variables related to project success and generate hypotheses about the relationships between the observed variables and latent factors. Alternatively, the manager may use CFA to test a pre-specified model of the relationships between the observed variables and latent constructs related to project success. In this case, the research title would reflect the approach used in the analysis.

9. Monitoring and Evaluation (M&E) sector:

Real-life example : An M&E specialist is interested in understanding the factors that contribute to program effectiveness. They collect data on various variables such as program inputs, activities, and outcomes.

Research title for EFA : “Identifying the underlying factors of program effectiveness: An exploratory factor analysis approach”

Research title for CFA : “Testing a model of program effectiveness: A confirmatory factor analysis approach”

Explanation : In this example, the M&E specialist may use EFA to identify the underlying structure of the observed variables related to program effectiveness and generate hypotheses about the relationships between the observed variables and latent factors. Alternatively, the specialist may use CFA to test a pre-specified model of the relationships between the observed variables and latent constructs related to program effectiveness. In this case, the research title would reflect the approach used in the analysis.

10. Data Science sector:

Real-life example : A data scientist is interested in understanding the factors that contribute to customer churn. They collect data on various variables such as customer demographics, usage patterns, and customer service interactions.

Research title for EFA : “Exploring the underlying factors of customer churn: An exploratory factor analysis approach”

Research title for CFA : “Testing a model of customer churn: A confirmatory factor analysis approach”

Explanation : In this example, the data scientist may use EFA to explore the underlying structure of the observed variables related to customer churn and generate hypotheses about the relationships between the observed variables and latent factors. Alternatively, the scientist may use CFA to test a pre-specified model of the relationships between the observed variables and latent constructs related to customer churn. In this case, the research title would reflect the approach used in the analysis.

Overall, the choice between EFA and CFA depends on the research question, the goals of the analysis, and the availability of prior knowledge or theoretical frameworks, regardless of the sector in which the research is being conducted. EFA is useful for exploratory analyses where the underlying structure of the observed variables is unknown or needs to be better understood, while CFA is useful for confirmatory analyses where a pre-specified model of the relationships between the observed variables and latent constructs is available or needs to be tested.

Module II: Correlation Matrix

Role of a correlation matrix in factor analysis.

The correlation matrix is a critical component in factor analysis as it provides information about the relationships between the observed variables. In factor analysis, the goal is to identify the underlying factors that explain the correlations between the observed variables. The correlation matrix provides the information needed to identify the factors.

Factor analysis assumes that the observed variables are correlated because they share common underlying factors. The correlation matrix provides information about the strength and direction of these correlations. The strength of the correlation between two variables indicates how closely they are related, while the sign of the correlation (positive or negative) indicates the direction of the relationship. A positive correlation indicates that the variables tend to increase or decrease together, while a negative correlation indicates that the variables tend to move in opposite directions.

Factor analysis uses the correlation matrix to estimate the factor loadings, which represent the degree to which each observed variable is associated with each underlying factor. The factor loadings are used to construct the factor structure, which represents the underlying factors and their relationships. The factor structure can be rotated to simplify and clarify the interpretation of the factors.

In summary, the correlation matrix is a key component in factor analysis as it provides the information needed to identify the underlying factors that explain the correlations between the observed variables. The factor loadings are estimated using the correlation matrix, and the factor structure is constructed based on the estimated loadings. The correlation matrix, therefore, plays a critical role in the factor analysis process.

A general model of a correlation matrix of individual variables

A correlation matrix is a square matrix that shows the correlation coefficients between a set of individual variables. The general model of a correlation matrix can be expressed as follows:

\[
C = \begin{bmatrix}
c_{1,1} & c_{1,2} & c_{1,3} & \cdots & c_{1,k} \\
c_{2,1} & c_{2,2} & c_{2,3} & \cdots & c_{2,k} \\
c_{3,1} & c_{3,2} & c_{3,3} & \cdots & c_{3,k} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
c_{k,1} & c_{k,2} & c_{k,3} & \cdots & c_{k,k}
\end{bmatrix}
\]

where \(C\) is the correlation matrix, \(k\) is the number of individual variables, and \(c_{i,j}\) is the correlation coefficient between the \(i\) -th and \(j\) -th variables.

The diagonal elements of the correlation matrix represent the correlations between each variable and itself, which are always equal to 1. The off-diagonal elements represent the correlations between different pairs of variables. The correlation coefficient can range from -1 to 1, where a value of -1 indicates a perfect negative correlation, 0 indicates no correlation, and 1 indicates a perfect positive correlation.

The correlation matrix can be used for various purposes, such as identifying clusters of correlated variables, detecting multicollinearity, and exploring the underlying factor structure using factor analysis. It is important to note that the correlation matrix assumes that the variables are continuous, linearly related, and normally distributed. Violations of these assumptions can affect the validity and reliability of the correlation matrix and its interpretation.

Interpreting the correlation matrix

Interpreting a correlation matrix involves examining the strength and direction of the correlations between pairs of variables. The correlation matrix provides a summary of the relationships between the variables, and understanding these relationships is important for many statistical analyses, including regression, factor analysis, and structural equation modeling.

The strength of the correlation is indicated by the absolute value of the correlation coefficient. A correlation coefficient of 0 indicates no relationship between the variables, while a correlation coefficient of 1 (or -1) indicates a perfect positive (or negative) correlation. Correlation coefficients between 0 and 1 (or 0 and -1) indicate varying degrees of positive (or negative) correlation.

The direction of the correlation is indicated by the sign of the correlation coefficient. A positive correlation indicates that the variables tend to increase or decrease together, while a negative correlation indicates that the variables tend to move in opposite directions.

It is also important to consider the context of the variables being analyzed when interpreting the correlation matrix. For example, a correlation of 0.3 between two variables may be considered strong in one context and weak in another context.

Additionally, the correlation matrix does not imply causation, and caution should be exercised when interpreting correlations as evidence of causation. In some cases, it may be necessary to adjust the correlation matrix before interpreting it. For example, if the variables have different scales or units of measurement, it may be necessary to standardize the variables before calculating the correlation coefficients. Additionally, outliers or missing data may need to be addressed before interpreting the correlation matrix.

In summary, interpreting the correlation matrix involves examining the strength and direction of the correlations between pairs of variables and considering the context of the variables being analyzed. It is important to remember that the correlation matrix does not imply causation and that adjustments may be necessary before interpreting the matrix.

Example using R Code

R code for computing a correlation matrix and interpreting the results using a built-in dataset in R:
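A minimal sketch of such code, using only base R and the built-in iris dataset:

```r
# Load the built-in iris dataset
data(iris)

# Keep the four numeric measurement variables
iris_vars <- iris[, c("Sepal.Length", "Sepal.Width",
                      "Petal.Length", "Petal.Width")]

# Compute and print the correlation matrix
cor_matrix <- cor(iris_vars)
print(round(cor_matrix, 2))
```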

In this example, we loaded the built-in iris dataset in R and computed the correlation matrix between the variables Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width using the cor() function. We then printed the correlation matrix and interpreted the results by examining the pairwise correlations between the variables.

Module III: Factors Extraction

Factor extraction methods are used in factor analysis to identify the underlying factors that explain the correlations between a set of observed variables. There are several methods for extracting factors, including:

Principal Component Analysis (PCA) : PCA is a data reduction technique that extracts factors based on the variance in the observed variables. PCA identifies factors that account for the maximum amount of variance in the data and is useful when the goal is to reduce the number of variables in the analysis.

Common Factor Analysis (CFA) : CFA is a method that extracts factors based on the shared variance among the observed variables. CFA assumes that the observed variables are influenced by a smaller number of common factors, which are responsible for the correlations among the variables.

Maximum Likelihood (ML) : ML is a statistical technique that estimates the parameters of a statistical model by maximizing the likelihood function. ML is commonly used in CFA and Structural Equation Modeling (SEM) to estimate the factor loadings and other model parameters.

Principal Axis Factoring (PAF) : PAF is a method that extracts factors based on the common variance among the observed variables. PAF assumes that each variable contributes to the factor structure in proportion to its common variance with the other variables.

Unweighted Least Squares (ULS) : ULS is a method that extracts factors based on the correlations among the observed variables. ULS is commonly used in CFA and SEM to estimate the factor loadings and other model parameters.

Maximum Variance (MV) : MV is a method that extracts factors based on the maximum variance in the observed variables. MV is similar to PCA but is less commonly used in factor analysis.

The choice of factor extraction method depends on the research question, the nature and structure of the data, and the assumptions underlying the method. It is important to carefully consider the strengths and limitations of each method and to select a method that is appropriate for the research question and the data at hand.

Example 1

How to demonstrate the different factor extraction methods using the “bfi” dataset in the “psych” package in R:

R code that demonstrates different factor extraction methods using the bfi data in the psych package:
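A minimal sketch of such code, assuming the "psych" package (which provides the bfi data) is installed; four of the extraction methods are shown, and the others follow the same pattern with a different "fm" value:

```r
library(psych)

# Use a subset of the bfi personality items and drop incomplete cases
bfi_items <- na.omit(bfi[, 1:10])

# Principal component analysis (2 components, varimax rotation)
fit_pca <- principal(bfi_items, nfactors = 2, rotate = "varimax")

# Common factor analysis with different extraction methods
fit_ml  <- fa(bfi_items, nfactors = 2, fm = "ml",  rotate = "varimax")  # maximum likelihood
fit_pa  <- fa(bfi_items, nfactors = 2, fm = "pa",  rotate = "varimax")  # principal axis factoring
fit_uls <- fa(bfi_items, nfactors = 2, fm = "uls", rotate = "varimax") # unweighted least squares

# Print the loadings and fit statistics for each solution
print(fit_pca)
print(fit_ml)
print(fit_pa)
print(fit_uls)
```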

In this example, we are using the “bfi” dataset from the “psych” package and selecting a subset of variables to use for the factor analysis. We then use six different factor extraction methods, including Principal Component Analysis (PCA), Common Factor Analysis (CFA), Maximum Likelihood Estimation (MLE), Principal Axis Factoring (PAF), Unweighted Least Squares (ULS), and Maximum Variance Extraction (MVE).

For each method, we specify two factors and use the varimax rotation method to simplify the factor structure. We then print out the results for each method using the “print” function.

Note that the “fm” argument in the “fa” function specifies the factor extraction method to use: “ml” for maximum likelihood, “pa” for principal axis factoring, and “uls” for unweighted least squares. Principal component analysis is carried out with the separate “principal” function rather than through “fa”. If no “fm” argument is specified, the default extraction method in “fa” is minimum residual (“minres”), not maximum likelihood.

The interpretation of the factor analysis results depends on the specific method used and the research question. In general, the summary output provides information about the factor loadings, communalities, eigenvalues, and other relevant statistics.

The factor diagram and biplot can help visualize the relationships between the variables and the factors.

It is important to carefully examine the results and to consider the assumptions of each method before interpreting the factor analysis results.

Determining the number of factors to be extracted

Determining the number of factors to be extracted in a factor analysis is an important step that involves evaluating the fit of the model and selecting the appropriate number of factors. There are several methods for determining the number of factors, including:

Comprehensibility : This method involves selecting the number of factors that make the most sense conceptually or theoretically. For example, if the research question involves identifying the underlying dimensions of a personality test, the number of factors may be based on the number of personality traits that are hypothesized to exist.

Kaiser Criterion : This method involves selecting the number of factors with eigenvalues greater than 1.0, which is based on the assumption that each factor should account for at least as much variance as one of the original variables. However, this method may overestimate the number of factors, particularly when there are many variables in the analysis.

Variance Explained Criteria : This method involves selecting the number of factors that explain a certain percentage of the total variance in the data. For example, a researcher may decide to retain factors that collectively explain at least 60% or 70% of the variance in the data.

Cattell’s Scree Plot : This method involves plotting the eigenvalues of the factors in descending order and selecting the number of factors at the “elbow” of the plot, which represents the point at which the eigenvalues start to level off. However, this method can be subjective and may be influenced by the researcher’s interpretation of the plot.

Horn’s Parallel Analysis : This method involves comparing the eigenvalues of the factors in the actual data to the eigenvalues of factors in randomly generated data with the same sample size and number of variables. The number of factors to retain is based on the eigenvalues of the actual data that exceed the eigenvalues of the randomly generated data. This method is considered to be one of the most accurate methods for determining the number of factors.

In summary, determining the number of factors to be extracted involves evaluating the fit of the model and selecting the appropriate number of factors based on a combination of methods, including comprehensibility, Kaiser criterion, variance explained criteria, Cattell’s scree plot, and Horn’s parallel analysis. It is important to carefully consider the strengths and limitations of each method and to select a method that is appropriate for the research question and the data at hand.
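As a brief illustration of these criteria, here is a minimal sketch assuming the "psych" package is available; scree() draws Cattell's scree plot, and fa.parallel() carries out Horn's parallel analysis:

```r
library(psych)

# Example data: 25 personality items from the bfi dataset, complete cases only
items <- na.omit(bfi[, 1:25])

# Kaiser criterion: count the eigenvalues of the correlation matrix greater than 1
ev <- eigen(cor(items))$values
print(sum(ev > 1))

# Cattell's scree plot: look for the "elbow" where the eigenvalues level off
scree(items, factors = TRUE, pc = TRUE)

# Horn's parallel analysis: retain factors whose eigenvalues exceed those
# obtained from randomly generated data of the same size
fa.parallel(items, fm = "minres", fa = "both")
```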

Real-life Example 1

Suppose a researcher is interested in identifying the underlying factors that explain the responses to a questionnaire about job satisfaction. The questionnaire includes 20 items that measure various aspects of job satisfaction, such as salary, work environment, and work-life balance.

Comprehensibility : The researcher may start by considering the theoretical or conceptual structure of job satisfaction. For example, if previous research has identified three dimensions of job satisfaction (i.e., intrinsic, extrinsic, and relational), the researcher may decide to extract three factors.

Kaiser Criterion : The researcher may perform a factor analysis and examine the eigenvalues of the factors. If the first three factors have eigenvalues greater than 1.0, the researcher may decide to extract three factors.

Variance Explained Criteria : The researcher may decide to extract the number of factors that explain a certain percentage of the total variance. For example, the researcher may decide to extract the number of factors that collectively explain at least 60% or 70% of the variance in the data.

Cattell’s Scree Plot : The researcher may plot the eigenvalues of the factors in descending order and select the number of factors at the “elbow” of the plot. For example, if the eigenvalues start to level off after the third factor, the researcher may decide to extract three factors.

Horn’s Parallel Analysis : The researcher may compare the eigenvalues of the factors in the actual data to the eigenvalues of factors in randomly generated data with the same sample size and number of variables. If the eigenvalues of the actual data exceed the eigenvalues of the randomly generated data for the first three factors, the researcher may decide to extract three factors.

In this example, the different methods converge on a similar answer: comprehensibility and previous research suggest three factors, the Kaiser criterion, variance explained criteria, and Cattell’s scree plot indicate that three factors may be appropriate, and Horn’s parallel analysis may also support extracting three factors. In other datasets, however, these methods can point to different numbers of factors.

The choice of which method to use ultimately depends on the research question and the nature of the data. In some cases, a combination of methods may be used to determine the appropriate number of factors. For example, the researcher may consider both the theoretical structure of job satisfaction and the results of the factor analysis to decide on the appropriate number of factors to extract.

Real-life Example 2

Suppose a researcher is interested in identifying the underlying factors that explain the responses to a survey on customer satisfaction for a retail store. The survey includes 25 items that measure various aspects of customer satisfaction, such as product quality, store ambiance, customer service, and pricing.

Comprehensibility : The researcher may consider the theoretical or conceptual structure of customer satisfaction based on previous research. For example, if previous research has identified four dimensions of customer satisfaction (i.e., product quality, store ambiance, customer service, and pricing), the researcher may decide to extract four factors.

Kaiser Criterion : The researcher may perform a factor analysis and examine the eigenvalues of the factors. If the first four factors have eigenvalues greater than 1.0, the researcher may decide to extract four factors.

Variance Explained Criteria : The researcher may decide to extract the number of factors that explain a certain percentage of the total variance. For example, the researcher may decide to extract the number of factors that collectively explain at least 70% or 80% of the variance in the data.

Cattell’s Scree Plot : The researcher may plot the eigenvalues of the factors in descending order and select the number of factors at the “elbow” of the plot. For example, if the eigenvalues start to level off after the fourth factor, the researcher may decide to extract four factors.

Horn’s Parallel Analysis: The researcher may compare the eigenvalues of the factors in the actual data to the eigenvalues of factors in randomly generated data with the same sample size and number of variables. If the eigenvalues of the actual data exceed the eigenvalues of the randomly generated data for the first four factors, the researcher may decide to extract four factors.

In this example, the different methods converge on a similar answer: comprehensibility and previous research suggest four factors, the Kaiser criterion, variance explained criteria, and Cattell’s scree plot indicate that four factors may be appropriate, and Horn’s parallel analysis may also support extracting four factors.

The choice of which method to use ultimately depends on the research question and the nature of the data. In some cases, a combination of methods may be used to determine the appropriate number of factors; for example, the researcher may consider both the theoretical structure of customer satisfaction and the results of the factor analysis to decide on the appropriate number of factors to extract.

Overall, the number of factors to extract is best decided by weighing several of these criteria together rather than relying on any single one. Different methods can lead to different conclusions, and the final choice should be guided by the research question, the nature of the data, and the interpretability of the resulting factors.

Module IV: Communality and Eigen Values

Communality and eigenvalues are two important concepts in factor analysis. Here’s an explanation of what they are and how they are related:

Communalities : In factor analysis, communalities refer to the proportion of variance in each original variable that is accounted for by the extracted factors. Communalities range from 0 to 1, with higher values indicating that a larger proportion of the variance in the variable is explained by the factors. Communalities can be computed as the sum of the squared factor loadings for each variable.

Eigenvalues : Eigenvalues represent the amount of variance in the original variables that is explained by each factor. They are computed as the sum of the squared factor loadings for each factor. Eigenvalues are used to determine the number of factors to extract by examining the magnitude of each eigenvalue.
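For an orthogonal factor solution, both quantities are sums over the squared factor loadings \(a_{ij}\) (the loading of variable \(i\) on factor \(j\)): the communality of a variable sums across factors, while the eigenvalue of a factor sums across variables:

\[ h_i^2 = \sum_{j=1}^{m} a_{ij}^2, \qquad \text{EV}_j = \sum_{i=1}^{k} a_{ij}^2 \]

where \(m\) is the number of retained factors and \(k\) is the number of observed variables.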

Communalities and eigenvalues are related in that they both represent the amount of variance in the original variables that is explained by the extracted factors. However, they differ in their interpretation and calculation.

Communalities are used to assess the overall adequacy of the factor solution. Higher communalities indicate that the extracted factors are accounting for a larger proportion of the variance in the original variables. If some variables have low communalities, it may indicate that they are not well represented by the factor solution and that additional factors may be needed to fully capture their variance.

Eigenvalues, on the other hand, are used to determine the number of factors to extract. Factors with eigenvalues greater than 1 are considered to be important and are typically retained. This is because factors with eigenvalues less than 1 explain less variance than a single original variable. Eigenvalues provide information about the relative importance of each factor in explaining the variance in the original variables.

In summary, communalities and eigenvalues are both important measures in factor analysis, but they serve different purposes. Communalities provide information about the overall adequacy of the factor solution, while eigenvalues are used to determine the number of factors to extract.

Meaning of communality

In factor analysis, communality represents the proportion of variance in each original variable that is accounted for by the extracted factors. In other words, it is the amount of shared variance between the original variable and the factors. Communality is computed as the sum of the squared factor loadings for each variable. The squared factor loadings represent the proportion of variance in the variable that is explained by each factor. By summing the squared factor loadings across all factors, we can obtain the proportion of total variance in the variable that is accounted for by the factors. This is the communality.

Communality ranges from 0 to 1, with higher values indicating that a larger proportion of the variance in the variable is explained by the factors. A communality of 1 indicates that all the variance in the variable is accounted for by the extracted factors, while a communality of 0 indicates that none of the variance in the variable is accounted for by the factors.

Communality is an important measure in factor analysis because it provides information about the overall adequacy of the factor solution. Higher communalities indicate that the extracted factors are accounting for a larger proportion of the variance in the original variables. If some variables have low communalities, it may indicate that they are not well represented by the factor solution and that additional factors may be needed to fully capture their variance.

In summary, communality is a measure of the amount of shared variance between the original variables and the extracted factors in factor analysis. It is an important measure for assessing the overall adequacy of the factor solution and identifying variables that may need additional factors to fully capture their variance.

Role of communality in Factor Analysis

Communality plays an important role in factor analysis in several ways:

Adequacy of the factor solution : Communality provides information about the overall adequacy of the factor solution. Higher communalities indicate that the extracted factors are accounting for a larger proportion of the variance in the original variables. If some variables have low communalities, it may indicate that they are not well represented by the factor solution and that additional factors may be needed to fully capture their variance.

Variable selection : Communality helps determine which variables are adequately represented by the factor solution. Variables with low communalities share little variance with the extracted factors and may be dropped from the analysis, while variables with high communalities are well represented and should be retained.

Interpretation of factors : Communality indicates how much of each variable’s variance is shared with the extracted factors; its complement (1 minus the communality) is the unique variance that the factors do not explain. Variables with high communality values are more strongly related to the extracted factors and are therefore the most useful for interpreting the meaning of each factor.

Identification of outliers : Communalities can be used to identify outliers in the data. Variables with extremely low communalities may be outliers and may need to be removed from the analysis.

Overall, communality is an important measure in factor analysis that provides information about the overall adequacy of the factor solution, helps in the selection of factors, aids in the interpretation of factors, and can be used to identify outliers in the data.

Computing communality

Computing communality involves calculating the proportion of variance in each original variable that is accounted for by the extracted factors in a factor analysis. Here’s how to compute communality:

Perform a factor analysis on the dataset using a chosen method and number of factors.

Obtain the factor loadings for each variable. These are the correlations between each variable and each factor.

Square the factor loadings for each variable to obtain the proportion of variance in the variable that is accounted for by each factor.

Sum the squared factor loadings across all factors to obtain the total proportion of variance in the variable that is accounted for by the extracted factors.

The sum of the squared factor loadings is the communality for the variable.

Communality ranges from 0 to 1, with higher values indicating that a larger proportion of the variance in the variable is accounted for by the extracted factors.

Here’s an example R code that computes communality for the built-in USArrests dataset:
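A minimal sketch of such code, using base R's factanal() function as described below:

```r
# Factor analysis with 1 factor on the built-in USArrests dataset
data(USArrests)
fa_fit <- factanal(USArrests, factors = 1)

# Factor loadings as a plain numeric matrix (variables x factors)
loadings_mat <- unclass(fa_fit$loadings)

# Communality of each variable: sum of squared loadings across all factors
# (equivalently, 1 - fa_fit$uniquenesses)
communality <- apply(loadings_mat^2, 1, sum)
print(round(communality, 3))
```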

In this example, we performed a factor analysis with 1 factor on the USArrests dataset using the factanal() function in R. We then computed the communality for each variable by squaring the factor loadings and summing them across all factors using the apply() function in R. Finally, we printed the resulting communality values for each variable.

By examining the communality values, we can see which variables are most strongly related to the extracted factors and how much of their variance is accounted for by the factors.

Interpreting communality

Interpreting communality involves understanding the amount of variance in each original variable that is accounted for by the extracted factors in a factor analysis. Here are some key points to consider when interpreting communality:

High communality values: Variables with high communality values indicate that a large proportion of their variance is accounted for by the extracted factors. These variables are more strongly related to the factors and may be useful for interpreting the meaning of each factor.

Low communality values: Variables with low communality values indicate that a small proportion of their variance is accounted for by the extracted factors. These variables may not be well represented by the factor solution and may need additional factors to fully capture their variance.

Total variance accounted for: The sum of the communality values across all variables indicates the total proportion of variance in the dataset that is explained by the extracted factors. This can be used to assess the overall adequacy of the factor solution.

Outliers: Variables with extremely low communality values may indicate outliers in the data. These variables may not fit well with the overall pattern of the data and may need to be removed from the analysis.

Overlapping variance: It is important to note that communality measures the shared variance between the original variables and the extracted factors. Variables may have unique variance that is not accounted for by the factors. Thus, low communality values do not necessarily mean that a variable is unimportant or should be removed from the analysis.

Overall, interpreting communality involves understanding the extent to which the extracted factors account for the variance in the original variables. High communality values indicate that the extracted factors are strongly related to the variables, while low communality values may indicate the need for additional factors or the presence of outliers in the data.

Eigen Value

In factor analysis, the eigenvalue of a factor represents the amount of variance in the original variables that is explained by that factor. Specifically, it is the sum of the squared factor loadings for the factor.

Eigenvalues provide information about the relative importance of each factor in explaining the variance in the original variables. Factors with higher eigenvalues explain a larger proportion of the variance in the data than factors with lower eigenvalues.

Eigenvalues are used to determine the number of factors to extract in a factor analysis. One common method for selecting the number of factors is to retain only those factors with eigenvalues greater than 1. This is because factors with eigenvalues less than 1 explain less variance than a single original variable. Another method for selecting the number of factors is to examine a scree plot, which shows the eigenvalues for each factor in descending order. The number of factors to extract is chosen at the “elbow” of the plot, where the eigenvalues start to level off.

It is important to note that eigenvalues are relative measures of importance and can be affected by the number of variables and the sample size.

Thus, it is recommended to use multiple methods for determining the number of factors to extract and to interpret the results in conjunction with other information, such as factor loadings and communalities.

Overall, eigenvalues are an important measure in factor analysis that provide information about the relative importance of each factor in explaining the variance in the original variables. They are used to determine the number of factors to extract and aid in the interpretation of the factor solution.

Role of eigen value

The eigenvalue plays an important role in factor analysis in several ways:

Determining the number of factors: Eigenvalues are used to determine the number of factors to extract in factor analysis. Factors with eigenvalues greater than 1 are considered to be important and are typically retained. This is because factors with eigenvalues less than 1 explain less variance than a single original variable. The number of factors to extract can also be determined by examining a scree plot of the eigenvalues.

Assessing factor importance: Eigenvalues provide information about the relative importance of each factor in explaining the variance in the original variables. Factors with higher eigenvalues explain a larger proportion of the variance in the data than factors with lower eigenvalues. This information can be used to assess the importance of each factor in the factor solution.

Interpreting factor meaning: Eigenvalues can aid in the interpretation of the meaning of each factor. Factors with high eigenvalues explain a larger proportion of the variance in the original variables and are more important for interpreting the meaning of the factor.

Identifying problems in the solution: Eigenvalues, considered together with communalities, can flag potential problems. Factors with very small eigenvalues add little explanatory value, and variables with very low communalities are poorly represented by the retained factors; either pattern may point to variables that do not fit the factor structure or to an inappropriate number of factors.

Overall, eigenvalues are an important measure in factor analysis that provide information about the number of factors to extract, the importance of each factor, and can aid in the interpretation of the factor solution. It is important to note that eigenvalues are relative measures of importance and should be used in conjunction with other information, such as factor loadings and communalities, to fully interpret the results of the factor analysis.

Computing eigen value

Computing the eigenvalues in factor analysis involves extracting the factors and calculating the amount of variance in the original variables that is explained by each factor.

Here’s how to compute the eigenvalues:

Calculate the correlation matrix for the original variables.

Use a matrix decomposition technique, such as the eigenvalue decomposition, to obtain the eigenvalues and eigenvectors of the correlation matrix.

The eigenvalues represent the amount of variance in the correlation matrix that is accounted for by each eigenvector. The eigenvalues are equal to the sum of the squared loadings for each factor. The eigenvalues can be used to determine the number of factors to retain in the factor solution. Factors with eigenvalues greater than 1 are typically considered to be important and are retained.

Here’s an example R code that computes the eigenvalues for the built-in USArrests dataset:
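A minimal sketch of such code, using base R's cor() and eigen() functions:

```r
# Correlation matrix for the built-in USArrests dataset
data(USArrests)
cor_matrix <- cor(USArrests)

# Eigenvalue decomposition of the correlation matrix
eig <- eigen(cor_matrix)

# Eigenvalues: variance in the correlation matrix explained by each component
eigenvalues <- eig$values
print(round(eigenvalues, 3))
```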

In this example, we first computed the correlation matrix for the USArrests dataset using the cor() function in R. We then performed an eigenvalue decomposition of the correlation matrix using the eigen() function in R. Finally, we extracted the eigenvalues from the resulting eigenvalue decomposition and printed them to the console using the print() function. By examining the eigenvalues, we can determine the number of factors to retain in the factor solution and assess the importance of each factor in explaining the variance in the original variables.

Interpreting eigen value

Interpreting eigenvalues in factor analysis involves understanding the amount of variance in the original variables that is explained by each factor. Here are some key points to consider when interpreting eigenvalues:

Importance of each factor : Eigenvalues provide information about the importance of each factor in explaining the variance in the original variables. Factors with higher eigenvalues explain a larger proportion of the variance in the data than factors with lower eigenvalues. Factors with eigenvalues greater than 1 are typically considered to be important and are retained in the factor solution.

Number of factors to extract : Eigenvalues are used to determine the number of factors to extract in factor analysis. The number of factors can be determined by retaining only those factors with eigenvalues greater than 1 or by examining a scree plot of the eigenvalues. The number of factors to extract should be based on a combination of the eigenvalues, the factor loadings, and the overall interpretability of the factor solution.

Overlapping variance : It is important to note that eigenvalues measure the shared variance between the original variables and the extracted factors. Variables may have unique variance that is not accounted for by the factors. Thus, low eigenvalues do not necessarily mean that a factor is unimportant or should be removed from the analysis.


Factor Analysis 101: The Basics


What is Factor Analysis?

Factor analysis is a powerful data reduction technique that enables researchers to investigate concepts that cannot easily be measured directly. By boiling down a large number of variables into a handful of comprehensible underlying factors, factor analysis results in easy-to-understand, actionable data. 

By applying this method to your research, you can spot trends faster and see themes throughout your datasets, enabling you to learn what the data points have in common. 

Unlike statistical methods such as regression analysis, factor analysis does not require you to specify dependent and independent variables.

Factor analysis is most commonly used to identify the relationship between all of the variables included in a given dataset.

The Objectives of Factor Analysis

 Think of factor analysis as shrink wrap. When applied to a large amount of data, it compresses the set into a smaller set that is far more manageable, and easier to understand. 

The overall objective of factor analysis can be broken down into four smaller objectives: 

  • To definitively understand how many factors are needed to explain common themes amongst a given set of variables.
  • To determine the extent to which each variable in the dataset is associated with a common theme or factor.
  • To provide an interpretation of the common factors in the dataset.
  • To determine the degree to which each observed data point represents each theme or factor.

When to Use Factor Analysis

Determining when to use particular statistical methods to get the most insight out of your data can be tricky.

When considering factor analysis, have your goal top-of-mind.

There are three main forms of factor analysis. If your goal aligns to any of these forms, then you should choose factor analysis as your statistical method of choice: 

Exploratory Factor Analysis should be used when you need to develop a hypothesis about a relationship between variables.

Confirmatory Factor Analysis should be used to test a hypothesis about the relationship between variables.

Construct Validity should be used to test the degree to which your survey actually measures what it is intended to measure.

How To Ensure Your Survey is Optimized for Factor Analysis

If you know that you’ll want to perform a factor analysis on response data from a survey, there are a few things you can do ahead of time to ensure that your analysis will be straightforward, informative, and actionable.

Identify and Target Enough Respondents

Large datasets are the lifeblood of factor analysis. You’ll need large groups of survey respondents, often found through panel services , for factor analysis to yield significant results. 

While variables such as population size and your topic of interest will influence how many respondents you need, it’s best to maintain a “more respondents the better” mindset. 

The More Questions, The Better

While designing your survey , load in as many specific questions as possible. Factor analysis will fall flat if your survey only has a few broad questions.  

The ultimate goal of factor analysis is to take a broad concept and simplify it by considering more granular, contextual information, so this approach will provide you the results you’re looking for. 

Aim for Quantitative Data

If you’re looking to perform a factor analysis, you’ll want to avoid having open-ended survey questions . 

By providing answer options in the form of scales (whether they be Likert Scales , numerical scales, or even ‘yes/no’ scales) you’ll save yourself a world of trouble when you begin conducting your factor analysis. Just make sure that you’re using the same scaled answer options as often as possible.



Understanding Factor Analysis in Psychology

John Loeppky is a freelance journalist based in Regina, Saskatchewan, Canada, who has written about disability and health for outlets of all kinds.


Steven Gans, MD is board-certified in psychiatry and is an active supervisor, teacher, and mentor at Massachusetts General Hospital.


What Is Factor Analysis and What Does It Do?


Like many methods encountered by those studying psychology , factor analysis has a long history.


It was originally discussed by British psychologist Charles Spearman in the early 20th century and has gone on to be used not only in psychology but also in other fields that rely on statistical analyses.

But what is it, what are some real-world examples, and what are the different types? In this article, we'll answer all of those questions.

The primary goal of factor analysis is to distill a large data set into a working set of connections or factors. Dr. Jessie Borelli, PhD , who works at the University of California-Irvine, uses factor analysis in her work on attachment.

She is doing research that looks into how people perceive relationships and how they connect to one another. She gives the example of providing a hypothetical questionnaire with 100 items on it and using factor analysis to drill deeper into the data. "So, rather than looking at each individual item on its own I'd rather say, 'Is there any way in which these items kind of cluster together or go together so that I can... create units of analysis that are bigger than the individual items.'"

Factor analysis is looking to identify patterns where it is assumed that there are already connections between areas of the data.

An Example Where Factor Analysis Is Useful

One common example of factor analysis is taking something not easily quantifiable, like socio-economic status, and using it to group together highly correlated variables like income level and type of job.

Factor analysis isn't just used in psychology; it is also deployed in fields like sociology, business, and technology-driven areas such as machine learning.

There are two types of factor analysis that are most commonly referred to: exploratory factor analysis and confirmatory factor analysis.

Here are the two types of factor analysis:

  • Exploratory analysis : The goal of this analysis is to find general patterns in a set of data points.
  • Confirmatory factor analysis : The goal of this analysis is to test various hypothesized relationships among certain variables.

Exploratory Analysis

In an exploratory analysis, you are being a little bit more open-minded as a researcher because you are using this type of analysis to provide some clarity in your data set that you haven't yet found. It's an approach that Borelli uses in her own research.

Confirmatory Factor Analysis

On the other hand, if you're using a confirmatory factor analysis you are using the assumptions or theoretical findings you have already identified to drive your statistical model.

Unlike in an exploratory factor analysis, where the relationships between factors and variables are more open, a confirmatory factor analysis requires you to select which variables you are testing for. In Borelli's words:

"When you do a confirmatory factor analysis, you kind of tell your analytic program what you think the data should look like, in terms of, 'I think it should have these two factors and this is the way I think it should look.'"

Let's take a look at the advantages and disadvantages of factor analysis.

A main advantage of factor analysis is that it allows researchers to reduce a large number of variables by combining several of them into a single factor.

You Can Analyze Fewer Data Points

When answering your research questions, it's a lot easier to be working with three variables than thirty, for example.

Disadvantages

Disadvantages include that factor analysis depends on the quality of the data and may allow for different interpretations of the same data. For example, during one study, Borelli found that after deploying a factor analysis, she was still left with results that didn't connect well with what had been found in hundreds of other studies.

Because the sample was new and more culturally diverse than those explored in earlier studies, she used an exploratory factor analysis, which left her with more questions than answers.

The goal of factor analysis in psychology is often to make connections that allow researchers to develop models with common factors in ways that might be hard or impossible to observe otherwise.

So, for example, intelligence is a difficult concept to directly observe. However, it can be inferred from factors that we can directly measure on specific tests.

Factor analysis has often been used in the field of psychology to help us better understand the structure of personality.

This is due to the multitude of factors researchers have to consider when it comes to understanding the concept of personality. This area of personality research is certainly not new, with easily findable research dating as far back as 1942 recognizing its power in personality research.

Britannica. Charles E. Spearman.

United States Environmental Protection Agency. Exploratory Data Analysis.

Flora DB, Curran PJ. An empirical evaluation of alternative methods of estimation for confirmatory factor analysis with ordinal data. Psychol Methods. 2004;9(4):466-491. doi:10.1037/1082-989X.9.4.466

Wolfle D. Factor analysis in the study of personality. The Journal of Abnormal and Social Psychology. 1942;37(3):393–397.





Source: Mai, W. Investigating the Connections Between Short-Selling Deregulation and Green Total Factor Productivity: Empirical Insights from China. Journal of the Knowledge Economy (2024). https://doi.org/10.1007/s13132-024-02107-4


Keywords: Short selling · Deregulation · Green total factor productivity · Sustainable development · Financial market · External governance
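
The study summarized above identifies its effects with propensity score matching combined with difference-in-differences (PSM-DID). The paper's exact specification is not reproduced here; the sketch below is only a minimal illustration of the general PSM-DID logic, assuming a firm–year panel with hypothetical columns `firm`, `treated`, `post`, and `gtfp`, plus a user-supplied list of matching covariates.

```python
# Minimal PSM-DID sketch (illustrative only, not the paper's actual model).
# Assumed panel columns: firm, treated (1 = firm eligible for short selling),
# post (1 = after deregulation), gtfp (outcome), plus matching covariates.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def psm_did(panel: pd.DataFrame, covariates: list) -> float:
    # Collapse to one row per firm, using pre-period covariate means for matching.
    pre = panel[panel["post"] == 0]
    firms = pre.groupby("firm")[covariates + ["treated"]].mean().reset_index()
    firms["treated"] = firms["treated"].round().astype(int)

    # Step 1: propensity score = estimated P(treated | covariates).
    logit = LogisticRegression(max_iter=1000).fit(firms[covariates], firms["treated"])
    firms["pscore"] = logit.predict_proba(firms[covariates])[:, 1]

    # Step 2: match each treated firm to its nearest control firm on the score.
    treated = firms[firms["treated"] == 1]
    control = firms[firms["treated"] == 0]
    nn = NearestNeighbors(n_neighbors=1).fit(control[["pscore"]])
    _, idx = nn.kneighbors(treated[["pscore"]])
    matched = pd.concat([treated, control.iloc[idx.ravel()]])

    # Step 3: difference-in-differences of the outcome on the matched panel.
    sample = panel[panel["firm"].isin(matched["firm"])]
    means = sample.groupby(["treated", "post"])["gtfp"].mean()
    return (means.loc[(1, 1)] - means.loc[(1, 0)]) - (means.loc[(0, 1)] - means.loc[(0, 0)])
```

With a real panel this would be called as, say, `psm_did(df, ["size", "leverage", "roa"])`; the covariate names are placeholders, and published applications typically estimate step 3 in a regression with firm and year fixed effects rather than as raw mean differences.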



VIDEO

  1. Factor Analysis (Part-2) by Dr. Sanjeev Bakshi, IGNTU, Amarkantak

  2. Factor Analysis (Part-1) by Dr. Sanjeev Bakshi, IGNTU, Amarkantak

  3. QUANTITATIVE METHODOLOGY (Part 2 of 3):

  4. Video 8 Factor Extraction PQMethod

  5. Lecture 03 Factor Factor Relationship Part 1 I AAE 321 I Farm Management and Production Economics

  6. Factor analysis in Multivariate. || Comprehensive Lecture ||

COMMENTS

  1. Factor Analysis

    Factor Analysis Steps. Here are the general steps involved in conducting a factor analysis: 1. Define the Research Objective: Clearly specify the purpose of the factor analysis. Determine what you aim to achieve or understand through the analysis. 2. Data Collection: Gather the data on the variables of interest.

  2. Factor Analysis Guide with an Example

    The first methodology choice for factor analysis is the mathematical approach for extracting the factors from your dataset. The most common choices are maximum likelihood (ML), principal axis factoring (PAF), and principal components analysis (PCA). You should use either ML or PAF most of the time.
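
To make that extraction choice concrete, here is a minimal sketch assuming the third-party `factor_analyzer` package and synthetic data standing in for real items; `"ml"` is maximum likelihood and `"minres"` is a least-squares common-factor fit.

```python
# Comparing extraction methods on the same data (assumes the `factor_analyzer` package).
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer

rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 2))                                   # two latent factors
X = latent @ rng.normal(size=(2, 6)) + rng.normal(scale=0.8, size=(300, 6))
df = pd.DataFrame(X, columns=[f"item{i}" for i in range(1, 7)])

for method in ("ml", "minres"):                                      # ML vs. least squares
    fa = FactorAnalyzer(n_factors=2, rotation=None, method=method)
    fa.fit(df)
    print(f"\nUnrotated loadings ({method}):")
    print(pd.DataFrame(fa.loadings_, index=df.columns).round(2))
```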

  3. Exploratory Factor Analysis: A Guide to Best Practice

    Exploratory factor analysis (EFA) is one of a family of multivariate statistical methods that attempts to identify the smallest number of hypothetical constructs (also known as factors, dimensions, latent variables, synthetic variables, or internal attributes) that can parsimoniously explain the covariation observed among a set of measured variables (also called observed variables, manifest ...

  4. Factor Analysis and How It Simplifies Research Findings

    Factor analysis isn't a single technique, but a family of statistical methods that can be used to identify the latent factors driving observable variables. Factor analysis is commonly used in market research, as well as other disciplines like technology, medicine, sociology, field biology, education, psychology and many more.

  5. PDF Factor Analysis

    Choosing among different methods: between MLE and LS, LS is preferred with few indicators per factor, equal loadings within factors, no large cross-loadings, no factor correlations, and when recovering factors with low loadings (overextraction); MLE is preferred under multivariate normality.

  6. Factor Analysis: a means for theory and instrument development in

    Factor analysis methods can be incredibly useful tools for researchers attempting to establish high quality measures of those constructs not directly observed and captured by observation. Specifically, the factor solution derived from an Exploratory Factor Analysis provides a snapshot of the statistical relationships of the key behaviors ...

  7. Factor analysis

    Higher-order factor analysis is a statistical method consisting of repeating the steps of factor analysis ... In cross-cultural research, factor analysis is a frequently used technique. It serves the purpose of extracting cultural dimensions.
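
A full higher-order model is normally fitted on the correlation matrix of obliquely rotated first-order factors; as a rough approximation only (assuming the `factor_analyzer` package and synthetic data), one can extract first-order factor scores and then factor-analyze those scores.

```python
# Rough second-order sketch: obliquely rotated first-order factors, then a factor
# analysis of their scores (assumes `factor_analyzer`; this approximates the usual
# approach of factoring the first-order factor correlation matrix).
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer

rng = np.random.default_rng(1)
general = rng.normal(size=(300, 1))                                   # higher-order factor
firsts = [general + rng.normal(scale=0.6, size=(300, 1)) for _ in range(3)]
X = np.hstack([f + rng.normal(scale=0.8, size=(300, 3)) for f in firsts])
df = pd.DataFrame(X, columns=[f"item{i}" for i in range(1, 10)])

first_order = FactorAnalyzer(n_factors=3, rotation="oblimin")
first_order.fit(df)
scores = pd.DataFrame(first_order.transform(df))                      # first-order factor scores

second_order = FactorAnalyzer(n_factors=1, rotation=None)
second_order.fit(scores)
print(second_order.loadings_.round(2))   # loadings of the first-order factors on one general factor
```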

  8. Lesson 12: Factor Analysis

    Overview. Factor Analysis is a method for modeling observed variables, and their covariance structure, in terms of a smaller number of underlying unobservable (latent) "factors.". The factors typically are viewed as broad concepts or ideas that may describe an observed phenomenon. For example, a basic desire of obtaining a certain social ...

  9. Lesson 12: Factor Analysis

    Factor Analysis is a method for modeling observed variables, and their covariance structure, in terms of a smaller number of underlying unobservable (latent) "factors." The factors typically are viewed as broad concepts or ideas that may describe an observed phenomenon. For example, a basic desire of obtaining a certain social level might explain most consumption behavior.
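
In the standard matrix notation behind both of these descriptions, the common factor model and its implied covariance structure can be written as:

```latex
% Common factor model: p observed variables x, m < p latent factors f
x = \mu + \Lambda f + \varepsilon,
\qquad \operatorname{Cov}(f) = \Phi,
\qquad \operatorname{Cov}(\varepsilon) = \Psi \ \text{(diagonal)}

% Implied covariance of the observed variables (\Phi = I for orthogonal factors)
\Sigma = \Lambda \Phi \Lambda^{\top} + \Psi
```

Each observed variable is a linear combination of a small number of common factors plus a unique error term, which is exactly the "smaller number of underlying unobservable factors" referred to above.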

  10. A Practical Introduction to Factor Analysis

    Factor analysis is a method for modeling observed variables and their covariance structure in terms of unobserved variables (i.e., factors). There are two types of factor analysis, exploratory and confirmatory. Exploratory factor analysis (EFA) is a method to explore the underlying structure of a set of observed variables, and is a crucial step ...

  11. A Practical Introduction to Factor Analysis: Exploratory Factor Analysis

    Purpose. This seminar is the first part of a two-part seminar that introduces central concepts in factor analysis. Part 1 focuses on exploratory factor analysis (EFA). Although the implementation is in SPSS, the ideas carry over to any software program. Part 2 introduces confirmatory factor analysis (CFA).
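
For the confirmatory side mentioned in the last two entries, a minimal CFA sketch is shown below, assuming the third-party `semopy` package and hypothetical items x1–x6 loading on two factors; the lavaan-style model string and column names are illustrative, not taken from the seminar above.

```python
# Minimal CFA sketch using the third-party `semopy` package (an assumption; the same
# model could equally be specified in R's lavaan). Data and item names are synthetic.
import numpy as np
import pandas as pd
import semopy

rng = np.random.default_rng(0)
f1, f2 = rng.normal(size=(300, 1)), rng.normal(size=(300, 1))
data = pd.DataFrame(
    np.hstack([f1 + rng.normal(scale=0.7, size=(300, 3)),
               f2 + rng.normal(scale=0.7, size=(300, 3))]),
    columns=["x1", "x2", "x3", "x4", "x5", "x6"])

# Measurement model: each latent factor is indicated by three observed items.
desc = """
F1 =~ x1 + x2 + x3
F2 =~ x4 + x5 + x6
"""
model = semopy.Model(desc)
model.fit(data)
print(model.inspect())              # parameter estimates (loadings, variances, covariances)
print(semopy.calc_stats(model))     # fit indices such as chi-square, CFI, and RMSEA
```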

  12. An Introduction to Factor Analysis: Reducing Variables

    Factor analysis is a sophisticated statistical method aimed at reducing a large number of variables into a smaller set of factors. This technique is valuable for extracting the maximum common variance from all variables, transforming them into a single score for further analysis. As a part of the general linear model (GLM), factor analysis is ...

  13. Factor Analysis

    Factor analysis is a multivariate method that can be used for analyzing large data sets with two main goals: 1. to reduce a large number of correlated variables to a smaller number of factors, and 2. to structure the data with the aim of identifying dependencies between correlated variables and examining them for common causes (factors) in order to generate a new construct (factor) on this basis.

  14. Sage Research Methods: Business

    This guide further explains various parts and parcels of factor analysis: (1) the process of factor loading on a specific survey case, (2) the identification process for an appropriate number of factors and optimal combination of factors, depending on the specific research design and goals, and (3) an explanation of dimensions, their reduction ...
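
The "identification process for an appropriate number of factors" can be illustrated directly from the eigenvalues of the correlation matrix; Kaiser's greater-than-one rule and a scree plot are two common (if imperfect) heuristics. The data below are synthetic.

```python
# Eigenvalues of the correlation matrix as a guide to how many factors to retain.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))                                    # two true factors
X = latent @ rng.normal(size=(2, 8)) + rng.normal(scale=0.8, size=(500, 8))

eigvals = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
print("Eigenvalues:", eigvals.round(2))
print("Kaiser rule (eigenvalue > 1) suggests", int((eigvals > 1).sum()), "factors")

# Scree plot: retain factors up to the 'elbow' where the curve levels off.
plt.plot(range(1, len(eigvals) + 1), eigvals, marker="o")
plt.xlabel("Component"); plt.ylabel("Eigenvalue"); plt.title("Scree plot")
plt.show()
```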

  15. Exploratory Factor Analysis: A Guide to Best Practice

    Exploratory factor analysis (EFA) is a multivariate statistical method that has become a fundamental tool in the development and validation of psychological theories and measurements. However, researchers must make several thoughtful and evidence-based methodological decisions while conducting an EFA, and there are a number of options available ...

  16. Introduction to Exploratory Factor Analysis: An Applied Approach

    The most substantive part of the chapter focuses on six steps of EFA. More specifically, we consider variable (or indicator) selection (Step 1), computing the variance-covariance matrix (Step 2), factor-extraction methods (Step 3), factor-retention procedures (Step 4), factor-rotation methods (Step 5), and interpretation (Step 6).
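
A compact code walk-through of steps 2–6 of that chapter might look like the sketch below, again assuming the `factor_analyzer` package with synthetic data standing in for real indicators (step 1, indicator selection, happens before any code is written).

```python
# EFA pipeline sketch: correlation matrix, extraction, retention, rotation, interpretation.
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer

rng = np.random.default_rng(0)
latent = rng.normal(size=(400, 2))
X = latent @ rng.normal(size=(2, 8)) + rng.normal(scale=0.8, size=(400, 8))
df = pd.DataFrame(X, columns=[f"item{i}" for i in range(1, 9)])

corr = df.corr()                                       # Step 2: correlation matrix
fa = FactorAnalyzer(rotation=None)                     # Step 3: extraction (default: minres)
fa.fit(df)
eigvals, _ = fa.get_eigenvalues()                      # Step 4: retention decision
n_factors = int((eigvals > 1).sum())

fa = FactorAnalyzer(n_factors=n_factors, rotation="varimax")   # Step 5: rotation
fa.fit(df)
loadings = pd.DataFrame(fa.loadings_, index=df.columns)        # Step 6: interpretation
print(loadings.round(2))
print("Communalities:", fa.get_communalities().round(2))
```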

  17. Understanding and Using Factor Scores: Considerations for the ...

    or confirmatory factor analysis procedures, and 63 articles (27.5%) did not provide sufficient information on the methodology used. For example, many factor score methods are built on the assumption that the resulting factor scores will be uncorrelated; however, orthogonal factors are often the rarity rather than the norm in educational research.
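
The point that factor scores are rarely uncorrelated can be checked directly: after an oblique rotation, compute the scores and inspect their correlation matrix. The sketch assumes the `factor_analyzer` package and synthetic, deliberately correlated factors.

```python
# Factor scores after an oblique rotation are generally correlated.
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer

rng = np.random.default_rng(0)
shared = rng.normal(size=(400, 1))
f1 = shared + rng.normal(scale=0.8, size=(400, 1))     # two correlated latent factors
f2 = shared + rng.normal(scale=0.8, size=(400, 1))
X = np.hstack([f1 + rng.normal(scale=0.9, size=(400, 4)),
               f2 + rng.normal(scale=0.9, size=(400, 4))])
df = pd.DataFrame(X, columns=[f"item{i}" for i in range(1, 9)])

fa = FactorAnalyzer(n_factors=2, rotation="oblimin")
fa.fit(df)
scores = fa.transform(df)                               # estimated factor scores
print(np.corrcoef(scores, rowvar=False).round(2))       # typically not an identity matrix
```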

  18. (PDF) Overview of Factor Analysis

    Chapter 1. Theoretical Introduction. • Factor analysis is a collection of methods used to examine how underlying constructs influence the responses on a number of measured variables ...

  19. A Primer on Factor Analysis in Research using Reproducible R Software

    Factor analysis is a statistical method that is widely used in research to identify the underlying factors that explain the variations in a set of observed variables. The method is particularly useful in fields such as psychology, sociology, marketing, and education, where researchers often deal with complex datasets that contain many variables.

  20. Factor Analysis 101: The Basics

    When considering factor analysis, have your goal top-of-mind. There are three main forms of factor analysis. If your goal aligns to any of these forms, then you should choose factor analysis as your statistical method of choice: Exploratory Factor Analysis should be used when you need to develop a hypothesis about a relationship between variables.

  21. Factor Analysis in Psychology: Types, How It's Used

    The primary goal of factor analysis is to distill a large data set into a working set of connections or factors. Dr. Jessie Borelli of the University of California, Irvine, uses factor analysis in her work on attachment; her research looks into how people perceive relationships and how they connect to one another.

  22. Factor Analysis: Easy Definition

    Procrustes analysis is a way to compare two sets of configurations, or shapes. Originally developed to match two solutions from Factor Analysis, the technique was extended to Generalized Procrustes Analysis so that more than two shapes could be compared. The shapes are aligned to a target shape or to each other.
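
A quick way to compare two factor solutions in this spirit is SciPy's Procrustes routine, which aligns one loading matrix to another and reports a dissimilarity measure; the two "solutions" below are synthetic, the second being a rotated, noisy copy of the first.

```python
# Procrustes comparison of two factor-loading matrices.
import numpy as np
from scipy.spatial import procrustes

rng = np.random.default_rng(0)
loadings_a = rng.uniform(0.2, 0.9, size=(8, 2))                 # solution from sample A
angle = np.deg2rad(30)                                          # arbitrary rotation + noise
rot = np.array([[np.cos(angle), -np.sin(angle)],
                [np.sin(angle),  np.cos(angle)]])
loadings_b = loadings_a @ rot + rng.normal(scale=0.05, size=(8, 2))

mtx_a, mtx_b, disparity = procrustes(loadings_a, loadings_b)
print(f"Procrustes disparity: {disparity:.4f}")                  # close to 0 = very similar
```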

  23. Factor Analysis as a Tool for Survey Analysis

    Factor analysis is particularly suitable for reducing a large number of related variables to a more manageable number of factors prior to using them in other analyses ...

  24. Investigating the Connections Between Short-Selling ...

    This research paper explores the intricate relationship between short-selling deregulation and corporate level green total factor productivity in China by employing a quasi-natural experiment approach, the Propensity Score Matching-Difference in Difference (PSM-DID) method. This analysis uncovers the mechanisms by which short-selling ...
