
11. Quantitative measurement

Chapter outline.

  • Overview of measurement (11 minute read)
  • Operationalization and levels of measurement (20 minute read)
  • Scales and indices (15 minute read)
  • Reliability and validity (20 minute read)
  • Ethical and social justice considerations for measurement (6 minute read)

Content warning: Discussions of immigration issues, parents and gender identity, anxiety, and substance use.

11.1 Overview of measurement

Learning Objectives

Learners will be able to…

  • Provide an overview of the measurement process in social work research
  • Describe why accurate measurement is important for research

This chapter begins with an interesting question: Is my apple the same as your apple? Let’s pretend you want to study apples. Perhaps you have read that chemicals in apples may impact neurotransmitters and you want to test whether apple consumption improves mood among college students. In order to conduct this study, you need to make sure that you provide apples to a treatment group. To increase the rigor of your study, you may also want a group of students who do not eat apples to serve as a comparison group. Don’t worry if this seems new to you. We will discuss this type of design in Chapter 13. For now, just concentrate on apples.

In order to test your hypothesis about apples, you need to define exactly what is meant by the term “apple” so you can ensure everyone is consuming the same thing. You also need to decide what counts as a “dose” of this thing we call “apple,” so that everyone consumes the same amount. So, let’s start by making sure we understand what the term “apple” means. Say you have an object that you identify as an apple and I have an object that I identify as an apple. Perhaps my “apple” is a chocolate apple, one that looks similar to an apple but is made of chocolate and red dye, and yours is a Honeycrisp. Perhaps yours is papier-mache and mine is a MacBook Pro. All of these are called apples, right?


You can see the multitude of ways we could conceptualize “apple,” and how that could create a problem for our research. If I get a Red Delicious (ick) apple and you get a Granny Smith (yum) apple and we observe a change in neurotransmitters, it’s going to be even harder than usual to say the apple influenced the neurotransmitters, because we didn’t define “apple” well enough. Measurement in this case is essential to treatment fidelity, which means ensuring that everyone receives the same treatment, or as close to the same treatment as possible. In other words, you need to make sure everyone is consuming the same kind of apple, and you need a way to ensure that you give the same amount of apples to everyone in your treatment group.

In social science, when we use the term measurement, we mean the process by which we describe and ascribe meaning to the key facts, concepts, or other phenomena that we are investigating. At its core, measurement is about defining one’s terms in as clear and precise a way as possible. Of course, measurement in social science isn’t quite as simple as using a measuring cup or spoon, but there are some basic tenets on which most social scientists agree when it comes to measurement. We’ll explore those, as well as some of the ways that measurement might vary depending on your unique approach to the study of your topic.

An important point here is that measurement does not require any particular instruments or procedures. What it does require is some systematic procedure for assigning scores, meanings, and descriptions to individuals or objects so that those scores represent the characteristic of interest. You can measure phenomena in many different ways, but you must be sure that how you choose to measure gives you information and data that lets you answer your research question. If you’re looking for information about a person’s income, but your main points of measurement have to do with the money they have in the bank, you’re not really going to find the information you’re looking for!

What do social scientists measure?

The question of what social scientists measure can be answered by asking yourself what social scientists study. Think about the topics you’ve learned about in other social work classes you’ve taken or the topics you’ve considered investigating yourself. Let’s consider Melissa Milkie and Catharine Warner’s study (2011) [1] of first graders’ mental health. In order to conduct that study, Milkie and Warner needed to have some idea about how they were going to measure mental health. What does mental health mean, exactly? And how do we know when we’re observing someone whose mental health is good and when we see someone whose mental health is compromised? Understanding how measurement works in research methods helps us answer these sorts of questions.

As you might have guessed, social scientists will measure just about anything that they have an interest in investigating. For example, those who are interested in learning something about the correlation between social class and levels of happiness must develop some way to measure both social class and happiness. Those who wish to understand how well immigrants cope in their new locations must measure immigrant status and coping. Those who wish to understand how a person’s gender shapes their workplace experiences must measure gender and workplace experiences. You get the idea. Social scientists can and do measure just about anything you can imagine observing or wanting to study. Of course, some things are easier to observe or measure than others.

Philosopher Abraham Kaplan (1964) [2] wrote The Conduct of Inquiry, which has since become a classic work in research methodology (Babbie, 2010). [3] In his text, Kaplan describes different categories of things that behavioral scientists observe. One of those categories, which Kaplan called “observational terms,” is probably the simplest to measure in social science. Observational terms are the sorts of things that we can see with the naked eye simply by looking at them. Kaplan roughly defines them as conditions that are easy to identify and verify through direct observation. If, for example, we wanted to know how the conditions of playgrounds differ across different neighborhoods, we could directly observe the variety, amount, and condition of equipment at various playgrounds.

Indirect observables, on the other hand, are less straightforward to assess. In Kaplan’s framework, they are conditions that are subtle and complex, which we must define using existing knowledge and intuition. If we conducted a study for which we wished to know a person’s income, we’d probably have to ask them their income, perhaps in an interview or a survey. Thus, we have observed income, even if only indirectly. Birthplace might be another indirect observable. We can ask study participants where they were born, but chances are good we won’t have directly observed any of those people being born in the locations they report.

How do social scientists measure?

Measurement in social science is a process. It occurs at multiple stages of a research project: in the planning stages, in the data collection stage, and sometimes even in the analysis stage. Recall that previously we defined measurement as the process by which we describe and ascribe meaning to the key facts, concepts, or other phenomena that we are investigating. Once we’ve identified a research question, we begin to think about what some of the key ideas are that we hope to learn from our project. In describing those key ideas, we begin the measurement process.

Let’s say that our research question is the following: How do new college students cope with the adjustment to college? In order to answer this question, we’ll need some idea about what coping means. We may come up with an idea about what coping means early in the research process, as we begin to think about what to look for (or observe) in our data-collection phase. Once we’ve collected data on coping, we also have to decide how to report on the topic. Perhaps, for example, there are different types or dimensions of coping, some of which lead to more successful adjustment than others. However we decide to proceed, and whatever we decide to report, the point is that measurement is important at each of these phases.

As the preceding example demonstrates, measurement is a process in part because it occurs at multiple stages of conducting research. We could also think of measurement as a process because it involves multiple stages. From identifying your key terms to defining them to figuring out how to observe them and how to know if your observations are any good, there are multiple steps involved in the measurement process. An additional step in the measurement process involves deciding what elements your measures contain. A measure’s elements might be very straightforward and clear, particularly if they are directly observable. Other measures are more complex and might require the researcher to account for different themes or types. These sorts of complexities require paying careful attention to a concept’s level of measurement and its dimensions. We’ll explore these complexities in greater depth at the end of this chapter, but first let’s look more closely at the early steps involved in the measurement process, starting with conceptualization.

The idea of coming up with your own measurement tool might sound pretty intimidating at this point. The good news is that if you find something in the literature that works for you, you can use it with proper attribution. If there are only pieces of it that you like, you can just use those pieces, again with proper attribution. You don’t always have to start from scratch!

Key Takeaways

  • Measurement (i.e. the measurement process) gives us the language to define/describe what we are studying.
  • In research, when we develop measurement tools, we move beyond concepts that may be subjective and abstract to a definition that is clear and concise.
  • Good social work researchers are intentional with the measurement process.
  • Engaging in the measurement process requires us to think critically about what we want to study. This process may be challenging and potentially time-consuming.
Exercises

  • Think about the topics you are interested in studying. How easy or difficult do you believe it will be to study them?
  • Think about the chapter on literature reviews. Is there a significant body of literature on the topics you are interested in studying?
  • Are there existing measurement tools that may be appropriate for the topics you are interested in studying?

11.2 Operationalization and levels of measurement

Learning Objectives

Learners will be able to…

  • Define constructs and operationalization and describe their relationship
  • Be able to start operationalizing variables in your research project
  • Identify the level of measurement for each type of variable
  • Demonstrate knowledge of how each type of variable can be used

Now we have some ideas about what and how social scientists need to measure, so let’s get into the details. In this section, we are going to talk about how to make your variables measurable (operationalization) and how you ultimately characterize your variables in order to analyze them (levels of measurement).

Operationalizing your variables

“Operationalizing” is not a word I’d ever heard before I became a researcher, and actually, my browser’s spell check doesn’t even recognize it. I promise it’s a real thing, though. In the most basic sense, when we operationalize a variable, we break it down into measurable parts. Operationalization is the process of determining how to measure a construct that cannot be directly observed, and constructs are conditions that are not directly observable and represent states of being, experiences, and ideas. But why construct? We call them constructs because they are built using different ideas and parameters.

As we know from Section 11.1, sometimes the measures that we are interested in are more complex and more abstract than observational terms or indirect observables. Think about some of the things you’ve learned about in other social work classes—for example, ethnocentrism. What is ethnocentrism? Well, from completing an introduction to social work class you might know that it’s a construct that has something to do with the way a person judges another’s culture. But how would you measure it? Here’s another construct: bureaucracy. We know this term has something to do with organizations and how they operate, but measuring such a construct is trickier than measuring, say, a person’s income. In both cases, ethnocentrism and bureaucracy, these theoretical notions represent ideas whose meaning we have come to agree on. Though we may not be able to observe these abstractions directly, we can observe the things that they are made up of.


Now, let’s operationalize bureaucracy and ethnocentrism. The construct of bureaucracy could be measured by counting the number of supervisors that need to approve routine spending by public administrators. The greater the number of administrators that must sign off on routine matters, the greater the degree of bureaucracy. Similarly, we might be able to ask a person the degree to which they trust people from different cultures around the world and then assess the ethnocentrism inherent in their answers. We can measure constructs like bureaucracy and ethnocentrism by defining them in terms of what we can observe.

How we operationalize our constructs (and ultimately measure our variables) can affect the conclusions we can draw from our research. Let’s say you’re reviewing a state program to make it more efficient in connecting people to public services. What might be different if we decide to measure bureaucracy by the number of forms someone has to fill out to get a public service instead of the number of people who have to review the forms, like we talked about above? Maybe you find that there is an unnecessary amount of paperwork based on comparisons to other state programs, so you recommend that some of it be eliminated. This is probably a good thing, but will it actually make the program more efficient in the way that eliminating some of the required reviews would? I’m not making a judgment on which way is better to measure bureaucracy, but I encourage you to think about the costs and benefits of each way we operationalized the construct of bureaucracy, and to extend this thinking to the way you operationalize concepts in your own research project.

Levels of Measurement

Now, we’re going to move into some more concrete characterizations of variables. You now hopefully understand how to operationalize your concepts so that you can turn them into variables. Imagine a process kind of like what you see in Figure 11.1 below.

Figure 11.1: A diagram of the operationalization process, in which constructs are operationalized into measurable variables that point toward the research question.

Notice that the arrows from the construct point toward the research question, because ultimately, measuring them will help answer your question!

The level of measurement of a variable tells us how the values of the variable relate to each other and what mathematical operations we can perform with the variable. (That second part will become important once we move into quantitative analysis in Chapter 14 and Chapter 15.) Many students find this definition a bit confusing. What does it mean when we say that the level of measurement tells us about mathematical operations? So before we move on, let’s clarify this a bit.

Let’s say you work for a community nonprofit that wants to develop programs relevant to community members’ ages (i.e., tutoring for kids in school, job search and resume help for adults, and home visiting for elderly community members). However, you do not have a good understanding of the ages of the people who visit your community center. Below is part of a questionnaire that you developed to gather this information.

  • How old are you?
      A. Under 18 years old
      B. 18-30 years old
      C. 31-50 years old
      D. 51-60 years old
      E. Over 60 years old
  • How old are you? _____ years

Look at the two items on this questionnaire. They both ask about age, but the first item asks the participant to identify an age range, while the second asks for the actual age in years. These two questions give us data that represent the same information measured at different levels.

It would help your agency if you knew the average age of clients, right? So, which item on the questionnaire will provide this information? Item one’s choices are grouped into categories. Can you compute an average age from these choices? No. Conversely, participants completing item two are asked to provide an actual number, one that you could use to determine an average age. In summary, the two items both ask the participants to report their age. However, the type of data collected from both items is different and must be analyzed differently. 
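To make this concrete, here is a minimal Python sketch using made-up responses (the data and variable names are hypothetical, purely for illustration). It shows why the second item supports an average while the first only supports counting how often each category appears.

```python
from collections import Counter

# Hypothetical responses to the two questionnaire items above.
item_one = ["18-30 years old", "31-50 years old", "18-30 years old",
            "Over 60 years old", "51-60 years old"]  # age ranges (categories)
item_two = [22, 47, 25, 68, 55]                      # age in years (numbers)

# With categories, the most we can do is count how often each one appears.
print(Counter(item_one))

# With actual ages, arithmetic is meaningful, so an average is possible.
print(sum(item_two) / len(item_two))  # 43.4
```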

We can think about the four levels of measurement as going from less to more specific, or as it’s more commonly called, lower to higher: nominal, ordinal, interval, and ratio. Each of these levels differs and helps the researcher understand something about their data. Think about levels of measurement as a hierarchy.

In order to determine the level of measurement, examine your data and then ask these four questions, in order (a small code sketch of this decision process follows the list).

  1. Do I have mutually exclusive categories? If the answer is yes, continue to question #2.
  2. Do my item choices have a hierarchy or order? In other words, can you put your item choices in order? If no, stop: you have nominal level data. If the answer is yes, continue to question #3.
  3. Can I add, subtract, divide, and multiply my answer choices? If no, stop: you have ordinal level data. If the answer is yes, continue to question #4.
  4. Does my item have an absolute zero, where an answer of zero means the absence of the thing being measured? If the answer is no, you have interval level data. If the answer is yes, you are at the ratio level of measurement.
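The four questions above can be read as a simple decision tree. Below is a minimal Python sketch of that logic, assuming you have already answered each question by inspecting your own items; the function and argument names are our own, not standard terminology.

```python
def level_of_measurement(mutually_exclusive: bool,
                         ordered: bool,
                         supports_math: bool,
                         absolute_zero: bool) -> str:
    """Walk the four questions from the list above, in order."""
    if not mutually_exclusive:
        return "recheck your categories (they must be mutually exclusive)"
    if not ordered:
        return "nominal"
    if not supports_math:
        return "ordinal"
    if not absolute_zero:
        return "interval"
    return "ratio"

# The two questionnaire items from earlier:
print(level_of_measurement(True, True, False, False))  # age ranges -> ordinal
print(level_of_measurement(True, True, True, False))   # age in years -> interval
```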

Nominal level. The nominal level of measurement is the lowest level of measurement. It contains categories that are mutually exclusive, which means that anyone who falls into one category cannot also fall into another category. The data can be represented with words (like yes/no) or numbers that correspond to words or a category (like 1 equaling yes and 0 equaling no). Even when the categories are represented as numbers in our data, the number itself does not have an actual numerical value. It is merely a number we have assigned so that we can use the variable in mathematical operations (which we will start talking about in Chapter 14.1). We say this level of measurement is lowest or least specific because someone who falls into a category we’ve designated could differ from someone else in the same category. Let’s say on our questionnaire above, we also asked folks whether they own a car. They can answer yes or no, and they fall into mutually exclusive categories. In this case, we would know whether they own a car, but not whether owning a car really affects their life significantly. Maybe they have chosen not to own one and are happy to take the bus, bike, or walk. Maybe they do not own one but would like to own one. We cannot get this information from a nominal variable, which is okay when we have meaningful categories. Nominal variables are especially useful when we just need the frequency of a particular characteristic in our sample.

The nominal level of measurement usually includes many demographic characteristics like race, gender, or marital status.

Ordinal level. The ordinal level of measurement is the next level of measurement and contains slightly more specific information than the nominal level. This level has mutually exclusive categories and a hierarchy or order. Let’s go back to the first item on the questionnaire we talked about above.

Do we have mutually exclusive categories? Yes. Someone who selects item A cannot also select item B. So, we know that we have at least nominal level data. However, the next question that we need to ask is “Do my answer choices have order?” or “Can I put my answer choices in order?” The answer is yes, someone who selects A is younger than someone who selects B or C. So, you have at least ordinal level data.

From a data analysis and statistical perspective, ordinal variables get treated exactly like nominal variables because they are both categorical variables, or variables whose values are organized into mutually exclusive groups but whose numerical values cannot be used in mathematical operations. You’ll see this term used again when we get into bivariate analysis in Chapter 15.

Interval level. The interval level of measurement is a higher level of measurement. This level contains all of the characteristics of the previous levels (mutually exclusive categories and order). What distinguishes it from the ordinal level is that interval level data can be used to conduct mathematical computations (like an average, for instance).

Let’s think back to our questionnaire about age again and take a look at the second question where we asked for a person’s exact age in years. Age in years is mutually exclusive – someone can’t be 14 and 15 at the same time – and the order of ages is meaningful, since being 18 means something different than being 32. Now, we can also take the answers to this question and do math with them, like addition, subtraction, multiplication, and division.

Ratio level. Ratio level data is the highest level of measurement. It has mutually exclusive categories, order, and you can perform mathematical operations on it. The main difference between the interval and ratio levels is that the ratio level has an absolute zero, meaning that a value of zero is both possible and meaningful. You might be thinking, “Well, age has an absolute zero,” but someone who is not yet born does not have an age, and the minute they’re born, they are not zero years old anymore.

Data at the ratio level of measurement are usually amounts or numbers of things, and cannot be negative, since a value of zero already means the absence of the thing being measured. For example, you could ask someone to report how many A’s they have on their transcript or how many semesters they have earned a 4.0. They could have zero A’s and that would be a valid answer.

From a data analysis and statistical perspective, interval and ratio variables are treated exactly the same because they are both continuous variables , or variables whose values are mutually exclusive and can be used in mathematical operations. Technically, a continuous variable could have an infinite number of values.

What does the level of measurement tell us?

We have spent time learning how to determine our data’s level of measurement. Now what? How could we use this information to help us as we measure concepts and develop measurement tools? First, the types of statistical tests that we are able to use are dependent on our data’s level of measurement. (We will discuss this soon in Chapter 15.) The higher the level of measurement, the more complex statistical tests we are able to conduct. This knowledge may help us decide what kind of data we need to gather, and how. That said, we have to balance this knowledge with the understanding that sometimes, collecting data at a higher level of measurement could negatively impact our studies. For instance, sometimes providing answers in ranges may make prospective participants feel more comfortable responding to sensitive items. Imagine that you were interested in collecting information on topics such as income, number of sexual partners, number of times used illicit drugs, etc. You would have to think about the sensitivity of these items and determine if it would make more sense to collect some data at a lower level of measurement.

Finally, sometimes when analyzing data, researchers find a need to change a variable’s level of measurement. For example, a few years ago, a student of mine was interested in studying the relationship between mental health and life satisfaction. This student collected a variety of data. One item asked about the number of mental health diagnoses, reported as the actual number. When analyzing the data, the student examined the mental health diagnosis variable and noticed that she had two groups: those with no or one diagnosis and those with many diagnoses. Instead of using the ratio level data (the actual number of mental health diagnoses), she collapsed her cases into two categories, few and many, and used this variable in her analyses. It is important to note that you can move data from a higher level of measurement to a lower level; however, you cannot move data from a lower level to a higher level.
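Here is a sketch of the kind of recoding the student did, with hypothetical diagnosis counts and an assumed cutoff. Note that the transformation only works in one direction: the original numbers cannot be recovered from the collapsed categories.

```python
# Hypothetical counts of mental health diagnoses (ratio level).
diagnoses = [0, 1, 0, 4, 6, 1, 5, 0, 3]

# Collapse to two categories ("few" vs. "many"); the cutoff of <= 1
# is an assumption made for illustration.
collapsed = ["few" if n <= 1 else "many" for n in diagnoses]
print(collapsed)

# Moving back up is impossible: a case coded "many" could have been 3, 4, 5, or 6.
```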

Key Takeaways

  • Operationalization involves figuring out how to measure a construct you cannot directly observe.
  • Nominal variables have mutually exclusive categories with no natural order. They cannot be used for mathematical operations like addition or subtraction. Race and gender are examples.
  • Ordinal variables have mutually exclusive categories and a natural order. They also cannot be used for mathematical operations like addition or subtraction. Age when measured in categories (i.e., 18-25 years old) would be an example.
  • Interval variables have mutually exclusive categories, a natural order, and can be used for mathematical operations. Age as a raw number would be an example.
  • Ratio variables have mutually exclusive categories, a natural order, can be used for mathematical operations, and have an absolute zero value. The number of times someone calls a legislator to advocate for a policy would be an example.
  • Nominal and ordinal variables are categorical variables, meaning they have mutually exclusive categories and cannot be used for mathematical operations, even when assigned a number.
  • Interval and ratio variables are continuous variables, meaning their values are mutually exclusive and can be used in mathematical operations.
  • Researchers should consider the costs and benefits of how they operationalize their variables, including what level of measurement they choose, since the level of measurement can affect how you must gather your data.
Exercises

  • What are the primary constructs being explored in the research?
  • Could you (or the study authors) have chosen another way to operationalize this construct?
  • What are these variables’ levels of measurement?
  • Are they categorical or continuous?

11.3 Scales and indices

Learning Objectives

Learners will be able to…

  • Identify different types of scales and compare them to each other
  • Understand how to begin the process of constructing scales or indices

Quantitative data analysis often requires the construction of two types of composite measures of variables: indices and scales. These measures are frequently used and are important because social scientists often study variables that possess no single, clear, and unambiguous indicator (unlike, say, age or gender). First, researchers often focus on the attitudes and orientations of a group of people, which requires several items to adequately capture the variable. Second, researchers seek to establish ordinal categories from very low to very high (or vice versa), which single data items cannot ensure but an index or scale can.

Although they exhibit differences (which will be discussed later), the two have several factors in common.

  • Both are ordinal measures of variables.
  • Both can order the units of analysis in terms of specific variables.
  • Both are composite measures of variables (measurements based on more than one data item).

In general, indices are a sum of a series of individual yes/no questions that are then combined into a single numeric score. They are usually a measure of the quantity of some social phenomenon and are constructed at a ratio level of measurement. More sophisticated indices weight individual items according to their importance in the concept being measured (e.g., a multiple-choice test where different questions are worth different numbers of points). Some interval-level indices are not weighted sums of individual items but instead contain other indices or scales within them (e.g., a college admissions score that combines an applicant’s GPA, SAT scores, and essays, drawing a different number of points from each source).
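To illustrate the difference between a simple additive index and a weighted one, here is a small Python sketch; the items, responses, and weights are all invented for the example.

```python
# Five hypothetical yes/no index items, coded 1 = yes, 0 = no.
responses = [1, 0, 1, 1, 0]

# Simple index: each item counts equally, so we just sum the items.
simple_index = sum(responses)  # 3

# Weighted index: items contribute according to assumed importance weights.
weights = [1, 2, 2, 1, 3]
weighted_index = sum(r * w for r, w in zip(responses, weights))  # 1 + 2 + 1 = 4
print(simple_index, weighted_index)
```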

This section discusses two formats used for measurement in research: scales and indices (sometimes called indexes). These two formats are helpful in research because they use multiple indicators to develop a composite (or total) score. Composite scores provide a much greater understanding of concepts than a single item could. Although we won’t delve too deeply into the process of scale development, we will cover some important topics for you to understand how scales and indices can be used.

Types of scales

As a student, you are very familiar with end-of-semester course evaluations. These evaluations usually include statements such as, “My instructor created an environment of respect,” and ask students to use a scale to indicate how much they agree or disagree with the statements. These scales, if developed and administered appropriately, provide a wealth of information to instructors that may be used to refine and update courses. If you examine your end-of-semester evaluations, you will notice that they are organized, use language that is specific to your course, and have very intentional methods of implementation. In essence, these tools are developed to encourage completion.

As you read about these scales, think about the information that you want to gather from participants. What type or types of scales would be the best for you to use and why? Are there existing scales or do you have to create your own?

The Likert scale

Most people have seen some version of a Likert scale. Designed by Rensis Likert (Likert, 1932) [4], a Likert scale is a very popular rating scale for measuring ordinal data in social work research. This scale includes Likert items that are simply-worded statements to which participants can indicate their extent of agreement or disagreement on a five- or seven-point scale ranging from “strongly disagree” to “strongly agree.” You will also see Likert scales used for importance, quality, frequency, and likelihood, among lots of other concepts. For example, we might use a Likert scale to assess your attitudes about research as you work your way through this textbook.

Likert scales are excellent ways to collect information. They are popular; thus, your prospective participants may already be familiar with them. However, they do pose some challenges. You have to be very clear about your question prompts. What does strongly agree mean, and how is it differentiated from agree? In order to clarify this for participants, some researchers will place definitions of these items at the beginning of the tool.

There are a few other, less commonly used, scales discussed next.

Semantic differential scale

This is a composite (multi-item) scale where respondents are asked to indicate their opinions or feelings toward a single statement using pairs of adjectives framed as polar opposites. For instance, with a Likert scale, the participant is asked how much they agree or disagree with a statement. In a semantic differential scale, the participant is asked to indicate how they feel about a specific item. This makes the semantic differential scale an excellent technique for measuring people’s attitudes or feelings toward objects, events, or behaviors. The following is an example of a semantic differential scale that was created to assess participants’ feelings about the content taught in their research class.

Feelings About My Research Class

Directions: Please review the pair of words and then select the one that most accurately reflects your feelings about the content of your research class.

Boring……………………………………….Exciting

Waste of Time…………………………..Worthwhile

Dry…………………………………………….Engaging

Irrelevant…………………………………..Relevant

Guttman scale

This composite scale was designed by Louis Guttman and uses a series of items arranged in increasing order of intensity (least intense to most intense) of the concept. This type of scale allows us to understand the intensity of beliefs or feelings. Each item in a Guttman scale has a weight (not indicated on the tool itself) which varies with the intensity of that item, and the weighted combination of each response is used as an aggregate measure of an observation. Let’s pretend that you are working with a group of parents whose children have identified as part of the transgender community. You want to know how comfortable they feel with their children. You could develop the following items.

Example Guttman Scale Items

  • I would allow my child to use a name that was not gender-specific (e.g., Ryan, Taylor) (Yes/No)
  • I would allow my child to wear clothing of the opposite gender (e.g., dresses for boys) (Yes/No)
  • I would allow my child to use the pronoun of the opposite sex (Yes/No)
  • I would allow my child to live as the opposite gender (Yes/No)

Notice how the items move from lower intensity to higher intensity. A researcher reviews the yes answers and creates a score for each participant.
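Here is a minimal sketch of how such scoring might work for the four items above. The intensity weights are hypothetical, since, as noted, they are not indicated on the tool itself.

```python
# The four Guttman items above, ordered least to most intense,
# with hypothetical intensity weights (not shown to participants).
items = [
    ("gender-neutral name", 1),
    ("clothing of the opposite gender", 2),
    ("pronoun of the opposite sex", 3),
    ("live as the opposite gender", 4),
]

def guttman_score(answers):
    """Sum the weights of the items answered 'yes' (True)."""
    return sum(weight for (label, weight), yes in zip(items, answers) if yes)

# A parent who answers yes to the first two items only scores 1 + 2 = 3.
print(guttman_score([True, True, False, False]))
```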

Indices (Indexes)

An index is a composite score derived from aggregating measures of multiple concepts (called components) using a set of rules and formulas. It is different from a scale. Scales also aggregate measures; however, these measures examine different dimensions or the same dimension of a single construct. A well-known example of an index is the consumer price index (CPI), which is computed every month by the Bureau of Labor Statistics of the U.S. Department of Labor. The CPI is a measure of how much consumers have to pay for goods and services (in general) and is divided into eight major categories (food and beverages, housing, apparel, transportation, healthcare, recreation, education and communication, and “other goods and services”), which are further subdivided into more than 200 smaller items. Each month, government employees call all over the country to get the current prices of more than 80,000 items. Using a complicated weighting scheme that takes into account the location and probability of purchase for each item, analysts then combine these prices into an overall index score using a series of formulas and rules.

Another example of an index is the Duncan Socioeconomic Index (SEI). This index is used to quantify a person’s socioeconomic status (SES) and is a combination of three concepts: income, education, and occupation. Income is measured in dollars, education in years or degrees achieved, and occupation is classified into categories or levels by status. These very different measures are combined to create an overall SES index score. However, SES index measurement has generated a lot of controversy and disagreement among researchers.

The process of creating an index is similar to that of a scale. First, conceptualize (define) the index and its constituent components. Though this appears simple, there may be a lot of disagreement on what components (concepts/constructs) should be included or excluded from an index. For instance, in the SES index, isn’t income correlated with education and occupation? And if so, should we include one component only or all three components? Reviewing the literature, using theories, and/or interviewing experts or key stakeholders may help resolve this issue. Second, operationalize and measure each component. For instance, how will you categorize occupations, particularly since some occupations may have changed with time (e.g., there were no Web developers before the Internet)? Third, create a rule or formula for calculating the index score. Again, this process may involve a lot of subjectivity. Lastly, validate the index score using existing or new data.

Differences Between Scales and Indices

Though indices and scales yield a single numerical score or value representing a concept of interest, they are different in many ways. First, indices often comprise components that are very different from each other (e.g., income, education, and occupation in the SES index) and are measured in different ways. Conversely, scales typically involve a set of similar items that use the same rating scale (such as a five-point Likert scale about customer satisfaction).

Second, indices often combine objectively measurable values such as prices or income, while scales are designed to assess subjective or judgmental constructs such as attitude, prejudice, or self-esteem. Some argue that the sophistication of the scaling methodology makes scales different from indexes, while others suggest that indexing methodology can be equally sophisticated. Nevertheless, indexes and scales are both essential tools in social science research.

A note on scales and indices

Scales and indices seem like clean, convenient ways to measure different phenomena in social science, but just like with a lot of research, we have to be mindful of the assumptions and biases underneath. What if a scale or an index was developed using only White women as research participants? Is it going to be useful for other groups? It very well might be, but when using a scale or index on a group for whom it hasn’t been tested, it will be very important to evaluate the validity and reliability of the instrument, which we address in the next section.

It’s important to note that while scales and indices are often made up of nominal or ordinal items, when we combine those items into composite scores, we treat the scores as interval/ratio variables.

Key Takeaways

  • Scales and indices are common ways to collect information and involve using multiple indicators in measurement.
  • A key difference between a scale and an index is that a scale contains multiple indicators for one concept, whereas an index examines multiple concepts (components).
  • In order to create scales or indices, researchers must have a clear understanding of the indicators for what they are studying.
Exercises

  • What is the level of measurement for each item on each tool? Take a second and think about why the tool’s creator decided to include these levels of measurement. Identify any levels of measurement you would change and why.
  • If these tools don’t exist for what you are interested in studying, why do you think that is?

11.4 Reliability and validity in measurement

Learning Objectives

Learners will be able to…

  • Discuss measurement error, the different types of error, and how to minimize the probability of making them
  • Differentiate between reliability and validity and understand how these are related to each other and relevant to understanding the value of a measurement tool
  • Compare and contrast the types of reliability and demonstrate how to evaluate each type
  • Compare and contrast the types of validity and demonstrate how to evaluate each type

The previous sections provided insight into measuring concepts in social work research. We discussed the importance of identifying concepts and their corresponding indicators as a way to help us operationalize them. In essence, we now understand that when we think about our measurement process, we must be intentional and thoughtful in the choices that we make. Before we talk about how to evaluate our measurement process, let’s discuss why we want to evaluate it. We evaluate our process so that we minimize our chances of error. But what is measurement error?

Types of Errors

We need to be concerned with two types of errors in measurement: systematic and random errors. Systematic errors are errors that are generally predictable. These are errors that “are due to the process that biases the results.” [5] For instance, my cat stepping on the scale with me each morning is a systematic error in measuring my weight. I could predict that each measurement would be off by 13 pounds. (He’s a bit of a chonk.)

There are multiple categories of systematic errors.

  • Social desirability occurs when you ask participants a question and they answer in the way that they feel is the most socially desirable. For instance, let's imagine that you want to understand the level of prejudice that participants feel regarding immigrants and decide to conduct face-to-face interviews with participants. Some participants may feel compelled to answer in a way that indicates that they are less prejudiced than they really are.
  • Acquiescence bias occurs when participants answer items in some type of pattern, usually skewed to more favorable responses. For example, imagine that you took a research class and loved it. The professor was great and you learned so much. When asked to complete the end of course questionnaire, you immediately mark "strongly agree" to all items without really reading all of the items. After all, you really loved the class. However, instead of reading and reflecting on each item, you "acquiesced" and used your overall impression of the experience to answer all of the items.
  • Leading questions are those questions that are worded in a way so that the participant is "led" to a specific answer. For instance, think about the question, "Have you ever hurt a sweet, innocent child?" Most people, regardless of their true response, may answer "no" simply because the wording of the question leads the participant to believe that "no" is the correct answer.

In order to minimize these types of errors, you should think about what you are studying and examine potential public perceptions of this issue. Next, think about how your questions are worded and how you will administer your tool (we will discuss these in greater detail in the next chapter). This will help you determine if your methods inadvertently increase the probability of these types of errors. 

These errors differ from random errors, which are "due to chance and are not systematic in any way." [6] Sometimes it is difficult to "tease out" random errors. When you take your statistics class, you will learn more about random errors and what to do about them. They're hard to observe until you start diving deeper into statistical analysis, so put a pin in them for now.

Now that we have a good understanding of the two types of errors, let's discuss what we can do to evaluate our measurement process and minimize the chances of these occurring. Remember, quality projects are clear on what is measured, how it is measured, and why it is measured. In addition, quality projects are attentive to the appropriateness of measurement tools and evaluate whether tools are used correctly and consistently. But how do we do that? Good researchers do not simply assume that their measures work. Instead, they collect data to demonstrate that they work. If their research does not demonstrate that a measure works, they stop using it. There are two key factors to consider in deciding whether your measurements are good: reliability and validity.

Reliability

Reliability refers to the consistency of a measure. Psychologists consider three types of reliability: over time (test-retest reliability), across items (internal consistency), and across different researchers (inter-rater reliability).

Test-retest reliability

When researchers measure a construct that they assume to be consistent across time, then the scores they obtain should also be consistent across time. Test-retest reliability is the extent to which this is actually the case. For example, intelligence is generally thought to be consistent across time. A person who is highly intelligent today will be highly intelligent next week. This means that any good measure of intelligence should produce roughly the same scores for this individual next week as it does today. Clearly, a measure that produces highly inconsistent scores over time cannot be a very good measure of a construct that is supposed to be consistent.

Assessing test-retest reliability requires using the measure on a group of people at one time and then using it again on the same group of people at a later time. At neither point has the research participant received any sort of intervention. Once you have these two measurements, you then look at the correlation between the two sets of scores. This is typically done by graphing the data in a scatterplot and computing the correlation coefficient. Figure 11.2 shows the correlation between two sets of scores of several university students on the Rosenberg Self-Esteem Scale, administered two times, a week apart. The correlation coefficient for these data is +.95. In general, a test-retest correlation of +.80 or greater is considered to indicate good reliability.

Figure 11.2: A scatterplot with scores at time 1 on the x-axis and scores at time 2 on the y-axis, both ranging from 0 to 30. The dots indicate a strong, positive correlation.
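Here is a short sketch, with made-up scores, of the computation behind a figure like this one: administer the same scale twice to the same people and correlate the two sets of scores.

```python
import statistics  # statistics.correlation requires Python 3.10+

# Hypothetical self-esteem scores for the same five people, one week apart.
time1 = [22, 15, 27, 18, 25]
time2 = [21, 16, 28, 17, 26]

r = statistics.correlation(time1, time2)  # Pearson's r
print(round(r, 2))
print("acceptable test-retest reliability" if r >= 0.80 else "re-examine the measure")
```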

Again, high test-retest correlations make sense when the construct being measured is assumed to be consistent over time, which is the case for intelligence, self-esteem, and the Big Five personality dimensions. But other constructs are not assumed to be stable over time. The very nature of mood, for example, is that it changes. So a measure of mood that produced a low test-retest correlation over a period of a month would not be a cause for concern.

Internal consistency

Another kind of reliability is internal consistency, which is the consistency of people’s responses across the items on a multiple-item measure. In general, all the items on such measures are supposed to reflect the same underlying construct, so people’s scores on those items should be correlated with each other. On the Rosenberg Self-Esteem Scale, people who agree that they are a person of worth should tend to agree that they have a number of good qualities. If people’s responses to the different items are not correlated with each other, then it would no longer make sense to claim that they are all measuring the same underlying construct. This is as true for behavioral and physiological measures as for self-report measures. For example, people might make a series of bets in a simulated game of roulette as a measure of their level of risk seeking. This measure would be internally consistent to the extent that individual participants’ bets were consistently high or low across trials.
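One way to check this, sketched below with hypothetical ratings, is to correlate responses across every pair of items and average the results; in practice, researchers usually summarize internal consistency with a single statistic such as Cronbach's alpha, which builds on the same idea.

```python
import statistics  # statistics.correlation requires Python 3.10+
from itertools import combinations

# Hypothetical 1-5 ratings from six people on three items that are
# all supposed to measure the same underlying construct.
items = {
    "item1": [4, 2, 5, 3, 4, 1],
    "item2": [5, 2, 4, 3, 5, 2],
    "item3": [4, 1, 5, 2, 4, 2],
}

# Average correlation across all pairs of items; higher values mean
# people respond consistently across the items.
pair_rs = [statistics.correlation(items[a], items[b])
           for a, b in combinations(items, 2)]
print(round(statistics.mean(pair_rs), 2))
```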

Interrater Reliability

Many behavioral measures involve significant judgment on the part of an observer or a rater. Interrater reliability is the extent to which different observers are consistent in their judgments. For example, if you were interested in measuring university students’ social skills, you could make video recordings of them as they interacted with another student whom they are meeting for the first time. Then you could have two or more observers watch the videos and rate each student’s level of social skills. To the extent that each participant does, in fact, have some level of social skills that can be detected by an attentive observer, different observers’ ratings should be highly correlated with each other.
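Continuing that example, here is a sketch with two hypothetical observers rating the same five videos. For numerical ratings like these, consistency shows up as a high correlation between the raters' scores (for categorical judgments, researchers often use an agreement statistic such as Cohen's kappa instead).

```python
import statistics  # statistics.correlation requires Python 3.10+

# Hypothetical social-skills ratings (1-10) of five videos by two observers.
rater_a = [7, 4, 9, 5, 8]
rater_b = [6, 4, 9, 6, 8]

# High agreement between observers appears as a strong positive correlation.
print(round(statistics.correlation(rater_a, rater_b), 2))
```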

Validity

Validity, another key element of assessing measurement quality, is the extent to which the scores from a measure represent the variable they are intended to. But how do researchers make this judgment? We have already considered one factor that they take into account—reliability. When a measure has good test-retest reliability and internal consistency, researchers should be more confident that the scores represent what they are supposed to. There has to be more to it, however, because a measure can be extremely reliable but have no validity whatsoever. As an absurd example, imagine someone who believes that people’s index finger length reflects their self-esteem and therefore tries to measure self-esteem by holding a ruler up to people’s index fingers. Although this measure would have extremely good test-retest reliability, it would have absolutely no validity. The fact that one person’s index finger is a centimeter longer than another’s would indicate nothing about which one had higher self-esteem.

Discussions of validity usually divide it into several distinct “types.” But a good way to interpret these types is that they are other kinds of evidence—in addition to reliability—that should be taken into account when judging the validity of a measure.

Face validity

Face validity is the extent to which a measurement method appears “on its face” to measure the construct of interest. Most people would expect a self-esteem questionnaire to include items about whether they see themselves as a person of worth and whether they think they have good qualities. So a questionnaire that included these kinds of items would have good face validity. The finger-length method of measuring self-esteem, on the other hand, seems to have nothing to do with self-esteem and therefore has poor face validity. Although face validity can be assessed quantitatively—for example, by having a large sample of people rate a measure in terms of whether it appears to measure what it is intended to—it is usually assessed informally.

Face validity is at best a very weak kind of evidence that a measurement method is measuring what it is supposed to. One reason is that it is based on people’s intuitions about human behavior, which are frequently wrong. It is also the case that many established measures in psychology work quite well despite lacking face validity. The Minnesota Multiphasic Personality Inventory-2 (MMPI-2) measures many personality characteristics and disorders by having people decide whether each of 567 different statements applies to them—where many of the statements do not have any obvious relationship to the construct that they measure. For example, the items “I enjoy detective or mystery stories” and “The sight of blood doesn’t frighten me or make me sick” both measure the suppression of aggression. In this case, it is not the participants’ literal answers to these questions that are of interest, but rather whether the pattern of the participants’ responses to a series of questions matches those of individuals who tend to suppress their aggression.

Content validity

Content validity is the extent to which a measure “covers” the construct of interest. For example, if a researcher conceptually defines test anxiety as involving both sympathetic nervous system activation (leading to nervous feelings) and negative thoughts, then their measure of test anxiety should include items about both nervous feelings and negative thoughts. Or consider that attitudes are usually defined as involving thoughts, feelings, and actions toward something. By this conceptual definition, a person has a positive attitude toward exercise to the extent that they think positive thoughts about exercising, feel good about exercising, and actually exercise. So to have good content validity, a measure of people’s attitudes toward exercise would have to reflect all three of these aspects. Like face validity, content validity is not usually assessed quantitatively. Instead, it is assessed by carefully checking the measurement method against the conceptual definition of the construct.

Criterion validity

Criterion validity is the extent to which people’s scores on a measure are correlated with other variables (known as criteria) that one would expect them to be correlated with. For example, people’s scores on a new measure of test anxiety should be negatively correlated with their performance on an important school exam. If it were found that people’s scores were in fact negatively correlated with their exam performance, then this would be a piece of evidence that these scores really represent people’s test anxiety. But if it were found that people scored equally well on the exam regardless of their test anxiety scores, then this would cast doubt on the validity of the measure.

A criterion can be any variable that one has reason to think should be correlated with the construct being measured, and there will usually be many of them. For example, one would expect test anxiety scores to be negatively correlated with exam performance and course grades and positively correlated with general anxiety and with blood pressure during an exam. Or imagine that a researcher develops a new measure of physical risk taking. People’s scores on this measure should be correlated with their participation in “extreme” activities such as snowboarding and rock climbing, the number of speeding tickets they have received, and even the number of broken bones they have had over the years. When the criterion is measured at the same time as the construct, criterion validity is referred to as concurrent validity ; however, when the criterion is measured at some point in the future (after the construct has been measured), it is referred to as predictive validity (because scores on the measure have “predicted” a future outcome).

Discriminant validity

Discriminant validity, on the other hand, is the extent to which scores on a measure are not correlated with measures of variables that are conceptually distinct. For example, self-esteem is a general attitude toward the self that is fairly stable over time. It is not the same as mood, which is how good or bad one happens to be feeling right now. So people’s scores on a new measure of self-esteem should not be very highly correlated with their moods. If the new measure of self-esteem were highly correlated with a measure of mood, it could be argued that the new measure is not really measuring self-esteem; it is measuring mood instead.

Increasing the reliability and validity of measures

We have reviewed the types of errors and how to evaluate our measures based on reliability and validity considerations. However, what can we do while selecting or creating our tool to minimize the potential for errors? Many of our options were covered in our discussion about reliability and validity; treat that discussion, along with the key takeaways below, as a checklist when creating or selecting a measurement tool.

Key Takeaways

  • In measurement, two types of errors can occur: systematic, which we might be able to predict, and random, which are difficult to predict but can sometimes be addressed during statistical analysis.
  • There are two distinct criteria by which researchers evaluate their measures: reliability and validity. Reliability is consistency across time (test-retest reliability), across items (internal consistency), and across researchers (interrater reliability). Validity is the extent to which the scores actually represent the variable they are intended to.
  • Validity is a judgment based on various types of evidence. The relevant evidence includes the measure’s reliability, whether it covers the construct of interest, and whether the scores it produces are correlated with other variables they are expected to be correlated with and not correlated with variables that are conceptually distinct.
  • Once you have used a measure, you should reevaluate its reliability and validity based on your new data. Remember that the assessment of reliability and validity is an ongoing process.
Exercises

  • Provide a clear statement regarding the reliability and validity of these tools. What strengths did you notice? What were the limitations?
  • Think about your target population. Are there changes that need to be made in order for one of these tools to be appropriate for your population?
  • If you decide to create your own tool, how will you assess its validity and reliability?

11.5 Ethical and social justice considerations for measurement

Learning Objectives

Learners will be able to…

  • Identify potential cultural, ethical, and social justice issues in measurement.

Just like with other parts of the research process, how we decide to measure what we are researching is influenced by our backgrounds, including our culture, implicit biases, and individual experiences. For me as a middle-class, cisgender white woman, the decisions I make about measurement will probably default to ones that make the most sense to me and others like me, and will thus measure characteristics about people like us most accurately, unless I think carefully about it. There are major implications for research here because this could affect the validity of my measurements for other populations.

This doesn't mean that standardized scales or indices, for instance, won't work for diverse groups of people. What it means is that researchers must not ignore difference in deciding how to measure a variable in their research. Doing so may serve to push already marginalized people further into the margins of academic research and, consequently, social work intervention. Social work researchers, with our strong orientation toward celebrating difference and working for social justice, are obligated to keep this in mind for ourselves and encourage others to think about it in their research, too.

This involves reflecting on what we are measuring, how we are measuring, and why we are measuring. Do we have biases that impacted how we operationalized our concepts? Did we include stakeholders and gatekeepers in the development of our concepts? Doing so can also be a way to gain access to vulnerable populations. What feedback did we receive on our measurement process, and how was it incorporated into our work? These are all questions we should ask as we think about measurement. Further, engaging in this intentionally reflective process will help us maximize the chances that our measurement will be accurate and as free from bias as possible.

The NASW Code of Ethics discusses social work research and the importance of engaging in practices that do not harm participants. [14] This is especially important considering that many of the topics studied by social workers are those that are disproportionately experienced by marginalized and oppressed populations. Some of these populations have had negative experiences with the research process: historically, their stories have been viewed through lenses that reinforced the dominant culture's standpoint. Thus, when thinking about measurement in research projects, we must remember that the way in which concepts or constructs are measured will impact how marginalized or oppressed persons are viewed. It is important that social work researchers examine current tools to ensure appropriateness for their population(s). Sometimes this may require researchers to use or adapt existing tools. Other times, this may require researchers to develop completely new measures. In summary, the measurement protocols selected should be tailored and attentive to the experiences of the communities to be studied.

But it's not just about reflecting and identifying problems and biases in our measurement, operationalization, and conceptualization: what are we going to do about it? Consider this as you move through this book and become a more critical consumer of research. Sometimes there isn't something you can do in the immediate sense; the literature base at this moment just is what it is. But how does that inform what you will do later?

Key Takeaways

  • Social work researchers must be attentive to personal and institutional biases in the measurement process that affect marginalized groups.

Exercises

  • What are the potential social justice considerations surrounding your methods?
  • What are some strategies you could employ to ensure that you engage in ethical research?
  • Milkie, M. A., & Warner, C. H. (2011). Classroom learning environments and the mental health of first grade children. Journal of Health and Social Behavior, 52, 4–22.
  • Kaplan, A. (1964). The conduct of inquiry: Methodology for behavioral science. San Francisco, CA: Chandler Publishing Company.
  • Earl Babbie offers a more detailed discussion of Kaplan's work in his text. You can read it in: Babbie, E. (2010). The practice of social research (12th ed.). Belmont, CA: Wadsworth.
  • Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 140, 1–55.
  • Engel, R., & Schutt, R. (2013). The practice of research in social work (3rd ed.). Thousand Oaks, CA: SAGE.
  • Engel, R., & Schutt, R. (2013). The practice of research in social work (3rd ed.). Thousand Oaks, CA: SAGE.
  • Sullivan, G. M. (2011). A primer on the validity of assessment instruments. Journal of Graduate Medical Education, 3(2), 119–120. doi:10.4300/JGME-D-11-00075.1
  • https://www.socialworkers.org/about/ethics/code-of-ethics/code-of-ethics-english

The process by which we describe and ascribe meaning to the key facts, concepts, or other phenomena that we are investigating.

In measurement, conditions that are easy to identify and verify through direct observation.

In measurement, conditions that are subtle and complex, which we must use existing knowledge and intuition to define.

The process of determining how to measure a construct that cannot be directly observed.

Conditions that are not directly observable and represent states of being, experiences, and ideas.

“a logical grouping of attributes that can be observed and measured and is expected to vary from person to person in a population” (Gillespie & Wagner, 2018, p. 9)

The level that describes the type of operations that can be conducted with your data. There are four levels: nominal, ordinal, interval, and ratio.

Level of measurement that follows the nominal level. Has mutually exclusive categories and a hierarchy (order).

A higher level of measurement. Denoted by mutually exclusive categories, a hierarchy (order), and equal spacing between values. This last item means that values may be meaningfully added and subtracted.

The highest level of measurement. Denoted by mutually exclusive categories, a hierarchy (order), equal spacing between values, and the presence of an absolute zero. The absolute zero means that values may also be meaningfully multiplied and divided.

Variables whose values are organized into mutually exclusive groups but whose numerical values cannot be used in mathematical operations.

Variables whose values are mutually exclusive and can be used in mathematical operations.

The difference between the value we get when we measure something and the true value.

Errors that are generally predictable.

Errors that lack any perceptible pattern.

The ability of a measurement tool to measure a phenomenon the same way, time after time. Note: Reliability does not imply validity.

The extent to which scores obtained on a scale or other measure are consistent across time

The extent to which different observers are consistent in their assessment or rating of a particular characteristic or item.

The extent to which the scores from a measure represent the variable they are intended to.

The extent to which a measurement method appears “on its face” to measure the construct of interest

The extent to which a measure “covers” the construct of interest, i.e., its comprehensiveness in measuring the construct.

The extent to which people’s scores on a measure are correlated with other variables (known as criteria) that one would expect them to be correlated with.

A type of criterion validity that examines how well a tool provides the same scores as an already existing tool.

A type of criterion validity that examines how well your tool predicts a future criterion.

The group of people whose needs your study addresses.

Individuals or groups who have an interest in the outcome of the study you conduct.

The people or organizations who control access to the population you want to study.

Graduate research methods in social work Copyright © 2020 by Matthew DeCarlo, Cory Cummings, Kate Agnelli is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


10.1 What is measurement?

Learning Objectives

Learners will be able to…

  • Define measurement
  • Explain where measurement fits into the process of designing research
  • Apply Kaplan’s three categories to determine the complexity of measuring a given variable

Pre-awareness check (Knowledge)

What do you already know about measuring key variables in your research topic?

In social science, when we use the term measurement, we mean the process by which we describe and ascribe meaning to the key facts, concepts, or other phenomena that we are investigating. In this chapter, we’ll use the term “concept” to mean an abstraction that has meaning. Concepts can be understood from our own experiences or from particular facts, but they don’t have to be limited to real-life phenomena. We can have a concept of anything we can imagine or experience, such as weightlessness, friendship, or income. Understanding exactly what our concepts mean is necessary in order to measure them.

In research, measurement is a systematic procedure for assigning scores, meanings, and descriptions to concepts so that those scores represent the characteristic of interest. Social scientists can and do measure just about anything you can imagine observing or wanting to study. Of course, some things are easier to observe or measure than others.

Where does measurement fit in the process of designing research?

Table 10.1 is intended as a partial review and outlines the general process researchers can follow to get from problem formulation to data collection, including measurement. Keep in mind that this process is iterative. For example, you may find something in your literature review that leads you to refine your conceptualizations, or you may discover as you attempt to conceptually define your terms that you need to return to the literature for further information. Accordingly, this table should be seen as a suggested path to take rather than an inflexible rule about how research must be conducted.

Table 10.1. Components of the Research Process from Problem Formulation to Data Collection. Note. Information on attachment theory in this table came from: Bowlby, J. (1978). Attachment theory and its therapeutic implications. Adolescent Psychiatry, 6 , 5-33

Categories of concepts that social scientists measure

In 1964, philosopher Abraham Kaplan (1964) [1] wrote The Conduct of Inquiry , which has been cited over 8,500 times. [2] In his text, Kaplan describes different categories of things that behavioral scientists observe. One of those categories, which Kaplan called “observational terms,” is probably the simplest to measure in social science. Observational terms are simple concepts. They are the sorts of things that we can see with the naked eye simply by looking at them. Kaplan roughly defines them as concepts that are easy to identify and verify through direct observation. If, for example, we wanted to know how the conditions of playgrounds differ across different neighborhoods, we could directly observe the variety, amount, and condition of equipment at various playgrounds.

Indirect observables , on the other hand, are less straightforward concepts to assess. In Kaplan’s framework, they are conditions that are subtle and complex that we must use existing knowledge and intuition to define. If we conducted a study for which we wished to know a person’s income, we’d probably have to ask them their income, perhaps in an interview or a survey. Thus, we have observed income, even if it has only been observed indirectly. Birthplace might be another indirect observable. We can ask study participants where they were born, but chances are good we won’t have directly observed any of those people being born in the locations they report.

Sometimes the concepts that we are interested in are more complex and abstract than observational terms or indirect observables. Because they are complex, constructs generally consist of more than one concept. Let’s take, for example, the construct “bureaucracy.” We know this term has something to do with hierarchy, organizations, and how they operate, but measuring such a construct is trickier than measuring something like a person’s income because of the complexity involved. Here’s another construct: racism. What is racism? How would you measure it? Racism and bureaucracy are both constructs whose meanings we have come to agree on.

Though we may not be able to observe constructs directly, we can observe their components. In Kaplan’s categorization, constructs are concepts that are “not observational either directly or indirectly” (Kaplan, 1964, p. 55), [3] but they can be defined based on observables. An example would be measuring the construct of depression. A diagnosis of depression can be made using the DSM-5, which includes diagnostic criteria such as fatigue and poor concentration. Each of these components of depression can be observed indirectly, so we are able to measure the construct by defining it in terms of what we can observe.
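As a rough illustration of defining a construct in terms of observables, the sketch below scores a hypothetical depression screen by counting endorsed indicator items. The indicators and the cutoff are invented for illustration and are not the actual DSM-5 criteria or thresholds.

```python
# A minimal sketch of measuring a construct through its observable components.
# Each indicator is a simplified, hypothetical stand-in for a diagnostic
# criterion, scored 0 (absent) or 1 (present).
indicators = {
    "depressed_mood": 1,
    "fatigue": 1,
    "poor_concentration": 0,
    "sleep_disturbance": 1,
    "loss_of_interest": 1,
}

# The construct itself is never observed directly; we observe its components
# and combine them into a score.
score = sum(indicators.values())
print(f"symptoms endorsed: {score} of {len(indicators)}")

# A hypothetical screening rule: flag for follow-up at 4 or more symptoms.
if score >= 4:
    print("meets hypothetical screening threshold; follow up with a full assessment")
```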

TRACK 1 (IF YOU ARE CREATING A RESEARCH PROPOSAL FOR THIS CLASS):

Look at the variables in your research question.

  • Classify them as direct observables, indirect observables, or constructs.
  • Do you think measuring them will be easy or hard?
  • What are your first thoughts about how to measure each variable? No wrong answers here, just write down a thought about each variable.

TRACK 2 (IF YOU AREN’T CREATING A RESEARCH PROPOSAL FOR THIS CLASS): 

You are interested in studying older adults’ social-emotional well-being. Specifically, you would like to research the impact on levels of older adult loneliness of an intervention that pairs older adults living in assisted living communities with university student volunteers for a weekly conversation.

Develop a working research question for this topic. Then, look at the variables in your research question.

  • Kaplan, A. (1964). The conduct of inquiry: Methodology for behavioral science. San Francisco, CA: Chandler Publishing Company.
  • Earl Babbie offers a more detailed discussion of Kaplan's work in his text. You can read it in: Babbie, E. (2010). The practice of social research (12th ed.). Belmont, CA: Wadsworth.
  • Kaplan, A. (1964). The conduct of inquiry: Methodology for behavioral science. San Francisco, CA: Chandler Publishing Company.

The process by which we describe and ascribe meaning to the key facts, concepts, or other phenomena under investigation in a research study.

In measurement, conditions that are easy to identify and verify through direct observation.

Things that require more subtle, complex, or indirect observations to measure; we may need existing knowledge and inference to define them.

Conditions that are not directly observable and represent states of being, experiences, and ideas.

Doctoral Research Methods in Social Work Copyright © by Mavs Open Press. All Rights Reserved.

Levels of Measurement | Nominal, Ordinal, Interval and Ratio

Published on July 16, 2020 by Pritha Bhandari . Revised on June 21, 2023.

Levels of measurement, also called scales of measurement, tell you how precisely variables are recorded. In scientific research, a variable is anything that can take on different values across your data set (e.g., height or test scores).

There are 4 levels of measurement:

  • Nominal : the data can only be categorized
  • Ordinal : the data can be categorized and ranked
  • Interval : the data can be categorized, ranked, and evenly spaced
  • Ratio : the data can be categorized, ranked, evenly spaced, and has a natural zero.

Depending on the level of measurement of the variable, what you can do to analyze your data may be limited. There is a hierarchy in the complexity and precision of the level of measurement, from low (nominal) to high (ratio).


Going from lowest to highest, the 4 levels of measurement are cumulative. This means that they each take on the properties of lower levels and add new properties.


The level at which you measure a variable determines how you can analyze your data.

The different levels limit which descriptive statistics you can use to get an overall summary of your data, and which type of inferential statistics you can perform on your data to support or refute your hypothesis .

In many cases, your variables can be measured at different levels, so you have to choose the level of measurement you will use before data collection begins.

For example, income is a variable that could be measured at either of these levels:

  • Ordinal level: You create brackets of income ranges: $0–$19,999, $20,000–$39,999, and $40,000–$59,999. You ask participants to select the bracket that represents their annual income. The brackets are coded with numbers from 1–3.
  • Ratio level: You collect data on the exact annual incomes of your participants.

At the ratio level, you can see exactly how far apart any two participants' incomes are, a level of detail that the ordinal brackets conceal.
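Here is a minimal sketch, with invented incomes, of how ratio-level data gets recoded into the ordinal brackets described above, and what is lost in the process.

```python
# A minimal sketch of recoding ratio-level income into ordinal brackets.
# The incomes are invented for illustration.
import numpy as np

incomes = np.array([12_500, 48_200, 23_900, 8_700, 55_000, 31_400])  # ratio level

# Brackets: 1 = $0-$19,999, 2 = $20,000-$39,999, 3 = $40,000-$59,999
brackets = np.digitize(incomes, bins=[20_000, 40_000]) + 1

# Once binned, only order remains: we can no longer say how much
# higher one participant's income is than another's.
print(brackets)  # [1 3 2 1 3 2]
```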

Descriptive statistics help you get an idea of the “middle” and “spread” of your data through measures of central tendency and variability .

When measuring the central tendency or variability of your data set, your level of measurement decides which methods you can use based on the mathematical operations that are appropriate for each level.

The methods you can apply are cumulative; at higher levels, you can apply all mathematical operations and measures used at lower levels.
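The sketch below, using invented data, pairs each level of measurement with a central tendency statistic that is appropriate for it: mode for nominal, median for ordinal, mean for interval and ratio.

```python
# A minimal sketch of matching descriptive statistics to the level of
# measurement, using invented data. Mode works at every level; median
# needs at least ordinal data; mean needs at least interval data.
import statistics

language = ["English", "Spanish", "English", "Mandarin", "English"]  # nominal
satisfaction = [1, 3, 4, 4, 5, 2, 4]                                 # ordinal (1-5 rating)
temperature_c = [18.2, 21.5, 19.9, 23.1, 20.4]                       # interval
minutes_in_therapy = [50, 45, 60, 50, 55]                            # ratio

print(statistics.mode(language))            # nominal: mode only
print(statistics.median(satisfaction))      # ordinal: mode or median
print(statistics.mean(temperature_c))       # interval: mode, median, or mean
print(statistics.mean(minutes_in_therapy))  # ratio: all of the above, plus ratios
```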




Some variables have fixed levels. For example, gender and ethnicity are always nominal level data because they cannot be ranked.

However, for other variables, you can choose the level of measurement. For example, income is a variable that can be recorded on an ordinal or a ratio scale:

  • At an ordinal level, you could create 5 income groupings and code the incomes that fall within them from 1–5.
  • At a ratio level, you would record exact numbers for income.

If you have a choice, the ratio level is always preferable because you can analyze data in more ways. The higher the level of measurement, the more precise your data is.


Bhandari, P. (2023, June 21). Levels of Measurement | Nominal, Ordinal, Interval and Ratio. Scribbr. Retrieved April 9, 2024, from https://www.scribbr.com/statistics/levels-of-measurement/




9.1 Measurement

Learning Objectives

  • Define measurement
  • Describe Kaplan’s three categories of the things that social scientists measure

Measurement is important. Recognizing that fact, and respecting it, will be of great benefit to you—both in research methods and in other areas of life as well. If, for example, you have ever baked a cake, you know well the importance of measurement. As someone who much prefers rebelling against precise rules over following them, I once learned the hard way that measurement matters. A couple of years ago I attempted to bake my wife a birthday cake without the help of any measuring utensils. I’d baked before, I reasoned, and I had a pretty good sense of the difference between a cup and a tablespoon. How hard could it be? As it turns out, it’s not easy guesstimating precise measures. That cake was the lumpiest, most lopsided cake I’ve ever seen. And it tasted kind of like Play-Doh. Unfortunately for my wife, I did not take measurement seriously and it showed.


Just as measurement is critical to successful baking, it is just as critical to successfully pulling off a social scientific research project. In social science, when we use the term measurement we mean the process by which we describe and ascribe meaning to the key facts, concepts, or other phenomena that we are investigating. At its core, measurement is about defining one’s terms in as clear and precise a way as possible. Of course, measurement in social science isn’t quite as simple as using a measuring cup or spoon, but there are some basic tenets on which most social scientists agree when it comes to measurement. We’ll explore those, as well as some of the ways that measurement might vary depending on your unique approach to the study of your topic.

What do social scientists measure?

  The question of what social scientists measure can be answered by asking yourself what social scientists study. Think about the topics you’ve learned about in other social work classes you’ve taken or the topics you’ve considered investigating yourself. Let’s consider Melissa Milkie and Catharine Warner’s study (2011)  [1] of first graders’ mental health. In order to conduct that study, Milkie and Warner needed to have some idea about how they were going to measure mental health. What does mental health mean, exactly? And how do we know when we’re observing someone whose mental health is good and when we see someone whose mental health is compromised? Understanding how measurement works in research methods helps us answer these sorts of questions.

As you might have guessed, social scientists will measure just about anything that they have an interest in investigating. For example, those who are interested in learning something about the correlation between social class and levels of happiness must develop some way to measure both social class and happiness. Those who wish to understand how well immigrants cope in their new locations must measure immigrant status and coping. Those who wish to understand how a person’s gender shapes their workplace experiences must measure gender and workplace experiences. You get the idea. Social scientists can and do measure just about anything you can imagine observing or wanting to study. Of course, some things are easier to observe or measure than others.

In 1964, philosopher Abraham Kaplan (1964)  [2] wrote The Conduct of Inquiry, which has since become a classic work in research methodology (Babbie, 2010).  [3] In his text, Kaplan describes different categories of things that behavioral scientists observe. One of those categories, which Kaplan called “observational terms,” is probably the simplest to measure in social science. Observational terms are the sorts of things that we can see with the naked eye simply by looking at them. They are terms that “lend themselves to easy and confident verification” (Kaplan, 1964, p. 54). If, for example, we wanted to know how the conditions of playgrounds differ across different neighborhoods, we could directly observe the variety, amount, and condition of equipment at various playgrounds.


Indirect observables , on the other hand, are less straightforward to assess. They are “terms whose application calls for relatively more subtle, complex, or indirect observations, in which inferences play an acknowledged part. Such inferences concern presumed connections, usually causal, between what is directly observed and what the term signifies” (Kaplan, 1964, p. 55). If we conducted a study for which we wished to know a person’s income, we’d probably have to ask them their income, perhaps in an interview or a survey. Thus, we have observed income, even if it has only been observed indirectly. Birthplace might be another indirect observable. We can ask study participants where they were born, but chances are good we won’t have directly observed any of those people being born in the locations they report.

Sometimes the measures that we are interested in are more complex and more abstract than observational terms or indirect observables. Think about some of the concepts you’ve learned about in other social work classes—for example, ethnocentrism. What is ethnocentrism? Well, from completing an introduction to social work class you might know that it has something to do with the way a person judges another’s culture. But how would you measure it? Here’s another construct: bureaucracy. We know this term has something to do with organizations and how they operate, but measuring such a construct is trickier than measuring, say, a person’s income. In both cases, ethnocentrism and bureaucracy, these theoretical notions represent ideas whose meaning we have come to agree on. Though we may not be able to observe these abstractions directly, we can observe the things that they are made up of.

Kaplan referred to these more abstract things that behavioral scientists measure as constructs. Constructs are “not observational either directly or indirectly” (Kaplan, 1964, p. 55), but they can be defined based on observables. For example, the construct of bureaucracy could be measured by counting the number of supervisors that need to approve routine spending by public administrators. The greater the number of administrators that must sign off on routine matters, the greater the degree of bureaucracy. Similarly, we might be able to ask a person the degree to which they trust people from different cultures around the world and then assess the ethnocentrism inherent in their answers. We can measure constructs like bureaucracy and ethnocentrism by defining them in terms of what we can observe.

Thus far, we have learned that social scientists measure what Kaplan called observational terms, indirect observables, and constructs. These terms refer to the different sorts of things that social scientists may be interested in measuring. But how do social scientists measure these things? That is the next question we’ll tackle.

How do social scientists measure?

  Measurement in social science is a process. It occurs at multiple stages of a research project: in the planning stages, in the data collection stage, and sometimes even in the analysis stage. Recall that previously we defined measurement as the process by which we describe and ascribe meaning to the key facts, concepts, or other phenomena that we are investigating. Once we’ve identified a research question, we begin to think about what some of the key ideas are that we hope to learn from our project. In describing those key ideas, we begin the measurement process.

Let’s say that our research question is the following: How do new college students cope with the adjustment to college? In order to answer this question, we’ll need some idea about what coping means. We may come up with an idea about what coping means early in the research process, as we begin to think about what to look for (or observe) in our data-collection phase. Once we’ve collected data on coping, we also have to decide how to report on the topic. Perhaps, for example, there are different types or dimensions of coping, some of which lead to more successful adjustment than others. However we decide to proceed, and whatever we decide to report, the point is that measurement is important at each of these phases.

As the preceding example demonstrates, measurement is a process in part because it occurs at multiple stages of conducting research. We could also think of measurement as a process because it involves multiple stages. From identifying your key terms to defining them to figuring out how to observe them and how to know if your observations are any good, there are multiple steps involved in the measurement process. An additional step in the measurement process involves deciding what elements your measures contain. A measure’s elements might be very straightforward and clear, particularly if they are directly observable. Other measures are more complex and might require the researcher to account for different themes or types. These sorts of complexities require paying careful attention to a concept’s level of measurement and its dimensions. We’ll explore these complexities in greater depth at the end of this chapter, but first let’s look more closely at the early steps involved in the measurement process, starting with conceptualization.

Key Takeaways

  • Measurement is the process by which we describe and ascribe meaning to the key facts, concepts, or other phenomena that we are investigating.
  • Kaplan identified three categories of things that social scientists measure including observational terms, indirect observables, and constructs.
  • Measurement occurs at all stages of research.
  • Constructs: not observable directly or indirectly, but can be defined based on observable characteristics
  • Indirect observables: things that require indirect observation and inference to measure
  • Measurement: the process by which researchers describe and ascribe meaning to the key facts, concepts, or other phenomena they are investigating
  • Observational terms: things that we can see with the naked eye simply by looking at them


  • Milkie, M. A., & Warner, C. H. (2011). Classroom learning environments and the mental health of first grade children. Journal of Health and Social Behavior, 52, 4–22.
  • Kaplan, A. (1964). The conduct of inquiry: Methodology for behavioral science. San Francisco, CA: Chandler Publishing Company.
  • Earl Babbie offers a more detailed discussion of Kaplan's work in his text. You can read it in: Babbie, E. (2010). The practice of social research (12th ed.). Belmont, CA: Wadsworth.

Scientific Inquiry in Social Work Copyright © 2018 by Matthew DeCarlo is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.



Social Work: Tests & Measurements


What are tests and measurements?

Tests and measurements are typically used for research and assessment. They can be referred to as surveys, instruments, questionnaires, scales, tests, or measures.

These measurement tools can be difficult to locate. That is because:

  • There is no database that provides free, full-text copies
  • Some may only be available as a supplement; you may need to contact the author or publisher for a copy
  • Tests are usually only available for purchase, and the full text may not be readily available

Reviews of tests are available through the Mental Measurements Yearbook. If there isn't a review of the test you're looking for, search for the name of the test in Google Scholar or in PsycINFO to find the article it was originally published in. That article will give you all the information about the development of the test.

APA PsycTESTS, linked below, contains the full text of some tests, instruments, and scales. Sometimes you can view a test because it has been published in a journal article or dissertation. In PsycTESTS, in the advanced search section, change the field to "tests & measures" in order to find tests, or you can select that filter once you do the search and get your results.


You can search the Tests and Measurements field of APA PsycINFO to find articles that used a particular test or measurement. This can help you answer questions about how the measurement has been used and whether there are related tests in a particular research area.


Tests and Measurements Databases


  • Health and Psychosocial Instruments (HAPI): Locate measurement instruments such as questionnaires, tests, surveys, coding schemes, checklists, rating scales, vignettes, etc. Scope includes medicine, nursing, public health, psychology, social work, communication, sociology, etc.
  • Mental Measurements Yearbook (MMY): Use MMY to find REVIEWS of testing instruments. Actual test instruments are NOT provided. Most reviews discuss the validity and reliability of the tool. To purchase or obtain the actual test materials, you will need to contact the test publisher(s).
  • National Institute on Drug Abuse Screening Tools: A chart of screening tools relevant to drug use from the National Institutes of Health.
  • PsycINFO: Abstract and citation database of scholarly literature in the psychological, social, behavioral, and health sciences. Includes journal articles, books, reports, theses, and dissertations from 1806 to present.
  • PsycTESTS: A research database that provides access to psychological tests, measures, scales, surveys, and other assessments, as well as descriptive information about each test and its development. Records also discuss the reliability and validity of the tool. Some records include the full text of the test.
  • SAMHSA-HRSA Center for Integrated Health Solutions: Recommended screening resources and tools for mental health and substance use problems.


Free Instrument Resources

  • BRFSS Questionnaires: Behavioral Risk Factor Surveillance System (BRFSS) questionnaires developed by the CDC, designed to collect uniform, state-specific data on preventive health practices and risk behaviors.
  • RAND: Surveys designed for a wide range of purposes, including assessing patients' health, screening for mental health conditions, and measuring quality of care and quality of life.
  • The National Assessment of Educational Progress (NAEP): Survey questionnaires that collect additional information about students' demographics and K-12 education experiences.
  • Positive Psychology Research Center: Developed by UPenn; questionnaires to understand and build the emotions, strengths, and virtues that enable individuals and communities to thrive.
  • National Survey of Children's Health Questionnaires, Datasets, and Supporting Documents: Questionnaires used to examine the physical and emotional health of children ages 0-17.
  • (CHIPTS) Resource Library: Innovative interventions to optimize care and treatment of diverse community health issues, including drug and alcohol use and HIV/AIDS.


Research Scales (Nominal, Ordinal, Interval, and Ratio) and the ASWB Exam

Agents of Change · November 11, 2023

Are you studying for the ASWB exam and feeling a bit overwhelmed by the research methodology section, especially the part about research scales? You’re not alone! Many folks find this topic tricky.

In this article, we’re going to break down the concepts of research scales (nominal, ordinal, interval, and ratio) and explore how they’re relevant for the ASWB exam.

Learn more about the ASWB exam and create a personalized ASWB study plan with Agents of Change. We’ve helped thousands of Social Workers pass their ASWB exams and want to help you be next!

1) Understanding Research Scales: The Basics

Let’s kick things off by explaining the core concepts of research scales. These scales are the building blocks of research methodology, especially in fields like Social Work where data-driven decisions are the norm. Remember, there are four main types of scales: nominal, ordinal, interval, and ratio. Each plays a unique role in how we interpret and use data.

Nominal Scales: The Simplest Form

  • What They Are : Nominal scales are all about categorization without any order or hierarchy. They are the most basic type of scale, where data is simply tagged with labels.
  • Example : If you’re categorizing clients based on their primary language (e.g., English, Spanish, Mandarin), you’re using a nominal scale. It’s like putting things into different buckets without caring which bucket is bigger or smaller.
  • Importance for ASWB Exam : Expect to see questions that ask you to identify or use nominal scales in Social Work scenarios.

Ordinal Scales: A Step Beyond

  • What They Are : Now, we’re introducing the idea of order. Ordinal scales rank data, but the intervals between ranks are not necessarily equal.
  • Example : Consider a survey where clients rate their satisfaction on a scale from 1 (very dissatisfied) to 5 (very satisfied). This is an ordinal scale. It’s like climbing a ladder, where each rung takes you higher but the distance between rungs might vary.
  • Importance for ASWB Exam : These scales are key in understanding assessments and questionnaires, common tools in social work research.

Interval Scales: Getting More Precise

  • What They Are : Interval scales are a bit more complex. They have equal intervals between values but lack a true zero point, which means you can’t make definitive statements about ratios.
  • Example : The classic example is temperature measurements in Celsius or Fahrenheit. The difference between 10°C and 20°C is the same as between 20°C and 30°C, but you can’t say that 20°C is twice as hot as 10°C. A worked example follows this list.
  • Importance for ASWB Exam : Understanding interval scales is crucial for interpreting studies and reports that involve this type of data, such as psychological scales.
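As a quick demonstration of the temperature point above, the sketch below computes the naive Celsius ratio and the physically meaningful Kelvin ratio. Celsius has an arbitrary zero, while Kelvin has a true zero, so ratio statements only make sense after converting.

```python
# A minimal sketch of why ratio statements fail on an interval scale.
t1_c, t2_c = 10.0, 20.0

print(t2_c / t1_c)                        # 2.0 -- but "twice as hot" is meaningless in Celsius
print((t2_c + 273.15) / (t1_c + 273.15))  # ~1.035 -- the physically meaningful ratio in Kelvin
```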

Ratio Scales: The Complete Package

  • What They Are : Ratio scales have all the features of interval scales but with the addition of a true zero point. This allows for meaningful statements about ratios.
  • Example : Weight is a great example. If one client weighs 60 kg and another weighs 120 kg, you can state that the 2nd client is twice as heavy as the 1st.
  • Importance for ASWB Exam : These scales are often used in research involving quantifiable data, such as time spent in therapy or dosage of medication. A deep understanding of ratio scales can significantly help in interpreting and applying research findings in Social Work practice.

Why Bother With Scales?

Each scale provides a different lens through which to view and understand data. In Social Work research and on the ASWB exam, it’s crucial to choose the right scale for the right purpose. Using the wrong scale can lead to misinterpretation of data, which could have real-world consequences.

Learn more about Research Scales and additional tips and tricks for the ASWB exam with Agents of Change.

2) The ASWB Exam and Research Scales

This section of the exam isn’t just about memorizing definitions; it’s about understanding how these scales are woven into the fabric of Social Work research and, by extension, Social Work practice. Here’s why they are so crucial:

1. The Heart of Exam Questions

  • Interpreting Data : Many questions on the ASWB exam revolve around interpreting research data. Understanding the type of scale used in a study helps you make sense of the results. For instance, knowing the difference between ordinal and interval scales can change how you interpret survey results.
  • Scenario-Based Questions : The exam often presents scenarios where you’ll need to choose the appropriate scale for a given research design or interpret data collected using these scales. This is where your theoretical knowledge meets practical application. Agents of Change prep courses offer hundreds of practice questions including scenario-based questions!

2. Real-World Application in Social Work

  • Evidence-Based Practice : In Social Work, evidence-based practice is key, and this evidence often comes from research. Whether you’re evaluating the effectiveness of a program or understanding community needs, knowing which scale was used in the research informs your conclusions and decisions.
  • Client Assessments : When assessing clients, you’ll often use tools that rely on these scales. For instance, a mental health questionnaire might use an ordinal scale to gauge the severity of symptoms.

3. Critical Thinking and Analytical Skills

  • Beyond the Surface : The exam tests your ability to think critically about research methodology. This means not just knowing what each scale is, but understanding their implications in research design and data interpretation.
  • Making Informed Decisions : In Social Work, you’re often faced with making decisions based on data. A solid understanding of these scales ensures that you’re making informed, reliable, and valid decisions.

4. Ethical Considerations and Research

  • Ethical Use of Data : Ethical practice in Social Work extends to research. Using the appropriate scale for your data collection respects the integrity of your research and ensures that your conclusions are ethically sound.
  • Cultural Sensitivity in Research : Different scales can have different implications in terms of cultural sensitivity. For instance, nominal scales are often used in research involving diverse populations, where respecting and accurately representing different groups is crucial.

5. Enhancing Communication Skills

  • Translating Data to Practice : As a Social Worker, you’ll often need to explain research findings to clients or colleagues who may not have a background in research methodology. Understanding these scales helps you translate complex data into understandable and actionable information.
  • Advocacy and Policy Development : When advocating for policy changes or program funding, being able to cite and explain research findings effectively can be a game changer. This often involves discussing the type of data collected and how it was measured.

Agents of Change programs include 2 live study groups each month and hundreds of practice questions on key ASWB topics.

3) In-Depth Analysis: Research Scales in Social Work Research

Diving deeper into the world of research scales, let’s explore how these scales are not just theoretical concepts, but practical tools that shape the landscape of Social Work research.

Nominal Scales and Social Work Research

  • How They Are Used : Nominal scales are incredibly versatile in Social Work research. They’re the go-to when the research focuses on categorization without implying any inherent order or rank.
  • Real-Life Application : Social Workers use nominal scales to categorize various demographic factors like ethnicity, gender, or marital status. It’s crucial in studies aimed at understanding the distribution of certain characteristics in a population.
  • Impact on Decision Making : Decisions regarding resource allocation, program development, and policy formulation are often based on the data categorized using nominal scales. They help identify the needs of different groups within a community.

Ordinal Scales in Action

  • How They’re Used : Ordinal scales bring a sense of hierarchy or order to the data but without precise quantification of the differences between ranks.
  • Real-Life Application : In clinical settings, ordinal scales are used for assessing the severity of symptoms, client satisfaction, or the stage of recovery. These scales are pivotal in tracking progress and evaluating treatment effectiveness.
  • Understanding Trends : Ordinal scales can reveal trends in client responses or changes over time, which is invaluable in longitudinal studies or outcome evaluations in Social Work.

Interval Scales: The Middle Ground

  • How They’re Used : Interval scales are about measuring quantities with equal intervals, which allows for more nuanced comparisons.
  • Real-Life Application : Social Workers often encounter interval scales in standardized testing and psychological assessments where attitudes, beliefs, or certain abilities are measured.
  • Implications for Practice : The data from interval scales can inform clinical decisions, program evaluations, and research studies, especially those focused on measuring changes in attitudes or behaviors over time.

Ratio Scales: The Gold Standard

  • How They’re Used : Ratio scales offer the highest level of measurement precision, making them ideal for quantifiable data.
  • Real-Life Application : In Social Work research, ratio scales are used for measuring variables like age, income, time spent in therapy, or dosage of medication.
  • Driving Evidence-Based Practice : The precision of ratio scales aids in developing evidence-based practices. They allow for clear, quantifiable comparisons and assessments, crucial for effective intervention strategies.

Interplay and Integration in Research

  • Combining Scales for Comprehensive Insights : Often, Social Work research doesn’t rely on just one type of scale. Studies may combine different scales to get a more comprehensive view. For example, a study might use nominal scales to categorize participants and ratio scales to measure the outcomes of an intervention.
  • Challenges and Considerations : It’s important to choose the right scale for the right purpose. Misapplying a scale can lead to inaccurate conclusions or misinterpretations. Social Workers must be mindful of the strengths and limitations of each scale in their research designs, as the sketch after this list illustrates.
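To make these distinctions concrete, here is a minimal sketch in Python (using pandas and entirely made-up client data; the column names and values are illustrative, not drawn from any real study) of how each level of measurement maps onto a data type and the summaries it supports.

```python
import pandas as pd

# Hypothetical client records (illustrative values only)
df = pd.DataFrame({
    # Nominal: categories with no inherent order
    "marital_status": pd.Categorical(
        ["single", "married", "divorced", "married"]),
    # Ordinal: ordered categories; gaps between ranks are not equal
    "symptom_severity": pd.Categorical(
        ["mild", "severe", "moderate", "mild"],
        categories=["mild", "moderate", "severe"], ordered=True),
    # Interval: equal intervals but no true zero (e.g., a standardized score)
    "attitude_score": [48.0, 62.5, 55.0, 71.5],
    # Ratio: equal intervals and a true zero
    "weeks_in_therapy": [4, 12, 0, 26],
})

# Nominal data support counts and modes, nothing more
print(df["marital_status"].value_counts())

# Ordinal data additionally support rank-order comparisons
print(df["symptom_severity"].min(), "to", df["symptom_severity"].max())

# Interval data support means and differences, but ratios are meaningless
print(df["attitude_score"].mean())

# Ratio data support the full range of statistics, including ratios:
# the longest stay is about 2.5 times the average, a claim that is only
# meaningful because zero weeks genuinely means "no therapy"
print(df["weeks_in_therapy"].max() / df["weeks_in_therapy"].mean())
```

The ordered categorical type is the key trick here: it lets you rank ordinal values without pretending the gaps between ranks are equal, which is exactly the distinction the exam tests.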

4) FAQs – Research Scales and the ASWB Exam

Q: How do research scales impact the interpretation of studies in Social Work?

A: Research scales fundamentally shape how we interpret studies. For instance:

  • Nominal scales help us understand the prevalence or distribution of categorical data, like the number of clients from different ethnic backgrounds.
  • Ordinal scales provide insights into ranked data, like the severity of symptoms, allowing us to grasp trends or general perceptions.
  • Interval scales offer a more precise comparison of quantities, like changes in attitudes measured through surveys, where we can quantify the degree of change but, because there is no true zero, cannot make ratio statements such as “twice as positive.”
  • Ratio scales give the most precise interpretation, allowing us to make definitive statements about quantities, like the exact amount of time spent in therapy.

Each scale offers a different level of depth and precision, influencing the conclusions we can draw from a study.

Q: Can the misuse of research scales lead to ethical issues in Social Work practice?

A: Absolutely! Ethical issues can arise if research scales are misused. For example:

  • Inaccurate Representation: Using the wrong scale can lead to misrepresentation of data. If ordinal data are treated as interval, it might falsely imply a level of precision that isn’t there.
  • Misguided Decisions: Decisions based on poorly interpreted data can affect the allocation of resources or the efficacy of interventions.
  • Cultural Sensitivity: Inappropriate use of scales, especially nominal scales, can lead to oversimplification or stereotyping of complex cultural or social identities.

Ethical practice in Social Work research demands careful consideration of how scales are used and interpreted.
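To see the “inaccurate representation” pitfall in action, here is a toy example (Python again, with invented satisfaction ratings; the numbers are illustrative only) of how averaging ordinal codes can paint a misleading picture that ordinal-appropriate summaries avoid.

```python
import pandas as pd

# Invented client-satisfaction responses on a 5-point ordinal scale
codes = {"very dissatisfied": 1, "dissatisfied": 2, "neutral": 3,
         "satisfied": 4, "very satisfied": 5}
responses = pd.Series(
    ["very dissatisfied", "very satisfied", "very satisfied",
     "neutral", "very dissatisfied", "very satisfied"])

numeric = responses.map(codes)

# Treating ordinal codes as interval data yields a mean of about 3.3,
# suggesting mild overall satisfaction, even though the responses are
# actually polarized at the two extremes.
print("mean:", numeric.mean())

# Ordinal-appropriate summaries preserve what was actually observed
print("median:", numeric.median())
print(responses.value_counts())
```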

Q: How can a Social Worker continue to improve their understanding and application of these scales beyond preparing for the ASWB exam?

A: Continuing education and practical application are key:

  • Professional Development Courses : Many organizations offer courses on research methodology and data analysis.
  • Regularly Reading Research Journals : Staying updated with current research helps understand how scales are used in different studies.
  • Participating in Research Projects : Hands-on experience in research projects can deepen understanding.
  • Peer Discussions and Workshops : Engaging with colleagues in discussions or workshops about research findings and methodologies can offer new perspectives and insights.

5) Conclusion

As we wrap up this exploration of research scales (nominal, ordinal, interval, and ratio) and their significant role in the ASWB exam, it’s clear that this isn’t just an academic pursuit. These scales are the backbone of effective, ethical, and informed Social Work practice.

They offer a framework for understanding and interpreting data, crucial for both the exam and real-world scenarios in Social Work. The mastery of these concepts equips aspiring Social Workers with the tools to make sound, data-driven decisions that can profoundly impact individuals and communities. While the journey through understanding these scales might seem daunting at first, remember that it’s a journey toward becoming a more competent and effective Social Worker.

Furthermore, the intersection of these research scales with the ASWB exam underscores the exam’s emphasis on not just theoretical knowledge, but also practical skills. This includes interpreting research findings, assessing client needs, and contributing to the broader field of Social Work research.

6) Practice Question – Research Scales

A Social Worker is analyzing a survey that categorizes respondents’ level of education. The categories are ‘Less than high school’, ‘High school graduate’, ‘Some college’, ‘Bachelor’s degree’, and ‘Graduate degree’. This categorization of education levels is an example of which type of scale?

A) Nominal scale

B) Ordinal scale

C) Interval scale

D) Ratio scale

Correct Answer: B) Ordinal scale

Rationale: The correct answer is B. In the given scenario, the categorization of education levels represents an ordinal scale. An ordinal scale ranks or orders items without assuming equal spacing between ranks. In this case, the educational categories are ordered or ranked based on the level of education achieved. However, the intervals between these categories are not necessarily equal (for example, the difference in years of education between ‘Some college’ and ‘Bachelor’s degree’ is not the same as between ‘High school graduate’ and ‘Some college’).

Option A, a nominal scale, is incorrect because a nominal scale involves categorization without a specific order (e.g., types of housing: apartment, house, condo). Option C, an interval scale, is also not correct since interval scales involve ordered categories that are equidistant from each other, such as temperature measured in Celsius or Fahrenheit. Option D, a ratio scale, includes a true zero point and equal intervals, like height or weight measurements. Therefore, the education level categories best fit an ordinal scale.





Wellbeing measures for workers: a systematic review and methodological quality appraisal

Rebecca J. Jarden

1 Faculty of Medicine, Dentistry and Health Sciences, Melbourne School of Health Sciences, The University of Melbourne, Carlton, VIC, Australia

2 Austin Health, Heidelberg, VIC, Australia

Richard J. Siegert

3 Auckland University of Technology (AUT), North Shore Campus, Auckland, New Zealand

Jane Koziol-McLain

Helena Bujalka

4 Department of Nursing, Melbourne School of Health Sciences, The University of Melbourne, Carlton, VIC, Australia

Margaret H. Sandham

Associated data

The original contributions presented in the study are included in the article/ Supplementary material , further inquiries can be directed to the corresponding author.

Introduction

Increasing attention on workplace wellbeing and growth in workplace wellbeing interventions have highlighted the need to measure workers' wellbeing. This systematic review sought to identify the most valid and reliable published measure/s of wellbeing for workers developed between 2010 and 2020.

Electronic databases Health and Psychosocial Instruments, APA PsycInfo, and Scopus were searched. Key search terms included variations of [wellbeing OR “well-being”] AND [employee * OR worker * OR staff OR personnel] . Studies and properties of wellbeing measures were then appraised using Consensus-based Standards for the selection of health Measurement Instruments.

Eighteen articles reported development of new wellbeing instruments and eleven undertook a psychometric validation of an existing wellbeing instrument in a specific country, language, or context. Generation and pilot testing of items for the 18 newly developed instruments were largely rated 'Inadequate'; only two were rated as 'Very Good'. None of the studies reported measurement properties of responsiveness, criterion validity, or content validity. The three instruments with the greatest number of positively rated measurement properties were the Personal Growth and Development Scale, The University of Tokyo Occupational Mental Health well-being 24 scale, and the Employee Well-being scale. However, none of these newly developed worker wellbeing instruments met the criteria for adequate instrument design.

This review provides researchers and clinicians a synthesis of information to help inform appropriate instrument selection in measurement of workers' wellbeing.

Systematic review registration

https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=79044 , identifier: PROSPERO, CRD42018079044.

1. Introduction

Organizational interest in workers' wellbeing is increasing, and subsequently work wellbeing interventions are an area of growth. Wellbeing measures can both identify the need for an intervention through assessing the status of workers' wellbeing, and subsequently evaluate the efficacy of an intervention through quantifying the level of change in workers' wellbeing following the intervention. However, the large and growing number of available wellbeing measures [e.g., see ( 1 , 2 )] makes identifying and selecting the most appropriate, reliable, and valid instruments for effectiveness evaluations in the workplace difficult. Validity and reliability of these measures have not always been established, and there is not yet a gold standard measure of wellbeing to evaluate the construct validity of new measures against. This review will inform future measurement development studies and improve clarity for researchers and clinicians in instrument selection in the measurement of workers' wellbeing.

1.1. Work wellbeing

Theoretical models and definitions of work wellbeing are varied and usually from a Western perspective ( 3 – 5 ). The construct of workers' wellbeing is rich and multifaceted, scaffolding elements that transcend work (the role), workers (the individuals and teams) and workplaces (organizations) ( 6 ). Key factors are thought to include subjective wellbeing, including job satisfaction, attitudes and affect; eudaimonic wellbeing, including engagement, meaning, growth, intrinsic motivation and calling; and social wellbeing, such as quality connections and satisfaction with co-workers ( 7 ). Laine and Rinne ( 8 ) add to these factors in their “discursive” definition, which encompasses healthy living/working, work/family roles, leadership/management styles, human relations/social factors, work-related factors, working life uncertainties and personality/individual factors. Work-Related Quality of Life (WRQoL) adds further factors, including general wellbeing, home-work interface, job and career satisfaction, control at work, working conditions and stress at work ( 9 ).

The elements associated with wellbeing differ between occupational groups ( 10 ). For professionals, five elements typically account for the greatest amount of variance in job satisfaction: work-life balance, satisfaction with education, being engaged, experiencing meaning and purpose, and experiencing autonomy ( 10 ). Knowing what constructs workers find meaningful with respect to wellbeing determines the essential content in a wellbeing measure and can vary between occupational groups. For example, laborers value work-life balance, being absorbed, meaning and purpose, feeling respected and having self-esteem ( 10 ), whereas nurses value workplace characteristics, the ability to cope with changing demands and feedback loops ( 11 ).

1.2. Measuring workers' wellbeing

Given the variations in theoretical models, definitions of, and salience of elements associated with wellbeing in different occupational groups, selecting instruments for the measurement of workers' wellbeing is challenging. While there are multiple methodologies for investigating workers' wellbeing, in this review we focus on quantitative assessment. Two directions in the measurement of workers' wellbeing have been taken. First, to use existing wellbeing instruments with workers. Second, to develop new instruments specifically intended to measure workers' wellbeing. The decision to use a given workers' wellbeing measure may be guided by many factors, but it is essential to prioritize the measurement properties of the instrument, such as its reliability, validity and responsiveness ( 12 ). A single “gold-standard” measure of workers' wellbeing has not yet been identified, and given the aforementioned heterogeneity in the construct of wellbeing depending on the viewer, a one-size-fits-all gold standard is unlikely to be found. The most appropriate instrument to measure the construct may require a selection of unidimensional (sub) scales, like the measurement of WRQoL ( 9 ). For this review, the aim was to evaluate the measurement properties of instruments that measured the broader construct of workers' wellbeing [e.g., the Workplace Wellbeing Index ( 13 , 14 )]. Any identifiable sub-scales within the instruments were individually reported.

1.3. Systematic reviews of measurement instruments

The systematic review is one method of identifying, appraising and synthesizing research to strengthen the evidence base and inform decisions. The Preferred Reporting Items for Systematic review and Meta-Analysis (PRISMA) guidelines ( 15 ) support both rigor and transparency in reviews. A systematic review of studies developing, and reporting, instrument measurement properties enables the generation of new evidence, in much the same way as a systematic review of clinical studies or trials is essential for establishing the effectiveness of an intervention. Well-defined criteria for appraising the methodological quality of studies of instrument measurement properties are therefore important for establishing evidence for the measurement properties of instruments. One such methodology to support this appraisal process was developed through the international COnsensus-based Standards for the selection of health Measurement Instruments (COSMIN) initiative ( https://www.cosmin.nl/ ), which sought to improve the selection of outcome measurement instruments for both research and clinical practice [e.g., see ( 12 , 16 – 24 )].

1.4. Systematic reviews of wellbeing measures

Four previous reviews of measures for assessing wellbeing in adults were identified ( 2 , 25 – 27 ). McDowell ( 27 ) reviewed nine specifically selected measures reported by the author to be representative of different conceptualizations of wellbeing. These measures were all developed before 2000 and included: the Life Satisfaction Index, the Bradburn Affect Balance Scale, single-item measures, the Philadelphia Morale Scale, the General Wellbeing Schedule, the Satisfaction With Life Scale, the Positive and Negative Affect Scale, the World Health Organization 5-item wellbeing index, and Ryff's scales of psychological wellbeing. McDowell ( 27 ) described the nine measures and their properties. Lindert et al. ( 26 ) aimed to identify, map and analyze the contents of self-reported wellbeing measurement scales from studies published between 2007 and 2012. Sixty measures were identified, described, and appraised using an author-developed evaluation tool based on the recommendations of the Scientific Advisory Committee of the Medical Outcomes Trust and two checklists for health status instruments ( 28 – 31 ). Linton et al. ( 2 ) reviewed 99 self-report measures for assessing wellbeing in adults from studies published between 1993 and 2015, exploring dimensions of wellbeing and describing development over time using thematic analysis and narrative synthesis. Ong et al. ( 25 ) conducted a broad scoping review to identify measures to assess subjective wellbeing, particularly in the online context, using thematic coding. None of these four reviews used the COSMIN methodology or focused specifically on the wellbeing of workers.

1.5. Objectives

This review aims to: (1) systematically identify articles published from 2010 to 2020 reporting the development of instruments to measure workers' wellbeing, (2) critically appraise the methodological quality of the studies reporting the development of workers' wellbeing measures, (3) critically appraise the psychometric properties of the measures developed for workers' wellbeing, and (4) based on the measures developed between 2010 and 2020, recommend valid and reliable measures of workers' wellbeing. As such, this review informs future measurement development studies and improves clarity for researchers and clinicians in instrument selection in the measurement of workers' wellbeing.

2. Methods

This systematic review largely followed the methods published in the review protocol ( 32 ). Four review protocol variations were required.

2.1. Review protocol variations

The four protocol variations were needed due to project scope and feasibility, new reporting standards being developed between publication of the protocol and completing the review ( 15 ), improved access to programs (e.g., Covidence, Veritas Health Innovation Ltd), updated versions of programs (e.g., EndNote X9), and evolving knowledge of databases, wellbeing definitions, and terminology (in consultation with liaison research librarians across two universities). First, project scope and feasibility were managed through limiting the databases searched to Health and Psychosocial Instruments, APA PsycInfo, and Scopus. These three databases were selected in consultation with a research librarian to maintain breadth. We included a manual reference list review and forward and backward citation chaining of potentially relevant reviews and included studies to strengthen the search. Second, the article publication date range of 2010 to 2020 was applied as a limiter to manage project scope and was selected to align with publication of the COSMIN checklist [e.g., see ( 16 – 19 )], building on earlier work [e.g., ( 33 )]. Third, we used the updated Preferred Reporting Items for Systematic review and Meta-Analysis (PRISMA) guidelines ( 15 ) and COSMIN methodology ( 19 – 21 , 23 , 24 , 34 ). Fourth, the latest versions of EndNote (X9) citation management software and Covidence review management software (Veritas Health Innovation Ltd) were used to support the review processes.

2.2. Review inclusion and exclusion criteria

2.2.1. Types of instruments

Eligible workers' wellbeing data collection instruments included interviewer-administered, self-administered, or computer-administered formats. Examples included an online survey, a written questionnaire completed by a worker, or a worker's responses to an interviewer administering the survey.

2.2.2. Types of study design

Eligible studies were those published as full-text original articles that report psychometric properties and (1) development of an entirely new instrument or (2) validation of an instrument modified from a previously developed instrument.

2.2.3. Types of settings and participants

The study sample needed to include workers. If other populations were included along with workers, the findings related to workers needed to be differentiated from others. The measure could have been applied to workers in any paid work setting where a workplace is defined as a place where a worker goes to carry out work ( 35 ). For articles reporting multiple studies using several different samples, only those that included workers in at least one sample were included.

2.2.4. Types of measures

Instruments developed or validated for the measurement of workers' wellbeing as an outcome were eligible for inclusion. The disparate theoretical views and definitions of both wellbeing ( 36 – 38 ) and work wellbeing ( 3 – 5 , 8 , 39 ) led us to include instruments where the term “wellbeing” was specifically stated as either “wellbeing,” “well-being” or “well being.” The term “workers” needed to be specifically stated as either “employee * ,” “worker * ,” “staff” or “personnel.” Studies reporting the use of instruments to measure commonly cited terms for high levels of wellbeing, including flourishing ( 40 , 41 ) and thriving ( 42 – 44 ), were included. Studies in which authors stated they were developing or validating a measure of workers' wellbeing, but only used items or previously developed instruments of other constructs (e.g., happiness, positive and negative emotions, satisfaction with life, depression, stress or anxiety) were excluded. Studies and measures published in languages other than English were excluded. Abstracts, books, theses and conference proceedings were excluded.

2.3. Search strategy

A three-staged search strategy was used to identify studies that include measures meeting the inclusion criteria: (1) electronic bibliographic databases for published work, (2) reference lists of studies with included measures, and (3) the reference list of previously published reviews.

2.3.1. Information sources

The following electronic bibliographic databases were searched: Health and Psychosocial Instruments (abstract search), APA PsycInfo (abstract search), and Scopus (title, abstract & keyword search).

2.3.2. Search terms

Database key search terms included [wellbeing OR “well-being”] AND [employee * OR worker * OR staff OR personnel] . Search terms for measurement properties of measurement instruments were adapted from the “precise search filter for measurement properties” and “exclusion filter” ( 45 ). The search strategy is provided in Supplementary material .

2.4. Data management

References identified in execution of the search strategy were exported to EndNote X9 bibliographic software, and duplicates were removed. References were imported to Covidence, Veritas Health Innovation Ltd for duplicate screening, appraisals and data extraction.

2.5. Selection process

Titles and abstracts were screened by two independent reviewers (RJ or MS or SB or JD). The full text documents of these potentially relevant studies were independently screened against the eligibility criteria by two reviewers (RJ or MS and SB or JD). Any disagreement was resolved through consensus amongst the review team. Findings from the execution of the search and selection process are presented in a Preferred Reporting Items for Systematic review and Meta-Analysis (PRISMA) flowchart ( 15 ).

2.6. Data collection process

Data were extracted by two reviewers independently (RJ and/or MS and/or HB) into Covidence 2.0 templates adopted from the COSMIN methodology user guide ( 20 ). Final data tables were checked for accuracy and completeness by a third reviewer (RJ and /or MS and /or HB).

2.7. Data analysis process

The findings from execution of the search strategy are described and illustrated in a flow chart; the characteristics of the included studies are tabulated. Analysis of methodological quality followed the procedure of COSMIN methodology for systematic reviews of Patient-Reported Outcome Measures ( 20 ) and supporting resources ( 19 – 21 , 23 , 24 , 34 ). The COSMIN Risk of Bias assessment distinguish between appraisal of content validity, that is the extent to which the area of interest is comprehensively addressed by the items in the instrument, and appraisal of the process of measurement instrument development ( 24 ). Although appropriate instrument design studies support good content validity, distinct appraisal criteria should be applied to instrument development studies and to studies that assess content validity of existing measurement instruments. In the present review, two reviewers (RJ and/or HB and/or MS) independently appraised studies that developed new wellbeing instruments against the COSMIN Patient-Reported Outcome Measure (PROM) development criteria ( 19 , 20 , 23 ), and appraised studies that validated existing instruments against the COSMIN content validity criteria ( 23 ). The term “Patient” in “PROMs” is considered synonymous with the population group for this study, “Worker.” Reviewer consensus occurred through discussion.

2.7.1. Assessment of the methodological quality of the included studies

The COSMIN checklist includes 10 boxes: two for content validity, three for internal structure, and five for the remaining measurement properties of reliability, measurement error, criterion validity, hypotheses testing for construct validity and responsiveness ( 19 , 20 ). Studies were rated as either “Very Good,” “Adequate,” “Doubtful,” or “Inadequate.” The rating “Not Explored” was applied for any measurement properties not investigated for an instrument in any individual article. We have briefly summarized key criteria below based on the COSMIN taxonomy, for further detail please see associated COSMIN methodology user manuals and reference materials ( 19 – 21 , 23 , 34 ).

2.7.1.1. Structural validity, internal consistency and measurement invariance

Three measurement properties relate to the internal structure of an instrument: structural validity, internal consistency, and measurement invariance. Structural validity can be assessed for multi-item instruments that are based on a reflective model where each item in the instrument (or subscale within an instrument) reflect an underlying construct (for example, psychological wellbeing) and should thus be correlated with each other ( 20 ). For the methodological quality of studies of structural validity to be rated “Very Good,” the study must conduct confirmatory factor analysis; include an adequate sample size with respect to the number of items in the instrument; and not have other methodological flaws. Internal consistency is the degree to which items within an instrument (for a unidimensional instrument) or subscale of an instrument (for a multidimensional instrument) are intercorrelated with each other. For studies of internal consistency, the COSMIN Risk of Bias checklist stipulates that a rating of “Very Good” requires the study of internal consistency to report the Cronbach's alpha (or omega) statistic (and for each subscale within a multi-dimensional scale), and for no other major methodological or design flaws in the study. Measurement invariance (also known as cross cultural validity) is the extent to which the translated or culturally modified version of an instrument perform in a similar way to those in the original version. A rating of “Very Good” for the methodological quality of studies of measurement invariance requires evidence that samples being compared for different versions of the instrument are sufficiently similar in terms of any relevant characteristics (except for the key variable that differs between them, such as cultural context); that an appropriate method was used to analyze the data (for example, multi-group confirmatory factor analysis); and there is an adequate sample size, which is dependent on the number of items in the instrument of interest ( 19 – 21 , 23 ).
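To illustrate the internal consistency statistic referred to above, the following sketch (Python with NumPy, using simulated rather than study data; all parameter values are arbitrary and for demonstration only) computes Cronbach's alpha for a set of items assumed to reflect a single construct.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents x n_items) response matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the sum score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Simulated responses: 200 workers answering 6 Likert items that are all
# driven by one latent wellbeing factor plus noise
rng = np.random.default_rng(42)
latent = rng.normal(size=(200, 1))
items = np.clip(np.round(3 + latent + rng.normal(scale=0.8, size=(200, 6))), 1, 5)

# Values >= 0.70 are conventionally deemed sufficient
print(f"alpha = {cronbach_alpha(items):.2f}")
```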

2.7.1.2. Reliability

Reliability is the proportion of variance in a measure that reflects true differences between people and is assessed in test-retest studies; to avoid confusion with other forms of reliability, it is hereafter referred to as test-retest reliability. For the methodological quality of a test-retest reliability study to be rated as “Very Good,” it must provide evidence that respondents were stable between repeated administration of the test instrument; the interval separating repeated administration of the instrument must be appropriate; and the study must provide evidence that the test conditions between repeated tests were similar. Regarding the statistical methods, the COSMIN Risk of Bias tool specifies that for continuous scores the intraclass correlation coefficient must be calculated ( 19 – 21 , 23 ).
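As a rough illustration of the intraclass correlation coefficient stipulated here, the sketch below (Python/NumPy, simulated test-retest data) computes ICC(2,1), the two-way random-effects, absolute-agreement, single-measure form from the classic Shrout and Fleiss ANOVA decomposition; which form is appropriate depends on the design of the particular study.

```python
import numpy as np

def icc_2_1(scores: np.ndarray) -> float:
    """ICC(2,1) for an (n_subjects x k_sessions) matrix of repeated scores."""
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)  # per-subject means
    col_means = scores.mean(axis=0)  # per-session means

    ss_rows = k * ((row_means - grand) ** 2).sum()
    ss_cols = n * ((col_means - grand) ** 2).sum()
    ss_error = ((scores - grand) ** 2).sum() - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-subjects mean square
    msc = ss_cols / (k - 1)               # between-sessions mean square
    mse = ss_error / ((n - 1) * (k - 1))  # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Simulated data: 50 stable respondents measured at two timepoints,
# with true scores (SD 10) plus measurement noise (SD 4)
rng = np.random.default_rng(7)
true_scores = rng.normal(50, 10, size=(50, 1))
scores = true_scores + rng.normal(0, 4, size=(50, 2))

print(f"ICC(2,1) = {icc_2_1(scores):.2f}")  # >= 0.70 meets the usual criterion
```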

2.7.1.3. Measurement error

Measurement error refers to the error, whether systematic or random, in an individual's score that occurs for reasons other than changes in the construct of interest. Similar to studies of test-retest reliability, for studies of measurement error to be rated as “Very Good,” evidence must be provided that respondents were stable between repeated administration of the instrument, that the interval between repeated administrations of the instrument were appropriate, and that the test conditions were similar for repeated administrations of the instrument. Regarding the appropriateness of statistical methods, standard error of measurement (SEM) or smallest detectable change (SDC) must be reported for continuous scores ( 19 – 21 , 23 ).
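Both statistics named here have standard closed forms. Continuing the simulated-data assumptions of the previous sketch, the standard error of measurement can be derived from the score standard deviation and a reliability estimate, and the smallest detectable change follows from the SEM:

```python
import numpy as np

def sem_and_sdc(score_sd: float, reliability: float) -> tuple[float, float]:
    """Standard error of measurement (SEM) and smallest detectable change (SDC).

    SEM = SD * sqrt(1 - reliability); SDC = 1.96 * sqrt(2) * SEM, i.e., the
    95% limit for a change attributable to measurement error alone.
    """
    sem = score_sd * np.sqrt(1 - reliability)
    return sem, 1.96 * np.sqrt(2) * sem

sem, sdc = sem_and_sdc(score_sd=10.0, reliability=0.85)  # illustrative values
print(f"SEM = {sem:.1f}, SDC = {sdc:.1f}")
```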

2.7.1.4. Criterion validity

Criterion validity is the extent to which scores on a given instrument adequately reflect scores of a “gold standard” instrument that assesses the same construct. For the methodological quality of a study of criterion validity to be rated as “Very Good,” correlations between the instruments must be reported for continuous scores, and the study must be free from other methodological flaws ( 19 – 21 , 23 ). For workers' wellbeing, our systematic search of the literature did not identify a universally accepted “gold standard” for workers' wellbeing for use in evaluating criterion validity. However, given the varied definitions and models of wellbeing, we have evaluated criterion validity for included studies. We based our evaluation on the individual study authors' definition or model of workers' wellbeing and its alignment with their selected “gold standard” instrument.

2.7.1.5. Construct validity

Construct validity is the extent to which scores on an instrument are consistent with hypotheses about the construct that it purports to measure. Two broad approaches to hypothesis testing for construct validity are the “convergent validity” approach and the “discriminative or known-groups validity” approach. Hypothesis testing for convergent validity involves comparison of performance on the instrument of interest and another instrument that measures a construct that is hypothesized to be related or unrelated in some way. For studies employing the convergent validity approach to establishing construct validity to be methodologically rated as “Very Good,” the construct measured by the comparator instrument must be clear; sufficient measurement properties of the comparator instrument must have been established in a similar population and the statistical methods must be appropriate. For studies employing the “discriminative or known groups validity” approach to establishing construct validity to be methodologically rated as “Very Good,” the study must adequately describe the relevant features of the subgroups being compared, and appropriate statistical methods must be employed ( 19 – 21 , 23 ).

2.7.1.6. Responsiveness

The measurement property of responsiveness refers to the ability of an instrument to measure changes over time in the construct of interest. It is similar to construct validity, but whereas construct validity refers to a single score, responsiveness refers to the validity of a change in the score, for example, the ability of the instrument to detect a clinically important change. The COSMIN Risk of Bias tool provides standards for assessing the methodological quality of numerous subtypes of responsiveness; for example, for the methodological quality to be rated “Very Good” for a study using the “construct” approach to responsiveness, the study must adequately describe the intervention, and use appropriate statistical methods ( 19 – 21 , 23 ).

The COSMIN guidelines recommend that if PROM development studies or content validity studies are rated as “Inadequate,” then measurement properties should not be assessed. However, we determined that we would appraise the qualities of studies on other measurement properties, even if the initial PROM development was rated as “Inadequate.” By continuing with these further assessments and providing readers with the detailed findings of these assessments, our review will enhance opportunities to strengthen future workers' wellbeing measure development.

2.7.1.7. Evaluation of the study results against criteria for good measurement properties

The quality of the measurement instruments was rated as either “Sufficient,” “Insufficient,” or “Indeterminate” against the criteria of good measurement properties ( 21 ). Briefly, the criteria for a rating of “Sufficient” for each of the measurement properties are as follows; for further detail, see Prinsen et al. ( 21 ). For structural validity, the model fit parameters of a confirmatory factor analysis must meet specified criteria. For internal consistency, an instrument must have at least a low level of evidence for sufficient structural validity and Cronbach's alpha must be ≥0.7. Thus, where there is “Insufficient” structural validity (for example, if structural validity assessment was undertaken only with exploratory factor analysis), internal consistency cannot be appraised even if it has been calculated and reported. For test-retest reliability, the intraclass correlation coefficient must be ≥0.7. For measurement error, the minimally important change (MIC) must exceed the smallest detectable change (SDC); whereas the SDC is the smallest change that is attributable to measurement error, the MIC is the smallest detectable change that respondents perceive as important. For an instrument's construct validity, the results of hypothesis testing must support the stated hypotheses. For measurement invariance, there must be no important differences in the model between the groups being compared. For criterion validity of an instrument, correlation with a “gold standard” must be ≥0.7. For responsiveness, the results of a study of responsiveness must support the hypothesis. We also appraised interpretability, or the extent to which one can assign qualitative meaning to a quantitative score ( 21 , 33 ). However, diverging from the recommendations of Terwee et al. ( 33 ) and subsequently Prinsen et al. ( 21 ), we applied a two-category scoring system to assess interpretability, with a positive rating for studies that report at least some descriptive statistics for the instrument for the sample of interest, and a negative score for studies that did not report descriptive statistics. This differs somewhat from the recommendations of Terwee et al. ( 33 ), who recommend that the minimally important change (MIC) must be reported for a favorable rating of interpretability.
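To make the numeric thresholds in the paragraph above easier to scan, the following hypothetical helper (Python) applies the simplified decision rules; it is a convenience sketch, not a substitute for the full criteria in Prinsen et al. ( 21 ).

```python
from typing import Optional

def rate_property(value: Optional[float], threshold: float = 0.70) -> str:
    """Apply the simple >= threshold rule used for Cronbach's alpha, ICC,
    and criterion-validity correlations; None means the statistic was
    not reported, which leaves the property indeterminate."""
    if value is None:
        return "Indeterminate"
    return "Sufficient" if value >= threshold else "Insufficient"

def rate_measurement_error(mic: Optional[float], sdc: Optional[float]) -> str:
    """Measurement error is sufficient when the minimally important change
    exceeds the smallest detectable change (MIC > SDC)."""
    if mic is None or sdc is None:
        return "Indeterminate"
    return "Sufficient" if mic > sdc else "Insufficient"

print(rate_property(0.82))               # alpha of 0.82 -> Sufficient
print(rate_property(None))               # ICC not reported -> Indeterminate
print(rate_measurement_error(5.0, 7.2))  # MIC below SDC -> Insufficient
```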

2.8. Data synthesis

The inconsistency in individual study populations, settings and languages did not support meta-analysis, statistical pooling, or a cumulative evidence grade [see De Vet ( 12 ) for further information]. Rather, results are tabulated with statistical summaries and a narrative description.

3. Results

The initial search returned 8,430 articles; once 252 duplicates were removed, the titles and abstracts of 8,178 studies were screened for relevance, resulting in removal of 7,383 irrelevant studies. Of the remaining 765 full-text studies that were assessed for eligibility, 502 were excluded for reasons including wellbeing not being evident in the instrument (e.g., the instrument measured stress, anxiety, depression); the wrong type of publication (e.g., qualitative research not using a measurement instrument, non-primary research); and insufficient detail about the instrument to determine relevance. Citation searching returned nine further studies for screening (see Figure 1 ).

Figure 1. Search and screening flow diagram. Flow diagram adapted from Page et al. ( 15 ).

Data were extracted from the remaining 267 articles, to identify: (1) articles that reported the development of a new instrument (which may or may not include workers in the development stage) and that also psychometrically validate it in a sample of workers/employees; (2) articles that reported, as the primary aim of the study, a psychometric validation (in a specific country, language, or context) of an existing work wellbeing instrument that was originally developed more than 10 years ago and/or in a different context or population. Articles that reported the use of one or more existing wellbeing instruments for the purpose of measuring wellbeing as an outcome, rather than reporting instrument development or measurement properties, were excluded at this point ( n = 238). The following analysis and results are for the articles that report the development of a new wellbeing instrument ( n = 18) and those that psychometrically validate a previously developed wellbeing instrument in a new population, language, culture, or context ( n = 11). Within each of these two groups, we appraised both the methodological quality of the studies of instrument measurement properties and the psychometric properties of the instruments.

3.1. Characteristics of articles reporting development of new instruments

The 18 articles that report the development of a new instrument, and the identified psychometric properties, are summarized in Table 1.

Characteristics of articles reporting development of new instruments.

Methodology abbreviations: ICC, Intraclass correlation coefficient; NR, not reported; NE, not evaluated; SDC, smallest detectable change; SEM, standard error of measurement.

Other abbreviations: BFSI, Banking, Financial Services & Insurance; EWB, employee wellbeing; EMBA, executive Master of Business Administration degree; IPWBW, index of psychological wellbeing at work; IT, Information Technology; ITES, Information Technology Enabled Services; MSW, master of social work; PVM, proactive vitality management; PWBW, psychological wellbeing at work; TOMH, Tokyo Occupational Mental Health; WB, wellbeing; WRWB, work-related wellbeing.

Of the 18 articles reporting the development of new instruments ( 46 – 63 ), four did so with employees in the United States, two with employees in Australia, and two with employees in India. Eight studies developed instruments with populations in China, Japan, Hungary, the UK, the Netherlands, Sweden, Taiwan, and Canada. Two studies did not report specific country contexts for the participants: Anderson et al. ( 46 ) developed their scale with an online survey panel of “employees” who were fluent in English, and Butler and Kern ( 47 ) studied a sample of employees from an online company based at several global offices. All studies included male and female participants, and, as expected given the focus on workers, the mean age of participants in studies that reported this parameter tended to be between the early thirties and mid-forties.

Eight of these studies developed instruments with relatively heterogeneous samples of workers from a range of industries and across the country of interest ( 49 – 51 , 55 , 58 , 60 – 62 ). Eight studies developed instruments in well-defined populations in well-defined settings, such as nurses within a specific medical center ( 48 ); staff at a university ( 56 , 63 ); staff at a school or within a specific school system ( 53 , 54 ); social workers undertaking a specific course ( 57 ); staff working in a library service in southern England ( 52 ); and employees of one specific online company with a global presence ( 47 ). Porath and Hyett ( 59 ) report a series of different studies for different measurement properties, with different samples ranging from factory workers to executives.

3.2. Assessment of methodological quality

The methodological quality of studies that develop new wellbeing instruments and of the measurement properties of the instruments are summarized in Table 2 .

Methodological quality of studies that develop new instruments and of the measurement properties of the instruments.

Quality criteria for studies: “Very Good,” “Adequate,” “Doubtful” or “Inadequate” quality according to COSMIN Risk of Bias Assessment; Quality criteria for measurement properties of instruments: Sufficient (+); indeterminate (?); insufficient (–) according to Mokkink et al. ( 20 ).

N/E, means the measurement property was not explored in the included study; Interpretability: Y indicates that “Yes” at least some descriptive statistics are reported for the instrument, N indicates that “No” descriptive statistics are reported.

Each of the 18 articles reported the development of a single new instrument, and thus there were 18 new instruments identified. Within the articles, the number of studies investigating the measurement properties of these new instruments in a sample of workers ranged from one ( 47 ) to five ( 61 , 62 ). Butler and Kern ( 47 ) report studies investigating other measurement properties, but these studies were carried out in different samples that were not exclusively comprised of workers. Across the 18 articles, the methodological quality of studies of measurement properties ranged from “Very Good” to “Inadequate.” The most frequently explored measurement properties for development of a new instrument were structural validity and internal consistency; in contrast, some measurement properties in the COSMIN taxonomy ( 18 ) were studied infrequently (e.g., responsiveness, measurement invariance) or not at all (e.g., criterion validity).

Of the 18 articles, only four explicitly stated that a pilot test was conducted with the target population to check item comprehensiveness and comprehensibility ( 52 , 59 , 60 , 63 ). Conducting a pilot test in the target population (workers or employees) is one of the standards for rating an instrument development study as “Very Good” as opposed to “Inadequate.” These four studies were then assessed according to the extent to which they met the remaining standards for PROM development methodological quality ( 19 , 21 , 24 ). Juniper et al. ( 52 ) did not specify the sample size used for the pilot study (stating only that “ The questionnaire was pre-tested with a number of library staff to ensure content and instructions were clear .”; p. 110), and so overall the methodological quality of this PROM development study is rated as “Doubtful.” Porath and Hyett's ( 59 ) pilot study for comprehensiveness/clarity employed an adequate methodology but a sample size of only 30, so instrument development was rated as “Doubtful.” Pradhan and Hati ( 60 ) reported both eliciting concepts through interviews and testing items for clarity and comprehensiveness in an adequate sample from the target population, and so their study was rated as “Very Good.” Zhou and Parmanto's ( 63 ) development of the Pitt Wellness Scale was rated as “Very Good,” given its detailed methodology and description of the pilot testing process, and the samples used in these processes.

Three of the remaining 14 articles ( 49 , 55 , 62 ) did involve the target population in concept elicitation through interviews. However, these researchers developed items based on these concepts and proceeded to administer the instrument and explore measurement properties without testing the clarity or comprehensiveness of the individual items with the target population.

The remaining 11 articles did not report target population involvement at any point during either concept elicitation or pilot testing of items for comprehensibility and comprehensiveness. Item generation was informed exclusively by the researchers', and in some cases their colleagues', expertise and familiarity with the literature ( 46 , 48 , 50 , 53 , 56 , 58 , 61 ), or was based on items from existing “wellbeing” instruments; however, pilot testing the items for comprehensiveness and comprehensibility in the new context did not occur ( 47 , 51 , 54 , 57 ).

3.2.1. Structural validity

Of the 18 studies that developed new instruments, one [Butler and Kern ( 47 )] was excluded from our evaluation of structural validity because a mixed sample of employed and unemployed participants was used for the study of this specific measurement property. Of the 17 included studies that were evaluated for this measurement property, three were rated as “Inadequate,” four as “Doubtful,” three as “Adequate,” and seven as “Very Good.” The three studies evaluated as “Inadequate” included Juniper et al. ( 52 ), who did not use factor analysis as a method; and Kern et al. ( 54 ) and Parker and Hyett ( 58 ), whose sample size was less than five times the number of items. A common reason for the “Doubtful” ratings was failure to use separate samples for the exploratory and confirmatory factor analysis stages ( 49 , 51 , 60 , 63 ). Three were rated as “Adequate” because they performed only exploratory but not confirmatory factor analysis ( 53 , 56 , 57 ). Seven studies of structural validity were rated as “Very Good” ( 46 , 48 , 50 , 55 , 59 , 61 , 62 ).

Structural validity measurement properties were evaluated for 14 of the included instruments. The other four studies included two studies in which only exploratory factor analysis was carried out, one study that employed a different method of structural validity assessment, and one study that established this measurement property in a mixed sample including, but not exclusively composed of, workers. Of the remaining 11 studies that conducted factor analysis, all but one instrument met the criteria for a rating of “Sufficient” for the measurement property of structural validity; for the Employee Wellbeing Scale ( 55 ), the structural validity measurement property was rated as “Indeterminate” because the factor analysis model fit parameters required according to the COSMIN guidelines were not reported.

3.2.2. Internal consistency

For all but one ( 58 ) of the new wellbeing instruments, studies were carried out to determine internal consistency. All were methodologically rated as “Very Good.” Evaluation of the measurement property of internal consistency requires at least low evidence for sufficient structural validity, and therefore only 11 instruments were evaluated for the internal consistency measurement property; for all of these, internal consistency was rated as “Sufficient.” Although five additional studies report data for the internal consistency of the instrument, this measurement property was not evaluated in the present review because there was not at least a low level of evidence for structural validity based on the methods used ( 52 , 53 , 56 , 57 ) or because structural validity assessment had been performed in a non-worker population ( 47 ).

3.2.3. Construct validity

When evaluating studies of construct validity, we found researchers used a wide range of methods to evaluate construct validity, for example, convergent validity ( 59 , 61 , 62 ), nomological validity ( 50 ), and concomitant validity ( 49 ). Some studies' investigation of criterion validity was not in accordance with the COSMIN definition of the term, but was better aligned with a study of construct validity. These studies' evaluations of criterion validity were treated as construct validity evidence when applying the COSMIN criteria for construct validity.

Eight of the included articles that developed new instruments conducted what we deemed investigations of construct validity ( 46 , 49 – 51 , 55 , 59 , 61 , 62 ). Six met all of the COSMIN methodological standards for a rating of “Very Good” ( 46 , 49 , 50 , 59 , 61 , 62 ). Two were rated as methodologically “Inadequate” because measurement properties of the comparator or related measurement instrument(s) were not adequately reported ( 51 , 55 ). For all eight instruments for which construct validity was assessed, the measurement property of construct validity was rated as “Sufficient,” though given the risk of bias in the construct validity studies of the instruments of Eaton et al. ( 51 ) and Khatri and Gupta ( 55 ), further exploration is warranted.

3.2.4. Other measurement properties

Several measurement properties were explored infrequently, including (test-retest) reliability, measurement error, measurement invariance, and responsiveness. Watanabe et al. ( 61 ) conducted a study of test-retest reliability and report the intraclass correlation coefficient results; however, given the absence of comments regarding the stability of respondents between timepoints and the similarity of the testing conditions, this study was rated as “Adequate.” Six out of eight subscales in Watanabe et al.'s ( 61 ) instrument had ICCs >0.7, so overall the measurement property of test-retest reliability was rated as “Sufficient.” Both Pradhan and Hati ( 60 ) and Zheng et al. ( 62 ) conducted test-retest reliability studies that were appraised as being of “Doubtful” quality, because the ICC was not determined; rather, the Pearson's correlation coefficient was reported without providing evidence that no systematic change had occurred between each timepoint of the tests. Given the absence of a reported ICC, the measurement property of test-retest reliability for the instruments of both Pradhan and Hati ( 60 ) and Zheng et al. ( 62 ) is rated as “Indeterminate.” None of these studies commented specifically on the stability of the participants between repeated measurements. Zheng et al. ( 62 ) investigated measurement invariance in their development of the instrument; however, this was rated as “Inadequate” quality because it did not meet the criterion of ensuring that samples were similar in all ways except for the cultural context. Watanabe et al. ( 61 ) investigated measurement error of the Japanese version of the PERMA Profiler; this was rated as “Adequate,” lacking detail regarding the stability of the employees between the two time points. The measurement error of this scale was rated as “Indeterminate” because the MIC was not reported. Responsiveness was not determined for any instrument. Evidence of responsiveness is a measurement property lacking from the wellbeing instruments developed in the 2010–2020 decade.

3.2.5. Interpretability

Terwee et al. ( 33 ) specify that adequate instrument interpretability requires means and standard deviations in multiple groups, as well as the minimally important change (MIC). None of the included studies reported the MIC, so, technically, none of the instruments should be rated as favorable. However, for the purposes of this review, interpretability was rated as either positive or negative, with a positive rating being applied if at least means and standard deviations were reported. Four studies in which new instruments were developed did not report any descriptive statistics for the scores produced from the instrument being developed ( 48 , 51 , 55 , 56 ). All other authors report some descriptive statistics (at least means and standard deviations) for the scale being developed, in some cases for individual items and/or factors within the scale, and in some cases for individual subgroups within the broader sample, for example, for males and females separately ( 53 ) or for different groups depending on duration of work ( 52 ).

3.3. Characteristics of articles reporting psychometric validation of previously developed instruments

Eleven articles reported validation of wellbeing instruments that were originally developed before 2010 and/or previously developed or validated in a different population or context ( 64 – 74 ). These 11 articles reporting psychometric evaluations of previously developed wellbeing instruments are summarized in Table 3.

Characteristics of articles reporting psychometric validation of previously developed instruments.

N/E, not explored; N/R, not reported.

One of these articles validated, in a US population of workers, a wellbeing instrument previously developed in Brazil ( 64 ). Several of the included articles undertook validations in new populations of workers in countries/languages that differed from the English/American populations in which the instrument had previously been developed and validated ( 65 , 67 – 73 , 75 ). Another sought to validate a previously developed instrument specifically in a population of workers ( 74 ). None of the included articles assessed content validity, criterion validity (according to the specific definition of criterion validity in the COSMIN guidelines) or responsiveness of the instruments.

3.4. Methodological quality of studies and appraisal of measurement properties of psychometrically validated instruments

The quality appraisal of studies that psychometrically validate previously developed instruments and appraisal of the measurement properties of the instruments are summarized in Table 4.

Methodological quality of studies that psychometrically validate previously developed instruments and measurement properties of the instruments.

Assessment criteria for methodological quality of studies: “Very Good,” “Adequate,” “Doubtful,” or “Inadequate” quality according to COSMIN Risk of Bias Assessment; Quality criteria for appraisal of measurement properties of instruments: Sufficient (+), indeterminate (?), insufficient (–) according to Terwee et al. ( 23 ).

CFA, confirmatory factor analysis; EFA, exploratory factor analysis; N/E, not explored means the measurement property was not explored in the included study.

Within these 11 validation articles, the number of measurement properties studied for any one instrument ranged from three to four. Structural validity and internal consistency were the most frequently studied properties; measurement error and measurement invariance were infrequently studied; and content validity, criterion validity, reliability and responsiveness were never studied.

3.4.1. Structural validity

Structural validity was investigated in all 11 studies that psychometrically validate previously developed instruments. The methodological quality of these studies was rated as “Very Good” for all except two, which were rated as “Adequate” ( 65 , 68 ). The reason for the ratings of “Adequate” was that exploratory, but not confirmatory, factor analysis was carried out. Of the nine instruments for which structural validity studies were carried out with confirmatory factor analyses, the measurement property of structural validity was rated as “Sufficient” in all ( 64 , 67 , 69 – 75 ).

3.4.2. Internal consistency

The methodological quality of internal consistency was rated as “Very Good” for all 11 studies ( 64 , 65 , 67 – 75 ). This measurement property was rated as “Sufficient” for all instruments except the two for which it could not be appraised: for these, the evidence for structural validity was insufficient because it had been assessed with only exploratory, not confirmatory, factor analysis ( 65 , 68 ).
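
As an illustration of what an internal consistency assessment involves, here is a small Python sketch computing Cronbach's alpha directly from its standard formula; the item names and toy responses are invented for the example.

```python
# Cronbach's alpha from its standard formula:
# alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the total score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Toy responses to a hypothetical five-item wellbeing scale
df = pd.DataFrame({
    "wb1": [4, 5, 3, 4, 2, 5],
    "wb2": [4, 4, 3, 5, 2, 5],
    "wb3": [5, 5, 2, 4, 1, 4],
    "wb4": [3, 4, 3, 4, 2, 5],
    "wb5": [4, 5, 3, 5, 2, 4],
})
print(f"alpha = {cronbach_alpha(df):.3f}")
```

Note that alpha is only meaningfully interpretable once unidimensionality (structural validity) has been established, which is exactly why the two EFA-only instruments could not be appraised on this property.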

3.4.3. Measurement invariance/cross-cultural validity

Two studies evaluated measurement invariance/cross-cultural validity of previously developed wellbeing instruments; both were methodologically rated as “Very Good.” Laguna et al. ( 75 ) validated the Job-Related Affective Wellbeing scale in samples of workers in the Netherlands, Poland and Spain and demonstrated measurement invariance of the instrument across these country contexts. In this study, the property of measurement invariance was rated as “Sufficient.” Senol-Durak and Durak ( 72 ) established measurement invariance of the Turkish version of the Flourishing Scale for male and female employees; the measurement property of this scale was rated as “Sufficient.”
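
To make the idea of measurement invariance concrete, the sketch below shows only the first (configural) step: fitting the same factor structure separately in each group and checking that it holds in both. Full metric and scalar invariance testing additionally constrains loadings and intercepts to be equal across groups and is typically done in dedicated SEM software; the model syntax, item names, and grouping column here are assumptions.

```python
# Configural step of an invariance check: the same CFA is fit per group.
# Model syntax, item names, and the 'country' column are hypothetical.
import pandas as pd
from semopy import Model, calc_stats

desc = "Wellbeing =~ wb1 + wb2 + wb3 + wb4 + wb5"
df = pd.read_csv("wellbeing_items.csv")  # includes a 'country' grouping column

for country, group in df.groupby("country"):
    model = Model(desc)
    model.fit(group.drop(columns="country"))
    stats = calc_stats(model)
    print(country,
          "CFI:", round(stats["CFI"].iloc[0], 3),
          "RMSEA:", round(stats["RMSEA"].iloc[0], 3))
```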

3.4.4. Test-retest reliability

Two studies evaluated test-retest reliability of previously developed wellbeing instruments. Mielniczuk and Łaguna ( 69 ) conducted a test-retest reliability study of the Job-Related Affective Wellbeing scale in a sample of Polish workers but reported Pearson's correlation coefficients rather than the COSMIN-recommended intraclass correlations, so the measurement property of test-retest reliability was rated as “Indeterminate”; furthermore, they did not comment on the stability of the respondents in the intervening period, so the methodological quality was rated as “Doubtful” according to the COSMIN criteria. Watanabe et al. ( 73 ) undertook a test-retest reliability study of the Japanese Workplace PERMA-Profiler; although they reported reliability in the COSMIN-recommended form of intraclass correlations, and this parameter was of a sufficient value, they did not comment on the stability of the respondents between the two testing sessions, so the methodological quality of the study on this measurement property was rated as “Adequate.”
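
The distinction the COSMIN criteria draw here is easy to demonstrate in code. The sketch below, using the pingouin package, computes intraclass correlations (which penalize systematic shifts between sessions) alongside the Pearson correlation (which does not); the data layout and column names are assumptions.

```python
# Test-retest reliability: ICC (COSMIN-recommended) vs. Pearson's r.
# The file and column names are hypothetical.
import pandas as pd
import pingouin as pg

# Long format: one row per subject per testing session
long_df = pd.read_csv("retest_long.csv")  # columns: subject, session, score

icc = pg.intraclass_corr(
    data=long_df, targets="subject", raters="session", ratings="score"
)
print(icc[["Type", "ICC", "CI95%"]])  # the ICC2 (agreement) row is usually reported

# Pearson's r ignores a uniform shift between sessions, so it can look high
# even when all scores drift upward; this is why COSMIN prefers an agreement ICC.
wide = long_df.pivot(index="subject", columns="session", values="score")
print(wide.corr(method="pearson"))
```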

3.4.5. Hypothesis testing for construct validity

Eight of the 11 studies investigated construct validity, although they reported it using a variety of terms besides “construct validity.” Several of the studies failed to adequately report measurement properties (i.e., descriptive statistics, internal consistency) in the study population for the comparator instruments used in the convergent validity assessment, and so were rated as “Inadequate” ( 65 , 68 , 74 ). The measurement property of construct validity met the COSMIN criteria for “Sufficient” for all eight instruments for which it was assessed; however, given the “Inadequate” methodological quality ratings for three of the studies ( 65 , 68 , 74 ), the construct validity of these instruments should be treated with caution.
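
In practice, such a convergent validity assessment often comes down to correlating total scores on the instrument under study with established comparator instruments and checking the results against pre-specified hypotheses (COSMIN treats correlations of roughly 0.50 or higher with instruments measuring similar constructs as supportive). A minimal sketch, with hypothetical column names:

```python
# Convergent validity sketch: correlate the new scale's total score with
# comparator instruments. File and column names are hypothetical.
import pandas as pd

df = pd.read_csv("validation_sample.csv")  # columns: new_scale, who5, jaws

# Hypotheses would be stated in advance, e.g., r >= 0.50 with both comparators
print(df[["new_scale", "who5", "jaws"]].corr(method="pearson").round(2))
```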

3.4.6. Measurement error

Only Watanabe et al. ( 73 ) undertook a study of measurement error, for the Japanese Workplace PERMA-Profiler. The methodological quality of this study was rated as “Very Good”; however, the property of measurement error of the Japanese Workplace PERMA-Profiler was rated as “Indeterminate” because the MIC was not reported.
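
For orientation, measurement error is usually summarized through the standard error of measurement (SEM) and the smallest detectable change (SDC), both of which follow from a reliability estimate; the MIC, by contrast, requires an external anchor and cannot be derived this way. The values below are purely illustrative.

```python
# Measurement error statistics from a test-retest reliability estimate.
# The SD and ICC values are illustrative, not taken from the reviewed study.
import math

sd_scores = 12.0   # standard deviation of scale scores in the sample (assumed)
icc = 0.85         # test-retest ICC (assumed)

sem = sd_scores * math.sqrt(1 - icc)   # SEM = SD * sqrt(1 - reliability)
sdc = 1.96 * math.sqrt(2) * sem        # smallest detectable change (individual)

print(f"SEM = {sem:.2f}, SDC = {sdc:.2f}")
# A change in an individual's score below the SDC cannot be distinguished
# from measurement error; whether a detectable change is *important* is
# what the (unreported) MIC would indicate.
```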

3.4.7. Interpretability

All but two articles ( 64 , 71 ) report at least some descriptive statistics (means and standard deviations) for the validated instruments; in some cases, descriptive statistics were reported for individual items and factors within the overall instrument. Some report descriptive statistics for different subgroups, such as males vs. females ( 68 ) or different country contexts ( 75 ). None of the articles that validated previously developed instruments investigated or reported data that would help interpret change scores (i.e., the minimal important change, or MIC).

4. Discussion

This review had four objectives. First, to systematically identify articles published from 2010 to 2020 reporting the development of instruments to measure workers' wellbeing. Second, to critically appraise the methodological quality of the studies reporting the development of workers' wellbeing measures. Third, to critically appraise the psychometric properties of the measures developed for workers' wellbeing. Fourth, based on the measures developed between 2010 and 2020, to recommend valid and reliable measures of workers' wellbeing.

We screened 8,178 articles and identified 18 articles reporting the development of a new instrument to measure workers' wellbeing and 11 that validated existing measures of wellbeing in workers. Numerous articles were excluded because they measured constructs other than wellbeing, such as illbeing (e.g., burnout). A number of included instruments had subscales measuring constructs related to wellbeing (e.g., job satisfaction) alongside subscales measuring wellbeing itself. Notable in our review were the differing definitions of wellbeing and, consequently, the different types of content employed by test developers to represent the construct. Whilst variance in content is a threat to the validity of measures, without an agreed-upon definition of wellbeing from the population it concerns (workers themselves), validity will always be attenuated. The newly developed measures and the previously developed measures were appraised using their respective COSMIN quality checklists.

4.1. Methodological quality

Overall, the psychometric studies were insufficient to establish the validity of the measures, whether developed between 2010 and 2020 or before 2010 (or in a different context). In both groups, few studies reported the prevalence of missing data or how any missing data were handled, potentially introducing bias if data were systematically missing ( 85 ). Furthermore, the statistics used in the analyses were often not clearly reported, omitting details such as the statistical procedures, rotational methods, or formulas used; this makes the quality of the evidence difficult to appraise. No study completed all eight categories needed for a full risk of bias assessment. Whilst exploratory and/or confirmatory factor analysis was often used, hypotheses for CFA were rarely provided and some studies used small samples. Commonly, test-retest reliability, criterion validity, measurement error, responsiveness, and cross-cultural validity were omitted altogether. These steps scaffold together to reduce the risk of bias, and the omission of several of them, as was the case here, reduced the quality of the studies.

Internal consistency was assessed in all studies, despite its limitations for determining reliability [e.g., see ( 86 )]. All except two studies were appraised as having very good internal consistency. The measurement properties of responsiveness, criterion validity, and content validity were commonly overlooked, and measurement error was rarely reported: it was studied for only one newly developed instrument ( 61 ) and was rated as “Indeterminate” because the minimally important change was not reported. The lack of evaluation of responsiveness in workers' wellbeing measures is problematic, as it undermines confidence in an instrument's validity when it is used to assess the impact of interventions on workers' wellbeing.

Our review highlighted a lack of ongoing validation of existing measures, with few studies completing more than three of the nine methods for establishing methodological quality. No studies of content validity were reported in the 11 articles that established measurement properties for instruments originally developed in a different context. This may reflect an implicit assumption by the researchers that an instrument whose measurement properties are being established in a new context/population must already have content validity in that population. Many of the instruments for which measurement properties were reported in new contexts are in common use (e.g., the Job-Related Affective Wellbeing Scale, the WHO-5, and the Warwick-Edinburgh Mental Wellbeing Scale). However, it is recommended that studies establish content validity to ensure items retain their validity in a new context ( 20 ). As with the newly developed instruments, this group also neglected measurement error: only one validation study of a previously developed instrument reported measurement error, and it did not report the minimally important change ( 73 ).

4.2. Recommendations of valid and reliable measures of workers' wellbeing

We aimed to synthesize evidence from 2010 to 2020 for the measurement properties of workers' wellbeing instruments in order to recommend valid and reliable measures. No measure achieved the stringent criteria used in the present review, for several reasons. First, validation of a new measure generally requires multiple studies and should be conducted in the population where the measure is intended to be used. Second, the studies themselves did not reach the necessary quality standard. Third, the repetition of studies takes time to complete and publish, resulting in a lag. The overarching reason this study was undertaken was to support researchers in determining the best available measure to use in workers' wellbeing research. Consequently, we now make recommendations based on the best available evidence, with the caveat that no measure met the standard set by the COSMIN methodology.

Considering the overall evidence for the measurement properties of individual instruments, those with the greatest number of positively rated measurement properties amongst newly developed instruments were: (1) Anderson et al.'s ( 46 ) Personal Growth and Development Scale (PGDS), for which structural validity, internal consistency, construct validity, and measurement invariance were all rated as sufficient; (2) Watanabe et al.'s ( 61 ) University of Tokyo Occupational Mental Health (TOMH) wellbeing 24 scale, for which structural validity, internal consistency, construct validity, and reliability were all rated as sufficient, and for which Item Response Theory (IRT) methods were employed during evaluation; and (3) Zheng et al.'s ( 62 ) Employee Wellbeing (EWB) scale, for which structural validity, internal consistency, and construct validity were all rated as sufficient. However, none of these newly developed workers' wellbeing instruments met the COSMIN criteria for adequate instrument design.

The Personal Growth and Development Scale (PGDS) ( 46 ) measures perceptions of personal growth and development at work. It was developed based on Ryff's general model, with items developed and refined by subject matter experts. The instrument was tested on employees and students by correlating scores with constructs of interest, and structural invariance testing found scalar invariance longitudinally within groups, but not between groups. Moderate positive correlations were found between employee responses on the PGDS and Basic Needs Satisfaction, Intrinsic Motivation, Identified Regulation, and Satisfaction with Life. The PGDS is a promising measure that requires ongoing validation in worker samples, as predictive validity was assessed only for the education version. Anderson et al.'s ( 46 ) Personal Growth and Development Scale could be considered for assessing workers' personal growth and development.

Watanabe et al.'s ( 61 ) University of Tokyo Occupational Mental Health [TOMH] wellbeing 24 scale was developed in a methodologically sound way that included IRT and Classical Test Theory (CTT) methods. It was developed specifically in workers, and could be considered for applications that aim to specifically assess wellbeing at work, as an independent concept from general eudemonic wellbeing. Watanabe et al. found their measure had overlapping constructs with Ryff's model of wellbeing and Self Determination Theory.

Zheng et al.'s ( 62 ) Employee Wellbeing Scale (EWS) was methodologically strong in its development, including items from workers and literature prior to psychometric refinement, strengthening its content validity. The EWS had moderate correlations with related wellbeing constructs and could be considered for assessing dimensions of worker wellbeing such as life wellbeing, workplace wellbeing, and psychological wellbeing. Configural invariance was found between worker samples from China and the United States despite cultural differences, suggesting elements of wellbeing may transcend culture.

Amongst studies of psychometric validation of instruments originally developed before 2010 (or in a different context), the best available evidence was for (1) the Flourishing Scale, (2) the Workplace PERMA-Profiler, (3) the Spanish Orientation to Happiness Scale, and (4) the Job-Related Affective Wellbeing Scale. As with the newer instruments, these validation studies were dominated by CTT methods of evaluation.

4.3. Recommendations for future research

A key recommendation based on the findings of this review is that future instrument development studies (1) include the target population throughout the stages of concept elicitation and, subsequently, in pilot testing items for relevance, comprehensiveness, and comprehensibility; (2) include samples of an adequate size during the development stage; and (3) describe instrument development methods in adequate detail. Given that most of the measurement properties of worker wellbeing instruments developed between 2010 and 2020 have not been reported, there are many opportunities for establishing and validating other measurement properties of recently developed instruments. A further consideration for future research is that IRT methods should be used in the development and evaluation of measures. The present review found that studies mainly relied on CTT, which has a number of limitations that IRT methods overcome. Although COSMIN does not recommend IRT over CTT, IRT methods such as Rasch analysis are increasingly being used in psychology to increase measurement precision ( 87 ). No study reported the MIC, the smallest within-person change over time above which patients, or, in the context of the current review, employees perceive themselves as importantly changed ( 34 ). Future studies could explore how this property should be defined, which in turn would enable research using workers' wellbeing instruments to infer meaningful changes in workers' wellbeing resulting from interventions or changes in circumstances.
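
As a concrete illustration of one way the MIC can be estimated, the sketch below uses a simple anchor-based approach: the mean change score among respondents who rate themselves as slightly improved on a global rating-of-change item. Other methods exist (e.g., distribution-based and ROC approaches), and the file layout, column names, and anchor categories here are all assumptions.

```python
# Anchor-based MIC sketch: mean change among the "slightly improved" group.
# File layout, column names, and anchor categories are hypothetical.
import pandas as pd

df = pd.read_csv("followup.csv")
# columns: score_t1, score_t2, anchor in
# {"worse", "unchanged", "slightly improved", "much improved"}

df["change"] = df["score_t2"] - df["score_t1"]
mic = df.loc[df["anchor"] == "slightly improved", "change"].mean()
print(f"Anchor-based MIC estimate: {mic:.2f}")
```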

4.4. Strengths and limitations

A key strength of this review is the use of COSMIN methodology and criteria for assessing both the studies of measurement properties and the measurement properties of the instruments themselves, supporting rigor and transparency in much the same way that resources such as the Cochrane Handbook for Systematic Reviews and the PRISMA guidelines have strengthened rigor and transparency in systematic reviews of interventions.

The main limitations of this review relate to subjectivity. Despite the use of the COSMIN guidelines, some subjectivity remains in identifying studies about specific measurement properties, given the diverse names researchers use for measurement properties, which do not always align with Mokkink et al.'s ( 18 ) taxonomy. Additionally, the use of only three databases and the exclusion of studies not reported in English introduced potential selection bias, particularly for studies validating previously developed measures in new languages.

5. Conclusion

This review has elucidated the specific measures of workers' wellbeing developed and reported in the decade of 2010 to 2020 and assessed both risk of bias of studies reporting measure development and the quality of measurement properties. This synthesis is an important first step to support future workers' wellbeing researchers to identify and select the most appropriate instruments for effectiveness evaluations. Employing a standardized taxonomy and methodological approach in a globally cohesive and targeted manner will strengthen future scientifically informed developments in workers' wellbeing measurement.

Data availability statement

Author contributions

Conceptualization: RJ, MS, RS, and JK-M. Data curation: RJ, MS, and HB. Formal analysis and investigation: RJ, MS, HB, and RS. Project administration and resources: RJ. Writing—original draft and writing—review and editing: RJ, HB, MS, RS, and JK-M. All authors contributed to the article and approved the submitted version.

Acknowledgments

Thank you to Shivanthi Balalla (AUT University) and Juliet Drown (AUT University) for your support with the screening. Thank you to the research librarians, Andrew (Drew) South (AUT University) and Lindy Cochrane (The University of Melbourne), for your support in developing the search strategy.

Abbreviations

ASVE, average shared variance extracted; BFSI, banking, financial services & insurance; CFA, confirmatory factor analysis; COSMIN, COnsensus based Standards for the selection of health status Measurement Instruments framework; CTT, classical test theory; EFA, exploratory factor analysis; EMBA, executive master of business administration degree; EWB, employee wellbeing; EWS, employee wellbeing scale; GRADE, grading of recommendations assessment, development, and evaluation; ICC, intraclass correlation coefficient; IPWBW, index of psychological wellbeing at work; IRT, item response theory; IT, Information Technology; ITES, Information Technology Enabled Services; MIC, minimally important change; NE, not evaluated; NR, not reported; PGDS, Personal Growth and Development Scale; PRISMA, preferred reporting items for systematic review and meta-analysis; PRISMA-P, preferred reporting items for systematic review and meta-analysis protocols; PROMs, patient-reported outcome measures; PVM, proactive vitality management; PWBW, psychological wellbeing at work; SEM, standard error of measurement; SDC, smallest detectable change; TOMH, Tokyo occupational mental health; WB, wellbeing; WRQoL, work-related quality of life; WRWB, work-related wellbeing.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpubh.2023.1053179/full#supplementary-material

Psychological capital, social support, work engagement, and life satisfaction: a longitudinal study in COVID-19 pandemic

Published: 05 April 2024

  • Ludmila Dudasova (ORCID: orcid.org/0000-0003-0326-6982)
  • Jakub Prochazka (ORCID: orcid.org/0000-0002-6386-1401)
  • Martin Vaculik (ORCID: orcid.org/0000-0001-8901-5855)


Psychological capital (PsyCap) has gained prominence as an important resource for positive work attitudes, behaviors, and organizational outcomes. This pre-registered study aims to broaden existing understanding of the relationship between PsyCap and positive attitudes and behaviors using longitudinal evidence. A sample of 202 teachers ( M  = 45.33 years, SD  = 10.76) completed a set of online questionnaires in two measurement waves, two years apart. Using structural equation modelling with a pre-registered syntax, we found support for PsyCap as a mediator of the effects of perceived social support on changes in work engagement and life satisfaction within the two-year period. Perceived social support predicted the level of PsyCap measured two years later. A higher level of PsyCap was positively associated with changes in work engagement and life satisfaction between the two measurement waves. As the first data collection took place in the spring of 2019 and the second in the spring of 2021, the results also highlight the role of social support and PsyCap in dealing with demands related to the COVID-19 pandemic.
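
For readers who want to see the shape of the analysis the abstract describes, here is a minimal two-wave mediation sketch in Python with semopy. It is emphatically not the authors' pre-registered syntax; the variable names, data file, and simplified structure (observed scores rather than latent variables) are assumptions for illustration only.

```python
# Sketch of a two-wave mediation model: social support (T1) -> PsyCap (T2)
# -> work engagement / life satisfaction (T2), controlling for T1 outcome
# levels. Variable names and the data file are hypothetical; this simplified
# version uses observed scale scores rather than a full latent-variable model.
import pandas as pd
from semopy import Model

desc = """
psycap_t2 ~ support_t1
engagement_t2 ~ psycap_t2 + engagement_t1
satisfaction_t2 ~ psycap_t2 + satisfaction_t1
"""

df = pd.read_csv("teacher_panel.csv")  # one row per teacher, both waves merged
model = Model(desc)
model.fit(df)
print(model.inspect())  # indirect effects are products of the relevant paths
```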


Data availability

The data that support the findings of this study are available from the OSF repository: https://osf.io/xc9r2?view_only=f1be973ed38c479eaf31ea3b9d07903d


This study was supported by the project When Close Relationships Matter: A Longitudinal Study of Psychological Capital Development (GA20-03810S) of the Czech Science Foundation, GACR (full name of the funder: Grantová agentura České republiky; funder website: https://gacr.cz/ ). The funder played no role in the study design, data collection, analysis, or any other aspect of the study.

Author information

Authors and affiliations

Department of Psychology, Faculty of Social Studies, Masaryk University, Jostova 10, Brno, 602 00, Czech Republic

Ludmila Dudasova

Department of Business Management, Faculty of Economics and Administration, Masaryk University, Lipova 41a, Brno, 602 00, Czech Republic

Jakub Prochazka

Interdisciplinary Research Team on Internet and Society, Faculty of Social Studies, Masaryk University, Jostova 10, 602 00, Brno, Czech Republic

Martin Vaculik


Corresponding author

Correspondence to Ludmila Dudasova.

Ethics declarations

The research was conducted in accordance with the ethical standards of Masaryk University.

Informed consent

Informed consent was obtained from all participants included in the study.

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Dudasova, L., Prochazka, J. & Vaculik, M. Psychological capital, social support, work engagement, and life satisfaction: a longitudinal study in COVID-19 pandemic. Curr Psychol (2024). https://doi.org/10.1007/s12144-024-05841-9

Download citation

Accepted : 05 March 2024

Published : 05 April 2024

DOI : https://doi.org/10.1007/s12144-024-05841-9

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

  • Psychological capital
  • Work engagement
  • Life satisfaction
  • COVID-19 pandemic
  • Find a journal
  • Publish with us
  • Track your research

Library homepage

  • school Campus Bookshelves
  • menu_book Bookshelves
  • perm_media Learning Objects
  • login Login
  • how_to_reg Request Instructor Account
  • hub Instructor Commons
  • Download Page (PDF)
  • Download Full Book (PDF)
  • Periodic Table
  • Physics Constants
  • Scientific Calculator
  • Reference & Cite
  • Tools expand_more
  • Readability

selected template will load here

This action is not available.

Social Sci LibreTexts

11.2 Operational definitions


Learning Objectives

Learners will be able to…

  • Define and give an example of indicators and attributes for a variable
  • Apply the three components of an operational definition to a variable
  • Distinguish between levels of measurement for a variable and how those differences relate to measurement
  • Describe the purpose of composite measures like scales and indices

Conceptual definitions are like dictionary definitions. They tell you what a concept means by defining it using other concepts. In this section we will move from the abstract realm (theory) to the real world (measurement).  Operationalization is the process by which researchers spell out precisely how a concept will be measured in their study. It involves identifying the specific research procedures we will use to gather data about our concepts. If conceptually defining your terms means looking at theory, how do you operationally define your terms? By looking for indicators of when your variable is present or not, more or less intense, and so forth. Operationalization is probably the most challenging part of quantitative research, but once it’s done, the design and implementation of your study will be straightforward.


Operationalization works by identifying specific  indicators that will be taken to represent the ideas we are interested in studying. If we are interested in studying masculinity, then the indicators for that concept might include some of the social roles prescribed to men in society such as breadwinning or fatherhood. Being a breadwinner or a father might therefore be considered  indicators  of a person’s masculinity. The extent to which a man fulfills either, or both, of these roles might be understood as clues (or indicators) about the extent to which he is viewed as masculine.

Let’s look at another example of indicators. Each day, Gallup researchers poll 1,000 randomly selected Americans to ask them about their well-being. To measure well-being, Gallup asks these people to respond to questions covering six broad areas: physical health, emotional health, work environment, life evaluation, healthy behaviors, and access to basic necessities. Gallup uses these six factors as indicators of the concept that they are really interested in, which is  well-being .

Identifying indicators can be even simpler than the examples described thus far. Political party affiliation is another relatively easy concept for which to identify indicators. If you asked a person what party they voted for in the last national election (or gained access to their voting records), you would get a good indication of their party affiliation. Of course, some voters split tickets between multiple parties when they vote and others swing from party to party each election, so our indicator is not perfect. Indeed, if our study were about political identity as a key concept, operationalizing it solely in terms of who they voted for in the previous election leaves out a lot of information about identity that is relevant to that concept. Nevertheless, it’s a pretty good indicator of political party affiliation.

Choosing indicators is not an arbitrary process. As described earlier, utilizing prior theoretical and empirical work in your area of interest is a great way to identify indicators in a scholarly manner. And your conceptual definitions will point you in the direction of relevant indicators. Empirical work will give you some very specific examples of how the important concepts in an area have been measured in the past and what sorts of indicators have been used. Often, it makes sense to use the same indicators as previous researchers; however, you may find that some previous measures have potential weaknesses that your own study will improve upon.

All of the examples in this chapter have dealt with questions you might ask a research participant on a survey or in a quantitative interview. If you plan to collect data from other sources, such as through direct observation or the analysis of available records, think practically about what the design of your study might look like and how you can collect data on various indicators feasibly. If your study asks about whether the participant regularly changes the oil in their car, you will likely not observe them directly doing so. Instead, you will likely need to rely on a survey question that asks them the frequency with which they change their oil or ask to see their car maintenance records.

  • What indicators are commonly used to measure the variables in your research question?
  • How can you feasibly collect data on these indicators?
  • Are you planning to collect your own data using a questionnaire or interview? Or are you planning to analyze available data like client files or raw data shared from another researcher’s project?

Remember, you need  raw data . Your research project cannot rely solely on the results reported by other researchers or the arguments you read in the literature. A literature review is only the first part of a research project, and your review of the literature should inform the indicators you end up choosing when you measure the variables in your research question.

Unlike conceptual definitions, which are made up of other concepts, an operational definition consists of three components: (1) the variable being measured and its attributes, (2) the measure you will use, and (3) how you plan to interpret the data collected from that measure to draw conclusions about the variable you are measuring.

Step 1: Specifying variables and attributes

The first component, the variable, should be the easiest part. At this point in quantitative research, you should have a research question that has at least one independent and at least one dependent variable. Remember that variables must be able to vary. For example, the United States is not a variable. Country of residence is a variable, as is patriotism. Similarly, if your sample only includes men, gender is a constant in your study, not a variable. A  constant is a characteristic that does not change in your study.

When social scientists measure concepts, they sometimes use the language of variables and attributes. A  variable refers to a quality or quantity that varies across people or situations.  Attributes are the characteristics that make up a variable. For example, the variable hair color would contain attributes like blonde, brown, black, red, gray, etc. A variable’s attributes determine its level of measurement. There are four possible levels of measurement: nominal, ordinal, interval, and ratio. The first two levels of measurement are  categorical , meaning their attributes are categories rather than numbers. The latter two levels of measurement are  continuous , meaning their attributes are numbers.

I exist to frustrate researchers’ categorizations.

Levels of measurement

Hair color is an example of a nominal level of measurement.  Nominal measures are categorical, and those categories cannot be mathematically ranked. As a brown-haired person (with some gray), I can’t say for sure that brown-haired people are better than blonde-haired people. As with all nominal levels of measurement, there is no ranking order between hair colors; they are simply different. That is what constitutes a nominal level of measurement. Gender and race are also measured at the nominal level.

What attributes are contained in the variable  hair color ? While blonde, brown, black, and red are common colors, some people may not fit into these categories if we only list these attributes. My wife, who currently has purple hair, wouldn’t fit anywhere. This means that our attributes were not exhaustive.  Exhaustiveness means that all possible attributes are listed. We may have to list a lot of colors before we can meet the criteria of exhaustiveness. Clearly, there is a point at which exhaustiveness has been reasonably met. If a person insists that their hair color is  light burnt sienna , it is not your responsibility to list that as an option. Rather, that person would reasonably be described as brown-haired. Perhaps listing a category for  other color  would suffice to make our list of colors exhaustive.

What about a person who has multiple hair colors at the same time, such as red and black? They would fall into multiple attributes. This violates the rule of  mutual exclusivity , in which a person cannot fall into two different attributes. Instead of listing all of the possible combinations of colors, perhaps you might include a  multi-color  attribute to describe people with more than one hair color.
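To make these two criteria concrete, here is a minimal Python sketch (not part of the original text; the attribute list and responses are invented for illustration) that checks whether a list of attributes is exhaustive for the responses you actually collect:

```python
# Hypothetical attribute list for the variable "hair color". The 'multi-color'
# and 'other' options are what make the list exhaustive while keeping the
# attributes mutually exclusive.
ATTRIBUTES = {"blonde", "brown", "black", "red", "gray", "multi-color", "other"}

# Invented survey responses.
responses = ["brown", "purple", "red and black", "blonde"]

# A response that matches no attribute signals a gap in exhaustiveness.
# A response like "red and black" should be recoded as 'multi-color' rather
# than counted under two attributes at once, which would violate mutual
# exclusivity.
unmatched = [r for r in responses if r not in ATTRIBUTES]
print(unmatched)  # ['purple', 'red and black']
```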

Making sure researchers provide mutually exclusive and exhaustive attributes is about making sure all people are represented in the data record. For many years, the attributes for gender were only male or female. Now, our understanding of gender has evolved to encompass more attributes that better reflect the diversity in the world. Children of parents from different races were often classified as one race or another, even if they identified with both cultures. The option for bi-racial or multi-racial on a survey not only more accurately reflects the racial diversity in the real world but validates and acknowledges people who identify in that manner. If we did not measure race in this way, we would leave empty the data record for people who identify as biracial or multiracial, impairing our search for truth.

Unlike nominal-level measures, attributes at the ordinal  level can be rank ordered. For example, someone’s degree of satisfaction in their romantic relationship can be ordered by rank. That is, you could say you are not at all satisfied, a little satisfied, moderately satisfied, or highly satisfied. Note that even though these have a rank order to them (not at all satisfied is certainly worse than highly satisfied), we cannot calculate a mathematical distance between those attributes. We can simply say that one attribute of an ordinal-level variable is more or less than another attribute.

This can get a little confusing when using rating scales. If you have ever taken a customer satisfaction survey or completed a course evaluation for school, you are familiar with rating scales. “On a scale of 1-5, with 1 being the lowest and 5 being the highest, how likely are you to recommend our company to other people?” That surely sounds familiar. Rating scales use numbers, but only as a shorthand, to indicate what attribute (highly likely, somewhat likely, etc.) the person feels describes them best. You wouldn’t say you are “2” likely to recommend the company, but you would say you are not very likely to recommend the company. Ordinal-level attributes must also be exhaustive and mutually exclusive, as with nominal-level variables.

At the interval level, attributes must also be exhaustive and mutually exclusive and there is equal distance between attributes. Interval measures are also continuous, meaning their attributes are numbers, rather than categories. IQ scores are interval level, as are temperatures in Fahrenheit and Celsius. Their defining characteristic is that we can say how much more or less one attribute differs from another. We cannot, however, say with certainty what the ratio of one attribute is in comparison to another. For example, it would not make sense to say that a person with an IQ score of 140 has twice the IQ of a person with a score of 70. However, the difference between IQ scores of 80 and 100 is the same as the difference between IQ scores of 120 and 140.

While we cannot say that someone with an IQ of 140 is twice as intelligent as someone with an IQ of 70 because IQ is measured at the interval level, we can say that someone with six siblings has twice as many as someone with three because number of siblings is measured at the ratio level. Finally, at the  ratio level, attributes are mutually exclusive and exhaustive, attributes can be rank ordered, the distance between attributes is equal, and attributes have a true zero point. Thus, with these variables, we  can  say what the ratio of one attribute is in comparison to another. Examples of ratio-level variables include age and years of education. We know that a person who is 12 years old is twice as old as someone who is 6 years old. Height measured in meters and weight measured in kilograms are good examples. So are counts of discrete objects or events such as the number of siblings one has or the number of questions a student answers correctly on an exam. The differences between each level of measurement are visualized in Table 11.1.

Table 11.1 Criteria for Different Levels of Measurement

| Level of measurement | Exhaustive and mutually exclusive attributes | Rank-ordered attributes | Equal distance between attributes | True zero point |
|---|---|---|---|---|
| Nominal | Yes | No | No | No |
| Ordinal | Yes | Yes | No | No |
| Interval | Yes | Yes | Yes | No |
| Ratio | Yes | Yes | Yes | Yes |

Levels of measurement = levels of specificity

We have spent time learning how to determine our data’s level of measurement. Now what? How can we use this information to help us as we measure concepts and develop measurement tools? First, the types of statistical tests that we are able to use depend on our data’s level of measurement. With nominal-level measurement, for example, the only available measure of central tendency is the mode. With ordinal-level measurement, the median or mode can be used as indicators of central tendency. Interval and ratio-level measurement are typically considered the most desirable because they permit any measure of central tendency (i.e., mean, median, or mode) to be computed. Also, ratio-level measurement is the only level that allows meaningful statements about ratios of scores. The higher the level of measurement, the more complex statistical tests we are able to conduct. This knowledge may help us decide what kind of data we need to gather, and how.
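As a rough illustration, the sketch below (invented data, using Python’s standard statistics module) shows which measure of central tendency each level of measurement supports:

```python
import statistics

# Invented responses at three levels of measurement.
hair_color = ["brown", "blonde", "brown", "black", "red"]    # nominal
satisfaction = [1, 3, 3, 4, 2]    # ordinal codes: 1=not at all ... 4=highly satisfied
age_years = [19, 22, 25, 31, 40]  # ratio

# Nominal data supports only the mode.
print(statistics.mode(hair_color))      # brown

# Ordinal data supports the mode and the median, but not a meaningful mean.
print(statistics.median(satisfaction))  # 3

# Interval/ratio data supports the mode, median, and mean.
print(statistics.mean(age_years))       # 27.4
```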

That said, we have to balance this knowledge with the understanding that sometimes, collecting data at a higher level of measurement could negatively impact our studies. For instance, sometimes providing answers in ranges may make prospective participants feel more comfortable responding to sensitive items. Imagine that you were interested in collecting information on topics such as income, number of sexual partners, number of times someone used illicit drugs, etc. You would have to think about the sensitivity of these items and determine whether it would make more sense to collect some data at a lower level of measurement (e.g., asking whether someone is sexually active (nominal) versus asking for their total number of sexual partners (ratio)).

Finally, sometimes when analyzing data, researchers find a need to change a variable’s level of measurement. For example, a few years ago, one of my students was interested in studying the relationship between mental health and life satisfaction. This student used a variety of measures. One item asked about the number of mental health symptoms, reported as the actual number. When analyzing the data, my student examined the mental health symptom variable and noticed that she had two groups: those with zero or one symptoms and those with many symptoms. Instead of using the ratio-level data (actual number of mental health symptoms), she collapsed her cases into two categories, few and many, and used this variable in her analyses. It is important to note that you can always collapse data from a higher level of measurement to a lower one; however, you cannot move data from a lower level to a higher one.
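Here is a small sketch of the kind of recoding described above, using invented symptom counts; it also shows why the move only works in one direction, since the original counts cannot be recovered from the labels:

```python
# Invented ratio-level data: number of mental health symptoms reported.
symptom_counts = [0, 1, 1, 7, 9, 0, 8, 12]

# Collapse to a lower level of measurement: "few" (0-1 symptoms) vs. "many" (2+).
# Moving down the levels is always possible; recovering the exact counts
# from the labels afterward is not.
symptom_groups = ["few" if n <= 1 else "many" for n in symptom_counts]
print(symptom_groups)  # ['few', 'few', 'few', 'many', 'many', 'few', 'many', 'many']
```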

  • Check that the variables in your research question can vary…and that they are not constants or one of many potential attributes of a variable.
  • Think about the attributes your variables have. Are they categorical or continuous? What level of measurement seems most appropriate?


Step 2: Specifying measures for each variable

Let’s pick a social work research question and walk through the process of operationalizing variables to see how specific we need to get. I’m going to hypothesize that residents of a psychiatric unit who are more depressed are less likely to be satisfied with care. Remember, this would be an inverse relationship: as depression increases, satisfaction decreases. In this question, depression is my independent variable (the cause) and satisfaction with care is my dependent variable (the effect). Now that we have identified our variables, their attributes, and levels of measurement, we move on to the second component: the measure itself.

So, how would you measure my key variables: depression and satisfaction? What indicators would you look for? Some students might say that depression could be measured by observing a participant’s body language. They may also say that a depressed person will often express feelings of sadness or hopelessness. In addition, a satisfied person might be happy around service providers and often express gratitude. While these factors may indicate that the variables are present, they lack coherence. Unfortunately, what this “measure” is actually saying is “I know depression and satisfaction when I see them.” While you are likely a decent judge of depression and satisfaction, in a research study you need to provide more information about how you plan to measure your variables. Your judgments are subjective, based on your own idiosyncratic experiences with depression and satisfaction. They couldn’t be replicated by another researcher, and they can’t be applied consistently to a large group of people. Operationalization requires that you come up with a specific and rigorous measure for determining who is depressed or satisfied.

Finding a good measure for your variable depends on the kind of variable it is. Variables that are directly observable don’t come up very often in my students’ classroom projects, but they might include things like taking someone’s blood pressure, marking attendance or participation in a group, and so forth. To measure an indirectly observable variable like age, you would probably put a question on a survey that asked, “How old are you?” Measuring a variable like income might require some more thought, though. Are you interested in this person’s individual income or the income of their family unit? This might matter if your participant does not work or is dependent on other family members for income. Do you count income from social welfare programs? Are you interested in their income per month or per year? Even though indirect observables are relatively easy to measure, the measures you use must be clear in what they are asking, and operationalization is all about figuring out the specifics of what you want to know. For more complicated constructs, you will need compound measures (that use multiple indicators to measure a single variable).

How you plan to collect your data also influences how you will measure your variables. For social work researchers using secondary data like client records as a data source, you are limited by what information is in the data sources you can access. If your organization uses a given measurement for a mental health outcome, that is the one you will use in your study. Similarly, if you plan to study how long a client was housed after an intervention using client visit records, you are limited by how their caseworker recorded their housing status in the chart. One of the benefits of collecting your own data is being able to select the measures you feel best exemplify your understanding of the topic.

Measuring unidimensional concepts

The previous section mentioned two important considerations: how complicated the variable is and how you plan to collect your data. With these in hand, we can use the level of measurement to further specify how you will measure your variables and consider specialized rating scales developed by social science researchers.

Measurement at each level

Nominal measures assess categorical variables. These measures are used for variables or indicators that have mutually exclusive attributes, but that cannot be rank-ordered. Nominal measures ask about the variable and provide names or labels for different attribute values (e.g., social work, counseling, and nursing for the variable profession). Nominal measures are relatively straightforward.

Ordinal measures often use a rating scale: an ordered set of responses that participants must choose from. Figure 11.1 shows several examples. The number of response options on a typical rating scale is usually five or seven, though it can range from three to eleven. Five-point scales are best for unipolar scales where only one construct is tested, such as frequency (Never, Rarely, Sometimes, Often, Always). Seven-point scales are best for bipolar scales where there is a dichotomous spectrum, such as liking (Like very much, Like somewhat, Like slightly, Neither like nor dislike, Dislike slightly, Dislike somewhat, Dislike very much). For bipolar questions, it is useful to offer an earlier question that branches respondents into an area of the scale; if asking about liking ice cream, first ask “Do you generally like or dislike ice cream?” Once the respondent chooses like or dislike, refine it by offering them relevant choices from the seven-point scale. Branching improves both reliability and validity (Krosnick & Berent, 1993). [9]  Although you often see scales with numerical labels, it is best to present only verbal labels to the respondents and convert them to numerical values in the analyses. Avoid partial labels and overly long or specific labels. In some cases, the verbal labels can be supplemented with (or even replaced by) meaningful graphics. The last rating scale shown in Figure 11.1 is a visual-analog scale, on which participants make a mark somewhere along the horizontal line to indicate the magnitude of their response.


Figure 11.1 Example rating scales for closed-ended questionnaire items
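To illustrate the advice about verbal labels, here is a small sketch (the label set and the 1-5 coding are an illustrative convention, not from the original text) of attaching numeric codes only at the analysis stage:

```python
# Respondents see only the verbal labels; numeric codes are attached during
# analysis. The 1-5 coding here is an illustrative convention, not a standard.
FREQUENCY_CODES = {
    "Never": 1,
    "Rarely": 2,
    "Sometimes": 3,
    "Often": 4,
    "Always": 5,
}

responses = ["Sometimes", "Never", "Often", "Sometimes"]
coded = [FREQUENCY_CODES[r] for r in responses]
print(coded)  # [3, 1, 4, 3]
```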

Interval measures are those where the values measured are not only rank-ordered, but are also equidistant from adjacent attributes. For example, on the temperature scale (in Fahrenheit or Celsius), the difference between 30 and 40 degrees Fahrenheit is the same as that between 80 and 90 degrees Fahrenheit. Likewise, if you have a scale that asks respondents’ annual income using the following attributes (ranges): $0 to 10,000, $10,000 to 20,000, $20,000 to 30,000, and so forth, this is also an interval measure, because the mid-points of each range (i.e., $5,000, $15,000, $25,000, etc.) are equidistant from each other. The intelligence quotient (IQ) scale is also an interval measure, because the measure is designed such that the difference between IQ scores of 100 and 110 is supposed to be the same as between 110 and 120 (although we do not really know whether that is truly the case). Interval measures allow us to examine “how much more” one attribute is when compared to another, which is not possible with nominal or ordinal measures. You may find researchers who “pretend” (incorrectly) that ordinal rating scales are actually interval measures so that they can use different statistical techniques for analyzing them. As we will discuss in the latter part of the chapter, this is a mistake because there is no way to know whether the difference between a 3 and a 4 on a rating scale is the same as the difference between a 2 and a 3. Those numbers are just placeholders for categories.

Ratio measures are those that have all the qualities of nominal, ordinal, and interval scales, and in addition, have a “true zero” point (where the value zero implies a lack or non-availability of the underlying construct). Think about how to measure the number of people working in human resources at a social work agency. It could be one, several, or none (if the agency contracts out for those services). Measuring interval and ratio data is relatively easy, as people either select or input a number for their answer. If you ask a person how many eggs they purchased last week, they can simply tell you they purchased a dozen eggs at the store, two at breakfast on Wednesday, or none at all.

Commonly used rating scales in questionnaires

The level of measurement will give you the basic information you need, but social scientists have also developed specialized instruments for use in questionnaires, a common tool in quantitative research. As we mentioned before, if you plan to source your data from client files or previously published results, you are limited to the measures used in those sources.

Although Likert scale  is a term colloquially used to refer to almost any rating scale (e.g., a 0-to-10 life satisfaction scale), it has a much more precise meaning. In the 1930s, researcher Rensis Likert (pronounced LICK-ert) created a new approach for measuring people’s attitudes (Likert, 1932). [10]  It involves presenting people with several statements—including both favorable and unfavorable statements—about some person, group, or idea. Respondents then express their agreement or disagreement with each statement on a 5-point scale:  Strongly Agree, Agree, Neither Agree nor Disagree, Disagree, Strongly Disagree . Numbers are assigned to each response and then summed across all items to produce a score representing the attitude toward the person, group, or idea. For items that are phrased in an opposite direction (e.g., negatively worded statements instead of positively worded statements), reverse coding is used so that the numerical scoring of statements also runs in the opposite direction. The entire set of items came to be called a Likert scale, as indicated in Table 11.2 below.

Unless you are measuring people’s attitude toward something by assessing their level of agreement with several statements about it, it is best to avoid calling it a Likert scale. You are probably just using a rating scale. Likert scales allow for more granularity (more finely tuned response) than yes/no items, including whether respondents are neutral to the statement. Below is an example of how we might use a Likert scale to assess your attitudes about research as you work your way through this textbook.

Table 11.2 Likert scale
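A minimal sketch of Likert scoring, assuming an invented four-item scale with two negatively worded items, might look like this:

```python
# Invented responses to a four-item, 5-point Likert scale
# (1 = Strongly Disagree ... 5 = Strongly Agree). Items 2 and 4 are
# negatively worded, so they are reverse-coded before summing.
responses = {"item1": 4, "item2": 2, "item3": 5, "item4": 1}
NEGATIVELY_WORDED = {"item2", "item4"}

def score_likert(responses, reverse_items, scale_max=5):
    """Sum item responses, reverse-coding negatively worded items."""
    total = 0
    for item, value in responses.items():
        if item in reverse_items:
            value = (scale_max + 1) - value  # maps 1<->5, 2<->4; 3 stays 3
        total += value
    return total

print(score_likert(responses, NEGATIVELY_WORDED))  # 4 + 4 + 5 + 5 = 18
```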

Semantic differential scales are composite (multi-item) scales in which respondents are asked to indicate their opinions or feelings toward a single statement using different pairs of adjectives framed as polar opposites. Whereas in the above Likert scale, the participant is asked how much they  agree or disagree  with a statement, in a semantic differential scale the participant is asked to indicate how they  feel  about a specific item. This makes the semantic differential scale an excellent technique for measuring people’s attitudes or feelings toward objects, events, or behaviors. Table 11.3 is an example of a semantic differential scale that was created to assess participants’ feelings about this textbook. 

Table 11.3. A semantic differential scale for measuring attitudes towards a textbook

Notice that on a Likert scale, each item is different but the response choices are the same (e.g., strongly agree, agree, etc.). On a semantic differential scale, it is the reverse: the thing you are rating (in this case, beliefs about research content) remains the same, while the choices change from item to item.

A Guttman scale, a composite scale designed by Louis Guttman, uses a series of items arranged in increasing order of intensity (least intense to most intense) of the concept. This type of scale allows us to understand the intensity of beliefs or feelings. Each item in a Guttman scale has a weight (not indicated on the tool itself) that varies with the intensity of that item, and the weighted combination of each response is used as an aggregate measure of an observation.

The items move from lower intensity to higher intensity, and a researcher reviews the “yes” answers to create a score for each participant.

For more complicated measures, researchers use scales and indices (sometimes called indexes) to measure their variables because they assess multiple indicators to develop a composite (or total) score. Composite scores provide a much greater understanding of concepts than a single item could. Although we won’t delve too deeply into the process of scale development, we will cover some important topics for you to understand how scales and indices developed by other researchers can be used in your project.

Although scales and indices exhibit differences (discussed below), they have several features in common:

  • Both are ordinal measures of variables.
  • Both can order the units of analysis in terms of specific variables.
  • Both are composite measures .


The previous section discussed how to measure respondents’ responses to predesigned items or indicators belonging to an underlying construct. But how do we create the indicators themselves? The process of creating the indicators is called scaling. More formally, scaling is a branch of measurement that involves the construction of measures by associating qualitative judgments about unobservable constructs with quantitative, measurable metric units. As Stevens (1946) said, “Scaling is the assignment of objects to numbers according to a rule.” [11] This process of measuring abstract concepts in concrete terms remains one of the most difficult tasks in empirical social science research.

The outcome of a scaling process is a scale , which is an empirical structure for measuring items or indicators of a given construct. Understand that multidimensional “scales”, as discussed in this section, are a little different from “rating scales” discussed in the previous section. A rating scale is used to capture the respondents’ reactions to a given item on a questionnaire. For example, an ordinally scaled item captures a value between “strongly disagree” to “strongly agree.” Attaching a rating scale to a statement or instrument is not scaling. Rather, scaling is the formal process of developing scale items, before rating scales can be attached to those items.

If creating your own scale sounds painful, don’t worry! For most multidimensional variables, you would likely be duplicating work that has already been done by other researchers in a branch of science called psychometrics. You do not need to create a scale for depression because scales such as the Patient Health Questionnaire (PHQ-9), the Center for Epidemiologic Studies Depression Scale (CES-D), and Beck’s Depression Inventory (BDI) have been developed and refined over dozens of years to measure variables like depression. Similarly, scales such as the Patient Satisfaction Questionnaire (PSQ-18) have been developed to measure satisfaction with medical care. As we will discuss in the next section, these scales have been shown to be reliable and valid. While you could create a new scale to measure depression or satisfaction, a rigorous study would pilot test and refine that new scale over time to make sure it measures the concept accurately and consistently. This high level of rigor is often unachievable in student research projects because of the cost and time involved in pilot testing and validation, so using existing scales is recommended.

Unfortunately, there is no good one-stop shop for psychometric scales. The Mental Measurements Yearbook provides a searchable database of measures for social science variables, though it is woefully incomplete and often does not contain the full documentation for scales in its database. You can access it from a university library’s list of databases. If you can’t find anything there, your next stop should be the methods section of the articles in your literature review. The methods section of each article will detail how the researchers measured their variables, and often the results section is instructive for understanding more about measures. In a quantitative study, researchers may have used a scale to measure key variables and will provide a brief description of that scale, its name, and maybe a few example questions. If you need more information, look at the results section and tables discussing the scale to get a better idea of how the measure works. Looking beyond the articles in your literature review, searching Google Scholar using queries like “depression scale” or “satisfaction scale” should also provide some relevant results. For example, searching for documentation for the Rosenberg Self-Esteem Scale (which we will discuss in the next section), I found this  report from researchers investigating acceptance and commitment therapy  which details this scale and many others used to assess mental health outcomes. If you find the name of a scale somewhere but cannot find the documentation (all questions and answers plus how to interpret the scale), a general web search with the name of the scale and “.pdf” may bring you to what you need. Or, to get professional help with finding information, always ask a librarian!

Unfortunately, these approaches do not guarantee that you will be able to view the scale itself or get information on how it is interpreted. Many scales cost money to use and may require training to properly administer. You may also find scales that are related to your variable but would need to be slightly modified to match your study’s needs. You could adapt a scale to fit your study; however, changing even small parts of a scale can influence its accuracy and consistency. While it is perfectly acceptable in student projects to adapt a scale without testing it first (time may not allow you to do so), pilot testing is always recommended for adapted scales, and researchers seeking to draw valid conclusions and publish their results must take this additional step.

An  index is a composite score derived from aggregating measures of multiple concepts (called components) using a set of rules and formulas. It is different from a scale. Scales also aggregate measures; however, these measures examine different dimensions  or  the same dimension of a single construct. A well-known example of an index is the  consumer price index  (CPI), which is computed every month by the Bureau of Labor Statistics of the U.S. Department of Labor. The CPI is a measure of how much consumers have to pay for goods and services (in general) and is divided into eight major categories (food and beverages, housing, apparel, transportation, healthcare, recreation, education and communication, and “other goods and services”), which are further subdivided into more than 200 smaller items. Each month, government employees call all over the country to get the current prices of more than 80,000 items. Using a complicated weighting scheme that takes into account the location and probability of purchase for each item, analysts then combine these prices into an overall index score using a series of formulas and rules.
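A toy version of this weighting logic might look like the following sketch; the categories, weights, and prices are invented for illustration and are far simpler than the BLS’s actual methodology:

```python
# A toy price index: category prices are combined using weights that reflect
# each category's share of a typical budget. The categories, weights, and
# prices below are invented; the real CPI methodology is far more elaborate.
weights = {"food": 0.15, "housing": 0.40, "transportation": 0.20, "other": 0.25}
prices = {"food": 104.0, "housing": 110.0, "transportation": 98.0, "other": 101.0}

# Weighted combination of the component measures into one index score.
index_score = sum(weights[c] * prices[c] for c in weights)
print(round(index_score, 2))  # 104.45
```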

Another example of an index is the  Duncan Socioeconomic Index  (SEI). This index is used to quantify a person’s socioeconomic status (SES) and is a combination of three concepts: income, education, and occupation. Income is measured in dollars, education in years or degrees achieved, and occupation is classified into categories or levels by status. These very different measures are combined to create an overall SES index score. However, SES index measurement has generated a lot of controversy and disagreement among researchers.

The process of creating an index is similar to that of a scale. First, conceptualize (define) the index and its constituent components. Though this appears simple, there may be a lot of disagreement on what components (concepts/constructs) should be included in or excluded from an index. For instance, in the SES index, isn’t income correlated with education and occupation? And if so, should we include one component only or all three components? Reviewing the literature, using theories, and/or interviewing experts or key stakeholders may help resolve this issue. Second, operationalize and measure each component. For instance, how will you categorize occupations, particularly since some occupations may have changed with time (e.g., there were no Web developers before the Internet)? Third, create a rule or formula for calculating the index score. Again, this process may involve a lot of subjectivity, so validating the index score using existing or new data is important.

Scale and index development are often taught in their own courses in doctoral education, so it is unreasonable to expect yourself to develop a consistently accurate measure within the span of a week or two. Using available indices and scales is recommended for this reason.

Differences between scales and indices

Though indices and scales yield a single numerical score or value representing a concept of interest, they are different in many ways. First, indices often comprise components that are very different from each other (e.g., income, education, and occupation in the SES index) and are measured in different ways. Conversely, scales typically involve a set of similar items that use the same rating scale (such as a five-point Likert scale about customer satisfaction).

Second, indices often combine objectively measurable values such as prices or income, while scales are designed to assess subjective or judgmental constructs such as attitude, prejudice, or self-esteem. Some argue that the sophistication of the scaling methodology makes scales different from indexes, while others suggest that indexing methodology can be equally sophisticated. Nevertheless, indexes and scales are both essential tools in social science research.

Scales and indices seem like clean, convenient ways to measure different phenomena in social science, but just like with a lot of research, we have to be mindful of the assumptions and biases underneath. What if a scale or an index was developed using only White women as research participants? Is it going to be useful for other groups? It very well might be, but when using a scale or index on a group for whom it hasn’t been tested, it will be very important to evaluate the validity and reliability of the instrument, which we address in the rest of the chapter.

Finally, it’s important to note that while scales and indices are often made up of nominal- or ordinal-level items, when we combine those items into composite scores, we typically analyze the composite scores as interval/ratio variables.

  • Look back to your work from the previous section: are your variables unidimensional or multidimensional?
  • Describe the specific measures you will use (actual questions and response options you will use with participants) for each variable in your research question.
  • If you are using a measure developed by another researcher but do not have all of the questions, response options, and instructions needed to implement it, put it on your to-do list to get them.


If we were operationalizing blood pressure, the cuff and reader would be the measure…but how do we interpret what is high, low, and normal blood pressure?

Step 3: How you will interpret your measures

The final stage of operationalization involves setting the rules for how the measure works and how the researcher should interpret the results. Sometimes, interpreting a measure can be incredibly easy. If you ask someone their age, you’ll probably interpret the results by noting the raw number (e.g., 22) someone provides and whether it is lower or higher than other people’s ages. However, you could also recode that person into age categories (e.g., under 25, 20–29 years old, Generation Z, etc.). Even scales may be simple to interpret. If there is a scale of problem behaviors, one might simply add up the number of behaviors checked off, with a range of 1–5 indicating low risk of delinquent behavior, 6–10 indicating moderate risk, and so on. How you choose to interpret your measures should be guided by how they were designed, how you conceptualize your variables, the data sources you used, and your plan for analyzing your data statistically. Whatever measure you use, you need a set of rules for how to take any valid answer a respondent provides and interpret it in terms of the variable being measured.

For more complicated measures like scales, refer to the information provided by the author for how to interpret the scale. If you can’t find enough information from the scale’s creator, look at how the results of that scale are reported in the results section of research articles. For example, Beck’s Depression Inventory (BDI-II) uses 21 statements to measure depression and respondents rate their level of agreement on a scale of 0-3. The results for each question are added up, and the respondent is put into one of three categories: low levels of depression (1-16), moderate levels of depression (17-30), or severe levels of depression (31 and over).
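As a sketch, interpreting a summed score against those bands could look like this (the item ratings are invented; the cut-offs follow the description above):

```python
# Invented ratings for 21 items, each scored 0-3, summed to a total score.
item_ratings = [1, 0, 2, 1, 3, 0, 1, 2, 0, 1, 1, 0, 2, 1, 0, 1, 2, 0, 1, 0, 1]
assert len(item_ratings) == 21

total = sum(item_ratings)  # 20

# Interpretation rule, following the category bands described above.
if total <= 16:
    category = "low levels of depression"
elif total <= 30:
    category = "moderate levels of depression"
else:
    category = "severe levels of depression"

print(total, category)  # 20 moderate levels of depression
```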

One mistake I see often is that students introduce another variable into their operational definition. This is incorrect. Your operational definition should mention only one variable: the variable being defined. While your study will certainly draw conclusions about the relationships between variables, that’s not what operationalization is. Operationalization specifies what instrument you will use to measure your variable and how you plan to interpret the data collected using that measure.

Operationalization is probably the trickiest component of basic research methods, so please don’t get frustrated if it takes a few drafts and a lot of feedback to get to a workable definition. At the time of this writing, I am in the process of operationalizing the concept of “attitudes towards research methods.” Originally, I thought that I could gauge students’ attitudes toward research methods by looking at their end-of-semester course evaluations. As I became aware of the potential methodological issues with student course evaluations, I opted to use focus groups of students to measure their common beliefs about research. You may recall some of these opinions from  Chapter 1 , such as the common beliefs that research is boring, useless, and too difficult. After the focus group, I created a scale based on the opinions I gathered, and I plan to pilot test it with another group of students. After the pilot test, I expect that I will have to revise the scale again before I can implement the measure in a real social work research project. At the time I’m writing this, I’m still not completely done operationalizing this concept.

Key Takeaways

  • Operationalization involves spelling out precisely how a concept will be measured.
  • Operational definitions must include the variable, the measure, and how you plan to interpret the measure.
  • There are four different levels of measurement: nominal, ordinal, interval, and ratio (in increasing order of specificity).
  • Scales and indices are common ways to collect information and involve using multiple indicators in measurement.
  • A key difference between a scale and an index is that a scale contains multiple indicators for one concept, whereas an index examines multiple concepts (components).
  • Using scales developed and refined by other researchers can improve the rigor of a quantitative study.

Use the research question that you developed in the previous chapters and find a related scale or index that researchers have used. If you have trouble finding the exact phenomenon you want to study, get as close as you can.

  • What is the level of measurement for each item on each tool? Take a second and think about why the tool’s creator decided to include these levels of measurement. Identify any levels of measurement you would change and why.
  • If these tools don’t exist for what you are interested in studying, why do you think that is?
