
Weighted Mean – Definition, Formula, Uses, and FAQs


Introduction to Weighted Mean

Weighted Mean – Definition: The weighted mean is a type of arithmetic mean, calculated by multiplying each value in a data set by a weight, adding up the results, and dividing by the sum of the weights. The weighted mean is often used when the values in a data set are not all of equal importance. The weight can be thought of as a measure of the importance of each value in the data set.

Weight Definition

There is no one-size-fits-all definition of weight, as it can mean different things in different contexts. In physics, weight is the force exerted on an object by gravity, calculated by multiplying the object’s mass by the gravitational acceleration acting on it. In statistics, by contrast, a weight is simply a number that expresses the relative importance of a value in a data set.


What Is the Weighted Mean?

The weighted mean is a calculation that takes into account the relative importance of each number in a set. To calculate it, assign each number a weight reflecting its importance, multiply each number by its weight, add up the products, and divide by the sum of the weights. The result of this calculation is the weighted mean.

Define Weighted Mean

The weighted mean is a mathematical calculation that takes into account the relative importance of each number in a set. The calculation is performed by multiplying each number in the set by its weight and then adding the results. The weighted mean is then obtained by dividing this sum by the sum of the weights.

Weighted Mean Formula

The weighted mean formula is used to calculate the average of a set of numbers where some numbers carry more weight than others. The weighted mean is calculated by multiplying each number in the set by its weight, adding all of the weighted values together, and dividing by the sum of the weights. The weighted mean formula is:

\(\text{Weighted Mean} = \bar{x}_w = \dfrac{\sum_{i=1}^{N} w_i x_i}{\sum_{i=1}^{N} w_i}\)

where:

\(\bar{x}_w\) is the weighted mean

\(N\) is the number of data points

\(w_i\) is the weight of data point \(i\)

\(x_i\) is the value of data point \(i\)
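As a sketch, the formula can be expressed directly in Python (the `weighted_mean` function name and the sample numbers are ours, for illustration):

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of w_i * x_i divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight

# Three scores weighted 1, 2, and 3:
print(weighted_mean([70, 80, 90], [1, 2, 3]))  # (70 + 160 + 270) / 6 = 83.33...
```

Note that dividing by the sum of the weights (not the number of data points) is what makes the result correct even when the weights do not sum to 1.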


Uses of Weighted Means

Weighted means are often used in surveys, where they can adjust for non-response bias. This is done by giving different weights to different respondents, depending on how likely they were to respond. This can give a more accurate estimate of the population mean than would be obtained if all respondents were given the same weight.

Weighted means can also be used to adjust for the clustering of data. This is done by giving different weights to different data points, depending on how close they are to other data points. This can give a more accurate estimate of the population mean than would be obtained if all data points were given the same weight.

Solved Example

Example: Weighted Mean. In a class of 30 students, 20 are male and 10 are female. The average score of the male students on a test is 80 and the average score of the female students is 70. Find the weighted mean score for the class.

Ans. Using the group sizes as weights: (20 × 80 + 10 × 70) / 30 = 2300 / 30 ≈ 76.7. The weighted mean score is 76.7.
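The same calculation in Python, assuming the split of 20 male and 10 female students implied by the stated answer of 76.7 (the variable names are ours):

```python
# Group sizes act as the weights for the group means.
n_male, n_female = 20, 10
mean_male, mean_female = 80, 70
score = (n_male * mean_male + n_female * mean_female) / (n_male + n_female)
print(round(score, 1))  # 76.7
```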

Frequently Asked Questions (FAQs)

What does weighted mean mean?

Weighted mean means giving different values or items varying levels of importance or weight in a calculation.

How do you explain weighted mean in research?

In research, the weighted mean accounts for the significance of individual data points by assigning weights, often used in aggregating survey results.

How do you calculate weighted mean?

To calculate the weighted mean, multiply each value by its corresponding weight, sum these products, and divide by the total weight.

What is the weighted mean difference?

The weighted mean difference is a measure used in statistical analysis, most often in meta-analysis, that pools the differences between group means across studies, weighting each study by its size or precision.


Measures of Central Location

Weighted Mean

Learning Outcomes

Understand the difference between a regular and a weighted mean. Learn how to calculate a weighted mean and how to solve for a missing weight or data value.


“The weighted arithmetic mean is similar to an ordinary  arithmetic mean  (the most common type of  average ), except that instead of each of the data points contributing equally to the final average, some data points contribute more than others.” [1]

This kind of average is called the weighted mean, and is given by the following formula:

\[x_w =\frac{\Sigma x\cdot w}{\Sigma w}\]

Where the x’s are the data you are looking to average and the w’s are the weights assigned to each of those data points.

Illustration

Suppose you have $3,000, and make 3 different investments, some invested at 5%, some at 6%, and some at 7%.

Question: Is your average rate of return equal to [latex]\frac{(5 + 6 + 7)\%}{3} = 6\%[/latex]?

Answer: YES only if you invested equal amounts in each investment, and NO otherwise. In the case where you invest different amounts, a weighted average is required to determine the average rate of return.

Example: Investing Different Amounts in Three Investments

Example 4.1.1.

Let us suppose you invested $30,000 at 5%, $6,000 at 6%, and $4,500 at 7%.

Question: Is the overall (average) rate of return for your investment 6%?

Answer: No, it is given by the following calculation:

[latex]\begin{align*} x_w &=\frac{5\%\cdot$30,000 +6\%\cdot$6,000 + 7\%\cdot$4,500}{$30,000+$6,000+$4,500}\\ &=\frac{$1,500 +$360 + $315}{$30,000+$6,000+$4,500}\\ &= \frac{$2,175}{$40,500}\\ &= 0.0537 = 5.37\% \end{align*}[/latex]
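The same calculation can be sketched in Python, with the dollar amounts acting as the weights (the variable names are ours):

```python
# Dollar amounts invested act as the weights for each rate of return.
amounts = [30_000, 6_000, 4_500]
rates = [0.05, 0.06, 0.07]
overall = sum(a * r for a, r in zip(amounts, rates)) / sum(amounts)
print(f"{overall:.2%}")  # 5.37%
```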

Example: Tables for Weighted Mean Calculations

Example 4.1.2.

Let us, instead, use a table to organize the calculations from the previous example. Note: the x · w column is the product of the x and w values.

  x (rate)   w (amount)   x · w
  5%         $30,000      $1,500
  6%         $6,000       $360
  7%         $4,500       $315
  Sum        $40,500      $2,175

This gives the following results:

[latex]\begin{align*} x_w &=\frac{\Sigma x\cdot w}{\Sigma w} =\frac{$2,175}{$40,500} =0.0537=5.37\% \end{align*}[/latex]

Example: Excel for Weighted Mean (Video)

Example 4.1.3.

We can also use Excel’s =SUM() and =SUMPRODUCT() formulas to calculate the weighted mean:


Example: Solving for Missing x-value (X)

Example 4.2.1.

Problem Setup: You purchased shares several years ago and sold them today:

  • You purchased 100 shares of Moderna for $200 each
  • You purchased 40 shares of Microsoft for $400 each
  • You earned 12% on your investments today (when you sold all of the shares).
  • But, you LOST 15% on the sale of your Moderna shares.

Question: What rate of return did you earn on the sale of your Microsoft shares?

Setting up the weighted mean equation, with [latex]x[/latex] as the Microsoft rate of return:

[latex]\begin{align*} 12\% &=\frac{-$3,000 +$16,000 x}{$36,000} \end{align*}[/latex]

Steps : We need to use algebra to solve for this equation:

  • Multiply both sides by $36,000: [latex]12\%\cdot$36,000 =-$3,000 +$16,000 x[/latex]
  • Collect like terms: [latex]$4,320+$3,000 =$16,000 x[/latex]
  • Divide both sides by $16,000: [latex]\frac{$7,320}{$16,000}= \frac{$16,000 x}{$16,000}[/latex]
  • Simplify: [latex]0.4575=x=45.75\%[/latex]

Answer: You earned 45.75% on your Microsoft Shares.
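The algebra can be verified with a short Python sketch (the variable names are ours):

```python
# Solve 12% = (-3000 + 16000*x) / 36000 for x.
lhs_total = 0.12 * 36_000          # multiply both sides by $36,000 -> 4,320
x = (lhs_total + 3_000) / 16_000   # collect like terms, divide by $16,000
print(f"{x:.2%}")  # 45.75%
```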

Example: Solving for Missing Weight (Exercise)

Example 4.2.2.

You also had an opportunity to invest in some Amazon stock at $150/share. These shares earned 20% over the same time period.

Question: How much money would you have had to invest in Amazon stock to achieve an overall average rate of return of 18% on your investments?

[1] https://en.wikipedia.org/wiki/Weighted_arithmetic_mean

An Introduction to Business Statistics for Analytics (1st Edition) Copyright © 2024 by Amy Goldlist; Charles Chan; Leslie Major; Michael Johnson is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License , except where otherwise noted.

Weighted Average: Definition and How It Is Calculated and Used


A weighted average is a calculation that takes into account the varying degrees of importance of the numbers in a data set. A weighted average can be more accurate than a simple average in which all numbers in a data set are assigned an identical weight.

Key Takeaways

  • The weighted average takes into account the relative importance or frequency of some factors in a data set.
  • A weighted average is sometimes more accurate than a simple average.
  • In a weighted average, each data point value is multiplied by its assigned weight; the products are then summed and divided by the sum of the weights.
  • A weighted average can improve the data’s accuracy.
  • Stock investors use a weighted average to track the cost basis of shares bought at varying times.


What Is the Purpose of a Weighted Average?

In calculating a simple average, or arithmetic mean , all numbers are treated equally and assigned equal weight. But a weighted average assigns weights that determine in advance the relative importance of each data point. In calculating a weighted average, each number in the data set is multiplied by a predetermined weight before the final calculation is made.

A weighted average is most often computed to equalize the frequency of the values in a data set. For example, a survey may gather enough responses from every age group to be considered statistically valid, but the 18 to 34 age group may have fewer respondents than all others relative to their share of the population . The survey team may weigh the results of the 18 to 34 age group so that their views are represented proportionately.

However, values in a data set may be weighted for other reasons than the frequency of occurrence. For example, if students in a dance class are graded on skill, attendance, and manners, the grade for skill may be given greater weight than the other factors.

Each data point value in a weighted average is multiplied by its assigned weight; the products are then summed and divided by the sum of the weights. The final average number reflects the relative importance of each observation and is thus more descriptive than a simple average. It also has the effect of smoothing out the data and enhancing its accuracy.

Investors usually build a position in a stock over a period of several years. That makes it tough to keep track of the cost basis on those shares and their relative changes in value. The investor can calculate a weighted average of the share price paid for the shares. To do so, multiply the number of shares acquired at each price by that price, add those values, then divide the total value by the total number of shares.

A weighted average is arrived at by determining in advance the relative importance of each data point.

For example, say an investor acquires 100 shares of a company in year one at $10, and 50 shares of the same stock in year two at $40. To get a weighted average of the price paid, the investor multiplies 100 shares by $10 for year one and 50 shares by $40 for year two, then adds the results to get a total of $3,000. Then the total amount paid for the shares, $3,000 in this case, is divided by the number of shares acquired over both years, 150, to get the weighted average price paid of $20.
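The cost-basis calculation from this example can be sketched in Python (the variable names are ours):

```python
# 100 shares at $10 in year one, 50 shares at $40 in year two.
shares = [100, 50]
prices = [10, 40]
total_cost = sum(s * p for s, p in zip(shares, prices))  # $3,000 paid in total
avg_price = total_cost / sum(shares)                     # 150 shares acquired
print(avg_price)  # 20.0
```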

This average is now weighted with respect to the number of shares acquired at each price, not just the absolute price.

The weighted average is sometimes also called the weighted mean.

Advantages and Disadvantages of Weighted Average

Pros of Weighted Average

Weighted average provides a more accurate representation of data when different values within a dataset hold varying degrees of importance. By assigning weights to each value based on their significance, weighted averages ensure that more weight is given to data points that have a greater impact on the overall result. This allows for a more nuanced analysis and decision-making process.

Next, weighted averages are particularly useful for handling skewed distributions or outliers within a dataset. Instead of being overly influenced by extreme values, weighted averages take into account the relative importance of each data point. This means you can "manipulate" your data set so it's more relevant, especially when you don't want to consider extreme values.

Thirdly, weighted averages offer flexibility in their application across various fields and disciplines. Whether in finance, statistics, engineering, or manufacturing , weighted averages can be customized to suit specific needs and objectives. For instance, like we discussed above, weighted averages are commonly used to calculate portfolio returns where the weights represent the allocation of assets. Weighted averages can also be used in the manufacturing process to determine the right combination of goods to use.

Cons of Weighted Average

One downside of a weighted average is the potential for subjectivity in determining the weights assigned to each data point. Deciding on the appropriate weights can be challenging, and it often involves subjective judgment where you don't actually know the weight to attribute. This subjectivity can introduce bias into the analysis and undermine the reliability of the weighted average.

Weighted averages may be sensitive to changes in the underlying data or weighting scheme. Small variations in the weights or input values can lead to significant fluctuations in the calculated average, making the results less stable and harder to interpret. This sensitivity can be particularly problematic in scenarios where the weights are based on uncertain or volatile factors which may include human emotion (i.e. are you confident you'll feel the same about the appropriate weights over time?).

Last, the interpretation of weighted averages can be more complex compared to simple arithmetic means. Though weighted averages provide a single summary statistic, they may obscure the full scope of the relationships across data points. Therefore, it's essential to carefully assess how the weights are assigned and to communicate them clearly to those who interpret the results.

Pros

  • Accurate representation via weighted significance, aiding nuanced decision-making.
  • Handles outliers, mitigating extreme value influence for relevance.
  • Flexible across fields, tailored to specific needs or objectives.

Cons

  • Subjectivity in determining weights introduces bias and undermines reliability.
  • Sensitivity to changes in data or weighting scheme affects stability.
  • Adds complexity compared to the arithmetic mean, potentially obscuring analysis.

Examples of Weighted Averages

Weighted averages show up in many areas of finance besides the purchase price of shares, including portfolio returns , inventory accounting, and valuation. When a fund that holds multiple securities is up 10% on the year, that 10% represents a weighted average of returns for the fund with respect to the value of each position in the fund.

For inventory accounting, the weighted average value of inventory accounts for fluctuations in commodity prices, for example, while LIFO (last in, first out) or FIFO (first in, first out) methods give more importance to time than value.

When evaluating companies to discern whether their shares are correctly priced, investors use the weighted average cost of capital (WACC) to discount a company’s cash flows. WACC is weighted based on the market value of debt and equity in a company’s capital structure.

Weighted Average vs. Arithmetic vs. Geometric

Weighted averages provide a tailored solution for scenarios where certain data points hold more significance than others. However, there are other forms of calculating averages, some of which were mentioned earlier. The two main alternatives are the arithmetic average and geometric average.

Arithmetic means, or simple averages, are the simplest form of averaging and are widely used for their ease of calculation and interpretation. They assume that all data points are of equal importance and are suitable for symmetrical distributions without significant outliers. Arithmetic means will often be easier to calculate since you divide the sum of the total by the number of instances. However, it is much less nuanced and does not allow for much flexibility.

Another common type of central tendency measure is the geometric mean . The geometric mean offers a specialized solution for scenarios involving exponential growth or decline. By taking the nth root of the product of n values, geometric means give equal weight to the relative percentage changes between values. This makes them particularly useful in finance for calculating compound interest rates or in epidemiology for analyzing disease spread rates.

What Is a Weighted Average?

A weighted average is a statistical measure that assigns different weights to individual data points based on their relative significance, resulting in a more accurate representation of the overall data set. It is calculated by multiplying each data point by its corresponding weight, summing the products, and dividing by the sum of the weights.

Is Weighted Average Better?

Whether a weighted average is better depends on the specific context and the objectives of your analysis. Weighted averages are better when different data points have varying degrees of importance, allowing you to have a more nuanced representation of the data. However, they may introduce subjectivity in determining weights and can be sensitive to changes in the weighting scheme.

How Does a Weighted Average Differ From a Simple Average?

A weighted average accounts for the relative contribution, or weight, of the things being averaged, while a simple average does not. Therefore, it gives more value to those items in the average that occur relatively more.

What Are Some Examples of Weighted Averages Used in Finance?

Many weighted averages are found in finance, including the volume-weighted average price (VWAP) , the weighted average cost of capital, and exponential moving averages (EMAs) used in charting. Construction of portfolio weights and the LIFO and FIFO inventory methods also make use of weighted averages.

How Do You Calculate a Weighted Average?

You can compute a weighted average by multiplying each value by its relative proportion or percentage and adding those products together. Thus, if a portfolio is made up of 55% stocks, 40% bonds, and 5% cash, those weights would be multiplied by their annual performance to get a weighted average return. So if stocks, bonds, and cash returned 10%, 5%, and 2%, respectively, the weighted average return would be (0.55 × 10%) + (0.40 × 5%) + (0.05 × 2%) = 7.6%.
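The portfolio arithmetic above can be checked with a short Python sketch (the variable names are ours):

```python
# Portfolio allocation acts as the weights; the weights already sum to 1.
weights = [0.55, 0.40, 0.05]   # stocks, bonds, cash
returns = [0.10, 0.05, 0.02]
portfolio = sum(w * r for w, r in zip(weights, returns))
print(f"{portfolio:.1%}")  # 7.6%
```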

Statistical measures can be a very important way to help you in your investment journey. You can use weighted averages to help determine the average price of shares as well as the returns of your portfolio. A weighted average is generally more accurate than a simple average. You can calculate it by multiplying each number in the data set by its weight, adding up the results, and dividing by the sum of the weights.



Weighted Mean

Statistical Glossary

Weighted Mean:

Besides its use as a descriptive statistic , the weighted mean is also used to construct filters .

See also Mean values (comparison) and the online short course Basic Concepts in Probability and Statistics


Weighted Mean

Also called Weighted Average

A mean where some values contribute more than others.

When we do a simple mean (or average), we give equal weight to each number.

Here is the mean of 1, 2, 3 and 4:

Add up the numbers, divide by how many numbers:

Mean = (1 + 2 + 3 + 4) / 4 = 10/4 = 2.5

We could think that each of those numbers has a "weight" of ¼ (because there are 4 numbers):

Mean = ¼ × 1 + ¼ × 2 + ¼ × 3 + ¼ × 4 = 0.25 + 0.5 + 0.75 + 1 = 2.5

Same answer.

Now let's change the weight of 3 to 0.7 , and the weights of the other numbers to 0.1 so the total of the weights is still 1 :

Mean = 0.1 × 1 + 0.1 × 2 + 0.7 × 3 + 0.1 × 4 = 0.1 + 0.2 + 2.1 + 0.4 = 2.8

This weighted mean is now a little higher ("pulled" there by the weight of 3).

Weighted means can help with decisions where some things are more important than others:


Example: Sam wants to buy a new camera, and decides on the following rating system:

  • Image Quality 50%
  • Battery Life 30%
  • Zoom Range 20%

The Sonu camera gets 8 (out of 10) for Image Quality, 6 for Battery Life and 7 for Zoom Range

The Conan camera gets 9 for Image Quality, 4 for Battery Life and 6 for Zoom Range

Which camera is best?

Sonu: 0.5 × 8 + 0.3 × 6 + 0.2 × 7 = 4 + 1.8 + 1.4 = 7.2

Conan: 0.5 × 9 + 0.3 × 4 + 0.2 × 6 = 4.5 + 1.2 + 1.2 = 6.9

Sam decides to buy the Sonu.
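Sam's rating system can be sketched as a small Python script (the data structure and names are ours, for illustration):

```python
# Weights add to 1, so each camera's score is just the sum of weight * rating.
weights = {"image": 0.5, "battery": 0.3, "zoom": 0.2}
cameras = {
    "Sonu":  {"image": 8, "battery": 6, "zoom": 7},
    "Conan": {"image": 9, "battery": 4, "zoom": 6},
}
for name, ratings in cameras.items():
    total = sum(weights[k] * ratings[k] for k in weights)
    print(name, round(total, 1))
# Sonu 7.2
# Conan 6.9
```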

What if the Weights Don't Add to 1?

When the weights don't add to 1, divide by the sum of weights.


Example: Alex usually eats lunch 7 times a week, but some weeks only gets 1, 2, or 5 lunches.

Alex had lunch:

  • on 2 weeks: only one lunch for the whole week
  • on 14 weeks: 2 lunches each week
  • on 8 weeks: 5 lunches each week
  • on 32 weeks: 7 lunches each week

What is the mean number of lunches Alex has each week?

Use "Weeks" as the weighting:

Weeks × Lunches = 2 × 1 + 14 × 2 + 8 × 5 + 32 × 7 = 2 + 28 + 40 + 224 = 294

Also add up the weeks:

Weeks = 2 + 14 + 8 + 32 = 56

Divide total lunches by total weeks:

Mean = 294/56 = 5.25
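The lunch calculation can be sketched in Python (the variable names are ours):

```python
weeks = [2, 14, 8, 32]    # the weights w
lunches = [1, 2, 5, 7]    # the values x
total_lunches = sum(w * x for w, x in zip(weeks, lunches))  # 294
total_weeks = sum(weeks)                                    # 56
print(total_lunches / total_weeks)  # 5.25
```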


But it is often better to use a table to make sure you have all the numbers correct:

Example (continued): use

  • w for the number of weeks (the weight)
  • x for lunches (the value we want the mean of)

Multiply w by x, then sum up the w column and the wx column:

  w      x     wx
  2      1     2
  14     2     28
  8      5     40
  32     7     224
  Σw = 56      Σwx = 294

The symbol Σ ( Sigma ) means "Sum Up"

Divide Σwx by Σw:

Mean = 294/56 = 5.25

(Same answer as before.)

And that leads us to our formula:

Weighted Mean = Σwx / Σw

In other words: multiply each weight w by its matching value x , sum that all up, and divide by the sum of weights.

  • Weighted Mean : A mean where some values contribute more than others.
  • When the weights add to 1: just multiply each weight by the matching value and sum it all up


For Weighting Online Opt-In Samples, What Matters Most?

1. How different weighting methods work


Historically, public opinion surveys have relied on the ability to adjust their datasets using a core set of demographics – sex, age, race and ethnicity, educational attainment, and geographic region – to correct any imbalances between the survey sample and the population. These are all variables that are correlated with a broad range of attitudes and behaviors of interest to survey researchers. Additionally, they are well measured on large, high-quality government surveys such as the American Community Survey (ACS), conducted by the U.S. Census Bureau, which means that reliable population benchmarks are readily available.

But are they sufficient for reducing selection bias[6] in online opt-in surveys? Two studies that compared weighted and unweighted estimates from online opt-in samples found that in many instances, demographic weighting only minimally reduced bias, and in some cases actually made bias worse.[7] In a previous Pew Research Center study comparing estimates from nine different online opt-in samples and the probability-based American Trends Panel, the sample that displayed the lowest average bias across 20 benchmarks (Sample I) used a number of variables in its weighting procedure that went beyond basic demographics, and it included factors such as frequency of internet use, voter registration, party identification and ideology.[8] Sample I also employed a more complex statistical process involving three stages: matching followed by a propensity adjustment and finally raking (the techniques are described in detail below).

The present study builds on this prior research and attempts to determine the extent to which the inclusion of different adjustment variables or more sophisticated statistical techniques can improve the quality of estimates from online, opt-in survey samples. For this study, Pew Research Center fielded three large surveys, each with over 10,000 respondents, in June and July of 2016. The surveys each used the same questionnaire, but were fielded with different online, opt-in panel vendors. The vendors were each asked to produce samples with the same demographic distributions (also known as quotas) so that prior to weighting, they would have roughly comparable demographic compositions. The survey included questions on political and social attitudes, news consumption, and religion. It also included a variety of questions drawn from high-quality federal surveys that could be used either for benchmarking purposes or as adjustment variables. (See Appendix A for complete methodological details and Appendix F for the questionnaire.)

This study compares two sets of adjustment variables: core demographics (age, sex, educational attainment, race and Hispanic ethnicity, and census division) and a more expansive set of variables that includes both the core demographic variables and additional variables known to be associated with political attitudes and behaviors. These additional political variables include party identification, ideology, voter registration and identification as an evangelical Christian, and are intended to correct for the higher levels of civic and political engagement and Democratic leaning observed in the Center’s previous study .

The analysis compares three primary statistical methods for weighting survey data: raking, matching and propensity weighting. In addition to testing each method individually, we tested four techniques where these methods were applied in different combinations for a total of seven weighting methods:

  • Raking
  • Matching
  • Propensity weighting
  • Matching + Propensity weighting
  • Matching + Raking
  • Propensity weighting + Raking
  • Matching + Propensity weighting + Raking

Because different procedures may be more effective at larger or smaller sample sizes, we simulated survey samples of varying sizes. This was done by taking random subsamples of respondents from each of the three (n=10,000) datasets. The subsample sizes ranged from 2,000 to 8,000 in increments of 500.[9] Each of the weighting methods was applied twice to each simulated survey dataset (subsample): once using only core demographic variables, and once using both demographic and political measures.[10] Despite the use of different vendors, the effects of each weighting protocol were generally consistent across all three samples. Therefore, to simplify reporting, the results presented in this study are averaged across the three samples.
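Of the methods above, raking is the simplest to illustrate. The following is a minimal Python sketch of iterative proportional fitting under a toy sample, not the procedure used in the study; the `rake` helper and the example data are ours:

```python
from collections import defaultdict

def rake(rows, targets, max_iter=100, tol=1e-10):
    """Iteratively scale weights so each margin matches its target share.
    rows:    list of dicts, e.g. {"sex": "f", "edu": "hs"}
    targets: dict of variable -> {category: target proportion}
    Assumes every target category appears in the sample."""
    w = [1.0] * len(rows)
    for _ in range(max_iter):
        biggest = 0.0
        for var, target in targets.items():
            total = sum(w)
            cur = defaultdict(float)   # current weighted count per category
            for wi, row in zip(w, rows):
                cur[row[var]] += wi
            for i, row in enumerate(rows):
                factor = target[row[var]] * total / cur[row[var]]
                w[i] *= factor
                biggest = max(biggest, abs(factor - 1.0))
        if biggest < tol:              # every margin already matches
            break
    return w

# Toy sample that over-represents men relative to 50/50 targets:
rows = [{"sex": "m", "edu": "hs"}, {"sex": "m", "edu": "hs"},
        {"sex": "m", "edu": "col"}, {"sex": "f", "edu": "hs"}]
weights = rake(rows, {"sex": {"m": 0.5, "f": 0.5},
                      "edu": {"hs": 0.6, "col": 0.4}})
```

After raking, the weighted share of each category matches its target on both margins simultaneously, which is why raking only needs summary population distributions rather than a case-level dataset.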

Often researchers would like to weight data using population targets that come from multiple sources. For instance, the American Community Survey (ACS), conducted by the U.S. Census Bureau, provides high-quality measures of demographics. The Current Population Survey (CPS) Voting and Registration Supplement provides high-quality measures of voter registration. No government surveys measure partisanship, ideology or religious affiliation, but they are measured on surveys such as the General Social Survey (GSS) or Pew Research Center’s Religious Landscape Study (RLS).

For some methods, such as raking, this does not present a problem, because they only require summary measures of the population distribution. But other techniques, such as matching or propensity weighting, require a case-level dataset that contains all of the adjustment variables. This is a problem if the variables come from different surveys.

To overcome this challenge, we created a “synthetic” population dataset that took data from the ACS and appended variables from other benchmark surveys (e.g., the CPS and RLS). In this context, “synthetic” means that some of the data came from statistical modeling (imputation) rather than directly from the survey participants’ answers. 11

The first step in this process was to identify the variables that we wanted to append to the ACS, as well as any other questions that the different benchmark surveys had in common. Next, we took the data for these questions from the different benchmark datasets (e.g., the ACS and CPS) and combined them into one large file, with the cases, or interview records, from each survey literally stacked on top of each other. Some of the questions – such as age, sex, race or state – were available on all of the benchmark surveys, but others had large holes of missing data for cases that came from surveys where they were not asked.


The next step was to statistically fill the holes of this large but incomplete dataset. For example, all the records from the ACS were missing voter registration, which that survey does not measure. We used a technique called multiple imputation by chained equations (MICE) to fill in such missing information. 12  MICE fills in likely values based on a statistical model using the common variables. This process is repeated many times, with the model getting more accurate with each iteration. Eventually, all of the cases will have complete data for all of the variables used in the procedure, with the imputed variables following the same multivariate distribution as the surveys where they were actually measured.

The result is a large, case-level dataset that contains all the necessary adjustment variables. For this study, this dataset was then filtered down to only those cases from the ACS. This way, the demographic distribution exactly matches that of the ACS, and the other variables have the values that would be expected given that specific demographic distribution. We refer to this final dataset as the “synthetic population,” and it serves as a template or scale model of the total adult population.

This synthetic population dataset was used to perform the matching and the propensity weighting. It was also used as the source for the population distributions used in raking. This approach ensured that all of the weighted survey estimates in the study were based on the same population information. See Appendix B for complete details on the procedure.

For public opinion surveys, the most prevalent method for weighting is iterative proportional fitting, more commonly referred to as raking. With raking, a researcher chooses a set of variables where the population distribution is known, and the procedure iteratively adjusts the weight for each case until the sample distribution aligns with the population for those variables. For example, a researcher might specify that the sample should be 48% male and 52% female, and 40% with a high school education or less, 31% who have completed some college, and 29% college graduates. The process will adjust the weights so that the gender ratio for the weighted survey sample matches the desired population distribution. Next, the weights are adjusted so that the education groups are in the correct proportion. If the adjustment for education pushes the sex distribution out of alignment, then the weights are adjusted again so that men and women are represented in the desired proportions. The process is repeated until the weighted distribution of all of the weighting variables matches their specified targets.

Raking is popular because it is relatively simple to implement, and it only requires knowing the marginal proportions for each variable used in weighting. That is, it is possible to weight on sex, age, education, race and geographic region separately without having to first know the population proportion for every combination of characteristics (e.g., the share that are male, 18- to 34-year-old, white college graduates living in the Midwest). Raking is the standard weighting method used by Pew Research Center and many other public pollsters.

In this study, the weighting variables were raked according to their marginal distributions, as well as by two-way cross-classifications for each pair of demographic variables (age, sex, race and ethnicity, education, and region).
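The raking loop described above can be sketched in a few lines. This is a hypothetical toy implementation that adjusts marginal proportions only (not the two-way cross-classifications, and not the production procedure used in the report); the sample data and category labels are illustrative:

```python
def rake(sample, targets, max_iter=100, tol=1e-8):
    """Minimal iterative proportional fitting (raking) sketch.

    sample:  dict mapping variable name -> list of category labels per case
    targets: dict mapping variable name -> {category: target proportion}
    Returns a list of case weights whose weighted margins match the targets.
    """
    n = len(next(iter(sample.values())))
    w = [1.0] * n
    for _ in range(max_iter):
        max_shift = 0.0
        for var, props in targets.items():
            col = sample[var]
            total = sum(w)
            for cat, target in props.items():
                cat_sum = sum(wi for wi, c in zip(w, col) if c == cat)
                factor = target * total / cat_sum  # bring this margin to its target
                w = [wi * factor if c == cat else wi for wi, c in zip(w, col)]
                max_shift = max(max_shift, abs(factor - 1.0))
        if max_shift < tol:  # all margins aligned; stop iterating
            break
    return w

# Hypothetical sample that is 60% male, raked to the 48%/52% sex and
# 40%/31%/29% education targets mentioned earlier.
sample = {
    "sex":  ["M", "M", "M", "M", "M", "M", "F", "F", "F", "F"],
    "educ": ["HS", "HS", "HS", "SC", "SC", "CG", "CG", "CG", "HS", "SC"],
}
targets = {
    "sex":  {"M": 0.48, "F": 0.52},
    "educ": {"HS": 0.40, "SC": 0.31, "CG": 0.29},
}
w = rake(sample, targets)
male_share = sum(wi for wi, s in zip(w, sample["sex"]) if s == "M") / sum(w)
hs_share = sum(wi for wi, e in zip(w, sample["educ"]) if e == "HS") / sum(w)
```

Each pass scales the weights of one category at a time so that the weighted margin for that variable exactly matches its target; repeating the passes reconciles the variables with each other, just as described above.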

Matching is another technique that has been proposed as a means of adjusting online opt-in samples. It involves starting with a sample of cases (i.e., survey interviews) that is representative of the population and contains all of the variables to be used in the adjustment. This “target” sample serves as a template for what a survey sample would look like if it was randomly selected from the population. In this study, the target samples were selected from our synthetic population dataset, but in practice they could come from other high-quality data sources containing the desired variables. Then, each case in the target sample is paired with the most similar case from the online opt-in sample. When the closest match has been found for all of the cases in the target sample, any unmatched cases from the online opt-in sample are discarded.

If all goes well, the remaining matched cases should be a set that closely resembles the target population. However, there is always a risk that there will be cases in the target sample with no good match in the survey data – instances where the most similar case has very little in common with the target. If there are many such cases, a matched sample may not look much like the target population in the end.

There are a variety of ways both to measure the similarity between individual cases and to perform the matching itself. 13  The procedure employed here used a target sample of 1,500 cases that were randomly selected from the synthetic population dataset. To perform the matching, we temporarily combined the target sample and the online opt-in survey data into a single dataset. Next, we fit a statistical model that uses the adjustment variables (either demographics alone or demographics + political variables) to predict which cases in the combined dataset came from the target sample and which came from the survey data.

The kind of model used was a machine learning procedure called a random forest. Random forests can incorporate a large number of weighting variables and can find complicated relationships between adjustment variables that a researcher may not be aware of in advance. In addition to estimating the probability that each case belongs to either the target sample or the survey, random forests also produce a measure of the similarity between each case and every other case. The random forest similarity measure accounts for how many characteristics two cases have in common (e.g., gender, race and political party) and gives more weight to those variables that best distinguish between cases in the target sample and responses from the survey dataset. 14

We used this similarity measure as the basis for matching.

The final matched sample is selected by sequentially matching each of the 1,500 cases in the target sample to the most similar case in the online opt-in survey dataset. Every subsequent match is restricted to those cases that have not been matched previously. Once the 1,500 best matches have been identified, the remaining survey cases are discarded.

For all of the sample sizes that we simulated for this study (n=2,000 to 8,000), we always matched down to a target sample of 1,500 cases. In simulations that started with a sample of 2,000 cases, 1,500 cases were matched and 500 were discarded. Similarly, for simulations starting with 8,000 cases, 6,500 were discarded. In practice, this would be very wasteful. However, in this case, it enabled us to hold the size of the final matched dataset constant and measure how the effectiveness of matching changes when a larger share of cases is discarded. The larger the starting sample, the more potential matches there are for each case in the target sample – and, hopefully, the lower the chances of poor-quality matches.
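The sequential matching step can be sketched as a greedy nearest-neighbor search without replacement. This hypothetical illustration uses plain Euclidean distance on numeric covariates in place of the random-forest proximity measure used in the report:

```python
import math

def greedy_match(target, survey):
    """Match each target case to the most similar unused survey case.

    target, survey: lists of tuples of numeric adjustment variables,
    assumed to be on comparable scales. Returns one survey index per
    target case; unmatched survey cases are implicitly discarded.
    """
    available = set(range(len(survey)))
    matches = []
    for t in target:
        # nearest remaining survey case by Euclidean distance
        j = min(available, key=lambda i: math.dist(t, survey[i]))
        matches.append(j)
        available.remove(j)  # match without replacement
    return matches

# Tiny hypothetical example: two target cases, three survey cases.
target = [(0.0,), (10.0,)]
survey = [(1.0,), (2.0,), (9.0,)]
matched = greedy_match(target, survey)  # the survey case at index 1 is discarded
```

As in the report, every subsequent match is restricted to cases not already matched, and whatever is left over in the survey data is dropped.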

A key concept in probability-based sampling is that if survey respondents have different probabilities of selection, weighting each case by the inverse of its probability of selection removes any bias that might result from having different kinds of people represented in the wrong proportion. The same principle applies to online opt-in samples. The only difference is that for probability-based surveys, the selection probabilities are known from the sample design, while for opt-in surveys they are unknown and can only be estimated.

For this study, these probabilities were estimated by combining the online opt-in sample with the entire synthetic population dataset and fitting a statistical model to estimate the probability that a case comes from the synthetic population dataset or the online opt-in sample. As with matching, random forests were used to calculate these probabilities, but this can also be done with other kinds of models, such as logistic regression. 15  Each online opt-in case was given a weight equal to the estimated probability that it came from the synthetic population divided by the estimated probability that it came from the online opt-in sample. Cases with a low probability of being from the online opt-in sample were underrepresented relative to their share of the population and received large weights. Cases with a high probability were overrepresented and received lower weights.
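The weight construction can be illustrated with a deliberately simplified, cell-based propensity estimate, in which observed group shares stand in for the fitted model probabilities described above; the cells and data are hypothetical:

```python
from collections import Counter

def propensity_weights(population_cells, sample_cells):
    """Weight each opt-in case by P(cell in population) / P(cell in sample).

    Cases overrepresented in the sample relative to the population get
    weights below 1; underrepresented cases get weights above 1.
    """
    pop_counts = Counter(population_cells)
    samp_counts = Counter(sample_cells)
    n_pop, n_samp = len(population_cells), len(sample_cells)
    return [
        (pop_counts[c] / n_pop) / (samp_counts[c] / n_samp)
        for c in sample_cells
    ]

# Hypothetical cells: group A is 60% of the population but 80% of the sample.
population = ["A"] * 6 + ["B"] * 4
sample = ["A"] * 8 + ["B"] * 2
w = propensity_weights(population, sample)
weighted_a = sum(wi for wi, c in zip(w, sample) if c == "A") / sum(w)
```

After weighting, group A's weighted share of the sample is pulled back down to its 60% population share, which is the behavior the propensity ratio is designed to produce.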

As with matching, the use of a random forest model should mean that interactions or complex relationships in the data are automatically detected and accounted for in the weights. However, unlike matching, none of the cases are thrown away. A potential disadvantage of the propensity approach is the possibility of highly variable weights, which can lead to greater variability for estimates (e.g., larger margins of error).

Combinations of adjustments

Some studies have found that a first stage of adjustment using matching or propensity weighting followed by a second stage of adjustment using raking can be more effective in reducing bias than any single method applied on its own. 16  Neither matching nor propensity weighting will force the sample to exactly match the population on all dimensions, but the random forest models used to create these weights may pick up on relationships between the adjustment variables that raking would miss. Following up with raking may keep those relationships in place while bringing the sample fully into alignment with the population margins.

These procedures work by using the output from earlier stages as the input to later stages. For example, for matching followed by raking (M+R), raking is applied to only the 1,500 matched cases. For matching followed by propensity weighting (M+P), the 1,500 matched cases are combined with the 1,500 records in the target sample. The propensity model is then fit to these 3,000 cases, and the resulting scores are used to create weights for the matched cases. When this is followed by a third stage of raking (M+P+R), the propensity weights are trimmed and then used as the starting point in the raking process. When first-stage propensity weights are followed by raking (P+R), the process is the same, with the propensity weights being trimmed and then fed into the raking procedure.

  • When survey respondents are self-selected, there is a risk that the resulting sample may differ from the population in ways that bias survey estimates. This is known as selection bias, and it occurs when the kinds of people who choose to participate are systematically different from those who do not on the survey outcomes. Selection bias can occur in both probability-based surveys (in the form of nonresponse) as well as online opt-in surveys. ↩
  • See Yeager, David S., et al. 2011. “ Comparing the Accuracy of RDD Telephone Surveys and Internet Surveys Conducted with Probability and Non-Probability Samples. ” Public Opinion Quarterly 75(4), 709-47; and Gittelman, Steven H., Randall K. Thomas, Paul J. Lavrakas and Victor Lange. 2015. “ Quota Controls in Survey Research: A Test of Accuracy and Intersource Reliability in Online Samples .” Journal of Advertising Research 55(4), 368-79. ↩
  • In the 2016 Pew Research Center study, a standard set of weights based on age, sex, education, race and ethnicity, region, and population density was created for each sample. For samples where vendors provided their own weights, the set of weights that resulted in the lowest average bias was used in the analysis. Only in the case of Sample I did the vendor provide weights resulting in lower bias than the standard weights. ↩
  • Many surveys feature sample sizes less than 2,000, which raises the question of whether it would be important to simulate smaller sample sizes. For this study, a minimum of 2,000 was chosen so that it would be possible to have 1,500 cases left after performing matching, which involves discarding a portion of the completed interviews. ↩
  • The process of calculating survey estimates using different weighting procedures was repeated 1,000 times using different randomly selected subsamples. This enabled us to measure the amount of variability introduced by each procedure and distinguish between systematic and random differences in the resulting estimates. ↩
  • The idea for augmenting ACS data with modeled variables from other surveys and measures of its effectiveness can be found in Rivers, Douglas, and Delia Bailey. 2009. “ Inference from Matched Samples in the 2008 US National Elections .” Presented at the 2009 American Association for Public Opinion Research Annual Conference, Hollywood, Florida; and Ansolabehere, Stephen, and Douglas Rivers. 2013. “ Cooperative Survey Research .” Annual Review of Political Science 16(1), 307-29. ↩
  • See Azur, Melissa J., Elizabeth A. Stuart, Constantine Frangakis, and Philip J. Leaf. 2011. “Multiple Imputation by Chained Equations: What Is It and How Does It Work?: Multiple Imputation by Chained Equations.” International Journal of Methods in Psychiatric Research 20(1), 40–49. ↩
  • See Stuart, Elizabeth A. 2010. “ Matching Methods for Causal Inference: A Review and a Look Forward .” Statistical Science 25(1), 1-21 for a more technical explanation and review of the many different approaches to matching that have been developed. ↩
  • See Appendix C for a more detailed explanation of random forests and the matching algorithm used in this report, as well as Zhao, Peng, Xiaogang Su, Tingting Ge and Juanjuan Fan. 2016. “ Propensity Score and Proximity Matching Using Random Forest .” Contemporary Clinical Trials 47, 85-92. ↩
  • See Buskirk, Trent D., and Stanislav Kolenikov. 2015. “ Finding Respondents in the Forest: A Comparison of Logistic Regression and Random Forest Models for Response Propensity Weighting and Stratification. ” Survey Methods: Insights from the Field (SMIF). ↩
  • See Dutwin, David and Trent D. Buskirk. 2017. “ Apples to Oranges or Gala versus Golden Delicious? Comparing Data Quality of Nonprobability Internet Samples to Low Response Rate Probability Samples .” Public Opinion Quarterly 81(S1), 213-239. ↩




ABOUT PEW RESEARCH CENTER  Pew Research Center is a nonpartisan fact tank that informs the public about the issues, attitudes and trends shaping the world. It conducts public opinion polling, demographic research, media content analysis and other empirical social science research. Pew Research Center does not take policy positions. It is a subsidiary of  The Pew Charitable Trusts .

Copyright 2024 Pew Research Center



Introduction to Weighted Mean

In Mathematics, the weighted mean is used to calculate the average value of the data. In the weighted mean calculation, the average value can be calculated by providing different weights to some of the individual values. We need to calculate the weighted mean when the data is given in a different way compared to the arithmetic mean or sample mean. Different types of means are used to calculate the average of the data values. Let’s understand what the weighted mean is and how to define it, along with solved examples.

Weight Definition

In the context of the weighted mean, a weight is a measure of how much a value contributes to the average. Weights cannot be negative. Some weights can be zero, but not all of them, since the sum of the weights would then be zero and division by zero is not allowed.

The data elements which have a high weight will contribute more to the weighted mean as compared to the elements with a low weight.

What is Weighted Mean?

To calculate the weighted mean of certain data, we need to multiply the weight associated with a particular event or outcome with its associated outcome and finally sum up all the products together. It is very useful in calculating a theoretically expected outcome. Apart from weighted mean and arithmetic mean, there are various types of means such as harmonic mean, geometric mean, and so on.

Define Weighted Mean

The weighted mean is defined as an average computed by giving different weights to some of the individual values. When all the weights are equal, then the weighted mean is similar to the arithmetic mean. A free online tool called the weighted mean calculator is used to calculate the weighted mean for the given range of values.

Weighted Mean Formula

To calculate the weighted mean for a given set of data \(x_1, x_2, x_3, \ldots, x_n\) with non-negative weights \(w_1, w_2, w_3, \ldots, w_n\), we use the formula given below.

\[\text{Weighted Mean } \overline{W} = \frac {\sum_{i=1}^{n} w_{i}x_{i}}{\sum_{i=1}^{n} w_i}\]

Where \(\overline {W}\) is the weighted mean,

\(x_i\) are the individual values,

n is the number of terms whose mean is to be calculated, and

\(w_i\) are the corresponding weights.
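The formula translates directly into a short function. A minimal sketch (the function name and guard clauses are ours, not from the text), with the weight rules from above enforced explicitly:

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of w_i * x_i divided by the sum of w_i."""
    if any(w < 0 for w in weights):
        raise ValueError("weights cannot be negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be non-zero")
    return sum(w * x for w, x in zip(weights, values)) / total

# Hypothetical example: values 1-4 weighted by how often each occurs.
m = weighted_mean([1, 2, 3, 4], [73, 378, 459, 90])
```

When all the weights are equal, this reduces to the ordinary arithmetic mean, as noted in the definition above.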

Uses of Weighted Means

Weighted means are useful in a wide variety of scenarios in our daily life. For example, a student uses a weighted mean in order to calculate their percentage grade in a course. In such a case, the student has to multiply the weighing of all assessment items in the course (e.g., assignments, exams, projects, etc.) by the respective grade that was obtained in each of the categories. 

It is used in descriptive statistical analysis, such as index numbers calculation. For example, stock market indices such as Nifty or BSE Sensex are computed using the weighted average method. It can also be applied in physics to find the center of mass and moments of inertia of an object.

It is also useful for businessmen to evaluate the average prices of goods purchased from different vendors where the purchased quantity is considered as the weight. It gives a better understanding of his expenses.

A customer's decision on whether to buy a product or not depends on the quality of the product, knowledge of the product, cost of the product, and service by the franchise. The customer allocates weight to each criterion and calculates the weighted average. This will help him to make a better decision on buying the product.

To recruit a person for a job, the interviewer looks at the personality, working capabilities, educational qualification, and team working skills. Based on the profile, different levels of importance (weights) are given, and then the final selection is made.

Important Notes

The weights can be in the form of quantities, decimals, whole numbers, fractions, or percentages.

If the weights are given in percentages, then the percentages should sum to 100%.

The weighted average for quantities \(x_i\) with percentage weights \(P_i\%\) is: Weighted average = \(\sum P_i\% \times x_i\)
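As a quick illustration of the percentage-weight note above, using hypothetical percentages of 20%, 30%, 40%, and 10% and scores of 0.8, 0.8, 0.7, and 0.8:

```python
# Percentage weights sum to 100%, so no division is needed:
# weighted average = sum(P_i% * x_i)
weights_pct = [20, 30, 40, 10]   # percentages (sum to 100)
values = [0.8, 0.8, 0.7, 0.8]    # hypothetical scores
weighted_avg = sum(p / 100 * x for p, x in zip(weights_pct, values))
```

Because the percentages already sum to 100%, the denominator of the weighted mean formula equals 1 and can be dropped.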

Solved Example of Weighted Mean

Suppose a marketing firm conducted a survey of 1,000 households to determine the average number of TVs each household owns. The data show that most households own two or three TVs, and fewer own one or four. Every household in the sample has at least one TV, and no household has more than four. Calculate the mean number of TVs per household.

Solution: Since most of the values in this data set are repeated multiple times, we can easily compute the sample mean as a weighted mean. The following are the steps to calculate the weighted arithmetic mean.

Step 1: First assign a weight to each value in the dataset.

\(x_1 = 1,\; w_1 = 73\)

\(x_2 = 2,\; w_2 = 378\)

\(x_3 = 3,\; w_3 = 459\)

\(x_4 = 4,\; w_4 = 90\)

Step 2: Now compute the numerator of the weighted mean formula.

To calculate it, multiply each sample value by its weight and then add the products together.

\[\sum_{i=1}^{4} w_{i}x_{i} = w_1x_1 + w_2x_2 + w_3x_3 + w_4x_4\]

= 1 × 73 + 2 × 378 + 3 × 459 + 4 × 90

= 73 + 756 + 1377 + 360

= 2566

Step 3: Now, compute the denominator of the weighted mean formula by adding the weights together.

\[\sum_{i=1}^{4} w_{i} = w_1 + w_2 + w_3 + w_4\]

= 73 + 378 + 459 + 90

= 1000

Step 4: Finally divide the numerator value by the denominator value.

\[\frac{\displaystyle\sum\limits_{i=1}^4w_{i}x_{i}}{\displaystyle\sum\limits_{i=1}^4 w_{i}}\]

=\[\frac {2566}{1000}\]

Hence, the mean number of TVs per household in this sample is 2.566.

Note: The weighted mean can be easily influenced by an outlier in our data. If we have very high or very low values in our data set, then we cannot rely on the weighted mean.

Weighted Mean is a mean where some of the values contribute more than others. It represents the average of the given data. The weighted mean is similar to the arithmetic mean or sample mean. Sometimes it is also known as the weighted average.

When the weights add to 1, we just have to multiply each weight by the matching value and sum it all up.

Otherwise, we have to multiply each weight w by its matching value x, sum all of that up, and divide it by the sum of the weights.


FAQs on Weighted Mean

1. What is weighted mean used for?

Weighted means are used in a wide variety of scenarios. For example, a student may use a weighted mean to calculate their percentage grade in a course: the student multiplies the weighting of each assessment item in the course by the grade obtained in that item, and then sums the results.

2. What are the steps involved in the calculation of weighted mean?

Following are the steps that are involved in the calculation of weighted mean.

Step 1: First list down the numbers and weights in tabular form. Representation in tabular form makes the calculations easy.

Step 2: Multiply each number by the weight assigned to it (\(w_1\) by \(x_1\), \(w_2\) by \(x_2\), and so on).

Step 3: Add the products obtained in Step 2, i.e., \(\sum x_i w_i\).

Step 4: Find the sum of the weights, i.e., \(\sum w_i\).

Step 5: Divide the total of the values obtained in step 3 by the sum of the weights obtained in step 4 i.e.  \[\frac {\sum x_i w_i}{\sum w_i}\]

3. How can we use the weighted mean calculator?

Following are the steps used in the weighted mean calculator:

Step 1: First enter the values and their corresponding weights in the input field.

Step 2: Now click the button “Solve” to display the weighted mean.

Step 3: Finally, the weighted mean will be displayed in the output field.

4. What are the characteristics of weighted mean?

Weighted Mean is the average computed by giving different weights to the individual values.

If all the weights are equal, then the weighted mean is equal to the arithmetic mean.

It denotes the average of the given data. When all the weights are equal, the weighted mean is equal to the arithmetic mean or sample mean.

The weighted mean is calculated when data is provided in a different way, compared to the arithmetic mean or sample mean.

Weighted means are generally similar to arithmetic means, although they have a few counterintuitive properties.

Data elements with high weight contribute more to the weighted mean than the low weighted elements.

The weights cannot be negative; some of them may be zero, but not all, since division by zero is not allowed.

Weighted means play an important role in data analysis, weighted differential calculus, and integral calculus.

Weighted Mean

Weighted mean is a type of average in which some data points contribute more to the final mean than others. It is most commonly used in statistics when the data is associated with a population. If the data is weighted the same across the entire set, then the weighted mean is equal to the arithmetic mean. Let us learn more about the weighted mean and its formula, and solve a few examples to understand the concept better.

What is Weighted Mean?

The weighted mean is a type of mean that is calculated by multiplying the weight associated with a particular event or outcome with its associated quantitative outcome and then summing all the products together. In other words, the weighted mean is calculated when some values carry more weight than others.

Weighted Mean Definition

The weighted mean is defined as the summation of the product of weights and quantities, divided by the summation of weights. The concept of the weighted mean is quite often used in accounting, to give different weights based on time or on priority.


How to Calculate the Weighted Mean?

While finding the average for an equally weighted set of values, we use the simple arithmetic mean, where all the values are added and the sum is divided by the total number of items in the set. However, the weighted mean is calculated when some of the values carry more weight than others. It can be calculated by using these two simple steps:

  • Multiply the numbers in the set by the weights.
  • Add the results.

But certain values in the given data set are more important than others. A weight \(w_i\) is attached to each of the values \(x_i\). The general formula to find the weighted mean is given as,

Weighted mean = \(\sum w_ix_i / \sum w_i\)

  • \(x_i\) = the individual values of the given data set.
  • \(w_i\) = the corresponding weight for each observation.

The simple steps used to calculate the weighted mean through the formula are:

  • Step 1: Add all the weights together.
  • Step 2: Multiply each weight by its matching value in the data set.
  • Step 3: Add the products obtained in step 2 together.
  • Step 4: Divide the result of step 3 by the sum of the weights obtained in step 1.

Weighted Mean Formula

The weighted mean formula helps to find the mean of the quantities by assigning weights to the quantities. Based on the level of importance of the quantities, weights are assigned to the quantities. The below formula for weighted mean includes variables \(x_1\), \(x_2\), \(x_3\)...\(x_n\), and their weights \(w_1\), \(w_2\), \(w_3\)...\(w_n\) respectively. Here this is similar to the average and the weighted mean represents the summary value of all the available quantities. The weighted mean has the same units as that of the individual quantities.

\[ \bar x = \frac{w_1x_1 + w_2x_2 + ......+ w_nx_n}{w_1 + w_2 + ... + w_n} \]

\[\bar x = \frac{\sum w_nx_n}{\sum w_n} \]

Let us look at an example to understand this better.

Example: Find the weighted mean for the following data sets: w = {2, 5, 6, 8, 9}, x = {4, 3, 7, 5, 6}

Given data sets w = {2, 5, 6, 8, 9}, x = {4, 3, 7, 5, 6} and N = 5

Weighted mean = ∑(weights × quantities) / ∑(weights)

= (w 1 x 1 + w 2 x 2 + w 3 x 3 + w 4 x 4 + w 5 x 5 ) / (w 1 + w 2 + w 3 + w 4 + w 5 )

= (2 × 4 + 5 × 3 + 6 × 7 + 8 × 5 + 9 × 6) / (2 + 5 + 6 + 8 + 9)

= (8 + 15 + 42 + 40 + 54) / 30

= 159 / 30

Therefore, the weighted mean is 5.3.


☛Related Articles

Check out the interesting topics to learn more about weighted mean.

  • Categorical Data
  • Range in Statistics
  • Geometric Mean
  • Mean of Grouped Data

Weighted Mean Examples

Example 1: A teacher provides the following weightage of 20% for class attendance, 30% for project work, 40% for tests, and 10% for home assignments. A student scores 80/100 for class attendance, 4/5 in project work, 35/50 in tests, and 8/10 in home assignments. Find the final score of the student.

\(\begin{align} \bar x &= \frac{w_1x_1 + w_2x_2 + ......+ w_nx_n}{w_1 + w_2 + ... + w_n} \\&=\frac{20\%.\frac{80}{100} + 30\%.\frac{4}{5} + 40\%.\frac{35}{50} + 10\%.\frac{8}{10}}{20\% + 30\% + 40\% + 10\%} \\&=\frac{20\%.0.8 + 30\%.0.8 + 40\%.0.7 + 10\%.0.8}{100\%} \\&=\frac{0.2 \times 0.8 + 0.3 \times 0.8 + 0.4 \times 0.7 + 0.1 \times 0.8}{1} \\&=0.16 + 0.24 + 0.28 + 0.08 \\&=0.76 \end{align} \) Therefore, the final score of the student is 0.76.

Example 2: For a job application 0.8 weightage is given to academic qualification, 0.7 is given to personality, 0.4 is given to the location. The prospective candidate scores 4.5/5 for academic qualification, 3/5 for personality, and 2.8/5 for location. Find the final score received by the candidate.

\(\begin{align} \bar x &= \frac{w_1x_1 + w_2x_2 + ......+ w_nx_n}{w_1 + w_2 + ... + w_n} \\&=\frac{0.8 \times \frac{4.5}{5}+ 0.7 \times \frac{3}{5} + 0.4 \times \frac{2.8}{5}}{0.8 + 0.7 + 0.4} \\&=\frac{0.8 \times 0.9 + 0.7 \times 0.6 + 0.4 \times 0.56}{1.9} \\&=\frac{0.72 + 0.42 + 0.224}{1.9} \\&=\frac{1.364}{1.9} \\&= 0.718 \end{align} \) Hence, the final score of the candidate is 0.718.

Example 3: Ben is a fruit merchant who sells various types of fruits in Chicago. Some fruits are of higher quality and are sold at a higher price. He wants you to calculate the weighted mean from the following data:

\(\begin{align} \bar x &= \frac{w_1x_1 + w_2x_2 + ......+ w_nx_n}{w_1 + w_2 + ... + w_n} \\&= \frac{100 \times 80 + 50 \times 70 + 20 \times 60 + 15 \times 50}{100 + 50 + 20 + 15} \\&= \frac{8000 + 3500 + 1200 + 750}{185} \\&= \frac{13450}{185} \\&\approx 72.7 \end{align}\)

Therefore, the weighted mean is approximately 72.7.




FAQs on Weighted Mean

What is the Meaning of Weighted Mean?

The weighted mean is a mean that is calculated by multiplying the weight associated with a particular event or outcome with its associated quantitative outcome and then summing all the products together.

How Do You Calculate Weighted Mean?

Weighted mean can be calculated in two ways.

  • When the weights add up to 1, simply multiply each weight by its matching value and add the products together.
  • When the weights do not add up to 1, use the full weighted mean formula, dividing the sum of the products by the sum of the weights.

What is the Weighted Mean Formula?

The weighted mean formula is \(\bar x = \frac{\sum w_nx_n}{\sum w_n}\), where:

  • \(x_1\), \(x_2\), \(x_3\)...\(x_n\) are the variables.
  • \(w_1\), \(w_2\), \(w_3\)...\(w_n\) are the weights.

When Should I Use Weighted Mean?

The weighted mean is used when some values weigh more than others, i.e., when they count for more in the average.

  • Open access
  • Published: 06 April 2021

On weighted means and their inequalities

  • Mustapha Raïssouli &
  • Shigeru Furuichi (ORCID: orcid.org/0000-0002-9929-0954)

Journal of Inequalities and Applications volume  2021 , Article number:  65 ( 2021 ) Cite this article

In (Pal et al. in Linear Multilinear Algebra 64(12):2463–2473, 2016 ), Pal et al. introduced some weighted means and gave some related inequalities by using an approach for operator monotone functions. This paper discusses the construction of these weighted means in a simple and nice setting that immediately leads to the inequalities established there. The related operator version is immediately deduced here as well. According to our constructions of the means, we study all cases of the weighted means arising from the three weighted arithmetic/geometric/harmonic means by using the concepts of stable and stabilizable means. Finally, the power symmetric means are studied and new weighted power means are given.

1 Introduction

The mean inequalities arise in various contexts and attract many mathematicians by their developments and applications. It has been proved throughout the literature that the mean theory is useful in a theoretical point of view as well as in practical purposes.

1.1 Standard weighted means

As usual, we understand by (binary) mean a map m between two positive numbers such that \(\min (a,b)\leq m(a,b)\leq \max (a,b)\) for any \(a,b>0\) . Continuous (symmetric/homogeneous) means are defined in the habitual way. If m is a mean, then we define its dual by \(m^{*}(a,b)= (m(a^{-1},b^{-1}) )^{-1}\) . It is easy to see that if m is continuous (resp. symmetric, homogeneous), then so is \(m^{*}\) . Of course, \(m^{**}=m\) for any mean m . The means \((a,b)\longmapsto \min (a,b)\) and \((a,b)\longmapsto \max (a,b)\) are called trivial means. A mean m is called strict if \(m(a,b)=a\) implies \(a=b\) . The trivial means are not strict.
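
The dual construction \(m^{*}(a,b)= (m(a^{-1},b^{-1}) )^{-1}\) can be checked numerically; as noted below, the dual of the harmonic mean is the arithmetic mean, the geometric mean is self-dual, and \(m^{**}=m\):

```python
import math

def arithmetic(a, b): return (a + b) / 2
def geometric(a, b): return math.sqrt(a * b)
def harmonic(a, b): return 2 * a * b / (a + b)

def dual(m):
    # m*(a, b) = (m(1/a, 1/b))^(-1)
    return lambda a, b: 1 / m(1 / a, 1 / b)

a, b = 3.0, 7.0
print(dual(harmonic)(a, b), arithmetic(a, b))   # H* = A
print(dual(geometric)(a, b), geometric(a, b))   # G* = G
```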

Among the standard means, we recall the arithmetic mean \(a\nabla b=\frac{a+b}{2}\) , the geometric mean \(a\sharp b=\sqrt{ab}\) , the harmonic mean \(a!b=\frac{2ab}{a+b}\) , the logarithmic mean \(L(a,b)=\frac{b-a}{\log b-\log a}\) with \(L(a,a)=a\) , and the identric mean \(I(a,b)=e^{-1} (b^{b}/a^{a} )^{1/(b-a)}\) with \(I(a,a)=a\) . It is easy to see that \(H^{*}=A\) and \(G^{*}=G\) , where H , A , and G stand for harmonic, arithmetic, and geometric mean, respectively. The explicit expressions of \(L^{*}\) and \(I^{*}\) can be easily deduced from those of L and I , respectively. All these means are strict. The following chain of inequalities is well known in the literature (e.g. [ 2 , Corollary 13 in Sect. 5 of Chapter II]):

\(a!b\leq a\sharp b\leq L(a,b)\leq I(a,b)\leq a\nabla b. \qquad (1.1)\)

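The well-known chain among the means just recalled, \(H \leq G \leq L \leq I \leq A\), can be illustrated numerically for a pair of distinct positive numbers:

```python
import math

def H(a, b): return 2 * a * b / (a + b)                       # harmonic
def G(a, b): return math.sqrt(a * b)                          # geometric
def L(a, b):                                                  # logarithmic
    return a if a == b else (b - a) / (math.log(b) - math.log(a))
def I(a, b):                                                  # identric
    return a if a == b else math.exp(-1) * (b**b / a**a) ** (1 / (b - a))
def A(a, b): return (a + b) / 2                               # arithmetic

a, b = 2.0, 8.0
chain = [H(a, b), G(a, b), L(a, b), I(a, b), A(a, b)]
print(chain)  # increasing
```
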
Let \(m_{v}\) be a binary map indexed by \(v\in [0,1]\) . We say that \(m_{v}\) is a weighted mean if the following assertions are satisfied:

\(m_{v}\) is a mean for any \(v\in [0,1]\) ;

\(m_{1/2}:=m\) is a symmetric mean;

\(m_{v}(a,b)=m_{1-v}(b,a)\) for any \(a,b>0\) and \(v\in [0,1]\) .

It is obvious that (iii) implies (ii). The mean \(m:=m_{1/2}\) is called the associated symmetric mean of \(m_{v}\) . It is not hard to check that if \(m_{v}\) is a weighted mean then so is \(m_{v}^{*}\) .

The standard weighted means are recalled in the following. The weighted arithmetic mean \(a\nabla _{v}b=(1-v)a+vb\) , the weighted geometric mean \(a\sharp _{v}b=a^{1-v}b^{v}\) , and the weighted harmonic mean \(a!_{v}b= ((1-v)a^{-1}+vb^{-1} )^{-1}\) . For \(v=1/2\) , they coincide with \(a\nabla b\) , \(a\sharp b\) , and \(a!b\) , respectively. These weighted means satisfy

\(a!_{v}b\leq a\sharp _{v}b\leq a\nabla _{v}b \qquad (1.2)\)

for any \(a,b>0\) and \(v\in [0,1]\) . These weighted means are all strict provided \(v\in (0,1)\) .
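
The weighted chain \(a!_{v}b \leq a\sharp _{v}b \leq a\nabla _{v}b\) satisfied by these three means can be checked over a grid of weights:

```python
def w_arith(a, b, v): return (1 - v) * a + v * b         # weighted arithmetic
def w_geom(a, b, v): return a ** (1 - v) * b ** v        # weighted geometric
def w_harm(a, b, v): return 1 / ((1 - v) / a + v / b)    # weighted harmonic

a, b = 3.0, 12.0
grid = [i / 10 for i in range(11)]
chain_holds = all(w_harm(a, b, v) <= w_geom(a, b, v) <= w_arith(a, b, v)
                  for v in grid)
print(chain_holds)
```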

1.2 Two weighted means

Recently, Pal et al. [ 6 ] introduced a class of operator monotone functions from which they deduced other weighted means, namely the weighted logarithmic mean defined by

and the weighted identric mean given by

One has \(L_{0}(a,b):=\lim_{v\downarrow 0}L_{v}(a,b)=a\) and \(L_{1}(a,b):=\lim_{v\uparrow 1}L_{v}(a,b)=b\) , with similar equalities for \(I_{v}(a,b)\) . One can see that \(L_{v}\) and \(I_{v}\) satisfy conditions (i), (ii), and (iii). For \(v=1/2\) , they coincide with \(L(a,b)\) and \(I(a,b)\) , respectively.

It has been shown in [ 6 , Theorem 2.4, Theorem 3.1] that the inequalities

hold for any \(a,b>0\) and \(v\in [0,1]\) .

In other work, Furuichi and Minculete [ 3 ] gave a systematic study from which they obtained many mean inequalities involving \(L_{v}(a,b)\) and \(I_{v}(a,b)\) . Some of their inequalities are refinements and reverses of ( 1.5 ) and ( 1.6 ).

The outline of this paper is organized as follows: In Sect.  2 we give simple forms for \(L_{v}(a,b)\) and \(I_{v}(a,b)\) , and mean inequalities are obtained in a fast and nice way. We also deduce two other weighted means from \(L_{v}(a,b)\) and \(I_{v}(a,b)\) . Section  3 is devoted to investigating a general approach in service of weighted means. We then obtain more weighted means in another point of view. Section  4 displays the operator version of the previous weighted means as well as their related inequalities. In Sect.  5 we recall the standard power means known in the literature, and we use, in Sect.  6 , our approach for obtaining some new weighted means associated with the previous power means.

2 Another point of view for defining \(L_{v}(a,b)\) and \(I_{v}(a,b)\)

We preserve the same notations as in the previous section. The expressions ( 1.3 ) and especially ( 1.4 ) seem hard to handle in a computational context. We will see that we can rewrite them in other forms having convex characters.

2.1 Simple forms of \(L_{v}(a,b)\) and \(I_{v}(a,b)\)

The key idea of this section turns out to be the following result.

Theorem 2.1

For any \(a,b>0\) and \(v\in [0,1]\) , we have

Starting from the middle expression of ( 2.1 ) and using the definition of \(L(a,b)\) and \(a\sharp _{v}b\) , we get the desired result after simple algebraic manipulations as follows:

In a similar way we get ( 2.2 ) as

In what follows we will see that inequalities ( 1.5 ) and ( 1.6 ) can be immediately deduced from ( 2.1 ) and ( 2.2 ), respectively. In fact we will prove more.

Theorem 2.2

Let \(a,b>0\) and \(v\in [0,1]\) . Then we have

The two right inequalities of ( 2.3 ) are those of ( 1.5 ). We will prove them again by using ( 2.1 ). Indeed, ( 2.1 ) with the help of ( 1.1 ), and then ( 1.2 ) yields

We now prove the two left inequalities of ( 2.3 ). Again, ( 2.1 ) with ( 1.1 ) and then ( 1.2 ) implies that

We leave to the reader the routine task of proving ( 2.4 ) in a similar manner. □

From ( 2.1 ) and ( 2.2 ) we immediately see that \(L_{v}(a,b)=L_{1-v}(b,a)\) and \(I_{v}(a,b)=I_{1-v}(b,a)\) for any \(a,b>0\) and \(v\in [0,1]\) .
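
The displayed formula of Theorem 2.1 does not survive extraction above, but the surrounding proofs use \(L_{v}\) as a convex combination around the weighted geometric mean; assuming the form \(L_{v}(a,b)=(1-v)L(a\sharp _{v}b,a)+vL(a\sharp _{v}b,b)\), the symmetry \(L_{v}(a,b)=L_{1-v}(b,a)\) and the recovery \(L_{1/2}=L\) can be checked numerically:

```python
import math

def L(a, b):
    # logarithmic mean, with L(a, a) = a
    return a if a == b else (b - a) / (math.log(b) - math.log(a))

def w_geom(a, b, v): return a ** (1 - v) * b ** v

def L_v(a, b, v):
    # assumed simple form: convex combination around a #_v b
    g = w_geom(a, b, v)
    return (1 - v) * L(g, a) + v * L(g, b)

a, b = 2.0, 9.0
print(L_v(a, b, 0.3), L_v(b, a, 0.7))  # symmetry L_v(a,b) = L_{1-v}(b,a)
print(L_v(a, b, 0.5), L(a, b))         # v = 1/2 recovers L
```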

From ( 2.1 ) and ( 2.2 ), it is immediate to see that \(L_{v}(a,b)\) and \(I_{v}(a,b)\) are binary means in the sense that they satisfy the conditions itemized in [ 6 ].

Inequalities ( 2.3 ) and ( 2.4 ) give alternative simple proofs for [ 3 , Corollary 2.2] and [ 3 , Corollary 2.3], respectively.

In order to emphasize even more the importance of ( 2.1 ) and ( 2.2 ), we present below more results. These results investigate some inequalities refining the right inequalities in ( 2.3 ) and ( 2.4 ). We need the following lemma.

Let \(a>0\) be fixed . Then the real functions \(x\longmapsto L(a,x)\) and \(x\longmapsto I(x,a)\) are ( strictly ) concave for \(x>0\) .

It is a simple exercise of real analysis. □

Theorem 2.5

We prove the first inequality in ( 2.5 ). Since the map \(x\longmapsto L(a\sharp _{v}b,x)\) is concave for \(x>0\) , then ( 2.1 ) yields

The second and third inequalities of ( 2.5 ) follow from ( 1.1 ).

To prove the second inequality of ( 2.6 ), we write by using the previous lemma

This, with the fact that \(I(a,b)\geq a\sharp b\) , implies that \(I (a\nabla _{v}b,a )\geq a\nabla _{v}(a\sharp b)\) . Similarly, we show that \(I (a\nabla _{v}b,b )\geq (a\sharp b)\nabla _{v}b\) . This, with ( 2.2 ), yields the second inequality of ( 2.6 ). To prove the first inequality of ( 2.6 ), we write

after a simple computation. The proof is finished. □

2.2 Two other weighted means

A natural question arises from the previous subsection: do we have a weighted mean \(M_{v}(a,b)\) such that

We can also put the following question: do we have a weighted mean \(P_{v}(a,b)\) such that

In what follows we will answer the two preceding questions. Recall that \(L^{*}\) denotes the dual of the logarithmic mean L , and \(L_{v}^{*}\) is the dual of the weighted logarithmic mean \(L_{v}\) , as previously defined. Similar sentence for \(I^{*}\) and \(I_{v}^{*}\) . We will establish the following result.

Theorem 2.6

We can of course assume that \(v\in (0,1)\) . If in ( 2.1 ) we replace a and b with \(a^{-1}\) and \(b^{-1}\) , respectively, then we get

Taking the inverses side by side and using the definition of the weighted harmonic mean, we infer that

Now, let us set

If \(v\in (0,1)\) is fixed, for any \(a>0\) and \(x>0\) , it is easy to see that there exists a unique \(b>0\) such that \(a\sharp _{v}b=x\) . This means that M is well defined by ( 2.12 ). Further, if we remark that \(a\sharp _{v}b=x\) implies \(x^{-1}=a^{-1}\sharp _{v}b^{-1}\) , then ( 2.12 ) becomes

It follows that M is the dual of the logarithmic mean L . Following ( 2.11 ) and ( 2.7 ), the associated weighted mean \(M_{v}\) of M is such that

i.e. \(M_{v}(a,b)\) is the dual of the weighted logarithmic mean \(L_{v}\) . We leave to the reader the task of proving ( 2.10 ) in a similar manner. □

After this, let us consider the following question: is \(L_{v}\) the unique weighted mean satisfying ( 2.1 )? In the next section, we answer this question from a general point of view. A similar question can be asked for ( 2.2 ), ( 2.9 ), and ( 2.10 ).

3 Weighted means in a general point of view

As already pointed out, we investigate here a study that shows how to construct some weighted means from a general point of view.

3.1 Position of the problem

Let \(a,b>0\) and \(v\in [0,1]\) . Let \(p_{v}\) and \(q_{v}\) be two weighted means. We write \(ap_{v}b:=p_{v}(a,b)\) and \(aq_{v}b:=q_{v}(a,b)\) for the sake of simplicity. As previously, \(p:=p_{1/2}\) and \(q:=q_{1/2}\) , and we write \(apb:=p(a,b)\) and \(aqb=q(a,b)\) . To fix ideas, we can first choose \(p_{v}\) and \(q_{v}\) among the three standard weighted means, i.e. \(ap_{v}b,aq_{v}b\in \{a\nabla _{v}b,a\sharp _{v}b,a!_{v}b \}\) .

Our general problem reads as follows: do we have a weighted mean \(M_{v}(a,b)\) such that

To answer this question, it is in fact enough to justify that there exists one and only one (symmetric) mean M such that

Indeed, \(p_{v}(a,b)\) and \(q_{v}(a,b)\) are given. Once the (symmetric) mean M is found, we obtain \(M_{v}(a,b)\) by substituting \(M(a,b)\) in ( 3.1 ).

Note that if \(ap_{v}b,aq_{v}b\in \{a\nabla _{v}b,a\sharp _{v}b,a!_{v}b \}\) , then we have nine cases. Theorem  2.1 answers the previous question for \((p_{v},q_{v})=(\sharp _{v},\nabla _{v})\) and \((p_{v},q_{v})=(\nabla _{v},\sharp _{v})\) , while Theorem  2.6 answers the question for \((p_{v},q_{v})=(\sharp _{v},!_{v})\) and \((p_{v},q_{v})=(!_{v},\sharp _{v})\) .

Our aim here is to answer the previous question in its general form. We need to recall some notions and results as background material that we will summarize in the next subsection.

3.2 Stable and stabilizable means

We recall here in short the concept of stable and stabilizable means introduced in [ 7 , 8 , 10 ]. Let \(m_{1}\) , \(m_{2}\) , and \(m_{3}\) be three given symmetric means. For all \(a,b>0\) , the resultant mean map of \(m_{1}\) , \(m_{2}\) , and \(m_{3}\) is defined by [ 7 ]

A symmetric mean m is called stable if \({\mathcal{R}}(m,m,m)=m\) and stabilizable if there exist two nontrivial stable means \(m_{1}\) and \(m_{2}\) such that \({\mathcal{R}}(m_{1},m,m_{2})=m\) . We then say that m is \((m_{1},m_{2})\) -stabilizable. If m is stable, then so is \(m^{*}\) , and if m is \((m_{1},m_{2})\) -stabilizable, then \(m^{*}\) is \((m_{1}^{*},m_{2}^{*})\) -stabilizable. The tensor product of \(m_{1}\) and \(m_{2}\) is the map, denoted \(m_{1}\otimes m_{2}\) , defined by

A symmetric mean m is called cross mean if the map \(m^{\otimes 2}:=m\otimes m\) is symmetric in its four variables. Every cross mean is stable, see [ 7 ], and the converse is still an open problem.

It is worth mentioning that the operator version of the previous concepts as well as their related results has been investigated in a detailed manner in [ 11 ]. It has been proved there that every cross operator mean is stable but the converse does not in general hold provided that the Hilbert operator space is of dimension greater than 2.

The following results will be needed later, see [ 7 , 8 , 10 ].

Theorem 3.1

The arithmetic , geometric , and harmonic means are cross means , and so they are stable .

The logarithmic mean L is \((!,\nabla )\) - stabilizable and \((\nabla ,\sharp )\) - stabilizable , while the identric mean I is \((\sharp ,\nabla )\) - stabilizable .

The mean \(L^{*}\) is \((\nabla ,!)\) - stabilizable and \((!,\sharp )\) - stabilizable , while \(I^{*}\) is \((\sharp ,!)\) - stabilizable .

For more examples and properties about stable and stabilizable means, we can consult [ 7 , 8 , 10 , 11 ]. See also Sect.  5 .
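
The displayed formula of the resultant mean-map does not survive extraction above; in [ 7 ] it reads \({\mathcal{R}}(m_{1},m_{2},m_{3})(a,b)=m_{1} (m_{2}(a,m_{3}(a,b)),m_{2}(m_{3}(a,b),b) )\). Assuming that form, the stability of \(\nabla\) and the two stabilizability statements for L in Theorem 3.1 can be checked numerically:

```python
import math

def A(a, b): return (a + b) / 2
def G(a, b): return math.sqrt(a * b)
def Hm(a, b): return 2 * a * b / (a + b)
def L(a, b): return a if a == b else (b - a) / (math.log(b) - math.log(a))

def resultant(m1, m2, m3):
    # R(m1, m2, m3)(a, b) = m1( m2(a, m3(a,b)), m2(m3(a,b), b) )
    def r(a, b):
        c = m3(a, b)
        return m1(m2(a, c), m2(c, b))
    return r

a, b = 2.0, 8.0
print(resultant(A, A, A)(a, b), A(a, b))    # A is stable: R(A, A, A) = A
print(resultant(A, L, G)(a, b), L(a, b))    # L is (A, G)-stabilizable
print(resultant(Hm, L, A)(a, b), L(a, b))   # L is (H, A)-stabilizable
```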

Theorem 3.2

Let \(m_{1}\) and \(m_{2}\) be two symmetric means such that \(m_{1}\leq m_{2}\) ( resp . \(m_{2}\leq m_{1}\) ). Assume that \(m_{1}\) is a strict cross mean . Then there exists one and only one \((m_{1},m_{2})\) - stabilizable mean m such that \(m_{1}\leq m\leq m_{2}\) ( resp . \(m_{2}\leq m\leq m_{1}\) ).

3.3 The main result

Now, we are in the position to answer our previous question as recited in the following result.

Theorem 3.3

Let \(a,b>0\) and \(v\in [0,1]\) . Let \(p_{v}\) and \(q_{v}\) be two weighted means such that \(p:=p_{1/2}\) and \(q:=q_{1/2}\) are stable . Assume that q is a strict cross mean . Then there exists one and only one weighted mean \(M_{v}(a,b)\) such that ( 3.1 ) holds . Further , \(M:=M_{1/2}\) is the unique \((q,p)\) - stabilizable mean .

As pointed out before, it is enough to consider ( 3.2 ). Following the previous subsection, ( 3.2 ) can be written as

This means that M is \((q,p)\) -stabilizable. According to Theorem  3.2 , such M exists and is unique. Since \(p_{v}\) and \(q_{v}\) are given, we then deduce the existence and uniqueness of \(M_{v}\) satisfying ( 3.1 ). The proof is finished. □

Following Theorem  3.1 , the symmetric means \(a\nabla b\) , \(a\sharp b\) , \(a!b\) are cross means, and so they are stable. From the preceding theorem we immediately deduce the following corollary.

Corollary 3.4

If \(p_{v},q_{v}\in \{\nabla _{v},\sharp _{v},!_{v}\}\) , then we have the same conclusion as in the previous theorem .

The condition \(p_{v},q_{v}\in \{\nabla _{v},\sharp _{v},!_{v}\}\) includes exactly nine cases. The following examples discuss these cases in detail.

Example 3.5

Assume that \((p_{v},q_{v})=(\sharp _{v},\nabla _{v})\) . Theorem  3.3 implies that M is the unique \((\nabla ,\sharp )\) -stabilizable mean, and so Theorem  3.1 gives \(M=L\) . The related weighted mean is the weighted logarithmic mean \(L_{v}\) given by ( 2.1 ).

Assume that \((p_{v},q_{v})=(\nabla _{v},\sharp _{v})\) . In a similar way as previously, \(M=I\) and \(M_{v}=I_{v}\) is given by ( 2.2 ).

Similarly, if \((p_{v},q_{v})=(\sharp _{v},!_{v})\) , then \(M=L^{*}\) and \(M_{v}\) is given by ( 2.9 ). If \((p_{v},q_{v})=(!_{v},\sharp _{v})\) , then \(M=I^{*}\) and \(M_{v}\) is given by ( 2.10 ).

Example 3.6

For the three cases \(p_{v}=q_{v}\in \{\nabla _{v},\sharp _{v},!_{v}\}\) , it is not hard to check that \(M_{v}=p_{v}=q_{v}\) , and so \(M=p=q\) . We can show this separately for every case by checking ( 3.1 ) or use Theorem  3.3 when combined with Theorem  3.1 . The details are immediate and therefore omitted here for the reader.

We have two cases left to see, namely \((p_{v},q_{v})=(\nabla _{v},!_{v})\) and \((p_{v},q_{v})=(!_{v},\nabla _{v})\) , which we discuss in the two following examples, respectively.

Example 3.7

Assume that \((p_{v},q_{v})=(\nabla _{v},!_{v})\) . Following Theorem  3.3 , M is the unique \((!,\nabla )\) -stabilizable mean and by Theorem  3.1 one has \(M=L\) . The related weighted mean \(M_{v}\) is given by

By construction, we have \(M=L\) the logarithmic mean. From ( 3.3 ) we can check again that \(M:=M_{1/2}=L\) . In other words, the weighted mean \(M_{v}(a,b)\) defined by ( 3.3 ) is a second weighted logarithmic mean which we denote by \({\mathcal{L}}_{v}\) . Its explicit form is given by

Example 3.8

Assume that \((p_{v},q_{v})=(!_{v},\nabla _{v})\) . In a similar way as previously, we show that the associated mean M is here given by \(M=L^{*}\) the dual logarithmic mean. The associated weighted mean \(M_{v}\) is defined by

Also, from this latter relation we can verify that \(M:=M_{1/2}=L^{*}\) and \(M_{v}={\mathcal{L}}_{v}^{*}\) , with \({\mathcal{L}}_{v}^{*}(a,a)=a\) and

The previous examples are summarized in Table  1 .

4 Operator version

The operator version of the previous weighted means as well as their related operator inequalities have been also discussed in [ 6 ]. By using their approach for operator monotone functions and referring to the Kubo–Ando theory [ 5 ], they studied the analogues of \(L_{v}(a,b)\) and \(I_{v}(a,b)\) when the positive real numbers a and b are replaced with positive invertible operators.

Here, and with ( 2.1 ) and ( 2.2 ), we do not need any more tools for giving in an explicit setting the operator versions of \(L_{v}(a,b)\) and \(I_{v}(a,b)\) . Before exploring this, let us recall a few basic notions about operator means.

Let H be a complex Hilbert space, and let \({\mathcal{B}}(H)\) be the \(\mathbb{C}^{*}\) -algebra of bounded linear operators acting on H . The notation \({\mathcal{B}}^{+*}(H)\) refers to the open cone of all (self-adjoint) positive invertible operators in \({\mathcal{B}}(H)\) . As usual, the notation \(A\leq B\) means that \(A,B\in {\mathcal{B}}(H)\) are self-adjoint and \(B-A\) is positive semi-definite. A real-valued function f on a nonempty interval J of \({\mathbb{R}}\) is said to be operator monotone if and only if \(A\leq B\) implies \(f(A)\leq f(B)\) for self-adjoint operators A and B whose spectra satisfy \(\sigma (A),\sigma (B)\subset J\) . As usual, \(f(A)\) is defined by the techniques of functional calculus. For further details about operator monotone functions, we can consult [ 1 , 4 , 12 , 13 ] and the related references cited therein. Some examples of operator monotone functions are considered in what follows.

Following the Kubo–Ando theory [ 5 ], there is a one-to-one correspondence between operator means and operator monotone functions. More precisely, an operator mean m in the Kubo–Ando sense is such that

for some positive monotone increasing function \(f_{m}\) on \((0,\infty )\) . The function \(f_{m}\) in ( 4.1 ) is called the representing function of the operator mean m . An operator mean in the Kubo–Ando sense is called operator monotone mean.

Let \(A,B\in {\mathcal{B}}^{+*}(H)\) and \(v\in [0,1]\) . As standard examples of operator monotone means, the following

are known in the literature as the weighted arithmetic mean, the weighted geometric mean, and the weighted harmonic mean of A and B , respectively. If \(v=1/2\) , they are simply denoted by \(A\nabla B\) , \(A\sharp B\) , and \(A!B\) , respectively. The previous operator means satisfy the following double inequality:

The weighted logarithmic mean and the weighted identric mean of A and B can be, respectively, defined through:

For \(v=1/2\) , they are simply denoted by \(L(A,B)\) and \(I(A,B)\) , respectively. From ( 1.1 ), with the help of ( 4.1 ), we can immediately see that the chain of inequalities

is also valid for any \(A,B\in {\mathcal{B}}^{+*}(H)\) and \(v\in [0,1]\) .

For the sake of information, the logarithmic mean \(L(A,B)\) previously defined can be also alternatively given by one of the following integral forms:

It is worth mentioning that ( 4.3 ) and ( 4.4 ) define \(L_{v}(A,B)\) and \(I_{v}(A,B)\) only in a theoretical context. Giving explicit forms of \(L_{v}(A,B)\) and \(I_{v}(A,B)\) , analogous to those of ( 1.3 ) and ( 1.4 ), by using ( 4.3 ) and ( 4.4 ) appears not to be obvious, and no result has been reached in this way. However, according to Theorem  2.1 with ( 4.1 ), we immediately deduce the following.

Theorem 4.1

For any \(A,B\in {\mathcal{B}}^{+*}(H)\) and \(v\in [0,1]\) , we have

Since all the involved operators in ( 4.6 ) and ( 4.7 ) are operator means in the sense of ( 4.1 ), then by Theorem  2.2 we immediately deduce the following result as well.

Theorem 4.2

Let \(A,B\in {\mathcal{B}}^{+*}(H)\) and \(v\in [0,1]\) . Then we have

The operator inequalities ( 4.8 ) and ( 4.9 ) recover Theorem 2.4 and Theorem 3.2 of [ 6 ], respectively.

By the same arguments as previous, the operator version of Theorem  2.5 is immediately given in the following statement.

Theorem 4.4

5 Power symmetric means

This section deals with some weighted means for power symmetric means in one or two parameters. Let \(a,b>0\) and p , q be two real numbers. We recall the following:

• The power binomial mean defined by

This includes the particular cases \(B_{1}(a,b)=a\nabla b\) , \(B_{0}(a,b):=\lim_{p\rightarrow 0}B_{p}(a,b)=a \sharp b\) and \(B_{-1}(a,b)=a!b\) . Note that \(B_{1/2}(a,b)=(a\nabla b)\nabla (a\sharp b)\) .
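
The displayed definition above is lost to extraction, but the particular cases listed are consistent with the standard form \(B_{p}(a,b)= ((a^{p}+b^{p})/2 )^{1/p}\), with \(B_{0}\) the geometric-mean limit (an assumed reconstruction); a numerical sketch:

```python
import math

def B(p, a, b):
    # power binomial mean; p = 0 is the geometric-mean limit
    if p == 0:
        return math.sqrt(a * b)
    return ((a ** p + b ** p) / 2) ** (1 / p)

a, b = 2.0, 8.0
print(B(1, a, b), B(0, a, b), B(-1, a, b))          # arithmetic, geometric, harmonic
print(B(0.5, a, b), (B(1, a, b) + B(0, a, b)) / 2)  # B_{1/2} = (a∇b)∇(a#b)
```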

• The power logarithmic mean defined by

We have \(L_{-2}(a,b)=a\sharp b\) , \(L_{-1}(a,b)=L(a,b)\) , \(L_{0}(a,b)=I(a,b)\) , \(L_{1}(a,b)=a \nabla b\) .

• The power difference mean given by

In particular, \(D_{-2}(a,b)=a!b\) , \(D_{-1}(a,b)=L^{*}(a,b)\) , \(D_{-1/2}(a,b)=a\sharp b\) , \(D_{0}(a,b)=L(a,b)\) , and \(D_{1}(a,b)=a\nabla b\) .
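
The displayed formula is again missing; the standard closed form \(D_{p}(a,b)=\frac{p}{p+1}\cdot \frac{a^{p+1}-b^{p+1}}{a^{p}-b^{p}}\) (an assumed reconstruction, with \(p=0\) and \(p=-1\) understood as limits) reproduces every special case listed:

```python
import math

def L(a, b): return a if a == b else (b - a) / (math.log(b) - math.log(a))

def D(p, a, b):
    # power difference mean; p = 0 and p = -1 are limiting cases
    return (p / (p + 1)) * (a ** (p + 1) - b ** (p + 1)) / (a ** p - b ** p)

a, b = 2.0, 8.0
print(D(1, a, b))                            # arithmetic mean
print(D(-0.5, a, b))                         # geometric mean
print(D(-2, a, b))                           # harmonic mean
print(D(1e-8, a, b), L(a, b))                # D_0 -> L
print(D(-1 + 1e-8, a, b), a * b / L(a, b))   # D_{-1} -> L* = G^2 / L
```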

• The power exponential mean defined as

As special cases, \(I_{-1}(a,b)=I^{*}(a,b)\) , \(I_{0}(a,b)=a\sharp b\) , and \(I_{1}(a,b)=I(a,b)\) .

• The second power logarithmic mean defined through

In particular, \({\mathcal{L}}_{-1}(a,b)=L^{*}(a,b)\) , \({\mathcal{L}}_{0}(a,b)=a\sharp b\) , and \({\mathcal{L}}_{1}(a,b)=L(a,b)\) .

• The previous power means are included in the so-called Stolarsky mean \(S_{p,q}\)

in the sense that

All the previous power means are symmetric in a and b . Also, remark that \(S_{p,q}\) is symmetric in p and q . On the other hand, the power binomial mean \(B_{p}\) is stable for any \(p\in {\mathbb{R}}\) and the following result holds, see [ 9 ].

Theorem 5.1

For any \(p,q\in {\mathbb{R}}\) , the Stolarsky mean \(S_{p,q}\) is \((B_{q-p},B_{p} )\) - stabilizable .

The previous theorem, when combined with ( 5.7 ) and a simple argument of continuity, immediately implies the following, see also [ 7 ].

Corollary 5.2

For all real number p , the following assertions hold :

The power mean \(L_{p}\) is \((B_{p},\nabla )\) - stabilizable , while \(D_{p}\) is \((\nabla ,B_{p})\) - stabilizable .

The power mean \(I_{p}\) is \((\sharp ,B_{p})\) - stabilizable , while \({\mathcal{L}}_{p}\) is \((B_{p},\sharp )\) - stabilizable .

Now, let us observe the following remark which is of interest.

Since \(S_{p,q}=S_{q,p}\) , we can also say that \(S_{p,q}\) is \((B_{p-q},B_{q} )\) -stabilizable. This, with ( 5.7 ), implies also that \(L_{p}\) is \((B_{-p},B_{p+1})\) -stabilizable, \(D_{p}\) is \((!,B_{p+1})\) -stabilizable, \({\mathcal{L}}_{p}\) is \((B_{-p},B_{p})\) -stabilizable, and nothing new for \(I_{p}\) . Obviously, (i) and (ii) of Corollary  5.2 are simpler than these latter statements.

6 Some new weighted power means

In this section we investigate the weighted means of the previous power means. The weighted power binomial mean can be immediately given by

This, with the results presented in the preceding section, will allow us to construct some new weighted power means. Recall that \(m_{v}\) is called a weighted mean if it satisfies the conditions: \(m_{v}\) is a mean for any \(v\in [0,1]\) , \(m:=m_{1/2}\) is a symmetric mean, and \(m_{v}(a,b)=m_{1-v}(b,a)\) for any \(a,b>0\) and \(v\in [0,1]\) . We then say that \(m_{v}\) is an m -weighted mean and m is the symmetric mean of \(m_{v}\) . It is obvious that for any weighted mean \(m_{v}\) its associated symmetric mean \(m:=m_{1/2}\) is unique. However, for a given symmetric mean m , we can have two m -weighted means \(m_{v}\) and \(l_{v}\) , i.e. \(m_{1/2}=l_{1/2}=m\) . For more explanation about this latter situation, see the examples below.
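
The displayed formula for the weighted power binomial mean does not survive above; a natural candidate is \(B_{p;v}(a,b)= ((1-v)a^{p}+vb^{p} )^{1/p}\) for \(p\neq 0\) (an assumption of this sketch), which can at least be checked against the weighted-mean conditions just recalled:

```python
import math

def B_w(p, v, a, b):
    # assumed weighted power binomial mean, p != 0
    return ((1 - v) * a ** p + v * b ** p) ** (1 / p)

a, b, p = 3.0, 11.0, 0.7
grid = [i / 8 for i in range(9)]
is_mean = all(min(a, b) <= B_w(p, v, a, b) <= max(a, b) for v in grid)  # (i)
is_weighted = all(math.isclose(B_w(p, v, a, b), B_w(p, 1 - v, b, a))
                  for v in grid)                                        # (iii)
print(is_mean, is_weighted)
```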

In a general context, we have the following result.

Theorem 6.1

Let m and l be two stable means , and let M be \((l,m)\) - stabilizable . Let \(m_{v}\) and \(l_{v}\) be the m - weighted mean and the l - weighted mean , respectively . Then the following

is an M - weighted mean .

It is straightforward. The details are simple and therefore omitted here for the reader. □

Applying the previous simple result to the preceding power means, we immediately obtain their associated weighted power means. We present these in the following examples. We begin by the \(S_{p,q}\) -weighted mean, and we then deduce the other weighted power means as particular cases.

Example 6.2

By Theorem  5.1 , \(S_{p,q}\) is \((B_{q-p},B_{p})\) -stabilizable. By Theorem  6.1 , an \(S_{p,q}\) -weighted mean is given by

Utilizing ( 5.6 ) with ( 5.1 ), the explicit form of \(S_{p,q;v}(a,b)\) is given by (for \(a\neq b\) )

provided that \(p\neq 0\) , \(q\neq 0\) , \(p\neq q\) , where we write \(B_{p;v}:=B_{p;v}(a,b)\) for simplifying the writing. The three cases \(p=0\) , \(q=0\) , and \(p=q\) will be presented later.

Example 6.3

Since \(S_{p,q}\) is also \((B_{p-q},B_{q})\) -stabilizable, see Remark  5.3 , another \(S_{p,q}\) -weighted mean is given by

or, in an explicit form, if \(p\neq 0\) , \(q\neq 0\) , \(p\neq q\) , and \(a\neq b\) ,

Example 6.4

By Corollary  5.2 , \(L_{p}\) is \((B_{p},\nabla )\) -stabilizable. By Theorem  6.1 , the \(L_{p}\) -weighted mean is given by

By ( 5.1 ), ( 5.2 ), and ( 5.6 ), or just using the relation \(L_{p}=S_{1,p+1}\) with ( 6.2 ), we obtain the explicit form of \(L_{p;v}\) :

Similarly, since \(D_{p}\) is \((\nabla ,B_{p})\) -stabilizable, we then deduce that the \(D_{p}\) -weighted mean is given by

or in an explicit form, with \(B_{p;v}:=B_{p;v}(a,b)\) ,

Example 6.5

By similar arguments as in the previous examples, we obtain:

The \({\mathcal{L}}_{p}\) -weighted mean is defined by

or in an explicit form

The \(I_{p}\) -weighted mean is given by

We leave to the reader the task of giving the explicit form of this latter weighted power mean.

Availability of data and materials

Not applicable.

Besenyei, A., Petz, D.: Completely positive mappings and mean matrices. Linear Algebra Appl. 435 , 984–997 (2011)

Bullen, P.S.: Handbook of Means and Their Inequalities. Kluwer Academic, Dordrecht (2003)

Furuichi, S., Minculete, N.: Refined inequalities on the weighted logarithmic mean. J. Math. Inequal. 14 , 1347–1357 (2020)

Izumino, S., Nakamura, N.: Elementary proofs of operator monotonicity of some functions. Sci. Math. Jpn. Online e-2013 , 679–686

Kubo, F., Ando, T.: Means of positive linear operators. Math. Ann. 246 , 205–224 (1980)

Pal, R., Singh, M., Moslehian, M.S., Aujla, J.S.: A new class of operator monotone functions via operator means. Linear Multilinear Algebra 64 (12), 2463–2473 (2016)

Raïssouli, M.: Stability and stabilizability for means. Appl. Math. E-Notes 11 , 159–174 (2011)

Raïssouli, M.: Refinements for mean-inequalities via the stabilizability concept. J. Inequal. Appl. 2012 , 55 (2012)

Raïssouli, M.: Stabilizability of the Stolarsky mean and its approximation in terms of the power binomial mean. Int. J. Math. Anal. 6 (18), 871–881 (2012)

Raïssouli, M.: Positive answer for a conjecture about stabilizable means. J. Inequal. Appl. 2013 , 467 (2013)

Raïssouli, M.: Stable and stabilizable means with linear operator variables. Linear Multilinear Algebra 62 (9), 1153–1168 (2014)

Udagawa, Y.: Operator monotonicity of a 2-parameter family of functions and \(\exp \{ f(x) \} \) related to the Stolarsky mean. Oper. Matrices 11 (2), 519–532 (2017)

Udagawa, Y., Wada, S., Yamazaki, T., Yanagida, M.: On a family of operator means involving the power difference means. Linear Algebra Appl. 485 , 124–131 (2015)

Download references

Acknowledgements

The authors would like to thank the referees for their careful and insightful comments to improve our manuscript.

SF was partially supported by JSPS KAKENHI Grant Number 16K05257.

Author information

Authors and affiliations.

Department of Mathematics, Science Faculty, Taibah University, P.O. Box 30097, Al Madinah Al Munawwarah, Zip Code 41477, Saudi Arabia

Mustapha Raïssouli

Department of Mathematics, Science Faculty, Moulay Ismail University, Meknes, Morocco

Department of Information Science, College of Humanities and Sciences, Nihon University, 3-25-40, Sakurajyousui, Setagaya-ku, Tokyo, 156-8550, Japan

Shigeru Furuichi

Contributions

The work presented here was carried out in collaboration between all authors. The study was initiated by the first author. The second author played the role of the corresponding author. All authors contributed equally and significantly in writing this article. All authors have read and approved the final manuscript.

Corresponding author

Correspondence to Shigeru Furuichi .

Ethics declarations

Competing interests.

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

Reprints and permissions

About this article

Cite this article.

Raïssouli, M., Furuichi, S. On weighted means and their inequalities. J Inequal Appl 2021 , 65 (2021). https://doi.org/10.1186/s13660-021-02589-9

Download citation

Received : 08 June 2020

Accepted : 18 March 2021

Published : 06 April 2021

DOI : https://doi.org/10.1186/s13660-021-02589-9

  • Weighted means
  • Weighted operator means and operator inequalities

The Significance of Duration Weighted Neighborhood Effects for Violent Behavior and Explanation of Ethnoracial Differences

  • Original Paper
  • Open access
  • Published: 07 May 2024

  • Paul E. Bellair   ORCID: orcid.org/0000-0002-5467-6001 1 ,
  • Thomas L. McNulty 2 &
  • Daniel L. Carlson 3  

Two important issues constrain the neighborhood effects literature. First, most prior research examining neighborhood effects on aggression and self-reported violence uses a point in time (i.e., cross-sectional) estimate of neighborhood disadvantage even though the duration of exposure to neighborhood disadvantage varies between families. Second, neighborhood effects may be understated due to over-controlling for family socioeconomic conditions. Both limitations suggest that prior research may be underestimating neighborhood effects, which impacts research on the invariance thesis and explanation of ethnoracial differences.

The sample is drawn from the restricted use Future of Families and Child Well-being study. Data to measure youth’s exposure to neighborhood disadvantage are drawn from birth through age 9, with dependent variables measured at age 15. We estimate marginal structural models (MSM) with inverse probability of treatment weights (IPTW).

The results support the hypotheses, indicating that the duration weighted measure of neighborhood disadvantage is more strongly associated with aggression and self-reported violence than the point in time measure, and that it accounts for a larger share of the ethnoracial differences.

Conclusions

The findings provide a clear image of the consequences of long-term exposure to neighborhood disadvantage for aggression and violence. They suggest that criminologists addressing neighborhood effects should attempt, when feasible, to document and model the duration of exposure to neighborhood disadvantage. They are also consistent with and add to a growing literature addressing MSM modeling with IPTW weights.


Introduction

The literature documenting neighborhood effects on aggression and self-reported violence is constrained by at least two issues. First, prior studies likely understate the magnitude of neighborhood effects because they employ a point in time estimate of neighborhood disadvantage. As such, those studies do not capture cumulative, longer-term consequences of exposure to neighborhood disadvantage. Analytically, this means that adolescents experiencing short-term exposure to neighborhood disadvantage are treated as conceptually equivalent to those who have been exposed across a larger segment of their lives, particularly during childhood. Previous research finds that neighborhood effects on health, sex risk, educational, and life course outcomes are more pronounced when neighborhood effects are weighted to reflect the duration of the exposure (Kravitz-Wirtz 2016a , b ; Wodtke 2013 ; Wodtke et al. 2011 ). However, prior research has not addressed criminological outcomes such as self-reported violence or focused more specifically on aggression. Second, neighborhood effects research has characteristically controlled for family socioeconomic characteristics, especially family structure and income, for instance (see Baumer and South  2001 ; Brewster 1994 ; Browning et al. 2004 ; Cleveland and Gilson 2004 ; Kim 2010 ; Santelli et al. 2000 ). Wodtke et al. ( 2011 ) illustrate that these conventional regression models overcontrol for family SES and introduce collider–stratification bias (Elwert and Winship 2014 ; Greenland 2003 ). Wodtke et al. ( 2011 ) advocate utilizing marginal structural modeling (MSM) with inverse probability of treatment weights (IPTW), which corrects for the tendency to underestimate neighborhood effects.

Utilizing MSM with IPTW is also important when addressing ethnoracial differences in violent behavior. Shaw and McKay ( 1942 ) were among the first to document racial and ethnic differences in delinquency and to argue that they are a function of disadvantaged neighborhood conditions. Sampson and Wilson ( 1995 ) extend this logic and propose the racial invariance thesis, which hypothesizes that racial/ethnic disparities in violence reflect relative group-specific exposure to neighborhood structures, particularly concentrated disadvantage. Research addressing ethnoracial differences is built upon “point in time” (i.e., cross-sectional) measures of concentrated disadvantage, even though this approach makes the dubious assumption that each group’s length of exposure to disadvantage is roughly equivalent. This is a critical limitation. As Sharkey and Elwert ( 2011 ) document, many African American youth reside in disadvantaged neighborhoods their entire lives. Sometimes this pattern is inter-generational within families, and both are likely true to an extent for other groups of color including multiracial and Hispanic families. This suggests that the developing literature on duration weighted modeling may have substantial relevance for understanding ethnoracial differences in aggression and violence.

In this paper, we contribute to the literature on violence by utilizing MSM and IPTW methods to inform a comparison between duration weighted and point in time estimates of concentrated neighborhood disadvantage. Drawing data from the Future of Families and Child Well-being study, which provides a national sample of adolescents who have been followed since birth, we test the hypothesis that duration weighted disadvantage exerts a stronger effect on primary caregiver’s reports of aggressive behavior and youth’s self-reported violence than the point in time estimate, and that it explains a greater percent of ethnoracial differences in those outcomes. We find support for both hypotheses. The findings carve a clearer image of the consequences of longer-term exposure to neighborhood disadvantage for aggressive and violent behaviors, and the explanation of ethnoracial differences. In the next sections, we review foundational literature examining neighborhood, duration weighted measurement, ethnoracial differences, and provide additional background on MSMs and IPTW.

Neighborhood Effects and Ethnoracial Disparities

The literature documenting neighborhood effects emerged in earnest in the mid-1980s (see Simcha-Fagan and Schwartz 1986 ) and early 1990s, and “has become something of a cottage industry in the social sciences” (Sampson et al. 2002 : p 444). Although important methodological issues remain, researchers have focused inquiry on whether neighborhoods matter, rather than on the question of how they matter (Sharkey 2014 ). In particular, researchers have generally neglected the conceptual difference between episodic and long-term residence in disadvantaged neighborhoods (Sharkey and Elwert 2011 ).

The majority of research on ethnoracial disparities in violent behavior is framed within the social disorganization tradition. For instance, Shaw and McKay ( 1942 ) argue that high rates of delinquency among ethnoracial groups relative to Whites are a product of segregation and economic inequality (Sampson and Wilson 1995 ; Sampson et al. 2018 ; Wilson 1987 , 1996 , 2009 ), and are reflected by racialized economic disadvantage (Krivo et al. 2009 ; Peterson and Krivo 2010 ). As Graham ( 2018 : 450–1) eloquently summarizes with respect to Black and White youth:

“The typical Black adolescent in America lives in a very different type of neighborhood, attends a very different type of school, and is embedded in a very different type of social network, than her White counterpart. Almost a half century after the civil rights era, these differences remain poignant ... Minority children tend to grow up in different social environments than their White counterparts. This observation, while self-evident, is nevertheless important and, perhaps, too infrequently made.”

In addition, cultural adaptation and legal cynicism reinforce the use of aggression and violence as a means of conflict resolution in impoverished neighborhoods, which further places youth at risk (Anderson 1999 ; Heimer 1997 ; Stewart et al. 2006 ; Stewart and Simons 2006 ).

Our hypothesis is that the differences in social worlds experienced by White, Black, Multiracial, and Latinx youth are directly relevant for understanding ethnoracial differences in aggression and violence. Official (UCR) and self-reported data point to race differences, but the magnitude varies by source. Compared with non-Hispanic White adolescents, it is well documented that Black youth are over-represented in arrests for serious index crimes, particularly violent crimes (Hawkins et al. 2000 ), while Asian youth are under-represented. UCR data do not include the ethnicity of arrestees, so they are less useful for investigating Hispanic involvement. Ethnoracial differences are less pronounced in self-report surveys for a variety of reasons, but the Black-White difference is nevertheless evident, particularly for violence and more serious offenses (Morenoff 2005 ). Self-report studies also reveal that Hispanics are over-represented in violence relative to Whites (McNulty and Bellair 2003b ). Adolescents of Asian descent are under-represented relative to Whites (McNulty and Bellair 2003b ).

Findings from recent contextual studies indicate that ethnoracial disparities in violence are reduced once concentrated disadvantage is held constant (Bellair and McNulty 2005 ; Lauritsen and White 2001 ; McNulty and Bellair 2003a , b ; Sampson et al. 2005 ), which is consistent with the racial invariance thesis (Sampson et al. 2018 ). This has created ambiguity in the literature because concentrated disadvantage does not fully explain ethnoracial disparities in violence in many studies, which some argue contradicts the invariance thesis (Unnever 2018 ). This has led to calls for a distinct theory to address the unique causes of crime in poor, African American neighborhoods, and we presume other communities of color, such as a high incidence of police contact and experiences of discrimination (Unnever et al. 2016 ). Yet, this may be a false dichotomy, because point in time measurement of neighborhood disadvantage may not fully capture historic or more contemporary discrimination that is rooted in neighborhood structures (Sampson et al. 2018 ). Indeed, most studies employ point-in-time (i.e., cross-sectional) indicators of disadvantage, which may underestimate neighborhood effects relative to duration weighted measurement (see Carlson et al. 2022 ).

Duration Weighted Neighborhood Disadvantage and Ethnoracial Disparities

Ambiguous findings pertaining to the role of neighborhoods in producing ethnoracial disparities in violence do not account for the length of one’s life spent residing in disadvantaged neighborhoods. Most importantly, we hypothesize that the failure to account for cumulative exposure to neighborhood disadvantage throughout childhood and adolescence may lead to underestimation of neighborhood effects. Some research assesses the critical distinction between point-in-time or cross-sectional measurement of neighborhood disadvantage and duration weighted or cumulative exposure to neighborhood disadvantage. One study suggests that point in time measurement is a reasonable proxy for duration weighted measurement (Kunz et al. 2003 ). Most studies (Kravitz-Wirtz 2016a , b ; Wodtke 2013 ), however, document significant variation in youth exposure to neighborhood disadvantage over time (e.g., Timberlake 2007 ; Quillian 2003 ), and report that incorporating variation in exposure significantly improves prediction and explanation of self-rated health, smoking initiation, sex risk behavior, health inequalities, high school graduation, reading and math test scores, and adolescent parenting (Carlson et al. 2022 ; Hicks et al. 2018 ; Jackson and Mare 2007 ; Quillian 2003 ; Timberlake 2007 ; Wodtke et al. 2011 ). Although the aforementioned studies indicate support for duration weighted measurement of neighborhood disadvantage, there is very limited research that includes criminological outcomes. In particular, there are no studies to our knowledge focusing specifically on youth aggression and self-reported violence.

Disadvantaged neighborhoods have fewer institutional resources, such as quality schools, recreational activities, child care, medical facilities, and employment opportunities (Leventhal and Brooks-Gunn 2000 ). The dearth of institutional resources is likely to undermine children and adolescent’s social bonds to prosocial institutions such as school (Hirschi 1969 ). Adolescents residing in disadvantaged neighborhoods often experience distinctive cultural milieus and role models (Akers 1998 ), and may observe economic marginality among residents that leads them to conclude that their own future opportunities are limited (Bellair and Roscigno 2000 ). Economic deprivation, blighted institutional resources, and limited educational or employment opportunities may weaken the constraints and opportunity costs that impede risky behavior (Borowsky et al. 2009 ; Edin and Kefalas 2005 ).

The distinction between point-in-time exposure and cumulative exposure is critical because the neighborhood mechanisms that produce delinquent behavior may be more fundamental when experienced over a longer duration. As Sharkey ( 2014 : p 567) poignantly describes:

“It is natural to think that the residential environment surrounding children will have a greater influence on their lives if they are in the same environment over years or decades. Only recently, however, has the dimension of time entered into the empirical literature on neighborhood effects.”

Sharkey ( 2014 ) further points out that in the context of Wilson ( 1987 ) and Massey and Denton’s ( 1993 ) literature defining work, duration of exposure to disadvantaged residential environments is a critical dimension, yet that component is rarely addressed in most research. Due in part to discriminatory housing practices and residential segregation, African American, Multiracial, and Hispanic youth are more likely than Whites (and Asians) to experience generational exposure to concentrated neighborhood disadvantage (Lichter et al. 2016 ; Sharkey 2013 ; Timberlake 2007 ). Because a duration-weighted neighborhood disadvantage index captures adolescents that may have spent their entire lives exposed to the extremes of concentrated disadvantage, it is likely to yield better explanation of ethnoracial disparities.

An additional limitation of previous research is that time-varying factors, like family structure and socioeconomic status (SES), are treated as control variables that are unique relative to the neighborhood effect. However, time-varying confounders are in part outcomes of living in a disadvantaged residential context, and could bias estimates of neighborhood effects downward. This may introduce collider-stratification bias (Elwert and Winship 2014 ; Greenland 2003 ) due to joint associations with unobserved confounders that are themselves related to youth violence, and prior treatment to neighborhood disadvantage and violence (Elwert and Winship 2014 ; Greenland 2003 ; Wodtke et al. 2011 ). Time-varying factors may also mediate the association between neighborhood disadvantage and youth violence. Although including time-varying controls in regression models may produce downwardly biased estimates of neighborhood effects, excluding these variables may result in overestimation of neighborhood effects. Indeed, factors like family structure and SES also affect the length of youths’ exposure to neighborhood disadvantage (for a review, see Carlson et al. 2014 ).

Data and Methods

We address these issues with data drawn from the Future of Families and Child Well-Being Study (FFCWS). FFCWS is a longitudinal cohort study of 4,898 children born between 1998 and 2000 in 20 large U.S. cities, with an oversample of nonmarital births. Interviews in the hospital with mothers and fathers (or primary caregivers) at the child’s birth comprise the baseline, Wave 1 data. Follow ups are conducted when children reach age 1 (Wave 2), 3 (Wave 3), 5 (Wave 4), 9 (Wave 5), and 15 (Wave 6). Phone interviews with parents are conducted when the children are ages 1 (Wave 2), 3 (Wave 3), and 5 (Wave 4), and in-home assessments of children’s home environments are also conducted at ages 3 (Wave 3), 5 (Wave 4), 9 (Wave 5), and 15 (Wave 6). US census tract data are matched to the FFCWS at each wave because they have been used extensively to estimate neighborhood-level effects in prior research (Coulton et al. 2001 ; Krieger et al. 2003 ). Indicators of neighborhood disadvantage are drawn from the 2000 US Census through Wave 5, and the 2010 US Census and the 2015 American Community Survey 5-year averages are used at Wave 6.

The analysis is restricted to youth with complete data through Wave 6 because that is the wave at which we measure the dependent variables. The sample is further delimited to youth who were identified by their parents as multiracial, non-Hispanic Black, non-Hispanic White, non-Hispanic Asian, or Hispanic. The very small number of youth (i.e., 40) who did not fit those categories are dropped due to limited statistical power. Of the original 4,898 children whose mothers participated in the baseline interview, 3,404 completed the Wave 6 interview when the youth were 15 years old. Missing information on parents’ income and work hours contributed the most to attrition across waves. Of the 3,404 youth who participated in the Wave 6 interview, 3,360 cases had complete information on parent/guardian reports of youths’ aggressive behavior and 3,381 had complete information on youths’ self-report of violent behavior. Missing information for independent and control variables at each wave was replaced using multiple imputation with chained imputations in Stata 15. We used Von Hippel’s ( 2020 ) “how_many_imputations” program in Stata to estimate the number of imputations needed for efficient estimation of standard errors. The program provides a recommended number of imputations such that standard errors for estimates do not change significantly with additional imputations. The results indicated that 10 imputed data sets were likely sufficient. As we describe in more detail below, missing data is addressed in part by employing a censoring weight that upweights cases with a greater likelihood of attrition.

Two dependent variables are analyzed in this study. The primary caregiver report of youth aggression is a summary scale of primary caregivers’ responses to 11 items from the child behavior checklist (CBCL). The items reflect whether the child is aggressive in different contexts and were completed by the primary caregiver when the youth were fifteen years old. Examples of items include whether the child argues a lot, is cruel to or bullies other children, destroys things, and gets into fights with and/or physically attacks other children. Response options for each item include: 0 = not true; 1 = sometimes true; 2 = always true. The scale reflects the mean level of involvement across each item.

The second dependent variable is youth’s self-report of violent behavior at age 15. Four items are included: getting into a serious physical fight, hurting someone badly enough to need medical care, using or threatening to use a weapon, and taking part in a group fight. Response categories range from never, 1 or 2 times, 3 or 4 times, to 5 or more times (coded 0 to 3, respectively). Self-reported violence is measured as the mean involvement across those items.

Ethnoracial identity is represented by a set of dummy variables. The variables distinguish multiracial, non-Hispanic Black, Hispanic, non-Hispanic Asian youth from non-Hispanic White youth. White youth comprise the reference category.

Neighborhood disadvantage is a composite index of seven indicators at the census-tract level: (1) percentage of residents over age 16 who are unemployed, (2) percentage of households below the federal poverty threshold, (3) percentage of female-headed households with minor children, (4) percentage of adults over age 25 without a high school degree, (5) percentage of properties that are vacant, (6) percentage of residents who are non-White, and (7) median household income (reverse coded). Not all studies include percent non-White in the calculation of concentrated disadvantage. It is included here to capture the effect of historic discrimination, which Sampson ( 2012 ) argues is rooted in contemporary neighborhood structure. Each indicator of disadvantage is standardized (i.e., represented as a z score) using the national census-tract average and standard deviation, and the indicators are then combined; principal component analysis identified a single common factor among them, yielding a composite score for each tract.
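The standardize-and-combine step can be sketched as follows. This is a minimal illustration with hypothetical data and only three of the seven indicators; the paper standardizes against national census-tract means and standard deviations and derives the composite via principal component analysis, whereas the sketch simply sums the z scores.

```python
import numpy as np

def disadvantage_index(indicators, reverse=()):
    """Combine tract-level indicators into a composite disadvantage score.

    indicators: 2-D array, rows = census tracts, columns = indicators
    (e.g., % unemployed, % in poverty, median household income).
    reverse: column indices to reverse-code (here, median income,
    since higher income means lower disadvantage).
    """
    X = np.asarray(indicators, dtype=float).copy()
    for j in reverse:
        X[:, j] = -X[:, j]
    # Standardize each indicator (z scores). The paper uses national
    # census-tract means and SDs; sample statistics stand in here.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    return Z.sum(axis=1)  # simple sum in place of the PCA-based composite

# Toy data: 4 tracts x 3 indicators (% unemployed, % poverty, median income)
tracts = [[5, 10, 60000],
          [15, 30, 25000],
          [8, 12, 55000],
          [20, 40, 20000]]
scores = disadvantage_index(tracts, reverse=(2,))
print(scores)  # highest value = most disadvantaged tract
```

Summing z scores weights each indicator equally; loading-weighted PCA scores, as used in the paper, would differ slightly but preserve the same ordering when a single strong factor dominates.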

We then divided census tracts into quintiles based on the national distribution of the composite neighborhood disadvantage measure to create an ordinal measure of neighborhood disadvantage at age 9 that ranged from 1 = least disadvantaged to 5 = most disadvantaged. Creating an ordinal measure of neighborhood disadvantage lets us predict treatment and construct probability of treatment weights more parsimoniously; a continuous measure of disadvantage would likely necessitate estimating thousands of models, one for each disadvantage score. To construct duration of exposure to neighborhood disadvantage through age 9, we took the average of neighborhood disadvantage across the five waves up to and including Wave 5 (age 9).
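The quintile construction and duration-weighted average can be sketched as follows, with hypothetical composite scores and wave-by-wave quintiles rather than the study’s data:

```python
import numpy as np

# Assigning quintiles from a continuous composite score (hypothetical values).
scores = np.array([-2.0, -0.5, 0.1, 0.9, 2.4, 0.3, -1.1, 1.5, 0.0, -0.2])
cuts = np.percentile(scores, [20, 40, 60, 80])              # quintile cutpoints
quintile = np.searchsorted(cuts, scores, side='right') + 1  # 1 = least ... 5 = most

# Duration-weighted exposure: the mean quintile across the five waves
# through age 9, one row per youth.
waves = np.array([
    [1, 1, 2, 1, 1],   # mostly advantaged neighborhoods
    [3, 3, 4, 3, 3],   # middle quintiles
    [5, 5, 5, 4, 5],   # near-constant exposure to the most disadvantaged
])
duration_weighted = waves.mean(axis=1)
print(duration_weighted)  # [1.2 3.2 4.8]
```

In the paper the cutpoints come from the national census-tract distribution, not from the analytic sample as in this sketch.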

To explain ethnoracial disparities in youth aggression and violence, and to demonstrate the utility of measuring duration-weighted exposure to disadvantage, models alternately and collectively include a measure of neighborhood disadvantage at age 9 (Wave 5) and duration-weighted exposure to neighborhood disadvantage . Because duration-weighted exposure is an average composite of exposures to neighborhood disadvantage, direct comparisons of effect magnitude can be made between proximal exposure to disadvantage at age 9 and long-term exposure as indicated by the duration-weighted measure. The means of the neighborhood disadvantage measures are statistically indistinguishable, yet readers should exercise some caution when comparing coefficients because small differences in the distributions could complicate the comparison. Although measuring neighborhood disadvantage up to age 9 establishes temporal ordering with respect to the outcomes, it may result in underestimation of neighborhood effects. However, a supplemental model (not shown) that substituted a measure of neighborhood disadvantage (i.e., point in time) at age 15 (Wave 6) did not produce substantively different results.

Control Variables

We control for a set of time-invariant and time-varying confounders in our models. Time-invariant baseline controls include youth female (1 = female), mother’s age at youth's birth (in years), mother U.S. born (1 = yes), youth born outside of marriage (1 = yes), and mother’s education (1 = less than high school, 2 = high school or equivalent, 3 = some college or technical degree, 4 = college or advanced degree). Five time-varying socioeconomic confounders were assessed at each wave and are included in the estimation of the marginal structural and IPTW models. Family structure is assessed with dummy variables distinguishing two-parent biological family (reference), single mother living with relatives or alone, mother–stepfather family, and other family arrangements. Two-parent biological families were defined as a child living with both biological parents, either cohabiting or married. We controlled for household income measured in dollars to assess time-varying aspects of family socioeconomic status (SES). Parent work hours is a continuous measure indicating the number of hours the parent worked; participants not in the labor force were coded 0. Father’s work hours are used in limited cases when the child resides with the father only, and mother’s work hours are used in all other cases. Last, residential mobility (1 = yes) between waves is included to capture whether respondents moved in between waves.

Analytic Strategy

Marginal structural models (MSM) (Fewell et al. 2004 ) are used to assess the role of exposure to neighborhood disadvantage in explaining youths’ aggressive and violent behavior, including ethnoracial differences. We choose MSM for our analyses because MSM produces more valid estimates in situations when an explanatory time-varying variable – in this instance exposure to neighborhood disadvantage – may be reciprocally related to time-varying confounders – like family SES. MSM creates a pseudo-population where time-varying explanatory variables and time-varying confounders are unassociated with each other (Do et al. 2013 ). The models are run in two stages. In the first stage, we construct an inverse probability of treatment weight (IPTW) for youths’ probability of exposure to their actual neighborhood disadvantage quintile at Wave k . This probability is calculated based on time-varying and time-invariant predictors of exposure to neighborhood disadvantage. In Stage 2, the IPTW is used to adjust regression estimates predicting aggression and violence.

To construct the IPTW, we used data from the first five waves of the FFCWS among youth in our analytic sample. The IPTW is presented in Eq. 1.
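The display itself is not reproduced here; in the standard notation of Robins et al. (2000) and Fewell et al. (2004), consistent with the term definitions that follow, the unstabilized weight can be written as

```latex
W(k) \;=\; \prod_{j=0}^{k} \frac{1}{f\{A(j) \mid \bar{A}(j-1), \underline{L}(j)\}}
```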

Here, f(·) is the conditional probability density function, A(k) is the time-varying treatment at Wave k, \(\bar{A}(k-1)\) is previous treatment history up to Wave (k − 1), and \(\underline{L}(k)\) represents the history of time-dependent confounders. Estimation of the IPTW includes information on youths’ neighborhood quintile at baseline and at Wave (k − 1), time-invariant baseline covariates, and information on time-varying covariates at Wave k, Wave (k − 1), and baseline.

We calculate a stabilized IPTW to reduce variability and non-normality in the estimate. Stabilized weights have smaller variance, a mean of around 1, and are approximately normally distributed (Robins et al. 2000 ). Using notation from Fewell et al. ( 2004 ), we calculate the stabilized IPTW as follows:
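In that notation, the stabilized weight (whose display does not appear here) can be written as

```latex
SW(k) \;=\; \prod_{j=0}^{k} \frac{f\{A(j) \mid \bar{A}(j-1), L(0)\}}{f\{A(j) \mid \bar{A}(j-1), \underline{L}(j)\}}
```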

The numerator is a constrained version of the predicted probability of treatment (i.e., exposure to the actual neighborhood disadvantage quintile) that conditions on neighborhood treatment at Wave k, Wave (k − 1), and baseline measures of the covariates only, L(0). It can be interpreted as the subject’s probability of experiencing their treatment history up to Wave k given baseline values of controls (Do et al. 2013 ). The denominator is the probability of treatment at Wave k given time-varying covariates and baseline controls.

To partially account for the possibility of nonrandom sample attrition, we multiply the stabilized IPTW at each wave of observation by a censoring weight which estimates the inverse probability of dropping out of the study at any particular wave as predicted by time-invariant and time-varying variables. Table 1 displays summary statistics for the stabilized IPTW and censoring weight.
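As a toy illustration of how the stabilized IPTW and censoring weight combine for one youth, the sketch below uses entirely hypothetical per-wave probabilities, not estimates from the FFCWS:

```python
import numpy as np

# Hypothetical predicted probabilities for one youth's observed treatment
# (neighborhood quintile) at waves 1-5, from the numerator model (baseline
# covariates only) and the denominator model (time-varying covariates).
p_num   = np.array([0.42, 0.40, 0.38, 0.41, 0.39])
p_denom = np.array([0.55, 0.30, 0.45, 0.35, 0.50])
p_stay  = np.array([0.98, 0.95, 0.93, 0.96, 0.94])  # P(not censored at wave k)

# Stabilized IPTW: cumulative product of numerator/denominator ratios.
sw = np.cumprod(p_num / p_denom)

# Censoring weight: inverse probability of remaining in the sample,
# which upweights cases with a greater likelihood of attrition.
censor_w = np.cumprod(1.0 / p_stay)

# Final analysis weight at each wave: stabilized IPTW times censoring weight.
final_w = sw * censor_w
print(final_w[-1])  # weight for this youth at the last wave
```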

We conduct our analysis in two steps. First, we establish the utility of the MSM by comparing estimates of the association of exposure to neighborhood disadvantage with caregivers’ reports of youth aggression and youths’ self-reported violence across three models: (1) unadjusted bivariate generalized linear model (GLM) regression (gamma distribution); (2) multivariate GLM regression (gamma) with time-invariant and time-varying controls; and (3) IPTW adjusted multivariate GLM regression (gamma) with time-invariant controls. Analyses of aggressive and violent behaviors are conducted using GLM (gamma) given the continuous, positively skewed distribution of scores for aggressive and violent behavior and a high variance relative to the scale mean. Given that we assess the dependent variables at only one point in time (age 15), values for time-varying variables and the IPTW are calculated as the average value of these variables across waves. To establish temporal order, all time-varying variables, including neighborhood disadvantage and all confounders, as well as the IPTW, are assessed only through Wave 5 (age 9).

In the second step we use GLM regression (gamma) and the KHB decomposition method (Breen et al. 2013 ) to examine how exposure to neighborhood disadvantage mediates the association of ethnoracial identity with aggressive and violent behaviors. As described in the measures section, two measures of neighborhood disadvantage appear in our models: a point-in-time estimate of neighborhood disadvantage at age 9 and a duration-weighted measure of exposure to neighborhood disadvantage through age 9. We include both measures to compare their association with aggression and self-reported violence. Both variables are measured on the same scale, with similar means and standard errors that are statistically indistinguishable ( p  > 0.05). Thus, the coefficients are directly comparable, though with caution. In the first regression model, we examine the association of ethnoracial identity with youths’ aggressive/violent behaviors. In Models 2 through 4, we add the two measures of neighborhood disadvantage separately and then jointly. In Model 5, we include background confounders of the association between neighborhood disadvantage and the outcomes.

Descriptive Statistics

Table 2 presents descriptive statistics for the full sample in the first column and the ethnoracial subgroups in columns 2–6. Non-Hispanic Black children (47%) comprise the largest share of the sample, followed by Hispanics (20%), non-Hispanic Whites (17%), and those who identify as multiracial (15%). Non-Hispanic Asian youths are under-represented in the FFCWS with only 40 children in the sample. Consistent with the literature on violence, primary caregiver reports of aggression and the youths’ self-reported violent behavior are relatively rare within the sample. The scale mean of 0.27 on the primary caregiver report of aggression indicates that about 27% of youth behaved aggressively. Among youth, the mean score for violent behavior (0.16) indicates that only 16% of the youth engaged in any of the violent behaviors, on average. There is significant variation in aggression and violent self-reports among ethnoracial subgroups. Youth who identify as multiracial and non-Hispanic Black evidence a significantly higher score on primary caregivers’ reports of aggressive behavior compared to non-Hispanic White children. Hispanic and Asian youth are statistically similar to non-Hispanic Whites. Regarding youths’ self-reports of violent behavior, multiracial, Black, and Hispanic youth report engaging in more violence than non-Hispanic White youth. Asian youth are statistically similar to non-Hispanic Whites.

On average, youths in the sample reside in contexts that are neither extremely disadvantaged nor advantaged, experiencing levels between those polar extremes (i.e., 3.102 out of 5). The neighborhood disadvantage means also vary depending on whether the measure is duration weighted or point in time (i.e., age 9). Non-Hispanic White and Asian youth are at the lowest risk of longer-term exposure to neighborhood disadvantage, and evidence lower levels of neighborhood disadvantage at age 9, compared to multiracial, Black, and Hispanic youth. Duration weighted exposure is especially pronounced for non-Hispanic Blacks and Hispanics.

Youths’ biological sex is roughly evenly split with females comprising about 49% of the sample, and there are slightly more males than females in each ethnoracial group except Asian. Non-Hispanic White and Asian children are significantly less likely to have been born to younger mothers, or to have been born outside of marriage, compared to the other ethnoracial groups. Most mothers were born in the U.S. The exception is Hispanic mothers, 40% of whom are foreign born. Roughly 76% of the sample is born outside of marriage, reflecting the sampling design of the FFCWS. Educational attainment among the mothers is somewhat constrained, with just under a third of mothers attaining less than high school and roughly the same proportion finishing high school. About 25% have attained some college with about 1 in 10 completing a college or graduate degree. Non-Hispanic White and Asian mothers are most likely to be college graduates. At the other extreme, multiracial (29%), Black (32%), and especially the mothers of Hispanic children (50%) are significantly more likely than White mothers (13%) to have less than a high school degree.

Over half of the families in the data are led by single mothers, just under one third are two-parent biological, and 15% are mother-stepfather, with a small fraction of other family types. Among the ethnoracial subgroups, non-Hispanic White and Asian children are significantly more likely to live in two biological parent families, whereas multiracial (59%), Black (63%), and Hispanic children (49%) are significantly more likely to be in single parent households compared to their White and Asian counterparts.

The typical household earns just over $41,000, and mothers work about 25 hours per week on average. There is marked variation among the subgroups: Asian and White households earn more than twice as much as non-Hispanic Black and Hispanic households, while multiracial households earn somewhat, though not substantially, more than non-Hispanic Black and Hispanic households. Mothers’ work hours do not vary markedly across subgroups, although Hispanic mothers work the fewest hours. Multiracial and non-Hispanic Black youth are most likely to have moved between waves, followed by Hispanics, non-Hispanic Whites, and Asians.

Multivariate Models

Table 3 illustrates the consequences of failing to properly account for reciprocity in the relationships of time-varying confounders with neighborhood disadvantage. In a bivariate GLM (gamma) model, the duration weighted measure of neighborhood disadvantage exerts a positive and significant effect on aggressive behavior, but the effect dissipates and is not significant in the multivariate GLM (gamma) regression when time-invariant and time-varying variables are included as statistical controls. This indicates that when the IPTW is not utilized, the effect of duration weighted disadvantage is underestimated. However, once the stabilized IPTW is incorporated into the model, the duration weighted effect is positive and significant, indicating the importance of the length of time exposed to disadvantage and its role in shaping aggressive behavior by age 15. A similar pattern is observed when the lens is shifted towards youth’s self-reported violence, although the effect of duration weighted exposure to neighborhood disadvantage remains significant without invoking the stabilized IPTW.
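A minimal sketch of the stabilized inverse-probability-of-treatment weighting (IPTW) logic follows. It assumes a single time point, a binary treatment A (residence in a disadvantaged tract), and a single binary time-varying confounder L; the data and the use of empirical cell probabilities in place of fitted treatment models are simplifying assumptions, not the study's estimation procedure:

```python
# Hypothetical sketch of stabilized IPTW at one time point.
# A = 1 if the family lives in a disadvantaged tract, L = a binary
# time-varying confounder (e.g., a recent residential move).

from collections import Counter

def stabilized_iptw(data):
    """For each (A, L) record, weight = P(A = a) / P(A = a | L = l).
    Stabilization uses the marginal treatment probability in the
    numerator (rather than 1), which reduces weight variance."""
    n = len(data)
    count_a = Counter(a for a, _ in data)    # marginal counts of A
    count_al = Counter(data)                 # joint counts of (A, L)
    count_l = Counter(l for _, l in data)    # marginal counts of L
    weights = []
    for a, l in data:
        numerator = count_a[a] / n                     # P(A = a)
        denominator = count_al[(a, l)] / count_l[l]    # P(A = a | L = l)
        weights.append(numerator / denominator)
    return weights

data = [(1, 1), (1, 1), (0, 1), (1, 0), (0, 0), (0, 0)]
w = stabilized_iptw(data)
```

A useful sanity check is that stabilized weights sum to (approximately) the sample size, so the weighted pseudo-population is the same size as the original sample; in the full marginal structural model the per-wave ratios are multiplied across waves.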

Primary Caregiver Reports of Youth Aggression

Table 4 regresses primary caregiver reports of aggression (age 15) on ethnoracial group and neighborhood disadvantage. Model 1 highlights the baseline levels of aggressive behavior by ethnoracial identity. It indicates that Black children exhibit significantly more aggressive behavior compared to their non-Hispanic White counterparts. Multiracial, Hispanic, and Asian youth evidence statistically similar levels of involvement relative to non-Hispanic Whites.

Model 2 incorporates duration-weighted exposure to neighborhood disadvantage, which has the expected positive, significant effect on adolescent aggression. Importantly, Model 2 indicates that the disparity in aggressive behavior between White and Black adolescents is reduced to non-significance, reflecting the relatively longer-term exposure of the latter to neighborhood disadvantage compared to White children (see Table  2 ). The Hispanic-White disparity is suppressed in Model 1, as Model 2 reveals significantly lower levels of aggression among Hispanic youth (relative to White) when duration weighted exposure is controlled. In Model 3, we incorporate the point-in-time measure of neighborhood disadvantage, which exerts a significant, positive effect on aggressive behavior. It is also noteworthy that the point-in-time measure does not fully explain the heightened aggression among Black adolescents relative to White adolescents.

In Model 4, the duration weighted disadvantage measure is contrasted directly with the point in time measure. The results reveal that the duration weighted measure retains significance, while the point in time measure is zero. Variance inflation factor (VIF) estimates using OLS regression reveal little evidence of multicollinearity between these measures of disadvantage (VIF = 3.4). This indicates that the duration weighted measure, which reflects longer term exposure to disadvantaged contexts, has greater explanatory power than the point in time estimate and that point in time estimates are not proxies for measures of duration of exposure to disadvantage. Note as well that the difference between the duration-weighted and point-in-time coefficients in Model 4 is statistically significant (p < 0.01), lending additional evidence that duration weighted exposure better captures the consequences of longer term exposure to neighborhood disadvantage relative to cross-sectional measures.
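The VIF check can be illustrated for the two-predictor case, where VIF = 1/(1 − R²) and R² is the squared correlation between the duration weighted and point-in-time measures. The data values below are illustrative, not the study's; the reported VIF of 3.4 implies an R² of roughly 0.71 between the two measures:

```python
# Two-predictor variance inflation factor: VIF = 1 / (1 - R^2),
# where R^2 is the squared Pearson correlation between the predictors.
# Values are illustrative, not the study's data.

def vif_two(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r2 = sxy ** 2 / (sxx * syy)      # squared correlation
    return 1.0 / (1.0 - r2)

# Hypothetical duration weighted vs. point-in-time disadvantage scores.
print(vif_two([1, 2, 3, 4], [2, 1, 4, 3]))  # → 1.5625
```

With more than two predictors, each VIF is computed from the R² of regressing one predictor on all the others; values near 1 indicate little collinearity, while common rules of thumb flag values above 5 or 10.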

Model 5 includes the remaining time-invariant control variables along with duration weighted exposure to neighborhood disadvantage. The latter retains a significant, positive effect and the Black coefficient remains non-significant. We note, however, consistent with recent research, that Hispanic youth evidence a lower risk of aggressive behavior than White youth, an effect that is suppressed by duration-weighted exposure to neighborhood disadvantage. Control variables have expected effects on aggression, but do not explain away the effect of duration weighted disadvantage on aggressive behavior. Females are less involved than males. Greater maternal educational attainment also exerts a constraint. In contrast, youth born outside of marriage, to younger mothers, and to U.S. born mothers are more likely to engage in aggressive behavior at age 15. Overall, the findings indicate support for the hypothesis that longer-term exposure to concentrated disadvantage is more consequential for Black and Hispanic youth (relative to White) than is point in time exposure.

Youth Self-reports of Violent Behavior

Table 5 presents GLM (Gamma) regressions of youth self-reports of violent behavior on ethnoracial identity and neighborhood disadvantage. Model 1 shows that Multiracial, Black, and Hispanic adolescents evidence significantly more involvement in violence compared to non-Hispanic White youth. Asian youth are statistically similar to their White counterparts. Model 2 incorporates duration weighted exposure to neighborhood disadvantage, which has a significant effect ( p  < 0.001) on adolescent self-reported violence. The Hispanic-White disparity in violence involvement is reduced in magnitude and to non-significance, whereas the Multiracial and Black coefficients have been reduced substantially but remain significant.

Model 3 in Table 5 incorporates point-in-time neighborhood disadvantage at age 9, which has a significant, positive effect on violent behavior. Ethnoracial disparities in violent behavior compared to Whites remain significant. Compared to Model 1, the Multiracial, Black, and Hispanic coefficients are reduced, but not to the extent evidenced in Model 2, which included duration weighted exposure to neighborhood disadvantage. In Model 4, the duration weighted disadvantage measure is contrasted directly with the point in time measure. The latter is zero and not significant, whereas duration weighted exposure retains a significant positive effect on violence. This indicates that prolonged exposure to neighborhood disadvantage has greater explanatory power. Importantly, the effect of duration-weighted exposure is significantly larger than the point-in-time coefficient in Table 5, Model 4 ( p  < 0.001), which again suggests that duration-weighted exposure better captures the consequences of longer term exposure to neighborhood disadvantage relative to point-in-time measures.

Model 5 includes the time-invariant control variables along with duration weighted exposure to neighborhood disadvantage. The Hispanic-White disparity in violence remains zero, whereas the coefficients for Multiracial and Black youth are reduced substantially (compared to Model 1), although they remain significant. Control variables have significant effects in theoretically expected directions that mirror the findings in Table 4.

KHB Decomposition

Table 6 presents the results from the KHB decomposition analyses. The KHB method decomposes the total effect of a variable into direct and indirect effects (Karlson et al. 2012 ), including both discrete and continuous variables, providing a test of mediation. The KHB method was not designed specifically to decompose effects for GLM models and so estimates should be interpreted with some caution. The decomposition compares the coefficients of the ethnoracial variables between models that alternately include and exclude the duration weighted measure of disadvantage (Model 5 of Tables 4 and 5 are used as input). The difference between the coefficient of the ethnoracial variables in the two models reveals the portion of the total effect that is mediated by duration-weighted exposure to neighborhood disadvantage.
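The difference-in-coefficients arithmetic underlying the percent-mediated figures can be sketched as follows. The coefficients are hypothetical, chosen only to illustrate ordinary mediation and the suppression case (a percentage above 100%) noted in Footnote 2; they are not the values in Table 6:

```python
# Hypothetical sketch of the KHB-style "percent mediated" arithmetic:
# compare a group coefficient from a reduced model (mediator excluded)
# with the same coefficient from a full model (mediator included).

def percent_mediated(beta_reduced, beta_full):
    """Share of the total effect attributable to the mediator.
    Exceeds 100% when the mediator suppresses the direct effect
    (i.e., the full-model coefficient flips sign)."""
    return 100.0 * (beta_reduced - beta_full) / beta_reduced

# Ordinary mediation: most of the group gap is explained.
print(percent_mediated(0.30, 0.04))

# Suppression: the direct effect reverses sign, so the mediated
# share exceeds 100% (cf. Footnote 2).
print(percent_mediated(0.10, -0.12))  # → 220.0
```

Note that the actual KHB method additionally rescales the reduced-model coefficient so that the comparison is valid across nested nonlinear models, a correction this arithmetic sketch omits.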

With respect to caregiver reports of aggression, findings indicate that about 70% of the total Multiracial-White difference, 87% of the total Black-White difference, and about 217% Footnote 2 of the Hispanic-White difference in aggression is explained by duration-weighted exposure to neighborhood disadvantage. With respect to youth self-reports of violence, about 45% of the total Multiracial-White difference, 52% of the Black-White difference, and about 80% of the Hispanic-White difference is attributable to long-term exposure to neighborhood disadvantage. The results highlight the consequences of prolonged exposure to neighborhood disadvantage for aggression and violence outcomes, especially during childhood.

Discussion and Conclusion

This study addresses a series of issues that are critical for the neighborhood effect literature as it pertains to studies of aggression and self-reported violence. First, and perhaps most importantly, the results support a growing body of literature documenting the importance of duration of exposure to neighborhood disadvantage (Kravitz-Wirtz 2016a , b ; Wodtke 2013 ; Wodtke et al. 2011 ). Most studies do not differentiate between individuals or families that are intermittently exposed to socioeconomically disadvantaged neighborhoods from those that experience a longer duration of exposure, and some studies have suggested that there is no need to make this distinction (Kunz et al. 2003 ). Our findings support the hypothesis that duration weighted measurement of disadvantage exerts a larger effect on adolescent aggressive and violent behavior than the cross-sectional (i.e., point in time) estimate. When both measures are included in the same model, only the duration weighted measure retains significance.

The distinction intersects directly with multilevel racial invariance research, particularly studies that address ethnoracial differences in aggression and violence (e.g., McNulty and Bellair, 2003b ). Shaw and McKay ( 1942 ) argued that the neighborhood contexts experienced by poor Blacks, in particular, are qualitatively distinct from those of predominantly White neighborhoods. This logic is the basis of the racial invariance thesis, wherein ethnoracial differences in delinquency, particularly violence, are the result of group-specific exposure to neighborhood structures, particularly concentrated disadvantage (Sampson and Wilson 1995 ). We find, as have others, that Black and Hispanic families are more likely to be exposed to disadvantage and experience greater socioeconomic inequality relative to White families. Findings reveal that prolonged exposure to neighborhood disadvantage explains the Black-White disparity in aggression and that Hispanics are at a significantly lower risk of aggressive behavior relative to Whites (Table  4 ). Regarding youth self-reported violence, duration weighted exposure explains the Hispanic-White disparity in violence, and substantially mediates Multiracial and Black effects (Tables 5 and 6 ). The analysis indicates that duration weighted exposure to neighborhood disadvantage explains a larger share of the association with aggression and violence than does the point in time measure. We conclude that point in time estimation is not a proxy for duration weighted exposure despite the fact that those measures are typically correlated.

We suggest that analyses of neighborhood effects should be attentive to this distinction in future research. It will also be important to replicate these findings and to go further with analyses of delinquency scales that tap serious misbehavior likely to draw official attention from schools and the criminal justice system, including more serious forms of violence for which ethnoracial disparities are especially pronounced. In addition, multilevel models addressing neighborhood effects on individual-level disparities in violence are needed to revisit the racial invariance thesis using duration weighted exposure measures. Our results provide additional evidence that longer term exposure to concentrated neighborhood disadvantage is quite detrimental to families and children, and that it is directly associated with aggression and violence.

Beyond the issue of the importance of duration weighted disadvantage, the results for the primary caregiver reports of aggression indicate that conventional regression models often overcontrol for individual-level socioeconomic characteristics (e.g., family SES), leading researchers to underestimate neighborhood effects (Elwert and Winship 2014 ; Greenland 2003 ). As Wodtke et al. ( 2011 ) advocate, we use marginal structural modeling (MSM) with inverse probability of treatment weights (IPTW) to correct for the tendency to underestimate neighborhood effects.

We recognize the difficulty of addressing these issues in prior research. Cross-sectional datasets lack the detailed migration information needed to correctly characterize the duration of exposure to neighborhood disadvantage, but longitudinal data sets offer the opportunity to document these exposures. The issue can be rectified by geocoding previous addresses retrospectively. Yet, even with that information, it is unlikely that most longitudinal data sets contain the residential addresses of previous generations and thus cannot fully address the generational issues raised by Sharkey and Elwert ( 2011 ). The results presented here do not include generational data, but even without it, we still document strong duration weighted effects on youth aggression and violent behavior.

The analysis has limitations. In particular, the gaps between waves in the FFCWS prevent us from forming a more accurate estimate of long-term exposure to neighborhood disadvantage. For example, wave 1 and wave 2 are conducted at birth and at age 1, but subsequent follow-ups do not occur until age 3 and then again at age 5. The final wave we utilize for the disadvantage measures takes place when children are 9 years old. Although respondents are asked whether they changed residences between the follow-ups, there is no available residential geocode from which tract-level census data could be merged for the intervening years. This could be improved in future research with data sets that have annual geocoded residential information. Beyond the issue of measuring youths’ exposure to disadvantage is the issue of measuring parents’ residential patterns. This latter point addresses intergenerational patterns of exposure, which are likely to vary significantly between, and importantly within, ethnoracial groups.

Overall, we think that the findings open up new possibilities to investigate neighborhood effects in criminology. There are several subareas in which neighborhood effects are prominent, including the delinquency, gang, recidivism, and conflict literatures, among others. Neighborhoods clearly matter, but so does measurement. In closing, we second Sharkey’s ( 2013 ; 2014 ) insight that researchers should refocus attention away from asking whether neighborhoods matter for families and youth outcomes, and towards investigation of how they matter, especially duration of exposure.

Factor loadings for Neighborhood Disadvantage: proportion no high school degree = .399; proportion in poverty = .450; proportion unemployed = .384; proportion of properties that are vacant = .123; proportion of female-headed households = .432; proportion of residents who are non-white = .368; median household income = .392.
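Assuming the disadvantage factor is scored as a loading-weighted sum of standardized tract indicators (and that median household income is reverse-coded before scoring so that all loadings are positive — an assumption, since the note does not state the coding), the scoring can be sketched as:

```python
# Sketch of scoring a neighborhood disadvantage factor from the
# loadings reported in the note. The reverse-coding of median income
# and the tract values below are illustrative assumptions.

LOADINGS = {
    "no_hs_degree": 0.399,
    "poverty": 0.450,
    "unemployed": 0.384,
    "vacant": 0.123,
    "female_headed": 0.432,
    "nonwhite": 0.368,
    "median_income_reversed": 0.392,
}

def disadvantage_score(z_scores):
    """z_scores maps each indicator to its standardized tract value."""
    return sum(LOADINGS[k] * z for k, z in z_scores.items())

# A tract one standard deviation above the mean on every indicator.
tract = {k: 1.0 for k in LOADINGS}
print(round(disadvantage_score(tract), 3))
```
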

Duration weighted exposure to neighborhood disadvantage explains more than 100% of the total Hispanic-White difference because the lower levels of aggression among Hispanic youth relative to White youth are suppressed when disadvantage is excluded from the models.

Akers R (1998) Social learning and social structure: A general theory of crime and deviance. Northeastern University Press, Boston, MA


Anderson E (1999) Code of the street: decency, violence, and the moral life of the inner city. W. W. Norton, New York, NY

Baumer EP, South SJ (2001) Community effects on youth sexual activity. J Marriage Fam 63(2):540–54


Bellair PE, McNulty TL (2005) Beyond the bell-curve: Community disadvantage and the explanation of black-white differences in adolescent violence. Criminology 43:1135–1168

Bellair PE, Roscigno VJ (2000) Local labor market opportunity and adolescent delinquency. Soc Forces 78:1509–1538

Borowsky IW, Ireland M, Resnick MD (2009) Health status and behavioral outcomes for youth who anticipate a high likelihood of early death. Pediatrics 124(1):e81-88

Breen R, Karlson KB, Holm A (2013) Total, direct, and indirect effects in logit and probit models. Sociological Methods & Research 42(2):164–191

Brewster KL (1994) Race differences in sexual activity among adolescent women: The role of neighborhood characteristics. Am Sociol Rev 59(3):408–424

Browning CR, Leventhal T, Brooks-Gunn J (2004) Neighborhood context and racial differences in early adolescent sexual activity. Demography 41(4):697–720

Carlson DL, McNulty TL, Bellair PE, Watts S (2014) Neighborhoods and racial/ethnic disparities in adolescent sexual risk behavior. J Youth Adolesc 43(9):1536–1549

Carlson DL, Bellair PE, McNulty TL (2022) Duration-weighted exposure to neighborhood disadvantage and racial-ethnic differences in adolescent sexual behavior. J Health Soc Behav 63(1):71–89

Cleveland HH, Gilson M (2004) The effects of neighborhood proportion of single-parent families and mother-adolescent relationships on adolescents’ number of sexual partners. J Youth Adolesc 33(4):319–329

Coulton CJ, Korbin J, Chan T, Marilyn Su (2001) Mapping residents’ perceptions of neighborhood boundaries: a methodological note. Am J Community Psychol 29(2):371–383

Do DP, Wang Lu, Elliott MR (2013) Investigating the relationship between neighborhood poverty and mortality risk: A marginal structural modeling approach. Soc Sci Med 91(1):58–66

Edin K, Kefalas M (2005) Promises I can keep: Why poor women put motherhood before marriage. University of California Press, Berkeley

Elwert F, Winship C (2014) Endogenous selection bias: The problem of conditioning on a collider variable. Ann Rev Sociol 40:31–53

Fewell Z, Hernán MA, Wolfe F, Tilling K, Choi H, Sterne JAC (2004) Controlling for time-dependent confounding using marginal structural models. Stand Genomic Sci 4(4):402–420

Graham BS (2018) Identifying and estimating neighborhood effects. J Econ Lit 56(2):450–500

Greenland S (2003) Quantifying biases in causal models: Classical confounding vs collider-stratification bias. Epidemiology 14(3):300–306

Hawkins DF, Laub JH, Lauritsen JL, Cothern L (2000) Race, ethnicity, and serious and violent juvenile offending. Washington, DC: U.S. Department of Justice, Office of Juvenile Justice and Delinquency Prevention

Heimer K (1997) Socioeconomic status, subcultural definitions, and violent delinquency. Soc Forces 75:799–833

Hicks AL, Handcock MS, Sastry N, Pebley AR (2018) Sequential neighborhood effects: The effect of long-term exposure to concentrated disadvantage on children’s reading and math test scores. Demography 55(1):1–31

Hirschi T (1969) Causes of delinquency. University of California Press, Berkeley

Jackson MI, Mare RD (2007) Cross-sectional and longitudinal measurements of neighborhood experience and their effects on children. Soc Sci Res 36(2):590–610

Karlson KB, Holm A, Breen R (2012) Comparing regression coefficients between same-sample nested models using logit and probit: a new method. Sociol Methodol 42(1):286–313

Kim J (2010) Influence of neighbourhood collective efficacy on adolescent sexual behaviour: Variation by gender and activity participation. Child: Care, Health and Development 36(5):646–54

Kravitz-Wirtz N (2016a) A discrete-time analysis of the effects of more prolonged exposure to neighborhood poverty on the risk of smoking initiation by age 25. Soc Sci Med 148:79–92

Kravitz-Wirtz N (2016b) Temporal effects of child and adolescent exposure to neighborhood disadvantage on black/white disparities in young adult obesity. J Adolesc Health 58(5):551–557

Krivo LJ, Peterson RD, Kuhl DC (2009) Segregation, racial structure, and neighborhood violent crime. Am J Sociol 114(6):1765–1802. https://doi.org/10.1086/597285

Krieger N, Zierler S, Hogan JW, Waterman P, Chen J, Lemieux K, Gjelsvik A (2003) Geocoding and measurement of neighborhood socioeconomic position: a US perspective. In: Kawachi I, Berkman LF (eds) Neighborhoods and health. Oxford University Press, Oxford, pp 147–148

Kunz J, Page ME, Solon G (2003) Are point-in-time measures of neighborhood characteristics useful proxies for children’s long-run neighborhood environment? Econ Lett 79(2):231–237

Lauritsen JL, White NA (2001) Putting violence in its place: The influence of race, ethnicity, gender, and place on the risk for violence. Criminol Public Policy 1:37–60

Leventhal T, Brooks-Gunn J (2000) The neighborhoods they live in: The effects of neighborhood residence on child and adolescent outcomes. Psychol Bull 126(2):309–37

Lichter DT, Parisi D, Taquino MC (2016) Emerging patterns of Hispanic residential segregation: Lessons from Rural and Small-Town America. Rural Sociol 81(4):483–518

Massey DS, Denton NA (1993) American Apartheid: Segregation and the Making of the Underclass. Harvard University Press, Cambridge

McNulty TL, Bellair PE (2003a) Explaining racial and ethnic differences in adolescent violence: Structural Disadvantage, family well-being, and social capital. Justice Quart 20:1–32

McNulty TL, Bellair PE (2003b) Explaining racial and ethnic differences in serious adolescent violent behavior. Criminology 41(3):709–747. https://doi.org/10.1111/j.1745-9125.2003.tb01002.x

Morenoff J (2005) Racial and ethnic disparities in crime and delinquency in the United States. In: Rutter M, Tienda M (eds) Ethnicity and causal mechanisms. Cambridge University Press, pp 139–173

Peterson RD, Krivo LJ (2010) Divergent social worlds: Neighborhood crime and the racial spatial divide. Russell Sage Foundation, New York, NY

Quillian L (2003) How long are exposures to poor neighborhoods? The long-term dynamics of entry and exit from poor neighborhoods. Popul Res Policy Rev 22(3):221–249

Robins JM, Hernan MA, Brumback B (2000) Marginal structural models and causal inference in epidemiology. Epidemiology 11(5):550–560

Sampson RJ (2012) Great American City: Chicago and the enduring neighborhood effect. University of Chicago Press

Sampson RJ, Wilson WJ (1995) Toward a theory of race, crime, and urban inequality. In: Hagan J, Peterson RD (eds) Crime and inequality. Stanford University Press, Stanford, CA, pp 37–54

Sampson RJ, Morenoff JD, Gannon-Rowley T (2002) Assessing ‘Neighborhood Effects’: Social processes and new directions in research. Annu Rev Soc 28(1):443–78. https://doi.org/10.1146/annurev.soc.28.110601.141114

Sampson RJ, Morenoff JD, Raudenbush SW (2005) Social anatomy of racial and ethnic disparities in violence. Am J Public Health 95(2):224–232. https://doi.org/10.2105/AJPH.2004.037705

Sampson RJ, Wilson WJ, Katz H (2018) Reassessing toward a theory of race, crime, and urban inequality: enduring and new challenges In 21st Century America. Du Bois Rev 15(1):13–34. https://doi.org/10.1017/S1742058X18000140

Santelli JS, Lowry R, Brener ND, Robin L (2000) The association of sexual behaviors with socioeconomic status, family structure and race-ethnicity among US adolescents. Am J Public Health 90(10):1582–1588

Sharkey P (2013) Stuck in Place: Urban neighborhoods and the end of progress toward racial equality. University of Chicago Press, Chicago, IL


Sharkey P (2014) Spatial Segmentation and the black middle class. Am J Sociol 119(4):903–954

Sharkey P, Elwert F (2011) The legacy of disadvantage: Multigenerational neighborhood effects on cognitive ability. Am J Sociol 116(6):1934–1981. https://doi.org/10.1086/660009

Shaw CR, McKay HD (1942) Juvenile delinquency and Urban Areas. University of Chicago Press, Chicago, IL.

Simcha-Fagan O, Schwartz JE (1986) Neighborhood and delinquency: An assessment of contextual effects. Criminology 24(4):667–703

Stewart E, Simons RL (2006) Structure and culture in African American adolescent violence: A partial test of the code of the street thesis. Justice Q 23:1–33

Stewart E, Schreck C, Simons RL (2006) “I ain’t gonna let no one disrespect me”: Does the code of the street reduce or increase violent victimization among African American adolescents? J Res Crime Delinq 43:427–458

Timberlake JM (2007) Racial and ethnic inequality in the duration of children’s exposure to neighborhood poverty and affluence. Soc Probl 54(3):319–342

Unnever JD (2018) The racial invariance thesis in criminology: Toward a black criminology. In: Unnever JD, Gabbidon SL, Chouhy C (eds) Building a black criminology: race, theory, and crime. Routledge, New York, NY, pp 77–100

Unnever JD, Barnes JC, Cullen FT (2016) The racial invariance thesis revisited: Testing an African American theory of offending. J Contemp Crim Justice 32(1):7–26. https://doi.org/10.1177/1043986215607254

Von Hippel PT (2020) How many imputations do you need? A two-stage calculation using a quadratic rule. Sociol Methods Res 49(3):699–718

Wilson WJ (1987) The truly disadvantaged: The Inner City, the Underclass, and Public Policy. University of Chicago Press, Chicago, IL

Wilson WJ (1996) When work disappears: The world of the New Urban Poor. Random House, New York, NY

Wilson WJ (2009) More than just race: Being black and poor in the Inner City. W.W. Norton, New York, NY

Wodtke GT (2013) Duration and timing of exposure to neighborhood poverty and the risk of adolescent parenthood. Demography 50(5):1765–88

Wodtke GT, Harding DJ, Elwert F (2011) Neighborhood effects in temporal perspective: The impact of long-term exposure to concentrated disadvantage on High School Graduation. Am Sociol Rev 76(5):713–736


Author information

Authors and Affiliations

The Ohio State University, 241 Townshend Hall, 1885 Neil Ave., Columbus, OH, 43210, USA

Paul E. Bellair

University of Georgia, Athens, USA

Thomas L. McNulty

The University of Utah, Salt Lake City, USA

Daniel L. Carlson


Corresponding author

Correspondence to Paul E. Bellair .


Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Bellair, P.E., McNulty, T.L. & Carlson, D.L. The Significance of Duration Weighted Neighborhood Effects for Violent Behavior and Explanation of Ethnoracial Differences. J Quant Criminol (2024). https://doi.org/10.1007/s10940-024-09588-1


Accepted : 22 April 2024

Published : 07 May 2024

DOI : https://doi.org/10.1007/s10940-024-09588-1

Keywords

  • Self-reported violence
  • Neighborhood disadvantage
  • Neighborhood effect
  • Social disorganization
  • Ethnoracial differences
  • Marginal structural modeling with IPTW
  • Duration weighted neighborhood disadvantage
  • Find a journal
  • Publish with us
  • Track your research

The Federal Register

The daily journal of the united states government, request access.

Due to aggressive automated scraping of FederalRegister.gov and eCFR.gov, programmatic access to these sites is limited to access to our extensive developer APIs.

If you are human user receiving this message, we can add your IP address to a set of IPs that can access FederalRegister.gov & eCFR.gov; complete the CAPTCHA (bot test) below and click "Request Access". This process will be necessary for each IP address you wish to access the site from, requests are valid for approximately one quarter (three months) after which the process may need to be repeated.

An official website of the United States government.

If you want to request a wider IP range, first request access for your current IP, and then use the "Site Feedback" button found in the lower left-hand side to make the request.

COMMENTS

  1. Weighted Mean

    Uses of Weighted Means. Weighted means are useful in a wide variety of scenarios. For example, a student may use a weighted mean in order to calculate his/her percentage grade in a course. In such an example, the student would multiply the weighing of all assessment items in the course (e.g., assignments, exams, projects, etc.) by the ...

  2. Weighted Average: Formula & Calculation Examples

    A weighted average is a type of mean that gives differing importance to the values in a dataset. In contrast, the regular average, or arithmetic mean, gives equal weight to all observations. The weighted average is also known as the weighted mean, and I'll use those terms interchangeably.

  3. Weighted Mean: Formula: How to Find Weighted Mean

    Weighted mean = Σwx/Σw. Σ = summation (in other words…add them up!). w = the weights. x = the value. To use the formula: Multiply the numbers in your data set by the weights. Add the numbers in Step 1 up. Set this number aside for a moment. Add up all of the weights.

  4. Weighted Mean

    The weighted mean is a mathematical calculation that takes into account the relative importance of each number in a set. The calculation performed by multiplying each number in the set by a weight, and then adding the results. The weighted mean then calculated by dividing the sum by the sum of the weights.

  5. PDF WEIGHTED MEANS AND MEANS AS WEIGHTED SUMS

    The ordinary (arithmetic) mean is a weighted mean with all weights equal to 1. 2 Another way to describe a weighted meanof a list of numbers is a sum of coefficients times the numbers, where the coefficients add up to 1. In this case, the coefficients are called the weights. (Note the ambiguity in the use of "weight".)

  6. Weighted Mean

    The weighted mean, alternatively termed as a weighted average, presents a methodology for computing the average of a set of values. In this method, each value is assigned a weight, a measure of its importance or frequency within the dataset. To calculate the weighted mean, each value is multiplied by its corresponding weight, the products are ...

  7. Weighted Mean

    Learn how to calculate a weighted mean and how to solve for a missing weight or data value. "The weighted arithmetic mean is similar to an ordinary arithmetic mean (the most common type of average), except that instead of each of the data points contributing equally to the final average, some data points contribute more than others." [1]
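
Solving for a missing weight, as mentioned above, follows from rearranging target = (Σwx + w·x_u) / (Σw + w) for the unknown weight w. A sketch with made-up numbers:

```python
def missing_weight(known_values, known_weights, unknown_value, target_mean):
    """Solve target = (Σwx + w·x_u) / (Σw + w) for the unknown weight w."""
    swx = sum(w * x for w, x in zip(known_weights, known_values))
    sw = sum(known_weights)
    return (swx - target_mean * sw) / (target_mean - unknown_value)

# One known value 70 (weight 1); what weight on the value 90 gives a mean of 85?
w = missing_weight([70], [1], 90, 85)
print(w)  # 3.0  (check: (70 + 3*90) / (1 + 3) = 340 / 4 = 85)
```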

  8. Weighted Average: Definition and How It Is Calculated and Used

    Weighted average is a mean calculated by giving values in a data set more influence according to some attribute of the data. It is an average in which each quantity to be averaged is assigned a ...

  9. Weighted Arithmetic Mean

    The weighted arithmetic mean is a measure of central tendency of a set of quantitative observations when not all the observations have the same importance. We must assign a weight to each observation depending on its importance relative to the other observations. The weighted arithmetic mean equals the sum of the observations multiplied by their weights, divided by the sum of their weights.

  10. Weighted Mean

    The weighted mean is a measure of central tendency. The weighted mean of a set of values is computed according to the formula: weighted mean = Σwx / Σw, where the w are non-negative coefficients, called "weights", that are ascribed to the corresponding values x. Only the relative values of the weights matter in determining the value of the weighted mean.
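
The last point, that only the relative values of the weights matter, is easy to verify numerically: scaling every weight by the same constant cancels between numerator and denominator. The data below are arbitrary:

```python
def weighted_mean(values, weights):
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

values = [10.0, 20.0, 30.0]

# Scaling all weights by 10 leaves the weighted mean unchanged.
a = weighted_mean(values, [1, 2, 3])
b = weighted_mean(values, [10, 20, 30])
print(round(a, 3), a == b)  # 23.333 True
```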

  11. The Weighted Mean

    accepted as the more accurate ones. Then we end up with a 'weighted mean'. We see that 'accurate' here is roughly the equivalent of 'important' in the top paragraph. We now proceed to discuss how the weighted mean is estimated using least-squares. In section 5.2.1 ('The mean as a least-squares t'), a single value m was tted by

  12. Weighted Mean

    Summary. Weighted Mean: A mean where some values contribute more than others. When the weights add to 1: just multiply each weight by the matching value and sum it all up. Otherwise, multiply each weight w by its matching value x, sum that all up, and divide by the sum of weights: Weighted Mean = Σwx / Σw.
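
The two cases in this summary can be sketched side by side; the values and weights below are arbitrary but use the same proportions in both cases:

```python
values = [5.0, 10.0]

# Case 1: weights already sum to 1 — just multiply and add.
w1 = [0.25, 0.75]
m1 = sum(w * x for w, x in zip(w1, values))

# Case 2: arbitrary weights in the same proportions — divide by their sum.
w2 = [1, 3]
m2 = sum(w * x for w, x in zip(w2, values)) / sum(w2)

print(m1, m2)  # 8.75 8.75
```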

  13. 1. How different weighting methods work

    The analysis compares three primary statistical methods for weighting survey data: raking, matching and propensity weighting. In addition to testing each method individually, we tested four techniques where these methods were applied in different combinations for a total of seven weighting methods: Raking. Matching.

  14. Weighted Mean

    Solution: Since most of the values in this data set are repeated multiple times, we can easily compute the sample mean as a weighted mean. The following are the steps to calculate the weighted arithmetic mean. Step 1: First assign a weight to each value in the dataset: x1 = 1, w1 = 73; x2 = 2, w2 = 378; x3 = 3, w3 = 459; x4 = 4, w4 = 90. Step 2: Now compute the numerator of the weighted mean formula.
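
Using the exact numbers from the example above, the computation works out as follows:

```python
# Data from the example: each value occurs with a frequency, used as its weight.
values  = [1, 2, 3, 4]
weights = [73, 378, 459, 90]

numerator = sum(w * x for w, x in zip(weights, values))  # Step 2: Σwx
total = sum(weights)                                     # Σw
print(numerator, total, numerator / total)  # 2566 1000 2.566
```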

  15. Weighted Mean Formula

    The weighted mean equation is a statistical method that calculates the average by multiplying the weights by their respective values and taking the sum. It is a type of average in which weights are assigned to individual values to determine the relative importance of each observation.

  16. Weighted means statistics

    The usefulness of weighted means statistics as a consensus mean estimator in collaborative studies is discussed. A random effects model designed to combine information from several sources is employed to justify their appeal to metrologists. Some methods of estimating the uncertainties and of constructing confidence intervals are reviewed.

  17. Measures of central tendency: The mean

    Weighted mean is calculated when certain values in a data set are more important than the others. A weight w i is attached to each of the values x i to reflect this importance. For example, when the weighted mean is used to represent the average duration of stay by a patient in a hospital, the total number of cases presenting to each ward is taken ...
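
The hospital example can be sketched with hypothetical ward figures (the case counts and average stays below are invented):

```python
# Hypothetical wards: (cases presenting, average stay in days).
wards = [(120, 2.5), (40, 7.0), (10, 14.0)]

# Average stay across the whole hospital, weighted by case load,
# so busier wards influence the overall figure more.
total_cases = sum(n for n, _ in wards)
avg_stay = sum(n * d for n, d in wards) / total_cases
print(round(avg_stay, 2))  # 4.24
```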

  18. Weighted Mean

    Weighted mean. The weighted mean involves multiplying each data point in a set by a value which is determined by some characteristic of whatever contributed to the data point. An example should help make that rather vague definition clearer. In meta-analysis, a researcher has a set of effect sizes from a number of studies and wishes to combine ...
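
The meta-analysis case can be sketched with hypothetical numbers; inverse-variance weighting (one common choice, not stated in the snippet) gives more precise studies more influence over the combined estimate:

```python
# Hypothetical studies: (effect size, variance of the estimate).
studies = [(0.30, 0.04), (0.50, 0.01), (0.10, 0.09)]

# Weight each study by 1/variance: smaller variance -> larger weight.
weights = [1 / v for _, v in studies]
combined = sum(w * e for w, (e, _) in zip(weights, studies)) / sum(weights)
print(round(combined, 3))  # 0.431
```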

  19. Weighted Mean

    The general formula to find the weighted mean is given as: Weighted mean = Σ(wn · x̄n) / Σwn, where x̄n = the mean value of each set of given data and wn = the corresponding weight for each observation. The simple steps used to calculate the weighted mean through the formula are: Step 1: Add all the weighted values together. ...
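
A sketch of combining group means with the formula above, using group sizes as weights; the section sizes and means below are invented:

```python
# Two class sections of different sizes: combine their average scores.
sections = [(30, 72.0), (20, 81.0)]  # (number of students, section mean)

# Weighted mean of the section means, weighted by section size.
overall = sum(n * m for n, m in sections) / sum(n for n, _ in sections)
print(overall)  # 75.6
```

A plain average of the two section means (76.5) would overstate the smaller section's contribution; weighting by size recovers the true all-student mean.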

  20. (PDF) Weighted means and weighting functions

    In this method, the decision-maker's expectation levels for the input variables are directly transformed into weights by making use of the generator function of a weighted quasi-arithmetic mean.

  21. Step By Step Computation of the Average Weighted Mean

    In this video, I will demonstrate how to compute and discuss your results for the Average Weighted Mean. This is useful for research papers that utilized Lik...

  22. On weighted means and their inequalities

    are known in the literature as the weighted arithmetic mean, the weighted geometric mean, and the weighted harmonic mean of A and B, respectively. If \(v=1/2\), they are simply denoted by \(A\nabla B\), \(A\sharp B\), and \(A!B\), respectively. The previous operator means satisfy the following double inequality:
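
For plain positive numbers (the scalar case of the operator means above), the three weighted means and their ordering can be checked numerically; the values of A, B, and v below are arbitrary:

```python
# Weighted means of two positive numbers A and B with weight v in [0, 1]:
#   arithmetic  A∇B = (1-v)·A + v·B
#   geometric   A#B = A^(1-v) · B^v
#   harmonic    A!B = ((1-v)/A + v/B)^(-1)
A, B, v = 4.0, 16.0, 0.5

arith = (1 - v) * A + v * B
geom = A ** (1 - v) * B ** v
harm = 1 / ((1 - v) / A + v / B)

print(arith, geom, harm)  # 10.0 8.0 6.4
# Double inequality: harmonic <= geometric <= arithmetic.
```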

  24. The Significance of Duration Weighted Neighborhood Effects ...

    Purpose: Two important issues constrain the neighborhood effects literature. First, most prior research examining neighborhood effects on aggression and self-reported violence uses a point-in-time (i.e., cross-sectional) estimate of neighborhood disadvantage even though the duration of exposure to neighborhood disadvantage varies between families. Second, neighborhood effects may be understated ...
