
Revealed Preference in Economics: What Does It Show?


What is Revealed Preference?

Revealed preference, a theory offered by American economist Paul Anthony Samuelson in 1938, states that consumers' purchasing behavior, with income and prices held constant, is the best indicator of their preferences.

Key Takeaways

  • Revealed preference, a theory offered by American economist Paul Anthony Samuelson in 1938, states that consumers' purchasing behavior, with income and prices held constant, is the best indicator of their preferences.
  • Revealed preference theory works on the assumption that consumers are rational.
  • Three primary axioms of revealed preference are WARP, SARP, and GARP.

Understanding Revealed Preference

For a long time, consumer behavior, most notably consumer choice, had been understood through the concept of utility. In economics, utility refers to how much satisfaction or pleasure consumers get from a product, service, or experience. However, utility is incredibly difficult to quantify in indisputable terms, and by the beginning of the 20th century, economists were complaining about the pervasive reliance on it. Replacement theories were considered, but all were similarly criticized, until Samuelson's revealed preference theory, which posited that consumer preferences could be inferred not from unobservable utility but from observable behavior, relying on a small number of relatively uncontested assumptions.

Revealed preference is an economic theory regarding an individual's consumption patterns, which asserts that the best way to measure consumer preferences is to observe their purchasing behavior. Revealed preference theory works on the assumption that consumers are rational. In other words, they will have considered a set of alternatives before making a purchasing decision that is best for them. Thus, given that a consumer chooses one option out of the set, this option must be the preferred option.

Revealed preference theory allows room for the preferred option to change depending upon price and budgetary constraints. By examining the preferred option at each set of constraints, a schedule can be created of a given population's preferred items under varied pricing and budget constraints. The theory states that, given a consumer's budget, they will select the same bundle of goods (the "preferred" bundle) as long as that bundle remains affordable. Only if the preferred bundle becomes unaffordable will they switch to a less expensive, less desirable bundle of goods.
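The affordability logic above can be sketched in code. The snippet below is an illustrative sketch, not from the source (the `cost` and `revealed_preferred` helpers, prices, and bundles are invented): a bundle x is directly revealed preferred to a bundle y if, at the prices and income under which x was chosen, y was also affordable but was passed over.

```python
# Sketch of direct revealed preference (illustrative, not from the source).

def cost(prices, bundle):
    """Total expenditure on a bundle at the given prices."""
    return sum(p * q for p, q in zip(prices, bundle))

def revealed_preferred(chosen, other, prices, income):
    """True if `chosen` is directly revealed preferred to `other`:
    both were affordable, yet `other` was passed over."""
    return cost(prices, chosen) <= income and cost(prices, other) <= income

# Example: at prices (2, 3) with income 12, the consumer picks (3, 2).
prices, income = (2, 3), 12
chosen = (3, 2)     # costs 12 -- exactly affordable
rejected = (6, 0)   # costs 12 -- also affordable, but not chosen
print(revealed_preferred(chosen, rejected, prices, income))  # True
```

Bundles outside the budget set reveal nothing: choosing (3, 2) says nothing about an unaffordable bundle like (0, 5), which costs 15 at these prices.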

The original intention of revealed preference theory was to expand upon the theory of marginal utility, coined by Jeremy Bentham. Utility, or enjoyment from a good, is very hard to quantify, so Samuelson set about looking for a way to do so. Since then, revealed preference theory has been expanded upon by a number of economists and remains a major theory of consumption behavior. The theory is especially useful in providing a method for analyzing consumer choice empirically.

Three Axioms of Revealed Preference

As economists developed the revealed preference theory, they identified three primary axioms of revealed preference—the weak axiom, the strong axiom, and the generalized axiom.

  • Weak Axiom of Revealed Preference (WARP): This axiom states that, given incomes and prices, if one product or service is purchased instead of another, then, as consumers, we will always make the same choice. It also states that, having bought one particular product, we will never buy a different product or brand unless it is cheaper, more convenient, or of better quality (i.e., unless it provides more benefits). In short, the weak axiom says that we buy what we prefer and that our choices are consistent.
  • Strong Axiom of Revealed Preference (SARP): This axiom extends the consistency requirement to chains of choices, ruling out cycles of revealed preference. In a world where there are only two goods from which to choose, a two-dimensional world, the strong and weak axioms can be shown to be equivalent.
  • Generalized Axiom of Revealed Preference (GARP): This axiom covers the case when, for a given level of income and/or prices, we get the same level of benefit from more than one consumption bundle. In other words, this axiom accounts for situations in which no unique bundle that maximizes utility exists.
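The weak axiom can be checked mechanically against observed choice data. The sketch below is illustrative, not from the source (the `violates_warp` helper and the observations are invented): WARP is violated when two distinct bundles are each chosen in a situation where the other was affordable.

```python
# Illustrative WARP check over observed choices (a sketch, not from the source).
# Each observation is (prices, income, chosen_bundle).

def cost(prices, bundle):
    return sum(p * q for p, q in zip(prices, bundle))

def violates_warp(observations):
    """True if some pair of distinct bundles is mutually revealed preferred."""
    for p1, m1, x in observations:
        for p2, m2, y in observations:
            if x == y:
                continue
            x_rp_y = cost(p1, y) <= m1   # y was affordable when x was chosen
            y_rp_x = cost(p2, x) <= m2   # x was affordable when y was chosen
            if x_rp_y and y_rp_x:
                return True              # mutual revealed preference: violation
    return False

# Consistent data: in each situation, the other bundle was unaffordable.
ok = [((1, 2), 4, (2, 1)), ((2, 1), 4, (1, 2))]
print(violates_warp(ok))  # False
```

If both bundles were affordable in both situations, e.g. at identical prices and income, the check would flag a violation.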

Example of Revealed Preference

As an example of the relationships expounded upon in revealed preference theory, consider consumer X that purchases a pound of grapes. It is assumed under revealed preference theory that consumer X prefers that pound of grapes above all other items that cost the same, or are cheaper than, that pound of grapes. Since consumer X prefers that pound of grapes over all other items they can afford, they will only purchase something other than that pound of grapes if the pound of grapes becomes unaffordable. If the pound of grapes becomes unaffordable, consumer X will then move on to a less preferable substitute item.

Criticisms of Revealed Preference Theory

Some economists say that revealed preference theory makes too many assumptions. For instance, how can we be sure that consumers' preferences remain constant over time? Isn't it possible that an action at a specific point in time reveals only part of a consumer's preference scale at that time? For example, if just an orange and an apple were available for purchase, and the consumer chooses the apple, then we can definitely say that the apple is revealed preferred to the orange.

There is no proof to back up the assumption that a preference remains unchanged from one point in time to another. In the real world, there are lots of alternative choices. It is impossible to determine what product or set of products or behavioral options were turned down in preference to buying an apple.





How to Write a Strong Hypothesis | Guide & Examples

Published on 6 May 2022 by Shona McCombes.

A hypothesis is a statement that can be tested by scientific research. If you want to test a relationship between two or more variables, you need to write hypotheses before you start your experiment or data collection.

Table of contents

  • What is a hypothesis?
  • Developing a hypothesis (with example)
  • Hypothesis examples
  • Frequently asked questions about writing hypotheses

A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.

A hypothesis is not just a guess – it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations, and statistical analysis of data).

Variables in hypotheses

Hypotheses propose a relationship between two or more variables . An independent variable is something the researcher changes or controls. A dependent variable is something the researcher observes and measures.

For example, take the hypothesis: 'Daily exposure to the sun leads to increased levels of happiness.' In this example, the independent variable is exposure to the sun (the assumed cause), and the dependent variable is the level of happiness (the assumed effect).


Step 1: Ask a question

Writing a hypothesis begins with a research question that you want to answer. The question should be focused, specific, and researchable within the constraints of your project.

Step 2: Do some preliminary research

Your initial answer to the question should be based on what is already known about the topic. Look for theories and previous studies to help you form educated assumptions about what your research will find.

At this stage, you might construct a conceptual framework to identify which variables you will study and what you think the relationships are between them. Sometimes, you’ll have to operationalise more complex constructs.

Step 3: Formulate your hypothesis

Now you should have some idea of what you expect to find. Write your initial answer to the question in a clear, concise sentence.

Step 4: Refine your hypothesis

You need to make sure your hypothesis is specific and testable. There are various ways of phrasing a hypothesis, but all the terms you use should have clear definitions, and the hypothesis should contain:

  • The relevant variables
  • The specific group being studied
  • The predicted outcome of the experiment or analysis

Step 5: Phrase your hypothesis in three ways

To identify the variables, you can write a simple prediction in if … then form. The first part of the sentence states the independent variable and the second part states the dependent variable.

In academic research, hypotheses are more commonly phrased in terms of correlations or effects, where you directly state the predicted relationship between variables.

If you are comparing two groups, the hypothesis can state what difference you expect to find between them.

Step 6: Write a null hypothesis

If your research involves statistical hypothesis testing, you will also have to write a null hypothesis. The null hypothesis is the default position that there is no association between the variables. The null hypothesis is written as H0, while the alternative hypothesis is H1 or Ha.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.
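The procedure just described can be made concrete with a small simulation. The sketch below is illustrative, not from the source (the groups and values are invented): a two-sided permutation test of H0 'the two group means are equal' against the alternative that they differ.

```python
import random
import statistics

# Illustrative permutation test (data invented for the example).
# H0: the two groups have the same mean; Ha: the means differ.

def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference of group means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        if abs(statistics.mean(perm_a) - statistics.mean(perm_b)) >= observed:
            extreme += 1
    return extreme / n_permutations  # p-value: share of shuffles at least as extreme

group_a = [5.1, 4.8, 5.3, 5.0, 4.9]
group_b = [5.2, 5.0, 4.7, 5.1, 4.8]
p = permutation_test(group_a, group_b)
print(f"p = {p:.3f}")  # a large p means we fail to reject H0
```

A permutation test sidesteps distributional assumptions: under H0 the group labels are exchangeable, so shuffling them shows how often a difference at least as large as the observed one arises by chance.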

A hypothesis is not just a guess. It should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations, and statistical analysis of data).

A research hypothesis is your proposed answer to your research question. The research hypothesis usually includes an explanation (‘ x affects y because …’).

A statistical hypothesis, on the other hand, is a mathematical statement about a population parameter. Statistical hypotheses always come in pairs: the null and alternative hypotheses. In a well-designed study , the statistical hypotheses correspond logically to the research hypothesis.

Cite this Scribbr article


McCombes, S. (2022, May 06). How to Write a Strong Hypothesis | Guide & Examples. Scribbr. Retrieved 21 May 2024, from https://www.scribbr.co.uk/research-methods/hypothesis-writing/


Hick’s Logical Theory of Demand: Preference Hypothesis and Logic of Ordering


In order to explain the behaviour of an ideal consumer, Prof. Hicks assumes the preference hypothesis as the principle which governs the behaviour of such a consumer.

The assumption of behaviour according to a scale of preferences is known as the preference hypothesis.

Hicks explains the meaning of preference hypothesis or behaviour according to the scale of preference as follows:

“The ideal consumer (who is not affected by anything else than current market conditions) chooses that alternative out of the various alternatives open to him, which he most prefers, or ranks most highly. In one set of market conditions he makes one choice, in others other choices; but the choices he makes always express the same ordering, and must, therefore, be consistent with one another. This is the hypothesis made about the behaviour of the ideal consumer.”


The above statement of Hicks implies that the consumer in a given market situation chooses the most preferred combination and he will choose different combinations in different market situations but his choices in different market situations will be consistent with each other.

It is important to remember that Hicks' demand theory presented in 'Value and Capital' was also based upon the preference hypothesis, but there he expressed the given scale of preferences at once in the form of a set of indifference curves. This direct introduction of a geometrical device has, as already noted above, various disadvantages and has, therefore, been given up. In 'A Revision of Demand Theory', Hicks begins from the logic of ordering itself rather than starting from its geometrical application.

According to him, “the demand theory which is based upon the preference hypothesis turns out to be nothing else but an economic application of the logical theory of ordering.” Therefore, before deriving demand theory from preference hypothesis he explains the “logic of order”. In this context he draws out difference between strong ordering and weak ordering. He then proceeds to base his demand theory on weak-ordering form of preference hypothesis.

Strong and Weak Orderings Distinguished :

A set of items is strongly ordered, if each item has a place of its own in the order and each item could then be given a number and to each number there would be one item and only one item which would correspond. A set of items is weakly ordered if the items are clustered into groups but none of the items within a group can be put ahead of the others. “A weak ordering consists of a division into groups, in which sequence of groups is strongly ordered, but in which there is no ordering within the groups.”
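The distinction just drawn can be stated concretely: a strong ordering assigns every item a rank of its own, while a weak ordering permits ties within groups. A minimal illustrative sketch, not from the source (the helper and rankings are invented):

```python
# Illustrative check (not from the source): a ranking is a *strong* ordering
# when every item has a unique rank; ties make it merely a *weak* ordering.

def is_strong_ordering(ranks):
    """ranks maps item -> rank; strong iff no two items share a rank."""
    return len(set(ranks.values())) == len(ranks)

strong = {"A": 1, "B": 2, "C": 3}   # unique ranks: strong ordering
weak   = {"A": 1, "B": 1, "C": 2}   # A and B tied: weak ordering only

print(is_strong_ordering(strong))  # True
print(is_strong_ordering(weak))    # False
```

In the weak case, the groups themselves (rank 1, then rank 2) are strongly ordered, but A and B cannot be put ahead of one another, exactly as in Hicks' definition.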

It should be noted that indifference curves imply weak ordering in as much as all the points on a given indifference curve are equally desirable and hence occupy same place in the order. On the other hand, revealed preference approach implies strong ordering since it assumes that the choice of a combination reveals consumer’s preference for it over all other alternative combinations open to him. Choice can reveal preference for a combination only if all the alternative combinations are strongly ordered.

Weak ordering implies that if the consumer chooses a position and rejects others open to him, the rejected positions need not be inferior to the position actually chosen but may have been indifferent to it. Hence, under weak ordering, actual choice fails to reveal definite preference. Strong ordering and weak ordering as applied to the theory of demand are illustrated in Fig. 13.1.

Strong Ordering: Choice Reveals Preference

If the consumer is confronted with the price-income situation aa, then he can choose any combination that lies in or on triangle aOa. Suppose that our consumer chooses the combination A. Let us assume that our consumer is an ideal consumer who is acting according to his scale of preferences. Now, the question is how his act of choice of A from among the available alternatives within and on the triangle aOa is to be interpreted.

If the available alternatives are strongly ordered, then the choice of A by the consumer will show that he prefers A over all other available alternatives. In Samuelson’s language he ‘reveals his preference’ for A over all other possible alternatives which are rejected. Since, under strong ordering, the consumer shows definite preference for the selected alternative, there is no question of any indifferent positions to the selected one.

Hicks’ Criticism of the Logic of Strong Ordering :

Hicks criticises the logic of strong ordering: "If we interpret the preference hypothesis to mean strong ordering, we cannot assume that all the geometrical points, which lie within or on the triangle aOa, represent effective alternatives. A two-dimensional continuum of points cannot be strongly ordered."

Prof. Hicks further says that if commodities are assumed to be available only in discrete units, so that the diagram is to be conceived as being drawn on squared paper and the only effective alternatives are the points at the corners of squares and therefore the selected point must also lie at the corner of a square, then the strong ordering hypothesis is acceptable.

Since in the real world commodities are available in discrete units, the strong ordering hypothesis might seem to present no difficulty. But Hicks contends that while actual commodities may be available in integral numbers of units, this cannot be said of the composite commodity money, which is usually measured on the Y-axis.

Hicks regards money as finely divisible. To quote him:

“If every one of the actual commodities into which M can be exchanged is itself only available in discrete units; but if the number of such commodities is large, there will be a large number of ways in which a small increment of M can be consumed by rearrangement of consumption among the individual commodities, whence it will follow that the units in which M is to be taken to be available must be considered as exceedingly small.

And as soon as any individual commodity becomes available in units that are finely divisible, M must be regarded as finely divisible. In practice, we should usually think of M as being money, held back for the purchase of other commodities than X; though money is not finely divisible in a mathematical sense, the smallest monetary unit (farthing or cent) is so small in relation to the other units with which we are concerned that the imperfect divisibility of money is in practice a thing of no importance.

For these reasons, while it is a theoretical improvement to be able to regard the actual commodity X as available in discrete units, it is no improvement at all to be obliged to impute the same indivisibility to the composite commodity M. It is much better to regard money as finely divisible.”

So, according to Hicks, where the choice is between a good which is available in discrete units and money which is finely divisible, the possibility of equally desired combinations must be accepted, and strong ordering has, therefore, to be given up. Why the strong ordering hypothesis is not valid when the choice is between money, which is finely divisible and is represented on the Y-axis, and the commodity X, which is imperfectly divisible and is represented on the X-axis, is illustrated in Fig. 13.2.

This is because when money measured on Y-axis is taken to be finally divisible, the effective alternatives will no longer be represented by square corners, they will appear in the diagram as a series of parallel lines (or stripes) as shown in Fig. 13.2. All points on the stripes will be effective alternatives but such alternatives cannot be strongly ordered “unless the whole of one stripe was preferred to the whole of the next stripe, and so on; which means that the consumer would always prefer an additional unit of X whatever he had to pay for it.” But this is quite absurd.

Strong Ordering cannot be Maintained when One Commodity is Money

Thus, the effective alternatives appearing on the stripes cannot be strongly ordered. Again, suppose there are two alternatives P and Q on a given stripe which are such that P is preferred to R on another stripe, while R is preferred to Q. Given that, we can always find a point between P and Q on a given stripe which is indifferent to R.

It is thus evident that when various alternatives appear as a series of stripes, there can be a relation of indifference between some of them. Thus strong ordering cannot be maintained when the various alternative combinations consist of the composite commodity money, which is finely divisible, and an actual commodity which is available only in discrete units. “As soon as we introduce the smallest degree of continuity (such as is introduced by the ‘striped’ hypothesis) strong ordering has to be given up.”

The Logic of Weak Ordering :

After rejecting the strong ordering hypothesis, Hicks proceeds to establish the case for the adoption of the weak ordering hypothesis. As noted above, the weak ordering hypothesis recognizes the relation of indifference, while the strong ordering hypothesis does not. In the words of Hicks, “If the consumer’s scale of preferences is weakly ordered, then his choice of a particular position A does not show (or reveal) that A is preferred to any rejected position within or on the triangle: all that is shown is that there is no rejected position which is preferred to A. It is perfectly possible that some rejected position may be indifferent to A; the choice of A instead of that rejected position is then a matter of ‘chance’.”

From the above statement of Hicks it is clear that, under the weak ordering hypothesis, the choice of a particular combination does not indicate preference for that particular combination over another possible alternative combination but it only shows that all other possible alternative combinations within or on the choice triangle cannot be preferred to the chosen combination.

There is a possibility of some rejected combinations being indifferent to the selected one. If the preference hypothesis in its weak ordering form is adopted alone, it yields so little information about the consumer's behaviour that the basic propositions of demand theory cannot be derived from it.

Therefore, Hicks has felt it necessary to introduce an additional hypothesis along with the adoption of the weak ordering hypothesis so as to derive the basic propositions of demand theory. This additional hypothesis is simply that ‘the consumer will always prefer a larger amount of money to a smaller amount of money, provided that the amount of good X at his disposal is unchanged’.

It should be carefully noted that it is not necessary to make this additional hypothesis if strong ordering form of preference hypothesis is adopted. But this additional hypothesis which has been introduced by Hicks is very reasonable and is always implicit in economic analysis, even though it is not explicitly stated every time.

Now the question is what positive information is provided by weak ordering approach when supported by the above additional hypothesis. Let us consider Fig. 13.3. From all the available combinations within and on the triangle aOa the consumer chooses A. Under weak ordering hypothesis alone the choice of A rather than B which lies within the triangle aOa does not show that A is preferred to B; it only shows that B is not preferred to A. In other words, under weak ordering alone, the choice of A rather than B means that either A is preferred to B, or the consumer is indifferent between A and B.

Weak-ordering Approach along with an additional hypothesis about money

Now, consider the position L, which lies where the stripe through B meets the line aa. On the additional hypothesis made, L is preferred to B, since L contains a greater amount of money than B, the amount of X being the same in both positions. If A and B are indifferent, then from transitivity it follows that L is preferred to A. But L was available when A was selected. Therefore, though L can be indifferent to A, it cannot be preferred to A.

Thus, it follows that the possibility that A and B are indifferent must be ruled out. Hence, when we adopt the weak ordering along with the additional hypothesis we come to the conclusion that the chosen combination A is preferred to any combination such as B which lies within the triangle. What cannot be said with certainty under weak ordering even with the additional hypothesis is whether the chosen combination A is preferred to a combination such as L which lies on the triangle, that is, on the line aa. A can be either preferred to L or indifferent to it.
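The transitivity argument of the last two paragraphs can be written out step by step. The sketch below is an illustrative encoding, not from the source (the function and its flags are invented), tracing how the supposition A ~ B collides with the fact that L was available when A was chosen.

```python
# Illustrative encoding of Hicks' argument (a sketch, not from the source).
# Premises: L > B (L has more money, same amount of X); suppose A ~ B.
# Transitivity then gives L > A. But L was affordable when A was chosen,
# so nothing affordable can be strictly preferred to A -- a contradiction.
# Hence A ~ B is untenable, and A must be preferred to B.

def contradiction_if_indifferent(l_pref_b, a_indiff_b, l_available):
    """True when the supposition A ~ B yields a contradiction."""
    l_pref_a = l_pref_b and a_indiff_b   # transitivity: L > B and B ~ A => L > A
    return l_pref_a and l_available      # L > A clashes with A chosen over L

print(contradiction_if_indifferent(True, True, True))   # True: A ~ B ruled out
```

Dropping any premise (say, L not being available) dissolves the contradiction, which is why the conclusion holds only for bundles strictly inside the choice triangle.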

Drawing out the difference between the implications of strong and weak orderings, Hicks says: “The difference between the consequences of strong and weak ordering so interpreted amounts to no more than this: that under strong ordering the chosen position is shown to be preferred to all other positions within and on the triangle, while under weak ordering it is preferred to all positions within the triangle, but may be indifferent to other positions on the same boundary as itself.”

It will be evident from above that the difference between the effects of the strong and weak orderings is very small and that it only affects a class of limiting cases (i.e., positions lying on the triangle). The weak ordering theory, Hicks says, “has a larger tolerance and, therefore, it deals with these limiting cases rather better”. Apart from this, weak ordering hypothesis, contends Hicks, is more useful and desirable.

“If we take the strong ordering approach, we are committing ourselves to discontinuity not merely to the indivisibility of the particular commodity, demand for which is being studied, but also to the indivisibility of the composite commodity used as a background. If, on the other hand, we take the weak ordering approach, we are committing ourselves to some degree of continuity but divisibility of the background commodity is itself quite sufficient to ensure that the weak ordering approach is practicable.”

As stated above, the weak ordering approach to be useful for demand theory requires an additional assumption to be made, namely, that the consumer prefers a larger amount of money to a smaller amount. Further, another assumption which is to be necessarily made when the weak ordering approach is adopted is that the preference order is transitive. These two additional assumptions are not required in the case of strong ordering approach.


Expected Utility Hypothesis

First Online: 01 January 2017


  • Mark J. Machina


The expected utility hypothesis – that is, the hypothesis that individuals evaluate uncertain prospects according to their expected level of ‘satisfaction’ or ‘utility’ – is the predominant descriptive and normative model of choice under uncertainty in economics. It provides the analytical underpinnings for the economic theory of risk-bearing, including its applications to insurance and financial decisions, and has been formally axiomatized under conditions of both objective (probabilistic) and subjective (event-based) uncertainty. In spite of evidence that individuals may systematically depart from its predictions, and the development of alternative models, expected utility remains the leading model of economic choice under uncertainty.
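As a worked illustration of the hypothesis (the lottery and utility function are invented for the example, not taken from the entry), the sketch below evaluates a 50/50 gamble under a concave, risk-averse utility u(w) = √w, comparing the expected utility of the gamble with the utility of its expected value.

```python
import math

# Illustrative expected-utility calculation (lottery and utility invented).
# A risk-averse agent with u(w) = sqrt(w) faces a 50/50 lottery over 0 and 100.

def expected_utility(outcomes, probs, u):
    """E[u(w)] for a discrete lottery."""
    return sum(p * u(w) for w, p in zip(outcomes, probs))

u = math.sqrt
outcomes, probs = [0.0, 100.0], [0.5, 0.5]

eu = expected_utility(outcomes, probs, u)          # 0.5*u(0) + 0.5*u(100) = 5.0
ev = sum(p * w for w, p in zip(outcomes, probs))   # expected value = 50.0

print(eu)       # 5.0
print(u(ev))    # ~7.07: u(E[w]) > E[u(w)], the mark of risk aversion
print(eu ** 2)  # 25.0: the certainty equivalent -- 25 for sure matches the gamble
```

The gap between the certainty equivalent (25) and the expected value (50) is the risk premium the agent would pay to avoid the gamble, which is the analytical basis for insurance mentioned above.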

This chapter was originally published in The New Palgrave Dictionary of Economics , 2nd edition, 2008. Edited by Steven N. Durlauf and Lawrence E. Blume



Whitmore, G., and M. Findlay, eds. 1978. Stochastic dominance: An approach to decision making under risk . Lexington: D.C. Heath.

Wolfson, L., J. Kadane, and M. Small. 1996. Expected utility as a policy making tool: An environmental health example. In Bayesian biostatistics , ed. D. Berry and D. Stangl. New York: Marcel Dekker.

Download references

Author information

Authors and affiliations.

http://link.springer.com/referencework/10.1057/978-1-349-95121-5

Mark J. Machina

You can also search for this author in PubMed   Google Scholar

Editor information

Editors and affiliations, copyright information.

© 2008 The Author(s)

About this entry

Cite this entry.

Machina, M.J. (2008). Expected Utility Hypothesis. In: The New Palgrave Dictionary of Economics. Palgrave Macmillan, London. https://doi.org/10.1057/978-1-349-95121-5_127-2


eNeuro, vol. 7(4), July/August 2020

Scientific Hypothesis-Testing Strengthens Neuroscience Research

Author contributions: B.E.A. conceived of and wrote the paper.

Science needs to understand the strength of its findings. This essay considers the evaluation of studies that test scientific (not statistical) hypotheses. A scientific hypothesis is a putative explanation for an observation or phenomenon; it makes (or “entails”) testable predictions that must be true if the hypothesis is true and that lead to its rejection if they are false. The question is, “how should we judge the strength of a hypothesis that passes a series of experimental tests?” This question is especially relevant in view of the “reproducibility crisis” that is the cause of great unease. Reproducibility is said to be a dire problem because major neuroscience conclusions supposedly rest entirely on the outcomes of single, p valued statistical tests. To investigate this concern, I propose to (1) ask whether neuroscience typically does base major conclusions on single tests; (2) discuss the advantages of testing multiple predictions to evaluate a hypothesis; and (3) review ways in which multiple outcomes can be combined to assess the overall strength of a project that tests multiple predictions of one hypothesis. I argue that scientific hypothesis testing in general, and combining the results of several experiments in particular, may justify placing greater confidence in multiple-testing procedures than in other ways of conducting science.

Significance Statement

The statistical p value is commonly used to express the significance of research findings. But a single p value cannot meaningfully represent a study involving multiple tests of a given hypothesis. I report a survey that confirms that a large fraction of neuroscience work published in The Journal of Neuroscience does involve multiple-testing procedures. As readers, we normally evaluate the strength of a hypothesis-testing study by “combining,” in an ill-defined intellectual way, the outcomes of multiple experiments that test it. We assume that conclusions that are supported by the combination of multiple outcomes are likely to be stronger and more reliable than those that rest on single outcomes. Yet there is no standard, objective process for taking multiple outcomes into account when evaluating such studies. Here, I propose to adapt methods normally used in meta-analysis across studies to help rationalize this process. This approach offers many direct and indirect benefits for neuroscientists’ thinking habits and communication practices.

Introduction

Scientists are not always clear about the reasoning that we use to conduct, communicate, and draw conclusions from our work, and this can have adverse consequences. A lack of clarity causes difficulties and wastes time in evaluating and weighing the strength of each others’ reports. I suggest that these problems have also influenced perceptions about the “reproducibility crisis” that science is reportedly suffering. Concern about the reliability of science has reached the highest levels of the NIH ( Collins and Tabak, 2014 ) and numerous other forums ( Landis et al., 2012; Task Force on Reproducibility, American Society for Cell Biology, 2014 ). Many of the concerns stem from portrayals of science like that offered by the statistician, John Ioannidis, who argues that “most published research findings are false,” especially in biomedical science ( Ioannidis, 2005 ). He states “… that the high rate of nonreplication (lack of confirmation) of research discoveries is a consequence of the convenient, yet ill-founded strategy of claiming conclusive research findings solely on the basis of a single study assessed by formal statistical significance, typically for a p value <0.05 ” (italics added).

He continues, “Research is not most appropriately represented and summarized by p values, but, unfortunately, there is a widespread notion that medical research articles should be interpreted based only on p values.”

Additional concerns are added by Katherine Button and colleagues ( Button et al., 2013 ), who conclude that much experimental science, such as neuroscience, is fatally flawed because its claims are based on statistical tests that are “underpowered,” largely because of small experimental group sizes. Statistical power is essentially the ability of a test to identify a real effect when it exists. Power is defined as “1-β,” where β is the probability of failing to reject the null hypothesis when it should be rejected. Statistical power varies from 0 to 1 and values of ≥0.8 are considered “good.” Button et al. (2013) calculate that the typical power of a neuroscience study is ∼0.2, i.e., quite low.
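To make the power concept concrete, here is a minimal sketch (mine, not from the essay) that approximates the power of a two-sided, two-sample test using the normal distribution; the effect size and group size are illustrative assumptions chosen to land near the low power figure that Button et al. report.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def approx_power(d: float, n_per_group: int, z_crit: float = 1.96) -> float:
    """Normal approximation to the power of a two-sided, two-sample test
    at alpha = 0.05. d is the standardized effect size (Cohen's d); the
    noncentrality of the test statistic is d * sqrt(n/2). The negligible
    lower-tail rejection region is ignored."""
    noncentrality = d * math.sqrt(n_per_group / 2.0)
    return normal_cdf(noncentrality - z_crit)

# A modest effect (d = 0.3) with a small group size (n = 20 per group):
print(round(approx_power(0.3, 20), 2))  # about 0.16 -- far below the 0.8 benchmark
```

Raising the group size raises power: under the same approximation, d = 0.8 with 64 subjects per group clears the conventional 0.8 threshold.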

However, these serious concerns arise from broad assumptions that may not be universally applicable. Biomedical science encompasses many experimental approaches, and not all are equally susceptible to the criticisms. Projects in which multiple tests are performed to arrive at conclusions are expected to be more reliable than those in which one test is considered decisive. To the extent that basic (“pre-clinical”) biomedical science consists of scientific hypothesis testing, in which a given hypothesis is subjected to many tests of its predictions, it may be more reliable than other forms of research.

It is critical here to distinguish between a “scientific hypothesis” and a “statistical hypothesis,” which are very different concepts ( Alger, 2019 ; chapter 5). A scientific hypothesis is a putative conceptual explanation for an observation or phenomenon; it makes predictions that could, in principle, falsify it. A statistical hypothesis is simply a mathematical procedure (often part of Null Hypothesis Significance Testing, NHST) that is conducted as part of a broader examination of a scientific hypothesis ( Alger, 2019 , p. 133). However, scientific hypotheses can be tested without using NHST methods, and, vice versa, NHST methods are often used to compare groups when no scientific hypothesis is being tested. Unless noted otherwise, in this essay “hypothesis” and “hypothesis testing” refer to scientific hypotheses.

To appreciate many of the arguments of Ioannidis, Button, and their colleagues, it is necessary to understand their concept of positive predictive value (PPV; see equation below). This is a statistical construct that is used to estimate the likelihood of reproducing a given result. PPV is defined as “the post-study probability that [the experimental result] is true” ( Button et al., 2013 ). In addition to the “pre-study odds” of a result’s being correct, the PPV is heavily dependent on the p value of the result and the statistical power of the test. It follows from the statisticians’ assumptions about hypotheses and neuroscience practices that calculated PPVs for neuroscience research are low ( Button et al., 2013 ). On the other hand, PPVs could be higher if their assumptions did not apply. I stress that I am not advocating for the use of the PPV, which can be criticized on technical grounds, but must refer to it to examine the statistical arguments that suggest deficiencies in neuroscience.

To look into the first assumption, that neuroscience typically bases many important conclusions on single p valued tests, I analyze papers published in consecutive issues of The Journal of Neuroscience during 2018. For the second assumption, I review elementary joint probability reasoning that indicates that the odds of obtaining a group of experimental outcomes by chance alone are generally extremely small. This notion is the foundation of the argument that conclusions derived from multiple experiments should be more secure than those derived from one test. However, there is currently no standard way of objectively evaluating the significance of a collection of results. As a step in this direction, I use two very different procedures, Fisher’s method of combining results and meta-analysis of effect sizes ( Cummings and Calin-Jageman, 2017 ) measured by Cohen’s d , which have not, as far as I know, been applied to the problem of combining outcomes in the way that we need. Finally, in Discussion, I suggest ways in which combining methods such as these can improve how we assess and communicate scientific findings.

Materials and Methods

To gauge the applicability of the statistical criticisms to typical neuroscience research, I classified all Research Articles that appeared in the first three issues of The Journal of Neuroscience in 2018 according to my interpretation of the scientific “modes” they represented, i.e., “hypothesis testing,” “questioning,” etc., because these modes have different standards for acceptable evidence. Because my focus is on hypothesis testing, I did a pdf search of each article for “hypoth” (excluding references to “statistical” hypothesis and cases where “hypothesis” was used incorrectly as a synonym for “prediction”). I also searched “predict” and “model” (which was counted when used as a synonym for “hypothesis” and excluded when it referred to “animal models,” “model systems,” etc.) and checked the contexts in which the words appeared. In judging how to categorize a paper, I read its Abstract, Significance Statement, and as much of the text, figure legends, and Discussion as necessary to understand its aims and see how its conclusions were reached. Each paper was classified as “hypothesis-based,” “discovery science” (identifying and characterizing the elements of an area), “questioning” (a series of related questions not evidently focused on a hypothesis), or “computational-modeling” (where the major focus was on a computer model, and empirical issues were secondary).

I looked not only at what the authors said about their investigation, i.e., whether they stated directly that they were testing a hypothesis or not, but what they actually did. As a general observation, scientific authors are inconsistent in their use of “hypothesis,” and they often omit the word even when it is obvious that they are testing a hypothesis. When the authors assumed that a phenomenon had a specific explanation, then conducted experimental tests of logical predictions of that explanation, and drew a final conclusion related to the likely validity of the original explanation, I counted it as implicitly based on a hypothesis even if the words “hypothesis,” “prediction,” etc. never appeared. For all hypothesis-testing papers, I counted the number of experimental manipulations that tested the main hypothesis, even if there were one or more subsidiary hypotheses (see example in text). If a paper did not actually test predictions of a potential explanation, then I categorized it as “questioning” or “discovery” science. While my strategy was unavoidably subjective, the majority of classifications would probably be uncontroversial and disagreements unlikely to change the overall trends substantially.

To illustrate use of the statistical combining methods, I analyzed the paper by Cen et al. (2018) , as suggested by a reviewer of the present article. The authors made multiple comparisons with ANOVAs followed by Bonferroni post hoc tests; however, to make my analysis more transparent, I measured means and SEMs from their figures and conducted two-tailed t tests. When more than one experimental group was compared with the same standard control, I took only the first measurement to avoid possible complications of non-independent p values. I used the p values to calculate the combined mean significance level for all of the tests according to Fisher’s method (see below). This is an extremely conservative approach, as including the additional tests would have further increased the significance of the combined test.

For the meta-analysis of the Cohen’s d parameter ( Cummings and Calin-Jageman, 2017 ; p. 239), I calculated effect sizes on the same means and SEMs from which p values were obtained for the Fisher’s method example. I determined Cohen’s d using an on-line calculator ( https://www.socscistatistics.com/effectsize/default3.aspx ) and estimated statistical power with G*-Power ( http://www.psychologie.hhu.de/arbeitsgruppen/allgemeine-psychologie-und-arbeitspsychologie/gpower.html ). I then conducted a random-effects meta-analysis on the Cohen’s d values with Exploratory Software for Confidence Interval (ESCI) software, which is available at https://thenewstatistics.com/itns/esci/ ( Cummings and Calin-Jageman, 2017 ).

Of the total of 52 Research Articles in the first three issues of The Journal of Neuroscience in 2018, I classified 39 (75%) as hypothesis-based, with 19 “explicitly” and 20 “implicitly” testing one or more hypotheses. Of the remaining 13 papers, eight appeared to be “question” or “discovery” based, and five were primarily computer-modeling studies that included a few experiments (see Table 1 ). Because the premises and goals of the non-hypothesis testing kinds of studies are fundamentally distinct from hypothesis-testing studies ( Alger, 2019 ; chapter 4), the same standards cannot be used to evaluate them, and I did not examine these papers further.

Analysis of The Journal of Neuroscience Research Articles

Classification of research reports published in The Journal of Neuroscience , vol. 38, issues 1–3, 2018, identified by page range ( n  = 52). An x denotes that the paper was classified in this category. Categories were: Hyp-E: at least one hypothesis was fairly explicitly stated; Hyp-I: at least one hypothesis could be inferred from the logical organization of the paper and its conclusions, but was not explicitly stated; Alt-Hyp: at least one alternative hypothesis in addition to the main one was tested; # Tests: an estimate of the number of experiments that critically tested the major (not subsidiary or other) hypothesis; Support: the tests were consistent with the main hypothesis; Reject: at least some tests explicitly falsified at least one hypothesis; Disc: a largely “discovery science” report, not obviously hypothesis-based; Ques: experiments attempted to answer a series of questions, not unambiguously hypothesis-based; Comp: mainly a computational modeling study, experimental data were largely material for model.

None of the papers based its major conclusion on a single test. In fact, the overarching conclusion of each hypothesis-based investigation was supported by approximately seven experiments (6.9 ± 1.57, mean ± SD, n  = 39) that tested multiple predictions of the central hypothesis. In 20 papers, at least one alternative hypothesis (between one and three) was directly mentioned. Typically (27/39), the experimental tests were “consistent” with the overall hypothesis, while in 19 papers, at least one hypothesis was explicitly falsified or ruled out. These results replicate previous findings ( Alger, 2019 ; chapter 9).

As noted earlier, some science criticism rests on the concept that major scientific conclusions rest on the outcome of a single p valued test. Indeed, there are circumstances in which the outcome of a single test is intended to be decisive, for instance, in clinical trials of drugs where we need to know whether the drugs are safe and effective or not. Nevertheless, as the preceding analysis showed, the research published in The Journal of Neuroscience is not primarily of this kind. Moreover, we intuitively expect conclusions bolstered by several lines of evidence to be more secure than those resting on just one. Simple statistical principles quantify this intuition.

Provided that individual events are truly independent—the occurrence of one does not affect the occurrence of the other and the events are not correlated—then the rule is to multiply their probabilities to get the probability of the joint, or compound, event in which all of the individual events occur together or sequentially. Consider five games of chance with probabilities of winning of 1/5, 1/15, 1/20, 1/6, and 1/10. While the odds of winning any single game are not very small, if you saw someone step up and win all five in a row, you might well suspect that he was a cheat, because the odds of doing that are 1/90,000.

The same general reasoning applies to the case in which several independent experimental predictions of a given hypothesis are tested. If the hypothesis is that GABA is the neurotransmitter at a given synapse, then we could use different groups of animals, experimental preparations, etc. and test five independent predictions: that synaptic stimulation will evoke an IPSP; chemically distinct pharmacological agents will mimic and block the IPSP; immunostaining for the GABA-synthetic enzyme will be found in the pre-synaptic nerve terminal; the IPSP will not occur in a GABA receptor knock-out animal, etc. The experiments test completely independent predictions of the same hypothesis, hence the chance probability of obtaining five significant outcomes that are consistent with it by random chance alone must be much lower than that of obtaining any one of them. If the tests were done at p  ≤ 0.05, the chances would be ≤(0.05)^5, or about 3.1 × 10^−7, that they would all just happen to be consistent with the hypothesis. Accordingly, we feel that a hypothesis that has passed many tests is on much firmer ground than if it had passed only one test. Note, however, that the product of a group of p values is just a number; it is not itself a significance level.
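The two joint-probability calculations above can be checked directly. This sketch (mine, not from the essay) multiplies the five win probabilities from the games example and then computes the chance that five independent tests at p ≤ 0.05 would all favor the hypothesis by luck alone.

```python
from fractions import Fraction

# Win probabilities for the five independent games of chance:
games = [Fraction(1, 5), Fraction(1, 15), Fraction(1, 20),
         Fraction(1, 6), Fraction(1, 10)]

# Independent, uncorrelated events: multiply probabilities to get the
# probability of the joint event (winning all five).
joint_games = Fraction(1, 1)
for p in games:
    joint_games *= p
print(joint_games)  # 1/90000

# Chance that five independent significance tests at p <= 0.05 all come
# out consistent with the hypothesis by chance alone:
joint_tests = 0.05 ** 5
print(f"{joint_tests:.3e}")  # 3.125e-07 -- a tiny number, but NOT itself a p value
```

The final comment matters: as the text notes, the product of p values is just a number, which is why a combining procedure such as Fisher’s method is needed to turn it into a significance level.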

It can be difficult for readers to tease the crucial information out of scientific papers as they are currently written. Not only is the work intrinsically complicated, but papers are often not written to maximize clarity. A common obstacle to good communication is the tendency of scientific papers to omit a direct statement of the hypotheses that are being tested, which is an acute problem in papers overflowing with data and significance tests. An ancillary objective of my proposal for analysis is to encourage authors to be more straightforward in laying out the logic of their work. It may be instructive to see how a complex paper can be analyzed.

As an example, I used the paper of Cen et al. (2018) . Although the paper reports a total of 114 p values, they do not all factor equally in the analysis. The first step is to see how the experiments are organized. The authors state that their main hypothesis is that N-cadherin, regulated by PKD1, promotes functional synapse formation in the rodent brain. It appears that the data in the first two figures of the paper provide the critical tests of this hypothesis. These figures include 43 statistical comparisons, many of which were controls to ensure measurement validity, or which did not critically test the hypothesis, e.g., 18 tests of spine area or miniature synaptic amplitude were supportive, but not critical. I omitted them, as well as multiple comparisons made to the same control group to avoid the possibility of correlations among p values. For instance, if an effect was increased by PKD1 overexpression (OE) and reduced by dominant negative (DN) PKD1, I counted only the increase, as both tests used the same vector-treated control group. In the end, six unique comparisons tested crucial, independent, non-redundant predictions (shown in Figs. 2 A2 , B2 , D2 , E2 , 3 B2 , C2 of Cen et al., 2018 ). I emphasize that this exercise is merely intended to illustrate the combining methods; the ultimate aim is to encourage authors to explain and justify their decisions about including or excluding certain tests in their analyses.

Cen et al. (2018) test the following predictions of their main hypothesis with a variety of morphologic and electrophysiological methods:

(1) N-cadherin directly interacts with PKD1. Test: GST pull-down.

(2) N-cadherin and PKD1 will co-localize to the synaptic region. Test: immunofluorescence images.

Predictions (1) and (2) are descriptive, i.e., non-quantified; other predictions are tested quantitatively.

(3) PKD1 increases synapse formation. Tests ( Fig. 2 A2 ): OE of hPKD1 increases spine density and area ( p  < 0.001 for both); DN-hPKD1 decreases spine density and area ( p  < 0.001 for both).

Figure 2.

Diagram of the logical structure of Cen et al. (2018) . The paper reports several distinct groups of experiments. One group tests the main hypothesis and others test subsidiary hypotheses that are complementary to the main one but are not a necessary part of it. Connections between hypotheses and predictions that are logically necessary are indicated by solid lines; dotted lines indicate complementary, but not mandatory, connections. Falsification of the logically-necessary predictions would call for rejection of the hypothesis in its present form; falsification of any of the subsidiary hypothesis would not affect the truth of the main hypothesis. The figure numbers in the boxes identify the source of major data in Cen et al., 2018 that were used to test the indicated hypothesis.

(4) PKD1 increases synaptic transmission. Tests ( Fig. 2 B2 ): OE of hPKD1 increases mEPSC frequency ( p  < 0.006) but not amplitude; DN-hPKD1 decreases mEPSC frequency ( p  < 0.002) but not amplitude.

(5) PKD1 acts upstream of N-cadherin on synapse formation and synaptic transmission. Tests ( Fig. 3 B2 ): DN-hPKD1-induced reductions of spine density and area are rescued by OE of N-cadherin ( p  < 0.001 for both). DN-hPKD1-induced reduction in mEPSC frequency is rescued by OE of N-cadherin ( p  < 0.001).

Figure 3.

Meta-analysis of the effect sizes observed in the primary tests of the main hypothesis of Cen et al. (2018 ; n  = 6; shown in Fig. 1 ). I obtained effect sizes by measuring the published figures and calculated Cohen’s d values with an on-line calculator: https://www.socscistatistics.com/effectsize/default3.aspx . Analysis and graphic display (screenshot) were done with ESCI (free at https://thenewstatistics.com/itns/esci/ ). Top panel shows individual effect sizes (corrected, d unbiased ) for the tendency of small samples to overestimate true effect sizes (see Cummings and Calin-Jageman, 2017 ; pp 176–177), N s and degrees of freedom (df) of samples compared, together with confidence intervals (CIs) of effect sizes and relative weights (generated by ESCI and based mainly on sample size) that were assigned to each sample. Upper panel also shows mean effect size for random effects model and CI for mean. Bottom panel shows individual means (squares) and CIs for d unbiased (square size is proportional to sample weight). The large diamond at the very bottom is centered (vertical peak of diamond) at the mean effect size, while horizontal diamond peaks indicate CI for the mean.

These are the key predictions of the main hypothesis: their falsification would have called for the rejection of the hypothesis in its present form. The organization of these tests of the main hypothesis is illustrated in Figure 1 .

Figure 1.

Diagram of the main hypothesis and predictions of Cen et al. (2018) . The solid lines connect the hypothesis and the logical predictions tested. This diagram omits experimental controls tests that primarily validate techniques, include non-independent p values, or add useful but non-essential information. The main hypothesis predicts that PKD1 associates directly with N-cadherin, and that PKD1 and N-cadherin jointly affect synaptic development in a variety of structural and physiological ways. Separate groups of experiments test these predictions.

Cen et al. (2018) go on to identify specific sites on N-cadherin that PKD1 binds and phosphorylates and they test the associated hypothesis that these sites are critical for the actions of PKD1 on N-cadherin. They next investigate β-catenin as a binding partner for N-cadherin and test the hypothesis that this binding is promoted by PKD1. While these subsidiary hypotheses and their tests clearly complement and extend the main hypothesis, they are distinct from it and must be analyzed separately. Whether falsified or supported, the outcomes of testing them would not affect the conclusion of the main hypothesis. The relationships among the main hypothesis and other hypotheses are shown in Figure 2. Note that Cen et al. (2018) is unusually intricate, although not unique; the diagrams of most papers will not be nearly as complicated as Figures 1 and 2.

Basic probability considerations imply that the odds of getting significant values for all six critical tests in Cen et al. (2018) by chance alone are extremely tiny; however, as mentioned, the product of a group of p values is not a significance level. R.A. Fisher introduced a method for converting a group of independent p values that all test a given hypothesis into a single parameter that can be used in a significance test ( Fisher, 1925 ; Winkler et al., 2016 ; see also Fisher’s combined probability test https://en.wikipedia.org/wiki/Fisher’s_method ; https://en.wikipedia.org/wiki/Extensions_of_Fisher’s_method ). For convenience, I will call this parameter “p_FM” because it is not a conventional p value. Fisher’s combined test is used in meta-analyses of multiple replications of the same experiment across a variety of conditions or laboratories but has not, to my knowledge, been used to evaluate a collection of tests of a single scientific hypothesis. Fisher’s test is:

χ²(2k) = −2 Σ ln(p_i),

where p_i is the p value of the ith test and there are k tests in all. The sum of the natural logarithms ( ln ) of the p values, multiplied by −2, is a χ² variable with 2k degrees of freedom and can be evaluated via a table of critical values for the χ² distribution (for derivation of Fisher’s test equation, see: https://brainder.org/2012/05/11/the-logic-of-the-fisher-method-to-combine-p-values/ ). Applying Fisher’s test to Cen et al.’s major hypothesis (k = 6; df = 12) yields a combined χ² far beyond the critical value of 32.91 required for significance at the 0.001 level: since each of the six critical p values was at most 0.006, χ² = −2 Σ ln(p_i) ≥ −12 ln(0.006) ≈ 61.4.

In other words, the probability, p_FM, of getting their collection of p values by chance alone is <0.001, and therefore, we may be justified in having confidence in the conclusion. Fisher’s method, or a similar test, does not add any new element but gives an objective estimate of the significance of the combined results. (Note here that the mathematical transformation involved can yield results that differ qualitatively from simply multiplying the p values.) I stress that Fisher’s method is markedly affected by any but the most minor correlations (i.e., r  > 0.1) among p values; notable correlations among these values will cause p_FM to be much lower (i.e., more extreme) than the actual significance value ( Alves and Yu, 2014 ; Poole et al., 2016 ). Careful experimental design is required to ensure the independence of the tests to be combined.
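Fisher’s method is easy to compute from a list of p values. The sketch below is my illustration, not part of the original analysis: the six p values are stand-ins taken from the significance thresholds quoted in the text (the exact values Alger combined will differ), so the result is only an upper bound on p_FM.

```python
import math

def fisher_statistic(p_values):
    """Fisher's combining statistic: -2 * sum(ln p_i). Under the joint
    null hypothesis it follows a chi-square distribution with 2k df."""
    return -2.0 * sum(math.log(p) for p in p_values)

# Illustrative stand-ins for the six critical p values (thresholds from
# the text, not the exact measured values):
p_values = [0.001, 0.001, 0.006, 0.002, 0.001, 0.001]

chi2 = fisher_statistic(p_values)
df = 2 * len(p_values)  # 12 degrees of freedom

# Critical chi-square value for df = 12 at the 0.001 significance level:
CHI2_CRIT_12_DF_P001 = 32.91

print(f"chi2 = {chi2:.1f}, df = {df}")            # chi2 = 77.9, df = 12
print("p_FM < 0.001:", chi2 > CHI2_CRIT_12_DF_P001)  # p_FM < 0.001: True
```

In practice, scipy.stats.combine_pvalues(p_values, method="fisher") returns both the statistic and the combined p value directly; the stdlib version above makes the χ² comparison explicit.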

Fisher’s method is only one of a number of procedures for combining test results. To illustrate an alternative approach, I re-worked the assessment of Cen et al. (2018) as a meta-analysis (see Borenstein et al., 2007 ; Cummings and Calin-Jageman, 2017 ) of the effect sizes, defined by Cohen’s d , of the same predictions. Cohen’s d is a normalized, dimensionless measure of the mean difference between control and experimental values. I treated each prediction of the main hypothesis in Cen et al. (2018) as a two-sample independent comparisons test, determined Cohen’s d for each comparison, and conducted a random-effects meta-analysis (see Materials and Methods). Figure 3 shows effect sizes together with their 95% confidence intervals for each individual test, plus the calculated group mean effect size (1.518) and its confidence interval (1.181, 1.856). Effect sizes of 0.8 and 1.2 are considered “large” and “very large,” respectively, hence, an effect size of 1.518 having a 95% confidence interval well above zero is quite impressive and reinforces the conclusion reached by Fisher’s method, namely, that Cen et al.’s experimental tests strongly corroborate their main hypothesis.
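The effect-size route can also be sketched in a few lines of code. This is my illustration with made-up numbers standing in for the six critical tests, and it uses a fixed-effect (inverse-variance-weighted) combination rather than the random-effects ESCI routine the essay used; the machinery of Cohen’s d and weighting is the same.

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference between two groups, using the pooled SD."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)

def d_variance(d, n1, n2):
    """Approximate sampling variance of Cohen's d for two independent groups."""
    return (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))

def fixed_effect_mean(effects):
    """Inverse-variance-weighted mean of (d, n1, n2) tuples (fixed-effect model)."""
    weights = [1.0 / d_variance(d, n1, n2) for d, n1, n2 in effects]
    return sum(w * d for w, (d, _, _) in zip(weights, effects)) / sum(weights)

# Hypothetical effect sizes and group sizes standing in for six tests:
effects = [(1.8, 20, 20), (1.2, 15, 15), (1.5, 12, 12),
           (1.0, 18, 18), (2.0, 10, 10), (1.6, 14, 14)]

mean_d = fixed_effect_mean(effects)
print(round(mean_d, 2))  # a "very large" combined effect, well above 1.2
```

A random-effects model, as used in the essay, additionally estimates between-test heterogeneity and widens the confidence interval accordingly; the weighting logic is otherwise analogous.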

The findings underscore the conclusions that (1) when evaluating the probable validity of scientific conclusions, it is necessary to take into account all of the available data that bear on the conclusion; and (2) obtaining a collection of independent experimental results that all test a given hypothesis constitutes much stronger evidence regarding the hypothesis than any single result. These conclusions are usually downplayed or overlooked in discussions of the reproducibility crisis and their omission distorts the picture.

To appreciate the problem, we can re-examine the argument that the PPV of much neuroscience is also low ( Button et al., 2013 ). PPV is quantified as:

PPV = R(1 − β)/[R(1 − β) + α]

where R represents the “pre-study odds” that a hypothesis is correct, α is the significance level (p value criterion) of the statistical test used to evaluate it, and 1-β is the power of that test. R is approximated as the anticipated number of true (T) hypotheses divided by the total number of alternative hypotheses in play, true plus false; i.e., R = T/(T + F). This argument depends heavily on the concept of “pre-study odds.” In the example of a “gene-screen” experiment ( Ioannidis, 2005 ) that evaluates 1000 genes, i.e., 1000 distinct “hypotheses” of which only one gene is expected to be the correct one (note that these are not true hypotheses, but it is simplest to retain the statisticians’ nomenclature here), R is ∼1/1000, and with a p value criterion for each candidate gene of 0.05, PPV would be quite low, ∼0.01, even if the tests have good statistical power (≥0.8). That is, the result would have ∼1/100 chance of being replicated, apparently supporting the conclusion that most science is false.

Fortunately, these concerns do not translate directly to laboratory neuroscience work in which researchers are testing actual explanatory hypotheses. Instead of confronting hundreds of alternatives, researchers in these cases build on previous work that has reduced the number to a few genuine hypotheses. The maximum number of realistic alternative explanations that I found in reviewing The Journal of Neuroscience articles was four, and that was rare. Even in such cases, R and PPV would be relatively high. For example, with four alternative hypotheses, R would be 1/4; i.e., ∼250 times greater than in the gene-screen case. Even with low statistical power of ∼0.2 and a p value criterion of 0.05, PPV would be ∼0.5, meaning that, by the PPV argument, replication of experimental science that tests four alternative hypotheses should be ∼50 times more likely than that of the open-ended gene-screen example.

Furthermore, PPV is inversely related to the significance level, α; the smaller the α, the larger the PPV. A realistic calculation of PPV should reflect the aggregate probability of getting the cluster of results. Naive joint probability considerations, Fisher’s method, and meta-analysis of effect sizes all argue strongly that the aggregate probability of obtaining a given group of p values will be much smaller than any one p value. Taking these much smaller aggregate probabilities into account gives much higher PPVs for multiple-part hypothesis-testing experiments. For example, Cen et al. (2018) , as is very common, do not specify definite alternative hypotheses; they simply posit and test their main hypothesis, so the implied alternative hypothesis is that the main one is false; hence, R = 1/2. Applying p  < 0.001, as suggested by both Fisher’s method and meta-analysis, to Cen et al.’s main hypothesis implies a PPV of 0.99; that is, according to the PPV argument, their primary conclusion regarding the roles of N-cadherin and PKD1 in synapse formation should have a 99% chance of being replicated.
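The three scenarios above follow directly from the PPV formula of Ioannidis (2005) and Button et al. (2013), PPV = R(1 − β)/[R(1 − β) + α], and can be checked with a few lines of arithmetic; a minimal sketch:

```python
def ppv(R, alpha, power):
    """Positive predictive value: probability that a claimed positive is true.

    R: pre-study odds that the hypothesis is correct;
    alpha: significance level; power: 1 - beta.
    """
    return (power * R) / (power * R + alpha)

# Open-ended gene screen: 1 true gene among 1000 candidates, good power.
print(round(ppv(1 / 1000, 0.05, 0.8), 3))   # quite low despite power of 0.8

# Four realistic alternative hypotheses, low power.
print(round(ppv(1 / 4, 0.05, 0.2), 3))

# Single hypothesis vs. its negation (R = 1/2), aggregate p < 0.001.
print(round(ppv(1 / 2, 0.001, 0.2), 3))
```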

Finally, we should note that these calculations incorporate the low statistical power reported by Button et al. (2013) , i.e., 0.2, whereas actual power in many kinds of experiments may be higher. Cen et al. (2018) did not report a pre-study power analysis, yet post hoc power (as determined by G*Power software) for the six tests discussed earlier ranged from 0.69 to 0.91 (mean = 0.79), which, although much higher than the earlier estimate, is still an underestimate. Power depends directly on effect size, which for the results reported by Cen et al. (2018) ranged from 1.38 to 2.02, and the version of G*Power that I used does not accept effect sizes >1.0. Thus, the higher levels of statistical power achievable in certain experiments will also make their predicted reliability dramatically higher than previously calculated.

To determine the validity and importance of a multifaceted, integrated study, it is necessary to examine the study as a whole. Neuroscience has no widely accepted method for putting together results of constituent experiments and arriving at a global, rational assessment of the whole. Since neuroscience relies heavily on scientific hypothesis testing, I propose that it would benefit from a quantitative way of assessing hypothesis-testing projects. Such an approach would have a number of benefits. (1) Typical papers are jammed full of experimental data, and yet the underlying logic of the paper, including its hypotheses and reasoning about them, is frequently left unstated. The use of combining methods would require authors to outline their reasoning explicitly, which would greatly improve the intelligibility of their papers, with concomitant savings of time and energy spent in deciphering them. (2) The reliability of projects whose conclusions are derived from several tests of a hypothesis cannot be meaningfully determined by checking the reliability of one test. The information provided by combining tests would distinguish results expected to be more robust from those likely to be less robust. (3) Criticisms raised by statisticians regarding the reproducibility of neuroscience often presuppose that major scientific conclusions are based on single tests. The use of combining tests will delineate the limits of this criticism.

Fisher’s method and similar meta-analytic devices are well-established procedures for combining the results of multiple studies of the “same” basic phenomenon or variable; however, what constitutes the “same” is not rigidly defined. “Meta-analysis is the quantitative integration of results from more than one study on the same or similar questions” ( Cummings and Calin-Jageman, 2017 ; p. 222). For instance, it is accepted practice to include studies comprising entirely different populations of subjects and even experimental conditions in a meta-analysis. If the populations being tested are similar enough, then it is considered that there is a single null hypothesis and a fixed-effects meta-analysis is conducted; otherwise, there is no unitary null-hypothesis, and a random-effects meta-analysis is appropriate ( Fig. 3 ; Borenstein et al., 2007 ; Cummings and Calin-Jageman, 2017 ). Combining techniques like those reviewed here have not, as far as I know, expressly been used to evaluate single hypotheses, perhaps because the need to do so has not previously been recognized.

Meta-analytic studies can reveal the differences among studies as well as quantify their similarities. Indeed, one offshoot of meta-analysis is “moderator analysis,” used to track down sources of variability (“moderators”) among the groups included in an analysis ( Cummings and Calin-Jageman, 2017 , p. 230). Proposing and testing moderators is essentially the same as putting forward and testing hypotheses to account for differences. In this sense, among others, the estimation approaches and hypothesis-testing approaches clearly complement each other.

I suggest that Fisher’s method, meta-analyses of effect sizes, or related procedures that concatenate results within a multitest study would be a sensible way of assessing the significance of many investigations. In practice, investigators could report both the p values from constituent tests and an aggregated significance value. This would reveal the variability among the results and assist in the interpretation of the aggregate significance value for the study. Should the aggregate test parameters themselves have a defined significance level and, if so, what should it be? While ultimately the probability level for an aggregate test that a scientific community recognizes as “significant” will be a matter of convention, it might make sense to stipulate a relatively stringent level, say ≤0.001 or even more stringent, for any parameter, e.g., p FM , that is chosen to represent a collection of tests.

For the simple combining tests that I have discussed, it is important that each prediction truly follow from the hypothesis being investigated and that the experimental results be genuinely independent of each other. (More advanced analyses can deal with correlations among p values; Poole et al., 2016 .) There is a major ancillary benefit to this requirement. Ensuring that tests are independent will require that investigators plan their experimental designs carefully and be explicit about their reasoning in their papers. These changes should improve both the scientific studies and the clarity of the reports; this would be good policy in any case, and the process can be as transparent as journal reviewers and editors want it to be.

Besides encouraging investigators to organize and present their work in more user-friendly terms, the widespread adoption of combining methods could have additional benefits. For instance, it would steer attention away from the “significance” of individual p values. Neither Fisher’s method nor meta-analyses require a threshold p value for inclusion of individual test outcomes. The results of every test of a hypothesis should be taken into account no matter what its p value. This could significantly diminish the unhealthy overemphasis on specific p values that has given rise to publication bias and the “file drawer problem,” in which statistically insignificant results are not published.

Use of combining tests would also help filter out single, significant-but-irreproducible results that can otherwise stymie research progress. The “winner’s curse” ( Button et al., 2013 ), for example, happens when an unusual, highly significant published result cannot be duplicated by follow-up studies because, although highly significant, the result was basically a statistical aberration. Emphasizing the integrated nature of most scientific hypothesis-testing studies will decrease the impact of an exceptional result when it occurs as part of a group.

Of course, no changes in statistical procedures or recommendations for the conduct of research can guarantee that science will be problem free. Methods for combining test results are not a panacea and, in particular, will not curb malpractice or cheating. Nevertheless, by fostering thoughtful experimental design, hypothesis-based research, explicit reasoning, and reporting of experimental results, they can contribute to enhancing the reliability of neuroscience research.

Recently, a large group of eminent statisticians ( Benjamin et al., 2018 ) has recommended that science “redefine” its “α” (i.e., significance level) from p  < 0.05 to p  < 0.005. These authors suggest that a steep decrease in p value would reduce the number of “false positives” that can contribute to irreproducible results obtained with more relaxed significance levels. However, another equally large and eminent group of statisticians ( Lakens et al., 2018 ) disagrees with this recommendation, enumerating drawbacks to a much tighter significance level, including an increase in the “false negative rate,” i.e., missing out on genuine discoveries. This second group argues that, instead of redefining the α level, scientists should “justify” whatever α level they choose and do away with the term “statistical significance” altogether.

I suggest that there is a third way: science might reserve the more stringent significance level for a combined probability parameter, such as p FM . This would provide many of the advantages of a low p value for summarizing the overall strength of conclusions without the disadvantages of an extremely low p value for individual tests. A demanding significance level for conclusions derived from multiple tests of a single hypothesis would help screen out the “false positives” resulting from single, atypical test results. At the same time, a marginal or even presently “insignificant” result would not be discounted if it were an integral component of a focused group of tests of a hypothesis, which would help guard against both the problem of “false negatives” and an obsession with p values.

Acknowledgments

Acknowledgements: I thank Asaf Keller for his comments on a draft of this manuscript. I also thank the two reviewers for their useful critical comments that encouraged me to develop the arguments herein more fully.

Reviewing Editor: Leonard Maler, University of Ottawa

Decisions are customarily a result of the Reviewing Editor and the peer reviewers coming together and discussing their recommendations until a consensus is reached. When revisions are invited, a fact-based synthesis statement explaining their decision and outlining what is needed to prepare a revision will be listed below. The following reviewer(s) agreed to reveal their identity: Raymond Dingledine.

Dear Dr Alger

Your revised manuscript is much improved. One reviewer has requested some minor revisions (below). I have also read the revised MS in detail and fully agree with this reviewer. I think this MS can potentially be very relevant to the design of statistical analyses of neuroscience papers.

I have not been able to get a second reviewer and I think, in light of the comments of the one reviewer and my own careful review of the MS, that it will not be necessary to do so.

Leonard Maler

Reviewer comments:

My initial comments have all been addressed. However, the extensive new text raises a few additional issues.

line 129: “random” - an unfortunate use of the word, unless Brad actually used a randomizing technique to select the paper.

line 184-85 “If the tests were done at p<.05....” I get the logic of this, with the analogy of the odds of flipping 5 heads in a row for example, but it doesn’t seem entirely right. For example if the investigator were to select p<.8 for each 2-sided test of an hypothesis, and perform 20 such tests, the random chance that all 20 tests would be consistent with the hypothesis is 0.012. Fisher’s method takes care of this problem, with chi2=8.9 and 40 DF the combined pFM-value is >99.9%, ie not in support of the hypothesis. It would be worth pointing this issue out to the reader unless my logic is faulty.

line 249: “i” should be subscript for clarity

line 358: “mutli"

Author Response

Dear Dr. Maler,

I was encouraged to learn that both reviewers found sufficient merit in my manuscript that it might, if suitably revised, be published. I thank both for their thoughtful comments and suggestions and believe that responding to them has enabled me to improve the paper in many ways.

The MS was submitted, at the recommendation of Christophe Bernard, as an Opinion piece, and I felt constrained by the journal’s “2-4 page” limit on Opinions. This revision includes a fuller description of my arguments, a worked example, as well as additional discussion that the original MS lacked. I suggested Fisher’s Method as an example of a combining approach that offers a number of benefits to interpreting and communicating research results. The Method has been well-studied, is intuitively easy to grasp and, provided that the combined p-values are independent, is robust. Most importantly, I believe that its use would have the salutary effect of obliging authors to be more explicit about their reasoning and how they draw their conclusions than they often are. However, there is nothing unique about Fisher’s Method, and I now include a meta-analysis of effect sizes of the identical results that I used with Fisher’s Method, as well as citations to alternative methods, to make this point. I have re-titled the paper to reflect the de-emphasis of the method, as I believe is consonant with the critiques of the MS.

Reviewer 1 raised several profound objections that, in the limit, would call into question the validity of most published neuroscience research. Because these are unquestionably complex and significant issues, I’ve addressed them in detail here but, because they go far beyond the confines of my modest Opinion, have only touched on them briefly in the text. The text has been thoroughly revised with major changes appearing in bold.

General remarks

I am afraid that my presentation of Fisher’s Method overshadowed my overarching goal, which was to call attention to weaknesses in how scientists present their work and how published papers are evaluated; I used the reproducibility crisis to call attention to both problems. I believe that, while testing scientific hypotheses is the bulwark of much research, as a community we do not think or communicate as clearly as we could, and that the reliability of scientific hypothesis-testing work is, as a result, underestimated by statistical arguments.

The idea of using Fisher’s Method, essentially a meta-analytic approach, is a novel extension of current practice, and I’ve clarified and moderated my proposal in this regard. The changes do not alter the fundamental argument that having multiple tests focused on a single hypothesis can lead to stronger scientific conclusions than less tightly organized collections of results.

When investigators do multiple different tests of a hypothesis, they expect the reader to put the results together intellectually and arrive at a coherent interpretation of the data. Yet there is no standard objective method for quantitatively combining a group of related results. Both Reviewers allude to the problem of the complexity of published papers. Reviewer 1 calls the Cen et al. (2018) paper a “tangle” of results (I counted 116 comparisons in the figures) and doubts that any coherent interpretation of them is to be had. Likewise, Reviewer 2 notes that Bai et al. (2018) lists 138 p-values, and asks how they could be interpreted.

The Reviewers’ vivid descriptions of the confusing ways in which papers are presented confirm the need for improvements in current practices. I believe that improvements are possible and that focusing attention on the hypotheses that are usually present in publications can be a starting point for change.

The Reviewers’ comments underscore the deficiencies in the current system, which is sadly lacking in impetus for rigor of communication. Placing the onus of discerning the underlying logic that shapes a project exclusively on readers is unreasonable. I believe papers can be made much more user-friendly. Authors should state whether they have a hypothesis and which of their results test it. They should tell us plainly what their conclusions are and why they are justified. I suspect that merely telling people to change their ways would have little effect. An indirect approach that offers investigators a solid, objective way of summarizing their results might be a step in the right direction.

The Reviewer raises several issues, but two are most important: a request for a worked example, which I now provide, and the “file drawer problem,” in which results that do not meet a prescribed significance level, usually p<0.05, are not reported, but left in a file drawer, which skews the literature towards positive results. My proposal can contribute to alleviating this problem.

1. A worked example; Cen et al (2018).

Cen et al. state (p. 184):

”... we proposed that N-cadherin might contribute to the “cell- cell adhesion” between neurons under regulation of PKD.”

Further, “In this work, we used morphological and electrophysiological studies of cultured hippocampal neurons to demonstrate that PKD1 promotes functional synapse formation by acting upstream of N-cadherin.”

In the last paragraph of the Discussion (p. 198) they conclude:

"Overall, our study demonstrates one of the multiregulatory mechanisms of PKD1 in the late phase of neuronal development: the precise regulation of membrane N-cadherin by PKD1 is critical for synapse formation and synaptic plasticity, as shown in our working hypothesis (Fig. 10).”

These quotes express, respectively, their main hypothesis, the methods they used to test it, and an overview of their conclusions about it. After testing the main hypothesis, they tested related hypotheses, each of which had its own specific predictions. All of the findings are represented in their model.

In brief (details in the Methods section of the MS), to analyze the paper I counted as actual predictions only experiments that were capable of falsifying the hypothesis. Non-quantifiable descriptive experiments contributed logically to the hypothesis test, but obviously did not count quantitatively. The authors also included many control experiments and other results that served merely to validate a particular test, and I do not include these either. To simplify the example and avoid possibly including non-independent tests, if more than one manipulation tested a given prediction, say against a common control group, I counted only the first one. Note that this is an extremely conservative approach, as including more supportive experiments in a meta-analysis typically strengthens the case.

For example, Cen et al. (2018) tested the following predictions of their main hypothesis:

a. Prediction: PKD1 increases synapse formation. Test: Over-expression (OE) of hPKD1 increases spine density (p<0.001); dominant negative (DN)-hPKD1 decreases spine density (p<0.001).

b. Prediction: PKD1 increases synaptic transmission. Test: OE of hPKD1 increases mEPSC frequency (p<0.05); DN-hPKD1 decreases mEPSC frequency (p<0.01).

c. Prediction: PKD1 regulates N-cad surface expression. Test: Surface biotinylation assay. OE hPKD1 increases N-cad (p<0.05); DN-hPKD1 decreases N-cad (p<0.001).

d. Prediction: PKD1 acts upstream of N-cad on synapse formation. Test: knock-down of PKD1 is rescued by OE of N-cad; spine density (p<0.001) and mEPSC frequency (p<0.01).
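For illustration, the threshold p values listed in predictions a-d can be combined with Fisher’s method. Because the paper reports thresholds (e.g., p < 0.001) rather than exact p values, the combined value computed here is itself only an upper bound on pFM; this sketch is illustrative and is not the calculation reported in the MS.

```python
import math

# Threshold p values reported for the eight comparisons in tests a-d.
# These are upper bounds (e.g., "p < 0.001"), so the combined value is
# itself an upper bound on p_FM.
p_bounds = [0.001, 0.001, 0.05, 0.01, 0.05, 0.001, 0.001, 0.01]

x = -2.0 * sum(math.log(p) for p in p_bounds)   # chi-square, df = 2 * 8
# Closed-form chi-square tail for even df = 2k:
# P(X >= x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
half, term, total = x / 2.0, 1.0, 1.0
for j in range(1, len(p_bounds)):
    term *= half / j
    total += term
p_fm_bound = math.exp(-half) * total
print(p_fm_bound)
```

Even using the loose upper bounds, the combined value falls far below 0.001, consistent with the aggregate significance discussed in the MS.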

I believe a strong case can be made that each prediction does follow logically from the main hypothesis. Furthermore, although the subsequent, subsidiary hypotheses are linked to the main one, they are clearly separate hypotheses and not predictions of it; no matter what the outcomes of testing them are, they do not directly reflect on the truth or falsity of the main hypothesis.

Thus, the authors go on to identify specific sites on N-cad that PKD1 binds and phosphorylates and to test the hypothesis that these sites are critical for the actions of PKD1 on N-cad. They also investigate β-catenin as a binding partner for N-cad and test the hypothesis that this binding is promoted by PKD1. At the end, they test the hypothesis that N-cad influences LTP and acts pre-synaptically. These are distinct, though related, hypotheses that feed into the authors’ summary hypothesis. I believe the structure of Cen et al.’s paper, their procedures, and conclusions follow a reasoned, logical plan, which is not always as explicitly delineated as it could be, but is typical of most JNS papers (see also the analysis of Bai et al., below). I now include diagrams of the complete structure of the paper, together with a detailed diagram of the logic of the main hypothesis, in Figures 2 and 3.

2. My use of Fisher’s Method.

In his 1932 text R.A. Fisher states:

"When a number of quite independent tests of significance have been made, it sometimes happens that although few or none can be claimed individually as significant, yet the aggregate gives an impression that the probabilities are on the whole lower than would often have been obtained by chance. It is sometimes desired, taking account only of these probabilities, and not of the detailed composition of the data from which they are derived, which may be of very different kinds, to obtain a single test of the significance of the aggregate, based on the product of the probabilities individually observed.” (Italics added.)

The Reviewer asks if all tests in Cen et al. relate to “the same hypothesis?” This question has two answers: one trivial, one substantial. The Reviewer also asks about “all available evidence.”

a. The “same hypothesis?” The answer is “no” in the trivial sense: there is more than one hypothesis in the paper. Yet the paper is organized into clusters of experiments that each do test “the same hypothesis.” As shown above, many experiments test predictions of the main hypothesis.

In a more interesting sense, the answer is “yes,” each group of experiments does test a single hypothesis. First, it is critical to distinguish between “statistical” and “scientific” hypotheses, as they are not the same thing (see Alger, 2019). The scientific hypothesis is a putative explanation for a phenomenon; the statistical hypothesis is part of a mathematical procedure, e.g., for comparing results. Certainly, when combining tests from different kinds of experiments, e.g., electrophysiological recordings and dendritic spine measurements, the identical statistical null hypothesis cannot be at issue, as the underlying populations are entirely different. But combined analyses do not require identical null hypotheses (see, e.g., overview in Winkler et al. 2016). Or consider that a standard random-effects meta-analysis is specifically designed to include different populations of individuals and different experimental conditions. In fact, commonly, “... a meta-analysis includes studies that were not designed to be direct replications, and that differ in all sorts of ways.” (Cummings and Calin-Jageman, p. 233). Hence, the “same hypothesis” cannot mean the same statistical null hypothesis. Actually, Fisher’s Method tests a “global null” hypothesis, not a single null, and that is why it can accommodate the results of many different experimental techniques. Indeed, this approach allows groups of results to be combined rationally, in a way that agrees with common usage and reasoning.

b. Are all tests a “fair and unbiased sample of the available evidence?” In a combining method, authors identify which of their experiments test the hypothesis in question and include all of those test results in their calculations. There is no selection process; simply include all of the relevant tests that were done.

3. The “file drawer problem”

The Reviewer alludes several times to the “file-drawer” problem. As I emphasize in the original and revised MSs, Fisher’s Method does not presuppose or require a particular threshold p-value for significance; if the p-value for a test is 0.072, then that is what goes into calculating pFM. Therefore, both Fisher’s Method and analogous meta-analytic approaches explicitly and directly work against the file-drawer problem by calling for the reporting of all relevant data. Combining methods can help ameliorate, not exacerbate, the file drawer problem.

In the specific context of Cen et al. (2018), is there internal evidence for the problem?

a. According to Simonsohn et al. (2013) the “p-curve,” i.e., the distribution of p-values, for a set of findings is highly informative. The basic argument is that, if investigators are selectively sequestering insignificant results, e.g., those with p-values > 0.05, then there will be an excess of p-values that just reach significance, between 0.04 and 0.05, because, having achieved a significant result, investigators lose the motivation to try for a more significant one. There will be a corresponding dearth of results that are highly significant, e.g., ≤0.01. In contrast, legitimate results will tend to have the opposite pattern, that is, a large excess of highly significant results and far fewer marginal ones. I did a census of the p-values (n = 114) in Cen et al.’s figures and found there were 52 at ≤0.001, 20 between 0.001 and 0.01, 18 between 0.01 and 0.05, and 24 nonsignificant at the 0.05 level. Thus the p-value pattern in this paper is exactly opposite to the expectation of the file-drawer problem.

b. Effect sizes that are too small and/or low power would be inconsistent with the pattern of highly significant results and might suggest “questionable research practices.” To see if the effect sizes and post-hoc power calculations (a priori values were not given) in Cen et al.’s paper were consistent with their p-values, I measured means and standard errors roughly from the computer displays of their figures. I determined effect sizes with an on-line effect-size calculator (https://www.socscistatistics.com/effectsize/default3.aspx). The mean effect size (Cohen’s d) was 1.518 (confidence interval from 1.181 to 1.856). I used G*Power (http://www.psychologie.hhu.de/arbeitsgruppen/allgemeine-psychologie-und-arbeitspsychologie/gpower.html) and found that the post-hoc power of these tests ranged from 0.69 to 0.92 (mean = 0.79). Although these power values are very good, they are underestimates, because G*Power does not accept an effect size greater than 0.999, i.e., less than the real values. Since power increases with effect size, the actual power of Cen et al.’s tests was even higher than estimated. Finally, I carried out a meta-analysis of the measured effect sizes with the free software program ESCI (available at https://thenewstatistics.com/itns/esci/; Cummings and Calin-Jageman, 2017). As Figure 4 in the MS shows, the mean effect size (vertical peak of the diamond at the bottom) falls within the cluster of effect sizes and has a 95% confidence interval that is well above zero. These large values are consistent with the results that Cen et al. claim. In short, there is no obvious reason to think that proceeding to analysis with a combined test would not be legitimate.
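For readers who wish to reproduce this kind of estimate, the following sketch shows the standard conversion from group means, SEMs, and sample sizes to Cohen’s d with a pooled SD. The numbers in the example are hypothetical, not values taken from Cen et al. (2018).

```python
import math

def cohens_d(mean1, sem1, n1, mean2, sem2, n2):
    """Cohen's d from group means, standard errors of the mean, and n.

    Each group's SD is recovered as SEM * sqrt(n); the pooled SD uses
    the usual (n-1)-weighted formula for two independent groups.
    """
    sd1, sd2 = sem1 * math.sqrt(n1), sem2 * math.sqrt(n2)
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                          / (n1 + n2 - 2))
    return abs(mean1 - mean2) / pooled_sd

# Hypothetical values in the style of a spine-density comparison:
d = cohens_d(mean1=8.0, sem1=0.4, n1=15, mean2=6.2, sem2=0.5, n2=15)
print(d)
```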

c. The Reviewer implies that the prevalence of positive results in, e.g., JNS papers, is prima facie evidence either of “incredible predictive ability” or of bad behavior on the part of investigators. This seems to be an unduly harsh or cynical view of our colleagues and a very depressing slant on our field. I would like to suggest other plausible explanations:

c.i. Ignorance. According to a survey that I conducted (Alger, 2019; Fig. 9.2A, p. 221), the majority of scientists (70% of 444 responses) have had ≤1 hour of formal instruction in the scientific method and scientific thinking. This mirrors my own educational background, and I strongly suspect that many neuroscientists are simply unaware of the importance of reporting negative or weak results.

c.ii. Imitation. Authors, particularly younger ones, mimic what they read about and see in print. It is a vicious cycle that should be broken, but in the meantime we shouldn’t be surprised if individuals who see others reporting only positive results do the same.

c.iii. Distorted scientific review and reward system. The simple fact is that referees, journal editors, and granting agencies hold the power to say what gets published and, therefore, rewarded by grants, promotions, speaking engagements, etc. If the literature is skewed, the primary responsibility lies with these groups. Authors have no choice but to follow their lead.

c.iv. The conundrum of pilot data. This crucial issue often gets little attention in this context, so I’ll expand on it. Briefly, I believe that the most common data found in file drawers are not well-controlled, highly powered, definitively negative studies but, rather, small, weak, inconclusive clutches of pilot data that just seemed “not worth following up.”

These are the kinds of data that reviewers would probably reject as meaningless, and herein lies the conundrum. If an investigator wants to abandon an apparently unpromising line of investigation and also wants to avoid committing the file-drawer offense, what to do? Continue work to achieve a decisive result with a small, variable effect size? This could require large numbers of tests, many subjects, etc., to achieve adequate power for a result of possibly dubious value, or even a situation in which the required ‘n’ becomes so “impractically and disappointingly large” (e.g., Cummings and Calin-Jageman, 2017; p. 285) that it is infeasible to proceed.

Like most of my colleagues, I have faced this issue and made what I thought was a rational and defensible decision: to stop throwing scarce and hard-won taxpayer dollars down the drain and wasting my and my co-workers’ valuable time and effort on a dead-end project. But, and this is absolutely crucial for the argument, strictly speaking my decision was unjustified. Because I found, say, that the effect sizes of drug X were small and my n’s were low, the power of my pilot tests did not permit me to reject, with confidence, the hypothesis that drug X was effective. It appears that an extreme “report all data” standard could mean that we cannot do a study without being committed to turning it into a full investigation. The file-drawer problem would probably be solved, or drastically reduced, by abolishing all pilot studies. But everyone agrees that this is nonsense; pilot studies, which are “hardly ever for reporting,” are a vital, integral part of science (Cummings and Calin-Jageman, 2017, p. 261).
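To make the conundrum concrete, here is a minimal sketch, using the standard normal-approximation sample-size formula with illustrative effect sizes of my own choosing (not data from any actual pilot study), of how quickly the required n grows when a pilot effect is small:

```python
# Normal-approximation sample size per group for a two-sample comparison
# (two-sided alpha, target power). Illustrative values only.
import math
from scipy.stats import norm

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects needed per group to detect a given Cohen's d."""
    z_alpha = norm.ppf(1 - alpha / 2)  # critical z for two-sided alpha
    z_beta = norm.ppf(power)           # z corresponding to target power
    return 2 * ((z_alpha + z_beta) / effect_size) ** 2

# A respectable pilot effect (d = 0.8) is cheap to confirm...
print(math.ceil(n_per_group(0.8)))  # 25 per group
# ...but a small, variable pilot effect (d = 0.2) demands the
# "impractically and disappointingly large" n described above:
print(math.ceil(n_per_group(0.2)))  # 393 per group
```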

An alternative to banning pilot studies might be to place inconclusive data into an unreviewed repository, such as bioRxiv. Of course, this would transform much of the “file-drawer problem” into a “bioRxiv problem” and would not get more negative partial results published in refereed journals. The literature would still be dominated by positive results. A major advance would be to nurture much-needed respect for thoroughgoing, rigorously obtained negative data, a goal that I heartily support, because good hard negative data play the lead role in rejecting a false hypothesis. This worthy goal is independent of the pilot-data conundrum, the question of using Fisher’s Method, or other forms of meta-analysis along the lines I suggest.

d. The Reviewer concludes, “Finally, and most importantly, it is highly unlikely that Cen et al. reports all of the evidence collected for this project [...] without testing at least a couple of other possibilities.” But this leads back to the concern about pilot studies; indeed, Cen et al. might well have done unreported pilot studies.

In sum, whether it is reasonable or not to use Fisher’s Method, or to include expanded sets of would-have-been pilot data, are critical though unrelated questions. Mixing them together will not help us answer either one.

4. Do I either misunderstand Fisher’s Method, “most neuroscience research,” or both?

I have explained my rationale for suggesting Fisher’s Method and have now included another method of meta-analysis which leads to the same conclusions as that Method. I have carefully explained, documented, and analyzed the neuroscience research that my proposal is intended for. By design, my proposal applies only to multiple-prediction hypothesis-testing studies, as others are ineligible for combining methods.

5. How did I code the papers and was my system reliable?

The Reviewer asks about the “coding” that I used to classify the papers as hypothesis-testing or not. I did not code the papers in a rote, mechanical way. Instead, I read the papers carefully and judged them as sensibly as I could, taking other factors into account:

I was mindful of the results of a survey that I conducted of hundreds of scientists (Alger, 2019, Fig. 9.7, p. 226), in which a majority (76%) of respondents (n = 295) replied that they “usually” or “always” stated the hypothesis of their papers. So I assumed that the papers had a logical coherence and looked for it.

To address the issue of reliability, I re-analyzed the 52 JNS papers from 2018. Inasmuch as nearly all of them are well outside my area of expertise, and it had been over 9 months since I went over them, I had very little retention of their contents (or even of ever having seen them before!). I re-downloaded all the papers to start with fresh pdfs and analyzed them again without looking at my old notes. The original and re-analyses agreed reasonably well.

In the most important breakdown, I initially classified 42/52 (80.8%) of the papers as hypothesis-testing; on re-analysis, I classified 38/52 (73.1%) this way. Of the 42 initially classed as hypothesis-testing, I again categorized 35 (83.3%) this way in the re-analysis. The distribution of explicit to implicit hypotheses was 20 and 22 originally, and 19 and 19 in the re-analysis. The average number of experimental tests per hypothesis-testing paper was 7.5 ± 2.26 (n = 42; CI: 6.187, 8.131) initially, and 6.08 ± 2.12 (n = 38; CI: 5.406, 6.754) on re-analysis (Cohen’s d = 0.648). It appears that my judgments were fairly consistent, with the exception of my tallies of the average number of critical experiments per paper. In this case, prompted by the Reviewers’ comments, I omitted control and redundant experiments that I had initially included and focused on key manipulations that were most directly aimed at testing the main hypothesis. This change does not alter the impact of the analysis. In preparing the revision, I re-reviewed key discrepancies between the original and re-analysis, reconciled them in the two analyses, and updated Table 1 to reflect my best judgments.
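The consistency of the two analyses can be checked directly from the summary statistics above; a short sketch of the pooled-SD form of Cohen’s d reproduces the reported effect size:

```python
# Cohen's d from the reported means, SDs, and n's of the original
# analysis (7.5 +/- 2.26, n = 42) and re-analysis (6.08 +/- 2.12, n = 38).
import math

def cohens_d(m1, s1, n1, m2, s2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)

d = cohens_d(7.5, 2.26, 42, 6.08, 2.12, 38)
print(round(d, 3))  # ~0.647, matching the reported d = 0.648
```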

Naturally, there may be honest disagreements in judgments about whether an experiment tests a legitimate prediction of a hypothesis, whether the prediction actually follows necessarily and, hence, could really falsify it, etc. This is why I do not classify Cen et al.’s prediction that PKD1 enhances LTP as following from their main hypothesis. The known linkages between mechanisms of synapse formation and LTP do not allow one to deduce that PKD1 must enhance LTP. That is, if PKD1 did not enhance LTP, their main hypothesis would not be threatened. Of course, the analysis underlying that classification in the Table is necessarily less exhaustive than my treatments of the Cen et al. (2018) and Bai et al. (2018) papers.

Regardless of the precise details, I think it is clear that the majority of research published in JNS relies on some form of hypothesis testing and draws its conclusions from multiple lines of evidence. Despite the good agreement between my two analyses, I stress that the main points I wanted to make were, first, that the hypothesis-testing science that dominates the pages of JNS is entirely distinct from the single-experiment studies on which the statisticians’ critiques focus; and, second, that scientific authors should be inspired to make the presentation of their work more readily interpretable to readers.

The fact that many papers are beset by needless complexity is a shortcoming that I suspect both Reviewers would agree exists. Ideally, by calling attention to how scientific papers are organized, my approach will help encourage authors to be more forthright in explaining and describing their work, its purpose and logic.

I think that what is frequently lacking in scientific publications is a clear outline of their underlying logical structure. Our current system tolerates loose reasoning and vagueness, and authors sometimes throw all kinds of data into their paper and leave it to the rest of us to figure it all out. I find this sort of disorganization in a paper deplorable and suspect the Reviewers would agree. Where we may disagree is in our interpretations of the mass of data that is often presented. I believe that poor communication practices do not imply poor science. Rather, they reflect the factors (deficient training, imitation, improper rewards) that I have alluded to, rather than an absence of structure. Reviewer 1 says “there is no easy way” of untangling results, and I agree. Yet deficiencies in presentation skills are correctable with training and encouragement from responsible bodies, mentors, and supervisors. Requiring authors to state their reasoning explicitly is exactly the kind of benefit that a procedure such as Fisher’s Method could help foster.

6. Does Boekel et al. (2015) seriously undermine confidence in neuroscience generally?

The main point that Boekel et al. make is that, when using pre-registration standards (publishing their intended measurements, analysis plan, etc., in advance), they got very different results than had been reported in a series of original studies they tried to directly replicate. However, this study, though evidently carefully carried out, need not destroy faith in neuroscience research for at least two reasons: a) the research investigated by Boekel et al. is not representative of a great deal of neuroscience research, such as that in JNS; b) as a form of meta-science, Boekel et al.’s results are themselves subject to criticism, some of which I offer below.

a. Boekel et al. tried to replicate five cognitive neuroscience studies, comprising 17 sub-experiments, which were mainly correlational comparisons of behavioral phenomena, e.g., response times or sizes of social networks, with MRI measurements of gross brain structures. Their replication rate was low: only 1/17 (6%) of the sub-experiments met their criterion for replication. However, one way of assessing replicability is to ask whether or not the mean of the replicating study falls within the confidence interval of the original study (e.g., Calin-Jageman and Cummings, 2019). The data of Boekel et al. (their Fig. 8) show that this clearly happened in 5 cases and was borderline in 2 others. In this sense, the replicators got “the same results” in 5/17 (29%) or 7/17 (41%) of the experiments, rates lower than the 83% expected (Calin-Jageman and Cummings, 2019), though clearly greater than 6%. In any event, the message of Boekel et al. was heavily dependent on the outcomes of single tests, i.e., the predicted relationship between a behavioral measure and an MRI measurement.
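The CI-capture criterion can be stated in a couple of lines; this sketch uses made-up illustrative values, not numbers from Boekel et al.:

```python
# A replication "gets the same result" if its mean falls within the
# original study's confidence interval (illustrative values only).
def captured(replication_mean, original_ci):
    lo, hi = original_ci
    return lo <= replication_mean <= hi

print(captured(0.31, (0.25, 0.60)))  # True: inside the original CI
print(captured(0.02, (0.25, 0.60)))  # False: outside it
```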

b. In contrast with this type of experiment, both Reviewers of my MS found that a randomly selected JNS paper reported a highly complex set of experiments that supported its conclusion. Such papers cannot be meaningfully assessed by testing one piece of evidence. No single experiment per se in either Cen et al. or Bai et al. is so crucial that the main message of the paper would be destroyed if it were not replicated. Failure of an experiment could mean that the hypothesis is false, or it could point to a variety of alternative explanations for the failure.

c. Boekel et al.’s study is quite small, including only five replication attempts, two of which are from the same group (Kanai and colleagues); this raises broad concerns about the degree to which it represents neuroscience at large, as well as issues of sample independence within the study. More flags are raised by the fact that, in all, Boekel et al. examined 17 correlational experiments from the five studies. The MRI tests were done on limited groups of subjects, i.e., on average, > 3 datasets were obtained from each subject. Can these be considered 17 independent studies and, if not, to what extent would this affect the validity of the conclusions?

Boekel et al.’s work was done in their home country, the Netherlands, and yet three of the five studies involved measures (e.g., social media behavior) that could well show a cultural influence. In their Discussion, the authors consider these and other factors that could have influenced their study and indicate that further tests are required to dig deeper.

d. Reviewer 1’s comments about not using “all available data” seem especially germane to Boekel et al., who acknowledge that the experiments they chose to replicate for their meta-analysis were not selected randomly or exhaustively, but were specifically chosen, “...from the recent literature based on the brevity of their behavioral data acquisition.” However justifiable this rationale might be, their selection process runs counter to the importance of using “all available evidence” in a broad meta-analytic context.

My concern about the generalizability of Boekel et al.’s results is further heightened by a number of additional peculiarities of their paper:

Remarkably, the first paper that Boekel et al. attempt and fail to replicate (Forstmann et al., 2010) is from their own laboratory (Forstmann is the senior author on Boekel et al.). Moreover, Forstmann’s group had already replicated its own result in 2010, before the 2015 study. In 2010, the authors report replications of measurements of Supplementary Motor Area (SMA) involvement in their behavioral test, which involves a psychological construct, the LBA (“Caution”) parameter, which purports to account for a speed-accuracy trade-off that their subjects perform when responding quickly to a perceptual task. There are a number of worrisome features of these studies. In the original study (Forstmann 1), they show a highly significant and precise (p = 0.00022) correlation, r = 0.93, between the SMA measure and the Caution parameter. Likewise, in the 2010 replication, they find r = 0.76 (p = 0.0050). In 2015, Boekel et al. found r = 0.03, and no p-value was reported (the Bayes factor provided some support for the null hypothesis of zero correlation). It is unlikely that simple sampling error can account for the marked differences among the three tests and, as noted, the authors discuss several possibilities that could account for systematic error.

It is worth mentioning in this context that Boekel et al. (2015) has been challenged on technical grounds (Boekel et al. responded in 2016; Cortex 74:248-252). Nevertheless, in 2017 M.C. Keuken et al. (Cortex, 93:229-233), also from Forstmann’s laboratory, published a lengthy “Corrigendum” in which they corrected a “mistake in the post-processing pipeline” and a number of other errors in Boekel et al. (2015), which did not, however, alter the original conclusions.

(As a side note, according to Google Scholar, Forstmann et al. (2010) has been cited a respectable 328 times and Boekel et al. (2015) 154 times (as of 4/4/20). Apparently, Forstmann et al. (2010) was wrong, yet this paper does not appear to have been retracted.)

For all of these reasons, I do not think that the findings of Boekel et al. (2015) can be fairly generalized to question the validity of neuroscience research as a whole.

As regards more extensive replicability studies, I have extensively reviewed the Reproducibility Project: Psychology (RPP) by the Open Science Collaboration, led by Nosek, which attempted to reproduce 100 psychology studies (Alger, 2019, Chap. 7). In brief, the reproducibility rate of the RPP varied from 36-47%: low, but again much higher than Boekel et al. (2015) found. In the book, I also cover critical reviews of the RPP, which identified a range of technical issues, including faithfulness of replication conditions, statistical power calculations, etc., leading to the conclusion that, while reproducibility was probably lower than would be ideal, the observed rates were not so far from the expected rates as to warrant declaring a “crisis.”

Conclusion of response to Reviewer 1.

I regret having conveyed an unrealistically “sunny” picture of neuroscience (a view I do not hold) and have tried to correct this impression in the text. I do not believe that the reliability of neuroscientific research is as poor as extreme views have it, and I maintain that reliability is enhanced when rigorous, well-controlled, statistically sound procedures are applied to testing hypotheses with a battery of logically entailed predictions. I drew on the statisticians’ construct of PPV to show that my view is consonant with the statistical arguments. The process of presenting experimental results logically, and of identifying scientific hypotheses and the experiments that test them, can yield considerable benefits for neuroscience. The concept of combining results objectively using Fisher’s Method, meta-analysis, or perhaps other techniques can bring needed attention to this often overlooked topic.

The Reviewer points to the paper by Bai et al. (2018, JNS, pp. 32-50) to raise questions of the independence of tests, conceptual focus of papers, and “pre-study odds.” In response, I have discussed the caveats of Fisher’s Method, reduced its prominence and, as the Reviewer recommends, added an alternative method of meta-analysis to decrease emphasis on Fisher’s Method and the PPV argument.

Reviewer 2 echoes Reviewer 1’s concerns about applying a method like Fisher’s given the complexity of the literature. To respond to this concern, I have analyzed Bai et al. as I did Cen et al., although I do not include this analysis in the MS. As the Reviewer infers, there is more than one hypothesis in the paper: a central one as well as ancillary hypotheses that are linked to it but are not predicted by the main one. Despite its complexity, however, I believe there is a relatively straightforward logical structure to the paper.

1. Hypothesis tests in Bai et al. (2018)

The main hypothesis of this paper is that the circular RNA DLGAP4 (circDLGAP4) regulates ischemic stroke outcomes by reducing levels of the microRNA miR-143 and preserving blood-brain barrier (BBB) integrity. The authors test several predictions of this hypothesis:

a. Prediction: miR-143 levels should be elevated in stroke patients and stroke models. Test: measure miR-143 levels in acute ischemic stroke (AIS) patients and tMCAO mice. (Fig. 1)

b. Prediction: miR-143 levels should be related to the degree of stroke damage. Test: compare infarct size in tMCAO WT and miR-143 knock-down (miR-143+/-) mice. (Fig. 2)

c. Prediction: tMCAO-induced loss of cerebrovascular integrity will be reduced in miR-143+/- mice. Test: BBB permeability and levels of tight junction proteins (TJPs), which are decreased by miR-143, in WT and miR-143+/- tMCAO models. (Fig. 2)

d. Prediction: circDLGAP4 should bind to miR-143 and be decreased in stroke models. Test: measure circDLGAP4 levels in AIS patients and tMCAO mice; biochemical assays and cellular co-localization of circDLGAP4 and miR-143. (Fig. 3)

e. Prediction: overexpression of circDLGAP4 should be neuroprotective in tMCAO mice. Test: infarct size, BBB permeability, and TJP measurements under control and circDLGAP4-overexpression conditions. (Figs. 4, 5)

I believe these are all direct predictions of the main hypothesis; falsification of any of them would call for its rejection or revision. After testing the main hypothesis, Bai et al. go on to test two related mechanistic hypotheses about how miR-143 damages the BBB. These hypotheses are that miR-143 regulates the transition of endothelial cells into mesenchymal cells (EndoMT), and that it does so by targeting HECTD1.

Inasmuch as the outcomes of the two mechanistic explanations cannot affect the truth or falsity of the main hypothesis, I do not classify them as integral to it but, rather, as plausibly complementary extensions of it. However, I am not an expert in this area, and I don’t see the links between the main hypothesis and the subsidiary tests as being as tight as an expert might. Nevertheless, while a debate on this point may affect the shape of the logical structure of the paper, it would not, in my opinion, support an argument that there is no such structure. Nor would it challenge the conclusion that the structure is fundamentally a hypothesis-testing one. Obviously, tree diagrams similar to those in Figs. 2 and 3 could be constructed for Bai et al. (2018).

2. Independence of tests and Fisher’s Method in Bai et al. (2018)

The Reviewer also raises an important concern about the number of truly independent experiments, because the simple form of Fisher’s Method yields spuriously significant (i.e., too-low) combined p-values if non-independent constituent p-values are used. I now reiterate this critical feature in the text, but note that when independence can be assumed, Fisher’s test performs well (e.g., Alves and Yu, 2014, PLoS One, 9:e91225; Poole et al., 2016, Bioinformatics, 32:i430-i436). I also argue that the issue can be dealt with by being conservative: if there is doubt about whether one test is independent of another, leave one out of the calculation. This is conservative, since including it would almost certainly increase the significance of the combined test. I agree with the Reviewer’s comment that all judgments, such as those regarding independence, might be “open to debate,” but argue that this does not mean that judgments are invalid or worthless. Moreover, the debate about them can be illuminating. The difficulties a reader might have in deciding which experiments are truly independent remain largely unaddressed under the current system.
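The conservatism of the leave-one-out strategy is easy to demonstrate with SciPy’s implementation of Fisher’s Method; the p-values below are illustrative, not taken from Bai et al.:

```python
# Combining p-values with Fisher's Method; dropping a test of doubtful
# independence raises (weakens) the combined p-value, so omission is
# the conservative choice. Illustrative p-values only.
from scipy.stats import combine_pvalues

p_values = [0.04, 0.03, 0.05, 0.04]  # suppose the last is of doubtful independence

_, p_all = combine_pvalues(p_values, method='fisher')
_, p_conservative = combine_pvalues(p_values[:-1], method='fisher')

print(f"all four tests:        p = {p_all:.4f}")
print(f"doubtful test omitted: p = {p_conservative:.4f}")  # larger, i.e., more conservative
```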

With regard to the specific issue of Fig. 2 in Bai et al., it is difficult to know whether the whole-brain Evans Blue measurements (2A,B) and Western blot analyses for three different proteins at three different time points (6 h, 12 h, and 24 h) post-surgery were done on single groups of experimental animals. The methods that the authors use, one-way ANOVA followed by a multiple-comparisons correction, should allow data from at least one time point, say 24 h, to be used in the calculation. In any case, the authors would have to defend their reasoning for including each p-value, and the p-values would have to be independent.

My goal is not to have readers become more proficient in ferreting out the logic that infuses a paper, but to persuade authors to make their reasoning plainer. If an investigator wishes to use a summarizing meta-analytic technique, then the onus is on the investigator to make the case for using it.

3. “Pre-study odds”

I have tried to correct the misapprehension that I inadvertently caused on the issue of PPV. Again, my purpose was not to defend or advocate for this parameter, but to respond to the statisticians’ criticisms that partly rest on it. I now point out that the concept of “pre-study odds” is not unequivocally well-defined and is therefore debatable. It is particularly problematic given the statisticians’ tendency to consider every possible experimental outcome, say a particular gene, to be a “hypothesis,” as opposed to the more substantive definition of a scientific hypothesis as an explanation for a phenomenon. Despite the theoretical difficulty in determining what exactly “pre-study odds” means, in practical terms it is very rare for more than two or three genuine hypotheses to be at issue in a neuroscience investigation. The conclusion that hypothesis-testing projects that test only a small set of alternative hypotheses must be more dependable than open-ended searches with hundreds or thousands of conceivable hypotheses is a direct consequence of the PPV argument.
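That consequence falls directly out of the PPV formula (Ioannidis, 2005), PPV = (1-β)R / ((1-β)R + α), where R is the pre-study odds; a minimal sketch with assumed power of 0.80 and α = 0.05 (illustrative R values):

```python
# PPV as a function of pre-study odds R, at an assumed power of 0.80
# and alpha of 0.05. The two R values are illustrative.
def ppv(R, power=0.80, alpha=0.05):
    """Post-study probability that a 'significant' finding is true."""
    return (power * R) / (power * R + alpha)

# Two or three genuine competing hypotheses (R around 0.5):
print(round(ppv(0.5), 2))    # 0.89
# An open-ended screen of thousands of candidates (R around 0.001):
print(round(ppv(0.001), 3))  # 0.016
```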

Finally, combining methods such as Fisher’s and meta-analyses, on the one hand, and the PPV argument, on the other, are entirely separate issues: combining methods could provide insightful summary analyses without PPV and, vice versa, PPV calculations do not require a combination test of any kind.

4. Other methods of meta-analysis.

As the Reviewer mentions, there are numerous possibilities for meta-analysis. I acknowledged as much in the original MS and, as an example, cited Winkler et al. (2016), which lists a variety of such methods. These authors discuss Fisher’s, Stouffer’s, and many other tests, as well as a variety of “non-parametric combination” methods for combining data. Both Fisher’s and Stouffer’s tests are similarly affected by correlated p-values; however, Alves and Yu (2014) demonstrate that Fisher’s Method generally outperforms Stouffer’s under these conditions. Nevertheless, no test is perfect.

The use of combination tests that the Reviewer mentions, where the “same experiment is replicated by different people using different animals,” would be a conventional context for meta-analysis. In contrast, my primary intention was not to champion any particular method, but rather to put forward the concept of using combination methods to summarize the different results of a multi-faceted test of a hypothesis.

5. Other comments

a. Calculate post-hoc statistical power of Fisher’s Method? An excellent question. According to Winkler et al., it is possible to calculate something like statistical power, but it must be done numerically and is difficult. In view of the uncertainties necessarily surrounding all of these statistical procedures, it is apparently impractical.

b. Is Fisher’s test “one-sided”? Another interesting question, which is related to the previous one. Fisher’s Method tests a “global null” statistical hypothesis: namely, that all of the “partial null” hypotheses of the independent tests it combines are true. Fisher’s Method is an example of a “union-intersection” test that expressly includes non-identical datasets. Hence, there is not a single null “side” of a distribution. It is more accurate to think of Fisher’s test as having a multi-dimensional “null space,” which is the intersection of all true partial nulls (see, e.g., Fig. 1 of Winkler et al., 2016), rather than a side.
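To make the global-null idea concrete, here is a minimal sketch of Fisher’s statistic itself; the constituent p-values are invented for illustration:

```python
# Fisher's Method: X = -2 * sum(ln p_i) follows a chi-square
# distribution with 2k degrees of freedom when all k "partial null"
# hypotheses are true. Rejecting the global null says only that at
# least one partial null is false, not which one.
import math
from scipy.stats import chi2

def fisher_global_p(p_values):
    X = -2 * sum(math.log(p) for p in p_values)
    return chi2.sf(X, df=2 * len(p_values))  # upper tail of chi2_{2k}

# Three individually marginal results can jointly reject the global null:
print(round(fisher_global_p([0.09, 0.07, 0.10]), 3))  # ~0.022
```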

  • Alger BE (2019) Defense of the scientific hypothesis: from reproducibility crisis to big data. New York: Oxford University Press.
  • Alves G, Yu YK (2014) Accuracy evaluation of the unified p-values from combining correlated p-values. PLoS One 9:e91225. 10.1371/journal.pone.0091225
  • Benjamin DJ, Berger JO, Johannesson M, Nosek BA, Wagenmakers EJ, Berk R, Bollen KA, Brembs B, Brown L, Camerer C, Cesarini D, Chambers CD, Clyde M, Cook TD, De Boeck P, Dienes Z, Dreber A, Easwaran K, Efferson C, Fehr E, et al. (2018) Redefine statistical significance. Nat Hum Behav 2:6–10. 10.1038/s41562-017-0189-z
  • Borenstein M, Hedges L, Rothstein H (2007) Meta-analysis: fixed effects vs. random effects. Available from www.meta-analysis.com.
  • Button KS, Ioannidis JP, Mokrysz C, Nosek BA, Flint J, Robinson ES, Munafò MR (2013) Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci 14:365–376. 10.1038/nrn3475
  • Cen C, Luo LD, Li WQ, Li G, Tian NX, Zheng G, Yin DM, Zou Y, Wang Y (2018) PKD1 promotes functional synapse formation coordinated with N-cadherin in hippocampus. J Neurosci 38:183–199. 10.1523/JNEUROSCI.1640-17.2017
  • Collins FS, Tabak LA (2014) Policy: NIH plans to enhance reproducibility. Nature 505:612–613. 10.1038/505612a
  • Cummings G, Calin-Jageman R (2017) Introduction to the new statistics: estimation, open science and beyond. New York: Routledge.
  • Fisher RA (1925) Statistical methods for research workers. Biological monographs and manuals series. Edinburgh: Oliver and Boyd.
  • Ioannidis JPA (2005) Why most published research findings are false. PLoS Med 2:e124. 10.1371/journal.pmed.0020124
  • Lakens D, Adolfi FG, Albers CJ, Anvari F, Apps MAJ, Argamon SE, Baguley T, Becker RB, Benning SD, Bradford DE, Buchanan EM, Caldwell AR, Van Calster B, Carlsson R, Chen SC, Chung B, Colling LJ, Collins GS, Crook Z, Cross ES, et al. (2018) Justify your alpha. Nat Hum Behav 2:168–171. 10.1038/s41562-018-0311-x
  • Landis SC, Amara SG, Asadullah K, Austin CP, Blumenstein R, Bradley EW, Crystal RG, Darnell RB, Ferrante RJ, Fillit H, Finkelstein R, Fisher M, Gendelman HE, Golub RM, Goudreau JL, Gross RA, Gubitz AK, Hesterlee SE, Howells DW, Huguenard J, et al. (2012) A call for transparent reporting to optimize the predictive value of preclinical research. Nature 490:187–191. 10.1038/nature11556
  • Poole W, Gibbs DL, Shmulevich I, Bernard B, Knijnenburg TA (2016) Combining dependent P-values with an empirical adaptation of Brown’s method. Bioinformatics 32:i430–i436. 10.1093/bioinformatics/btw438
  • Task Force on Reproducibility, American Society for Cell Biology (2014) How can scientists enhance rigor in conducting basic research and reporting research results. Available at http://www.acsb.org/reproducibility.
  • Winkler AM, Webster MA, Brooks JC, Tracey I, Smith SM, Nichols TE (2016) Non-parametric combination and related permutation tests for neuroimaging. Hum Brain Mapp 37:1486–1511. 10.1002/hbm.23115

Your Article Library

The Revealed Preference Theory of Demand

In both the Marshallian cardinal utility theory of demand and the Hicks-Allen indifference curve theory of demand, the introspective method has been applied to explain the consumer’s behaviour. In other words, both these theories provide a psychological explanation of consumer’s demand; they derive laws about consumer’s demand from how he would react psychologically to certain hypothetical changes in price and income.

But the Revealed Preference Theory, which has been put forward by Prof. Samuelson, seeks to explain consumer’s demand from his actual behaviour in the market in various price-income situations. Thus, in sharp contrast to the psychological or introspective explanation, Prof. Samuelson’s revealed preference theory provides a behaviouristic explanation of consumer’s demand. Besides, revealed preference theory is based upon the concept of ordinal utility.

In other words, revealed preference theory regards utilities to be merely comparable and not quantifiable. Prof. Tapas Majumdar has described Samuelson’s revealed preference theory as “Behaviourist Ordinalist.” This description highlights the two basic features of the theory: first, it applies the behaviouristic method, and secondly, it uses the concept of ordinal utility.

Preference Hypothesis and Strong Ordering:

Prof. Samuelson’s revealed preference theory has the Preference Hypothesis as the basis of his theory of demand. According to this hypothesis, when a consumer is observed to choose a combination A out of various alternative combinations open to him, then he ‘reveals’ his preference for A over all other alternative combinations which he could have purchased. In other words, when a consumer chooses a combination A, it means he considers all other alternative combinations which he could have purchased to be inferior to A.

Put still another way, it means he rejects all other alternative combinations open to him in favour of the chosen combination A. Thus, according to Prof. Samuelson, choice reveals preference. Choice of a combination A reveals his definite preference for A over all other rejected combinations.

From the hypothesis of ‘choice reveals preference’ we can obtain definite information about the preferences of a consumer from observing his behaviour in the market. By comparing the preferences of a consumer revealed in different price-income situations, we can obtain certain information about his preference scale.

Let us explain the preference hypothesis graphically. Given the prices of two commodities X and Y and the income of the consumer, budget line PL is drawn in Fig. 12.1. The budget line PL represents a given price-income situation.

Given the price-income situation as represented by PL, the consumer can buy or choose any combination lying within or on the triangle OPL. In other words, all combinations lying on the line PL, such as A, B, C, and lying below the line PL, such as D, E, F and G, are alternative combinations open to him, from among which he has to choose one.

If our consumer chooses combination A out of all those open to him in the given price-income situation, it means he reveals his preference for A over all other combinations, such as B, C, D, E and F, which are rejected by him. As is evident from Fig. 12.1, in his observed chosen combination A, the consumer is buying OM quantity of commodity X and ON quantity of commodity Y.

Strong Form of Preference Hypothesis:

It should be carefully noted that Prof. Samuelson’s revealed preference theory is based upon the strong form of the preference hypothesis. In other words, in revealed preference theory the strong-ordering form of the preference hypothesis has been applied.

Strong ordering implies that there is definite ordering of various combinations in consumer’s scale of preferences and therefore the choice of a combination by a consumer reveals his definite preference for that over all other alternatives open to him. Thus, under strong ordering, relation of indifference between various alternative combinations is ruled out.

When in Fig. 12.1 a consumer chooses a combination A out of the various alternative combinations open to him, it means he has a definite preference for A over all others; the possibility of the chosen combination A being indifferent to any other combination is ruled out by the strong ordering hypothesis.

Choice Reveals Preference

Consistency Postulate or Weak Axiom of Revealed Preference (WARP):

The revealed preference theory rests upon another basic assumption which has been called the ‘consistency postulate’. In fact, the consistency postulate is implied in strong ordering preference hypothesis. The consistency postulate can be stated thus: “no two observations of choice behaviour are made which provide conflicting evidence to the individual’s preference.”

In other words, the consistency postulate asserts that if an individual chooses A rather than B in one particular instance, then he cannot choose B rather than A in any other instance. If he chooses A rather than B in one instance and B rather than A in another, when A and B are both available in both instances, then he is not behaving consistently.

Thus, consistency postulate requires that if once A is revealed to be preferred to B by an individual, then B cannot be revealed to be preferred to A by him at any other time when A and B are present in both the cases. Since comparison here is between two situations, consistency involved in this case has been called ‘two term consistency’ by J. R. Hicks.

If a person chooses a combination A rather than combination B which he could purchase with the given budget constraint, then it cannot happen that he would choose (i.e. prefer) B over A in some other situation in which he could have bought A if he so wished.

This means his choices or preferences must be consistent. This is called the revealed preference axiom. We illustrate the revealed preference axiom in Figure 12.2. Suppose that, with the given prices of two goods X and Y and his given money income to spend on the two goods, PL is the budget line facing a consumer.

In this budgetary situation PL, the consumer chooses A when he could have purchased B (note that combination B would have even cost him less than A). Thus, his choice of A over B means he prefers the combination A to the combination B of the two goods.

Now suppose that the price of good X falls and, with some income adjustment, the budget line changes to P’L’. Budget line P’L’ is flatter than PL, reflecting the relatively lower price of X as compared with the budget line PL. With this new budget line P’L’, if the consumer chooses combination B when he can purchase combination A (as A lies below the budget line P’L’ in Figure 12.2), then the consumer is inconsistent in his preferences; that is, he violates the axiom of revealed preference.

Such inconsistent behaviour is ruled out in revealed preference theory based on strong ordering. This axiom of revealed preference, according to which the consumer’s choices are consistent, is also called the ‘Weak Axiom of Revealed Preference’ or simply WARP.

To sum up, according to the axiom of revealed preference, if combination A is directly revealed preferred to another combination B, then in no other situation can combination B be revealed preferred to A by the consumer when combination A is also affordable.
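The two-term consistency test just stated can be expressed as a short computation. The sketch below is illustrative only, with hypothetical prices and bundles (none of the numbers come from the text): it checks whether two observed choices violate WARP by comparing the cost of each chosen bundle with the cost of the rejected one at the prevailing prices.

```python
# Illustrative WARP check for two observed choices (hypothetical data).

def cost(prices, bundle):
    """Expenditure needed to buy `bundle` at `prices`."""
    return sum(p * q for p, q in zip(prices, bundle))

def satisfies_warp(p1, x1, p2, x2):
    """x1 was chosen at prices p1, x2 at prices p2.
    If x2 was affordable when x1 was chosen (x1 directly revealed
    preferred to x2), then x1 must not also be revealed rejected
    in favour of x2 at the second observation."""
    x1_rp_x2 = x1 != x2 and cost(p1, x2) <= cost(p1, x1)
    x2_rp_x1 = x1 != x2 and cost(p2, x1) <= cost(p2, x2)
    return not (x1_rp_x2 and x2_rp_x1)

# Consistent: each chosen bundle is unaffordable at the other's prices.
print(satisfies_warp((1, 2), (4, 1), (2, 1), (1, 4)))  # True

# Inconsistent: two different bundles chosen on the same budget line.
print(satisfies_warp((1, 1), (2, 0), (1, 1), (0, 2)))  # False
```

The second call reproduces exactly the inconsistency the text describes: B is affordable when A is chosen and A is affordable when B is chosen, so the pair of choices violates the axiom.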

Consumer's Preferences are Inconsistent

Now consider Figure 12.3 where to start with a consumer is facing budget line PL where he chooses combination A of two goods X and Y. Thus, consumer prefers combination A to all other combinations within and on the triangle OPL.

Now suppose that the budget constraint changes to P’L’ and the consumer purchases combination B on it. Since combination B lies outside the original budget line PL, it was not affordable when combination A was chosen. Therefore, the choice of combination B with the budget line P’L’ is consistent with his earlier choice of A under the budget constraint PL and is in accordance with the axiom of revealed preference.

Consumer's Choices Satisfy Axiom of Revealed Preference

Transitivity Assumption of Revealed Preference Theory:

The axiom of revealed preference described above provides a consistency condition that must be satisfied by a rational consumer who makes an optimum choice. Apart from this axiom, revealed preference theory also assumes that revealed preferences are transitive.

According to this, if an optimizing consumer prefers combination A to combination B of the goods and prefers combination B to combination C of the goods, then he will also prefer combination A to combination C of the goods. To put briefly, assumption of transitivity of preferences requires that if A > B and B > C, then A > C.

In this way we say that combination A is indirectly revealed preferred to combination C. Thus, if a combination A is revealed preferred to another combination either directly or indirectly, we say simply that A is revealed preferred to it. Consider Figure 12.4, where with budget constraint PL the consumer chooses A and thereby reveals his preference for A over combination B, which is affordable under budget constraint PL.

Now suppose the budget constraint facing the consumer changes to P’L’; he chooses B when he could have purchased C. Thus, the consumer prefers B to C. From the transitivity assumption it follows that the consumer will prefer combination A to combination C.

Thus, combination A is indirectly revealed to be preferred to combination C. We therefore conclude that the consumer prefers A, either directly or indirectly, to all those combinations of the two goods lying in the shaded region in Figure 12.4.

Revealed Preferences are Transitive

It is thus evident from the above that the concept of revealed preference is a very significant and powerful tool which provides a lot of information about the preferences of a consumer who behaves in an optimising and consistent manner. By merely looking at the consumer’s choices in different price-income situations we can get a lot of information about the underlying preferences.
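The notion of indirect revealed preference can be made mechanical: starting from the directly revealed pairs, repeatedly apply transitivity until no new pairs appear. The following sketch uses hypothetical labels A, B and C (mirroring Fig. 12.4, not data from the text) and computes this transitive closure.

```python
# Transitive closure of a direct revealed-preference relation
# (illustrative only; labels are hypothetical).

def transitive_closure(direct):
    """direct: set of (better, worse) pairs.
    Returns all directly or indirectly revealed-preferred pairs."""
    closure = set(direct)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # If a > b and b > d, transitivity gives a > d.
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

direct = {("A", "B"), ("B", "C")}   # A chosen over B, B over C
print(transitive_closure(direct))    # now also contains ("A", "C")
```

The pair ("A", "C") appearing in the closure is exactly the statement that A is indirectly revealed preferred to C.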

Deriving Demand Theorem from Revealed Preference Hypothesis :

The revealed preference hypothesis can be utilised to establish the demand theorem. Prof. Samuelson has derived the Marshallian law of demand from his revealed preference hypothesis. The Marshallian law of demand, as is well known, states that a rise in the price of a good must, if income and other prices are held constant, result in a reduction of the amount demanded of the good, and vice versa.

In other words, according to Marshall’s law of demand, there is an inverse relation between the price and the amount demanded of a good. Samuelson proceeds to establish the relationship between price and demand by assuming that the income elasticity of demand is positive. From positive income elasticity he deduces the Marshallian inverse price-demand relationship.

He states the demand theorem, which he calls the Fundamental Theorem of Consumption Theory, as under:

“Any good (simple or composite) that is known always to increase in demand when money income alone rises must definitely shrink in demand when its price alone rises”. It is clear from the above statement of Fundamental Theorem of Consumption that positive income elasticity of demand has been made a necessary qualification to the inverse price-demand relationship.

The geometrical proof of the Fundamental Theorem is illustrated in Fig. 12.5. Let us suppose that the consumer spends his entire income on two goods X and Y. Further suppose that his income in terms of good X is OB, and in terms of good Y is OA.

Now the budget line AB represents the price-income situation confronting the consumer. All combinations within or on the triangle OAB are available to him, from which he can buy any combination. Suppose that the consumer is observed to choose the combination Q. This means that Q is revealed preferred to all other combinations that lie in or on the triangle OAB.

Now, suppose that the price of good X rises, the price of Y remaining unchanged. With the rise in the price of X the budget line shifts to the new position AC. The budget line AC represents the new price-income situation. We now want to know the effect of this rise in the price of good X on its quantity demanded, assuming that demand varies directly with income (i.e., income elasticity of demand is positive).

It is evident from Fig. 12.5 that combination Q is not available to the consumer in price-income situation AC. Let us compensate the consumer for the higher price of X by granting him extra money so that he can buy the same combination Q even at the higher price of X.

The amount of money which is required to be granted to the consumer so that he could buy the original combination Q at the higher price of X has been called the cost difference by Prof. J. R. Hicks. In Fig. 12.5, a line DE parallel to AC has been drawn so that it passes through Q. DE represents the higher price of X and the money income after it has been increased by the cost difference.

Proving the Fundamental Demand Theorem in Case of a Rise in Price

Now, the question is which combination will be chosen by the consumer in price-income situa­tion DE. The original combination Q is available in price-income situation DE. It is evident from Figure 12.5 that he will not choose any combination lying below Q on the line DE.

This is because if he chooses any combination below Q on the line DE, his choice would be inconsistent. All combinations below Q on DE, that is, all combinations on QE, could have been bought by the consumer but were rejected by him in price-income situation AB in favour of Q. (All points on QE were contained in the original choice triangle OAB.)

Since we are assuming consistency of choice behaviour on the part of the consumer, he will not choose, in price-income situation DE, any combination below Q on QE in preference to Q when Q is available in the new situation.

It follows, therefore, that in price-income situation DE the consumer will either choose the original combination Q or any other combination on the QD segment of DE or within the shaded area QAD. It should be noted that the choice of any other combination on QD or within the shaded area QAD in preference to Q will not be inconsistent, since combinations lying above Q on QD or within the shaded region QAD were not available in price-income situation AB.

In price-income situation DE if the consumer chooses the original combination Q, it means he will be buying the same amount of goods X and Y as before, and if he chooses any combination above Q on QD or within the shaded area QAD, it means that he will be buying less amount of commodity X and greater amount of Y than before.

Thus, even after sufficient extra income has been granted to the consumer to compensate him for the rise in the price of good X, he purchases either the same or a smaller quantity of X at the higher price. Now, if the extra money granted to him is withdrawn, he will definitely buy a smaller amount of X at the higher price, if the demand for good X is known always to fall with a decrease in income (that is, if the income elasticity of demand for X is positive).

In other words, when the price of good X rises and no extra money is granted to the consumer, so that he faces price-income situation AC, he will purchase a smaller amount of good X than at Q. Thus, assuming a positive income elasticity of demand, the inverse price-demand relationship is established so far as a rise in price is concerned.

That the inverse price-demand relationship holds good in the case of a fall in price also is demonstrated in Fig. 12.6. Let us suppose that AB represents the original price-income situation and further that the consumer reveals his preference for Q over all other combinations in or on the triangle OAB. Now, suppose that the price of good X falls so that the price line shifts to the right to the position AC.

Let us take away some amount of money from the consumer so that he is left with just sufficient money to purchase the original combination Q at the lower price of good X. Thus, in Fig. 12.6, a line DE is drawn parallel to AC so that it passes through Q. Price line DE represents the lower price of X as given by AC and the money income after it has been reduced by the cost difference.

Proving Fundamental Demand Theorem in Case of Fall in Price

It is obvious that in price-income situation DE, the consumer cannot choose any combination above Q on QD, since all such combinations were available to him in the original price-income situation AB and were rejected by him in favour of Q.

The consumer will, therefore, choose either Q or any other combination on QE or from within the shaded region QEB. In price-income situation DE, his choice of Q means that he buys the same quantity of goods X and Y as in the original price-income situation AB, and his choice of any other combination on QE or from within the shaded region QEB means that he buys a larger amount of good X and a smaller amount of good Y than in the original price-income situation AB.

Thus, even after consumer’s income has been reduced, he buys either the same quantity of X or more at the lower price. And if we give him back the amount of money taken away from him so that he confronts again price-income situation AC he will definitely buy more of X at the lower price, provided that his demand for X rises with the rise in income (i.e., his income elasticity of demand for good X is positive).

The two demonstrations given above together prove the fundamental theorem of consumption theory, according to which any good whose demand varies directly with income must definitely shrink in demand when its price rises and expands in demand when its price falls.

It may be noted that Samuelson’s theory involves two implicit assumptions which have not been explicitly stated. In the first place the consumer is always shown to choose a combination on the price line. In other words, he is never shown to choose a combination from within the triangle. This is based upon the assumption that a consumer always prefers a larger collection of goods to a smaller one.

Secondly, another implicit assumption involved in Samuelson’s theory is that the consumer is shown to choose only one combination of goods in every price-income situation. With these two implicit assumptions the inverse price-demand relationship is deduced by Samuelson by making explicit assumptions of consistency of choice and a positive income elasticity of demand.

Breaking up of Price Effect into Substitution Effect and Income Effect :

Having now explained the derivation of the law of demand from the revealed preference approach, we are in a position to show how, in this approach, the price effect can be broken up into substitution and income effects. We will explain this by considering the case of a fall in the price of a commodity.

Now consider Figure 12.7 where, to begin with, the price-income situation faced by a consumer is given by the budget line AB. With the price-income situation represented by the budget line AB, suppose the consumer chooses combination Q and buys OM quantity of commodity X.

Breaking up Price Effect into Substitution and Income Effects

Now, suppose the price of commodity X falls and as a result the budget line shifts to the new position AC. Now the income of the consumer is reduced so much that the new budget line DE passes through the original chosen combination Q. That is, income is reduced by the cost difference, so that the gain in real income caused by the fall in the price of commodity X is cancelled out.

As seen above, with the new budget line DE, to be consistent in his behaviour the consumer can either choose the original combination Q or any combination lying on the segment QE of the budget line DE. If he again chooses the original combination Q, the Slutsky substitution effect will be zero.

However, suppose that the consumer actually chooses combination S on the segment QE of the new budget line DE. The choice of combination S shows that there is a substitution effect due to which the consumer buys MN more of good X.

Note that the substitution effect is negative in the sense that the relative fall in price has led to an increase in the quantity demanded of X; that is, the change in quantity demanded is in the opposite direction to the change in price. It should be noted that the choice of combination S on segment QE in preference to combination Q of the budget line DE is not inconsistent, because combinations on segment QE and within the shaded area were not available before, when combination Q was chosen in price-income situation AB.

Thus, with the new budget line DE, after the consumer’s income has been adjusted to cancel out the gain in real income resulting from the relative fall in the price of X, the consumer chooses either Q (when the substitution effect is zero) or a combination such as S on segment QE (when the substitution effect leads to an increase in the quantity demanded of good X by MN).

This is generally known as the Slutsky theorem, which states that if the income effect is ignored, the substitution effect will lead to an increase in the quantity demanded of the good whose price has fallen; therefore the Marshallian law of demand, describing the inverse relationship between the quantity demanded and the price of a good, will hold good. That is, due to the substitution effect alone the demand curve slopes downward.

Now, if the consumer chooses the combination S on the line segment QE of budget line DE, it means that he buys MN more due to the substitution effect. Thus he prefers combination S to combination Q. In other words, his choice of S instead of Q reveals that he is better off at S as compared to Q.

Now suppose the money income withdrawn from him is restored, so that he faces the budget line AC. If the income effect is positive, he will choose a combination, say R, on the budget line AC to the right of point S, indicating that as a result of the income effect he buys NH more of commodity X.

Thus quantity demanded of commodity X increases by MN as a result of substitution effect and by NH as a result of income effect. This proves the law of demand stating inverse relationship between price and quantity demanded.
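The decomposition just described can be illustrated numerically. The sketch below assumes, purely for illustration, a Cobb-Douglas demand for X (the consumer spends a fixed share of income on X); the text itself assumes no particular functional form. It applies the Slutsky cost-difference adjustment to a price fall and recovers the substitution and income components of the price effect.

```python
# Slutsky decomposition via the cost-difference method
# (hypothetical Cobb-Douglas demand; all numbers are illustrative).

def demand_x(price_x, income, share=0.5):
    """Cobb-Douglas demand: spend a fixed share of income on X."""
    return share * income / price_x

income, p_old, p_new = 100.0, 2.0, 1.0          # price of X falls

x_old = demand_x(p_old, income)                  # 25.0 (point Q)
cost_difference = (p_new - p_old) * x_old        # -25.0, Slutsky ∆C
compensated_income = income + cost_difference    # 75.0 (budget line DE)

x_compensated = demand_x(p_new, compensated_income)  # 37.5 (point S)
x_new = demand_x(p_new, income)                      # 50.0 (point R)

substitution_effect = x_compensated - x_old   # 12.5, analogue of MN
income_effect = x_new - x_compensated         # 12.5, analogue of NH
print(substitution_effect, income_effect)     # 12.5 12.5
```

The substitution effect is negative in the sense used in the text (price down, quantity up), and restoring the withdrawn income adds the positive income effect, so the total price effect is the sum of the two components.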

On budget line DE, if the consumer chooses combination Q and consequently substitution effect is zero, the whole increase in quantity demanded MH as a result of decline in price of good X will be due to positive income effect.

However, it is more likely that the substitution effect will lead the consumer to choose a combination such as S that lies to the right of Q on the line segment QE, and will therefore cause an increase in quantity demanded. This substitution effect is reinforced by the positive income effect, and as a result we get a downward-sloping demand curve.

It needs to be emphasised that in revealed preference theory it is not possible to locate exact positions of points S and R obtained as a result of substitution effect and income effect respectively. It will be recalled that with indifference curve analysis we could obtain precise points to which a consumer moves as a result of substitution and income effects and as we saw that these were the points of tangency of indifference curves with the relevant budget lines.

As explained above, revealed preference theory is based on the assumption that all points on or below the budget line are strongly ordered and the relation of indifference of a consumer between some combinations of goods is therefore ruled out.

In revealed preference theory, the choice of a consumer reveals his preference for the chosen position; it cannot reveal indifference between combinations. Therefore, in revealed preference theory we can infer the direction of the substitution effect through logical ordering, but we can measure neither its exact size nor the exact amount of the income effect of the price change.

Besides, the substitution effect obtained through variation in income through cost difference method does not represent pure substitution effect in the Hicksian sense in which the consumer’s satisfaction remains constant. In substitution effect obtained by the revealed preference theory through Slutskian method of cost difference, the consumer moves from point Q to point S on budget line DE.

His choice of S on budget line DE instead of Q under the influence of substitution effect shows that he prefers S to Q. That is, he is better off in position S as compared to the position Q. Therefore, it is maintained by some economists that the substitution effect obtained in revealed preference theory is not a pure one and contains also some income effect.

However, the present author is of the view that the two types of substitution effect (Hicksian and the one obtained in revealed preference theory) differ with regard to the concept of real income used by them. In indifference curve analysis the term real income is used in the sense of level of satisfaction obtained by the consumer, whereas in revealed preference theory real income is used in the sense of purchasing power.

Thus, Hicksian substitution effect involves the change in the quantity demanded of a good when its relative price alone changes, level of his satisfaction remaining the same. On the other hand, revealed preference theory considers substitution effect as the result of change in relative price of a good on its quantity demanded, purchasing power remaining the same.

In Fig. 12.7 we obtain budget line DE after variation in income by cost difference so that it passes through the original combination Q chosen by the consumer before the fall in price. This implies that with budget line DE, he can buy, if he so desires the original combination Q, that is, the gain in purchasing power or real income caused by fall in price of X has been cancelled out by the reduction in his money income.

Critical Appraisal of Revealed Preference Theory :

Samuelson’s revealed preference theory has gained some advantages over the Marshallian cardinal utility theory and Hicks-Allen indifference curve theory of demand. It is the first to apply behaviouristic method to derive demand theorem from observed consumer’s behaviour. In contrast, both the earlier theories, namely, Marshallian utility analysis and Hicks-Allen indifference curve theory were psychological or introspective explanations of consumer’s behaviour.

Now, the question is whether it is the behaviouristic approach or the psychological approach which is more correct to explain consumer’s demand. We are of the opinion that no prior ground for choosing between behaviourist and introspective methods can be offered which would be accept­able irrespective of personal inclinations.

Commenting on the behaviourist-ordinalist controversy, Professor Tapas Majumdar says: “Behaviourism certainly has great advantages of treading only on observed ground; it cannot go wrong. But whether it goes far enough is the question. It may also be claimed for the method of introspection that operationally it can get all the results which are obtained by the alternative method, and it presumes to go further: it not only states, but also explains its theorems.”

We may conclude that which of the two methods is better and more satisfactory depends upon one’s personal philosophical inclinations. However, the behaviourist method has recently gained wide support among economists and has become very popular.

The concept of revealed preference is a powerful tool which can provide significant information about a consumer’s preferences, from which we can derive the law of demand, that is, the downward-sloping demand curve. Revealed preference theory does this without assuming that a consumer possesses complete information about his preferences and indifferences.

In indifference curve analysis it was supposed that consumers had a complete and consistent scale of preferences reflected in a set of indifference curves. Their purchases of goods were in accordance with this scale of preferences. It is as if consumers were carrying complete indifference maps in their minds and purchasing goods accordingly.

Most economists nowadays believe that it is unrealistic to assume that consumers have complete knowledge of their scale of preferences depicted in a set of indifference curves. It was therefore considered better to derive the demand theorem by observing consumers’ behaviour in making actual choices.

The merit of revealed preference theory is that it has made it possible to derive the law of demand (i.e., the downward-sloping demand curve) on the basis of revealed preference, without using indifference curves and their associated restrictive assumptions.

Further, it has enabled us to divide the price effect into its two component parts, namely substitution and income effects, through the cost-difference method and the axiom of revealed preference. The cost-difference method requires only market data regarding purchases of goods in different market situations. The cost difference (∆C) can be measured simply as the change in price (∆P) multiplied by the quantity initially purchased. Thus,

∆C = ∆Pₓ × Qₓ

where ∆C stands for the cost difference, ∆Pₓ for the change in the price of good X, and Qₓ for the quantity purchased by the consumer before the change in the price of good X. Further, with revealed preference theory we can even establish the existence of indifference curves and their important property of convexity. It is noteworthy, however, that indifference curves are not required for deriving the law of demand or the downward-sloping demand curve.
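As a quick check on the formula, with hypothetical numbers one can verify that adding the cost difference to money income makes the originally chosen bundle exactly affordable at the new prices, which is precisely why the compensated budget line DE in the earlier figures passes through Q:

```python
# Verifying the cost-difference formula with hypothetical numbers:
# after adjusting income by ∆C = ∆Pₓ × Qₓ, the original bundle Q
# costs exactly the compensated income at the new prices.

p_x_old, p_x_new, p_y = 2.0, 3.0, 1.0   # price of X rises
q_x, q_y = 10.0, 20.0                   # original bundle Q
income = p_x_old * q_x + p_y * q_y      # 40.0, all spent on Q

cost_difference = (p_x_new - p_x_old) * q_x      # 10.0
compensated_income = income + cost_difference    # 50.0

# Q costs exactly the compensated income at the new prices.
print(p_x_new * q_x + p_y * q_y == compensated_income)  # True
```

The same identity holds for a price fall, where ∆C is negative and income is reduced by the cost difference.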

Indifference curve analysis requires less information than Marshall’s cardinal utility theory. But it still requires a lot of information on the part of a consumer since indifference curve analysis requires him to be able to rank consistently all possible combinations of goods.

On the other hand, in Samuelson’s revealed preference theory of demand the consumer is not required to rank his preferences on the basis of introspection. The theory is based on the preferences revealed by his purchases or choices in different market situations and on the axiom of revealed preference.

If consumer’s preferences and tastes do not change, revealed preference theory enables us to derive demand theorem just from observation of his market behaviour, that is, what purchases or choices he makes in different market situations.

It is however assumed that his preference pattern or tastes do not change. As said above, we can even construct indifference curves from consumers’ revealed preferences, even though they are not required for establishing the law of demand.

A Critique of Revealed Preference Theory :

Although Samuelson’s revealed preference approach has made some important improvements upon the earlier theories of demand, it is not free from flaws. Various criticisms have been levelled against it.

First, Samuelson does not admit the possibility of indifference in consumer’s behaviour. The rejection of indifference by Samuelson follows from his strong ordering preference hypothesis.

Prof. J.R. Hicks in his later work “A Revision of Demand Theory” does not consider the assumption of strong ordering satisfactory and instead employs the weak ordering form of the preference hypothesis. Whereas under strong ordering the chosen combination is shown to be preferred to all other combinations in and on the triangle, under weak ordering the chosen combination is preferred to all positions within the triangle but may be either preferred or indifferent to the other combinations on the boundary of the triangle (i.e., on the budget line).

Further, in Samuelson’s theory, preference is considered to be revealed by a single act of choice. It has been pointed out that if preference is to be judged from a large number of observations, then the possibility of indifference also emerges.

Thus, an individual reveals preference for A over B if he chooses A rather than B more frequently than he chooses B rather than A over a given number of observations. Now, we can say that an individual is indifferent between the two situations A and B if a definite preference for either does not emerge from a sufficiently large number of observations.

Thus it is only because Samuelson regards preference as revealed by a single act of choice that the indifference relation is methodologically inadmissible in his theory. The possibility of an indifference relation clearly emerges if the existence of preference or otherwise is to be judged from a sufficiently large number of observations.

Furthermore, if we assume that an individual is able to compare his ends, which is a very reasonable assumption about individual behaviour, then the possibility of indifference, that is, of remaining at the same level of satisfaction by sacrificing some amount of one good for a certain amount of another, emerges clearly.

Thus, commenting on the Samuelson’s re­vealed preference theory from ‘welfare’ point of view Prof. Tapas Majumdar remarks: “It may be remembered that in all forms of welfare theory, indeed in any integral view of human activity, we have to assume that the individual can always compare his ends. If this axiom is not granted, the whole of welfare economics falls to the ground. And if this axiom is granted, then the idea of remaining on the same level of welfare while sacrificing something of one commodity for something else of another will emerge automatically.”

Since Samuelson proves his demand theorem on the basis of positive income elasticity of demand, the theorem cannot be derived when the income effect or income elasticity is negative. Thus, Samuelson is able to establish the demand theorem only in the case in which, in terms of Hicksian indifference curve theory, the substitution effect is reinforced by a positive income effect of the price change.

When the income elasticity is negative, Samuelson’s revealed preference theory is unable to establish the demand theorem. In other words, given negative income elasticity of de­mand, we cannot know on the basis of revealed preference theory as to what will be the direction of change in demand as a result of change in price.

Thus Samuelson’s revealed preference theory cannot derive the demand theorem when (i) the income elasticity is negative and the negative income effect is smaller than the substitution effect, or (ii) the income elasticity is negative and the negative income effect is greater than the substitution effect.

From above it follows that Samuelson’s theory cannot account for Giffen’s Paradox. The case of Giffen goods occurs when the income effect is negative and this negative income effect is so powerful that it outweighs the substitution effect.

In the case of Giffen goods, demand varies directly with price. Since Samuelson assumes income elasticity to be positive in establishing the demand theorem, his theory cannot explain Giffen goods. We thus conclude that though Samuelson improves upon the Hicks-Allen indifference curve theory of demand in respect of methodology (his behaviourist method is superior to the Hicks-Allen introspective method), in respect of the content of the demand theorem established by it, it is a few steps backward as compared to the Hicks-Allen indifference curve theory of demand.

Lastly, Samuelson’s fundamental axiom ‘choice reveals preference’ has been criticised. Under conditions of perfect competition, a consumer’s choice of a collection may well reveal his preference for it, but “this axiom is invalid for situations where the individual choosers are known to be capable of employing strategies of a game theory type”. It should, however, be noted that even indifference curve theory does not apply to situations where strategies of a game-theory type are to be employed.

In the end, we may emphasise that the superiority of Samuelson’s theory lies in his applying a scientific or behaviouristic method to consumer demand and in his formulation of the preference hypothesis.


Open access | Published: 20 June 2023

Social preferences and well-being: theory and evidence

  • Masaki Iwasaki (ORCID: orcid.org/0000-0003-3569-2425)

Humanities and Social Sciences Communications, volume 10, Article number: 342 (2023)


Many studies have shown that individuals engage in prosocial behaviors, such as pro-environmental and charitable behaviors, on the basis of their social preferences. But the nature of social preferences has not been well studied, and it has been unclear how they relate to individual well-being. It is important to clarify this linkage so that various policies and laws can maximize social welfare. This study explores the hypothesis that social preferences are in general positively correlated with subjective well-being and that individuals who are more prosocial are happier than individuals who are more proself. This study first presents a theoretical model that mathematically describes the relationship between social preferences and subjective well-being. Then it uses survey data from the United States to empirically examine the relationship between the two. Regression analysis finds a statistically significant positive correlation between prosociality and total well-being, a correlation driven primarily by eudaimonic well-being and hedonic well-being, subdomains of total well-being. The effect size of prosociality on well-being is similar to the effect sizes of parenthood, income, and education, which are important determinants of well-being, thus confirming that prosociality is a crucial determinant of individual well-being.


Introduction

International agreements and national laws use rewards, sanctions, nudging, and other techniques of intervention to encourage individuals and businesses to take prosocial actions like recycling and saving electricity to protect the global environment. Footnote 1 These laws may intentionally or unintentionally alter individual preferences in addition to individual behavior (Mattauch et al., 2022 ). When the law encourages prosocial preferences, a simple but essential question, the question investigated by this paper, arises: Are prosocial people happier than proself people? Footnote 2 If persons with prosocial preferences have lower levels of well-being than those with proself preferences, people will be unhappy to the extent that they are impelled or nudged to consider the interests of others as well as their own. This would mean that many laws may cause individuals to be unhappy. Despite its importance, the relationship between social preferences and well-being has not yet been examined scientifically.

Social preferences are the preferences of individuals regarding the payoffs or well-being of others (Charness and Rabin, 2002 ; Levitt and List, 2007 ), and individuals behave prosocially on the basis of their social preferences (Murphy and Ackermann, 2014 ). Individuals with prosocial preferences tend to behave more prosocially than individuals with proself preferences because they are happier themselves when others are happier. In recent years, the relationship between prosocial behavior and well-being has been gaining attention, with many studies finding a causal relationship or at least a correlation between the two (Falk and Graeber, 2020 ; Song et al., 2020 ; Kushlev et al., 2022 ; Rinner et al., 2022 ). Surprisingly, however, little research has been done on the relationship between social preferences and well-being. In general, those with prosocial preferences exhibit greater frequency or degree of prosocial behavior. But because many factors contribute to prosocial behavior, even persons with proself preferences may exhibit prosocial behavior. So are individuals with prosocial preferences happier than individuals with proself preferences? To examine this question scientifically, we need to address two major problems.

The first problem is that no formal theoretical model has yet described the relationship between social preferences and levels of individual well-being. When considering social preferences and well-being, we see that the relationship between relevant variables differs from person to person, and verbal models cannot sufficiently avoid ambiguity. So it is necessary to describe the relationship between social preferences and levels of well-being mathematically. Some studies use verbal models to theoretically analyze the relationship between prosocial behavior and happiness (Carlson et al., 1988 ; Aknin and Whillans, 2021 ; Hui, 2022 ). The present study uses a mathematical model that complements such verbal models.

The second problem is that few studies have empirically examined the relationship between social preferences and well-being (see the “Literature review” section). Because prosocial behavior can be directly observed, it is relatively easy to analyze its relationship with well-being or happiness empirically, which helps explain why there are so many empirical studies of this relationship. Social preferences, on the other hand, cannot be directly observed and must be inferred from individual behavior, making it difficult to explore the relationship between social preferences and happiness. This study investigates the correlation.

It does so by presenting a theoretical framework for analyzing the relationship between social preferences and well-being and providing evidence from survey data of adults in the United States of the positive correlation between prosociality and various domains of subjective well-being. We follow the literature in defining a social preference as an individual’s preference regarding the payoffs or well-being of others (Charness and Rabin, 2002 ; Levitt and List, 2007 ). Social preference pertains to how the individual ranks possible combinations of personal payoffs and the payoffs of others. Depending on the degree to which one cares about the interests of others, one’s social preference can be prosocial or non-prosocial.

Like Dixit and Levin ( 2017 ) and Tilman et al. ( 2019 ), we define prosociality as the tendency of an individual to care about the payoffs or well-being of others, which in the literature and in the present paper is mathematically represented by a parameter. Although the concept of prosociality resembles the concept of social preference and is often used interchangeably with it, prosociality differs in that it enables us to think of levels of prosociality, such as high and low levels. Individuals with high prosociality care more about the payoffs of others; individuals with low prosociality care less.

Distinguishing between preferences and behaviors in accordance with distinctions often made by economists (Samuelson, 1938 ; Sen, 1973 ), we assume that individuals engage in prosocial behaviors—behaviors that help or benefit others—on the basis of their social preferences and that persons with higher prosociality are more likely to engage in prosocial behaviors like donating money or volunteering. An enormous literature considers the relationship between prosocial behavior and well-being. Theoretically, the causal relationship between prosocial behavior and happiness is reciprocal: the happiness of people increases when they engage in prosocial behavior, and happier people are more likely to engage in such behavior. Many empirical studies have found only a correlation between the two, but some have also found a causal relationship (Meier and Stutzer, 2008 ; Aknin et al., 2012 ; Boenigk and Mayr, 2016 ; Lawton et al., 2021 ).

Unlike these studies, the present study examines the relationship between social preferences and happiness rather than between prosocial behavior and happiness. Researchers have shown that social preference or prosociality is relatively stable (Van Lange and Semin-Goossens, 1998 ; Böhm et al., 2021 ). Whether social preferences have a fundamental relationship with individual welfare has important implications for how policies and laws enacted with the intention of influencing social preferences in turn affect social welfare, which is the aggregate of individual welfare.

This paper develops a formal theoretical model for analyzing well-being when individuals have heterogeneous social preferences. The model mathematically defines the relationships between social preferences, prosociality, and well-being and describes the hypothesis to be tested by the empirical analysis. We define prosocial preferences as preferences in which, with other conditions being held constant, the level of well-being increases as the payoffs of others increase. We define proself preferences as preferences in which the level of well-being decreases or remains unchanged as the payoffs of others increase. We also define prosociality as a parameter that expresses the degree to which one considers the payoffs of others, and we hypothesize that an increase in prosociality leads to an increase in level of well-being. The theoretical model is developed only to the extent necessary for the empirical analysis and is quite simple.

Then, in the empirical analysis, we test the hypothesis that prosociality is associated with happiness. Researchers have developed various measures of prosociality. We assess it by measuring social value orientation (SVO) using the Slider Measure developed by Murphy et al. ( 2011 ), which has been used frequently in recent economics or behavioral economics research (Grosch and Rau, 2017 ; D’Attoma et al., 2020 ). The Slider Measure is excellent in that it treats SVO both as a traditional categorical variable and as a continuous variable. We also use the Pemberton Happiness Index developed by Hervás and Vázquez ( 2013 ) to measure aspects of well-being. Their index consists of the sub-domains of remembered and experienced well-being, and remembered well-being consists of general well-being, eudaimonic well-being, hedonic well-being, and social well-being.

To test the hypotheses, regression analysis was conducted with each form of well-being as the dependent variable and with the SVO score (a continuous variable) or the SVO category (a categorical variable) as the independent variable. Parenthood (Pollmann-Schult, 2014 ; Radó, 2020 ), political preference (Napier and Jost, 2008 ; Onraet et al., 2017 ), income (Boyce et al., 2010 ; FitzRoy and Nolan, 2022 ), and education (Cuñado and de Gracia, 2012 ; Nikolaev, 2018 ), which have been used in previous studies, were also used as independent variables. Gender, age, employment, and marital status were used as control variables.
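A minimal sketch of the bivariate core of such a regression — a well-being score regressed on the SVO score — in plain Python. The function name and data layout are mine; the study's actual models also include the controls listed above.

```python
def simple_ols(x, y):
    """Least-squares slope and intercept for y ~ x.

    Illustrates the bivariate core of the regressions described above
    (a well-being score regressed on the SVO score). The paper's full
    models also include controls such as parenthood, income, and
    education; this sketch omits them.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

With hypothetical data in which well-being rises with SVO, e.g. `simple_ols([10, 20, 30], [5.0, 6.0, 7.0])`, the fitted slope is positive, mirroring the sign of the correlation the paper reports.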

The regression analysis found a statistically significant positive correlation between SVO and total well-being. Focusing on the subdomains of total well-being, SVO had a statistically significant correlation not only with relatively short-lived hedonic well-being but also with more enduring eudaimonic well-being. The effect sizes of SVO on each of these dimensions of well-being were similar to the effect sizes of parenthood, income, and education, each of which is an important determinant of well-being. The analysis suggests that prosociality is a very important determinant of well-being.

The following section provides a review of the literature. The section on Theory of Heterogeneous Social Preferences and Well-Being presents a theoretical model. The Methodology section describes how the hypothesis that prosociality and happiness are correlated may be tested. The Results section reports the results of the regression analysis. The Conclusions and Discussion section considers implications for policy.

Literature review

Overview of the literature

The present study contributes to three strands of research: social preferences and social value orientation (SVO), subjective well-being, and heterogeneous preferences. We first provide an overview of the literature, then examine the literature on each topic in detail.

In this study, we adopt a theoretical framework which assumes that individuals behave in a prosocial manner on the basis of their social preferences (Murphy and Ackermann, 2014 ). Individuals with prosocial preferences tend to behave more prosocially than those with proself preferences because the well-being of others has a greater positive impact on their own well-being. Recently, the number of studies on the relationship between prosocial behavior and well-being has been increasing remarkably. They include both empirical studies (Falk and Graeber, 2020 ; Song et al., 2020 ; Kushlev et al., 2022 ; Rinner et al., 2022 ) and theoretical studies using verbal models (Carlson et al., 1988 ; Aknin and Whillans, 2021 ; Hui, 2022 ). However, in part because preferences cannot be directly observed, little research has been done on the relationship between social preferences and well-being.

We first mathematically formulate the relationship between social preferences and well-being. Decancq et al. ( 2017 ) presented a formal model of the relationship between heterogeneous preferences and well-being. But because their model does not explicitly consider social preferences, we extend it. In the theoretical model of the present study, preferences under which, all other things being equal, one’s own well-being increases as the payoffs of others increase are called prosocial preferences.

We then empirically examine the relationship between social preferences and well-being, which raises the question of how to measure these attributes. We look in detail at the literature on SVO to explain why we measure social preferences using SVO. With respect to well-being, researchers have shown that happiness consists of multiple dimensions, so we explore the literature on various aspects of happiness and its determinants. We also review the literature on heterogeneous preferences to explain why, in considering the effects of policies and laws on society and the economy, we need to allow for heterogeneity in preferences, including social preferences.

Literature on SVO

Social preference has many dimensions, including SVO and social mindfulness (Van Doesum et al., 2021 ). This paper focuses on SVO as a variable representing one aspect of social preference because of the large body of research on it and because SVO is easy to measure. Research on SVO has a long history (Messick and McClintock, 1968 ; Murphy and Ackermann, 2014 ), and studies have shown SVO to be a predictor of many behaviors, including volunteering and donating. But these studies have not made clear whether well-being differs among individuals with different SVOs and, if so, to what extent and in which domains of well-being the differences appear. This study provides evidence on these questions.

After mathematician John von Neumann and economist Oskar Morgenstern established the foundations of game theory (Von Neumann and Morgenstern, 1944 ), it became possible to formally analyze interactions among decision-makers. The analyses usually assumed that in the course of such interactions, each individual pursues only his own self-interest, an assumption that often enabled useful predictions. Other investigators studied cases in which individuals may care about the interests of others as well as their own. Psychologists David Messick and Charles McClintock devised so-called decomposed games, games in which a decision maker has a unilateral choice about how to allocate resources between himself and another person (Messick and McClintock, 1968 ). Influenced by their study, the concept of SVO eventually emerged.

On the basis of SVO, people can be categorized into two main groups: prosocial and proself (De Cremer and Van Lange, 2001 ). Proself persons are mainly concerned with their own self-interest; prosocial people care about the interests of others as well as their own. Prosocial and proself groups can, in turn, be subdivided in accordance with specific motivations. The groups most often distinguished are prosocial, individualistic, and competitive (Murphy and Ackermann, 2014 ). In the case of two persons, a prosocial person maximizes joint gains for himself and the other person. An individualistic person maximizes self-gain, and a competitive person maximizes the difference between self-gain and the other person’s gain.

Studies have shown that SVO can predict various behaviors. For instance, to study the association between SVO and volunteer behavior, McClintock and Allison ( 1989 ) classified students at a US university into three SVO-based groups: prosocial, individualistic, and competitive. The students were asked to volunteer for a psychological research project at their university and to indicate how many hours they would volunteer. Prosocial students devoted more hours to the research. Van Lange et al. ( 2011 ) showed that prosocial students at a Dutch university were more likely than individualistic and competitive students to volunteer for psychological experiments.

Studies have also shown that SVO predicts donating behavior. When Van Lange et al. ( 2007 ) asked survey participants in the Netherlands about their donations, they found that prosocial people donated more often than individualistic and competitive people, especially to organizations for poor and ill people. A survey conducted in three regions of Bangladesh by Shahrier et al. ( 2017 ) showed that prosocial people donated more money to humanitarian activities than individualistic and competitive people did. These studies suggest that SVO has predictive power in both developed countries and developing countries.

Literature on subjective well-being

Researchers have studied various determinants of subjective well-being: parenthood (Pollmann-Schult, 2014 ; Radó, 2020 ), political preferences (Napier and Jost, 2008 ; Onraet et al., 2017 ), income (Boyce et al., 2010 ; FitzRoy and Nolan, 2022 ), and education (Cuñado and de Gracia, 2012 ; Nikolaev, 2018 ). Because these factors play a significant role in social life, they are highly correlated with happiness. How much we care about others also plays an important role in social life, so it is natural to assume that social preferences likewise have a large impact on happiness. But this assumption has not been fully examined in previous studies. The present study shows that prosociality is indeed correlated with happiness, with an effect size similar to the effect sizes of other determinants of happiness.

Measuring subjective well-being is a difficult task. Instances of subjective well-being can be divided into remembered well-being and experienced well-being; i.e., they can be distinguished with respect to when the experiences are being evaluated. Remembered well-being is an evaluation of one’s experiences as one remembers them after these experiences are over. Experienced well-being is an evaluation of one’s experiences in real-time. Remembered well-being may be biased by imperfect memory, imperfect conditions of evaluation, and other factors (Kahneman and Riis, 2005 ). Experienced well-being may not fully capture the long-term effects of experiences on well-being (Oliver, 2017 ). To compensate for their potential incompleteness, these two forms of reporting well-being should be used complementarily.

Remembered well-being can be subdivided into general well-being, eudaimonic well-being, hedonic well-being, and social well-being. General well-being is an evaluation of life satisfaction: a global evaluation of one’s life as assessed by one’s own criteria (Diener et al., 1985 ). Eudaimonic well-being is an evaluation of one’s actualization of potential. Hedonic well-being is an evaluation of one’s balance of pleasure and pain (Ryan and Deci, 2001 ). Social well-being is an evaluation of one’s circumstances and functioning in society (Keyes, 1998 ).

Eudaimonic and hedonic views of well-being have long histories (Ryan and Deci, 2001 ). The ancient Greek philosopher Aristotle considered hedonic happiness to be vulgar. He thought that happiness is the actualization of human potential. Another ancient Greek philosopher, Aristippus, thought that the proper goal of life is to maximize pleasure and that happiness is the sum of momentary pleasures. Eudaimonic well-being is often regarded as more enduring than hedonic well-being because the realization of potential is usually not a fleeting phenomenon, whereas simple pleasure and pain tend to be momentary (Steger et al., 2008 ).

This paper measures remembered well-being (general well-being, eudaimonic well-being, hedonic well-being, and social well-being) and experienced well-being and examines their correlation with social preferences.

Literature on heterogeneous preferences

This study also contributes to the literature on heterogeneous preferences, particularly heterogeneous social preferences. When considering the effects of policies and laws on society and the economy, conclusions may vary depending on the extent to which the relevant preferences of members of society are heterogeneous. For example, Ziegler ( 2020 ) showed that persons with prosocial preferences are more likely to enter into green energy contracts because they derive more utility from efforts to protect the environment than those with non-prosocial preferences do. The government could make its renewable energy policy more effective by making the process of supplying electricity more transparent. Showing that green energy contracts function to protect the environment would appeal to those with prosocial preferences.

In addition, Fehr and Schmidt ( 1999 ) showed that when members of society have different social preferences—selfish individuals and prosocial individuals—the distribution of social preferences affects whether competition or cooperation occurs in equilibrium. Because many policies, such as environmental policies, require the cooperation of members of society, the effects of these policies may vary with the distribution of social preferences.

As these examples show, the effects of policies and laws change depending on the heterogeneity of the social preferences of people. Analysis of the effects of policies and laws on social welfare ultimately requires an aggregation of individual welfare. So it is useful to know how the social preferences of individuals are related to their welfare levels in the first place. Although some recent research, such as the study by Decancq et al. ( 2017 ), presents a method of calculating inequality of well-being by considering the heterogeneous preferences of individuals, none has examined in detail the relationship between heterogeneous social preferences and levels of well-being. The present paper identifies a positive correlation between social preferences and welfare levels.

Theory of heterogeneous social preferences and well-being

To structure our thinking, we extend the model of heterogeneous preferences and well-being developed by Decancq et al. ( 2017 ) to the case of heterogeneous social preferences.

Suppose that there are n individuals in a society. We assume that life outcomes in m > 1 dimensions affect the well-being of each individual, and we denote the outcome vector for each individual i by \(\boldsymbol{l}_i = (l_i^1, l_i^2, \ldots, l_i^m)\). Each person i has a well-behaved preference order \(R_i\) over his or her set of outcome vectors. These preferences represent well-considered judgments about what each individual considers to be the good life. We assume that the preference order \(R_i\) of each individual i can be expressed as a function of a preference vector consisting of k parameters, \(\boldsymbol{a}_i = (a_i^1, a_i^2, \ldots, a_i^k)\); that is, \(R_i = R(\boldsymbol{a}_i)\). We assume that the subjective well-being WB of each individual i depends on the outcome vector \(\boldsymbol{l}_i\) and the preference vector \(\boldsymbol{a}_i\): \(WB(\boldsymbol{l}_i, \boldsymbol{a}_i)\).

These assumptions are the same as those of the model used by Decancq et al. But because we want to consider social preferences explicitly, we add a few more assumptions. Suppose that the subjective well-being WB of each individual i also depends on the outcomes of individuals other than i, and denote the matrix of those outcomes by \(\boldsymbol{L}_{-i} = (\boldsymbol{l}_1, \boldsymbol{l}_2, \ldots, \boldsymbol{l}_{i-1}, \boldsymbol{l}_{i+1}, \ldots, \boldsymbol{l}_n)\). This means that the well-being WB of each individual i depends not only on \(\boldsymbol{l}_i\) but also on \(\boldsymbol{L}_{-i}\). Let \(\boldsymbol{L}\) denote the outcomes in the society as a whole. Now the well-being WB of each individual i depends on the outcome matrix \(\boldsymbol{L}\) and the preference vector \(\boldsymbol{a}_i\): \(WB(\boldsymbol{L}, \boldsymbol{a}_i)\). We also assume that the p-th preference parameter of each individual i, \(a_i^p\), is a prosociality parameter, which represents a preference about the outcomes of the other individuals \(\boldsymbol{L}_{-i}\).

Depending on \(a_i^p\), each individual i can have a higher level of well-being with the same personal outcome \(\boldsymbol{l}_i\) if the outcomes of the other individuals in the society \(\boldsymbol{L}_{-i}\) take better values. If for all individuals j ≠ h it is the case that \(\boldsymbol{l}_j^\ast = \boldsymbol{l}_j\), and for individual h it is the case that \(\boldsymbol{l}_h^\ast = \boldsymbol{l}_h + \boldsymbol{\delta}\) for some \(\boldsymbol{\delta} \in \mathbb{R}_+^m \setminus \{0\}\), we denote the resulting outcome matrix by \(\boldsymbol{L}^\ast\). We can now define prosocial preferences.

Definition. Individual i has a prosocial preference \(R_i = R(\boldsymbol{a}_i)\) if \(WB(\boldsymbol{L}^\ast, \boldsymbol{a}_i) > WB(\boldsymbol{L}, \boldsymbol{a}_i)\).

On the basis of this definition, it follows that individual i has a non-prosocial preference if \(WB(\boldsymbol{L}^\ast, \boldsymbol{a}_i) \le WB(\boldsymbol{L}, \boldsymbol{a}_i)\).
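As a concrete illustration of the definition, here is a toy specification of WB in Python. The linear functional form and the names `wellbeing` and `prosociality` are illustrative assumptions of mine; the paper leaves WB general.

```python
def wellbeing(own_outcome, others_outcomes, prosociality):
    """Toy specification WB(L, a_i): own payoff plus a prosociality-weighted
    sum of others' payoffs. The linear form is an illustrative assumption,
    not the paper's model, which leaves WB general."""
    return own_outcome + prosociality * sum(others_outcomes)


def is_prosocial(prosociality):
    """Under the toy linear form, raising any other person's outcome raises
    WB exactly when the prosociality parameter is positive, matching the
    definition of a prosocial preference above."""
    return prosociality > 0
```

For example, with prosociality 0.5, raising one other person's outcome from 5 to 6 raises the individual's own well-being from 15.0 to 15.5 even though the personal outcome (10) is unchanged.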

We are interested in whether, in general, individuals with prosocial preferences have a higher level of well-being than individuals with non-prosocial preferences, given the same personal outcome and the same outcomes for others. Suppose that individuals i and j have different prosociality parameters; that is, \(a_i^p \ne a_j^p\), where \(a_i^p\) is the prosociality parameter of individual i and \(a_j^p\) is that of individual j. Suppose also that individual i has a prosocial preference \(R(\boldsymbol{a}_i)\) but individual j has a non-prosocial preference \(R(\boldsymbol{a}_j)\). We are interested in whether the following is generally (though not always) true in the real world for any outcome matrix \(\boldsymbol{L}\) where \(\boldsymbol{l}_i = \boldsymbol{l}_j\): \(WB(\boldsymbol{L}, \boldsymbol{a}_i) > WB(\boldsymbol{L}, \boldsymbol{a}_j)\).

More generally, the level of the prosociality parameter \(a_i^p\) of each individual i may be correlated with the level of subjective well-being \(WB(\boldsymbol{L}, \boldsymbol{a}_i)\), whether or not the individual’s preference is prosocial.

Hypothesis . The level of prosociality is correlated with the level of subjective well-being.

We will now empirically examine this hypothesis.

Methodology

Measurement method of social preferences

We use SVO, an aspect of social preference, as an explanatory variable for well-being; as a continuous variable, SVO represents degree of prosociality. This means that we are using SVO as a proxy variable for the prosociality parameter. Previous studies have developed a variety of methods for measuring SVO (Messick and McClintock, 1968 ). The present study uses the Slider Measure developed by Murphy et al. ( 2011 ), a method that many scholars have begun to use.

In this method, subjects are asked to choose an allocation of gains between the self (the subject) and another person in six different situations. Footnote 3 In each situation, subjects have nine options for allocating the gains, as shown in Table 1 . The gains in the six situations are indicated by the six dotted lines in Fig. 1 . The vertical axis represents the gain of the other person, and the horizontal axis represents the gain of the subject. The four points (50, 100), (85, 85), (100, 50), and (85, 15) correspond to idealized altruistic choices, prosocial choices, individualistic choices, and competitive choices that are made when a person chooses an allocation of self-gain and other-gain from allocations located on the circle. The gains in the six situations are located on the six dotted lines that interconnect these four points. Each of the six situations corresponds to one of the six dotted lines.

Figure 1. The author made this figure based on the description of Murphy et al. ( 2011 , p. 773).

After a subject chooses allocations in the six situations, the subject’s mean gain \(\bar A_{\rm s}\) and the other’s mean gain \(\bar A_{\rm o}\) are calculated. Then 50 is subtracted from each mean gain so that the angle of the point \((\bar A_{\rm s}, \bar A_{\rm o})\) relative to the center of the circle (50, 50) can be calculated. The SVO score of each subject is defined as the arctangent of the ratio of these adjusted means: \(SVO^\circ = \arctan\bigl((\bar A_{\rm o} - 50)/(\bar A_{\rm s} - 50)\bigr),\)

where SVO o is the SVO score, also called the SVO angle. Murphy et al. recommended that SVO be used as a continuous construct because it measures how much an individual sacrifices in order to make another individual better off. Footnote 4 In any case, the Slider Measure can classify subjects in terms of conventional categories. Based on the SVO scores, subjects can be classified as follows: altruistic ( SVO o  > 57.15), prosocial (57.15 >  SVO o  > 22.45), individualistic (22.45 >  SVO o  > −12.04), and competitive (−12.04 >  SVO o ). This classification is especially useful for comparing the results of various studies since many studies used this classification before the Slider Measure came into general use.
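The scoring and classification just described can be sketched in a few lines of Python (function names are mine; the category cutoffs are those given above):

```python
import math


def svo_angle(own_payoffs, other_payoffs):
    """SVO angle from the six Slider Measure allocations: subtract 50 from
    the mean payoff to self and to other, then take the arctangent of
    other/self, measured from the circle's center (50, 50)."""
    a_s = sum(own_payoffs) / len(own_payoffs)
    a_o = sum(other_payoffs) / len(other_payoffs)
    return math.degrees(math.atan2(a_o - 50.0, a_s - 50.0))


def svo_category(angle):
    """Map an SVO angle to the four conventional categories using the
    cutoffs reported above (Murphy et al., 2011)."""
    if angle > 57.15:
        return "altruistic"
    if angle > 22.45:
        return "prosocial"
    if angle > -12.04:
        return "individualistic"
    return "competitive"
```

A subject who always chooses the joint-maximizing allocation (85, 85) has an angle of 45 degrees and is classified as prosocial; one who always takes the full self-gain at (100, 50) has an angle of 0 degrees and is classified as individualistic.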

Measurement method of well-being

Although many methods have been developed to measure different aspects of well-being, most measure only a single domain of well-being. Because our interest lies in the relationship between various domains of well-being and SVO, we want to use a method that subsumes many domains. The Pemberton Happiness Index developed by Hervás and Vázquez ( 2013 ) does so.

Hervás and Vázquez combined several widely used scales of well-being in order to measure both remembered well-being and experienced well-being. In the case of remembered well-being, subjects are asked to rate the statements in Table 2 on an 11-point Likert scale (0 = total disagreement, 10 = total agreement). Remembered well-being is measured as the mean score of these 11 ratings. The sum of raw scores divided by 11 provides a mean score ranging from 0 to 10.

Remembered well-being consists of general well-being, eudaimonic well-being, hedonic well-being, and social well-being. General well-being is measured by questions (r1) and (r2) on global life satisfaction (Diener et al., 1985; Ryan and Frederick, 1997). Eudaimonic well-being has six components: life meaning, self-acceptance, personal growth, relatedness, perceived control, and autonomy. These components are based on the model of psychological well-being developed by Ryff (1989) and are measured by statements (r3)–(r8). Hedonic well-being has two components, positive affect and negative affect, which are based on the Positive and Negative Affect Schedule (PANAS) developed by Watson et al. (1988) and are measured by statements (r9) and (r10). Social well-being is measured by statement (r11), about a person's situation and functioning in society (Keyes, 1998).

To measure experienced well-being, subjects are asked to answer “yes” or “no” regarding whether the events listed in Table 3 occurred the day before. Items (e1), (e3), (e5), (e7), and (e8) are positive experiences; items (e2), (e4), (e6), (e9), and (e10) are negative experiences. The occurrence of each positive experience is counted as 1, and the nonoccurrence of each negative experience is also counted as 1. The sum of these scores is a single overall score that ranges from 0 (no positive experiences and 5 negative experiences) to 10 (5 positive experiences and no negative experiences).
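This scoring rule can be sketched directly (the item ids and the dict-based encoding of "yes"/"no" answers are illustrative assumptions):

```python
# Scoring the experienced-well-being items of the Pemberton Happiness Index.
POSITIVE = {"e1", "e3", "e5", "e7", "e8"}   # positive experiences
NEGATIVE = {"e2", "e4", "e6", "e9", "e10"}  # negative experiences

def experienced_wb(answers):
    """answers maps each item id to True ("yes", it occurred yesterday)
    or False ("no"). Each occurring positive experience and each
    non-occurring negative experience scores 1 point (range 0-10)."""
    score = sum(1 for item in POSITIVE if answers[item])
    score += sum(1 for item in NEGATIVE if not answers[item])
    return score
```

A day with all five positive experiences and no negative ones scores 10; a day with no positive experiences and all five negative ones scores 0.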

Total well-being, which includes both remembered well-being and experienced well-being, is calculated by adding a subject’s scores for remembered well-being (11 scores) and experienced well-being (1 score), then dividing this total score by 12 to obtain a mean score that ranges from 0 to 10.
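The aggregation of remembered and experienced well-being into total well-being can be sketched as follows (function names are ours):

```python
def remembered_wb(ratings):
    """Mean of the 11 remembered-well-being ratings (each 0-10)."""
    assert len(ratings) == 11
    return sum(ratings) / 11

def total_wb(ratings, experienced_score):
    """Total well-being: the 11 remembered-well-being ratings plus the
    single experienced-well-being score (0-10), averaged over 12 items."""
    assert len(ratings) == 11
    return (sum(ratings) + experienced_score) / 12
```

Because the experienced score enters as a single twelfth item, total well-being is pulled only modestly toward it; a subject rating 7 on every remembered item with an experienced score of 5 has a total score of 82/12 ≈ 6.83.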

Participants and procedure

The data were collected using Amazon Mechanical Turk (MTurk), an online crowdsourcing platform. Footnote 5 In March 2016, the author recruited participants in the United States. Footnote 6 Power analysis indicated that a sample size of 200 would be sufficient to achieve 80% power, assuming a small to medium effect size. The author, therefore, collected 212 samples. The participants were asked to first complete the Slider Measure, then to complete the questionnaires for the Pemberton Happiness Index and answer demographic questions. The mean time required to complete the entire procedure was 3 min and 30 s. Each participant received 0.5 US dollars for participating.
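The paper does not state which power-analysis procedure was used. As one plausible reconstruction, a two-sided test for a small-to-medium correlation of roughly r = 0.2, using the Fisher z approximation, yields a required sample size of about 200:

```python
import math

def n_for_correlation(r):
    """Approximate sample size needed to detect a population correlation r
    with two-sided alpha = 0.05 and power = 0.80, via the Fisher z
    approximation: n = ((z_0.975 + z_0.80) / atanh(r))**2 + 3."""
    z_alpha = 1.95996  # standard-normal quantile for 0.975
    z_power = 0.84162  # standard-normal quantile for 0.80
    return math.ceil(((z_alpha + z_power) / math.atanh(r)) ** 2 + 3)

# A correlation of about 0.2 requires roughly 200 subjects.
print(n_for_correlation(0.2))
```

Larger target effects require fewer subjects, which is why the assumed effect size drives the sample-size decision.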

Regression model

To analyze the impact of SVO on well-being while taking other independent variables into account, we use regression analysis. The regression model is:

\(WB_i = \beta_0 + \beta_1 SVO_i + {\boldsymbol{\beta}}_2^{\prime} {\boldsymbol{x}}_i + {\boldsymbol{\beta}}_3^{\prime} {\boldsymbol{z}}_i + \varepsilon_i\)
In this model, the dependent variable \(WB_i\) is the subjective well-being of individual i. To analyze the multiple aspects of well-being, we use the scores for total well-being, remembered well-being, general well-being, eudaimonic well-being, hedonic well-being, social well-being, and experienced well-being as dependent variables.

The independent variable of interest, \(SVO_i\), is the SVO score of individual i. In our basic model, we follow the recommendation of Murphy et al. (2011) in regarding SVO as a continuous construct and in using the SVO score as an independent variable. In order to compare our results with those of previous studies, we also estimate a model that replaces the SVO score with a binary variable classifying individuals as prosocial or individualistic based on their SVO scores, while leaving the other variables unchanged.

The symbol \({\boldsymbol{x}}_i\) represents a vector of the other independent variables. Researchers have found that many factors affect well-being; in this study, we use parenthood, political preferences, income level, and education level as independent variables.

Despite the costs and stress of child-rearing, in general parenthood positively affects well-being (Pollmann-Schult, 2014 ; Radó, 2020 ). We use a binary variable that indicates whether a respondent has one or more children as an independent variable. With respect to political preferences, researchers have shown that political conservatives have higher subjective well-being than political liberals (Napier and Jost, 2008 ; Onraet et al., 2017 ). We use a categorical variable representing political preferences as an independent variable. Participants are categorized as Republican, Democratic, Independent, or Other. With respect to income, researchers have shown that results vary depending on whether the concept of income rank, relative income, or household income is used; but, in general, income positively affects well-being (Boyce et al., 2010 ; FitzRoy and Nolan, 2022 ). We have data only on categories of household income levels, so we treat income as an ordinal variable. Regarding education, researchers have found its impact on well-being to be complex as well. In general, though, higher levels of education positively influence well-being (Cuñado and de Gracia, 2012 ; Nikolaev, 2018 ). We treat education as a categorical variable because we have data on the final educational degrees of the respondents.

The symbol \({\boldsymbol{z}}_i\) represents a vector of the control variables, which indicate gender, age, employment status, and marital status. The gender variable is binary, and the other variables are categorical. The symbol \(\varepsilon_i\) is the error term.
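The model can be estimated by ordinary least squares with dummy coding for the categorical regressors. The following is a self-contained sketch on synthetic data; the variable names, category levels, and coefficient values are illustrative assumptions, not the study's data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 212  # sample size as in the study

# Synthetic illustration: a continuous SVO score plus one dummy-coded
# categorical regressor (e.g. political preference with three levels,
# level 0 serving as the reference category).
svo = rng.uniform(-15, 60, n)
party = rng.integers(0, 3, n)
d1 = (party == 1).astype(float)  # dummy for level 1
d2 = (party == 2).astype(float)  # dummy for level 2
wb = 5.0 + 0.02 * svo + 0.9 * d1 + 0.1 * d2 + rng.normal(0, 1, n)

# Design matrix: intercept, SVO score, and the two category dummies.
X = np.column_stack([np.ones(n), svo, d1, d2])
beta, *_ = np.linalg.lstsq(X, wb, rcond=None)
print(beta)  # [intercept, SVO coefficient, dummy coefficients]
```

Each dummy coefficient is read relative to the omitted reference category, which is how the Republican and income-bracket coefficients in the results below are reported.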

Results

We first look at the demographic data of the participants (Table 4). Women constituted 45.3% of all participants; persons younger than 40, 65.2%; persons with a bachelor's degree or higher, 51.4%; wage employees or the self-employed, 79.7%; persons with a household income of $50,000 or more, 43.8%; Democrats, 50%; Republicans, 20.8%; married persons, 33.5%; and persons with one or more children, 41.5%.

Table 5 reports the means, standard deviations (SD), and Pearson correlation coefficients for SVO and well-being variables. The mean SVO score of 23.883 indicates that the average participant was prosocial. The mean score for total well-being was 6.830. The mean scores for remembered well-being and experienced well-being—the subdomains of total well-being—were 6.846 and 6.656, respectively. In the study by Hervás and Vázquez ( 2013 ), these scores were similar for the US sample at 6.93 and 6.32, respectively. With respect to the subdomains of remembered well-being, although the mean scores for general well-being, eudaimonic well-being, and hedonic well-being ranged between about 6.7 and 7.1, the mean score for social well-being was 5.925, deviating downward from the other scores.

The SVO score was weakly correlated with total well-being, remembered well-being, and hedonic well-being at significance levels of 5%, 5%, and 1%, respectively. Among these correlation coefficients, the one between SVO and hedonic well-being was the largest at 0.189. Footnote 7 Although not reported in Table 5, the internal consistencies of remembered well-being and total well-being, measured by Cronbach's alpha, were 0.940 and 0.942, respectively. These values are similar to those found in the study of Hervás and Vázquez (2013), in which both scores were 0.93 for the US sample.
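Cronbach's alpha, the internal-consistency measure reported above, is computed from the item variances and the variance of the total score; a minimal implementation:

```python
import numpy as np

def cronbach_alpha(items):
    """items: (n_subjects, k_items) array of item scores.
    alpha = k/(k-1) * (1 - sum of item variances / variance of total)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)   # per-item sample variances
    total_var = items.sum(axis=1).var(ddof=1)  # variance of total scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)
```

When all items are perfectly correlated, the variance of the total dominates the summed item variances and alpha approaches 1, which is why values above 0.9, as here, indicate very high internal consistency.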

Table 6 reports the distribution of SVO categories traditionally used in many studies. For the sake of comparison, the distributions of SVO categories in two data sets in Murphy et al. ( 2011 ) are reported as well. Their sample was students at a European university, and they used the Slider Measure in their second and third experimental sessions. In the present study, the proportions of altruistic participants, prosocial participants, individualistic participants, and competitive participants were 0.5%, 55.7%, 43.9%, and 0%, respectively. This distribution is similar to the distributions in the study by Murphy et al., in which prosocial participants constituted the majority and individualistic participants constituted the second-largest group. In both studies, altruistic and competitive individuals were rare.

Now let us look at the main results. Table 7 reports the results of ordinary least-squares regression using SVO as a continuous construct, with the SVO score as an independent variable and with total well-being, remembered well-being, general well-being, eudaimonic well-being, hedonic well-being, social well-being, and experienced well-being as dependent variables. The coefficient of SVO was largest at 0.028 when hedonic well-being was a dependent variable. This means that an increase of 1 in SVO score was associated with an increase of 0.028 in hedonic well-being. The SVO coefficient was about 0.02 when total well-being, remembered well-being, eudaimonic well-being, and experienced well-being were used as dependent variables, and the SVO coefficient was about 0.015 when general well-being and social well-being were used. The SVO coefficient was statistically significant at the 1% level when hedonic well-being was a dependent variable, and it was statistically significant at the 5% level when total well-being, remembered well-being, and eudaimonic well-being were dependent variables.

The coefficients of the other independent variables were generally consistent with the coefficients reported in the studies discussed in the section “Literature review”. The coefficient of parenthood was largest, 0.672, when eudaimonic well-being was a dependent variable, and it was about 0.6 when total well-being, remembered well-being and general well-being were dependent variables. With respect to political preferences, the coefficient for Republican supporters, with Democratic supporters as the reference category, was largest, 0.973, when eudaimonic well-being was a dependent variable; it was about 0.9 when total well-being, remembered well-being and general well-being were dependent variables. These coefficients were statistically significant at the 5% level or at the 1% level.

With respect to household income, the coefficients for the higher income categories were generally positive, with $30,000 or less as the reference category. For the category of $70,000–$79,999 and the category of $150,000 or more, the coefficients were high, with values greater than 1. Most of these coefficients were statistically significant at the 5% level or 1% level. In 2016, when this study was conducted, the median household income in the United States was $59,039 (U.S. Census Bureau, 2017 ). This means that the coefficients of household income were much higher in the income categories that were slightly or extremely above the median income than in the other income categories. With respect to educational degrees, with the category of high school graduate as the reference category, the coefficient for the category of doctoral or professional degree was considerably higher than the coefficients for the other categories.

For the sake of comparing the relative magnitudes of the coefficients, the bottom panel of Table 7 also reports the standardized coefficients. These coefficients indicate by how many standard deviations each dependent variable changes when each independent variable increases by one standard deviation. Looking closely at the regressions for total well-being, we find that the standardized coefficient of SVO was 0.159. At the same time, the standardized coefficients for parenthood, income of $150,000 or more, and doctoral or professional degree categories were 0.158, 0.167, and 0.176. Thus, the effect size of SVO on total well-being was comparable to the effect sizes of parenthood, income, and education. Looking at the subdomains of total well-being, we obtain similar conclusions for remembered well-being, eudaimonic well-being, and hedonic well-being: the effect sizes of SVO were comparable to the effect sizes of parenthood, income, and education.
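A standardized coefficient is obtained from an unstandardized one by rescaling with the sample standard deviations of the regressor and the dependent variable:

```python
import numpy as np

def standardized_coef(beta_unstd, x, y):
    """Rescale an unstandardized OLS coefficient into a standardized one:
    beta_std = beta_unstd * sd(x) / sd(y), using sample SDs (ddof=1)."""
    return beta_unstd * np.std(x, ddof=1) / np.std(y, ddof=1)
```

This rescaling is what makes the SVO coefficient directly comparable to the parenthood, income, and education coefficients despite their very different units.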

Most previous studies treat SVO as a categorical variable. So, for the sake of comparison and as a robustness check, Table 8 reports the results of ordinary least-squares regression with a categorical SVO variable as an independent variable. Using the four traditional SVO typologies, we find that the altruistic group in our sample contained one observation (0.5%) and the competitive group contained none (Table 6). For the sake of convenience, the respondent in the altruistic group was included in the prosocial group. Footnote 8 We use a dummy variable that takes the value 1 if an individual is prosocial and 0 if an individual is individualistic.

Looking at the results, we find that the coefficients of all variables except SVO, as well as the adjusted R-squared values, were almost the same as when the continuous SVO variable was used. Focusing on the unstandardized coefficients of SVO that were statistically significant, we see that they were 0.579 for total well-being, 0.566 for remembered well-being, 0.568 for eudaimonic well-being, 0.833 for hedonic well-being, and 0.725 for experienced well-being. That is, a prosocial individual's well-being score exceeded an individualistic individual's score by the magnitude of the coefficient. In the regressions for experienced well-being, the coefficient of the continuous SVO variable was not statistically significant, but the coefficient of the categorical SVO variable was. We also find that the standardized coefficient of the categorical SVO variable in each regression was almost identical to that of the continuous SVO variable. This suggests that when analyzing the relationship between social preferences and well-being, the choice between a continuous and a categorical SVO variable may not have a significant impact on the conclusions.

Conclusions and discussion

We measured prosociality by using SVO scores and examined the correlations between prosociality and various aspects of well-being. With simple correlation analysis, we observed weak but statistically significant correlations between SVO and total well-being, remembered well-being, and hedonic well-being. When we analyzed the correlations more rigorously with regression analysis, using the continuous SVO variable and controlling for the influence of other explanatory variables, SVO also had a statistically significant correlation with eudaimonic well-being, in addition to the correlations with the other well-being variables. This indicates that prosociality is correlated not only with momentary hedonic well-being but also with more enduring eudaimonic well-being. Looking at the effect sizes of SVO on each dimension of well-being, we saw that they were comparable to the effect sizes of parenthood, income, and education, which are important determinants of well-being.

In regression analyses using the categorical variable SVO, SVO also had a statistically significant correlation with experienced well-being. Given the fact that SVO is essentially a continuous variable that expresses how much one sacrifices one’s own gain for the gain of others and that the SVO categories were created somewhat artificially for the sake of convenience, the statistical significance of the correlation between the categorical variable SVO and experienced well-being may have arisen by chance. The relationship between SVO and experienced well-being needs to be further researched.

In neither the correlation analysis nor the regression analysis did SVO have a statistically significant correlation with general well-being or social well-being. This result suggests that prosocial and proself people do not differ in such aspects of happiness as satisfaction with life in general and satisfaction with society, but that prosocial people are happier with respect to realization of their potential (eudaimonic well-being) and momentary pleasure and pain (hedonic well-being). One interpretation is that proself people can obtain satisfaction with life in general and with society by increasing their own gain, but have less opportunity to realize their potential through helping others and less opportunity to feel pleasure by increasing the gains of others.

One limitation of this study is that it establishes only a correlation between prosociality and happiness, not a causal relationship. Although we used SVO as a measure of prosociality, SVO captures only one aspect of it, and there are other measures, such as social mindfulness (Van Doesum et al., 2021). It is unclear whether different measures would yield similar results. Moreover, this study is based on a survey of adults in the United States, and data from other countries are needed to establish external validity. We hope future studies will overcome these limitations by using the present study as a starting point.

This study provides a theoretical foundation for policies and laws that encourage individuals to have prosocial preferences. Various international agreements and laws, especially those pertaining to the environment, have encouraged individuals to take prosocial actions or to have prosocial preferences. Usually, the justification for such policies and laws is that if individuals act only in their own self-interest, the natural environment will be destroyed and the economy and society will eventually be unable to sustain themselves. But such policies and laws can also be justified on the basis of the fact that when individuals have prosocial preferences, this in itself fosters individual welfare, which in turn fosters social welfare as the sum of all individual welfare. Governments, firms, and individuals in various countries may not yet fully recognize the importance of this rationale.

This study also suggests the value of further research on SVO, subjective well-being, heterogeneous preferences, and their interrelationships. For example, the finding that social preferences are associated with subjective well-being on the same scale as determinants such as parenthood, income, and education highlights the importance of investigating how countries may foster prosociality. Although educational policies in many countries at least nominally regard the cultivation of prosociality as a priority, the true importance of prosociality, including the possibility that prosociality is a cause of happiness, has yet to be fully recognized.

Data availability

Owing to ongoing research and analysis, the supporting data are currently available only to bona fide researchers, under the condition of a signed nondisclosure agreement. For details regarding the data and information on how to request access, please contact the corresponding author.

Examples include the European Union’s “Directive 2008/99/EC on the protection of the environment through criminal law” and the United Kingdom’s “Climate change agreements.” For examples of nudging in the environmental arena, see Ghesla et al. ( 2019 ) and Wee et al. ( 2021 ). Generally, a nudge is any feature of a choice architecture that influences people’s behavior in a predictable way without limiting their options or radically changing their incentives (Thaler and Sunstein 2008 ).

Another question is whether laws and policies can change individual preferences; and, if they can, whether they can change them in a targeted way (Bowles and Polania-Reyes, 2012 ). This question is not addressed in the present paper.

These six items are called primary items. The Slider Measure also includes nine secondary items for analyzing prosocial motivations in further detail, but this paper does not use them.

One might suppose that our theoretical model differs from the SVO formulation because we considered whether individual well-being would increase only if the gains of others increased, holding other variables constant. However, there is no contradiction in using SVO to measure the prosociality parameter of the theoretical model; measuring how much one’s own gain can be reduced for the sake of the gain of others also means measuring how much one’s utility increases with an increase in the gain of others.

This method of collecting data has been used extensively to recruit participants for surveys and experimental studies in social sciences like psychology and economics. Researchers have confirmed that the data collected using MTurk are at least as reliable as data collected by other standard methods, such as by recruiting college students (see Buhrmester et al., 2011 ).

The study was conducted in accordance with the ethical standards of the 1964 Declaration of Helsinki and the ethical standards of the author’s institution. The author obtained informed consent from all participants.

Note that although the correlations are high for each of the well-being variables, the regression analysis uses each variable as a dependent variable. So the problem of multicollinearity does not arise.

Excluding the respondent in the altruistic group from our sample does not affect our conclusions.

Aknin LB, Dunn EW, Norton MI (2012) Happiness runs in a circular motion: evidence for a positive feedback loop between prosocial spending and happiness. J Happiness Stud 13(2):347–355. https://doi.org/10.1007/s10902-011-9267-5

Aknin LB, Whillans AV (2021) Helping and happiness: a review and guide for public policy. Soc Issues Policy Rev 15(1):3–34. https://doi.org/10.1111/sipr.12069

Boenigk S, Mayr ML (2016) The happiness of giving: evidence from the German socioeconomic panel that happier people are more generous. J Happiness Stud 17(5):1825–1846. https://doi.org/10.1007/s10902-015-9672-2

Böhm R, Fleiß J, Rybnicek R (2021) On the stability of social preferences in inter-group conflict: a lab-in-the-field panel study. J Confl Resolution 65(6):1215–1248 https://doi.org/10.1177/0022002721994080

Bowles S, Polania-Reyes S (2012) Economic incentives and social preferences: substitutes or complements. J Econ Lit 50(2):368–425. https://doi.org/10.1257/jel.50.2.368

Boyce CJ, Brown GDA, Moore SC (2010) Money and happiness: rank of income, not income, affects life satisfaction. Psychol Sci 21(4):471–475. https://doi.org/10.1177/0956797610362671

Buhrmester M, Kwang T, Gosling SD (2011) Amazon’s Mechanical Turk: a new source of inexpensive, yet high-quality, data. Perspect Psychol Sci 6(1):3–5. https://doi.org/10.1177/1745691610393980

Carlson M, Charlin V, Miller N (1988) Positive mood and helping behavior: a test of six hypotheses. J Pers Soc Psychol 55:211–229. https://doi.org/10.1037/0022-3514.55.2.211

Charness G, Rabin M (2002) Understanding social preferences with simple tests. Q J Econ 117(3):817–869. https://doi.org/10.1162/003355302760193904

Cuñado J, de Gracia FP (2012) Does education affect happiness? Evidence for Spain. Soc Indic Res 108(1):185–196. https://doi.org/10.1007/s11205-011-9874-x

D’Attoma JW, Volintiru C, Malézieux A (2020) Gender, social value orientation, and tax compliance. CESifo Econ Stud 66(3):265–284. https://doi.org/10.1093/cesifo/ifz016

De Cremer D, Van Lange PAM (2001) Why prosocials exhibit greater cooperation than proselfs: the roles of social responsibility and reciprocity. Eur J Personal 15(S1):S5–S18. https://doi.org/10.1002/per.418

Decancq K, Fleurbaey M, Schokkaert E (2017) Wellbeing inequality and preference heterogeneity. Economica 84(334):210–238. https://doi.org/10.1111/ecca.12231

Diener E, Emmons RA, Larsen RJ, Griffin S (1985) The satisfaction with life scale. J Pers Assess 49(1):71–75. https://doi.org/10.1207/s15327752jpa4901_13

Dixit A, Levin S (2017) Social creation of pro-social preferences for collective action. In: Buchholz W, Rübbelke D (eds) The theory of externalities and public goods: essays in memory of Richard C. Cornes. Springer International Publishing, pp. 127–143

Falk A, Graeber T (2020) Delayed negative effects of prosocial spending on happiness. Proc Natl Acad Sci USA 117(12):6463–6468. https://doi.org/10.1073/pnas.1914324117

Fehr E, Schmidt KM (1999) A theory of fairness, competition, and cooperation. Q J Econ 114(3):817–868. https://doi.org/10.1162/003355399556151

FitzRoy FR, Nolan MA (2022) Income status and life satisfaction. J Happiness Stud 23(1):233–256. https://doi.org/10.1007/s10902-021-00397-y

Ghesla C, Grieder M, Schmitz J (2019) Nudge for good? Choice defaults and spillover effects. Front Psychol 10. https://www.frontiersin.org/article/10.3389/fpsyg.2019.00178

Grosch K, Rau HA (2017) Gender differences in honesty: the role of social value orientation. J Econ Psychol 62:258–267. https://doi.org/10.1016/j.joep.2017.07.008

Hervás G, Vázquez C (2013) Construction and validation of a measure of integrative well-being in seven languages: the Pemberton Happiness Index. Health Qual Life Outcomes 11(1):66. https://doi.org/10.1186/1477-7525-11-66

Hui BPH (2022) Prosocial behavior and well-being: shifting from the ‘chicken and egg’ to positive feedback loop. Curr Opin Psychol 44:231–236. https://doi.org/10.1016/j.copsyc.2021.09.017

Kahneman D, Riis J (2005) Living, and thinking about it: two perspectives on life. In: Huppert FA, Baylis N, Keverne B (eds) The science of well-being, Oxford University Press, pp. 284–305

Keyes CLM (1998) Social well-being. Soc Psychol Q 61(2):121–140. https://doi.org/10.2307/2787065

Kushlev K, Radosic N, Diener E (2022) Subjective well-being and prosociality around the globe: happy people give more of their time and money to others. Soc Psychol Personal Sci 13(4):849–861. https://doi.org/10.1177/19485506211043379

Lawton RN, Gramatki I, Watt W, Fujiwara D (2021) Does volunteering make us happier, or are happier people more likely to volunteer? Addressing the problem of reverse causality when estimating the wellbeing impacts of volunteering. J Happiness Stud 22(2):599–624. https://doi.org/10.1007/s10902-020-00242-8

Levitt SD, List JA (2007) What do laboratory experiments measuring social preferences reveal about the real world. J Econ Perspect 21(2):153–174. https://doi.org/10.1257/jep.21.2.153

Mattauch L, Hepburn C, Spuler F, Stern N (2022) The economics of climate change with endogenous preferences. Resour Energy Econ 69:101312. https://doi.org/10.1016/j.reseneeco.2022.101312

McClintock CG, Allison ST (1989) Social value orientation and helping behavior. J Appl Soc Psychol 19(4):353–362. https://doi.org/10.1111/j.1559-1816.1989.tb00060.x

Meier S, Stutzer A (2008) Is volunteering rewarding in itself. Economica 75(297):39–59. https://doi.org/10.1111/j.1468-0335.2007.00597.x

Messick DM, McClintock CG (1968) Motivational bases of choice in experimental games. J Exp Soc Psychol 4(1):1–25. https://doi.org/10.1016/0022-1031(68)90046-2

Murphy RO, Ackermann KA (2014) Social value orientation: theoretical and measurement issues in the study of social preferences. Personal Soc Psychol Rev 18(1):13–41. https://doi.org/10.1177/1088868313501745

Murphy RO, Ackermann KA, Handgraaf MJJ (2011) Measuring social value orientation. Judgm Decision Mak 6(8):771–781. http://journal.sjdm.org/11/m25/m25.html

Napier JL, Jost JT (2008) Why are conservatives happier than liberals. Psychol Sci 19(6):565–572. https://doi.org/10.1111/j.1467-9280.2008.02124.x

Nikolaev B (2018) Does higher education increase hedonic and eudaimonic happiness. J Happiness Stud 19(2):483–504. https://doi.org/10.1007/s10902-016-9833-y

Oliver A (2017) Distinguishing between experienced utility and remembered utility. Public Health Eth 10(2):122–128. https://doi.org/10.1093/phe/phw014

Onraet E, Van Assche J, Roets A, Haesevoets T, Van Hiel A (2017) The happiness gap between conservatives and liberals depends on country-level threat: a worldwide multilevel study. Soc Psychol Personal Sci 8(1):11–19. https://doi.org/10.1177/1948550616662125

Pollmann-Schult M (2014) Parenthood and life satisfaction: why don’t children make people happy? J Marriage Fam 76(2):319–336. https://doi.org/10.1111/jomf.12095

Radó MK (2020) Tracking the effects of parenthood on subjective well-being: evidence from Hungary. J Happiness Stud 21(6):2069–2094. https://doi.org/10.1007/s10902-019-00166-y

Rinner MT, Haller E, Meyer AH, Gloster AT (2022) Is giving receiving? The influence of autonomy on the association between prosocial behavior and well-being. J Context Behav Sci 24:120–125. https://doi.org/10.1016/j.jcbs.2022.03.011

Ryan RM, Deci EL (2001) On happiness and human potentials: a review of research on hedonic and eudaimonic well-being. Annu Rev Psychol 52(1):141–166. https://doi.org/10.1146/annurev.psych.52.1.141

Ryan RM, Frederick C (1997) On energy, personality, and health: subjective vitality as a dynamic reflection of well-being. J Pers 65(3):529–565. https://doi.org/10.1111/j.1467-6494.1997.tb00326.x

Ryff CD (1989) Happiness is everything, or is it? Explorations on the meaning of psychological well-being. J Pers Soc Psychol 57(6):1069. https://doi.org/10.1037/0022-3514.57.6.1069

Samuelson PA (1938) A note on the pure theory of consumer’s behaviour. Economica 5(17):61–71. https://doi.org/10.2307/2548836

Sen A (1973) Behaviour and the concept of preference. Economica 40(159):241–259. https://doi.org/10.2307/2552796

Shahrier S, Kotani K, Kakinaka M (2017) Religiosity may not be a panacea: importance of prosociality to maintain humanitarian donations. Working Papers SDES-2017-23, Kochi University of Technology, School of Economics and Management. https://ideas.repec.org/p/kch/wpaper/sdes-2017-23.html

Song Y, Broekhuizen ML, Dubas JS (2020) Happy little benefactor: prosocial behaviors promote happiness in young children from two cultures. Front Psychol 11. https://www.frontiersin.org/articles/10.3389/fpsyg.2020.01398

Steger MF, Kashdan TB, Oishi S (2008) Being good by doing good: daily eudaimonic activity and well-being. J Res Pers 42(1):22–42. https://doi.org/10.1016/j.jrp.2007.03.004

Thaler RH, Sunstein CR (2008) Nudge: improving decisions about health, wealth, and happiness. Penguin, London

Tilman AR, Dixit AK, Levin A (2019) Localized prosocial preferences, public goods, and common-pool resources. Proc Natl Acad Sci USA 116(12):5305–5310. https://doi.org/10.1073/pnas.1802872115

U.S. Census Bureau (2017) Was median household income in 2016 the highest median household income ever reported from the Current Population Survey Annual Social and Economic Supplement? https://www.census.gov/newsroom/blogs/random-samplings/2017/09/was_median_household.html. Accessed 18 May 2023

Van Doesum NJ, Murphy RO, Gallucci M, Aharonov-Majar E, Athenstaedt U, Au WT, Bai L, Böhm R, Bovina I, Buchan NR, Chen XP, Dumont KB, Engelmann JB, Eriksson K, Euh H, Fiedler S, Friesen J, Gächter S, Garcia C, … Lange PAMV (2021) Social mindfulness and prosociality vary across the globe. Proc Natl Acad Sci USA 118(35). https://doi.org/10.1073/pnas.2023846118

Van Lange PAM, Bekkers R, Schuyt TNM, Vugt MV (2007) From games to giving: social value orientation predicts donations to noble causes. Basic Appl Soc Psychol 29(4):375–384. https://doi.org/10.1080/01973530701665223

Van Lange PAM, Schippers M, Balliet D (2011) Who volunteers in psychology experiments? An empirical review of prosocial motivation in volunteering. Personal Individ Differ 51(3):279–284. https://doi.org/10.1016/j.paid.2010.05.038

Van Lange PAM, Semin-Goossens A (1998) The boundaries of reciprocal cooperation. Eur J Soc Psychol 28(5):847–854. https://doi.org/10.1002/(SICI)1099-0992(199809/10)28:5<847::AID-EJSP886>3.0.CO;2-L

Von Neumann J, Morgenstern O (1944) Theory of games and economic behavior. Princeton University Press

Watson D, Clark LA, Tellegen A (1988) Development and validation of brief measures of positive and negative affect: the PANAS scales. J Personal Soc Psychol 54(6):1063–1070. https://doi.org/10.1037//0022-3514.54.6.1063

Wee SC, Choong WW, Low ST (2021) Can “nudging” play a role to promote pro-environmental behaviour? Environ Challenges 5:100364. https://doi.org/10.1016/j.envc.2021.100364

Ziegler A (2020) Heterogeneous preferences and the individual change to alternative electricity contracts. Energy Econ 91:104889. https://doi.org/10.1016/j.eneco.2020.104889

Acknowledgements

The author thanks Rebecca Hollander-Blumoff, Robert J. MacCoun, Shozo Ota, Mitchell Polinsky, and the participants of a seminar held at Harvard University in 2020, the Law and Psychology Seminar at Stanford University, and the 2022 Behavioral Law and Economics Workshop. This work was supported by the New Faculty Startup Fund from Seoul National University.

Author information

Authors and affiliations

Seoul National University School of Law, Seoul, Republic of Korea

Masaki Iwasaki

Corresponding author

Correspondence to Masaki Iwasaki.

Ethics declarations

Competing interests

The author declares no competing interests.

Ethical approval

This study was conducted in compliance with the ethical standards of the 1964 Declaration of Helsinki and the ethical standards of Stanford University, and the author confirmed with the Stanford University IRB that he was allowed to publish this study.

Informed consent

Each participant in this study voluntarily gave their informed consent after being thoroughly briefed on the nature of the study, the procedures to be followed, their rights as a participant, and any potential risks. This ensured their understanding and willingness to participate.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article

Iwasaki, M. Social preferences and well-being: theory and evidence. Humanit Soc Sci Commun 10, 342 (2023). https://doi.org/10.1057/s41599-023-01782-z

Received: 10 September 2022

Accepted: 19 May 2023

Published: 20 June 2023

DOI: https://doi.org/10.1057/s41599-023-01782-z

Weak Forms and Strong Forms

Adam Becker, author and astrophysicist

For Cameron Neylon, because he kept asking me for this…

The Sapir-Whorf hypothesis[1] states that language affects thought — how we speak influences how we think. Or, at least, that’s one form of the hypothesis, the weak form. The strong form of Sapir-Whorf says that language determines thought, that how we speak forms a hard boundary on how and what we think. The weak form of Sapir-Whorf says that we drive an ATV across the terrain of thought; language can smooth the path in some areas and create rocks and roadblocks in others, but it doesn’t fundamentally limit where we can go. The strong form, in contrast, says we drive a steam train of thought, and language lays down the rails. There’s an intricate maze of forks and switchbacks spanning the continent, but at the end of the day we can only go where the rails will take us — we can’t lay down new track, no matter how we might try.

Most linguists today accept that some form of the weak Sapir-Whorf hypothesis must be true: the language(s) we speak definitely affect how we think and act. But most linguists also accept that the strong Sapir-Whorf hypothesis can’t be true, just as a matter of empirical fact. New words are developed, new concepts formed, new trails blazed on the terrain of thought. Some tasks may be easier or harder depending on whether your language is particularly suited for them — though even this is in dispute. But it’s simply not the case that we can’t think about things if we don’t have the words for them, nor that language actually determines our thought. In short, while the weak form of Sapir-Whorf is probably correct, the strong form is wrong. And this makes some sense: it certainly seems like language affects our thoughts, but it doesn’t seem like language wholly determines our thoughts.

But the Sapir-Whorf hypothesis isn’t the only theory with strong and weak forms — in fact, there’s a whole pattern of theories like this, and associated rhetorical dangers that go along with them. The pattern looks like this:

  1. Start with a general theoretical statement about the world, where…
  2. …there are two forms, a weak form and a strong form, and…
  3. …the weak form is obviously true — how could it not be? — and…
  4. …the strong form is obviously false, or at least much more controversial. Then, the rhetorical danger rears its head, and…
  5. …arguments for the (true) weak form are appropriated, unmodified or nearly so, as arguments for the strong form by the proponents of the latter. (You also sometimes see this in reverse: people who are eager to deny the strong form rejecting valid arguments for the weak form.)
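
The pattern can be put in simple logical notation (my sketch, not the post’s): writing $W$ for the weak form and $S$ for the strong form,

```latex
% S is the strictly stronger claim: it entails W, but not conversely.
S \Rightarrow W, \qquad W \not\Rightarrow S
% The rhetorical move in (5) is the invalid inference:
%   (evidence supports W)  therefore  (evidence supports S)
% which is sound only if W => S, which in general fails.
```

So any body of evidence establishing $W$ is, by itself, fully consistent with $S$ being false; that is why arguments for the weak form cannot be carried over to the strong form without extra work.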

I don’t know why (5) happens, but I suspect (with little to no proof) that this confusion stems from rejection of a naive view of the world. Say you start with a cartoonishly simple picture of some phenomenon — for example, say you believe that thought isn’t affected by language in any way at all. Then you hear (good!) arguments for the weak form of the Sapir-Whorf hypothesis, which shows this cartoon picture is too simple to capture reality. With your anchor line to your old idea cut, you veer to the strong form of Sapir-Whorf. Then, later, when arguing for your new view, you use the same arguments that convinced you your old naive idea was false — namely, arguments for the weak form. (This also suggests that when (5) happens in reverse, this is founded in the same basic confusion: people defend themselves from the strong form by attacking the weak form because they would feel unmoored from their (naive) views if the weak form were true.) But why this happens is all speculation on my part. All I know for sure is that it does happen.

Cultural relativism about scientific truth is another good example. The two forms look something like this:

Weak form: Human factors like culture, history, and economics influence the practice of science, and thereby the content of our scientific theories.

Strong form: Human factors like culture, history, and economics wholly determine the content of our scientific theories.

It’s hard to see how the weak form could be wrong. Science is a human activity, and like any human activity, it’s affected by culture, economics, history, and other human factors. But the strong form claims that science is totally disconnected from anything like a “real world,” is simply manufactured by a variety of cultural and social forces, and has no special claim to truth. This is just not true. In her excellent book Brain Storm — itself about how the weak form of this thesis has played out in the spurious science of innate gender differences in the development of the human brain — Rebecca Jordan-Young forcefully rejects the strong form of relativism about science, and addresses both directions of the rhetorical confusion that arises from confounding the weak form with the strong:

The fact that science is not, and can never be, a simple mirror of the world also does not imply that science is simply “made up” and is not constrained by material phenomena that actually exist—the material world “pushes back” and exerts its own effects in science, even if we accept the postmodern premise that we humans have no hope of a direct access to that world that is unmediated by our own practices and culturally determined cognitive and linguistic structures. There is no need to dogmatically insist (against all evidence) that science really is objective in order to believe in science as a good and worthwhile endeavor, and even to believe in science as a particularly useful and trustworthy way of learning about the world.[2]

Successful scientific theories, in general, must bear some resemblance to the world at large. Indeed, the success of scientific theories in predicting phenomena in the world would be nothing short of a miracle if there were absolutely no resemblance between the content of those theories and the content of the world.[3] That’s not to say that our theories are perfect representations of the world, nor that they are totally unaffected by cultural and political factors: far from it. I’m writing a book right now that’s (partly) about the cultural and historical factors influencing the debate on the foundations of quantum physics. But the content of our scientific theories is certainly not solely determined by human factors. Science is our best attempt to learn about the nature of the world. It’s not perfect. That’s OK.

There are many people, working largely in Continental philosophy and critical theory of various stripes, who advocate the strong form of relativism about science.[4] Yet most of the arguments ostensibly in favor of this strong form are actually arguments for the weak form: that culture plays some role in determining the content of our best scientific theories.[5] And that’s simply not the same thing.

Another, much more popular example of a strong and weak form problem is the set of claims around the “power of positive thinking.” The weak form suggests that being more confident and positive can make you happier, healthier, and more successful. This is usually true, and it’s hard to see how it couldn’t be usually true — though there are many specific counterexamples. For example, positive thinking can’t keep your house from being destroyed by a hurricane. Yet the strong form of positive-thinking claims — known as “the law of attraction,” and popularized by The Secret — suggests exactly that. This states that positive thinking, and positive thinking alone, can literally change the world around you for the better, preventing and reversing all bad luck and hardship.[6] Not only is this manifestly untrue, but the logical implications are morally repugnant: if bad things do happen to you, it must be a result of not thinking positively enough. For example, if you have cancer, and it’s resistant to treatment, that must be your fault. While this kind of neo-Calvinist victim-blaming is bad enough, it becomes truly monstrous — and the flaw in the reasoning particularly apparent — when extended from unfortunate individual circumstances to systematically disadvantaged groups. The ultimate responsibility for slavery, colonialism, genocide, and institutionalized bigotry quite obviously does not lie with the victims’ purported inability to wish hard enough for a better world.

In short, easily-confused strong and weak forms of a theory abound. I’m not claiming that this is anything like an original idea. All I’m saying is that some theories come in strong and weak forms, that sometimes the weak forms are obviously true and the strong obviously false, and that in those cases, it’s easy to take rhetorical advantage (deliberately or not) of this confusion. You could argue that the weak form directly implies the strong form in some cases, and maybe it does. But that’s not generally true, and you have to do a lot of work to make that argument — work that often isn’t done.

Again, I strongly suspect other people have come up with this idea. When I’ve talked with people about this, they’ve generally picked it up very quickly and come up with examples I didn’t think of. This seems to be floating around. If someone has a good citation for it, I’d be immensely grateful.

Image credit: Zink Dawg at English Wikipedia, CC-BY 3.0. I was strongly tempted to use this image instead.

  1. This is apparently a historical misnomer, but we’ll ignore that for now.
  2. Rebecca M. Jordan-Young, Brain Storm: The Flaws in the Science of Sex Differences, Harvard University Press, 2011, pp. 299-300. Emphasis in the original.
  3. See J.J.C. Smart, Philosophy and Scientific Realism, and Hilary Putnam, Mathematics, Matter, and Method.
  4. Bruno Latour is the first name that comes to mind.
  5. See, for example, Kuhn, who even seems to have confused himself about whether he was advocating the strong or the weak version.
  6. The “arguments” in favor of this kind of nonsense take advantage of more than just the confusion between the strong and weak forms of the thesis about positive thinking. They also rely on profound misunderstandings about quantum physics and other perversions of science. But let’s put that aside for now.

One thought on “Weak Forms and Strong Forms”

There’s Occam’s Rusty Razor at work. Weak versions of theories necessitate lots of conditionals. Simpler just to eschew all conditionals. But simplicity itself is a virtue only with lots of subtlety and conditionality. Rusty razors butcher. Eschew Occam’s Rusty Razor.
