greater than (>) less than (<)
H 0 always has a symbol with an equal in it. H a never has a symbol with an equal in it. The choice of symbol depends on the wording of the hypothesis test. However, be aware that many researchers (including one of the co-authors in research work) use = in the null hypothesis, even with > or < as the symbol in the alternative hypothesis. This practice is acceptable because we only make the decision to reject or not reject the null hypothesis.
H 0 : No more than 30% of the registered voters in Santa Clara County voted in the primary election. p ≤ 0.30
H a : More than 30% of the registered voters in Santa Clara County voted in the primary election. p > 0.30
A medical trial is conducted to test whether or not a new medicine reduces cholesterol by 25%. State the null and alternative hypotheses.
H 0 : The drug reduces cholesterol by 25%. p = 0.25
H a : The drug does not reduce cholesterol by 25%. p ≠ 0.25
We want to test whether the mean GPA of students in American colleges is different from 2.0 (out of 4.0). The null and alternative hypotheses are:
H 0 : μ = 2.0
H a : μ ≠ 2.0
We want to test whether the mean height of eighth graders is 66 inches. State the null and alternative hypotheses. Fill in the correct symbol (=, ≠, ≥, <, ≤, >) for the null and alternative hypotheses. H 0 : μ __ 66 H a : μ __ 66
We want to test if college students take less than five years to graduate from college, on the average. The null and alternative hypotheses are:
H 0 : μ ≥ 5
H a : μ < 5
We want to test if it takes fewer than 45 minutes to teach a lesson plan. State the null and alternative hypotheses. Fill in the correct symbol ( =, ≠, ≥, <, ≤, >) for the null and alternative hypotheses. H 0 : μ __ 45 H a : μ __ 45
In an issue of U.S. News and World Report , an article on school standards stated that about half of all students in France, Germany, and Israel take advanced placement exams and a third pass. The same article stated that 6.6% of U.S. students take advanced placement exams and 4.4% pass. Test if the percentage of U.S. students who take advanced placement exams is more than 6.6%. State the null and alternative hypotheses.
H 0 : p ≤ 0.066
H a : p > 0.066
On a state driver’s test, about 40% pass the test on the first try. We want to test if more than 40% pass on the first try. Fill in the correct symbol (=, ≠, ≥, <, ≤, >) for the null and alternative hypotheses. H 0 : p __ 0.40 H a : p __ 0.40
In a hypothesis test, sample data are evaluated in order to arrive at a decision about some type of claim. If certain conditions about the sample are satisfied, then the claim can be evaluated for a population. In a hypothesis test, we:
Evaluate the null hypothesis, typically denoted with H 0 . The null is not rejected unless the hypothesis test shows otherwise. The null statement must always contain some form of equality (=, ≤ or ≥).
Always write the alternative hypothesis, typically denoted with H a or H 1 , using less than, greater than, or not equals symbols, i.e., (≠, >, or <).
If we reject the null hypothesis, then we can assume there is enough evidence to support the alternative hypothesis.
Never state that a claim is proven true or false. Keep in mind the underlying fact that hypothesis testing is based on probability laws; therefore, we can talk only in terms of non-absolute certainties.
H 0 and H a are contradictory.
Chris Drew (PhD)
Dr. Chris Drew is the founder of the Helpful Professor. He holds a PhD in education and has published over 20 articles in scholarly journals. He is the former editor of the Journal of Learning Development in Higher Education.
A null hypothesis is a general assertion or default position that there is no relationship or effect between two measured phenomena.
It’s a critical part of statistics, data analysis, and the scientific method . This concept forms the basis of testing statistical significance and allows researchers to be objective in their conclusions.
A null hypothesis helps to reduce bias by making researchers check whether the observed results could be due to chance. The rejection or failure to reject the null hypothesis helps in guiding the course of research.
The null hypothesis, often denoted as H 0 , is the hypothesis in a statistical test which proposes that no real effect or relationship exists in a set of observed data.
It hypothesizes that any kind of difference or importance you see in a data set is due to chance.
Null hypotheses are typically proposed to be negated or disproved by statistical tests, paving the way for the acceptance of an alternative hypothesis.
Importantly, a null hypothesis cannot be proven true; it can only be rejected or not rejected at a chosen level of confidence.
Should evidence – via statistical analysis – contradict the null hypothesis, it is rejected in favor of an alternative hypothesis. In essence, the null hypothesis is a tool to challenge and disprove that there is no effect or relationship between variables.
I like to show this video to my students, which outlines a null hypothesis really clearly and engagingly, using real-life studies by research students! The intro explains it really well:
“There’s an idea in science called the null hypothesis and it works like this: when you’re setting out to prove a theory, your default answer should be “it’s not going to work” and you have to convince the world otherwise through clear results”
An alternative hypothesis is the direct contrast to the null hypothesis. It posits that there is a statistically significant relationship or effect between the variables being observed.
If the null hypothesis is rejected based on the test data, the alternative hypothesis is accepted.
Importantly, while the null hypothesis is typically a statement of ‘no effect’ or ‘no difference,’ the alternative hypothesis states that there is an effect or difference.
Comprehension Checkpoint: How does the null hypothesis help to ensure that research is objective and unbiased?
Null Hypothesis | Alternative Hypothesis |
---|---|
A statement of no effect or no relationship | A statement that suggests there is an effect or relationship |
H 0 | H 1 or H a |
The average time to recover using Drug A is the same as with Drug B | The average time to recover using Drug A is less than with Drug B |
No statistical significance between observed data | Statistical significance exists between observed data |
The observed result is due to chance | The observed result is due to the effect or relationship |
The null hypothesis plays a critical role in numerous research settings, promoting objectivity and ensuring findings aren’t due to random chance.
In all these areas, the null hypothesis helps minimize bias, enabling researchers to support their findings with statistically significant data. It forms the backbone of many scientific research methodologies , promoting a disciplined approach to uncovering new knowledge.
The null hypothesis is a cornerstone of statistical analysis and empirical research. It serves as a starting point for investigations, providing a baseline premise that the observed effects are due to chance. By understanding and applying the concept of the null hypothesis, researchers can test the validity of their assumptions, making their findings more robust and reliable. In essence, the null hypothesis ensures that the scientific exploration remains objective, systematic, and free from unintended bias.
Last Updated: January 17, 2024 Fact Checked
This article was co-authored by Joseph Quinones and by wikiHow staff writer, Jennifer Mueller, JD. Joseph Quinones is a High School Physics Teacher working at South Bronx Community Charter High School. Joseph specializes in astronomy and astrophysics and is interested in science education and science outreach, currently practicing ways to make physics accessible to more students with the goal of bringing more students of color into the STEM fields. He has experience working on Astrophysics research projects at the Museum of Natural History (AMNH). Joseph received his Bachelor's degree in Physics from Lehman College and his Masters in Physics Education from City College of New York (CCNY). He is also a member of a network called New York City Men Teach. There are 7 references cited in this article, which can be found at the bottom of the page. This article has been fact-checked, ensuring the accuracy of any cited facts and confirming the authority of its sources. This article has been viewed 29,054 times.
Are you working on a research project and struggling with how to write a null hypothesis? Well, you've come to the right place! Start by recognizing that the basic definition of "null" is "none" or "zero"—that's your biggest clue as to what a null hypothesis should say. Keep reading to learn everything you need to know about the null hypothesis, including how it relates to your research question and your alternative hypothesis as well as how to use it in different types of studies.
5.2 - Writing Hypotheses
The first step in conducting a hypothesis test is to write the hypothesis statements that are going to be tested. For each test you will have a null hypothesis (\(H_0\)) and an alternative hypothesis (\(H_a\)).
When writing hypotheses there are three things that we need to know: (1) the parameter that we are testing (2) the direction of the test (non-directional, right-tailed or left-tailed), and (3) the value of the hypothesized parameter.
Hypotheses are always written in terms of population parameters (e.g., \(p\) and \(\mu\)). The tables below display all of the possible hypotheses for the parameters that we have learned thus far. Note that the null hypothesis always includes the equality (i.e., =).
Research Question | Is the population mean different from \( \mu_{0} \)? | Is the population mean greater than \(\mu_{0}\)? | Is the population mean less than \(\mu_{0}\)? |
---|---|---|---|
Null Hypothesis, \(H_{0}\) | \(\mu=\mu_{0} \) | \(\mu=\mu_{0} \) | \(\mu=\mu_{0} \) |
Alternative Hypothesis, \(H_{a}\) | \(\mu\neq \mu_{0} \) | \(\mu> \mu_{0} \) | \(\mu<\mu_{0} \) |
Type of Hypothesis Test | Two-tailed, non-directional | Right-tailed, directional | Left-tailed, directional |
Research Question | Is there a difference in the population? | Is there a mean increase in the population? | Is there a mean decrease in the population? |
---|---|---|---|
Null Hypothesis, \(H_{0}\) | \(\mu_d=0 \) | \(\mu_d =0 \) | \(\mu_d=0 \) |
Alternative Hypothesis, \(H_{a}\) | \(\mu_d \neq 0 \) | \(\mu_d> 0 \) | \(\mu_d<0 \) |
Type of Hypothesis Test | Two-tailed, non-directional | Right-tailed, directional | Left-tailed, directional |
Research Question | Is the population proportion different from \(p_0\)? | Is the population proportion greater than \(p_0\)? | Is the population proportion less than \(p_0\)? |
---|---|---|---|
Null Hypothesis, \(H_{0}\) | \(p=p_0\) | \(p= p_0\) | \(p= p_0\) |
Alternative Hypothesis, \(H_{a}\) | \(p\neq p_0\) | \(p> p_0\) | \(p< p_0\) |
Type of Hypothesis Test | Two-tailed, non-directional | Right-tailed, directional | Left-tailed, directional |
Research Question | Are the population means different? | Is the population mean in group 1 greater than the population mean in group 2? | Is the population mean in group 1 less than the population mean in group 2? |
---|---|---|---|
Null Hypothesis, \(H_{0}\) | \(\mu_1=\mu_2\) | \(\mu_1 = \mu_2 \) | \(\mu_1 = \mu_2 \) |
Alternative Hypothesis, \(H_{a}\) | \(\mu_1 \ne \mu_2 \) | \(\mu_1 \gt \mu_2 \) | \(\mu_1 \lt \mu_2\) |
Type of Hypothesis Test | Two-tailed, non-directional | Right-tailed, directional | Left-tailed, directional |
Research Question | Are the population proportions different? | Is the population proportion in group 1 greater than the population proportion in group 2? | Is the population proportion in group 1 less than the population proportion in group 2? |
---|---|---|---|
Null Hypothesis, \(H_{0}\) | \(p_1 = p_2 \) | \(p_1 = p_2 \) | \(p_1 = p_2 \) |
Alternative Hypothesis, \(H_{a}\) | \(p_1 \ne p_2\) | \(p_1 \gt p_2 \) | \(p_1 \lt p_2\) |
Type of Hypothesis Test | Two-tailed, non-directional | Right-tailed, directional | Left-tailed, directional |
Research Question | Is the slope in the population different from 0? | Is the slope in the population positive? | Is the slope in the population negative? |
---|---|---|---|
Null Hypothesis, \(H_{0}\) | \(\beta =0\) | \(\beta= 0\) | \(\beta = 0\) |
Alternative Hypothesis, \(H_{a}\) | \(\beta\neq 0\) | \(\beta> 0\) | \(\beta< 0\) |
Type of Hypothesis Test | Two-tailed, non-directional | Right-tailed, directional | Left-tailed, directional |
Research Question | Is the correlation in the population different from 0? | Is the correlation in the population positive? | Is the correlation in the population negative? |
---|---|---|---|
Null Hypothesis, \(H_{0}\) | \(\rho=0\) | \(\rho= 0\) | \(\rho = 0\) |
Alternative Hypothesis, \(H_{a}\) | \(\rho \neq 0\) | \(\rho > 0\) | \(\rho< 0\) |
Type of Hypothesis Test | Two-tailed, non-directional | Right-tailed, directional | Left-tailed, directional |
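
The three columns of each table above differ only in which tail of the sampling distribution counts as evidence against \(H_0\). As a minimal sketch, assuming a test statistic that is approximately standard normal under \(H_0\) (an assumption; the tables themselves do not fix a distribution), the p-value for each type of test could be computed as follows. The function name `p_value_from_z` and the tail labels are hypothetical.

```python
from statistics import NormalDist


def p_value_from_z(z: float, tail: str) -> float:
    """Return the p-value for a z statistic under a standard normal null.

    tail is one of "two-tailed", "right-tailed", or "left-tailed"
    (hypothetical labels matching the table rows above).
    """
    cdf = NormalDist().cdf  # standard normal cumulative distribution
    if tail == "two-tailed":      # H_a uses "not equal"
        return 2 * (1 - cdf(abs(z)))
    if tail == "right-tailed":    # H_a uses ">"
        return 1 - cdf(z)
    if tail == "left-tailed":     # H_a uses "<"
        return cdf(z)
    raise ValueError(f"unknown tail: {tail!r}")
```

For the same positive z, the right-tailed p-value is half the two-tailed one, which is why the direction of the test should be fixed before looking at the data.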
Published on May 6, 2022 by Shona McCombes . Revised on November 20, 2023.
A hypothesis is a statement that can be tested by scientific research. If you want to test a relationship between two or more variables, you need to write hypotheses before you start your experiment or data collection .
Daily apple consumption leads to fewer doctor’s visits.
A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.
A hypothesis is not just a guess – it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations and statistical analysis of data).
Hypotheses propose a relationship between two or more types of variables .
If there are any control variables , extraneous variables , or confounding variables , be sure to jot those down as you go to minimize the chances that research bias will affect your results.
In this example, the independent variable is exposure to the sun – the assumed cause . The dependent variable is the level of happiness – the assumed effect .
Writing a hypothesis begins with a research question that you want to answer. The question should be focused, specific, and researchable within the constraints of your project.
Your initial answer to the question should be based on what is already known about the topic. Look for theories and previous studies to help you form educated assumptions about what your research will find.
At this stage, you might construct a conceptual framework to ensure that you’re embarking on a relevant topic . This can also help you identify which variables you will study and what you think the relationships are between them. Sometimes, you’ll have to operationalize more complex constructs.
Now you should have some idea of what you expect to find. Write your initial answer to the question in a clear, concise sentence.
You need to make sure your hypothesis is specific and testable. There are various ways of phrasing a hypothesis, but all the terms you use should have clear definitions, and the hypothesis should contain:
To identify the variables, you can write a simple prediction in if…then form. The first part of the sentence states the independent variable and the second part states the dependent variable.
In academic research, hypotheses are more commonly phrased in terms of correlations or effects, where you directly state the predicted relationship between variables.
If you are comparing two groups, the hypothesis can state what difference you expect to find between them.
If your research involves statistical hypothesis testing , you will also have to write a null hypothesis . The null hypothesis is the default position that there is no association between the variables. The null hypothesis is written as H 0 , while the alternative hypothesis is H 1 or H a .
Research question | Hypothesis | Null hypothesis |
---|---|---|
What are the health benefits of eating an apple a day? | Increasing apple consumption in over-60s will result in decreasing frequency of doctor’s visits. | Increasing apple consumption in over-60s will have no effect on frequency of doctor’s visits. |
Which airlines have the most delays? | Low-cost airlines are more likely to have delays than premium airlines. | Low-cost and premium airlines are equally likely to have delays. |
Can flexible work arrangements improve job satisfaction? | Employees who have flexible working hours will report greater job satisfaction than employees who work fixed hours. | There is no relationship between working hour flexibility and job satisfaction. |
How effective is high school sex education at reducing teen pregnancies? | Teenagers who received sex education lessons throughout high school will have lower rates of unplanned pregnancy than teenagers who did not receive any sex education. | High school sex education has no effect on teen pregnancy rates. |
What effect does daily use of social media have on the attention span of under-16s? | There is a negative correlation between time spent on social media and attention span in under-16s. | There is no relationship between social media use and attention span in under-16s. |
Null and alternative hypotheses are used in statistical hypothesis testing . The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.
Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.
McCombes, S. (2023, November 20). How to Write a Strong Hypothesis | Steps & Examples. Scribbr. Retrieved August 28, 2024, from https://www.scribbr.com/methodology/hypothesis/
Statistics is the branch of mathematics that deals with collecting and analysing numerical data from research and surveys. Before carrying out a survey or study, we have to define the hypotheses. Generally, there are two types of hypothesis. One is the null hypothesis, and the other is the alternative hypothesis.
In probability and statistics, the null hypothesis is a general statement or default position that nothing is happening. For example, there is no difference among groups or no association between two measured events. The hypothesis is assumed to be true until evidence is brought to light that contradicts it. Let us learn more here with definition, symbol, principle, types and example, in this article.
The null hypothesis is a hypothesis about a population parameter, stated so that its validity can be tested against the given experimental data. It is either rejected or not rejected based on the evidence in a sample drawn from the population. In other words, the null hypothesis is the hypothesis that the sample observations result purely from chance. It is the statement that the surveyors want to examine against the data. It is denoted by H 0 .
In statistics, the null hypothesis is usually denoted by letter H with subscript ‘0’ (zero), such that H 0 . It is pronounced as H-null or H-zero or H-nought. At the same time, the alternative hypothesis expresses the observations determined by the non-random cause. It is represented by H 1 or H a .
The principle followed in null hypothesis testing is to collect data from a random sample and determine how likely the observed data would be if the null hypothesis were true. If the data are consistent with what the null hypothesis predicts, the evidence against it is weak, and the researchers conclude that the data do not provide strong evidence against the null hypothesis, so it is not rejected. If the data would be very unlikely under the null hypothesis, the researchers reject it.
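Here, the hypothesis test formulas for a population proportion are given below for reference.
The formula for the null hypothesis is:
H 0 : p = p 0
The formula for the alternative hypothesis is one of:
H a : p > p 0 , H a : p < p 0 , or H a : p ≠ p 0
The formula for the test statistic is:
\( z = \dfrac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0)/n}} \)
Remember that p 0 is the value of the population proportion stated in the null hypothesis, p-hat (\(\hat{p}\)) is the sample proportion, and n is the sample size.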
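
As an illustration, the test statistic for a proportion can be computed in a few lines. This is a minimal sketch assuming the standard one-proportion z statistic, z = (p-hat − p 0) / sqrt(p 0 (1 − p 0) / n), where n is the sample size. The function name and the sample counts (92 successes out of 200) are hypothetical and chosen only for illustration.

```python
from math import sqrt


def one_proportion_z(successes: int, n: int, p0: float) -> float:
    """z statistic for H0: p = p0, using the null value p0 in the standard error."""
    p_hat = successes / n                 # sample proportion
    std_error = sqrt(p0 * (1 - p0) / n)   # standard error assuming H0 is true
    return (p_hat - p0) / std_error


# Hypothetical data: 92 successes in a sample of 200, tested against p0 = 0.40.
z = one_proportion_z(92, 200, 0.40)
print(round(z, 3))
```

The further z lies from zero, the less compatible the sample proportion is with the value stated in the null hypothesis.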
There are different types of hypothesis. They are:
Simple Hypothesis
It completely specifies the population distribution. In this method, the sampling distribution is the function of the sample size.
Composite Hypothesis
The composite hypothesis is one that does not completely specify the population distribution.
Exact Hypothesis
Exact hypothesis defines the exact value of the parameter. For example μ= 50
Inexact Hypothesis
This type of hypothesis does not define the exact value of the parameter. But it denotes a specific range or interval. For example 45< μ <60
Sometimes the null hypothesis is rejected. If it is rejected, the assumption of no effect behind the research could be invalid. Many researchers neglect this hypothesis because it is merely the opposite of the alternative hypothesis, but it is better practice to state the hypothesis and test it. The goal of researchers is not simply to reject the hypothesis, and a test always ends in one of two outcomes: rejecting the null hypothesis or failing to reject it.
The null hypothesis says there is no relationship between the measured outcome (the dependent variable) and the independent variable. We do not have to believe that the null hypothesis is true in order to test it. On the contrary, you will probably suspect that there is a connection between the variables (dependent and independent).
The null hypothesis is tested using the P-value approach. If the P-value is less than or equal to α, the null hypothesis is rejected in favour of the alternative hypothesis. If the P-value is greater than α, the null hypothesis is not rejected.
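
The decision rule above is simple enough to write down directly. A minimal sketch, where the function name `decide` and the default α of 0.05 are assumptions made for illustration:

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the P-value decision rule at significance level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    # Reject when the P-value is less than or equal to alpha.
    return "reject H0" if p_value <= alpha else "fail to reject H0"
```

Note that the second outcome is worded as failing to reject the null hypothesis, not as accepting it.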
Now, let us discuss the difference between the null hypothesis and the alternative hypothesis.
No. | Null Hypothesis | Alternative Hypothesis |
---|---|---|
1 | The null hypothesis is a statement that there exists no relation between two variables | The alternative hypothesis is a statement that there exists some relationship between two measured phenomena |
2 | Denoted by H 0 | Denoted by H 1 or H a |
3 | The observations of this hypothesis are the result of chance | The observations of this hypothesis are the result of real effect |
4 | The mathematical formulation of the null hypothesis uses an equal sign | The mathematical formulation of the alternative hypothesis uses an inequality sign such as greater than, less than, or not equal to |
Here, some of the examples of the null hypothesis are given below. Go through the below ones to understand the concept of the null hypothesis in a better way.
If the claim is that a medicine reduces the risk of cardiac stroke, then the null hypothesis should be “the medicine does not reduce the risk of cardiac stroke”. This can be tested by administering the drug to a certain group of people in a controlled way. If the study shows a significant change in those people, then the null hypothesis is rejected.
A few more examples are:
1). Is there a 100% chance of getting affected by dengue?
Ans: There could be a chance of getting affected by dengue, but not 100%.
2). Do teenagers use mobile phones more than grown-ups to access the internet?
Ans: Age has no effect on the use of mobile phones to access the internet.
3). Does eating an apple daily prevent fever?
Ans: Eating an apple daily does not guarantee freedom from fever, but it increases the immunity to fight against such diseases.
4). Are children better at mathematical calculations than grown-ups?
Ans: Age has no effect on mathematical skills.
In many common applications, the choice of the null hypothesis is not automated, but the testing and calculations may be automated. Also, the choice of the null hypothesis is completely based on previous experiences and inconsistent advice. The choice can be more complicated and based on the variety of applications and the diversity of the objectives.
The main limitation for the choice of the null hypothesis is that the hypothesis suggested by the data is based on the reasoning which proves nothing. It means that if some hypothesis provides a summary of the data set, then there would be no value in the testing of the hypothesis on the particular set of data.
What is meant by the null hypothesis?
In Statistics, a null hypothesis is a type of hypothesis which explains the population parameter whose purpose is to test the validity of the given experimental data.
Hypothesis testing is defined as a form of inferential statistics, which allows us to draw conclusions about the entire population based on a representative sample.
The null hypothesis is either rejected or not rejected on the basis of the given data. If the P-value is less than or equal to α, then the null hypothesis is rejected in favor of the alternative hypothesis, and if the P-value is greater than α, then the null hypothesis is not rejected.
The importance of the null hypothesis is that it provides an approximate description of the phenomena of the given data. It allows the investigators to directly test the relational statement in a research study.
If the result of the chi-square test is bigger than the critical value in the table, then the data does not fit the model, which represents the rejection of the null hypothesis.
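
To make the chi-square comparison concrete, here is a minimal sketch of a goodness-of-fit calculation. The coin-toss counts are hypothetical, and the critical value 3.841 is the standard table value for 1 degree of freedom at α = 0.05; both are assumptions added for illustration.

```python
def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 100 coin tosses, 60 heads and 40 tails, fair coin expected.
observed = [60, 40]
expected = [50, 50]

CRITICAL_VALUE = 3.841  # chi-square table value for df = 1, alpha = 0.05

statistic = chi_square_statistic(observed, expected)
reject_null = statistic > CRITICAL_VALUE
print(statistic, reject_null)
```

With these made-up counts the statistic is 4.0, which exceeds 3.841, so the null hypothesis of a fair coin would be rejected at the 5% level.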
A hypothesis that states that there is no relationship between two population parameters
The null hypothesis states that there is no relationship between two measured quantities, i.e., an independent variable and a dependent variable. If the sample appears to show a relationship between the two, the outcome could be due to an experimental or sampling error. However, if the null hypothesis is rejected, the evidence suggests there is a relationship in the measured phenomenon.
The null hypothesis is useful because it can be tested to conclude whether or not there is a relationship between two measured phenomena. It can inform the user whether the results obtained are due to chance or manipulating a phenomenon. Testing a hypothesis sets the stage for rejecting or accepting a hypothesis within a certain confidence level.
Two main approaches to statistical inference with a null hypothesis can be used: significance testing by Ronald Fisher and hypothesis testing by Jerzy Neyman and Egon Pearson. In Fisher’s significance testing approach, a null hypothesis is rejected if the measured data would be significantly unlikely to occur were the null hypothesis true. In that case, the null hypothesis is rejected and replaced with an alternative hypothesis.
If the observed outcome is consistent with the position held by the null hypothesis, the hypothesis is not rejected. In the hypothesis testing approach of Neyman and Pearson, on the other hand, the null hypothesis is compared with an alternative hypothesis to reach a conclusion about the observed data. The two hypotheses are differentiated based on observed data.
A null hypothesis is a provisional statement that requires testing to determine whether the observed data are consistent with it. For example, a null hypothesis statement can be “the rate of plant growth is not affected by sunlight.” It can be tested by measuring the growth of plants in the presence of sunlight and comparing this with the growth of plants in the absence of sunlight.
Rejecting the null hypothesis sets the stage for further experimentation to see whether a relationship between the two variables exists. Rejecting a null hypothesis does not necessarily mean that the experiment did not produce the required results, but it sets the stage for further experimentation.
To differentiate the null hypothesis from other forms of hypothesis, a null hypothesis is written as H 0 , while the alternate hypothesis is written as H A or H 1 . A significance test is used to assess the evidence against a null hypothesis and determine whether the observed data could be due to chance or manipulation of data.
Researchers test the hypothesis by examining a random sample of the plants being grown with or without sunlight. If the outcome demonstrates a statistically significant difference in growth, the null hypothesis is rejected.
The annual return of ABC Limited bonds is assumed to be 7.5%. To test whether this is plausible, we take the null hypothesis to be “the mean annual return for ABC Limited bonds is 7.5%.” To test the hypothesis, we first assume that the null hypothesis is true.
Any information that is against the stated null hypothesis is taken to be the alternative hypothesis for the purpose of testing the hypotheses. In such a case, the alternative hypothesis is “the mean annual return of ABC Limited bonds is not 7.5%.”
We take samples of the annual returns of the bond for the last five years to calculate the sample mean for the previous five years. The result is then compared to the assumed annual return average of 7.5% to test the null hypothesis.
If the average annual return for the five-year period is 7.5%, the sample gives no evidence against the null hypothesis, and it is not rejected. If the sample mean differed from 7.5% by a statistically significant amount, the null hypothesis would be rejected in favor of the alternative hypothesis.
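
The bond example can be sketched numerically. The snippet below assumes the conventional formulation in which the null hypothesis carries the equality (H 0 : mean return = 7.5%, H a : mean return ≠ 7.5%), uses five hypothetical annual returns invented purely for illustration, and compares a one-sample t statistic with 2.776, the standard two-tailed table value for 4 degrees of freedom at α = 0.05.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """t statistic for H0: population mean = mu0."""
    n = len(sample)
    std_error = stdev(sample) / sqrt(n)  # sample standard deviation / sqrt(n)
    return (mean(sample) - mu0) / std_error


# Hypothetical annual returns (in percent) for the last five years.
returns = [7.2, 7.8, 7.5, 7.4, 7.6]

T_CRITICAL = 2.776  # two-tailed t table value for df = 4, alpha = 0.05

t_stat = one_sample_t(returns, 7.5)
reject_null = abs(t_stat) > T_CRITICAL
print(reject_null)
```

With these made-up figures the sample mean equals the hypothesized 7.5%, so the t statistic is essentially zero and, under this formulation, the null hypothesis is not rejected.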
An alternative hypothesis is the inverse of a null hypothesis. An alternative hypothesis and a null hypothesis are mutually exclusive, which means that only one of the two hypotheses can be true.
A statistical significance exists between the two variables. If samples used to test the null hypothesis return false, it means that the alternate hypothesis is true, and there is statistical significance between the two variables.
Hypothesis testing is a statistical process of testing an assumption regarding a phenomenon or population parameter. It is a critical part of the scientific method, which is a systematic approach to assessing theories through observations and judging how well a given statement is supported by the evidence.
A good theory can make accurate predictions. For an analyst who makes predictions, hypothesis testing is a rigorous way of backing up his prediction with statistical analysis. It also helps determine sufficient statistical evidence that favors a certain hypothesis about the population parameter.
The null hypothesis, often denoted H 0 , is a foundational concept in statistical hypothesis testing. It represents the assumption that no significant difference, effect, or relationship exists between variables within a population. It serves as the baseline against which the truth or falsity of an idea is tested.
In this article, we will discuss the null hypothesis in detail, along with some solved examples and questions on the null hypothesis.
Null Hypothesis in statistical analysis suggests the absence of statistical significance within a specific set of observed data. Hypothesis testing, using sample data, evaluates the validity of this hypothesis. Commonly denoted as H 0 or simply “null,” it plays an important role in quantitative analysis, examining theories related to markets, investment strategies, or economies to determine their validity.
The null hypothesis represents a default position, typically one of no significant difference or effect, against which researchers compare their experimental results.
The null hypothesis is written as H 0 and symbolizes the absence of a measurable effect or difference in the variables under examination.
A simple example is the assertion that the mean score of a group equals a specified value, such as stating that the average IQ of a population is 100.
The null hypothesis is typically formulated as a concise statement that a population parameter equals a specific value or that an effect is absent. It provides a clear and testable prediction for comparison with the alternative hypothesis.
H 0 : μ 1 = μ 2
This asserts that there is no significant difference between the means of two populations or groups.
H 0 : p 1 − p 2 = 0
This suggests no significant difference in proportions between two populations or conditions.
H 0 : σ 1 ² = σ 2 ²
This states that there is no difference in variances between groups or populations.
H 0 : Variables are independent
This asserts that there’s no association or relationship between categorical variables.
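As a rough illustration of how one of these forms can be checked against data, the sketch below tests H 0 : p 1 − p 2 = 0 with a pooled two-proportion z-test. The function name and the counts are hypothetical, chosen only for illustration, and the normal approximation is assumed to be adequate for the sample sizes used.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(successes1, n1, successes2, n2):
    """Two-sided z-test of H0: p1 - p2 = 0 using the pooled proportion."""
    p1_hat = successes1 / n1
    p2_hat = successes2 / n2
    # Under H0 both groups share one proportion, so the counts are pooled.
    pooled = (successes1 + successes2) / (n1 + n2)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts, for illustration only.
z, p_value = two_proportion_z_test(45, 100, 30, 100)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

The same pattern (compute a statistic whose distribution is known under H 0 , then convert it to a p-value) carries over to the mean, variance, and independence forms, each with its own test statistic.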
Null hypotheses take several forms, including simple and composite ones, depending on the complexity of the research question. Understanding these types is essential for effective hypothesis testing.
The Equality Null Hypothesis, also known as the Simple Null Hypothesis, is a fundamental concept in statistical hypothesis testing that assumes no difference, effect or relationship between groups, conditions or populations being compared.
In some studies, the focus is on demonstrating that a new treatment or method is not worse than the standard or existing one by more than a prespecified margin. The non-inferiority null hypothesis then states that the new treatment is worse by at least that margin.
The concept of a superiority null hypothesis comes into play when a study aims to demonstrate that a new treatment, method, or intervention is significantly better than an existing or standard one. The null hypothesis then states that the new option is no better than the existing one.
In certain statistical tests, such as chi-square tests of independence, the null hypothesis assumes that the categorical variables are independent, that is, that there is no association between them.
In tests like ANOVA (Analysis of Variance), the null hypothesis suggests that there’s no difference in population means across different groups.
The principle of the null hypothesis is a fundamental concept in statistical hypothesis testing. It involves making an assumption about the population parameter or the absence of an effect or relationship between variables.
In essence, the null hypothesis (H 0 ) proposes that there is no significant difference, effect, or relationship between variables. It serves as a starting point or a default assumption that there is no real change, no effect or no difference between groups or conditions.
The null hypothesis is rejected when statistical evidence indicates a significant departure from the assumed baseline. Rejection implies that there is enough evidence to support the alternative hypothesis, indicating a meaningful effect or difference.
Identifying the null hypothesis involves defining the status quo, asserting no effect, and formulating a statement suitable for statistical analysis.
The null hypothesis is rejected when statistical tests indicate a significant departure from the expected outcome, leading to the consideration of alternative hypotheses.
In statistical hypothesis testing, researchers begin by stating the null hypothesis, often based on theoretical considerations or previous research. The null hypothesis is then tested against an alternative hypothesis (Ha), which represents the researcher’s claim or the hypothesis they seek to support.
The process of hypothesis testing involves collecting sample data and using statistical methods to assess the likelihood of observing the data if the null hypothesis were true. This assessment is typically done by calculating a test statistic, which measures the difference between the observed data and what would be expected under the null hypothesis.
In the realm of hypothesis testing, the null hypothesis (H 0 ) and alternative hypothesis (H₁ or Ha) play critical roles. The null hypothesis generally assumes no difference, effect, or relationship between variables, suggesting that any observed change or effect is due to random chance. Its counterpart, the alternative hypothesis, asserts the presence of a significant difference, effect, or relationship between variables, challenging the null hypothesis. These hypotheses are formulated based on the research question and guide statistical analyses.
The null hypothesis (H 0 ) serves as the baseline assumption in statistical testing, suggesting no significant effect, relationship, or difference within the data. It often proposes that any observed change or correlation is merely due to chance or random variation. Conversely, the alternative hypothesis (H 1 or Ha) contradicts the null hypothesis, positing the existence of a genuine effect, relationship or difference in the data. It represents the researcher’s intended focus, seeking to provide evidence against the null hypothesis and support for a specific outcome or theory. These hypotheses form the crux of hypothesis testing, guiding the assessment of data to draw conclusions about the population being studied.
Criteria | Null Hypothesis | Alternative Hypothesis |
---|---|---|
Definition | Assumes no effect or difference | Asserts a specific effect or difference |
Symbol | H 0 | H 1 (or Ha) |
Formulation | States equality or absence of parameter | States a specific value or relationship |
Testing Outcome | Rejected if evidence of a significant effect | Accepted if evidence supports the hypothesis |
Let’s envision a scenario where a researcher aims to examine the impact of a new medication on reducing blood pressure among patients. In this context:
Null Hypothesis (H 0 ): “The new medication does not produce a significant effect in reducing blood pressure levels among patients.”
Alternative Hypothesis (H 1 or Ha): “The new medication yields a significant effect in reducing blood pressure levels among patients.”
The null hypothesis implies that any observed alterations in blood pressure subsequent to the medication’s administration are a result of random fluctuations rather than a consequence of the medication itself. Conversely, the alternative hypothesis contends that the medication does indeed generate a meaningful alteration in blood pressure levels, distinct from what might naturally occur or by random chance.
Example 1: A researcher claims that the average time students spend on homework is 2 hours per night.
Null Hypothesis (H 0 ): The average time students spend on homework is equal to 2 hours per night.
Data: A random sample of 30 students has an average homework time of 1.8 hours with a standard deviation of 0.5 hours.
Test Statistic and Decision: Using a one-sample t-test, t = (1.8 − 2) / (0.5 / √30) ≈ −2.19 with 29 degrees of freedom. If the calculated t-statistic falls within the acceptance region, we fail to reject the null hypothesis. If it falls in the rejection region, we reject the null hypothesis.
Conclusion: The two-sided critical value is about 2.045 at the 0.05 significance level and about 2.756 at the 0.01 level. At the 0.01 level we therefore fail to reject the null hypothesis, suggesting that there is not enough evidence to dispute the claim of the average homework time being 2 hours per night. At the 0.05 level, however, the same data would lead to rejection, so the significance level must be fixed before the test is run.
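The arithmetic behind Example 1 can be sketched as follows. The function name is hypothetical, and the critical values are copied from a standard t table for 29 degrees of freedom rather than computed, since the Python standard library has no t distribution. The sketch shows that the decision depends on the significance level chosen in advance.

```python
from math import sqrt


def one_sample_t(sample_mean, hypothesized_mean, sample_sd, n):
    """t statistic for H0: mu = hypothesized_mean (n - 1 degrees of freedom)."""
    std_err = sample_sd / sqrt(n)
    return (sample_mean - hypothesized_mean) / std_err


# Figures from Example 1: n = 30, mean 1.8 h, SD 0.5 h, H0: mu = 2 h.
t_stat = one_sample_t(1.8, 2.0, 0.5, 30)

# Two-sided critical values for 29 degrees of freedom, taken from a
# standard t table (assumed here, not computed).
CRITICAL_T = {0.05: 2.045, 0.01: 2.756}

for alpha, critical in CRITICAL_T.items():
    decision = "reject H0" if abs(t_stat) > critical else "fail to reject H0"
    print(f"alpha = {alpha}: |t| = {abs(t_stat):.3f} vs {critical} -> {decision}")
```

With these figures |t| lies between the two critical values, so the sample leads to rejection at the 0.05 level but not at the 0.01 level.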
Example 2: A company asserts that the error rate in its production process is less than 1%.
Null Hypothesis (H 0 ): The error rate in the production process is 1% or higher.
Data: A sample of 500 products shows an error rate of 0.8%.
Test Statistic and Decision: Using a one-sided z-test for a proportion, z = (0.008 − 0.01) / √(0.01 × 0.99 / 500) ≈ −0.45. If the calculated z-statistic falls within the acceptance region, we fail to reject the null hypothesis. If it falls in the rejection region, we reject the null hypothesis.
Conclusion: The z-statistic corresponds to a one-sided p-value of about 0.33, well above 0.05, so we fail to reject the null hypothesis. The sample does not provide enough evidence to support the company’s claim that the error rate is below 1%.
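A minimal sketch of the calculation for Example 2 is given below, assuming a left-tailed z-test for a single proportion and a 0.05 significance level (the example itself does not state one). The function name is hypothetical. Note that with only about five errors expected under the null hypothesis, the normal approximation is rough, so the p-value should be read as indicative.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_proportion_z_test(p_hat, p0, n):
    """Left-tailed z-test of H0: p >= p0 against Ha: p < p0."""
    # The standard error uses p0, the boundary value of the null hypothesis.
    std_err = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / std_err
    # Probability of a z-statistic this low or lower when p = p0.
    p_value = NormalDist().cdf(z)
    return z, p_value


# Figures from Example 2: 0.8% errors in 500 products, H0: p >= 1%.
z, p_value = one_sided_proportion_z_test(0.008, 0.01, 500)
print(f"z = {z:.3f}, p = {p_value:.3f}")
print("reject H0" if p_value <= 0.05 else "fail to reject H0")
```

With these figures the observed rate is less than half a standard error below 1%, so the sample on its own does not justify rejecting the null hypothesis.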
Q1. A researcher claims that the average time spent by students on homework is less than 2 hours per day. Formulate the null hypothesis for this claim.
Q2. A manufacturing company states that their new machine produces widgets with a defect rate of less than 5%. Write the null hypothesis to test this claim.
Q3. An educational institute believes that their online course completion rate is at least 60%. Develop the null hypothesis to validate this assertion.
Q4. A restaurant claims that the waiting time for customers during peak hours is not more than 15 minutes. Formulate the null hypothesis for this claim.
Q5. A study suggests that the mean weight loss after following a specific diet plan for a month is more than 8 pounds. Construct the null hypothesis to evaluate this statement.
The null hypothesis (H 0 ) and alternative hypothesis (H a ) are fundamental concepts in statistical hypothesis testing. The null hypothesis represents the default assumption, stating that there is no significant effect, difference, or relationship between variables. It serves as the baseline against which the alternative hypothesis is tested. In contrast, the alternative hypothesis represents the researcher’s hypothesis or the claim to be tested, suggesting that there is a significant effect, difference, or relationship between variables. The relationship between the null and alternative hypotheses is such that they are complementary, and statistical tests are conducted to determine whether the evidence from the data is strong enough to reject the null hypothesis in favor of the alternative hypothesis. This decision is based on the strength of the evidence and the chosen level of significance. Ultimately, the choice between the null and alternative hypotheses depends on the specific research question and the direction of the effect being investigated.
What does the null hypothesis stand for?
The null hypothesis, denoted as H 0 , is a fundamental concept in statistics used for hypothesis testing. It represents the statement that there is no effect or no difference, and it is the hypothesis that the researcher typically aims to provide evidence against.
A null hypothesis is formed based on the assumption that there is no significant difference or effect between the groups being compared or no association between variables being tested. It often involves stating that there is no relationship, no change, or no effect in the population being studied.
In statistical hypothesis testing, if the p-value (the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true) is lower than the chosen significance level (commonly 0.05), we reject the null hypothesis. This suggests that the data provide enough evidence against the assumption made in the null hypothesis.
In research, the null hypothesis represents the default assumption or position that there is no significant difference or effect. Researchers often try to test this hypothesis by collecting data and performing statistical analyses to see if the observed results contradict the assumption.
The null hypothesis (H0) is the default assumption that there is no significant difference or effect. The alternative hypothesis (H1 or Ha) is the opposite, suggesting there is a significant difference, effect or relationship.
Rejecting the null hypothesis implies that there is enough evidence in the data to support the alternative hypothesis. In simpler terms, it suggests that there might be a significant difference, effect or relationship between the groups or variables being studied.
Formulating a null hypothesis often involves considering the research question and assuming that no difference or effect exists. It should be a statement that can be tested through data collection and statistical analysis, typically stating no relationship or no change between variables or groups.
The null hypothesis is commonly symbolized as H 0 in statistical notation.
The null hypothesis serves as a starting point for hypothesis testing, enabling researchers to assess if there’s enough evidence to reject it in favor of an alternative hypothesis.
Rejecting the null hypothesis implies that there is sufficient evidence to support an alternative hypothesis, suggesting a significant effect or relationship between variables.
Various statistical tests, such as t-tests or chi-square tests, are employed to evaluate the validity of the Null Hypothesis in different scenarios.
A class or laboratory experiment requires a good null hypothesis. You are given the variables to use in your experiment and then identify the relationship between them. Every experiment report begins by stating your hypotheses. A null hypothesis is useful because it can be tested against data to decide whether it should be rejected.
A null hypothesis is used in experiments to state that there is no relationship between the two variables. Every type of experiment requires you to formulate one. The word “null” itself means zero or no value. If you want to practice writing a good experiment report, consider providing a good null hypothesis. The null hypothesis is rejected when the data provide sufficient evidence for the alternative hypothesis.
The null hypothesis is a fundamental concept in statistics and scientific research. It serves several critical purposes in the process of hypothesis testing, guiding researchers in drawing meaningful conclusions from their data. Below are the primary purposes of the null hypothesis:
The null hypothesis provides a baseline or a default position that indicates no effect, no difference, or no relationship between variables. It is the statement that researchers aim to test against an alternative hypothesis. By starting with the assumption that there is no effect, researchers can objectively assess whether the data provide enough evidence to support the alternative hypothesis.
By assuming no effect or no difference, the null hypothesis helps eliminate bias in research. Researchers approach their study without preconceived notions about the outcome, ensuring that the results are based on the data collected rather than personal beliefs or expectations.
The null hypothesis provides a structured framework for conducting statistical tests. It is essential for calculating p-values and test statistics, which determine whether the observed data are significantly different from what would be expected under the null hypothesis. This framework allows for a standardized approach to testing hypotheses across various fields of study.
The null hypothesis facilitates decision-making in research by providing clear criteria for rejecting or retaining it. If the data provide sufficient evidence to reject the null hypothesis, researchers can conclude that there is a statistically significant effect or difference. This decision-making process is critical in advancing scientific knowledge and understanding.
The null hypothesis plays a crucial role in controlling Type I and Type II errors in hypothesis testing. A Type I error occurs when a true null hypothesis is incorrectly rejected (a false positive), while a Type II error happens when a false null hypothesis is not rejected (a false negative). By defining the null hypothesis, researchers can set a significance level (the alpha level), which directly caps the risk of a Type I error; the risk of a Type II error is then managed through sample size and study design.
Rejecting the null hypothesis is a critical step in the process of hypothesis testing. The decision to reject the null hypothesis is based on statistical evidence derived from the data collected in a study. Below are the key factors that determine when the null hypothesis is rejected:
The p-value is a measure of the probability that the observed data (or something more extreme) would occur if the null hypothesis were true. The null hypothesis is rejected if the p-value is less than or equal to the predetermined significance level (α).
The test statistic is a standardized value calculated from sample data during a hypothesis test. It measures the degree to which the sample data differ from the null hypothesis. The decision to reject the null hypothesis depends on whether the test statistic falls within the critical region.
Confidence intervals provide a range of values that are likely to contain the population parameter. If the confidence interval does not include the value specified by the null hypothesis, the null hypothesis is rejected.
Effect size measures the magnitude of the difference between groups or the strength of a relationship between variables. While not a direct criterion for rejecting the null hypothesis, a substantial effect size can support the decision to reject the null hypothesis when combined with a significant p-value.
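For a two-sided z-test with a known population standard deviation, the p-value, critical-region, and confidence-interval criteria described above lead to the same decision. The sketch below illustrates this with hypothetical numbers (a sample of 100 IQ scores with mean 103, tested against a null value of 100 with an assumed standard deviation of 15); the function name and all inputs are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def z_test_summary(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided z-test of H0: mu = mu0 with a known population SD."""
    std_err = sigma / sqrt(n)
    z = (sample_mean - mu0) / std_err
    p_value = 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))
    critical = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    ci = (sample_mean - critical * std_err, sample_mean + critical * std_err)
    return {
        "z": z,
        "p_value": p_value,
        "confidence_interval": ci,
        "reject_by_p_value": p_value <= alpha,
        "reject_by_critical_region": abs(z) >= critical,
        "reject_by_confidence_interval": not (ci[0] < mu0 < ci[1]),
        # Standardized mean difference, reported alongside the decision.
        "effect_size_d": (sample_mean - mu0) / sigma,
    }


# Hypothetical data, for illustration only.
result = z_test_summary(sample_mean=103, mu0=100, sigma=15, n=100)
for name, value in result.items():
    print(name, value)
```

The effect size is reported separately because a result can be statistically significant while the standardized difference remains small, as it is in this hypothetical case (d = 0.2).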
Criteria | Null Hypothesis | Alternative Hypothesis |
---|---|---|
Definition | A statement that there is no effect or difference. | A statement that there is an effect or difference. |
Purpose | Serves as a baseline or default position. | Represents the outcome the researcher aims to support. |
Assumption | Assumes no relationship or effect. | Assumes a relationship or effect exists. |
Example | “The new drug has no effect on blood pressure.” | “The new drug lowers blood pressure.” |
p-value | Retained if the p-value is greater than the significance level (α). | Accepted if the p-value is less than or equal to the significance level (α). |
Test statistic | Falls outside the critical region, indicating no significant effect. | Falls within the critical region, indicating a significant effect. |
Symbol | Denoted by H0. | Denoted by H1 or Ha. |
Focus | Focuses on the absence of a significant effect or relationship. | Focuses on the presence of a significant effect or relationship. |
Type I error | Incorrectly rejecting a true null hypothesis (false positive). | N/A |
Type II error | N/A | Incorrectly failing to reject a false null hypothesis (false negative). |
Writing a null hypothesis is a crucial step in designing a scientific study or experiment. The null hypothesis (H0) serves as a starting point for statistical testing and represents a statement of no effect or no difference. Here’s a step-by-step guide on how to write a null hypothesis:
Start by clearly defining the research question you want to investigate. Understand what you are testing and what you expect to find.
Identify the independent and dependent variables in your study.
The null hypothesis should assert that there is no effect, no difference, or no relationship between the variables. It is usually written as a statement of equality or no change.
In formal scientific writing, use symbols and proper notation to represent the null hypothesis.
The null hypothesis is crucial as it provides a baseline for comparison and allows researchers to test the significance of their findings.
A null hypothesis is stated as no effect or no difference, typically in the form “There is no [effect/difference] between [groups/variables].”
The alternative hypothesis (H1) suggests that there is an effect or difference between variables, opposing the null hypothesis.
Rejecting the null hypothesis means the data provides sufficient evidence to support the alternative hypothesis, indicating a significant effect or difference.
A p-value measures the probability of observing data at least as extreme as those actually observed if the null hypothesis were true. Low p-values indicate strong evidence against the null hypothesis.
A Type I error occurs when a true null hypothesis is incorrectly rejected, meaning a false positive result is concluded.
A Type II error happens when a false null hypothesis is not rejected, meaning a false negative result is concluded.
The significance level, often set at 0.05, is chosen based on the acceptable risk of making a Type I error in the context of the study.
No, the null hypothesis can only be rejected or not rejected. Failing to reject it does not prove it true, only that there is not enough evidence against it.
Larger sample sizes increase the test’s power, reducing the risk of Type II errors and making it easier to detect a true effect.
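The link between sample size, power, and Type II error can be sketched numerically. The example below assumes a right-tailed z-test with a known standard deviation and a hypothetical true shift of 0.2 standard deviations; the function name and all numbers are illustrative assumptions rather than values from the text.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def z_test_power(true_shift, sigma, n, alpha=0.05):
    """Power of a right-tailed z-test when the true mean exceeds mu0 by true_shift."""
    critical = STANDARD_NORMAL.inv_cdf(1 - alpha)
    # Under the alternative, the z statistic is centred on this value.
    noncentrality = true_shift * sqrt(n) / sigma
    return 1 - STANDARD_NORMAL.cdf(critical - noncentrality)


for n in (10, 30, 100, 300):
    power = z_test_power(true_shift=0.2, sigma=1.0, n=n)
    print(f"n = {n:>3}: power = {power:.2f}, Type II error rate = {1 - power:.2f}")
```

When the true shift is zero the function returns the significance level itself, which is the Type I error rate the test was designed to have.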
Nature Cell Biology (2024)
Analyses of transcriptional bursting from single-cell RNA-sequencing data have revealed patterns of variation and regulation in the kinetic parameters that could be inferred. Here we profiled newly transcribed (4-thiouridine-labelled) RNA across 10,000 individual primary mouse fibroblasts to more broadly infer bursting kinetics and coordination. We demonstrate that inference from new RNA profiles could separate the kinetic parameters that together specify the burst size, and that the synthesis rate (and not the transcriptional off rate) controls the burst size. Importantly, transcriptome-wide inference of transcriptional on and off rates provided conclusive evidence that RNA polymerase II transcribes genes in bursts. Recent reports identified examples of transcriptional co-bursting, yet no global analyses have been performed. The deep new RNA profiles we generated with allelic resolution demonstrated that co-bursting rarely appears more frequently than expected by chance, except for certain gene pairs, notably paralogues located in close genomic proximity. Altogether, new RNA single-cell profiling critically improves the inference of transcriptional bursting and provides strong evidence for independent transcriptional bursting of mammalian genes.
It has been nearly 50 years since the transcription of nascent RNA was described as a bursting process, where periods of transcriptional activity were interspersed with periods of inactivity 1 . Direct evidence of stochastic transcription and bursting dynamics has come from real-time imaging experiments where nascent RNAs are monitored over time 2 , 3 , 4 and indirectly from time-lapse microscopy on the resulting protein levels 5 . Complementing evidence has emerged from analyses of steady-state RNA counts across single cells, either using single-molecule RNA fluorescence in situ hybridization (smRNA-FISH) 6 , 7 , 8 or single-cell RNA sequencing (scRNA-seq) 9 together with modelling to infer kinetic parameters that best describe the observed RNA count distributions. Whereas both strategies can summarize average bursting features, for example, the burst frequency or burst size (that is, RNAs transcribed per burst), real-time methods can investigate the variation in bursting over time. The strength of scRNA-seq-based bursting inferences lies in the ability to infer allele-level kinetics across thousands of endogenous genes in parallel, whereas more targeted smRNA-FISH approaches have higher sensitivity, which is important for accurate burst size estimations.
Although transcriptional bursting has been extensively studied 10 , 11 , there are several important open questions regarding bursting kinetics. A central question is whether all genes are transcribed in bursts or whether subsets of genes also show constitutive expression. Most evidence argues for general transcriptional bursting, although predominantly constitutive expression was observed in studies in bacteria 12 and human cells 13 , and for specific genes in yeast 3 , 14 . While transcriptome-wide inference of bursting parameters from steady-state single-cell RNA counts is effective 9 , the synthesis and closing rates (that together make up the burst size) could not be individually determined, and absolute burst frequency estimates required separately measured degradation rates. Therefore, information on how long bursts last is mostly derived from sporadic observations 11 .
An equally fundamental question for transcription is whether the bursting of each gene is independent or whether closely related genes (by genomic or spatial distance) are prone to co-bursting. Several lines of evidence have indirectly implied coordinated co-bursting of multiple genes, such as reports of transcriptional factories 15 , 16 and transcriptional condensates 17 . A recent study found spatial coupling in nascent RNA when the gene loci are in close spatial proximity 18 . Intriguingly, studies in the fruit fly have demonstrated coordinated bursting of transgenes 4 and for pairs of paralogues 19 . However, allele-level analyses of co-bursting can fully control for spurious correlations from cellular heterogeneity (for example, cell cycle and activation states) or technical variability in measurements that otherwise could lead to false positive correlations.
In this study, we investigated transcriptional bursting kinetics transcriptome-wide in primary fibroblasts through temporally resolved scRNA-seq. Analysis of the newly transcribed RNA greatly improved the inference of kinetic parameters. Interestingly, the varying burst sizes (RNA molecules per burst) observed across genes were found to correlate with inferred synthesis rate, whereas the burst durations showed little variation across genes. Investigating the allele-level new RNA profiles across the single cells, we demonstrate an overall lack of co-bursting of nearby genes except for a few gene pairs with modest increase in co-bursting.
An attractive strategy for analyses of transcriptional dynamics and bursting kinetics is to count only the RNA molecules transcribed within a defined time period, demonstrated by recent 4-thiouridine (4sU)-based single-cell new RNA profiling methods 20 , 21 . In these methods, cells are exposed to the uridine analogue 4sU for a short period of time, leading to 4sU being incorporated into the transcribed RNA. During library construction, the alkylation of 4sU and subsequent reverse transcription (RT) results in base conversions in the complementary DNA at the positions of the 4sU incorporation 20 , 21 . The presence of 4sU-induced T-to-C conversions against the reference genome enables the computational separation of new and pre-existing RNA molecules. Previous 4sU-based single-cell methods 20 , 21 , however, suffered from low sensitivity and cellular throughput. Here, we developed NASC-seq2, a miniaturized version of NASC-seq 20 with higher sensitivity and cellular throughput, that also includes unique molecular identifiers (UMIs). To compare the performance of NASC-seq2 with the original NASC-seq 20 , we applied NASC-seq2 to individual K562 cells ( n = 613) that were exposed to 4sU for 2 h. NASC-seq2 showed drastically improved sensitivity and detected on average 2,000 more genes per cell (at matched 100,000 total RNA reads) compared with NASC-seq (Fig. 1a ). The improvement mainly stems from starting with nanolitre lysis volume (following Smart-seq3xpress 22 ), which enabled the dimethyl sulfoxide (DMSO)-based alkylation step to be carried out in a low volume and subsequently diluted out (instead of using bead purification before cDNA amplification). Since the ability to separate new and old RNAs depends on the length of sequenced reads (Extended Data Fig. 1a ), we used longer short-read sequencing strategies (PE200).
Analyses of the observed base conversions, through a mixture model that infers the probability of 4sU-induced base conversions (Pc) and conversions due to errors introduced during library preparation or sequencing (Pe), demonstrated a high signal-to-noise (Pc/Pe) ratio of ~45 (Extended Data Fig. 1b–d ). The average power in assigning new RNA molecules was above 90% (Extended Data Fig. 1e,f ), and we found that approximately 20% of the detected RNA molecules in K562 cells were transcribed within the 2-h period (Extended Data Fig. 1g,h ).
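To make the role of Pc and Pe concrete, the sketch below uses a two-component binomial mixture, which is one common way to formulate such a model and is not necessarily the exact model used in the study. Given k observed conversions among n convertible T positions in a reconstructed molecule, it returns the posterior probability that the molecule is new. The function names, the value of Pe, and the choice n = 60 are assumptions for illustration; only the Pc/Pe ratio of about 45 and the new-RNA fraction of about 20% come from the text.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of k conversions among n convertible positions."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def posterior_new(k, n, p_c, p_e, fraction_new):
    """Posterior probability that a molecule with k of n positions converted is new."""
    new_term = fraction_new * binomial_pmf(k, n, p_c)
    old_term = (1 - fraction_new) * binomial_pmf(k, n, p_e)
    return new_term / (new_term + old_term)


# Hypothetical error rate; P_C is set so that P_C / P_E = 45 as in the text.
P_E = 0.001
P_C = 45 * P_E

for k in range(4):
    prob = posterior_new(k, n=60, p_c=P_C, p_e=P_E, fraction_new=0.2)
    print(f"{k} conversions -> P(new) = {prob:.3f}")
```

Under these assumed values a single conversion is ambiguous while two or more make a new molecule very likely, which is one way to see why longer reconstructed molecules (more convertible positions) improve the power to call new RNA.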
a , Plot showing the number of genes detected per K562 cell as a function of reads sequenced, for K562 cells processed with NASC-seq2 (613 individual cells) and NASC-seq 20 (138 individual cells), respectively. The mean number of genes per method and sequencing depth is shown, together with error bars (1.96× s.e.m.). b , Illustration of large-scale NASC-seq2 experiment on F1 primary fibroblasts. Four technical replicates of primary fibroblast cultures were independently exposed to 4sU and collected for FACS and NASC-seq2 library construction. For transcriptional dynamics analyses, cells from all replicates were pooled. c , Uniform Manifold Approximation and Projection (UMAP) of primary fibroblasts, overlayed with contour plots, showing that assayed primary fibroblast cells did not show apparent patterns of heterogeneity. d , Boxplots showing the obtained signal-to-noise level (Pc/Pe) in fibroblasts with ( n = 8,912) and without ( n = 783) 4sU (2 h). The boxplots show the median and boundaries (first and third quartile), and the whiskers denote 1.5 times the interquartile range of the box. e , Density plot for the obtained power to call RNA molecules as new ( y axis) against the reconstructed RNA molecule length ( x axis). f , Contour plots showing the fraction of new RNA molecules per cell ( x axis) against total detected RNA molecules per cell ( y axis) for fibroblasts with and without 4sU. g , Scatter plot of burst frequency estimates ( x axis) for mouse primary fibroblasts previously inferred from total RNA counts 9 against the fraction of cells with new RNA ( y axis) detected after 2-h 4sU exposure. Source numerical data are available in Source data .
Analysis of transcriptional dynamics in mouse primary fibroblasts.
We next sought to create a comprehensive transcriptome dynamics data set, by applying NASC-seq2 to profile new RNA (2-h 4sU labelling) on 10,000 individual primary mouse fibroblasts (Fig. 1b ). The primary fibroblasts came from a female F1 mouse (C57Bl/6 × CAST) so that transcribed genetic polymorphisms could be used to also assign RNA to the alleles 23 . Single-cell libraries from 8,912 4sU-exposed cells passed quality control, and analysed cells were homogeneous and with no substructures in lower-dimensional projections (Fig. 1c ). The median signal-to-noise ratio was 20, showing strong separation between 4sU-exposed cells ( n = 8,912) and control non-exposed fibroblast cells ( n = 783) (Fig. 1d and Extended Data Fig. 2 ). The average power to call new RNA molecules was 70% and dependent on sequenced and reconstructed length (Fig. 1e and Methods ). We detected approximately 100,000 RNA molecules per cell, out of which 12.5% were assigned as new (Fig. 1f ). Comparing the fraction of cells with detected new RNA to previously reported burst frequencies for similar fibroblasts 9 revealed a strong correlation (Spearman r = 0.9) (Fig. 1g ), indicating that the observed new RNA counts were in general agreement with transcriptome-wide transcriptional burst kinetic data inferred from steady-state scRNA-seq 9 .
The two-state telegraph model of transcription 24 (Fig. 2a ) is often used for steady-state estimation of kinetics, where four rate parameters dictate the transcriptional dynamics. Each locus transitions between transcriptional off and on states (governed by the k on and k off rates); in the on state, RNA is transcribed at the synthesis rate ( k syn ), and transcripts are subject to RNA degradation ( k d ). To extend the model to the transient (pulse-labelling) state, a probability mass function was derived that describes the new RNA counts as a function of bursting kinetic parameters and 4sU-labelling time ( Methods and Supplementary Note ). Having measured both new and pre-existing RNA per cell enabled us to also derive degradation rates. Using the mass function and degradation rates, three gene-level transcriptional bursting parameters were inferred from new RNA counts using maximum likelihood estimation, with parameters initialized from three count summary statistics ( Methods ). These included the fraction of cells with new RNA (Fig. 2b ) and the coefficient of variation in new RNA counts (Fig. 2c ), which we found informative primarily for k on and k syn , respectively. The distribution of all four inferred rate parameters was visualized on absolute time scales (Fig. 2d ) with indicated number of genes robustly inferred per distribution and error bars with geometric standard deviation showing accuracy. Typical half-lives of transcripts were between an hour and a day, and the frequency of bursts varied from one per day to one per hour, whereas a burst lasted only around a minute during which RNA can be transcribed at rates of 3–200 molecules per hour (Fig. 2d ).
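The two summary statistics named above can be computed directly from a gene's new RNA counts across cells. The sketch below is illustrative only: the function name and the toy counts are assumptions, and the third summary statistic used for initialization is not specified in this section and is therefore not shown.

```python
def new_rna_summary(counts):
    """Return (fraction of cells with new RNA, coefficient of variation).

    counts: new RNA molecule counts for one gene, one entry per cell.
    The CV uses the sample standard deviation (n - 1 denominator),
    which is an assumption made for this sketch.
    """
    n = len(counts)
    frac_with_new = sum(c > 0 for c in counts) / n
    mean = sum(counts) / n
    if mean == 0:
        return frac_with_new, float("nan")
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return frac_with_new, var ** 0.5 / mean

# Toy counts for one gene in ten cells (hypothetical data).
counts = [0, 0, 3, 0, 1, 0, 0, 5, 0, 2]
frac, cv = new_rna_summary(counts)
print(frac, round(cv, 3))
```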
a , Illustration of the two-state telegraph model of transcription. b , Contour plot of inferred burst frequencies ( k on ) against the fraction of cells with detected new RNA (12,284 genes included with robust k on inference). c , Contour plot of synthesis rate ( k syn ) against the coefficient of variation (CV) of new RNA counts over cells (4,437 genes with robust k syn inference). d , Density plots for inferred bursting parameters in primary fibroblasts, with the number of genes for which the respective parameter could be robustly assigned as shown in the figure. Technical losses could cause a constant k syn underestimation bias. Top bar: indicative waiting times. e , Contour plot of synthesis ( k syn ) rate against transcriptional off ( k off ) rate inferred from all primary fibroblasts. f , Contour plot of synthesis ( k syn ) rates inferred on cell subset half 1 against transcriptional off ( k off ) rates inferred from cell subset half 2. g , Contour plot of burst frequency ( k on ) rates inferred on cell subset half 1 against burst frequency ( k on ) rates inferred from cell subset half 2. h , Contour plot of burst size ( k syn / k off ) inferred on cell subset half 1 against burst size ( k syn /k off ) rates inferred from cell subset half 2. i , Contour plot of synthesis ( k syn ) rates inferred on cell subset half 1 against synthesis ( k syn ) rates inferred from cell subset half 2. j , Contour plot of transcriptional off ( k off ) rates inferred on cell subset half 1 against transcriptional off ( k off ) rates inferred from cell subset half 2. Plots in d – g are based on 1,216 genes that had robust inference on all four parameters in each cell subset. k , Correlation matrices, summarizing Spearman correlations obtained when comparing inferences from the two cell subsets, but subsampling the numbers of cells per subset and ordering the genes according to their mean expression. Geometric standard deviation (s.d., technical variation) is shown as error bars. 
Source numerical data are available in Source data .
However, a general problem with the joint inference of kinetic parameters is the risk of spurious correlations between parameters. In fact, the synthesis rate ( k syn ) and off rate ( k off ) were found to correlate when inferred from the 8,912 fibroblast cells (Fig. 2e ), probably reflecting technical noise during inference affecting both parameters. To address this, we separated the cells into two halves and performed independent inference on each half. Importantly, the correlation between the synthesis rate ( k syn ) and off rate ( k off ) was strongly reduced (Fig. 2f ). The inference of individual parameters from the two cell halves was reproducible (Fig. 2g–j ), with the highest number of cells required for the inferences of synthesis ( k syn ) and off ( k off ) rates (Fig. 2k ). Moreover, simulations were used to validate the correct central estimates and unimodal distribution (Extended Data Fig. 3a–c ). Reassuringly, burst frequency and size estimates inferred from steady-state scRNA-seq data were highly concordant with those inferred from new RNA profiles (Extended Data Fig. 3d,e ). Thus, inference from steady-state scRNA-seq data fails to accurately infer transcriptional off ( k off ) and synthesis rate ( k syn ) (Extended Data Fig. 3 ), but those parameters could be inferred from new RNA counts (Fig. 2 ).
Having determined the robustness of the inference based on the analyses of cell halves, we next explored patterns of bursting kinetics from the transcriptome-wide data. Interestingly, the transcriptional off ( k off ) rates were 100-fold higher than transcriptional on ( k on ) rates, demonstrating that all inferred genes were expressed in bursts. Focusing on the smaller subset of 1,216 genes for which all four parameters were reproducibly inferred in both cell halves, we found that the rate constants ( k on and k off ) were uncorrelated even though they specify a mutually reversible process (Fig. 3a ). Interestingly, we found that only the synthesis rate ( k syn ) was correlated with inferred burst size, whereas k off was not, indicating that the burst duration is more invariant while the rate of synthesis specifies the amounts of RNA produced per burst (Fig. 3b,c ). We validated that the synthesis rate ( k syn ) controls the burst size, through similar kinetic inference in the K562 cells (Extended Data Fig. 4 ) albeit with lower correlation and accuracy due to fewer sequenced cells. Analysis of core promoter elements revealed significant interactions with burst size (and not frequency), as previously reported 9 , and additionally the interaction was found for k syn since it determines the burst size (Extended Data Fig. 5 ). A systematic comparison of measured and derived parameters across 7,234 genes demonstrated the impact of burst frequency on the overall expression and that the correlation between burst size and synthesis rate holds transcriptome-wide (Fig. 3d ). In line with the small burst sizes detected, which in part can be a technical underestimate, we find a moderate correlation between burst size and the fraction of cells with new RNA, possibly indicating that unproductive on states may occur. Highly expressed genes are more variable in terms of burst size, whereas medium and lowly expressed genes differ mostly in burst frequency (Fig. 4a–c ), as previously reported 5 . Correlating observed and inferred parameters in highly expressed genes revealed a negative correlation between degradation rate and burst frequency (Fig. 4d ).
a , Contour plot of transcriptional on ( k on ) and off ( k off ) rates, inferred separately on two non-overlapping halves of the cells (1,216 genes with robust inference of k on , k off , k syn and burst size). b , Contour plot of burst size (inferred from cell subset half 2) against the synthesis rate, k syn (inferred from cell subset half 1) (1,216 genes with robust inference of k on , k off , k syn and burst size). c , Contour plot of burst size (inferred from cell subset half 2) against the off rate, k off (inferred from cell subset half 1) (1,216 genes with robust inference of k on , k off , k syn and burst size). d , Spearman correlation matrix from parallel inference of two cell halves (each with 4,458 cells) based on the 7,234 genes with robust k on and burst size inference. Measurements are indicated with asterisks in contrast to derived estimates (#). P values from the Spearman correlation tests were Benjamini–Hochberg adjusted. In a – c , geometric standard deviation (s.d., technical variation) is shown as error bars. Source numerical data are available in Source data . expr., expressing.
a , b , Contour plot of expression rate (inferred from cell subset half 1) against burst frequency (inferred from cell subset half 2), for the 1,216 most highly expressed genes ( a ) or for 7,234 genes ( b ). c , Contour plot of expression rate (inferred from cell subset half 1) against burst size (inferred from cell subset half 2) for 7,234 genes. d , Correlation matrix (as in Fig. 3d ) for the 1,216 most highly expressed genes (4,458 cells per cell half). Apart from the four estimated parameters in the telegraph model, the heat map includes measurements (*) and parameter-derived estimates (#), where mean occupancy (fraction of time in on state) = k on /( k on + k off ), burst size (RNAs per on state) = k syn / k off , burst frequency (on states per time) = 1/(1/ k on + 1/ k off ), expression rate (RNAs per time) = k syn × k on /( k on + k off ). Geometric standard deviation indicates technical variation. n.s., P > 0.05. Source numerical data are available in Source data .
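The parameter-derived estimates defined in the legend above follow directly from the three inferred rates. The sketch below evaluates them for one gene; the function name and the example rates (per hour, with k off 100-fold higher than k on) are illustrative assumptions, not values from the study.

```python
def derived_burst_parameters(k_on, k_off, k_syn):
    """Derived quantities of the two-state telegraph model.

    All rates must share the same time unit (for example, per hour).
    """
    return {
        # Fraction of time spent in the on state.
        "mean_occupancy": k_on / (k_on + k_off),
        # RNAs produced per on state.
        "burst_size": k_syn / k_off,
        # On states per unit time (one full off-on cycle per burst).
        "burst_frequency": 1.0 / (1.0 / k_on + 1.0 / k_off),
        # RNAs produced per unit time.
        "expression_rate": k_syn * k_on / (k_on + k_off),
    }

# Hypothetical gene: rates in molecules or events per hour.
params = derived_burst_parameters(k_on=0.5, k_off=50.0, k_syn=100.0)
for name, value in params.items():
    print(f"{name}: {value:.4f}")
```

With these definitions, the expression rate equals burst size multiplied by burst frequency, which provides a quick consistency check on any set of inferred rates.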
Studies have reported gene pairs showing co-bursting 4 , 18 , 19 , that is, that simultaneous RNA transcription occurred more often than expected by chance (Fig. 5a ). However, no previous study has investigated allele-level measurements, which constitute an important control, where nearby genes on the same parental chromosome (allele) can be compared with comparisons between the same gene pair on the opposite chromosome (other allele). Co-bursting should manifest as genes that burst more often from the same allele across cells than expected if all genes were transcribed independently (Fig. 5b–d ). Having performed the experiments in F1 mice, we assigned all new RNA molecules (and reads) to their allelic origin ( Methods ). As previously shown 23 , 25 , the allelic estimates accurately capture X-chromosome inactivation and imprinting features (Extended Data Fig. 6 ). We computed allelic new RNA counts for each gene and cell and performed pair-wise comparisons of genes as a function of their genomic distance (on the linear chromosome).
a , Illustration of a genomic region with two nearby genes. b , Illustration of new RNA obtained in four cells, if all alleles and genes are transcribed independently. c , Illustration of new RNA obtained in four cells, if nearby genes co-burst from the same allele. d , Illustration of a correlation of new RNA from two nearby genes on the same allele, for example from co-bursting. e , Simulation showing the fraction new RNA observations in cells (mean across cells; y axis) and the fraction new RNA observations coming from a single bursting event ( x axis), for the indicated 4sU labelling time periods. f , The mean fraction of single burst new RNA observations ( y axis) against the 4sU labelling time period. g , Plot showing the power to detect significant co-bursting ( y axis) as a function of synthetically implanting coordinated new RNA counts in an increasing fraction of cells. h , Boxplots showing the observed correlations for pair-wise comparisons of genes within different genomic distance bins for autosomal genes. In grey are new RNA count correlations (irrespective of allelic origin), and in green and purple are correlations obtained on new RNA counts from the same allele (green) and, as a control, from different alleles (purple). i , Boxplots showing the correlations from same allele (green in h ) minus the correlations from different alleles (purple in h ), for pair-wise autosomal genes separated by indicated genomic distance. j , Boxplots showing the correlations from same allele minus different alleles (as in i ), for pair-wise genes on the X chromosome, separated by indicated genomic distance (280 gene pairs in <0.1; 881 in <0.5; 1,291 in <1.5; 900 in <2.5; 787 in <3.5). In h and i , the numbers of gene pairs are as follows: 11,308 for bin <0.1; 34,411 for <0.5; 64,773 for <1.5; 51,532 for <2.5; 50,494 for <3.5. 
In h – j , the boxplots show the median and boundaries (first and third quartile), and the whiskers denote 1.5 times the interquartile range of the box. Source numerical data are available in Source data .
Through Gillespie bursting simulation using the inferred gene-level kinetics, we simulated new RNA counts and monitored the fraction of cells with new RNA observations and how often new RNA observations were derived from a single burst event ( Methods ). We computed the average detection of new RNA events (across expressed genes) across cells labelled for different time periods, and the fraction of those new RNA observations that were derived from a single burst event (Fig. 5e ). The analysis demonstrated that 4sU labelling for 1–2 h led to the highest frequency of single-burst new RNA observations (Fig. 5f ). The statistical power in detecting significant co-bursting was estimated through simulation where coordinated new RNA was synthetically implanted for an increasing number of cells (Fig. 5g ), demonstrating high power when co-bursting was visible in only a low percentage of cells.
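A minimal sketch of such a simulation is given below, assuming the standard Gillespie algorithm applied to the two-state telegraph model for a single allele, with the promoter started from its stationary state. The function names and all rate values (per hour) are hypothetical, and the code is not the simulation used in the study.

```python
import random

def simulate_new_rna(k_on, k_off, k_syn, k_d, t_label, rng):
    """Simulate one allele for t_label hours.

    Returns a list with, for each surviving new RNA molecule,
    the index of the burst (on period) that produced it.
    """
    # Assumption: promoter starts from its stationary on/off distribution.
    on = rng.random() < k_on / (k_on + k_off)
    n_bursts = 1 if on else 0
    current = 0  # index of the ongoing burst, used only while on
    molecules = []
    t = 0.0
    while True:
        r_switch = k_off if on else k_on
        r_syn = k_syn if on else 0.0
        r_deg = k_d * len(molecules)
        total = r_switch + r_syn + r_deg
        t += rng.expovariate(total)  # waiting time to the next event
        if t >= t_label:
            return molecules
        u = rng.random() * total  # choose which event fires
        if u < r_switch:
            on = not on
            if on:
                current = n_bursts
                n_bursts += 1
        elif u < r_switch + r_syn:
            molecules.append(current)
        elif molecules:
            molecules.pop(rng.randrange(len(molecules)))

def summarise(t_label, n_cells=2000, seed=0,
              k_on=0.5, k_off=50.0, k_syn=100.0, k_d=0.1):
    """Fraction of cells with new RNA, and fraction of those from one burst."""
    rng = random.Random(seed)
    detected = single = 0
    for _ in range(n_cells):
        mols = simulate_new_rna(k_on, k_off, k_syn, k_d, t_label, rng)
        if mols:
            detected += 1
            single += len(set(mols)) == 1
    frac_single = single / detected if detected else float("nan")
    return detected / n_cells, frac_single

for hours in (0.25, 1.0, 2.0, 8.0):
    print(hours, summarise(hours))
```

Short labelling times yield few cells with new RNA, whereas long labelling times mix molecules from several bursts; scanning the labelling time in this way shows the trade-off that the analysis above quantifies.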
Having demonstrated that the collected 2-h 4sU experiment is well suited to investigate co-bursting patterns transcriptome-wide, we focused our analysis on autosomal genes. We observed positive correlations among gene pairs when correlating new RNA profiles (irrespective of allelic origin) (Fig. 5h , grey) and to a smaller degree when comparing allelic new RNA counts from the same allele (Fig. 5h , green), but also in the control correlations across the non-meaningful other allele (Fig. 5h , purple). The drop in correlations for allelic resolution new RNA counts is due to losing 50% of the reads compared with new RNA counts without allelic assignments. Importantly, the correlations from the same allele (in cis ) were generally not stronger than the spurious correlations from the other allele (in trans ), and therefore, after subtracting the median correlations ( cis – trans ), gene-pair correlations approached zero, with a few outliers (Fig. 5i ). Next, as a positive control, correlations among gene pairs on the X chromosome showed much greater correlations from the same allele (Fig. 5j ). The strong allelic signal in cis for X-chromosome genes evidently stems from the silencing of one chromosome and not from co-bursting. Finally, we investigated in more detail the autosomal gene outliers that showed signs of co-bursting (Fig. 5i , Extended Data Fig. 7 and Supplementary Table 1 ). The handful of outlier gene pairs contained false positives, including multiple annotated pseudogenes expressed in one transcript (Gsm49257–Gsm18787) and imprinted gene pairs (Rian-Meg3; Extended Data Fig. 7c ). Remaining gene pairs were paralogue gene pairs (for example, genes encoding granzyme D proteins; Supplementary Table 1 ) or overlapping RNA on opposite strands (Extended Data Fig. 7e ). 
We performed similar analyses using different metrics or statistical tests for detecting co-bursting at the allelic resolution, which identified sporadic paralogue gene pairs with co-bursting, but no analysis provided evidence for co-bursting having a larger role in shaping the transcriptional dynamics in mammalian cells.
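The cis minus trans comparison for a single gene pair can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code of the study: Spearman correlation is implemented from scratch, the two same-allele and the two different-allele pairings are simply averaged, and the allelic counts are synthetic. All function and variable names are hypothetical.

```python
import random

def average_ranks(values):
    """Ranks starting at 1, with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    # Note: undefined (division by zero) if either vector is constant.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))

def cis_minus_trans(g1, g2):
    """g1, g2: per-cell allelic new RNA counts, keyed by 'c57' and 'cast'."""
    cis = (spearman(g1["c57"], g2["c57"]) + spearman(g1["cast"], g2["cast"])) / 2
    trans = (spearman(g1["c57"], g2["cast"]) + spearman(g1["cast"], g2["c57"])) / 2
    return cis - trans

# Synthetic data: each allele is "on" in 30% of cells (hypothetical).
rng = random.Random(1)
n_cells = 2000

def on_states():
    return [rng.random() < 0.3 for _ in range(n_cells)]

def counts(on):
    return [rng.randint(1, 5) if o else 0 for o in on]

on_c57, on_cast = on_states(), on_states()
gene1 = {"c57": counts(on_c57), "cast": counts(on_cast)}
coburst = {"c57": counts(on_c57), "cast": counts(on_cast)}  # shares on states
indep = {"c57": counts(on_states()), "cast": counts(on_states())}

print(cis_minus_trans(gene1, coburst), cis_minus_trans(gene1, indep))
```

In this toy setting, a gene that shares on states with its neighbour on the same allele gives a clearly positive difference, whereas independently transcribed genes give a difference close to zero.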
Surprisingly, we found no distance dependence among the new RNA correlations (Fig. 5h ) despite such correlations having been previously reported 26 . The lack of general co-bursting, however, argues that such correlations should not exist. We suspected the cellular heterogeneity and batch effects present in previous data could have confounded the correlations. We tested this idea by selecting 20% of primary fibroblast cells that were the most different from the main population of cells, which created heterogeneity-derived distance-dependent correlation between genes (Extended Data Fig. 8 ).
In this study, we demonstrate that the inference of transcriptional bursting parameters can be considerably improved when analysing newly transcribed RNAs across thousands of individual cells. This was achieved through improved 4sU-based scRNA-seq (NASC-seq2) and by developing the computational inference from pulse-labelled RNA distributions. The two-state model of transcription was extended to this transient case to model the numbers of new RNAs per cell as a function of the labelling time and kinetic parameters. Notably, inference from the transient (pulse-labelled) state is computationally more demanding than inference from steady-state counts, and it required a dedicated inference strategy and the use of C libraries that handle computing with higher numerical precision. Profiling of nearly 10,000 fibroblast cells allowed the inference of the transcriptional off rate ( k off ) and synthesis rate ( k syn ) for thousands of endogenous genes, beyond the inference possible from steady-state measurements that failed to separate these parameters 9 .
Measuring k off and k syn rates for thousands of endogenous genes revealed that the synthesis rate ( k syn ) controls the burst size, which we demonstrated both in mouse primary fibroblasts and in human K562 cells. In contrast, the k off rate was revealed to be relatively constant across all genes. This was apparent since k off values inferred from the two data halves had less correlation (Fig. 2j ), showed moderate variation (Fig. 2d ) despite having similar technical noise as k syn , and k off values did not correlate with burst sizes (Fig. 3c ). This result agrees with a previous study that imaged nascent RNA across selected genes and similarly identified the k off rates to be invariant 27 . Thus, the duration of active bursting periods seems relatively constant across genes, on the order of a minute, whereas the amount of RNA transcribed per burst is set by the synthesis rate. We found the synthesis rates to span from 3 to 200 molecules per hour (Fig. 2d ). Previously, we reported that the burst sizes of genes are higher when they contain specific core promoter elements 9 . The factors that bind the core promoter elements therefore must be able to recruit associated factors and RNA polymerases more efficiently, resulting in higher RNA transcription per burst, since the synthesis rate is regulated (as opposed to the time in the on state). As previously reported 5 , 9 , increased burst size is predominantly used to increase the expression of the highly expressed genes, whereas modulating burst frequencies is predominantly used to tune expression for most other genes. This pattern was also apparent in our study, and importantly, the biological results of this study (that k syn controls burst size, whereas k off is invariant) apply to both the highly expressed and more moderately expressed genes (Figs. 3b–d and 4d ). 
It will be intriguing to associate specific regulators and processes to each kinetic parameter 11 , and in this pursuit, we believe single-cell new RNA profiling (as shown with NASC-seq2) will have critical importance, in particular since new RNA profiling has the power to identify RNA effects after the perturbation of specific regulators 28 , such that the extension to the single-cell level should be able to identify bursting kinetic alterations from perturbations.
The extent of bursting or constitutive expression has been debated, with large-scale experiments in bacteria 12 and human 13 that favoured predominantly constitutive expression, whereas time-lapse fluorescence microscopy at the protein level favoured predominantly transcriptional bursting 5 . Several analyses of specific genes have reported transcriptional bursting 2 , 4 , 6 , 7 , and steady-state RNA counts from scRNA-seq better fit the model of transcription that allows for bursting 9 . Importantly, in this study, we inferred transcriptional on and off rates for thousands of endogenous RNA polymerase II transcribed genes. All genes with inferable kinetics were found to be expressed in bursts, with k off values typically 100-fold larger than k on values (Fig. 3a ), providing direct RNA-level evidence for general bursting of mouse genes.
Another outstanding question is to what extent nearby genes may have coordinated bursting, so-called co-bursting, or spatial coupling of bursting. Since it is well known that nearby genes are more often involved in similar functions 29 and that the chromosomes are organized in topological domains 30 , it follows naturally that nearby genes often show correlated expression across cell types 29 . However, co-bursting is, by definition, different from co-expression, and to what extent nearby genes may have coordinated bursting is debated. Reports of transcriptional hubs 15 , 16 and condensates 17 indirectly imply co-bursting, and co-bursting was recently observed in fruit flies by real-time imaging of transgenes 4 and paralogues 19 . Spatial coupling of bursting events was also recently reported when analysing nascent RNA 18 . The single-cell new RNA profiling across nearly 10,000 fibroblasts at allelic resolution enabled us to investigate this question transcriptome-wide. Importantly, full-length scRNA-seq methods can easily monitor allelic expression from the detection of transcribed single-nucleotide polymorphisms within sequenced reads, often employing crosses of genetically distant mouse strains (here CAST and C57Bl/6) 23 , 25 . We investigated whether new RNA counts of pairs of genes were more often found from the same allele (for example, both CAST) compared with cells with pairs of new RNA counts from the other allele (for example, one CAST and one C57). No such pattern was present in the data; instead, most genes have similar cell counts from the same and different alleles, indicative of independent transcriptional processes with occasional co-bursting happening by chance. The few outliers we detected were mostly technical, for example, two imprinted genes or two pseudogenes expressed from the same RNA transcript. 
We did see examples of paralogues with higher proportion of cells with nascent RNA from the same alleles, possibly indicating that certain paralogue gene pairs may be more prone to co-bursting. This is in stark contrast to the two alleles of the same genes that are always statistically independent, despite most often having identical gene regulatory elements. Thus, shared gene regulatory elements (as for paralogues 19 ) together with closer genomic distances, might be a prerequisite for the sparse, few examples of coordinated co-bursting in eukaryotic genomes.
It is important to note the limitations of this study. It is not fully understood to what extent the 4sU exposure and incorporation into RNA affect the cells and the library construction. Typically, the complexity of single-cell RNA-seq libraries from 4sU-exposed cells is lower than that of libraries from cells unexposed to 4sU, and with new RNA profiling in single cells the burst sizes obtained seem systematically underestimated compared with inference from standard scRNA-seq 9 and other studies 10 , 11 . This probably reflects a combination of slight hindrance of 4sU-incorporated RNAs during library construction and false-negative assignment of RNAs due to too sparse T-to-C conversions in the sequenced reads. Yet, with the single-cell new RNA profiling method we developed in this study, NASC-seq2, we show that it is possible to detect over 100,000 RNA molecules per cell (Fig. 1f ) for sensitive analyses of transcriptional dynamics at the resolution of individual bursts.
Primary mouse fibroblasts were obtained from the tail of a 5-month-old female CAST/EiJ × C57BL/6J mouse (ethical permit numbers N95/15 and 13572-2020 from Jordbruksverket) by removal of the tail skin and culturing the minced tail in Dulbecco’s modified Eagle medium (high glucose, Gibco) supplemented with 10% embryonic stem cell foetal bovine serum (FBS; Gibco), 1% penicillin–streptomycin (Gibco), 1% non-essential amino acids (Gibco), 1% sodium pyruvate (Gibco) and 0.1 mM β-mercaptoethanol (Gibco) in a humidified incubator at 37 °C and 5% CO 2 . Tail explants were removed after 5 days, and the cultures were passaged twice to obtain a fibroblast culture. Cells were then frozen and stored in 90% FBS and 10% DMSO until needed for the experiment.
Cells were thawed, passaged twice and seeded at 10,000 cells per well of a six-well plate. The next day, 4-thiouridine (4sU; Sigma) was added to the growth medium to a final concentration of 200 µM. No 4sU was added to the wells that served as unlabelled control. After 2 h of labelling time, cells were detached using TrypLE (Gibco), washed in cold Dulbecco’s phosphate-buffered saline and transferred through a mesh (35 µm, Corning), keeping the cell suspension on ice until sorting. We performed four experiments as described, after which cells were pooled for fluorescence-activated cell sorting (FACS) and downstream analyses.
K562 cells were obtained from DSMZ and authenticated by DSMZ Identification Service according to standards for STR profiling (ASN-0002). K562 cells (100,000) cultured in RPMI (Gibco) supplemented with 10% FBS (Sigma-Aldrich), 1× GlutaMAX supplement (Gibco) and 1% penicillin–streptomycin (Gibco) were seeded into each well of a six-well plate on the day before the experiment. Cells were labelled for 2 h with 200 µM 4sU, washed and resuspended in cold Dulbecco’s phosphate-buffered saline and transferred through a 35 µm mesh and kept on ice until sorting.
A 5′ molecular spike plasmid pool 31 was in vitro transcribed using a T7 MaxiScript (Thermo Fisher Scientific) according to the manufacturer’s protocol, with 10% of dUTP replaced by 4sUTP (Jena Bioscience). The resulting spUMI was purified by RNeasy column (Qiagen).
Cells were sorted on a BD FACS Melody cell sorter (BD Biosciences). Propidium iodide (Invitrogen) was used to stain for dead cells and the general gating strategy during FACS is shown in Extended Data Fig. 1i . From the live population, single cells were sorted through a 100 µm nozzle into 384-well polymerase chain reaction (PCR) plates with 0.3 µl lysis buffer containing 2.5 U µl −1 recombinant RNase inhibitor (RRI, 40 U µl −1 , Takara), 0.04 pg µl −1 of the aforementioned 4sU-containing spUMI pool and 0.1% Triton X-100, overlaid by 3 µl Vapor-Lock (Qiagen) per well. Plates were sealed, spun down immediately after sorting and stored at −80 °C until further processing.
Plates were taken from −80 °C, briefly spun down and kept on ice until adding 0.3 µl of alkylation mix containing 50 mM Tris–HCl (pH 8.4), 10 mM iodoacetamide (Sigma-Aldrich, dissolved in DMSO) and a total percentage of 45% DMSO (concentrations calculated for 0.6 µl alkylation volume). Plates were alkylated for 15 min at 50 °C and quenched with 0.4 µl of quenching mix containing 35 mM dithiothreitol (DTT, Sigma), 0.5 mM dNTPs, 0.6 µM oligo-dT primer (5′-biotin-ACGAGCATCAGCAGCATACGATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTVN-3′; IDT) and 0.4 U µl −1 RRI for 5 min at room temperature followed by a denaturation step at 72 °C for 10 min. Concentrations were calculated for 1 µl final quenching volume (DTT) and 4 µl total reaction volume of RT (dNTPs, oligo-dT primer, RRI), respectively. For individual molecule counting, NASC-seq2 uses the UMI-containing Smart-seq3 template-switching oligo 32 (5′-biotin-AGAGACAGATTGCGCAATGNNNNNNNNrGrGrG-3′; IDT) in the RT reaction. A mix of 25 mM Tris–HCl (pH 8.0), 35 mM NaCl, 1 mM GTP (Tris-buffered, Thermo Fisher Scientific), 2.5 mM MgCl 2 , 5% Polyethylene Glycol (PEG), 2 mM DTT, 0.4 U µl −1 RRI, 2 µM template-switching oligo and 2 U µl −1 Maxima H-minus reverse transcriptase (Thermo Fisher Scientific) was prepared with indicated concentrations applying to the final RT volume of 4 µl. The dilution of DMSO from 45% in the alkylation reaction to below 7% in the RT reaction allows for the use of Maxima H-minus RT enzyme, which is sensitive to high concentrations of DMSO. Three microlitres of RT mix was dispensed into each well and incubated at 42 °C for 90 min, 10 cycles of 50 °C and 42 °C for 2 min each, and final denaturation for 5 min at 85 °C. The remaining library preparation follows the smart-seq3 protocol 32 .
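The DMSO dilution stated above can be checked with a short calculation from the volumes given in the protocol (0.6 µl alkylation volume, 1 µl after quenching and 4 µl in the RT reaction). The helper name is an illustrative assumption, and the calculation assumes that no DMSO is contributed by the quenching or RT mixes.

```python
def percent_after_dilution(percent, volume_ul, final_volume_ul):
    """Concentration (%) after diluting volume_ul to final_volume_ul."""
    return percent * volume_ul / final_volume_ul

ALKYLATION_VOLUME_UL = 0.6  # 0.3 µl lysis buffer + 0.3 µl alkylation mix
QUENCHED_VOLUME_UL = 1.0    # after adding 0.4 µl quenching mix
RT_VOLUME_UL = 4.0          # after adding 3 µl RT mix

dmso_after_quench = percent_after_dilution(45.0, ALKYLATION_VOLUME_UL, QUENCHED_VOLUME_UL)
dmso_in_rt = percent_after_dilution(45.0, ALKYLATION_VOLUME_UL, RT_VOLUME_UL)
print(f"DMSO after quenching: {dmso_after_quench:.2f}%")
print(f"DMSO in RT reaction: {dmso_in_rt:.2f}%")
```

Under this assumption the RT reaction contains 6.75% DMSO, consistent with the stated value of below 7%.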
Briefly, 6 µl pre-amplification PCR mix was added per well containing 1× KAPA HiFi PCR buffer (2 mM Mg at 1×, Roche), 0.02 U µl −1 KAPA HiFi HotStart DNA polymerase (Roche), 0.5 µM forward primer (5′-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGATTGCGCAATG-3′; IDT) and 0.1 µM reverse primer (5′-ACGAGCATCAGCAGCATACGA-3′; IDT), 0.3 mM dNTPs and 0.5 mM MgCl 2 . PCR was performed at 98 °C for 3 min, 21 cycles of 98 °C for 20 s, 65 °C for 30 s and 72 °C for 4 min, followed by 5 min final extension at 72 °C.
Amplified cDNA was cleaned up with 6 µl of home-made SPRI beads in 22% PEG and eluted in 12 µl ultrapure water (Invitrogen). Per-well cDNA concentrations were quantified using the QuantiFluor dsDNA assay (Promega) on a FLUOstar Omega plate reader (BMG Labtech) and diluted with ultrapure water to 200 pg µl −1 .
One microlitre of diluted cDNA was tagmented with 1 µl tagmentation mix containing TD buffer (final concentration 10 mM Tris–HCl (pH 7.5), 5 mM MgCl 2 and 5% N , N -dimethylformamide) and 0.08 µl Tn5 enzyme (ATM, Illumina Nextera XT sample preparation kit) per well for 10 min at 55 °C. Next, 0.5 µl of freshly prepared 0.2% sodium dodecyl sulfate was added and incubated for 5 min at room temperature.
For tagmentation PCR, 1.5 µl of custom Nextera index primers (0.5 µM each) was added per well followed by the addition of 3 µl tagmentation PCR mix with concentrations of 1× Phusion HF buffer (Thermo Fisher Scientific), 0.2 mM dNTPs and 0.01 U µl −1 Phusion DNA polymerase (Thermo Fisher Scientific) in the final PCR volume of 7 µl. Tagmentation PCR was performed at 72 °C for 3 min for gap filling, initial denaturation at 98 °C for 3 min, followed by 12 cycles of 98 °C for 10 s, 55 °C for 30 s and 72 °C for 30 s, with final elongation of 5 min at 72 °C. Final indexed library was pooled and purified using 0.6× volume home-made SPRI beads in 22% PEG. Library concentrations were quantified using the dsDNA Qubit kit (Invitrogen) and visualized using the Agilent bioanalyzer high-sensitivity DNA kit. The full protocol of NASC-seq2 has also been deposited in protocols.io ( https://doi.org/10.17504/protocols.io.6qpvr43nogmk/v1 ).
Libraries were pooled and converted into circular libraries through five cycles of adapter conversion PCR and circularization of 1 pmol of the product using the MGIEasy Universal Library Conversion kit App-A (MGI Tech.). Final circular, single-stranded DNA (ssDNA) sequencing libraries were quantified with the ssDNA Qubit kit (Invitrogen). DNA nanoballs were made from the circular ssDNA library pools using a custom primer (5′-TCGCCGTATCATTCAAGCAGAAGACG-3′) for rolling-circle amplification and loaded onto Flow Cell Large (FCL) flow cells (MGI Tech.). Custom sequencing primers were added to the sequencing cartridges (read 1: 5′-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-3′; MDA: 5′-CGTATGCCGTCTTCTGCTTGAATGATACGGCGAC-3′; read 2: 5′-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG-3′; i7 index: 5′-CCGTATCATTCAAGCAGAAGACGGCATACGAGAT-3′; i5 index: 5′-CTGTCTCTTATACACATCTGACGCTGCCGACGA-3′). The resulting libraries were sequenced on an MGI DNBseq G400RS using StandardMPS PE200 reagents.
zUMIs 33 (v. 2.9.7) was used to process raw FASTQ files. First, reads were filtered on the basis of the cell barcode quality (five bases with a Phred score <20). Reads containing the 5′ UMI were identified by the pattern ATTGCGCAATG with up to two mismatches. UMI-containing reads with a poor-quality UMI sequence were filtered out (three bases with a Phred score <20). Reads were mapped to the human (hg38) or mouse (mm39) genome using STAR (v. 2.7.1 for human and v. 2.7.3a for mouse), and error-corrected UMI counts were quantified based on gene annotations (ENSEMBL GRCh38.95 for human and GENCODE GRCm39.vM29 for mouse). Previously published K562 NASC-seq cells 20 were re-processed as above so that processing and gene annotations were identical to those of the K562 NASC-seq2 data.
Partial reconstruction of RNA molecules up to 1 kb is feasible from paired-end short-read sequencing data where the 5′ end contains a UMI 32 . Since the ability to separate new and old RNA is highly dependent on the RNA sequence length (Extended Data Fig. 1a ), reconstructing RNA sequences longer than those obtained from the paired reads alone benefits the RNA classification. To this end, we used the UMI-containing reads and reconstructed their other paired reads using stitcher.py (https://github.com/AntonJMLarsson/stitcher.py). Briefly, paired-end reads with the same error-corrected UMI sequences were grouped, and the merged reconstructed sequence was written to a new BAM file. The Phred scores of the merged sequences were propagated in cases where more than one read sequence covered a base. The likelihood of each base call was derived from the Phred score, and the remaining likelihood of the other three bases being correct was distributed equally. If the initial sequencer base call was N, the likelihood was distributed equally over all four bases. The most likely base call for each position was calculated using the softmax function, and probability scores above 0.3 were considered sufficient for an A, T, C or G base call. If the probability was below 0.3, the base call became N. The corresponding Phred score was calculated as −10 × log10(1 − p_max).
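The per-position merging rule described above can be sketched as follows. This is an illustrative reading of the text, not the stitcher.py implementation: the function names, the cap on the error probability for very low Phred scores and the maximum output Phred of 60 are assumptions made here.

```python
import math

BASES = "ACGT"


def base_log_likelihoods(base, phred):
    """Log-likelihood of each true base given one observed call and its Phred score."""
    if base == "N":
        # An N call carries no information: equal likelihood for all four bases.
        return [math.log(0.25)] * 4
    # Error probability from the Phred score; capped at 0.75 (uniform) as an assumption.
    p_err = min(10 ** (-phred / 10), 0.75)
    # The called base gets 1 - p_err; the remainder is split equally over the other three.
    return [math.log(1 - p_err) if b == base else math.log(p_err / 3) for b in BASES]


def merge_position(calls, threshold=0.3, max_phred=60):
    """Merge (base, phred) calls covering one position into a consensus call and Phred score."""
    totals = [0.0] * 4
    for base, phred in calls:
        for i, ll in enumerate(base_log_likelihoods(base, phred)):
            totals[i] += ll
    # Softmax over the summed log-likelihoods gives a probability per base.
    shift = max(totals)
    weights = [math.exp(t - shift) for t in totals]
    norm = sum(weights)
    probs = [w / norm for w in weights]
    best = max(range(4), key=lambda i: probs[i])
    if probs[best] <= threshold:
        return "N", 0
    # Phred = -10 * log10(1 - p_max), with 1 - p_max summed directly for precision.
    p_wrong = sum(p for i, p in enumerate(probs) if i != best)
    if p_wrong <= 0.0:
        return BASES[best], max_phred
    return BASES[best], min(max_phred, round(-10 * math.log10(p_wrong)))


print(merge_position([("A", 30), ("A", 30)]))  # two agreeing reads raise confidence
print(merge_position([("N", 0)]))              # uninformative call stays N
```

Because the four probabilities sum to one, the most likely base can never fall below 0.25, so the 0.3 threshold mainly rejects positions where the reads carry almost no information.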
Reconstructed molecules were compared with the reference genome to extract the number of mismatches. Depending on gene strand, the position of each T > C or A > G mismatch was saved. Mismatches were evaluated using a binomial distribution B(k | n, p), where k is the number of molecules with the observed mismatch in that position, n is the number of molecules that cover the position and p is the median fraction of mismatches over all positions. All positions considered significant (α = 0.05 after Bonferroni correction) were masked.
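A minimal sketch of this masking step is given below. The text does not state whether the binomial test is one- or two-sided, so a one-sided upper-tail test is assumed here; the function names and the input layout (a dictionary from position to (k, n)) are also illustrative.

```python
import math
import statistics


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def positions_to_mask(counts, alpha=0.05):
    """counts maps position -> (k, n): molecules with the mismatch, molecules covering it."""
    covered = {pos: kn for pos, kn in counts.items() if kn[1] > 0}
    # p is the median mismatch fraction over all covered positions.
    p = statistics.median(k / n for k, n in covered.values())
    # Bonferroni correction: divide alpha by the number of tested positions.
    threshold = alpha / len(covered)
    return sorted(pos for pos, (k, n) in covered.items()
                  if binom_upper_tail(k, n, p) < threshold)


example = {100: (1, 200), 101: (2, 200), 102: (1, 200), 103: (150, 200), 104: (2, 200)}
print(positions_to_mask(example))
```

In the example, position 103 carries the mismatch in 150 of 200 molecules against a median fraction of 0.01, which is the pattern expected of a genomic variant rather than a 4sU-induced conversion.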
To estimate the probability of observing a converted position (T > C for positively stranded genes and A > G for negatively stranded genes), we applied the previously described expectation-maximization algorithm 20 , 34 . Briefly, the mismatch statistics were gathered for each reconstructed molecule for each cell. We consider the mismatches to arise from a two-component binomial mixture, where one component is governed by the conversion probability ( \(p_{\mathrm{c}}\) ) and the other component is governed by the error probability ( \(p_{\mathrm{e}}\) ). To estimate the error probability, we used the mean of the C > T and G > A mismatch rates. The statistics for the mismatch expected in new molecules (T > C or A > G) and the \(p_{\mathrm{e}}\) were used in the expectation-maximization algorithm to obtain the \(p_{\mathrm{c}}\) estimate. For molecule-level hypothesis testing, we used the likelihood-ratio test with the null hypothesis \(H_{0}\colon p = p_{\mathrm{e}}\) and alternative hypothesis \(H_{\mathrm{A}}\colon p = p_{\mathrm{c}}\) with a binomial likelihood at \(\alpha = 0.05\). Each molecule was then genotyped according to the observed single-nucleotide variants that have been validated to segregate the two mouse strains, only genotyping the molecule as maternal or paternal if exclusively maternal or paternal variants were observed.
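The two-component binomial mixture can be fitted with a textbook expectation-maximization loop, sketched below with the error probability held fixed. This is a generic EM written for illustration and may differ in detail from the cited algorithm; the function name, starting values, clamping bounds and the toy data are all assumptions.

```python
import math


def estimate_conversion_probability(molecules, p_e, p_c_init=0.05, pi_init=0.5,
                                    n_iter=500, tol=1e-10):
    """EM for a mixture pi * Binom(n, p_c) + (1 - pi) * Binom(n, p_e) with p_e fixed.

    molecules is a list of (k, n): converted positions and convertible positions.
    Returns the estimated p_c and the estimated fraction of new molecules pi.
    """
    eps = 1e-9
    p_c, pi = p_c_init, pi_init
    for _ in range(n_iter):
        # E-step: posterior probability that each molecule is new.
        resp = []
        for k, n in molecules:
            # The binomial coefficient is identical in both components and cancels.
            log_new = math.log(pi) + k * math.log(p_c) + (n - k) * math.log1p(-p_c)
            log_old = math.log(1 - pi) + k * math.log(p_e) + (n - k) * math.log1p(-p_e)
            shift = max(log_new, log_old)
            w_new, w_old = math.exp(log_new - shift), math.exp(log_old - shift)
            resp.append(w_new / (w_new + w_old))
        # M-step: weighted updates, clamped away from 0 and 1 to keep logs finite.
        new_pi = min(max(sum(resp) / len(resp), eps), 1 - eps)
        weighted_n = sum(r * n for r, (k, n) in zip(resp, molecules))
        weighted_k = sum(r * k for r, (k, n) in zip(resp, molecules))
        new_p_c = min(max(weighted_k / max(weighted_n, eps), eps), 1 - eps)
        converged = abs(new_p_c - p_c) < tol and abs(new_pi - pi) < tol
        p_c, pi = new_p_c, new_pi
        if converged:
            break
    return p_c, pi


# Toy data: 70 molecules without conversions and 30 with 2-4 conversions in 50 positions.
toy = [(0, 50)] * 70 + [(2, 50)] * 10 + [(3, 50)] * 10 + [(4, 50)] * 10
p_c_hat, pi_hat = estimate_conversion_probability(toy, p_e=0.001)
print(f"p_c = {p_c_hat:.4f}, fraction new = {pi_hat:.3f}")
```

With these estimates in hand, each molecule can be tested against the error-only null hypothesis described in the text.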
Using the steady-state probability generating function for the telegraph model, we applied standard tools from mathematical analysis to derive the probability distribution of observing n molecules after labelling time t . Because evaluation of Kummer’s function, which is crucial to this computation, is numerically unstable in Python libraries (for example, scipy), we implemented the computation using the C library Arb 35 , which supports arbitrary-precision arithmetic and contains a very accurate module to compute Kummer’s function. More information on the derivation of the new RNA probability mass function is available in Supplementary Note 1 , and the Arb implementation is available on GitHub (see ‘Code availability’ statement).
Based on the valley of the bimodal distribution of reconstructed molecules per sample, K562 cells were retained if they had more than 2,700 reconstructed molecules. For fibroblasts, we required more than 4,000. Removing these low-quality samples eliminated bimodal distributions in the kinetic estimates that appeared to be artefacts.
As the function in Supplementary Note 1 takes kinetic parameters and outputs the probability distribution of observing n molecules, and our aim is to go in the opposite direction, from molecular count data to the inference of kinetic parameters, we devised a new strategy to numerically invert the function. In this three-step strategy, a look-up table was first created and used to approximate kinetic parameters from count summary statistics; these approximations were then used as parameter initialization in the subsequent maximum likelihood inference. To create the look-up table, we first simulated new RNA counts for combinations of kinetic parameters. The new RNA count vector obtained for each kinetic parameter combination was summarized using three summary statistics (fraction of cells with new RNA expression, average expression among cells with expression, and coefficient of variation in new RNA counts among cells with expression). The kinetic parameters used for modelling, together with the summary statistics, were turned into a look-up table. The simulations spanned 73 different k_on values (from 0.002 to 50), 38 different k_syn values (1 to 200) and 55 different k_off values (0.25 to 500). The values were chosen to be equidistant on a log scale, in steps of ~1.15×. We chose these boundaries because they covered the distribution of the data, and the upper limits on k_syn and k_off were needed because larger values resulted in bimodal errors in simulations. The 4sU incubation period was set to 2 h, and the gene-level degradation rates were inferred as 0.065 h⁻¹ (calculated as −ln(fraction_old)/time using fibroblasts). To avoid propagation of numerical errors, we used 10,000-bit floating point numbers for the formula in Supplementary Note 1 . Where possible when performing look-up searches, we used linear interpolation to identify initial kinetic parameters, and when linear interpolation failed, we simply identified the closest trio of kinetic parameters.
Due to limitations in the linear interpolation, only summary values corresponding to the expression interval 0.01–350 counts per cell were used.
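The parameter grids and the three summary statistics behind the look-up table can be sketched as follows. The grid boundaries and sizes are taken from the text; the function names are illustrative, and the use of the population standard deviation for the coefficient of variation is an assumption, since the text does not specify it.

```python
import statistics


def log_grid(lo, hi, n):
    """n values from lo to hi, equidistant on a log scale; also returns the step factor."""
    step = (hi / lo) ** (1 / (n - 1))
    return [lo * step ** i for i in range(n)], step


def summary_statistics(counts):
    """Fraction of cells with expression, mean among expressing cells, and their CV."""
    expressed = [c for c in counts if c > 0]
    fraction = len(expressed) / len(counts)
    if not expressed:
        return fraction, 0.0, 0.0
    mean = statistics.fmean(expressed)
    cv = statistics.pstdev(expressed) / mean  # population SD assumed
    return fraction, mean, cv


k_on_grid, k_on_step = log_grid(0.002, 50, 73)
k_syn_grid, k_syn_step = log_grid(1, 200, 38)
k_off_grid, k_off_step = log_grid(0.25, 500, 55)

print(f"step factors: {k_on_step:.3f}, {k_syn_step:.3f}, {k_off_step:.3f}")
print(f"table entries: {len(k_on_grid) * len(k_syn_grid) * len(k_off_grid)}")
print(summary_statistics([0, 0, 2, 4]))
```

All three step factors come out close to the ~1.15× quoted in the text, and the full grid contains 73 × 38 × 55 parameter combinations.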
Next, we calculated the same three summary statistics for the observed counts and used them in conjunction with the look-up table to interpolate kinetic parameters or, where interpolation broke down, to take the closest entry (in terms of summary statistics) in the look-up table. If the identified kinetic parameters were on the boundary, the gene was excluded. Bootstrapping was used on the initial cells to infer confidence intervals for each parameter (50 bootstraps per gene), and we denoted inferences as robust (or ‘controlled’) when the difference between the upper and lower quartiles was below 2 and when less than 50% of the bootstrap inferences failed to identify parameters.
The kinetic parameters identified from the look-up table were next used as initialization for maximum likelihood estimation of the kinetic parameters k_on, k_syn and k_off. This used a log-likelihood function for the match between the observed count distribution and the formula’s probability distribution of observing n molecules given the kinetic parameters. An optimization algorithm (L-BFGS-B) searched for the kinetic parameters that maximized the likelihood.
Based on the kinetic estimates, we also calculated mean occupancy (fraction of time in on state) = k_on/(k_on + k_off), burst size (RNAs per on state) = k_syn/k_off, burst frequency (on states per time) = 1/(1/k_on + 1/k_off), and expression rate (RNAs per time) = k_syn × k_on/(k_on + k_off).
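The derived quantities follow directly from the three rates, as in this short sketch (the function name and dictionary keys are illustrative):

```python
def burst_summary(k_on, k_syn, k_off):
    """Derived bursting quantities from the telegraph-model rates."""
    return {
        "mean_occupancy": k_on / (k_on + k_off),         # fraction of time in on state
        "burst_size": k_syn / k_off,                     # RNAs per on state
        "burst_frequency": 1 / (1 / k_on + 1 / k_off),   # on states per time
        "expression_rate": k_syn * k_on / (k_on + k_off),  # RNAs per time
    }


summary = burst_summary(k_on=1.0, k_syn=20.0, k_off=4.0)
print(summary)
```

A useful consistency check is that the expression rate equals burst size multiplied by burst frequency, since (k_syn/k_off) × k_on × k_off/(k_on + k_off) = k_syn × k_on/(k_on + k_off).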
For figures where the data were split in halves, the geometric standard deviation was calculated as exp(sqrt(avg((log(half1) − log(half2))^2 for genes)/2)). For figures without that split, it was calculated as geom_avg(95%CI_high/95%CI_low for genes)^(1/(2 × 1.96)) on the basis of bootstrapped confidence intervals.
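Both geometric standard deviation formulas translate directly into code. The sketch below uses illustrative function names and takes per-gene values as plain lists; the division by 2 inside the square root follows the bracketing of the formula in the text.

```python
import math
import statistics


def gsd_from_halves(half1, half2):
    """exp(sqrt(mean((log(half1) - log(half2))^2 over genes) / 2))."""
    squared = [(math.log(a) - math.log(b)) ** 2 for a, b in zip(half1, half2)]
    return math.exp(math.sqrt(statistics.fmean(squared) / 2))


def gsd_from_ci(ci_low, ci_high):
    """Geometric mean of CI_high/CI_low over genes, raised to 1/(2 * 1.96)."""
    log_ratios = [math.log(hi / lo) for lo, hi in zip(ci_low, ci_high)]
    geom_avg = math.exp(statistics.fmean(log_ratios))
    return geom_avg ** (1 / (2 * 1.96))


print(gsd_from_halves([2.0, 2.0], [1.0, 1.0]))  # 2 ** (1 / sqrt(2))
print(gsd_from_ci([1.0], [math.exp(3.92)]))     # a log-CI 3.92 wide gives e
```

The exponent 1/(2 × 1.96) converts the width of a 95% interval on the log scale (2 × 1.96 standard deviations) back to one standard deviation, and the division by 2 in the first formula accounts for the difference of two independent halves having twice the variance of one.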
Through simulation, we created count tables of 4,000 cells from the distributions of the formula in Supplementary Note 1 using k_on values of 0.2, 0.8944 and 4, k_syn values of 10, 22.36 and 50 and k_off values of 30, 77.46 and 200, corresponding to low, middle and high values of their distributions, with 100 simulations per value combination, degradation at 0.065 h⁻¹ and time either 2 h (for nascent RNA) or 1,000 h (for pre-existing + nascent RNA). We then estimated parameters from each simulated data set and plotted the deviation of inferred parameters from the input (true) values of k_on, k_syn and k_off. Estimation of kinetic parameters from total (pre-existing + new) RNA was done as previously described 9 .
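The study drew counts from the analytical distribution in Supplementary Note 1, which is not reproduced here. As a stand-in, the sketch below uses a Gillespie (stochastic simulation) run of the two-state telegraph model with degradation, which is a different technique aimed at the same kind of count table. The function names, the stationary starting state of the promoter and the zero starting count of new RNA are assumptions.

```python
import math
import random


def simulate_new_rna(k_on, k_off, k_syn, deg, t_label, rng):
    """New RNA count in one cell after t_label hours of labelling (Gillespie algorithm)."""
    # Promoter starts in its stationary state; no labelled RNA exists at time zero.
    on = rng.random() < k_on / (k_on + k_off)
    t, n = 0.0, 0
    while True:
        switch_rate = k_off if on else k_on
        synth_rate = k_syn if on else 0.0
        deg_rate = deg * n
        total = switch_rate + synth_rate + deg_rate
        t += rng.expovariate(total)
        if t >= t_label:
            return n
        u = rng.random() * total
        if u < switch_rate:
            on = not on
        elif u < switch_rate + synth_rate:
            n += 1
        elif n > 0:
            n -= 1


def simulate_cells(n_cells, k_on, k_off, k_syn, deg=0.065, t_label=2.0, seed=0):
    rng = random.Random(seed)
    return [simulate_new_rna(k_on, k_off, k_syn, deg, t_label, rng) for _ in range(n_cells)]


# Middle values from the text: k_on = 0.8944, k_syn = 22.36, k_off = 77.46.
counts = simulate_cells(4000, k_on=0.8944, k_off=77.46, k_syn=22.36)
expected_mean = 22.36 * 0.8944 / (0.8944 + 77.46) / 0.065 * (1 - math.exp(-0.065 * 2.0))
print(f"simulated mean: {sum(counts) / len(counts):.3f}, expected mean: {expected_mean:.3f}")
```

Because the promoter starts in its stationary state, the expected count after labelling time t is k_syn × k_on/(k_on + k_off)/d × (1 − exp(−d × t)) with degradation rate d, which gives a simple check on the simulated mean.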
Read-level quantification of temporal state and allelic origin was performed for the co-bursting analysis (Fig. 5 ), although similar results were reached when using molecule-level assignment of temporal state and allelic origin. For each read pair (all sequencing was performed using PE200), we assigned the read to allelic origin based on the observed single-nucleotide variants that separate the two mouse strains (CAST and C57Bl/6). Reads without transcribed genetic variation were removed. Next, using the 4sU-induced mismatches per read, we performed hypothesis testing to assign reads as new. The hypothesis testing was performed using the \(p_{\mathrm{e}}\) and \(p_{\mathrm{c}}\) estimates obtained in the molecule-level analysis above.
Pairs of read-level new RNA counts per gene were compared using Spearman correlations in scipy, either using all new RNA counts or after stratifying RNA counts by allelic origin. Comparisons of new RNA counts between two genes on the same chromosome were named cis , whereas non-meaningful comparisons of two genes on different chromosomes were named trans . Additionally, chi-square and Fisher’s exact tests (scipy) were applied to allele-assigned new RNA counts for gene pairs, with similar conclusions.
No statistical method was used to pre-determine sample sizes (that is, cell numbers) throughout the study. We aimed to create a highly informative data set on transcriptional bursting in one cell type with both high cell numbers (close to 10,000) and deep RNA counting per cell (median 100,000) to have sufficient power to infer transcriptional bursting and co-bursting, since power increases with cell numbers (Fig. 2k ). The experiments were not randomized, and the investigators were not blinded to allocation during the experiments and outcome assessment. The NASC-seq2 data set from K562 cells was used for Fig. 1a and Extended Data Figs. 1 and 4 and was based on one biological experiment involving 613 post-quality-control filtered cells, where data for each cell were treated independently throughout NASC-seq2 library preparation and analyses. The improvement with NASC-seq2 over NASC-seq has been repeatedly observed in K562 cells and primary fibroblasts. The large-scale NASC-seq2 experiment on F1 primary fibroblasts was generated from four technical replicates of primary fibroblast cultures that were independently exposed to 4sU and collected for FACS and NASC-seq2 library construction; each cell was treated independently throughout NASC-seq2 library preparation and initial analyses. For transcriptional dynamics and co-bursting analyses of the primary fibroblasts (Figs. 1 – 5 and Extended Data Figs. 2 , 3 and 5 – 8 ), cells from all replicates were pooled before analyses since they had uniform transcriptional patterns (Fig. 1c ). Inferences of transcriptional bursting parameters were performed in parallel on independent subsets of cells to avoid spurious correlations among inferred parameters (as described in detail in Methods ). Throughout the study, P values refer to two-tailed tests, with the details on the statistical analysis performed for each type of data analysis reported in the respective Methods section. Spearman correlation analyses (in Figs. 3d and 4d and Extended Data Fig. 4d ) were used since only non-negative real numbers (including zeros) were possible, and since the parameters did not easily follow a normal or lognormal distribution. For Extended Data Fig. 8 , analyses used a binomial test to assess whether the median departed significantly from zero (as it allows the null hypothesis that correlations are distributed above or below zero with equal probability). In Extended Data Fig. 3b,c , we used Hartigan’s dip test for unimodality because the data seem to follow a normal distribution (Extended Data Fig. 3a ).
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Raw NASC-seq2 sequencing data (K562 and primary fibroblast cells) have been deposited in ENA (accession ID: PRJEB60799 ) and source data have been deposited in Zenodo (https://doi.org/10.5281/zenodo.12092003). Kinetic estimates, code and count tables are available on GitHub (https://github.com/sandberg-lab/NASC-seq2). We downloaded genome sequences from UCSC Genome Browser (mouse: GRCm39/mm39 and human: GRCh38/hg38) and GENCODE gene annotations (human ENSEMBL GRCh38.95 and mouse GRCm39.vM29). Source data are provided with this paper.
We have deposited code for processing and analyses of NASC-seq2 data on GitHub ( https://github.com/sandberg-lab/NASC-seq2 ).
1. McKnight, S. L. & Miller, O. L. Post-replicative nonribosomal transcription units in D. melanogaster embryos. Cell 17, 551–563 (1979).
2. Chubb, J. R., Trcek, T., Shenoy, S. M. & Singer, R. H. Transcriptional pulsing of a developmental gene. Curr. Biol. 16, 1018–1025 (2006).
3. Larson, D. R., Zenklusen, D., Wu, B., Chao, J. A. & Singer, R. H. Real-time observation of transcription initiation and elongation on an endogenous yeast gene. Science 332, 475–478 (2011).
4. Fukaya, T., Lim, B. & Levine, M. Enhancer control of transcriptional bursting. Cell 166, 358–368 (2016).
5. Dar, R. D. et al. Transcriptional burst frequency and burst size are equally modulated across the human genome. Proc. Natl Acad. Sci. USA 109, 17454–17459 (2012).
6. Levsky, J. M., Shenoy, S. M., Pezo, R. C. & Singer, R. H. Single-cell gene expression profiling. Science 297, 836–840 (2002).
7. Raj, A., Peskin, C. S., Tranchina, D., Vargas, D. Y. & Tyagi, S. Stochastic mRNA synthesis in mammalian cells. PLoS Biol. 4, e309 (2006).
8. Bartman, C. R., Hsu, S. C., Hsiung, C. C.-S., Raj, A. & Blobel, G. A. Enhancer regulation of transcriptional bursting parameters revealed by forced chromatin looping. Mol. Cell 62, 237–247 (2016).
9. Larsson, A. J. M. et al. Genomic encoding of transcriptional burst kinetics. Nature 565, 251–254 (2019).
10. Nicolas, D., Phillips, N. E. & Naef, F. What shapes eukaryotic transcriptional bursting? Mol. Biosyst. 13, 1280–1290 (2017).
11. Rodriguez, J. & Larson, D. R. Transcription in living cells: molecular mechanisms of bursting. Annu. Rev. Biochem. 89, 189–212 (2020).
12. Taniguchi, Y. et al. Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science 329, 533–538 (2010).
13. Yunger, S., Rosenfeld, L., Garini, Y. & Shav-Tal, Y. Single-allele analysis of transcription kinetics in living mammalian cells. Nat. Methods 7, 631–633 (2010).
14. Zenklusen, D., Larson, D. R. & Singer, R. H. Single-RNA counting reveals alternative modes of gene expression in yeast. Nat. Struct. Mol. Biol. 15, 1263–1271 (2008).
15. Cisse, I. I. et al. Real-time dynamics of RNA polymerase II clustering in live human cells. Science 341, 664–667 (2013).
16. Fanucchi, S., Shibayama, Y., Burd, S., Weinberg, M. S. & Mhlanga, M. M. Chromosomal contact permits transcription between coregulated genes. Cell 155, 606–620 (2013).
17. Sharp, P. A., Chakraborty, A. K., Henninger, J. E. & Young, R. A. RNA in formation and regulation of transcriptional condensates. RNA 28, 52–57 (2022).
18. Bohrer, C. H. & Larson, D. R. Synthetic analysis of chromatin tracing and live-cell imaging indicates pervasive spatial coupling between genes. eLife 12, e81861 (2023).
19. Levo, M. et al. Transcriptional coupling of distant regulatory genes in living embryos. Nature 605, 754–760 (2022).
20. Hendriks, G.-J. et al. NASC-seq monitors RNA synthesis in single cells. Nat. Commun. 10, 3138 (2019).
21. Erhard, F. et al. scSLAM-seq reveals core features of transcription dynamics in single cells. Nature 571, 419–423 (2019).
22. Hagemann-Jensen, M., Ziegenhain, C. & Sandberg, R. Scalable single-cell RNA sequencing from full transcripts with Smart-seq3xpress. Nat. Biotechnol. 40, 1452–1457 (2022).
23. Deng, Q., Ramsköld, D., Reinius, B. & Sandberg, R. Single-cell RNA-seq reveals dynamic, random monoallelic gene expression in mammalian cells. Science 343, 193–196 (2014).
24. Peccoud, J. & Ycart, B. Markovian modeling of gene-product synthesis. Theor. Popul. Biol. 48, 222–234 (1995).
25. Reinius, B. et al. Analysis of allelic expression patterns in clonal somatic cells by single-cell RNA-seq. Nat. Genet. 48, 1430–1435 (2016).
26. Tarbier, M. et al. Nuclear gene proximity and protein interactions shape transcript covariations in mammalian single cells. Nat. Commun. 11, 5445 (2020).
27. Wan, Y. et al. Dynamic imaging of nascent RNA reveals general principles of transcription dynamics and stochastic splice site selection. Cell 184, 2878–2895.e20 (2021).
28. Muhar, M. et al. SLAM-seq defines direct gene-regulatory functions of the BRD4–MYC axis. Science 360, 800–805 (2018).
29. Hurst, L. D., Pál, C. & Lercher, M. J. The evolutionary dynamics of eukaryotic gene order. Nat. Rev. Genet. 5, 299–310 (2004).
30. Dixon, J. R. et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380 (2012).
31. Ziegenhain, C., Hendriks, G.-J., Hagemann-Jensen, M. & Sandberg, R. Molecular spikes: a gold standard for single-cell RNA counting. Nat. Methods 19, 560–566 (2022).
32. Hagemann-Jensen, M. et al. Single-cell RNA counting at allele and isoform resolution using Smart-seq3. Nat. Biotechnol. 38, 708–714 (2020).
33. Parekh, S., Ziegenhain, C., Vieth, B., Enard, W. & Hellmann, I. zUMIs—a fast and flexible pipeline to process RNA sequencing data with UMIs. Gigascience 7, giy059 (2018).
34. Jürges, C., Dölken, L. & Erhard, F. Dissecting newly transcribed and old RNA using GRAND-SLAM. Bioinformatics 34, i218–i226 (2018).
35. Johansson, F. Arb: efficient arbitrary-precision midpoint-radius interval arithmetic. IEEE Trans. Comput. 66, 1281–1292 (2017).
This work was supported by grants to R.S. from the Swedish Research Council, the Knut and Alice Wallenberg Foundation, Karolinska Institutet, the Göran Gustafsson Foundation and the Torsten Söderberg Foundation.
Open access funding provided by Karolinska Institute.
These authors contributed equally: Daniel Ramsköld, Gert-Jan Hendriks, Anton J. M. Larsson.
Department of Cell and Molecular Biology, Karolinska Institute, Stockholm, Sweden
Daniel Ramsköld, Gert-Jan Hendriks, Anton J. M. Larsson, Juliane V. Mayr, Christoph Ziegenhain, Michael Hagemann-Jensen, Leonard Hartmanis & Rickard Sandberg
R.S., G.-J.H. and A.J.M.L. conceived the overall study. G.-J.H. and M.H.-J. developed the NASC-seq2 method. G.-J.H. and J.V.M. performed the NASC-seq2 experiments of the study. A.J.M.L., C.Z. and L.H. developed the computational processing of sequence data and assignments to temporal and allelic states. A.J.M.L. derived the new RNA probability mass function and original implementations. D.R., A.J.M.L. and R.S. performed all biological analyses of bursting kinetics from NASC-seq2 data. R.S. wrote the manuscript, with input from A.J.M.L., D.R. and G.-J.H.
Correspondence to Rickard Sandberg .
Competing interests.
The authors declare no competing interests.
Peer review information.
Nature Cell Biology thanks Simon Anders for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended Data Fig. 1 NASC-seq2 quality control data for K562 cells.
( a ) Line plots showing the power to assign RNA molecules as new, as a function of the reconstructed sequence length, colored according to different 4sU-induced conversion levels (signal-to-noise: Pc / Pe). Dashed vertical lines show the power obtained without molecule-level reconstruction for single-end 75bp and paired-end 150bp sequencing, respectively. ( b ) Boxplot showing the inferred probability of conversion (Pc) in K562 cells (n = 613). ( c ) Boxplot showing the inferred probability of error (Pe) in K562 cells (n = 613). ( d ) Boxplot showing the signal-to-noise in K562 cells (n = 613). ( e ) Histogram summarizing the number of molecules and the power to assign them as new. ( f ) Boxplot showing the fraction of RNA molecules assigned as new in K562 cells. ( g ) Boxplots with number of genes (left) and RNA molecules (right) detected based on all RNAs (new and pre-existing). ( h ) Boxplots with number of genes (left) and RNA molecules (right) detected in new RNA. Source numerical data are available in source data. ( i ) Gating strategy for single-cell sorting into 384-well plates. Initial gating selects for cells and in particular singlets (avoiding doublets), whereas gating for Propidium Iodide selects for live cells.
( a ) Histogram with total number of genes detected using all RNA molecules (new+pre-existing), for cells with 4sU (blue) and control cells without 4sU (yellow). ( b ) Histogram with total number of RNA molecules (new+pre-existing), for cells with 4sU (blue) and control cells without 4sU (yellow). ( c ) Histogram with fraction new RNA, for cells with 4sU (blue) and control cells without 4sU (yellow). ( d ) Boxplot showing the inferred probability of conversion (Pc) in primary fibroblast data. ( e ) Boxplot showing the inferred probability of error (Pe) in primary fibroblast data. ( f ) Boxplots showing the probability of error across all primary fibroblast cells, stratified by plate column of cells. ( g ) Boxplots showing the probability of conversion across all primary fibroblast cells, stratified by plate column of cells. Note that cells in columns 1 and 2 on each plate of primary fibroblasts were control cells that were not exposed to 4sU. Source numerical data are available in source data.
( a ) Density plots showing the error in inferred parameters compared with the true input to the simulation. For each parameter, we simulated 4,000 individual cells for 27 combinations of kon , ksyn and koff values (corresponding to the low, middle and high end of their distributions), and repeated each combination 100 times to obtain statistics on the inference error and bias. Data in a–c were based on 27 independent simulations. ( b ) Hartigans’ dip test for fit to unimodality for the three error distributions (from a) and burst size. All were found to be unimodal. ( c ) Hartigans’ dip test for fit to unimodality for the three error distributions (from a) and burst size, when instead inferred from steady-state RNA counts (that is, new + pre-existing RNA). Here, koff and ksyn obtain bimodal error distributions, giving rise to a false bimodal distribution in parameters when not inferred from nascent RNA. ( d ) Contour plots showing correlations between burst frequency inferred from new RNA counts (x-axis) against burst frequency inferred from total RNA counts using the steady-state inference model (y-axis). ( e ) Contour plots showing correlations between burst size inferred from new RNA counts (x-axis) against burst size inferred from total RNA counts using the steady-state inference model (y-axis). ( f ) Estimation from allelic new RNA counts gives half as large kon values, as expected, but also lower ksyn values owing to inefficiency in assigning RNA molecules to both temporal state and allelic origin (that is, many RNA molecules are discarded, which brings down the synthesis rate). Therefore, fewer genes retain their small confidence intervals. Source numerical data are available in source data.
( a ) Contour plot of transcriptional on ( kon) and off ( koff) rates, inferred separately on two non-overlapping halves of the cells (306 and 307 cells per half), across 1,916 genes with robust inference of kon and burst size. ( b ) Contour plot of burst size (inferred from cell subset, or half, 2) against the synthesis rate, ksyn , (inferred from cell subset, or half, 1) (1,916 genes with robust inference of kon and burst size). ( c ) Contour plot of burst size (inferred from cell subset, or half, 2) against the off rate, koff , (inferred from cell subset, or half, 1) (1,916 genes with robust inference of kon and burst size). ( d ) Spearman correlation matrix of all measurements (asterisks) and derived estimates from K562 cells, when in parallel inferred from the two cell subsets (or halves) based on the 1,916 genes with robust kon and burst size inference. Geometric standard deviation indicates technical variation. Source numerical data are available in source data.
Four independent linear regressions were made to assess significant interactions between core promoter elements and each of these four quantities ( a : burst size, b : burst frequency, c : ksyn , d : koff ). The results of the modeling are presented, with pink rectangles highlighting the motifs, their interactions and p-values. The kinetic parameters were inferred from fibroblast new RNA, using genes with controlled (that is, low CI) burst size, kon, ksyn and koff. The results demonstrate that TATA motifs have a significant effect on burst size and ksyn , but not on burst frequency and koff .
( a ) Scatter plots of total new RNA counts from the CAST and C57 alleles (mean across cells) showing the lack of overall bias in allelic counts. ( b ) Fraction of new RNA reads from the C57 allele for autosomal genes (blue) and genes on the X-chromosome (yellow), showing that overall new RNA counts for autosomal genes center on 0.5 and X-chromosome genes follow the inactivation. ( c ) Scatter plots of imprinted genes, showing their total new RNA counts from the maternal and paternal alleles. ( d ) Histogram showing the fraction of reads assigned to the maternal allele for imprinted genes in primary fibroblast cells. ( e ) Fraction of reads per cell that were assigned to maternal origin for imprinted genes. The columns on the right show the percentage of cells in which the maternal fraction of reads was 0 or 1. Source numerical data are available in source data.
( a ) Scatter plot showing the Spearman correlation for gene pairs based on new RNA counts from the same allele (x-axis) against the Spearman correlation for gene pairs based on new RNA counts from the opposite alleles (y-axis), highlighting the strongest gene pairs with increased Spearman correlation only when based on new RNA counts from the same allele. Data shown are from the smallest genomic distance bin (< 0.1 Mbp). ( b ) Histogram showing the power to assign individual reads as new, for all reads assigned to the C57 allele. ( c ) Barplots showing the new RNA counts for the Meg3-Rian gene pair interaction (each shown on positive or negative y-axis scale), together with the observed Spearman correlations. For Meg3 and Rian, the increased correlation of new RNA counts from the same allele is simply a consequence of both genes being imprinted and expressed only from the maternal allele, driving a non-meaningful signal of co-bursting. ( d ) Barplots showing the new RNA counts for the Ccl7-Ccl2 gene pair interaction (each shown on positive or negative y-axis scale), together with the observed Spearman correlations. For Ccl7 and Ccl2, the increased correlation of new RNA counts from the same allele could indicate slightly increased co-bursting, although a technical cause such as read mismapping has not been ruled out. ( e ) Barplots showing the new RNA counts for the Dio3-Dio3os gene pair interaction (each shown on positive or negative y-axis scale), together with the observed Spearman correlations. For Dio3 and Dio3os, only Dio3 has high expression across the cells, with very sparse counts for Dio3os – a non-coding RNA on the opposite strand of Dio3. Still, the correlation in expression from the same allele is higher than the correlations from the opposite alleles, although in-depth scrutiny of read assignments to strands of these two overlapping RNAs has not been performed. Source numerical data are available in source data.
( a ) We turned our homogeneous fibroblast data into a mix of two distinct groups by arranging the cells on a 1-dimensional similarity axis using multidimensional scaling and removing the middle 80% of cells. ( b ) The resulting data heterogeneity introduced a distance dependence of the gene-gene correlations, for same-allele, opposite-allele and combined reads. ( c ) It also introduced a distance-dependent correlation difference between same allele and non-same (P = 10⁻¹⁰, Spearman correlation test). Significance of having a mean above zero for the first bin was P < 10⁻³⁵⁰ (Binomial test). Figures based on 208,379 gene-gene tests. Source numerical data are available in source data.
Supplementary information.
Supplementary Note 1.
Peer review file, supplementary table 1.
Gene pairs with highest amount of observed co-bursting.
Numerical source data.
Rights and permissions.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .
Cite this article.
Ramsköld, D., Hendriks, GJ., Larsson, A.J.M. et al. Single-cell new RNA sequencing reveals principles of transcription at the resolution of individual bursts. Nat Cell Biol (2024). https://doi.org/10.1038/s41556-024-01486-9
Received : 16 March 2024
Accepted : 15 July 2024
Published : 28 August 2024
DOI : https://doi.org/10.1038/s41556-024-01486-9
COMMENTS
H 0 (Null Hypothesis): Population parameter =, ≤, ≥ some value. H A (Alternative Hypothesis): Population parameter <, >, ≠ some value. Note that the null hypothesis always contains the equal sign. We interpret the hypotheses as follows: Null hypothesis: The sample data provides no evidence to support some claim being made by an individual.
To distinguish it from other hypotheses, the null hypothesis is written as H 0 (which is read as "H-nought," "H-null," or "H-zero"). A significance test is used to determine the likelihood that the results supporting the null hypothesis are not due to chance. A confidence level of 95% or 99% is common. Keep in mind, even if the confidence level is high, there is still a small chance the ...
Below are typical examples of writing a null hypothesis for various parameters and hypothesis tests. Related posts: Descriptive vs. Inferential ... is the population parameter for the mean, and you'll need to include it in the statement for this type of study. For example, an experiment compares the mean bone density changes for a new ...
An example of the null hypothesis is that light color has no effect on plant growth. The null hypothesis (H 0) is the hypothesis that states there is no statistical difference between two sample sets. In other words, it assumes the independent variable does not have an effect on the dependent variable in a scientific experiment.
The null hypothesis is the statement that a researcher or an investigator wants to disprove. ... Conversely, when the p-value is greater than your significance level, you fail to reject the null hypothesis. In this case, the sample data provides insufficient evidence to conclude that the effect exists in the population.
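The decision rule described in this snippet can be sketched in a few lines of Python. This is only an illustration: the function name `decide` and the default significance level of 0.05 are assumptions chosen for the example, not something the quoted source specifies.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the test decision for a p-value at significance level alpha.

    `decide` and the default alpha of 0.05 are illustrative assumptions.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # A p-value above alpha means the data are insufficient evidence
    # against H0; this is "fail to reject", not proof that H0 is true.
    return "reject H0" if p_value <= alpha else "fail to reject H0"


print(decide(0.03))  # p-value below alpha
print(decide(0.20))  # p-value above alpha
```

Note that the two outcomes are deliberately worded as "reject" and "fail to reject": the rule never returns "accept H0".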
Revised on June 22, 2023. The null and alternative hypotheses are two competing claims that researchers weigh evidence for and against using a statistical test: Null hypothesis (H0): There's no effect in the population. Alternative hypothesis (Ha or H1): There's an effect in the population. The effect is usually the effect of the ...
Step 1: Figure out the hypothesis from the problem. The hypothesis is usually hidden in a word problem, and is sometimes a statement of what you expect to happen in the experiment. The hypothesis in the above question is "I expect the average recovery period to be greater than 8.2 weeks." Step 2: Convert the hypothesis to math.
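As a rough sketch of what "converting the hypothesis to math" can look like, the Python helper below pairs each alternative-hypothesis symbol with the complementary null symbol, so that the null always carries the equality. The function name `state_hypotheses` and the lookup table are hypothetical, introduced only for this illustration; the recovery-period value of 8.2 weeks comes from the snippet above.

```python
# Map each alternative-hypothesis symbol to the complementary null symbol.
# The null always includes equality; the alternative never does.
NULL_SYMBOL_FOR = {">": "≤", "<": "≥", "≠": "="}


def state_hypotheses(parameter: str, alternative: str, value: float) -> tuple[str, str]:
    """Return (null, alternative) statements for a claim about `parameter`.

    Hypothetical helper for illustration only.
    """
    if alternative not in NULL_SYMBOL_FOR:
        raise ValueError("alternative must be one of '>', '<', '≠'")
    null = f"H0: {parameter} {NULL_SYMBOL_FOR[alternative]} {value}"
    alt = f"Ha: {parameter} {alternative} {value}"
    return null, alt


# "I expect the average recovery period to be greater than 8.2 weeks."
h0, ha = state_hypotheses("μ", ">", 8.2)
print(h0)  # H0: μ ≤ 8.2
print(ha)  # Ha: μ > 8.2
```

The expectation stated in the word problem becomes the alternative hypothesis, and the null is its complement.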
The actual test begins by considering two hypotheses. They are called the null hypothesis and the alternative hypothesis. These hypotheses contain opposing viewpoints. H 0, the null hypothesis: a statement of no difference between sample means or proportions or no difference between a sample mean or proportion and a population mean or proportion. In other words, the difference equals 0.
Null Hypothesis Examples. "Hyperactivity is unrelated to eating sugar" is an example of a null hypothesis. If the hypothesis is tested and found to be false, using statistics, then a connection between hyperactivity and sugar ingestion may be indicated. A significance test is the most common statistical test used to establish confidence in a ...
The actual test begins by considering two hypotheses. They are called the null hypothesis and the alternative hypothesis. These hypotheses contain opposing viewpoints. H 0, the null hypothesis: It is a statement of no difference between the variables; they are not related. This can often be considered the status quo, and as a result, if you cannot accept the null, it requires some action.
The null hypothesis and the alternative hypothesis are types of conjectures used in statistical tests to make statistical inferences, which are formal methods of reaching conclusions and separating scientific claims from statistical noise. The statement being tested in a test of statistical significance is called the null hypothesis. The test of significance is designed to assess the strength ...
The null and alternative hypotheses are two competing claims that researchers weigh evidence for and against using a statistical test: Null hypothesis (H0): There's no effect in the population. Alternative hypothesis (HA): There's an effect in the population. The effect is usually the effect of the independent variable on the dependent ...
Concept Review. In a hypothesis test, sample data is evaluated in order to arrive at a decision about some type of claim. If certain conditions about the sample are satisfied, then the claim can be evaluated for a population. In a hypothesis test, we: Evaluate the null hypothesis, typically denoted with H 0. The null is not rejected unless the hypothesis test shows otherwise.
10.1 - Setting the Hypotheses: Examples. A significance test examines whether the null hypothesis provides a plausible explanation of the data. The null hypothesis itself does not involve the data. It is a statement about a parameter (a numerical characteristic of the population). These population values might be proportions or means or ...
Null hypothesis versus alternative hypothesis. Definition: the null hypothesis is a statement of no effect or no relationship; the alternative hypothesis is a statement that suggests there is an effect or relationship. Denotation: H 0 for the null; H a or H 1 for the alternative. Example: ... For example, a null hypothesis might propose a new teaching technique doesn't enhance student performance. If data contradicts this, the technique may be ...
Write a research null hypothesis as a statement that the studied variables have no relationship to each other, or that there's no difference between 2 groups. Write a statistical null hypothesis as a mathematical equation, such as μ 1 = μ 2 if you're comparing group means.
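To make the statistical null hypothesis μ 1 = μ 2 concrete, here is a minimal Python sketch of one way to test it. The snippet above does not name a test, so a label-shuffling permutation test is used here purely as an assumption; the function name `permutation_test_means`, the number of resamples, the seed, and the sample values are all hypothetical.

```python
import random
from statistics import mean


def permutation_test_means(group_a, group_b, n_resamples=2000, seed=0):
    """Estimate a two-sided p-value for H0: mu_1 = mu_2 by shuffling labels.

    Illustrative sketch only; a permutation test is one of several options.
    """
    rng = random.Random(seed)  # local generator so results are repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (n_resamples + 1)


# Made-up measurements for two groups, for demonstration only.
control = [1, 2, 3, 4, 5]
treated = [101, 102, 103, 104, 105]
print(permutation_test_means(control, treated))
```

If group labels do not matter (the null), shuffled differences are often as large as the observed one and the p-value is large; when the groups are clearly separated, as in the made-up data, it is small.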
The null hypothesis (H 0) is a statement about the population that either is believed to be true or is used to put forth an argument unless it can be shown to be incorrect beyond a reasonable doubt. The alternative hypothesis (H a) is a claim about the population that is contradictory to H 0.
5.2 - Writing Hypotheses. The first step in conducting a hypothesis test is to write the hypothesis statements that are going to be tested. For each test you will have a null hypothesis (H 0) and an alternative hypothesis (H a). Null Hypothesis. The statement that there is not a difference in the population(s), denoted as H 0.
5. Phrase your hypothesis in three ways. To identify the variables, you can write a simple prediction in if…then form. The first part of the sentence states the independent variable and the second part states the dependent variable. If a first-year student starts attending more lectures, then their exam scores will improve.
Here, the hypothesis test formulas are given below for reference. The formula for the null hypothesis is: H 0 : p = p 0. The formula for the alternative hypothesis is: H a : p > p 0, p < p 0, or p ≠ p 0. The test statistic is computed from p 0 and p̂: remember that p 0 is the proportion stated in the null hypothesis and p̂ (p-hat) is the sample proportion.
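The snippet's test-statistic formula did not survive extraction. As a hedged illustration, the Python sketch below uses the standard large-sample one-proportion z statistic, z = (p̂ − p 0) / √(p 0 (1 − p 0) / n), which is an assumption about what the original intended. The function name `one_proportion_z_test` and the sample counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0, alternative="≠"):
    """Return (z, p_value) for H0: p = p0 using the normal approximation.

    Assumes n is large enough for the approximation to be reasonable.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must lie strictly between 0 and 1")
    p_hat = successes / n                     # sample proportion
    standard_error = sqrt(p0 * (1 - p0) / n)  # computed under H0
    z = (p_hat - p0) / standard_error
    cdf = NormalDist().cdf(z)                 # standard normal CDF at z
    if alternative == ">":
        p_value = 1 - cdf
    elif alternative == "<":
        p_value = cdf
    elif alternative == "≠":
        p_value = 2 * min(cdf, 1 - cdf)
    else:
        raise ValueError("alternative must be one of '>', '<', '≠'")
    return z, p_value


# Made-up data: 60 successes in 100 trials, tested against p0 = 0.5.
z, p_value = one_proportion_z_test(60, 100, 0.5)
print(round(z, 3), round(p_value, 4))
```

The standard error uses p 0 rather than p̂ because the statistic is evaluated under the assumption that the null hypothesis is true.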
A null hypothesis is a theory based on insufficient evidence that requires further testing to prove whether the observed data is true or false. For example, a null hypothesis statement can be "the rate of plant growth is not affected by sunlight." It can be tested by measuring the growth of plants in the presence of sunlight and comparing ...
Null hypothesis, often denoted as H0, is a foundational concept in statistical hypothesis testing. It represents an assumption that no significant difference, effect, or relationship exists between variables within a population.
Below are the primary purposes of the null hypothesis: 1. Baseline for Comparison. The null hypothesis provides a baseline or a default position that indicates no effect, no difference, or no relationship between variables. It is the statement that researchers aim to test against an alternative hypothesis.
For molecule-level hypothesis testing, we used the likelihood-ratio test with the null hypothesis \({H}_{0} ... (see 'Code availability' statement). Sample quality filter ...