Introduction: Biopics, Biography, Heritage, and the Literary Biopic

  • First Online: 02 June 2019


  • Hila Shachar

Part of the book series: Palgrave Studies in Adaptation and Visual Culture (PSADVC)


Shachar introduces the key debates in the analysis of biopics and heritage cinema in relation to the theorisation of the literary biopic as its own genre with a unique aesthetic template. Moving from an analysis of the early Hollywood ‘golden days’ of the studio-system era, which churned out biopics as part of a celebrity-driven enterprise, Shachar traces how the contemporary literary biopic both sits alongside and diverges from such early Hollywood cinema, and overlaps with concerns of more recent heritage films. She concludes by analysing how the contemporary literary biopic can be conceived of in generic terms, and lays out the methodological approach of how it represents authorial identity through an amalgamation of postmodern, intersectional, and Romantic ideological politics of representation.



Author information

Authors and affiliations.

De Montfort University, Leicester, UK

Hila Shachar


Corresponding author

Correspondence to Hila Shachar.


Copyright information

© 2019 The Author(s)

About this chapter

Shachar, H. (2019). Introduction: Biopics, Biography, Heritage, and the Literary Biopic. In: Screening the Author. Palgrave Studies in Adaptation and Visual Culture. Palgrave Macmillan, Cham. https://doi.org/10.1007/978-3-030-18850-4_1


DOI: https://doi.org/10.1007/978-3-030-18850-4_1

Published: 02 June 2019

Publisher Name: Palgrave Macmillan, Cham

Print ISBN: 978-3-030-18849-8

Online ISBN: 978-3-030-18850-4

eBook Packages: Literature, Cultural and Media Studies (R0)


Biology LibreTexts

1.4: Characteristics of Life


What do a bacterium and a whale have in common?

Do they share characteristics with us? All living organisms, from the smallest bacterium to the largest whale, share certain characteristics of life. Without these characteristics, there is no life.

Characteristics of Life

Look at the duck decoy in Figure below. It looks very similar to a real duck. Of course, real ducks are living things. What about the decoy duck? It looks like a duck, but it is actually made of wood. The decoy duck doesn’t have all the characteristics of a living thing. What characteristics set the real ducks apart from the decoy duck? What are the characteristics of living things?


This duck decoy looks like it’s alive. It even fools real ducks. Why isn’t it a living thing?

To be classified as a living thing, an object must have all six of the following characteristics:

  • It responds to the environment.
  • It grows and develops.
  • It produces offspring.
  • It maintains homeostasis.
  • It has complex chemistry.
  • It consists of cells.

Response to the Environment

All living things detect changes in their environment and respond to them. What happens if you step on a rock? Nothing; the rock doesn’t respond because it isn’t alive. But what if you think you are stepping on a rock and actually step on a turtle shell? The turtle is likely to respond by moving—it may even snap at you!

Growth and Development

All living things grow and develop. For example, a plant seed may look like a lifeless pebble, but under the right conditions it will grow and develop into a plant. Animals also grow and develop. Look at the animals in Figure below. How will the tadpoles change as they grow and develop into adult frogs?


Tadpoles go through many changes to become adult frogs.

Reproduction

All living things are capable of reproduction. Reproduction is the process by which living things give rise to offspring. Reproducing may be as simple as a single cell dividing to form two daughter cells. Generally, however, it is much more complicated. Nonetheless, whether a living thing is a huge whale or a microscopic bacterium, it is capable of reproduction.

Keeping Things Constant

All living things are able to maintain a more-or-less constant internal environment. They keep things relatively stable on the inside regardless of the conditions around them. The process of maintaining a stable internal environment is called homeostasis. Human beings, for example, maintain a stable internal body temperature. If you go outside when the air temperature is below freezing, your body doesn’t freeze. Instead, by shivering and other means, it maintains a stable internal temperature.

Complex Chemistry

All living things—even the simplest life forms—have a complex chemistry. Living things consist of large, complex molecules, and they also undergo many complicated chemical changes to stay alive. Thousands (or more) of these chemical reactions occur in each cell at any given moment. Metabolism is the accumulated total of all the biochemical reactions occurring in a cell or organism. Complex chemistry is needed to carry out all the functions of life.

Cells

All forms of life are built of at least one cell. A cell is the basic unit of the structure and function of living things. Living things may appear very different from one another on the outside, but their cells are very similar. Compare the human cells (left) and the onion cells (right) in the Figure below. How are they similar? The animation titled Inside a Cell lets you look inside a cell and see its internal structures: http://bio-alive.com/animations/cell-biology.htm


Human Cells (left). Onion Cells (right). If you looked at cells under a microscope, this is what you might see.

Summary

  • All living things detect changes in their environment and respond to them.
  • All living things grow and develop.
  • All living things are capable of reproduction, the process by which living things give rise to offspring.
  • All living things are able to maintain a constant internal environment through homeostasis.
  • All living things have complex chemistry.
  • All forms of life are built of cells. A cell is the basic unit of the structure and function of living things.

Explore More

Use this resource to answer the questions that follow.

  • http://www.hippocampus.org/Biology → Non-Majors Biology → Search: Defining Biology
  • What does "biology" encompass?
  • What characteristics define life?
  • Define metabolism.
  • Are viruses living? Explain your answer.
  • List the six characteristics of all living things.
  • Define homeostasis.
  • What is a cell?
  • Making the next generation is known as ____________.
  • Assume that you found an object that looks like a dead twig. You wonder if it might be a stick insect. How could you determine if it is a living thing?

Sequence characteristics and an accurate model of abundant hyperactive loci in the human genome

  • Sanjarbek Hudaiberdiev
  • Ivan Ovcharenko
  • National Institute for Biotechnology and Information, National Library of Medicine, National Institutes of Health, Bethesda, MD
  • https://doi.org/10.7554/eLife.95170.1
  • Open access

Enhancers and promoters are classically considered to be bound by a small set of TFs in a sequence-specific manner. This assumption has come under increasing skepticism as the datasets of ChIP-seq assays of TFs have expanded. In particular, high-occupancy target (HOT) loci attract hundreds of TFs with seemingly no detectable correlation between ChIP-seq peaks and DNA-binding motif presence. Here, we used a set of 1,003 TF ChIP-seq datasets (HepG2, K562, H1) to analyze the patterns of ChIP-seq peak co-occurrence in combination with functional genomics datasets. We identified 43,891 HOT loci forming at the promoter (53%) and enhancer (47%) regions. HOT promoters regulate housekeeping genes, whereas HOT enhancers are involved in tissue-specific process regulation. HOT loci form the foundation of human super-enhancers and evolve under strong negative selection, with some of these loci being located in ultraconserved regions. Sequence-based classification analysis of HOT loci suggested that their formation is driven by the sequence features, and the density of mapped ChIP-seq peaks across TF-bound loci correlates with sequence features and the expression level of flanking genes. Based on the affinities to bind to promoters and enhancers we detected 5 distinct clusters of TFs that form the core of the HOT loci. We report an abundance of HOT loci in the human genome and a commitment of 51% of all TF ChIP-seq binding events to HOT locus formation thus challenging the classical model of enhancer activity and propose a model of HOT locus formation based on the existence of large transcriptional condensates.

eLife assessment

This valuable study explores the sequence characteristics and conservation of high-occupancy target loci, which are genomic regions bound by a multitude of transcription factors, at promoters and enhancers throughout the human genome. The computational analyses presented in this study are solid, although the evidence for some claims is inadequate. This study would be a helpful resource for researchers performing ChIP-seq based analyses of transcription factor binding.

  • https://doi.org/10.7554/eLife.95170.1.sa3

Introduction

Tissue-specificity of gene expression is orchestrated by the combination of transcription factors (TFs) that bind to regulatory regions such as promoters, enhancers, and silencers 1, 2. Classically, an enhancer is thought to be bound by a few TFs that recognize a specific DNA motif at their cognate TF binding site (TFBS) through their DNA-binding domains and recruit other molecules necessary for catalyzing the transcriptional machinery 3–5. Based on the arrangements of the TFBSs, also called “motif grammar”, the architecture of enhancers is commonly categorized into “enhanceosome” and “billboard” models 6, 7. In the enhanceosome model, a rigid grammar of motifs facilitates the formation of a single structure comprising multiple TFs which then activates the target gene 8, 9. This model requires the presence of all the participating proteins. Under the billboard model, on the other hand, the TFBSs are independent of each other and function in an additive manner 10. However, as the catalogs of TF ChIP-seq assays have expanded thanks to major collaborative projects such as ENCODE 11 and modENCODE 12, the assertion that TFs interact with DNA through strictly defined binding motifs has increasingly been contradicted by empirically observed patterns of DNA binding regions of TFs. In particular, genomic regions have been reported that are seemingly bound by a large number of TFs with no apparent DNA sequence specificity. These genomic loci have been dubbed high-occupancy target (HOT) regions and were detected in multiple species 12–16.

Initially, these regions were partially attributed to technical and statistical artifacts of the ChIP-seq protocol, resulting in a small list of blacklisted regions which are mostly located in unstructured DNA regions such as repetitive elements and low complexity regions 17, 18. These blacklisted regions were later excluded from the analyses, and they represent a small fraction of the mapped ChIP-seq peaks. In addition, various studies have proposed that some DNA elements, such as GC-rich promoters, CpG islands, R-loops, and G-quadruplexes, can serve as permissive TF binding platforms 17, 18. Other studies have argued that these regions are highly consequential regions enriched in epigenetic signals of active regulatory elements such as histone modification regions and high chromatin accessibility 12, 19, 20.

Early studies of the subject were limited in scope due to the small number of available TF ChIP-seq assays. In recent years, numerous studies have examined additional TFs across multiple cell lines. For instance, Partridge et al. 20 studied the HOT loci in the context of 208 chromatin-associated proteins. They observed that the composition of the chromatin-associated proteins differs depending on whether the HOT locus is located in an enhancer or promoter. Wreczycka et al. 18 performed a cross-species analysis of HOT loci in the promoters of highly expressed genes, and established that some of the HOT loci correspond to the “hyper-ChIPable” regions. Ramaker et al. 19 conducted a comparative study of HOT regions in multiple cell lines and detected putative driver motifs at the core segments of the HOT loci.

In this study, we used the most up-to-date set of TF ChIP-seq assays available from the ENCODE project and incorporated functional genomics datasets such as 3D chromatin data (Hi-C), eQTLs, GWAS, and clinical disease variants to characterize and analyze the functional implications of the HOT loci. We report that the HOT loci are one of the prevalent modes of regulatory TF-DNA interactions; they represent active regulatory regions with distinct patterns of bound TFs manifested as clusters of promoter-specific, enhancer-specific, and chromatin-associated proteins. They are active during the embryonic stage and are enriched in disease-associated variants. Finally, we propose a model for the HOT regions based on the idea of the existence of large transcriptional condensates.

HOT loci are one of the prevalent modes of TF-DNA interactions

To define and analyze the high-occupancy target (HOT) loci, we used the most up-to-date catalog of ChIP-seq datasets (n=1,003) of TFs obtained from the ENCODE Project assayed in HepG2, K562, and H1-hESC (H1) cells (545, 411, and 47 ChIP-seq assays respectively, see Methods for details). While the TFs are defined as sequence-specific DNA-binding proteins that control the transcription of genes, the currently available ChIP-seq datasets include the assays of many other types of transcription-related proteins such as cofactors, coactivators, histone acetyltransferases as well as RNA Polymerase 2 variants. Therefore, we collectively call all of these proteins DNA-associated proteins (DAPs). Using the datasets of DAPs, we overlaid all of the ChIP-seq peaks and obtained the densities of DAP binding sites across the human genome using non-overlapping windows of length 400 bp. We considered a binding site to be present in a given window if the 8 bp centered at the summit of a ChIP-seq peak overlapped that window. Given that the three analyzed cell lines contain varying numbers of assayed DAPs, we binned the loci according to the number of overlapping DAPs on a logarithmic scale with 10 intervals and defined HOT loci as those that fall into the highest 4 bins, which translates to those which contain on average >18% of available DAPs for a given cell line (see Methods for a detailed description and justifications). This resulted in 25,928, 15,231, and 2,732 HOT loci in HepG2, K562, and H1 cells respectively. We applied our definition to the Roadmap Epigenomic ChIP-seq datasets and observed that the number of available ChIP-seq datasets significantly affects the resulting HOT loci. However, the HOT loci defined using the Roadmap Epigenomic datasets were almost entirely composed of subsets of the ENCODE-based HOT loci, comprising 50%, 62%, and 15% in HepG2, K562, and H1, respectively (Table S5).
Importantly, we note that the distribution of the number of loci is not multimodal, but rather follows a uniform spectrum, and thus, this definition of HOT loci is ad-hoc (Fig 1A, Fig S1). Therefore, in addition to the dichotomous classification of HOT and non-HOT loci, we use all of the DAP-bound loci to correlate the studied metrics with the number of bound DAPs when necessary. Throughout the study, we used the loci from the HepG2 cell line as the primary dataset for analyses and used the K562 and H1 datasets when comparative analysis was necessary.
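The windowing and binning rule described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the input tuple format, the function names, the choice to count a summit span in both windows when it straddles a boundary, and the log-spaced bin edges running from 1 to the number of assayed DAPs are all assumptions made for illustration. The exact bin edges used in the study are given in its Methods and Table S1.

```python
import math
from collections import defaultdict

WINDOW = 400      # window length in bp (from the text)
SUMMIT_SPAN = 8   # bp centered on each peak summit (from the text)


def count_daps_per_window(peaks):
    """Count distinct DAPs per 400 bp window.

    peaks: iterable of (chrom, summit, dap_name) tuples (hypothetical format).
    Returns {(chrom, window_index): number_of_distinct_DAPs}.
    """
    daps = defaultdict(set)
    half = SUMMIT_SPAN // 2
    for chrom, summit, dap in peaks:
        start = max(0, summit - half)   # half-open span [start, end)
        end = summit + half
        # Assumption: a span straddling a window boundary counts in both windows.
        for w in range(start // WINDOW, (end - 1) // WINDOW + 1):
            daps[(chrom, w)].add(dap)
    return {key: len(names) for key, names in daps.items()}


def log_bin(count, n_assayed, n_bins=10):
    """0-based bin index of `count` on a log scale from 1 to n_assayed (assumed edges)."""
    if count < 1 or n_assayed < 2:
        raise ValueError("count must be >= 1 and n_assayed >= 2")
    frac = math.log(count) / math.log(n_assayed)
    return min(n_bins - 1, int(frac * n_bins))


def is_hot(count, n_assayed, n_bins=10, top_bins=4):
    """True if the locus falls into the highest `top_bins` of `n_bins` bins."""
    return log_bin(count, n_assayed, n_bins) >= n_bins - top_bins
```

Because the bin edges here are assumed, the resulting thresholds should not be expected to reproduce the reported counts of HOT loci; the sketch only shows the shape of the procedure (window assignment, distinct-DAP counting, log-scale binning, top-4-bin cutoff).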


Figure 1. HOT loci are prevalent in the genome. A) Distribution of the number of loci by the number of overlapping peaks in 400 bp loci. Loci are binned on a logarithmic scale (Table S1, Methods). The shaded region represents the HOT loci. B) Prevalence of DAPs in HOT loci. Each dot is a DAP. X-axis: percentage of HOT loci in which the DAP is present. Y-axis: percentage of total peaks of the DAP that are located in HOT loci. Dot color and size are proportional to the total number of ChIP-seq peaks of the DAP. C) Breakdown of HepG2 HOT loci into promoter, intronic, and intergenic regions. D) Fractions of HOT enhancer and promoter loci located in ATAC-seq regions. E) Overlaps between the HOT enhancer, HOT promoter, super-enhancer, regular enhancer, H3K27ac, and H3K4me1 regions. All of the visualized data is generated from the HepG2 cell line.

Although the HOT loci represent only 5% of all the DAP-bound loci in HepG2, they contain 51% of all mapped ChIP-seq peaks. The fraction of the ChIP-seq peaks of each DAP overlapping HOT loci varies from 0% to 91%, with an average of 65% (Fig 1B, y-axis). The DAPs present in the highest fraction of HOT loci (Fig 1B, x-axis) are SAP130, MAX, ARID4B, ZGPAT, HDAC1, MED1, TFAP4, and SOX6. The abundance of histone deacetylase-related factors mixed with transcriptional activators suggests that the regulatory functions of HOT loci are a complex interplay of activation and repression. RNA Polymerase 2 (POLR2) is present in 42% of HOT loci, arguing for active transcription at or in the proximity of HOT loci (including mRNA and eRNA transcription). When the fractions of peaks of individual DAPs overlapping the HOT loci are considered (Fig 1B, y-axis), the DAPs with >90% overlap are GMEB2 (essential for replication of parvoviruses), ZHX3 (zinc finger transcriptional repressor), and YEATS2 (subunit of acetyltransferase complex). In contrast, the DAPs that are least associated with HOT loci (<5%) are ZNF282 (transcriptional repressor), MAFK, EZH2 (histone methyltransferase), and TRIM22 (ubiquitin ligase). The fact that HOT loci harbor more than half of the ChIP-seq peaks suggests that the HOT loci are one of the prevalent modes of TF-DNA interactions rather than an exceptional case, as was initially suggested by earlier studies 17, 18.

Around half of the HOT loci (51%) are located in promoter regions (46% in primary promoters and 5% in alternative promoters), 25% in intronic regions, and only 24% in intergenic regions, with 9% being located >50 kb away from promoters. This suggests that the HOT loci are mainly clustered in the vicinity (promoters and introns) of transcription start sites and therefore potentially play essential roles in the regulation of nearby genes (Fig 1C). When considering the non-promoter HOT loci, we observed that they were universally located in regions of H3K27ac or H3K4me1, indicating that they are active enhancers (Fig S2 A-D). When comparing the definitions of promoters and enhancers based on chromHMM states and ENCODE SCREEN annotations, the composition of HOT loci in relation to promoters and enhancers showed similar fractions (Fig S2E). Both HOT promoters and enhancers are almost entirely located in the chromatin-accessible regions (97% and 93% of the total sequence lengths, respectively, Fig 1D). We compared our definition of the HOT loci to those reported in Ramaker et al. 2020 19 and Boyle et al. 2014 21. We observed that because these two studies define HOT loci using 2 kb windows, they cover a larger fraction of the genome. Our set of HOT loci largely consisted of subsets of those defined in these two studies, with overlap percentages of 81%, 93%, and 100% in HepG2, K562, and H1, respectively (Fig S3). Further analysis revealed that our set of HOT loci primarily constitutes the “core” and more conserved (Fig S4) regions of HOT loci defined in the mentioned studies, while their composition in terms of promoter, intronic, and intergenic regions is similar (Fig S5), suggesting that the three definitions point to loci with similar characteristics.

To further dissect the composition of HOT enhancer loci, we compared them to super-enhancers as defined in the study by Whyte et al. 2013 22 and a set of regular enhancers (see Methods for definitions). Overall, 31% of HOT enhancers and 16% of HOT promoters are located in super-enhancers, while 97% of all HOT loci overlap H3K27ac or H3K4me1 regions (Fig 1E). While HOT enhancers and promoters appear to provide a critical foundation for super-enhancer formation, they represent only a small fraction of super-enhancer sequences overall, accounting for 9% of combined super-enhancer length.

A 400 bp HOT locus, on average, harbors 125 DAP peaks in HepG2. However, the peaks of DAPs are not uniformly distributed across HOT loci. There are 68 DAPs with >80% of all of their peaks located in HOT loci (Fig 1B). To analyze the signatures of unique DAPs in HOT loci, we performed PCA using the overlapping DAP combinations for each HOT locus. This analysis showed that principal component 1 (PC1) is correlated with the total number of distinct DAPs located at a given HOT locus (Fig S6A). PC2 separates the HOT promoters and HOT enhancers (Fig 2A, Fig S6B), and the PC1-PC2 combination also separates the p300-bound HOT loci (Fig S6C). This indicates that the HOT promoters and HOT enhancers must have distinct signatures of DAPs. To test if such signatures exist, we clustered the DAPs according to the fractions of HOT promoter and HOT enhancer loci that they overlap with. This analysis showed that there is a large cluster of DAPs (n=458) which on average overlap only 17% of HOT loci and are likely secondary to HOT locus formation (Fig S7). We focused on the other, HOT-enriched, cluster of DAPs (n=87), which are present in 53% of HOT loci on average (Fig S7) and consist of five major clusters of DAPs (Fig 2D). Cluster I comprises 4 DAPs (ZNF687, ARID4B, MAX, and SAP130), which are present in 75% of HOT loci on average. The latter three of these DAPs form a PPI interaction network (PPI enrichment p-value=0.001) (Fig S8A). We called this cluster of DAPs essential regulators given their widespread presence in both HOT enhancers and HOT promoters. Cluster II comprises 29 DAPs which are present in 47% of the HOT loci and are 1.7x more likely to overlap HOT promoters than HOT enhancers. Among these DAPs are POLR2 subunits, PHF8, GABP1, GATAD1, TAF1, etc.
The GO molecular function term most strongly associated with the DAPs of this cluster is RNA Polymerase transcription factor initiation activity, suggestive of their direct role in transcriptional activity (Fig S8B). Cluster III comprises 16 DAPs which are 1.9x more likely to be present in HOT enhancers than in HOT promoters. These are a wide variety of transcriptional regulators, including those with high expression levels in liver (NFIL3, NR2F6) and the pioneer factors HNF4A, CEBPA, FOXA1, and FOXA2. The majority (13/16) of DAPs of this cluster form a PPI network (PPI enrichment p-value < 10^-16, Fig S8C). Among the most strongly associated GO terms of biological processes are those related to cell differentiation (white fat cell differentiation, endocrine pancreas development, dopaminergic neuron differentiation, etc.), suggesting that cluster III HOT enhancers underlie cellular development. Cluster IV comprises 12 DAPs which are equally abundant in both HOT enhancers and HOT promoters (64% and 63% respectively) and which form a PPI network (PPI enrichment p-value < 6×10^-16, Fig S8D) with HDAC1 (Histone deacetylase 1) being the node with the highest degree, suggesting that the DAPs of the cluster may be involved in chromatin-based transcriptional repression. Lastly, Cluster V comprises 26 DAPs of a wide range of transcriptional regulators, with a 1.3x skew towards the HOT enhancers. While this cluster contains prominent TFs such as TCF7L2, FOXA3, SOX6, FOSL2, etc., the variety of the pathways and interactions they partake in makes it difficult to ascertain the functional patterns from the constituent DAPs alone.
Although this clustering analysis reveals subsets of DAPs that are specific to either HOT enhancers or HOT promoters (clusters II and III), it still does not explain what sort of interplay takes place between these recipes of HOT promoters and HOT enhancers, as well as with the other clusters of DAPs with equal abundance in both the HOT promoters and HOT enhancers. To test the significance of the PPI networks described above, we ran 100 trials for each cluster by randomly selecting an equal number of DAPs reported in PPI networks and calculated the significance of the PPI enrichment p-values. All of the reported PPI enrichment p-values were significantly higher than the randomized trials (p-value < 0.01, one-sample t-test).
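The quantities that feed the clustering above (the fraction of HOT promoters and of HOT enhancers that each DAP overlaps, and the resulting fold skew such as "1.7x more likely to overlap HOT promoters") can be sketched as follows. This is only an illustration of that first step under assumed inputs: the `(kind, set_of_DAPs)` locus representation and the function names are hypothetical, and the hierarchical clustering itself is not reproduced.

```python
def dap_fractions(loci):
    """Per-DAP fraction of HOT promoters and HOT enhancers overlapped.

    loci: list of (kind, daps) pairs, where kind is "promoter" or "enhancer"
          and daps is a set of DAP names bound at that locus (hypothetical format).
    Returns {dap: (fraction_of_promoters, fraction_of_enhancers)}.
    """
    totals = {"promoter": 0, "enhancer": 0}
    hits = {}
    for kind, daps in loci:
        totals[kind] += 1
        for dap in daps:
            hits.setdefault(dap, {"promoter": 0, "enhancer": 0})[kind] += 1
    if 0 in totals.values():
        raise ValueError("need at least one promoter and one enhancer locus")
    return {
        dap: (h["promoter"] / totals["promoter"], h["enhancer"] / totals["enhancer"])
        for dap, h in hits.items()
    }


def promoter_bias(frac_promoter, frac_enhancer):
    """Fold preference for promoters; values above 1 mean promoter-skewed."""
    if frac_enhancer == 0:
        return float("inf")  # DAP never seen at enhancers
    return frac_promoter / frac_enhancer


# Toy data, not taken from the study.
toy_loci = [
    ("promoter", {"POLR2", "MAX"}),
    ("promoter", {"POLR2"}),
    ("enhancer", {"FOXA1", "MAX"}),
    ("enhancer", {"FOXA1", "POLR2"}),
]
fractions = dap_fractions(toy_loci)
```

Each DAP then becomes a two-dimensional point (promoter fraction, enhancer fraction), which is the kind of feature vector a hierarchical clustering routine could group into promoter-skewed, enhancer-skewed, and balanced sets.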

PCA plots of HOT loci based on the DAP presence vectors. Each dot represents a HOT locus: A ) PC1 and PC2, marked promoters and enhancers. B ) PC1 and PC2, marked p300 bound HOT loci. C ) PC1 and PC4, marked CTCF bound HOT loci. The dashed lines in A,B,C are logistic regression lines. auROC values are indicated on x-axes. D ) DAPs hierarchically clustered by their involvement in HOT promoters and HOT enhancers. Heatmap colors indicate the % of HOT enhancers or promoters that a given DAP overlaps with. All of the visualized data is generated from the HepG2 cell line.

Notably, PC4 separates HOT loci associated with CTCF ( Fig 2C ) and Cohesin (Fig S6D). This clear separation of CTCF- and Cohesin-bound HOTs is surprising, given that only relatively small fractions of their peaks (21% and 38% respectively) reside in HOT loci and that they are present in 36% of the HOT loci, whereas some other DAPs with much higher presence, described above, are not clearly separated by the PCA. Furthermore, CTCF- and Cohesin-bound HOT enhancer loci are located significantly closer (p-value<10 −100 ; Mann-Whitney U Test) to the nearest genes (Fig S9B), making it more likely that those loci are proximal enhancers. Moreover, the total number of overlapping DAPs is significantly higher (p-value< 10 −100 ; Mann-Whitney U Test) in CTCF- and Cohesin-bound loci compared to the rest of the HOT loci (Fig S9C), suggesting that at least a portion of the DAPs in HOT loci can be explained by 3D chromatin contacts between the genomic regions mediated by the CTCF-Cohesin complex.

To comprehensively quantify the 3D chromatin interactions involving the HOT loci, we used Hi-C data with 5 kb resolution 23 (see Methods). First, we obtained statistically significant chromatin interactions using the FitHiChIP tool 24 (see Methods) and observed that HOT loci are enriched in chromatin interactions and 1.66x more likely to engage in chromatin interactions than the regular enhancers (p-value< 10 −20 , Chi-square test). When all of the DAP-bound loci are considered, the number of chromatin interactions positively correlates with the number of bound DAPs (rho=0.3, p-value<10 −100 , Spearman correlation). Next, we overlaid the chromatin interactions with the loci binned by the number of bound DAPs. We observed that the loci with high numbers of bound DAPs are more likely to engage in chromatin interactions with other loci harboring large numbers of DAPs, i.e. the HOT loci have the propensity to connect through long-range chromatin interactions with other HOT loci ( Fig 3A ). To further validate this observation, we obtained frequently interacting regions (FIREs) 25 , and observed that the FIREs are 2.89x (p-value<10 −230 , Chi-square test) enriched in HOT loci compared to the regular enhancers (see Methods). Moreover, 66% of HOT loci are located in TAD regions and 21% are located in chromatin loops. In particular, the HOT loci are 2.97x (p-value< 10 −230 , Mann-Whitney U test) enriched in the chromatin loop anchor regions (11% of the HOT loci) compared to regular enhancers. To investigate further, we analyzed the loop anchor regions harboring HOT loci and observed that the number of multi-way contacts on loop anchors correlates with the number of bound DAPs (rho=0.84, p-value< 10 −4 ; Pearson correlation). The number of multi-way interactions in loop anchor regions varies between 1 and 6, with only one locus, in an extreme case, serving as an anchor for 6 overlapping loops on chromosome 2 ( Fig 3B ).
That locus contains one HOT enhancer harboring 101 DAPs, located 23 kb away from the LINC02583 gene, which is linked to the liver-specific GWAS traits hematocrit, left ventricular diastolic function, and high-density lipoprotein cholesterol, highlighting the important role of the HOT locus. Of the loop anchor regions with >3 overlapping loops, 51% contained at least one HOT locus, suggesting an interplay between chromatin loops and HOT loci ( Fig 3B ). Overall, 94% of HOT loci are located in regions with at least one chromatin interaction. This observation is consistent with previous reports that many of the long-range 3D chromatin contacts form through the interactions of large protein complexes 26 . While there is a correlation between the HOT loci and chromatin interactions, the causal relation between these two properties of genomic loci is not clear.

A ) Densities of long-range Hi-C chromatin contacts between the DAP-bound loci. Each horizontal and vertical bin represents the loci with the number of bound DAPs between the edge values. The density values of each cell are normalized by the maximum value across all pairwise bins. Green boxes represent HOT loci. B ) Distribution of HOT loci in Hi-C contact regions. X-axis is the number of Hi-C contacts. Numbers in the top row indicate the total number of genomic loci engaging in the given number of Hi-C contacts. Bars indicate the % of loci that contain at least one HOT locus. C ) Distribution of the number of HOT loci in regions with a given number of Hi-C contacts. X-axis is the same as B. All of the visualized data is generated from the HepG2 cell line.

A set of DAPs stabilizes the interactions of DAPs at HOT loci

Next, using the ChIP-seq signal intensity as a proxy for the DAP binding affinity, we sought to analyze the patterns of binding affinities of DAP-DNA interactions in HOT loci. We observed that the overall binding affinities of DAPs correlate with the total number of colocalizing DAPs ( Fig 4A , rho=0.97, p-value<10 −10 ; Spearman correlation). Moreover, even when calculated DAP-wise, the average signal strength of every DAP correlates with the fraction of HOT loci that the given DAP overlaps with (rho=0.6, p-value<10 −29 ; Spearman correlation; Fig 4B ), meaning that the average signal intensity of a given DAP is largely driven by the ChIP-seq peaks which are located in HOT loci.
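The Spearman correlations reported throughout this section are Pearson correlations computed on ranks. A minimal sketch of that calculation is given below; the function names and the toy values are illustrative assumptions and are not taken from the study's code or data.

```python
def rank_with_ties(values):
    """Return 1-based ranks; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(rank_with_ties(x), rank_with_ties(y))


# Hypothetical toy data: number of bound DAPs per bin vs. mean ChIP-seq signal.
bound_daps = [1, 5, 10, 50, 100]
mean_signal = [2.0, 2.5, 4.0, 9.0, 30.0]
rho = spearman_rho(bound_daps, mean_signal)
```

Because only ranks enter the calculation, any monotonic relationship between the number of bound DAPs and the signal yields rho close to 1, regardless of whether the relationship is linear.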

HOT regions induce strong ChIP-seq signals. A ) Distribution of the signal values of the ChIP-seq peaks by the number of bound DAPs. The shaded region represents the HOT loci. B,C ) DAPs sorted by the ratio of ChIP-seq signal strength of the peaks located in HOT loci and non-HOT loci. 20 most HOT-specific (red bars) and 20 most non-HOT-specific (blue bars) DAPs are depicted. B ) Fold change (log2) of the HOT and non-HOT loci ChIP-seq signals. C ) Distribution of the average ChIP-seq signal in the loci binned by the number of bound DAPs. Rows represent the loci with the bound DAPs indicated by the values of the edges (y-axis). Green box regions demarcate the HOT regions. D ) Signal values of ssDAPs, nssDAPs (see the text for description), H3K27ac, CTCF, P300 peaks in HOT promoters and enhancers. All of the visualized data is generated from the HepG2 cell line.

While the overall average of the ChIP-seq signal intensity in HOT loci is greater when compared to the rest of the DAP-bound loci, individual DAPs demonstrate different levels of involvement in HOT loci. When sorted by the ratio of the signal intensities in HOT vs. non-HOT loci, among those with the highest HOT-affinities are GATAD1, MAX, NONO as well as POLR2G and the Mediator subunit MED1 ( Fig 4B,C ), whereas those with the opposite affinity (i.e. those that have the strongest binding sites in non-HOT loci) are REST, RFX5, TP53, etc. ( Fig 4B,C ). By analyzing the signal strengths of DAPs jointly, we observed that a subset of DAPs likely has a stabilizing effect on the binding of other DAPs in that, when they are present, the signal strengths of the majority of DAPs are on average 1.7x greater. These DAPs are CREB1, RFX1, ZNF687, RAD51, ZBTB40 and GPBP1L1.

So far, we have treated the DAPs under a single category and did not make a distinction based on their known DNA-binding properties. Previous studies have discussed the idea that sequence-specific DAPs (ssDAPs) can serve as anchors, similar to the pioneer TFs, which could facilitate the formation of HOT loci 19 , 20 , 27 . We asked if ssDAPs yield greater signal strength values than non-sequence-specific DAPs (nssDAPs). To test this hypothesis, we classified the DAPs into those two categories using the definitions provided in the study (Lambert et al. 2018) 28 and categorized the ChIP-seq signal values into these two groups. While statistically significant (p-value< 0.001, Mann-Whitney U test), the differences in the average signals of ssDAPs and nssDAPs in both HOT enhancers and HOT promoters are small ( Fig 4D ). Moreover, while the average signal values of ssDAPs in HOT enhancers are greater than that of the nssDAPs, in HOT promoters this relation is reversed. At the same time, the average signal strength of the DAPs is 3x greater than the average signal strength of H3K27ac peaks in HOT loci. Based on this, we concluded that the ChIP-seq signal intensities do not seem to be a function of the DNA-binding properties of the DAPs.

Sequence features that drive the accumulation of DAPs

We next analyzed the sequence features of the HOT loci. For this purpose, we first addressed the evolutionary conservation of the HOT loci using phastCons scores generated using an alignment of 46 vertebrate species 29 . The average conservation scores of the DAP-bound loci are in strong correlation with the number of bound DAPs (rho=0.98, p-value<10 −130 ; Spearman correlation), indicating that the negative selection exerted on HOT loci is proportional to the number of bound DAPs ( Fig 5A ). With 120 DAPs per locus on average, these HOT regions are 1.7x more conserved than the regular enhancers in HepG2 ( Fig 5B ). We observed a similar trend of conservation levels when the phastCons scores generated from placental mammals and primates were considered, the HOT loci being 1.45x and 1.1x more conserved than the regular enhancers, respectively (Fig S10). In addition, we observed that the HOT loci of all three cell lines (HepG2, K562, and H1) overlap with 22 ultraconserved regions, among which are the promoter regions of 11 genes including SP5, SOX5, AUTS2, PBX1, ZFPM2, ARID1A, OLA1 and the enhancer regions (within 50 kb of the TSS) of 5S rRNA, MIR563, SOX21, etc. (full list in Table S4). Among them are those which have been linked to diseases and other phenotypes. For example, DNAJC1 30 and OLA1 (which interacts with BRCA1) have been linked to breast cancer in cancer GWAS studies 31 , whereas AUTS2 32 and SOX5 33 have been linked to predisposition to neurological conditions such as Autism spectrum disorder, intellectual disability, and neurodevelopmental disorder. Of these genes, ARID1A, AUTS2, DNAJC1, OLA1, SOX5, and ZFPM2 have been reported to have strong activities in the Allen Mouse Brain Atlas 34 .

Sequence features of HOT loci. A) Distribution of conservation score in loci bound by DAPs in HepG2 and K562. The logarithmic part of the bins is expressed in terms of the percentages of loci that each bin covers, averaged over two cell lines. The correlation value is Pearson. The shaded region represents HOT loci. B ) phastCons conservation scores of regular enhancer, HOT loci, and exon regions. The values are normalized by the average scores of regular enhancers. C ) Classification performances (auROC and auPRC values) of HOT loci against the backgrounds of DHS, promoter, and regular enhancer regions. The X-axis values are the methods used for classifications. Methods starting with “seq -” are based on sequences (CNNs and gkmSVM, refer to Methods and main text). Starting with “feat -” are methods where all sequence features are used (GC, CpG, GpC, CpG island). Depicted values for feature-based SVMs are run using linear kernels.

CpG islands have been postulated to serve as permissive TF binding platforms 35 , 36 and this has been listed as one of the possible reasons for the existence of HOT loci in a previous study 18 . To test this hypothesis, we extracted the overlap rates of all DAP-bound loci with CpG islands (see Methods). While the overall fraction of loci that overlap CpG islands correlates strongly with the number of bound DAPs (rho=0.7, p-value=0.001; Pearson correlation), only 12% of HOT enhancers overlapped CpG islands whereas, for the HOT promoters, this fraction was 83%, suggesting that CpG islands alone do not explain HOT enhancer loci despite accounting for the majority of HOT promoter loci (Fig S11A). Similarly, the average GC content is strongly correlated with the number of bound DAPs (rho=0.89, p-value<10 −4 ; Pearson correlation, Fig S11B), with an average GC content of 64% and 51% in HOT promoters and HOT enhancers respectively (p-value<10 −100 , Mann-Whitney U test), in both HepG2 and K562.

In addition, we observed that the average content of repeat elements in the loci strongly and negatively correlates with the number of bound DAPs across the cell lines (rho=-0.9, p-value<10 −5 ; Pearson, Fig S11C), which is likely a result of the HOT loci being under elevated negative selection and rejecting the insertion of repetitive DNA.

Other genomic sequence features that have been considered in the context of HOT loci include, but are not limited to, G-quadruplexes, R-loops, and methylation patterns; previous studies have concluded that each of them can partially explain the phenomenon of the HOT loci 13 , 17 , 18 . Still, one of the central questions remains whether the HOT loci are driven by sequence features or are the result of cellular biology not strictly related to the sequences, such as the proximal accumulation of DAPs in foci due to the biochemical properties of accumulated molecules, or other epigenetic mechanisms.

To address this question with a broader approach, we asked whether the HOT loci can be accurately predicted based on their DNA sequences alone, and sequence features, including GC, CpG, GpC contents, and CpG island coverage. For sequence-based classification, we trained a Convolutional Neural Network (CNN) model using one-hot encoded sequences and an SVM classifier trained on gapped k-mers (gkmSVM) 37 (Supplemental Methods 1.6.2, Figure S12). For feature-based classification, we trained logistic regression (LogReg) classifiers and SVM classifiers with linear, polynomial, radial basis function, and sigmoid kernels (See Supplemental Methods 1.6.2.2 for detailed analysis). We carried out the classification experiments using the following control sets: a) randomly selected loci from merged DNaseI Hypersensitivity Sites (DHS) of cell lines in the Roadmap Epigenomics Project, b) promoter regions, and c) regular enhancers (See Supplemental Methods 6.1.1 for definitions).
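The two kinds of classifier input described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, CpG and GpC contents are taken here as dinucleotide frequencies, and CpG island coverage is omitted because it requires an external annotation. The exact feature definitions used in the study are given in its Supplemental Methods and may differ.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 indicators (A, C, G, T).

    Any other symbol (e.g. N) becomes an all-zero row. This layout is an
    assumption; CNN frameworks may expect a transposed array.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def sequence_features(seq):
    """Compute GC content and CpG / GpC dinucleotide frequencies."""
    seq = seq.upper()
    n = len(seq)
    dinucleotides = [seq[i:i + 2] for i in range(n - 1)]
    n_pairs = max(len(dinucleotides), 1)  # avoid division by zero
    return {
        "GC": (seq.count("G") + seq.count("C")) / n,
        "CpG": dinucleotides.count("CG") / n_pairs,
        "GpC": dinucleotides.count("GC") / n_pairs,
    }


# Hypothetical 6 bp example; real inputs in the text are 400 bp loci.
example = "ACGCGT"
encoded = one_hot(example)
features = sequence_features(example)
```

The one-hot matrix retains the full base order that a CNN or gapped k-mer model can exploit, while the feature vector collapses each locus to a few composition summaries, which is why the two schemes can differ in how much information they capture.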

Using the sequence features, we trained separate models using each of the features in addition to one with all of the features combined. We observed that, when averaged across all the methods, GC content value possesses the highest amount of discrimination power (auROC: 0.73), followed by the combination of all features (auROC: 0.70) (Figure S13A,B). When compared across the classification methods, LogReg and SVM with linear kernel outperformed the other non-linear kernels by 20%, suggesting that the features possess linearly combined or largely overlapping effects in encoding the information in HOT loci (Figure S13A).
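For readers unfamiliar with the auROC metric used to compare these classifiers, the sketch below computes it directly from its probabilistic definition: the chance that a randomly chosen positive (HOT) locus receives a higher score than a randomly chosen control locus, with ties counted as one half. The function name and the scores are hypothetical, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def auroc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical classifier scores for HOT loci and control loci.
hot_scores = [0.9, 0.4]
control_scores = [0.5, 0.1]
value = auroc(hot_scores, control_scores)
```

An auROC of 0.5 corresponds to random guessing and 1.0 to perfect separation, so the reported values of 0.70 to 0.73 for composition features indicate partial but incomplete discrimination.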

When classified using the sequences directly, CNN yielded the highest performance with auROC of 0.91, while for the gkmSVM it was 0.86 (both averaged over cell lines and control sets), suggesting that CNNs capture the motif grammar of the HOT loci better than gapped k-mers (Figure S13C). When the two classification schemes (sequence- and feature-based) are compared, CNNs outperformed the LogReg and linear SVMs by a factor of 1.3x (or 17%), suggesting that there is additional information that is highly relevant to the DNA-DAP interaction density encoded in the DNA sequences, in addition to the GC, CpG, GpC ( Fig 5C ). This is in line with the observation mentioned above, that 88% of the HOT enhancers do not overlap with annotated CpG islands. This analysis concluded that the mechanisms of HOT locus formation are likely encoded in their DNA sequences.

Of the control regions tested, we observed that CNNs can discriminate with the auROC values of 0.91, 0.89, and 0.87 for DHS, promoters, and regular enhancers respectively ( Fig 5C ). This observation reflects the fact that HOT loci are themselves located in enhancers and promoters, albeit representing fairly separable and distinct subsets of them.

Extending the input regions from 400 bp to 1 kb for sequence-based classification did not lead to a significant increase in performance, suggesting that the core 400 bp regions contain most of the information associated with DAP density (Fig S14).

Highly expressed housekeeping genes are commonly regulated by HOT promoters

After characterizing the HOT loci in terms of the DAP composition and sequence features, we sought to analyze the cellular processes they partake in. HOT loci were previously linked to highly expressed genes 18 . In both inspected differentiated cell lines (HepG2 and K562), the number of DAPs positively correlates with the expression level of their target gene (enhancers were assigned to their nearest genes for this analysis; rho=0.56, p-value<10 −10 ; Spearman correlation; Fig S15A). In HepG2, the average expression level of the target genes of promoters with at least one DAP bound is 1.7x higher than that of the target genes of enhancers with at least one DAP bound, whereas when only HOT loci are considered this fold-increase becomes 4.7x. This suggests that the number of bound DAPs of the HOT locus has a direct impact on the level of the target gene expression. Moreover, highly expressed genes (RPKM>50) were 4x more likely to have multiple HOT loci within 50 kb of their TSSs than the genes with RPKM<5 (p-value<10 −12 , chi-square test). In addition, the average distance between HOT enhancer loci and the nearest gene is 4.5x smaller than that of the regular enhancers (p-value<10 −30 , Mann-Whitney U test). Generally, we observed that the distances between the HOT enhancers and the nearest genes are negatively correlated with the number of bound DAPs (rho=-0.9; p-value<10 −6 ; Pearson correlation; Fig S15B), suggesting that an increasing number of bound DAPs makes the regulatory region more likely to be TSS-proximal.

To further analyze the distinction in involved biological functions between the HOT promoters and enhancers, we compared the fraction of housekeeping (HK) genes that they regulate, using the list of HK genes reported by (Hounkpe et al. 2021) 38 . According to this definition, 64% of HK genes are regulated by a HOT promoter and only 30% are regulated by regular promoters ( Fig 6A ). The HOT enhancers, on the other hand, flank 21% of the HK genes, which is less than the percentage of HK genes flanked by regular enhancers (38%). For comparison, 22% of the flanking genes of super-enhancers constitute HK genes. The involvement of HOT promoters in the regulation of HK genes is also confirmed in terms of the fraction of loci flanking the HK genes, namely, 21% of the HOT promoters regulate 64% of the HK genes. This fraction is much smaller (<9% on average) for the rest of the mentioned categories of loci (HOT and regular enhancers, regular promoters, and super-enhancers, Fig 6A ).

HOT promoters are ubiquitous and HOT enhancers are tissue-specific. A) Fractions of housekeeping genes regulated by the given category of loci (blue). Fractions of the loci which regulate the housekeeping genes (orange) B ) Tissue-specificity ( tau ) scores of the target genes of different types of regulatory regions C) GO enriched terms of HOT promoters and enhancers of HepG2. 0 values in the p-values columns indicate that the GO term was not present in the top 50 enriched terms as reported by the GREAT tool. All of the visualized data is generated from the HepG2 cell line.

We then asked whether the tissue-specificities of the expression levels of target genes of the HOT loci reflect their involvement in the regulation of HK genes. For this purpose, we used the tau metric as reported by (Palmer et al. 2021) 39 , where a high tau score (between 0 and 1) indicates a tissue-specific expression of a gene, whereas a low tau score means that the transcript is expressed stably across tissues. We observed that the average tau scores of target genes of HOT enhancers are significantly, but only by a small margin, greater than those of the regular enhancers (0.66 and 0.63, respectively; p-value<10 −18 , Mann-Whitney U test), with super-enhancers being equal to regular enhancers (0.63). The difference in the average tau scores of the HOT and regular promoters is stark (0.57 and 0.74 respectively, p-value<10 −100 , Mann-Whitney U test), with the HOT promoter average being 23% lower ( Fig 6B ). Combined with the involvement in the regulation of HK genes, average tau scores suggest that the HOT promoters are more ubiquitous than the regular promoters whereas HOT enhancers are more tissue-specific than the regular and super-enhancers. Further supporting this, the GO enrichment analysis showed that the GO terms associated with the set of genes regulated by HOT promoters are basic HK cellular functions (such as RNA processing , RNA metabolism , ribosome biogenesis, etc.), whereas HOT enhancers are enriched in GO terms of cellular response to the environment and liver-specific processes (such as response to insulin, oxidative stress, epidermal growth factors, etc.) ( Fig 6C ).
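To make the tau metric concrete, the sketch below implements its commonly used definition (Yanai et al. 2005): for a gene with expression values x_1..x_n across n tissues, tau is the sum of (1 - x_i / max(x)) divided by (n - 1). Whether the tau values used in this analysis involved additional preprocessing (for example log-transformation) is not stated in the text, so this should be read as an illustration of the metric and not of the exact pipeline; the function name and expression values are hypothetical.

```python
def tau(expression):
    """Tissue-specificity index tau for one gene.

    expression: non-negative expression values, one per tissue.
    Returns 0 for perfectly uniform expression and 1 for expression
    confined to a single tissue.
    """
    n = len(expression)
    x_max = max(expression)
    if n < 2 or x_max <= 0:
        raise ValueError("need at least two tissues and some expression")
    return sum(1 - x / x_max for x in expression) / (n - 1)


# Hypothetical profiles across four tissues.
housekeeping_like = tau([5.0, 5.0, 5.0, 5.0])   # uniform expression
tissue_specific = tau([0.0, 0.0, 0.0, 10.0])    # single-tissue expression
```

Under this definition, the lower average tau of HOT promoter targets (0.57) relative to regular promoter targets (0.74) corresponds to expression that is spread more evenly across tissues.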

A core set of HOT loci is active during development which expands after differentiation

Having observed that the HOT loci are active regions in many other human cell types, we asked if the observations made on the HOT loci of differentiated cell lines also hold true in the embryonic stage. To that end, we analyzed the HOT loci in H1 cells. It is important to note that the number of available DAPs in H1 cells is significantly smaller (n=47) than in HepG2 and K562, due to a much smaller size of the ChIP-seq dataset generated in H1. Therefore, the criterion of having >17% of available DAPs yields n>15 DAPs for H1, as opposed to 77 and 55 for HepG2 and K562, respectively. However, many of the features of the loci that we have analyzed so far demonstrated similar patterns (GC contents, target gene expressions, ChIP-seq signal values, etc.) when compared to the DAP-bound loci in HepG2 and K562, suggesting that, albeit limited, the distribution of the DAPs in H1 likely reflects the true distribution of HOT loci. To alleviate the difference in available DAPs, in addition to comparing the HOT loci defined using the complete set of DAPs, we also (a) applied the HOT classification routine using a set of DAPs (n=30) available in all three cell lines and (b) randomly subselected DAPs in HepG2 and K562 to match the number of DAPs in H1.

We observed that, when the complete set of DAPs is used, 85% of the HOT loci of H1 are also HOT loci in either of the other two differentiated cell lines ( Fig 7A ). However, only <10% of the HOT loci of the two differentiated cell lines overlapped with H1 HOT loci, suggesting that the majority of the HOT loci are acquired after differentiation. A similar overlap ratio was observed based on DAPs common to all three cell lines ( Fig 7B ), where 68% of H1 HOT loci overlapped with those of the differentiated cell lines. These overlap levels were much higher than those obtained with the randomly selected DAPs matching the H1 set (30%, Fig 7C ).

H1-hESC HOT loci A ) Overlaps between the HOT loci of three cell lines. B ) Overlaps between the HOT loci of cell lines defined using the set of DAPs available in all three cell lines. C ) Fractions of H1 HOT loci overlapping that of the HepG2 and K562 using the complete set of DAPs, common DAPs, and DAPs randomly subsampled in HepG2/K562 to match the size of H1 DAPs set D ) phastCons scores of HOT loci in HepG2, K562, and H1. The ratio of average conservation scores of HOT promoters with that of the HOT enhancers is at the top of every cell line’s group.

Average evolutionary conservation scores (phastCons) of the developmental HOT loci are 1.3x higher than those of K562 and HepG2 HOT loci (p-value<10 −10 , Mann-Whitney U test, Fig 7D ). It is conceivable that the embryonic HOT loci are located mainly in regions with higher conservation, and that more regulatory regions emerge as HOT loci after differentiation. Some of these tissue-specific HOT loci could be those that were acquired more recently (compared to the H1 HOT loci), as it is known that enhancers are often subject to higher rates of evolutionary turnover than promoters 40 .

GO enrichment analysis showed that H1 HOT promoters, similarly to the other cell lines, regulate the basic housekeeping processes (Fig S16) while the HOT enhancers regulate responses to environmental stimuli and processes active during the embryonic stage such as TORC1 signaling and beta-catenin-TCF assembly . This suggests that the main processes that the HOT promoters are involved in during the development remain relatively unchanged after the differentiation (in terms of associated GO terms, and due to being the same loci as the HOT promoters in differentiated cell lines), whereas the scope of the cellular activities regulated by HOT enhancers gets expanded after differentiation to be more exclusively tissue-specific.

HOT loci are enriched in causal variants

After establishing the expression and tissue-specificities of the HOT loci, we next analyzed the polymorphic variability in HOT loci and whether these loci are enriched in phenotypically causal variants. First, we analyzed the density of common variants extracted from the gnomAD database 41 (filtered with MAF>5%). We observed that HOT enhancers and HOT promoters are depleted in INDELs (4.7 and 4.1 variants per 1 kb, respectively), compared to the regular enhancers and regular promoters (5.5 and 6.2 variants per 1 kb, p-value<10 −4 and <10 −100 , respectively, Mann-Whitney U test; Fig 8A ). In contrast to the pattern of conservation scores described above, the density of common SNPs is elevated in HOT enhancers and HOT promoters compared to regular enhancers and regular promoters (1.14x and 1.07x fold-enrichment, p-values <10 −20 and <10 −100 , respectively, Mann-Whitney U test; Fig 8B ). This elevation of common variants in HOT loci, despite their location in conserved regions, has been reported in a previous study in which the binding motifs of TFs were observed to colocalize in regions where the density of common variants was higher than average 42 .

Densities of variants A ) common INDELs (MAF>5%), B ) common SNPs (MAF>5%), C ) eQTLs, D ) caQTLs E ) raQTLs, and F ) GWAS and LD (r2>0.8) variants in HOT loci and regular promoters and enhancers. G ) Enriched GWAS traits in HOT enhancers and promoters. All of the visualized data is generated from the HepG2 cell line.

The eQTLs, on the other hand, are 2.0x enriched in HOT promoters compared to the regular promoters (p-value<10 −21 , Mann-Whitney U test), while HOT enhancers are only moderately enriched in eQTLs compared to the regular enhancers (1.15x, p-value>0.05, Mann-Whitney U test; Fig 8C ). eQTL enrichment in HOT promoters and regular promoters (compared to HOT and regular enhancers respectively) is in line with the known characteristics of the eQTL dataset, that the eQTLs most commonly reflect TSS-proximal gene-variant relationships, and therefore are enriched in promoter regions since the TSS-distal eQTLs are hard to detect due to the burden of multiple tests 43 .

Unlike in the eQTL analysis, we observed that the chromatin accessibility QTLs (caQTLs) are dramatically enriched in the overall enhancer regions (HOT and regular) compared to the promoters (HOT and regular) (4.1x, p-value<10 −100 ; Mann-Whitney U test, Fig 8D ). This observation confirms the findings of the study that generated the caQTL dataset in HepG2 cells 44 , which reported that the likely causal caQTLs are predominantly the variants disrupting the binding motifs of liver-expressed TFs enriched in liver enhancers. However, within the promoter regions, the HOT promoters are 3.0x enriched in caQTLs compared to the regular promoters (p-value=0.001; Mann-Whitney U test), whereas the fold enrichment in HOT enhancers is insignificant (1.2x, p-value=0.22, Mann-Whitney U test).

The reporter array QTLs (raQTLs 45 ) display a similar enrichment pattern with respect to the overall (HOT and regular) promoter and enhancer regions, with 3.3x enrichment in enhancers (p-value<10 −10 , Mann-Whitney U test, Fig 8E ). However, within-promoter and within-enhancer comparisons show that the enrichment in HOT promoters is more pronounced than in HOT enhancers (3.6x and 1.8x, p-values<0.01 and <10 −11 , respectively, Mann-Whitney U test). The enrichment of the raQTLs in enhancers over the promoters likely reflects the fact that the SNP-containing loci are first filtered for raQTL detection according to their capacities to function as enhancers in the reporter array 45 .

Combined, all three QTL datasets show a pronounced enrichment in HOT promoters compared to the regular promoters, whereas only the raQTLs show significant enrichment in HOT enhancers. This suggests that the individual DAP ChIP-seq peaks in HOT promoters are more likely to have consequential effects on promoter activity if altered, while HOT enhancers are less susceptible to mutations. Additionally, it is noteworthy that only the raQTLs are the causal variants, whereas e/caQTLs are correlative quantities subject to the effects of LD.

Finally, we used the GWAS SNPs combined with the LD SNPs (r2>0.8) and observed that the HOT promoters are significantly enriched in GWAS variants (1.8x, p-value<10 −100 ) whereas the HOT enhancers show no significant enrichment over regular enhancers (p-value>0.1, Mann-Whitney U test) ( Fig 8F ). We then calculated the fold-enrichment levels of GWAS trait SNPs using the combined DHS regions of Roadmap Epigenome cell lines as a background (see Methods). Filtering the traits with significant enrichment in HOT loci (p-value<10 −3 , Binomial test, Bonferroni corrected, see Methods) left 7 traits, all of which are definitively related to liver functions ( Fig 8G ). Of the seven traits, only one ( Blood protein level ) was significantly enriched in regular promoters. While the regular enhancers are enriched in most (6 of 7) of the traits, the overall enrichment values in HOT enhancers are 1.3x greater compared to the regular enhancers. The fold-increase is even greater (1.5x) between the HOT and DHS regions. When the enrichment significance levels are selected using unadjusted p-values, we obtained 24 GWAS traits, of which 22 are related to liver functions (Fig S17). This analysis demonstrated that the HOT loci are important for phenotypic homeostasis.
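The trait-filtering step (binomial test with Bonferroni correction) can be sketched as below. The parametrization is an assumption made for illustration: each trait is described by the number of its SNPs falling in HOT loci, its total number of SNPs, and the fraction expected under the background, and a one-sided upper-tail test is used. The actual procedure is described in the study's Methods and may differ; all names and counts here are hypothetical.

```python
from math import comb


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def enriched_traits(counts, alpha=1e-3):
    """Return traits passing a Bonferroni-corrected binomial test.

    counts: dict mapping trait -> (snps_in_hot, snps_total, background_fraction)
    Returns a dict mapping trait -> (fold_enrichment, corrected_p_value).
    """
    n_tests = len(counts)
    significant = {}
    for trait, (k, n, p) in counts.items():
        # Bonferroni: multiply each p-value by the number of tests, cap at 1.
        corrected = min(1.0, binom_upper_tail(k, n, p) * n_tests)
        if corrected < alpha:
            significant[trait] = ((k / n) / p, corrected)
    return significant


# Hypothetical counts: trait_a is strongly enriched, trait_b is not.
toy_counts = {
    "trait_a": (30, 100, 0.05),
    "trait_b": (5, 100, 0.05),
}
hits = enriched_traits(toy_counts)
```

Direct summation is adequate for the small counts shown here; for genome-scale SNP counts a numerically stable library routine would be preferable.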

HOT loci have been noticed and studied in different species since the early years of the advent of the ChIP-seq datasets 12 – 16 , 27 . Until recently, most studies focused on the reasons why ChIP-seq peaks appeared to bind to HOT loci with no apparent sequence specificity, and characterized certain sequence features of the HOT loci which could enable elevated read mapping rates 13 , 17 , 18 . As the number of assayed DAPs in multiple human cell types and model organisms has increased, however, the assumption that the HOT loci are exceptional cases and results of false positives in ChIP-seq protocols has given way to the acceptance that the HOT loci, with exorbitant numbers of mapped TFBSs, are indeed hyperactive loci with distinct features characteristic of active regulatory regions 19 , 20 .

In this study, we examined the HOT loci from multiple aspects complementary to the previous works and expanded the scope of characterization extensively using functional genomics datasets. We used the two most extensively characterized differentiated cell lines of the ENCODE Project: HepG2 and K562. We also included the H1-hESC human stem cells to study the activities of HOT loci during the embryonic stage. The number of assayed DAPs in these cell lines is far from complete 28 ; therefore, it is important to note that as the sizes of the assayed DAP ChIP-seq datasets increase, our understanding of the mechanisms of HOT loci will certainly improve. However, the core principles can already be inferred using the currently available datasets. Previous studies have used different metrics to define the HOT loci. For example, Wreczycka et al. 2019 18 used the 99 th percentile of the density of TFBSs for a 500 bp sliding window, Ramaker et al. 2020 19 used a window length of 2 kb and required >25% of TFs to be mapped, and Partridge et al. 2020 20 used loci with >70 chromatin-associated proteins in a 2 kb window. These heterogeneous definitions, however, fail to appreciate that the histogram of loci binned by the number of harbored TFBSs follows an exponential distribution ( Fig 1A , Fig S1). We, therefore, applied our analyses both to the binarily defined HOT and non-HOT loci, as well as to the overall spectrum of loci in the context of TFBS density. This approach allowed us to better understand the correlations of characteristics of loci with the TF activity. Notably, this approach showed that the HOT loci have a propensity to engage in long-range chromatin contacts with loci that are equally or more densely DAP-bound, as opposed to less active ones, making it clearer that the HOT loci are located in 3D hubs and FIREs ( Fig 3A ).

Using the datasets generated in H1 we established that only <10% of the HOT loci in two differentiated cell lines overlap with the HOT loci of stem cells. This points to the high tissue-specificity of the HOT loci. Previous studies have also concluded that the HOT loci are not constitutive by nature, and are established in a dynamic manner after the differentiation 21 . Of note, we used the datasets related to H1 cells in order to study the developmental aspects of the HOT phenomenon, and due to the much smaller sizes of the available datasets, we did not include the H1 in other parts of the analyses.

We conducted our analyses using a more comprehensive set of datasets of functional genomics including Hi-C data, eQTLs, raQTLs, etc. Our approach of splitting the HOT loci into enhancer and promoter regions allowed us to detect distinct patterns characteristic of these two categories. While the HOT promoters and enhancers share some sequence features, they are bound by a distinct set of DAPs and possess different biases in enrichments of different types of QTLs.

We analyzed the patterns of DAPs in HOT loci using PCA, in a manner similar to that described in Partridge et al. 2020 20 , whose analysis was conducted only on chromatin-associated proteins in HepG2. We asked whether the findings of that study hold true for HOT loci defined using an unbiased set of DAPs, and we observed that the chromatin-associated DAPs can be distinctly separated from the other transcription-related DAPs.

Previous studies have carried out extensive mapping of the binding motifs of TFs to the HOT loci and identified a small set of “anchor” binding motifs of a few key tissue-specific TFs 13 , 19 , and proposed that perhaps these driver TFs initiated the formation of HOT loci, similar to how the pioneer factors function. More importantly, more studies have come to the conclusion that the overwhelming majority of the peaks do not contain the corresponding motifs and that most of the mapped peaks represent indirect binding through TF-TF interactions 19 , 20 , 42 , 46 . We relied on the conclusions of these studies in making the assumption that the inexplicably high density of DAPs could not be explained by the direct binding events and did not carry out the analyses based on DNA-binding motifs. Interestingly, the high prediction accuracy of our deep learning model is in agreement with the notion of the existence of shared motifs among the HOT loci but also implies that the indirectly bound loci also carry shared sequence features, perhaps other than the binding motifs or weak motifs which are not detected using the traditional PWM-based tools of motif detection. More studies are needed to further categorize the HOT loci along with the binding affinities of TFs.

Another model that has been increasingly attributed to the formation and maintenance of long-range 3D chromatin interactions involves phase-separated condensates 47 – 50 . Some enhancers (dubbed MegaTrans enhancers) were shown to drive the formation of large chromosomal assemblies involving a high concentration of TFs 47 . In general, it has been increasingly appreciated that condensates ubiquitously attract and activate enhancers 51 – 53 . The property of the condensates, which is of special interest to this study, is the capacity to serve as a “storage” of factors and co-factors inside the phase-separated droplets. For instance, the condensates can store hundreds of p300 molecules at active enhancers such that their catalytic histone acetyltransferase activity is decreased while in the phase-separated state, essentially kept in dormant mode until released 54 . The detection of condensates relies on low-throughput live cell imaging methods such as FISH, which often involves only a few tagged molecules. Therefore, currently, there are no datasets of condensate formation with large numbers of molecules simultaneously that we could use to make statistical inferences. However, there is already an increasing body of research reporting that many transcriptional activities are driven by the formation of condensates, where each of them studies individual proteins in their contexts. Based on all this, we postulate that the HOT loci might be the loci where transcriptional condensates form. Once the condensates of sufficient size form, the kinetic trap that it creates can facilitate the accumulation of a soup of DAPs, which then can undergo high-intensity protein-protein and protein-DNA interactions, many constituents of which then get mapped to the involved DNA regions upon ChIP-seq experiments.

Condensates formed at different foci in the nucleus have been shown to acquire physicochemical properties depending on their functions 49 . For instance, the sizes of the transcriptional condensates have been shown to be regulated by the concentration of RNA molecules contained in them 55 . Initially, RNA molecules serve as scaffolds to form the condensates. However, once the concentration of nascent RNA starts to increase due to transcription, the condensates dissolve, providing a regulatory feedback loop and thus explaining the phenomenon of transcriptional bursts 56 . Another aspect that this RNA-based condensate regulation explains is the enrichment of transcribed RNAs in the active enhancers. Indeed, we observed extreme enrichment of eRNAs in HOT enhancers (Fig S18), further supporting the condensate hypothesis of the HOT loci.

With the condensates assumed, the HOT loci become all the more explainable since ChIP-seq extracts the reads from populations of millions of cells, amounting to an average of many underlying protein-protein and protein-DNA interactions. With the advent of more precise protocols such as CUT&RUN, micro-C, and single-cell versions of ChIP-seq, ATAC-seq combined with bigger databases of experimentally verified condensate studies, we will have a better understanding of how the HOT loci form and gain insights into the causal relations between the high concentrations of DAPs and the transcriptional condensates.

Transcription factor (DAP), histone modification, DNase-I hypersensitivity sites ChIP-seq and ATAC-seq datasets for the HepG2, K562, and H1-hESC cell lines were batch downloaded from the ENCODE Project 57 . For each DAP of each cell line, if there were multiple datasets, the one with the latest date was selected, prioritizing those with the fewest audit errors and warnings (Table S1). The GRCh37/hg19 assembly was used as the reference genome throughout the study. In cases where a ChIP-seq dataset was reported on GRCh38/hg38, the coordinates were converted to hg19 using liftOver. The phastCons evolutionary conservation scores generated from 46 vertebrate species, placental mammals and primates, CpG islands, repeat elements and GENCODE TSS annotations were all obtained from the UCSC Genome Browser database 11 . Transcribed enhancer regions (eRNAs) were obtained from the FANTOM database 58 . Super-enhancer regions were obtained from (Hnisz et al. 2013) 59 .

Hi-C datasets were obtained from the ENCODE Project. Please refer to Supplemental Methods 1.3 for a detailed description of the Hi-C data analysis.

GC contents were calculated using the "nuc" functionality of the bedtools program 60 . Gene expression data was obtained from the Roadmap Epigenomics project. For analyzing the expression levels of target genes, the gene of the overlapping TSS was used for promoters, whereas for enhancers, the nearest genes were selected using the bedtools closest function. Tissue-specificity metric tau scores for genes were downloaded from (Palmer et al. 2021) 39 , which were calculated using the data mined from Gene Expression Omnibus 61 .

Definitions

The loci were divided into bins according to a two-part scale. The first part is on a linear scale from 1 to 5, the second part is on a logarithmic scale from 5 to the maximum number of DAPs bound to a single locus in that cell line (Table 1). These nominal numbers are used in cases when the distributions are displayed for individual cell lines (such as Fig1A and Fig). When the figures display the distributions for two cell lines in a joint manner (such as Fig3A,B ), the edges are converted to the average percentages of the overall scale lengths for each cell line.
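The two-part scale described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the number of logarithmic bins (`n_log_bins`) are assumptions, and the actual edges used in the study are those listed in Table 1.

```python
import bisect


def two_part_bin_edges(max_daps, n_log_bins=10):
    """Bin edges: linear from 1 to 5, then log-spaced from 5 to max_daps.

    n_log_bins is a hypothetical parameter; the study's edges are in Table 1.
    """
    if max_daps <= 5:
        raise ValueError("max_daps must exceed 5 for the logarithmic part")
    linear = [1, 2, 3, 4, 5]
    # Constant multiplicative step so that the last edge equals max_daps.
    ratio = (max_daps / 5) ** (1 / n_log_bins)
    log_part = [5 * ratio ** i for i in range(1, n_log_bins)]
    log_part.append(float(max_daps))
    return linear + log_part


def assign_bin(n_bound, edges):
    """Index i with edges[i] <= n_bound < edges[i + 1]; the last bin is closed."""
    if n_bound < edges[0] or n_bound > edges[-1]:
        raise ValueError("n_bound lies outside the binning scale")
    idx = bisect.bisect_right(edges, n_bound) - 1
    return min(idx, len(edges) - 2)
```

With `max_daps=500` and four logarithmic bins, for example, the edges would be the five linear values followed by four log-spaced values ending at 500.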

Regular enhancers were defined as the central 400 bp regions of DNase-I hypersensitivity sites (DHS) that overlap H3K27ac histone modification regions, with promoters and exons removed from them.

Promoters were defined as the regions 1.5 kb upstream and 500 bp downstream of the canonical and alternative TSS coordinates, which were extracted from the knownGenes.txt table obtained from the UCSC Genome Browser.
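The coordinate arithmetic behind the enhancer and promoter definitions can be illustrated with a short sketch. The function names are hypothetical, intervals are assumed to be 0-based and half-open (BED style), and the strand-aware handling of "upstream" is an assumption that the text does not spell out. The removal of promoters and exons from enhancers is not shown.

```python
def central_window(start, end, width=400):
    """Central `width` bp of a DHS interval (assumed 0-based, half-open)."""
    mid = (start + end) // 2
    half = width // 2
    return max(mid - half, 0), mid + half


def promoter_window(tss, strand, upstream=1500, downstream=500):
    """Promoter interval around a TSS; upstream follows the gene's strand."""
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # On the minus strand, upstream lies at higher coordinates.
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(start, 0), end
```

Clamping the start at 0 keeps windows near a chromosome start valid; a full implementation would also clamp the end at the chromosome length.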

All the genomic arithmetic operations were done using the bedtools program 60 . Figures were generated using Matplotlib 62 and Seaborn 63 packages. Statistical and numerical analyses were done using the pandas, NumPy , SciPy and sklearn packages 64 in Python programming language. Genomic repeat regions were extracted from RepeatMasker table obtained from http://www.repeatmasker.org/ . CpG islands were extracted from cpgIslandExt table obtained from the UCSC Genome Browser. Protein-protein interaction network information was obtained using the https://string-db.org web interface 65 .

Statistical analyses

All the statistical significance analyses were done using the SciPy package. Statistical significance of genomic region overlaps was calculated using the “ bedtools fisher ” command. The p-values too small to be represented by the command line output were represented as <1E-100.

Correlation values with the number of bound TFs were calculated using the average of the value for the bins, and the midpoint numbers of the edges of each bin.
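As a rough sketch of this calculation, the bin midpoints can be paired with the per-bin averages and passed to a correlation function. The text does not state which correlation coefficient was used, so Pearson's r is an assumption here, and the function names are illustrative.

```python
import math


def bin_midpoints(edges):
    """Midpoint of each bin defined by consecutive edges."""
    return [(lo + hi) / 2 for lo, hi in zip(edges[:-1], edges[1:])]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

Here `x` would be the output of `bin_midpoints` and `y` the average value of the feature of interest in each bin.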

GWAS analysis

NHGRI-EBI GWAS database variants were grouped according to their traits (dataset e0_r2022-11-29). For each GWAS SNP, LD SNPs with r2>0.8 were added using the plink v1.9 66 program using the parameters --ld-window-r2 0.8 --ld-window-kb 100 --ld-window 1000000 . Enrichments of GWAS-trait SNPs were calculated as the ratios of densities of SNPs in each class of regions (eg. HOT enhancers, HOT promoters) to either that of the regular enhancers or the DHS regions. Statistical significance of enrichment was calculated using the binomial test. FDR values were calculated using the Bonferroni correction.
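The enrichment and significance steps can be sketched in plain Python. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, and the text does not specify the null model of the binomial test. One possible choice, assumed here, is to take n as the number of trait SNPs in the reference regions and p as the share of reference base pairs covered by the region class.

```python
import math


def density_ratio(snps_class, bp_class, snps_ref, bp_ref):
    """Ratio of SNP density in a region class to that in the reference regions."""
    return (snps_class / bp_class) / (snps_ref / bp_ref)


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        # Log of the binomial pmf, computed via log-gamma to avoid overflow.
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1)
                   - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

The explicit summation is adequate for SNP counts in the thousands; for much larger n, a library routine such as SciPy's binomial test would be the more practical choice.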

Sequence classification analysis

Classification tasks were constructed in a binary classification setup. The control regions were drawn from: a) randomly selected merged DHS regions (10x the size of the HOT loci) from all the available datasets of the Roadmap Epigenomics Project; b) all of the promoter regions as defined above; c) regular enhancers as defined above, with the HOT loci subtracted (see Supplemental Methods 1.6.1 for details).

Sequence-based classification (CNN) : sequences were converted to one-hot encoding and a Convolutional Neural Network was trained using each of the control regions as negative set. The model was built using tensorflow v2.3.1 67 and trained on NVIDIA k80 GPUs (see Supplemental Methods 1.6.2.1 for details).
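For readers unfamiliar with one-hot encoding, a minimal sketch follows. The channel order (A, C, G, T) and the all-zero encoding of ambiguous bases such as N are assumptions; the details of the authors' preprocessing are in Supplemental Methods 1.6.2.1.

```python
BASES = "ACGT"  # assumed channel order


def one_hot(sequence):
    """Encode a DNA sequence as a list of 4-element 0/1 rows, one per base."""
    index = {base: i for i, base in enumerate(BASES)}
    rows = []
    for base in sequence.upper():
        row = [0, 0, 0, 0]
        if base in index:  # ambiguous bases (e.g. N) stay all-zero
            row[index[base]] = 1
        rows.append(row)
    return rows
```

A sequence of length L thus becomes an L x 4 matrix, which is the input shape a one-dimensional convolutional layer would scan along the sequence axis.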

Sequence-based classification (SVM) : SVM models were trained using the LS-GKM package 37 (see Supplemental Methods 1.6.2.2 for details).

Feature-based classification : sequences were represented in terms of GC, CpG, GpC contents and overlap percentages with annotated CpG islands. Logistic regression and SVM classifiers were trained using these sequence features (see Supplemental Methods 1.6.3 for details).
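The composition features named above can be computed along these lines. The normalization (fractions of bases and of dinucleotide positions) is an assumption, since the exact definitions are given in Supplemental Methods 1.6.3, and the overlap with annotated CpG islands is omitted because it requires the annotation table.

```python
def composition_features(sequence):
    """GC, CpG and GpC contents of a DNA sequence, each as a fraction."""
    seq = sequence.upper()
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two bases")
    gc = sum(1 for base in seq if base in "GC") / len(seq)
    # Dinucleotides are counted over all len(seq) - 1 adjacent positions.
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    cpg = pairs.count("CG") / len(pairs)
    gpc = pairs.count("GC") / len(pairs)
    return {"GC": gc, "CpG": cpg, "GpC": gpc}
```

The resulting dictionary values, together with the CpG island overlap percentage, would form the feature vector passed to a logistic regression or SVM classifier.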

Variant analysis

Common SNPs and INDELs were extracted from the gnomAD r2.1.1 dataset 41 . Variants with PASS filter value and MAF>5% were selected using the "view -f PASS -i 'MAF[0]>0.05'" options of the bcftools program 68 . Loss-of-function variants were downloaded from the "all homozygous LoF curation" section of the v2.1.1 database on the gnomAD website. raQTLs were downloaded from https://sure.nki.nl 45 . Liver and blood eQTLs were extracted from the GTEx v8 dataset ( https://www.gtexportal.org/home/datasets ). Liver caQTLs were obtained from the supplementary material of 44 .

Software and Data Availability Statement

The codebase used for generating the results presented in this manuscript is available at https://github.com/okurman/HOT . Supplemental and source datasets used in the study are available at https://zenodo.org/records/10267278 .

Acknowledgements

This work utilized the computational resources of the NIH HPC Biowulf cluster ( http://hpc.nih.gov ). This research was supported by the Intramural Research Program of the National Library of Medicine, National Institutes of Health.


Article and author information

Sanjarbek Hudaiberdiev; for correspondence: Ivan Ovcharenko.

This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication .

  • Reviewing Editor Nicolas Altemose Stanford University, United States of America
  • Senior Editor Sofia Araújo University of Barcelona, Barcelona, Spain

Reviewer #1 (Public Review):

This study explores the sequence characteristics and features of high-occupancy target (HOT) loci across the human genome. The computational analyses presented in this paper provide information into the correlation of TF binding and regulatory networks at HOT loci that were regarded as lacking sequence specificity.

By leveraging hundreds of ChIP-seq datasets from the ENCODE Project to delineate HOT loci in HepG2, K562, and H1-hESC cells, the investigators identified the regulatory significance and participation in 3D chromatin interactions of HOT loci. Subsequent exploration focused on the interaction of DNA-associated proteins (DAPs) with HOT loci using computational models. The models established that the potential formation of HOT loci is likely embedded in their DNA sequences and is significantly influenced by GC contents. Further inquiry exposed contrasting roles of HOT loci in housekeeping and tissue-specific functions spanning various cell types, with distinctions between embryonic and differentiated states, including instances of polymorphic variability. The authors conclude with a speculative model that HOT loci serve as anchors where phase-separated transcriptional condensates form. The findings presented here open avenues for future research, encouraging more exploration of the functional implications of HOT loci.

The concept of using computational models to define characteristics of HOT loci is refreshing and allows researchers to take a different approach to identifying potential targets. The major strength of the study lies in the very large number of datasets analyzed, with hundreds of ChIP-seq datasets for both HepG2 and K562 cells as part of the ENCODE project. Such quantitative power allowed the authors to delve deeply into HOT loci, which were previously thought to be artifacts.

Weaknesses:

While this study contributes to our knowledge of HOT loci, there are critical weaknesses that need to be addressed. There are questions on the validity of the assumptions made for certain analyses. The speculative nature of the proposed model involving transcriptional condensates needs either further validation or to be toned down. Furthermore, some apparent contradictions exist among the main conclusions, and these either need to be better explained or corrected. Lastly, several figure panels could be better explained or described in the figure legends.

  • https://doi.org/10.7554/eLife.95170.1.sa2

Reviewer #2 (Public Review):

The paper 'Sequence characteristic and an accurate model of abundant hyperactive loci in human genome' by Hudaiberdiev and Ovcharenko offers comprehensive analyses and insights about the 'high-occupancy target' (HOT) loci in the human genome. These are considered genomic regions that overlap with transcription factor binding sites. The authors provided very comprehensive analyses of the TF composition characteristics of these HOT loci. They showed that these HOT loci tend to overlap with annotated promoters and enhancers, GC-rich regions, open chromatin signals, and highly conserved regions, and that these loci are also enriched with potentially causal variants with different traits.

Overall, the HOT loci' definition is clear and the data of HOT regions across the genome can be a useful dataset for studies that use HepG2 or K562 as a model. I appreciate the authors' efforts in presenting many analyses and plots backing up each statement.

It is noteworthy that the HOT concept and their signature characteristics as being highly functional regions of the genome are not presented for the first time here. Additionally, I find the main manuscript, though very comprehensive, long-winded and can be put in a shorter, more digestible format without sacrificing scientific content.

The introduction's mention of the blacklisted region can be rather misleading because when I read it, I was anticipating that we are uncovering new regulatory regions within the blacklisted region. However, the paper does not seem to address the question of whether the HOT regions overlap, if any, with the ENCODE blacklisted regions afterward. This plays into the central assessment that this manuscript is long-winded.

The introduction also mentioned that HOT regions correspond to 'genomic regions that seemingly get bound by a large number of TFs with no apparent DNA sequence specificity' (this point of 'no sequence specificity' is reiterated in the discussion lines 485-486). However, later on in the paper, the authors also showed that models such as convolutional neural networks, which take in one-hot-encoded DNA sequence to predict HOT loci, performed really well. This means that the sequence contexts with potential motifs can still play a role in forming the HOT loci. At the same time, lines 59-60 also cited studies that "detected putative drive motifs at the core segments of the HOT loci". The authors should edit the manuscript to clarify (or eradicate) contradictory statements.

  • https://doi.org/10.7554/eLife.95170.1.sa1

Reviewer #3 (Public Review):

Hudaiberdiev and Ovcharenko investigate regions within the genome where a high abundance of DNA-associated proteins are located and identify DNA sequence features enriched in these regions, their conservation in evolution, and variation in disease. Using ChIP-seq binding profiles of over 1,000 proteins in three human cell lines (HepG2, K562, and H1) as a data source they're able to identify nearly 44,000 high-occupancy target loci (HOT) that form at promoter and enhancer regions, thus suggesting these HOT loci regulate housekeeping and cell identity genes. Their primary investigative tool is HepG2 cells, but they employ K562 and H1 cells as tools to validate these assertions in other human cell types. Their analyses use RNA pol II signal, super-enhancer, regular-enhancer, and epigenetic marks to support the identification of these regions. The work is notable, in that it identifies a set of proteins that are invariantly associated with high-occupancy enhancers and promoters and argues for the integration of these molecules at different genomic loci. These observations are leveraged by the authors to argue HOT loci as potential sites of transcriptional condensates, a claim that they are well poised to provide information in support of. This work would benefit from refinement and some additional work to support the claims.

Condensates are thought to be scaffolded by one or more proteins or RNA molecules that associate together to induce phase separation. The authors can readily provide from their analysis a check of whether HOT loci exist within different condensate compartments (or a marker for them). Generally, ChIP-seq signal from MED1 and Ronin (THAP11) would be anticipated to correspond with transcriptional condensates of different flavors; other coactivator proteins (e.g., BRD4) would be useful to include as well. Similarly, condensate scaffolding proteins of facultative and constitutive heterochromatin (HP1a and EZH2/1) would augment the authors' model by providing further evidence that HOT loci occur at transcriptional condensates and not heterochromatin condensates. Sites of splicing might be informative as well. Splicing condensates (or nuclear speckles) are scaffolded by SRRM/SON, which is probably not in their data set, but members of the serine arginine-rich splicing factor family of proteins can serve as a proxy; SRSF2 is the best studied of this set. This would provide a significant improvement to their proposed model and would be expected, since the authors note that these proteins occur at the enhancer and promoter regions of highly expressed genes.

It is curious that MAX is found to be highly enriched without its binding partner Myc. Is Myc's signal simply lower in abundance, or is it absent from HOT loci? How could it be possible that a pair of proteins that bind DNA as a heterodimer are found in HOT loci without invoking a condensate model to interpret the results?

Numerous studies have linked the physical properties of transcription factor proteins to their role in the genome. The authors here provide a limited analysis of the proteins found at different HOT loci by employing GO terms. Is there evidence for specific types of structural motifs, disordered motifs, or related properties of these proteins present in specific loci?

Condensates themselves possess different emergent properties, but it is a product of the proteins and RNAs that concentrate in them and not a result of any one specific function (condensates can have multiple functions!)

Transcriptional condensates serve as functional bodies. The notion the authors present in their discussion is not held by practitioners of condensate science, in that condensates exist to perform biochemical functions and are dissolved in response to satisfying that need, not that they serve simply as reservoirs of active molecules. For example, transcriptional condensates form at enhancers or promoters that concentrate factors involved in the activation and expression of that gene and are subsequently dissolved in response to a regulatory signal (in transcription this can be the nascently synthesized RNA itself or other factors). The association reactions driving the formation of active biochemical machinery within condensates are materially changed, as are the kinetics of assembly. It is unnecessary and inaccurate to qualify transcriptional condensates as depots for transcriptional machinery.

This work has the potential to advance the field forward by providing a detailed perspective on what proteins are located in what regions of the genome. Publication of this information alongside the manuscript would advance the field materially.

  • https://doi.org/10.7554/eLife.95170.1.sa0

Outrunning the grim reaper: longevity of the first 200 sub-4 min mile male runners

  • http://orcid.org/0000-0001-7651-3740 Stephen Foulkes 1 , 2 ,
  • Dean Hewitt 1 ,
  • Rachel Skow 1 ,
  • Douglas Dover 3 ,
  • Padma Kaul 3 ,
  • http://orcid.org/0000-0002-3906-3784 André La Gerche 2 , 4 ,
  • Mark Haykowsky 1
  • 1 Integrated Cardiovascular Exercise Physiology and Rehabilitation Laboratory, Faculty of Nursing , University of Alberta , Edmonton , Alberta , Canada
  • 2 Heart, Exercise and Research Trials Laboratory , St Vincent’s Institute of Medical Research , Melbourne , Victoria , Australia
  • 3 Canadian VIGOUR Centre, Faculty of Medicine and Dentistry , University of Alberta , Edmonton , Alberta , Canada
  • 4 Cardiology Department , St Vincent’s Hospital Melbourne , Fitzroy , Victoria , Australia
  • Correspondence to Dr Mark Haykowsky, Integrated Cardiovascular Exercise Physiology and Rehabilitation Laboratory, Faculty of Nursing, University of Alberta, Edmonton, AB T6G 1C9, Canada; mhaykows{at}ualberta.ca

Objectives To determine the impact of running a sub-4 min mile on longevity. It was hypothesised that there would be an increase in longevity for runners who successfully completed a sub-4 min mile compared with the general population.

Methods As part of this retrospective cohort study, the Sub-4 Alphabetic Register was used to extract the first 200 athletes to run a sub-4 min mile. Each runner’s date of birth, date of their first successful mile attempt, current age (if alive) or age at death was compared with the United Nations Life Tables to determine the difference in each runner’s current age or age at death with their country of origin-specific life expectancy.

Results Of the first 200 sub-4 min mile runners (100% male), 60 were dead (30%) and 140 were still alive. Sub-4 min mile runners lived an average of 4.7 years beyond their predicted life expectancy (95% CI 4.7 to 4.8). When accounting for the decade of completion (1950s, 1960s or 1970s), the longevity benefits were 9.2 years (n=22; 95% CI 8.3 to 10.1), 5.5 years (n=88; 95% CI 5.3 to 5.7) and 2.9 years (n=90; 95% CI 2.7 to 3.1), respectively.

Conclusion Sub-4 min mile runners have increased longevity compared with the general population, thereby challenging the notion that extreme endurance exercise may be detrimental to longevity.

Data availability statement

Data sharing not applicable as no datasets generated and/or analysed for this study. Not applicable - all data were obtained from free publicly available databases or information sources.

https://doi.org/10.1136/bjsports-2024-108386


WHAT IS ALREADY KNOWN ON THIS TOPIC

Regular moderate exercise is considered a pillar of healthy ageing. However, there are concerns that exposing the body to extreme exercise bouts may be harmful to longevity.

WHAT THIS STUDY ADDS

We compared the longevity of the first 200 athletes to run a sub-4 min mile (the epitome of extreme exercise and pushing the body to its physiological limits) with that of the general population.

We showed that athletes who complete a sub-4 min mile live several years longer than the general population.

HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY

Our findings challenge the notion that extreme endurance exercise may be detrimental to longevity, reinforcing the benefits of exercise, even at training levels required for elite performance.

Introduction

6 May 2024 was the 70th anniversary of what is widely considered one of the most significant achievements of the modern sporting era. Specifically, at a track meet at Oxford University’s Iffley Road stadium, Roger Bannister, a 25-year-old Englishman and medical trainee, became the first person to run 1 mile in under 4 min (3:59.4 min, to be exact). 1 2 This represented a monumental breakthrough as it challenged the notion of what many believed was an impenetrable barrier for human exercise physiology and sport performance. 1–3 However, it also raised questions about the potential costs of pushing the human body to the level required to achieve this feat. 3 The 70th anniversary of Bannister’s world record-breaking achievement highlights the progress that can be made by demonstrating the new upper limits of human performance. Since 1954, more than 1750 athletes have joined Bannister in the halls of sub-4 min fame, 4 with the world record baton passing from Bannister to another 18 remarkable athletes, and is now held by Hicham El Guerrouj from Morocco with a time of 3:43.13 min set 25 years ago in 1999. 4 While any doubts surrounding the possibility of a human breaking the 4 min mile have been put to rest, the concerns around the health sequelae—and particularly the cardiovascular consequences of pushing the human body to its physiological limits—persist. 3 5–7

Concerns about whether too much exercise can be harmful predate Bannister by several centuries, going back to the tale of Pheidippides’ fateful run from Marathon to Athens, where he died suddenly shortly after announcing the Greeks’ battlefield victory over the Persians (although the tale has seen much embellishment over the subsequent centuries). 3 7 Proponents of the view that extreme exercise may cause long-term adverse health effects point to evidence of a ‘U-shaped’ or ‘reverse J-shaped’ association between cardiac events and exercise dose. 5 6 This view suggests that regular moderate exercise provides health benefits but that extremes either side—sedentary behaviour on one side and high volumes of intense endurance exercise on the other—may increase the risk of premature mortality. 5 6 This hypothesis is supported by detailed physiological investigations showing that high-intensity exercise bouts and/or extreme sporting events such as marathons, endurance cycling and Ironman triathlons are associated with potentially concerning changes in cardiac structure or function including acute increases in biomarkers of cardiac injury, reduced resting left and right ventricular function and myocardial fibrosis (although in a minority of athletes). 8 There is some epidemiological evidence to suggest that higher volumes of strenuous exercise have no benefit to longevity relative to sedentary adults, and may even be harmful. 9 However, this finding is based on a low number of community-dwelling individuals and, as such, extreme caution is warranted when extrapolating a potentially underpowered observation from recreationally active community-dwelling adults to the broader population of high-level endurance athletes.
Indeed, epidemiological studies focused on populations selected specifically for their extreme exercise behaviour and/or physiologic capabilities (eg, Tour de France cyclists, 10 11 Olympic athletes, 12 rowers 13 ) have shown increased rather than decreased longevity compared with the general population. Whether this holds true for the extreme genetic, physiological and exercise training phenotype of the sub-4 min mile runner is an intriguing question. Notably, a majority of these previous studies have focused on long-endurance sports, thereby testing the duration component of extreme exercise. The repeated bouts of near maximal to maximal exercise performed by mile runners makes them a unique population in which to test the potential impact of extreme intense exercise on longevity.

Therefore, inspired by the correspondence from Maron and Thompson exploring longevity of the first 20 sub-4 min mile runners, 14 we endeavoured to extend their initial observations to the first 200 sub-4 min mile runners, using more robust statistical methods to determine the longevity effects of running a sub-4 min mile. We tested the hypothesis that runners who successfully completed a sub-4 min mile would live longer than the general population.

Study design

A retrospective cohort study of the first 200 sub-4 min mile runners.

Participants

We used a publicly available database (the Sub-4 Alphabetic Register, https://nuts.org.uk/sub-4/index.htm ) 4 which provides a compendium of athletes who have broken the 4 min mark for the mile as of 6 June 2022. The list includes details of 1759 runners, whom we tabulated and sorted by the date of their first recorded sub-4 min mile in order to identify and extract relevant details for the first 200 athletes to break the 4 min mark. We selected the first 200 runners because they would now be at, if not beyond, the typical life expectancy for their generation (ie, those who first ran sub-4 min after this period may be too young to show a true longevity effect compared with the general population).

Outcome assessment

From the Sub-4 Alphabetic Register, we extracted information for the first 200 runners, including each runner’s name, the date and time of their first successful sub-4 min mile attempt and their nationality at the time of the attempt. We then searched online for each runner’s date of birth and date of death (if applicable). For runners who were still alive (no evidence of the athlete’s death in a comprehensive search), current age was calculated with a censor date of 31 December 2023.

Searches for each athlete’s date of birth and death were conducted on publicly available websites, including those of Olympic, international and national athletic federations, and Wikipedia. Because causes of death reported online may be incorrect, and because some runners had no publicly known cause of death, we chose to exclude cause of death from our overall analysis. After the initial search (conducted by DH) on 18 January 2024, two further independent searches were completed by two separate investigators for further confirmation.

Survival and statistical analyses

The primary outcome was the average difference in life expectancy between the first 200 sub-4 min mile runners and the general population (matched for age, sex and nationality). The follow-up period for the runners was from the exact time of their first successful sub-4 min attempt to either the age of 100 years, the end of the study (31 December 2023) or death (if applicable) and was compared with matched population life table values. The observed life years of the runners were computed from the length of follow-up.
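As a minimal sketch of the follow-up rule described above, the snippet below computes the observed life years for a single runner. The function and constant names are hypothetical, the 365.25-day year is an assumption made for illustration, and no study data are reproduced.

```python
from datetime import date
from typing import Optional

STUDY_END = date(2023, 12, 31)  # end of the study, as stated in the text
MAX_AGE_YEARS = 100             # follow-up is capped at age 100
DAYS_PER_YEAR = 365.25          # assumption: average calendar year length


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later (29 Feb maps to 28 Feb)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def observed_life_years(birth: date, first_sub4: date,
                        death: Optional[date] = None) -> float:
    """Years from the first sub-4 min mile to death, age 100 or study end,
    whichever comes first."""
    candidates = [add_years(birth, MAX_AGE_YEARS), STUDY_END]
    if death is not None:
        candidates.append(death)
    end = min(candidates)
    return (end - first_sub4).days / DAYS_PER_YEAR
```

For a hypothetical runner born on 1 January 1930 who first broke 4 min on 1 January 1955 and is still alive, `observed_life_years(date(1930, 1, 1), date(1955, 1, 1))` returns just under 69 years, because follow-up stops at the study end date.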

Life expectancy, conditional on being alive (and on age) at the time of the sub-4 min attempt, was derived as follows. United Nations life tables 15 were linked to the runners by country at time of achievement, calendar year, sex and single year of age for all years to age 100 or December 2023. Matching by age and year captured the changing conditions affecting survival over time. Life tables were only available until 2021, so 2022 and 2023 were assumed to be the same as 2021. The annual probabilities of survival at the age of achievement and at turning age 100 were modified to account for the partial time at risk during those years. Individual life expectancy was computed using standard life table techniques. 16 The difference in life expectancy was the observed life years for a runner less their population-matched life expectancy, and was then averaged over all 200 runners. The SE of the average difference in life expectancy was estimated with the leave-one-out jackknife. Survival curves used years since the first successful sub-4 min mile attempt as the time scale. Runners’ survival was estimated with the Kaplan–Meier estimator and the expected survival from combining all runners with the Ederer II method. 17 Confidence intervals were set at 95%.
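The following sketch illustrates one way the life-table and jackknife steps could be implemented. It is not the authors' code: the function names are hypothetical, the constant hazard within each year (used to handle the partial first and last years) is an assumption, and the normal-approximation interval is likewise an assumption.

```python
import math
from typing import List, Sequence, Tuple


def expected_life_years(annual_survival: Sequence[float],
                        fractions: Sequence[float]) -> float:
    """Expected years lived over consecutive life-table intervals.

    annual_survival[i] is the probability of surviving a full year in
    interval i; fractions[i] is the share of that year actually at risk
    (1.0 for a full year, less for partial first or last years).
    Assumption: the hazard is constant within each interval.
    """
    alive = 1.0   # probability of being alive at the start of the interval
    total = 0.0
    for p, f in zip(annual_survival, fractions):
        if not 0.0 < p <= 1.0:
            raise ValueError("annual survival must be in (0, 1]")
        hazard = -math.log(p)
        # Expected time lived in the interval, given alive at its start.
        lived = f if hazard == 0.0 else (1.0 - math.exp(-hazard * f)) / hazard
        total += alive * lived
        alive *= math.exp(-hazard * f)
    return total


def jackknife_mean_se(values: Sequence[float]) -> Tuple[float, float]:
    """Mean of `values` and its leave-one-out jackknife standard error."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    total = sum(values)
    leave_one_out: List[float] = [(total - v) / (n - 1) for v in values]
    centre = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((m - centre) ** 2 for m in leave_one_out)
    return total / n, math.sqrt(variance)


def ci95(mean: float, se: float) -> Tuple[float, float]:
    """95% interval under a normal approximation (an assumption here)."""
    return mean - 1.96 * se, mean + 1.96 * se
```

Under this sketch, each runner's difference would be their observed life years minus `expected_life_years(...)` for their matched life-table rows, and `jackknife_mean_se` would then be applied to the list of 200 differences. For a simple mean the jackknife SE coincides with the usual sample SD divided by the square root of n, which gives a quick sanity check.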

Equity, diversity and inclusion statement

Our study included all identified sub-4 min mile runners regardless of ethnicity/nationality or socioeconomic status. No women have yet broken the sub-4 min mile barrier, so we were unable to incorporate sex or gender into our analysis. The multidisciplinary authorship team included representation from exercise physiology, sports cardiology and population health, two women and five men, and two early-career and one junior scientist.

Cohort characteristics

Among the first 200 sub-4 min mile runners, the first successful attempt was by Roger Bannister in 1954 and the 200th runner broke the mark in 1974. The nationalities of the included runners spanned 28 different countries across Europe (n=88), North America (n=78), Oceania (n=22) and Africa (n=12). Year of birth for the 200 studied runners ranged from 1928 to 1955. The mean±SD age of runners at completion was 23.4±2.8 years and times to complete the mile ranged between 3:52.86 and 3:59.9 min. For two runners, only the year of birth could be determined, so we used 31 December of that year as the date of birth to ensure longevity was not overestimated.

Longevity in sub-4 min mile runners versus the general population

Of the first 200 runners to achieve a sub-4 min mile, 60 (30%) had died and 140 were alive at the time of the analysis. The average age at death was 73.6±13.7 years (range 24.3–91.9 years), while the average age of the surviving runners was 77.6±5.5 years (range 68.3–93.8 years). We were unable to ascertain specific causes of death for many of the deceased runners and therefore did not include cause of death in the analysis. However, of the seven runners who died before age 55 with a confirmed reported cause of death, six died from trauma or suicide and one from pancreatic cancer.

Based on the observed versus expected survival analysis, sub-4 min mile runners showed an increase of 4.74 years (95% CI 4.66 to 4.82; n=200) beyond their predicted life expectancy based on sex, age, year of birth, age at sub-4 min mile completion and nationality (figure 1). When accounting for the decade of completion, those whose first successful attempt was in the 1950s lived an average of 9.2 years (95% CI 8.3 to 10.1; n=22) longer than the general population during an average 67.0 years of follow-up, while those whose first successful attempt was in the 1960s and 1970s showed an increase of 5.5 years (95% CI 5.3 to 5.7; n=88) and 2.9 years (95% CI 2.7 to 3.1; n=90) during average follow-up times of 58.2 and 51.3 years, respectively (figure 2).


Figure 1. Proportion (and associated 95% CI) of surviving sub-4 min mile runners (n=200) by each year since their successful attempt compared with referents from the general population (matched for sex, age and nationality).

Figure 2. Proportion (and associated 95% CI) of surviving sub-4 min mile runners by each year since their successful attempt according to the period of their successful attempt (1950–1959, n=22; 1960–1969, n=88; 1970–1979, n=90) compared with referents from the general population (matched for sex, age and nationality).
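The observed curves described in these captions are Kaplan–Meier estimates on a years-since-attempt time scale. As a hedged illustration only (the function name and the toy inputs are hypothetical, and the Ederer II expected curve and confidence bands are not covered), the product-limit estimate can be sketched as:

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 died: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival) pairs at each distinct death time.

    times[i] is follow-up in years since the first sub-4 min mile and
    died[i] is True for a death, False for a censored (still alive) runner.
    """
    data = sorted(zip(times, died))
    at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group everyone whose follow-up ends at exactly time t.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve
```

With four toy follow-up times of 1, 2, 3 and 4 years, where the first and third end in death, the estimate drops to 0.75 at year 1 and to 0.375 at year 3. Censored runners leave the risk set without changing the curve.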

Longevity in Olympians and non-Olympians

We did not account for socioeconomic factors beyond sex and nationality; however, Olympians showed no longevity benefit compared with non-Olympians. In fact, there was a trend for non-Olympians to live slightly longer (non-Olympians (n=79): 5.68 years (95% CI 5.6 to 5.76); Olympians (n=121): 4.13 years (95% CI 3.97 to 4.3)).

To the best of our knowledge, this study represents the largest report on the longevity of runners who have run 1 mile in under 4 min. Whether such an elite feat has consequences for health and longevity is an important question. In studying the longevity of the first 200 sub-4 min mile runners, we show that they have a longer lifespan than the general population and, as a corollary, that breaking previously conceived boundaries of running physiology does not come at the cost of a shortened lifespan. This finding challenges the upper end of the U-shaped exercise hypothesis (as it relates to longevity) 5 6 and reiterates the benefits of exercise on the lifespan, even at the levels of training required for elite performance.

The overall cost-benefit of extreme exercise has been a concern for athletes, medical professionals and the broader public for some time. 3 5–7 Sub-4 min mile runners represent a unique population in which to address this question, as the mile is an event that pushes the respiratory, cardiovascular, skeletal muscle and metabolic systems (aerobic and anaerobic) to their maximal limit. 3 Moreover, while the duration of the event is relatively short compared with prototypical endurance sports, the high aerobic and anaerobic requirements of middle distance events such as the mile necessitate relatively high training volumes (~9–12 hours or 120–170 km per week), 18 19 with a higher proportion of this weekly volume (up to 20–30%) made up of repeated bouts of high-intensity or near maximal efforts. 18 19 The combination of extreme physiological demands, the profound adaptations along the oxygen cascade and the repeated bouts of high-intensity exercise training required to achieve such a feat raises the possibility of pushing the body beyond its limits, particularly from an intensity perspective. Our analysis showed that sub-4 min mile runners do not experience a reduced lifespan as a consequence of achieving sporting success but, rather, lived almost 5 years longer than the life expectancy of their peers. This confirms and extends the initial report from Maron and Thompson on the first 20 sub-4 min mile runners, 14 who were reported to have lived an average of 12 years beyond their life expectancy. Our sub-analysis of this same generation of runners (ie, those who completed their attempt in the 1950s) shows a somewhat smaller longevity benefit (9.2 years), which likely reflects the more robust statistical and epidemiological approach we used to determine longevity. Interestingly, we found that this benefit remained significant but was progressively attenuated with each subsequent decade of completion (ie, 1960s and 1970s).
This may reflect improvements in life expectancy in the general population over this period secondary to advances in the diagnosis and management of several major communicable and non-communicable diseases. 20 However, it should be noted that we calculated the cumulative longevity benefit accrued from the time of each athlete’s successful sub-4 min mile attempt until the end of the evaluation period (or their death if it occurred earlier). Therefore, the 1960–69 and 1970–79 cohorts may have had up to 10–20 years less time to accrue the longevity benefit than the 1950–59 cohort. The positive longevity effects seen in the sub-4 min mile runners are not specific to middle distance runners. Our results are comparable to the longevity benefits seen in other athletic populations with similarly extreme physiological features and exercise training habits, including former Olympians, 12 21–24 Tour de France cyclists, 10 11 elite long distance runners 23 and Olympic rowers. 13 Taken together, these results continue to challenge the most concerning component of the U-shaped or reverse J-shaped hypothesis (ie, reduced longevity from excessive exercise 9 ) by illustrating that sub-4 min mile runners and other extreme athletic populations do not experience detrimental consequences to their lifespan as a result of their sporting endeavours. Importantly, we extend previous reports focused on athletes representing the duration-dependent mechanism of exercise-induced cardiac injury to a population that performed high volumes of exercise at near maximal to maximal intensity. 18 19

The factors contributing to increased longevity in our cohort and others are yet to be definitively established. The longevity benefits seen in high-level athletic populations are greatest in those participating in endurance sport (ie, running, cycling, rowing), with studies reporting on the longevity of elite power and/or strength athletes showing smaller or no clear longevity benefits. 12 22 23 25 While we could not determine the cause of death for the majority of runners, studies reporting on Tour de France cyclists and cohorts of Olympians (that include middle-to-long distance runners) suggest the longevity effects are primarily mediated by decreased rates of cardiovascular and cancer-related mortality. 11 12 22 25 The physiological mechanisms for these benefits are yet to be determined, but likely reflect the positive adaptations of endurance exercise on cardiovascular, metabolic and immune-related health and function. Indeed, common to all endurance athletes (including middle distance runners) is the development of a high maximal oxygen uptake, 26 which is one of the strongest independent predictors of incident cardiovascular disease, cancer and all-cause mortality. 27 It is also likely that these populations possess favourable genetics and engage in additional healthy lifestyle behaviours beyond exercise training and competition. Indeed, there appears to be a genetic component to athletic performance 28 29 that extends to successfully running a sub-4 min mile. Intriguingly, 20 sets of brothers (including six sets of twins), as well as father and son combinations, were among the first 200 sub-4 min mile runners. 4 Of note, three brothers (Jakob, Henrik and Filip Ingebrigtsen) have achieved the sub-4 min mile, with Jakob being the youngest athlete to achieve the sub-4 min mile at 16 years of age.
Whether the genetic, epigenetic (and subsequent phenotypic) features that allow an athlete to run the sub-4 min mile also contribute to increased longevity is an intriguing question, and one that highlights the unique health insights that can be gained from studying elite athletic populations.

Study limitations

Limitations in the details that were available for the athlete cohort mean that we could not determine the cause of death for the majority of individuals. However, as noted above, athletes for whom a definitive cause of death was available primarily died as a result of traumatic accidents, which is consistent with other studies of longevity in athletes. 11 22 25 We also have no information on the lifelong exercise habits (or other health behaviours) of our cohort, so we cannot determine the precise relationship between lifelong exercise dose and longevity. Studies of elite athletes suggest that a majority continue to regularly perform high-volume and high-intensity exercise training after retirement from competition, 22 25 so part of the longevity benefit reported in our study could also reflect the accrual of the cumulative benefits from lifelong exercise in some athletes. Regardless, the fact that a single metric of performance in early adulthood was able to predict a longevity benefit up to 60 years later suggests a legacy effect of running a sub-4 min mile. Whether that is due to the features required to achieve success or reflects the clustering of lifelong healthy exercise and lifestyle behaviours remains an important question. Moreover, while the lack of exercise training history precludes a dose-response analysis, the broader cohort of sub-4 min mile runners includes some notable athletes such as Nick Willis of New Zealand, who has broken the 4 min mile in 20 consecutive years, and Steve Scott of the USA, who broke the 4 min mile 137 times in his career, both of whom are still alive today and do not appear to have experienced major detrimental health effects as a result of these performances.

We also did not see better survival outcomes in the Olympian cohort, which suggests that socioeconomic factors one might expect to be more common in Olympians (financial support, ancillary health behaviour and support) do not mediate the longevity effects. Our comparison against the general population (similar to other studies of elite athletes) precludes assessment of how other lifestyle factors (eg, diet, smoking status), cardiometabolic risk factors and other potential medical confounders of longevity (eg, hypertension, hypercholesterolaemia) or genetics contributed to the increased longevity. It is possible that the extreme exercise required to run a sub-4 min mile is deleterious to the lifespan, but that the effect is not large enough to overcome the positive effects of other factors seen in athletes, such as favourable genetics, healthy diet and low rates of smoking and other health conditions. However, this is a difficult question to address as it requires comparison against a population with similar characteristics (with the exception of completing extreme exercise). Importantly, our data speak to the mean longevity effects in sub-4 min mile runners; a minority of athletes (such as those with a genetic predisposition) may develop cardiac complications as a direct result of, or accelerated by, exposure to high volumes of intense or long-duration exercise, 6 highlighting the importance of individualised assessment and management in athletic cohorts. 6

Last, our cohort consisted entirely of male athletes. To this day, no woman has run a sub-4 min mile, with the closest time being the 4:07.64 run by Faith Kipyegon of Kenya in 2023 (world record). Unfortunately, we could not readily address longevity in female milers as there was no comparable database of female athletes. This may also reflect the exclusion of women from middle-to-long distance events at major sporting events such as the Olympics due to prior (and misguided) concerns about the potential ill effects of female athletes performing such extreme exercise (with the women’s 1500 m not introduced until 1972). 30 This latter point in particular highlights the importance of future research to address the longevity of female middle distance runners (either mile or 1500 m runners). However, several more years may be required before adequate follow-up time has accrued to test the potential longevity effects.

Conclusions

Analysis of the first 200 runners to break the sub-4 min mile shows that they live an average of 4.7 years longer than the general population. This challenges the hypothesis that extreme exercise may be detrimental to longevity and reinforces the benefits of exercise to the lifespan, even at the levels of training required for elite performance.

Ethics statements

Patient consent for publication.

Not applicable.

Acknowledgments

We would like to thank Thomas McMurtry for his assistance with the athlete search process. We also extend our gratitude to the late Bob Sparks, Ian R. Smith and Bob Phillips for developing and maintaining the Sub-4 chronicle.


ALG and MH are joint senior authors.

X @S_FoulkesAEP, @alagerche, @mhaykows, @iCARE_lab_UofA

Contributors All authors have contributed to and reviewed the revised version of the manuscript. SF, DH, ALG, MH: conceived and designed the research. DH, RS: collected data. DD: analyzed data and prepared figures. SF, DH, DD, RS, PK, ALG, MH: interpreted the results. SF, DH, MH: drafted the manuscript. SF, DH, RS, DD, PK, ALG, MH: edited and revised the manuscript and approved the final version. Guarantor: MH

Funding No funding sources were used in the direct preparation of this article. RS is supported by an Alberta Innovates Postdoctoral Fellowship. MH is supported by an endowed research chair in Aging and Quality of Life from the Faculty of Nursing, University of Alberta.

Competing interests None declared.

Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

Provenance and peer review Not commissioned; externally peer reviewed.

