• Chapter 6. Ecological studies

Geographical comparisons

Time trends, occupation and social class.

  • Chapter 1. What is epidemiology?
  • Chapter 2. Quantifying disease in populations
  • Chapter 3. Comparing disease rates
  • Chapter 4. Measurement error and bias
  • Chapter 5. Planning and conducting a survey
  • Chapter 7. Longitudinal studies
  • Chapter 8. Case-control and cross sectional studies
  • Chapter 9. Experimental studies
  • Chapter 10. Screening
  • Chapter 11. Outbreaks of disease
  • Chapter 12. Reading epidemiological reports
  • Chapter 13. Further reading

Follow us on

Content links.

  • Collections
  • Health in South Asia
  • Women’s, children’s & adolescents’ health
  • News and views
  • BMJ Opinion
  • Rapid responses
  • Editorial staff
  • BMJ in the USA
  • BMJ in South Asia
  • Submit your paper
  • BMA members
  • Subscribers
  • Advertisers and sponsors

Explore BMJ

  • Our company
  • BMJ Careers
  • BMJ Learning
  • BMJ Masterclasses
  • BMJ Journals
  • BMJ Student
  • Academic edition of The BMJ
  • BMJ Best Practice
  • The BMJ Awards
  • Email alerts
  • Activate subscription

Information

User Preferences

Content preview.

Arcu felis bibendum ut tristique et egestas quis:

  • Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris
  • Duis aute irure dolor in reprehenderit in voluptate
  • Excepteur sint occaecat cupidatat non proident

Keyboard Shortcuts

6.1 - what is an ecological study.

An ecological study is an observational study in which at least one variable is measured at the group level. An ecological study is especially appropriate for initial investigation of causal hypothesis. Let's look at an example to understand what a group-level variable is.

Example 6-1: Section  

Results from an ecological study examining diet and sunlight as risks for prostate cancer mortality.

Colli JL, Colli A. International comparison of prostate cancer mortality rates with dietary practices and sunlight levels . Urologic Oncology 2006:24;184-194.

In Figure 1 (reproduced from the Colli paper), the x- axis indicates per capita sugar calories consumed per day. The y-axis represent the age-adjusted mortality rate for prostate cancer. Each dot represents a country, plotting the per-capita sugar consumption and the age-adjusted prostate cancer mortality rate for that particular country. Per-capita sugar consumption is an ecological variable because it is the average measure of exposure to sugar for all the people in the country. It does not mean that every person in the country ate exactly the same specified amount of sugar. Similarly, the mortality rate is a group-level variable because it represents the country's experience, not any individual person's experience in the country. Neither of these variables are at the individual-level. The line in the figure is a fitted regression for the dots, showing a standard statistical procedure for ecological studies. (However, statistical procedures for ecological studies can be quite complex!) Do the data support the hypothesis that increased sugar consumption is associated with increased prostate cancer mortality? - YES! Do the data determine that sugar consumption causes prostate cancer death? NO!

The study also considered the effects of other factors on prostate cancer mortality. In Table 3, the authors consider the effect of a country's latitude and ultraviolet (UV) index on prostate cancer mortality rates. The data support the hypothesis that sunlight exposure is associated with prostate cancer death.

Abbreviations: B = coefficient from regressing prostate cancer mortality on the independent variable; P =probability value used to test the null hypothesis H o :B = 0:R = Pearson correlation coefficient.

Finally, the investigators used a stepwise multiple regression model to show that ultraviolet index, cereals, sugar and onion consumption provided the best fit when ambient sunlight exposure was included as a factor. What conclusion do we draw?

Come up with an answer to this question and then click on the button below to reveal the answer.

What conclusion do we draw? Does a lack of onions and cereal in the diet, overindulging in sugar along with low UV index cause prostrate cancer death?

No, causation is not established by any individual study not matter how strong the findings or how strong the study design. In this study, there was also a strong relationship between animal fat consumption and prostrate cancer; however, animal fat consumption was collinear with UV index and thus if UV index was in the model, animal fat consumption was not.

  • Type 2 Diabetes
  • Heart Disease
  • Digestive Health
  • Multiple Sclerosis
  • Diet & Nutrition
  • Supplements
  • Health Insurance
  • Public Health
  • Patient Rights
  • Caregivers & Loved Ones
  • End of Life Concerns
  • Health News
  • Thyroid Test Analyzer
  • Doctor Discussion Guides
  • Hemoglobin A1c Test Analyzer
  • Lipid Test Analyzer
  • Complete Blood Count (CBC) Analyzer
  • What to Buy
  • Editorial Process
  • Meet Our Medical Expert Board

Ecological Analysis on Population Health

An ecological analysis is a way for scientists to look at large-scale impacts of time-specific interventions on population health . In these types of studies, researchers examine the health of a population before and after some time-specific event or intervention.

For example, ecological analyses are often performed on data collected before and after the introduction of a national vaccination program. They can also be performed after a major natural disaster to see if there were any public health consequences.

Ecological analyses are not limited to researching the effects of health interventions. They can also be used to analyze the impact of political or environmental changes and natural disasters on health or to assess non-health outcomes.

Orbon Alija / Getty Images

The sole defining characteristic of ecological analyses is that the unit being analyzed is the population, not the individual. They are based on population statistics and do not generally take into account the timeline or details of any specific person's health.

For example, an ecological study that looks at abnormal Pap smear rates before and after the initiation of a nationwide HPV vaccination program would not look at whether any particular individual had been vaccinated. Instead, it would simply look at the prevalence of abnormal results in the years before and after vaccinations had begun.

Although ecological analyses can be quite useful when it comes to looking at the impacts of large-scale interventions, they are limited by the fact that they cannot look at cause and effect in individuals. It is important to take this into account when interpreting their results.

Ecological studies have been used to refute the proposed link between autism and the MMR vaccine. When researchers have examined autism rates before and after the initiation of vaccination programs (or before and after changes in vaccine compliance), they have seen no correlation between autism and vaccination.

Instead of a relationship with vaccines, it appears that autism rates have climbed slowly over time—possibly due to changes in diagnostic criteria and/or unidentified environmental factors.

Another example of ecological analysis is an examination of the effect of HPV vaccination on abnormal Pap smears or on cervical cancer rates. Several studies in countries with a far wider uptake of the HPV vaccine than in the United States have done just that.

Research in the United Kingdom and Australia has shown decreases in genital warts , as well as a decline in pre-cancerous cervical changes.

Gerber JS, Offit PA. Vaccines and autism: a tale of shifting hypotheses .  Clin Infect Dis . 2009;48(4):456-461. doi:10.1086/596476

Centers for Disease Control and Prevention.  Data and statistics on autism spectrum disorder . Reviewed September 25, 2020.

El-Zein M, Richardson L, Franco EL. Cervical cancer screening of HPV vaccinated populations: Cytology, molecular testing, both or none .  J Clin Virol . 2016;76 Suppl 1(Suppl 1):S62-S68. doi:10.1016/j.jcv.2015.11.020

Davidsson A. Disasters as an opportunity for improved environmental conditions . International Journal of Disaster Risk Reduction , Volume 48. 2020. doi:10.1016/j.ijdrr.2020.101590

Epidemiology for the Uninitiated, fourth edition, Chapter 6: Ecological studies . BMJ Publications.

By Elizabeth Boskey, PhD Boskey has a doctorate in biophysics and master's degrees in public health and social work, with expertise in transgender and sexual health.

ecological analysis of case study

PH717 Module 1B - Descriptive Tools

Descriptive epidemiology & descriptive statistics.

  •   Page:
  •   1  
  • |   2  
  • |   3  
  • |   4  
  • |   5  
  • |   6  
  • |   7  
  • |   8  
  • |   9  

On This Page sidebar

Ecological Studies (Correlational Studies)

Test yourself.

Learn More sidebar

Ecologic studies assesses the overall frequency of disease in a series of populations and looks for a correlation with the average exposure in the populations. These studies are unique in that the analysis is not based on data on individuals. Instead, the data points are the average levels of exposure and the overall frequency of disease in a series of populations. Therefore, the unit of observation is not a person ; rather, it is an entire population or group.

In the study below investigators used commerce data to compute the overall consumption of meat by various nations. They then calculated the average (per capita) meat consumption per person by dividing total national meat consumption by the number of people in a given country. There is a clear linear trend; countries with the lowest meat consumption have the lowest rates of colon cancer, and the colon cancer rate among these countries progressively increases as meat consumption increases.

ecological analysis of case study

Note that in reality, people's meat consumption probably varied widely within each nation, and the exposure that was calculated was an average that assumes that everyone ate the average amount of meat. This average exposure was then correlated with the overall disease frequency in each country. The example here suggests that the frequency of colon cancer increases as meat consumption increases. The characteristic of ecological studies that is most striking is that there is no information about individual people. If the data were summarized in a spread sheet, you would not see data on individual people; you would see records with data on average exposure in multiple groups .

An ecological study correlated per capita alcohol consumption to death rates from coronary heart disease (CHD) in different countries, and it appeared that there was a fairly striking negative correlation as shown in the graph below.

ecological analysis of case study

However, a cohort study with data on alcohol consumption in individual subjects showed that there was a J-shaped relationship. People who drank modestly had a lower mortality rates than those who did not drink at all, but among higher levels of individual consumption there was a striking linear increase in mortality, as shown in the graph below.

ecological analysis of case study

Source: Adapted from AR Dyer et al. Alcohol consumption and 17-year mortality in the Chicago Western Electric Company Study. Prev. Med . 1980; 9(1):78-90.

The real question was whether individuals who drank heavily had higher or lower mortality rates than those who drank modestly or not all, but the ecologic study led to an incorrect conclusion because it was based on aggregate data. In reality, most people drink modestly, but mortality rates are much greater in the small number of people who drink very heavily. The misleading conclusion from the ecologic study is an example of the ecologic fallacy.

Is the following statement true or false?

For an exposure to cause a health outcome the exposure must precede the outcome in a given person. This is what is observed in ecologic studies.

To see an extraordinary example of an ecologic study, play the video below created by Hans Rosling. This is a magnificent example that examines the correlation between income and life expectancy in the countries of the world over time. It is also a terrific example of a creative, engaging, and powerful way to display a vast quantity of data.

alternative accessible content

return to top | previous page | next page

Content ©2020. All Rights Reserved. Date last modified: September 10, 2020. Wayne W. LaMorte, MD, PhD, MPH

  • Research article
  • Open access
  • Published: 04 September 2020

An ecological study of socioeconomic predictors in detection of COVID-19 cases across neighborhoods in New York City

  • Richard S. Whittle   ORCID: orcid.org/0000-0002-7437-5433 1 , 2 &
  • Ana Diaz-Artiles   ORCID: orcid.org/0000-0002-0459-9327 1 , 2  

BMC Medicine volume  18 , Article number:  271 ( 2020 ) Cite this article

11k Accesses

97 Citations

55 Altmetric

Metrics details

New York City was the first major urban center of the COVID-19 pandemic in the USA. Cases are clustered in the city, with certain neighborhoods experiencing more cases than others. We investigate whether potential socioeconomic factors can explain between-neighborhood variation in the COVID-19 test positivity rate.

Data were collected from 177 Zip Code Tabulation Areas (ZCTA) in New York City (99.9% of the population). We fit multiple Bayesian Besag-York-Mollié (BYM) mixed models using positive COVID-19 tests as the outcome, a set of 11 representative demographic, economic, and health-care associated ZCTA-level parameters as potential predictors, and the total number of COVID-19 tests as the exposure. The BYM model includes both spatial and nonspatial random effects to account for clustering and overdispersion.

Multiple regression approaches indicated a consistent, statistically significant association between detected COVID-19 cases and dependent children (under 18 years old), population density, median household income, and race. In the final model, we found that an increase of only 5% in young population is associated with a 2.3% increase in COVID-19 positivity rate (95% confidence interval (CI) 0.4 to 4.2%, p =0.021). An increase of 10,000 people per km 2 is associated with a 2.4% (95% CI 0.6 to 4.2%, p =0.011) increase in positivity rate. A decrease of $10,000 median household income is associated with a 1.6% (95% CI 0.7 to 2.4%, p <0.001) increase in COVID-19 positivity rate. With respect to race, a decrease of 10% in White population is associated with a 1.8% (95% CI 0.8 to 2.8%, p <0.001) increase in positivity rate, while an increase of 10% in Black population is associated with a 1.1% (95% CI 0.3 to 1.8%, p <0.001) increase in positivity rate. The percentage of Hispanic ( p =0.718), Asian ( p =0.966), or Other ( p =0.588) populations were not statistically significant factors.

Conclusions

Our findings indicate associations between neighborhoods with a large dependent youth population, densely populated, low-income, and predominantly black neighborhoods and COVID-19 test positivity rate. The study highlights the importance of public health management during and after the current COVID-19 pandemic. Further work is warranted to fully understand the mechanisms by which these factors may have affected the positivity rate, either in terms of the true number of cases or access to testing.

Peer Review reports

On 21 January 2020, the first case of coronavirus disease 2019 (COVID-19) in the USA was reported in Washington State [ 1 ]. The first case was not reported in New York state until 1 March 2020 [ 2 ]. By the time the World Health Organization (WHO) declared a global pandemic on 11 March 2020, there were 345 cases in New York City (NYC), and this number skyrocketed to nearly 18,000 cases just 2 weeks later [ 2 , 3 ]. NYC rapidly became the epicenter of the pandemic in the USA, with a transmission rate five times higher than the rest of the country, and over a third of all confirmed national cases by early April [ 4 ].

During a pandemic, there is likely to be large variation in both disease transmission and disease testing between regions [ 5 ]. These two factors cause large variation in disease reporting between different areas [ 6 ]. This is particularly true in the early stages of the outbreak, before disease testing has become widespread and standardized.

Contemporary and historical studies on previous pandemics, including H1N1 pandemics in 1918 and 2009, suggest that socioeconomic factors on a national level can affect detection rates and medical outcomes [ 7 – 9 ]. Thus, socioeconomic factors such as young or old populations, race, affluence, inequality, poverty, unemployment, insurance, or access to healthcare may account for differences in reported cases of COVID-19 between neighborhoods in NYC.

The aim of this ecological study was to identify potential neighbourhood-level socioeconomic determinants of the COVID-19 test positivity rate and explain between-neighborhood variation during the early, exponential growth stage of the pandemic in NYC: from the first detected case in 1 March until 5 April 2020.

Data collection

Data on positive COVID-19 cases were collected from NYC Department of Health and Mental Hygiene (DOHMH) Incident Command System for COVID-19 Response (Surveillance and Epidemiology Branch in collaboration with Public Information Office Branch) [ 2 ]. Since the NYC DOHMH was discouraging people with mild to moderate symptoms from being tested during the time period covered, the data primarily represents people with more severe illness. Since at the time of writing the pandemic is still ongoing, data were taken at a snapshot on 5 April2020. This date was chosen to cover the first month of the pandemic in NYC, since understanding early etiology of the pandemic and local influences is important in helping to inform future management [ 10 ]. Data were a cumulative count up to and including 5 April 2020. On this date, NYC had a cumulative total of 64,955 cases [ 11 ], including deaths and hospitalizations.

The available dataset included 64,512 cases (99.3% of total cases), with each case representing a positive diagnosis of COVID-19 along with the patient’s Zip Code Tabulation Area (ZCTA). ZCTAs are generalized areal representations of United States Postal Service (USPS) Zip Code service areas. ZCTAs were the areas in which patients reported their home address, as opposed to either where they became symptomatic or where they reported for testing/treatment. The area of interest covered 177 ZCTAs within NYC, from 10001 (Chelsea, Manhattan) to 11697 (Breezy Point, Queens). Of these cases, there were 4712 where the patient ZCTA was unknown and thus these cases were discarded, leaving 59,800 cases (92.1% of total cases). Note that this total is not meant to be an indicator of the total number of COVID-19 cases at this time, rather the count of detected cases. The dataset also included the total number of tests conducted by ZCTA. Figure  1 a shows a histogram of detected cases by ZCTA as at 5 April 2020, grouped by the five boroughs of NYC (Bronx, Brooklyn, Manhattan, Queens, and Staten Island); Fig.  1 b displays these cases on a map as a percentage of total COVID-19 tests performed.

figure 1

New York City detected COVID-19 cases by Zip Code Tabulation Area (ZCTA). As at 5 April 2020. a Histogram of detected cases by ZCTA, grouped by borough. b Positivity rate, or detected cases as a percentage of total tests

Data on potential predictor variables were collected from the United States Census Bureau American Community Survey (ACS). ACS is a continuous sample survey of 3.5 million households every year including questions beyond the decadal census on subjects such as education, employment, internet access, and transportation. Data were collected at ZCTA level from the ACS 2014-2018 5-year estimate [ 12 ], which is the most recent publicly available.

The 5-year estimate was chosen instead of the most recent 1-year estimate because the latter was not available in an aggregated form at ZCTA level and only at the Public Use Microdata Area (PUMA) level. PUMAs contain multiple ZCTAs, but for the most part, the boundaries are not equivalent to the ZCTA boundaries used in the COVID-19 dataset. In addition, while the 5-year estimate is less current, it has a smaller margin of error than the 1-year estimate and greater statistical reliability for small geographic areas. To further understand any potential differences, we compared a sample of the ACS 5-year estimate with the most recent available 1-year estimate in an area where these two area systems overlap: Rockaway Peninsula, where PUMA area 3604114 ( NYC Queens Community District 14: Far Rockaway, Breezy Point & Broad Channel PUMA ) overlaps with ZCTAs 11691, 11692, 11693, 11694, and 11697. We found agreement in all parameters included in our study within the margins of error of the survey.

Demographic parameters

Five demographic parameters were included in the study: percentage of young dependent population, Young ; percentage of aged population, Aged ; males per 100 females, MFR ; percentage of the population identifying as white, Race ; and population density, Density . Young dependent population was defined as the percentage of the total population aged under 18. Aged population was the percentage of the total population 65+. These are both typically economically inactive populations. The increased severity of COVID-19 with increasing age has been well documented [ 13 ], and there has been recent evidence of asymptomatic carrier transmission particularly among young people [ 14 , 15 ]. Males per 100 females was chosen to capture the balance of sex in the population. We were interested in whether sex differences lead to significant variation in detected cases. Some reports suggest a racial disparity in case detection rates across the USA. A report from NYU Furman Center for housing, neighborhoods, and urban policy suggests mortality rates are higher among the city’s “Hispanic, Black, and non-Hispanic/Latino: Other” populations [ 16 ]. For the present study, we initially chose to include the percentage of the population that identify as white (alone or in combination with another race) as a combined indicator of all minority populations. Thus, we united multiple races with distinct levels of COVID-19 incidence [ 17 ] into a single metric for model building purposes (i.e., white vs non-white). Then, we also considered a more detailed analysis of the racial structure of neighborhoods by further analyzing five separate racial groups: White, Black, Hispanic, Asian, and Other (including American Indian and Alaska Native, Native Hawaiian and Other Pacific Islanders, Caribbean, and Mixed Race). Finally, we also included population density based on studies of the 2008 H1N1 Influenza pandemic highlighting population density as a significant risk factor for transmission [ 18 ]. The distributions of demographic predictors in the area of interest are shown in Fig.  2 .

figure 2

New York City demographic predictors by Zip Code Tabulation Area (ZCTA). Data based on American Community Survey (ACS) 2018 5-year estimates. a Young , percentage of population aged under 18. b Aged , percentage of population aged 65+. c MFR , males per 100 females. d Race , percentage of population that identify as white (alone or in combination with another race). e Density , population density in ’000s persons per km 2

Economic parameters

Four economic parameters were included in the study: Gini index, Gini ; median household income, Income ; percentage of labor force unemployed, Unemployment ; and percentage of population living below the poverty threshold, Poverty . Gini index is a measure of economic inequality ranging from 0 to 1. An index of 0 indicates all the wealth in an area is divided equally among the population, while an index of 1 indicates all the wealth is held by one individual. While some studies have argued against the adverse effects of unequal income [ 19 ], an association has been demonstrated between inequality and population health [ 20 ]. We also included household income, which was a significant predictor for hospitalizations in the 2009 influenza pandemic [ 21 ]. Specifically, in the present study, we use median household income as a ZCTA-level predictor. Finally, unemployment and poverty both have documented association with health outcomes, including in pandemic scenarios [ 22 , 23 ]. While there is some level of collinearity between these two variables, we include both as one relates to the economically active labor force whereas the other relates to the total population. The distributions of economic predictors in the area of interest are shown in Fig.  3 .

figure 3

New York City economic predictors by Zip Code Tabulation Area (ZCTA). Data based on American Community Survey (ACS) 2018 5-year estimates. a Gini , Gini index. b Income , median household income. c Unemployment , percentage of working age population unemployed. d Poverty , percentage of total population living below the poverty threshold

Health parameters

Two parameters related to healthcare access were included in the study: percentage of population uninsured, Uninsured ; and total number of hospital bed per 1000 people within 5 km, Beds . It has been documented that lack of insurance can delay access to timely healthcare, particularly during pandemics [ 24 ]. We hypothesized that this parameter could affect virus transmission and/or access to testing, therefore affecting detection rates. Finally, we chose Beds as a parameter related to proximity to healthcare, which has been shown to be inversely associated with adverse outcomes in other geospatial public health studies [ 25 ]. For a city containing multiple hospitals such as NYC, we defined a proximity metric in this study as population normalized number of hospital beds within 5 km. This predictor was chosen as a secondary metric reflecting general societal access to healthcare and localized investment in healthcare infrastructure. The distributions of health related predictors in the area of interest are shown in Fig.  4 a, b. Figure  4 also shows two other factors used in the model; Fig.  4 c shows the number of tests conducted in each ZCTA used as the model exposure, and Fig.  4 d shows the neighborhood connectivity between ZCTAs, used for spatial effects.

figure 4

New York City health predictors by Zip Code Tabulation Area (ZCTA). Data based on American Community Survey (ACS) 2018 5-year estimates. a Uninsured , percentage of total population uninsured. b Beds , total number of hospital beds per 1000 people within 5 km. c Total COVID-19 tests (exposure). d neighborhood connectivity

Statistical analysis

Prior to analysis of potential predictors, we considered multiple base regression models. Given the significant spatial correlation in the present case data as evidenced by the Moran Index, I (176)=0.642, p <0.0005 [ 26 ], we explored potential regression models both with and without spatial effects. We compared four base models (no predictors): (1) a Poisson model with random intercept, (2) a Poisson Besag-York-Mollié (BYM) model [ 27 ], (3) a negative binomial model with random intercept, and (4) a negative binomial BYM model. The BYM model is the union of a Besag model [ 28 ], υ , and a nonspatial random effect, ν , such that the linear predictor for spatial unit i , η i , is given by Eq 1 :

where υ i has an intrinsic conditional autoregressive (ICAR) structure [ 29 ]. We used the reparameterization of the BYM model proposed by Riebler et al. [ 30 ], known as the BYM2 model and shown in Eq 2 :

where τ γ is the overall precision hyperparameter, φ ∈ [0,1] is the mixing hyperparameter representing the proportional division of variance between the spatial and nonspatial effects, υ ∗ is the spatial (ICAR) effect with a scaling factor such that Var( υ ∗ )≈1, and ν ∗ is the nonspatial random-effect with ν ∗ ∼ N(0,1). Penalized complexity (PC) priors are applied to hyperparameters τ γ and φ (compared to log-gamma priors in the random intercept model) [ 31 ]. All four models used ZCTA total number of COVID-19 tests as the exposure and a log-link function. We selected the model with the lowest deviance information criterion (DIC) [ 32 ], representing the best trade-off between model fit and complexity.

Characteristics for the four base models examined, including hyperparameters, are shown in Table  1 . The two Poisson models (models 1 and 2) had significantly lower DIC than the negative binomial models. The Poisson BYM2 model (model 2) was marginally better than the simple random effect model (model 1). Thus, the Poisson BYM2 model was selected and used for all future analyses and regressions.

Adding predictors

Multiple regression models were built using a method adjusted from Nikolopoulos et al. [ 33 ]. In the univariable models, we considered each predictor variable separately (i.e., one model per variable). In the multivariable model, we considered all predictor variables together. We further built a partial multivariable model using only those predictors that were significant in the univariable models. Finally, we built a model using stepwise backwards elimination procedure, starting with the fully saturated model and removing the least significant predictor until we were left with a model containing only significant predictors [ 33 ]. In all cases, the expected number of detected COVID-19 cases in ZCTA i , λ i , was represented by Eq 3 :

where E i is the exposure (i.e., number of tests) for ZCTA i , β 0 is the intercept, β p is coefficient of the fixed effect for predictor p ∈ {1... P }, x ip is the value of predictor p in ZCTA i , and the spatial and nonspatial random effects for ZCTA i are described by the BYM2 model detailed above. Vague Gaussian priors are assumed on all β .

Model fitting

Regression estimates are presented as mean and 95% confidence intervals (CI) sampled from the posterior marginal distribution, along with corresponding p values. We used posterior tail-area of the fixed effects as a Bayesian counterpart to p value [ 34 ]. All significance levels were two-sided with p value of <0.05 considered statistically significant. Statistical analysis was performed using R Statistical Software (version 4.0.0; R Foundation for Statistical Computing, Vienna, Austria). Models were fit via integrated nested Laplace approximation [ 35 ] using the R-INLA package [ 36 ]. Vague priors were assumed on all models.

As at 5 April 2020, 59,800 COVID-19 cases were reported with a known ZCTA. The highest number of cases in any particular ZCTA was 1,446 in ZCTA 11368 (Corona, Queens), while the lowest was 7 in ZCTA 10006 (Wall St, Manhattan). With respect to the proportion of tests returned positive, these two ZCTAs also had the highest and lowest positivity rates (23.33% and 77.70% respectively). On average, 0.71% of the total NYC population had tested positive for COVID-19, with 56.47% of total tests conducted returning a positive result.

Using the base model, Fig.  5 a shows the area specific relative risk ζ i . A value of ζ i =1 represents a positivity rate in line with the total population average (56.47% of total COVID-19 tests in area i have returned positive), while, for example, a value of ζ i =1.2 represents a positivity rate 1.2 times the total population average (67.76%). Figure  5 b shows the posterior probability that the relative risk is greater than 1, p ( ζ i >1| y ). The map shows that the highest risk area is Corona, Queens, with three other significant clusters in the Bronx, Southeast Queens, and Southwest Brooklyn.

figure 5

Disease mapping model for COVID-19 cases in New York City by Zip Code Tabulation Area (ZCTA). As at April 5, 2020, using base Poisson BYM2 model with no predictors. The area specific relative risk is multiplied by the total population average COVID-19 positivity rate (56.47%) to give the area specific positivity rate. a Area-specific relative risk, ζ i . b Posterior probability for relative risk, p ( ζ i >1| y )

Spread and collinearity of the predictors was assessed through histograms, bivariate scatterplots, and Pearson correlation coefficients. The strongest collinearities existed between income, poverty, and unemployment. There was only one bivariate correlation above 0.7 (median household income and poverty) and none above 0.8. It was decided to leave all predictors in the analysis and to build multiple regression models in order to consider the effects of collinearity. Figure  6 shows panel plots of the bivariate relations between the predictors.

figure 6

Panel plot showing bivariate relationships between predictors. Diagonal : Distribution of all 11 predictor variables. Lower: Bivariate scatter plots. Upper: Pearson correlations between pairs of predictors

Table  2 shows a summary of the regression estimates from the different regression models investigated. In particular, four predictors appear significant in all four models: percentage of dependent youth population, race, population density, and median household income. Percentage change in the COVID-19 positivity rate per unit change in the predictors can be found from exp( β ).

Concerning youth dependency ( Young ), a 5% increase in the percentage of young population leads to an increase in COVID-19 positivity rate of 4.8% (95% CI 2.9 to 6.7%, p <0.001) in the univariable model, an increase of 3.3% (95% CI 1.0 to 5.5%, p =0.005) in the full multivariable model, an increase of 3.9% (95% CI 1.7 to 6.0%, p =0.001) in the partial multivariable model, and an increase of 2.5% (95% CI 0.6 to 4.3%, p =0.009) in the stepwise backwards elimination model. Concerning race ( Race ), a 10% decrease in the white population leads to an increase in COVID-19 positivity rate of 2.8% (95% CI 2.0 to 3.5%, p <0.001) in the univariable model, an increase of 1.8% (95% CI 0.9 to 2.7%, p <0.001) in the full multivariable model, an increase of 1.4% (95% CI 0.4 to 2.3%, p =0.005) in the partial multivariable model, and an increase of 1.9% (95% CI 1.0 to 2.8%, p <0.001) in the stepwise backwards elimination model. Concerning population density ( Density ), an increase of 10,000 people per km 2 leads to an increase in COVID-19 positivity rate of 3.1% (95% CI 1.2 to 5.0%, p =0.002) in the univariable model, an increase of 3.2% (95% CI 1.3 to 5.0%, p =0.001) in the full multivariable model, an increase of 2.3% (95% CI 0.5 to 4.1%, p =0.013) in the partial multivariable model, and an increase of 3.4% (95% CI 1.6 to 5.1%, p <0.001) in the stepwise backwards elimination model. Finally, concerning income ( Income ), a $10,000 decrease in median household income leads to an increase in COVID-19 positivity rate of 2.8% (95% CI 2.1 to 3.4%, p <0.001) in the univariable model, an increase of 2.5% (95% CI 1.3 to 3.6%, p <0.001) in the full multivariable model, an increase of 2.6% (95% CI 1.3 to 3.8%, p <0.001) in the partial multivariable model, and an increase of 2.1% (95% CI 1.2 to 2.9%, p <0.001) in the stepwise backwards elimination model.

Final model

A final model was built using percentage of young dependent population ( Young ), race ( Race ), population density ( Density ), and median household income ( Income ) as predictors. Table  3 shows a summary of the regression estimates from this model. Figure  7 a shows the area specific relative risk ζ i for this model, while Fig.  7 b shows the posterior probability that the relative risk is greater than 1, p ( ζ i >1| y ). In this model, a 5% increase in the young population leads to a 2.3% (95% CI 0.4 to 4.2%, p =0.021) increase in COVID-19 positivity rate. A 10% decrease in the white (alone or in combination with another race) population leads to a 1.2% (95% CI 0.3 to 2.1%, p =0.021) increase in COVID-19 positivity rate. A 10,000 person per km 2 increase in population density leads to a 2.4% (95% CI 0.6 to 4.2%, p =0.011) increase in COVID-19 positivity rate. A $10,000 decrease in median household income leads to a 1.6% (95% CI 0.7 to 2.4%, p <0.001) increase in positivity rate. Figure  8 shows the positivity rate for COVID-19 by ZCTA against each of these predictors, along with our regression estimates and CIs.

figure 7

Ecological regression model for COVID-19 cases in New York City by Zip Code Tabulation Area (ZCTA). As at April 5, 2020, final Poisson BYM2 model including percentage of young population, percentage of population identifying as white (alone or in combination with another race), population density, and median household income as predictors. a Area-specific relative risk, ζ i . b Posterior probability for relative risk, p ( ζ i >1| y )

figure 8

Positivity rate for total COVID-19 tests in New York City by Zip Code Tabulation Area (ZCTA) against predictors used in final model. As at 5 April 2020, using final Poisson BYM2 model. Red regression lines show model estimates and 95% confidence interval (CI) with other predictors held at their mean values. a Percentage of young population. b Percentage of population that identify as white (alone or in combination with another race). c Population density. d Median household income

To further investigate the significant predictor race, we conducted additional modeling efforts and divided Race into five racial groupings: White, Black or African American, Hispanic, Asian, and Other (including American Indian and Alaska Native, Native Hawaiian and Other Pacific Islanders, Caribbean, and Mixed Race). We ran the final model five times which each of these racial groups considered explicitly one at a time. Table  4 shows a summary of the regression estimates from these models. In all cases, the significance of the other three predictors ( Young , Density , and Income ) was unchanged.

We found race ( Race ) to be significant for proportion of White population ( p <0.001) and Black population ( p <0.001), but not for Hispanic ( p =0.718), Asian ( p =0.966), or Other ( p =0.588) populations. A 10% decrease in the White (alone) population leads to a 1.8% (95% CI 0.8 to 2.8%) increase in the positivity rate, while a 10% increase in the Black population leads to a 1.1% (95% CI 0.3 to 1.8%) increase in the positivity rate. Figure  9 shows the positivity rate for COVID-19 by ZCTA as a function of the percentage of White and Black populations, along with our regression estimates and CIs.

figure 9

Positivity rate for total COVID-19 tests in New York City by Zip Code Tabulation Area (ZCTA) as a function of race. As at 5 April 2020, Poisson BYM2 models incorporating explicit racial groupings along with young population ( Young ), population density ( Density ), and median household income ( Income ) as predictors. Regression lines show model estimates and 95% confidence interval (CI) with other predictors held at their mean values. a Percentage of population identifying as white. b Percentage of population identifying as Black

During the opening stages of the COVID-19 pandemic in NYC, there was considerable variation in detected cases between neighborhoods in the city. Disease mapping shown in Fig.  5 displays a number of high risk areas, notably around Corona, Southeast Queens, East Bronx, and the orthodox Jewish community around Borough Park, Brooklyn. The unprecedented national response included a large number of media stories touting various covariates as predictors of either COVID-19 cases or mortality. In this ecological study, we attempted to use spatial modeling techniques to assess the association between number of COVID-19 cases detected in different neighborhoods of NYC and neighbourhood-level predictors. Our findings indicated a significant direct association between detected cases and the proportion of young dependents in the population as well as population density. We also found a significant inverse relationship between detected cases and median household income. We further found a significant positive association between COVID-19 cases and the proportion of the population identifying as black, and conversely, an inverse relationship with the proportion of the population identifying as white. We did not find a consistently significant relationship between detected cases and the other potential predictors; even those such as poverty, unemployment, and lack of insurance that were significant in a univariable model.

Our findings indicate statistically significant associations between three of the five demographic predictors included in the study. We find percentage of young dependents in the population to be a statistically significant predictor in all of the models in which it appears as a factor. Conversely, we find that the aged percentage of the population (65+) is not consistently a significant predictor of COVID-19 test positivity rate. This is congruent with evidence from Chan et al. [ 14 ] and Bai et al. [ 15 ], both of whom suggest significant transmission by young asymptomatic carriers. We further hypothesize that attitudes and behavioral patterns could play a significant role in this effect. As an example, increasing mortality of COVID-19 with age has been well publicized, and we suggest this may incline older communities to adhere to preventative public-health measures more. Conversely, the same information may be interpreted by younger populations that they are not at significant risk, potentially encouraging riskier behaviors. We found that high density population is a significant predictor of increased COVID-19 test positivity rate. These results support multiple studies of the current pandemic [ 37 – 39 ] that found that contact rates in well-mixed populations are proportional to population density. In the extreme scenario, the influence of high population density was seen in the rapid spread of the virus on cruise ships, notably the Diamond Princess, in late January 2020 [ 40 , 41 ]. Hu et al. use kinetic theory of Van der Waals gas models to show that population contact rates increase with population density (to a saturation limit) [ 42 ]. These increased contact patterns in higher density neighborhoods, combined with disease transmission through respiratory droplets [ 43 ] likely leads to increased positivity rates.

Race (White/non-White) was a consistent significant factor in our original statistical analysis. When we examined race in greater detail, we found significant associations between COVID-19 positivity rate and the proportions of the population identifying as Black (positive association) or White (negative association), but not Hispanic, Asian, or Other. There has been much reporting on disparities in COVID-19 influence due to race [ 17 ]. The confounding sociological relationships between race and economic affluence are well established [ 44 ], with African Americans more likely to live in densely populated, low-income neighborhoods, leading to increased contact patterns [ 45 ]. Further, the higher incidence of concomitant comorbidities among African American populations (including hypertension, diabetes, obesity, and cardiovascular disease) [ 46 ] may lead to an increase in symptomatic cases. Other cohort studies have also shown differences in racial groups that we combined into our Other category [ 47 ]. Due to the low number of cases associated with these minority racial populations, we chose not to further divide our race groups, which could increase the risk of ecological fallacy with our aggregate methodology [ 48 ].

While the balance of males and females was not consistently significant as a factor, we found some evidence that areas with more males are associated with higher detected COVID-19 cases. Wenham et al. [ 49 ] note the lack of sex analysis by global health institutions. Studies have posited sex differences in immunological function [ 50 ] or smoking prevalence/pattern [ 51 ] as potential causes of differing medical outcomes. We found no studies to date examining sex specific behavior trends in relation to COVID-19 transmission and incidence. Looking back further, we found conflicting evidence from studies on the 2009 H1N1 pandemic. Some studies suggested that females were more willing to engage in public health precautions [ 52 ], while others suggested no significant sex effects [ 53 ]. We suggest that further studies be undertaken to consider whether sex specific behavioral, employment, or other trends are mechanisms that could explain sex effects on positivity rates.

Regarding the economic predictors, we note that our findings are in agreement with a previous, non-pandemic study [ 54 ], which found that affluence (in our case household income) was a significant predictor on self-rated health while poverty and income inequality (the Gini index) were not significant factors. Wen et al. suggest that the presence of affluence sustains neighborhood social organizations, which in turn positively affect health. If we extend this argument to the current pandemic, we could hypothesize that these social organizations further act to pass on information and promote community adoption of transmission-reduction policies such as social distancing [ 55 ]. Furthermore, we note that those in low affluence neighborhoods are more likely to live in higher density residence arrangements, for example community housing and shared family dwellings, contributing to transmission of the virus among the neighborhood [ 40 ]. While previous studies [ 56 ] have found influence of unemployment on disease transmission, we note that the unprecedented shutdown of national infrastructure and the economy has meant that many previously employed people suddenly found themselves either unemployed, furloughed, or working from home. In a short period of time, this drastic measure has completely altered the employment landscape of NYC such that it is unsurprising that the unemployment figure from 2018 is not significant.

We found that neither of our healthcare-related predictors was consistently significant. Lack of insurance has previously been a barrier to both diagnosis and treatment [ 57 , 58 ]. However, in the COVID-19 pandemic, significant state resources were directed such that testing was freely available to all eligible New York residents. Furthermore, testing became freely available to all USA residents on 18 March 2020, as a result of the Families First Coronavirus Response Act (H.R. 6201) [ 59 ]. Given the unprecedented free access to testing, it is unsurprising that lack of insurance was not a significant predictor by 5 April when the data were collected. We hypothesize that conducting the same analysis on detected cases prior to 18 March could potentially draw different conclusions about the significance of insurance. Unfortunately, the data on detected cases by ZCTA only became publicly available from NYC DOHMH on 1 April and did not include temporal granularities prior to that date.

In addition to the four predictors in our final model, we also considered collinearity of the remaining predictors by conducting a principal component analysis (PCA). We generated a single social deprivation metric encompassing unemployment, poverty, and lack of insurance, all of which had a reasonable degree of correlation (we did not include race or income since they were significant on their own). We conducted similar regression approaches using this metric; however, it was only significant in the univariable case ( p <0.001).

We note five key limitations of the ecological study. First, our dependent variable is the number of detected COVID-19 cases, which may be significantly different from the number of true cases [ 60 ]. We believe, however, that this does not detract from the validity of the study, since characterization of the detection and prevalence is important for pandemic management [ 61 ]. Studies on HIV rates among at risk populations suggest that the relationship between predictors and the number of detected cases is likely a complex interaction via at least three pathways: the true number of cases, access to testing (means) [ 62 ], and population attitudes to testing (motivation) [ 63 , 64 ]. Thus, we can still develop valid inferences, even if we cannot elicit with certainty which one (or ones) of these pathways the significant predictors act through. This limitation also incorporates natural selection bias in the dependent variable, in that there is a self-selecting group of the population who choose to be tested for COVID-19 (for example due to the presence of symptoms or known contact with an infected person). This group, captured by the total COVID-19 tests, may have different characteristics to the total NYC population (one example could be young people being more likely to get tested). By using the total number of COVID-19 tests as our exposure, we limit the scope to inferences about the test positivity rate, and we further caution that this should not be used as an unbiased estimator of total COVID-19 incidence [ 65 ]. Second, any associations made must be interpreted with caution since, as with any observational study, spurious correlations produced by unstudied confounding factors may be present. Caution is also advised due to the ecological fallacy of making individual inferences from aggregate data. Further verification is required to determine true causative links between predictors and detected cases even when associations are significant. Third, the significant predictors found are likely not the only explanations for different positivity rates between different neighborhoods. However, this study does provide useful insight into explaining between-neighborhood variation. Fourth, since testing has been coordinated within the city limits at the borough level, there may be borough-level biases related to COVID-19 testing. However, if these biases exist, they likely inhibit testing access in low-income neighborhoods [ 66 , 67 ] such that the inverse association found between income and positive cases is more pronounced than what the model suggests.

Finally, in our spatial model, we used an ICAR adjacency matrix of first-order lag points, i.e., a nearest neighbor structure where two ZCTAs are considered connected if (and only if) they share a border. An argument can be made that, in a highly mixed urban environment such as NYC, this structure, shown in Fig.  4 d, does not adequately capture the spatial heterogeneity. However, there is sparse literature on the application of different neighborhood structures to BYM models [ 3.0.CO;2-C." href="/articles/10.1186/s12916-020-01731-6#ref-CR68" id="ref-link-section-d19862093e5664">68 , 69 ]; Rodrigues and Assunção argue that this is primarily due to the ease of nearest neighbor implementation using geographic information systems (GIS) [ 70 ]. To investigate the effect of neighborhood mixing, we created an additional series of lagged adjacency matrices from second- through fifth-order implying increasing levels of connectivity. We ran all our model simulations (univariable, multivariable, partial multivariable, stepwise elimination, and our final model) using each one of the five new adjacency matrices, generating 20 new sets of results and associated p values. In all cases (i.e., all neighborhood connectivities), the main study conclusions were unaltered; in particular, young dependent population, race, and income were still significant predictors in all models. The significance of population density however did decline with increased mixing, ceasing to be significant above third-order connectivity in our final model.

Within the constraints imposed by the limitations of an ecological analysis, we conclude that there exist consistent, significant associations between COVID-19 test positivity rate and the percentage of young dependents in the population as well as population density. Further, there is also a significant association between COVID-19 test positivity rate and low income neighborhoods. Finally, there is a significant association between neighborhoods with a large percentage of black population or a low percentage of white population and COVID-19 test positivity rate. The significance of young dependents likely comes from differing contact patterns between young and old populations. We suggest further studies to be undertaken to determine any underlying causative mechanisms to these associations, paying particular attention to willingness to engage in public health behaviors and to asymptomatic carrier transmission. We finally highlight that while predictors may change with increased time and access to testing, this study provides important insights into public health behavior in the early stages of the current and future pandemics.

Availability of data and materials

The datasets analyzed for this study are publicly available, a repository can be found on GitHub: https://github.com/rswhittle/NYC-COVID19-socioeconomic .

Abbreviations

American Communities Survey

Besag-York-Mollié(model)

Confidence interval

Coronavirus disease 2019

Deviance information criterion

New York City Department of Health and Mental Hygiene

Influenza A virus subtype H1N1

New York City

Public Use Microdata Area

Zip Code Tabulation Area

Dong E, Du H, Gardner L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect Dis. 2020. https://doi.org/10.1016/S1473-3099(20)30120-1 .

NYC Department of Health and Mental Hygiene (DOHMH). NYC Coronavirus (COVID-19) data. 2020. Available from: https://github.com/nychealth/coronavirus-data . Accessed 10 Apr 2020.

Cucinotta D, Vanelli M. WHO declares COVID-19 a pandemic. Acta Biomedica. 2020; 91(1):157–60. https://doi.org/10.23750/abm.v91i1.9397 .

PubMed   Google Scholar  

Stier AJ, Berman MG, Bettencourt LMA. COVID-19 attack rate increases with city size (March 30, 2020). Mansueto Inst Urban Innov Res Pap No. 19. 2020. https://ssrn.com/abstract=3564464 . Accessed 10 Apr2020.

Cohen J, Kupferschmidt K. Countries test tactics in ‘war’ against COVID-19. Science. 2020; 367(6484):1287–8. https://doi.org/10.1126/science.367.6484.1287 .

CAS   PubMed   Google Scholar  

Angelopoulos AN, Pathak R, Varma R, Jordan MI. On Identifying and Mitigating Bias in the Estimation of the COVID-19 Case Fatality Rate. Harvard Data Science Review. 2020. Special Issue 1 - COVID-19. https://doi.org/10.1162/99608f92.f01ee285 .

Britten RH. The incidence of epidemic influenza, 1918-19: a further analysis according to age, sex, and color of the records of morbidity and mortality obtained in surveys of 12 localities. Public Health Rep (1896–1970). 1932; 47(6):303. https://doi.org/10.2307/4580340 .

Google Scholar  

Sydenstricker E. The Incidence of Influenza among Persons of Different Economic Status during the Epidemic of 1918. Public Health Rep (1896-1970). 1931; 46(4):154–170. https://doi.org/10.2307/4579923 .

La Ruche G, Tarantola A, Barboza P, Vaillant L, Gueguen J, Gastellu-Etchegorry M, et al.The 2009 pandemic H1N1 influenza and indigenous populations of the Americas and the Pacific. Eurosurveillance. 2009; 14(42):19366. https://doi.org/10.2807/ese.14.42.19366-en .

World Health Organization. Pandemic influenza preparedness and response: a WHO guidance document. Geneva: WHO Press; 2009.

NYC Department of Health and Mental Hygiene (DOHMH). Coronavirus disease 2019 (COVID-19) daily data summary: April 5, 2020. 2020. Available from: https://www1.nyc.gov/assets/doh/downloads/pdf/imm/covid-19-daily-data-summary-04052020-2.pdf . Accessed 10 Apr 2020.

United States Census Bureau. American Community Survey 2014-2018 5-year estimates. 2018. Available from: https://www.census.gov/data.html . Accessed 10 Apr 2020.

Zhou F, Yu T, Du R, Fan G, Liu Y, Liu Z, et al.Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study. The Lancet. 2020; 395(10229):1054–62. https://doi.org/10.1016/S0140-6736(20)30566-3 .

CAS   Google Scholar  

Chan JFW, Yuan S, Kok KH, To KKW, Chu H, Yang J, et al.A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster. The Lancet. 2020; 395(10223):514–23. https://doi.org/10.1016/S0140-6736(20)30154-9 .

Bai Y, Yao L, Wei T, Tian F, Jin DY, Chen L, et al.Presumed asymptomatic carrier transmission of COVID-19. JAMA J Am Med Assoc. 2020. https://doi.org/10.1001/jama.2020.2565 .

NYU Furman Center. COVID-19 cases in New York City, a neighborhood-level analysis. New York: New York University; 2020. Available from: https://furmancenter.org/thestoop/entry/covid-19-cases-in-new-york-city-a-neighborhood-level-analysis . Accessed 10 Apr 2020.

Webb Hooper M, Nápoles AM, Pérez-Stable EJ. COVID-19 and racial/ethnic disparities. JAMA J Am Med Assoc. 2020; 323(24):2466–7. https://doi.org/10.1001/jama.2020.8598 .

Fang LQ, Wang LP, De Vlas SJ, Liang S, Tong SL, Li YL, et al.Distribution and risk factors of 2009 pandemic influenza A (H1N1) in Mainland China. Am J Epidemiol. 2012; 175(9):890–7. https://doi.org/10.1093/aje/kwr411 .

PubMed   PubMed Central   Google Scholar  

Lynch J, Smith GD, Hillemeier M, Shaw M, Raghunathan T, Kaplan G. Income inequality, the psychosocial environment, and health: comparisons of wealthy nations. Lancet. 2001; 358(9277):194–200. https://doi.org/10.1016/S0140-6736(01)05407-1 .

Babones SJ. Income inequality and population health: correlation and causality. Soc Sci Med. 2008; 66(7):1614–1626. https://doi.org/10.1016/j.socscimed.2007.12.012 .

Thompson DL, Jungk J, Hancock E, Smelser C, Landen M, Nichols M, et al.Risk factors for 2009 pandemic influenza A (H1N1)-related hospitalization and death among racial/ethnic groups in New Mexico. Am J Public Health. 2011; 101(9):1776–84. https://doi.org/10.2105/AJPH.2011.300223 .

Janlert U, Hammarström A. Which theory is best? Explanatory models of the relationship between unemployment and health. BMC Public Health. 2009; 9(1):1–9. https://doi.org/10.1186/1471-2458-9-235 .

Whittle HJ, Palar K, Seligman HK, Napoles T, Frongillo EA, Weiser SD. How food insecurity contributes to poor HIV health outcomes: qualitative evidence from the San Francisco Bay Area. Soc Sci Med. 2016; 170:228–236. https://doi.org/10.1016/j.socscimed.2016.09.040 .

Bouye K, Truman BI, Hutchins S, Richard R, Brown C, Guillory JA, et al.Pandemic influenza preparedness and response among public-housing residents, single-parent families, and low-income populations. Am J Public Health. 2009; 99 Suppl 2(S2):287–93. https://doi.org/10.2105/AJPH.2009.165134 .

Tomita A, Vandormael AM, Cuadros D, Slotow R, Tanser F, Burns JK. Proximity to healthcare clinic and depression risk in South Africa: geospatial evidence from a nationally representative longitudinal study. Soc Psychiatry Psychiatr Epidemiol. 2017; 52(8):1023–30. https://doi.org/10.1007/s00127-017-1369-x .

Moran PAP. Notes on continuous stochastic phenomena. Biometrika. 1950; 37(1/2):17–23. https://doi.org/10.2307/2332142 .

Besag J, York J, Mollié A. Bayesian image restoration, with two applications in spatial statistics. Ann Inst Stat Math. 1991; 43(1):1–20. https://doi.org/10.1007/BF00116466 .

Besag J. Spatial interaction and the statistical analysis of lattice systems. J R Stat Soc Ser B (Methodological). 1974; 36(2):192–236. https://doi.org/10.1111/j.2517-6161.1974.tb00999.x .

Besag J, Kooperberg C. On conditional and intrinsic autoregression. Biometrika. 1995; 82(4):733–46. https://doi.org/10.2307/2337341 .

Riebler A, Sørbye SH, Simpson D, Rue H. An intuitive Bayesian spatial model for disease mapping that accounts for scaling. Stat Methods Med Res. 2016; 25(4):1145–65. https://doi.org/10.1177/0962280216660421 .

Simpson D, Rue H, Riebler A, Martins TG, Sørbye SH. Penalising model component complexity: a principled, practical approach to constructing priors. Stat Sci. 2017; 32(1):1–28. https://doi.org/10.1214/16-STS576 .

Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A. Bayesian measures of model complexity and fit. J R Stat Soc Ser B (Stat Methodol). 2002; 64(4):583–639. https://doi.org/10.1111/1467-9868.00353 .

Nikolopoulos G, Bagos P, Lytras T, Bonovas S. An ecological study of the determinants of differences in 2009 pandemic influenza mortality rates between countries in Europe. PLoS ONE. 2011; 6(5):e19432. https://doi.org/10.1371/journal.pone.0019432 .

CAS   PubMed   PubMed Central   Google Scholar  

Meng XL. Posterior predictive p -values. Ann Stat. 1994; 22(3):1142–60. https://doi.org/10.1214/AOS/1176325622 .

Rue H, Martino S, Chopin N. Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J R Stat Soc Ser B (Stat Methodol). 2009; 71(2):319–392. https://doi.org/10.1111/j.1467-9868.2008.00700.x .

Martins TG, Simpson D, Lindgren F, Rue H. Bayesian computing with INLA: new features. Comput Stat Data Anal. 2013; 67:68–83. https://doi.org/10.1016/j.csda.2013.04.014 .

Rocklöv J, Sjödin H. High population densities catalyse the spread of COVID-19. J Travel Med. 2020:1–2. https://doi.org/10.1093/jtm/taaa038 .

Sjödin H, Wilder-Smith A, Osman S, Farooq Z, Rocklöv J. Only strict quarantine measures can curb the coronavirus disease (COVID-19) outbreak in Italy, 2020. Eurosurveillance. 2020; 25(13):2000280. https://doi.org/10.2807/1560-7917.ES.2020.25.13.2000280 .

PubMed Central   Google Scholar  

CDC COVID-19 Response Team. Geographic differences in COVID-19 cases, deaths, and incidence — United States, February 12–April 7, 2020. Morb Mortal Wkly Rep. 2020; 69(15):465–71. https://doi.org/10.15585/mmwr.mm6915e4 .

Rocklöv J, Sjödin H, Wilder-Smith A. COVID-19 outbreak on the Diamond Princess cruise ship: estimating the epidemic potential and effectiveness of public health countermeasures. J Travel Med. 2020. https://doi.org/10.1093/jtm/taaa030 .

Zhang S, Diao MY, Yu W, Pei L, Lin Z, Chen D. Estimation of the reproductive number of novel coronavirus (COVID-19) and the probable outbreak size on the Diamond Princess cruise ship: a data-driven analysis. Int J Infect Dis. 2020; 93:201–4. https://doi.org/10.1016/j.ijid.2020.02.033 .

Hu H, Nigmatulina K, Eckhoff P. The scaling of contact rates with population density for the infectious disease models. Math Biosci. 2013; 244(2):125–34. https://doi.org/10.1016/j.mbs.2013.04.013 .

Bourouiba L. Turbulent gas clouds and respiratory pathogen emissions: potential implications for reducing transmission of COVID-19. JAMA Insights. 2020; 323(18):1837–8. https://doi.org/10.1001/jama.2020.4756 .

Phelan TJ, Schneider M. Race, ethnicity, and class in American suburbs. Urban Aff Rev. 1996; 31(5):659–80. https://doi.org/10.1177/107808749603100504 .

Shah M, Sachdeva M, Dodiuk-Gad RP. COVID-19 and racial disparities. Journal of American Dermatology. 2020; 83(1):e35. https://doi.org/10.1016/j.jaad.2020.04.046 .

Yancy CW. COVID-19 and African Americans. JAMA J Am Med Assoc. 2020; 323(19):1891–2. https://doi.org/10.1001/jama.2020.6548 .

Keawe’aimoku Kaholokula J, Samoa RA, Miyamoto RES, Palafox N, Daniels SA. COVID-19 special column: COVID-19 hits native Hawaiian and Pacific Islander communities the hardest. Hawai’i J Health Soc Welf. 2020; 79(5):144–6.

Finney JW, Humphreys K, Kivlahan DR, Harris AHS. Why health care process performance measures can have different relationships to outcomes for patients and hospitals: understanding the ecological fallacy. Am J Public Health. 2011; 101(9):1635–42. https://doi.org/10.2105/AJPH.2011.300153 .

Wenham C, Smith J, Morgan R. COVID-19: the gendered impacts of the outbreak. The Lancet. 2020; 395(10227):846–8. https://doi.org/10.1016/S0140-6736(20)30526-2 .

Chen N, Zhou M, Dong X, Qu J, Gong F, Han Y, et al.Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study. The Lancet. 2020; 395(10223):507–13. https://doi.org/10.1016/S0140-6736(20)30211-7 .

Liu S, Zhang M, Yang L, Li Y, Wang L, Huang Z, et al.Prevalence and patterns of tobacco smoking among Chinese adult men and women: findings of the 2010 national smoking survey. J Epidemiol Community Health. 2017; 71(2):154–61. https://doi.org/10.1136/jech-2016-207805 .

Park JH, Cheong HK, Son DY, Kim SU, Ha CM. Perceptions and behaviors related to hand hygiene for the prevention of H1N1 influenza transmission among Korean university students during the peak pandemic period. BMC Infect Dis. 2010; 10(1):1–8. https://doi.org/10.1186/1471-2334-10-222 .

Kiviniemi MT, Ram PK, Kozlowski LT, Smith KM. Perceptions of and willingness to engage in public health precautions to prevent 2009 H1N1 influenza transmission. BMC Public Health. 2011; 11(1):1–8. https://doi.org/10.1186/1471-2458-11-152 .

Wen M, Browning CR, Cagney KA. Poverty, affluence, and income inequality: neighborhood economic structure and its implications for health. Soc Sci Med. 2003; 57(5):843–60. https://doi.org/10.1016/S0277-9536(02)00457-4 .

Anderson RM, Heesterbeek H, Klinkenberg D, Hollingsworth TD. How will country-based mitigation measures influence the course of the COVID-19 epidemic?The Lancet. 2020; 395(10228):931–4. https://doi.org/10.1016/S0140-6736(20)30567-5 .

Munch Z, Van Lill SWP, Booysen CN, Zietsman HL, Enarson DA, Beyers N. Tuberculosis transmission patterns in a high-incidence area: A spatial analysis. Int J Tuberc Lung Dis. 2003; 7(3):271–7.

Doyle JJ. Health insurance, treatment and outcomes: using auto accidents as health shocks. Rev Econ Stat. 2005; 87(2):256–70. https://doi.org/10.1162/0034653053970348 .

Kwara A, Herold JS, Machan JT, Carter EJ. Factors associated with failure to complete isoniazid treatment for latent tuberculosis infection in Rhode Island. Chest. 2008; 133(4):862–8. https://doi.org/10.1378/chest.07-2024 .

H R. Families First Coronavirus Response Act. 2020. https://www.congress.gov/bill/116th-congress/house-bill/6201/ . Accessed 9 June 2020.

Gostic KM, Gomez ACR, Mummah RO, Kucharski AJ, Lloyd-Smith JO. Estimated effectiveness of symptom and risk screening to prevent the spread of COVID-19. eLife. 2020:9. https://doi.org/10.7554/eLife.55570 .

Lipsitch M, Swerdlow DL, Finelli L. Defining the epidemiology of COVID-19 — studies needed. New England J Med. 2020; 382(13):1194–6. https://doi.org/10.1056/NEJMp2002125 .

Meehan SA, Leon N, Naidoo P, Jennings K, Burger R, Beyers N. Availability and acceptability of HIV counselling and testing services. A qualitative study comparing clients’ experiences of accessing HIV testing at public sector primary health care facilities or non-governmental mobile services in Cape Town, South Afr. BMC Public Health. 2015; 15(1):845. https://doi.org/10.1186/s12889-015-2173-8 .

Jereni BH, Muula AS. Availability of supplies and motivations for accessing voluntary HIV counseling and testing services in Blantyre, Malawi. BMC Health Serv Res. 2008; 8(1):1–6. https://doi.org/10.1186/1472-6963-8-17 .

Downing M, Knight K, Reiss TH, Vernon K, Mulia N, Ferreboeuf M, et al.Drug users talk about HIV testing: motivating and deterring factors. AIDS Care Psychol Socio-Med Aspects AIDS/HIV. 2001; 13(5):561–77. https://doi.org/10.1080/09540120120063205 .

Fenton NE, Neil M, Osman M, McLachlan S. COVID-19 infection and death rates: the need to incorporate causal explanations for the data and avoid bias in testing. J Risk Res. 2020:1–4. https://doi.org/10.1080/13669877.2020.1756381 .

Pai NP, Vadnais C, Denkinger C, Engel N, Pai M. Point-of-care testing for infectious diseases: diversity, complexity, and barriers in low- and middle-income countries. PLoS Med. 2012; 9(9). https://doi.org/10.1371/journal.pmed.1001306 .

O’Loughlin JL, Paradis G, Gray-Donald K, Renaud L. The impact of a community-based heart disease prevention program in a low-income, inner-city neighborhood. Am J Public Health. 1999; 89(12):1819–26. https://doi.org/10.2105/AJPH.89.12.1819 .

MacNab YC, Dean CB. Parametric bootstrap and penalized quasi-likelihood inference in conditional autoregressive models. Stat Med. 2000; 19(17-18):2421–35. https://doi.org/10.1002/1097-0258(20000915/30)19:17/18<2421::AID-SIM579>3.0.CO;2-C.

White G, Ghosh SK. A stochastic neighborhood conditional autoregressive model for spatial data. Comput Stat Data Anal. 2009; 53(8):3033–46. https://doi.org/10.1016/j.csda.2008.08.010 .

Rodrigues EC, Assunção R. Bayesian spatial models with a mixture neighborhood structure. J Multivar Anal. 2012; 109:88–102. https://doi.org/10.1016/j.jmva.2012.02.017 .

NYC Geodatabase (NYC GDB) Project. 2010 New York City Zip Code Tabulation Areas (ZCTAs). 2016. Available from: http://hdl.handle.net/2451/34509 . Accessed 10 Apr 2020.

Download references

Acknowledgements

Figures were created using shapefiles publicly available from the NYC Geodatabase (NYC GDB) project [ 71 ].

This study did not receive any funding.

Author information

Authors and affiliations.

Department of Aerospace Engineering, Texas A&M University, College Station, TX, USA

Richard S. Whittle & Ana Diaz-Artiles

International Space University, Illkirch-Graffenstaden, France

You can also search for this author in PubMed   Google Scholar

Contributions

RSW conceived and designed the work. RSW collected data. RSW designed the model and the computational framework and analyzed the data. RSW drafted the manuscript. RSW and AD-A revised the manuscript for critical intellectual content. RSW and AD-A approved the final version of the manuscript.

Corresponding author

Correspondence to Richard S. Whittle .

Ethics declarations

Ethics approval and consent to participate.

Not applicable.

Consent for publication

Competing interests.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Cite this article.

Whittle, R.S., Diaz-Artiles, A. An ecological study of socioeconomic predictors in detection of COVID-19 cases across neighborhoods in New York City. BMC Med 18 , 271 (2020). https://doi.org/10.1186/s12916-020-01731-6

Download citation

Received : 17 April 2020

Accepted : 04 August 2020

Published : 04 September 2020

DOI : https://doi.org/10.1186/s12916-020-01731-6

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

  • Positivity rate
  • Socioeconomic factors
  • Besag-York-Mollié model
  • Youth dependency
  • Population density

BMC Medicine

ISSN: 1741-7015

ecological analysis of case study

Arizona State University - School of Human Evolution & Social Change

Sign In / Sign Out

Navigation: asu universal.

  • Arts and Sciences
  • Design and the Arts
  • Engineering
  • Health Solutions
  • Nursing and Health Innovation
  • Public Programs
  • Sustainability
  • University College
  • Polytechnic
  • Downtown Phoenix
  • Online and Extended
  • Lake Havasu
  • Research Park
  • Washington D.C.

ecological analysis of case study

You are here

Case studies of social-ecological systems.

  • Browse (active tab)

Browse case studies header

There follows a list of case studies in the SES Library.

To read this content please select one of the options below:

Please note you do not have access to teaching notes, institutional ecological footprint analysis ‐ a case study of the university of newcastle, australia.

International Journal of Sustainability in Higher Education

ISSN : 1467-6370

Article publication date: 1 March 2001

With documented declines in the biophysical state of the planet, there is a need to develop indicators of sustainability. Ecological footprint analysis (EFA) can be considered an indicator of sustainability that converts consumption and waste production into units of equivalent land area. Based on the reality of biophysical limits to growth, and presenting data in an aggregated, quantifiable, yet easily comprehensible form, EFA is also a useful tool for environmental policy and management. EFA has typically been applied at the national and regional level as well, as for assessment of technology. This paper develops an ecological footprint model for institutional contexts and this study of the University of Newcastle (NSW) is the first institutional level EFA undertaken in Australia. The case study shows tertiary institutions to be net importers of consumption items and thus dependent on a vast external environment. The EFA highlights those areas of consumption which constitute the largest part of the footprint and thus provides the opportunity for targeting those areas for active management. EFA for this tertiary institution clearly identifies that a reduced ecological footprint would mean a movement towards sustainability.

  • Sustainable development
  • Tertiary education
  • Environmental management strategy

Flint, K. (2001), "Institutional ecological footprint analysis ‐ A case study of the University of Newcastle, Australia", International Journal of Sustainability in Higher Education , Vol. 2 No. 1, pp. 48-62. https://doi.org/10.1108/1467630110380299

Copyright © 2001, MCB UP Limited

Related articles

We’re listening — tell us what you think, something didn’t work….

Report bugs here

All feedback is valuable

Please share your general feedback

Join us on our journey

Platform update page.

Visit emeraldpublishing.com/platformupdate to discover the latest news and updates

Questions & More Information

Answers to the most commonly asked questions here

PCR bias in ecological analysis: a case study for quantitative Taq nuclease assays in analyses of microbial communities

Affiliation.

  • 1 Lehrstuhl für Physiologie und Biochemie der Pflanzen, Universität Konstanz, Constance, Yerseke, The Netherlands. [email protected]
  • PMID: 11055948
  • PMCID: PMC92404
  • DOI: 10.1128/AEM.66.11.4945-4953.2000

Succession of ecotypes, physiologically diverse strains with negligible rRNA sequence divergence, may explain the dominance of small, red-pigmented (phycoerythrin-rich) cyanobacteria in the autotrophic picoplankton of deep lakes (C. Postius and A. Ernst, Arch. Microbiol. 172:69-75, 1999). In order to test this hypothesis, it is necessary to determine the abundance of specific ecotypes or genotypes in a mixed background of phylogenetically similar organisms. In this study, we examined the performance of Taq nuclease assays (TNAs), PCR-based assays in which the amount of an amplicon is monitored by hydrolysis of a labeled oligonucleotide (TaqMan probe) when hybridized to the amplicon. High accuracy and a 7-order detection range made the real-time TNA superior to the corresponding end point technique. However, in samples containing mixtures of homologous target sequences, quantification can be biased due to limited specificity of PCR primers and probe oligonucleotides and due to accumulation of amplicons that are not detected by the TaqMan probe. A decrease in reaction efficiency, which can be recognized by direct monitoring of amplification, provides experimental evidence for the presence of such a problem and emphasizes the need for real-time technology in quantitative PCR. Use of specific primers and probes and control of amplification efficiency allow correct quantification of target DNA in the presence of an up to 10(4)-fold excess of phylogenetically similar DNA and of an up to 10(7)-fold excess of dissimilar DNA.

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't
  • Bacteria / growth & development
  • Bacteria / isolation & purification*
  • DNA Probes / genetics
  • DNA, Bacterial / analysis
  • Fresh Water / microbiology
  • Polymerase Chain Reaction / methods*
  • Sensitivity and Specificity
  • Taq Polymerase / metabolism*
  • DNA, Bacterial
  • Taq Polymerase

Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

  • View all journals
  • My Account Login
  • Explore content
  • About the journal
  • Publish with us
  • Sign up for alerts
  • Open access
  • Published: 17 April 2024

Hybrid speciation driven by multilocus introgression of ecological traits

  • Neil Rosser   ORCID: orcid.org/0000-0001-7796-2548 1 , 2   na1 ,
  • Fernando Seixas   ORCID: orcid.org/0000-0002-2026-5992 1   na1 ,
  • Lucie M. Queste   ORCID: orcid.org/0000-0002-7402-8079 2 ,
  • Bruna Cama   ORCID: orcid.org/0000-0002-4100-9680 2 ,
  • Ronald Mori-Pezo 3 , 4 ,
  • Dmytro Kryvokhyzha   ORCID: orcid.org/0000-0001-6498-1977 1 , 5 ,
  • Michaela Nelson 2 ,
  • Rachel Waite-Hudson 2 ,
  • Matt Goringe 2 ,
  • Mauro Costa 6 ,
  • Marianne Elias   ORCID: orcid.org/0000-0002-1250-2353 7 , 8 ,
  • Clarisse Mendes Eleres de Figueiredo   ORCID: orcid.org/0000-0001-6706-7064 9 , 10 ,
  • André Victor Lucci Freitas   ORCID: orcid.org/0000-0002-5763-4990 11 ,
  • Mathieu Joron   ORCID: orcid.org/0000-0003-1043-4147 12 ,
  • Krzysztof Kozak   ORCID: orcid.org/0000-0001-8980-3173 8 ,
  • Gerardo Lamas   ORCID: orcid.org/0000-0002-3664-6730 13 ,
  • Ananda R. P. Martins 14 ,
  • W. Owen McMillan 8 ,
  • Jonathan Ready   ORCID: orcid.org/0000-0002-9374-8661 9 , 10 ,
  • Nicol Rueda-Muñoz 15 ,
  • Camilo Salazar   ORCID: orcid.org/0000-0001-9217-6588 15 ,
  • Patricio Salazar   ORCID: orcid.org/0000-0001-8988-0769 16 ,
  • Stefan Schulz   ORCID: orcid.org/0000-0002-4810-324X 17 ,
  • Leila T. Shirai 11 ,
  • Karina L. Silva-Brandão 18 ,
  • James Mallet   ORCID: orcid.org/0000-0002-3370-0367 1 &
  • Kanchon K. Dasmahapatra   ORCID: orcid.org/0000-0002-2840-7019 2 , 19  

Nature volume  628 ,  pages 811–817 ( 2024 ) Cite this article

12k Accesses

1 Citations

278 Altmetric

Metrics details

  • Biodiversity
  • Ecological genetics
  • Evolutionary genetics
  • Quantitative trait loci

Hybridization allows adaptations to be shared among lineages and may trigger the evolution of new species 1 , 2 . However, convincing examples of homoploid hybrid speciation remain rare because it is challenging to demonstrate that hybridization was crucial in generating reproductive isolation 3 . Here we combine population genomic analysis with quantitative trait locus mapping of species-specific traits to examine a case of hybrid speciation in Heliconius butterflies. We show that Heliconius elevatus is a hybrid species that is sympatric with both parents and has persisted as an independently evolving lineage for at least 180,000 years. This is despite pervasive and ongoing gene flow with one parent, Heliconius pardalinus , which homogenizes 99% of their genomes. The remaining 1% introgressed from the other parent, Heliconius melpomene , and is scattered widely across the H. elevatus genome in islands of divergence from H. pardalinus . These islands contain multiple traits that are under disruptive selection, including colour pattern, wing shape, host plant preference, sex pheromones and mate choice. Collectively, these traits place H. elevatus on its own adaptive peak and permit coexistence with both parents. Our results show that speciation was driven by introgression of ecological traits, and that speciation with gene flow is possible with a multilocus genetic architecture.

Similar content being viewed by others

ecological analysis of case study

Phylogenomics and the rise of the angiosperms

ecological analysis of case study

Diversity-dependent speciation and extinction in hominins

ecological analysis of case study

Evolution of tissue-specific expression of ancestral genes across vertebrates and insects

Biodiversity has long been depicted as a ‘tree of life’, but a wealth of genomic data has made clear that the branches and leaves of the tree often do not represent neatly defined units. Instead, they comprise a braided delta of evolutionary lineages linked by hybridization and introgression 4 . Although gene flow tends to homogenize populations 5 , it may also contribute to adaptation and even drive speciation if introgressed variants cause reproductive isolation 1 , 2 , 6 . Polyploid (chromosome-doubling) hybrid speciation is common in plants 7 , 8 , but compelling examples of homoploid hybrid speciation (without a change in chromosome number) remain scarce and contested, especially in animals 3 . This is because it is difficult to prove that hybridization had a pivotal role in creating reproductive isolation between the hybrid lineage and the parental species 3 . Here we present evidence for homoploid hybrid speciation in Heliconius butterflies. We show that introgression of key adaptive traits from H. melpomene caused H. elevatus to diverge from H. pardalinus , despite ongoing gene flow among sympatric H. elevatus and H. pardalinus , which homogenizes 99% of their genomes (Fig. 1a ).

figure 1

a , Evolutionary relationships and main introgression events described in this study. We test the hypothesis that introgression of major pre-mating and post-mating ecological isolating traits from H. melpomene led to the establishment of H. elevatus as a new stable hybrid species. Mya, million years ago. b , Geographical distributions of major clades. Locations at which both H. elevatus and H. pardalinus were sampled are numbered. c , Distance-based network using genome-wide independent SNPs. This concatenated tree shows the existence of two distinct clusters, Amazonian versus non-Amazonian, in both H. elevatus and H. pardalinus . d , Topology weighting analysis (TWISST) showing the percentage of the 11,509 non-overlapping genomic windows of 1,000 SNPs in which the majority of subtrees (that is, topology weighting ≥ 0.5) clusters H. elevatus (ele) with either H. pardalinus (par) (93.2%; top) or H. melpomene (mel) (0.52%; bottom). Note that although H. elevatus groups with H. pardalinus in 93.2% of windows, only 1.61% of those trees yield the two species as reciprocally monophyletic. By contrast, all three species are monophyletic in 81% of the windows for which H. elevatus groups with H. melpomene . Subscripts indicate geographical distributions for H. elevatus and H. pardalinus (Ama, Amazon; And, Andes; Gui, Guianas) and subspecies for H. melpomene (Agl, aglaope ; Ams, amaryllis ). e , A multispecies coalescent model with introgression supports a hybrid origin of H. elevatus , with the introgression time coinciding closely with the origin of the species (the 95% HPD intervals are given within parenthesis). Images of butterfly wings are copyright of the authors and Michel Cast.

Heliconius butterflies are chemically defended by cyanogenic glycosides, either sequestered from larval passion-vine host plants (Passifloraceae) or synthesized de novo 9 , 10 . Adults signal their toxicity to predators through their brightly coloured wing patterns. The cost of educating predators is shared with other defended butterflies and moths through mutualistic Müllerian mimicry 11 . Mimicry among species is not restricted to colour pattern, because co-mimics also converge in flight behaviour and wing shape 12 , 13 . Hybrids with intermediate phenotypes are selected against because predators do not recognize them 14 , 15 . Therefore, different mimetic phenotypes correspond to fitness peaks in an adaptive landscape maintained by disruptive selection. How populations transition to new fitness peaks remains an unanswered question, but adaptive introgression provides a possible route.

Heliconius elevatus , H. pardalinus and H. melpomene present an excellent system with which to elucidate the role of hybridization in speciation. All three species are sympatric across the Amazon basin 16 , 17 , 18 (Fig. 1b ). Heliconius elevatus and H. pardalinus are closely related 19 , but they have strikingly different colour patterns (Fig. 1a ). Heliconius pardalinus has a ‘tiger’ mimetic colour pattern typical of its close relatives. By contrast, H. elevatus has a red, black and yellow pattern, mimicking the much more distantly related H. melpomene 19 . This phenotypic convergence results in part from introgression at the major colour patterning loci cortex and optix 20 . Because Heliconius elevatus uses colour pattern as a partial cue in mate choice 17 , these introgressed alleles are likely to promote pleiotropic reproductive isolation from H. pardalinus . This suggests that H. elevatus has a hybrid origin, and here we test that hypothesis.

Hybrid genome of H. elevatus

We analysed whole-genome sequences of 92 wild-caught individuals of these three species: 42 H. elevatus (12 locations), 33 H. pardalinus (7 locations) and 17 H. melpomene (4 locations). For H. elevatus and H. pardalinus , our sampling spanned their combined geographical range (Fig. 1b and Supplementary Table 1 ). A concatenated whole-genome phylogenetic network groups H. pardalinus with H. elevatus , whereas H. melpomene forms a much deeper lineage separated by several other species 21 (Fig. 1c ). This topology is echoed in 93.2% of the genealogies estimated in sliding windows of 1,000 single-nucleotide polymorphisms (SNPs) across the genome, whereas 0.52% of the genealogies cluster H. elevatus with H. melpomene (Fig. 1d ). This is suggestive of introgression between these two species but could also be explained by retention of ancestral polymorphisms. Testing this hypothesis under the multispecies coalescent with introgression framework 22 (Fig. 1e ), we find that most of the H. elevatus genome is derived from H. pardalinus , with a 0.71% (95% high posterior density (HPD) 0.32–1.11%) contribution from H. melpomene . Heliconius elevatus arose as an independent lineage around 180,000 years ago (kya) (Fig. 1e ; 95% HPD 137–216 kya; see also Extended Data Fig. 1 ). This divergence time coincides closely with the divergence from H. pardalinus (212 kya, 95% HPD 201–224 kya) and the timing of introgression from H. melpomene (193 kya, 95% HPD 142–247 kya). These coalescent-based estimates are therefore consistent with H. elevatus being a hybrid lineage that formed through admixture between H. pardalinus and H. melpomene .

Ongoing local gene flow with H. pardalinus

Notably, the whole-genome concatenated phylogenetic network suggests that sympatric populations of H. elevatus and H. pardalinus in the Amazon are more closely related to each other than to allopatric conspecifics from the Peruvian Andes and the Guianas (Fig. 1c ). Only 1.92% (1.50% + 0.42%; Fig. 1d ) of the 11,509 windows across the genome yield reciprocally monophyletic genealogies for H. pardalinus and H. elevatus ; this is confirmed by multispecies coalescent-based trees across the genome (Extended Data Fig. 2 ). We therefore investigated whether this apparent double species paraphyly could be explained by extensive ongoing gene flow between H. elevatus and H. pardalinus in sympatry in the Amazon.

Putative natural hybrids have been reported occasionally between H. elevatus and H. pardalinus 23 , but the two very rarely mate in captivity 17 . However, F 1 hybrids from forced matings are fully fertile 17 . We therefore examined the genomes of wild-caught individuals of the two species from sympatric populations across their range for evidence of gene flow. Focusing on SNPs diagnostic for H. elevatus and H. pardalinus , we find a few individuals with long tracts of heterozygous ancestry, in some cases spanning whole chromosomes, indicating recent gene flow (Extended Data Fig. 3 ). Four-population ( f 4 ) tests comparing within- and between-species gene flow support ongoing interspecific gene flow in sympatry (Fig. 2a ). Estimated levels of effective gene flow between the species in sympatry are high ( Nm  > 1, where N is the effective population size, and m is the migration rate per generation), quite sufficient to homogenize neutral variation between species; indeed, gene flow approaches the rates that are found among nearby populations of the same species (Fig. 2b and Supplementary Table 2 ). Finally, we performed demographic modelling based on the site-frequency spectrum under different demographic scenarios and topologies. The best supported model retrieved a tree in which H. elevatus and H. pardalinus were reciprocally monophyletic, and confirmed that gene flow has been prevalent throughout their combined history (Fig. 2c and Extended Data Fig. 4 ). After their initial split, populations of H. pardalinus in the Amazon diverged from those in the Andes, and Amazonian populations of H. elevatus diverged from those in the Guianas (Fig. 2c and Supplementary Table 3 ). The two species then began to overlap broadly in the Amazon from around 28 kya (95% confidence interval 25.6–30.0 kya) until the present, with high levels of gene flow in sympatry. Nonetheless, sympatric populations of H. elevatus and H. pardalinus in the Amazon form mutually monophyletic genetic clusters (Fig. 1c ); thus, the two species remain differentiated and can clearly coexist despite extensive ongoing gene flow, implying the existence of strong sexual and ecological isolation.

figure 2

a , Gene flow between species in sympatry is typically significantly greater ( f 4  > 0, filled points) than the within-species gene flow between populations across the Amazon basin. Numbers next to the points indicate the population pairs compared (see Fig. 1b ). A significant positive correlation indicates that the within-species gene flow between populations X and Y declines with increasing geographical distance relative to between-species gene flow at populations X and/or Y . b , Estimates of effective migration rate ( Nm ) within and between species using G-PhoCS. In the Amazon, between-species Nm (par Ama /ele Ama ) is similar to within-species Nm between locations (par Ama /par Ama and ele Ama /ele Ama ). The estimates for par Ama /ele Ama are denoted as filled and open circles and correspond to within and between location comparisons, respectively. c , The best supported demographic model based on the site-frequency spectrum analysis supports H. elevatus and H. pardalinus as reciprocally monophyletic (Extended Data Fig. 4 ). They initially diverged with continuous gene flow (933 to 221 kya) and after splitting into Amazonian and non-Amazonian populations, they came into secondary contact and continued exchanging genes until the present (45 kya to the present). Numbers within the blocks are effective population sizes in thousands. Arrows between groups represent continuous gene flow; numbers above or below arrows indicate 2 Nm  values.

Lack of gene flow with H. melpomene

The genome of Heliconius elevatus is, on average, more distantly related to its other parental species H. melpomene than it is to H. pardalinus (Fig. 1d and Extended Data Fig. 5a ). Yet gene flow from H. melpomene is plausible, because the latter is known to hybridize occasionally with other equally distant species in the wild 24 , 25 . None of the 31 H. elevatus or 17 H. melpomene individuals collected from areas of sympatry show any tracts of heterospecific ancestry (Extended Data Fig. 5b ). Likewise, f 4 tests do not detect any signals of gene flow (Supplementary Table 4 ). These data indicate that, in contrast to the extensive ongoing gene flow detected between H. elevatus and H. pardalinus , any recent gene flow between H. elevatus and H. melpomene is extremely rare. Because the H. elevatus and H. melpomene colour pattern phenotypes are essentially identical, this trait is probably not used to discriminate conspecifics 26 . Instead, their coexistence is likely to be due to strong assortative mating mediated by traits such as male sex pheromones and host plants (Extended Data Fig. 6 ), as well as female-limited hybrid sterility, which evolves rapidly 27 , 28 and helps to isolate H. melpomene from other sympatric, co-mimetic species 29 .

Barriers inherited from H. melpomene

As a result of extensive ongoing gene flow, differentiation ( F ST ) between sympatric populations of H. elevatus and H. pardalinus is approximately zero across around 99% of their genomes (Fig. 3 ). Only around 1% of the genome shows increased differentiation ( F ST  ≥ 0.2) and retrieves both species as reciprocally monophyletic on the basis of topology weighting by iterative sampling of subtrees (TWISST) analysis, comprising 44 genomic islands of divergence. Notably, genealogies within genomic islands resolve all populations of both species, including the peripheral allopatric lineages, as reciprocally monophyletic (Fig. 3 ). Furthermore, introgression from H. melpomene is especially prevalent in these islands and is found in 32 of the 44 genomic islands of divergence (Fisher’s exact test P  < 0.001; Fig. 3 ). Because these genomic islands resist homogenization despite gene flow in sympatry, we hypothesize that they contain the genetic basis for species differences.

figure 3

Patterns of genomic divergence between sympatric H. elevatus and H. pardalinus in the Amazon together with locations of mapped traits. The black line and y axis show F ST in 25-kb sliding windows across the genome. Coloured bars show significant QTLs for different traits, with the QTL peak indicated by the triangle and the Bayesian credible intervals by the length of the bar. For colour pattern and wing shape, only QTLs with non-overlapping credible intervals are shown. Most of the genome shows very low F ST due to gene flow in the Amazon, causing the double paraphyly topology for H. elevatus and H. pardalinus in Fig. 1c . Genomic regions with rare phylogenetic topologies (bottom right) supporting introgression from H. melpomene (white circles, introgression tree) and resolving the pardalinus – elevatus species tree (grey circles, species tree) are shown above the plot. These topologies often coincide with one another and with F ST peaks. O, outgroup ( Heliconius ethilla ). FW, forewing; HW, hindwing.

To understand the genetic architecture of traits that allow the coexistence of H. elevatus and H. pardalinus , we crossed sympatric Amazonian populations of these species. We identified quantitative trait loci (QTLs) for several species-specific traits, including colour pattern, male sex pheromones, male preference for female colour pattern, wing shape, flight and female host plant preference, in F 2 and backcross offspring (Fig. 4 ). These traits contribute to reproductive isolation because they are under divergent selection and/or directly determine mate choice. For example, host preference is likely to be under divergent ecological selection and also confers non-random mating because Heliconius mate in the vicinity of their host plants 30 , 31 . In total, we identified 63 QTLs associated with species differences at these traits, which mapped to 14 of the 21 chromosomes (Fig. 3 and Supplementary Table 5 ).

figure 4

a , Heliconius elevatus and H. pardalinus differ in host plant preference during egg-laying; female H. elevatus show a stronger preference for Passiflora venusta relative to Passiflora riparia . Heliconius melpomene has a very distinct host plant preference and lays eggs on neither of these plants (Extended Data Fig. 6 ). Point sizes here and in b , c are scaled by the log of sample size. Error bars are 95% confidence limits. b , The two species differ in flight dynamics; H. elevatus beats its wings significantly faster than H. pardalinus , and converges towards H. melpomene aglaope . c , Given a choice, male H. elevatus individuals preferentially court model female wings with their own colour pattern relative to H. pardalinus , whereas H. pardalinus males exhibit no preference. d , Principal component analysis (PCA) of forewing colour pattern in hybrid crosses with the parental species and H. melpomene aglaope rotated and projected into this space. Wings show the top 10% of pixels contributing to the variance in PC1. e , PCA of male sex pheromones in hybrid crosses, with parental species rotated and projected. Differences between the species are driven mainly by variance in alkanes. Selected loadings: (1) hexahydrofarnesylacetone; (2) ( Z )-9-heneicosene; (3) (Z)-11-eicosenylacetate; (4) ( Z )-9−tricosene; (5) 11-methylhexacosane; (6) 11-methylpentacosane; (7) heptacosane; (8) tricosane; (9) heneicosane (inset figure); and (10) homovanillyl alcohol. f , PCA of hindwing shape in hybrid crosses, with parental species rotated and projected. Inset wing shows changes in hindwing shape observed along PC2. Only landmarks along the margin of the wing are shown. Individual specimens are depicted as circles: H. elevatus , blue; H. pardalinus , red; F 2 s and backcrosses, grey; and H. melpomene , yellow.

QTLs for colour pattern mapped to chromosomes 1, 5, 10, 12, 13, 15 and 18, with those on chromosomes 10, 15 and 18 containing the known colour patterning genes WntA , cortex and optix (refs. 16 , 17 , 18 ). We identified a large effect locus on chromosome 20 that determined variation in hindwing shape ( H. elevatus ancestry is associated with wider and shorter hindwings). Hybrid flight dynamics were quantified using high-frame-rate video footage. A single locus on chromosome 12 predicted wing beat frequency and explained 43% of the variance. Consistent with species differences (Fig. 4b ), individuals with genotype EE (homozygous ancestry for H. elevatus ) beat their wings faster (11.2 ± 0.1 Hz) than did PP (homozygous ancestry for H. pardalinus ) individuals (10.9 ± 0.2 Hz), in which E is the H. elevatus allele and P is the H. pardalinus allele. In controlled insectary experiments, Heliconius elevatus females exhibited a strong preference for Passiflora venusta relative to Passiflora riparia (Fig. 4a ), concordant with wild host plant records 17 . A single locus on chromosome 2 predicted the preference of female hybrids for different host plants (Fig. 3 ); the probability of ovipositing on P. venusta increased from 0.3 (s.e. 0.19–0.42) for genotype PP to 0.87 (s.e. 0.81–0.91) for genotype EE.

Mate choice among sympatric populations is further mediated by female preference for male sex pheromones secreted on wing androconia and male preference for colour pattern (attractiveness of females to males) 13 . We found large effect QTLs for male androconial volatiles on chromosomes 19 and 20. These genomic regions (see Supplementary Table 5 ) contain many genes encoding enzymes that are involved in fatty acid metabolism, such as reductases and Δ9-desaturases 32 —strong candidates for controlling differences between the saturated-fatty-acid-derived androconial volatiles of H. elevatus , and the unsaturated-fatty-acid-derived blend of H. pardalinus . For example, genotype EE at the chromosome 19 locus is associated with an approximately 100-fold increase in the concentration of heneicosane, relative to genotype PP, with the QTL explaining about a third of the variance.

The male colour pattern preference QTL with the highest LOD score (3.14) was tightly linked to QTLs for androconial volatiles and wing shape on chromosome 20. However, this QTL was not statistically significant. This might be explained if male preference is highly polygenic. In support of this, we found that the probability of a male courting the H. elevatus colour pattern is positively correlated with the total fraction of the male’s H. elevatus chromosomal ancestry ( P  < 0.05, generalized linear mixed model with binomial errors and individual-level random effect). For comparison, we applied the same test to host plant choice and found no such association, suggesting that host preference has a simpler genetic basis.

Consistent with hybridization driving speciation, QTLs underpinning species-specific traits are linked to genomic windows introgressed from H. melpomene far more often than when the position of these QTLs is randomized across the genome (mean recombination rate between QTLs and nearest introgression topology, c  = 0.26; randomized mean c  = 0.39; P  < 0.001). QTL peaks that are tightly linked to H. melpomene introgression regions ( c  < 0.05) include those that determine colour pattern mimicry on chromosomes 10, 15 and 18, wing shape on chromosomes 19 and 20, male sex pheromones on chromosome 19 and 20, host plant preference on chromosome 2 and male preference on chromosome 20. Moreover, for colour pattern, wing shape, male sex pheromones and flight behaviour, H. elevatus exhibits trait values similar to those of H. melpomene (Fig. 4 ), providing a direct link between introgression, genotype and phenotype. Hence, these loci influencing ecological traits and derived from introgression represent key genomic regions that enabled hybrid speciation (Fig. 3 ).

Linkage or pleiotropy among traits are often thought to be necessary to circumvent the homogenizing effect of gene flow 5 , 33 . After removing overlapping loci in the colour pattern and wing shape phenotypic classes (Fig. 3 ), 28% of QTLs were tightly linked to at least one other species trait locus (recombination fraction c  < 0.05), and only 11% were completely unlinked ( c  ≈ 0.5). The mean recombination fraction ( c ) between trait loci and their nearest neighbour was significantly lower than when positions of the loci were randomized across the genome (observed mean c  = 0.26; randomized mean c  = 0.37; P  = 0.001). Thus, although QTLs for traits that underpin reproductive isolation are scattered across the genome, there is nonetheless significant clustering among traits. Inversions can be important for maintaining linkage disequilibria between traits that confer reproductive isolation as they suppress recombination 34 . However, with the exception of chromosome 15, in which a known inversion is associated with colour pattern differences between H. elevatus and H. pardalinus 35 , we found no candidate inversions overlapping QTL peaks (Extended Data Fig. 7 ).

Speciation was driven by introgression

The question of how new species originate and adapt to environments is fundamental to evolutionary biology. Hybridization might have a key role in establishing barriers to gene flow by creating new allelic combinations 36 , 37 . Many genomic studies have provided evidence of admixture among species 4 , 38 , 39 , 40 , 41 , but convincing cases of homoploid hybrid speciation remain scarce 1 , 3 , 6 . Here we show that H. elevatus is a hybrid species, the origin of which was triggered by introgression of traits from H. melpomene into a H. pardalinus -like ancestor (Fig. 1a ). These traits place Heliconius elevatus on a separate adaptive peak and allow it to coexist in sympatry with both parental species, despite occasional but pervasive gene flow with H. pardalinus that distorts the evolutionary history of 99% of the genome away from the species tree. Furthermore, we estimate that H. elevatus has persisted in widespread sympatry as a distinct lineage for over 720,000 generations, suggesting that it is stable and not in the process of fusing with H. pardalinus . To our knowledge, this makes it the oldest reported case of homoploid hybrid speciation, and our study is among the few to fulfil the strict criteria for hybrid speciation that were laid out in a previous study 3 . Because H. elevatus overlaps broadly across its Amazonian distribution with both progenitors, it also differs from most other previously described putative hybrid species 42 , 43 , 44 , including Heliconius heurippa 45 , which overlap with only one or neither of the parental lineages. Consistent with models of sympatric speciation 46 , traits conferring mate choice and divergent selection are clustered within the genome. Nonetheless, there are multiple clusters of these species-specific QTLs across different chromosomes. Adaptive coupling among these unlinked loci therefore spreads the effects of selection across the genome, allowing multiple genomic regions to evolve as a coadapted unit 47 , 48 , 49 , 50 , 51 . The capacity of this multilocus genetic architecture to resist gene flow indicates that sympatric speciation can occur more readily than predicted by simple theory based on small numbers of traits or loci.

Data collection and whole-genome resequencing

Collections and library preparation.

Adult butterflies were collected between 2009 and 2018 and stored at −20 °C in either salt-saturated dimethyl sulfoxide or 100% ethanol. RNA-free genomic DNA was extracted from the thorax of butterflies using Qiagen Blood and Tissue and E.Z.N.A Tissue DNA kits (Omega Bio-tek), and used to prepare 350-bp insert Illumina libraries for 33 individuals, which were sequenced using 100–150-bp paired-end sequencing on Illumina instruments. Collecting and export permit numbers are provided in the Acknowledgements. We complemented these samples with previously published sequences (see Supplementary Table 1 for sample details).

Read filtering, mapping and genotype calling

After demultiplexing, reads were filtered for Illumina adapters using cutadapt v.1.8.1 (ref. 52 ) and then mapped to the H. melpomene assembly v.2.5 (Hmel2.5, ref. 53 )(ref. 54 ) using BWA-MEM v.0.7.15 (ref. 55 ) with default parameters and marking short split hits as secondary. Mapped reads were sorted and duplicate reads removed using sambamba v.0.6.8 (ref. 56 ) sort and markdup modules, respectively. Mapped reads were further realigned around indels using the Genome Analysis Toolkit (GATK) v.3.8 RealignerTargetCreator and IndelRealigner modules 57 , 58 , to reduce the number of indel miscalls. Read depth and other relevant read alignment quality control metrics were computed using QualiMap v.2.2.1 (ref. 59 ).

Genotype calling was performed using the bcftools v.1.5 (ref. 60 ) mpileup and call modules, requiring a minimum MQ (mapping quality) and QUAL (base quality) of 20. Genotyping was performed jointly for individuals belonging to the same population using the multiallelic and rare-variant calling option (-m) in bcftools call. Ploidy aware genotype calling was performed for the Z chromosome. Genotypes were filtered using the bcftools filter module. Both invariant and variant sites were required to have QUAL (quality of the variant call) ≥ 20 and MQ (root mean square mapping quality) ≥ 20, with DP (read depth) ≥ 8 for individual genotypes (DP ≥ 4 for females on the Z chromosome) and GQ (genotype quality) ≥ 20. All genotypes not fulfilling these criteria or within 5 bp of an indel (--SnpGap) were recoded as missing data.

Species relationships and demographic modelling of hybrid speciation

Relationships between H. elevatus , H. pardalinus , H. melpomene and other closely related species were investigated by building a phylogenetic network. The dataset was filtered to include only biallelic sites (excluding singletons) without missing data and at least 1 kb apart, using Plink v.1.9 (ref. 61 ). Pairwise absolute genetic distances between all pairs of samples were calculated using the disMat.py script ( https://github.com/simonhmartin/genomics_general ). The distance matrix was then used to construct a phylogenetic network using the NeighbourNet approach 62 , implemented in SplitsTree v.4.15.1 (ref. 63 ), with default parameters.

We also investigated species relationships by estimating a concatenated neighbour-joining tree. In this analysis, we included both variable and invariable sites, at least 1 kb apart and without missing data. The neighbour-joining tree was estimated from individuals’ pairwise distances using the R package ape v.5.7 (ref. 64 ) ‘read.dna’ and ‘nj’ functions. Trees were rooted using the ‘midpoint’ function from the R package phangorn v.2.11.1 (ref. 65 ). Bootstrap supports were obtained on the basis of 100 bootstrap replicates, using the ‘boot.phylo’ function in the R package ape v.5.7 (ref. 64 ).

Genealogical relationships along the genome between the three focal species ( H. elevatus , H. pardalinus and H. melpomene ) were further investigated using TWISST 66 ( https://github.com/simonhmartin/twisst ), and using Heliconius nattereri as an outgroup species. Only SNPs fixed in the outgroup ( H. nattereri ), variable in the focal species and with a minimum allele frequency (MAF) of 0.05 were considered. Statistical phasing and imputation were performed using Beagle v.5.1 (ref. 67 ), with default settings. The phased filtered dataset was used to infer neighbour-joining phylogenies for non-overlapping windows of 1,000 SNPs (median size of around 23.6 kb), assuming the GTR substitution model, in PHYML (ref. 68 ). Exact weightings were computed for all phylogenies. Windows were classified into each of the following categories when weighting support was 0.5 or greater: (i) H. elevatus and H. pardalinus group together but are not reciprocally monophyletic; (ii) H. elevatus and H. pardalinus group together and are reciprocally monophyletic; (iii) H. elevatus and H. melpomene group together but are not reciprocally monophyletic; and (iv) H. elevatus and H. melpomene group together and are reciprocally monophyletic.

To infer the timing of introgression from H. melpomene into H. elevatus and its split from H. pardalinus , we used the multispecies coalescent-with-introgression (MSCi) model implemented in BPP v.4.6.2 (ref. 22 ) (A00 analysis). For each species of the three species, we selected four individuals to generate sequenced alignments. For H. melpomene , we used H. melpomene aglaope from Peru. Given the population structure between Amazonian and non-Amazonian population of both H. elevatus and H. pardalinus and evidence for gene flow between the two species in the Amazon, we first performed this analysis using the non-Amazonian populations (that is, H. elevatus bari and H. pardalinus sergestus ). Loci were selected randomly from autosomes, requiring loci to be 2 kb long, a minimum distance of 20 kb to the next closest locus and 5 kb from the closest exon as annotated in H. melpomene assembly v.2.5. For each locus, individuals with more than 20% missing data and sites containing missing genotype calls were removed. Only loci containing all individuals and 800 bp passing filters were considered. Heterozygous sites were assigned IUPAC ambiguity codes. Demographic parameter estimation was performed using a fixed species tree, with introgression events (see Fig. 1e and Extended Data Fig. 1 ). An inverse gamma prior (invG) was applied both to the root age ( τ 0 ) and to all populations’ effective population sizes ( θ ) – invG( a  = 3, b  = 0.06) and invG( a  = 3, b  = 0.04), respectively. A beta prior was applied to the introgression probability ( j ) – Beta( a  = 1, b  = 1). The MCMC was run for 1,000,000 iterations after 50,000 iterations of burn-in, sampling every 10 iterations.

Historic and recent gene flow

Species-diagnostic snps.

To characterize instances of recent gene flow between Amazonian H. pardalinus and H. elevatus , we relied on ancestry-informative SNPs (allele frequency difference ≥ 0.8) between these two groups. Only ancestry-informative SNPs at least 10 kb apart were considered. For each SNP, an ancestry score of 0 and 1 was assigned for H. elevatus homozygous and H. pardalinus homozygous variants, respectively, and 0.5 for heterozygous. We then calculate each individual’s ancestry (average ancestry across SNPs) and heterozygosity, on the basis of the ancestry-informative SNPs passing the filters. A custom R script was used to visualize genotypes of species-diagnostic SNPs across the genomes of different individuals. The same approach was used to determine species-diagnostic SNPs between Amazonian H. elevatus and H. melpomene .

f 4 statistics

We calculated the f 4 statistics in ADMIXTOOLS (ref. 69 ) to measure shared drift between pairs of populations of different species in the same location versus between pairs of populations of the same species in different locations. Shared drift between populations of different species in the same location is indicative of gene flow between species, and shared drift between populations of the same species in different locations is indicative of grouping by species. Only autosomal biallelic SNPs were considered in this analysis. Standard errors were estimated through a weighted block jackknife approach over 500-kb blocks. We also measured the Euclidean geographic distance between all possible pairs of locations and performed a Mantel test for its correlation with the f 4 statistics.

Estimates of gene flow between population pairs

We used G-PhoCS (ref. 70 ) to estimate divergence times, effective population sizes and migration rates between pairs of populations of H. elevatus and H. pardalinus , both within and between species. In all analyses, we also include one individual from an outgroup species ( Heliconius besckei ) and estimate model parameters assuming possible bidirectional migration between the two ingroup species. G-PhoCS uses multiple independent neutrally evolving loci to infer demographic parameters. Therefore, we first defined regions of the genome within scaffolds larger than 1 Mb and at least 1 kb away from exons as annotated in H. melpomene assembly v.2.5. Within these regions we then selected 1-kb blocks that were at least 10 kb apart from the nearest block and produced sequence alignments, masking annotated repetitive elements and CpG islands identified with the software gCluster (ref. 71 ). Because previous studies have reported extensive introgression between both H. elevatus and H. pardalinus with other Heliconius species in large regions of the genome surrounding the three major colour pattern loci, we excluded blocks in chromosomes containing these loci (chromosomes 10, 15 and 18). We also excluded blocks in the Z chromosome owing to its different effective population size. For each alignment, we excluded individuals with more than 60% missing genotype calls, and only alignments with at least three individuals per population (or all individuals in the populations for those with fewer than three individuals) and a minimum of 100 bp for which no more than 25% of individuals had missing genotype calls were considered. We coded heterozygous genotype calls using IUPAC codes. A gamma prior with α  = 2 and β  = 100 was used for both the mutational-scaled effective population size ( θ ) and the divergence time ( τ ) between the two ingroup populations, whereas a gamma prior with α  = 2 and β  = 50 was used for the divergence time to the outgroup. For the mutation-scaled migration rates, we defined a gamma prior with α  = 0.005 and β  = 0.00001. The model was run three times, with a burn-in of 50,000 iterations (allowing for automatic fine-tuning of the parameters) followed by 200,000 iterations, sampling every 200 iterations. Convergence of the Markov chain and between the three different replicates was inspected using custom scripts. To convert the θ and τ estimates to absolute effective population size and divergence time, we assumed an average mutation rate ( µ ) of 2.9 × 10 −9 substitutions per site per generation and an average generation time ( g ) of 0.25 years (ref. 72 ). We also obtain estimates of the effective migration rate ( N e m ) using the formula: N e m AB  =  M AB × θ B /4.

Simulations to infer robustness of G-PhoCS inferences

Whenever Nm  > 1, estimates of Nm for the same population comparisons varied both in value and directionality between different replicate runs of G-PhoCS. To investigate the cause for these differences, we performed coalescent simulations using MSMS (ref. 73 ). We considered the same demographic scenario as for the G-PhoCS runs; that is, two sister populations (A and B) that diverged at T D1 and split from the outgroup (C) at T D2 , and allowing either unidirectional or bidirectional migration between A and B. The split time between the two sister populations ( T D1 ) was set to four million generations, and eight million generations for the split of the outgroup ( T D2 ). An effective population size ( N e ) of one million or five million was assumed for the two ingroup populations (400,000 for the outgroup), and varying levels of gene flow ( Nm ) were considered (0.01, 0.1, 1.0, 2.0 and 10.0). For each scenario, we simulated 100 trees in MSMS (ref. 73 ), from which we generated sequence alignments using Seq-Gen v.1.3.4 (ref. 74 ). Custom scripts were used to combine pairs of haploid sequences into diploid sequences, using IUPAC codes for heterozygous sites, and to convert the alignments to the G-PhoCS sequence format. Finally, we ran G-PhoCS for the simulated datasets using the same settings as described above. Whenever Nm  > 1 in the simulated datasets, G-PhoCS showed a similar behaviour to what was seen in our analysis of the Heliconius data (Supplementary Table 6 ). We believe that this effect is due to the difficulty of estimating gene flow when the populations are nearly panmictic. Hence, for each population pairwise comparison, the highest Nm estimate among the three replicate runs is presented in Fig. 2b .

Species-tree inference

Phylogenetic relationships between the H. pardalinus and H. elevatus major groups were inferred using the multispecies coalescent (MSC) approach implemented in BPP v.4.6.2 (ref. 22 ), while accounting for incomplete lineage sorting. Three H. p. sergestus individuals (with the highest coverage) and three H. elevatus individuals from the Guianas (the individual with the highest coverage per location (French Guiana, Suriname and Venezuela)) were considered. For Amazonian H. pardalinus and H. elevatus , again, only the individual with the highest coverage from each of three locations—Ecuador, Bolivia and Brazil—was included. For this analysis, loci were selected by first defining regions of the genome within scaffolds larger than 1 Mb. To minimize the effect of linked selection, these regions also had to be at least 2 kb from exons as annotated in Heliconius melpomene v.2.5 (Hmel2.5, ref. 54 ). Because the analysis assumes no intra-locus recombination and independence between loci, we selected loci of 100–250 bp and at least 2 kb from neighbouring loci. Sequence alignments were produced for all loci, masking repetitive elements as annotated in the reference genome and CpG islands identified with the software gCluster (ref. 75 ). For each locus, individuals with more than 50% missing genotype calls were excluded from the alignment and only loci with at least two individuals per population were considered. Furthermore, sites with more than 20% of individuals with missing genotype calls were removed and loci with less than 50 bp passing filters were excluded. Loci were grouped into blocks of 100 loci, and those overlapping the inversion on chromosome 15 were grouped in a separate block. Species-tree estimation was then performed in BPP v.4.6.2 using the A01 analysis (species-tree inference assuming no gene flow). Inverse gamma priors (invGs) were applied both to the root age ( τ 0 ) and to effective population sizes ( θ ) – invG(3, 0.06) and invG(3, 0.04), respectively. Parameters were scaled assuming a mutation rate of 2.9 × 10 −9 substitutions per site per generation and a generation time of 0.25 years (ref. 54 ). The MCMC was run for 1,000,000 iterations after 50,000 iterations of burn-in, sampling every 10 iterations. Three independent runs were performed for each block, using different starting species trees, and only blocks showing consistency among the three independent runs were considered. The most abundant estimated tree across the genome showed both species to be paraphyletic with respect to each other (Extended Data Fig. 2 ). We believe that this non-taxonomic arrangement is due to gene flow, which is not accounted for in the model.

Demographic modelling by analysis of site-frequency spectra

To understand the prevalence of gene flow at different stages of the speciation history of H. elevatus and H. pardalinus , we performed demographic modelling based on analysis of the site-frequency spectrum (SFS) using fastsimcoal2 v.2.7.0.2 (ref. 76 ). For this analysis, we considered all Amazonian and non-Amazonian populations of H. elevatus and H. pardalinus . Individuals with more than 50% missing data were excluded from the analysis and only sites genotyped in at least 80% of the individuals (including all four H. p. sergestus ) were considered. Furthermore, only sites at least 2 kb apart and at least 10 kb from exons were considered, to mitigate the effects of linkage disequilibrium and linked selection, respectively. We further excluded sites within repetitive regions as annotated in the H. melpomene assembly Hmel2.5. The 209,115 sites that were retained after filtering were polarized by assessing the allele present in three outgroup species— H. besckei , Heliconius ismenius telchinia and Heliconius numata robigus . From each of the outgroup species, we chose one individual with the highest coverage and assigned the ancestral allele to each site if it was genotyped and monomorphic in the outgroup species. The unfolded multidimensional site-frequency spectrum (multiSFS) was generated using easySFS ( https://github.com/isaacovercast/easySFS ), using the recommended down projection approach (four individuals of H. p. sergestus ; 10 northeastern group H. elevatus ; and 20 H. pardalinus and 20 H. elevatus individuals from the Amazon) to maximize the number of segregating sites while accounting for missing data. For each demographic model, fitting of the simulated multidimensional site-frequency spectra to the empirical data was maximized using the composite-likelihood method implemented in fastsimcoal v.2.7 (ref. 77 ). For all model parameters, we used wide search ranges from which initial starting parameter values were randomly sampled. For each model, we performed 100 independent fastsimcoal2 runs. Parameter estimates optimization was performed for 40 expectation-maximization cycles and the expected SFS was estimated using 100,000 coalescent simulations. The best fitting model was identified by means of the Akaike information criterion, considering for each model the optimization run with the highest likelihood (using the script https://github.com/speciationgenomics/scripts/blob/master/calculateAIC.sh ). To account for stochasticity in the likelihood approximation, we further compared likelihood distributions of the different models by performing 100 independent runs from parameter values estimated under the most likely replicate run for each model. Finally, for the best fit model, confidence intervals around the maximum likelihood parameter estimates were obtained by nonparametric block-bootstrapping. For this, the 209,115 sites were divided into 100 blocks and sampled with replacement.

Genomic islands of divergence and introgression

Summary statistics.

We calculated between-population differentiation ( F ST ) for Amazonian and non-Amazonian populations of both H. elevatus and H. pardalinus groups, in sliding windows of 25 kb (5 kb step size) along the genome using the script popgenWindows.py ( https://github.com/simonhmartin/genomics_general ). The script implements a version of Hudson’s K ST (ref. 78 ), modified to avoid weighting nucleotide diversity in each population by sample size. Individuals with more than 50% missing data were removed. Only sites with a maximum of two alleles, and with at least three individuals with genotype calls per population (or the total number of individuals in populations with fewer than three individuals) were considered. Only windows with at least 10% of sites passing filters were considered in the analysis.

Topology weighting

To determine genomic regions in which H. elevatus and H. pardalinus are reciprocally monophyletic (that is, genomic regions that are potentially involved in species barriers), genealogical relationships between Amazonian and non-Amazonian populations along the genome were quantified using TWISST 66 ( https://github.com/simonhmartin/twisst ). The same dataset as for F ST was used, but also adding five individuals of representative outgroup species ( H. besckei , H. ismenius , H. numata , H. nattereri and H. ethilla ). Statistical phasing and imputation were performed using Beagle 5.1 (ref. 67 ), with default settings. Only SNPs fixed in all outgroup individuals and variable in the ingroup population with an MAF of 0.05 were considered. The phased filtered dataset was used to infer neighbour-joining phylogenies for windows of 100 SNPs (slide every 25 SNPs), assuming the GTR substitution model, in PHYML (ref. 68 ). Exact weightings were computed for all phylogenies. To estimate the proportion of trees supporting a grouping of individuals by species versus grouping by geography, we considered five groups: (i) H. elevatus from the Guianas (Venezuela, Suriname and French Guiana); (ii) H. elevatus from the Amazon; (iii) H. pardalinus from the Amazon; (iv) H. p. sergestus (Andes); and (v) an outgroup, H. nattereri . Because we hypothesize that introgression from H. melpomene into H. elevatus could be involved in speciation of the latter and H. pardalinus , the same analysis was performed including only Amazonian H. elevatus , Amazonian H. pardalinus and two H. melpomene populations ( H. m. amaryllis and H. m. aglaope ). By including H. ethilla (a sister species to H. elevatus and H. pardalinus ) as a fifth population, we were able to polarize the genealogies, allowing determination of the direction of introgression.

Association between H. melpomene introgression and genomic islands of divergence

To test whether H. melpomene introgression in the genome of H. elevatus is associated with genomic islands of divergence between sympatric H. elevatus and H. pardalinus , we performed a Fisher’s exact test. First, we defined genomic islands of divergence as regions with F ST  ≥ 0.2 and in which TWISST recovered both H. pardalinus and H. elevatus as reciprocally monophyletic (with weight ≥ 0.8). Second, we defined as introgressed, genomic regions in which TWISST grouped H. elevatus with H. melpomene with a weight ≥ 0.8. We then performed a Fisher’s exact test, as implemented in bedtools v.2.30.0 (ref. 79 ), to test whether the two sets of genomic intervals overlap more than expected given the size of the reference genome.

Genetic mapping of traits involved in reproductive isolation

Captive populations of Amazonian H. elevatus pseudocupidineus , H. pardalinus butleri and H. m. agalope were established in outdoor insectaries in Tarapoto, Peru and in heated indoor insectaries in York, UK, as previously described 17 . Crosses for QTL mapping were generated by mating H. elevatus with H. pardalinus to produce F 1 broods, and then by either crossing these amongst themselves to generate F 2 broods or backcrossing to parental taxa.

Colour pattern phenotyping

Dorsal surfaces of wings from 12 H. elevatus , 19 H. pardalinus , 14 H. m. aglaope , 348 F 2 and 50 backcross hybrids were photographed in a light box against a white background using a Canon EOS D1000 together with an X-rite ColorChecker Mini (Supplementary Table 7 ). From each image, we selected a single forewing and hindwing for analysis, clipped the image to the wing outlines and flipped wings when necessary to ensure that all were similarly orientated (resulting in two files; one forewing and one hindwing). To align the wings so that pixels represent homologous units among individuals, we used image registration 80 , a regression-based method that aligns two sets of wings (a source and a reference) according to intensity-based similarity. We chose the reference set of wings using the PCA of wing shape (see below). For forewing (36 PCs) and hindwing (26 PCs) we found the mean value for each PC across all F 2 and backcross individuals. We assigned the reference individual as the individual that had the minimum deviation from these mean values (summed across all PCs). We then checked all alignments by eye. To allow for minor misalignment or damage to wings, we included pixels in which up to 5% of individuals had missing RGB values.

Wing shape was quantified in 31 H. elevatus , 26 H. pardalinus , 10 H. m. aglaope and 308 F 2 and 36 backcross hybrids using landmark-based geometric morphometrics analyses (Supplementary Table 7 ). The ventral side of the butterfly wings was scanned using a flatbed scanner at 300 dpi and landmarks were placed at specific vein intersections 81 on the forewing (20 landmarks) and hindwing (15 landmarks) using tpsDig2 82 . Landmark coordinates were adjusted for size and orientation using a Procrustes analysis from the package geomorph 83 . Forewings and hindwings were analysed separately.

H. elevatus ( n  = 12), H. pardalinus ( n  = 13), H. m. aglaope ( n  = 5) and F 2 s ( n  = 40) were filmed flying freely in a large flight cage (5 × 2.5 × 2 m) using a GoPro HERO 4 Black camera at 239.7 frames per second at a resolution of 720p (Supplementary Table 7 ). Videos were studied in slow motion using GoPro Studio v.2.5.9.2658. Flight sequences in which an individual was flying straight and level for at least five wing beats were selected to measure wing beat frequency (WBF). WBF was measured by counting the number of complete wing beats and the number of video frames. Five WBF measurements were taken per individual from separate flight sequences and used to calculate the individuals’ mean WBF by dividing the total number of wing beats across all flight sequences by the total flight time estimated from the number of video frames.

Female host plant preference

Host plant preference assays for QTL mapping were performed by introducing single H. elevatus , H. pardalinus and F 2 females ( n  = 24, 32 and 31, respectively) into cages measuring 1 m ( W ) × 2 m ( L ) × 1.7 m ( H ), with two approximately equally sized shoots of the host plants ( P. riparia and P. venusta ) placed in the back corners. At the end of each day, the number of eggs laid on each plant species was recorded and the eggs were removed (Supplementary Table 7 ). To compare the oviposition preference of Peruvian H. elevatus , H. pardalinus and H. melpomene , groups of females (wild-caught and/or reared) of a given taxon were released into a large cage (2.5 m ( W ) × 5 m ( L ) × 2 m ( H )) containing single representatives of 21 species of Passiflora that are commonly found near Tarapoto, Peru and which represent potential host plants 17 . The number of eggs laid on each host plant was recorded at the end of each day. A total of 126 females were tested, resulting in a total of 889 eggs (176 from 35 H. elevatus females, 288 from 24 H. melpomene and 425 from 51 H. pardalinus ).

Male sexual preference

To assay male preference for female colour pattern, we presented H. elevatus , H. pardalinus and F 2 males ( n  = 46, 66 and 106, respectively) with a pair of model female wings (one H. elevatus and one H. pardalinus ), and recorded courtship events (full details of the experimental set-up are provided in ref. 17 ). Males were tested individually and placed in the experimental cage one day earlier to allow acclimatization. Trials lasted 15–30 min. The number of courtships (defined as sustained flight 5–15 cm over a model) by the males directed towards each of the model wings was recorded (Supplementary Table 7 ).

Phenotyping of androconial volatiles

Male Heliconius produce complex chemical blends of volatile compounds from their hindwing androconia. These blends have been shown to function as sex pheromones in several other Heliconius species and in butterflies in general 84 , 85 . Androconial regions were excised from 13 H. elevatus , 10 H. pardalinus , 7 H. melpomene malleti individuals and 122 F 2 and 17 backcross hybrids 21 days after eclosion, and suspended in dichloromethane. The extracts were analysed by gas chromatography–mass spectrometry (GC–MS), as reported previously 16 , 86 (Supplementary Table 7 ) on a 7890A GC-System coupled with an MSD 5975C mass analyser (Agilent Technologies) instrument fitted with an HP-5MS column (50 m, 0.25 mm internal diameter, 0.25 µm film thickness). The ionization method was electron impact with a collision energy of 70 eV. Conditions were as follows: inlet pressure 9.79 psi, He 20 ml min −1 , injection volume 1 µl. The GC was programmed as follows: start at 50 °C, increase at 5 °C min −1 to 320 °C and hold that temperature for 5 min. The carrier gas was He at 1.2 ml min −1 . For all identified compounds, the concentration was calculated from the peak’s area, as reported by AMDIS software 87 . Each compound’s chromatogram was interpreted by AMDIS through the NIST databases and the additional databases compiled at the Institute of Organic Chemistry of Technische Universität Braunschweig. All identifiable compounds running between undecane and nonacosanal were scored. Potential contaminants or extraneous compounds were excluded, together with compounds that appeared fewer than 10 times across the entire dataset.

DNA extraction and RAD library preparation for QTL analysis

RNA-free genomic DNA was extracted from thoracic tissue using a Qiagen DNeasy Blood and Tissue Kit following manufacturer’s standard protocol. Restriction-site-associated DNA (RAD) libraries were prepared using a protocol modified from (ref. 88 , using a PstI restriction enzyme, sixteen 6-bp P1 barcodes and eight indexes. DNA was Covaris sheared to 300–700 bp and gel size selected. A total of 128 individuals were sequenced per lane, with 125-bp paired-end reads, on an Illumina HiSeq 2500 (Supplementary Table 8 ).

SNP calling

Fastq files from each RAD library were demultiplexed using process_radtags from Stacks 89 , and BWA-MEM 90 was used with default parameters to map the reads to the H. melpomene assembly v.2.5 (ref. 91 ). BAM files were then sorted and indexed with Samtools (ref. 90 ), and Picard v.1.119 ( https://github.com/broadinstitute/picard ) was used to add read group data and mark PCR duplicates. To check for errors, confirm pedigrees and assign samples with unrecorded pedigree to families, we used Plink v.1.9 (ref. 61 ) to estimate the fraction of the genome that is identical by descent (IBD; \(\widehat{{\boldsymbol{\pi }}}\) ) between all pairwise combinations of samples (siblings and parent-offspring comparisons should yield \(\widehat{{\boldsymbol{\pi }}}\) values close to 0.5). In addition, for specimens that were sequenced multiple times, we checked that samples derived from the same individual \((\widehat{{\boldsymbol{\pi }}}\approx 1)\) . We then merged these samples, using the MergeSamFiles command from Picard Tools, and used Samtools mpileup command to call SNPs.

Linkage map construction

Linkage maps were built for hybrid and within-species crosses using Lep-MAP3 (ref. 92 ). Pedigrees are provided in Supplementary Table 8 . SNPs were first converted to posterior genotype likelihoods for each of ten possible SNP genotypes. We used the ParentCall2 module to correct erroneous or missing parental genotypes and call sex-linked markers using a log-odds difference of >2 (ZLimit) and halfSibs = 1. We used Filtering2 to remove SNPs showing segregation distortion, specifying a P value limit of 0.01; that is, there is a 1:100 chance that a randomly segregating marker is discarded. We then separated markers into chromosomes using their Hmel2.5 scaffold. To obtain genetic distances between markers, we fixed the order of the markers to their order in Hmel2.5, and then evaluated this order, using all markers and specifying no recombination in females. We then used map2gentypes.awk to convert the Lep-MAP3 output to four-way fully informative genotypes with no missing data. To assign ancestry to phased haplotype blocks in the hybrid linkage map, we used biallelic sites with significantly different allele frequencies in the parental species ( χ 2 test applied to sequences for 26 H. elevatus and 47 H. pardalinus individuals from Peru and Ecuador).

QTL mapping

The colour pattern, androconial volatiles and wing shape datasets are multivariate and highly collinear. We therefore used PCA to reduce the phenotypic values for the hybrids to orthogonal vectors (PCs), which we then used as phenotypes in QTL mapping. For wing shape, we applied PCA to the Procrustes coordinates. For the androconial volatiles, we applied the PCA to the set of compounds that were significantly different between the two parental species (one-tailed paired t -test). For colour pattern, we performed a PCA on the concatenated RGB values from the aligned images and retained PCs that explained more than 1% of the variance.

For colour pattern, androconial volatiles, wing shape and WBF, we tested for associations between phenotype and genotype using linear models with normal errors. For wing shape, we included centroid size as a covariate to control for allometry. For female host plant choice and male preference for female colour pattern, we (i) logistically transformed the proportions and used linear models with normal errors; and (ii) used generalized linear mixed models with an individual-level random effect to account for overdispersion and binomial errors. The significance of QTL scans was assessed by permuting the phenotypes relative to the genotypes (1,000 permutations). For traits phenotyped in both males and females, a sex-specific significance threshold was used to avoid spurious sex linkage (see Supplementary Table 5 ).

We first analysed all data using F 2 s only, using R/qtl (ref. 93 ) to estimate genotype probabilities at 1-cM intervals, using the Haldane mapping function and an assumed genotyping error rate of 0.001. These genotype probabilities were then used as the dependent variable in models, and for traits phenotyped in both males and females we included sex and cross direction as covariates for markers on the sex chromosome. For traits for which backcrosses had been scored in addition to F 2 s, we performed an additional round of analyses combining F 2 s with backcrosses. In this case, we used the categorical genotypes (EE, EP and PP) inferred from linkage mapping as the dependent variables, and added random effects for cross type (three levels: F 2 , backcross to H. elevatus , backcross to H. pardalinus ), sex or individual. Model structures and estimated coefficients are provided in Supplementary Table 5 .

To test whether QTLs are significantly clustered (that is, genetically linked), for each QTL we estimated the recombination probability with its nearest neighbouring QTLs (using the position of the maximum LOD score), and took the mean of the resulting vector (low values indicate that most QTLs are linked to at least one other QTL; high values indicate that most QTLs are unlinked). We then randomized the position of the QTLs 10,000 times and compared the observed data to the randomized dataset using a two-tailed test ( P  = the proportion of randomized datasets that give a result more extreme than the observed data × 2). When multiple QTLs overlapped within the phenotypic classes forewing colour pattern, hindwing colour pattern, forewing shape and hindwing shape, we included only the best supported QTL (highest LOD score). To test whether species and introgression topologies are associated with QTLs, we applied the same test.

To identify putative structural rearrangements between H. elevatus and H. pardalinus , we compared recombination rates between F 2 s and within-species crosses (F 2 s, 441 individuals across 26 families; H. elevatus , 179 individuals across 9 families; H. pardalinus , 296 individuals across 15 families). Regions that are freely recombining within species but not in F 2 s represent candidate rearrangements that might facilitate divergence and speciation. The probability of the within-species recombination events observed within an F 2 breakpoint can be given as p n , where p is the fraction of parental individuals in the mapping crosses and n is the observed number of recombination events. We estimated p n within each F 2 breakpoint and considered breakpoints in which p  < 0.01 to be candidate rearrangements.

Reporting summary

Further information on research design is available in the  Nature Portfolio Reporting Summary linked to this article.

Data availability

Newly generated whole-genome sequencing data used in the population genomic analyses and RAD-sequencing data used in the cross analyses have been uploaded to the NCBI Sequence Read Archive (SRA) ( PRJNA1074694 ). NCBI SRA accessions for individual samples are listed in Supplementary Tables 1 and 8 . Phenotypic data are available in Supplementary Table 7 and at https://doi.org/10.5281/zenodo.10685466 (ref. 94 ) and https://doi.org/10.5281/zenodo.10689714 (ref. 95 ).

Code availability

Custom code used for the genomic analyses ( https://github.com/FernandoSeixas/HeliconiusHybridSpeciation ) and the QTL mapping ( https://github.com/heliconius-maps/HeliconiusHybridSpeciation ) is available from GitHub.

Lamichhaney, S. et al. Rapid hybrid speciation in Darwin’s finches. Science 359 , 224–228 (2018).

Article   ADS   CAS   PubMed   Google Scholar  

Abbott, R. et al. Hybridization and speciation. J. Evol. Biol. 26 , 229–246 (2013).

Article   CAS   PubMed   Google Scholar  

Schumer, M., Rosenthal, G. G. & Andolfatto, P. How common is homoploid hybrid speciation? Evolution 68 , 1553–1560 (2014).

Article   PubMed   Google Scholar  

Lamichhaney, S. et al. Evolution of Darwin’s finches and their beaks revealed by genome sequencing. Nature 518 , 371–375 (2015).

Coyne, J. A. & Orr, H. A. Speciation (Sinauer Associates, 2004).

Olave, M., Nater, A., Kautt, A. F. & Meyer, A. Early stages of sympatric homoploid hybrid speciation in crater lake cichlid fishes. Nat. Commun. 13 , 5893 (2022).

Article   ADS   CAS   PubMed   PubMed Central   Google Scholar  

Barker, M. S., Arrigo, N., Baniaga, A. E., Li, Z. & Levin, D. A. On the relative abundance of autopolyploids and allopolyploids. New Phytol. 210 , 391–398 (2016).

Mallet, J. Hybrid speciation. Nature 446 , 279–283 (2007).

Engler-Chaouat, H. S. & Gilbert, L. E. De novo synthesis vs. sequestration: negatively correlated metabolic traits and the evolution of host plant specialization in cyanogenic butterflies. J. Chem. Ecol. 33 , 25–42 (2007).

Engler, H. S., Spencer, K. C. & Gilbert, L. E. Preventing cyanide release from leaves. Nature 406 , 144–145 (2000).

Joron, M. & Mallet, J. Diversity in mimicry: paradox or paradigm? Trends Ecol. Evol. 13 , 461–466 (1998).

Page, E., Queste, L., Rosser, N., Mallet, J. & Dasmahapatra, K. K. Pervasive mimicry in flight behavior among aposematic butterflies. Proc. Natl Acad. Sci. USA 121 , e2300886121 (2024).

Jones, R. T. et al. Wing shape variation associated with mimicry in butterflies. Evolution 67 , 2323–2334 (2013).

Merrill, R. M. et al. Disruptive ecological selection on a mating cue. Proc. R. Soc. B 279 , 4907–4913 (2012).

Article   PubMed   PubMed Central   Google Scholar  

Arias, M. et al. Crossing fitness valleys: empirical estimation of a fitness landscape associated with polymorphic mimicry. Proc. R. Soc. B 283 , 20160391 (2016).

Cama, B. et al. Exploitation of an ancestral pheromone biosynthetic pathway contributes to diversification in Heliconius butterflies. Proc. R. Soc. B 289 , 20220474 (2022).

Article   CAS   PubMed   PubMed Central   Google Scholar  

Rosser, N. et al. Geographic contrasts between pre- and postzygotic barriers are consistent with reinforcement in Heliconius butterflies. Evolution 73 , 1821–1838 (2019).

Benson, W. W., Brown, K. S. & Gilbert, L. E. Coevolution of plants and herbivores: passion flower butterflies. Evolution 29 , 659–680 (1975).

Kozak, K. M. et al. Multilocus species trees show the recent adaptive radiation of the mimetic Heliconius butterflies. Syst. Biol. 64 , 505–524 (2015).

Heliconius Genome Consortium. Butterfly genome reveals promiscuous exchange of mimicry adaptations among species. Nature 487 , 94–98 (2012).

Article   ADS   Google Scholar  

Thawornwattana, Y., Seixas, F. A., Yang, Z. & Mallet, J. Major patterns in the introgression history of Heliconius butterflies. eLife 12 , RP90656 (2023).

Flouri, T., Jiao, X., Rannala, B. & Yang, Z. A Bayesian implementation of the multispecies coalescent model with introgression for phylogenomic analysis. Mol. Biol. Evol. 37 , 1211–1223 (2020).

Brower, A. V. Z. Alternative facts: a reconsideration of putatively natural interspecific hybrid specimens in the genus Heliconius (Lepidoptera: Nymphalidae). Zootaxa 4499 , 1–87 (2018).

Dasmahapatra, K. K., Silva-Vásquez, A., Chung, J.-W. & Mallet, J. Genetic analysis of a wild-caught hybrid between non-sister Heliconius butterfly species. Biol. Lett. 3 , 660–663 (2007).

Mallet, J., Beltrán, M., Neukirchen, W. & Linares, M. Natural hybridization in heliconiine butterflies: the species boundary as a continuum. BMC Evol. Biol. 7 , 28 (2007).

González-Rojas, M. F. et al. Chemical signals act as the main reproductive barrier between sister and mimetic Heliconius butterflies. Proc. R. Soc. B 287 , 20200587 (2020).

Rosser, N. et al. Complex basis of hybrid female sterility and Haldane’s rule in Heliconius butterflies: Z-linkage and epistasis. Mol. Ecol. 31 , 959–977 (2022).

Jiggins, C. D. et al. Sex-linked hybrid sterility in a butterfly. Evolution 55 , 1631–1638 (2001).

CAS   PubMed   Google Scholar  

Sánchez, A. P. et al. An introgressed wing pattern acts as a mating cue. Evolution 69 , 1619–1629 (2015).

Merrill, R. M., Naisbit, R. E., Mallet, J. & Jiggins, C. D. Ecological and genetic factors influencing the transition between host-use strategies in sympatric Heliconius butterflies. J. Evol. Biol. 26 , 1959–1967 (2013).

Estrada, C. & Gilbert, L. E. Host plants and immatures as mate-searching cues in Heliconius butterflies. Anim. Behav. 80 , 231–239 (2010).

Article   Google Scholar  

Byers, K. J. R. P. et al. Clustering of loci controlling species differences in male chemical bouquets of sympatric Heliconius butterflies. Ecology and Evolution 11 , 89–107 (2021).

Felsenstein, J. Skepticism towards Santa Rosalia, or why are there so few kinds of animals? Evolution 35 , 124–138 (1981).

Rieseberg, L. H. Chromosomal rearrangements and speciation. Trends Ecol. Evol. 16 , 351–358 (2001).

Jay, P. et al. Supergene evolution triggered by the introgression of a chromosomal inversion. Curr. Biol. 28 , 1839–1845 (2018).

Marques, D. A., Meier, J. I. & Seehausen, O. A Combinatorial view on speciation and adaptive radiation. Trends Ecol. Evol. 34 , 531–544 (2019).

Hench, K., Helmkampf, M., McMillan, W. O. & Puebla, O. Rapid radiation in a highly diverse marine environment. Proc. Natl Acad. Sci. USA 119 , e2020457119 (2022).

Green, R. E. et al. A draft sequence of the Neandertal genome. Science 328 , 710–722 (2010).

Palkopoulou, E. et al. A comprehensive genomic history of extinct and living elephants. Proc. Natl Acad. Sci. USA 115 , E2566–E2574 (2018).

Li, G., Figueiró, H. V., Eizirik, E. & Murphy, W. J. Recombination-aware phylogenomics reveals the structured genomic landscape of hybridizing cat species. Mol. Biol. Evol. 36 , 2111–2126 (2019).

Suvorov, A. et al. Widespread introgression across a phylogeny of 155 Drosophila genomes. Curr. Biol. 32 , 111–123.e5 (2022).

Barrera-Guzmán, A. O., Aleixo, A., Shawkey, M. D. & Weir, J. T. Hybrid speciation leads to novel male secondary sexual ornamentation of an Amazonian bird. Proc. Natl Acad. Sci. USA 115 , E218–E225 (2018).

Article   ADS   PubMed   Google Scholar  

Hermansen, J. S. et al. Hybrid speciation in sparrows I: phenotypic intermediacy, genetic admixture and barriers to gene flow. Mol. Ecol. 20 , 3812–3822 (2011).

Nieto Feliner, G. et al. Is homoploid hybrid speciation that rare? An empiricist’s view. Heredity 118 , 513–516 (2017).

Mavárez, J. et al. Speciation by hybridization in Heliconius butterflies. Nature 441 , 868–871 (2006).

Gavrilets, S. Fitness Landscapes and the Origin of Species (Princeton Univ. Press, 2004).

Butlin, R. K. & Smadja, C. M. Coupling, reinforcement, and speciation. Am. Nat. 191 , 155–172 (2018).

Barton, N. H. Multilocus clines. Evolution 37 , 454–471 (1983).

Flaxman, S. M., Wacholder, A. C., Feder, J. L. & Nosil, P. Theoretical models of the influence of genomic architecture on the dynamics of speciation. Mol. Ecol. 23 , 4074–4088 (2014).

Kautt, A. F. et al. Contrasting signatures of genomic divergence during sympatric speciation. Nature 588 , 106–111 (2020).

Wessinger, C. A. et al. A few essential genetic loci distinguish Penstemon species with flowers adapted to pollination by bees or hummingbirds. PLoS Biol. 21 , e3002294 (2023).

Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17 , 10 (2011).

Davey, J. W. et al. No evidence for maintenance of a sympatric Heliconius species barrier by chromosomal inversions. Evolution Letters 1 , 138–154 (2017).

Edelman, N. B. et al. Genomic architecture and introgression shape a butterfly radiation. Science 366 , 594–599 (2019).

Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at 10.48550/arXiv.1303.3997 (2013).

Tarasov, A., Vilella, A. J., Cuppen, E., Nijman, I. J. & Prins, P. Sambamba: fast processing of NGS alignment formats. Bioinformatics 31 , 2032–2034 (2015).

McKenna, A. et al. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20 , 1297–1303 (2010).

DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43 , 491–498 (2011).

Okonechnikov, K., Conesa, A. & García-Alcalde, F. Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics 32 , 292–294 (2016).

Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27 , 2987–2993 (2011).

Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81 , 559–575 (2007).

Bryant, D. Neighbor-Net: an agglomerative method for the construction of phylogenetic networks. Mol. Biol. Evol. 21 , 255–265 (2003).

Huson, D. H. & Bryant, D. Application of phylogenetic networks in evolutionary studies. Mol. Biol. Evol. 23 , 254–267 (2006).

Paradis, E. & Schliep, K. ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35 , 526–528 (2019).

Schliep, K., Potts, A. J., Morrison, D. A. & Grimm, G. W. Intertwining phylogenetic trees and networks. Methods Ecol. Evol. 8 , 1212–1220 (2017).

Martin, S. H. & Van Belleghem, S. M. Exploring evolutionary relationships across the genome using topology weighting. Genetics 206 , 429–438 (2017).

Browning, S. R. & Browning, B. L. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am. J. Hum. Genet. 81 , 1084–1097 (2007).

Guindon, S. et al. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59 , 307–321 (2010).

Patterson, N. et al. Ancient admixture in human history. Genetics 192 , 1065–1093 (2012).

Gronau, I., Hubisz, M. J., Gulko, B., Danko, C. G. & Siepel, A. Bayesian inference of ancient human demography from individual genome sequences. Nat. Genet. 43 , 1031–1034 (2011).

Gómez-Martín, C., Lebrón, R., Oliver, J. L. & Hackenberg, M. Prediction of CpG islands as an intrinsic clustering property found in many eukaryotic DNA sequences and its relation to DNA methylation. Methods Mol. Biol. 1766 , 31–47 (2018).

Keightley, P. D. et al. Estimation of the spontaneous mutation rate in Heliconius melpomene . Mol. Biol. Evol. 32 , 239–243 (2015).

Ewing, G. & Hermisson, J. MSMS: a coalescent simulation program including recombination, demographic structure and selection at a single locus. Bioinformatics 26 , 2064–2065 (2010).

Rambaut, A. & Grass, N. C. Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Bioinformatics 13 , 235–238 (1997).

Article   CAS   Google Scholar  

Li, X., Chen, F. & Chen, Y. Gcluster: a simple-to-use tool for visualizing and comparing genome contexts for numerous genomes. Bioinformatics 36 , 3871–3873 (2020).

Excoffier, L. et al. fastsimcoal2: demographic inference under complex evolutionary scenarios. Bioinformatics 37 , 4882–4885 (2021).

Excoffier, L., Dupanloup, I., Huerta-Sánchez, E., Sousa, V. C. & Foll, M. Robust demographic inference from genomic and SNP data. PLoS Genet. 9 , e1003905 (2013).

Hudson, R. R., Boos, D. D. & Kaplan, N. L. A statistical test for detecting geographic subdivision. Mol. Biol. Evol. 9 , 138–151 (1992).

Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26 , 841–842 (2010).

Clayden, J., Modat, M., Presles, B., Anthopoulos, T. & Daga, P. RNiftyReg: Image registration using the ‘NiftyReg’ library. R version 2.8.1 https://cran.r-project.org/web/packages/RNiftyReg (2023).

Queste, L. M. The Evolution of Flight and Wing Shape in Heliconius Butterflies PhD thesis, Univ. York (2020).

Rohlf, F. J. tpsDig v.2.05 (State University of New York at Stony Brook, 2006). https://www.sbmorphometrics.org/soft-dataacq.html .

Adams, D. C., Collyer, M. & Kaliontzopoulou, A. Geomorph: Geometric morphometric analyses of 2D and 3D landmark data. R version 3.1.0 https://cran.r-project.org/web/packages/geomorph (2019).

Mérot, C., Frérot, B., Leppik, E. & Joron, M. Beyond magic traits: multimodal mating cues in Heliconius butterflies. Evolution 69 , 2891–2904 (2015).

Darragh, K. et al. Male sex pheromone components in Heliconius butterflies released by the androconia affect female choice. PeerJ 5 , e3953 (2017).

Ehlers, S., Blow, R., Szczerbowski, D., Jiggins, C. & Schulz, S. Variation of clasper scent gland composition of Heliconius butterflies from a biodiversity hotspot. ChemBioChem 24 , e202300537 (2023).

Stein, S. E. An integrated method for spectrum extraction and compound identification from gas chromatography/mass spectrometry data. J. Am. Soc. Mass. Spectrom. 10 , 770–781 (1999).

Etter, P. D., Preston, J. L., Bassham, S., Cresko, W. A., Johnson, E.A. Local de novo assembly of RAD paired-end contigs using short sequencing reads. PLoS ONE 6 , e18561 (2011).

Catchen, J., Hohenlohe, P. A., Bassham, S., Amores, A. & Cresko, W. A. Stacks: an analysis tool set for population genomics. Mol. Ecol. 22 , 3124–3140 (2013).

Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25 , 1754–1760 (2009).

Davey, J. W. et al. Major improvements to the Heliconius melpomene genome assembly used to confirm 10 chromosome fusion events in 6 million years of butterfly evolution. G3 6 , 695–708 (2016).

Rastas, P. Lep-MAP3: robust linkage mapping even for low-coverage whole genome sequencing data. Bioinformatics 33 , 3726–3732 (2017).

Broman, K. W., Wu, H., Sen, Ś. & Churchill, G. A. R/qtl: QTL mapping in experimental crosses. Bioinformatics 19 , 889–890 (2003).

Rosser, N. Image data for Rosser et al. 2024 Hybrid speciation driven by multilocus introgression of ecological traits. Zenodo 10.5281/zenodo.10685466 (2024).

Rosser, N. GCMS data for Rosser et al. 2024 Hybrid speciation driven by multilocus introgression of ecological traits. Zenodo 10.5281/zenodo.10689714 (2024).

Download references

Acknowledgements

This work was funded by the NERC grant NE/K012886/1 to K.K.D.; a National Geographic Waitt grant (W400-15) to N.R.; grant Schu 984/12-1 from the DFG to S.S.; and Harvard University. K.K. was supported by a fellowship from the Smithsonian Institution. A.V.L.F. acknowledges support from FAPESP (2021/03868-8), from the Brazilian Research Council–CNPq (304291/2020-0) and from the USAID–US National Academy of Sciences (NAS) (AID-OAA-A-11-00012). The University of York Viking cluster high-performance computing facility was used for some of the analyses. We thank SERFOR, the Peruvian Ministry of Agriculture and the Área de Conservación Regional Cordillera Escalera (0289-2014-MINAGRI-DGFFS/DGEFFS, 020-014/GRSM/PEHCBM/DMA/ACR-CE and 040–2015/GRSM/PEHCBM/DMA/ACR-CE) for collecting permits; the Ministerio del Ambiente and Museo Ecuatoriano de Ciencias Naturales in Ecuador (005-IC-FAU-DNBAPVS/MA) for collecting permits; the ICMBio for permits (52562-3 and 10438-1); and the Conselho Nacional de Desenvolvimento Científico e Tecnológico–CNPq for approving our scientific expedition (Expediente PR no. 01300.000477/2016-49, portaria no. 4.628). This study is registered at the Brazilian SISGEN (A752FC2). Field collections in Colombia were conducted under permit no. 530 issued by the Autoridad Nacional de Licencias Ambientales of Colombia (ANLA). We thank J. Caldwell, M. Chouteau, C. Córdova, N. Edelman, S. Galluser, C. López, M. McClure, C. Pérez, C. Segami and M. Tuanama in Peru, and R. Aldaz, A. Toporov and K. Willmott in Ecuador, for help and support with fieldwork; C. Thomas, W. Valencia-Montoya, A. Kautt and J. Coughlan for comments; and M. Cast for providing some of the butterfly images used in figures.

Author information

These authors jointly supervised this work: Neil Rosser, Fernando Seixas

Authors and Affiliations

Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA

Neil Rosser, Fernando Seixas, Dmytro Kryvokhyzha & James Mallet

Department of Biology, University of York, York, UK

Neil Rosser, Lucie M. Queste, Bruna Cama, Michaela Nelson, Rachel Waite-Hudson, Matt Goringe & Kanchon K. Dasmahapatra

URKU Estudios Amazónicos, Tarapoto, Perú

Ronald Mori-Pezo

Universidad Nacional Autónoma de Alto Amazona, Yurimaguas, Perú

Department of Clinical Sciences, Lund University Diabetes Centre, Malmö, Sweden

Dmytro Kryvokhyzha

Residencial Las Cumbres, Caracas, Venezuela

Mauro Costa

Institut Systématique, Evolution, Biodiversité, UMR 7205 MNHN-CNRS-EPHE-UPMC Sorbonne Universités, Muséum National d’Histoire Naturelle, Paris, France

Marianne Elias

Smithsonian Tropical Research Institute, Panama City, Panama

Marianne Elias, Krzysztof Kozak & W. Owen McMillan

Institute for Biological Sciences, Federal University of Pará (UFPA), Belém, Brazil

Clarisse Mendes Eleres de Figueiredo & Jonathan Ready

Centre for Advanced Studies of Biodiversity (CEABIO), Belém, Brazil

Departamento de Biologia Animal and Museu de Diversidade Biológica, Instituto de Biologia, Universidade Estadual de Campinas, São Paulo, Brazil

André Victor Lucci Freitas & Leila T. Shirai

Centre d’Ecologie Fonctionnelle et Evolutive, UMR 5175 CNRS, Université de Montpellier–Université Paul Valéry Montpellier–EPHE, Montpellier, France

Mathieu Joron

Museo de Historia Natural, Universidad Nacional Mayor de San Marcos, Lima, Peru

Gerardo Lamas

Redpath Museum, McGill University, Montreal, Quebec, Canada

Ananda R. P. Martins

Biology Program, Faculty of Natural Sciences, Universidad del Rosario, Bogotá, Colombia

Nicol Rueda-Muñoz & Camilo Salazar

Ecology and Evolutionary Biology, School of Biosciences, University of Sheffield, Sheffield, UK

Patricio Salazar

Institut für Organische Chemie, Technische Universität Braunschweig, Braunschweig, Germany

Stefan Schulz

Leibniz Institute for the Analysis of Biodiversity Change, Museum de Natur Hamburg Zoology, Hamburg, Germany

Karina L. Silva-Brandão

Leverhulme Centre for Anthropocene Biodiversity, Department of Biology, University of York, York, UK

Kanchon K. Dasmahapatra

You can also search for this author in PubMed   Google Scholar

Contributions

N.R., F.S., J.M. and K.K.D designed the study. N.R. was responsible for fieldwork, including designing and conducting crosses, phenotyping of traits and performing QTL, linkage and recombination-rate analyses. F.S. performed the population genomic analyses (with D.K., N.R., J.M. and K.K.D.) and the demographic and coalescent analyses. L.M.Q. contributed to phenotyping host plant preference, colour pattern preference, wing shape, colour pattern (with B.C. and R.W.-H.) and flight (with M.G.). B.C. and S.S. led the analysis of androconial volatiles. R.M.-P. provided fieldwork assistance in Peru. M.N. constructed the RAD libraries. All other authors contributed to sample collection. N.R., F.S., J.M. and K.K.D wrote and finalized the paper with contributions from all authors.

Corresponding authors

Correspondence to Neil Rosser or James Mallet .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Peer review

Peer review information.

Nature thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended data fig. 1 evaluating the hybrid speciation hypothesis under the msci model..

For each model, a schematic representation is depicted on top and the estimated values under the MSCi are presented in the table below. In the schematics, open circles denote internal nodes and arrows between internal nodes represent single migration pulses. Effective population sizes (N e ) are scaled to thousand individuals. The age ( τ ) of splits and nodes involved in hybridization events are given in kya. Introgression probabilities ( φ ) are depicted in blue and are given as a percentage. a , Model in which the two parental species, S (ancestral of H. melpomene ) and T (ancestral of H. pardalinus ), hybridize to originate the hybrid lineage H (ancestral of H. elevatus ), with φ contribution from parent S and 1- φ from parent T. The nodes S and T may have distinct ages and are older than node H. b , Same as model a , but constraining introgression from S into H to occur after an initial split between H. elevatus and H. pardalinus ( τ H   <   τ T ). Introgression is instantaneous and occurs at time τ H (which is the same as τ S ). c , Model allowing bidirectional migration between an H. melpomene ancestor (X) into the ancestral population of H. elevatus and H. pardalinus (Y), between an H. melpomene ancestor (S) and the lineage leading to H. elevatus (H), and between the lineage leading to H. elevatus (E) and the lineage leading to H. pardalinus (P). Note that in all models, the 95% HPD intervals of the age of gene flow from H. melpomene into H. elevatus and the split between H. elevatus with H. pardalinus overlap, in line with H. elevatus being a hybrid lineage. These are highlighted in red.

Extended Data Fig. 2 MSC analysis of H. elevatus and H. pardalinus .

Species-tree phylogeny along chromosomes calculated in blocks of 100 loci using BPP (ref. 22 ). Only the five major topologies are depicted (minor topologies are coloured in grey).

Extended Data Fig. 3 Species-diagnostic SNPs.

a , Number of species-diagnostic SNPs per chromosome. Species-diagnostic SNPs were defined as SNPs with an allelic difference of at least 0.8 between all Amazonian populations of H. elevatus and H. pardalinus . Chromosomes with at least 20 diagnostic SNPs are denoted with an asterisk (*) and shown in more detail in c . b , Triangular plot of hybrid index and observed heterozygosity, based on the 1,156 species-diagnostic SNPs, shows no evidence of early generation hybrids. c , Distribution of species-diagnostic SNPs along chromosomes in wild-caught H. elevatus and H. pardalinus . The physical location of SNPs along chromosomes (in Mb) are shown on top. Different blocks of SNPs within a chromosome, defined as groups of SNPs more than 500 kb apart, are denoted in alternating colours (black and grey). For visualization purposes, only chromosomes with at least 20 diagnostic SNPs are shown and SNP blocks were subsampled to show only one in every two SNPs. Long tracts of heterozygous genotypes (e.g. chromosome 19) suggest relatively recent hybridization followed by backcrossing.

Extended Data Fig. 4 Schematic of all demographic models tested with fastsimcoal2.

Two different tree topologies and 12 models per topology were tested. We considered the topology that retrieves both H. elevatus and H. pardalinus as monophyletic; that is, the species tree, (topology 1) and the most frequent topology across the genome (Extended Data Fig. 2 ), after excluding gene flow between H. elevatus and Amazonian H. pardalinus and in which H. p. sergestus is the first population to split (topology 2). The different demographic models are split into five main categories (depicted in different boxes): SI, strict isolation; AM, ancestral migration; SC, secondary contact; AM-SC, ancestral migration followed by secondary contact; IM, isolation with migration. Arrows between demes indicate gene flow (each direction being estimated as an independent parameter). Effective population sizes were allowed to change at split times. Note that for models under tree topology 1, the split times between H. elevatus populations and between H. pardalinus populations are different parameters and thus can assume different values.

Extended Data Fig. 5 Genetic evidence for current reproductive isolation between H. elevatus and H. melpomene .

a , Neighbour-joining tree based on autosomal sites sampled every 1 kb (166,989 sites). Values next to branches denote bootstrap values (based on 100 bootstrap iterations). Images of butterfly wings are copyright of the authors and Michel Cast. b , Distribution of species-diagnostic SNPs along chromosomes in wild-caught H. elevatus and H. melpomene . Species-diagnostic SNPs were defined as SNPs with an allelic difference of at least 0.8 between all Amazonian populations of H. elevatus and all H. melpomene populations. The physical location of SNPs along chromosomes (in Mb) are shown on top. For visualization purposes, SNP blocks were subsampled to show only 1 in every 20 SNPs. The lack of long tracts of heterozygous genotypes (or introgressed homozygous genotypes) suggests that there is no recent hybridization, followed by backcrossing, between these two species.

Extended Data Fig. 6 PCAs of male sex pheromones and host plant use show that H. elevatus , H. pardalinus and H. melpomene from the western Amazon form three distinct clusters in trait space.

a , PCA applied to concentrations of 30 male androconial volatiles. Loadings for selected compounds are annotated. b , PCA applied to oviposition preference of H. elevatus , H. pardalinus and H. melpomene for 21 species of Passiflora . Heliconius melpomene (24 females) laid 288 eggs and exhibited a strong preference for P. menispermifolia and P. triloba . Heliconius elevatus (35 females) laid 173 eggs and exhibited a preference for P. kaipiriensis . H. pardalinus butleri (51 females) laid 425 eggs and had a more generalized host plant use. To estimate the sample variance for each species, subsamples of 30 were drawn with replacement from the distribution of each species (1,000 replicates). PCA was then run on these bootstrapped replicates, polygons are minimum convex hulls encompassing all subsamples for each species. Images of butterfly wings are copyright of the authors and Michel Cast.

Extended Data Fig. 7 F ST and genetic distances plotted against physical distance.

Physical distance is shown on the x axis; grey intervals are 1 Mb and black intervals are 5 Mb. Coloured bars show significant QTLs, with the QTL peak indicated by the triangle and the Bayesian credible intervals indicated by the length of the bar. Genetic distances are estimated using three crosses—within population ( Heliconius elevatus ; elev and Heliconius pardalinus ; pard) and between population (F 2 ). Candidate inversions (CIs; indicated by black arrows) are regions that recombine within species but not in hybrids (see  Methods ). The largest CI we identified was around 1.4 Mb long at the distal end of chromosome 16. However, we identified no CIs greater than 870 kb within the credible intervals of QTLs, and only one instance of a CI that was coincident with a QTL peak (on chromosome 15). Nonetheless, some CIs outside of QTLs present compelling targets for future investigation. Notably, at the proximal end of chromosome 19 and the distal end of chromosome 16, two large CIs overlap regions with elevated F ST in which phylogenies resolve species boundaries (see Fig. 3 ).

Supplementary information

Supplementary information.

A full guide for Supplementary Tables 1–8 (Tables supplied separately).

Reporting Summary

Supplementary tables.

Supplementary Tables 1–8.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

Reprints and permissions

About this article

Cite this article.

Rosser, N., Seixas, F., Queste, L.M. et al. Hybrid speciation driven by multilocus introgression of ecological traits. Nature 628 , 811–817 (2024). https://doi.org/10.1038/s41586-024-07263-w

Download citation

Received : 26 September 2023

Accepted : 01 March 2024

Published : 17 April 2024

Issue Date : 25 April 2024

DOI : https://doi.org/10.1038/s41586-024-07263-w

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

This article is cited by

Surprise hybrid origins of a butterfly species.

  • Megan E. Frayer
  • Jenn M. Coughlan

Nature (2024)

By submitting a comment you agree to abide by our Terms and Community Guidelines . If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Quick links

  • Explore articles by subject
  • Guide to authors
  • Editorial policies

Sign up for the Nature Briefing newsletter — what matters in science, free to your inbox daily.

ecological analysis of case study

Estimating turbidity concentrations in highly dynamic rivers using Sentinel-2 imagery in Google Earth Engine: Case study of the Godavari River, India

  • Research Article
  • Published: 01 May 2024

Cite this article

ecological analysis of case study

  • Meena Kumari Kolli 1 , 2 &
  • Pennan Chinnasamy   ORCID: orcid.org/0000-0002-3184-2134 1 , 2 , 3 , 4 , 5 , 6  

Turbidity is an essential biogeochemical parameter for water quality management because it shapes the physical landscape and regulates ecological systems. It varies spatially and temporally across large water bodies, but monitoring based on point-source field observations remains a difficult task in developing countries due to the need for logistics and costs. In this study, we present a novel semi-analytical approach for estimating turbidity from remote sensing reflectance \(({R}_{{\text{rs}}})\) in moderate to highly turbid waters in the lower part of the Godavari River (i.e., locations near Rajahmundry). The proposed method includes two sub-algorithms—Normalized Difference Turbidity Index (NDTI) and semi-empirical single-band turbidity ( \({T}_{{\text{s}}}\) ) algorithm—to retrieve spectral reflectance information corresponding to the study locations for turbidity modeling. Sentinel-2 Multi-Spectral Imager data have been used to quantify the turbidity in the Google Earth Engine (GEE) platform. The correlation analysis was observed between spectral reflectance values and in situ turbidity data using cubic polynomial regression equations. The results indicated that the \({T}_{{\text{s}}}\) , which uses the only red-edge wavelength, identified turbidity as the most accurate across all locations (highest R 2  = 0.91, lowest RMSE = 0.003), followed by NDTI (highest R 2  = 0.85, lowest RMSE = 0.05), respectively. The remote sensing data application provides a better way to monitor turbidity at large spatio-temporal scales in attaining the water quality standards of the Godavari River.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price includes VAT (Russian Federation)

Instant access to the full article PDF.

Rent this article via DeepDyve

Institutional subscriptions

ecological analysis of case study

Data availability

Sources of all the data have been described properly. Derived data supporting the findings of this study are available based on the request to the corresponding author.

Ahmed U, Mumtaz R, Anwar H, Shah AA, Irfan R, García-Nieto J (2019) Efficient water quality prediction using supervised machine learning. Water 11:2210

Article   CAS   Google Scholar  

Alka S, Sushma P, Singh TS, Patel JG, Tanwar H (2014) Wetland information system using remote sensing and GIS in Himachal Pradesh, India. Asian J Geoinform 14(4):13–22

Google Scholar  

Anderson CW (2006) Turbidity (ver. 2.1): US Geological survey techniques of water-resources investigations. Book 9, chap. A6., sec. 6.7., from: http://pubs.water.usgs.gov/twri9A6/ . Accessed 10 Jul 2022

Bosire AS, Nyantika D, Mamboleo D (2021) Evaluation of the effects of human activities on water resources in Masaba North, Nyamira County. Int J Res Sch Commun 4(2):85–98

Bui DT, Khosravi K, Tiefenbacher J, Nguyen H, Kazakis N (2020) Improving prediction of water quality indices using novel hybrid machine-learning algorithms. Sci Total Environ 721:137612

Bustamante J, Pacios F, Diaz-Delgado R, Aragones D (2009) Predictive models of turbidity and water depth in the Donana marshes using LANDSAT TM and ETM + images. J Environ Manage 90:2219–2225

Article   Google Scholar  

Chander S, Gujrati A, Hakeem KA, Garg V, Issac AM, Dhote PR, Kumar V, Sahay A (2019) Water quality assessment of River Ganga and Chilika lagoon using AVIRIS-NG hyperspectral data. Curr Sci 116(7):1172–1181

Chen Z, Muller-Karger F, Hu C (2007) Remote sensing of water clarity in Tampa Bay. Remote Sens Environ 109:249–259

Chen X, Liu L, Zhang X, Li J, Wang S, Liu D, Duan H, Song K (2021) An assessment of water color for inland water in China using a Landsat 8-Derived Forel-Ule Index and the Google Earth Engine platform. IEEE J Sel Top Appl Earth Obs Remote Sens 14:5773–5785

Chinnasamy P, Honap VU (2023) Spatiotemporal variations in soil loss across the biodiversity hotspots of Western Ghats Region, India. J Earth Syst Sci 132(2):90

Chinnasamy P, Shah Z, Shahid S (2023) Impact of lockdown on air quality during COVID-19 pandemic: a case study of India. J Indian Soc Remote Sens 51(1):103–120

Choubey VK (1992) Correlation of turbidity with Indian Remote Sensing Satellite-1A data. Hydrol Sci 37(2):129–140

DeLuca N, Zaitchik B, Curriero F (2018) Can multispectral information improve remotely sensed estimates of total suspended solids? A statistical study in Chesapeake bay. Rem Sens 10:1393

Dogliotti AI, Ruddick KG, Nechad B, Doxaran D, Knaeps E (2015) A single algorithm to retrieve turbidity from remotely-sensed data in all coastal and estuarine waters. Remote Sens Environ 156:157–168

Garg V, Kumar AS, Aggarwal SP, Kumar V, Dhote PR, Thakur PK, Nikam BR, Sambare RS, Siddiqui A, Muduli PR, Rastogi G (2017) Spectral similarity approach for mapping turbidity of an inland waterbody. J Hydrol 550:527–537

Garg V, Aggarwal SP, Chauhan P (2020) Changes in turbidity along Ganga River using Sentinel-2 satellite data during lockdown associated with COVID-19. Geomat Nat Hazards Risks 11:1175–1195

Gholizadeh MH, Melesse AM, Reddi L (2016) A comprehensive review on water quality parameters estimation using remote sensing techniques. Sensors 16:1298

Gorelick N, Hancher M, Dixon M, Ilyushchhenko S, Thau D, Moore R (2017) Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sens Environ 202:18–27

Hasmadi MI, Norsaliza U (2010) Analysis of SPOT-5 data for mapping turbidity level of River Klang, Peninsular Malaysia. Appl Remote Sens J 1:14–18

Hussain J, Husain I, Arif M, Gupta N (2017) Studies on heavy metal contamination in the Godavari river basin. Appl Water Sci 7:4539–4548

Islam M, Sado K (2006) Analyses of ASTER and Spectroradiometer data with in situ measurements for turbidity and transparency study of lake Abashri. Int J Geoinf 2:31–45

Konik M, Kowalczuk P, Zablocka M, Makarewicz A, Meler J, Zdun A, Darecki M (2020) Empirical relationships between remote-sensing reflectance and selected inherent optical properties in Nordic Sea surface water for the MODIS and OLCI Ocean Colour Sensors. Remote Sens 12:2774

Lacaux JP, Tourre YM, Vignolles C, Ndione JA, Lafaye M (2007) Classification of ponds from high-spatial resolution remote sensing: application to Rift Valley fever epidemics in Senegal. Remote Sens Environ 106(1):66–74

Lobo FDL, Nagel GW, Maciel DA, de Carvalho LAS, Martins VS, Barbosa CCF, de Moraes Novo EML (2021) AlgaeMAp: algae bloom monitoring application for inland waters in Latin America. Remote Sens 13:2874

Lonare A, Maheshwari B, Chinnasamy P (2022) Village level identification of sugarcane in Sangali, Maharashtra using open source data. J Agrometeorol 24(3):249–254

Luis KM, Rheuban JE, Kavanaugh MT, Glover DM, Wei J, Lee Z, Doney SC (2019) Capturing coastal water clarity variability with Landsat 8. Mar Pollut Bull 145:96–104

Mahato LL, Pathak AK, Kapoor D, Patel N, Murthy M (2004) Surface water monitoring and evaluation of Indravati reservoir using the application of principal component analysis using satellite remote sensing technology. In: Proceedings of Map Asia, Beijing, China, 26–29 August 2004

Maltese A, Capodici F, Ciraolo G, La Loggia G (2013) Coastal zone water quality: calibration of a water-turbidity equation for MODIS data. Eur J Remote Sens 46:333–347

Markert KN, Schmidt CM, Griffin RE, Flores AI, Poortinga A, Saah DS, Muench RE, Clinton NE, Chishtie F, Kityuttachai K, Someth P, Anderson ER, Aekakkararungroj A, Ganz DJ (2018) Historical and operational monitoring of surface sediments in the Lower Mekong Basin using Landsat and Google Earth Engine Cloud Computing. Remote Sens 10:909

Martinez E, Gorgues T, Lengaigne M, Fontana C, Sauzede R, Menkes C, Uitz J, Di Lorenzo E, Fablet R (2020) Reconstructing global chlorophyll-a variations using a non-linear statistical approach. Front Mar Sci 7:464

McCarthy FMG, Riddick NL, Volik O, Danesh DC, Krueger AM (2018) Algal palynomorphs as proxies of human impact on freshwater resources in the Great Lakes region. Anthropocene 21:16–31

McFeeters SK (1996) The use of the Normalized Difference Water Index (NDWI) in the delineation of open water features. Int J Remote Sens 17(7):1425–1432

Nechad B, Ruddick KG, Park Y (2010) Calibration and validation of a generic multisensor algorithm for mapping of total suspended matter in turbid waters. Remote Sens Environ 114:854–866

Nechad B, Ruddick KG, Neukermans G (2009) Calibration and validation of a generic multisensor algorithm for mapping of turbidity in coastal waters. Proc. SPIE 7473, Remote Sensing of the Ocean, Sea Ice, and Large Water Regions 2009, 74730H. https://doi.org/10.1117/12.830700

Chapter   Google Scholar  

Papoutsa C, Retalis A, Toulios L, Hadjimitsis DG (2014) Defining the Landsat TM/ETM+ and CHRIS/PROBA spectral regions in which turbidity can be retrieved in inland waterbodies using field spectroscopy. Int J Remote Sens 35:1674–1692

Peterson KT, Sagan V, Sloan JJ (2020) Deep learning-based water quality estimation and anomaly detection using Landsat-8/Sentinel-2 virtual constellation and cloud computing. Giscience Remote Sens 57:510–525

Pote SK, Singal SK (2012) Srivastava DK (2012) Assessment of surface water quality of Godavari River at Aurangabad. Asian J Water Environ Pollut 9(1):117–122

CAS   Google Scholar  

Potes M, Costa MJ, Salgado R (2012) Satellite remote sensing of water turbidity in Aiqueva reservoir and implications on lake modeling. Hydrol Earth Syst Sci Discuss 16:1623–1633

Quang NH, Sasaki J, Higa H, Huan NH (2017) Spatiotemporal variation of turbidity based on Landsat 8 OLI in Cam Ranh Bay and Thuy Trieu Lagoon, Vietnam. Water 9(8):570

Rodríguez-López L, Duran-Llacer I, González-Rodríguez L, Cardenas R, Urrutia R (2021) Retrieving water turbidity in Araucanian lakes (South-Central Chile) based on Multispectral Landsat Imagery. Remote Sens 13:3133

Rodriguez-Perez J, Leigh C, Liquet B, Kermorvant C, Peterson E, Sous D, Mengersen K (2020) Detecting technical anomalies in high-frequency water-quality data using artificial neural networks. Environ Sci Technol 54:13719–13730

Sebastiá-Frasquet MT, Aguilar-Maldonado JA, Santamaría-Del-Ángel E, Estornell J (2019) Sentinel 2 analysis of turbidity patterns in a coastal lagoon. Remote Sens 11(24):2926

Simpson ZP, Haggard BE (2018) Optimizing the flow adjustment of constituent concentrations via LOESS for trend analysis. Environ Monit Assess 190:103. https://doi.org/10.1007/s10661-018-6461-5

Singh N, Nalgire SM, Gupta M, Chinnasamy P (2022) Potential of open source remote sensing data for improved spatiotemporal monitoring of inland water quality in India: case study of Gujarat. Photogramm Eng Remote Sens 88:155–163

Traganos D, Poursanidis D, Aggarwal B, Chrysoulakis N, Reinartz P (2018) Estimating satellite-derived bathymetry (SDB) with the Google Earth Engine and Sentinel-2. Remote Sens 10:859

Union E (2008) Directive 2008/56/EC of the European Parliament and of the council of June 17 2008 establishing a framework for community action in the field of marine environmental policy (Marine Strategy Framework Directive). Off J Eur Union 164:19–40

Vos K, Splinter KD, Harley MD, Simmons JA, Turner IL (2019) CoastSat: A Google Earth Engine-enabled Python toolkit to extract shorelines from publicly available satellite imagery. Environ Model Softw 122:104528

Wang L, Xu M, Liu Y, Liu H, Beck R, Reif M, Emery E, Young J, Wu Q (2020) Mapping freshwater chlorophyll-a concentrations at a regional scale integrating multi-sensor satellite observations with Google Earth Engine. Remote Sens 12:3278

Wickham H (2016) ggplot2: elegant graphics for data analysis. Springer-Verlag, New York, New York, USA. https://ggplot2.tidyverse.org/ . Accessed 22 Aug 2022

Wu JL, Ho CR, Huang CC, Srivastav AL, Tzeng JH, Lin YT (2014) Hyperspectral sensing for turbid water quality monitoring in freshwater rivers: empirical relationship between reflectance and turbidity and total solids. Sensors 14:22670–22688

Yin F, Lewis PE, Gomez-Dans J, Wu Q (2019) A sensor-invariant atmospheric correction method: application toSentinel-2/MSI and Landsat 8/OLI. https://doi.org/10.31223/osf.io/ps957

Yunus AP, Masago Y, Hijioka Y (2020) COVID-19 and surface water quality: improved lake water quality during the lockdown. Sci Total Environ 731:139012

Yunus AP, Masago Y, Hijioka Y (2021) Analysis of long-term (2002–2020) trends and peak events in total suspended solids concentrations in the Chesapeake Bay using MODIS imagery. J Environ Manage 299:113550

Zhang C, Di L, Yang Z, Lin L, Hao P (2020) AgKit4EE: A toolkit for agricultural land use modeling of the conterminous United States based on Google Earth Engine. Environ Model Softw 129:104694

Zheng G, DiGiacomo PM (2022) A simple water-clarity turbidity index for the Great Lakes. J Great Lakes Res 48:684–694

Download references

Acknowledgements

This work is partially supported by the Institute Post-doctoral Fellowship at IIT Bombay. The authors would like to thank the European Space Agency for free access to the Sentinel-2 data. We are grateful to Google for the GEE platform, which provided an efficient and powerful computing platform.

This research has been supported by the Water Productivity Improvement in Practice (Water-PIP) project, which is funded by the IHE Delft Water and Development Partnership Programme (WDPP) under the programmatic cooperation between the Directorate General for International Cooperation (DGIS) of the Ministry of Foreign Affairs of the Netherlands and IHE Delft (ID: DGIS Activity DME0121369).

Award Number: 111349 (WATERPIP project) | Recipient: Pennan Chinnasamy, Ph.D.

Author information

Authors and affiliations.

Centre for Technology Alternatives for Rural Areas, Indian Institute of Technology (IITB), Powai, Mumbai, Maharashtra, 400076, India

Meena Kumari Kolli & Pennan Chinnasamy

Rural Data Research and Analysis Lab (RuDRA), IIT Bombay, Mumbai, India

Interdisciplinary Programme in Climate Studies (IDPCS), IIT Bombay, Mumbai, India

Pennan Chinnasamy

Centre for Machine Intelligence and Data Science(C‑MInDS), IIT Bombay, Mumbai, India

Ashank Desai Centre for Policy Studies, IIT Bombay, Mumbai, India

Nebraska Water Center, University of Nebraska, Lincoln, NE, USA

You can also search for this author in PubMed   Google Scholar

Contributions

Ideation, discussion, data sources identification, draft preparation, editing, and finalization—Prof. Pennan Chinnasamy.

Data collection, GIS, analysis, draft preparation, writing, and maps—Meena Kumari Kolli.

Corresponding author

Correspondence to Pennan Chinnasamy .

Ethics declarations

Ethical approval.

Not applicable.

Consent to participate

Consent for publication, competing interests.

The authors declare no competing interests.

Additional information

Responsible Editor: Philippe Garrigues

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Kolli, M.K., Chinnasamy, P. Estimating turbidity concentrations in highly dynamic rivers using Sentinel-2 imagery in Google Earth Engine: Case study of the Godavari River, India. Environ Sci Pollut Res (2024). https://doi.org/10.1007/s11356-024-33344-4

Download citation

Received : 06 November 2022

Accepted : 11 April 2024

Published : 01 May 2024

DOI : https://doi.org/10.1007/s11356-024-33344-4

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

  • Sentinel-2 Multi-Spectral Image
  • Normalized Difference Turbidity Index
  • Single-band turbidity
  • Regression analysis
  • Godavari River
  • Google Earth Engine
  • Find a journal
  • Publish with us
  • Track your research
  • Reference Manager
  • Simple TEXT file

People also looked at

Original research article, is it possible for sustainability the case from the new capital city of indonesia.

ecological analysis of case study

  • 1 Department of Public Administration, Universitas Sultan Ageng Tirtayasa, Serang, Banten, Indonesia
  • 2 Department of Government, Universitas Sultan Ageng Tirtayasa, Serang, Banten, Indonesia
  • 3 Department of Politics, Universitas Padjajaran, Bandung, West Java, Indonesia

The decision to relocate the nation’s capital from Jakarta is not without reason. Jakarta, the nation’s capital, is regarded as less than ideal, with numerous issues such as flooding, air pollution, poor water quality, and political and environmental sustainability. This research will be based on the framework of ecological citizenship to investigate active citizens. The lesson from other countries that relocate their capital city as a comparison. This research uses a qualitative research method with a literature study type of research. reviewing several previous studies on citizenship and academic texts on moving the nation’s capital, studies on moving the capital, and legislation on the nation’s capital. This research tries to find how the possibility of environment sustainability in the new capital project. Ecological concerns have not been on the agenda of public discussion. Moreover, this research provides more information on the opportunity of ecological citizenship community in Indonesia’s new capital city project, in the context of the sustainability agenda.

1 Introduction

The relocation of Indonesia’s capital city from Jakarta has been a long-planned project since the 1960s when President Soekarno was in power. However, due to changes in leadership from one president to the next, this has never become a reality. In 2019, the government of President Jokowi officially announced a plan to relocate the country’s capital from Jakarta to East Kalimantan ( Minister of National Development Planning, 2019 ).

The decision to relocate the nation’s capital from Jakarta is not without reason. Jakarta, the nation’s capital, is regarded as less than ideal, with numerous issues such as flooding, air pollution, poor water quality, and political and environmental sustainability ( Steinberg, 2007 ). Aside from environmental issues and conditions that are perceived to be worsening, Jakarta is also perceived to have a lack of capacity to manage existing problems ( Teo et al., 2020b ). Other debates that have become strong reasons for the government to relocate the capital from Jakarta include: approximately 57% of Indonesia’s population is concentrated on the Java island; approximately 58.49% of Indonesia’s economic concentration is concentrated on the Java island; the water crisis that occurred in Jakarta and East Java; the largest land conversion occurred on the Java island, approximately 44% in 2020; and urbanism. Jakarta’s environmental quality is deteriorating (flooding, land subsidence, sea-level rise, river water quality that is polluted by up to 96%, congestion, and other transportation problems that cost up to 56 trillion per year) ( Minister of National Development Planning, 2019 ).

The public is also concerned about the funding required to relocate the capital city to East Kalimantan. The first scenario requires at least 466 trillion (32.9 billion USD) in development costs, while the second scenario requires 323 trillion (22.8 billion USD). Despite the fact that the Jokowi government claims that this relocation will increase economic equality outside of Java by around 50% ( Minister of National Development Planning, 2019 ).

For the time being, what some environmental activists are concerned about is the possibility that relocating the capital will only exacerbate existing problems in East Kalimantan ( Sloan et al., 2019 ; Shimamura and Mizunoya, 2020 ; van de Vuurst and Escobar, 2020 ; Teo et al., 2020a , b ). Penajam Paser, as the center area for the development of the government place for the new capital, is close to several existing cities such as Samarinda, Balikpapan, and Kutai Kartanegara, and in some existing cities, environmental damage has been caused by mining activities that began in the early 2000s (interview with informant 1 ) ( Siburian, 2012 , 2015 ).

For example, several major floods occurred in Samarinda in early 2020, as reported by floodlist.com (Indonesia—Samarinda Floods Again, Thousands Affected) and the Jakarta Post. 2 The same thing occurred in Balikpapan and Kutai Kartanegara, which have a high risk of natural disasters due to the massive mining activities that take place there ( Saputra et al., 2023 ; Suling et al., 2023 ). Environmental damage has also begun to impact the Balik and Dayak tribal areas, as indigenous residents in the Paser area are behind the jargon of green city development 3 . This condition is a problem arising from land conversion, namely changing open areas to built-up areas. The land area that will be used as the new capital city reaches 180,965 hectares. In short, the National Capital Region (IKN) is divided into three rings. The first ring covers an area of 5,644 hectares, which the government calls the Core Central Government Area; the second ring covers 42,000 hectares, which the government calls the National Capital Region (IKN); and the third ring covers 133,321 hectares, which the government calls the National Capital Expansion Area ( Bappenas, 2019 ).

2 The politics of citizenship

This qualitative research aims to investigate the problems of the relocation of the new capital city of Indonesia. This qualitative research method attempts to elaborate on the meaning of the phenomenon and describe it in this research using the model of citizenship as an approach to perceive the actor of green discourse in the new capital city project. Understanding the project of relocating Indonesia’s new capital and its relationship to the potential for environmental degradation. This research will attempt to use qualitative methods while focussing on actors involved in exclusion and inclusion on the agenda of public engagement and contestation. The Politics of Citizenship is a contestation arena in which members, legal status, rights, and participation are contested. Citizens are at risk of exclusion as well as inclusion in this contest, as seen in Figure 1 ( Stokke, 2017 ). Citizenship politics is also linked to the issue of distribution and redistribution in democratization in order to address problems of inequality. It is to see how to understand the context of civil rights that has evolved into political rights in the tradition of citizenship as a framework ( Betts, 1990 ). In this context, citizenship is more easily understood as membership.

www.frontiersin.org

Figure 1 . Dimensions and stratification of citizenship. Source: Stokke (2017) .

Using library research, this article uses official state documents to see the movements of actors in the transmission of Indonesia’s capital city. Several documents were reviewed: the Document for Preparing a Strategic Environmental Study for the National Capital Masterplan for Fiscal Year 2020; the Academic Text of the Draft Law on Capital State Cities; the Pocket Book on the Transfer of State Capitals; and Law Number 3 of 2022 concerning State Capitals. So, this research will answer questions related to the New National Capital regarding: How does the New National Capital impact environmental degradation? Regarding citizenship, this research also discusses how actors play a role in the distribution of democracy to overcome the gaps that occur.

Citizenship politics has many aspects to each issue. It can, however, be divided into two broad categories: membership and legal status issues, and rights and participation. In this framework, citizenship is very closely related to power relations in an existing contestation, whether it is a contestation of exclusion and inclusion in a democratic and political institution, or whether it is a contestation of class, gender, civil, labor, or others ( Betts, 1990 ). Furthermore, if we include the citizenship debate, we can see how the contestation is influenced by a latent ideology, namely how liberalism and communitarianism affect citizenship as a framework ( Hikmawan et al., 2021 ). As a result, membership engagement and political behavior will differ. Individual rights and civic republicanism debates are framed by various forms of citizenship, including differentiated citizenship ( Young, 1990 , 2000 , 2011 ). Citizenship is sought in this case as a framework for analyzing oppression and exclusion in various forms of membership.

3 A lesson and opportunity

Aside from environmental damage, it is also feared that potential conflicts will arise in the new capital. Penajam Paser, the site of the new capital’s construction, is an area with ancestral customs and environmental activities, namely customary lands that may be impacted by the project’s construction. Even Kutai Kartanegara, one of Indonesia’s oldest kingdoms, is concerned about the potential consequences (interview with informant). The question is whether the central government has taken inclusive steps to avoid ethnic conflict between newcomers and residents, as happened in Sampan, Madura.

In examining the potential problems that may exist, in this case, environmental damage and the impact of environmental conflicts that may arise during the process of relocating the new capital to East Kalimantan, it is worth reconsidering whether environmental issues are of concern to the people of Indonesia, particularly East Kalimantan. Or is the government’s discourse for economic equality as well as equitable development, which is the discourse of President Jokowi’s government in relocating the capital, covering all forms of damage that are currently occurring as well as the potential that will arise later? Of course, this is intriguing enough to warrant further investigation.

In answering this question, it is interesting to look at ecological citizenship as an effort to understand the potential damage that exists from various sources of local actors who are involved and to see the scope of the government’s environmental governance agenda. Do not allow the agenda of relocating the capital to become merely a matter of development and business project effort.

This research is based on existing environmental concerns and issues. Furthermore, the ecological citizenship approach enables us to see the involvement of public engagement relations and environmental governance, which has become a government discourse ( Smith, 2003 ; Dobson, 2004 ; Dobson and Eckersley, 2006 ).

The most difficult challenge of ecological citizenship is distinguishing between what is on the agenda of the Jokowi Government in terms of economic equality and how this discourse must be accompanied by sustainability in existing environmental issues. Even with the “Smart Green, Beautiful, and Sustainable” discourse, the relocation of the new capital must be on the agenda with the Indonesian people, not just for the sake of the economy, let alone the interests of a few elite groups for the sake of the business project.

In general, ecological awareness has not been on the agenda of Indonesian people, and it is very rare to come up in public discussion; only a few local activists are active in mining cases in East Kalimantan, the majority of which are owned by the national political elite, and they are now turning to the issue of relocating the capital. One of the concerns expressed by local environmental activists such as “JATAM” is that there may be a link between the damage caused by large mining corporations owned by national elites and efforts to relocate the new capital to conceal the damage. Of course, this suspicion assumption necessitates additional data and research.

Indonesia is not the only country that has had its capital city relocated. As shown in Table 1 , some countries’ capital cities have been relocated.

www.frontiersin.org

Table 1 . List of relocated capital cities.

The case of Brazil teaches us about how relocating the capital can alter the way civil society and elites interact with one another. Brasília has been the federal capital of Brazil since 1960. The relocation of Brazil’s old capital, Rio de Janeiro, was intended to redistribute development outcomes and promote a more modern vision of the country. However, as time passed, relocation to Brasilia began to exacerbate land conflicts between poor landless migrants who had relocated to the new capital and state authorities. This case demonstrated that the establishment of a new capital city does not necessarily result in more equitable development ( Anugrah, 2019 ).

While Jokowi’s decision to relocate the capital has garnered considerable attention in Indonesia, there are already numerous examples of countries that have chosen to relocate their administrative center to a new location around the world. According to Potter (2017) , almost 30% of countries have their capital outside of their largest city, and 11 countries have shifted capitals since 1960. The reason for the relocation of the capital city is most likely a civil–society conflict ( Potter, 2017 ). However, various factors may influence such decisions. While the reason for relocating the capital city in Canberra, Australia, was to preserve a political symbol for the nation, in other cases, governments attempted to create a balance of power in the face of a divided population, such as when the United States established Washington City, or in the case of Naypyidaw, Myanmar, the decision was forced by the threat of civil unrest ( Campante et al., 2013 ).

Instead, the Myanmar government designated Naypyidaw as their new political capital in 2005. In the case of Myanmar, we can see how the relocation of the capital will cause a number of issues if the impact is not properly calculated. Myanmar, for example, relocated its capital from Yangon to Naypyidaw. Analysts disagree on what prompted the military government to relocate the capital: whether it was to promote the idea of state-led ethnic and racial unity and rural development, to bury memories of popular unrest in Yangon, to avoid growing pro-democracy protests, or a combination of these factors ( Anugrah, 2019 ). The Myanmar government stated that the relocation to Naypyidaw, which was built from the ground up in the middle of rice paddies and sugar cane fields, was similar to building Canberra or Brasilia, an administrative capital away from the traffic jams and overcrowding of Yangon, their previous capital. The city itself was divided into several zones, each with its own distinct design. In addition, the city is home to a military base, which is inaccessible to citizens or other personnel without written permission. The government has set aside two hectares of land each for foreign embassies and United Nations missions’ headquarters. In 2017, the Chinese embassy formally opened its interim liaison office, the first foreign office permitted to open in Naypyidaw. However, as in the case of Putrajaya in Malaysia, many foreign embassies are hesitant to relocate to this new city. State Counselor Daw Aung San Suu Kyi presided over a meeting at the Ministry of Foreign Affairs in Naypyidaw in February 2018, where she urged foreign governments to relocate their embassies to the capital. In fact, in 2019, several reports on Myanmar pointed out that many government-owned buildings and houses are facing a state of decay because of neglect, and no officials are living or working in those buildings. 4

Because of its proximity and cultural affinity with Indonesia, it is worth devoting a few lines to the Malaysian case. The Malaysian government, led by Prime Minister Mahathir bin Mohammad, planned to relocate the capital to a new city called Putrajaya in the mid-1980s. The city was divided into two sections by the master plan: the core and the periphery. The core was designed to be, and still is, the administrative and symbolic heart of the city and country, showcasing Putrajaya’s identity through grand civic structures. Hotels, shopping malls, commercial offices, exhibition and convention centers, private colleges, a private hospital, and various tourist attractions are also located in the core. While the periphery is designed to house 14 residential neighborhoods with 67,000 housing units, each neighborhood contains a variety of housing for a range of incomes, such as detached homes, rowhouses, shophouses, and high-rise apartments. There are numerous commercial clusters throughout the city where residents can walk to buy groceries at a wet market, supermarket, or corner shop, as well as a mosque. Furthermore, because it is located in the heart of Borneo, the new capital city of Putrajaya has the potential to create a forest management problem ( Sloan et al., 2019 ). This development condition will trigger the problem of forest degradation, indicated by the decline in forest cover in the Kalimantan region, one of which is caused by the use of land in forest areas, even though the IKN region is part of Kalimantan Island, whose spatial planning direction is to realize the sustainability of biodiversity conservation areas and protected areas with wet tropical forest vegetation covering at least 45% of the area of Kalimantan Island as the Lungs of the World.

This ambitious project also aims to demonstrate Malaysia’s modernisation and new Muslim identity ( Moser, 2010 ). Unfortunately, despite its great design, large budget, and explicit goal of creating a “garden city,” Putrajaya has been unable to attract people other than civil servants and tourists who come to visit it because the city does not allow people to commute around it ( Moser, 2010 ). However, if we look deeper, the issue is more complex than mere connectivity: the development of Putrajaya is a clear example of central planning, as described in the preceding paragraphs, and we have learned that a city is rather a spontaneous command arising from human interaction; it is not surprising that it has been unable to become a vibrant urban context; even international diplomacy has refused to move. Indeed, a city is made up of cultural, political, and economic elements that cannot be separated. To encourage people to relocate to Putrajaya, the Malaysian government enacted various housing incentives and subsidies, including the construction of thousands of affordable homes and the support of the city with mass transportation projects to attract workers from Kuala Lumpur to live in Putrajaya and commute to Kuala Lumpur. However, the anticipated results have yet to be revealed.

South Korea’s experience differs from that of Malaysia’s Putrajaya. The democratic process of relocating the capital city from Seoul to Sejong. Parties took part in the vote and made the decision ( Cowley, 2014 ; Hackbarth and de Vries, 2021 ). Furthermore, the project includes industry and universities in the planning of relocating the capital city ( Lee, 2011 ; Hackbarth and de Vries, 2021 ). In contrast to the Malaysian process, Putrajaya was decided by the prime minister and has since become a private project. In this case, Indonesia is more akin to what Malaysia has done by relocating its capital city. President Jokowi himself made the decision to move the capital from Jakarta to East Kalimantan.

Another similarity between Sejong and Putrajaya in the process of relocating the capital is that both prioritize the ecological concept for the new capital. However, there are significant differences in several areas. For example, Putrajaya has several issues with public transportation, which increases the use of private transportation. Meanwhile, Sejong prefers smart public transportation, making it accessible to both residents and visitors ( Kang, 2012 ).

It is demonstrated in some of the preceding literature that explaining new capital can be done using various approaches. Furthermore, several studies have been conducted that reflect this approach to explaining environmental impacts as well as social and economic conflicts. Furthermore, several studies have attempted to explain Sejong and Putrajaya as the new capital through urban city, smart city, land use and conservation, and migration aspects ( Moser, 2010 ; Lee, 2011 ; Kang, 2012 ; Cowley, 2014 ; Sloan et al., 2019 ).

There are fundamental differences between what I will do in the research on the case of the new capital in East Kalimantan, Indonesia, and what has been done in previous studies. In this research, I will investigate how the contestation arose during the process of relocating the capital city to East Kalimantan. Furthermore, this research focusses on the search for and analysis of a transformation of ecological citizenship, which will see the contestation of actors involved in the discourse of environmental sustainability in the development of new capital in Indonesia.

4 Is it possible for sustainability

Ecological citizenship refers to the relationship between citizenship and the environment, including human beings and non-human natural beings ( Smith, 2003 , 2004 ; Dobson, 2004 ; Dobson and Eckersley, 2006 ; Carme, 2008 ). The ecological citizenship framework focusses on contesting actors in an arena, namely the relationship between humans and nature. Developmentalism has become an unavoidable reality in today’s democracy. Every country must strike a balance between using natural resources in its activities and maintaining current sustainability. In this context, ecological citizenship is a framework that sees membership as a contestation of political rights, public engagement, and personal duties and ethics. The framework offered ( Dobson and Eckersley, 2006 ) for understanding environmental sustainability is in line with what is understood ( Smith, 2004 ), namely how the environment must be included in democratic policy institutions that are aware of the existence of the environment.

Dobson sees ecological citizenship as more than just imagination or an abstract concept in this context. It is, however, a result of an attempt to establish democratic institutions for the distribution of equality in the environment. The current debate takes place in a liberal democratic arena in which the environment is viewed as a potential benefit that can be reduced through citizen or full membership involvement. Furthermore, the question that arises is how ecological citizenship can be used to build a democratic institution that is large and inclusive.

Dobson also believes that in the case of urban politics, ecological citizenship is transformed from cosmopolitanism, which focusses on the distribution and redistribution of democratic institutions for environmental equality, to post-cosmopolitanism, which focusses on commitment to citizens beyond the state ( Dobson, 2004 ). Of course, this occurs because Dobson recognizes that in a liberal democracy, the opportunity to exploit the environment for economic gain is greater than that of ecological citizenship itself in terms of environmental sustainability.

As a result, ecological citizenship is a contingency in which citizens are seen as plural and “differentiated.” This is a term coined by Iris Marion Young, who contends that the primary form of citizen groups is “difference” ( Young, 1990 , 2000 , 2011 ; Appiah et al., 2007 ). The commitment that post-cosmopolitanism bases on Dobson’s frame of ecological citizenship is insufficient to accommodate different types of citizens who have different perspectives on their environment. In the sense that the environment is not only viewed scientifically as a need for sustainability in order to avoid climate change but also from an ethical standpoint, which sees an existence coexisting with human beings.

The “Differentiated Ecological Citizenship” model is a way of thinking about diversity and sustainability. The deadlock in Dobson’s post-cosmopolitanism is seeing citizens as homogeneous, rather than heterogeneous, with a commitment, obligations, and rights to preserve the environment, as a form of anticipation of developmentalism in liberal democracy. This viewpoint limits the need for diversity among environmental citizens. It is “indigenous people” in its most extreme form in understanding the relational diversity of membership in seeing its environment. They see that threats to environmental damage, such as the global community’s concern about climate change and ecosystem balance, are not on the agenda. They have a better understanding that the environment is an inseparable component of life that cannot be exploited. In the context of democracy, the diversity of thoughts on the form of citizens who are excluded in a rationalistic framework.

As a result, this research, which views ecological citizenship as a form of contingency, has the potential to broaden the definition of ecological citizenship beyond what Dobson offers in a post-cosmopolitanism framework of ecological citizenship. By combining the context of a new capital’s relocation in Indonesia with the plural groups of citizenship in Indonesia, culture, race, religion, and identity can all differ from one group to the next. In this context, broadening the definition of ecological citizenship becomes an urgent necessity in order to accommodate all of the competing membership forms that exist. The expansion of this form also allows for inclusiveness in the contestation and public discourse surrounding the development of a new capital city in East Kalimantan. This research also demonstrates the academic contribution to the theory of ecological citizenship. As an example of a practical contribution. This research examines the various forms of ecological citizenship in Indonesia. Specifically, providing an analysis of ecological citizenship participation in Indonesia.

5 Activism for sustainability

One of the most interesting findings from the researcher’s interviews with local informants is information about the daily relationships of environmental activists, aside from WALHI and Greenpeace, as representatives of environmental NGOs, namely JATAM (Jaringan Advokasi Tambang/Mining Advocacy Networks). 5 JATAM is a non-governmental organization (NGO) in East Kalimantan concerned with environmental issues. Pradarma Rupang is the JATAM Kaltim Disseminator, and he is involved in research, advocacy, and protests against mining policies and activities carried out by large corporations in East Kalimantan (interview with informant). 6

According to several JATAM reports, since the central government’s decision to relocate the national capital to Penajam Paser, massive changes in land ownership have occurred, with large-scale investments made by investors from outside Kalimantan, the majority of which were dominated by large corporations from Jakarta and Surabaya, to acquire land around Penajam Paser. This, of course, has caused land prices in East Kalimantan to increase. Eventually, these lands will be converted into commercial activities, which is expected to worsen the situation in East Kalimantan, adding to environmental activists’ concerns.

The East Kalimantan Provincial Government also faces numerous challenges in issuing permits to companies seeking to exploit nature. The local government has granted at least 161 exploitation permits, and another issue is that there are 2.4 million hectares of abandoned areas as a result of approximately 800 land exploitation permits that have been abandoned with no effort to improve the environment. 7 BAPPENAS claims that the new national capital will be an ideal city, with a minimum of 50% green open space integrated with the environmental landscape in accordance with the framework of the forest city development concept that the government always campaigns for. However, this plan has not been presented in the form of a concrete development program, so it is not clear how the government plans to build urban housing without disturbing the local ecosystem ( News.mongabay.com, 2019 ). What has happened is a large-scale project in Kalimantan, making the forests where animals live fragmented, including eliminating corridors that are vital for animals ( Alamgir et al., 2019 ).

The current urgency of ecological citizenship is the most visible form of a shared issue in placing the environment not only as a discourse for a development project that began in 2021 despite the COVID-19 pandemic and is scheduled to be completed in 2024, coinciding with the end of Jokowi’s presidential term, but as an initial effort to fully understand the formation of public engagement as a way of anticipating the potentials that could occur if environmental sustainability is not prioritized in the relocation of the new capital. This research also attempts to map a variety of current and future issues, such as conflict between indigenous peoples over land and their sustainability, economic conflicts between migrants and local communities, and environmental damage caused by converting land into business activities to meet the needs of more than 800,000 bureaucrats who will migrate, and to address the impact of flooding caused by environmental damage in surrounding cities such as Samarinda, Balikpapan, and Kutai Kartanegara, which has yet to be resolved, as well as the issue of the need for clean water that has emerged later.

Data availability statement

The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author.

Ethics statement

Ethical approval was not required for the study involving human participants in accordance with the local legislation and institutional requirements. The participants provided written informed consent for participation in the study.

Author contributions

LA: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing. MH: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – review & editing, Writing – original draft. JS: Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Visualization, Writing – review & editing.

The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.

Acknowledgments

The authors would like to show their gratitude towards the supervisors and colleagues who have been a part of this writing journey for their endless support and assistance during the process of this article.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

1. ^ The author conducted an interview by phone with a local researcher at Mulawarman University.

2. ^ See http://floodlist.com/asia/indonesia-floods-samarinda-may-2020 and https://www.thejakartapost.com/news/2020/01/15/more-than-7000-affected-by-flooding-landslides-in-samarinda.html .

3. ^ Ekuatorial.com , 2022.

4. ^ See the reports from By NANDA and YE MON in December 2019. https://www.frontiermyanmar.net/en/official-residences-in-nay-pyi-taw-left-to-ruin/ .

5. ^ One of the local environmental activism in east Kalimantan (see https://www.jatam.org/tentang-kami ).

6. ^ Interview with a local researcher at Mulawarman University.

7. ^ See  https://www.liputan6.com/regional/read/4047118/catatan-aktivis-lingkungan-soal-ibu-kota-baru-di-kaltim .

Alamgir, M., Campbell, M., Sloan, S., Suhardiman, A., Surpiatna, J., and Laurance, W. F. (2019). High risk infrastructure projects pose imminent threats to forests in Indonesia Borneo. Sci. Rep. 9:140. doi: 10.1038/s41598-018-36594-8

PubMed Abstract | Crossref Full Text | Google Scholar

Anugrah, I. (2019). Out of sight, out of mind? Political accountability and Indonesia’s new capital plan. Available at: https://www.newmandala.org/out-of-sight-out-of-mind-political-accountability-and-indonesias-new-capital-plan/ .

Google Scholar

Appiah, K. A., Benhabib, S., Young, I. M., and Fraser, N. (2007). Justice, governance, cosmopolitanism, and the politics of difference: Reconfiguration in a transnational world . Cambridge, MA.W.E.B. Du Bois Lectures.

Bappenas. (2019). Dampak Ekonomi dan Skema Pembiayaan Pemindahan Ibu Kota Negara. Dialog Nasional II: Menuju Ibu Kota Masa Depan: Smart, Green and Beautiful.

Betts, K. (1990). Book reviews: CITIZENSHIP. J.M. Barbalet. Milton Keynes. J. Sociol. 26, 291–294. doi: 10.1177/144078339002600233

Crossref Full Text | Google Scholar

Campante, F. R., Do, Q. A., and Guimaraes, B. (2013). Isolated capital cities and misgovernance: theory and evidence. Available at: https://ssrn.com/abstract=2210254

Carme, M. E. (2008). Promoting ecological citizenship: rights, duties and political agency. ACME 7, 113–134.

Cowley, R. (2014). Learning from utopia? The case of Sejong city University of Westminster. London

Dobson, A. (2004). Citizenship and the environment . Oxford: Oxford University Press.

Dobson, A., and Eckersley, R. (2006). Political theory and the ecological challenge . Cambridge: Cambridge University Press.

Hackbarth, T. X., and de Vries, W. T. (2021). An evaluation of massive land interventions for the relocation of capital cities. Urban Sci. 5:25. doi: 10.3390/urbansci5010025

Hikmawan, M. D., Indriyany, I. A., and Hamid, A. (2021). Resistance against corporation by the religion-based environmental movement in water resources conflict in Pandeglang, Indonesia. Otoritas 11, 19–32. doi: 10.26618/ojip.v11i1.3305

Kang, J. (2012). A study on the future sustainability of Sejong, South Korea’s multifunctional administrative city, focusing on implementation of transit oriented development , Uppsala University. Uppsala

Lee, D. (2011). Political leadership during a policy shift: the effort to revise the Sejong City plan. Korean J. Policy Stud. 26, 1–19. doi: 10.52372/kjps26101

Minister of National Development Planning . (2019). Dampak ekonomi dan skema pembiayaan pemindahan ibu kota negara. Dialog Nasional ii: menuju ibu kota masa depan: smart, green and beautiful . Jakarta: Kementerian PPN/Bappenas.

Moser, S. (2010). Putrajaya: Malaysia’s new federal administrative capital. Cities 27, 285–297. doi: 10.1016/j.cities.2009.11.002

News.mongabay.com . (2019). Available at: https://news.mongabay.com/2019/08/red-flags-as-indonesia-eyes-relocating-its-capital-city-to-borneo/

Potter, A. (2017). Locating the government: capital cities and civil conflict. Res. Politics. 4:2053168017734077:205316801773407. doi: 10.1177/2053168017734077

Saputra, T., Zuhdi, S., Kusumawardani, F., and Novaria, R. (2023). The effect of economic development on illegal gold Mining in Kuantan Singingi, Indonesia. J. Gov. 8, 31–42. doi: 10.31506/jog.v8i1.16883

Shimamura, T., and Mizunoya, T. (2020). Sustainability prediction model for capital city relocation in Indonesia based on inclusive wealth and system dynamics. Sustainability 12, 1–25. doi: 10.3390/su12104336

Siburian, R. (2012). Pertambangan Batu Bara: Antara Mendulang Rupiah Dan Menebar Potensi Konflik. J. Masy.Indones. 38, 69–92. doi: 10.14203/jmi.v38i1.297

Siburian, R. (2015). “Emas Hitam”: Degradasi Lingkungan dan Pemarginalan Sosial “black gold”: environmental degradation and social marginalization. J. Penelit. Kesejaht. Sos. 14, 1–19.

Sloan, S., Campbell, M. J., Alamgir, M., Lechner, A. M., Engert, J., and Laurance, W. F. (2019). Trans-national conservation and infrastructure development in the heart of Borneo. PLoS One 14, e0221947–e0221922. doi: 10.1371/journal.pone.0221947

Smith, G. (2003). Deliberative democracy and the environment . London. Routledge.

Smith, G. (2004). “Liberal democracy and sustainability” in Environmental politics . eds. M. Wissenburg and Y. Levy (London: Routledge), 386–409.

Steinberg, F. (2007). Jakarta: environmental problems and sustainability. Habitat Int. 31, 354–365. doi: 10.1016/j.habitatint.2007.06.002

Stokke, K. (2017). Politics of citizenship: towards an analytical framework. Nor. Geogr. Tidsskr. 71, 193–207. doi: 10.1080/00291951.2017.1369454

Suling, C. F., Purnomo, E. P., Hubacek, K., and Anand, P. (2023). The influence of “renewable energy directive II” policy for the sustainability of palm oil industry in Indonesia. J. Gov. 8, 330–347. doi: 10.31506/jog.v8i3.19930

Teo, H. C., Lechner, A. M., Sagala, S., and Arceiz, A. C. (2020a). Environmental implications of Indonesia's new planned capital in East Kalimantan, Borneo Island , Resilience Development Initiative. Jawa Barat

Teo, H. C., Lechner, A. M., Sagala, S., and Campos-Arceiz, A. (2020b). Environmental impacts of planned capitals and lessons for Indonesia’s new capital. Land 9, 1–17. doi: 10.3390/land9110438

van de Vuurst, P., and Escobar, L. E. (2020). Perspective: climate change and the relocation of Indonesia’s capital to Borneo. Front. Earth Sci. 8, 1–6. doi: 10.3389/feart.2020.00005

Young, I. M. (1990). Justice and the politics of difference Iris . Princeton University Press. New Jersey

Young, I. M. (2000). Inclusion and democracy . Oxford University Press. Oxford.

Young, I. M. (2011). Responsibility for justice . Oxford University Press. Oxford.

Keywords: new capital city, relocation, ecology, citizenship, sustainability, Indonesia

Citation: Agustino L, Hikmawan MD and Silas J (2024) Is it possible for sustainability? The case from the new capital city of Indonesia. Front. Polit. Sci . 6:1362337. doi: 10.3389/fpos.2024.1362337

Received: 28 December 2023; Accepted: 02 April 2024; Published: 22 April 2024.

Reviewed by:

Copyright © 2024 Agustino, Hikmawan and Silas. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Leo Agustino, [email protected]

This article is part of the Research Topic

Climate Change, Natural Resources, and Human Security in Governance and Society: Vulnerabilities and Adaptation Strategies

IMAGES

  1. (PDF) The Analysis of Environmental Case Studies

    ecological analysis of case study

  2. Descriptive Epidemiology, Case Reports, Case Series, Cross-Sectional

    ecological analysis of case study

  3. (PDF) Ecological Analysis of Tuberculosis Patients’

    ecological analysis of case study

  4. Environmental Analysis: Steps, Examples & Benefits

    ecological analysis of case study

  5. Ecological study design multiple group study and statistical analysis

    ecological analysis of case study

  6. case control, ecological, & cross-sectional studies Flashcards

    ecological analysis of case study

VIDEO

  1. Ecological levels of analysis (PSY)

  2. Ecological Case Description

  3. Ecological Analysis

  4. Gathering Ecological Data

  5. TO STUDY ECOLOGICAL ADAPTATIONS IN XERIC AND HYDRIC CONDITIONS

  6. Ecological Niche

COMMENTS

  1. Ecological Approach in Practice: A Case Study of the Ounce of

    This paper will, therefore, examine existing research to take a broad look at the critical ecologies of adolescents and assess the ecological sensitivities of typical program models that serve low-income adolescent mothers. Finally, using the case study of the Ounce of Prevention Fund in Chicago, Illinois (The Ounce), the paper will illustrate ...

  2. Chapter 6. Ecological studies

    Like geographical studies, analysis of secular trends may be biased by differences in the ascertainment of disease. As health services have improved, diagnostic criteria and techniques have changed. Furthermore, whereas in geographical studies the differences are accessible to current inquiry, validating secular changes is more difficult as it ...

  3. Ecologic studies in epidemiology: concepts, principles, and ...

    An ecologic study focuses on the comparison of groups, rather than individuals; thus, individual-level data are missing on the joint distribution of variables within groups. Variables in an ecologic analysis may be aggregate measures, environmental measures, or global measures. The purpose of an ecologic analysis may be to make biologic ...

  4. PDF Study Design VI

    Previously in this series I have given an overview of the main types of study design and the techniques used to minimise biased results. In this article I describe more fully ecological studies ...

  5. 7.2

    An ecological study is an observational study in which at least one variable is measured at the group level. An ecological study is especially appropriate for the initial investigation of causal hypotheses. ... Analysis of Ecologic Studies. Analytic models in ecologic studies are of different forms: ... Comparing & Combining Case-Control and ...

  6. Methodology Series Module 7: Ecologic Studies and Natural Experiments

    In this module, we will discuss study designs that have not been covered in the previous modules - ecologic studies and natural experiments. Go to: Ecologic Studies. In an ecologic study, the unit of analysis is a group or aggregate rather than the individual. It may be the characteristics of districts, states, or countries.

  7. 6.1

    An ecological study is an observational study in which at least one variable is measured at the group level. An ecological study is especially appropriate for initial investigation of causal hypothesis. ... Case Control Studies. 5.1 - Rationale & Designs; 5.2 - Basic Concepts of Exposure; ... Sample Ecological Data and Analysis; 7.3 ...

  8. Ecological study

    In epidemiology, ecological studies are used to understand the relationship between outcome and exposure at a population level, where 'population' represents a group of individuals with a shared characteristic such as geography, ethnicity, socio-economic status of employment. What differentiates ecological studies from other studies is that the unit analysis being studied is the group ...

  9. On Ecological Studies: A Short Communication

    Consequently, ecological studies may actually be more accurate than case control studies in estimating exposures, at least in regard to radon. Cohen (1990) also notes that the large amount of data available in the ecological design reduces the confounder effect to a level of the case control study that accounts for known exposures of confounders.

  10. Analyzing Ecological Data

    Each case study explores the statistical options most appropriate to the ecological questions being asked and will help the reader choose the best approach to analysing their own data. ... If you want an edited volume on different methods of ecological data analysis, then this book is worth looking through." (Times Higher Education, May, 2008)

  11. Ecological Analysis and Impacts of Health Interventions

    An ecological analysis is a way for scientists to look at large-scale impacts of time-specific interventions on population health. In these types of studies, researchers examine the health of a population before and after some time-specific event or intervention. For example, ecological analyses are often performed on data collected before and ...

  12. The Combination of Ecological and Case-Control Data

    Ecological and case-control data with a binary exposure, and the outcome stratified by a binary confounder in a generic area. In the study design we consider the ecological data consist of N1+0, N1+1, N and M1+, M+1, while the case-control data are n010, n110, n011, n111 and n00, n10, n01, n11. Ecological Z = 0.

  13. Ecological Studies (Correlational Studies)

    Ecological Studies (Correlational Studies) Ecologic studies assesses the overall frequency of disease in a series of populations and looks for a correlation with the average exposure in the populations. These studies are unique in that the analysis is not based on data on individuals. Instead, the data points are the average levels of exposure ...

  14. Identification and stability analysis of critical ecological land: Case

    The authors of the present study attempted to study environmental stability by introducing the concept of complex network for ecological space stability analysis. The potential ecological corridor transition lines were extracted based on critical ecological land patch transition points, and the importance of ecological nodes was evaluated ...

  15. An ecological study of socioeconomic predictors in detection of COVID

    Background New York City was the first major urban center of the COVID-19 pandemic in the USA. Cases are clustered in the city, with certain neighborhoods experiencing more cases than others. We investigate whether potential socioeconomic factors can explain between-neighborhood variation in the COVID-19 test positivity rate. Methods Data were collected from 177 Zip Code Tabulation Areas (ZCTA ...

  16. PDF A Review Of Ecological Assessment Case Studies From A Risk ...

    case studies were recently evaluated to provide further insight into the ecological risk assessment process. As with the original case studies, each of the five new case studies was evaluated by scientific experts at EPA-sponsored workshops. Two workshops were held in September 1992 (57 Federal

  17. Ecological vulnerability analysis: A river basin case study

    In this paper, the concept of ecosystem vulnerability is elaborated, with particular focus to aquatic ecosystems. A numeric "Vulnerability index" is developed, in order to evaluate the potential response of ecosystem features to multiple stressors. The index has been applied to the case study of two river ecosystems subject to different ...

  18. Institutional Ecological Footprint Analysis: A Case Study of the

    This short case study combines desk analysis of publicly available Australian University sustainability reports, strategy plans, compliance reports of energy and water, built environment ...

  19. Case Studies of Social-Ecological Systems

    This case represents the collaborative effort of an international team focused on studying how coastal regions may respond to climate change as part of the Multi-Scale Adaptations to Climate Change and Social-Ecological Sustainability in Coastal Areas (MAGIC) research project funded by a Belmont grant. The study involved three sites: Cornwall ...

  20. Institutional ecological footprint analysis ‐ A case study of the

    This paper develops an ecological footprint model for institutional contexts and this study of the University of Newcastle (NSW) is the first institutional level EFA undertaken in Australia. The case study shows tertiary institutions to be net importers of consumption items and thus dependent on a vast external environment.

  21. Ecological amplitude and indication potential of mining bees ...

    Park, Y. S. et al. Application of a self-organizing map to select representative species in multivariate analysis: A case study determining diatom distribution patterns across France. Ecol. Inf. 1 ...

  22. Spatio-Temporal Evolution Analysis of Land Use Change and ...

    Land use change in emerging nations raises landscape ecological risks (LERS), hastens the deterioration of urban and rural ecosystem services, endangers human w ... a Case Study of Chengdu Plain. 45 Pages Posted: 26 Apr 2024. See all articles by Yali Wei ... and the spatial and temporal evolution analysis from 2010-2040 was conducted to ...

  23. PCR bias in ecological analysis: a case study for quantitative Taq

    PCR bias in ecological analysis: a case study for quantitative Taq nuclease assays in analyses of microbial communities Appl Environ Microbiol. 2000 Nov;66(11):4945-53. doi: 10.1128/AEM.66.11.4945-4953.2000. ... In this study, we examined the performance of Taq nuclease assays (TNAs), PCR-based assays in which the amount of an amplicon is ...

  24. Hybrid speciation driven by multilocus introgression of ecological

    Genomic studies of Heliconius butterflies provide evidence that Heliconius elevatus is a hybrid species, and that its speciation was driven by introgression of traits from Heliconius ...

  25. Construction of Cultivated Land Ecological Network Based on Supply and

    The research on the ecological protection of cultivated land has gradually become a focus and frontier of cultivated land protection. Constructing an ecological network of a cultivated land system is important to improve the effect of cultivated land ecological protection. In this study, the supply-demand ratio of five ecosystem services was calculated from 2000 to 2020 in Shandong Province, a ...

  26. Ecological studies of COVID-19 and air pollution: How useful are they?

    A key limitation of ecological studies is the inability to identify and account for heterogeneity within the geographical areas used as the unit of analysis. 33 The bias may be so severe as to completely reverse the direction of the association, such as shown in a parallel analyses of ecological and individual-level case-control data of radon ...

  27. Integrating policy quantification analysis into ecological security

    @article{Yang2024IntegratingPQ, title={Integrating policy quantification analysis into ecological security pattern construction: A case study of Guangdong-Hong Kong-Macao Greater Bay Area}, author={Meng Yang and Ju He and Longyu Shi and Yingying Lv and Jingwen Li}, journal={Ecological Indicators}, year={2024}, url={https://api ...

  28. Estimating turbidity concentrations in highly dynamic rivers ...

    Turbidity is an essential biogeochemical parameter for water quality management because it shapes the physical landscape and regulates ecological systems. It varies spatially and temporally across large water bodies, but monitoring based on point-source field observations remains a difficult task in developing countries due to the need for logistics and costs. In this study, we present a novel ...

  29. Frontiers

    The decision to relocate the nation's capital from Jakarta is not without reason. Jakarta, the nation's capital, is regarded as less than ideal, with numerous issues such as flooding, air pollution, poor water quality, and political and environmental sustainability. This research will be based on the framework of ecological citizenship to investigate active citizens.

  30. Wide-Area Subsidence Monitoring and Analysis Using Time-Series ...

    Ground subsidence often occurs over a large area. Although traditional monitoring methods have high accuracy, they cannot perform wide-area ground deformation monitoring. Synthetic Aperture Radar (SAR) interferometry (InSAR) technology utilizes phase information in SAR images to extract surface deformation information in a low-cost, large-scale, high-precision, and high-efficiency manner.