Local epidemic and variant detection
During March–July 2020, the number of confirmed COVID-19 cases remained low, associated with the strict stay-at-home order. After the order was lifted in June 2020, the number of cases increased steeply during the summer of 2020, initiating the local epidemic (Fig. 1a). Since then, we have observed three epidemic waves, with peaks in November 2020, April 2021, and August 2021. By September 30th, 2021, the Puerto Rico Department of Health (PRDH) and the Centers for Disease Control and Prevention had reported over 181,599 confirmed cases26.
a Graphical representation of the number of daily SARS-CoV-2 cases confirmed by antigen tests (light blue) and molecular tests (dark blue) reported by the PRDH from March 2020 to September 30th, 2021, shown as a 21-day rolling average. Arrows indicate the timeline of government responses and vaccination milestones. b Proportion of all SARS-CoV-2 lineages and emerging variants detected in Puerto Rico and published in GISAID from March 2020 to September 30th, 2021 (n = 2514 sequences after filtering for high-coverage genomes with complete sampling dates). Non-VBM/VOC lineages (n > 63) are grouped under the label “Other”, except for local lineage B.1.588, which is shown separately because of its high frequency and its relevance to this study. No genomes were obtained during May 2020. c Proportion of all Delta sub-lineages detected in Puerto Rico and published in GISAID until September 30th, 2021 (n = 1360 genomes). Sub-lineages with more than 5% detection are each represented by an individual color, whereas sub-lineages with <5% detection are grouped under the label “Other”.
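The 21-day rolling average used for the case curve can be illustrated with a minimal Python sketch. It assumes a trailing window (the caption does not say whether the window is trailing or centered), and the function name and the toy counts are hypothetical, not data from the study.

```python
from collections import deque


def rolling_mean(daily_counts, window=21):
    """Trailing rolling mean of daily counts.

    Days with fewer than `window` observations so far yield None.
    The trailing alignment is an assumption made for illustration.
    """
    smoothed = []
    buffer = deque()
    running_total = 0
    for count in daily_counts:
        buffer.append(count)
        running_total += count
        if len(buffer) > window:
            # Drop the oldest day so the buffer holds exactly `window` days
            running_total -= buffer.popleft()
        smoothed.append(running_total / window if len(buffer) == window else None)
    return smoothed


# Hypothetical daily case counts, smoothed with a short window for readability
toy_counts = [0, 3, 6, 9, 12]
smoothed = rolling_mean(toy_counts, window=3)
```

A wider window such as 21 days suppresses weekly reporting artifacts at the cost of delaying the apparent timing of peaks.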
We conducted genomic surveillance for 19 months starting in March 2020. Each month, we sequenced SARS-CoV-2-positive specimens from recently symptomatic and asymptomatic patients residing in 63 of the 78 municipalities, covering all seven health regions of the island. The frequency of lineages detected was calculated periodically from all viral genomic sequences from Puerto Rico published in GISAID, including our sequence datasets, and reported to the PRDH to inform case investigations and surveillance (Fig. 1b). Our data include the genomes from the initial confirmed SARS-CoV-2 infections detected in March 2020, which involved three European tourists who arrived on the island on a cruise ship and eight residents with no declared recent travel history. PANGO lineage assignment after sequencing identified lineage A.1, a lineage predominant in Europe at the time, in the three infected travelers, whereas lineages B.1 and B.1.1 were identified in the infected residents. B.1x lineages predominated in the United States and the Americas. The initial phase of the epidemic was characterized by the detection of a wide diversity of B.1x lineages that circulated at low frequency for short periods of time, suggesting that the local epidemic was initiated by multiple introduction events. In August 2020, we detected the emergence of lineage B.1.588 in various municipalities of the island. Lineage B.1.588 rapidly became the predominant lineage in Puerto Rico, circulating at high frequency for ~4 months and causing the first epidemic wave in November 2020 (Fig. 1a, b). Circulation of lineage B.1.588 declined during the first wave of the epidemic in the winter of 2020, a season of local holiday festivities and frequent travel. During this season, the diversity of B.1x lineages increased, and the first emergent variants were detected on the island: VBM B.1.427/429 (Epsilon) in December 2020 and Alpha in January 2021, concordant with variant emergence in the United States (Fig. 1b). Concurrently, the first stage of the COVID-19 vaccination campaign in Puerto Rico started in mid-December 2020 for the elderly population and first responders. A steep reduction in confirmed cases was observed in the following months, despite the introduction of VBMs B.1.526 (Iota) and P.1/1.1 (Gamma) in February and March 2021, respectively, and the predominant circulation of Alpha in March 2021 (Fig. 1b). The second wave of the epidemic was observed in April 2021 with Alpha predominating (Fig. 1a, b). Though other emerging variants continued to be detected, their frequency of detection remained low, and Alpha predominated for ~3–4 months. The second stage of the COVID-19 vaccination campaign started in April 2021 for all adults and was immediately followed by a sharp decrease in confirmed cases, during a period in which ~50% of the population had received at least one dose of the vaccine26 (Fig. 1a). VOC B.1.617.2/AY.x (Delta) was first detected in June 2021, concordant with its emergence in the United States, and rapidly dominated transmission. During the same period, we detected the emergence of VBM B.1.621 (Mu), which caused a small local outbreak in the western part of the island, as well as a modest increase in Gamma infections (Fig. 1b). The third epidemic wave was observed in August 2021, coinciding with a summer of increased travel and the removal of local government-imposed restrictions on business indoor occupant capacity and public gatherings (Fig. 1a, b). During this period, most COVID-19 cases in Puerto Rico were caused by Delta, and ~18 Delta sub-lineages were detected on the island, with AY.3 as the most frequently sampled sub-lineage (Fig. 1c). These data provide further evidence of multiple importations during this period of the epidemic. By September 30th, 2021, a steep decrease in confirmed cases was observed, at which point more than 77% of the population had received at least one dose of the vaccine (Fig. 1a).
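As a rough illustration of how lineage frequencies like those in Fig. 1b and 1c can be tabulated, the Python sketch below computes per-month lineage proportions and collapses rare labels into “Other”. The function names, the record layout, and all counts are hypothetical; treating a share of exactly 5% as retained is also an assumption, since the figure legend only distinguishes “more than 5%” from “<5%”.

```python
from collections import Counter, defaultdict


def monthly_proportions(records):
    """Per-month lineage proportions from (month, lineage) pairs, one per genome."""
    counts = defaultdict(Counter)
    for month, lineage in records:
        counts[month][lineage] += 1
    return {
        month: {lineage: n / sum(c.values()) for lineage, n in c.items()}
        for month, c in counts.items()
    }


def collapse_rare(lineages, threshold=0.05, label="Other"):
    """Relabel lineages whose overall share falls below `threshold`."""
    total = len(lineages)
    counts = Counter(lineages)
    return [lin if counts[lin] / total >= threshold else label for lin in lineages]


# Toy records with made-up counts (not the study data)
records = [
    ("2020-09", "B.1.588"), ("2020-09", "B.1.588"),
    ("2020-09", "B.1"), ("2020-09", "B.1.1"),
    ("2021-04", "B.1.1.7"), ("2021-04", "B.1.1.7"),
    ("2021-04", "B.1.1.7"), ("2021-04", "P.1"),
]
props = monthly_proportions(records)

# Toy Delta sub-lineage calls: AY.44 is 1 of 40 genomes (2.5%), so it is collapsed
delta_calls = ["AY.3"] * 30 + ["AY.25"] * 9 + ["AY.44"] * 1
collapsed = Counter(collapse_rare(delta_calls))
```

Proportions computed this way reflect the genomes that were sampled and sequenced, so they are only as representative as the underlying sampling.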
Phylogenetic reconstruction of the local pandemic
This study generated 753 complete genomes from viruses sampled between March 2020 and September 2021. Our dataset was combined with 2611 publicly available genomes in GISAID to understand the emergence and spread of the viruses circulating in Puerto Rico in a global context. We reconstructed the local and regional epidemic using a time-calibrated phylogenetic tree inferred with maximum likelihood (Fig. 2). This global phylogenetic analysis estimated that the initial SARS-CoV-2 introductions occurred between February 19 and March 16, 2020. Most viral genomes from Puerto Rico descend from or are closely related to genomes from the United States. However, we were unable to determine the precise origin at the state level due to the limited sampling during the emergence period and subsequent low circulation. The resulting tree topology showed viral sequences from Puerto Rico scattered across the global tree, small short-lived monophyletic clusters, and larger monophyletic clusters that suggest sustained transmission of a particular genotype. Our analysis also showed the emergence and evolution of the SARS-CoV-2 variants detected in Puerto Rico. Multiple monophyletic clusters of Puerto Rican sequences were inferred within the clades formed by each emergent variant, and the size of the clades is proportional to the frequency of genomes sampled on the island (Figs. 1b and 2). The observed clustering patterns in the phylogenetic trees and the rapid increase in frequency following initial detection indicate multiple virus introductions with swift expansion across the island in a short period of time.
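The notion of monophyletic clusters of Puerto Rican sequences can be made concrete with a toy Python sketch. It is not the method used in this study: the tree is a hand-written nested tuple, tip names such as PR_1 and USA_1 are hypothetical, and node support and branch lengths are ignored. The function returns every maximal clade whose tips all come from the target location; singleton clusters correspond to sequences scattered among tips from other regions.

```python
def location_clusters(tree, is_target):
    """Return maximal clades whose tips all satisfy `is_target`.

    `tree` is a nested tuple of child nodes; leaves are tip-name strings.
    """
    clusters = []

    def visit(node):
        # Returns the tips under `node` if all are target tips, else None
        if isinstance(node, str):
            return [node] if is_target(node) else None
        child_results = [visit(child) for child in node]
        if all(result is not None for result in child_results):
            return [tip for result in child_results for tip in result]
        # Mixed node: each all-target child clade is maximal, so record it
        clusters.extend(result for result in child_results if result is not None)
        return None

    root_result = visit(tree)
    if root_result is not None:
        # The whole tree consists of target tips
        clusters.append(root_result)
    return clusters


# Hypothetical topology: one two-tip Puerto Rico clade and one isolated tip
toy_tree = (("PR_1", "PR_2"), ("USA_1", ("PR_3", "USA_2")), "USA_3")
clusters = location_clusters(toy_tree, lambda tip: tip.startswith("PR_"))
```

Under strong simplifying assumptions, the number of such clusters gives a crude indication of how many separate introductions a topology is compatible with; in practice, low node support and uneven sampling limit that reading.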
Maximum likelihood tree inferred with 3364 complete genomes including 753 viral genomes from Puerto Rico sampled between March 23rd, 2020 and September 30th, 2021 (red branches) combined with 2611 complete genomes retrieved from GISAID during the same period to provide a global backdrop with a higher focus on the Americas region. Node structure is supported by 1000 bootstrap replicates. Branches marked in red represent taxa from Puerto Rico. The outer ring is color-coded by region of origin. The inner wedges are color-coded to represent emerging variants of interest or concern. The phylogenetic tree is rooted in Wuhan/WH01/2019 and Wuhan/Hu-1/2019 reference genomes.
Detection and spread of autochthonous lineage B.1.588
During the initial phase of epidemic transmission, we detected the emergence of an autochthonous lineage, B.1.588, which rapidly spread across the island. Based on GISAID data and cov-lineages.org reports (https://cov-lineages.org/lineages), lineage B.1.588 was first detected in Puerto Rico on August 2nd, 2020 (sequence EPI_ISL_1168693). Initially, lineage B.1.588 circulated only in Puerto Rico, accounting for approximately half of the viruses sampled on the island in September 2020. B.1.588 quickly became the predominant lineage in Puerto Rico during the first epidemic wave, circulating for 4 months until it was replaced by the emergence of Alpha in January 2021 (Fig. 1b). This study sequenced 97 of the 115 B.1.588 genomes from Puerto Rico found in GISAID. To understand the emergence and spread of this lineage, we reconstructed a focused phylogenetic tree using maximum likelihood and Bayesian inference with 103 B.1.588 sequences from Puerto Rico, 58 B.1.588 sequences from the United States, and an additional set of 77 B.1 lineage sequences closely related to B.1.588 (Fig. 3). Our analysis estimated that lineage B.1.588 diverged from its parental lineage B.1 between May 21st, 2020, and July 16th, 2020, in Puerto Rico, after the appearance of two non-conservative mutations: T20I in the spike protein and M234I in the nucleocapsid protein. Subsequently, lineage B.1.588 spread broadly to the United States, mainly to New York, Texas, Florida, and California, where it circulated until May 2021 alongside a diversity of other lineages and variants. More than 990 B.1.588 genomes have been reported in the United States.
Phylogenetic reconstruction of monophyletic lineage B.1.588 using a Bayesian maximum clade credibility tree inferred with 239 complete genomes, including 130 genomes from Puerto Rico (103 B.1.588 genomes) sampled between July 2020 and March 2021. Node support was assessed by posterior probability. The gray circle represents B.1 viral genomes from Puerto Rico and the United States that cluster basal to the B.1.588 monophyletic lineage. Red circles at the tips represent viral genomes from Puerto Rico. The colored bar on the right of the tree indicates the region of origin of each taxon. The phylogenetic tree is rooted in Wuhan/WH01/2019 and Wuhan/Hu-1/2019 reference genomes.
Emergence of SARS-CoV-2 variants
VBM Alpha was first detected in Puerto Rico in January 2021, co-circulating with the locally predominant lineage B.1.588 and with other B lineages present at lower frequency (Fig. 1b). Notably, this VBM replaced the well-established autochthonous lineage B.1.588. The emergence and epidemiology of Alpha in Puerto Rico resembled the patterns observed in the United States, with rapid spread and a sharp increase in confirmed cases19,20,41. To understand the emergence and spread of Alpha in Puerto Rico, we inferred a focused maximum likelihood phylogenetic tree with all Alpha genomes in our dataset, in addition to a subset of other Alpha genomes from Puerto Rico, the United States, and a regional context backdrop (Fig. 4). The resulting inference estimated that the emergence of Alpha in Puerto Rico may have occurred between November 6th, 2020, and December 31st, 2020. The tree topology showed multiple monophyletic clusters of Puerto Rican sequences diverging across a period of 4–5 months of circulation. The larger clusters of Puerto Rican sequences suggest that local transmission of specific Alpha genotypes was sustained following multiple introduction events. Most of these clusters were associated with sequences from the United States, suggesting that multiple introductions occurred over a period of 5–6 months, propelling the local transmission of this variant. We also found a subset of Puerto Rican sequences associated with sequences from the Caribbean and the Americas, but low node support impaired the resolution of transmission patterns.
Phylogenetic reconstruction using a maximum likelihood tree inferred with 730 time-calibrated complete genomes, including 160 viral genomes from Puerto Rico and 570 contextual viral genomes from the United States and the Americas to provide a regional backdrop. Node structure is supported by 1000 bootstrap replicates. Sections of the tree shaded in red represent clusters of viral genomes from Puerto Rico, blue shades represent clusters of genomes from the United States, and gray shades represent clusters from other countries.
VOC Delta was first detected in Puerto Rico in June 2021, during a period when SARS-CoV-2 transmission was declining and the vaccination campaign was progressing rapidly (Fig. 1a, b). After its initial detection, Delta spread rapidly across the island (Fig. 1b). Over 30% of the COVID-19 cases sampled and sequenced in June 2021 were caused by Delta. This variant has been characterized broadly as the most dominant emerging variant, replacing most lineages and causing most COVID-19 cases in the United States and Puerto Rico from its emergence through November 2021. To understand the rapid emergence and spread of Delta on the island, we reconstructed a focused maximum likelihood phylogenetic tree with all Delta sequences from Puerto Rico in our dataset, supplemented with additional sequences from Puerto Rico and the United States retrieved from GISAID with collection dates between May 1st, 2021, and September 30th, 2021. According to our phylogenetic inference, the emergence of Delta in Puerto Rico may have occurred between April 15th and June 14th, 2021, potentially after one or multiple introductions. The precise origin of the introductions was challenging to resolve, considering that multiple sequences from Mexico, the United States, and the Caribbean cluster among the early sampled sequences from Puerto Rico with low node support (<75% bootstrap value) (Fig. 5). The first Delta lineage to be detected was B.1.617.2, which seems directly related to a small number of VBM B.1.617.1 (Kappa) sequences that clustered basal in the focused tree. The tree topology is similar to the patterns observed in the Alpha-focused tree, with more than 17 distinct clusters of sequences from Puerto Rico observed diverging across 4 months of circulation on the island. Most of these clusters were associated with distinct Delta sub-lineages and seem closely related to similar sequences from the United States and the Caribbean. These clustering patterns and the diversity of Delta sub-lineages detected suggest that multiple introductions throughout 5–6 months propelled the emergence and transmission of this variant on the island.
Phylogenetic reconstruction using a maximum likelihood tree inferred with 815 time-calibrated complete genomes, including 324 viral genomes from Puerto Rico and 491 contextual genomes from the United States and the Americas to provide a regional backdrop. Node structure is supported by 1000 bootstrap replicates. Sections of the tree shaded in red represent clusters of viral genomes from Puerto Rico. Each cluster from Puerto Rico is labeled with its cluster number C.x and Delta sub-lineage PANGO assignment [AY.x].