Einkorn genomics sheds light on the history of the oldest domesticated wheat

Nature (2023)

Einkorn (Triticum monococcum) was the first domesticated wheat species and was central to the birth of agriculture and the Neolithic Revolution in the Fertile Crescent around 10,000 years ago1,2. Here we generate and analyse 5.2-Gb genome assemblies for wild and domesticated einkorn, including completely assembled centromeres. Einkorn centromeres are highly dynamic, showing evidence of ancient and recent centromere shifts caused by structural rearrangements. Whole-genome sequencing analysis of a diversity panel uncovered the population structure and evolutionary history of einkorn, revealing complex patterns of hybridizations and introgressions after the dispersal of domesticated einkorn from the Fertile Crescent. We also show that around 1% of the A subgenome of modern bread wheat (Triticum aestivum) originates from einkorn. These resources and findings highlight the history of einkorn evolution and provide a basis to accelerate the genomics-assisted improvement of einkorn and bread wheat.

Einkorn (T. monococcum) was the first wheat species that humans domesticated, around 10,000 years ago in the Fertile Crescent, a region in the Near East that is frequently referred to as the Cradle of Civilization1,2. Wild einkorn was an ingredient of the oldest known bread-like products, baked by hunter-gatherers in present-day Jordan four millennia before the emergence of agriculture3. Einkorn had a key role in the establishment of agriculture in the Fertile Crescent and is the only diploid wheat species (2n = 2x = 14, AmAm genome) of which both wild and domesticated forms exist. A notable morphological difference between wild and domesticated einkorn is the grain dispersal system. Wild einkorn has a brittle rachis that facilitates seed dispersal, whereas the rachis of domesticated einkorn is non-brittle4. Einkorn is closely related to Triticum urartu, the donor of the A genome of tetraploid durum wheat (Triticum durum) and hexaploid bread wheat (T. aestivum)5. In contrast to T. urartu, wild and domesticated einkorn have a long history of cultivation and human selection under diverse environmental conditions, which makes einkorn a valuable source of genetic variation for wheat breeding. Multiple natural and artificial introgressions from einkorn into bread wheat that carry agriculturally important genes have been described. Population genetic analyses indicate that wild einkorn accessions cluster into three distinct groups (races α, β and γ) and point to a region around the Karacadağ mountains in southeastern Turkey as the site of einkorn domestication11,12,13,14,15,16,17.

Here we establish and analyse a comprehensive set of genomic resources for einkorn, including de novo annotated chromosome-scale reference assemblies of one wild and one domesticated einkorn accession, as well as whole-genome sequencing of an einkorn diversity panel. Our results unravel the complex evolutionary history of einkorn and offer insights into the genome dynamics of the Triticeae, including centromere structure, while establishing valuable resources that expand the genomic toolbox for wheat breeding.

We generated reference assemblies of two einkorn accessions using a combination of PacBio circular consensus sequencing, optical mapping and chromosome conformation capture (Extended Data Table 1, Supplementary Table 1 and Supplementary Fig. 1). TA10622 is a domesticated einkorn landrace (T. monococcum L. subsp. monococcum) with a non-brittle rachis that was collected in Albania in the early twentieth century. The wild einkorn accession TA299 (T. monococcum L. subsp. aegilopoides; race α) was collected during an expedition in 1972 in northern Iraq and has a brittle rachis. The integrity of the assemblies was verified using an einkorn genetic map (Supplementary Tables 2 and 3). We observed a high degree of collinearity between the two sets of pseudomolecules (Fig. 1 and Supplementary Fig. 2) and between the two einkorn assemblies and the A subgenome of bread wheat (Supplementary Fig. 3). The most obvious exceptions were the well-described rearrangements of bread wheat chromosome 4A, which underwent inversions and translocations in polyploid wheat. We annotated 32,230 and 32,090 high-confidence gene models on the 7 pseudomolecules of TA299 and TA10622, respectively (BUSCO scores of 99.2% for TA299 and 99.4% for TA10622) (Supplementary Tables 4 and 5).

30% missing) at the population level. In JoinMap, we removed identical markers (similarity = 1) and mapped only one marker of the identical pair. We grouped the markers using a minimum LOD of 6 and the markers were mapped using a regression mapping approach and the Kosambi function. The linkage maps were visualized using Mapchart (v.2.32; https://www.wur.nl/en/show/mapchart.htm). Linkage maps were constructed using this approach with both wild and domesticated einkorn assemblies.
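For orientation, the Kosambi function converts a recombination fraction r into a map distance that allows for crossover interference. The block below is a minimal sketch of the standard formula, d = 25 ln((1 + 2r)/(1 − 2r)) in centimorgans, together with its inverse. The function names are illustrative and are not part of JoinMap.

```python
import math


def kosambi_cm(r: float) -> float:
    """Kosambi map distance in centimorgans for a recombination fraction r."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5)")
    # d (Morgans) = 0.25 * ln((1 + 2r) / (1 - 2r)); multiply by 100 for cM.
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def kosambi_r(d_cm: float) -> float:
    """Inverse mapping: recombination fraction for a distance in centimorgans."""
    # r = 0.5 * tanh(2d), with d expressed in Morgans.
    return 0.5 * math.tanh(2.0 * d_cm / 100.0)
```

For small r the two scales nearly coincide (r = 0.10 corresponds to roughly 10.1 cM), and the distance grows without bound as r approaches 0.5.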

60.0 || MQ < 40.00 || MQRankSum < −12.5 || ReadPosRankSum < −8.0 || SOR > 3.0’. In total, 208,855,939 SNPs were called from 219 einkorn accessions. After quality control using VCFtools108 (v.0.1.17), the raw SNPs were filtered using GATK107 (v.4.1.8.0) and VCFtools108 (v.0.1.17) as follows: SNP clusters, defined as three or more SNPs located within 10 bp; low and high average SNP depth (4 ≤ DP ≥ 15); and SNPs located in the unanchored chromosome were removed. Moreover, one misclassified accession (TA574; initially was classified as γ) was removed on the basis of PCA and divergence analysis. Finally, only biallelic SNPs were retained for further analyses, representing a final VCF file of 121,459,674 SNPs (Supplementary Table 15). These SNPs were annotated using snpEff109 (v.5.0e) with TA299 HC gene models. The false-positive error rate of variant calling (percentage of polymorphic sites in a resequenced TA299 sample compared with the TA299 reference) was 0.008%, which is comparable to the error rates of other studies43,44,45,46 (Supplementary Fig. 19a). Variants were evenly distributed across the seven chromosomes, except for the centromeres that showed a marked reduction in variant densities due to reduced read mapping (Supplementary Fig. 19b, Supplementary Fig. 20 and Supplementary Table 16). Approximately 2.2% of the total SNPs were gene-proximal (2 kb upstream and downstream of a coding sequence). An additional 0.8% of the SNPs were located in introns and 0.5% in exons. Of the exonic SNPs, 317,023 (53.4%) were non-synonymous affecting 26,505 genes, of which 9,145 SNPs resulted in a disruption of coding sequences (premature stop codon) in 5,726 genes. Furthermore, 45.7% of the total SNPs (55,558,212 SNPs) represented rare variants with a minor allele frequency below 1% (Supplementary Fig. 19c and Supplementary Table 17). 
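The SNP-cluster criterion above (three or more SNPs within 10 bp) can be illustrated with a small sketch. It assumes sorted positions on a single chromosome and treats a cluster as any run of three consecutive SNPs whose first and last positions are at most 10 bp apart. This convention and the function name `flag_snp_clusters` are assumptions made for illustration; the exact window semantics of the GATK filter may differ.

```python
from typing import List


def flag_snp_clusters(positions: List[int], cluster_size: int = 3,
                      window: int = 10) -> List[bool]:
    """Return one flag per SNP: True if the SNP belongs to a cluster.

    positions must be sorted coordinates on one chromosome. A cluster is
    any run of `cluster_size` consecutive SNPs spanning at most `window` bp
    (an assumed convention, not necessarily the one GATK applies).
    """
    flags = [False] * len(positions)
    for i in range(len(positions) - cluster_size + 1):
        j = i + cluster_size - 1
        if positions[j] - positions[i] <= window:
            for k in range(i, j + 1):
                flags[k] = True
    return flags


# Hypothetical positions: the first three SNPs span 9 bp and form a cluster.
positions = [100, 104, 109, 250, 400, 405]
kept = [p for p, clustered in zip(positions, flag_snp_clusters(positions))
        if not clustered]
```

In this toy input only the three tightly spaced SNPs are flagged; the pair at 400 and 405 is kept because two SNPs do not reach the cluster size.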
Variant calling using the TA10622 assembly revealed very similar results on the basis of population divergence, PCA and nucleotide diversity (α, π = 0.0012; β, π = 0.0017; γ, π = 0.0022; domesticated, π = 0.0012; Supplementary Fig. 21a–c), confirming the high accuracy of variant calling and the independence of population structure analyses from which reference assembly is used. The SNP calling against the TA10622 reference assembly was used for the analyses presented in Extended Data Fig. 7a,b,e.

10% and 5% randomly sampled SNPs; total SNPs = 5,318,268). First, the genetic distances were computed using Euclidean distances with the ‘dist’ function in the stats R package. The distance matrix was converted to a phylo object using the R package ape and the tree was generated using the phyclus R package. For estimating individual ancestry coefficients, the R package LEA ‘snmf’ function was used with the entropy option and with 10 independent runs for each K (K is the number of putative ancestral populations) from K = 1 to K = 10 using the same SNP subset used to generate the phylogenetic tree. The cross-entropy value decreased with increasing K and reached a plateau starting from K = 6 (Supplementary Fig. 14).

13-fold coverage. We used the Illumina reads of TA4342-L96 (Sequence Read Archive: SRR21543761) as the parental control. We followed the MutMap protocol with minor modifications57. High-quality filtered reads were aligned to the T. monococcum accession TA10622 using BWA96. SAM files were converted into .bam files using SAMtools69. SAMtools (markdup option) was used to mark and remove PCR duplicates. Improperly mapped read pairs were removed from the .bam files retaining only concordantly aligned reads with MAPQ ≥ 30. The BCFtools mpileup tool was used for SNP calling70. SNPs were filtered on the basis of the following criteria: minQ ≥ 30, Fisher Strand (FS) > 40, mapping quality (MQ < 40), minDP > 3 and genotype quality (GQ < 20). SNPs within 10 bp proximity of indels were removed and only the biallelic SNPs were retained. SNP positions with an identical allele in both TA4342-L96 and the tin3 mutant bulk were treated as varietal SNPs and were removed from the analysis. SnpSift109 was used to select EMS-type (G/C to A/T) transitions from the VCF file. We considered the positions with a SNP index of ≥0.9 to be homozygous, whereas SNPs with an SNP index of <0.3 were removed, and the rest were considered to be heterozygous. We used the mutplot tool (https://github.com/VivianBailey/Mutplot) to calculate the average SNP index using a window size of 100 kb116. The average SNP index was plotted along the chromosomes using ggplot2117. SnpEff 5.0c (build 2020-11-25 14:23) was used to calculate the effect of the variants on genes.
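The SNP-index thresholds and windowed averaging described above can be sketched as follows. The sketch assumes the usual MutMap definition of the SNP index (reads carrying the mutant allele divided by total reads at the site) and uses non-overlapping 100-kb windows. The function names are hypothetical, and the Mutplot tool may bin or smooth the values differently.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def snp_index(alt_depth: int, total_depth: int) -> float:
    """Fraction of reads that carry the mutant (alternative) allele."""
    if total_depth <= 0:
        raise ValueError("total_depth must be positive")
    return alt_depth / total_depth


def classify(index: float) -> str:
    """Apply the thresholds from the text: >=0.9 homozygous, <0.3 removed."""
    if index >= 0.9:
        return "homozygous"
    if index < 0.3:
        return "removed"
    return "heterozygous"


def window_means(snps: Iterable[Tuple[int, float]],
                 window: int = 100_000) -> Dict[int, float]:
    """Mean SNP index per non-overlapping window, keyed by window start.

    SNPs with an index below 0.3 are dropped before averaging. The use of
    non-overlapping windows is an assumption made for this sketch.
    """
    bins: Dict[int, List[float]] = defaultdict(list)
    for pos, idx in snps:
        if classify(idx) != "removed":
            bins[(pos // window) * window].append(idx)
    return {start: sum(v) / len(v) for start, v in sorted(bins.items())}
```

A region linked to the causal mutation would be expected to show window means close to 1, whereas unlinked regions of a bulk should average around 0.5.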