The Plant Cell. 2020 Jun 1;32(6):1771–1772. doi: 10.1105/tpc.20.00257

On the Importance of Variation: A High-Resolution Map of Copy Number Variants in Arabidopsis[OPEN]

Matthias Benoit
PMCID: PMC7268814  PMID: 32265264

Linking genotype to phenotype is a major challenge in plant biology. Phenotypic variation observed between individuals of the same plant species is the consequence of a vast array of genetic variation, including single-nucleotide polymorphisms (SNPs) and small or large structural variants including copy number variation (CNV). CNVs are genetic polymorphisms in which a segment of the genome differs in number between individuals. CNVs may result from deletions, insertions, duplications, and larger amplifications occurring at coding and noncoding genomic regions. CNVs are formed through diverse genetic mechanisms including unequal crossing over, DNA double-strand break repair, and transposon activity. CNVs may have facilitated the evolution and diversification of plant phenotypes, mostly through modification of gene dosage. Nevertheless, a population-scale analysis of CNVs to resolve their diversity and consequences on plant phenotypes is still lacking. In a new study, Zmienko et al. (2020) provide a population-scale map of DNA CNVs in Arabidopsis (Arabidopsis thaliana) in an effort to facilitate the exploration of the genetic determinants of phenotypic variation in the model plant.

Taking advantage of the short-read whole-genome sequencing data released through the 1001 Genomes Consortium (2016), the authors called CNVs on 1064 Arabidopsis accessions (Zmienko et al., 2020). CNVs were identified with different tools, based on read-depth, read-pair, split-read, or hybrid strategies. Through this combined approach, the authors uncovered 19,003 CNVs in the Arabidopsis genome. In parallel, 70,137 large insertions/deletions (indels) were called only by read-pair-based callers, to expand the repertoire of structural variation identified in Arabidopsis. Data are accessible at http://athcnv.ibch.poznan.pl through a user-friendly interface. After extensive benchmarking, analysis of the distribution and the genomic content of CNVs revealed an overlap with 18.3% of protein-coding genes, particularly enriched in defense- and stress-response genes. Most of the CNVs were located in genomic segments that are enriched in transposable elements (TEs) and associated with high levels of genomic rearrangements. In addition, TEs that overlap with CNVs tend to lie farther from genes than non-CNV TEs. Taken together, these two observations suggest that the distribution of CNVs is in part defined by the local genomic context (see figure).
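To make the read-depth strategy mentioned above more concrete, the following is a minimal sketch of the general idea, not the pipeline used by Zmienko et al. (2020). It assumes that per-window read depths have already been computed for one accession, that the median window depth represents the reference state, and that the reference state is two copies. The function names, the `ploidy` parameter, and the example depths are all hypothetical.

```python
from statistics import median


def estimate_copy_number(window_depths, ploidy=2):
    """Estimate an integer copy number per window from read depth.

    Assumption: the median depth across windows corresponds to the
    reference copy number (`ploidy`), so depth scales linearly with copies.
    """
    baseline = median(window_depths)
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    return [round(ploidy * depth / baseline) for depth in window_depths]


def call_cnv_segments(copy_numbers, ploidy=2):
    """Merge adjacent windows sharing the same non-reference copy number.

    Returns (start_index, end_index_exclusive, copy_number) tuples.
    """
    segments = []
    start = None
    for i, cn in enumerate(copy_numbers):
        # Close the open segment when the copy number changes.
        if start is not None and cn != copy_numbers[start]:
            segments.append((start, i, copy_numbers[start]))
            start = None
        # Open a new segment at any window that departs from the reference.
        if start is None and cn != ploidy:
            start = i
    if start is not None:
        segments.append((start, len(copy_numbers), copy_numbers[start]))
    return segments


# Hypothetical depths: a duplicated stretch followed by a deleted stretch.
depths = [30, 31, 29, 60, 62, 30, 0, 1, 30, 30]
copies = estimate_copy_number(depths)
segments = call_cnv_segments(copies)
```

Real read-depth callers additionally correct for GC content and mappability and apply statistical segmentation; read-pair and split-read strategies rely on alignment signatures that this sketch does not model.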


Genomic Distribution of the Newly Identified CNVs and Large Indels.

Circos plot showing the chromosomal distribution of CNVs and large indels identified by Zmienko et al. (2020). SNPs were identified by the 1001 Genomes Consortium (2016). CNVs and large indels are distributed throughout the chromosomes and are most abundant in pericentromeric regions. (Adapted from Zmienko et al. [2020], Figure 2.)

The authors then harnessed the newly identified gene-associated CNVs as markers to infer Arabidopsis population structure. Compared with the classical approach of using SNPs, the CNV-based approach performed better at identifying the global distribution of each accession but was less sensitive for detecting genetic subgroups. Use of CNVs as markers for population structure analysis thus represents a valuable complement to SNP markers. At the individual accession level, up to 26.9% of gene-associated CNVs showed variation within the population. The authors also investigated the consequences of change in gene copy number on phenotypic variation. Using CNV markers in a genome-wide association study, they found a strong association between resistance to Pseudomonas and change in the copy number of the RPS5 and RPM1 resistance genes. This example shows that CNVs can be efficiently used as markers for a genome-wide association study.
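The idea of using copy number as a marker in an association scan can be illustrated with a deliberately simplified sketch. It ranks markers by the Pearson correlation between per-accession copy number and a quantitative phenotype. This is an assumption-laden stand-in: genome-wide association studies normally use models that correct for population structure, and nothing here reproduces the authors' analysis. The marker names, phenotype values, and function names are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def rank_cnv_markers(copy_number_by_marker, phenotype):
    """Rank CNV markers by absolute correlation with the phenotype.

    `copy_number_by_marker` maps a marker name to the copy number
    observed in each accession, in the same order as `phenotype`.
    """
    scores = {
        marker: pearson_r(copies, phenotype)
        for marker, copies in copy_number_by_marker.items()
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical data for six accessions.
phenotype = [1.0, 1.2, 2.1, 1.9, 3.0, 3.1]
markers = {
    "cnv_a": [0, 0, 1, 1, 2, 2],  # copy number tracks the phenotype
    "cnv_b": [1, 2, 1, 2, 1, 2],  # copy number is unrelated to it
}
ranked = rank_cnv_markers(markers, phenotype)
```

In this toy input, `cnv_a` ranks first because its copy number rises with the phenotype, whereas `cnv_b` shows almost no correlation.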

The availability of a copy number variant map in Arabidopsis offers unprecedented perspectives for assessing the consequences of gene CNV on quantitative phenotypes in plants. The increasing availability and refinement of long-read DNA-sequencing technologies (Michael et al., 2018; Jiao and Schneeberger, 2020) will help to resolve particularly complex CNVs and to provide further insights into the consequences of structural variation on the evolution of plant phenotypes.

Footnotes

[OPEN]

Articles can be viewed without a subscription.

References

  1. Jiao W.B., Schneeberger K. (2020). Chromosome-level assemblies of multiple Arabidopsis genomes reveal hotspots of rearrangements with altered evolutionary dynamics. Nat. Commun. 11: 989.
  2. Michael T.P., Jupe F., Bemm F., Motley S.T., Sandoval J.P., Lanz C., Loudet O., Weigel D., Ecker J.R. (2018). High contiguity Arabidopsis thaliana genome assembly with a single nanopore flow cell. Nat. Commun. 9: 541.
  3. Zmienko A., Marszalek-Zenczak M., Wojciechowski P., Samelak-Czajka A., Luczak M., Kozlowski P., Karlowski W.M., Figlerowicz M. (2020). AthCNV: A map of DNA copy number variations in the Arabidopsis thaliana genome. Plant Cell 32: 1797–1819.
  4. 1001 Genomes Consortium. (2016). 1,135 genomes reveal the global pattern of polymorphism in Arabidopsis thaliana. Cell 166: 481–491.

Articles from The Plant Cell are provided here courtesy of Oxford University Press
