PeerJ Computer Science. 2020 Jan 20;6:e251. doi: 10.7717/peerj-cs.251

RIdeogram: drawing SVG graphics to visualize and map genome-wide data on the idiograms

Zhaodong Hao 1,2, Dekang Lv 3, Ying Ge 3, Jisen Shi 1, Dolf Weijers 2, Guangchuang Yu 4, Jinhui Chen 1
Editor: Sebastian Ventura
PMCID: PMC7924719  PMID: 33816903

Abstract

Background

Owing to rapid advances in DNA sequencing technologies, the whole genomes of more and more species are becoming available at an increasing pace. For whole-genome analysis, idiograms provide a popular, intuitive and effective way to map and visualize genome-wide information, such as GC content, gene and repeat density, DNA methylation distribution and genomic synteny. However, most available software programs and web servers support only a few model species, such as human, mouse and fly, or have limited application scenarios. As more and more non-model species are sequenced and their chromosome-level assemblies become available, tools that can generate idiograms for a broad range of species and visualize more data types are needed to help better understand fundamental genome characteristics.

Results

The R package RIdeogram allows users to build high-quality idiograms of any species of interest. It can map continuous and discrete genome-wide data on the idiograms and visualize them as a heat map and as track labels, respectively.

Conclusion

Visualizing the mapping and comparison of genome-wide data allows users to quickly form a clear impression of chromosomal distribution patterns, making RIdeogram a useful tool for any researcher working with omics data.

Keywords: Genome, Chromosome, Idiogram, R package, Data visualization

Introduction

Recently, with the development of sequencing technologies, especially rapid advances in third-generation sequencing, including Pacific Biosciences (Eid et al., 2009) and Oxford Nanopore Technologies (Laver et al., 2015), as well as BioNano genome mapping (Cao et al., 2014) and high-throughput chromatin conformation capture sequencing (Dekker et al., 2002), the genomes of more and more species have been sequenced or upgraded to the chromosome level (Jiao & Schneeberger, 2017; Phillippy, 2017). Once a chromosome-level genome is complete, an overview of its characteristics, such as the distribution of genes and transposons across the sunflower genome (Badouin et al., 2017), can help to better understand the genome of a species.

An idiogram, also known as a karyotype, is defined as the phenotypic appearance of chromosomes in the nucleus of a eukaryotic cell, and idiograms have been widely used to visualize genome-wide data since the first web server, Idiographica, came online in 2007 (Kin & Ono, 2007). Dozens of tools have been developed for circular genome visualization, with the Perl-based tool Circos being the most widely used (Krzywinski et al., 2009; Parveen, Khurana & Kumar, 2019). In contrast, there are few alternatives for non-circular plots of whole-genome information on idiograms. Although a few R packages, such as GenomeGraphs (Durinck et al., 2009), ggbio (Yin, Cook & Lawrence, 2012), IdeoViz (Pai & Ren, 2014), chromPlot (Orostica & Verdugo, 2016) and chromDraw (Janecka & Lysak, 2016), and JavaScript libraries, such as Ideogram.js (Weitz et al., 2017) and karyotypeSVG (Prlic, 2017), have been developed for non-circular genome visualization, they are either limited to a few species and data visualization types or lack ample customization. More recently, two R packages with enhanced capabilities, karyoploteR (Gel & Serra, 2017) and chromoMap (Anand, 2019), have been developed.

However, one function that all these non-circular tools fail to provide, but Circos does, is visualizing the relationships between two or more species using Bezier curves on idiograms. This function is very useful because it makes genome-wide relationships more intuitive to interpret, especially when visualizing whole-genome duplication. Indeed, Circos is commonly used to show syntenic blocks in both inter- and intraspecies genome comparisons with Bezier curves (Hu et al., 2019; Wang et al., 2019). Thus, an R package is still lacking that performs non-circular genome visualization and can also visualize genome-wide relationships between two or more species using Bezier curves on idiograms.

Scalable Vector Graphics (SVG) is a language for describing two-dimensional graphics and images. SVG graphics are defined in an eXtensible Markup Language (XML) text file, which means that any text editor or drawing software can be used to create and edit them. Most R graphics packages are built on one of two graphics systems: the traditional graphics system or the grid graphics system. Here, we developed an R package, RIdeogram, that draws high-quality idiograms without species limitations and visualizes and maps whole-genome information on them, based on the SVG language. In addition, RIdeogram can show genome synteny with Bezier curves linking the syntenic blocks on idiograms.
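Because SVG is plain XML, an idiogram file can be edited programmatically as well as by hand. The minimal sketch below is not part of RIdeogram. It uses only Python's standard-library `xml.etree.ElementTree` to parse a one-rectangle SVG standing in for a chromosome, then recolors it. The `id`, coordinates and colors are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # write <svg>, not <ns0:svg>, on output

# A minimal SVG document: one rounded rectangle standing in for a chromosome.
svg_text = (
    f'<svg xmlns="{SVG_NS}" width="200" height="100">'
    '<rect id="chr1" x="10" y="10" width="20" height="80" rx="10" fill="#cccccc"/>'
    "</svg>"
)

# Because SVG is XML, a generic XML parser can read and modify it.
root = ET.fromstring(svg_text)
rect = root.find(f"{{{SVG_NS}}}rect")  # elements live in the SVG namespace
rect.set("fill", "#1f77b4")  # recolor the chromosome

edited = ET.tostring(root, encoding="unicode")
print(edited)
```

The same change could be made by hand in Inkscape or a text editor. The parser route is shown only to emphasize that the output is ordinary structured text.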

Description

The package RIdeogram is written in R (R Core Team, 2018), one of the most popular programming languages for statistical computing, data analytics and graphics. However, unlike most R graphics packages, RIdeogram is not built on any existing graphics system. Instead, we use the R environment to read the custom input files and calculate the positions of the drawing elements in a coordinate system. R then writes all element information into a text file in the XML format that the SVG language uses to define graphics. The currently implemented functions are listed in Table 1. There are three main functions in RIdeogram: GFFex, ideogram and convertSVG. Users can load the example data with the function data, or load custom data from local files with the base R function read.table. The function GFFex extracts information from a GFF3-format genome annotation file. The function ideogram then computes the information for all drawing elements from the input files and generates an A4-sized SVG file containing a vector graphic, which can be conveniently viewed and modified in Adobe Illustrator or Inkscape. Alternatively, users can use the function convertSVG to convert this SVG file into another image format (pdf, png, tiff or jpg) at a user-defined resolution, according to their practical requirements.
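The two-step workflow described above can be sketched as follows. First, element positions are computed in a coordinate system. Second, they are written out as SVG elements. This is a hypothetical Python illustration, not RIdeogram's R code. The function names `layout_chromosomes` and `boxes_to_svg` are assumptions, as are the page size, margins and the choice to scale every chromosome relative to the longest one.

```python
from dataclasses import dataclass


@dataclass
class Chromosome:
    name: str
    length: int  # chromosome length in base pairs


def layout_chromosomes(chroms, page_height=800.0, top=50.0, left=50.0,
                       spacing=40.0, width=15.0):
    """Place chromosomes side by side, scaled so the longest fills the usable height."""
    longest = max(c.length for c in chroms)
    scale = (page_height - 2 * top) / longest  # pixels per base pair
    boxes = []
    for i, c in enumerate(chroms):
        boxes.append({
            "name": c.name,
            "x": left + i * spacing,
            "y": top,
            "width": width,
            "height": c.length * scale,
        })
    return boxes, scale


def boxes_to_svg(boxes, page_width=600, page_height=800):
    """Serialize the computed boxes as SVG <rect> elements with rounded ends."""
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" '
             f'width="{page_width}" height="{page_height}">']
    for b in boxes:
        parts.append(
            f'<rect id="{b["name"]}" x="{b["x"]:.2f}" y="{b["y"]:.2f}" '
            f'width="{b["width"]:.2f}" height="{b["height"]:.2f}" '
            f'rx="{b["width"] / 2:.2f}" fill="none" stroke="black"/>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


if __name__ == "__main__":
    demo, _ = layout_chromosomes([Chromosome("chrA", 200), Chromosome("chrB", 100)])
    print(boxes_to_svg(demo))
```

Scaling against the longest chromosome keeps relative lengths comparable. A full layout would also reserve room for chromosome labels, legends and label tracks.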

Table 1. Functions contained in the package RIdeogram.

Function name Description
GFFex Extract information from a GFF3 format genome annotation file
ideogram Map and visualize the genome-wide data on the idiograms
convertSVG Convert the output file from the SVG format to a format of the user's choice
svg2tiff Convert the output file from the SVG format to the TIFF format
svg2pdf Convert the output file from the SVG format to the PDF format
svg2jpg Convert the output file from the SVG format to the JPG format
svg2png Convert the output file from the SVG format to the PNG format

In general, there are two types of data: continuous and discrete. RIdeogram treats continuous data, such as gene density across the whole genome in 1-Mb windows, as overlaid features and maps them on the idiograms, with dark and light colors representing high and low values, respectively. For discrete data, which are scattered throughout the genome, such as the chromosomal locations of members of a gene family, RIdeogram adds track labels next to the idiograms in one of three shapes (box, circle and triangle) to represent different characteristics of these members, such as the subclade to which each member belongs. Users can also combine shapes and colors to represent more than three distinct characteristic types. Furthermore, users can map continuous data as a heatmap, a line chart or an area chart along the idiograms. In addition, RIdeogram provides functions to visualize dual and ternary genome synteny using Bezier curves on the idiograms.
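The following sketch illustrates one way that continuous values can become dark or light colors. It linearly interpolates between a light and a dark color in RGB space after rescaling each value to the observed range. The helper names and default colors are assumptions for illustration; the light-yellow-to-navy range echoes Fig. S1. RIdeogram's own palettes and interpolation may differ.

```python
def hex_to_rgb(h):
    """'#rrggbb' -> (r, g, b) integers."""
    h = h.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def value_to_color(value, vmin, vmax, low="#ffffcc", high="#253494"):
    """Map a value in [vmin, vmax] to a color between `low` (light) and `high` (dark)."""
    t = 0.0 if vmax == vmin else (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)  # clamp values outside the range
    lo, hi = hex_to_rgb(low), hex_to_rgb(high)
    return rgb_to_hex(tuple(round(a + (b - a) * t) for a, b in zip(lo, hi)))


if __name__ == "__main__":
    gene_counts = [0, 12, 40, 135]  # e.g. genes per 1-Mb window
    lo_v, hi_v = min(gene_counts), max(gene_counts)
    print([value_to_color(v, lo_v, hi_v) for v in gene_counts])
```

Interpolating in RGB is the simplest choice. A perceptually uniform color space would give more even-looking steps, at the cost of a conversion routine.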

RIdeogram is available through CRAN (https://cran.r-project.org/web/packages/RIdeogram/) and is developed on GitHub (https://github.com/TickingClock1992/RIdeogram). Planned extensions and fixes are tracked on the issues page of the package’s GitHub repository. Functions we plan to implement in the next version include, but are not limited to, more types of data visualization along the idiograms, genome synteny visualization for more species, and enlarging user-specified genome regions to display detailed characteristics, as we gather more feedback from users.

Examples

Our first example uses the data contained in this package. After genome sequencing, assembly and annotation are complete, RIdeogram can give an overview of how genes are distributed across the whole genome. The example data contain the numbers of protein-coding genes in 1-Mb windows, which can be considered continuous data, and the positions of 500 randomly selected non-coding RNAs, including ribosomal RNAs (rRNAs), transfer RNAs (tRNAs) and microRNAs (miRNAs), which can be considered discrete data. RIdeogram maps the gene density on the idiograms as overlaid features in a heat map and adds track labels next to the idiograms, with green boxes, purple circles and orange triangles representing rRNAs, tRNAs and miRNAs, respectively (Fig. 1). Clearly, gene distribution is non-uniform both within and between chromosomes. For instance, the regions adjacent to the centromeres are gene-poor in chromosomes 1, 9 and 16 but gene-rich in chromosomes 11, 14 and 17. This function can be applied to many other situations, such as single nucleotide polymorphism (SNP) density and candidate markers (Fig. S1 & Data S1; original data from Li et al., 2019), DNA methylation dynamics and potentially activated genes (Fig. S2 & Data S2; original data from Huang et al., 2019), and transcription factor (TF) binding sites and candidate target genes (Fig. S3 & Data S3; original data from Shamimuzzaman & Vodkin, 2013).
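The gene-density track in this example is a per-window count. The sketch below shows how such counts might be derived from a GFF3 file, in the spirit of what GFFex is described as doing. The function name, the rule of assigning each feature to the window containing its start coordinate and the output layout are all assumptions, not GFFex's actual behavior.

```python
from collections import Counter


def count_features_per_window(gff_lines, chrom_lengths, feature="gene",
                              window=1_000_000):
    """Count GFF3 features of one type per fixed-size window.

    Each feature is assigned to the window containing its start coordinate
    (an assumed convention). Returns (chrom, window_start, window_end, count)
    tuples with 1-based inclusive coordinates, including empty windows.
    """
    counts = Counter()
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != feature:
            continue  # GFF3 data lines have 9 tab-separated columns
        seqid, start = fields[0], int(fields[3])
        if seqid in chrom_lengths:
            counts[(seqid, (start - 1) // window)] += 1

    rows = []
    for chrom, length in chrom_lengths.items():
        n_windows = (length + window - 1) // window  # ceiling division
        for w in range(n_windows):
            rows.append((chrom, w * window + 1,
                         min((w + 1) * window, length), counts[(chrom, w)]))
    return rows
```

Emitting empty windows explicitly (count 0) matters for a heat map, because a missing window and a gene-poor window should be drawn the same way.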

Figure 1. Gene distribution across the whole human genome.

The overlaid heatmap shows the gene density, and the track labels mark the loci of 500 randomly selected RNAs, consisting of rRNAs (green boxes), tRNAs (purple circles) and miRNAs (orange triangles), across the human genome. Annotation information was downloaded from the GENCODE website (https://www.gencodegenes.org).

Besides visualizing specific genome characteristics at the chromosome level, as shown in Fig. 1, RIdeogram can also compare two related genome features, such as gene and repeat density, which can provide important insight into how the chromosomal distribution patterns of the two features relate to each other. The example data in this package also contain the distribution of long terminal repeats (LTRs) across the human genome. Since transposable elements have been suggested to have potentially detrimental effects on gene expression (Hollister & Gaut, 2009), the distributions of genes and LTRs are expected to be opposite across the genome as a result of natural selection. As expected, regions with relatively high gene content usually have relatively low LTR density and vice versa (Fig. S4), indicating that LTRs seem to avoid inserting into gene-rich regions of the genome. A similar phenomenon was observed in the sunflower genome, where it was illustrated with two separate idiogram graphics, one showing the gene distribution and the other the LTR distribution (Badouin et al., 2017). RIdeogram can integrate these two graphics into one, making it much easier for researchers to interpret and for readers to understand. Apart from differences, this function can also show similarities, such as the similar genome-wide genetic diversity patterns of two geographical groups of the same species, using different label types (Fig. S5 & Data S4, Fig. S6; original data from Chen et al., 2019).
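The gene/LTR relationship is assessed visually here. If a single summary number is wanted, one option is a Spearman rank correlation between the per-window gene and LTR counts. RIdeogram does not compute this, and this article does not report it. A strongly negative value would be consistent with the opposite pattern in Fig. S4. The sketch below implements the correlation in plain Python, giving tied values their average rank.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the group of tied values
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    return pearson(average_ranks(x), average_ranks(y))
```

A rank correlation is used instead of Pearson's r on raw counts because per-window counts are typically skewed. Ranks are insensitive to a few very dense windows.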

In addition, RIdeogram can show syntenic comparisons between two or three genomes. Fig. 2 plots the syntenic blocks between each pair of species, which were identified with MCScan (Tang et al., 2008). In particular, a typical ancestral region in the basal angiosperm Amborella can be traced to up to two regions in Liriodendron and up to three regions in grape. No lineage-specific polyploidy event has been found in Amborella, and a whole-genome triplication has been detected in grape. Given these facts, it is reasonable to infer a single Liriodendron lineage-specific whole-genome duplication event (Chen et al., 2019). RIdeogram can also visualize a dual genome comparison, such as the synteny between the human and mouse genomes (Fig. S7 and Data S5). Compared with the autosomes, the syntenic blocks between the human and mouse X chromosomes cover almost the entirety of each X chromosome. This suggests a highly conserved syntenic relationship of the X chromosome within the eutherian mammalian lineage (Ross et al., 2005).
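A syntenic link of this kind can be drawn as a closed SVG path whose two long sides are cubic Bezier curves. The sketch below uses an assumed construction that is not taken from RIdeogram's source. The endpoints of a block on the upper and lower idiograms are given in pixel coordinates. Both control points of each curve sit on the horizontal midline between the two idiograms, so the ribbon leaves and enters each idiogram vertically. `bezier_point` evaluates the standard cubic Bezier formula to show where such a curve passes.

```python
def synteny_ribbon(a_start, a_end, b_start, b_end, y_a, y_b):
    """SVG path 'd' string for a ribbon linking [a_start, a_end] on the row at y_a
    to [b_start, b_end] on the row at y_b (all values in pixels)."""
    mid = (y_a + y_b) / 2  # control points lie on the midline between rows
    return (f"M {a_start} {y_a} L {a_end} {y_a} "
            f"C {a_end} {mid} {b_end} {mid} {b_end} {y_b} "
            f"L {b_start} {y_b} "
            f"C {b_start} {mid} {a_start} {mid} {a_start} {y_a} Z")


def bezier_point(p0, p1, p2, p3, t):
    """Point on a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1 - t
    x = u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1]
    return (x, y)


if __name__ == "__main__":
    d = synteny_ribbon(10, 20, 30, 50, 0, 100)
    print(f'<path d="{d}" fill="#cccccc" fill-opacity="0.6"/>')
```

Converting genomic coordinates to pixels (chromosome offset plus position times scale) is left out for brevity.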

Figure 2. Syntenic comparison of three plant genomes.

Genome synteny patterns show that a typical ancestral region in the basal angiosperm Amborella can be traced to up to two regions in Liriodendron and up to three regions in grape. Gray wedges in the background highlight major syntenic blocks spanning more than 30 genes between the genomes; one syntenic set is highlighted in color.

Conclusion

The RIdeogram package provides an efficient and effective way to build idiograms without species limitations and to map genome-wide information on them, for better visualizing and understanding the chromosomal distribution patterns of particular genomic features. The package can also visualize syntenic analyses between genomes. It is user-friendly and accessible to biologists without extensive programming expertise. Finally, RIdeogram can generate two types of images, vector graphics and bitmap files, both of high quality and meeting conventional requirements for direct use in presentations or journal publications.

Supplemental Information

Figure S1. The distribution of 200,481 SNPs selected for the pear array design.

The SNP markers are counted in 100-kb windows. Light yellow represents a low and navy blue a high SNP content (range 0–215). The red circles represent the SNP markers significantly associated with nine traits. The plot shows that these 200,481 SNPs, selected from the original 18.3 million SNPs, are evenly distributed and are suitable for further development of the pear array.

DOI: 10.7717/peerj-cs.251/supp-1
Figure S2. Fold changes of DNA methylation levels at hyper-DMRs along nine orange chromosomes during fruit ripening.

The average fold changes are calculated in 100-kb windows. Light orange represents a low and dark orange a high fold change of methylation (range 0.67–0.88). The green triangles represent genes located in regions with a fold change of DNA methylation greater than one. The plot shows that the alteration of DNA methylation at hyper-DMRs during fruit ripening is unevenly distributed across the whole orange genome, with an obvious enrichment in some specific regions, probably the centromeric heterochromatin regions.

DOI: 10.7717/peerj-cs.251/supp-2
Figure S3. The distribution of NAC binding sites and candidate genes potentially regulated by NAC during soybean seedling development.

The light- and dark-purple colors represent enriched peaks detected from the ChIP-Seq data with a low and high fold enrichment (range 2.52–14.23), respectively. Boxes and circles represent genes that are up- and down-regulated during soybean seedling development, respectively. Genes with no significant changes during soybean seedling development are represented by triangles. This plot shows a DNA-binding-site landscape for the NAC transcription factor and the potential target genes that are probably regulated by this transcription factor during soybean seedling development.

DOI: 10.7717/peerj-cs.251/supp-3
Figure S4. A comparison of chromosomal distribution of genes and LTRs in the human genome.

The gene number and LTR number are both counted in 1-Mb windows. Red represents the gene number (range 0–135 per Mb) and blue represents the LTR number (range 0–606 per Mb). Light and dark colors represent low and high content, respectively. This plot shows that genes and LTRs have opposite distribution patterns along the human chromosomes.

DOI: 10.7717/peerj-cs.251/supp-4
Figure S5. The distribution of genetic diversity within two different geographical Liriodendron groups along 19 chromosomes.

Distributions of nucleotide diversity (π) along the 19 Liriodendron chromosomes are plotted for accessions from western China (range 8.34 × 10⁻⁵–4.87 × 10⁻³) and eastern China (range 7.26 × 10⁻⁵–4.09 × 10⁻³). Nucleotide diversity in both groups is calculated in 2-Mb sliding windows with a 1-Mb step. The plot shows that nucleotide diversity across the whole Liriodendron genome follows the same pattern in the eastern and western China groups.
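The 2-Mb sliding windows with a 1-Mb step used here can be enumerated as below. This is a hypothetical helper. Two details are assumptions and are not necessarily what the original analysis used: coordinates are 1-based and inclusive, and the final window is truncated at the chromosome end.

```python
def sliding_windows(length, size=2_000_000, step=1_000_000):
    """Yield (start, end) windows, 1-based inclusive, covering a chromosome.

    Windows of `size` bp start every `step` bp; the last window is cut at
    `length` and iteration stops once a window reaches the chromosome end.
    """
    if length < 1 or size < 1 or step < 1:
        raise ValueError("length, size and step must be positive")
    start = 1
    while True:
        end = min(start + size - 1, length)
        yield (start, end)
        if end == length:
            break
        start += step
```

Because the step is half the window size, each position (away from the chromosome ends) falls into two windows, which smooths the resulting curve.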

DOI: 10.7717/peerj-cs.251/supp-5
Figure S6. The distribution of genetic differentiation between and diversity within two different geographical Liriodendron groups along 19 chromosomes.

Distributions of genetic differentiation (Fst) between the western and eastern China groups and of nucleotide diversity (π) among accessions from western and eastern China are plotted. Both genetic differentiation between and nucleotide diversity within the two groups are calculated in 2-Mb sliding windows with a 1-Mb step. The genetic differentiation distribution is mapped on the idiograms, while the genetic diversity distributions are mapped along the idiograms as line charts (a) and area charts (b).

DOI: 10.7717/peerj-cs.251/supp-6
Figure S7. Genome synteny between human and mouse.

Syntenic blocks were constructed using SynBuilder (http://bioinfo.konkuk.ac.kr/synteny_portal/htdocs/synteny_builder.php). The reference genomes for human and mouse were hg38 and mm10, respectively. The minimum size of a reference block was set to 150 kb.

DOI: 10.7717/peerj-cs.251/supp-7
Data S1. Data and code for visualizing pear SNP density across the whole genome.
DOI: 10.7717/peerj-cs.251/supp-8
Data S2. Data and code for visualizing DNA methylation dynamics during orange fruit ripening across the whole genome.
DOI: 10.7717/peerj-cs.251/supp-9
Data S3. Data and code for visualizing NAC binding sites during soybean seedling development across the whole genome.
DOI: 10.7717/peerj-cs.251/supp-10
Data S4. Data and code for visualizing genetic diversity between two Liriodendron groups across the whole genome.
DOI: 10.7717/peerj-cs.251/supp-11
Data S5. Data and code for visualizing genome synteny between human and mouse.
DOI: 10.7717/peerj-cs.251/supp-12
Article S1. Code for examples.
DOI: 10.7717/peerj-cs.251/supp-13

Acknowledgments

We thank Dr. Zhongjuan Zhang for her comments on the manuscript.

Funding Statement

This work was supported by the Key Research and Development Plan of Jiangsu Province (BE2017376), the Foundation of Jiangsu Forestry Bureau (LYKJ[2017]42), the Qinglan Project of Jiangsu Province and the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Contributor Information

Guangchuang Yu, Email: gcyu1@smu.edu.cn.

Jinhui Chen, Email: chenjh@njfu.edu.cn.

Additional Information and Declarations

Competing Interests

The authors declare there are no competing interests.

Author Contributions

Zhaodong Hao conceived and designed the experiments, performed the experiments, analyzed the data, performed the computation work, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Dekang Lv performed the experiments, performed the computation work, authored or reviewed drafts of the paper, and approved the final draft.

Ying Ge performed the experiments, performed the computation work, authored or reviewed drafts of the paper, typeset the code, and approved the final draft.

Jisen Shi and Dolf Weijers performed the experiments, authored or reviewed drafts of the paper, and approved the final draft.

Guangchuang Yu and Jinhui Chen conceived and designed the experiments, performed the computation work, authored or reviewed drafts of the paper, and approved the final draft.

Data Availability

The following information was supplied regarding data availability:

Data and code are available at GitHub: https://github.com/TickingClock1992/RIdeogram.

References

  • Anand L. chromoMap: interactive visualization and mapping of chromosomes. bioRxiv. 2019. doi: 10.1101/605600.
  • Badouin H, Gouzy J, Grassa CJ, Murat F, Staton SE, Cottret L, Lelandais-Briere C, Owens GL, Carrere S, Mayjonade B, Legrand L, Gill N, Kane NC, Bowers JE, Hubner S, Bellec A, Berard A, Berges H, Blanchet N, Boniface MC, Brunel D, Catrice O, Chaidir N, Claudel C, Donnadieu C, Faraut T, Fievet G, Helmstetter N, King M, Knapp SJ, Lai Z, Le Paslier MC, Lippi Y, Lorenzon L, Mandel JR, Marage G, Marchand G, Marquand E, Bret-Mestries E, Morien E, Nambeesan S, Nguyen T, Pegot-Espagnet P, Pouilly N, Raftis F, Sallet E, Schiex T, Thomas J, Vandecasteele C, Vares D, Vear F, Vautrin S, Crespi M, Mangin B, Burke JM, Salse J, Munos S, Vincourt P, Rieseberg LH, Langlade NB. The sunflower genome provides insights into oil metabolism, flowering and Asterid evolution. Nature. 2017;546:148–152. doi: 10.1038/nature22380.
  • Cao H, Hastie AR, Cao D, Lam ET, Sun Y, Huang H, Liu X, Lin L, Andrews W, Chan S, Huang S, Tong X, Requa M, Anantharaman T, Krogh A, Yang H, Cao H, Xu X. Rapid detection of structural variation in a human genome using nanochannel-based genome mapping technology. Gigascience. 2014;3:34. doi: 10.1186/2047-217X-3-34.
  • Chen J, Hao Z, Guang X, Zhao C, Wang P, Xue L, Zhu Q, Yang L, Sheng Y, Zhou Y, Xu H, Xie H, Long X, Zhang J, Wang Z, Shi M, Lu Y, Liu S, Guan L, Zhu Q, Yang L, Ge S, Cheng T, Laux T, Gao Q, Peng Y, Liu N, Yang S, Shi J. Liriodendron genome sheds light on angiosperm phylogeny and species-pair differentiation. Nature Plants. 2019;5:18–25. doi: 10.1038/s41477-018-0323-6.
  • Dekker J, Rippe K, Dekker M, Kleckner N. Capturing chromosome conformation. Science. 2002;295:1306–1311. doi: 10.1126/science.1067799.
  • Durinck S, Bullard J, Spellman PT, Dudoit S. GenomeGraphs: integrated genomic data visualization with R. BMC Bioinformatics. 2009;10:2. doi: 10.1186/1471-2105-10-2.
  • Eid J, Fehr A, Gray J, Luong K, Lyle J, Otto G, Peluso P, Rank D, Baybayan P, Bettman B, Bibillo A, Bjornson K, Chaudhuri B, Christians F, Cicero R, Clark S, Dalal R, Dewinter A, Dixon J, Foquet M, Gaertner A, Hardenbol P, Heiner C, Hester K, Holden D, Kearns G, Kong X, Kuse R, Lacroix Y, Lin S, Lundquist P, Ma C, Marks P, Maxham M, Murphy D, Park I, Pham T, Phillips M, Roy J, Sebra R, Shen G, Sorenson J, Tomaney A, Travers K, Trulson M, Vieceli J, Wegener J, Wu D, Yang A, Zaccarin D, Zhao P, Zhong F, Korlach J, Turner S. Real-time DNA sequencing from single polymerase molecules. Science. 2009;323:133–138. doi: 10.1126/science.1162986.
  • Gel B, Serra E. karyoploteR: an R/Bioconductor package to plot customizable genomes displaying arbitrary data. Bioinformatics. 2017;33:3088–3090. doi: 10.1093/bioinformatics/btx346.
  • Hollister JD, Gaut BS. Epigenetic silencing of transposable elements: a trade-off between reduced transposition and deleterious effects on neighboring gene expression. Genome Research. 2009;19:1419–1428. doi: 10.1101/gr.091678.109.
  • Hu L, Xu Z, Wang M, Fan R, Yuan D, Wu B, Wu H, Qin X, Yan L, Tan L, Sim S, Li W, Saski CA, Daniell H, Wendel JF, Lindsey K, Zhang X, Hao C, Jin S. The chromosome-scale reference genome of black pepper provides insight into piperine biosynthesis. Nature Communications. 2019;10:4702. doi: 10.1038/s41467-019-12607-6.
  • Huang H, Liu R, Niu Q, Tang K, Zhang B, Zhang H, Chen K, Zhu JK, Lang Z. Global increase in DNA methylation during orange fruit development and ripening. Proceedings of the National Academy of Sciences of the United States of America. 2019;116:1430–1436. doi: 10.1073/pnas.1815441116.
  • Janecka J, Lysak MA. chromDraw: an R package for visualization of linear and circular karyotypes. Chromosome Research. 2016;24:217–223. doi: 10.1007/s10577-015-9513-5.
  • Jiao WB, Schneeberger K. The impact of third generation genomic technologies on plant genome assembly. Current Opinion in Plant Biology. 2017;36:64–70. doi: 10.1016/j.pbi.2017.02.002.
  • Kin T, Ono Y. Idiographica: a general-purpose web application to build idiograms on-demand for human, mouse and rat. Bioinformatics. 2007;23:2945–2946. doi: 10.1093/bioinformatics/btm455.
  • Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA. Circos: an information aesthetic for comparative genomics. Genome Research. 2009;19:1639–1645. doi: 10.1101/gr.092759.109.
  • Laver T, Harrison J, O’Neill PA, Moore K, Farbos A, Paszkiewicz K, Studholme DJ. Assessing the performance of the Oxford Nanopore Technologies MinION. Biomolecular Detection and Quantification. 2015;3:1–8. doi: 10.1016/j.bdq.2015.02.001.
  • Li X, Singh J, Qin M, Li S, Zhang X, Zhang M, Khan A, Zhang S, Wu J. Development of an integrated 200K SNP genotyping array and application for genetic mapping, genome assembly improvement and genome wide association studies in pear (Pyrus). Plant Biotechnology Journal. 2019;17:1582–1594. doi: 10.1111/pbi.13085.
  • Orostica KY, Verdugo RA. chromPlot: visualization of genomic data in chromosomal context. Bioinformatics. 2016;32:2366–2368. doi: 10.1093/bioinformatics/btw137.
  • Pai S, Ren J. IdeoViz: plots data (continuous/discrete) along chromosomal ideogram. R package version 1.8.0. 2014.
  • Parveen A, Khurana S, Kumar A. Overview of genomic tools for circular visualization in the next-generation genomic sequencing era. Current Genomics. 2019;20:90–99. doi: 10.2174/1389202920666190314092044.
  • Phillippy AM. New advances in sequence assembly. Genome Research. 2017;27:xi–xiii. doi: 10.1101/gr.223057.117.
  • Prlic A. KaryotypeSVG—SVG based ideograms of chromosomes showing cytogenetic bands. Version 0.2.0. 2017. https://github.com/andreasprlic/karyotypeSVG
  • R Core Team. R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2018.
  • Ross MT, Grafham DV, Coffey AJ, Scherer S, McLay K, Muzny D, Platzer M, Howell GR, Burrows C, Bird CP, Frankish A, Lovell FL, Howe KL, Ashurst JL, Fulton RS, Sudbrak R, Wen GP, Jones MC, Hurles ME, Andrews TD, Scott CE, Searle S, Ramser J, Whittaker A, Deadman R, Carter NP, Hunt SE, Chen R, Cree A, Gunaratne P, Havlak P, Hodgson A, Metzker ML, Richards S, Scott G, Steffen D, Sodergren E, Wheeler DA, Worley KC, Ainscough R, Ambrose KD, Ansari-Lari MA, Aradhya S, Ashwell RIS, Babbage AK, Bagguley CL, Ballabio A, Banerjee R, Barker GE, Barlow KF, Barrett IP, Bates KN, Beare DM, Beasley H, Beasley O, Beck A, Bethel G, Blechschmidt K, Brady N, Bray-Allen S, Bridgeman AM, Brown AJ, Brown MJ, Bonnin D, Bruford EA, Buhay C, Burch P, Burford D, Burgess J, Burrill W, Burton J, Bye JM, Carder C, Carrel L, Chako J, Chapman JC, Chavez D, Chen E, Chen G, Chen Y, Chen ZJ, Chinault C, Ciccodicola A, Clark SY, Clarke G, Clee CM, Clegg S, Clerc-Blankenburg K, Clifford K, Cobley V, Cole CG, Conquer JS, Corby N, Connor RE, David R, Davies J, Davis C, Davis J, Delgado O, DeShazo D, Dhami P, Ding Y, Dinh H, Dodsworth S, Draper H, Dugan-Rocha S, Dunham A, Dunn M, Durbin KJ, Dutta I, Eades T, Ellwood M, Emery-Cohen A, Errington H, Evans KL, Faulkner L, Francis F, Frankland J, Fraser AE, Galgoczy P, Gilbert J, Gill R, Glockner G, Gregory SG, Gribble S, Griffiths C, Grocock R, Gu YH, Gwilliam R, Hamilton C, Hart EA, Hawes A, Heath PD, Heitmann K, Hennig S, Hernandez J, Hinzmann B, Ho S, Hoffs M, Howden PJ, Huckle EJ, Hume J, Hunt PJ, Hunt AR, Isherwood J, Jacob L, Johnson D, Jones S, Jong PJde, Joseph SS, Keenan S, Kelly S, Kershaw JK, Khan Z, Kioschis P, Klages S, Knights AJ, Kosiura A, Kovar-Smith C, Laird GK, Langford C, Lawlor S, Leversha M, Lewis L, Liu W, Lloyd C, Lloyd DM, Loulseged H, Loveland JE, Lovell JD, Lozado R, Lu J, Lyne R, Ma J, Maheshwari M, Matthews LH, McDowall J, McLaren S, McMurray A, Meidl P, Meitinger T, Milne S, Miner G, Mistry SL, Morgan M, Morris S, Muller I, Mullikin JC, Nguyen N, Nordsiek G, Nyakatura G, O’Dell CN, Okwuonu G, Palmer S, Pandian R, Parker D, Parrish J, Pasternak S, Patel D, Pearce AV, Pearson DM, Pelan SE, Perez L, Porter KM, Ramsey Y, Reichwald K, Rhodes S, Ridler KA, Schlessinger D, Schueler MG, Sehra HK, Shaw-Smith C, Shen H, Sheridan EM, Shownkeen R, Skuce CD, Smith ML, Sotheran EC, Steingruber HE, Steward CA, Storey R, Swann RM, Swarbreck D, Tabor PE, Taudien S, Taylor T, Teague B, Thomas K, Thorpe A, Timms K, Tracey A, Trevanion S, Tromans AC, d’Urso M, Verduzco D, Villasana D, Waldron L, Wall M, Wang QY, Warren J, Warry GL, Wei XH, West A, Whitehead SL, Whiteley MN, Wilkinson JE, Willey DL, Williams G, Williams L, Williamson A, Williamson H, Wilming L, Woodmansey RL, Wray PW, Yen J, Zhang JK, Zhou JL, Zoghbi H, Zorilla S, Buck D, Reinhardt R, Poustka A, Rosenthal A, Lehrach H, Meindl A, Minx PJ, Hillier LW, Willard HF, Wilson RK, Waterston RH, Rice CM, Vaudin M, Coulson A, Nelson DL, Weinstock G, Sulston JE, Durbin R, Hubbard T, Gibbs RA, Beck S, Rogers J, Bentley DR. The DNA sequence of the human X chromosome. Nature. 2005;434:325–337. doi: 10.1038/nature03440.
  • Shamimuzzaman M, Vodkin L. Genome-wide identification of binding sites for NAC and YABBY transcription factors and co-regulated genes during soybean seedling development by ChIP-Seq and RNA-Seq. BMC Genomics. 2013;14:477. doi: 10.1186/1471-2164-14-477.
  • Tang HB, Wang XY, Bowers JE, Ming R, Alam M, Paterson AH. Unraveling ancient hexaploidy through multiply-aligned angiosperm gene maps. Genome Research. 2008;18:1944–1954. doi: 10.1101/gr.080978.108.
  • Wang M, Tu L, Yuan D, Zhu, Shen C, Li J, Liu F, Pei L, Wang P, Zhao G, Ye Z, Huang H, Yan F, Ma Y, Zhang L, Liu M, You J, Yang Y, Liu Z, Huang F, Li B, Qiu P, Zhang Q, Zhu L, Jin S, Yang X, Min L, Li G, Chen LL, Zheng H, Lindsey K, Lin Z, Udall JA, Zhang X. Reference genome sequences of two cultivated allotetraploid cottons, Gossypium hirsutum and Gossypium barbadense. Nature Genetics. 2019;51:224–229. doi: 10.1038/s41588-018-0282-x.
  • Weitz EM, Pantano L, Zhu J, Upton B, Busby B. Viewing RNA-seq data on the entire human genome. F1000Res. 2017;6:596. doi: 10.12688/f1000research.9762.1.
  • Yin TF, Cook D, Lawrence M. ggbio: an R package for extending the grammar of graphics for genomic data. Genome Biology. 2012;13:R77. doi: 10.1186/gb-2012-13-8-r77.


Supplementary Materials

Figure S1. The distribution of 200,481 SNPs selected for the pear array design.

SNP markers are counted in 100-kb windows. Light yellow represents low and navy blue represents high SNP density (range 0–215 SNPs per window). Red circles mark SNP markers that are significantly associated with nine traits. The plot shows that the 200,481 SNPs selected from the original 18.3 million are uniformly distributed and are therefore suitable for further development of the pear array.
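The per-window counts behind a density heat map such as Figure S1 can be illustrated with a short Python sketch. It assumes SNP positions are given as (chromosome, 1-based position) pairs and chromosome lengths as a dictionary; the function name `count_per_window` is hypothetical and not part of RIdeogram. Empty windows are kept with a count of zero, because the color range in the figure starts at 0.

```python
def count_per_window(positions, chrom_lengths, window_size=100_000):
    """Count features (e.g. SNPs) in non-overlapping windows.

    positions: iterable of (chromosome, 1-based position) pairs (assumed format).
    chrom_lengths: {chromosome: length in bp}.
    Returns {(chromosome, start, end): count} with 1-based inclusive coordinates;
    the last window of each chromosome is truncated at the chromosome end.
    """
    counts = {}
    # Pre-fill every window with zero so that empty windows still appear in the track.
    for chrom, length in chrom_lengths.items():
        for start in range(1, length + 1, window_size):
            counts[(chrom, start, min(start + window_size - 1, length))] = 0
    # Assign each feature to the window that contains its position.
    for chrom, pos in positions:
        start = (pos - 1) // window_size * window_size + 1
        end = min(start + window_size - 1, chrom_lengths[chrom])
        counts[(chrom, start, end)] += 1
    return counts


snps = [("Chr1", 1), ("Chr1", 99_999), ("Chr1", 100_001), ("Chr2", 110_000)]
lengths = {"Chr1": 250_000, "Chr2": 120_000}
for window, n in count_per_window(snps, lengths).items():
    print(window, n)
```

Each (chromosome, start, end, count) row corresponds to one colored cell of the heat map; a 1-Mb track is obtained by changing `window_size`.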

DOI: 10.7717/peerj-cs.251/supp-1
Figure S2. Fold changes of DNA methylation levels at hyper-DMRs along nine orange chromosomes during fruit ripening.

Average fold changes are calculated in 100-kb windows. Light orange represents a low and dark orange a high fold change in methylation (range 0.67–0.88). Green triangles mark genes located in regions where the fold change in DNA methylation is greater than one. The plot shows that the change in DNA methylation at hyper-DMRs during fruit ripening is unevenly distributed across the orange genome, with clear enrichment in specific regions, probably centromeric heterochromatin.
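Figure S2 combines two steps: averaging a per-site value within 100-kb windows, then flagging genes that fall in windows above a threshold. The Python sketch below is a minimal illustration, assuming fold changes arrive as (chromosome, position, fold change) triples and genes as (id, chromosome, start) triples. Each gene is assigned to a window by its start coordinate only, a simplification that ignores genes spanning a window boundary; the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_per_window(values, window_size=100_000):
    """Average values in non-overlapping windows.

    values: iterable of (chromosome, 1-based position, value) triples.
    Returns {(chromosome, 0-based window index): mean value}.
    """
    buckets = defaultdict(list)
    for chrom, pos, value in values:
        buckets[(chrom, (pos - 1) // window_size)].append(value)
    return {key: mean(vals) for key, vals in buckets.items()}


def genes_in_high_windows(genes, window_means, threshold=1.0, window_size=100_000):
    """Return IDs of genes whose start lies in a window with mean value > threshold.

    Windows without any data are skipped rather than treated as zero.
    """
    selected = []
    for gene_id, chrom, start in genes:
        m = window_means.get((chrom, (start - 1) // window_size))
        if m is not None and m > threshold:
            selected.append(gene_id)
    return selected


fold_changes = [("chr1", 10, 1.5), ("chr1", 50_000, 1.0), ("chr1", 150_000, 0.5)]
windows = mean_per_window(fold_changes)
print(genes_in_high_windows([("g1", "chr1", 20_000), ("g2", "chr1", 160_000)], windows))
# ['g1']
```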

DOI: 10.7717/peerj-cs.251/supp-2
Figure S3. The distribution of NAC binding sites and candidate genes potentially regulated by NAC during soybean seedling development.

Light and dark purple represent ChIP-Seq peaks with low and high fold enrichment, respectively (range 2.52–14.23). Boxes and circles represent genes that are up- and down-regulated during soybean seedling development, respectively; triangles represent genes with no significant change. The plot shows the genome-wide landscape of NAC binding sites, together with potential target genes that this transcription factor may regulate during soybean seedling development.
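The color and marker conventions in Figure S3 amount to two mappings: a continuous value (fold enrichment) onto a light-to-dark gradient, and a discrete category (regulation status) onto a marker shape. The Python sketch below uses linear interpolation in RGB space. The two purple hex codes and the dictionary keys are illustrative assumptions, not the colors used for the published figure or RIdeogram's internal implementation.

```python
def hex_to_rgb(color):
    """Convert '#RRGGBB' to an (r, g, b) tuple of ints."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def interpolate_color(value, vmin, vmax, low="#E6D9F2", high="#4B0082"):
    """Map value in [vmin, vmax] linearly onto the low-to-high color gradient.

    Values outside the range are clamped; the default colors are hypothetical
    light and dark purples.
    """
    t = 0.0 if vmax == vmin else (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)
    rgb = (round(a + (b - a) * t) for a, b in zip(hex_to_rgb(low), hex_to_rgb(high)))
    return "#" + "".join(f"{c:02X}" for c in rgb)


# Marker shape per regulation status, following the Figure S3 caption.
SHAPE_BY_STATUS = {"up": "box", "down": "circle", "unchanged": "triangle"}

for enrichment, status in [(2.52, "up"), (14.23, "unchanged")]:
    print(interpolate_color(enrichment, 2.52, 14.23), SHAPE_BY_STATUS[status])
```

Interpolating in RGB is the simplest choice; perceptually uniform color spaces give smoother gradients but need an extra conversion step.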

DOI: 10.7717/peerj-cs.251/supp-3
Figure S4. A comparison of the chromosomal distributions of genes and LTRs in the human genome.

Genes and LTRs are both counted in 1-Mb windows. Red represents gene counts (range 0–135 per Mb) and blue represents LTR counts (range 0–606 per Mb); light and dark shades represent low and high counts, respectively. The plot shows that genes and LTRs have opposite distribution patterns along the human chromosomes.
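The observation that genes and LTRs show opposite patterns is read off the figure visually. One way to check it numerically, which the source does not report, is a Pearson correlation between the two per-window count vectors; a clearly negative coefficient would be consistent with the opposite patterns in the plot. A minimal Python sketch, assuming both count lists are aligned to the same 1-Mb windows; the counts in the usage lines are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical per-window counts, aligned window by window.
gene_counts = [120, 80, 30, 5]
ltr_counts = [40, 150, 300, 590]
print(round(pearson(gene_counts, ltr_counts), 3))
```

For skewed count data, a rank-based (Spearman) coefficient, computed by applying the same function to ranks, may be more robust.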

DOI: 10.7717/peerj-cs.251/supp-4
Figure S5. The distribution of genetic diversity within two geographically distinct Liriodendron groups along 19 chromosomes.

Nucleotide diversity (π) is plotted along the 19 Liriodendron chromosomes for accessions from western China (range 8.34 × 10⁻⁵–4.87 × 10⁻³) and eastern China (range 7.26 × 10⁻⁵–4.09 × 10⁻³). For both groups, nucleotide diversity is calculated in 2-Mb sliding windows with a 1-Mb step. The plot shows that nucleotide diversity varies along the Liriodendron genome with the same pattern in the western and eastern China groups.
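With 2-Mb windows and a 1-Mb step, consecutive windows overlap by half, so most positions contribute to two windows. The Python sketch below computes windowed nucleotide diversity under stated assumptions: biallelic sites are given as (position, alternate-allele count k, number of sampled chromosomes n), per-site diversity is 2k(n − k)/(n(n − 1)), and each window sum is divided by the full window length, so sites absent from the input are treated as monomorphic. These are common conventions, not necessarily how π was computed for the figure; all names are hypothetical.

```python
def sliding_windows(chrom_length, size=2_000_000, step=1_000_000):
    """Yield 1-based inclusive (start, end) windows; the last window may be shorter."""
    start = 1
    while True:
        end = min(start + size - 1, chrom_length)
        yield start, end
        if end >= chrom_length:
            break
        start += step


def site_pi(k, n):
    """Unbiased per-site diversity for k alternate alleles among n sampled chromosomes."""
    return 2 * k * (n - k) / (n * (n - 1))


def windowed_pi(sites, chrom_length, size=2_000_000, step=1_000_000):
    """Return [(start, end, pi per bp)] for sliding windows along one chromosome.

    sites: iterable of (1-based position, k, n); positions not listed are
    assumed monomorphic and still count toward the window length.
    """
    sites = list(sites)
    result = []
    for start, end in sliding_windows(chrom_length, size, step):
        total = sum(site_pi(k, n) for pos, k, n in sites if start <= pos <= end)
        result.append((start, end, total / (end - start + 1)))
    return result


# Toy example: a 5-Mb chromosome with three variable sites.
for row in windowed_pi([(500_000, 3, 20), (1_500_000, 10, 20), (4_200_000, 1, 20)], 5_000_000):
    print(row)
```

The linear scan over all sites per window keeps the sketch short; for genome-scale data, sorting the sites and locating window bounds with `bisect` avoids the quadratic cost.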

DOI: 10.7717/peerj-cs.251/supp-5
Figure S6. The distribution of genetic differentiation between, and genetic diversity within, two geographically distinct Liriodendron groups along 19 chromosomes.

Genetic differentiation (Fst) between the western and eastern China groups and nucleotide diversity (π) among accessions from each region are plotted. Both statistics are calculated in 2-Mb sliding windows with a 1-Mb step. Fst is mapped onto the idiograms, while nucleotide diversity is plotted alongside the idiograms as line charts (a) and area charts (b).
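The source does not state which Fst estimator was used. The Python sketch below assumes Hudson's estimator with per-site sample-size correction, combined across the sites of a window as a ratio of sums (numerators summed, denominators summed, then divided) rather than an average of per-site ratios, which is noisy for single sites. Inputs are assumed to be per-site alternate-allele frequencies p1, p2 and numbers of sampled chromosomes n1, n2 for the two groups; windows are passed in as (start, end) pairs. All names are hypothetical.

```python
def hudson_terms(p1, n1, p2, n2):
    """Numerator and denominator of Hudson's Fst for one biallelic site.

    p1, p2: alternate-allele frequencies; n1, n2: sampled chromosomes (each > 1).
    """
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den


def windowed_fst(sites, windows):
    """Ratio-of-sums Hudson Fst per window.

    sites: iterable of (1-based position, p1, n1, p2, n2).
    windows: iterable of 1-based inclusive (start, end) pairs.
    Returns [(start, end, fst or None)]; None marks windows with no informative sites.
    """
    sites = list(sites)
    result = []
    for start, end in windows:
        num_sum = den_sum = 0.0
        for pos, p1, n1, p2, n2 in sites:
            if start <= pos <= end:
                num, den = hudson_terms(p1, n1, p2, n2)
                num_sum += num
                den_sum += den
        result.append((start, end, num_sum / den_sum if den_sum > 0 else None))
    return result


# Toy example: one fixed difference and one site with equal frequencies.
windows = [(1, 2_000_000), (1_000_001, 3_000_000)]
sites = [(100, 1.0, 20, 0.0, 20), (1_500_000, 0.5, 20, 0.5, 20)]
print(windowed_fst(sites, windows))
```

Passing 2-Mb windows with a 1-Mb step gives the window layout described in the caption; small negative values are possible and are usually interpreted as no detectable differentiation.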

DOI: 10.7717/peerj-cs.251/supp-6
Figure S7. Genome synteny between human and mouse.

Syntenic blocks were constructed using SynBuilder (http://bioinfo.konkuk.ac.kr/synteny_portal/htdocs/synteny_builder.php). The reference genomes for human and mouse were hg38 and mm10, respectively, and the minimum reference block size was set to 150 kb.
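The 150-kb minimum reference block size is a setting used when building the blocks with SynBuilder. To illustrate what the threshold means, the Python sketch below applies the same cutoff to an already-built list of blocks. The `SyntenicBlock` fields and the 1-based inclusive coordinates are assumptions, not SynBuilder's output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyntenicBlock:
    """One syntenic block; field names and 1-based inclusive coordinates are assumed."""
    ref_chrom: str
    ref_start: int
    ref_end: int
    query_chrom: str
    query_start: int
    query_end: int

    @property
    def ref_length(self):
        return self.ref_end - self.ref_start + 1


def filter_blocks(blocks, min_ref_size=150_000):
    """Keep blocks whose span on the reference genome is at least min_ref_size bp."""
    return [b for b in blocks if b.ref_length >= min_ref_size]


blocks = [
    SyntenicBlock("chr1", 1, 150_000, "chr4", 10_001, 160_000),
    SyntenicBlock("chr1", 200_001, 349_999, "chr4", 300_001, 450_000),
]
print([b.ref_length for b in filter_blocks(blocks)])
# [150000]
```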

DOI: 10.7717/peerj-cs.251/supp-7
Data S1. Data and code for visualizing pear SNP density across the whole genome.
DOI: 10.7717/peerj-cs.251/supp-8
Data S2. Data and code for visualizing DNA methylation dynamics during orange fruit ripening across the whole genome.
DOI: 10.7717/peerj-cs.251/supp-9
Data S3. Data and code for visualizing NAC binding sites during soybean seedling development across the whole genome.
DOI: 10.7717/peerj-cs.251/supp-10
Data S4. Data and code for visualizing genetic diversity between two Liriodendron groups across the whole genome.
DOI: 10.7717/peerj-cs.251/supp-11
Data S5. Data and code for visualizing genome synteny between human and mouse.
DOI: 10.7717/peerj-cs.251/supp-12
Article S1. Code for the examples.
DOI: 10.7717/peerj-cs.251/supp-13

Data Availability Statement

The following information was supplied regarding data availability:

Data and code are available on GitHub: https://github.com/TickingClock1992/RIdeogram.

