Abstract
Genomics is both a sequence-based informatics science and a three-dimensional-structure-based material science. In practice, however, most genomics researchers use either sequence-based informatics approaches or three-dimensional-structure-based material science techniques, not both. This division is, at least in part, the result of historical developments rather than a fundamental necessity: the underlying computational tools, experimental techniques, and theoretical models were developed independently. The primary result presented here is a framework for unifying informatics- and physics-based data associated with DNA, nucleosomes, and chromatin. The framework is based on the mathematical representation of geometrically exact rods and the generalization of DNA basepair step parameters. Data unification enables researchers to integrate computational, experimental, and theoretical approaches for the study of chromatin biology. The framework can be implemented using model-view-controller design principles, existing genome browsers, and existing molecular visualization tools. We developed a minimal, web-based genome dashboard, G-Dash-min, and applied it to two simple examples as a proof of concept and to demonstrate the usefulness of data unification. Genome dashboards developed using the framework and design principles presented here are extensible and customizable and are therefore more broadly applicable than the examples presented. We expect a number of purpose-specific genome dashboards to emerge as a novel means of investigating structure-function relationships for genomes at scales ranging from basepairs to entire chromosomes and of generating, validating, and testing mechanistic hypotheses.
Significance
The genome dashboard framework provides a novel means of unifying one-dimensional sequence-based informatics, three-dimensional structure, and four-dimensional dynamics data obtained from computational, experimental, and theoretical approaches. Numerous applications and model-specific genome dashboards can be developed based on the proposed framework. G-Dash-min, a minimal, web-based implementation of a genome dashboard, demonstrates that data unification can be achieved in real time in a web application, at scales from basepairs to entire chromosomes.
Introduction
Chromatin is the biomaterial that contains the genome in all higher organisms. There is no consensus on the structure of chromatin (1), but there is a wealth of informatics and structure data available. From an informatics point of view, efforts such as the 1000 Genomes (2), ENCODE (3), and 4D Nucleome (4) projects provide sequence-based reference data. Coupling the reference data with next-generation sequencing and informatics analysis pipelines enables individual labs to conduct genome-wide association studies that link chromatin reprogramming with disease and altered gene expression as described, for example, in (5,60). Hi-C (6,7), Micro-C (8), and other chromosome conformation capture methods (9,10) provide distance constraints as a measure of the large-scale organization of chromatin structure. Super-resolution microscopy (11,12) provides optical visualization of three-dimensional (3D) structures at nanometer-scale resolution, and electron microscopy (13), NMR (14), and x-ray crystallography (15) provide ångström-scale resolution of individual nucleosomes and nucleosome arrays. Strategies for modeling chromatin structure are rapidly maturing (16, 17, 18, 19). The desire to merge computational and experimental approaches is recognized (20,21), but a significant challenge in chromatin structural biology is unifying these diverse data sets to advance our understanding of structure-function relationships and to validate genomic mechanisms of action.
Here, we categorize data according to the method typically used to display and analyze them. Sequence-based informatics is considered a one-dimensional (1D) representation of an arbitrary number of data sets. Contact and distance constraints are two-dimensional (2D) representations. X-ray, NMR, and super-resolution microscopy are 3D representations. Molecular modeling and dynamics are four-dimensional (4D) representations. There is a growing collection of computational tools that convert 2D data to 3D structures of chromatin (22), and computational models can promote 3D structures to dynamics or sampling data (four dimensions). But there remain few tools, other than ICM Web (23), that directly link sequence (1D) with chromatin structure (3D) or dynamics (4D). Thus, researchers using 1D sequence-based methods lack 3D and 4D structure and dynamics data, including steric and geometric constraints, in their analyses, and researchers using 3D and 4D computational and experimental methods miss the wealth of informatics data available in sequence-based data sets.
A dashboard, as in an automobile or airplane cockpit, is a console for managing data that also includes controls for navigating a physical world that appears in a window. A “genome dashboard” unifies informatics (1D), contact and two-angle representations (2D) (24), structure (3D), and dynamics (4D) data describing DNA, nucleosomes, and chromatin. For the purpose of developing such genome dashboards, we have identified a framework that unifies 1D and 3D representations and a general method for implementing it that supports data visualization and manipulation. The framework is bidirectional; i.e., it can map 3D representations to 1D and 1D representations to 3D.
The framework is based on mathematical representations of geometrically exact rods (25), presented below as The Model, followed by Design Considerations for implementing it using model-view-controller (MVC) software development principles. This design enables an off-the-shelf (OTS) strategy for assembling genome dashboards that is both extensible and portable. Two examples then demonstrate that G-Dash-min, a minimal web-based implementation of a genome dashboard, can function in real time to convert informatics data into physical structures and physical structures into informatics data.
Methods
Framework
The model
From the genome dashboard perspective, informatics is any data that map to DNA sequence as any number of 1D informatics tracks T(s). Generally speaking, next-generation sequencing relies on aligning experimental data with chromosome coordinates and on information theory for analysis. Physical structures include computational models and direct imaging by experiment. For computational models, energy functions and physical laws are employed for analysis. The energy functions are typically grouped into external and internal energies, $U = U_{\mathrm{ext}} + U_{\mathrm{int}}$, where $U_{\mathrm{ext}}$ captures through-space interactions and $U_{\mathrm{int}}$ captures local conformational and dynamical properties. Both types of energy functions require knowledge of material properties and geometry. Material properties such as van der Waals radii, dielectric properties, partial charge distributions, moments of inertia, mass, stiffness parameters, bond angles, etc. are all parameters associated with a specific physical model or force field. The model itself may employ atomic, coarse-grained, or even continuum approximations. In all cases, the physical structure may be expressed in a laboratory (external) or material (internal) reference frame. The structure itself may be obtained from theoretical or experimental techniques.
Our strategy for unification (merging data from different sources) is based on the idea that DNA is the common thread in chromatin structural biology. Unification is achieved through laboratory (Cartesian coordinate) and material (internal coordinate) representations of DNA as an oriented space curve, or simply a ribbon (Fig. 1). Because unification is based on geometric considerations, our strategy is independent of the parameters associated with a specific physical model. Associating an energy landscape with a physical structure requires one to choose a physical model, but an experimentally determined structure can be compared to informatics data without recourse to any such physical model. Our framework does not provide a model, but it also does not restrict the user’s choice of model. Thus, various implementations of the genome dashboard framework may support one or many models or rely solely on experimentally determined structure data.
Figure 1.
Unification is the process of merging data from different sources. Physical structures and informatics data are unified by mathematical representations of an oriented space curve in laboratory [r(s), D(s)] and material [ξ(s), κ(s)] reference frames. The conformation of a physical structure C(s) is associated with the laboratory frame, and informatics track data T(s) are associated with the material frame. Masks M(s) alter the material properties of DNA and may be expressed in either representation. Exchanging data between the laboratory and material frames unifies the physical structure and informatics data.
In a laboratory reference frame, a continuous ribbon has a centerline r(s) and unit-length directors d₁(s), d₂(s), d₃(s) embedded in the ribbon that capture its local orientation. The directors can be collected as the columns of a director frame matrix D = [d₁ d₂ d₃] (26). This matrix also transforms a representation in the material (internal) frame into a representation in the laboratory (Cartesian coordinate) frame: a vector with material-frame components v has laboratory-frame components Dv.
An equivalent description of the ribbon is based on the director frames themselves. This is a material reference frame description that captures the translations and rotations connecting one director frame to the next, represented here by [ξ(s), κ(s)], where ξ(s) describes translation along the ribbon and κ(s) describes the rotation (bending and twisting) of the directors, both expressed in the material frame. The two representations [r(s), D(s)] and [ξ(s), κ(s)] are equivalent descriptions of the conformation of an oriented space curve, denoted simply as C(s) for the Cartesian coordinate representation and T(s) when expressed in the material frame and interpreted as informatics tracks.
Converting between the [ξ(s), κ(s)] and [r(s), D(s)] representations requires either a differentiation or an integration, as expressed by the following equations, in which a prime denotes d/ds and [κ]× is the skew-symmetric matrix satisfying [κ]× v = κ × v for any vector v.
$$\boldsymbol{\xi}(s) = \mathbf{D}^{\mathsf{T}}(s)\,\mathbf{r}'(s), \qquad [\boldsymbol{\kappa}(s)]_{\times} = \mathbf{D}^{\mathsf{T}}(s)\,\mathbf{D}'(s) \tag{1}$$
$$\mathbf{r}(s) = \mathbf{r}(s_0) + \int_{s_0}^{s} \mathbf{D}(\sigma)\,\boldsymbol{\xi}(\sigma)\,d\sigma, \qquad \mathbf{D}'(s) = \mathbf{D}(s)\,[\boldsymbol{\kappa}(s)]_{\times} \tag{2}$$
Here, r′(s) is recognized as the unnormalized tangent to the ribbon expressed in the laboratory frame, and ξ(s) is the same vector expressed in the internal frame. D(s)κ(s) is recognized as the vector corresponding to the instantaneous axis of rotation of the director frames located along the ribbon at position s, as represented in the laboratory coordinate frame; equivalently, each director obeys dᵢ′ = (Dκ) × dᵢ. Discrete approximations to Eq. 1 and piecewise integration of Eq. 2 can be employed to obtain a collection of discrete director frames. Different models may require different numerical algorithms to achieve the required discretization. Numerical methods suitable for DNA, which are invariant to the choice of reading strand, are discussed below. Together, the material (internal) and laboratory (Cartesian coordinate) representations provide a basis for unifying informatics ([ξ(s), κ(s)]) and structure ([r(s), D(s)]) data.
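A minimal sketch of discrete analogs of Eqs. 1 and 2 may make the equivalence concrete. It assumes the simplest possible convention, not the midstep construction used by 3DNA or Curves+: each step is the rotation and displacement of frame i+1 expressed in frame i (differentiation becomes a difference), and rebuilding the chain accumulates those steps from a starting frame (integration becomes a running product and sum). The function names and the toy helix are hypothetical; frames are stored as an origin plus a 3 × 3 matrix whose columns are the directors.

```python
import math


def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def to_material(frames):
    """Lab -> material (cf. Eq. 1): for each neighboring pair of frames, the
    relative rotation D_i^T D_{i+1} and the displacement expressed in frame i."""
    steps = []
    for (r0, d0), (r1, d1) in zip(frames, frames[1:]):
        d0_t = transpose(d0)
        rotation = matmul(d0_t, d1)
        displacement = matvec(d0_t, [b - a for a, b in zip(r0, r1)])
        steps.append((rotation, displacement))
    return steps


def to_laboratory(steps, r_start, d_start):
    """Material -> lab (cf. Eq. 2): accumulate the steps from a starting frame."""
    frames = [(list(r_start), [row[:] for row in d_start])]
    for rotation, displacement in steps:
        r, d = frames[-1]
        offset = matvec(d, displacement)        # local displacement rotated into the lab frame
        frames.append(([a + b for a, b in zip(r, offset)], matmul(d, rotation)))
    return frames


def rot_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


# Hypothetical toy ribbon: a uniform helix with 36 degrees of twist and 3.4 units of rise per step.
twist, rise, radius = math.radians(36.0), 3.4, 1.0
helix = [([radius * math.cos(i * twist), radius * math.sin(i * twist), rise * i], rot_z(i * twist))
         for i in range(12)]

steps = to_material(helix)                   # 11 identical steps for a uniform helix
rebuilt = to_laboratory(steps, *helix[0])    # round trip back to origins and director frames
```

Because this convention anchors each step to its first frame, it is not reading-strand invariant; it only illustrates that the two representations carry the same information.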
DNA conformation C(s) is at best a basepair-discrete approximation to a continuous oriented space curve (27,28). Basepair step parameters (29,30) and associated algorithms provide established methods for describing double- and single-stranded DNA as a discrete, oriented space curve at atomic resolution. A sequence-specific, dinucleotide-accurate model of dsDNA in the material-frame (basepair step parameter) representation can be obtained from x-ray (31) or molecular dynamics (32,33) studies, and the corresponding laboratory-frame description (basepair origins and director frames) is obtained by integrating the step parameters. There are two widely used tools for basepair step parameter analysis. 3DNA (34) uses an Euler angle (E-A)-based method and employs a “RollTilt” approximation (35). Curves+ (36) uses an Euler-Rodrigues (E-R)-based method (37). Both methods use a midstep plane construction to ensure that the computed parameter values are not affected by the choice of reading strand. Mathematically, one must invert the signs of tilt and shift upon strand reversal to preserve the alignment of director frames with the DNA major and minor grooves (38).
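For intuition, the midstep construction can be written out in a few dozen lines. The sketch below is modeled on the 3DNA-style calculation cited above: bend both frames halfway about the hinge axis z₁ × z₂ so that their z axes coincide, build the midstep frame, measure twist about the common z axis, and split the bending angle into roll and tilt according to where the hinge points relative to the midstep y axis. It is our own simplified reimplementation under those assumptions, not 3DNA or Curves+ code; it ignores degenerate cases such as antiparallel z axes or a 180° twist and has not been validated against either program. Angles are in degrees, and distances are in the units of the input origins.

```python
import math


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]


def unit(a):
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]


def rotate(v, axis, angle):
    """Rodrigues rotation of v about the unit vector `axis` by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    k_cross_v, k_dot_v = cross(axis, v), dot(axis, v)
    return [v[i] * c + k_cross_v[i] * s + axis[i] * k_dot_v * (1.0 - c) for i in range(3)]


def signed_angle(a, b, axis):
    """Angle from a to b in degrees, positive by the right-hand rule about `axis`."""
    return math.degrees(math.atan2(dot(cross(a, b), axis), dot(a, b)))


def step_parameters(o1, axes1, o2, axes2):
    """Shift, slide, rise, tilt, roll, twist between two frames, each given as an
    origin and an (x, y, z) triple of unit axes."""
    x1, y1, z1 = axes1
    x2, y2, z2 = axes2
    gamma = math.acos(max(-1.0, min(1.0, dot(z1, z2))))   # total bending (roll-tilt) angle
    hinge = cross(z1, z2)
    if dot(hinge, hinge) < 1e-24:                         # parallel z axes: no bending
        hinge, gamma = list(x1), 0.0
    hinge = unit(hinge)
    # Bend each frame halfway toward the other about the hinge so that the z axes coincide.
    x1, y1 = rotate(x1, hinge, gamma / 2), rotate(y1, hinge, gamma / 2)
    x2, y2 = rotate(x2, hinge, -gamma / 2), rotate(y2, hinge, -gamma / 2)
    zm = unit(rotate(z1, hinge, gamma / 2))
    xm, ym = unit(add(x1, x2)), unit(add(y1, y2))          # midstep frame axes
    twist = signed_angle(y1, y2, zm)
    phi = math.radians(signed_angle(hinge, ym, zm))        # hinge direction relative to y_m
    d = [b - a for a, b in zip(o1, o2)]
    return {"shift": dot(d, xm), "slide": dot(d, ym), "rise": dot(d, zm),
            "tilt": math.degrees(gamma) * math.sin(phi),
            "roll": math.degrees(gamma) * math.cos(phi),
            "twist": twist}
```

Reading the same step from the complementary strand (negating the y and z axes of each frame and swapping the order of the two frames) should leave roll, twist, slide, and rise unchanged and invert the signs of tilt and shift, consistent with the strand-reversal rule described above.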
Basepair step parameter values obtained from the same DNA structure using the 3DNA and Curves+ methods are known to differ (39). Differences may arise from at least three sources. First, the assignment of director frames to basepairs may differ. However, the methods for assigning director frames are well defined for ideal pairing (30), so these differences typically occur only for significant deviations from ideal geometries. Second, differences are method dependent, and these have not been well studied: basepair step parameter values obtained from 3DNA and from Curves+ differ even when the director frames used for the calculations are identical, i.e., even when the first source of differences is eliminated. Thus, values obtained with one method should not, in general, be interchanged with values from the other. Finally, differences arise from implementation and usage; e.g., numerical precision during file read and write operations may differ. Nonetheless, both methods work well for all-atom representations of the pairing and stacking of basepairs in double-stranded DNA, with the caveat that neither provides information about the DNA backbone. Recent efforts now support proper reconstruction of the DNA backbone (40).
Chromatin, for our purposes, is a biomolecular structure composed of DNA and external agents such as histones that alter the material properties (conformation, dynamics, flexibility, energetics, chemical properties) of a contiguous length of DNA from s to s + n. We label any such external agent, including histones, as a mask M(s, s + n). Any number of identical or unique masks Mᵢ(sᵢ, sᵢ + nᵢ) may be associated with a sequence of DNA (sᵢ to sᵢ + nᵢ). With this approach, chromatin folding is an informatics problem of managing an inventory of masks. A geometric description of structure requires only knowledge of how masks alter conformation. More generally, a mask may alter the parameters associated with the energy functions for a specific physical model.
There are two strategies for activating the structural changes associated with a mask. The first operates in the material reference frame: the conformation of the masked DNA is given by a list of internal coordinates (step parameters) M(s, s + n) that spans n basepairs. As the name suggests, the mask replaces the values associated with free DNA over that span, and C_M(s) for this mask can then be calculated from Eq. 2, as discussed above. The second operates in the laboratory reference frame: the masked DNA is treated as a rigid entity, and M(s, s + n) consists of Cartesian coordinates and director frames, which can be converted to T_M(s) using Eq. 1, as discussed above. The Cartesian coordinate representation of M(s, s + n) requires only a single translation and rotation to position each rigidly masked element in the laboratory reference frame.
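A minimal sketch of the first (material-frame) strategy: start from free-DNA step parameters, let each mask overwrite a run of consecutive steps, and pass the result to an integrator such as Eq. 2. The `Mask` class, the indexing convention (a mask starting at step index `start` replaces `len(steps)` consecutive steps), and the free-DNA values are hypothetical illustrations; the sketch also assumes that the masks making up a single configuration may not overlap.

```python
from dataclasses import dataclass

# Hypothetical free-DNA step (shift, slide, rise, tilt, roll, twist); illustrative values only.
FREE_DNA_STEP = (0.0, 0.0, 3.4, 0.0, 0.0, 36.0)


@dataclass(frozen=True)
class Mask:
    name: str
    start: int       # index of the first step replaced by the mask
    steps: tuple     # step-parameter tuples that replace the free-DNA values


def apply_masks(n_steps, masks, default=FREE_DNA_STEP):
    """Material-frame masking: each mask overwrites a run of consecutive steps."""
    params = [default] * n_steps
    owner = [None] * n_steps                 # which mask has claimed each step
    for mask in masks:
        end = mask.start + len(mask.steps)
        if mask.start < 0 or end > n_steps:
            raise ValueError(f"{mask.name} extends past the DNA")
        for i, step in enumerate(mask.steps, start=mask.start):
            if owner[i] is not None:
                raise ValueError(f"{mask.name} overlaps {owner[i]} at step {i}")
            params[i], owner[i] = step, mask.name
    return params
```

A bent linker, for example, would be a mask whose steps carry nonzero roll; the returned list is then integrated exactly like unmasked DNA.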
In terms of masks, a nucleosome consists of a DNA superhelix and histones. Depending on the modeling strategy, the histones can be represented independently of DNA as a single entity (sphere, cylinder, ellipsoid), a collection of beads, or an all-atom model. Alternatively, the DNA and histones can be included in the nucleosome mask as a single entity (see Fig. S4). Docking individual histones or the complete histone octamer to a superhelix or placing entire nucleosomes between linkers can be achieved with the same methods and tools used for describing the relative rotations and translations of basepairs. However, the E-A and E-R methods (35,37) were developed and validated only for basepair-level discretization of DNA.
The above masking strategy for nucleosomes can be applied to any protein-DNA complex. The linker DNA connecting masks is often assumed to be free DNA, but in general even linker DNA may be described by masks, e.g., bent or less flexible linkers. Likewise, chemical modification of DNA (e.g., methylation, which changes the physicochemical properties of DNA but not its sequence) is a mask.
In the context of a genome dashboard, chromatin folding is an informatics problem of describing the unique masks and tracking their locations along a sequence of DNA and a 3D structure problem of assessing the validity of a chromatin fold. The masks can be developed and manipulated based on informatics or physical analyses. In this manner, genome dashboards are designed to enable users to efficiently define and navigate chromatin folding landscapes.
Design considerations
Any genome dashboard is a finite state machine that can be efficiently developed using MVC design principles ((41); Fig. 2). This approach ensures that a dashboard’s components are independent, replaceable, and extensible.
Figure 2.
MVC design. Model: laboratory-frame (centerline and director frame) and material-frame (step parameter) descriptions of DNA as the common thread, an inventory of masks Mᵢ(sᵢ, sᵢ + nᵢ), and procedures for converting between representations. View: a molecular visualization (MV) displays C(s), a genome browser (GB) displays T(s), and a control panel (CP) provides a graphical interface to the controller. G-Dash-min uses JSmol and Biodalliance for the MV and GB components, respectively. An OTS approach enables a genome dashboard to use any desired MVs and GBs. Controller: manages the exchange of data between model and views.
The “model” in the MVC schema is the data and related logic. For a genome dashboard, the model includes the laboratory-frame and material-frame representations of DNA as a discrete (or even continuous (42,43)) oriented space curve, the inventory of masks Mᵢ(sᵢ, sᵢ + nᵢ), any associated track data T(s), and procedures for converting between representations. In general, the rotations and translations associated with a mask may be large. If a mask is a rigid entity, this information can be leveraged to improve performance. For example, representing all 147 basepairs of DNA and the eight histones in a nucleosome by a single director frame, at the cost of one large step in the path of the DNA, replaces 147 director frames with one and thus reduces computational and data costs by approximately n × 146 frames, where n is the number of 147-basepair nucleosomes.
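The savings quoted above are easy to reproduce: each rigid nucleosome mask collapses 147 basepair frames into one, saving 146 frames. A one-function sketch (hypothetical name), assuming one director frame per basepair for all unmasked DNA:

```python
def director_frame_count(total_bp, n_nucleosomes, bp_per_nucleosome=147):
    """Frames needed when each nucleosome's wrapped DNA (and histones) is collapsed
    into one rigid-mask frame instead of one frame per basepair."""
    if n_nucleosomes * bp_per_nucleosome > total_bp:
        raise ValueError("the nucleosomes need more DNA than is available")
    return total_bp - n_nucleosomes * (bp_per_nucleosome - 1)
```

For instance, 2000 bp carrying 10 nucleosomes need 540 frames instead of 2000.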
The “view” in the MVC schema provides the user interface and renders data. A genome dashboard includes a 3D/4D molecular visualization (MV) for rendering the laboratory-frame representation C(s), a genome browser (GB) for rendering the material-frame representation T(s), and a control panel (CP) as a graphical user interface to the controller. A genome dashboard can be designed as a web application or a standalone application. For web applications, JavaScript-based MVs such as JSmol (44) and NGL Viewer (45) are well suited. For standalone applications, MVs such as VMD (46) and PyMOL (47) are well suited. Likewise, for web applications, the GB should be JavaScript based, like Biodalliance (48). For standalone applications, JBrowse (49) or other modern GBs may provide advantages.
The “controller” in the MVC schema manages the exchange of data between the view and the model. For a given genome dashboard, the MV and GB can be OTS elements, but the CP and controller are application specific. We expect that different instances of the genome dashboard concept will target different users and utilize different physical models or, in the case of purely experimental data, not even include a physical model. The controller enables the user to manage the physical models. Different strategies for managing the MV, GB, and CP will likely emerge, but the underlying model remains as described above.
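The division of labor among model, views, and controller can be sketched with a simple observer pattern. The class names below are hypothetical stand-ins, not G-Dash-min code (which uses JSmol and Biodalliance as its views); here the views only record what they would render, and the 147-bp nucleosome length is an assumption of the sketch.

```python
class GenomeModel:
    """Model: the mask inventory plus the views that must be told about changes."""

    def __init__(self):
        self.masks = {}        # mask name -> (start, n) in basepairs
        self._views = []

    def attach(self, view):
        self._views.append(view)

    def add_mask(self, name, start, n):
        self.masks[name] = (start, n)
        for view in self._views:     # push every change to every view
            view.refresh(self)


class TrackView:
    """Stand-in for a genome browser: renders masks as 1D intervals."""

    def __init__(self):
        self.rendered = []

    def refresh(self, model):
        self.rendered = sorted((s, s + n, name) for name, (s, n) in model.masks.items())


class StructureView:
    """Stand-in for a molecular viewer: would rebuild C(s); here it only counts masks."""

    def __init__(self):
        self.n_rigid_bodies = 0

    def refresh(self, model):
        self.n_rigid_bodies = len(model.masks)


class Controller:
    """Translates control-panel actions into model updates."""

    def __init__(self, model):
        self.model = model

    def on_place_nucleosome(self, start):
        self.model.add_mask(f"nuc@{start}", start, 147)
```

Because the views depend only on the model's state, either stand-in could be replaced by a different browser or viewer without touching the controller, which is the OTS property described above.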
Results
Based on the framework proposed above, we developed a minimal, web-based genome dashboard named “G-Dash-min.” Here, we demonstrate two examples using G-Dash-min to show how genome dashboards contribute to our understanding of biological function by the unification of informatics and physical structures.
Informatics to physical structure
A hormone response element is a specific DNA sequence 15 basepairs long. Selective binding of an activated hormone receptor to the hormone response element is a critical component of the hormone response mechanism; see Fig. 1–41 of (50). A variant of this gene regulatory mechanism is employed to control numerous physiologic functions in all higher organisms. To demonstrate the power of data unification achieved with our G-Dash-min application, we identified estrogen response elements (EREs) using ERE-Finder (51). These informatics data are displayed as the ERE Track in Fig. 3. All experimentally determined nucleosome positions for the human genome (52) are also displayed in Fig. 3 as the Nuc-Pos Track. These tracks provide locations for EREs and nucleosomes. The representation in the GB alone is insufficient to determine whether the locations are physically realizable. Nonetheless, these tracks are sufficient to identify several regions of interest. One of them is associated with chromosome coordinates chr6:168,131,722…168,132,130. Here, we find three overlapping nucleosomes and an ERE, an arrangement that could function as a classic switching mechanism. We explore this hypothesis with G-Dash-min by generating physical structures.
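Finding regions of interest such as this one reduces to interval queries between tracks. A minimal sketch, assuming half-open [start, end) basepair intervals and using hypothetical toy coordinates rather than the chr6 data; the function names are illustrative.

```python
def overlaps(a, b):
    """Half-open intervals (start, end) overlap if each starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]


def annotate(eres, nucleosomes):
    """For each ERE, list the nucleosomes that cover it completely or partially."""
    report = {}
    for name, ere in eres.items():
        full = [n for n, nuc in nucleosomes.items() if nuc[0] <= ere[0] and ere[1] <= nuc[1]]
        part = [n for n, nuc in nucleosomes.items() if overlaps(ere, nuc) and n not in full]
        report[name] = {"inside": full, "partial": part}
    return report


def overlapping_pairs(intervals):
    """Pairs of features that overlap each other, e.g. alternative nucleosome positions."""
    items = sorted(intervals.items(), key=lambda kv: kv[1][0])
    pairs = []
    for i, (n1, a) in enumerate(items):
        for n2, b in items[i + 1:]:
            if b[0] >= a[1]:
                break          # sorted by start: no later interval can overlap a
            pairs.append((n1, n2))
    return pairs
```

Overlapping nucleosome positions cannot be occupied simultaneously, so each non-overlapping subset defines one candidate configuration to build in 3D.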
Figure 3.
Colored boxes: C(s) and T(s) representations of two allowed states indicated by red and blue boxes, respectively. Upper boxes are T(s) representations of nucleosome positions (blue bars) and an ERE (red bar). Lower boxes are C(s) representations (small beads represent five basepairs; large beads represent histone octamers). Colored ellipses are the corresponding all-atom structures with the estrogen receptor DNA-binding domain docked to the DNA as in PDB: 1HCQ. (a) The ERE is located within a nucleosome, with the major groove facing inward. The receptor is prohibited from binding. (b) The ERE is located in a nucleosome-free region. Docking PDB: 1HCQ indicates that the ERE is physically accessible.
We first selected the single nucleosome shown at the bottom of the Nuc-Pos Track in Fig. 3 and generated a coarse-grained representation in G-Dash-min with the computing tools that also drive ICM Web (23). Mapping the ERE onto the coarse-grained physical structure, an informatics problem, provides the ERE’s location, but without knowledge of the major groove orientation one still cannot determine whether the ERE site is accessible for estrogen receptor binding. We therefore constructed an all-atom model in G-Dash-min using the basepair step parameter data generated by the ICM Web computing tools. The all-atom model shows that the major groove actually faces the histones. This prevents the estrogen receptor DNA-binding domain (Protein Data Bank, PDB: 1HCQ (53)) from binding to this region of the DNA major groove. To dock PDB: 1HCQ to the all-atom model, we downloaded the all-atom model from G-Dash-min and loaded it together with PDB: 1HCQ into VMD. A simple VMD script fits the DNA in PDB: 1HCQ to the DNA in the G-Dash-min all-atom model. (The models are provided as Data S1.)
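The question answered by the all-atom model can also be phrased geometrically. In the standard basepair reference frame, the x axis points toward the major groove, so the cosine between that axis and the direction from the basepair origin to the histone octamer center indicates whether the major groove faces the histones. A minimal sketch with a hypothetical function name; it is only a quick screen over director frames and does not replace docking PDB: 1HCQ as described above.

```python
import math


def major_groove_facing(bp_origin, bp_x_axis, octamer_center):
    """Cosine between a basepair's x axis (toward the major groove in the standard
    reference frame) and the direction from the basepair to the octamer center.
    Near +1: major groove faces the histones; near -1: it faces the solvent."""
    to_core = [c - o for c, o in zip(octamer_center, bp_origin)]
    norm = math.sqrt(sum(v * v for v in to_core)) * math.sqrt(sum(v * v for v in bp_x_axis))
    return sum(a * b for a, b in zip(bp_x_axis, to_core)) / norm
```

Applied to every basepair of an ERE, the sign pattern shows which stretches of the site expose their major groove, since the groove orientation alternates roughly once per helical turn.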
We used the same approach to model the two nucleosomes shown in the top of the Nuc-Pos Track in Fig. 3. Mapping the ERE location to the coarse-grained model suggests the ERE may be accessible to estrogen receptors. Using the same procedure and script as before, we find that PDB: 1HCQ can physically access this ERE with the nucleosome present. We point out that PDB: 1HCQ is only the DNA-binding domain of the estrogen receptor. There exist steric conflicts between the estrogen receptor DNA-binding domain and the histones, so this is not the complete story, but it strongly suggests this site as a candidate for a genetic switching mechanism.
With this example, we demonstrate with G-Dash-min how informatics is used to construct a physical structure that extends and validates the interpretation of the informatics data. We emphasize that any informatics track or combination of tracks can be used to inform the physical structure. All-atom and coarse-grained molecular mechanics can be used to further explore these structures. The choice of physical model depends on the exact question being posed.
Physical structure to informatics
Models of chromatin are rapidly maturing. As the models develop, there is increasing demand to capture biologic realism, including DNA sequence, experimentally determined nucleosome positions, states of chemical modification, etc. Without a genome dashboard, manually curating informatics data to build a “biologically inspired” model is a time-consuming and tedious task that informs the initial model but does not necessarily support the interpretation of modeling results. Genome dashboards enable any available informatics data to be easily associated with an existing physical model or experimentally determined structure to achieve a meaningful biological interpretation. Here, we import an HOXC mesoscale model generated by the Schlick lab (54) into G-Dash-min to demonstrate how informatics can be overlaid onto an existing physical model or structure.
In G-Dash-min, we provide an upload function that is specific to DiscoTech-based models (55). We upload the HOXC mesoscale model and convert it into C(s) and T(s) representations (Fig. 4). The HOXC model uses nine basepairs per bead and includes both the location and orientation of each bead and of each DiscoTech-based nucleosome; i.e., centerline positions and director frames are provided for both the ribbon and the masks. The DiscoTech nucleosomes are represented as masks consisting of a single director frame and a center atom. We calculate material-frame step parameters from the laboratory-frame positions and director frames provided for the DNA beads and masks. The resulting values are no longer DNA basepair step parameters; however, they still describe an oriented space curve or ribbon. We refer to them as generalized step parameters, use the same naming conventions as for the DNA parameters (tilt, roll, twist, shift, slide, and rise), and display them as informatics tracks in the GB. Twist and rise are displayed as green structural informatics tracks in Fig. 4 d. Because the generalized parameter values differ significantly from those associated with DNA, small-angle approximations are no longer valid.
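One way to see why small-angle approximations fail is the order dependence of finite rotations. Small-angle formulas treat rotations about different axes as commuting vectors that can simply be added; the sketch below measures how far Rx(a)Rz(a) differs from Rz(a)Rx(a). The difference grows roughly as a² for small a, so it is negligible for angles of a few degrees but substantial for the much larger angles that generalized steps can involve. The function names are illustrative.

```python
import math


def rx(t):
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rz(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def order_dependence(angle_deg):
    """Frobenius norm of Rx(a)Rz(a) - Rz(a)Rx(a); zero if rotation order did not matter."""
    t = math.radians(angle_deg)
    p, q = matmul(rx(t), rz(t)), matmul(rz(t), rx(t))
    return math.sqrt(sum((p[i][j] - q[i][j]) ** 2 for i in range(3) for j in range(3)))
```

For 2° rotations the norm is below 0.002, whereas for 60° rotations it is about 1.37, i.e., comparable to the size of the rotation matrices themselves.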
Figure 4.
(a) HOXC coarse-grained model of chromatin containing ∼55,000 basepairs of DNA and 284 nucleosomes. Uploading the HOXC model to G-Dash-min generates (b) a two-angle representation of the HOXC model, with the color bar indicating nucleosome index from red to blue; (c) a distance-distance matrix based on nucleosome centers of mass, with the color bar indicating internucleosome distance (darker is closer; black indicates distances less than 10 nm); and (d) structural informatics data. “Generalized Helical Parameter” (“Twist” and “Rise”) and nucleosome position (“Nucleosomes”) data are displayed alongside experimentally determined nucleosome positions (“Nuc-Pos”) and other informatics data (“Gencode”).
To determine whether the E-R and E-A algorithms are suitable for our generalized step parameters, we converted all six HOXC models reported in (54) from the C(s) to the T(s) representation and back to C(s). Using either the E-A or the E-R algorithm to compute generalized step parameter values and to convert between the laboratory-frame and material-frame representations, we obtain only small root mean-square deviation values between the original and reconstructed HOXC models. Our experience is that both the E-A and E-R methods are acceptable algorithms for implementing the model even when generalized step parameters are used. (All of the HOXC models and root mean-square deviation results are provided as Data S2.)
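The round-trip check can be summarized by a plain root mean-square deviation. This sketch assumes matched point lists that already share a coordinate frame (for example, when the reconstruction starts from the original first frame); if the rebuilt model were placed elsewhere, an optimal superposition such as the Kabsch algorithm would be needed first. The function name is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean-square deviation between matched points, without superposition."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    sq = sum((a - b) ** 2 for p, q in zip(coords_a, coords_b) for a, b in zip(p, q))
    return math.sqrt(sq / len(coords_a))
```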
The HOXC model was constructed for a specific sequence of DNA: HOXC10 of the annotated human genome assembly 38, which begins at chr12:53,985,065. However, the HOXC models provided do not contain sequence information because the model itself is sequence independent. Thus, whenever a DiscoTech model is uploaded into G-Dash-min, it must be aligned with sequence using the yellow sequence selection bar (Fig. 4). If sequence information is included in the model, the model can be automatically aligned to data in a GB. Once aligned, the structural informatics tracks enable us to compare the nucleosome positions used in the uploaded model (green nucleosomes track in Fig. 4) to experimentally determined nucleosome positions (blue Nuc-Pos Track in Fig. 4). It is clear that the two tracks differ. Resolving these differences promises to advance our understanding of both the experimental and modeling data.
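Aligning a sequence-independent model amounts to assigning a genomic interval to each model element in order. A minimal sketch, assuming each linker bead carries 9 bp (as in the HOXC model) and each nucleosome mask 147 bp (an assumption of the sketch), with the HOXC10 start coordinate above used as the user-chosen alignment point; the function name is hypothetical.

```python
def align_to_genome(elements, start, bp_per_bead=9, bp_per_nucleosome=147):
    """Assign inclusive genomic intervals to an ordered list of model elements.

    elements: labels "bead" (linker DNA) or "nuc" (nucleosome mask), in chain order.
    start: genomic coordinate of the first basepair of the first element."""
    size = {"bead": bp_per_bead, "nuc": bp_per_nucleosome}
    out, pos = [], start
    for label in elements:
        n = size[label]
        out.append((label, pos, pos + n - 1))
        pos += n
    return out


layout = align_to_genome(["bead", "bead", "nuc", "bead"], 53_985_065)
```

Once every element has an interval, model nucleosome positions can be written out as a track and compared directly with the Nuc-Pos Track.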
To complete our multidimensional representation of chromatin folding, we have incorporated 2D representations into G-Dash-min. Fig. 4 b is a two-angle plot, and Fig. 4 c is a distance-distance matrix plot. As with the structure tracks, these representations are automatically generated for structures containing sufficiently many (more than three) nucleosomes. The two-angle plot is a Woodcock equivalent (WE) plot (24,56). On these plots, α is the angle between the centers of three adjacent nucleosomes, and β is the dihedral rotation angle obtained from the centers of four adjacent nucleosomes. For this analysis, the reported α- and β-values are computed as if the linkers were straight, even if the linkers are not. For this reason, we have adopted the label WE plot. The distance-distance matrix reports the center-to-center distance between all nucleosomes. Unlike the exchange of data between the informatics (1D) and physical structures (3D), the WE plot data and distance-distance maps are one directional. The 2D representations are obtained from the 3D model but cannot be used to generate 3D structures. Methods exist for this purpose (22) but have not yet been implemented in G-Dash-min.
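The WE angles and the distance-distance matrix follow directly from nucleosome centers. The sketch below computes α as the angle at the middle of three consecutive centers and β as the dihedral angle of four consecutive centers, treating linkers as straight exactly as described above; the dihedral sign convention is one common choice. It assumes coordinates in nm so that the 10 nm contact cutoff from Fig. 4 c applies, and all names are hypothetical.

```python
import math


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]


def alpha(p0, p1, p2):
    """Angle in degrees at the middle nucleosome center p1."""
    u, v = sub(p0, p1), sub(p2, p1)
    c = dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))


def beta(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four consecutive nucleosome centers."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    b2_hat = [x / math.sqrt(dot(b2, b2)) for x in b2]
    return math.degrees(math.atan2(dot(cross(n1, n2), b2_hat), dot(n1, n2)))


def distance_matrix(centers):
    return [[math.dist(p, q) for q in centers] for p in centers]


def contacts(centers, cutoff_nm=10.0):
    """Index pairs (i < j) of nucleosome centers closer than the cutoff."""
    return [(i, j) for i in range(len(centers)) for j in range(i + 1, len(centers))
            if math.dist(centers[i], centers[j]) < cutoff_nm]
```

As the text notes, these quantities are computed from the 3D model only; nothing here allows the 3D structure to be recovered from them.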
Discussion
As a working example of a genome dashboard, G-Dash-min demonstrates that informatics and physical structures can be unified in a web-based application in real time. Our tests demonstrate that interactive usage can be achieved for systems containing 10,000–50,000 director frames that may correspond to entire basepairs or coarse-grained beads. The algorithms for converting informatics to physical structures and physical structures to informatics work in both directions for basepair-resolution structures using either the E-R (Curves+)- or E-A (3DNA)-based method. We believe our application to the HOXC model is the first demonstration that both the E-R and the E-A algorithms can also be utilized to calculate internal coordinate step parameters for coarse-grained models discretized well beyond the basepair level. We refer to such internal coordinates as “generalized step parameters” to emphasize that they are named, computed, and interpreted in the same way the DNA basepair step parameters are (Data S3). We have not addressed strategies for developing or validating a specific coarse-grain model or methods for embedding director frames. Rather, we demonstrated that existing conventions and tools can be repurposed to support the model in a genome dashboard. The advantage of this approach is that intuition and tools developed for one atomic or coarse-grained model will carry over to other models.
G-Dash-min also demonstrates that an OTS approach coupled with MVC principles can be used to efficiently develop genome dashboards that are customizable, extensible, and portable. G-Dash-min can generate atomic or coarse-grained models of DNA, nucleosomes, and chromatin by combining any experimentally or theoretically determined informatics data. Coupling G-Dash-min’s atomic modeling capabilities with high-performance, high-throughput workflows and with our library of nucleosome simulations, the Theoretical Molecular Biology Library at Louisiana Tech University (57), provides a software ecosystem for overnight comparative molecular dynamics simulations of nucleosomes (58). Such models are necessary for developing designer nucleosomes and assessing protein-DNA interactions in their native context (59).
The genome dashboard framework achieves unification of informatics and physical data. The remaining challenge is data formats. Genomics data have well-defined formats, but there are numerous formats for Cartesian coordinate data and no established conventions for representing director frame data or step parameters. We have demonstrated that the existing E-R and E-A algorithms can support multiscale and multidimensional modeling of DNA as the common thread in chromatin using generalized step parameters. We deliberately avoided energetic considerations and focused on structure. The genome dashboard user or developer must decide which energy model is most appropriate for a particular application. We thus expect many genome dashboards to be developed, each tailored for a specific use.
As described here, genome dashboards are designed for studying chromatin folding, but the model and framework presented here are not limited to chromatin, eukaryotes, or even DNA. The informatics data can be any data associated with a 1D indexing system, e.g., a protein sequence or a SMILES string. The structure data can be a slender body or any ordered collection of points in space that are linked to form an oriented space curve. The latter includes nucleosomes observed with super-resolution microscopy. If the sequence ordering of the nucleosomes in a microscopy image can be enumerated and the orientation of each nucleosome determined, it will be possible to unify 1D, 2D, 3D, and 4D representations of chromatin determined solely from experimental images and to investigate biological function in the same manner as shown in Figs. 3 and 4.
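Linking a position on the 1D index to the ordered structure reduces to an interval lookup when each director frame covers a contiguous run of index positions: one basepair, a coarse-grained bead spanning several basepairs, or a nucleosome. The sketch below assumes contiguous, non-overlapping spans given as a list of span lengths in 1D order; the function names and the 10/147/10 layout in the test are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Sequence


def frame_bounds(span_lengths: Sequence[int]) -> List[int]:
    """bounds[k] is the first 1D index covered by frame k; bounds[-1] is the total length."""
    if any(n <= 0 for n in span_lengths):
        raise ValueError("every frame must cover at least one index")
    return [0, *accumulate(span_lengths)]


def frames_for_interval(bounds: Sequence[int], start: int, end: int) -> List[int]:
    """Indices of the frames that overlap the half-open 1D interval [start, end)."""
    if not 0 <= start < end <= bounds[-1]:
        raise ValueError("interval outside the indexed sequence")
    first = bisect_right(bounds, start) - 1
    last = bisect_right(bounds, end - 1) - 1
    return list(range(first, last + 1))


def frame_of_index(bounds: Sequence[int], i: int) -> int:
    """Frame containing 1D index i (a basepair, residue, or SMILES atom position)."""
    return frames_for_interval(bounds, i, i + 1)[0]
```

The same lookup works in the reverse direction for highlighting: selecting frame k in a 3D view corresponds to the 1D interval [bounds[k], bounds[k + 1]).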
Author Contributions
Z.L. developed G-Dash-min as a web-based tool and prepared the manuscript. R.S. contributed mostly to all-atom modeling and helical parameter analysis. T.C.B. is responsible for the overall development of this manuscript and the G-Dash concept.
Acknowledgments
We thank Joohyun Kim and Jinghua Ge of the Center for Computation and Technology, Louisiana State University, and John Gentle of the Science Gateways Community Institute at the Texas Advanced Computing Center for assistance and advice. We thank the Schlick Lab at New York University for sharing coordinate data for the HOXC models.
Research reported in this publication was supported by an Institutional Development Award from the National Institute of General Medical Sciences of the National Institutes of Health under grant number P20 GM103424-18 and partially supported by the National Science Foundation through cooperative agreement OIA-1541079. T.C.B. is partially supported by the Hazel Stewart Garner Professorship fund through the Louisiana Board of Regents.
Editor: Wilma Olson.
Footnotes
Supporting Material can be found online at https://doi.org/10.1016/j.bpj.2020.02.018.
Supporting Material
The files included are bottom.pdb and top.pdb, corresponding to models a) and b), respectively, shown in Fig. 3 of the manuscript.
There are six models: HOXC, Life-like, Life-like-Ac, Life-like-LH, uniformNFR, and uniformNRL. Detailed descriptions of these models can be found in (54). HOXC.zip contains three representations of each model in xyz file format. Files with only the “.xyz” suffix contain the director frames that we extracted directly from the HOXC models provided to us. For DNA, CA is the center atom of each director frame, and H1, H2, and H3 represent the directors. For nucleosomes, O is the center atom of each director frame, and OH1, OH2, and OH3 represent the directors. Files with the “.E-R.xyz” suffix were converted from “xyz” director frames to the “generalized step parameter” representation and then back to “xyz” director frames using the E-R method. Files with the “.E-A.xyz” suffix were converted in the same way using the E-A method. RMSD values between the initial model and the “E-R” and “E-A” reconstructed models are shown in Table S1.
The file naming convention indicates whether the E-A (35) (“x3dna_” filenames) or E-R (37) (“curves_” filenames) algorithm has been used to convert the director frames (“_rd_” filenames) to generalized step parameters (“_hp_” filenames).
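Table S1 reports RMSD values between the original and reconstructed models. The supplement does not state whether the models were superposed before comparison, so the sketch below offers both a plain RMSD (appropriate if the reconstruction starts from the same first frame) and a Kabsch-superposed RMSD. It also assumes the HOXC xyz files follow the standard XYZ layout (atom count, comment line, then one `name x y z` row per atom) with CA or O rows marking frame centers, as described above; the function names are hypothetical.

```python
import numpy as np


def plain_rmsd(P, Q):
    """RMSD between two (N, 3) point sets with no superposition."""
    diff = np.asarray(P, dtype=float) - np.asarray(Q, dtype=float)
    return float(np.sqrt((diff ** 2).sum() / len(diff)))


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) point sets after optimal rigid superposition (Kabsch)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against an improper rotation (reflection)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def read_xyz_centers(lines, center_names=("CA", "O")):
    """Frame centers from standard XYZ text; assumes the atom name is the first column."""
    n = int(lines[0].split()[0])
    centers = []
    for row in lines[2:2 + n]:
        name, x, y, z = row.split()[:4]
        if name in center_names:
            centers.append((float(x), float(y), float(z)))
    return np.array(centers)
```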
References
- 1. Fussner E., Ching R.W., Bazett-Jones D.P. Living without 30nm chromatin fibers. Trends Biochem. Sci. 2011;36:1–6. doi: 10.1016/j.tibs.2010.09.002.
- 2. Auton A., Brooks L.D., Abecasis G.R., 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature. 2015;526:68–74. doi: 10.1038/nature15393.
- 3. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. doi: 10.1038/nature11247.
- 4. Dekker J., Belmont A.S., Zhong S., 4D Nucleome Network. The 4D nucleome project. Nature. 2017;549:219–226. doi: 10.1038/nature23884.
- 5. Cowper-Sal·lari R., Zhang X., Lupien M. Breast cancer risk-associated SNPs modulate the affinity of chromatin for FOXA1 and alter gene expression. Nat. Genet. 2012;44:1191–1198. doi: 10.1038/ng.2416.
- 6. Belton J.-M., McCord R.P., Dekker J. Hi-C: a comprehensive technique to capture the conformation of genomes. Methods. 2012;58:268–276. doi: 10.1016/j.ymeth.2012.05.001.
- 7. Belaghzal H., Dekker J., Gibcus J.H. Hi-C 2.0: an optimized Hi-C procedure for high-resolution genome-wide mapping of chromosome conformation. Methods. 2017;123:56–65. doi: 10.1016/j.ymeth.2017.04.004.
- 8. Hsieh T.S., Fudenberg G., Rando O.J. Micro-C XL: assaying chromosome conformation from the nucleosome to the entire genome. Nat. Methods. 2016;13:1009–1011. doi: 10.1038/nmeth.4025.
- 9. Dekker J., Rippe K., Kleckner N. Capturing chromosome conformation. Science. 2002;295:1306–1311. doi: 10.1126/science.1067799.
- 10. Sati S., Cavalli G. Chromosome conformation capture technologies and their impact in understanding genome function. Chromosoma. 2017;126:33–44. doi: 10.1007/s00412-016-0593-6.
- 11. Duim W.C., Jiang Y., Moerner W.E. Super-resolution fluorescence of huntingtin reveals growth of globular species into short fibers and coexistence of distinct aggregates. ACS Chem. Biol. 2014;9:2767–2778. doi: 10.1021/cb500335w.
- 12. Ricci M.A., Cosma M.P., Lakadamyali M. Super resolution imaging of chromatin in pluripotency, differentiation, and reprogramming. Curr. Opin. Genet. Dev. 2017;46:186–193. doi: 10.1016/j.gde.2017.07.010.
- 13. Wilson M.D., Costa A. Cryo-electron microscopy of chromatin biology. Acta Crystallogr. D Struct. Biol. 2017;73:541–548. doi: 10.1107/S2059798317004430.
- 14. Mlynárik V. Introduction to nuclear magnetic resonance. Anal. Biochem. 2017;529:4–9. doi: 10.1016/j.ab.2016.05.006.
- 15. Tan S., Davey C.A. Nucleosome structural studies. Curr. Opin. Struct. Biol. 2011;21:128–136. doi: 10.1016/j.sbi.2010.11.006.
- 16. Becker N.B., Everaers R. From rigid base pairs to semiflexible polymers: coarse-graining DNA. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 2007;76:021923. doi: 10.1103/PhysRevE.76.021923.
- 17. Schlick T., Hayes J., Grigoryev S. Toward convergence of experimental studies and theoretical modeling of the chromatin fiber. J. Biol. Chem. 2012;287:5183–5191. doi: 10.1074/jbc.R111.305763.
- 18. Perišić O., Schlick T. Computational strategies to address chromatin structure problems. Phys. Biol. 2016;13:035006. doi: 10.1088/1478-3975/13/3/035006.
- 19. Portillo-Ledesma S., Schlick T. Bridging chromatin structure and function over a range of experimental spatial and temporal scales by molecular modeling. WIREs Comput. Mol. Sci. 2020;10:e1434. doi: 10.1002/wcms.1434. Published online August 6, 2019.
- 20. Ozer G., Luque A., Schlick T. The chromatin fiber: multiscale problems and approaches. Curr. Opin. Struct. Biol. 2015;31:124–139. doi: 10.1016/j.sbi.2015.04.002.
- 21. Perkel J.M. Plot a course through the genome. Nature. 2017;549:117–118. doi: 10.1038/549117a.
- 22. 4DN Software. https://www.4dnucleome.org/software.html.
- 23. Stolz R.C., Bishop T.C. ICM Web: the interactive chromatin modeling web server. Nucleic Acids Res. 2010;38:W254–W261. doi: 10.1093/nar/gkq496.
- 24. Woodcock C.L., Grigoryev S.A., Whitaker N. A chromatin folding model that incorporates linker variability generates fibers resembling the native structures. Proc. Natl. Acad. Sci. USA. 1993;90:9021–9025. doi: 10.1073/pnas.90.19.9021.
- 25. Simo J.C., Marsden J.E., Krishnaprasad P. The Hamiltonian structure of nonlinear elasticity: the material and convective representations of solids, rods, and plates. Arch. Ration. Mech. Anal. 1988;104:125–183.
- 26. Simo J.C., Vu-Quoc L. A geometrically-exact rod model incorporating shear and torsion-warping deformation. Int. J. Solids Struct. 1991;27:371–393.
- 27. Calladine C.R., Drew H.R. Principles of sequence-dependent flexure of DNA. J. Mol. Biol. 1986;192:907–918. doi: 10.1016/0022-2836(86)90036-7.
- 28. Fathizadeh A., Eslami-Mossallam B., Ejtehadi M.R. Definition of the persistence length in the coarse-grained models of DNA elasticity. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 2012;86:051907. doi: 10.1103/PhysRevE.86.051907.
- 29. Dickerson R.E. Definitions and nomenclature of nucleic acid structure components. Nucleic Acids Res. 1989;17:1797–1803. doi: 10.1093/nar/17.5.1797.
- 30. Olson W.K., Bansal M., Berman H.M. A standard reference frame for the description of nucleic acid base-pair geometry. J. Mol. Biol. 2001;313:229–237. doi: 10.1006/jmbi.2001.4987.
- 31. Olson W.K., Gorin A.A., Zhurkin V.B. DNA sequence-dependent deformability deduced from protein-DNA crystal complexes. Proc. Natl. Acad. Sci. USA. 1998;95:11163–11168. doi: 10.1073/pnas.95.19.11163.
- 32. Lavery R., Zakrzewska K., Sponer J. A systematic molecular dynamics study of nearest-neighbor effects on base pair and base pair step conformations and fluctuations in B-DNA. Nucleic Acids Res. 2010;38:299–313. doi: 10.1093/nar/gkp834.
- 33. Pasi M., Maddocks J.H., Lavery R. μABC: a systematic microsecond molecular dynamics study of tetranucleotide sequence effects in B-DNA. Nucleic Acids Res. 2014;42:12272–12283. doi: 10.1093/nar/gku855.
- 34. Lu X.-J., Olson W.K. 3DNA: a software package for the analysis, rebuilding and visualization of three-dimensional nucleic acid structures. Nucleic Acids Res. 2003;31:5108–5121. doi: 10.1093/nar/gkg680.
- 35. el Hassan M.A., Calladine C.R. The assessment of the geometry of dinucleotide steps in double-helical DNA; a new local calculation scheme. J. Mol. Biol. 1995;251:648–664. doi: 10.1006/jmbi.1995.0462.
- 36. Lavery R., Moakher M., Zakrzewska K. Conformational analysis of nucleic acids revisited: Curves+. Nucleic Acids Res. 2009;37:5917–5929. doi: 10.1093/nar/gkp608.
- 37. Gonzalez O., Petkevičiūtė D., Maddocks J.H. A sequence-dependent rigid-base model of DNA. J. Chem. Phys. 2013;138:055102. doi: 10.1063/1.4789411.
- 38. Diekmann S. Definitions and nomenclature of nucleic acid structure parameters. J. Mol. Biol. 1989;205:787–791. doi: 10.1016/0022-2836(89)90324-0.
- 39. Babcock M.S., Pednault E.P., Olson W.K. Nucleic acid structure analysis. Mathematics for local Cartesian and helical structure parameters that are truly comparable between structures. J. Mol. Biol. 1994;237:125–156. doi: 10.1006/jmbi.1994.1213.
- 40. Petkevičiūtė D., Pasi M., Maddocks J.H. cgDNA: a software package for the prediction of sequence-dependent coarse-grain free energies of B-form DNA. Nucleic Acids Res. 2014;42:e153. doi: 10.1093/nar/gku825.
- 41. Dijkstra E.W. Programming as a discipline of mathematical nature. Am. Math. Mon. 1974;81:608–612.
- 42. Bishop T.C. VDNA: the virtual DNA plug-in for VMD. Bioinformatics. 2009;25:3187–3188. doi: 10.1093/bioinformatics/btp566.
- 43. Bishop T.C., Hearst J.E. Potential function describing the folding of the 30 nm fiber. J. Phys. Chem. B. 1998;102:6433–6439.
- 44. Herráez A. Biomolecules in the computer: jmol to the rescue. Biochem. Mol. Biol. Educ. 2006;34:255–261. doi: 10.1002/bmb.2006.494034042644.
- 45. Rose A.S., Hildebrand P.W. NGL Viewer: a web application for molecular visualization. Nucleic Acids Res. 2015;43:W576–W579. doi: 10.1093/nar/gkv402.
- 46. Humphrey W., Dalke A., Schulten K. VMD: visual molecular dynamics. J. Mol. Graph. 1996;14:33–38, 27–28. doi: 10.1016/0263-7855(96)00018-5.
- 47. Schrödinger L.L.C. 2015. The PyMOL molecular graphics system, version 1.8 (Schrödinger).
- 48. Down M.P., Thomas A., Hubbard T.J.P. Dalliance: interactive genome viewing on the web. Bioinformatics. 2011;27:889–890. doi: 10.1093/bioinformatics/btr020.
- 49. Skinner M.E., Uzilov A.V., Holmes I.H. JBrowse: a next-generation genome browser. Genome Res. 2009;19:1630–1638. doi: 10.1101/gr.094607.109.
- 50. Norman A.W., Litwack G. Hormones. Second Edition. Academic Press; Cambridge, MA: 1997.
- 51. Anderson A.P., Jones A.G. erefinder: genome-wide detection of oestrogen response elements. Mol. Ecol. Resour. 2019;19:1366–1373. doi: 10.1111/1755-0998.13046.
- 52. Zhao Y., Wang J., Xiao J. NucMap: a database of genome-wide nucleosome positioning map across species. Nucleic Acids Res. 2019;47:D163–D169. doi: 10.1093/nar/gky980.
- 53. Schwabe J., Chapman L., Rhodes D. The crystal structure of the estrogen receptor DNA-binding domain bound to DNA: how receptors discriminate between their response elements. Cell. 1993;75:567–578. doi: 10.1016/0092-8674(93)90390-c.
- 54. Bascom G.D., Myers C.G., Schlick T. Mesoscale modeling reveals formation of an epigenetically driven HOXC gene hub. Proc. Natl. Acad. Sci. USA. 2019;116:4955–4962. doi: 10.1073/pnas.1816424116.
- 55. Zhang Q., Beard D.A., Schlick T. Constructing irregular surfaces to enclose macromolecular complexes for mesoscale modeling using the discrete surface charge optimization (DISCO) algorithm. J. Comput. Chem. 2003;24:2063–2074. doi: 10.1002/jcc.10337.
- 56. Schiessel H., Gelbart W.M., Bruinsma R. DNA folding: structural and mechanical properties of the two-angle model for chromatin. Biophys. J. 2001;80:1940–1956. doi: 10.1016/S0006-3495(01)76164-4.
- 57. Sun R., Li Z., Bishop T.C. TMB library of nucleosome simulations. J. Chem. Inf. Model. 2019;59:4289–4299. doi: 10.1021/acs.jcim.9b00252.
- 58. Smith J.A., Khamra Y.E., Jha S. Scalable online comparative genomics of mononucleosomes: a BigJob. In: Proceedings of the Conference on Extreme Science and Engineering Discovery Environment: Gateway to Discovery (XSEDE '13). ACM Press; 2013.
- 59. Bishop T.C., Kosztin D., Schulten K. How hormone receptor-DNA binding affects nucleosomal DNA: the role of symmetry. Biophys. J. 1997;72:2056–2067. doi: 10.1016/S0006-3495(97)78849-0.
- 60. Hamdi Y., Leclerc M., Simard J. Functional analysis of promoter variants in genes involved in sex steroid action, DNA repair and cell cycle control. Genes (Basel). 2019;10:E186. doi: 10.3390/genes10030186.