Abstract
Olduvai protein domains, encoded by the NBPF gene family, are responsible for the largest increase in copy number of any protein-coding region in the human genome. This has spawned various genetics studies which have linked these domains to human brain development and divergence from our primate ancestors, as well as to currently relevant cognitive diseases such as schizophrenia and autism. There are six separate Olduvai domains which together form the majority of the various protein products of the NBPF genes. The six domains comprise three conserved domains (CON1–3) and three human-lineage-specific domains (HLS1–3), which occur as a triplet. Here, we present the solution nuclear magnetic resonance backbone assignments for the CON1 domain, which has been linked to the severity of autism. The data confirm that CON1 is an intrinsically disordered protein (IDP). Additionally, we use innovative Hα-detected experiments which allow us not only to assign the Hα atoms and N atoms of proline residues, but also to assign residues where HN-detected experiments suffered from peak overlap or broadening.
Keywords: Olduvai domain, DUF1220, IDP, autism, backbone chemical shift assignment
Biological context
Olduvai protein domains (formerly known as DUF1220) (Sikela & van Roy, 2017) are thought to be associated with human brain evolution. These domains show the largest human-specific increase in copy number of any coding region in the genome (O’Bleness et al., 2012). This dramatic increase in copy number seems to be involved in the evolution of the human brain and its rapid growth and divergence from other primates (Popesco et al., 2006). There are six Olduvai domains, each about 65 amino acids in length, which are sorted into two subtypes based on their sequences: conserved domains 1–3 (CON1–3), which are conserved across species, and human-lineage-specific domains 1–3 (HLS1–3), which are specifically expanded in humans. The domains are encoded by 24 NBPF (Neuroblastoma Breakpoint Family) genes, which vary in the number of each domain encoded in the gene but show specific patterning of the various Olduvai subtypes (O’Bleness et al., 2012) (Fig. 1). The majority of NBPF genes are located at the 1q21 locus, a known hotspot for duplication. One of the simplest NBPF genes, NBPF15, encodes a protein that consists of an N-terminal coiled-coil domain followed by each of the Olduvai domains: CON1, CON2, HLS1–3 and CON3 (Fig. 1). The HLS1–3 domains together are termed the “Olduvai triplet” (Fig. 1), and in some NBPF genes this triplet can be duplicated anywhere from two to upwards of twenty times. There is a specific extended subtype of HLS1 (HLS1-ex) in NBPF genes with more than one Olduvai triplet copy. It has been shown that furin cleaves NBPF proteins near the N-terminus of HLS1-ex, which suggests that the CON domains and the Olduvai triplet are independent cleavage products that confer separate functionality (Pacheco et al., 2021).
An expansion in CON1 duplications in humans has been linked to increased severity of Autism Spectrum Disorder (ASD) symptoms (Davis et al., 2019), while a decrease has been associated with negative symptoms of schizophrenia (Searles Quick et al., 2015). It has been hypothesized that these symptoms are a result of irregular neurogenesis, perhaps influencing neuron count, as micro- and macrocephaly are also linked to copy numbers of Olduvai domains (Dumas et al., 2012).
Figure 1.

Organization of Olduvai domains. Cartoon depiction of canonical organization of the Olduvai domains in the simplest NBPF gene, NBPF15, which harbors a single Olduvai triplet. The intrinsic disorder of the CON1 domain is highlighted.
On the structural level, it has been shown using NMR backbone assignment and titration that the HLS1–3 Olduvai domains are disordered proteins with no long-range interaction between them (Issaian et al., 2019). Here, we present the backbone resonance assignment of CON1 including Hα and proline resonances obtained from a suite of non-standard pulse sequences. The assignment constitutes a basis for future investigations of CON1 dynamics and interactions with potential binding partners.
Methods and experiments
Protein expression and purification
The CON1 construct was derived from residues 174–247 of the NBPF15 protein and cloned into the expression vector pET-21b(+) with an N-terminal maltose binding protein (MBP) tag coupled to a tobacco etch virus (TEV) protease cleavage site, and a C-terminal 6x-Histidine tag. The construct carries 9 extra residues N-terminal to CON1 from cloning and 8 extra residues C-terminal to CON1 (including the 6xHis-tag). The CON1 plasmid was transformed into and expressed in Escherichia coli strain Rosetta™ 2(DE3)pLysS cells. A single colony from an LB plate grown overnight was selected, resuspended in 2 mL LB-Miller media, and shaken at 37°C for 8 hours. A small 15N/13C preculture with M9 media (1 g/L 15N-ammonium chloride and 1.5 g/L 13C-glucose) was inoculated with the LB preculture (1000:1) and shaken at 37°C overnight. 15N/13C-supplemented M9 media was inoculated with the M9 preculture (50:1) and shaken at 37°C overnight. The culture was induced with 0.5 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) once the A600 reached 0.6 and shaken for a further 5 hours at 30°C. Cells were harvested by centrifugation at 6°C for 10 min at 4500 × g.
The resulting cell pellets were resuspended in nickel buffer A (50 mM Tris (pH 7.5), 300 mM NaCl, 3% glycerol, 10 mM imidazole). Cells were broken via sonication (6 × 20 s) on ice. The lysates were clarified by centrifugation at 20,000 × g for 30 min at 4°C and the soluble fraction was loaded onto a Ni Sepharose column (GE Healthcare) pre-equilibrated with nickel buffer A. After loading, the column was washed with 5 column volumes (CV) of nickel buffer A. The His-tagged CON1 protein was then eluted from the column with nickel buffer B (50 mM Tris (pH 7.5), 300 mM NaCl, 3% glycerol, and 400 mM imidazole). The elution fractions were pooled and spiked with 5 mM dithiothreitol (DTT). TEV protease was added to the eluate in a ratio of 1 TEV:100 protein and the sample was dialyzed overnight against 10 mM Tris (pH 7.5), 100 mM NaCl, and 5 mM DTT. The cleaved sample was loaded onto a Source 15Q column (GE Healthcare) pre-equilibrated with Q buffer A (20 mM Tris (pH 8.0), 100 mM NaCl). Proteins were separated by gradient elution with Q buffer B (20 mM Tris (pH 8.0), 1 M NaCl) using the following gradient: 0–50% B (0–5 CV), 50–75% B (5–6 CV), 75–100% B (6–6.5 CV), 100% B (6.5–7.5 CV). The relevant fractions were collected, concentrated using a 3,000 MWCO concentrator (Sartorius), and exchanged into NMR buffer (50 mM potassium phosphate, 100 mM NaCl, and 2 mM DTT) via dialysis. The protein purity was verified using 12% denaturing SDS-PAGE.
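As an illustration of the anion-exchange gradient described above, the following Python sketch evaluates the percentage of Q buffer B at a given column volume. It assumes that the gradient is linear within each listed segment and that the two buffers mix ideally, so that the NaCl concentration scales linearly between 100 mM (Q buffer A) and 1 M (Q buffer B); the function names `percent_b` and `nacl_mm` are hypothetical and not part of any instrument software.

```python
# Gradient segments taken from the text: (start CV, end CV, start %B, end %B).
SEGMENTS = [
    (0.0, 5.0, 0.0, 50.0),
    (5.0, 6.0, 50.0, 75.0),
    (6.0, 6.5, 75.0, 100.0),
    (6.5, 7.5, 100.0, 100.0),
]

NACL_A_MM = 100.0   # NaCl in Q buffer A (mM)
NACL_B_MM = 1000.0  # NaCl in Q buffer B (mM)


def percent_b(cv: float) -> float:
    """Return %B at a given column volume (assumes linear ramps per segment)."""
    if cv < 0.0 or cv > 7.5:
        raise ValueError("column volume outside the 0-7.5 CV gradient")
    for start_cv, end_cv, start_b, end_b in SEGMENTS:
        if start_cv <= cv <= end_cv:
            fraction = (cv - start_cv) / (end_cv - start_cv)
            return start_b + fraction * (end_b - start_b)
    raise ValueError("column volume not covered by any segment")


def nacl_mm(cv: float) -> float:
    """Return the nominal NaCl concentration in mM (assumes ideal mixing)."""
    b = percent_b(cv) / 100.0
    return NACL_A_MM + b * (NACL_B_MM - NACL_A_MM)


if __name__ == "__main__":
    for cv in (0.0, 2.5, 5.5, 6.25, 7.0):
        print(f"{cv:5.2f} CV: {percent_b(cv):6.1f} %B, {nacl_mm(cv):7.1f} mM NaCl")
```

Under these assumptions, a fraction eluting at 2.5 CV would correspond to 25% B, or nominally 325 mM NaCl.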
NMR spectroscopy
The 15N/13C-labeled CON1 sample in 50 mM potassium phosphate, 100 mM NaCl, and 2 mM DTT had a protein concentration of about 500 μM. Backbone assignment at 25°C was completed using the HN-detected experiments 1H-15N HSQC, HNCACB, CBCA(co)NH, HNCO, and HN(ca)CO (Cavanagh et al., 2007; Ikura et al., 1990), as well as the Hα-detected experiments Ha(CA)CON, iHaCaNCO, HaCaCONCaHa, and HaCaCON (Karjalainen et al., 2020; Mäntylahti et al., 2011). Experiments were performed on triple-resonance 900 MHz Varian and 600 MHz Bruker Avance Neo spectrometers. Backbone assignment at 5°C was completed using 1H-15N HSQC and HNCACB experiments recorded on the 600 MHz Bruker Avance Neo spectrometer, guided by the 25°C assignment. For the HN-detected experiments, 3D experiments were acquired with a non-uniform sampling (NUS) scheme generated by the NUS@HMS generator (Hyberts et al., 2012), with 10–47% sampling of 51 and 98 or 80 and 160 complex points in the indirect 15N and 13C dimensions, respectively, and 2048 complex data points in the direct dimension. The spectral widths were 13.65 ppm (1H), 34.99 ppm (15N), 14.00 ppm (13C=O), and 79.99 ppm (13Cα/13Cβ). The 3D experiments were recorded with 16 scans and an interscan delay of 1 or 1.1 s. The 2D experiments were recorded with 32 scans and an interscan delay of 1.6 s. The Hα-detected experiments were also acquired using NUS schemes from the NUS@HMS generator, with either 7.5 or 19% sampling of 80 or 128 complex points in the indirect 15N or 13C dimensions. We used 512 or 1024 complex points in the direct dimension. The spectral widths were 11.90, 8.1 or 10.4 ppm (1Hα), 28 ppm (15N), and 12 ppm (13C). The NUS spectra were reconstructed using hmsIST software (Hyberts et al., 2012), and NUS zero-filling was used for the linearly acquired 2D spectra. A solvent subtraction function was applied in the direct dimension. NMRPipe/NMRDraw (Delaglio et al., 1995) and NMRFAM-Sparky (Lee et al., 2015) were used for data processing and visualization.
Resonance assignments were completed using CCPnmr version 2.4.2 (Vranken et al., 2005).
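To give a sense of the acquisition savings implied by the NUS percentages quoted for the HN-detected 3D experiments, the sketch below counts the indirect-dimension increments that would be recorded for a given sampling density. It assumes that the quoted point counts describe full indirect grids of 51 × 98 and 80 × 160 complex points and that the sampled count is the rounded product of grid size and density; the text does not state which density was paired with which grid, so only the bounds are shown. The function name `sampled_points` is hypothetical.

```python
def sampled_points(n_15n: int, n_13c: int, percent: int) -> int:
    """Number of indirect increments recorded at a given NUS density.

    Assumes the full grid is n_15n x n_13c complex points and that the
    schedule size is the grid size times the density, rounded to an integer.
    """
    full_grid = n_15n * n_13c
    return round(full_grid * percent / 100)


if __name__ == "__main__":
    for n_15n, n_13c in ((51, 98), (80, 160)):
        full = n_15n * n_13c
        low = sampled_points(n_15n, n_13c, 10)
        high = sampled_points(n_15n, n_13c, 47)
        print(f"{n_15n} x {n_13c} grid: {full} points in full, "
              f"{low} at 10%, {high} at 47%")
```

The actual schedules were produced by the NUS@HMS generator, so the exact number of sampled points may differ from these nominal values.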
Assignment and data deposition
The limited peak dispersion in the 1H-15N HSQC confirms the disordered nature of CON1 (Fig. 2a, 3a). The CON1 sequence has low complexity and comprises many stretches of repeated residues or repeated sequences of amino acids, which results in poor dispersion and many overlapping peaks. To overcome this challenge and increase assignment success we implemented innovative Hα-detected experiments. These experiments involve multiple coherence transfer pathways linking up to four atoms that sequentially connect three different residues, which leads to severely decreased sensitivity in folded proteins. However, in IDPs such coherence transfer is still efficient because their higher flexibility lengthens the transverse relaxation times (Mäntylahti et al., 2011). Additionally, unlike HN protons, Hα protons are non-labile and are not susceptible to chemical exchange with the solvent (Karjalainen et al., 2020). The implementation of these experiments increased assignment success by adding an additional variable to differentiate between residues and allowed for 100% backbone assignment of the prolines (Hα, N, Cα, Cβ, CO) (Table 1). Overall, the resonance assignment rate was 90.0% at 25°C and 92.9% at 5°C (Table 1). These rates do not include the seventeen additional residues from cloning and are based on the number of 15N resonances assigned. The experiments at 5°C were performed to reduce the conformational exchange common in disordered proteins and potentially to make use of chemical shift changes that would separate overlapped peaks. While lowering the temperature increases the overall tumbling time of the molecule and causes peak broadening, ideally it also quenches slow exchange in some residues, thus increasing resolution. We were able to assign five additional residues using the 5°C spectrum. Additionally, we noticed a redistribution of exchange peaks in certain residues, as exemplified by Gly 52 (Fig. 2b, 3b), where a new exchange peak emerges while the original exchange peak is slightly weakened. It is possible that the exchange peak present at both temperatures is caused by isomerization of Pro 53. The populations of the minor states are roughly 30% and 20% at 25°C and 5°C, respectively. Although it has been reported that the populations of cis-Pro are between 5–10% for IDPs (Alderson et al., 2018; Mateos et al., 2020), it has also been noted that prolines with neighboring aromatic residues can have higher cis populations (Mateos et al., 2020). In any case, the additional exchange peak at 5°C would indicate an exchange process independent of the isomerization. The increase in residues exhibiting multiple exchange peaks at low temperature suggests that there are increased or changed slow motions within CON1. The backbone assignments for both temperatures have been deposited in the BMRB with accession codes 51015 (25°C) and 51016 (5°C).
Figure 2.

CON1 1H-15N HSQC spectrum at 25°C. (a) HSQC spectrum with peak assignments labeled. (b) Zoomed-in region of the 25°C HSQC showing exchange peaks for Gly 52.
Figure 3.

CON1 1H-15N HSQC spectrum at 5°C. (a) HSQC spectrum with peak assignments labeled. (b) Zoomed-in region of the 5°C HSQC at Gly 52 showing a second exchange peak in addition to the one observed at 25°C (Fig. 2b).
Table 1:
Backbone assignment statistics for CON1 at 25 and 5°C
| CON1 (with NBPF15 relevant numbering) | Total number of relevant residues* | Total number of relevant non-proline residues | % Backbone heavy atom resonances assigned (number of atoms assigned, excluding prolines) | % Prolines assigned | % 1Hα assigned (including prolines) |
|---|---|---|---|---|---|
| NBPF15-CON1 174–247 (25°C) | 74 | 70 | 90.0% (63 15N, 67 Cα, 62 Cβ, 69 CO) | 100% (1Hα, 15N, Cα, Cβ, CO) | 91.9 % |
| NBPF15-CON1 174–247 (5°C) | 74 | 70 | 92.9% (65 15N, 66 Cα, 66 Cβ) | N/A | N/A |
*Does not include the 17 extra non-relevant residues from cloning.
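The assignment percentages in Table 1 follow from the number of assigned 15N resonances relative to the 70 non-proline residues. A minimal Python sketch of that arithmetic, using only the counts given in the table (the function name `percent_assigned` is hypothetical), is:

```python
def percent_assigned(assigned: int, total: int) -> float:
    """Percentage of resonances assigned, rounded to one decimal place."""
    return round(100.0 * assigned / total, 1)


NON_PROLINE_RESIDUES = 70  # from Table 1 (74 relevant residues, 4 of them prolines)

# Assigned 15N resonances at each temperature, from Table 1.
ASSIGNED_15N = {"25C": 63, "5C": 65}

if __name__ == "__main__":
    for temperature, count in ASSIGNED_15N.items():
        rate = percent_assigned(count, NON_PROLINE_RESIDUES)
        print(f"{temperature}: {count}/{NON_PROLINE_RESIDUES} = {rate}%")
```

This reproduces the 90.0% (25°C) and 92.9% (5°C) values reported in the table.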
Secondary structure prediction
From the extensive peak overlap in the 2D 1H-15N HSQC spectra we confirmed the disordered nature of CON1. However, while IDPs do not adopt any stable secondary or folded tertiary structure, they often sample aspects of secondary structure in solution (Dyson & Wright, 2002). Indeed, the presence of slow (micro- to millisecond) exchange in CON1 indicates residual structure that would not be present in a strictly disordered protein (Adamski et al., 2019). The Secondary Structure Propensity (SSP) score (Marsh et al., 2006) gives an estimate of the secondary structure of a protein using available chemical shifts (a score of 1 indicates a fully formed α-helix, while a score of −1 indicates a fully formed β-sheet). IDPs often have scores between −0.2 and 0.2, indicating that no secondary structure elements are present. The SSP scores for CON1, calculated using HN, N, Cα, and Cβ chemical shifts at both 25°C and 5°C, show that most residues fall within this disordered range (Fig. 4a). However, there are also regions, mostly near the N-terminus, where the SSP score drops below −0.2, suggesting that these residues have a propensity to form β-strands, likely in a transient manner. The 5°C SSP scores of many of the residues are the same as or similar to those at 25°C. However, there is one region, residues 48–53, where the SSP score significantly increases, representing an increase in the disordered nature compared to 25°C. To confirm these trends we employed CheSPI (Nielsen & Mulder, 2021), a more recent algorithm to analyze populations of secondary structure elements, at both temperatures. In contrast to SSP, CheSPI can distinguish structured regions other than α-helix and β-sheet from true disorder. Nevertheless, all residues consistently show the highest population for disorder (Fig. 4b, gray bars representing disorder), thus agreeing with the general disorder shown in the SSP analysis.
The regions around residue 20 and residue 52, which exhibit higher structural propensity in SSP, also show the same patterns in CheSPI at 25°C, though the type of structure changes from sheet-like in SSP to turn- or helix-like in CheSPI. The population of the more structured conformations decreases at 5°C, again in agreement with the SSP result that disorder is more prevalent at 5°C. Interestingly, there is an increase in the population of extended conformation for the segment around residue 50 at 5°C, mirroring the increased SSP sheet propensity at 25°C. We note that the 25°C data have fewer assignments in this region than the 5°C data; most prominently, the highly informative Cα shifts for residues 49, 50, and 51 are missing. Therefore, we repeated the SSP and CheSPI analyses using only the chemical shifts that are available at both temperatures. Generally, the SSP scores for both temperatures were closer together, with the discrepancies around residues 20, 30, and 40 remaining consistent. However, around residue 50 the scores were almost identical, with an enhanced but narrower drop to nearly −0.8, suggesting a nearly full sheet. Similarly for CheSPI, the population of the more structured conformations decreases at 5°C, and there is a slight increase of the sheet population around residue 50. However, the SSP scores are reduced to one half if the Hα and CO shifts are included. In conclusion, the analyses suggest that there are segments with increased ordered structure, with slight differences between 5 and 25°C, and possibly a relatively high population of extended sheet around residue 50, but with a large uncertainty (10–50%).
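The way SSP scores are read in this section can be sketched in a few lines of Python: scores within ±0.2 are treated as disordered, scores above 0.2 as helix-leaning, and scores below −0.2 as strand-leaning, and consecutive residues with the same label are grouped into segments. This is only an illustration of the interpretation, not the SSP algorithm itself; the function names, the treatment of the cutoff as inclusive, and the example scores are all hypothetical and are not CON1 data.

```python
from itertools import groupby


def classify(score: float, cutoff: float = 0.2) -> str:
    """Label one SSP score using the +/-0.2 disorder window from the text."""
    if score > cutoff:
        return "helix-leaning"
    if score < -cutoff:
        return "strand-leaning"
    return "disordered"


def segments(scores, first_residue: int = 1):
    """Group consecutive residues with the same label.

    Returns a list of (first residue, last residue, label) tuples.
    """
    labels = [classify(s) for s in scores]
    result = []
    position = first_residue
    for label, group in groupby(labels):
        length = len(list(group))
        result.append((position, position + length - 1, label))
        position += length
    return result


if __name__ == "__main__":
    # Made-up scores for illustration only; these are NOT measured values.
    example_scores = [0.05, -0.10, -0.35, -0.42, -0.25, 0.00, 0.15, 0.30]
    for start, end, label in segments(example_scores):
        print(f"residues {start}-{end}: {label}")
```

Applied to per-residue SSP output, such a grouping would make it easier to compare the extent of the strand-leaning segments between the two temperatures.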
Figure 4.

Secondary structure element scores for CON1. Scores from two secondary structure algorithms are compared at two temperatures. (a) Secondary Structure Propensity (SSP) scores (Marsh et al., 2006) were determined using HN, N, Cα, and Cβ chemical shifts. Scores of +1 and −1 indicate fully formed α-helix and β-sheet, respectively. The scores for the 25°C and 5°C experiments are shown in black and green, respectively. (b) CheSPI secondary structure predictions (Nielsen & Mulder, 2021) at 25°C and 5°C are shown at the top and bottom, respectively. The populations of extended (blue), helical (red), turn (green) and non-folded (gray) elements sampled by each residue are color-coded.
Acknowledgements
The authors thank David Jones (University of Colorado) for his help with NMR spectroscopy and support in the CU Anschutz NMR core. We are also grateful to Professor Perttu Permi at the University of Jyväskylä, Finland, for sharing the Hα pulse sequences. This project was supported by NIH Grant R01GM130694-01 and a start-up package from the University of Colorado to B.V., University of Colorado Cancer Center Support Grant P30 CA046934, and NIH Biomedical Research Support Shared Grant S10 OD025020-01.
Footnotes
Conflict of interest
The authors declare they have no conflict of interest.
Compliance with ethical standards
Accession numbers
The chemical shift assignments for CON1 25°C and CON1 5°C have been deposited in the Biological Magnetic Resonance Data Bank under accession numbers 51015 and 51016 respectively.
References
- Adamski W, Salvi N, Maurin D, Magnat J, Milles S, Jensen MR, Abyzov A, Moreau CJ, & Blackledge M (2019). A Unified Description of Intrinsically Disordered Protein Dynamics under Physiological Conditions Using NMR Spectroscopy. Journal of the American Chemical Society, 141(44). 10.1021/jacs.9b09002
- Alderson TR, Lee JH, Charlier C, Ying J, & Bax A (2018). Propensity for cis-Proline Formation in Unfolded Proteins. ChemBioChem, 19(1). 10.1002/cbic.201700548
- Cavanagh J, Fairbrother WJ, Palmer AG, Skelton NJ, & Rance M (2007). Protein NMR Spectroscopy. In Protein NMR Spectroscopy. 10.1016/B978-0-12-164491-8.X5000-3
- Davis JM, Heft I, Scherer SW, & Sikela JM (2019). A third linear association between Olduvai (DUF1220) copy number and severity of the classic symptoms of inherited autism. American Journal of Psychiatry, 176(8). 10.1176/appi.ajp.2018.18080993
- Delaglio F, Grzesiek S, Vuister GW, Zhu G, Pfeifer J, & Bax A (1995). NMRPipe: a multidimensional spectral processing system based on UNIX pipes. Journal of Biomolecular NMR, 6(3), 277–293.
- Dumas LJ, O’Bleness MS, Davis JM, Dickens CM, Anderson N, Keeney JG, Jackson J, Sikela M, Raznahan A, Giedd J, Rapoport J, Nagamani SSC, Erez A, Brunetti-Pierri N, Sugalski R, Lupski JR, Fingerlin T, Cheung SW, & Sikela JM (2012). DUF1220-domain copy number implicated in human brain-size pathology and evolution. American Journal of Human Genetics, 91(3). 10.1016/j.ajhg.2012.07.016
- Hyberts SG, Milbradt AG, Wagner AB, Arthanari H, & Wagner G (2012). Application of iterative soft thresholding for fast reconstruction of NMR data non-uniformly sampled with multidimensional Poisson Gap scheduling. Journal of Biomolecular NMR, 52(4), 315–327. 10.1007/s10858-012-9611-z
- Ikura M, Kay LE, & Bax A (1990). A Novel Approach for Sequential Assignment of 1H, 13C, and 15N Spectra of Larger Proteins: Heteronuclear Triple-Resonance Three-Dimensional NMR Spectroscopy. Application to Calmodulin. Biochemistry, 29(19). 10.1021/bi00471a022
- Issaian A, Schmitt L, Born A, Nichols PJ, Sikela J, Hansen K, Vögeli B, & Henen MA (2019). Solution NMR backbone assignment reveals interaction-free tumbling of human lineage-specific Olduvai protein domains. Biomolecular NMR Assignments, 13(2). 10.1007/s12104-019-09902-0
- Dyson HJ, & Wright PE (2002). Insights into the structure and dynamics of unfolded proteins from nuclear magnetic resonance. Advances in Protein Chemistry, 62. 10.1016/S0065-3233(02)62012-1
- Karjalainen M, Tossavainen H, Hellman M, & Permi P (2020). HACANCOi: a new Hα-detected experiment for backbone resonance assignment of intrinsically disordered proteins. Journal of Biomolecular NMR, 74(12). 10.1007/s10858-020-00347-5
- Lee W, Tonelli M, & Markley JL (2015). NMRFAM-SPARKY: enhanced software for biomolecular NMR spectroscopy. Bioinformatics (Oxford, England), 31(8), 1325–1327. 10.1093/bioinformatics/btu830
- Mäntylahti S, Hellman M, & Permi P (2011). Extension of the HA-detection based approach: (HCA)CON(CA)H and (HCA)NCO(CA)H experiments for the main-chain assignment of intrinsically disordered proteins. Journal of Biomolecular NMR, 49(2). 10.1007/s10858-011-9470-z
- Marsh JA, Singh VK, Jia Z, & Forman-Kay JD (2006). Sensitivity of secondary structure propensities to sequence differences between α- and γ-synuclein: Implications for fibrillation. Protein Science, 15(12). 10.1110/ps.062465306
- Mateos B, Conrad-Billroth C, Schiavina M, Beier A, Kontaxis G, Konrat R, Felli IC, & Pierattelli R (2020). The Ambivalent Role of Proline Residues in an Intrinsically Disordered Protein: From Disorder Promoters to Compaction Facilitators. Journal of Molecular Biology, 432(9). 10.1016/j.jmb.2019.11.015
- Nielsen JT, & Mulder FAA (2021). CheSPI: chemical shift secondary structure population inference. Journal of Biomolecular NMR, 75(6–7). 10.1007/s10858-021-00374-w
- O’Bleness MS, Dickens CM, Dumas LJ, Kehrer-Sawatzki H, Wyckoff GJ, & Sikela JM (2012). Evolutionary history and genome organization of DUF1220 protein domains. G3: Genes, Genomes, Genetics, 2(9). 10.1534/g3.112.003061
- Pacheco A, Issaian A, Davis J, Anderson N, Nemkov T, Vögeli B, Hansen K, & Sikela JM (2021). Proteolytic Activation of Human-specific Olduvai Domains by the Furin Protease. BioRxiv, 2021.07.06.450945. 10.1101/2021.07.06.450945
- Popesco MC, MacLaren EJ, Hopkins J, Dumas L, Cox M, Meltesen L, McGavran L, Wyckoff GJ, & Sikela JM (2006). Human lineage-specific amplification, selection, and neuronal expression of DUF1220 domains. Science, 313(5791). 10.1126/science.1127980
- Sikela JM, & van Roy F (2017). A proposal to change the name of the NBPF/DUF1220 domain to the Olduvai domain. F1000Research, 6. 10.12688/f1000research.13586.1
- Vranken WF, Boucher W, Stevens TJ, Fogh RH, Pajon A, Llinas M, Ulrich EL, Markley JL, Ionides J, & Laue ED (2005). The CCPN data model for NMR spectroscopy: Development of a software pipeline. Proteins: Structure, Function, and Bioinformatics, 59(4), 687–696. 10.1002/prot.20449
