Author manuscript; available in PMC: 2024 Nov 7.
Published in final edited form as: J Learn Disabil. 2024 May 28;57(6):411–416. doi: 10.1177/00222194241254091

A community data sharing resource: The LDbase data repository

Sara A Hart 1, Christopher Schatschneider 1, Tara Reynolds 1, Favenzio Calvo 1
PMCID: PMC11542900  NIHMSID: NIHMS2030302  PMID: 38807421

Abstract

In this paper we show the learning disabilities field what LDbase is, why it is important for the field, what it offers, and how you can leverage LDbase in your own work.


In 2018 our team, represented in part by the authors of this paper, began work on a data repository to serve as a resource for the learning disabilities field. That data repository, LDbase (publicly available at www.LDbase.org since 2021), is the result of this ongoing collaboration, which has grown into something bigger than any of us imagined when we started. One of our early supporters, Professor Stephanie Al Otaiba, invited us to write this paper to show the learning disabilities field what LDbase is, why it is important for the field, what it offers, and how you can leverage LDbase in your own work.

What is LDbase?

LDbase is a domain-specific data repository for the field of learning disabilities. A domain-specific repository is one that is limited to a specific data type or related to a certain discipline. This contrasts with generalist repositories (e.g., ICPSR, OSF), which accept any data, no matter the data type, format, content, or disciplinary focus. Generally, it is recommended that researchers give priority to domain-specific repositories when choosing where to share their data (National Institutes of Health, 2020). There are many reasons for this recommendation: Domain-specific repositories have more detailed metadata profiles, making it easier to upload, and to search for, datasets that fit very specific parameters of your research. They also build community around the shared data, allowing you to find potential collaborators interested in your field of research. In addition, domain-specific repositories, like LDbase, are built with researchers in a particular field in mind, developing interfaces and features tailored to create an optimized experience.

To make the difference between generalist and domain-specific repositories more concrete, imagine you were shopping for a pair of sunglasses. You could go to a generalist store, like Walmart or Target. There, you would have to figure out where the sunglasses are in the store, choose from a limited selection, struggle to find glasses with the specific features you want because you have to read each tag, and lose time trying to find a mirror somewhere in the store. A more successful approach would be to go to a store that sells only sunglasses, offers a much larger selection, displays clear labels so that you can quickly find what you are looking for, and has mirrors on every wall so you can see whether a pair fits you. Similarly, generalist repositories can store or serve data from our field, but they are not built to fit the needs of our field, making them a less ideal choice. A domain-specific repository makes data sharing and data reuse easier for a field. Since becoming publicly available, LDbase has been listed as an NIH-supported domain-specific repository (https://www.nlm.nih.gov/NIHbmic/domain_specific_repositories.html).

Who is LDbase for?

Simply put, LDbase is for the learning disabilities community, broadly defined. We allow the “LD” in LDbase to stand not only for “learning disabilities” but also for “learning and development.” We do not restrict LDbase to projects whose samples are identified as having learning disabilities; instead, we have positioned LDbase to store data related to individuals in learning contexts or data bearing on individuals’ learning outcomes. LDbase is freely available to any researcher whose data fit its scope. Investigators from any country can use LDbase. However, LDbase was created through funding from the National Institutes of Health by researchers based in the USA, and non-USA investigators are advised to check their local laws and regulations before sharing data within LDbase. For example, we have attempted to comply with the European Union’s GDPR, which governs, in part, data sharing, but we cannot guarantee that we do, given the intricacies of how the law is being enforced.

What type of data does LDbase store?

LDbase accepts only behavioral data, defined as quantitative data gathered via questionnaire, cognitive testing, and the like. We do not accept image data, neurophysiological data, video data, and others. We limit the data format to behavioral data for multiple reasons. First, from the start we focused on the sustainability of LDbase, and these other file types have extensive, and expensive, data storage needs. Second, other data repositories already exist to host that type of data (e.g., Databrary for video data), and it is not appropriate to duplicate those efforts. Instead, we allow users with video or image data to still share their projects on LDbase, complete with metadata, and then provide a URL link to where their data are stored, rather than uploading the data. The benefit of this is that the LD community can still find, reuse, and cite your data.

Datasets may be shared in the format of your choice. Common examples include csv, SAS, Excel, SPSS, and R. Flat files, as opposed to relational databases, should be shared to allow for easy download and reuse of data.
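The preference for flat files comes down to how easily a secondary user can load them. As a minimal sketch (the file contents and column names here are hypothetical, not from an actual LDbase dataset), a flat CSV file needs nothing beyond a standard library to reuse:

```python
import csv
import io

# A hypothetical flat data file as it might be shared: one row per
# participant, one column per variable, no relational structure to rebuild.
shared_csv = io.StringIO(
    "participant_id,grade,word_reading,passage_comp\n"
    "p001,2,452,448\n"
    "p002,3,478,470\n"
)

# A secondary user can load the whole dataset with the standard library,
# with no need for the original database software or schema.
rows = list(csv.DictReader(shared_csv))
n_participants = len(rows)                 # 2 rows, one per participant
first_score = rows[0]["word_reading"]      # a score, read by column name
```

A relational database, by contrast, would require the downloader to reconstruct tables and joins before any analysis could begin, which is exactly the friction flat files avoid.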

In addition to datasets, LDbase is also designed for researchers to share codebooks, data dictionaries, code in the original format, project documentation and any other project resources that will help others understand your study. The goal of open science is to provide and share the full picture of your research, so that your data can be properly understood, reused, and cited for years to come, ideally with no need to contact you in the future.

Why is LDbase important?

Why did the learning disabilities field need LDbase?

We saw that the broad field of learning disabilities, which intersects the fields of education, psychology, and communication science and disorders, among others, was unique in its need, and positionality, for a domain-specific data repository. First, many of the projects in the field are grant funded, which means that the field feels considerable top-down pressure from granting agencies to share data. A mandate passed in 2013 requires data collected using federal funding to be open and accessible to the public (The White House, 2013). In response, the major federal agencies funding learning disabilities research now require explicit data sharing plans as part of their grant applications. For example, as of January 2023, NIH requires a formal data management and sharing plan, with data shared at publication or the end of the grant, whichever comes first. The Institute of Education Sciences (IES) has required sharing of final research data for most grant types since 2013. For IES-funded projects alone, as of 2020 approximately 350 research projects are subject to this data sharing requirement and have data that are ready to share (Albro, 2020). These investigators need a place to share their data, and the support of a domain-specific repository on how to share their data properly. Second, the learning disabilities community is uniquely well poised to capitalize on the power of shared data. We often use common measures (e.g., Woodcock-Johnson Tests of Achievement), which makes integrating across datasets possible. Many of our research questions require hard-to-reach populations (e.g., children with specific disorders) or expensive, specialized data collection (e.g., intervention studies). Investigators who can reach only limited numbers of such participants can combine across datasets to create a more powerful single dataset of their population for analysis.
Investigators without access to funded resources can use shared data to advance the field by bringing their unique perspectives to their research questions, democratizing access to data.

What does LDbase offer?

LDbase is a data repository that has been created specifically for the learning disabilities community. When we started to make LDbase, we knew that our learning disabilities community users wanted features that were not available with other data repositories. The following is a list of some of these features.

  1. Flexible layouts. The currently available data repositories assume a one-to-one-to-one correspondence of project to data collection to manuscript. This means that the data repository interface tends to be flat and, in many cases, does not allow any file nesting at all. Many in the learning disabilities community consider a project to be a broad bucket that might have multiple waves or types of data collection (e.g., longitudinal, teacher surveys, parent questionnaires, and child testing), and publish multiple manuscripts from the same project or even the same data source. This means we needed to build LDbase to allow hierarchical file storage with built-in flexibility. All new data uploads start with a project page, which allows investigators to provide metadata on the broader project (e.g., project description, project investigators). After a project page is established, the investigator can add any number of datasets and documents, with any file nesting desired. This type of nested project layout allows others, who are reusing your data, to easily navigate your data and understand what pieces of the project they are looking at and how each piece connects to the project as a whole. For example, nesting allows people to see that there are three cohorts stored in three datasets, and that ‘this’ codebook and ‘this’ piece of code go specifically with ‘this’ dataset and do not apply to other datasets in the project.

  2. Controlled access capabilities. Data sharing means providing summary or individual-level data in a data repository. It does not require openly sharing all data, and indeed, many in the learning disabilities community might have various reasons for wanting to control access to their data. LDbase allows for open sharing as well as controlled access sharing. Open sharing fully meets the goals of data sharing, in that the data are available to any user with an easy download. On LDbase, openly shared data are available to any user, whether or not they have a registered account. Controlled access sharing means an investigator has stored their data on LDbase, including the important metadata describing the data, which are openly available, but the data themselves are not available to any user. When using this feature an investigator can choose to set a date by which the data will become fully open (e.g., to meet a grant requirement for sharing), or the investigator can choose not to set a date, which effectively makes the data permanently restricted access. Any registered LDbase user can request access to controlled access data through the internal messaging system, and the investigator can decide whether to accept the request and allow access to that user through a specialized sharing feature. Each individual dataset shared on LDbase can have different access settings. This feature might be useful for investigators who want to (or are required to) participate in data sharing but would rather not have some data fully open to any user, for ethical reasons (e.g., sensitive data) or personal reasons (e.g., not wanting to share until the first major manuscripts are completed).

  3. DOI minting. Best practices in data sharing say that each shared product should receive a persistent unique identifier, to allow for better citation and data discovery. LDbase uses DOIs as permanent digital object identifiers to assist with the permanent archiving of, and access to, shared data, and we are set up to provide a DOI to anyone, free of charge, at any time.

  4. Citation (for project and each document). Data that is shared is a citable product, which can be added to CVs and grant progress reports. Each data product, and its documentation, is fully citable, and the LDbase interface has a feature that allows a user to easily copy a citation.

  5. Data sharing resources. In the end, we not only created a data repository, but also an informative website. By that we mean we have built out informational resources on open science and data sharing, with many of our resources answering questions that have been asked by the LD community. We have provided templates (e.g., informed consent language for data sharing, samples of Data Management Plans) and white papers sharing best practices particular to our field (e.g., data management basics, working with your IRB, de-identifying your data), all available on LDbase.org with a specialized search function of the resources.

  6. Management of your project users. We anticipated that the PI of a project would not necessarily be the individual who does the actual file uploads for data sharing; instead, students or staff might do it. We have two levels of project users with various editing and access capabilities. A Project Administrator is a role that provides the highest level of access to a project, its data, and any related documentation. We imagine that a PI, and potentially a senior-level staff member on a project, would have this level of access. A Project Administrator can add and edit all metadata, upload files, access embargoed data, allow other users to access embargoed data, edit all other user access roles for the project, and delete pieces of the project or even delete the project entirely. A Project Editor is the second level of administrative user on a project. We imagine this role would be given to trainees and staff working with a PI. A Project Editor can add and edit metadata, upload files, access embargoed data, and allow other users to access embargoed data, but they cannot delete any data. Any registered user of LDbase can be given these project roles. Access is assigned per project, can differ from project to project, and is managed by you. You can add or revoke access rights at any time, allowing you to remain in control of your data.

  7. Advanced search interface. The purpose of data sharing is to allow data to be accessible to others, for reproducibility or reuse. This means that others must be able to find your data. Part of our LDbase team are librarians, specially trained in information finding. We have advanced search capabilities, including the ability to use Boolean search terms or preselected facets to limit search results. You can also search for the specific data you are looking for. Do you want to find a dataset with Dual Language Learners, collected via a longitudinal study, in elementary schools? You can do that on LDbase. But remember, one can only search on study attributes that have been provided. Therefore, we require or request metadata about the research study when projects are created. Metadata not only makes your data more findable but also makes our search functionality better than those currently available in other data repositories.

  8. Sharing data by uploading or linking. An LD investigator can share data on LDbase in two ways. If the data are not otherwise available online, then they can upload their data to be stored on and shared from LDbase. However, if the investigator already has the data stored somewhere else online, either because they were required to or because they shared it before LDbase was available, then best practice is that they do not upload the data again to LDbase. Instead, for these investigators, we provide an option to create your project on LDbase, add the metadata that supports FAIR data sharing, and then add a link to the external website where the data are stored, rather than uploading the data to the repository. As LDbase is a one-stop shop for our field to find data, doing this allows secondary users access to your data, which they might otherwise struggle to find. Linking to an external resource is also helpful for files that are better off stored in other places, like preregistrations (which OSF is purposely set up to host) or file types LDbase doesn’t store (e.g., if your imaging data are stored somewhere else but all behavioral data will be on LDbase).

  9. Metadata. This is the key that makes the repository work for everyone. Metadata is all of the informational tags that describe LD research. It includes everything you can think of, like: assessments used, educational environments where the research took place, the year the study happened, who provided funding, whether a control group was used, whether socio-economic status is shared, whether the study was randomized, whether twins were studied, whether ADHD was a focus, and who the PIs are. You will be asked to enter metadata related to your project, data, and other documentation during the upload process (you can see what will be expected of you here: https://ldbase.org/data-sharing-resources/guides/what-information-do-I-need).

  10. Broader findability. LDbase currently uses the schema.org metadata format. This means that LDbase metadata is searchable by other programs, importantly including the Google crawler, allowing LDbase data to be “found” by Google (including Google Dataset Search and Google Scholar). As part of our future work, we are continuing to build out other API capabilities, allowing easier access to the metadata stored in LDbase.
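To make the schema.org point concrete, here is a minimal sketch of the kind of “Dataset” record that a repository page can embed as JSON-LD for crawlers to parse. Every field value below is hypothetical and does not reflect LDbase’s actual metadata profile:

```python
import json

# A hypothetical schema.org "Dataset" record. The @context and @type keys
# are what tell a crawler to treat this as a schema.org Dataset; the other
# fields are illustrative placeholders, not real LDbase metadata.
dataset_record = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example reading intervention dataset",
    "description": "Pre/post reading scores from a small-group intervention.",
    "identifier": "https://doi.org/10.0000/example",  # hypothetical DOI
    "creator": [{"@type": "Person", "name": "Jane Researcher"}],
    "license": "https://creativecommons.org/licenses/by/4.0/",
}

# Serialized and embedded in a page inside
# <script type="application/ld+json"> ... </script>,
# this is what dataset-aware crawlers read to index the record.
jsonld = json.dumps(dataset_record, indent=2)
```

Because the record is ordinary structured data on the page, no special API is needed for an external service to discover it, which is what makes this approach attractive for findability.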

Sustainability

LDbase is a free-to-use resource, and we were sensitive to sustainability from the very start of our work building LDbase. We made choices regarding our features that ensure that LDbase is sustainable after our NIH funding is done. The two major decisions were, first, that we would give the ability to store and access data and related documentation entirely to users, and second, that we would not be involved in any step of inspecting a dataset. There are implications to these decisions. First, we do not restrict who can become a registered user of LDbase. This contrasts with data repositories such as Databrary, which requires all new users to have their user agreements signed by a university official. That process ensures that all users have certain credentials; however, it does not allow inclusive data access, and it is labor intensive. Second, we also cannot speak to the “quality” of the data stored in LDbase. To do so would require us to go inside datasets uploaded to LDbase and confirm aspects such as full deidentification, cleanliness, documentation, and the like. This would require staff time and expertise. ICPSR does provide such services, for a fee. We have heard from one user that public access to a dataset on ICPSR has been held up for over 15 months because the data checking process is backed up. We provide many resources on how to check for deidentification, good data management practices, and documentation, but in the end the onus is on both the data depositor and any secondary data user to ensure data quality. This allows us to keep ongoing costs very low. To cover those remaining costs, we built LDbase in partnership with the Florida State University Libraries, which recognize LDbase as part of their archive. This means the FSU Libraries will maintain curation of LDbase after our funding period.

Examples of secondary data uses

LDbase was made for two reasons. First, it provides a free place where LD researchers can store their data, to meet federal data sharing requirements or to contribute to open science. Second, it provides a place for LD investigators to go to find data for reuse. This reuse is useful for many different goals. By bringing new investigators to data, with their own backgrounds, their own theoretical leanings, and their own methodological approaches, new research questions can be conceived that were not considered by the original investigator team. These new ideas increase the creative use of data and can lead to breakthroughs that advance the field. These new investigators also sometimes become new collaborators, increasing the collaborative networks of all investigators involved. This has been shown to increase creativity in research (e.g., Hall et al., 2018). Data reuse can also increase the transparency of the research process. By openly sharing data that contributed to scientific knowledge, others can access the data and check the analytical pipeline for reproducibility. This adds to the credibility of our field’s findings, a crucial task for a field like ours, which directly contributes to the instruction and interventions that children receive. Data sharing can also promote equity in the research process. Not all investigators have access to high-quality learning disabilities data, because they are junior, have backgrounds that make them less likely to be funded, or are at under-resourced institutions. By opening data for others to use, you are providing a resource that can benefit all.

Through our years of working to support our community in data sharing, we have seen many examples of successful data reuse, in both papers and grants.

Papers.

Our field has examples of large data collections that were gathered for the purpose of sharing. Examples include PISA, TIMSS, and ECLS. There are many datasets available that are less well-known but no less valuable. Many of these are stored in data repositories like LDbase. One example of an interesting reuse of data stored on LDbase is Hall, van Dijk, Chow, and Comella (under review). This paper uses part of the Project KIDS dataset (Hart et al., 2021, with a full data description available in van Dijk et al., 2022). Project KIDS combined nine extant intervention projects into one very large integrated dataset for novel research questions concerning individual differences in intervention response, which was already an example of the power of data sharing. Hall et al. (under review) further extended the usefulness of the shared data, using innovative analytical techniques to examine the impacts of a reading intervention on math outcomes. They found that the reading intervention had a small impact on applied problem solving, and that activating word-level reading skills via the reading intervention impacted math fluency. Interestingly, the first author has used the same shared dataset again for a very different manuscript, using the data as part of a “how-to” paper on longitudinal structural equation models for a school psychology audience (Hall & Clark, in press). Indeed, shared data can be used to advance research questions important to the field, as well as to support pedagogy.

Grants.

Sometimes a new idea or new method does not need new data collection to be tested. This is especially the case in an era of shrinking buying power of grant dollars, where collecting data with children, teachers, and families is expensive and takes time. By leveraging resources already committed to data collection, it is possible to propose effective and interesting ideas to granting agencies that include secondary data analysis. A recent example that some of our team were part of is a new NIH-funded R01 (Jessica Toste, PI). This grant leverages existing datasets from efficacy trials of small-group interventions for reading difficulties. In this grant, the datasets will be combined using integrative data analysis (Curran et al., 2014; van Dijk et al., 2022), creating a large dataset of thousands of students who have received supplemental instruction in reading, including their treatment status, pre- and post-reading scores, and demographic information. The research questions for this grant concern group differences in treatment response, questions that were not possible in any of the individual projects because of small within-project sample sizes. This is not an issue when we combine across datasets. This project was possible because personal connections among the investigators allowed access to each dataset. The final combined dataset, and many of the individual datasets, will be made available on LDbase.org, making these once-private datasets available to others who happen not to be well connected. By continuing to share the data from our field, we will allow others to innovate and maximize the knowledge that can be gained from the data.

It is our dream, and one reason to support and participate in data sharing, that as the field of learning disabilities shares more data, inequities in access to high-quality data will diminish. In the end, the north star of our field is individuals with learning disabilities. Data sharing is one way we can advance our field toward supporting all individuals who are at risk for, or have, learning disabilities.

Acknowledgements

Building LDbase has truly been a team effort. Beyond the authors, the following people, listed in alphabetical order, have contributed to LDbase, giving expertise, time, and effort to its success: Brian Arsenault, Bryan Brown, Veronica Mellado De La Cruz, Ashley Edwards, Stephanie Estrera, Mason Hall, Jessica Logan, Jean Philips, Jeffrey Shero, Rachel Smart, Micah Vandegrift, Wilhelmina van Dijk, and Christine White.

This work is supported by Eunice Kennedy Shriver National Institute of Child Health & Human Development Grant R01HD095193. Views expressed herein are those of the authors and have neither been reviewed nor approved by the granting agencies.

References

  1. Albro E (2020, January). IES annual principal investigators meeting. https://ies.ed.gov/pimeeting/
  2. Curran PJ, McGinley JS, Bauer DJ, Hussong AM, Burns A, Chassin L, … & Zucker R (2014). A moderated nonlinear factor model for the development of commensurate measures in integrative data analysis. Multivariate Behavioral Research, 49(3), 214–231.
  3. Hall GJ, & Clark KN (in press). Demystifying longitudinal data analysis using structural equation models in school psychology. Journal of School Psychology. 10.1016/j.jsp.2023.03.003
  4. Hall GJ, van Dijk W, Chow JC, & Comella S (under review). Decrypting the code: Investigating a reading intervention’s impact on math problem solving and calculation fluency. https://osf.io/preprints/psyarxiv/jvyzw
  5. Hall KL, Vogel AL, Huang GC, Serrano KJ, Rice EL, Tsakraklides SP, & Fiore SM (2018). The science of team science: A review of the empirical evidence and research gaps on collaboration in science. American Psychologist, 73(4), 532.
  6. Hart SA, Otaiba SA, Connor C, & Schatschneider C (2021). Project KIDS. LDbase. 10.33009/ldbase.1619716971.79ee
  7. National Institutes of Health. (2020, October 29). Supplemental information to the NIH policy for data management and sharing: Selecting a repository for data resulting from NIH-supported research. NIH Grants & Funding. https://grants.nih.gov/grants/guide/notice-files/NOT-OD-21-016.html
  8. The White House. (2013, May 9). Executive order: Making open and machine readable the new default for government information. https://obamawhitehouse.archives.gov/the-press-office/2013/05/09/executive-order-making-open-and-machine-readable-new-default-government-
  9. van Dijk W, Norris CU, Al Otaiba S, Schatschneider C, & Hart SA (2022). Exploring individual differences in response to reading intervention: Data from Project KIDS (kids and individual differences in schools). Journal of Open Psychological Data, 10(1), 2. 10.5334/jopd.58
  10. van Dijk W, Schatschneider C, Al Otaiba S, & Hart SA (2022). Assessing measurement invariance across multiple groups: When is fit good enough? Educational and Psychological Measurement, 82(3), 482–505.
