Microsc Microanal. Author manuscript; available in PMC: 2021 Mar 23.
Published in final edited form as: Microsc Microanal. 2020 Jul 30;26(Suppl 2):2950–2952. doi: 10.1017/s1431927620023314

NexusLIMS: Leveraging Shared Microscopy Resources for Data Analysis with a Configurable Laboratory Information Management System

Joshua Taillon 1, Raymond Plante 2, Marcus Newrock 2, June Lau 2, Gretchen Greene 2
PMCID: PMC7987231  NIHMSID: NIHMS1670883

Researchers at electron microscopy (EM) facilities produce large amounts of data from many types of instrumentation, typically using proprietary software to collect and analyze their results. Owing to finite funding and restrictive licensing terms, data analysis is often restricted to a single computer or to the limited functionality of an “offline” copy of the software, which limits individual productivity as well as the number of researchers able to analyze data. This situation has contributed to a sharp increase in the publication of open-source scientific software tools, as evidenced by the success of the Data Acquisition Schemes, Machine Learning Algorithms, and Open Source Software Development for Electron Microscopy symposium at the 2019 Microscopy and Microanalysis Meeting [1]. Such tools greatly improve reproducibility and scientific integrity by exposing every data processing step, rather than relying on “black box” operations in vendor-supplied software.

While critical for reproducible science, the use of open-source data analysis tools is hampered by inadequate data infrastructure at most EM facilities (until recently, NIST included). At a typical facility, raw data is written to hard drives on the microscope “support” computers, which are periodically cleared as space runs low; the onus is on users to copy their data regularly (typically via USB) and back it up elsewhere, such as on external hard drives in their offices. Perhaps most importantly, this data (and any associated metadata, such as instrument configuration) is usually viewable only within the commercial software packages, leading users to embed metadata haphazardly in file-naming conventions that vary from person to person. As a result, the data becomes difficult to manage across users, hampering collaboration and data sharing. Over longer time spans, or as guest researchers and postdocs finish their tenures, datasets are effectively abandoned and then forgotten after publication, even though they may be useful to others given the right context. Individually managed (meta)data based on arbitrary user preference fails to meet the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles [2] and limits the scientific productivity of multi-user facilities and institutions.

In an attempt to rectify this situation at NIST, the Electron Microscopy Nexus was chosen as a test bed for a custom-built Laboratory Information Management System (LIMS) [3]. The Nexus facility is an internal multi-user EM cooperative whose tool owners agree to share a common reservation system and a basic data management model (i.e., where data is saved). It encompasses 10 instruments that vary in capability (TEM, SEM, STEM, FIB, EDS, etc.), manufacturer, and age, which is typical of many multi-instrument facilities. This variety presents challenges, but also an opportunity to design a modular LIMS that provides value to researchers and is flexible enough to handle a wide array of instrumentation and usage patterns. The result is NexusLIMS, a data management solution that automatically (without user input) captures experimental context and raw data while streamlining researchers’ ability to search, explore, edit, and download data from any networked device, a substantial improvement over the status quo.
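To accommodate instruments that write different proprietary formats, a modular design can dispatch each file to a format-specific metadata extractor. The following is a minimal sketch of such a registry, assuming one extractor per file extension; the names `register_extractor`, `extract_metadata`, and the placeholder extractors (and the example extensions) are hypothetical and do not reflect the actual NexusLIMS code.

```python
from pathlib import Path
from typing import Callable, Dict

# Hypothetical type: an extractor maps a file path to a flat metadata dict.
Extractor = Callable[[Path], Dict[str, str]]

# Registry mapping lowercase file extensions to extractor functions.
_EXTRACTORS: Dict[str, Extractor] = {}


def register_extractor(*extensions: str) -> Callable[[Extractor], Extractor]:
    """Decorator that registers an extractor for one or more file extensions."""
    def decorator(func: Extractor) -> Extractor:
        for ext in extensions:
            _EXTRACTORS[ext.lower()] = func
        return func
    return decorator


@register_extractor(".dm3", ".dm4")
def extract_digital_micrograph(path: Path) -> Dict[str, str]:
    # Placeholder: a real extractor would parse the proprietary tag structure.
    return {"format": "DigitalMicrograph", "filename": path.name}


@register_extractor(".tif", ".tiff")
def extract_tiff(path: Path) -> Dict[str, str]:
    # Placeholder: a real extractor would read vendor-specific TIFF tags.
    return {"format": "TIFF", "filename": path.name}


def extract_metadata(path: Path) -> Dict[str, str]:
    """Dispatch on the (case-insensitive) extension, with a minimal fallback."""
    extractor = _EXTRACTORS.get(path.suffix.lower())
    if extractor is None:
        # Unrecognized formats still get basic file information.
        return {"format": "unknown", "filename": path.name}
    return extractor(path)
```

For example, `extract_metadata(Path("session/image_001.DM3"))` returns `{"format": "DigitalMicrograph", "filename": "image_001.DM3"}`. With this pattern, support for a new instrument could be added by registering one more extractor without touching the dispatch logic.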

NexusLIMS consists of two interacting systems (see Figure 1), known as the back-end and the front-end. The back-end gathers enough contextual information to build a “record” of a given experiment. This experimental context is captured from multiple data sources (highlighted in green in Fig. 1), including instrument schedulers, metadata recorded by proprietary software, electronic laboratory notebook records, a database of session logs, and the collected data itself. From this content, the back-end builds a structured eXtensible Markup Language (XML) record that fully describes the experiment and its data, together with their relevant context. This record is automatically loaded into a customized instance of the Configurable Data Curation System (CDCS) (highlighted in blue in Fig. 1) [4] and linked to archival storage of the data on a central file server. Experimental records are displayed to users via customizable XML stylesheets, creating a central system for browsing, searching, accessing, and analyzing them (see Figure 2). This entire process requires no interaction from users and no changes to their current workflows.
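The record-building step can be illustrated with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch assuming that a session is defined by a reservation window and that a file belongs to the session if its modification time falls within that window; the `DataFile` class, the `build_record` function, and the element names (`Experiment`, `summary`, `dataset`, `meta`) are hypothetical and are not the NexusLIMS schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class DataFile:
    """Hypothetical description of one raw data file."""
    path: str
    modified: datetime
    metadata: Dict[str, str]


def build_record(instrument: str, user: str, start: datetime, end: datetime,
                 files: List[DataFile]) -> ET.Element:
    """Build an XML record for one session (illustrative schema only)."""
    root = ET.Element("Experiment")

    # Session context, e.g. as taken from a reservation system.
    summary = ET.SubElement(root, "summary")
    ET.SubElement(summary, "instrument").text = instrument
    ET.SubElement(summary, "user").text = user
    ET.SubElement(summary, "start").text = start.isoformat()
    ET.SubElement(summary, "end").text = end.isoformat()

    # Keep only files written during the session window, in time order.
    in_session = [df for df in files if start <= df.modified <= end]
    for df in sorted(in_session, key=lambda d: d.modified):
        dataset = ET.SubElement(root, "dataset", location=df.path)
        for key, value in sorted(df.metadata.items()):
            ET.SubElement(dataset, "meta", name=key).text = value
    return root
```

Calling `ET.tostring(build_record(...), encoding="unicode")` serializes the record as text for upload. A production system would likely also validate the record against a formal XML Schema before loading it into a repository; this sketch omits validation.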

Figure 1.

A schematic representation of the NexusLIMS system developed at NIST. The “back-end” experimental record building code is depicted on the left, with the “front-end” user-facing interactive portion depicted on the right. See the text for further details. Such a design enables easy access to data from any location via user interaction or API access, and promotes the FAIR principles.

Figure 2.

A typical display of an experimental record shown to the user via the CDCS-powered front-end. A few features are highlighted: the summary of the experiment (blue rectangle) contains contextual information about the experiment, extracted from the metadata collected when the instrument is reserved. On the right side (green rectangle), an interactive gallery of preview images allows a user to quickly page through the data contained in the record to gain insight into the experiment’s content (often not possible with proprietary data formats). On the left (pink rectangle), a listing of the experiment’s content shows how many different sets of data were collected during the session. Further below, the individual datasets (files) are clustered into “experimental activities”, and for every file the user can search instrument setup parameters and metadata extracted from the proprietary data formats, in both human-readable (brown rectangle) and machine-readable (red rectangle) formats. Links are also provided to quickly download the original data.
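The grouping of files into “experimental activities” could be approximated by clustering file timestamps. The sketch below uses a simple gap threshold, starting a new activity whenever consecutive files are separated by more than `max_gap`; the function name `cluster_by_gap` and the 10-minute default are assumptions, and the clustering method actually used by NexusLIMS may differ.

```python
from datetime import datetime, timedelta
from typing import List


def cluster_by_gap(times: List[datetime],
                   max_gap: timedelta = timedelta(minutes=10)) -> List[List[datetime]]:
    """Group timestamps into activities separated by gaps longer than max_gap."""
    if not times:
        return []
    ordered = sorted(times)
    clusters = [[ordered[0]]]
    for t in ordered[1:]:
        if t - clusters[-1][-1] > max_gap:
            clusters.append([t])  # long pause: start a new activity
        else:
            clusters[-1].append(t)  # same activity continues
    return clusters
```

For example, files written at 9:00, 9:02, 9:05, 9:40, and 9:41 form two activities of three and two files. A fixed threshold is easy to reason about, but it may split or merge activities poorly when acquisition rates vary between techniques.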

By extracting and collecting these experimental records, a centralized repository of experiment metadata and raw data from the EM Nexus facility has been created and is currently used by over 50 research staff and managers at NIST. Users can more easily share their data, search across multiple experiments or samples, and find related experiments using the powerful querying tools provided by CDCS. Project managers can easily find data collected by former interns and postdocs, and all data is automatically backed up without any intervention from users. In the future (once more data has been collected), this system will enable “in the cloud” data analysis using centralized services (such as Jupyter Notebooks), as well as automated analysis of large amounts of data using AI/ML techniques.
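As a local illustration of cross-experiment search (not the CDCS query interface), the sketch below scans a collection of XML records and returns those from a given instrument that contain a matching metadata value. It assumes a hypothetical record layout with a `summary/instrument` element and `meta` elements carrying a `name` attribute; the function `find_records` is likewise hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def find_records(records: Dict[str, str], instrument: str,
                 meta_name: str, meta_value: str) -> List[str]:
    """Return IDs of records from `instrument` containing the given metadata value."""
    matches = []
    for record_id, xml_text in records.items():
        root = ET.fromstring(xml_text)
        # Skip records acquired on other instruments.
        if root.findtext("summary/instrument") != instrument:
            continue
        # Search every <meta> element anywhere in the record.
        for meta in root.iter("meta"):
            if meta.get("name") == meta_name and meta.text == meta_value:
                matches.append(record_id)
                break  # one match is enough for this record
    return matches
```

Scanning every record is fine for a handful of documents; a repository serving many users would instead rely on an indexed database query, which is the role a system such as CDCS plays.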

References
