Developing the Minimum Dataset for the New Mexico Decedent Image Database

Shamsi Daneshvari Berry; Philip J Kroth; Heather J H Edgar; Teddy D Warner

doi:10.1055/s-0041-1730999

. 2021 Jun 2;12(3):518–527. doi: 10.1055/s-0041-1730999

Developing the Minimum Dataset for the New Mexico Decedent Image Database

Shamsi Daneshvari Berry ^1,^2,^✉, Philip J Kroth ¹, Heather J H Edgar ², Teddy D Warner ²

PMCID: PMC8172257 PMID: 34077973

Abstract

Background A minimum dataset (MDS) can be determined ad hoc by an investigator or small team; by a metadata expert; or by using a consensus method to take advantage of the global knowledge and expertise of a large group of experts. The first method is the most commonly applied.

Objective Here, we describe a use of the third approach using a modified Delphi method to determine the optimal MDS for a dataset of full body computed tomography scans. The scans are of decedents whose deaths were investigated at the New Mexico Office of the Medical Investigator and constitute the New Mexico Decedent Image Database (NMDID).

Methods The authors initiated the consensus process by suggesting 50 original variables to elicit expert reactions. Experts were recruited from a variety of scientific disciplines and from around the world. Three rounds of variable selection showed high rates of consensus.

Results In total, 59 variables were selected, only 52% of which the original resource authors selected. Using a snowball method, a second set of experts was recruited to validate the variables chosen in the design phase. During the validation phase, no variables were selected for deletion.

Conclusion NMDID is likely to remain more “future proof” than if a single metadata expert or only the original team of investigators designed the metadata.

Keywords: minimum dataset, metadata, future uses, Delphi method, snowball sampling

Introduction

The total amount of digitally stored data now exceeds 44 zetabytes. 1 These data are comprised of both data collected for scientific inquiry and those created as an artifact of another nonresearch purpose. For most research databases, the goal in their creation is to gather information regarding a scientific query, investigation, or task immediately at hand. 2 3 Data collected for other purposes, such as clinical work on the other hand, are used as artifacts of their original purpose. Regardless of whether data are collected for a specific purpose or exist as artifacts, the database is highly dependent on how well the dataset designers predicted its future uses, making it more “future proof,” by selection of optimal metadata. Unfortunately, many selected database variables tend to be chosen to support immediate project needs and usually are chosen without an eye to future applications.

Image databases are growing in number and size, with differing associated modalities and variables. Multiple image databases are available in the health field, such as MedPix and the Cancer Imaging Archive. The cases available are organ specific, and the data are related to disease. 4 5 Because the technology to search images per se is not yet widely available or standardized, a second associated database is needed to contain metadata associated with each image. 6 Users search on the images' meta-database (or, in this case, the data about the image data) to find images of interest. 7 The overall selection of metadata variables influences the breadth and variety of research that can be conducted. The quality of the set of metadata are inextricably linked to the quality and ease of the process of gathering, selecting, and transforming the data to answer an analytical question and therefore determine the value of the data in the future. However, it is difficult to predict all potential future uses of a database and so also difficult to predict the best metadata fields to select at the outset.

How can database designers determine metadata in a way that optimizes the value of the database for future users? In a world with unlimited time and funding, unlimited possible metadata fields would be desired. However, resource constraints limit the sophistication of the metadata design to a relatively small set of variables deemed most important at the time of initial design. Selecting too few or inappropriate variables can significantly reduce the value of the data over time by limiting the potential for reuse of the data. In contrast, defining too many fields requires increased use of valuable resources and reaches a point of diminishing returns. The challenge in database design is to define a reasonably sized meta-dataset that will produce high value now and in the future. The process of designing highly useful minimal datasets is critical to maximize the value of research data over time. As a result, how and by whom the metadata are selected affects the usefulness of the database, both now and in the future.

Metadata

Metadata are the structured information that characterizes each case in the primary database and supports additional functions or actions about an object, topic, or person. 7 Good quality metadata allows the user to efficiently retrieve information in a timely manner, whereas poor quality metadata may miss pertinent cases in a database. 8 The use of appropriate and high quality metadata facilitates information retrieval, searching, maintenance, understanding, interoperability, and reuse. 9 10

In this current technology heavy world with expanding data, there are frequent opportunities for data collection and “data wrangling.” For example, images for medical purposes are occurring every day in hospitals, doctor's offices, imaging facilities, and coroner/medical examiner offices. In 2019, an estimated 91 million computed tomography (CT) scans were performed in the United States alone. 11 At present, the majority of these images are stored in picture archiving and communication system, 12 encoded with Digital Imaging and Communications in Medicine standards, 13 and the Abbreviated Injury Scale, 14 but without easily linked health-related metadata. 15 16 Without association of such primary information with the other metadata, the ability to facilitate research or reuse data are limited. As the number of images created and stored continues to grow, few facilities are incorporating plans for image reuse by investigators and educators.

Metadata Selection

The effectiveness of retrieving data is dependent upon the number of metadata fields and the content contained within. It is a balance of discovery and cost, where additional information is available with more metadata fields, but costs more resources. 10 New metadata fields can be added as necessary, making the database more adaptable (add, delete, and change variables). However, it requires significant resources to back fill the data variables added. 10

Minimum Dataset Creation

Individual metadata elements can be combined to form a set of data for an image or object, called a minimum dataset (MDS). 17 A MDS allows for interoperability of data between investigators in the healthcare system and research domains. 18 19 20 21 22 23 Major domains using an MDS to standardize retrieval of vital information include nursing, 23 24 genetics, 25 nursing homes, 26 spine trauma, 20 Infertility registry, 21 autoimmune disorders, 22 brain injury, 19 and studies of rare and orphaned diseases. 27 28

Multiple approaches can be undertaken to select metadata, including through the resource author (those conducting the research or collecting the data—usually the most common approach), a metadata specialist, or a collaborative procedure. 8 19 20 21 22 23 24 25 27 28 Evidence has shown that many resource authors lack the skills and training to index or apply terminology standards and theories. Therefore, they often create metadata that is insufficient for conducting their research or any research beyond their immediate needs. Inadequate metadata weakens the ability to discover relevant records and can produce underpowered results. Using a metadata specialist can have the same problems, as they lack knowledge about the specific science being undertaken. 29 Greenberg and Robertson 29 suggest that the best quality metadata are obtained through a collaborative process. The exact method for collaboration can vary depending on the resources available for the creation of the MDS. The methods can include the Delphi method, in which there is no direct interaction, and the Nominal Group Technique, in which a round-robin discussion occurs. 30

Assessment of a Minimum Dataset

To ensure its potential use beyond the immediate research purpose, the quality of the MDS should be evaluated. A variety of assessments have been used in the past to evaluate MDS; therefore, the evaluation of a MDS is not a consistent practice. However, the most common procedure is a survey. 31

Objectives

The Office of the Medical Investigator (OMI) is the centralized medical examiner's office for the State of New Mexico. Medical examiner cases are thought to primarily be from homicide or suicide deaths. However, the vast majority of cases in 2010 and in 2017 were from natural or accidental causes ( Table 1 ). 32 In addition, the autopsied OMI sample consist of the ethnic and racial composition of the state. In the 2010 census, 49% reported as Hispanic and 11% as Native American. 33 For the OMI sample, 30% were Hispanic in 2010 and 29% in 2017. Native Americans accounted for 9% of deaths routed to the OMI in 2010 and 2017. 32 34

Table 1. Manner of death at the old myocardial infarction in 2010 and 2017, and the new Mexico decedent image database (mid-2010 to mid-2017) 32 34 .

Manner of Death	2010	2017	NMDID
Natural	24.7%	27.4%	34.6%
Accidental	35.4%	40.8%	38.6%
Suicides	16.8%	12.9%	15.4%
Homicides	9.5%	12%	7.4%
Undetermined or pending	13.5%	5.1%	4% ^a

Open in a new tab

Abbreviation: NMDID, New Mexico Decedent Image Database.

Undetermined only.

The Center for Forensic Imaging at the OMI was awarded in 2010 a grant from the National Institute of Justice to evaluate the efficacy of postmortem computed tomography (CT) scans to supplement or supplant a traditional autopsy (2010-DN-BX-K205). As a result, roughly 85% of decedents who underwent an autopsy at the OMI received a high resolution, head-to-toe CT scan. This produced thousands of whole-body 3D CT images between 2010 and 2017—a treasure trove for a variety of research domains—but with no organized and associated metadata to allow investigators to efficiently identify images of interest . As with the vast amount of data in healthcare, curation of the OMI dataset for both education and research is greatly needed. 18

The OMI collected data for nonresearch purposes, that is, investigation, similar to the biomedical field, and healthcare data. These data would be lacking completeness and breadth needed for the effective use of the images. For this reason, we determined to collect additional data in interviews with next of kin.

The incorporation of a comprehensive annotation schema into a database occurred with the creation of the New Mexico Decedent Image Database (NMDID). NMDID facilitates future research using the CT images and associated health and lifestyle information by making them efficiently findable. NMDID is a unique resource due to its size, 3D images, and diverse population.

To design an optimal MDS, we used a collaborative procedure to choose the metadata for NMDID. We had two objectives:

To determine the MDS to associate with CT scans in a database of 3D, whole-body, decedent images developed at the OMI. The MDS should enable investigators, from multiple domains, to efficiently and effectively search for images from the database that meet the inclusion and exclusion criteria of their studies with optimal sensitivity and specificity.
To assess the relevance of the selected metadata. The MDS should be validated by a second separate group of experts to verify its usefulness in conducting research.

Methods

Design

We selected a consensus method to create the MDS to reduce biases from wither a single database creator or metadata specialist. 29 Furthermore, we used an electronic version to avoid the costs of an in person meeting. Electronic consensus was also asynchronous, so that each participant could do the work at their convenience. We chose the Delphi method because it facilitated electronic data collection 35 ; however, other consensus methods would have also been appropriate. The Delphi method involves asking experts from relevant domains to obtain convergence of opinion. 36 The method allows for anonymous participation of experts through an iterative process. Due to the varying nature of each consensus panel, the level of consensus should be determined after each round to determine when additional rounds are no longer needed.

Once an MDS is determined through an iterative process, it needs to be validated or assessed by additional experts not involved with its creation, to ensure objectivity. 31 37 Questionnaires are regularly used to validate an MDS. The process outlined here did not specify how the selected fields will be collected or coded with appropriate standards. Encoding of the metadata took place after the MDS was defined. 38 The methods used in design and validation phases are illustrated in Fig. 1 .

Fig. 1 — Methods for designing (Delphi) and validating (snowball expert sampling) a minimum dataset.

Expert Determination

For the design phase, we formulated a list of domains amenable to using whole-body, 3D, cadaveric CT scans, and the associated data for future research. Within each of the domains, peer-reviewed literature was searched to find experts. In addition, each of the authors suggested experts within their respective disciplines as well as individuals to contact for their recommendations.

For the validation phase, the design phase participants were asked to recommend two to three experts 39 whom they believed might use the full-body, 3D, cadaveric CT scans, and associated database with health, lifestyle, demographic and cause of death data. The design phase participants were asked for name, institution, and email address (if known) for each validation expert they recommended. This question was included in the third round of the Delphi survey.

All participants in the surveys had terminal degrees in their fields (MD, PhD, RN). Additional data on the participants were not captured; however, most were based in the United States with a smaller percentage being international.

Questionnaire Creation

A preliminary questionnaire was built in REDCap, 40 a web-based, open-source data capture program that has security and privacy controls. The initial questionnaire consisted of variables that had been suggested by the resource authors (S.D.B., H.J.H.E., P.J.K.) as important fields to include in the MDS. As such, these initial variables could be proxies for a resource author's database fields. The first questionnaire provided a basic set of variables within five categories: (1) personal characteristics, (2) lifestyle, (3) health, (4) occupation, and (5) other variables. For all rounds of the design phase, experts voted on original fields and the additional fields they suggested. When original terms were selected as not important for the database, they were eliminated.

The follow-up questionnaires in the design phase allowed participants to revise the groups and their own ideas. This process continued until we believed saturation had been reached. The last questionnaire within the design phase also asked participants to rate the suggested database fields in terms of importance of inclusion in the MDS (e.g., from 0 = not important at all to 10 = absolutely essential to include).

The REDCap validation questionnaire asked participants to evaluate the database fields and rate them. In addition, the experts were asked to provide any essential fields that the design phase participants did not identify.

For both the design and validation phases, a one-page recruitment letter was mailed to potential experts, as well as sent electronically to their institution email; that letter also included a one-page consent form. Because this project collected only nonsensitive data, we requested and received a waiver for a signed documentation of informed consent.

Results

Design Phase

A total of 72 experts were sent a letter and email asking for participation in the design phase. The 17 domains surveyed are listed in Table 2 . In total, 42 participants (58% response rate) completed the questionnaire. Thirty-two experts self-identified their research domain; the summary is listed in Table 3 . The emails and letters were sent at the end of September to coincide with the beginning of the fall school schedule. The questionnaire remained open until the end of November (10 weeks total).

Table 2. Research domains sent surveys.

Research domains
Informatics	16
Epidemiology	7
Anthropology	4
Forensic anthropology	4
Forensics	4
Dentistry	3
Growth and development	3
Medicine	3
Biomechanics	2
Demography	2
Health disparities	2
Health information exchanges	2
Imaging research	2
Odontology	2
Orthopedics	2
Pathology	2
Population variation	2
Public health	2
Radiology	2
Secular change	2
Chronic pain	1
Dental anthropology	1
Health economist	1
Missing person databases	1

Open in a new tab

Table 3. Research domains of participants in the design phase.

Experts' self-identified research domains	Count
Forensic anthropology	7
Anthropology	4
Biomedical informatics	3
Clinical informatics	2
Forensic radiology	2
Biological anthropology	1
Cognitive neuroscience	1
Data management	1
Demography, anthropology	1
Dental medicine, forensic dentistry, paleodontology	1
Emergency medicine	1
Forensic odontology	1
Forensic pathology	1
Health economics and health services research	1
Health services research	1
Health services/epidemiology	1
Pediatric orthopedic surgery	1
Skeletal pathology	1
Unanswered	10
Total	42

Open in a new tab

The first questionnaire contained 50 original database variables (see Appendix A for list) for experts to evaluate and discuss. If a variable was suggested for elimination by the expert, they were asked to provide a reason. At the end of each section experts were asked what additional database variables they advised to have included. This included an “other” category where variables that were outside the five categories could be suggested. Consensus was defined as 60% agreement. In round 1, only four variables were eliminated from the list: last name, first name, marital status, and current residence address.

Appendix A. Original variables.

Personal characteristics	Lifestyle characteristics	Health characteristics	Occupational characteristics	Other characteristics
Last name	Hobbies	Medical diagnoses	Current occupation	Primary cause of death
First name	Current exercise status	Surgical history	Length at occupation	Secondary cause of death
Date of birth	Exercise history	Height	Occupation history	Medical insurance status
Date of death	Current smoking status	Current weight
Current residence address	Smoking history	Weight history
Length at current residence	Current drinking status	Childhood health status
Marital status	Drinking history	Diabetes history
Sex/gender	Current drug use status	Family history of cancer
Race	Drug use history	Cancer diagnosis
Hispanic ethnicity		High blood pressure history
Country of origin		History of broken bones
Parents' country of origin		History of other diseases/disorders
Number of pregnancies		Dental health as a child
Number of live births		Dental health as an adult
Number of living offspring
Annual income
Highest education level
Handedness

Open in a new tab

One investigator (S.D.B.) summarized the results and combined similar suggestions. The second questionnaire contained 120 database variables (including 46 of the original variables) for experts to evaluate. Thirty-three participants (46% response rate) responded to round 2 of the design phase. Round 2 was completed in 2 weeks. Agreement on inclusion of the database variables was extremely high despite the variation in the experts' research domains. As a result, consensus for round 2 was defined as 93%. This value was selected due to a large number of tied variables for importance below 93% (with 50 variables at 92 to 80% consensus). This cut-off point resulted in a manageable number of variables since the data would be coming from calling next of kin and extraction from the medical examiner's database. 41 After elimination of database variables that had less than 93% consensus, one additional variable was added back in (normal height) since related variable (cadaveric height) was part of the MDS. A total of 59 database variables remained after round 2 of the Delphi method determination of the MDS ( Table 4 for the MDS).

Table 4. Fifty-nine metadata variables selected through a modified Delphi method.

Health characteristics		Lifestyle characteristics	Personal characteristics	Highest educational level	Other characteristics
Birth weight	Family history of cancer	Repetitive or habitual activities	Date of birth	Childhood socioeconomic status	Primary cause of death
Congenital abnormalities	History of radiation therapy	Current smoking status	Date of death	Adult socioeconomic status	Contributing cause of death
Medical diagnoses	Facial trauma	Smoking history	zip code	Occupational characteristics	Manner of death
Surgical history	Implanted devises	Current drinking status	Sex/gender	Current occupation	Time delay between death and CT scan
Current medications	Genetic disorders	Drinking history	Race	Length at current occupation	Location of death
Current height	Scoliosis	Current drug use	Country of origin	Major occupation during life	Environmental conditions of cadaver
Cadaver length	Plastic surgery	Drug use history	Number of years in US	Occupation history	Method for identification
Current weight	Dental health as a child	Dietary pattern	Parents' country of origin	Exposure to carcinogens	CT scanner settings
Cadaver weight	Dental health as an adult		Number of pregnancies	Strenuous lifting	Person entering data
Bone density	Presence of dental caries		Number of live births	Length of military service

Open in a new tab

Abbreviations: CT, computed tomography; US, United States.

Round three (31% participation) helped to determine the importance of the variables on a sliding scale from 0 (not important at all) to 10 (essential). This round allowed reduction in the number of variables to be collected if funding was not adequate to capture all 59 variables.

Validation

A true validation would require years of data usage on the database. As this was not possible when determining the metadata, an assessment of the fields was used. This assessment helped to determine how useful the MDS is to researchers outside of the design group.

A total of 34 experts were suggested by 15 design phase participants. Fifty-three percent of participants responded ( Table 5 for their self-identified primary field of interest), suggesting variables for elimination and rating the database fields in order of importance in the MDS.

Table 5. Self-identified research domains of participants in the validation phase.

Experts' self-identified research domains	Count
Forensic anthropology	4
Anthropology	3
Biological anthropology	1
Dentistry	1
Forensic odontology	1
Forensic pathology	1
Interprofessional collaboration	1
Medical devices	1
Medical imaging	1
Modern human skeletal variation	1
Physical anthropology	1
Skeletal biology	1
Total	17

Open in a new tab

No variables were selected for deletion from the MDS during the validation phase. The level of consensus was lower during this portion; however, the majority of variables had greater than 70% consensus (31/59 variables). This demonstrates that the variables selected by the design phase participants were thorough in the selection process and included roughly 60% of the variables the validation phase participants would need for their research.

The validation phase also allowed for additional variables not included in the design phase to be elucidated. Fourteen variables were suggested for addition by the validation phase participants, with only three variables not included or inferred from the original MDS: maxillofacial skeletal relationship category, dental occlusion category, and organ weights. Since most of the variables the researchers wanted were actually included in the database or could be inferred, the 60% estimate of variable usefulness is an understatement. Table 6 for the complete list of variables suggested.

Table 6. Variables suggested in validation phase.

Variables suggested for inclusion	MDS variable can be inferred from	MDS variable can be an additional response	Number of participants suggesting change
Absence/presence of removable dental implants		Implanted devices	1
Occupation of parents	Childhood socioeconomic status		1
Income of parents	Childhood socioeconomic status		1
Income of decedent	Adult socioeconomic status		1
Exercise habits	Habitual activity		1
How consistent was exercise	Habitual activity		1
Was the individual an athlete	Habitual activity		1
Presence of amputations		Major surgeries	1
Presence of surgical implants		Implanted devices	1
Trauma present at death	History of broken bones, primary cause of death, and contributing cause of death		2
Age	Date of death and date of birth		1
Maxillofacial skeletal category			1
Dental occlusion category			1
Organ weights			1

Open in a new tab

Abbreviation: MDS, minimum dataset.

Discussion

The future value of a research database is based on the quality of metadata and an optimal design of the MDS. This is especially true in the realm of image databases. Because the technology to search on the images themselves is in its infancy and not yet ubiquitous, 42 the discovery of specific images of interest relies heavily on the quality of the metadata design. Without sufficient metadata, images will be significantly less discoverable and the sensitivity and specificity of a search or query will decrease markedly. Therefore, appropriate metadata are vital, yet conceptually complicated, requiring a thoughtful balance between discoverability of relevant images and the resources necessary to design and construct a sufficient MDS. Using a consensus method with experts from varying domains is a valuable approach to improve the quality and completeness of the chosen variables and lessen bias. 29 Although varying domains were sought, in some fields experts were not be determined such as public safety. Additionally, the majority of respondents in both the design and validation phases could be classified as anthropologists. This could add bias to how “future proof” the database will become. However, the diversity of research within anthropology is great and those surveyed performed very different research. In addition, many domains not surveyed have used the database to date, such as public safety, art, and virtual education.

Although this method is robust in its ability to identify potentially “future proof” metadata, it is not infallible. Not all variables are discoverable, even after three rounds with experts suggesting and editing metadata fields, and a validation round in which additional participants recommended further variables. Researchers eliminated marital status as a variable in the first round, and it was not suggested for inclusion by the validation phase participants. This is surprising given that it is commonly included in health datasets as good indicator of health. 43 44 45

After a consensus of 93% was imposed on round 2, 59 variables remained. For round 3, the experts were asked if a variable should be kept in the database and how important it might be to future research using the database. This provided us with the ability to create a sliding cut-off point depending on how many final fields we could include in the database.

Some of the variables chosen as important and those eliminated were surprising. The final list of database fields ( n = 59) contained only 26 original variables (52%) of the 50 selected by authors (S.D.B., H.J.H.E., P.J.K.). Table 7 summarizes the variable counts. The vast majority of final variables were suggested by the experts and validated by a separate group. The resulting fields spanned personal characteristics, circumstances of death, health, and lifestyle as well as CT settings. This process supports the value of a consensus method incorporating opinions beyond those of the current project designers.

Table 7. Number of database variables by round in design phase.

Round	Respondents (response rate)	Number of variables evaluated	Number of original variables	Consensus cut-off point
1	42 (68%)	50	50	60%
2	33 (46%)	120	47	93%
3	22 (31%)	59	26	NA

Open in a new tab

Abbreviation: NA, not applicable.

The validation phase also demonstrated that more than 60% of the variables were of interest to the participants for their specific research questions. In addition, of those variables suggested for inclusion in the MDS by the validation group, only three could not be deemed equivalent to existing variables. This suggests that the method of development for this MDS was successful in its attempt to be more future proof and accommodate research from multiple domains.

NMDID became available to the research public in February 2020; as of November 3, 2020, there were 327 users representing 34 countries. The data and images have been used for research on multiple projects including biomechanics, COVID-19, traumatic injury analysis, dental development, art, sarcoidosis, Hispanic diversity, obesity research, and virtual education. 41 While we expected that education might constitute a relatively minor component of uses for NMDID, the requirements imposed by the COVID-19 pandemic significantly increased these applications, providing a case-study for the unexpected value of future proofing.

Conclusion

Using virtual Delphi and snowball methodologies to obtain consensus can be an extremely beneficial tool for MDS design. These two methods require a large number of experts to consider appropriate variables but can be conducted at a relatively low cost. Furthermore, by requiring consensus among disparate researchers, bias that may be inherent in one individual's metadata creation can be balanced by the opinions of others. Other consensus methods may also be beneficial but would also require the diverse domains queried and a validation phase.

It is difficult to ensure that any database will be “future proof.” However, the database will likely remain more relevant in the future if more than a single metadata expert or original team of investigators designed the metadata. In this case, if only database creators (S.D.B., H.J.H.E., P.J.K.) had been consulted for MDS creation for NMDID, over 56% of final variables would not have been captured. This research suggests not only is expert group opinion the path to follow for MDS development, but diverse representation is vital for making a MDS more “future proof.”

The variables such as operationalized, vocabulary standards applied, and seven additional fields of interest to the authors (including marital status) were added before NMDID was built. The operationalization phase required joining some variables together such as current and former smoking status, and breaking others apart such as sex and gender. In the final MDS, there are 69 variables. 46 The database is currently freely available at NMDID.UNM.EDU.

Clinical Relevance Statement

This article demonstrates a method to ensure a database is more “future proof” when created from the artifact of care.

Multiple Choice Questions

The best method for creating the metadata of a database is to query:
1. The resource author
2. A metadata specialist
3. The consensus of experts
Correct Answer: The answer is option c. A resource author is biased to their own research and a metadata specialist does not know the research topic as well. So, the best technique for lessening biases is to use a consensus of experts. 29
A “future-proof” database allows for:
1. Only the original research question to be answered
2. Research beyond the original purpose
3. Previous research to be reanalyzed
Correct Answer: The correct answer is option a. A future-proof database allows for the original research question as well as research beyond the original purpose. It ensures that future questions can be answered.

Acknowledgments

The authors would like to thank the Office of the Medical Investigator in Albuquerque, NM. Statements made are solely the responsibility of the authors

Funding Statement

Funding This study is funded by National Institute of Justice 2016-DN-BX-0144.

Conflict of Interest None declared.

Protection of Human and Animal Subjects

We received Institutional Review Board approval from the University of New Mexico Human Subjects Research Review Committee on June 10, 2013 (Human Research Protections Office13–229).

References

1.How much data is created every day?[27 powerful stats]. Accessed November 24, 2020 at:https://seedscientific.com/how-much-data-is-created-every-day/
2.Pollard T J, Johnson A EW, Raffa J D, Celi L A, Mark R G, Badawi O. The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Sci Data. 2018;5(01):180178. doi: 10.1038/sdata.2018.178. [DOI] [PMC free article] [PubMed] [Google Scholar]
3.Bielefeld R A, Yamashita T S, Kerekes E F, Ercanli E, Singer L T. A research database for improved data management and analysis in longitudinal studies. MD Comput. 1995;12(03):200–205. [PMC free article] [PubMed] [Google Scholar]
4.National Library of Medicine MedPixAccessed 2021 at:https://medpix.nlm.nih.gov/home
5.Cancer Imaging Archive Accessed 2021 at:https://www.cancerimagingarchive.net/
6.Tagare H D, Jaffe C C, Duncan J. Medical image databases: a content-based retrieval approach. J Am Med Inform Assoc. 1997;4(03):184–198. doi: 10.1136/jamia.1997.0040184. [DOI] [PMC free article] [PubMed] [Google Scholar]
7.Greenberg J. Metadata generation: processes, people and tools. Bull Am Soc Inf Sci Technol. 2005;29(02):16–19. [Google Scholar]
8.Sarah C, Jane B, Rónán O, Ben R. Quality assurance for digital learning object repositories: issues for the metadata creation process. ALT J. 2004;12(01):5–20. [Google Scholar]
9.Sicilia M-Á. Metadata, semantics, and ontology:providing meaning to information resources. Int J Metadata Semant Ontol. 2006;1(01):83–86. [Google Scholar]
10.Malaxa V, Douglas I. A Framework for metadata creation tools. Interdiscip J E Learning Learn Objects. 2005;1(01):151–162. [Google Scholar]
11.2019 CT Market Outlook ReportAccessed 2019 at:https://imvinfo.com/product/2020-ct-market-outlook-report/
12.Choplin R H, Boehme J M, II, Maynard C D. Picture archiving and communication systems: an overview. Radiographics. 1992;12(01):127–129. doi: 10.1148/radiographics.12.1.1734458. [DOI] [PubMed] [Google Scholar]
13.DICOM Accessed Accessed November 24, 2020 at:https://www.dicomstandard.org/
14.Greenspan L, McLellan B A, Greig H. Abbreviated injury scale and injury severity score: a scoring chart. J Trauma. 1985;25(01):60–64. doi: 10.1097/00005373-198501000-00010. [DOI] [PubMed] [Google Scholar]
15.Annamalai M, Guo D, Susan M, Sep J S.2009 U. Oracle database 11 g DICOM medical image supportAccessed 2009 at:https://download.oracle.com/otndocs/products/multimedia/pdf/oow2009/mm_oow09_dicom_S311474.pdf
16.Health at a Glance 2017: OECD IndicatorsAccessed 2017 at:https://www.oecd-ilibrary.org/social-issues-migration-health/health-at-a-glance-2017_health_glance-2017-en
17.Health Information Policy Council Background paper: uniform minimum health data setsAccessed 1983 at:https://link.springer.com/chapter/10.1007/978-1-4757-4160-5_18
18.Werley H H, Devine E C, Zorn C R, Ryan P, Westra B L. The nursing minimum data set: abstraction tool for standardized, comparable, essential data. Am J Public Health. 1991;81(04):421–426. doi: 10.2105/ajph.81.4.421. [DOI] [PMC free article] [PubMed] [Google Scholar]
19.Domensino A F, Winkens I, van Haastregt J CM, van Bennekom C AM, van Heugten C M. Defining the content of a minimal dataset for acquired brain injury using a Delphi procedure. Health Qual Life Outcomes. 2020;18(01):30. doi: 10.1186/s12955-020-01286-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
20.Tee J W, Chan C HP, Gruen R L.Inception of an Australian Spine Trauma Registry: The Minimum DatasetAccessed 2012 at:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3864422/ [DOI] [PMC free article] [PubMed]
21.Abbasi M, Ahmadian L, Amirian M, Tabesh H, Eslami S. Perspect Heal Inf Manag; 2018. The Development of a Minimum Data Set for an Infertility Registry. [PMC free article] [PubMed] [Google Scholar]
22.McCann L J, Kirkham J J, Wedderburn L R. Development of an internationally agreed minimal dataset for juvenile dermatomyositis (JDM) for clinical and research use. Trials. 2015;16(01):268. doi: 10.1186/s13063-015-0784-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
23.Ranegger R, Hackl W O, Ammenwerth E. A proposal for an Austrian nursing minimum data set (NMDS): a delphi study. Appl Clin Inform. 2014;5(02):538–547. doi: 10.4338/ACI-2014-04-RA-0027. [DOI] [PMC free article] [PubMed] [Google Scholar]
24.Werley H H, Lang N M, Westlake S K. Brief summary of the nursing minimum data set conference. Nurs Manage. 1986;17(07):42–45. [PubMed] [Google Scholar]
25.Meaney F J, Cunningham G C, Riggle S M.Development of a national genetic services database Proc Symp Comput Appl Med Care. Accessed 1991 at: Published online1991424–428.https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2247567/ [PMC free article] [PubMed]
26.Porock D, Oliver D P, Zweig S. Predicting death in the nursing home: development and validation of the 6-month minimum data set mortality risk index. J Gerontol A Biol Sci Med Sci. 2005;60(04):491–498. doi: 10.1093/gerona/60.4.491. [DOI] [PubMed] [Google Scholar]
27.Rubinstein Y R, Groft S C, Bartek R. Creating a global rare disease patient registry linked to a rare diseases biorepository database: Rare Disease-HUB (RD-HUB) Contemp Clin Trials. 2010;31(05):394–404. doi: 10.1016/j.cct.2010.06.007. [DOI] [PMC free article] [PubMed] [Google Scholar]
28.Jenders R, McDonald C, Rubinstein Y, Groft S.Applying standards to public health: an information model for a global rare-diseases registryAccessed 2011 at: Published online 2011:1819https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3900177/
29.Greenberg J, Robertson W D.Semantic Web Construction: An Inquiry of Authors' Views on Collaborative Metadata GenerationVol 0.;2002 [Google Scholar]
30.Bagley Thompson C, Schaffer J. Minimum data set development: air transport time-related terms. Int J Med Inform. 2002;65(02):121–133. doi: 10.1016/s1386-5056(02)00008-4. [DOI] [PubMed] [Google Scholar]
31.Hillmann D I. Metadata quality: From evaluation to augmentation. Cat Classif Q. 2008;46(01):65–80. [Google Scholar]
32.Zumwalt R E, Aurelius M, Brooks E.2010 Annual report office of the medical investigator state of New MexicoAccessed 2010 at:https://hsc.unm.edu/omi/_docs/pdfs/ar2010.pdf
33.2010 Census: new Mexico profileAccessed August 5, 2020 at:https://www2.census.gov/geo/maps/dc10_thematic/2010_Profile/2010_Profile_Map_New_Mexico.pdf
34.New Mexico office of the medical investigator annual reportAccessed 2017 at:https://hsc.unm.edu/omi/_docs/pdfs/ar2018.pdf
35.Yousuf M I.Using experts‘experts’opinions through Delphi techniquePract Assess, Res Eval 2007;12(04):
36.Hsu C-C, Sandford B A. The Delphi technique: making sense of consensus. Pract Assess, Res Eval. 2007;12:10. [Google Scholar]
37.Goossen W TF, Epping P JMM, Feuth T, Dassen T WN, Hasman A, van den Heuvel W JA. A comparison of nursing minimal data sets. J Am Med Inform Assoc. 1998;5(02):152–163. doi: 10.1136/jamia.1998.0050152. [DOI] [PMC free article] [PubMed] [Google Scholar]
38.Berry S D, Edgar H JH. Standardizing data from the dead. Stud Health Technol Inform. 2019;264:1427–1428. doi: 10.3233/SHTI190467. [DOI] [PubMed] [Google Scholar]
39.Leo A. Goodman. Snowball Sampling. Ann Math Stat. 1961;32(01):148–170. [Google Scholar]
40.Harris P A, Taylor R, Thielke R, Payne J, Gonzalez N, Conde J G. Research electronic data capture (REDCap)--a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform. 2009;42(02):377–381. doi: 10.1016/j.jbi.2008.08.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
41.Edgar H, Daneshvari Berry S, Moes E, Adolphi N, Bridges P, Nolte K. Office of the Medical Investigator; University of New Mexico: 2020. New Mexico decedent image database. [Google Scholar]
42.Hou J, Chen Z, Qin X, Zhang D. Automatic image search based on improved feature descriptors and decision tree. Integr Comput Aided Eng. 2011;18(02):167–180. [Google Scholar]
43.Robards J, Evandrou M, Falkingham J, Vlachantoni A. Marital status, health and mortality. Maturitas. 2012;73(04):295–299. doi: 10.1016/j.maturitas.2012.08.007. [DOI] [PMC free article] [PubMed] [Google Scholar]
44.Verbrugge L M.Marital Status and HealthVol 41.; Accessed 2021 at:https://psycnet.apa.org/record/1980-27843-001 [Google Scholar]
45.Umberson D. Gender, marital status and the social control of health behavior. Soc Sci Med. 1992;34(08):907–917. doi: 10.1016/0277-9536(92)90259-s. [DOI] [PubMed] [Google Scholar]
46.Berry S D, Edgar H JH. Extracting and standardizing medical examiner data to improve health. AMIA Jt Summits Transl Sci proceedings AMIA Jt Summits. Transl Sci. 2020;2020:63–70. [PMC free article] [PubMed] [Google Scholar]

[OR210012ra-1] 1.How much data is created every day?[27 powerful stats]. Accessed November 24, 2020 at:https://seedscientific.com/how-much-data-is-created-every-day/

[JR210012ra-2] 2.Pollard T J, Johnson A EW, Raffa J D, Celi L A, Mark R G, Badawi O. The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Sci Data. 2018;5(01):180178. doi: 10.1038/sdata.2018.178. [DOI] [PMC free article] [PubMed] [Google Scholar]

[JR210012ra-3] 3.Bielefeld R A, Yamashita T S, Kerekes E F, Ercanli E, Singer L T. A research database for improved data management and analysis in longitudinal studies. MD Comput. 1995;12(03):200–205. [PMC free article] [PubMed] [Google Scholar]

[OR210012ra-4] 4.National Library of Medicine MedPixAccessed 2021 at:https://medpix.nlm.nih.gov/home

[OR210012ra-5] 5.Cancer Imaging Archive Accessed 2021 at:https://www.cancerimagingarchive.net/

[JR210012ra-6] 6.Tagare H D, Jaffe C C, Duncan J. Medical image databases: a content-based retrieval approach. J Am Med Inform Assoc. 1997;4(03):184–198. doi: 10.1136/jamia.1997.0040184. [DOI] [PMC free article] [PubMed] [Google Scholar]

[JR210012ra-7] 7.Greenberg J. Metadata generation: processes, people and tools. Bull Am Soc Inf Sci Technol. 2005;29(02):16–19. [Google Scholar]

[JR210012ra-8] 8.Sarah C, Jane B, Rónán O, Ben R. Quality assurance for digital learning object repositories: issues for the metadata creation process. ALT J. 2004;12(01):5–20. [Google Scholar]

[JR210012ra-9] 9.Sicilia M-Á. Metadata, semantics, and ontology:providing meaning to information resources. Int J Metadata Semant Ontol. 2006;1(01):83–86. [Google Scholar]

[JR210012ra-10] 10.Malaxa V, Douglas I. A Framework for metadata creation tools. Interdiscip J E Learning Learn Objects. 2005;1(01):151–162. [Google Scholar]

[OR210012ra-11] 11.2019 CT Market Outlook ReportAccessed 2019 at:https://imvinfo.com/product/2020-ct-market-outlook-report/

[JR210012ra-12] 12.Choplin R H, Boehme J M, II, Maynard C D. Picture archiving and communication systems: an overview. Radiographics. 1992;12(01):127–129. doi: 10.1148/radiographics.12.1.1734458. [DOI] [PubMed] [Google Scholar]

[OR210012ra-13] 13.DICOM Accessed Accessed November 24, 2020 at:https://www.dicomstandard.org/

[JR210012ra-14] 14.Greenspan L, McLellan B A, Greig H. Abbreviated injury scale and injury severity score: a scoring chart. J Trauma. 1985;25(01):60–64. doi: 10.1097/00005373-198501000-00010. [DOI] [PubMed] [Google Scholar]

[OR210012ra-15] 15.Annamalai M, Guo D, Susan M, Sep J S.2009 U. Oracle database 11 g DICOM medical image supportAccessed 2009 at:https://download.oracle.com/otndocs/products/multimedia/pdf/oow2009/mm_oow09_dicom_S311474.pdf

[OR210012ra-16] 16.Health at a Glance 2017: OECD IndicatorsAccessed 2017 at:https://www.oecd-ilibrary.org/social-issues-migration-health/health-at-a-glance-2017_health_glance-2017-en

[OR210012ra-17] 17.Health Information Policy Council Background paper: uniform minimum health data setsAccessed 1983 at:https://link.springer.com/chapter/10.1007/978-1-4757-4160-5_18

[JR210012ra-18] 18.Werley H H, Devine E C, Zorn C R, Ryan P, Westra B L. The nursing minimum data set: abstraction tool for standardized, comparable, essential data. Am J Public Health. 1991;81(04):421–426. doi: 10.2105/ajph.81.4.421. [DOI] [PMC free article] [PubMed] [Google Scholar]

[JR210012ra-19] 19.Domensino A F, Winkens I, van Haastregt J CM, van Bennekom C AM, van Heugten C M. Defining the content of a minimal dataset for acquired brain injury using a Delphi procedure. Health Qual Life Outcomes. 2020;18(01):30. doi: 10.1186/s12955-020-01286-3. [DOI] [PMC free article] [PubMed] [Google Scholar]

[OR210012ra-20] 20.Tee J W, Chan C HP, Gruen R L.Inception of an Australian Spine Trauma Registry: The Minimum DatasetAccessed 2012 at:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3864422/ [DOI] [PMC free article] [PubMed]

[BR210012ra-21] 21.Abbasi M, Ahmadian L, Amirian M, Tabesh H, Eslami S. Perspect Heal Inf Manag; 2018. The Development of a Minimum Data Set for an Infertility Registry. [PMC free article] [PubMed] [Google Scholar]

[JR210012ra-22] 22.McCann L J, Kirkham J J, Wedderburn L R. Development of an internationally agreed minimal dataset for juvenile dermatomyositis (JDM) for clinical and research use. Trials. 2015;16(01):268. doi: 10.1186/s13063-015-0784-0. [DOI] [PMC free article] [PubMed] [Google Scholar]

[JR210012ra-23] 23.Ranegger R, Hackl W O, Ammenwerth E. A proposal for an Austrian nursing minimum data set (NMDS): a delphi study. Appl Clin Inform. 2014;5(02):538–547. doi: 10.4338/ACI-2014-04-RA-0027. [DOI] [PMC free article] [PubMed] [Google Scholar]

[JR210012ra-24] 24.Werley H H, Lang N M, Westlake S K. Brief summary of the nursing minimum data set conference. Nurs Manage. 1986;17(07):42–45. [PubMed] [Google Scholar]

[OR210012ra-25] 25.Meaney F J, Cunningham G C, Riggle S M.Development of a national genetic services database Proc Symp Comput Appl Med Care. Accessed 1991 at: Published online1991424–428.https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2247567/ [PMC free article] [PubMed]

[JR210012ra-26] 26.Porock D, Oliver D P, Zweig S. Predicting death in the nursing home: development and validation of the 6-month minimum data set mortality risk index. J Gerontol A Biol Sci Med Sci. 2005;60(04):491–498. doi: 10.1093/gerona/60.4.491. [DOI] [PubMed] [Google Scholar]

[JR210012ra-27] 27.Rubinstein Y R, Groft S C, Bartek R. Creating a global rare disease patient registry linked to a rare diseases biorepository database: Rare Disease-HUB (RD-HUB) Contemp Clin Trials. 2010;31(05):394–404. doi: 10.1016/j.cct.2010.06.007. [DOI] [PMC free article] [PubMed] [Google Scholar]

[OR210012ra-28] 28.Jenders R, McDonald C, Rubinstein Y, Groft S.Applying standards to public health: an information model for a global rare-diseases registryAccessed 2011 at: Published online 2011:1819https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3900177/

[BR210012ra-29] 29.Greenberg J, Robertson W D.Semantic Web Construction: An Inquiry of Authors' Views on Collaborative Metadata GenerationVol 0.;2002 [Google Scholar]

[JR210012ra-30] 30.Bagley Thompson C, Schaffer J. Minimum data set development: air transport time-related terms. Int J Med Inform. 2002;65(02):121–133. doi: 10.1016/s1386-5056(02)00008-4. [DOI] [PubMed] [Google Scholar]

[JR210012ra-31] 31.Hillmann D I. Metadata quality: From evaluation to augmentation. Cat Classif Q. 2008;46(01):65–80. [Google Scholar]

[OR210012ra-32] 32.Zumwalt R E, Aurelius M, Brooks E.2010 Annual report office of the medical investigator state of New MexicoAccessed 2010 at:https://hsc.unm.edu/omi/_docs/pdfs/ar2010.pdf

[OR210012ra-33] 33.2010 Census: new Mexico profileAccessed August 5, 2020 at:https://www2.census.gov/geo/maps/dc10_thematic/2010_Profile/2010_Profile_Map_New_Mexico.pdf

[OR210012ra-34] 34.New Mexico office of the medical investigator annual reportAccessed 2017 at:https://hsc.unm.edu/omi/_docs/pdfs/ar2018.pdf

[OR210012ra-35] 35.Yousuf M I.Using experts‘experts’opinions through Delphi techniquePract Assess, Res Eval 2007;12(04):

[JR210012ra-36] 36.Hsu C-C, Sandford B A. The Delphi technique: making sense of consensus. Pract Assess, Res Eval. 2007;12:10. [Google Scholar]

[JR210012ra-37] 37.Goossen W TF, Epping P JMM, Feuth T, Dassen T WN, Hasman A, van den Heuvel W JA. A comparison of nursing minimal data sets. J Am Med Inform Assoc. 1998;5(02):152–163. doi: 10.1136/jamia.1998.0050152. [DOI] [PMC free article] [PubMed] [Google Scholar]

[JR210012ra-38] 38.Berry S D, Edgar H JH. Standardizing data from the dead. Stud Health Technol Inform. 2019;264:1427–1428. doi: 10.3233/SHTI190467. [DOI] [PubMed] [Google Scholar]

[JR210012ra-39] 39.Leo A. Goodman. Snowball Sampling. Ann Math Stat. 1961;32(01):148–170. [Google Scholar]

[JR210012ra-40] 40.Harris P A, Taylor R, Thielke R, Payne J, Gonzalez N, Conde J G. Research electronic data capture (REDCap)--a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform. 2009;42(02):377–381. doi: 10.1016/j.jbi.2008.08.010. [DOI] [PMC free article] [PubMed] [Google Scholar]

[BR210012ra-41] 41.Edgar H, Daneshvari Berry S, Moes E, Adolphi N, Bridges P, Nolte K. Office of the Medical Investigator; University of New Mexico: 2020. New Mexico decedent image database. [Google Scholar]

[JR210012ra-42] 42.Hou J, Chen Z, Qin X, Zhang D. Automatic image search based on improved feature descriptors and decision tree. Integr Comput Aided Eng. 2011;18(02):167–180. [Google Scholar]

[JR210012ra-43] 43.Robards J, Evandrou M, Falkingham J, Vlachantoni A. Marital status, health and mortality. Maturitas. 2012;73(04):295–299. doi: 10.1016/j.maturitas.2012.08.007. [DOI] [PMC free article] [PubMed] [Google Scholar]

[BR210012ra-44] 44.Verbrugge L M.Marital Status and HealthVol 41.; Accessed 2021 at:https://psycnet.apa.org/record/1980-27843-001 [Google Scholar]

[JR210012ra-45] 45.Umberson D. Gender, marital status and the social control of health behavior. Soc Sci Med. 1992;34(08):907–917. doi: 10.1016/0277-9536(92)90259-s. [DOI] [PubMed] [Google Scholar]

[JR210012ra-46] 46.Berry S D, Edgar H JH. Extracting and standardizing medical examiner data to improve health. AMIA Jt Summits Transl Sci proceedings AMIA Jt Summits. Transl Sci. 2020;2020:63–70. [PMC free article] [PubMed] [Google Scholar]

PERMALINK

Developing the Minimum Dataset for the New Mexico Decedent Image Database

Shamsi Daneshvari Berry

Philip J Kroth

Heather J H Edgar

Teddy D Warner

Abstract

Introduction

Metadata

Metadata Selection

Minimum Dataset Creation

Assessment of a Minimum Dataset

Objectives

Table 1. Manner of death at the old myocardial infarction in 2010 and 2017, and the new Mexico decedent image database (mid-2010 to mid-2017) 32 34 .

Methods

Design

Fig. 1.

Expert Determination

Questionnaire Creation

Results

Design Phase

Table 2. Research domains sent surveys.

Table 3. Research domains of participants in the design phase.

Appendix A. Original variables.

Table 4. Fifty-nine metadata variables selected through a modified Delphi method.

Validation

Table 5. Self-identified research domains of participants in the validation phase.

Table 6. Variables suggested in validation phase.

Discussion

Table 7. Number of database variables by round in design phase.

Conclusion

Clinical Relevance Statement

Multiple Choice Questions

Acknowledgments

Funding Statement

Protection of Human and Animal Subjects

References

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases