JAMIA Open. 2023 May 13;6(2):ooad035. doi: 10.1093/jamiaopen/ooad035

Table 1.

An overview of the salient standards used in TriNetX, the associated mapping activities, and challenges introduced by the global heterogeneity

Data type | Source vocabulary | Target terminology | Method
Demographics | EHR and ADT | HL7 v3 vocabulary for sex, race, ethnicity, vital status; ISO 639 for language | Manual mapping
Encounters | EHR and ADT, or derived by TriNetX | HL7 v3 vocabulary for visit type (eg, inpatient, outpatient, ER) | Manual mapping
Diagnoses
  • US: ICD-10-CM

  • Ex-US: ICD-10 (WHO version), regional modifications such as ICD-10-GM and occasionally SNOMED

ICD-10-CM
  • For SNOMED source coding (eg, problem list entries), an existing SNOMED to ICD-10-CM mapping is used and extended [11].

  • For ICD-10 (WHO version) versus ICD-10-CM, string matching on code descriptions is applied (eg, ICD-10 K07.1 is mapped to ICD-10-CM M26.1 because both share the description “Anomalies of jaw-cranial base relationship,” although they sit in different branches of the two terminologies).

  • National extensions (such as ICD-10-AM in Australia) usually include more specific concepts than ICD-10-CM; these need to be mapped to the nearest common ancestor (eg, ICD-10-AM B95.41 “Streptococcus Group C” and ICD-10-AM B95.42 “Streptococcus Group G” are mapped to ICD-10-CM B95.4 “Other streptococcus as the cause of diseases classified elsewhere”).
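
The two diagnosis-mapping rules above (matching on identical descriptions, and falling back to the nearest common ancestor) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not TriNetX's implementation: the function names `match_by_description` and `nearest_ancestor`, the text normalization step, and the idea that an ancestor can be found by trimming trailing characters from a code are all hypothetical. The small dictionaries contain only the examples quoted in the table.

```python
import re
from typing import Dict, Optional, Set

# Example entries taken from the table; real terminologies are far larger.
ICD10_WHO: Dict[str, str] = {
    "K07.1": "Anomalies of jaw-cranial base relationship",
}
ICD10_CM: Dict[str, str] = {
    "M26.1": "Anomalies of jaw-cranial base relationship",
    "B95.4": "Other streptococcus as the cause of diseases classified elsewhere",
}


def normalize(description: str) -> str:
    """Lowercase and replace punctuation with spaces so equal descriptions compare equal."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", description.lower()).split())


def match_by_description(source: Dict[str, str], target: Dict[str, str]) -> Dict[str, str]:
    """Map each source code to the target code that has the same normalized description."""
    # Assumption: descriptions are unique in the target; duplicates keep the last code seen.
    index = {normalize(desc): code for code, desc in target.items()}
    return {
        code: index[normalize(desc)]
        for code, desc in source.items()
        if normalize(desc) in index
    }


def nearest_ancestor(code: str, target_codes: Set[str]) -> Optional[str]:
    """Trim trailing characters until the code exists in the target terminology."""
    candidate = code
    while candidate:
        if candidate in target_codes:
            return candidate
        candidate = candidate[:-1].rstrip(".")  # drop one character, then any dangling dot
    return None  # no ancestor found; such codes would need manual review


print(match_by_description(ICD10_WHO, ICD10_CM))   # {'K07.1': 'M26.1'}
print(nearest_ancestor("B95.41", set(ICD10_CM)))   # B95.4
```

Trimming characters only works where the national extension keeps the parent code as a prefix, as in the B95.41 example; extensions that renumber codes would need an explicit hierarchy table instead.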

Procedures
  • US: ICD-10-PCS, HCPCS, CPT

  • Ex-US: ICD-10-PCS, OPS (Germany), OPCS (UK), and ICD-9 (Italy, Poland)

  • ICD-10-PCS and SNOMED

  • HCPCS and CPT (only for US HCOs)

  • Harmonizing clinical procedures coded with ICD-10-PCS remains an unsolved, non-trivial challenge [12].

  • For countries not using ICD-10-PCS, TriNetX maps local procedure standards to SNOMED procedures (no perfect mappings are available because the systems encode different information).

  • The mapping for German OPS was developed in collaboration with Averbis GmbH [13,14] and is released as open source at https://open.trinetx.com.

  • UK’s OPCS provides a native mapping to SNOMED.

Medications and Vaccinations
  • US: RxNorm, NDC, other commercial and local coding systems

  • Ex-US: ATC, AEMPS (Spain), DM+D (UK), CNK (Belgium), EAN (Poland)

RxNorm, OMOP extension of RxNorm, CVX Group codes
Semi-automated methods involving the use of external sources such as RxNorm “ApproximateTerm” API are utilized. For national catalogues of medications, TriNetX maps medications to RxNorm Ingredients + Route + Brand + Strength.
Lab results, clinical findings, and vital signs | Local lab coding or LOINC | LOINC
  • Regenstrief LOINC Mapping Assistant (RELMA) [15] is used to map at least the concepts covering 80% of the most frequent observations of an HCO.

  • Automatic unit conversion based on UCUM is applied.

  • Lab result distributions are used to validate the correctness of mappings.
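
The 80% coverage target above suggests a simple prioritization step: rank an HCO's local lab codes by how often they occur and map the most frequent ones first until the chosen share of observations is covered. The sketch below illustrates that reading of the rule. The function name `codes_to_prioritize`, the input format (one local code per observation), and the sample codes are hypothetical; RELMA itself is not called here.

```python
from collections import Counter
from typing import Iterable, List


def codes_to_prioritize(observed_codes: Iterable[str], coverage: float = 0.80) -> List[str]:
    """Return the most frequent local codes that together reach the coverage share."""
    counts = Counter(observed_codes)
    total = sum(counts.values())
    selected: List[str] = []
    covered = 0
    # most_common() yields codes from most to least frequent.
    for code, n in counts.most_common():
        if covered / total >= coverage:
            break  # the target share of observations is already covered
        selected.append(code)
        covered += n
    return selected


# Hypothetical local lab codes: 100 observations in total.
observations = ["GLU"] * 50 + ["NA"] * 30 + ["K"] * 15 + ["RARE"] * 5
print(codes_to_prioritize(observations))  # ['GLU', 'NA']
```

Here two of the four local codes already account for 80 of the 100 observations, so they would be mapped to LOINC first and the long tail handled later.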

Genomics | Structured data from: molecular diagnostic labs (XML, JSON, CSV files), annotated VCF files, cancer registry data from NAACCR records | HGNC (gene symbols), HGVS (SNVs), ISCN (SVs, cytogenomic), Genomic Coordinates, rsID, LOINC (eg, IHC, MSI)
  • Variants encountered in HCO data are available under the corresponding gene and named using HGVS. To avoid an excessive number of variants, only those present in the data of at least one HCO are included in the TriNetX terminology.

  • Site of biopsy and type of variant are also included.

Oncology
  • US: NAACCR

  • Ex-US: ICD-O, ICD-10-CM

ICD-O
NAACCR-based data sources (United States) are almost always linked to ICD-O, but other regions (eg, EMEA or Australia) frequently do not provide ICD-O data. However, when oncology data are provided using ICD-10-CM codes, additional mappings from ICD-10-CM to ICD-O topographies are applied. Additionally, when morphologies are not provided, some ICD-10-CM codes provide morphology information, enabling the derivation of ICD-O morphologies.
Cross-domain mappings | Selected HCPCS, SNOMED, and ICD-10-PCS codes | RxNorm | Data types are not homogeneous across regions, and some medications are frequently reported within procedures data sources (eg, CPT or OPS), so cross-domain mappings are also required to maximize the data coverage of TriNetX at a global scale.

ADT: Admission Discharge Transfer; AEMPS: Agencia Española de Medicamentos y Productos Sanitarios; ATC: Anatomical Therapeutic Chemical; CPT: Current Procedural Terminology; CSV: Comma-Separated Values; DM+D: Dictionary of Medicines and Devices; EAN: European Article Numbering; HCPCS: Healthcare Common Procedure Coding System; HGNC: HUGO Gene Nomenclature Committee; HGVS: Human Genome Variation Society; HL7: Health Level Seven; ICD-9: International Classification of Diseases, Ninth Revision; ICD-10-CM: International Classification of Diseases, Tenth Revision, Clinical Modification; ICD-10-GM: International Classification of Diseases, Tenth Revision, German Modification; ICD-10-PCS: International Classification of Diseases, Tenth Revision, Procedure Coding System; ICD-O-3: International Classification of Diseases for Oncology, third edition; IHC: Immunohistochemistry; ISCN: International System for Human Cytogenomic Nomenclature; JSON: JavaScript Object Notation; LOINC: Logical Observation Identifiers Names and Codes; MSI: Microsatellite instability; NAACCR: North American Association of Central Cancer Registries; NLP: natural language processing; NDC: National Drug Code; OPCS-4: OPCS Classification of Surgical Operations and Procedures (4th revision); OPS: Operationen- und Prozedurenschlüssel; rsID: Reference SNP cluster ID; SNOMED: Systematized Nomenclature of Medicine; SNV: single-nucleotide variant; SV: structural variant; VCF: Variant Call Format; XML: Extensible Markup Language.