Skip to main content
. 2021 Jun 10;12:77. doi: 10.1186/s13244-021-01018-1

Fig. 3.

Fig. 3

MedStruct pipeline. Schematic overview of the MedStruct pipeline, in which five different microservices are present: preprocessing, spaCy, pyContextNLP, measurement extractor and T-stage classifier. The report can be processed either from an Excel file or direct from the graphical user interface (GUI). All components use an intermediate JavaScript Object Notation (JSON) annotation format to chain the pipeline components and can be consumed over REpresentational State Transfer (REST) or chained using a message broker. The use of a JSON annotation format simplifies reusability of the different components, enables mixing programming languages, prevents for duplicate processing and guarantees token alignment between components. This implementation saves annotations at token level instead of sentence level, which enables precise highlighting of annotations in a GUI. Detected tumor and lymph nodes are stored as objects in a list, allowing for detection of concurrent mentions. Documents can now be processed individually with the same rule-based algorithm