[Preprint]. 2024 Sep 3:2024.09.02.24312917. [Version 1] doi: 10.1101/2024.09.02.24312917

Table 5. Results after the adapted OCR method (output categories and boolean).

Standard Prompt: “You are a helpful medical assistant. You are supposed to extract information from a pathology report from a patient with colorectal cancer. I need to know the TNM stage of the patient. This is a system to describe the amount and spread of cancer in a patient’s body, using TNM. T describes the size of the tumor and any spread of cancer into nearby tissue; N describes spread of cancer to nearby lymph nodes; and M describes metastasis (spread of cancer to other parts of the body). If you find no information about the T, N or M stage, give Tx, Nx or Mx, respectively. If there is “pT1” or “pN”, just skip the “p” and give “T1” etc. Additionally, I need information about the number of lymph nodes examined and the number of positive lymph nodes. Let me know if the resection margin was tumor free and if there was lymphatic invasion. If you do not find information about resection margin or lymphatic invasion, say “not mentioned”.

This is the report: {report} ”
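The prompt above relies on a `{report}` placeholder and on two output conventions (drop a leading "p", and fall back to Tx, Nx or Mx when no stage is found). A minimal Python sketch of how such a template might be filled and how the same conventions could be enforced in post-processing is given below. The helper names `build_prompt` and `normalize_stage` are hypothetical and are not taken from the study's code; whether the authors applied any post-processing is not stated here.

```python
import re
from typing import Optional

# Assumption: the instruction text is followed by this line, as in the prompt above.
REPORT_SUFFIX = "This is the report: {report}"


def build_prompt(instructions: str, report: str) -> str:
    """Append the report to the instruction text via the {report} placeholder.

    Only the suffix is formatted, so braces inside the instructions or the
    report text are left untouched.
    """
    return instructions + "\n\n" + REPORT_SUFFIX.format(report=report)


def normalize_stage(value: Optional[str], letter: str) -> str:
    """Hypothetical post-processing that mirrors the prompt's conventions.

    - a missing or empty value becomes e.g. 'Tx'
    - a leading 'p' (as in 'pT1') is dropped
    """
    if not value or not value.strip():
        return f"{letter}x"
    cleaned = value.strip()
    # Remove the 'p' prefix only when it directly precedes the stage letter.
    return re.sub(rf"^p(?={letter})", "", cleaned)


if __name__ == "__main__":
    print(build_prompt("Extract the TNM stage.", "pT1 pN0"))
    print(normalize_stage("pT1", "T"))
    print(normalize_stage(None, "N"))
```

Formatting only the suffix rather than the whole prompt is a design choice of this sketch: it avoids errors if the instruction text itself ever contains curly braces.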

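| Metric | All | T stage | N stage | M stage | Number of lymph nodes examined | Number of positive lymph nodes | Tumor-free resection margin | Lymphatic invasion |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Accuracy in % | 89 | 90 | 92 | 88 | 88 | 91 | 91 | 88 |
| F1 | | 0.6 | 0.86 | 0.74 | | | 0.95 | 0.86 |
| Precision in % | | 58 | 85 | 70 | | | 93 | 81 |
| Recall in % | | 67 | 87 | 96 | | | 98 | 93 |
| FPR | | | | | | | 0.88 | 0.15 |
| FNR | | | | | | | 0.02 | 0.07 |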
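The reported metrics are linked by standard identities: F1 is the harmonic mean of precision and recall, and the false negative rate (FNR) is the complement of recall. The Python sketch below checks these identities against the values reported for the two boolean outputs. It assumes that the last two entries of each metric row belong to the tumor-free resection margin and lymphatic invasion columns; that assignment is an inference from the table layout, not something stated explicitly. Small differences from the reported F1 are expected because the inputs are already rounded.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def fnr_from_recall(recall: float) -> float:
    """False negative rate is the complement of recall (sensitivity)."""
    return 1.0 - recall


# Reported values, converted from percent to fractions.
# Assumption: the last two entries of each row are the boolean columns.
reported = {
    "tumor_free_resection_margin": {
        "precision": 0.93, "recall": 0.98, "f1": 0.95, "fnr": 0.02,
    },
    "lymphatic_invasion": {
        "precision": 0.81, "recall": 0.93, "f1": 0.86, "fnr": 0.07,
    },
}

if __name__ == "__main__":
    for name, m in reported.items():
        f1 = f1_from_precision_recall(m["precision"], m["recall"])
        fnr = fnr_from_recall(m["recall"])
        print(
            f"{name}: F1={f1:.3f} (reported {m['f1']}), "
            f"FNR={fnr:.2f} (reported {m['fnr']})"
        )
```

The same pairwise F1 identity need not hold for the T, N and M stages if their scores were averaged over several classes, so the sketch is restricted to the boolean outputs.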