Table 1. Prompts Provided to the LLM to Generate Helmet Status by Date Completed<sup>a</sup>
Date of completion | Estimated time to complete task | Prompt |
---|---|---|
November 22, 2023 (low-detail prompt) | 5 min | “parse through this data looking at column narrative_1 make a table that includes if the cpsc_case_number was wearing a helmet, not wearing a helmet, or if helmet was not mentioned” |
December 7, 2023 (high-detail prompt) | 240 min | “for every cpsc_case_number there is a narrative_1 with patient description and details of patient accident. for which case_numbers does narrative_1 mention that the case number/patient was wearing/had a helmet at some point? text in narrative_1 to help identify this could include language such as but not limited to: ‘with a helmet’, ‘with helmet’, ‘positive for helmet’, ‘positive for a helmet’, ‘helmeted’, ‘wearing helmet’, ‘wearing a helmet’, ‘w/ helmet’, ‘w helmet’, ‘wore helmet’, ‘w/helmet’, ‘w/ a helmet’, ‘had on a helmet’, ‘had on helmet’, ‘had helmet’, ‘had a helmet’, ‘his helmet’, ‘her helmet’, ‘pt helmet’, ‘pts helmet’, ‘was wearing a helmet’, ‘,helmet,’, ‘helmetd’, ‘whelmet’, ‘cracked helmet’, ‘cracking helmet’, ‘helmet cracked’, ‘helmet on’, ‘broke helmet’, ‘helmet broke’, ‘breaking helmet’, ‘positive helmet’, ‘wore helmet’, ‘wearing a bike helmet’, ‘wearing a full face helmet’, ‘wearing full face helmet’, ‘+helmet’, ‘+ helmet’, ‘including helmet’, ‘ full helmet’, ‘+full helmet’, ‘+ full helmet’, ‘+full bike helmet’, ‘+ full bike helmet’, ‘pt’s helmet’, ‘helmet fell’, ‘smashed helmet’, ‘plus helmet’. ‘helmet was fractured’, ‘helmet was broken’, ‘+ bike helmet’, ‘+bike helmet’, ‘wearing bike helmet’, ‘+ helmemt’, ‘endorses helmet’, ‘helmet went off’, ‘full face mask helmet’, ‘& helmet’, ‘helmet was reported to be cracked’ looking over the initial csv file, for which case_numbers does narrative_1 mention explicitly that the case number/patient was not wearing helmet at any point? text in narrative_1 to help identify this could include language such as but not limited to: ‘no helmet’, ‘without a helmet’, ‘without helmet’, ‘negative for helmet’, ‘negative for a helmet’, ‘not helmeted’, ‘-helmet’, ‘- helmet’, ‘not wearing helmet’, ‘not wearing a helmet’, ‘unhelmeted’, ‘w/o helmet’, ‘did not have on a helmet’, ‘no wearing helmet’, ‘did not have a helmet’, ‘did not have helmet’, ‘w/o a helmet’, ‘without wearing helmet’, ‘without wearing a helmet’, ‘wihtout a helmet’, ‘removed helmet’, ‘denied helmet’, ‘denied use of helmet’, ‘denied use of a helmet’, ‘helmetless’, ‘w/out helmet’, ‘denies helmet’, ‘with out helmet’, ‘with out a helmet’, ‘not wear a helmet’, ‘not wear helmet’, ‘wo a helmet’, ‘not weariing helmet’, ‘not wearinng a helmet’, not wearig’, ‘-helmet’, ‘negative helmet’ looking at the original csv file, for which case_numbers does narrative_1 not mention helmet(s) or mention helmet use unknown? text in narrative_1 to help identify this could include language such as but not limited to: ‘unsure if helmet’, ‘unsure if helmeted’, ‘unsure if pt wearing helmet’, ‘unsure if pt wearing a helmet’, ‘unsure if pt was wearing helmet’, ‘unsure if pt was wearing a helmet’, ‘unknown if helmet’, ‘helmet unknown’, ‘unknown helmet’, ‘no mention of helmet’, ‘unk helmet’, ‘helmet unk’, ‘helmet ns’, ‘ns helmet’, ‘?helmet’, ‘? helmet’, ‘helmet?’ could you make a csv file with cpsc_case_number, narrative_1, and new column “helmet status”. based on their narrative_1 descriptions and criteria above categorize each case number “helmet not mentioned” or “not wearing helmet” or “wearing helmet” for all case_numbers. there should be no duplicate case numbers. should be one-to-one. double check for mistakes” |
December 12, 2023 (intermediate-detail prompt) | 15 min | “for every cpsc_case_number there is a narrative_1 with patient description and details of a patient accident that involved an injury. based on their narrative_1 descriptions, can you use the following criteria to create a new variable called “helmet_status”, which is generated by categorizing each case number into one of the following categories “helmet not mentioned” or “not wearing helmet” or “wearing helmet” using these criteria: “helmet not mentioned”: cpsc_case_numbers where narrative_1 mentions that helmet use was unknown or that the term “helmet” was not recorded within narrative_1. the narrative_1 column could include any variation of the following phrases: “unsure if helmeted”, “unknown if helmeted”, “no mention of helmet”, or “?helmet”. please consider that there could be other phrases that indicate unknown helmet use. “not wearing helmet”: cpsc_case_numbers where narrative_1 mentions that the case number/patient was not wearing a helmet at any point. the narrative_1 column could include any variation of the following phrases: “no helmet” or “without helmet” or “not wearing helmet” or “unhelmeted”. please consider that there could be other phrases that indicate the patient was not wearing a helmet. “wearing helmet”: cpsc_case_numbers where narrative_1 mentions that the case number/patient was wearing/had a helmet at some point. the narrative_1 column could include any variation of the following phrases: “with a helmet” or “had a helmet” or “cracked helmet” or “+ helmet”. please consider that there could be other phrases that indicate the patient was wearing a helmet. can you make a csv file with cpsc_case_number, narrative_1, and new column “helmet status” based on the criteria above?” |
Abbreviation: LLM, large language model.
<sup>a</sup> See eMethods in Supplement 1 for the library of text strings used for string matching and for the LLM prompt.
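
The three-category criteria quoted in the prompts can also be expressed as a simple string-matching rule. The Python sketch below is illustrative only: it uses a small subset of the phrases quoted in Table 1, not the full library described in the eMethods, and the function name, phrase groupings, matching order, and example narratives are assumptions made here rather than the authors' implementation. Unknown and negative phrases are checked before positive ones because strings such as "not wearing helmet" and "unhelmeted" contain the positive strings "wearing helmet" and "helmeted".

```python
# Illustrative subset of phrases quoted in the prompts in Table 1.
# This is NOT the full string library from the eMethods supplement.
UNKNOWN_PHRASES = (
    "unsure if helmet", "unknown if helmet", "helmet unknown",
    "no mention of helmet", "?helmet",
)
NOT_WEARING_PHRASES = (
    "no helmet", "without helmet", "without a helmet", "not wearing helmet",
    "not wearing a helmet", "unhelmeted", "w/o helmet", "-helmet",
)
WEARING_PHRASES = (
    "with a helmet", "with helmet", "wearing helmet", "wearing a helmet",
    "helmeted", "had a helmet", "cracked helmet", "+helmet", "+ helmet",
)


def classify_helmet_status(narrative: str) -> str:
    """Assign one of the three helmet_status categories to a narrative.

    Hypothetical helper: the matching order (unknown, then negative, then
    positive) is an assumption chosen so that negations are not misread
    as positive mentions.
    """
    text = narrative.lower()
    if any(phrase in text for phrase in UNKNOWN_PHRASES):
        return "helmet not mentioned"
    if any(phrase in text for phrase in NOT_WEARING_PHRASES):
        return "not wearing helmet"
    if any(phrase in text for phrase in WEARING_PHRASES):
        return "wearing helmet"
    # No helmet-related phrase found at all.
    return "helmet not mentioned"


if __name__ == "__main__":
    # Made-up narratives for illustration; not taken from the study data.
    examples = [
        "14YOM FELL OFF BIKE, NOT WEARING HELMET, LAC TO CHIN",
        "30YOF ON E-SCOOTER STRUCK CURB, +HELMET, WRIST FX",
        "22YOM FELL FROM SKATEBOARD, ANKLE SPRAIN",
    ]
    for narrative in examples:
        print(classify_helmet_status(narrative), "|", narrative)
```

A fixed phrase list like this only recognizes the spellings it contains, which is consistent with the long list of variants and misspellings in the high-detail prompt.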