Table 3.
Algorithm versions’ strengths, weaknesses, and suggested uses.
| Algorithm | Strengths | Weaknesses | Qualitative Summary of Findings |
|---|---|---|---|
| 1. Updated incident cancer identification, utilizing 6-month claims data window to exclude cancer prior to study period | Requires only claims data with a narrow additional time window (six months) for excluding prevalent cases | Limited time period (six months) for excluding prevalent cancer cases | Lowest specificity and comparatively low PPV and kappa point to limitations in using claims data alone to identify incident cancer cases. |
| 2. Incident cancer identification, utilizing NHS data to exclude prevalent cancer at any point prior to study period | Claims not used to exclude prevalent cancer cases. | Moderate PPV and kappa, especially for colorectal cancer | Can be applied when data on cancer history are obtained at cohort inception to ensure only incident cases are identified through claims |
| 3. Incident cancer identification, utilizing 6-month window in claims data and NHS to exclude prevalent cancer | Makes full use of both data sources | Very close performance characteristics to Algorithm #2. | Higher specificity results in small improvement in PPV and kappa. Use of both data sources minimizes false positive incident cancer diagnoses with minimal change in sensitivity. |
| 4. Prevalent cancer identification, utilizing claims only | Only requires claims from a two-year observation window to identify those who have ever had cancer | Cannot distinguish incident from prevalent cases | High sensitivity, specificity, PPV, NPV, and kappa for identifying ever cancer diagnoses. Useful when the date of diagnosis is not required (eg, studies of genetic factors, or of early-life risk factors for adult cancers) or when the diagnosis date is available from other sources. |
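
The performance measures named in the table (sensitivity, specificity, PPV, NPV, and kappa) can all be derived from a 2×2 cross-classification of algorithm results against a reference standard. The following is a minimal sketch of those standard formulas, not the study's own analysis code. The class and function names are hypothetical, and the counts are invented purely for illustration; they are not results from the study. The second set of counts removes some false positives to illustrate why higher specificity tends to raise PPV and kappa while leaving sensitivity unchanged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts of algorithm result vs. reference standard (hypothetical helper)."""
    tp: int  # algorithm positive, reference positive
    fp: int  # algorithm positive, reference negative
    fn: int  # algorithm negative, reference positive
    tn: int  # algorithm negative, reference negative


def validity_metrics(c: TwoByTwo) -> dict:
    """Return sensitivity, specificity, PPV, NPV, and Cohen's kappa."""
    n = c.tp + c.fp + c.fn + c.tn
    observed = (c.tp + c.tn) / n
    # Agreement expected by chance, from the marginal totals.
    expected = (
        (c.tp + c.fp) * (c.tp + c.fn) + (c.tn + c.fn) * (c.tn + c.fp)
    ) / n**2
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "kappa": (observed - expected) / (1 - expected),
    }


# Invented counts for illustration only (not study data).
baseline = validity_metrics(TwoByTwo(tp=80, fp=20, fn=10, tn=890))
# Same true positives, but half of the false positives reclassified as
# true negatives, as a stricter exclusion of prevalent cases might do.
stricter = validity_metrics(TwoByTwo(tp=80, fp=10, fn=10, tn=900))

for name in ("sensitivity", "specificity", "ppv", "npv", "kappa"):
    print(f"{name:12s} baseline={baseline[name]:.3f} stricter={stricter[name]:.3f}")
```

Because cancer is rare in most cohorts, even a small gain in specificity removes a meaningful share of the algorithm-positive cases that are false, which is why PPV is sensitive to it.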