Maintenance policies

How should model ownership impact local control over maintenance?
• Policies establishing updating expectations for proprietary models
• Clarity and fairness of local updating opportunities for proprietary models
• Prototypes for collaborative updating of models owned across multiple health systems
How do we ensure comparable performance across demographic groups is sustained during the maintenance phase?
• Guidance on whether and when changes in model fairness warrant pausing AI-enabled tools
• Methods for addressing fairness drift when model performance deteriorates differentially across subpopulations (see the sketch below)
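One way to operationalise the second need is routine subgroup-level performance surveillance. The sketch below is a minimal illustration in Python: it compares each demographic group's AUROC in a recent monitoring window against an assumed baseline and flags differential deterioration. The group labels, baseline values, allowed drop, and synthetic data are illustrative assumptions rather than recommended settings.

```python
# Minimal sketch of subgroup performance-drift surveillance (illustrative only).
import numpy as np
from sklearn.metrics import roc_auc_score

def subgroup_auroc_drift(y_true, y_score, groups, baseline_auroc, max_drop=0.05):
    """Per-group AUROC with a flag for groups whose drop from baseline exceeds max_drop."""
    report = {}
    for g in np.unique(groups):
        mask = groups == g
        if y_true[mask].min() == y_true[mask].max():
            report[g] = {"auroc": None, "flag": "insufficient outcome variation"}
            continue
        auroc = roc_auc_score(y_true[mask], y_score[mask])
        drop = baseline_auroc[g] - auroc
        report[g] = {"auroc": round(float(auroc), 3),
                     "drop": round(float(drop), 3),
                     "flag": "review" if drop > max_drop else "ok"}
    return report

# Illustrative synthetic monitoring window.
rng = np.random.default_rng(0)
n = 2000
groups = rng.choice(["A", "B"], size=n)
y_true = rng.binomial(1, 0.2, size=n)
# Scores are deliberately less informative for group B to simulate differential drift.
signal = np.where(groups == "A", 1.5, 0.5)
y_score = 1 / (1 + np.exp(-(signal * (y_true - 0.5) + rng.normal(0, 1, n))))

baseline = {"A": 0.80, "B": 0.80}  # assumed validation-time AUROCs
print(subgroup_auroc_drift(y_true, y_score, groups, baseline))
```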
How do we communicate model changes to end users and promote acceptance?
• Design of effective communication strategies for warning end users about model performance drift and informing them when updated models are implemented
• Guidance on aligning messaging with end users' AI literacy
Performance monitoring

At what level should model performance be monitored and maintained?
• Guidance on aligning monitoring and maintenance with use case needs
• Recommendations for handling monitoring in smaller health systems, including minimum sample size determination (see the sketch below) and methods for collaborative monitoring
• Policies supporting collaborative model maintenance in settings with limited data resources
• Guidance on managing interim periods of local performance drift between releases of proprietary models that cannot be updated locally
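For the minimum sample size question raised above, a simple starting point is a one-sample power calculation for detecting a decline in a proportion-type metric such as sensitivity. The sketch below uses the standard normal-approximation formula; the baseline value, detectable drop, alpha, and power are illustrative assumptions that would need local justification.

```python
# Minimal sketch of a monitoring sample-size calculation (illustrative values).
from scipy.stats import norm

def monitoring_sample_size(p0, p1, alpha=0.05, power=0.80):
    """Labelled cases needed to detect a decline from p0 to p1 with a one-sided test."""
    z_a = norm.ppf(1 - alpha)   # one-sided significance threshold
    z_b = norm.ppf(power)       # power requirement
    num = z_a * (p0 * (1 - p0)) ** 0.5 + z_b * (p1 * (1 - p1)) ** 0.5
    n = (num / (p0 - p1)) ** 2
    return int(n) + 1           # round up to whole cases

# e.g., detect a drop in sensitivity from an assumed 0.85 baseline to 0.75
print(monitoring_sample_size(0.85, 0.75))
```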
What aspects of performance should be monitored?
• Generalizable recommendations on monitoring frequency and sample sizes for measuring performance across a variety of metrics
• Customizable and expandable tools for monitoring a matrix of metrics (see the sketch below)
• Guidelines for aligning metrics of interest with use case needs
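As a rough illustration of a customizable metric matrix, the sketch below computes discrimination, calibration, and threshold-based metrics side by side for one monitoring window. The metric set, decision threshold, and synthetic data are placeholders to be aligned with the use case rather than a recommended panel.

```python
# Minimal sketch of a configurable "metric matrix" for one monitoring window.
import numpy as np
from sklearn.metrics import roc_auc_score, brier_score_loss, recall_score, precision_score

def metric_matrix(y_true, y_prob, threshold=0.5):
    y_pred = (y_prob >= threshold).astype(int)
    return {
        "auroc": roc_auc_score(y_true, y_prob),        # discrimination
        "brier": brier_score_loss(y_true, y_prob),     # calibration / overall accuracy
        "sensitivity": recall_score(y_true, y_pred),   # threshold-based
        "ppv": precision_score(y_true, y_pred, zero_division=0),
        "alert_rate": y_pred.mean(),                   # workload implication
    }

# Illustrative synthetic window.
rng = np.random.default_rng(1)
y_true = rng.binomial(1, 0.15, 1000)
y_prob = np.clip(y_true * 0.3 + rng.uniform(0, 0.7, 1000), 0, 1)
print({k: round(float(v), 3) for k, v in metric_matrix(y_true, y_prob, threshold=0.4).items()})
```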
How do we define meaningful changes in performance?
• A framework for selecting drift detection methods
• Guidance on establishing clinically acceptable ranges of performance and defining clinically relevant decision boundaries
• Methods for tailoring drift detection algorithms to detect clinically important changes (see the sketch below)
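One concrete way to tie drift detection to a clinically important change is a one-sided CUSUM over per-window performance values, as sketched below. The reference AUROC, slack, and decision limit are illustrative assumptions that would be set with clinical input rather than defaults.

```python
# Minimal sketch of a CUSUM-style detector tuned to a clinically important decline.
def cusum_decline(metric_series, reference, slack=0.01, decision_limit=0.05):
    """Return the window index at which cumulative decline becomes actionable, if any."""
    s = 0.0
    for i, m in enumerate(metric_series):
        # Accumulate shortfall below (reference - slack); small fluctuations are ignored.
        s = max(0.0, s + (reference - slack - m))
        if s > decision_limit:
            return i, s  # first window where decline is deemed clinically actionable
    return None, s

# Illustrative monthly AUROC values with a gradual decline.
monthly_auroc = [0.81, 0.80, 0.79, 0.78, 0.76, 0.75, 0.74]
print(cusum_decline(monthly_auroc, reference=0.80))
```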
Are there other aspects of AI models that we should monitor, in addition to performance?
• Approaches for systematically surveilling external factors that may affect model inputs and for monitoring input data distributions (see the sketch below)
• Guidance on whether and when to update in response to changes in model inputs if performance remains stable
• Systems for disseminating information about changes anticipated to affect commonly used AI models
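For monitoring input data distributions independently of outcome labels, one common option is a population stability index (PSI) comparing a current window of an input against its development-time distribution, as sketched below. The bin count and the often-cited 0.1/0.25 review thresholds are conventions, not validated cut-offs.

```python
# Minimal sketch of input-distribution monitoring with a population stability index.
import numpy as np

def psi(reference, current, bins=10):
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    # Clip both samples into the reference range so the edge bins absorb outliers.
    ref_counts, _ = np.histogram(np.clip(reference, edges[0], edges[-1]), edges)
    cur_counts, _ = np.histogram(np.clip(current, edges[0], edges[-1]), edges)
    ref_frac = np.clip(ref_counts / len(reference), 1e-6, None)
    cur_frac = np.clip(cur_counts / len(current), 1e-6, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

# Illustrative example: an upstream change shifts a lab value's distribution.
rng = np.random.default_rng(2)
reference = rng.normal(100, 15, 5000)   # development-time distribution
current = rng.normal(108, 15, 1000)     # current monitoring window
# Values above roughly 0.1-0.25 are often treated as worth review (a convention only).
print(round(psi(reference, current), 3))
```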
Model updating

What updating approaches should be considered?
• Approaches to optimizing the selection of updating methods based on the performance characteristics most relevant to use case needs (see the sketch below)
• An expanded suite of testing procedures covering more updating methods, with improved computational efficiency
• Guidance on defining acceptable performance and methods for determining which updating methods, if any, restore acceptable performance
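To make the comparison of updating options concrete, the sketch below evaluates three candidates on recent local data: leaving the deployed model unchanged, logistic recalibration of its predictions, and a full refit, selecting by Brier score. The synthetic data, the fixed "original" coefficients, and the choice of criterion are illustrative assumptions, not a prescribed protocol.

```python
# Minimal sketch comparing updating options: no update vs recalibration vs refit.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.normal(size=(3000, 4))
# The outcome-generating process has drifted relative to the "original" model below.
p_true = 1 / (1 + np.exp(-(X @ np.array([1.0, 0.8, 0.0, -0.5]) - 1.0)))
y = rng.binomial(1, p_true)

# The deployed "original" model, assumed fixed (e.g., trained elsewhere).
orig_coef, orig_intercept = np.array([1.0, 1.0, 0.3, -0.5]), 0.0
def original_predict(X):
    return 1 / (1 + np.exp(-(X @ orig_coef + orig_intercept)))

def logit(p):
    p = np.clip(p, 1e-6, 1 - 1e-6)
    return np.log(p / (1 - p))

X_fit, X_test, y_fit, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# Option 1: no update.
p_none = original_predict(X_test)

# Option 2: logistic recalibration (intercept and slope) of the original model's logits.
recal = LogisticRegression().fit(logit(original_predict(X_fit)).reshape(-1, 1), y_fit)
p_recal = recal.predict_proba(logit(p_none).reshape(-1, 1))[:, 1]

# Option 3: full refit on recent local data.
refit = LogisticRegression().fit(X_fit, y_fit)
p_refit = refit.predict_proba(X_test)[:, 1]

for name, p in [("no update", p_none), ("recalibrate", p_recal), ("refit", p_refit)]:
    print(name, round(float(brier_score_loss(y_test, p)), 4))
```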
Should clinically meaningful or statistically significant changes in performance guide updating practice?
• Guidance on whether to update models when a statistically significant improvement is possible but would not be clinically meaningful
• Methods for comparing updating options that incorporate tests of both statistical and clinical significance (see the sketch below)
• Recommendations for decision-making when available updating methods do not restore performance to acceptable levels
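A minimal way to weigh statistical against clinical significance is to bootstrap the performance difference between the updated and current models and compare it both to zero and to an assumed minimal clinically important difference (MCID), as sketched below. The 0.02 MCID, the AUROC metric, and the synthetic data are illustrative assumptions.

```python
# Minimal sketch of a bootstrap check on statistical vs clinical significance of an update.
import numpy as np
from sklearn.metrics import roc_auc_score

def update_decision(y, p_current, p_updated, mcid=0.02, n_boot=2000, seed=0):
    rng = np.random.default_rng(seed)
    diffs, n = [], len(y)
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        if y[idx].min() == y[idx].max():
            continue  # resample lacked both classes; skip it
        diffs.append(roc_auc_score(y[idx], p_updated[idx]) -
                     roc_auc_score(y[idx], p_current[idx]))
    lo, hi = np.percentile(diffs, [2.5, 97.5])
    return {
        "auroc_diff_ci": (round(float(lo), 4), round(float(hi), 4)),
        "statistically_significant": bool(lo > 0),
        "clinically_meaningful": bool(lo > 0 and np.mean(diffs) >= mcid),
    }

# Illustrative synthetic comparison of current vs updated predictions.
rng = np.random.default_rng(4)
y = rng.binomial(1, 0.3, 1500)
p_current = np.clip(0.3 + 0.20 * (y - 0.3) + rng.normal(0, 0.2, 1500), 0, 1)
p_updated = np.clip(0.3 + 0.25 * (y - 0.3) + rng.normal(0, 0.2, 1500), 0, 1)
print(update_decision(y, p_current, p_updated))
```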
How do we handle biased outcome feedback after model implementation?
• Recommendations for assessing outcome feedback collected after effective AI-enabled interventions
• Methods for model development, validation, and updating that are robust to confounding by intervention (see the sketch below)
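One candidate approach to confounding by intervention, sketched below, is to refit the risk model on patients who did not receive the AI-triggered intervention, reweighted by the inverse probability of remaining untreated so that they stand in for the full population. This is only one of several possible strategies, and the data-generating process, weight truncation level, and model choices here are illustrative assumptions.

```python
# Minimal sketch of refitting on untreated patients with inverse probability weights.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
X = rng.normal(size=(5000, 3))
risk = 1 / (1 + np.exp(-(X @ np.array([1.2, 0.7, -0.4]))))
# Higher-risk patients are more likely to be flagged and treated; treatment
# prevents a share of outcomes, biasing naive retraining labels downward.
treated = rng.binomial(1, np.clip(risk * 1.5, 0, 0.9))
outcome = rng.binomial(1, np.where(treated == 1, risk * 0.4, risk))

# Model the probability of remaining untreated given the inputs.
pu_model = LogisticRegression().fit(X, 1 - treated)
p_untreated = pu_model.predict_proba(X)[:, 1]

# Refit on untreated patients only, weighted so they represent the full population.
mask = treated == 0
weights = 1.0 / np.clip(p_untreated[mask], 0.05, None)   # truncate extreme weights
naive = LogisticRegression().fit(X, outcome)              # ignores the intervention
adjusted = LogisticRegression().fit(X[mask], outcome[mask], sample_weight=weights)
print("naive coefficients:   ", naive.coef_.round(2))
print("adjusted coefficients:", adjusted.coef_.round(2))
```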