Maintenance policies |
How should model ownership impact local control over maintenance? |
Policies establishing expectations for updating proprietary models
Clarity and fairness of opportunities for local updating of proprietary models
Prototypes for establishing collaborative updating of multi-system owned models
How do we ensure comparable performance across demographic groups is sustained during the maintenance phase? |
Guidance on whether and when changes in model fairness warrant pausing AI-enabled tools
Methods for addressing performance fairness drift when model performance deteriorates differentially across subpopulations
How do we communicate model changes to end users and promote acceptance? |
Design of effective communication strategies for warning end users of model performance drift and informing users when updated models are implemented
Guidance on aligning messaging with end-user AI literacy
Performance monitoring |
At what level should model performance be monitored and maintained? |
Guidance on aligning monitoring and maintenance with use case needs
Recommendations for handling monitoring in smaller health systems, including determining minimum sample size and methods for collaborative monitoring
Policies supporting collaborative model maintenance in low data resource settings
Guidance on managing interim periods of local performance drift between releases of proprietary models that cannot be locally updated
What aspects of performance should be monitored? |
Generalizable recommendations on frequency and sample sizes for measuring performance across a variety of metrics
Customizable and expandable tools to monitor a matrix of metrics
Guidelines for aligning metrics of interest with use case needs
How do we define meaningful changes in performance? |
Framework for selecting drift detection methods
Guidance on establishing clinically acceptable ranges of performance and defining clinically relevant decision boundaries
Methods for tailoring drift detection algorithms to detect a clinically important change
Are there other aspects of AI models that we should monitor, in addition to performance? |
Approaches for systematically surveilling external features that may impact model inputs and for monitoring input data distributions
Guidance on when to update in response to changes in model inputs if performance remains stable
Systems for disseminating information on changes anticipated to affect common AI models
Model updating |
What updating approaches should be considered? |
Approaches to optimizing update method selection based on performance characteristics most relevant to use case needs
Expanded suite of testing procedure options that covers more updating methods and offers increased computational efficiency
Guidance on defining acceptable performance and methods to determine which updating methods, if any, restore acceptable performance
Should clinically meaningful or statistically significant changes in performance guide updating practice? |
Guidance on whether to update models when statistically significant improvement is possible but updating would not provide a clinically meaningful improvement
Methods for comparing updating options that incorporate tests for both statistical and clinical significance
Recommendations for decision-making in cases where available updating methods do not restore performance to acceptable levels
How do we handle biased outcome feedback after model implementation? |
Recommendations for assessing feedback from effective AI-enabled interventions
Methods for model development, validation, and updating that are robust to confounding by intervention