Maintenance policies

How should model ownership impact local control over maintenance?
• Policies establishing updating expectations for proprietary models
• Clarity and fairness of local updating opportunities for proprietary models
• Prototypes for collaborative updating of models owned across multiple health systems
How do we ensure comparable performance across demographic groups is sustained during the maintenance phase?
• Guidance on whether and when changes in model fairness warrant pausing AI-enabled tools
• Methods for addressing fairness drift when model performance deteriorates differentially across subpopulations (see the sketch below)
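One way to operationalise the second need is routine subgroup-level performance surveillance. The sketch below is a minimal illustration in Python: it compares each demographic group's AUROC in a recent monitoring window against an assumed baseline and flags differential deterioration. The group labels, baseline values, allowed drop, and synthetic data are illustrative assumptions rather than recommended settings.

```python
# Minimal sketch of subgroup performance-drift surveillance (illustrative only).
import numpy as np
from sklearn.metrics import roc_auc_score

def subgroup_auroc_drift(y_true, y_score, groups, baseline_auroc, max_drop=0.05):
    """Per-group AUROC with a flag for groups whose drop from baseline exceeds max_drop."""
    report = {}
    for g in np.unique(groups):
        mask = groups == g
        if y_true[mask].min() == y_true[mask].max():
            report[g] = {"auroc": None, "flag": "insufficient outcome variation"}
            continue
        auroc = roc_auc_score(y_true[mask], y_score[mask])
        drop = baseline_auroc[g] - auroc
        report[g] = {"auroc": round(float(auroc), 3),
                     "drop": round(float(drop), 3),
                     "flag": "review" if drop > max_drop else "ok"}
    return report

# Illustrative synthetic monitoring window.
rng = np.random.default_rng(0)
n = 2000
groups = rng.choice(["A", "B"], size=n)
y_true = rng.binomial(1, 0.2, size=n)
# Scores are deliberately less informative for group B to simulate differential drift.
signal = np.where(groups == "A", 1.5, 0.5)
y_score = 1 / (1 + np.exp(-(signal * (y_true - 0.5) + rng.normal(0, 1, n))))

baseline = {"A": 0.80, "B": 0.80}  # assumed validation-time AUROCs
print(subgroup_auroc_drift(y_true, y_score, groups, baseline))
```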
How do we communicate model changes to end users and promote acceptance?
• Design of effective communication strategies for warning end users about model performance drift and informing them when updated models are implemented
• Guidance on aligning messaging with end users' AI literacy
Performance monitoring

At what level should model performance be monitored and maintained?
• Guidance on aligning monitoring and maintenance with use case needs
• Recommendations for handling monitoring in smaller health systems, including minimum sample size determination (see the sketch below) and methods for collaborative monitoring
• Policies supporting collaborative model maintenance in settings with limited data resources
• Guidance on managing interim periods of local performance drift between releases of proprietary models that cannot be updated locally
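For the minimum sample size question raised above, a simple starting point is a one-sample power calculation for detecting a decline in a proportion-type metric such as sensitivity. The sketch below uses the standard normal-approximation formula; the baseline value, detectable drop, alpha, and power are illustrative assumptions that would need local justification.

```python
# Minimal sketch of a monitoring sample-size calculation (illustrative values).
from scipy.stats import norm

def monitoring_sample_size(p0, p1, alpha=0.05, power=0.80):
    """Labelled cases needed to detect a decline from p0 to p1 with a one-sided test."""
    z_a = norm.ppf(1 - alpha)   # one-sided significance threshold
    z_b = norm.ppf(power)       # power requirement
    num = z_a * (p0 * (1 - p0)) ** 0.5 + z_b * (p1 * (1 - p1)) ** 0.5
    n = (num / (p0 - p1)) ** 2
    return int(n) + 1           # round up to whole cases

# e.g., detect a drop in sensitivity from an assumed 0.85 baseline to 0.75
print(monitoring_sample_size(0.85, 0.75))
```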
What aspects of performance should be monitored?
• Generalizable recommendations on monitoring frequency and sample sizes for measuring performance across a variety of metrics
• Customizable and expandable tools for monitoring a matrix of metrics (see the sketch below)
• Guidelines for aligning metrics of interest with use case needs
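As a rough illustration of a customizable metric matrix, the sketch below computes discrimination, calibration, and threshold-based metrics side by side for one monitoring window. The metric set, decision threshold, and synthetic data are placeholders to be aligned with the use case rather than a recommended panel.

```python
# Minimal sketch of a configurable "metric matrix" for one monitoring window.
import numpy as np
from sklearn.metrics import roc_auc_score, brier_score_loss, recall_score, precision_score

def metric_matrix(y_true, y_prob, threshold=0.5):
    y_pred = (y_prob >= threshold).astype(int)
    return {
        "auroc": roc_auc_score(y_true, y_prob),        # discrimination
        "brier": brier_score_loss(y_true, y_prob),     # calibration / overall accuracy
        "sensitivity": recall_score(y_true, y_pred),   # threshold-based
        "ppv": precision_score(y_true, y_pred, zero_division=0),
        "alert_rate": y_pred.mean(),                   # workload implication
    }

# Illustrative synthetic window.
rng = np.random.default_rng(1)
y_true = rng.binomial(1, 0.15, 1000)
y_prob = np.clip(y_true * 0.3 + rng.uniform(0, 0.7, 1000), 0, 1)
print({k: round(float(v), 3) for k, v in metric_matrix(y_true, y_prob, threshold=0.4).items()})
```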
How do we define meaningful changes in performance?
• A framework for selecting drift detection methods
• Guidance on establishing clinically acceptable ranges of performance and defining clinically relevant decision boundaries
• Methods for tailoring drift detection algorithms to detect clinically important changes (see the sketch below)
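One concrete way to tie drift detection to a clinically important change is a one-sided CUSUM over per-window performance values, as sketched below. The reference AUROC, slack, and decision limit are illustrative assumptions that would be set with clinical input rather than defaults.

```python
# Minimal sketch of a CUSUM-style detector tuned to a clinically important decline.
def cusum_decline(metric_series, reference, slack=0.01, decision_limit=0.05):
    """Return the window index at which cumulative decline becomes actionable, if any."""
    s = 0.0
    for i, m in enumerate(metric_series):
        # Accumulate shortfall below (reference - slack); small fluctuations are ignored.
        s = max(0.0, s + (reference - slack - m))
        if s > decision_limit:
            return i, s  # first window where decline is deemed clinically actionable
    return None, s

# Illustrative monthly AUROC values with a gradual decline.
monthly_auroc = [0.81, 0.80, 0.79, 0.78, 0.76, 0.75, 0.74]
print(cusum_decline(monthly_auroc, reference=0.80))
```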
Are there other aspects of AI models that we should monitor, in addition to performance?
• Approaches for systematically surveilling external factors that may affect model inputs and for monitoring input data distributions (see the sketch below)
• Guidance on whether and when to update in response to changes in model inputs if performance remains stable
• Systems for disseminating information about changes anticipated to affect commonly used AI models
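For monitoring input data distributions independently of outcome labels, one common option is a population stability index (PSI) comparing a current window of an input against its development-time distribution, as sketched below. The bin count and the often-cited 0.1/0.25 review thresholds are conventions, not validated cut-offs.

```python
# Minimal sketch of input-distribution monitoring with a population stability index.
import numpy as np

def psi(reference, current, bins=10):
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    # Clip both samples into the reference range so the edge bins absorb outliers.
    ref_counts, _ = np.histogram(np.clip(reference, edges[0], edges[-1]), edges)
    cur_counts, _ = np.histogram(np.clip(current, edges[0], edges[-1]), edges)
    ref_frac = np.clip(ref_counts / len(reference), 1e-6, None)
    cur_frac = np.clip(cur_counts / len(current), 1e-6, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

# Illustrative example: an upstream change shifts a lab value's distribution.
rng = np.random.default_rng(2)
reference = rng.normal(100, 15, 5000)   # development-time distribution
current = rng.normal(108, 15, 1000)     # current monitoring window
# Values above roughly 0.1-0.25 are often treated as worth review (a convention only).
print(round(psi(reference, current), 3))
```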
Model updating

What updating approaches should be considered?
• Approaches to optimizing the selection of updating methods based on the performance characteristics most relevant to use case needs (see the sketch below)
• An expanded suite of testing procedures covering more updating methods, with improved computational efficiency
• Guidance on defining acceptable performance and methods for determining which updating methods, if any, restore acceptable performance
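To make the comparison of updating options concrete, the sketch below evaluates three candidates on recent local data: leaving the deployed model unchanged, logistic recalibration of its predictions, and a full refit, selecting by Brier score. The synthetic data, the fixed "original" coefficients, and the choice of criterion are illustrative assumptions, not a prescribed protocol.

```python
# Minimal sketch comparing updating options: no update vs recalibration vs refit.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.normal(size=(3000, 4))
# The outcome-generating process has drifted relative to the "original" model below.
p_true = 1 / (1 + np.exp(-(X @ np.array([1.0, 0.8, 0.0, -0.5]) - 1.0)))
y = rng.binomial(1, p_true)

# The deployed "original" model, assumed fixed (e.g., trained elsewhere).
orig_coef, orig_intercept = np.array([1.0, 1.0, 0.3, -0.5]), 0.0
def original_predict(X):
    return 1 / (1 + np.exp(-(X @ orig_coef + orig_intercept)))

def logit(p):
    p = np.clip(p, 1e-6, 1 - 1e-6)
    return np.log(p / (1 - p))

X_fit, X_test, y_fit, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# Option 1: no update.
p_none = original_predict(X_test)

# Option 2: logistic recalibration (intercept and slope) of the original model's logits.
recal = LogisticRegression().fit(logit(original_predict(X_fit)).reshape(-1, 1), y_fit)
p_recal = recal.predict_proba(logit(p_none).reshape(-1, 1))[:, 1]

# Option 3: full refit on recent local data.
refit = LogisticRegression().fit(X_fit, y_fit)
p_refit = refit.predict_proba(X_test)[:, 1]

for name, p in [("no update", p_none), ("recalibrate", p_recal), ("refit", p_refit)]:
    print(name, round(float(brier_score_loss(y_test, p)), 4))
```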
Should clinically meaningful or statistically significant changes in performance guide updating practice?
• Guidance on whether to update models when a statistically significant improvement is possible but would not be clinically meaningful
• Methods for comparing updating options that incorporate tests of both statistical and clinical significance (see the sketch below)
• Recommendations for decision-making when available updating methods do not restore performance to acceptable levels
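A minimal way to weigh statistical against clinical significance is to bootstrap the performance difference between the updated and current models and compare it both to zero and to an assumed minimal clinically important difference (MCID), as sketched below. The 0.02 MCID, the AUROC metric, and the synthetic data are illustrative assumptions.

```python
# Minimal sketch of a bootstrap check on statistical vs clinical significance of an update.
import numpy as np
from sklearn.metrics import roc_auc_score

def update_decision(y, p_current, p_updated, mcid=0.02, n_boot=2000, seed=0):
    rng = np.random.default_rng(seed)
    diffs, n = [], len(y)
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        if y[idx].min() == y[idx].max():
            continue  # resample lacked both classes; skip it
        diffs.append(roc_auc_score(y[idx], p_updated[idx]) -
                     roc_auc_score(y[idx], p_current[idx]))
    lo, hi = np.percentile(diffs, [2.5, 97.5])
    return {
        "auroc_diff_ci": (round(float(lo), 4), round(float(hi), 4)),
        "statistically_significant": bool(lo > 0),
        "clinically_meaningful": bool(lo > 0 and np.mean(diffs) >= mcid),
    }

# Illustrative synthetic comparison of current vs updated predictions.
rng = np.random.default_rng(4)
y = rng.binomial(1, 0.3, 1500)
p_current = np.clip(0.3 + 0.20 * (y - 0.3) + rng.normal(0, 0.2, 1500), 0, 1)
p_updated = np.clip(0.3 + 0.25 * (y - 0.3) + rng.normal(0, 0.2, 1500), 0, 1)
print(update_decision(y, p_current, p_updated))
```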
How do we handle biased outcome feedback after model implementation?
• Recommendations for assessing outcome feedback collected after effective AI-enabled interventions
• Methods for model development, validation, and updating that are robust to confounding by intervention (see the sketch below)
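One candidate approach to confounding by intervention, sketched below, is to refit the risk model on patients who did not receive the AI-triggered intervention, reweighted by the inverse probability of remaining untreated so that they stand in for the full population. This is only one of several possible strategies, and the data-generating process, weight truncation level, and model choices here are illustrative assumptions.

```python
# Minimal sketch of refitting on untreated patients with inverse probability weights.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
X = rng.normal(size=(5000, 3))
risk = 1 / (1 + np.exp(-(X @ np.array([1.2, 0.7, -0.4]))))
# Higher-risk patients are more likely to be flagged and treated; treatment
# prevents a share of outcomes, biasing naive retraining labels downward.
treated = rng.binomial(1, np.clip(risk * 1.5, 0, 0.9))
outcome = rng.binomial(1, np.where(treated == 1, risk * 0.4, risk))

# Model the probability of remaining untreated given the inputs.
pu_model = LogisticRegression().fit(X, 1 - treated)
p_untreated = pu_model.predict_proba(X)[:, 1]

# Refit on untreated patients only, weighted so they represent the full population.
mask = treated == 0
weights = 1.0 / np.clip(p_untreated[mask], 0.05, None)   # truncate extreme weights
naive = LogisticRegression().fit(X, outcome)              # ignores the intervention
adjusted = LogisticRegression().fit(X[mask], outcome[mask], sample_weight=weights)
print("naive coefficients:   ", naive.coef_.round(2))
print("adjusted coefficients:", adjusted.coef_.round(2))
```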