Author manuscript; available in PMC: 2024 May 3.
Published in final edited form as: Nat Mach Intell. 2023 Jul 17;5(7):799–810. doi: 10.1038/s42256-023-00652-2

Table 2 | Benchmarking user roles and responsibilities

Role name | Role definition | Role responsibilities
Benchmark committee | The benchmark committee includes regulatory bodies, groups of experts (for example, clinicians and patient representative groups), and data or model owners wishing to drive evaluation of their model or data.
  • Authors the benchmark, manages all benchmark assets, and produces some assets (for example, dataset preparation).

  • Recruits model owners and data owners, opens the benchmark to model owners, and approves applicants.

  • Controls access to the aggregated statistical results.

Data owner | Data owners may include hospitals, medical practices, research organizations and healthcare insurance providers that ‘own’ medical data, register it with the platform and execute benchmark requests.
  • Registers data with the benchmarking platform.

  • Performs data labelling.

  • Downloads and executes a data preparation processor to prepare data.

  • Downloads and periodically runs the platform client to approve and serve requests from the benchmarking platform, and to approve and upload results to it.

Model owner | Model owners include AI researchers and software vendors that own a trained medical AI model and want to evaluate its performance.
  • Registers the model with the benchmarking platform.

  • Views results of their model on the benchmark.

  • May approve sharing the results of that benchmark with other model or data owners, or with the public, if the benchmark group allows it.

Platform provider | Platform providers are organizations, such as MLCommons, that operate a platform enabling benchmark groups to run benchmarks by connecting data owners with model owners.
  • Manages user accounts and provides a website for registering and discovering benchmarks, datasets and models, and for overall workflow management.

  • Coordinates active benchmarks by sending requests, aggregating results and managing result access.
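
The division of responsibilities in Table 2 can be illustrated with a small in-memory sketch. This is not the API of any real benchmarking platform. Every class, method and field name below (`Platform`, `Benchmark`, `approve_model`, `upload_result`, `grant_access`, `aggregate`) is hypothetical, and the averaging step is only an assumed stand-in for whatever aggregation a benchmark committee might define.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Benchmark:
    """Hypothetical record of one benchmark and its participants."""
    name: str
    committee: str                                   # user who authored the benchmark
    approved_models: set = field(default_factory=set)
    registered_datasets: set = field(default_factory=set)
    results: dict = field(default_factory=dict)      # (model, dataset) -> score
    readers: set = field(default_factory=set)        # users allowed to read aggregates


class Platform:
    """Illustrative stand-in for the platform provider's coordinating role."""

    def __init__(self):
        self.benchmarks = {}

    def create_benchmark(self, committee, name):
        # The benchmark committee authors the benchmark and can read its results.
        bench = Benchmark(name=name, committee=committee, readers={committee})
        self.benchmarks[name] = bench
        return bench

    def register_dataset(self, name, dataset):
        # A data owner registers a (locally held) dataset with the platform.
        self.benchmarks[name].registered_datasets.add(dataset)

    def approve_model(self, name, actor, model):
        # Only the benchmark committee approves model-owner applicants.
        bench = self.benchmarks[name]
        if actor != bench.committee:
            raise PermissionError("only the benchmark committee approves applicants")
        bench.approved_models.add(model)

    def upload_result(self, name, model, dataset, score, owner_approved):
        # The data owner must approve a result before it reaches the platform.
        bench = self.benchmarks[name]
        if model not in bench.approved_models:
            raise ValueError("model has not been approved for this benchmark")
        if dataset not in bench.registered_datasets:
            raise ValueError("dataset has not been registered")
        if not owner_approved:
            return False
        bench.results[(model, dataset)] = score
        return True

    def grant_access(self, name, actor, user):
        # The benchmark committee controls who may read aggregated results.
        bench = self.benchmarks[name]
        if actor != bench.committee:
            raise PermissionError("only the benchmark committee grants access")
        bench.readers.add(user)

    def aggregate(self, name, actor, model):
        # The platform aggregates per-dataset scores (here: a plain average).
        bench = self.benchmarks[name]
        if actor not in bench.readers:
            raise PermissionError("access to aggregated results was not granted")
        scores = [s for (m, _), s in bench.results.items() if m == model]
        return mean(scores)
```

In this sketch a score only enters `results` after the data owner's approval, and a model owner can read the aggregate only once the committee calls `grant_access`, which mirrors the approval and access-control bullets in the table.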