BMJ Open. 2023 Jun 12;13(6):e073283. doi: 10.1136/bmjopen-2023-073283

Table 4.

Summary of the review stages, contrasting systematic literature review methods with systematic health app review methods

For each review stage, methods for a systematic literature review (of effectiveness, with a focus on quantitative reviews) are contrasted with methods for a systematic commercial health app review.
Scoping work

  Systematic literature review:

  • Scoping searches of the literature are usually necessary to inform protocol development and ensure the review addresses appropriate questions.

  • They are sometimes also used to keep the review a manageable size.

  Systematic health app review:

  • Scoping searches of some app stores are essential to determine whether the number of apps available is feasible to review.

  • The research question and the eligibility criteria may be refined iteratively, with multiple scoping searches performed until a reasonable number of apps is identified.

Protocol development

  Systematic literature review:

  • Journals increasingly require pre-registration of protocols.

  • There are dedicated registries for many review types (eg, PROSPERO).

  • Alternatives such as OSF are sometimes used.

  Systematic health app review:

  • There is no formal requirement for protocols to be registered.

  • Registration on OSF is appropriate, and we recommend it.

Stakeholder engagement

  Systematic literature review:

  • Varies from none, through protocol review, to co-development or co-production.

  • This is recommended but not required, unless a specific design is used or it is a Cochrane review, where consumer peer review is required.

  Systematic health app review:

  • Not formally required, and most app reviews have had no stakeholder engagement.

  • All stages could benefit from stakeholder engagement, for example, co-production in searching, screening, extracting and analysing data.

Inclusion criteria

  Systematic literature review:

  • Most reviews include primary research studies.

  • PICO (Population, Intervention, Comparator, Outcome) is usually used to define key eligibility criteria.

  Systematic health app review:

  • The novel TECH framework can be used to help determine the eligibility criteria. TECH considers the Target user, Evaluation focus, Connectedness and Health domain.

Search

  Systematic literature review:

  • Searching multiple databases of published literature plus (often) trial registries and/or grey literature.

  • The search strategy, dates and number of records are reported for each database.

  • De-duplication of search results may be performed using a reference manager.

  • Citation searching is also often used.

  • There is extensive literature on multiple aspects of searching.

  • Information specialists should be involved.

  Systematic health app review:

  • Searching the app market via multiple app stores using basic keywords.

  • Additional sources may include a proprietary software database or publicly available online rating frameworks for health apps that use expert reviewers (eg, ORCHA).

  • The search information (eg, app market, date of search and number of apps identified) is recorded in Excel.

  • De-duplication of search results also often takes place in Excel (see the sketch after this list).

  • Information specialists are not generally involved, given that the search process is simple.
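
As an illustration of this spreadsheet-based search log and de-duplication step, here is a minimal Python/pandas sketch; the file name, column names and matching rule are assumptions for illustration, not part of the published method:

```python
import pandas as pd

# Load the app records logged during the store searches.
# Assumed columns: app_name, developer, store, search_date, url
apps = pd.read_excel("app_search_results.xlsx")

# Search log: number of apps identified per app market and search date.
search_log = apps.groupby(["store", "search_date"]).size().rename("apps_identified")
print(search_log)

# De-duplicate: the same app often appears in more than one store,
# so match on a normalised name plus developer rather than on the URL.
apps["match_key"] = (
    apps["app_name"].str.lower().str.strip()
    + "|"
    + apps["developer"].str.lower().str.strip()
)
deduped = apps.drop_duplicates(subset="match_key").drop(columns="match_key")

print(f"{len(apps)} records found; {len(deduped)} unique apps after de-duplication")
deduped.to_excel("apps_deduplicated.xlsx", index=False)
```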

Screening

  Systematic literature review:

  • Screening of search results exported from databases, using tools such as Rayyan, Covidence, EndNote and EPPI-Reviewer.

  • A two-stage process conducted in duplicate at each stage; disagreements are resolved through consensus or by consulting a third reviewer. Full-text excludes are listed with reasons or available on request.

  • A PRISMA flowchart is used to visually report the literature search and screening process.

  Systematic health app review:

  • Screening of search results manually extracted into an Excel sheet.

  • A two-stage process: stage 1 involves screening the app's title and description on the app store; stage 2 involves downloading the app and assessing eligibility.

  • Two reviewers are generally involved, and a third may help to reach consensus on any disagreements. Apps excluded at stage 2 are listed with reasons for exclusion.

  • The PRISMA flowchart is often amended and used to report the app search and screening process (see the sketch after this list).
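
The two-stage screening decisions, and the counts needed for an amended PRISMA-style flowchart, can likewise be derived from the screening sheet. A minimal sketch, assuming hypothetical decision columns filled in by the reviewers:

```python
import pandas as pd

# Assumed decision columns, filled in by the two reviewers:
#   stage1_include (True/False), stage2_include (True/False),
#   stage2_exclusion_reason (text, blank for included apps)
screening = pd.read_excel("apps_deduplicated.xlsx")

# Stage 1: title and store description; stage 2: download and assess.
stage1_included = screening[screening["stage1_include"]]
stage2_included = stage1_included[stage1_included["stage2_include"]]

# Counts for an amended PRISMA-style flowchart.
print(f"Apps screened at stage 1: {len(screening)}")
print(f"Apps downloaded and assessed at stage 2: {len(stage1_included)}")
print(f"Apps included in the review: {len(stage2_included)}")

# Stage 2 exclusions listed with reasons, as described above.
excluded = stage1_included[~stage1_included["stage2_include"]]
print(excluded.groupby("stage2_exclusion_reason").size())
```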

Data extraction

  Systematic literature review:

  • Data are extracted into a pre-specified and piloted form. Tools include Excel, Covidence, RevMan and EPPI-Reviewer.

  • Usually, data are extracted by one reviewer and checked by a second; sometimes duplicate extraction is used for some or all data.

  Systematic health app review:

  • Data are manually extracted into a pre-specified form in Excel.

  • Data may be extracted by one reviewer and checked by a second, or the task may be shared between reviewers.

Data management

  Systematic literature review:

  • Data may be transformed in various ways, and processes may be implemented for handling missing data, such as making assumptions or imputation.

  • There is extensive guidance on methodological approaches to challenges in data management.

  Systematic health app review:

  • Not generally relevant for commercial health app reviews.

  • Researchers may contact developers for more information about any evaluations that have taken place.

Quality appraisal

  Systematic literature review:

  • A wide range of tools exists for assessing risk of bias, depending on study design and purpose; assessment is usually carried out in duplicate, with disagreements resolved through consensus or by consulting a third reviewer.

  • Assessments are recorded in Excel, RevMan or EPPI-Reviewer.

  • Risk of bias plots can be generated in RevMan or with robvis.

  • Increasingly, reviews also use Grading of Recommendations, Assessment, Development and Evaluation (GRADE) to rate the certainty of the evidence (from high to very low); GRADE assessment may use GRADEpro.

  Systematic health app review:

  • Quality is generally assessed using the MARS and recorded in Excel.

  • Good practice requires each app to be reviewed independently by two raters.

  • Inter-rater reliability is analysed and presented in the review.

Synthesis

  Systematic literature review:

  • Meta-analysis may be conducted; the Cochrane Handbook is the usual source of methods guidance, and extensive literature exists on various aspects of this.

  • Many reviews use narrative synthesis, on which Cochrane offers guidance; there is also recent guidance on Synthesis Without Meta-analysis (SWiM).

  Systematic health app review:

  • Data synthesis is generally performed descriptively by generating statistics (sums, averages, standard deviations and percentages) for relevant items.

  • The highest-scoring apps (regarding functionality and quality) are identified.

  • Descriptive summaries may be written for text-based items (eg, description of the main features).

  • Inter-rater reliability can be calculated using Cohen's kappa statistic for the IMS Institute for Healthcare Informatics functionality scores and an intraclass correlation coefficient for MARS scores (see the sketch after this list).
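
A minimal Python sketch of this descriptive synthesis and the two inter-rater reliability statistics; the spreadsheet layout, column names and rater labels are assumptions (Cohen's kappa via scikit-learn treats the functionality scores as categories, and the intraclass correlation coefficient is computed with the pingouin package):

```python
import pandas as pd
import pingouin as pg
from sklearn.metrics import cohen_kappa_score

# Assumed long format: one row per app per rater, with MARS total
# and IMS functionality scores; rater labels "rater1" and "rater2".
ratings = pd.read_excel("app_ratings.xlsx")

# Descriptive synthesis: mean and standard deviation of MARS scores
# per app, with the highest-scoring apps listed first.
summary = ratings.groupby("app")["mars_total"].agg(["mean", "std"])
print(summary.sort_values("mean", ascending=False).head())

# Cohen's kappa for the IMS functionality scores, treated as categories.
wide = ratings.pivot(index="app", columns="rater", values="ims_functionality")
kappa = cohen_kappa_score(wide["rater1"], wide["rater2"])
print(f"Cohen's kappa (IMS functionality): {kappa:.2f}")

# Intraclass correlation coefficient for the continuous MARS scores.
icc = pg.intraclass_corr(data=ratings, targets="app", raters="rater", ratings="mars_total")
print(icc[["Type", "ICC", "CI95%"]])
```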

Data presentation

  Systematic literature review:

  • Meta-analyses (and sometimes non-pooled data) are presented using forest plots, often with risk of bias plots displayed alongside.

  • Risk of bias results are displayed using bespoke figures (see above); GRADE results are typically displayed in Summary of Findings tables.

  Systematic health app review:

  • Data tend to be presented as descriptive summaries and tables. Bespoke figures can also be created.

  • Data pertaining to the IMS Institute for Healthcare Informatics functionality score are often presented as a radar graph/chart (see the sketch after this list).

  • Inter-rater reliability statistics are presented for both the MARS quality appraisals and the IMS Institute for Healthcare Informatics functionality scores.
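
A radar chart of this kind can be drawn with matplotlib; the sketch below uses assumed functionality category labels and illustrative scores rather than data from any real app:

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed IMS functionality categories and illustrative scores
# (1 = function present, 0 = absent) for a single example app.
categories = ["Inform", "Instruct", "Record", "Display",
              "Guide", "Remind/Alert", "Communicate"]
scores = [1, 1, 0, 1, 0, 1, 0]

# Evenly spaced angles; repeat the first point to close the polygon.
angles = np.linspace(0, 2 * np.pi, len(categories), endpoint=False).tolist()
angles += angles[:1]
values = scores + scores[:1]

fig, ax = plt.subplots(subplot_kw={"polar": True})
ax.plot(angles, values, linewidth=1.5)
ax.fill(angles, values, alpha=0.25)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(categories)
ax.set_yticks([0, 1])
ax.set_title("IMS functionality profile (example app)")
plt.tight_layout()
plt.savefig("ims_radar.png")
```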

Updating and currency

  Systematic literature review:

  • Reviews should generally have a search date within the last 12 months at submission for publication. Searches can be updated by re-running them with the relevant date limit: new records will be identified but old ones will not be lost, as the process is additive (an exception may be the grey literature).

  Systematic health app review:

  • Apps emerge, are updated and disappear very quickly, so app reviews should be conducted and published as promptly as possible.

  • New or updated searches in app stores will likely yield very different results, so updating a review is difficult.

Reporting

  Systematic literature review:

  • The PRISMA checklist is used.

  Systematic health app review:

  • No formal reporting guidelines exist for health app reviews.

Guidance

  Systematic literature review:

  • The Cochrane Handbook and other guidance exist for specific review types.

  Systematic health app review:

  • A YouTube video shows how to use the MARS to assess quality.

MARS, Mobile App Rating Scale; ORCHA, Organisation for the Review of Care and Health Apps; OSF, Open Science Framework; PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses.