Scoping work |
Scoping searches of some app stores are essential to determine whether the number of available apps is feasible to review.
The research question and the eligibility criteria may be refined iteratively, with multiple scoping searches performed until a reasonable number of apps are identified.
Protocol development |
Journals increasingly require pre-registration of protocols.
There are dedicated registries for many review types (eg, PROSPERO).
Alternatives such as OSF are sometimes used.
Stakeholder engagement |
Varies from none through protocol review to co-development or co-production.
This is recommended but not required unless a specific design is used, or it is a Cochrane review where consumer peer review is required.
Not formally required, and most app reviews have had no stakeholder engagement.
All stages could benefit from stakeholder engagement, for example, through co-production when searching, screening, extracting and analysing data.
Inclusion criteria |
Search |
Searching multiple databases of published literature plus (often) trial registries and/or grey literature.
Search strategy, dates and number of records reported for each database.
De-duplication of search results may include using a reference manager.
Citation searching is also often used.
There is extensive literature on multiple aspects of searching.
Information specialists should be involved.
Searching the app market via multiple app stores using basic keywords.
Additional sources may include a proprietary software database or publicly available online rating frameworks for health apps that use expert reviewers (eg, ORCHA).
The search information (eg, app market, date of search and number of apps identified) is recorded in Excel.
De-duplication of search results also often takes place in Excel.
Information specialists are not generally involved, given that the search process is simple.
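The de-duplication step for app searches can also be scripted rather than done by hand in a spreadsheet. The following is a minimal sketch, assuming each search record holds an app name, a developer and the store it was found in; the `AppRecord` type, the matching rule (same normalised name and developer) and the sample records are illustrative assumptions, not a method prescribed by the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppRecord:
    # Hypothetical record layout for one search hit.
    name: str
    developer: str
    store: str


def normalise(text: str) -> str:
    # Lower-case and collapse whitespace so trivial differences do not block a match.
    return " ".join(text.lower().split())


def deduplicate(records):
    """Keep one entry per (name, developer) pair and list every store it appeared in."""
    unique = {}
    for rec in records:
        key = (normalise(rec.name), normalise(rec.developer))
        if key not in unique:
            unique[key] = {
                "name": rec.name,
                "developer": rec.developer,
                "stores": [rec.store],
            }
        elif rec.store not in unique[key]["stores"]:
            unique[key]["stores"].append(rec.store)
    return list(unique.values())


# Made-up example records, for illustration only.
records = [
    AppRecord("Calm Breath", "Example Labs", "Store A"),
    AppRecord("calm  breath", "Example Labs", "Store B"),
    AppRecord("Sleep Log", "Example Labs", "Store A"),
]
unique_apps = deduplicate(records)
print(f"{len(records)} records identified, {len(unique_apps)} unique apps")
```

The two counts printed at the end correspond to the figures that would be reported in the amended PRISMA flowchart. Matching on name and developer is only one possible rule; apps that differ by platform-specific naming would still need a manual check.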
Screening |
Screening of search results exported from databases. Uses tools including Rayyan, Covidence, EndNote and EPPI-Reviewer.
Two-stage process conducted in duplicate at each stage; disagreements resolved through consensus or by consulting a third reviewer. Full-text exclusions are listed with reasons or made available on request.
A PRISMA flowchart is used to visually report the literature search and screening process.
Screening of search results manually extracted into an Excel sheet.
Two-stage process in which stage 1 includes screening the app's title and description on the app store. Stage 2 includes downloading the app and assessing eligibility.
Two reviewers are generally involved and a third may help to reach consensus on any disagreements. Apps excluded at stage 2 are listed with reasons for exclusion.
The PRISMA flowchart is often amended and used to report the app search and screening process.
Data extraction |
Data are extracted into a pre-specified and piloted form. Tools include Excel, Covidence, RevMan and EPPI-Reviewer.
Usually, data are extracted by one reviewer and checked by a second; sometimes duplicate extraction is used for some or all data.
Data are manually extracted into a pre-specified form in Excel.
Data may be extracted by one reviewer and checked by a second, or the task may be shared between reviewers.
Data management |
Data may be transformed in various ways and processes may be implemented for the handling of missing data, such as assumption or imputation.
There is extensive guidance on methodological approaches to challenges in data management.
Quality appraisal |
A wide range of tools is available for assessing risk of bias, depending on study design and purpose; assessment is usually carried out in duplicate, with disagreements resolved through consensus or by consulting a third reviewer.
Recorded in Excel, RevMan or EPPI-Reviewer.
Risk of bias plots can be generated in RevMan or robvis.
Increasingly, reviews will also use Grading of Recommendations, Assessment, Development and Evaluation (GRADE) to rate the certainty of the evidence (from high to very low); GRADE assessment may use GRADEpro.
Quality is generally assessed using the Mobile App Rating Scale (MARS) and recorded in Excel.
Good practice requires each app to be reviewed independently by two raters.
Inter-rater reliability is analysed and presented in the review.
Synthesis |
Meta-analysis may be conducted; the Cochrane Handbook is a common source of methods guidance, and extensive literature exists on various aspects of this.
Many reviews use narrative synthesis, which Cochrane offers guidance on, and there is recent guidance on SWiM (Synthesis without Meta-analysis).
Data synthesis is generally performed descriptively by generating statistics (sums, averages, standard deviation and percentages) on relevant items.
The highest-scoring apps (regarding functionality and quality) are identified.
Descriptive summaries may be written for text-based items (eg, description of the main features).
Inter-rater reliability can be calculated for the IMS Institute for Healthcare Informatics functionality scores using Cohen’s kappa statistic, and for MARS scores using an intraclass correlation coefficient.
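As an illustration of the agreement statistic named above, Cohen's kappa for two raters is $\kappa = (p_o - p_e) / (1 - p_e)$, where $p_o$ is the observed proportion of agreement and $p_e$ is the agreement expected by chance from each rater's category frequencies. The sketch below assumes each functionality item is coded as present (1) or absent (0) by two raters; the function name and the ratings are hypothetical and are not taken from any review. The intraclass correlation coefficient used for MARS scores is not covered here.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical codes to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must rate the same non-empty set of items.")
    n = len(rater_a)

    # Observed agreement: share of items given the same code by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal proportions, summed over codes.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical code throughout; kappa is undefined,
        # so perfect agreement is reported by convention in this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical present/absent codes for eight functionality items.
rater_a = [1, 1, 0, 1, 0, 0, 1, 1]
rater_b = [1, 0, 0, 1, 0, 1, 1, 1]
kappa = cohens_kappa(rater_a, rater_b)
print(f"Cohen's kappa = {kappa:.3f}")
```

In this made-up example the raters agree on 6 of 8 items ($p_o = 0.75$) and chance agreement is $p_e = 34/64 \approx 0.531$, giving $\kappa = 7/15 \approx 0.467$.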
Data presentation |
Meta-analyses (and sometimes non-pooled data) are presented using forest plots, often with risk of bias plots displayed alongside.
Risk of bias results are displayed using bespoke figures (see above); GRADE results are typically displayed using Summary of Findings tables.
Data tend to be presented as descriptive summaries and tables. Bespoke figures can also be created.
Data pertaining to the IMS Institute for Healthcare Informatics functionality score are often presented as a radar chart.
Inter-rater reliability statistics are presented for both the MARS quality appraisals and the IMS Institute for Healthcare Informatics functionality scores.
Updating and currency |
Apps emerge, are updated, and disappear very quickly, so app reviews should be conducted and published as promptly as possible.
New or updated searches in app stores will likely yield very different results, so updating a review is difficult.
Reporting |
Guidance |