Abstract
Objective:
This study examined the effectiveness of the Better Access initiative using outcome data from real-world practice settings.
Methods:
We used anonymised data from four datasets to assess outcomes for consumers over 86,121 episodes of care. The datasets contained routinely captured episode-level data from the practices of psychologists and other eligible Better Access providers. Across the datasets, outcomes were assessed on 11 different measures (mostly consumer-rated measures of depression and anxiety symptoms, psychological distress, functioning and wellbeing). We conducted purpose-designed analyses with three of the datasets (83,346 episodes), examining score changes on given measures between the first and last assessment occasion within an episode. We used preexisting outputs for the fourth dataset (2775 episodes), again considering change from the beginning to the end of the episode.
Results:
In the purpose-designed analyses, consumers’ mental health improved in around 50–60% of episodes. However, consumers showed no change or experienced deterioration in their mental health in 20–30% and 10–20% of episodes, respectively. Those with more severe baseline scores had a greater probability of showing improvement. The preexisting outputs also identified significant improvements, particularly in episodes where treatment was complete.
Conclusion:
Better Access is achieving reductions in symptoms and improvements in functioning and wellbeing for the majority of consumers. A minority of consumers do not have these sorts of positive outcomes, however, and further work is required to understand why. Routine measurement of outcomes – particularly consumer-rated outcomes – would enable ongoing monitoring of the extent to which Better Access is achieving its goals.
Keywords: Psychological therapy, mental health services, Better Access, Medicare
Introduction
The Better Access to Psychiatrists, Psychologists and General Practitioners through the Medicare Benefits Schedule initiative (Better Access) was introduced in 2006 to enable people with common mental disorders like depression and anxiety to readily receive treatment. It was specifically designed to support those with mild-to-moderate mental health conditions who might respond well to short-term evidence-based interventions (Australian Government Department of Health, 2021). Under Better Access, a series of item numbers was added to the Medicare Benefits Schedule (MBS). These item numbers are associated with a variety of mental health services that are delivered by different mental health professionals. Key among these are psychological therapy services delivered by clinical psychologists and focussed psychological strategies delivered by psychologists, social workers and occupational therapists. MBS rebates are available to help consumers meet the costs of these services.
We were commissioned by the (then) Department of Health to evaluate Better Access in 2021–2022. This followed an evaluation that we had conducted in 2009–2010 (Pirkis et al., 2011b). The more recent evaluation considered the accessibility, responsiveness, appropriateness, effectiveness and sustainability of Better Access and involved 10 separate studies; this study and eight others are reported on in this issue of the Australian and New Zealand Journal of Psychiatry (Arya et al., 2026; Chilver et al., 2026; Currier et al., 2026; Harris et al., 2026; Newton et al., 2026; Pirkis et al., 2026; Tapp et al., 2026a; Tapp et al., 2026b).
The current study drew on outcome data that were routinely collected in the clinical practices of psychologists and other eligible providers. It complemented three of the other studies reported in this issue by providing a different lens on consumer outcomes. It assessed outcomes via validated measures of symptoms, functioning and related concepts that were administered prospectively, and considered change over discrete episodes of care. Pirkis et al. (2026) also considered outcomes over the course of an episode of care, but relied on consumers’ retrospective reports of how their mental health changed over the course of the episode. Like the current study, Harris et al. (2026) and Arya et al. (2026) used prospectively administered measures, but assessed change over set periods of time rather than for specific episodes.
The study was informed by our previous experiences with assessing the effectiveness of Better Access. In our 2009–2010 evaluation, we followed 883 consumers over sessions of Better Access care (Pirkis et al., 2011a). Because we needed to collect baseline data for these consumers before they received care, we recruited a stratified random sample of providers and asked them to recruit their next 5–10 English-speaking ‘new’ Better Access consumers. We found that the majority showed improvement on the Kessler Psychological Distress Scale (K-10) (Kessler et al., 2002) and/or the Depression Anxiety Stress Scales (DASS-21) (Lovibond and Lovibond, 1995a; Lovibond and Lovibond, 1995b). We were criticised because our sample was relatively small and may have been biased if providers ‘cherry picked’ consumers who were likely to experience positive outcomes (Allen and Jackson, 2011; Hickie et al., 2011).
In the current study, we addressed these criticisms by using outcome data that had been routinely collected by providers during the course of consumers’ care. This had the advantage of allowing us to access anonymised data for a far greater number of consumers, and, because the data had been routinely collected by providers for a different purpose (i.e. to monitor consumers’ progress and guide their clinical approach), the likelihood of sampling bias was minimised. In addition, using routinely collected data allowed us to examine outcomes not only for consumers who had completed treatment, but also for those who were still receiving treatment or had ceased treatment prematurely. This gave us confidence that we were gleaning an accurate representation of Better Access in real-world settings.
Methods
Study overview
We used four datasets to assess outcomes for consumers over their episodes of care. Four of our authors (B.B., A.F., C.M. and K.F.) were the custodians of these datasets. The first dataset was NovoPsych, a subscription-based platform developed by B.B. and used in multiple Australian practices (usually psychology practices). The other three datasets came from large psychology practices which at the time were run by A.F. (Benchmark Psychology, Queensland), C.M. (Chris Mackey and Associates, Victoria) and K.F. (Kaye Frankcom and Associates, Victoria).
Table 1 provides detail about the datasets’ scope. NovoPsych was the largest, housing data on consumer outcomes from around 3000 providers. All four contained data from extensive periods, with the Mackey database going back to 2007 (shortly after Better Access began), the NovoPsych and Benchmark databases housing data from early 2013, and the Frankcom database containing data from mid-2015.
Table 1.
Scope of the four datasets.
| Dataset | Providers | Period over which data were available |
|---|---|---|
| NovoPsych | ≈3000 (mostly psychologists but also other providers) | January 2013 to February 2022 |
| Benchmark | 42 (all psychologists) | January 2013 to February 2022 |
| Mackey | 35 (all psychologists) | January 2007 to December 2018 |
| Frankcom | 14 (all psychologists) | May 2015 to October 2017 |
We were not able to identify individual consumers or individual providers in any dataset. To anonymise the data further, we do not refer to any of the datasets by name for the remainder of this paper, but instead report all findings by individual measure.
Outcome measurement
The four datasets included outcome data from 11 different measures (see Table 2). These were mostly consumer-rated measures and covered various constructs, including depression and anxiety symptoms, psychological distress, functioning and wellbeing.
Table 2.
Outcome measures used.
| Measure | Perspective | Domain(s) | Structure |
|---|---|---|---|
| Clinical Outcomes in Routine Evaluation (CORE-OM)(Barkham et al., 1998; Evans et al., 2002) | Consumer-rated | Psychological distress | Consists of 34 items relating to four domains (subjective wellbeing, problems/symptoms, life functioning, risk/harm). The items are phrased as statements about how the consumer has been over the last week. Each item is scored on a 5-point scale ranging from 0 (Not at all) to 4 (Most or all the time). Scores are presented as a total raw score (range 0–136) and a mean score from 0–4. A mean score of 1 or more indicates that the consumer is likely to reach a clinical threshold. |
| Clinical Outcomes in Routine Evaluation (CORE-10) (Barkham et al., 2013) | Consumer-rated | Psychological distress | Abbreviated version of the CORE-OM. Consists of 10 items from the original CORE-OM. Each item is scored the same way as the parent instrument (i.e. on a scale of 0–4). Scores are presented as a total raw score (range 0–40) and a mean score from 0 to 4. Total scores of 0–10 suggest the consumer is in the non-clinical range, whereas scores of 11–14 indicate mild psychological distress, scores of 15–19 indicate moderate psychological distress, scores of 20–24 indicate moderate-to-severe psychological distress, and scores of 25 or more indicate severe psychological distress. |
| Depression Anxiety and Stress Scale (DASS-21/42) (Lovibond and Lovibond, 1995a; Lovibond and Lovibond, 1995b) | Consumer-rated | Negative emotional states of depression, anxiety and stress | The longer form (DASS-42) consists of 42 items, and the shorter form (DASS-21) consists of 21 items. Each item takes the form of a statement relating to a symptom of depression, anxiety or stress. The consumer is asked to consider how much each statement applied to them in the past week. Each item is scored from 0 (Did not apply to me at all) to 3 (Applied to me very much, or most of the time). The total score on the DASS-42 ranges from 0 to 126; the raw total score on the DASS-21 ranges from 0 to 63 but is then doubled so that it also ranges from 0 to 126. There are three sub-scales – depression, anxiety and stress – each of which has a score ranging from 0 to 42. The cut-offs for the depression sub-scale are as follows: ⩽ 9 – normal, 10–13 – mild, 14–20 – moderate, 21–27 – severe, ⩾ 28 – extremely severe. The equivalent cut-offs for the anxiety and stress sub-scales are ⩽ 7, 8–9, 10–14, 15–19 and ⩾ 20, and ⩽ 14, 15–18, 19–25, 26–33 and ⩾ 34, respectively. |
| Depression Anxiety and Stress Scale (DASS-10) (Halford and Frost, 2021) | Consumer-rated | Negative emotional states of depression, anxiety and stress | Abbreviated version of the DASS-42 and DASS-21. Consists of 8 items from the original DASS, and 2 additional items relating to substance use and suicidality. As with the original measure, the consumer rates each item on a scale of 0–3 to indicate how much it applied to them in the past week. This yields a total score of 0–30. Severity of depression, anxiety and stress is classified as follows: 0–6 – sub-clinical or mild severity, 7–12 – moderate, 13–30 – severe. |
| Generalised Anxiety Disorder scale (GAD-7) (Spitzer et al., 2006) | Consumer-rated | Anxiety symptoms | Consists of seven questions about how often the consumer has been bothered by selected anxiety symptoms over the past two weeks. Each item is scored 0 (Not at all), 1 (Several days), 2 (More than half the days) or 3 (Nearly every day). The total score ranges from 0 to 21. A score of 10 or more indicates the likely presence of Generalised Anxiety Disorder. |
| Global Assessment of Functioning Scale (GAF) (Endicott et al., 1976) | Clinician-rated | Functioning | Seeks a single rating. Ratings range from 1 (Persistent danger of severely hurting self or others OR persistent inability to maintain minimal personal hygiene OR serious suicidal act with clear expectation of death) to 100 (Superior functioning in a wide range of activities, life’s problems never seem to get out of hand, is sought out by others because of his/her many positive qualities. No symptoms). |
| Kessler Psychological Distress Scale (K-10) (Kessler et al., 2002) | Consumer-rated | Non-specific psychological distress | Comprises 10 items which ask the consumer about symptoms of depression and anxiety in the past four weeks. Each item is rated from 1 (None of the time) to 5 (All of the time), resulting in a total score that ranges from 10 to 50. Scores of 10–15 indicate little or no psychological distress, scores of 16–21 indicate moderate psychological distress, scores of 22–29 indicate high psychological distress, and scores of 30–50 indicate very high psychological distress. |
| Outcome Rating Scale (ORS) (Miller et al., 2003) | Consumer-rated | Consumers’ perceptions of their improvement over the course of treatment | Consumers use visual analogue scales to indicate how well they have been faring in three domains (individually, interpersonally and socially) and overall over the past week. In each case, the visual analogue scale is 10 cm long. Marks to the left indicate low levels and marks to the right indicate high levels, yielding scores on each scale that range from 0 to 10 and a total score that ranges from 0 to 40. The clinical cut-off for adults is 28. |
| Patient Health Questionnaire (PHQ-9) (Kroenke et al., 2001) | Consumer-rated | Depressive symptoms | Consists of nine items relating to how often the consumer has been bothered by depressive symptoms during the past two weeks. Each item is scored 0 (Not at all), 1 (Several days), 2 (More than half the days) or 3 (Nearly every day). Total scores range from 0 to 27. Scores of 0–4 indicate no depression, scores of 5–9 indicate mild depression, scores of 10–14 indicate moderate depression, scores of 15–19 indicate moderately severe depression, and scores of 20–27 indicate severe depression. |
| Positive and Negative Affect Schedule (PANAS) (Watson et al., 1988) | Consumer-rated | Positive and negative affect | Consists of 20 items, 10 relating to positive affect and 10 relating to negative affect. Each item relates to a specific feeling, and the consumer is asked to indicate the extent to which they have felt this way over the past week. Each item is scored on a scale of 1 (Very slightly or not at all) to 5 (Extremely). This results in total scores for positive and negative affect that each range from 10 to 50. |
| Satisfaction With Life Scale (SWLS) (Diener et al., 1985) | Consumer-rated | Global life satisfaction | Consists of 5 items that are phrased as statements about the consumer’s satisfaction with life. They are asked to rate their agreement with each of these statements. Each item is scored on a scale of 1 (Strongly disagree) to 7 (Strongly agree). This yields a total score of 5–35. The scores can be interpreted in the following way: 5–9 – extremely dissatisfied; 10–14 – dissatisfied; 15–19 – slightly dissatisfied; 20–24 – slightly satisfied; 25–29 – satisfied; 30–35 – extremely satisfied. |
Purpose-designed analyses
We were able to implement a consistent analysis strategy that employed purpose-designed analyses for three datasets. These datasets included data on all of the measures in Table 2 except the ORS. Our approach is described below.
Data management
The datasets were processed and analysed separately. For all three, the data custodian retained the raw data and provided anonymised test datasets with limited numbers of rows to our analysis team. The analysis team developed data cleaning, organisation and analysis code (in R [v4.0.0]) using the test datasets. The code differed depending on the structure of the given dataset, but all versions ultimately enabled the data to be presented consistently across the three datasets. The code was sent to the data custodians, who then used it to conduct the analysis and provide our team with aggregate results.
Episodes of care
We organised each dataset around episodes, creating these by aggregating series of sessions at which outcomes were assessed. Where sessions were dated, we were able to determine the time between consecutive sessions; sessions with no date were excluded. We treated consecutive sessions as belonging to the same episode if the period between them was less than six months; if the gap between sessions was six months or more, the latter session was treated as the start of a new episode. This meant that episodes could straddle the end of one year and the beginning of the next and were not defined by the Better Access annual session cap rules. We thought that this was a more meaningful way to explore the outcomes of a consecutive series of sessions of care.
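The episode-construction rule above can be sketched as follows. This is an illustrative Python sketch, not the authors’ R code; the operationalisation of “six months” as 182 days is an assumption made here for concreteness.

```python
from datetime import date, timedelta

# Assumption: "six months" is operationalised as 182 days for this sketch.
GAP = timedelta(days=182)

def build_episodes(session_dates):
    """Group a consumer's dated sessions into episodes of care.

    Consecutive sessions less than six months apart belong to the same
    episode; a gap of six months or more starts a new episode. Undated
    sessions are excluded, mirroring the rule described in the text.
    """
    dated = sorted(d for d in session_dates if d is not None)
    episodes = []
    for d in dated:
        if episodes and d - episodes[-1][-1] < GAP:
            episodes[-1].append(d)   # gap under six months: same episode
        else:
            episodes.append([d])     # gap of six months or more: new episode
    return episodes

# Example: a seven-month gap before July 2021 starts a second episode,
# even though all four sessions fall within adjacent calendar years.
sessions = [date(2020, 11, 20), date(2020, 12, 4), date(2021, 7, 2), date(2021, 7, 30)]
episodes = build_episodes(sessions)
```

Note that because episodes are defined purely by inter-session gaps, they can straddle calendar years, consistent with the decision not to tie episodes to the annual session cap.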
For each episode, we extracted the following data: consumer’s age, sex and first and last scores on the relevant outcome measure(s). Other variables were not consistently available across the datasets.
Inclusion and exclusion criteria
Wherever possible, we tried to ensure that the sessions that made up episodes were delivered through Better Access. Our starting point involved ensuring that the providers who had delivered the care came from a professional group whose services were eligible for rebates under Better Access (psychologists, social workers and occupational therapists).
We were able to take one additional step with one of the datasets. This dataset ‘tagged’ sessions of care that were delivered under Better Access. We used these in the analysis and excluded all others in this dataset. In the remaining datasets, we assumed that all sessions, and the episodes into which they were aggregated, were delivered under Better Access. We did this based on the rationale that most of the episodes in our datasets were delivered by psychologists, and we know from two background analyses that the majority of services delivered by psychologists in Australia are funded through Better Access. The first of these analyses drew on epidemiological data from the National Study of Mental Health and Wellbeing (Australian Bureau of Statistics, 2022) and on MBS administrative data (Australian Institute of Health and Welfare, 2023) and showed that approximately 75% of all Australians aged 16–85 who saw a psychologist in 2020–2021 received Better Access treatment sessions. The second analysis used expenditure data from a range of sources and showed that Better Access accounted for around 85% of all expenditure on psychologists, once psychologists funded through Primary Health Networks (Australian Government Department of Health, 2022), private health insurance companies (Australian Prudential Regulation Authority, 2022), the Department of Veterans Affairs (Australian Institute of Health and Welfare, 2022) and the Department of Defence (Australian Institute of Health and Welfare, 2022) were taken into account (Pirkis et al., 2022). We are confident, therefore, that the majority of sessions represented in the datasets were Better Access sessions.
To be eligible for inclusion in the analysis, an episode had to include at least two sessions for which the same measure was completed. For some episodes, outcomes were assessed at more than two sessions. Where this was the case, we used the outcome scores from the first and last sessions to calculate change on the given measure.
We also excluded some sessions that did not have valid data for analysis (e.g. outcome scores that fell outside the legitimate scoring range for the given measure; multiple administrations of the same measure on the same day).
We had some rules about the consumers who received the episodes. Consumers were excluded from the analysis if they were based outside Australia. They were also excluded if there was evidence that they were aged <18 years; where date of birth was missing we assumed that they were adults. Our reasoning was that when we interrogated MBS data supplied by Services Australia for the purposes of the broader evaluation, 90% of those who received Better Access services from a clinical psychologist or a psychologist in 2022 were aged 15 or more (Tapp et al., 2026b).
Data analysis
We profiled episodes by sex and age of the consumer, presenting the results as percentages. We also profiled episodes by the first (baseline) score on each measure, using standard cut-offs or quartiles where cut-offs were not available and rounding scores down for the purposes of categorisation.
We examined outcomes (i.e. the change in scores on a given measure between the first and last assessment occasions within an episode) using an effect size methodology. We classified episodes in terms of change, using an effect size of 0.3. A change score of greater than 0.3 times the standard deviation of the mean difference in outcome score for all episodes was taken as ‘significant improvement’; a change score between −0.3 and 0.3 times the standard deviation was taken as ‘no significant change’; and a change score of less than −0.3 times the standard deviation was taken as ‘significant deterioration’. We chose 0.3 as the effect size after considering studies of the Minimum Clinically Important Difference (MCID) on two of the measures in our suite (the PHQ-9 and GAD-7) (Kounali et al., 2020; Kroenke et al., 2019) and recommendations about the range of effect sizes likely to be minimally important in clinical or subjective terms (Angst et al., 2017). The MCID represents the smallest difference perceived by the consumer to be beneficial. An effect size of 0.3 is at the lower end of the reported ranges, but we considered this appropriate because samples were not restricted to those who used a minimum number of sessions or completed treatment. If we had limited the samples in this way, we might have anticipated greater average improvement and a higher effect size might have been more appropriate.
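The classification rule can be illustrated with a short sketch. This is Python for illustration only (the published analyses were run in R); the example change scores are invented, and the sign convention (positive change = improvement) would be reversed for symptom measures, where a falling score indicates improvement.

```python
import statistics

def classify_changes(change_scores, effect_size=0.3):
    """Label each episode's change score against +/- 0.3 SD of all change scores.

    The threshold is 0.3 times the standard deviation of the first-to-last
    change scores across all episodes, as described in the text. Positive
    change is treated as improvement here; for symptom measures the scores
    would be sign-flipped first.
    """
    sd = statistics.stdev(change_scores)   # SD of the change scores
    threshold = effect_size * sd
    labels = []
    for c in change_scores:
        if c > threshold:
            labels.append("significant improvement")
        elif c < -threshold:
            labels.append("significant deterioration")
        else:
            labels.append("no significant change")
    return labels

# Hypothetical change scores for eight episodes on one measure.
changes = [9, 4, 0, -1, -6, 12, 2, -8]
labels = classify_changes(changes)
```

Because the threshold is derived from the spread of change scores in the sample itself, the same raw change can be classified differently across measures with different variability, which is one reason results are reported per measure.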
For all estimates of change, we calculated 95% confidence intervals. Non-overlapping confidence intervals were used as a conservative method of determining whether differences in the proportions classified as ‘significant improvement’, ‘no significant change’ or ‘significant deterioration’ were statistically significant (Schenker and Gentleman, 2001).
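The non-overlap rule can be sketched as follows. This Python sketch assumes a standard Wald interval for a proportion; the paper does not state which interval formula was used, and the counts below are hypothetical.

```python
import math

def wald_ci(successes, n, z=1.96):
    """95% Wald confidence interval for a proportion (assumed formula)."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return (p - half, p + half)

def cis_overlap(ci_a, ci_b):
    """True if two intervals share any common ground."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]

# Hypothetical strata: 55% vs 70% of 1000 episodes classified as improved.
ci_improved_a = wald_ci(550, 1000)
ci_improved_b = wald_ci(700, 1000)

# Under the conservative rule of Schenker and Gentleman (2001), the two
# proportions are called significantly different only if the CIs do not overlap.
different = not cis_overlap(ci_improved_a, ci_improved_b)
```

As Schenker and Gentleman (2001) note, this rule is conservative: some pairs of proportions whose intervals overlap slightly would still differ significantly under a direct two-sample test.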
We calculated effect sizes for each measure within a dataset, conducting an analysis for all episodes for each measure and then analyses stratified by baseline severity score on the given measure.
Preexisting outputs
The remaining dataset included data on the ORS. It was not possible to conduct the above purpose-designed analyses with this dataset for logistical reasons, so we were provided with outputs from unpublished preexisting analyses. We felt that it was important to include these because the ORS is widely used in clinical practice.
The specific outputs were organised around outcomes on the ORS at six points in time and contained data from the preceding six months or so (periods 1–6). In each case, the key outcome metric was the effect size associated with change on the ORS from the beginning to the end of each episode for active clients (clients who were still receiving treatment at the end of the episode) and inactive clients (clients whose episode had ended).
The effect size was different from the one that we used in the purpose-designed analyses, described above. This effect size was more complex and described the relative effect of treatment compared to no intervention, after correcting for number of sessions, regression to the mean, baseline severity and bias. The creators of the software through which the outputs were generated indicate that a relative effect size of 0.8 can be translated as ‘[consumers] reporting outcomes 80% better than those not receiving treatment’.
Again, we assumed that the majority of sessions represented in this dataset were delivered via Better Access.
Approvals
The University of Melbourne Human Research Ethics Committee approved the study (2021-22452-23859-4).
Results
Purpose-designed analyses
In total, we had data on outcomes from 83,346 episodes in our purpose-designed analyses. Individual episodes could be represented in more than one analysis if multiple measures were used to assess outcomes in the same episode. The number of episodes represented in any given analysis varied from 1862 to 53,216.
Table 3 profiles the episodes included in the analysis for each measure. Across all measures, around two-thirds of episodes were delivered to females. Between 40% and 65% of episodes were provided to people aged < 40.
Table 3.
Breakdown of episodes included in purpose-designed analyses, by consumers’ sex and age for each measure.
| | | CORE-OM | CORE-10 | DASS-21/42 | DASS-10 | GAD-7 | GAF | K-10 | PHQ-9 | PANAS | SWLS |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Sex | Male | 32.6% | 30.6% | 35.7% | 40.9% | 35.5% | 37.7% | 32.5% | 36.9% | 37.5% | 37.4% |
| | Female | 67.4% | 69.4% | 64.3% | 59.1% | 64.5% | 62.3% | 67.5% | 63.1% | 62.5% | 62.6% |
| Age | 18–29 | 21.7% | 26.8% | 21.1% | 39.6% | 23.3% | 31.7% | 21.3% | 22.1% | 33.2% | 33.2% |
| | 30–39 | 20.4% | 18.8% | 18.5% | 25.8% | 18.8% | 27.4% | 18.5% | 18.5% | 27.6% | 27.7% |
| | 40–49 | 18.3% | 16.3% | 16.9% | 18.1% | 16.4% | 20.4% | 16.8% | 16.6% | 20.5% | 20.4% |
| | 50–59 | 15.7% | 15.0% | 15.7% | 11.0% | 15.3% | 13.9% | 16.1% | 16.0% | 12.5% | 12.6% |
| | 60–69 | 13.7% | 12.2% | 14.4% | 4.1% | 13.7% | 5.5% | 14.4% | 14.1% | 5.3% | 5.2% |
| | 70+ | 10.3% | 10.9% | 13.4% | 1.4% | 12.5% | 1.2% | 12.9% | 12.8% | 1.0% | 1.0% |
Table 4 shows the distribution of consumers’ baseline severity across episodes for each measure. For all measures, episodes were distributed across baseline severity categories. There were sizable proportions of episodes where the consumer began care with mild, moderate or severe symptoms or levels of functioning. There were also instances where the consumer began the episode in the ‘normal range’. The precise patterns differed depending on the measure, and the number and nature of the cut-offs for the various levels of severity.
Table 4.
Baseline severity, by measure.
| Measure | Baseline severity | % of episodes |
|---|---|---|
| Clinical Outcomes in Routine Evaluation (CORE-OM) (Barkham et al., 1998; Evans et al., 2002) | Non-clinical: 0 | 19.4% |
| | Clinical: ⩾ 1 | 80.6% |
| Clinical Outcomes in Routine Evaluation (CORE-10) (Barkham et al., 2013) | Non-clinical range: ⩽ 10 | 16.9% |
| | Mild: 11–14 | 17.8% |
| | Moderate: 15–19 | 24.6% |
| | Moderate to severe: 20–24 | 20.9% |
| | Severe: ⩾ 25 | 19.9% |
| Depression Anxiety and Stress Scale (DASS-21/42) – Depression (Lovibond and Lovibond, 1995a; Lovibond and Lovibond, 1995b) | Normal: ⩽ 9 | 22.5% |
| | Mild: 10–13 | 12.3% |
| | Moderate: 14–20 | 25.7% |
| | Severe: 21–27 | 15.7% |
| | Extremely severe: ⩾ 28 | 23.8% |
| Depression Anxiety and Stress Scale (DASS-21/42) – Anxiety (Lovibond and Lovibond, 1995a; Lovibond and Lovibond, 1995b) | Normal: ⩽ 7 | 26.0% |
| | Mild: 8–9 | 7.7% |
| | Moderate: 10–14 | 22.2% |
| | Severe: 15–19 | 13.9% |
| | Extremely severe: ⩾ 20 | 30.1% |
| Depression Anxiety and Stress Scale (DASS-21/42) – Stress (Lovibond and Lovibond, 1995a; Lovibond and Lovibond, 1995b) | Normal: ⩽ 14 | 26.7% |
| | Mild: 15–18 | 15.3% |
| | Moderate: 19–25 | 23.8% |
| | Severe: 26–33 | 23.3% |
| | Extremely severe: ⩾ 34 | 10.9% |
| Depression Anxiety and Stress Scale (DASS-10) (Halford and Frost, 2021) | Sub-clinical or mild: ⩽ 6 | 25.5% |
| | Moderate: 7–12 | 32.0% |
| | Severe: ⩾ 13 | 42.5% |
| Generalised Anxiety Disorder scale (GAD-7) (Spitzer et al., 2006) | No GAD: ⩽ 9 | 35.4% |
| | GAD: ⩾ 10 | 64.6% |
| Global Assessment of Functioning Scale (GAF) (Endicott et al., 1976) | Quartile 1 | 23.1% |
| | Quartile 2 | 25.0% |
| | Quartile 3 | 26.6% |
| | Quartile 4 | 23.1% |
| Kessler Psychological Distress Scale (K-10) (Kessler et al., 2002) | Low psychological distress: 10–15 | 6.0% |
| | Moderate psychological distress: 16–21 | 14.2% |
| | High psychological distress: 22–29 | 29.7% |
| | Very high psychological distress: ⩾ 30 | 50.1% |
| Patient Health Questionnaire (PHQ-9) (Kroenke et al., 2001) | No depression: ⩽ 4 | 8.2% |
| | Mild depression: 5–9 | 24.3% |
| | Moderate depression: 10–14 | 27.3% |
| | Moderately severe depression: 15–19 | 20.8% |
| | Severe depression: ⩾ 20 | 19.4% |
| Positive and Negative Affect Schedule (PANAS) (Watson et al., 1988) – Positive | Quartile 1 | 23.3% |
| | Quartile 2 | 26.5% |
| | Quartile 3 | 20.6% |
| | Quartile 4 | 29.6% |
| Positive and Negative Affect Schedule (PANAS) (Watson et al., 1988) – Negative | Quartile 1 | 25.9% |
| | Quartile 2 | 26.2% |
| | Quartile 3 | 23.3% |
| | Quartile 4 | 24.7% |
| Satisfaction With Life Scale (SWLS) (Diener et al., 1985) | Satisfied or extremely satisfied | 20.7% |
| | Slightly satisfied or slightly dissatisfied | 42.7% |
| | Dissatisfied or extremely dissatisfied | 36.7% |
Figures 1 and 2 present the findings in relation to outcomes for each measure. Figure 1 presents data for all episodes, and Figure 2 presents data for episodes stratified by baseline severity score (with the lowest level of severity presented to the left).
Figure 1.
Outcomes by measure.
Figure 2.
Outcomes by measure and baseline severity.
The picture is largely consistent across measures. In most cases, there was significant improvement in around 50–60% of episodes. There were some outliers, with higher proportions of episodes showing improvement according to the GAF and, to a lesser extent, the PANAS. Lower proportions did so when the DASS-10 was used as the assessment tool. There may be reasons for this that relate to the measures themselves, the constructs they assess (e.g. symptoms versus levels of functioning versus wellbeing), whose perspective they take (i.e. consumers’ versus providers’), and the way they were administered. There may also be differences in the way practices record data for consumers (e.g. how they account for consumers who drop out of care early). In addition, the casemix of consumers seen by different practices will have a bearing on outcomes.
Almost without exception, those with the most severe baseline scores on the given measure were the most likely to show improvement over the course of the episode. For these consumers, across most measures, there was improvement in around 60–75% of episodes. Exceptions were the GAF and the PANAS, where the percentages were higher.
Preexisting outputs
The preexisting outputs represented 2775 episodes. Figure 3 presents the key results, describing outcomes over six periods as measured by the ORS. It shows the relative effect size associated with change on the ORS over the duration of an episode for active and inactive clients. Active clients might not have yet achieved optimal outcomes because they were still in treatment. Conversely, inactive clients might have been expected to have better outcomes because many would have completed a full course of treatment (although some would have dropped out before they did so). The relative effect sizes for active clients sit at around 0.55 across all time points. The relative effect sizes for inactive clients ranged from 0.59 to 0.73.
Figure 3.
Outcomes on the ORS for active and inactive clients, by period.
Discussion
Summary and interpretation of findings
Our study tracked consumers’ progress over 86,121 episodes, assessing change via various measures of different aspects of mental health. Data on outcomes of psychological care delivered by allied health professionals in private practice are not available on this scale from any other source. Although there are examples elsewhere in the mental health sector of systems for routinely collecting outcome data – e.g. for episodes delivered through public sector specialised inpatient and community services (Australian Mental Health Outcomes and Classification Network, 2022) or commissioned by Primary Health Networks (Australian Government Department of Health, 2022) – there is no equivalent system for Better Access services. Medicare data that are collected for administrative purposes relate to activity only and not outcomes. In an ideal world, steps would be taken to implement routine outcome measurement as a quality assurance tool for Better Access.
Consumers in this study began their episodes with varying levels of severity on the different measures. Some presented with high levels of baseline severity on a given measure, while others presented with milder or moderate levels. Overall, this suggests that Better Access is not only reaching consumers with mild to moderate mental health conditions as originally intended (Australian Government Department of Health, 2021), but that it is also providing services for those with more severe mental illness. Some consumers (often 20–25%) presented in the ‘normal range’ for some of the symptom-based measures. In some cases, it may be that the consumer had, for example, low levels of anxiety or depressive symptoms but still warranted a mental health diagnosis (e.g. phobia, adjustment disorder). Relatedly, it may be that the particular measure was not capturing the consumer’s presenting issue (e.g. a general measure of anxiety being used for a person with a specific phobia). However, in other instances it may suggest issues relating to the threshold and appropriateness of referral.
It is positive that, irrespective of the measure used, consumers’ mental health improved during the majority (50–60%) of episodes. It is also positive that this improvement was related to indicators of clinical need (i.e. baseline severity); this aligns with the other studies in our evaluation that considered consumer outcomes (Arya et al., 2026; Harris et al., 2026; Pirkis et al., 2026). However, it is of concern that consumers experienced deterioration in their mental health in a considerable proportion of episodes (typically 10–20%), and showed no change in others (typically 20–30%), although this is consistent with the international literature (Cuijpers et al., 2018). These consumers were most likely to be people who began their episode with relatively mild symptoms or high levels of functioning or satisfaction with life. This may reflect the fact that those who present with relatively worse symptoms or levels of functioning have greater opportunity for improvement.
It is worth commenting on the GAF, which was the only clinician-rated measure in our suite. The GAF was associated with considerably higher proportions of consumers showing significant improvement over their episodes than other measures. This is consistent with international literature which suggests that, compared with consumers, clinicians tend to overestimate outcomes (Cuijpers et al., 2010).
Strengths and limitations
A major strength of this study is that it examined outcomes for consumers over a very large number of episodes (n = 86,121), using a variety of measures. It used real-world outcome data from private practice settings. It is rare for studies conducted in the primary mental health care context to capture outcome data on such a substantial number of episodes, and to do so in a way that provides a window into effectiveness outside of controlled trials.
Our study had some limitations, however. Episodes did not necessarily equate to people; some consumers may have had more than one episode in a given dataset, meaning that the episodes would not have been independent. We were able to investigate this in one of the datasets, and found that the mean number of episodes per consumer was ⩽1.1, indicating that the vast majority of consumers had only one episode of care.
More than one measure may have been used to assess outcomes across a single episode. We considered how to deal with this but decided that it was justifiable to include all measures for each episode, on the grounds that the different measures assessed different constructs.
We were unable to examine the full range of consumer- and provider-level factors that might have influenced outcomes. For example, we did not have information on consumers’ diagnoses, although we know from elsewhere in our evaluation that the majority of Better Access users have a diagnosis of an anxiety disorder (70%) and/or depression (72%) (Pirkis et al., 2026). Likewise, we did not have information on the type of provider who delivered care during the episode. In addition, in our purpose-designed analyses, we were unable to determine whether the episode of care was complete (i.e. whether the consumer had completed treatment). The preexisting outputs were better suited to this purpose, because we were able to determine whether consumers were still ‘active’ or not. Although some ‘inactive’ consumers may have dropped out of treatment, it is likely that the majority of them would have completed treatment. The findings from this analysis pointed in the expected direction, with outcomes being better for ‘inactive’ consumers than ‘active’ ones.
Although our use of existing data overcame any suggestion of sampling bias, it presented a different issue. Because the data were collected by providers in the course of their clinical practice, they were not always perfectly suited to the current purpose. In particular, we were only able to be certain that a given session was delivered through Better Access in one dataset. We are, however, confident that the majority of sessions in the other datasets were also delivered via Better Access, for the reasons noted above. The findings from the current study are consistent with those from two other studies in our evaluation where we were able to determine with certainty that participants had used relevant Better Access services. One of these was a survey of consumers who were selected specifically because they had seen a Better Access provider in the past year (Pirkis et al., 2026), and the other was an analysis of data from two national longitudinal studies that had been linked to MBS claims data (Arya et al., 2026). Both indicated that outcomes for Better Access consumers were generally positive.
We went to some lengths to ensure that individual providers and consumers could not be identified, but this meant that we were unable to provide details that we might otherwise have provided. For example, because the datasets varied in size and used unique measures, we were concerned that indicating the number of episodes associated with each measure could identify the dataset (and therefore potentially identify providers and consumers).
Conclusion
Our study provides evidence that Better Access is achieving reductions in symptoms and improvements in functioning and wellbeing for the majority of consumers, particularly those who seek care when they are experiencing relatively severe depression, anxiety and/or psychological distress. A minority of consumers do not have these sorts of positive outcomes, however, and although this is consistent with international literature, further work is required to understand why. Routine measurement of outcomes – particularly those from the consumer’s own perspective – would circumvent the need for one-off studies like ours. It would enable ongoing monitoring of the extent to which Better Access is achieving its goals and would allow improvements to be made to the programme as appropriate.
Acknowledgments
This study was funded by the Australian Government Department of Health, Disability and Ageing, as part of the broader evaluation of Better Access. We would like to thank the two groups that were constituted to advise on the evaluation, the Clinical Advisory Group and the Stakeholder Engagement Group.
Footnotes
The authors declared the following potential conflicts of interest with respect to the research, authorship and/or publication of this article: B.B. is the co-founder and director of NovoPsych; K.F. is the director of Kaye Frankcom Consulting, A.F. is a co-director of Benchmark Psychology and C.M. was the director of Chris Mackey and Associates.
Funding: The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The evaluation of Better Access was funded by the Australian Government Department of Health, Disability and Ageing.
ORCID iDs: Jane Pirkis
https://orcid.org/0000-0002-2538-4472
Philip Burgess
https://orcid.org/0000-0001-7184-0363
Aaron Frost
https://orcid.org/0000-0002-5304-1514
Meredith Harris
https://orcid.org/0000-0003-0096-729X
Leo Roberts
https://orcid.org/0000-0002-4486-8667
Katrina Scurrah
https://orcid.org/0000-0001-5226-7370
Matthew J Spittal
https://orcid.org/0000-0002-2841-1536
Caley Tapp
https://orcid.org/0000-0002-2731-7345
Dianne Currier
https://orcid.org/0000-0002-6614-271X
Data accessibility statement: The datasets generated and analysed for the current study are not available.
References
- Allen NB, Jackson HJ. (2011) What kind of evidence do we need for evidence-based mental health policy? The case of the Better Access initiative. The Australian and New Zealand Journal of Psychiatry 45: 696–699.
- Angst F, Aeschlimann A, Angst J. (2017) The minimal clinically important difference raised the significance of outcome effects above the statistical level, with methodological implications for future studies. Journal of Clinical Epidemiology 82: 128–136.
- Arya V, Tapp C, Currier D, et al. (2026) Examining Better Access use by Australian adults using data from two longitudinal studies (Ten to Men and the Australian Longitudinal Study on Women’s Health). Australian and New Zealand Journal of Psychiatry 60: 74–94.
- Australian Bureau of Statistics (2022) National Study of Mental Health and Wellbeing. Summary Statistics on Key Mental Health Issues Including the Prevalence of Mental Disorders and the Use of Services. Reference Period 2020-21. Available at: https://www.abs.gov.au/statistics/health/mental-health/national-study-mental-health-and-wellbeing/latest-release (accessed 27 June 2023).
- Australian Government Department of Health (2021) Better access initiative. Available at: https://www.health.gov.au/initiatives-and-programs/better-access-initiative#about-the-better-access-initiative (accessed 26 June 2021).
- Australian Government Department of Health (2022) Primary Mental Health Care Minimum Dataset. Available at: https://pmhc-mds.net/#/ (accessed 4 December 2022).
- Australian Government Department of Health (2022) Unpublished data.
- Australian Institute of Health and Welfare (2022) Mental Health Services in Australia. Available at: https://www.aihw.gov.au/reports/mental-health-services/mental-health-services-in-australia/report-contents/expenditure-on-mental-health-related-services/data-source-and-key-concepts (accessed 8 September 2022).
- Australian Institute of Health and Welfare (2023) Mental Health Services in Australia: Medicare-subsidised Mental Health-specific Services [Table MBS.1: People Receiving Medicare-subsidised Mental Health-specific Services, by Provider Type, Item Group of Service, States and Territories, 2020–21]. Available at: https://www.aihw.gov.au/mental-health/resources/archived-content (accessed 30 July 2023).
- Australian Mental Health Outcomes and Classification Network (2022) Update of NOCC Data. Available at: https://www.amhocn.org/ (accessed 4 December 2022).
- Australian Prudential Regulation Authority (2022) Operations of Private Health Insurers Annual Report. Available at: https://www.apra.gov.au/operations-of-private-health-insurers-annual-report (accessed 8 May 2022).
- Barkham M, Bewick B, Mullin T, et al. (2013) The CORE-10: A short measure of psychological distress for routine use in the psychological therapies. Counselling and Psychotherapy Research 13: 3–13.
- Barkham M, Evans C, Margison F, et al. (1998) The rationale for developing and implementing core outcome batteries for routine use in service settings and psychotherapy outcome research. Journal of Mental Health 7: 35–47.
- Chilver M, Harris M, Pirkis J, et al. (2026) Accessibility and responsiveness of Better Access treatment services: Insights from the use of linked administrative data in the evaluation of Better Access. Australian and New Zealand Journal of Psychiatry 60: 25–34.
- Cuijpers P, Li J, Hofmann SG, et al. (2010) Self-reported versus clinician-rated symptoms of depression as outcome measures in psychotherapy research on depression: A meta-analysis. Clinical Psychology Review 30: 768–778.
- Cuijpers P, Reijnders M, Karyotaki E, et al. (2018) Negative effects of psychotherapies for adult depression: A meta-analysis of deterioration rates. Journal of Affective Disorders 239: 138–145.
- Currier D, Williamson M, Newton D, et al. (2026) A virtual consultative forum on future reforms to Better Access. Australian and New Zealand Journal of Psychiatry 60: 115–127.
- Diener E, Emmons R, Larsen R, et al. (1985) The satisfaction with life scale. Journal of Personality Assessment 49: 71–75.
- Endicott J, Spitzer RL, Fleiss JL, et al. (1976) The global assessment scale: A procedure for measuring overall severity of psychiatric disturbance. Archives of General Psychiatry 33: 766–771.
- Evans C, Connell J, Barkham M, et al. (2002) Towards a standardised brief outcome measure: Psychometric properties and utility of the CORE-OM. The British Journal of Psychiatry: the Journal of Mental Science 180: 51–60.
- Halford WK, Frost ADJ. (2021) Depression anxiety stress scale-10: A brief measure for routine psychotherapy outcome and progress assessment. Behaviour Change 38: 221–234.
- Harris M, Tapp C, Le LK-D, et al. (2026) Who uses Better Access treatment services? A re-analysis of data from the usual care arms of two randomised controlled trials. Australian and New Zealand Journal of Psychiatry 60: 61–73.
- Hickie IB, Rosenberg S, Davenport TA. (2011) Australia's Better Access initiative: Still awaiting serious evaluation. The Australian and New Zealand Journal of Psychiatry 45: 814–823.
- Kessler RC, Andrews G, Colpe LJ, et al. (2002) Short screening scales to monitor population prevalences and trends in non-specific psychological distress. Psychological Medicine 32: 959–976.
- Kounali D, Button K, Lewis G, et al. (2020) How much change is enough? Evidence from a longitudinal study on depression in UK primary care. Psychological Medicine 52: 1875–1882.
- Kroenke K, Baye F, Lourens SG. (2019) Comparative responsiveness and minimally important difference of common anxiety measures. Medical Care 57: 890–897.
- Kroenke K, Spitzer RL, Williams JB. (2001) The PHQ-9: Validity of a brief depression severity measure. Journal of General Internal Medicine 16: 606–613.
- Lovibond PF, Lovibond SH. (1995a) The structure of negative emotional states: Comparison of the depression anxiety stress scales with the Beck depression and anxiety inventories. Behaviour Research and Therapy 33: 335–343.
- Lovibond S, Lovibond P. (1995b) Manual for the depression anxiety stress scales. Sydney, NSW, Australia: Psychology Foundation.
- Miller S, Duncan B, Brown J, et al. (2003) The outcome rating scale: A preliminary study of the reliability, validity, and feasibility of a brief visual analogue measure. Journal of Brief Therapy 2: 91–100.
- Newton D, Williamson M, Pirkis J, et al. (2026) Perspectives on Better Access: In-depth interviews with users and non-users of the initiative. Australian and New Zealand Journal of Psychiatry 60: 95–102.
- Pirkis J, Currier D, Harris M, et al. (2022) Evaluation of Better Access: Main Report. Melbourne: The University of Melbourne.
- Pirkis J, Ftanou M, Williamson M, et al. (2011a) An evaluation of Australia’s Better Access program. Australian and New Zealand Journal of Psychiatry 45: 726–739.
- Pirkis J, Harris M, Arya V, et al. (2026) Consumers’ experiences with and outcomes from Better Access: Results from a national survey. Australian and New Zealand Journal of Psychiatry 60: 49–60.
- Pirkis J, Harris M, Hall W, et al. (2011b) Evaluation of the Better Access to Psychiatrists, Psychologists and GPs through the Medicare Benefits Schedule initiative: Summative evaluation. Melbourne: The University of Melbourne.
- Schenker N, Gentleman J. (2001) On judging the significance of differences by examining the overlap between confidence intervals. The American Statistician 55: 182–186.
- Spitzer R, Kroenke K, Williams J, et al. (2006) A brief measure for assessing generalized anxiety disorder: The GAD-7. Archives of Internal Medicine 166: 1092–1097.
- Tapp C, Harris M, Currier D, et al. (2026a) Australia’s Better Access initiative: A survey of provider and referrer views. Australian and New Zealand Journal of Psychiatry 60: 103–114.
- Tapp C, Scheurer R, Burgess P, et al. (2026b) Uptake, utilisation, and costs of treatment through Better Access from 2018 to 2022: An analysis of Medicare Benefits Schedule data. Australian and New Zealand Journal of Psychiatry 60: 11–24.
- Watson D, Clark LA, Tellegen A. (1988) Development and validation of brief measures of positive and negative affect: The PANAS scales. Journal of Personality and Social Psychology 54: 1063–1070.