Table A Information categories and frequencies for personal statements and references

Table B Percentage of personal statement and references categories falling within six main themes

Table C Zero order bivariate correlation matrix used in LISREL analyses in 87 students

Coding and quantification of personal statements and reference
 

Table A Information categories and frequencies for personal statements and references
 

| Personal statement category | FQ (%) | Reference category | FQ |
| --- | --- | --- | --- |
| Medical voluntary work | 84 | Highly intelligent and very able | 95 |
| Plays sport | 78 | Motivated and dedicated | 79 |
| Society member | 76 | Interpersonal skills | 78 |
| Hobbies—for relaxation (for example, stamp collecting) | 72 | Contributes to school life | 72 |
| School responsibilities (for example, prefect) | 60 | Liked by peers and staff | 60 |
| Plays musical instrument—for personal relaxation and not as part of choir or orchestra | 56 | Necessary personal and academic qualities to succeed | 50 |
| See medicine as challenge | 54 | Good written and oral work | 47 |
| Non-medical voluntary work | 53 | Good analytic skills | 45 |
| Head girl or boy | 40 | Mature | 43 |
| Likes science | 40 | Organised | 33 |
| Choir or orchestra—plays in choir or orchestra and not just for personal pleasure | 36 | Reliable | 32 |
| Attended medical conference | 35 | Contributes to class discussions | 25 |
| Likes travelling | 35 | Good health and punctuality | 25 |
| Completed D of E award | 33 | Work well both in teams and individually | 23 |
| Religious | 23 | Leadership skills | 17 |
| Life time ambition to do medicine | 23 | Good sense of humour | 14 |
| Good communication skills | 23 | Negative comments—emotionally unstable | 14 |
| Interest in human body | 20 | Good all rounder | 12 |
| Member of youth group | 17 | Good family background | 10 |
| Family ties to medicine | 16 | Family ties to medicine | 7 |
| Altruism | 14 | | |
| Speaks second language | 12 | | |
| Likes teamwork | 9 | | |
| Reads scientific journals | 9 | | |
| Wants to work abroad | 5 | | |
| Family illness was inspiration | 5 | | |

Note. Columns 1 and 2 for the personal statement are adapted from Ferguson E, Sanders A, O’Hehir F, James D. Predictive validity of personal statement and the role of the five factor model of personality in relation to medical training. J Occ Org Psych 2000;73:321-44.
 
 

Table B Percentage of personal statement and references categories falling within six main themes
 

 
| Theme | Personal statement: mean (SD) | Personal statement: range | Reference: mean (SD) | Reference: range |
| --- | --- | --- | --- | --- |
| Academic knowledge | 7.3 (3.3) | 3.8-11.5 | 13.7 (4.7) | 10-20 |
| Study skills | 2.0 (2.3) | 0-4.0 | 11.2 (7.5) | 5.0-20 |
| Hobbies | 31.2 (9.2) | 21-43 | 0 | 0-0 |
| Social skills | 11.8 (8.1) | 7.6-24.0 | 23.7 (6.3) | 15-30 |
| Motivation to do medicine | 30.6 (3.2) | 26.9-34.6 | 10.0 (4.1) | 5-15 |
| Good character | 16.9 (4.5) | 11.59-22 | 41.2 (2.5) | 40-45 |

 
 

Table C Zero order bivariate correlation matrix used in LISREL analyses in 87 students
 

 
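| | A level | Personal statement | Conscientiousness | Preclinical | BMedSci | Clinical |
| --- | --- | --- | --- | --- | --- | --- |
| A level points score | 1 | | | | | |
| Amount of information in personal statement | 0.09 | 1 | | | | |
| Conscientiousness† | 0.23* | 0.09 | 1 | | | |
| Preclinical‡ | 0.30** | 0.15 | 0.55*** | 1 | | |
| BMedSci§ | 0.23* | 0.07 | 0.53*** | 0.88*** | 1 | |
| Clinical¶ | 0.26** | 0.20* | 0.24* | 0.65*** | 0.67*** | 1 |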

*P<0.05, **P<0.01, ***P<0.001.

†From Goldberg’s bipolar markers.

‡Total marks from assessments in preclinical years.

§Total marks for BMedSci year.

¶Total marks for clinical years.
 
 
 

Coding and quantification of personal statements and reference

Rationale

The rationale behind the coding of the personal statements and references was to identify the sorts of information candidates and their referees choose to write in support of their application to medical school. This is not just a simple count of words written but an attempt to identify the informational content and then quantify this for amount of information.

Thus we used a manifest coding strategy, which involved the identification of key words or phrases.w1 w2 The coding should therefore pick up individual aspects of information and not just word length, as key themes and ideas can be expressed in a few words. Indeed, research has shown that essays containing more themes central to the essay topic are significantly more likely to get a higher grade (r=0.58, P<0.001), whereas word length was not reported as significantly related to essay grade.w3 The author of that article also did not report any relation between word length and number of themes. Identifying themes or categories of information is therefore not the same as merely recording word length.w3

Procedure

The procedure used was the same for both the personal statements and the references and involved three steps. Firstly, we used manifest coding to develop the initial coding frames for the personal statements and references. Secondly, we conducted a study using four raters to explore the content validity of the personal statement and reference categories. Thirdly, we explored the statistical independence of the informational categories.

Results

Development and reliability of informational categories

The free text of both the personal statements and the references was read through by AS, and an initial categorisation scheme was developed. Using this framework the same researcher then read through all the personal statements and references again, coding each for the derived categories. A second independent rater (another postgraduate student), blind to the ratings provided by the first rater, used the coding scheme to code the information in each of the personal statements and references (interrater agreement 86%; all differences were resolved by discussion to consensus). Through this process, 26 information categories were identified and extracted from the personal statements and 20 information categories were identified and extracted from the references. These categories and their frequencies are reported in table A.
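The appendix does not say how the 86% interrater agreement was calculated. As a minimal sketch, assuming it is a simple percentage of matching present/absent decisions, the calculation could look like this (the function name and the example codes are hypothetical):

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of coding decisions (1 = present, 0 = absent) on which two raters agree."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


# Hypothetical codes from two raters for one document across 10 categories.
first_rater = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
second_rater = [1, 0, 1, 0, 0, 0, 1, 0, 1, 0]

agreement = percent_agreement(first_rater, second_rater)
print(f"{agreement:.0f}% agreement")
```

Simple percentage agreement does not correct for chance agreement; a chance-corrected index would need a different calculation.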

Content validity of the informational categories

Four experienced academic and research staff (one with a PhD, three with master's degrees completing PhDs; total research experience, 22 years) were provided with the information categories for the personal statements and references as presented in table A. They were then given the six general themes listed in table B: academic knowledge, study skills, hobbies, social skills, motivation to do medicine, and good character.

They were then asked to indicate, for the personal statement and the reference separately, the percentage of the categories in each that reflected these six general themes, such that the total across the six themes was 100% for each. The mean percentages are presented in table B.

As table B shows, most of the personal statement categories cover motivation (for example, medical voluntary work) and hobbies (for example, plays sport), whereas the reference categories mainly cover character (for example, mature) and social skills (for example, interpersonal skills).

Statistical independence of informational categories

To show that these categories were statistically independent, the Kaiser-Meyer-Olkin test of sampling adequacy was applied to the correlation matrices for the personal statement codes and the reference codes. The Kaiser-Meyer-Olkin test assesses whether there is a significant degree of covariation within a matrix. In this context a high score would indicate that, in the personal statements, people who mentioned one type of information were systematically more likely to mention another type. Similarly for the references, it would indicate that teachers mentioning one type of information were systematically more likely to mention other types. Kaiser-Meyer-Olkin scores vary from 0 to 1 and are calibrated into categories as follows:

The Kaiser-Meyer-Olkin scores for the reference categories and personal statement categories were 0.46 and 0.51, respectively. These values indicate that the categories identified for the personal statements and the references were statistically independent.
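For readers unfamiliar with the statistic, the following is a rough sketch of how an overall Kaiser-Meyer-Olkin score can be computed from a correlation matrix: it compares squared correlations with squared partial correlations, the latter taken from the inverse of the matrix. The example matrix is hypothetical, not the study data, and the band labels are those commonly attributed to Kaiser, included here as an assumption because the appendix's own calibration list is not reproduced above.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def kmo(corr):
    """Overall KMO: sum of r^2 over (sum of r^2 + sum of partial r^2), off-diagonal only."""
    n = len(corr)
    inv = invert(corr)
    sum_r2 = 0.0
    sum_p2 = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                partial = -inv[i][j] / (inv[i][i] * inv[j][j]) ** 0.5
                sum_r2 += corr[i][j] ** 2
                sum_p2 += partial ** 2
    if sum_r2 + sum_p2 == 0.0:
        raise ValueError("KMO is undefined when all off-diagonal correlations are zero")
    return sum_r2 / (sum_r2 + sum_p2)


def kaiser_label(value):
    """Band labels commonly attributed to Kaiser (an assumption, not taken from the appendix)."""
    bands = [(0.9, "marvellous"), (0.8, "meritorious"), (0.7, "middling"),
             (0.6, "mediocre"), (0.5, "miserable")]
    for lower, label in bands:
        if value >= lower:
            return label
    return "unacceptable"


# Hypothetical 3 x 3 correlation matrix with every pair correlated at 0.5.
example = [[1.0, 0.5, 0.5],
           [0.5, 1.0, 0.5],
           [0.5, 0.5, 1.0]]
example_kmo = kmo(example)
print(round(example_kmo, 3), kaiser_label(example_kmo))
```

Under these assumed bands, the reported scores of 0.46 and 0.51 fall at or near the bottom of the scale, which is the pattern the text interprets as statistical independence of the categories.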

Scoring

For each candidate the number of the 26 personal statement categories and the 20 reference categories contained in the UCAS form was recorded. Categories were coded as 1 for present and 0 for absent. These codes were then summed to produce two scores reflecting the amount of information in the personal statement (mean score 9.3 (SD 2.3) items, range 2-20) and the reference (mean score 7.8 (SD 2.1) items, range 2-13), respectively. No weighting was applied to these scores: the presence of each category was scored with a single unit score.
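The unit-weight scoring described above can be sketched as follows. The coding frame shown is only a small subset of the table A categories, and the candidate data and function name are hypothetical.

```python
def information_score(present_categories, coding_frame):
    """Sum of 1/0 presence codes: each category in the frame counts once if present."""
    return sum(1 if category in present_categories else 0 for category in coding_frame)


# Subset of the personal statement categories from table A (the full frame has 26).
coding_frame = [
    "Medical voluntary work",
    "Plays sport",
    "Society member",
    "Likes science",
]

# Hypothetical candidate whose statement mentions two of the four categories.
candidate = {"Plays sport", "Likes science"}

score = information_score(candidate, coding_frame)
print(score)
```

Because every category carries the same weight, the score reflects how many distinct kinds of information appear, not how much is written about each.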

Structural modelling

Scores pertaining to medical school performance are correlated; we therefore applied structural equation modelling to these data using LISREL 8.w4 Structural equation modelling allows the researcher to explore more complex patterns in the data. For example, there may be significant associations between A levels and both preclinical performance and clinical performance, as well as significant associations between preclinical and clinical performance. However, it may be that scores on A levels do not have a direct influence on clinical performance; rather, the effect is indirect via preclinical performance. Structural equation modelling allows such hypotheses to be tested, by exploring how well a theoretically specified model explains the pattern of intercorrelations in a set of variables. Structural equation modelling also provides a series of fit statistics, which quantify how well the theoretically specified model fits the data. Based on recent recommendations, the following fit statistics are reported: χ², the comparative fit index, and the root mean square error of approximation.w5 For a good fitting model the χ² should be non-significant, the comparative fit index should be >0.95 (potential range 0-1), and the root mean square error of approximation should be <0.06.w5
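The idea of an indirect effect can be illustrated with a first order partial correlation. This is not the structural equation model fitted in LISREL, only a simplified sketch using three of the zero order correlations from table C; the function and variable names are assumptions for illustration.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y after removing the linear effect of z from both."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Zero order correlations taken from table C.
r_alevel_clinical = 0.26
r_alevel_preclinical = 0.30
r_preclinical_clinical = 0.65

adjusted = partial_correlation(
    r_alevel_clinical, r_alevel_preclinical, r_preclinical_clinical
)
print(f"A level with clinical, controlling for preclinical: {adjusted:.2f}")
```

In this sketch the association between A level points and clinical marks shrinks from 0.26 to about 0.09 once preclinical marks are held constant, which is the kind of pattern an indirect path via preclinical performance would produce.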

Based on the results of the correlational and hierarchical multiple linear regression analyses (presented in the main paper), A level scores, the amount of information in the personal statement, conscientiousness, and marks from the preclinical, BMedSci, and clinical assessments were used to construct a structural model. A levels and conscientiousness were included as they were related to all the averaged assessments (preclinical, BMedSci, and clinical) across the medical school. The quantity of information in the personal statement was included as no studies have examined the personal statement in detail over the course of medical training, and the above results show that it is predictive of clinical training.

The rationale for the model used is as follows. The time line from A levels, to preclinical scores, to BMedSci scores, to clinical scores, and from preclinical to clinical scores, was specified as the basic backbone of the model. Paths were then specified from conscientiousness to A level scores, preclinical scores, BMedSci scores, and clinical scores. Finally, a path was specified from the amount of information contained in the personal statement to levels of clinical knowledge. This model was an acceptable fit to the data (χ²=5.82, df=6, P=0.44; comparative fit index=1.0; root mean square error of approximation=0.0 (90% confidence interval 0.00 to 0.14); n=87). Table C presents the correlation matrix on which the structural model is based.
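The reported fit statistics can be checked against the stated criteria with a short calculation. This is a sketch under two assumptions: the χ² tail probability uses a closed form that is valid only for even degrees of freedom, and the root mean square error of approximation uses the common point estimate based on n-1 (some software divides by n instead). The function names are hypothetical and the output is not LISREL's.

```python
import math


def chi2_upper_tail_even_df(chi2, df):
    """P value for a chi-square statistic; closed form valid for positive even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = chi2 / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


def rmsea_point_estimate(chi2, df, n):
    """Common RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# Values reported for the structural model.
chi2, df, n, cfi = 5.82, 6, 87, 1.0

p_value = chi2_upper_tail_even_df(chi2, df)
rmsea_value = rmsea_point_estimate(chi2, df, n)

# Criteria stated in the text: non-significant chi-square, CFI > 0.95, RMSEA < 0.06.
good_fit = p_value > 0.05 and cfi > 0.95 and rmsea_value < 0.06
print(f"P={p_value:.2f}, RMSEA={rmsea_value:.2f}, good fit: {good_fit}")
```

Because χ² is smaller than its degrees of freedom, the point estimate is truncated at zero, consistent with the value of 0.0 reported above.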

w1 Mahalski PE. Essay writing: do study manuals give relevant advice? Higher Educ 1982;24:113-32.

w2 Dane FC. Research methods. Pacific Grove, CA: Brooks Cole, 1990.

w3 Krippendorff K. Content analysis: an introduction to its methodology. London: Sage, 1980.

w4 Jöreskog KG, Sörbom D, du Toit S, du Toit M. Interactive LISREL 8: user’s guide. Chicago, IL: Scientific Software, 2001.

w5 Hu L, Bentler PM. Cut-off criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Struct Equation Modeling 1999;6:1-55.