Author manuscript; available in PMC: 2008 Feb 6.
Published in final edited form as: Stammering Res. 2005 Jan 1;1(4):333–343.

Elements of statistical treatment of speech and hearing science data

Adrian Davis 1, Peter Howell 2
PMCID: PMC2231513  EMSID: UKMS1099  PMID: 18259586

Abstract

Many of the statistical issues involved in speech and hearing research are shared with other areas of medicine. This article is the first in a series intended to stimulate examination of research data in speech and hearing areas using a wide variety of techniques. This article specifically deals with two essential, but elementary, issues. The first is concerned with experimental design and choice of test data. The second defines and explains statistical terms, concentrating particularly on the inference to the population mean from the sample mean.

Keywords: Statistics, experimental design, choice of data, inferential statistics

1. Background

This article is the first in a series that will discuss statistical treatment of data from studies into communication and its disorders. Application areas include those where data from people with speaking or hearing disorders are available and researchers want to visualize or quantify how the performance of these individuals relates to that of fluent populations. The range of topics that could be covered is massive, starting with simple ones like how to display and make summary statements about data, through to highly detailed technical ones like the implementation and assessment of causal models of the problems. This article is at the elementary end (the appropriate place to start) and introduces a) issues concerned with experimental design and choice of test data, and b) definitions of statistical terms involved in inference to the population mean from the sample mean. We start by outlining our motivation in initiating this series of articles and then emphasize how this article is different from a textbook on the subject.

Our experience in teaching and supervising projects with some trainee professionals who will deliver speech and hearing therapy to clients has shown that difficulties are experienced with statistics. It is our belief that part of the reason for this is that the texts and approaches often appear to be remote from the concerns of the students. In particular, texts for teaching students are often from other application areas about the study of behaviour (mainly psychology) that, though closely allied to speech and hearing science, are not directly applicable. When we have used examples from the speech and hearing sciences in our teaching, we have found that students seem able to access the material more readily. Even though we have limited ourselves to elementary statistics, the treatment is not comprehensive. There are two major omissions from these ‘elements’. The first concerns depiction of data. The second is statistical treatment when only a single case is available. It is hoped that both these omissions will be the subject of a future article.

The article is not a précis of a statistics course. There are many good texts and other resources around (see the bibliography). These books mostly contain detailed information about how to describe data (only touched on here) and about hypothesis testing, and cover the logic and mathematics behind different tests. The article should be used as a gateway to texts that give details of specific tests and procedures (though a couple of tests are considered here to convey some important details). We do not mean to imply that all available textbooks lack such an overview. However, as textbooks are more extensive than this brief article, there is a tendency for this information to be distributed widely, making it hard for students to relate the pieces to one another. We envisage students reading this article and, once they have understood the material, being able to use it to access the appropriate information in other texts.

2. Experimental design and choice of data

When talking about procedural considerations in statistical analysis, it may help to make things concrete from the start by using an example drawn from hearing science. This is intended to introduce some concepts which are needed both for a researcher to evaluate studies and to allow an experimenter to do their own statistics. Let us assume that a company has developed a hearing aid (Aid A). It is to be employed in a country where all the inhabitants who have a hearing impairment might want to use it. The company wants some idea about how its performance compares with another aid on the market (Aid X). The company calls on a hearing clinic to evaluate the device and the clinic decides to assess it using read texts that can be carefully controlled, even though the aid will eventually have to operate with spontaneous speech to be useful for clients. Some of the questions the assessment team commissioned to do the work may decide to address are:

  1. How can they check whether there are differences between spontaneous and read speech, and then decide whether the results with read speech apply to spontaneous speech?

  2. If they find differences between read and spontaneous speech that require them to use the latter, how can they check whether language statistics on a sample of recordings are representative of the language as a whole?

  3. How are appropriate test data chosen?

These points highlight some of the statistical analysis issues that feature in everyday clinical decisions. Similar questions would arise for a speech therapist when considering whether some new treatment they have learned about produces improvement in speech control so they can decide whether it is worth changing from the current procedure they deliver. In each case, the speech or hearing professional may or may not conduct the research themselves. When they rely on someone else’s published report about the speech or hearing procedure, they need to know whether the analysis was conducted properly and how to interpret the results. Moreover, the specific questions raised, though pertaining to a particular issue of concern, are illustrative of many similar problems that clinicians encounter. Now we will set about attempting to provide answers to these (and other) questions.

Statistical and experimental procedures for analyzing data

In the first part of this section, some fundamental ideas in statistics will be illustrated through selected examples drawn from the speech and hearing sciences.

Statistics is the acquisition, handling, presentation and interpretation of numerical data. Speech and hearing scientists have considerable experience acquiring, handling, presenting and interpreting communication data.

Populations, samples and other terminology

A population is usually defined as the collection of units to which inference (from the sample) is desired (the units may be people, phonemes etc.). In the earlier example, all hearing impaired individuals in the country are the population. Here everyday use of the term ‘population’ corresponds with its use in a statistical sense. Though population in a statistical sense can have the same meaning as the geographical sense, it need not be the case. Thus, for instance the population of users of a hearing aid specifically developed for presbyacusis would only comprise individuals with this disorder. Population does not only refer to humans - for example, the population of /p/ phonemes of a speaker would be all of the instances of that phoneme a speaker ever produces.

A variable ranges over numerical values associated with each unit of the population. Variables are classed as either independent or dependent. An independent (or, as some statisticians prefer, explanatory) variable is one that is controlled or manipulated by the researcher. So, for example, when setting up a test for a hearing aid or some treatment for a speech problem, the experimenter might consider it necessary to ensure that as many females are recorded in the test data as males. Sex would then be an independent variable (independent variables are also referred to as factors, particularly in connection with the statistical technique Analysis of Variance, ANOVA, discussed in a later section). A dependent variable is a variable that the investigator measures to determine the effect of the independent variable.

When a variable is measured on all units of a population, a full census has been taken. If it were always possible to obtain census data, there would be no need for inferential statistics. However, since most speech and hearing applications (and, indeed, many other areas that require measurement) involve very large or infinite populations (such as those of speakers or phonemes illustrated earlier), it is not possible to measure variables on all units. In these circumstances, a finite sample is taken. This sample is used to study the variable of concern in the population. So, if you wanted an idea of the average voice fundamental frequency of men, you might make measurements on a sample of 100 men. This sample is then studied as if it is representative of the population. The statistician is able to provide information about the relationship between variables measured on the sample (here its mean) and what the investigator is really interested in, the mean voice fundamental frequency of the population.

Sampling

The main problem in treating data statistically is how to ensure the reliability of information about the population obtained from a sample. The main requirement to achieve this is to take a simple random sample. A sample is simple random if every member of the population has the same chance of being selected as every other member. Thus, if some voice recognition equipment is needed for a research application, and a check is made on its performance using employees from the clinic, the sample would not be simple random: It is unlikely that the employees from the clinic are from all social strata, there may be gender imbalances, and they would only include people of working-age.
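As a minimal sketch of the idea (the population register and its size are hypothetical), a simple random sample can be drawn with Python's standard library, which samples without replacement so that every unit has the same chance of selection:

```python
import random

# Hypothetical population register: 10,000 client identifiers.
population_ids = list(range(1, 10001))

random.seed(42)  # fixed seed so the sketch is reproducible
sample_ids = random.sample(population_ids, k=100)  # simple random sample

print(len(sample_ids))              # 100 units drawn
print(len(set(sample_ids)) == 100)  # True: no unit selected twice
```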

Biases

Selection of a sample that is not a simple random sample is one of the main sources of bias in assessments. Bias can be defined as a systematic tendency to misrepresent the population. So, if the speech recognizer mentioned in the previous section is intended to be used by all members of the population, you cannot select an unbiased sample of speakers from a sample of people recorded just between 9 a.m. and 5 p.m. This would exclude people who are at work; so, if you do this, the result is a biased sample which is not necessarily representative of the target population.

If you take a sample, how sure can you be that a variable measured on it, such as the sample mean, lies close to the corresponding value in the population? This sort of problem is termed estimation and is considered in the following.

Estimating sample means, proportions and variances

Estimation is used for making decisions about populations based on simple random samples. A truly random sample is likely to be representative of the population; this does not mean that a variable measured on a second sample will have the same value as on the first. The skill involved in estimating the value of a variable is to impose conditions which allow an acceptable degree of error in the estimate without being so conservative as to be useless in practice (an extreme case of the latter would be recommending a sample of the same order of magnitude as the population). The necessary background skill is to understand how quantities like sample means, proportions and variances are related to means, proportions and variances in the population. The following notation is used in the discussion: M is the sample mean, S is the sample standard deviation, and S² is the sample variance; μ is the population mean, σ is the population standard deviation, and σ² is the population variance. The abbreviations sd and S.D. are sometimes used for standard deviation; S.E. is used for standard error, Z is used for z-scores, p̂ ('p hat') is used for the estimated proportion, and p for the population proportion.

Estimating means

A fundamental step towards this goal is to relate the sample statistic to a probability distribution. What this means is: if we repeatedly take samples from a population, how do the variables measured on the samples relate to those of the population? To translate this to an empirical example: How sure can you be about how close your sample mean lies to the population mean? Even more concretely, if we obtain the means of a set of samples, how does the mean of a particular sample relate to the mean of the population? As has already been said, the mean of the first of two samples is unlikely to be exactly the same as that of the second. However, if repeated samples are taken, their mean values will cluster around the population mean; for this reason the sample mean is usually regarded as an unbiased estimator of the population mean.

The usual way this is shown in textbooks is to take a known distribution (i.e., where the population mean is known) and then consider what the distribution would be like when samples of a given size are taken. So, if a population of events has equally likely outcomes and the variable values are 1, 2, 3 and 4, the mean would be 2.5. If all possible pairs are taken (1 and 2, 1 and 3, 1 and 4, 2 and 3, 2 and 4, 3 and 4), the mean of the mean values for all pairs is also 2.5 (taking all pairs is a way of ensuring that the sample is simple random). An additional important finding is that if the distribution of sample means (the sampling distribution) is plotted as a histogram, the distribution is no longer rectangular (rectangular because each option was assumed to occur with the same frequency) but has a peak at 2.5 (1 and 4, and 2 and 3, both have a mean of 2.5, and no other pair has a mean with that value; thus, the peak at 2.5). Moreover, the distribution is symmetrical about the mean and approximates more closely to a normal (Gaussian) distribution even though the original distribution was not normal. As sample size gets larger, the approximation to the normal distribution gets better. Moreover, this tendency applies to all distributions, not just the rectangular distribution considered. The tendency of the distribution of sample means to approach normality as sample size grows is, in fact, a case of the Central Limit Theorem.
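The rectangular-distribution example above can be checked directly by enumerating all samples of size two:

```python
import itertools
from statistics import mean

values = [1, 2, 3, 4]        # equally likely outcomes; population mean 2.5

# All possible pairs, as in the text (simple random samples of size two).
pair_means = [mean(p) for p in itertools.combinations(values, 2)]

print(sorted(pair_means))    # [1.5, 2.0, 2.5, 2.5, 3.0, 3.5] - peak at 2.5
print(mean(pair_means))      # 2.5, the population mean
```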

This particular result has far-reaching implications when testing between alternative hypotheses (see below). As a rule of thumb, sample sizes of 30 or greater are adequate for the sampling distribution of the mean to approximate a normal distribution (though this depends on the nature of the parent distribution).

The statistical quantity standard deviation (S) is a measure of how a set of observations $x_1, \ldots, x_n$ (where n is the number of observations) scatter about their mean $\bar{x}$. It is defined numerically as:

$$S = \sqrt{\frac{(x_1-\bar{x})^2 + (x_2-\bar{x})^2 + \cdots + (x_n-\bar{x})^2}{n-1}} = \sqrt{\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}}$$

Later the related quantity of the variance will be needed. This is simply the sd squared:

$$S^2 = \frac{(x_1-\bar{x})^2 + (x_2-\bar{x})^2 + \cdots + (x_n-\bar{x})^2}{n-1} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}$$
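Both formulas can be verified numerically; the durations below are hypothetical, and Python's `statistics.stdev` and `statistics.variance` use the same n − 1 divisor:

```python
from statistics import mean, stdev, variance

x = [38.0, 41.5, 40.2, 36.4, 43.9]   # hypothetical vowel durations (ms)
n = len(x)
xbar = mean(x)

# Definitional computation with the n - 1 divisor, as in the formulas above.
s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
s = s2 ** 0.5

print(abs(s2 - variance(x)) < 1e-9)  # True: library agrees on the variance
print(abs(s - stdev(x)) < 1e-9)      # True: and on the standard deviation
```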

An important aspect of the situation described is that the sample means themselves (rather than the observations) have a standard deviation (sd). The sd of the sample means (here the sd of all samples of size two for the rectangular distribution) is related to the sd of the samples in the original distribution by the formula:

$$S.E. = \frac{\sigma}{\sqrt{n}}$$

This quantity is given a particular name to distinguish it from the sd - it is called the standard error (S.E.). In practice, the standard deviation of the population is often not known. In these circumstances, provided the sample is sufficiently large, the standard deviation of the sample can be used to approximate that of the population and the above formula used to calculate the S.E. The S.E. is used in the computation of another quantity, the z score of the sample mean:

$$Z = \frac{\bar{x} - \mu}{S.E.}$$

The importance of this quantity is that the measure can be translated into a probabilistic statement relating the sample and population means. Put another way, from the z score, the probability of a sample mean being so far from the population mean can be computed.

To show how this is used in practice: if a sample of size 200 is taken, what is the probability that the mean is within 1.5 S.E.s of the population mean? Normal distribution tables give the desired area. A section of such a table gives the proportion of the area of a normal distribution associated with given values of z; in the figure, the stippled section indicates the area that is tabulated, the vertical line at the peak marks the mean, and the line to its right is 1.5 S.E.s away.

[Figure 1: standard normal distribution with the area between the mean and 1.5 S.E.s above it stippled.]

The sketch of the normal distribution is symmetrical and the symmetry is about the mean value (i.e., the peak of the distribution). The z values above the mean are tabulated, and the row with a z value of 1.5 indicates that 0.4332 of the area on the right half of the distribution lies within 1.5 S.E.s above the mean. Since the distribution is symmetrical, 0.4332 of the area will also lie within 1.5 S.E.s below the mean. Thus, the area within 1.5 S.E.s above or below the mean is 0.4332 + 0.4332, or 0.8664. Converted to percentages, approximately 86.6% of all samples of size 200 will have means within 1.5 S.E.s of the population mean. If, as in any real experiment, one sample is taken, we can state how likely it is that its mean lies within the specified distance of the population mean.
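The table lookup can be reproduced with the standard normal distribution in Python's standard library:

```python
from statistics import NormalDist

z = NormalDist()                   # standard normal: mean 0, sd 1

# Area between the mean and 1.5 S.E.s above it (the tabulated value).
print(round(z.cdf(1.5) - 0.5, 4))  # 0.4332

# Area within 1.5 S.E.s either side of the mean.
area = z.cdf(1.5) - z.cdf(-1.5)
print(round(area, 4))              # 0.8664
```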

Another, related, use of S.E.s is in stipulating confidence intervals. If you look at the areas associated with particular z values in the way just described, you should be able to ascertain that the area of a normal distribution enclosed within ±1.96 S.E.s of the mean is 0.95. Thus, if the S.E. and mean M of a sample are known, you can specify a measurement interval that indicates the degree of confidence (here 95%) that the population mean will be within these bounds. This interval runs from 1.96 times the S.E. below the sample mean to 1.96 times the S.E. above the sample mean, and is called the 95% confidence interval. Other levels of confidence can be adopted by obtaining the corresponding z values.

Since this topic is so important, an example is given. Say a random sample of 64 male university students has a mean voice fundamental frequency of 98 Hz and a standard deviation of 32 Hz. What is the 95% confidence interval for the mean voice fundamental frequency of the male students at this university? The maximum error of the estimate is approximated (using the sample standard deviation S rather than that of the population σ, see above) as:

$$1.96\frac{S}{\sqrt{n}} = 1.96\left(\frac{32}{\sqrt{64}}\right) = 7.84$$

Thus, the 95% confidence interval is from 98 − 7.84 = 90.16 to 98 + 7.84 = 105.84. Often, confidence intervals are presented graphically along with the means: the mean of the dependent variable is indicated on the y axis with some chosen symbol, and a line representing the confidence interval (here extending from 90.16 to 105.84) is drawn vertically through the mean.
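The worked interval can be recomputed in a few lines; `NormalDist.inv_cdf(0.975)` supplies the 1.96 used above:

```python
from statistics import NormalDist

n, m, s = 64, 98.0, 32.0            # sample size, mean (Hz), sd (Hz)
se = s / n ** 0.5                   # standard error = 32 / 8 = 4 Hz

z = NormalDist().inv_cdf(0.975)     # ~1.96 for a 95% interval
half_width = z * se

print(round(half_width, 2))         # 7.84
print(round(m - half_width, 2))     # 90.16, the lower confidence limit
print(round(m + half_width, 2))     # 105.84, the upper confidence limit
```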

Before leaving this section, it is necessary to consider what to do when wanting to make corresponding statements about small samples, which cannot be approximated with the normal distribution. Here computation of the mean and standard error proceeds as before. However, since the quantity z is interpreted via normal distribution tables, it cannot be used; instead the analogous quantity t is calculated:

$$t = \frac{\bar{x} - \mu}{S.E.}$$

The distribution of t depends on sample size n, and so (in essence) the t value has to be referred to a different table for each size of sample. The tables corresponding to the t distribution are usually collapsed into one table, and the section of the table used is accessed by a parameter related to the sample size n (the quantity used for accessing the table is n − 1 and is called the degrees of freedom). Clearly, since several different distributions are being tabulated, some condensation of the information relative to the z tables is necessary. For this reason, only t values corresponding to particular probabilities are given. Consideration of t tables emphasizes one of the advantages of the Central Limit Theorem: a single z table can be used to address a wide variety of issues, whereas t requires a different distribution for each sample size.

Estimating proportions

Here the problem faced is similar to that with means: A sample has been taken and the proportion of people meeting some criterion and those not meeting that criterion are observed. The question is with what degree of confidence can you assert that the proportions observed reflect those in the population? Once again the solution is directly related to that discussed when estimating how close a sample mean lies to the population mean using z scores. Essentially the z score for means measures:

$$Z = \frac{\text{estimated mean} - \text{population mean}}{S.E.}$$

The only difference here is that binomial events are being considered (meet/not meet the criterion). The z score associated with a particular sample, based on the estimated proportion p̂ and the population proportion p, is (where q = 1 − p):

$$Z = \frac{\hat{p} - p}{\sqrt{pq/n}}$$

Normal distribution tables can again be used to assign a probability associated with this particular outcome.
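A hypothetical worked case (the counts are ours, for illustration): in a sample of 400 users, 180 meet the criterion. How consistent is this with a population proportion of 0.5?

```python
from statistics import NormalDist

n, hits, p = 400, 180, 0.5
p_hat = hits / n                      # estimated proportion: 0.45
q = 1 - p

z = (p_hat - p) / (p * q / n) ** 0.5  # the z score defined above
print(round(z, 2))                    # -2.0

# Two-tailed probability of a deviation at least this large.
prob = 2 * NormalDist().cdf(-abs(z))
print(round(prob, 4))                 # 0.0455
```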

To illustrate with an example: Suppose that it is expected that as many men will use the speech recognizer as women (p(man) = p(woman) = 0.5). What size of sample is needed to be 95% certain that the proportion of men and women in the sample differs from that in the population by at most 4%?

$$1.96\sqrt{\frac{(0.5)(0.5)}{n}} = 0.04$$

(where 1.96 is the z value corresponding to 95% confidence)

Solving for n gives 600.25; therefore, a sample of size at least 601 should be used. Now consider the effect of tightening the margin: if the allowed difference is reduced from 4% to 2%, the required sample size jumps to 2401.
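The sample-size calculation generalizes; the helper below (the function name is ours, not from the article) solves 1.96·√(pq/n) = E for n:

```python
import math

def required_n(p, e, z=1.96):
    """Sample size so that the sample proportion is within e of p,
    with confidence corresponding to the z value (1.96 -> 95%)."""
    q = 1 - p
    return p * q * (z / e) ** 2

print(round(required_n(0.5, 0.04), 2))   # 600.25 -> use at least 601
print(math.ceil(required_n(0.5, 0.04)))  # 601
print(round(required_n(0.5, 0.02), 2))   # 2401.0 for a 2% margin
```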

Estimating variance

The relationship between the variance of a sample and that of the population is distributed as χ2 (chi squared) with n - 1 degrees of freedom.

$$\chi^2 = \frac{(n-1)S^2}{\sigma^2}$$

Thus, if we have a sample of size 10 drawn from a normal population with population variance 12, the probability of its variance exceeding 18 is:

$$\chi^2 = \frac{(n-1)S^2}{\sigma^2} = \frac{9 \times 18}{12} = 13.5$$

This has associated with it 9 degrees of freedom. Because χ² values are only tabulated for particular probabilities (as with t), the probability can only be bracketed. In this case the probability lies between 0.1 and 0.2.

Ratio of sample variances

If two independent samples are taken from two normal populations with variances σ₁² and σ₂², the ratio of the two sample variances (S₁² and S₂²) has the F distribution:

$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$$

If two samples (which can differ in size) are taken from the same normal population, then the ratio of their variances will be approximately 1. Conversely, if the samples are not from the same normal population, the ratio of their variances will generally not be 1 (the ratio of the variances is termed the F ratio). The F tables can be used to assign probabilities that the sample variances were or were not from the same normal distribution. The importance of this in the Analysis of Variance (ANOVA) will be seen later.

3. Statistical terms involved in inference to the population mean from the sample mean

Simple hypothesis testing

Many practical and research applications in speech and hearing science require testing of hypotheses. An example from the scenario given at the outset was testing whether there were differences between read and spontaneous speech with respect to selected statistics. If the statistic was mean vowel duration in the two conditions where speech was recorded, we have a situation calling for simple hypothesis testing. This situation is called simple hypothesis testing since it involves a parameter of a single population.

Following the approach adopted so far, the concepts involved in such testing are illustrated for this selected example. The first step is to make alternative assertions about what the likely outcome of an analysis might be. One assertion is that the analysis might provide no evidence of a difference between the two conditions. This case is referred to as the null hypothesis (conventionally abbreviated as H0) and might assert here that the mean vowel duration in the read speech is the same as that in the spontaneous speech. Other assertions might be made about this situation. These are referred to as alternative hypotheses. One alternative hypothesis would be that the vowel duration in the read speech will be less than that of the spontaneous speech. A second would be the converse, i.e. the vowel duration in the spontaneous speech will be less than that of the read speech. The decision about which of these alternative hypotheses to propose will depend on factors that lead the speech or hearing student or investigator to expect differences in one direction or the other. These instances are referred to as one-tailed (one-directional) hypotheses as each predicts a specific direction in which read and spontaneous speech will differ. If the investigator wants to test for a difference but has no theoretical or empirical reasons for predicting the direction of the difference, then the hypothesis is said to be two-tailed. Here, large differences between the means of the read and spontaneous speech, no matter which direction they go in, might constitute evidence in favor of the alternative hypothesis.

The distinction between one and two-tailed tests is an important one as it affects what difference between means is needed to assert a significant difference (i.e., reject the null hypothesis). In the case of a one-tailed test, smaller differences between means are needed than in the case of two-tailed tests. Basically, this comes down to how the tables are used in the final step of assessing significance (see below). There are no fixed conventions for the format of tables for the different tests, so there is no point in illustrating how to use them. The tables usually contain guidance as to how they should be used to assess significance.

Hypothesis testing involves asserting what level of support can be given in favor of, on the one hand, the null, and, on the other, the alternative hypotheses. Clearly, no difference between the means of the read and spontaneous speech would indicate that the null hypothesis is supported for this sample. A big difference between the means would seem to indicate a statistical difference between the samples, provided either that the direction in which the means differ is the direction hypothesized (for a one-tailed hypothesis) or that a two-tailed test has been formulated. The way in which a decision is made about whether a particular level of support (a probability) has been reached is described next.

In the read-spontaneous example that we have been working through, we are interested in testing for a difference between means for two samples where, it is assumed, the samples are from the same speaker. The latter point requires that a related groups test as opposed to an independent groups test is used. In this case, the t statistic is computed from:

$$t = \frac{\text{mean of condition 1} - \text{mean of condition 2}}{\text{S.E. of differences}}$$

Thus, if the read speech for 15 speakers had a mean vowel duration of 40.2 milliseconds and the spontaneous speech 36.4 milliseconds, and the standard error of the differences is 2.67, the t value is 1.42. The t value is then used for establishing whether two sample means lying this far apart might have come from the same (null hypothesis) or different (alternative hypothesis) distributions. This is done by consulting tables of the t statistic using n − 1 degrees of freedom (here n refers to the number of pairs of observations).
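The arithmetic of the worked example, restated:

```python
# Figures from the worked example above.
mean_read, mean_spont = 40.2, 36.4   # mean vowel durations (ms)
se_diff = 2.67                       # standard error of the differences
n_pairs = 15                         # pairs of observations (speakers)

t = (mean_read - mean_spont) / se_diff
print(round(t, 2))                   # 1.42
print(n_pairs - 1)                   # 14 degrees of freedom
```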

In assessing a level of support for the alternative hypothesis, decision rules are formulated. Basically this involves stipulating that if the probability of the means lying this far apart, assuming the samples are from the same distribution, is sufficiently low, then a more likely alternative is that the samples are drawn from different populations. The "stipulation" is done in terms of discrete probability levels and, conventionally, if there is a less than 5% chance that the samples were from the same distribution, then the hypothesis that the samples were drawn from different distributions is supported (the alternative hypothesis, at that level of significance). Conversely, if there is a greater than 5 in a hundred chance that the samples are from the same distribution, the null hypothesis is supported. In the worked example, with 14 degrees of freedom, a t value of 1.42 does not support the hypothesis that the samples are drawn from different populations, thus the null hypothesis is accepted. It should be noted that support or rejection of these hypotheses is statistical rather than absolute. Where a 5% significance level is adopted and a difference is found, 1 occasion in 20 will nevertheless be in error (referred to as a Type I error, rejecting the null hypothesis when it is in fact true). Conversely, asserting no difference when a difference does occur is referred to as a Type II error (accepting the null hypothesis when it is false); its rate is not fixed by the significance level but depends on the power of the test.

Analysis of Variance

As was said earlier, this is not supposed to be a substitute for your statistics textbook as, in particular, it does not cover all statistical tests that might be encountered. It only offers an overview and a means of accessing relevant material in a textbook.

However, some comments on Analysis of Variance (ANOVA) are called for, as it is a technique that has widespread use in speech and hearing assessment. ANOVA is a statistical method for assessing the importance of factors that produce variability in responses or observations. The approach is to manipulate a factor by specifying different values for it (referred to as treatment levels when the experimenter controls them) in order to see whether there is an effect. Each treatment level can be thought of as sampling a potentially different population (different in the sense of having a different mean). Factors that have an effect change the variation in sample means, where "factor" refers to a controlled independent variable.

In the ANOVA approach, two estimates of the variance are obtained: the variance between the sample means (between-groups variance) and the variance of each of the scores about their group mean (within-groups variance). If the treatment factor has had no effect, then both the between-groups and within-groups variability should be estimates of the population variance. So, as discussed earlier when the ratio of two sample variances from the same population was considered, if the ratio of between-groups to within-groups variance is taken, the value should be about 1 (in which case, the null hypothesis is supported). The ratio of two variances is called the F ratio. Statistical tables of the F distribution can be consulted to ascertain whether the F ratio is large enough to support the hypothesis that the treatment factor has had an effect, resulting in the between-groups variance being larger than the within-groups variance (the alternative hypothesis is supported). Another way of looking at this is that the between-groups variance is affected by individual variation of the units tested plus the treatment effect, whereas the within-groups estimate is only affected by individual variation of the units tested.
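A minimal sketch of the between/within computation for a one-way ANOVA, using hypothetical vowel-duration scores in three treatment groups (the group sizes and values are ours, for illustration only):

```python
from statistics import mean

groups = [[40.1, 42.3, 39.8, 41.0],   # hypothetical scores (ms), group 1
          [36.2, 35.9, 38.0, 37.1],   # group 2
          [44.5, 43.2, 45.1, 42.9]]   # group 3

k = len(groups)                        # number of treatment levels
n = sum(len(g) for g in groups)        # total observations
grand = mean(x for g in groups for x in g)

# Between-groups mean square: variation of group means about the grand mean.
ms_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups) / (k - 1)

# Within-groups mean square: variation of scores about their own group mean.
ms_within = sum((x - mean(g)) ** 2 for g in groups for x in g) / (n - k)

f_ratio = ms_between / ms_within       # ~1 if the factor has no effect
print(f_ratio > 1)                     # True here: the group means differ widely
```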

ANOVA is a powerful tool which has been developed to examine treatment effects involving several factors. Some examples of its scope are that it can be used with two or more factors. Factors that are associated with independent and related groups can be tested in the same analysis, and so on. When more than one factor is involved in an analysis, the dependence between factors (interactions) comes into play and has major implications for the interpretation of results.

Non-parametric tests

Parametric tests cannot be used when discrete, rather than continuous, measures are obtained, since the Central Limit Theorem does not lead to a normal approximation in these instances. The distinction between discrete and continuous measures is the principal factor governing whether a parametric or non-parametric test can be employed.

Continuous and discrete measures relate to another taxonomy of scales: interval, nominal and ordinal. Interval scales are continuous and the others are discrete. Statisticians consider this taxonomy misleading, but since it is frequently encountered in the behavioral sciences, the nature of data from the different scales is described here. Interval data are obtained when the distance between any two numbers on the scale is of known size; such a scale is characterized by a constant unit of measurement. This applies to physical measures like duration and frequency measured in Hertz (Hz), which have featured in the examples discussed so far. Nominal scales arise when symbols are used to characterize objects (such as the sex of the speakers). Ordinal scales give some idea of the relative magnitude of the units measured, but the difference between two numbers does not indicate relative size. Examples are responses to questionnaires where there is no guarantee of equal distance between the response choices offered (e.g. strongly agree, agree, disagree, strongly disagree).

In cases where parametric tests cannot be used, non-parametric (also known as distribution-free) tests have to be employed. The computations involved in these tests are straightforward and are covered in any elementary textbook. A reader who has followed the material presented thus far should find it easy to apply the preceding ideas to these tests.
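As a brief added sketch of why scale type matters (the responses below are hypothetical): for ordinal questionnaire data, counts and the median are safe summaries, whereas a mean presupposes the equal spacing between categories that only an interval scale guarantees.

```python
# Summarizing ordinal (Likert-style) responses without assuming
# equal spacing between categories. Responses are invented.
from collections import Counter
from statistics import median

codes = {"strongly agree": 4, "agree": 3, "disagree": 2, "strongly disagree": 1}
responses = ["agree", "agree", "strongly agree", "disagree",
             "agree", "strongly disagree"]

coded = sorted(codes[r] for r in responses)
print(Counter(responses))   # frequency of each category (nominal-style summary)
print(median(coded))        # median rank: legitimate for ordinal data
```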

A number of representative questions that a speech and hearing investigator might want to answer were considered at the start of this section. Let us go back over these and consider which ones we are now equipped to answer. The first was how to check whether there are differences between spontaneous and read speech.

If the measures are parametric (such as would be the case for many acoustic variables), then either an independent or related t test would be appropriate to test for differences. An independent t test is needed when samples of spontaneous speech and read speech are drawn from different speaker sets; a related t test is used when the spontaneous and read samples are both obtained from the same group of speakers.
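A minimal sketch of both designs using `scipy.stats` (the duration values below are invented for illustration):

```python
# Independent vs related t tests on hypothetical vowel durations (ms).
import numpy as np
from scipy import stats

spontaneous = np.array([182, 175, 190, 168, 177, 185, 172, 180])
read        = np.array([170, 160, 178, 158, 165, 174, 162, 169])

# Different speakers in each condition: independent-samples t test
t_ind, p_ind = stats.ttest_ind(spontaneous, read)

# The same speakers recorded in both conditions: related (paired) t test
t_rel, p_rel = stats.ttest_rel(spontaneous, read)

print(t_ind, p_ind)
print(t_rel, p_rel)
```

Note that the related test pairs each speaker's two scores, so it removes between-speaker variation and is typically the more sensitive design when the same speakers can be recorded twice.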

If the measures are non-parametric (e.g. ratings of clarity for the spontaneous and read speech) then a Wilcoxon test would be used when the read and spontaneous versions of the speech are drawn from the same speaker and a Mann-Whitney U test otherwise.
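The corresponding non-parametric sketch, again with invented 5-point clarity ratings:

```python
# Wilcoxon signed-rank vs Mann-Whitney U on hypothetical ordinal ratings
# (5-point clarity scale; values are invented).
import numpy as np
from scipy import stats

read_ratings        = np.array([4, 5, 4, 3, 5, 4, 4, 5])
spontaneous_ratings = np.array([2, 3, 2, 2, 3, 2, 2, 3])

# Same listeners rated both versions of the speech: Wilcoxon signed-rank test
w, p_wilcoxon = stats.wilcoxon(read_ratings, spontaneous_ratings)

# Ratings obtained from different listener groups: Mann-Whitney U test
u, p_mannwhitney = stats.mannwhitneyu(read_ratings, spontaneous_ratings)

print(p_wilcoxon, p_mannwhitney)
```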

If you find differences between read and spontaneous speech (see the application described), how can you check whether the language statistics of your sample of recordings are representative of the language as a whole? Or, what may amount to the same thing, how can you be sure that you have sampled sufficient speech? For this, the earlier background material on estimating how close sample estimates are to population estimates is appropriate.
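One added sketch of that idea: the standard error of the sample mean shrinks as 1/sqrt(n), so the half-width of a confidence interval indicates how closely the sample mean pins down the population mean and, inverted, how much speech would need to be sampled for a target precision. The counts below are hypothetical, and the z value 1.96 is used for simplicity (a t value would be more exact for small samples).

```python
# Precision of a sample mean and the sample size needed for a target
# precision. The per-1000-word frequency counts are invented.
import math

sample = [12, 9, 15, 11, 13, 10, 14, 12, 11, 13]
n = len(sample)
mean = sum(sample) / n
var = sum((x - mean) ** 2 for x in sample) / (n - 1)   # sample variance
se = math.sqrt(var / n)                                 # standard error

# Approximate 95% confidence interval for the population mean
half_width = 1.96 * se
print(mean, mean - half_width, mean + half_width)

# Sample size needed to shrink the half-width to a chosen target
target = 0.5
n_needed = math.ceil((1.96 * math.sqrt(var) / target) ** 2)
print(n_needed)
```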

4. Conclusions

This has been a whirlwind tour of some elementary concepts in the treatment of data from the speech and hearing areas, starting with issues concerned with choice of data and inferential statistics. A final aim is to draw readers' attention to other simple, clearly written resources for examining data in these areas (and in medicine in general). For this purpose, a selected bibliography follows.

Acknowledgement

The second author is supported by the Wellcome Trust grant 072639.

Bibliography & British Medical Journal: Statistics Notes

  1. Armitage P, Berry G, Matthews JNS. Statistical Methods in Medical Research. London: Blackwell Publishers; 2001. A standard text on statistical methods in medical research.
  2. Bland M. An Introduction to Medical Statistics. 3rd edition. Oxford: Oxford University Press; 2000. Another standard text on statistical methods in medical research.
  3. Hand DJ. Measurement: Theory and Practice. London: Hodder Arnold; 2004. Recently published; covers elementary and advanced levels.
  4. Huff D. How to Lie with Statistics. New York: Norton; 1993. Popular and accurate.
  5. Moser C, Kalton G. Survey Methods in Social Investigation. London: Heinemann Educational; 1979.
  6. Perhaps the finest series of short articles on the use of statistics is the occasional series of Statistics Notes started in 1994 by the British Medical Journal. It should be required reading in any introductory statistics course. The full text of the articles is available on the World Wide Web. The articles are listed here chronologically.
  7. Bland J Martin, Altman Douglas G. Correlation, regression, and repeated data. BMJ. 1994 Apr 2;308:896. doi: 10.1136/bmj.308.6933.896.
  8. Bland J Martin, Altman Douglas G. Regression towards the mean. BMJ. 1994 Jun 4;308:1499. doi: 10.1136/bmj.308.6942.1499.
  9. Altman Douglas G, Bland J Martin. Diagnostic tests 1: sensitivity and specificity. BMJ. 1994 Jun 11;308:1552. doi: 10.1136/bmj.308.6943.1552.
  10. Altman Douglas G, Bland J Martin. Diagnostic tests 2: predictive values. BMJ. 1994 Jul 9;309:102. doi: 10.1136/bmj.309.6947.102.
  11. Altman Douglas G, Bland J Martin. Diagnostic tests 3: receiver operating characteristic plots. BMJ. 1994 Jul 16;309:188. doi: 10.1136/bmj.309.6948.188.
  12. Bland J Martin, Altman Douglas G. One and two sided tests of significance. BMJ. 1994 Jul 23;309:248. doi: 10.1136/bmj.309.6949.248.
  13. Bland J Martin, Altman Douglas G. Some examples of regression towards the mean. BMJ. 1994 Sep 24;309:780. doi: 10.1136/bmj.309.6957.780.
  14. Altman Douglas G, Bland J Martin. Quartiles, quintiles, centiles, and other quantiles. BMJ. 1994 Oct 15;309:996. doi: 10.1136/bmj.309.6960.996.
  15. Bland J Martin, Altman Douglas G. Matching. BMJ. 1994 Oct 29;309:1128. doi: 10.1136/bmj.309.6962.1128.
  16. Bland J Martin, Altman Douglas G. Multiple significance tests: the Bonferroni method. BMJ. 1995 Jan 21;310:170. doi: 10.1136/bmj.310.6973.170.
  17. Altman Douglas G, Bland J Martin. The normal distribution. BMJ. 1995 Feb 4;310:298. doi: 10.1136/bmj.310.6975.298.
  18. Bland J Martin, Altman Douglas G. Calculating correlation coefficients with repeated observations: Part 1--correlation within subjects. BMJ. 1995 Feb 18;310:446. doi: 10.1136/bmj.310.6977.446.
  19. Bland J Martin, Altman Douglas G. Calculating correlation coefficients with repeated observations: Part 2--correlation between subjects. BMJ. 1995 Mar 11;310:633. doi: 10.1136/bmj.310.6980.633.
  20. Altman Douglas G, Bland J Martin. Absence of evidence is not evidence of absence. BMJ. 1995 Aug 19;311:485. doi: 10.1136/bmj.311.7003.485.
  21. Bland J Martin, Altman Douglas G. Presentation of numerical data. BMJ. 1996 Mar 2;312:572. doi: 10.1136/bmj.312.7030.572.
  22. Bland J Martin, Altman Douglas G. Logarithms. BMJ. 1996 Mar 16;312:700. doi: 10.1136/bmj.312.7032.700.
  23. Bland J Martin, Altman Douglas G. Transforming data. BMJ. 1996 Mar 23;312:770. doi: 10.1136/bmj.312.7033.770.
  24. Bland J Martin, Altman Douglas G. Transformations, means, and confidence intervals. BMJ. 1996 Apr 27;312:1079. doi: 10.1136/bmj.312.7038.1079.
  25. Bland J Martin, Altman Douglas G. The use of transformation when comparing two means. BMJ. 1996 May 4;312:1153. doi: 10.1136/bmj.312.7039.1153.
  26. Altman Douglas G, Bland J Martin. Comparing several groups using analysis of variance. BMJ. 1996 Jun 8;312:1472–1473. doi: 10.1136/bmj.312.7044.1472.
  27. Bland J Martin, Altman Douglas G. Measurement error and correlation coefficients. BMJ. 1996 Jul 6;313:41–42. doi: 10.1136/bmj.313.7048.41.
  28. Bland J Martin, Altman Douglas G. Measurement error proportional to the mean. BMJ. 1996 Jul 13;313:106. doi: 10.1136/bmj.313.7049.106.
  29. Altman Douglas G, Matthews John NS. Interaction 1: heterogeneity of effects. BMJ. 1996 Aug 24;313:486. doi: 10.1136/bmj.313.7055.486.
  30. Bland J Martin, Altman Douglas G. Measurement error. BMJ. 1996 Sep 21;313:744. doi: 10.1136/bmj.313.7059.744.
  31. Matthews John NS, Altman Douglas G. Interaction 2: compare effect sizes not P values. BMJ. 1996 Sep 28;313:808. doi: 10.1136/bmj.313.7060.808.
  32. Matthews John NS, Altman Douglas G. Interaction 3: How to examine heterogeneity. BMJ. 1996 Oct 5;313:862. doi: 10.1136/bmj.313.7061.862.
  33. Altman Douglas G, Bland J Martin. Detecting skewness from summary information. BMJ. 1996 Nov 9;313:1200. doi: 10.1136/bmj.313.7066.1200.
  34. Bland J Martin, Altman Douglas G. Cronbach's alpha. BMJ. 1997 Feb 22;314:572. doi: 10.1136/bmj.314.7080.572.
  35. Altman Douglas G, Bland J Martin. Units of analysis. BMJ. 1997 Jun 28;314:1874. doi: 10.1136/bmj.314.7098.1874.
  36. Bland J Martin, Kerry Sally M. Weighted comparison of means. BMJ. 1998 Jan 10;316:129. doi: 10.1136/bmj.316.7125.129.
  37. Kerry Sally M, Bland J Martin. Sample size in cluster randomisation. BMJ. 1998 Feb 14;316:549. doi: 10.1136/bmj.316.7130.549.
  38. Kerry Sally M, Bland J Martin. The intracluster correlation coefficient in cluster randomisation. BMJ. 1998 May 9;316:1455–1460. doi: 10.1136/bmj.316.7142.1455.
  39. Altman Douglas G, Bland J Martin. Generalisation and extrapolation. BMJ. 1998 Aug 8;317:409–410. doi: 10.1136/bmj.317.7155.409.
  40. Altman Douglas G, Bland J Martin. Time to event (survival) data. BMJ. 1998 Aug 15;317:468–469. doi: 10.1136/bmj.317.7156.468.
  41. Bland J Martin, Altman Douglas G. Bayesians and frequentists. BMJ. 1998 Oct 24;317:1151–1160. doi: 10.1136/bmj.317.7166.1151.
  42. Bland J Martin, Altman Douglas G. Survival probabilities (the Kaplan-Meier method). BMJ. 1998 Dec 5;317:1572–1580. doi: 10.1136/bmj.317.7172.1572.
  43. Altman Douglas G, Bland J Martin. Treatment allocation in controlled trials: why randomise? BMJ. 1999 May 1;318:1209. doi: 10.1136/bmj.318.7192.1209.
  44. Altman Douglas G, Bland J Martin. Variables and parameters. BMJ. 1999 Jun 19;318:1667. doi: 10.1136/bmj.318.7199.1667.
  45. Altman Douglas G, Bland J Martin. How to randomise. BMJ. 1999 Sep 11;319:703–704. doi: 10.1136/bmj.319.7211.703.
  46. Altman Douglas G, Bland J Martin. The odds ratio. BMJ. 2000 May 27;320:1468. doi: 10.1136/bmj.320.7247.1468.
  47. Day Simon J, Altman Douglas G. Blinding in clinical trials and other studies. BMJ. 2000 Aug 19;321:504. doi: 10.1136/bmj.321.7259.504.
  48. Altman Douglas G, Schulz Kenneth F. Concealing treatment allocation in randomised trials. BMJ. 2001 Aug 25;323:446–447. doi: 10.1136/bmj.323.7310.446.
  49. Bland J Martin, Altman Douglas G. Validating scales and indexes. BMJ. 2002 Mar 9;324:606–607. doi: 10.1136/bmj.324.7337.606.
  50. Altman Douglas G, Bland J Martin. Interaction revisited: the difference between two estimates. BMJ. 2003 Jan 25;326:219. doi: 10.1136/bmj.326.7382.219.
  51. Bland J Martin, Altman Douglas G. The logrank test. BMJ. 2004 May 1;328(7447):1073. doi: 10.1136/bmj.328.7447.1073.