The Journal of the Indian Prosthodontic Society
Editorial. 2023 Jul 18;23(3):207–209. doi: 10.4103/jips.jips_307_23

Basics in statistics: Sample size calculation and descriptive data statistics

Anand Kumar Vaidyanathan
PMCID: PMC10467321  PMID: 37929358

Statistics is a scientific approach for converting data into information. Data are of two types, descriptive and inferential, yet researchers most often report only the inferential statistics of the outcome measured during the conduct of the study. A submitted manuscript should also include the sample size calculation and a statistical analysis of the descriptive data.

The sample population is the subset of participants drawn from the target population, and an adequate number of participants is essential for obtaining appropriate statistical inferences. Selecting a convenient number of samples, whether because few participants are available or because resources are lacking, does not support statistical inference. The sample size should be based on previous studies with similar characteristics. If a reference study is unavailable, a pilot study can be conducted before the main research to select an appropriate sample size. Sample size selection is influenced by the level of significance, the power of the study, the expected effect size, the underlying event rate, and the standard deviation of the population [Table 1].[1] The level of significance is the P value threshold chosen for the study, commonly <0.05 (a 95% confidence level) or a stricter value such as <0.01 (99%) or <0.001 (99.9%); the stricter the threshold, the larger the sample required.[2] The power of the study is represented as 1 − β, where β is the probability of failing to detect a difference when it is actually present. The power should be at least 80%, and it can be raised to 90% if a larger sample size is feasible. The level of significance and the power of the study are selected by the researcher, whereas the other three factors should be based on previous or pilot studies.

Table 1. Essentials of sample size calculation

| Factors considered | Influence on sample size | Depending factor |
| --- | --- | --- |
| Level of significance (P) | <0.001: increases sample size; <0.05: decreases sample size | Can be controlled by researcher |
| Power of study (1 − β) | Above 90%: increases sample size; 80%: decreases sample size | Can be controlled by researcher |
| Effect size | Smaller the effect size, larger the sample size | Mean of the data from reported research |
| Event rate | Smaller the event rate, larger the sample size | Prevalence of disease condition, from reported data or existing disease distribution |
| SD | Homogeneous/narrow SD, smaller the sample size | SD from the previously reported data |

SD: Standard deviation
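
The direction of each influence listed in Table 1 can be illustrated with a minimal sketch. It assumes a two-group comparison of means and the usual normal-approximation formula n = 2(z₁₋α/₂ + z₁₋β)²·SD²/d² per group; the editorial does not specify a formula, and the function name and all input values below are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_two_means(sd, diff, alpha=0.05, power=0.80):
    """Approximate participants per group for comparing two means.

    Assumption of this sketch (not taken from the editorial):
        n = 2 * (z_(1 - alpha/2) + z_(power))**2 * sd**2 / diff**2
    """
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided level of significance
    z_beta = z.inv_cdf(power)           # power = 1 - beta
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / diff ** 2
    return ceil(n)                      # round up to whole participants


# Hypothetical inputs: SD of 1.0 mm and an expected mean difference of 0.8 mm.
baseline = sample_size_two_means(sd=1.0, diff=0.8)
stricter_alpha = sample_size_two_means(sd=1.0, diff=0.8, alpha=0.001)
higher_power = sample_size_two_means(sd=1.0, diff=0.8, power=0.90)
smaller_effect = sample_size_two_means(sd=1.0, diff=0.4)
narrower_sd = sample_size_two_means(sd=0.5, diff=0.8)

print(baseline, stricter_alpha, higher_power, smaller_effect, narrower_sd)
```

Under these assumptions, a stricter level of significance, a higher power, or a smaller effect size each raise the required number per group, while a narrower SD lowers it. A dedicated sample size calculator should still be used for the actual study design.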

Effect size is the difference in the mean value of the measured outcome between the control and the study groups. For example, if a previous or pilot study reports crestal bone loss in the peri-implant region of 1.5 mm in the control population and 0.7 mm in the study population, the mean difference of 0.8 mm determines the effect size. If the effect size is small, the sample should be large, whereas the reverse is true when the effect size is large. The event rate is determined by the prevalence of the condition or disease; the lower the prevalence, the larger the sample required. For example, the prevalence of failing implants is currently <2%, and hence a study on failing implants requires a larger sample size. The event rate should be evaluated continually during the study, and if the prevalence of the condition changes, the sample size should be revised accordingly at any stage of the study. The standard deviation reflects the dispersion of the data among the participants around the mean value. A smaller sample size is required for homogeneous, less dispersed data (narrow standard deviation). Sample size determination should never be a matter of convenience, especially for a clinical study, and most journals make it mandatory to submit the sample size calculation. Furthermore, the calculated sample size should be increased to compensate for the dropout of participants during the course of the study. If n is the number of samples derived from the sample size calculator and b is the expected dropout rate (as a proportion), the adjusted sample size is commonly taken as n/(1 − b), which is slightly more than the calculated sample size.
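
The dropout adjustment can be sketched as follows. The paragraph above defines n and b; the sketch assumes the commonly used inflation n/(1 − b), with b expressed as a proportion. The function name and the example values are hypothetical.

```python
from math import ceil


def adjust_for_dropout(n, b):
    """Inflate a calculated sample size n for an expected dropout rate b.

    Assumes the common adjustment n / (1 - b), with b given as a
    proportion (0 <= b < 1); this formula is an assumption of the sketch.
    """
    if not 0 <= b < 1:
        raise ValueError("dropout rate b must be in [0, 1)")
    return ceil(n / (1 - b))  # round up to whole participants


# Hypothetical values: 25 participants calculated, 10% dropout expected.
adjusted = adjust_for_dropout(n=25, b=0.10)
print(adjusted)
```

With these hypothetical inputs, recruiting 28 participants would be expected to leave about 25 after a 10% dropout.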

The data obtained from participants are divided into descriptive and inferential data, the former being the description of the population and the latter being the data extracted from the population. Descriptive data represent the average value or the dispersion of values for each outcome. The mean, median, and mode are representations of descriptive data.[3] Depicting the number or frequency of participants in each category is also a form of descriptive data. Statistics on descriptive data reveal the presence or absence of bias between the control and the study populations in their basic characteristics. For example, if the control and study populations are selected based on the quality of the anterior edentulous maxilla for placement of implants, the groups should not differ in age, sex, or gingival biotype. Likewise, if the participants are between 30 and 40 years of age, the test population should not have a mean age closer to 30 years while the control population has a mean age closer to 40 years. Hence, a researcher should perform statistics on descriptive data; the inference obtained from the research can be considered unbiased only if there is no significant difference between the study and control populations in their basic characteristics.
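
A baseline summary of this kind can be sketched with the Python standard library. The records below are invented purely for illustration, and the function name `describe` is hypothetical; the sketch only reports the mean, median, SD, and category frequencies for each group so that the two groups can be compared side by side.

```python
from collections import Counter
from statistics import mean, median, stdev

# Hypothetical baseline records: (group, age in years, sex).
participants = [
    ("control", 32, "F"), ("control", 35, "M"), ("control", 38, "F"),
    ("control", 34, "M"), ("control", 36, "F"),
    ("study", 33, "M"), ("study", 36, "F"), ("study", 37, "M"),
    ("study", 34, "F"), ("study", 35, "F"),
]


def describe(group):
    """Summarise age (mean, median, SD) and sex frequencies for one group."""
    ages = [age for g, age, _ in participants if g == group]
    sexes = Counter(sex for g, _, sex in participants if g == group)
    return {
        "n": len(ages),
        "mean_age": mean(ages),
        "median_age": median(ages),
        "sd_age": round(stdev(ages), 2),  # sample standard deviation
        "sex": dict(sexes),               # frequency per category
    }


summary = {group: describe(group) for group in ("control", "study")}
for group, stats in summary.items():
    print(group, stats)
```

In this invented data set both groups have the same mean age and the same sex distribution, which is the kind of balance the paragraph above asks for.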

Inferential statistics of descriptive data involve the comparative evaluation of two groups of data to determine the level of significance or P value. The descriptive data should be accompanied by an inferential statistic, especially in a randomized controlled trial (RCT). Comparing the descriptive data of the control and the study groups helps detect bias introduced during the randomization of participants between the groups. An unbiased RCT will not show statistically significant differences between the study and control groups in their basic characteristics.
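
One way to obtain such a P value for a baseline characteristic is sketched below. The editorial does not name a specific test; this sketch swaps in an exact two-sided permutation test on the difference in group means because it needs only the standard library, whereas a t-test or chi-square test would be the more usual choice in practice. The function name and the age values are hypothetical.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test for a difference in group means.

    Enumerates every split of the pooled values into groups of the
    original sizes, so it is feasible only for small samples.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)

    def abs_mean_diff(sum_a):
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = abs_mean_diff(sum(group_a))
    extreme = 0
    count = 0
    for idx in combinations(range(len(pooled)), n_a):
        count += 1
        sum_a = sum(pooled[i] for i in idx)
        # Count splits at least as extreme as the observed one.
        if abs_mean_diff(sum_a) >= observed - 1e-12:
            extreme += 1
    return extreme / count


# Hypothetical ages (years) for two pairs of control and study groups.
balanced = permutation_p_value([32, 35, 38, 34, 36], [33, 36, 37, 34, 35])
imbalanced = permutation_p_value([30, 31, 30, 32, 31], [39, 40, 38, 40, 39])
print(balanced, imbalanced)
```

A P value above 0.05 for the first pair is consistent with balanced groups, whereas a small P value for the second pair would flag an age imbalance that could bias the outcome.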

Inferential statistical analysis of the test outcome should be performed by parametric or nonparametric tests [Table 2]. Parametric statistics are based on assumptions about the distribution of the population from which the sample was taken. Nonparametric statistics do not rely on such assumptions, and the data can come from a sample that does not follow a specific distribution or meet stringent criteria. The decision to use parametric or nonparametric tests often depends on whether the mean or the median more accurately represents the center of the data distribution.[4] For example, considering the values 10, 11, 11, 11, 13, 15, 18, and 20 as the data distribution for the interincisal distance measurement among participants with deep bite in Class II division 2, the mean of this data distribution is 13.6 (the average of the values), the median is 12 (the middle of the ordered values), and the mode is 11 (the most frequently occurring value). The type of statistics for the given data depends on the kind of analysis (qualitative or quantitative), the measure that best represents the center (mean or median), and the sample size. A parametric test is performed for a larger sample size, quantitative analysis, or when the mean more accurately represents the center of the data distribution. A nonparametric test is performed for a smaller sample size, qualitative analysis, or when the median more accurately represents the center of the data distribution. Although these factors are basic considerations, the decision should be confirmed by the Shapiro–Wilk test for smaller samples or the Kolmogorov–Smirnov test for larger samples. If the relevant test shows no significant difference (P > 0.05), the data do not deviate significantly from a normal distribution and a parametric test is appropriate.

Table 2. Factors affecting the type of inferential statistical analysis

| Factors | Parametric test is considered | Nonparametric test is considered |
| --- | --- | --- |
| Sample size | Large | Small |
| Centre of data distribution | Mean | Median |
| Type of data | Quantitative | Qualitative |
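
The measures of center for the interincisal distance example can be checked with a short sketch using the Python standard library. The values are those quoted in the text; the variable names are illustrative only.

```python
from statistics import mean, median, mode

# Interincisal distance values quoted in the text.
values = [10, 11, 11, 11, 13, 15, 18, 20]

centre = {
    "mean": mean(values),      # arithmetic average of all values
    "median": median(values),  # middle of the sorted values
    "mode": mode(values),      # most frequently occurring value
}
print(centre)
```

Here the mean (13.625) lies above the median (12), which suggests that the larger values pull the average upward. In such a case the median may represent the center better, subject to confirmation by a normality test.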

The field of research is moving toward healthy competition, wherein the publication of articles depends on the quality of research and manuscript writing. As already mentioned, the finer details in the manuscript, especially the quality of writing and appropriate statistical information, add value to quality research.

Articles from The Journal of the Indian Prosthodontic Society are provided here courtesy of Wolters Kluwer -- Medknow Publications