. 2015 Mar 19;2015:431487. doi: 10.1155/2015/431487

Table 1.

Some basic statistical concepts on genomic data for genetic diversity assessment.

Concept term | Description/features | Formulae/pros/cons
Band-based approaches: The easiest way to analyze and measure diversity, focusing on the presence or absence of bands in the banding pattern. Routinely used at the individual level.
Totally reliant on the marker type and its polymorphism.

(1) Measuring polymorphism: Count the total number of polymorphic bands (PB) and then calculate the percentage of polymorphic bands. The informativeness of each band ($I_b$) can be expressed on a scale from 0 to 1 according to the formula
$$I_b = 1 - 2\,\lvert 0.5 - p \rvert,$$
where $p$ is the proportion of genotypes containing the band.
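As a minimal sketch of these two band-level measures, the snippet below assumes the data are a 0/1 matrix with genotypes as rows and bands as columns, and treats a band as polymorphic when it is present in some but not all genotypes. The function names `band_informativeness` and `polymorphism_summary` are hypothetical helpers for illustration only.

```python
from typing import Dict, List


def band_informativeness(p: float) -> float:
    """I_b = 1 - 2*|0.5 - p|, where p is the proportion of genotypes with the band."""
    return 1.0 - 2.0 * abs(0.5 - p)


def polymorphism_summary(matrix: List[List[int]]) -> Dict[str, object]:
    """Hypothetical helper: rows are genotypes, columns are bands scored 1/0."""
    n_genotypes = len(matrix)
    n_bands = len(matrix[0])
    ib_values = []
    polymorphic = 0
    for j in range(n_bands):
        present = sum(row[j] for row in matrix)
        p = present / n_genotypes
        ib_values.append(band_informativeness(p))
        # Assumption: polymorphic = present in some, but not all, genotypes.
        if 0 < present < n_genotypes:
            polymorphic += 1
    return {
        "PB": polymorphic,
        "percent_PB": 100.0 * polymorphic / n_bands,
        "Ib": ib_values,
    }
```

A band present in exactly half of the genotypes gets $I_b = 1$, while a band present in all or none of them gets $I_b = 0$, which is why monomorphic bands add nothing to this score.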

(2) Shannon's information index (I): Also called the Shannon index of phenotypic diversity, it is widely applied: $I = -\sum_i p_i \log_2 p_i$, where $p_i$ is the frequency of the $i$th phenotypic class.
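The Shannon formula can be sketched as follows. For band data, this assumes one common convention: each band is treated as two phenotypic classes (present with frequency $p$, absent with $1 - p$) and the index is averaged over bands. Other conventions exist, and the helper names here are hypothetical.

```python
import math
from typing import Sequence


def shannon_index(freqs: Sequence[float]) -> float:
    """I = -sum(p_i * log2(p_i)); classes with p_i == 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in freqs if p > 0)


def mean_band_shannon(band_freqs: Sequence[float]) -> float:
    """Hypothetical helper: average I over bands, each split into
    'present' (p) and 'absent' (1 - p) classes."""
    values = [shannon_index([p, 1.0 - p]) for p in band_freqs]
    return sum(values) / len(values)
```

With two equally frequent classes the index is 1 bit; with four equally frequent classes it is 2 bits; a fixed band (present in every individual) contributes 0.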

(3) Similarity coefficients: These use similarity coefficients or dissimilarity coefficients (the complement of similarity, typically 1 − similarity).
The Jaccard coefficient (J) only takes into account the bands present in at least one of the two individuals. It is therefore unaffected by homoplasic absent bands (where the absence of the same band is due to different mutations).
The simple-matching index (SM) maximizes the amount of information provided by the banding patterns considering all scored loci.
The Nei and Li index (SD) doubles the weight of bands present in both individuals, thus giving more weight to similarity than to dissimilarity.
(i) Jaccard similarity coefficient or Jaccard index: $J = a/(a + b + c)$.
(ii) Simple matching coefficient or index: $SM = (n - b - c)/n$.
(iii) Sørensen-Dice index or Nei and Li index: $SD = 2a/(2a + b + c)$,
where $a$ is the number of bands (1s) shared by both individuals; $b$ is the number of positions where individual $i$ has a band but $j$ does not; $c$ is the number of positions where individual $j$ has a band but $i$ does not; and $n$ is the total number of bands scored (0s and 1s).
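The three coefficients differ only in how they use the counts $a$, $b$, $c$, and $n$. A minimal sketch for two 0/1 band profiles of equal length follows; the helper names are hypothetical, and a dissimilarity can be taken as `1 - similarity`.

```python
from typing import Sequence, Tuple


def band_counts(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Hypothetical helper: return (a, b, c, n) for two 0/1 band profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)  # shared bands
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)  # only in i
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)  # only in j
    return a, b, c, len(x)


def jaccard(x: Sequence[int], y: Sequence[int]) -> float:
    a, b, c, _ = band_counts(x, y)
    return a / (a + b + c)  # shared absences are ignored


def simple_matching(x: Sequence[int], y: Sequence[int]) -> float:
    _, b, c, n = band_counts(x, y)
    return (n - b - c) / n  # shared presences and shared absences both count


def dice(x: Sequence[int], y: Sequence[int]) -> float:
    a, b, c, _ = band_counts(x, y)
    return 2 * a / (2 * a + b + c)  # shared bands weighted twice
```

For the profiles `[1, 1, 0, 1, 0, 0, 0, 0]` and `[1, 0, 0, 1, 1, 0, 0, 0]`, $a = 2$, $b = c = 1$, $n = 8$, so $J = 0.5$, $SM = 0.75$, and $SD = 2/3$. The simple matching value is highest because the four shared absences count as agreement. Jaccard and Dice are undefined when neither individual has any band.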

(4) Allele frequency based approaches: These measure variability by describing changes in allele frequencies for a particular trait over time, and they are more population oriented than band-based approaches. They depend on extracting allele frequencies from the data.
The accuracy of these frequency estimates strongly influences the results of the indices subsequently calculated to measure genetic diversity.
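How frequencies are extracted depends on the marker type. The sketch below shows direct allele counting for codominant diploid genotypes and, for dominant markers, the usual estimate of the null allele frequency $q$ as the square root of the proportion of band-absent individuals. That second estimate assumes Hardy-Weinberg equilibrium. Both function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple


def codominant_allele_freqs(genotypes: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Hypothetical helper: count alleles directly from diploid genotypes."""
    counts: Counter = Counter()
    for g in genotypes:
        counts.update(g)  # adds both alleles of the genotype
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def dominant_null_allele_freq(band_scores: Iterable[int]) -> float:
    """Hypothetical helper: estimate q for a dominant marker.

    Assumes Hardy-Weinberg equilibrium, so band-absent (0) individuals
    are null homozygotes with expected frequency q**2.
    """
    scores = list(band_scores)
    absent = scores.count(0)
    return math.sqrt(absent / len(scores))
```

For the genotypes AA, AB, BB, and AC, allele counting gives frequencies of 0.5 (A), 0.375 (B), and 0.125 (C). A dominant band absent in one of four individuals gives $q = \sqrt{0.25} = 0.5$.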

(5) Allelic diversity (A): One of the easiest ways to measure genetic diversity is to quantify the number of alleles present.
Allelic diversity (A) is the average number of alleles per locus and is used to describe genetic diversity:
$$A = n_i / n_l,$$
where $n_i$ is the total number of alleles over all loci and $n_l$ is the number of loci.
The effective number of alleles ($n_e$) is less sensitive to sample size and rare alleles and is calculated as $n_e = 1/\sum_i p_i^2$, where $p_i$ is the frequency of the $i$th allele.
It provides information about the dispersal ability of the organism and the degree of isolation among populations.
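Both allele-count measures reduce to a line or two of arithmetic. The sketch below computes $A$ from the number of alleles observed at each locus and $n_e$ from the allele frequencies at one locus; the function names are hypothetical.

```python
from typing import Sequence


def allelic_diversity(alleles_per_locus: Sequence[int]) -> float:
    """A = n_i / n_l: total alleles over all loci divided by the number of loci."""
    return sum(alleles_per_locus) / len(alleles_per_locus)


def effective_number_of_alleles(freqs: Sequence[float]) -> float:
    """n_e = 1 / sum(p_i^2) for a single locus."""
    return 1.0 / sum(p * p for p in freqs)
```

Three loci with 2, 3, and 4 alleles give $A = 3$. Four equally frequent alleles give $n_e = 4$, whereas frequencies of 0.9, 0.05, and 0.05 give $n_e \approx 1.23$. The two rare alleles barely raise $n_e$, which is why it is less sensitive to rare alleles than a raw allele count.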

(6) Effective population size ($N_e$): It measures the rate of genetic drift, the rate of loss of genetic diversity, and the rate of increase in inbreeding within a population. The effective size of a population is an idealized number, since many calculations depend on the genetic parameters used and on the reference generation. Thus, a single population may have several different effective sizes, each biologically meaningful but distinct from the others.
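The link between $N_e$ and the rates of diversity loss and inbreeding can be illustrated with the standard results for an idealized (Wright-Fisher) population. Heterozygosity declines as $H_t = H_0 (1 - 1/(2N_e))^t$ and inbreeding rises as $F_t = 1 - (1 - 1/(2N_e))^t$. The unequal sex-ratio formula $N_e = 4N_mN_f/(N_m + N_f)$ shows one way $N_e$ can fall below census size. These formulas are not from the table itself, and the function names are hypothetical.

```python
def heterozygosity_after(h0: float, ne: float, t: int) -> float:
    """Idealized population: H_t = H_0 * (1 - 1/(2*Ne))**t."""
    return h0 * (1.0 - 1.0 / (2.0 * ne)) ** t


def inbreeding_after(ne: float, t: int) -> float:
    """Idealized population starting at F_0 = 0: F_t = 1 - (1 - 1/(2*Ne))**t."""
    return 1.0 - (1.0 - 1.0 / (2.0 * ne)) ** t


def ne_unequal_sex_ratio(n_males: int, n_females: int) -> float:
    """Ne = 4*Nm*Nf / (Nm + Nf) for a population with unequal numbers of breeding males and females."""
    return 4.0 * n_males * n_females / (n_males + n_females)
```

With $N_e = 50$, about 1% of heterozygosity is lost per generation. A census of 100 breeders with 10 males and 90 females has $N_e = 36$, which illustrates how one population can be described by an effective size quite different from its head count.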

(7) Heterozygosity (H): There are two types of heterozygosity: observed ($H_O$) and expected ($H_E$).
$H_O$ is the proportion of genes that are heterozygous in a population, and $H_E$ is the estimated fraction of all individuals that would be heterozygous at any randomly chosen locus.
Values of $H_E$ and $H_O$ typically range from 0 (no heterozygosity) to nearly 1 (a large number of equally frequent alleles).
If $H_O$ and $H_E$ do not differ significantly, mating in the population is random. If $H_O < H_E$, the population is inbreeding; if $H_O > H_E$, the population has a mating system that avoids inbreeding.
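$H_E$ is calculated from the allele frequencies as follows (for dominant markers, the frequency of the null (recessive) allele is first estimated as the square root of the frequency of band-absent individuals):
$$H_E = 1 - \sum_{i=1}^{n} p_i^2,$$
where $p_i$ is the frequency of the $i$th allele and $n$ is the number of alleles at the locus.
$H_O$ is calculated for each locus as the total number of heterozygotes divided by the sample size.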
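For a codominant locus, both quantities follow directly from the genotypes. This sketch uses allele counting for $p_i$ and the plain $1 - \sum p_i^2$ form above. It does not apply a small-sample bias correction and does not run a significance test, so it only shows the direction of any difference between $H_O$ and $H_E$. The function names are hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple

Genotype = Tuple[str, str]


def observed_heterozygosity(genotypes: Sequence[Genotype]) -> float:
    """H_O = number of heterozygotes / sample size, for one locus."""
    het = sum(1 for a1, a2 in genotypes if a1 != a2)
    return het / len(genotypes)


def expected_heterozygosity(genotypes: Sequence[Genotype]) -> float:
    """H_E = 1 - sum(p_i^2), with p_i estimated by counting alleles."""
    counts = Counter(allele for g in genotypes for allele in g)
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())
```

For the genotypes AA, AB, BB, and AC, $H_O = 0.5$ and $H_E = 0.59375$. Under the interpretation above, $H_O < H_E$ points toward inbreeding, but a formal test would be needed before concluding that the difference is significant.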

(8) F-statistics: Besides heterozygosity, the most widely applied measures in population genetics are F-statistics, or fixation indices, which measure the amount of allelic fixation caused by genetic drift.
F-statistics are related to heterozygosity and genetic drift: inbreeding increases the frequency of homozygotes and, as a consequence, decreases the frequency of heterozygotes and genetic diversity.
Three indices can be calculated as follows:
$$F_{IT} = 1 - H_I/H_T,$$
$$F_{IS} = 1 - H_I/H_S,$$
$$F_{ST} = 1 - H_S/H_T,$$
where $H_I$ is the average $H_O$ within each population, $H_S$ is the average $H_E$ of subpopulations assuming random mating within each population, and $H_T$ is the $H_E$ of the total population assuming random mating within subpopulations and no divergence of allele frequencies among subpopulations.
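The three indices can be computed once $H_I$, $H_S$, and $H_T$ are known. This sketch assumes a single locus, subpopulations weighted equally, and allele frequencies listed in the same allele order for every subpopulation. It obtains $H_T$ from the pooled (mean) allele frequencies. The function names are hypothetical.

```python
from typing import Dict, Sequence


def expected_het(freqs: Sequence[float]) -> float:
    """H_E = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def f_statistics(subpop_freqs: Sequence[Sequence[float]],
                 subpop_ho: Sequence[float]) -> Dict[str, float]:
    """Hypothetical helper for one locus.

    subpop_freqs[k][i]: frequency of allele i in subpopulation k.
    subpop_ho[k]: observed heterozygosity in subpopulation k.
    Subpopulations are weighted equally (an assumption).
    """
    k = len(subpop_freqs)
    h_i = sum(subpop_ho) / k                                # mean H_O
    h_s = sum(expected_het(f) for f in subpop_freqs) / k    # mean within-subpop H_E
    n_alleles = len(subpop_freqs[0])
    pooled = [sum(f[i] for f in subpop_freqs) / k for i in range(n_alleles)]
    h_t = expected_het(pooled)                              # H_E of the pooled population
    return {
        "H_I": h_i, "H_S": h_s, "H_T": h_t,
        "F_IS": 1.0 - h_i / h_s,
        "F_IT": 1.0 - h_i / h_t,
        "F_ST": 1.0 - h_s / h_t,
    }
```

Take two subpopulations with allele frequencies (0.8, 0.2) and (0.2, 0.8) and observed heterozygosities of 0.24 and 0.32. This gives $H_S = 0.32$, $H_T = 0.5$, $F_{ST} = 0.36$, $F_{IS} = 0.125$, and $F_{IT} = 0.44$. The definitions imply $1 - F_{IT} = (1 - F_{IS})(1 - F_{ST})$, which offers a quick consistency check.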