iScience. 2024 Feb 24;27(3):109325. doi: 10.1016/j.isci.2024.109325

A comprehensive evaluation of the phenotype-first and data-driven approaches in analyzing facial morphological traits

Hui Qiao 1,2,7, Jingze Tan 1,2,7, Jun Yan 2,7, Chang Sun 2, Xing Yin 2, Zijun Li 2, Jiazi Wu 2, Haijuan Guan 2, Shaoqing Wen 3, Menghan Zhang 2,4,5,8,∗, Shuhua Xu 1,6,∗∗, Li Jin 1,∗∗∗
PMCID: PMC10937830  PMID: 38487017

Summary

The phenotype-first approach (PFA) and data-driven approach (DDA) have both greatly facilitated anthropological studies and the mapping of trait-associated genes. However, the pros and cons of the two approaches are poorly understood. Here, we systematically evaluated the two approaches and analyzed 14,838 facial traits in 2,379 Han Chinese individuals. Interestingly, the PFA explained more facial variation than the DDA among the top 100 and top 1,000 phenotypes, but not among the top 10. Accordingly, the ratio of heterogeneous traits extracted from the PFA was much greater, while more homogenous traits were found using the DDA for different sex, age, and BMI groups. Notably, our results demonstrated that sex accounted for about 30% of the phenotypic variation in all traits extracted. Furthermore, we linked DDA phenotypes to PFA phenotypes with explicit biological explanations. These findings provide new insights into the analysis of multidimensional phenotypes and expand the understanding of phenotyping approaches.

Subject areas: Health sciences, Biological sciences, Computer science


Highlights

  • Phenotype-first approach reveals facial phenotypic heterogeneity in Han Chinese

  • Data-driven approach (DDA) characterizes shared facial phenotypes in Han Chinese

  • A greater correlation of facial phenotypes with sex, age, and BMI captured by PFA

  • The deficiency of biological significance for DDA can be supplemented by PFA



Introduction

The morphological diversity of the human face is the basis of many anthropological studies, and the human face is a highly complex and variable structure resulting from the intricate coordination of numerous factors.1 Dissecting and quantifying facial morphological diversity plays an important role in understanding the homogeneity and heterogeneity both within and among human populations. Traditionally, facial anthropometry has been acquired through the direct measurement of subjects in a clinical setting, using calipers or metric tape to measure parameters between landmarks.2 This method is referred to as the phenotype-first approach (PFA). A PFA based on distinct landmarks relies greatly on anthropologists’ or experts’ knowledge and experience. This approach has contributed to the study of human evolution and comparisons among different human populations.3 Genetic studies on facial anthropometric phenotypes have led to the identification of genes underlying measurements of nose width, the distance between two nostrils, and the nasolabial angle.4,5,6 Instead of using only a limited number of distinct landmarks, Claes et al. developed a data-driven approach (DDA) to exploit both the partial and integrated information from three-dimensional (3D) facial images, facilitating the identification of genetic effects on the facial shape at multiple levels of organization from global to local levels.7 This innovative approach is increasingly used in various aspects of facial analysis.8,9,10,11 In addition to facial phenotyping, DDAs such as unsupervised machine learning can be used to discover novel phenotypes of various diseases, including diabetes,12 sepsis,13 dilated cardiomyopathy,14 pulmonary arterial hypertension,15 heart failure,16 gliomas,17 and primary prostate cancer,18 that may help in elucidating the mechanisms of diseases and treatment effects.

Previous studies on human facial phenotypes employed either a PFA or a DDA. However, the following questions remain unanswered: (1) What is the relationship between these two approaches? (2) What are the differences between the two approaches in capturing human facial characteristics? To address these questions, this study aimed to perform a comprehensive evaluation and comparison of the two approaches. We dissected 3D facial images of 2,379 individuals from a Han Chinese population into 14,838 facial morphological traits, including PFA phenotypes such as the coordinates, distances, and curvatures of landmarks, as well as DDA phenotypes such as the principal components and surface areas. Our findings are likely to have implications for fundamental and applied sciences, including human genetics, developmental biology, evolutionary biology, medical genetics, forensics, and the design of facial products.

Results

The inter-association among facial phenotypes is widespread

To validate the associations among phenotypes, we performed a Spearman correlation analysis on the PFA and DDA phenotypes. We observed that most traits were related to each other (Figure 1). In the PFA, the strongest mean correlation coefficient was observed among geodesic distances themselves (r = 0.455), followed by that between Euclidean distances and geodesic distances (r = 0.442). In the DDA, the most strongly correlated categories were the surface area of the module and the Moran’s I of the module Z coordinate (r = 0.068), indicating weak correlations among DDA phenotypes. Between the PFA and DDA, the surface area of the module and geodesic distance had the greatest mean correlation coefficient (r = 0.128). These results were consistent with expectations and suggested that correlations within different types of traits may be explained in part by sets of shared common genetic components.
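
The category-level coefficients reported here are averages over many pairwise Spearman correlations. The following Python sketch illustrates one way such an average could be computed. The source does not report its code, so the function names, the toy values, and the choice to average absolute coefficients are assumptions made purely for illustration.

```python
from itertools import product


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_abs_rho(category_a, category_b):
    """Mean |rho| over trait pairs from two categories (self-pairs skipped)."""
    rhos = [abs(spearman(a, b))
            for a, b in product(category_a, category_b) if a is not b]
    return sum(rhos) / len(rhos)


# Hypothetical toy data: each inner list is one trait measured in five people.
geodesic = [[10.1, 11.4, 9.8, 12.0, 10.9], [20.3, 22.1, 19.9, 23.5, 21.0]]
euclidean = [[9.7, 10.9, 9.5, 10.8, 10.1]]
print(round(mean_abs_rho(geodesic, euclidean), 3))
```

Passing the same list twice, as in `mean_abs_rho(geodesic, geodesic)`, gives a within-category average, which is how a value such as the one reported for geodesic distances could be obtained under these assumptions.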

Figure 1.


Association analysis among different phenotypes

Circos plot illustrating correlations among phenotypes. Each color represents one type of phenotype. The wider the band, the greater the correlation. The subclasses of phenotypes are abbreviated as point coordinates (Point), proportion indices (Index), curvatures (Curvature), angular measurements (Angle), triangle area measurements (Triangle_area), Euclidean distances (Euclidean), geodesic distances (Geodesic), Manhattan distances (Manhattan), voluminal measurements (Volume), surface area measurements (Surface_area), principal components of the module (Module_PCs), surface area of the module (Module_surf_area), Moran’s I of the module Z coordinate (Module_mor_z), Moran’s I of the module Gaussian curvature (Module_mor_gau), and Moran’s I of the module mean curvature (Module_mor_mea), respectively.

See also Figure S1.

To gain further insight into the relationships among different types of phenotypes, we used multi-dimensional scaling (MDS) to identify clusters and patterns of high-dimensional phenotypic data in the lower dimensional space. According to the correlation analysis mentioned previously, we conducted MDS analysis to further derive these similarity or dissimilarity measurements. The results showed that point coordinates, triangle area measurements, surface area measurements, voluminal measurements, and three kinds of distance were clustered together (Figures S1A–S1C). In addition, the graphs based on the angular measurements, the principal components of the module, and the Moran’s I of the module Z coordinate shared some common characteristics. The remaining phenotypes formed another cluster in Dim1 and Dim2. Proportion indices were an outlier for other types in Dim1 and Dim3. The point coordinates, triangle area measurements, surface area measurements, voluminal measurements, Euclidean distances, and geodesic distances formed the largest cluster, and the remaining phenotypes grouped together. From Dim2 and Dim3, the curvatures, the surface area of the module, and the Moran’s I of the module mean or Gaussian curvature gathered together. The proportion indices clustered tightly and separated from other traits, while the voluminal measurements, surface area measurements, triangle area measurements, Euclidean distances, and geodesic distances formed another cluster. To further verify the internal relationships among various phenotypes, t-distributed stochastic neighbor embedding was used to analyze the correlation matrix of each phenotype, and we found that the similarities among facial features were consistent with the MDS results (Figure S1D). These observations illustrated that facial traits using the same phenotyping approach were more similar, and the differences between PFA and DDA were greater than their internal differences.

Facial phenotypes show significant associations with sex, age, or BMI

To probe the roles of these basic variables in different kinds of phenotypes, we performed non-parametric tests to examine the associations between facial traits and sex, age, or BMI. As shown in Figure 2A, most of the traits showed significant associations with sex, age, or BMI after adjustment for multiple testing (p < 3.37 × 10^−6). The triangle area of the pronasale, left cheilion, and left superior alar groove showed the greatest difference between males and females using the Mann-Whitney U test (p = 1.6 × 10^−283) (Table S1). Age correlated most strongly with the angle of the left endocanthion, right exocanthion, and right cheilion (r = 0.604, p = 1.3 × 10^−236), while the first principal component of the 38th module correlated most strongly with BMI (r = −0.476, p = 4.24 × 10^−135) (Tables S2 and S3).
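
The significance threshold quoted here follows from dividing α = 0.05 by the 14,838 traits tested, as stated in the Quantification and statistical analysis section. The short Python sketch below reproduces that arithmetic and adds a generic Holm step-down procedure for comparison. The function names and the three example p values are hypothetical, and the sketch is not taken from the authors' analysis.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold: alpha divided by the number of tests."""
    return alpha / n_tests


def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure; returns True (reject) per p value, in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for step, idx in enumerate(order):
        # The k-th smallest p value (k = step + 1) is compared with alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - step):
            reject[idx] = True
        else:
            break  # every larger p value is retained as well
    return reject


threshold = bonferroni_threshold(0.05, 14838)
print(f"{threshold:.2e}")  # 3.37e-06

# Hypothetical p values, for illustration only.
print(holm_reject([0.01, 0.04, 0.03]))  # [True, False, False]
```

The first Holm comparison uses α/m, which coincides with the single threshold quoted in the text; later comparisons in the step-down sequence are less strict.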

Figure 2.


Relationship between phenotypes and basic variables

(A) The relationship between facial phenotypes and basic variables. Scatterplot for phenotypes against sex, age, and BMI.

(B) Ternary plot showing the distribution of the loadings of sex, age, and BMI across the various phenotypes. Each dot represents one phenotype, and traits of the same type share a common color. The solid red line represents the multiple-testing significance threshold (p = 3.37 × 10^−6), and phenotypes above the red line show significant differences or correlations.

See also Tables S1, S2, and S3.

Sex, age, and BMI play different roles in shaping the phenotypic diversity

Understanding the loadings of basic variables (sex, age, and BMI) on the various phenotypes can help to describe the associations between basic variables and phenotypes. Here, we used canonical correlation analysis (CCA) to estimate the possible associations among different basic variables and different phenotypes. The results showed that most of the facial features significantly correlated with various basic variables (p < 3.37 × 10^−6). The ternary diagram illustrated that most of the phenotypes correlated with sex, followed by age (Figure 2B), whereas BMI had relatively small loading values on phenotypes. In particular, we found that sex affected the point coordinates, distances, proportion indices, triangle areas, volumes, and surface areas. Age affected curvatures, while both age and sex affected the angular measurements. Compared to the PFA phenotypes, the data-driven phenotypes were little affected by the three basic variables, which had similar loading values. In summary, our findings showed that sex, age, and BMI played different roles in shaping the phenotypic diversity in Han Chinese populations.

Facial features display significant differences between males and females

We performed partial least squares-discriminant analysis (PLS-DA) and Adonis analysis to examine the facial differences across sex, age, or BMI. The PLS-DA scatterplot shown in Figure 3A illustrated the obvious facial differences between males and females, with R2Y and Q2 > 0.5 (R2X = 0.47, R2Y = 0.859, Q2 = 0.798). We assessed the significance of our classification using permutation tests. The p values for permutation testing were pR2Y = 0.001 and pQ2 = 0.001. The Adonis analysis also supported the result, and sex could explain about 30% of the variance in facial features (R2 = 0.299, p = 0.001). Three examples of facial features contributing to facial discrimination across sexes are shown in Figure 3B. This finding was consistent with the results of the previous association analysis and CCA of facial phenotypes.

Figure 3.


Scatterplot of sex, age, and BMI groups and representative facial features

(A) Partial least squares-discriminant analysis (PLS-DA) scatterplot showed separation between male (green) and female (red) clustering with individual phenotypic data. M, male; F, female.

(B) Visualization example of the variable importance in projection (VIP) phenotypes. Example 1: The fifth principal component of the 11th module. Example 2: The triangular area of the pronasale, left cheilion, and left superior alar groove. Example 3: The nasal surface area.

(C) PLS-DA scatterplot showed separation among age clustering with individual phenotypic data. Y, young; M, middle-aged; O, old.

(D) Visualization example of the VIP phenotypes. Example 1: Mean curvature Moran index of the 15th module. Example 2: Mean curvature Moran index of the 31st module. Example 3: Mean curvature Moran index of the 62nd module.

(E) PLS-DA scatterplot showed separation among BMI clustering with individual phenotypic data. Uw, underweight; N, normal; Ow, overweight; O, obese.

(F) Visualization example of the VIP phenotypes. Example 1: The seventh principal component of the fourth module. Example 2: The second principal component of the 22nd module. Example 3: The first principal component of the 86th module. Colored circles represent 95% confidence intervals. Colored dots represent individual samples.

See also Figures S2 and S3 and Tables S4–S9.

Facial features display significant differences among age groups

As age is an important factor in human facial features, we examined the facial differences among three age groups of young, middle-aged, and older people using PLS-DA. As shown in Figure 3C, the three age groups formed distinct clusters in the PLS-DA scatterplots (R2X = 0.508, R2Y = 0.521, Q2 = 0.361). The p values for permutation testing were pR2Y = 0.001 and pQ2 = 0.001. Adonis analysis showed a significant difference in the facial phenotypes among the three age groups, and age explained 3% of the facial variance (R2 = 0.03, p = 0.001) (Table S4), although there was some overlap among age groups, indicating some degree of potentially shared features. We provided three examples of facial features with variable importance in projection (VIP) > 1 that could well distinguish the age groups (Figure 3D). In addition, we observed these patterns in both females and males (Figure S2, Tables S5 and S6).

Facial features display significant differences among BMI groups

We further used PLS-DA and Adonis analysis to investigate the facial differences among four groups of underweight, normal, overweight, and obese people based on individual BMI. In particular, the PLS-DA scatterplots displayed distinct clusters of the four BMI groups (R2X = 0.446, R2Y = 0.207, Q2 = 0.142) (Figure 3E). The p values for permutation testing were pR2Y = 0.001 and pQ2 = 0.001. The Adonis analysis also showed an overall significant difference among the four BMI groups (Total: R2 = 0.07, p = 0.001). However, no significant difference was observed between the underweight and normal groups, suggesting that facial morphology differs only between certain BMI groups in the Han Chinese population (Table S7). We provided three examples of facial features with VIP > 1 that could well distinguish the BMI groups (Figure 3F). These results were further replicated in both male and female sub-datasets (Figure S3, Tables S8 and S9).

PFA and DDA characterize different aspects of facial phenotypes

To further evaluate the performance of PFA and DDA, we adopted the criteria importance through inter-criteria correlation (CRITIC) method19 to estimate the objective weights (OWs) for all facial phenotypes extracted using the two approaches. As shown in Figure 4A, the average OW obtained from PFA was significantly greater than that from DDA (p < 2.2 × 10^−16). By ranking the OWs for each facial phenotype, we next compared PFA and DDA in the top 10, 100, or 1,000 facial phenotypes (Figure 4B). The results showed that the average weight of the top 10 facial phenotypes was smaller in the PFA compared with the DDA (p = 1.6 × 10^−4). This indicated that, among the 10 highest-weighted phenotypes, the DDA explained facial phenotypic variation better than the PFA. However, the situation was different for the top 100 and 1,000 phenotypes, for which the average OWs for PFA were larger than those for the DDA (top 100: p = 1.1 × 10^−3; top 1,000: p < 2.2 × 10^−16). Generally, we discovered an overall greater proportion of differentiated facial phenotypes when using PFA than DDA for different sex, age, and BMI groups (Figure 4C). On the other hand, we discovered an overall smaller proportion of the common facial phenotypes with PFA than DDA (Figure 4D). These patterns were also observed using the Adonis analysis (Figure 4E).
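
For readers unfamiliar with CRITIC, the following Python sketch shows the standard formulation of the method: each criterion is min-max scaled, and its weight is proportional to its standard deviation multiplied by the sum of (1 − r) over its correlations with all criteria. Treating each phenotype as a criterion and each individual as an observation is an assumption about how the method was applied here, and the function names and toy values are hypothetical.

```python
from statistics import pstdev


def min_max(column):
    """Scale a non-constant column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def critic_weights(columns):
    """CRITIC objective weights; `columns` holds one list of values per trait."""
    scaled = [min_max(c) for c in columns]
    information = []
    for cj in scaled:
        # Conflict with other traits: sum of (1 - r); the self term contributes 0.
        conflict = sum(1 - pearson(cj, ck) for ck in scaled)
        # Contrast intensity (standard deviation) times conflict.
        information.append(pstdev(cj) * conflict)
    total = sum(information)
    return [c / total for c in information]


# Hypothetical toy traits measured in four individuals.
traits = [
    [1.0, 2.0, 3.0, 4.0],  # trait A
    [2.0, 4.0, 6.0, 8.0],  # trait B, perfectly correlated with A
    [4.0, 1.0, 3.0, 2.0],  # trait C, negatively correlated with A and B
]
print([round(w, 3) for w in critic_weights(traits)])  # [0.25, 0.25, 0.5]
```

In this toy case the two redundant traits share a lower weight, while the trait that carries different information receives the largest weight. Using the sample rather than the population standard deviation would rescale every term by the same factor and leave the normalized weights unchanged.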

Figure 4.


Comprehensive evaluation of the phenotype-first approach (PFA) and the data-driven approach (DDA)

(A) Objective weights of PFA and DDA. The dotted line represents the mean weight.

(B) Top 10, 100, and 1,000 objective weights of the PFA and DDA. The dotted line represents the mean weight.

(C) Heterogeneous ratios in the PFA and DDA.

(D) Homogenous ratios in the PFA and DDA. Age all and BMI all mean a combination of male and female data. All factors mean sex, age, and BMI were considered.

(E) The Adonis tests of the top 10, 100, and 1,000 smallest p values for sex, age, and BMI differences. Colored dots/bars represent each phenotype: phenotype-first (red) and data-driven (green).

Unclear biological explanations of DDA phenotypes can be explained by PFA phenotypes

To provide a concrete meaning of the DDA phenotype in the context of the traditional understanding of facial morphology, we attempted to link DDA phenotypes to PFA phenotypes with explicit biological or physical explanations. We identified several such links using correlation analysis and local area mapping. For example, the DDA phenotype Gau_MoranI17 was associated with three PFA phenotypes for the morphological change of the forehead slope (Figure 5A). Similarly, Module6_pc1 using principal component analysis could be explained by four PFA phenotypes related to the alae nasi projection (Figure 5B). The Module21_pc8, with a low-rank OW, was strongly correlated with three angular phenotypes representing the chin projection in the morphological observation (Figure 5C). These cases demonstrated a prevalent corresponding relationship between PFA and DDA phenotypes.

Figure 5.


Example of mapping relationships among morphological observations, phenotype-first approach (PFA), and data-driven approach (DDA) phenotypes

(A) The biological explanation of Gau_MoranI17.

(B) The biological explanation of Module6_pc1.

(C) The biological explanation of Module21_pc8. The number on the horizontal red line represents the correlation between the two traits. The weight rank represents the weight size of traits obtained using CRITIC. The larger the weight value, the smaller the rank. The traits are abbreviated as the Gaussian curvature Moran index of the 17th module (Gau_MoranI17); the X coordinate direction of the glabella (X1); the Y coordinate direction of the glabella (Y1); the Z coordinate direction of the glabella (Z1); the first principal component of the sixth module (Module6_pc1); the X coordinate direction of the right alare (X14); the angle of the right alare, pronasale, and left superior alar groove (Ang_760a); the angle of the pronasale, right alare, and left alare (Ang_752b); the angle of the right subalare, pronasale, and left subalare (Ang_775a); the eighth principal component of the 21st module (Module21_pc8); the angle of the pronasale, sublabiale, and pogonion (Ang_676b); the angle of the subnasale, stomion, and pogonion (Ang_870b); and the angle of the labral superius, sublabiale, and pogonion (Ang_1117b).

Discussion

In this study, using both PFA and DDA, we analyzed 14,838 facial morphological phenotypes in the Han Chinese population. The findings indicated that there was a broad correlation among PFA phenotypes, but a lower correlation among DDA phenotypes (Figure 1). The nature of the DDA likely resulted in the lower correlation; its computational principle made it easier for each segment to exist independently.7 Importantly, we proposed that PFA can better characterize the highly differentiated facial traits, while DDA can better characterize the shared facial traits among various groups defined based on sex, age, and BMI.

It has been well documented that facial features vary based on sex and age. The timing and duration of surges in facial growth tend to differ between males and females and between populations, contributing to overall facial variation.20,21,22 Based on 3D image analysis, a previous study reported that women’s noses were shorter and their nose tips more pronounced than men’s.23 Interestingly, another physical anthropological study of the Han Chinese also showed that males had thinner eyes and higher nose wings, while females had more slanted eyes, more prominent cheekbones, a more prominent nose tip, and more Mongolian folds.24 In this study, we observed significant sex differences in most phenotypes; in particular, males showed larger mean values than females in 9,219 of 14,838 phenotypes.

With respect to age, a previous study in the Korean population reported that the height and thickness of the skull decreased with age, while the length of the skull did not change significantly.25 Another study based on the Malaysian population samples reported significant differences in facial phenotypes among age groups.26 Regarding the Han Chinese population, we found negative correlations with age in several facial phenotypes, including the head width, head circumference, lip height, face width, head length height index, and mouth width index, while the mouth width, morphological surface height, nose width, and upper lip skin height correlated with age positively.27 Here, we showed that age played a critical role in the phenotypic diversity of the facial morphology for both the PFA and DDA. Overall, facial phenotypes largely correlated with sex, age, and BMI.

In the preceding analyses, the goal was to investigate whether different types of phenotypes were correlated with basic variables. Utilizing several analytic strategies, our results demonstrated that the 15 types of facial phenotypes differed across basic variables. The next natural question is whether some phenotypes do not vary with any of the basic variables. Therefore, we screened for phenotypes that were non-significant in the preceding analyses. The results showed that 3,711 facial phenotypes had no significant differences across basic variables (p ≥ 3.37 × 10^−6), among which 1,836 phenotypes were obtained through DDA (Figure S4). In addition, we identified 259 facial phenotypes showing no considerable correlation with these three variables by an iterative Adonis test, of which 220 were DDA phenotypes. The Adonis test confirmed that, regardless of sex, age, or BMI group, the variance explained (R2) was very small, and the p values also verified that there was no difference among subgroups (Tables S10 and S11). These results indicated the complexity of the mechanisms and driving forces of facial morphological diversity.

Several studies have already applied either PFA or DDA to the exploration of the relationship between genes and phenotypes.5,6,9,28,29,30,31,32,33,34,35,36,37,38,39,40 However, to the best of our knowledge, the present study is the first to evaluate how well PFA and DDA characterize facial features. In this study, we found that the PFA had a greater average weight than the DDA, indicating that the PFA could better characterize facial features and also better explain the data conditional on factors such as sex, age, and BMI. The PFA could extract more heterogeneous features, while the DDA could extract more homogenous features. Therefore, the PFA is a knowledge-based method that is valuable in physical anthropological studies, while DDA is a promising approach for knowledge-free phenotyping. We propose that both PFA and DDA should be considered not only for facial phenotypes but also for the study of other physical anthropological traits, with a well-designed analysis considering the pros and cons of the two approaches.

Limitations of the study

In future studies, more efforts are needed to collect richer phenotype data to facilitate facial morphological studies, which will further improve the power of locating genes associated with facial phenotypes. It would be interesting to expand the study into non-Han-Chinese populations to explore whether these findings can be broadly applied.

STAR★Methods

Key resources table

REAGENT or RESOURCE | SOURCE | IDENTIFIER
Software and algorithms
FaceXenios | GFM | NA
MATLAB R2016b | MathWorks | https://www.mathworks.com/products/matlab.html
RStudio (R version 3.5.1) | The R Foundation | https://www.r-project.org
SIMCA-P 14.0 | Umetrics | NA
Adobe Illustrator CS6 | Adobe | NA

Resource availability

Lead contact

Further information and requests for resources should be directed to and will be fulfilled by the lead contact, Menghan Zhang (mhzhang@fudan.edu.cn).

Materials availability

This study did not generate reagents.

Data and code availability

  • All data reported in this paper will be shared by the lead contact upon request.

  • The paper does not report the original code.

  • Any additional information required to re-analyze the data reported in this paper is available from the lead contact upon request.

Experimental model and study participant details

Participants

A total of 2,379 Han Chinese individuals, consisting of 904 males and 1,475 females aged 17 to 83 (mean = 48.9, SD = 12.7), were recruited over five years in China for the collection of 3D data and other relevant information. In addition to 3D data, each participant's age, sex, height, and weight were also recorded. Informed consent was obtained from all individual participants included in the study. The sample collection process for this study was approved by the Ethics Committee of Human Genetic Resources of School of Life Sciences, Fudan University, Shanghai (14117) and has been performed in accordance with the ethical standards as laid down in the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards.

Method details

3D facial image acquisition and preprocessing

The GFM FaceScan 3D System, developed by GFMesstechnik GmbH (Teltow, Germany), was used to obtain 3D images. The system was calibrated against a target image, containing a known pattern of dots on a white background, at the beginning and regular intervals during each acquisition session. This ensured that each 3D facial image had accurate dimensions. Individuals with a history of craniofacial trauma, congenital malformation, or surgery were excluded. Before scanning, an experienced operator marked critical bone structures on the participant's face with an erasable pen for further annotation.41 During image capture, participants were instructed to tilt their heads slightly upward, maintain a neutral facial expression with their mouths closed and eyes open, and gaze directly forward. Images with incorrect characteristics were discarded, and new images were taken to meet the set criteria.41 Each image was composed of a 3D triangular mesh, with about 50,000 vertices representing the frontal facial surface and the corresponding texture map. To account for variations in the original pose and position of the meshes, the acquired 3D images were transformed to a common orientation under the same coordinate space. The preprocessing pipeline was performed using the GFM FaceScan 3D System.

Dissecting facial phenotypes

This paper implements two approaches for 3D facial morphological phenotyping to compare their differences in characterizing facial traits. The first approach, PFA, was described in detail in our previous study.42 Briefly, one trained operator manually placed 26 landmarks across the facial surface twice. The coordinate values (x, y, and z) of these 26 facial landmarks were recorded for each 3D facial image, resulting in a total of 78 coordinate values. Generalized Procrustes analysis standardized the location and orientation of all point configurations.43 The reliability of image capture and manual landmarking has been reported in our previous study.42 Using RStudio (R version 3.5.1)44 and MATLAB R2016b, a total of 11,557 phenotype-first measurements were calculated, including the point coordinates, mean curvatures, Gaussian curvatures, Euclidean distances, Manhattan distances, geodesic distances,45 proportion indices, angular measurements, triangle area measurements, voluminal measurements, and surface area measurements. The second approach, DDA, was performed using the pipeline described by Claes et al.7 The global-to-local facial phenotyping partitioned each facial surface into 127 segments, each of which consisted of several point clouds and was represented by multiple dimensions of variation (principal components, PCs). The extracted phenotypes included the principal components of the module, the surface area of the module, the Moran's I of the module Z coordinate, the Moran's I of the module mean curvature, and the Moran's I of the module Gaussian curvature. A total of 3,281 data-driven phenotypes were obtained in our previous study42 (Figure S5 and Table S12).
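
Several of the landmark-based measurements listed above reduce to elementary geometry on the (x, y, z) coordinates. The Python sketch below shows how a Euclidean distance, a Manhattan distance, an angle, and a triangle area could be derived from three landmarks. The coordinates are invented, the function names are hypothetical, and taking the angle at the middle landmark is an assumption, since the source does not state which vertex each angular phenotype uses. Geodesic distances, curvatures, volumes, and surface areas need the full surface mesh and are not covered.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two 3D landmarks."""
    return math.dist(p, q)


def manhattan(p, q):
    """Sum of absolute coordinate differences between two 3D landmarks."""
    return sum(abs(a - b) for a, b in zip(p, q))


def angle_deg(a, vertex, b):
    """Angle in degrees at `vertex` between the rays towards a and b."""
    u = [x - v for x, v in zip(a, vertex)]
    w = [x - v for x, v in zip(b, vertex)]
    cos_t = sum(x * y for x, y in zip(u, w)) / (math.hypot(*u) * math.hypot(*w))
    # Clamp to [-1, 1] to guard against rounding just outside the valid range.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def triangle_area(a, b, c):
    """Area of the triangle spanned by three 3D landmarks (half the cross product norm)."""
    u = [x - y for x, y in zip(b, a)]
    w = [x - y for x, y in zip(c, a)]
    cross = (
        u[1] * w[2] - u[2] * w[1],
        u[2] * w[0] - u[0] * w[2],
        u[0] * w[1] - u[1] * w[0],
    )
    return 0.5 * math.hypot(*cross)


# Invented coordinates (arbitrary units) for three hypothetical landmarks.
pronasale = (0.0, 0.0, 20.0)
left_cheilion = (25.0, -40.0, 0.0)
left_alar_groove = (15.0, 5.0, 8.0)

print(26 * 3)  # 78 coordinate values from 26 landmarks
print(round(euclidean(pronasale, left_cheilion), 2))
print(round(manhattan(pronasale, left_cheilion), 2))
print(round(angle_deg(left_cheilion, pronasale, left_alar_groove), 2))
print(round(triangle_area(pronasale, left_cheilion, left_alar_groove), 2))
```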

Quantification and statistical analysis

Spearman's rho, a nonparametric counterpart of the Pearson correlation, was used to analyze the associations between facial phenotypes and sex, age, and BMI. P values were adjusted for multiple testing using the Holm-Bonferroni method;46 for example, the α threshold for statistical significance for 14,838 comparisons was determined to be 3.37 × 10^−6 (i.e., α = 0.05/14,838). Multidimensional scaling (MDS) and t-distributed stochastic neighbor embedding (t-SNE) were used to visualize the similarity and dissimilarity among different types of phenotypes.47,48 We used SIMCA-P software, version 14.0 (Umetrics, Umea, Sweden), to identify key phenotypes that could discriminate between different subgroups. Partial least squares-discriminant analysis (PLS-DA)49 was used to investigate the differences between subgroups and to identify the discriminative phenotypes responsible for the separation of the groups. The statistical models were validated using the default internal cross-validation method (10-fold CV) and evaluated by permutation test statistics. The quality of the models was described by the R2X, R2Y, and Q2 parameters. The R2 values (R2X and R2Y) give the proportion of variance in the data explained by the models, while the Q2 value is an internal cross-validation parameter that indicates the predictability of the model.
Furthermore, the contributions of variates were evaluated in the classification models using the variable importance in projection (VIP) score.50 Variables with VIP values > 1.0 were considered to contribute strongly to discrimination, whereas those with values < 1.0 were considered to contribute little.51 Permutational multivariate analysis of variance (PERMANOVA), also known as Adonis, was utilized to test the significant differences in phenotypic composition among various subgroups and to determine the extent to which different grouping factors explained sample differences.52 The PERMANOVA procedures were implemented using the adonis function of the vegan package53 in R. Canonical correlation analysis (CCA),54 as implemented in the function canoncorr in MATLAB R2016b, was used as a straightforward multivariate testing framework. CCA was used to extract the linear combination of basic variables from each facial segment and calculate the weights of the three components. Objective weights of the PFA and the DDA were examined based on the criteria importance through inter-criteria correlation (CRITIC) to detect which approach could better characterize facial features.19
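
To make the PERMANOVA quantities concrete, the following Python sketch computes R2, the pseudo-F statistic, and a permutation p value for a single grouping factor from a distance matrix. It is a generic illustration, not the vegan implementation or the authors' code: the function names, the Euclidean toy distances, the seed, and the use of 999 permutations are all assumptions.

```python
import random


def sums_of_squares(dist, labels):
    """Return (total, within-group) sums of squares from a distance matrix."""
    n = len(labels)
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for group in set(labels):
        members = [i for i, lab in enumerate(labels) if lab == group]
        ss = sum(dist[i][j] ** 2
                 for pos, i in enumerate(members) for j in members[pos + 1:])
        ss_within += ss / len(members)
    return ss_total, ss_within


def permanova(dist, labels, n_perm=999, seed=0):
    """One-factor PERMANOVA: returns (R2, pseudo-F, permutation p value)."""
    n, n_groups = len(labels), len(set(labels))

    def statistics(current_labels):
        ss_total, ss_within = sums_of_squares(dist, current_labels)
        ss_among = ss_total - ss_within
        pseudo_f = (ss_among / (n_groups - 1)) / (ss_within / (n - n_groups))
        return ss_among / ss_total, pseudo_f

    r2, f_observed = statistics(labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between samples and groups
        if statistics(shuffled)[1] >= f_observed:
            hits += 1
    # The observed arrangement counts as one of the permutations.
    return r2, f_observed, (hits + 1) / (n_perm + 1)


# Hypothetical one-dimensional "phenotype" for six individuals in two groups.
values = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
dist = [[abs(a - b) for b in values] for a in values]
labels = ["F", "F", "F", "M", "M", "M"]

r2, pseudo_f, p = permanova(dist, labels)
print(round(r2, 3), round(pseudo_f, 1), p)
```

With 999 permutations the smallest attainable p value is 1/1000 = 0.001, which is consistent with the p = 0.001 values reported in the Results; the number of permutations actually used is not stated in the source.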

Acknowledgments

We are very grateful to the subjects who volunteered for the project. We thank the Fudan University Taizhou Institute of Health Sciences for its role in volunteer recruitment and data entry. We are indebted to the Ministry of Education Key Laboratory of Contemporary Anthropology at Fudan University for its support of the investigators. We also thank the local staff and institutions at recruitment sites for their assistance in sample collection. We are also very grateful to all of the cooperating agencies for generously donating their time to our project and to the present and former lab members who worked tirelessly to make these analyses possible. This study was supported by the Basic Science Center Program (32288101), the National Science and Technology Basic Research Project (2015FY111700 to L.J. and 2023YFC2605400 to S.X.), and the National Natural Science Foundation of China (NSFC) (No. 32271186, 31771325, 32030020, T2122007, and 32070577). We thank LetPub (www.letpub.com) for its linguistic assistance during the preparation of this manuscript.

Author contributions

L.J. conceived the project and provided the main resources. J.T., M.Z., and S.X. supervised the study. H.Q. and J.Y. conducted 3D facial image analysis, manual landmark annotation, and data analysis, and drafted the manuscript. C.S., X.Y., Z.L., J.W., H.G., and S.W. performed research. H.Q., L.J., M.Z., and S.X. revised the manuscript. All authors reviewed the manuscript.

Declaration of interests

The authors declare no competing interests.

Published: February 24, 2024

Footnotes

Supplemental information can be found online at https://doi.org/10.1016/j.isci.2024.109325.

Contributor Information

Menghan Zhang, Email: mhzhang@fudan.edu.cn.

Shuhua Xu, Email: xushua@fudan.edu.cn.

Li Jin, Email: lijin@fudan.edu.cn.

Supplemental information

Document S1. Figures S1–S5 and Tables S4–S10
mmc1.pdf (959KB, pdf)
Table S1. Sex difference of facial phenotypes, related to Figure 2
mmc2.xlsx (759.5KB, xlsx)
Table S2. Correlation analysis of facial phenotypes with age, related to Figure 2
mmc3.xlsx (668.6KB, xlsx)
Table S3. Correlation analysis of facial phenotypes with BMI, related to Figure 2
mmc4.xlsx (560.8KB, xlsx)
Table S11. Total number of non-significant phenotypes, related to Figure 2
mmc5.xlsx (17.8KB, xlsx)
Table S12. Facial phenotypic abbreviations and definitions, related to STAR Methods
mmc6.xlsx (407.8KB, xlsx)

References

  • 1. Liu D., Ban H.J., El Sergani A.M., Lee M.K., Hecht J.T., Wehby G.L., Moreno L.M., Feingold E., Marazita M.L., Cha S., et al. PRICKLE1 x FOCAD Interaction Revealed by Genome-Wide vQTL Analysis of Human Facial Traits. Front. Genet. 2021;12 doi: 10.3389/fgene.2021.674642.
  • 2. Farkas L.G. Raven Press; 1994. Anthropometry of the Head and Face in Clinical Practice.
  • 3. Xi H.J., Chen Z. 2nd. Science Press; 2010. Anthropometric Methods.
  • 4. Boehringer S., van der Lijn F., Liu F., Günther M., Sinigerova S., Nowak S., Ludwig K.U., Herberz R., Klein S., Hofman A., et al. Genetic determination of human facial morphology: links between cleft-lips and normal variation. Eur. J. Hum. Genet. 2011;19:1192–1197. doi: 10.1038/ejhg.2011.110.
  • 5. Liu F., van der Lijn F., Schurmann C., Zhu G., Chakravarty M.M., Hysi P.G., Wollstein A., Lao O., de Bruijne M., Ikram M.A., et al. A Genome-Wide Association Study Identifies Five Loci Influencing Facial Morphology in Europeans. PLoS Genet. 2012;8 doi: 10.1371/journal.pgen.1002932.
  • 6. Cha S., Lim J.E., Park A.Y., Do J.H., Lee S.W., Shin C., Cho N.H., Kang J.O., Nam J.M., Kim J.S., et al. Identification of five novel genetic loci related to facial morphology by genome wide association studies. BMC Genom. 2018;19:481. doi: 10.1186/s12864-018-4865-9.
  • 7. Claes P., Roosenboom J., White J.D., Swigut T., Sero D., Li J., Lee M.K., Zaidi A., Mattern B.C., Liebowitz C., et al. Genome-wide mapping of global-to-local genetic effects on human facial shape. Nat. Genet. 2018;50:414–423. doi: 10.1038/s41588-018-0057-4.
  • 8. Sero D., Zaidi A., Li J., White J.D., Zarzar T.B.G., Marazita M.L., Weinberg S.M., Suetens P., Vandermeulen D., Wagner J.K., et al. Facial recognition from DNA using face-to-DNA classifiers. Nat. Commun. 2019;10:2557. doi: 10.1038/s41467-019-10617-y.
  • 9. White J.D., Indencleef K., Naqvi S., Eller R.J., Hoskens H., Roosenboom J., Lee M.K., Li J., Mohammed J., Richmond S., et al. Insight into the genetic architecture of the human face. Nat. Genet. 2021;53:45–53. doi: 10.1038/s41588-020-00741-7.
  • 10. Hoskens H., Liu D., Naqvi S., Lee M.K., Eller R.J., Indencleef K., White J.D., Li J., Larmuseau M.H.D., Hens G., et al. 3D facial phenotyping by biometric sibling matching used in contemporary genomic methodologies. PLoS Genet. 2021;17 doi: 10.1371/journal.pgen.1009528.
  • 11. Liu D., Alhazmi N., Matthews H., Lee M.K., Li J., Hecht J.T., Wehby G.L., Moreno L.M., Heike C.L., Roosenboom J., et al. Impact of low-frequency coding variants on human facial shape. Sci. Rep. 2021;11:748. doi: 10.1038/s41598-020-80661-y.
  • 12. Ahlqvist E., Storm P., Käräjämäki A., Martinell M., Dorkhan M., Carlsson A., Vikman P., Prasad R.B., Aly D.M., Almgren P., et al. Novel subgroups of adult-onset diabetes and their association with outcomes: a data-driven cluster analysis of six variables. Lancet Diabetes Endocrinol. 2018;6:361–369. doi: 10.1016/S2213-8587(18)30051-2.
  • 13. Seymour C.W., Kennedy J.N., Wang S., Chang C.C.H., Elliott C.F., Xu Z., Berry S., Clermont G., Cooper G., Gomez H., et al. Derivation, validation, and potential treatment implications of novel clinical phenotypes for sepsis. JAMA. 2019;321:2003–2017. doi: 10.1001/jama.2019.5791.
  • 14. Verdonschot J.A.J., Merlo M., Dominguez F., Wang P., Henkens M.T.H.M., Adriaens M.E., Hazebroek M.R., Masè M., Escobar L.E., Cobas-Paz R., et al. Phenotypic clustering of dilated cardiomyopathy patients highlights important pathophysiological differences. Eur. Heart J. 2021;42:162–174. doi: 10.1093/eurheartj/ehaa841.
  • 15. Sweatt A.J., Hedlin H.K., Balasubramanian V., Hsi A., Blum L.K., Robinson W.H., Haddad F., Hickey P.M., Condliffe R., Lawrie A., et al. Discovery of distinct immune phenotypes using machine learning in pulmonary arterial hypertension. Circ. Res. 2019;124:904–919. doi: 10.1161/CIRCRESAHA.118.313911.
  • 16. Segar M.W., Patel K.V., Ayers C., Basit M., Tang W.H.W., Willett D., Berry J., Grodin J.L., Pandey A. Phenomapping of patients with heart failure with preserved ejection fraction using machine learning-based unsupervised cluster analysis. Eur. J. Heart Fail. 2020;22:148–158. doi: 10.1002/ejhf.1621.
  • 17. Rui W., Zhang S., Shi H., Sheng Y., Zhu F., Yao Y., Chen X., Cheng H., Zhang Y., Aili A., et al. Deep Learning-Assisted Quantitative Susceptibility Mapping as a Tool for Grading and Molecular Subtyping of Gliomas. Phenomics. 2023;3:243–254. doi: 10.1007/s43657-022-00087-6.
  • 18. Li Y., Li F., Han S., Ning J., Su P., Liu J., Qu L., Huang S., Wang S., Li X., Li X. Performance of 18F-DCFPyL PET/CT in Primary Prostate Cancer Diagnosis, Gleason Grading and D'Amico Classification: A Radiomics-Based Study. Phenomics. 2023;3:576–585. doi: 10.1007/s43657-023-00108-y.
  • 19. Diakoulaki D., Mavrotas G., Papayannakis L. Determining objective weights in multiple criteria problems: The critic method. Comput. Oper. Res. 1995;22:763–770.
  • 20. Kau C.H., Richmond S., Zhurov A., Ovsenik M., Tawfik W., Borbely P., JD E. Use of 3-dimensional surface acquisition to study facial morphology in 5 populations. Am. J. Orthod. Dentofacial Orthop. 2010;137:S56.e51–S56.e59. doi: 10.1016/j.ajodo.2009.04.022.
  • 21. Hopman S.M.J., Merks J.H.M., Suttie M., Hennekam R.C.M., Hammond P. Face shape differs in phylogenetically related populations. Eur. J. Hum. Genet. 2014;22:1268–1271. doi: 10.1038/ejhg.2013.289.
  • 22. Richmond R.C., Sharp G.C., Herbert G., Atkinson C., Taylor C., Bhattacharya S., Campbell D., Hall M., Kazmi N., Gaunt T., et al. The long-term impact of folic acid in pregnancy on offspring DNA methylation: follow-up of the Aberdeen Folic Acid Supplementation Trial (AFAST). Int. J. Epidemiol. 2018;47:928–937. doi: 10.1093/ije/dyy032.
  • 23. Hennessy R.J., Mclearie S., Kinsella A., Waddington J.L. Facial surface analysis by 3D laser scanning and geometric morphometrics in relation to sexual dimorphism in cerebral-craniofacial morphogenesis and cognitive function. J. Anat. 2005;207:283–295. doi: 10.1111/j.1469-7580.2005.00444.x.
  • 24. Zheng L., Li Y., H X. Science Press; 2017. Study on Physical Anthropology of Han Chinese.
  • 25. Kim H.J., Kim K.D., Choi J.H., Hu K.S., Oh H.J., Kang M.K., Hwang Y.I. Differences in the metric dimensions of craniofacial structures with aging in Korean males and females. Korean J. Phys. Anthropol. 1998;11:197–212.
  • 26. Al-Khatib A.R., Rajion Z.A., Masudi S.M., Hassan R., Anderson P.J., Townsend G.C. Stereophotogrammetric analysis of nasolabial morphology among Asian Malays: influence of age and sex. Cleft Palate. Craniofac. J. 2012;49:463–471. doi: 10.1597/11-151.
  • 27. Li Y., Zheng L., Yu K., Lu S., Zhang X., Li Y., Wang Y., Xue H., Deng W. Variation of head and facial morphological characteristics with increased age of Han in Southern China. Chin. Sci. Bull. 2013;58:517–524.
  • 28. Paternoster L., Zhurov A.I., Toma A.M., Kemp J.P., St Pourcain B., Timpson N.J., McMahon G., McArdle W., Ring S.M., Smith G.D., et al. Genome-wide association study of three-dimensional facial morphology identifies a variant in PAX3 associated with nasion position. Am. J. Hum. Genet. 2012;90:478–485. doi: 10.1016/j.ajhg.2011.12.021.
  • 29. Cole J.B., Manyama M., Kimwaga E., Mathayo J., Larson J.R., Liberton D.K., Lukowiak K., Ferrara T.M., Riccardi S.L., Li M., et al. Genome wide association study of African children identifies association of SHIP1 and PDE8A with facial size and shape. PLoS Genet. 2016;12 doi: 10.1371/journal.pgen.1006174.
  • 30. Shaffer J.R., Orlova E., Lee M.K., Leslie E.J., Raffensperger Z.D., Heike C.L., Cunningham M.L., Hecht J.T., Kau C.H., Nidey N.L., et al. Genome-wide association study reveals multiple loci influencing normal human facial morphology. PLoS Genet. 2016;12 doi: 10.1371/journal.pgen.1006149.
  • 31. Pickrell J.K., Berisa T., Liu J.Z., Ségurel L., Tung J.Y., Hinds D.A. Detection and interpretation of shared genetic influences on 42 human traits. Nat. Genet. 2016;48:709–717. doi: 10.1038/ng.3570.
  • 32. Lee M.K., Shaffer J.R., Leslie E.J., Orlova E., Carlson J.C., Feingold E., Marazita M.L., Weinberg S.M. Genome-wide association study of facial morphology reveals novel associations with FREM1 and PARK2. PLoS One. 2017;12 doi: 10.1371/journal.pone.0176566.
  • 33. Crouch D.J.M., Winney B., Koppen W.P., Christmas W.J., Hutnik K., Day T., Meena D., Boumertit A., Hysi P., Nessa A., et al. Genetics of the human face: Identification of large-effect single gene variants. Proc. Natl. Acad. Sci. USA. 2018;115:676–685. doi: 10.1073/pnas.1708207114.
  • 34. Qiao L., Yang Y., Fu P., Hu S., Zhou H., Peng S., Tan J., Lu Y., Lou H., Lu D., et al. Genome-wide variants of Eurasian facial shape differentiation and a prospective model of DNA based face prediction. J. Genet. Genom. 2018;45:419–432. doi: 10.1016/j.jgg.2018.07.009.
  • 35. Wu W., Zhai G., Xu Z., Hou B., Liu D., Liu T., Liu W., Ren F. Whole-exome sequencing identified four loci influencing craniofacial morphology in northern Han Chinese. Hum. Genet. 2019;138:601–611. doi: 10.1007/s00439-019-02008-6.
  • 36. Li Y., Zhao W., Li D., Tao X., Xiong Z., Liu J., Zhang W., Ji A., Tang K., Liu F., Li C. EDAR, LYPLAL1, PRDM16, PAX3, DKK1, TNFSF12, CACNA2D3, and SUPT3H gene variants influence facial morphology in a Eurasian population. Hum. Genet. 2019;138:681–689. doi: 10.1007/s00439-019-02023-7.
  • 37. Xiong Z., Dankova G., Howe L.J., Lee M.K., Hysi P.G., de Jong M.A., Zhu G., Adhikari K., Li D., Li Y., et al. Novel genetic loci affecting facial shape variation in humans. Elife. 2019;8 doi: 10.7554/eLife.49898.
  • 38. Huang Y., Li D., Qiao L., Liu Y., Peng Q., Wu S., Zhang M., Yang Y., Tan J., Xu S., et al. A genome-wide association study of facial morphology identifies novel genetic loci in Han Chinese. J. Genet. Genom. 2021;48:198–207. doi: 10.1016/j.jgg.2020.10.004.
  • 39. Liu C., Lee M.K., Naqvi S., Hoskens H., Liu D., White J.D., Indencleef K., Matthews H., Eller R.J., Li J., et al. Genome scans of facial features in East Africans and cross-population comparisons reveal novel associations. PLoS Genet. 2021;17 doi: 10.1371/journal.pgen.1009695.
  • 40. Adhikari K., Fuentes-Guajardo M., Quinto-Sánchez M., Mendoza-Revilla J., Camilo Chacón-Duque J., Acuña-Alonzo V., Jaramillo C., Arias W., Lozano R.B., Pérez G.M., et al. A genome-wide association scan implicates DCHS2, RUNX2, GLI3, PAX1 and EDAR in human facial variation. Nat. Commun. 2016;7 doi: 10.1038/ncomms11616.
  • 41. Aynechi N., Larson B.E., Leon-Salazar V., Beiraghi S. Accuracy and precision of a 3D anthropometric facial analysis with and without landmark labeling before image acquisition. Angle Orthod. 2011;81:245–252. doi: 10.2319/041810-210.1.
  • 42. Qiao H., Tan J., Wen S., Zhang M., Xu S., Jin L. De novo dissecting the three-dimensional facial morphology of 2,379 Han Chinese individuals. Phenomics. 2023:1–12. doi: 10.1007/s43657-023-00109-x.
  • 43. Bookstein F.L. Cambridge University Press; 1992. Morphometric Tools for Landmark Data: Geometry and Biology.
  • 44. R Core Team. R Foundation for Statistical Computing; 2013. R: A Language and Environment for Statistical Computing.
  • 45. Surazhsky V., Surazhsky T., Kirsanov D., Gortler S.J., Hoppe H. Fast exact and approximate geodesics on meshes. ACM Trans. Graph. 2005;24:553–560.
  • 46. Holm S. A simple sequentially rejective multiple test procedure. Scand. J. Stat. 1979;6:65–70.
  • 47. Buja A., Swayne D.F., Littman M.L., Dean N., Hofmann H., Chen L. Data Visualization With Multidimensional Scaling. J. Comput. Graph Stat. 2008;17:444–472.
  • 48. van der Maaten L., Hinton G. Visualizing Data using t-SNE. J. Mach. Learn. Res. 2008;9:2579–2605.
  • 49. Lee L.C., Liong C.Y., Jemain A.A. Partial least squares-discriminant analysis (PLS-DA) for classification of high-dimensional (HD) data: a review of contemporary practice strategies and knowledge gaps. Analyst. 2018;143:3526–3539. doi: 10.1039/c8an00599k.
  • 50. Wheelock Å.M., Wheelock C.E. Trials and tribulations of 'omics data analysis: assessing quality of SIMCA-based multivariate models using examples from pulmonary medicine. Mol. Biosyst. 2013;9:2589–2596. doi: 10.1039/c3mb70194h.
  • 51. Chong I.-G., Jun C.-H. Performance of some variable selection methods when multicollinearity is present. Chemometr. Intell. Lab. Syst. 2005;78:103–112.
  • 52. Anderson M.J. A new method for non-parametric multivariate analysis of variance. Austral Ecol. 2001;26:32–46.
  • 53. Oksanen J., Simpson G.L., Guillaume Blanchet F., Kindt R., Legendre P., Minchin P.R., O'Hara R.B., Peter S., Stevens M.H.H., Szoecs E., et al. 2022. vegan: Community Ecology Package. R package version.
  • 54. Thompson B. 2005. Canonical Correlation Analysis (Encyclopedia of Statistics in Behavioral Science).

Data Availability Statement

  • All data reported in this paper will be shared by the lead contact upon request.

  • The paper does not report the original code.

  • Any additional information required to re-analyze the data reported in this paper is available from the lead contact upon request.
