Skip to main content
Scientific Reports logoLink to Scientific Reports
. 2020 Oct 12;10:17001. doi: 10.1038/s41598-020-73971-8

Spouses’ faces are similar but do not become more similar with time

Pin Pin Tea-makorn 1, Michal Kosinski 2,
PMCID: PMC7550338  PMID: 33046769

Abstract

The widely disseminated convergence in physical appearance hypothesis posits that long-term partners’ facial appearance converges with time due to their shared environment, emotional mimicry, and synchronized activities. Although plausible, this hypothesis is incompatible with empirical findings pertaining to a wide range of other traits—such as personality, intelligence, attitudes, values, and well-being—in which partners show initial similarity but do not converge over time. We solve this conundrum by reexamining this hypothesis using the facial images of 517 couples taken at the beginning of their marriages and 20 to 69 years later. Using two independent methods of estimating their facial similarity (human judgment and a facial recognition algorithm), we show that while spouses’ faces tend to be similar at the beginning of marriage, they do not converge over time, bringing facial appearance in line with other personal characteristics.

Subject terms: Computational biology and bioinformatics, Data mining, Machine learning, Psychology, Human behaviour

Introduction

What predisposes two people to form and maintain a long-term romantic relationship is a fundamental question with critical consequences for the individuals involved, their families, and entire societies. While we do not yet have a satisfactory answer, one thing is clear: Romantic partners tend to be similar in a wide range of characteristics, ranging from physical and physiological to demographics and psychological1. Long-term romantic partners have been shown to be similar in terms of height, weight, health, diet, age, physical attractiveness, education, ability, intelligence, psychological well-being, personality, attitudes, values, religion, social class, ethnicity, lifestyle, and many other traits28.

What drives romantic partners’ similarity? Two sets of mechanisms have been proposed to explain it. First, partners may be similar from the outset of their relationship due to homophily (i.e., preference for similar others)9,10, the mechanics of the dating market (e.g., having to settle for a partner with a similar level of attractiveness)11, or social homogamy (i.e., being surrounded—socially and geographically—by similar others)8. Second, partners may become more similar with time due to repeated interactions, synchronized routines, shared environment12,13, and/or attrition (i.e., less similar couples breaking up, thus boosting the average similarity of the surviving ones)14,15. Although both sets of mechanisms seem plausible, empirical research consistently suggests that couples are similar to begin with but do not become any more similar with time. Long-term couples, for example, exhibit similarity patterns parallel to new couples16, and are no more similar in terms of attitudes, values, intelligence, personality, psychological well-being, and interests3,4,1719. Also, partners’ personality and interests are similar even before they met (online) for the first time20. These and other analogous findings led most scholars to conclude that shared life experiences and circumstances play a significant role in maintaining, rather than increasing, couples’ initial similarity2,21.

There is, however, one trait that does not seem to follow this general pattern: facial appearance. In their seminal paper, Zajonc, Adelmann, Murphy, and Niendenthal12 showed that spouses’ faces were not similar at the outset of marriage but became more similar with time. Moreover, they found that the degree of convergence positively correlated with couples’ ratings of marriage quality. Their convergence in physical appearance hypothesis posits that as long-term partners tend to occupy the same environments, engage in the same activities, eat the same food, and mimic each other’s emotional expressions—and as these factors can also influence facial features—spouses’ facial appearances should converge with time. For example, if the partners smile a lot—and make each other smile—they should co-develop similar wrinkle patterns (smile lines)22.

Importantly, Zajonc et al.’s reasoning12—that appearance converges as a function of shared actions and environment, and emotional mimicry—should apply to other personal characteristics as well. How does one reconcile the convergence in facial appearance with the lack thereof in the context of virtually all other traits, such as interests, personality, intelligence, attitudes, values, and well-being? A closer look at the literature reveals that while the convergence in physical appearance hypothesis is one of the tenets of current psychological science and has been widely disseminated through textbooks23, books24,25, and landmark papers26,27, it has virtually no empirical support. Zajonc et al.’s study12, while elegantly designed, was based on an extremely small sample of 12 married heterosexual couples. Furthermore, its findings have never been replicated. Two other studies occasionally cited in support of facial convergence (Hinsz28 and Griffith and Kunz29) neither tested this hypothesis nor provided any support for it. Both studies presented evidence for facial homogamy, i.e., spouses’ tendency to have similar faces, but provided no support for the increase in facial resemblance over time. Hinsz28 found that romantic partners’ faces were more similar than those of random pairs of men and women, yet couples married for 25 years were no more similar than recently engaged ones. Griffith and Kunz29 showed that student raters could match spouses’ faces at a level above chance, yet found “no significant trend in growing to look alike as persons live together as husband and wife” (p. 453).

In this work, we aim to validate the physical convergence hypothesis in a large sample (n = 517) of white married heterosexual couples (we were unable to find a large enough sample of homosexual and non-white couples to allow for a meaningful analysis). Two approaches to measuring facial similarity were used: human judges and a modern facial recognition algorithm. Both approaches showed that while spouses’ faces were similar at the outset of their marriage, they did not converge over time.

Methods

The study has been reviewed and approved by Stanford University’s IRB. All methods were carried out in accordance with relevant guidelines and regulations. The preregistration documents can be found at https://aspredicted.org/2fh78.pdf. The Supplementary Information contains the list of and rationale for the post-registration changes to the study design. The materials, data, and code used to compute the results are available at https://osf.io/ekwm7.

Facial images

The facial images of 517 couples were collected from public online sources: 392 newspaper wedding anniversary announcements downloaded from https://www.newspaperarchive.com, 102 Google Search results, and 23 public profiles from Ancestry.com (a genealogy website). Two facial images of each spouse were collected: one taken within 2 years of the wedding, and one taken 20 to 69 years later (the marriage dates and dates on which the photos were taken were extracted from their captions; the average marriage length was 49 years).

Images were processed using Face++ (https://www.faceplusplus.com)—a widely used facial recognition software—to detect facial outlines and head orientation, and to approximate individuals’ age (see Supplementary Fig. S1 for age distribution). We only included images containing faces larger than 120 × 120 pixels and with an absolute value of yaw and pitch below  55° and 24°, respectively. The images were converted into grayscale and cropped around the face to remove the background and non-facial details. Their brightness was corrected using the “auto-adjust colors” function in IrfanView 4.5. The faces were rotated to the vertical position and resized to 224 × 224 pixels (see Fig. 1).

Figure 1.

Figure 1

An example stimulus set (to protect participants’ privacy, we used photos of our colleagues. Their informed consent for publication was obtained).

Stimulus sets

Faces were arranged into 2068 unique stimulus sets (517 couples × two spouses × two time points: at the beginning of the marriage and 20 to 69 years later) following the procedure from Zajonc et al.12 Each face (target) was matched with faces of six other people of the opposite sex (alternatives): the target’s spouse and five random others from our dataset. To control for the effect of age and eyewear, the alternatives had the same eyewear status (glasses or no glasses) and similar approximated age (+ /− 5 years) as the spouse. An example stimulus set is presented in Fig. 1.

Human judges and rankings

Judges (n = 153; from the U.S.), recruited on Amazon Mechanical Turk (AMT; an online crowdsourcing marketplace), were instructed to rank alternative faces from the most (1) to the least (6) closely resembling the target face (see Fig. 1). Ten rankings were obtained for each stimulus set. The spouses’ perceived similarity at a given point in time (at the beginning of the marriage and later) was computed by averaging their ranks across two stimulus sets pertaining to them (husband as a target and wife as a target). The resulting scale ranged from 1 (all judges perceived them as most similar) to 6 (all judges perceived them as least similar). If there was no link between being married and similarity, the spouses’ average rank should equal 3.5. The use of a relative (i.e., ranking) rather than absolute (e.g., Likert scale) measure of facial similarity enabled controlling for the possibility that people’s faces may generally become more (or less) similar with time as they age.

Additionally, following Zajonc et al.’s12 original design, a separate sample of 117 judges recruited on AMT were asked to rank alternatives in terms of their likelihood to be married to the target (the same stimulus as presented in Fig. 1 was used, with “closely resembles” replaced with “likely to be married to”).

Facial recognition algorithm and rankings

An alternative set of results was produced using VGGFace230, a widely used facial recognition deep neural network that was shown to outperform humans in judging facial similarity31. Facial recognition algorithms convert faces into numerical vectors (face descriptors) capturing facial features and compare those vectors across images: The more similar the vectors, the more likely they are to represent the same face. As facial recognition algorithms are aimed at recognizing people across images taken at different times, with different devices, from different angles, and in different circumstances, they tend to capture features that remain stable across age and context, such as facial morphology and complexion. They are as unaffected as possible by transient features such as aging, facial expression, head orientation, hairstyle, and image properties such as background and lighting32. Consequently, they are well suited to the task of quantifying the similarity between faces, while controlling—as much as possible—for transient features.

Following the standard procedure used in facial recognition, cropped facial images were converted into 2048-value-long face descriptors using VGGFace2 in SE-ResNet-50 architecture and L2-normalized. Next, for each stimulus set, the cosine similarity between face vectors of the target and each alternative face was computed. The alternative faces were ranked from the most (1) to the least (6) closely resembling the target face (i.e., the same ranking scale as for human judges).

Statistical analyses

The average similarity ranks of spouses’ faces were compared with the chance value (3.5) using one sample two-tailed t-test to detect homogamy. Paired two-tailed t-tests were used to compare the similarity of spouses’ faces at the beginning of marriage and later to detect the convergence in facial appearance. The average Kendall rank correlation between two randomly selected rankings for each stimulus set was used to measure inter-rater reliability.

Results

Figure 2 shows the similarity ranks produced by human judges (left panel) and VGGFace2 (right panel) at the time of marriage (blue bars) and 20 to 69 years later (green bars). The combined results for all age groups are shown on the gray background. Consistent with the previous studies28,29,3335, we found evidence of homogamy, or spouses’ tendency to have similar faces. At the time of marriage, their average rank was significantly lower than 3.5 (i.e., the rank expected if the alternatives were ranked randomly): 2.75 (95% CI = [2.69, 2.81], one sample t-test t = − 25.08, two-tailed p < 0.001, n = 517) for human judges; and 2.89 (95% CI = [2.76, 3.02], one sample t-test t = − 9.32, two-tailed p < 0.001, n = 517) for VGGFace2.

Figure 2.

Figure 2

The average facial similarity of the spouses at marriage and 20 to 69 years later. Error bars represent 95% confidence intervals (also see Supplementary Table S1 online).

However, we did not find evidence for the convergence in physical appearance hypothesis: Spouses’ faces did not become more similar with time. In fact, according to human judges, spouses’ faces became slightly less similar with time (paired t-test; t = − 3.70, two-tailed p < 0.001, n = 517), though the difference in the rankings was relatively small (Δ = 0.15, 95% CI = [0.07, 0.22]) and was not replicated in the VGGFace2 analysis. The same results were obtained when analyzing data separately for couples married for different lengths of time (Fig. 2): Spouses’ faces tended to be similar but did not become more similar with time, regardless of the time span between the first and the second set of pictures.

Importantly, judgments’ reliability did not vary with subjects’ age or time when the picture was taken: There was no significant difference between the inter-rater reliability for pictures taken at the time of marriage and later (Kendall τmarriage = 0.165; 95% CI = [0.161, 0.168] and τlater = 0.161; 95% CI = [0.157, 0.165]; τmarriage − τlater = 0.004, 95% CI = [− 0.001, 0.009], two-tailed p = 0.95). This indicates that the judges were as consistent when ranking the similarity of faces of young people (taken several decades ago) as the faces of older people (taken more recently).

As in the context of facial similarity (and contrary to Zajonc et al.’s12 findings), there were also no significant differences in judges’ ratings of spouses’ likelihood to be married between facial images taken at the time of marriage and later (paired t-test; t = − 1.51, two-tailed p = 0.13, n = 517; see Supplementary Table S2 for details).

Discussion

We do not find support for the widely disseminated convergence in physical appearance hypothesis: Spouses’ faces are similar but do not converge with time. This brings facial appearance in line with other traits—such as interests, personality, intelligence, attitudes, values, and well-being—which show initial similarity but do not converge over time2.

This study has several limitations. First, we used publicly available images and thus could not control for variance in image properties and self-presentation (such as grooming, facial expression, or biases in selecting images to be publicly shared online). Yet, according to the convergence in physical appearance hypothesis, these factors should amplify the convergence rather than obscure it. Spouses’ tendency to occupy the same environments, engage in the same activities, eat the same food, and—in particular—mimic each other’s emotional expressions should result in convergence in their self-presentation behaviors, and thus more (and not less) similar public facial images. Second, we did not record or control for judges’ age and ethnicity and thus the extent to which their judgments might have been affected by the own-age36 and own-ethnicity37 biases (people’s lower sensitivity when judging the similarity of faces of other ages and ethnic groups). Yet, while the own-ethnicity bias could add noise to our measurements, it is unlikely to moderate the change in similarity over time, as participants’ ethnicity was constant. Also, while the U.S. AMT workers tend to be young38, they were as good at ranking the similarity of faces of young people (taken several decades ago) as the faces of older people (taken more recently). Furthermore, those and other risks to the judges’ accuracy were counterbalanced by the use of two independent measures of facial similarity (human judges and VGGFace2) and the relatively large sample size, enabling the detection of a change in human rankings as small as Δ = 0.17 (with 80% power, α = 0.001), an equivalent of one in six judges increasing a spouse’s rank by just one position. Finally, the validity of our approach and dataset are supported by the successful replication of the well-established effect of people’s tendency to marry similar others (i.e., homogamy).

While the rejection of the convergence in physical appearance hypothesis is surely not as exciting or as cite-worthy as its counterfactual, it solves one of the major conundrums of psychological science and brings us closer to understanding factors predisposing people to form and maintain long-term romantic relationships.

Supplementary information

Supplementary Figure S1. (287.5KB, tiff)

Acknowledgements

We thank Jill O’Nan, Sorathan Chaturapruek, Wiriya Thongsomboon, Varitta Ouiyamaphan, Kawisorn Kamtue, and Natt Srisutthiyakorn for their critical reading of the manuscript, and Youyou Wu for her help with designing the figures.

Author contributions

P.T. and M.K. designed the study and wrote the manuscript. P.T. collected the data and conducted the analysis.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

is available for this paper at 10.1038/s41598-020-73971-8.

References

  • 1.Buss DM. Human mate selection: Opposites are sometimes said to attract, but in fact we are likely to marry someone who is similar to us in almost every variable. Am. Sci. 1985;73:47–51. [Google Scholar]
  • 2.Luo S. Assortative mating and couple similarity: Patterns, mechanisms, and consequences. Soc. Pers. Psychol. Compass. 2017;11:e12337. doi: 10.1111/spc3.12337. [DOI] [Google Scholar]
  • 3.Watson D, et al. Match makers and deal breakers: Analyses of assortative mating in newlywed couples. J. Pers. 2004;72:1029–1068. doi: 10.1111/j.0022-3506.2004.00289.x. [DOI] [PubMed] [Google Scholar]
  • 4.Buss DM. Marital assortment for personality dispositions: Assessment with three different data sources. Behav. Genet. 1984;14:111–123. doi: 10.1007/BF01076408. [DOI] [PubMed] [Google Scholar]
  • 5.Schwartz C, Graff N. Assortative matching among same-sex and different-sex couples in the United States, 1990–2000. Demogr. Res. 2009;21:843–878. doi: 10.4054/DemRes.2009.21.28. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Robinson MR, et al. Genetic evidence of assortative mating in humans. Nat. Hum. Behav. 2017;1:0016. doi: 10.1038/s41562-016-0016. [DOI] [Google Scholar]
  • 7.Vandenberg SG. Assortative mating, or who marries whom? Behav. Genet. 1972;2:127–157. doi: 10.1007/BF01065686. [DOI] [PubMed] [Google Scholar]
  • 8.Epstein E, Guttman R. Mate selection in man: Evidence, theory, and outcome. Biodemogr. Soc. Biol. 1984;31:243–278. doi: 10.1080/19485565.1984.9988579. [DOI] [PubMed] [Google Scholar]
  • 9.Hitsch GJ, Hortaçsu A, Ariely D. What makes you click?—Mate preferences in online dating. Quant. Mark. Econ. 2010;8:393–427. doi: 10.1007/s11129-010-9088-6. [DOI] [Google Scholar]
  • 10.Watson D, Beer A, McDade-Montez E. The role of active assortment in spousal similarity. J. Pers. 2014;82:116–129. doi: 10.1111/jopy.12039. [DOI] [PubMed] [Google Scholar]
  • 11.Xie Y, Cheng S, Zhou X. Assortative mating without assortative preference. Proc. Natl. Acad. Sci. 2015;112:5974–5978. doi: 10.1073/pnas.1504811112. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Zajonc RB, Adelmann PK, Murphy ST, Niedenthal PM. Convergence in the physical appearance of spouses. Motiv. Emot. 1987;11:335–346. doi: 10.1007/BF00992848. [DOI] [Google Scholar]
  • 13.Zajonc RB. Emotion and facial efference: A theory reclaimed. Science. 1985;228:15–21. doi: 10.1126/science.3883492. [DOI] [PubMed] [Google Scholar]
  • 14.Schwartz CR. Earnings inequality and the changing association between spouses’ earnings. Am. J. Sociol. 2010;115:1524–1557. doi: 10.1086/651373. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Schwartz CR. Pathways to educational homogamy in marital and cohabiting unions. Demography. 2010;47:735–753. doi: 10.1353/dem.0.0124. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Luo S. Partner selection and relationship satisfaction in early dating couples: The role of couple similarity. Pers. Individ. Dif. 2009;47:133–138. doi: 10.1016/j.paid.2009.02.012. [DOI] [Google Scholar]
  • 17.Caspi A, Herbener ES. Marital assortment and phenotypic convergence: Longitudinal evidence. Biodemogr. Soc. Biol. 1993;40:48–60. doi: 10.1080/19485565.1993.9988835. [DOI] [PubMed] [Google Scholar]
  • 18.Feng D, Baker L. Spouse similarity in attitudes, personality, and psychological well-being. Behav. Genet. 1994;24:357–364. doi: 10.1007/BF01067537. [DOI] [PubMed] [Google Scholar]
  • 19.Glicksohn J, Golan H. Personality, cognitive style and assortative mating. Pers. Individ. Dif. 2001;30:1199–1209. doi: 10.1016/S0191-8869(00)00103-3. [DOI] [Google Scholar]
  • 20.Gonzaga GC, Carter S, Galen Buckwater J. Assortative mating, convergence, and satisfaction in married couples. Pers. Relatsh. 2010;17:634–644. doi: 10.1111/j.1475-6811.2010.01309.x. [DOI] [Google Scholar]
  • 21.Caspi A, Herbener ES, Ozer DJ. Shared experiences and the similarity of personalities: A longitudinal study of married couples. J. Pers. Soc. Psychol. 1992;62:281–291. doi: 10.1037/0022-3514.62.2.281. [DOI] [PubMed] [Google Scholar]
  • 22.Lemperle G, Holmes RE, Cohen SR, Lemperle SM. A classification of facial wrinkles. Plast. Reconstr. Surg. 2001;108:1735–1750. doi: 10.1097/00006534-200111000-00049. [DOI] [PubMed] [Google Scholar]
  • 23.Gillbert DT, Fiske ST, Lindzey G. The Handbook of Social Psychology. Oxford: Oxford University Press; 1998. [Google Scholar]
  • 24.Earley PC, Ang S. Cultural Intelligence: Individual Interactions Across Cultures. Palo Alto: Stanford University Press; 2003. [Google Scholar]
  • 25.Berger J. Invisible Influence: The Hidden Forces that Shape Behavior. New York: Simon and Schuster; 2016. [Google Scholar]
  • 26.Niedenthal PM. Embodying emotion. Science. 2007;316:1002–1005. doi: 10.1126/science.1136930. [DOI] [PubMed] [Google Scholar]
  • 27.Chartrand TL, Bargh JA. The chameleon effect: The perception–behavior link and social interaction. J. Pers. Soc. Psychol. 1999;76:893–910. doi: 10.1037/0022-3514.76.6.893. [DOI] [PubMed] [Google Scholar]
  • 28.Hinsz VB. Facial resemblance in engaged and married couples. J. Soc. Pers. Relat. 1989;6:223–229. doi: 10.1177/026540758900600205. [DOI] [Google Scholar]
  • 29.Griffiths RW, Kunz PR. Assortative mating: A study of physiognomic homogamy. Soc. Biol. 1973;20:448–453. doi: 10.1080/19485565.1973.9988075. [DOI] [PubMed] [Google Scholar]
  • 30.Cao, Q., Shen, L., Xie, W., Parkhi, O. M. & Zisserman, A. VGGFace2: A dataset for recognising faces across pose and age. In 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018) 67–74 (IEEE, New York, 2018). 10.1109/FG.2018.00020.
  • 31.Parkhi, O. M., Vedaldi, A. & Zisserman, A. Deep face recognition. In Procedings of the British Machine Vision Conference 2015 (eds. Xie, X., Jones, M. W. & Tam, G. K. L.) 41.1–41.12 (British Machine Vision Association, 2015). 10.5244/C.29.41.
  • 32.Kortylewski, A. et al. Empirically analyzing the effect of dataset biases on deep face recognition systems. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2174–217409 (IEEE, New York, 2018). 10.1109/CVPRW.2018.00283.
  • 33.Abel EL, Kruger ML. Facial resemblances between heterosexual, gay, and lesbian couples. Psychol. Rep. 2011;108:688–692. doi: 10.2466/07.PR0.108.3.688-692. [DOI] [PubMed] [Google Scholar]
  • 34.Alvarez L, Jaffe K. Narcissism guides mate selection: Humans mate assortatively, as revealed by facial resemblance, following an algorithm of “self seeking like”. Evol. Psychol. 2004;2:147470490400200. doi: 10.1177/147470490400200123. [DOI] [Google Scholar]
  • 35.Wong YK, Wong WW, Lui KFH, Wong AC-N. Revisiting facial resemblance in couples. PLoS ONE. 2018;13:e0191456. doi: 10.1371/journal.pone.0191456. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Rhodes MG, Anastasi JS. The own-age bias in face recognition: A meta-analytic and theoretical review. Psychol. Bull. 2012;138:146–174. doi: 10.1037/a0025750. [DOI] [PubMed] [Google Scholar]
  • 37.Wiese H, Schweinberger SR. Inequality between biases in face memory: Event-related potentials reveal dissociable neural correlates of own-race and own-gender biases. Cortex. 2018;101:119–135. doi: 10.1016/j.cortex.2018.01.016. [DOI] [PubMed] [Google Scholar]
  • 38.Levay KE, Freese J, Druckman JN. The demographic and political composition of mechanical turk samples. SAGE Open. 2016;6:215824401663643. doi: 10.1177/2158244016636433. [DOI] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary Figure S1. (287.5KB, tiff)

Articles from Scientific Reports are provided here courtesy of Nature Publishing Group

RESOURCES