Abstract
This study investigates the potential role of machine learning (ML) technology for predicting a match, or mutual interest, in the context of speed dating. Modern machine learning technologies (light gradient boosting machine - lgbm, random forest, logistic regression, stochastic gradient descent, k nearest neighbour), exhaustively combined with feature selection methods (filter-based association, filter-based prediction, embedded lgbm, embedded linear, redundancy aware step up wrapper), were applied to a speed dating dataset, and tasked with predicting a match (mutual interest from speed dating participants). Our analysis employed public-domain ML software combined with a public-domain dataset, supporting reproducibility of study findings. Results indicate that ML models can predict a match with 85.4 to 86.4% accuracy. The creation of ethical ML applications in this domain, including those blinded to issues of race, and specific to each gender, is explored as part of this analysis. Results also demonstrate that it is possible to create race-blinded ML models with approximately equal performance to those biased by racial information, thus supporting the creation of more ethical, inclusive, and behavior-focused technologies.
Supplementary Information
The online version contains supplementary material available at 10.1038/s41598-025-25028-x.
Keywords: Dating, Mutual interest, Machine learning, Feature selection, Ethical machine learning, Ethical AI, Automated machine learning
Subject terms: Computational biology and bioinformatics, Mathematics and computing
Introduction
Interest in alternatives to dating apps has been increasing, as users experience burnout alongside unwanted pressure to engage in sexual intercourse1. This has likely contributed to a 49% increase in singles events within the last year2. Unwanted communications from prospective mates, even after an individual has indicated disinterest, are likely contributing to renewed interest in approaches to dating where the ability to contact an individual is strictly controlled, such as speed dating. Speed dating involves a series of short interactions, often just a few minutes long, where prospective mates can interact and gauge their respective interest in each other. Speed dating presents prospective mates with a series of candidates for relationships in short succession, giving each individual the opportunity to share contact details only with those who are mutually interested. Thus, speed dating can act as a mechanism for mate selection, potentially playing a role in alleviating loneliness and social isolation.
Closely related work
Previous experimental research from Columbia Business School aimed to uncover patterns that could explain why the number of interracial marriages is low3. The approach taken involved arranging speed dating events where the participants of each event had four-minute interactions with all opposite-sex participants. The participants then decided if they wanted to keep in contact with their date and rated the date based on attributes such as their physical features and personality. The study included a diverse group of participants with different racial and ethnic backgrounds, which approximately mirrored the Columbia University graduate and professional student population. The investigators used linear probability models to determine how each racial group impacted a participant’s decision, and to assess the role of race in determining attractiveness ratings. They found that racial status contributed to decision-making, particularly among women, and that race impacted attractiveness scores. Age and other determinants were also found to influence racial preferences, with older subjects discriminating less while forming matches3.
An additional study, also completed at Columbia Business School, and published in The Quarterly Journal of Economics, focused on gender differences in speed dating4. The goal of the study was to examine the role of an individual’s attributes in decision-making during the matching process. This study focused on three preferences: attractiveness, intelligence, and ambition. They used a linear probability model that evaluated the participants’ decisions based on those three attributes. Additionally, they examined whether the participants originated in the same field of study, came from the same region of the world, or had the same race. They found that men prefer women who have a level of ambition and intelligence that is lower than their own. Furthermore, it was found that women place greater emphasis on intelligence than men. It was also found that women put more emphasis on race in their decision-making process, a finding in agreement with the previously introduced study3. In their analysis, field of study was not found to have any predictive power4.
Finally, an additional study, published in the Review of Economics and Statistics, investigated contrast effects in sequential decisions in the context of speed dating5. This study aimed to explore how the order in which participants met their partners affects their perceptions and evaluations of subsequent partners, and was based on contrast effects: the perception of an option being influenced by previously encountered options. The investigators examined how the attractiveness of a prior partner influenced evaluations of the current partner. The results implied that a one-unit rise in previous partner attractiveness led to a 1.8% decrease in current inclination to date. They also found that only males were sensitive to prior partner attractiveness.
Although three high-quality studies3–5 have produced interesting findings concerning gender and racial effects in mate selection, all based on the same dataset, no prior studies in the literature have analyzed this dataset with machine learning (ML) technologies.
Machine learning (ML) introduction
Machine learning (ML), a subdomain of artificial intelligence, is a term used to describe technologies that are capable of improving their performance (or learning) through experience6–8. Supervised machine learning technologies are ones where the application developer ‘supervises’ the training (or learning) of the machine by providing it with a carefully annotated ground-truth dataset with clearly defined target variables for the machine to predict, such as whether a sample belongs to a group of interest (classification), or a specific number (regression), from a typically large collection of measurements that might inform that prediction9,10. Feature selection technologies can act in combination with supervised learning algorithms to limit the number of underlying feature measurements relied upon to inform prediction11,12. Feature selection algorithms have tremendous potential value in identifying factors that are predictive of the target variable of interest, as they identify the limited sets of features relied upon to inform predictions.
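The combination of feature selection with a supervised classifier can be sketched as follows. This is a minimal illustration using scikit-learn on synthetic data, not the study's actual pipeline (which used df-analyze); the feature counts and classifier choice here are arbitrary for demonstration.

```python
# Illustrative sketch only: a supervised classifier combined with
# filter-based feature selection, evaluated with cross-validation.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for a dating-style dataset: 20 candidate features,
# only 5 of which actually inform the binary "match" target.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

# The filter keeps the 5 highest-scoring features before the classifier fits.
model = make_pipeline(SelectKBest(f_classif, k=5),
                      LogisticRegression(max_iter=1000))
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(f"5-fold accuracy: {scores.mean():.3f}")
```

Because the selector sits inside the pipeline, feature selection is re-fit within each cross-validation fold, avoiding information leakage from the held-out fold.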
Recent advances in automated ML (AutoML) have included the development of software that encapsulates all aspects of a standard ML study, including comparing combinations of learning and feature selection algorithms, combined with tuning of the learning machines, and fair comparison of all methods via established validation techniques, in a single reproducible public-domain program, such as df-analyze13. By implementing a standard ML study with a single shell command, df-analyze has modularized a complex process, supporting the next generation of ML-enabled studies that perform reproducible analyses and derive interesting findings through comparison of ML performance and behaviour. The first study using df-analyze was a simple analysis focused on a standard medical application, diagnosing schizophrenia14. A subsequent analysis using df-analyze, focused on chronic kidney disease, included a comparison of diagnostic performance with and without key clinical patient characteristics (such as whether a patient had diabetes), demonstrating clear value from the inclusion of clinical patient variables informing the ML15. That study helped demonstrate the feasibility of using df-analyze to compare ML technologies trained on carefully controlled differences in the underlying datasets. AutoML df-analyze software has also been applied to thyroid cancer recurrence prediction16, pediatric appendicitis17, and investigating proteins potentially associated with learning in the cerebral cortex18. An additional study, focused on predicting traffic stop outcomes, was the first attempt to use df-analyze to help answer social questions, while simultaneously constructing ML technologies that mitigate racial and gender biases potentially perpetrated by traffic officers19.
Results supported previous literature claims that there are gender and racial effects in traffic stop outcomes, and demonstrated that by creating race- and gender-blinded ML, we can develop technologies that mitigate the effects of racial and gender biases on the part of the traffic officers19, whose behaviour necessarily influences the training data upon which the ML learns to model its own functionality. In the present analysis, we extend the use of df-analyze to a blocking design, whereby we create ML models specific to each gender, supporting comparison of the ML technology’s functional behaviour, specifically which feature measurements are relied upon to inform prediction for each gender, and providing a methodology that easily supports comparative analyses to better understand gender effects. Additionally, in the present analysis, we repeat the creation of race-blinded ML previously addressed with respect to traffic stops19, towards the creation of more ethical technologies that heavily mitigate the possibility of perpetuating pre-existing biases inherent in the data upon which the machines are trained.
Study hypotheses and contributions
In this study, we hypothesize that the application of ML technologies may assist in the creation of more ethical approaches to relationship matching, and may assist in assessing gender effects and racial biases.
The main contribution of this manuscript is that it is the first study to apply automated ML technology to predicting mutual interest with and without racial information, as well as the first to create sex-specific models to help reveal gender effects.
Results
All data – baseline and race-agnostic ML models
Table 1 provides a summary of our leading models from Table A1, which provides the results of our extensive validation on our baseline dataset (both genders and racial information included), presenting 5-Fold performance on the hold-out dataset. All appendix tables (Tables A1 through A8) are provided in the supplementary materials. Table 2 provides a summary of our leading models from Table A2, which provides the results of our validation on our race-agnostic dataset consisting of both genders, providing validation results for 5-Fold performance on the hold-out dataset. The leading method for the full baseline dataset with all features included (including race and both genders, but excluding features identified in the data cleaning description in the methods) achieved 86.2% accuracy with the light gradient boosting machine (lgbm) and multiple feature selection approaches (see Tables 1 and A1). It is noteworthy that these models selected for race-based features to inform the predictive model. The leading method for the race-blinded ML models applied to the full dataset (both genders) achieved 85.9% accuracy, with either the light gradient boosting machine (lgbm) or the random forest (rf) machine learning methods (see Tables 2 and A2). These race-agnostic ML models were created with complex feature sets that included many measurements in the dataset, including age, perceptions of sincerity & attractiveness, importance of being from the same religion, ambition, finding the person funny, etc.
Table 1.
Summary Table. Full dataset race included. 5-Fold validation on holdout set.
| Model | Selection | Embed Selector | Acc | AUROC |
|---|---|---|---|---|
| lgbm | pred | none | 0.862 | 0.844 |
| lgbm | assoc | none | 0.862 | 0.855 |
| lgbm | embed_linear | linear | 0.859 | 0.851 |
| rf | embed_linear | linear | 0.858 | 0.845 |
| lgbm | none | none | 0.857 | 0.848 |
Table 2.
Summary Table. Full dataset race agnostic. 5-Fold validation on holdout set.
| Model | Selection | Embed Selector | Acc | AUROC |
|---|---|---|---|---|
| lgbm | none | none | 0.859 | 0.856 |
| rf | embed_lgbm | lgbm | 0.859 | 0.840 |
| lgbm | embed_linear | linear | 0.859 | 0.855 |
| lgbm | embed_lgbm | lgbm | 0.859 | 0.852 |
| lgbm | assoc | none | 0.858 | 0.851 |
Female data – baseline and race-agnostic ML models
When addressing female-specific ML models with the full dataset (race included), leading predictive performance was found to be 85.5% accuracy with either no feature selection, filter-based association (assoc) based feature selection, or linear embedded (embed_linear) feature selection (see Table 3 for a summary and Table A3 for the complete listing). All leading results based on feature selection methods selected for race features in the female dataset. Leading female-specific race-blinded ML models produced 85.4% accuracy (see Table 4 for a summary and Table A4 for the complete listing).
Table 3.
Summary Table. Female only race included. 5-Fold validation on holdout set.
| Model | Selection | Embed Selector | Acc | AUROC |
|---|---|---|---|---|
| lr | none | none | 0.855 | 0.841 |
| lgbm | none | none | 0.855 | 0.854 |
| lr | assoc | none | 0.855 | 0.841 |
| lr | embed_linear | linear | 0.855 | 0.841 |
| lgbm | embed_linear | linear | 0.854 | 0.851 |
Table 4.
Summary Table. Female only race agnostic. 5-Fold validation on holdout set.
| Model | Selection | Embed Selector | Acc | AUROC |
|---|---|---|---|---|
| lgbm | none | none | 0.854 | 0.855 |
| rf | embed_linear | linear | 0.852 | 0.850 |
| lr | none | none | 0.851 | 0.841 |
| lr | embed_linear | linear | 0.851 | 0.841 |
| rf | none | none | 0.851 | 0.841 |
Male data – baseline and race-agnostic ML models
When considering male-specific ML models with the full dataset (race included), leading predictive performance was found to be 86.4% accuracy with no feature selection (thus race information was included in the model), see Table 5 for a summary and Table A5 for the complete listing. High-performing race-agnostic models with slightly lower accuracy (86.3%) were also obtained from methods relying upon the embedded linear method (embed_linear), see Table 6 for a summary and Table A6 for the complete listing.
Table 5.
Summary Table. Male only race included. 5-Fold validation on holdout set.
| Model | Selection | Embed Selector | Acc | AUROC |
|---|---|---|---|---|
| lgbm | none | none | 0.864 | 0.863 |
| lgbm | embed_linear | linear | 0.862 | 0.861 |
| lgbm | assoc | none | 0.858 | 0.860 |
| sgd | assoc | linear | 0.858 | 0.834 |
| rf | embed_linear | linear | 0.858 | 0.853 |
Table 6.
Summary Table. Male only race agnostic. 5-Fold validation on holdout set.
| Model | Selection | Embed Selector | Acc | AUROC |
|---|---|---|---|---|
| lgbm | embed_linear | linear | 0.863 | 0.866 |
| lgbm | none | none | 0.861 | 0.854 |
| lgbm | embed_lgbm | lgbm | 0.858 | 0.850 |
| lr | embed_lgbm | lgbm | 0.854 | 0.836 |
| lgbm | pred | none | 0.854 | 0.861 |
Comparing male and female cohorts
High-performing models were obtained from the filter-based association (assoc) and prediction (pred) feature selection methods, as well as both the linear and light gradient boosting machine (lgbm) embedded feature selection methods (embed_lgbm, embed_linear). All four of these methods select for a large number of features in order to assist in the training of high-quality ML models. For standardization, in a study design such as this one, we compare the features selected as predictive across genders while keeping all remaining methodological parameters identical, including all machine learning methods, and we only directly compare feature selection reports produced by the same feature selection method across ML models.
Prediction method (pred)
In the prediction method (pred), the list of selected features is ranked according to an accuracy metric, with higher values implying that the feature measurement has notably more apparent importance to the ML technology; the lowest possible value for an included feature is zero. In the female-focused ML using prediction feature selection, only four of the selected features achieved non-zero accuracy scores: whether they have shared interests with their partner, how likely they thought it was that their male partner liked them back, whether their partner was funny, and whether they liked their partner. With the male-focused ML using filter-based prediction (pred) feature selection, only one of the selected features achieved a non-zero accuracy score: whether they deemed the female attractive.
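A prediction-style ranking of this kind can be sketched as follows. This is a hypothetical illustration, scoring each feature by how much a simple single-feature model improves on a majority-class baseline; the actual pred metric used by df-analyze may differ, and the data here are synthetic.

```python
# Hypothetical sketch of a prediction-style (pred) feature ranking.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=8, n_informative=3,
                           random_state=1)

baseline = max(np.mean(y), 1 - np.mean(y))  # majority-class accuracy
scores = {}
for j in range(X.shape[1]):
    # 5-fold accuracy of a shallow tree that sees only feature j.
    acc = cross_val_score(DecisionTreeClassifier(max_depth=2, random_state=0),
                          X[:, [j]], y, cv=5).mean()
    # Gain over the baseline, floored at zero (uninformative features score 0).
    scores[j] = max(0.0, acc - baseline)

ranked = sorted(scores, key=scores.get, reverse=True)
print("features ranked by single-feature accuracy gain:", ranked)
```

Under this scheme, features that score zero add no single-feature predictive value over always guessing the majority class, mirroring how only a handful of selected features achieved non-zero scores above.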
Embedded light gradient boosting machine method (embed_lgbm)
In the high-performing embed_lgbm method, we observed that the following features were selected for by the ML to predict mutual interest in the female population but not in the males: how happy they expect to be with the people they meet during the speed dating event, a rating of how funny the female sees herself, a rating of how important it is that their partner be of the same race, a self-rating of the female’s own intelligence, a self-rating of the female’s own sincerity, a female’s rating of her candidate partner’s sincerity, her interest in movies, clubbing, sports, theater, and sports viewed on television, as well as which cohort/session of the multiple speed dating events organized they participated in.
The following features were selected (embed_lgbm) to inform ML predictions of mutual interest in the male population and not in the females: the male’s expectation of the number of females interested in themselves, field of study, and whether or not the race of the partner was European/Caucasian-American.
Also of note, the embed_lgbm technique ranks the apparent relative importance of each feature selected for in predicting mutual interest. For males, the leading features that were predictive of mutual interest from the embed_lgbm method were: how likely they think it is that their partner likes them, whether they like their partner, their female partner’s rating of how funny the male was, their female partner’s rating of how attractive the male is, and their rating of the attractiveness of the female.
For females, the leading features that were predictive of mutual interest from the embed_lgbm method were whether their own and their partner’s ratings of interests correlate with one another, how important their male partner rated having shared interests, how important their male partner rates sincerity, how important their male partner rates attractiveness, and how important their male partner rates being funny.
A full collection of all leading features selected for is provided for females in Table A7, and for males in Table A8, both available in the appendix, provided in the supplementary materials. Higher feature selection scores provided in the tables imply increased importance for that variable to inform ML model predictions.
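An embedded tree-ensemble ranking of this kind can be sketched as follows. This is an illustrative stand-in using scikit-learn's gradient boosting rather than lgbm itself, on synthetic data; the principle (importances derived from the fitted ensemble) is the same.

```python
# Sketch of embedded tree-ensemble feature ranking (embed_lgbm-style);
# scikit-learn's gradient boosting is used here as a stand-in for lgbm.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=400, n_features=10, n_informative=4,
                           random_state=2)

gbm = GradientBoostingClassifier(random_state=0).fit(X, y)

# Importances sum to 1; near-zero features can be dropped, and the rest
# form a ranked list of features the model relies upon for prediction.
order = np.argsort(gbm.feature_importances_)[::-1]
for j in order[:4]:
    print(f"feature {j}: importance {gbm.feature_importances_[j]:.3f}")
```

Because the importances come from the fitted model itself, this is an embedded method: selection and learning happen in one step, unlike the filter methods described elsewhere in this section.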
Filter-based association method (assoc)
The association-based feature selection method (assoc) produces large feature sets in both genders, and ranks the features based on mutual information, providing a ranked sorting of feature importances. The leading features in ML built for males were: their female partner’s rating of the male’s shared interests, the female partner’s rating of how funny the male is, the male’s assessment of how likely they think it is that the female likes them, whether they like their partner, their rating of how funny their female partner is, the rating by their female partner of how attractive the male is, and their rating of their female partner’s attractiveness.
The leading features selected for by the association-based feature selection method in ML built for females were: whether they liked their male partner, how funny they rated their male partner to be, the rating by their male partner of how attractive the female is, the female’s rating of her male partner’s shared interests, their male partner’s rating of how funny they are, their rating of their male partner’s attractiveness, their male partner’s rating of how attractive the female is, and the rating of their male partner as to how funny the female is.
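A mutual-information ranking of this kind can be sketched as follows, assuming a feature matrix X and binary match labels y; the data here are synthetic stand-ins.

```python
# Sketch of an association-based (assoc) feature ranking using
# mutual information between each feature and the match target.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=400, n_features=12, n_informative=4,
                           random_state=3)

mi = mutual_info_classif(X, y, random_state=0)
ranking = np.argsort(mi)[::-1]  # features sorted by association strength
print("top features by mutual information:", ranking[:5])
```

Mutual information is a filter criterion: it scores each feature's association with the target independently of any downstream classifier, which is why assoc tends to produce large, permissive feature sets.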
Embedded linear method (embed_linear)
The embedded linear feature selection method (embed_linear) produces large feature sets in both genders, and ranks the features in terms of apparent importance in informing ML model predictions of a match. The leading features in ML built for males were: whether the female’s race was European/Caucasian-American, whether their field of study was a Master’s of Business Administration (MBA), whether they liked their partner, the male’s expectation of the number of females interested in themselves, the female’s rating of the attractiveness of the male, their rating of their partner’s intelligence, their female partner’s rating of their shared interests, their rating of their partner’s attractiveness, their female partner’s rating of how funny the male was, the male’s rating of the expected number of females they will be matched with, and how likely they think it is that their female partner likes them.
The leading features selected for in the all-female ML models were: the rating by their male partner of how attractive the female was, whether the female’s race was European/Caucasian-American, whether they liked their partner, their rating of their own intelligence, their male partner’s rating of the female’s ambition, whether the female’s race was Black/African American, and whether their field of study was medicine.
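An embedded linear ranking of this kind can be sketched as follows. This is an illustrative assumption of one common embed_linear formulation (L1-regularized logistic regression coefficients on standardized features); df-analyze's exact implementation may differ, and the data are synthetic.

```python
# Sketch of an embedded linear (embed_linear-style) selection: features
# are ranked by the magnitude of L1-regularized logistic coefficients.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=400, n_features=12, n_informative=4,
                           random_state=4)

# Standardize so coefficient magnitudes are comparable across features.
Xs = StandardScaler().fit_transform(X)
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(Xs, y)

coef = np.abs(clf.coef_[0])
selected = np.flatnonzero(coef)             # L1 zeroes out unused features
ranking = selected[np.argsort(coef[selected])[::-1]]
print("selected features, ranked by |coefficient|:", ranking)
```

The L1 penalty drives uninformative coefficients exactly to zero, so selection again falls out of the model fit itself rather than a separate filtering pass.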
Discussion
Machine learning race-blinded models
Results indicate that race-blinded models can be created with approximately equal performance to ones informed by race. However, race-biased models demonstrate repeated selection of race-based features in leading ML models. As such, our findings are supportive of past studies that have reported race effects in mutual interest3,4. While previous research implicated females as exhibiting a stronger racial preference effect3,4, our analysis was not designed to measure the relative importance of racial effects for each gender. However, racial features were selected for by both male-specific and female-specific models predicting mutual interest. Our findings indicate that through the creation of race-blinded ML technology, we can still produce ML models with predictive accuracy that approximately matches that of the race-biased models for the full dataset that includes both genders (race-biased: 86.2% acc; race-blinded: 85.9% acc), as well as for the female-specific ML (race-biased: 85.5% acc; race-blinded: 85.4% acc), and the male-specific ML (race-biased: 86.4% acc; race-blinded: 86.3% acc). A real-world implication of this finding is that dating app algorithms, which suggest matches, could exclude race-related information in their ML models and still potentially maintain strong predictive performance.
Gender effects revealed by machine learning
Our analysis has identified noteworthy gender differences between expert ML systems trained to predict mutual interest in females and males respectively. In the male population, ML models have placed more emphasis on the male’s perception of the female’s attractiveness (selected for by all methods, as well as being the leading feature in the prediction (pred) feature selection method). However, it should be noted that although perceptions of attractiveness are particularly relied upon in male-specific ML models, female-specific ML is also dependent on perceptions of attractiveness. Male-specific ML models have also regularly selected for features that may act as a proxy for male confidence, such as the male’s expectation of the number of females interested in them (which was selected for by the embed_lgbm method in males but not in females, and was a leading feature in the embed_linear method in males), how likely they think it is that their partner likes them (which was a leading feature selected for by the embed_lgbm method and the assoc method in males), and the male’s rating of the expected number of females they will be matched with (which was a leading selected feature in the embed_linear method for males). These results could imply that more confident males are more likely to be matched. To further assess this theory, we computed the Pearson correlation between the expected number of participants interested in oneself (feature name: expected_num_interested_in_me, a potential proxy for confidence) and the match target variable. The results indicated a positive correlation of 0.149, with an associated p-value of 4.91 × 10⁻⁶. Additionally, the Pearson correlation between the number of matches expected by a male participant (feature name: expected_num_matches) and the match target variable was 0.105 with a p-value of 3.83 × 10⁻¹⁰. These small p-values indicate that the observed correlations are very unlikely to be the result of random chance.
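A correlation check of this kind can be sketched as follows. This is a minimal illustration on synthetic stand-in data (the generating parameters below are arbitrary); the study's actual computation used the speed dating dataset's expected_num_interested_in_me feature and match labels.

```python
# Sketch: Pearson r and p-value between a confidence proxy and a
# binary match outcome, on synthetic stand-in data.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n = 1000
expected_interest = rng.normal(5, 2, size=n)   # proxy for confidence
# Matches are synthetically made slightly more likely with higher confidence.
p_match = 1 / (1 + np.exp(-(0.15 * expected_interest - 1.0)))
match = (rng.random(n) < p_match).astype(float)

r, p = pearsonr(expected_interest, match)
print(f"r = {r:.3f}, p = {p:.2e}")
```

As in the reported analysis, a modest positive r paired with a very small p-value indicates a weak but statistically reliable association between the proxy and the match outcome.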
In the female population, ML models have placed far more emphasis on sincerity (selected for by embed_lgbm as a leading feature in females and not selected for at all in the males), on being funny (selected for by the pred, embed_lgbm and assoc feature selection methods), and on a variety of features pertaining to mutual interests. It should also be noted that the female’s perception of how funny the male was is a selected feature in male-specific models predicting mutual interest; thus this feature contributes to both gender models.
Also noteworthy was that in the leading embed_lgbm feature selection method, more features were selected for in the female-specific ML, including notably many features not included in the male ML. Despite this larger set of features, the female-specific ML produced lower accuracies for predicting mutual interest than the male-specific models. These findings could imply that the dynamics of matchmaking with females are more complex, and thus require more complex models, with additional feature measurements included, in order to maximize the ML model’s predictive accuracy for a match. This potentially implies that it is more challenging to create female-specific ML for predicting mutual interest than male-specific ML.
Considering that the embed_lgbm method produced the best performing models, we investigated feature selection results for male and female participants and also examined which features overlapped for both groups. When inspecting male-only traits, we found that career and academic field features, such as whether the individual’s field of study was in business or a Master’s of Business Administration (MBA), were selected. This may suggest that male participants prefer partners with similar professional backgrounds. Upon examining female-only traits, embed_lgbm selected diverse fields such as psychology, medicine, biology, international studies, and social studies. Similarly to males, this could imply that women potentially favor partners who share professional backgrounds similar to their own. Results also support the existence of overlap between the features ML relies upon in both men and women for predicting mutual interest, such as differences in age from their partner, whether their partner is funny, whether their partner has shared interests, and whether their partner is intelligent. However, we observed that ML models potentially imply that men value business-related traits in a partner.
Literature comparison
Our analysis represents the fourth major study focused on this dataset, and the first to use machine learning technology in the analysis. Previous experimental research on this dataset aimed to uncover patterns that could explain why the number of interracial marriages is low3. Investigators found that race impacted attractiveness scores, and that age and other determinants had an impact on racial preferences3. An additional study focused on gender differences in speed dating, with the goal of examining the role of an individual’s attributes in decision-making, and concentrated on three preferences: attractiveness, intelligence, and ambition4. They found that men prefer women who have a level of ambition and intelligence that is lower than their own. Furthermore, it was found that women place greater emphasis on intelligence than men. Finally, an additional study investigated contrast effects in sequential decisions in the context of speed dating5. The results implied that previous partner attractiveness led to a decrease in current inclination to date, with only males being sensitive to prior partner attractiveness. Although our study design, using AutoML technology, is quite different from the previous three studies focused on this dataset, our findings are consistent with these prior analyses. Our findings confirm that ML models are race-dependent, with race-biased features being consistently selected for in race-biased models, as well as behavioural and performance differences between race-biased and race-blinded technologies, which is consistent with race-based effects previously reported3. Our gender-divided ML analyses are consistent with previous research that emphasizes the importance of intelligence in the female cohort4. Finally, our ML analyses are also consistent with previously reported male biases in favour of female attractiveness5.
By using AutoML technology, our analysis has also established a variety of sets of features that together are predictive of mutual interest in both females and males, providing novel additional contributions in this domain in the context of modern learning technology.
Strengths, weaknesses and future work
This study included gender-specific ML models that were either race-biased or race-blinded. This design allowed us to compare ML model characteristics that differ between the two genders. Future work can investigate race-specific ML models, in order to better understand racial effects in predicting mutual interest. Unfortunately, a plurality of the samples in this dataset were acquired from a Caucasian/American population, and so sample size would be greatly compromised by dividing this dataset into race-based cohorts. However, such a study design may provide additional race-specific social insights into mutual interest, and so could be completed as part of a future study on a larger dataset consisting of more samples from non-Caucasian populations.
The dataset considered neither non-heterosexual nor transgender individuals. While this may, in part, be a reflection of the era in which the data were collected (the study was performed between 2002 and 2004 in New York City, well over 20 years ago), it necessarily results in a study with a much narrower focus than could theoretically have been performed. The categorization of individuals as only male or female, and only interested in the opposite sex, not only imposes fundamental limitations on this dataset; it also forces our machine learning analysis to address the problem as a binary classification one. In reality, many more types of humans exist, and so a future study could collect a much larger sample size with representation from a wide array of individuals, and the resultant, potentially more nuanced machine learning algorithms could address the matchmaking process in the context of a larger and richer dataset, potentially helping us better understand mating preferences across all segments of society. Furthermore, if a more nuanced dataset were established, whereby matched individuals ranked or evaluated their matches on a scale, then we could transition our technology from a fundamentally binarized (classification) model to a regression prediction learner that estimates the individual’s assessment of how good a given match will be. Such an approach could theoretically support a more nuanced, less binarized assessment of mutual interest.
A strength of this study is its reproducibility, having relied upon public-domain software (df-analyze) and a public-domain dataset originally acquired at Columbia University and reported on in several peer-reviewed studies. While the sample size in this analysis (n = 8,378) was reasonably large, future work would benefit from reproducing these findings with a larger independent dataset, acquired at multiple institutions in multiple countries and including a wider array of genders and races. Although we cannot reproduce these findings on an independent dataset, as, to the best of our knowledge, no comparable open-access dataset exists on this topic, our analysis employed reproducible public-domain ML software, and we reserved a large 40% of samples for testing, amounting to data from 3,351 study participants, helping to ensure that study findings are reliable and reproducible.
Race-blinded ML technology could theoretically improve the underlying ethics of the matchmaking process. This potential ethical improvement is necessarily theoretical because, in its present form, an ML algorithm blinded to race may still construct a complex model that replicates the underlying biases present in the training samples. A superior approach could be to avoid collecting racial information during the original experiment, and to make an effort to avoid recruiting individuals with racial biases, so as to avoid contaminating the training dataset with discriminatory ground-truth data.
Future work can statistically assess the predictive value of the feature measurements identified in this analysis in a univariate context. This will help discern whether the features identified have predictive power on their own, or whether their predictive capacity is limited to their joint role as part of an overarching feature set. Future work can also investigate the potential for explainable machine learning to help us better understand gender effects and racial biases in matchmaking.
Materials and methods
Dataset description
The dataset used in this study was collected from speed dating events, as part of Columbia Business School’s HurryDate research project, which aimed to understand dating preferences and decision-making, and is publicly available for download20. The dataset contains a total of 8,378 samples, representing interactions between participants of opposite sexes during these events20. Participants attended multiple four-minute speed dates (amounting to a short conversation), and after each date, they provided ratings of their partner and decided whether they wanted to meet again. This decision, recorded for both participants, forms the basis of the target variable of interest (match), which indicates mutual agreement to exchange contact information. The dataset contains a large collection of variables potentially predictive of mutual interest, including demographic information (e.g., age, race, gender), self-evaluations, partner ratings (e.g., attractiveness, intelligence), and decisions made during the dates. A listing of the feature measurements upon which ML models can base their predictions of mutual interest is provided at the beginning of Appendix A, in the supplementary materials.
Preprocessing and machine learning
We conducted a series of machine learning analyses using df-analyze13, a command-line automated machine learning (AutoML) tool that streamlines many aspects of the machine learning workflow, particularly for small- to medium-sized tabular datasets. Data cleaning is performed automatically as part of df-analyze, which performs feature type inference and data imputation with the median value for numerical variables and the mode value for categorical variables. Redundant and empty features, along with features for which all samples have the same value or all-unique values, are also excluded from analyses. df-analyze further automates univariate association analysis, train-test splitting, feature selection, hyperparameter tuning, model selection, and validation, allowing for rapid and efficient analysis with a wide variety of leading machine learning techniques13, and was instrumental in preprocessing our dataset and preparing it for analysis. We used df-analyze to explore several different models and hypotheses regarding biases in predictions of mutual interest. First, we created a baseline model that included all features, including demographic variables such as gender and race. To explore the impact of these variables, we also built gender-specific models (one for males, one for females). We then repeated these analyses (ML built on the full dataset, only for the males, and only for the females) excluding all race variables. These configurations allowed us to test how demographic factors influenced ML-based match predictions and to assess potential biases by comparing ML models.
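The default imputation scheme described above (median for numerical features, mode for categorical features) can be sketched as follows. This is an illustrative re-implementation, not df-analyze's actual code, and the example columns are hypothetical:

```python
from statistics import median, mode

def impute_column(values, numeric):
    """Fill missing entries (None) with the column median (numeric)
    or mode (categorical), mirroring the default imputation scheme
    described above. Illustrative sketch only."""
    observed = [v for v in values if v is not None]
    fill = median(observed) if numeric else mode(observed)
    return [fill if v is None else v for v in values]

# Hypothetical example columns: a numeric age and a categorical label.
ages = impute_column([24, None, 30, 28], numeric=True)       # filled with 28
labels = impute_column(["A", "B", None, "A"], numeric=False)  # filled with "A"
```

In practice df-analyze performs this step internally; no user code is required.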
Each supported machine learning algorithm, including the high-performing light gradient boosting machine (lgbm)21, the random forest (rf)22, logistic regression (lr)23, stochastic gradient descent (sgd)24, and k nearest neighbour (knn)25, was paired with each supported feature selection method, including no feature selection (none), the filter-based association (assoc) and prediction (pred) methods, both the linear- and lgbm-based embedded methods (embed_linear, embed_lgbm), and the redundancy-aware step-up wrapper method (wrap). Feature selection methods contributing to our results have been previously reviewed26. All methods included in this analysis are supported through the df-analyze AutoML software13. Thus, we provide a thorough analysis of feature selection and machine learning combinations towards the creation of high-performing ML models for predicting mutual interest between candidates of opposite genders. All models were tuned with Optuna hyperparameter tuning, with 50 trials per model, as supported in df-analyze13.
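The exhaustive pairing of learning machines with feature selection methods described above can be sketched with scikit-learn stand-ins. The selector and model choices below are illustrative placeholders (df-analyze performs this pairing, along with Optuna tuning, internally):

```python
from itertools import product

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

# Synthetic stand-in for a tabular binary classification dataset.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

models = {"lr": LogisticRegression(max_iter=1000),
          "sgd": SGDClassifier(random_state=0),
          "knn": KNeighborsClassifier()}
selectors = {"none": "passthrough",                  # no feature selection
             "assoc": SelectKBest(f_classif, k=10)}  # filter-based stand-in

# Evaluate every (selector, model) combination, as in the study design.
scores = {}
for (s_name, sel), (m_name, mdl) in product(selectors.items(), models.items()):
    pipe = Pipeline([("select", sel), ("model", mdl)])
    scores[(s_name, m_name)] = cross_val_score(pipe, X, y, cv=5).mean()
```

Each entry of `scores` then holds the mean cross-validated accuracy of one selector-model pairing, which is the comparison the study performs at scale.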
The machine learning technologies included in this study were selected as they are the standard set of supported learning machines in the df-analyze AutoML technology. They were selected for their computational efficiency, high predictive accuracy (lgbm, rf), and ability to act as statistical (sgd, lr) and traditional computational (knn) learning machine baselines. The machine learning models supported by df-analyze were implemented in Python with scikit-learn, as thoroughly documented in the df-analyze GitHub repository13.
The target variable for mutual interest, match, is a binary variable (1 = match, 0 = no match); thus, we are focused on a binary classification problem. A match is defined as mutual agreement between both participants to exchange contact information. Each participant's stated interest in a match was removed as a predictor variable to ensure that we were not creating trivial ML models that could learn to predict a match directly from the mutual interest that defines it, thus supporting the creation of models that base predictions on a variety of potentially interesting factors.
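The leakage-avoidance step described above can be sketched as follows; the column names here are hypothetical placeholders, not the dataset's actual field names:

```python
import pandas as pd

# Hypothetical slice of a speed-dating table. "decision" stands in for a
# participant's stated interest, which (jointly with the partner's) defines
# the target "match" and would make prediction trivial if left in.
df = pd.DataFrame({
    "attractive_rating": [7, 4, 8],
    "shared_interests":  [6, 3, 9],
    "decision":          [1, 0, 1],   # leaky: directly determines the target
    "match":             [1, 0, 1],   # binary target: 1 = match, 0 = no match
})

leaky = ["decision"]                  # drop anything that defines "match"
X = df.drop(columns=leaky + ["match"])
y = df["match"]
```

Only behavioral and demographic ratings then remain as predictors, which is what makes the resulting models non-trivial.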
Statistical analysis
We evaluated the performance of the various classification models using multiple metrics to assess their predictive capabilities. The models were compared across metrics including accuracy (acc), the area under the receiver operating characteristic curve (auroc)27, sensitivity (sens), specificity (spec), balanced accuracy (bal-acc)28, positive predictive value (ppv), negative predictive value (npv), and F1-score (F1). Reviews of metrics relied upon for the evaluation of machine learning algorithms are available in the literature11,12,29. Accuracy measures the overall proportion of correct predictions, while auroc assesses the model's ability to distinguish between the positive and negative classes across operating points. Each model was evaluated on the holdout set, as well as with 5-fold cross-validation on the holdout set, to ensure robustness and generalizability. The holdout set comprised a large, randomly selected 40% of the dataset, to help ensure the reliability of results.
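The threshold-based metrics listed above can all be computed from the four confusion-matrix counts. The following is a minimal illustrative sketch (auroc is omitted, as it requires ranked scores rather than hard predictions); the example counts are arbitrary:

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the threshold-based metrics reported in this study
    from confusion-matrix counts (true/false positives/negatives)."""
    sens = tp / (tp + fn)             # sensitivity (recall)
    spec = tn / (tn + fp)             # specificity
    ppv = tp / (tp + fp)              # positive predictive value
    npv = tn / (tn + fn)              # negative predictive value
    return {
        "acc": (tp + tn) / (tp + tn + fp + fn),
        "bal-acc": (sens + spec) / 2,
        "sens": sens, "spec": spec, "ppv": ppv, "npv": npv,
        "f1": 2 * ppv * sens / (ppv + sens),  # harmonic mean of ppv and sens
    }

# Arbitrary example counts: 40 true positives, 40 true negatives,
# 10 false positives, 10 false negatives.
m = classification_metrics(tp=40, tn=40, fp=10, fn=10)
```

Balanced accuracy averages sensitivity and specificity, which is why it is preferred over plain accuracy when the match/no-match classes are imbalanced.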
Conclusions
This study investigated the role of racial and gender biases in mate selection through the application of machine learning for predicting mutual interest in males and females, with both race-biased and race-blinded technologies developed. ML revealed many gender-specific factors predictive of mutual interest. For example, leading female-specific ML models predicted mutual interest based, in part, on whether the male was perceived as funny, whether the male was deemed sincere, and assessments of shared interests. Conversely, leading male-specific ML models predicted mutual interest based, in part, on the male's perception of female attractiveness and on a variety of potential proxies for male confidence (i.e., his perception of the number of females interested in him, his perception of the number of females he will be matched with, and how likely he thinks it is that his partner likes him). Racial features were selected as predictive of mutual interest in both genders. Results also demonstrate that it is possible to create race-blinded ML models with approximately equal performance to those biased by racial information, supporting the creation of more ethical, inclusive, and behavior-focused technologies.
Supplementary Information
Below is the link to the electronic supplementary material.
Abbreviations
- acc
Accuracy
- AI
Artificial intelligence
- assoc
Filter-based association FS
- auroc
Area under the receiver operating characteristic curve
- bal-acc
Balanced accuracy
- embed_lgbm
Embedded lgbm FS
- embed_linear
Embedded linear FS
- F1
F1 score
- FS
Feature selection
- knn
K nearest neighbour
- lgbm
Light gradient boosting machine
- lr
Logistic regression
- ML
Machine learning
- npv
Negative predictive value
- ppv
Positive predictive value
- pred
Filter-based prediction FS
- rf
Random forest
- sens
Sensitivity
- sgd
Stochastic gradient descent
- spec
Specificity
- wrap
Wrapper-based redundancy-aware FS
Author contributions
Study conceptualization: AH, AH, JL; Software: AH, AH, DB; analysis: AH, AH, JL; Validation: AH, AH, DB; Manuscript writing: AH, AH; Manuscript editing: AH, AH, JL.
Funding
This study was financially supported by a Canada Foundation for Innovation grant, a Nova Scotia Research and Innovation Trust grant, an NSERC Discovery grant, a Compute Canada Resource Allocation, and a Nova Scotia Health Authority grant to J.L.
Data availability
The data relied upon in this analysis is publicly available at: https://openml.org/search?type=data&status=active&id=40536. The software relied upon in this study is publicly available and is available at: https://github.com/stfxecutables/df-analyze/blob/master/README.md.
Declarations
Competing interests
J.L. is founder of Time Will Tell Technologies, Inc. The authors have no relevant conflicts of interest to declare.
Institutional review board statement
This research was conducted with public domain software on a public domain dataset, as such, no Institutional Review Board approval was needed for this study.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Aimee Hastings-James and Andrew Hinman.
References
- 1.Pearson, C. ‘A decade of fruitless searching’: the toll of dating app burnout. The New York Times, Aug. 31. https://www.nytimes.com/2022/08/31/well/mind/burnout-online-dating-apps.html (2022).
- 2.Xie, T. Gen Z revives singles events as many abandon dating apps. Bloomberg.com, Jun. 28. https://www.bloomberg.com/news/articles/2024-06-28/gen-z-revives-singles-events-as-many-abandon-dating-apps (2024).
- 3.Iyengar, S., Kamenica, E. & Simonson, I. Racial preferences in dating: evidence from a speed dating experiment. Rev. Econ. Stud. 75, 117–132. https://business.columbia.edu/faculty/research/racial-preferences-dating-evidence-speed-dating-experiment (2008).
- 4.Iyengar, S., Kamenica, E. & Simonson, I. Gender differences in mate selection: evidence from a speed dating experiment. Quart. J. Econ.121, 673–697. 10.1162/qjec.2006.121.2.673 (2006).
- 5.Bhargava, S. & Fisman, R. Contrast effects in sequential decisions: evidence from speed dating. Rev. Econ. Stat.96 (3), 444–457. 10.1162/REST_a_00416 (Jul. 2014).
- 6.Noorbakhsh-Sabet, N., Zand, R., Zhang, Y. & Abedi, V. Artificial intelligence transforms the future of health care. Am. J. Med. 132 (7), 795–801. 10.1016/j.amjmed.2019.01.017 (Jul. 2019).
- 7.Jaeger, J. Artificial intelligence is algorithmic mimicry: why artificial ‘agents’ are not (and won’t be) proper agents. Neurons, Behavior, Data Analysis, and Theory. 10.51628/001c.94404 (Feb. 2024).
- 8.Jordan, M. I. & Mitchell, T. M. Machine learning: trends, perspectives, and prospects. Science 349 (6245), 255–260. 10.1126/science.aaa8415 (Jul. 2015).
- 9.Burkart, N. & Huber, M. F. A survey on the explainability of supervised machine learning. J. Artif. Intell. Res.70, 245–317. 10.1613/jair.1.12228 (Jan. 2021).
- 10.Singh, A., Thakur, N. & Sharma, A. A review of supervised machine learning algorithms, 3rd International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India, 1310–1315. Available https://ieeexplore.ieee.org/document/7724478 (2016).
- 11.Chandrashekar, G. & Sahin, F. A survey on feature selection methods. Comput. Electr. Eng.40 (1), 16–28. 10.1016/j.compeleceng.2013.11.024 (Jan. 2014).
- 12.Li, J. et al. Feature selection: A data perspective. ACM Comput. Surveys. 50 (6), 1–45. 10.1145/3136625 (Jan. 2018).
- 13.stfxecutables. df-analyze/README.md at master · stfxecutables/df-analyze. GitHub (2024). https://github.com/stfxecutables/df-analyze/blob/master/README.md (accessed Oct. 20, 2024).
- 14.Levman, J. et al. A morphological study of schizophrenia with magnetic resonance imaging, advanced analytics, and machine learning. Front. Neurosci. 16. 10.3389/fnins.2022.926426 (Aug. 2022).
- 15.Figueroa, J., Etim, P., Shibu, A. K., Berger, D. & Levman, J. Diagnosing and characterizing chronic kidney disease with machine learning: the value of clinical patient characteristics as evidenced from an open dataset. Electronics 13 (21), 4326. 10.3390/electronics13214326 (Nov. 2024).
- 16.Penner, M., Berger, D., Guo, X. & Levman, J. Machine learning in differentiated thyroid cancer recurrence and risk prediction. Appl. Sci. 15 (17), 9397 (2025).
- 17.Kendall, J., Gaspar, G., Berger, D. & Levman, J. Machine learning and feature selection in pediatric appendicitis. Tomography 11 (8), 90. 10.3390/tomography11080090 (2025).
- 18.Huang, X., Gauthier, C., Berger, D., Cai, H. & Levman, J. Identifying cortical molecular biomarkers potentially associated with learning in mice using artificial intelligence. Int. J. Mol. Sci. 26, 6878. 10.3390/ijms26146878 (2025).
- 19.Saville, K., Berger, D. & Levman, J. Mitigating bias due to race and gender in machine learning predictions of traffic stop outcomes. Information15 (11), 687. 10.3390/info15110687 (Nov. 2024).
- 20.OpenML. Openml.org (2024). https://openml.org/search?type=data&status=active&id=40536 (accessed Oct. 20, 2024).
- 21.Yang, H. et al. LightGBM robust optimization algorithm based on topological data analysis. arXiv preprint. https://arxiv.org/pdf/2406.13300 (2024).
- 22.Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
- 23.Kleinbaum, D. G. & Klein, M. Logistic Regression, Springer, (2010).
- 24.Ketkar, N. Stochastic Gradient Descent, Chapter in Deep Learning with Python, Apress, Berkeley, CA 10.1007/978-1-4842-2766-4_8 (2017).
- 25.Zhang, Z. Introduction to machine learning: k nearest neighbors. Ann. Transl. Med. 4 (11), 218. 10.21037/atm.2016.03.37 (2016).
- 26.Jovic, A. et al. A review of feature selection methods with applications, Proceedings of the International Convention on Information and Communication Technology, Electronics and Microelectronics, 10.1109/MIPRO.2015.7160458 (2015).
- 27.de Hond, A. A. H. et al. Interpreting area under the receiver operating characteristic curve. Lancet Digit. Health 4 (12), E853–E855 (2022).
- 28.Brodersen, K. H. et al. The balanced accuracy and its posterior distribution. Proc. Int. Conf. Pattern Recognit. 10.1109/ICPR.2010.764 (2010).
- 29.Naidu, G. et al. A review of evaluation metrics in machine learning algorithms. Proc. Artif. Intell. Appl. Networks Syst. 10.1007/978-3-031-35314-7_2 (2023).
