Quantifying and predicting success in show business

Oliver E Williams; Lucas Lacasa; Vito Latora

doi:10.1038/s41467-019-10213-0

. 2019 Jun 4;10:2256. doi: 10.1038/s41467-019-10213-0

Quantifying and predicting success in show business

Oliver E Williams ^1,^✉, Lucas Lacasa ^1,^✉, Vito Latora ^1,^2,^3,^4,^✉

PMCID: PMC6548779 PMID: 31164650

Abstract

In certain artistic endeavours—such as acting in films and TV, where unemployment rates hover at around 90%—sustained productivity (simply making a living) is probably a better proxy for quantifying success than high impact. Drawing on a worldwide database, here we study the temporal profiles of activity of actors and actresses. We show that the dynamics of job assignment is well described by a “rich-get-richer” mechanism and we find that, while the percentage of a career spent active is unpredictable, such activity is clustered. Moreover, productivity tends to be higher towards the beginning of a career and there are signals preceding the most productive year. Accordingly, we propose a machine learning method which predicts with 85% accuracy whether this “annus mirabilis” has passed, or if better days are still to come. We analyse actors and actresses separately, also providing compelling evidence of gender bias in show business.

Subject terms: Computational science, Interdisciplinary studies

For most actors sustained productivity defines success. Here the authors study the careers of actors and identify a "rich-get-richer" mechanism with respect to productivity, the emergence of hot streaks and the presence of gender bias, and are able to predict whether the most productive year of an actor is yet to come.

Introduction

“It’s feast or famine in showbiz.”—Joan Rivers. A sentiment likely to be echoed by many would be stars of the silver screen. But for those that feast the rewards are, at least thought to be, worth the risk. The so-called science of success has recently uncovered many features of the careers of academics¹, artists², and all manner of other individuals whose output can be effectively assessed over the course of their working life^3–5. For instance, in the world of scientific research it has revealed the unpredictability of the location of an academics most impactful work¹, showing that even such prestigious awards as Nobel prizes, which usually occur later in a career⁶, are underpinned by research papers that are located randomly and uniformly throughout the ordered list of papers in the career of the awardee. On the other hand, the anatomy of funding and collaborations in universities has revealed “rich clubs” of leading institutions, and suggested that such patterns of collaborations contribute greatly to the success of these institutions, as measured in terms of over-attraction of available resources and of breadth and depth of their research products⁷. Studies of innovation in industry across different countries have found that the commercial success of manufacturing plants is far more closely related to intra-group links than external ties⁸. Strikingly, these features can be common across multiple areas; the Matthew effect^9,10, or the rich-get-richer phenomenon, and the recently discovered presence of “hot streaks”¹¹, are not restricted to isolated cases. With regards to success, a great deal of work has been done in assessing impact^1,12, the distribution of standout or landmark works^13,14, whether these are related to the age of the individual in question^15,16, how impact can be assessed in the long term¹⁷, and even prediction of future successes^18,19. Indeed the fortunes of both films and the actors and actresses that make them have been studied in some specific ways^17,20–22. These studies do not however address the question that interests those who are not already on the higher rungs of the ladder of success: how can one avoid the famine and build a sustainable career in acting?

The aim of this work is to use a data-driven approach in order to define, quantify and even predict the success of actors and actresses in terms of their ability to maintain a steady flow of jobs. Drawing on the International Movie Database (IMDb), an online database of information related to films, television programs and home videos, www.imdb.com, we study the careers of millions of actors from several countries worldwide, from the birth of film in 1888 up to the present day. Each career is viewed as a profile sequence: the yearly time series of acting jobs in films or TV series over the entire working life of the actor or actress (this is similar in spirit to the approach used in²³ to explore scientific productivity). Note that all acting jobs are considered, regardless of salary, role, screen time, or the impact of the work. The statistical analysis of such a large number of profile sequences allows us to derive some general properties of the actors activity patterns. In particular, we look at several quantities of interest such as career length, productivity (defined as the number of credit jobs in a year or in the entire career of an actor) and position of the annus mirabilis, defined as the year with the largest number of credited jobs. We also explore possible emergence of gender inequality in these properties.

The first message that emerges from our quantitative analysis is that one-hit wonders, i.e., actors whose career spans only a single year, are the norm rather than the exception. Long career lengths and high activity are found to be exponentially rare, suggesting a scarcity of resources in the acting world. These results are in agreement with previously collected evidence, pointing to the fact that unemployment rates in actors hover around 90%, and that as low as 2% of actors are able to make a living out of acting²⁴. We also observe that that this dramatic scarcity unequally applies to actors and actresses, providing compelling evidence of gender bias. Moreover, the total productivity of an actor’s career is found to be power-law distributed, with most actors having very few jobs, while a few of them have more than a hundred. This indicates a rich-get-richer mechanism underpinning the dynamics of job assignments, with already scarce resources being allocated in a heterogeneous way. All of this suggests that, while activity and sustained productivity are by definition measures of performance²⁵, they should in this context be considered as a proxy for success. Only a select few will ever be awarded an Oscar, or have their hands on the walk of fame, but this is not important to the majority of actors and actresses who simply want to make a living. It is the continued ability to work (as opposed to prestige) that is most likely to ensure a stable career. For these reasons we propose that predictions of success in show business should be focused on activity and productivity. Observe at this point that performance is usually conflated with success²⁵. While performance is objectively measured in terms of an individual’s actions, and is typically bounded, success is traditionally measured by recognition, i.e., in terms of impact, and is a collective phenomenon which is unbounded. Notwithstanding, the severe scarcity of resources in show business forces us to redefine an actor’s success, not in terms of popularity or impact, but in terms of activity and productivity as discussed above. Incidentally, note also that being credited on IMDb is to a certain extent funnelled by recognition mechanisms such as popularity—a producer might offer the job to the actor who had the best audition or to the one who has more followers on Instagram—so productivity is not only, strictly speaking, a performance-driven indicator.

Motivated by these results, we then address the questions that interest the majority of working actors and actresses. Questions such as “am I going to get another paid job?” or “is this year going to be my best?”. We first show that efficiency, defined as the ratio between the total number of active years and the career length, is unpredictable, as there is no evident correlation between these two things. This is in line with recent studies¹ pointing out that the most impactful pieces of work in scientific disciplines are equally likely to be located in any position throughout the entirety of an individuals output, and is therefore not predictable. Nevertheless, we here, surprisingly, find distinctive features in their temporal arrangement. In particular, we find that actor careers are clustered in periods of high activity (hot streaks)¹¹ combined with periods of latency (cold streaks). Moreover, we discover that the most productive year (annus mirabilis) for both actors and actresses is located towards the beginning of their career, and that there are clear signals preceding and following the location of the annus mirabilis of an individual. Altogether, these unexpected results lead us to conclude that prediction is possible in theory. Finally, we validate this hypothesis by building a statistical learning model which predicts the location of the most productive year, finding that we can, with up to 85% accuracy, tell whether an actor’s career has reached its most productive year yet or not.

Results

Preliminaries

We study the careers of 1,512,472 actors and 896,029 actresses as recorded on IMDb as of January 16th, 2016, including careers stretching back to the first recorded movie in 1888. The career of each actor a is characterised by his/her track record, which consists of a set of pairs of numbers representing respectively each year when actor a was credited in IMDb, and the number of different credits in that year. As credits we count the number of acting jobs in films and/or TV series. A sketch of the typical activity pattern of an actor is reported in Fig. 1, showing the yearly credits from the first to the last year of thir career. Notice that there are not only active years, where the actor has credited jobs in IMDb, but also latent years with no recorded jobs. We therefore fill the latent years with zeros and construct the profile sequence ${w_{k}}_{k = 1}^{L}$ of each actor a as depicted in the top part of Fig. 1. The quantity w_k denotes the actor’s local productivity in year k, i.e., the number of credited jobs in that year. The length of an actor’s career is defined as the number of years between the first and the last active year (inclusive), and is denoted as L. The total number of active years s is from now on referred to as the activity of an actor. Since a career can have latent years intertwined with active ones we must have s ≤ L, moreover L − s is the number of latent years. By definition we have: (i) L ≥ 1, (ii) s ≥ 1 and (iii) s = 1 ⇔ L = 1.

Fig. 1 — Career activity pattern of an actor. The yearly productivity of a given actor, measured as the total number of IMDb credited jobs in each year, is reported from the first to the last year of the actor activity. Shown is the case of an actor whose career spanned L = 23 years and who was credited a cumulated n = 17 different jobs in s = 12 years. From the yearly productivity we can construct the actor profile sequence w_k, with k = 1, …, L, shown in brackets above the plot, which can be modelled as a stochastic marked point process

Finally, we define the total productivity n of an actor, as the cumulated number of credited jobs, $n = \sum_{k = 1}^{L} w_{k}$ . The annus mirabilis (AM) of a given actor is defined as the year where the actor was credited with the largest number of works in IMDb: AM = m, where m is such that $w_{m} = \max {w_{k}}_{k = 1}^{L}$ . In the case that this m is not unique we take the final such year: AM = max{m}.

Career lengths and one-hit wonders

We start our analysis by exploring the statistics of the career length L. In Fig. 2a we plot in a semi-log scale the empirical distribution of career lengths P(L), for both actors and actresses finding that the tail is well fitted by an exponential distribution. By construction, P(L = 1) = P(s = 1) and this quantity represent the percentage of one-hit wonders i.e., of actors whose career started and ended, according to IMDb, in the same year. Interestingly, we find that the percentage of such cases is extremely high (around 69% for males and 68% for females) and deviates from the otherwise decaying exponential distribution. This sharp deviation highlights that one-hit wonders are not an exception in show business, but, on the contrary, are the norm²⁶. A zoom of the distribution in the range L ∈ [2, 10] is reported in the inset of (a), revealing systematic differences between actors and actresses, suggesting that it is consistently more common to find (non-one-hit wonder) actresses with shorter career lengths than actors. We have indeed performed a model selection experiment which confirms that gender bias is statistically significant (see Supplementary Note 1 for details).

The empirical probability distribution of activities, displaying the probability of sampling an actor that worked in s years, is shown in Fig. 2b in a semi-log scale. Most of the actors and actresses are only active in a single year (s = 1), as by default $s = 1 \mapsto L = 1$ . The probability of finding actors with large activity, i.e., those that have worked in many different years, decays exponentially fast. This exponential decay mimics the similar decay in the probability of finding long career lengths and altogether are the basis for claiming a scarcity of resources in show business, i.e., there are many more actors/actresses than job offers²⁷. This lack of resources naturally leads to a question: how are they allocated? We address this question in the next section.

Productivity and the rich-get-richer phenomenon

Figure 2c shows the empirical distributions of total productivity P(n), reporting the normalised numbers of actors or actresses with n appearances in movies or TV series over their careers. While the career length distribution P(L) and the activity distributions P(s) are well fitted in their tails by an exponential law, the function P(n) decays more slowly and can be fitted by a power law P(n) ~ n^−γ with exponent γ ≈ 2. Notice that similar behaviours have already been found in the context of two-mode actor-movie networks and of other systems that can be modelled as bipartite graphs²⁸. A power law in the distribution of total productivity implies also the existence of scaling in the rank-frequency distribution of productivity. It is indeed well known that observing a power-law distribution with exponent γ for the abundance of some variable is equivalent to obtaining a power-law scaling for the frequency of the variable that appears with rank r: f(r) ~ r^−α²⁹. The exponents of the two scaling laws are mathematically related via α = 1/(γ − 1). The celebrated Zipf’s law refers to the particular case of an exponent α ≈ 1, which is indeed the case here. In turn, the emergence of a Zipf’s law for the rank-frequency distribution of the total productivity of an actor suggests a possible mechanistic explanation for our observations. Many different proposals for the mechanism underpinning the emergence of a Zipf’s law, and several names for the phenomenon itself, have been put forward in various contexts, including the Simon–Yule process, the mechanism of preferential attachment, the Matthew effect, the Gibrat principle, rich get richer, etc.

In this context, we can suggest a possible mechanism for the onset of a power-law distribution for the total productivity in terms of a rich-get-richer phenomenon. Let us consider a generative model of a bipartite graph whose two sets of nodes represent respectively actors and movies. Actors acquire new links to movies, thus increasing their productivity, if they get a role in those movies. Suppose all actor nodes start with zero edges and acquire their first edges only according to a fitness, that is initially assigned at random or on some hypothetical intrinsic acting skill. When more movie nodes enter the network, actor nodes that acquire new edges gain popularity and this, in turn, increases their fitness. As it is well known that producers are more keen to offer a role to popular actors, actor nodes with high fitness are more likely to attract new edges. This leads to a multiplicative effect which clearly expresses the rich-get-richer phenomenon; actors with many job assignments will have a higher chance of working even more than actors with low productivity. In conclusion, the same rich-get-richer mechanism, which is at the heart of networks with power-law degree distributions^30–33, can also be the cause of the observed power laws in the total productivity of movie actors. This result is not at all unexpected, after all, the more well-known an actor is, the more likely producers will want him or her in their next film, if only for commercial purposes. What is perhaps dramatic about this observation is that it is well known that rich-get-richer effects are rather arbitrary and unpredictable, as large hubs can evolve out of unpredictable and random initial fluctuations which have been amplified, and not based on any particular intrinsic fitness³³ (such as acting skills). Quoting Easly and Kleinberg: “if we could roll time back 15 years, and then run history forward again, would the Harry Potter books again sell hundreds of millions of copies, or would they languish in obscurity while some other works of children’s fiction achieved major success?”. As a matter of fact, it seems likely that across different parallel universes productivity would still have a power-law distribution, but it is far from clear that the most productive actors would always be the same. Interestingly, this hypothesis has recently been validated in an online social experiment for the case of musical popularity³⁴. In summary, productivity is probably the variable every actor aims to maximise, but these results suggest that boosting productivity can be more of a network effect^35,36 than a consequence of acting skills.

Efficiency is unpredictable

In Fig. 2 we observed that career length L and activity s are variables which are both exponentially distributed, indicating a scarcity of resources. In this section we further explore whether the two quantities L and s are correlated. We first define an actor’s efficiency as the ratio s/L of active years over the entire career, and we investigate how the efficiency is distributed. The results reported in Supplementary Fig. 1, show that: (i) the efficiency distribution drops rapidly as s/L approaches either zero or one –i.e., most actors and actresses have intermediate values of efficiency– and that (2) for middle-range efficiency the distribution is essentially uniform (see Supplementary Note 2 for additional details). This suggests that efficiency is not predictable and that, for middle-range efficiency, the only correlations that emerge between the activity s and the career length L come from the fact that, by construction, s ≤ L. To further validate this, we performed a scatter plot of s versus L for all actors and actresses, and computed the Pearson correlation coefficient, then compared this to the correlation coefficient of a null model generated by randomly extracting values of L and s from the pool of career profiles, ensuring that L ≥ s (Supplementary Fig. 2). For actors, s and L exhibit a Pearson correlation coefficient r ≈ 0.69, whereas in the null model we obtained r_null ≈ 0.6. In the case of actresses we found r ≈ 0.69 and r_null ≈ 0.58. As expected, s and L are indeed correlated quantities, but the correlations can almost entirely be explained by a null model. In other words, for intermediate ranges there are no additional correlations between length and activity: the activity of actors cannot therefore be predicted by their career length, and we can conclude that the efficiency is an unpredictable quantity.

Actors careers are clustered in hot and cold streaks

To understand the temporal arrangement of active years within the profile sequence of a given actor, we now consider the statistics of waiting times. A waiting time τ is defined as the time elapsed (in years) between two active years (equivalently, a waiting time is a collection of successive latent years), and its statistics provide a classical way to analyse the presence of memory and bursts in time series^37,38. We have estimated the waiting time distribution P(τ) for actors and actresses, discarding those with short career lengths, L < 10 years, to avoid a lack of statistics. To estimate this distribution, for each actor (actress) we count how frequently one observes waiting times of a certain duration τ, and normalise the accumulated frequencies. This process will inevitably introduce finite size biases since, for short career lengths, we are more likely to find short waiting times, simply because there is no room for long ones. For a proper comparison we therefore have also computed the distribution for a randomised null model P_null(τ) where all of the profile sequences have been shuffled (while keeping the first event w₁ and the last event w_L unaltered). A lack of temporal correlations would imply P_null(τ) = P(τ), whereas systematic differences suggest the onset of temporal correlations in the activity of actors. In panel (a) of Fig. 3 we report the difference P(τ) − P_null(τ) as a function of τ.

For both actors and actresses, we systematically find P_null(τ = 1) < P(τ = 1), and P_null(τ > 1) > P(τ > 1), that is, active years are more clustered than they would be by chance, and hence the same is true for periods of inactivity. This means that the profile sequence shows clustering and is composed of bursts of activity (hot streaks) where actors and actresses are more likely, than would be expected by chance, to work in a year if they worked the year before (τ = 1). This result is in agreement with recent findings in other creative jobs in science and art¹¹. Additionally, these hot streaks are interspersed by abnormally long periods of latency (cold streaks) where authors are less likely than random to work in a given year if they did not work the year before (τ > 1).

Furthermore, to appropriately compare deviations from the null model for different waiting times, in Fig. 3b we plot the relative difference (in percentage) [P(τ) − P_null(τ)] · 100/P_null(τ). We find a substantial difference between actors and actresses: while deviation from the null model decays for larger waiting times τ in the case of actors, for actresses this relative deviation is maintained, pointing to a longer memory kernel, in turn suggesting that having a period of latency is overall more detrimental for actresses than for actors.

Predicting the annus mirabilis

It has recently been found that the most impactful publication that a scientist will produce is equally likely to occur at any stage of their career¹. Here we explore a related question in the context of actors and actresses. Instead of impact, the indicator of success under study is productivity, as measured by the number of credited works in IMDb. We concentrate on actors and actresses with working lives extending beyond L = 20 years. We restrict our reported results to those cases where there were at least 5 credited jobs in the annus mirabilis (AM), although other thresholds do produce qualitatively similar results. The subset of actors with L > 20 and more than 5 acting jobs in the AM consists of 15357 actors (1.02%) and 5904 actresses (0.65%). The large gender difference indicates that actors tend to have more acting jobs than actresses.

In Fig. 4 we plot the probability with which the AM will occur at each point within an actor or actress’s career. To be able to compare these probabilities over careers of varying lengths, we have broken up each actor’s time series of L years respectively into 5 bins (other segmentations produce qualitatively similar results). The plots consistently indicate that the most probable location of the annus mirabilis is towards the beginning of a career. Although the results are qualitatively similar for male and female actors, this bias is much more pronounced in the case of actresses, further confirming the gender difference previously observed.

To study whether one can detect the imminent appearance of an actor’s annus mirabilis we have analysed, for both actors and actresses, the average number of acting jobs before and after the AM. In order to do this consistently, we initially perform a translation $k \mapsto κ$ that aligns all profile sequences, so that the annus mirabilis k = y^* all occur at κ = 0. We then define:

ξ (κ) = \frac{1}{∣ A ∣} \sum_{i = 1}^{∣ A ∣} w_{y^{*} + κ}^{(i)},

where κ is the offset from the annus mirabilis and |A| is the size of the set of actors/actresses for which there exists a profile sequence with an input at offset κ. In Fig. 5 we plot ξ(κ), showing that, on average, there is a clear increase in the number of jobs preceding the AM and a clear decrease immediately afterward. This pattern is absent in the corresponding null models obtained by shuffling the profile sequences (red bars).

Fig. 5 — The annus mirabilis is predictable. The total number of acting jobs, ξ(κ), averaged over all a actors and b actresses, is reported as a function of the number of years κ after or before the annus mirabilis. Only actors and actresses with a career lasting more than L = 20 years and annus mirabilis with w > 5 acting jobs have been selected. In both cases, we observe a clear non-monotonic pattern, indicating that the annus mirabilis is either approaching or has just passed. For comparison, we report in red the results obtained for a null model where the profile sequences of all actors and actresses have been shuffled. No pattern emerges in that case

It is interesting to note that similar patterns have been observed before in the context of scientific productivity, although recent research challenges this paradigm²³. As a matter of fact, in²³ the authors leveraged the observed shapes of scientific productivity profiles and followed an unsupervised learning approach to cluster different types of careers. Here, instead, we shall follow a supervised learning approach and will now show how the observed patterns can indeed be exploited to build a method for the early prediction of the annus mirabilis.

Based on our observed distribution of jobs surrounding the annus mirabilis we initially propose a naive early-warning criterion: if the career sequence is non-monotonic around a value of k, i.e., if w_k > w_k−1 and w_k+1 < w_k, then the year k is a good candidate for the annus mirabilis. With this criterion in mind, one could ask the following question: given a sample of an actor or actress’s profile sequence, can we tell whether the annus mirabilis has already passed or not? Mathematically, the question above can be formalised as follows: given a career sequence ${(w_{k})}_{k = 1}^{L}$ such that the maximal total productivity occurs at time k = y^*, consider a truncated sequence ${\bar{w}}_{k} = {(w_{k})}_{k = 1}^{T \leq L}$ . We now wish to know if we can accurately assess whether y^* ∈ {1,...,T} using only ${\bar{w}}_{k}$ . This forms a binary classification problem, in which ${\bar{w}}_{k} \in C_{1}$ if $y^{*} \notin {1, . . ., T}$ and ${\bar{w}}_{k} \in C_{2}$ otherwise. Our naive criterion, as illustrated above, readily provides the heuristic: ${\bar{w}}_{k} \in C_{1}$ if ${\bar{w}}_{k}$ is monotonic, and ${\bar{w}}_{k} \in C_{2}$ if not. When this method is tested on an appropriately generated set $W$ of truncated sequences (see Supplementary Note 3 for details) we find that it is correct ~69.2% of the times for actors, and ~75.0% of the times for actresses. This model now forms a benchmark against which we will test a more refined approach. The idea is to relax our classification method by introducing some parameters which allow for deviation from the rigid heuristic, then train those parameters on some subset $T ⊊ W$ , and subsequently test the trained model on the test set $W \ T$ . To do this let us first define the function

D ({\bar{w}}_{k}) = - \sum_{y = 1}^{T - 1} \min (0, {\bar{w}}_{y + 1} - {\bar{w}}_{y}) .

At each year k the contribution to D from that year is zero if the total productivity in the subsequent year is larger. This means that for a monotonically increasing sequence ${\bar{w}}_{k}$ , $D ({\bar{w}}_{k}) = 0$ . If productivity decreases from year k to k + 1, then D will increase by a corresponding amount.

$D ({\bar{w}}_{k})$ effectively measures how far the sequence ${\bar{w}}_{k}$ is from being monotonically increasing, thus we can use it to relax our naive heuristic by defining some threshold d such that the decision rule $C ({\bar{w}}_{k}, d)$ becomes

C ({\bar{w}}_{k}, d) = \{\begin{matrix} C_{1} & if D ({\bar{w}}_{k}) < d \\ C_{2} & if D ({\bar{w}}_{k}) \geq d . \end{matrix}

This new classifier is more flexible than the naive heuristic as we have introduced a parameter d which can now be optimised (trained) as follows: if we denote $C^{*} ({\bar{w}}_{k})$ as the true class of the sequence ${\bar{w}}_{k}$ , then the optimal value of the parameter d^* is the value of d that minimises the following loss function

L (T, d) = - \sum_{T} δ (C ({\bar{w}}_{k}, d), C^{*} ({\bar{w}}_{k})) .

Where δ(X, Y) yields one if X = Y and 0 otherwise. This value for d^* is then used to classify the remaining sequences in $W \ T$ . The results of this testing on both actors and actresses can be partially summarised by the two confusion matrices CO_m (for actors) and CO_f (for actresses):

{CO}_{m} = [\begin{matrix} 33775 & 5659 \\ 10771 & 52000 \end{matrix}], {CO}_{f} = [\begin{matrix} 12549 & 2593 \\ 3596 & 26682 \end{matrix}]

The classical metrics used to assess the performance of the classifier, namely accuracy, precision, recall and the F1 score, are summarised in Table 1. We find that the accuracies of the prediction are 84 and 86% respectively, i.e., ~10% higher than those obtained using a naive heuristic.

Table 1.

Performance metrics (accuracy, precision, recall and F1 score) of the proposed classification method for the prediction of the annus mirabilis

Quantity	Actors	Actresses
Total $C_{1}$	44652	16145
Total $C_{2}$	57553	29275
Accuracy	0.8405	0.8637
Precision	0.8608	0.8287
Recall	0.7575	0.7773
F1 score	0.8058	0.8021

Open in a new tab

To round off, we have further explored the nature of the ≈15% of samples which are misclassified (see Supplementary Note 3 for details). We found that false negatives (samples for which the annus mirabilis is wrongly predicted to be still yet to come) arise due to the conservative nature of the prediction model, hence more refined versions of the prediction model might yield even better prediction results (Supplementary Fig. 3). Conversely, we find that false positives –where the annus mirabilis is wrongly predicted to have passed– are usually related to actors and actresses experiencing a come-back at a later stage of their careers (see Fig. 6a for an example). Interestingly, the positions of these late bursts of activity seem to be fundamentally difficult to predict (Fig. 6b).

Fig. 6 — Comebacks of actors are unpredictable. a Typical profile sequence of an actor exhibiting a comeback after a long period of latency. Such cases might lead to misclassification when the subcareer fed to the prediction algorithm (highlighted in pink) captures a long latency period: the prediction algorithm wrongly classifies the pink sequence as one where the annus mirabilis has passed. b Semi-log probability distribution of the estimated time lapse from the (wrongly estimated) annus mirabilis to the true one (i.e., the time t_cb to come-back for actors with profile sequences such as the one in panel a), for those misclassified samples where the algorithm wrongly predicts that the annus mirabilis had already passed (a linear binning has been applied to the data). Modelling the position of the secondary peak (comeback burst) as a random variable, the fact that P(t_cb) decays exponentially, suggests that this random variable is memoryless (Poisson process), i.e., the comeback burst is intrinsically unpredictable

Discussion

In this work we have made use of the vast quantity of data presented by IMDb to explore, analyse and predict success on the silver screen. By studying the careers of 1,512,472 actors and 896,029 actresses from 1888 up to 2016, we have uncovered a number of distinctive patterns which include an endemic scarcity of resources, a rich-get-richer mechanism of job assignation, the onset of hot and cold streaks of productivity¹¹ and an annus mirabilis which can indeed be predicted. Such patterns – which we show to systematically differ for actors and actresses, suggesting strong evidence of gender bias²⁶– not only allow us to identify qualities of individual actors or actresses working lives, but also to gain a deeper insight into the mechanisms by which jobs are themselves assigned, where high productivity is not necessarily based on merit and is likely to be a network effect^34–36. Based on our findings, we have then constructed a statistical learning model that predicts with up to 85% accuracy whether an actor or actress is likely to have a brighter future, or if the best days are, unfortunately, behind them. While we expect refined versions of the prediction model to give even higher accuracy, it is worth noting that actors with long latency periods who then experience late comebacks are rare but intrinsically difficult to predict.

We hope that the methods presented and the results obtained will contribute to the new science of success³⁵. Given the scope of our findings across the industry, we also wish that our article will be of interest to those working in show business.

Supplementary information

Supplementary Information^{(379.7KB, pdf)}

Acknowledgements

L.L. acknowledges support from EPSRC grant EP/P01660X/1. V.L. acknowledges support from EPSRC Grant EP/N013492/1.

Author contributions

L.L., V.L. and O.W. designed the study. O.W. and L.L. performed the data analysis. All authors interpreted results and wrote the paper.

Data Availability

Data is available upon request, or can be accessed at 10.17605/OSF.IO/NDTA3

Code Availability

Codes are available upon request.

Competing interests

The authors declare no competing interests.

Footnotes

Journal peer review information: Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Oliver E. Williams, Email: o.e.williams@qmul.ac.uk

Lucas Lacasa, Email: l.lacasa@qmul.ac.uk.

Vito Latora, Email: v.latora@qmul.ac.uk.

Supplementary information

Supplementary Information accompanies this paper at 10.1038/s41467-019-10213-0.

References

1.Sinatra R, Wang D, Deville P, Song C, Barabasi AL. Quantifying the evolution of individual scientific impact. Science. 2016;354:3612. doi: 10.1126/science.aaf5239. [DOI] [PubMed] [Google Scholar]
2.Galenson DW. Quantifying artistic success: ranking french painters and paintings from impressionism to cubism. Hist. Methods. 2002;35:5–19. doi: 10.1080/01615440209603140. [DOI] [Google Scholar]
3.Brands S, Brown SJ, Gallagher DR. Portfolio concentration and investment manager performance. Int. Rev. Financ. 2006;5:149–174. doi: 10.1111/j.1468-2443.2006.00054.x. [DOI] [Google Scholar]
4.Gilovich T, Vallone R, Tversky A. The hot hand in basketball: on the misperception of random sequences. Cogn. Psychol. 1985;17:295–314. doi: 10.1016/0010-0285(85)90010-6. [DOI] [Google Scholar]
5.Rabin M, Vayanos. D. The gambler’s and hot-hand fallacies: theory and applications. Rev. Econ. Stud. 2010;77:730–778. doi: 10.1111/j.1467-937X.2009.00582.x. [DOI] [Google Scholar]
6.Fortunato S. Growing time lag threatens Nobels. Nature. 2014;508:186. doi: 10.1038/508186a. [DOI] [PubMed] [Google Scholar]
7.Ma A, Mondragón RJ, Latora V. Anatomy of funded research in science. Proc. Natl Acad. Sci. USA. 2015;112:14760–14765. doi: 10.1073/pnas.1513651112. [DOI] [PMC free article] [PubMed] [Google Scholar]
8.Love JH, Roper S. Location and network effects on innovation success: evidence for uk, german and irish manufacturing plants. Res. policy. 2001;30:643–661. doi: 10.1016/S0048-7333(00)00098-6. [DOI] [Google Scholar]
9.Merton RK. The Matthew effect in science. Science. 1968;159:56–63. doi: 10.1126/science.159.3810.56. [DOI] [PubMed] [Google Scholar]
10.Petersen AM, Jung WS, Yang JS, Stanley HE. Quantitative and empirical demonstration of the Matthew effect in a study of career longevity. Proc. Natl Acad. Sci. USA. 2011;108:18–23. doi: 10.1073/pnas.1016733108. [DOI] [PMC free article] [PubMed] [Google Scholar]
11.Liu L, et al. Hot streaks in artistic, cultural, and scientific careers. Nature. 2018;559:396–399. doi: 10.1038/s41586-018-0315-8. [DOI] [PubMed] [Google Scholar]
12.Hirsch JE. An index to quantify an individual’s scientific research output. Proc. Natl Acad. Sci. USA. 2005;102:16569–16572. doi: 10.1073/pnas.0507655102. [DOI] [PMC free article] [PubMed] [Google Scholar]
13.Kozbelt A. One-hit wonders in classical music: evidence and (partial) explanations for an early career peak. Creat. Res. J. 2008;20:179–195. doi: 10.1080/10400410802059952. [DOI] [Google Scholar]
14.Simonton DK. Creative productivity: a predictive and explanatory model of career trajectories and landmarks. Psychol. Rev. 1997;104:66. doi: 10.1037/0033-295X.104.1.66. [DOI] [Google Scholar]
15.Lehman, H. C. Age and Achievement, Vol. 4970 (Princeton University Press, Princeton, New Jersey, 1953).
16.Simonton DK. Age and outstanding achievement: What do we know after a century of research? Psychol. Bull. 1988;104:251. doi: 10.1037/0033-2909.104.2.251. [DOI] [PubMed] [Google Scholar]
17.Spitz A, Horvát EAacute. Measuring long-term impact based on network centrality: Unraveling cinematic citations. PLoS ONE. 2014;9:e108857. doi: 10.1371/journal.pone.0108857. [DOI] [PMC free article] [PubMed] [Google Scholar]
18.Acuna DE, Allesina S, Kording KP. Future impact: Predicting scientific success. Nature. 2012;489:201. doi: 10.1038/489201a. [DOI] [PMC free article] [PubMed] [Google Scholar]
19.Penner O, Pan RK, Petersen AM, Kaski K, Fortunato S. On the predictability of future impact in science. Sci. Rep. 2013;3:3052. doi: 10.1038/srep03052. [DOI] [PMC free article] [PubMed] [Google Scholar]
20.Gemser G, Oostrum MVan, AAM Leenders M. The impact of film reviews on the box office performance of art house versus mainstream motion pictures. J. Cult. Econ. 2007;31:43–63. doi: 10.1007/s10824-006-9025-4. [DOI] [Google Scholar]
21.Mestyán M, Yasseri T, Kertész J. Early prediction of movie box office success based on wikipedia activity big data. PLoS ONE. 2013;8:e71226. doi: 10.1371/journal.pone.0071226. [DOI] [PMC free article] [PubMed] [Google Scholar]
22.Pardoe I, Simonton DK. Applying discrete choice models to predict academy award winners. J. R. Stat. Soc.: Ser. A. 2008;171:375–394. doi: 10.1111/j.1467-985X.2007.00518.x. [DOI] [Google Scholar]
23.Way SF, Morgan AC, Clauset A, Larremore DB. The misleading narrative of the canonical faculty productivity trajectory. Proc. Natl Acad. Sci. USA. 2017;114:44. doi: 10.1073/pnas.1702121114. [DOI] [PMC free article] [PubMed] [Google Scholar]
24.Consistent percentages obtained via large surveys have been reported in several countries including US, UK or Spain see: N.Clark, Just one actor in 50 makes more than 20,000 per year, survey shows (The Independent, 28th May 2014, online version available at https://www.independent.co.uk); Estudio y Diagnóstico sobre la situación sociolaboral de actores y bailarines en España, Fundación AISGE, 2016, available at https://www.aisge.es/media/multimedia/ficheros/618.pdf; B. Mcmahon, Unemployment is a lifestyle for actors, and not too many others, Huffpost, 5 March 2012, Available at https://www.huffingtonpost.com/brendan-mcmahon.
25.Yucesoy B, Barabasi AL. Untangling performance from success, EPJ Data. Science. 2016;5:17. [Google Scholar]
26.Lutter M. Do women suffer from network closure? The moderating effect of social capital on gender inequality in a project-based labor market, 1929 to 2010. Am. Sociol. Rev. 2015;80:329–358. doi: 10.1177/0003122414568788. [DOI] [Google Scholar]
27.Mortensen, D. T. Handbook of Labor Economics, Vol. 2 849–919. (Elsevier, Amsterdam, 1986).
28.Latapy M, Magnien C, Vecchio NDel. Basic notions for the analysis of large two-mode networks. Soc. Netw. 2008;30:31–48. doi: 10.1016/j.socnet.2007.04.006. [DOI] [Google Scholar]
29.Adamic LA, Huberman BA. Zipf’s law and the Internet’. Glottometrics. 2002;3:143–150. [Google Scholar]
30.Barabasi AL, Albert R. Emergence of scaling in random networks. Science. 1999;286:509–512. doi: 10.1126/science.286.5439.509. [DOI] [PubMed] [Google Scholar]
31.Bollobas B. & Riordan O. In Handbook of Graphs and Networks (Bornholdt S. and Schuster H. G., eds) 1–34 (John Wiley & Sons, Weinheim, 2005).
32.Latora V., Nicosia V., Russo G. Complex Networks: Principles, Methods and Applications (Cambridge University Press, Cambridge, 2017)
33.Easley D. and Kleinberg J. Networks, Crowds, and Markets: Reasoning About a Highly Connected World Chapter 18 (Cambridge University Press, New York, 2010).
34.Salganik M, Dodds P, Watts D. Experimental study of inequality and unpredictability in an artificial cultural market. Science. 2006;311:854–856. doi: 10.1126/science.1121066. [DOI] [PubMed] [Google Scholar]
35.Barabasi AL. The Formula: The Universal Laws of Success. New York: Little Brown and Company; 2018. [Google Scholar]
36.Fraiberger SP, Sinatra R, Resch M, Riedl C, Barabási AL. Quantifying reputation and success in art. Science. 2018;362:825–829. doi: 10.1126/science.aau7224. [DOI] [PubMed] [Google Scholar]
37.Bak P, Christensen K, Danon L, Scanlon T. Unified scaling law for earthquakes. Phys. Rev. Lett. 2002;88:178501. doi: 10.1103/PhysRevLett.88.178501. [DOI] [PubMed] [Google Scholar]
38.Corral A. Long-term clustering, scaling, and universality in the temporal occurrence of earthquakes. Phys. Rev. Lett. 2004;92:108501. doi: 10.1103/PhysRevLett.92.108501. [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary Information^{(379.7KB, pdf)}

Data Availability Statement

Data is available upon request, or can be accessed at 10.17605/OSF.IO/NDTA3

Codes are available upon request.

[CR1] 1.Sinatra R, Wang D, Deville P, Song C, Barabasi AL. Quantifying the evolution of individual scientific impact. Science. 2016;354:3612. doi: 10.1126/science.aaf5239. [DOI] [PubMed] [Google Scholar]

[CR2] 2.Galenson DW. Quantifying artistic success: ranking french painters and paintings from impressionism to cubism. Hist. Methods. 2002;35:5–19. doi: 10.1080/01615440209603140. [DOI] [Google Scholar]

[CR3] 3.Brands S, Brown SJ, Gallagher DR. Portfolio concentration and investment manager performance. Int. Rev. Financ. 2006;5:149–174. doi: 10.1111/j.1468-2443.2006.00054.x. [DOI] [Google Scholar]

[CR4] 4.Gilovich T, Vallone R, Tversky A. The hot hand in basketball: on the misperception of random sequences. Cogn. Psychol. 1985;17:295–314. doi: 10.1016/0010-0285(85)90010-6. [DOI] [Google Scholar]

[CR5] 5.Rabin M, Vayanos. D. The gambler’s and hot-hand fallacies: theory and applications. Rev. Econ. Stud. 2010;77:730–778. doi: 10.1111/j.1467-937X.2009.00582.x. [DOI] [Google Scholar]

[CR6] 6.Fortunato S. Growing time lag threatens Nobels. Nature. 2014;508:186. doi: 10.1038/508186a. [DOI] [PubMed] [Google Scholar]

[CR7] 7.Ma A, Mondragón RJ, Latora V. Anatomy of funded research in science. Proc. Natl Acad. Sci. USA. 2015;112:14760–14765. doi: 10.1073/pnas.1513651112. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR8] 8.Love JH, Roper S. Location and network effects on innovation success: evidence for uk, german and irish manufacturing plants. Res. policy. 2001;30:643–661. doi: 10.1016/S0048-7333(00)00098-6. [DOI] [Google Scholar]

[CR9] 9.Merton RK. The Matthew effect in science. Science. 1968;159:56–63. doi: 10.1126/science.159.3810.56. [DOI] [PubMed] [Google Scholar]

[CR10] 10.Petersen AM, Jung WS, Yang JS, Stanley HE. Quantitative and empirical demonstration of the Matthew effect in a study of career longevity. Proc. Natl Acad. Sci. USA. 2011;108:18–23. doi: 10.1073/pnas.1016733108. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR11] 11.Liu L, et al. Hot streaks in artistic, cultural, and scientific careers. Nature. 2018;559:396–399. doi: 10.1038/s41586-018-0315-8. [DOI] [PubMed] [Google Scholar]

[CR12] 12.Hirsch JE. An index to quantify an individual’s scientific research output. Proc. Natl Acad. Sci. USA. 2005;102:16569–16572. doi: 10.1073/pnas.0507655102. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR13] 13.Kozbelt A. One-hit wonders in classical music: evidence and (partial) explanations for an early career peak. Creat. Res. J. 2008;20:179–195. doi: 10.1080/10400410802059952. [DOI] [Google Scholar]

[CR14] 14.Simonton DK. Creative productivity: a predictive and explanatory model of career trajectories and landmarks. Psychol. Rev. 1997;104:66. doi: 10.1037/0033-295X.104.1.66. [DOI] [Google Scholar]

[CR15] 15.Lehman, H. C. Age and Achievement, Vol. 4970 (Princeton University Press, Princeton, New Jersey, 1953).

[CR16] 16.Simonton DK. Age and outstanding achievement: What do we know after a century of research? Psychol. Bull. 1988;104:251. doi: 10.1037/0033-2909.104.2.251. [DOI] [PubMed] [Google Scholar]

[CR17] 17.Spitz A, Horvát EAacute. Measuring long-term impact based on network centrality: Unraveling cinematic citations. PLoS ONE. 2014;9:e108857. doi: 10.1371/journal.pone.0108857. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR18] 18.Acuna DE, Allesina S, Kording KP. Future impact: Predicting scientific success. Nature. 2012;489:201. doi: 10.1038/489201a. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR19] 19.Penner O, Pan RK, Petersen AM, Kaski K, Fortunato S. On the predictability of future impact in science. Sci. Rep. 2013;3:3052. doi: 10.1038/srep03052. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR20] 20.Gemser G, Oostrum MVan, AAM Leenders M. The impact of film reviews on the box office performance of art house versus mainstream motion pictures. J. Cult. Econ. 2007;31:43–63. doi: 10.1007/s10824-006-9025-4. [DOI] [Google Scholar]

[CR21] 21.Mestyán M, Yasseri T, Kertész J. Early prediction of movie box office success based on wikipedia activity big data. PLoS ONE. 2013;8:e71226. doi: 10.1371/journal.pone.0071226. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR22] 22.Pardoe I, Simonton DK. Applying discrete choice models to predict academy award winners. J. R. Stat. Soc.: Ser. A. 2008;171:375–394. doi: 10.1111/j.1467-985X.2007.00518.x. [DOI] [Google Scholar]

[CR23] 23.Way SF, Morgan AC, Clauset A, Larremore DB. The misleading narrative of the canonical faculty productivity trajectory. Proc. Natl Acad. Sci. USA. 2017;114:44. doi: 10.1073/pnas.1702121114. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR24] 24.Consistent percentages obtained via large surveys have been reported in several countries including US, UK or Spain see: N.Clark, Just one actor in 50 makes more than 20,000 per year, survey shows (The Independent, 28th May 2014, online version available at https://www.independent.co.uk); Estudio y Diagnóstico sobre la situación sociolaboral de actores y bailarines en España, Fundación AISGE, 2016, available at https://www.aisge.es/media/multimedia/ficheros/618.pdf; B. Mcmahon, Unemployment is a lifestyle for actors, and not too many others, Huffpost, 5 March 2012, Available at https://www.huffingtonpost.com/brendan-mcmahon.

[CR25] 25.Yucesoy B, Barabasi AL. Untangling performance from success, EPJ Data. Science. 2016;5:17. [Google Scholar]

[CR26] 26.Lutter M. Do women suffer from network closure? The moderating effect of social capital on gender inequality in a project-based labor market, 1929 to 2010. Am. Sociol. Rev. 2015;80:329–358. doi: 10.1177/0003122414568788. [DOI] [Google Scholar]

[CR27] 27.Mortensen, D. T. Handbook of Labor Economics, Vol. 2 849–919. (Elsevier, Amsterdam, 1986).

[CR28] 28.Latapy M, Magnien C, Vecchio NDel. Basic notions for the analysis of large two-mode networks. Soc. Netw. 2008;30:31–48. doi: 10.1016/j.socnet.2007.04.006. [DOI] [Google Scholar]

[CR29] 29.Adamic LA, Huberman BA. Zipf’s law and the Internet’. Glottometrics. 2002;3:143–150. [Google Scholar]

[CR30] 30.Barabasi AL, Albert R. Emergence of scaling in random networks. Science. 1999;286:509–512. doi: 10.1126/science.286.5439.509. [DOI] [PubMed] [Google Scholar]

[CR31] 31.Bollobas B. & Riordan O. In Handbook of Graphs and Networks (Bornholdt S. and Schuster H. G., eds) 1–34 (John Wiley & Sons, Weinheim, 2005).

[CR32] 32.Latora V., Nicosia V., Russo G. Complex Networks: Principles, Methods and Applications (Cambridge University Press, Cambridge, 2017)

[CR33] 33.Easley D. and Kleinberg J. Networks, Crowds, and Markets: Reasoning About a Highly Connected World Chapter 18 (Cambridge University Press, New York, 2010).

[CR34] 34.Salganik M, Dodds P, Watts D. Experimental study of inequality and unpredictability in an artificial cultural market. Science. 2006;311:854–856. doi: 10.1126/science.1121066. [DOI] [PubMed] [Google Scholar]

[CR35] 35.Barabasi AL. The Formula: The Universal Laws of Success. New York: Little Brown and Company; 2018. [Google Scholar]

[CR36] 36.Fraiberger SP, Sinatra R, Resch M, Riedl C, Barabási AL. Quantifying reputation and success in art. Science. 2018;362:825–829. doi: 10.1126/science.aau7224. [DOI] [PubMed] [Google Scholar]

[CR37] 37.Bak P, Christensen K, Danon L, Scanlon T. Unified scaling law for earthquakes. Phys. Rev. Lett. 2002;88:178501. doi: 10.1103/PhysRevLett.88.178501. [DOI] [PubMed] [Google Scholar]

[CR38] 38.Corral A. Long-term clustering, scaling, and universality in the temporal occurrence of earthquakes. Phys. Rev. Lett. 2004;92:108501. doi: 10.1103/PhysRevLett.92.108501. [DOI] [PubMed] [Google Scholar]

PERMALINK

Quantifying and predicting success in show business

Oliver E Williams

Lucas Lacasa

Vito Latora

Abstract

Introduction