Abstract
Recent technological advances such as microprocessors and random-access memory have played a significant role in gathering, storing and processing digital data, but the basic principles underpinning such data management were established in the century preceding the digital revolution. This paper maps the emergence of those older technologies to show that the logic and imperative for the surveillance potential of more recent digital technologies were laid down in a pre-digital age. The paper focuses on the development of the data point from its use in punch cards in the late 19th century through its manipulation in ideas about correlation to its collection via self-completion questionnaires. Some ways in which medicine and psychology have taken up and deployed the technology of data points are used as illustrative exemplars. The paper concludes with a discussion of the role of data points in defining human identity.
Keywords: data points, digital identity, digital technologies, social life of methods
In 1880 the decennial US Census enumerated 50 million people. It was a complex process capturing economic activity as well as population characteristics and involved using over 200 different forms containing 13,000 different questions. Collation of this vast quantity of data was by hand; clerks read Census returns and counted responses using tally marks on a separate record. This was a laborious activity, especially if some sort of cross tabulation was desired, and it took over seven years to publish the main Census results, many responses remaining uncounted and unpublished. Based on this experience it was estimated that the results of the Census planned for 1890, involving the enumeration of over 62 million people, would not be published until well into the following century.
In the event, the 1890 Census was published in only three years due to the use of punch cards and an Electric Tabulating Machine invented by Herman Hollerith (1889). His solution to the data management problem was to transfer responses from the Census forms to a card divided into quarter-inch squares, each square being assigned a particular value or designation. A hole was then punched in that square from a special keyboard if, for example, the respondent identified themselves as male but left unpunched if female. The Electric Tabulating Machine could then ‘read’ the cards by making an electrical connection if a hole was present. Hollerith’s company further developed its ‘pantograph’ punch keyboards and tabulating machines and in 1924 became the International Business Machines Corporation (IBM), whose 80-column punch cards became the standard for mechanical data handling right up to the computer age at the end of the 20th century.
Hollerith’s system offered an efficient solution to the technical problem of data management. By the 1930s, an experienced keyboard operator could punch about 1,800 cards a day and electromechanical ‘card readers’ could rapidly summarize the data. The key innovation of the Tabulating Machine, however, was not efficiency but what might be called the ‘data point’. A round hole, about three-sixteenths of an inch in diameter, established a mechanism for collating, storing and analysing the individual characteristics of large numbers of people, the presence or absence of the hole echoing the binary coding of modern digital technology. In effect Hollerith’s punch cards marked the beginning of the digital age nearly a century before ‘digital’ microprocessor technologies.
Hollerith’s data management technologies spread widely during the 20th century, as punch cards and sorting machines were further developed (Heide, 2009; Kistermann, 1991). These devices and methods, though, had a double life (Law, 2009). On the one hand, they described reality – in the original instance, contemporary characteristics of the US population – and on the other hand they were performative in that they enacted or produced the very characteristics they purported to measure (Ruppert, 2007). This constructive task was based on the punch card and Tabulating Machine devices but also, crucially, on their outputs. The latter were not merely inscriptions but a new way of structuring information, a proto-digital language that itself had a social life. The data point of the punched hole reduced data to a singular atom, the smallest possible piece of information. These data points could be joined together into larger digital structures; they could be the basic building blocks of a molecular knowledge of the individual that, when combined, manipulated and compared, offered a new descriptive language of identity. The accumulation of these data points – and given the capacity of Hollerith’s punch cards and Tabulating Machine the potential scope was large – has come to be recognised as Big Data, though there were many steps between the original Tabulating Machine and modern digital technologies. Some of these earlier developments were based around mechanical improvements in the Tabulating Machine but most were concerned with developing what might be called the ‘social-physics’ of the data point. This involved determining both what iota of information could be captured in the punched hole and how that data point could be ‘spatialized’ to underpin a new information network that might both describe and construct a digital individual. 
This paper describes the social life of data points before the digital age, drawing illustrative material from two sciences, psychology and medicine, both of which have played an important role in transposing the descriptive language of the individual from idiosyncratic narrative to the atomistic data collected on the punch card.
Punch cards, data points and medical application
The success of the Hollerith Electric Tabulating Machine for recording and analysing the US Census led to its widespread use whenever large quantities of data needed collating and storing. Punch cards proved useful in areas as diverse as farming, opinion polls and the military. Indeed, during WWII the US Army had numerous Machine Record Units, Mobile (MRUs) ‘each complete and self-contained … (consisting) of two huge truck-trailers carrying complete machine records equipment mounted on rubber shock absorbers and sprung carriages’ (Jay, 1942: 10). MRUs ‘landed on the beaches of Normandy, Sicily, Italy, and the islands of the Pacific even before docking facilities had been established’ (Donges, 1948: 65).
Punch cards began to be used in medical settings whenever clinical data could be standardized in some way. Surveys of chronic illness and impairment, for example, made extensive use of punch cards (Britten and Goddard, 1932; Research Division of the Milbank Memorial Fund, 1930) as did later large community surveys such as Framingham and Midtown Manhattan (Dawber et al., 1951; Srole et al., 1962). Post-war mass screening also produced codified data that could be transferred to punch cards for storage and analysis (Schenthal, 1960). Capturing routine clinical data on punch cards, however, proved more challenging as clinical records followed a narrative form, trying to capture the patient’s illness ‘story’ followed by identification of the clinical clues for diagnostic problem-solving. Punch cards were used in the inter-war years only in situations where clinical data was already standardized, such as the homogeneous patients attending a thyroid clinic (Lerman and Means, 1933) or when a tabular summary of diagnoses could be developed (Berkson, 1936). When clinical data was more numeric – such as assessments of physiological functioning under stress of potential pilots in the US space programme – use of a punch card-based clinical record could be explored, though even then the space programme investigators were surprised to find that ‘contrary to all assumptions and expectations … there had apparently been very little actual recording of medical data as such, on machine record cards, and none in any great detail, such as this project would require’ (Schwichtenberg et al., 1959: 1324).
The rapid expansion in demand for laboratory investigations in the second half of the 20th century enabled punch cards to make further inroads into clinical settings:
Once the data have been acquired, how is the information to be subsequently manipulated? In our own laboratory we use a simple IBM870 punch card data processing system and with this we have been able to deal quite efficiently with the results obtained in the routine haematology laboratory. (Nelson, 1969: 5)
By the late 1960s, laboratories were also experimenting with processing their punch card records through off-site mainframe computers whenever access could be arranged. Over the next few years these computing resources became more available and were increasingly located in the laboratory itself alongside the punch card keyboards and sorters.
Early computing power increased the potential for a punch card-based medical record. In one experimental setting, data from 138 outpatients was transferred on to punch cards, checked and sorted using electronic sorting/counting machines but then analysed using a digital computer that could identify patterns of signs and symptoms with great speed (Schenthal et al., 1960). A Journal of the American Medical Association editorial summarized the advantages of the new system: Unlike ‘analog computers’ (such as the electrocardiograph) ‘the electronic digital computer has its input in the form of representatives of discrete digits. The input digit to a digital computer can be any symbol which the computer has been designed to accept’ (JAMA editorial, 1960: 58). In the following years, further attempts were made to reduce the narrative language of medicine to a binary format, as systems for encoding clinical terminology were developed and a new science of clinical informatics emerged.
While the punch card might have started the digitization process it did not survive to see its full implementation. In the 1970s, magnetic tape started to be used to back up punch cards. Then, as direct data entry to tape or disk expanded, the use of punch cards declined. The storerooms of punch cards that recorded previously collected data were transferred onto magnetic tape and then the individual cards were discarded. For data from the large Framingham cohort, for example, ‘as technology advanced, punched cards moved to reel-to-reel tapes, to big floppy discs, to smaller ones, to CDs, to secure FTP, enabling data transfer with increasing content and efficiency’ (Sorlie et al., 2015: 17). In 1937, IBM was producing 10 million punch cards every day, but by the 1990s, after 100 years of dominating the management of data points, the era of punch cards was ending.
Hollerith, the inventor of the punch card, had originally been inspired by railroad tickets that let the conductor create a rough description of the traveller (Austrian, 1982). The rail ticket was printed with several descriptive terms – tall or short, male or female, slim or stout, etc – and the ticket collector would clip out the words that did not apply. The passenger ended up with a ‘punch photograph’, a self-portrait on their ticket made up in a proto-digital format. So, while modern digital technology in medicine certainly owes much to the technical solutions offered by the microprocessor and random-access memory of the last two decades, some key elements of the ‘digital revolution’ had already been established over the preceding century. The punch card not only allowed data points to become the basis of a modern descriptive digital language, an ever more detailed ‘punch photograph’, but also encouraged the reduction of information to proto-digital points. The punch card technology that enabled and promoted a digital descriptive language, however, also heralded a parallel revolution in the emergence of virtual multi-dimensional data points.
The social-geometry of the data point
The 1890 US Census presented standardized questions to respondents and their responses – sex, marital status, etc – were reduced to a series of data points captured on punch cards. At its simplest, the presence or absence of a hole in the card might denote, say, a respondent’s recorded sex, thereby enabling the numbers of males or females in the whole population to be determined by running the data cards through the Tabulating Machine. Where more than one response was allowed, there were several potential holes in the punch card to accommodate the various possibilities. In this way, two or more data points as represented by holes in the card could be combined to record any multi-digit number.
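The combination of holes into a multi-digit number can be illustrated with a minimal sketch. It assumes a simplified, hypothetical card layout rather than Hollerith's actual 1890 design: each column offers ten positions numbered 0 to 9, and a single punched hole per column records one decimal digit. The function names `punch_number` and `read_number` are illustrative only.

```python
def punch_number(value: int, columns: int) -> list[set[int]]:
    """Return, for each column, the set of punched positions (hypothetical layout)."""
    if value < 0:
        raise ValueError("only non-negative numbers can be punched")
    digits = str(value).zfill(columns)  # pad with leading zeros to fill the field
    if len(digits) > columns:
        raise ValueError("value needs more columns than the field provides")
    return [{int(d)} for d in digits]


def read_number(field: list[set[int]]) -> int:
    """Recover the number by reading which position is punched in each column."""
    value = 0
    for holes in field:
        if len(holes) != 1:
            raise ValueError("expected exactly one hole per column")
        (digit,) = holes
        value = value * 10 + digit
    return value


# An age of 54 occupies a two-column field, with one hole in each column.
age_field = punch_number(54, columns=2)
print(age_field)               # [{5}, {4}]
print(read_number(age_field))  # 54
```

A binary attribute such as recorded sex needs only one position, punched or unpunched; the multi-column field simply strings several such data points together.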
The initial output from the Tabulating Machine was enumeration: How many males or 54 year olds were there in the population? But, as its name implied, the Tabulating Machine could also use relay logic to enable counts of combinations of holes such as the number of females in the population who were also married. The output was a ‘table’. Tables had been used earlier in the 19th century as a means of organizing and presenting information ranging from the ‘table of contents’ at the beginning of a textbook to mortality tables that presented lists of deaths at different ages in an abbreviated form. The table that emerged from the Tabulating Machine, however, involved comparing individuals across two variables at the same time. By cross-tabulating two variables a new conceptual two-dimensional space was opened in which individuals could be located, the intersection of variables establishing new virtual ‘spatialized’ data points distributed in the cells of the table. Statistical methods emerged to summarize the degree of association of these data points and by the late 1930s their presentation was being labelled as ‘contingency tables’ that captured the way in which one variable depended on another. The geometry of this dimensionalized data point can be more clearly seen in graphical representations that also began to be formalized around the end of the 19th century.
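The move from simple enumeration to cross-tabulation can be sketched in a few lines. The example below uses invented records and does not model the relay logic of the Tabulating Machine; it only shows the counting of cards that fall into each cell of a two-way table. The field names `sex` and `married` and the function `cross_tabulate` are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical cards, each reduced to two recorded attributes.
cards = [
    {"sex": "female", "married": True},
    {"sex": "female", "married": False},
    {"sex": "male", "married": True},
    {"sex": "female", "married": True},
    {"sex": "male", "married": False},
]


def cross_tabulate(cards, row_var, col_var):
    """Count the cards falling into each (row, column) cell of a two-way table."""
    return Counter((card[row_var], card[col_var]) for card in cards)


table = cross_tabulate(cards, "sex", "married")

# Simple enumeration counts one variable; cross-tabulation counts combinations.
females = sum(1 for card in cards if card["sex"] == "female")
print(females)                  # 3
print(table[("female", True)])  # 2
```

Each key of `table` is a cell of the table, that is, a position defined by two variables at once rather than by a single hole.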
For much of the 19th century, ‘graphic’ approaches had implied the outputs of various ‘graph’ devices such as the sphygmograph, stereograph, radiograph, etc., but towards the end of the century ‘graphical statics’ emerged as a means of using geometry in the presentation of data. Its main form was the ‘method of curves’, in which two sets of data were presented together as two lines with the measurement of time being the same for both variables: ‘It calls our attention to sequences and coincidences of time, and prompts us to seek for the causal connection between them’ (Marshall, 1885: 252). In 1880, the ‘graphic method as applied to physiological investigation’ was introduced to a medical audience in a description of its basic elements:
The starting-point of the system was a space or distance moving in a horizontal direction past a fixed point in a certain period of time …. By studying the relations of the abscissa and the ordinate, which together constituted the coordinates … we could calculate almost anything in physiology. (Arnold, 1880: 109)
The temperature or clinical chart illustrates contemporary data point presentation in medicine. Until the end of the 19th century a chart in medicine referred to any sort of list, table, diagram or drawing that provided a summary of an area, such as a ‘Chart of Poisons’. But the introduction of the temperature chart in the last two decades of the 19th century showed patient temperatures over a period of time plotted on ‘co-ordinate paper’ with a line joining the points: ‘the clinician just needed to glance at the temperature chart above their beds to determine whether the patient was suffering from any febrile disease’ (Ashby, 1881: 69). Other ‘data points’, the pulse rate, blood pressure and respiration rate, were later added to the ‘clinical chart’ to produce additional temporal lines.
Just as the table created a two-dimensional space by cross-tabulating two variables, so the revolution in graphic presentation was marked by the subsequent spatialization of data points. To be sure, the lines and connected points that underpinned the method of curves were represented by pairs of data – perhaps a temperature and the time it was taken – but it was less a form of cross-tabulation and more a comparison of frequencies. The transformation in graphical representation involved the plotting of one variable directly against another. ‘Up to this time’, it was observed in 1919, ‘we have thought of the values of x_i and y_i as functions of i, and have plotted them as two separate curves on the axis of i as a base’. Instead, it was proposed to plot the graph of x against y:
The result will not, however, be an ordinary graph, since y will not, in general, be a single-valued function of x. To any value of x may correspond many values of y, any one of which may be repeated more than once. A typical graph of y as a function of x will therefore [show] every dot (x, y) [that] represents a pair of values of x and y belonging to some individual, and the total number of dots is equal to the total “population”. (Huntington, 1919: 426)
The new data point dot, therefore, held not only a value of x but also a value of y.
The disseminated data points defined by two variables were described as ‘scatter diagrams’ (later also as ‘scatter plots’ or ‘dot charts’) in which all two-value data points were represented by dots or small crosses. In the early 20th century, these diagrams became more common: ‘In the last few years there has been a great increase in the appreciation and use of statistical charts’ (Fisher, 1917: 577). In 1915, a Joint Committee on Standards for Graphic Presentation (1915), which included engineers, economists, statisticians, psychologists, mathematicians, accountants, educators and others, drew up guidance for how this new two-dimensional space should be represented. The abscissa (the x-axis) was to run along the bottom from left to right and the ordinate (the y-axis) was to be placed on the left-hand side with values running from bottom to top, and so on. These were the specifications that would guide data point presentation for the next hundred years.
Unlike the earlier lines and curves that linked data, the data points in the scatter diagram were difficult to connect. Rather the distribution and density of the data points indicated the ‘correlation’ (co-relation) of the two variables and accordingly the scatter diagram was often referred to as a correlation diagram or correlation graph. The term ‘correlation’ indicated a new form of relationship between variables based simply on co-occurrence: ‘Instead of speaking of “causal relation”, “causally related quantities”, we will use the terms “correlation,” “correlated quantities”’ (Yule, 1897: 812). A correlation coefficient summarized this relationship and a line of ‘best fit’ between the data points (a regression line) captured the implied relationship between the two variables across the target group or population.
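How a cloud of two-value data points is condensed into a single coefficient and a single line can be shown with a short sketch. It assumes the now-standard Pearson product-moment coefficient and an ordinary least-squares line of y on x; the sources cited above do not specify these formulas, and the data are invented for illustration.

```python
from math import sqrt


def _sums(xs, ys):
    """Return the means and the sums of squares and cross-products about them."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return mean_x, mean_y, sxy, sxx, syy


def pearson_r(xs, ys):
    """Correlation coefficient: co-variation scaled by the spread of each variable."""
    _, _, sxy, sxx, syy = _sums(xs, ys)
    return sxy / sqrt(sxx * syy)


def regression_line(xs, ys):
    """Least-squares slope and intercept of the line of best fit of y on x."""
    mean_x, mean_y, sxy, sxx, _ = _sums(xs, ys)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Each (x, y) pair is one dot in the scatter diagram, one individual.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = regression_line(xs, ys)
print(round(pearson_r(xs, ys), 3))           # 0.775
print(round(slope, 2), round(intercept, 2))  # 0.6 2.2
```

Five individuals, each a dot, are thereby replaced by three numbers that describe the group rather than any one of its members.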
Between about 1890 and 1920 the new social geometry of the data point was laid out. The data points of the punch card were multiplied by the unfolding of two-dimensional spaces (and, in subsequent decades, multi-dimensional spaces). New data points in the form of dots became visible in scatter diagrams and scatter plots, measures of association and correlation coefficients unified their elements, and a new line capturing the distribution of points summarized the interdependence of variables. A pattern for understanding the relationship between variables based on causation gave way to one based on correlation as a web of connections between disparate measures was established. Individual measures of the person – the Census variables or the anthropometric assessments of stature – could be used to create a proliferation of new two-dimensional data points that had the potential to transform knowledge of the individual. This proto-digital language was further developed in the emerging area of psychometrics.
Mental mechanics
In 1890, Cattell and Galton proposed using ten ‘mental tests and measurements’ to capture psychological characteristics of the individual. Each test involved a brief ‘experiment’ by the tester after which the subject’s response was recorded. The proposed tests were like pre-existing anthropometric measurements such as height and weight in that each produced single data points. They were also similar in the way in which they examined aspects of physical or physiological functioning: Tests, for example, assessed grip pressure, speed of movement and reaction time to a sound. As Cattell and Galton (1890) noted, ‘it was impossible to separate bodily from mental energy’ so it was assumed that measures of the former would be valid measures of the latter (p. 374).
At the end of the 19th century, Binet had proposed that intelligence could be better measured by means of the higher psychological processes than by elementary sensory processes. A new scale devised in 1905 with his colleague, Simon, consisted of 30 different tests. A revised scale published in 1908 dropped some simple tasks, added new ones at the higher end of the scale and provided instructions for summing the individual test scores. A third revision of the Binet-Simon scales appeared in 1911, offering five tests for different age levels which, together with a new scoring system, allowed the mental level of the child to be calculated. This mental level or age could easily be compared with chronological age: dividing mental age by chronological age and then multiplying by 100 yielded a ‘quotient’. Later, in 1916, the scale was further revised to produce the now familiar ‘IQ test’.
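The arithmetic of the quotient described above is simple enough to state as a one-line function. The sketch below follows the rule given in the text (mental age divided by chronological age, multiplied by 100); the function name `ratio_iq` and the example ages are illustrative.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Quotient as described in the text: mental age / chronological age x 100."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return mental_age / chronological_age * 100


# A child of 8 performing at the level expected of a 10-year-old.
print(ratio_iq(10, 8))  # 125.0

# Mental age equal to chronological age gives the reference value of 100.
print(ratio_iq(8, 8))   # 100.0
```

Two data points, themselves aggregates of many test items, are thus collapsed into one.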
The emergence of the IQ test at the beginning of the 20th century was one example of the fundamental contribution of psychology to the stabilization and intensification of data point technologies. Items in questionnaires provided the basic elements for aggregated data points, such as IQ scores, that attempted to capture some aspect of the psychological functioning of individuals. In the first few decades of the 20th century, psychologists used this new technology to devise tests to measure various aspects of mental functioning such as ‘Memory’, ‘Imagination’, ‘Attention’, ‘Suggestibility’, ‘Moral Sentiments’, ‘Nervousness’ and the ‘Faculty of Discriminating Loudness’. Each of these mental attributes could be represented as data points in which a human characteristic previously known only in narrative form was reduced to a single number through aggregation of other numbers. Whether a child could touch his or her right ear, for example, could be added to success in the ability to repeat numbers to derive a single score.
The derivation of scores by mathematically manipulating existing data points (mainly by addition) proved a popular activity in the early 20th century, and methods for combining the results from several tests were promoted (Woodworth, 1912). Yet together with their resultant data points the proliferation of mental tests faced several challenges. First, how could the addition of individual test scores to create a measure of Nervousness or Suggestibility, say, be justified when adding or removing one item (that is, one data point) might change the final score? Second, how could a measure, of Imagination perhaps, be ‘validated’ as an accurate summary of that faculty? And third, how did these various human attributes relate to one another: Were Memory and Moral Sentiments, for example, associated? The solution to all these problems was correlation. As Cattell (1893) noted, ‘A still further advance is made, when we are able to correlate the frequency of an association with the time it takes. Here we have, indeed, the beginnings of a mental mechanics’ (p. 322). Or, as Spearman (1904) advocated a few years later, a ‘Correlational Psychology’ was needed ‘for the purpose of positively determining all psychical tendencies, and in particular those which connect together the so-called “mental tests” with psychical activities of greater generality and interest’ (p. 205).
The correlation of two psychological variables could be interpreted in two ways. Either the two variables were measuring the same underlying construct or they were measuring two different things that were related in some way. Correlation of IQ scores and classroom test results, for example, might be used to validate IQ – surely educational performance and IQ results were both the product of some underlying ‘intelligence’? – or to show that one of the variables, say intelligence as measured by an IQ score, determined better classroom performance. A correlation between two psychological measures therefore established a tension between the independence of virtual constructs and their connectedness. The result was an increasing web of relationships in which psychological identity crystallized, a mesh of data points whose correlations linked otherwise virtual human characteristics.
At first, these psychological constructs were anchored in physical measures. In the early years of mental testing, for example, physical and physiological measures formed the second variable in the correlation table, as they promised a grounding for the newly emerging psychological constructs. The correlation of intellectual ability with the shape of the head or with the ratio of height to weight, for example, involved attempts to validate psychological constructs in the physical world. The finding of small or non-existent correlations did not, however, detract from a correlation activity that was less based on empirical findings and more on exploring the interrelatedness of human attributes that could not be directly measured. Correlation allowed exploration of the interconnectedness of those same attributes, assessing the degree to which two variables mapped the same conceptual space, establishing a network of relationships – expressed as data points – around everyone. Within a few years, psychological constructs began to float free as their web of connections focused on psychological attributes alone.
The new use of correlation, a measure of how empirical data points related one to another, increased rapidly in the early 20th century. It
became possible for anyone who could add and multiply to correlate any two series of data that he might encounter, and with multiple correlation methods, any three, four, five or more series; and with little knowledge of the meaning of what he was doing to believe that the resulting coefficients were measures of the true laws of relationship, regardless of the inadequacies of the data. (Malenbaum and Black, 1937: 71)
It was reported that a professor of statistics in a Midwestern institution one day encountered the director of the agricultural experiment station on the campus, and the following conversation ensued: ‘You’ve been giving So-and-So a course in statistics, haven’t you?’ … ‘Why, Yes?’ … ‘Well, Good Lord, can’t you do something to make him stop correlating?’ (p. 71).
The correlation coefficient captured the relationship between the data points of two variables. This analysis could be taken further with both multiple correlation linking many variables and partial correlation aiming at isolating the specific connection between two variables while allowing for the effect of others. To make sense of this dense network of correlations, the technique of factor analysis emerged in the 1930s, by which ‘hidden’ factors underpinning correlation matrices could be identified, labelled and quantified into yet more data points.
If we have a table of intercorrelations for a battery of motor tests it is of considerable psychological interest to know how many independent motor abilities it is necessary to postulate in order to account for the whole table of intercorrelations … It is probable that these methods of multiple factor analysis will be useful in discovering how many factors underlie a given table of correlation coefficients and in discovering their general nature. (Thurstone, 1931: 427)
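The idea of partial correlation mentioned above, isolating the connection between two variables while allowing for a third, can be sketched with the standard first-order formula. The formula is the textbook one and is not taken from the sources cited here; the coefficients in the example are invented.

```python
from math import sqrt


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y, holding z constant."""
    denominator = sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denominator == 0:
        raise ValueError("undefined when x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denominator


# Hypothetical coefficients: x and y correlate at 0.72, but both also
# correlate with a third variable z (0.6 and 0.8 respectively).
print(round(partial_correlation(0.72, 0.6, 0.8), 3))  # 0.5
```

In this invented case, part of the apparent association between x and y is attributable to their shared relationship with z, and the coefficient falls from 0.72 to 0.5 once z is allowed for.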
The role of mental mechanics in developing the technology of the data point was profound. It was not only a case of correlating variables, of creating correlational spaces in which virtual data points could materialize, but of assigning a number to a psychological attribute that could not be directly envisioned or measured: This was the data point that existed in a multi-dimensional space, its attributes reflecting the data points of its constitutive axes. Ironically, just as these data points could claim to reveal a new numerical description of the individual, their combination and comparison in correlational psychology involved a choreography of data points quite separated from the individual. Whether intelligent students did well at school was answered by correlating intelligence scores with academic grades: it was the data points that represented individuals and it was the new data points/correlation coefficients that characterized an aspect of individuality, albeit without needing to return to and re-measure the individuals concerned.
The reduction of human characteristics to data points challenged the new discipline of psychology to measure and relate seemingly intangible human attributes such as ‘Aesthetic Response’, ‘Aggressiveness’, ‘Confidence’, ‘Excitability’, ‘Cheerfulness’ and ‘Honesty’. The solution was the accumulation of data points located in multi-dimensional property spaces (Barton, 1955). Facets of identity could be stabilized in a space defined by construct validity (Cronbach and Meehl, 1955) and methodological triangulation (Campbell and Fiske, 1959). In the 19th century, a man or woman had ‘character’, expressible in a poetic language; in the 20th century he or she had a ‘personality’, a digital portrait of individuality that was constructed through an elaborate assemblage of data points bound together by correlation coefficients and factor analyses (Schiele et al., 1943). ‘The correlational psychologist is a mere observer of a play where Nature pulls a thousand strings; but his multivariate methods make him equally an expert, an expert in figuring out where to look for the hidden strings’ (Cronbach, 1957: 675).
Controlled answer questionnaire technology
The emphasis of mental tests on speed or success in completing specific tasks meant they were difficult to take beyond the laboratory or classroom or conduct with large numbers of people. Moreover, it was apparent they only tapped a limited range of psychological functions:
I do not wish to be hasty in discarding all of the many methods of mental testing now in vogue, but one can have little hesitation in saying, after careful examination of them, that they are limited, narrow and unsatisfactory. (Swift, 1916: 82)
The ‘over-rapid growth of mental tests’ meant that they failed to measure psychological functions ‘which are not of a fairly routine character’ (Moore, 1916: 227). Methods were needed that would transpose other aspects of mental life quickly and efficiently into novel data points. These methods were adapted from the 19th century questionary or questionnaire.
The questionary invited responses to a pre-set list of questions by a target group of respondents. The format involved questions such as ‘What are the mineral products of the country? Where are the mines and quarries situated? At what distance from the port?’ (Fayrer and Yeats, 1878: 216). Without a standardized response, analysis consisted of recording and summarizing text. When questionnaires were introduced into psychology they too obtained ‘open-ended’ inventories (for example, of children’s fears; Hall, 1897) that were simply presented as lists. Besides, for some researchers a questionnaire, unlike a test of reaction times or task achievement, required more introspection than children could manage. So, if the questionnaire allowed a mental world to be explored by introspection, how could the responses be recorded in terms of data points?
Early in the new century, Yerkes (1914) devised a ‘multiple choice method’ for testing animals, in which pressing one of twelve keys yielded a desired result, such as the presentation of food or the ringing of a bell. A few years later, Myerson (1919) applied a ‘Method of Multiple Choice’ to scoring personality tests for human subjects. Others described ‘cross-out’ tests: ‘The tests described in the present paper are all “cross-out” tests – that is, each one asks of the subject that by crossing out some one thing he eliminate a wrong, irrelevant, or extreme element in a situation’ (Pressey and Pressey, 1919: 138). These ‘controlled answers’ transformed the questionnaire from a narrative enquiry into a data point machine suitable for processing large numbers of respondents. The First World War provided an opportunity to apply the new technology to thousands of recruits:
‘The present tendencies in testing may be conveniently thought of as those evident since 1917. In this year, came a more extensive application of psychological tests to masses of persons than had ever been imagined possible. … The fact is that the government was sufficiently impressed by preliminary trials to put through a universal plan of psychological examining for almost the entire draft army’ (Young, 1924: 40).
The new military questionnaires (the Army Alpha) were derived from existing tests modified for ‘group conditions’ by including ‘“the method of alternative answers”, and pencil and paper methods of dealing with true-false materials, vocabulary tests, etc.’ (Young, 1924: 40). Other questionnaires intended for use in the military, such as Woodworth’s Personal Data Sheet for discovering emotional maladjustments (Pescor, 1934), exploited the potential of ‘controlled answers’ by requiring all questions to be answered with a ‘yes’ or ‘no’. Use of controlled answer questionnaires was further extended with the invention of ‘mark-sense’ data cards in the inter-war years. These consisted of adapted punch cards that used machine-readable pencil marks instead of holes (Watkins, 1943). This innovation allowed a card with questions and answer spaces to be printed so that respondents could self-complete data cards and observers could read the responses by examining the card itself. There was no need for the intermediate step of a separate punch operator to transfer data points from questionnaire to punch card: Respondents could create their own data points. In the early 1930s, the responses to mental tests and questionnaires began to be described as self-reports or ‘self-ratings’.
The controlled answer questionnaire could be used whenever large numbers of people needed ‘processing’. During wars or whenever medicine penetrated communities the questionnaire provided an efficient means of assessing some aspect of health status. In the second half of the 20th century, questionnaires proved useful in collecting patients’ symptoms, the first step in the clinical process: ‘Improved methods of collecting and recording detailed clinical information directly from patients are needed’ (Slack et al., 1966: 194). The Cornell Medical Index-Health Questionnaire and the questionnaire portion of the ‘Multiphasic Health Checkup’ of the Permanente Medical Group, for example, were designed to take standardized histories in a form permitting computer processing of patients’ responses. Each of these questionnaires consisted of medical questions answered ‘yes’ or ‘no’ by patients themselves. Responses to the Cornell questionnaires were then key-punched onto data-processing cards for storage and processing while the Permanente responses were pre-sorted by the patients: ‘The computer itself is used here to collect data directly from the patient. This is a “closed-loop” system. There are no intervening data handlers between the subject-matter expert—that is, the patient—and the computer’ (Slack et al., 1966: 198).
Digitization of patients’ own medical histories, their accounts of symptoms and illnesses, exploited the earlier successes of the self-completion questionnaire. The use of questionnaires to record symptoms (in the context of school performance) started in the 1920s but rapidly expanded with inter-war morbidity surveys, particularly of chronic illness. These surveys were prototypes for later computerized data collection methods. Instead of the doctor opening the consultation with some pleasantry or specific question about the location of pain or the presenting symptom, the interviewing machine presented a standardized interrogation to the patient. This standardization ensured that the patient’s responses could be grouped and summarized numerically. While a patient might use a diverse vocabulary to describe a pain, the controlled answer questionnaire ensured that any subjective experience could be objectified in a series of data points.
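The transposition described above, from ‘yes’ or ‘no’ answers to data points that can be grouped and summarized numerically, can be illustrated with a minimal sketch. The item names and patient responses below are hypothetical and are not drawn from the Cornell, Permanente or any other questionnaire cited here; the sketch shows only the general principle of a controlled answer format, not any historical system.

```python
from collections import Counter

# Hypothetical yes/no items; not taken from any questionnaire cited in the text.
ITEMS = ["frequent_headaches", "chest_pain", "trouble_sleeping"]


def encode(responses):
    """Turn one respondent's 'yes'/'no' answers into a tuple of 1/0 data points."""
    return tuple(
        1 if responses[item].strip().lower() == "yes" else 0
        for item in ITEMS
    )


def summarize(encoded_rows):
    """Count the 'yes' answers for each item across all respondents."""
    totals = Counter()
    for row in encoded_rows:
        for item, value in zip(ITEMS, row):
            totals[item] += value
    return {item: totals[item] for item in ITEMS}


# Three invented respondents, each answering every item with 'yes' or 'no'.
patients = [
    {"frequent_headaches": "yes", "chest_pain": "no", "trouble_sleeping": "yes"},
    {"frequent_headaches": "no", "chest_pain": "no", "trouble_sleeping": "yes"},
    {"frequent_headaches": "yes", "chest_pain": "yes", "trouble_sleeping": "no"},
]

rows = [encode(p) for p in patients]
counts = summarize(rows)
```

Each respondent becomes a fixed-length row of ones and zeros, much like a row of punched and unpunched positions, and it is only because every row has the same form that the rows can be added up across a population.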
Data points, digital technology and identity
In today’s digital world, computers, self-monitoring devices, cloud storage and the vast digital archives of Big Data are among the resources used for production, storage and analysis of the now-ubiquitous data points. The modern digital regime is based on data points and without their growth and development over the previous century it is unlikely that electronic data technologies could have spread with such rapidity. Punch cards – or at least the holes in the cards – were the earliest manifestation of this proto-digital technology allowing the management of data from large numbers of people, a previously challenging task. That capability opened the possibility of collecting quasi-digital data from any ‘population’, as new groups were identified and their characteristics reduced to data points. Gaining knowledge of large bodies of people – soldiers in wars, respondents in community surveys or patients in health care systems – could only be achieved with the aid of punch cards and data points. Analysis of those punch cards by tables, graphs and correlational techniques established the population parameters against which and within which the individual could be known.
Manipulation of data points enabled the interconnectedness of things – mental attributes, physical characteristics, symptoms, diseases, etc. – to escape the hegemony of causation. A two-dimensional data point was sustained only by the presence of other data points. Correlational approaches, for example, changed the relationship of psychological and clinical phenomena, bound together less by cause and more by co-occurrence. Just as factor analysis (and later searches for ‘latent variables’) conjured up new variables beyond immediate perception so clinical correlation began to identify broad pathological processes (such as metabolic syndrome) that underlay the outward manifestations of a variety of diseases, signs and symptoms.
Active involvement of individuals in the production of data through self-completed questionnaires or mental tests involving a ‘closed’ response set signalled a personal contribution to the web of data points that surround, describe and constitute identity. Everyone could create their own ‘punch photographs’, their own self-portraits. These new data points, in their turn, made ‘objective’ what had been subjective not only by expressing the latter in terms of irreducible data points but also in the very act of affirming the data point. When a class of General Psychology students was invited to say whether they had ‘the feeling of being stared at’ (Coover, 1913), not only did the results provide a data point record but the very act of reflecting on the answer may well have had its own effects. Multiplied a thousand-fold, a component of subjectivity was crystallized in the act of data point self-creation.
What has been lost? All those biographical idiosyncrasies that made every mind and every case record unique are disappearing, as without a digital language to describe them they sit uneasily in a digital descriptive taxonomy. Revival of ‘narrative medicine’ or ‘narrative psychology’ might reflect nostalgia for a vanishing world (Greenhalgh and Hurwitz, 1999; Sarbin, 1986), but the data point machinery has replaced the chronicles of events and accounts of experiences with a new language of numerical precision that transposes those same events and experiences. Squeezing human characteristics into the punched hole, into the singular data point that can have no nuances, represents a revolution in the understanding of identity. Even proposals to counteract the depersonalizing effect of a reductionist medicine with ‘social biomarkers’ simply extend the range and power of data points (Prainsack, 2017). The data point, moreover, represents the archetypal surveillance object: What is seen, what is recorded, what is analysed. Surveillance technologies are equally effective in monitoring individual performance or subjective state as both can be reduced to data points. If surveillance (including self-surveillance) both monitors and constructs its objects, then data points form the foundations of modern subjectivity.
Exploration of the social life of data points inverts those histories of statistics that prioritise the contributions of gifted individuals such as Quetelet, Galton or Pearson (Desrosières, 1998; Hacking, 1990) or the interests of social groups (Porter, 1995). It was the advent of the data point in the late 19th and early 20th centuries, together with its proliferating dimensions, that produced an explosion in the availability of raw material for numerical manipulation. It was only after this ‘abundance event’ that individuals and social groups who pioneered statistical thinking could be recognised and their contribution explained in terms of personal, political and economic factors. Similarly, while the 20th century construction of the ‘averaged American’ (Igo, 2007) or the ‘calculable mind’ (Rose, 1988) might reflect the application of statistics, of numerical and survey methods, of objectification through number, the underpinning materiel was the malleable and kaleidoscopic data point. More recent explorations of the economic and political drivers behind Big Data (Beer, 2016; Kitchin, 2014) also occur in the context of managing the avalanche of data points during the late 20th century. All these phenomena – statistics, numerology, the proliferation of human measurements, the objectification of individuals – rely on a digital language constructed through the generation and manipulation of data points. Data points are the digital atoms that constitute the statistical molecules that allow the assembly of knowledge forms that had not been possible before. Studies of the social life of methods have identified the transformative effects of digital technologies and devices, such as their role in mediating contemporary culture (Beer and Burrows, 2013; Lupton, 2016), yet data points are not the inert products of digital devices; they have a life of their own.
In the 19th century, before data point technology, knowledge of the individual was expressed in terms of adjectives. During the 20th century, that knowledge came increasingly through numerical digits located in dimensional spaces. Certainly, in an adjectival system an individual could have ‘more’ or ‘less’ of some attribute than another, but data points gave both a more precise language and a facility for making many and detailed comparisons. Whatever the measurement scale, people could be ranked and distributed in virtual space. While the temperature chart had connected data points marking body temperature for a single patient, the new data points of punch cards, graphs, correlations and controlled answer questionnaires enabled the juxtaposition of data points from many individuals. The process of collecting data points stimulated comparisons: it was not a celebration of individual character but an identification of difference, of variability, that separated one individual from another. Human experience was no longer to be expressed in prose; with the use of questionnaires it was possible to reduce it to number, a number moreover that could be summed, ranked and compared. Autobiography was no longer the preserve of the rich and famous as everyone could, in a way, take a part in constructing their own data point biographies. These were the precursors that foreshadowed a digital age.
Author biography
David Armstrong is Professor of Medicine and Sociology at King’s College London. His research interests include the sociology of medical knowledge and health services research. He is the author of Political Anatomy of the Body: Medical Knowledge in Britain in the Twentieth Century (Cambridge, 1983) and A New History of Identity: A Sociology of Medical Knowledge (Palgrave, 2002).
Footnotes
ORCID iD: David Armstrong
https://orcid.org/0000-0003-3652-9662
References
- Arnold JWS. (1880) The graphic method as applied to physiological investigation. Boston Medical and Surgical Journal 102(5): 109–110.
- Ashby H. (1881) Typhoid fever in children. Boston Medical and Surgical Journal 105(3): 68–69.
- Austrian G. (1982) Herman Hollerith: Forgotten Giant of Information Processing. New York: Columbia University Press.
- Barton A. (1955) The concept of property space in social research. In: Lazarsfeld PF, Rosenberg M. (eds) Language of Social Research. Glencoe, IL: The Free Press, 40–53.
- Beer D. (2016) How should we do the history of Big Data? Big Data & Society. Epub ahead of print 4 May. DOI: 10.1177/2053951716646135.
- Beer D, Burrows R. (2013) Popular culture, digital archives and the new social life of data. Theory, Culture & Society 30(4): 47–71.
- Berkson J. (1936) A tabular outline for use in reporting hospital morbidity. American Journal of Public Health 26(6): 723–729.
- Britten RH, Goddard JC. (1932) Rates of physical impairments in 28 occupations, based on 17,294 medical examinations. Public Health Reports 47(1): 1–25.
- Campbell DT, Fiske DW. (1959) Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin 56(2): 81–105.
- Cattell JM. (1893) Mental measurement. The Philosophical Review 2(3): 316–332.
- Cattell JM, Galton F. (1890) Mental tests and measurements. Mind 15(59): 373–381.
- Coover JE. (1913) ‘The feeling of being stared at’: Experimental. American Journal of Psychology 24(4): 570–575.
- Cronbach LJ. (1957) The two disciplines of scientific psychology. American Psychologist 12(11): 671–684.
- Cronbach LJ, Meehl PE. (1955) Construct validity in psychological tests. Psychological Bulletin 52(4): 281–302.
- Dawber TR, Meadors GF, Moore FE. (1951) Epidemiological approaches to heart disease: The Framingham Study. American Journal of Public Health 41(3): 279–286.
- Desrosières A. (1998) The Politics of Large Numbers: A History of Statistical Reasoning (trans. Naish C.). Cambridge, MA: Harvard University Press.
- Donges NA. (1948) How the Army knows its strength. Army Information Digest 3: 65–69.
- Fayrer J, Yeats J. (1878) Exportation and exportable products. Journal of the Society of Arts 26(1316): 187–226.
- Fisher I. (1917) The ‘ratio’ chart for plotting statistics. Publications of the American Statistical Association 15(118): 577–601.
- Greenhalgh T, Hurwitz B. (1999) Why study narrative? British Medical Journal 318(7175): 48–50.
- Hacking I. (1990) The Taming of Chance. Cambridge: Cambridge University Press.
- Hall GS. (1897) A study of fears. American Journal of Psychology 8(2): 147–249.
- Heide L. (2009) Punched-Card Systems and the Early Information Explosion, 1880–1945. Baltimore, MD: Johns Hopkins University Press.
- Hollerith H. (1889) An electric tabulating system. School of Mines Quarterly 10: 238–255.
- Huntington EV. (1919) Mathematics and statistics, with an elementary account of the correlation coefficient and the correlation ratio. American Mathematical Monthly 26(10): 421–435.
- Igo SE. (2007) The Averaged American: Surveys, Citizens, and the Making of a Mass Public. Cambridge, MA: Harvard University Press.
- JAMA Editorial (1960) Electronic data processing apparatus. Journal of the American Medical Association 173(1): 58–59.
- Jay F. (1942) Machine records unit – U.S. Army: Statistics take to the battlefield. Army Life Magazine XXIV: 10–12.
- Joint Committee on Standards for Graphic Presentation (1915) Joint committee on standards for graphic presentation. Publications of the American Statistical Association 14(112): 790–797.
- Kistermann FW. (1991) The invention and development of the Hollerith punched card. Annals of the History of Computing 13(3): 245–259.
- Kitchin R. (2014) The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences. London: SAGE.
- Law J. (2009) Seeing like a survey. Cultural Sociology 3(2): 239–256.
- Lerman J, Means JH. (1933) The use of record forms and mechanical methods of analysis in the study of clinical data. New England Journal of Medicine 208(22): 1135–1143.
- Lupton D. (2016) The Quantified Self. Cambridge: Polity Press.
- Malenbaum W, Black JD. (1937) The use of the short-cut graphic method of multiple correlation. Quarterly Journal of Economics 52(1): 66–112.
- Marshall A. (1885) On the graphic method of statistics. Journal of the Statistical Society of London 48(5): 251–260.
- Moore HT. (1916) A method of testing the strength of instincts. American Journal of Psychology 27(2): 227–233.
- Myerson A. (1919) Personality tests involving the principle of multiple choice. Archives of Neurology and Psychiatry 1(4): 459–470.
- Nelson MG. (1969) Automation in the laboratory. Journal of Clinical Pathology 22(1): 1–10.
- Pescor MJ. (1934) The Woodworth Personal Data Sheet as applied to delinquents. Public Health Reports 49(38): 1111–1115.
- Porter TM. (1995) Trust in Numbers: The Pursuit of Objectivity in Science and Public Life. Princeton, NJ: Princeton University Press.
- Prainsack B. (2017) Personalized Medicine: Empowered Patients in the 21st Century. New York: New York University Press.
- Pressey SL, Pressey LW. (1919) ‘Cross-out’ tests with suggestions as to a group scale of the emotions. Journal of Applied Psychology 3(2): 138–150.
- Research Division of the Milbank Memorial Fund (1930) The prevalence of physical impairments at different ages. Milbank Memorial Fund Quarterly Bulletin 8(1): 1–13.
- Rose N. (1988) Calculable minds and manageable individuals. History of the Human Sciences 1(2): 179–200.
- Ruppert E. (2007) Producing population. Working paper 37. Manchester: CRESC. Available at: http://hummedia.manchester.ac.uk/institutes/cresc/workingpapers/wp37.pdf (accessed 6 December 2018).
- Sarbin TR. (ed.) (1986) Narrative Psychology: The Storied Nature of Human Conduct. London: Praeger Press.
- Schenthal JE. (1960) Multiphasic screening of the well patient: Twelve-year experience of the Tulane University Cancer Detection Clinic. Journal of the American Medical Association 172(1): 51–54.
- Schenthal JE, Sweeney JW, Nettleton W. (1960) Clinical application of large-scale electronic data processing apparatus. Journal of the American Medical Association 173(1): 90–95.
- Schiele BC, Baker AB, Hathaway SR. (1943) The Minnesota multiphasic personality inventory. Journal-Lancet 63: 292–297.
- Schwichtenberg AH, Flickinger DD, Lovelace WR. (1959) Development and use of medical machine record cards in astronaut selection. US Armed Forces Medical Journal 10(11): 1324–1351.
- Slack WV, Hicks GP, Reed CE, et al. (1966) A computer-based medical-history system. New England Journal of Medicine 274(4): 194–198.
- Sorlie PD, Sholinsky PD, Lauer MS. (2015) Reinvestment in government-funded research. Circulation 131(1): 17–18.
- Spearman C. (1904) ‘General intelligence’, objectively determined and measured. American Journal of Psychology 15(2): 201–292.
- Srole L, Langner TS, Michael ST, et al. (1962) Mental Health in the Metropolis: The Midtown Manhattan Study. New York: McGraw-Hill.
- Swift WB. (1916) Some developmental psychology in lower animals and in man and its contribution to certain theories of adult mental tests. American Journal of Psychology 27(1): 71–86.
- Thurstone LL. (1931) Multiple factor analysis. Psychological Review 38(5): 406–427.
- Watkins JG. (1943) Machine methods of handling large classes. Journal of Experimental Education 11(3): 243–244.
- Woodworth RS. (1912) Combining the results of several tests: A study in statistical method. Psychological Review 19(2): 97–123.
- Yerkes RM. (1914) The study of human behavior. Science 39(1009): 625–633.
- Young K. (1924) The history of mental testing. Pedagogical Seminary 31(1): 1–48.
- Yule GU. (1897) On the theory of correlation. Journal of the Royal Statistical Society 60(4): 812–854.
