Data in Brief. 2025 Nov 27;64:112323. doi: 10.1016/j.dib.2025.112323

Touch-based interaction dataset for user behavioral analysis in mobile devices

Daniel Garabato 1, Mario Casado 1, Carlos Dafonte 1, Manuel F López-Vizcaíno 1, Marco A Álvarez 1, Francisco J Nóvoa 1
PMCID: PMC12720175  PMID: 41438634

Abstract

As smartphones and tablets are increasingly integrated into numerous aspects of everyday life, security on mobile devices is becoming more important. Although facial recognition and fingerprint scanning are commonly employed to verify user identity, they are not always suitable for on-demand validation during specific moments of interaction. This limitation has motivated the search for alternative solutions, particularly those based on behavioral biometrics, as they enable the capture and analysis of unique interaction patterns without disrupting the user experience.

This paper describes a dataset comprising touch-based interactions collected from 37 distinct users within a controlled ad-hoc scenario designated for authentication purposes. The dataset includes a wide range of touch gestures, from single-touch events (e.g., tap, swipe or pan) to multi-touch interactions (e.g., pinch or rotate), which can be utilized to extract individual behavioral patterns from user-device interactions, thus supporting further research.

In particular, the data acquisition process is thoroughly described so that both the raw data and the features extracted for our own research can be properly understood. In addition to publishing the data in a public repository, we include some base code that helps other researchers handle the raw data and extract the aforementioned features, so that they can adapt or extend them according to their own needs.

Keywords: Biometrics, Touch-based gestures, Authentication, Mobile interaction


Specifications Table

Subject: Computer Sciences
Specific subject area: Artificial Intelligence in Cybersecurity, Authentication based on Behavioral Biometrics
Type of data: Raw Events (JSON format), Extracted features (CSV format)
Data collection: Touch-based interaction data were captured from 37 volunteers through an ad-hoc mobile application developed for this purpose. Different scenes were proposed so that the most common gestures were captured in order to study behavioral patterns, mostly oriented to authentication purposes. Every user was informed about the information gathered and its intended use, and their consent was requested. During data acquisition, all users used the same device, a Samsung Tab S4, so that the data did not depend on the device used. Moreover, every user was monitored during the process to ensure that they followed the guidelines provided and to guarantee the quality of the data gathered.
On the one hand, raw data is available in JSON format, which essentially includes positions, timestamps and contact surface measurements for each active finger. This results in around 24,000 events (700,000 individual measurements) as they were recorded by the application, with no further processing.
On the other hand, we also provide some features derived from motion and geometric characteristics, which have already been sanitized (spurious or outlier samples were removed) and preprocessed (average values are provided for each event, rather than for individual measurements). In addition, the code used to extract and sanitize such features is also available so that users can reproduce, adapt or even extend them to conduct their own research.
Data source location: Faculty of Computer Science, University of A Coruna, Spain.
Data accessibility: Repository name: AITouch – Data (Mendeley Data) [1]
Data identification number: 10.17632/9v7bxv3dcc.1
Direct URL to data: https://data.mendeley.com/datasets/9v7bxv3dcc/1
Related research article: None currently published (in preparation)

1. Value of the Data

  • This dataset provides a structured collection of user interactions with touchscreen devices in different scenarios that were specifically developed to study behavioral patterns, mostly intended for authentication purposes. Such data were collected from 37 volunteers who had to complete several stages that included the most common usage gestures, so that those recorded interactions were consistent and comparable across all of them.

  • Based on our analysis of the state of the art, we found no existing dataset that encompasses the diversity of gestures included in ours (tap, swipe, pan, pinch and rotate) while also involving such a substantial number of real-world users.

  • This dataset provides approximately 24,000 raw events (700,000 measurements), thereby enabling any researcher to conduct a comprehensive data processing workflow — from data cleaning and outlier removal to the extraction of features of interest in a customized manner and the application of suitable AI models.

  • In addition, we have also included a feature set derived from raw data, tailored for the analysis of gesture dynamics in the context of biometric-based user authentication. This supplementary information enables researchers to replicate our methodology and experiments with alternative AI models without the overhead of implementing their own feature extraction pipeline, thus ensuring reproducibility and facilitating direct comparison of results.

  • Apart from the aforementioned application to the cybersecurity field, this dataset could also serve other purposes, such as gesture detection and classification, or even the identification of common usage patterns to improve User Experience (UX) designs.

2. Background

Historically, human-computer interaction relied on keyboards and mice as the predominant input devices. However, as technology evolved, new devices emerged along with new interaction mechanisms, and touch screens became the most popular ones due to the boom of smartphones and tablets.

These devices are not immune to cybersecurity attacks [2,3], and some countermeasures must be put in place [4,5]. Keystroke and mouse dynamics have been extensively studied for authentication purposes [6,7]; in these approaches, identity verification relies on the unique patterns that individuals exhibit when typing or navigating through the system. An analogous procedure could be followed using touch-based inputs, where gestures can contribute to building a unique signature for each user.

Proving that touch-based features are suitable for authentication is our main research goal and, therefore, a dataset to support our experimentation is required. Some existing datasets, such as Touchalytics [8], BioIdent Dataset [9] or BrainRun [10], offer limited interaction types. Therefore, we gathered a custom dataset that includes a wide variety of representative gestures collected from different controlled scenarios so that they are consistent between users, allowing researchers to address not only authentication but also other topics.

3. Data Description

This dataset (see footnote 1) contains touch-based events collected in an ad-hoc application designed to include representative gesture actions from users in different scenarios, as described below. The whole dataset contains data from 37 users who completed the entire data acquisition process, resulting in around 24,000 events that were empirically categorized into four gesture types, as shown in Table 1 (see footnote 2).

Table 1. Distribution of events according to the estimated gesture type.

Gesture type    Number of events
Pan             2,300
Swipe           18,800
Pinch           1,400
Rotate          300
unknown         800

On the one hand, the collected raw data can be found in JavaScript Object Notation (JSON) format in the raw_data/data.json file, which includes all the data as captured by the application, with no further processing. This file contains a list of JSON objects, each of which holds the events recorded for a given user. The fields are described in Table 2, and the file follows this schema:

[
  {
    "user": string,
    "events": [
      {
        "centerX": integer,
        "centerY": integer,
        "eventType": integer,
        "scene": string,
        "timeStamp": long,
        "pointers": [
          {
            "width": float,
            "height": float,
            "clientX": float,
            "clientY": float
          },
          ...
        ]
      },
      ...
    ]
  },
  ...
]

Table 2. Available raw data per user.

Field          Description
user           Unique username (anonymized)
events         List of events (including individual measurements) recorded for the user
- centerX      X-axis position: true position for single-touch events; central position across all the registered pointers for multi-touch events (*)
- centerY      Y-axis position: true position for single-touch events; central position across all the registered pointers for multi-touch events (*)
- eventType    Type of measurement represented as an integer value: start (1), move (2), end (4), and cancel (8)
- scene        Application scene where the event was recorded: FruitNinja, GearMatching, PuzzleSliding, CountrySearch
- timeStamp    Timestamp when the measurement took place, in milliseconds
- pointers     List of pointers (fingers) that triggered the measurement
  . width      Contact surface width for this pointer
  . height     Contact surface height for this pointer
  . clientX    X-axis position registered for this pointer
  . clientY    Y-axis position registered for this pointer

(*) Position (0,0) represents the top-left corner of the application.
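
The fields above are enough to split a user's stream of measurements into individual gestures. The following Python sketch is not the authors' code: it assumes that measurements appear in recording order, that a gesture opens at a start measurement (1) and closes at an end (4) or cancel (8) measurement, and it uses a small invented sample instead of the real file. The helper names (load_raw, segment_gestures, make_measurement) are hypothetical.

```python
import json
from collections import Counter

START, MOVE, END, CANCEL = 1, 2, 4, 8  # eventType codes listed in Table 2


def load_raw(path="raw_data/data.json"):
    """Read the raw file: a list of {"user": ..., "events": [...]} objects."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def segment_gestures(measurements):
    """Group one user's measurements into gestures using eventType.

    Assumptions (not stated in the dataset description): measurements are in
    recording order; a gesture opens at START and closes at END or CANCEL;
    measurements outside an open gesture are ignored, and a gesture that is
    still open when a new START arrives is dropped.
    """
    gestures, current = [], None
    for m in measurements:
        if m["eventType"] == START:
            current = [m]
        elif current is not None:
            current.append(m)
            if m["eventType"] in (END, CANCEL):
                gestures.append(current)
                current = None
    return gestures


def make_measurement(x, y, t, event_type, scene="FruitNinja"):
    """Build one invented measurement that follows the raw schema."""
    return {
        "centerX": x, "centerY": y, "eventType": event_type,
        "scene": scene, "timeStamp": t,
        "pointers": [{"width": 10.0, "height": 12.0,
                      "clientX": float(x), "clientY": float(y)}],
    }


# Invented sample data; replace with load_raw() to process the real file.
sample = [{"user": "user_a", "events": [
    make_measurement(10, 10, 0, START),
    make_measurement(20, 15, 16, MOVE),
    make_measurement(30, 20, 32, END),
    make_measurement(50, 50, 100, START, "GearMatching"),
    make_measurement(55, 52, 116, CANCEL, "GearMatching"),
]}]

for record in sample:
    gestures = segment_gestures(record["events"])
    per_scene = Counter(g[0]["scene"] for g in gestures)
    print(record["user"], len(gestures), dict(per_scene))
```

Keeping the segmentation separate from file loading makes it easy to swap the grouping rule if the real data turns out to contain unterminated gestures.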

On the other hand, we have also published an enriched version of the dataset containing derived high-level features. A Comma Separated Values (CSV) file is provided for each scene of our application (described below), containing all the quantities listed in Table 3 and a header row (see footnote 3). The code needed to extract them is also provided, so that anyone can reproduce or customize the processing according to their needs. The features can be found under the features directory, whereas the source code is publicly available (footnote 4).

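Table 3. Full set of features extracted from raw data.

Feature  Definition
Horizontal velocity  $hv_i = \frac{x_i - x_{i-1}}{t_i - t_{i-1}}$
Absolute horizontal velocity  $\mathrm{abs\_hv}_i = |hv_i|$
Horizontal velocity towards left  $hv_i^{l} = \begin{cases} |hv_i| & \text{if } hv_i < 0 \\ 0 & \text{otherwise} \end{cases}$
Horizontal velocity towards right  $hv_i^{r} = \begin{cases} |hv_i| & \text{if } hv_i > 0 \\ 0 & \text{otherwise} \end{cases}$
Vertical velocity  $vv_i = \frac{y_i - y_{i-1}}{t_i - t_{i-1}}$
Absolute vertical velocity  $\mathrm{abs\_vv}_i = |vv_i|$
Vertical velocity downwards  $vv_i^{d} = \begin{cases} |vv_i| & \text{if } vv_i > 0 \\ 0 & \text{otherwise} \end{cases}$
Vertical velocity upwards  $vv_i^{u} = \begin{cases} |vv_i| & \text{if } vv_i < 0 \\ 0 & \text{otherwise} \end{cases}$
Tangential velocity  $tv_i = \sqrt{hv_i^2 + vv_i^2}$
Tangential acceleration  $ta_i = \frac{tv_i - tv_{i-1}}{t_i - t_{i-1}}$
Absolute tangential acceleration  $\mathrm{abs\_ta}_i = |ta_i|$
Tangential deceleration  $ta_i^{d} = \begin{cases} |ta_i| & \text{if } ta_i < 0 \\ 0 & \text{otherwise} \end{cases}$
True tangential acceleration  $ta_i^{a} = \begin{cases} |ta_i| & \text{if } ta_i > 0 \\ 0 & \text{otherwise} \end{cases}$
Tangential jerk  $tj_i = \frac{ta_i - ta_{i-1}}{t_i - t_{i-1}}$
Absolute tangential jerk  $\mathrm{abs\_tj}_i = |tj_i|$
Negative tangential jerk  $tj_i^{n} = \begin{cases} |tj_i| & \text{if } tj_i < 0 \\ 0 & \text{otherwise} \end{cases}$
Positive tangential jerk  $tj_i^{p} = \begin{cases} |tj_i| & \text{if } tj_i > 0 \\ 0 & \text{otherwise} \end{cases}$
Distance  $l_i = \sqrt{(x_i - x_{i-1})^2 + (y_i - y_{i-1})^2}$
Angle  $\theta_i = \arctan\left(\frac{y_i - y_{i-1}}{x_i - x_{i-1}}\right)$
Sine of the angle  $\mathrm{sin\_}\theta_i = \sin(\theta_i)$
Cosine of the angle  $\mathrm{cos\_}\theta_i = \cos(\theta_i)$
Angle variation  $\delta\theta_i = \left((\theta_i - \theta_{i-1} + \pi) \bmod 2\pi\right) - \pi$
Angular velocity  $\omega_i = \frac{\delta\theta_i}{t_i - t_{i-1}}$
Angular acceleration  $\alpha_i = \frac{\omega_i - \omega_{i-1}}{t_i - t_{i-1}}$
Curvature  $c_i = \frac{\theta_i - \theta_{i-1}}{l_i - l_{i-1}}$
Absolute curvature  $\mathrm{abs\_}c_i = |c_i|$
Curvature rate of change  $\delta c_i = \frac{c_i - c_{i-1}}{l_i - l_{i-1}}$
Absolute curvature rate of change  $\mathrm{abs\_}\delta c_i = |\delta c_i|$
Distance from the origin  $l_i^{0} = \sqrt{x_i^2 + y_i^2}$
Angle with respect to the origin  $\theta_i^{0} = \arctan\left(\frac{y_i}{x_i}\right)$
Sine of the angle with respect to the origin  $\mathrm{sin\_}\theta_i^{0} = \sin(\theta_i^{0})$
Cosine of the angle with respect to the origin  $\mathrm{cos\_}\theta_i^{0} = \cos(\theta_i^{0})$
Variation of the angle with respect to the origin  $\delta\theta_i^{0} = \left((\theta_i^{0} - \theta_{i-1}^{0} + \pi) \bmod 2\pi\right) - \pi$
Angular velocity with respect to the origin  $\omega_i^{0} = \frac{\delta\theta_i^{0}}{t_i - t_{i-1}}$
Angular acceleration with respect to the origin  $\alpha_i^{0} = \frac{\omega_i^{0} - \omega_{i-1}^{0}}{t_i - t_{i-1}}$
Curvature from the origin  $c_i^{0} = \frac{\theta_i^{0} - \theta_{i-1}^{0}}{l_i^{0} - l_{i-1}^{0}}$
Absolute curvature from the origin  $\mathrm{abs\_}c_i^{0} = |c_i^{0}|$
Curvature rate of change from the origin  $\delta c_i^{0} = \frac{c_i^{0} - c_{i-1}^{0}}{l_i^{0} - l_{i-1}^{0}}$
Absolute curvature rate of change from the origin  $\mathrm{abs\_}\delta c_i^{0} = |\delta c_i^{0}|$
Distance from the gesture origin  $l_i^{go} = \sqrt{(x_i - x_0^{g})^2 + (y_i - y_0^{g})^2}$
Angle with respect to the gesture origin  $\theta_i^{go} = \arctan\left(\frac{y_i - y_0^{g}}{x_i - x_0^{g}}\right)$
Sine of the angle with respect to the gesture origin  $\mathrm{sin\_}\theta_i^{go} = \sin(\theta_i^{go})$
Cosine of the angle with respect to the gesture origin  $\mathrm{cos\_}\theta_i^{go} = \cos(\theta_i^{go})$
Variation of the angle with respect to the gesture origin  $\delta\theta_i^{go} = \left((\theta_i^{go} - \theta_{i-1}^{go} + \pi) \bmod 2\pi\right) - \pi$
Angular velocity with respect to the gesture origin  $\omega_i^{go} = \frac{\delta\theta_i^{go}}{t_i - t_{i-1}}$
Angular acceleration with respect to the gesture origin  $\alpha_i^{go} = \frac{\omega_i^{go} - \omega_{i-1}^{go}}{t_i - t_{i-1}}$
Curvature within the gesture  $c_i^{go} = \frac{\theta_i^{go} - \theta_{i-1}^{go}}{l_i^{go} - l_{i-1}^{go}}$
Absolute curvature within the gesture  $\mathrm{abs\_}c_i^{go} = |c_i^{go}|$
Curvature rate of change within the gesture  $\delta c_i^{go} = \frac{c_i^{go} - c_{i-1}^{go}}{l_i^{go} - l_{i-1}^{go}}$
Absolute curvature rate of change within the gesture  $\mathrm{abs\_}\delta c_i^{go} = |\delta c_i^{go}|$
Overall gesture angle  $\theta^{g} = \arctan\left(\frac{y_1^{g} - y_0^{g}}{x_1^{g} - x_0^{g}}\right)$
Sine of the gesture angle  $\mathrm{sin\_}\theta^{g} = \sin(\theta^{g})$
Cosine of the gesture angle  $\mathrm{cos\_}\theta^{g} = \cos(\theta^{g})$
Gesture shape (horizontal/vertical) ratio  $shr^{g} = \frac{|x_1^{g} - x_0^{g}|}{|y_1^{g} - y_0^{g}|}$
Gesture path length  $l^{g} = \sum_{j=2}^{n^{g}} \sqrt{(x_j - x_{j-1})^2 + (y_j - y_{j-1})^2}$
Gesture duration (time span)  $t^{g} = t_1^{g} - t_0^{g}$
Number of events within the gesture  $n^{g} = \text{number of events from } t_0^{g} \text{ to } t_1^{g}$
Finger area (average)  $fa_i = \pi \cdot \frac{\mathrm{width}_i}{2} \cdot \frac{\mathrm{height}_i}{2}$
Finger area rate of change  $\delta fa_i = \frac{fa_i - fa_{i-1}}{fa_{i-1}}$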

Here, for a given event $i$, $x_i$ and $y_i$ are the pointer positions and $t_i$ is the timestamp. Moreover, $\mathrm{width}_i$ and $\mathrm{height}_i$ represent the width and height of the tracking pointers detected (i.e., fingers). Superscript $0$ refers to the origin of coordinates (0,0), whereas $go$ refers to the first event within a gesture (i.e., gesture origin) and $g$ represents a gesture-wide feature.
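
To make a few of these definitions concrete, the Python sketch below computes a subset of the per-measurement quantities (velocities, distance, angle, tangential acceleration, angle variation and finger area) for a short invented trajectory. It is an illustration, not the published extraction code. The function and key names are hypothetical, and it swaps in atan2 for the arctangent of a quotient so that purely vertical movements do not divide by zero; this changes the range of the angle from (-π/2, π/2) to (-π, π].

```python
import math


def wrap_angle(delta):
    """Map an angle difference to [-pi, pi), as in the angle variation row."""
    return (delta + math.pi) % (2 * math.pi) - math.pi


def path_features(points):
    """Per-measurement features for a list of (x, y, t) tuples sorted by t.

    Returns one dict per measurement, starting from the second one.
    Quantities that need an earlier value (acceleration, angle variation)
    are None until that value exists.
    Assumption: consecutive timestamps differ (duplicates already removed).
    """
    rows = []
    prev_tv = prev_theta = None
    for (x0, y0, t0), (x1, y1, t1) in zip(points, points[1:]):
        dt = t1 - t0
        hv = (x1 - x0) / dt                   # horizontal velocity
        vv = (y1 - y0) / dt                   # vertical velocity
        tv = math.hypot(hv, vv)               # tangential velocity
        theta = math.atan2(y1 - y0, x1 - x0)  # atan2 substituted for arctan
        rows.append({
            "hv": hv,
            "vv": vv,
            "tv": tv,
            "hv_left": abs(hv) if hv < 0 else 0.0,
            "hv_right": abs(hv) if hv > 0 else 0.0,
            "distance": math.hypot(x1 - x0, y1 - y0),
            "theta": theta,
            "ta": None if prev_tv is None else (tv - prev_tv) / dt,
            "dtheta": None if prev_theta is None
                      else wrap_angle(theta - prev_theta),
        })
        prev_tv, prev_theta = tv, theta
    return rows


def finger_area(width, height):
    """Ellipse area from a pointer's contact width and height."""
    return math.pi * (width / 2) * (height / 2)


# Invented trajectory: positions in pixels, timestamps in milliseconds.
rows = path_features([(0, 0, 0), (3, 4, 10), (3, 8, 20)])
print(rows[0]["tv"], rows[1]["ta"], rows[1]["dtheta"])
print(finger_area(10.0, 12.0))
```

The remaining rows of Table 3 follow the same pattern: jerk, angular velocity and curvature are further finite differences of the values computed here, and the origin-based variants replace the previous point with (0,0) or with the first point of the gesture.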

4. Experimental Design, Materials and Methods

In order to gather these data, we have developed an ad-hoc application that establishes a controlled scenario so that all user interaction takes place under specific and supervised conditions. Our main goal was to obtain a representative sample of data that includes the most frequent and representative touch-based gestures that can be performed in a real-world application, allowing us to conduct a more in-depth analysis for each one of them.

Thus, we developed four different scenes that were proposed as games to catch the attention and interest of the users without resulting in an aimless and cumbersome process that may annoy them. Each one of these scenes focuses on generating events that correspond to a specific type of gesture:

  • Fruit ninja: The user has to cut pieces of fruit that emerge from the bottom of the screen following a random generation pattern, which requires swipe gestures (Fig. 1).

  • Country search: A long list of flags and country names is presented across several pages, and the user is asked to find a randomly chosen one, so that swipe and pan gestures have to be performed in order to navigate and locate the requested countries/flags (Fig. 2).

  • Puzzle sliding: A reference picture is presented to the user, while a set of slices from that image is randomly distributed in a grid. The user has to arrange them to reproduce the reference image, performing mostly swipe and pan gestures (Fig. 3).

  • Gear matching: A reference gear is presented at a random point on the screen, while a second one is placed at a different randomly generated position and scale, so that the user has to match both using mostly rotate, pinch and pan gestures to transform the replica to fit the target (Fig. 4).

Fig. 1. Example of swipe gesture while trying to cut fruit.

Fig. 2. Example of pan gesture while searching for a specific country flag.

Fig. 3. Example of pan gesture while trying to arrange the pieces of the puzzle.

Fig. 4. Example of rotating gesture while trying to match both gears.

The entire application was developed using the Ionic 5 framework (footnote 5), capturing raw touch-based events through the Hammer.js library (footnote 6). Every action performed by the user is recorded as a pair of screen positions (x, y) along with a timestamp and stored in JSON format for further processing, as described above.

Regarding the data collection process, 37 participants were recruited within our institution, specifically at the Faculty of Computer Science of the University of A Coruna. Hence, all these volunteers had some background knowledge about the usage of touchscreen-based devices, such as smartphones or tablets, and were also more or less familiar with the type of games proposed. Specifically, all of them were graduates in computer science or physics. Regarding age and gender, they ranged from 22 to 60 years (28 of them were under 30) and 4 out of the 37 volunteers were female, while the rest were male. Currently, half of the participants are master’s students, and the other half are professors. To comply with ethics standards, and before they enrolled in this data acquisition process, they were informed about the procedure, the type of data being collected, and how it was going to be used and analyzed. Thus, verbal informed consent was explicitly required before collecting any data and, to guarantee their privacy, anonymization mechanisms were put in place to remove any personally identifiable information that could link this information to individuals.

In order to gather enough events for each gesture and to guarantee the quality and homogeneity of the samples, the following protocol was established: (a) the same device, a Samsung Tab S4, was always used to avoid any differences in the manner data is captured by the application, such as different touchscreen sensitivities or different pixel resolutions; (b) each participant was asked to complete three entire sessions across the four scenarios; (c) each scenario lasted 60 s, in which the user had to perform the main actions required; and (d) in-person meetings were arranged with all the volunteers, so that they were always supervised during the entire process to ensure that they performed the required actions without any disruption or interference and that the collected data were consistent (e.g., the dominant hand was used throughout the sessions and natural movements were performed, avoiding erratic behaviors).

Despite using a controlled environment, the proctoring of volunteers enabled the collection of consistent data comparable to that obtained in real-world scenarios. In summary, each participant completed approximately 15 min of interaction, which resulted in around 600 events (20,000 individual measurements) per user, for a total of approximately 24,000 events or 700,000 measurements across all users. Fig. 5 presents the distribution of raw measurements across all the participants and the various proposed scenes.
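
As a rough consistency check on these rounded figures (simple arithmetic on the numbers quoted above, not an additional result), the protocol of three sessions of four 60 s scenes gives 12 min of in-scene time per participant, and dividing the totals by the 37 users lands close to the per-user values:

```latex
\begin{align*}
3 \text{ sessions} \times 4 \text{ scenes} \times 60\,\mathrm{s} &= 720\,\mathrm{s} = 12\,\mathrm{min} \\
24{,}000 \text{ events} / 37 \text{ users} &\approx 649 \text{ events per user} \\
700{,}000 \text{ measurements} / 37 \text{ users} &\approx 18{,}900 \text{ measurements per user} \\
700{,}000 \text{ measurements} / 24{,}000 \text{ events} &\approx 29 \text{ measurements per event}
\end{align*}
```

The gap between these 12 min and the reported 15 min is presumably time spent between scenes, although the protocol description does not break it down.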

Fig. 5. Distribution of touch-based events across all participants and scenes.

4.1. Feature extraction and processing

In addition to the raw data, we also provide an extended version of the dataset that mostly consists of geometric and motion related features oriented to study interaction patterns; specifically, those related to authentication purposes. These features are described in Table 3 and they can be broadly grouped into the following categories:

  • Temporal and physical features refer to the duration of the gesture (i.e., time span required to complete an action) and the finger contact area derived from pointer dimensions.

  • Geometric and spatial features are computed along the path of the gesture with respect to different reference systems (e.g., the coordinate origin or the gesture origin). These features encompass distances, angles, curvatures or path lengths, but also shape descriptors and some variants, such as absolute values of the aforementioned quantities.

  • Kinematic features consider various forms of velocities (horizontal, vertical, and tangential) and some derived quantities (e.g., left and right or up and down decompositions). Acceleration and its variants describe changes in velocity over time, while jerk captures changes in acceleration. In addition, angular-based metrics (e.g., angular velocities or accelerations) make it possible to measure changes in directional orientation.

Since each application scene requires completely different actions to be performed, it may result in distinct gestures or even diverse execution patterns for the same gesture. Therefore, each one of the scenes was handled separately and the following procedure was applied:

  1. Individual measurements were associated with a specific event through the eventType field, which determines when the event starts, continues and finishes. This leads to variable-sized measurement groups depending on the duration and dynamics of the gesture, so that an event can be processed as a whole entity.

  2. Within each group or event, some sanitization operations are performed in order to avoid spurious information:
     • Duplicated measurements are identified and removed according to both timestamp and positions.
     • A minimum of four different measurements is required; otherwise, the event is completely discarded because it will not contain enough information to derive meaningful features.
     • Sometimes, the library used to capture these measurements detects more pointers (active fingers) than it should, probably due to partial or involuntary touches inherent to the way users interact with the device to perform such gestures. These clearly contain non-relevant information and, therefore, they are removed so that the number of pointers is limited to one or two, depending on the type of gesture being performed.

  3. Once the events are sorted out, we proceed to extract all the features listed and described in Table 3. Note that this operation is performed separately for each event; that is, we iterate over all the events, considering each one a whole entity isolated from the others. Moreover, these features are individually derived for every measurement. This requires sorting the measurements by their timestamps to ensure their actual ordering in the gesture path, as the features depend on the previous measurements (e.g., velocities, accelerations or curvatures).

  4. Each event was categorized into one of the different gesture types considered: swipe, pan, pinch, or rotate (see Fig. 6). To achieve this, we followed a rule-based approach that looks for specific motion patterns and characteristics (e.g., path length, duration or direction variations).

     It is worth mentioning that this is an empirical estimation rather than ground truth and, thus, such a categorization must be used with caution. Moreover, some of the gestures could not fit into any of these predefined schemes, so they were labelled as unknown.

  5. Over the computed features, an outlier detection and removal process based on the inter-quartile ranges was carried out to discard anomalies (i.e., extremely high or low figures that lie outside regular bounds).

  6. Afterwards, an aggregation step was applied to summarize the features of each event (a group of measurements), computing the mean value for each feature (without outliers), so that a unique feature vector was produced for each event.

  7. As a result of the previous steps, an event may end up with some features not being available (i.e., insufficient measurements and/or anomalous values); such events were discarded.
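
The sanitization, outlier removal and aggregation steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the published processing code: the inter-quartile rule is applied within each event with the conventional factor k = 1.5, surplus pointers are dropped by keeping the first ones listed, and all function names are hypothetical. Field names follow the raw schema.

```python
from statistics import mean, quantiles


def deduplicate(measurements):
    """Drop measurements that repeat an earlier (timestamp, x, y) triple."""
    seen, unique = set(), []
    for m in measurements:
        key = (m["timeStamp"], m["centerX"], m["centerY"])
        if key not in seen:
            seen.add(key)
            unique.append(m)
    return unique


def sanitize(event, min_measurements=4, max_pointers=2):
    """Return the cleaned event, or None if too few measurements remain."""
    cleaned = deduplicate(sorted(event, key=lambda m: m["timeStamp"]))
    if len(cleaned) < min_measurements:
        return None
    # Assumption: surplus pointers are removed by keeping the first ones.
    return [{**m, "pointers": m["pointers"][:max_pointers]} for m in cleaned]


def iqr_filter(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (k = 1.5 is assumed)."""
    if len(values) < 2:
        return list(values)
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if low <= v <= high]


def aggregate(feature_rows):
    """Mean of each feature over one event, after outlier removal.

    Returns None when any feature has no usable value, mirroring the
    final step in which incomplete events are discarded.
    """
    vector = {}
    for name in feature_rows[0]:
        kept = iqr_filter([r[name] for r in feature_rows
                           if r[name] is not None])
        if not kept:
            return None
        vector[name] = mean(kept)
    return vector


# Invented per-measurement values for one event; 500 is an obvious outlier.
rows = [{"tv": v} for v in [10, 11, 12, 13, 14, 15, 16, 17, 18, 500]]
print(aggregate(rows))
```

Whether the inter-quartile bounds should be computed per event, per scene or per user is a design choice the description leaves open; computing them per event keeps each feature vector independent of the rest of the dataset.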

Fig. 6. Distribution of touch-based events according to the estimated gesture type.

Limitations

On the one hand, Android imposes serious restrictions for security reasons, and it is unfeasible to perform a system-wide data collection. Therefore, it is only possible to record interaction events within application boundaries. In addition, each action is completely isolated from the rest, since no transitions are recorded between them because interaction can stop suddenly (i.e., the user can lift their finger from the screen).

On the other hand, the acquired data was tailored to the specific actions and gestures required in each one of the proposed scenes. Consequently, this may reduce the variability within a given gesture compared to other scenarios across a wider range of applications and contexts. Moreover, we decided to use a single device throughout the data acquisition process to guarantee that the data sample gathered was uniform and not device-dependent.

Finally, the number of users that volunteered to generate data was limited, as it was difficult to get them to attend our facilities in person in order to follow the supervised protocol established in advance.

Ethics Statement

In order to gather the dataset presented in this article, 37 volunteers participated in the data acquisition process. Before they started such a process, they were informed about: (a) the type of information that was being gathered; (b) the usage and type of experimentation we were going to carry out; and (c) the possibility of making these data publicly available so that third parties could conduct their own analysis. Once they were informed and they had given their consent, we proceeded with the acquisition process itself. Finally, it should be noted that no sensitive data from any user was gathered at all, apart from their usernames, which were appropriately anonymized prior to any data distribution.

CRediT Author Statement

Daniel Garabato: Conceptualization, Formal analysis, Investigation, Software, Writing – original draft. Mario Casado: Formal analysis, Investigation, Software, Writing – original draft. Carlos Dafonte: Conceptualization, Supervision, Writing – review & editing. Manuel F. López-Vizcaíno: Validation, Data curation, Writing – review & editing. Marco A. Álvarez: Validation, Data curation, Writing – review & editing. Francisco J. Nóvoa: Conceptualization, Formal analysis, Investigation, Supervision, Writing – original draft.

Acknowledgements

This work has been developed in part thanks to the grant TED2021–130492B-C21 funded by MCIN/AEI/10.13039/501100011033 and by “ERDF A way of making Europe”.

This work is funded in part by Xunta de Galicia and the European Union (European Regional Development Fund–Galicia 2021–2027 Program), under Grant ED431B 2024/02 and Grant ED431B 2024/21.

This work was also supported by grant number PID2023–150794OB-I00, funded by the MICIU/AEI/10.13039/501100011033, and by “ERDF A way of making Europe”. We also acknowledge support from CIGUS-CITIC, funded by Xunta de Galicia and the European Union (ERDF–Galicia 2021–2027 Program), through grant ED431G 023/01.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Footnotes

1. AITouch – Data repository: https://data.mendeley.com/datasets/9v7bxv3dcc/1.

2. Some of these events could not be categorized into any specific gesture type due to their unclear nature.

3. Some additional metadata columns are also provided: username, scene, event/group identifier (chunk), gesture type estimation, etc.

5. Ionic 5 framework: https://ionicframework.com/.

6. Hammer.js library: https://hammerjs.github.io/.


References

  1. Garabato D., Casado M., Dafonte C., López-Vizcaíno M.F., Álvarez M.A., Nóvoa F.J. AITouch - Data. Mendeley Data. 2025. doi: 10.17632/9v7bxv3dcc.1.
  2. Ajayi A., Olajide M.S., Afolabi O.P., Abiodun O.A. Evaluation of phishing attack strategies on mobile device users. Int. J. Comput. Inf. Technol. 2023;12. doi: 10.24203/ijcit.v12i1.312.
  3. Aviv A.J., Gibson K., Mossop E., Blaze M., Smith J.M. Smudge attacks on smartphone touch screens. Proceedings of the 4th USENIX Conference on Offensive Technologies. 2010.
  4. Anastasova M., Azarderakhsh R., Kermani M.M. Fully Hybrid TLSv1.3 in WolfSSL On Cortex-M4. Cryptology ePrint Archive, 2024/2083; 2024.
  5. Niasar M.B., Azarderakhsh R., Kermani M.M. Optimized Architectures for Elliptic Curve Cryptography over Curve448. Cryptology ePrint Archive, 2020/1338; 2020.
  6. Monrose F., Rubin A.D. Keystroke dynamics as a biometric for authentication. Future Gener. Comput. Syst. 2000;16(4):351–359. doi: 10.1016/S0167-739X(99)00059-X.
  7. Garabato D., Dafonte C., Santovena R., Silvelo A., Novoa F.J., Manteiga M. AI-based user authentication reinforcement by continuous extraction of behavioral interaction features. Neural Comput. Appl. 2022;34(14):11691–11705. doi: 10.1007/s00521-022-07061-3.
  8. Frank M., Biedert R., Ma E., Martinovic I., Song D. Touchalytics: on the applicability of touchscreen input as a behavioral biometric for continuous authentication. IEEE Trans. Inf. Forensics Secur. 2012;8(1):136–148. doi: 10.1109/TIFS.2012.2225048.
  9. Antal M., Bokor Z., Szabó L.Z. Information revealed from scrolling interactions on mobile devices. Pattern Recognit. Lett. 2015;56:7–13. doi: 10.1016/j.patrec.2015.01.011.
  10. Papamichail M.D., Chatzidimitriou K.C., Karanikiotis T., Oikonomou N.-C.I., Symeonidis A.L., Saripalle S.K. BrainRun: a behavioral biometrics dataset towards continuous implicit authentication. Data. 2019;4(2):60. doi: 10.3390/data4020060.
