Data in Brief. 2026 Jan 29;65:112516. doi: 10.1016/j.dib.2026.112516

A dataset of smart home devices sold on Spanish e-commerce platforms

Jhovany Quintana-Vera 1, Ana I González-Tablas 1, Mohammed Rashed 1
PMCID: PMC12907662  PMID: 41704511

Abstract

The use of smart home devices is on the rise, with the number of users estimated to reach over 785 million from the current ∼361 million, a 117% increase in just 4 years. Thus, it becomes essential to have a dataset that details the different aspects of the devices available in the market. In this paper, we introduce our dataset, titled Spanish MArket Smart Home devices (SMASH), which we collected via structured data extraction from four major Spanish e-commerce platforms. Containing 5218 devices across 652 brands, the dataset provides an overview of smart home devices sold within Spain, the fourth largest economy in the European Union. The dataset is versatile as it includes details such as name, price, brand, model, rating, number of reviews, platform and category. It can be used as a primary source in research that involves consumer behaviour and microeconomics. Additionally, the details could be used for creating new datasets, such as the privacy policies of brands and the mobile applications (apps) used for the devices. The dataset is publicly accessible under license CC-BY-NC-4.0-ES. We note, however, that SMASH is limited to products sold within Spain and collected within a specific time window (start date: 2023-12; end date: 2024-08); users should consider these scope and temporal constraints when generalizing findings.

Keywords: IoT, Consumer electronics, Home automation, Smart home, Online shops


Specifications Table

Subject: Computer Sciences
Specific subject area: Data extracted about smart home devices sold on four Spain-based e-commerce platforms.
Type of data: Table (raw).
Data collection: The data were collected using four Python scripts (one script per platform) that were executed on a local workstation. We included platforms that sell smart home devices within Spain. Another inclusion criterion was that the platform must have had a designated category or categories for smart home devices so that we could automate the data collection process. The data were collected in the following time period: start date: 2023-12; end date: 2024-08.
Data source location: Primary data sources:
- Amazon Spain. https://www.amazon.es [1]
- MediaMarkt Spain. https://www.mediamarkt.es [2]
- El Corte Inglés. https://www.elcorteingles.es [3]
- PC Componentes. https://www.pccomponentes.com/ [4]
Data accessibility: Repository name: e-cienciadatos (https://edatos.consorciomadrono.es/)
Data identification number: DOI: 10.21950/JTIQQ5
Direct URL to data: https://edatos.consorciomadrono.es/dataset.xhtml?persistentId=doi:10.21950/JTIQQ5
Related research article: None.

1. Value of the Data

  • Our dataset stands out for providing real-world based information about the status of smart home devices in a European Union state.

  • The data are versatile. Given the details they include, they can be used in different fields of study (see Table 1), such as consumer behaviour, market tendencies, price analysis and the hardware capabilities of sold devices.

  • Researchers in the area of economics can analyze columns related to prices, number of reviews, device category and macro-category, and brands. Besides, researchers in the area of computing can make use of device category and macro-category to study hardware capabilities. Moreover, privacy researchers in areas like the General Data Protection Regulation (GDPR) may leverage information from this dataset to study the brands’ privacy policies.

  • This data can be further analyzed by the research community to create secondary data about aspects such as platform comparison, popularity of brands, etc.

Table 1.

Suggested research lines that can leverage the SMASH dataset.

Dataset field(s) | Suggested research line
product_name, category | NLP / information extraction
brand, model | Market structure / brand analysis
price | Economics / price prediction
rating, review_count | Consumer behavior / demand modeling
product_link, product_id | Reproducibility / data enrichment
category, macro_category | Consumer electronics ecosystem studies
platform | Platform competition / digital markets

2. Background

The creation of this dataset forms part of an ongoing project in which we study privacy within the context of smart home devices. First, we needed to identify the most popular smart-home devices in Spain, but we could not find a recent dataset with that information. Datafiniti’s “Electronic Products and Pricing Data” [5] contains similar fields to SMASH, but its records date from 2014 to 2018. Most smart-home datasets focus on network traffic or sensor measurements and therefore differ from SMASH. Consumer-electronics databases may provide similar information, but they require payment, whereas our dataset is publicly available.

With the goal of identifying the most popular brands and devices within the smart home market in Spain, we first identified many e-commerce platforms that Spain-based users can buy from. Afterwards, we selected only the platforms that allowed for the automation of information extraction of the sold devices by means of scraping. These platforms are Amazon Spain [6], MediaMarkt Spain [1], El Corte Inglés [2] and PC Componentes [3].

Although built for privacy research, our dataset is versatile: it contains details such as models, prices, platforms and numbers of reviews, which makes it usable in several fields.

3. Data Description

General Overview. The dataset, which contains extracted information on 5218 devices, is provided as a table in a Tab-Separated Values file (a .tsv file). Each record (a row in the table) represents a device and the data collected about it. The table has a header row. Each record has the following fields (columns):

  • A. id: field that uniquely identifies each record in the table (integer number in text format)

  • B. name: the full name of the product as extracted from the product name label on the webpage (text)

  • C. price: the selling price of the product (numeric, with the dot ‘.’ used as decimal separator)

  • D. brand: the name of the brand as extracted from the webpage of the product (text)

  • E. model: the model name as designated by the product’s webpage (text)

  • F. code: either a universal code or an internal code of the product (text)

  • G. rating: the rating of the product based on the 5-star reviewing system (numeric, with the dot ‘.’ used as decimal separator, or text ‘NULL’)

  • H. reviews: the number of reviews available on the product’s webpage for that product (numeric, with the dot ‘.’ used as decimal separator, or text ‘NULL’)

  • I. link: the link from which the data was extracted (url, text format)

  • J. currency: the currency used in the sales of the product (EUR)

  • K. platform: the platform from which the data was extracted. There are four platforms: Amazon, PC Componentes, El Corte Inglés, MediaMarkt (text)

  • L. Category-ES: category as designated by the platform and shown on the product’s page on the platform’s official website (text)

  • M. Category-EN: an English translation of the Category-ES field (text)

  • N. Macro-Category-ES: a higher-level category that we created in order to normalize the diverse categories that the platforms assign to the products (text)

  • O. Macro-Category-EN: an English translation of the Macro-Category-ES field (text)

  • P. date: the date of collection of the data about the product (date, format YYYY/MM/DD HH-MM-SS)

Next, minimal instructions are provided for importing the data into two common tools: Microsoft Excel and Python.

Instructions to import the dataset in a Microsoft Excel sheet:

  • Open a new *.xlsx file.

  • Import the data from the file “smash_2024_v02.tsv”, selecting the type of file as “CSV File”.

  • Indicate that the data are delimited (not fixed width), and select the tab character as the delimiter.

  • Establish the following format for each column of the data to import:

      - “General” format for columns “price”, “rating” and “reviews”. They contain numeric values. In advanced options, select the dot character “.” as the decimal separator, and none for the thousands.

      - “Date” (YYYY-MM-DD h:mm:ss or YMD) format for column “date”.

      - “Text” format for the other columns.

Sample Python code to import the dataset into a pandas DataFrame is provided below, following Table 2.

Table 2.

All devices-based dataset details.

Field | Quantity
Devices | 5218
Brands | 652
Categories | 49
Macro-Categories | 7
Platforms | 4
Price (non zero values) | 5193
Rating (non NULL values) | 2597
Reviews (non zero nor NULL values) | 2597

import pandas as pd

# Specify the path where the dataset file is located
dataset_file = "[CUSTOM_PATH]/smash_2024_v02.tsv"

# Read the contents of the file into a pandas DataFrame
df = pd.read_csv(dataset_file, sep="\t", decimal=".")

# Display the first few rows of the DataFrame
print("DataFrame Sample:")
print(df.head())
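The import above relies on pandas defaults. A slightly more defensive variant is sketched below, under two assumptions taken from the field descriptions: missing ratings and reviews are stored as the text ‘NULL’, and the date column follows the stated YYYY/MM/DD HH-MM-SS pattern. The helper name load_smash and the two sample rows are invented for illustration; if the real file uses a different date layout, the format string would need adjusting.

```python
import io

import pandas as pd

# Tiny stand-in for smash_2024_v02.tsv; these rows are invented for illustration
# and only include a subset of the documented columns.
SAMPLE_TSV = (
    "id\tname\tprice\trating\treviews\tplatform\tdate\n"
    "1\tSmart plug A\t12.99\t4.5\t120\tAmazon\t2024/01/15 10-30-00\n"
    "2\tSmart bulb B\t8.50\tNULL\tNULL\tMediaMarkt\t2024/02/01 09-00-00\n"
)


def load_smash(source):
    """Read a SMASH-style TSV, keeping ids as text and parsing numbers and dates."""
    frame = pd.read_csv(
        source,
        sep="\t",
        decimal=".",
        na_values=["NULL"],   # pandas already treats "NULL" as missing; stated for clarity
        dtype={"id": str},    # the id field is described as an integer in text format
    )
    for column in ("price", "rating", "reviews"):
        frame[column] = pd.to_numeric(frame[column], errors="coerce")
    # Assumed layout from the field description: YYYY/MM/DD HH-MM-SS
    frame["date"] = pd.to_datetime(
        frame["date"], format="%Y/%m/%d %H-%M-%S", errors="coerce"
    )
    return frame


sample = load_smash(io.StringIO(SAMPLE_TSV))
print(sample.dtypes)
```

With the real file, the same helper could be called with the path instead of the in-memory sample. Values that do not match the assumed date pattern become NaT rather than raising an error, so it is worth checking how many dates failed to parse.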

A further overview of the dataset is provided in Table 2.

Dataset Files and Folders. The dataset is publicly available (under license CC-BY-NC-4.0-ES) on e-cienciadatos [7] in the file named smash_2024_v02.tsv. ‘Tab Separated Values’ (TSV) is a widely used data exchange format, similar to CSV (‘Comma-Separated Values’) but using tabs to avoid issues when data fields themselves contain commas.

All Devices-based Description. Fig. 1 depicts the distribution of the price, rating and number of reviews values for all the devices in the dataset.

Fig. 1. Price, rating and number of reviews distribution for all devices.

Platform-based Description. Table 3 shows general data description across platforms and Fig. 2 depicts the distribution of the price, rating and number of reviews values across platforms.

Table 3.

Platform-based dataset details.

Metric | pccomponentes | mediamarkt | amazon | corteingles | Total
Device Count | 2472 | 1301 | 1184 | 261 | 5218 (sum)
Brand Count | 208 | 136 | 437 | 25 | 652 (unique)
Categories Count | 20 | 16 | 10 | 6 | 50 (unique)
Devices w/Ratings Count | 1271 | 97 | 1162 | 67 | 2597 (sum)
Devices w/Reviews Count | 1271 | 97 | 1162 | 261 [(!=zero)=67] | 2791 (sum)
Reviews Count | 42717 | 680 | 2450582 | 496 | 2494475 (sum)
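Per-platform figures of this kind can be recomputed from the TSV with a grouped aggregation. The sketch below uses a small invented DataFrame with the documented column names (platform, brand, Category-EN, rating, reviews); it is not the authors' script, and it counts a device as having reviews only when the count is greater than zero, which differs from the El Corte Inglés cell above, where zero-review devices are also counted.

```python
import pandas as pd

# Invented rows that mimic the documented columns; not taken from the dataset.
devices = pd.DataFrame(
    {
        "platform": ["Amazon", "Amazon", "MediaMarkt", "El Corte Inglés"],
        "brand": ["BrandA", "BrandB", "BrandA", "BrandC"],
        "Category-EN": ["smart plugs", "smart cameras", "lighting", "thermostats"],
        "rating": [4.5, None, 3.8, None],
        "reviews": [120, None, 4, 0],
    }
)

summary = devices.groupby("platform").agg(
    device_count=("brand", "size"),
    brand_count=("brand", "nunique"),
    category_count=("Category-EN", "nunique"),
    devices_with_rating=("rating", "count"),  # count() ignores missing values
    devices_with_reviews=("reviews", lambda s: int((s.fillna(0) > 0).sum())),
    reviews_total=("reviews", "sum"),
)
print(summary)
```

The totals column of the table mixes sums and unique counts, so the unique brand and category totals would be computed on the whole DataFrame (for example with devices["brand"].nunique()) rather than by adding the per-platform values.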

Fig. 2. Price, rating and number of reviews distribution across platforms.

Brand-based Description. Fig. 3 depicts the distribution of device counts across platforms and the distribution of the aggregated device count per brand considering all the platforms (all devices). Fig. 4 highlights the absolute device counts for the top 50 brands and the share of each platform within those brands. Fig. 5 depicts the device counts for the top 50 brands across platforms. Fig. 6 shows in subfigures [a], [b] and [c] the distribution of the median values of price, rating and reviews across brands, together with the full range of these fields for each brand (error bars specify the maximum and minimum value of that field for that brand). Subfigures [d], [e] and [f] of Fig. 6 show the distribution of devices with reviews, categories and macro-categories per brand. Finally, Fig. 7 presents the overlap of brands available across various combinations of platforms, illustrating how many brands are shared among them.

Fig. 3. Distribution of device counts per brand for all devices and across platforms.

Fig. 4. Top 50 brands by device count with platform distribution.

Fig. 5. Top 50 brands by device count across platforms.

Fig. 6. Median values (Price [a], Rating [b], Reviews [c]) and counts (Devices with Reviews [d], Distinct Categories [e], Macro-Categories [f]) by Brand.

Fig. 7. Overlap of available brands across platform combinations.
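A brand overlap of the kind summarized in Fig. 7 can be approximated with set intersections over the brand and platform columns. The following is a minimal sketch on invented rows; lowercasing and trimming brand names before comparison is an assumption about how spelling variants might be handled, not a step described by the authors.

```python
from itertools import combinations

import pandas as pd

# Invented rows; the brand names are placeholders.
devices = pd.DataFrame(
    {
        "platform": ["Amazon", "MediaMarkt", "PC Componentes", "Amazon", "El Corte Inglés"],
        "brand": ["BrandA", "branda ", "BrandB", "BrandB", "BrandC"],
    }
)

# One set of normalized brand names per platform.
brands_by_platform = {
    platform: set(group["brand"].str.strip().str.lower())
    for platform, group in devices.groupby("platform")
}


def shared_brands(platforms):
    """Return the brands that appear on every platform in the given combination."""
    return set.intersection(*(brands_by_platform[p] for p in platforms))


# Number of shared brands for every pair of platforms.
pair_overlap = {
    pair: len(shared_brands(pair))
    for pair in combinations(sorted(brands_by_platform), 2)
}
for pair, count in pair_overlap.items():
    print(pair, count)
```

The same helper accepts combinations of three or four platforms, which is how a full overlap chart across all platform combinations could be tabulated.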

Category and Macro-Category Details. Each selected platform includes a dedicated section for smart home devices. Table 4 presents the names of these sections along with the corresponding categories for each platform. As mentioned in the General Overview subsection, we aggregated the various platform categories into broader macro-categories. Table 5 subsequently displays these categories, now grouped by their assigned macro-categories.

Table 4.

Categories across platforms.

Code | Name (Original - Spanish) | Name (English Translation)

Platform: El Corte Inglés. Section: Domótica (Home Automation)
ECI01 | alarma y seguridad | alarm and security
ECI02 | camaras de vigilancia | surveillance cameras
ECI03 | detectores | detectors
ECI04 | iluminacion inteligente | smart lighting
ECI05 | mandos y control | remotes and controls
ECI06 | termostatos | thermostats

Platform: PC Componentes. Section: Hogar Digital (Digital Home)
PCC01 | aire acondicionado | air conditioning
PCC02 | alarmas | alarms
PCC03 | altavoces inteligentes | smart speakers
PCC04 | batidoras | mixers
PCC05 | bombillas inteligentes | smart light bulbs
PCC06 | camaras bebes | baby cameras
PCC07 | camaras inteligentes | smart cameras
PCC08 | detectores de movimiento | motion detectors
PCC09 | enchufes inteligentes | smart plugs
PCC10 | estaciones meteorologicas | weather stations
PCC11 | freidoras | fryers
PCC12 | frigorificos | refrigerators
PCC13 | lavadoras | washing machines
PCC14 | lavavajillas | dishwasher
PCC15 | purificadores de aire | air purifiers
PCC16 | radiadores | radiators
PCC17 | robots aspiradoras | robot vacuums
PCC18 | termostatos inteligentes | smart thermostats
PCC19 | ventiladores | ventilators
PCC20 | wearables | wearables

Platform: Amazon. Section: Hogar Digital (Digital Home)
AMZ01 | camaras inteligentes | smart cameras
AMZ02 | control climatico | climate control
AMZ03 | echo altavoces inteligentes | echo smart speakers
AMZ04 | electrodomesticos | home appliances
AMZ05 | enchufes inteligentes | smart plugs
AMZ06 | entrada y seguridad | entrance and security
AMZ07 | entretenimiento en el hogar | home entertainment
AMZ08 | iluminacion e interruptores | lighting and switches
AMZ09 | smartwatches | smartwatches
AMZ10 | wifi y redes | wifi and networks

Platform: MediaMarkt. Section: Smart Home
MMK01 | alexa echo | alexa echo
MMK02 | apple homepod | apple homepod
MMK03 | cerraduras electronicas | electronic locks
MMK04 | detectores y sensores | detectors and sensors
MMK05 | fire tv | fire tv
MMK06 | google nest | google nest
MMK07 | iluminacion | lighting
MMK08 | interruptores y enchufes | switches and plugs
MMK09 | kits smarthome | kits smarthome
MMK10 | plc wifi | plc wifi
MMK11 | repetidores y amplificadores wifi | wifi repeaters and amplifiers
MMK12 | routers wifi | routers wifi
MMK13 | seguridad inteligente | smart security
MMK14 | sistema wifi mesh | wifi mesh system
MMK15 | termostatos | thermostats
MMK16 | xiaomi mi | xiaomi mi

Table 5.

Assignment of each Category to a Macro-Category.

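Category | Code(s) | Macro-Category
air conditioning | PCC01 | Home Appliances
air purifiers | PCC15 | Home Appliances
climate control | AMZ02 | Home Appliances
dishwasher | PCC14 | Home Appliances
fryers | PCC11 | Home Appliances
home appliances | AMZ04 | Home Appliances
mixers | PCC04 | Home Appliances
radiators | PCC16 | Home Appliances
refrigerators | PCC12 | Home Appliances
thermostats | ECI06, MMK15 | Home Appliances
ventilators | PCC19 | Home Appliances
washing machines | PCC13 | Home Appliances
weather stations | PCC10 | Home Appliances
plc wifi | MMK10 | Routers and Networks
routers wifi | MMK12 | Routers and Networks
wifi and networks | AMZ10 | Routers and Networks
wifi mesh system | MMK14 | Routers and Networks
wifi repeaters and amplifiers | MMK11 | Routers and Networks
alexa echo | MMK01 | Smart Home
apple homepod | MMK02 | Smart Home
echo smart speakers | AMZ03 | Smart Home
fire tv | MMK05 | Smart Home
google nest | MMK06 | Smart Home
home entertainment | AMZ07 | Smart Home
kits smarthome | MMK09 | Smart Home
remotes and controls | ECI05 | Smart Home
robot vacuums | PCC17 | Smart Home
smart speakers | PCC03 | Smart Home
smart thermostats | PCC18 | Smart Home
xiaomi mi | MMK16 | Smart Home
alarm and security | ECI01 | Smart Security
alarms | PCC02 | Smart Security
baby cameras | PCC06 | Smart Security
detectors | ECI03 | Smart Security
detectors and sensors | MMK04 | Smart Security
electronic locks | MMK03 | Smart Security
entrance and security | AMZ06 | Smart Security
motion detectors | PCC08 | Smart Security
smart cameras | AMZ01, PCC07 | Smart Security
smart security | MMK13 | Smart Security
surveillance cameras | ECI02 | Smart Security
wearables | PCC20 | Wearable Devices
smartwatches | AMZ09 | Wearable Devices
lighting | MMK07 | Smart Lighting
lighting and switches | AMZ08 | Smart Lighting
smart light bulbs | PCC05 | Smart Lighting
smart lighting | ECI04 | Smart Lighting
smart plugs | PCC09, AMZ05 | Smart Plugs and Switches
switches and plugs | MMK08 | Smart Plugs and Switches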
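The assignment in Table 5 is essentially a lookup from category to macro-category, which can be expressed as a dictionary and applied with a pandas map. The sketch below includes only an excerpt of Table 5; the sample rows, the whitespace and case normalization, and the category "drones" (used to show how an unmapped value surfaces) are assumptions for illustration and do not come from the dataset.

```python
import pandas as pd

# Excerpt of Table 5 (English category -> macro-category); not the full mapping.
MACRO_BY_CATEGORY = {
    "air conditioning": "Home Appliances",
    "thermostats": "Home Appliances",
    "routers wifi": "Routers and Networks",
    "smart speakers": "Smart Home",
    "smart thermostats": "Smart Home",
    "smart cameras": "Smart Security",
    "smartwatches": "Wearable Devices",
    "smart light bulbs": "Smart Lighting",
    "smart plugs": "Smart Plugs and Switches",
}

# Invented rows; "drones" is a hypothetical category that is absent from Table 5.
devices = pd.DataFrame({"Category-EN": ["smart plugs", "Thermostats ", "drones"]})

# Normalize spacing and case before the lookup (an assumption, see lead-in).
normalized = devices["Category-EN"].str.strip().str.lower()
devices["Macro-Category-EN"] = normalized.map(MACRO_BY_CATEGORY)

# Categories without an assigned macro-category are left as missing values.
unmapped = sorted(normalized[devices["Macro-Category-EN"].isna()].unique())
print(devices)
print("Unmapped categories:", unmapped)
```

Listing the unmapped categories explicitly makes it easy to spot a platform category that was renamed or added after the mapping was written.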

Category-Based and Macro-Category-Based Description. This subsection presents graphics that describe the main characteristics of the dataset across the categories and macro-categories. Figs. 8 and 9 depict the device counts per category and macro-category, while Figs. 10 and 11 depict the brand counts. Figs. 12, 13, 14 and 15 show the distribution of the same fields across platforms. Figs. 16 and 18 depict the distribution of price, rating and reviews across categories and macro-categories. Finally, Figs. 17 and 19 show the distribution of devices with reviews across categories and macro-categories.

Fig. 8. Distribution of device counts per category.

Fig. 9. Distribution of device counts per macro-category.

Fig. 10. Distribution of distinct brand counts per category.

Fig. 11. Distribution of distinct brand counts per macro-category.

Fig. 12. Distribution of device counts per category across platforms.

Fig. 13. Distribution of device counts per macro-category across platforms.

Fig. 14. Distribution of distinct brand counts per category across platforms.

Fig. 15. Distribution of distinct brand counts per macro-category across platforms.

Fig. 16. Distribution of price, rating and number of reviews per category across platforms.

Fig. 17. Distribution of number of devices with reviews per category across platforms.

Fig. 18. Distribution of price, rating and number of reviews per macro-category across platforms.

Fig. 19. Distribution of number of devices with reviews per macro-category across platforms.
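The counts behind figures of this kind can be tabulated with a cross-tabulation of macro-category against platform. The example below is a sketch on invented rows using the documented column names; it shows one possible way to obtain device counts and distinct brand counts per macro-category across platforms, not the procedure used to draw the figures.

```python
import pandas as pd

# Invented rows with the documented column names.
devices = pd.DataFrame(
    {
        "platform": ["Amazon", "Amazon", "MediaMarkt", "PC Componentes"],
        "Macro-Category-EN": [
            "Smart Security",
            "Smart Lighting",
            "Smart Lighting",
            "Smart Security",
        ],
        "brand": ["BrandA", "BrandB", "BrandB", "BrandC"],
    }
)

# Device counts per macro-category (rows) and platform (columns).
device_counts = pd.crosstab(devices["Macro-Category-EN"], devices["platform"])

# Distinct brand counts for the same layout; absent combinations become 0.
brand_counts = devices.pivot_table(
    index="Macro-Category-EN",
    columns="platform",
    values="brand",
    aggfunc="nunique",
    fill_value=0,
)

print(device_counts)
print(brand_counts)
```

Replacing "Macro-Category-EN" with "Category-EN" gives the per-category version of the same tables.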

4. Experimental Design, Materials and Methods

In order to acquire the data, we carried out the following steps:

  • We surveyed the existing e-commerce platforms that sell smart home devices in Spain.

  • We studied each platform case by case to determine whether it had one or more dedicated categories for smart home products, regardless of the name of the section. We discarded platforms that did not have such categories.

  • We analyzed the included platforms to ensure that the content of these sections contained only devices that were considered to be smart home and discarded those that mixed smart home devices with other products.

This process concluded with the selection of four platforms: Amazon Spain, El Corte Inglés, PC Componentes, and MediaMarkt Spain. To collect the data, we analyzed each platform individually. We also studied the ‘robots.txt’ file of each platform to identify the web pages related to the smart home sections that were permitted for automated processing. We then created a separate Python script for each platform, given the unique HTML structure of each. A high-level description of the whole procedure for each platform is provided in Table 6.
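The ‘robots.txt’ check described above can be automated with Python's standard library. The following is a minimal sketch, not the authors' code: the rules, the host shop.example and the paths are hypothetical, and the rules are parsed from an in-memory string so that the example runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; real platforms publish their own rules.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Disallow: /smart-home/hypothetical-blocked-category/
"""


def build_parser(robots_text):
    """Create a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
base_url = "https://shop.example"  # placeholder host, not one of the platforms

category_paths = [
    "/smart-home/lighting/",
    "/smart-home/hypothetical-blocked-category/",
]
# Keep only the category pages that the rules allow for a generic crawler.
allowed_paths = [
    path for path in category_paths if parser.can_fetch("*", base_url + path)
]
print(allowed_paths)
```

Against a live site, the parser would normally be pointed at the published file with set_url and read instead of a hard-coded string.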

Table 6.

Step-by-step data obtaining process for each of the platforms.

Mode | Step | Platforms | Description
Manual | 1 | All | List of smart home categories is obtained and stored in JSON format
Manual | 2 | All | HTML structure of the pages of the product within the relevant category/ies is analyzed to identify elements and data to be collected
Automated using customized script | 3 | All | For each platform, a function in the script is developed to navigate through all the product pages for each category
Automated using customized script | 4 | Amazon | Access is limited to page 10 for each category
Automated using customized script | 5 | All | Using Selenium and BeautifulSoup libraries, a different function extracts the product’s data for each of the categories identified in step 1
Automated using customized script | 6 | Amazon, MediaMarkt, El Corte Inglés | Function that scrolls the page to access all the necessary elements
Automated using customized script | 7 | El Corte Inglés | Scrolling function includes a click function developed to access all the necessary elements
Automated using customized script | 8 | All | Text normalization function to clean up special characters from some data fields
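Step 8 is not specified in detail, so the following is only one plausible sketch of such a text normalization function, using the Python standard library. It assumes that cleaning means removing diacritics (consistent with the unaccented category names in Table 4), dropping symbols other than a few separators, collapsing whitespace and lowercasing; the function name normalize_text is hypothetical.

```python
import re
import unicodedata


def normalize_text(value):
    """Strip diacritics and stray symbols, collapse whitespace, and lowercase."""
    # Decompose accented characters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", value)
    # Drop the combining marks, e.g. "á" becomes "a".
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Replace anything that is not a word character, whitespace, or - . / with a space.
    cleaned = re.sub(r"[^\w\s\-./]", " ", without_marks)
    # Collapse runs of whitespace (including tabs) into single spaces.
    return re.sub(r"\s+", " ", cleaned).strip().lower()


print(normalize_text("  Cámaras   de\tvigilancia "))
print(normalize_text("Enchufes inteligentes (Wi-Fi)®"))
```

Note that this approach also turns "ñ" into "n", which may or may not be desirable for Spanish product names; a whitelist of characters to preserve could be added if needed.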

Limitations

While our initial goal was to create a more comprehensive dataset covering all popular e-commerce platforms, the practical limitations of automating data extraction led us to select only 4 platforms. Additionally, we believe that the dataset may be biased because the platforms themselves sometimes place devices in non-corresponding categories, and because similar categories contain different sets of devices on different platforms. Moreover, the model field is mostly populated with a code rather than the model name itself, which is typically embedded within the name field. We attribute this practice to easier querying of product specifications on the platform. While beneficial for product querying on the platforms, this practice is not machine-ready. To further enhance the dataset, model names should be extracted from the name field, where the platforms include the most valuable information for analysis.
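As a rough illustration of the enhancement suggested above, a simple heuristic could look for the first token in the name field that mixes letters and digits. This is a hypothetical sketch, not a validated method: the function guess_model and the product names are invented, and the rule will misfire on names where other alphanumeric tokens (such as a socket type or a capacity) appear before the model.

```python
import re

# A token is a run of letters, digits and hyphens that starts with a letter or digit.
TOKEN_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9\-]*")


def guess_model(name):
    """Return the first token with both letters and digits (3+ chars), else None."""
    for token in TOKEN_RE.findall(name):
        has_digit = any(ch.isdigit() for ch in token)
        has_alpha = any(ch.isalpha() for ch in token)
        if len(token) >= 3 and has_digit and has_alpha:
            return token
    return None


# Invented product names for illustration only.
print(guess_model("ExampleBrand Plug XP100 Smart WiFi Socket"))
print(guess_model("Smart WiFi Socket"))
```

Any such heuristic would need to be checked against a manually labeled sample of names before being used to populate the model field.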

Ethics Statement

We confirm that we have read and followed the ethical requirements for publication in Data in Brief and confirm that the current work does not involve human subjects, animal experiments, or any data collected from social media platforms.

Terms of Service (ToS): Web crawling is crucial for web functionality, generating backlinks that enhance a site's visibility. Consequently, companies typically permit crawling, with guidelines set in the ‘robots.txt’ file. During web scraping, we referenced this file to determine which sections could be accessed and complied with its directives. Only one MediaMarkt category was listed in ‘robots.txt’; for that category, we extracted the information manually.

Additionally, we have collected the data and elaborated the dataset supported by Articles 3 and 4 of the DIRECTIVE (EU) 2019/790 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL of 17 April 2019 on copyright and related rights in the Digital Single Market and amending Directives 96/9/EC and 2001/29/EC.

Copyright: The dataset only contains publicly published data about smart home products sold by the considered platforms. Neither data from social media nor news outlets are included.

Privacy: No personally identifiable information was scraped or included in the dataset. Anonymization is not relevant to this dataset.

Scraping Policies: Web scraping (text and data mining) for the purpose of scientific research is explicitly allowed within the European Union by DIRECTIVE (EU) 2019/790 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL of 17 April 2019 on copyright and related rights in the Digital Single Market and amending Directives 96/9/EC and 2001/29/EC.

Furthermore, the web scraping process was conducted in accordance with the limits established by each platform's websites, e.g. adhering to allowed access times and request rates, and ensuring that the services provided by the platforms were not disrupted.

Credit Author Statement

Jhovany Quintana-Vera: Methodology, Software, Validation, Formal Analysis, Investigation, Resources, Data Curation, Writing – Original Draft; Ana I. González-Tablas: Conceptualization, Methodology, Validation, Data Curation, Writing - Review & Editing, Visualization, Supervision, Project Administration, Funding acquisition; Mohammed Rashed: Conceptualization, Methodology, Validation, Resources, Writing - Original Draft, Data Curation, Writing - Review & Editing, Visualization, Supervision.

Acknowledgements

Jhovany Quintana-Vera’s work is supported by a UC3M Full Scholarship to the Master’s Study for the Master in Computer Science and Technology of the University Carlos III de Madrid.

Mohammed Rashed’s work is supported by the Recualificación-Margarita Salas grant (call of Universidad Carlos III Madrid) financed by the Ministerio de Ciencia, Innovación y Universidades and the European Union-Next Generation EU.

Ana I. González-Tablas’ work has been supported by the European Defence Industrial Development Programme (EDIDP) under grant agreement No EDIDP-CSAMN-SSC-2019–022-ECYSAP (European Cyber Situational Awareness Platform), and by the Ministerio de Ciencia, Innovación y Universidades under Grant No TIN2016-79095-C2-2-R (SMOG).

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Contributor Information

Jhovany Quintana-Vera, Email: jhquinta@inf.uc3m.es.

Ana I. González-Tablas, Email: aigonzal@inf.uc3m.es.

Mohammed Rashed, Email: mrashed@inf.uc3m.es.


Articles from Data in Brief are provided here courtesy of Elsevier
