Abstract
The use of smart home devices is on the rise: estimates project over 785 million users, up from the current ∼361 million, a 117% increase in just 4 years. It is therefore essential to have a dataset that details the different aspects of the devices available on the market. In this paper, we introduce our dataset, titled Spanish MArket Smart Home devices (SMASH), which we collected via structured data extraction from four major Spanish e-commerce platforms. Containing 5218 devices across 652 brands, the dataset provides an overview of smart home devices sold within Spain, the fourth largest economy in the European Union. The dataset is versatile, as it includes details such as name, price, brand, model, rating, number of reviews, platform and category. It can be used as a primary source in research on consumer behaviour and microeconomics. Additionally, its details could be used to create new datasets, such as collections of the privacy policies of brands and of the mobile applications (apps) used with the devices. The dataset is publicly accessible under license CC-BY-NC-4.0-ES. We note, however, that SMASH is limited to products sold within Spain and collected within a specific time window (start date: 2023-12; end date: 2024-08); users should consider these scope and temporal constraints when generalizing findings.
Keywords: IoT, Consumer electronics, Home automation, Smart home, Online shops
Specifications Table
| Subject | Computer Sciences |
| Specific subject area | Extracted data of sold smart home devices on four Spain-based e-commerce platforms. |
| Type of data | Table, Raw. |
| Data collection | The data were collected using four Python scripts (one script per platform) that were executed on a local workstation. We included platforms that sell smart home devices within Spain. Another inclusion criterion was that the platform must have a designated category or categories for smart home devices, so that we could automate the data collection process. Data were collected in the following time period: start date: 2023-12; end date: 2024-08. |
| Data source location | Primary data sources: - Amazon Spain. https://www.amazon.es [1] - MediaMarkt Spain. https://www.mediamarkt.es [2] - El Corte Inglés. https://www.elcorteingles.es [3] - PC Componentes. https://www.pccomponentes.com/ [4]. |
| Data accessibility | Repository name: e-cienciadatos (https://edatos.consorciomadrono.es/) Data identification number: DOI: 10.21950/JTIQQ5 Direct URL to data: https://edatos.consorciomadrono.es/dataset.xhtml?persistentId=doi:10.21950/JTIQQ5 |
| Related research article | None. |
1. Value of the Data
- Our dataset stands out for providing real-world information about the status of smart home devices in a European Union member state.
- The data are versatile. Given the details they include, they can be used within different fields of study (see Table 1), such as consumer behaviour, market tendencies, price analysis and the hardware capabilities of sold devices.
- Researchers in economics can analyze the columns related to prices, number of reviews, device category and macro-category, and brands. Researchers in computing can use device category and macro-category to study hardware capabilities. Moreover, privacy researchers working on areas such as the General Data Protection Regulation (GDPR) may leverage information from this dataset to study the brands’ privacy policies.
- The data can be further analyzed by the research community to create secondary data about aspects such as platform comparison and brand popularity.
Table 1.
Suggested research lines that can leverage the SMASH dataset.
| Dataset field(s) | Suggested research line |
|---|---|
| product_name, category | NLP / information extraction |
| brand, model | Market structure / brand analysis |
| price | Economics / price prediction |
| rating, review_count | Consumer behavior / demand modeling |
| product_link, product_id | Reproducibility / data enrichment |
| category, macro_category | Consumer electronics ecosystem studies |
| platform | Platform competition / digital markets |
2. Background
The creation of this dataset forms part of an ongoing project in which we study privacy within the context of smart home devices. First, we needed to identify the most popular smart-home devices in Spain, but we could not find a recent dataset with that information. Datafiniti’s “Electronic Products and Pricing Data” [5] contains similar fields to SMASH, but its records date from 2014 to 2018. Most smart-home datasets focus on network traffic or sensor measurements and therefore differ from SMASH. Consumer-electronics databases may provide similar information, but they require payment, whereas our dataset is publicly available.
With the goal of identifying the most popular brands and devices within the smart home market in Spain, we first identified the e-commerce platforms that Spain-based users can buy from. Afterwards, we selected only the platforms that allowed the automated extraction of information about the sold devices by means of scraping. These platforms are Amazon Spain [1], MediaMarkt Spain [2], El Corte Inglés [3] and PC Componentes [4].
Although built for privacy research, our dataset is versatile: it contains details such as models, prices, platforms and number of reviews, and could therefore be used in several fields.
3. Data Description
General Overview. The dataset, which contains extracted information on 5218 devices, is provided as a table in a Tab-Separated Values file (a .tsv file). Each record (a row in the table) represents a device and the data collected about it. The table has a header row. Each record has the following fields (columns):
- A. id: uniquely identifies each record in the table (integer number in text format)
- B. name: the full name of the product as extracted from the product name label on the webpage (text)
- C. price: the selling price of the product (numeric, with the dot ‘.’ used as decimal separator)
- D. brand: the name of the brand as extracted from the webpage of the product (text)
- E. model: the model name as designated by the product’s webpage (text)
- F. code: either a universal code or an internal code of the product (text)
- G. rating: the rating of the product based on the 5-star reviewing system (numeric, with the dot ‘.’ used as decimal separator, or text ‘NULL’)
- H. reviews: the number of reviews available on the product’s webpage for that product (numeric, with the dot ‘.’ used as decimal separator, or text ‘NULL’)
- I. link: the link from which the data was extracted (url, text format)
- J. currency: the currency used in the sales of the product (EUR)
- K. platform: the platform from which the data was extracted. There are four platforms: Amazon, PC Componentes, El Corte Inglés, MediaMarkt (text)
- L. Category-ES: the category as designated by the platform and shown on the product’s webpage (text)
- M. Category-EN: an English translation of the Category-ES field (text)
- N. Macro-Category-ES: a higher-level category that we created to normalize the diverse categories that the platforms assign to the products (text)
- O. Macro-Category-EN: an English translation of the Macro-Category-ES field (text)
- P. date: the date of collection of the data about the product (date, format YYYY/MM/DD HH-MM-SS)
Next, minimal instructions are provided for importing the data into two common tools.
Instructions to import the dataset in a Microsoft Excel sheet:
- Open a new *.xlsx file.
- Import the data from the file “smash_2024_v02.tsv”, selecting “CSV File” as the file type.
- Indicate that the data are delimited (not fixed width), and select the tab character as the delimiter.
- Establish the following format for each column of the data to import:
  - “General” format for columns “price”, “rating” and “reviews”. They contain numeric values. In advanced options, select the dot character “.” as the decimal separator, and none for the thousands.
  - “Date” (YYYY-MM-DD h:mm:ss or YMD) format for column “date”.
  - “Text” format for the other columns.
Sample Python code to import the dataset into a pandas DataFrame is provided below.

```python
import pandas as pd

# Specify the path where the dataset file is located
dataset_file = "[CUSTOM_PATH]/smash_2024_v02.tsv"

# Read the tab-separated file into a DataFrame ('.' is the decimal separator)
df = pd.read_csv(dataset_file, sep="\t", decimal=".")

# Display the first few rows of the DataFrame
print("DataFrame Sample:")
print(df.head())
```

Further overview about the dataset is provided in Table 2.
Table 2.
All devices-based dataset details.
| Field | Quantity |
|---|---|
| Devices | 5218 |
| Brands | 652 |
| Categories | 49 |
| Macro-Categories | 7 |
| Platforms | 4 |
| Price (non zero values) | 5193 |
| Rating (non NULL values) | 2597 |
| Reviews (non zero nor NULL values) | 2597 |
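As an illustration, counts of the kind reported in Table 2 could be recomputed along the following lines. This is a minimal sketch, not the authors' procedure: the three sample rows and the `summarize` helper are hypothetical, and only the column names are taken from the field list above. With the real file, the in-memory sample would be replaced by the path to the .tsv file.

```python
import io

import pandas as pd

# Hypothetical three-row sample that reuses SMASH column names.
SAMPLE_TSV = (
    "id\tname\tprice\tbrand\trating\treviews\tplatform\tCategory-EN\tMacro-Category-EN\n"
    "1\tSmart plug A\t12.99\tBrandX\t4.5\t120\tamazon\tsmart plugs\tSmart Plugs and Switches\n"
    "2\tCamera B\t0\tBrandY\tNULL\tNULL\tmediamarkt\tsmart security\tSmart Security\n"
    "3\tBulb C\t9.50\tBrandX\tNULL\t0\tcorteingles\tsmart lighting\tSmart Lighting\n"
)


def summarize(df):
    """Return counts analogous to the rows of Table 2 (hypothetical helper)."""
    return {
        "devices": len(df),
        "brands": df["brand"].nunique(),
        "categories": df["Category-EN"].nunique(),
        "macro_categories": df["Macro-Category-EN"].nunique(),
        "platforms": df["platform"].nunique(),
        # Missing values are treated as zero so that they are not counted.
        "price_non_zero": int((df["price"].fillna(0) != 0).sum()),
        "rating_non_null": int(df["rating"].notna().sum()),
        "reviews_non_zero_non_null": int((df["reviews"].fillna(0) != 0).sum()),
    }


# The text 'NULL' is read as a missing value, so rating and reviews become numeric.
df = pd.read_csv(io.StringIO(SAMPLE_TSV), sep="\t", decimal=".", na_values=["NULL"])
summary = summarize(df)
print(summary)
```

Reading ‘NULL’ as a missing value keeps the rating and reviews columns numeric, which simplifies later aggregation.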
Dataset Files and Folders. The dataset is publicly available (under license CC-BY-NC-4.0-ES) on e-cienciadatos [7] in the file named smash_2024_v02.tsv. ‘Tab Separated Values’ (TSV) is a widely used data exchange format, similar to CSV (‘Comma-Separated Values’) but using tabs to avoid issues when data fields themselves contain commas.
All Devices-based Description. Fig. 1 depicts the distribution of the price, rating and number of reviews values for all the devices in the dataset.
Fig. 1.
Price, rating and number of reviews distribution for all devices.
Platform-based Description. Table 3 shows general data description across platforms and Fig. 2 depicts the distribution of the price, rating and number of reviews values across platforms.
Table 3.
Platform-based dataset details.
| Metric | pccomponentes | mediamarkt | amazon | corteingles | Total |
|---|---|---|---|---|---|
| Device Count | 2472 | 1301 | 1184 | 261 | 5218 (sum) |
| Brand Count | 208 | 136 | 437 | 25 | 652 (unique) |
| Categories Count | 20 | 16 | 10 | 6 | 50 (unique) |
| Devices w/Ratings Count | 1271 | 97 | 1162 | 67 | 2597 (sum) |
| Devices w/Reviews Count | 1271 | 97 | 1162 | 261 (67 non-zero) | 2791 (sum) |
| Reviews Count | 42717 | 680 | 2450582 | 496 | 2494475 (sum) |
Fig. 2.
Price, rating and number of reviews distribution across platforms.
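A per-platform breakdown similar to Table 3 could be derived with a grouped aggregation. The sketch below uses hypothetical rows and assumes that ‘NULL’ values have already been read as missing values; the output column names are illustrative only.

```python
import pandas as pd

# Hypothetical rows; column names follow the field list in the General Overview.
df = pd.DataFrame({
    "platform": ["amazon", "amazon", "mediamarkt", "corteingles"],
    "brand": ["BrandX", "BrandY", "BrandX", "BrandZ"],
    "Category-EN": ["smart plugs", "smart cameras", "lighting", "thermostats"],
    "rating": [4.5, None, 3.0, None],
    "reviews": [120, None, 4, 0],
})

per_platform = df.groupby("platform").agg(
    device_count=("brand", "size"),
    brand_count=("brand", "nunique"),
    category_count=("Category-EN", "nunique"),
    devices_with_rating=("rating", "count"),    # count() skips missing values
    devices_with_reviews=("reviews", "count"),  # zeros are still counted here
    reviews_total=("reviews", "sum"),
)
print(per_platform)
```

Because `count()` skips missing values but keeps zeros, a platform that stores 0 instead of ‘NULL’ for unreviewed products reports every device as having a reviews value, which is consistent with the El Corte Inglés entry in Table 3.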
Brand-based Description. Fig. 3 depicts the distribution of device counts across platforms and the distribution of the aggregated device count per brand considering all the platforms (all devices). Fig. 4 highlights the absolute device counts for the top 50 brands and the share of each platform within those brands. Fig. 5 depicts the device counts for the top 50 brands across platforms. Fig. 6 shows in subfigures [a], [b] and [c] the distribution of the median values of price, rating and reviews across brands, together with the full range of these fields for each brand (error bars mark the maximum and minimum value of that field for that brand). Subfigures [d], [e] and [f] of Fig. 6 show the distribution of devices with reviews, categories and macro-categories per brand. Finally, Fig. 7 presents the overlap of brands available across various combinations of platforms, illustrating how many brands are shared among them.
Fig. 3.
Distribution of device counts per brand for all devices and across platforms.
Fig. 4.
Top 50 brands by device count with platform distribution.
Fig. 5.
Top 50 brands by device count across platforms.
Fig. 6.
Median values (Price [a], Rating [b], Reviews [c]) and counts (Devices with Reviews [d], Distinct Categories [e], Macro-Categories [f]) by Brand.
Fig. 7.
Overlap of available brands across platform combinations.
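The brand overlap across platform combinations can be explored with set intersections. The following sketch uses hypothetical brands and counts brands present on every platform of a combination (inclusive intersections), which may differ from the way Fig. 7 was produced. It also assumes brand names are spelled identically across platforms.

```python
from itertools import combinations

import pandas as pd

# Hypothetical platform/brand pairs.
df = pd.DataFrame({
    "platform": ["amazon", "amazon", "mediamarkt", "pccomponentes", "pccomponentes"],
    "brand": ["BrandX", "BrandY", "BrandX", "BrandX", "BrandZ"],
})

# One set of distinct brand names per platform.
brands_by_platform = df.groupby("platform")["brand"].apply(set).to_dict()


def shared_brands(platforms):
    """Brands present on every platform in the given combination."""
    return set.intersection(*(brands_by_platform[p] for p in platforms))


# Every combination of two or more platforms, mapped to its shared brands.
overlap = {
    combo: shared_brands(combo)
    for r in range(2, len(brands_by_platform) + 1)
    for combo in combinations(sorted(brands_by_platform), r)
}
for combo, brands in overlap.items():
    print(combo, len(brands))
```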
Category and Macro-Category Details. Each selected platform includes a dedicated section for smart home devices. Table 4 presents the names of these sections along with the corresponding categories for each platform. As mentioned in the General Overview subsection, we aggregated the various platform categories into broader macro-categories. Table 5 subsequently displays these categories, now grouped by their assigned macro-categories.
Table 4.
Categories across platforms.
| Platform | Section | N° | Name (Original - Spanish) | Name (English Translation) |
|---|---|---|---|---|
| El Corte Inglés | Domótica (Home Automation) | ECI01 | alarma y seguridad | alarm and security |
| | | ECI02 | camaras de vigilancia | surveillance cameras |
| | | ECI03 | detectores | detectors |
| | | ECI04 | iluminacion inteligente | smart lighting |
| | | ECI05 | mandos y control | remotes and controls |
| | | ECI06 | termostatos | thermostats |
| PC Componentes | Hogar Digital (Digital Home) | PCC01 | aire acondicionado | air conditioning |
| | | PCC02 | alarmas | alarms |
| | | PCC03 | altavoces inteligentes | smart speakers |
| | | PCC04 | batidoras | mixers |
| | | PCC05 | bombillas inteligentes | smart light bulbs |
| | | PCC06 | camaras bebes | baby cameras |
| | | PCC07 | camaras inteligentes | smart cameras |
| | | PCC08 | detectores de movimiento | motion detectors |
| | | PCC09 | enchufes inteligentes | smart plugs |
| | | PCC10 | estaciones meteorologicas | weather stations |
| | | PCC11 | freidoras | fryers |
| | | PCC12 | frigorificos | refrigerators |
| | | PCC13 | lavadoras | washing machines |
| | | PCC14 | lavavajillas | dishwasher |
| | | PCC15 | purificadores de aire | air purifiers |
| | | PCC16 | radiadores | radiators |
| | | PCC17 | robots aspiradoras | robot vacuums |
| | | PCC18 | termostatos inteligentes | smart thermostats |
| | | PCC19 | ventiladores | ventilators |
| | | PCC20 | wearables | wearables |
| Amazon | Hogar Digital (Digital Home) | AMZ01 | camaras inteligentes | smart cameras |
| | | AMZ02 | control climatico | climate control |
| | | AMZ03 | echo altavoces inteligentes | echo smart speakers |
| | | AMZ04 | electrodomesticos | home appliances |
| | | AMZ05 | enchufes inteligentes | smart plugs |
| | | AMZ06 | entrada y seguridad | entrance and security |
| | | AMZ07 | entretenimiento en el hogar | home entertainment |
| | | AMZ08 | iluminacion e interruptores | lighting and switches |
| | | AMZ09 | smartwatches | smartwatches |
| | | AMZ10 | wifi y redes | wifi and networks |
| MediaMarkt | Smart Home | MMK01 | alexa echo | alexa echo |
| | | MMK02 | apple homepod | apple homepod |
| | | MMK03 | cerraduras electronicas | electronic locks |
| | | MMK04 | detectores y sensores | detectors and sensors |
| | | MMK05 | fire tv | fire tv |
| | | MMK06 | google nest | google nest |
| | | MMK07 | iluminacion | lighting |
| | | MMK08 | interruptores y enchufes | switches and plugs |
| | | MMK09 | kits smarthome | kits smarthome |
| | | MMK10 | plc wifi | plc wifi |
| | | MMK11 | repetidores y amplificadores wifi | wifi repeaters and amplifiers |
| | | MMK12 | routers wifi | routers wifi |
| | | MMK13 | seguridad inteligente | smart security |
| | | MMK14 | sistema wifi mesh | wifi mesh system |
| | | MMK15 | termostatos | thermostats |
| | | MMK16 | xiaomi mi | xiaomi mi |
Table 5.
Assignment of each Category to a Macro-Category.
| Category | N° | Macro-Category |
|---|---|---|
| air conditioning | PCC01 | Home Appliances |
| air purifiers | PCC15 | |
| climate control | AMZ02 | |
| dishwasher | PCC14 | |
| fryers | PCC11 | |
| home appliances | AMZ04 | |
| mixers | PCC04 | |
| radiators | PCC16 | |
| refrigerators | PCC12 | |
| thermostats | ECI06, MMK15 | |
| ventilators | PCC19 | |
| washing machines | PCC13 | |
| weather stations | PCC10 | |
| plc wifi | MMK10 | Routers and Networks |
| routers wifi | MMK12 | |
| wifi and networks | AMZ10 | |
| wifi mesh system | MMK14 | |
| wifi repeaters and amplifiers | MMK11 | |
| alexa echo | MMK01 | Smart Home |
| apple homepod | MMK02 | |
| echo smart speakers | AMZ03 | |
| fire tv | MMK05 | |
| google nest | MMK06 | |
| home entertainment | AMZ07 | |
| kits smarthome | MMK09 | |
| remotes and controls | ECI05 | |
| robot vacuums | PCC17 | |
| smart speakers | PCC03 | |
| smart thermostats | PCC18 | |
| xiaomi mi | MMK16 | |
| alarm and security | ECI01 | Smart Security |
| alarms | PCC02 | |
| baby cameras | PCC06 | |
| detectors | ECI03 | |
| detectors and sensors | MMK04 | |
| electronic locks | MMK03 | |
| entrance and security | AMZ06 | |
| motion detectors | PCC08 | |
| smart cameras | AMZ01, PCC07 | |
| smart security | MMK13 | |
| surveillance cameras | ECI02 | |
| wearables | PCC20 | Wearable Devices |
| smartwatches | AMZ09 | |
| lighting | MMK07 | Smart Lighting |
| lighting and switches | AMZ08 | |
| smart light bulbs | PCC05 | |
| smart lighting | ECI04 | |
| smart plugs | PCC09, AMZ05 | Smart Plugs and Switches |
| switches and plugs | MMK08 | |
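The assignment in Table 5 can be expressed as a lookup table. The sketch below uses only an excerpt of the table, and the `unknown` category is a hypothetical value added to show how unmapped categories surface as missing values.

```python
import pandas as pd

# Excerpt of Table 5 as a lookup table (category -> macro-category).
MACRO_CATEGORY_EN = {
    "smart plugs": "Smart Plugs and Switches",
    "switches and plugs": "Smart Plugs and Switches",
    "smart light bulbs": "Smart Lighting",
    "routers wifi": "Routers and Networks",
    "smartwatches": "Wearable Devices",
}

df = pd.DataFrame(
    {"Category-EN": ["smart plugs", "routers wifi", "smartwatches", "unknown"]}
)

# map() returns NaN for categories missing from the lookup table.
df["Macro-Category-EN"] = df["Category-EN"].map(MACRO_CATEGORY_EN)

# Listing unmapped categories makes gaps in the lookup table visible.
unmapped = df.loc[df["Macro-Category-EN"].isna(), "Category-EN"].tolist()
print(df)
print("Unmapped categories:", unmapped)
```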
Category-Based and Macro-Category-Based Description. This subsection presents some graphics that describe the main characteristics of the dataset across the categories and macro-categories. Fig. 8 and 9 depict the device counts per category and macro-category, while Fig. 10 and 11 depict the brand counts. Fig. 12, Fig. 13, Fig. 14 and 15 show the distribution of the same fields across platforms. Fig. 16 and 18 depict the distribution of price, rating and reviews across categories and macro-categories. Finally, Fig. 17 and 19 show the distribution of devices with reviews across categories and macro-categories.
Fig. 8.
Distribution of device counts per category.
Fig. 9.
Distribution of device counts per macro-category.
Fig. 10.
Distribution of distinct brand counts per category.
Fig. 11.
Distribution of distinct brand counts per macro-category.
Fig. 12.
Distribution of device counts per category across platforms.
Fig. 13.
Distribution of device counts per macro-category across platforms.
Fig. 14.
Distribution of distinct brand counts per category across platforms.
Fig. 15.
Distribution of distinct brand counts per macro-category across platforms.
Fig. 16.
Distribution of price, rating and number of reviews per category across platforms.
Fig. 18.
Distribution of price, rating and number of reviews per macro-category across platforms.
Fig. 17.
Distribution of number of devices with reviews per category across platforms.
Fig. 19.
Distribution of number of devices with reviews per macro-category across platforms.
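Cross-tabulations of the kind shown in these figures could be computed as follows. This is a sketch over hypothetical rows; it is not the code used to produce the figures.

```python
import pandas as pd

# Hypothetical rows that reuse SMASH column names.
df = pd.DataFrame({
    "platform": ["amazon", "amazon", "mediamarkt", "pccomponentes"],
    "Macro-Category-EN": [
        "Smart Lighting", "Smart Security", "Smart Lighting", "Home Appliances",
    ],
    "price": [19.99, 49.0, 25.0, 299.0],
})

# Device counts per macro-category (rows) and platform (columns).
counts = pd.crosstab(df["Macro-Category-EN"], df["platform"])

# Median price per macro-category and platform; empty cells are NaN.
median_price = df.pivot_table(
    index="Macro-Category-EN", columns="platform", values="price", aggfunc="median"
)
print(counts)
print(median_price)
```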
4. Experimental Design, Materials and Methods
In order to acquire the data, we carried out the following steps:
- We surveyed the existing e-commerce platforms that sell smart home devices in Spain.
- We studied each platform on a case-by-case basis to determine whether it had one or more dedicated categories for smart home products, regardless of the name of the section. We discarded platforms that had no such categories.
- We analyzed the included platforms to ensure that these sections contained only devices considered to be smart home devices, and discarded those that mixed smart home devices with other products.
This process concluded with the selection of four platforms: Amazon Spain, El Corte Inglés, PC Componentes, and MediaMarkt Spain. To collect the data, we analyzed each platform individually. We also studied the ‘robots.txt’ file of each platform to identify the web pages related to the smart home sections that were permitted for automated processing. We then created a separate Python script for each platform, given the unique HTML structure of each one. A high-level description of the whole procedure for each platform is provided in Table 6.
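A ‘robots.txt’ check of the kind described above can be automated with Python's standard library. The sketch below is illustrative only: the rules, the `example.com` URLs and the user agent name are hypothetical, and the authors' actual review process is not described in this level of detail.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; each real platform publishes its own rules.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Disallow: /smart-home/restricted-category/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url, user_agent="research-crawler"):
    """Return True if the parsed rules permit fetching the URL."""
    return parser.can_fetch(user_agent, url)


print(allowed("https://example.com/smart-home/plugs/"))
print(allowed("https://example.com/smart-home/restricted-category/item-1"))
```

With a live site, `set_url()` followed by `read()` would load the platform's published file instead of the inline text.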
Table 6.
Step-by-step data obtaining process for each of the platforms.
| Phase | Step | Action | Applies to |
|---|---|---|---|
| Manual | 1 | List of smart home categories is obtained and stored in JSON format | All platforms |
| | 2 | HTML structure of the pages of the product within the relevant category/ies is analyzed to identify elements and data to be collected | All platforms |
| Automated using customized script | 3 | For each platform, a function in the script is developed to navigate through all the product pages for each category | All platforms |
| | 4 | Access is limited to page 10 for each category | Amazon |
| | 5 | Using Selenium and BeautifulSoup libraries, a different function extracts the product’s data for each of the categories identified in step 1 | All platforms |
| | 6 | Function that scrolls the page to access all the necessary elements | Amazon, MediaMarkt, El Corte Inglés |
| | 7 | Scrolling function includes a click function developed to access all the necessary elements | El Corte Inglés |
| | 8 | Text normalization function to clean up special characters from some data fields | All platforms |
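The text normalization in step 8 is not specified further. One plausible form, consistent with the accent-free Spanish category names in Table 4, is sketched below. The `normalize_text` function and its exact rules are assumptions, not the authors' implementation. Note that this approach also folds ‘ñ’ into ‘n’, which may not always be desirable.

```python
import re
import unicodedata


def normalize_text(value):
    """Strip accents and special characters, collapse spaces, and lowercase."""
    # Decompose accented characters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    # Drop the combining marks (accents, tildes).
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Replace anything that is not a word character, whitespace or hyphen.
    cleaned = re.sub(r"[^\w\s-]", " ", without_marks)
    # Collapse repeated whitespace and lowercase the result.
    return re.sub(r"\s+", " ", cleaned).strip().lower()


print(normalize_text("Cámaras de Vigilancia"))
print(normalize_text("Iluminación  inteligente!"))
```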
Limitations
While our initial goal was to create a more comprehensive dataset covering all popular e-commerce platforms, the practical limits of automating data extraction led us to select only four platforms. Additionally, bias may arise within the dataset because the platforms themselves sometimes place devices in non-corresponding categories, and because similarly named categories can contain different sets of devices on different platforms. Moreover, the model field mostly contains a code rather than the model name itself, which is typically embedded within the name field. We attribute this practice to easier querying of product specifications on the platform. While beneficial for product querying on the platforms, it is not machine-ready. To further enhance this dataset, models should be extracted from the name field, where the platforms include the most valuable information for analysis.
Ethics Statement
We confirm that we have read and followed the ethical requirements for publication in Data in Brief and confirm that the current work does not involve human subjects, animal experiments, or any data collected from social media platforms.
Terms of Service (ToS): Web crawling is crucial for web functionality, generating backlinks that enhance a site's visibility. Consequently, companies typically permit crawling, with guidelines set in the ‘robots.txt’ file. During web scraping, we consulted this file to determine which sections could be accessed and complied with its directives. Only one category in MediaMarkt was listed in the ‘robots.txt’ file; for that category, we extracted the information manually.
Additionally, we have collected the data and elaborated the dataset supported by Articles 3 and 4 of the DIRECTIVE (EU) 2019/790 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL of 17 April 2019 on copyright and related rights in the Digital Single Market and amending Directives 96/9/EC and 2001/29/EC.
Copyright: The dataset only contains publicly published data about smart home products sold by the considered platforms. Neither data from social media nor news outlets are included.
Privacy: No personally identifiable information was scraped or included in the dataset. Anonymization is not relevant to this dataset.
Scraping Policies: Web scraping (text and data mining) for the purpose of scientific research is explicitly allowed within the European Union by DIRECTIVE (EU) 2019/790 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL of 17 April 2019 on copyright and related rights in the Digital Single Market and amending Directives 96/9/EC and 2001/29/EC.
Furthermore, the web scraping process was conducted in accordance with the limits established by each platform's websites, e.g. adhering to allowed access times and request rates, and ensuring that the services provided by the platforms were not disrupted.
Credit Author Statement
Jhovany Quintana-Vera: Methodology, Software, Validation, Formal Analysis, Investigation, Resources, Data Curation, Writing – Original Draft; Ana I. González-Tablas: Conceptualization, Methodology, Validation, Data Curation, Writing - Review & Editing, Visualization, Supervision, Project Administration, Funding acquisition; Mohammed Rashed: Conceptualization, Methodology, Validation, Resources, Writing - Original Draft, Data Curation, Writing - Review & Editing, Visualization, Supervision.
Acknowledgements
Jhovany Quintana-Vera’s work is supported by a UC3M Full Scholarship to the Master’s Study for the Master in Computer Science and Technology of the University Carlos III de Madrid.
Mohammed Rashed’s work is supported by the Recualificación-Margarita Salas grant (call of Universidad Carlos III Madrid) financed by the Ministerio de Ciencia, Innovación y Universidades and the European Union-Next Generation EU.
Ana I. González-Tablas’ work has been supported by the European Defence Industrial Development Programme (EDIDP) under grant agreement No EDIDP-CSAMN-SSC-2019–022-ECYSAP (European Cyber Situational Awareness Platform), and by the Ministerio de Ciencia, Innovación y Universidades under Grant No TIN2016-79095-C2-2-R (SMOG).
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Contributor Information
Jhovany Quintana-Vera, Email: jhquinta@inf.uc3m.es.
Ana I. González-Tablas, Email: aigonzal@inf.uc3m.es.
Mohammed Rashed, Email: mrashed@inf.uc3m.es.
Data Availability
e-cienciadatos: Spanish MArket's Smart Home devices (SMASH) (Original data).
References
- 1. Amazon Spain. https://www.amazon.es. Last access, December 2025.
- 2. MediaMarkt Spain. https://www.mediamarkt.es. Last access, December 2025.
- 3. El Corte Inglés. https://www.elcorteingles.es. Last access, December 2025.
- 4. PC Componentes. https://www.pccomponentes.com/. Last access, December 2025.
- 5. Datafiniti, “Electronic Products and Pricing Data”, data.world. URL: https://data.world/datafiniti/electronic-products-and-pricing-data. Last access, December 2025.
- 6. Smart home - worldwide. https://www.statista.com/outlook/dmo/smart-home/worldwide. Last access, October 2023.
- 7. Quintana Vera, Jhovany Antonio; González-Tablas, Ana Isabel; Rashed, Mohammed Ahmed Fahim, 2025, “Spanish MArket's Smart Home devices (SMASH)”, doi: 10.21950/JTIQQ5, e-cienciaDatos, V2. Accessibility: public; license: CC-BY-NC-4.0-ES.