Data storage and query optimization for Blockchain-based agricultural supply chains using storage light nodes

Mei Sun; Na Luo; Xing Bin; Feng Chen; Qingbo Liu; Xiaohui Liu; Chuanheng Sun

doi:10.1038/s41598-025-17258-w

2025 Sep 24;15:32753. doi: 10.1038/s41598-025-17258-w


PMCID: PMC12460803 PMID: 40993185

Abstract

The agricultural product supply chain comprises numerous links, generates a substantial amount of data, and is susceptible to high data loss rates, posing significant challenges to data management and traceability. While ensuring data integrity, the high redundancy storage of blockchain increases resource consumption and limits the participation of resource-constrained nodes. To address this gap, we propose a Storage Light Node (SLN) model for the characteristics of the agricultural supply chain. This model integrates a cold/hot data classification mechanism based on identity relevance, generation time, and query frequency, and realizes the selective local storage of high-priority data. A query optimization strategy based on Bloom filters was designed, which accelerated the speed of data retrieval. Experiments using 50,563 real records from the agricultural product supply chain show that compared with the full node, SLN reduces the storage usage by 95.10%, and the average query time is 30.91 ms, which is much faster than the traditional light node. This model provides a scalable and efficient solution for blockchain-based agricultural traceability.

Keywords: Light node, Blockchain, Agricultural product supply chain, Storage model, Query optimization

Subject terms: Engineering, Mathematics and computing

Introduction

Agricultural products constitute an essential foundation for human survival and development, and their food safety has a direct impact on public health and well-being. Agricultural supply chain management is particularly complex and challenging due to the limited shelf life of products, highly volatile market demand and price sensitivity¹. The supply chain operates under a unique circulation model, characterized by vast amounts of data and frequent data interactions². The agricultural supply chain involves many stakeholders and requires an efficient and reliable system to ensure data security and transparency³. In recent years, with the maturation and development of smart agricultural technologies, devices such as wireless sensors⁴, agricultural robots⁵, and unmanned aerial vehicles⁶ have been utilized to monitor and upload relevant data in real time. Combined with technologies like artificial intelligence⁷, the Internet of Things (IoT)⁸, and cloud computing⁹, automation has been achieved across various supply chain stages, significantly reducing labor costs. Due to the large number of participating enterprises in various stages of the supply chain, each of which operates independently yet is closely interconnected, any change in one stage can impact other stages, with the effects amplifying over time. The reliability of the data is contingent upon the reliability of the system’s transactions¹⁰. Although traditional centralized database systems are still widely adopted, they exhibit low data exchange efficiency among enterprises and are vulnerable to data tampering¹¹. They also face challenges such as low query efficiency and difficulty scaling traceability data¹². With decentralized and tamper-resistant features, blockchain technology provides a trustworthy traceability solution for supply chains¹³. However, blockchain networks themselves face storage bottlenecks. 
By October 2, 2024, the storage requirements for full nodes had reached approximately 600 GB for Bitcoin and over 16 TB for Ethereum, highlighting the growing burden on blockchain participants¹⁴. This indicates that in the current blockchain networks, supply chain participating enterprises with limited storage, bandwidth, and other resources face significant storage capacity challenges when joining the network as full nodes. This data-driven demand has led to an increasing need for scalability in storing agricultural product traceability data.

Numerous studies, both domestic and international, have explored diverse storage models to enhance the scalability of blockchain technology¹⁵⁻²¹. Current research primarily focuses on both on-chain and off-chain storage approaches. Off-chain storage involves the utilization of external storage systems such as the Distributed Hash Table and the InterPlanetary File System. On-chain approaches commonly utilize cooperative mechanisms, sharding strategies, and light node storage models to optimize storage efficiency. Feng et al.²² proposed a blockchain storage architecture grounded in information-centric networks. The architecture introduces a virtual chain index and incorporates a collaborative block replica deletion algorithm designed to reduce on-chain storage overhead and enhance replica utilization efficiency. Nevertheless, this approach leads to longer data access times compared to fully redundant storage schemes. Sun et al.²³ proposed a rice-origin traceability model leveraging blockchain and edge computing, where embedded devices were used to implement a multi-chain architecture. This approach achieved an 18% reduction in storage consumption compared to conventional single-chain models, while also reducing data query latency. Zhang et al.²⁴ proposed a multi-level indexing method for master-slave blockchains, which significantly enhances query efficiency. However, constructing the index requires a considerable amount of time, which directly affects the overall query performance of the master-slave blockchain system. Liu et al.²⁵ proposed a collaborative “on-chain + off-chain” storage method, a data compression approach that addresses key challenges such as high storage pressure, low query efficiency, and information explosion. This study offers valuable guidance for the development of other traceability systems.

In conventional blockchain storage architectures, full-node full-redundancy remains a widely adopted approach, requiring each full node to download and maintain a complete copy of the distributed ledger. According to the Pareto principle, in blockchain networks such as Bitcoin and Ethereum²⁶, approximately 20% of transaction data accounts for the majority of query activity. This suggests that storing all redundant data on every blockchain node is unnecessary. To alleviate storage pressure, Ethereum introduced the concept of light nodes, which optimize resource usage by retaining only block header information. Block headers encapsulate critical metadata for verifying blockchain states and querying executed transactions²⁷. Zhao et al.²⁸ proposed the Enhanced Simplified Payment Verification (ESPV) model, which is built upon the Unspent Transaction Output (UTXO) framework. In this model, historical block data is stored in shards, while recent block data is maintained with full redundancy. This is a form of data pruning. The enhanced light node supports mining functionality, thereby offering capabilities comparable to those of a full node. Although this storage approach alleviates storage pressure, the UTXO model is primarily designed for simple value transfers and lacks support for complex smart contract logic. As a result, it struggles to meet the traceability requirements of agricultural products with multiple attributes and dynamic states, making it unsuitable for agricultural product supply chains. Furthermore, when light nodes receive new requests, they depend on full nodes to provide Merkle proofs for transaction verification. This dependency has been shown to undermine the decentralized nature of blockchain networks and degrade query efficiency²⁹.

In summary, agricultural product supply chains continuously generate large volumes of traceability data across multiple stages. The use of full nodes with complete data redundancy significantly increases the storage burden on individual blockchain nodes. As the ledger size grows, query efficiency gradually declines. While light nodes offer a viable solution for devices with limited resources, research on enhancing storage and query performance through improved light node mechanisms remains scarce, particularly in the context of agricultural product traceability. Owing to the inherent biological characteristics of agricultural products, traceability data exhibits strong periodicity, with substantial variations in access frequency across different stages of the supply chain. Therefore, there is an urgent need to design an improved light node model that retains the low resource consumption of traditional light nodes while incorporating the high query efficiency of full nodes, to meet the requirements of efficient and reliable traceability in agricultural supply chains.

Table 1 presents a comparative analysis of representative blockchain-based storage and query optimization methods alongside the approach proposed in this study, within the context of supply chain applications. The evaluation covers several key dimensions, including storage burden reduction, query efficiency, cold and hot data management, smart contract compatibility, and system scalability. The analysis reveals that most existing methods focus on optimizing a single performance metric, making it difficult to achieve comprehensive improvements across multiple dimensions. In contrast, the SLN model proposed in this study not only significantly reduces storage pressure but also incorporates a dynamic cold/hot data storage mechanism and a Bloom filter-based query optimization strategy. This approach provides a scalable and efficient solution for applying blockchain technology to agricultural product supply chains.

Table 1.

Comparative analysis of blockchain storage and query optimization methods for supply chain systems.

Method	Storage Reduction	Query Efficiency	Cold/Hot Data Management	Smart Contract Compatibility	Scalability
Feng et al.²² (ICN-based)	Moderate	Low	No	Yes	Moderate
Sun et al.²³ (Edge Multi-Chain)	High (↓82%)	High	No	Yes	Limited (multi-chain overhead)
Zhang et al.²⁴ (Indexing)	Low	High	No	Yes	Moderate
Liu et al.²⁵ (Hybrid On/Off)	High	Moderate	No	Yes	Low (off-chain trust issues)
Zhao et al.²⁸ (ESPV)	Moderate	Low	No	No (UTXO-limited)	High
Proposed SLN (This Study)	High (↓95.1%)	High	Yes (Dynamic weighting)	Yes	High


Contribution

In order to address these challenges, this paper proposes a scalable blockchain storage model that has been tailored to agricultural supply chains.

We propose a Storage Light Node (SLN) architecture that bridges the gap between traditional full nodes and light nodes. SLNs retain the data integrity guarantees of full nodes while adopting the lightweight characteristics of light nodes by storing only a selective subset of blockchain data. This design optimizes both resource consumption and storage efficiency in blockchain systems.
A cold/hot data classification model is proposed based on the characteristics of data in the agricultural product supply chain. After data is uploaded to the blockchain, it is categorized according to heat thresholds determined by factors such as supply chain relevance. Subsequently, cold and hot data are selectively stored. A storage and cooling algorithm dynamically adjusts the heat value of the data based on its access frequency, thereby ensuring data timeliness and storage efficiency.
We propose a Bloom filter-based query mechanism constructed using batch number identifiers. This method enables efficient local data lookup on SLNs, reducing reliance on full nodes for query resolution and significantly improving overall query performance in real-world supply chain scenarios.

Chapter arrangement

The remainder of this paper is organized as follows: Sect. 2 presents an overview of the cold and hot data storage model proposed in this study. In Sect. 3, the network storage model based on storage light nodes is introduced. Section 4 provides a detailed discussion of data query optimization techniques. Section 5 reports the experimental analysis conducted to evaluate the proposed methods. In Sect. 6, a summary and outlook for this research are provided.

Cold/Hot data storage model

Analysis of agricultural supply chain information

Agricultural supply chain key information

The agricultural supply chain encompasses multiple complex stages, each requiring the collection and uploading of large volumes of traceability data. Most agricultural supply chains can be divided into six distinct phases. Taking the wheat supply chain as an example, Table 2 summarizes the key traceability information across these six stages: planting, harvesting, processing, warehousing, logistics, and sales.

Table 2.

Key information in agricultural product supply chains.

Agricultural supply chain stage	Key Information
Planting	Basic information of growers; planting method; contact information; name of planting base; address of the planting site; seed source; planting time; seedling nursery time; irrigation and drainage time; information on diseases and treatment; harvesting time; information on natural disasters; man-made disasters; planting operation; information on fertilizers and records of their use; operators; information on pesticides and records of their use; cost of planting; harvested quantity; shipped quantity; harvest data; planting yield; product quality
Harvesting	Grower information; pesticide sampling records at the time of purchase; commodity origin information; time of purchase; commodity species; place of purchase; information on storage enterprises; purchase price; cost of drying, de-killing, warehousing, etc.; selling price; impurity content; de-impurity rate; inventory number; product source; quality inspection number; time of release; information on storage personnel
Processing	Processing method; equipment information; processing cost; factory information; processing procedures; processing-related personnel; processing specifications information; real-time environmental temperature; real-time environmental humidity; processing price; trademark name; packaging source, raw materials; processing personnel information; packaging time; product packaging number; product batch number; product quality information
Warehousing	Warehousing enterprise name; warehousing enterprise address; enterprise legal person information; warehousing administrator information; license information; administrator contact information; inventory number; product source; product quantity; incoming time; quality inspection number; outgoing time; warehousing cost; warehousing price
Logistics	Name of the logistics company; address of the logistics company; information on the person in charge of transport; information on licenses; contact details of the person in charge; means of transport; the number of the means (license plate number); place of departure; time of departure; destination; cities and destinations passed through; time of arrival; temperature and hygiene inside the transport vehicle; flow of logistics; cost of logistics; logistics price;
Sales	Name of the merchant; address of the shop; information on the person in charge of the shop; information on the business license; contact information of the merchant; name of the product; quantity of the product; time of purchase; time of storage of the product; product storage and purchase number; time of shipment; basic information and hygiene and health conditions; photos of the sales environment; purchase price; sales price;


Agricultural supply chain information analysis

In the agricultural product supply chain, the data of participating enterprises varies in access priority across different periods and stages, and this priority changes over time. According to a paper analyzing Bitcoin data³⁰, the frequency of blockchain users accessing block data is closely related to the block generation time. Over 80% of user access is directed towards blocks generated within a single day, with the probability of accessing blocks generated earlier decreasing rapidly. This phenomenon is more pronounced in transactional blockchains. For enterprises involved in the agricultural product supply chain, there is a large volume of data access during the initial stages. Newly generated block data quickly reaches its peak access probability when it first appears, but as time passes and the data decays, it eventually enters a declining phase³¹. Following the principle of Newton’s Law of Cooling³², as new data is generated and the supply chain progresses, old data gradually loses its relevance.

Analyzing data queries in the agricultural product supply chain, it can be concluded that the data with the highest access frequency is stored in a small number of newly generated blocks, showing higher real-time relevance. For the participating enterprises, the query frequency of a product batch is closely related to the operations at each stage. For example, in the planting stage of the wheat supply chain, operations such as fertilization, weeding, and pest control require querying information on planting and other farming activities. Therefore, during this stage, the frequency of data access is closely correlated with the number of operations performed. If an operation is queried too often over time, there may be a problem with the batch, and a product recall may be involved³³. Once all operations at this stage are completed, the data access frequency and data heat will also decline if no further queries are made in the subsequent stage³⁴. In other words, the total number of queries for a block and the number of access requests per unit of time represent its historical query frequency.

The data accessed by participating enterprises in the supply chain is called the access set. In practical applications, each enterprise’s access set primarily consists of data relevant to their specific stage, as this data directly influences their operations and decision-making³⁵. For instance, historical data such as crop yield, weather, and fertilizer usage are often used to improve agricultural productivity and decision-making³⁶. A smaller portion of the access set consists of data that has a significant impact on the current stage, such as data related to upstream or downstream activities or other stages. Considering the characteristics of agricultural product supply chain traceability, data from each stage can be classified and stored based on the identity of the participating enterprises (such as growers, processors, etc.). For example, as a grower, the participating enterprise typically gathers critical data from IoT devices, such as temperature, humidity, wind speed, and light intensity, to plan agricultural activities and monitor crop growth. The storage stage primarily focuses on information related to inventory levels. Based on the above analysis, the heat weight level of the data is determined by factors such as the relevance of the participating enterprise’s role identity, the block data generation time, and the historical access frequency of the data. Each participating enterprise can store and manage data highly relevant to their specific stage, which is then incorporated into the blockchain as SLN. Through the data partitioning model proposed in this paper, data query efficiency can be significantly enhanced while simultaneously reducing storage pressure.

To ensure that heat-based storage decisions are well-founded, this paper employs various statistical analyses and modeling methods. Specifically, descriptive statistical analysis (e.g., access frequency and mean) is used to study the distribution of block queries. In addition, a heat model based on Newton’s law of cooling is constructed, focusing on three factors: identity correlation, time decay, and historical access frequency, to develop a weighted heat assessment model that reflects the decay pattern of the data heat index. Finally, the Zipf distribution is introduced to simulate the non-uniformity in data access.

Cold/Hot data model classification

The data is assigned weights, and its cold, warm, and hot levels are determined based on factors such as the participating enterprise’s identity relevance in the supply chain, the block generation time, and the historical frequency of data access. For each batch of data, a weight $W = \omega_1 T_{Association} + \omega_2 T_{time} + \omega_3 T_{frequency}$ is computed, where $T_{Association}$ represents the heat associated with the identity correlation of data to supply chain stages, $T_{time}$ represents the heat variation over time, and $T_{frequency}$ represents the heat based on access frequency on the blockchain. In this model, $(\omega_1, \omega_2, \omega_3)$ represents the weight coefficients of the data characteristics, with $\omega_1 + \omega_2 + \omega_3 = 1$ and $\omega_1, \omega_2, \omega_3 \in [0,1]$. Through data analysis, it is observed that in supply chains, $\omega_1$, $\omega_2$, and $\omega_3$ typically take average values³⁷.
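The weighted combination described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function name `heat_weight` is hypothetical, and the equal default coefficients are only one reading of "average values".

```python
from math import isclose


def heat_weight(t_association, t_time, t_frequency,
                weights=(1 / 3, 1 / 3, 1 / 3)):
    """Combine the three heat components into a single weight W.

    `weights` holds the three coefficients. Equal values are an assumed
    default; each coefficient must lie in [0, 1] and they must sum to 1.
    """
    if not all(0.0 <= w <= 1.0 for w in weights):
        raise ValueError("each coefficient must lie in [0, 1]")
    if not isclose(sum(weights), 1.0):
        raise ValueError("coefficients must sum to 1")
    w_assoc, w_time, w_freq = weights
    return w_assoc * t_association + w_time * t_time + w_freq * t_frequency
```

With unequal coefficients, for example `heat_weight(1.0, 0.5, 0.0, weights=(0.5, 0.3, 0.2))`, identity relevance dominates the score, which may suit enterprises that rarely query other stages.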

In practical applications, the data access sets in agricultural supply chains are mostly concentrated within the respective stages, while the likelihood of accessing data from other stages is relatively low. Therefore, data classification by heat levels is based on the identity of the supply chain stages. The heat value $T_{Association}$ is assigned based on the participating enterprise’s identity in each stage and is divided into two levels: data associated with the participating enterprise’s stage, which is considered more critical, is assigned a heat level T₁, while data from other stages, with a lower access probability, is assigned T₀, where T₁ > T₀. Mathematically:

$$T_{Association} = \begin{cases} T_1, & \text{the data belongs to the participating enterprise's own stage} \\ T_0, & \text{the data belongs to another stage} \end{cases}$$

$T_{time}$ is a variable that changes dynamically with the time elapsed since the data was generated. It is the product of a decay coefficient and an initial heat value:

$$T_{time} = \alpha \cdot t_1$$

where α represents the coefficient of heat decay over time, and t₁ is the initial heat value of the product. Let $t$ denote the length of time for which the data has existed and $S$ the product’s shelf life, measured in days; the value of α then depends on how many quarters of $S$ have elapsed at time $t$.

According to the analysis of agricultural product supply chain query characteristics, within the product’s shelf life, if the data generation time falls within the first quarter of the shelf life cycle, the initial value of $T_{time}$ is set to α·t₁. As time progresses, for every additional quarter of the cycle, the value of α is halved, forming a decay trend. The number of elapsed quarters is obtained by rounding up with the ceiling function $\lceil \cdot \rceil$. Once the product has exceeded its specified shelf life by six months, in compliance with food safety regulations, its traceability data can be deleted, and $T_{time}$ is set to 0, reflecting the very low likelihood that such data will be queried.
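The quarter-by-quarter halving can be made concrete with a short sketch. It encodes one reading of the description, not a confirmed formula: the quarter index is taken as the ceiling of four times the data age divided by the shelf life, halving is assumed to continue past the shelf life until deletion, and six months is approximated as 180 days. The function and parameter names are hypothetical.

```python
import math


def time_heat(age_days, shelf_life_days, alpha, t1, retention_days=180):
    """Time-based heat under an assumed reading of the halving rule.

    alpha: decay coefficient that applies during the first quarter.
    t1: initial heat value of the product.
    retention_days: period after the shelf life during which data is kept
        (six months, approximated here as 180 days).
    """
    if shelf_life_days <= 0:
        raise ValueError("shelf life must be positive")
    # Data older than shelf life plus the retention period may be deleted.
    if age_days > shelf_life_days + retention_days:
        return 0.0
    # Quarter index: 1 for the first quarter, 2 for the second, and so on.
    quarter = max(1, math.ceil(4 * age_days / shelf_life_days))
    # alpha is halved once for every quarter after the first.
    return alpha * t1 / 2 ** (quarter - 1)
```

For a product with a 40-day shelf life, `alpha = 0.8`, and `t1 = 10`, data that is 10 days old scores about 8.0, while data that is 11 days old has entered the second quarter and scores about 4.0.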

The historical access frequency of data indicates that the higher the frequency of data access within a given period, the greater the data’s heat level. Even if the total query count of a specific block is relatively low, it should still be considered “hot data” if it has been queried multiple times recently. This is because queries occurring close to the current time signify a greater demand in the near term.

Assume that the historical query records of the data are $\{q_1, q_2, \ldots, q_n\}$, where each $q_i$ corresponds to a query timestamp, and that the current time is $T$. $T_{frequency}$ is then built from two functions, $f(n)$ and $g(T - q_i)$.

$f(n)$ represents a function related to queries, which can typically be directly represented by the number of queries n.

$g(T - q_i)$ represents the time weight function, which assigns higher weights to queries that are closer to the current time. It can be expressed using an exponential decay function as follows:

$$g(T - q_i) = e^{-\lambda (T - q_i)}$$

where λ represents the decay factor, which determines the speed at which time influences the weight. If the agricultural product supply chain cycle is short and data expires quickly, λ is typically between 0.1 and 0.5. If the agricultural product supply chain cycle is longer and data does not expire easily, λ is typically between 0.01 and 0.1. The term $T - q_i$ represents the time elapsed since the query occurred. When the query time is very close to the current time T, the weight approaches 1, meaning that these queries have a significant impact on the data’s heat. As the query time moves further from the current time, the weight decays exponentially, reflecting the decreasing impact of older queries on the data’s heat.
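A small sketch shows why a block with few but recent queries can outrank a block with many old ones. How the query count and the time weights are combined is an assumption here: the sketch simply sums one exponentially decayed weight per query, which reduces to the plain query count when the decay factor is 0. The name `frequency_heat` and the use of days as the time unit are illustrative choices.

```python
import math


def frequency_heat(query_times, now, decay=0.1):
    """Sum of exponentially decayed weights over past queries.

    Each query at time q contributes exp(-decay * (now - q)), so a query
    made at the current time contributes 1 and older queries contribute
    less. Timestamps later than `now` are ignored.
    """
    return sum(math.exp(-decay * (now - q)) for q in query_times if q <= now)


# Three queries today versus ten queries sixty days ago.
recent_block = frequency_heat([100, 100, 100], now=100, decay=0.1)
old_block = frequency_heat([40] * 10, now=100, decay=0.1)
```

Here `recent_block` evaluates to 3.0 while `old_block` is far below 1, so the recently queried block would be treated as hotter despite its lower total query count.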

The data heat level W is determined by the following rules:

If the heat level is greater than or equal to the first threshold, i.e., W ≥ W₁, the data is classified as hot data (hot).
If the heat level is less than the first threshold but greater than the second threshold, i.e., W₂ < W < W₁, the data is classified as warm data (warm).
If the heat level is less than or equal to the second threshold, i.e., W ≤ W₂, the data is classified as cold data (cold).

To determine the primary threshold W₁ that distinguishes hot data from the rest, a target proportion $R_{target} \in (0,1)$ is defined first, representing the desired share of hot data relative to the total dataset size N. The dataset is then sorted in descending order according to the computed $W_i$ values. The threshold W₁ is chosen such that the top-ranked records account for approximately $R_{target} \cdot N$ entries:

$$\left|\{\, i : W_i \ge W_1 \,\}\right| \approx R_{target} \cdot N$$

Similarly, to determine the secondary threshold W₂ that distinguishes warm data from cold data, a cumulative coverage ratio $R_{warm}$ (where $R_{warm} > R_{target}$) may be defined. The threshold W₂ is determined such that:

$$\left|\{\, i : W_i > W_2 \,\}\right| \approx R_{warm} \cdot N$$

This quantile-based thresholding method ensures that hot, warm, and cold data are classified according to the actual distribution of heat scores in the dataset and can be adjusted flexibly based on storage constraints and system performance goals.
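The quantile-based procedure can be sketched as follows. This is an illustrative reading, not the paper's algorithm: rounding the target counts up with a ceiling and the helper names `quantile_thresholds` and `classify` are assumptions, and tied heat scores can make the realized proportions deviate from the targets.

```python
import math


def quantile_thresholds(weights, r_target, r_warm):
    """Return (w1, w2) from heat scores and two coverage ratios.

    Roughly r_target * N records end up hot, and roughly r_warm * N
    records end up hot or warm.
    """
    if not weights:
        raise ValueError("at least one heat score is required")
    if not 0 < r_target < r_warm < 1:
        raise ValueError("need 0 < r_target < r_warm < 1")
    ranked = sorted(weights, reverse=True)
    n = len(ranked)
    k_hot = max(1, math.ceil(r_target * n))
    k_warm = max(k_hot, math.ceil(r_warm * n))
    w1 = ranked[k_hot - 1]  # lowest score still classified as hot
    # Highest score classified as cold; if nothing is left, no record is cold.
    w2 = ranked[k_warm] if k_warm < n else -math.inf
    return w1, w2


def classify(w, w1, w2):
    """Apply the hot / warm / cold rule to a single heat score."""
    if w >= w1:
        return "hot"
    if w > w2:
        return "warm"
    return "cold"


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
w1, w2 = quantile_thresholds(scores, r_target=0.25, r_warm=0.5)
labels = [classify(s, w1, w2) for s in scores]
```

With eight records, a 25% hot target, and a 50% cumulative warm coverage, the two highest scores are labelled hot, the next two warm, and the remaining four cold.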

Network storage model based on storage light nodes

Data storage network model

The blockchain storage model includes three distinct types of nodes: full nodes, storage light nodes, and light nodes. These nodes are designed to accommodate varied operational scenarios.

Full Nodes (FNs): Store the entire blockchain data, providing the highest level of data security and integrity. They are suitable for high-performance, large server devices.
Storage Light Nodes (SLNs): Store a subset of block data and are suitable for medium to small server devices or resource-constrained devices.
Light Nodes (LNs): Store only block header information and support simple payment verification, making them suitable for devices with extremely limited resources, such as embedded gateways and mobile devices.

When a queried node does not hold the requested transaction data locally, it can send a request to other nodes in the blockchain. The data is requested and validated by calling smart contracts, and the node that receives the request returns the requested data to the requesting node.
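The local-first lookup with a fallback to other nodes can be sketched as below. Everything here is hypothetical scaffolding: the `Node` class, keying records by batch number, and the `validate` callback (a stand-in for the validation performed through smart contracts) are assumptions made for illustration only.

```python
class Node:
    """Hypothetical node holding a local subset of records keyed by batch number."""

    def __init__(self, name, local_data=None):
        self.name = name
        self.local_data = dict(local_data or {})
        self.peers = []

    def query(self, batch_no, validate=lambda record: True):
        """Return (record, source_name), trying local storage before peers."""
        if batch_no in self.local_data:
            return self.local_data[batch_no], self.name
        for peer in self.peers:
            record = peer.local_data.get(batch_no)
            # Accept a peer's answer only if it passes validation.
            if record is not None and validate(record):
                return record, peer.name
        return None, None


full_node = Node("FN", {"B1": "planting record", "B2": "logistics record"})
storage_light_node = Node("SLN", {"B1": "planting record"})
storage_light_node.peers = [full_node]
```

Calling `storage_light_node.query("B1")` is answered locally, while `storage_light_node.query("B2")` falls through to the full node.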

In the interest of simplicity, the following definitions are proposed: within the context of the entire blockchain network, full nodes are defined as FN, storage light nodes as SLN, and light nodes as LN. The collection of blocks is defined as: Blocks={B_i∣ i∈N,1 ≤ i ≤ n} where B_i represents the i-th block, and n is the total number of blocks. H_{head i} and H_{body i} represent the block header and block body content of block B_i, respectively.

The set of nodes is defined as: Nodes = FN∪SLN∪LN.

Following the division of the block body content according to heat thresholds, block content with higher data heat is defined as H_hot, block content with lower data heat is defined as H_cold, and block content with intermediate heat is defined as H_warm. That is to say,

H_body(i)→{“hot”, “warm”, “cold”}.

The storage content of full nodes is:

FN≡[(H_{head i}, H_{body i})∣i∈N,1 ≤ i ≤ n]

The storage content of storage light nodes is:

SLN≡[(H_{head i}, H_{body j})∣i∈N,1 ≤ i ≤ n, H_{body j}∈{“hot”,“warm”}]

The storage content of light nodes is:

LN≡[(H_{head i})∣i∈N,1 ≤ i ≤ n]
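The three storage definitions can be read as three views over the same list of blocks. The sketch below is a minimal illustration under assumed names (`Block`, `full_node_view`, `storage_light_node_view`, `light_node_view`); representing a body that is not stored as `None` is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    index: int
    header: str
    body: str
    heat: str  # one of "hot", "warm", "cold"


def full_node_view(blocks):
    """FN: header and body of every block."""
    return [(b.header, b.body) for b in blocks]


def storage_light_node_view(blocks):
    """SLN: every header, plus the body only for hot or warm blocks."""
    return [(b.header, b.body if b.heat in ("hot", "warm") else None)
            for b in blocks]


def light_node_view(blocks):
    """LN: headers only."""
    return [b.header for b in blocks]


chain = [
    Block(1, "h1", "body1", "cold"),
    Block(2, "h2", "body2", "warm"),
    Block(3, "h3", "body3", "hot"),
]
```

For this three-block chain, the SLN view keeps all three headers but only the bodies of blocks 2 and 3.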

Fig. 1 depicts the storage states of the three types of nodes (FN, SLN, and LN) after agricultural product supply chain data is uploaded to the blockchain network. The figure uses three pieces of data as examples to illustrate how the different nodes divide the work of data storage and processing. FN stores complete block data, including block headers and block bodies. It can broadcast, verify, and store data, and it guarantees data integrity for the entire system. LN stores only the block header information, does not save the block body, and has no local verification capability. LN mainly interacts with FN and uses the Merkle tree to verify the existence of transactions, which keeps its operations lightweight. SLN, the intermediate node proposed in this study, extends the functions of LN. In addition to preserving block header information, SLN selectively stores data classified as “hot data” and “warm data” locally, with the decision to store determined by the data heat. The heat of this data is determined by several factors, including the frequency of data access, the generation time, and the relevance of the enterprise identity. The determination of cold/hot data heat levels is shown in Algorithm 1 below, which describes the logic of classifying blocks by data access heat: the heat weight of each block is calculated, and the block is assigned to one of three levels: hot, warm, or cold. The result provides the basis on which SLN develops its storage strategies.

A blockchain network storage data model is constructed for the characteristics of the agricultural supply chain. The model is outlined as follows: when the data of each stage in the agricultural supply chain layer is uploaded to the blockchain layer in real-time through IoT layer devices such as sensors, routers, etc., it is controlled by the smart contract and stored in each node of the blockchain after the consensus mechanism.

The agricultural supply chain layer comprises six primary segments: planting, harvesting, processing, warehousing, logistics, and sales. Planting stage data is collected by agricultural sensors and by data collection and communication equipment. The collection and storage stage involves acquiring crops, removing impurities, and storing them, in order to maintain the supply, quality, type, and inventory of agricultural products. The crop processing stage involves the processing, handling, and packaging of the crop. The storage stage data encompasses basic information about storage enterprises, environmental testing data, transaction records, and price information. The logistics and transportation stage uploads information about logistics enterprises and transportation to the blockchain. The sales stage involves the distribution of the product through intermediaries, i.e., distributors, to the final consumers. The data of the final link is uploaded to the blockchain data layer. The agricultural supply chain stage is illustrated in Fig. 2, which establishes a connection between real-world supply chain segments and the blockchain layer via IoT infrastructure.

Fig. 2 — Agricultural Products Supply Chain Stage.

The Internet of Things (IoT) layer incorporates operational monitoring devices, automation control devices, environmental monitoring sensors, location tracking devices, data transmission and communication devices, and other related components. The IoT facilities, including an array of sensing devices such as temperature sensors, GPS, RFID tags and readers, WiFi, ZigBee, and other technologies, facilitate data transmission. These devices play a pivotal role in the collection, transmission, and processing of data, ensuring efficient operations and transparency within the supply chain.

The blockchain layer includes various components such as data storage, network communication mechanisms, consensus protocols, and smart contracts, covering key technologies like cryptography, timestamps, P2P networks, and consensus mechanisms. The blockchain has built-in smart contracts that are tamper-proof and co-maintained. A smart contract publishes the compiled contract code to the blockchain network through a transaction, thus generating an independent contract address for users to invoke. Transactions are packaged after verification, Merkle trees are constructed to form blocks, which are broadcast across the blockchain network, and after consensus is reached, distributed storage is performed, with underlying non-relational databases like LevelDB.

Data storage process

The conventional procedure for storing blockchain data on-chain relies principally on FNs. This study classifies all nodes in the blockchain network into three categories: FNs, LNs, and SLNs. An FN stores the complete data set and broadcasts and verifies transactions. Nodes with weaker performance can act as SLNs, storing only part of the block data and independently verifying part of the transactions; the proportion stored is determined by each node’s storage capacity and storage willingness. LNs store only the block header information. On a resource-limited device, they can trust one or more FNs to quickly confirm whether a transaction is included in a given block. Allowing such nodes to participate enhances the degree of decentralization of the whole network. After the data validity, heat weight value, and storage capacity are calculated according to the actual situation, the data heat level is determined as shown in Algorithm 1, SLN data storage proceeds as shown in Algorithm 2, and storage under heat cooling changes is handled as shown in Algorithm 3.

Algorithm 1 — Function calTemp(block_data, ω₁, ω₂).

Algorithm 2 — Function SLN store data process(block_data).

Algorithm 3 — Function trigger_mechanism(block_data, will).

Algorithm 1 The calculation is based on the heat weight of the data: the heat value of a block is the highest heat value among the data it contains. Blocks are then classified against the established thresholds. If the heat value exceeds the first threshold, the block is classified as a hot block; if it falls below the second threshold, the block is classified as a cold block; otherwise, it is classified as a warm block.
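The threshold rule of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors’ implementation: the function name `classify_block`, the list of per-record heat values, and the default thresholds (taken from Scheme A in Table 4) are chosen only for the example, and the computation of each record’s heat weight is assumed to have been done beforehand.

```python
from typing import Sequence


def classify_block(record_heats: Sequence[float],
                   first_threshold: float = 0.33,
                   second_threshold: float = 0.23) -> str:
    """Classify a block as 'hot', 'warm', or 'cold' (hypothetical helper)."""
    if not record_heats:
        raise ValueError("a block must contain at least one record")
    # The block's heat is the highest heat value among its records.
    block_heat = max(record_heats)
    if block_heat > first_threshold:
        return "hot"
    if block_heat < second_threshold:
        return "cold"
    # Anything between the two thresholds (inclusive) is warm.
    return "warm"
```

For instance, a block whose records have heat values 0.10 and 0.40 is classified as hot, because its maximum exceeds the first threshold.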

Algorithm 2 When data from various stages is uploaded to the blockchain and the corresponding block is generated, the verified block is broadcast to other nodes via flooding. If an FN receives the data, it stores the block directly. For resource-constrained nodes, such as SLNs, the heat level of the block is evaluated based on the data heat determination rules. If the block is hot, it is stored locally. If the block is cold, it is discarded. For warm blocks, the decision to store them is made based on factors such as storage capacity.
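The storage decision an SLN makes on receiving a block might look like the sketch below. The function name `sln_should_store` and its parameters are hypothetical; in particular, reducing "factors such as storage capacity" to a free-space check plus a willingness flag is an assumption made for illustration.

```python
def sln_should_store(level: str,
                     block_size_mb: float,
                     free_space_mb: float,
                     willing: bool = True) -> bool:
    """Decide whether an SLN keeps a received block (hypothetical helper)."""
    if level == "hot":
        # Hot blocks are always kept; making room for them is left to the
        # replacement step described in Algorithm 3.
        return True
    if level == "cold":
        # Cold blocks are discarded by resource-constrained nodes.
        return False
    if level == "warm":
        # Assumption: a warm block is kept only if the node is willing
        # and the block fits into the remaining space.
        return willing and block_size_mb <= free_space_mb
    raise ValueError(f"unknown heat level: {level!r}")
```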

Algorithm 3 The heat weight value of data changes over time. A polling method calculates data access frequency and heat changes within a time unit. A time-trigger mechanism is employed to periodically update the heat weight. If the SLN storage reaches its capacity limit, data with lower heat values is deleted to achieve data deletion and replacement storage.
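The replacement step of Algorithm 3 can be illustrated with a small eviction routine. This is a sketch only: the name `evict_coldest`, the dictionary layout (block ID mapped to a heat value and a size in MB), and the use of a heap are assumptions for the example, and the periodic heat update itself is not modeled here.

```python
import heapq
from typing import Dict, List, Tuple


def evict_coldest(stored: Dict[str, Tuple[float, float]],
                  capacity_mb: float) -> List[str]:
    """Delete lowest-heat blocks until the stored data fits the capacity.

    `stored` maps block ID -> (heat, size_mb) and is modified in place.
    Returns the IDs of the evicted blocks, coldest first.
    """
    used = sum(size for _, size in stored.values())
    # Min-heap ordered by heat, so the coldest block is popped first.
    heap = [(heat, block_id) for block_id, (heat, _) in stored.items()]
    heapq.heapify(heap)
    evicted: List[str] = []
    while used > capacity_mb and heap:
        _, block_id = heapq.heappop(heap)
        used -= stored[block_id][1]
        del stored[block_id]
        evicted.append(block_id)
    return evicted
```

In practice such a routine would be called from the time trigger after the heat weights have been refreshed, so that eviction always acts on current heat values.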

Data query optimization

In traditional light nodes (LNs), only block header data is retained for basic validation, while full transaction data resides in FNs. Consequently, querying any transaction or state information requires interaction with FNs, resulting in high latency and increased communication costs due to factors such as network topology, bandwidth limitations, and block propagation delay³⁸.

In order to address these issues, the proposed SLN integrates a layered storage model combined with a Bloom filter to optimize query efficiency. The Bloom Filter is a data structure used for quickly determining whether an element exists in a set. In this structure, SLNs selectively store “hot” data records with high query frequency in local storage. The Bloom filter is constructed using the keys of these locally stored records. Specifically, a series of distinct hash functions (e.g., h₁, h₂, …, h_k) are employed to process each data key, thereby setting specific bits in a bit array of length m. This probabilistic data structure allows for fast membership testing with time complexity O(k), where k is the number of hash functions.

During query operations, instead of interacting with FNs, the SLN first checks the Bloom filter to determine whether the requested data is likely in its local hot storage. If the filter returns a positive result (i.e., all k bits are 1), the data is retrieved locally, avoiding remote access. If any bit is 0, the SLN can quickly determine that the data is not stored locally and forward the request to an FN. This mechanism has been demonstrated to substantially reduce unnecessary storage access and external communication, thereby effectively lowering the overall query time.

Moreover, the integration of the Bloom filter with the SLN’s storage architecture supports a scalable and lightweight query process: the hot data is indexed by the filter, while cold data remains on-chain and is accessed only when necessary. In comparison with traditional LNs, which are required to request each query from an FN, the SLN approach has been shown to achieve a substantial reduction in time complexity, from remote O(n) query complexity to local O(k), where k≪n and k is typically constant.
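The filter-first query path described above can be sketched as a small routing function. The names `sln_query`, `might_contain`, `local_store`, and `fetch_from_fn` are hypothetical, and the membership test is passed in as a callable so that any Bloom filter implementation can be plugged in. One step is added as an assumption: because a Bloom filter can return false positives, the sketch falls back to the FN when the filter answers "possibly present" but the record is not actually in local storage.

```python
from typing import Callable, Dict, Tuple


def sln_query(key: str,
              might_contain: Callable[[str], bool],
              local_store: Dict[str, str],
              fetch_from_fn: Callable[[str], str]) -> Tuple[str, str]:
    """Return (record, source), where source is 'local' or 'full_node'."""
    if might_contain(key):
        # The filter says "possibly present": try local hot storage first.
        record = local_store.get(key)
        if record is not None:
            return record, "local"
        # False positive: the key is not stored locally after all.
    # A definite miss (or a false positive) is forwarded to a full node.
    return fetch_from_fn(key), "full_node"
```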

Product data numbering design for agricultural supply chain

To ensure the security of data before uploading, the data from each stage is effectively collected and directly uploaded to the blockchain. For example, after the data collection is completed in the planting stage, the IoT device data will be directly uploaded through communication equipment or centrally uploaded to the blockchain by a dedicated administrator. In this paper, the product batch number ID is structured as “product name + first upload date + batch number + stage number.” For example, “sp20221205010A” represents the data ID for the planting stage of batch 01 of the product uploaded on December 5, 2022. Its final traceability code is the last sales stage number, i.e., “sp20221205010F”. Table 3 shows the design of the product batch number encoding for the agricultural supply chain.

Table 3.

Design of product batch number coding for the agricultural supply chain.

Batch Number	Planting stage	Storage stage	Processing stage	Warehousing stage	Logistics stage	Sales stage
sp2022120501	sp20221205010A	sp20221205010B	sp20221205010C	sp20221205010D	sp20221205010E	sp20221205010F
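The ID layout in Table 3 can be decoded mechanically, as the following sketch suggests. The field widths are inferred from the single example “sp20221205010A” and are therefore assumptions: an alphabetic product name, an eight-digit date, a two-digit batch number, and a two-character stage code from 0A to 0F. The names `BatchId`, `STAGES`, and `parse_batch_id` are hypothetical.

```python
import re
from datetime import date, datetime
from typing import NamedTuple

# Stage codes in the column order of Table 3 (assumed mapping).
STAGES = {
    "0A": "planting", "0B": "storage", "0C": "processing",
    "0D": "warehousing", "0E": "logistics", "0F": "sales",
}


class BatchId(NamedTuple):
    product: str
    first_upload: date
    batch: str
    stage: str


_PATTERN = re.compile(r"([A-Za-z]+)(\d{8})(\d{2})(0[A-F])")


def parse_batch_id(data_id: str) -> BatchId:
    """Split a data ID into product, upload date, batch, and stage."""
    match = _PATTERN.fullmatch(data_id)
    if match is None:
        raise ValueError(f"not a valid data ID: {data_id!r}")
    product, ymd, batch, stage_code = match.groups()
    # strptime also rejects impossible dates such as 20221340.
    uploaded = datetime.strptime(ymd, "%Y%m%d").date()
    return BatchId(product, uploaded, batch, STAGES[stage_code])
```

Parsing “sp20221205010F” with this helper yields the product “sp”, the date 2022-12-05, batch “01”, and the sales stage, which matches the traceability code described in the text.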


Constructing a bloom filter

The Bloom Filter is a data structure used for quickly determining whether an element exists in a set. In this paper, the Bloom Filter is utilized to facilitate efficient data queries. The method maps the batch number IDs of each stage in the supply chain as keywords. When an SLN user queries local data, the data’s existence is first quickly determined locally. If the data is not found locally, a request is made to the FN. After receiving the request, the FN uses the Bloom Filter in each block header to quickly identify which blocks may contain the data. Specifically, during the query, the FN will traverse all block headers, using the Bloom Filter to narrow the search range and determine the blocks that may store the data. Then, the FN traverses the narrowed blocks and returns the corresponding data, reducing computational overhead.

To construct a Bloom filter, the batch number ID in the transaction needs to be extracted, k different hash operations are performed on the batch number ID, and the resultant values of the hash operations are bitwise-or’d to obtain the corresponding Bloom filter. The Bloom filter can effectively help SLN users pre-filter the block numbers where the given keyword might exist, and then determine whether the block is stored locally. This approach minimizes unnecessary data traversal and query operations, thereby enhancing data query efficiency and alleviating the overall system load.

The process of constructing a Bloom filter is as follows:

(1) Initialization: Create a bit array of length m, with all bits initialized to 0.

(2) Adding Elements: The element to be added is mapped to positions in the bit array using hash functions. In this paper, the batch number ID of the transaction data is the element added: k different hash functions are applied to the batch number ID, each resulting hash value is taken modulo m, and the bit at each resulting position is set to 1. The completed Bloom filter is added to the block header information.

(3) Querying Elements: The element to be queried, i.e., the product’s batch number ID, is mapped to k positions in the bit array by the k hash functions, and these positions are checked to see whether they are 1. If any of the bits is 0, the element definitely does not exist in the block; if all the bits are 1, the element may exist in the block. Using this method, it is possible to determine, with some margin of error, which blocks the data might be stored in. Fig. 3 shows a particular state of the Bloom filter constructed by performing k hash calculations using the batch number ID as the keyword.

Fig. 3 — Bloom filter structure diagram.
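The three steps above (initialization, adding, querying) can be sketched as a compact Python class. This is an illustrative sketch rather than the implementation used in the experiments: the defaults m = 2048 and k = 3 follow the values reported later in the paper, while deriving the k hash functions from SHA-256 with an index prefix is an assumption, since the text does not name the hash functions.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Bit-array Bloom filter keyed by batch number ID (sketch)."""

    def __init__(self, m: int = 2048, k: int = 3) -> None:
        self.m = m      # length of the bit array
        self.k = k      # number of hash functions
        self.bits = 0   # a Python int serves as the bit array, all zeros

    def _positions(self, key: str) -> Iterator[int]:
        # Assumed construction: k hashes obtained by prefixing the key
        # with the hash index, each reduced modulo m.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))
```

A filter built this way never reports an added ID as absent, but it may occasionally report an ID that was never added as possibly present, which is why a positive answer still has to be confirmed against the block data.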

Data query process

When a participating enterprise joins the blockchain network as an SLN to query data, the first operation is to retrieve the target data content from the local storage environment. A quick check is performed using the Bloom filter in each block header to determine whether the required query content is stored locally. If a block that may contain the data is found, the query is performed locally; if the data is not available locally, the SLN requests the data from the FN and performs verification. The process for an SLN to request data from an FN is roughly as follows:

Establishing Network Connection: The SLN establishes a network connection with one or more FNs using the IP addresses and port numbers of the nodes.
Sending Request Message: After the connection is established, the SLN can send a request message to the FN containing the specific information required.
FN Processes the Request: Upon receiving the SLN request message, the FN processes the request accordingly.
Data Transmission: When the FN confirms the validity of the request and has the relevant data available for the SLN, it transmits the required data to the SLN. The data can be transferred via a network protocol or through a dedicated data channel.
SLN Receives Data: The SLN receives and processes the data, performing the necessary validation, parsing, or storage.

The sequence diagram of the data query process is shown in Fig. 4:

Experimental analysis

Experimental environment

The experiment was conducted on a PC equipped with an Intel® Core™ i7-9700 CPU @ 3.00 GHz and 16 GB of RAM. Blockchain simulation and algorithm implementation were carried out in Python, with the blockchain network built using the PoA consensus mechanism. MySQL 8.0.19 was used for data storage. Each stage of the simulation was configured with one LN, one SLN, and one FN, resulting in a total of 18 nodes. The wheat supply chain was selected as the case study for testing and analysis, with test data obtained from the smart farm cloud platform.

Data storage analysis

Data parameter

The test data used in this experiment were derived from records exported from the Smart Farm Cloud Platform between January 2022 and December 2022. The dataset was subsequently filtered and augmented based on this raw data. It contains a total of 50,563 records, each with 15 distinct attributes. This study utilizes data from six key stages of the agricultural product supply chain: planting, harvesting, processing, warehousing, logistics, and sales. Based on actual data collection and generation rules, the data volume across these stages is relatively balanced, with each stage accounting for around 16.5% ± 1.2% of the total. Nonetheless, slight fluctuations in the proportion of records are observed across stages: the planting and processing stages account for slightly more records than the others because equipment sensing is more frequent in those stages. The data is divided according to the heat threshold division principle, and the storage volume is calculated accordingly. To meet the actual storage demand of the wheat supply chain, the heat parameters T₀ and T₁ are taken as 0.3 and 0.6, respectively, and the initial value of T₁ of T_{time} as 0.4, for example. When the storage capacity of the SLN is limited, the first and second thresholds are usually the thresholds A, B, and C in Table 2, and λ is usually taken to be 0.01 because of the long cycle of the wheat supply chain. In the agricultural supply chain, the data storage threshold can be flexibly adjusted based on the SLN’s storage capacity and the desired balance between query efficiency and storage space. The selected threshold values and related parameters are presented in Table 4:

Table 4.

Experiment parameter setting.

Parameter Name	Parameter Value
Database version	MySQL 8.0.19
Compiler version	Python 3.9.3
PyCharm IDE	PyCharm 2024.1
Consensus Mechanism	PoA
Number of nodes	N = 18
Block height	2000
Weight coefficient	W=(0.33, 0.33, 0.33)
Maximum number of transactions per block	100
Maximum block capacity (MB)	100
Three sets of thresholds	A=(0.33, 0.23) B=(0.3, 0.2) C=(0.26, 0.16)


Cold/Hot data storage comparison

The variation in data storage volume over time was obtained by running the data heat level determination algorithm and the heat cooling change storage algorithm, with the first and second thresholds set according to Scheme A. As shown in Fig. 5, both the total amount of stored data and the amount of cold data grow linearly over time. Under the hot-cold data replacement principle, the replacement of hot data and warm data remains stable and ultimately reaches a balanced state. After 12 months of operation, hot data accounted for 4.90% of the total data volume within the wheat supply chain. In practical supply chain operations, each participant’s SLN selects an appropriate temperature threshold range for hot–cold data classification based on its available storage capacity. Adjusting the thresholds allows the SLN to adapt to different storage requirements, enabling precise control of data storage volume and a dynamic balance of storage resources.

Fig. 5 — Comparison of data storage volume.

In scenarios about supply chain traceability, the accuracy of data exerts a substantial influence on the classification of cold and hot data, as well as the optimization of node storage strategies. In order to mitigate the impact of potential anomalies, such as timestamp errors or abnormal access frequencies, this study employs a multi-factor weighted heat evaluation model and introduces dynamic threshold Schemes (A, B, C) to enhance adaptability and tolerance. In addition, the experimental data originate from a structured and reliable smart farm cloud platform, thus ensuring data quality. The proposed model demonstrates strong robustness and is capable of maintaining reliable performance in heat-based data classification and storage optimization even under conditions of moderate data inaccuracy.

Threshold selection comparison

A storage comparison analysis was conducted on the basis of the storage parameters above. A total of 50,563 data entries were divided for storage, and hot data storage was analyzed at the fourth, eighth, and twelfth months under three different threshold settings for the SLN, as illustrated in Fig. 6. At the fourth month, under Scheme A (first threshold W₁ = 0.33, second threshold W₂ = 0.23), hot data storage accounted for 22.81% of the total data. Under Scheme B (W₁ = 0.3, W₂ = 0.2), it accounted for 28.95%, and under Scheme C (W₁ = 0.26, W₂ = 0.16), for 46.02%. At the eighth month, the proportion of hot data storage was 11.44% of the total data under Scheme A, 14.41% under Scheme B, and 28.13% under Scheme C. By the twelfth month, the proportion of hot data storage under Scheme A had decreased to 4.90%. Under Scheme B, enterprises participating as SLNs stored a greater volume of hot data from additional stages, with hot data storage accounting for 9.70%. As the threshold values decreased, the SLN stored more hot data, with the storage amount accounting for 18.80% of the total data under Scheme C.

Fig. 6 — Comparison of the impact of thresholds.

Node storage comparison

The Bloom filter is implemented on top of a bit array in which each element occupies only 1 bit, so a filter of length m requires m bits of storage. For a length of 2048, the required space is 2048 bits / 8 = 256 bytes = 256 / 1024 KB = 0.25 KB. Storing one such Bloom filter in each of 10,000 blocks requires less than 2.5 MB in total. The memory overhead of the Bloom filter is therefore acceptable.
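The arithmetic above can be reproduced with a few lines of Python. The false-positive estimate uses the standard approximation (1 − e^(−kn/m))^k for a filter of m bits, k hash functions, and n inserted keys; applying it with n = 100 is an assumption that every one of the at most 100 transactions in a block (Table 4) carries a distinct batch number ID. The function names are hypothetical.

```python
import math


def bloom_size_kb(m_bits: int) -> float:
    """Storage of one m-bit filter in KB (bits -> bytes -> KB)."""
    return m_bits / 8 / 1024


def false_positive_rate(m_bits: int, k: int, n_keys: int) -> float:
    """Standard approximation of the Bloom filter false-positive rate."""
    return (1.0 - math.exp(-k * n_keys / m_bits)) ** k


per_block_kb = bloom_size_kb(2048)            # one filter per block header
total_mb = 10_000 * per_block_kb / 1024       # filters for 10,000 blocks
fpr = false_positive_rate(2048, 3, 100)       # worst case of 100 distinct IDs
print(f"{per_block_kb} KB per block, {total_mb:.2f} MB in total, FPR {fpr:.4%}")
```

Under these assumptions the estimated false-positive rate stays well under 1%, so only a small fraction of negative lookups would trigger an unnecessary block traversal.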

For the case where the SLN storage capacity is subject to an upper limit, the participating enterprise of the first stage is analyzed as a case study. The data of the planting-stage enterprise was modeled according to the data storage and heat-cooling change storage algorithms, which were used to calculate the amount of data stored in the SLN and the FN, respectively. When the SLN storage capacity is constrained to 500 MB, the storage rule of this Scheme prioritizes hot blocks, with the remaining space allocated to warm blocks. The blue triangle curve in Fig. 7 compares the storage of the SLN and the FN when the storage capacity of the SLN is limited. When the number of blocks reaches 150 and the SLN’s capacity reaches its upper limit, the system first stores all hot blocks and then allocates the remaining space to warm blocks in descending order of their heat weight values. Storage or deletion operations are ultimately performed according to the SLN’s storage willingness. The orange circular curve in the figure compares SLN and FN storage without any limit on the storage size of a single SLN. When the block height is 2000, the SLN saves 92.75% of the storage space compared with the FN. This indicates that the storage method proposed in this study can greatly reduce the storage pressure on a single node. As the number of blocks increases, FN data grows linearly, whereas SLN storage reaches a dynamic equilibrium after a certain period. The quantity of data stored is determined by the storage size of the node itself. Even when a greater quantity of data is stored, the total amount of data can still be kept in balance.

Fig. 7 — Comparison of storage between SLN and LN.

Storage comparison test

To verify the effectiveness of the model, we compared it with the Enhanced Simplified Payment Verification (ESPV) model proposed in the literature²⁸ and the Account-Based Blockchain Scalable Storage Model (SSMAB) proposed in the literature³⁹. The idea behind the model is collaborative storage, which belongs to data pruning. The SSMAB model uses full redundancy to preserve state data and segmentation to store block data. In this study, when the storage capacity of nodes is limited, the number of FNs is appropriately reduced, and SLNs are used to store some block data, thereby reducing the storage redundancy of the entire blockchain network. Table 5 compares the data storage capacity of the SSMAB Scheme, the ESPV Scheme, and the proposed Scheme over 12 months of operation, as well as the storage capacity of all nodes in a full redundancy state. With the above parameters, threshold Scheme A is selected as an example, and the three models are compared and analyzed under the same amount of data, node size and running time. The SSMAB node’s data storage capacity is 13.00% of the FN storage model’s capacity, and the ESPV Scheme’s data storage capacity is 10.00% of the FN model’s capacity. Due to the principle of replacing hot and cold blocks, the SLN always stores hot blocks; therefore, its annual data storage capacity is 4.90% of the FN storage model’s capacity. It can therefore be concluded that the proposed Scheme can greatly reduce the storage resource consumption of the blockchain network.

Table 5.

Comparison of data storage volume.

Scheme type	Storage capacity (MB)	Proportion of data storage (%)
SLN (Scheme A)	120.97	4.90%
ESPV	246.889	10.00%
SSMAB	310.955	13.00%
FN	2468.89	100%


Query analysis

Bloom filter construction time

In this experiment, a Bloom filter was created for the product ID in each block, and the construction time was observed as the number of blocks in the blockchain increased. Each Bloom filter uses a bit array of length 2048 and three distinct hash functions to keep the false positive rate low. The Bloom filter construction on the generated blocks was implemented in Python. The results are shown in Fig. 8. When the number of blocks is small, the construction time of the Bloom filter is negligible. Even after 500 blocks have been generated, the construction time remains below 1 millisecond. As the number of blocks increases, the construction time increases accordingly, reaching 2 milliseconds when the block count reaches 2000. The creation time of the Bloom filter is therefore brief, and the time cost is well within an acceptable range.

Fig. 8 — Construction time of Bloom Filter.

SLN data query time

The parameter s in Formula 9 is the Zipf exponent, which determines the steepness of the distribution⁴⁰; it is typically set to 1. The total number of blocks is denoted by N. The query results are obtained by averaging the outcomes of multiple experimental runs. First, the mean query times of the different node types (FN, LN, and SLN) were compared on the same blockchain, in order to validate the advantage of the proposed storage optimization Scheme in terms of query efficiency. Next, the impact of the storage thresholds on the query efficiency gained from the proposed storage optimization Scheme was investigated. The objective of these experiments is to demonstrate the effect of the proposed SLN Scheme on query performance.
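A Zipf-distributed query workload of this kind could be generated as in the sketch below. Since Formula 9 is not reproduced in this text, the code assumes the standard Zipf probability mass function, P(i) = i^(−s) / Σ_{n=1..N} n^(−s), where i is the popularity rank of a block; the function names, the seed, and the idea of ranking blocks by popularity are assumptions made for illustration.

```python
import random
from typing import List


def zipf_weights(n_blocks: int, s: float = 1.0) -> List[float]:
    """Normalized Zipf probabilities for ranks 1..n_blocks."""
    raw = [1.0 / (rank ** s) for rank in range(1, n_blocks + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def sample_block_ranks(n_blocks: int, n_queries: int,
                       s: float = 1.0, seed: int = 0) -> List[int]:
    """Draw the ranks of the blocks targeted by n_queries queries."""
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    ranks = range(1, n_blocks + 1)
    return rng.choices(ranks, weights=zipf_weights(n_blocks, s), k=n_queries)
```

With s = 1, the block at rank 1 is queried twice as often as the block at rank 2, which reflects the skew toward a small set of frequently accessed records that hot-data caching relies on.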

The results are displayed in Fig. 9. As illustrated in Fig. 9, the mean query time for the FN, LN, and SLN models proposed in this paper is compared under three distinct storage threshold Schemes (A, B, and C). Each subgraph corresponds to a distinct threshold Scheme. As shown in Fig. 9(a), Scheme A has the least amount of hot data stored locally by SLN nodes. Under this Scheme, although the SLN query time is still higher than that of FN, it has been significantly reduced compared to LN, indicating that even with only a small amount of hot data stored locally, query efficiency can be significantly improved. Figure 9(b) shows the moderate hot data storage situation in Scheme B, where the SLN query time is further reduced, and the gap with FN is narrowed. This indicates that moderately increasing the proportion of locally stored hot data can effectively reduce dependence on full nodes, thereby improving response speed. Figure 9(c) corresponds to Scheme C’s highly localized hot data storage, where the average query time for SLN nodes has decreased to 30.91 milliseconds, with performance now closely matching that of FN. This further validates the significant advantages of the hot data localization storage strategy in enhancing query performance.

The mean query times for the three node groups are displayed in Table 6. When the first and second thresholds follow Scheme A, the storage of hot blocks is minimal, and the SLN average query time is 63.46 ms. Under Scheme B, the storage of hot blocks is moderate, and the SLN average query time is 44.02 ms. Under Scheme C, the storage of hot blocks is considerable, and the SLN average query time is 30.91 ms. Under Scheme C, the SLN query time is 14.45 ms longer than the FN query time (16.46 ms) but 54.43 ms shorter than the LN query time (85.34 ms). Consequently, the solution proposed in this study can enhance data query efficiency in comparison to conventional LN queries.

Table 6.

Comparative analysis of average query time across different node types.

Threshold scheme	LN (ms)	FN (ms)	SLN (ms)
A	93.12	18.58	63.46
B	89.37	18.04	44.02
C	85.34	16.46	30.91


Query throughput test

In this experiment, 20 concurrent threads are used to perform query operations on smart contracts, simulating highly concurrent requests in real scenarios. In each round of testing, the system randomly initiates query requests through the thread pool, measures the overall response time, and calculates the throughput to evaluate the system’s performance under different load conditions. As shown in Fig. 10, when the total number of queries increases from 50 to 130, the throughput (TPS) increases linearly with the number of queries, indicating that the system still has sufficient processing resources to respond to query requests promptly. However, when the number of queries exceeds 130, throughput growth slows down and fluctuates slightly, indicating that the system is approaching the upper limit of its processing capacity. This phenomenon may be caused by factors such as network delay, node resource saturation, and degraded data synchronization efficiency, reflecting the system’s scalability bottleneck under high load.
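A throughput measurement of this shape can be sketched with Python’s standard thread pool. The function `measure_throughput` and the stand-in `query` callable are hypothetical; in the experiment the callable would invoke the smart-contract query, which is not shown in the paper and is therefore not reproduced here.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def measure_throughput(query: Callable[[str], object],
                       keys: Sequence[str],
                       workers: int = 20) -> float:
    """Run one query per key on a thread pool; return queries per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces every query to finish before the clock is stopped.
        results = list(pool.map(query, keys))
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        raise RuntimeError("elapsed time too small to measure")
    return len(results) / elapsed
```

Calling the function repeatedly with 50, 70, and up to 130 or more keys gives one throughput value per load level, which is the kind of curve Fig. 10 reports.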

Conclusions and future direction

Conclusions

To address key challenges in the traceability of agricultural product supply chains, such as large data volume, low query efficiency, and high storage pressure on individual blockchain nodes, this paper proposes a hierarchical cold and hot data storage model based on storage light nodes. This model leverages data access heat to classify data into three categories (hot, warm, and cold) and dynamically replaces low-heat data via a trigger mechanism. In addition, a Bloom filter utilizing batch numbers was designed to enable efficient local queries and block positioning for light nodes. The experimental results show that, out of 50,563 data records from the agricultural product supply chain, the SLN stores only the hot data locally, amounting to 4.90% of the total data volume, and achieves an average query time of 30.91 milliseconds. This represents a significant improvement in query performance compared to traditional light nodes. This Scheme effectively mitigates storage pressure within the blockchain system and enhances query efficiency. It also offers good scalability and demonstrates strong potential for practical applications.

Future direction

This research has practical application value and provides a theoretical basis for agricultural supply chains. With the continuous growth of data volume in the agricultural product supply chain, the model still needs further optimization. To achieve a more complete supply chain traceability system, future work should consider the value of practical application and promotion, as well as issues such as how to deploy and maintain the system in large-scale practical applications. First, machine learning algorithms can be combined to predict the trend of data temperature changes, achieving a more intelligent cold/hot data division and storage scheduling strategy. Second, on the basis of the storage light node, the storage elimination mechanism can be further optimized.

Moreover, in terms of cross-chain interoperability, future work could investigate how this model may facilitate data exchange among heterogeneous blockchain networks, thereby improving the availability and shareability of traceability information in agricultural supply chains. Concurrently, from the standpoint of data privacy and security, cryptographic techniques such as homomorphic encryption and zero-knowledge proofs can be integrated to ensure that improvements in query efficiency do not compromise data integrity or privacy protection.

The cold and hot data storage model proposed in this study offers a novel approach to applying blockchain technology in the traceability of agricultural product supply chains. Continued research and optimization will further facilitate the widespread adoption of blockchain in the field of agricultural traceability.

Acknowledgements

This work was funded by the National Key Research and Development Program (2023YFD2001304) and supported by Jiangsu Province Science and Technology Plan (Key Research and Development Plan Modern Agriculture) Project (BE2023315).

Author contributions

Mei Sun: Paper conception, Blockchain part implementation, Experimental design, Experimental testing, Draft preparation, Data curation, Formal analysis. Chuanheng Sun: Funding acquisition, System framework propose, Formal analysis. Na Luo: Paper conception, Methodology, Formal analysis. Bin Xing: Paper conception, Formal analysis. Feng Chen: Paper conception, Formal analysis. Qingbo Liu: Paper conception, Experimental design. Xiaohui Liu: Paper conception. All authors reviewed the manuscript.

Data availability

The datasets generated and/or analyzed during the current study are not publicly available for security reasons, but are available from the first author, Mei Sun(smei_123@163.com), upon reasonable request.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Ahumada, O. & Villalobos, J. R. Application of planning models in the agri-food supply chain: A review. Eur. J. Oper. Res. 196, 1–20. 10.1016/j.ejor.2008.02.014 (2009).
2. Peng, X. et al. A review on blockchain smart contracts in the agri-food industry: current state, application challenges and future trends. Comput. Electron. Agric. 208, 107776. 10.1016/j.compag.2023.107776 (2023).
3. Bosona, T. & Gebresenbet, G. J. S. The role of blockchain technology in promoting traceability systems in agri-food production and supply chains. Sensors 23, 5342. 10.3390/s23115342 (2023).
4. Cama-Pinto, D. et al. A deep learning model of radio wave propagation for precision agriculture and sensor system in greenhouses. Agronomy 13, 244. 10.3390/agronomy13010244 (2023).
5. Cheng, A. J. et al. Recent advances of capacitive sensors: materials, microstructure designs, applications, and opportunities. Adv. Mater. Technol. 8, 2201959. 10.1002/admt.202201959 (2023).
6. Fan, R., Wang, J., Han, W. & Xu, B. UAV swarm control based on hybrid bionic swarm intelligence. Guidance Navig. Control 3, 2350008. 10.1142/S2737480723500085 (2023).
7. Khandelwal, P. M. & Chavhan, H. Artificial intelligence in agriculture: an emerging era of research. Res. Gate Publication 1, 56–60 (2019).
8. Almadani, B., Mostafa, S. M. & J. I. A. IIoT based multimodal communication model for agriculture and agro-industries. IEEE Access 9, 10070–10088. 10.1109/ACCESS.2021.3050391 (2021).
9. Choudhary, S. K., Jadoun, R. S. & Mandoriya, H. L. Role of cloud computing technology in agriculture fields. Computing 7, 1–7 (2016).
10. Zhao, Y., Li, Q., Yi, W. & Xiong, H. Agricultural IoT data storage optimization and information security method based on blockchain. Agriculture 13, 274. 10.3390/agriculture13020274 (2023).
11. Salah, K., Nizamuddin, N., Jayaraman, R. & Omar, M. Blockchain-based soybean traceability in agricultural supply chain. IEEE Access 7, 73295–73305. 10.1109/ACCESS.2019.2918000 (2019).
12. Li, R. et al. Blockchain for large-scale internet of things data storage and protection. IEEE Trans. Serv. Comput. 12, 762–771. 10.1109/TSC.2018.2853167 (2018).
13. Dasaklis, T. K., Voutsinas, T. G., Tsoulfas, G. T. & Casino, F. A systematic literature review of blockchain-enabled supply chain traceability implementations. Sustainability 14, 2439. 10.3390/su14042439 (2022).
14. ETHERSCAN. Ethereum Full Node Sync (Archive) Chart [EB/OL]. https://etherscan.io/chartsync/chainarchive
15. Xiaobao, L. et al. Graph blockchain model for in-depth traceability of the fruit and vegetable supply chain. 44, 1–10. 10.7506/spkx1002-6630-20220930-340 (2023).
16. Benisi, N. Z., Aminian, M. & Javadi, B. Blockchain-based decentralized storage networks: A survey. J. Netw. Comput. Appl. 162, 102656. 10.1016/j.jnca.2020.102656 (2020).
17. Wang, Y. X. & Hsueh, Y. L. A low-storage synchronization framework for blockchain systems. J. Netw. Comput. Appl. 231, 103977. 10.1016/j.jnca.2024.103977 (2024).
18. Hassanzadeh-Nazarabadi, Y., Taheri-Boshrooyeh, S. & Özkasap, Ö. in IEEE INFOCOM 2022-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). 1–6 (IEEE). 10.1109/INFOCOMWKSHPS61880.2024.10620713
19. Bandara, E. et al. Lightweight, geo-scalable deterministic blockchain design for 5G networks sliced applications with hierarchical CFT/BFT consensus groups, IPFS, and novel hardware design. Internet Things 25, 101077. 10.1016/j.iot.2024.101077 (2024).
20. Akrasi-Mensah, N. K. et al. An overview of technologies for improving storage efficiency in blockchain-based IIoT applications. Electronics 11, 2513. 10.3390/electronics11162513 (2022).
21. Jia, D., Xin, J., Wang, Z. & Wang, G. Optimized data storage method for sharding-based blockchain. IEEE Access 9, 67890–67900. 10.1109/ACCESS.2021.3077650 (2021).
22. Feng, H., Wang, J. & Li, Y. A. Blockchain storage architecture based on Information-Centric networking. Electronics 11, 2661. 10.3390/electronics11172661 (2022).
23. Sun, C. H., Yuan, S., Luo, N., Xu, D. M. & Yang, X. T. Traceability method of rice origin based on blockchain and edge computing. J. Agricultural Mach. 54, 359–368. 10.6041/j.issn.1000-1298.2023.05.037 (2023).
24. Zhang, H. et al. Multi-level index construction method based on master–slave blockchains. Sci. Rep. 14, 4049. 10.1038/s41598-024-54240-4 (2024).
25. Liu, S. et al. Development of a reliable traceability system for agricultural products quality and safety based on blockchain. Trans. Chin. Soc. Agricultural Mach. 53, 327–337. 10.6041/j.issn.1000-1298.2022.06.035 (2022).
26. Kılıç, B., Özturan, C. & Sen, A. Parallel analysis of Ethereum blockchain transaction data using cluster computing. Cluster Comput. 25, 1885–1898. 10.1007/s10586-021-03511-0 (2022).
27. Li, X., Ahmed, F., Wei, L., Zhang, C. & Fang, Y. Protecting access privacy in Ethereum using differentially private information retrieval. 1–6 (IEEE). 10.1109/GLOBECOM42002.2020.9348108
28. Zhao, Y., Niu, B., Li, P. & Fan, X. Blockchain enhanced lightweight node model. J. Comput. Appl. 40, 942. 10.11772/j.issn.1001-9081.2019111917 (2020).
29. Palai, A., Vora, M. & Shah, A. In 2018 9th IFIP international conference on new technologies, mobility and security (NTMS). 1–5 (IEEE). 10.1109/NTMS42894.2018
30. Liu, Y., Zhang, L. & Zhao, Y. Deciphering bitcoin blockchain data by cohort analysis. Sci. Data 9, 136. 10.48550/arXiv.2103.00173 (2022).
31. Diallo, E., Dib, O. & Al Agha, K. A scalable blockchain-based scheme for traffic-related data sharing in VANETs. Blockchain: Res. Appl. 3, 100087. 10.1016/j.bcra.2022.100087 (2022).
32. Yulin, X. Research on Recognition Mechanism of Hot and Cold Data Based on Data Temperature [D]. (2019).
33. Stranieri, S., Riccardi, F., Meuwissen, M. P. & Soregaroli, C. Exploring the impact of blockchain on the performance of agri-food supply chains. Food Control 119, 107495. 10.1016/j.foodcont.2020.107495 (2021).
34. Tan, Y., Huang, X. & Li, W. Does blockchain-based traceability system guarantee information authenticity? An evolutionary game approach. Int. J. Prod. Econ. 264, 108974. 10.1016/j.ijpe.2023.108974 (2023).
35. Arif, J., Samadhiya, A., Naz, F. & Kumar, A. Exploring the application of ICTs in decarbonizing the agriculture supply chain: A literature review and research agenda. Heliyon. 10.1016/j.heliyon.2024.e29564 (2024). [Retracted]
36. Lezoche, M., Hernandez, J. E., Díaz, M. M. E. A., Panetto, H. & Kacprzyk, J. Agri-food 4.0: A survey of the supply chains and technologies for the future agriculture. Comput. Ind. 117, 103187. 10.1016/j.compind.2020.103187 (2020).
37. Yang, D. & Tsai, W. T. An Optimized Encryption Storage Scheme for Blockchain Data Based on Cold and Hot Blocks and Threshold Secret Sharing. Entropy 26, 690. 10.3390/e26080690 (2024).
38. Shi, D. et al. A reliable and efficient storage scheme for bitcoin blockchain based on raptor code. Chin. J. Electron. 32, 577–586. 10.23919/cje.2022.00.343 (2023).
39. Zhang, G., Niu, X. & Gong, T. B. Account-based blockchain scalable storage model. 48, 708–715. 10.13700/j.bh.1001-5965.2020.0638 (2022).
40. Zhang, L. et al. Resource allocation and trust computing for blockchain-enabled edge computing system. Computers Secur. 105, 102249. 10.1016/j.cose.2021.102249 (2021).
