An efficient watermarking algorithm for digital audio data in security applications

Mohamed Yamni; Achraf Daoui; Hicham Karmouni; Mhamed Sayyouri; Hassan Qjidaa; Saad motahhir; Ouazzani Jamil; Walid El-Shafai; Abeer D Algarni; Naglaa F Soliman; Moustafa H Aly

doi:10.1038/s41598-023-45619-w

2023 Oct 27;13:18432. doi: 10.1038/s41598-023-45619-w

PMCID: PMC10611798 PMID: 37891357

Abstract

Transform-domain audio watermarking systems are more robust than time-domain systems. However, their main weakness is high computational cost, especially for long-duration audio signals, which makes them undesirable for real-time security applications where speed is critical. In this paper, we propose a fast watermarking system for audio signals operating in the hybrid transform domain formed by the fractional Charlier transform (FrCT) and the dual-tree complex wavelet transform (DTCWT). The central idea of the proposed algorithm is to parallelize the intensive and repetitive steps of the audio watermarking system and then execute them simultaneously on the available physical cores of an embedded-systems cluster. To obtain a low-cost, low-power cluster with a large number of physical cores, four Raspberry Pi 4B boards are used, and communication between them is handled through the Message Passing Interface (MPI). The adopted Raspberry Pi cluster is also portable and mobile, as required in watermarking-based smart city applications. In addition to its resistance to intentional and unintentional manipulations, high payload capacity, and high imperceptibility, the proposed parallel system reduces execution time by about 70%, 80%, and 90% using 4, 8, and 16 physical cores of the adopted cluster, respectively.

Subject terms: Electrical and electronic engineering, Mathematics and computing

Introduction

Recently, information security has received considerable attention due to the appearance of serious problems with multimedia data, such as illegal distribution, copying, authentication, and editing. Among the techniques followed to avoid these problems, digital watermarking is a technology used to ensure law enforcement and copyright protection of multimedia data. Digital watermarking methods secure multimedia data by embedding copyright information (known as a watermark) imperceptibly and securely into the host in a way that resists any possible manipulation, whether intentional or unintentional, that attempts to delete or damage the watermark.

Several watermarking methods for audio signal protection have been published in the literature. They fall mainly into two categories: time-domain methods and transform-domain methods. The methods in the first category¹,² are the simplest; they embed the watermark by directly modifying the host signal samples. However, these methods are generally not very robust to common signal processing manipulations. In contrast, the methods in the second category achieve higher robustness by embedding the watermark into transform coefficients; examples include the discrete wavelet transform (DWT)³⁻⁵, discrete cosine transform (DCT)⁶, singular value decomposition (SVD)⁷, and lifting wavelet transform (LWT)⁸. Moreover, to improve performance, hybrid audio watermarking methods that combine two transforms have been proposed, such as DWT-DCT⁹, DWT-DTMT¹⁰, and DWT-SVD¹¹.

The main limitations of the existing audio watermarking systems are low robustness, in particular to shifting modification, and high computation cost, especially for long-duration audio.

Several methods address the first limitation by using a synchronization code strategy¹²⁻¹⁸. With this strategy, synchronization codes are embedded together with the watermark into the host audio signal to mark the positions of the modified samples. In the watermark extraction process, these synchronization codes are found first, and then the watermark bits that follow each synchronization code are extracted. Without using a synchronization code strategy, we proposed in¹⁹ a hybrid approach robust to attacks, including shifting attacks, based on the Dual Tree Complex Wavelet Transform (DTCWT) and the Fractional Charlier Transform (FrCT). We embedded the watermark in the host signal by manipulating the coefficients obtained by applying the DTCWT and then the FrCT.

If the robustness problem against shifting attacks has been effectively addressed by the methods mentioned earlier, the execution time of audio watermarking systems in real-time applications remains a challenging problem. In the context of copyright protection applications, the duration of the watermark embedding process may not be a primary concern, but the need for swift watermark extraction is of paramount importance²⁰. This emphasis on rapid extraction is supported by a multitude of compelling reasons²¹. Firstly, in scenarios characterized by real-time content dissemination, such as live streaming or content delivery networks, the rapid extraction of watermarks becomes indispensable for immediate verification of authenticity and copyright ownership. Fast extraction is vital for detecting and addressing unauthorized usage or distribution promptly. Secondly, content creators and copyright holders frequently employ automated systems to monitor the utilization of their intellectual property across digital platforms. The efficient and timely tracking of copyrighted material depends on a swift watermark extraction process, facilitating the effective implementation of enforcement measures. Thirdly, the user experience is significantly affected by the pace of watermark extraction, particularly in applications like video streaming or online gaming. The imperative here is to minimize disruptions and latency issues, ensuring seamless content consumption. Fourthly, scalability considerations loom large as the volume of multimedia content burgeons across the internet. A rapid watermark extraction capability is pivotal for the efficient management and safeguarding of extensive content repositories. Fifthly, the expeditious extraction of watermarks plays a crucial role in deterring piracy and curtailing unauthorized distribution of copyrighted content. 
It strengthens the ability to promptly identify infringements and take necessary legal actions, thereby effectively safeguarding intellectual property rights. These reasons underscore the significance of fast watermark extraction in the context of audio watermarking systems used for copyright protection. However, most audio watermarking systems in the transform domain, such as³,⁹⁻¹¹,¹⁹, are very time-consuming, especially for signals of long duration. These systems operate sequentially (Fig. 1). They divide the audio signal into segments and then apply a set of steps to each segment (preprocessing, switching from the time domain to the transform domain, embedding watermark bits, reconstructing watermarked segments, etc.). Only one segment is processed at a time on a single processor core. Applying transforms and inverse transforms sequentially to audio segments is computationally intensive, mainly for audio signals of long duration and for hybrid approaches that combine multiple transforms.

The main goal of this paper is to design and implement a fast audio watermarking system in the transform domain that can be executed in real time. The central idea of the proposed solution is to parallelize the intensive and repetitive steps of the audio watermarking system and then execute them simultaneously on the available physical cores of a multi-core processor (Fig. 2).

Figure 2. Proposed parallel audio watermarking system.

In the realm of parallel computing, various endeavors have been made, particularly in the domain of image processing applications. For instance, Hosny et al.²² presented a pioneering parallel medical image watermarking scheme, which they successfully deployed on both multi-core CPUs and GPUs. Similarly, Daoui et al.²³ introduced a parallel image encryption algorithm tailored for multi-core CPU architectures. Additionally, researchers in a related study²⁴ harnessed the parallel processing capabilities of both multi-core CPUs and GPUs to enhance image reconstruction and image classification tasks.

Despite the documented strides made in leveraging parallel computing for computational acceleration in various domains, these efforts have predominantly been confined to conventional personal computing devices (desktops or laptops). Such devices, distinguished by their considerable physical dimensions and weight, possess limited portability. Consequently, their applicability in mobility-constrained environments, encompassing scenarios like transportation modes (e.g., cars, trains, planes, and boats) and smart home or urban infrastructure contexts, has remained largely impractical.

In response to these inherent limitations associated with traditional personal computing systems, the adoption of mobile and portable embedded systems, exemplified by platforms such as Raspberry Pis, has emerged as a viable solution²⁵. These embedded systems offer a compelling alternative by virtue of their compact form factor, lower power consumption, and enhanced mobility, making them well-suited for a diverse range of applications and settings.

Parallel processing entails a heightened demand for computational resources, encompassing processor cores and memory, due to the concurrent execution of tasks and the need for efficient workload distribution. Simultaneous execution of multiple tasks necessitates the allocation of dedicated processor cores, while data sharing and synchronization among these tasks amplify the requirement for memory resources. To tackle this computational challenge, we build in this paper a cluster of several Raspberry Pis for fast, parallel, and distributed audio watermarking. The Raspberry Pi was selected as our computational platform for its exceptional portability, owing to its light weight (46 g) and compact dimensions (85.6 mm × 56.5 mm), coupled with its low power consumption and affordability. Compared to other versions of the Raspberry Pi, the 4B version with 2 GB of RAM is powerful enough to support complex signal processing applications that require a high computational load. In this paper, a cluster of four Raspberry Pi 4B boards is built to provide a large number of physical cores, which is very useful for reducing the execution time of a parallel watermarking system.

This paper presents a parallel watermarking system for audio signals, implemented on the Raspberry Pi cluster. The proposed approach decomposes the host audio signal and the watermark into as many sub-signals and sub-vectors as there are available cores in the Raspberry Pi cluster. Then, simultaneously on each core, we extract from each sub-signal the low-frequency coefficients, which are less sensitive to the human auditory system, by applying the 5-level DTCWT. Next, we apply the FrCT²⁶ with the optimal fractional order to improve imperceptibility and robustness, and we embed the watermark bits by quantizing the energies of the FrCT coefficients. Finally, each core of the Raspberry Pi cluster sends its watermarked sub-signal to the master Raspberry Pi, which combines all these sub-signals to obtain the watermarked audio signal. Each Raspberry Pi 4B in our cluster has the same input data and the same copy of the instruction script, but each Raspberry Pi executes only a specific part of the script determined by the master Raspberry Pi of the cluster. The Raspberry Pis in the cluster are independent of each other, and communication (sending and receiving data) between them is handled by the Message Passing Interface (MPI) library²⁷.
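The decomposition and recombination described above can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function names (`split_into_chunks`, `process_chunk`, `merge_chunks`) are hypothetical, the per-core work is replaced by a placeholder that returns its input unchanged, and a serial loop stands in for the MPI-based distribution across the cluster.

```python
def split_into_chunks(samples, n_workers):
    """Split a sequence into n_workers contiguous chunks whose lengths differ by at most one."""
    base, extra = divmod(len(samples), n_workers)
    chunks, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        chunks.append(samples[start:start + size])
        start += size
    return chunks


def process_chunk(sub_signal, sub_bits):
    """Placeholder for the per-core work (DTCWT, FrCT, embedding). Returns the input unchanged."""
    return list(sub_signal)


def merge_chunks(chunks):
    """Concatenate the processed chunks in their original order (the master's job)."""
    return [sample for chunk in chunks for sample in chunk]


host = [0.01 * i for i in range(1000)]   # toy host signal (assumption)
watermark = [i % 2 for i in range(37)]   # toy watermark bits (assumption)
n_cores = 16                             # 4 boards x 4 physical cores

sub_signals = split_into_chunks(host, n_cores)
sub_bits = split_into_chunks(watermark, n_cores)

# Serial stand-in for the parallel step: in the cluster, each (sub-signal, sub-vector)
# pair would be handled by a different core.
processed = [process_chunk(s, b) for s, b in zip(sub_signals, sub_bits)]
watermarked = merge_chunks(processed)
```

Because the chunks are contiguous and merged in order, the master only needs to concatenate the results; no reordering information has to travel with each sub-signal.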

Like the embedding process, the watermark extraction process requires neither the original audio signal nor the original watermark (blind extraction). We also used a modified Henon map²⁸ to encrypt the watermark and guarantee security.

The results show that the proposed parallel watermarking system is fast compared to the sequential system, with an improvement of about 70%, 80%, and 90% using 4, 8, and 16 cores of the Raspberry Pi cluster, respectively.
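To put these percentages in perspective, the short sketch below converts them into speedup factors and per-core efficiency. It assumes that "improvement" means the relative reduction in execution time, (T_seq − T_par) / T_seq, which the text does not state explicitly; the helper name is hypothetical. Under that assumption, reductions of 70%, 80%, and 90% correspond to speedups of roughly 3.3×, 5×, and 10×.

```python
def speedup_from_improvement(improvement):
    """Convert a relative time reduction (between 0 and 1) into a speedup factor T_seq / T_par."""
    return 1.0 / (1.0 - improvement)


# cores -> approximate improvement quoted in the text
reported = {4: 0.70, 8: 0.80, 16: 0.90}

for cores, improvement in reported.items():
    speedup = speedup_from_improvement(improvement)
    efficiency = speedup / cores  # fraction of ideal linear scaling
    print(f"{cores:2d} cores: speedup ~{speedup:.2f}x, efficiency ~{efficiency:.2f}")
```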

In summary, the contributions of this article are as follows.

A new parallel audio watermarking system implemented on an embedded-systems cluster is proposed for the first time.
The audio watermarking system is fast, which makes it suitable for real-time applications.
All the Raspberry Pis in the cluster work simultaneously on the audio watermarking system, which reduces the execution time.
The Raspberry Pi is easily portable thanks to its light weight and small size, which overcomes the limited portability of standard PCs.

The rest of the manuscript is organized as follows: Sections "Discrete fractional Charlier transform", "Dual-tree complex wavelet transform", and "Modified Henon map" present respectively the FrCT, the DTCWT, the modified Henon map, and their roles in the proposed approach. Section "Raspberry Pi cluster" presents our Raspberry Pi cluster. Section "Proposed parallel audio watermarking system" presents the proposed parallel audio watermarking system. Section "Experiments results" presents the experimental results and discussions, and the conclusion is finally provided in Section "Conclusion".

Discrete fractional Charlier transform

In our previous paper²⁶, we proposed the fractional version of the Charlier transform, called the fractional Charlier transform (FrCT), which is based on the fractional Charlier polynomials (FrCPs) also proposed in the same paper. The FrCT generalizes the classical Charlier transform from integer order to fractional order in order to benefit from the properties of non-integer orders.

The main property of FrCT that makes it very suitable for digital watermarking is its dependence on transform orders. By adjusting the fractional orders in the FrCT transform, different FrCT coefficients can be obtained. Therefore, we select the optimal fractional orders, and the corresponding FrCT coefficients are used as host coefficients to integrate the watermark. This approach improves the imperceptibility and robustness requirements of the watermarking system. In addition, the fractional orders in the transform can be used as additional secret keys to improve the security of the watermarking system.

Let $x(t),\ t = 1, 2, \dots, N$ be a one-dimensional signal of finite length $N$. The one-dimensional fractional Charlier transform of this signal with fractional order $\alpha$ $(\alpha \in \mathbb{R})$ is defined as follows:

$$\mathrm{FrCM}^{\alpha} = C^{\alpha} x \tag{1}$$

where $x$ is the column-vector representation of $x(t)$, and $C^{\alpha}$ is the fractional Charlier polynomial matrix of size $N \times N$ and fractional order $\alpha$, which is defined as follows:

$$C^{\alpha} = V D^{\alpha} V^{T} = \sum_{k=0}^{N-1} e^{-j k \alpha \pi}\, v_{k} v_{k}^{T} \tag{2}$$

where the eigenvectors of the fractional Charlier polynomial matrix, $v_{k}$ $(k = 0, 1, \dots, N - 1)$, are the columns of $V$ ($v_{k}$ being the $k$th column), and $D^{\alpha}$ is defined as follows:

$$D^{\alpha} = \mathrm{Diag}\left\{1, e^{-j \alpha \pi}, e^{-j 2 \alpha \pi}, \dots, e^{-j (N - 1) \alpha \pi}\right\} \tag{3}$$

The corresponding inverse transform (iFrCT) can be written as follows:

$$x = C^{-\alpha}\, \mathrm{FrCM}^{\alpha} \tag{4}$$
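The structure of the transform and its inverse can be checked numerically with a small sketch. It assumes the usual eigendecomposition form $C^{\alpha} = V D^{\alpha} V^{T}$ with orthonormal real eigenvectors, and, importantly, it uses a Householder reflection as a stand-in for $V$: the actual Charlier eigenvectors are not computed here, so the coefficients are not real FrCT coefficients. The function names are hypothetical. The sketch only illustrates that applying order $-\alpha$ undoes order $\alpha$.

```python
import cmath


def householder_matrix(u):
    """Orthogonal matrix I - 2 u u^T / (u^T u), used here as a stand-in for V."""
    n = len(u)
    norm2 = sum(ui * ui for ui in u)
    return [[(1.0 if r == c else 0.0) - 2.0 * u[r] * u[c] / norm2
             for c in range(n)] for r in range(n)]


def fractional_matrix(V, alpha):
    """Build sum_k exp(-j*k*alpha*pi) * v_k v_k^T, where v_k is the k-th column of V."""
    n = len(V)
    C = [[0j] * n for _ in range(n)]
    for k in range(n):
        phase = cmath.exp(-1j * k * alpha * cmath.pi)
        v_k = [V[r][k] for r in range(n)]
        for r in range(n):
            for c in range(n):
                C[r][c] += phase * v_k[r] * v_k[c]
    return C


def matvec(C, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(row[c] * x[c] for c in range(len(x))) for row in C]


V = householder_matrix([1.0, 2.0, -1.0, 0.5])  # stand-in, NOT Charlier eigenvectors
x = [0.3, -1.2, 0.7, 2.0]                      # toy signal (assumption)
alpha = 0.35                                   # arbitrary fractional order (assumption)

coeffs = matvec(fractional_matrix(V, alpha), x)       # forward transform, order alpha
x_rec = matvec(fractional_matrix(V, -alpha), coeffs)  # inverse transform, order -alpha
```

Since the phases for orders $\alpha$ and $-\alpha$ cancel eigenvector by eigenvector, `x_rec` matches `x` up to rounding error, and order 0 gives the identity matrix.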

Dual-tree complex wavelet transform

The DTCWT²⁹ is an enhanced, expansive version of the DWT. It is implemented as two separate DWTs ($\mathrm{Tree}_{a}$ and $\mathrm{Tree}_{b}$) applied to the same signal data (Fig. 3). At the heart of the DTCWT is a pair of filters: low pass and high pass. For a DTCWT of level $H$, the low-pass ($h_{0}$) and high-pass ($h_{1}$) filters of $\mathrm{Tree}_{a}$ generate the approximation coefficients $A_{a}^{H}$ (low frequencies) and the detail coefficients $D_{a}^{H}, D_{a}^{H-1}, \dots, D_{a}^{1}$ (high frequencies). Similarly, the approximation coefficients $A_{b}^{H}$ and the detail coefficients $D_{b}^{H}, D_{b}^{H-1}, \dots, D_{b}^{1}$ are generated by the low-pass and high-pass filters of $\mathrm{Tree}_{b}$, $\{g_{0}, g_{1}\}$.

The outputs of the DTCWT can be interpreted as complex coefficients as follows:

$$A^{H} = A_{a}^{H} + j A_{b}^{H} \quad \text{and} \quad D^{H} = D_{a}^{H} + j D_{b}^{H} \tag{5}$$

where $A^{H}$ are the approximation coefficients of level $H$ and $D^{H}$ are the detail coefficients of level $H$.

The original signal can be reconstructed without loss of information using inverse DTCWT (iDTCWT)²⁹.

The main advantage of the DTCWT for signal processing is its shift invariance, which the DWT does not provide. Indeed, the DTCWT is approximately shift invariant, which means that small shifts in the input signal do not produce major variations in the energy distribution of the DTCWT coefficients at different levels. To obtain this advantage, the approximation and detail coefficients of $\mathrm{Tree}_{a}$ must be approximate Hilbert transforms of the approximation and detail coefficients of $\mathrm{Tree}_{b}$, that is

$$A_{a}^{H} = \mathcal{H}\left(A_{b}^{H}\right) \quad \text{and} \quad D_{a}^{H} = \mathcal{H}\left(D_{b}^{H}\right) \tag{6}$$

where $\mathcal{H}$ is the Hilbert transform operator (not to be confused with the decomposition level $H$).

In our case, for the first level, we use a set of filters from³⁰, and for the other levels, we use a set of filters from³¹,³² in order to satisfy the condition of Eq. (6).
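The shift sensitivity of the ordinary DWT, which motivates the use of the DTCWT, can be seen in a toy example. The sketch below does not implement the DTCWT; it uses a one-level Haar DWT (chosen here only because it is the simplest wavelet) and hypothetical helper names. It shows that shifting a short pulse by a single sample changes the detail-coefficient energy from zero to a clearly nonzero value.

```python
import math


def haar_detail_energy(x):
    """Energy of the one-level Haar DWT detail coefficients of an even-length signal."""
    details = [(x[i] - x[i + 1]) / math.sqrt(2.0) for i in range(0, len(x), 2)]
    return sum(d * d for d in details)


def circular_shift(x, k):
    """Shift a list to the right by k samples, wrapping around."""
    k %= len(x)
    return x[-k:] + x[:-k] if k else list(x)


pulse = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]

# The pulse is aligned with the Haar pairs, so every pair difference is zero.
energy_original = haar_detail_energy(pulse)

# After a one-sample shift the pulse straddles two pairs and the details become nonzero.
energy_shifted = haar_detail_energy(circular_shift(pulse, 1))
```

An approximately shift-invariant transform would keep the energy at each level nearly unchanged under such a shift, which is the property the DTCWT is designed to provide.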

Ethics approval

The manuscript does not contain any human or animal studies.

Consent to participate

All authors contributed to this work and consent to its submission.

Modified Henon map

The modified Henon map, recently proposed in²⁸, is a nonlinear chaotic map that is very sensitive to initial conditions. This chaotic map is defined as follows:

$$\begin{cases} y(i) = b \left(1 - d\, |\sin(y(i-1))|\right) x(i-1) \\ x(i) = 1 - a \left(1 - c\, |\cos(i)|\right) x^{2}(i-1) + y(i-1) \end{cases} \quad \text{with } i = 0, 1, 2, \dots \tag{7}$$

where $a \in [0.54, 2]$, $b \in [0, 1]$, $c \in [0, 0.8]$, and $d \in [0, 0.8]$ are the control parameters of the chaotic system. If $c = d = 0$, the modified Henon map coincides with the classical Henon map³³:

$$\begin{cases} y(i) = b\, x(i-1) \\ x(i) = 1 - a\, x^{2}(i-1) + y(i-1) \end{cases} \quad \text{with } i = 0, 1, 2, \dots \tag{8}$$

In this paper, the modified Henon map is used to encrypt the watermark information before embedding it into the original host audio signal. This makes the watermark hard for unauthorized persons to extract, which improves the overall security of the audio watermarking system. In addition, encrypting the watermark removes the correlation within its information, which in turn improves the overall robustness of the proposed watermarking system.

Let $W = \{w(i),\ 0 \leq i < N\}$ be a binary sequence of ones and zeros with $N$ bits. The watermark encryption process is as follows:

(1) Generate a chaotic sequence $Y = \{y(i),\ 0 \leq i < N\}$ using the modified Henon map (Eq. 7).

(2) Binarize the sequence $Y$ using its mean $T$ as a binarization threshold as follows:

$$\bar{Y}(i) = \begin{cases} 1, & \text{if } Y(i) \geq T \\ 0, & \text{if } Y(i) < T \end{cases}, \quad (0 \leq i < N) \tag{9}$$

(3) Encrypt the watermark from $W$ to $W_{1}$ by applying the xor operation between $W$ and $\bar{Y}$ as follows:

$$W_{1} = \mathrm{xor}(W, \bar{Y}) \tag{10}$$

The watermark can be decrypted by applying the xor operation between the encrypted watermark $W_{1}$ and the binarized chaotic sequence $\bar{Y}$ as follows:

$$W = \mathrm{xor}(W_{1}, \bar{Y}) \tag{11}$$

Watermark decryption depends on the initial parameters of the modified Henon map $\{a, b, c, d, x_{0}, y_{0}\}$ . These parameters can be used as a secret key in an audio watermarking system.
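The encryption and decryption steps can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names are hypothetical, the key values are arbitrary choices inside the stated parameter ranges (picked small enough that the orbit stays bounded) and are not taken from the paper, and the sequence is assumed to consist of the first $N$ iterates of $y$ after the initial state.

```python
import math


def modified_henon_sequence(n, a, b, c, d, x0, y0):
    """Return the first n iterates of y for the modified Henon map started at (x0, y0)."""
    x, y = x0, y0
    ys = []
    for i in range(1, n + 1):
        y_next = b * (1.0 - d * abs(math.sin(y))) * x
        x_next = 1.0 - a * (1.0 - c * abs(math.cos(i))) * x * x + y
        x, y = x_next, y_next  # both updates use the previous state
        ys.append(y)
    return ys


def keystream(n, key):
    """Binarize the chaotic sequence using its mean as the threshold."""
    ys = modified_henon_sequence(n, *key)
    threshold = sum(ys) / n
    return [1 if value >= threshold else 0 for value in ys]


def xor_bits(bits, stream):
    """Bitwise xor of two equal-length binary sequences."""
    return [w ^ k for w, k in zip(bits, stream)]


# Hypothetical secret key (a, b, c, d, x0, y0); illustrative values only.
key = (1.2, 0.2, 0.3, 0.3, 0.1, 0.1)

W = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]  # toy watermark bits
W1 = xor_bits(W, keystream(len(W), key))              # encryption
W_back = xor_bits(W1, keystream(len(W1), key))        # decryption with the same key
```

Because xor is its own inverse, decryption regenerates the same binarized sequence from the key and applies the same operation, so only the six key values need to be shared.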

Raspberry Pi cluster

Basically, a cluster is a group of computers that operates as a single entity. By combining two or more computers in a cluster, one can achieve a potential increase in performance by performing operations in a distributed and parallel environment. In this paper, we build a cluster of Raspberry Pi embedded systems for fast, parallel, and distributed audio watermarking. This choice is justified by the Raspberry Pi's easy portability, owing to its light weight (46 g) and small size (85.6 mm × 56.5 mm), as well as its low power consumption, low cost, functionality, and scalability. The Raspberry Pi has been used in various domains such as the Internet of Things (IoT)³⁴⁻³⁶, image processing³⁷⁻⁴⁰, home automation⁴¹,⁴², and other applications.

Several versions of the Raspberry Pi computer have been produced by the Raspberry Pi Foundation⁴³ with an open-source platform. Compared to the previous versions of the Raspberry Pi (3B, 3B+, 2B, 2B+, 1A, and 1B), the Raspberry Pi 4B (Fig. 4) presents a major improvement in terms of processor speed and amount of RAM. The characteristics of the Raspberry Pi 4B are summarized in Table 1. As can be seen in Table 1, the Raspberry Pi 4B is powerful enough to support complex signal processing applications that require a high computational load. In addition, the Raspberry Pi 4B's processor has four physical cores, which is very useful when an application can run on more than one core. In this paper, a cluster of four Raspberry Pi 4B boards is built to provide a large number of processor cores, which is very useful for reducing the execution time of an audio watermarking system.

Table 1.

Raspberry Pi 4B characteristics.

Feature	Description
SoC	Broadcom BCM2711
Processor	Quad-core Cortex-A72 (ARM v8) 64-bit @ 1.5 GHz
RAM	2 GB
SD card support	Micro SD card slot for loading operating system and data storage
Connectivity	2.4 GHz and 5.0 GHz IEEE 802.11b/g/n/ac wireless LAN; Bluetooth 5.0, BLE; Gigabit Ethernet; 2 × USB 3.0 ports; 2 × USB 2.0 ports
GPIO	Standard 40-pin GPIO header
Video and sound	2 × micro HDMI ports (up to 4Kp60 supported); 2-lane MIPI DSI display port; 2-lane MIPI CSI camera port; 4-pole stereo audio and composite video port
Input power	5 V via USB-C connector or GPIO header (minimum 3A)

Audio signal	Category	Duration (s)	Bits per sample	Sample rate (kHz)	Format
Classical_looperman-t-5360279-0234295	Classical	60	16	44.1	Wave
Rap_looperman-t-5460389-0236865	Rap	90	16	44.1	Wave
Jazz_looperman-t-5378703-0236794	Jazz	120	16	44.1	Wave
Pop_ooperman-t-0966004-0236621	Pop	150	16	44.1	Wave
Rock_looperman-t-3294503-0236832	Rock	180	16	44.1	Wave

Audio signal	Length of watermark	Payload capacity
Classical_looperman-t-5360279-0234295	5438	90.7407 (bps)
Rap_looperman-t-5460389-0236865	8158	90.7407 (bps)
Jazz_looperman-t-5378703-0236794	10,945	91.4938 (bps)
Pop_ooperman-t-0966004-0236621	13,665	91.4938 (bps)
Rock_looperman-t-3294503-0236832	13,665	91.4938 (bps)

Audio watermarking system	Average payload capacity (bps)
Proposed	91.1926
¹⁹	496.48
¹⁰	541.10
⁹	40.27
³	102.40
⁵	450.00
⁷	172.39
¹⁶	64.00
¹⁷	64.00
¹⁸	64.50

Audio signal	SNR (dB)
Classical_looperman-t-5360279-0234295	31.9157
Rap_looperman-t-5460389-0236865	30.6872
Jazz_looperman-t-5378703-0236794	31.9098
Pop_ooperman-t-0966004-0236621	33.0123
Rock_looperman-t-3294503-0236832	33.5415

Watermarking method	Average SNR (dB)
Proposed	32.2133
¹⁹	31.4936
¹⁰	29.1370
⁹	31.0786
³	22.46
⁵	35.3644
⁷	30.30
¹⁶	26.86
¹⁷	25.26
¹⁸	26.50

Manipulation	Classical	Rap	Jazz	Pop	Rock
Resampling (22,050 Hz)	0	0	0	0	0
Resampling (11,025 Hz)	0	0	0	0	0
Resampling (8000 Hz)	0	0	0	0	0
Requantization (16–8–16 bits)	0	0	0	0	0
Requantization (16–24–16 bits)	0	0	0	0	0

Manipulation	Classical	Rap	Jazz	Pop	Rock
Low-pass filtering (4 kHz)	0	0	0	0	0
Low-pass filtering (500 Hz)	1.5994	1.5922	1.5814	1.5650	1.5647
High-pass filtering (200 Hz)	1.3357	1.3363	1.3503	1.3398	1.3592

Manipulation	Classical	Rap	Jazz	Pop	Rock
Echo addition (50 ms, 5%)	0	0	0	0	0
Echo addition (300 ms, 40%)	0	0	0	0	0

Manipulation	Classical	Rap	Jazz	Pop	Rock
MP3 (128 kbps)	0	0	0	0	0
MP3 (112 kbps)	0	0	0	0	0
MP3 (64 kbps)	9.0872	9.1195	8.0508	8.4168	7.8226
MP3 (32 kbps)	11.0870	10.2092	10.4080	11.3230	10.6726

Manipulation	Classical	Rap	Jazz	Pop	Rock
Amplitude scaling (1.2)	0	0	0	0	0
Amplitude scaling (1.1)	0	0	0	0	0
Amplitude scaling (0.9)	0	0	0	0	0
Amplitude scaling (0.8)	0	0	0	0	0

Manipulation	Classical	Rap	Jazz	Pop	Rock
AWGN (30 dB)	0	0	0	0	0
AWGN (20 dB)	0	0	0	0	0
AWGN (18 dB)	0	0	0	0	0

Manipulation	Classical	Rap	Jazz	Pop	Rock
Random cropping (10%)	0	0	0	0	0
Random cropping (20%)	0	0	0	0	0
Random cropping (30%)	0	0	0	0	0
Random cropping (40%)	1.1785	1.1777	1.1004	1.1467	1.0438

Manipulation	Classical	Rap	Jazz	Pop	Rock
Shifting (5 samples)	0.0341	0.0396	0.0375	0.0272	0.0386
Shifting (10 samples)	0.0793	0.0850	0.0759	0.0942	0.0869
Shifting (20 samples)	0.8171	0.6985	0.8313	0.7244	0.7185
Shifting (50 samples)	2.2274	2.2035	2.2399	2.2358	2.2417
Shifting (100 samples)	8.6434	8.4967	8.5547	8.5606	8.6416
Shifting (150 samples)	15.0390	15.1003	15.1723	15.1292	15.1588

Manipulation	Classical	Rap	Jazz	Pop	Rock
TSM (+ 1%)	5.6129	5.0856	5.5747	5.3971	5.7445
TSM (− 1%)	6.4553	6.4992	5.9417	6.1319	6.3586
TSM (+ 5%)	9.9908	9.9700	9.4926	9.6817	9.7023
TSM (− 5%)	10.1900	10.2025	10.0320	9.7840	10.1078

Manipulation	Proposed	¹⁹	¹⁰	⁹	³	⁵	⁷	¹⁶	¹⁷	¹⁸
No modification	0	0	0	0	0	0	0	0	0	0
AWGN (20 dB)	0	0	0	0	0	0.24	4.25	2.42	0.07	4.46
AWGN (15 dB)	0.2458	0.23	1.43	1.53	3.56	6.40	11.32	10.54	9.24	12.78
Resampling (22,050 Hz)	0	0	0	3.90	0	0.10	0	0.32	0.70	0.790
Resampling (11,025 Hz)	0	0	0	6.81	0	4.02	0	6.64	7.82	8.74
Resampling (8000 Hz)	0	0	0	11.02	0	9.33	4.30	12.81	13.52	13.46
Requantization (16–8–16 bits)	0	0	0	1.81	0	0.10	0.85	0	0	0.36
Requantization (16–24–16 bits)	0	0	0	0	1.35	0	0	0	0	1.45
Echo addition (300 ms, 40%)	0	0	0	0	7.74	8.95	6.27	9.46	8.43	9.87
Random cropping (10%)	0	0	0	0	0.37	0	12.58	0.61	0.63	1.05
Random cropping (20%)	0	0	0	0	0.68	0	13.26	3.97	4.64	3.87
Random cropping (30%)	0	0	0	0	1.59	0.67	17.37	8.56	8.04	7.21
Random cropping (40%)	1.13	1.38	0.35	1.25	14.99	0.92	27.65	16.10	16.57	17.54
Low-pass filtering (4 kHz)	0	0	0.51	0.54	0.91	2.32	9.35	0.15	25.64	18.54
Low-pass filtering (500 Hz)	1.58	1.76	1.09	11.83	2.03	21.84	20.17	22.49	36.74	36.48
High-pass filtering (200 Hz)	1.34	1.35	2.84	4.78	2.96	26.05	25.80	22.65	21.97	20.87
Amplitude scaling (0.7)	0	0	0	0	22.04	25.25	0.49	0	0.96	0.24
MP3 (128 kbps)	0	0	0	0	0	0.03	5.30	0.76	0.79	0.68
MP3 (112 kbps)	0	0	0	0	0	3.44	9.36	0.94	1.02	0.85
MP3 (64 kbps)	8.45	8.08	7.31	24.05	0	26.07	22.11	10.10	11.28	9.47
MP3 (32 kbps)	10.74	10.47	25.77	31.80	0	32.30	39.90	28.94	29.45	28.04
Shifting (5 samples)	0.03	0.02	38.40	38.38	36.30	42.28	39.55	3.04	0.02	39.78
Shifting (10 samples)	0.08	0.037	40.17	42.96	39.62	47.40	45.03	4.10	0.13	41.97
Shifting (20 samples)	0.76	0.78	46.58	47.63	46.11	50.03	44.15	6.84	0.89	42.79
Shifting (50 samples)	2.23	2.37	41.60	46.26	47.67	51.41	47.96	8.27	2.46	45.10
Shifting (100 samples)	8.58	8.63	48.36	47.29	45.47	54.57	46.24	17.58	3.94	45.34
Shifting (150 samples)	15.12	15.31	50.46	49.56	50.18	51.06	50.39	21.05	10.78	48.46
TSM (− 1%)	6.28	7.53	14.46	11.96	6.89	12.87	26.40	6.17	1.84	12.54
TSM (+ 1%)	5.48	7.01	13.96	11.11	6.65	13.14	25.39	5.84	1.47	11.30
TSM (− 5%)	10.06	15.61	19.79	17.12	14.14	26.80	37.82	12.07	3.34	14.46
TSM (+ 5%)	9.77	15.53	19.51	16.09	14.29	26.75	37.35	11.46	2.69	14.07

Audio signal (duration)	Embedding time (s)	Extraction time (s)
Classical (60 s)	28.2114	22.6615
Rap (90 s)	52.5263	47.71793
Jazz (120 s)	92.7756	86.9927
Pop (150 s)	120.2145	115.9767
Rock (180 s)	181.1722	173.6886

Implementation environment	Proposed	¹⁰	⁹	¹⁹
PC Intel core i3 2.40 GHz, 6 GB RAM	49.2675	58.4176	37.9481	71.3133
PC AMD Ryzen 5 2.30 GHz, 8 GB RAM	19.6285	23.5018	10.8438	29.1655
Raspberry Pi cluster (4 cores)	7.6210	7.8491	5.9706	8.4218
Raspberry Pi cluster (8 cores)	4.0680	4.1787	3.7922	4.5469
Raspberry Pi cluster (16 cores)	2.3197	2.6324	1.9572	2.9649
