Computational Intelligence and Neuroscience. 2022 Mar 11;2022:3733818. doi: 10.1155/2022/3733818

Aided Recognition and Training of Music Features Based on the Internet of Things and Artificial Intelligence

Xidan Zhang
PMCID: PMC8933112  PMID: 35310596

Abstract

With the development of the Internet of Things, many industries have entered the information age, and digital audio technology is developing along with them. Music retrieval has gradually become a research hotspot in the music industry, and the aided recognition of music features is a particularly important task within it. Music retrieval has mainly relied on manually extracted music signals, but this extraction technology has reached a bottleneck. This article uses Internet of Things and artificial intelligence technology to design an SNN music feature recognition model that identifies and classifies music features. The research results show the following. (1) In the interval statistics of the main melody and the accompanying melody of different pieces, the absolute value of the interval mainly fluctuates in the range of 0–7; the largest interval proportion reaches 36% for the main melody and 17% for the accompanying melody. After the absolute interval reaches 13, the interval proportions of the main melody and the accompanying melody stabilize between 0.6 and 0.9, and the two curves completely coincide. The relative difference of the main melody fluctuates greatly in the interval range X(1)–X(16); after the absolute interval reaches 17, the relative differences of the main melody and the accompanying melody stabilize between 0.01 and 0.04, with the value for the main melody always higher than that for the accompanying melody. (2) When the number of feature maps is 24, the recognition result is the most accurate: MAP reaches 78.8% and precision@500 reaches 79.2%. When the feature map size is 5×5, the recognition result is the most accurate: MAP reaches 78.9%, precision@500 reaches 79.2%, and HAM2 reaches 78.6%. The improved SNN music recognition model proposed in the article has the highest detection accuracy: when the number of bits is 64, the detection accuracy of the SNN model is 59.2% and that of the improved SNN model is 79.3%, which is 61.4% higher than the 17.9% detection rate of the ITQ music recognition model. The experimental data further show that the improved SNN music recognition model has the highest detection efficiency. (3) The improved SNN music recognition model proposed in the article has the highest detection accuracy in both noisy and noise-free music environments, with an accuracy rate of 97.97% and a detection accuracy value of 0.88, the highest among the 5 music recognition models. The ITQ music recognition model has the lowest detection accuracy, 67.47% without noise and 70.23% with noise; although it applies a certain noise-removal technique that suppresses noise interference to some extent, it cannot accurately describe music information, so its detection accuracy remains low.

1. Introduction

Because the network has the advantages of fast information dissemination, ease of use, and abundant resources, it is widely used in work, study, and daily life. With the rapid development of popular music in our country, music is everywhere, and the wave of music affects all of us. Faced with a wide variety of music types, users inevitably feel at a loss and must spend a great deal of time choosing the music they are interested in, which is both time-consuming and inefficient. Against this background, it is necessary to design an intelligent auxiliary model for music features. Literature [1] studied the ability of self-organizing neural maps to classify the style of music fragments: the melody is cut into segments of equal length, the melody and rhythm are analyzed, and the analyzed data are presented to the SOM. Literature [2] describes a system and method for simple and fast real-time single-note recognition based on fuzzy pattern matching; the system accepts the rhythm and notes during a performance and compares them with the correct rhythm to determine whether the performed rhythm is standard. Literature [3] proposed a new method for automatic music genre recognition in the visual domain using two texture descriptors. Literature [4] introduced a dynamic classifier selection scheme and created a classifier pool to perform automatic music genre classification; the classifiers are support vector machines that extract effective information from the spectrogram image of the music, and the reported extraction accuracy reaches 83%. Literature [5] introduced optical music recognition technology and proposed a method for computers to automatically recognize music scores; the system scans printed score images, extracts effective information, and automatically generates audio files that users can listen to. Literature [6] proposed a statistical method for handwritten music recognition in early notation, which differs from traditional methods in that it recognizes the music signal directly without dividing it into many segments. Literature [7] investigated various aspects of automatic emotion recognition in music; music is a good way to express emotions, and different classifications and timbres interpret different musical effects, so the article surveys the extensive research on music emotion recognition. Literature [8] studied the utility of state-of-the-art pretrained deep audio embeddings for the task of music emotion recognition. Literature [9] proposed a music emotion recognition method based on an adaptive aggregation regression model; emotion recognition is an important task for evaluating the influence of music on listeners' emotions, and the proposed estimation model uses the variance obtained by Gaussian process regression to measure the confidence of each regression model's estimate. Literature [10] proposed a new method that uses template matching and pixel pattern features in computer games.
The general music model is not greatly affected by changes in notation font, but the beats and note heads of some symbols do not keep the original shape of the music signal; the model proposed in [10] can be applied to such music symbols. Literature [11] proposed a method for multidimensional music emotion recognition that combines standard and melodic audio features. Literature [12] studied reducing the number of training examples in music genre recognition and examined the impact of this reduction on detection results; the experiments show that even when the number of training examples is greatly reduced, high classification performance can still be maintained in many cases. Literature [13] presents a method that parses solo performances into individual note components and uses support vector machines to adjust the back-end classifier. In order to generalize instrument recognition to off-the-shelf, commercial solo music, [14] proposed a method for musical instrument recognition in chord recordings. Literature [15] proposed a method for analyzing and recognizing music speech signals based on speech feature extraction: effective information is extracted from the music signal, which is then reorganized to achieve noise reduction. The experimental results show that the reorganized music signal has good noise reduction ability compared with the original signal.

2. Research on Auxiliary Recognition of Music Features

2.1. Overall Structure of Music Feature Recognition

The music feature recognition system based on the Internet of Things technology is mainly composed of a physical perception layer, a capability layer, an adaptation layer, and a system application layer. The overall structure of the system is shown in Figure 1.

Figure 1. Framework diagram of music feature recognition.

2.2. Design of Music Collection Module

To identify the music signal, it is necessary to collect it first. The music collection module consists of two parts: the collection submodule and the encoding submodule. The collection submodule is composed of sound sensors installed in different positions and is responsible for collecting the original music signal [16]. Each sound sensor has a built-in capacitive electret microphone that is sensitive to sound; the captured signal is converted by an A/D converter and transmitted to the voice coding submodule [17]. The voice coding submodule is responsible for the high-fidelity, lossless compression of the original music signal; it converts the music signal into transmittable data and then passes it to the music signal processing module.
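As a rough illustration of this collection-and-encoding data flow, the sketch below records a short clip from a microphone and stores it losslessly before transmission. It assumes the Python packages sounddevice and soundfile are available; the sample rate, capture length, and file name are illustrative choices, not values taken from the paper.

```python
import sounddevice as sd
import soundfile as sf

SAMPLE_RATE = 16_000   # assumed A/D sampling rate, not specified in the paper
DURATION_S = 5         # assumed capture window per collection cycle

def capture_and_encode(path="capture.flac"):
    # Collection submodule: read PCM samples from the attached microphone.
    pcm = sd.rec(int(DURATION_S * SAMPLE_RATE), samplerate=SAMPLE_RATE,
                 channels=1, dtype="int16")
    sd.wait()  # block until the recording is finished
    # Encoding submodule: store the signal losslessly (FLAC) before transmission.
    sf.write(path, pcm, SAMPLE_RATE)
    return path
```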

2.3. Music Signal Module Processing Design

The music signal processing module is designed around a DSP processor [18]. The module uses a fixed-point DSP chip suitable for voice signal processing; the chip has low power consumption and a fast running speed. It carries two McBSPs, can be connected to a CODEC for voice input, and has an 8-bit enhanced host parallel port that establishes a communication connection with the host; it includes 4 KB of ROM and 16 KB of DARAM. Its structure is shown in Figure 2.

Figure 2. Functional structure diagram.

3. Music Feature Assisted Recognition and Training

3.1. Extraction of Basic Music Features

Pitch, time value, and tone intensity are the most basic elements of music characteristics. The formula for the pitch level of music is defined as

\bar{p} = \frac{\sum_{i=1}^{n} p_i}{n}. (1)

p_i represents the pitch of the i-th note, and n represents the number of notes in the piece.

Pitch change:

\mathrm{Var}\_p = \frac{\sum_{i=1}^{N-1} \left| \mathrm{Bar}_{i+1} - \mathrm{Bar}_i \right|}{N \cdot \bar{P}}. (2)

The pitch mean square error can be used to express the pitch change:

\mathrm{Var}\_P = \frac{1}{n} \sum_{i=1}^{n} \left( P_i - \frac{\sum_{i=1}^{n} P_i}{n} \right)^2. (3)

The range describes the breadth of the pitch of the music:

\mathrm{range} = \max\{P_1, P_2, \ldots, P_n\} - \min\{P_1, P_2, \ldots, P_n\}. (4)

Time value:

\mathrm{duration} = \text{end time} - \text{start time}. (5)
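A minimal sketch of formulas (1) and (3)-(5), assuming each piece is available as note events with pitches (e.g., MIDI numbers) and onset/offset times in seconds; the function and variable names are illustrative and not taken from the paper.

```python
import numpy as np

def basic_pitch_features(pitches, onsets, offsets):
    p = np.asarray(pitches, dtype=float)
    mean_pitch = p.mean()                                  # Eq. (1): average pitch level
    var_pitch = np.mean((p - p.mean()) ** 2)               # Eq. (3): pitch mean square error
    pitch_range = p.max() - p.min()                        # Eq. (4): range = max - min
    durations = np.asarray(offsets) - np.asarray(onsets)   # Eq. (5): per-note time value
    return mean_pitch, var_pitch, pitch_range, durations
```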

3.1.1. Tone and Music Feature Extraction

The frequency spectrum distribution of music signals and the emotions expressed by timbre perception are shown in Table 1 [19].

Table 1.

Tone-emotion mapping relationship.

Musical emotion type Tonal characteristics
Hate class The tone is sharp, rough, and bright
Depression The tone is deep, pure, simple, monotonous bass, dim, plain, and hollow
Calm meditation class The tone is soft, pure, and simple
Desire The tone is deep, plain, hollow, pure, and simple
Pastoral style The tone is soft, pure, and simple
Perceptual The tone is soft, sweet and soft, rich, gorgeous, pleasant, and nasal
Active class The tone is bright, rich, and gorgeous
Awesome The tone is bright, sharp, and brilliant

The formula for extracting music strength is

\mathrm{Dyn} = \frac{1}{n} \sum_{t=1}^{n} I_t, \quad \mathrm{Var\_Dyn} = \frac{1}{m} \sum_{i=1}^{m} \left| I_i - \frac{\sum_{i=1}^{m} I_i}{m} \right|. (6)

The degree of musical intensity change can also be expressed as

\mathrm{Var\_Dyn} = \frac{\sum_{t=1}^{N-1} \left| D_{t+1} - D_t \right|}{N \cdot \mathrm{Dyn}}. (7)
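The sketch below mirrors the reconstructed forms of Eqs. (6) and (7); the use of absolute deviations and the array names I (per-note intensities) and D (per-frame dynamics) are assumptions made for illustration.

```python
import numpy as np

def intensity_features(I, D):
    I = np.asarray(I, dtype=float)
    D = np.asarray(D, dtype=float)
    dyn = I.mean()                                        # Eq. (6): mean intensity Dyn
    var_dyn = np.mean(np.abs(I - I.mean()))               # Eq. (6): mean deviation of intensity
    change = np.abs(np.diff(D)).sum() / (len(D) * dyn)    # Eq. (7): normalized intensity change
    return dyn, var_dyn, change
```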

3.1.2. Melody Direction Recognition

The expression formula of music melody is

\mathrm{Mel} = \frac{\sum_{i=1}^{n-1} \left( P_{i+1} - P_i \right) \cdot D_i}{D - D_n}. (8)

D represents the total length of all notes; Di represents the length of the i-th note [20].

The melody direction can also be expressed as

\mathrm{Mel} = \sum_{i=1}^{n-1} \frac{P_{i+1} - P_i}{D_i}. (9)

The expression formula of pronunciation point density is

\mathrm{density} = \frac{n}{D}. (10)

The change intensity of the rhythm is

\mathrm{Rhy} = \sum_{i=1}^{n-1} \frac{\left| I_{i+1} - I_i \right|}{D_i}. (11)

Music mutation degree:

\mathrm{mutation} = \frac{\max_i \left| \mathrm{BarCapacity}_i - \mathrm{BarCapacity}_{i-1} \right|}{\max_i \mathrm{BarCapacity}_i}. (12)

The expression of BarCapacity is [21]

\mathrm{BarCapacity} = \sum_{i=0}^{n} K(f_0) \cdot D_i \cdot I_i \cdot j, \quad K(f_0) = \begin{cases} \dfrac{90}{120 - 30 f_0 / 500}, & 20\ \mathrm{Hz} \le f_0 \le 500\ \mathrm{Hz}, \\ 1, & 500\ \mathrm{Hz} \le f_0 \le 1000\ \mathrm{Hz}, \\ \dfrac{90}{90 + 10 \left( f_0 - 1000 \right) / 4000}, & 1000\ \mathrm{Hz} \le f_0 \le 5000\ \mathrm{Hz}, \end{cases} \quad j = \frac{\sum_{i=0}^{n} \mathrm{Velocity}_i}{120}. (13)
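A hedged sketch of Eqs. (8)-(12), assuming per-note pitches, durations, and intensities plus precomputed per-bar capacities; because the weighting K(f0) in Eq. (13) is itself a reconstruction, the bar capacities are taken as given rather than recomputed.

```python
import numpy as np

def melody_rhythm_features(pitches, durations, intensities, bar_capacity):
    P = np.asarray(pitches, dtype=float)
    Dur = np.asarray(durations, dtype=float)
    I = np.asarray(intensities, dtype=float)
    D_total = Dur.sum()
    mel = np.sum((P[1:] - P[:-1]) * Dur[:-1]) / (D_total - Dur[-1])  # Eq. (8): melody direction
    density = len(P) / D_total                                        # Eq. (10): pronunciation point density
    rhy = np.sum(np.abs(I[1:] - I[:-1]) / Dur[:-1])                   # Eq. (11): rhythm change intensity
    bc = np.asarray(bar_capacity, dtype=float)
    mutation = np.max(np.abs(np.diff(bc))) / bc.max()                 # Eq. (12): music mutation degree
    return mel, density, rhy, mutation
```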

3.2. Musical Inference Rules

Sudden changes in pitch or in tonal stability show up as changes in the sequence variance. In order to locate these change points, the music is first expressed as the following time series:

Y_k = \mu + \varepsilon_k, \quad k = 0, 1, 2, 3, \ldots, T. (14)

Among them, μ represents the unknown constant mean value of time series Yk, and σ2 represents the unknown constant variance of time series Yk (and εk).

Get the iterative residual sequence:

a_k = \frac{Y_k - \sum_{i=0}^{k-1} Y_i / k}{\left( (k+1)/k \right) S_y^2}, \quad k = 0, 1, 2, 3, \ldots, T, (15)

and make

C_k = \sum_{i=1}^{k} a_i^2, \quad k = 0, 1, 2, 3, \ldots, T. (16)

Get statistics:

W_k = \frac{C_k}{C_T}, \quad k = 0, 1, 2, 3, \ldots, T. (17)

After centralized processing,

D_k = \frac{C_k}{C_T} - \frac{k}{T}, \quad k = 0, 1, 2, 3, \ldots, T, \quad D_0 = D_T = 0. (18)
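The following sketch implements the change-point statistics of Eqs. (15)-(18) as reconstructed above; the handling of k = 0 and the residual normalization are assumptions, so it should be read as illustrative rather than as the paper's exact procedure.

```python
import numpy as np

def change_point_statistic(Y):
    Y = np.asarray(Y, dtype=float)
    T = len(Y) - 1
    s_y2 = Y.var(ddof=1)                  # sample variance S_y^2 of the series
    a = np.zeros(T + 1)                   # a_0 is left at zero (k = 0 has no history)
    for k in range(1, T + 1):
        prev_mean = Y[:k].mean()
        a[k] = (Y[k] - prev_mean) / (((k + 1) / k) * s_y2)   # Eq. (15): iterative residual
    C = np.cumsum(a ** 2)                 # Eq. (16): cumulative squared residuals
    W = C / C[T]                          # Eq. (17): normalized statistic
    D = W - np.arange(T + 1) / T          # Eq. (18): centered statistic, D_0 = D_T = 0
    return D                              # large |D_k| marks a candidate change point
```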

3.3. Music Separation Algorithm

According to the difference between the impact (percussive) sound and the harmonic sound in the frequency spectrum, the original spectrum W_{f,\tau} can be separated into the impact spectrum P_{f,\tau} and the harmonic spectrum H_{f,\tau}, that is,

W_{f,\tau} = P_{f,\tau} + H_{f,\tau}. (19)

The separation of the impact sound and the harmonic sound is formulated as minimizing the cost function

Q\left(H^{(t)}, P^{(t)}, U^{(t)}, V^{(t)}\right) = \frac{1}{\sigma_H^2} \sum_{f,\tau} \left[ \left( H_{f,\tau-1}^{(t)} - U_{f,\tau}^{(t)} \right)^2 + \left( H_{f,\tau}^{(t)} - U_{f,\tau}^{(t)} \right)^2 \right] + \frac{1}{\sigma_P^2} \sum_{f,\tau} \left[ \left( P_{f-1,\tau}^{(t)} - V_{f,\tau}^{(t)} \right)^2 + \left( P_{f,\tau}^{(t)} - V_{f,\tau}^{(t)} \right)^2 \right]. (20)

Minimizing Q yields the iterative update

H_{f,\tau}^{(t+1)} = H_{f,\tau}^{(t)} + \Delta^{(t)}, \quad P_{f,\tau}^{(t+1)} = P_{f,\tau}^{(t)} - \Delta^{(t)}, (21)

where

\Delta^{(t)} = \frac{\alpha}{4} \left( H_{f,\tau-1}^{(t)} - 2 H_{f,\tau}^{(t)} + H_{f,\tau+1}^{(t)} \right) - \frac{1-\alpha}{4} \left( P_{f-1,\tau}^{(t)} - 2 P_{f,\tau}^{(t)} + P_{f+1,\tau}^{(t)} \right), \quad \alpha = \frac{\sigma_P^2}{\sigma_H^2 + \sigma_P^2}. (22)
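A minimal sketch of the harmonic/impact split defined by Eqs. (19), (21), and (22), assuming the spectrogram W is an array indexed [frequency, time]; the smoothing directions, the σ values, and the iteration count are assumptions layered on the reconstruction above, not settings given in the paper.

```python
import numpy as np

def separate(W, sigma_H=0.3, sigma_P=0.3, n_iter=50):
    H = 0.5 * W.copy()
    P = 0.5 * W.copy()                                     # start from W = H + P, Eq. (19)
    alpha = sigma_P ** 2 / (sigma_H ** 2 + sigma_P ** 2)   # mixing weight from Eq. (22)
    for _ in range(n_iter):
        dH = np.zeros_like(H)
        dP = np.zeros_like(P)
        # Second difference of H along time (harmonic components are smooth in time).
        dH[:, 1:-1] = H[:, :-2] - 2.0 * H[:, 1:-1] + H[:, 2:]
        # Second difference of P along frequency (impact components are smooth in frequency).
        dP[1:-1, :] = P[:-2, :] - 2.0 * P[1:-1, :] + P[2:, :]
        delta = alpha * dH / 4.0 - (1.0 - alpha) * dP / 4.0   # Eq. (22)
        H = np.clip(H + delta, 0.0, None)                     # Eq. (21): H update
        P = np.clip(W - H, 0.0, None)                         # keep W = H + P
    return H, P
```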

4. Simulation Experiment

4.1. Music Feature Recognition

4.1.1. Algorithm Definition

Algorithm definition is as shown in Table 2.

Table 2.

Algorithm definition table.

Definition Publicity
Interval statistics Statistics of the melody interval are carried out on each track of the music [22] Interval_Stat_i = Interval_i if Interval_i < 25, and 25 if Interval_i ≥ 25
Classification algorithm According to the different characteristics of the interval distribution, the main track and the accompanying track are distinguished [23] Rhythm = n/T = n/(Dura_Meter × Dura_Num)
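As a rough illustration of the two definitions in Table 2, the sketch below computes the capped interval statistics for a track and the rhythm density; the input format (lists of MIDI pitches) and the parameter names dura_meter and dura_num are assumptions made for illustration.

```python
import numpy as np

def interval_statistics(pitches, cap=25):
    # Absolute melodic intervals per track; intervals of 25 semitones or more are counted as 25.
    intervals = np.abs(np.diff(np.asarray(pitches, dtype=float)))
    return np.minimum(intervals, cap)

def rhythm_density(n_notes, dura_meter, dura_num):
    # Rhythm = n / T with T = Dura_Meter x Dura_Num (total duration in beats).
    return n_notes / (dura_meter * dura_num)
```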

4.1.2. Experimental Data and Research

The article uses the Internet of Things and artificial intelligence technology to design an SNN music feature-assisted recognition model. In order to test the recognition efficiency of this model, the experiment selected more than 50 pieces of music of multiple types for feature recognition and separately counted the main melody and accompanying melody curves of each piece. The main melody lines of different types of music differ, and the main characteristics of a music melody are linearity and fluidity. In the statistical graphs, the abscissa represents the absolute value of the interval, and the ordinate represents the percentage of that absolute interval value. The specific experimental results are shown in Figure 3.

Figure 3. Interval statistics of different audio tracks.

From the data in Figure 3, we can conclude that the absolute value of the interval between the main melody and the accompanying melody mainly fluctuates in the range of 0–7. In the interval line chart of the main melody, the second degree accounts for the highest proportion of intervals, reaching 36%; in the interval line chart of the accompanying melody, the fifth degree accounts for the highest proportion, up to 17%. After the absolute value of the interval reaches 13, the interval proportions of the main melody and the accompanying melody stabilize between 0.6 and 0.9, and the two curves completely coincide.

According to the experimental data in Figure 4, the relative difference of the main melody fluctuates greatly within the interval range X(1)–X(16); when the interval variable is X(3), the relative difference is the largest, reaching 0.79. The relative difference of the accompanying melody fluctuates greatly within the interval range X(1)–X(10); when the interval variable is X(3), the relative difference is the largest, reaching 0.61. After the absolute value of the interval reaches 17, the relative differences of the main melody and the accompanying melody stabilize between 0.01 and 0.04, and the value for the main melody is always higher than that for the accompanying melody.

Figure 4. Relative difference distribution of interval statistics.

4.2. Comparative Experiment and Analysis

4.2.1. The Influence Experiment of Feature Map

Because recognition results on the same features directly reflect the recognition accuracy of different models, the experiment studies the influence of the number and size of feature maps on the detection results. In the experiment on the number of feature maps, different numbers of convolutional feature maps, ranging from 8 to 64, were selected. In the experiment on feature map size, 11 feature maps of different sizes were selected. The experimental data are shown in Tables 3 and 4.

Table 3.

Recognition results of different number of feature maps.

Quantity MAP (%) Precision@500 (%) HAM2 (%)
8 74.7 76.3 74.9
16 74.5 77.1 76.2
24 78.8 79.2 79.6
32 77.6 78.9 78.1
48 77.3 77.5 76.6
64 74.8 76.6 75.7
Table 4.

Recognition results of different feature map sizes.

Size MAP (%) Precision@500 (%) HAM2 (%)
4×4 78.7 79.2 78.8
5×5 78.9 79.2 78.6
6×6 78.8 79.2 79.6
7×7 78.7 78.9 78.6
8×8 77.6 77.9 77.7
9×9 76.4 77.2 76.8
10×10 76.8 77.5 77.1
11×11 75.7 76.6 76.3
12×12 75.8 75.9 76.1
13×13 75.2 75.6 75.7
14×14 74.1 75.7 75.8

According to the data in Table 3 and Figure 5, we can conclude that when the number of feature maps is 24, the recognition result is the most accurate: MAP reaches 78.8%, precision@500 reaches 79.2%, and HAM2 reaches 79.6%. When the number of feature maps is 8, the recognition accuracy is the lowest: MAP is 74.7%, precision@500 is 76.3%, and HAM2 is 74.9%. In general, the detection accuracy for all six settings stays above 74%.

Figure 5. Statistics of recognition results.

According to the data in Table 4 and Figure 6, we can conclude that when the feature map size is 5×5, the recognition result is the most accurate: MAP reaches 78.9%, precision@500 reaches 79.2%, and HAM2 reaches 78.6%. When the feature map size is 14×14, the recognition accuracy is the lowest: MAP is 74.1%, precision@500 is 75.7%, and HAM2 is 75.8%. In general, the detection accuracy for all 11 sizes stays above 74%.

Figure 6. Statistics of recognition results.

4.2.2. Comparison with Other Methods

In order to test the performance of the music recognition model, the experiment compared the improved SNN music recognition model proposed in the article with the baseline SNN model and three other models. Five different numbers of bits were chosen; the number of bits, like the sampling accuracy, determines how finely the music is represented: the higher the bit rate, the more detailed the changes in the music that can be reflected. The detection accuracy of the five models was observed under the different bit settings. The specific experimental data are shown in Table 5.

Table 5.

Average mean precision of different number of bits.

Method 16 bits 24 bits 32 bits 48 bits 64 bits
SNN music recognition model 55.2 56.6 55.8 58.1 59.2
Improved SNN music recognition model 76.7 77.9 78.3 78.9 79.3
CNNH music recognition model 46.5 52.1 52.1 53.2 53.3
KSH music recognition model 30.3 33.7 34.7 35.6 36.5
ITQ music recognition model 16.2 16.9 17.3 17.5 17.9

According to the data in Table 5 and Figure 7, we can conclude that the improved SNN music recognition model proposed in the article has the highest detection accuracy among the 5 music recognition models. When the number of bits is 64, the detection accuracy of the SNN model is 59.2%, and that of the improved SNN model is 79.3%, which is 61.4% higher than the 17.9% detection rate of the ITQ music recognition model. The experimental data further show that the improved SNN music recognition model has the highest detection efficiency, which greatly promotes the efficiency of music feature auxiliary recognition.

Figure 7. Average mean precision statistics.

4.3. Test Model Performance Comparison Test

4.3.1. Evaluation Criteria

The evaluation criteria are as shown in Table 6.

Table 6.

Evaluation criteria table.

Index Metrics Formula
Accuracy The accuracy criterion is the ratio of the number of correctly recognized music types to the number of all music types [24]; the larger the value, the more accurate the recognition Precision = hits_u/recset_u
Recall rate The recall criterion is the proportion of hits relative to the theoretically largest number of hits when recognizing music features [25]; the larger the value, the more accurate the recognition Recall = hits_u/testset_u
F1 measurement The F1 measure balances accuracy and recall by favoring the smaller value; the larger the value, the more accurate the recognition F1 = (2 × precision × recall)/(precision + recall)
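A small sketch of the three criteria in Table 6, written per user/test item with assumed count variables (number of hits, size of the recommended set, size of the test set); it is illustrative and not the paper's evaluation code.

```python
def precision(hits: int, recset_size: int) -> float:
    return hits / recset_size            # share of recommended items that are correct

def recall(hits: int, testset_size: int) -> float:
    return hits / testset_size           # share of relevant items that were found

def f1_score(p: float, r: float) -> float:
    return 2 * p * r / (p + r)           # harmonic mean balances precision and recall
```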

4.3.2. Experimental Results and Analysis

In order to test the performance of the SNN music feature-assisted recognition model, we ran the model proposed in the article and the other music recognition models under noisy and noise-free music conditions and compared their detection accuracy. To make the experimental results more representative, several different types of music data were selected; the experiment detects these music data with and without noise and observes the results. The music sample data are shown in Table 7, and the specific detection results are shown in Tables 8 and 9.

Table 7.

Music classification and detection object table.

Music type number Noisy No noise
1 10 30
2 10 30
3 20 40
4 10 40
5 20 50
6 20 50
Table 8.

No-noise recognition results.

Model Accuracy (%) Precision (%) Recall rate (%) F1 score (%)
SNN music recognition model 95.71 96.83 96.74 95.82
Improved SNN music recognition model 97.97 98.10 97.89 98.00
CNNH music recognition model 80.21 82.31 83.24 84.51
KSH music recognition model 72.14 73.46 90.26 91.32
ITQ music recognition model 67.47 68.24 66.76 67.12
Table 9.

Noisy recognition results.

Model Accuracy (%) Precision (%) Recall rate (%) F1 score (%)
SNN music recognition model 92.12 91.83 91.64 91.28
Improved SNN music recognition model 93.91 94.21 94.74 94.62
CNNH music recognition model 75.32 77.23 76.34 77.21
KSH music recognition model 68.62 69.24 69.12 68.24
ITQ music recognition model 70.23 71.22 74.21 72.45

According to the data in Table 8 and Figure 8, we can conclude that the improved SNN music recognition model proposed in the article has the highest detection accuracy, with an accuracy rate of 97.97% and a detection accuracy value of 0.88, the highest among the 5 music recognition models. The ITQ music recognition model has the lowest detection accuracy, with an accuracy rate of 67.47% and a detection accuracy value of only 0.3. The CNNH and KSH music recognition models fall between the highest and lowest values.

Figure 8. Comparison of music classification and detection accuracy without noise.

We can see from Figure 9 that the ITQ music recognition model has the lowest detection accuracy: 67.47% in the absence of noise and 70.23% in the presence of noise. Although it applies a certain noise-removal technique that suppresses noise interference to some extent, it cannot accurately describe music information, so its detection accuracy remains low. The detection accuracy of the KSH music recognition model is higher than that of the ITQ model; it can accurately describe the changes of music signals, but it has certain defects in noise processing, and its music detection error rate is relatively large. The SNN music feature-assisted recognition model proposed in the article has the highest detection accuracy among the five models and detects many types of music; it can analyze music signals more comprehensively and systematically, with an accuracy rate as high as 99.12%, thus greatly improving the efficiency of music detection. It is believed that the detection accuracy can be further improved by using better feature extraction approaches.

Figure 9. Comparison of music classification and detection accuracy with noise.

5. Conclusion

Today we are in an era of informatization and intelligence. Using intelligent methods to study music has attracted more and more attention, and computer music has achieved many results with a very broad market prospect. Simulating music signals with a computer involves not only computers and music but also a great deal of complex professional knowledge. At present, there are still many problems in the artificial-intelligence-aided recognition of music features; although the music feature recognition auxiliary model designed in the article can analyze and identify music signals efficiently, the way music is represented needs further research.

Data Availability

The experimental data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding this work.

References

1. León P. J. P. D., Quereda J. Feature-driven recognition of music styles. Proceedings of the Pattern Recognition and Image Analysis, First Iberian Conference, IbPRIA 2003; June 2003; Puerto de Andratx, Mallorca, Spain. DBLP; pp. 51–65.
2. Sinith M. S., Tripathi S., Murthy K. V. V. Real-time swara recognition system in Indian Music using TMS320C6713. Proceedings of the 2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI); 2015; Kochi, India; pp. 51–65.
3. Costa Y., Oliveira L., Koerich A. Music genre recognition using gabor filters and LPQ texture descriptors. Proceedings of the Iberoamerican Congress on Pattern Recognition; 2013; Berlin Heidelberg: Springer; pp. 114–121.
4. Costa Y., Oliveira L., Koerich A. Music genre recognition based on visual features with dynamic ensemble of classifiers selection. IEEE. 2013;12(08):21–32.
5. Liu X. Application and research on optical music recognition. Computer Engineering. 2003;12(11):21–36.
6. Calvo-Zaragoza J., Toselli A. H., Vidal E. Early handwritten music recognition with hidden Markov models. Proceedings of the International Conference on Frontiers in Handwriting Recognition; 2017; IEEE; pp. 21–31.
7. Janani S., Iyswarya K., Visuwasam L. A. Critical survey on music emotion recognition techniques for music information retrieval. Progress in Textile Science & Technology. 2011;26(1):11–17.
8. Koh E., Dubnov S. Comparison and analysis of deep audio embeddings for music emotion recognition. Acoustic Engineering. 2021;04(12):117–121.
9. Fukayama S., Goto M. Adaptive aggregation of regression models for music emotion recognition. Journal of the Acoustical Society of America. 2016;140(4):3091. doi: 10.1121/1.4969635.
10. Ki Woong L., Bong C. The music score recognition system of the robust music symbols distortion for computer games. Journal of The Korean Society for Computer Game. 2015;28(4):17–26.
11. Rocha B., Panda R., Rui P. P. Dimensional music emotion recognition: combining standard and melodic audio features. Proceedings of the 10th International Symposium on Computer Music Multidisciplinary Research (CMMR 2013); 2014; pp. 21–31.
12. Vatolkin I., Preuß M., Rudolph G. Training Set Reduction Based on 2-Gram Feature Statistics for Music Genre Recognition. Technische Universität, Faculty of Computer Science, Algorithm Engineering; 2008. pp. 45–52.
13. Patil K., Elhilali M. Biomimetic spectro-temporal features for music instrument recognition in isolated notes and solo phrases. EURASIP Journal on Audio Speech and Music Processing. 2015;2015(1):1–13. doi: 10.1186/s13636-015-0070-9.
14. Vatolkin I., Nagathil A., Theimer W. Performance of Specific vs. Generic Feature Sets in Polyphonic Music Instrument Recognition. Berlin, Heidelberg: Springer; 2013. pp. 14–16.
15. Li juan Y. U. Study on the music recognition method based on voiceprint recognition. Automation & Instrumentation. 2018;04(16):25–32.
16. Guobin C., Sun Z., Zhang L. Road identification algorithm for remote sensing images based on wavelet transform and recursive operator. IEEE Access. 2020;8:141824–141837.
17. Ning X., Li W., Tang B., He H. BULDP: biomimetic uncorrelated locality discriminant projection for feature extraction in face recognition. IEEE Transactions on Image Processing. 2018;27(5):2575–2586. doi: 10.1109/tip.2018.2806229.
18. Hong L. L. Y., Hong-Jiang Z. A new approach to query by humming in music retrieval. Proceedings of the 2001 IEEE International Conference on Multimedia and Expo; August 22-25, 2001; Tokyo, Japan; pp. 322–324.
19. Liu D., Zhang N., Zhu H. Form and mood recognition of Johann Strauss's waltz centos. Chinese Journal of Electronics. 2003;12(4):587–593.
20. Liu D., Zhang N., Zhu H. CAD system of music animation based on form and mood recognition. Pattern Recognition and Artificial Intelligence. 2003;16(3):271–283.
21. Juslin N., Laukka P. Improving emotional communication in music performance through cognitive feedback. Musicae Scientiae. 2004;12(08):151–183.
22. Juslin N., Madison G. The role of timing patterns in recognition of emotional expression from musical performance. Music Perception: An Interdisciplinary Journal. 1999;17(2):197–221.
23. Chen G., Zhang Y., Wang S. Hyperspectral remote sensing IQA via learning multiple kernels from mid-level features. Signal Processing: Image Communication. 2020;83:115804. doi: 10.1016/j.image.2020.115804.
24. Yu D., Wang S., Deng L. Sequential labeling using deep-structured conditional random fields. IEEE Journal of Selected Topics in Signal Processing. 2010;4(6):965–973. doi: 10.1109/jstsp.2010.2075990.
25. Wu C. Application of digital image based on machine learning in media art design. Computational Intelligence and Neuroscience. 2021;2021:8546987. doi: 10.1155/2021/8546987.
