Abstract
Since electronic music is simpler to produce and distribute than analog music, the variety of music available worldwide has increased rapidly along with the music marketplace’s shift from analog to digital. Because of the abundance of available songs, people are discovering music in various ways; one of them is by analyzing its emotional content. Moreover, not every piece of music is suitable for every age group. Deep learning techniques have yielded excellent results recently, marking a significant advance in NLP; however, there have been few attempts to use a deep learning model to filter lyrics from inappropriate music. Hence, a deep learning-based lyrics text classification process is presented in this proposal. Firstly, the indispensable text data are fetched from standard online resources and then passed to the text pre-processing stage. After that, the resultant pre-processed text is subjected to the Serial Cascaded Hybrid Adaptive Deep Networks (SCHADNet) for classification purposes. The Transformer-based Bidirectional Long Short-Term Memory (Trans Bi-LSTM) is integrated with a Gated Recurrent Unit (GRU) to develop the SCHADNet model, where the parameters of the SCHADNet are optimally tuned by the Improved Marine Predators Algorithm (IMPA). Lastly, the classified outcome is obtained from the SCHADNet. The developed model shows significant advancement in classification performance, achieving an accuracy of 93.4%, a recall of 93.47%, and an NPV of 99.2%. A numerical analysis of the suggested lyrics text classification model against numerous classical text classification techniques is performed to portray the effectiveness of the presented model.
Keywords: Lyrics text classification, Serial cascaded hybrid adaptive deep networks, Transformer-based bidirectional long short-term memory, Improved marine predators algorithm
Subject terms: Engineering, Optics and photonics
Introduction
Since the beginning of time, music has been a significant part of our lives. It profoundly affects the state of mind, ideas, and interactions with others while also evoking human feelings1. Our cultural and social life is enhanced by music, which has a range of effects on us. Perhaps the most widely used medium for information, pleasure, and leisure in the past few decades is music. Since lyrics are a means for artists to express themselves, the library of electronic music is expanding quickly2. There are lyrics that hint at aggressive, sexual, or drug themes and contain material that is not appropriate for children’s ears. Recognizing the mood in music is an ongoing area of exploration. It uses various techniques to identify the feelings connected to a musical composition3. These include lyric text analysis, audio evaluation, and other approaches. The majority of studies on music classification rely on examining auditory signals and musical characteristics4. Employing a slang vocabulary is the initial method. This technique compares a song’s lyrics to a list of phrases considered obscene or improper. The music is deemed unsuitable if one or more of these phrases appear in its lyrics5. Nevertheless, since there is no single profanity vocabulary used by all businesses, the outcomes of this approach could differ6. To ensure the swearing lexicon is updated with the latest offensive words, ongoing maintenance is necessary when using this strategy.
It is challenging to satisfy the requirements of users experiencing a range of emotions when the majority of tools just suggest well-known songs while ignoring personalized efforts7. The process of creating classification labeling was primarily manual prior to the development of sophisticated software, and tracks with various musical genres were arranged into appropriate song categories8. Nevertheless, these techniques are not just ineffective but highly dependent on human judgment, and the precision of classification is not consistent9. The classic classification techniques, which mostly consist of techniques like Logistic Regression (LR), Naive Bayes (NB), Random Forest (RF), and Support Vector Machines (SVM), are currently maturing based on human classification10. Deep learning and other machine learning techniques have been used extensively to classify sound, picture, and text, and the results have been impressive11. With the development of computer-related methods, computers are now capable of doing intricate calculations and emotional evaluation, as well as generating emotional outcomes12. The word embeddings of the lyrical data are used to determine the genre, which is highly related to the lyrics.
The minds of adolescents may be significantly impacted by such lyrics. Lyrics are becoming more explicit and aggressive13. Nevertheless, the methods in place to filter explicit words from song lyrics are ineffective14. Several methods have been proposed to classify texts, such as deep learning techniques like CNNs and RNNs, classification using machine learning methods, and lexicon-based filters. These experiments have produced differing degrees of efficiency and were carried out on various data sources and dialects15. According to some research, employing more sophisticated machine learning classifiers could assist in achieving even greater gains16. While techniques based on machine learning demonstrate promising outcomes in classifying music feelings, there is still room for improvement in the comprehensive identification of musical feelings, because the connection between phrases and the harmony’s sentiment can be interpreted in various ways when phrases and melodies are processed separately, without taking the consistency of feelings between lyrics and melody into consideration17.
Motivation of the developed model
In general, music plays an important role in human emotions. Moreover, the lyrics are a vital part of a song and play an inevitable role in understanding the emotions of the song. It is crucial to categorize lyrics using various machine learning and deep learning approaches18. Several well-known classification techniques have been adopted to classify lyrics text from labeled data. In recent times, the utilization of deep learning models like CNN and RNN has achieved superior outcomes and provided an exciting breakthrough with the help of Natural Language Processing (NLP)19. The imbalance of data in traditional models can result in biased models that underperform on less frequent classes due to the uneven class distribution. Noise, misspellings, and inconsistent data can easily affect the data quality in a CNN20. Training a CNN model is computationally expensive and requires significant memory to capture the sequential nature of the lyrics. On the other hand, the RNN model has the ability to process data sequentially, yet it struggles to parallelize the computations. Thus, it results in a slower training process than the other traditional approaches21. Existing traditional models still struggle with inconsistent and redundant data, which often leads to misclassification. To rectify the issues in the existing models, this research work develops an effectual deep learning-based lyric text classification model to alleviate such challenges, and the contributions are given as follows.
To develop the effectual deep learning-based lyric text classification model using the optimization approach that helps to categorize the songs based on its mood, genre, sentiment, and performer.
To design the SCHADNet-based text classification model useful for recognizing and analyzing the meanings used in lyrics and facilitates the analysis of songs’ context within history and culture.
To enhance the accuracy, and sensitivity along with reducing the FNR and FPR, the parameters like hidden neurons in Trans-Bi-LSTM and GRU, epochs in Trans-Bi-LSTM and GRU are tuned in the developed SCHADNet-based text classification model using the proposed IMPA algorithm.
To develop and evaluate the IMPA model by modifying the traditional MPA with an effective concept that helps in the parameter tuning and performance enhancement of the suggested lyrics text classification.
To demonstrate the efficacy of the proposed model, an empirical evaluation was conducted for the lyrics text classification approach against a variety of traditional text classification methods.
The layout of the suggested framework is provided below. The automatic lyrics text classification using an improved heuristic approach and serial cascaded hybrid adaptive deep network is shown in Section "Automatic lyrics text classification using an improved heuristic approach and serial cascaded hybrid adaptive deep networks". The pre-processing of text data for lyrics text classification is provided in Section "Text data pre-processing with performance enhancement in lyrics text classification using improved marine predators algorithm". The hybrid adaptive deep networks for the lyrics text classification model are offered in Section "Serial cascaded hybrid adaptive deep networks for lyrics text classification model". The result and discussion are available in Section "Result and discussion". The conclusion is offered in Section “Conclusion”.
Literature survey
Related works
In 2020, Abdillah et al.22 have employed the deep learning Bi-LSTM algorithm with GloVe-weighted keywords to identify a song’s feelings utilizing its lyrics. The accuracy of the Bi-LSTM framework with a dropout layer and activity regularization was determined to be 91.08%. The difference between validation and training loss could be reduced by approximately 0.15 if the settings for dropout, activity regularization, and learning rate decay were adjusted.
In 2023, Revathy et al.23 have used the Music4All database to assess the musical elements crucial for identifying four main human feelings: joyful, furious, calm, and unhappy. Several artificial intelligence methods based on a conceptual psychological model were used to accomplish this. To predict the mood of the desired information, a transfer learning method was used to comprehend the emotions of the lyrics derived from an in-domain database. A rudimentary lyric-suggestion network was created using the sentence-transformer concept.
In 2022, Jia24 has proposed an approach for classifying musical emotions based on enhanced attention mechanisms and extensive knowledge. The characteristics of the tune’s songs were initially extracted, yielding the term frequencies weighted matrix and phrase vector. By combining the matched attention system with the extracting features capabilities of CNN and LSTM networks to handle serialized input, a framework for evaluating feelings was created. Ultimately, the CNN-LSTM model along with the Deep Neural Network (DNN)’s data outputs was combined, and the SoftMax algorithm was utilized to determine the different emotion kinds. Given the chosen data sets, the tests revealed that the suggested method’s mean accuracy in classification reached 0.848, greater than the average of the other comparative methods, while the method’s categorization efficiency had significantly increased.
In 2023, Chen et al.25 have developed a model by combining deep learning and machine learning to extract explicit lyrics from songs. The suggested model, ELSTM-VC, was compared to other algorithms due to its integration of extra branch classifiers and LSTM. With its ability to identify sexually explicit material in English phrases, the ELSTM-VC has potential applications in the entertainment sector. The suggested method successfully identified explicit phrases, according to the study’s findings, which were based on an array of 100 songs on Spotify utilized for learning. It has the ability to accurately extract content that is objectionable for younger audiences. The suggested strategy outperformed other strategies, such as encoding-decoding algorithms and machine learning models.
In 2023, Li et al.26 have suggested a multimodal structure for classifying music genres that used lyrics and audio files. By embracing the complementary nature of multisensory data, it is possible to achieve a more thorough representation of musical styles. A CNN was employed to gather audio characteristics after the structure had first retrieved the audio’s mel-spectrogram. BERT used multiple methods concurrently to acquire the lyrics’ distributed representation. Subsequently, the two multimodal pieces of data were combined using several techniques, including feature- and decision-level fusion. To address the significant difference in convergence rate between the sound channel and the melody stream, an asynchronous technique was employed at the beginning of the two streams with various models. A number of tests were conducted to confirm the suggested model’s efficacy. In terms of music genre categorization, the suggested approach’s F1 score reached 0.87, a value almost 4% greater than the strongest baseline in the trial.
In 2018, Delbouys et al.27 have developed the multimodal musical mood forecasting model using a track’s words and sound input. The use of conventional feature engineering-based methods was replicated and put forth a novel deep learning-based model. The method was able to outperform conventional algorithms on the excitation identification task, but both techniques performed similarly on the emotion forecasting challenge. The efficacy of both methods was assessed on a collection of data that had 18,000 recordings with related arousal and valence scores. The integration of modality optimized concurrently for every single-modal model resulted in a significant increase in valence predictions when evaluated afterward. A portion of the database was made available for examination.
In 2023, Carmo et al.28 have identified an imbalance in the existing research on musical data mining by applying text-based representation methods to the issue of categorizing melodic sub-genres. Identifying the line that separates groups from a single category is the challenge of the issue, given that they share several characteristics. Extensive tests were conducted in order to determine the most effective blend of written models and classifiers. The findings demonstrated that enhanced Bag-of-Words (BoW) using the Support Vector Machine (SVM) with LR methods outperformed DNN and integrating algorithms in terms of performance. The findings may lead to further research on the classification of texts with complex and delicate interfaces of separateness.
In 2017, Tsaptsinos29 has created models for continuous neural systems for organizing a big collection of whole lyrics to songs. To use each of these strata and comprehend the significance of the phrases, paths, and sections, a Hierarchical Attention Network (HAN) was utilized. Lyrics display a hierarchical layered framework, where words merge to create lines, lines create sections, and sections make the whole song. A reduced database of 20 genres was used and an expanded dataset with 117 genres to evaluate the framework. The HAN’s performance in experimental data was superior to that of less difficult computational models and non-neural designs, and it was also capable of discriminating across a wider range of categories than previous studies. During the process of learning, it will additionally be possible to see what lyrics or words of the music that the example considers crucial for dividing its genre. Consequently, the HAN offered insights into the linguistic characteristics and poetic organization that distinguish distinct genres of music from a computing standpoint.
Problem statement
Text classification is a common process that includes categorizing the text into groups utilizing advanced approaches. The text classifier has the ability to evaluate the text and assign pre-defined classes or tags based on its content. From the lyrics text classification, approaches such as categorizing music mood, genre, sentiment, and performer can be carried out. Numerous text classification works have been presented using lyrics. Some of the method’s merits and issues are given in Table 1.
In conventional techniques, dealing with a massive amount of data in high-quality datasets can easily degrade the accuracy of the model. Training and testing a large amount of data is a time-consuming and challenging process. Incorporating the transformer, Bi-LSTM, and GRU models ensures that the intrinsic patterns are learned and makes it possible to train the model on a large amount of data. Thus, it greatly strengthens the accuracy of the lyrics text classification model.
Understanding the contextual relationship of words and phrases is difficult and prone to increasing errors in the text data. Overfitting and poor performance on unseen data can result from existing deep learning models. On the other hand, the model implemented in this research work splits the data into training and testing phases. The developed model can thereby minimize overfitting issues and improve the model’s overall performance in this context.
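The split described above can be sketched minimally as follows (the 80/20 ratio and the helper name are assumptions for illustration; the paper does not state its exact proportions):

```python
import random

def train_test_split(samples, test_ratio=0.2, seed=42):
    """Shuffle and partition samples into train/test sets (assumed 80/20)."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

lyrics = [f"song_{i}" for i in range(100)]
train, test = train_test_split(lyrics)
```

Holding out a disjoint test partition in this way is what allows overfitting to be detected: a model that memorizes the training lyrics will score poorly on the unseen split.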
Existing preprocessing techniques often provide inaccurate outcomes, especially with inconsistent formatting, noisy data, and the nuances of musical language. Eliminating the redundant and inconsistent content in lyrics text data is challenging for traditional techniques. However, this research work focuses on effective data pre-processing, considering punctuation and special character removal, removal of redundant and inappropriate data, and stemming, to improve the overall performance. The data pre-processing phase eliminates noisy content to maximize the model’s accuracy.
The presence of repeated data can impact the classification performance of traditional models. Most research works do not focus on tuning the parameters. The optimization algorithm’s parameter tuning plays a crucial role in selecting the optimal parameters. In this research work, fine-grained parameter optimization is performed with the help of the IMPA algorithm by selecting the appropriate parameters to obtain the optimal solutions.
Table 1.
Discussion on the conventional lyric text classification models.
| Author [citation] | Methodology | Features | Challenges |
|---|---|---|---|
| Abdillah et al.22 | Bi-LSTM | • It enhances the available network data and the contexts. • It offers better data representations. | • It performs slow calculations. • It consumes more training time. |
| Revathy et al.23 | BERT | • It gives high-accuracy solutions. • It requires very little memory. | • It is a very expensive model and demands more computation. • It has a complex network. |
| Jia24 | CNN | • It automatically recognizes the relevant patterns. • It can process a high amount of data with more accuracy. | • It shows limited efficacy for sequential data. • It may be affected by overfitting issues when processing small datasets. |
| Chen et al.25 | LSTM | • It offers high-accuracy solutions. • It rectifies the gradient issues of the network. | • It is computationally intensive and expensive. • It is prone to chaotic, complex, and noisy data. |
| Li et al.26 | BERT | • It provides better predictions due to its bi-directional nature. • It evaluates all the input without any particular direction. | • It is not a straightforward process. • It produces undesired outcomes. |
| Delbouys et al.27 | DNN | • The computation required is minimal. • It is very flexible and performs complex tasks. | • It is very hard to interpret and lacks domain expertise. • It needs more data to train the network. |
| Carmo et al.28 | SVM | • It prevents the network from overfitting issues. • It performs rapid prediction and has good generalization. | • Complete data sources without any missing values are necessary. • It provides poor performance for large data sources. |
| Tsaptsinos29 | HAN | • It is very helpful in detecting the significant data. • It provides better functionality in complex data. | • It has dimensional issues. • It consumes more resources for the execution. |
Automatic lyrics text classification using an improved heuristic approach and serial cascaded hybrid adaptive deep networks
Proposed lyrics text classification model
Textual lyrics pose a number of classification issues. The subjective nature and comprehension of lyrics serve as one of the primary obstacles. Individual listeners can interpret identical lyrics in different ways as they reflect the listener’s events, feelings, and viewpoints. Because of this individuality, it is challenging to develop a systematic categorization scheme that faithfully conveys the lyrics’ purposeful meaning. Lyrics’ intricate structure of language presents another difficulty. Numerous songs include literal spoken language, analogies, phraseology, and symbolic references that may pose a challenge for algorithmic techniques to understand. Advanced techniques are needed to effectively categorize and evaluate lyrics due to their complex and nuanced wording.
The overwhelming amount of lyrical data is another major obstacle. To manage the quantity of tunes and lyrics, sophisticated algorithms are necessary for analyzing and evaluating such large volumes of text. The variety of styles and categories also makes categorization even more difficult. Developing a classification system that works effectively for all genres of music is difficult because every genre may have its own distinct lyrical qualities. Furthermore, delicate or sexual content occasionally appears in songs. This makes it difficult to moderate material and guarantee proper filtering, particularly on sites wherein lyrics are posted publicly. Keeping a secure and welcoming atmosphere requires the development of strong content-filtering algorithms that can reliably recognize and identify possibly dangerous or unsuitable lyrics. By dealing with these issues, it is possible to gain a greater understanding of the global context of lyrics and songs, which enhances one’s comprehension and appreciation for the uniqueness of songs. So, we developed an effectual lyric text classification, and the pictorial view is provided in Fig. 1.
Fig. 1.
Pictorial view of the developed lyric text classification model.
A novel lyric text classification model is implemented, where the primary objective is to effectively categorize songs based on their mood, genre, sentiment, and performer, resulting in a better understanding of the songs for further analytics. This classified solution helps listeners and scholars examine and investigate musical patterns, themes, and styles. Most of the time, the lyrics have a more implicit and subtle tone, demanding a deeper understanding of the emotional undertones. Also, the emotional categorization of lyrics is somewhat subjective due to the music and the personal interpretation of the lyrics. Therefore, classifying the emotions in song lyrics is more significant than in any other text, such as books. This lyric classification model provides insights into the individual’s inner feelings. Generally, the necessitated data are garnered from benchmark data sources. In addition, the gathered data is subjected to pre-processing to enhance its quality. In this stage, operations such as (i) punctuation and special character removal, (ii) removal of redundant and inappropriate data, and (iii) stemming are performed. After the text pre-processing, the resultant data is given to the classification stage for categorizing the lyrics text. The SCHADNet model is created to achieve effective text classification by combining the Trans Bi-LSTM and GRU models. To enhance the accuracy and sensitivity along with reducing the FNR and FPR, parameters like the hidden neurons in the Trans Bi-LSTM and GRU and the epochs in the Trans Bi-LSTM and GRU are tuned in the developed SCHADNet-based text classification model using the proposed IMPA algorithm. The developed SCHADNet model provides the text-classified results. The output classes are the various genres, moods, performers, and sentiments of the song.
Text dataset for classification analysis
The data necessitated to carry out the lyric text classification model are as follows.
Dataset-1 ("Multi-Lingual Lyrics for Genre Classification dataset"): The data are collected from https://www.kaggle.com/datasets/mateibejan/multilingual-lyrics-for-genre-classification?select=train.csv (access date: 2024-01-02). This dataset is hosted on the Kaggle platform. It includes two .csv files with 11 columns. The size of this dataset is 341 MB, and it includes 291,118 songs.
Dataset-2 (“Song-lyric-classification datasets”): The data are garnered from https://github.com/wojtek11530/song_lyric_classification/tree/master/datasets (access date: 2024-01-03). The database is aimed at predicting the emotions of songs on the basis of their lyrics and genres. The size of this dataset is 1.86 MB, and it includes 1369 songs. This dataset is provided as a .csv file.
Dataset-3 (“Veucci/lyric-to-3genre”): This dataset has been accessed via https://huggingface.co/datasets/Veucci/lyric-to-3genre (access date: 2024-08-16). This data source includes numerous song lyrics from distinct genres and artists in English. Genres such as rock, hip-hop, and pop are included in this resource.
By utilizing these datasets, the mood, genre, sentiment, and performer of the song are classified from the lyrics.
From the datasets, the collected data are defined by $T_m^{dat}$, where $m = 1, 2, \ldots, M$. In this, the total count of the gathered texts is expressed as $M$.
Text data pre-processing with performance enhancement in lyrics text classification using improved marine predators algorithm
Text data pre-processing
The text pre-processing is a significant step in data preparation, where the original text is converted into a suitable, clean, and consistent format for modeling and evaluation. While modern techniques such as BERT, BART, and GPT handle punctuation and stop words internally, a pre-processing stage is still needed for several reasons, such as improving data quality, noise minimization, computational efficiency, and model interpretability. Moreover, the mentioned BERT, BART, and GPT are transformer-based models, and pre-processing the text beforehand may be less time-consuming than delegating this task to these models. Hence, text pre-processing is performed in this work to improve the text quality. The collected data $T_m^{dat}$ are inputted to the text data pre-processing stage.
Punctuation and special character removal
Punctuation removal is the process of replacing or deleting the punctuation marks in the text data. Some of the punctuation marks are periods (.), commas (,), colons (:), parentheses (()), dashes (-), and so on. By removing these punctuation marks, the text data can be simplified, noise can be minimized, and focus can be placed on meaningful words. Special character removal is the process of removing special characters such as symbols (+, -, =), currency signs ($), HTML tags (<, >), and so on. This special character removal results in enhanced model performance, text representation, and so on.
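A minimal sketch of this step using Python's `re` module (the exact character inventory removed by the paper is an assumption here):

```python
import re

def strip_punct_special(text: str) -> str:
    """Remove HTML tags, punctuation, symbols, and currency signs; keep words and spaces."""
    text = re.sub(r"<[^>]+>", " ", text)      # drop HTML tags like <br>
    text = re.sub(r"[^\w\s]", " ", text)      # drop punctuation and special characters
    return re.sub(r"\s+", " ", text).strip()  # collapse repeated whitespace

cleaned = strip_punct_special("Hey, (listen)! <br> $100 -- it's over...")
# → "Hey listen 100 it s over"
```

Note that the order matters: tags must be removed before the generic punctuation pass, since `<` and `>` would otherwise be stripped first and leave the tag names behind as words.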
Remove redundant and inappropriate data
Redundant data is material that is repeated or reproduced within the lyrics. Eliminating duplicate information simplifies the classification procedure by decreasing needless reiteration. In lyrics text classification, improper data is data that is unrelated or obnoxious to the classification process. This also includes information that is irrelevant, offensive, or explicit to the operation. The improper data can include particular terms or whole songs that are not related to the classification aim. For example, if the aim is to categorize the lyrics on the basis of their emotional content, whole songs or terms that contain explicit or offensive language irrelevant to the emotions can be considered improper and eliminated during the pre-processing stage.
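A simple order-preserving de-duplication of repeated lyric lines, sketched in Python (the paper's exact redundancy criterion is not specified; an exact, case-insensitive line match is assumed):

```python
def drop_redundant_lines(lyric_lines):
    """Keep the first occurrence of each line; drop exact repeats (e.g. repeated choruses)."""
    seen = set()
    kept = []
    for line in lyric_lines:
        key = line.strip().lower()
        if key and key not in seen:
            seen.add(key)
            kept.append(line)
    return kept

lines = ["Hello darkness", "my old friend", "Hello darkness", ""]
unique = drop_redundant_lines(lines)
# → ["Hello darkness", "my old friend"]
```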
Stemming
The practice of distilling words to their basic or core shape is referred to as stemming. By using this strategy, the lyrics’ terms can be normalized while reducing the intricacy and diversity of terminology. It can gather several variants of a single word, like "performing" and "performs," into a common root, like "perform." This streamlines the categorization procedure by treating comparable words, irrespective of their particular form, as identical. By lowering the dimensionality of the information, stemming can increase the precision and effectiveness of models for lyrical text categorization. After executing all these processes, the resultant pre-processed data is indicated by $T_m^{pre}$.
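The suffix-stripping idea can be illustrated with a toy stemmer (a deliberately simplified sketch for illustration, not the full Porter algorithm that production stemmers implement):

```python
def toy_stem(word: str) -> str:
    """Strip a few common English suffixes to approximate a root form."""
    for suffix in ("ing", "ed", "es", "s"):
        # Require a remaining stem of at least 3 letters to avoid over-stripping
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)]
    return word

roots = [toy_stem(w) for w in ("performing", "performs", "performed")]
# → ["perform", "perform", "perform"]
```

All three surface forms collapse to the single token "perform", which is exactly the dimensionality reduction described above.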
Marine predators algorithm
This part reviews the MPA30 method, a straightforward and effective meta-heuristic optimization technique.
Detecting top predator phase
MPA is a population-based approach, where the initial population is distributed uniformly over the search space as in Eq. (1).

$$X_{0} = X_{\min} + rand \otimes \left(X_{\max} - X_{\min}\right) \tag{1}$$

Here, the term $rand$ is a uniform random matrix in the interval from 0 to 1, and $X_{\min}$ and $X_{\max}$ are the lowest and highest limits for the parameters.
The elite matrix is created by selecting the fittest outcome as the top predator. This matrix’s rows manage the process of searching for and locating the target using the position of the target, as in Eq. (2).

$$Elite = \begin{bmatrix} X_{1,1}^{I} & X_{1,2}^{I} & \cdots & X_{1,d}^{I} \\ X_{2,1}^{I} & X_{2,2}^{I} & \cdots & X_{2,d}^{I} \\ \vdots & \vdots & \ddots & \vdots \\ X_{n,1}^{I} & X_{n,2}^{I} & \cdots & X_{n,d}^{I} \end{bmatrix} \tag{2}$$

Here, the term $Elite$ is the elite matrix; in order to create the $Elite$, the top predator vector, denoted by $\vec{X}^{I}$, is copied $n$ times. The total number of dimensions is $d$, while the number of search agents equals $n$. If a better predator replaces the current top predator at the conclusion of each cycle, the term $Elite$ is updated.
$Prey$ is a different matrix having identical dimensions as $Elite$, and predators adjust their positions according to it. To put it simply, initialization produces the first batch of prey, and then the fittest of them builds the $Elite$. The term $Prey$ is calculated using Eq. (3).

$$Prey = \begin{bmatrix} X_{1,1} & X_{1,2} & \cdots & X_{1,d} \\ X_{2,1} & X_{2,2} & \cdots & X_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ X_{n,1} & X_{n,2} & \cdots & X_{n,d} \end{bmatrix} \tag{3}$$

Eq. (3) shows the $j$-th dimension of the $i$-th prey as $X_{i,j}$. It ought to be mentioned that both of these matrices play a major and immediate role in the optimization procedure in its entirety.
MPA optimization scenarios
The MPA optimization procedure is broken down into three key phases, explained as follows. Levy and Brownian motions are the primary random walks used in the MPA.

The Levy motion is a kind of random walk in which the step sizes are drawn from a heavy-tailed probability distribution, commonly generated as $R_{L} = 0.05 \times x / |y|^{1/\alpha}$. Here, the distribution index is specified as $\alpha$, and the attributes $x$ and $y$ are normally distributed random variables.

The probability function with zero mean and unit variance determines the step length in the stochastic operation of Brownian motion. It is expressed as $f_{B}(x) = \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{x^{2}}{2}\right)$. Here, $\mu = 0$ is the mean and $\sigma^{2} = 1$ is the unit variance.
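The two random steps can be drawn with the standard library only, under the common Mantegna construction for Levy steps (an assumption consistent with the MPA literature; $\alpha = 1.5$ is a typical choice):

```python
import math
import random

def brownian_step(rng):
    """Brownian step: standard normal draw, zero mean, unit variance."""
    return rng.gauss(0.0, 1.0)

def levy_step(rng, alpha=1.5):
    """Levy-distributed step via Mantegna's algorithm: x / |y|^(1/alpha)."""
    num = math.gamma(1 + alpha) * math.sin(math.pi * alpha / 2)
    den = math.gamma((1 + alpha) / 2) * alpha * 2 ** ((alpha - 1) / 2)
    sigma = (num / den) ** (1 / alpha)  # scale of the numerator normal variate
    x = rng.gauss(0.0, sigma)
    y = rng.gauss(0.0, 1.0)
    return x / abs(y) ** (1 / alpha)

rng = random.Random(0)
b_steps = [brownian_step(rng) for _ in range(1000)]
l_steps = [levy_step(rng) for _ in range(1000)]
```

The Brownian steps stay near zero with unit spread, while the Levy steps occasionally produce very large jumps; this mix of small local moves and rare long jumps is what the three MPA phases below exploit.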
Phase-1: According to the guiding rule, the optimum tactic for the predator in a high-velocity ratio ($v \ge 10$), when the prey moves faster than the predator, is to move very slowly or not at all. This rule’s mathematical framework is given in Eq. (4) and applies in the case of $t < \frac{1}{3} t_{\max}$.

$$\vec{S}_{i} = \vec{R}_{B} \otimes \left(\overrightarrow{Elite}_{i} - \vec{R}_{B} \otimes \overrightarrow{Prey}_{i}\right), \qquad \overrightarrow{Prey}_{i} = \overrightarrow{Prey}_{i} + P \cdot \vec{R} \otimes \vec{S}_{i} \tag{4}$$

Here, the term $\vec{R}_{B}$ represents a vector of random numbers based on the Brownian motion. Entry-wise multiplication is indicated by the symbol $\otimes$. Prey replicates its movements by multiplying $\vec{R}_{B}$ by $\overrightarrow{Prey}_{i}$, with $\vec{R}$ representing a vector of uniformly random values in $[0, 1]$, and $P$ providing a constant. The maximum number of iterations is $t_{\max}$, and the present iteration is $t$.
Phase-2: If the prey travels in Levy at the unit velocity ratio ($v \approx 1$), the predator’s optimum course of action is Brownian. Using Eq. (5), the prey (the first half of the population) moves in Levy while $\frac{1}{3} t_{\max} < t < \frac{2}{3} t_{\max}$.

$$\vec{S}_{i} = \vec{R}_{L} \otimes \left(\overrightarrow{Elite}_{i} - \vec{R}_{L} \otimes \overrightarrow{Prey}_{i}\right), \qquad \overrightarrow{Prey}_{i} = \overrightarrow{Prey}_{i} + P \cdot \vec{R} \otimes \vec{S}_{i} \tag{5}$$

Here, the Levy motion is represented by a vector of random numbers $\vec{R}_{L}$ drawn from the Levy distribution. Prey motion is simulated in a Levy fashion by multiplying $\vec{R}_{L}$ and $\overrightarrow{Prey}_{i}$, and predator motion is simulated by adding a step to the predator location. The update for the remaining half of the population (the predators) is based on Eq. (6).

$$\vec{S}_{i} = \vec{R}_{B} \otimes \left(\vec{R}_{B} \otimes \overrightarrow{Elite}_{i} - \overrightarrow{Prey}_{i}\right), \qquad \overrightarrow{Prey}_{i} = \overrightarrow{Elite}_{i} + P \cdot CF \otimes \vec{S}_{i} \tag{6}$$

Conversely, $CF = \left(1 - \frac{t}{t_{\max}}\right)^{2\,t/t_{\max}}$ is thought of as an adaptive parameter that regulates the step size enabling the motion of predators. The prey changes its position in response to the predators’ Brownian motion, whereas the predator’s motion is simulated by multiplying $\vec{R}_{B}$ and $\overrightarrow{Elite}_{i}$.
Phase 3: Levy is the most effective predation technique at low-speed ratios
. This stage is described in Eq. (7) While
.
![]() |
7 |
In the Levy tactics, the motion of the hunter is simulated by multiplying
and
, and the motion of the prey is simulated by adding the number of steps to
status, which aids in the updating of prey location.
Eddy formation
The Fish Aggregating Devices (FADs) impact is expressed numerically in Eq. (8).
![]() |
8 |
The possibility of FADs impacting the optimization process is represented by
. A binary column of an array containing one and zero is denoted as
. This is created by creating an arbitrary vector within the interval
, and when its length is smaller than 0.2, switching it by just one, and when it is more than 0.2, switching it to 0. Here, the term
indicates a uniform randomized integer
. A vector with the bottom and top limits of the size is denoted by the terms
and
. The letters
and
represent the prey matrixes’ randomized indices.
Proposed IMPA-based classification performance enhancement
The developed IMPA is employed for tuning the parameters in the developed SCHADNet-based lyric text classification model. The conventional MPA has several cons and pros. The exploration and exploitation are efficiently balanced by the MPA algorithm. It explores searching areas and takes advantage of the most effective solutions currently discovered by employing the concepts of predator escape and victim pursuit. It has proven to be effective in resolving intricate optimization issues involving several optima. Thus, the MPA is selected for the suggested SCHADNet model’s optimization. However, in circumstances when the problem environment is fluid, it might have trouble. Its efficiency may be impacted if it is unable to react swiftly to abrupt shifts in its surroundings. It might have trouble growing up to highly dimensional issues. The technique could make it more difficult to efficiently search for and identify the best answers when complexity rises. As a consequence, we developed an enhanced MPA named IMPA to optimize the parameters like hidden neurons in Trans-Bi-LSTM and GRU, epochs in Trans-Bi-LSTM and GRU are tuned in the developed SCHADNet-based text classification model to enhance the accuracy, sensitivity along with reduce the FNR and FPR.The conventional MPA includes phases like detecting top predators, MPA optimization scenarios, eddy formation, and marine memory saving, to provide more results lyric text classification process, but the developed IMPA is modified by removing the top predator detecting phase and marine memory saving phase from the conventional MPA.This IMPA-aided parameter tuning enhances the performance rates of the developed lyrics text classification process with better convergence rates.
Optimization process using developed IMPA
The optimization process begins by considering the population and iteration count of the movement of predators and prey in the marine ecosystem. Here, the iterative process begins by considering the several run times for getting the optimal solutions. Moreover, the solution in the developed IMPA algorithm gets encoded to maximize the model performance. The optimal solution can be effectively attained by adjusting the essential parameters by minimizing the error rate and maximizing the desired outcomes. Deriving Eq. (21), the objective function of the model is evaluated to select and understand the relevant parameters based on the algorithmic rules. Iteratively updating the random parameters could significantly improve the models searchability and provides a better balance between the exploitation and exploration abilities.Fig. 2 and Algorithm 1 offer the Flowchart and pseudo-code for the proposed IMPA algorithm.
Fig. 2.

Flowchart for the proposed IMPA algorithm.
Algorithm 1.

DevelopedIMPA
Serial cascaded hybrid adaptive deep networks for lyrics text classification model
Bidirectional long short term memory
Bi-LSTM31 was generated from an LSTM that can train on its own from the initial pattern vector of definition documentation is the Bi-LSTM part that relies on a deep learning network design.
The Bi-LSTM can generalize and capture the features deeply. As an outcome, the Bi-LSTM becomes generalized network for invisible features. Word order, bidirectional contextual relationships, and dependence can be acquired over time. Moreover, it can address disappearance gradients and inflation problems with effectiveness.
The two LSTM neural networks in Bi-LSTM, which have forward as well as backward feedback, are linked to a single layer of output as soon as it comes to classification. Bi-LSTM adds reservations and context knowledge for every point to the incoming series to increase reliability. Three gates’ configurations and a single-cell condition make up the fundamental construction of an LSTM component.
The input, forgets, and output gates control how the cell status is updated and preserved throughout the LSTM component. The forget gate decides how to keep knowledge of the previous unit nation, the output gate manages which elements of the revised state of the cell are produced, and the input gate manages what components of the fresh data are retained in the cell’s memory. The subsequent Eq. (9) to (13) illustrates the particular operation that uses the LSTM components.
![]() |
9 |
![]() |
10 |
![]() |
11 |
![]() |
12 |
![]() |
13 |
Here, the terms
and
constitute the source vector as well as concealed layer value at duration
, respectively, and
,
,
, and
indicate the results of the intake gate, forget gate, output gate, and cell at period
. The weighted array and biases variable are represented by
and
, accordingly, and their underscores, which include
and
, signify the matrix of weights and biases field that are part of the gate’s input architecture. The term
stands for the function of sigmoid activation.
Context-sensitive data can be retained by the Bi-LSTM through the use of the LSTM component.The Bi-LSTM architecture has two LSTM coatings that are identical in both directions. Like traditional LSTM neural systems, both of the concurrent layers of LSTM function in a comparable way. The input vector
in the other side is handled by two separate layers of LSTM for the front and backward instructions, accordingly, for the
temporal increment. The result is the total of the secret state carriers, which is expressed in Eq. (14).
![]() |
14 |
In this case,
indicates the bias, the terms
and
are the weighted variables for the two simultaneous layers of LSTM in the forward as well as backward orders, correspondingly, and the terms
and
represent the final outcomes of both concurrent layer LSTM.The Bi-LSTM model’s graphical presentation is displayed in Fig. 3.
Fig. 3.

Graphical presentation of the Bi-LSTM model.
Gated recurrent unit
GRU32 represents a new technology that is reminiscent of LSTM, an improved variation of RNN. It transfers data using a hidden state rather than the cell’s internal state. In addition, there are two gates: the resetting gate and an updated gate. The first gate chooses what knowledge about past events to discard. The later gate indicates that the choice to toss or retain fresh data has been made. The data is scaled from
by a sigmoid barrier and the graphically illustrated in Fig. 4.
Fig. 4.

Graphical illustration of the GRU model.
If 0, the state of hiding does not allow any knowledge to pass through, and 1 indicates that details must be inputted during the following state. The term
gates are candidate phase activation mechanisms that crush the values among
. The subsequent Eq. (15) to (18) is available from GRU.
![]() |
15 |
![]() |
16 |
![]() |
17 |
![]() |
18 |
Here, the term
serves as a timestamp
is the entered significance, and
gives the state that is hidden. The respective weights that correspond to the updating
and resetting
gating are denoted by
and
, correspondingly. On the other hand, the term
is a possible output.Some major benefits of GRU versus LSTM or similar longitudinal learning algorithms include effective training with fewer variables, insensitivity to sound, and greater distributed data within GRU. To identify temporal connections and variations in usage, GRU is used to compare what was consumed at one stamp to the amount consumed at the next stamp and make predictions on this basis. It is also economical concerning memory and time due to the usage of two Gates.
Developed SCHADNet-aided lyrics text classification
Transformer
Transformer33 is the encoder-decoder architecture that uses the sequence-to-sequence conversion. An encoder was used to convert text into vector form during this classification process.The particular functioning technique is to use the encoding component to transform the inputs to a vector that has fixed-length software. Yet, the goal of this research is to matrix encoding the initial input environment or perspective in order to extract a high-level characteristic; just the encoder component of the Transformer is employed since there isn’t a requirement to turn this encoded vector to series outputs. This section consists of
equal levels, where every layer consists of a pair of sub-layers: an entirely linked forward feed system and a multiple-head attention system. The residual link and standardization processes will proceed after the two sublayers. Mounting several scaling dot-product attention yields multiple heads of attention between individuals.The transformer’s intake constitutes a vector of
which contains
phrases acquired through the embedding layer’s input. Three numbers of linear transforming vectors
,
,
are arbitrarily set and multiplied to the inputting vector to acquire the query vector
, key vector
, and value vector
, wherein
indicates the concealed dimensions. Transformer Encoding was crucial due to the scaling of dot-product attention.Standardizing the resemblance scalar is necessary to determine its weight. After that, the closeness among every vector
of the query column and every vector
in the key matrices is computed. The vector of weights is subsequently divided by the total phrase worth in the phrase to obtain the scaling dot-product attentiveness result using Eq. (19).
![]() |
19 |
Here, the square root of the vector size
within the
matrix is often used as the coefficient of scaling
. A significantly greater number of characteristics may be obtained through repeatedly learning various categories following the
order linear change of the query, key, and value array using various settings. After that, the multi-head attention mechanism generates the following output in Eq. (20).
![]() |
20 |
Here, the concatenated vector is
explained as
. The embedded characteristics and the contents are inputted to the encoder of the transformer. Following this, the concealed presentation process
,
and the word level concealed presentation of
,
is taken.
Reason for choosingSCHADNet model
The SCHADNet model is developed for the classification of lyrical text. The pre-processed data
are given to the recommended SCHADNet model for classifying the lyric text. The models such as transformer, Bi-LSTM and GRU are incorporated into the developed SCHADNet model. This work utilized the GRU and Bi-LSTM as primary networks for the classification process. These two techniques are relatively better than the conventional transformers because of faster training and efficiency. Moreover, these networks are low-cost and determine richer features than other models. The Bi-LSTM technique can capture the contextual relationships among the words and minimizes the impact of noisy data.However, the Bi-LSTM model’s sequential nature makes it complex to parallelize, resulting in poor scalability.The Trans-Bi-LSTM network effectively performs the tasks in parallel, thus improves the scalability. Though this network has better scalability, it struggles to handle the very long sequences and may face the vanishing gradient issues. Therefore, the GRU is combined with the Trans-Bi-LSTM model, thus forming the hybrid network. This hybridized network handles the variable-length sequences and also enhances the interpretability. Here, the obtained features from the Trans-Bi-LSTM are passed to the GRU model for further processing. After classifying the text features, the GRU model offered the classified outcome. Although this serially cascaded hybrid network offers relatively promising solutions, the parameters in the network require careful tuning for achieving maximum accuracy in the text classification process. For this objective, the IMPA is considered. This is an effective algorithm offering optimal solutions with better convergence values. Therefore, by employing the IMPA, the parameter tuning is performed. Thus the SCHADNet network is chosen for text classification.
The transformer model is combined with this technique, thus constructing the Trans-Bi-LSTM. Initially, the obtained preprocessed data are fed into the Trans-Bi-LSTM model, which supports determining and processing the inputted features, and it comprises the transformer and Bi-LSTM.The SCHADNet-based lyric text classification model is used to recognize and analyze lyric texts. The parameters like hidden neurons in Trans-Bi-LSTM and GRU, and epochs in Trans-Bi-LSTM and GRU are tuned in the developed SCHADNet-based text classification model to enhance the accuracy
, sensitivity
along with reducing the FNR
and FPR
and the mathematical formulations are shown in Eq. (22) to Eq. (26). The objective function
of the recommended SCHADNet-based lyric text classificationsystem is down in Eq. (21).IMPA’s support has enabled the SCHADNet network to provide highly accurate classified solutions. In the SCHADNet training process, the dataset is divided into two sections for training and testing in the ratio of 75:25. The training data is used to train and use the SCHADNet model for the classification process.
![]() |
21 |
Here, the terms
and
denote the optimized hidden neurons in Trans-Bi-LSTM and GRU in the range of
and
, and the terms
and
define the optimized epochs in Trans-Bi-LSTM and GRU in the range of
and
.
(i) Eq. (22) is used to assess accuracy
.
![]() |
22 |
(ii) Eq. (23) is used for estimating sensitivity
.
![]() |
23 |
(iii) Eq. (24) is used for defining the False Negative Rate (FNR)
.
![]() |
24 |
(iv) Eq. (25) is used for evaluating the False Positive Rate (FPR)
.
![]() |
25 |
In this case, the terms
and
constitute the true positive and true negative,
and
stand for the false negative and false positive, correspondingly.Fig.5 displays the representation of SCHADNet-aided lyrics text classification model.
Fig 5.
Developed SCHADNet-aided lyrics text classification model.
Interacting the model with each other
In order to provide better classification performance, the effective data pre-processing is achieved to provide cleaned data without the noise. With the help of pre-processing techniques, it effectively removes punctuation, special characters, redundant data, and inappropriate stemming. Utilizing the pre-processing helps to minimize the computational complexity and strengthen the models capability. The data can be cleaned and provided with meaningful information by removing punctuation and special characters. So, the noisy and irrelevant data is cleaned for extracting the relevant features that leads to enhance the classification performance. Thus, the outcome of the model is simple and easier to understand the model to get the precise outcomes. In this context, the effective preprocessed outcome
is inputted into the SCHADNetclassification model to precisely classify the lyrics text. The SCHADNet model that was developed can learn semantic patterns to understand the user’s preference based on underlying patterns.
Result and discussion
Simulation setup
Python platform was employed in the lyrics text classification for the entire processing. The proposed IMPA scheme used 50 maximum Iterations. The suggested IMPA algorithm’s population input is referred to. Here, the populations are encoded by utilizing the required amount of parameters. Here, the IMPA’s number of populations was 10. The chromosome represents an individual solution encoded in a format suitable for manipulation by the algorithm. For the IMPA, the length of the chromosome was 4 The designed SCHADNet-based lyrics text classification process considered the following parameters: The number of epochs-50, batch size-16, number of LSTM units-64, number of transformer encoder layers-2, number of attention heads-4, learning rate: 0.0001, number of GRU units: 64, dropout rate: 0.2, hidden layer size-128, activation function: TanH, optimizer: {SGD, Adam, RMSprop}. Finally, the network produces highly accurate lyrics text classified solutions. The performance was validated with numerous existing systems like LSTM34, Trans-Bi-LSTM35, GRU32 and Trans-Bi-LSTM-GRU36, and the algorithms like Eurasian Oystercatcher Optimizer (EOO)37, Valley Optimizer (EVO)38, Political Optimizer (PO)35 and Marine Predators Algorithm (MPA)26.
In experiment, the selection of parameters is treated as an automated search for the most efficient configuration within a predefined range. The process begins by defining the specific hyperparameters such as hidden neuron counts and epoch sizes. The IMPA then initializes a population of candidate solutions. Each candidate represents a unique combination of parameters that is used to train the Trans Bi-LSTM or GRU. The resulting performance is assigned as a best fitness score to that specific combination. As the algorithm iterates, it refines these values through exploration and exploitation phase. Throughout this process, the algorithm constantly compares new combinations against the current best performer. Upon reaching the maximum number of iterations, the IMPA outputs the global best solution, which contains the optimized values for neurons and epochs that yielded the highest accuracy. These optimized values are then finalized as the parameters for the experimental model. Thus, choosing the hidden neuron count in Trans Bi-LSTM and GRU within the range of [5–255] can effectively balance architectural depth with computational efficiency. Further, selecting the number of epochs in Trans Bi-LSTM and GRU within [5–50] helps to generalize well on unseen data.
Experimental measures
The following measures are employed to develop the lyric text classification framework.
(a) Eq. (19) determines precision
.
![]() |
26 |
(b) Eq. (21) can be used to determine the F1-Score
.
![]() |
27 |
(c) Eq. (24) is used to assess specificity
.
![]() |
28 |
(d) When applied Eq. (26), yields the Matthews correlation coefficient (MCC)
.
![]() |
29 |
(e) Eq. (27) is used to classify Negative Predictive Value (NPV)
.
![]() |
30 |
(f) Eq. (28) provides a definition for False Discovery Rate (FDR)
.
![]() |
31 |
Convergence analysis
Fig. 6 provides the analysis on proposed lyric text classification model considering the convergence score.The proposed technique’s convergence over the existing models is validated using this cost function-based experiment.The developed lyrics text classification model given a cost function score is 11.42% lower than EOO-SCHADNet, 9.26% lower than EVO-SCHADNet, 11.26% lower than PO-SCHADNet and 13.48% lower than MPA-SCHADNet at the 30th iteration. When considering the 40th iteration, the cost function is 17.33% lower than EOO-SCHADNet, 7% lower than EVO-SCHADNet, 12.47% lower than PO-SCHADNet and 11.42% lower than MPA-SCHADNet.The proposed IMPA-SCHADNet achieved a higher convergence rate than the existing techniques due to the lower cost function values of the designed model. Also, it has been reported that the IMPA-SCHADNet technique is efficiently supported to classify the texts in the lyrics than the other models.
Fig 6.
Cost function analysis on the developed lyrics text classification model based on (a) Dataset-1 and (b) Dataset-2.
Dataset-1-based performance analysis on the proposed lyrics text classification model
Dataset-1-based analysis on the lyrics text classification model is shown in Fig. 7 with existing classifiers and Fig. 8 with heuristic approaches.This experiment takes into account activation functions such as linear, sigmoid, TanH, softmax, and ReLU to ensure the designed model’s improved rates of performance. This activation function-aided validation ensures the designed model how effectively classifies the lyrics than the other traditional techniques. When analyzing the classifiers, the developed model offered an accuracy value score is 8.23% more than LSTM, 2.22% enhanced than Trans Bi-LSTM, 6.97% increased than GRU, and 1.09% superior to Trans-Bi-LSTM-GRU while analyzing the sigmoid function. When taking the TanH function, the developed model offered the NPV value based on algorithms is 0.91% superior to EOO-SCHADNet, 0.61% more than EVO-SCHADNet, 0.51% enhanced than PO-SCHADNet and 0.24% increased than MPA-SCHADNet.The IMPA-SCHADNet model is better suited for text classification than any other traditional techniques because of its superior value.
Fig 7.
ClassifierAnalysis on the proposed lyrics text classification model in terms of (a) Accuracy, (b) FNR, (c) FPR, and (d) NPV.
Fig 8.
Algorithmic analysis of the proposed lyrics text classification model in terms of (a) Accuracy, (b) FNR, (c) FPR, and (d) NPV.
Dataset-2-based Analysis on the proposed lyrics text classification model
Dataset-2-based performance analysis on the proposed lyrics text classification model with existing classifiers and heuristic approaches are shown in Fig. 9 and Fig. 10.This experiment also utilized the various standard activation functions for analyzing the designed model over other previous techniques. Based on classifiers, the developed model offered an accuracy value is 27.28% more than LSTM, 31.03% enhanced than Trans-Bi-LSTM, 22.05% increased than GRU, and 18.8% superior to Trans-Bi-LSTM-GRU while analyzing the ReLU function. When considering the Linear function, the developed model offered the NPV value based on algorithms is 9.62% superior to LSTM, 6.8% more than Trans-Bi-LSTM, 4.92% enhanced than GRU and 2.56% increased than Trans-Bi-LSTM-GRU.According to the other performance measures, the designed technique has better rates of performance than the other classification techniques. Thus, it has been elucidated that the designed lyrics classification approach offers relatively more efficient solutions than any other models when considering the second dataset.
Fig 9.
ClassifierAnalysis on the proposed lyrics text classification model in terms of (a) Accuracy, (b) FNR, (c) FPR, and (d) NPV.
Fig 10.
Algorithmic analysis of the proposed lyrics text classification model in terms of (a) Accuracy, (b) FNR, (c) FPR, and (d) NPV.
Performance analysis of developed model using dataset 3
Based on the third dataset, the experimental validation is given in Fig.11 and Fig.12 over the previous algorithms and models. Here, the graph analysis is conducted by considering the different activation functions of ReLu, sigmoid, linear, tanh and softmax is validated to provide superior outcomes. This experiment validation shows the suggested lyrics text classification framework’s superior solutions with the support of various activation functions. When considering the ReLU activation function in Fig.11 (b), the FNR of the designed lyrics text classification process is minimized by 38.82% of LSTM, 61.17% of Trans-Bi-LSTM, 35.29% of GRU, and 11.76% of Trans-Bi-LSTM-GRU respectively. The design of the lyrics text classification process resulted in relatively lower error rates than other models, which led to an increase in performance rates.
Fig. 11.
Classifier-based analysis of the proposed lyrics text classification model in terms of (a) Accuracy, (b) FNR, and (c) Precision.
Fig. 12.
Algorithmic analysis of the proposed lyrics text classification model in terms of (a) Accuracy, (b) FNR, and (c) Precision.
Overall classifier analysis on the proposed lyrics text classification model
Table 2 illustrates the overall classification analysis of the proposed lyrics text classification model based on three datasets. Standard true and false measures are used for this experimental validation. These measures show the reliability and efficiency of the presented work over the existing models.In dataset-1, the developed model provided the F1-Score is 20.40% more than LSTM, 8.16% enhanced than Trans Bi-LSTM, 13.69% increased than GRU, and 6.04% superior to Trans-Bi-LSTM-GRU. When considering dataset-2, the developed model offered the specificity is 3.94% superior to LSTM, 5.2% more than Trans-Bi-LSTM, 3.6% enhanced than GRU and 0.84% increased than Trans-Bi-LSTM-GRU. Similarly, when considering the third dataset, the FDR of the suggested lyrics text classification process is minimized by 16.56% of LSTM, 24.43% of Trans-Bi-LSTM, 16.3% of GRU, and 6.48% of Trans-Bi-LSTM-GRU accordingly. The three datasets show the superior solutions of the suggested model over the other techniques for any performance metrics.The designed lyrics text classification technique has been shown to have low error rates and high classification accuracy rates compared to conventional techniques in both dataset 1 and dataset 2.
Table 2.
OverallClassifier analysis on the proposed lyrics text classification model.
| Terms | LSTM34 | Trans-Bi-LSTM35 | GRU32 | Trans-Bi-LSTM-GRU36 | IMPA-SCHADNet |
|---|---|---|---|---|---|
| Dataset-1 | |||||
| Precision | 47.009 | 54.713 | 50.961 | 56.306 | 61.369 |
| Recall | 88.872 | 91.574 | 90.354 | 92.046 | 93.471 |
| NPV | 98.628 | 98.988 | 98.827 | 99.049 | 99.230 |
| FPR | 11.131 | 8.422 | 9.661 | 7.936 | 6.538 |
| FNR | 11.128 | 8.426 | 9.646 | 7.954 | 6.529 |
| Accuracy | 88.869 | 91.578 | 90.341 | 92.062 | 93.463 |
| FDR | 52.991 | 45.287 | 49.039 | 43.694 | 38.631 |
| Specificity | 88.869 | 91.578 | 90.339 | 92.064 | 93.462 |
| F1-Score | 61.492 | 68.500 | 65.166 | 69.871 | 74.092 |
| MCC | 0.596 | 0.668 | 0.634 | 0.682 | 0.726 |
| Dataset-2 | |||||
| Recall | 89.701 | 88.459 | 89.920 | 92.549 | 92.988 |
| Specificity | 89.579 | 88.507 | 89.871 | 92.330 | 93.109 |
| Precision | 74.155 | 71.955 | 74.742 | 80.088 | 81.812 |
| FDR | 25.845 | 28.045 | 25.258 | 19.912 | 18.188 |
| FPR | 10.421 | 11.493 | 10.129 | 7.670 | 6.891 |
| Accuracy | 89.609 | 88.495 | 89.883 | 92.385 | 93.079 |
| NPV | 96.309 | 95.834 | 96.396 | 97.381 | 97.551 |
| FNR | 10.299 | 11.541 | 10.080 | 7.451 | 7.012 |
| F1-Score | 81.190 | 79.358 | 81.631 | 85.869 | 87.043 |
| MCC | 0.747 | 0.722 | 0.753 | 0.811 | 0.827 |
| Dataset-3 | |||||
| MCC | 0.6337755 | 0.6019229 | 0.6348357 | 0.6724605 | 0.6964619 |
| Recall | 90.340578 | 89.115489 | 90.397094 | 91.700065 | 92.515413 |
| FDR | 49.040598 | 52.351401 | 48.937052 | 44.803024 | 42.075965 |
| Precision | 50.959402 | 47.648599 | 51.062948 | 55.196976 | 57.924035 |
| FNR | 9.6594218 | 10.884511 | 9.6029058 | 8.2999349 | 7.4845873 |
| FPR | 9.6598813 | 10.878997 | 9.6259563 | 8.2702601 | 7.4670122 |
| Specificity | 90.340119 | 89.121003 | 90.374044 | 91.72974 | 92.532988 |
| NPV | 98.825917 | 98.661148 | 98.833139 | 99.004646 | 99.109276 |
| Accuracy | 90.340165 | 89.120452 | 90.376349 | 91.726772 | 92.53123 |
| F1-Score | 65.162102 | 62.095661 | 65.261427 | 68.913115 | 71.24283 |
Overall analysis on the proposed lyrics text classification model based on algorithms
The overall analysis of the proposed lyrics text classification model based on three datasets is shown in Table 3.The developed lyrics text classification model given an MCC is 22.01% more than EOO-SCHADNet, 17.47% enhanced than EVO-SCHADNet, 15.6% increased than PO-SCHADNet, and 7.55% superior to MPA-SCHADNet based on dataset-1. When considering the dataset-2, FDR is 41.85% superior to EOO-SCHADNet, 38.39% more than EVO-SCHADNet, 30.46% enhanced than PO-SCHADNet and 12.8% increased than MPA-SCHADNet. Likewise, when considering the third dataset, the recommended lyrics text classification process’s precision is enhanced by 25.94% of EOO-SCHADNet, 21.25% of EVO-SCHADNet, 13.27% of PO-SCHADNet, and 7.89% of MPA-SCHADNet accordingly. These experimental validations for three data sources reported the superior solutions of the designed technique.The experimental validations enabled the designed text classification process to achieve more effective solutions than conventional models, ensuring the model’s robustness and reliability.
Table 3.
OverallAlgorithmic analysis of the proposed lyrics text classification model.
| Terms | EOO-SCHADNet37 | EVO-SCHADNet38 | PO-SCHADNet39 | MPA-SCHADNet30 | IMPA-SCHADNet |
|---|---|---|---|---|---|
| Dataset-1 | |||||
| Accuracy | 88.863 | 89.743 | 90.121 | 91.827 | 93.463 |
| Recall | 88.853 | 89.748 | 90.144 | 91.827 | 93.471 |
| Specificity | 88.864 | 89.742 | 90.119 | 91.827 | 93.462 |
| Precision | 46.994 | 49.294 | 50.339 | 55.523 | 61.369 |
| FPR | 11.136 | 10.258 | 9.881 | 8.173 | 6.538 |
| FNR | 11.147 | 10.252 | 9.856 | 8.173 | 6.529 |
| NPV | 98.625 | 98.747 | 98.799 | 99.021 | 99.230 |
| FDR | 53.006 | 50.706 | 49.661 | 44.477 | 38.631 |
| F1-Score | 61.474 | 63.636 | 64.603 | 69.203 | 74.092 |
| MCC | 0.595 | 0.618 | 0.628 | 0.675 | 0.726 |
| Dataset-2 | |||||
| Accuracy | 86.779 | 87.692 | 89.481 | 91.892 | 93.079 |
| Recall | 86.486 | 87.363 | 89.701 | 91.746 | 92.988 |
| Specificity | 86.876 | 87.801 | 89.408 | 91.941 | 93.109 |
| Precision | 68.717 | 70.477 | 73.842 | 79.143 | 81.812 |
| FPR | 13.124 | 12.199 | 10.592 | 8.059 | 6.891 |
| FNR | 13.514 | 12.637 | 10.299 | 8.254 | 7.012 |
| NPV | 95.071 | 95.422 | 96.302 | 97.094 | 97.551 |
| FDR | 31.283 | 29.523 | 26.158 | 20.857 | 18.188 |
| F1-Score | 76.585 | 78.017 | 81.003 | 84.980 | 87.043 |
| MCC | 0.684 | 0.704 | 0.745 | 0.799 | 0.827 |
| Dataset-3 | |||||
| Accuracy | 87.111719 | 88.303622 | 90.085257 | 91.145829 | 92.53123 |
| Recall | 87.144319 | 88.299108 | 90.082465 | 91.153169 | 92.515413 |
| Specificity | 87.108097 | 88.304124 | 90.085567 | 91.145013 | 92.532988 |
| Precision | 42.892011 | 45.617999 | 50.237732 | 53.353356 | 57.924035 |
| FPR | 12.891903 | 11.695876 | 9.9144333 | 8.8549869 | 7.4670122 |
| FNR | 12.855681 | 11.700892 | 9.9175348 | 8.8468311 | 7.4845873 |
| NPV | 98.386644 | 98.549065 | 98.791558 | 98.933027 | 99.109276 |
| FDR | 57.107989 | 54.382001 | 49.762268 | 46.646644 | 42.075965 |
| F1-Score | 57.488474 | 60.157043 | 64.503027 | 67.309452 | 71.24283 |
| MCC | 0.553628 | 0.5816648 | 0.6269435 | 0.6559782 | 0.6964619 |
Statistical analysis of the proposed lyrics text classification model
Statistical performance analysis on the proposed lyrics text classification model based on dataset-1 and dataset-2 is shown in Table 4.Here, the statistical measures such as worst, best, mean, median, and standard deviation are considered for this experiment. The minimum recorded performance value is defined by the best measure, while the median explains the middle value of the performance metric. Finally, the standard deviation indicates the variability of the performance metric. These metrics are employed for fitness function validation, where the accuracy, sensitivity, FPR, and FNR are considered.The median of the developed lyrics text classification model is 7.19% more than EOO-SCHADNet, 3.5% enhanced than EVO-SCHADNet, 5.05% increased than PO-SCHADNet, and 4.92% superior to MPA-SCHADNetbased on dataset-1. When considering dataset 2, the standard deviation of the developed model is 15.73% superior to EOO-SCHADNet, 36.22% more than EVO-SCHADNet, 2.45% enhanced than PO-SCHADNet and 32.55% increased than MPA-SCHADNet. The experimental validations indicate that the designed model is effective in selecting optimal solutions and offers better performance rates than the existing algorithms.
Table 4.
Statistical analysis of the proposed lyrics text classification model.
| Terms | EOO-SCHADNet 37 | EVO-SCHADNet 38 | PO-SCHADNet 39 | MPA-SCHADNet 30 | IMPA-SCHADNet |
|---|---|---|---|---|---|
| Dataset-1 | |||||
| Worst | 5.020 | 5.413 | 6.704 | 6.918 | 6.295 |
| Best | 4.044 | 4.048 | 4.038 | 4.108 | 3.906 |
| Mean | 4.323 | 4.317 | 4.293 | 4.283 | 3.994 |
| Median | 4.209 | 4.048 | 4.114 | 4.108 | 3.906 |
| Std | 0.321 | 0.484 | 0.712 | 0.544 | 0.435 |
| Dataset-2 | |||||
| Worst | 5.868 | 6.151 | 6.489 | 7.039 | 5.574 |
| Best | 4.322 | 4.039 | 4.196 | 4.071 | 3.918 |
| Mean | 4.726 | 4.354 | 4.354 | 4.298 | 4.100 |
| Median | 4.558 | 4.039 | 4.214 | 4.071 | 4.033 |
| Std | 0.445 | 0.588 | 0.366 | 0.556 | 0.375 |
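The five statistical measures in Table 4 can be computed directly from the final cost values of repeated optimization runs. A minimal Python sketch (the run values below are hypothetical):

```python
import statistics

def summarize(costs):
    """Worst/best/mean/median/std of final cost values over independent runs.

    The optimizer minimizes cost, so 'best' is the minimum and
    'worst' is the maximum.
    """
    return {
        "worst":  max(costs),
        "best":   min(costs),
        "mean":   statistics.mean(costs),
        "median": statistics.median(costs),
        "std":    statistics.stdev(costs),   # sample standard deviation
    }

# Hypothetical final cost values from five runs (illustration only)
stats = summarize([4.1, 3.9, 4.3, 4.0, 5.6])
```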
ROC analysis on the proposed lyrics text classification model
ROC analysis of the suggested lyrics text classification model is depicted in Fig. 13. This ROC-aided experiment illustrates the designed technique's reduced error rates relative to the existing classification models. On dataset-1, the ROC score of the developed model is 15.29% higher than LSTM, 7.92% higher than Trans Bi-LSTM, 2.43% higher than GRU, and 0.2% higher than Trans-Bi-LSTM-GRU. ROC analysis characterizes the model's discrimination ability across different decision thresholds among the classes, so it helps to minimize misclassification and improve the overall performance of the model. Experimental validation confirms that the implemented lyrics text classification process offers efficient solutions with lower error rates than the other existing techniques.
Fig. 13.
ROC analysis on the developed lyrics text classification model based on (a) Dataset-1 and (b) Dataset-2.
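The area under the ROC curve equals the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, counting ties as one half. A minimal sketch with hypothetical labels and scores:

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative),
    with ties counted as 1/2 (equivalent to the trapezoidal ROC area)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and classifier scores, for illustration
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```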
State-of-the-art method comparative analysis of the lyrics text classification model
The performance of the suggested lyrics text classification process is validated by comparing it with traditional and related classification models in Table 5. The state-of-the-art techniques CNN, LSTM, and DNN are validated, along with the recent techniques CNN with FastText embeddings (CNN-FT)40, Convolution and Attention with a Bi-directional Gated Recurrent Unit (CAT-BiGRU)41, and Multi-View RNN (MV-RNN)42, to prove the efficiency of the developed model. In this validation, the accuracy of the developed model is 93.4%; higher accuracy effectively minimizes the error rate and improves the classification performance. Moreover, the error rate of the developed IMPA-SCHADNet model is 6.53% in terms of FPR. On dataset-2, the sensitivity of the designed text classification process is enhanced by 18.6% over CNN, 14.8% over LSTM, 13.2% over DNN, 10.9% over CNN-FT40, 6.24% over CAT-BiGRU41, and 11.3% over MV-RNN42, respectively. Thus, the implemented lyrics text classification process achieved highly effective and superior solutions compared to the conventional and related classification models.
Table 5.
Overall performance analysis of the proposed lyrics text classification model over state-of-the-art models.
| Terms | CNN 24 | LSTM 25 | DNN 27 | CNN-FT 40 | CAT-BiGRU 41 | MV-RNN 42 | Proposed IMPA-SCHADNet |
|---|---|---|---|---|---|---|---|
| Dataset-1 | |||||||
| Accuracy | 81.35 | 81.35 | 83.78 | 86.54 | 88.65 | 85.08 | 93.46 |
| Sensitivity | 79.12 | 79.20 | 81.69 | 84.20 | 86.65 | 82.62 | 93.47 |
| Specificity | 83.88 | 83.77 | 86.11 | 89.16 | 90.83 | 87.87 | 93.46 |
| Precision | 54.80 | 49.64 | 60.76 | 59.71 | 58.18 | 60.56 | 61.36 |
| FPR | 16.12 | 16.23 | 13.89 | 10.84 | 9.17 | 12.13 | 6.538 |
| FNR | 20.88 | 20.80 | 18.31 | 15.80 | 13.35 | 17.38 | 6.52 |
| NPV | 77.94 | 78.10 | 80.84 | 83.41 | 86.15 | 81.64 | 99.23 |
| FDR | 45.20 | 45.36 | 43.24 | 40.29 | 48.82 | 41.44 | 38.63 |
| F1-Score | 71.86 | 71.83 | 72.15 | 73.87 | 72.85 | 73.49 | 74.09 |
| MCC | 62.87 | 62.86 | 67.70 | 73.24 | 77.40 | 70.35 | 72.60 |
| Dataset-2 | |||||||
| Accuracy | 79.16 | 82.00 | 83.70 | 85.56 | 89.38 | 84.43 | 93.07 |
| Sensitivity | 78.35 | 80.97 | 82.10 | 83.82 | 87.52 | 83.48 | 92.98 |
| Specificity | 80.14 | 83.24 | 85.71 | 87.75 | 91.67 | 85.56 | 93.10 |
| Precision | 80.87 | 78.51 | 80.85 | 79.56 | 80.83 | 77.38 | 81.81 |
| FPR | 19.86 | 16.76 | 14.29 | 12.25 | 8.33 | 14.44 | 6.89 |
| FNR | 21.65 | 19.03 | 17.90 | 16.18 | 12.48 | 16.52 | 7.01 |
| NPV | 75.13 | 78.17 | 79.19 | 81.22 | 85.62 | 81.22 | 97.55 |
| FDR | 27.13 | 24.49 | 22.15 | 20.44 | 27.17 | 19.62 | 18.18 |
| F1-Score | 80.55 | 83.18 | 84.88 | 86.60 | 80.10 | 85.39 | 87.04 |
| MCC | 58.24 | 63.95 | 67.42 | 71.18 | 78.82 | 68.82 | 82.70 |
| Dataset-3 | |||||||
| Accuracy | 79.16 | 82.00 | 83.70 | 85.56 | 89.38 | 84.43 | 92.53 |
| Sensitivity | 78.35 | 80.97 | 82.10 | 83.82 | 87.52 | 83.48 | 92.52 |
| Specificity | 80.14 | 83.24 | 85.71 | 87.75 | 91.67 | 85.56 | 92.53 |
| Precision | 52.87 | 55.51 | 57.85 | 56.56 | 54.83 | 53.38 | 57.92 |
| FPR | 19.86 | 16.76 | 14.29 | 12.25 | 8.33 | 14.44 | 7.47 |
| FNR | 21.65 | 19.03 | 17.90 | 16.18 | 12.48 | 16.52 | 7.48 |
| NPV | 75.13 | 78.17 | 79.19 | 81.22 | 85.62 | 81.22 | 99.11 |
| FDR | 45.13 | 44.49 | 43.15 | 45.44 | 47.17 | 45.62 | 42.08 |
| F1-Score | 70.55 | 63.18 | 64.88 | 66.60 | 70.10 | 65.39 | 71.24 |
| MCC | 58.24 | 63.95 | 67.42 | 61.18 | 68.32 | 68.82 | 69.65 |
Ablation study of the proposed model
Table 6 presents the ablation study of the designed model, which helps to evaluate the contribution of each component. The table shows that the classical BiLSTM system attains 88.8% accuracy, relatively low compared to the other variants, implying poorer user experience and less efficient resource allocation. The developed model attains 93.4% accuracy, leading to more efficient and enhanced performance. Therefore, the developed model demonstrates superior text classification performance over the traditional models.
Table 6.
Ablation study of the proposed model.
| Terms | BiLSTM | BiLSTM-GRU | LSTM-GRU | Trans-LSTM-GRU | Proposed IMPA-SCHADNet |
|---|---|---|---|---|---|
| Dataset-1 | |||||
| Accuracy | 88.86751 | 91.57618 | 90.3464 | 92.0586 | 93.4633 |
| Sensitivity | 88.85014 | 91.5698 | 90.33334 | 92.06673 | 93.47067 |
| Specificity | 88.86944 | 91.57689 | 90.34785 | 92.05769 | 93.46248 |
| Precision | 47.00439 | 54.70844 | 50.97742 | 56.29358 | 61.36936 |
| FPR | 11.13056 | 8.423114 | 9.652147 | 7.942306 | 6.537522 |
| FNR | 11.14986 | 8.430197 | 9.666659 | 7.93327 | 6.529328 |
| NPV | 98.62513 | 98.98751 | 98.82515 | 99.05156 | 99.22975 |
| FDR | 52.99561 | 45.29156 | 49.02258 | 43.70642 | 38.63064 |
| F1-Score | 61.48262 | 68.49469 | 65.17495 | 69.86728 | 74.09241 |
| MCC | 0.595509 | 0.66818 | 0.633887 | 0.68234 | 0.725815 |
| Dataset-2 | |||||
| Accuracy | 89.59094 | 88.86048 | 89.81008 | 92.31191 | 93.07889 |
| Sensitivity | 89.77356 | 88.60482 | 89.55442 | 92.40321 | 92.98758 |
| Specificity | 89.53007 | 88.9457 | 89.8953 | 92.28147 | 93.10933 |
| Precision | 74.08077 | 72.76545 | 74.71054 | 79.96207 | 81.81234 |
| FPR | 10.46993 | 11.0543 | 10.1047 | 7.718529 | 6.890674 |
| FNR | 10.22644 | 11.39518 | 10.44558 | 7.596786 | 7.012418 |
| NPV | 96.3322 | 95.90444 | 96.27119 | 97.32922 | 97.55102 |
| FDR | 25.91923 | 27.23455 | 25.28946 | 20.03793 | 18.18766 |
| F1-Score | 81.17569 | 79.90777 | 81.46179 | 85.73365 | 87.04274 |
| MCC | 0.747262 | 0.729752 | 0.750965 | 0.809036 | 0.826616 |
Convergence time complexity analysis of the proposed model
Table 7 shows the convergence time analysis of the proposed model. The traditional Bi-LSTM model incurs higher training time and poor scalability, indicating that it struggles with large datasets and real-world applications. The proposed hybrid model demonstrates superior efficiency: by leveraging the strengths of Trans-Bi-LSTM and GRU, the designed framework achieves minimized training time, faster convergence, and lower cost values. This is primarily because the serial cascaded architecture improves network flexibility, allowing more efficient feature propagation. Furthermore, the integration of the IMPA ensures the model reaches an optimal solution rapidly. This enhanced optimization leads to a significantly better convergence rate and improved overall training performance for the lyrics text classification task. Therefore, the developed model is more effective than the traditional models.
Table 7.
Convergence time analysis of the proposed model.
| Model | Training Characteristics | Convergence/Time Processing Result |
|---|---|---|
| Bi-LSTM | Sequential, slow | Higher training time, poor scalability |
| Trans-Bi-LSTM | Parallelizable | Reduced training time, better scalability |
| GRU | Efficient, lightweight | Lower complexity, faster training |
| SCHADNet (Proposed) | Hybrid (Trans-Bi-LSTM + GRU, tuned by IMPA) | Minimized training time, faster convergence, lower cost values |
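The "efficient, lightweight" character of the GRU in Table 7 comes from its two-gate design, which needs fewer parameters than an LSTM's three gates and separate cell state. A toy NumPy sketch of a single GRU step (the sizes and random weights here are illustrative assumptions, not the trained model):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 4, 3                       # toy dimensions, illustration only

# Randomly initialized weights; a real model would learn these
Wz, Uz, bz = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)), np.zeros(d_h)
Wr, Ur, br = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)), np.zeros(d_h)
Wh, Uh, bh = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)), np.zeros(d_h)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h):
    """One GRU step: two gates and a convex update of the hidden state."""
    z = sigmoid(Wz @ x + Uz @ h + bz)          # update gate
    r = sigmoid(Wr @ x + Ur @ h + br)          # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h) + bh)  # candidate state
    return (1 - z) * h + z * h_tilde           # interpolate old and candidate

h = np.zeros(d_h)
for x in rng.normal(size=(5, d_in)):           # a 5-step toy sequence
    h = gru_step(x, h)
```

Because each step is a convex combination of the previous state and a tanh candidate, the hidden state stays bounded, which is part of why GRU training is stable and fast.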
Best/Worst analysis of the proposed model
Figure 14 demonstrates the best and worst analysis of the developed model, which quantifies the performance gains and evaluates the robustness of the model. In Fig. 14(a), the worst performer (LSTM) attains an accuracy of 88.87%, while the best (SCHADNet) gains 93.46%, indicating that the proposed model achieves superior accuracy, leading to enhanced reliability, improved decision making, and better user experience. As a result, the proposed SCHADNet model achieves greater performance than the traditional models.
Fig. 14.
Best/Worst analysis on the developed lyrics text classification model based on Dataset-1 and Dataset-2 in terms of (a) Accuracy, (b) F1-Score and (c) Recall.
State-of-the-art comparison of the proposed model
The state-of-the-art analysis of the proposed model is stated in Table 8. This comparison evaluates the performance and efficiency of the system and is useful for identifying its advantages and disadvantages. In the table, the traditional SVM model achieves a low accuracy of 88.8%, leading to less accurate results and wasted resources. The developed IMPA-SCHADNet model gains an accuracy of 93.4%, superior to the other classical models, leading to enhanced efficiency and better decision making. As a result, the suggested IMPA-SCHADNet model achieved better performance than the other models.
Table 8.
Comparative analysis of the suggested model with related classification models.
| Terms | SVM 43 | SLEM 44 | IMPA-SCHADNet |
|---|---|---|---|
| Dataset 1 | |||
| Accuracy | 88.86913 | 91.57611 | 93.4633 |
| Recall | 88.87082 | 91.586 | 93.47067 |
| Specificity | 88.86894 | 91.57501 | 93.46248 |
| Precision | 47.00907 | 54.70731 | 61.36936 |
| FPR | 11.13106 | 8.42499 | 6.537522 |
| FNR | 11.12918 | 8.414001 | 6.529328 |
| NPV | 98.62764 | 98.98942 | 99.22975 |
| FDR | 52.99093 | 45.29269 | 38.63064 |
| F1-Score | 61.49158 | 68.49833 | 74.09241 |
| MCC | 0.595633 | 0.668242 | 0.725815 |
| Dataset 2 | |||
| Accuracy | 89.49963 | 88.64134 | 89.70051 |
| Recall | 89.48137 | 88.67787 | 89.70051 |
| Specificity | 89.50572 | 88.62917 | 89.70051 |
| Precision | 73.97343 | 72.21892 | 74.37916 |
| FPR | 10.49428 | 11.37083 | 10.29949 |
| FNR | 10.51863 | 11.32213 | 10.29949 |
| NPV | 96.23037 | 95.91568 | 96.31373 |
| FDR | 26.02657 | 27.78108 | 25.62084 |
| F1-Score | 80.99174 | 79.60656 | 81.3245 |
| MCC | 0.744661 | 0.725761 | 0.749205 |
Impact of feature extraction on the proposed model
Table 9 shows the impact of feature extraction on the proposed model. This analysis is performed with various feature extraction techniques, namely GloVe embeddings, Term Frequency-Inverse Document Frequency (TF-IDF), and Bidirectional Encoder Representations from Transformers (BERT), to showcase the efficacy of the developed framework without an external feature extraction stage. The accuracy of the designed IMPA-SCHADNet is 93.46%, whereas adding GloVe embeddings to the IMPA-SCHADNet yields 91.38% accuracy. Similarly, integrating BERT into the designed IMPA-SCHADNet attains 92.74% accuracy, lower than the standalone IMPA-SCHADNet. Thus, the results confirm that the Trans-Bi-LSTM in the designed IMPA-SCHADNet can effectively extract the significant features from the given input. These findings suggest that the internal feature extraction mechanism of the designed IMPA-SCHADNet is more effective for this classification task than relying on external techniques such as GloVe embeddings, TF-IDF, and BERT.
Table 9.
Impact of feature extraction on the proposed model.
| Models | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | FNR (%) | FPR (%) |
|---|---|---|---|---|---|---|
| TF-IDF+ IMPA-SCHADNet | 87.92 | 78.44 | 86.31 | 82.16 | 13.69 | 12.08 |
| GloVe+ IMPA-SCHADNet | 91.38 | 83.92 | 90.87 | 87.24 | 9.13 | 8.62 |
| BERT+ IMPA-SCHADNet | 92.74 | 86.15 | 92.08 | 89.01 | 7.92 | 7.26 |
| IMPA-SCHADNet | 93.46 | 87.92 | 93.47 | 90.61 | 6.53 | 6.54 |
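For reference, the TF-IDF baseline in Table 9 weights each term by its in-document frequency times its corpus rarity. A minimal sketch using one common smoothed-IDF variant (the tiny lyric corpus below is hypothetical):

```python
import math
from collections import Counter

def tf_idf(docs):
    """TF-IDF vectors for a tiny corpus: term frequency times smoothed
    inverse document frequency (one common variant among several)."""
    n = len(docs)
    df = Counter()                       # document frequency of each term
    for doc in docs:
        df.update(set(doc.split()))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}   # smoothed IDF
    vectors = []
    for doc in docs:
        tokens = doc.split()
        tf = Counter(tokens)
        vectors.append({t: (tf[t] / len(tokens)) * idf[t] for t in tf})
    return vectors

# Hypothetical three-document lyric corpus, for illustration
vecs = tf_idf(["love and peace", "love songs", "dark heavy songs"])
```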
Discussions
An effective lyrics text classification approach is implemented in this work by utilizing powerful deep learning techniques. Various performance measures are used to support the experimental analysis of the designed technique. A cost function experiment is also performed; it reported that the designed approach obtained very low cost function values, confirming higher convergence rates. Moreover, the performance examination of the developed lyrics text classification process is conducted on the first dataset against previous classifiers and algorithms, and on a second dataset in section "dataset-2-based analysis on the proposed lyrics text classification model". These experiments elucidated that the designed lyrics text classification process obtained superior solutions relative to the classical models. The overall comparative examination of the implemented process over existing techniques and algorithms across three data sources demonstrates improved performance rates, ensuring high efficiency in the classification process. The statistical experiment with the considered statistical measures confirms that the IMPA algorithm selects the optimal parameters more effectively than the other existing algorithms, providing detailed insight into the designed process. In addition, the ROC validation of the suggested lyrics text classification process confirms that the implemented approach attained much lower error rates than the classical models, offering outstanding solutions.
The developed model is further investigated by comparing the lyrics text classification process with the state-of-the-art models; this analysis found that the designed model outperforms them and provides highly accurate solutions. Finally, the performance verification of the designed model is computed on a third dataset against existing algorithms and classifiers. These experimental results confirm that the implemented lyrics text classification process is more effective than the baseline models.
Conclusion
This paper provided a lyrics text classification approach that utilized deep learning to classify lyrics text based on mood, genre, sentiment, and performer. Essential textual information was first acquired from standard internet sources and then passed through the text pre-processing step. Following that, SCHADNet was used to classify the pre-processed text. Parameters such as the hidden neurons and epochs of Trans-Bi-LSTM and GRU were tuned using the proposed IMPA algorithm to enhance accuracy and sensitivity while reducing the FNR and FPR. Finally, the developed SCHADNet model provided the text-classified results. To demonstrate the efficacy of the proposed model, an empirical evaluation was conducted against a variety of traditional methods. In this evaluation, the precision of the developed model was 41.22% higher than LSTM, 18.95% higher than Trans-Bi-LSTM, 38.82% higher than GRU, and 13.46% higher than Trans-Bi-LSTM-GRU when analyzing the ReLU function. The mean of the developed lyrics text classification model is 7.61% better than EOO, 7.48% better than EVO, 6.96% better than PO, and 6.74% better than MPA. The experimental validations confirmed that the proposed lyrics text classification process outperformed the traditional techniques and provided more effective solutions. The designed lyrics text classification process has practical implications such as mood-aided analysis, research and academia, music recommendation systems, and artist and genre analysis.
Limitations of the developed model
The main limitation of the developed SCHADNet system is its computational complexity, which arises from combining several deep learning components (Transformer, Bi-LSTM, and GRU) in a serial cascaded structure. While effective for extracting contextual and sequential connections, this design demands greater computational resources, longer training time, and more memory than simpler architectures. Furthermore, because the system learns end-to-end without predefined feature extraction, it requires a considerable amount of training data to attain optimal generalization.
Future scope
In future work, strategies like transfer learning, self-supervised learning, and advanced data augmentation will be introduced to minimize the reliance on vast amounts of data and improve the system's capability to generalize.
Acknowledgements
I would like to express my very great appreciation to the co-authors of this manuscript for their valuable and constructive suggestions during the planning and development of this research work.
Author contributions
All authors have made substantial contributions to conception and design, revising the manuscript, and the final approval of the version to be published. Also, all authors agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Data availability
Dataset 1: The data underlying this article are available in https://www.kaggle.com/datasets/mateibejan/multilingual-lyrics-for-genre-classification?select=train.csv. Access date: 2024-01-02. Dataset 2: The data underlying this article are available in https://github.com/wojtek11530/song_lyric_classification/tree/master/datasets. Access date: 2024-01-03.
Declarations
Competing interest
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1.Furner, M., Islam, M. Z. & Li, C. T. Knowledge discovery and visualisation framework using machine learning for music information retrieval from broadcast radio data. Expert Syst. Appl.182, 115236 (2021). [Google Scholar]
- 2.Hizlisoy, S., Yildirim, S. & Tufekci, Z. Music emotion recognition using convolutional long short term memory deep neural networks. Eng. Sci. Technol. Int J.24(3), 760–767 (2021). [Google Scholar]
- 3.Wang, C. & Ko, Y. C. Emotional representation of music in multi-source data by the internet of things and deep learning. J. Supercomput.79(1), 349–366 (2023). [Google Scholar]
- 4.Jena, K. K., Bhoi, S. K., Mohapatra, S. & Bakshi, S. A hybrid deep learning approach for classification of music genres using wavelet and spectrogram analysis. Neural Comput. Appl.35(1), 11223–11248 (2023). [Google Scholar]
- 5.Khattak, A., Asghar, M. Z., Khalid, H. A. & Ahmad, H. Emotion classification in poetry text using deep neural network. Multimed. Tools Appl.81(18), 26223–26244 (2022). [Google Scholar]
- 6.Yang, L., Shen, Z., Zeng, J., Luo, X. & Lin, H. COSMIC: music emotion recognition combining structure analysis and modal interaction. Multimed. Tools Appl.83 (5), 1–16 (2023). [Google Scholar]
- 7.Dong, L. Using deep learning and genetic algorithms for melody generation and optimization in music. Soft Comput.27(1), 17419–17433 (2023). [Google Scholar]
- 8.Sarkar, R., Choudhury, S., Dutta, S., Roy, A. & Saha, S. K. Recognition of emotion in music based on deep convolutional neural network. Multimed. Tools Appl.79, 765–783 (2020). [Google Scholar]
- 9.Policicchio, V. L., Pietramala, A. & Rullo, P. GAMoN: discovering M-of-N ¬,∨ hypotheses for text classification by a lattice-based genetic algorithm. Artif. Intell.191, 61–95 (2012). [Google Scholar]
- 10.Dwiyani, L. K. D., Suarjaya, I. M. A. D. & Rusjayanthi, N. K. D. Classification of explicit songs based on lyrics using random forest algorithm. J. Inform. Syst. Inform.5, 550–567 (2023). [Google Scholar]
- 11.Du, J. Sentiment analysis and lyrics theme recognition of music lyrics based on natural language processing. J. Electr. Syst.20, 315–321 (2024). [Google Scholar]
- 12.Xie, C. et al. Music genre classification based on res-gated CNN and attention mechanism. Multimed. Tools Appl.83(5), 13527–13542 (2024). [Google Scholar]
- 13.Jandaghian, M., Setayeshi, S., Razzazi, F. & Sharifi, A. Music emotion recognition based on a modified brain emotional learning model. Multimed. Tools Appl.82(4), 26037–26061 (2023). [Google Scholar]
- 14.Rajan, R. & Nithin, S. K. Folk music structural segment classification using GRU-based hierarchical attention network. Sādhanā48(4), 254 (2023). [Google Scholar]
- 15.Hongdan, W., SalmiJamali, S., Zhengping, C., Qiaojuan, S. & Le, R. An intelligent music genre analysis using feature extraction and classification using deep learning techniques. Comput. Electr. Eng.100, 107978 (2022). [Google Scholar]
- 16.Sujeesha, A. S., Mala, J. B. & Rajan, R. Automatic music mood classification using multi-modal attention framework. Eng. Appl. Artif. Intell.128, 107355 (2024). [Google Scholar]
- 17.da Silva, A. C. M., Coelho, M. A. N. & Neto, R. F. A music classification model based on metric learning applied to MP3 audio files. Expert Syst. Appl.144, 113071 (2020). [Google Scholar]
- 18.Baskara, A. R., Maulida, M., Lestiyanto, M. T. M., Sari, Y., Mustamin, N. F. & Wijaya, E. S. Explicit content classification in Indonesian song lyrics using the LSTM-CNN method. In 2024 Ninth International Conference on Informatics and Computing (ICIC) (2024).
- 19.Bonela, A. A., He, Z., Luxford, D.-A., Riordan, B. & Kuntsche, E. Development of the lyrics-based deep learning algorithm for identifying alcohol-related words (LYDIA). Alcohol Alcohol. 59(2) (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Bolla, B. K., Pattnaik, S. R. & Patra, S. Detection of objectionable song lyrics using weakly supervised learning and natural language processing techniques. Procedia Comput. Sci.235, 1929–1942 (2024). [Google Scholar]
- 21.Pasha, S. N., Ramesh, D., Mohmmad, S., Shabana, Kothandaraman, D. & Sravanthi, T. Song lyrics genre detection using RNN. AIP Conf. Proc. 2971(1) (2024).
- 22.Abdillah, J., Asror, I. & Wibowo, Y. F. A. Emotion classification of song lyrics using bidirectional lstm method with glove word representation weighting. J. RESTI (Rekayasa Sistem Dan Teknologi Informasi)4(4), 723–729 (2020). [Google Scholar]
- 23.Revathy, V. R., Pillai, A. S. & Daneshfar, F. LyEmoBERT: classification of lyrics’ emotion and recommendation using a pre-trained model. Procedia Comput. Sci.218, 1196–1208 (2023). [Google Scholar]
- 24.Jia, X. Music emotion classification method based on deep learning and improved attention mechanism. Comput. Intell. Neurosci.2022, 5181899 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Chen, X. et al. A novel approach for explicit song lyrics detection using machine and deep ensemble learning models. PeerJ Comput. Sci.9, e1469 (2023). [Google Scholar]
- 26.Li, Y., Zhang, Z., Ding, H. & Chang, L. Music genre classification based on fusing audio and lyric information. Multimed. Tools Appl.82(13), 20157–20176 (2023). [Google Scholar]
- 27.Delbouys, R., Hennequin, R., Piccoli, F., Royo-Letelier, J. & Moussallam, M. Music mood detection based on audio and lyrics with deep neural net. arXiv preprint (2018).
- 28.Almeida do Carmo, F., Figueira da Silva Junior, J. L., Geraldeli Rossi, R. & França Lobato, F. M. Text representations for lyric-based identification of musical subgenres. IEEE Latin Am. Trans. 21(6), 737–744 (2023).
- 29.Tsaptsinos, A. Lyrics-based music genre classification using a hierarchical attention network. arXiv preprint (2017).
- 30.Faramarzi, A., Heidarinejad, M., Mirjalili, S. & Gandomi, A. H. Marine predators algorithm: a nature-inspired metaheuristic. Expert Syst. Appl. 152, 113377 (2020). [Google Scholar]
- 31.Ye, H. et al. Web services classification based on wide & Bi-LSTM model. IEEE Access7, 43697–43706 (2019). [Google Scholar]
- 32.Naeem, A. et al. A novel combined densenet and gated recurrent unit approach to detect energy thefts in smart grids. IEEE Access11, 59496–59510 (2023). [Google Scholar]
- 33.Sun, J., Han, P., Cheng, Z., Wu, E. & Wang, W. Transformer based multi-grained attention network for aspect-based sentiment analysis. IEEE Access8, 211152–211163 (2020). [Google Scholar]
- 34.Alfarizi, M. I., Syafaah, L. & Lestandy, M. Emotional text classification using TF-IDF (Term frequency-inverse document frequency) And LSTM (Long short-term memory). J. Informatika10, 2 (2022). [Google Scholar]
- 35.Yu, P. & Fu, X. Classification and identification of emotion of non-foreign music based on TR-Bi-LSTM emotion analysis. Research Square (2023).
- 36.Jia, C. et al. State of health prediction of lithium-ion batteries based on bidirectional gated recurrent unit and transformer. Energy285, 129401 (2023). [Google Scholar]
- 37.Salim, A., Jummar, W. K., Jasim, F. M. & Yousif, M. Eurasian oystercatcher optimiser: new meta-heuristic algorithm. J. Intell. Syst.31(1), 332–344 (2022). [Google Scholar]
- 38.Azizi, M., Aickelin, U., Khorshidi, H. A. & Baghalzadeh Shishehgarkhaneh, M. Energy valley optimizer: a novel metaheuristic algorithm for global and engineering optimization. Sci. Rep.13, 226 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Askari, Q., Younas, I. & Saeed, M. Political optimizer: a novel socio-inspired meta-heuristic for global optimization. Knowl.-based Syst.195, 105709 (2020). [Google Scholar]
- 40.Wang, P. Electronic archive classification method based on convolutional neural network with fast text embeddings. In 2024 4th International Conference on Mobile Networks and Wireless Communications (ICMNWC) (2024).
- 41.Al-shathry, N. et al. Leveraging hybrid adaptive sine cosine algorithm with deep learning for Arabic poem meter detection. ACM Trans. Asian Low-Resour. Lang. Inf. Process. (2024).
- 42.Eswaraiah, P. & Hussain, S. A hybrid deep learning GRU based approach for text classification using Word embedding. EAI Endorsed Trans. Internet Things10, 1 (2023). [Google Scholar]
- 43.Rahayu, S. P., Afuan, L. & Yunindar, G. A. Implementation of text mining on song lyrics for song classification based on emotion using website-based logistic regression. J. Teknik Informatika (Jutif)6(1), 359–368 (2025). [Google Scholar]
- 44.Mehra, Ashman, Mehra, Aryan & Narang, Pratik. Classification and study of music genres with multimodal Spectro-Lyrical Embeddings for Music (SLEM). Multimed. Tools Appl.84(7), 3701–3721 (2025). [Google Scholar]