Scientific Reports. 2026 Feb 12;16:8527. doi: 10.1038/s41598-026-38813-z

Serial cascaded hybrid adaptive deep networks-based lyrics text classification using optimization approach

R L Jasmine 1, Saswati Mukherjee 2, C R Rene Robin 3, G David Raj 4
PMCID: PMC12976351  PMID: 41680344

Abstract

Since electronic music is simpler to produce and distribute than analog music, the variety of music available worldwide has increased rapidly along with the music marketplace’s shift from analog to digital. Due to the abundance of available songs, people are discovering songs in various ways, one of which is by analyzing their emotional content. Not every age group can listen to the same music at all times. Deep learning techniques have yielded excellent results recently, marking a significant advance in NLP. However, there have been few attempts to use a deep learning model to filter lyrics from inappropriate music. Hence, a deep learning-based lyrics text classification process is presented in this work. First, the required text data are fetched from standard online resources and then passed to the text pre-processing stage. After that, the resultant pre-processed text is given to the Serial Cascaded Hybrid Adaptive Deep Networks (SCHADNet) for classification. The Transformer-based Bidirectional Long Short-Term Memory (Trans Bi-LSTM) is integrated with a Gated Recurrent Unit (GRU) to build the SCHADNet, and the parameters of SCHADNet are optimally tuned by the Improved Marine Predators Algorithm (IMPA). Lastly, the classified outcome is obtained from the SCHADNet. The developed model shows a significant advancement in classification performance, achieving an accuracy of 93.4%, a recall of 93.47%, and an NPV of 99.2%. A numerical analysis of the suggested lyrics text classification model against numerous classical text classification techniques is performed to portray the effectiveness of the presented model.

Keywords: Lyrics text classification, Serial cascaded hybrid adaptive deep networks, Transformer-based bidirectional long short-term memory, Improved marine predators algorithm

Subject terms: Engineering, Optics and photonics

Introduction

Since the beginning of time, music has been a significant part of our lives. It profoundly affects the state of mind, ideas, and interactions with others while also evoking human feelings1. Our cultural and social life is enhanced by music, which has a range of effects on us. Music has perhaps been the most widely used medium for information, pleasure, and leisure in the past few decades. Since lyrics are a means for artists to express themselves, the library of electronic music is expanding quickly2. Some lyrics hint at aggressive, sexual, or drug themes and contain material that is not appropriate for children’s ears. Recognizing the mood of music is an ongoing area of exploration, using various techniques to identify the feelings connected to a musical composition3. These include lyric text analysis, audio evaluation, and other methods. The majority of studies on musical classification rely on examining auditory signals and musical characteristics4. Employing a slang vocabulary is the initial method. This technique compares a song’s lyrics to a list of phrases considered obscene or improper. The music is deemed unsuitable if one or more of these phrases appear in its lyrics5. Nevertheless, since there isn’t a single profane vocabulary used by all businesses, the outcomes of this approach could differ6. To ensure the swearing lexicon is updated with the latest offensive words, ongoing maintenance is necessary when using this strategy.

It is challenging to satisfy the requirements of users experiencing a range of emotions when the majority of tools just suggest well-known songs while ignoring personalized efforts7. The process of creating classification labels was primarily manual prior to the development of sophisticated software, and tracks with various musical genres were arranged into appropriate song categories8. Nevertheless, these techniques are not just ineffective but highly dependent on human judgment, and the precision of classification isn’t consistent9. The classic classification techniques, which mostly consist of methods like Logistic Regression (LR), Naive Bayes (NB), Random Forest (RF), and Support Vector Machines (SVM), are currently maturing based on human classification10. Deep learning and other machine learning techniques have been used extensively to classify sound, picture, and text, and the results have been impressive11. With the development of computer-related methods, computers are now capable of doing intricate calculations and emotional evaluation, as well as generating emotional outcomes12. The word embeddings of the lyrical data are used to determine the genre, which is highly related to the lyrics.

The minds of adolescents may be significantly impacted by such lyrics. Lyrics are becoming more explicit and aggressive13. Nevertheless, the methods in place to filter explicit words from song lyrics are ineffective14. Several methods have been proposed to classify texts, such as deep learning techniques like CNNs and RNNs, classification using machine learning methods, and lexicon-based filters. These experiments have produced differing degrees of efficiency and were carried out on various data sources and dialects15. According to some research, employing more sophisticated machine learning classifiers could assist in achieving even greater gains16. While techniques based on machine learning demonstrate promising outcomes in the area of classifying music feelings, there is still room for improvement in comprehensively identifying musical emotions, because the relationship between phrases and musical sentiment can be interpreted in different ways when processing lyrics and melodies, without considering the consistency of feelings between the lyrics and the melody17.

Motivation of the developed model

In general, music plays an important role in human emotions. Moreover, the lyrics are a vital part of a song and play an inevitable role in understanding its emotions. It is crucial to categorize lyrics using various machine learning and deep learning approaches18. Several well-known classification techniques have been adopted to classify the lyrics text from the labeled data. In recent times, the utilization of deep learning models like CNN and RNN has achieved superior outcomes and provided an exciting breakthrough with the help of Natural Language Processing (NLP)19. The imbalance of data in traditional models can result in biased models whose performance degrades on less frequent classes due to the uneven class distribution. Noise, misspellings, and inconsistent data can easily affect the data quality in CNN-based models20. Training a CNN model is computationally expensive and requires significant memory to capture the sequential nature of the lyrics. On the other hand, the RNN model can process data sequentially, yet it struggles to parallelize the computations. Thus, its training is slower than that of other traditional approaches21. Existing traditional models still struggle with inconsistent and redundant data, which often leads to misclassification. To rectify these issues, this research work develops an effective deep learning-based lyric text classification model to alleviate such challenges, and the contributions are given as follows.

  • To develop an effective deep learning-based lyric text classification model using an optimization approach that helps to categorize songs based on their mood, genre, sentiment, and performer.

  • To design the SCHADNet-based text classification model, which is useful for recognizing and analyzing the meanings expressed in lyrics and facilitates the analysis of songs’ historical and cultural context.

  • To enhance the accuracy and sensitivity while reducing the FNR and FPR, parameters such as the hidden neurons and epochs of the Trans-Bi-LSTM and GRU are tuned in the developed SCHADNet-based text classification model using the proposed IMPA algorithm.

  • To develop and evaluate the IMPA model by modifying the traditional MPA with an effective concept that helps in the parameter tuning and performance enhancement of the suggested lyrics text classification.

  • To demonstrate the efficacy of the proposed model, an empirical evaluation was conducted for the lyrics text classification approach against a variety of traditional text classification methods.

The layout of the suggested framework is provided below. The automatic lyrics text classification using an improved heuristic approach and serial cascaded hybrid adaptive deep network is shown in Section "Automatic lyrics text classification using an improved heuristic approach and serial cascaded hybrid adaptive deep networks". The pre-processing of text data for lyrics text classification is provided in Section "Text data pre-processing with performance enhancement in lyrics text classification using improved marine predators algorithm". The hybrid adaptive deep networks for the lyrics text classification model are offered in Section "Serial cascaded hybrid adaptive deep networks for lyrics text classification model". The result and discussion are available in Section "Result and discussion". The conclusion is offered in Section “Conclusion”.

Literature survey

Related works

In 2020, Abdillah et al.22 employed the deep learning Bi-LSTM algorithm with weighted GloVe keyword embeddings to identify a song’s feelings from its lyrics. The precision of the Bi-LSTM framework with a dropout layer and activity regularization was determined to be 91.08%. The difference between validation and training loss could be reduced by approximately 0.15 if the settings for dropout, activity regularization, and learning rate decay were adjusted.

In 2023, Revathy et al.23 used the Music4All database to assess the musical elements crucial for identifying four main human feelings: joyful, furious, calm, and unhappy. Several artificial intelligence methods based on a conceptual psychological model were used to accomplish this. To predict the mood of the desired information, a transfer learning method was used to comprehend the emotions of the lyrics derived from an in-domain database. A rudimentary lyric recommendation network was created using the phrase converter concept.

In 2022, Jia24 has proposed an approach for classifying musical emotions based on enhanced attention mechanisms and extensive knowledge. The characteristics of the tune’s songs were initially extracted, yielding the term frequencies weighted matrix and phrase vector. By combining the matched attention system with the extracting features capabilities of CNN and LSTM networks to handle serialized input, a framework for evaluating feelings was created. Ultimately, the CNN-LSTM model along with the Deep Neural Network (DNN)’s data outputs was combined, and the SoftMax algorithm was utilized to determine the different emotion kinds. Given the chosen data sets, the tests revealed that the suggested method’s mean accuracy in classification reached 0.848, greater than the average of the other comparative methods, while the method’s categorization efficiency had significantly increased.

In 2023, Chen et al.25 developed a model combining deep learning and machine learning to extract explicit lyrics from songs. The suggested model, ELSTM-VC, was compared to other algorithms owing to its integration of extra branch classifiers with the LSTM. With its ability to identify sexually explicit material in English phrases, the ELSTM-VC has potential applications in the entertainment sector. The suggested method successfully identified explicit phrases, according to the study’s findings, which were based on a set of 100 songs from Spotify used for training. It has the ability to accurately extract content that is objectionable for younger audiences. The suggested strategy outperformed other strategies, such as encoder-decoder algorithms and machine learning models.

In 2023, Li et al.26 have suggested a multimodal structure for classifying music genres that used lyrics and audio files. By embracing the complementary nature of multisensory data, it is possible to achieve a more thorough representation of musical styles. A CNN was employed to gather audio characteristics after the structure had first retrieved the audio’s mel-spectrogram. BERT used multiple methods concurrently to acquire the lyrics’ dispersed representation. Subsequently, the two multimodal pieces of data were combined using several techniques, including features and choice-level fusion. To address the significant difference in convergence rate between the sound channel and the melody stream, the asynchronous technique was employed at the beginning of two streams with various models. A number of tests were conducted to confirm the suggested model’s efficacy. In terms of the genre of music categorization, the suggested approach’s F1 score represented 0.87, a value that was almost 4% greater than the highest background in the trial.

In 2018, Delbouys et al.27 have developed the multimodal musical mood forecasting model using a track’s words and sound input. The use of conventional feature engineering-based methods was replicated and put forth a novel deep learning-based model. The method was able to outperform conventional algorithms on the excitation identification task, but both techniques performed similarly on the emotion forecasting challenge. The efficacy of both methods was assessed on a collection of data that had 18,000 recordings with related arousal and valence scores. The integration of modality optimized concurrently for every single-modal model resulted in a significant increase in valence predictions when evaluated afterward. A portion of the database was made available for examination.

In 2023, Carmo et al.28 have identified an imbalance in the existing research on musical data mining by applying text-based representation methods to the issue of categorizing melodic sub-genres. Identifying the line that separates groups from a single category is the challenge of the issue, given that they share several characteristics. Extensive tests were conducted in order to determine the most effective blend of written models and classifiers. The findings demonstrated that enhanced Bag-of-Words (BoW) using the Support Vector Machine (SVM) with LR methods outperformed DNN and integrating algorithms in terms of performance. The findings may lead to further research on the classification of texts with complex and delicate interfaces of separateness.

In 2017, Tsaptsinos29 created recurrent neural network models for classifying a large collection of full song lyrics. To use each of these strata and comprehend the significance of the words, lines, and sections, a Hierarchical Attention Network (HAN) was utilized. Lyrics display a hierarchical layered structure, where words merge to create lines, lines create sections, and sections make the whole song. A reduced dataset of 20 genres and an expanded dataset with 117 genres were used to evaluate the framework. The HAN’s performance on experimental data was superior to that of simpler computational models and non-neural designs, and it was also capable of discriminating across a wider range of categories than previous studies. During learning, it is additionally possible to see which lyrics or words of a song the model considers crucial for determining its genre. Consequently, the HAN offered insights into the linguistic characteristics and poetic organization that distinguish distinct genres of music from a computational standpoint.

Problem statement

Text classification is a common process that includes categorizing the text into groups utilizing advanced approaches. The text classifier has the ability to evaluate the text and assign pre-defined classes or tags based on its content. From the lyrics text classification, approaches such as categorizing music mood, genre, sentiment, and performer can be carried out. Numerous text classification works have been presented using lyrics. Some of the method’s merits and issues are given in Table 1.

  • In conventional techniques, dealing with a massive amount of data in high-quality datasets can severely affect the accuracy of the model. Training and testing on a large amount of data is a time-consuming and challenging process. Incorporating the transformer, Bi-LSTM, and GRU models allows the intrinsic patterns to be learned and the model to be trained on a large amount of data. Thus, it greatly strengthens the accuracy of the lyrics text classification model.

  • Understanding the contextual relationship of words and phrases is difficult and prone to increasing the errors in the text data. Existing deep learning models can suffer from overfitting and poor performance on unseen data. In contrast, the model implemented in this research work splits the data into training and testing phases. The developed model can thereby minimize overfitting issues and improve the model’s overall performance in this context.

  • Existing preprocessing techniques often provide inaccurate outcomes, especially with inconsistent formatting, noisy data, and the nuances of musical language. Eliminating redundant and inconsistent content from the lyrics text data is challenging for traditional techniques. However, this research work focuses on effective data pre-processing through punctuation and special character removal, removal of redundant and inappropriate data, and stemming to improve the overall performance. The data pre-processing phase removes noise to maximize the model’s accuracy.

  • The presence of repeated data can impact the classification performance of traditional models. Most research works do not focus on tuning the parameters. Parameter tuning through an optimization algorithm plays a crucial role in selecting the optimal parameters. In this research work, fine parameter tuning is performed with the help of the IMPA algorithm by selecting appropriate parameters to obtain optimal solutions.

Table 1.

Discussion on the conventional lyric text classification models.

Author [citation] Methodology Features Challenges
Abdillah et al.,22 Bi-LSTM • It enhances the available network data and enhances the contexts. • It performs slow calculations.
• It offers better data representations. • It consumes more training time.
Revathy et al.,23 BERT • It gives high-accuracy solutions. • It is a very expensive model and demands more computation.
• It requires very little memory. • It has a complex network.
Jia24 CNN • It automatically recognizes the relevant patterns. • It shows limited efficacy for the sequential data.
• It can process a high amount of data with more accuracy. • It may be affected by overfitting issues when processing small datasets.
Chen et al.,25 LSTM • It offers high-accuracy solutions. • It is computationally intensive and expensive.
• It rectifies the gradient issues of the network. • It is prone to chaotic, complex, and noisy data.
Li et al.,26 BERT • It provides better predictions due to its bi-directional nature. • It is not a straightforward process.
• It evaluates all the input without any particular direction. • It produces undesired outcomes.
Delbouys et al.,27 DNN • The computation required is minimal. • It is very hard to interpret and lacks domain expertise.
• It is very flexible and performs complex tasks. • It needs more data to train the network.
Carmo et al.,28 SVM • It prevents the network from the overfitting issues. • Complete data sources without any missing values are necessary.
• It performs rapid prediction and has good generalization. • It provides poor performance for large data sources.
Tsaptsinos29 HAN • It is very helpful in detecting the significant data. • It has dimensional issues.
• It provides better functionality in complex data. • It consumes more resources for the execution.

Automatic lyrics text classification using an improved heuristic approach and serial cascaded hybrid adaptive deep networks

Proposed lyrics text classification model

Textual lyrics pose a number of classification issues. The subjective nature and comprehension of lyrics serve as one of the primary obstacles. Individual listeners can interpret identical lyrics in different ways as they reflect the listener’s events, feelings, and viewpoints. Because of this individuality, it is challenging to develop a systematic categorization scheme that faithfully conveys the lyrics’ purposeful meaning. Lyrics’ intricate structure of language presents another difficulty. Numerous songs include literal spoken language, analogies, phraseology, and symbolic references that may pose a challenge for algorithmic techniques to understand. Advanced techniques are needed to effectively categorize and evaluate lyrics due to their complex and nuanced wording.

The overwhelming amount of lyrical data is another major obstacle. To manage the quantity of tunes and lyrics, sophisticated algorithms are necessary for analyzing and evaluating such large volumes of text. The variety of styles and categories also makes categorization even more difficult. Developing a classification system that works effectively for all genres of music is difficult because every genre may have its own distinct lyrical qualities. Furthermore, delicate or sexual content occasionally appears in songs. This makes it difficult to moderate material and guarantee proper filtering, particularly on sites wherein lyrics are posted publicly. Keeping a secure and welcoming atmosphere requires the development of strong content-filtering algorithms that can reliably recognize and identify possibly dangerous or unsuitable lyrics. By dealing with these issues, it is possible to gain a greater understanding of the global context of lyrics and songs, which enhances one’s comprehension and appreciation for the uniqueness of songs. So, we developed an effectual lyric text classification, and the pictorial view is provided in Fig. 1.

Fig. 1. Pictorial view of the developed lyric text classification model.

A novel lyric text classification model is implemented, where the primary objective is to effectively categorize songs based on their mood, genre, sentiment, and performer, resulting in a better understanding of the songs for further analytics. This classified solution helps listeners and scholars to examine and investigate musical patterns, themes, and styles. Most of the time, the lyrics have a more implicit and subtle tone, demanding a deeper understanding of the emotional undertones. Also, the emotional categorization of lyrics is somewhat subjective due to the music and the personal interpretation of the lyrics. Therefore, classifying the emotions in song lyrics is more significant than in any other text, such as books. This lyric classification model provides insights into the individual’s inner feelings. Generally, the necessitated data are garnered using benchmark data sources. In addition, the gathered data is subjected to pre-processing to enhance its quality. In this stage, operations such as (i) punctuation and special character removal, (ii) removal of redundant and inappropriate data, and (iii) stemming are performed. After the text pre-processing, the resultant data is given to the classification stage for categorizing the lyrics text. The SCHADNet model is created to achieve effective text classification by combining the Trans Bi-LSTM and GRU models. To enhance the accuracy and sensitivity while reducing the FNR and FPR, parameters such as the hidden neurons and epochs of the Trans-Bi-LSTM and GRU are tuned in the developed SCHADNet-based text classification model using the proposed IMPA algorithm. The developed SCHADNet model provides the text classification results. The output classes are the various genres, moods, performers, and sentiments of the songs.

Text dataset for classification analysis

The data necessitated to carry out the lyric text classification model are as follows.

Dataset-1 ("Multi-Lingual Lyrics for Genre Classification dataset"): The data are collected using the link of https://www.kaggle.com/datasets/mateibejan/multilingual-lyrics-for-genre-classification?select=train.csv. Access date: 2024-01-02. This dataset is in the Kaggle platform. This dataset includes two files in.csv with 11 columns. The size of this dataset is 341 MB and includes 291118 songs.

Dataset-2 (“Song-lyric-classification datasets”): The data are gathered from https://github.com/wojtek11530/song_lyric_classification/tree/master/datasets. Access date: 2024-01-03. The aim of this database is to predict the emotions of songs on the basis of their lyrics and genres. The size of this dataset is 1.86 MB, and it includes 1369 songs. The dataset is provided as a .csv file.

Dataset 3 (“Veucci/lyric-to-3genre”): This dataset is accessed from the link “https://huggingface.co/datasets/Veucci/lyric-to-3genre” (access date: 2024-08-16). This data source includes numerous song lyrics from distinct genres and artists in English. Genres such as rock, hip-hop, and pop are included in this resource.

By utilizing these datasets, the mood, genre, sentiment, and performer of the songs are classified from the lyrics.

From the datasets, the collected data is defined by Inline graphic. In this, the total count of the gathered text is expressed as Inline graphic.

Text data pre-processing with performance enhancement in lyrics text classification using improved marine predators algorithm

Text data pre-processing

Text pre-processing is a significant step in data preparation, where the original text is converted into a suitable, clean, and consistent format for modeling and evaluation. While modern techniques such as BERT, BART, and GPT are trained with punctuation and stop words retained, the pre-processing stage is still needed for reasons such as improving data quality, noise minimization, model explainability, and computational efficiency. Moreover, BERT, BART, and GPT are transformer-based models, and pre-processing the text data directly may be less time-consuming than performing the pre-processing task with these models. Hence, text pre-processing is performed in this work to improve the text quality. The collected data are inputted to the text data pre-processing stage.

Punctuation and special character removal

Punctuation removal is the process of deleting or replacing punctuation marks in the text data. Some of these punctuation marks are periods (.), commas (,), colons (:), parentheses (()), dashes (-), and so on. By removing these punctuation marks, the text data can be simplified, noise can be minimized, and the focus can be placed on meaningful words. Special character removal is the process of removing special characters such as symbols (+, -, =), currency signs ($), HTML tags (<, >), and so on. This special character removal results in enhanced model performance, better text representation, and so on.
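As an illustration, a minimal Python sketch of this step is given below; the function name and the character set are illustrative assumptions, not taken from the authors' implementation:

```python
import re
import string

def remove_punctuation_and_special_chars(lyric: str) -> str:
    """Strip HTML tags, punctuation, and currency/symbol characters from a lyric line."""
    lyric = re.sub(r"<[^>]+>", " ", lyric)                    # drop HTML remnants such as <br>
    pattern = "[" + re.escape(string.punctuation + "$€£") + "]"
    lyric = re.sub(pattern, " ", lyric)                       # drop punctuation and special characters
    return re.sub(r"\s+", " ", lyric).strip()                 # collapse repeated whitespace

print(remove_punctuation_and_special_chars("Hello, world! <br> (it's $5...)"))  # -> "Hello world it s 5"
```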

Remove redundant and inappropriate data

Redundant data is material that is repeated or reproduced within the lyrics. Eliminating duplicate information decreases needless repetition and simplifies the classification procedure. Improper data in lyrics text classification is data that is unrelated or obnoxious to the classification task. This includes information that is irrelevant, offensive, or explicit with respect to the operation. Improper data can include particular terms or whole songs that are not related to the classification aim. For example, if the aim is to categorize the lyrics on the basis of their emotional content, whole songs or terms that contain explicit or offensive language irrelevant to the emotions can be considered improper and eliminated during the pre-processing process.
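A small pandas-based sketch of this cleaning step is shown below; the column name and the blocklist contents are placeholders, since the paper does not publish its list of inappropriate terms:

```python
import pandas as pd

# Illustrative blocklist; the actual list of inappropriate terms is not published in the paper.
BLOCKLIST = {"badword1", "badword2"}

def clean_corpus(df: pd.DataFrame, text_col: str = "lyrics") -> pd.DataFrame:
    """Drop duplicated lyrics and songs containing blocklisted terms (column name assumed)."""
    df = df.drop_duplicates(subset=[text_col])                        # remove redundant entries
    keep = df[text_col].str.lower().apply(
        lambda text: not any(term in text for term in BLOCKLIST))     # flag inappropriate songs
    return df[keep].reset_index(drop=True)

# Example usage with a tiny in-memory frame
songs = pd.DataFrame({"lyrics": ["la la la", "la la la", "badword1 here"]})
print(clean_corpus(songs))  # keeps only the first row
```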

Stemming

The practice of reducing words to their base or root form is referred to as stemming. This strategy reduces the intricacy and diversity of the vocabulary in lyrics. It can collapse several variants of a single word, such as “performing” and "performs," into a common root, such as "perform." This streamlines the categorization procedure by treating comparable words as identical, irrespective of their particular form. By lowering the dimensionality of the information, stemming can increase the precision and effectiveness of models for lyrical text categorization. After executing all these processes, the pre-processing stage outputs the pre-processed data used for classification.
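For instance, the stemming step could be realized with an off-the-shelf stemmer such as NLTK's PorterStemmer; this is only an illustrative sketch, not the authors' exact tool chain:

```python
from nltk.stem import PorterStemmer  # pip install nltk

stemmer = PorterStemmer()

def stem_lyric(lyric: str) -> str:
    """Reduce each token to its root form, e.g. 'performing' and 'performs' -> 'perform'."""
    return " ".join(stemmer.stem(token) for token in lyric.split())

print(stem_lyric("performing performs performed"))  # -> "perform perform perform"
```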

Marine predators algorithm

This part examines the MPA30 method, a straightforward and effective meta-heuristic optimization technique.

Detecting top predator phase

MPA is a population-based approach, where the initialization process over the search space is performed as in Eq. (1).

$$X_{0} = X_{\min} + rand \cdot \left(X_{\max} - X_{\min}\right) \qquad (1)$$

Here, the term $rand$ is a uniform random vector in the interval [0, 1], and $X_{\min}$ and $X_{\max}$ are the lower and upper limits of the parameters.

The elite matrix is created by selecting the fittest solution as the top predator. The entries of this matrix guide the search for and localization of the prey based on the position of the top predator, as in Eq. (2).

$$Elite = \begin{bmatrix} X^{I}_{1,1} & X^{I}_{1,2} & \cdots & X^{I}_{1,d} \\ X^{I}_{2,1} & X^{I}_{2,2} & \cdots & X^{I}_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ X^{I}_{n,1} & X^{I}_{n,2} & \cdots & X^{I}_{n,d} \end{bmatrix}_{n \times d} \qquad (2)$$

Here, the term $Elite$ is the elite matrix; to create $Elite$, the top predator vector $\vec{X}^{I}$ is replicated $n$ times. The total number of dimensions is $d$, and the number of search agents is $n$. If a better predator replaces the current top predator at the end of an iteration, the $Elite$ matrix is updated.

Prey is another matrix with the same dimensions as $Elite$, according to which the predators adjust their positions. To put it simply, initialization produces the first set of prey, and the fittest of them builds the $Elite$ matrix. The $Prey$ matrix is given in Eq. (3).

$$Prey = \begin{bmatrix} X_{1,1} & X_{1,2} & \cdots & X_{1,d} \\ X_{2,1} & X_{2,2} & \cdots & X_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ X_{n,1} & X_{n,2} & \cdots & X_{n,d} \end{bmatrix}_{n \times d} \qquad (3)$$

Eq. (3) shows the $j^{th}$ dimension of the $i^{th}$ prey as $X_{i,j}$. It should be mentioned that both of these matrices play a major and direct role in the entire optimization procedure.

MPA optimization scenarios

The MPA optimization procedure is broken down into three key phases, explained as follows. Levy and Brownian motions are the primary random walks used in the MPA.

The Levy motion is a kind of random walk whose step sizes are drawn from a heavy-tailed Levy probability distribution characterized by a distribution index $\alpha$ and defined between two specific points of the walk.

In Brownian motion, the step length is drawn from a probability function given by the normal distribution with zero mean ($\mu = 0$) and unit variance ($\sigma^{2} = 1$).

Phase-1: According to the rule, in the high velocity ratio case ($v \ge 10$), when the prey moves faster than the predator, the optimum tactic for the predator is to move very slowly. The mathematical framework of this rule is given in Eq. (4) and is applied while $Iter < \frac{1}{3}Max\_Iter$.

$$\vec{stepsize}_{i} = \vec{R}_{B} \otimes \left(\vec{Elite}_{i} - \vec{R}_{B} \otimes \vec{Prey}_{i}\right), \qquad \vec{Prey}_{i} = \vec{Prey}_{i} + P \cdot \vec{R} \otimes \vec{stepsize}_{i} \qquad (4)$$

Here, the term $\vec{R}_{B}$ represents a vector of random numbers drawn from the Brownian motion distribution. Entry-wise multiplication is indicated by the symbol $\otimes$. Multiplying $\vec{R}_{B}$ by $\vec{Prey}_{i}$ mimics the movement of the prey, $\vec{R}$ represents a vector of uniformly distributed random values in [0, 1], and $P = 0.5$ is a constant. The maximum number of iterations is $Max\_Iter$, and the current iteration is $Iter$.

Phase-2: When the prey travels in Levy motion at a unit velocity ratio ($v \approx 1$), the predator's optimum course of action is Brownian motion. Using Eq. (5), the prey moves in Levy motion and the predator in Brownian motion for the first half of the population, while $\frac{1}{3}Max\_Iter < Iter < \frac{2}{3}Max\_Iter$.

$$\vec{stepsize}_{i} = \vec{R}_{L} \otimes \left(\vec{Elite}_{i} - \vec{R}_{L} \otimes \vec{Prey}_{i}\right), \qquad \vec{Prey}_{i} = \vec{Prey}_{i} + P \cdot \vec{R} \otimes \vec{stepsize}_{i}, \quad i = 1, \ldots, n/2 \qquad (5)$$

Here, the Levy motion is represented by a vector of random numbers $\vec{R}_{L}$ drawn from the Levy distribution. Prey motion is simulated in a Levy fashion by multiplying $\vec{R}_{L}$ and $\vec{Prey}_{i}$, and predator motion is simulated by adding a step size to the predator location. The remaining (second) half of the population is updated based on Eq. (6).

$$\vec{stepsize}_{i} = \vec{R}_{B} \otimes \left(\vec{R}_{B} \otimes \vec{Elite}_{i} - \vec{Prey}_{i}\right), \qquad \vec{Prey}_{i} = \vec{Elite}_{i} + P \cdot CF \otimes \vec{stepsize}_{i}, \quad i = n/2, \ldots, n \qquad (6)$$

Here, $CF = \left(1 - \frac{Iter}{Max\_Iter}\right)^{2\,Iter/Max\_Iter}$ is treated as an adaptive parameter that regulates the step size of the predator's movement. The prey changes its position in response to the predator's Brownian motion, whereas the predator's motion is simulated by multiplying $\vec{R}_{B}$ and $\vec{Elite}_{i}$.

Phase 3: Levy motion is the most effective predation strategy at low velocity ratios ($v = 0.1$). This stage is described in Eq. (7) and is applied while $Iter > \frac{2}{3}Max\_Iter$.

$$\vec{stepsize}_{i} = \vec{R}_{L} \otimes \left(\vec{R}_{L} \otimes \vec{Elite}_{i} - \vec{Prey}_{i}\right), \qquad \vec{Prey}_{i} = \vec{Elite}_{i} + P \cdot CF \otimes \vec{stepsize}_{i} \qquad (7)$$

In the Levy strategy, the motion of the predator is simulated by multiplying $\vec{R}_{L}$ and $\vec{Elite}_{i}$, and the prey position is updated by adding the step size to the $\vec{Elite}$ position.

Eddy formation

The Fish Aggregating Devices (FADs) impact is expressed numerically in Eq. (8).

$$\vec{Prey}_{i} = \begin{cases} \vec{Prey}_{i} + CF\left[\vec{X}_{\min} + \vec{R} \otimes \left(\vec{X}_{\max} - \vec{X}_{\min}\right)\right] \otimes \vec{U}, & r \le FADs \\ \vec{Prey}_{i} + \left[FADs\,(1 - r) + r\right]\left(\vec{Prey}_{r1} - \vec{Prey}_{r2}\right), & r > FADs \end{cases} \qquad (8)$$

The probability of the FADs affecting the optimization process is represented by $FADs = 0.2$. $\vec{U}$ is a binary vector whose entries are zero or one; it is constructed by generating a random vector in the interval [0, 1] and setting an entry to 0 when it is smaller than 0.2 and to 1 when it is larger than 0.2. Here, the term $r$ indicates a uniform random number in [0, 1]. The vectors holding the lower and upper bounds of the dimensions are denoted by $\vec{X}_{\min}$ and $\vec{X}_{\max}$. The subscripts $r1$ and $r2$ represent random indices of the prey matrix.
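For readers who prefer code to notation, the following NumPy sketch reproduces the position-update rules of Eqs. (4)–(8) under the standard MPA constants (P = 0.5, FADs = 0.2, Levy index 1.5); it is an illustrative reimplementation, not the authors' code:

```python
import numpy as np
from math import gamma, sin, pi

P, FADS, BETA = 0.5, 0.2, 1.5  # standard MPA constants; BETA is the Levy index

def levy(shape, beta=BETA):
    """Mantegna-style Levy-distributed step sizes."""
    sigma = (gamma(1 + beta) * sin(pi * beta / 2) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = np.random.normal(0, sigma, shape)
    v = np.random.normal(0, 1, shape)
    return u / np.abs(v) ** (1 / beta)

def mpa_update(prey, elite, it, max_it, lb, ub):
    """One iteration of the prey-position update (Eqs. 4-8)."""
    n, d = prey.shape
    cf = (1 - it / max_it) ** (2 * it / max_it)
    r = np.random.rand(n, d)
    rb, rl = np.random.normal(0, 1, (n, d)), levy((n, d))
    new = prey.copy()
    if it < max_it / 3:                       # Phase 1 (Eq. 4): Brownian step for the prey
        step = rb * (elite - rb * prey)
        new = prey + P * r * step
    elif it < 2 * max_it / 3:                 # Phase 2 (Eqs. 5-6): half Levy, half Brownian
        half = n // 2
        step1 = rl[:half] * (elite[:half] - rl[:half] * prey[:half])
        new[:half] = prey[:half] + P * r[:half] * step1
        step2 = rb[half:] * (rb[half:] * elite[half:] - prey[half:])
        new[half:] = elite[half:] + P * cf * step2
    else:                                     # Phase 3 (Eq. 7): Levy step around the elite
        step = rl * (rl * elite - prey)
        new = elite + P * cf * step
    # Eddy formation / FADs effect (Eq. 8)
    if np.random.rand() <= FADS:
        u = (np.random.rand(n, d) > FADS).astype(float)
        new = new + cf * (lb + np.random.rand(n, d) * (ub - lb)) * u
    else:
        r_scalar = np.random.rand()
        r1, r2 = np.random.permutation(n), np.random.permutation(n)
        new = new + (FADS * (1 - r_scalar) + r_scalar) * (prey[r1] - prey[r2])
    return np.clip(new, lb, ub)
```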

Proposed IMPA-based classification performance enhancement

The developed IMPA is employed for tuning the parameters in the developed SCHADNet-based lyric text classification model. The conventional MPA has several pros and cons. Exploration and exploitation are efficiently balanced by the MPA algorithm. It explores the search space and exploits the most effective solutions currently discovered by employing the concepts of predator and prey movement. It has proven to be effective in resolving intricate optimization problems involving several optima. Thus, the MPA is selected for the suggested SCHADNet model's optimization. However, when the problem environment is dynamic, it might have trouble: its efficiency may be impacted if it is unable to react swiftly to abrupt shifts in its surroundings. It may also have trouble scaling up to high-dimensional problems, making it more difficult to efficiently search for and identify the best solutions when complexity rises. As a consequence, we developed an enhanced MPA named IMPA to optimize parameters such as the hidden neurons and epochs of the Trans-Bi-LSTM and GRU in the developed SCHADNet-based text classification model, in order to enhance the accuracy and sensitivity while reducing the FNR and FPR. The conventional MPA includes phases such as detecting the top predator, the MPA optimization scenarios, eddy formation, and marine memory saving; to streamline the lyric text classification process, the developed IMPA modifies the conventional MPA by removing the top-predator detection phase and the marine memory saving phase. This IMPA-aided parameter tuning enhances the performance rates of the developed lyrics text classification process with better convergence rates.

Optimization process using developed IMPA

The optimization process begins by considering the population and iteration count for the movement of predators and prey in the marine ecosystem. The iterative process is run for several independent runs to obtain the optimal solutions. Moreover, the solution in the developed IMPA algorithm is encoded to maximize the model performance. The optimal solution can be effectively attained by adjusting the essential parameters so as to minimize the error rate and maximize the desired outcomes. Using Eq. (21), the objective function of the model is evaluated to select the relevant parameters based on the algorithmic rules. Iteratively updating the random parameters significantly improves the model's searchability and provides a better balance between the exploitation and exploration abilities. Fig. 2 and Algorithm 1 show the flowchart and pseudo-code of the proposed IMPA algorithm.

Fig. 2. Flowchart for the proposed IMPA algorithm.

Algorithm 1. Developed IMPA.
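A small sketch of how the IMPA chromosome could be decoded and scored for this tuning task is given below. The 4-gene encoding follows the chromosome length reported in the experiments, and the bounds follow the ranges [5, 255] for hidden neurons and [5, 50] for epochs; the exact fitness expression of Eq. (21) is not recoverable from the extracted text, so the form used here (rewarding accuracy and sensitivity while penalizing FNR and FPR) is an assumption:

```python
import numpy as np

# Bounds reported in the experiments: hidden neurons in [5, 255], epochs in [5, 50]
LOWER = np.array([5, 5, 5, 5], dtype=float)
UPPER = np.array([255, 255, 50, 50], dtype=float)

def decode(solution):
    """Map one 4-gene chromosome to SCHADNet hyperparameters (key names are illustrative)."""
    hn_tbl, hn_gru, ep_tbl, ep_gru = np.clip(np.round(solution), LOWER, UPPER).astype(int)
    return {"hidden_trans_bilstm": hn_tbl, "hidden_gru": hn_gru,
            "epochs_trans_bilstm": ep_tbl, "epochs_gru": ep_gru}

def fitness(solution, train_and_eval):
    """Assumed objective (cf. Eq. 21): reward accuracy/sensitivity, penalise FNR/FPR.
    `train_and_eval` is a user-supplied callable that trains SCHADNet with the decoded
    hyperparameters and returns (accuracy, sensitivity, fnr, fpr) on validation data."""
    acc, sen, fnr, fpr = train_and_eval(decode(solution))
    return 1.0 / (acc + sen + 1e-9) + fnr + fpr  # lower is better
```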

Serial cascaded hybrid adaptive deep networks for lyrics text classification model

Bidirectional long short term memory

Bi-LSTM31 was derived from the LSTM; it is a deep-learning network design that can learn on its own from the initial feature vectors of the input documents.

The Bi-LSTM can generalize and capture features deeply. As an outcome, the Bi-LSTM becomes a generalized network for unseen features. Word order, bidirectional contextual relationships, and dependencies can be acquired over time. Moreover, it can effectively address the vanishing and exploding gradient problems.

When used for classification, the two LSTM neural networks in the Bi-LSTM, running forward and backward, are linked to a single output layer. Bi-LSTM adds context knowledge for every point of the incoming sequence to increase reliability. Three gates and a single cell state make up the fundamental construction of an LSTM unit.

The input, forget, and output gates control how the cell state is updated and preserved throughout the LSTM unit. The forget gate decides how much knowledge of the previous cell state to keep, the output gate manages which elements of the revised cell state are produced, and the input gate manages which components of the new data are retained in the cell's memory. Equations (9) to (13) illustrate the particular operations of the LSTM unit.

$$i_{t} = \sigma\left(W_{i}\,[h_{t-1}, x_{t}] + b_{i}\right) \qquad (9)$$
$$f_{t} = \sigma\left(W_{f}\,[h_{t-1}, x_{t}] + b_{f}\right) \qquad (10)$$
$$o_{t} = \sigma\left(W_{o}\,[h_{t-1}, x_{t}] + b_{o}\right) \qquad (11)$$
$$c_{t} = f_{t} \otimes c_{t-1} + i_{t} \otimes \tanh\left(W_{c}\,[h_{t-1}, x_{t}] + b_{c}\right) \qquad (12)$$
$$h_{t} = o_{t} \otimes \tanh\left(c_{t}\right) \qquad (13)$$

Here, the terms $x_{t}$ and $h_{t-1}$ constitute the input vector and the hidden-layer value at time $t$ and $t-1$, respectively, and $i_{t}$, $f_{t}$, $o_{t}$, and $c_{t}$ indicate the outputs of the input gate, forget gate, output gate, and cell at time $t$. The weight matrices and bias terms are represented by $W$ and $b$, respectively, and their subscripts ($i$, $f$, $o$, $c$) indicate the gate to which each weight matrix and bias belongs. The term $\sigma$ stands for the sigmoid activation function.

Context-sensitive data can be retained by the Bi-LSTM through the use of the LSTM unit. The Bi-LSTM architecture has two identical LSTM layers running in opposite directions. Like traditional LSTM networks, both of these parallel LSTM layers operate in a comparable way. The input vector $x_{t}$ at the $t^{th}$ time step is handled by two separate LSTM layers for the forward and backward directions, respectively. The result is the combination of the hidden state vectors, expressed in Eq. (14).

$$h_{t} = W_{\overrightarrow{h}}\,\overrightarrow{h}_{t} + W_{\overleftarrow{h}}\,\overleftarrow{h}_{t} + b_{h} \qquad (14)$$

In this case, $b_{h}$ indicates the bias, the terms $W_{\overrightarrow{h}}$ and $W_{\overleftarrow{h}}$ are the weight matrices of the two parallel LSTM layers in the forward and backward directions, correspondingly, and the terms $\overrightarrow{h}_{t}$ and $\overleftarrow{h}_{t}$ represent the final outputs of both parallel LSTM layers. The Bi-LSTM model's graphical presentation is displayed in Fig. 3.

Fig. 3. Graphical presentation of the Bi-LSTM model.
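A plain NumPy sketch of one LSTM time step (Eqs. 9–13) and of the Bi-LSTM output combination (Eq. 14) is given below; the weight names and shapes are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step following Eqs. (9)-(13); W, U, b are dicts keyed by gate name."""
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])       # input gate
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])       # forget gate
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])       # output gate
    c_tilde = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])   # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde                           # updated cell state, Eq. (12)
    h_t = o_t * np.tanh(c_t)                                     # hidden state, Eq. (13)
    return h_t, c_t

def bilstm_output(h_forward, h_backward, W_f, W_b, b_h):
    """Eq. (14): combine the forward and backward hidden states of the Bi-LSTM."""
    return W_f @ h_forward + W_b @ h_backward + b_h
```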

Gated recurrent unit

GRU32 is a newer architecture reminiscent of LSTM and an improved variant of the RNN. It transfers data using a hidden state rather than a separate internal cell state. In addition, there are two gates: a reset gate and an update gate. The former chooses what knowledge about past events to discard, while the latter decides whether to discard or retain fresh data. The data is scaled to the range [0, 1] by a sigmoid gate, and the model is graphically illustrated in Fig. 4.

Fig. 4. Graphical illustration of the GRU model.

If the gate value is 0, the hidden state does not allow any knowledge to pass through, and 1 indicates that the details must be passed to the following state. The candidate-state activation uses a tanh function that squashes the values to the range [-1, 1]. Equations (15) to (18) define the GRU.

$$z_{t} = \sigma\left(W_{z}\,x_{t} + U_{z}\,h_{t-1} + b_{z}\right) \qquad (15)$$
$$r_{t} = \sigma\left(W_{r}\,x_{t} + U_{r}\,h_{t-1} + b_{r}\right) \qquad (16)$$
$$\tilde{h}_{t} = \tanh\left(W_{h}\,x_{t} + U_{h}\,\left(r_{t} \otimes h_{t-1}\right) + b_{h}\right) \qquad (17)$$
$$h_{t} = \left(1 - z_{t}\right) \otimes h_{t-1} + z_{t} \otimes \tilde{h}_{t} \qquad (18)$$

Here, the term $t$ serves as the time stamp, $x_{t}$ is the input value, and $h_{t}$ gives the hidden state. The respective weights that correspond to the update gate $z_{t}$ and reset gate $r_{t}$ are denoted by $W_{z}$ and $W_{r}$, correspondingly, and the term $\tilde{h}_{t}$ is the candidate output. Some major benefits of GRU over LSTM and similar sequential learning algorithms include effective training with fewer parameters, insensitivity to noise, and better distributed data representation within the GRU. To identify temporal connections and variations in usage, GRU compares what was consumed at one time stamp to the amount consumed at the next stamp and makes predictions on this basis. It is also economical in memory and time due to the usage of only two gates.
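The GRU recurrence of Eqs. (15)–(18) can be written compactly as a single NumPy step; the weight names are illustrative, and the state update follows the common Cho et al. convention:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh):
    """One GRU time step following Eqs. (15)-(18)."""
    z_t = sigmoid(Wz @ x_t + Uz @ h_prev + bz)               # update gate, Eq. (15)
    r_t = sigmoid(Wr @ x_t + Ur @ h_prev + br)               # reset gate, Eq. (16)
    h_tilde = np.tanh(Wh @ x_t + Uh @ (r_t * h_prev) + bh)   # candidate state, Eq. (17)
    return (1.0 - z_t) * h_prev + z_t * h_tilde              # new hidden state, Eq. (18)
```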

Developed SCHADNet-aided lyrics text classification

Transformer

Transformer33 is an encoder-decoder architecture that performs sequence-to-sequence conversion. An encoder is used to convert text into vector form during this classification process. The particular working principle is to use the encoding component to transform the inputs into a fixed-length vector. However, since the goal of this work is to encode the original input context into a matrix in order to extract high-level characteristics, only the encoder component of the Transformer is employed, as there is no requirement to turn this encoded vector back into an output sequence. This component consists of $N$ identical layers, where every layer consists of a pair of sub-layers: a fully connected feed-forward network and a multi-head attention mechanism. A residual connection and normalization follow each of the two sub-layers. Stacking several scaled dot-product attention operations yields the multi-head attention. The transformer's input is a vector $X$ containing $n$ tokens obtained from the input embedding layer. Three linear transformation matrices $W^{Q}$, $W^{K}$, and $W^{V}$ are randomly initialized and multiplied with the input vector to obtain the query matrix $Q$, key matrix $K$, and value matrix $V$, where $d_{k}$ indicates the hidden dimension. Scaled dot-product attention is crucial to the Transformer encoder. The similarity between every vector $q_{i}$ of the query matrix and every vector $k_{j}$ in the key matrix is computed, and the resulting similarity scores are normalized to determine the attention weights. The weight vector is then applied to the value matrix to obtain the scaled dot-product attention result in Eq. (19).

$$Attention\left(Q, K, V\right) = softmax\!\left(\frac{Q K^{T}}{\sqrt{d_{k}}}\right) V \qquad (19)$$

Here, the square root of the vector dimension $d_{k}$ of the $K$ matrix is used as the scaling coefficient $\sqrt{d_{k}}$. A significantly greater number of characteristics may be obtained by repeatedly learning different subspaces through $h$ independent linear projections of the query, key, and value matrices with different parameters. After that, the multi-head attention mechanism generates the output in Eq. (20).

$$MultiHead\left(Q, K, V\right) = Concat\left(head_{1}, \ldots, head_{h}\right) W^{O}, \qquad head_{i} = Attention\!\left(Q W_{i}^{Q}, K W_{i}^{K}, V W_{i}^{V}\right) \qquad (20)$$

Here, the concatenated vector $Concat\left(head_{1}, \ldots, head_{h}\right)$ is projected by the output matrix $W^{O}$. The embedded characteristics and the contents are inputted to the transformer encoder, from which the sentence-level hidden representation and the word-level hidden representation are obtained.
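The attention operations of Eqs. (19) and (20) can be sketched directly in NumPy as follows (the head splitting and projection-matrix names are illustrative):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Eq. (19): softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)             # row-wise softmax
    return weights @ V

def multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads):
    """Eq. (20): several attention heads in parallel, concatenated and projected with Wo."""
    d = X.shape[1]
    head_dim = d // num_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    heads = [scaled_dot_product_attention(Q[:, i * head_dim:(i + 1) * head_dim],
                                          K[:, i * head_dim:(i + 1) * head_dim],
                                          V[:, i * head_dim:(i + 1) * head_dim])
             for i in range(num_heads)]
    return np.concatenate(heads, axis=-1) @ Wo
```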

Reason for choosing the SCHADNet model

The SCHADNet model is developed for the classification of lyrical text. The pre-processed data are given to the recommended SCHADNet model for classifying the lyric text. Models such as the transformer, Bi-LSTM, and GRU are incorporated into the developed SCHADNet model. This work utilizes the GRU and Bi-LSTM as the primary networks for the classification process. These two techniques are relatively better than conventional transformers because of faster training and efficiency. Moreover, these networks are low-cost and extract richer features than other models. The Bi-LSTM technique can capture the contextual relationships among the words and minimizes the impact of noisy data. However, the Bi-LSTM model's sequential nature makes it complex to parallelize, resulting in poor scalability. The Trans-Bi-LSTM network effectively performs tasks in parallel, thus improving the scalability. Although this network has better scalability, it struggles to handle very long sequences and may face vanishing gradient issues. Therefore, the GRU is combined with the Trans-Bi-LSTM model, thus forming the hybrid network. This hybridized network handles variable-length sequences and also enhances interpretability. Here, the features obtained from the Trans-Bi-LSTM are passed to the GRU model for further processing. After classifying the text features, the GRU model offers the classified outcome. Although this serially cascaded hybrid network offers relatively promising solutions, the parameters in the network require careful tuning to achieve maximum accuracy in the text classification process. For this objective, the IMPA is considered; it is an effective algorithm offering optimal solutions with better convergence values. Therefore, the parameter tuning is performed by employing the IMPA. Thus, the SCHADNet network is chosen for text classification.

The transformer model is combined with this technique, thus constructing the Trans-Bi-LSTM. Initially, the obtained pre-processed data are fed into the Trans-Bi-LSTM model, which supports determining and processing the inputted features and comprises the transformer and Bi-LSTM. The SCHADNet-based lyric text classification model is used to recognize and analyze lyric texts. Parameters such as the hidden neurons and epochs of the Trans-Bi-LSTM and GRU are tuned in the developed SCHADNet-based text classification model to enhance the accuracy and sensitivity while reducing the FNR and FPR; the mathematical formulations are shown in Eqs. (22) to (25). The objective function of the recommended SCHADNet-based lyric text classification system is shown in Eq. (21). IMPA's support has enabled the SCHADNet network to provide highly accurate classified solutions. In the SCHADNet training process, the dataset is divided into two sections for training and testing in the ratio of 75:25. The training data is used to train the SCHADNet model for the classification process.

[Eq. (21): objective function combining the accuracy, sensitivity, FNR, and FPR of the classifier.]

Here, the optimized hidden neuron counts of the Trans-Bi-LSTM and GRU are tuned in the range [5, 255], and the optimized epoch counts of the Trans-Bi-LSTM and GRU are tuned in the range [5, 50].
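A hedged Keras sketch of the serial cascade described above — an embedding layer feeding a transformer encoder block, a Bi-LSTM, and a GRU before the softmax output — is given below. The tunable hidden-neuron counts are passed in as arguments so that IMPA-optimized values can be plugged in; all other layer sizes are illustrative assumptions rather than the authors' configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_schadnet(vocab_size=20000, max_len=200, embed_dim=128,
                   hidden_bilstm=128, hidden_gru=64, num_classes=4, num_heads=4):
    """Sketch of the serial cascade: transformer encoder -> Bi-LSTM -> GRU -> softmax."""
    inputs = layers.Input(shape=(max_len,))
    x = layers.Embedding(vocab_size, embed_dim)(inputs)

    # Transformer encoder block: multi-head self-attention + feed-forward, with residuals
    attn = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim // num_heads)(x, x)
    x = layers.LayerNormalization()(x + attn)
    ffn = layers.Dense(embed_dim, activation="relu")(x)
    x = layers.LayerNormalization()(x + ffn)

    # Serial cascade: the Bi-LSTM sequence output feeds the GRU
    x = layers.Bidirectional(layers.LSTM(hidden_bilstm, return_sequences=True))(x)
    x = layers.GRU(hidden_gru)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)

    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model
```

In this sketch the 75:25 training/testing split and the IMPA-selected hidden-neuron and epoch values would be supplied by the surrounding training loop.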

(i) Eq. (22) is used to assess the accuracy.

$$Acc = \frac{TP + TN}{TP + TN + FP + FN} \qquad (22)$$

(ii) Eq. (23) is used for estimating the sensitivity.

$$Sen = \frac{TP}{TP + FN} \qquad (23)$$

(iii) Eq. (24) defines the False Negative Rate (FNR).

$$FNR = \frac{FN}{FN + TP} \qquad (24)$$

(iv) Eq. (25) is used for evaluating the False Positive Rate (FPR).

$$FPR = \frac{FP}{FP + TN} \qquad (25)$$

In this case, the terms $TP$ and $TN$ constitute the true positives and true negatives, and $FN$ and $FP$ stand for the false negatives and false positives, correspondingly. Fig. 5 displays the representation of the SCHADNet-aided lyrics text classification model.

Fig. 5. Developed SCHADNet-aided lyrics text classification model.
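The four tuned measures above follow directly from confusion-matrix counts; a minimal helper is sketched below with illustrative counts:

```python
def rates_from_counts(tp, tn, fp, fn):
    """Eqs. (22)-(25): accuracy, sensitivity, FNR, and FPR from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    fnr = fn / (fn + tp)
    fpr = fp / (fp + tn)
    return accuracy, sensitivity, fnr, fpr

print(rates_from_counts(tp=80, tn=90, fp=10, fn=20))  # illustrative counts only
```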

Interacting the model with each other

In order to provide better classification performance, effective data pre-processing is performed to supply clean, noise-free data. The pre-processing techniques remove punctuation, special characters, and redundant or inappropriate data, and apply stemming. Pre-processing helps to minimize the computational complexity and strengthen the model's capability. The data are cleaned and reduced to meaningful information by removing punctuation and special characters. Thus, the noisy and irrelevant data are removed so that relevant features can be extracted, which enhances the classification performance and makes the model's outcome simpler and easier to interpret. In this context, the pre-processed outcome is inputted into the SCHADNet classification model to precisely classify the lyrics text. The developed SCHADNet model can learn semantic patterns to understand the user's preference based on underlying patterns.

Result and discussion

Simulation setup

The Python platform was employed for the entire lyrics text classification processing. The proposed IMPA scheme used a maximum of 50 iterations. The population of the suggested IMPA algorithm is encoded using the required number of parameters; the IMPA population size was 10. Each chromosome represents an individual solution encoded in a format suitable for manipulation by the algorithm; for the IMPA, the chromosome length was 4. The designed SCHADNet-based lyrics text classification process used the following parameters: number of epochs: 50, batch size: 16, number of LSTM units: 64, number of transformer encoder layers: 2, number of attention heads: 4, learning rate: 0.0001, number of GRU units: 64, dropout rate: 0.2, hidden layer size: 128, activation function: TanH, optimizer: {SGD, Adam, RMSprop}. Finally, the network produces highly accurate lyrics text classification solutions. The performance was validated against numerous existing systems such as LSTM34, Trans-Bi-LSTM35, GRU32, and Trans-Bi-LSTM-GRU36, and against algorithms such as the Eurasian Oystercatcher Optimizer (EOO)37, Energy Valley Optimizer (EVO)38, Political Optimizer (PO)35, and Marine Predators Algorithm (MPA)26.

In the experiment, the selection of parameters is treated as an automated search for the most efficient configuration within a predefined range. The process begins by defining the specific hyperparameters, such as the hidden neuron counts and epoch sizes. The IMPA then initializes a population of candidate solutions. Each candidate represents a unique combination of parameters that is used to train the Trans-Bi-LSTM and GRU. The resulting performance is assigned as the fitness score of that specific combination. As the algorithm iterates, it refines these values through its exploration and exploitation phases. Throughout this process, the algorithm constantly compares new combinations against the current best performer. Upon reaching the maximum number of iterations, the IMPA outputs the global best solution, which contains the optimized values for neurons and epochs that yielded the highest accuracy. These optimized values are then finalized as the parameters of the experimental model. Thus, choosing the hidden neuron count of the Trans-Bi-LSTM and GRU within the range [5, 255] effectively balances architectural depth with computational efficiency, and selecting the number of epochs of the Trans-Bi-LSTM and GRU within [5, 50] helps the model generalize well on unseen data.

Experimental measures

The following measures are employed to develop the lyric text classification framework.

(a) Eq. (26) determines the precision.

$$Prec = \frac{TP}{TP + FP} \qquad (26)$$

(b) Eq. (27) is used to determine the F1-score.

$$F1 = \frac{2 \cdot TP}{2 \cdot TP + FP + FN} \qquad (27)$$

(c) Eq. (28) is used to assess the specificity.

$$Spec = \frac{TN}{TN + FP} \qquad (28)$$

(d) Eq. (29) yields the Matthews Correlation Coefficient (MCC).

$$MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \qquad (29)$$

(e) Eq. (30) is used to compute the Negative Predictive Value (NPV).

$$NPV = \frac{TN}{TN + FN} \qquad (30)$$

(f) Eq. (31) provides the definition of the False Discovery Rate (FDR).

$$FDR = \frac{FP}{FP + TP} \qquad (31)$$
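The additional measures of Eqs. (26)–(31) can be computed from the same confusion-matrix counts; a small sketch follows:

```python
def extended_metrics(tp, tn, fp, fn):
    """Eqs. (26)-(31): precision, F1-score, specificity, MCC, NPV, and FDR."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    specificity = tn / (tn + fp)
    mcc = (tp * tn - fp * fn) / (((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5)
    npv = tn / (tn + fn)
    fdr = fp / (fp + tp)
    return precision, f1, specificity, mcc, npv, fdr

print(extended_metrics(tp=80, tn=90, fp=10, fn=20))  # illustrative counts only
```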

Convergence analysis

Fig. 6 provides the analysis of the proposed lyric text classification model in terms of the convergence score. The proposed technique's convergence over the existing models is validated using this cost-function-based experiment. The developed lyrics text classification model yields a cost function score that is 11.42% lower than EOO-SCHADNet, 9.26% lower than EVO-SCHADNet, 11.26% lower than PO-SCHADNet, and 13.48% lower than MPA-SCHADNet at the 30th iteration. At the 40th iteration, the cost function is 17.33% lower than EOO-SCHADNet, 7% lower than EVO-SCHADNet, 12.47% lower than PO-SCHADNet, and 11.42% lower than MPA-SCHADNet. The proposed IMPA-SCHADNet achieved a higher convergence rate than the existing techniques due to the lower cost function values of the designed model. It can also be seen that the IMPA-SCHADNet technique supports classifying the lyric texts more efficiently than the other models.

Fig. 6. Cost function analysis of the developed lyrics text classification model based on (a) Dataset-1 and (b) Dataset-2.

Dataset-1-based performance analysis on the proposed lyrics text classification model

Dataset-1-based analysis of the lyrics text classification model is shown in Fig. 7 with existing classifiers and Fig. 8 with heuristic approaches. This experiment takes into account activation functions such as linear, sigmoid, TanH, softmax, and ReLU to verify the designed model's improved performance rates. This activation function-aided validation shows how effectively the designed model classifies the lyrics compared to the other traditional techniques. When analyzing the classifiers with the sigmoid function, the developed model offers an accuracy score that is 8.23% higher than LSTM, 2.22% higher than Trans Bi-LSTM, 6.97% higher than GRU, and 1.09% higher than Trans-Bi-LSTM-GRU. For the TanH function, the developed model offers an NPV value that is 0.91% higher than EOO-SCHADNet, 0.61% higher than EVO-SCHADNet, 0.51% higher than PO-SCHADNet, and 0.24% higher than MPA-SCHADNet when compared against the algorithms. The IMPA-SCHADNet model is better suited for text classification than the other traditional techniques because of these superior values.

Fig. 7. Classifier analysis of the proposed lyrics text classification model in terms of (a) Accuracy, (b) FNR, (c) FPR, and (d) NPV.

Fig. 8.

Algorithmic analysis of the proposed lyrics text classification model in terms of (a) Accuracy, (b) FNR, (c) FPR, and (d) NPV.

Dataset-2-based analysis on the proposed lyrics text classification model

Dataset-2-based performance analysis of the proposed lyrics text classification model against existing classifiers and heuristic approaches is shown in Fig. 9 and Fig. 10. This experiment likewise uses the standard activation functions to analyze the designed model against the previous techniques. Among the classifiers, under the ReLU function the developed model offers an accuracy that is 27.28% higher than LSTM, 31.03% higher than Trans-Bi-LSTM, 22.05% higher than GRU, and 18.8% higher than Trans-Bi-LSTM-GRU. Under the linear function, the NPV of the developed model is 9.62% higher than LSTM, 6.8% higher than Trans-Bi-LSTM, 4.92% higher than GRU, and 2.56% higher than Trans-Bi-LSTM-GRU. The other performance measures likewise show better rates for the designed technique than for the other classification techniques. Thus, the designed lyrics classification approach offers more efficient solutions than the other models on the second dataset.

Fig. 9.

Classifier analysis on the proposed lyrics text classification model in terms of (a) Accuracy, (b) FNR, (c) FPR, and (d) NPV.

Fig. 10.

Algorithmic analysis of the proposed lyrics text classification model in terms of (a) Accuracy, (b) FNR, (c) FPR, and (d) NPV.

Performance analysis of the developed model using dataset-3

Based on the third dataset, the experimental validation against the previous algorithms and models is given in Fig. 11 and Fig. 12. The graph analysis considers the ReLU, sigmoid, linear, TanH, and softmax activation functions to demonstrate superior outcomes, and this validation shows the superior solutions of the suggested lyrics text classification framework across the various activation functions. Under the ReLU activation function in Fig. 11(b), the FNR of the designed lyrics text classification process is reduced by 38.82% relative to LSTM, 61.17% relative to Trans-Bi-LSTM, 35.29% relative to GRU, and 11.76% relative to Trans-Bi-LSTM-GRU. The designed process yields relatively lower error rates than the other models, which leads to higher performance rates.

Fig. 11.

Classifier-based analysis of the proposed lyrics text classification model in terms of (a) Accuracy, (b) FNR, and (c) Precision.

Fig. 12.

Algorithmic analysis of the proposed lyrics text classification model in terms of (a) Accuracy, (b) FNR, and (c) Precision.

Overall classifier analysis on the proposed lyrics text classification model

Table 2 illustrates the overall classification analysis of the proposed lyrics text classification model on the three datasets. Standard true and false measures are used for this experimental validation; they show the reliability and efficiency of the presented work relative to the existing models. On dataset-1, the F1-Score of the developed model is 20.40% higher than LSTM, 8.16% higher than Trans Bi-LSTM, 13.69% higher than GRU, and 6.04% higher than Trans-Bi-LSTM-GRU. On dataset-2, the specificity of the developed model is 3.94% higher than LSTM, 5.2% higher than Trans-Bi-LSTM, 3.6% higher than GRU, and 0.84% higher than Trans-Bi-LSTM-GRU. Similarly, on the third dataset, the FDR of the suggested lyrics text classification process is reduced by 16.56% relative to LSTM, 24.43% relative to Trans-Bi-LSTM, 16.3% relative to GRU, and 6.48% relative to Trans-Bi-LSTM-GRU. Across all three datasets and all performance metrics, the suggested model offers superior solutions, showing low error rates and high classification accuracy compared with the conventional techniques.

Table 2.

Overall classifier analysis on the proposed lyrics text classification model.

Terms LSTM34 Trans-Bi-LSTM35 GRU32 Trans-Bi-LSTM-GRU36 IMPA-SCHADNet
Dataset-1
Precision 47.009 54.713 50.961 56.306 61.369
Recall 88.872 91.574 90.354 92.046 93.471
NPV 98.628 98.988 98.827 99.049 99.230
FPR 11.131 8.422 9.661 7.936 6.538
FNR 11.128 8.426 9.646 7.954 6.529
Accuracy 88.869 91.578 90.341 92.062 93.463
FDR 52.991 45.287 49.039 43.694 38.631
Specificity 88.869 91.578 90.339 92.064 93.462
F1-Score 61.492 68.500 65.166 69.871 74.092
MCC 0.596 0.668 0.634 0.682 0.726
Dataset-2
Recall 89.701 88.459 89.920 92.549 92.988
Specificity 89.579 88.507 89.871 92.330 93.109
Precision 74.155 71.955 74.742 80.088 81.812
FDR 25.845 28.045 25.258 19.912 18.188
FPR 10.421 11.493 10.129 7.670 6.891
Accuracy 89.609 88.495 89.883 92.385 93.079
NPV 96.309 95.834 96.396 97.381 97.551
FNR 10.299 11.541 10.080 7.451 7.012
F1-Score 81.190 79.358 81.631 85.869 87.043
MCC 0.747 0.722 0.753 0.811 0.827
Dataset-3
MCC 0.6337755 0.6019229 0.6348357 0.6724605 0.6964619
Recall 90.340578 89.115489 90.397094 91.700065 92.515413
FDR 49.040598 52.351401 48.937052 44.803024 42.075965
Precision 50.959402 47.648599 51.062948 55.196976 57.924035
FNR 9.6594218 10.884511 9.6029058 8.2999349 7.4845873
FPR 9.6598813 10.878997 9.6259563 8.2702601 7.4670122
Specificity 90.340119 89.121003 90.374044 91.72974 92.532988
NPV 98.825917 98.661148 98.833139 99.004646 99.109276
Accuracy 90.340165 89.120452 90.376349 91.726772 92.53123
F1-Score 65.162102 62.095661 65.261427 68.913115 71.24283

Overall analysis on the proposed lyrics text classification model based on algorithms

The overall analysis of the proposed lyrics text classification model on the three datasets is shown in Table 3. On dataset-1, the MCC of the developed model is 22.01% higher than EOO-SCHADNet, 17.47% higher than EVO-SCHADNet, 15.6% higher than PO-SCHADNet, and 7.55% higher than MPA-SCHADNet. On dataset-2, the FDR is 41.85% lower than EOO-SCHADNet, 38.39% lower than EVO-SCHADNet, 30.46% lower than PO-SCHADNet, and 12.8% lower than MPA-SCHADNet. Likewise, on the third dataset, the precision of the recommended lyrics text classification process is enhanced by 25.94% over EOO-SCHADNet, 21.25% over EVO-SCHADNet, 13.27% over PO-SCHADNet, and 7.89% over MPA-SCHADNet. These experimental validations across the three data sources report the superior solutions of the designed technique and show that the designed text classification process achieves more effective solutions than the conventional models, ensuring its robustness and reliability.

Table 3.

Overall algorithmic analysis of the proposed lyrics text classification model.

Terms EOO-SCHADNet37 EVO-SCHADNet38 PO-SCHADNet39 MPA-SCHADNet30 IMPA-SCHADNet
Dataset-1
Accuracy 88.863 89.743 90.121 91.827 93.463
Recall 88.853 89.748 90.144 91.827 93.471
Specificity 88.864 89.742 90.119 91.827 93.462
Precision 46.994 49.294 50.339 55.523 61.369
FPR 11.136 10.258 9.881 8.173 6.538
FNR 11.147 10.252 9.856 8.173 6.529
NPV 98.625 98.747 98.799 99.021 99.230
FDR 53.006 50.706 49.661 44.477 38.631
F1-Score 61.474 63.636 64.603 69.203 74.092
MCC 0.595 0.618 0.628 0.675 0.726
Dataset-2
Accuracy 86.779 87.692 89.481 91.892 93.079
Recall 86.486 87.363 89.701 91.746 92.988
Specificity 86.876 87.801 89.408 91.941 93.109
Precision 68.717 70.477 73.842 79.143 81.812
FPR 13.124 12.199 10.592 8.059 6.891
FNR 13.514 12.637 10.299 8.254 7.012
NPV 95.071 95.422 96.302 97.094 97.551
FDR 31.283 29.523 26.158 20.857 18.188
F1-Score 76.585 78.017 81.003 84.980 87.043
MCC 0.684 0.704 0.745 0.799 0.827
Dataset-3
Accuracy 87.111719 88.303622 90.085257 91.145829 92.53123
Recall 87.144319 88.299108 90.082465 91.153169 92.515413
Specificity 87.108097 88.304124 90.085567 91.145013 92.532988
Precision 42.892011 45.617999 50.237732 53.353356 57.924035
FPR 12.891903 11.695876 9.9144333 8.8549869 7.4670122
FNR 12.855681 11.700892 9.9175348 8.8468311 7.4845873
NPV 98.386644 98.549065 98.791558 98.933027 99.109276
FDR 57.107989 54.382001 49.762268 46.646644 42.075965
F1-Score 57.488474 60.157043 64.503027 67.309452 71.24283
MCC 0.553628 0.5816648 0.6269435 0.6559782 0.6964619

Statistical analysis of the proposed lyrics text classification model

Statistical performance analysis of the proposed lyrics text classification model on dataset-1 and dataset-2 is shown in Table 4. The statistical measures considered are the worst, best, mean, median, and standard deviation of the cost values obtained over repeated runs (a small illustrative computation of these summaries is given after Table 4). Since the optimization minimizes the cost, the best measure corresponds to the lowest recorded cost value, the median gives the middle value across runs, and the standard deviation indicates the variability of the results. These metrics are computed on the fitness function, which considers accuracy, sensitivity, FPR, and FNR. On dataset-1, the median of the developed lyrics text classification model is 7.19% better than EOO-SCHADNet, 3.5% better than EVO-SCHADNet, 5.05% better than PO-SCHADNet, and 4.92% better than MPA-SCHADNet. On dataset-2, the standard deviation of the developed model is 15.73% better than EOO-SCHADNet, 36.22% better than EVO-SCHADNet, 2.45% better than PO-SCHADNet, and 32.55% better than MPA-SCHADNet. These experimental validations indicate that the designed model selects optimal solutions effectively and offers better performance rates than the existing algorithms.

Table 4.

Statistical analysis of the proposed lyrics text classification model.

Terms EOO-SCHADNet 37 EVO-SCHADNet 38 PO-SCHADNet 39 MPA-SCHADNet 30 IMPA-SCHADNet
Dataset-1
Worst 5.020 5.413 6.704 6.918 6.295
Best 4.044 4.048 4.038 4.108 3.906
Mean 4.323 4.317 4.293 4.283 3.994
Median 4.209 4.048 4.114 4.108 3.906
Std 0.321 0.484 0.712 0.544 0.435
Dataset-2
Worst 5.868 6.151 6.489 7.039 5.574
Best 4.322 4.039 4.196 4.071 3.918
Mean 4.726 4.354 4.354 4.298 4.100
Median 4.558 4.039 4.214 4.071 4.033
Std 0.445 0.588 0.366 0.556 0.375
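
For clarity, the snippet below illustrates, under assumed (made-up) cost values, how such best/worst/mean/median/standard-deviation summaries can be computed from repeated optimization runs.

import statistics

# Hypothetical illustration: summary statistics of the cost values collected
# from repeated optimization runs (the values below are placeholders, not the
# paper's results).
cost_values = [4.21, 3.91, 4.05, 4.33, 3.98]

summary = {
    "best":   min(cost_values),    # lowest cost is best for a minimization objective
    "worst":  max(cost_values),
    "mean":   statistics.mean(cost_values),
    "median": statistics.median(cost_values),
    "std":    statistics.stdev(cost_values),
}
print(summary)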

ROC analysis on the proposed lyrics text classification model

ROC analysis of the suggested lyrics text classification model is depicted in Fig. 13. This ROC-aided experiment illustrates the designed technique's reduced error rates compared with the existing classification models. On dataset-1, the ROC score of the developed model is 15.29% higher than LSTM, 7.92% higher than Trans Bi-LSTM, 2.43% higher than GRU, and 0.2% higher than Trans-Bi-LSTM-GRU. Analyzing the ROC shows how well the model separates the classes across different decision thresholds, which helps minimize misclassification and improves overall performance (a toy illustration of the underlying computation is given after Fig. 13). The experimental validation confirms that the implemented lyrics text classification model offers efficient solutions with lower error rates than the other existing techniques.

Fig. 13.

ROC analysis on the developed lyrics text classification model based on (a) Dataset-1 and (b) Dataset-2.
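
As an illustration of the threshold sweep underlying an ROC curve, the following sketch uses scikit-learn on toy labels and scores; the values are placeholders and not the paper's predictions.

from sklearn.metrics import roc_curve, auc

# Illustrative ROC computation for a binary lyric label: y_true are
# ground-truth labels, y_score are the classifier's probabilities for the
# positive class (both made up here).
y_true  = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.65, 0.3]

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # one point per threshold
print("AUC:", auc(fpr, tpr))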

State-of-the-art method comparative analysis of the lyrics text classification model

The performance of the suggested lyrics text classification process is validated in Table 5 by comparing it with traditional and related classification models. The comparison covers state-of-the-art techniques such as CNN, LSTM, and DNN, as well as recent techniques, namely CNN with FastText embeddings (CNN-FT)40, Convolution and Attention with a Bi-directional Gated Recurrent Unit (CAT-BiGRU)41, and Multi-View RNN (MV-RNN)42, to prove the efficiency of the developed model. In this validation, the developed model reaches an accuracy of 93.4%; this higher accuracy effectively reduces the error rate and improves the classification performance. Moreover, the developed IMPA-SCHADNet model shows an error rate of 6.53% in terms of FPR. On dataset-2, the sensitivity of the designed text classification process is enhanced by 18.6% over CNN, 14.8% over LSTM, 13.2% over DNN, 10.9% over CNN-FT40, 6.24% over CAT-BiGRU41, and 11.3% over MV-RNN42. Thus, the implemented lyrics text classification process achieves highly effective and superior solutions relative to the conventional and related classification models.

Table 5.

Overall performance analysis of the proposed lyrics text classification model over state-of-the-art models.

Terms CNN24 LSTM25 DNN27 CNN-FT40 CAT-BiGRU41 MV-RNN42 Proposed IMPA-SCHADNet
Dataset-1
Accuracy 81.35 81.35 83.78 86.54 88.65 85.08 93.46
Sensitivity 79.12 79.20 81.69 84.20 86.65 82.62 93.47
Specificity 83.88 83.77 86.11 89.16 90.83 87.87 93.46
Precision 54.80 49.64 60.76 59.71 58.18 60.56 61.36
FPR 16.12 16.23 13.89 10.84 9.17 12.13 6.538
FNR 20.88 20.80 18.31 15.80 13.35 17.38 6.52
NPV 77.94 78.10 80.84 83.41 86.15 81.64 99.23
FDR 45.20 45.36 43.24 40.29 48.82 41.44 38.63
F1-Score 71.86 71.83 72.15 73.87 72.85 73.49 74.09
MCC 62.87 62.86 67.70 73.24 77.40 70.35 72.60
Dataset-2
Accuracy 79.16 82.00 83.70 85.56 89.38 84.43 93.07
Sensitivity 78.35 80.97 82.10 83.82 87.52 83.48 92.98
Specificity 80.14 83.24 85.71 87.75 91.67 85.56 93.10
Precision 80.87 78.51 80.85 79.56 80.83 77.38 81.81
FPR 19.86 16.76 14.29 12.25 8.33 14.44 6.89
FNR 21.65 19.03 17.90 16.18 12.48 16.52 7.01
NPV 75.13 78.17 79.19 81.22 85.62 81.22 97.55
FDR 27.13 24.49 22.15 20.44 27.17 19.62 18.18
F1-Score 80.55 83.18 84.88 86.60 80.10 85.39 87.04
MCC 58.24 63.95 67.42 71.18 78.82 68.82 82.70
Dataset-3
Accuracy 79.16 82.00 83.70 85.56 89.38 84.43 92.53
Sensitivity 78.35 80.97 82.10 83.82 87.52 83.48 92.52
Specificity 80.14 83.24 85.71 87.75 91.67 85.56 92.53
Precision 52.87 55.51 57.85 56.56 54.83 53.38 57.92
FPR 19.86 16.76 14.29 12.25 8.33 14.44 7.47
FNR 21.65 19.03 17.90 16.18 12.48 16.52 7.48
NPV 75.13 78.17 79.19 81.22 85.62 81.22 99.11
FDR 45.13 44.49 43.15 45.44 47.17 45.62 42.08
F1-Score 70.55 63.18 64.88 66.60 70.10 65.39 71.24
MCC 58.24 63.95 67.42 61.18 68.32 68.82 69.65

Ablation study of the proposed model

Table 6 presents the ablation study of the designed model, which helps evaluate the contribution of each component to the developed system. The table shows that the classical Bi-LSTM variant attains 88.8% accuracy, which is relatively low compared with the other variants, whereas the developed model attains 93.4% accuracy, leading to more efficient and enhanced performance. Therefore, the developed model demonstrates superior text classification performance compared with its ablated counterparts.

Table 6.

Ablation study of the proposed model.

Terms Bi-LSTM Bi-LSTM-GRU LSTM-GRU Trans-LSTM-GRU Proposed IMPA-SCHADNet
Dataset-1
Accuracy 88.86751 91.57618 90.3464 92.0586 93.4633
Sensitivity 88.85014 91.5698 90.33334 92.06673 93.47067
Specificity 88.86944 91.57689 90.34785 92.05769 93.46248
Precision 47.00439 54.70844 50.97742 56.29358 61.36936
FPR 11.13056 8.423114 9.652147 7.942306 6.537522
FNR 11.14986 8.430197 9.666659 7.93327 6.529328
NPV 98.62513 98.98751 98.82515 99.05156 99.22975
FDR 52.99561 45.29156 49.02258 43.70642 38.63064
F1-Score 61.48262 68.49469 65.17495 69.86728 74.09241
MCC 0.595509 0.66818 0.633887 0.68234 0.725815
Dataset-2
Accuracy 89.59094 88.86048 89.81008 92.31191 93.07889
Sensitivity 89.77356 88.60482 89.55442 92.40321 92.98758
Specificity 89.53007 88.9457 89.8953 92.28147 93.10933
Precision 74.08077 72.76545 74.71054 79.96207 81.81234
FPR 10.46993 11.0543 10.1047 7.718529 6.890674
FNR 10.22644 11.39518 10.44558 7.596786 7.012418
NPV 96.3322 95.90444 96.27119 97.32922 97.55102
FDR 25.91923 27.23455 25.28946 20.03793 18.18766
F1-Score 81.17569 79.90777 81.46179 85.73365 87.04274
MCC 0.747262 0.729752 0.750965 0.809036 0.826616

Convergence time complexity analysis of the proposed model

Table 7 shows the convergence time analysis of the proposed model. The traditional Bi-LSTM model incurs a higher training time and poor scalability, indicating that it struggles with large datasets and real-world applications. The proposed hybrid model, by contrast, demonstrates superior efficiency: by leveraging the strengths of Trans-Bi-LSTM and GRU, the designed framework achieves minimized training time, faster convergence, and lower cost values. This is primarily because the serial cascaded architecture improves network flexibility, allowing more efficient feature propagation, while the integration of the IMPA ensures the model reaches an optimal solution rapidly (a simplified architectural sketch is given after Table 7). This enhanced optimization leads to a significantly better convergence rate and improved overall training performance for the lyrics text classification task. Therefore, the developed model is more effective than the traditional models.

Table 7.

Convergence time analysis of the proposed model.

Model Training Characteristics Convergence/Time Processing Result
Bi-LSTM Sequential, slow Higher training time, poor scalability
Trans-Bi-LSTM Parallelizable Reduced training time, better scalability
GRU Efficient, lightweight Lower complexity, faster training
SCHADNet (Proposed) Hybrid (Trans-Bi-LSTM + GRU, tuned by IMPA) Minimized training time, faster convergence, lower cost values
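
A minimal Keras sketch of such a serially cascaded recurrent stack (a bidirectional LSTM feeding a GRU) is shown below for orientation; it omits the transformer attention block and the IMPA tuning, and the vocabulary size, sequence length, and layer widths are placeholder assumptions rather than the paper's settings.

from tensorflow.keras import layers, models

# Placeholder dimensions; not taken from the paper's experimental setup.
VOCAB_SIZE, SEQ_LEN, NUM_CLASSES = 20000, 200, 5

# Serial cascade: embedded lyric tokens -> Bi-LSTM (contextual encoding)
# -> GRU (lightweight sequential stage) -> softmax classifier.
inputs = layers.Input(shape=(SEQ_LEN,))
x = layers.Embedding(VOCAB_SIZE, 128)(inputs)
x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)
x = layers.GRU(64)(x)
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()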

Best/Worst analysis of the proposed model

Figure 14 presents the best and worst analysis of the developed model, which demonstrates its superiority, quantifies the performance gains, and evaluates its robustness. In Fig. 14(a), the worst performer (LSTM) attains an accuracy of 88.87%, while the best (SCHADNet) reaches 93.46%, indicating that the proposed model offers superior accuracy and thereby enhanced reliability, improved decision making, and a better user experience. As a result, the proposed SCHADNet model achieves greater performance than the traditional models.

Fig. 14.

Best/worst analysis on the developed lyrics text classification model based on Dataset-1 and Dataset-2 in terms of (a) Accuracy, (b) F1-Score, and (c) Recall.

State-of-the-art comparison of the proposed model

The state-of-the-art comparison of the proposed model is given in Table 8. This comparison evaluates the performance and efficiency of the system and helps identify its advantages and disadvantages. In the table, the traditional SVM model achieves a lower accuracy of 88.8%, which leads to inaccurate results and wasted resources, whereas the developed IMPA-SCHADNet model attains an accuracy of 93.4%, superior to the other classical models, leading to enhanced efficiency and better decision making. As a result, the suggested IMPA-SCHADNet model achieves better performance than the other models.

Table 8.

Comparative analysis of the suggested model with other related classification models.

TERMS SVM43 SLEM44 IMPA-SCHADNet
Dataset 1
Accuracy 88.86913 91.57611 93.4633
Recall 88.87082 91.586 93.47067
Specificity 88.86894 91.57501 93.46248
Precision 47.00907 54.70731 61.36936
FPR 11.13106 8.42499 6.537522
FNR 11.12918 8.414001 6.529328
NPV 98.62764 98.98942 99.22975
FDR 52.99093 45.29269 38.63064
F1-Score 61.49158 68.49833 74.09241
MCC 0.595633 0.668242 0.725815
Dataset 2
Accuracy 89.49963 88.64134 89.70051
Recall 89.48137 88.67787 89.70051
Specificity 89.50572 88.62917 89.70051
Precision 73.97343 72.21892 74.37916
FPR 10.49428 11.37083 10.29949
FNR 10.51863 11.32213 10.29949
NPV 96.23037 95.91568 96.31373
FDR 26.02657 27.78108 25.62084
F1-Score 80.99174 79.60656 81.3245
MCC 0.744661 0.725761 0.749205

Impact of feature extraction on the proposed model

Table 9 shows the impact of feature extraction on the proposed model. This analysis is performed with various feature extraction techniques, namely GloVe embeddings, Term Frequency-Inverse Document Frequency (TF-IDF), and Bidirectional Encoder Representations from Transformers (BERT), to showcase the efficacy of the developed framework without these external feature extraction steps. The accuracy of the designed IMPA-SCHADNet is 93.46%, whereas adding GloVe embeddings to the IMPA-SCHADNet yields an accuracy of 91.38%. Similarly, integrating BERT into the designed IMPA-SCHADNet attains 92.74% accuracy, which is lower than that of the designed IMPA-SCHADNet alone. Thus, the results confirm that the Trans-Bi-LSTM within the designed IMPA-SCHADNet effectively extracts the significant features from the given input; its internal feature extraction mechanism is more effective for this classification task than relying on traditional feature extraction techniques such as GloVe embeddings, TF-IDF, and BERT (a toy sketch of such an external baseline is given after Table 9).

Table 9.

Impact of feature extraction on the proposed model.

Models Accuracy (%) Precision (%) Recall (%) F1-Score (%) FNR (%) FPR (%)
TF-IDF+ IMPA-SCHADNet 87.92 78.44 86.31 82.16 13.69 12.08
GloVe+ IMPA-SCHADNet 91.38 83.92 90.87 87.24 9.13 8.62
BERT+ IMPA-SCHADNet 92.74 86.15 92.08 89.01 7.92 7.26
IMPA-SCHADNet 93.46 87.92 93.47 90.61 6.53 6.54
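
To make the comparison concrete, the toy sketch below shows what an external feature-extraction baseline of the TF-IDF kind looks like, with TF-IDF features feeding a simple scikit-learn classifier; the lyrics, labels, and classifier choice are placeholders and do not reproduce the pipelines behind Table 9.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder lyric snippets and labels for illustration only.
lyrics = ["gentle love song about the sea",
          "angry shouting and explicit words",
          "calm lullaby for children",
          "violent themes and harsh language"]
labels = ["clean", "explicit", "clean", "explicit"]

# External feature extraction (TF-IDF) followed by a simple classifier,
# in contrast to the end-to-end feature learning of the proposed model.
baseline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
baseline.fit(lyrics, labels)
print(baseline.predict(["soft melody about the ocean"]))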

Discussions

An effective lyrics text classification approach is implemented in this work by utilizing powerful deep learning techniques, and its experimental analysis is supported by various performance measures. The cost function experiment reported that the designed approach obtains very low cost function values, confirming its higher convergence rate. The performance of the developed lyrics text classification process is first examined on the first dataset against previous classifiers and algorithms, and then on a second dataset in the section "Dataset-2-based analysis on the proposed lyrics text classification model". These experiments elucidate that the designed process obtains relatively superior solutions compared with the classical models. An overall comparative examination of the implemented process against existing techniques and algorithms on the three data sources demonstrates improved performance rates, ensuring high efficiency in the classification process. The statistical experiment, based on standard statistical measures, confirms that the IMPA algorithm selects the optimal parameters more effectively than the other existing algorithms, providing detailed insight into the designed process. In addition, the ROC validation shows that the implemented approach attains much lower error rates than the classical models, offering outstanding solutions. The comparison with state-of-the-art models shows that the designed model outperforms them and provides highly accurate solutions. Finally, the performance of the designed model is verified on a third dataset against existing algorithms and classifiers, and this experiment again confirms that the implemented lyrics text classification process is more effective than the baseline models.

Conclusion

This paper presented a lyrics text classification approach that utilizes deep learning to classify lyrics text based on mood, genre, sentiment, and performer. Essential textual information was first acquired from standard internet sources and then passed to the text pre-processing step. Following that, SCHADNet classified the pre-processed text. The parameters, namely the hidden neurons and the epochs of the Trans-Bi-LSTM and GRU, were tuned using the proposed IMPA algorithm to enhance accuracy and sensitivity while reducing FNR and FPR. Finally, the developed SCHADNet model provided the classified results. To demonstrate the efficacy of the proposed model, an empirical evaluation was conducted against a variety of traditional methods. Under the ReLU function, the developed model provided a precision value that was 41.22% higher than LSTM, 18.95% higher than Trans-Bi-LSTM, 38.82% higher than GRU, and 13.46% higher than Trans-Bi-LSTM-GRU. The mean of the developed lyrics text classification model is 7.61% better than EOO, 7.48% better than EVO, 6.96% better than PO, and 6.74% better than MPA. The experimental validations confirm that the proposed lyrics text classification process outperforms the traditional techniques and provides more effective solutions. The designed process has practical implications in mood-aided analysis, research and academia, music recommendation systems, artist and genre analysis, and so on.

Limitations of the developed model

The main limitation of the developed SCHADNet system is its computational complexity, owing to the combination of several deep learning components (Transformer, Bi-LSTM, and GRU) in a serial cascaded structure. While effective for extracting contextual and sequential connections, this design demands greater computational resources, longer training time, and more memory than simpler architectures. Furthermore, because the system learns end-to-end without predefined feature extraction, it requires a considerable amount of training data to attain optimal generalization.

Future scope

In future work, strategies such as transfer learning, self-supervised learning, and advanced data augmentation will be introduced to reduce the reliance on vast amounts of data and to improve the system's capability to generalize.

Acknowledgements

I would like to express my very great appreciation to the co-authors of this manuscript for their valuable and constructive suggestions during the planning and development of this research work.

Author contributions

All authors have made substantial contributions to conception and design, revising the manuscript, and the final approval of the version to be published. Also, all authors agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Data availability

Dataset 1: The data underlying this article are available in https://www.kaggle.com/datasets/mateibejan/multilingual-lyrics-for-genre-classification?select=train.csv. Access date: 2024-01-02. Dataset 2: The data underlying this article are available in https://github.com/wojtek11530/song_lyric_classification/tree/master/datasets. Access date: 2024-01-03.

Declarations

Competing interest

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1.Furner, M., Islam, M. Z. & Li, C. T. Knowledge discovery and visualisation framework using machine learning for music information retrieval from broadcast radio data. Expert Syst. Appl.182, 115236 (2021). [Google Scholar]
  • 2.Hizlisoy, S., Yildirim, S. & Tufekci, Z. Music emotion recognition using convolutional long short term memory deep neural networks. Eng. Sci. Technol. Int J.24(3), 760–767 (2021). [Google Scholar]
  • 3.Wang, C. & Ko, Y. C. Emotional representation of music in multi-source data by the internet of things and deep learning. J. Supercomput.79(1), 349–366 (2023). [Google Scholar]
  • 4.Jena, K. K., Bhoi, S. K., Mohapatra, S. & Bakshi, S. A hybrid deep learning approach for classification of music genres using wavelet and spectrogram analysis. Neural Comput. Appl.35(1), 11223–11248 (2023). [Google Scholar]
  • 5.Khattak, A., Asghar, M. Z., Khalid, H. A. & Ahmad, H. Emotion classification in poetry text using deep neural network. Multimed. Tools Appl.81(18), 26223–26244 (2022). [Google Scholar]
  • 6.Yang, L., Shen, Z., Zeng, J., Luo, X. & Lin, H. COSMIC: music emotion recognition combining structure analysis and modal interaction. Multimed. Tools Appl.83 (5), 1–16 (2023). [Google Scholar]
  • 7.Dong, L. Using deep learning and genetic algorithms for melody generation and optimization in music. Soft Comput.27(1), 17419–17433 (2023). [Google Scholar]
  • 8.Sarkar, R., Choudhury, S., Dutta, S., Roy, A. & Saha, S. K. Recognition of emotion in music based on deep convolutional neural network. Multimed. Tools Appl.79, 765–783 (2020). [Google Scholar]
  • 9.Policicchio, V. L., Pietramala, A. & Rullo, P. GAMoN: discovering M-of-N ¬,∨ hypotheses for text classification by a lattice-based genetic algorithm. Artif. Intell.191, 61–95 (2012). [Google Scholar]
  • 10.Dwiyani, L. K. D., Suarjaya, I. M. A. D. & Rusjayanthi, N. K. D. Classification of explicit songs based on lyrics using random forest algorithm. J. Inform. Syst. Inform.5, 550–567 (2023). [Google Scholar]
  • 11.Du, J. Sentiment analysis and lyrics theme recognition of music lyrics based on natural language processing. J. Electr. Syst.20, 315–321 (2024). [Google Scholar]
  • 12.Xie, C. et al. Music genre classification based on res-gated CNN and attention mechanism. Multimed. Tools Appl.83(5), 13527–13542 (2024). [Google Scholar]
  • 13.Jandaghian, M., Setayeshi, S., Razzazi, F. & Sharifi, A. Music emotion recognition based on a modified brain emotional learning model. Multimed. Tools Appl.82(4), 26037–26061 (2023). [Google Scholar]
  • 14.Rajan, R. & Nithin, S. K. Folk music structural segment classification using GRU-based hierarchical attention network. Sādhanā48(4), 254 (2023). [Google Scholar]
  • 15.Hongdan, W., SalmiJamali, S., Zhengping, C., Qiaojuan, S. & Le, R. An intelligent music genre analysis using feature extraction and classification using deep learning techniques. Comput. Electr. Eng.100, 107978 (2022). [Google Scholar]
  • 16.Sujeesha, A. S., Mala, J. B. & Rajan, R. Automatic music mood classification using multi-modal attention framework. Eng. Appl. Artif. Intell.128, 107355 (2024). [Google Scholar]
  • 17.da Silva, A. C. M., Coelho, M. A. N. & Neto, R. F. A music classification model based on metric learning applied to MP3 audio files. Expert Syst. Appl.144, 113071 (2020). [Google Scholar]
  • 18.Baskara, A. R., Maulida, M., Lestiyanto, M. T. M., Sari, Y., Mustamin, N. F. & Wijaya, E. S. Explicit content classification in Indonesian song lyrics using the LSTM-CNN method. In 2024 Ninth International Conference on Informatics and Computing (ICIC) (2024).
  • 19.Bonela, A. A., He, Z., Luxford, D. A., Riordan, B. & Kuntsche, E. Development of the lyrics-based deep learning algorithm for identifying alcohol-related words (LYDIA). Alcohol Alcohol.59, 2 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Bolla, B. K., Pattnaik, S. R. & Patra, S. Detection of objectionable song lyrics using weakly supervised learning and natural language processing techniques. Procedia Comput. Sci.235, 1929–1942 (2024). [Google Scholar]
  • 21.Pasha, S. N., Ramesh, D., Mohmmad, S., Shabana, Kothandaraman, D. & Sravanthi, T. Song lyrics genre detection using RNN. AIP Conf. Proc.2971(1) (2024).
  • 22.Abdillah, J., Asror, I. & Wibowo, Y. F. A. Emotion classification of song lyrics using bidirectional lstm method with glove word representation weighting. J. RESTI (Rekayasa Sistem Dan Teknologi Informasi)4(4), 723–729 (2020). [Google Scholar]
  • 23.Revathy, V. R., Pillai, A. S. & Daneshfar, F. LyEmoBERT: classification of lyrics’ emotion and recommendation using a pre-trained model. Procedia Comput. Sci.218, 1196–1208 (2023). [Google Scholar]
  • 24.Jia, X. Music emotion classification method based on deep learning and improved attention mechanism. Comput. Intell. Neurosci.2022, 5181899 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Chen, X. et al. A novel approach for explicit song lyrics detection using machine and deep ensemble learning models. PeerJ Comput. Sci.9, e1469 (2023). [Google Scholar]
  • 26.Li, Y., Zhang, Z., Ding, H. & Chang, L. Music genre classification based on fusing audio and lyric information. Multimed. Tools Appl.82(13), 20157–20176 (2023). [Google Scholar]
  • 27.Delbouys, R., Hennequin, R., Piccoli, F., Royo-Letelier, J. and Moussallam, M., Music mood detection based on audio and lyrics with deep neural net. arXiv preprint (2018).
  • 28.Almeida do Carmo, F., Figueira da Silva Junior, J. L., Geraldeli Rossi, R. & França Lobato, F. M. Text representations for lyric-based identification of musical subgenres. IEEE Lat. Am. Trans.21(6), 737–744 (2023).
  • 29.Tsaptsinos, A., Lyrics-based music genre classification using a hierarchical attention network. arXiv (2017).
  • 30.Faramarzi, A., Heidarinejad, M., Mirjalili, S. & Gandomi, A. H. Marine predators algorithm: a nature-inspired metaheuristic. Expert Syst. Appl.152, 113377 (2020). [Google Scholar]
  • 31.Ye, H. et al. Web services classification based on wide & Bi-LSTM model. IEEE Access7, 43697–43706 (2019). [Google Scholar]
  • 32.Naeem, A. et al. A novel combined densenet and gated recurrent unit approach to detect energy thefts in smart grids. IEEE Access11, 59496–59510 (2023). [Google Scholar]
  • 33.Sun, J., Han, P., Cheng, Z., Wu, E. & Wang, W. Transformer based multi-grained attention network for aspect-based sentiment analysis. IEEE Access8, 211152–211163 (2020). [Google Scholar]
  • 34.Alfarizi, M. I., Syafaah, L. & Lestandy, M. Emotional text classification using TF-IDF (Term frequency-inverse document frequency) And LSTM (Long short-term memory). J. Informatika10, 2 (2022). [Google Scholar]
  • 35.Yu, P. & Fu, X. Classification and identification of emotion of non-foreign music based on TR-Bi-LSTM emotion analysis. Research Square (2023).
  • 36.Jia, C. et al. State of health prediction of lithium-ion batteries based on bidirectional gated recurrent unit and transformer. Energy285, 129401 (2023). [Google Scholar]
  • 37.Salim, A., Jummar, W. K., Jasim, F. M. & Yousif, M. Eurasian oystercatcher optimiser: new meta-heuristic algorithm. J. Intell. Syst.31(1), 332–344 (2022). [Google Scholar]
  • 38.Azizi, M., Aickelin, U., Khorshidi, H. A. & Baghalzadeh Shishehgarkhaneh, M. Energy valley optimizer: a novel metaheuristic algorithm for global and engineering optimization. Sci. Rep.13, 226 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Askari, Q., Younas, I. & Saeed, M. Political optimizer: a novel socio-inspired meta-heuristic for global optimization. Knowl.-based Syst.195, 105709 (2020). [Google Scholar]
  • 40.Wang, P. Electronic archive classification method based on convolutional neural network with fast text embeddings. In 2024 4th International Conference on Mobile Networks and Wireless Communications (ICMNWC) (2024).
  • 41.Al-shathry, N., Al-onazi, B., Hassan, A. Q. A., Alotaibi, S., Alotaibi, S., Alotaibi, F., Elbes, M. & Alnfiai, M. Leveraging hybrid adaptive sine cosine algorithm with deep learning for Arabic poem meter detection. ACM Trans. Asian Low-Resour. Lang. Inf. Process. (2024).
  • 42.Eswaraiah, P. & Hussain, S. A hybrid deep learning GRU based approach for text classification using Word embedding. EAI Endorsed Trans. Internet Things10, 1 (2023). [Google Scholar]
  • 43.Rahayu, S. P., Afuan, L. & Yunindar, G. A. Implementation of text mining on song lyrics for song classification based on emotion using website-based logistic regression. J. Teknik Informatika (Jutif)6(1), 359–368 (2025). [Google Scholar]
  • 44.Mehra, A., Mehra, A. & Narang, P. Classification and study of music genres with multimodal Spectro-Lyrical Embeddings for Music (SLEM). Multimed. Tools Appl.84(7), 3701–3721 (2025). [Google Scholar]
