Abstract
Since electronic music is simpler to produce and distribute than analog music, the variety of music available worldwide has increased rapidly along with the music marketplace’s shift from analog to digital. Because of the abundance of available songs, people are discovering music in various ways; one of them is by analyzing its emotional content. Moreover, not every piece of music is suitable for every age group. Deep learning techniques have yielded excellent results recently, marking a significant advance in NLP; however, there have been few attempts to use a deep learning model to filter lyrics from inappropriate music. Hence, a deep learning-based lyrics text classification process is presented in this proposal. Firstly, the indispensable text data are fetched from standard online resources and then passed to the text pre-processing stage. After that, the resultant pre-processed text is subjected to the Serial Cascaded Hybrid Adaptive Deep Networks (SCHADNet) for classification purposes. The Transformer-based Bidirectional Long Short-Term Memory (Trans Bi-LSTM) is integrated with a Gated Recurrent Unit (GRU) to develop the SCHADNet model, where the parameters of the SCHADNet are optimally tuned by the Improved Marine Predators Algorithm (IMPA). Lastly, the classified outcome is obtained from the SCHADNet. The developed model shows significant advancement in classification performance, achieving an accuracy of 93.4%, a recall of 93.47%, and an NPV of 99.2%. A numerical analysis of the suggested lyrics text classification model against numerous classical text classification techniques is performed to portray the effectiveness of the presented model.
Keywords: Lyrics text classification, Serial cascaded hybrid adaptive deep networks, Transformer-based bidirectional long short-term memory, Improved marine predators algorithm
Subject terms: Engineering, Optics and photonics
Introduction
Since the beginning of time, music has been a significant part of our lives. It profoundly affects the state of mind, ideas, and interactions with others while also evoking human feelings1. Our cultural and social life is enhanced by music, which has a range of effects on us. Perhaps the most widely used medium for information, pleasure, and leisure in the past few decades is music. Since lyrics are a means for artists to express themselves, the library of electronic music is expanding quickly2. There are lyrics that hint at aggressive, sexual, or drug themes and contain material that is not appropriate for children’s ears. Recognizing the mood in music is an ongoing area of exploration. It uses various techniques to identify the feelings connected to a musical composition3. These include lyric text analysis, audio evaluation, and other approaches. The majority of studies on music classification rely on examining auditory signals and musical characteristics4. Employing a slang vocabulary is the initial method. This technique compares a song’s lyrics to a list of phrases considered obscene or improper. The music is deemed unsuitable if one or more of these phrases appear in its lyrics5. Nevertheless, since there is no single profanity vocabulary used by all businesses, the outcomes of this approach could differ6. To ensure the swearing lexicon is updated with the latest offensive words, ongoing maintenance is necessary when using this strategy.
It is challenging to satisfy the requirements of users experiencing a range of emotions when the majority of tools just suggest well-known songs while ignoring personalized efforts7. The process of creating classification labeling was primarily manual prior to the development of sophisticated software, and tracks with various musical genres were arranged into appropriate song categories8. Nevertheless, these techniques are not just ineffective but highly dependent on human judgment, and the precision of classification is not consistent9. The classic classification techniques, which mostly consist of techniques like Logistic Regression (LR), Naive Bayes (NB), Random Forest (RF), and Support Vector Machines (SVM), are currently maturing based on human classification10. Deep learning and other machine learning techniques have been used extensively to classify sound, picture, and text, and the results have been impressive11. With the development of computer-related methods, computers are now capable of doing intricate calculations and emotional evaluation, as well as generating emotional outcomes12. The word embeddings of the lyrical data are used to determine the genre, which is highly related to the lyrics.
The minds of adolescents may be significantly impacted by such lyrics. Lyrics are becoming more explicit and aggressive13. Nevertheless, the methods in place to filter explicit words from song lyrics are ineffective14. Several methods have been proposed to classify texts, such as deep learning techniques like CNNs and RNNs, classification using machine learning methods, and lexicon-based filters. These experiments have produced differing degrees of efficiency and were carried out on various data sources and dialects15. According to some research, employing more sophisticated machine learning classifiers could assist in achieving even greater gains16. While techniques based on machine learning demonstrate promising outcomes in classifying music feelings, there is still room for improvement in the comprehensive identification of musical feelings, because the connection between phrases and the harmony’s sentiment can be interpreted in various ways when phrases and melodies are processed separately, without taking the consistency of feelings between lyrics and melody into consideration17.
Motivation of the developed model
In general, music plays an important role in human emotions. Moreover, the lyrics are a vital part of a song and play an inevitable role in understanding the emotions of the song. It is crucial to categorize lyrics using various machine learning and deep learning approaches18. Several well-known classification techniques have been adopted to classify lyrics text from labeled data. In recent times, the utilization of deep learning models like CNN and RNN has achieved superior outcomes and provided an exciting breakthrough with the help of Natural Language Processing (NLP)19. The imbalance of data in traditional models can result in biased models that underperform on less frequent classes due to the uneven class distribution. Noise, misspellings, and inconsistent data can easily affect the data quality in a CNN20. Training a CNN model is computationally expensive and requires significant memory to capture the sequential nature of the lyrics. On the other hand, the RNN model has the ability to process data sequentially, yet it struggles to parallelize the computations. Thus, it results in a slower training process than the other traditional approaches21. Existing traditional models still struggle with inconsistent and redundant data, which often leads to misclassification. To rectify the issues in the existing models, this research work develops an effectual deep learning-based lyric text classification model to alleviate such challenges, and the contributions are given as follows.
To develop the effectual deep learning-based lyric text classification model using the optimization approach that helps to categorize the songs based on its mood, genre, sentiment, and performer.
To design the SCHADNet-based text classification model useful for recognizing and analyzing the meanings used in lyrics and facilitates the analysis of songs’ context within history and culture.
To enhance the accuracy, and sensitivity along with reducing the FNR and FPR, the parameters like hidden neurons in Trans-Bi-LSTM and GRU, epochs in Trans-Bi-LSTM and GRU are tuned in the developed SCHADNet-based text classification model using the proposed IMPA algorithm.
To develop and evaluate the IMPA model by modifying the traditional MPA with an effective concept that helps in the parameter tuning and performance enhancement of the suggested lyrics text classification.
To demonstrate the efficacy of the proposed model, an empirical evaluation was conducted for the lyrics text classification approach against a variety of traditional text classification methods.
The layout of the suggested framework is provided below. The automatic lyrics text classification using an improved heuristic approach and serial cascaded hybrid adaptive deep network is shown in Section "Automatic lyrics text classification using an improved heuristic approach and serial cascaded hybrid adaptive deep networks". The pre-processing of text data for lyrics text classification is provided in Section "Text data pre-processing with performance enhancement in lyrics text classification using improved marine predators algorithm". The hybrid adaptive deep networks for the lyrics text classification model are offered in Section "Serial cascaded hybrid adaptive deep networks for lyrics text classification model". The result and discussion are available in Section "Result and discussion". The conclusion is offered in Section “Conclusion”.
Literature survey
Related works
In 2020, Abdillah et al.22 have employed the deep learning Bi-LSTM algorithm with GloVe-weighted keywords to identify a song’s feelings utilizing its lyrics. The accuracy of the Bi-LSTM framework with a dropout layer and activity regularization was determined to be 91.08%. The difference between validation and training loss could be reduced by approximately 0.15 if the settings for dropout, activity regularization, and learning rate decay were adjusted.
In 2023, Revathy et al.23 have used the Music4All database to assess the musical elements crucial for identifying four main human feelings: joyful, furious, calm, and unhappy. Several artificial intelligence methods based on a conceptual psychological model were used to accomplish this. To predict the mood of the desired information, a transfer learning method was used to comprehend the emotions of the lyrics derived from an in-domain database. A rudimentary lyric-suggestion network was created using the sentence-transformer concept.
In 2022, Jia24 has proposed an approach for classifying musical emotions based on enhanced attention mechanisms and extensive knowledge. The characteristics of the tune’s songs were initially extracted, yielding the term frequencies weighted matrix and phrase vector. By combining the matched attention system with the extracting features capabilities of CNN and LSTM networks to handle serialized input, a framework for evaluating feelings was created. Ultimately, the CNN-LSTM model along with the Deep Neural Network (DNN)’s data outputs was combined, and the SoftMax algorithm was utilized to determine the different emotion kinds. Given the chosen data sets, the tests revealed that the suggested method’s mean accuracy in classification reached 0.848, greater than the average of the other comparative methods, while the method’s categorization efficiency had significantly increased.
In 2023, Chen et al.25 have developed a model by combining deep learning and machine learning to extract explicit lyrics from songs. The suggested model, ELSTM-VC, was compared to other algorithms due to its integration of extra branch classifiers and LSTM. With its ability to identify sexually explicit material in English phrases, the ELSTM-VC has potential applications in the entertainment sector. The suggested method successfully identified explicit phrases, according to the study’s findings, which were based on an array of 100 songs on Spotify utilized for learning. It has the ability to accurately extract content that is objectionable for younger audiences. The suggested strategy outperformed other strategies, such as encoding-decoding algorithms and machine learning models.
In 2023, Li et al.26 have suggested a multimodal structure for classifying music genres that used lyrics and audio files. By embracing the complementary nature of multisensory data, it is possible to achieve a more thorough representation of musical styles. A CNN was employed to gather audio characteristics after the structure had first retrieved the audio’s mel-spectrogram. BERT used multiple methods concurrently to acquire the lyrics’ distributed representation. Subsequently, the two multimodal pieces of data were combined using several techniques, including feature- and decision-level fusion. To address the significant difference in convergence rate between the sound channel and the melody stream, an asynchronous technique was employed at the beginning of the two streams with various models. A number of tests were conducted to confirm the suggested model’s efficacy. In terms of music genre categorization, the suggested approach’s F1 score reached 0.87, a value almost 4% greater than the strongest baseline in the trial.
In 2018, Delbouys et al.27 have developed the multimodal musical mood forecasting model using a track’s words and sound input. The use of conventional feature engineering-based methods was replicated and put forth a novel deep learning-based model. The method was able to outperform conventional algorithms on the excitation identification task, but both techniques performed similarly on the emotion forecasting challenge. The efficacy of both methods was assessed on a collection of data that had 18,000 recordings with related arousal and valence scores. The integration of modality optimized concurrently for every single-modal model resulted in a significant increase in valence predictions when evaluated afterward. A portion of the database was made available for examination.
In 2023, Carmo et al.28 have identified an imbalance in the existing research on musical data mining by applying text-based representation methods to the issue of categorizing melodic sub-genres. Identifying the line that separates groups from a single category is the challenge of the issue, given that they share several characteristics. Extensive tests were conducted in order to determine the most effective blend of written models and classifiers. The findings demonstrated that enhanced Bag-of-Words (BoW) using the Support Vector Machine (SVM) with LR methods outperformed DNN and integrating algorithms in terms of performance. The findings may lead to further research on the classification of texts with complex and delicate interfaces of separateness.
In 2017, Tsaptsinos29 has created models for continuous neural systems for organizing a big collection of whole lyrics to songs. To use each of these strata and comprehend the significance of the phrases, paths, and sections, a Hierarchical Attention Network (HAN) was utilized. Lyrics display a hierarchical layered framework, where words merge to create lines, lines create sections, and sections make the whole song. A reduced database of 20 genres was used and an expanded dataset with 117 genres to evaluate the framework. The HAN’s performance in experimental data was superior to that of less difficult computational models and non-neural designs, and it was also capable of discriminating across a wider range of categories than previous studies. During the process of learning, it will additionally be possible to see what lyrics or words of the music that the example considers crucial for dividing its genre. Consequently, the HAN offered insights into the linguistic characteristics and poetic organization that distinguish distinct genres of music from a computing standpoint.
Problem statement
Text classification is a common process that includes categorizing the text into groups utilizing advanced approaches. The text classifier has the ability to evaluate the text and assign pre-defined classes or tags based on its content. From the lyrics text classification, approaches such as categorizing music mood, genre, sentiment, and performer can be carried out. Numerous text classification works have been presented using lyrics. Some of the method’s merits and issues are given in Table 1.
In conventional techniques, dealing with a massive amount of data in high-quality datasets can easily degrade the accuracy of the model. Training and testing a large amount of data is a time-consuming and challenging process. Incorporating the transformer, Bi-LSTM, and GRU models ensures that the intrinsic patterns are learned and makes it possible to train the model on a large amount of data. Thus, it greatly strengthens the accuracy of the lyrics text classification model.
Understanding the contextual relationship of words and phrases is difficult and prone to increasing errors in the text data. Overfitting and poor performance on unseen data can result from existing deep learning models. On the other hand, the model implemented in this research work splits the data into training and testing phases. The developed model can thereby minimize overfitting issues and improve the model’s overall performance in this context.
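The split described above can be sketched minimally as follows (the 80/20 ratio and the helper name are assumptions for illustration; the paper does not state its exact proportions):

```python
import random

def train_test_split(samples, test_ratio=0.2, seed=42):
    """Shuffle and partition samples into train/test sets (assumed 80/20)."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

lyrics = [f"song_{i}" for i in range(100)]
train, test = train_test_split(lyrics)
```

Holding out a disjoint test partition in this way is what allows overfitting to be detected: a model that memorizes the training lyrics will score poorly on the unseen split.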
Existing preprocessing techniques often provide inaccurate outcomes, especially with inconsistent formatting, noisy data, and the nuances of musical language. Eliminating the redundant and inconsistent content in lyrics text data is challenging for traditional techniques. However, this research work focuses on effective data pre-processing, considering punctuation and special character removal, removal of redundant and inappropriate data, and stemming, to improve the overall performance. The data pre-processing phase eliminates noisy content to maximize the model’s accuracy.
The presence of repeated data can impact the classification performance of traditional models. Most research works do not focus on tuning the parameters. The optimization algorithm’s parameter tuning plays a crucial role in selecting the optimal parameters. In this research work, fine-grained parameter optimization is performed with the help of the IMPA algorithm by selecting the appropriate parameters to obtain the optimal solutions.
Table 1.
Discussion on the conventional lyric text classification models.
| Author [citation] | Methodology | Features | Challenges |
|---|---|---|---|
| Abdillah et al.22 | Bi-LSTM | • It enhances the available network data and the contexts. • It offers better data representations. | • It performs slow calculations. • It consumes more training time. |
| Revathy et al.23 | BERT | • It gives high-accuracy solutions. • It requires very little memory. | • It is a very expensive model and demands more computation. • It has a complex network. |
| Jia24 | CNN | • It automatically recognizes the relevant patterns. • It can process a high amount of data with more accuracy. | • It shows limited efficacy for sequential data. • It may be affected by overfitting issues when processing small datasets. |
| Chen et al.25 | LSTM | • It offers high-accuracy solutions. • It rectifies the gradient issues of the network. | • It is computationally intensive and expensive. • It is prone to chaotic, complex, and noisy data. |
| Li et al.26 | BERT | • It provides better predictions due to its bi-directional nature. • It evaluates all the input without any particular direction. | • It is not a straightforward process. • It produces undesired outcomes. |
| Delbouys et al.27 | DNN | • The computation required is minimal. • It is very flexible and performs complex tasks. | • It is very hard to interpret and lacks domain expertise. • It needs more data to train the network. |
| Carmo et al.28 | SVM | • It prevents the network from overfitting issues. • It performs rapid prediction and has good generalization. | • Complete data sources without any missing values are necessary. • It provides poor performance for large data sources. |
| Tsaptsinos29 | HAN | • It is very helpful in detecting the significant data. • It provides better functionality in complex data. | • It has dimensional issues. • It consumes more resources for the execution. |
Automatic lyrics text classification using an improved heuristic approach and serial cascaded hybrid adaptive deep networks
Proposed lyrics text classification model
Textual lyrics pose a number of classification issues. The subjective nature and comprehension of lyrics serve as one of the primary obstacles. Individual listeners can interpret identical lyrics in different ways as they reflect the listener’s events, feelings, and viewpoints. Because of this individuality, it is challenging to develop a systematic categorization scheme that faithfully conveys the lyrics’ purposeful meaning. Lyrics’ intricate structure of language presents another difficulty. Numerous songs include literal spoken language, analogies, phraseology, and symbolic references that may pose a challenge for algorithmic techniques to understand. Advanced techniques are needed to effectively categorize and evaluate lyrics due to their complex and nuanced wording.
The overwhelming amount of lyrical data is another major obstacle. To manage the quantity of tunes and lyrics, sophisticated algorithms are necessary for analyzing and evaluating such large volumes of text. The variety of styles and categories also makes categorization even more difficult. Developing a classification system that works effectively for all genres of music is difficult because every genre may have its own distinct lyrical qualities. Furthermore, delicate or sexual content occasionally appears in songs. This makes it difficult to moderate material and guarantee proper filtering, particularly on sites wherein lyrics are posted publicly. Keeping a secure and welcoming atmosphere requires the development of strong content-filtering algorithms that can reliably recognize and identify possibly dangerous or unsuitable lyrics. By dealing with these issues, it is possible to gain a greater understanding of the global context of lyrics and songs, which enhances one’s comprehension and appreciation for the uniqueness of songs. So, we developed an effectual lyric text classification, and the pictorial view is provided in Fig. 1.
Fig. 1.
Pictorial view of the developed lyric text classification model.
A novel lyric text classification model is implemented, where the primary objective is to effectively categorize songs based on their mood, genre, sentiment, and performer, resulting in a better understanding of the songs for further analytics. This classified solution helps listeners and scholars examine and investigate musical patterns, themes, and styles. Most of the time, the lyrics have a more implicit and subtle tone, demanding a deeper understanding of the emotional undertones. Also, the emotional categorization of lyrics is somewhat subjective due to the music and the personal interpretation of the lyrics. Therefore, classifying the emotions in song lyrics is more significant than in any other text, such as books. This lyric classification model provides insights into the individual’s inner feelings. Generally, the necessitated data are garnered from benchmark data sources. In addition, the gathered data is subjected to pre-processing to enhance its quality. In this stage, operations such as (i) punctuation and special character removal, (ii) removal of redundant and inappropriate data, and (iii) stemming are performed. After the text pre-processing, the resultant data is given to the classification stage for categorizing the lyrics text. The SCHADNet model is created to achieve effective text classification by combining the Trans Bi-LSTM and GRU models. To enhance the accuracy and sensitivity along with reducing the FNR and FPR, parameters like the hidden neurons in the Trans Bi-LSTM and GRU and the epochs in the Trans Bi-LSTM and GRU are tuned in the developed SCHADNet-based text classification model using the proposed IMPA algorithm. The developed SCHADNet model provides the text-classified results. The output classes are the various genres, moods, performers, and sentiments of the song.
Text dataset for classification analysis
The data necessitated to carry out the lyric text classification model are as follows.
Dataset-1 ("Multi-Lingual Lyrics for Genre Classification dataset"): The data are collected from https://www.kaggle.com/datasets/mateibejan/multilingual-lyrics-for-genre-classification?select=train.csv (access date: 2024-01-02). This dataset is hosted on the Kaggle platform. It includes two .csv files with 11 columns. The size of this dataset is 341 MB, and it includes 291,118 songs.
Dataset-2 (“Song-lyric-classification datasets”): The data are garnered from https://github.com/wojtek11530/song_lyric_classification/tree/master/datasets (access date: 2024-01-03). The database is aimed at predicting the emotions of songs on the basis of their lyrics and genres. The size of this dataset is 1.86 MB, and it includes 1369 songs. This dataset is provided as a .csv file.
Dataset-3 (“Veucci/lyric-to-3genre”): This dataset has been accessed via https://huggingface.co/datasets/Veucci/lyric-to-3genre (access date: 2024-08-16). This data source includes numerous song lyrics from distinct genres and artists in English. Genres such as rock, hip-hop, and pop are included in this resource.
By utilizing these datasets, the mood, genre, sentiment, and performer of the song are classified from the lyrics.
From the datasets, the collected data are defined by $T_m^{dat}$, where $m = 1, 2, \ldots, M$. In this, the total count of the gathered texts is expressed as $M$.
Text data pre-processing with performance enhancement in lyrics text classification using improved marine predators algorithm
Text data pre-processing
The text pre-processing is a significant step in data preparation, where the original text is converted into a suitable, clean, and consistent format for modeling and evaluation. While modern techniques such as BERT, BART, and GPT handle punctuation and stop words internally, a pre-processing stage is still needed for several reasons, such as improving data quality, noise minimization, computational efficiency, and model interpretability. Moreover, the mentioned BERT, BART, and GPT are transformer-based models, and pre-processing the text beforehand may be less time-consuming than delegating this task to these models. Hence, text pre-processing is performed in this work to improve the text quality. The collected data $T_m^{dat}$ are inputted to the text data pre-processing stage.
Punctuation and special character removal
Punctuation removal is the process of replacing or deleting the punctuation marks in the text data. Some of the punctuation marks are periods (.), commas (,), colons (:), parentheses (()), dashes (-), and so on. By removing these punctuation marks, the text data can be simplified, noise can be minimized, and focus can be placed on meaningful words. Special character removal is the process of removing special characters such as symbols (+, -, =), currency signs ($), HTML tags (<, >), and so on. This special character removal results in enhanced model performance, text representation, and so on.
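A minimal sketch of this step using Python's `re` module (the exact character inventory removed by the paper is an assumption here):

```python
import re

def strip_punct_special(text: str) -> str:
    """Remove HTML tags, punctuation, symbols, and currency signs; keep words and spaces."""
    text = re.sub(r"<[^>]+>", " ", text)      # drop HTML tags like <br>
    text = re.sub(r"[^\w\s]", " ", text)      # drop punctuation and special characters
    return re.sub(r"\s+", " ", text).strip()  # collapse repeated whitespace

cleaned = strip_punct_special("Hey, (listen)! <br> $100 -- it's over...")
# → "Hey listen 100 it s over"
```

Note that the order matters: tags must be removed before the generic punctuation pass, since `<` and `>` would otherwise be stripped first and leave the tag names behind as words.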
Remove redundant and inappropriate data
Redundant data is material that is repeated or reproduced within the lyrics. Eliminating duplicate information simplifies the classification procedure by decreasing needless reiteration. In lyrics text classification, improper data is data that is unrelated or obnoxious to the classification process. This also includes information that is irrelevant, offensive, or explicit to the operation. The improper data can include particular terms or whole songs that are not related to the classification aim. For example, if the aim is to categorize the lyrics on the basis of their emotional content, whole songs or terms that contain explicit or offensive language irrelevant to the emotions can be considered improper and eliminated during the pre-processing stage.
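A simple order-preserving de-duplication of repeated lyric lines, sketched in Python (the paper's exact redundancy criterion is not specified; an exact, case-insensitive line match is assumed):

```python
def drop_redundant_lines(lyric_lines):
    """Keep the first occurrence of each line; drop exact repeats (e.g. repeated choruses)."""
    seen = set()
    kept = []
    for line in lyric_lines:
        key = line.strip().lower()
        if key and key not in seen:
            seen.add(key)
            kept.append(line)
    return kept

lines = ["Hello darkness", "my old friend", "Hello darkness", ""]
unique = drop_redundant_lines(lines)
# → ["Hello darkness", "my old friend"]
```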
Stemming
The practice of distilling words to their basic or core shape is referred to as stemming. By using this strategy, the lyrics’ terms can be normalized while reducing the intricacy and diversity of terminology. It can gather several variants of a single word, like "performing" and "performs," into a common root, like "perform." This streamlines the categorization procedure by treating comparable words, irrespective of their particular form, as identical. By lowering the dimensionality of the information, stemming can increase the precision and effectiveness of models for lyrical text categorization. After executing all these processes, the resultant pre-processed data is indicated by $T_m^{pre}$.
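The suffix-stripping idea can be illustrated with a toy stemmer (a deliberately simplified sketch for illustration, not the full Porter algorithm that production stemmers implement):

```python
def toy_stem(word: str) -> str:
    """Strip a few common English suffixes to approximate a root form."""
    for suffix in ("ing", "ed", "es", "s"):
        # Require a remaining stem of at least 3 letters to avoid over-stripping
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)]
    return word

roots = [toy_stem(w) for w in ("performing", "performs", "performed")]
# → ["perform", "perform", "perform"]
```

All three surface forms collapse to the single token "perform", which is exactly the dimensionality reduction described above.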
Marine predators algorithm
This part reviews the MPA30 method, a straightforward and effective meta-heuristic optimization technique.
Detecting top predator phase
MPA is a population-based approach, where the initial population is distributed uniformly over the search space as in Eq. (1).

$$X_{0} = X_{\min} + rand \otimes \left(X_{\max} - X_{\min}\right) \tag{1}$$

Here, the term $rand$ is a uniform random matrix in the interval from 0 to 1, and $X_{\min}$ and $X_{\max}$ are the lowest and highest limits for the parameters.
The elite matrix is created by selecting the fittest outcome as the top predator. This matrix’s rows manage the process of searching for and locating the target using the position of the target, as in Eq. (2).

$$Elite = \begin{bmatrix} X_{1,1}^{I} & X_{1,2}^{I} & \cdots & X_{1,d}^{I} \\ X_{2,1}^{I} & X_{2,2}^{I} & \cdots & X_{2,d}^{I} \\ \vdots & \vdots & \ddots & \vdots \\ X_{n,1}^{I} & X_{n,2}^{I} & \cdots & X_{n,d}^{I} \end{bmatrix} \tag{2}$$

Here, the term $Elite$ is the elite matrix; in order to create the $Elite$, the top predator vector, denoted by $\vec{X}^{I}$, is copied $n$ times. The total number of dimensions is $d$, while the number of search agents equals $n$. If a better predator replaces the current top predator at the conclusion of each cycle, the term $Elite$ is updated.
$Prey$ is a different matrix having identical dimensions as $Elite$, and predators adjust their positions according to it. To put it simply, initialization produces the first batch of prey, and then the fittest of them builds the $Elite$. The term $Prey$ is calculated using Eq. (3).

$$Prey = \begin{bmatrix} X_{1,1} & X_{1,2} & \cdots & X_{1,d} \\ X_{2,1} & X_{2,2} & \cdots & X_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ X_{n,1} & X_{n,2} & \cdots & X_{n,d} \end{bmatrix} \tag{3}$$

Eq. (3) shows the $j$-th dimension of the $i$-th prey as $X_{i,j}$. It ought to be mentioned that both of these matrices play a major and immediate role in the optimization procedure in its entirety.
MPA optimization scenarios
The MPA optimization procedure is broken down into three key phases, explained as follows. Levy and Brownian motions are the primary random walks used in the MPA.

The Levy motion is a kind of random walk in which the step sizes are drawn from a heavy-tailed probability distribution, commonly generated as $R_{L} = 0.05 \times x / |y|^{1/\alpha}$. Here, the distribution index is specified as $\alpha$, and the attributes $x$ and $y$ are normally distributed random variables.

The probability function with zero mean and unit variance determines the step length in the stochastic operation of Brownian motion. It is expressed as $f_{B}(x) = \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{x^{2}}{2}\right)$. Here, $\mu = 0$ is the mean and $\sigma^{2} = 1$ is the unit variance.
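The two random steps can be drawn with the standard library only, under the common Mantegna construction for Levy steps (an assumption consistent with the MPA literature; $\alpha = 1.5$ is a typical choice):

```python
import math
import random

def brownian_step(rng):
    """Brownian step: standard normal draw, zero mean, unit variance."""
    return rng.gauss(0.0, 1.0)

def levy_step(rng, alpha=1.5):
    """Levy-distributed step via Mantegna's algorithm: x / |y|^(1/alpha)."""
    num = math.gamma(1 + alpha) * math.sin(math.pi * alpha / 2)
    den = math.gamma((1 + alpha) / 2) * alpha * 2 ** ((alpha - 1) / 2)
    sigma = (num / den) ** (1 / alpha)  # scale of the numerator normal variate
    x = rng.gauss(0.0, sigma)
    y = rng.gauss(0.0, 1.0)
    return x / abs(y) ** (1 / alpha)

rng = random.Random(0)
b_steps = [brownian_step(rng) for _ in range(1000)]
l_steps = [levy_step(rng) for _ in range(1000)]
```

The Brownian steps stay near zero with unit spread, while the Levy steps occasionally produce very large jumps; this mix of small local moves and rare long jumps is what the three MPA phases below exploit.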
Phase-1: According to the guiding rule, the optimum tactic for the predator in a high-velocity ratio ($v \ge 10$), when the prey moves faster than the predator, is to move very slowly or not at all. This rule’s mathematical framework is given in Eq. (4) and applies in the case of $t < \frac{1}{3} t_{\max}$.

$$\vec{S}_{i} = \vec{R}_{B} \otimes \left(\overrightarrow{Elite}_{i} - \vec{R}_{B} \otimes \overrightarrow{Prey}_{i}\right), \qquad \overrightarrow{Prey}_{i} = \overrightarrow{Prey}_{i} + P \cdot \vec{R} \otimes \vec{S}_{i} \tag{4}$$

Here, the term $\vec{R}_{B}$ represents a vector of random numbers based on the Brownian motion. Entry-wise multiplication is indicated by the symbol $\otimes$. Prey replicates its movements by multiplying $\vec{R}_{B}$ by $\overrightarrow{Prey}_{i}$, with $\vec{R}$ representing a vector of uniformly random values in $[0, 1]$, and $P$ providing a constant. The maximum number of iterations is $t_{\max}$, and the present iteration is $t$.
Phase-2: If the prey travels in Levy at the unit velocity ratio ($v \approx 1$), the predator’s optimum course of action is Brownian. Using Eq. (5), the prey (the first half of the population) moves in Levy while $\frac{1}{3} t_{\max} < t < \frac{2}{3} t_{\max}$.

$$\vec{S}_{i} = \vec{R}_{L} \otimes \left(\overrightarrow{Elite}_{i} - \vec{R}_{L} \otimes \overrightarrow{Prey}_{i}\right), \qquad \overrightarrow{Prey}_{i} = \overrightarrow{Prey}_{i} + P \cdot \vec{R} \otimes \vec{S}_{i} \tag{5}$$

Here, the Levy motion is represented by a vector of random numbers $\vec{R}_{L}$ drawn from the Levy distribution. Prey motion is simulated in a Levy fashion by multiplying $\vec{R}_{L}$ and $\overrightarrow{Prey}_{i}$, and predator motion is simulated by adding a step to the predator location. The update for the remaining half of the population (the predators) is based on Eq. (6).

$$\vec{S}_{i} = \vec{R}_{B} \otimes \left(\vec{R}_{B} \otimes \overrightarrow{Elite}_{i} - \overrightarrow{Prey}_{i}\right), \qquad \overrightarrow{Prey}_{i} = \overrightarrow{Elite}_{i} + P \cdot CF \otimes \vec{S}_{i} \tag{6}$$

Conversely, $CF = \left(1 - \frac{t}{t_{\max}}\right)^{2\,t/t_{\max}}$ is thought of as an adaptive parameter that regulates the step size enabling the motion of predators. The prey changes its position in response to the predators’ Brownian motion, whereas the predator’s motion is simulated by multiplying $\vec{R}_{B}$ and $\overrightarrow{Elite}_{i}$.
Phase 3: Levy is the most effective predation technique at low-speed ratios
. This stage is described in Eq. (7) While
.
![]() |
7 |
In the Levy tactics, the motion of the hunter is simulated by multiplying
and
, and the motion of the prey is simulated by adding the number of steps to
status, which aids in the updating of prey location.
Eddy formation
The Fish Aggregating Devices (FADs) impact is expressed numerically in Eq. (8).
![]() |
8 |
The possibility of FADs impacting the optimization process is represented by
. A binary column of an array containing one and zero is denoted as
. This is created by creating an arbitrary vector within the interval
, and when its length is smaller than 0.2, switching it by just one, and when it is more than 0.2, switching it to 0. Here, the term
indicates a uniform randomized integer
. A vector with the bottom and top limits of the size is denoted by the terms
and
. The letters
and
represent the prey matrixes’ randomized indices.
Proposed IMPA-based classification performance enhancement
The developed IMPA is employed for tuning the parameters in the developed SCHADNet-based lyric text classification model. The conventional MPA has several cons and pros. The exploration and exploitation are efficiently balanced by the MPA algorithm. It explores searching areas and takes advantage of the most effective solutions currently discovered by employing the concepts of predator escape and victim pursuit. It has proven to be effective in resolving intricate optimization issues involving several optima. Thus, the MPA is selected for the suggested SCHADNet model’s optimization. However, in circumstances when the problem environment is fluid, it might have trouble. Its efficiency may be impacted if it is unable to react swiftly to abrupt shifts in its surroundings. It might have trouble growing up to highly dimensional issues. The technique could make it more difficult to efficiently search for and identify the best answers when complexity rises. As a consequence, we developed an enhanced MPA named IMPA to optimize the parameters like hidden neurons in Trans-Bi-LSTM and GRU, epochs in Trans-Bi-LSTM and GRU are tuned in the developed SCHADNet-based text classification model to enhance the accuracy, sensitivity along with reduce the FNR and FPR.The conventional MPA includes phases like detecting top predators, MPA optimization scenarios, eddy formation, and marine memory saving, to provide more results lyric text classification process, but the developed IMPA is modified by removing the top predator detecting phase and marine memory saving phase from the conventional MPA.This IMPA-aided parameter tuning enhances the performance rates of the developed lyrics text classification process with better convergence rates.
Optimization process using developed IMPA
The optimization process begins by considering the population and iteration count of the movement of predators and prey in the marine ecosystem. Here, the iterative process begins by considering the several run times for getting the optimal solutions. Moreover, the solution in the developed IMPA algorithm gets encoded to maximize the model performance. The optimal solution can be effectively attained by adjusting the essential parameters by minimizing the error rate and maximizing the desired outcomes. Deriving Eq. (21), the objective function of the model is evaluated to select and understand the relevant parameters based on the algorithmic rules. Iteratively updating the random parameters could significantly improve the models searchability and provides a better balance between the exploitation and exploration abilities.Fig. 2 and Algorithm 1 offer the Flowchart and pseudo-code for the proposed IMPA algorithm.
Fig. 2.

Flowchart for the proposed IMPA algorithm.
Algorithm 1.

DevelopedIMPA
Serial cascaded hybrid adaptive deep networks for lyrics text classification model
Bidirectional long short term memory
Bi-LSTM31 was generated from an LSTM that can train on its own from the initial pattern vector of definition documentation is the Bi-LSTM part that relies on a deep learning network design.
The Bi-LSTM can generalize and capture the features deeply. As an outcome, the Bi-LSTM becomes generalized network for invisible features. Word order, bidirectional contextual relationships, and dependence can be acquired over time. Moreover, it can address disappearance gradients and inflation problems with effectiveness.
The two LSTM neural networks in Bi-LSTM, which have forward as well as backward feedback, are linked to a single layer of output as soon as it comes to classification. Bi-LSTM adds reservations and context knowledge for every point to the incoming series to increase reliability. Three gates’ configurations and a single-cell condition make up the fundamental construction of an LSTM component.
The input, forgets, and output gates control how the cell status is updated and preserved throughout the LSTM component. The forget gate decides how to keep knowledge of the previous unit nation, the output gate manages which elements of the revised state of the cell are produced, and the input gate manages what components of the fresh data are retained in the cell’s memory. The subsequent Eq. (9) to (13) illustrates the particular operation that uses the LSTM components.
![]() |
9 |
![]() |
10 |
![]() |
11 |
![]() |
12 |
![]() |
13 |
Here, the terms
and
constitute the source vector as well as concealed layer value at duration
, respectively, and
,
,
, and
indicate the results of the intake gate, forget gate, output gate, and cell at period
. The weighted array and biases variable are represented by
and
, accordingly, and their underscores, which include
and
, signify the matrix of weights and biases field that are part of the gate’s input architecture. The term
stands for the function of sigmoid activation.
Context-sensitive data can be retained by the Bi-LSTM through the use of the LSTM component.The Bi-LSTM architecture has two LSTM coatings that are identical in both directions. Like traditional LSTM neural systems, both of the concurrent layers of LSTM function in a comparable way. The input vector
in the other side is handled by two separate layers of LSTM for the front and backward instructions, accordingly, for the
temporal increment. The result is the total of the secret state carriers, which is expressed in Eq. (14).
![]() |
14 |
In this case,
indicates the bias, the terms
and
are the weighted variables for the two simultaneous layers of LSTM in the forward as well as backward orders, correspondingly, and the terms
and
represent the final outcomes of both concurrent layer LSTM.The Bi-LSTM model’s graphical presentation is displayed in Fig. 3.
Fig. 3.

Graphical presentation of the Bi-LSTM model.
Gated recurrent unit
GRU32 represents a new technology that is reminiscent of LSTM, an improved variation of RNN. It transfers data using a hidden state rather than the cell’s internal state. In addition, there are two gates: the resetting gate and an updated gate. The first gate chooses what knowledge about past events to discard. The later gate indicates that the choice to toss or retain fresh data has been made. The data is scaled from
by a sigmoid barrier and the graphically illustrated in Fig. 4.
Fig. 4.

Graphical illustration of the GRU model.
If 0, the state of hiding does not allow any knowledge to pass through, and 1 indicates that details must be inputted during the following state. The term
gates are candidate phase activation mechanisms that crush the values among
. The subsequent Eq. (15) to (18) is available from GRU.
![]() |
15 |
![]() |
16 |
![]() |
17 |
![]() |
18 |
Here, the term
serves as a timestamp
is the entered significance, and
gives the state that is hidden. The respective weights that correspond to the updating
and resetting
gating are denoted by
and
, correspondingly. On the other hand, the term
is a possible output.Some major benefits of GRU versus LSTM or similar longitudinal learning algorithms include effective training with fewer variables, insensitivity to sound, and greater distributed data within GRU. To identify temporal connections and variations in usage, GRU is used to compare what was consumed at one stamp to the amount consumed at the next stamp and make predictions on this basis. It is also economical concerning memory and time due to the usage of two Gates.
Developed SCHADNet-aided lyrics text classification
Transformer
Transformer33 is the encoder-decoder architecture that uses the sequence-to-sequence conversion. An encoder was used to convert text into vector form during this classification process.The particular functioning technique is to use the encoding component to transform the inputs to a vector that has fixed-length software. Yet, the goal of this research is to matrix encoding the initial input environment or perspective in order to extract a high-level characteristic; just the encoder component of the Transformer is employed since there isn’t a requirement to turn this encoded vector to series outputs. This section consists of
equal levels, where every layer consists of a pair of sub-layers: an entirely linked forward feed system and a multiple-head attention system. The residual link and standardization processes will proceed after the two sublayers. Mounting several scaling dot-product attention yields multiple heads of attention between individuals.The transformer’s intake constitutes a vector of
which contains
phrases acquired through the embedding layer’s input. Three numbers of linear transforming vectors
,
,
are arbitrarily set and multiplied to the inputting vector to acquire the query vector
, key vector
, and value vector
, wherein
indicates the concealed dimensions. Transformer Encoding was crucial due to the scaling of dot-product attention.Standardizing the resemblance scalar is necessary to determine its weight. After that, the closeness among every vector
of the query column and every vector
in the key matrices is computed. The vector of weights is subsequently divided by the total phrase worth in the phrase to obtain the scaling dot-product attentiveness result using Eq. (19).
![]() |
19 |
Here, the square root of the vector size
within the
matrix is often used as the coefficient of scaling
. A significantly greater number of characteristics may be obtained through repeatedly learning various categories following the
order linear change of the query, key, and value array using various settings. After that, the multi-head attention mechanism generates the following output in Eq. (20).
![]() |
20 |
Here, the concatenated vector is
explained as
. The embedded characteristics and the contents are inputted to the encoder of the transformer. Following this, the concealed presentation process
,
and the word level concealed presentation of
,
is taken.
Reason for choosingSCHADNet model
The SCHADNet model is developed for the classification of lyrical text. The pre-processed data
are given to the recommended SCHADNet model for classifying the lyric text. The models such as transformer, Bi-LSTM and GRU are incorporated into the developed SCHADNet model. This work utilized the GRU and Bi-LSTM as primary networks for the classification process. These two techniques are relatively better than the conventional transformers because of faster training and efficiency. Moreover, these networks are low-cost and determine richer features than other models. The Bi-LSTM technique can capture the contextual relationships among the words and minimizes the impact of noisy data.However, the Bi-LSTM model’s sequential nature makes it complex to parallelize, resulting in poor scalability.The Trans-Bi-LSTM network effectively performs the tasks in parallel, thus improves the scalability. Though this network has better scalability, it struggles to handle the very long sequences and may face the vanishing gradient issues. Therefore, the GRU is combined with the Trans-Bi-LSTM model, thus forming the hybrid network. This hybridized network handles the variable-length sequences and also enhances the interpretability. Here, the obtained features from the Trans-Bi-LSTM are passed to the GRU model for further processing. After classifying the text features, the GRU model offered the classified outcome. Although this serially cascaded hybrid network offers relatively promising solutions, the parameters in the network require careful tuning for achieving maximum accuracy in the text classification process. For this objective, the IMPA is considered. This is an effective algorithm offering optimal solutions with better convergence values. Therefore, by employing the IMPA, the parameter tuning is performed. Thus the SCHADNet network is chosen for text classification.
The transformer model is combined with this technique, thus constructing the Trans-Bi-LSTM. Initially, the obtained preprocessed data are fed into the Trans-Bi-LSTM model, which supports determining and processing the inputted features, and it comprises the transformer and Bi-LSTM.The SCHADNet-based lyric text classification model is used to recognize and analyze lyric texts. The parameters like hidden neurons in Trans-Bi-LSTM and GRU, and epochs in Trans-Bi-LSTM and GRU are tuned in the developed SCHADNet-based text classification model to enhance the accuracy
, sensitivity
along with reducing the FNR
and FPR
and the mathematical formulations are shown in Eq. (22) to Eq. (26). The objective function
of the recommended SCHADNet-based lyric text classificationsystem is down in Eq. (21).IMPA’s support has enabled the SCHADNet network to provide highly accurate classified solutions. In the SCHADNet training process, the dataset is divided into two sections for training and testing in the ratio of 75:25. The training data is used to train and use the SCHADNet model for the classification process.
![]() |
21 |
Here, the terms
and
denote the optimized hidden neurons in Trans-Bi-LSTM and GRU in the range of
and
, and the terms
and
define the optimized epochs in Trans-Bi-LSTM and GRU in the range of
and
.
(i) Eq. (22) is used to assess accuracy
.
![]() |
22 |
(ii) Eq. (23) is used for estimating sensitivity
.
![]() |
23 |
(iii) Eq. (24) is used for defining the False Negative Rate (FNR)
.
![]() |
24 |
(iv) Eq. (25) is used for evaluating the False Positive Rate (FPR)
.
![]() |
25 |
In this case, the terms
and
constitute the true positive and true negative,
and
stand for the false negative and false positive, correspondingly.Fig.5 displays the representation of SCHADNet-aided lyrics text classification model.
Fig 5.
Developed SCHADNet-aided lyrics text classification model.
Interacting the model with each other
In order to provide better classification performance, the effective data pre-processing is achieved to provide cleaned data without the noise. With the help of pre-processing techniques, it effectively removes punctuation, special characters, redundant data, and inappropriate stemming. Utilizing the pre-processing helps to minimize the computational complexity and strengthen the models capability. The data can be cleaned and provided with meaningful information by removing punctuation and special characters. So, the noisy and irrelevant data is cleaned for extracting the relevant features that leads to enhance the classification performance. Thus, the outcome of the model is simple and easier to understand the model to get the precise outcomes. In this context, the effective preprocessed outcome
is inputted into the SCHADNetclassification model to precisely classify the lyrics text. The SCHADNet model that was developed can learn semantic patterns to understand the user’s preference based on underlying patterns.
Result and discussion
Simulation setup
Python platform was employed in the lyrics text classification for the entire processing. The proposed IMPA scheme used 50 maximum Iterations. The suggested IMPA algorithm’s population input is referred to. Here, the populations are encoded by utilizing the required amount of parameters. Here, the IMPA’s number of populations was 10. The chromosome represents an individual solution encoded in a format suitable for manipulation by the algorithm. For the IMPA, the length of the chromosome was 4 The designed SCHADNet-based lyrics text classification process considered the following parameters: The number of epochs-50, batch size-16, number of LSTM units-64, number of transformer encoder layers-2, number of attention heads-4, learning rate: 0.0001, number of GRU units: 64, dropout rate: 0.2, hidden layer size-128, activation function: TanH, optimizer: {SGD, Adam, RMSprop}. Finally, the network produces highly accurate lyrics text classified solutions. The performance was validated with numerous existing systems like LSTM34, Trans-Bi-LSTM35, GRU32 and Trans-Bi-LSTM-GRU36, and the algorithms like Eurasian Oystercatcher Optimizer (EOO)37, Valley Optimizer (EVO)38, Political Optimizer (PO)35 and Marine Predators Algorithm (MPA)26.
In experiment, the selection of parameters is treated as an automated search for the most efficient configuration within a predefined range. The process begins by defining the specific hyperparameters such as hidden neuron counts and epoch sizes. The IMPA then initializes a population of candidate solutions. Each candidate represents a unique combination of parameters that is used to train the Trans Bi-LSTM or GRU. The resulting performance is assigned as a best fitness score to that specific combination. As the algorithm iterates, it refines these values through exploration and exploitation phase. Throughout this process, the algorithm constantly compares new combinations against the current best performer. Upon reaching the maximum number of iterations, the IMPA outputs the global best solution, which contains the optimized values for neurons and epochs that yielded the highest accuracy. These optimized values are then finalized as the parameters for the experimental model. Thus, choosing the hidden neuron count in Trans Bi-LSTM and GRU within the range of [5–255] can effectively balance architectural depth with computational efficiency. Further, selecting the number of epochs in Trans Bi-LSTM and GRU within [5–50] helps to generalize well on unseen data.
Experimental measures
The following measures are employed to develop the lyric text classification framework.
(a) Eq. (19) determines precision
.
![]() |
26 |
(b) Eq. (21) can be used to determine the F1-Score
.
![]() |
27 |
(c) Eq. (24) is used to assess specificity
.
![]() |
28 |
(d) When applied Eq. (26), yields the Matthews correlation coefficient (MCC)
.
![]() |
29 |
(e) Eq. (27) is used to classify Negative Predictive Value (NPV)
.
![]() |
30 |
(f) Eq. (28) provides a definition for False Discovery Rate (FDR)
.
![]() |
31 |
Convergence analysis
Fig. 6 provides the analysis on proposed lyric text classification model considering the convergence score.The proposed technique’s convergence over the existing models is validated using this cost function-based experiment.The developed lyrics text classification model given a cost function score is 11.42% lower than EOO-SCHADNet, 9.26% lower than EVO-SCHADNet, 11.26% lower than PO-SCHADNet and 13.48% lower than MPA-SCHADNet at the 30th iteration. When considering the 40th iteration, the cost function is 17.33% lower than EOO-SCHADNet, 7% lower than EVO-SCHADNet, 12.47% lower than PO-SCHADNet and 11.42% lower than MPA-SCHADNet.The proposed IMPA-SCHADNet achieved a higher convergence rate than the existing techniques due to the lower cost function values of the designed model. Also, it has been reported that the IMPA-SCHADNet technique is efficiently supported to classify the texts in the lyrics than the other models.
Fig 6.
Cost function analysis on the developed lyrics text classification model based on (a) Dataset-1 and (b) Dataset-2.
Dataset-1-based performance analysis on the proposed lyrics text classification model
Dataset-1-based analysis on the lyrics text classification model is shown in Fig. 7 with existing classifiers and Fig. 8 with heuristic approaches.This experiment takes into account activation functions such as linear, sigmoid, TanH, softmax, and ReLU to ensure the designed model’s improved rates of performance. This activation function-aided validation ensures the designed model how effectively classifies the lyrics than the other traditional techniques. When analyzing the classifiers, the developed model offered an accuracy value score is 8.23% more than LSTM, 2.22% enhanced than Trans Bi-LSTM, 6.97% increased than GRU, and 1.09% superior to Trans-Bi-LSTM-GRU while analyzing the sigmoid function. When taking the TanH function, the developed model offered the NPV value based on algorithms is 0.91% superior to EOO-SCHADNet, 0.61% more than EVO-SCHADNet, 0.51% enhanced than PO-SCHADNet and 0.24% increased than MPA-SCHADNet.The IMPA-SCHADNet model is better suited for text classification than any other traditional techniques because of its superior value.
Fig 7.
ClassifierAnalysis on the proposed lyrics text classification model in terms of (a) Accuracy, (b) FNR, (c) FPR, and (d) NPV.
Fig 8.
Algorithmic analysis of the proposed lyrics text classification model in terms of (a) Accuracy, (b) FNR, (c) FPR, and (d) NPV.
Dataset-2-based Analysis on the proposed lyrics text classification model
Dataset-2-based performance analysis on the proposed lyrics text classification model with existing classifiers and heuristic approaches are shown in Fig. 9 and Fig. 10.This experiment also utilized the various standard activation functions for analyzing the designed model over other previous techniques. Based on classifiers, the developed model offered an accuracy value is 27.28% more than LSTM, 31.03% enhanced than Trans-Bi-LSTM, 22.05% increased than GRU, and 18.8% superior to Trans-Bi-LSTM-GRU while analyzing the ReLU function. When considering the Linear function, the developed model offered the NPV value based on algorithms is 9.62% superior to LSTM, 6.8% more than Trans-Bi-LSTM, 4.92% enhanced than GRU and 2.56% increased than Trans-Bi-LSTM-GRU.According to the other performance measures, the designed technique has better rates of performance than the other classification techniques. Thus, it has been elucidated that the designed lyrics classification approach offers relatively more efficient solutions than any other models when considering the second dataset.
Fig 9.
ClassifierAnalysis on the proposed lyrics text classification model in terms of (a) Accuracy, (b) FNR, (c) FPR, and (d) NPV.
Fig 10.
Algorithmic analysis of the proposed lyrics text classification model in terms of (a) Accuracy, (b) FNR, (c) FPR, and (d) NPV.
Performance analysis of developed model using dataset 3
Based on the third dataset, the experimental validation is given in Fig.11 and Fig.12 over the previous algorithms and models. Here, the graph analysis is conducted by considering the different activation functions of ReLu, sigmoid, linear, tanh and softmax is validated to provide superior outcomes. This experiment validation shows the suggested lyrics text classification framework’s superior solutions with the support of various activation functions. When considering the ReLU activation function in Fig.11 (b), the FNR of the designed lyrics text classification process is minimized by 38.82% of LSTM, 61.17% of Trans-Bi-LSTM, 35.29% of GRU, and 11.76% of Trans-Bi-LSTM-GRU respectively. The design of the lyrics text classification process resulted in relatively lower error rates than other models, which led to an increase in performance rates.
Fig. 11.
Classifier-based analysis of the proposed lyrics text classification model in terms of (a) Accuracy, (b) FNR, and (c) Precision.
Fig. 12.
Algorithmic analysis of the proposed lyrics text classification model in terms of (a) Accuracy, (b) FNR, and (c) Precision.
Overall classifier analysis on the proposed lyrics text classification model
Table 2 illustrates the overall classification analysis of the proposed lyrics text classification model based on three datasets. Standard true and false measures are used for this experimental validation. These measures show the reliability and efficiency of the presented work over the existing models.In dataset-1, the developed model provided the F1-Score is 20.40% more than LSTM, 8.16% enhanced than Trans Bi-LSTM, 13.69% increased than GRU, and 6.04% superior to Trans-Bi-LSTM-GRU. When considering dataset-2, the developed model offered the specificity is 3.94% superior to LSTM, 5.2% more than Trans-Bi-LSTM, 3.6% enhanced than GRU and 0.84% increased than Trans-Bi-LSTM-GRU. Similarly, when considering the third dataset, the FDR of the suggested lyrics text classification process is minimized by 16.56% of LSTM, 24.43% of Trans-Bi-LSTM, 16.3% of GRU, and 6.48% of Trans-Bi-LSTM-GRU accordingly. The three datasets show the superior solutions of the suggested model over the other techniques for any performance metrics.The designed lyrics text classification technique has been shown to have low error rates and high classification accuracy rates compared to conventional techniques in both dataset 1 and dataset 2.
Table 2.
OverallClassifier analysis on the proposed lyrics text classification model.
| Terms | LSTM34 | Trans-Bi-LSTM35 | GRU32 | Trans-Bi-LSTM-GRU36 | IMPA-SCHADNet |
|---|---|---|---|---|---|
| Dataset-1 | |||||
| Precision | 47.009 | 54.713 | 50.961 | 56.306 | 61.369 |
| Recall | 88.872 | 91.574 | 90.354 | 92.046 | 93.471 |
| NPV | 98.628 | 98.988 | 98.827 | 99.049 | 99.230 |
| FPR | 11.131 | 8.422 | 9.661 | 7.936 | 6.538 |
| FNR | 11.128 | 8.426 | 9.646 | 7.954 | 6.529 |
| Accuracy | 88.869 | 91.578 | 90.341 | 92.062 | 93.463 |
| FDR | 52.991 | 45.287 | 49.039 | 43.694 | 38.631 |
| Specificity | 88.869 | 91.578 | 90.339 | 92.064 | 93.462 |
| F1-Score | 61.492 | 68.500 | 65.166 | 69.871 | 74.092 |
| MCC | 0.596 | 0.668 | 0.634 | 0.682 | 0.726 |
| Dataset-2 | |||||
| Recall | 89.701 | 88.459 | 89.920 | 92.549 | 92.988 |
| Specificity | 89.579 | 88.507 | 89.871 | 92.330 | 93.109 |
| Precision | 74.155 | 71.955 | 74.742 | 80.088 | 81.812 |
| FDR | 25.845 | 28.045 | 25.258 | 19.912 | 18.188 |
| FPR | 10.421 | 11.493 | 10.129 | 7.670 | 6.891 |
| Accuracy | 89.609 | 88.495 | 89.883 | 92.385 | 93.079 |
| NPV | 96.309 | 95.834 | 96.396 | 97.381 | 97.551 |
| FNR | 10.299 | 11.541 | 10.080 | 7.451 | 7.012 |
| F1-Score | 81.190 | 79.358 | 81.631 | 85.869 | 87.043 |
| MCC | 0.747 | 0.722 | 0.753 | 0.811 | 0.827 |
| Dataset-3 | |||||
| MCC | 0.6337755 | 0.6019229 | 0.6348357 | 0.6724605 | 0.6964619 |
| Recall | 90.340578 | 89.115489 | 90.397094 | 91.700065 | 92.515413 |
| FDR | 49.040598 | 52.351401 | 48.937052 | 44.803024 | 42.075965 |
| Precision | 50.959402 | 47.648599 | 51.062948 | 55.196976 | 57.924035 |
| FNR | 9.6594218 | 10.884511 | 9.6029058 | 8.2999349 | 7.4845873 |
| FPR | 9.6598813 | 10.878997 | 9.6259563 | 8.2702601 | 7.4670122 |
| Specificity | 90.340119 | 89.121003 | 90.374044 | 91.72974 | 92.532988 |
| NPV | 98.825917 | 98.661148 | 98.833139 | 99.004646 | 99.109276 |
| Accuracy | 90.340165 | 89.120452 | 90.376349 | 91.726772 | 92.53123 |
| F1-Score | 65.162102 | 62.095661 | 65.261427 | 68.913115 | 71.24283 |
Overall analysis on the proposed lyrics text classification model based on algorithms
The overall analysis of the proposed lyrics text classification model based on three datasets is shown in Table 3.The developed lyrics text classification model given an MCC is 22.01% more than EOO-SCHADNet, 17.47% enhanced than EVO-SCHADNet, 15.6% increased than PO-SCHADNet, and 7.55% superior to MPA-SCHADNet based on dataset-1. When considering the dataset-2, FDR is 41.85% superior to EOO-SCHADNet, 38.39% more than EVO-SCHADNet, 30.46% enhanced than PO-SCHADNet and 12.8% increased than MPA-SCHADNet. Likewise, when considering the third dataset, the recommended lyrics text classification process’s precision is enhanced by 25.94% of EOO-SCHADNet, 21.25% of EVO-SCHADNet, 13.27% of PO-SCHADNet, and 7.89% of MPA-SCHADNet accordingly. These experimental validations for three data sources reported the superior solutions of the designed technique.The experimental validations enabled the designed text classification process to achieve more effective solutions than conventional models, ensuring the model’s robustness and reliability.
Table 3.
OverallAlgorithmic analysis of the proposed lyrics text classification model.
| Terms | EOO-SCHADNet37 | EVO-SCHADNet38 | PO-SCHADNet39 | MPA-SCHADNet30 | IMPA-SCHADNet |
|---|---|---|---|---|---|
| Dataset-1 | |||||
| Accuracy | 88.863 | 89.743 | 90.121 | 91.827 | 93.463 |
| Recall | 88.853 | 89.748 | 90.144 | 91.827 | 93.471 |
| Specificity | 88.864 | 89.742 | 90.119 | 91.827 | 93.462 |
| Precision | 46.994 | 49.294 | 50.339 | 55.523 | 61.369 |
| FPR | 11.136 | 10.258 | 9.881 | 8.173 | 6.538 |
| FNR | 11.147 | 10.252 | 9.856 | 8.173 | 6.529 |
| NPV | 98.625 | 98.747 | 98.799 | 99.021 | 99.230 |
| FDR | 53.006 | 50.706 | 49.661 | 44.477 | 38.631 |
| F1-Score | 61.474 | 63.636 | 64.603 | 69.203 | 74.092 |
| MCC | 0.595 | 0.618 | 0.628 | 0.675 | 0.726 |
| Dataset-2 | |||||
| Accuracy | 86.779 | 87.692 | 89.481 | 91.892 | 93.079 |
| Recall | 86.486 | 87.363 | 89.701 | 91.746 | 92.988 |
| Specificity | 86.876 | 87.801 | 89.408 | 91.941 | 93.109 |
| Precision | 68.717 | 70.477 | 73.842 | 79.143 | 81.812 |
| FPR | 13.124 | 12.199 | 10.592 | 8.059 | 6.891 |
| FNR | 13.514 | 12.637 | 10.299 | 8.254 | 7.012 |
| NPV | 95.071 | 95.422 | 96.302 | 97.094 | 97.551 |
| FDR | 31.283 | 29.523 | 26.158 | 20.857 | 18.188 |
| F1-Score | 76.585 | 78.017 | 81.003 | 84.980 | 87.043 |
| MCC | 0.684 | 0.704 | 0.745 | 0.799 | 0.827 |
| Dataset-3 | |||||
| Accuracy | 87.111719 | 88.303622 | 90.085257 | 91.145829 | 92.53123 |
| Recall | 87.144319 | 88.299108 | 90.082465 | 91.153169 | 92.515413 |
| Specificity | 87.108097 | 88.304124 | 90.085567 | 91.145013 | 92.532988 |
| Precision | 42.892011 | 45.617999 | 50.237732 | 53.353356 | 57.924035 |
| FPR | 12.891903 | 11.695876 | 9.9144333 | 8.8549869 | 7.4670122 |
| FNR | 12.855681 | 11.700892 | 9.9175348 | 8.8468311 | 7.4845873 |
| NPV | 98.386644 | 98.549065 | 98.791558 | 98.933027 | 99.109276 |
| FDR | 57.107989 | 54.382001 | 49.762268 | 46.646644 | 42.075965 |
| F1-Score | 57.488474 | 60.157043 | 64.503027 | 67.309452 | 71.24283 |
| MCC | 0.553628 | 0.5816648 | 0.6269435 | 0.6559782 | 0.6964619 |
Statistical analysis of the proposed lyrics text classification model
Statistical performance analysis on the proposed lyrics text classification model based on dataset-1 and dataset-2 is shown in Table 4.Here, the statistical measures such as worst, best, mean, median, and standard deviation are considered for this experiment. The minimum recorded performance value is defined by the best measure, while the median explains the middle value of the performance metric. Finally, the standard deviation indicates the variability of the performance metric. These metrics are employed for fitness function validation, where the accuracy, sensitivity, FPR, and FNR are considered.The median of the developed lyrics text classification model is 7.19% more than EOO-SCHADNet, 3.5% enhanced than EVO-SCHADNet, 5.05% increased than PO-SCHADNet, and 4.92% superior to MPA-SCHADNetbased on dataset-1. When considering dataset 2, the standard deviation of the developed model is 15.73% superior to EOO-SCHADNet, 36.22% more than EVO-SCHADNet, 2.45% enhanced than PO-SCHADNet and 32.55% increased than MPA-SCHADNet. The experimental validations indicate that the designed model is effective in selecting optimal solutions and offers better performance rates than the existing algorithms.
Table 4.
Statistical analysis of the proposed lyrics text classification model.
| Terms | EOO-SCHADNet 37 | EVO-SCHADNet 38 | PO-SCHADNet 39 | MPA-SCHADNet 30 | IMPA-SCHADNet |
|---|---|---|---|---|---|
| Dataset-1 | |||||
| Worst | 5.020 | 5.413 | 6.704 | 6.918 | 6.295 |
| Best | 4.044 | 4.048 | 4.038 | 4.108 | 3.906 |
| Mean | 4.323 | 4.317 | 4.293 | 4.283 | 3.994 |
| Median | 4.209 | 4.048 | 4.114 | 4.108 | 3.906 |
| Std | 0.321 | 0.484 | 0.712 | 0.544 | 0.435 |
| Dataset-2 | |||||
| Worst | 5.868 | 6.151 | 6.489 | 7.039 | 5.574 |
| Best | 4.322 | 4.039 | 4.196 | 4.071 | 3.918 |
| Mean | 4.726 | 4.354 | 4.354 | 4.298 | 4.100 |
| Median | 4.558 | 4.039 | 4.214 | 4.071 | 4.033 |
| Std | 0.445 | 0.588 | 0.366 | 0.556 | 0.375 |
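The five statistical measures in Table 4 can be computed directly from the final cost values of repeated optimization runs. A minimal Python sketch (the run values below are hypothetical):

```python
import statistics

def summarize(costs):
    """Worst/best/mean/median/std of final cost values over independent runs.

    The optimizer minimizes cost, so 'best' is the minimum and
    'worst' is the maximum.
    """
    return {
        "worst":  max(costs),
        "best":   min(costs),
        "mean":   statistics.mean(costs),
        "median": statistics.median(costs),
        "std":    statistics.stdev(costs),   # sample standard deviation
    }

# Hypothetical final cost values from five runs (illustration only)
stats = summarize([4.1, 3.9, 4.3, 4.0, 5.6])
```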
ROC analysis on the proposed lyrics text classification model
ROC analysis of the suggested lyrics text classification model is depicted in Fig. 13. This ROC-aided experiment illustrates the designed technique's reduced error rates relative to the existing classification models. On dataset-1, the ROC score of the developed model is 15.29% higher than LSTM, 7.92% higher than Trans Bi-LSTM, 2.43% higher than GRU, and 0.2% higher than Trans-Bi-LSTM-GRU. ROC analysis characterizes the model's discrimination ability across different decision thresholds among the classes, so it helps to minimize misclassification and improve the overall performance of the model. Experimental validation confirms that the implemented lyrics text classification process offers efficient solutions with lower error rates than the other existing techniques.
Fig. 13.
ROC analysis on the developed lyrics text classification model based on (a) Dataset-1 and (b) Dataset-2.
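The area under the ROC curve equals the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, counting ties as one half. A minimal sketch with hypothetical labels and scores:

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative),
    with ties counted as 1/2 (equivalent to the trapezoidal ROC area)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and classifier scores, for illustration
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```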
State-of-the-art method comparative analysis of the lyrics text classification model
The performance of the suggested lyrics text classification process is validated by comparing it with traditional and related classification models in Table 5. The state-of-the-art techniques CNN, LSTM, and DNN are validated, along with the recent techniques CNN with FastText embeddings (CNN-FT)40, Convolution and Attention with a Bi-directional Gated Recurrent Unit (CAT-BiGRU)41, and Multi-View RNN (MV-RNN)42, to prove the efficiency of the developed model. In this validation, the accuracy of the developed model is 93.4%; higher accuracy effectively minimizes the error rate and improves the classification performance. Moreover, the error rate of the developed IMPA-SCHADNet model is 6.53% in terms of FPR. On dataset-2, the sensitivity of the designed text classification process is enhanced by 18.6% over CNN, 14.8% over LSTM, 13.2% over DNN, 10.9% over CNN-FT40, 6.24% over CAT-BiGRU41, and 11.3% over MV-RNN42, respectively. Thus, the implemented lyrics text classification process achieved highly effective and superior solutions compared to the conventional and related classification models.
Table 5.
Overall performance analysis of the proposed lyrics text classification model over state-of-the-art models.
| Terms | CNN 24 | LSTM 25 | DNN 27 | CNN-FT 40 | CAT-BiGRU 41 | MV-RNN 42 | Proposed IMPA-SCHADNet |
|---|---|---|---|---|---|---|---|
| Dataset-1 | |||||||
| Accuracy | 81.35 | 81.35 | 83.78 | 86.54 | 88.65 | 85.08 | 93.46 |
| Sensitivity | 79.12 | 79.20 | 81.69 | 84.20 | 86.65 | 82.62 | 93.47 |
| Specificity | 83.88 | 83.77 | 86.11 | 89.16 | 90.83 | 87.87 | 93.46 |
| Precision | 54.80 | 49.64 | 60.76 | 59.71 | 58.18 | 60.56 | 61.36 |
| FPR | 16.12 | 16.23 | 13.89 | 10.84 | 9.17 | 12.13 | 6.538 |
| FNR | 20.88 | 20.80 | 18.31 | 15.80 | 13.35 | 17.38 | 6.52 |
| NPV | 77.94 | 78.10 | 80.84 | 83.41 | 86.15 | 81.64 | 99.23 |
| FDR | 45.20 | 45.36 | 43.24 | 40.29 | 48.82 | 41.44 | 38.63 |
| F1-Score | 71.86 | 71.83 | 72.15 | 73.87 | 72.85 | 73.49 | 74.09 |
| MCC | 62.87 | 62.86 | 67.70 | 73.24 | 77.40 | 70.35 | 72.60 |
| Dataset-2 | |||||||
| Accuracy | 79.16 | 82.00 | 83.70 | 85.56 | 89.38 | 84.43 | 93.07 |
| Sensitivity | 78.35 | 80.97 | 82.10 | 83.82 | 87.52 | 83.48 | 92.98 |
| Specificity | 80.14 | 83.24 | 85.71 | 87.75 | 91.67 | 85.56 | 93.10 |
| Precision | 80.87 | 78.51 | 80.85 | 79.56 | 80.83 | 77.38 | 81.81 |
| FPR | 19.86 | 16.76 | 14.29 | 12.25 | 8.33 | 14.44 | 6.89 |
| FNR | 21.65 | 19.03 | 17.90 | 16.18 | 12.48 | 16.52 | 7.01 |
| NPV | 75.13 | 78.17 | 79.19 | 81.22 | 85.62 | 81.22 | 97.55 |
| FDR | 27.13 | 24.49 | 22.15 | 20.44 | 27.17 | 19.62 | 18.18 |
| F1-Score | 80.55 | 83.18 | 84.88 | 86.60 | 80.10 | 85.39 | 87.04 |
| MCC | 58.24 | 63.95 | 67.42 | 71.18 | 78.82 | 68.82 | 82.70 |
| Dataset-3 | |||||||
| Accuracy | 79.16 | 82.00 | 83.70 | 85.56 | 89.38 | 84.43 | 92.53 |
| Sensitivity | 78.35 | 80.97 | 82.10 | 83.82 | 87.52 | 83.48 | 92.52 |
| Specificity | 80.14 | 83.24 | 85.71 | 87.75 | 91.67 | 85.56 | 92.53 |
| Precision | 52.87 | 55.51 | 57.85 | 56.56 | 54.83 | 53.38 | 57.92 |
| FPR | 19.86 | 16.76 | 14.29 | 12.25 | 8.33 | 14.44 | 7.47 |
| FNR | 21.65 | 19.03 | 17.90 | 16.18 | 12.48 | 16.52 | 7.48 |
| NPV | 75.13 | 78.17 | 79.19 | 81.22 | 85.62 | 81.22 | 99.11 |
| FDR | 45.13 | 44.49 | 43.15 | 45.44 | 47.17 | 45.62 | 42.08 |
| F1-Score | 70.55 | 63.18 | 64.88 | 66.60 | 70.10 | 65.39 | 71.24 |
| MCC | 58.24 | 63.95 | 67.42 | 61.18 | 68.32 | 68.82 | 69.65 |
Ablation study of the proposed model
Table 6 presents the ablation study of the designed model, which helps to evaluate the contribution of each component. The table shows that the classical BiLSTM system attains 88.8% accuracy, relatively low compared to the other variants, implying poorer user experience and less efficient resource allocation. The developed model attains 93.4% accuracy, leading to more efficient and enhanced performance. Therefore, the developed model demonstrates superior text classification performance over the traditional models.
Table 6.
Ablation study of the proposed model.
| Terms | BiLSTM | BiLSTM-GRU | LSTM-GRU | Trans-LSTM-GRU | Proposed IMPA-SCHADNet |
|---|---|---|---|---|---|
| Dataset-1 | |||||
| Accuracy | 88.86751 | 91.57618 | 90.3464 | 92.0586 | 93.4633 |
| Sensitivity | 88.85014 | 91.5698 | 90.33334 | 92.06673 | 93.47067 |
| Specificity | 88.86944 | 91.57689 | 90.34785 | 92.05769 | 93.46248 |
| Precision | 47.00439 | 54.70844 | 50.97742 | 56.29358 | 61.36936 |
| FPR | 11.13056 | 8.423114 | 9.652147 | 7.942306 | 6.537522 |
| FNR | 11.14986 | 8.430197 | 9.666659 | 7.93327 | 6.529328 |
| NPV | 98.62513 | 98.98751 | 98.82515 | 99.05156 | 99.22975 |
| FDR | 52.99561 | 45.29156 | 49.02258 | 43.70642 | 38.63064 |
| F1-Score | 61.48262 | 68.49469 | 65.17495 | 69.86728 | 74.09241 |
| MCC | 0.595509 | 0.66818 | 0.633887 | 0.68234 | 0.725815 |
| Dataset-2 | |||||
| Accuracy | 89.59094 | 88.86048 | 89.81008 | 92.31191 | 93.07889 |
| Sensitivity | 89.77356 | 88.60482 | 89.55442 | 92.40321 | 92.98758 |
| Specificity | 89.53007 | 88.9457 | 89.8953 | 92.28147 | 93.10933 |
| Precision | 74.08077 | 72.76545 | 74.71054 | 79.96207 | 81.81234 |
| FPR | 10.46993 | 11.0543 | 10.1047 | 7.718529 | 6.890674 |
| FNR | 10.22644 | 11.39518 | 10.44558 | 7.596786 | 7.012418 |
| NPV | 96.3322 | 95.90444 | 96.27119 | 97.32922 | 97.55102 |
| FDR | 25.91923 | 27.23455 | 25.28946 | 20.03793 | 18.18766 |
| F1-Score | 81.17569 | 79.90777 | 81.46179 | 85.73365 | 87.04274 |
| MCC | 0.747262 | 0.729752 | 0.750965 | 0.809036 | 0.826616 |
Convergence time complexity analysis of the proposed model
Table 7 shows the convergence time analysis of the proposed model. The traditional Bi-LSTM model incurs higher training time and poor scalability, indicating that it struggles with large datasets and real-world applications. The proposed hybrid model demonstrates superior efficiency: by leveraging the strengths of Trans-Bi-LSTM and GRU, the designed framework achieves minimized training time, faster convergence, and lower cost values. This is primarily because the serial cascaded architecture improves network flexibility, allowing more efficient feature propagation. Furthermore, the integration of the IMPA ensures the model reaches an optimal solution rapidly. This enhanced optimization leads to a significantly better convergence rate and improved overall training performance for the lyrics text classification task. Therefore, the developed model is more effective than the traditional models.
Table 7.
Convergence time analysis of the proposed model.
| Model | Training Characteristics | Convergence/Time Processing Result |
|---|---|---|
| Bi-LSTM | Sequential, slow | Higher training time, poor scalability |
| Trans-Bi-LSTM | Parallelizable | Reduced training time, better scalability |
| GRU | Efficient, lightweight | Lower complexity, faster training |
| SCHADNet (Proposed) | Hybrid (Trans-Bi-LSTM + GRU, tuned by IMPA) | Minimized training time, faster convergence, lower cost values |
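The "efficient, lightweight" character of the GRU in Table 7 comes from its two-gate design, which needs fewer parameters than an LSTM's three gates and separate cell state. A toy NumPy sketch of a single GRU step (the sizes and random weights here are illustrative assumptions, not the trained model):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 4, 3                       # toy dimensions, illustration only

# Randomly initialized weights; a real model would learn these
Wz, Uz, bz = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)), np.zeros(d_h)
Wr, Ur, br = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)), np.zeros(d_h)
Wh, Uh, bh = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)), np.zeros(d_h)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h):
    """One GRU step: two gates and a convex update of the hidden state."""
    z = sigmoid(Wz @ x + Uz @ h + bz)          # update gate
    r = sigmoid(Wr @ x + Ur @ h + br)          # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h) + bh)  # candidate state
    return (1 - z) * h + z * h_tilde           # interpolate old and candidate

h = np.zeros(d_h)
for x in rng.normal(size=(5, d_in)):           # a 5-step toy sequence
    h = gru_step(x, h)
```

Because each step is a convex combination of the previous state and a tanh candidate, the hidden state stays bounded, which is part of why GRU training is stable and fast.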
Best/Worst analysis of the proposed model
Figure 14 demonstrates the best and worst analysis of the developed model, which quantifies the performance gains and evaluates the robustness of the model. In Fig. 14(a), the worst performer (LSTM) attains an accuracy of 88.87%, while the best (SCHADNet) gains 93.46%, indicating that the proposed model achieves superior accuracy, leading to enhanced reliability, improved decision making, and better user experience. As a result, the proposed SCHADNet model achieves greater performance than the traditional models.
Fig. 14.
Best/Worst analysis on the developed lyrics text classification model based on Dataset-1 and Dataset-2 in terms of (a) Accuracy, (b) F1-Score and (c) Recall.
State-of-the-art comparison of the proposed model
The state-of-the-art analysis of the proposed model is stated in Table 8. This comparison evaluates the performance and efficiency of the system and is useful for identifying its advantages and disadvantages. In the table, the traditional SVM model achieves a low accuracy of 88.8%, leading to less accurate results and wasted resources. The developed IMPA-SCHADNet model gains an accuracy of 93.4%, superior to the other classical models, leading to enhanced efficiency and better decision making. As a result, the suggested IMPA-SCHADNet model achieved better performance than the other models.
Table 8.
Comparative analysis of the suggested model with related classification models.
| Terms | SVM 43 | SLEM 44 | IMPA-SCHADNet |
|---|---|---|---|
| Dataset 1 | |||
| Accuracy | 88.86913 | 91.57611 | 93.4633 |
| Recall | 88.87082 | 91.586 | 93.47067 |
| Specificity | 88.86894 | 91.57501 | 93.46248 |
| Precision | 47.00907 | 54.70731 | 61.36936 |
| FPR | 11.13106 | 8.42499 | 6.537522 |
| FNR | 11.12918 | 8.414001 | 6.529328 |
| NPV | 98.62764 | 98.98942 | 99.22975 |
| FDR | 52.99093 | 45.29269 | 38.63064 |
| F1-Score | 61.49158 | 68.49833 | 74.09241 |
| MCC | 0.595633 | 0.668242 | 0.725815 |
| Dataset 2 | |||
| Accuracy | 89.49963 | 88.64134 | 89.70051 |
| Recall | 89.48137 | 88.67787 | 89.70051 |
| Specificity | 89.50572 | 88.62917 | 89.70051 |
| Precision | 73.97343 | 72.21892 | 74.37916 |
| FPR | 10.49428 | 11.37083 | 10.29949 |
| FNR | 10.51863 | 11.32213 | 10.29949 |
| NPV | 96.23037 | 95.91568 | 96.31373 |
| FDR | 26.02657 | 27.78108 | 25.62084 |
| F1-Score | 80.99174 | 79.60656 | 81.3245 |
| MCC | 0.744661 | 0.725761 | 0.749205 |
Impact of feature extraction on the proposed model
Table 9 shows the impact of feature extraction on the proposed model. This analysis is performed with various feature extraction techniques, namely GloVe embeddings, Term Frequency-Inverse Document Frequency (TF-IDF), and Bidirectional Encoder Representations from Transformers (BERT), to showcase the efficacy of the developed framework without an external feature extraction stage. The accuracy of the designed IMPA-SCHADNet is 93.46%, whereas adding GloVe embeddings to the IMPA-SCHADNet yields 91.38% accuracy. Similarly, integrating BERT into the designed IMPA-SCHADNet attains 92.74% accuracy, lower than the standalone IMPA-SCHADNet. Thus, the results confirm that the Trans-Bi-LSTM in the designed IMPA-SCHADNet can effectively extract the significant features from the given input. These findings suggest that the internal feature extraction mechanism of the designed IMPA-SCHADNet is more effective for this classification task than relying on external techniques such as GloVe embeddings, TF-IDF, and BERT.
Table 9.
Impact of feature extraction on the proposed model.
| Models | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | FNR (%) | FPR (%) |
|---|---|---|---|---|---|---|
| TF-IDF+ IMPA-SCHADNet | 87.92 | 78.44 | 86.31 | 82.16 | 13.69 | 12.08 |
| GloVe+ IMPA-SCHADNet | 91.38 | 83.92 | 90.87 | 87.24 | 9.13 | 8.62 |
| BERT+ IMPA-SCHADNet | 92.74 | 86.15 | 92.08 | 89.01 | 7.92 | 7.26 |
| IMPA-SCHADNet | 93.46 | 87.92 | 93.47 | 90.61 | 6.53 | 6.54 |
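For reference, the TF-IDF baseline in Table 9 weights each term by its in-document frequency times its corpus rarity. A minimal sketch using one common smoothed-IDF variant (the tiny lyric corpus below is hypothetical):

```python
import math
from collections import Counter

def tf_idf(docs):
    """TF-IDF vectors for a tiny corpus: term frequency times smoothed
    inverse document frequency (one common variant among several)."""
    n = len(docs)
    df = Counter()                       # document frequency of each term
    for doc in docs:
        df.update(set(doc.split()))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}   # smoothed IDF
    vectors = []
    for doc in docs:
        tokens = doc.split()
        tf = Counter(tokens)
        vectors.append({t: (tf[t] / len(tokens)) * idf[t] for t in tf})
    return vectors

# Hypothetical three-document lyric corpus, for illustration
vecs = tf_idf(["love and peace", "love songs", "dark heavy songs"])
```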
Discussions
An effective lyrics text classification approach is implemented in this work by utilizing powerful deep learning techniques. Various performance measures are used to support the experimental analysis of the designed technique. A cost function experiment is also performed; it reported that the designed approach obtained very low cost function values, confirming higher convergence rates. Moreover, the performance examination of the developed lyrics text classification process is conducted on the first dataset against previous classifiers and algorithms, and on a second dataset in section "dataset-2-based analysis on the proposed lyrics text classification model". These experiments elucidated that the designed lyrics text classification process obtained superior solutions relative to the classical models. The overall comparative examination of the implemented process over existing techniques and algorithms across three data sources demonstrates improved performance rates, ensuring high efficiency in the classification process. The statistical experiment with the considered statistical measures confirms that the IMPA algorithm selects the optimal parameters more effectively than the other existing algorithms, providing detailed insight into the designed process. In addition, the ROC validation of the suggested lyrics text classification process confirms that the implemented approach attained much lower error rates than the classical models, offering outstanding solutions.
The developed model is further investigated by comparing the lyrics text classification process with the state-of-the-art models; this analysis found that the designed model outperforms them and provides highly accurate solutions. Finally, the performance verification of the designed model is computed on a third dataset against existing algorithms and classifiers. These experimental results confirm that the implemented lyrics text classification process is more effective than the baseline models.
Conclusion
This paper provided a lyrics text classification approach that utilized deep learning to classify lyrics text based on mood, genre, sentiment, and performer. Essential textual information was first acquired from standard internet sources and then passed through the text pre-processing step. Following that, SCHADNet was used to classify the pre-processed text. Parameters such as the hidden neurons and epochs of Trans-Bi-LSTM and GRU were tuned using the proposed IMPA algorithm to enhance accuracy and sensitivity while reducing the FNR and FPR. Finally, the developed SCHADNet model provided the text-classified results. To demonstrate the efficacy of the proposed model, an empirical evaluation was conducted against a variety of traditional methods. In this evaluation, the precision of the developed model was 41.22% higher than LSTM, 18.95% higher than Trans-Bi-LSTM, 38.82% higher than GRU, and 13.46% higher than Trans-Bi-LSTM-GRU when analyzing the ReLU function. The mean of the developed lyrics text classification model is 7.61% better than EOO, 7.48% better than EVO, 6.96% better than PO, and 6.74% better than MPA. The experimental validations confirmed that the proposed lyrics text classification process outperformed the traditional techniques and provided more effective solutions. The designed lyrics text classification process has practical implications such as mood-aided analysis, research and academia, music recommendation systems, and artist and genre analysis.
Limitations of the developed model
The main limitation of the developed SCHADNet system is its computational complexity, which arises from combining several deep learning components (Transformer, Bi-LSTM, and GRU) in a serial cascaded structure. While effective for extracting contextual and sequential connections, this design demands greater computational resources, longer training time, and more memory than simpler architectures. Furthermore, because the system learns end-to-end without predefined feature extraction, it requires a considerable amount of training data to attain optimal generalization.
Future scope
In future work, strategies like transfer learning, self-supervised learning, and advanced data augmentation will be introduced to minimize the reliance on vast amounts of data and improve the system's capability to generalize.
Acknowledgements
I would like to express my very great appreciation to the co-authors of this manuscript for their valuable and constructive suggestions during the planning and development of this research work.
Author contributions
All authors have made substantial contributions to conception and design, revising the manuscript, and the final approval of the version to be published. Also, all authors agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Data availability
Dataset 1: The data underlying this article are available in https://www.kaggle.com/datasets/mateibejan/multilingual-lyrics-for-genre-classification?select=train.csv. Access date: 2024-01-02. Dataset 2: The data underlying this article are available in https://github.com/wojtek11530/song_lyric_classification/tree/master/datasets. Access date: 2024-01-03.
Declarations
Competing interest
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1.Furner, M., Islam, M. Z. & Li, C. T. Knowledge discovery and visualisation framework using machine learning for music information retrieval from broadcast radio data. Expert Syst. Appl.182, 115236 (2021). [Google Scholar]
- 2.Hizlisoy, S., Yildirim, S. & Tufekci, Z. Music emotion recognition using convolutional long short term memory deep neural networks. Eng. Sci. Technol. Int J.24(3), 760–767 (2021). [Google Scholar]
- 3.Wang, C. & Ko, Y. C. Emotional representation of music in multi-source data by the internet of things and deep learning. J. Supercomput.79(1), 349–366 (2023). [Google Scholar]
- 4.Jena, K. K., Bhoi, S. K., Mohapatra, S. & Bakshi, S. A hybrid deep learning approach for classification of music genres using wavelet and spectrogram analysis. Neural Comput. Appl.35(1), 11223–11248 (2023). [Google Scholar]
- 5.Khattak, A., Asghar, M. Z., Khalid, H. A. & Ahmad, H. Emotion classification in poetry text using deep neural network. Multimed. Tools Appl.81(18), 26223–26244 (2022). [Google Scholar]
- 6.Yang, L., Shen, Z., Zeng, J., Luo, X. & Lin, H. COSMIC: music emotion recognition combining structure analysis and modal interaction. Multimed. Tools Appl.83 (5), 1–16 (2023). [Google Scholar]
- 7.Dong, L. Using deep learning and genetic algorithms for melody generation and optimization in music. Soft Comput.27(1), 17419–17433 (2023). [Google Scholar]
- 8.Sarkar, R., Choudhury, S., Dutta, S., Roy, A. & Saha, S. K. Recognition of emotion in music based on deep convolutional neural network. Multimed. Tools Appl.79, 765–783 (2020). [Google Scholar]
- 9.Policicchio, V. L., Pietramala, A. & Rullo, P. GAMoN: discovering M-of-N ¬,∨ hypotheses for text classification by a lattice-based genetic algorithm. Artif. Intell.191, 61–95 (2012). [Google Scholar]
- 10.Dwiyani, L. K. D., Suarjaya, I. M. A. D. & Rusjayanthi, N. K. D. Classification of explicit songs based on lyrics using random forest algorithm. J. Inform. Syst. Inform.5, 550–567 (2023). [Google Scholar]
- 11.Du, J. Sentiment analysis and lyrics theme recognition of music lyrics based on natural language processing. J. Electr. Syst.20, 315–321 (2024). [Google Scholar]
- 12.Xie, C. et al. Music genre classification based on res-gated CNN and attention mechanism. Multimed. Tools Appl.83(5), 13527–13542 (2024). [Google Scholar]
- 13.Jandaghian, M., Setayeshi, S., Razzazi, F. & Sharifi, A. Music emotion recognition based on a modified brain emotional learning model. Multimed. Tools Appl.82(4), 26037–26061 (2023). [Google Scholar]
- 14.Rajan, R. & Nithin, S. K. Folk music structural segment classification using GRU-based hierarchical attention network. Sādhanā48(4), 254 (2023). [Google Scholar]
- 15.Hongdan, W., SalmiJamali, S., Zhengping, C., Qiaojuan, S. & Le, R. An intelligent music genre analysis using feature extraction and classification using deep learning techniques. Comput. Electr. Eng.100, 107978 (2022). [Google Scholar]
- 16.Sujeesha, A. S., Mala, J. B. & Rajan, R. Automatic music mood classification using multi-modal attention framework. Eng. Appl. Artif. Intell.128, 107355 (2024). [Google Scholar]
- 17.da Silva, A. C. M., Coelho, M. A. N. & Neto, R. F. A music classification model based on metric learning applied to MP3 audio files. Expert Syst. Appl.144, 113071 (2020). [Google Scholar]
- 18.Baskara, A. R., Maulida, M., Lestiyanto, M. T. M., Sari, Y., Mustamin, N. F. & Wijaya, E. S. Explicit content classification in Indonesian song lyrics using the LSTM-CNN method. In 2024 Ninth International Conference on Informatics and Computing (ICIC) (2024).
- 19.Bonela, A. A., He, Z., Luxford, D.-A., Riordan, B. & Kuntsche, E. Development of the lyrics-based deep learning algorithm for identifying alcohol-related words (LYDIA). Alcohol Alcohol. 59(2) (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Bolla, B. K., Pattnaik, S. R. & Patra, S. Detection of objectionable song lyrics using weakly supervised learning and natural language processing techniques. Procedia Comput. Sci.235, 1929–1942 (2024). [Google Scholar]
- 21.Pasha, S. N., Ramesh, D., Mohmmad, S., Shabana, Kothandaraman, D. & Sravanthi, T. Song lyrics genre detection using RNN. AIP Conf. Proc. 2971(1) (2024).
- 22.Abdillah, J., Asror, I. & Wibowo, Y. F. A. Emotion classification of song lyrics using bidirectional lstm method with glove word representation weighting. J. RESTI (Rekayasa Sistem Dan Teknologi Informasi)4(4), 723–729 (2020). [Google Scholar]
- 23.Revathy, V. R., Pillai, A. S. & Daneshfar, F. LyEmoBERT: classification of lyrics’ emotion and recommendation using a pre-trained model. Procedia Comput. Sci.218, 1196–1208 (2023). [Google Scholar]
- 24.Jia, X. Music emotion classification method based on deep learning and improved attention mechanism. Comput. Intell. Neurosci.2022, 5181899 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Chen, X. et al. A novel approach for explicit song lyrics detection using machine and deep ensemble learning models. PeerJ Comput. Sci.9, e1469 (2023). [Google Scholar]
- 26.Li, Y., Zhang, Z., Ding, H. & Chang, L. Music genre classification based on fusing audio and lyric information. Multimed. Tools Appl.82(13), 20157–20176 (2023). [Google Scholar]
- 27.Delbouys, R., Hennequin, R., Piccoli, F., Royo-Letelier, J. & Moussallam, M. Music mood detection based on audio and lyrics with deep neural net. arXiv preprint (2018).
- 28.Almeida do Carmo, F., Figueira da Silva Junior, J. L., Geraldeli Rossi, R. & França Lobato, F. M. Text representations for lyric-based identification of musical subgenres. IEEE Latin Am. Trans. 21(6), 737–744 (2023).
- 29.Tsaptsinos, A. Lyrics-based music genre classification using a hierarchical attention network. arXiv preprint (2017).
- 30.Faramarzi, A., Heidarinejad, M., Mirjalili, S. & Gandomi, A. H. Marine predators algorithm: a nature-inspired metaheuristic. Expert Syst. Appl. 152, 113377 (2020). [Google Scholar]
- 31.Ye, H. et al. Web services classification based on wide & Bi-LSTM model. IEEE Access7, 43697–43706 (2019). [Google Scholar]
- 32.Naeem, A. et al. A novel combined densenet and gated recurrent unit approach to detect energy thefts in smart grids. IEEE Access11, 59496–59510 (2023). [Google Scholar]
- 33.Sun, J., Han, P., Cheng, Z., Wu, E. & Wang, W. Transformer based multi-grained attention network for aspect-based sentiment analysis. IEEE Access8, 211152–211163 (2020). [Google Scholar]
- 34.Alfarizi, M. I., Syafaah, L. & Lestandy, M. Emotional text classification using TF-IDF (Term frequency-inverse document frequency) And LSTM (Long short-term memory). J. Informatika10, 2 (2022). [Google Scholar]
- 35.Yu, P. & Fu, X. Classification and identification of emotion of non-foreign music based on TR-Bi-LSTM emotion analysis. Research Square (2023).
- 36.Jia, C. et al. State of health prediction of lithium-ion batteries based on bidirectional gated recurrent unit and transformer. Energy285, 129401 (2023). [Google Scholar]
- 37.Salim, A., Jummar, W. K., Jasim, F. M. & Yousif, M. Eurasian oystercatcher optimiser: new meta-heuristic algorithm. J. Intell. Syst.31(1), 332–344 (2022). [Google Scholar]
- 38.Azizi, M., Aickelin, U., Khorshidi, H. A. & Baghalzadeh Shishehgarkhaneh, M. Energy valley optimizer: a novel metaheuristic algorithm for global and engineering optimization. Sci. Rep.13, 226 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Askari, Q., Younas, I. & Saeed, M. Political optimizer: a novel socio-inspired meta-heuristic for global optimization. Knowl.-based Syst.195, 105709 (2020). [Google Scholar]
- 40.Wang, P. Electronic archive classification method based on convolutional neural network with fast text embeddings. In 2024 4th International Conference on Mobile Networks and Wireless Communications (ICMNWC) (2024).
- 41.Al-shathry, N. et al. Leveraging hybrid adaptive sine cosine algorithm with deep learning for Arabic poem meter detection. ACM Trans. Asian Low-Resour. Lang. Inf. Process. (2024).
- 42.Eswaraiah, P. & Hussain, S. A hybrid deep learning GRU based approach for text classification using Word embedding. EAI Endorsed Trans. Internet Things10, 1 (2023). [Google Scholar]
- 43.Rahayu, S. P., Afuan, L. & Yunindar, G. A. Implementation of text mining on song lyrics for song classification based on emotion using website-based logistic regression. J. Teknik Informatika (Jutif)6(1), 359–368 (2025). [Google Scholar]
- 44.Mehra, Ashman, Mehra, Aryan & Narang, Pratik. Classification and study of music genres with multimodal Spectro-Lyrical Embeddings for Music (SLEM). Multimed. Tools Appl.84(7), 3701–3721 (2025). [Google Scholar]