Developing a predictive model for anticipating technology convergence: A transformer-based model and supervised learning approach

Mokh Afifuddin; Wonchul Seo

doi:10.1371/journal.pone.0326417

2025 Jun 26;20(6):e0326417. doi: 10.1371/journal.pone.0326417

Editor: Ghulam Mustafa³

PMCID: PMC12200714 PMID: 40569982

Abstract

This study proposes a novel approach to anticipating technology convergence in the bio-healthcare sector by integrating text mining based on transformer models and supervised learning methodologies. The overarching goal is to develop a robust method for predicting technology convergence, leveraging the interrelationships between technology topics extracted from patents and research articles. Through the application of advanced techniques and by leveraging the strengths of transformer-based models such as BERTopic with KeyBERT and OpenAI integration to generate technology topics, we identified potential convergence opportunities and explored emerging trends within the dataset. The proposed method seeks to predict technology convergence effectively by employing various machine learning and deep learning techniques to train prediction models by integrating technological similarity, link prediction measures, and causal relationships between technology topics as input features, offering a more accurate and comprehensive understanding of the intricate relationships within the technological landscape. This study contributes to the literature on technology convergence by offering a novel methodology for anticipating future trends and identifying opportunities for interdisciplinary collaboration in the bio-healthcare sector. Overall, the outcomes of this study hold significant implications for businesses seeking to capitalize on emerging convergence opportunities for sustainable growth.

Introduction

Recent decades have seen a notable acceleration of technological progress, which has resulted in the regular appearance of new technologies that provide a vast array of technical opportunities. In recent years, the phenomenon of technological convergence has drawn considerable attention [1]. Indeed, numerous sectors give rise to newly emerging industries, providing prospects through technology innovation and convergence [2]. Technological convergence, a burgeoning trend in innovation across multiple sectors, involves the integration of at least two pre-existing technologies [3]. This fusion not only results in the creation of hybrid technologies but can also lead to the establishment of entirely new technological domains [4]. Convergence is thus the driving force behind some of the most transformative developments of our time.

Anticipating technology convergence is crucial for researchers, industry professionals, policymakers, and investors to stay ahead and seize new opportunities [5]. Understanding convergence patterns helps companies find new markets, design innovative products, and create new business models [6]. Policymakers can craft better policies to support emerging technologies. However, predicting technology convergence is complex, requiring an understanding of the relationships and evolution of different technologies. Traditional methods often fall short in capturing this dynamic process. Therefore, a more sophisticated approach is needed to identify potential convergence points among previously unrelated technologies, paving the way for innovation and progress.

Numerous models and methodologies have been developed to study technology convergence, leveraging diverse data sources such as patents [7–13], news articles [14], research articles [15], and Wikipedia hyperlinks [16]. These approaches provide critical insights into the dynamics of technology convergence across various domains. Methodologically, existing studies can be grouped into three main categories: patent citation or bibliographic [17–19], co-classification [3,9,10,12,20–22], and semantic-based methods [13,23–29]. Recently, text mining methods have become prominent for uncovering hidden relationships between technologies [13,25,26,30–36]. These techniques offer valuable insights into technology convergence, uncovering trends and relationships within the technology landscape [37,38]. Particularly, employing topics as the representation of documents has proven highly effective due to the comprehensive information they encapsulate. Technology topics, derived from the distribution of diverse keywords, offer a richer portrayal of documents when compared to other representations.

Despite the considerable value of text mining using topics, prior research predominantly relies on the conventional Latent Dirichlet Allocation (LDA) technique [39]. However, LDA may not capture complex semantic relationships. Therefore, this study seeks to improve on that by using transformer models for text mining to develop technology topics. This innovative approach harnesses the capabilities of transformer models, such as BERTopic [40], to derive richer and more contextually nuanced representations of technology topics, capturing semantic relationships better than traditional methods. Then, we will utilize two primary data sources: patents and research articles, to generate comprehensive technology topics. Next, we will use supervised learning with various machine and deep learning algorithms to train multiple prediction models. These models will leverage the connections between technology topics, incorporating measures like similarity, link prediction, and technological influence. Our goal is to create a strong method for predicting technology convergence, offering insights for R&D planning and competitive advantage. This study will provide valuable insights into emerging convergence opportunities in the evolving technology landscape.

The remainder of this paper is structured as follows. Section 2 reviews the relevant literature on technology convergence and transformer-based text mining approaches. Section 3 outlines the research methodology, including data collection, topic modeling using BERTopic, and feature extraction. Section 4 details the implementation of the supervised learning model for predicting convergence. Section 5 presents and discusses the results, with a comparison to prior studies. Finally, Section 6 concludes the paper and offers directions for future research.

Related work

Technology convergence

Technology convergence has been a subject of interest for many researchers and practitioners in recent years. Research on technology convergence originated in the 1960s with Rosenberg’s recognition of overlapping technologies across industries, linking convergence to industrial progress. Agarwal and Brem [41] subsequently defined it as the integration of multiple technologies to generate innovative products, services, or systems. The literature on technology convergence spans various approaches and methodologies, emphasizing the evolution and prediction of technological trends. Research on technology convergence can be categorized into two primary streams based on their objectives: those aimed at identifying historical convergence trends [15,18,42] and those focused on forecasting future convergence patterns [8,9,12,16,43,44].

Researchers have explored diverse data sources, including patents, news articles, and Wikipedia hyperlinks, to uncover patterns and trajectories of technology convergence. Patent data is a key result of technology research, and analyzing it helps explore detailed technology trends. Technology convergence can be determined by co-occurrences of patent classification codes in a patent, such as the International Patent Classification (IPC) codes, Cooperative Patent Classification (CPC), and United States Patent Classification (USPC) [5,7,10,21,22], or by examining relationships among patent classification codes through patent citations [17,18,43,45–48]. However, predicting convergence using patent citation and co-classification methods has challenges due to ambiguity in technology classification. This ambiguity can lead to imprecise categorization, as connections between patents or classifications may not always reflect true convergence [49]. To address these challenges, various strategies have been proposed to predict the emergence of new technology convergence. Recent studies have turned to advanced techniques like text mining or semantic analysis, network analysis, and machine learning techniques to identify technology patterns [8,12,13,20,23,50,51]. Text mining methods offer a more nuanced representation of technology topics by capturing intricate semantic relationships and contextual nuances. These approaches have shown promise in revealing hidden relationships between technologies and enhancing our understanding of technology convergence dynamics [13,25,27,30,32,33,38,42,52].

Text mining approach for technology convergence

As the landscape of innovation evolves, predicting technology convergence is crucial for strategic decision-making, innovation planning, and fostering interdisciplinary advancements. Text mining, leveraging natural language processing (NLP) techniques, has become a valuable tool for uncovering patterns and trends in large textual datasets [53–57], and has recently emerged as a pivotal tool for understanding and predicting technology convergence. Prior research has extensively explored text mining methodologies, including LDA, to uncover patterns of technological opportunities [11–13,24,42,58–60]. These studies have improved the performance of text-based classification and sentiment analysis tasks by utilizing advanced deep learning and word embedding techniques, and they emphasized the potential of contextual learning and semantic representation for obtaining significant patterns from unstructured texts. Additionally, by highlighting the adaptability and scalability of text mining techniques across a range of applications, some studies have combined optimization algorithms and molecular simulations to improve model accuracy and provide domain-specific insights. Kim and Sohn [23] and Feng [61] employed the document-to-vector (Doc2Vec) technique to generate vector representations for each technological domain, while Ma et al. [62] and Liu et al. [33] applied semantic analysis using Subject-Action-Object (SAO) structures and topic modeling to technology convergence in emerging fields, aiming to unveil latent semantic relationships among technology topics. Afifuddin and Seo [52] demonstrated the application of semantic analysis in text mining to uncover hidden relationships between technologies over time by applying Dynamic Topic Modeling (DTM).

However, previous approaches often face limitations in capturing nuanced semantic relationships and contextual intricacies within documents. BERTopic [40] addresses this by combining the strengths of transformer-based models and advanced topic modeling techniques. Using the BERT (Bidirectional Encoder Representations from Transformers) model [63], BERTopic extracts coherent topics from textual data by capturing semantic meaning at the sentence level. Unlike traditional probabilistic models, BERTopic leverages contextual embeddings from transformers for more nuanced topic extraction. This approach has exhibited promising outcomes in topic modeling across diverse domains, as evidenced by studies conducted by An et al. [64], and Jeon et al. [65]. Therefore, this study aims to contribute to this evolving field by proposing a novel methodology that combines text mining based on a transformer model with supervised learning. By examining the interrelationships between technology topics extracted from patents and research articles, the study seeks to advance predictive modeling for technology convergence, offering a more accurate and comprehensive understanding of technological relationships. The outcomes of this research hold significant implications for businesses seeking to capitalize on emerging convergence opportunities for sustainable growth.

Methodology

As illustrated in Fig 1, the research framework comprises five key stages: (1) collecting and pre-processing text data; (2) generating technology topics with a transformer-based model; (3) extracting features of technology topics from each period; (4) training models for technology convergence prediction; and (5) identifying potential technology convergence.

Collecting data and preprocessing

In the data collection stage of anticipating technology convergence, a crucial source involves gathering information from patent documents and research articles. Patents are valuable sources of technological advancements, offering insights into new inventions and emerging trends [10]. Research articles provide essential contextual information, explaining the scientific and technical aspects behind innovations [15]. Combining these sources enhances the predictive model’s ability to capture the dynamics of technology convergence, ensuring a strong foundation for anticipating future developments at the intersection of various technological domains [66]. In this study, we collect patent documents from the United States Patent and Trademark Office (USPTO) database, recognized for its extensive global coverage [67]. Research articles are collected from the SCOPUS database, a reputable and widely used platform for scholarly publications. We use a consistent query across both patents and research articles, focusing on the bio-healthcare domain. The collected textual data includes titles, abstracts, and claims from both patents and articles. This systematic approach ensures a comprehensive and inclusive retrieval of relevant information from both patents and articles. Table 1 shows a summary of data collection information.

Table 1. Overview of data gathering details.

Patent document database	https://www.wipson.com
Research paper database	https://www.scopus.com
Query formulation	“(bio AND healthcare) OR (bio-healthcare) OR (smart AND healthcare) OR (smart-healthcare) OR (digital AND healthcare) OR (digital-healthcare) OR (digital AND bio AND healthcare) OR (healthcare AND device) OR (healthcare-device)”
Publication date	2013-2021


The complexity and variability of language in patents and articles necessitate careful preparation. Raw text data is first subjected to a series of preprocessing techniques to standardize the corpus and enhance the quality of the extracted technology topics [68]. Preprocessing begins with the removal of irrelevant characters, special symbols, and formatting artifacts, ensuring a cleaner and more uniform text corpus.

Generating technology topics with a transformer-based model

We used advanced methods to create technology topics from a large dataset of patents and research articles. Our approach relied on transformer models predicated on the principle of self-attention mechanisms, which excel at understanding the context of words and capturing complex relationships. Making the most of transformer-based models’ potential, specifically BERTopic [40], our approach involved encoding and clustering textual information to identify coherent and representative technology topics.

The BERTopic process for generating technology topics involves several steps to extract meaningful insights from the text data, as shown in Fig 2. It starts by using BERT [63] to create embeddings, representing words in a contextualized manner. In this study, we use the DistilBERT model with the “distilbert-base-nli-mean-tokens” architecture, capturing the semantic representations of the input text. These embeddings are then reduced to a lower-dimensional space using Uniform Manifold Approximation and Projection (UMAP) [69] while preserving word relationships. HDBSCAN [70] is then applied to cluster these embeddings into coherent groups, forming our initial technology topics. To refine these clusters, CountVectorizer is used to represent documents as numerical vectors based on term frequency. Next, class-based Term Frequency-Inverse Document Frequency (c-TF-IDF) [40] is applied to weigh the importance of terms across the entire dataset, providing a comprehensive measure of term relevance within each technology topic. Through the utilization of c-TF-IDF, BERTopic identifies key terms and their significance, constructing high-density clusters for distinct technology topics. Finally, KeyBERT and OpenAI are applied to select representative terms for each topic. This systematic approach ensures that resulting topics not only capture semantic nuances but also offer clear representations of diverse technological domains.
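
To make the c-TF-IDF step more concrete, the following is a minimal pure-Python sketch and not the BERTopic implementation itself. It assumes the commonly cited weighting $W_{t,c} = tf_{t,c} \cdot \log(1 + A / tf_{t})$, where $tf_{t,c}$ is the frequency of term $t$ in topic $c$, $tf_{t}$ is its frequency across all topics, and $A$ is the average number of tokens per topic. The function names and the toy tokens are hypothetical.

```python
import math
from collections import Counter


def c_tf_idf(docs_by_topic):
    """Return {topic: {term: weight}} using a class-based TF-IDF weighting.

    docs_by_topic maps a topic id to the tokenized documents assigned to it,
    for example by a clustering step (hypothetical input format).
    """
    # All documents of one topic are merged into a single "class document".
    tf = {
        topic: Counter(token for doc in docs for token in doc)
        for topic, docs in docs_by_topic.items()
    }
    # Frequency of each term across all topics.
    total_tf = Counter()
    for counts in tf.values():
        total_tf.update(counts)
    # Average number of tokens per topic (A in the formula above).
    avg_tokens = sum(total_tf.values()) / len(tf)
    return {
        topic: {
            term: freq * math.log(1 + avg_tokens / total_tf[term])
            for term, freq in counts.items()
        }
        for topic, counts in tf.items()
    }


def top_terms(weights, k=3):
    """Return the k highest-weighted terms of one topic."""
    return sorted(weights, key=weights.get, reverse=True)[:k]


# Toy example with made-up tokens, for illustration only.
example = {
    "sensing": [["wearable", "sensor", "glucose"], ["sensor", "patch", "glucose"]],
    "imaging": [["mri", "image", "segmentation"], ["image", "sensor", "reconstruction"]],
}
example_weights = c_tf_idf(example)
```

In this toy corpus, "sensor" appears in both topics and is therefore down-weighted relative to "glucose", which is frequent in one topic only; this is the behavior the c-TF-IDF step relies on to pick topic-specific terms.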

Extracting features of technology topics from each period

This study develops supervised learning models to identify new opportunities for technology convergence. A crucial step is feature extraction, identifying key indicators of potential convergence between technology topics. We use three types of features: technological similarities, link prediction measures within topic networks, and causal relationships between technology topics. By analyzing these features, we aim to create a model that offers valuable insights into emerging trends and collaborative dynamics between different technological domains.

1) Measure of similarity analysis

The utilization of technological similarity between technology topics is a key aspect of our approach to understanding and predicting technological convergence. It measures how closely related different technology topics are, based on shared characteristics and functionalities. We use cosine similarity to quantify this, which compares vectors representing technology topics [71].

We encode technology topics into numerical vectors using methods like word embeddings, utilizing BERTopic’s document-topic distribution. We then calculate cosine similarity between these vectors, giving us similarity scores for both documents and words. By combining these scores, we get a comprehensive representation of technological similarity. A higher cosine similarity indicates a closer relationship between topics, suggesting a higher likelihood of convergence. We have $n$ topic vectors $T_{1}, T_{2}, \dots, T_{n}$. The similarity matrix $S$ is an $n \times n$ matrix where each element $s_{ij}$ represents the cosine similarity between topics $i$ and $j$. This approach helps identify potential convergence opportunities by revealing patterns and trends. The process is illustrated in Fig 3.
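
As a small illustration of the similarity matrix $S$, the sketch below computes $s_{ij} = \frac{T_{i} \cdot T_{j}}{\lVert T_{i} \rVert \, \lVert T_{j} \rVert}$ for a list of topic vectors. It is a pure-Python sketch under simplifying assumptions: the vectors are toy values rather than real topic embeddings, and returning 0 for a zero vector is a convention chosen here, not something stated in the study.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention assumed here for zero vectors
    return dot / (norm_u * norm_v)


def similarity_matrix(topic_vectors):
    """Build the n x n matrix S with S[i][j] = cosine similarity of topics i and j."""
    n = len(topic_vectors)
    return [
        [cosine_similarity(topic_vectors[i], topic_vectors[j]) for j in range(n)]
        for i in range(n)
    ]


# Three toy topic vectors (illustrative values only).
S = similarity_matrix([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

The matrix is symmetric with ones on the diagonal, so only the upper triangle needs to be kept when each topic pair is turned into a feature row.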

2) Link prediction index

We build a co-occurrence network using the topics and apply the link prediction measures to the network to calculate proximity values for potential technology connections, as shown in Fig 4. The link prediction index is another crucial tool applied to forecast the emergence of new connections between technology topics. This index evaluates the likelihood of a link forming between nodes in the network, predicting future collaborations or convergences [72]. By assessing the historical co-occurrences and relationships, we can anticipate the evolution of connections and identify areas primed for future technological convergence.

We represent the technological landscape as a network in which nodes represent distinct technology topics and edges signify potential connections or convergence between these topics, as shown in Fig 5. In this study, we employed a network-based link prediction approach, which includes various types of measures. These proximity indices quantify the relationships between nodes, assessing factors such as similarity, distinctiveness, and universality [44]. By integrating multiple measures, our approach aims to provide a comprehensive understanding of the dynamic interactions within the technological landscape, offering nuanced insights into the likelihood of convergence between diverse technology topics. Table 2 presents a concise summary of the proximity measures utilized in our network-based link prediction methodology, each contributing a distinct perspective to the analysis of technological relationships.

Table 2. Structural proximity indices used for link prediction, based on the set of neighboring nodes $\gamma(x)$ and the degree $k_{x}$ of each node $x$.

Measurement	Name	Definition	Reference
Technological similarity	Jaccard Coefficient (jc)	$S(x, y) = \frac{|\gamma(x) \cap \gamma(y)|}{|\gamma(x) \cup \gamma(y)|}$	[73]
	Common Neighbor (cn)	$S(x, y) = |\gamma(x) \cap \gamma(y)|$	[74]
	Leicht-Holme-Newman (lhn)	$S(x, y) = \frac{|\gamma(x) \cap \gamma(y)|}{k_{x} \times k_{y}}$	[75]
	Hub Depressed Index (hdi)	$S(x, y) = \frac{|\gamma(x) \cap \gamma(y)|}{\max(k_{x}, k_{y})}$	[76]
Technological distinctiveness	Adamic-Adar (aa)	$S(x, y) = \sum_{z \in \gamma(x) \cap \gamma(y)} \frac{1}{\log |\gamma(z)|}$	[77]
	Resource Allocation (ra)	$S(x, y) = \sum_{z \in \gamma(x) \cap \gamma(y)} \frac{1}{|\gamma(z)|}$	[78]
Technological universality	Preferential Attachment (pa)	$S(x, y) = |\gamma(x)| \times |\gamma(y)|$	[79]
	Katz Index (katz)	$S(x, y) = \sum_{l=1}^{\infty} \beta^{l} \, |\mathrm{paths}_{x,y}^{(l)}| = \sum_{l=1}^{\infty} \beta^{l} (A^{l})_{x,y}$	[80]
	Average Commute Time (act)	$S(x, y) = \frac{1}{m(x, y) + m(y, x)}$, where $m(x, y)$ represents the average number of steps taken by a random walker to reach $y$ starting from $x$	[81]
Technological nearness (quasi-local)	Local Path Index (lp)	$S = A^{2} + \epsilon A^{3}$, where $A$ denotes the adjacency matrix for nodes $x$ and $y$, and $\epsilon$ is a free parameter	[78]
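
The neighborhood-based indices in Table 2 can be sketched directly from their definitions. The code below is an illustrative pure-Python sketch, not the implementation used in this study: the edge list is a toy network, the function names are hypothetical, and the path-based and random-walk indices (katz, act, lp) are left out for brevity.

```python
import math
from itertools import combinations


def neighbors(edges):
    """Build {node: set of neighboring nodes} from an undirected edge list."""
    adj = {}
    for x, y in edges:
        adj.setdefault(x, set()).add(y)
        adj.setdefault(y, set()).add(x)
    return adj


def local_indices(adj, x, y):
    """Neighborhood-based proximity scores for the node pair (x, y)."""
    gx, gy = adj[x], adj[y]
    common, union = gx & gy, gx | gy
    kx, ky = len(gx), len(gy)
    return {
        "cn": len(common),
        "jc": len(common) / len(union) if union else 0.0,
        "lhn": len(common) / (kx * ky) if kx and ky else 0.0,
        "hdi": len(common) / max(kx, ky) if max(kx, ky) else 0.0,
        # A common neighbor of two distinct nodes has degree >= 2, so log > 0;
        # the guard only protects against degenerate input.
        "aa": sum(1 / math.log(len(adj[z])) for z in common if len(adj[z]) > 1),
        "ra": sum(1 / len(adj[z]) for z in common),
        "pa": kx * ky,
    }


def score_unlinked_pairs(edges):
    """Score every pair of nodes that is not yet connected."""
    adj = neighbors(edges)
    return {
        (x, y): local_indices(adj, x, y)
        for x, y in combinations(sorted(adj), 2)
        if y not in adj[x]
    }


# Toy topic network (made-up edges).
toy_edges = [("A", "C"), ("B", "C"), ("A", "D"), ("B", "D"), ("C", "E")]
toy_scores = score_unlinked_pairs(toy_edges)
```

In the toy network, topics A and B are not linked but share both of their neighbors, so their Jaccard coefficient is 1.0 and they would rank as a strong candidate pair under the similarity-type indices.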


3) Linkages of technological influences

Determining technological influence is a multi-step process involving rule mining, rule mining measures, and the application of the Decision-Making Trial and Evaluation Laboratory (DEMATEL) method for a comprehensive analysis that considers both direct and indirect effects [82]. First, rule mining extracts meaningful patterns and relationships between technology topics based on their connections or co-occurrences. This helps identify associations and interactions within the technological landscape. Next, rule mining measures quantify the strength and significance of these extracted rules by assessing frequency, confidence, and support, providing a quantitative basis for evaluating relationships between topics. Then, the DEMATEL technique is applied to analyze the influential effects of technology topics, considering both direct and indirect effects. The DEMATEL method involves the following steps:

Construct the direct-relation matrix $D$:

$$D = \begin{bmatrix} d_{11} & d_{12} & \dots & d_{1n} \\ d_{21} & d_{22} & \dots & d_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ d_{n1} & d_{n2} & \dots & d_{nn} \end{bmatrix}$$

(1)

Normalize the direct-relation matrix:

$$N = \frac{D}{\max\left(\sum_{i=1}^{n} d_{ij}, \sum_{j=1}^{n} d_{ij}\right)}$$

(2)

Calculate the total-relation matrix:

$$T = N (I - N)^{-1}$$

(3)

where $I$ is the identity matrix of the same size as $N$.

Determine the influential degree and relationship:

$$r_{i} = \sum_{j=1}^{n} t_{ij} \quad \text{(total influence exerted by factor } i\text{)}$$

(4)

$$c_{j} = \sum_{i=1}^{n} t_{ij} \quad \text{(total influence received by factor } j\text{)}$$

(5)

The result is a thorough and data-driven exploration of technological influence, aiding in the identification of key players and themes that shape the dynamics of technological convergence. Fig 6 illustrates the investigation of causal relationships between technology topics.
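
The DEMATEL steps in Eqs. (1) to (5) can be traced on a small matrix. The sketch below is pure Python and makes two assumptions that should be read as such: the normalizing constant in Eq. (2) is taken as the largest row or column sum of $D$ (the usual DEMATEL convention), and the total-relation matrix is taken in the standard form $T = N(I - N)^{-1}$. The direct-relation values are made up and the helper names are hypothetical.

```python
def normalize(D):
    """Divide D by its largest row or column sum (assumed reading of Eq. 2)."""
    n = len(D)
    row_sums = [sum(row) for row in D]
    col_sums = [sum(D[i][j] for i in range(n)) for j in range(n)]
    scale = max(max(row_sums), max(col_sums))
    return [[d / scale for d in row] for row in D]


def inverse(M):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(M)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(A, B):
    """Plain matrix product of two lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def dematel(D):
    """Return (T, r, c): total-relation matrix, exerted and received influence."""
    n = len(D)
    N = normalize(D)
    i_minus_n = [
        [(1.0 if i == j else 0.0) - N[i][j] for j in range(n)] for i in range(n)
    ]
    T = matmul(N, inverse(i_minus_n))
    r = [sum(row) for row in T]                             # Eq. 4: row sums
    c = [sum(T[i][j] for i in range(n)) for j in range(n)]  # Eq. 5: column sums
    return T, r, c


# Two-topic toy example: topic 0 influences topic 1 more strongly than the reverse.
T_toy, r_toy, c_toy = dematel([[0, 2], [1, 0]])
```

For this toy input the row sums are larger for topic 0 and the column sums are larger for topic 1, so topic 0 acts mainly as a cause and topic 1 mainly as an effect. Quantities of this kind would correspond to the cause and effect features used later as model inputs.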

These feature extraction processes contribute to the development of predictive models capable of navigating and predicting the evolving landscape of technology convergence, offering insights into emerging trends and collaborative opportunities.

Training classification model for technology convergence prediction

In training the classification model for predicting technology convergence, input features for our model encompass key metrics such as cosine similarity, link prediction index, and technological influence measures. The input features used in exploring technology convergence in this study are detailed in Table 3. Following the extraction of these features during period 1 and the observation of technology convergence in period 2, we applied them to train various prediction models. Due to the inherent variability in the ranges of values among these features, normalization is imperative during the model training process.

Table 3. Input features employed for training models to analyze technology convergence.

Type	Feature	Description
Similarity measure	Index_cs	Cosine similarity between technology topics, computed from the generated topic embeddings.
Link prediction	Index_cn Index_jc Index_pa Index_aa Index_ra Index_hdi Index_katz Index_lhn Index_ac Index_lp	The likelihood of a pair of technology topics being linked in the next period, as determined by a particular link prediction algorithm: common neighbor (cn), jaccard (jc), preferential attachment (pa), adamic-adar (aa), resource allocation (ra), hub depressed index (hdi), katz, leicht-holme-newman (lhn), average commute (ac), and local path index (lp).
Technological Influences	Index_cause Index_effect	The degree to which each technology topic influences all other topics. The level of influence that each technology topic receives from all others.
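
Because the features in Table 3 live on very different scales, some form of rescaling is needed before training. The study does not state which normalization was used, so the sketch below assumes simple min-max scaling to the range [0, 1]; the function name is hypothetical, and the sample values are the summary statistics (minimum, quartiles, maximum) reported for cn and jc in Table 6, used here only as stand-in data.

```python
def min_max_scale(columns):
    """Rescale each feature column to [0, 1]; constant columns map to 0.0."""
    scaled = {}
    for name, values in columns.items():
        lo, hi = min(values), max(values)
        span = hi - lo
        scaled[name] = [(v - lo) / span if span else 0.0 for v in values]
    return scaled


# Stand-in feature columns on very different scales.
features = {
    "cn": [0.0, 12.0, 22.0, 31.0, 53.0],
    "jc": [0.0, 0.1240, 0.2323, 0.3367, 0.8036],
}
scaled_features = min_max_scale(features)
```

In a real pipeline the minimum and maximum would normally be estimated on the training period only and then reused for the test period, so that no information from the later period leaks into training.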


The model is trained using various machine learning or deep learning approaches, including Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), Naïve Bayes (NB), Gradient Boosting (GB), XGBoost (XGB), LightGBM (LGBM), and Deep Neural Network (DNN). Algorithms are selected to effectively capture patterns and relationships within the multidimensional input feature space. The training process involves optimizing the model parameters to enhance its predictive capabilities. To ensure the robustness and generalizability of the model, rigorous validation and evaluation procedures are implemented. The model is tested on a separate dataset to assess its performance in predicting future technological convergence and identifying influential topics. Evaluation metrics such as precision, recall, and F1 score provide quantitative measures of the model’s accuracy. Given the highly formalized nature of the input features, we expect fundamental machine learning and deep learning techniques to be adept at navigating the complexities associated with technology convergence. The model aims to provide actionable insights for navigating the ever-evolving landscape of technological convergence.
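
For reference, the evaluation metrics named above follow directly from the counts of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN): precision is TP / (TP + FP), recall is TP / (TP + FN), and the F1 score is their harmonic mean. The sketch below computes them for binary labels; the function name and the toy labels are hypothetical, and returning 0 when a denominator is empty is a convention assumed here.

```python
def classification_metrics(y_true, y_pred):
    """Precision, recall, F1, and accuracy for binary labels (True = converged)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Toy labels: three converged pairs, two non-converged pairs.
toy_metrics = classification_metrics([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```

On imbalanced data, precision and recall of the minority class are usually more informative than accuracy, which can look high even when the rare class is predicted poorly.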

Identifying potential technology convergence

In this study, we aim to predict future technology convergence beyond period 3. We evaluate the performance of different machine learning models using data from period 2 to predict whether convergence occurred in period 3. We assess the models using metrics like precision, recall, F1 score, and accuracy to see how well they predict convergence. By comparing these models, we gain insights into their effectiveness. Then, we analyze the predicted convergence instances to understand future convergence patterns and identify emerging trends. This analysis helps decision-making and strategic planning in various industries.
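
The period-to-period setup described above can be sketched as a small labeling routine: candidate pairs are topic pairs that are not linked in one period, and each candidate is labeled by whether the pair is linked in the following period. This is an illustrative sketch with hypothetical names and toy topics, not the study's code; feature extraction for each pair is omitted.

```python
from itertools import combinations


def build_labeled_pairs(topics, links_current, links_next):
    """Label each pair that is unlinked now by whether it is linked next period."""
    linked_now = {frozenset(edge) for edge in links_current}
    linked_next = {frozenset(edge) for edge in links_next}
    labeled = []
    for a, b in combinations(sorted(topics), 2):
        pair = frozenset((a, b))
        if pair in linked_now:
            continue  # already converged, so not a prediction target
        labeled.append(((a, b), pair in linked_next))
    return labeled


# Toy example: t1-t2 is already linked; t1-t3 becomes linked in the next period.
toy_pairs = build_labeled_pairs(
    topics=["t1", "t2", "t3"],
    links_current=[("t1", "t2")],
    links_next=[("t2", "t1"), ("t1", "t3")],
)
```

Applying the same routine to consecutive periods yields a training set from the earlier pair of periods and a test set from the later pair, which keeps the evaluation strictly forward in time.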

Result and analysis

Data exploration

The data were collected in the bio-healthcare field from January 2013 to December 2021 and include patent documents sourced from the USPTO and research articles from SCOPUS publications. The number of patents is 3,812, and the number of research articles is 4,931. In total, 8,743 documents were collected and formed the basis of our dataset for analysis and exploration. The yearly distribution of documents by type is shown in Fig 7. We combine all documents and divide them into three periods based on time intervals, and each period is analyzed separately to observe technology topics within the evolving technological landscape. The number of documents per year and period is shown in Table 4. The first period (2013–2015) had 1,725 documents; the second period (2016–2018) had 2,555 documents; and the third period (2019–2021) had 4,463 documents.

Table 4. The number of all documents based on time interval.

Period	Grant Year	Number of documents
Period 1	2013	535
	2014	588
	2015	602
Period 2	2016	731
	2017	871
	2018	953
Period 3	2019	1,246
	2020	1,526
	2021	1,691
Total		8,743


Cleaning the text data includes tasks such as removing special characters, punctuation, and irrelevant symbols to ensure a consistent and standardized format. Additionally, common text preprocessing techniques like lowercasing and stemming may be applied to enhance uniformity. Once the text is cleaned, the extraction phase involves identifying and isolating key information.

In our study, we assumed textual data could effectively represent the technology landscape. Topic modeling generates topics based on word distribution in these texts. Fig 8 shows word count distribution: blue bars indicate document counts within each range, while red and green dashed lines represent the median and mean word counts, respectively. This highlights the importance of preprocessing to extract meaningful insights from the technological landscape.

Analysis of topic extraction and interpretation for generating technology topic

After the exploration and preprocessing of the text data, the BERTopic process is initiated to uncover latent topics within the technological landscape. BERTopic, a topic modeling technique based on transformer architecture, is applied to generate clusters of words that represent distinct technology topics. The resulting document-topic distribution reveals the likelihood or proportion of each document belonging to these identified topics. This distribution is then analyzed to extract meaningful insights into the prevalent themes and patterns within the dataset. By delving into the output of BERTopic, we gain a comprehensive understanding of how technology topics are interrelated and how they manifest in the corpus of text data.

The BERTopic model provides information about the frequency and characteristics of each identified topic. Fig 9 illustrates the distribution of document counts across different topics, excluding any potential outlier topics. In this study, we also analyze the term scores within each topic. Fig 10 reveals insightful patterns in the distribution of c-TF-IDF scores across terms within topics. Across the majority of topics, there is a notable decline in c-TF-IDF scores from the top-ranked term to the third-ranked term. Beyond the third rank, the scores gradually flatten, indicating a diminishing rate of decrease as the rank increases.

Finally, we use KeyBERT and OpenAI to choose representative terms for each topic and to generate well-crafted labels for our topics. By integrating prompt generation with OpenAI, BERTopic not only clusters related documents into meaningful topics but also assigns them polished, informative labels. Additionally, to enhance the accessibility and comprehension of the identified technology topics, we leverage visualizations derived from the BERTopic output. Fig 11 shows a plot to visualize the embeddings generated by BERT after dimensionality reduction, with each data point labeled according to its corresponding category. This visualization helps us understand the main themes of each technology topic.

Result of prediction model

By utilizing the extracted features, including similarity, link prediction, and technological influence measures, our objective is to construct a model that can offer valuable insights into emerging trends and collaborative interactions among diverse technological domains. The features derived from the identified topics serve as inputs for a machine learning algorithm, specifically a classification model. Trained on labeled data (Fig 12) consisting of pairs of technology topics that are either connected or not connected in the subsequent period, the classification model aims to discern patterns indicative of technology convergence.

We created nine individual classification models, each with hyperparameters tuned for predicting new technology convergence. Using data from period 1, the models are trained to anticipate technological convergence in the following period. The trained models are then assessed on a test dataset comprising pairs of technology topics for which no convergence was found in period 2; for these pairs, the models predict whether convergence occurs in period 3. The numbers of pairs used for training and testing are shown in Table 5. Notably, the dataset is class-imbalanced: as Table 5 shows, the two classes differ greatly in size. Therefore, we apply the Synthetic Minority Over-sampling Technique (SMOTE), which balances a dataset by generating synthetic minority-class samples through interpolation between existing minority samples and their nearest neighbors [83]. Table 6 shows descriptive statistics for the input features, summarizing their central tendencies and overall distribution. The relative importance of each feature used in the predictive model is detailed in Supporting information S1 Fig, which highlights the individual contributions of the features to forecasting technology convergence.
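The core idea of SMOTE [83] can be sketched in a few lines: each synthetic sample lies on the line segment between a minority-class sample and one of its nearest minority-class neighbors. The sketch below is a simplified, standard-library illustration; the function name smote, the parameter values, and the toy points are assumptions, and the implementation and settings actually used in the study are not stated here. For the training split in Table 5 (3,966 versus 162 pairs), balancing the classes 1:1 would require 3,966 − 162 = 3,804 synthetic samples of the smaller class.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points from a list of minority-class points.

    Each synthetic point is a random interpolation between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by squared Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical two-feature minority samples (corners of the unit square).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = smote(minority, n_new=6)
print(len(synthetic), "synthetic samples generated")
```

Because every synthetic point is an interpolation between two existing minority points, the new samples stay inside the region already occupied by the minority class rather than duplicating existing rows.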

Table 5. The number of pairs between technology topics to be used as training and test datasets.

Pairs not linked in period 1 (Training input feature)	Pairs linked in period 2 (Y = True)	Pairs not linked in period 2 (Y = False)
4,128	3,966	162
Pairs not linked in period 2 (Test input feature)	Pairs linked in period 3 (Y = True)	Pairs not linked in period 3 (Y = False)
2,016	1,953	63

Table 6. Descriptive statistics of the input features in period 1.

Index	Min.	Std.	Q1	Mean	Median	Q3	Max
cn	0.0000	12.6545	12.0000	21.4006	22.0000	31.0000	53.0000
jc	0.0000	0.1520	0.1240	0.2398	0.2323	0.3367	0.8036
pa	864.0000	807.7528	2520.0000	3096.7790	3080.0000	3678.7500	5229.0000
aa	0.0000	3.0163	2.7474	5.0514	5.1097	7.2501	12.6199
ra	0.0000	0.1932	0.1616	0.3117	0.3089	0.4525	0.8170
hdi	123.8333	7.8045	150.3333	155.4796	156.1667	161.1667	173.0000
lhn	0.0000	0.0122	0.0000	0.0108	0.0088	0.0182	0.0909
katz	−0.4659	0.1192	−0.1178	−0.0375	−0.0302	0.0517	0.2855
ac	0.1639	0.0143	0.1888	0.1982	0.1967	0.2032	0.2651
lp	0.0100	0.3528	0.5700	0.8433	0.8400	1.0400	2.1800
cs	0.0037	0.0048	0.0128	0.0163	0.0165	0.0194	0.0314
cause	0.0000	0.0008	0.0001	0.0006	0.0003	0.0008	0.0061
effect	0.0000	0.0012	0.0002	0.0010	0.0005	0.0012	0.0088
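Several of the link prediction features in Table 6 can be illustrated on a toy network. The sketch below assumes that cn, jc, pa, aa, and ra denote the standard common neighbors, Jaccard coefficient, preferential attachment, Adamic-Adar, and resource allocation indices [73,77–79], computed here on an unweighted graph. The function name link_features and the toy graph are hypothetical, and the study's network may be weighted or otherwise constructed differently.

```python
import math

def link_features(adj, u, v):
    """Neighbourhood-based link prediction indices for the node pair (u, v).

    adj maps each node to the set of its neighbours (undirected, unweighted).
    """
    nu, nv = adj[u], adj[v]
    common = nu & nv
    union = nu | nv
    return {
        "cn": len(common),                                 # common neighbours
        "jc": len(common) / len(union) if union else 0.0,  # Jaccard coefficient
        "pa": len(nu) * len(nv),                           # preferential attachment
        # Adamic-Adar: common neighbours weighted by 1 / log(degree).
        "aa": sum(1 / math.log(len(adj[z])) for z in common if len(adj[z]) > 1),
        # Resource allocation: common neighbours weighted by 1 / degree.
        "ra": sum(1 / len(adj[z]) for z in common),
    }

# Hypothetical toy topic network.
edges = [("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D"), ("B", "E")]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

features = link_features(adj, "A", "B")
print(features)
```

Here the unlinked pair (A, B) shares the neighbours C and D, so all five indices are positive; a pair with no shared neighbours would score zero on every index except preferential attachment.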

To gauge the accuracy of the models, their predictions are compared with the actual convergence observed in period 3. Table 7 and Fig 13 report the performance measures on the test data for the nine individual models and the voting classifier. The algorithms differ in accuracy, precision, recall, and F1 score, which reveals the strengths and weaknesses of the individual classification models and the overall predictive capability of the voting classifier.

Table 7. Performance of the individual classification models and the voting classifier.

Algorithm	Accuracy	Precision	Recall	F1 score	AUC
LR	0.779	1.000	0.773	0.872	0.933
SVM	0.863	0.988	0.870	0.925	0.891
RF	0.935	0.979	0.954	0.966	0.929
GB	0.873	0.994	0.875	0.930	0.923
XGB	0.945	0.984	0.959	0.971	0.899
LGBM	0.940	0.986	0.951	0.968	0.921
CB	0.943	0.984	0.956	0.970	0.915
KNN	0.905	0.991	0.910	0.949	0.813
DNN	0.975	0.977	0.997	0.987	0.939
Voting	0.950	0.989	0.959	0.974	0.938
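As a quick consistency check on Table 7, the F1 score can be recomputed from the reported precision and recall using F1 = 2PR / (P + R). The sketch below copies the precision, recall, and F1 values from the table; the tolerance of 0.0015 is an assumption that allows for rounding of the published three-decimal figures.

```python
# (precision, recall, reported F1) copied from Table 7.
TABLE_7 = {
    "LR": (1.000, 0.773, 0.872),
    "SVM": (0.988, 0.870, 0.925),
    "RF": (0.979, 0.954, 0.966),
    "GB": (0.994, 0.875, 0.930),
    "XGB": (0.984, 0.959, 0.971),
    "LGBM": (0.986, 0.951, 0.968),
    "CB": (0.984, 0.956, 0.970),
    "KNN": (0.991, 0.910, 0.949),
    "DNN": (0.977, 0.997, 0.987),
    "Voting": (0.989, 0.959, 0.974),
}

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Largest absolute gap between the recomputed and the reported F1 score.
max_gap = max(abs(f1_score(p, r) - reported) for p, r, reported in TABLE_7.values())

for name, (p, r, reported) in TABLE_7.items():
    print(f"{name}: recomputed F1 = {f1_score(p, r):.3f}, reported = {reported:.3f}")
```

Under this tolerance, every recomputed value agrees with the reported F1 score.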

Most models perform well: seven of the ten classifiers achieve an accuracy of at least 0.90, and all reach a precision of at least 0.977. The voting classifier performs consistently well across all metrics, showcasing its robustness and effectiveness in combining the strengths of the individual models. The highest AUC values are observed for the DNN (0.939), the voting classifier (0.938), LR (0.933), and RF (0.929), indicating strong overall predictive capabilities. The choice of the best model may depend on the specific requirements of the application. However, the DNN stands out as an optimal choice: it attains the highest accuracy (0.975), recall (0.997), F1 score (0.987), and AUC (0.939), although its precision (0.977) is the lowest among the models.
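How a voting classifier combines individual models can be sketched with two common rules. Which rule the study used is not stated in this section, so both variants below are assumptions: soft voting averages the predicted probabilities, whereas hard voting takes the majority of the individual class decisions. The function names and the probabilities are hypothetical illustration values, not results from the study.

```python
def soft_vote(probabilities, threshold=0.5):
    """Average the models' predicted probabilities, then apply the threshold."""
    mean = sum(probabilities) / len(probabilities)
    return mean, mean >= threshold

def hard_vote(probabilities, threshold=0.5):
    """Let each model cast one vote and return the strict-majority decision."""
    votes = sum(p >= threshold for p in probabilities)
    return votes * 2 > len(probabilities)

# Hypothetical convergence probabilities from five models for one topic pair.
probs = [0.9, 0.4, 0.45, 0.8, 0.3]

mean_prob, soft_decision = soft_vote(probs)
hard_decision = hard_vote(probs)
print(f"soft voting: mean = {mean_prob:.2f}, converges = {soft_decision}")
print(f"hard voting: converges = {hard_decision}")
```

In this example the two rules disagree: two confident models pull the average above 0.5, while only two of the five models individually vote for convergence. This is one reason the choice of voting rule can matter for borderline pairs.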

Discussion

The objective of this study was to predict future technology convergence by analyzing patent and article data from period 3 using supervised learning. This approach aligns with the rapid pace of technological advancements. In the scope of our investigation, we specifically focused on forecasting technology convergence trends during period 4. This timeframe was selected to encapsulate significant trends and patterns in the evolution of technology topics, ensuring the relevance and timeliness of our predictions. The findings indicated that among the 101 labeled technology topics, certain pairs exhibited DNN probabilities equal to or exceeding 0.5. These high-probability pairs, highlighted in Table 8, demonstrate notable associations, suggesting promising areas of convergence. The topics involved in these pairs include T12 (augmented reality ophthalmic displays), which integrates augmented reality (AR) technology into ophthalmic displays; T4 (cardiac health), which focuses on monitoring and treating heart conditions; T36 (holographic cell imaging), which uses holographic microscopy techniques to capture high-resolution images of cells and specimens; T9 (sensor technology and monitoring); and T6 (orthopedic surgery and recovery).

Table 8. Potential technology convergence.

Topic	Description	Topic	Description	Technology implication
T36	Holographic Cell Imaging	T6	Orthopedic Surgery & Recovery	The combination of holographic cell imaging, sensor technology, and orthopedic surgery and recovery could greatly boost patient involvement and confidence during rehab. By using holographic visuals and sensor feedback in tailored rehab plans, patients can actively track their progress and goals, which can enhance motivation and exercise adherence. This increased engagement may speed up recovery and improve overall function. Additionally, providing real-time feedback and virtual guidance through tele-rehab platforms can widen access to care, making it easier for patients to stay on track even outside the doctor’s office.
T9	Sensor Technology and Monitoring	T6	Orthopedic Surgery & Recovery
T12	Augmented Reality Ophthalmic Display	T4	Cardiac Health & Intervention	Combining these technologies could bring new solutions to enhance heart care, diagnostics, and treatments. “Real-time cardiac visualization with AR ophthalmic display” means creating special AR displays for heart surgeons and cardiologists. These displays can show detailed views of the heart, like its chambers and blood vessels, in real-time. They can also overlay heart imaging data onto the patient’s anatomy during surgery planning.
T89	Liquid Sample Analysis Technology	T39	Hearing Aid Solutions	The convergence of liquid sample analysis technology with hearing aid solutions has the potential to transform traditional hearing aids into multifunctional health monitoring devices that not only improve hearing but also provide valuable insights into the wearer’s overall health status. This technology could enhance early disease detection, promote proactive healthcare management, and empower individuals to take control of their well-being.
T12	Augmented Reality Ophthalmic Display	T65	Smoking Cessation Technology	The convergence of augmented reality ophthalmic display with smoking cessation technology has the potential to revolutionize smoking cessation efforts by providing personalized support, real-time health visualization, interactive resources, and virtual coaching. This technology could empower individuals to quit smoking, improve their overall health, and reduce the burden of tobacco-related diseases on society.

The potential convergence between T12 (AR ophthalmic display) and T51 (breast cancer screening) presents a compelling opportunity in medical imaging and diagnostics. This convergence could lead to “AR-assisted breast cancer screening” [84]. Future advancements may involve integrating augmented reality technology into breast cancer screening processes, allowing radiologists and healthcare professionals to visualize and interact with mammographic images in real time using AR headsets or display devices. AR overlays could provide enhanced visualization of breast tissue structures, lesions, and abnormalities detected during screening mammograms, enabling more accurate interpretation and diagnosis. AR-guided tools and features could facilitate interactive analysis of mammographic images, allowing radiologists to annotate findings, measure tumor dimensions, and navigate through 3D reconstructions of breast anatomy for comprehensive assessment. This convergence has the potential to change how breast cancer is screened, diagnosed, and managed, offering opportunities for improved accuracy, efficiency, and patient-centered care in breast healthcare.

Combining T36 (holographic cell imaging), T9 (sensor technology), and T6 (orthopedic surgery) can advance personalized orthopedic care. Wearable sensors (T9) could monitor patients’ movement, joint mobility, and physiological parameters before, during, and after orthopedic surgery and rehabilitation (T6), providing patients and healthcare providers with real-time feedback on range of motion, muscle strength, gait, and vital signs. Holographic cell imaging (T36) could be used to create detailed 3D models of musculoskeletal structures, helping surgeons visualize patient-specific anatomy for precise surgical planning. AI-driven algorithms could then analyze the sensor data and holographic images to create personalized rehabilitation protocols tailored to each patient’s needs and progress, while holographic overlays could provide real-time feedback during rehabilitation to help patients perform exercises correctly and avoid injury.

Conclusion

Technology convergence involves creating new technologies by combining innovations from different fields. Anticipating this convergence is vital for driving innovation and gaining a competitive edge. This study presents a method for predicting technology convergence in the bio-healthcare sector. By combining text mining based on transformer models and supervised learning, we analyze patents and research articles to find convergence opportunities and future trends. Previous methods often missed the nuanced relationships in documents, limiting their insights into technology convergence. Our approach uses advanced techniques, like BERTopic for topic modeling, to identify potential convergence opportunities and emerging trends. We integrate technological similarity, link prediction, and causal relationships between technology topics to train machine learning and deep learning models. A voting classifier combines these models, improving performance over previous methods. This approach enhances our understanding of technology convergence, advancing predictive modeling in technological innovation.

Our analysis revealed promising convergence prospects across various technology topics, including augmented reality in ophthalmic displays, holographic cell imaging, sensor technology, and cardiac health. These findings highlight how multidisciplinary collaboration and technology integration can transform future healthcare innovation. Our results demonstrate that the transformer-based model effectively identifies nuanced and semantically rich technology topics from both patent and article data. Compared to earlier studies that relied on traditional topic modeling methods such as Latent Dirichlet Allocation (LDA) [45] or semantic analysis methods like SAO [62], our approach yielded more contextually coherent topic clusters and higher prediction performance in supervised learning. For instance, while Kim and Sohn [23] used Doc2Vec to predict convergence and improved accuracy through vector combination with bibliometric indicators, our model surpassed this by integrating topic networks and link prediction features derived directly from transformer embeddings. Additionally, compared to Giordano et al. [38], which used text and dynamic network analysis to measure convergence in defense patents, our method emphasizes topic-level convergence prediction and includes both patent and research article data for broader contextual understanding. These enhancements underscore the potential of transformer-based models in capturing the evolving and interconnected nature of emerging technologies. This comprehensive strategy not only refines predictive capabilities but also provides a nuanced understanding of relationships within the technological landscape. The research is expected to help organizations with R&D planning by providing them the ability to seize new opportunities from the convergence of specific technology topics and secure strategic competitive advantages for sustainable growth.

However, it is important to recognize the limitations of our study, including the reliance on historical data and the potential biases inherent in predictive modeling techniques. Furthermore, it is difficult to predict future convergence patterns with accuracy due to the dynamic nature of technology growth. To improve the precision and dependability of convergence forecasts, future studies in this field could explore advanced machine learning approaches and integrate real-time data sources. Studies that follow the development of technology topics over time could also provide valuable insights into the dynamics of convergence trends and their implications for innovation and industry competitiveness.

Overall, our study contributes to the growing body of literature on technology convergence by offering a novel methodology for anticipating future trends and identifying opportunities for interdisciplinary collaboration in the bio-healthcare sector. By addressing the challenges and limitations outlined in this study, we can continue to advance our understanding of technological convergence and drive transformative innovations in healthcare and beyond.

Supporting information

S1 Fig. Feature importance.

(TIF)

pone.0326417.s001.tif (186.3 KB, tif)

S1 Data. Feature values for model training.

(XLSX)

pone.0326417.s002.xlsx (618.8 KB, xlsx)

S2 Data. Feature values for model testing.

(XLSX)

pone.0326417.s003.xlsx (295.8 KB, xlsx)

S3 Data. Feature values for future predictions.

(XLSX)

pone.0326417.s004.xlsx (282.8 KB, xlsx)

Data Availability

All relevant data are within the paper and its Supporting Information files.

Funding Statement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. RS-2023-00250585). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Clauss T. Measuring business model innovation: conceptualization, scale development, and proof of performance. R&D Manage. 2016;47(3):385–403. doi: 10.1111/radm.12186
2. Geum Y, Kim M-S, Lee S. How industrial convergence happens: a taxonomical approach based on empirical evidences. Technol Forecast Soc Change. 2016;107:112–20. doi: 10.1016/j.techfore.2016.03.020
3. Yun J, Geum Y. Analysing the dynamics of technological convergence using a co-classification approach: a case of healthcare services. Technol Anal Strateg Manag. 2019;31(12):1412–29. doi: 10.1080/09537325.2019.1616082
4. Park H, Anderson TR, Seo W. Regional innovation capability from a technology-oriented perspective: An analysis at industry level. Comput Ind. 2021;129:103441. doi: 10.1016/j.compind.2021.103441
5. Song CH, Elvers D, Leker J. Anticipation of converging technology areas – a refined approach for the identification of attractive fields of innovation. Technol Forecast Soc Change. 2017;116:98–115. doi: 10.1016/j.techfore.2016.11.001
6. Sick N, Preschitschek N, Leker J, Bröring S. A new framework to assess industry convergence in high technology environments. Technovation. 2019;84–85:48–58. doi: 10.1016/j.technovation.2018.08.001
7. Lee WS, Han EJ, Sohn SY. Predicting the pattern of technology convergence using big-data technology on large-scale triadic patents. Technol Forecast Soc Change. 2015;100:317–29. doi: 10.1016/j.techfore.2015.07.022
8. Lee C, Hong S, Kim J. Anticipating multi-technology convergence: a machine learning approach using patent information. Scientometrics. 2021;126(3):1867–96. doi: 10.1007/s11192-020-03842-6
9. Kwon O, An Y, Kim M, Lee C. Anticipating technology-driven industry convergence: evidence from large-scale patent analysis. Technol Anal Strateg Manag. 2019;32(4):363–78. doi: 10.1080/09537325.2019.1661374
10. Caviggioli F. Technology fusion: Identification and analysis of the drivers of technology convergence using patent data. Technovation. 2016;55–56:22–32. doi: 10.1016/j.technovation.2016.04.003
11. Choi H, Oh S, Choi S, Yoon J. Innovation topic analysis of technology: the case of augmented reality patents. IEEE Access. 2018;6:16119–37. doi: 10.1109/access.2018.2807622
12. Choi S, Afifuddin M, Seo W. A supervised learning-based approach to anticipating potential technology convergence. IEEE Access. 2022;10:19284–300. doi: 10.1109/access.2022.3151870
13. Feng S, An H, Li H, Qi Y, Wang Z, Guan Q, et al. The technology convergence of electric vehicles: exploring promising and potential technology convergence relationships and topics. J Clean Prod. 2020;260:120992. doi: 10.1016/j.jclepro.2020.120992
14. Kim N, Lee H, Kim W, Lee H, Suh JH. Dynamic patterns of industry convergence: evidence from a large amount of unstructured data. Res Policy. 2015;44(9):1734–48. doi: 10.1016/j.respol.2015.02.001
15. Kose T, Sakata I. Identifying technology convergence in the field of robotics research. Technol Forecast Soc Change. 2019;146:751–66. doi: 10.1016/j.techfore.2018.09.005
16. Kim J, Kim S, Lee C. Anticipating technological convergence: link prediction using Wikipedia hyperlinks. Technovation. 2019;79:25–34. doi: 10.1016/j.technovation.2018.06.008
17. Rodriguez A, Tosyali A, Kim B, Choi J, Lee J-M, Coh B-Y, et al. Patent clustering and outlier ranking methodologies for attributed patent citation networks for technology opportunity discovery. IEEE Trans Eng Manage. 2016;63(4):426–37. doi: 10.1109/tem.2016.2580619
18. Karvonen M, Kässi T. Patent citations as a tool for analysing the early stages of convergence. Technol Forecast Soc Change. 2013;80(6):1094–107. doi: 10.1016/j.techfore.2012.05.006
19. Geum Y, Kim C, Lee S, Kim MS. Technological convergence of IT and BT: evidence from patent analysis. ETRI J. 2012;34(3):439–49. doi: 10.4218/etrij.12.1711.0010
20. Wang J, Lee J-J. Predicting and analyzing technology convergence for exploring technological opportunities in the smart health industry. Comput Ind Eng. 2023;182:109352. doi: 10.1016/j.cie.2023.109352
21. Luan C, Deng S, Porter AL, Song B. An approach to construct technological convergence networks across different IPC hierarchies and identify key technology fields. IEEE Trans Eng Manage. 2024;71:346–58. doi: 10.1109/tem.2021.3120709
22. Tang Y, Lou X, Chen Z, Zhang C. A study on dynamic patterns of technology convergence with IPC co-occurrence-based analysis: the case of 3D printing. Sustainability. 2020;12(7):2655. doi: 10.3390/su12072655
23. Kim TS, Sohn SY. Machine-learning-based deep semantic analysis approach for forecasting new technology convergence. Technol Forecast Soc Change. 2020;157:120095. doi: 10.1016/j.techfore.2020.120095
24. Hu R, Ma W, Lin W, Chen X, Zhong Z, Zeng C. Technology topic identification and trend prediction of new energy vehicle using LDA modeling. Complexity. 2022;2022(1). doi: 10.1155/2022/9373911
25. Yun S, Cho W, Kim C, Lee S. Technological trend mining: identifying new technology opportunities using patent semantic analysis. Inf Process Manag. 2022;59(4):102993. doi: 10.1016/j.ipm.2022.102993
26. Seo W. A patent-based approach to identifying potential technology opportunities realizable from a firm’s internal capabilities. Comput Ind Eng. 2022;171:108395. doi: 10.1016/j.cie.2022.108395
27. Zhu C, Motohashi K. Identifying the technology convergence using patent text information: a graph convolutional networks (GCN)-based approach. Technol Forecast Soc Change. 2022;176:121477. doi: 10.1016/j.techfore.2022.121477
28. Yun S, Cho W, Kim C, Lee S. Technological trend mining: identifying new technology opportunities using patent semantic analysis. Inf Process Manag. 2022;59(4):102993. doi: 10.1016/j.ipm.2022.102993
29. Han Y-L, Yin H-H, Li C, Du J, He Y, Guan Y-X. Discovery of new pentapeptide inhibitors against amyloid-β aggregation using Word2Vec and molecular simulation. ACS Chem Neurosci. 2025;16(6):1055–65. doi: 10.1021/acschemneuro.4c00661
30. Kim S, Yoon B. Patent infringement analysis using a text mining technique based on SAO structure. Comput Ind. 2021;125:103379. doi: 10.1016/j.compind.2020.103379
31. Durmuşoğlu A, Durmuşoğlu ZDU. Remembering medical ventilators and masks in the days of COVID-19: patenting in the last decade in respiratory technologies. IEEE Trans Eng Manage. 2024;71:1359–73. doi: 10.1109/tem.2022.3151636
32. Lee M, Kim S, Kim H, Lee J. Technology opportunity discovery using deep learning-based text mining and a knowledge graph. Technol Forecast Soc Change. 2022;180:121718. doi: 10.1016/j.techfore.2022.121718
33. Liu Z, Feng J, Uden L. Technology opportunity analysis using hierarchical semantic networks and dual link prediction. Technovation. 2023;128:102872. doi: 10.1016/j.technovation.2023.102872
34. Wang J, Zhang Z, Feng L, Lin K-Y, Liu P. Development of technology opportunity analysis based on technology landscape by extending technology elements with BERT and TRIZ. Technol Forecast Soc Change. 2023;191:122481. doi: 10.1016/j.techfore.2023.122481
35. Kim KH, Han YJ, Lee S, Cho SW, Lee C. Text mining for patent analysis to forecast emerging technologies in wireless power transfer. Sustainability. 2019;11(22):6240. doi: 10.3390/su11226240
36. Zhao X, Zhang X, Emmanuel A. Research and demonstration of technology opportunity identification model based on text classification and core patents. Comput Ind Eng. 2022;171:108403. doi: 10.1016/j.cie.2022.108403
37. Seo W, Yoon J, Park H, Coh B, Lee J-M, Kwon O-J. Product opportunity identification based on internal capabilities using text mining and association rule mining. Technol Forecast Soc Change. 2016;105:94–104. doi: 10.1016/j.techfore.2016.01.011
38. Giordano V, Chiarello F, Melluso N, Fantoni G, Bonaccorsi A. Text and dynamic network analysis for measuring technological convergence: a case study on defense patent data. IEEE Trans Eng Manag. 2021.
39. Blei DM, Ng AY, Jordan MI. Latent Dirichlet allocation. J Mach Learn Res. 2003;3:993–1022. doi: 10.1017/9781009218245.012
40. Grootendorst M. BERTopic: Neural topic modeling with a class-based TF-IDF procedure. 2022.
41. Agarwal N, Brem A. Strategic business transformation through technology convergence: implications from General Electric’s industrial internet initiative. Int J Technol Manag. 2015;67(2/3/4):196. doi: 10.1504/ijtm.2015.068224
42. Song B, Suh Y. Identifying convergence fields and technologies for industrial safety: LDA-based network analysis. Technol Forecast Soc Change. 2019;138:115–26. doi: 10.1016/j.techfore.2018.08.013
43. Kim J, Lee S. Forecasting and identifying multi-technology convergence based on patent data: the case of IT and BT industries in 2020. Scientometrics. 2017;111(1):47–65. doi: 10.1007/s11192-017-2275-4
44. Hong S, Lee C. Effective indexes and classification algorithms for supervised link prediction approach to anticipating technology convergence: a comparative study. IEEE Trans Eng Manage. 2023;70(4):1430–41. doi: 10.1109/tem.2021.3098602
45. Park I, Yoon B. Technological opportunity discovery for technological convergence based on the prediction of technology knowledge flow in a citation network. J Informetr. 2018;12(4):1199–222. doi: 10.1016/j.joi.2018.09.007
46. Daim T, Lai KK, Yalcin H, Alsoubie F, Kumar V. Forecasting technological positioning through technology knowledge redundancy: patent citation analysis of IoT, cybersecurity, and Blockchain. Technol Forecast Soc Change. 2020;161:120329. doi: 10.1016/j.techfore.2020.120329
47. Park Y, Yoon B, Lee S. The idiosyncrasy and dynamism of technological innovation across industries: patent citation analysis. Technol Soc. 2005;27(4):471–85. doi: 10.1016/j.techsoc.2005.08.003
48. Jung S, Kim K, Lee C. The nature of ICT in technology convergence: a knowledge-based network analysis. PLoS ONE. 2021;16(7):e0254424. doi: 10.1371/journal.pone.0254424
49. Lee C, Kogler DF, Lee D. Capturing information on technology convergence, international collaboration, and knowledge flow from patent documents: a case of information and communication technology. Inf Process Manag. 2019;56(4):1576–91. doi: 10.1016/j.ipm.2018.09.007
50. Seo W, Afifuddin M. Developing a supervised learning model for anticipating potential technology convergence between technology topics. Technol Forecast Soc Change. 2024;203:123352. doi: 10.1016/j.techfore.2024.123352
51. Ren H, Zhang L, Wang Q. A general methodology for technology opportunity discovery based on opportunity evaluation and optimization. IEEE Trans Eng Manage. 2024;71:6725–40. doi: 10.1109/tem.2023.3262257
52. Afifuddin M, Seo W. Predictive modeling for technology convergence: a patent data-driven approach through technology topic networks. Comput Ind Eng. 2024;188:109909. doi: 10.1016/j.cie.2024.109909
53. Tseng Y-H, Lin C-J, Lin Y-I. Text mining techniques for patent analysis. Inf Process Manag. 2007;43(5):1216–47. doi: 10.1016/j.ipm.2006.11.011
54. Mustafa G, Rauf A, Al-Shamayleh AS, Sulaiman M, Alrawagfeh W, Afzal MT, et al. Optimizing document classification: unleashing the power of genetic algorithms. IEEE Access. 2023;11:83136–49. doi: 10.1109/access.2023.3292248
55. Mustafa G, Usman M, Yu L, Afzal MT, Sulaiman M, Shahid A. Multi-label classification of research articles using Word2Vec and identification of similarity threshold. Sci Rep. 2021;11(1):21900. doi: 10.1038/s41598-021-01460-7
56. Kadu S, Joshi B. Text-based sentiment analysis using deep learning techniques. Studies in Big Data. vol. 113. 2022. doi: 10.1007/978-3-031-10869-3_5
57. Rakshit P, Sarkar A. A supervised deep learning-based sentiment analysis by the implementation of Word2Vec and GloVe Embedding techniques. Multimed Tools Appl. 2024;84(2):979–1012. doi: 10.1007/s11042-024-19045-7
58. Feng L, Zhao W, Wang J, Feng J, Guo Y. Combining machine learning with a pharmaceutical technology roadmap to analyze technological innovation opportunities. Comput Ind Eng. 2023;176:108974. doi: 10.1016/j.cie.2022.108974
59. Ghaffari M, Aliahmadi A, Khalkhali A, Zakery A, Daim TU, Yalcin H. Topic-based technology mapping using patent data analysis: a case study of vehicle tires. Technol Forecast Soc Change. 2023;193:122576. doi: 10.1016/j.techfore.2023.122576
60. Kim C, Lee H. A patent-based approach for the identification of technology-based service opportunities. Comput Ind Eng. 2020;144:106464. doi: 10.1016/j.cie.2020.106464
61. Feng S. The proximity of ideas: An analysis of patent text using machine learning. PLoS One. 2020;15(7):e0234880. doi: 10.1371/journal.pone.0234880
62. Ma T, Zhou X, Liu J, Lou Z, Hua Z, Wang R. Combining topic modeling and SAO semantic analysis to identify technological opportunities of emerging technologies. Technol Forecast Soc Change. 2021;173:121159. doi: 10.1016/j.techfore.2021.121159
63. Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. 2019.
64. An Y, Oh H, Lee J. Marketing insights from reviews using topic modeling with BERTopic and deep clustering network. Appl Sci. 2023;13(16):9443. doi: 10.3390/app13169443
65. Jeon E, Yoon N, Sohn SY. Exploring new digital therapeutics technologies for psychiatric disorders using BERTopic and PatentSBERTa. Technol Forecast Soc Change. 2023;186:122130. doi: 10.1016/j.techfore.2022.122130
66. Huang M-H, Yang H-W, Chen D-Z. Increasing science and technology linkage in fuel cells: a cross citation analysis of papers and patents. J Informetr. 2015;9(2):237–49. doi: 10.1016/j.joi.2015.02.001
67. Graham SJH, Hancock G, Marco AC, Myers AF. The USPTO Trademark Case Files Dataset: descriptions, lessons, and insights. Econ Manag Strategy. 2013;22(4):669–705. doi: 10.1111/jems.12035
68. Vijayarani S, Ilamathi MJ, Nithya M. Preprocessing techniques for text mining-an overview. Int J Comput Sci Commun Netw. 2015;5:7–16.
69. McInnes L, Healy J, Melville J. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426; 2018.
70. McInnes L, Healy J, Astels S. hdbscan: Hierarchical density based clustering. J Open Source Softw. 2017;2(11):205. doi: 10.21105/joss.00205
71. Li B, Han L. Distance Weighted Cosine Similarity Measure for Text Classification. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). vol. 8206 LNCS. 2013. doi: 10.1007/978-3-642-41278-3_74
72. Han J, Jeon B, Geum Y. Link prediction revisited: new approach for anticipating new innovation chances using technology convergence. IEEE Trans Eng Manage. 2024;71:5143–59. doi: 10.1109/tem.2022.3213867
73. Jaccard P. Distribution de la flore alpine dans le bassin des Dranses et dans quelques régions voisines. Bull Soc Vaudoise Sci Nat. 1901;37:241–72.
74. Newman ME. Clustering and preferential attachment in growing networks. Phys Rev E Stat Nonlin Soft Matter Phys. 2001;64(2 Pt 2):025102. doi: 10.1103/PhysRevE.64.025102
75. Leicht EA, Holme P, Newman MEJ. Vertex similarity in networks. Phys Rev E Stat Nonlin Soft Matter Phys. 2006;73(2 Pt 2):026120. doi: 10.1103/PhysRevE.73.026120
76. Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabási AL. Hierarchical organization of modularity in metabolic networks. Science. 2002;297:1551–5.
77. Adamic LA, Adar E. Friends and neighbors on the Web. Soc Netw. 2003;25(3):211–30. doi: 10.1016/s0378-8733(03)00009-1
78. Zhou T, Lü L, Zhang YC. Predicting missing links via local information. Eur Phys J B. 2009;71:623–30.
79. Barabasi A, Albert R. Emergence of scaling in random networks. Science. 1999;286(5439):509–12. doi: 10.1126/science.286.5439.509
80. Katz L. A new status index derived from sociometric analysis. Psychometrika. 1953;18(1):39–43. doi: 10.1007/bf02289026
81. Klein DJ, Randić M. Resistance distance. J Math Chem. 1993;12(1):81–95. doi: 10.1007/bf01164627
82. Si S-L, You X-Y, Liu H-C, Zhang P. DEMATEL technique: a systematic review of the state-of-the-art literature on methodologies and applications. Math Probl Eng. 2018;2018:1–33. doi: 10.1155/2018/3696457
83. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res. 2002;16:321–57. doi: 10.1613/jair.953
84. Costa N, Ferreira L, de Araújo ARVF, Oliveira B, Torres HR, Morais P, et al. Augmented reality-assisted ultrasound breast biopsy. Sensors (Basel). 2023;23(4):1838. doi: 10.3390/s23041838

Associated Data

Supplementary Materials

S1 Fig. Feature importance. (TIF; pone.0326417.s001.tif, 186.3 KB)

S1 Data. Feature values for model training. (XLSX; pone.0326417.s002.xlsx, 618.8 KB)

S2 Data. Feature values for model testing. (XLSX; pone.0326417.s003.xlsx, 295.8 KB)

S3 Data. Feature values for future predictions. (XLSX; pone.0326417.s004.xlsx, 282.8 KB)

Data Availability Statement

All relevant data are within the paper and its Supporting Information files.
