Linguistic analysis of OTL marketing
(A) The average length of OTL marketing abstracts and inventors' abstracts over time.
(B) The average fraction of adjectives in titles over time.
(C) The correlation between the occurrence of each adjective in the marketing abstract and net income rank. Only adjectives with p < 0.05 are shown. Font size indicates the frequency of the word. Text color indicates the correlation coefficient with net income rank after controlling for categories: red indicates a negative correlation, and blue indicates a positive correlation.
(D) Machine-learning classifiers that take the marketing abstracts as input to predict whether an invention's net income will be above the median net income of inventions from the same disclosure year. TF-IDF: a classifier using term frequency-inverse document frequency features. BERT: a state-of-the-art deep-learning text classifier that provides contextual features for each word. Category baseline: a classifier using only the category tags of each invention as input. Shown are receiver operating characteristic (ROC) curves on the hold-out test set. The TF-IDF classifier achieves an area under the ROC curve (AUROC) of 0.71 on the hold-out test set.
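As an illustration of the prediction target and the metric in (D), the following minimal sketch builds the binary label (net income above the median of the same disclosure year) and computes AUROC from classifier scores. It is not the authors' pipeline: the function names, the toy records, and the scores are hypothetical, and treating "above the median" as a strict inequality is an assumption.

```python
from collections import defaultdict
from statistics import median


def above_median_labels(records):
    """Label each (disclosure_year, net_income) record 1 if its income is
    strictly above the median income of its disclosure year, else 0.
    The strict inequality is an assumption, not stated in the source."""
    by_year = defaultdict(list)
    for year, income in records:
        by_year[year].append(income)
    medians = {year: median(values) for year, values in by_year.items()}
    return [1 if income > medians[year] else 0 for year, income in records]


def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: (disclosure_year, net_income) pairs.
records = [(2001, 10.0), (2001, 50.0), (2001, 30.0),
           (2002, 5.0), (2002, 8.0), (2002, 1.0)]
labels = above_median_labels(records)

# Hypothetical classifier scores, one per record.
scores = [0.2, 0.9, 0.4, 0.8, 0.7, 0.1]
print(labels, auroc(labels, scores))
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the 0.71 reported for the TF-IDF classifier.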