A Neural Network-Inspired Approach for Improved and True Movie Recommendations

Muhammad Ibrahim; Imran Sarwar Bajwa; Riaz Ul-Amin; Bakhtiar Kasi

doi:10.1155/2019/4589060

2019 Aug 4;2019:4589060. doi: 10.1155/2019/4589060

Muhammad Ibrahim¹, Imran Sarwar Bajwa¹ ✉, Riaz Ul-Amin², Bakhtiar Kasi²

PMCID: PMC6701398 PMID: 31467517

Abstract

In the last decade, sentiment analysis, opinion mining, and subjectivity analysis of microblogs in social media have attracted a great deal of research attention. Movie recommendation systems are tools that provide valuable services to users. The volume of data available online is growing steadily because the online activity of users and viewers increases day by day. As a result, big data, analytics, and computational issues have arisen. Recommendation services must therefore improve on traditional ones to make recommendation systems significant and efficient. This article addresses these issues by producing significant and efficient recommendation services using multivariates (ratings, votes, Twitter likes, and reviews) of movies from multiple external resources. These data are fetched by a web bot and managed in a distributed manner by the Apache Hadoop framework. Reviews are analyzed by a deep semantic analyzer based on the recurrent neural network (RNN/LSTM attention) with user movie attention (UMA) to produce the emotion. The proposed recommender evaluates the multivariates and efficiently produces a more significant movie recommendation list on a mobile app, tailored to the user's taste.

1. Introduction

“Recommendation systems” are services that use Artificial Intelligence (AI) and Natural Language Processing (NLP) techniques to provide empirical recommendation solutions for various application frameworks and services [1]. Recommendation systems enable mobile apps and web applications to make intelligent inferences about the selection of items such as movies [2], hotels [3], food [4], tourism [5], books [6], TV shows [7], YouTube videos [8], and health [9]. Community interest increasingly centers on music, movies, and videos. A huge amount of such streaming content is available online, but predicting which items a user will watch remains an open question. Music and movie recommendation systems still face challenges such as playlist generation, scale, security, privacy, recommendation quality, and session handling. Music recommendation systems (MRSs) have therefore become a domain of music information retrieval (MIR) [10–13]. Society has also changed. Community trends now depend heavily on mobile app usage, and many products are enriched through mobile apps. Mobile app recommendation systems are therefore essential for the suitable selection of recommended items [14–16]. Most recommender systems are univariate, using ratings, reviews, or tweets [17], and a few are bivariate (sentiment score and likes) [18–20]. In contrast, this work uses a multivariate matrix and a dynamic decision approach to suggest movies according to the relative taste of each user. The term “multivariate” means involving many variables: a qualitative variable (semantic score) and quantitative variables (Twitter likes, ratings, and votes) of movies collected from three movie sites for significant recommendation [21]. Our work addresses polarity classification of movie reviews. An opinionated document (microblog text or review) is labeled with semantic emotions [22] using a semantic parser based on the recurrent neural network (RNN/LSTM) [23, 24]. 
One drawback is that a change in a user's review of a movie may affect that user's preference. The nature of a review is influenced by its choice of words, which is handled with multilingual dictionaries. Some recommendation systems use linked movie databases, including Trovacinema, Google Places, and Netflix, while Wikipedia provides linked data and ontologies describing movies [25–27]. Shallow machine learning models for NLP problems rely on handcrafted features, which are time-consuming to build. Today, word embeddings and neural models are successful and popular because they produce better results than traditional machine learning methods such as logistic regression, SVM, and KNN.

Artificial neural networks (ANNs) are mathematical models inspired by biological neural networks. They typically have three kinds of layers (input, hidden, and output), or sometimes only two (input and output). The input layer is connected to the hidden layer through weighted connections. Each hidden unit combines its weighted inputs through an activation function, h = ϕ(Σ_i w_i x_i). As in a biological neural network, neurons are the nodes of an ANN and synapses are the edges. Each artificial neuron has an activation function. Common choices include the following:

- the sigmoid function, which ranges from 0 to 1;
- the hyperbolic tangent (tanh), which ranges from −1 to 1;
- the softmax function, whose output is a categorical distribution;
- the rectified linear unit (ReLU), widely used in feedforward neural networks.

The ANN is not a single algorithm but a framework in which many machine learning algorithms work together to solve complex tasks. In short, it is a network of neurons (https://en.wikipedia.org/wiki/Artificial_neural_network). The recurrent neural network (RNN/LSTM) processes a sequence step by step while preserving its semantics, and it is one of the basic architectures of deep neural networks. Many NLP tasks are performed by RNNs/LSTMs with attention. In this work, we used a hierarchical neural network (HNN) based on LSTM attention. It incorporates global user and movie information through word- and sentence-level attention to build the document representation. The user's reviews and the movie features at the word and sentence levels are used for semantic analysis of reviews, which plays a major role in producing true recommendations. Global user information represents personal behavior. Movie features represent a movie's genre, profile, or linked data, which are useful for semantic extraction from movie reviews [28]. In natural language (a word sequence), each word or sentence is related to others and must be understood semantically. 
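To make the neuron formula and the activation functions above concrete, here is a minimal NumPy sketch of a single artificial neuron, h = ϕ(Σ_i w_i x_i + b). The bias term b, the inputs, and the weights are illustrative assumptions, not values from this work.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid; output lies in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Rectified linear unit: max(0, z)."""
    return np.maximum(0.0, z)

def softmax(z):
    """Turn a score vector into a categorical distribution."""
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

def neuron(x, w, b, phi):
    """Single neuron: weighted sum of inputs plus bias, passed through phi."""
    return phi(np.dot(w, x) + b)

# Hypothetical inputs and weights (illustration only); weighted sum is -0.5.
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.4, 0.3, -0.2])

h_sigmoid = neuron(x, w, 0.0, sigmoid)  # in (0, 1)
h_tanh = neuron(x, w, 0.0, np.tanh)     # in (-1, 1)
h_relu = neuron(x, w, 0.0, relu)        # 0, because the weighted sum is negative
probs = softmax(np.array([2.0, 1.0, 0.1]))
```

The same weighted sum feeds every activation. Only the nonlinearity ϕ changes the output range.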
A huge amount of web content (ratings, reviews, likes, votes, smileys, images, and stars) is available online and can be fetched by a web bot (also called a web agent or crawler). This content is useful for recommendation services. Evaluating it yields insights into users and items, which are then used to make recommendations to others [29, 30]. Big data issues such as computational complexity are managed with MapReduce and Apache Mahout in a distributed NoSQL environment [31, 32]. This environment reduces computational complexity through clustering and horizontal scaling instead of relying on a single, more powerful machine [33]. Because the number of users and the volume of data keep increasing, these data are difficult to manage on a single machine. Sparsity can be reduced by factorization [34]. Movie recommendation systems serve users through content-based filtering algorithms [35], collaborative filtering [36], and hybrid filtering algorithms that combine the two [37]. We used implicit ratings, managed by the server, to handle the cold start problem [38]. The multivariate movie recommender suggests movies to users according to their profile or history (movies previously watched or rated). There is therefore a need to improve recommendation systems so that they provide significant recommendation services. We developed a pilot version addressing these problems. It consists of a mobile app, a web scraper, and a multivariate recommender, which together provide significant and efficient movie recommendation services.
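The factorization method used against sparsity is not detailed here. As an illustration only, the sketch below factorizes a small user-movie rating matrix R ≈ PQᵀ by gradient descent on the observed entries, which is one common way to fill sparse cells. The matrix, rank, learning rate, and regularization strength are all hypothetical choices.

```python
import numpy as np

def factorize(R, mask, rank=2, lr=0.01, reg=0.01, epochs=3000, seed=0):
    """Approximate R by P @ Q.T using only the observed cells (mask == 1)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = 0.1 * rng.standard_normal((n_users, rank))  # user factors
    Q = 0.1 * rng.standard_normal((n_items, rank))  # movie factors
    for _ in range(epochs):
        E = mask * (R - P @ Q.T)     # error on observed cells only
        step_P = E @ Q - reg * P     # negative gradient w.r.t. P
        step_Q = E.T @ P - reg * Q   # negative gradient w.r.t. Q
        P += lr * step_P
        Q += lr * step_Q
    return P, Q

def observed_rmse(R, mask, R_hat):
    """Root-mean-square error over the observed cells."""
    return float(np.sqrt(np.sum(mask * (R - R_hat) ** 2) / mask.sum()))

# Hypothetical 4 users x 5 movies; 0 marks a missing rating.
R = np.array([[5, 3, 0, 1, 4],
              [4, 0, 0, 1, 4],
              [1, 1, 0, 5, 2],
              [0, 1, 5, 4, 0]], dtype=float)
mask = (R > 0).astype(float)

P, Q = factorize(R, mask)
R_hat = P @ Q.T  # dense predictions, including the missing cells
```

The factors have far fewer parameters than a dense matrix, and their product supplies estimates for every empty cell.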

The rest of this paper is organized as follows. Related work is discussed in Section 2, and the recurrent multivariate movie recommendation system model is explained in Section 3. Its implementation is given in Section 4, experiments and results are discussed in Section 5, and the system is evaluated in Section 6. Section 7 concludes the paper, and Section 8 outlines future work with more parameters.

2. Literature Review

Sentiment analysis deals with users' comments, reviews, likes, ratings, etc. to retrieve their sentiments and opinions. Sentiment analysis of microblog text relies on NLP methods. It has been applied to retrieving suitable YouTube videos and movies, smoking cessation campaigns, pharmacovigilance, election politics, pizza advertising, journalistic inquiry, and influenza prevention for public health [39–45]. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are the two major categories of deep neural networks (DNNs). RNNs handle sequential structure, while CNNs handle hierarchical structure. Both can be supervised, semisupervised, or unsupervised, and deep learning algorithms also involve propagation and weight-update steps. RNNs consist of input, hidden, and output layers, while CNNs have input, hidden, and pooling layers. The CNN is efficient for pattern recognition in hierarchical data classification. The RNN, in contrast, handles sequential data that must be semantically analyzed and classified in NLP. Because the CNN's window size is limited, the RNN is more useful when microblog reviews are long [46, 47]. Recommender systems were presented as agents of the second class, characterized as systems that enable individuals to make decisions based on the opinions of other individuals [48]. Early information-sharing systems belonged to the first class. They relied on text-based classification or filtering, which selects relevant items according to a set of textual keywords [49]. Recommender systems propose items of interest to users based on several sources: their explicit and implicit preferences, the preferences of other users, and user and item attributes [50]. A recommendation system finds the right product for a customer's taste by filtering items according to a similarity (likeness) value [51]. 
Recommendations use the opinions of a community of users to help individuals in that community identify content of interest more effectively from a potentially overwhelming set of choices [52]. Demographic recommendation groups users according to the attributes of their personal profiles and generates recommendations based on demographic classes. An early example is the stereotype-based Grundy system, which was built to support book search in a library [53]. Utility-based recommendation relies on computing the utility of each item for a user through a utility function (http://www.eqo.info). Knowledge-based recommendation suggests items based on logical inferences about a user's preferences. It requires a knowledge representation, such as a rule describing how an item meets a specific user requirement (http://www.findme.com.ph). Preference-based collaborative filtering aims to predict users' relative preferences for items, even though some users may express their views only implicitly [54]. There are two types of architecture for recommendation systems. One is centralized and situated at a single location [55]. The other is geographically distributed across different locations [56]. A recommender system can be initiated in three modes. In push mode, suggestions are pushed to the user, for example by email, while the user is not interacting with the system [48]. In pull mode, suggestions are generated but displayed to the user only when the user allows or explicitly requests them [57]. Push and pull modes are the active modes of initiation. In passive mode, suggestions are generated as part of the system's regular activity, for instance, an item suggestion based on the user's preferences [58]. 
A user's preference for items can be determined with multiattribute utility theory (MAUT), which uses a linear additive utility function [59]. Cosine similarity is one of the best-known similarity measures because it considers only the angle between two vectors, not their magnitudes. The similarity between the search item and another item rated by users can be measured by the angle between their vectors. If the angle is 90°, the cosine similarity is zero, which means that the item is irrelevant. If the angle is close to zero, the cosine similarity is close to one, which means that the item is relevant (https://en.wikipedia.org/wiki/Cosine_similarity) [60]. There are three major classes of filtering. (1) Collaborative filtering (CF) requires users' and items' profile data to make recommendation decisions [61]. (2) Content-based filtering relies on descriptions of item content and on user preference information (explicit or implicit) [62]. (3) Hybrid filtering combines various filtering techniques to handle scalability, sparsity, the cold start problem, and other big data issues of recommendation systems for better outcomes [63].
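The cosine measure described above takes only a few lines to implement. Below is a plain-Python sketch; the rating vectors in the usage lines are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| |b|); depends only on the angle."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

# Hypothetical rating vectors of two movies over the same users.
orthogonal = cosine_similarity([1, 0], [0, 1])   # 90 degrees apart
parallel = cosine_similarity([1, 2], [10, 20])   # same direction, different magnitude
```

Scaling a vector (for example, [1, 2] versus [10, 20]) leaves the similarity at 1. This is why the measure ignores magnitude.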

3. Multivariate Movie Recommendation Model

The multivariate approach (see Figure 1) is based on three modules: a mobile app, a multivariate recommender, and a web scraper. Users obtain recommendation services through the mobile application. The app passes information such as the user's query, profile, and history to the recommender module. Recommendations are made for both registered and unregistered users of the mobile app. The recommender module consists of a deep learning NLP module and a computation module. The NLP module preprocesses the fetched qualitative data (users' reviews) from microblogs using a tokenizer, a stemmer, and a POS tagger. It then semantically analyzes the reviews and extracts the semantic emotions about movies. The semantic parser is based on a deep learning algorithm, the recurrent neural network (RNN/LSTM attention) with user movie attention (UMA). Semantic emotion is classified into five major classes on the basis of the relative semantic scores: (i) Highly Favorable, (ii) Favorable, (iii) Averagely Favorable, (iv) Unfavorable, and (v) Highly Unfavorable. The computation module normalizes the quantitative data (Twitter likes, votes, and ratings). The normalized scores and the semantic emotion scores are then evaluated together to generate the recommended movie list. Each movie in the list receives one of five medals according to its popularity: Platinum (“Highly Popular”), Gold (“Popular”), Silver (“Averagely Popular”), Bronze (“Unpopular”), and Copper (“Highly Unpopular”). The list is generated according to the user's taste and preferences. A web scraper fetches data (reviews, Twitter likes, votes, and ratings) from external sources (CinemaBlend, Moviefone, Rotten Tomatoes, and Twitter) and stores them in a NoSQL database for computation. Users' feedback about a movie and the app is used to generate the recommended list and to evaluate the system's reliability.

Figure 1. Architecture of the multivariate movie recommendation system.
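This section does not specify how the quantitative variables are normalized. Purely as an illustration, the sketch below assumes min-max normalization. It rescales each variable (for example, Twitter likes across the candidate movies) to [0, 1], so that variables on very different scales can be combined with the semantic score. The function name and the sample counts are hypothetical.

```python
def min_max_normalize(values):
    """Scale values linearly to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical Twitter-like counts for three movies.
likes = [1200, 300, 4500]
scores = min_max_normalize(likes)  # smallest count -> 0.0, largest -> 1.0
```

Min-max scaling preserves the ordering of movies within each variable. It is, however, sensitive to a single extreme value, which then compresses all other scores.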

3.1. NLP Module

NLP enables machines to process and understand natural language. Users share their opinions and reviews on microblogs, which helps others make decisions. Opinion mining extracts positivity, negativity, and neutrality, whereas semantic analysis extracts emotions. In our work, the NLP module determines the semantic emotion of movie reviews with the LSTM-attention machine learning algorithm. This semantic score is one of the multiple variables used to make a recommendation. The procedure is as follows:

The module fetches the reviews from microblogs related to movies such as CinemaBlend, Moviefone, and Rotten Tomatoes
The module preprocesses the microblog text or reviews using a sentence splitter, tokenizer, and stemmer/lemmatizer
The module determines the sense of each word and uses SenticNet to strengthen the sentiment estimate
Attention-based semantic parsing constructs a parse tree that identifies the syntactic structure underlying the emotion of the sentence
RNN/LSTM-user movie attention (UMA) machine learning algorithm is used to classify the reviews

3.2. Preprocessing

It is estimated that more than 80% of data are unstructured. Text preprocessing cleans and normalizes the text of reviews. Tokenization and stemming or lemmatization are applied to reduce sparsity and shrink the feature space. Semantic analysis must handle challenges such as short texts, misspellings, grammatical mistakes, slang, unusual terms, tags, white space, noise, and emoji. A text is a sequence of words, and a word is a meaningful sequence of characters, but finding word boundaries is not trivial. In English, words are separated by spaces or punctuation. In German, however, compound words are written without spaces. For example, “childhood memories description of an unforgettable event” becomes “Kindheitserinnerungen Beschreibung eines unvergesslichen Ereignisses,” where “childhood memories” is the single word “Kindheitserinnerungen.” Japanese uses no spaces at all, which is comparable to writing the English phrase as “childhoodmemoriesdescriptionofanunforgettableevent.”

3.2.1. Tokenization

Tokenization is the process of splitting a text stream into units called tokens. For example, the character string “This movie is so riddled” is tokenized as [This] [movie] [is] [so] [riddled]. Each splitting strategy has drawbacks. Splitting on white space leaves punctuation attached, so the same word can yield different tokens, such as [movie] and [movie,] (NLTK: nltk.tokenize.WhitespaceTokenizer). Splitting on punctuation separates marks that carry no meaning on their own, which causes the apostrophe problem: “don't” becomes [don] ['] [t] (NLTK: nltk.tokenize.WordPunctTokenizer). Rule-based splitting applies a set of rules that produce more meaningful tokens (NLTK: nltk.tokenize.TreebankWordTokenizer).
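The first two strategies are easy to reproduce with the standard library. This sketch mirrors the behavior of NLTK's WhitespaceTokenizer and WordPunctTokenizer (whose pattern is `\w+|[^\w\s]+`). It is an illustration, not NLTK itself. NLTK's rule-based TreebankWordTokenizer, by contrast, splits “don't” into “do” and “n't”.

```python
import re

def whitespace_tokenize(text):
    """Split on runs of white space; punctuation stays attached to words."""
    return text.split()

def wordpunct_tokenize(text):
    """Split into alphanumeric runs and punctuation runs."""
    # Same pattern as NLTK's WordPunctTokenizer: \w+|[^\w\s]+
    return re.findall(r"\w+|[^\w\s]+", text)

print(whitespace_tokenize("This movie is so riddled"))
# ['This', 'movie', 'is', 'so', 'riddled']
print(whitespace_tokenize("Great movie, great cast."))
# ['Great', 'movie,', 'great', 'cast.']
print(wordpunct_tokenize("I don't like this movie."))
# ['I', 'don', "'", 't', 'like', 'this', 'movie', '.']
```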

3.2.2. Stemming and Lemmatization

A stemmer such as the Porter stemmer reduces English words to their stems with heuristic suffix-stripping rules; for example, “looked” is stemmed as “look.” Examples of its rules are [(“SSES ⟶ SS”): (“caresses ⟶ caress”)], [(“IES ⟶ I”): (“ponies ⟶ poni”)], [(“SS ⟶ SS”): (“caress ⟶ caress”)], and [(“S ⟶ (empty)”): (“cats ⟶ cat”)]. However, stemming can produce nonwords and fails on irregular plurals, for example, (wolves ⟶ wolv) and (feet ⟶ feet). To address this, a lemmatizer looks up lemmas in the WordNet database, which yields (wolves ⟶ wolf) and (feet ⟶ foot). This solves some, but not all, such problems (NLTK: nltk.stem.WordNetLemmatizer).
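The four rules quoted above form step 1a of the Porter algorithm. The sketch below implements only that step. The full stemmer has further steps, including the one that turns “wolve” into “wolv.” The sketch pairs step 1a with a tiny hypothetical lookup table that stands in for WordNet, to show why lemmatization handles irregular plurals.

```python
def porter_step1a(word):
    """Step 1a of the Porter stemmer: plural suffix rules, longest match first."""
    w = word.lower()
    if w.endswith("sses"):
        return w[:-2]   # SSES -> SS   (caresses -> caress)
    if w.endswith("ies"):
        return w[:-2]   # IES  -> I    (ponies -> poni)
    if w.endswith("ss"):
        return w        # SS   -> SS   (caress -> caress)
    if w.endswith("s"):
        return w[:-1]   # S    -> ""   (cats -> cat)
    return w

# Hypothetical stand-in for a WordNet lookup: irregular forms map to lemmas.
IRREGULAR_LEMMAS = {"wolves": "wolf", "feet": "foot"}

def lemmatize(word):
    """Return the dictionary lemma if known, else the lowercased word."""
    w = word.lower()
    return IRREGULAR_LEMMAS.get(w, w)

for w in ["Caresses", "Ponies", "Caress", "Cats", "Wolves", "Feet"]:
    print(w, porter_step1a(w), lemmatize(w))
```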

3.2.3. POS-Tag Generation

POS tags are assigned to all tokens by a Treebank POS tagger. The Penn Treebank project defines 36 POS tags (http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html). For example, the POS-tagged form of the string “Unwatchable I made it through 20 minutes I think” is [Unwatchable/VB] [I/PRP] [made/VBD] [it/PRP] [through/IN] [20/CD] [minutes/NNS] [I/PRP] [think/VBP].
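Tagger output in the word/TAG convention shown above is easy to post-process. The hypothetical helper below converts such a string into (word, tag) pairs and counts tokens per document. It reports 9 tokens for user u₁, as in the Tags table.

```python
from collections import Counter

def parse_tagged(tagged):
    """Split 'word/TAG' items into (word, tag) pairs.

    rpartition splits on the last '/', so punctuation items such as './.'
    are handled as well.
    """
    pairs = []
    for item in tagged.split():
        word, _, tag = item.rpartition("/")
        pairs.append((word, tag))
    return pairs

u1 = ("Unwatchable/VB I/PRP made/VBD it/PRP through/IN "
      "20/CD minutes/NNS I/PRP think/VBP")
pairs = parse_tagged(u1)
tag_counts = Counter(tag for _, tag in pairs)
print(len(pairs), tag_counts.most_common(1))
```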

3.2.4. Word Sense Disambiguation (WSD)

WSD is the task of determining the “sense” in which a word is used. A lexicon records each word and its possible senses. Bar-Hillel (1960) presented the example [“Little John was looking for his toy box. Finally, he found it. The box was in the pen. John was very happy.”]. In this passage, the word “pen” has several senses in WordNet. It can be a writing instrument from whose point ink flows, an enclosure for animals such as cattle (the sense intended here), or a female swan. To assess the movie reviews, SenticNet is used to assign degrees of positivity, negativity, and neutrality. The SenticNet scores of the terms and their frequencies are combined to obtain the overall opinion of each review (https://sentic.net) [64].
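The aggregation of term scores and term frequencies can be sketched as a frequency-weighted mean polarity. This reading of how scores and frequencies are combined is an assumption. The lexicon values below are hypothetical placeholders, not actual SenticNet entries.

```python
from collections import Counter

# Hypothetical polarity values in [-1, 1]; these are NOT real SenticNet scores.
POLARITY = {"great": 0.8, "good": 0.6, "boring": -0.7, "unwatchable": -0.9}

def review_polarity(tokens, lexicon=POLARITY):
    """Frequency-weighted mean polarity of the lexicon terms in a review.

    Returns 0.0 (neutral) when no token is found in the lexicon.
    """
    counts = Counter(t.lower() for t in tokens if t.lower() in lexicon)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(lexicon[t] * n for t, n in counts.items()) / total

score = review_polarity(["Great", "cast", ",", "great", "effects", "but", "boring"])
```

In this example, “great” appears twice and “boring” once, so the review leans positive: (2 × 0.8 − 0.7) / 3 = 0.3.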

3.3. Parsing

In NLP, parsing is the process of determining the structure of a sentence by analyzing its constituent words according to an underlying grammar. The Stanford parser is used to construct the parse tree, which captures the syntactic structure of a sentence with respect to the grammar of the language. Parsing can refer to several things. Shallow parsing, or chunking, groups words into noun phrases (NP). Words can also be grouped into verb phrases (VP) and prepositional phrases (PP) using grammar rules such as (S ⟶ NP VP), (NP ⟶ Det Noun), (NP ⟶ ProperNoun), and (VP ⟶ Verb NP). In contrast, dependency parsing determines the dependencies between words and their types. For example, spaCy can be used for parsing, together with displaCy for rendering, to produce a more semantic result.
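The Stanford parser builds full trees. As a much simpler illustration of shallow parsing only, the following hand-written chunker groups POS-tagged tokens into noun phrases. It is an assumption for illustration, not the parser used in this work. It matches NP ⟶ (Det) Adj* Noun+, which extends (NP ⟶ Det Noun) with optional adjectives, and NP ⟶ ProperNoun+.

```python
NOUN_TAGS = ("NN", "NNS", "NNP", "NNPS")

def np_chunks(tagged):
    """Greedy NP chunker over (word, tag) pairs; a phrase must end in a noun."""
    chunks, current = [], []

    def flush():
        # Emit the pending phrase only if it ends in a noun, then reset it.
        if current and current[-1][1] in NOUN_TAGS:
            chunks.append(" ".join(w for w, _ in current))
        current.clear()

    for word, tag in tagged:
        if tag in NOUN_TAGS:
            current.append((word, tag))
        elif tag in ("DT", "JJ"):
            if current and current[-1][1] in NOUN_TAGS:
                flush()  # a determiner/adjective after a noun starts a new NP
            current.append((word, tag))
        else:
            flush()      # any other tag closes the current phrase
    flush()
    return chunks

sentence = [("I", "PRP"), ("watched", "VBD"), ("the", "DT"),
            ("good", "JJ"), ("movie", "NN")]
print(np_chunks(sentence))
```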

3.3.1. RNN/LSTM

An RNN/LSTM network is built from recurrent cells [65]. From a bird's-eye view, an RNN/LSTM is a chain of copies of the same network, as shown in Figure 2. Each copy processes one timestep of the input sequence, and the copies are linked through their hidden states h. In other words, when the network is unfolded (unrolled) in time, each copy receives its own input. Let the input sequence be x₁, x₂, x₃, …, x_n, with x_t ∈ {x₁, …, x_n} the input at timestep t. At timestep t, h_t is the hidden state. It is computed by a function f from the previous hidden state and the current input: h_t = f(h_{t−1}, x_t). In a long sequence, each word corresponds to one timestep. For example, the string “it is a good movie” is represented as the sequence [“it,” “is,” “a,” “good,” “movie”]. Indexing the timesteps from zero (t = 0, 1, 2, …), “it” is x₀, “is” is x₁, “a” is x₂, “good” is x₃, and “movie” is x₄. If t = 1, then x_t = “is” is the input at the current timestep and x_{t−1} = “it” is the input at the previous timestep. The LSTM cell computes:

\begin{matrix} X = \begin{pmatrix} h_{t - 1} \\ x_{t} \end{pmatrix}, \\ \text{input gate: } i_{t} = \sigma (W_{i} \cdot X + b_{i}), \\ \text{forget gate: } f_{t} = \sigma (W_{f} \cdot X + b_{f}), \\ \text{candidate state: } \tilde{C}_{t} = \tanh (W_{c} \cdot X + b_{c}), \\ \text{memory cell: } C_{t} = f_{t} \odot C_{t - 1} + i_{t} \odot \tilde{C}_{t}, \\ \text{output gate: } o_{t} = \sigma (W_{o} \cdot X + b_{o}), \\ \text{hidden state: } h_{t} = o_{t} \odot \tanh (C_{t}) . \end{matrix}

(1)

At the forget gate f_t, the sigmoid function σ decides which information in the cell state to keep or discard. It outputs values between 0 and 1, where 0 means forget completely and 1 means keep completely. The sigmoid function at the input gate i_t decides which values are updated, and the tanh function produces the new candidate values $\tilde{C}_{t}$. The sigmoid function at the output gate o_t decides which parts of the cell state are output, and the tanh function squashes the cell state to values between −1 and 1.

The sequential semantic information of the input is preserved in the RNN's hidden state h_t. When a new input arrives, the hidden state is updated with it and passed on to the next timestep, so the semantic information it carries changes. Passing information from one copy of the network to the next lets the model capture correlations among the words of the sequence. These correlations include long-term dependencies between distant words.
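Eq. (1) can be transcribed almost line by line. The NumPy sketch below is illustrative, with the following assumptions: weights are random, biases are zero, and the inputs are random stand-ins for the word vectors of “it is a good movie.” The dimensions (8-dimensional inputs, 4 hidden units) are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step following Eq. (1).

    W maps gate names 'i', 'f', 'c', 'o' to matrices of shape
    (hidden, hidden + input); b maps the same names to (hidden,) biases.
    """
    X = np.concatenate([h_prev, x_t])        # X = [h_{t-1}; x_t]
    i = sigmoid(W["i"] @ X + b["i"])         # input gate
    f = sigmoid(W["f"] @ X + b["f"])         # forget gate
    c_tilde = np.tanh(W["c"] @ X + b["c"])   # candidate state
    c = f * c_prev + i * c_tilde             # memory cell (elementwise products)
    o = sigmoid(W["o"] @ X + b["o"])         # output gate
    h = o * np.tanh(c)                       # hidden state
    return h, c

def run_lstm(xs, hidden, seed=0):
    """Unroll the cell over a (timesteps, input) array; return all hidden states."""
    rng = np.random.default_rng(seed)
    n_in = xs.shape[1]
    W = {g: 0.1 * rng.standard_normal((hidden, hidden + n_in)) for g in "ifco"}
    b = {g: np.zeros(hidden) for g in "ifco"}
    h, c = np.zeros(hidden), np.zeros(hidden)
    states = []
    for x_t in xs:                           # one word per timestep
        h, c = lstm_step(x_t, h, c, W, b)
        states.append(h)
    return np.stack(states)

# Hypothetical 8-dimensional embeddings for ["it", "is", "a", "good", "movie"].
embeddings = np.random.default_rng(1).standard_normal((5, 8))
H = run_lstm(embeddings, hidden=4)
```

Each row of H is the hidden state after one word. Because o_t lies in (0, 1) and tanh lies in (−1, 1), every entry of h_t stays strictly between −1 and 1.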

3.3.2. LSTM-Based Sequence Labeling

In LSTM-based sequence labeling for semantic role labeling, the predicates of an input sequence are marked, and the argument labels corresponding to each predicate are identified. For example, in the sentence “I watched the movie,” the predicate (watched) is marked. The labels corresponding to it for “I,” “the,” and “movie” are agent, null, and theme, respectively. A sentence may contain multiple predicates, and the same word may receive different labels for different predicates. Each word's vector is generated by concatenating embeddings, including pretrained ones (Word2vec). A 1-bit flag marks the predicate of the current training instance, so that the network handles each predicate separately. The resulting vectors are fed into the LSTM layer to capture each word's context. Each word is then labeled with respect to the predicate by taking a dot product with its hidden state, and a softmax function is applied over the resulting scores. The probability of a sentence is calculated as follows:

\begin{matrix} P (X) = \prod_{t = 1}^{n} P (x_{t} \mid x_{1}, x_{2}, \dots, x_{t - 1}), \\ X_{l, r} = \mathrm{ReLU} (x_{l} \cdot x_{r}) . \end{matrix}

(2)

Here, the role-specific parameter X_{l,r} for role label r is computed with the ReLU function from the dot product of two embeddings: x_l of the predicate lemma l and x_r of the role r.
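As a small worked illustration of Eq. (2), the snippet multiplies hypothetical conditional probabilities for “it is a good movie” (the chain rule). It compares this product with the log-space sum usually used in practice to avoid numerical underflow. It also shows the ReLU of a dot product between two hypothetical embedding vectors, which has the same form as X_{l,r}.

```python
import math

def sentence_probability(cond_probs):
    """Chain rule: P(X) = prod_t P(x_t | x_1, ..., x_{t-1})."""
    return math.prod(cond_probs)

def sentence_log_probability(cond_probs):
    """Same quantity in log space, which avoids underflow for long sentences."""
    return sum(math.log(p) for p in cond_probs)

def role_score(x_l, x_r):
    """ReLU of the dot product between a predicate-lemma and a role embedding."""
    return max(0.0, sum(a * b for a, b in zip(x_l, x_r)))

# Hypothetical conditional probabilities for "it is a good movie".
cond = [0.2, 0.3, 0.25, 0.1, 0.4]
p = sentence_probability(cond)
log_p = sentence_log_probability(cond)
```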

3.3.3. Neural Sentiment Classification (NSC)

Document-level sentiment is classified with neural sentiment classification (NSC), which is based on a hierarchical LSTM with user movie attention (UMA) (see Figure 3). UMA incorporates the user's global information and the movie's features [28]. Let d ∈ D be a review with n sentences s₁, s₂, …, s_n, written by user u ∈ U about movie m ∈ M, where D is the review corpus. The i-th sentence s_i has length l_i and consists of the words x₁ⁱ, x₂ⁱ, …, x_{l_i}ⁱ. The goal is to predict the sentiment rating of each document from its text. First, each word x_jⁱ of a sentence is mapped to its embedding x_jⁱ ∈ ℝ^d in a low-dimensional semantic space. At each step j, the input word x_jⁱ is given. The current cell state c_jⁱ and hidden state h_jⁱ are then updated from the previous cell state c_{j−1}ⁱ and hidden state h_{j−1}ⁱ. The word-level LSTM is defined as follows:

\begin{matrix} \begin{bmatrix} i_{j}^{i} \\ f_{j}^{i} \\ o_{j}^{i} \end{bmatrix} = \begin{bmatrix} \sigma \\ \sigma \\ \sigma \end{bmatrix} (W \cdot [h_{j - 1}^{i}, x_{j}^{i}] + b), \\ \hat{c}_{j}^{i} = \tanh (W \cdot [h_{j - 1}^{i}, x_{j}^{i}] + b), \\ c_{j}^{i} = f_{j}^{i} \odot c_{j - 1}^{i} + i_{j}^{i} \odot \hat{c}_{j}^{i}, \\ h_{j}^{i} = o_{j}^{i} \odot \tanh (c_{j}^{i}) . \end{matrix}

(3)

Here, σ is the sigmoid activation function; i, f, and o are the input, forget, and output gate activations; and ⊙ denotes elementwise multiplication. W and b are the trainable parameters. The hidden states [h₁ⁱ, h₂ⁱ, …, h_{l_i}ⁱ] are fed to an average pooling layer to obtain the representation of sentence s_i. At the sentence level, the sentence embeddings s₁, s₂, …, s_n are fed into a second LSTM. The document representation d is then obtained through an average pooling layer in the same way:

\begin{matrix} \begin{bmatrix} i_{i} \\ f_{i} \\ o_{i} \end{bmatrix} = \begin{bmatrix} \sigma \\ \sigma \\ \sigma \end{bmatrix} (W \cdot [h_{i - 1}, s_{i}] + b), \\ \tilde{C}_{i} = \tanh (W \cdot [h_{i - 1}, s_{i}] + b), \\ C_{i} = f_{i} \odot C_{i - 1} + i_{i} \odot \tilde{C}_{i}, \\ h_{i} = o_{i} \odot \tanh (C_{i}) . \end{matrix}

(4)

Here, W and b are the trainable parameters. The hidden states [h₁, h₂, …, h_n] are fed to an average pooling layer to obtain the document representation d.

3.3.4. User Movie Attention (UMA)

User movie attention (UMA) extracts the components that matter for sentiment classification at different levels. It is applied at the word level to construct each sentence and at the sentence level to construct the document. Not all words contribute equally to the meaning of a sentence for every user and movie. UMA therefore replaces the average pooling of word-level hidden states. It extracts the user- and movie-relevant words that are essential to the sentence meaning and aggregates these informative words into the sentence representation. Formally, the enhanced sentence representation is a weighted sum of hidden states:

\begin{matrix} s_{i} = \sum_{j = 1}^{l_{i}} a_{j}^{i} h_{j}^{i}, \\ d = \sum_{i = 1}^{n} a_{i} h_{i} . \end{matrix}

(5)

Here, a_jⁱ measures the importance of the j-th word for the current user and movie. Each user u and movie m is embedded as a continuous, real-valued vector, u ∈ ℝ^{d_u} and m ∈ ℝ^{d_m}, where d_u and d_m are the dimensions of the user and movie embeddings, respectively. The attention weight a_jⁱ of each hidden state is computed as follows:

\begin{matrix} a_{j}^{i} = \frac{\exp (e (h_{j}^{i}, u, m))}{\sum_{k = 1}^{l_{i}} \exp (e (h_{k}^{i}, u, m))} . \end{matrix}

(6)

For the sentence level,

\begin{matrix} a_{i} = \frac{\exp (e (h_{i}, u, m))}{\sum_{k = 1}^{n} \exp (e (h_{k}, u, m))} . \end{matrix}

(7)

The score function e measures the importance of words for the sentence representation (and, at the sentence level, of sentences for the document representation):

\begin{matrix} e (h_{j}^{i}, u, m) = v^{T} \tanh (W_{h} h_{j}^{i} + W_{u} u + W_{m} m + b) . \end{matrix}

(8)

For the sentence level,

\begin{matrix} e (h_{i}, u, m) = v^{T} \tanh (W_{h} h_{i} + W_{u} u + W_{m} m + b), \end{matrix}

(9)

where v is a weight vector, v^T is its transpose, and W_h, W_u, and W_m are weight matrices. Different sentences also contribute differently to a document's meaning for different users and movies. At the sentence level, attention with the user vector u and the movie vector m is therefore applied, as at the word level, to select the informative sentences that form the document representation d:

\begin{matrix} d = \sum_{i = 1}^{n} β_{i} h_{i} . \end{matrix}

(10)
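The word- and sentence-level attention of Eqs. (5) to (10) can be sketched in NumPy as below. All dimensions and parameter values are hypothetical. For brevity, the sentence level attends directly over the sentence vectors instead of first passing them through the sentence-level LSTM of Eq. (4). The random hidden states likewise stand in for real LSTM outputs.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # numerically stable softmax
    return e / e.sum()

def uma_attention(H, u, m, W_h, W_u, W_m, b, v):
    """Attention pooling conditioned on a user vector u and a movie vector m.

    H: (steps, hidden) hidden states; W_h: (k, hidden); W_u: (k, d_u);
    W_m: (k, d_m); b, v: (k,). Returns (weights, pooled vector).
    """
    context = W_u @ u + W_m @ m + b            # user/movie term, shared by all steps
    scores = np.tanh(H @ W_h.T + context) @ v  # e(h, u, m), Eqs. (8)-(9)
    weights = softmax(scores)                  # attention weights, Eqs. (6)-(7)
    return weights, weights @ H                # weighted sum, Eqs. (5) and (10)

def make_params(rng, hidden, k, d_u, d_m):
    """Random parameters for one attention level (hypothetical initialization)."""
    return dict(W_h=rng.standard_normal((k, hidden)),
                W_u=rng.standard_normal((k, d_u)),
                W_m=rng.standard_normal((k, d_m)),
                b=np.zeros(k), v=rng.standard_normal(k))

rng = np.random.default_rng(0)
hidden, k, d_u, d_m = 4, 3, 2, 2
word_params = make_params(rng, hidden, k, d_u, d_m)
sent_params = make_params(rng, hidden, k, d_u, d_m)
u, m = rng.standard_normal(d_u), rng.standard_normal(d_m)

# Word level: each sentence's word hidden states -> one sentence vector.
sentences = [rng.standard_normal((5, hidden)), rng.standard_normal((3, hidden))]
sent_vecs = np.stack([uma_attention(H, u, m, **word_params)[1] for H in sentences])

# Sentence level: sentence vectors -> document vector d with weights beta.
beta, d = uma_attention(sent_vecs, u, m, **sent_params)
mean_pooled = sent_vecs.mean(axis=0)  # the average-pooling baseline UMA replaces
```

The only difference from average pooling is that the weights depend on u and m. The same review can therefore be summarized differently for different users and movies.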

At the sentence level, the weight β_i of hidden state h_i (a_i in Eq. (7)) is computed in the same way as the word-level attention. The document representation d is thus a high-level representation built hierarchically from the words and sentences of the document. It is used as the feature vector for document sentiment classification. A nonlinear layer with the tanh activation function first projects d into the target space of C classes:

\begin{matrix} \hat{d} = \tanh (W_{c} d + b_{c}) . \end{matrix}

(11)

A softmax layer then produces the sentiment distribution of the document:

\begin{matrix} p_{c} = \frac{\exp (\hat{d}_{c})}{\sum_{k = 1}^{C} \exp (\hat{d}_{k})} . \end{matrix}

(12)

Here, C is the number of sentiment classes and p_c is the predicted probability of sentiment class c. During training, the loss function to be optimized is the cross-entropy error between the gold sentiment distribution and the sentiment distribution predicted by our model:

\begin{matrix} L = - \sum_{d \in D} \sum_{c = 1}^{C} p_{c}^{g} (d) \cdot \log (p_{c} (d)) . \end{matrix}

(13)

Here, p_c^g(d) is the gold-standard probability of sentiment class c for training document d. It is 1 for the ground-truth class and 0 for all other classes.
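Eqs. (11) to (13) amount to a tanh projection, a softmax, and a cross-entropy loss. The cross-entropy carries the conventional leading minus sign, so the loss is nonnegative. Below is a minimal NumPy sketch for a single document. The document vector, the weights, and the gold label are hypothetical. The full loss would sum this quantity over all training documents in D.

```python
import numpy as np

def predict_distribution(d, W_c, b_c):
    """Eq. (11): tanh projection to C classes; Eq. (12): softmax."""
    d_hat = np.tanh(W_c @ d + b_c)
    e = np.exp(d_hat - d_hat.max())
    return e / e.sum()

def cross_entropy(p_gold, p_pred, eps=1e-12):
    """Per-document cross-entropy: -sum_c p_gold[c] * log p_pred[c]."""
    return -float(np.sum(p_gold * np.log(p_pred + eps)))

rng = np.random.default_rng(0)
C, hidden = 5, 4                        # five emotional classes; hypothetical size
W_c, b_c = rng.standard_normal((C, hidden)), np.zeros(C)
d = rng.standard_normal(hidden)         # stand-in for an attention-pooled document vector
p = predict_distribution(d, W_c, b_c)
gold = np.eye(C)[1]                     # one-hot gold label (class index 1)
loss = cross_entropy(gold, p)
```

With a one-hot gold distribution, the loss reduces to −log p of the true class. The small eps only guards against log(0).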

The notation used in our mathematical model is summarized in Table 1.

Table 1.

Nomenclatures and description.

Nomenclature	Description
d	Document/review
s	Sentence
x	Word
D	Review corpus
m	Movie
l	Length of a sentence
h	Hidden state
S	Total movie sites
b	Biases
TL	Twitter likes
t	Timestep
C^{j,1}	j-th movie sentiment at site S₁
C^{j,2}	j-th movie sentiment at site S₂
C^{j,3}	j-th movie sentiment at site S₃
R^{j,1}	j-th movie rating at site S₁
R^{j,2}	j-th movie rating at site S₂
R^{j,3}	j-th movie rating at site S₃
Q^{j}	j-th movie total quantitative score
RecS	Final recommendation score
AWAS	Aggregated weighted average sentiment
Multivariate	Multivariate final score
i	Input gate
o	Output gate
f	Forget gate
σ	Sigmoid activation function
v	Weight vector
v^T	Transpose of the weight vector
⊙	Elementwise multiplication
h_t	Hidden state at timestep t
h_{t−1}	Hidden state at the previous timestep t−1
W	Weight matrix from the input to the hidden layer at timestep t
ϕ	Activation function (tanh)
x_j^i	Input at the current timestep (j-th word of sentence i)
V^{j,1}	j-th movie votes at site S₁
V^{j,2}	j-th movie votes at site S₂
V^{j,3}	j-th movie votes at site S₃
L	Loss
AS	Aggregated sentiment
WAS	Weighted average sentiment

Semantic score (s)	Emotional class
0.50 < s ≤ 1.00	Highly Favorable
0.00 < s ≤ 0.50	Favorable
−0.50 < s ≤ 0.00	Average Favorable
−1.00 < s ≤ −0.50	Unfavorable
s ≤ −1.00	Highly Unfavorable
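The bands in the semantic score table can be applied mechanically. The sketch below is minimal and assumes two things: the score is a single real number no greater than 1.00, and each band is half-open with an inclusive upper bound, as the table states. `emotional_class` is a hypothetical helper name.

```python
def emotional_class(score: float) -> str:
    """Map a semantic score to its emotional class (upper bounds inclusive)."""
    if score > 1.0:
        raise ValueError("semantic scores above 1.00 are not covered by the table")
    if score > 0.5:
        return "Highly Favorable"     # 0.50 < s <= 1.00
    if score > 0.0:
        return "Favorable"            # 0.00 < s <= 0.50
    if score > -0.5:
        return "Average Favorable"    # -0.50 < s <= 0.00
    if score > -1.0:
        return "Unfavorable"          # -1.00 < s <= -0.50
    return "Highly Unfavorable"       # s <= -1.00


labels = [emotional_class(s) for s in (0.82, 0.5, 0.0, -0.5, -1.0)]
```

The boundary values are the cases to watch. For example, a score of exactly 0.5 falls in "Favorable" rather than "Highly Favorable" because the table's upper bounds are inclusive.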

Popularity score	Medal rank	Status
0.8–1.0	Platinum	Highly Popular
0.6–0.79	Gold	Popular
0.4–0.59	Silver	Average Popular
0.2–0.39	Bronze	Unpopular
0.0–0.19	Copper	Highly Unpopular
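A similar lookup works for popularity medals. The table leaves small gaps between bands (for example, between 0.79 and 0.8). This sketch therefore assumes that each band starts at its listed lower bound and extends up to the next band. That interpretation, and the name `medal_rank`, are assumptions.

```python
from typing import List, Tuple

# (lower bound, medal rank, status), ordered from the highest band down.
# Assumption: each band runs from its lower bound up to the next band's bound.
BANDS: List[Tuple[float, str, str]] = [
    (0.8, "Platinum", "Highly Popular"),
    (0.6, "Gold", "Popular"),
    (0.4, "Silver", "Average Popular"),
    (0.2, "Bronze", "Unpopular"),
    (0.0, "Copper", "Highly Unpopular"),
]


def medal_rank(score: float) -> Tuple[str, str]:
    """Return (medal rank, status) for a popularity score in [0, 1]."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"popularity score out of range: {score}")
    for lower, medal, status in BANDS:
        if score >= lower:
            return medal, status
    raise AssertionError("unreachable: the 0.0 band matches every valid score")
```

Consider the final scores in the movie-level results table (1.30, 3.63, 4.41, 4.47, and 4.59). Dividing each by 10 and passing it through this function reproduces that table's medal ranks and statuses. This suggests, although the text does not state it, that those final scores are on a 0 to 10 scale.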

Movie and category IDs
Movie ID	Movie title	Movie category
m₁	Robin Hood (2018)	Action (c₁)
m₂	The House with a Clock in Its Walls (2018)	Adventure (c₂)
m₃	The Predator (2018)	Fantasy (c₃)
m₄	Venom (2018)	Horror (c₄)
m₅	The Flash	Science fiction (c₅)

ID table
Site name	Site ID	Movie ID	User name	User ID	Review ID
CinemaBlend	s₁	m₁	Deplorable_me	u₁	d₁
		m₂	Snow gator	u₂	d₂
		m₃	David Curry	u₃	d₃
		m₄	Smedley	u₄	d₄
		m₅	DC villains	u₅	d₅
Moviefone	s₂	m₁	The Guardian Peter Bradshaw	u₆	d₆
		m₂	Snow gator	u₇	d₇
		m₃	Relax ad mike	u₈	d₈
		m₄	Jza Smack	u₉	d₉
		m₅	Clifford De Voe	u₁₀	d₁₀
Rotten Tomatoes	s₃	m₁	Jennifer Heaton	u₁₁	d₁₁
		m₂	Carlos Díaz Reyes	u₁₂	d₁₂
		m₃	Jeffrey Bloomer	u₁₃	d₁₃
		m₄	ugene Bernabe	u₁₄	d₁₄
		m₅	Dee R.	u₁₅	d₁₅

Tags
User ID	Tokens per document	Tagging
u₁	9	Unwatchable/VB I/PRP made/VBD it/PRP through/IN 20/CD minutes/NNS I/PRP think/VBP
u₂	91	Thought/RB it/PRP was/VBD uneven/JJ and/CC wasted/VBD some/DT great/JJ talent/NN ./. Not/RB funny/JJ enough/RB ,/, too/RB much/JJ turd/VBD humor/NN ,/, and/CC think/VBP it/PRP is/VBZ a/DT made/VBN for/IN USA/NNP level/NN of/IN quality/NN with/IN better/JJR effects/NNS ./. It/PRP is/VBZ worth/JJ seeing/VBG for/IN Blanchett/NNP ./. She/PRP does/VBZ steal/VB every/DT scene/NN ,/, and/CC when/WRB the/DT sequel/NN happens/VBZ -/: and/CC it/PRP has/VBZ made/VBN more/RBR than/IN enough/JJ money/NN for/IN one/CD -/: I/PRP hope/VBP she/PRP is/VBZ front/NN and/CC center/NN as/IN the/DT main/JJ character/NN ./. She/PRP and/CC Black/NNP do/VBP have/VB a/DT fantastic/JJ chemistry/NN ./.
u₃	34	Predator/NNP 1/CD &/CC 2/CD had/VBD comedy/NN in/IN it/PRP ./. Shane/NNP Black/NNP helped/VBD write/VB the/DT original/NN -LRB-/-LRB- everyone/NN should/MD know/VB that/DT by/IN now/RB -RRB-/-RRB- ./. Predators/NNS is/VBZ the/DT most/RBS serious/JJ movie/NN of/IN the/DT franchise/NN ./.
u₄	34	I/PRP went/VBD to/TO see/VB it/PRP today/NN with/IN open/JJ expectations/NNS -LRB-/-LRB- professional/JJ reviews/NNS bad/JJ ,/, viewer/CD reviews/NNS good/JJ -RRB-/-RRB- and/CC thought/VBD it/PRP was/VBD a/DT fun/NN movie/NN ./. It/PRP cracked/VBD me/PRP up/IN a/DT couple/NN times/NNS ./.
u₅	17	The/DT Flash/NNP has/VBZ done/VBN a/DT fantastic/JJ job/NN of/IN incorporating/VBG classic/NN from/IN the/DT hero/NN 's/POS comic/JJ book/NN history/NN
—	—	—
uₙ	—	—
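The tagged reviews use the `word/TAG` convention with Penn Treebank tags such as `NN`, `VBD`, and `-LRB-` (an opening parenthesis). The minimal parser below recovers word/tag pairs and the per-document token count. It assumes that tokens are separated by whitespace and that the tag follows the last slash. For u₁ it yields the 9 tokens listed in the table. `parse_tagged` is a hypothetical helper.

```python
from typing import List, Tuple


def parse_tagged(text: str) -> List[Tuple[str, str]]:
    """Split 'word/TAG word/TAG ...' into (word, tag) pairs.

    Splitting on the last slash keeps words that contain '/' intact
    and handles punctuation tokens such as './.' and ',/,'.
    """
    pairs = []
    for token in text.split():
        word, sep, tag = token.rpartition("/")
        if not sep or not word or not tag:
            raise ValueError(f"malformed tagged token: {token!r}")
        pairs.append((word, tag))
    return pairs


review_u1 = ("Unwatchable/VB I/PRP made/VBD it/PRP through/IN "
             "20/CD minutes/NNS I/PRP think/VBP")
tokens = parse_tagged(review_u1)
print(len(tokens))  # 9, matching "Tokens per document" for u1
```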

Twitter likes
Movie name	Unnormalized	Normalized
m₁	366	366
m₂	154	154
m₃	258	258
m₄	3	3
m₅	2196K	2169

Movie ID	Final score	Genre category	Medal rank	Recommendation of movie
m₁	1.30	Action	Copper	Highly Unpopular
m₂	4.41	Adventure	Silver	Average Popular
m₃	3.63	Fantasy	Bronze	Unpopular
m₄	4.59	Horror	Silver	Average Popular
m₅	4.47	Science fiction	Silver	Average Popular

Classification model	IMDB accuracy	IMDB RMSE	Yelp 2013 accuracy	Yelp 2013 RMSE	Yelp 2014 accuracy	Yelp 2014 RMSE
Without using user and product information
Majority	0.196	2.495	0.411	1.060	0.392	1.097
Trigram	0.399	1.783	0.569	0.814	0.577	0.804
Text feature	0.402	1.793	0.556	0.845	0.572	0.800
AvgWordvec + SVM	0.304	1.985	0.526	0.898	0.530	0.893
SSWE + SVM	0.312	1.973	0.549	0.849	0.557	0.851
Paragraph vector	0.341	1.814	0.554	0.832	0.564	0.802
RNTN + recurrent	0.400	1.764	0.574	0.804	0.582	0.821
CNN and without UP (UPNN)	0.405	1.629	0.577	0.812	0.585	0.808
NSC	0.443	1.465	0.627	0.701	0.637	0.686
NSC + LA	0.487	1.381	0.631	0.706	0.630	0.715

Using user and product information
Trigram + UPF	0.404	1.764	0.570	0.803	0.576	0.789
Text feature + UPF	0.402	1.774	0.561	1.822	0.579	0.791
JMARS	N/A	1.773	N/A	0.985	N/A	0.999
UPNN (CNN)	0.435	1.602	0.596	0.784	0.608	0.764
UPNN (NSC)	0.471	1.443	0.631	0.702	N/A	N/A
NSC + UMA	0.533	1.281	0.650	0.692	0.667	0.654
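The table above reports accuracy and RMSE, the usual metrics for these rating-prediction benchmarks. The sketch below assumes the standard definitions. Accuracy is the fraction of documents whose predicted rating level equals the gold level. RMSE is the square root of the mean squared difference between the two. The helper name and toy labels are illustrative.

```python
import math
from typing import Sequence, Tuple


def accuracy_and_rmse(gold: Sequence[int], pred: Sequence[int]) -> Tuple[float, float]:
    """Accuracy = exact-match rate; RMSE = sqrt(mean((gold - pred)^2))."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and equally long")
    n = len(gold)
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    mse = sum((g - p) ** 2 for g, p in zip(gold, pred)) / n
    return correct / n, math.sqrt(mse)


# Toy 1-5 star labels: two exact hits, one off by 1, one off by 2.
acc, rmse = accuracy_and_rmse([1, 2, 3, 4], [1, 3, 3, 2])
```

Accuracy rewards only exact hits, while RMSE also penalizes how far each miss lands from the gold rating. This is why the two columns can rank models differently.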

Ref.	NLP/RNN/LSTM	User preferences	NoSQL	Heterogeneous data	Quantitative score (votes, likes, and ratings)	Qualitative score (analysis of reviews)	Multivariates	Multiple data source sites	Popularity medals	Web bot	Categories	User app
[15]	☓	✓	✓	☓	✓	☓	☓	☓	☓	☓	☓	✓
[72]	✓	✓	✓	☓	☓	✓	☓	☓	☓	☓	☓	✓
[73]	✓	✓	☓	☓	☓	✓	☓	☓	☓	☓	☓	☓
[74]	✓	✓	☓	☓	☓	☓	☓	☓	☓	☓	☓	☓
[21]	✓	☓	☓	✓	✓	☓	✓	✓	☓	✓	✓	✓
Proposed work	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓

	AS	WAS	AWAS	[74]	[21]	Multivariate system
Precision	0.6223	0.7979	0.7778	0.8537	0.7950	0.9858
Recall	0.6133	0.6818	0.7561	0.7955	0.8858	0.9911
F score	0.6178	0.7353	0.7668	0.8235	0.8379	0.9884
Accuracy	0.5780	0.7300	0.7530	0.7750	0.8290	0.9870

Rating
Movie ID	CinemaBlend (%)	Moviefone (%)	Rotten Tomatoes (%)	CinemaBlend (normalized)	Moviefone (normalized)	Rotten Tomatoes (normalized)
m₁	70	16	39	7	1.60	3.90
m₂	80	55	60	4.0	5.50	6.00
m₃	70	25	49	3.5	2.50	4.90
m₄	40	Nil	39	2.0	Nil	3.90
m₅	73	69	93	7.3	6.90	9.30
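For the Moviefone and Rotten Tomatoes columns, the normalized value is the percentage divided by 10 (16% → 1.60, 93% → 9.30). The CinemaBlend column does not use a single divisor in every row (80% appears as 4.0, but 73% as 7.3), so the sketch below reproduces only the divide-by-10 rule. It assumes that ratings arrive as strings, possibly with a `%` sign, and that `Nil` marks a missing rating. `normalize_percent` is a hypothetical name.

```python
from decimal import Decimal
from typing import Optional


def normalize_percent(raw: str) -> Optional[Decimal]:
    """Map a 0-100 percentage rating to a 0-10 scale; 'Nil' means no rating."""
    text = raw.strip().rstrip("%")
    if text.lower() == "nil":
        return None  # missing value, kept distinct from a genuine zero rating
    value = Decimal(text)
    if not Decimal(0) <= value <= Decimal(100):
        raise ValueError(f"percentage out of range: {raw!r}")
    return value / Decimal(10)


ratings_m1 = {"Moviefone": "16", "Rotten Tomatoes": "39%"}
normalized = {site: normalize_percent(v) for site, v in ratings_m1.items()}
```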

Votes
Movie ID	CinemaBlend (raw)	Moviefone (raw)	Rotten Tomatoes (raw)	CinemaBlend (normalized)	Moviefone (normalized)	Rotten Tomatoes (normalized)
m₁	1.5K	679	4.5K	1500	679	4500
m₂	870	760	6.5K	870	760	6500
m₃	797	890	94	797	890	94
m₄	3.46K	6.9K	910	3460	6900	910
m₅	67	76	8.3K	670	760	8300
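The normalized vote counts expand the `K` suffix into thousands (1.5K → 1500, 6.9K → 6900). This is a minimal sketch, assuming `K` is the only abbreviation that occurs. It uses `Decimal` so that a value such as 3.46K becomes exactly 3460 without relying on binary floating-point arithmetic. `parse_count` is a hypothetical helper.

```python
from decimal import Decimal


def parse_count(raw: str) -> int:
    """Expand abbreviated counts such as '1.5K' into integers (1.5K -> 1500)."""
    text = raw.strip().upper()
    multiplier = 1
    if text.endswith("K"):
        text, multiplier = text[:-1], 1000
    value = Decimal(text) * multiplier
    if value != value.to_integral_value():
        raise ValueError(f"count is not a whole number: {raw!r}")
    return int(value)


votes_m1 = ["1.5K", "679", "4.5K"]  # CinemaBlend, Moviefone, Rotten Tomatoes
print([parse_count(v) for v in votes_m1])  # [1500, 679, 4500]
```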

Decision parameters	TP	TN	FP	FN
AS	341	237	207	215
WAS	375	355	95	175
AWAS	406	347	116	131
[74]	525	250	90	135
[21]	442	387	114	57
Multivariate system	554	433	8	5
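The precision, recall, F score, and accuracy values in the comparison table follow from these confusion counts. The sketch below assumes the F score is F1, the harmonic mean of precision and recall. Under that assumption, it reproduces the reported values when rounded to four decimals, for example 0.9858, 0.9911, 0.9884, and 0.9870 for the multivariate system. The function name is illustrative.

```python
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Precision, recall, F1, and accuracy from binary confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall,
            "f_score": f_score, "accuracy": accuracy}


# Multivariate system row: TP = 554, TN = 433, FP = 8, FN = 5.
m = classification_metrics(554, 433, 8, 5)
print({k: round(v, 4) for k, v in m.items()})
# {'precision': 0.9858, 'recall': 0.9911, 'f_score': 0.9884, 'accuracy': 0.987}
```

The same function applied to the AS row (341, 237, 207, 215) gives 0.6223, 0.6133, 0.6178, and 0.5780. Each row of the metrics table is therefore a direct consequence of the corresponding row of counts.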
