PLoS One. 2013 May 17;8(5):e63531. doi: 10.1371/journal.pone.0063531

Information Filtering via a Scaling-Based Function

Tian Qiu 1,*, Zi-Ke Zhang 2,3,4, Guang Chen 1
Editor: Danilo Roccatano 5
PMCID: PMC3656959  PMID: 23696829

Abstract

Finding a universal description of algorithm optimization is one of the key challenges in personalized recommendation. In this article, we introduce, for the first time, a scaling-based algorithm (SCL) that is independent of the recommendation list length. It builds on a hybrid algorithm of heat conduction and mass diffusion, and is obtained by finding the scaling function that relates the tunable parameter to the object average degree. The optimal value of the tunable parameter, which is heterogeneous across individual objects, can be extracted from the scaling function. Experimental results on three real datasets, Netflix, MovieLens and RYM, show that the SCL is highly accurate in recommendation. More importantly, compared with a number of excellent algorithms, including the mass diffusion method, the original hybrid method, and even an improved version of the hybrid method, the SCL algorithm remarkably improves personalized recommendation in three other aspects: solving the accuracy-diversity dilemma, presenting a high novelty, and addressing the key challenge of the cold start problem.

Introduction

With the growing volume of information, people can enjoy an abundant life. However, they also face the difficulty of finding what they actually prefer: for example, how to select a satisfactory dress from the many dress brands, or how to find an interesting book in a sea of books. Recommendation engines have emerged as a powerful tool to help people cope with this information overload [1]. With the need for personalized recommendation, developing efficient recommendation methods has become one of the central scientific problems.

A great many algorithms have been proposed and have led to considerable progress, such as the collaborative filtering (CF) algorithms [2], [3], which can be further divided into memory-based [4]–[6] and model-based methods [7]–[10], content-based algorithms [11]–[14], and related extended studies [15]–[21]. Recently, building on the fruitful achievements of complexity theory, complex-network-based recommendation algorithms have been proposed [21], [22], which point to a promising direction for personalized recommendation [23]–[35]. Meanwhile, concepts from traditional physics have been introduced into algorithm design, e.g., the ideas of mass diffusion [24], [28] and heat conduction [23], [28], which greatly improve recommendation accuracy and diversity, respectively.

Among these numerous physical-concept-based recommendation algorithms, a representative work is a hybrid algorithm of heat conduction and mass diffusion (HHP) [28]. Improving the recommendation accuracy usually inhibits the recommendation diversity. However, personalized recommendation calls for an engine that is not only accurate but also personalized. While improving the recommendation accuracy, the HHP method simultaneously raises the recommendation diversity, and therefore contributes greatly to solving the long-standing dilemma between recommendation accuracy and diversity in network-based recommender systems. Inspired by this work, a wide range of methods have been proposed in various disciplines, such as the integrated weighted tags [36] and the target-drug prediction [37]. A promising direction of improvement is to consider the heterogeneity of users or objects [38], which might lead to more personalized recommendation that matches individual tastes.

However, the performance of many of these algorithms is controlled by some ‘tunable parameter’, and the common challenge is how to find its optimal value. So far, most algorithms rely on a one-evaluator-based parameter selection, namely, choosing the optimal value of the tunable parameter according to the recommendation performance measured by a single evaluator [28], [35], [39], [40]. For instance, one can take as optimal the parameter value at which the system reaches its best recommendation accuracy. Nevertheless, different recommendation goals may favor different evaluators. Consequently, a challenging question emerges: which evaluator is the best reference when searching for the optimal value of the tunable parameter? Even though recommendation accuracy is widely accepted as the most important evaluator in personalized recommendation, the cold start problem and the recommendation diversity and novelty also attract central interest [28], [41]. The cold start problem refers to how to recommend new objects, or how to recommend interesting objects to new users, when activity records are lacking. Diversity and novelty also significantly mark the vitality of a system. Clearly, one can hardly find the same optimal value of the tunable parameter for different recommendation goals. Moreover, even when only the recommendation accuracy is evaluated, different indicators might lead to different optimal values. For example, the ranking score [24] and the precision [43] are both indicators of recommendation accuracy, yet the optimal values of the tunable parameter obtained from them are usually not consistent for the same method.

Motivated by this dilemma of choosing a proper reference for algorithm optimization, in the present paper we introduce, for the first time, a scaling-based algorithm (SCL) independent of the recommendation list length, based on the hybrid method of heat conduction and mass diffusion (HHP). By testing our algorithm on three real datasets, Netflix, MovieLens and RYM, we report two results:

  1. A single curve independent of the recommendation list length is obtained by rescaling the tunable parameter and the object average degree, and we describe it by a scaling function. The optimal value of the tunable parameter, which is heterogeneous across individual objects, can be extracted from the scaling function.

  2. The present algorithm shows a high accuracy in recommendation. More importantly, it greatly improves the personalized recommendation in three other challenging aspects: solving the accuracy-diversity dilemma, presenting a high novelty, and solving the cold start problem.

The remainder of this paper is organized as follows. In the next section, we detail the bipartite network and the investigated recommendation algorithms. Some popular indicators for evaluating recommendation performance are introduced in the Metrics section, followed by the description of the datasets in the Data section. Then, in the Results and Discussion section, we compare the present algorithm with a highly accurate mass diffusion algorithm, the original hybrid method that is both highly accurate and diverse, and an improved version of the hybrid method that resolves the cold start problem well. Finally, we present the conclusion.

Materials and Methods

A recommendation system can be described by a bipartite network composed of a user set and an object set. The user set includes Inline graphic users Inline graphic, and the object set includes Inline graphic objects Inline graphic. If an object Inline graphic is collected by a user Inline graphic, a link is added between them. The adjacency matrix that links the users and the objects is Inline graphic. If the object Inline graphic is collected by the user Inline graphic, then Inline graphic; otherwise, Inline graphic.

In the following algorithms, a so-called “resource” is assigned to objects. At first, objects are given an initial resource f, with Inline graphic for a particular user Inline graphic. If an object is collected by the user Inline graphic, its initial resource is set to 1; otherwise, it is set to 0. That is to say, for the user Inline graphic, the initial resource Inline graphic of the object Inline graphic equals the adjacency matrix element Inline graphic, i.e., Inline graphic. After a resource reallocation process via a transformation matrix W, objects obtain a final resource Inline graphic given by Inline graphic. For each user, his/her uncollected objects are ranked in decreasing order of the final resource, and the top Inline graphic objects are recommended to the user. The form of the transformation matrix Inline graphic, i.e., how the resources are redistributed, therefore plays a key role in the recommendation process.
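The ranking step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `recommend_top`, the 3-object matrix `W` and the list length `L` are hypothetical, and `W` stands for any objects-by-objects transformation matrix.

```python
import numpy as np

def recommend_top(W, collected, L):
    """Return the indices of the top-L uncollected objects for one user.

    W         : (n, n) resource transformation matrix (objects x objects)
    collected : length-n 0/1 vector, 1 where the user has collected the object
    L         : recommendation list length
    """
    f = np.asarray(collected, dtype=float)        # initial resource: 1 if collected, else 0
    f_final = np.asarray(W, dtype=float) @ f      # final resource f' = W f
    f_final[f > 0] = -np.inf                      # collected objects are never recommended
    ranked = np.argsort(-f_final, kind="stable")  # decreasing order of final resource
    n_uncollected = int((f == 0).sum())
    return ranked[:min(L, n_uncollected)].tolist()

# Hypothetical toy matrix, not taken from the paper.
W = np.array([[0.5, 0.2, 0.0],
              [0.3, 0.5, 0.4],
              [0.2, 0.3, 0.6]])
print(recommend_top(W, [1, 0, 0], L=2))
```

The methods discussed next differ only in how `W` is built; the ranking step stays the same.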

PBS and HTS Methods

The mass-diffusion-based algorithm, which relies on a probability spreading (PBS) process, is reported to be a highly accurate method [24]. Figure 1 (a) illustrates the resource reallocation process. Initially, the four objects are assigned a resource. First, each object distributes its resource to its neighboring users with equal probability. Consider, for example, the particular user indicated by the solid circle, who has two neighboring objects, the first and the fourth. The first object transfers Inline graphic resource to the user, and the fourth object also transfers Inline graphic resource to the user. Therefore, the user gets a total resource of 1 from his/her neighboring objects. The user then redistributes this total resource of 1 to his/her neighboring objects with equal probability, i.e., the first and the fourth objects both get Inline graphic resource from the user. By summing up the resources from all their neighboring users, the objects obtain their final resources. The resource transformation matrix of the PBS is formulated as,

$$W^{PBS}_{\alpha\beta} = \frac{1}{k_\beta}\sum_{i}\frac{a_{\alpha i}\,a_{\beta i}}{k_i} \qquad (1)$$

where $a_{\alpha i}$ is the adjacency matrix element (1 if user $u_i$ has collected object $o_\alpha$, and 0 otherwise), the sum runs over all users, $k_\beta$ is the degree of object $o_\beta$, and $k_i$ is the degree of user $u_i$ (the degree is the number of links owned by the user or the object). We call an object popular if it has a high degree, and cold otherwise. In the last step of the PBS, objects receive resources from all their neighboring users, which greatly raises the resources of objects with high degrees. Hence, the PBS gives more priority to the popular objects, leading to a good recommendation accuracy, yet a relatively low diversity.
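The two spreading steps described for the PBS can be written out directly. The sketch below assumes an objects-by-users 0/1 adjacency matrix in which every user and every object has at least one link. The toy network and the name `pbs_spread` are hypothetical, and the network is not the one drawn in figure 1.

```python
import numpy as np

def pbs_spread(A, f):
    """Two-step mass diffusion: objects -> users -> objects."""
    A = np.asarray(A, dtype=float)   # A[o, u] = 1 if user u collected object o
    f = np.asarray(f, dtype=float)   # initial resource on each object
    k_obj = A.sum(axis=1)            # object degrees
    k_usr = A.sum(axis=0)            # user degrees
    on_users = A.T @ (f / k_obj)     # each object splits its resource equally among its users
    return A @ (on_users / k_usr)    # each user splits what it holds equally among its objects

# Hypothetical toy network: 4 objects (rows) and 3 users (columns).
A = [[1, 1, 0],
     [1, 0, 0],
     [0, 1, 1],
     [1, 0, 1]]
f0 = [1, 1, 0, 1]                    # user 0 has collected objects 0, 1 and 3
final = pbs_spread(A, f0)
print(final, final.sum())
```

Because every unit of resource is only split and passed on, the total resource is conserved, which is one way to sanity-check an implementation.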

Figure 1. An illustration of the resource reallocation process.


(a) for the PBS method, and (b) for the HTS method.

By incorporating a process analogous to heat conduction, the heat conduction (HTS) method is proposed; figure 1 (b) illustrates how resources are reallocated. First, the user gets the average resource of all his/her neighboring objects. For example, the particular user indicated by the solid circle receives 1 resource from the first object and 1 resource from the fourth object. Taking the average over the two objects, the user therefore gets a resource of 1. Then each object in turn gets the average resource of all its neighboring users. The transformation matrix then reads,

$$W^{HTS}_{\alpha\beta} = \frac{1}{k_\alpha}\sum_{i}\frac{a_{\alpha i}\,a_{\beta i}}{k_i} \qquad (2)$$

where $k_\alpha$ is the degree of object $o_\alpha$, $k_i$ is the degree of user $u_i$, $a_{\alpha i}$ is the adjacency matrix element, and the sum runs over all users. In the last step of the HTS, the resources of objects are divided by their degrees, so the rank of objects with high degrees is greatly lowered. Therefore, the HTS gives more priority to the cold objects, leading to a good performance in recommendation diversity, but at the cost of recommendation accuracy.
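The two averaging steps of the HTS can be sketched in the same style. As before, this is a minimal illustration that assumes an objects-by-users 0/1 adjacency matrix with no isolated nodes; the toy network and the name `hts_average` are hypothetical.

```python
import numpy as np

def hts_average(A, f):
    """Two-step heat conduction: users average over objects, then objects over users."""
    A = np.asarray(A, dtype=float)     # A[o, u] = 1 if user u collected object o
    f = np.asarray(f, dtype=float)     # initial resource on each object
    k_obj = A.sum(axis=1)              # object degrees
    k_usr = A.sum(axis=0)              # user degrees
    on_users = (A.T @ f) / k_usr       # each user takes the average resource of its objects
    return (A @ on_users) / k_obj      # each object takes the average over its users

# Hypothetical toy network: 4 objects (rows) and 3 users (columns).
A = [[1, 1, 0],
     [1, 0, 0],
     [0, 1, 1],
     [1, 0, 1]]
final = hts_average(A, [1, 1, 0, 1])   # user 0 has collected objects 0, 1 and 3
print(final)
```

Since each step is an average, a uniform initial resource stays uniform, and the division by the object degree in the last step is what favors low-degree objects.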

Hybrid Method and an Improved Version

To achieve a high accuracy and diversity of recommendation, a hybrid method (HHP) is proposed [28], by elegantly combining the heat conduction and the mass-diffusion method as,

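$$W^{HHP}_{\alpha\beta} = \frac{1}{k_\alpha^{1-\lambda}\,k_\beta^{\lambda}}\sum_{i}\frac{a_{\alpha i}\,a_{\beta i}}{k_i} \qquad (3)$$

where $\lambda \in [0,1]$, $k_\alpha$ and $k_\beta$ are object degrees, $k_i$ is the degree of user $u_i$, and $a_{\alpha i}$ is the adjacency matrix element; $\lambda = 0$ recovers the HTS and $\lambda = 1$ recovers the PBS. When the parameter $\lambda$ is tuned to a suitable value, the HHP method shows a clear advantage in both recommendation accuracy and diversity.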
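A matrix form of the hybrid can be sketched as follows, assuming the standard hybrid of ref. [28], in which a single exponent interpolates between the heat-conduction normalization (by the receiving object's degree) and the mass-diffusion normalization (by the sending object's degree). The name `hybrid_matrix`, the argument `lam` and the toy network are hypothetical.

```python
import numpy as np

def hybrid_matrix(A, lam):
    """Hybrid transformation matrix; lam = 0 gives heat conduction, lam = 1 mass diffusion."""
    A = np.asarray(A, dtype=float)              # A[o, u] = 1 if user u collected object o
    k_obj = A.sum(axis=1)                       # object degrees
    k_usr = A.sum(axis=0)                       # user degrees
    overlap = (A / k_usr) @ A.T                 # sum over users of a[a,i] * a[b,i] / k_i
    norm = np.outer(k_obj ** (1.0 - lam), k_obj ** lam)   # k_a^(1-lam) * k_b^lam
    return overlap / norm

# Hypothetical toy network: 4 objects (rows) and 3 users (columns).
A = [[1, 1, 0],
     [1, 0, 0],
     [0, 1, 1],
     [1, 0, 1]]
W_mass = hybrid_matrix(A, 1.0)   # columns sum to 1 (resource is conserved)
W_heat = hybrid_matrix(A, 0.0)   # rows sum to 1 (each object takes an average)
print(W_mass.sum(axis=0), W_heat.sum(axis=1))
```

Intermediate values of `lam` trade the popularity bias of mass diffusion against the preference of heat conduction for low-degree objects.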

Based on the HHP method, an improved object-oriented hybrid method (OHHP) is proposed [38], focusing on resolving the cold-start problem. In the OHHP, an object-degree-dependent tunable parameter is introduced, with its resource transformation matrix to be,

graphic file with name pone.0063531.e071.jpg (4)

where Inline graphic, Inline graphic is the maximal degree of all the object degrees, and Inline graphic is a tunable parameter. The OHHP effectively optimizes the probability spreading factor in the transformation matrix of equation (3) according to the degree of each individual object. It therefore greatly enhances the recommendation accuracy of cold objects while keeping a high recommendation accuracy over all objects.

Scaling-based Method

A common question for most algorithms is how to find the optimal value of the tunable parameter. For example, the optimal value obtained with the ranking score as the reference usually differs from that obtained with the diversity as the reference. Moreover, the diversity performance varies with the recommendation list length. Figure 2 shows the tunable parameter Inline graphic as a function of the object average degree Inline graphic for different recommendation list lengths Inline graphic in the HHP algorithm, where Inline graphic. For the three real datasets, Netflix, MovieLens and RYM (details of the datasets are given in the Data section), the dependence of Inline graphic on Inline graphic differs for different recommendation list lengths. This indicates that, for the same object average degree, different recommendation list lengths yield different values of the tunable parameter. If a scaling behavior independent of the recommendation list length can be found, the dependence of the tunable parameter on the object average degree can then be described in a universal way for all list lengths.

Figure 2. The tunable parameter Inline graphic on the object average degree Inline graphic.


The black, red, green, blue and dark yellow lines are for the recommendation list lengths of 10, 20, 30, 40 and 50, respectively.

In order to obtain an Inline graphic-independent scaling function, we analytically investigate the recommendation result of the HHP algorithm. On average, the probability that a user Inline graphic collects an object Inline graphic is directly proportional to Inline graphic’s degree, Inline graphic, that is to say, Inline graphic, where Inline graphic is the number of objects. We assume that the probability of Inline graphic is independent of other links. For the particular user Inline graphic, the resource Inline graphic of the object Inline graphic can be calculated from the transformation matrix, which reads,

graphic file with name pone.0063531.e096.jpg (5)

where Inline graphic is the probability distribution function of the object degrees. As suggested in Ref. [38], Inline graphic obeys a power-law distribution from the empirical study, i.e., Inline graphic. Then, one can calculate Inline graphic as,

graphic file with name pone.0063531.e101.jpg (6)

where Inline graphic and Inline graphic are respectively the maximum and the minimum of the object degrees.

Inspired by the above theoretical analysis, we propose the Scaling-based (SCL) algorithm, making use of the formula in equation (6) to collapse the data into a single curve characterized by the scaling form,

graphic file with name pone.0063531.e104.jpg (7)

where Inline graphic is a universal function, Inline graphic, with Inline graphic and Inline graphic being the maximum and minimum of the object average degree Inline graphic over the whole range of Inline graphic. We rescale the axes Inline graphic and Inline graphic according to the transformations Inline graphic and Inline graphic, and obtain Inline graphic and Inline graphic, which make all the curves roughly collapse onto a single curve. Therefore, Inline graphic and Inline graphic. As shown in figure 3, the major part of the curves collapses well. However, due to fluctuations in the empirical data, a small part of the curves collapses only approximately. The procedure to obtain the optimal value of the tunable parameter in the SCL is as follows:

Figure 3. The rescaled tunable parameter Inline graphic vs. the rescaled object average degree Inline graphic.


The black, red, green, blue and dark yellow lines are for the recommendation list lengths of 10, 20, 30, 40 and 50, respectively.

  1. Fit the single curve with the polynomial Inline graphic to obtain a set of fitting coefficients Inline graphic, where Inline graphic is the order of the polynomial fit. Here we take Inline graphic to obtain the coefficient set Inline graphic.

  2. With the coefficients Inline graphic, compute the optimal value of the tunable parameter Inline graphic for a particular object Inline graphic according to the formula Inline graphic, where Inline graphic, with Inline graphic being the degree of the examined object Inline graphic, Inline graphic, and Inline graphic (Inline graphic) being the maximal (minimal) degree of all the objects.

  3. With the optimal value of the tunable parameter Inline graphic for a particular object Inline graphic, calculate its resource transformation matrix as

graphic file with name pone.0063531.e136.jpg (8)

Hence, the optimal value of the tunable parameter in the SCL is no longer assessed according to any specific evaluator, but is extracted from the scaling function obtained from the single curve.
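The first two steps of the procedure (fit a polynomial to the collapsed curve, then evaluate it per object) can be sketched as below. Several things here are assumptions made only for illustration: both axes are taken as already rescaled, the synthetic curve is invented, the function names are hypothetical, and clipping the result to [0, 1] is an added safeguard rather than a step stated in the procedure.

```python
import numpy as np
from numpy.polynomial import polynomial as npoly

def fit_scaling_curve(x_rescaled, y_rescaled, order):
    """Least-squares polynomial fit; returns coefficients c_0 ... c_order (low to high)."""
    return npoly.polyfit(x_rescaled, y_rescaled, order)

def parameter_from_curve(coeffs, x_objects):
    """Evaluate the fitted polynomial at each object's rescaled degree."""
    values = npoly.polyval(np.asarray(x_objects, dtype=float), coeffs)
    return np.clip(values, 0.0, 1.0)   # assumption: keep the parameter inside [0, 1]

# Synthetic stand-in for the collapsed curve (not data from the paper).
x = np.linspace(0.0, 1.0, 50)
y = 0.2 + 0.5 * x - 0.1 * x**2
coeffs = fit_scaling_curve(x, y, order=2)
per_object = parameter_from_curve(coeffs, [0.0, 0.5, 1.0])
print(coeffs, per_object)
```

The per-object values would then be plugged into the transformation matrix in place of a single global parameter.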

Metrics

Recommendation accuracy is without doubt one of the most important indicators of the performance of an algorithm. As a complement to accuracy, recommendation diversity and novelty are regarded as important evaluators of personalized recommendation. In our study, we use the ranking score, precision and recall to quantify the recommendation accuracy, the object average degree to quantify the novelty, and the inter-diversity and inner-diversity to quantify the recommendation diversity. Moreover, to specifically investigate the recommendation accuracy of cold objects, we further study an object-dependent ranking score, an object-dependent precision, and an object-dependent recall.

1. Ranking score (Inline graphic) [24]

The ranking score Inline graphic for the object Inline graphic to the user Inline graphic is defined as,

graphic file with name pone.0063531.e141.jpg (9)

where Inline graphic is the number of all objects, Inline graphic is the degree of the user Inline graphic, and Inline graphic is the position of the recommended object Inline graphic among all the uncollected objects of the user Inline graphic. Generally speaking, users collect the objects they prefer. Hence, for a user Inline graphic, if the deleted link with an object Inline graphic ranks higher among all of Inline graphic’s deleted links, the algorithm is more accurate. The average ranking score Inline graphic is then defined as the average of Inline graphic over all the deleted links. The smaller the Inline graphic, the more accurate the algorithm.

To focus on the recommendation accuracy of cold objects, we define an object-degree dependent ranking score Inline graphic as the average ranking score over objects with the same value of degrees [39].
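A small sketch of both ranking scores follows. It assumes the commonly used form in which the position of the deleted object among the user's uncollected objects is divided by the number of uncollected objects; the function names and the sample numbers are hypothetical.

```python
from collections import defaultdict

def ranking_score(position, n_objects, user_degree):
    """Position of a deleted link among the user's uncollected objects, normalized."""
    return position / (n_objects - user_degree)

def degree_dependent_ranking_score(records):
    """Average ranking score per object degree.

    records: iterable of (object_degree, ranking_score) pairs, one per deleted link.
    """
    by_degree = defaultdict(list)
    for degree, score in records:
        by_degree[degree].append(score)
    return {k: sum(v) / len(v) for k, v in by_degree.items()}

# Hypothetical numbers: 110 objects, a user with 10 collected objects, deleted object ranked 5th.
print(ranking_score(5, 110, 10))
print(degree_dependent_ranking_score([(1, 0.2), (1, 0.4), (3, 0.1)]))
```

Restricting the per-degree averages to small degrees gives the cold-object view of accuracy used in the results.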

2. Precision (Inline graphic) [43]

The recommendation precision Inline graphic is defined as

graphic file with name pone.0063531.e157.jpg (10)

where Inline graphic is the number of the user u_i’s deleted links contained in the top Inline graphic recommended object list. The larger the Inline graphic, the more accurate the algorithm.

Similarly, to better understand the recommendation accuracy of the cold objects, we define an object-degree dependent precision by,

graphic file with name pone.0063531.e161.jpg (11)

where Inline graphic is the number of the user u_i’s deleted links for objects with degree Inline graphic in the top Inline graphic recommended object list.

3. Recall (Inline graphic) [43]

The recall Inline graphic is defined as

graphic file with name pone.0063531.e167.jpg (12)

where Inline graphic is the number of the user u_i’s deleted links contained in the top Inline graphic recommended object list, and Inline graphic is the number of the user u_i’s deleted links in the test set.
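For a single user, precision and recall as described above reduce to counting hits. The sketch below is per user only, and how the per-user values are averaged is left open; the function name and the sample lists are hypothetical.

```python
def precision_recall(recommended, test_objects):
    """Precision and recall of one user's recommendation list.

    recommended  : the top-L recommended objects for the user
    test_objects : the objects of the user's deleted links in the test set
    """
    hits = len(set(recommended) & set(test_objects))       # deleted links that were recovered
    precision = hits / len(recommended)                     # hits divided by list length L
    recall = hits / len(test_objects) if test_objects else 0.0
    return precision, recall

# Hypothetical example: list length 4, three deleted links, two of them recovered.
print(precision_recall([7, 3, 9, 1], [3, 1, 5]))
```

The object-degree dependent versions follow by keeping only the test objects (and hits) whose degree meets the chosen condition before counting.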

The object-degree dependent recall is analogously defined as,

graphic file with name pone.0063531.e171.jpg (13)

where Inline graphic is the number of the user u_i’s deleted links for objects with degree Inline graphic in the top Inline graphic recommended object list, and Inline graphic is the number of the user u_i’s deleted links for objects with degree Inline graphic in the test set.

4. Novelty (Inline graphic)

The average degree of objects in the recommendation list is widely used to identify the novelty of a recommender system, which is defined by,

graphic file with name pone.0063531.e178.jpg (14)

where Inline graphic is the object set of user Inline graphic’s recommendation list. If Inline graphic is small, it indicates that, on average, the degree of the recommended objects is small, i.e., more cold objects are recommended, which is therefore more novel to users; otherwise, if the recommended objects are on average with high degree, i.e., the popular objects, it is less novel to users.
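As a quick illustration, the novelty indicator is just a mean of object degrees, first within each list and then over users. The averaging order, the name `novelty` and the sample data are assumptions made for this sketch.

```python
def novelty(recommendation_lists, object_degree):
    """Mean degree of recommended objects, averaged over users (smaller means more novel)."""
    per_user = [sum(object_degree[o] for o in rec) / len(rec)
                for rec in recommendation_lists]
    return sum(per_user) / len(per_user)

# Hypothetical data: two users with lists of length 2.
degrees = {0: 10, 1: 2, 2: 4}
print(novelty([[0, 1], [1, 2]], degrees))
```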

5. Inter diversity (Inline graphic)

Inline graphic quantifies the difference between different users’ recommendation lists by

graphic file with name pone.0063531.e184.jpg (15)

where Inline graphic is the number of common recommended objects for user Inline graphic and Inline graphic in the top Inline graphic recommendation list. Generally, the greater the Inline graphic, the more personalized the recommendation for different users, and vice versa.
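The inter-diversity can be sketched as an average over user pairs, assuming the usual Hamming-distance form in which the share of common objects in two lists of equal length is subtracted from one. The function name and the sample lists are hypothetical.

```python
from itertools import combinations

def inter_diversity(recommendation_lists):
    """Average over user pairs of 1 - (common objects) / (list length)."""
    L = len(recommendation_lists[0])            # all lists are assumed to have length L
    distances = [1.0 - len(set(a) & set(b)) / L
                 for a, b in combinations(recommendation_lists, 2)]
    return sum(distances) / len(distances)

# Hypothetical lists for three users.
print(inter_diversity([[1, 2], [2, 3], [4, 5]]))
```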

6. Inner diversity (Inline graphic)

Inline graphic calculates the difference within a specific user’s recommendation list by

graphic file with name pone.0063531.e192.jpg (16)

where Inline graphic is the cosine similarity between objects Inline graphic and Inline graphic in a single user’s top Inline graphic recommended object list. Generally, the greater the Inline graphic, the more diverse the recommendation list for a specific user, and vice versa.
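For one user's list, the inner-diversity can be sketched as one minus the mean pairwise cosine similarity of the listed objects. The sketch assumes that the similarity of two objects is the number of users who collected both, divided by the square root of the product of their degrees; the names and the sample data are hypothetical.

```python
import math
from itertools import combinations

def cosine_similarity(users_a, users_b):
    """Cosine similarity of two objects, each given as the set of users who collected it."""
    return len(users_a & users_b) / math.sqrt(len(users_a) * len(users_b))

def inner_diversity(rec_list, object_users):
    """One minus the mean pairwise similarity of the objects in a single list."""
    sims = [cosine_similarity(object_users[a], object_users[b])
            for a, b in combinations(rec_list, 2)]
    return 1.0 - sum(sims) / len(sims)

# Hypothetical data: which users collected each object.
object_users = {0: {1, 2}, 1: {2, 3}, 2: {4}}
print(inner_diversity([0, 1, 2], object_users))
```

Averaging this quantity over all users gives a single number per algorithm.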

Data

We test the algorithm performance on three datasets: Netflix, MovieLens and RYM. Netflix and MovieLens are movie rating systems with a five-level rating, and RYM is a music rating system with a ten-level rating. The Netflix dataset is obtained by random selection from the huge dataset of the Netflix Prize, the MovieLens dataset is downloaded from the web site of GroupLens Research (http://grouplens.org), and the RYM dataset is downloaded from the music rating web site RateYourMusic.com. Because of the different rating levels, we perform a coarse-graining mapping to a unary form for all three datasets. If the rating is no less than three for Netflix and MovieLens, and six for RYM, we consider the object to be collected by the user. The Netflix dataset contains 9999 users, 5870 objects and 815917 links, the MovieLens dataset contains 943 users, 1682 objects and 100000 links, and the RYM dataset contains 10159 users, 5250 objects and 559634 links. The sparsity of the datasets, defined as the ratio of the number of links to the total number of possible user-object links, is Inline graphic, Inline graphic and Inline graphic for Netflix, MovieLens and RYM, respectively.
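The sparsity follows directly from the reported counts, taking the number of possible links to be the number of users times the number of objects. The snippet below recomputes it; the dictionary layout and the name `sparsity` are only illustrative.

```python
# (users, objects, links) as reported in the text.
datasets = {
    "Netflix":   (9999, 5870, 815917),
    "MovieLens": (943, 1682, 100000),
    "RYM":       (10159, 5250, 559634),
}

def sparsity(users, objects, links):
    """Fraction of all possible user-object pairs that are actually linked."""
    return links / (users * objects)

for name, (users, objects, links) in datasets.items():
    print(f"{name}: {sparsity(users, objects, links):.4f}")
```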

We divide each dataset into two subsets, the training set and the test set. We randomly delete Inline graphic of the links as the test set, and keep the remaining Inline graphic of the links as the training set. We use the training set to make predictions for users, and the test set to evaluate the algorithm performance.
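A random link split of this kind can be sketched as follows. The test fraction is left as a parameter; the value 0.1 in the usage lines, the seed, the toy link list and the name `split_links` are hypothetical choices for illustration.

```python
import random

def split_links(links, test_fraction, seed=0):
    """Randomly split (user, object) links into a training set and a test set."""
    links = list(links)
    rng = random.Random(seed)                 # fixed seed so the split is reproducible
    rng.shuffle(links)
    n_test = int(round(test_fraction * len(links)))
    return links[n_test:], links[:n_test]     # (training set, test set)

# Hypothetical toy data: 100 links, with an assumed test fraction of 0.1.
all_links = [(u, o) for u in range(10) for o in range(10)]
train, test = split_links(all_links, test_fraction=0.1)
print(len(train), len(test))
```

Repeating the split with different seeds and averaging the metrics corresponds to reporting results over several runs.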

Results and Discussion

To provide a solid investigation of the performance of the SCL algorithm, we compare it with three typical and excellent algorithms: the PBS, the HHP, and the OHHP. The PBS is highly accurate, the HHP resolves the challenging accuracy-diversity dilemma well, and the OHHP further outperforms the HHP in resolving the cold start problem. A summary of the performance of the PBS, the HHP, the OHHP and the SCL is presented in table 1, with the results averaged over six runs.

Table 1. The performance of the PBS, HHP, OHHP and SCL methods.

Dataset  Method  r      r_{k≤10}  P      P_{k≤10}  R      R_{k≤10}  N_L     D_Inter  D_Inner
NET      PBS     0.051  0.484     0.054  0.0000    0.420  0.0003    2336.0  0.637    0.423
NET      HHP     0.045  0.417     0.062  0.0006    0.470  0.0176    1843.7  0.720    0.672
NET      OHHP    0.044  0.350     0.058  0.0009    0.437  0.0255    2048.3  0.691    0.575
NET      SCL     0.046  0.357     0.060  0.0012    0.426  0.0340    1497.5  0.792    0.768
MOV      PBS     0.105  0.562     0.074  0.0000    0.477  0.0000    233.5   0.645    0.616
MOV      HHP     0.083  0.408     0.085  0.0011    0.527  0.0441    157.2   0.717    0.839
MOV      OHHP    0.083  0.364     0.084  0.0015    0.528  0.0527    170.6   0.707    0.818
MOV      SCL     0.087  0.326     0.080  0.0028    0.469  0.0928    128.2   0.762    0.881
RYM      PBS     0.069  0.480     0.042  0.0002    0.497  0.0080    465.7   0.829    0.874
RYM      HHP     0.048  0.250     0.050  0.0024    0.557  0.0924    329.7   0.850    0.940
RYM      OHHP    0.050  0.189     0.047  0.0048    0.542  0.1578    374.8   0.849    0.919
RYM      SCL     0.050  0.168     0.048  0.0055    0.539  0.1835    317.9   0.862    0.941

The overall ranking score $r$, the object-degree dependent ranking score $r_{k\le 10}$, the overall precision $P$, the object-degree dependent precision $P_{k\le 10}$, the overall recall $R$, the object-degree dependent recall $R_{k\le 10}$, the novelty $N_L$, the inter-diversity $D_{Inter}$ and the inner-diversity $D_{Inner}$ of the PBS, HHP, OHHP and SCL methods are shown for the Netflix (NET), the MovieLens (MOV) and the RYM, with Inline graphic.

To detect how much the SCL outperforms the other three algorithms, we define an improvement percentage Inline graphic by,

graphic file with name pone.0063531.e204.jpg (17)

where the subscript Inline graphic refers to the investigated algorithm, and the Inline graphic is the value of the indicator, i.e., the value of Inline graphic, Inline graphic, Inline graphic, Inline graphic, Inline graphic, Inline graphic, Inline graphic, Inline graphic and Inline graphic. The improvement percentage Inline graphic of the SCL against the PBS, the HHP and the OHHP is summarized in table 2.
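The improvement percentage can be checked against the tables with a short sketch. The form used here, the relative difference with respect to the compared method, signed so that a positive value means the SCL is better, is inferred from the values in tables 1 and 2 and from the table 2 caption; the function name and its arguments are hypothetical.

```python
def improvement(scl_value, other_value, smaller_is_better):
    """Relative improvement of the SCL over another method, in percent."""
    if other_value == 0:
        return float("inf")                       # shown as a blank in table 2
    diff = (other_value - scl_value) if smaller_is_better else (scl_value - other_value)
    return 100.0 * diff / other_value

# Netflix values from table 1.
print(round(improvement(0.046, 0.051, smaller_is_better=True), 1))    # ranking score, SCL vs. PBS
print(round(improvement(0.768, 0.423, smaller_is_better=False), 1))   # inner-diversity, SCL vs. PBS
print(round(improvement(0.046, 0.045, smaller_is_better=True), 1))    # ranking score, SCL vs. HHP
```

Ranking score and novelty are treated as smaller-is-better, and the remaining indicators as larger-is-better.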

Table 2. The improvement percentage of the SCL against the PBS, HHP and OHHP methods.

Dataset  Method  r       r_{k≤10}  P       P_{k≤10}  R       R_{k≤10}   N_L    D_Inter  D_Inner
NET      δ_PBS   9.8%    26.2%     11.1%             1.4%    11233.3%   35.9%  24.3%    81.6%
NET      δ_HHP   −2.2%   14.4%     −3.2%   100.0%    −9.4%   93.2%      18.8%  10.0%    14.3%
NET      δ_OHHP  −4.5%   −2.0%     3.4%    33.3%     −2.5%   33.3%      26.9%  14.6%    33.6%
MOV      δ_PBS   17.1%   42.0%     8.1%              −1.7%              45.1%  18.1%    43.0%
MOV      δ_HHP   −4.8%   20.1%     −5.9%   154.5%    −11.0%  110.4%     18.4%  6.3%     5.0%
MOV      δ_OHHP  −4.8%   10.4%     −4.8%   86.7%     −11.2%  76.1%      24.9%  7.8%     7.7%
RYM      δ_PBS   27.5%   65.0%     14.3%   2650.0%   8.5%    2193.8%    31.7%  4.0%     7.7%
RYM      δ_HHP   −4.2%   32.8%     −4.0%   129.2%    −3.2%   98.6%      3.6%   1.4%     0.1%
RYM      δ_OHHP  0.0%    11.1%     2.1%    14.6%     −0.6%   16.3%      15.2%  1.5%     2.4%

The improvement percentage of the SCL against the PBS, HHP and OHHP in the overall ranking score $r$, the object-degree dependent ranking score $r_{k\le 10}$, the overall precision $P$, the object-degree dependent precision $P_{k\le 10}$, the overall recall $R$, the object-degree dependent recall $R_{k\le 10}$, the novelty $N_L$, the inter-diversity $D_{Inter}$ and the inner-diversity $D_{Inner}$ is shown for the Netflix (NET), the MovieLens (MOV) and the RYM, with Inline graphic. As a guide to the eye, if the SCL outperforms the other method on an indicator, we show the improvement percentage as a positive value; otherwise, as a negative value. A blank in the table indicates an infinite value owing to the zero value of the PBS’s precision or recall.

From table 1 and table 2, for all the three datasets, the SCL shows a great advantage in recommendation accuracy of the low-degree objects, as well as novelty and diversity, while simultaneously keeping a high recommendation accuracy.

For the recommendation accuracy, we focus on the overall recommendation accuracy and the recommendation accuracy of the cold objects. Compared with the highly accurate PBS method, the SCL outperforms the PBS for almost all the metrics. Taking the Netflix as an example (values from table 2), the SCL outperforms the PBS by as much as 26.2% and 11233.3% for the recommendation accuracy of the low-degree objects $r_{k\le 10}$ and $R_{k\le 10}$; 9.8%, 11.1% and 1.4% for the overall recommendation accuracy $r$, $P$ and $R$; 35.9% for the novelty $N_L$; and 24.3% and 81.6% for the inter-diversity $D_{Inter}$ and the inner-diversity $D_{Inner}$. Due to the zero value of $P_{k\le 10}$ for the PBS, the improvement of the SCL against the PBS is infinite for $P_{k\le 10}$. Similarly outstanding performance of the SCL against the PBS is also observed for the MovieLens and the RYM. This indicates that the SCL is highly accurate.

The HHP is excellent in both accuracy and diversity at the optimal value of the tunable parameter. Compared with the HHP at the optimal value of the tunable parameter evaluated by the ranking score, the SCL presents a slightly lower overall recommendation accuracy, but a much greater advantage in the recommendation accuracy of the cold objects. Moreover, the SCL outperforms the HHP in the novelty $N_L$, as well as in both the inter-diversity $D_{Inter}$ and the inner-diversity $D_{Inner}$, for all three datasets. Taking the Netflix as an example (values from table 2), the HHP is 2.2% more advantageous than the SCL in the overall ranking score. However, the ranking score for the cold objects $r_{k\le 10}$ of the SCL is 14.4% more advantageous than that of the HHP, and the improvement of the SCL against the HHP is as high as 100.0% and 93.2% for the precision $P_{k\le 10}$ and the recall $R_{k\le 10}$ of the cold objects. This also suggests that the SCL is outstanding in the cold start problem while keeping a high recommendation accuracy. Notably, the improvement of the SCL against the HHP in the novelty $N_L$, the inter-diversity $D_{Inter}$ and the inner-diversity $D_{Inner}$ reaches 18.8%, 10.0% and 14.3%, respectively.

The OHHP method has been reported to be more advantageous in the cold start problem than the HHP. Compared with the OHHP at the optimal value of the tunable parameter defined by the ranking score, the SCL method further improves the recommendation accuracy of the cold objects. Also, the SCL outperforms the OHHP in the novelty, the inter-diversity and the inner-diversity for all the three datasets.

The cold start problem is a long-standing challenge in traditional recommendation systems, since it is difficult for users to become aware of the cold objects due to the lack of sufficient auxiliary information [42]. Basically, the cold start problem can be divided into two categories [44]: i) cold user start [45] and ii) cold object start [46]. The former focuses on recommending objects to new users, while the latter aims to design algorithms that push new objects, which is exactly what we try to solve in this paper. Most studies in this area try to generate recommendations by using additional information, such as trust relationships [47], social network structure [48], tags [21], [30], [31], [41], [49], [50], etc. [51]. However, this increases the system complexity. In addition, for most systems, the cold objects make up a large proportion. In the Netflix, MovieLens and RYM, the cold objects whose degrees are no more than 10 account for as much as Inline graphic, Inline graphic, and Inline graphic. Developing effective information filtering techniques is therefore essential for solving the cold start problem. Without any additional information, the SCL greatly improves the recommendation accuracy of the cold objects.

To further understand the cold start efficiency of the four algorithms, we investigate the object-degree-dependent ranking score Inline graphic vs. the object degree Inline graphic. As shown in figure 4, the Inline graphic of the low-degree objects is much smaller for the SCL than for the PBS and the HHP in all three datasets, and even slightly smaller than for the OHHP in MovieLens and RYM, while remaining close to that of the other methods for popular, high-degree objects. This suggests that the SCL significantly improves the recommendation accuracy for cold objects.

Figure 4. The object-degree dependent ranking score Inline graphic vs. the object degree.

The black, red, green and blue lines are for the HHP, PBS, OHHP and SCL methods, respectively.
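The degree-dependent ranking score plotted in figure 4 can be sketched as follows. This is a minimal illustration that assumes the ranking score of a test link is the position of the object in the user's ordered list of uncollected objects divided by the length of that list, so that smaller values mean higher accuracy. The function name and the input structures (`test_links`, `ranked_lists`, `object_degree`) are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def ranking_score_by_degree(test_links, ranked_lists, object_degree):
    """Average the ranking score of test links, grouped by object degree.

    test_links    : iterable of (user, object) pairs held out for testing
    ranked_lists  : dict user -> list of uncollected objects, best first
    object_degree : dict object -> degree in the training set
    Raises ValueError if a test object is missing from the user's list.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for user, obj in test_links:
        ranked = ranked_lists[user]
        # Relative position in (0, 1]; 1/len(ranked) is the best possible score.
        score = (ranked.index(obj) + 1) / len(ranked)
        k = object_degree[obj]
        sums[k] += score
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}


toy_ranked = {"u1": ["a", "b", "c", "d"], "u2": ["b", "a", "c", "d"]}
toy_degree = {"a": 5, "b": 3, "c": 2, "d": 1}
toy_test = [("u1", "a"), ("u2", "a"), ("u1", "d")]
print(ranking_score_by_degree(toy_test, toy_ranked, toy_degree))
```

Plotting the returned values against the degree for each method would give curves of the kind described for figure 4.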

We then study the degree distribution Inline graphic of the objects in the top Inline graphic recommendation list in figure 5. The Inline graphic of the cold objects is much greater for the SCL than for the PBS, the HHP and the OHHP, which indicates that the SCL indeed contributes greatly to the recommendation efficiency of the cold objects.

Figure 5. The degree distribution Inline graphic of the objects in the top Inline graphic recommendation list.

The black, red, green and blue lines are for the HHP, PBS, OHHP and SCL methods, respectively.
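A degree distribution of the recommended objects, of the kind shown in figure 5, could be computed along the following lines. This is a sketch under stated assumptions: the function name, the input structures and the normalization (fraction of all recommended slots occupied by objects of a given degree) are illustrative choices, not the paper's exact procedure.

```python
from collections import Counter


def top_l_degree_distribution(ranked_lists, object_degree, list_length):
    """Fraction of top-`list_length` recommendation slots per object degree.

    ranked_lists  : dict user -> list of recommended objects, best first
    object_degree : dict object -> degree in the training set
    """
    counts = Counter()
    for ranked in ranked_lists.values():
        for obj in ranked[:list_length]:
            counts[object_degree[obj]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {k: c / total for k, c in counts.items()}


toy_ranked = {"u1": ["a", "b", "c"], "u2": ["b", "c", "a"]}
toy_degree = {"a": 1, "b": 3, "c": 3}
print(top_l_degree_distribution(toy_ranked, toy_degree, list_length=2))
```

A method that recommends more cold objects puts more weight on the low-degree end of this distribution.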

Besides the cold start problem, diversity and novelty are also important indicators of the vitality of personalized recommendation. Recommendation accuracy and diversity have been regarded as a dilemma pair, as have accuracy and novelty. Typical examples are the PBS and HTS algorithms: the PBS is more accurate but less diverse and novel, whereas the HTS is more diverse and novel but less accurate.

Intuitively, improving the recommendation accuracy of the cold objects should also improve the recommendation novelty and diversity. However, by comparing the OHHP with the original HHP, we find that the novelty, the inter-diversity and the inner-diversity of the HHP outperform those of the OHHP for all three datasets, even though the OHHP greatly improves the recommendation accuracy of the cold objects. To better understand this phenomenon, we show the optimal value of the tunable parameter as a function of the object average degree for the OHHP and the SCL in figure 6, where the curve of the SCL is obtained from the empirical study. The curve obtained from the SCL is more heterogeneous than that obtained from the OHHP, which can partially explain why the OHHP method improves the recommendation accuracy of the cold objects but does not simultaneously enhance the recommendation novelty and diversity. Compared with the OHHP, the SCL not only further improves the recommendation accuracy of the cold objects, but also improves the recommendation novelty and diversity.

Figure 6. The tunable parameter Inline graphic vs. the object degree Inline graphic.

The black and red lines are for the SCL and OHHP methods, respectively.

To show how the novelty evolves with the recommendation list length, we then study the novelty Inline graphic as a function of the recommendation list length Inline graphic. As shown in figure 7, for all three datasets, the Inline graphic of the SCL is much smaller than that of the PBS, the HHP and the OHHP over the whole investigated range of the recommendation list length. Also, the novelty of the SCL remains quite stable as the recommendation list length varies for all three datasets. This supports the conclusion that the SCL is advantageous in novelty.

Figure 7. The novelty Inline graphic vs. the recommendation list length Inline graphic.

The black, red, green and blue lines are for the HHP, PBS, OHHP and SCL methods, respectively.
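The dependence of the novelty on the list length can be illustrated with the sketch below. It assumes that the novelty is measured as the mean degree of the objects in the top part of every user's list, so that a smaller value indicates more novel recommendations, consistent with the discussion of figure 7. The function name and the input structures are hypothetical.

```python
def novelty(ranked_lists, object_degree, list_length):
    """Mean degree of the objects in all users' top-`list_length` lists.

    Assumed definition: a smaller value means more novel recommendations.
    ranked_lists  : dict user -> list of recommended objects, best first
    object_degree : dict object -> degree in the training set
    """
    degrees = [
        object_degree[obj]
        for ranked in ranked_lists.values()
        for obj in ranked[:list_length]
    ]
    if not degrees:
        raise ValueError("no recommended objects to average over")
    return sum(degrees) / len(degrees)


toy_ranked = {"u1": ["a", "b", "c"], "u2": ["b", "c", "a"]}
toy_degree = {"a": 1, "b": 3, "c": 3}
for length in (1, 2, 3):
    print(length, novelty(toy_ranked, toy_degree, length))
```

Evaluating the function over a range of list lengths gives one curve per method.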

Further investigation of the inter-diversity Inline graphic as a function of the recommendation list length Inline graphic shows that, for all four methods, the inter-diversity decreases with the recommendation list length Inline graphic, as shown in figure 8. This is reasonable, since the difference between different users' recommendation lists decreases as the recommendation list length Inline graphic increases. Compared with the PBS, the HHP and the OHHP, the SCL exhibits a much higher value. Moreover, the inter-diversity of the SCL decays more slowly over the whole range of the recommendation list length Inline graphic for Netflix and MovieLens. For RYM, the inter-diversity Inline graphic of the SCL is also higher than that of the PBS and the OHHP, and similar to that of the HHP as the recommendation list length varies. This also indicates that the SCL is advantageous in recommendation diversity.

Figure 8. The inter-diversity Inline graphic vs. the recommendation list length Inline graphic.

The black, red, green and blue lines are for the HHP, PBS, OHHP and SCL methods, respectively.
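The decrease of the inter-diversity with the list length can be reproduced on toy data with the following sketch. It assumes the Hamming-distance form of the inter-diversity, in which the distance between two users is one minus the fraction of common objects in their top lists, averaged over all user pairs. The function name and the inputs are illustrative assumptions.

```python
from itertools import combinations


def inter_diversity(ranked_lists, list_length):
    """Mean Hamming distance between the top-`list_length` lists of user pairs.

    Assumed definition: distance = 1 - (number of common objects) / list_length.
    ranked_lists : dict user -> list of recommended objects, best first
    """
    pairs = list(combinations(ranked_lists, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for u, v in pairs:
        top_u = set(ranked_lists[u][:list_length])
        top_v = set(ranked_lists[v][:list_length])
        total += 1 - len(top_u & top_v) / list_length
    return total / len(pairs)


toy_ranked = {"u1": ["a", "b", "c"], "u2": ["d", "b", "c"]}
for length in (1, 2, 3):
    print(length, inter_diversity(toy_ranked, length))
```

In this toy case the two lists differ only in their first entry, so the distance shrinks as more shared objects enter the lists.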

A similar advantage of the SCL is also found for the inner-diversity Inline graphic, as shown in figure 9. The Inline graphic increases with Inline graphic for all four algorithms on Netflix, MovieLens, and RYM, and the Inline graphic of the SCL is higher than that of the other three methods.

Figure 9. The inner-diversity Inline graphic vs. the recommendation list length Inline graphic.

The black, red, green and blue lines are for the HHP, PBS, OHHP and SCL methods, respectively.
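For completeness, a possible computation of the inner-diversity is sketched below. The definition used here is an assumption: one minus the mean pairwise cosine similarity between the objects in a user's top list, averaged over users, with the similarity of two objects given by the number of users they share divided by the square root of the product of their degrees. The function name and inputs are hypothetical, and the exact definition used for figure 9 may differ.

```python
from itertools import combinations
from math import sqrt


def inner_diversity(ranked_lists, object_users, list_length):
    """One minus the mean pairwise object similarity within each top list.

    ranked_lists : dict user -> list of recommended objects, best first
    object_users : dict object -> set of users who collected it (training set)
    Users whose top list holds fewer than two objects are skipped.
    """
    per_user = []
    for ranked in ranked_lists.values():
        pairs = list(combinations(ranked[:list_length], 2))
        if not pairs:
            continue
        similarity = 0.0
        for a, b in pairs:
            users_a, users_b = object_users[a], object_users[b]
            if users_a and users_b:
                # Cosine similarity between the two objects' user sets.
                similarity += len(users_a & users_b) / sqrt(len(users_a) * len(users_b))
        per_user.append(1 - similarity / len(pairs))
    return sum(per_user) / len(per_user) if per_user else 0.0


toy_object_users = {"a": {"u1", "u2"}, "b": {"u1", "u2"}, "c": {"u3"}}
toy_ranked = {"x": ["a", "b", "c"]}
for length in (2, 3):
    print(length, inner_diversity(toy_ranked, toy_object_users, length))
```

In the toy data, objects "a" and "b" are collected by the same users, so a list containing only these two has zero inner-diversity, and adding the unrelated object "c" raises it.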

Taken together, without searching for the optimal value of the tunable parameter according to any particular evaluator, but instead deriving it from the scaling function, the SCL remarkably outperforms the PBS, the HHP, and the OHHP in the recommendation accuracy of cold objects, as well as in the recommendation novelty and diversity, while keeping a high overall recommendation accuracy.

Conclusion

In conclusion, we have proposed a scaling-based (SCL) recommendation algorithm, in which the optimal value of the tunable parameter is derived from a scaling function that is independent of the recommendation list length, via a rescaling procedure. Based on three real datasets, Netflix, MovieLens and RYM, the optimal value of the tunable parameter is observed to be heterogeneous across individual objects in the SCL algorithm. Experimental results show that the SCL algorithm not only achieves a high accuracy, but also significantly improves the performance in three other important aspects of personalized recommendation: improving the novelty, solving the long-standing cold start problem, and resolving the accuracy-diversity dilemma.

A dilemma common to many algorithms is how to find the proper value of the tunable parameter for different recommendation focuses, e.g., the accuracy, the diversity, or the cold start problem. There is no doubt that recommendation accuracy is one of the most important evaluators of algorithm performance. However, even when the recommendation accuracy is used as the reference to search for the optimal value of the tunable parameter, the optimal value may differ between accuracy evaluators. By finding a scaling function independent of the recommendation list length based on empirical data, we resolve the dilemma of selecting the optimal value of the tunable parameter when different recommendation focuses conflict with one another.

Funding Statement

The work was supported by the National Natural Science Foundation of China (grant nos. 11175079 and 11105024). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Adomavicius G, Tuzhilin A (2005) Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Trans Knowl Data Eng 17: 734–749.
  • 2. Goldberg D, Nichols D, Oki BM, Terry D (1992) Using collaborative filtering to weave an information tapestry. Commun ACM 35: 61–70.
  • 3. Schafer JB, Frankowski D, Herlocker J, Sen S (2007) Collaborative filtering recommender systems. In: The adaptive web, Springer. 291–324.
  • 4. Breese JS, Heckerman D, Kadie C (1998) Empirical analysis of predictive algorithms for collaborative filtering. In: Proc. 14th Conf. Uncertainty Artif. Intel. Morgan Kaufmann Publishers Inc., 43–52.
  • 5. Nakamura A, Abe N (1998) Collaborative filtering using weighted majority prediction algorithms. In: Proc. 5th Intl. Conf. Mach. Learn. 395–403.
  • 6. Delgado J, Ishii N (1999) Memory-based weighted majority prediction. In: SIGIR Workshop Recomm. Syst. Citeseer.
  • 7. Getoor L, Sahami M (1999) Using probabilistic relational models for collaborative filtering. In: Workshop Web Usage Anal. User Profil. Citeseer.
  • 8. Hofmann T (2003) Collaborative filtering via gaussian probabilistic latent semantic analysis. In: Proc. 26th Ann. Intl. SIGIR Conf. Research Devel. Infor. Retr. ACM, 259–266.
  • 9. Billsus D, Pazzani MJ (2000) User modeling for adaptive news access. User Model User-Adap 10: 147–180.
  • 10. Marlin B (2003) Modeling user rating profiles for collaborative filtering. Adv Neural Inf Process Syst 16.
  • 11. Pazzani MJ, Billsus D (2007) Content-based recommendation systems. In: The adaptive web, Springer. 325–341.
  • 12. Lipczak M, Hu Y, Kollet Y, Milios E (2009) Tag sources for recommendation in collaborative tagging systems. Proc ECML/PKDD Discovery Challenge: 157–172.
  • 13. Cantador I, Vallet D, Jose JM (2009) Measuring vertex centrality in co-occurrence graphs for online social tag recommendation. Proc ECML/PKDD Discovery Challenge: 17–33.
  • 14. Ju S, Hwang KB (2009) A weighting scheme for tag recommendation in social bookmarking systems. In: Proc. ECML/PKDD Discovery Challenge. 109–118.
  • 15. Balabanović M, Shoham Y (1997) Fab: content-based, collaborative recommendation. Comm ACM 40: 66–72.
  • 16. Goldberg K, Roeder T, Gupta D, Perkins C (2001) Eigentaste: A constant time collaborative filtering algorithm. Infor Retr 4: 133–151.
  • 17. Hofmann T (2004) Latent semantic models for collaborative filtering. ACM Trans Inf Syst 22: 89–115.
  • 18. Blei DM, Ng AY, Jordan MI (2003) Latent dirichlet allocation. J Mach Learn Res 3: 993–1022.
  • 19. Laureti P, Moret L, Zhang YC, Yu YK (2006) Information filtering via iterative refinement. EPL 75: 1006.
  • 20. Ren J, Zhou T, Zhang YC (2008) Information filtering via self-consistent refinement. EPL 82: 58007.
  • 21. Zhang ZK, Zhou T, Zhang YC (2011) Tag-aware recommender systems: a state-of-the-art survey. J Comput Sci Technol 26: 767–777.
  • 22. Lü L, Medo M, Yeung CH, Zhang YC, Zhang ZK, et al. (2012) Recommender systems. Phys Rep 519: 1–49.
  • 23. Zhang YC, Blattner M, Yu YK (2007) Heat conduction process on community networks as a recommendation model. Phys Rev Lett 99: 154301.
  • 24. Zhou T, Ren J, Medo M, Zhang YC (2007) Bipartite network projection and personal recommendation. Phys Rev E 76: 046115.
  • 25. Liu C, Zhou WX (2012) Heterogeneity in initial resource configurations improves network-based hybrid recommendation algorithm. Physica A 391: 5704–5711.
  • 26. Zhou T, Su RQ, Liu RR, Jiang LL, Wang BH, et al. (2009) Accurate and diverse recommendations via eliminating redundant correlations. New J Phys 11: 123008.
  • 27. Liu J, Deng G (2009) Link prediction in a user–object network based on time-weighted resource allocation. Physica A 388: 3643–3650.
  • 28. Zhou T, Kuscsik Z, Liu JG, Medo M, Wakeling JR, et al. (2010) Solving the apparent diversity-accuracy dilemma of recommender systems. Proc Natl Acad Sci USA 107: 4511–4515.
  • 29. Liu RR, Jia CX, Zhou T, Sun D, Wang BH (2009) Personal recommendation via modified collaborative filtering. Physica A 388: 462–468.
  • 30. Zhang ZK, Zhou T, Zhang YC (2010) Personalized recommendation via integrated diffusion on user–item–tag tripartite graphs. Physica A 389: 179–186.
  • 31. Shang MS, Zhang ZK, Zhou T, Zhang YC (2010) Collaborative filtering with diffusion-based similarity on tripartite graphs. Physica A 389: 1259–1264.
  • 32. Liu JG, Zhou T, Wang BH, Zhang YC, Guo Q (2009) Effects of user’s tastes on personalized recommendation. Int J Mod Phys C 20: 1925–1932.
  • 33. Liu JG, Zhou T, Che HA, Wang BH, Zhang YC (2010) Effects of high-order correlations on personalized recommendations for bipartite networks. Physica A 389: 881–886.
  • 34. Zeng A, Yeung CH, Shang MS, Zhang YC (2012) The reinforcing influence of recommendations on global diversification. EPL 97: 18005.
  • 35. Liu JG, Zhou T, Guo Q (2011) Information filtering via biased heat conduction. Phys Rev E 84: 037101.
  • 36. Liang H, Xu Y, Li Y, Nayak R, Tao X (2010) Connecting users and items with weighted tags for personalized item recommendations. In: Proc. 21st ACM Conf. Hypertext hypermedia. ACM, 51–60.
  • 37. Cheng F, Liu C, Jiang J, Lu W, Li W, et al. (2012) Prediction of drug-target interactions and drug repositioning via network-based inference. PLoS Comput Biol 8: e1002503.
  • 38. Qiu T, Chen G, Zhang ZK, Zhou T (2011) An item-oriented recommendation algorithm on cold-start problem. EPL 95: 58003.
  • 39. Zhou T, Jiang LL, Su RQ, Zhang YC (2008) Effect of initial configuration on network-based recommendation. EPL 81: 58004.
  • 40. Lü L, Liu W (2011) Information filtering via preferential diffusion. Phys Rev E 83: 066119.
  • 41. Zhang ZK, Liu C, Zhang YC, Zhou T (2010) Solving the cold-start problem in recommender systems with social tags. EPL 92: 28002.
  • 42. Ahn HJ (2008) A new similarity measure for collaborative filtering to alleviate the new user cold-starting problem. Inf Sci 178: 37–51.
  • 43. Herlocker JL, Konstan JA, Terveen LG, Riedl JT (2004) Evaluating collaborative filtering recommender systems. ACM Trans Inf Syst 22: 5–53.
  • 44. Papagelis M, Plexousakis D (2005) Qualitative analysis of user-based and item-based prediction algorithms for recommendation agents. Engin Appl Artif Intel 18: 781–789.
  • 45. Lam XN, Vu T, Le TD, Duong AD (2008) Addressing cold-start problem in recommendation systems. In: Proc. 2nd Intl. Conf. Ubiquitous Infor. Manag. Commun. ACM, 208–211.
  • 46. Park YJ, Tuzhilin A (2008) The long tail of recommender systems and how to leverage it. In: Proc. 2008 ACM Conf. Recomm. Syst. ACM, 11–18.
  • 47. Jamali M, Ester M (2009) Trustwalker: a random walk model for combining trust-based and item-based recommendation. In: Proc. 15th ACM SIGKDD Intl Conf. Knowl. Disc. Data Mining. ACM, 397–406.
  • 48. Groh G, Ehmig C (2007) Recommendations in taste related domains: collaborative filtering vs. social filtering. In: Proc. 2007 Intl. Conf. Supporting Group Work. ACM, 127–136.
  • 49. Zhang ZK, Liu C (2012) Hybrid recommendation algorithm based on two roles of social tags. Int J Bifurcat Chaos 22: 1250166.
  • 50. Kim HN, Ji AT, Ha I, Jo GS (2010) Collaborative filtering based on collaborative tagging for enhancing the quality of recommendation. Electron Commerce Research Appl 9: 73–83.
  • 51. Chu W, Park ST (2009) Personalized recommendation on dynamic content using predictive bilinear models. In: Proc. 18th Intl. Conf. World Wide Web. ACM, 691–700.

Articles from PLoS ONE are provided here courtesy of PLOS