Something to say: Matrix Factorization: initial values

The initial distribution of feature values affects the results of the matrix factorization (SVD) algorithm (this implementation). In this post, let's look at the performance of the SVD algorithm with different distributions of initial values. To conduct the experiments, I used the LensKit framework and the MovieLens100K dataset. The experiments include three distributions:

Fixed values (0.1) (Fixed)
Random values (Random)
Popularity distribution for item features and random values for user features (POP)
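To make the three schemes concrete, here is a minimal NumPy sketch of how the user and item feature matrices could be initialized. It is not the code of the implementation used in this post: the function name `init_features`, the uniform range [0, 0.1) for Random, and the way POP maps rating counts to feature values (each item's count scaled to [0, 0.1]) are my own assumptions.

```python
import numpy as np

def init_features(n_users, n_items, n_features, item_counts, scheme, seed=0):
    """Return (user_features, item_features) for one of three hypothetical schemes."""
    rng = np.random.default_rng(seed)
    if scheme == "fixed":
        # Every entry starts at the same constant, 0.1 as in the post.
        users = np.full((n_users, n_features), 0.1)
        items = np.full((n_items, n_features), 0.1)
    elif scheme == "random":
        # Assumption: small uniform noise in [0, 0.1); the post does not state the range.
        users = rng.uniform(0.0, 0.1, size=(n_users, n_features))
        items = rng.uniform(0.0, 0.1, size=(n_items, n_features))
    elif scheme == "pop":
        # Assumption: each item's features start at its rating count scaled to [0, 0.1].
        counts = np.asarray(item_counts, dtype=float)
        scaled = 0.1 * counts / max(counts.max(), 1.0)
        items = np.tile(scaled[:, None], (1, n_features))
        # User features are random, as described in the post.
        users = rng.uniform(0.0, 0.1, size=(n_users, n_features))
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return users, items

users, items = init_features(3, 4, 25, item_counts=[50, 10, 5, 0], scheme="pop")
print(items[:, 0])
```

With POP, the most rated item gets the largest starting values, so popular items have a head start before training begins.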

I conducted experiments in two settings: (1) each algorithm ranks only the test items (we know the rating for each test item), and (2) each algorithm ranks all the items (items the user has not rated are considered irrelevant to that user). The results are presented below.
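The two settings differ only in which items are ranked for a user. A minimal sketch, assuming that a test rating of 4 or higher counts as relevant and that items already rated in training are excluded from the all-items ranking (neither detail is stated in the post); `evaluate_user` and its helpers are hypothetical names:

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

def rank(scores, candidates):
    """Sort candidate item ids by predicted score, highest first."""
    return sorted(candidates, key=lambda item: scores[item], reverse=True)

def evaluate_user(scores, test_ratings, all_items, train_items, k, threshold=4.0):
    """Precision@k for one user in both settings.

    scores: item -> predicted score; test_ratings: item -> held-out rating.
    """
    # Assumption: a test item is relevant if its rating is at least `threshold`.
    relevant = {item for item, r in test_ratings.items() if r >= threshold}
    # Setting 1: rank only the user's test items, whose ratings are all known.
    only_test = rank(scores, list(test_ratings))
    # Setting 2: rank all items not rated in training; items without a test rating are irrelevant.
    all_ranked = rank(scores, [i for i in all_items if i not in train_items])
    return precision_at_k(only_test, relevant, k), precision_at_k(all_ranked, relevant, k)

scores = {1: 4.5, 2: 3.9, 3: 4.8, 4: 2.0, 5: 4.9, 6: 4.7}
test = {1: 5.0, 2: 2.0, 4: 4.0}
print(evaluate_user(scores, test, all_items=range(1, 7), train_items={3}, k=2))  # (0.5, 0.0)
```

In this toy example, items 5 and 6 score high but have no test rating, so in the second setting they push the relevant items out of the top 2.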

Figure 1. An experimental setting where each algorithm ranks only test movies

Figure 2. An experimental setting where each algorithm ranks all the movies

Figure 3. Unpopularity values of the experiment, where each algorithm ranks all the movies

The following observations can be made:

POP outperforms the others when recommending from all the movies (Figure 2), as it suggests popular movies (Figure 3) and users tend to rate popular movies.
Fixed outperforms the others when ranking only the test movies (Figure 1).
Random underperforms the other two algorithms.

In the experimental setting with all the items, recommending popular items is clearly advantageous, as users tend to consume popular items. In the other setting this is not necessarily the case, but in our experiments popularity still seems to play an important role.
SVD with the popularity distribution (POP) suggests the most popular items among the three algorithms. Meanwhile, the SVD variant with fixed values suggests less popular items than POP, but it still outperforms Random in both settings.
It seems that the Random distribution is the worst choice for the SVD algorithm, but before drawing this conclusion, let's look at the actual recommendations for three randomly selected users:
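The post does not define the unpopularity values shown in Figure 3, so the sketch below is only one simple, hypothetical way to quantify how popular a set of recommendations is: the average number of training ratings of the recommended items. Higher values mean more popular suggestions. The function names and toy data are illustrative.

```python
from collections import Counter

def item_popularity(train_ratings):
    """Count the training ratings of each item; train_ratings holds (user, item, rating) triples."""
    return Counter(item for _, item, _ in train_ratings)

def mean_popularity(recommendations, popularity):
    """Average popularity of all items in the users' top-k lists (user -> list of item ids)."""
    # Counter returns 0 for items that never appear in training.
    counts = [popularity[item] for recs in recommendations.values() for item in recs]
    return sum(counts) / len(counts)

train = [(1, "a", 5), (2, "a", 4), (3, "a", 3), (1, "b", 4), (2, "c", 2)]
pop = item_popularity(train)
popular_lists = {1: ["a", "c"], 2: ["a", "b"]}
niche_lists = {1: ["b", "c"], 2: ["c", "d"]}
print(mean_popularity(popular_lists, pop), mean_popularity(niche_lists, pop))  # 2.0 0.75
```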

Figure 4. Movie ids recommended by the three algorithms

According to Figure 4, the SVD algorithm with fixed initial values suggests the same top-3 movies to all three users. According to Figure 3, the algorithm with the popularity distribution suggests mostly popular movies. Even though Random performs worse than the other two distributions, it seems to generate more novel and personalized recommendations. Surprisingly, most recommendations differ between the three distributions. Depending on the goals of the recommender, one can choose a suitable distribution.
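One possible explanation for the identical top-3 lists of Fixed: if every feature of every user and item starts at the same value and the model is trained with plain SGD that updates all features at once, each feature receives exactly the same update. The 25 features then stay identical, and the model behaves like a single-feature model whose prediction is 25 times a user scalar times an item scalar, so every user with a positive scalar gets the same item ordering. This assumes simultaneous updates, which I have not confirmed for the implementation used here; feature-by-feature training, as in FunkSVD, would likely break the symmetry. The sketch uses toy random ratings with the post's learning rate and regularization.

```python
import numpy as np

def sgd_epoch(P, Q, ratings, lr, reg):
    """One pass of plain SGD over (user, item, rating) triples, updating all features at once."""
    for u, i, r in ratings:
        err = r - P[u] @ Q[i]
        p_old = P[u].copy()  # the item update must use the user vector from before this step
        P[u] += lr * (err * Q[i] - reg * P[u])
        Q[i] += lr * (err * p_old - reg * Q[i])

rng = np.random.default_rng(0)
n_users, n_items, n_features = 5, 8, 25
# Toy ratings (1-5) for a random ~60% of user-item pairs.
ratings = [(u, i, float(rng.integers(1, 6)))
           for u in range(n_users) for i in range(n_items) if rng.random() < 0.6]

# Fixed initialization: every entry starts at 0.1.
P = np.full((n_users, n_features), 0.1)
Q = np.full((n_items, n_features), 0.1)
for _ in range(100):
    sgd_epoch(P, Q, ratings, lr=0.0001, reg=0.01)

# All 25 columns are still identical, so the score is n_features * P[u, 0] * Q[i, 0]
# and users with P[u, 0] > 0 all rank items by Q[i, 0].
print(np.allclose(P, P[:, :1]), np.allclose(Q, Q[:, :1]))
```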

Data split: 20 test ratings for each user. The remaining ratings are training data.

Features: 25

Learning rate: 0.0001

Regularization: 0.01
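The split and the settings above could be expressed as follows. This is a sketch, not the LensKit configuration used for the experiments: `SVDConfig` and `split_per_user` are hypothetical names, and picking the 20 test ratings at random (rather than, say, the most recent ones) is my assumption.

```python
import random
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class SVDConfig:
    # Values reported in the post.
    n_features: int = 25
    learning_rate: float = 0.0001
    regularization: float = 0.01
    test_per_user: int = 20

def split_per_user(ratings, n_test, seed=0):
    """Hold out n_test random ratings per user as test data; the rest is training data.

    ratings: list of (user, item, rating) triples. Users with n_test or fewer
    ratings end up with no training data.
    """
    by_user = defaultdict(list)
    for row in ratings:
        by_user[row[0]].append(row)
    rng = random.Random(seed)
    train, test = [], []
    for rows in by_user.values():
        rows = rows[:]  # shuffle a copy, leave the input untouched
        rng.shuffle(rows)
        test.extend(rows[:n_test])
        train.extend(rows[n_test:])
    return train, test

cfg = SVDConfig()
ratings = [(u, i, 3.0) for u in range(3) for i in range(30)]
train, test = split_per_user(ratings, cfg.test_per_user)
print(len(train), len(test))
```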

In this post, I used this implementation of SVD. In other implementations (for example, FunkSVD), the results could be different.

Thursday, 25 August 2016
