The initial distribution of feature values affects the results of the matrix factorization (SVD) algorithm (this implementation). In this post, let's look at the performance of the SVD algorithm with different distributions of initial values. To conduct the experiments, I used the LensKit framework and the MovieLens 100K dataset. The experiments include three distributions:
- Fixed values (0.1) (Fixed)
- Random values (Random)
- Popularity distribution for item features and random values for user features (POP)
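The three initialization schemes can be sketched in a few lines of plain Python. This is a minimal sketch, not LensKit code. The post does not say how the values are drawn, so two things here are assumptions: the uniform [0, 0.1) range for Random, and the POP formula, in which each item's features start at 0.1 times its rating count divided by the largest rating count. The helper names are hypothetical.

```python
import random

FIXED_VALUE = 0.1  # the constant used by the "Fixed" variant in this post


def init_fixed(n_rows, n_features, value=FIXED_VALUE):
    """Fixed: every feature of every row starts at the same constant."""
    return [[value] * n_features for _ in range(n_rows)]


def init_random(n_rows, n_features, rng, scale=0.1):
    """Random: features drawn uniformly from [0, scale) (assumed range)."""
    return [[rng.random() * scale for _ in range(n_features)] for _ in range(n_rows)]


def init_popularity(rating_counts, n_features, scale=0.1):
    """POP item features (hypothetical formula): each item's features start at
    scale * (its rating count / the largest rating count)."""
    max_count = max(rating_counts) or 1  # avoid dividing by zero
    return [[scale * (count / max_count)] * n_features for count in rating_counts]


rng = random.Random(42)
user_features = init_random(3, 4, rng)           # POP keeps random user features
item_features = init_popularity([50, 10, 0], 4)  # toy rating counts for 3 items
```

Under this assumed POP scheme, all features of an item start at the same value. The random user features are what make them diverge during training.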
I conducted experiments in two settings: (1) each algorithm ranks only the test items (the rating of every test item is known), and (2) each algorithm ranks all the items (items without ratings are considered irrelevant to a user). The results of the experiments are presented below.
Figure 2. An experimental setting where each algorithm ranks all the movies
Figure 3. Unpopularity values of the experiment, where each algorithm ranks all the movies
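The post does not define the unpopularity value plotted in Figure 3. Purely as an assumed definition for illustration, one simple measure works as follows. Normalize each recommended movie's rating count by the largest count, subtract the result from 1, and average over the recommendations. Higher values then mean less popular suggestions. `unpopularity` is a hypothetical helper.

```python
def unpopularity(recommended, rating_counts):
    """Assumed metric: mean of (1 - count / max_count) over the recommended items,
    so 0.0 means only the most-rated item and 1.0 means only never-rated items."""
    max_count = max(rating_counts.values()) or 1  # avoid dividing by zero
    total = sum(1 - rating_counts.get(item, 0) / max_count for item in recommended)
    return total / len(recommended)


counts = {"A": 100, "B": 50, "C": 0}     # toy rating counts per movie
print(unpopularity(["A", "B"], counts))  # 0.25
print(unpopularity(["B", "C"], counts))  # 0.75
```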
The following observations can be made:
- POP outperforms the others when ranking all the movies (Figure 2), because it suggests popular movies (Figure 3) and users tend to rate popular movies.
- Fixed outperforms the others when ranking only the test movies.
- Random underperforms the other algorithms.
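The two evaluation settings can be sketched as follows, to show why setting (2) rewards suggesting movies that users actually rated. This is a minimal sketch under assumptions. The post does not name its accuracy metric or relevance rule, so precision@k and a rating threshold of 4 are stand-ins, and all helper names and scores are hypothetical.

```python
def relevant_items(test_ratings, threshold=4.0):
    """Assumed relevance rule: a test item is relevant if rated >= threshold."""
    return {item for item, rating in test_ratings.items() if rating >= threshold}


def candidates_test_only(test_ratings):
    """Setting (1): rank only the user's test items, whose ratings are known."""
    return list(test_ratings)


def candidates_all_items(all_items, train_items):
    """Setting (2): rank every item not used for training; items without a
    test rating can never be relevant."""
    return [item for item in all_items if item not in train_items]


def rank(candidates, scores):
    """Sort candidates by model score, best first."""
    return sorted(candidates, key=lambda item: scores[item], reverse=True)


def precision_at_k(ranked, relevant, k):
    return sum(1 for item in ranked[:k] if item in relevant) / k


# Toy data for one user, with hypothetical scores from some trained model.
test = {"A": 5, "B": 2, "C": 4}
train = {"D", "E"}
items = ["A", "B", "C", "D", "E", "F", "G"]
scores = {"A": 0.9, "B": 0.8, "C": 0.1, "F": 0.95, "G": 0.92}
relevant = relevant_items(test)  # {"A", "C"}

p_test_only = precision_at_k(rank(candidates_test_only(test), scores), relevant, 2)
p_all_items = precision_at_k(rank(candidates_all_items(items, train), scores), relevant, 2)
print(p_test_only, p_all_items)  # 0.5 0.0
```

In setting (2), the unrated movies F and G outrank the relevant test movie A and count as misses. An algorithm therefore scores well there only if its top suggestions are movies the user actually rated, and popular movies are more likely to be rated.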
SVD with the popularity distribution (POP) suggests the most popular items of the three algorithms. Meanwhile, the SVD variant with fixed values suggests less popular items than POP but still outperforms Random in both settings.
It may seem that the random distribution is the worst choice for the SVD algorithm, but before drawing this conclusion, let's look at the actual recommendations for three randomly selected users:
Figure 4. Movie ids recommended by the three algorithms
According to Figure 4, the SVD algorithm with fixed initial values suggests the same top-3 movies to all three users, and according to Figure 3, the algorithm with the popularity distribution suggests mostly popular movies. Even though Random performs worse than the other two distributions, it seems to generate more novel and personalized recommendations. Surprisingly, most of the recommended movies differ from one distribution to another. Depending on the goals of the recommender, one might choose the distribution that suits them best.
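One possible explanation for the identical top-3 lists, offered as an assumption that the post does not test, concerns how the features are updated. Suppose the implementation updates all features of a (user, item) pair at once with plain SGD. Then starting every feature at 0.1 gives all features identical gradients, so they stay equal forever and the model collapses to a single effective feature. The sketch below checks this on toy data. It is a generic regularized SGD factorization, not the implementation used in the post. It uses a larger learning rate than the post's 0.0001 so the toy run moves noticeably, and all names are hypothetical.

```python
import random


def train_sgd(ratings, n_users, n_items, n_features, init, epochs=50, lr=0.01, reg=0.01):
    """Plain regularized SGD matrix factorization that updates all features of
    a (user, item) pair at once. `init(n_rows, n_features)` builds a matrix."""
    p = init(n_users, n_features)  # user features
    q = init(n_items, n_features)  # item features
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(pf * qf for pf, qf in zip(p[u], q[i]))
            for f in range(n_features):
                pu, qi = p[u][f], q[i][f]  # use pre-update values for both
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return p, q


def rows_constant(matrix):
    """True if, in every row, all features hold exactly the same value."""
    return all(len(set(row)) == 1 for row in matrix)


def init_fixed(n_rows, n_features):
    return [[0.1] * n_features for _ in range(n_rows)]


def make_init_random(seed):
    rng = random.Random(seed)

    def init_random(n_rows, n_features):
        return [[rng.random() * 0.1 for _ in range(n_features)] for _ in range(n_rows)]

    return init_random


# Toy (user, item, rating) triples; not MovieLens data.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 1, 4.0), (1, 2, 1.0), (2, 0, 2.0), (2, 2, 5.0)]
p_fixed, q_fixed = train_sgd(ratings, 3, 3, 4, init_fixed)
p_rand, q_rand = train_sgd(ratings, 3, 3, 4, make_init_random(0))
print(rows_constant(p_fixed), rows_constant(q_fixed))  # True True
print(rows_constant(p_rand))
```

With the fixed start, user u's features all equal some a_u and item i's features all equal some b_i. The score is then k * a_u * b_i, where k is the number of features. As long as the a_u stay positive, every user gets the same item ranking. Implementations that train one feature at a time on the residuals of the previous features, as in Simon Funk's original approach, do not share this exact symmetry. This may be one reason FunkSVD could behave differently.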
Data split: 20 test ratings per user; the remaining ratings are training data.
Features: 25
Learning rate: 0.0001
Regularization: 0.01
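The data split and hyperparameters above can be captured in a small configuration object plus a per-user holdout. This is a hedged sketch. The post does not say how the 20 test ratings are chosen, so random sampling per user is an assumption, and `ExperimentConfig` and `split_per_user` are hypothetical names rather than LensKit APIs.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Settings reported in the post."""
    test_ratings_per_user: int = 20
    n_features: int = 25
    learning_rate: float = 0.0001
    regularization: float = 0.01


def split_per_user(ratings, n_test, rng):
    """Hold out n_test randomly chosen ratings of each user as test data;
    all remaining ratings become training data."""
    by_user = defaultdict(list)
    for user, item, rating in ratings:
        by_user[user].append((user, item, rating))
    train, test = [], []
    for user_ratings in by_user.values():
        rng.shuffle(user_ratings)
        test.extend(user_ratings[:n_test])
        train.extend(user_ratings[n_test:])
    return train, test


config = ExperimentConfig()
# Toy data: user 0 rated 25 items, user 1 rated 22 items.
ratings = [(0, i, 4.0) for i in range(25)] + [(1, i, 3.0) for i in range(22)]
train, test = split_per_user(ratings, config.test_ratings_per_user, random.Random(1))
print(len(train), len(test))  # 7 40
```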
In this post, I used this implementation of SVD. In other implementations (for example, FunkSVD), the situation could be different.