LensKit 2 has an embedded evaluation module. In this post I describe how it splits datasets.
Let's consider an example in which we are given a data file with ratings from 4 users (the file is listed further below).
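The data file corresponds to the following user-item matrix (rows are users, columns are items, "-" marks a missing rating):
user  i1  i2  i3  i4  i5  i6
1      5   5   4   3   2   1
2      1   2   3   4   5   5
3      3   3   2   2   5   5
4      2   2   4   -   -   -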
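To make the correspondence between the data file and the matrix concrete, here is a minimal Python sketch that builds the matrix from the rating rows. It assumes each row reads as user, item, rating separated by whitespace; the names `RATINGS`, `build_matrix` and `format_matrix` are my own and are not part of LensKit.

```python
# Rows of the example data file, assumed to be "user item rating".
RATINGS = """\
1 1 5
1 2 5
1 3 4
1 4 3
1 5 2
1 6 1
2 1 1
2 2 2
2 3 3
2 4 4
2 5 5
2 6 5
3 1 3
3 2 3
3 3 2
3 4 2
3 5 5
3 6 5
4 1 2
4 2 2
4 3 4
"""


def build_matrix(text):
    """Parse whitespace-separated triples into {user: {item: rating}}."""
    matrix = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        user, item, rating = (int(x) for x in line.split())
        matrix.setdefault(user, {})[item] = rating
    return matrix


def format_matrix(matrix):
    """Render the matrix as text, with '-' for missing ratings."""
    items = sorted({i for row in matrix.values() for i in row})
    lines = ["user " + " ".join(f"i{i:<2}" for i in items)]
    for user in sorted(matrix):
        cells = " ".join(f"{matrix[user].get(i, '-'):>3}" for i in items)
        lines.append(f"{user:<4} {cells}")
    return "\n".join(lines)


print(format_matrix(build_matrix(RATINGS)))
```

User 4 ends up with only three entries, which matters for the holdout behaviour described next.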
If we ask LensKit to split our data with crossfold 1 and holdout 3, we receive two files, test.0.csv and train.0.csv, each of which corresponds to a user-item matrix.
LensKit hid 3 ratings of each user (they are moved out of the training data), regardless of how many ratings the user has. For example, user 4 has only 3 ratings, so in the training dataset user 4 does not have any rating at all. If we set crossfold 2 and holdout 2, which corresponds to 2-fold cross-validation, we obtain a different split.
The framework selected half of the users for the first fold and half for the second fold. For each user in a fold, the framework hid 2 ratings.
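The behaviour described above can be imitated with a small Python sketch. This is not LensKit's code or API: the function `crossfold` and its parameters `partitions`, `holdout` and `seed` are hypothetical names, and two details are assumptions of mine rather than facts from the experiment: the hidden ratings are picked at random, and ratings of users outside the current fold all stay in the training set.

```python
import random

# The example data as {user: [rating for item 1, item 2, ...]}.
ROWS = {1: [5, 5, 4, 3, 2, 1], 2: [1, 2, 3, 4, 5, 5],
        3: [3, 3, 2, 2, 5, 5], 4: [2, 2, 4]}
RATINGS = [(u, i, r) for u, row in ROWS.items()
           for i, r in enumerate(row, start=1)]


def crossfold(ratings, partitions, holdout, seed=0):
    """Return one (train, test) pair of rating lists per fold.

    Users are divided into `partitions` groups. In each fold, up to
    `holdout` ratings of every user in that fold's group are hidden
    (moved to test); all other ratings go to train.
    """
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    rng.shuffle(users)
    groups = [set(users[k::partitions]) for k in range(partitions)]

    folds = []
    for group in groups:
        train, test, by_user = [], [], {}
        for rating in ratings:
            if rating[0] in group:
                by_user.setdefault(rating[0], []).append(rating)
            else:
                train.append(rating)  # assumption: other users stay in train
        for user_ratings in by_user.values():
            rng.shuffle(user_ratings)  # assumption: random choice of hidden ratings
            test.extend(user_ratings[:holdout])
            train.extend(user_ratings[holdout:])
        folds.append((train, test))
    return folds


# crossfold 1, holdout 3: every user loses 3 ratings, so user 4 has none left.
train, test = crossfold(RATINGS, partitions=1, holdout=3)[0]
print(len(train), len(test), sorted({u for u, _, _ in train}))

# crossfold 2, holdout 2: each fold hides 2 ratings for half of the users.
for train, test in crossfold(RATINGS, partitions=2, holdout=2):
    print(len(train), len(test), sorted({u for u, _, _ in test}))
```

Which ratings end up hidden depends on the seed here, so the sketch reproduces the sizes and the user assignment of the splits, not the exact contents of the files LensKit wrote.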
I could not find detailed documentation on data splitting in the LensKit framework. I hope this post helps.
Data file for the example, one rating per row (user, item, rating):
1 1 5
1 2 5
1 3 4
1 4 3
1 5 2
1 6 1
2 1 1
2 2 2
2 3 3
2 4 4
2 5 5
2 6 5
3 1 3
3 2 3
3 3 2
3 4 2
3 5 5
3 6 5
4 1 2
4 2 2
4 3 4