The post "Generate movie recommendations using Mahout" is a good introduction to Mahout. I've seen similar posts on the web and in books about Mahout that also use the same GroupLens Research movie ratings data.
I won't regurgitate the information from the tutorial linked above. Instead, let's look at the results a little more deeply.
The result file part-r-00000 contains the movie recommendation results, something like:
1 [234:5.0...
3 ...272:4.649266]
Each line can be parsed as a user ID followed by movie ID and recommendation score pairs. The user ID and movie IDs are easy enough to understand. The recommendation scores are useful if you are trying to predict a user's rating for an item, but they aren't useful for ranking recommendations.
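As a rough sketch, assuming each output line is a user ID, whitespace (a tab in Hadoop text output), and then a bracketed, comma-separated list of itemID:score pairs (the layout the sample above suggests), a line can be split into its fields like this. The function name parse_recommendation_line is hypothetical, and the sample line is made up for illustration.

```python
def parse_recommendation_line(line):
    """Split 'userID<TAB>[item:score,item:score,...]' into (user_id, [(item_id, score), ...]).

    Assumes the layout described above; adjust if your output differs.
    """
    user_part, recs_part = line.strip().split(None, 1)  # user ID, then the bracketed list
    user_id = int(user_part)
    body = recs_part.strip()[1:-1]  # drop the surrounding '[' and ']'
    recs = []
    for pair in body.split(","):
        if not pair:
            continue  # tolerate an empty list "[]"
        item, score = pair.split(":")
        recs.append((int(item), float(score)))
    return user_id, recs


if __name__ == "__main__":
    sample = "3\t[272:4.649266,302:4.5]"  # made-up line, not real job output
    print(parse_recommendation_line(sample))
```

Keeping the pairs as a list preserves the order in which the job wrote them.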
To provide ranked recommendations, SIMILARITY_LOGLIKELIHOOD works better. In short, SIMILARITY_LOGLIKELIHOOD doesn't use the user's item ratings; rather, for each pair of items it considers the overlapping and non-overlapping users, that is, which items each user did and did not interact with. To me, log likelihood is kind of like gathering user feedback: focus on what users do, not what they say they do. Did the user interact with, purchase, or consume an item? If so, that's what log likelihood uses.
More information on how log likelihood works can be found here.
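To make the idea concrete, here is a minimal Python sketch of Dunning's log-likelihood ratio for a pair of items, written in the entropy-based form that Mahout's LogLikelihood class is modeled on (treat it as an approximation, not Mahout's exact code). The four counts form a 2x2 table: k11 users who interacted with both items, k12 users who interacted with item B but not A, k21 users with A but not B, and k22 users with neither. The helper contingency_counts and the sample user sets are hypothetical. Note that no rating appears anywhere.

```python
import math


def x_log_x(x):
    # x * ln(x), using the convention 0 * ln(0) = 0
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized Shannon entropy of a set of counts
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    row_entropy = entropy(k11 + k12, k21 + k22)
    column_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off
    return max(0.0, 2.0 * (row_entropy + column_entropy - matrix_entropy))


def contingency_counts(users_a, users_b, total_users):
    """Hypothetical helper: build (k11, k12, k21, k22) from sets of user IDs."""
    both = len(users_a & users_b)
    only_b = len(users_b - users_a)
    only_a = len(users_a - users_b)
    neither = total_users - both - only_a - only_b
    return both, only_b, only_a, neither


if __name__ == "__main__":
    # Made-up interaction sets (user IDs), not taken from u.data
    movie_a = {1, 2, 3, 4}
    movie_b = {1, 2, 3, 5}
    counts = contingency_counts(movie_a, movie_b, total_users=10)
    print(counts, log_likelihood_ratio(*counts))
```

When the two items are independent (for example, all four counts equal) the ratio is 0; the more the co-occurrence departs from what chance predicts, the larger it gets.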
To use SIMILARITY_LOGLIKELIHOOD, you only need a file with the user ID and the ID of each item (movie, etc.) that the user has interacted with. Below, we'll use the same data file as the tutorial.
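The steps below reuse u.data as-is. The MovieLens 100K u.data file is tab-separated (user ID, item ID, rating, timestamp), so if you'd rather have a file holding only the interactions, a small Python sketch like this drops the last two columns. The function names and the output file name u_pairs.csv are hypothetical; Mahout's RecommenderJob documentation describes input lines of the form userID,itemID[,preferencevalue], but verify that your Mahout version accepts two-column input before relying on it.

```python
def to_user_item_pairs(lines):
    """Yield (user_id, item_id) from u.data-style lines: user, item, rating, timestamp."""
    for line in lines:
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        yield fields[0].strip(), fields[1].strip()


def write_pairs(src_path, dst_path):
    # Hypothetical helper: write "user,item" lines, dropping rating and timestamp
    with open(src_path) as src, open(dst_path, "w") as dst:
        for user, item in to_user_item_pairs(src):
            dst.write(f"{user},{item}\n")


if __name__ == "__main__":
    import os

    if os.path.exists("u.data"):
        write_pairs("u.data", "u_pairs.csv")  # hypothetical output name
```

You would then upload u_pairs.csv with hadoop fs -put and pass it to --input in place of u.data.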
Once you're set up for the tutorial, there are two ways to make a small change and re-run it.
A) If you used the PowerShell script in the tutorial's "Run the job" section, just change the similarity value in jobArguments to "SIMILARITY_LOGLIKELIHOOD".
Then view the output file "part-r-00000" as described in the tutorial.
B) Alternatively, I used the following command line with my local HDInsight Emulator:
Copy the u.data file from your local file system to HDFS:
hadoop fs -put u.data u.data
Run it:
hadoop jar C:\hdp\mahout-0.9.0.2.1.3.0-1981\core\target\mahout-core-0.9.0.2.1.3.0-1981-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -s SIMILARITY_LOGLIKELIHOOD --input u.data --output udata_output
Note: you may need to change the Mahout version in the path above to match the version installed on your local machine.
Get the output file from HDFS and put it back into your local file system:
hadoop fs -getmerge udata_output udata_output.txt
View the file udata_output.txt
You'll notice that the recommendations differ from the results you received earlier using SIMILARITY_COOCCURRENCE. In udata_output.txt, you can ignore the recommendation scores and focus on the recommended item IDs.
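Since only the item IDs matter here, one way to gauge how much the two runs differ is to compare, per user, the sets of recommended IDs. This Python sketch uses Jaccard overlap for that comparison (my choice, not something Mahout provides) and assumes you saved the co-occurrence results to a local file, hypothetically named cooccurrence.txt, alongside udata_output.txt, with each line in the userID, whitespace, [item:score,...] layout.

```python
import os


def recommended_ids(lines):
    """Map user ID -> set of recommended item IDs, ignoring the scores."""
    result = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        user, recs = line.split(None, 1)
        pairs = recs.strip("[]").split(",")
        result[user] = {p.split(":")[0] for p in pairs if p}
    return result


def jaccard_by_user(a, b):
    """Jaccard overlap of recommended IDs for users present in both runs."""
    return {
        user: len(a[user] & b[user]) / len(a[user] | b[user])
        for user in a.keys() & b.keys()
        if a[user] | b[user]
    }


if __name__ == "__main__":
    paths = ("cooccurrence.txt", "udata_output.txt")  # first name is hypothetical
    if all(os.path.exists(p) for p in paths):
        with open(paths[0]) as f1, open(paths[1]) as f2:
            overlap = jaccard_by_user(recommended_ids(f1), recommended_ids(f2))
        if overlap:
            print("mean overlap:", sum(overlap.values()) / len(overlap))
```

A mean overlap near 1 would mean the two metrics mostly agree on what to recommend; values near 0 mean they recommend largely different movies.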
There's a lot of data to look through, but I'll let you determine whether SIMILARITY_COOCCURRENCE, SIMILARITY_LOGLIKELIHOOD, or another Mahout similarity metric best meets your needs.
I hope this post provided additional insight.
Comments
- Anonymous
February 08, 2015
helpful.