Now that you have run through the tutorial, you have:

  • Efficiently imported data into a KijiTable in Bulk Importing.
  • Manipulated data in and between Kiji and HDFS in PlayCount.
  • Used a gatherer to generate the next-song pairs, and used a generic MapReduce job to aggregate them into counts in SequentialPlayCount.
  • Filtered the next song counts by popularity and wrote a list of the most popular to a Kiji table in NextTopSong.
  • Used a producer to join data sources and generate recommendations for our users in Music Recommendation Producer.

Now that you understand how the mechanics work, you can begin to improve on the incredibly simple recommendation algorithm we have implemented. It is worth noting that we have put thought into our data generator, and you should be able to leverage patterns in what the users listen to.

Have a look at the recommendations generated by the music recommendation producer. There are a few problems to notice here:

  • Users can get recommended songs that they just listened to. You can improve the logic of the recommend() method in the producer to avoid recommending recently played songs.
  • We only recommend songs that have been played immediately after each other. Our idea of songs that are similar enough to recommend is too narrow. To broaden our definition, you can:
    • Instead of looking only at the song played immediately afterward, increase the "play radius" by incorporating songs that were played several songs after the song being analyzed.
    • Implement item-item collaborative filtering, perhaps using the Mahout implementation.
    • Cluster users together and use that information to enhance recommendations.
    • Once you have implemented another recommendation strategy, combine multiple strategies in the recommendation producer by using multiple key-value stores and modifying the recommend() method.
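
The first two ideas above (skipping recently played songs and widening the "play radius") can be sketched in plain Java. This is only an illustration under stated assumptions: the class RecommendationSketch and its methods are hypothetical names that are not part of Kiji or the tutorial code, and songs are represented as plain strings in play order. In the tutorial's own jobs, the pairing logic would belong in the gatherer and the filtering in the producer's recommend() method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration; not part of the Kiji API or the tutorial code. */
public class RecommendationSketch {

  /**
   * Builds "first,second" pairs for every song that was played within
   * {@code radius} plays after another song. A radius of 1 yields only
   * immediate next-song pairs; larger values widen the "play radius".
   */
  public static List<String> pairsWithinRadius(List<String> plays, int radius) {
    List<String> pairs = new ArrayList<>();
    for (int i = 0; i < plays.size(); i++) {
      // Clamp the window to the end of the play history (long math avoids overflow).
      int last = (int) Math.min((long) plays.size() - 1, (long) i + radius);
      for (int j = i + 1; j <= last; j++) {
        pairs.add(plays.get(i) + "," + plays.get(j));
      }
    }
    return pairs;
  }

  /**
   * Returns the first candidate (assumed to be sorted from most to least
   * popular) that the user has not played recently, or null if every
   * candidate was recently played.
   */
  public static String firstNotRecentlyPlayed(
      List<String> rankedCandidates, List<String> recentlyPlayed) {
    Set<String> recent = new HashSet<>(recentlyPlayed);
    for (String candidate : rankedCandidates) {
      if (!recent.contains(candidate)) {
        return candidate;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    List<String> plays = Arrays.asList("song-1", "song-2", "song-3", "song-4");
    // Pairs of songs played at most two plays apart.
    System.out.println(pairsWithinRadius(plays, 2));
    // Skips "song-2" because the user already played it.
    System.out.println(firstNotRecentlyPlayed(Arrays.asList("song-2", "song-5"), plays));
  }
}
```

A larger radius produces more pairs per user, so the aggregated counts grow accordingly. Whether pairs that are farther apart should be weighted less than adjacent ones is a design choice left open here.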

When you are ready to take on real-time recommendations instead of batch processing, try the KijiScoring Tutorial. It uses the same scenario as this one, but shows how to apply a model in real time when the data you are looking for needs to be refreshed.

Now that you have gotten your feet wet, you should join our user group mailing list. It is a great place to ask questions and talk about all the cool apps you are building with Kiji.