Today’s world is a small, deeply interconnected web. The internet has seemingly infinite potential, and in the post-COVID world its effect on our lives will only increase. Businesses are developing advanced solutions in the e-commerce space to win over clients, and one common focus area is personalized recommendations. Recommendation systems enable businesses to maximize their ROI based on the information they can gather on each customer’s preferences and purchases.
When building a recommendation system, it is very common to use a traditional collaborative filtering algorithm such as item-item or user-item filtering. Item-item methods analyse item-item relations to produce item similarities, while user-item methods learn similarities between users and items simultaneously to provide recommendations. But how do we build a robust recommendation system when the user or item information is not available? This led us to look for deep learning techniques that can overcome this limitation.
We want to focus on an extension of the Word2Vec model called Item2Vec, an item-based Collaborative Filtering method that produces embeddings for items in a latent space. The method can infer item-to-item relations (items might be words or whole sequences) even when user/item information is not available; in other words, it learns item similarities by embedding items in a low-dimensional space regardless of the users.
2. Problem Statement
The task at hand is thus: build a single item-based recommender system (model) for an online website or a mobile app. More specifically, we focus on the following:
- To show recommendations in the context of explicit user interest in a specific item and in the context of an explicit user intent to purchase (this helps to obtain a larger share of sales through higher click-through rates).
- To compare the results with traditional collaborative filtering algorithms.
3. Solution Methodology
After gathering the required data (we used the MovieLens 20M Dataset) and performing a train-test split on it, we created the equivalents of the “words” and “sentences” of a typical Word2Vec model and fed them to the model for training. The newly trained model is our recommender engine, as depicted in Fig. 1.
Data Gathering and understanding:
For illustrative purposes, we used the MovieLens 20M Dataset curated by the MovieLens research team. It contains 20 million ratings and 465,000 tag applications applied to 27,000 movies by 138,000 users.
The rating data is split into train (70%) and test (30%) sets. We used a function from scikit-learn’s model_selection module for this.
Fig 2. Train-Test Split
After the split, we had around 14M rows for training and around 6M for testing.
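The split itself is shown only as a figure. A minimal sketch of what it might look like, assuming the function used was train_test_split from sklearn.model_selection and that the ratings sit in a pandas DataFrame with the MovieLens columns userId, movieId and rating. The tiny frame and the random_state value are illustrative and not taken from the original.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the MovieLens ratings file
# (assumed columns: userId, movieId, rating).
ratings = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    "movieId": [10, 20, 30, 10, 20, 40, 10, 30, 40, 50],
    "rating":  [5.0, 3.0, 4.5, 4.0, 2.0, 5.0, 3.5, 4.0, 1.0, 4.5],
})

# 70% of the rows go to training and 30% to testing. random_state is an
# illustrative choice that only makes the split reproducible.
train_df, test_df = train_test_split(ratings, test_size=0.3, random_state=42)

print(len(train_df), len(test_df))
```

Note that this splits individual rating rows at random, so a user’s ratings can end up on both sides of the split.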
Model to learn embedding: Identify word and sentence equivalents from data
For the model to learn the embedding, we need to derive the “word” and “sentence” equivalents from the data. Think of each “movie” as a “word”, and of the movies that received similar ratings from a user as belonging to the same “sentence”. Specifically, “sentences” are generated as follows: for each user, generate two lists, which respectively store the movies “liked” and “disliked” by the user. The first list contains all the movies rated 4 points or above; the second contains the rest. These lists are the inputs used to train the Gensim Word2Vec model.
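The list-building step described above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors’ code: the function name build_sentences, its like_threshold parameter and the row layout (user id, movie id, rating) are hypothetical.

```python
from collections import defaultdict


def build_sentences(rows, like_threshold=4.0):
    """Turn (user_id, movie_id, rating) rows into Item2Vec "sentences".

    Each user contributes up to two lists: movies rated at or above
    like_threshold ("liked") and all remaining movies ("disliked").
    """
    liked = defaultdict(list)
    disliked = defaultdict(list)
    for user_id, movie_id, rating in rows:
        bucket = liked if rating >= like_threshold else disliked
        # Movie ids are stored as strings, the usual token type for Word2Vec.
        bucket[user_id].append(str(movie_id))
    # A list only exists once something was appended, so none are empty.
    return list(liked.values()) + list(disliked.values())


rows = [(1, 10, 5.0), (1, 20, 3.0), (1, 30, 4.0), (2, 10, 4.5), (2, 40, 2.0)]
sentences = build_sentences(rows)
print(sentences)
```

With a DataFrame like the training split, the rows could be supplied as train_df[["userId", "movieId", "rating"]].itertuples(index=False), assuming those column names.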
- In the original Word2Vec, the window size determines how far the model looks for “contexts” that define the meaning of a given word.
- As defined there, the window has a fixed size. In the Item2Vec implementation, however, the “meaning” of a movie should be captured by all its neighbours in the same list.
- In other words, we should consider all the movies “liked” (4 stars or above) by a user to define the “meaning” of each of those movies. The same applies to all the movies “disliked” (below 4 stars) by a user.
- The window size would therefore need to change with the length of each movie list. To address this, we specified a very large window size, far larger than the length of any movie list in our training data.
The code snippet below performs model training and saves the resulting model object.
Figure 4. Training the model
Note that Gensim saves all the information about the model, including the hidden weights, vocabulary frequencies and the binary tree of the model, so it is possible to continue training after loading the file.
Recommendations are generated with Gensim’s built-in similarity search on the trained vectors, model.wv.most_similar(). The method accepts a positive list and a negative list; when these are given, the model takes both into account when ranking candidates. Gensim offers two ways to combine the similarities, commonly called CosAdd and CosMul, and as the names suggest both are based on cosine similarity. CosAdd (most_similar()) adds the positive items’ vectors and subtracts the negative items’ vectors, then ranks candidates by cosine similarity to the result. CosMul (most_similar_cosmul()) instead places the similarities to the positive items in the numerator and those to the negative items in the denominator.
A case in point is the analogy question “London is to England as Baghdad is to ____?”, which the authors of the paper that introduced these methods want to answer using:
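A reconstruction of that additive objective, assuming the paper in question is Levy and Goldberg’s work on the additive (3CosAdd) and multiplicative (3CosMul) analogy methods. Here $V$ is the vocabulary and $x$ ranges over the candidate answers:

$$
\arg\max_{x \in V} \big( \cos(x, \text{England}) - \cos(x, \text{London}) + \cos(x, \text{Baghdad}) \big)
$$

Because the three terms are summed, one very large term can dominate the others.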
To achieve better balance among the different aspects of similarity, the authors propose switching from an additive to a multiplicative combination:
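Under the same assumption about the source paper, the multiplicative objective for this analogy question can be written as follows, where $V$ is the vocabulary, $x$ ranges over the candidate answers and $\varepsilon$ is a small constant that prevents division by zero:

$$
\arg\max_{x \in V} \frac{\cos(x, \text{England}) \cdot \cos(x, \text{Baghdad})}{\cos(x, \text{London}) + \varepsilon}
$$

The positive terms (England, Baghdad) sit in the numerator and the negative term (London) in the denominator, so a candidate has to score reasonably on every aspect to rank highly.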
Now we move on to defining the recommender function. It takes as input a positive list, a negative list and the number of recommendations to generate, and returns the recommended movies using the similarity functions above.
Figure 7: Building the recommender
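The function in the figure is not reproduced here, so the following is only a sketch of one way to write it. The function name recommend and its parameters are hypothetical. It assumes kv is the trained model’s vectors (model.wv in Gensim 4.x) and that movies are identified by the same string ids used in training.

```python
def recommend(kv, positive, negative=None, topn=10, method="cosadd"):
    """Return up to topn (movie_id, score) pairs.

    kv is assumed to be a trained Gensim KeyedVectors object (model.wv).
    method="cosadd" uses most_similar; method="cosmul" uses
    most_similar_cosmul.
    """
    # Skip ids the model never saw; they would otherwise raise KeyError.
    positive = [m for m in positive if m in kv.key_to_index]
    negative = [m for m in (negative or []) if m in kv.key_to_index]
    if not positive:
        return []
    search = kv.most_similar_cosmul if method == "cosmul" else kv.most_similar
    return search(positive=positive, negative=negative, topn=topn)
```

A call might look like recommend(model.wv, positive=[matrix_id], negative=[die_hard_id], topn=5), where matrix_id and die_hard_id stand for the string ids of the two movies. Mapping the returned ids back to titles is left out of this sketch.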
We will now see the recommender in action with two test cases. In the first, input is given only in the positive list; in the second, one of the recommended movies is also placed in the negative list, and we observe how the recommendations change.
Test 1: Using the Positive List:
Testing the model with a movie in the positive list only:
We tested the recommender on the cult classic “The Matrix”. The output is shown in the table below:
Table 1: Recommendations for “The Matrix”
Test 2: Using the Positive and Negative List:
Testing the model with movies in both the positive and negative lists: as the next scenario, we put the first suggestion (Die Hard) in the negative list to see how well the recommender takes it into account. The results are below:
Table 2: Recommendations for The Matrix in the positive list and Die Hard in the negative list
When a movie is placed in the negative list, the recommender appears to have picked up that Die Hard is more action-heavy than The Matrix (as the genre tags indicate). As a result, it returns movies that share the characteristics of the positive-list movie, except for those it has in common with the negative-list movie.
So, in effect, when a specific movie is given in the negative list, the recommender engine tries to avoid movies of a similar genre. This makes the recommendations more robust overall.
To validate the method, we compared its output with that of a traditional CF model. The tables below show the recommendations for the movie “Toy Story”.
Table 3: Results from Item2vec for Toy Story
Table 4: Results from Collaborative Filtering for Toy Story
The results point to an interesting contrast. Toy Story is a children’s movie, yet the recommendations from the traditional method include movies aimed at adult audiences, such as American Pie and Die Hard, whereas Item2Vec recommends only relevant children’s movies. For this example, at least, Item2Vec outperforms the traditional method.
- This algorithm can be applied only to huge datasets that are appropriately labelled. All similarities are based on item profiles, which is a major limitation: rich item metadata is required so that items can be compared with one another and recommended properly.
- The model must be re-trained for each different business problem (put another way, the model can only make recommendations based on the existing interests of the user).
- Model training is computationally expensive and hence requires suitable infrastructure such as GPUs or TPUs (easily available through cloud services such as AWS and Azure).
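The article does not show how the traditional CF baseline was built. For readers who want a point of comparison, one common form of item-item collaborative filtering ranks items by the cosine similarity of their rating columns. The sketch below assumes that form; the function name similar_items and the toy matrix are hypothetical, and treating unrated cells as 0 is a simplification.

```python
import numpy as np


def similar_items(ratings, item_index, topn=3):
    """Rank items by cosine similarity of their rating columns.

    ratings: 2-D array-like, rows = users, columns = items, 0 = not rated.
    Returns up to topn (item_index, similarity) pairs, best first.
    """
    ratings = np.asarray(ratings, dtype=float)
    norms = np.linalg.norm(ratings, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for unrated items
    unit = ratings / norms   # each column scaled to unit length
    sims = unit.T @ unit[:, item_index]
    # Sort by descending similarity and drop the query item itself.
    order = [i for i in np.argsort(-sims) if i != item_index][:topn]
    return [(int(i), float(sims[i])) for i in order]


ratings = [
    [5, 4, 0, 0],
    [4, 5, 0, 1],
    [0, 0, 5, 4],
    [0, 1, 4, 5],
]
result = similar_items(ratings, item_index=0, topn=2)
print(result)
```

In this toy matrix, item 1 is rated highly by the same users as item 0, so it ranks first.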
The authors wish to express their gratitude to Paulami Das, Head of Data Science CoE @ Brillio and Anish Roychowdhury, Senior Analytics Leader @ Brillio for their mentoring and guidance towards shaping up this study.
Sachin graduated from IIT Kharagpur (batch of ’19), from the Department of Electronics and Communication Engineering, and has been working as a data science professional at Brillio Technologies ever since. He likes reading about new research in the domain (NLP especially) and thinking about how it can have an impact on business. He has more than a year’s experience in the industry, having worked on projects ranging from NLP use cases to ML operations.
Smruthi S R is currently a Data Scientist at Brillio, a Leading Digital Services Organization.
She has worked on various analytics use cases, applying ML and deep learning techniques in domains such as telecom, mass media, retail, pharma and real estate. She has a total of 6 years of work experience in the field of data science and analytics.
Smruthi completed her Bachelor’s in Information Science from VTU