So here it begins…
It was April 2019, and I was surfing arXiv like any other day when something caught my attention in cs.LG, aka the Machine Learning category. It was this paper, Fiducia: A Personalized Food Recommender System for Zomato. The name was amusing enough, and I found myself done with its opening paragraphs within a few minutes (people who know me well know how obsessed I am with clever project names, e.g. frisky, arena, Epicentre). I added it to my bucket list of research implementations and planned on working on its initial stages.
Fast forward to when I was studying (just kidding) for my final semester. That was when I realized I had spent all my engineering semesters going through the same cycle: Stress -> Creativity -> Write code for that new side project -> No studying -> Exam day -> Repeat. By then I was thorough with the Fiducia paper, and the line “After preprocessing, we had a total 3,131 reviews in our dataset” really bugged me. I asked myself: can we get some more data? Later, on a lazy afternoon of music and code, I wrote a crawler in a few hours to fetch the data I actually needed. I ran it and boom! I could see 12,000+ restaurants in Mumbai alone and more than 5 million reviews. (Sorry, can’t really get into more details of that, you know me!) It was really intimidating, and I planned on working on its later stages, like an NLP pipeline and some recommender algorithms.
In July 2019, I joined Continuum, my current company, and quickly got busy with my day job. I found myself writing systems in Golang, playing with the Big Data stack and doing a bit of DevOps every now and then. Around Diwali 2019, I realized I needed to get back into that exam-time cycle, but this time it was something different: Work Stress -> Creativity -> Multitask -> Write code for that last side project -> Work @ office -> Repeat.
I wrapped up some old Go projects and released them. And finally, it was time for Fiducia. A few weeks back, during some free time, I was checking my crawler and found a Python unicode bug. I quickly fixed it and thought of crawling reviews for only 100 restaurants. Boom again! I passed 35,000 reviews for just 100 restaurants, which means I will definitely pass 20 million reviews when I crawl all restaurants! That’s a huge amount of data for building a brilliant recommender system.
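As a rough sanity check on that estimate, here is a minimal Python sketch of the back-of-the-envelope math. It assumes the average from the 100-restaurant sample holds for every restaurant, which is a naive assumption. The function names are made up for illustration; the only real numbers are the ones quoted in this post.

```python
import math

# Numbers quoted in this post.
sample_reviews = 35_000      # reviews crawled in the 100-restaurant sample
sample_restaurants = 100
mumbai_restaurants = 12_000  # "12,000+" restaurants in Mumbai

# Average reviews per restaurant in the sample.
reviews_per_restaurant = sample_reviews / sample_restaurants


def estimate_reviews(n_restaurants, per_restaurant=reviews_per_restaurant):
    """Naive linear estimate (hypothetical helper): sample average times restaurant count."""
    return int(n_restaurants * per_restaurant)


def restaurants_needed(target_reviews, per_restaurant=reviews_per_restaurant):
    """How many restaurants it takes to reach target_reviews at the sample average."""
    return math.ceil(target_reviews / per_restaurant)


print(reviews_per_restaurant)                # average per restaurant in the sample
print(estimate_reviews(mumbai_restaurants))  # naive estimate for Mumbai alone
print(restaurants_needed(20_000_000))        # restaurants needed to cross 20 million
```

At the sample average of 350 reviews per restaurant, Mumbai's 12,000 restaurants alone come to about 4.2 million reviews, the same ballpark as the 5 million+ figure mentioned earlier. Crossing 20 million at that rate would take roughly 57,000 restaurants, so that number only holds if the full crawl covers far more restaurants than Mumbai's or if the sample underestimates the average.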
LET’S GET TO WHAT YOU CAME FOR!
These are some visualizations of the restaurant data from Zomato. EDA on the reviews is coming soon, along with some results from implementing the research paper mentioned above.