Recommendation Of Tamil Movies Using Machine Learning

Project period

06/16/2018 - 07/15/2018

Project Category

Computer Science




Recommender systems are among the most popular applications of data science today. They predict the "rating" or "preference" that a user would give to an item. Amazon uses a recommendation system to suggest products to customers, YouTube uses one to decide which video to play next on autoplay, and Facebook uses one to recommend pages to like and people to follow.

Why: Problem statement

Many people are not aware of which movies are good and popular, and they waste time searching for them.

How: Solution description

In this project, I analyze Tamil movies released in 2017. In particular, I recommend movies by applying three machine learning approaches: a simple recommender, a content-based recommender system, and a collaborative filtering algorithm.

I collected the Tamil movie data from a leading website. For each movie, the dataset contains the title, director, cast (the actors in the various roles), genre, producer, release year, rating, vote count, and a plot overview.

Simple Recommender:

Simple recommenders are basic systems that recommend the top items based on a certain metric or score. In this section, I built a simplified chart of the top Tamil movies of 2017.

The following are the steps involved:

Calculate the ratings for every movie.

Sort the movies based on the ratings and output the top results.

The most basic metric is the raw rating, but it has shortcomings. For one, it does not take into consideration the popularity of a movie: a movie rated 9 by only 10 voters would be considered 'better' than a movie rated 8 by 1,000 voters.

For another, as the number of voters increases, the rating of a movie stabilizes and approaches a value that reflects the movie's quality. It is more difficult to discern the quality of a movie with extremely few voters.

Taking these shortcomings into consideration, we need a weighted rating that takes into account both the average rating and the number of votes a movie has gathered. Such a system makes sure that a movie with a 9 rating from 1,000 voters gets a far higher score than a movie with the same rating from only 10 voters.

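The average rating of a movie in the Tamil movie dataset is about 6.29 on a scale of 10, and I calculated that the average number of votes received by each movie is 515. As the vote-count cutoff for the chart, the notebook uses the 95th percentile of the vote counts, which is 553.5; 10 movies meet this cutoff and count as qualified.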
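
Written out, the weighted rating computed by the notebook's weighted_rating() function is the formula below, where v is a movie's vote count, R is its rating, m is the minimum vote count required to qualify, and C is the mean rating across the dataset:

```latex
\[
\mathrm{WR} = \frac{v}{v + m}\,R + \frac{m}{v + m}\,C
\]
```

When a movie has few votes (v much smaller than m), its score stays close to C; as the votes grow, the score approaches the movie's own rating R. For example, with v = 3932, R = 9, m = 553.5 and C ≈ 6.2948 (the values for Baahubali: The Conclusion in the notebook output), the formula gives about 8.67.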

I calculated the metric for each qualified movie. To do this, I defined a function, weighted_rating(), and added a new feature, wr, by applying this function to the DataFrame of qualified movies. Finally, I sorted the DataFrame by the wr feature and output the title, vote count, vote average, and weighted rating (score) of the top 10 movies.
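
The filter, score, and sort steps can be sketched in plain Python as follows. This is a minimal illustration, not the notebook's pandas code: the constants C and m and the four sample rows are copied from the notebook output, while the top_chart() helper and the tuple layout are assumptions made for this sketch.

```python
# Values reported by the notebook run: mean rating C and vote-count cutoff m.
C = 6.294764397905758
m = 553.5


def weighted_rating(votes, rating, m=m, C=C):
    """Blend a movie's own rating with the dataset mean, weighted by vote count."""
    return (votes / (votes + m)) * rating + (m / (votes + m)) * C


# A few (title, vote count, rating) rows copied from the notebook output.
movies = [
    ("Mersal", 1352, 8),
    ("Baahubali: The Conclusion", 3932, 9),
    ("Aruvi", 1351, 9),
    ("Bairavaa", 120, 7),  # below the cutoff, so it is filtered out
]


def top_chart(rows, m=m, C=C, limit=10):
    """Keep movies with at least m votes, score them, and sort best-first."""
    qualified = [row for row in rows if row[1] >= m]
    scored = [
        (title, votes, rating, weighted_rating(votes, rating, m, C))
        for title, votes, rating in qualified
    ]
    return sorted(scored, key=lambda row: row[3], reverse=True)[:limit]


chart = top_chart(movies)
for title, votes, rating, wr in chart:
    print(f"{title:28s} {votes:5d} {rating} {wr:.6f}")
```

The scores for these three qualified movies agree with the wr column in the notebook output to about four decimal places.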

Genre-Based Recommendation:

I built a separate chart for each genre. To do this, I defined a function, build_chart(), that takes a genre, keeps only the movies of that genre, and applies the same weighted rating, with the mean rating and the vote-count cutoff (the 85th percentile by default) computed within that genre.
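
A per-genre chart of this kind can be sketched in plain Python as below. The movie rows are hypothetical placeholders, and the percentile() helper is an assumption written for this sketch; it uses linear interpolation, which is the default convention of the pandas quantile() call used in the notebook.

```python
def percentile(values, q):
    """Quantile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def build_chart(rows, genre, q=0.85):
    """Rank the movies of one genre by weighted rating, best first."""
    in_genre = [r for r in rows if r["genre"] == genre]
    if not in_genre:
        return []
    C = sum(r["rating"] for r in in_genre) / len(in_genre)  # genre mean rating
    m = percentile([r["votes"] for r in in_genre], q)       # genre vote cutoff
    chart = []
    for r in in_genre:
        v = r["votes"]
        if v >= m:
            wr = v / (v + m) * r["rating"] + m / (v + m) * C
            chart.append((r["title"], wr))
    return sorted(chart, key=lambda item: item[1], reverse=True)


# Hypothetical rows, invented for illustration only.
rows = [
    {"title": "Movie A", "genre": "Comedy", "votes": 100, "rating": 8},
    {"title": "Movie B", "genre": "Comedy", "votes": 400, "rating": 6},
    {"title": "Movie C", "genre": "Comedy", "votes": 50, "rating": 9},
    {"title": "Movie D", "genre": "Thriller", "votes": 900, "rating": 7},
]

comedy_chart = build_chart(rows, "Comedy", q=0.5)
print(comedy_chart)
```

Movie C has the highest raw rating but falls below the genre's vote cutoff, so it does not appear in the chart.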

Content-Based Recommender systems:

I built a system that recommends movies that are similar to a particular movie. More specifically, I computed pairwise similarity scores for all movies based on their plot descriptions and recommended movies based on those scores. The plot description is available as the Overview feature in the dataset.

In their raw text form, the overviews cannot be compared directly. To compare them, I computed a word vector for each overview (or document, as it will be called from now on). Specifically, I computed the Term Frequency-Inverse Document Frequency (TF-IDF) vector for each overview. This gives a matrix where each column represents a word in the overview vocabulary (all the words that appear in at least one document) and each row represents a movie.

I found that 1717 different words were used to describe the 191 movies in the dataset. (The notebook run included with this project reports a TF-IDF matrix of shape 191 x 1691.)
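
To make the shape of the TF-IDF matrix concrete, here is a from-scratch sketch in plain Python. It is not the notebook's code, which uses scikit-learn's TfidfVectorizer. The sketch assumes a whitespace tokenizer, does no stop-word removal, and uses the smoothed IDF ln((1 + n) / (1 + df)) + 1 with L2-normalised rows, which is the default that scikit-learn documents. The three overview sentences are invented for illustration.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocabulary, rows) with one L2-normalised TF-IDF row per document."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    n = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter(tok for toks in tokenised for tok in set(toks))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenised:
        counts = Counter(toks)
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        rows.append([x / norm for x in row])
    return vocab, rows


# Hypothetical overviews, invented for illustration only.
overviews = [
    "a thriller film about a detective",
    "a romantic comedy film",
    "a detective thriller set in chennai",
]

vocab, rows = tfidf_matrix(overviews)
print(len(rows), "documents x", len(vocab), "words")
```

Each row is one overview and each column is one vocabulary word. A word that appears in every overview (here "a") gets a lower weight than a word that appears in only one (here "chennai").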

I used cosine similarity to calculate a numeric quantity that denotes the similarity between two movies. I chose the cosine similarity score because it is independent of vector magnitude and is relatively easy and fast to calculate.
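
As a small illustration of what the score measures, the sketch below computes cosine similarity in plain Python. The vectors are made-up examples, and the function names are assumptions for this sketch rather than the notebook's code.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot(a, b) / (norm_a * norm_b)


def unit(v):
    """Scale a non-zero vector to length 1."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]


a = [1.0, 2.0, 0.0]
b = [2.0, 4.0, 0.0]  # same direction as a, twice the length
c = [0.0, 0.0, 3.0]  # shares no non-zero component with a

print(cosine_similarity(a, b))  # close to 1: length does not matter
print(cosine_similarity(a, c))  # 0.0: no words in common
print(dot(unit(a), unit(b)))    # for unit vectors, the dot product is the cosine
```

The last line shows why the notebook can call linear_kernel, a plain dot product, on the TF-IDF matrix: TfidfVectorizer normalises each row to unit length by default, so the dot product of two rows equals their cosine similarity.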

I defined a function that takes a movie title as input and outputs a list of the 10 most similar movies. To find a movie's row from its title, I built a reverse mapping from movie titles to DataFrame indices.
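
The lookup can be sketched in plain Python as follows. The titles and the similarity matrix are hypothetical stand-ins for the notebook's DataFrame and cosine_sim array. Unlike the notebook version, which sorts all scores and then skips the first entry, this sketch excludes the query movie by its index.

```python
# Hypothetical titles and similarity matrix, invented for illustration only.
titles = ["Movie A", "Movie B", "Movie C", "Movie D"]
cosine_sim = [
    [1.0, 0.2, 0.7, 0.0],
    [0.2, 1.0, 0.1, 0.5],
    [0.7, 0.1, 1.0, 0.3],
    [0.0, 0.5, 0.3, 1.0],
]

# Reverse mapping: title -> row index in the similarity matrix.
indices = {title: i for i, title in enumerate(titles)}


def get_recommendations(title, k=10):
    """Return up to k titles most similar to the given one, best first."""
    idx = indices[title]
    scored = [(j, score) for j, score in enumerate(cosine_sim[idx]) if j != idx]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [titles[j] for j, _ in scored[:k]]


print(get_recommendations("Movie A", k=2))
```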

Collaborative filtering algorithm:

These systems are closely related to the content-based recommendation engine that I built. They identify similar items based on how people have rated them in the past. Here I took three columns, Director, Genre, and ratings, and found the items that were similar across all three.
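
For readers new to the idea, the sketch below shows generic item-based collaborative filtering in plain Python. It is not the code used in this project: it assumes a hypothetical table of per-user ratings, which is not among the dataset columns listed earlier, and all user names and scores are invented.

```python
import math

# Hypothetical per-user ratings, invented for illustration only.
ratings = {
    "user1": {"Movie A": 9, "Movie B": 8, "Movie C": 2},
    "user2": {"Movie A": 8, "Movie B": 9, "Movie C": 3},
    "user3": {"Movie A": 2, "Movie B": 3, "Movie C": 9},
}


def item_vector(movie):
    """Ratings for one movie in a fixed user order (0 where a user gave none)."""
    return [ratings[user].get(movie, 0) for user in sorted(ratings)]


def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0


def similar_items(movie):
    """Other movies ranked by how similarly users rated them, best first."""
    others = {m for user_ratings in ratings.values() for m in user_ratings} - {movie}
    scores = [(o, cosine(item_vector(movie), item_vector(o))) for o in others]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


print(similar_items("Movie A"))
```

Movies A and B are rated alike by all three users, so B ranks as the closest item to A. The notebook imports SVD from the surprise library, which is a matrix factorization approach and a different technique from this similarity-based sketch.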

Result:

By the nature of the system, it is not easy to evaluate its performance, since there are no right or wrong recommendations; it is a matter of opinion. The RMSE and MAE results for the top 5 movies are:

Mean RMSE: 1.9462
Mean MAE: 1.6732
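
For reference, both error measures can be computed as in the plain-Python sketch below. The two rating lists are hypothetical and are not the data behind the figures above.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: penalises large misses more heavily."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error: the average size of a miss."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


actual = [9, 8, 7, 6, 5]     # hypothetical true ratings
predicted = [7, 8, 9, 5, 6]  # hypothetical predicted ratings

print(round(rmse(actual, predicted), 4))
print(round(mae(actual, predicted), 4))
```

RMSE is never smaller than MAE on the same data, which is consistent with the reported values of 1.9462 and 1.6732.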

How is it different from competition

I collected a dataset of Tamil movies myself, whereas comparable projects have not.

Who are your customers

People who are curious about watching various types of movies can use this project. Data scientists can use it for study and literature surveys.

Project Phases and Schedule

Phase 1: Data collection

Phase 2: Data Analysis

Phase 3: Filtering

Phase 4: Recommendation

Resources Required

Tool required: Anaconda (Python 3.6)

 

Project Code
/* Your file Name : Tamil.ipynb */
/* Your coding Language : python */
/* Your code snippet start here */
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "from scipy import stats\n",
    "from ast import literal_eval\n",
    "from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer\n",
    "from sklearn.metrics.pairwise import linear_kernel, cosine_similarity\n",
    "from nltk.stem.snowball import SnowballStemmer\n",
    "from nltk.stem.wordnet import WordNetLemmatizer\n",
    "from nltk.corpus import wordnet\n",
    "from wordcloud import WordCloud, STOPWORDS\n",
    "import re\n",
    "import sys\n",
    "import nltk\n",
    "from nltk.corpus import stopwords\n",
    "from surprise import Reader, Dataset, SVD, evaluate\n",
    "\n",
    "import warnings; warnings.simplefilter('ignore')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Title</th>\n",
       "      <th>Director</th>\n",
       "      <th>Cast</th>\n",
       "      <th>Genre</th>\n",
       "      <th>Producer</th>\n",
       "      <th>Vote</th>\n",
       "      <th>Vote Counts</th>\n",
       "      <th>Year</th>\n",
       "      <th>Overview</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Peiyena Peiyum Kurudhi</td>\n",
       "      <td>Sudhakar Shanmugam</td>\n",
       "      <td>Jana, Seenivasan, Harish, Ganeshan, Robin</td>\n",
       "      <td>Thriller</td>\n",
       "      <td>Lion Hunters Productions</td>\n",
       "      <td>4.5</td>\n",
       "      <td>28.0</td>\n",
       "      <td>2017</td>\n",
       "      <td>A thriller film directed by Sudhakar Shanmugam...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Soorathengai</td>\n",
       "      <td>Sanjeev Srinivas Kanna</td>\n",
       "      <td>Arvind Vinod, Eugina Samanthi, Theni Murugan</td>\n",
       "      <td>Action masala</td>\n",
       "      <td>Maruthi Films International</td>\n",
       "      <td>5.2</td>\n",
       "      <td>35.0</td>\n",
       "      <td>2017</td>\n",
       "      <td>Soorathengai is the Tamil film featuring Guru ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Unnai Thottu Kolla Vaa</td>\n",
       "      <td>Krishnakumar</td>\n",
       "      <td>Powerstar Srinivasan, Livingston, Ganja Karuppu</td>\n",
       "      <td>Horror</td>\n",
       "      <td>Kavibharathi Creations</td>\n",
       "      <td>4.6</td>\n",
       "      <td>40.0</td>\n",
       "      <td>2017</td>\n",
       "      <td>The movie is directed by Andal Ramesh and feat...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Bairavaa</td>\n",
       "      <td>Bharathan</td>\n",
       "      <td>Vijay, Keerthy Suresh, Sathish, Jagapati Babu,...</td>\n",
       "      <td>Action masala</td>\n",
       "      <td>Vijaya Productions</td>\n",
       "      <td>7.0</td>\n",
       "      <td>120.0</td>\n",
       "      <td>2017</td>\n",
       "      <td>The film stars Vijay and Keerthy Suresh in the...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Koditta Idangalai Nirappuga</td>\n",
       "      <td>Parthiban</td>\n",
       "      <td>Shanthanu Bhagyaraj, Parvatii Nair, Parthiban,...</td>\n",
       "      <td>Comedy thriller</td>\n",
       "      <td>Reel Estate Company &amp; Bioscope Film Frames</td>\n",
       "      <td>7.0</td>\n",
       "      <td>29.0</td>\n",
       "      <td>2017</td>\n",
       "      <td>RadhakrishnanParthiban’s films can be hit-and-...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                         Title                Director  \\\n",
       "0       Peiyena Peiyum Kurudhi      Sudhakar Shanmugam   \n",
       "1                 Soorathengai  Sanjeev Srinivas Kanna   \n",
       "2       Unnai Thottu Kolla Vaa            Krishnakumar   \n",
       "3                     Bairavaa               Bharathan   \n",
       "4  Koditta Idangalai Nirappuga               Parthiban   \n",
       "\n",
       "                                                Cast            Genre  \\\n",
       "0          Jana, Seenivasan, Harish, Ganeshan, Robin         Thriller   \n",
       "1       Arvind Vinod, Eugina Samanthi, Theni Murugan    Action masala   \n",
       "2    Powerstar Srinivasan, Livingston, Ganja Karuppu           Horror   \n",
       "3  Vijay, Keerthy Suresh, Sathish, Jagapati Babu,...    Action masala   \n",
       "4  Shanthanu Bhagyaraj, Parvatii Nair, Parthiban,...  Comedy thriller   \n",
       "\n",
       "                                     Producer  Vote  Vote Counts  Year  \\\n",
       "0                    Lion Hunters Productions   4.5         28.0  2017   \n",
       "1                 Maruthi Films International   5.2         35.0  2017   \n",
       "2                      Kavibharathi Creations   4.6         40.0  2017   \n",
       "3                          Vijaya Productions   7.0        120.0  2017   \n",
       "4  Reel Estate Company & Bioscope Film Frames   7.0         29.0  2017   \n",
       "\n",
       "                                            Overview  \n",
       "0  A thriller film directed by Sudhakar Shanmugam...  \n",
       "1  Soorathengai is the Tamil film featuring Guru ...  \n",
       "2  The movie is directed by Andal Ramesh and feat...  \n",
       "3  The film stars Vijay and Keerthy Suresh in the...  \n",
       "4  RadhakrishnanParthiban’s films can be hit-and-...  "
      ]
     },
     "execution_count": 59,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "md = pd.read_csv('Tamil_Movies.csv')\n",
    "md.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "6.294764397905758"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "vote_counts = md[md['Vote Counts'].notnull()]['Vote Counts'].astype('float')\n",
    "vote_averages = md[md['Vote'].notnull()]['Vote'].astype('float')\n",
    "C = vote_averages.mean()\n",
    "C"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "553.5"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "m = vote_counts.quantile(0.95)\n",
    "m"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(10, 5)"
      ]
     },
     "execution_count": 62,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "qualified = md[(md['Vote Counts'] >= m) & (md['Vote Counts'].notnull()) & (md['Vote'].notnull())][['Title', 'Year', 'Vote Counts', 'Vote','Genre']]\n",
    "qualified['Vote Counts'] = qualified['Vote Counts'].astype('int')\n",
    "qualified['Vote'] = qualified['Vote'].astype('int')\n",
    "qualified.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [],
   "source": [
    "def weighted_rating(x):\n",
    "    v = x['Vote Counts']\n",
    "    R = x['Vote']\n",
    "    return (v/(v+m) * R) + (m/(m+v) * C)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Title</th>\n",
       "      <th>Year</th>\n",
       "      <th>Vote Counts</th>\n",
       "      <th>Vote</th>\n",
       "      <th>Genre</th>\n",
       "      <th>wr</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>52</th>\n",
       "      <td>Baahubali: The Conclusion</td>\n",
       "      <td>2017</td>\n",
       "      <td>3932</td>\n",
       "      <td>9</td>\n",
       "      <td>Fantasy</td>\n",
       "      <td>8.666180</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>178</th>\n",
       "      <td>Aruvi</td>\n",
       "      <td>2017</td>\n",
       "      <td>1351</td>\n",
       "      <td>9</td>\n",
       "      <td>Social drama</td>\n",
       "      <td>8.213784</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>144</th>\n",
       "      <td>Mersal</td>\n",
       "      <td>2017</td>\n",
       "      <td>1352</td>\n",
       "      <td>8</td>\n",
       "      <td>Action Thriller</td>\n",
       "      <td>7.504672</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>117</th>\n",
       "      <td>Kurangu Bommai</td>\n",
       "      <td>2017</td>\n",
       "      <td>762</td>\n",
       "      <td>8</td>\n",
       "      <td>Thriller drama</td>\n",
       "      <td>7.282518</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50</th>\n",
       "      <td>Nagarvalam</td>\n",
       "      <td>2017</td>\n",
       "      <td>720</td>\n",
       "      <td>8</td>\n",
       "      <td>Romantic thriller</td>\n",
       "      <td>7.258855</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>118</th>\n",
       "      <td>Oru Kanavu Pola</td>\n",
       "      <td>2017</td>\n",
       "      <td>671</td>\n",
       "      <td>8</td>\n",
       "      <td>Romance</td>\n",
       "      <td>7.229197</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>184</th>\n",
       "      <td>Velaikkaran</td>\n",
       "      <td>2017</td>\n",
       "      <td>992</td>\n",
       "      <td>7</td>\n",
       "      <td>Thriller</td>\n",
       "      <td>6.747429</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>41</th>\n",
       "      <td>8 Thottakkal</td>\n",
       "      <td>2017</td>\n",
       "      <td>859</td>\n",
       "      <td>7</td>\n",
       "      <td>Crime thriller</td>\n",
       "      <td>6.723648</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>116</th>\n",
       "      <td>Pannam Pathinonnum Seyum</td>\n",
       "      <td>2017</td>\n",
       "      <td>569</td>\n",
       "      <td>6</td>\n",
       "      <td>Action</td>\n",
       "      <td>6.145347</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>187</th>\n",
       "      <td>Kalavaadiya Pozhuthugal</td>\n",
       "      <td>2017</td>\n",
       "      <td>731</td>\n",
       "      <td>5</td>\n",
       "      <td>Drama</td>\n",
       "      <td>5.557923</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                         Title  Year  Vote Counts  Vote              Genre  \\\n",
       "52   Baahubali: The Conclusion  2017         3932     9            Fantasy   \n",
       "178                      Aruvi  2017         1351     9       Social drama   \n",
       "144                     Mersal  2017         1352     8    Action Thriller   \n",
       "117             Kurangu Bommai  2017          762     8     Thriller drama   \n",
       "50                  Nagarvalam  2017          720     8  Romantic thriller   \n",
       "118            Oru Kanavu Pola  2017          671     8            Romance   \n",
       "184                Velaikkaran  2017          992     7           Thriller   \n",
       "41                8 Thottakkal  2017          859     7     Crime thriller   \n",
       "116   Pannam Pathinonnum Seyum  2017          569     6             Action   \n",
       "187    Kalavaadiya Pozhuthugal  2017          731     5              Drama   \n",
       "\n",
       "           wr  \n",
       "52   8.666180  \n",
       "178  8.213784  \n",
       "144  7.504672  \n",
       "117  7.282518  \n",
       "50   7.258855  \n",
       "118  7.229197  \n",
       "184  6.747429  \n",
       "41   6.723648  \n",
       "116  6.145347  \n",
       "187  5.557923  "
      ]
     },
     "execution_count": 64,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "qualified['wr'] = qualified.apply(weighted_rating, axis=1)\n",
    "qualified = qualified.sort_values('wr', ascending=False).head(250)\n",
    "qualified.head(15)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {},
   "outputs": [],
   "source": [
    "s = md.apply(lambda x: pd.Series(x['Genre']),axis=1).stack().reset_index(level=1, drop=True)\n",
    "s.name = 'Genre'\n",
    "gen_md = md.drop('Genre', axis=1).join(s)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [],
   "source": [
    "def build_chart(Genre, percentile=0.85):\n",
    "    df = gen_md[gen_md['Genre'] == Genre]\n",
    "    vote_counts = df[df['Vote Counts'].notnull()]['Vote Counts'].astype('int')\n",
    "    vote_averages = df[df['Vote'].notnull()]['Vote'].astype('int')\n",
    "    C = vote_averages.mean()\n",
    "    m = vote_counts.quantile(percentile)\n",
    "    \n",
    "    qualified = df[(df['Vote Counts'] >= m) & (df['Vote Counts'].notnull()) & (df['Vote'].notnull())][['Title', 'Year', 'Vote Counts', 'Vote']]\n",
    "    qualified['Vote Counts'] = qualified['Vote Counts'].astype('int')\n",
    "    qualified['Vote'] = qualified['Vote'].astype('int')\n",
    "    \n",
    "    qualified['wr'] = qualified.apply(lambda x: (x['Vote Counts']/(x['Vote Counts']+m) * x['Vote']) + (m/(m+x['Vote Counts']) * C), axis=1)\n",
    "    qualified = qualified.sort_values('wr', ascending=False).head(250)\n",
    "    \n",
    "    return qualified"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Title</th>\n",
       "      <th>Year</th>\n",
       "      <th>Vote Counts</th>\n",
       "      <th>Vote</th>\n",
       "      <th>wr</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>164</th>\n",
       "      <td>Guru Uchaththula Irukkaru</td>\n",
       "      <td>2017</td>\n",
       "      <td>156</td>\n",
       "      <td>8</td>\n",
       "      <td>6.833333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>Kanna Pinna</td>\n",
       "      <td>2017</td>\n",
       "      <td>156</td>\n",
       "      <td>7</td>\n",
       "      <td>6.333333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>57</th>\n",
       "      <td>Saravanan Irukka Bayamaen</td>\n",
       "      <td>2017</td>\n",
       "      <td>156</td>\n",
       "      <td>7</td>\n",
       "      <td>6.333333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>115</th>\n",
       "      <td>Adra Raja Adida</td>\n",
       "      <td>2017</td>\n",
       "      <td>367</td>\n",
       "      <td>6</td>\n",
       "      <td>5.900574</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>77</th>\n",
       "      <td>Peechankai</td>\n",
       "      <td>2017</td>\n",
       "      <td>157</td>\n",
       "      <td>6</td>\n",
       "      <td>5.833866</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>146</th>\n",
       "      <td>Kadaisi Bench Karthi</td>\n",
       "      <td>2017</td>\n",
       "      <td>156</td>\n",
       "      <td>4</td>\n",
       "      <td>4.833333</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                         Title  Year  Vote Counts  Vote        wr\n",
       "164  Guru Uchaththula Irukkaru  2017          156     8  6.833333\n",
       "25                 Kanna Pinna  2017          156     7  6.333333\n",
       "57   Saravanan Irukka Bayamaen  2017          156     7  6.333333\n",
       "115            Adra Raja Adida  2017          367     6  5.900574\n",
       "77                  Peechankai  2017          157     6  5.833866\n",
       "146       Kadaisi Bench Karthi  2017          156     4  4.833333"
      ]
     },
     "execution_count": 67,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "build_chart('Comedy').head(15)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Text(0.5,1,'Overview')"
      ]
     },
     "execution_count": 68,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x1a18685978>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "stopwords = set(STOPWORDS)\n",
    "wordcloud = WordCloud(\n",
    "                          background_color='white',\n",
    "                          stopwords=stopwords,\n",
    "                          max_words=200,\n",
    "                          max_font_size=40, \n",
    "                          random_state=42\n",
    "                         ).generate(str(md['Overview']))\n",
    "plt.imshow(wordcloud)\n",
    "plt.axis('off')\n",
    "plt.title('Overview')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "9.09 ms ± 145 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
     ]
    }
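   ],
   "source": [
    "%%timeit\n",
    "# Normalise an overview: lowercase it and strip punctuation and digits.\n",
    "def cleaning(s):\n",
    "    s = str(s)\n",
    "    s = s.lower()\n",
    "    s = re.sub(r'\\s\\W', ' ', s)\n",
    "    s = re.sub(r'\\W,\\s', ' ', s)\n",
    "    s = re.sub(r'[^\\w]', ' ', s)   # any remaining non-word character becomes a space\n",
    "    s = re.sub(r'\\d+', '', s)      # drop digits\n",
    "    s = re.sub(r'\\s+', ' ', s)     # collapse runs of whitespace\n",
    "    s = re.sub(r'[!@#$_]', '', s)\n",
    "    # Caution: plain substring replacement, so 'co' is also removed inside words.\n",
    "    s = s.replace('co', '')\n",
    "    s = s.replace('https', '')\n",
    "    s = s.replace(',', '')\n",
    "    s = s.replace(r'[\\w*', ' ')    # literal text match, not a regular expression\n",
    "    return s\n",
    "\n",
    "md['Overview'] = [cleaning(s) for s in md['Overview']]"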
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    a thriller film directed by sudhakar shanmugam...\n",
       "1    soorathengai is the tamil film featuring guru ...\n",
       "2    the movie is directed by andal ramesh and feat...\n",
       "3    the film stars vijay and keerthy suresh in the...\n",
       "4    radhakrishnanparthiban s films can be hit and ...\n",
       "Name: Overview, dtype: object"
      ]
     },
     "execution_count": 70,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#Print plot overviews of the first 5 movies.\n",
    "md['Overview'].head()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(191, 1691)"
      ]
     },
     "execution_count": 71,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#Import TfIdfVectorizer from scikit-learn\n",
    "from sklearn.feature_extraction.text import TfidfVectorizer\n",
    "\n",
    "#Define a TF-IDF Vectorizer Object. Remove all english stop words such as 'the', 'a'\n",
    "tfidf = TfidfVectorizer(stop_words='english')\n",
    "\n",
    "#Replace NaN with an empty string\n",
    "md['Overview'] = md['Overview'].fillna('')\n",
    "\n",
    "#Construct the required TF-IDF matrix by fitting and transforming the data\n",
    "tfidf_matrix = tfidf.fit_transform(md['Overview'])\n",
    "\n",
    "#Output the shape of tfidf_matrix\n",
    "tfidf_matrix.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import linear_kernel\n",
    "from sklearn.metrics.pairwise import linear_kernel\n",
    "\n",
    "# Compute the cosine similarity matrix\n",
    "cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 1.        ,  0.0791257 ,  0.04333171,  0.12170194,  0.04869945,\n",
       "        0.09214741,  0.07638808,  0.055337  ,  0.02295348,  0.        ,\n",
       "        0.02371623,  0.        ,  0.03996045,  0.        ,  0.        ,\n",
       "        0.        ,  0.01935917,  0.        ,  0.        ,  0.        ,\n",
       "        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,\n",
       "        0.0176576 ,  0.        ,  0.        ,  0.05073715,  0.05972802,\n",
       "        0.12989205,  0.        ,  0.        ,  0.        ,  0.        ,\n",
       "        0.        ,  0.04020944,  0.04897837,  0.        ,  0.        ,\n",
       "        0.04768548,  0.        ,  0.        ,  0.        ,  0.        ,\n",
       "        0.05691269,  0.        ,  0.        ,  0.        ,  0.        ,\n",
       "        0.        ,  0.        ,  0.        ,  0.05207113,  0.        ,\n",
       "        0.07043293,  0.        ,  0.01985719,  0.05379042,  0.0492001 ,\n",
       "        0.        ,  0.        ,  0.04479365,  0.        ,  0.03595572,\n",
       "        0.        ,  0.        ,  0.0566211 ,  0.        ,  0.        ,\n",
       "        0.04921356,  0.12081681,  0.        ,  0.        ,  0.03687466,\n",
       "        0.        ,  0.        ,  0.        ,  0.        ,  0.04309622,\n",
       "        0.04162354,  0.17135481,  0.        ,  0.        ,  0.20635562,\n",
       "        0.0501414 ,  0.        ,  0.03187064,  0.        ,  0.12064169,\n",
       "        0.09506094,  0.10301313,  0.        ,  0.16776608,  0.        ,\n",
       "        0.        ,  0.        ,  0.03544325,  0.        ,  0.        ,\n",
       "        0.        ,  0.04747624,  0.        ,  0.07927598,  0.        ,\n",
       "        0.        ,  0.        ,  0.02848915,  0.        ,  0.04259367,\n",
       "        0.        ,  0.        ,  0.        ,  0.05855933,  0.        ,\n",
       "        0.04473147,  0.04472828,  0.        ,  0.02426322,  0.        ,\n",
       "        0.14459396,  0.05160853,  0.        ,  0.        ,  0.        ,\n",
       "        0.        ,  0.10983147,  0.        ,  0.        ,  0.10936779,\n",
       "        0.        ,  0.        ,  0.        ,  0.        ,  0.02188185,\n",
       "        0.0729935 ,  0.        ,  0.06534049,  0.        ,  0.        ,\n",
       "        0.        ,  0.11617162,  0.        ,  0.03801929,  0.        ,\n",
       "        0.        ,  0.        ,  0.20739346,  0.        ,  0.05375034,\n",
       "        0.        ,  0.        ,  0.03124291,  0.20971563,  0.        ,\n",
       "        0.        ,  0.        ,  0.02582353,  0.05359731,  0.        ,\n",
       "        0.05441297,  0.0702002 ,  0.        ,  0.        ,  0.        ,\n",
       "        0.050575  ,  0.        ,  0.0600976 ,  0.        ,  0.0337501 ,\n",
       "        0.        ,  0.        ,  0.08119167,  0.04409968,  0.02730328,\n",
       "        0.        ,  0.        ,  0.        ,  0.        ,  0.17452284,\n",
       "        0.04892673,  0.06054961,  0.        ,  0.        ,  0.02725508,\n",
       "        0.        ,  0.02830373,  0.        ,  0.        ,  0.0962513 ,  0.        ])"
      ]
     },
     "execution_count": 73,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cosine_sim[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Construct a reverse map of indices and movie titles\n",
    "md = md.reset_index()\n",
    "titles = md['Title']\n",
    "indices = pd.Series(md.index, index=md['Title']).drop_duplicates()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Function that takes in movie title as input and outputs most similar movies\n",
    "def get_recommendations(Title, cosine_sim=cosine_sim):\n",
    "    # Get the index of the movie that matches the title\n",
    "    idx = indices[Title]\n",
    "\n",
    "    # Get the pairwise similarity scores of all movies with that movie\n",
    "    sim_scores = list(enumerate(cosine_sim[idx]))\n",
    "\n",
    "    # Sort the movies based on the similarity scores\n",
    "    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)\n",
    "\n",
    "    # Get the scores of the 10 most similar movies\n",
    "    sim_scores = sim_scores[1:11]\n",
    "\n",
    "    # Get the movie indices\n",
    "    movie_indices = [i[0] for i in sim_scores]\n",
    "\n",
    "    # Return the top 10 most similar movies\n",
    "    return md['Title'].iloc[movie_indices]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "108                  Sathura Adi 3500\n",
       "63     Sangili Bungili Kadhava Thorae\n",
       "52          Baahubali: The Conclusion\n",
       "131                     Bayama Irukku\n",
       "110       Podhuvaga En Manasu Thangam\n",
       "88                   Yaanum Theeyavan\n",
       "127                    Thupparivaalan\n",
       "18                         Kuttram 23\n",
       "34                     Paambhu Sattai\n",
       "38                               Dora\n",
       "Name: Title, dtype: object"
      ]
     },
     "execution_count": 76,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "get_recommendations('Shivalinga')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "153                    Ippadai Vellum\n",
       "84     Adhagappattathu Magajanangalay\n",
       "81                             Veruli\n",
       "30                                465\n",
       "0              Peiyena Peiyum Kurudhi\n",
       "147                   Kalathur Gramam\n",
       "91                          Niranjana\n",
       "169                             Yaazh\n",
       "5             Sivappu Enakku Pidikkum\n",
       "1                        Soorathengai\n",
       "Name: Title, dtype: object"
      ]
     },
     "execution_count": 77,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "get_recommendations('Bairavaa')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Title</th>\n",
       "      <th>Director</th>\n",
       "      <th>Cast</th>\n",
       "      <th>Genre</th>\n",
       "      <th>Producer</th>\n",
       "      <th>Vote</th>\n",
       "      <th>Vote Counts</th>\n",
       "      <th>Year</th>\n",
       "      <th>Overview</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Peiyena Peiyum Kurudhi</td>\n",
       "      <td>Sudhakar Shanmugam</td>\n",
       "      <td>Jana, Seenivasan, Harish, Ganeshan, Robin</td>\n",
       "      <td>Thriller</td>\n",
       "      <td>Lion Hunters Productions</td>\n",
       "      <td>4.5</td>\n",
       "      <td>28.0</td>\n",
       "      <td>2017</td>\n",
       "      <td>A thriller film directed by Sudhakar Shanmugam...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Soorathengai</td>\n",
       "      <td>Sanjeev Srinivas Kanna</td>\n",
       "      <td>Arvind Vinod, Eugina Samanthi, Theni Murugan</td>\n",
       "      <td>Action masala</td>\n",
       "      <td>Maruthi Films International</td>\n",
       "      <td>5.2</td>\n",
       "      <td>35.0</td>\n",
       "      <td>2017</td>\n",
       "      <td>Soorathengai is the Tamil film featuring Guru ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Unnai Thottu Kolla Vaa</td>\n",
       "      <td>Krishnakumar</td>\n",
       "      <td>Powerstar Srinivasan, Livingston, Ganja Karuppu</td>\n",
       "      <td>Horror</td>\n",
       "      <td>Kavibharathi Creations</td>\n",
       "      <td>4.6</td>\n",
       "      <td>40.0</td>\n",
       "      <td>2017</td>\n",
       "      <td>The movie is directed by Andal Ramesh and feat...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Bairavaa</td>\n",
       "      <td>Bharathan</td>\n",
       "      <td>Vijay, Keerthy Suresh, Sathish, Jagapati Babu,...</td>\n",
       "      <td>Action masala</td>\n",
       "      <td>Vijaya Productions</td>\n",
       "      <td>7.0</td>\n",
       "      <td>120.0</td>\n",
       "      <td>2017</td>\n",
       "      <td>The film stars Vijay and Keerthy Suresh in the...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Koditta Idangalai Nirappuga</td>\n",
       "      <td>Parthiban</td>\n",
       "      <td>Shanthanu Bhagyaraj, Parvatii Nair, Parthiban,...</td>\n",
       "      <td>Comedy thriller</td>\n",
       "      <td>Reel Estate Company &amp; Bioscope Film Frames</td>\n",
       "      <td>7.0</td>\n",
       "      <td>29.0</td>\n",
       "      <td>2017</td>\n",
       "      <td>RadhakrishnanParthiban’s films can be hit-and-...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                         Title                Director  \\\n",
       "0       Peiyena Peiyum Kurudhi      Sudhakar Shanmugam   \n",
       "1                 Soorathengai  Sanjeev Srinivas Kanna   \n",
       "2       Unnai Thottu Kolla Vaa            Krishnakumar   \n",
       "3                     Bairavaa               Bharathan   \n",
       "4  Koditta Idangalai Nirappuga               Parthiban   \n",
       "\n",
       "                                                Cast            Genre  \\\n",
       "0          Jana, Seenivasan, Harish, Ganeshan, Robin         Thriller   \n",
       "1       Arvind Vinod, Eugina Samanthi, Theni Murugan    Action masala   \n",
       "2    Powerstar Srinivasan, Livingston, Ganja Karuppu           Horror   \n",
       "3  Vijay, Keerthy Suresh, Sathish, Jagapati Babu,...    Action masala   \n",
       "4  Shanthanu Bhagyaraj, Parvatii Nair, Parthiban,...  Comedy thriller   \n",
       "\n",
       "                                     Producer  Vote  Vote Counts  Year  \\\n",
       "0                    Lion Hunters Productions   4.5         28.0  2017   \n",
       "1                 Maruthi Films International   5.2         35.0  2017   \n",
       "2                      Kavibharathi Creations   4.6         40.0  2017   \n",
       "3                          Vijaya Productions   7.0        120.0  2017   \n",
       "4  Reel Estate Company & Bioscope Film Frames   7.0         29.0  2017   \n",
       "\n",
       "                                            Overview  \n",
       "0  A thriller film directed by Sudhakar Shanmugam...  \n",
       "1  Soorathengai is the Tamil film featuring Guru ...  \n",
       "2  The movie is directed by Andal Ramesh and feat...  \n",
       "3  The film stars Vijay and Keerthy Suresh in the...  \n",
       "4  RadhakrishnanParthiban’s films can be hit-and-...  "
      ]
     },
     "execution_count": 78,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
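    "# Collaborative filtering\n",
    "# Reader() defaults to a 1-5 rating scale, while the Vote column reaches values such as 8.3,\n",
    "# so estimates from a model built with this reader are clipped to the 1-5 range.\n",
    "reader = Reader()\n",
    "ratings = pd.read_csv('Tamil_movies.csv')\n",
    "ratings.head()"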
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {},
   "outputs": [],
   "source": [
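    "# Surprise expects (user, item, rating) columns; here Director and Genre stand in for user and item.\n",
    "# This rebinds md, which held the movies DataFrame in the earlier cells.\n",
    "md = Dataset.load_from_df(ratings[['Director', 'Genre', 'Vote']], reader)\n",
    "md.split(n_folds=5)"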
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Evaluating RMSE, MAE of algorithm SVD.\n",
      "\n",
      "------------\n",
      "Fold 1\n",
      "RMSE: 2.1224\n",
      "MAE:  1.7692\n",
      "------------\n",
      "Fold 2\n",
      "RMSE: 1.8613\n",
      "MAE:  1.5132\n",
      "------------\n",
      "Fold 3\n",
      "RMSE: 2.1208\n",
      "MAE:  1.8974\n",
      "------------\n",
      "Fold 4\n",
      "RMSE: 1.7317\n",
      "MAE:  1.5342\n",
      "------------\n",
      "Fold 5\n",
      "RMSE: 1.8947\n",
      "MAE:  1.6526\n",
      "------------\n",
      "------------\n",
      "Mean RMSE: 1.9462\n",
      "Mean MAE : 1.6733\n",
      "------------\n",
      "------------\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "CaseInsensitiveDefaultDict(list,\n",
       "                           {'mae': [1.7692307692307692,\n",
       "                             1.513157894736842,\n",
       "                             1.8973684210526314,\n",
       "                             1.5342105263157895,\n",
       "                             1.6526315789473685],\n",
       "                            'rmse': [2.1224079213514502,\n",
       "                             1.8613096690799535,\n",
       "                             2.1207620278917121,\n",
       "                             1.7316709302076752,\n",
       "                             1.8947295321496416]})"
      ]
     },
     "execution_count": 80,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
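    "# evaluate() belongs to the older Surprise API; later releases provide\n",
    "# surprise.model_selection.cross_validate for the same purpose.\n",
    "svd = SVD()\n",
    "evaluate(svd, md, measures=['RMSE', 'MAE'])"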
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<surprise.prediction_algorithms.matrix_factorization.SVD at 0x1a1952d208>"
      ]
     },
     "execution_count": 81,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
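    "# Refit on the full dataset; train() is the older Surprise name for what later releases call fit().\n",
    "trainset = md.build_full_trainset()\n",
    "svd.train(trainset)"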
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Title</th>\n",
       "      <th>Director</th>\n",
       "      <th>Cast</th>\n",
       "      <th>Genre</th>\n",
       "      <th>Producer</th>\n",
       "      <th>Vote</th>\n",
       "      <th>Vote Counts</th>\n",
       "      <th>Year</th>\n",
       "      <th>Overview</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>38</th>\n",
       "      <td>Dora</td>\n",
       "      <td>Doss Ramasamy</td>\n",
       "      <td>Nayanthara, Sulile Kumar, Harish Uthaman, Tham...</td>\n",
       "      <td>Horror</td>\n",
       "      <td>A Sarkunam Cinemaz &amp; Nemichand Jabak Productions</td>\n",
       "      <td>8.3</td>\n",
       "      <td>538.0</td>\n",
       "      <td>2017</td>\n",
       "      <td>A father and a daughter buy a used car which i...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>113</th>\n",
       "      <td>Thappattam</td>\n",
       "      <td>Mujibur Rahman</td>\n",
       "      <td>Durai Sudhakar, Dona Rozario</td>\n",
       "      <td>Romantic drama</td>\n",
       "      <td>Moon Pictures</td>\n",
       "      <td>8.3</td>\n",
       "      <td>123.0</td>\n",
       "      <td>2017</td>\n",
       "      <td>Thappattam is a Tamil movie released on 24 Aug...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>144</th>\n",
       "      <td>Mersal</td>\n",
       "      <td>Atlee</td>\n",
       "      <td>Vijay, Samantha, Kajal Aggarwal, Nithya Menen,...</td>\n",
       "      <td>Action Thriller</td>\n",
       "      <td>Sri Thenandal Films</td>\n",
       "      <td>8.3</td>\n",
       "      <td>1352.0</td>\n",
       "      <td>2017</td>\n",
       "      <td>Maaran, a doctor, is invited to Paris for a se...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>164</th>\n",
       "      <td>Guru Uchaththula Irukkaru</td>\n",
       "      <td>B. Dhandapani</td>\n",
       "      <td>Guru Jeeva, Aara, Pandiarajan, M. S. Bhaskar</td>\n",
       "      <td>Comedy</td>\n",
       "      <td>Best Movies</td>\n",
       "      <td>8.3</td>\n",
       "      <td>156.0</td>\n",
       "      <td>2017</td>\n",
       "      <td>A wastrel and his friends learn that a politic...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                         Title        Director  \\\n",
       "38                        Dora   Doss Ramasamy   \n",
       "113                 Thappattam  Mujibur Rahman   \n",
       "144                     Mersal           Atlee   \n",
       "164  Guru Uchaththula Irukkaru   B. Dhandapani   \n",
       "\n",
       "                                                  Cast            Genre  \\\n",
       "38   Nayanthara, Sulile Kumar, Harish Uthaman, Tham...           Horror   \n",
       "113                       Durai Sudhakar, Dona Rozario   Romantic drama   \n",
       "144  Vijay, Samantha, Kajal Aggarwal, Nithya Menen,...  Action Thriller   \n",
       "164       Guru Jeeva, Aara, Pandiarajan, M. S. Bhaskar           Comedy   \n",
       "\n",
       "                                             Producer  Vote  Vote Counts  \\\n",
       "38   A Sarkunam Cinemaz & Nemichand Jabak Productions   8.3        538.0   \n",
       "113                                     Moon Pictures   8.3        123.0   \n",
       "144                               Sri Thenandal Films   8.3       1352.0   \n",
       "164                                       Best Movies   8.3        156.0   \n",
       "\n",
       "     Year                                           Overview  \n",
       "38   2017  A father and a daughter buy a used car which i...  \n",
       "113  2017  Thappattam is a Tamil movie released on 24 Aug...  \n",
       "144  2017  Maaran, a doctor, is invited to Paris for a se...  \n",
       "164  2017  A wastrel and his friends learn that a politic...  "
      ]
     },
     "execution_count": 82,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ratings[ratings['Vote'] == 8.3]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Prediction(uid=1, iid=302, r_ui=3, est=5, details={'was_impossible': False})"
      ]
     },
     "execution_count": 83,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
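    "# uid=1 and iid=302 were not seen in training (the ids there are director and genre names),\n",
    "# so the estimate falls back to the global mean, clipped to the reader's rating scale.\n",
    "svd.predict(1, 302, 3)\n"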
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 100,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training Accuracy 0.8671328671328671\n"
     ]
    }
   ],
   "source": [
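    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.tree import DecisionTreeClassifier\n",
    "\n",
    "# md must be the movies DataFrame here (the collaborative-filtering cells rebind it to a Surprise Dataset)\n",
    "X_train, X_test, Y_train, Y_test = train_test_split(\n",
    "        md.loc[:, ['Vote', 'Vote Counts']],\n",
    "        md.loc[:, 'Title'])\n",
    "\n",
    "# Nearly every title is its own class, so the training score mostly reflects memorisation\n",
    "tree = DecisionTreeClassifier()  # optional arguments: criterion='entropy', max_depth=3, random_state=0\n",
    "tree.fit(X_train, Y_train)\n",
    "#Y_pred_tree = tree.predict(X_test)\n",
    "print(\"Training Accuracy {}\".format(tree.score(X_train, Y_train)))\n",
    "#print(\"Testing Accuracy {}\".format(tree.score(X_test, Y_test)))"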
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 101,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training Accuracy 0.8461538461538461\n"
     ]
    }
   ],
   "source": [
    "from sklearn.ensemble import RandomForestClassifier\n",
    "random = RandomForestClassifier()\n",
    "random.fit(X_train,Y_train)\n",
    "#y_pred = random.predict(X_test)\n",
    "print (\"Training Accuracy {}\".format(random.score(X_train, Y_train)))\n",
    "#print (\"Testing Accuracy {}\".format(random.score(X_test, Y_test)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.naive_bayes import BernoulliNB\n",
    "naive = BernoulliNB()\n",
    "naive.fit(X_train, Y_train)\n",
    "#y_pred = naive.predict(X_test)\n",
    "print(\"Training Accuracy {}\".format(naive.score(X_train, Y_train)))\n",
    "#print(\"Testing Accuracy {}\".format(naive.score(X_test, Y_test)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
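
The get_recommendations function in the notebook ranks movies by cosine similarity and returns the closest titles. The following is a minimal, dependency-free sketch of that idea, not the notebook's own code. The names cosine, top_similar, and toy_vectors are hypothetical, and the feature vectors are made up for illustration rather than taken from the Tamil movies data.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def top_similar(title, vectors, n=10):
    """Return up to n other titles, most similar to `title` first."""
    query = vectors[title]
    scores = [
        (other, cosine(query, vec))
        for other, vec in vectors.items()
        if other != title  # skip the query movie explicitly
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scores[:n]]


# Hypothetical toy feature vectors (assumption: not from the notebook's dataset)
toy_vectors = {
    "Movie A": [1.0, 0.0, 1.0],
    "Movie B": [1.0, 0.0, 0.9],
    "Movie C": [0.0, 1.0, 0.0],
    "Movie D": [0.5, 0.5, 0.5],
}

print(top_similar("Movie A", toy_vectors, n=2))
```

One design difference from the notebook: the sketch excludes the query movie by name, whereas the notebook drops the first sorted entry with sim_scores[1:11], which assumes the movie itself always sorts first.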
View on Github
Tamil Movie Recommendation
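
The SVD evaluation in the notebook reports RMSE and MAE for each of five folds and then their means. As a rough sketch of what those numbers mean, the snippet below defines both error measures in plain Python and averages the per-fold values printed in the notebook output (rounded to four decimals). The helper names rmse and mae are illustrative and are not Surprise functions.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Per-fold values as printed in the notebook's evaluation output
fold_rmse = [2.1224, 1.8613, 2.1208, 1.7317, 1.8947]
fold_mae = [1.7692, 1.5132, 1.8974, 1.5342, 1.6526]

mean_rmse = sum(fold_rmse) / len(fold_rmse)
mean_mae = sum(fold_mae) / len(fold_mae)

print("Mean RMSE: {:.4f}".format(mean_rmse))
print("Mean MAE : {:.4f}".format(mean_mae))
```

The two printed means should agree with the "Mean RMSE" and "Mean MAE" lines in the notebook output. Because RMSE squares each error before averaging, it is never smaller than MAE on the same predictions, which matches the per-fold figures.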
