Clustering-based Location Recommendation (Collaborative Filtering)

Presented By
Md. Farhan Tanvir (2014-2-60-124)
Kevin Stephen Bishwas (2014-2-60-091)
Nazmul Hasan (2014-2-60-063)
Supervised By
Dr. Mohammad Rezwanul Huq
Assistant Professor
Department of Computer Science and Engineering
East West University
Clustering-based Location
Recommendation System

The world is an over-crowded place

They all want to get our attention

We are overloaded
• Thousands of new places to visit
• Millions of restaurants, hotels, and parks to visit

Can Google Help?
• Yes, but only when we really know what we are looking for
• What if I just want some interesting place to visit?
– Btw, what do we mean by “interesting”?

Can Facebook Help?
• Yes, I tend to find my friends’ stuff interesting
• What if I have only a few friends, and the places they visit do not always appeal to me?

Can experts help?
• Yes, but it won’t scale well
– Everyone receives exactly the same advice!
• It is what they like, not what I like!
– As with restaurants, expert approval does not guarantee that a place will appeal to the masses.

OK, here is the idea: a Recommendation System
• A recommendation system is an information filtering technique that provides users with information they may be interested in.
• Based on
- Past Behavior
- Relations to the user
- Item Similarity
- Context

Existing Work
• Ling Li, Ya Zhou, Han Xiong, Cailin Hu, “Collaborative filtering based on user attributes and user ratings for restaurant recommendation,” 2017 IEEE 2nd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC).
• Zhiyang Jia, Wei Gao, Yuting Yang, Xu Chen, “User-based Collaborative Filtering for Tourist Attraction Recommendations,” 2015 IEEE International Conference on Computational Intelligence & Communication Technology.
• Lakshmi Tharun Ponnam, Sreenivasa Deepak Punyasamudram, Siva Nagaraju Nallagulla, Srikanth Yellamati, “Movie Recommender System Using Item Based Collaborative Filtering Technique,” 2016 International Conference on Emerging Trends in Engineering, Technology and Science (ICETETS).

Our Proposal
Input Dataset → Data Cleaning → Feature Engineering → Clustering → Find User Preference → Result

Our Dataset
• Foursquare NYC Check-in Dataset
• https://sites.google.com/site/yangdingqi/home/foursquare-dataset

Attributes of our Dataset
1 User ID
2 Venue ID
3 Venue Category ID
4 Venue Category
5 Latitude
6 Longitude
7 Time zone offset
8 UTC time
After data cleaning and feature engineering, the dataset gained some additional attributes.
What are data cleaning and feature engineering?

Task 1: Data Cleaning
• Removing Home Check-Ins:
- The dataset did not contain home check-ins for all users, so we removed the home check-ins that were present during cleaning.

Task 1: Data Cleaning(Cont…)
• Replacing multiple categories of a venue:
User Id   Venue Id   Venue Category Id   Venue Category
1         V-1        C001                Bar
1         V-1        C002                Bar
1         V-1        C001                Bar
1         V-1        C002                Bar
1         V-1        C002                Park
Figure : Before Replacing
User Id   Venue Id   Venue Category Id   Venue Category
1         V-1        C002                Bar
1         V-1        C002                Bar
1         V-1        C002                Bar
1         V-1        C002                Bar
1         V-1        C002                Bar
Figure : After Replacing
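The before/after figures above suggest that each venue ends up with its most frequent category ID and name. The slides do not state the exact rule, so the following is only a minimal sketch that assumes a majority vote per venue; the function name `unify_venue_categories` and the tuple layout are hypothetical.

```python
from collections import Counter, defaultdict


def unify_venue_categories(rows):
    """Give every row of a venue that venue's most frequent category id and name.

    rows: list of (user_id, venue_id, category_id, category_name) tuples.
    Assumption: the majority value wins; the slides do not spell out the rule.
    """
    id_counts = defaultdict(Counter)
    name_counts = defaultdict(Counter)
    for _, venue, cat_id, cat_name in rows:
        id_counts[venue][cat_id] += 1
        name_counts[venue][cat_name] += 1

    cleaned = []
    for user, venue, _, _ in rows:
        top_id = id_counts[venue].most_common(1)[0][0]
        top_name = name_counts[venue].most_common(1)[0][0]
        cleaned.append((user, venue, top_id, top_name))
    return cleaned


# The rows from the "Before Replacing" figure.
rows = [
    (1, "V-1", "C001", "Bar"),
    (1, "V-1", "C002", "Bar"),
    (1, "V-1", "C001", "Bar"),
    (1, "V-1", "C002", "Bar"),
    (1, "V-1", "C002", "Park"),
]
cleaned = unify_venue_categories(rows)
```

With these rows, C002 (3 of 5) and Bar (4 of 5) win, which matches the "After Replacing" figure.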

• Replacing sub-category IDs in the Category Id column:
User Id   Venue Id   Venue Category Id   Venue Category
1         V-1        C001                Bar
1         V-2        C002                Bar
1         V-3        C001                Bar
1         V-4        C002                Bar
1         V-5        C002                Bar
Figure : Before Replacing
User Id   Venue Id   Venue Category Id   Venue Category
1         V-1        C002                Bar
1         V-2        C002                Bar
1         V-3        C002                Bar
1         V-4        C002                Bar
1         V-5        C002                Bar
Figure : After Replacing

• Replacing differing latitude and longitude values of a venue:
Venue Id   Latitude   Longitude
V-1        40         -73
V-1        43         -70
V-1        43         -70
V-1        40         -73
V-1        40         -73
Figure : Before Replacing
Venue Id   Latitude   Longitude
V-1        40         -73
V-1        40         -73
V-1        40         -73
V-1        40         -73
V-1        40         -73
Figure : After Replacing

Task 2: Feature Engineering
• Check-In Counts:
User Id   Venue Id   Check-In Count
1083      V-1        3
1083      V-2        1
1083      V-3        1
1083      V-4        2
1083      V-5        1
Figure : After adding Check-In Count attribute
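The Check-In Count attribute can be derived by counting how often each (user, venue) pair appears in the raw check-in rows. This is a small sketch of that idea; the function name `count_check_ins` and the sample rows are illustrative assumptions, chosen so that the counts match the figure above.

```python
from collections import Counter


def count_check_ins(check_ins):
    """Return {(user_id, venue_id): number of check-ins} from raw check-in rows."""
    return Counter((user, venue) for user, venue in check_ins)


# Hypothetical raw check-ins for user 1083 (one row per visit).
raw = [
    (1083, "V-1"), (1083, "V-2"), (1083, "V-1"), (1083, "V-3"),
    (1083, "V-4"), (1083, "V-1"), (1083, "V-4"), (1083, "V-5"),
]
counts = count_check_ins(raw)
```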

Task 2: Feature Engineering(Cont…)
• Venue Distance from User’s Center:
- First, we find each user’s center point by averaging the latitude and longitude of the venues where the user has previously checked in.
- Then, using this center point, we calculate the distance to each venue with the Haversine formula:
\[ d = 2r \sin^{-1}\left( \sqrt{ \sin^{2}\left( \frac{\varphi_2 - \varphi_1}{2} \right) + \cos\varphi_1 \cos\varphi_2 \sin^{2}\left( \frac{\lambda_2 - \lambda_1}{2} \right) } \right) \]
Where,
• d is the distance between the two points,
• r is the radius of the sphere,
• φ1, φ2: latitude of point 1 and latitude of point 2, in radians
• λ1, λ2: longitude of point 1 and longitude of point 2, in radians
Reference : https://www.movable-type.co.uk/scripts/latlong.html
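The two steps above (averaging coordinates to get a center, then applying the Haversine formula) can be sketched as follows. The names `user_center` and `haversine_km` are hypothetical, and the mean Earth radius of 6371 km is an assumed value for r; the slides do not say which radius was used.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius for r


def user_center(points):
    """Average the (latitude, longitude) pairs of a user's check-ins, in degrees."""
    lat = sum(p[0] for p in points) / len(points)
    lon = sum(p[1] for p in points) / len(points)
    return lat, lon


def haversine_km(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_KM):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


# Illustrative check-ins, not taken from the dataset.
center = user_center([(40.0, -73.0), (42.0, -75.0)])
distance = haversine_km(center[0], center[1], 40.0, -73.0)
```

Averaging raw latitude and longitude is a simple approximation that is reasonable over a city-sized area such as NYC.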

Our Dataset After Feature Engineering
1 User ID
2 Venue ID
3 Venue Category ID
4 Venue Category
5 Latitude
6 Longitude
7 Distance From Center
8 Check In Count

Task 2 : Clustering
• We used KNN (k-nearest neighbors) as the clustering algorithm.
• First, we compute the similarity between users using Pearson correlation. We also tried cosine similarity, but Pearson correlation gave better results.
\[ sim(u, v) = \frac{\sum_{i \in I_{uv}} (R_{ui} - \bar{R}_u)(R_{vi} - \bar{R}_v)}{\sqrt{\sum_{i \in I_{uv}} (R_{ui} - \bar{R}_u)^2} \, \sqrt{\sum_{i \in I_{uv}} (R_{vi} - \bar{R}_v)^2}} \]
Where:
• R_{ui}, R_{vi} are the check-in counts of users u and v for the i-th item.
• \bar{R}_u, \bar{R}_v are the average check-in counts of users u and v.
• I_{uv} denotes the set of items checked in by both users u and v.
Reference : Collaborative filtering based on user attributes and user ratings for restaurant recommendation
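A minimal sketch of the Pearson similarity defined above, operating on per-user dictionaries of check-in counts. The function name `pearson_similarity` is hypothetical. Two choices here are assumptions: each user's mean is taken over all of that user's venues (following the definition of the averages above), and a similarity of 0 is returned when there is no overlap or no variance.

```python
import math


def pearson_similarity(counts_u, counts_v):
    """Pearson correlation over the venues both users have checked in to.

    counts_u, counts_v: dict mapping venue_id -> check-in count.
    """
    common = set(counts_u) & set(counts_v)
    if not common:
        return 0.0  # assumption: no shared venues means no evidence of similarity
    mean_u = sum(counts_u.values()) / len(counts_u)
    mean_v = sum(counts_v.values()) / len(counts_v)
    num = sum((counts_u[i] - mean_u) * (counts_v[i] - mean_v) for i in common)
    den_u = math.sqrt(sum((counts_u[i] - mean_u) ** 2 for i in common))
    den_v = math.sqrt(sum((counts_v[i] - mean_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0  # assumption: treat zero variance as zero similarity
    return num / (den_u * den_v)


same_trend = pearson_similarity({"a": 1, "b": 2, "c": 3}, {"a": 2, "b": 4, "c": 6})
opposite_trend = pearson_similarity({"a": 1, "b": 2, "c": 3}, {"a": 3, "b": 2, "c": 1})
```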

Task 2 : Clustering(Cont…)
• After computing the similarities, we take the top n nearest neighbors.
• We then use their check-in counts to predict the check-in count for every place the user has not checked in to, using a weighted average of the neighbors’ check-in counts.
• Finally, we take the places with the highest predicted check-in counts.
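The prediction step can be sketched as below, taking precomputed similarities as input. The slides say only "weighted average", so weighting by similarity and dropping neighbors with non-positive similarity are assumptions; the function names and sample data are hypothetical.

```python
def top_neighbors(similarities, n):
    """Pick the n most similar users, keeping only positive similarities (assumption)."""
    ranked = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    return [(neighbor, sim) for neighbor, sim in ranked[:n] if sim > 0]


def predict_count(neighbors, counts, venue):
    """Similarity-weighted average of the neighbors' check-in counts for one venue."""
    num = den = 0.0
    for neighbor, sim in neighbors:
        if venue in counts[neighbor]:
            num += sim * counts[neighbor][venue]
            den += sim
    return num / den if den else 0.0


def recommend(visited, neighbors, counts, k):
    """Return the k unvisited venues with the highest predicted check-in counts."""
    candidates = {v for neighbor, _ in neighbors for v in counts[neighbor]} - set(visited)
    scored = [(predict_count(neighbors, counts, v), v) for v in candidates]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [venue for _, venue in scored[:k]]


# Illustrative similarities and check-in counts, not taken from the dataset.
similarities = {"a": 0.9, "b": 0.5, "c": -0.2}
counts = {
    "a": {"V-1": 4, "V-2": 2},
    "b": {"V-1": 2, "V-3": 6},
    "c": {"V-4": 9},
}
neighbors = top_neighbors(similarities, 2)
top_places = recommend({"V-1"}, neighbors, counts, k=2)
```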

Task 3 : Find User Preference
• We take the distance of each of the user’s check-ins from the user’s center point and compute the mean distance. If most of the user’s check-ins are farther than the mean distance, we say the user likes to travel long distances; otherwise, the user likes to travel short distances. We then sort the recommendations according to this preference.
• Example :
User’s mean check-in distance = 50 km
The user has 50 check-ins.
30 of them are more than 50 km from the center.
Result : The user loves to travel long distances.
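The preference rule above can be sketched as follows. How the recommendations are sorted is not specified on the slide, so ordering farthest-first for long-distance users and nearest-first otherwise is an assumption, as are the function names.

```python
def travel_preference(distances_km):
    """Return "long" if most check-ins lie beyond the user's mean distance, else "short"."""
    mean = sum(distances_km) / len(distances_km)
    far = sum(1 for d in distances_km if d > mean)
    return "long" if far > len(distances_km) / 2 else "short"


def sort_by_preference(recommendations, preference):
    """Order (venue_id, distance_km) pairs by distance.

    Assumption: farthest first for "long" users, nearest first for "short" users.
    """
    return sorted(recommendations, key=lambda r: r[1], reverse=(preference == "long"))


# Mirrors the slide's example: 50 check-ins, mean 50 km, 30 of them beyond the mean.
distances = [60.0] * 30 + [35.0] * 20
preference = travel_preference(distances)
ordered = sort_by_preference([("A", 5.0), ("B", 80.0), ("C", 20.0)], preference)
```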

Example
              Place1   Place2   Place3   Place4
Me            3        -        5        ?
My Friend     4        6        -        -
You           3        -        5        6
Another guy   4        2        -        1
Your Friend   8        -        -        3
What is the probable check-in count for Place4?

Example(Cont..)
              Place1   Place2   Place3   Place4
Me            3        -        5        6
My Friend     4        6        -        -
You           3        -        5        6
Another guy   4        2        -        1
Your Friend   8        -        -        3
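As a rough check, the value 6 for Place4 can be reproduced with the Pearson similarity and weighted-average steps described earlier. This working rests on two assumptions not stated on the slides: each user's mean is taken over all of that user's check-ins, and only neighbors with positive similarity who have checked in at Place4 contribute. Under those assumptions "You" is the only contributing neighbor, since "Another guy" and "Your Friend" share only Place1 with "Me" and come out negatively correlated.

```latex
\bar{R}_{Me} = \frac{3 + 5}{2} = 4, \qquad
\bar{R}_{You} = \frac{3 + 5 + 6}{3} \approx 4.67

sim(Me, You) =
\frac{(3 - 4)(3 - 4.67) + (5 - 4)(5 - 4.67)}
     {\sqrt{(3 - 4)^2 + (5 - 4)^2} \, \sqrt{(3 - 4.67)^2 + (5 - 4.67)^2}}
\approx \frac{2.00}{1.41 \times 1.70} \approx 0.83

\hat{R}_{Me,\,Place4} = \frac{0.83 \times 6}{0.83} = 6
```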

Evaluation
• We used sampling and RMSE (root-mean-square error) to evaluate our recommendations.
• For sampling, 10% of the entire dataset was selected randomly, without replacement, to form a sample dataset.
• RMSE was used to evaluate the algorithm. It measures the error between the predicted and actual check-in counts of a venue for a specific user in the test dataset.
RMSE Formula:
\[ RMSE = \sqrt{ \frac{ \sum_{i=1}^{N} (P_{u,i} - R_{u,i})^2 }{ N } } \]
Here :
P_{u,i} is the predicted check-in count for user u on venue i
R_{u,i} is the actual check-in count for user u on venue i
N is the total number of venues where the user checked in
Reference : Collaborative filtering based on user attributes and user ratings for restaurant recommendation
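The two evaluation steps can be sketched as below: drawing a 10% sample without replacement, and computing RMSE between predicted and actual check-in counts. The function names and the fixed seed are assumptions made so that the example is reproducible; the sample data is illustrative only.

```python
import math
import random


def sample_rows(rows, fraction=0.10, seed=0):
    """Randomly select a fraction of the rows without replacement."""
    k = round(len(rows) * fraction)
    return random.Random(seed).sample(rows, k)


def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual check-in counts."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


sample = sample_rows(list(range(100)))
error = rmse([2, 4], [4, 2])
```

A lower RMSE means the predicted check-in counts are closer to the actual ones.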

RMSE Graph
Figure : RMSE graph

Demo
• We created a simple demo in which a user enters their ID and our system recommends places for that user.
Figure : Input User Id
Figure : Output Recommendation

Future Work
• Try a model-based recommendation system
• Add more domains
• Try a triangulation technique to find the user’s center point
