Presented By
Md. Farhan Tanvir (2014-2-60-124)
Kevin Stephen Bishwas (2014-2-60-091)
Nazmul Hasan (2014-2-60-063)
Supervised By
Dr. Mohammad Rezwanul Huq
Assistant Professor
Department of Computer Science and Engineering
East West University
Clustering-based Location
Recommendation System
1
The world is an over-crowded place
2
They all want to get our attention
3
We are overloaded
• Thousands of new places to visit
• Millions of restaurants, hotels, and parks to visit
4
5
Can Google Help?
• Yes, but only when we really know what we are looking for
• What if I just want some interesting place to visit?
– By the way, what does “interesting” mean?
6
Can Facebook Help?
• Yes, I tend to find my friends’ stuff interesting
• What if I have only a few friends, and the places they visit do not always appeal to me?
7
Can experts help?
• Yes, but it won’t scale well
– Everyone receives exactly the same advice!
• It is what they like, not what I like!
– As with restaurants, expert approval does not guarantee mass appeal.
8
OK, here is the idea: a Recommendation System
• A recommendation system is an information filtering technique that provides users with information they may be interested in.
• Based on
- Past behavior
- Relations to the user
- Item similarity
- Context
9
Existing Work
• Ling Li, Ya Zhou, Han Xiong, Cailin Hu, "Collaborative filtering based on user attributes and user ratings for restaurant recommendation," 2017 IEEE 2nd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC).
• Zhiyang Jia, Wei Gao, Yuting Yang, Xu Chen, "User-based Collaborative Filtering for Tourist Attraction Recommendations," 2015 IEEE International Conference on Computational Intelligence & Communication Technology.
• Lakshmi Tharun Ponnam, Sreenivasa Deepak Punyasamudram, Siva Nagaraju Nallagulla, Srikanth Yellamati, "Movie Recommender System Using Item Based Collaborative Filtering Technique," 2016 International Conference on Emerging Trends in Engineering, Technology and Science (ICETETS).
10
Our Proposal
Input Dataset → Data Cleaning → Feature Engineering → Clustering → Find User Preference → Result
11
Our Dataset
• Foursquare NYC Check-in Dataset
• https://sites.google.com/site/yangdingqi/home/foursquare-dataset
12
Attributes of our Dataset
13
1 User ID
2 Venue ID
3 Venue Category ID
4 Venue Category
5 Latitude
6 Longitude
7 Time zone offset
8 UTC time
After data cleaning and feature engineering, we ended up with some different attributes.
What data cleaning and feature engineering did we do?
Task 1: Data Cleaning
• Removing Home Check-Ins:
- The dataset did not contain home check-ins for all users, so we removed the home check-ins that were present as part of the cleaning process.
14
Task 1: Data Cleaning (Cont…)
• Replacing multiple categories of a venue:
User Id   Venue Id   Venue Category Id   Venue Category
1         V-1        C001                Bar
1         V-1        C002                Bar
1         V-1        C001                Bar
1         V-1        C002                Bar
1         V-1        C002                Park
Figure: Before replacing
User Id   Venue Id   Venue Category Id   Venue Category
1         V-1        C002                Bar
1         V-1        C002                Bar
1         V-1        C002                Bar
1         V-1        C002                Bar
1         V-1        C002                Bar
Figure: After replacing
15
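The slides do not say which value wins when a venue carries several categories. A minimal sketch, assuming the most frequent value per venue is kept (which matches the before/after figures); the function name `unify_by_venue` and the field names are hypothetical:

```python
from collections import Counter, defaultdict

def unify_by_venue(rows, key="venue_id", fields=("category_id", "category")):
    """Replace each listed field with the value seen most often for that venue.

    Assumption: "most frequent value wins" is inferred from the example
    tables, not stated in the slides.
    """
    seen = defaultdict(lambda: defaultdict(Counter))
    for row in rows:
        for field in fields:
            seen[row[key]][field][row[field]] += 1
    return [
        {**row, **{f: seen[row[key]][f].most_common(1)[0][0] for f in fields}}
        for row in rows
    ]

# Data taken from the "Before replacing" figure.
rows = [
    {"user_id": 1, "venue_id": "V-1", "category_id": "C001", "category": "Bar"},
    {"user_id": 1, "venue_id": "V-1", "category_id": "C002", "category": "Bar"},
    {"user_id": 1, "venue_id": "V-1", "category_id": "C001", "category": "Bar"},
    {"user_id": 1, "venue_id": "V-1", "category_id": "C002", "category": "Bar"},
    {"user_id": 1, "venue_id": "V-1", "category_id": "C002", "category": "Park"},
]
cleaned = unify_by_venue(rows)
```

Each field is resolved independently here; resolving the (id, name) pair jointly would be an equally plausible reading.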
Task 1: Data Cleaning (Cont…)
• Replacing sub-category IDs in the Category Id column:
User Id   Venue Id   Venue Category Id   Venue Category
1         V-1        C001                Bar
1         V-2        C002                Bar
1         V-3        C001                Bar
1         V-4        C002                Bar
1         V-5        C002                Bar
Figure: Before replacing
User Id   Venue Id   Venue Category Id   Venue Category
1         V-1        C002                Bar
1         V-2        C002                Bar
1         V-3        C002                Bar
1         V-4        C002                Bar
1         V-5        C002                Bar
Figure: After replacing
16
Task 1: Data Cleaning (Cont…)
• Replacing differing latitude and longitude values of a venue:
Venue Id   Latitude   Longitude
V-1        40         -73
V-1        43         -70
V-1        43         -70
V-1        40         -73
V-1        40         -73
Figure: Before replacing
Venue Id   Latitude   Longitude
V-1        40         -73
V-1        40         -73
V-1        40         -73
V-1        40         -73
V-1        40         -73
Figure: After replacing
17
Task 2: Feature Engineering
• Check-In Counts:
User Id   Venue Id   Check-In Count
1083      V-1        3
1083      V-2        1
1083      V-3        1
1083      V-4        2
1083      V-5        1
Figure: After adding the Check-In Count attribute
18
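The check-in count can be derived by counting how often each (user, venue) pair appears in the raw check-in log. A minimal sketch under that assumption; the function name and the sample log are hypothetical, chosen to reproduce the counts in the figure:

```python
from collections import Counter

def check_in_counts(check_ins):
    """Count check-ins per (user_id, venue_id) pair."""
    return Counter((user_id, venue_id) for user_id, venue_id in check_ins)

# Hypothetical raw log for user 1083: one tuple per check-in event.
check_ins = [
    (1083, "V-1"), (1083, "V-2"), (1083, "V-1"), (1083, "V-4"),
    (1083, "V-3"), (1083, "V-1"), (1083, "V-4"), (1083, "V-5"),
]
counts = check_in_counts(check_ins)
```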
Task 2: Feature Engineering (Cont…)
• Venue Distance from User’s Center:
- First, we find the user’s center point by averaging the latitudes and longitudes of the venues where the user has previously checked in.
- Then, using this center point, we calculate the distance to each venue with the Haversine formula:
$$d = 2r \sin^{-1}\left(\sqrt{\sin^{2}\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos\varphi_1 \cos\varphi_2 \sin^{2}\left(\frac{\lambda_2 - \lambda_1}{2}\right)}\right)$$
Where,
• d is the distance between the two points,
• r is the radius of the sphere,
• φ1, φ2: latitude of point 1 and latitude of point 2, in radians
• λ1, λ2: longitude of point 1 and longitude of point 2, in radians
Reference: https://www.movable-type.co.uk/scripts/latlong.html
19
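The center point and the Haversine distance can be sketched as follows. This is an illustration, not the authors' code: the function names are hypothetical, and the Earth radius of 6371 km is an assumed value for r.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; the slides only call it r

def user_center(points):
    """Average the latitudes and longitudes of a user's check-ins (degrees)."""
    lats, lons = zip(*points)
    return sum(lats) / len(lats), sum(lons) / len(lons)

def haversine_km(lat1, lon1, lat2, lon2, r=EARTH_RADIUS_KM):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # min() guards against rounding pushing sqrt(a) slightly above 1.
    return 2 * r * math.asin(min(1.0, math.sqrt(a)))

center = user_center([(40.0, -73.0), (42.0, -75.0)])
distance = haversine_km(center[0], center[1], 40.0, -73.0)
```

Averaging raw latitudes and longitudes is a reasonable approximation for check-ins confined to one city; it would not hold for points spread across the globe.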
Our Dataset After Feature Engineering
20
1 User ID
2 Venue ID
3 Venue Category ID
4 Venue Category
5 Latitude
6 Longitude
7 Distance From Center
8 Check In Count
Task 2: Clustering
• We used KNN (k-nearest neighbors) as the clustering algorithm.
• First, we compute the similarity between users using Pearson correlation. We also tried cosine similarity, but Pearson correlation gave us better results.
$$sim(u, v) = \frac{\sum_{i \in I_{uv}} (R_{ui} - \bar{R}_u)(R_{vi} - \bar{R}_v)}{\sqrt{\sum_{i \in I_{uv}} (R_{ui} - \bar{R}_u)^2}\,\sqrt{\sum_{i \in I_{uv}} (R_{vi} - \bar{R}_v)^2}}$$
Where:
- R_{ui}, R_{vi} represent the check-in counts of the i-th item for users u and v, respectively.
- \bar{R}_u, \bar{R}_v represent the average check-in counts of users u and v, respectively.
- I_{uv} denotes the set of items checked in to by both user u and user v.
Reference: Collaborative filtering based on user attributes and user ratings for restaurant recommendation
21
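A minimal sketch of the Pearson similarity between two users, each represented as a dictionary from venue to check-in count. Two points are assumptions rather than anything the slides state: the averages are taken over the co-visited venues only, and the similarity is reported as 0.0 when the denominator vanishes. The function name is hypothetical.

```python
import math

def pearson_similarity(counts_u, counts_v):
    """Pearson correlation over the venues both users checked in to."""
    common = counts_u.keys() & counts_v.keys()
    if not common:
        return 0.0
    # Assumption: means are computed over co-visited venues only.
    mean_u = sum(counts_u[i] for i in common) / len(common)
    mean_v = sum(counts_v[i] for i in common) / len(common)
    num = sum((counts_u[i] - mean_u) * (counts_v[i] - mean_v) for i in common)
    var_u = sum((counts_u[i] - mean_u) ** 2 for i in common)
    var_v = sum((counts_v[i] - mean_v) ** 2 for i in common)
    den = math.sqrt(var_u) * math.sqrt(var_v)
    # Assumption: undefined correlation (zero variance) is treated as 0.0.
    return num / den if den else 0.0

u = {"a": 1, "b": 2, "c": 3, "d": 10}
v = {"a": 2, "b": 4, "c": 6}
w = {"a": 3, "b": 2, "c": 1}
```

Here `pearson_similarity(u, v)` is close to 1 (the shared counts rise together) and `pearson_similarity(u, w)` is close to -1.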
Task 2: Clustering (Cont…)
• After computing the similarities, we take the top n nearest neighbors.
• We then use their check-in counts to predict a check-in count for every place the user has not checked in to. The prediction is a weighted average of the neighbors’ check-in counts.
• Finally, we take the places with the highest predicted check-in counts.
22
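The neighbor selection and weighted-average prediction might look like the sketch below. The exact weighting is not given in the slides; this version assumes weights equal to the similarity scores, normalized by the sum of their absolute values, and it ignores neighbors who never visited the venue. All names and the toy data are hypothetical.

```python
def predict_check_in_count(similarities, counts, venue, n=2):
    """Weighted average of the top-n neighbors' counts for one venue.

    similarities: {neighbor: similarity to the target user}
    counts:       {neighbor: {venue: check-in count}}
    Returns None when no top-n neighbor visited the venue.
    """
    top = sorted(similarities, key=similarities.get, reverse=True)[:n]
    num = den = 0.0
    for user in top:
        if venue in counts[user]:
            num += similarities[user] * counts[user][venue]
            den += abs(similarities[user])
    return num / den if den else None

def recommend(similarities, counts, visited, n=2, top_k=3):
    """Rank unvisited venues by predicted check-in count, highest first."""
    candidates = {v for u in similarities for v in counts[u]} - set(visited)
    scored = {v: predict_check_in_count(similarities, counts, v, n)
              for v in candidates}
    scored = {v: s for v, s in scored.items() if s is not None}
    return sorted(scored, key=scored.get, reverse=True)[:top_k]

# Toy data: similarities of three neighbors to the target user.
similarities = {"a": 0.9, "b": 0.6, "c": 0.1}
counts = {"a": {"P1": 4, "P2": 2}, "b": {"P1": 1, "P3": 5}, "c": {"P2": 9}}
ranked = recommend(similarities, counts, visited={"P1"})
```

With n = 2 only neighbors "a" and "b" contribute, so "c" and its high count for P2 are ignored.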
Task 3: Find User Preference
• We take the distance of each of the user’s check-ins from the center point and compute the mean distance. If most of the user’s check-ins are farther away than the mean distance, we say the user likes to travel long distances; otherwise, the user likes to travel short distances. We then sort the recommendations according to this preference.
• Example:
User’s mean check-in distance = 50 km
The user has 50 check-ins.
30 of them are more than 50 km from the center.
Result: the user likes to travel long distances.
23
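The preference rule can be sketched as a majority test against the mean distance, followed by a sort. "Most" is read here as strictly more than half of the check-ins, which is an assumption; the function names and sample values are hypothetical.

```python
def prefers_long_distance(distances_km):
    """True when more than half of the check-ins lie beyond the mean distance."""
    mean = sum(distances_km) / len(distances_km)
    above = sum(1 for d in distances_km if d > mean)
    return above > len(distances_km) / 2

def sort_by_preference(recommendations, likes_far):
    """Order (venue, distance_km) pairs farthest-first or nearest-first."""
    return sorted(recommendations, key=lambda rec: rec[1], reverse=likes_far)

likes_far = prefers_long_distance([10, 60, 60, 60])  # mean 47.5, 3 of 4 above
ordered = sort_by_preference([("A", 5), ("B", 20), ("C", 1)], likes_far)
```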
Example
              Place1   Place2   Place3   Place4
Me            3        -        5        ?
My Friend     4        6        -        -
You           3        -        5        6
Another guy   4        2        -        1
Your Friend   8        -        -        3
What is the likely check-in count for Place4?
24
Example (Cont…)
              Place1   Place2   Place3   Place4
Me            3        -        5        ?
My Friend     4        6        -        -
You           3        -        5        6
Another guy   4        2        -        1
Your Friend   8        -        -        3
25
Example (Cont…)
              Place1   Place2   Place3   Place4
Me            3        -        5        6
My Friend     4        6        -        -
You           3        -        5        6
Another guy   4        2        -        1
Your Friend   8        -        -        3
26
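One way to arrive at the 6 in the last table, worked by hand. This assumes the averages in the Pearson correlation are taken over the co-visited places and that only the single nearest neighbor is used (n = 1); neither choice is stated on the slides. "Me" and "You" share Place1 and Place3 with identical counts. Every other user shares only Place1 with "Me", so their correlation is 0/0 and they are skipped.

```latex
\bar{R}_{\text{Me}} = \bar{R}_{\text{You}} = \frac{3 + 5}{2} = 4

sim(\text{Me}, \text{You})
  = \frac{(3-4)(3-4) + (5-4)(5-4)}
         {\sqrt{(3-4)^2 + (5-4)^2}\,\sqrt{(3-4)^2 + (5-4)^2}}
  = \frac{2}{\sqrt{2}\,\sqrt{2}} = 1

\hat{R}_{\text{Me},\,\text{Place4}} = \frac{1 \cdot 6}{|1|} = 6
```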
Evaluation
• We used sampling and RMSE (root mean square error) to evaluate our recommendations.
• For sampling, 10% of the entire dataset was selected randomly without replacement to form a sample dataset.
• RMSE was used to evaluate the algorithm. It measures the error between the predicted check-in count and the actual check-in count of a venue for a specific user in the test dataset.
RMSE formula:
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} \left(P_{u,i} - R_{u,i}\right)^2}{N}}$$
Here:
P_{u,i} is the predicted check-in count for user u on venue i
R_{u,i} is the actual check-in count for user u on venue i
N is the total number of venues where the user checked in
Reference: Collaborative filtering based on user attributes and user ratings for restaurant recommendation
27
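The two evaluation steps can be sketched with the standard library. The function names, the `seed` parameter, and the rounding of the sample size are assumptions added for illustration.

```python
import math
import random

def sample_without_replacement(records, fraction=0.10, seed=None):
    """Draw a random sample of the given fraction, without replacement."""
    rng = random.Random(seed)
    k = max(1, round(len(records) * fraction))
    return rng.sample(records, k)

def rmse(predicted, actual):
    """Root mean square error between predicted and actual check-in counts."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - r) ** 2 for p, r in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))

sample = sample_without_replacement(list(range(100)), seed=0)
error = rmse([2, 4], [4, 2])  # sqrt((4 + 4) / 2) = 2.0
```

A lower RMSE means the predicted check-in counts sit closer to the actual ones.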
RMSE Graph
28
Figure : RMSE graph
Demo
• We created a simple demo in which a user enters their ID and our system recommends places for that user.
Figure: Input user ID
Figure: Output recommendation
29
Future Work
30
• Try a model-based recommendation system
• Add more domains
• Try a triangulation technique to find the user’s center point
31
