Building a Recommender System in PySpark
Will Johnson
- Uline
- DePaul
LearnByMarketing.com
AGENDA
- RecSys
  * Basics
  * MF
  * Evaluation
  * Advanced
- PySpark
  * Basics
  * ALS
User Based Collaborative Filtering
[Figure: a user-item rating matrix; a target user's missing rating (3.8) is predicted from the ratings of users with similar taste]
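The idea behind user-based collaborative filtering can be sketched in plain NumPy (toy ratings, not the MovieLens data): predict a missing rating as a similarity-weighted average of what similar users gave that item.

```python
import numpy as np

# Toy ratings (0 = unrated); predict user 0's rating for item 2
# from the ratings of the other users.
R = np.array([[4.5, 4.0, 0.0],
              [5.0, 4.5, 3.0],
              [4.0, 4.5, 2.0],
              [1.0, 2.0, 1.5]])

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

target, item = 0, 2
others = [u for u in range(len(R)) if u != target and R[u, item] > 0]
# Similarity is computed on the co-rated items (items 0 and 1 here)
sims = np.array([cosine(R[target, :2], R[u, :2]) for u in others])
ratings = np.array([R[u, item] for u in others])

# Similarity-weighted average of the neighbours' ratings
pred = sims @ ratings / sims.sum()
```

Users with taste close to the target (high cosine similarity) pull the prediction toward their own rating for the item.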
Item Based Collaborative Filtering
Matrix Factorization
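A minimal sketch of the factorization idea, assuming a toy rating matrix and plain gradient descent on the observed entries (Spark's ALS uses a different optimizer; this is only to show that two small factor matrices can fill in the missing cells):

```python
import numpy as np

# Toy user-item rating matrix; 0 marks a missing rating
R = np.array([[4.5, 4.0, 0.0],
              [5.0, 0.0, 3.0],
              [0.0, 2.0, 1.0]])
mask = R > 0

rng = np.random.default_rng(0)
k = 2                                   # number of latent factors
U = rng.normal(scale=0.1, size=(3, k))  # user factors
V = rng.normal(scale=0.1, size=(3, k))  # item factors

# Gradient descent on the squared error over observed entries only
for _ in range(5000):
    err = (R - U @ V.T) * mask
    U += 0.01 * (err @ V)
    V += 0.01 * (err.T @ U)

pred = U @ V.T  # every cell now holds a prediction, including the missing ones
```

The product `U @ V.T` reconstructs the known ratings and, as a by-product, produces a rating estimate for every empty cell.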
Evaluation
RMSE = √( ∑(Predicted − Actual)² / n )
Precision (user u) = |hits_u| / |RecoSet_u|
Recall (user u) = |hits_u| / |TestSet_u|
Expert Review: Novelty, Context
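These metrics are easy to verify on toy numbers (hypothetical predictions and item IDs, plain Python):

```python
from math import sqrt

# RMSE over (predicted, actual) pairs
pairs = [(3.91, 3.0), (3.29, 3.0), (1.09, 1.0)]
rmse = sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))

# Precision / recall for one user:
# hits = recommended items that also appear in the user's test set
reco_set = {242, 302, 377, 51}
test_set = {242, 51, 474}
hits = reco_set & test_set           # {242, 51}
precision = len(hits) / len(reco_set)  # 2/4 = 0.5
recall = len(hits) / len(test_set)     # 2/3
```

Precision asks "of what I recommended, how much did the user like?"; recall asks "of what the user liked, how much did I recommend?".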
CRISP-DM
Data Understanding
movielens = sc.textFile("../in/ml-100k/u.data")
movielens.first()
# u'196\t242\t3\t881250949'
movielens.count()
# 100,000
clean_data = movielens.map(lambda x: x.split('\t'))
rate = clean_data.map(lambda y: int(y[2]))
rate.mean()
# 3.52986
users = clean_data.map(lambda y: int(y[0]))
users.distinct().count()
# 943
clean_data.map(lambda y: int(y[1])).distinct().count()
# 1,682
Data Preparation
from pyspark.mllib.recommendation import ALS, MatrixFactorizationModel, Rating
mls = movielens.map(lambda l: l.split('\t'))
ratings = mls.map(lambda x: Rating(int(x[0]), int(x[1]), float(x[2])))
# e.g. Rating(user=196, product=242, rating=3.0)
train, test = ratings.randomSplit([0.7, 0.3], 7856)
train.count()
# 70,005
test.count()
# 29,995
train.cache()
test.cache()
Modeling
rank = 5 # Latent Factors to be made
numIterations = 10 # Times to repeat process
#Create the model on the training data
model = ALS.train(train, rank, numIterations)
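Conceptually, ALS alternates two closed-form least-squares solves: hold the item factors fixed and solve for the user factors, then swap. A dense NumPy sketch (toy matrix, no missing-entry handling, so illustrative only; Spark's implementation also weights by which entries are observed):

```python
import numpy as np

# Dense toy version of alternating least squares
rng = np.random.default_rng(7856)
R = np.array([[4.0, 3.0, 5.0],
              [5.0, 4.0, 4.0],
              [1.0, 2.0, 1.0]])
rank, lam = 2, 0.1            # latent factors, regularization
U = rng.normal(size=(3, rank))
V = rng.normal(size=(3, rank))

for _ in range(10):           # numIterations
    # Fix V, solve the regularized least-squares problem for U...
    U = R @ V @ np.linalg.inv(V.T @ V + lam * np.eye(rank))
    # ...then fix U and solve for V
    V = R.T @ U @ np.linalg.inv(U.T @ U + lam * np.eye(rank))

pred = U @ V.T                # rank-2 reconstruction of R
```

Each half-step has an exact solution, which is why ALS converges in few iterations and parallelizes well: every user's (and every item's) solve is independent of the others.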
Modeling / Evaluation
model.userFeatures()     # latent factor vectors, one per user
model.productFeatures()  # latent factor vectors, one per product
# For Product X, Find N Users to Sell To
model.recommendUsers(242,100)
# For User Y Find N Products to Promote
model.recommendProducts(196,10)
#Predict Single Product for Single User
model.predict(196, 242)
# Predict for many users and products at once
# Pre-processing: strip down to (user, product) pairs
pred_input = train.map(lambda x: (x[0], x[1]))
# e.g. (196, 242)
# Lots of predictions
pred = model.predictAll(pred_input)
# Returns Rating(user, product, rating)
# e.g. Rating(user=894, product=1560, rating=3.845)
Evaluation
User  Item  Actual  Pred
196   242   3.0     3.91
186   302   3.0     3.29
22    377   1.0     1.09
244   51    2.0     3.66
298   474   4.0     4.11

TRAINING RMSE: 0.763
Evaluation
# Organize the data to make (user, product) the key
true_reorg = train.map(lambda x: ((x[0], x[1]), x[2]))
# e.g. ((196, 242), 3.0)
pred_reorg = pred.map(lambda x: ((x[0], x[1]), x[2]))
# Do the actual join
true_pred = true_reorg.join(pred_reorg)
# e.g. ((582, 1014), (4.0, 3.397))
from math import sqrt
MSE = true_pred.map(lambda r: (r[1][0] - r[1][1])**2).mean()
RMSE = sqrt(MSE)
# Results in 0.7629908117414474
Evaluation
test_input = test.map(lambda x: (x[0], x[1]))
pred_test = model.predictAll(test_input)
test_reorg = test.map(lambda x: ((x[0], x[1]), x[2]))
pred_reorg = pred_test.map(lambda x: ((x[0], x[1]), x[2]))
test_pred = test_reorg.join(pred_reorg)
test_MSE = test_pred.map(lambda r: (r[1][0] - r[1][1])**2).mean()
test_RMSE = sqrt(test_MSE)
# TEST RMSE: 1.0145
CRISP-DM
RECAP
- RecSys models are nearest-neighbor or matrix-factorization based
- ALS is implemented in Spark
rank = 5; numIterations = 10
#Create the model on the training data
model = ALS.train(train, rank, numIterations)
# Lots of Predictions
pred = model.predictAll(pred_input)
#Examine Model Features
model.productFeatures()
# Save your model!
model.save(sc, "../out/ml-model")
# ...and load it back later
same_model = MatrixFactorizationModel.load(sc, "../out/ml-model")
Questions?
LearnByMarketing.com

Recommender Systems with Apache Spark's ALS Function