Efficient Similarity Computation
for Collaborative Filtering in Dynamic Environments
Olivier Jeunen¹, Koen Verstrepen², Bart Goethals¹,²
September 18th, 2019 (ACM RecSys 2019)
¹Adrem Data Lab, University of Antwerp
²Froomle
olivier.jeunen@uantwerp.be
Introduction & Motivation
Setting the scene
We deal with implicit feedback: a set of (user, item, timestamp)-triplets, representing clicks, views, sales, …
Suppose we have a set of pageviews of this form:

u1  i1  t1
u1  i2  t2
u1  i3  t3
u2  i4  t4
u2  i2  t5
u3  i1  t6
u2  i5  t7
u2  i7  t8
u3  i6  t9
. . .
Problem statement
In neighbourhood-based collaborative filtering¹, we need to compute similarity between pairs of items.
Items are represented as sparse, high-dimensional columns in the user-item matrix P (rows: users, columns: items):

0 0 0 . . . 0 1 0
1 0 0 . . . 0 0 1
0 0 0 . . . 1 0 0
0 0 1 . . . 0 0 0
. . . . . . . . . . . . . . . . . . . . .
0 1 0 . . . 0 0 0
0 0 0 . . . 0 1 0
0 1 1 . . . 0 0 0
0 0 0 . . . 1 0 0
1 0 1 . . . 0 1 0

¹ Still a very competitive baseline, but often deemed unscalable.
A need for speed
Typically, the model is periodically recomputed.
For ever-growing datasets, these iterative updates can become very
time-consuming and model recency is often sacrificed.
[Figure: iterative model updates over time. Runtime of each periodic recomputation, plotted against time, with a fixed interval ∆t between updates.]
Previous work
Existing approaches tend to speed up computations through
• Approximation.
• Parallelisation.
• Incremental computation.
But existing exact solutions do not exploit the sparsity that is inherent to implicit-feedback data streams.
Contribution & Methodology
Incremental Similarity Computation
In the binary setting, cosine similarity simplifies to the number of users that have seen both items, divided by the square root of the product of their individual user counts.
cos(i, j) = |Ui ∩ Uj| / √(|Ui| · |Uj|) = Mi,j / √(Ni · Nj)

As such, we can compute these building blocks incrementally instead of recomputing the entire similarity with every update:

N ∈ ℕ^n with Ni = |Ui|, and M ∈ ℕ^(n×n) with Mi,j = |Ui ∩ Uj|.
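A minimal sketch of how the similarity is read off these counters; the function name is ours, and M and N can be any mappings with the semantics above:

```python
import math

def cosine_from_counts(M, N, i, j):
    """Cosine similarity between items i and j, read directly from the
    co-occurrence counts M[i, j] = |U_i ∩ U_j| and item counts N[i] = |U_i|."""
    if N[i] == 0 or N[j] == 0:
        return 0.0
    return M[i, j] / math.sqrt(N[i] * N[j])
```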
Dynamic Index
Existing approaches tend to build inverted indices in a preprocessing step… we do this on-the-fly!
Initialise a simple inverted index for every user, to hold their history.
For every pageview (u, i), do the following (a minimal sketch follows below):
1. Increment the co-occurrence counts of i with the other items seen by u.
2. Update the item’s count.
3. Add the item to the user’s inverted index.
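A minimal Python sketch of this per-pageview update, using plain dictionaries for the counters and inverted indices; the names and data structures are ours, not the paper's reference implementation:

```python
from collections import defaultdict

# Model state: co-occurrence counts M, item counts N, per-user inverted index L.
M = defaultdict(int)   # M[i, j] = |U_i ∩ U_j|, kept symmetric
N = defaultdict(int)   # N[i]    = |U_i|
L = defaultdict(set)   # L[u]    = items already seen by user u

def process_pageview(u, i):
    """Dynamic Index update for a single incoming pageview (u, i)."""
    if i in L[u]:
        return  # repeated (u, i) views are ignored in this binary-setting sketch
    # 1. Increment co-occurrence counts of i with every item u has seen before.
    for j in L[u]:
        M[i, j] += 1
        M[j, i] += 1
    # 2. Update the item's own count.
    N[i] += 1
    # 3. Add the item to the user's inverted index.
    L[u].add(i)
```

Each update only touches the items already in u's history, so the work per pageview grows with the user's history length; this is why sparse data and short histories pay off (cf. the RQ1 observations later on).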
Online Learning
As Dynamic Index consists of a single for-loop over the pageviews,
it can naturally handle streaming data.
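For instance, reusing the process_pageview sketch from above, handling a newly arrived batch is just more iterations of the same loop (illustrative only):

```python
def process_batch(pageviews):
    """Apply Dynamic Index updates for a batch of (user, item) pageviews."""
    for u, i in pageviews:
        process_pageview(u, i)

# The model built up to t_i is kept as-is; a batch ∆P arriving at t_{i+1}
# only costs updates for its own pageviews.
process_batch([("u1", "i1"), ("u1", "i2"), ("u2", "i2")])
```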
[Figure: impact of online learning. Runtime (s) versus |P|, where a new batch |∆P| arrives during the interval ∆t between ti and ti+1.]
Parallelisation Procedure
We adopt a MapReduce-like parallelisation framework:
• Mapping is the Dynamic Index algorithm.
• Reducing two models M = {M, N, L} and M′ = {M′, N′, L′} is:
  1. Summing up M, M′ and N, N′.
  2. Cross-referencing (u, i)-pairs from L[u] with (u, j)-pairs from L′[u] (see the sketch below).
Step 2 is obsolete if M and M′ are computed on disjoint sets of users!
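A rough sketch of such a reduce step, assuming the defaultdict-based model components from the earlier sketch; the paper's implementation may differ in detail:

```python
def reduce_models(model_a, model_b):
    """Merge two partial Dynamic Index models (M, N, L) built on different
    chunks of the pageview data (assumes no (u, i) pair occurs in both)."""
    M_a, N_a, L_a = model_a
    M_b, N_b, L_b = model_b
    # 1. Sum the co-occurrence counts and the item counts.
    for key, count in M_b.items():
        M_a[key] += count
    for item, count in N_b.items():
        N_a[item] += count
    # 2. Cross-reference users present in both chunks: items user u saw in
    #    chunk a co-occur with items the same user saw in chunk b, and
    #    neither mapper has counted those pairs yet.
    for u in set(L_a) & set(L_b):
        for i in L_a[u]:
            for j in L_b[u]:
                M_a[i, j] += 1
                M_a[j, i] += 1
        L_a[u] |= L_b[u]
    # Histories of users that only appear in chunk b are carried over as-is.
    for u in set(L_b) - set(L_a):
        L_a[u] = set(L_b[u])
    return M_a, N_a, L_a
```

If the two chunks cover disjoint sets of users, the intersection in step 2 is empty and the cross-referencing loop is skipped entirely, which is what makes per-user partitioning attractive.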
Recommendability
Often, the set of items that should be considered as recommendations is constrained by recency, stock, licenses, seasonality, … We denote Rt as the set of recommendable items at time t, and argue that it is often much smaller than the full item collection I:

Rt ⊆ I

As such, we only need an up-to-date similarity sim(i, j) if either i or j is recommendable:

i ∈ Rt ∨ j ∈ Rt

To keep up to date with recommendability updates: add a second inverted index for every user (a simplified sketch follows below).
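A deliberately simplified sketch of the update-time check, reusing the state from the earlier sketch; note this is not the full method, which additionally maintains a second inverted index per user to catch up on pairs that only become recommendable later:

```python
def process_pageview_constrained(u, i, recommendable):
    """Dynamic Index update that only refreshes co-occurrence counts for
    pairs involving at least one currently recommendable item.
    `recommendable` stands for the set R_t at the current time (our naming)."""
    if i in L[u]:
        return
    for j in L[u]:
        # Only pairs with i ∈ R_t or j ∈ R_t need an up-to-date similarity.
        if i in recommendable or j in recommendable:
            M[i, j] += 1
            M[j, i] += 1
    N[i] += 1
    L[u].add(i)
```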
Experimental Results
Datasets
Table 1: Experimental dataset characteristics.
                            Movielens*   Netflix*    News      Outbrain
# “events”                  20e6         100e6       96e6      200e6
# users                     138e3        480e3       5e6       113e6
# items                     27e3         18e3        297e3     1e6
mean items per user         144.41       209.25      18.29     1.76
mean users per item         747.84       5654.50     242.51    184.50
sparsity user-item matrix   99.46%       98.82%      99.99%    99.99%
sparsity item-item matrix   59.90%       0.22%       99.83%    99.98%
RQ1: Are we more efficient than the baselines?
[Figure: runtime (s) versus number of pageviews |P| on Movielens, Netflix, News and Outbrain, comparing the Sparse Baseline with Dynamic Index.]
RQ1: Are we more efficient than the baselines?
Observations
• More efficient if M is sparse
• More efficient if users have shorter histories
• Average number of processed interactions per second
ranges from 14 500 to 834 000
RQ2: How effective is parallelisation?
[Figure: runtime (s) versus |P| on Movielens, Netflix, News and Outbrain for n = 1, 2, 4 and 8 cores.]
RQ2: How effective is parallelisation?
Observations
• Speedup factor of > 4 for Netflix and News datasets with 8 cores
• Incremental updates complicate reduce procedure:
• For sufficiently large batches, performance gains are tangible.
• For small batches, single-core updates are preferred.
RQ3: What is the effect of constrained recommendability?
[Figure: runtime (s) and number of recommendable items |Rt| over time (h) on the News dataset (n = 8), for δ ∈ {6h, 12h, 18h, 24h, 48h, 96h, 168h, ∞}.]
RQ3: What is the effect of constrained recommendability?
Observations
• Clear efficiency gains for lower values of δ:
  • δ = 48h needs < 10% of the runtime required without restrictions.
  • δ = 24h needs < 5%.
  • δ = 6h needs only 1.6%.
• Slope of increasing runtime with more data is flattened,
improving scalability.
Conclusion & Future Work
Conclusion
We introduce Dynamic Index, which:
• is faster than the state-of-the-art in exact similarity computation for sparse and high-dimensional data.
• computes incrementally by design.
• is easily parallelisable.
• naturally handles and exploits recommendability of items.
Questions?
Source code is available:
Academics hire too!
PhD students + Post-docs
Future Work
• More advanced similarity measures:
  • The Jaccard index, Pointwise Mutual Information (PMI), Pearson correlation, … are all dependent on the co-occurrence matrix M (a small sketch follows below).
• Beyond item-item collaborative filtering:
  • With relatively straightforward extensions (e.g. including a value in the inverted indices to allow for non-binary data), we can tackle more general Information Retrieval use-cases.
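As an illustration (not from the paper's code), the Jaccard index can be read off the same counters M and N:

```python
def jaccard_from_counts(M, N, i, j):
    """Jaccard index |U_i ∩ U_j| / |U_i ∪ U_j|, using the same building
    blocks: M[i, j] = |U_i ∩ U_j| and N[i] = |U_i|."""
    union = N[i] + N[j] - M[i, j]
    return M[i, j] / union if union > 0 else 0.0
```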
