Efficient Similarity Computation
for Collaborative Filtering in Dynamic Environments
Olivier Jeunen¹, Koen Verstrepen², Bart Goethals¹,²
September 18th, 2019 (ACM RecSys 2019)
¹Adrem Data Lab, University of Antwerp
²Froomle
olivier.jeunen@uantwerp.be
Introduction & Motivation
Setting the scene
We deal with implicit feedback: a set of (user, item, timestamp)-triplets, representing clicks, views, sales, …
Suppose we have a set of pageviews of this form:

u1  i1  t1
u1  i2  t2
u1  i3  t3
u2  i4  t4
u2  i2  t5
u3  i1  t6
u2  i5  t7
u2  i7  t8
u3  i6  t9
. . .
Problem statement
In neighbourhood-based collaborative filtering¹, we need to compute similarity between pairs of items.
Items are represented as sparse, high-dimensional columns in the user-item matrix P (rows: users, columns: items):

0 0 0 . . . 0 1 0
1 0 0 . . . 0 0 1
0 0 0 . . . 1 0 0
0 0 1 . . . 0 0 0
. . . . . . . . . . . . . . . . . . . . .
0 1 0 . . . 0 0 0
0 0 0 . . . 0 1 0
0 1 1 . . . 0 0 0
0 0 0 . . . 1 0 0
1 0 1 . . . 0 1 0

¹ Still a very competitive baseline, but often deemed unscalable.
A need for speed
Typically, the model is periodically recomputed.
For ever-growing datasets, these iterative updates can become very
time-consuming and model recency is often sacrificed.
[Figure: iterative model updates over time. Runtime of each periodic recomputation, plotted against time, with a fixed interval ∆t between updates.]
Previous work
Existing approaches tend to speed up computations through
• Approximation.
• Parallelisation.
• Incremental computation.
But existing exact solutions do not exploit the sparsity that is inherent to implicit-feedback data streams.
Contribution & Methodology
Incremental Similarity Computation
In the binary setting, cosine similarity simplifies to the number of users that have seen both items, divided by the square root of the product of their individual user counts.
cos(i, j) = |Ui ∩ Uj| / √(|Ui| · |Uj|) = Mi,j / √(Ni · Nj)

As such, we can compute these building blocks incrementally instead of recomputing the entire similarity with every update:

N ∈ ℕ^n with Ni = |Ui|, and M ∈ ℕ^(n×n) with Mi,j = |Ui ∩ Uj|.
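A minimal sketch of how the similarity is read off these counters; the function name is ours, and M and N can be any mappings with the semantics above:

```python
import math

def cosine_from_counts(M, N, i, j):
    """Cosine similarity between items i and j, read directly from the
    co-occurrence counts M[i, j] = |U_i ∩ U_j| and item counts N[i] = |U_i|."""
    if N[i] == 0 or N[j] == 0:
        return 0.0
    return M[i, j] / math.sqrt(N[i] * N[j])
```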
Dynamic Index
Existing approaches tend to build inverted indices in a preprocessing step… we do this on-the-fly!
Initialise a simple inverted index for every user, to hold their history.
For every pageview (u, i), do the following (a minimal sketch follows below):
1. Increment the co-occurrence counts of i with the other items seen by u.
2. Update the item’s count.
3. Add the item to the user’s inverted index.
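A minimal Python sketch of this per-pageview update, using plain dictionaries for the counters and inverted indices; the names and data structures are ours, not the paper's reference implementation:

```python
from collections import defaultdict

# Model state: co-occurrence counts M, item counts N, per-user inverted index L.
M = defaultdict(int)   # M[i, j] = |U_i ∩ U_j|, kept symmetric
N = defaultdict(int)   # N[i]    = |U_i|
L = defaultdict(set)   # L[u]    = items already seen by user u

def process_pageview(u, i):
    """Dynamic Index update for a single incoming pageview (u, i)."""
    if i in L[u]:
        return  # repeated (u, i) views are ignored in this binary-setting sketch
    # 1. Increment co-occurrence counts of i with every item u has seen before.
    for j in L[u]:
        M[i, j] += 1
        M[j, i] += 1
    # 2. Update the item's own count.
    N[i] += 1
    # 3. Add the item to the user's inverted index.
    L[u].add(i)
```

Each update only touches the items already in u's history, so the work per pageview grows with the user's history length; this is why sparse data and short histories pay off (cf. the RQ1 observations later on).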
Online Learning
As Dynamic Index consists of a single for-loop over the pageviews,
it can naturally handle streaming data.
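For instance, reusing the process_pageview sketch from above, handling a newly arrived batch is just more iterations of the same loop (illustrative only):

```python
def process_batch(pageviews):
    """Apply Dynamic Index updates for a batch of (user, item) pageviews."""
    for u, i in pageviews:
        process_pageview(u, i)

# The model built up to t_i is kept as-is; a batch ∆P arriving at t_{i+1}
# only costs updates for its own pageviews.
process_batch([("u1", "i1"), ("u1", "i2"), ("u2", "i2")])
```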
[Figure: impact of online learning. Runtime (s) versus |P|, where a new batch |∆P| arrives during the interval ∆t between ti and ti+1.]
Parallelisation Procedure
We adopt a MapReduce-like parallelisation framework:
• Mapping is the Dynamic Index algorithm.
• Reducing two models M = {M, N, L} and M′ = {M′, N′, L′} is:
  1. Summing up M, M′ and N, N′.
  2. Cross-referencing (u, i)-pairs from L[u] with (u, j)-pairs from L′[u] (see the sketch below).
Step 2 is obsolete if M and M′ are computed on disjoint sets of users!
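A rough sketch of such a reduce step, assuming the defaultdict-based model components from the earlier sketch; the paper's implementation may differ in detail:

```python
def reduce_models(model_a, model_b):
    """Merge two partial Dynamic Index models (M, N, L) built on different
    chunks of the pageview data (assumes no (u, i) pair occurs in both)."""
    M_a, N_a, L_a = model_a
    M_b, N_b, L_b = model_b
    # 1. Sum the co-occurrence counts and the item counts.
    for key, count in M_b.items():
        M_a[key] += count
    for item, count in N_b.items():
        N_a[item] += count
    # 2. Cross-reference users present in both chunks: items user u saw in
    #    chunk a co-occur with items the same user saw in chunk b, and
    #    neither mapper has counted those pairs yet.
    for u in set(L_a) & set(L_b):
        for i in L_a[u]:
            for j in L_b[u]:
                M_a[i, j] += 1
                M_a[j, i] += 1
        L_a[u] |= L_b[u]
    # Histories of users that only appear in chunk b are carried over as-is.
    for u in set(L_b) - set(L_a):
        L_a[u] = set(L_b[u])
    return M_a, N_a, L_a
```

If the two chunks cover disjoint sets of users, the intersection in step 2 is empty and the cross-referencing loop is skipped entirely, which is what makes per-user partitioning attractive.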
Recommendability
Often, the set of items that should be considered as recommendations is constrained by recency, stock, licenses, seasonality, … We denote Rt as the set of recommendable items at time t, and argue that it is often much smaller than the full item collection I:

Rt ⊆ I

As such, we only need an up-to-date similarity sim(i, j) if either i or j is recommendable:

i ∈ Rt ∨ j ∈ Rt

To keep up to date with recommendability updates: add a second inverted index for every user (a simplified sketch follows below).
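A deliberately simplified sketch of the update-time check, reusing the state from the earlier sketch; note this is not the full method, which additionally maintains a second inverted index per user to catch up on pairs that only become recommendable later:

```python
def process_pageview_constrained(u, i, recommendable):
    """Dynamic Index update that only refreshes co-occurrence counts for
    pairs involving at least one currently recommendable item.
    `recommendable` stands for the set R_t at the current time (our naming)."""
    if i in L[u]:
        return
    for j in L[u]:
        # Only pairs with i ∈ R_t or j ∈ R_t need an up-to-date similarity.
        if i in recommendable or j in recommendable:
            M[i, j] += 1
            M[j, i] += 1
    N[i] += 1
    L[u].add(i)
```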
Experimental Results
Datasets
Table 1: Experimental dataset characteristics.
                            Movielens*   Netflix*    News      Outbrain
# “events”                  20e6         100e6       96e6      200e6
# users                     138e3        480e3       5e6       113e6
# items                     27e3         18e3        297e3     1e6
mean items per user         144.41       209.25      18.29     1.76
mean users per item         747.84       5654.50     242.51    184.50
sparsity user-item matrix   99.46%       98.82%      99.99%    99.99%
sparsity item-item matrix   59.90%       0.22%       99.83%    99.98%
RQ1: Are we more efficient than the baselines?
[Figure: runtime (s) versus number of pageviews |P| on Movielens, Netflix, News and Outbrain, comparing the Sparse Baseline with Dynamic Index.]
RQ1: Are we more efficient than the baselines?
Observations
• More efficient if M is sparse
• More efficient if users have shorter histories
• Average number of processed interactions per second
ranges from 14 500 to 834 000
RQ2: How effective is parallelisation?
[Figure: runtime (s) versus |P| on Movielens, Netflix, News and Outbrain for n = 1, 2, 4 and 8 cores.]
RQ2: How effective is parallelisation?
Observations
• Speedup factor of > 4 for Netflix and News datasets with 8 cores
• Incremental updates complicate reduce procedure:
• For sufficiently large batches, performance gains are tangible.
• For small batches, single-core updates are preferred.
RQ3: What is the effect of constrained recommendability?
[Figure: runtime (s) and number of recommendable items |Rt| over time (h) on the News dataset (n = 8), for δ ∈ {6h, 12h, 18h, 24h, 48h, 96h, 168h, ∞}.]
RQ3: What is the effect of constrained recommendability?
Observations
• Clear efficiency gains for lower values of δ:
  • δ = 48h needs < 10% of the runtime required without restrictions.
  • δ = 24h needs < 5%.
  • δ = 6h needs only 1.6%.
• Slope of increasing runtime with more data is flattened,
improving scalability.
Conclusion & Future Work
Conclusion
We introduce Dynamic Index, which:
• is faster than the state-of-the-art in exact similarity computation for sparse and high-dimensional data.
• computes incrementally by design.
• is easily parallelisable.
• naturally handles and exploits recommendability of items.
Questions?
Source code is available:
Academics hire too!
PhD students + Post-docs
Future Work
• More advanced similarity measures:
  • The Jaccard index, Pointwise Mutual Information (PMI), Pearson correlation, … are all dependent on the co-occurrence matrix M (a small sketch follows below).
• Beyond item-item collaborative filtering:
  • With relatively straightforward extensions (e.g. including a value in the inverted indices to allow for non-binary data), we can tackle more general Information Retrieval use-cases.
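As an illustration (not from the paper's code), the Jaccard index can be read off the same counters M and N:

```python
def jaccard_from_counts(M, N, i, j):
    """Jaccard index |U_i ∩ U_j| / |U_i ∪ U_j|, using the same building
    blocks: M[i, j] = |U_i ∩ U_j| and N[i] = |U_i|."""
    union = N[i] + N[j] - M[i, j]
    return M[i, j] / union if union > 0 else 0.0
```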
