Machine Learning for Search at LinkedIn

Viet Ha-Thuc
Search Quality - LinkedIn

• 200+ countries and territories
• 2+ new members per second

● Dual Roles of Search
○ Enable talent to discover opportunities
○ Help companies search for the right talent

FLAGSHIP SEARCH
RECRUITER SEARCH
SALES NAVIGATOR

Unique Nature of LinkedIn Search
▪ Heterogeneous sources
– People, jobs, companies, slideshares, members’ posts, groups
▪ Scale
▪ Deep Personalization
▪ Support many use-cases
– Hiring, connecting, job seeking, research, sales, etc.

Overview
[Diagram: search pipeline. Components shown: Query, Spell Correction, Query Tagging (Name, Title, Skill), Federated Search over verticals (People, Companies, Jobs), Blending]

Agenda
▪ Introduction
▪ Vertical Ranking
–People Search by Skills [BigData’15,SIGIR’16]
–Job Search [KDD’16]
▪ Federation [CIKM’15]
▪ Lessons

Introduction
▪ Skills
– 40K+ standardized skills
– Members get endorsed on skills
– Represent professional expertise

Introduction
▪ Unique challenges of LinkedIn expertise search
– Scale: 400M members x 40K standardized skills
– Sparsity of skills in profiles
– Personalization
…

Reputation
Information a decision maker uses to make a judgment on an entity with a record (*)
(*) “Building Web Reputation Systems”, Glass and Farmer, 2010

Skill Reputation Scores [BigData’15]
▪ Decision Maker: searcher
▪ Record: professional career
▪ Skill reputation: member expertise on a skill
▪ Judgment: Hire?

Estimating Skill Reputation
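Inputs: endorsements, profile, browsemap -> supervised learning algorithm -> P(expert | member, skill)
Sparse Members x Skills matrix of estimated scores (rows: members, columns: skills; “?” = unknown):
?    .85  .45
?    ?    .35
?    .42  ?
?    ?    .05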
Matrix Factorization
Sparse Members x Skills matrix (rows: members, columns: skills; “?” = unknown):
?    .85  .45
?    ?    .35
?    .42  ?
?    ?    .05
Member factors (each row is a representation of a member in latent space):
0.5  1
0.7  0
0    0.6
0.1  0
Skill factors (each column represents a skill in latent space):
0.2  0.3  0.5
0.5  0.7  0.2

Fill in unknown cells in the original matrix
Product of the member factors and the skill factors (rows: members, columns: skills):
.6   .85  .45
.14  .21  .35
.3   .42  .12
.02  .03  .05
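
The fill-in step can be checked with a few lines of Python. This is only a sketch: the factor values are copied from the slide, the helper name `matmul` is illustrative, and in practice the factors would be learned from the observed cells rather than given.

```python
# Latent factors copied from the slide: 4 members x 2 dims, 2 dims x 3 skills.
member_factors = [[0.5, 1.0], [0.7, 0.0], [0.0, 0.6], [0.1, 0.0]]
skill_factors = [[0.2, 0.3, 0.5], [0.5, 0.7, 0.2]]

# Observed cells of the sparse matrix (None marks an unknown cell).
observed = [
    [None, 0.85, 0.45],
    [None, None, 0.35],
    [None, 0.42, None],
    [None, None, 0.05],
]

def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

# Every cell of the product is a predicted P(expert | member, skill).
filled = [[round(x, 2) for x in row]
          for row in matmul(member_factors, skill_factors)]

for row in filled:
    print(row)
```

The product agrees with every observed cell and supplies a score for each cell that was unknown.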

Features
▪ Reputation feature
▪ Social Connection
▪ Homophily
– Geo
– Industry
▪ Textual Features

Learning to Rank
▪ Listwise
– Treat relevance as relative to each query
– Allow optimizing quality metric directly
▪ Objective function
– Normalized Discounted Cumulative Gain (NDCG@K)
– Graded relevance labels
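
A minimal sketch of NDCG@K over graded labels. The slides do not say which gain and discount are used; this assumes the common exponential gain (2^label - 1) and log2 position discount, and the function names are illustrative.

```python
import math

def dcg_at_k(labels, k):
    """Discounted cumulative gain over the first k graded labels."""
    return sum((2 ** rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(labels[:k]))

def ndcg_at_k(labels, k):
    """DCG@k normalized by the DCG@k of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(labels, k) / ideal if ideal > 0 else 0.0

# Graded labels of one query's results, in ranked order (toy values).
ranked_labels = [1, 3, 0, 1]
print(ndcg_at_k(ranked_labels, k=3))
```

A ranking that already places the highest labels first scores 1.0; any other ordering of the same results scores lower.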

Labeling Strategy
▪ Logs + Top-K randomization
– Perfect (InMail): label = 3
– Good (click): label = 1
– Bad: label = 0
– Uncertain: removed
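
A sketch of how logged actions might be turned into these graded labels. The label values come from the slide; the rule for separating "bad" from "uncertain" is an assumption made here (skipped results above the last interaction are bad, everything below it is uncertain), and the action names are hypothetical. Top-K randomization is not shown.

```python
LABELS = {"inmail": 3, "click": 1}  # graded labels from the slide

def label_results(actions):
    """Map one search's logged actions (top to bottom) to graded labels.

    `actions` holds "inmail", "click", or None per result position.
    Returns (position, label) pairs; uncertain results are left out.
    """
    interacted = [i for i, a in enumerate(actions) if a in LABELS]
    if not interacted:
        return []  # assumption: with no interaction, every result is uncertain
    last = interacted[-1]
    labeled = []
    for pos, action in enumerate(actions[: last + 1]):
        # Assumption: skipped results above the last interaction are bad (0).
        labeled.append((pos, LABELS.get(action, 0)))
    return labeled

print(label_results([None, "click", None, "inmail", None, None]))
```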

Experiments
            CTR@10    # Messages per Search
Flagship    +11%      +20%
Premium     +18%      +37%
▪ Query Tagging
▪ Target segment: queries with a skill and no name
▪ Baseline
– No skill reputation feature
– Hand-tuned

Challenges of Job Search
▪ “Hidden” structures
–“San Francisco”, “software engineer”, “java”
▪ Query only represents a small fraction of information need
▪ Job attractiveness varies along many aspects
–“Hot” titles: “data scientist”
–Top companies: Google, Facebook, etc.
–Trending skills: machine learning, big data, etc.
–Location

Expertise Homophily
▪ “Classic” homophily in social networks
–People tend to interact with similar ones
▪ Expertise homophily in job search
–Searcher tends to apply for jobs with similar expertise
–Apply rate of job results with overlapping skills is 2x higher
▪ Expertise: skill reputation scores
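
One way an expertise-homophily feature could be computed, as a sketch only. The slides do not give the formula; this assumes the feature is the searcher's average skill reputation score over the skills a job lists, and the function name and scores are hypothetical.

```python
def expertise_overlap(searcher_skills, job_skills):
    """Average searcher reputation over the job's skills (0 for missing skills)."""
    if not job_skills:
        return 0.0
    return sum(searcher_skills.get(s, 0.0) for s in job_skills) / len(job_skills)

# Toy skill reputation scores for one searcher and the skills of one job.
searcher = {"java": 0.85, "machine learning": 0.42}
job = ["java", "hadoop"]
print(expertise_overlap(searcher, job))
```

Jobs whose skills the searcher has a strong reputation in get a higher feature value, which a ranker can use alongside text match.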

Entity-faceted CTRs
▪ Job attractiveness
– Historical CTRs for individual jobs
– Challenge: job lifetime is short -> unreliable estimation
▪ Entity-faceted historical CTRs
– CTRs of jobs with standardized title “data scientist”
– CTRs of jobs from company IBM
– CTRs of jobs requiring trending skill: machine learning, big data, etc.
▪ Advantages
– Alleviate data sparseness by grouping jobs by facets
– Resolve cold start problem
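
A sketch of entity-faceted CTRs computed from an impression log. The log schema, field names, and values are made up for illustration; only the idea of grouping jobs by a facet such as title or company comes from the slide.

```python
from collections import defaultdict

def faceted_ctrs(impressions, facet):
    """Aggregate clicks / impressions over all jobs sharing a facet value."""
    clicks = defaultdict(int)
    views = defaultdict(int)
    for imp in impressions:
        key = imp[facet]
        views[key] += 1
        clicks[key] += imp["clicked"]
    return {key: clicks[key] / views[key] for key in views}

# Toy impression log: one row per job impression (hypothetical data).
log = [
    {"job_id": 1, "title": "data scientist", "company": "company_a", "clicked": 1},
    {"job_id": 2, "title": "data scientist", "company": "company_a", "clicked": 0},
    {"job_id": 3, "title": "data scientist", "company": "company_b", "clicked": 1},
    {"job_id": 4, "title": "software engineer", "company": "company_b", "clicked": 0},
]
title_ctr = faceted_ctrs(log, "title")
company_ctr = faceted_ctrs(log, "company")
print(title_ctr, company_ctr)
```

A newly posted job has no history of its own, but it can inherit the CTR of its title or company facet, which is how grouping addresses cold start.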

Experiment Results
▪ Baseline
– All of the existing features except entity-aware ones
– Machine learned
– Optimized for the same objective function
               CTR       Apply Rate
Improvement    +11.3%    +5.3%

Personalized Blending
▪ Why do we need this?
– Not to overwhelm the user with too much information
– Make results personally relevant

Learning Model
▪ Training data: click logs
▪ Features
– Relevance scores from base rankers
– Searcher intent
– Query intent
– Prior scores

Calibrate Scores across Verticals
▪ Relevance scores from vertical rankers are incomparable
▪ Construct composite features
$$
f_1 = \begin{cases} \text{People relevance score}, & \text{if result is People} \\ 0, & \text{otherwise} \end{cases}
$$
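
A sketch of composite score features, assuming one feature per vertical that carries the base ranker's score only when the result comes from that vertical. The vertical list and feature names are illustrative, not taken from the slides.

```python
VERTICALS = ["people", "jobs", "companies"]

def composite_score_features(result_vertical, relevance_score):
    """One feature per vertical: the score if the result is from it, else 0."""
    return {
        f"{v}_relevance": relevance_score if v == result_vertical else 0.0
        for v in VERTICALS
    }

print(composite_score_features("people", 0.8))
```

Because each vertical's score lands in its own feature, a blending model can learn a separate weight per vertical instead of comparing raw scores directly.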

Searcher Intent
Searcher’s job seeking intent if result is a job vertical cluster
Searcher’s job seeking intent if result is an individual job
Searcher’s recruiting intent if result is a people vertical cluster
Searcher’s recruiting intent if result is an individual person
...
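
The intent features above can be sketched the same way: each searcher intent is crossed with a result type and is non-zero only for that type. The type names, the intent-to-type mapping, and the intent scores below are assumptions for illustration; the slide lists only the four combinations shown and elides the rest.

```python
RESULT_TYPES = ["job_cluster", "individual_job", "people_cluster", "individual_person"]
INTENT_FOR_TYPE = {
    "job_cluster": "job_seeking",
    "individual_job": "job_seeking",
    "people_cluster": "recruiting",
    "individual_person": "recruiting",
}

def intent_features(result_type, searcher_intents):
    """Cross searcher intent with result type: active only for the matching type."""
    return {
        f"{INTENT_FOR_TYPE[t]}_x_{t}": (
            searcher_intents[INTENT_FOR_TYPE[t]] if t == result_type else 0.0
        )
        for t in RESULT_TYPES
    }

# Hypothetical intent scores for one searcher.
features = intent_features("individual_job", {"job_seeking": 0.9, "recruiting": 0.1})
print(features)
```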

Take-Aways
▪ Text match is still important but not enough
▪ Advanced features based on semi-structured data
– People search: skill reputation scores
– Job Search: expertise homophily
▪ Personalized Learning-to-Rank is crucial

Email: vhathuc@linkedin.com

References
▪ “Personalized Expertise Search at LinkedIn”, Ha-Thuc, Venkataraman, Rodriguez, Sinha, Sundaram and Guo, BigData, 2015
▪ “Personalized Federated Search at LinkedIn”, Arya, Ha-Thuc and Sinha, CIKM, 2015
▪ “Learning to Rank Personalized Search Results in Professional Networks”, Ha-Thuc and Sinha, SIGIR, 2016
▪ “How to Get Them a Dream Job?”, Li, Arya, Ha-Thuc and Sinha, KDD, 2016
