Talos: Neutralizing Vulnerabilities with Security Workarounds for Rapid Response
Zhen Huang, Mariana D’Angelo, Dhaval Miyani, David Lie
Department of Electrical and Computer Engineering
University of Toronto
Drawbacks of Patching
• Patching is the usual way to fix a vulnerability.
• There often exists a delay between the discovery of a
vulnerability and the release of its patch, a pre-patch
window.
Timeline: Discover a vulnerability -> Release the patch -> Apply the patch
During the pre-patch window, attackers can exploit the vulnerability!
Pre-patch Window
• Our study on 131 recent vulnerabilities shows that
the delay is significant.
– five popular Linux server applications
• 33.3% of them were patched more than 30 days after their
discovery. A recent study indicates a similar result [1].
[1] “A large scale exploratory analysis of software vulnerability life cycles”, ICSE 2012
52 days of delay on average!
Cause of Pre-patch Window
• We study bug reports to understand the time spent on
each step of releasing a patch.
• The complexity of constructing a correct patch is the
major cause.
– We found bug reports for 21 of the 131 vulnerabilities: for those
that took more than one day to patch, 89% of the time was spent
constructing the patch.
– 9 of them took between two and six attempts to patch correctly.
Steps of releasing a patch: vulnerability triage -> constructing a patch -> regression testing
Multiple attempts of patching (Quotes from a bug report)
The developer: “This updates the previous patch...”
....
The developer: “This patch builds on the previous one...”
....
....
The tester: “I’m afraid I found a bug...”
Configuration Workarounds
• To address the pre-patch window, users often resort
to configuration workarounds.
– leverage existing configuration settings to neutralize
vulnerabilities
[Figure: apache HTTP server with its status module [2]. Without the workaround, a malicious request reaches sensitive data; with the workaround, the request is rejected.]
[2] CVE-2014-0226. Workaround disclosed on mail-archives.apache.org.
Weakness of Configuration Workarounds
• However, configuration workarounds have poor
coverage.
• Our study on 182 vulnerabilities indicates that only
25.2% of them have configuration workarounds.
– four Linux server applications and two Windows client
applications (IE and Office)
The vast majority of vulnerabilities do not have
configuration workarounds!
Security Workarounds for Rapid Response (SWRR)
• SWRRs address the drawbacks of patching and
configuration workarounds.
• Objectives of SWRR:
– security: neutralize vulnerabilities rapidly without
introducing new bugs or vulnerabilities
– coverage: cover many more vulnerabilities than
configuration workarounds
– low cost: apply to existing applications with minimum
engineering effort
Example of an SWRR
• An SWRR neutralizes a vulnerability by disabling the
execution of vulnerable code.
• The mechanism is simple but effective.
Original:
int foo(...) {
    ....
    // vulnerable code
    ....
}
With SWRR:
int foo(...) {
    return error_code; // SWRR
    ....
    // vulnerable code
    ....
}
SWRR Deployment
Developers can choose between two deployment modes.
1. In-place SWRRs
– pre-installed into an application
– deactivated by default
– users can activate them on the fly
– can cause runtime overhead
2. Patch-based SWRRs
– issued after vulnerabilities are discovered
– users need to install them
– no runtime overhead
[Figure legend: unprotected, protected by SWRR, vulnerable]
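A minimal sketch of how an in-place SWRR might look, assuming a per-function activation flag that is checked at function entry. The names `swrr_active`, `swrr_set`, and `parse_header` are hypothetical and are not taken from Talos; the check on every call is what would account for the runtime overhead of this mode.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical SWRR identifiers, one per instrumented function. */
enum { SWRR_PARSE_HEADER, SWRR_COUNT };

/* In-place SWRRs ship with the application and start deactivated. */
static bool swrr_active[SWRR_COUNT];

/* Hypothetical control hook so a user can toggle an SWRR on the fly. */
void swrr_set(int id, bool on) {
    if (id >= 0 && id < SWRR_COUNT)
        swrr_active[id] = on;
}

/* Hypothetical function: returns the header length, or -1 on error. */
int parse_header(const char *buf) {
    if (swrr_active[SWRR_PARSE_HEADER])
        return -1;              /* SWRR: reuse the existing error code */
    if (buf == NULL)
        return -1;              /* pre-existing error handling */
    int len = 0;
    while (buf[len] != '\0')    /* stands in for the vulnerable code */
        len++;
    return len;
}
```

Under the same assumptions, a patch-based SWRR would compile in an unconditional `return -1;` instead, which removes the per-call flag check.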
SWRR Reduces Pre-patch Window
• Different approaches to addressing a vulnerability:
– Full Patch
– In-place SWRR
– Patch-based SWRR
[Figure: timeline of the steps in each approach] SWRR eliminates these steps!
Challenges of SWRR
• How to disable code execution safely?
– Applications should continue to run with minimal
loss of functionality.
– An SWRR should be unobtrusive, i.e. it should not cause
loss of major functionality.
• How to minimize human effort in generating SWRRs?
Error-Handling Mechanism
• The existing error-handling mechanism can be leveraged
to address the challenges.
• Readily available
• Designed for unexpected situations
• Can be identified using static code analysis
Example (lighttpd web server):
int http_request_parse(...) {
    if (0 != request_check_hostname(...)) {
        return 0; // error-handling
    }
    ....
}
int request_check_hostname(...) {
    if (invalid_hostname)
        return -1; // error-handling
    ....
}
Leverage Existing Error Code
• The error code used by SWRR must be recognized by
the application.
Example (lighttpd web server):
unsigned char* base64_decode(...) {
    return 0; // SWRR
    // vulnerable code
    ....
}
int http_auth_basic_check(...) {
    if (!base64_decode(...)) {
        return 0; // error-handling
    }
    ....
}
Identify Existing Error Code
• Possible approaches to identifying error codes:
– Documentation: common libraries or API functions are
documented, but most code in an application is not.
– Annotation: asking developers to annotate the error code of
each function is tedious and time-consuming.
• Instead, we use heuristics to identify error codes via
static analysis.
Using Heuristics
[Figure: the error-logging heuristic and the NULL return heuristic produce a list of functions that return error code. Propagating error code via info on call chains yields an augmented list of functions that return error code.]
Evaluation
• Our prototype, Talos, mechanically generates and
instruments SWRRs into an application.
• Security, coverage, and overhead of SWRRs are evaluated
using five popular Linux applications.
– web servers: apache and lighttpd
– web cache/proxy: squid
– ftp server: proftpd
– database management: sqlite
Security
• Do SWRRs successfully neutralize vulnerabilities?
• Are SWRRs unobtrusive, i.e. not causing loss of
major functionality?
• We analyze effectiveness and unobtrusiveness
of SWRR for 11 real-world vulnerabilities.
– All vulnerabilities are successfully neutralized by
SWRRs.
– 8 SWRRs are unobtrusive.
Detailed analysis of each vulnerability
and its SWRR is presented in our
paper.
Coverage
• What is the percentage of vulnerabilities that can be
neutralized with an unobtrusive SWRR?
• We estimate coverage of vulnerabilities using coverage of
application code, and tested 320 SWRRs.
[Figure: bar chart of coverage (0% to 80%) for SWRR and Configuration Workaround, split into Obtrusive and Unobtrusive]
2.1x of configuration workarounds!
Overhead
• We measure the increased code size and runtime
overhead for in-place SWRRs.
• On average, Talos increases code size by 2% and causes an
application to incur a 1.3% runtime overhead.
Conclusion
• SWRRs can neutralize 53% of potential vulnerabilities
unobtrusively, which is 2.1x of configuration
workarounds.
• SWRRs can be used just like configuration
workarounds with a small 1.3% runtime overhead.
• Talos mechanically generates and instruments
SWRRs into existing applications, requiring minimal
developer effort.
Thank You!
contact: z.huang@mail.utoronto.ca
Error-logging Heuristic
• We note that error-handling code often logs the
error that occurred.
• Look for a call to an error-logging function
followed by the return of a constant.
Example (apache web server):
if (name == NULL) {
    // apache's error logging function
    ap_log_error(...., "Internal Error....");
    // indicate error to caller
    return APR_EBADF;
}
Developers annotate where the error-logging functions are declared.
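The error-logging heuristic can be sketched as a scan over a simplified statement list, assuming the function body has already been lowered into the hypothetical `struct stmt` form below. This is an illustration of the pattern "logger call followed by a constant return", not the analysis Talos actually implements; the constant 9 is a placeholder value.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified view of a function body. */
enum stmt_kind { STMT_CALL, STMT_RETURN_CONST, STMT_OTHER };

struct stmt {
    enum stmt_kind kind;
    const char *callee;   /* set for STMT_CALL, otherwise NULL */
    long value;           /* set for STMT_RETURN_CONST */
};

/* True if `name` is one of the annotated error-logging functions. */
static bool is_logger(const char *name,
                      const char *const *loggers, size_t n_loggers) {
    for (size_t i = 0; i < n_loggers; i++)
        if (strcmp(name, loggers[i]) == 0)
            return true;
    return false;
}

/* Finds a logger call immediately followed by a constant return and
 * stores that constant in *out. Returns false if no such pair exists. */
bool find_logged_error_code(const struct stmt *body, size_t n,
                            const char *const *loggers, size_t n_loggers,
                            long *out) {
    for (size_t i = 0; i + 1 < n; i++) {
        if (body[i].kind == STMT_CALL && body[i].callee != NULL &&
            is_logger(body[i].callee, loggers, n_loggers) &&
            body[i + 1].kind == STMT_RETURN_CONST) {
            *out = body[i + 1].value;
            return true;
        }
    }
    return false;
}

/* Usage: a body shaped like the slide's apache snippet. */
long example_error_logging(void) {
    static const char *const loggers[] = { "ap_log_error" };
    const struct stmt body[] = {
        { STMT_OTHER, NULL, 0 },
        { STMT_CALL, "ap_log_error", 0 },
        { STMT_RETURN_CONST, NULL, 9 },   /* placeholder constant */
    };
    long code = 0;
    return find_logged_error_code(body, 3, loggers, 1, &code) ? code : 0;
}
```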
NULL Return Heuristic
• A function that returns a pointer usually
returns NULL to indicate an error.
Example (sqlite3 database):
Expr *sqlite3Expr(...) {
    ....
    return sqlite3ExprAlloc(...);
}
static int multiSelectOrderBy(...) {
    ....
    Expr *pNew = sqlite3Expr(...);
    if (pNew==0) return SQLITE_NOMEM;
}
Error Propagation Heuristic
• An error code is often propagated up or down
the call chain.
• There are three kinds of error propagation:
– Direct error propagation
– Translated error propagation
– Inferred error propagation
Direct Error Propagation
• A caller directly uses its callee’s return value as
its own return value.
Example (lighttpd web server):
int config_insert_values_global(...) {
    ....
    return config_insert_values_internal(...); // caller must return -1 on error
}
int config_insert_values_internal(...) {
    if (...) {
        log_error_write(...);
        return -1; // callee returns -1 on error
    }
    ....
}
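Direct propagation can be sketched as a fixed-point pass over a function table, assuming each entry records which callee (if any) it returns directly. The `struct func` layout and `propagate_direct` are hypothetical names used for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-function record produced by earlier analysis. */
struct func {
    const char *name;
    int tail_callee;      /* index of the callee whose result is returned
                             directly, or -1 if there is none */
    bool has_error_code;
    long error_code;
};

/* Copies a known callee error code to every caller that returns the
 * callee's result unchanged. Repeats until nothing changes and returns
 * the number of functions that gained an error code. */
size_t propagate_direct(struct func *funcs, size_t n) {
    size_t added = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t i = 0; i < n; i++) {
            int c = funcs[i].tail_callee;
            if (funcs[i].has_error_code || c < 0 || (size_t)c >= n)
                continue;
            if (funcs[c].has_error_code) {
                funcs[i].has_error_code = true;
                funcs[i].error_code = funcs[c].error_code;
                added++;
                changed = true;
            }
        }
    }
    return added;
}

/* Usage: the two lighttpd functions from the slide. */
long example_direct(void) {
    struct func funcs[] = {
        { "config_insert_values_global", 1, false, 0 },
        { "config_insert_values_internal", -1, true, -1 },
    };
    propagate_direct(funcs, 2);
    return funcs[0].has_error_code ? funcs[0].error_code : 0;
}
```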
Translated Error Propagation
• An error code can be translated before it is
passed up the call chain.
Example (lighttpd web server):
SETDEFAULTS_FUNC (mod_secdownload_set_defaults) {
    ....
    if (0 != config_insert_values_global(...)) { // callee returns -1 on error
        return HANDLER_ERROR; // caller must return HANDLER_ERROR on error
    }
    ....
}
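Translated propagation can be sketched as follows, assuming the analysis has recorded that a caller compares a callee's result against a success value and returns a constant in the failing branch. All type and function names are hypothetical, and `DEMO_HANDLER_ERROR` is a placeholder value rather than lighttpd's actual constant.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_HANDLER_ERROR 100   /* placeholder, not lighttpd's value */

/* Hypothetical record of `if (ok_value != callee()) return returned_const;` */
struct call_check {
    int callee;            /* index of the checked callee */
    long ok_value;         /* value the callee result is compared against */
    long returned_const;   /* constant returned in the failing branch */
};

struct func {
    const char *name;
    bool has_error_code;
    long error_code;
    bool has_check;
    struct call_check check;
};

/* If the callee's known error code would take the failing branch, the
 * constant returned there becomes the caller's error code. */
bool propagate_translated(struct func *funcs, size_t n, size_t caller) {
    if (caller >= n)
        return false;
    struct func *f = &funcs[caller];
    if (f->has_error_code || !f->has_check)
        return false;
    int c = f->check.callee;
    if (c < 0 || (size_t)c >= n || !funcs[c].has_error_code)
        return false;
    if (funcs[c].error_code == f->check.ok_value)
        return false;      /* callee's error would not reach the branch */
    f->has_error_code = true;
    f->error_code = f->check.returned_const;
    return true;
}

/* Usage: the caller/callee pair from the slide. */
long example_translated(void) {
    struct func funcs[] = {
        { "mod_secdownload_set_defaults", false, 0, true,
          { 1, 0, DEMO_HANDLER_ERROR } },
        { "config_insert_values_global", true, -1, false, { -1, 0, 0 } },
    };
    propagate_translated(funcs, 2, 0);
    return funcs[0].has_error_code ? funcs[0].error_code : 0;
}
```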
Inferred Error Propagation
• The error code can be inferred down the call
chain.
Example (lighttpd web server):
int http_request_parse (...) {
    ....
    if (0 != request_check_hostname(...)) { // callee must return non-zero on error
        log_error_write(...);
        return 0; // caller returns 0 on error
    }
    ....
}
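Inferred propagation can be sketched as deriving a constraint on the callee's return value from the condition that guards the caller's error branch. The `struct guard` form and the candidate list are assumptions made for illustration; how Talos selects a concrete value is not stated in these slides.

```c
#include <stdbool.h>
#include <stddef.h>

enum cmp_op { CMP_EQ, CMP_NE };

/* Hypothetical form of the guard on the caller's error branch:
 * `callee() <op> constant`. */
struct guard {
    enum cmp_op op;
    long constant;
};

/* True if a callee return value `v` would send the caller down its
 * error branch, i.e. `v` is an acceptable error code for the callee. */
bool is_error_value(const struct guard *g, long v) {
    return g->op == CMP_EQ ? v == g->constant : v != g->constant;
}

/* Picks the first candidate that satisfies the guard. The candidate
 * list is an assumption of this sketch. */
bool pick_error_value(const struct guard *g, const long *candidates,
                      size_t n, long *out) {
    for (size_t i = 0; i < n; i++) {
        if (is_error_value(g, candidates[i])) {
            *out = candidates[i];
            return true;
        }
    }
    return false;
}

/* Usage: the slide's guard `0 != request_check_hostname(...)`. */
long example_inferred(void) {
    const struct guard g = { CMP_NE, 0 };
    const long candidates[] = { 0, -1 };
    long code = 0;
    return pick_error_value(&g, candidates, 2, &code) ? code : 0;
}
```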
Indirect Heuristic
• If a function does not have error-handling
code, we disable it by disabling all of its
callers.
[Figure: foo() does not handle errors; funcA() and funcB() each handle errors.]
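The indirect heuristic can be sketched on a small call graph, assuming a fixed-size adjacency matrix and looking only one level up the call chain. `struct call_graph` and `swrr_targets` are hypothetical names; a fuller analysis would presumably keep walking up when a caller also lacks error handling.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_FUNCS 8

/* Hypothetical call graph: calls[a][b] is true when a calls b. */
struct call_graph {
    size_t n;
    bool has_error_code[MAX_FUNCS];
    bool calls[MAX_FUNCS][MAX_FUNCS];
};

/* Lists the functions to instrument in order to disable `victim`.
 * `targets` must have room for g->n entries. Returns the number of
 * targets, or 0 if the victim cannot be disabled in this sketch
 * (no callers, or a caller without error handling). */
size_t swrr_targets(const struct call_graph *g, size_t victim,
                    size_t *targets) {
    if (victim >= g->n)
        return 0;
    if (g->has_error_code[victim]) {
        targets[0] = victim;          /* disable the function itself */
        return 1;
    }
    size_t count = 0;
    for (size_t caller = 0; caller < g->n; caller++) {
        if (!g->calls[caller][victim])
            continue;
        if (!g->has_error_code[caller])
            return 0;                 /* one-level sketch: give up */
        targets[count++] = caller;
    }
    return count;
}

/* Usage: index 0 is foo(), indices 1 and 2 are funcA() and funcB(). */
size_t example_indirect(void) {
    struct call_graph g = { .n = 3 };
    g.has_error_code[1] = true;
    g.has_error_code[2] = true;
    g.calls[1][0] = true;
    g.calls[2][0] = true;
    size_t targets[MAX_FUNCS];
    return swrr_targets(&g, 0, targets);
}
```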
Talos
• Talos has two phases: analyzing source code
and instrumenting SWRRs.
[Figure: Talos workflow. Inputs: Source Code, Annotations. Phases: Analyze Source Code, Add SWRRs to Source Code. Intermediate results: Call Graph, Control Dependency. Output: Source Code with SWRRs.]

Talos: Neutralizing Vulnerabilities with Security Workarounds for Rapid Response (S&P'2016)


Editor's Notes

  • #5 link to previous slide: we want to understand the cause. more explanation of the quotes: link them to the attempts at a correct patch; the first try does not pass regression testing
  • #9 important to return an error code, explain later on
  • #10 we propose two SWRR deployment modes
  • #11 to understand how SWRRs can reduce the pre-patch window, we compare SWRR with a full patch. As we can see, releasing a full patch consists of finding the location of the vulnerability, figuring out the cause of the vulnerability, constructing a patch, and ensuring no functionality is broken with regression testing. The users then need to download and install the patch. emphasize the skipping of three steps of full patch, talos icon
  • #13 Readily available: almost every decent application has error-handling code Designed for unexpected situations: safely allow an application to continually run after an error Can be identified using static code analysis: needs minimum aid from developers
  • #15 focus on the structure and purpose of the heuristics and how they fit together
  • #19 clarify on why basic coverage is reduced to effective coverage