Does static analysis need machine learning?

Anti-Talk
Victoria Khanieva
PVS-Studio

Speaker
Victoria Khanieva
• C++ developer at PVS-Studio
• Worked on support for the MISRA standard
• Wrote articles about checks of open-source projects
khanieva@viva64.com
www.viva64.com

Agenda
• Introduction to static analysis
• Existing solutions and the approaches they implement
• Problems and pitfalls when creating an analyzer:
  • When training «manually»
  • When training on a real large code base
• Most promising approaches

Types of code analysis
• Code review
• Dynamic analysis
• Static analysis

Static analysis
• A way to find errors and flaws in the source code of programs without executing it. It lets you:
  • Detect errors in programs
  • Get tips on code formatting
  • Compute metrics
  • …

Diagnostics
void createCube(float halfExtentsX,
                float halfExtentsY,
                float halfExtentsZ,
                ....) {
    ....
    m_model->addVertex(halfExtentsX,
                       halfExtentsY,
                       halfExtentsY,
                       ....);
    ....
}
V751 Parameter 'halfExtentsZ' is not used inside function body.
TinyRenderer.cpp 375
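The rule behind a warning like V751 can be sketched without any training data. The sketch below assumes the parameter names and the body text have already been extracted; a real analyzer works on a full syntax tree and semantic model, and the helper names `identifiers` and `unusedParameters` are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Splits source text into alphanumeric tokens (letters, digits, underscores).
// A toy lexer: it does not skip comments, strings or macros.
std::set<std::string> identifiers(const std::string& text) {
    std::set<std::string> result;
    std::string current;
    for (char c : text) {
        if (std::isalnum(static_cast<unsigned char>(c)) || c == '_') {
            current += c;
        } else if (!current.empty()) {
            result.insert(current);
            current.clear();
        }
    }
    if (!current.empty()) result.insert(current);
    return result;
}

// Returns the parameters whose names never appear in the function body,
// i.e. the candidates for a V751-style warning.
std::vector<std::string> unusedParameters(const std::vector<std::string>& params,
                                          const std::string& body) {
    const std::set<std::string> used = identifiers(body);
    std::vector<std::string> unused;
    for (const auto& p : params)
        if (used.count(p) == 0) unused.push_back(p);
    return unused;
}
```

Given the three `halfExtents*` parameters and the body of `createCube`, it returns only `halfExtentsZ`. The lexer is deliberately naive: a name that appears only in a comment or a string would still count as used.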

When ML is useful
• Useful: scanning photos and videos
• Not useful: a calculator

DeepCode
• Java, JS, TS, Python, C, C++
• Code review and audit
• Demos on open-source projects are available
• Related posts

Infer
• Java, C, C++, Objective-C
• By Facebook
• Open-source code
• You can try Infer on your projects
• Based on Hoare logic and separation logic, bi-abduction, and the theory of abstract interpretation
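Abstract interpretation, one of the foundations listed for Infer, evaluates a program over abstract values instead of concrete ones. A minimal sketch with an interval domain follows; it is not Infer's implementation, and the `Interval` type and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Abstract value: every integer in the closed range [lo, hi].
struct Interval {
    long lo;
    long hi;
};

// Abstract addition: covers x + y for any x in a and y in b.
Interval add(Interval a, Interval b) { return {a.lo + b.lo, a.hi + b.hi}; }

// Abstract subtraction: covers x - y for any x in a and y in b.
Interval sub(Interval a, Interval b) { return {a.lo - b.hi, a.hi - b.lo}; }

// Join at a control-flow merge point (e.g. after if/else):
// the smallest interval that covers both incoming values.
Interval join(Interval a, Interval b) {
    return {std::min(a.lo, b.lo), std::max(a.hi, b.hi)};
}

// A division is reported if the divisor's interval may contain zero.
bool mayDivideByZero(Interval divisor) { return divisor.lo <= 0 && 0 <= divisor.hi; }
```

For `int d = flag ? 1 : 3; d -= 2;` the join gives `[1, 3]` and the subtraction gives `[-1, 1]`, so a division by `d` is reported even though `d` is never 0. The analysis is sound but imprecise, which is one classic source of false positives.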

SapFix
• Processes Infer results
• Suggests possible fixes

Embold
• Platform for analyzing code quality
• System that suggests edits
• Uses NLP to search for dependencies between functions and methods

Source{d}
• Open-source
• Related posts
• Repository with a dataset for training
• Code-style detection
• Platform for collecting metrics and statistics

Fixing code style in Source{d}
Based on the article “STYLE-ANALYZER: fixing code style inconsistencies with interpretable unsupervised algorithms”

Clever-Commit
• By Mozilla and Ubisoft
• Searches for suspicious commits
• Based on the publication “CLEVER: Combining Code Metrics with Clone Detection for Just-In-Time Fault Prevention and Resolution in Large Industrial Projects”
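Clone detection, one half of the CLEVER approach, looks for fragments that are identical up to renaming. The sketch below shows the simplest form (so-called Type-2 clones) by normalizing identifiers and numbers; it does not reproduce the method from the paper, and `normalize` and `isClone` are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Tokenizes a code fragment, replacing identifiers with "ID" and numbers with "NUM"
// while keeping keywords and punctuation. Fragments with equal token sequences
// are treated as clones (a simplified Type-2 notion).
std::vector<std::string> normalize(const std::string& code) {
    static const std::set<std::string> keywords = {"if", "else", "for", "while", "return", "int"};
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < code.size()) {
        unsigned char c = code[i];
        if (std::isspace(c)) { ++i; continue; }
        if (std::isalpha(c) || c == '_') {
            std::size_t j = i;
            while (j < code.size() &&
                   (std::isalnum(static_cast<unsigned char>(code[j])) || code[j] == '_')) ++j;
            std::string word = code.substr(i, j - i);
            tokens.push_back(keywords.count(word) ? word : "ID");
            i = j;
        } else if (std::isdigit(c)) {
            std::size_t j = i;
            while (j < code.size() && std::isdigit(static_cast<unsigned char>(code[j]))) ++j;
            tokens.push_back("NUM");
            i = j;
        } else {
            tokens.push_back(std::string(1, code[i]));  // single-character punctuation
            ++i;
        }
    }
    return tokens;
}

bool isClone(const std::string& a, const std::string& b) { return normalize(a) == normalize(b); }
```

A commit that copies a fragment with only renamed variables is found this way; a known-buggy fragment can then flag its clones as suspicious.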

CodeGuru
• Java
• By Amazon
• Recommendations on best practices drawn from the documentation and code base

Main directions
• Analyze code to search for errors
• Analyze code to search for deviations from best practices
• Analyze code artifacts
• Collect metrics and data on code
• Suggest code-style fixes

Ways to train
• A selected base of open-source repositories
• A manually selected dataset
• Your own project base

Problems and pitfalls*
* from the point of view of a classic static analyzer developer

«Manual» dataset selection
We need to find:
if (A == A)
How it may look in real code:
• if (X && A == A)
• if (A + 1 == A + 1)
• if (A[i] == A[i])
• if ((A) == (A))
• …
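A classic analyzer covers every variant in the list with one rule on the syntax tree: compare the two operands of `==` structurally, skipping redundant parentheses. The sketch assumes a toy tree (`Expr`, `leaf`, `node` and the other names are hypothetical) where an analyzer would use its real AST.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// A tiny expression tree. kind is "name", "num", "paren", or an operator such as "==", "+", "&&", "[]".
struct Expr {
    std::string kind;
    std::string text;                         // identifier or literal text for leaves
    std::vector<std::shared_ptr<Expr>> kids;  // operands
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr leaf(const std::string& kind, const std::string& text) {
    return std::make_shared<Expr>(Expr{kind, text, {}});
}
ExprPtr node(const std::string& kind, std::vector<ExprPtr> kids) {
    return std::make_shared<Expr>(Expr{kind, "", std::move(kids)});
}

// Parentheses do not change meaning, so they are skipped before comparing.
const Expr& strip(const Expr& e) { return e.kind == "paren" ? strip(*e.kids[0]) : e; }

// Structural equality of two expressions, ignoring parentheses.
bool sameExpr(const Expr& a0, const Expr& b0) {
    const Expr& a = strip(a0);
    const Expr& b = strip(b0);
    if (a.kind != b.kind || a.text != b.text || a.kids.size() != b.kids.size()) return false;
    for (std::size_t i = 0; i < a.kids.size(); ++i)
        if (!sameExpr(*a.kids[i], *b.kids[i])) return false;
    return true;
}

// One rule: warn when both operands of == are the same expression, anywhere in the tree.
bool hasIdenticalComparison(const Expr& e) {
    if (e.kind == "==" && sameExpr(*e.kids[0], *e.kids[1])) return true;
    for (const auto& k : e.kids)
        if (hasIdenticalComparison(*k)) return true;
    return false;
}
```

Trees for `X && A == A`, `(A) == (A)` and `A[i] == A[i]` are all reported, while `A + 1 == A + 2` is not. No separate training example per variant is needed.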

In practice
We need to find:
int y = x / 0;
How it may look in real code:
template <class T> class numeric_limits {
    ....
};
namespace boost {
    ....
}
namespace boost {
namespace hash_detail {
    template <class T> void dsizet(size_t x) {
        size_t length = x / (limits<int>::digits - 31);
    }
}
}
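The Boost case needs no pattern matching on `/ 0`: the analyzer folds the denominator to a constant. A minimal sketch, assuming the denominator has the shape `name - constant` and that the analyzer knows the value of `limits<int>::digits`; `Difference`, `knownValues`, `fold` and `isDivisionByZero` are hypothetical.

```cpp
#include <cassert>
#include <limits>
#include <map>
#include <optional>
#include <string>

// A denominator written as "name - constant", e.g. limits<int>::digits - 31.
// A real analyzer folds arbitrary expressions over the AST.
struct Difference {
    std::string name;
    long constant;
};

// Values known at compile time. std::numeric_limits<int>::digits counts value bits
// without the sign bit, so it is 31 on platforms with a 32-bit int.
const std::map<std::string, long> knownValues = {
    {"limits<int>::digits", std::numeric_limits<int>::digits},
};

// Folds the denominator if its value is known; otherwise returns no value.
std::optional<long> fold(const Difference& d) {
    auto it = knownValues.find(d.name);
    if (it == knownValues.end()) return std::nullopt;
    return it->second - d.constant;
}

bool isDivisionByZero(const Difference& denominator) {
    std::optional<long> value = fold(denominator);
    return value.has_value() && *value == 0;
}
```

With a 32-bit `int`, `limits<int>::digits - 31` folds to 0, which is the same error as `x / 0` written in a form no manual example set is likely to contain.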

Data flow analysis
@Override
public String getText(Mode mode) {
    StringBuilder sb = new StringBuilder();
    ....
    if (filter.getMessage()
              .toLowerCase(Locale.ENGLISH)
              .startsWith("Each ")) {
        sb.append(" has base power and toughness ");
    } else {
        sb.append(" have base power and toughness ");
    }
    ....
    return sb.toString();
}
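The warning here follows from what the analyzer knows about the value after `toLowerCase`: it contains no uppercase letters, so `startsWith("Each ")` is always false and the first branch never runs. A sketch of that single check in C++, assuming ASCII text and ignoring locale rules; the function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// After toLowerCase(), the receiver has no uppercase letters, so
// startsWith(literal) can only succeed if the literal has none either.
// Returns true when the condition is always false (a likely bug).
// Only ASCII letters are considered; locale-specific rules are ignored.
bool lowercasedStartsWithAlwaysFalse(const std::string& literal) {
    return std::any_of(literal.begin(), literal.end(), [](char c) {
        return std::isupper(static_cast<unsigned char>(c)) != 0;
    });
}
```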

Data flow analysis
uint32_t* BnNew() {
    uint32_t* result = new uint32_t[kBigIntSize];
    memset(result, 0, kBigIntSize * sizeof(uint32_t));
    return result;
}

std::string AndroidRSAPublicKey(crypto::RSAPrivateKey* key) {
    ....
    uint32_t* n = BnNew();
    ....
    RSAPublicKey pkey;
    ....
    if (pkey.n0inv == 0)
        return kDummyRSAPublicKey; // <=
    ....
}
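Here the analyzer must know that `BnNew` returns memory from `new[]` and then follow the path that leaves through the early `return`. A minimal sketch of the path check, assuming each execution path is already flattened into allocation, deallocation and return steps (`Op`, `Stmt` and `leaksOnPath` are hypothetical):

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// A simplified statement on one execution path.
enum class Op { Alloc, Free, Return };

struct Stmt {
    Op op;
    std::string var;  // variable holding the buffer (empty for Return)
};

// Walks one path and reports the buffers still owned when the function returns.
std::vector<std::string> leaksOnPath(const std::vector<Stmt>& path) {
    std::set<std::string> live;
    std::vector<std::string> leaks;
    for (const Stmt& s : path) {
        switch (s.op) {
            case Op::Alloc: live.insert(s.var); break;
            case Op::Free:  live.erase(s.var); break;
            case Op::Return:
                leaks.insert(leaks.end(), live.begin(), live.end());
                return leaks;
        }
    }
    return leaks;
}
```

The path `n = BnNew(); ... return kDummyRSAPublicKey;` yields `n` as leaked; a `Free` step for `n` before the return clears the report.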

Training on many projects
• «There are so many projects on GitHub! The analyzer will learn from their repositories and commits» turns into collecting and labeling commits.
• If a manually collected training base is unreliable, what can we expect from an automatically collected one?

Training on many projects
• Look at a commit with the word «fix»:
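Labeling commits by keyword is the obvious shortcut, and a sketch shows how noisy it is. The heuristic below is hypothetical and deliberately naive:

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Naive labeling: a commit counts as a bug fix if its message contains "fix"
// (case-insensitive). Every false match becomes a wrong label in the training set.
bool looksLikeBugFix(std::string message) {
    std::transform(message.begin(), message.end(), message.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return message.find("fix") != std::string::npos;
}
```

It accepts "Fix typo in README" and even "Add prefix option" (because of "prefix"), and rejects a message such as "Handle empty input" that may describe a real bug fix.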

Outdated code
• The analyzer has to keep up with the language it checks
• Most projects use outdated standards
• Most projects don’t use new constructs

Example
New construct:
std::vector<int> numbers;
....
for (int num : numbers)
    foo(num);
New error pattern:
for (int num : numbers)
    numbers.push_back(num * 2);
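The new error pattern is undefined behavior: `push_back` may reallocate the buffer and always invalidates the end iterator that the range-based `for` loop holds. A sketch of a safe rewrite, assuming the intent is to append a doubled copy of each original element (`appendDoubled` is a hypothetical name):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Appends a doubled copy of every original element.
// Indexing up to the original size avoids the iterators that push_back invalidates.
void appendDoubled(std::vector<int>& numbers) {
    const std::size_t originalSize = numbers.size();
    for (std::size_t i = 0; i < originalSize; ++i)
        numbers.push_back(numbers[i] * 2);  // the argument is computed before the call
}
```

An analyzer trained on code written before range-based loops existed has never seen either the error or this fix.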

Why documentation matters
• Code example:
char check(const uint8 *hash_stage2)
{
    ....
    return memcmp(hash_stage2, hash_stage2_reassured,
                  SHA1_HASH_SIZE);
}
• The analyzer might hypothetically suggest the following fix:
int check(const uint8 *hash_stage2)
{
    ....
    return memcmp(hash_stage2, hash_stage2_reassured,
                  SHA1_HASH_SIZE);
}
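The documented contract of `memcmp` only guarantees the sign of its result, not its magnitude. Narrowing it to `char` can therefore turn a non-zero result into 0, so different hashes compare as equal. The sketch below shows the truncation and one fix that keeps the function's meaning; `narrowed` and `hashesDiffer` are hypothetical names, and an 8-bit `char` is assumed.

```cpp
#include <cassert>
#include <cstring>

// Keeps only the low 8 bits of the result (assuming an 8-bit char):
// a non-zero value such as 256 becomes 0, i.e. "equal".
unsigned char narrowed(int memcmpResult) {
    return static_cast<unsigned char>(memcmpResult);
}

// Relies only on what memcmp guarantees: zero means equal.
bool hashesDiffer(const unsigned char* a, const unsigned char* b, std::size_t size) {
    return std::memcmp(a, b, size) != 0;
}
```

Returning the result of `memcmp(...) != 0` keeps the non-zero-means-different convention without depending on the magnitude; the `int` return type on the slide also avoids the truncation, but changes the function's signature.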

Classic approach: documentation

Code example:
ObjectOutputStream out = new ObjectOutputStream(....);
SerializedObject obj = new SerializedObject();
obj.state = 100;
out.writeObject(obj);
obj.state = 200;
out.writeObject(obj);
out.close();

The analyzer suggests:
obj.state = 100;
out.writeObject(obj);
obj = new SerializedObject(); // Add this line
obj.state = 200;
out.writeObject(obj);
out.close();

What happens without the edit:
obj.state = 100;
out.writeObject(obj); // writes the object with state = 100
obj.state = 200;
out.writeObject(obj); // writes only a reference to the same object, so state = 100 is read back
out.close();
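The reason is that `ObjectOutputStream` remembers every object it has written and writes later occurrences as back-references to the first copy. A toy C++ model of that behavior follows; `CachingWriter` and its record strings are hypothetical and only illustrate the idea, not the Java stream format.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct SerializedObject {
    int state = 0;
};

// Toy model: the first write of an object stores its fields;
// later writes of the same object store only a back-reference.
class CachingWriter {
public:
    void writeObject(const SerializedObject& obj) {
        auto it = handles_.find(&obj);
        if (it != handles_.end()) {
            records_.push_back("ref#" + std::to_string(it->second));
            return;
        }
        const int handle = static_cast<int>(handles_.size());
        handles_[&obj] = handle;
        records_.push_back("state=" + std::to_string(obj.state));
    }
    const std::vector<std::string>& records() const { return records_; }

private:
    std::map<const SerializedObject*, int> handles_;  // object identity -> handle
    std::vector<std::string> records_;                // what ends up in the stream
};
```

In Java itself, writing a new object (the suggested fix) or calling `out.reset()` between the writes makes the stream store the new state.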

False positives
std::vector<int> numbers;
....
for (int num : numbers)
{
    if (num < 5)
    {
        numbers.push_back(0);
        break; // or, for example, return
    }
}
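A hand-written rule avoids this false positive with one extra condition: `push_back` on the iterated container is harmless if control leaves the loop right after it. A sketch, assuming the loop body has already been reduced to a list of statement kinds; `reportsInvalidation` and that representation are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Body of a range-based for loop over a container, reduced to statement kinds.
// Reports push_back on that container unless the next statement leaves the loop.
bool reportsInvalidation(const std::vector<std::string>& body) {
    for (std::size_t i = 0; i < body.size(); ++i) {
        if (body[i] != "push_back") continue;
        const bool leavesLoop =
            i + 1 < body.size() && (body[i + 1] == "break" || body[i + 1] == "return");
        if (!leavesLoop) return true;
    }
    return false;
}
```

A learned model has no equivalent of this explicit condition to inspect or adjust.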

False positives
• The reason for getting a warning may be unclear. The reason for NOT getting a warning may be unclear as well.
• How to fix this?
  • Additional training (will it help?)
  • A mechanism to hide warnings (not universal)
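Hiding warnings is usually done with markup in the code itself. The sketch below filters out warnings whose source line carries a comment such as `//-V751`, modeled on PVS-Studio's suppression comments; `Warning` and `visibleWarnings` are hypothetical, and a real implementation handles many more cases.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Warning {
    std::size_t line;  // 1-based line number
    std::string code;  // diagnostic code, e.g. "V751"
};

// Keeps only the warnings whose line has no "//-<code>" suppression comment.
std::vector<Warning> visibleWarnings(const std::vector<Warning>& warnings,
                                     const std::vector<std::string>& sourceLines) {
    std::vector<Warning> visible;
    for (const Warning& w : warnings) {
        const std::string& text = sourceLines.at(w.line - 1);
        if (text.find("//-" + w.code) == std::string::npos) visible.push_back(w);
    }
    return visible;
}
```

The mechanism works per line and per diagnostic, which is why it does not scale to a model whose warnings shift after every retraining.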

If the analyzer is trained successfully

Promising directions
• Code style based on specific symbols
• Collecting additional metrics and information
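Style rules can be derived from the code base itself rather than hard-coded. As a minimal, non-ML sketch, the function below learns whether a project indents with tabs or spaces by majority vote and reports lines that deviate; the name `inconsistentIndentation` and the majority rule are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Learns the dominant indentation character from the given lines and
// returns the 0-based indices of indented lines that use the other one.
std::vector<std::size_t> inconsistentIndentation(const std::vector<std::string>& lines) {
    std::size_t tabs = 0, spaces = 0;
    for (const auto& l : lines) {
        if (!l.empty() && l[0] == '\t') ++tabs;
        else if (!l.empty() && l[0] == ' ') ++spaces;
    }
    const char expected = tabs >= spaces ? '\t' : ' ';
    std::vector<std::size_t> deviating;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        const std::string& l = lines[i];
        if (!l.empty() && (l[0] == '\t' || l[0] == ' ') && l[0] != expected)
            deviating.push_back(i);
    }
    return deviating;
}
```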

Promising directions
• Best practices for a specific framework, code base, or platform

Download a one-month PVS-Studio trial and check your projects with classic static analysis:
https://pvs-studio.com/en/pvs-studio/download/

Q&A
viva64.com
khanieva@viva64.com
