Using Source Code Metrics to Predict Change-Prone Java Interfaces
Daniele Romano and Martin Pinzger
Williamsburg, ICSM 2011
29 Sept 2011




Delft University of Technology
Challenge the future
Contributions
•  Correlation of source code metrics with the number of changes (#changes) in interfaces:
   •  C&K metrics
   •  complexity and usage metrics
   •  interface usage cohesion metric
•  Predictive power of source code metrics for interfaces:
   •  prediction models
•  10 open source projects
   •  8 Eclipse projects
   •  Hibernate 2 and Hibernate 3
Motivations
•  Changes in interfaces are not desirable
   •  changes can have a stronger impact
   •  interfaces define contracts
   •  existing object-oriented metrics are not sound for interfaces

•  Related work uses metrics as quality predictors
   •  but makes no distinction among the kinds of classes measured
Hypotheses

•  H1
   •  InterfaceUsageCohesion (IUC) has a stronger correlation with the number of Source Code Changes (#SCC) of interfaces than the C&K metrics
•  H2
   •  IUC can improve the performance of prediction models to classify Java interfaces into change- and not-change-prone
The Approach

source code repository → metrics computation (metric values) and changes retrieval (#SCC)

•  Correlation analysis (H1): Spearman rank correlation between metrics and changes
•  Prediction analysis (H2): the metrics train models; the changes classify interfaces
Metrics Computation

source code repository → Evolizer Model Importer → FAMIX model → metrics computation (Understand) → metric values
Changes Computation

source code repository → Evolizer Version Control Connector → revisions info & subsequent files → changes computation with the Evolizer ChangeDistiller (AST comparison) → fine-grained Source Code Changes (SCC)
Why SCC?
•  Filters out irrelevant changes caused by modifications of:
   •  licenses
   •  comments
•  More precise measurement

Example: #Revision=1, #LineModified=1, #SCC=2
C&K Correlation for Interfaces
     Project            CBO            NOC        RFC         DIT     LCOM       WMC
Hibernate3           0.535**     0.029         0.592**     0.058     0.103     0.657**
Hibernate2           0.373**     0.065         0.325**     -0.01     0.006     0.522**
ecl.debug.core       0.484**     0.105         0.486**     0.232*    0.337     0.597**
ecl.debug.ui         0.216*      0.033         0.152       0.324**   0.214*    0.131
ecl.jface            0.239*      0.012         0.174**     0.103     0.320**   0.137
ecl.jdt.debug        0.512**     0.256**       0.349**     -0.049    0.238**   0.489**
ecl.team.core        0.367*      0.102         0.497**     0.243     0.400     0.451**
ecl.team.cvs.core    0.688**     -0.013        0.738**     0.618**   0.610**   0.744**
ecl.team.ui          0.301*      -0.003        0.299*      -0.103*   0.395**   0.299*
update.core          0.499**     -0.007        0.381**     0.146     0.482**   0.729**
     Median          0.428       0.031         0.365       0.124     0.328     0.505

            *= significant at α=0.05    **= significant at α=0.01

Weighted Methods per Class (WMC)

$\mathrm{WMC} = \sum_{i=1}^{n} c_i$

•  $c_i$: cyclomatic complexity of the i-th method
•  $n$: number of methods in a class

For an interface every method is a bodiless declaration, so each $c_i = 1$ and WMC reduces to the Number of Methods.
Interface Segregation Principle
•  ISP
   •  defined by Robert C. Martin
   •  copes with fat interfaces

•  Fat interfaces
   •  interfaces that serve different clients
   •  each kind of client uses a different set of methods
   •  the interface should be split into multiple interfaces, each one designed to serve a specific client
Interface Segregation Principle (I)

Different clients do not share any methods.

ClusterClients(i): counts the number of clients that do not share any method of the interface i
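Following the slide's wording, ClusterClients can be sketched as counting the clients whose used-method set is disjoint from every other client's. The client names and method sets below are hypothetical:

```python
def cluster_clients(usage):
    """usage: dict mapping each client to the set of interface
    methods it calls. Count the clients that share no method of
    the interface with any other client (slide's wording)."""
    count = 0
    for client, methods in usage.items():
        used_by_others = set()
        for other, other_methods in usage.items():
            if other != client:
                used_by_others |= other_methods
        if methods and not (methods & used_by_others):
            count += 1
    return count

# Hypothetical usage of an interface declaring methods a..d.
usage = {
    "ClientA": {"a", "b"},  # disjoint from everyone else
    "ClientB": {"c"},       # shares "c" with ClientC
    "ClientC": {"c", "d"},
}
print(cluster_clients(usage))  # only ClientA -> 1
```

A higher count signals an ISP violation: disjoint client groups could each be served by their own, smaller interface.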
Interface Usage Cohesion

Different clients share a method.
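The IUC formula itself is not reproduced on the slide. One common formulation of interface usage cohesion averages, over all clients, the fraction of the interface's methods each client uses; treat this exact definition as an assumption, and the data below as hypothetical:

```python
def iuc(interface_methods, usage):
    """interface_methods: set of methods declared by the interface.
    usage: dict mapping each client to the set of methods it invokes.
    Assumed formulation: mean over clients of
    |methods used by client| / |interface methods|.
    1.0 means every client uses the whole interface (high cohesion)."""
    if not usage or not interface_methods:
        return 0.0
    per_client = [len(m & interface_methods) / len(interface_methods)
                  for m in usage.values()]
    return sum(per_client) / len(per_client)

methods = {"a", "b", "c", "d"}
usage = {"ClientA": {"a", "b"}, "ClientB": {"c"}}  # each uses a slice
print(iuc(methods, usage))  # (0.5 + 0.25) / 2 = 0.375
```

Under this formulation a fat interface, whose clients each touch only a slice of the methods, scores low; that matches the negative IUC-vs-#SCC correlations on slide 15 (less cohesive interfaces change more).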
Other metrics for interfaces…

•  Number Of Methods (NOM)
•  Number Of Arguments (NOA)
•  Arguments Per Procedure (APP)
•  Number of Clients (Cli)
•  Number of Invocations (Inv)
•  Number of Implementing Classes (Impl)




Correlation for Interfaces
     Project              Inv       Cli       NOM      Clust       IUC
Hibernate3             0.544**   0.433**   0.657**   0.302**   -0.601**
Hibernate2             0.165     0.104     0.522**   0.016     -0.373**
ecl.debug.core         0.317**   0.327**   0.597**   0.273**   -0.682**
ecl.debug.ui           0.497**   0.498**   0.131     0.418**   -0.508**
ecl.jface              0.205     0.099     0.137     0.106**   -0.363**
ecl.jdt.debug          0.495**   0.471     0.489**   0.474**   -0.605**
ecl.team.core          0.261     0.278     0.451**   0.328*    -0.475**
ecl.team.cvs.core      0.557**   0.608**   0.744**   0.369     -0.819**
ecl.team.ui            0.290     0.270     0.299     0.056     -0.618**
update.core            0.677**   0.656**   0.729**   0.606**   -0.656**
     Median            0.317     0.327     0.505     0.328     -0.605

            *= significant at α=0.05     **= significant at α=0.01
Prediction Analysis
•  Three machine learning algorithms
   •  Support Vector Machine
   •  Naïve Bayes Network
   •  Neural Nets

•  Interface classification: change-prone vs. not-change-prone

•  Training using 10-fold cross-validation with two metric sets
   •  {CBO, RFC, LCOM, WMC} = CK
   •  {CBO, RFC, LCOM, WMC, IUC} = IUC
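The classification target is binary. Assuming the change-proneness threshold is the median #SCC across a project's interfaces (an assumption; the slide does not spell out the criterion), the labeling step can be sketched as follows, with hypothetical interface names and counts:

```python
def label_change_prone(scc_per_interface):
    """scc_per_interface: dict mapping interface name -> #SCC.
    Assumed criterion: change-prone iff #SCC is above the median."""
    values = sorted(scc_per_interface.values())
    n = len(values)
    median = (values[n // 2] if n % 2 else
              (values[n // 2 - 1] + values[n // 2]) / 2)
    return {name: scc > median for name, scc in scc_per_interface.items()}

# Hypothetical #SCC counts for four interfaces of one project.
scc = {"ISession": 40, "IQuery": 3, "ICache": 12, "IType": 1}
print(label_change_prone(scc))  # median = 7.5 -> ISession, ICache change-prone
```

These binary labels are what the three classifiers are trained on; the metric vectors (CK set vs. CK+IUC set) are the features, and AUC on held-out folds measures the models' ranking quality.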
Prediction – AUC values
                       NBayes           LibSVM             NN
      Project        CK      IUC      CK      IUC      CK      IUC
ecl.team.cvs.core   0.55    0.75    0.692   0.811   0.8     0.8
ecl.debug.core      0.75    0.79    0.806   0.828   0.85    0.875
ecl.debug.ui        0.66    0.72    0.71    0.742   0.748   0.766
Hibernate2          0.745   0.807   0.735   0.708   0.702   0.747
Hibernate3          0.835   0.862   0.64    0.856   0.874   0.843
ecl.jdt.debug       0.79    0.738   0.741   0.82    0.77    0.762
ecl.jface           0.639   0.734   0.607   0.778   0.553   0.542
ecl.team.core       0.708   0.792   0.617   0.608   0.725   0.85
ecl.team.ui         0.88    0.8     0.74    0.884   0.65    0.75
update.core         0.782   0.811   0.794   0.817   0.675   0.744
      Median        0.747   0.791   0.722   0.814   0.736   0.764
Results
•  H1 ACCEPTED
   •  IUC has a stronger correlation with #SCC of interfaces than the C&K metrics
   •  IUC shows the best correlation

•  H2 PARTIALLY ACCEPTED
   •  IUC can improve the performance of prediction models to classify Java interfaces into change- and not-change-prone
   •  despite the improvements, the Wilcoxon test showed a significant difference only for LibSVM
Implications
•  Researchers
   •  take into account the nature of the measured entities

•  Quality Engineers
   •  enlarge metrics suites

•  Developers and Architects
   •  measure ISP violations
Future Work

•  Metrics measurement over time

•  Further validation

•  Are the shared methods the problem?

•  Component-based and service-oriented systems