Using Source Code Metrics to Predict Change-Prone Java Interfaces
Daniele Romano and Martin Pinzger
Williamsburg, ICSM 2011
29 Sept 2011




Delft University of Technology
Challenge the future
Contributions
•  Correlation of source code metrics with the number of changes (#changes) in interfaces:
   •  C&K metrics
   •  complexity and usage metrics
   •  interface usage cohesion metric
•  Predictive power of source code metrics for interfaces:
   •  prediction models
•  10 open source projects
   •  8 Eclipse projects
   •  Hibernate 2 and Hibernate 3
Motivations
•  Changes in interfaces are not desirable
   •  changes can have a stronger impact
   •  interfaces define contracts
   •  existing object-oriented metrics are not sound for interfaces

•  Related work uses metrics as quality predictors
   •  but makes no distinction among the kinds of classes measured
Hypotheses

•  H1
   •  InterfaceUsageCohesion (IUC) has a stronger correlation with the number of Source Code Changes (#SCC) of interfaces than the C&K metrics
•  H2
   •  IUC can improve the performance of prediction models to classify Java interfaces into change- and not-change-prone
The Approach

source code repository → metrics computation (metric values) and changes retrieval (#SCC)

•  Correlation analysis (H1): Spearman rank correlation between metrics and changes
•  Prediction analysis (H2): the metrics train models; the changes classify interfaces
Metrics Computation

source code repository → Evolizer Model Importer → FAMIX model → metrics computation (Understand) → metric values
Changes Computation

source code repository → Evolizer Version Control Connector → revisions info & subsequent files → changes computation with the Evolizer ChangeDistiller (AST comparison) → fine-grained Source Code Changes (SCC)
Why SCC?
•  Filters out irrelevant changes caused by modifications of:
   •  licenses
   •  comments
•  More precise measurement

Example: #Revision=1, #LineModified=1, #SCC=2
C&K Correlation for Interfaces
     Project            CBO            NOC        RFC         DIT     LCOM       WMC
Hibernate3           0.535**     0.029         0.592**     0.058     0.103     0.657**
Hibernate2           0.373**     0.065         0.325**     -0.01     0.006     0.522**
ecl.debug.core       0.484**     0.105         0.486**     0.232*    0.337     0.597**
ecl.debug.ui         0.216*      0.033         0.152       0.324**   0.214*    0.131
ecl.jface            0.239*      0.012         0.174**     0.103     0.320**   0.137
ecl.jdt.debug        0.512**     0.256**       0.349**     -0.049    0.238**   0.489**
ecl.team.core        0.367*      0.102         0.497**     0.243     0.400     0.451**
ecl.team.cvs.core    0.688**     -0.013        0.738**     0.618**   0.610**   0.744**
ecl.team.ui          0.301*      -0.003        0.299*      -0.103*   0.395**   0.299*
update.core          0.499**     -0.007        0.381**     0.146     0.482**   0.729**
     Median          0.428       0.031         0.365       0.124     0.328     0.505

            *= significant at α=0.05    **= significant at α=0.01

Weighted Methods per Class (WMC)

$\mathrm{WMC} = \sum_{i=1}^{n} c_i$

•  $c_i$: cyclomatic complexity of the i-th method
•  $n$: number of methods in a class

For an interface every method is a bodiless declaration, so each $c_i = 1$ and WMC reduces to the Number of Methods.
Interface Segregation Principle
•  ISP
   •  defined by Robert C. Martin
   •  copes with fat interfaces

•  Fat interfaces
   •  interfaces that serve different clients
   •  each kind of client uses a different set of methods
   •  the interface should be split into multiple interfaces, each one designed to serve a specific client
Interface Segregation Principle (I)

Different clients do not share any methods.

ClusterClients(i): counts the number of clients that do not share any method of the interface i
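Following the slide's wording, ClusterClients can be sketched as counting the clients whose used-method set is disjoint from every other client's. The client names and method sets below are hypothetical:

```python
def cluster_clients(usage):
    """usage: dict mapping each client to the set of interface
    methods it calls. Count the clients that share no method of
    the interface with any other client (slide's wording)."""
    count = 0
    for client, methods in usage.items():
        used_by_others = set()
        for other, other_methods in usage.items():
            if other != client:
                used_by_others |= other_methods
        if methods and not (methods & used_by_others):
            count += 1
    return count

# Hypothetical usage of an interface declaring methods a..d.
usage = {
    "ClientA": {"a", "b"},  # disjoint from everyone else
    "ClientB": {"c"},       # shares "c" with ClientC
    "ClientC": {"c", "d"},
}
print(cluster_clients(usage))  # only ClientA -> 1
```

A higher count signals an ISP violation: disjoint client groups could each be served by their own, smaller interface.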
Interface Usage Cohesion

Different clients share a method.
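The IUC formula itself is not reproduced on the slide. One common formulation of interface usage cohesion averages, over all clients, the fraction of the interface's methods each client uses; treat this exact definition as an assumption, and the data below as hypothetical:

```python
def iuc(interface_methods, usage):
    """interface_methods: set of methods declared by the interface.
    usage: dict mapping each client to the set of methods it invokes.
    Assumed formulation: mean over clients of
    |methods used by client| / |interface methods|.
    1.0 means every client uses the whole interface (high cohesion)."""
    if not usage or not interface_methods:
        return 0.0
    per_client = [len(m & interface_methods) / len(interface_methods)
                  for m in usage.values()]
    return sum(per_client) / len(per_client)

methods = {"a", "b", "c", "d"}
usage = {"ClientA": {"a", "b"}, "ClientB": {"c"}}  # each uses a slice
print(iuc(methods, usage))  # (0.5 + 0.25) / 2 = 0.375
```

Under this formulation a fat interface, whose clients each touch only a slice of the methods, scores low; that matches the negative IUC-vs-#SCC correlations on slide 15 (less cohesive interfaces change more).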
Other metrics for interfaces…

•  Number Of Methods (NOM)
•  Number Of Arguments (NOA)
•  Arguments Per Procedure (APP)
•  Number of Clients (Cli)
•  Number of Invocations (Inv)
•  Number of Implementing Classes (Impl)




Correlation for Interfaces
     Project              Inv       Cli       NOM      Clust       IUC
Hibernate3             0.544**   0.433**   0.657**   0.302**   -0.601**
Hibernate2             0.165     0.104     0.522**   0.016     -0.373**
ecl.debug.core         0.317**   0.327**   0.597**   0.273**   -0.682**
ecl.debug.ui           0.497**   0.498**   0.131     0.418**   -0.508**
ecl.jface              0.205     0.099     0.137     0.106**   -0.363**
ecl.jdt.debug          0.495**   0.471     0.489**   0.474**   -0.605**
ecl.team.core          0.261     0.278     0.451**   0.328*    -0.475**
ecl.team.cvs.core      0.557**   0.608**   0.744**   0.369     -0.819**
ecl.team.ui            0.290     0.270     0.299     0.056     -0.618**
update.core            0.677**   0.656**   0.729**   0.606**   -0.656**
     Median            0.317     0.327     0.505     0.328     -0.605

            *= significant at α=0.05     **= significant at α=0.01
Prediction Analysis
•  Three machine learning algorithms
   •  Support Vector Machine
   •  Naïve Bayes Network
   •  Neural Nets

•  Interface classification: change-prone vs. not-change-prone

•  Training using 10-fold cross-validation with two metric sets
   •  {CBO, RFC, LCOM, WMC} = CK
   •  {CBO, RFC, LCOM, WMC, IUC} = IUC
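The classification target is binary. Assuming the change-proneness threshold is the median #SCC across a project's interfaces (an assumption; the slide does not spell out the criterion), the labeling step can be sketched as follows, with hypothetical interface names and counts:

```python
def label_change_prone(scc_per_interface):
    """scc_per_interface: dict mapping interface name -> #SCC.
    Assumed criterion: change-prone iff #SCC is above the median."""
    values = sorted(scc_per_interface.values())
    n = len(values)
    median = (values[n // 2] if n % 2 else
              (values[n // 2 - 1] + values[n // 2]) / 2)
    return {name: scc > median for name, scc in scc_per_interface.items()}

# Hypothetical #SCC counts for four interfaces of one project.
scc = {"ISession": 40, "IQuery": 3, "ICache": 12, "IType": 1}
print(label_change_prone(scc))  # median = 7.5 -> ISession, ICache change-prone
```

These binary labels are what the three classifiers are trained on; the metric vectors (CK set vs. CK+IUC set) are the features, and AUC on held-out folds measures the models' ranking quality.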
Prediction – AUC values
                       NBayes           LibSVM             NN
      Project        CK      IUC      CK      IUC      CK      IUC
ecl.team.cvs.core   0.55    0.75    0.692   0.811   0.8     0.8
ecl.debug.core      0.75    0.79    0.806   0.828   0.85    0.875
ecl.debug.ui        0.66    0.72    0.71    0.742   0.748   0.766
Hibernate2          0.745   0.807   0.735   0.708   0.702   0.747
Hibernate3          0.835   0.862   0.64    0.856   0.874   0.843
ecl.jdt.debug       0.79    0.738   0.741   0.82    0.77    0.762
ecl.jface           0.639   0.734   0.607   0.778   0.553   0.542
ecl.team.core       0.708   0.792   0.617   0.608   0.725   0.85
ecl.team.ui         0.88    0.8     0.74    0.884   0.65    0.75
update.core         0.782   0.811   0.794   0.817   0.675   0.744
      Median        0.747   0.791   0.722   0.814   0.736   0.764
Results
•  H1 ACCEPTED
   •  IUC has a stronger correlation with #SCC of interfaces than the C&K metrics
   •  IUC shows the best correlation

•  H2 PARTIALLY ACCEPTED
   •  IUC can improve the performance of prediction models to classify Java interfaces into change- and not-change-prone
   •  despite the improvements, the Wilcoxon test showed a significant difference only for LibSVM
Implications
•  Researchers
   •  take into account the nature of the measured entities

•  Quality Engineers
   •  enlarge metrics suites

•  Developers and Architects
   •  measure ISP violations
Future Work

•  Metrics measurement over time

•  Further validation

•  Are the shared methods the problem?

•  Component-based and service-oriented systems