On Analyzing and Developing Data Contracts in Cloud-based Data Marketplaces

Hong-Linh Truong (1), G.R. Gangadharan (2), Marco Comerio (3), Schahram Dustdar (1), Flavio De Paoli (3)

(1) Distributed Systems Group, Vienna University of Technology
(2) Institute for Development & Research in Banking Technology (IDRBT), India
(3) Department of Informatics, Systems and Communication, University of Milano-Bicocca


                                     truong@infosys.tuwien.ac.at
                             http://www.infosys.tuwien.ac.at/Staff/truong
APSCC 2011, 12 Dec 2011, Jeju, Korea
Outline

 Background and motivation

 Analysis of data contracts

 Model of abstract data contracts

 Experiments




Background
 The rise of data-as-a-service (DaaS) and data marketplaces
 Data contracts are important
    Give clear information about data usage
    Have a remedy against the consumer where the
     circumstances are such that the acts complained of do
     not
    Limit the liability of data providers in case of failure of
     the provided data
    Specify information on data delivery, acceptance, and
     payment


Motivation
 Well-researched contracts for services but not for DaaS and
  data marketplaces
    But service APIs != data APIs != data assets
 Several open questions
    Right to use data? Quality of data in the data agreement? Search
     based on data contract? Etc.


 ➔ Require extensible models
    ➔ Capture contractual terms for data contracts
    ➔ Support (semi-)automatic data service/data selection techniques


Study of main data contract terms
 Data rights
    Derivation, Collection, Reproduction, Attribution
 Quality of Data (QoD)
    Not mentioned; not clear how to establish QoD metrics
 Regulatory Compliance
    Sarbanes-Oxley, EU data protection directive, etc.
 Pricing model
    Different models, pricing for data APIs and for data assets
 Control and Relationship
    Evolution terms, support terms, limitation of liability, etc.

    Most information is in human-readable form
Data contract study




Developing data contracts in cloud-based data marketplaces

 Our approach
   Follow a community-based approach for data contracts
   Propose generic structures to represent data
    contract terms and abstract data contracts
   Develop frameworks for data contract applications
   Incorporate data contracts into data-as-a-service
    description
   Develop data contract applications




Community view on data contract development
 Community users can develop:
    Term categories, term names, values, and units
    Rules for data contracts
    Common contracts and contract fragments




 Community users != novice users
Representing data contract terms
 Contract term: (termName,termValue)
    Term name: common terms or user-specific terms
    Term value: a single value, a set, or a range
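
The (termName, termValue) structure above could be sketched as follows. This is a minimal illustration, not the representation used in the prototype: the class names `ContractTerm` and `ValueRange`, the `unit` field, and all example values are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ValueRange:
    """A term value given as a closed range [low, high] (hypothetical helper)."""
    low: float
    high: float


# A term value is a single value, a set, or a range.
TermValue = Union[str, float, bool, frozenset, ValueRange]


@dataclass(frozen=True)
class ContractTerm:
    term_name: str          # common term or user-specific term
    term_value: TermValue
    unit: str = ""          # assumed optional unit, e.g. for QoD metrics


def value_kind(term: ContractTerm) -> str:
    """Classify the value of a term as 'single', 'set', or 'range'."""
    value = term.term_value
    if isinstance(value, ValueRange):
        return "range"
    if isinstance(value, (set, frozenset)):
        return "set"
    return "single"


# Illustrative terms; the names and values are made up for this example.
attribution = ContractTerm("Attribution", "required")
rights = ContractTerm("dataRight", frozenset({"Derivation", "Reproduction"}))
accuracy = ContractTerm("accuracy", ValueRange(0.9, 1.0), unit="ratio")
```

Keeping the three value kinds distinct lets a rule decide how two terms compare, for example set inclusion for sets and overlap for ranges.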




Structuring abstract data contracts

 Concrete data contracts can be in RDF, XML or JSON
 Use identifiers and tags for identification and search

[Figure: structure of an abstract data contract; arrow label "generates"]
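
As a rough sketch of how a concrete JSON contract with an identifier and tags might look, the snippet below serializes terms grouped by category and searches contracts by tag. The field names (`id`, `tags`, `terms`, `category`, `termName`, `termValue`) and both function names are assumptions for illustration, not the schema from the paper.

```python
import json


def to_json_contract(contract_id, tags, terms):
    """Serialize a contract to JSON.

    `terms` maps a term category to a dict of termName -> termValue.
    All field names in the output are hypothetical.
    """
    doc = {
        "id": contract_id,
        "tags": sorted(tags),
        "terms": [
            {"category": category, "termName": name, "termValue": value}
            for category, names in sorted(terms.items())
            for name, value in sorted(names.items())
        ],
    }
    return json.dumps(doc, indent=2)


def find_by_tag(contracts, tag):
    """Return the identifiers of all parsed contracts carrying `tag`."""
    return [c["id"] for c in contracts if tag in c["tags"]]


# Illustrative contract; identifier, tags, and values are made up.
example = json.loads(
    to_json_contract(
        "contract-001",
        {"co2", "building"},
        {"dataRight": {"Derivation": True, "Reproduction": True}},
    )
)
matches = find_by_tag([example], "co2")
```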
Development of contract applications
 Main applications:
    Data contract compatibility evaluation
    Data contract composition
 This paper does not deal with them, but there are
  some common steps
    Extract DCTermType in TermCategoryType
        Extract comparable terms from all contracts,
           e.g., dataRight: Derivation, Composition and Reproduction
    Use evaluation rules associated with DCTermType
     from rule repositories
    Execute rules by passing comparable terms to rules
    Aggregate results
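
The common steps above can be sketched as a small pipeline. This is only an illustration under simplifying assumptions: contracts are plain dictionaries (category -> termName -> value), the rule repository is an in-memory dict named `RULES`, and the single rule `all_allow` (every contract must grant the right) is hypothetical.

```python
from typing import Callable, Dict, List

Contract = Dict[str, Dict[str, object]]  # category -> termName -> value


def extract_comparable(contracts: List[Contract], category: str) -> Dict[str, list]:
    """Steps 1-2: collect the terms of one category that occur in every contract."""
    per_contract = [c.get(category, {}) for c in contracts]
    common = set(per_contract[0]) if per_contract else set()
    for terms in per_contract[1:]:
        common &= set(terms)
    return {name: [terms[name] for terms in per_contract] for name in sorted(common)}


def all_allow(values: list) -> bool:
    """Hypothetical rule: compatible only if every contract grants the right."""
    return all(v is True for v in values)


# Step 3: a stand-in for a rule repository, keyed by term category.
RULES: Dict[str, Callable[[list], bool]] = {"dataRight": all_allow}


def evaluate(contracts: List[Contract], category: str) -> dict:
    """Steps 4-5: execute the rule on each comparable term and aggregate."""
    rule = RULES[category]
    comparable = extract_comparable(contracts, category)
    per_term = {name: rule(values) for name, values in comparable.items()}
    return {"perTerm": per_term, "compatible": bool(per_term) and all(per_term.values())}


# Illustrative contracts; the values are made up.
contract_a = {"dataRight": {"Derivation": True, "Composition": True, "Reproduction": True}}
contract_b = {"dataRight": {"Derivation": True, "Composition": False,
                            "Reproduction": True, "Attribution": True}}
result = evaluate([contract_a, contract_b], "dataRight")
```

Here `Attribution` is skipped because it appears in only one contract, and the aggregate is a simple conjunction; a real application would likely use richer rules and aggregation.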
Prototype
 RDF for representing term
  categories, term names, term
  values, units
 AllegroGraph for storing
  contract knowledge




Illustrating examples
 A large sustainability monitoring data platform
  shows how green the buildings are
    Real-time total and per capita CO2 emissions
     of monitored buildings
    Open government data about CO2 per capita at
     the national level
 We created contracts from
   Open Data Commons Attribution License
   Open Government License



Existing common knowledge about Open Data Commons

Step 2: provide OpenBuildingCO2
 OpenBuildingCO2 by modifying quality of data and data right
 OpenGov for government data

[Figure: Data contract for green building data]
Experiments: composing data contract terms




Conclusions and future work
 Emerging data marketplaces and DaaS
    But lack of data contract support
    What constitutes data contracts has not been deeply
     investigated
 Our contribution:
    Analysis of data contracts
    An approach and framework to support data contracts
 Future work
    Work on domain-specific applications
    Integrate data contracts with data agreement
     exchange and data selection and composition
     frameworks
    Integrate data contracts into DEMODS [AINA 2012]
Thanks for your attention!

         Hong-Linh Truong
         Distributed Systems Group
         Vienna University of Technology
         Austria

         truong@infosys.tuwien.ac.at
         http://www.infosys.tuwien.ac.at/staff/truong



