Page1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
We do Hadoop together.
Modern Data Architecture for Data Transformation and Acquisition
with Oracle® and Apache™ Hadoop®
Quick Housekeeping
Q&A box is available for your questions
Webinar will be recorded for future viewing
Thank you for joining!
Your Presenters
• Jeff Pollock
– Vice President, Product Management, Oracle
– Previously responsible for IBM InfoSphere Information
Integration & Governance products
– Author of “Semantic Web for Dummies” and “Adaptive
Information”
• Tim Hall
– Vice President, Product Management, Hortonworks
– Previously responsible for Oracle’s outbound product
management covering the Business Process
Management Suite, SOA Suite
Today’s Topics
• Drivers for the Modern Data Architecture
• New Analytic Applications for New Types of Data
• Hadoop as the solution for Data Lake
• Hortonworks and Oracle Data Integration teaming up
• Oracle patterns for successful Data Reservoirs
• Oracle Data Integration Strengths in Hadoop
• Oracle Data Governance for Hadoop
• Q&A
Poll: Where are you in your Hadoop journey?
1. Researching our options
2. Currently evaluating some software
3. Deep in a trial
4. In production with a Hadoop cluster
5. What’s Hadoop?
A Data Architecture Under Pressure From New Data
APPLICATIONS
DATA SYSTEM
REPOSITORIES
SOURCES
Existing Sources
(CRM, ERP, Clickstream, Logs)
RDBMS EDW MPP
Business
Analytics
Custom
Applications
Packaged
Applications
Source: IDC
2.8 ZB in 2012
85% from New Data Types
15x Machine Data by 2020
40 ZB by 2020
OLTP, ERP, CRM Systems
Unstructured documents, emails
Clickstream
Server logs
Sentiment, Web Data
Sensor, Machine Data
Geolocation
Hadoop Within An Emerging Modern Data Architecture
OPERATIONS TOOLS
Provision,
Manage &
Monitor
DEV & DATA TOOLS
Build &
Test
DATA SYSTEM
REPOSITORIES
SOURCES
RDBMS EDW MPP
OLTP, ERP,
CRM
Systems
Documents,
Emails
Web Logs,
Click
Streams
Social
Networks
Machine
Generated
Sensor
Data
Geolocation
Data
Governance
& Integration
Security
Operations
Data Access
Data Management
APPLICATIONS
Business
Analytics
Custom
Applications
Packaged
Applications
Clickstream
Capture and analyze
website visitors’ data
trails and optimize
your website
Sensors
Discover patterns in
data streaming
automatically from
remote sensors and
machines
Server Logs
Research logs to
diagnose process
failures and prevent
security breaches
Hadoop Value: New types of data
Sentiment
Understand how
your customers feel
about your brand
and products –
right now
Geographic
Analyze location-based data to
manage operations
where they occur
Unstructured
Understand patterns
in files across
millions of web
pages, emails, and
documents
New Analytic Applications For New Types Of Data
Financial Services
• New Account Risk Screens
• Fraud Prevention
• Trading Risk
• Maximize Deposit Spread
• Insurance Underwriting
• Accelerate Loan Processing
Retail
• 360° View of the Customer
• Analyze Brand Sentiment
• Localized, Personalized Promotions
• Website Optimization
• Optimal Store Layout
Telecom
• Call Detail Records (CDRs)
• Infrastructure Investment
• Next Product to Buy (NPTB)
• Real-time Bandwidth Allocation
• New Product Development
Manufacturing
• Supplier Consolidation
• Supply Chain and Logistics
• Assembly Line Quality Assurance
• Proactive Maintenance
• Crowdsourced Quality Assurance
Healthcare
• Genomic data for medical trials
• Monitor patient vitals
• Reduce re-admittance rates
• Store medical research data
• Recruit cohorts for pharmaceutical trials
Utilities, Oil & Gas
• Smart meter stream analysis
• Slow oil well decline curves
• Optimize lease bidding
• Compliance reporting
• Proactive equipment repair
• Seismic image processing
Public Sector
• Analyze public sentiment
• Protect critical networks
• Prevent fraud and waste
• Crowdsource reporting for repairs to infrastructure
• Fulfill open records requests
… And Incrementally Delivers A ‘Data Lake’
Data Lake
• An architectural shift in the
data center that uses
Hadoop to deliver deeper
insight across a large,
broad, diverse set of data at
efficient scale
SCALE
SCOPE
A Modern Data Architecture/Data Lake
New Analytic Apps
New types of data
LOB-driven
RDBMS
MPP
EDW
Governance
& Integration
Security
Operations
Data Access
Data Management
The Modern Data Architecture
Oracle Data Integration
• Eliminates need for
separate ETL engine –
and associated H/W,
admin, overhead
• Non-invasive real-time
data staging into Hadoop
• Streamlines development
by providing capability to
separate Logical from
Physical mappings
• Reduces risk and
compliance exposure via
comprehensive data
governance
Oracle & Hortonworks
YARN Ready Partner
Certified on latest release of
Hortonworks Data Platform
Sandbox tutorial
Tutorial for
HWX Sandbox
Coming Soon!
ORCL Sandbox
Here Now!
Copyright © 2014 Oracle and/or its affiliates. All rights reserved.
Oracle Data Integration & Governance
Dynamic Data Movement
– Low impact capture
– Continuous data staging
Data Transformation
– Bulk data movement
– Pushdown data processing
Data Federation
– Virtualized Data Services
Data Quality & Verification
– Fix quality at the source
– Verify data consistency
Metadata Management
– Lineage and Impact Analysis
– Business Glossary Semantics
Data Governance
Foundation
Oracle Data Integrator
(Transformation)
Enterprise Data Quality
(Profile, Cleanse, Match and De-duplicate)
Fast
Load
Oracle GoldenGate
(Movement)
Enterprise Metadata Management & Business Glossary
(Business Glossary, Data Lineage, Impact Analysis and Data Provenance)
Data Service Integrator
(Federation)
GoldenGate Veridata
(Online Data Verification)
ELT Processing
on Hadoop or SQL
Continuous Availability
Comprehensive capabilities for the end-to-end data integration
and governance of all data – including Hadoop based data
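The "Verify data consistency" item above can be pictured with a small sketch. It assumes that source and target rows are available as dictionaries keyed by primary key, and it compares them by hashing each row. The names `row_digest` and `verify` and the sample rows are hypothetical; this is not GoldenGate Veridata's algorithm or API, only an illustration of the general idea of online data verification.

```python
import hashlib


def row_digest(row):
    """Hash a row's columns in a fixed order so equal rows give equal digests."""
    canonical = "|".join(f"{col}={row[col]}" for col in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(source, target):
    """Compare two keyed row sets and report rows that are out of sync."""
    report = {"missing_in_target": [], "extra_in_target": [], "mismatched": []}
    for key in sorted(source.keys() | target.keys()):
        if key not in target:
            report["missing_in_target"].append(key)
        elif key not in source:
            report["extra_in_target"].append(key)
        elif row_digest(source[key]) != row_digest(target[key]):
            report["mismatched"].append(key)
    return report


# Hypothetical sample data: a source table and its staged copy.
source_rows = {
    1: {"name": "Ann", "city": "Austin"},
    2: {"name": "Bob", "city": "Boston"},
    3: {"name": "Cy", "city": "Chicago"},
}
target_rows = {
    1: {"name": "Ann", "city": "Austin"},
    2: {"name": "Bob", "city": "Denver"},   # drifted value
    4: {"name": "Dee", "city": "Dallas"},   # row not present in the source
}

report = verify(source_rows, target_rows)
print(report)
```

Comparing digests rather than full rows keeps the amount of data exchanged between the two sides small, which is one reason hash-based comparison is a common choice for this kind of check.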
Leverage Wide Range of Modern Analytic Styles
How to Succeed With a Big Data Reservoir
Do:
– Directly link to a Line of Business
initiative
– Iterate on short cycles, plan for
small high-value deliverables along
the way
– Use tools, not only custom coded
programs
Do Not:
– Start with a techie-led research
project without a business objective
– Overpromise business results on
the market hype alone
– Assume MapReduce is the answer
to all your technical challenges
DBMS
(on prem or cloud)
Data First
Analytics
Model First
Analytics
Streaming
Analytics
Maximizing benefits:
1. Schema on Read
2. Cheaper Compute
3. Cheaper Storage
3 Core Patterns of Big Data Reservoir Success
DBMS
(on prem or cloud)
Sandbox
ETL Offload
Staging
Deep Data
Storage
Data Sandbox:
– Leader: Line of Business (LoB)
– Value: Faster access to business data, Faster
time to value on Analytics
– Innovation: Schema-on-read empowers
rapid staging and Data Discovery
ETL Offload:
– Leader: Information Technology (IT)
– Value: Cost avoidance on DW/Marts
– Innovation: YARN/Hadoop empowers lower
cost compute and lower cost storage
Deep Data Storage:
– Leader: Risk / Compliance (LoB)
– Core Value: High fidelity aged data
– Innovation: SQL on Hadoop engines enable
very low cost, queryable data access
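As a rough illustration of the schema-on-read idea behind the Data Sandbox pattern, the sketch below stores raw delimited text unchanged and applies a schema only when the data is read. The sample events, the schema definitions, and the function `read_with_schema` are hypothetical and stand in for what an engine such as Hive would do over files in HDFS.

```python
import csv
import io

# Raw clickstream lines are kept exactly as they arrived; nothing is enforced on write.
RAW_EVENTS = """2014-06-01T10:00:00,u1,/home,200
2014-06-01T10:00:05,u2,/cart,500
2014-06-01T10:00:09,u1,/checkout,200
"""


def read_with_schema(raw_text, schema):
    """Apply a list of (column name, converter) pairs to raw text at read time."""
    rows = []
    for fields in csv.reader(io.StringIO(raw_text)):
        rows.append(
            {name: convert(value) for (name, convert), value in zip(schema, fields)}
        )
    return rows


# Two hypothetical consumers project different schemas onto the same raw data.
ops_schema = [("ts", str), ("user", str), ("path", str), ("status", int)]
marketing_schema = [("ts", str), ("visitor", str), ("page", str)]

errors = [r for r in read_with_schema(RAW_EVENTS, ops_schema) if r["status"] >= 500]
pages = [r["page"] for r in read_with_schema(RAW_EVENTS, marketing_schema)]

print(errors)
print(pages)
```

Because the raw data is never reshaped on load, a new consumer only needs a new schema definition, which is what makes staging and discovery fast in this pattern.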
Leverage Wide Range of Modern Analytic Styles
Data First
Analytics
Model First
Analytics
Streaming
Analytics
Oracle Approach to Big Data Integration is Superior
DBMS
(on prem or cloud)
Sandbox
ETL Offload
Staging
Deep Data
Storage
Data Governance
Foundation
Oracle Data Integrator
(Transformation)
Enterprise Data Quality
(Profile, Cleanse, Match and De-duplicate)
Oracle GoldenGate
(Movement)
Enterprise Metadata Management & Business Glossary
(Business Glossary, Data Lineage, Impact Analysis and Data Provenance)
GoldenGate Veridata
(Online Data Verification)
Oracle GoldenGate:
– Non-invasive data capture
– Low-latency data movement
– Full or partial records staging
– Most proven integration tool worldwide
Oracle Data Integrator:
– No ETL engine is required
– Logical design separate from physical
– Deploys in Hadoop or off cluster
– Many options for movement
Metadata & Glossary:
– Search Driven
– Business Friendly
– Huge 3rd Party Support
– Automated Metadata Stitching
Leverage Wide Range of Modern Analytic Styles
Data First
Analytics
Model First
Analytics
Streaming
Analytics
Oracle GoldenGate Capabilities for Big Data
HDFS (Files)
HBase (NoSQL)
Hive / Hive Streaming (SQL)
Flume & Storm (Streaming)
Kafka (MPP Pub/Sub)
Spark Streaming (Machine Learning)
Capture Database Transactions and
Deliver to Big Data in Real-Time
Capture
Trail
Route
Deliver
Pump
GoldenGate
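The capture, trail, and deliver stages named above can be sketched in a few lines. This is a simplified simulation under stated assumptions: transactions arrive as (operation, key, row) tuples, the trail is an in-memory queue, and the target is a dictionary. The pump and route stages are omitted, and none of the names here are GoldenGate APIs.

```python
from collections import deque


def capture(transactions):
    """Turn committed source transactions into ordered change records (the trail)."""
    trail = deque()
    for seq, (op, key, row) in enumerate(transactions, start=1):
        trail.append({"seq": seq, "op": op, "key": key, "row": row})
    return trail


def deliver(trail, target):
    """Apply change records to the target in commit order."""
    while trail:
        change = trail.popleft()
        if change["op"] == "delete":
            target.pop(change["key"], None)
        else:  # insert or update: write the latest row image
            target[change["key"]] = change["row"]
    return target


# Hypothetical transactions committed on a source database.
source_txns = [
    ("insert", 1, {"name": "Ann", "city": "Austin"}),
    ("insert", 2, {"name": "Bob", "city": "Boston"}),
    ("update", 1, {"name": "Ann", "city": "Denver"}),
    ("delete", 2, None),
]

staged = deliver(capture(source_txns), target={})
print(staged)
```

Only the changes travel to the target, so the staged copy stays current without re-reading the full source table, which is the contrast with batch movement that the following slide draws.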
Business Value of the GoldenGate Approach
Continuous Data Staging
– Don’t make the business wait
– CDC is by default, not an add-on
– Least invasive on sources
– Hadoop staging is fresh
Integrated, Native Capture
– Don’t create unnecessary risk
– Keep current with DB patches
– Certainty around licensing
– Proven best performance
Most Widely Proven
– 1000’s of customers
– Most demanding high volume
– Used for High Availability (HA)
– Dependable results
vs.
Batch Data Movement
– Typical ETL vendors all default to batch data
movement in their reference architectures
– Changed Data is an immature add-on
– ETL loading into Hadoop is mainly “batch mode”
Clumsy & Risky Data Capture
– Not in sync with Oracle Database versions
– Some can “talk the talk” but their CDC tech can’t
touch Oracle GoldenGate scale/performance
– Patches and Licensing create business risk
Niche, Low-End
– Some vendors only cover a few platforms
– Some vendors are broad, but don’t scale
– Few vendors have the reliability and dependability
to cover HA use cases
vs.
vs.
…the “Other Vendors”
Oracle Data Integrator (ODI) Capabilities for Big Data
[Diagram: components shown include Logs, OLTP DB, API/File, NoSQL, Any DW; Flume, SQOOP, OGG; ODI; Hive on MR, Tez, Spark; Pig on MR, Tez, Spark; Spark; Oozie; Hive/HCat, HDFS, HBase; OEDQ (Data Validation & Cleansing); OEMM (Metadata Mgmt & Lineage)]
Map once at the logical level, and then choose which Big Data or
Hadoop framework you want to run in!
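A minimal sketch of "map once at the logical level" follows, assuming a mapping can be described as a source, a target, a column list, and a simple filter. The class `LogicalMapping`, the generator functions, and the table names are hypothetical, and the generated statements are only illustrative; they are not what ODI knowledge modules actually emit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalMapping:
    """Engine-neutral description of a mapping (hypothetical structure)."""
    source: str
    target: str
    columns: tuple
    predicate: str  # kept to simple comparisons that both engines accept


def to_hive(m):
    """Render the mapping as a single HiveQL statement."""
    cols = ", ".join(m.columns)
    return f"INSERT INTO TABLE {m.target} SELECT {cols} FROM {m.source} WHERE {m.predicate}"


def to_pig(m):
    """Render the same mapping as a Pig Latin script over HCatalog tables."""
    cols = ", ".join(m.columns)
    return "\n".join([
        f"src = LOAD '{m.source}' USING org.apache.hive.hcatalog.pig.HCatLoader();",
        f"flt = FILTER src BY {m.predicate};",
        f"out = FOREACH flt GENERATE {cols};",
        f"STORE out INTO '{m.target}' USING org.apache.hive.hcatalog.pig.HCatStorer();",
    ])


# The physical choice is a lookup; the logical mapping itself never changes.
PHYSICAL = {"hive": to_hive, "pig": to_pig}


def generate(mapping, engine):
    return PHYSICAL[engine](mapping)


mapping = LogicalMapping("orders", "big_orders", ("order_id", "amount"), "amount > 100")
print(generate(mapping, "hive"))
print(generate(mapping, "pig"))
```

Keeping the mapping free of engine syntax is the design point: switching execution frameworks means choosing a different generator, not redesigning the mapping.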
Business Value of ODI: Low Cost and High Dev Efficiency
No ETL engine is
required
Separation of
Logical and
Physical design
Physical exec on
SQL, Hive, Pig, or
Spark
Runtime exec in
Oozie or via ODI
Java Agent
Rich set of pre-
built operators
User defined
functions
Eliminate your ETL Engines and improve Developer efficiency –
now, everybody can be a Big Data developer!
Data Flow Approaches to Big Data Integration
1. Traditional ETL Tools
(execute entirely outside of Hadoop)
2. ETL Tools with Native “on” Hadoop
(require proprietary code on Data Nodes)
3. Manual Coding
(ultimate flexibility, but at a very high cost)
4. ODI Native in Hadoop
(no ETL Engine & no Data Node footprint)
*small ODI Agent may optionally install off cluster or
on Name Node, no dependencies on Data Nodes
[Diagram: four Hadoop cluster layouts, one per approach; components shown include ETL, HDFS, Sqoop, Hive, Pig, Spark, Oozie, Manual Code, ODI, and GG; the ODI layout is marked BEST]
Oracle Metadata Management & Glossary for Big Data
Comprehensive Data Lineage
Business Friendly Navigation
Business & IT Collaboration
Easy to Use, Search Driven
Value of Metadata and Business Glossary
My dashboard does not match this report…why?
Where did this data come from?
Where can I find the data I need for analytics?
Which ETL mappings or BI Reports will be affected by my column change?
What systems does the data flow through?
TRUSTED DATA IT CERTAINTY
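Questions such as "which reports will be affected by my column change?" and "where did this data come from?" are graph traversals over lineage metadata. The sketch below assumes lineage is available as a map from each asset to the assets that read from it. All asset names are hypothetical, and this is not Oracle Enterprise Metadata Management's data model or API.

```python
from collections import deque

# Hypothetical lineage edges: each asset maps to the assets that consume it.
LINEAGE = {
    "crm.customers.email": ["etl.load_customers"],
    "etl.load_customers": ["dw.dim_customer.email"],
    "dw.dim_customer.email": ["bi.churn_dashboard", "bi.campaign_report"],
    "erp.orders.amount": ["etl.load_orders"],
    "etl.load_orders": ["dw.fact_orders.amount"],
    "dw.fact_orders.amount": ["bi.campaign_report"],
}


def walk(start, edges):
    """Breadth-first traversal returning every asset reachable from `start`."""
    seen, order, queue = set(), [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


def reverse(edges):
    """Flip consumer edges into producer edges for upstream (lineage) queries."""
    flipped = {}
    for parent, children in edges.items():
        for child in children:
            flipped.setdefault(child, []).append(parent)
    return flipped


# Impact analysis: what is affected if this column changes?
impacted = walk("crm.customers.email", LINEAGE)

# Lineage: where did the data in this report come from?
origins = walk("bi.campaign_report", reverse(LINEAGE))

print(impacted)
print(origins)
```

Impact analysis and lineage are the same walk in opposite directions, which is why a single harvested metadata graph can answer both the IT and the business questions on this slide.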
Big Data Governance Lifecycle Tooling
Operational Data Flows
Business Sources
Quality KPIs Case
Management
Governance Cockpit for Data Stewards & Stakeholders
Exception
Review
Metadata
Management
Business
Glossary
Design Time
Support People and Processes with an end-to-end tooling capability!
…to manage Risk/Compliance
• Records retention
• Rediscovery
• Litigation support
• Data access management
• Information security and protection
Minimize corporate liability through proper
governance of data
…to drive Business Value
• Metadata discovery
• Metadata & glossary cataloging
• Data profiling
• Data cleansing lifecycle
• Data remediation
Maximize opportunity by ensuring trusted
data is easily available for data driven
business processes
The Data Governance Opportunity with Big Data
Solving business and IT data challenges
Most Heterogeneous, Deep 3rd Party Coverage
Operational Integration (Movement / Transformation)
• Hadoop HBase
• Hadoop Hive/Flume
• HP Enscribe
• HP NonStop
• HP Neoview
• Hypersonic SQL
• IBM DB2 i Series
• IBM DB2 UDB
• IBM DB2 z Series
• IBM Informix
• IBM Netezza
• JMS / MQ
• Microsoft Access
• Microsoft SQL Server
• MySQL
• Pivotal Greenplum
• PostgreSQL
• Salesforce.com
• SAP BW / BI
• SAP ERP / ECC
• SAS
• SQL/MP
• SQL/MX
• Sybase ASE
• Sybase IQ
• Teradata
• Oracle Database
• Oracle Exadata
• Oracle Big Data Appliance
• Oracle TimesTen
• Oracle OLAP
• Oracle Business Intelligence
• Oracle BI Applications
• Oracle E-Business Suite
• Oracle JD Edwards Enterprise One
• Oracle JD Edwards World
• Oracle Fusion Applications
• Oracle Governance Risk and Compliance
• Oracle Fusion AIA
• Oracle Retail Applications
• Oracle Agile BI / DW
• Oracle Agile PLM for Process
• Oracle iFlex FlexCUBE
• Oracle iFlex Mantas
• Oracle Hyperion Applications
• Oracle PeopleSoft
• Oracle Siebel CRM / OnDemand
• Oracle Communications
• Oracle WebLogic Server
• Oracle Coherence Data Grid
• Oracle SOA Suite
• Oracle Enterprise Service Bus
Metadata Harvesting (Glossary, Lineage & Impact Analysis)
• Adaptive
• Altova
• Apache HCatalog
• Apache Hive/HQL
• Borland
• CA ERwin
• Cloudera Impala
• COBOL Copybook
• DataStax
• Embarcadero
• EMC ProActivity
• GentleWare
• Google BigQuery
• Grandite
• Hadapt Hive
• Hortonworks Hive
• IBM Cognos
• IBM DB2
• IBM DataStage
• IBM Discovery
• IBM Federation Server
• IBM Lotus Notes
• IBM Netezza
• IBM Rational Rose
• IBM Rational Architect
• Informatica Metadata Mgr.
• Informatica PowerCenter
• CoSORT
• ISO SQL Standard (DDL)
• MapR Hadoop Hive
• MicroFocus
• Microsoft Access
• Microsoft Office Excel
• Microsoft Visio
• Microsoft SQL Server
• Microsoft SSIS
• Microsoft Visual Studio
• MicroStrategy
• Magic Draw
• OMG CWM Standard
• OMG UML Standard
• Oracle BI Answers
• Oracle BI Enterprise Edition
• Oracle BI Server
• Oracle DAC
• Oracle Data Integrator
• Oracle Data Modeler
• Oracle Database
• Oracle Designer
• Oracle Hyperion Applications
• Oracle Hyperion Essbase
• Oracle Warehouse Builder
• Pivotal Greenplum
• PostgreSQL
• QlikView
• SAP BO Crystal Reports
• SAP BO Designer
• SAP BO Desktop Intelligence
• SAP BO Repository
• SAP BO Data Integrator
• SAP BO Data Steward
• SAP Master Data Management
• SAP Sybase PowerDesigner
• SAP Sybase ASE Database
• SAS Data Integration Studio
• SAS BI Server
• SAS Information Map
• SAS Metadata Management
• SAS OLAP Server
• Select
• Sparx Architect
• Syncsort
• Tableau
• Talend
• Teradata
• Tigris
• Visible
• W3C DTD & XSD Schema
+ open APIs and standards
based meta-model
No other vendor can compare:
• 50+ systems for Operational Integration
• 70+ systems for Metadata Harvesting
Data Governance
Foundation
Differentiated Technical Approach from Oracle
Dynamic Data Movement
– Real-time by default, not ETL
– Least invasive on sources
– Proven best performance
– Native Oracle integration
No ETL Engines
– Take processing to the data;
don’t move the data
– Leverage the data engines for
workloads (Hadoop or SQL)
Most Heterogeneous
– Leverage open source Hadoop,
not proprietary distributions
– Hadoop is the Hub, not ETL tools
– Open metadata standards
Oracle Data Integrator
(Transformation)
Enterprise Data Quality
(Profile, Cleanse, Match and De-duplicate)
Fast
Load
Oracle GoldenGate
(Movement)
Enterprise Metadata Management & Business Glossary
(Business Glossary, Data Lineage, Impact Analysis and Data Provenance)
Data Service Integrator
(Federation)
GoldenGate Veridata
(Online Data Verification)
ELT Processing
on Hadoop or SQL
Continuous Availability
Comprehensive capabilities for the end-to-end data integration
and governance of all data – including Hadoop based data
Question & Answer session will be conducted electronically,
using the panel to the right of your screen
About Oracle and Hortonworks
hortonworks.com/partner/oracle/
Get started with Hortonworks Sandbox
hortonworks.com/sandbox
Follow us:
@hortonworks @Oracle
Learn more
Oracle.com/goto/dataintegration

Hortonworks Oracle Big Data Integration
