CONSULTING SOLUTIONS OUTSOURCING
PARTNER FOR A NEW ERA
Transform Your Business with Big Data and Hortonworks
Tom Kersnick – Pactera – Director Big Data Solutions
Robby Richardson – Hortonworks – Enterprise Account Manager
Topics
© Pactera. Confidential. All Rights Reserved.
1 Hortonworks Intro
2 Who is Hortonworks?
3 Hortonworks HDP: Enterprise Hadoop Distribution
4 Hadoop 2.0: The Enterprise Generation
5 Pactera Intro
6 Big Data Deep Dive
Hortonworks Snapshot
• We distribute the only 100% Open Source Enterprise Hadoop Distribution: Hortonworks Data Platform
• We engineer, test & certify HDP for enterprise usage
• We employ the core architects, builders and operators of Apache Hadoop
• We drive innovation within Apache Software Foundation projects
• We are uniquely positioned to deliver the highest quality of Hadoop support
• We enable the ecosystem to work better with Hadoop
Develop, Distribute, Support
We develop, distribute and support the ONLY 100% open source Enterprise Hadoop distribution
Endorsed by Strategic Partners
Headquarters: Palo Alto, CA
Employees: 200+ and growing
Investors: Benchmark, Index, Yahoo
Rapid Customer Growth
Hortonworks HDP: Enterprise Hadoop 1.x Distribution
Diagram (HDP 1.x stack), deployable on OS, Cloud, VM, or Appliance:
Hadoop Core: HDFS, MapReduce
Platform Services (Enterprise Readiness): High Availability, Disaster Recovery, Security and Snapshots
Data Services: Hive (HCatalog), Pig, HBase; Load & Extract: Sqoop, Flume, NFS, WebHDFS
Operational Services: Oozie, Ambari
Hortonworks Data Platform (HDP): Enterprise Hadoop
• The ONLY 100% open source and complete distribution
• Enterprise grade, proven and tested at scale
• Ecosystem endorsed to ensure interoperability
Hadoop 2.0… The Enterprise Generation
Big Data: Transactions, Interactions, Observations
Single Platform, Multiple Use: BATCH, INTERACTIVE, ONLINE
Business Value:
1.0 Architected for the Large Web Properties
2.0 Architected for the Broad Enterprise
Enterprise Requirements     Hadoop 2.0 Features
Mixed workloads             YARN
Interactive Query           Hive on Tez
Reliability                 Full Stack HA
Point in time Recovery      Snapshots
Multi Data Center           Disaster Recovery
ZERO downtime               Rolling Upgrades
Security                    Knox Gateway
HDP: Enterprise Hadoop 2.0 Distribution
Diagram (HDP 2.0 stack), deployable on OS/VM, Cloud, or Appliance:
Hadoop Core: HDFS, YARN*, MapReduce, Tez*, Other
Platform Services (Enterprise Readiness): High Availability, Disaster Recovery, Rolling Upgrades, Security and Snapshots
Data Services: Hive & HCatalog, Pig, HBase; Load & Extract: Sqoop, Flume, NFS, WebHDFS, Knox*
Operational Services: Oozie, Ambari, Falcon*
Hortonworks Data Platform (HDP): Enterprise Hadoop
• The ONLY 100% open source and complete distribution
• Enterprise grade, proven and tested at scale
• Ecosystem endorsed to ensure interoperability
Seamless Interoperability with Microsoft Tools
• Integrated with Microsoft tools for native big data analysis
» Bi-directional connectors for SQL Server and SQL Azure through SQOOP
» Excel ODBC integration through Hive
• Addressing demand for Hadoop on Windows
» Ideal for Windows customers with Hadoop operational experience
• Enables most common Hadoop workloads in the Enterprise
» Data refinement and ETL offload for high-volume data landing
» Data exploration for discovery of new business opportunities
» Data enrichment for fine-tuned delivery and recommendation engines
Diagram: Microsoft Applications (APPLICATIONS) on top of Hortonworks Data Platform for Windows (DATA SYSTEMS), fed by DATA SOURCES:
Traditional Sources (RDBMS, OLTP, OLAP), including OLTP and POS systems
New Sources (web logs, email, sensor data, social media), including mobile data
Transferring Our Hadoop Expertise to You
The expert source for
Apache Hadoop training & certification
• World class training programs designed to help you learn fast
• Role-based hands-on classes with 50% lab time
• Certification to demonstrate Hadoop Expertise in Development and Administration
• Expert consulting services
• Programs designed to transfer knowledge
• Industry leading Hadoop Sandbox
• Free download
• Fastest way to learn Apache Hadoop
• Personal, portable Hadoop environment
Hortonworks Summary
• Leading the Innovation in Core Hadoop
• Addressing the requirements for Enterprise usage
• Enabling interoperability of the ecosystem
• No lock-in. 100% Open Source.
• Best in industry support with flexible pricing model
• Find out more:
» www.hortonworks.com/hadoop-training/
» www.hortonworks.com/sandbox
Big Data is Critical
Challenges to Using Big Data
Given that nearly one-third of businesses are in the dark about their available data, it makes sense that silos are the primary hurdle in using this information.
• 51%: Lack of sharing data is an obstacle to measuring marketing ROI
• 45%: Not using data effectively to personalize marketing communications
• 42%: Not able to link data together at the individual customer level
• 39%: Data collected infrequently or not quickly enough
• 29%: Too little or no customer/consumer data
What Initiatives Are Using Big Data
Obstacles to Define Big Data ROI
Not enough skilled resources for adaptation
• Advance competencies
Traditional IT Architectures cause limitations
• Identifying the right technologies
• Adapting to particular needs
• Assemble business use cases
• Silos
Optimizing Solutions
• Strong internal use cases
• Inability to effectively automate data
Keys to a Successful Big Data Initiative
Define the Impact
• Short-term vs. long-term measures
What cannot be answered today?
• This is your starting point
Create User Centric Internal Applications
• Decision support framework
Predicting the Consumer
• Algorithms, Models, Testing, and More Testing!
Solution Architecture using Multiple Ecosystems
Diagram: incoming and outgoing data flows around a Real Time In-Memory Solution, an EDW, Hadoop (with Sandbox), and Models, Algorithms, and Simulations. Numbered callouts 1 to 9 correspond to the steps below.
1. Data feeds into a real-time in-memory solution that ingests data into the EDW, Hadoop, and other platforms such as mobile, APIs, etc.
2. ELT streaming into the in-memory solution to provide visibility to real-time social, mobile, and shell approaches to Algorithms, Models, and Simulations.
3. In-memory real-time solution, such as Storm running on YARN, to digest data to the EDW, Hadoop, social media, and other such platforms.
4. EDW for structured information from the sources in 1.
5. Hadoop for semi-structured and unstructured data. Solution architecture including Sandbox availability.
6. Shell UI interfaces utilizing data from the real-time in-memory solution as well as the EDW, Hadoop, etc. for Models, Algorithms, and Simulations.
7. Structured and unstructured reporting in reporting interfaces.
8. Deep-dive analytics in Hadoop and real-time streaming.
9. Real-time customer interaction for social and other similar platforms.
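The split between steps 4 and 5 (structured data to the EDW, semi-structured and unstructured data to Hadoop) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Router` class, its `ingest` method, and the rule "a flat dict of scalar values counts as structured" are hypothetical and do not come from the deck or from any Hadoop API.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Router:
    """Toy stand-in for the real-time in-memory layer (hypothetical)."""
    edw: List[Any] = field(default_factory=list)     # structured rows (step 4)
    hadoop: List[Any] = field(default_factory=list)  # everything else (step 5)

    def ingest(self, record: Any) -> str:
        # Assumption: a flat dict whose values are all scalars is "structured".
        is_structured = isinstance(record, dict) and all(
            isinstance(v, (str, int, float, bool)) for v in record.values()
        )
        if is_structured:
            self.edw.append(record)
            return "edw"
        # Raw text, nested documents, and so on land in Hadoop.
        self.hadoop.append(record)
        return "hadoop"


router = Router()
router.ingest({"order_id": 1, "total": 9.5})          # flat row
router.ingest("GET /flights?from=SFO HTTP/1.1")       # raw log line
router.ingest({"user": "a", "events": [1, 2]})        # nested document
```

In a real deployment the routing rule would be driven by the schema of each feed rather than by inspecting individual records.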
Predictive Analysis Use Case for Online Travel Company
Flight Cost by Variants Determination
Data feeds use real-time in-memory streaming to execute matching algorithms that count, within each session, how often users view particular one-way and round-trip flights.
Predictive analytics algorithms then decide whether to increase or decrease prices based on views, market pricing, time, and availability.
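As a rough sketch of the idea on this slide, the Python below counts flight views per session and then applies a simple pricing rule over views, market pricing, time, and availability. Everything here is an assumption made for illustration: the function names, the `FlightSignal` fields, the weights, and the 10% cap are hypothetical and are not the model used by the travel company.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Tuple


def count_session_views(events: Iterable[Tuple[str, str]]) -> Counter:
    """Count views per (session_id, flight_id) from a stream of view events."""
    views = Counter()
    for session_id, flight_id in events:
        views[(session_id, flight_id)] += 1
    return views


@dataclass
class FlightSignal:
    base_price: float
    views: int               # total views of this flight
    market_price: float      # competitor or market reference price
    seats_left: int          # availability
    days_to_departure: int   # time


def adjust_price(s: FlightSignal, max_change: float = 0.10) -> float:
    """Return an adjusted price using illustrative, hand-picked weights."""
    change = 0.05 * min(s.views, 100) / 100   # demand: up to +5%
    if s.seats_left < 10:
        change += 0.03                        # scarcity
    if s.days_to_departure < 7:
        change += 0.02                        # urgency
    if s.base_price > s.market_price:
        change -= 0.05                        # priced above the market
    change = max(-max_change, min(max_change, change))
    return round(s.base_price * (1 + change), 2)


views = count_session_views([("s1", "SFO-JFK"), ("s1", "SFO-JFK"), ("s2", "SFO-JFK")])
total_views = sum(views.values())
new_price = adjust_price(FlightSignal(200.0, total_views, 250.0, 5, 3))
```

A production system would replace the hand-picked weights with a fitted predictive model and would compare candidate models against each other.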
Diagram: incoming sources (http, logs, partners, custom) feed a Real Time In-Memory Solution (Storm), which writes to outgoing destinations (rdbms, hadoop, application, mobile).
Solution Architecture using YARN
• Created to manage resource needs across all uses
• Ensures predictable performance & QoS for all apps
• Enables apps to run “IN” Hadoop rather than “ON”
» Key to leveraging all other common services of the Hadoop platform: security, data lifecycle management, etc.
Applications Run Natively IN Hadoop
Diagram (layers, bottom to top): HDFS2 (Redundant, Reliable Storage); YARN (Cluster Resource Management); application engines running on YARN:
BATCH (MapReduce)
INTERACTIVE (Tez)
STREAMING (Storm, S4,…)
GRAPH (Giraph)
IN-MEMORY (Spark)
HPC MPI (OpenMPI)
ONLINE (HBase)
OTHER (Search) (Weave…)
Pactera Big Data Capability
Big Data Solution Architecture
• In-Memory Solutions
• Scalable Distributed Platforms
Next Generation Analytics
• Models, Algorithms, and Simulations
• Visualization
Improving Operational Ability
• Help companies drive more operational efficiencies from existing investments.
• Moving from the realm of data scientists into everyday business transactions and encounters.
New Business Processes
• Impact on both customer intelligence and operational efficiency by making everything immediately actionable.
• Armed with immediate decision-making capability and intelligence, companies will be able to implement new business processes that will change how business is done.
• We ask the Right Questions
How Pactera can help with Big Data
Executive Workshop
Strategies, Planning, and Expectations
• Big Data strategy on what tomorrow will look like
• Using Big Data to establish market dominance
• Big Data project takeaways
• Roadblocks to implementing Big Data analytics
• Defining an ROI for Big Data
• Getting the right ROI on Big Data
Workshop
(4 Hours)
Proof of Concept
(2-4 Weeks)
Projects:
• Benchmark & Monitoring
• Integrations & Migrations
• Implementation & Architecture
• Project Management
• Analytics
• Reporting
Technical Workshop
End-To-End Management
• System tuning/auto-tuning and configuration management
• Dealing with both structured and unstructured data
• Monitoring, diagnosis, and automated behavior detection
Solution Architecture
• Processor, memory, and system architectures for data analysis
• Benchmarks, metrics, and workload characterization for big data
• Availability, fault tolerance and recovery issues
• Data management and analytics for vast amounts of unstructured data
Thank You
Tom Kersnick
tom.kersnick@pactera.com
Robby Richardson
rrichardson@hortonworks.com


Editor's Notes

  • #12 Big Data is critical in organizations just to keep up with the masses. In most retail organizations, internal data is hard to interpret when trying to understand your customer as well as demand. Publications state that one-third of retailers are in the dark regarding data that could be available to them. The silo approach within organizations is the primary cause of the broken data pipeline. The primary reasons this is a hurdle are:
    * The lack of sharing data: definitely a major obstacle in measuring ROI
    * Misuse of available data in marketing communications: not able to personalize directly to your customer
    * Linking data at the customer level: this is needed to thoroughly understand user behavior
    * Infrequent data collection: only extracting from logs and online serving systems used within your traditional reporting ecosystem
    * Not enough customer data: not capturing the details of the customer (including proper timings of viewed products, key indicators on why a user looks at one product versus another, and so on)
  • #18 Flight Cost Variant Determination. Flight Cost is one of the algorithmic methods used to adjust price, and therefore revenue, based on page views, consumer marketing, and time spent on a particular one-way or round-trip flight by a consumer. The goal is not only to provide alternatives, but also to increase or decrease cost while other consumers are viewing the same flights. This is determined by sales from all related airlines and competitors during the flight availability. This method can be extended to use other sources as well.
    Destinations: web applications, mobile applications, Hadoop, RDBMS.
    In the solution architecture shown, the in-memory solution processes views, marketing, customer behavior, time, and competitor results to derive an increased or decreased price for a given one-way or round-trip flight. This allows the travel company to determine the proper pricing based on these measures within an algorithm. The architecture also allows the travel company to try out other predictive models at any point in time to see whether one model outperforms another. These models could use similar measures and outcomes as well as new measures derived from the predictive models. Overall, this is a win for the travel company: it never loses revenue from the original "bread and butter" model it always applies. Fascinating, right?
    As you can see in the outgoing destinations, this provides consistent results on all platforms, giving a clear understanding of how the travel company is generating results overall. The solution can provide endless results based on predictive models that can be applied in real time. Any day, any time, any millisecond.
  • #21 Pactera offers complete life-cycle solutions within your organization. We offer a free 4-hour executive and technical workshop within your organization. We just ask you to fill out a one-page questionnaire to help us understand your expectations. The executive workshop covers strategy, planning, and your current and future goals. The technical workshop is a deep dive involving end-to-end management and a proper solution architecture based on your current and upcoming goals. Once the workshops are complete, we will provide you with an assessment of the outcome. We also offer a 2-4 week proof of concept to ensure your project is put into action. And finally, we offer full life-cycle services in the following:
    * Benchmark & Monitoring
    * Integrations & Migrations
    * Implementation & Architecture
    * Project Management
    * Analytics
    * Reporting