© 2015 IBM Corporation
BigInsights on Cloud
Hadoop-as-a-Service
July 28th, 2015
Disclaimer
IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remain at our sole discretion.
Agenda
• Evolution of the Big Data Analytics space
• Open Data Platform and IBM’s BigInsights
• Hadoop as a Service – BigInsights on Cloud Options
• IBM Analytics for Hadoop – Free, 14-day trial
• BigInsights for Apache Hadoop – Bare Metal option for Production
• Demo
• Questions & Answers
• Resources
Big Data continues to be a hot topic in the market

“At the World Economic Forum last month in Davos, Switzerland, Big Data was a marquee topic. A report by the forum, ‘Big Data, Big Impact,’ declared data a new class of economic asset, like currency or gold.”

“Companies are being inundated with data, from information on customer-buying habits to supply-chain efficiency. But many managers struggle to make sense of the numbers.”

“Increasingly, businesses are applying analytics to social media such as Facebook and Twitter, as well as to product review websites, to try to ‘understand where customers are, what makes them tick and what they want,’ says Deepak Advani, who heads IBM’s predictive analytics group.”

“Big Data has arrived at Seton Health Care Family, fortunately accompanied by an analytics tool that will help deal with the complexity of more than two million patient contacts a year…”

“Data is the new oil.” (Clive Humby)

“The Oscar Senti-meter, a tool developed by the L.A. Times, IBM and the USC Annenberg Innovation Lab, analyzes opinions about the Academy Awards race shared in millions of public messages on Twitter.”

“…now Watson is being put to work digesting millions of pages of research, incorporating the best clinical practices and monitoring the outcomes to assist physicians in treating cancer patients.”
Big Data implementations are driving real business value for IBM customers

An automotive company is running a series of experiments to better understand and adapt to the shifting landscape of urban transportation: it streams data from sensors on cars with InfoSphere Streams and analyzes it on Hadoop with BigInsights on Cloud.

An industrial manufacturer in the United States reduces errors and the time required for engine calibrations by 90 percent, and improves reliability and new product design, by using sensors to collect information on its products in the field and analyzing it with InfoSphere BigInsights.
Rich capabilities in IBM’s Big Data portfolio mean lower risk and more successful projects
On-premises, cloud, and “as a Service”
BigInsights
Open Data Platform and IBM BigInsights
Open Data Platform Initiative
Why is IBM involved?
• Strong history of leadership in open source & standards
• Supports our commitment to open source currency in all future releases
• Accelerates our innovation within Hadoop & surrounding applications
Open Data Platform (ODP) vs. Apache Software Foundation (ASF)
• ODP supports the ASF mission
• ASF provides a governance model around individual projects without looking at the ecosystem as a whole
• ODP aims to provide a vendor-led, consistent packaging model for core Apache components as an ecosystem
All standard Apache open source components: HDFS, YARN, MapReduce, Ambari, HBase, Spark, Flume, Hive, Pig, Sqoop, HCatalog, Solr/Lucene
BigInsights for Apache Hadoop
IOP + IBM Value Adds = BigInsights
IBM value adds:
• SQL on Hadoop: Big SQL, optimized ANSI-compliant SQL
• Application Tooling: toolkits and accelerators
• Search & Entity Matching: Watson Explorer, Big Match
• Data Visualization: BigSheets spreadsheet interface
• Predictive Modeling: Big R, machine learning
• Text Analytics: advanced text processing with AQL, text extraction web interface
• Real-time Analytics: Streams
• Data Governance and Security: DataClick, LDAP, secure cluster
• Storage Integration: GPFS, a POSIX distributed filesystem
• Enterprise Manageability: Adaptive MapReduce, multi-tenant scheduling
IBM Open Platform (IOP): Knox, Ambari, Snappy, Open JDK, Avro, Solr, Oozie, Flume, Slider, Pig, Hadoop (HDFS/MapReduce/YARN*), Zookeeper, Parquet, HBase, Spark, Hive, Sqoop
BigInsights Users & Role-Based Modules
IBM Open Platform
BigInsights for
Apache Hadoop
BigInsights on Cloud
IBM Open Platform uses Ambari
BigInsights Home
IBM BigInsights – BigSheets
Spreadsheet-style analysis tool for business users
Easily visualize big data using rich built-in graphing and analytic functions
Big SQL in BigInsights
Architecture: a BigInsights application issues SQL through a JDBC/ODBC driver to the JDBC/ODBC server and the Big SQL engine, which reads the data sources: Hive tables, HBase tables, and native sources (CSV, SEQ, Parquet, RC, Avro, ORC, JSON, custom).
• ANSI SQL 2011 compliant
• IBM’s SQL for Hadoop
  • Makes Hadoop data accessible to a wider audience
  • Familiar, widely known syntax
  • Leverages native Hadoop data sources
• Complements the data warehouse
  • Exploratory analytics
  • Sandbox, data lake
• Included in BigInsights
• Use familiar SQL tools
  • Cognos, SPSS, Tableau, MicroStrategy
IBM BigInsights – Text Analytics
Information Extraction Framework for Text Analytics
Example of text analytic tooling: a graphical interface to describe the structure of various textual formats, from log file data to natural language. Users do not need to know AQL.
IBM BigInsights – Big R
End-to-end integration of R into BigInsights
• Explore, visualize, transform, and model big data using familiar R syntax and paradigm
• Scale out R
  • Partitioning of large data (“divide”)
  • Parallel cluster execution of pushed-down R code (“conquer”)
  • All of this from within the R environment (Jaql and MapReduce are hidden from you)
• Almost any R package can run in this environment
Diagram: R clients, embedded R execution, R packages, and data sources. (1) Pull data summaries to the R client, or (2) push R functions right onto the data.
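Big R’s “divide” and “conquer” steps follow a familiar split-apply-combine pattern. The sketch below is a hedged illustration in Python, not Big R’s API: it assumes an in-memory list stands in for distributed data and a thread pool stands in for cluster nodes, and the function names (`divide`, `conquer`, `partial_summary`, `mean_from_summaries`) are hypothetical. A per-partition function is pushed to each partition, and only small (count, sum) summaries are pulled back to the client and combined there.

```python
# Hedged sketch of the divide/conquer pattern in plain Python (not Big R).
# Assumptions: an in-memory list stands in for distributed data, and a
# thread pool stands in for cluster nodes. All names are hypothetical.
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple, TypeVar

R = TypeVar("R")


def divide(data: Sequence[float], n_parts: int) -> List[Sequence[float]]:
    """Split data into at most n_parts contiguous partitions ("divide")."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    if not data:
        return []
    size = -(-len(data) // n_parts)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def conquer(parts: List[Sequence[float]],
            fn: Callable[[Sequence[float]], R]) -> List[R]:
    """Run fn on every partition concurrently ("conquer"), keeping input order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(fn, parts))


def partial_summary(part: Sequence[float]) -> Tuple[int, float]:
    """Per-partition (count, sum): a small result that is cheap to pull to the client."""
    return len(part), sum(part)


def mean_from_summaries(summaries: List[Tuple[int, float]]) -> float:
    """Combine the pulled summaries into a global mean on the client."""
    count = sum(c for c, _ in summaries)
    total = sum(s for _, s in summaries)
    return total / count


if __name__ == "__main__":
    readings = list(range(1, 101))  # stand-in for a large data set
    summaries = conquer(divide(readings, 4), partial_summary)
    print(summaries)  # [(25, 325), (25, 950), (25, 1575), (25, 2200)]
    print(mean_from_summaries(summaries))  # 50.5
```

Returning (count, sum) rather than a per-partition mean is a deliberate choice: means of unequal-sized partitions cannot simply be averaged, whereas counts and sums combine exactly on the client.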
IBM BigInsights – Cloud deployment options
Manage less, analyze more

IBM Analytics for Hadoop
• Prototype and create mash-ups in the cloud for non-production use
• Empowers developers to rapidly derive insight from all data
• Two-node Docker instance
• Enterprise features: BigSheets, Big SQL, Text Analytics, and Big R
• Delivered via IBM Bluemix
• 50 GB of input data space
• Extendable, free 14-day trial

BigInsights for Apache Hadoop
• For production deployments at scale in the cloud
• Delivers flexibility and efficiency with BYOL (bring your own license) and PAYG (pay as you go) pricing
• Scales to meet spikes in demand without on-premises infrastructure
• Perform enterprise-class, complex analytics on Big Data
• Available via the IBM Cloud Marketplace
• Web-based UI for sizing and pricing
IBM Analytics for Hadoop Details
• Free 14-day trial on www.bluemix.net
BigInsights for Apache Hadoop – Options
Secure, Dedicated Bare-metal
Infrastructure
IBM Open Platform
BigInsights for
Apache Hadoop
IBM BigInsights on Cloud – Security
• Dedicated, isolated environment for every client
• Administrative control owned by the customer at the Hadoop and BigInsights level
• Native HDFS encryption; optional Guardium encryption
• Firewalls provide perimeter security and private network isolation
• Aiming for ISO 27001 (27K1) compliance in 2015
• Example configuration…
Non-shared physical machines for added security & performance
BigInsights on Cloud
Demonstration
The IBM Difference
• IBM delivers the foundation for Big Data, now and in the future
  • Embraces open source
  • Establishes standards
  • Integrates with familiar interfaces and established systems
  • Delivers advanced analytic capabilities
• IBM is the only vendor providing…
  • Hadoop as a managed service in the cloud
  • A single company providing Hadoop-based software, cloud, and services
• Provides expertise to help you on your journey
  • 6,000 partners
  • Analytics services and solution centers
IBM BigInsights on Cloud – unique capability
Built-in Twitter Decahose service
• Scaled-down random sample of the Twitter Firehose
• Easily land Twitter data into BigInsights HDFS
• Manipulate and visualize data using BigSheets
• Incorporate sentiment data into analytic models
• Easily store and accommodate vast data sets
Check out more data management services at www.bluemix.net:
• Cloudant
• dashDB
• BigInsights on Cloud
• DB2 on Cloud
Resources
• Big Data University – Free Training
  http://bigdatauniversity.com/
• Powered by Hadoop
  http://wiki.apache.org/hadoop/PoweredBy
• Free Trial Software (both for on-premises and cloud)
  http://www-01.ibm.com/software/data/infosphere/hadoop/trials.html
• YouTube Videos
  • Watson
    • The Science Behind the Answer (~7 minutes)
    • Watson: Final Jeopardy (~11-minute summary)
  • Big Data Channel
    • http://www.youtube.com/user/ibmbigdata
Thank You
Merci (French), Grazie (Italian), Gracias (Spanish), Obrigado (Portuguese), Danke (German), Multumesc (Romanian), Teşekkür ederim (Turkish)

Get Started Quickly with IBM's Hadoop as a Service
