Copyright © 2014, Oracle and/or its affiliates. All rights reserved.
Storm Overview & Comparison to OEP
Prabhu Thukkaram
Senior Director, Engineering, OEP
 The comparisons and opinions
expressed here are my own and do
not represent the position of my
employer.
Storm Realtime Computation
 Distributed & fault tolerant platform for realtime computations
 Storm is to realtime computation what Hadoop is to batch processing
 Born at Twitter for implementing real-time Twitter analytics
 Now an open-source Apache project
 Replaces the typical “Queues and Workers” paradigm used in real-time
message processing
 How was Twitter Analytics implemented before Storm?
Twitter Analytics Before Storm
[Figure: Twitter Fire Hose]
Disadvantages of Queue/Worker Paradigm
 Lack of Scalability
 Adding a second-level worker requires reconfiguring the first-level
workers. Tuples are routed by hash(url) mod #second-level-workers, so
changing the worker count forces rehashing
 No HA or Fault tolerance
 Tedious to code
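
The rehashing problem can be illustrated with a small sketch. It is not code from the talk: the class `ModuloRouting`, its method names, and the example URLs are hypothetical, and `String.hashCode()` stands in for whatever hash function the first-level workers actually used. It only shows that with hash(url) mod N routing, growing the pool from N to N+1 workers sends some URLs to a different worker than before.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of hash(url) mod #second-level-workers routing.
class ModuloRouting {
    // Pick the second-level worker for a URL, as a first-level worker would.
    static int workerFor(String url, int workerCount) {
        // floorMod keeps the index non-negative even when hashCode() is negative.
        return Math.floorMod(url.hashCode(), workerCount);
    }

    // Count how many URLs land on a different worker after the pool is resized.
    static int countMoved(List<String> urls, int workersBefore, int workersAfter) {
        int moved = 0;
        for (String url : urls) {
            if (workerFor(url, workersBefore) != workerFor(url, workersAfter)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        // Example URLs are made up for the demonstration.
        List<String> urls = Arrays.asList(
                "http://example.com/a", "http://example.com/b",
                "http://example.com/c", "http://example.com/d");
        int moved = countMoved(urls, 3, 4);
        System.out.println(moved + " of " + urls.size() + " URLs changed worker");
    }
}
```

Any URL that moves takes its partial counts with it, which is why every first-level worker has to be reconfigured in step when the second level grows.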
Storm Advantages
 High Availability
 Guaranteed message processing
 Fault tolerant
 Superb Performance
 No intermediate message brokers
 Millions of messages a second
 Horizontal Scalability
 Dramatic increase in workload? Just add a node!
 Higher level abstraction than message passing
Storm Cluster
[Diagram: one Master Node (Nimbus), three Zookeeper nodes, five Supervisor nodes]
Storm Concepts & OEP Equivalents
Storm concept → OEP equivalent
 Stream → Stream
 Tuple → Event
 Spout → Adapter
 Bolt → Processor
 Topology → EPN (Event Processing Network)
Storm Topology
Note:
No intermediate message brokers between bolts.
Processors within OEP are typically separated by channels.
Parallelism in Storm
 Spouts and Bolts are inherently parallel
 User code in Spouts and Bolts runs as tasks on multiple threads; the
degree of parallelism is configurable
 Tasks pass messages directly to each other
 Channels in Oracle Event Processing provide Concurrency, Ordering, and Flow Control
Stream Grouping
 Determines which consuming Bolt “task” receives an emitted tuple
 Shuffle grouping – Send to a random task
 Fields grouping – Send to a specific task, chosen by hashing a subset
of the tuple fields, so equal field values always reach the same task
 All grouping – Send to all tasks; use with care
 Global grouping – Send to the task with the lowest ID
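
As a rough illustration of the four groupings, the sketch below picks target task indexes in plain Java. It is not Storm's implementation: the class `GroupingSketch` and its methods are hypothetical, tasks are assumed to be numbered 0 to taskCount - 1, and the fields grouping is approximated by hashing the grouping values modulo the task count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical helper that mimics how each grouping could choose target tasks.
class GroupingSketch {
    private final int taskCount;
    private final Random random = new Random();

    GroupingSketch(int taskCount) {
        if (taskCount <= 0) {
            throw new IllegalArgumentException("taskCount must be positive");
        }
        this.taskCount = taskCount;
    }

    // Shuffle grouping: any task, chosen at random.
    int shuffle() {
        return random.nextInt(taskCount);
    }

    // Fields grouping: equal grouping values always map to the same task.
    int fields(Object... groupingValues) {
        return Math.floorMod(Arrays.hashCode(groupingValues), taskCount);
    }

    // All grouping: every task receives a copy of the tuple.
    List<Integer> all() {
        List<Integer> targets = new ArrayList<>();
        for (int task = 0; task < taskCount; task++) {
            targets.add(task);
        }
        return targets;
    }

    // Global grouping: always the task with the lowest ID.
    int global() {
        return 0;
    }

    public static void main(String[] args) {
        GroupingSketch grouping = new GroupingSketch(3);
        System.out.println("shuffle -> task " + grouping.shuffle());
        System.out.println("fields(\"storm\") -> task " + grouping.fields("storm"));
        System.out.println("all -> tasks " + grouping.all());
        System.out.println("global -> task " + grouping.global());
    }
}
```

The property that matters for stateful bolts is the one `fields` provides: the same word (or URL, or user ID) always reaches the same task, so that task can keep a complete count for it.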
Word count with Storm
 Step 1: Create a Storm topology
TopologyBuilder t = new TopologyBuilder();
 Step 2: Create and add a Spout (parallelism 2)
t.setSpout("jmsSpout", new JMSSpoutQ("mySentenceQ"), 2);
 Step 3: Create and add a tokenizer Bolt (parallelism 3)
t.setBolt("tokenizer", new TokenGeneratorBolt(), 3).shuffleGrouping("jmsSpout");
Note: The consumer decides from where and how to receive the tuple
Word count with Storm
 Step 4: Create and add a counter Bolt (parallelism 3)
t.setBolt("wordCount", new WordCountBolt(), 3).fieldsGrouping("tokenizer", new Fields("word"));
 Step 5: Submit the topology
Map<String, Object> configuration = new HashMap<>();
configuration.put(Config.TOPOLOGY_WORKERS, 3);
StormSubmitter.submitTopology("my-word-count", configuration, t.createTopology());
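
The slides do not show the bodies of TokenGeneratorBolt or WordCountBolt. The sketch below is a plain-Java simulation, without the Storm API, of what those two stages might compute. The class `WordCountSketch`, the lower-casing of words, and the use of `String.hashCode()` to route on the "word" field are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the tokenizer and wordCount stages of the topology.
class WordCountSketch {
    // What a tokenizer bolt might emit for one sentence tuple: one word per tuple.
    static List<String> tokenize(String sentence) {
        List<String> words = new ArrayList<>();
        for (String word : sentence.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                words.add(word.toLowerCase());
            }
        }
        return words;
    }

    // Route each word to one of taskCount counter tasks by hashing the word,
    // then let each task keep its own running counts.
    static List<Map<String, Integer>> count(List<String> sentences, int taskCount) {
        List<Map<String, Integer>> tasks = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            tasks.add(new HashMap<>());
        }
        for (String sentence : sentences) {
            for (String word : tokenize(sentence)) {
                int task = Math.floorMod(word.hashCode(), taskCount);
                tasks.get(task).merge(word, 1, Integer::sum);
            }
        }
        return tasks;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList(
                "Complex Event Processing", "Oracle Event Processing");
        List<Map<String, Integer>> tasks = count(sentences, 3);
        for (int i = 0; i < tasks.size(); i++) {
            System.out.println("wordCount task " + i + ": " + tasks.get(i));
        }
    }
}
```

Because routing depends only on the word, each word's total lives in exactly one counter task, which is the effect the fields grouping on "word" is meant to achieve in Step 4.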
Word count Topology
[Diagram: JMS → jmsSpout (×2) → tokenizer (×3) → wordCount (×3); sample sentences "Complex Event Processing" and "Oracle Event Processing"]
Storm
 Distributed, scalable, and fault-tolerant framework for real-time computation, but the how and the details are really up to the application developer.
 No OOTB operators for stream or real-time data processing: correlation, rolling averages, event enrichment, pattern matching, missing events, etc.
 Time windows must be implemented and maintained by the application
OEP
 Holistic platform for developing, running, and managing real-time stream processing applications.
 Memory-optimized query processing engine for high volumes, with a simple programming model. E.g. detecting a W pattern in a stock-quote stream is 6 lines of CQL code vs. 260 lines of Java code
 Engine maintains window states by automatically including new events and flushing out expired events
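
To make the Storm-side point about application-maintained time windows concrete, here is a minimal sketch of a hand-rolled sliding window that keeps a rolling average. It is illustrative only: the class `RollingAverageWindow` is hypothetical, it assumes events arrive in timestamp order, and it leaves out concerns such as thread safety and state recovery that a real bolt would also have to handle.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sliding time window that the application itself must maintain.
class RollingAverageWindow {
    private static final class Sample {
        final long timestampMillis;
        final double value;

        Sample(long timestampMillis, double value) {
            this.timestampMillis = timestampMillis;
            this.value = value;
        }
    }

    private final long windowMillis;
    private final Deque<Sample> samples = new ArrayDeque<>();
    private double sum;

    RollingAverageWindow(long windowMillis) {
        if (windowMillis <= 0) {
            throw new IllegalArgumentException("windowMillis must be positive");
        }
        this.windowMillis = windowMillis;
    }

    // Include the new event, flush expired events, and return the current average.
    double add(long timestampMillis, double value) {
        samples.addLast(new Sample(timestampMillis, value));
        sum += value;
        // An event expires once it is at least windowMillis older than the newest one.
        long cutoff = timestampMillis - windowMillis;
        while (samples.peekFirst().timestampMillis <= cutoff) {
            sum -= samples.removeFirst().value;
        }
        // The newest sample is never evicted, so the window is never empty here.
        return sum / samples.size();
    }

    public static void main(String[] args) {
        RollingAverageWindow window = new RollingAverageWindow(1000);
        System.out.println(window.add(0, 10.0));
        System.out.println(window.add(500, 20.0));
        System.out.println(window.add(1200, 30.0));
    }
}
```

In the sketch, inclusion and expiry are a dozen lines of bookkeeping for one aggregate on one stream; the OEP column's claim is that the engine performs this bookkeeping itself.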
Storm
 Onus of event ingestion from different data sources falls on the application developer
 No OOTB support to correlate incoming stream data with SQL and NoSQL data sources.
 No OOTB spatial capabilities.
 No dynamic application reconfiguration. Requires code rewrite and redeployment.
OEP
 OOTB connectivity to a multitude of sources: JMS, Flume, Kafka, CSV, REST, HTTP, etc.
 Full support for SQL and NoSQL data sources with extensible data cartridges – Oracle, HBase, etc.
 OOTB functions for 2- and 3-dimensional spatial analysis
 Dynamically inject CQL and tweak threshold values without stopping the application
Storm
 No OOTB support for statistical functions
 Absence of IDE and Web development and administration tools
 No OOTB support for caching
 Fault tolerant but not HA! Application logic needs to implement HA
OEP
 CQL in OEP supports hundreds of statistical functions with OOTB integration to R
 Sports a full-fledged IDE for developing event processing applications and an Admin console for managing apps
 OOTB support for Coherence distributed data grid
 Active-active configuration with an exact replica of window/application state using clock synchronization.

Apache Storm and Oracle Event Processing for Real-time Analytics
