Modernizing the Analytics and Data Science Lifecycle
for the Scalable Enterprise: The SEAL Method
CTO and Head of Engineering & Analytics, Dfuse Technologies (“Data Fusion”)
Former Director of Data for Warner Bros Digital Networks
Jeff Bertman
Click2: For FUTURE VERSIONS of this Presentation Deck,
click this public online folder (in Microsoft OneDrive).
Click4: Data Con Speaker Page
(Scroll to “Bertman”) Also Contact via:
- www.LinkedIn.com/in/JeffBertman
- Jeff.Bertman@DfuseTech.com
- Jeff.TechBreeze@gmail.com
- Mobile +1 818-321-3111
- More contact info and links at end of deck
Click5: Dfuse Technologies Main Site (www.DfuseTech.com)
Click3: Whitepaper underlying this Presentation (Details)
#NOTES to Audience:
(1) Caution: This deck contains some
Hollywood “GLITZ” that could be
harmful to your Boringzola.
Please be prepared to Smile 😊
(2) Thanks for the honor of Serving the
Data Con LA Community!
Click1: Data Con Main Site (www.DataConLA.com)
2 (Rev 2020-11-01a)
Version 2020-11-01a: Future Updates are available in two ways. First, thru Dec 31, 2020, updates will be posted on DataConLA.com.
Long Term Future Updates of this Deck + Underlying Whitepaper are available via links and contact info on first and second-to-last slides.
Agenda
Topic Slides Remarks
Agenda and Special Reviewer Notes 1 – 3 Incl this Table of Contents.
Intros 4 – 5
Setting the Stage 6 – 8
Goals, General Approach, Biz and Tech Landscapes 9 – 21
Analytics and Data Science Lifecycles –– Old and New (SEAL) 22 – 36 ➢ SEAL is explained on slides 25 to 36
Emerging Tech (Relevant to SEAL) 37 – 42
Q&A and Wrap-up 43 – 44
Appendix 45 – 58 Some Additional Sample Artifacts: Roadmaps, Diagrams, Agile Retrospective, Lite SOP, etc
3
Notes to Audience
IMPORTANT
• Underlying Whitepaper:
Medium.com (https://link.medium.com/i864jb47f9) or ~same article on LinkedIn (click).
• Also See Appendix (at end): “If I had more time, this would have been shorter.”
-- Paraphrased from B. Pascal (mathematician) and H. D. Thoreau (philosopher).
• Some Slides are More Dense Than Typical:
Largely because basic info is supplemented with whitepaper style Callouts on certain slides.
• Minor Animations are used to Facilitate Readability for Dense Slides:
So not all content is displayed up-front (until you click through).
• PDF versions of the deck do Not Support the Dazzle (above):
On the flip side, PowerPoint dazzle in full decks sometimes runs poorly on slow network connections.
• Future Updates available:
See first and second-to-last slides for contact info and links.
4
➢ Data is our Flagship Boutique – Dfuse is short for “Data Fusion”.
Additional Practices cover broader FBI Needs (see “Beyond Data” below).
➢ Our fans call us "Data Doctors" – and we are well known as
presenters and expert/keynote panelists in Technical Forums sponsored by
Amazon AWS, Oracle, Microsoft, Snowflake, Hitachi Data, DigiMarCon (Marketing Tech Forum), et al.
➢ Ranked #84 in Inc 5000’s IT Services and #107 in IT Development for 2019 and 2020.
(See pic to right, and for other awards click here)
➢ Core Service Pillars (Biz and Tech): ”Data Everything,” E-Commerce, Health Industry, Financials, Marketing, Sales,
Telecom, Insurance, Gov/Military, Web/Mobile Dev, Tech/Cloud Infrastructure, DevOps, Cybersecurity, Compliances
➢ Major Success Stories for Huge Enterprises in Fortune 100/1000 (Apple, CVS Health, Amgen, EY, Wells Fargo,
New York & Company, et al) plus Related Government Spaces (DoD, FDA, NIH, HHS, Defense Health Agency, DoL, et al).
And our CTO’s personal background includes Verizon (saved $100 Million in 13 month period), Comcast, WBros, GEICO, CIGNA, Airlines.
➢ SMEs incl Authors, Expert Speakers, Former AWS Chief Data Scientist, Retired CIO Navy ONR, et al.
➢ Dfuse Certified Partnerships include Amazon AWS, Oracle (gold), Microsoft (silver), Symantec, Cofense (security), Proofpoint (enterprise
security), Alteryx (Gartner MQ Citizen Data Science), MemSQL (top rated Operational Analytics for Millions of Transactions/Sec), et al.
➢ Dfuse also conveys how we DEfuse or simplify the complexities of modern, advanced technologies to achieve real, measurable success
stories.
➢ ROI is a Specialty Topic: we present it at major national and regional conferences, mentor customers on it, and apply it to 360 initiatives. We make it
Goal Driven, e.g., to maximize market share, revenue, profit, cost savings, efficiencies, quality, fulfillment, security, productivity, and more.
Dfuse Technologies –– Corporate Highlights
5
Relevant Success Stories
• Small to HUGE Scale Analytic and Operational
Environments supporting B2C, B2B, and B2G
(E-Commerce and more).
• Corporate Scorecard – numerous gains in B2C,
B2B, B2G environments. A GEICO MVP, for
example, increased internet sales (approx. 26%)
and gross profit (approx. 10%).
• Built all GEICO’s Internet Analytics System
from scratch.
• Saved Verizon $100 Million in 13 month period.
• Grew > 3X in < 1 Year: Data Modernization for
WB Digital / Machinima.
• Founded & Led Enterprise Data Sharing
Initiatives for WBros, Gov Intel, et al.
• Real Time Analytics / Fraud Detection for
Insurance, Marketing, eCommerce, et al.
• Improved ROI, TCO, TEC, and Various Cost
Savings and Efficiencies for numerous business
and technical processes.
• Routinely work with CxOs of small to huge
multi-enterprises (incl Fortune 100/1000 + Gov).
Certifications and Speaker Engagements
Speaker at National & Regional Conferences – Marketing, Biz Apps, Tech
-- Focus on Big Data & Analytics --
(incl Expert Speaker, Keynote, and SME Panels)
Work Experience
(15+ Years Experience as Hybrid Professional: Chief Architect & Biz Optimization SME)
Jeff Bertman –– Career Highlights & Credentials
6
Presentation Abstract – Brief Discussion
It’s no secret that the roots of Data Science date back to the 1960s and were first mainstreamed
in the 1990s with the emergence of Data Mining. This occurred when commercially affordable
computers started offering the horsepower and storage necessary to perform advanced statistics to
scale.
However, the words “to scale” have evolved over time. The leap to “Big Data” is only one serial
aspect of growth. Beyond the typical 1-off studies that catalyzed the field of Data Mining, Data
Science now fulfills enterprise and multi-enterprise use cases spanning much broader and deeper
data sets and integrations. For example, AI and Machine Learning frameworks can interoperate
with a variety of other systems to drive alerting, feedback loops, predictive frameworks, prescriptive
engines, continual learning, and more. The deployment of AI/ML processes themselves often
involves integration with contemporary DevOps tools.
Now segue to SEAL, the Scalable Enterprise Analytics Lifecycle. In this presentation, you’ll learn
how to cover the major bases of modern Data Science projects (and Citizen Data Science as
well), from conception, learning, and evaluation through integration, implementation, monitoring,
and continual improvement. And as the name implies, your deployments will be performant and
scale as expected in today’s environments.
7
This deck contains more GLITZ than usual ☺ And there are layers on several slides.
For Best Viewing, DOWNLOAD the PowerPoint Show File (vs viewing online). Thanks!
Show Business WARNING
8
▪ Thousands of content creators (aka talent partners)
▪ Millions of videos on numerous platforms
▪ Billions of aggregate views / month
Expanding Within & Beyond Your Enterprise requires even Greater Scale
▪ BI/Data supports WB Digital Networks, Other WB Divisions, External Companies
▪ Distribution supports other WB Initiatives
Most Enterprises These Days Routinely Work in the . . .
Recent Years
Thousands… Millions… BILLIONS+
▪ Cornerstone Technologies (Big Data focus)
#DISCLAIMER: This example is the only WB related slide (non-proprietary).
For Example:
9
Goals and General Approach
11
High Level Goal and Considerations
GOAL:
Challenges
1) Need Speed & Agility Balanced with Due Diligence:
It’s tough to provide end-to-end solutioning from brainstorming
and concept through evaluation, development, delivery, and
integration into all the moving parts of an enterprise.
2) Need Innovation Balanced with Pragmatics:
Foster creativity, lateral thinking, and swift delivery while
avoiding typical pitfalls: business specifics or data
sources that are not accurately vetted (especially when working
swiftly), disconnects between the business need and the technical
solution that yield inaccurate results, or an under-performant
solution. Also need cohesive, cost-effective, adaptive
methods to ensure consistent delivery for each story plus
integration into the enterprise / multi-enterprise ecosystem.
Solutioning Approach
• The “SEAL” Lifecycle presented here is rooted in other
proven, mainstream methods, cleanly enhanced and
supplemented with critical gap fillers to accomplish our stated
goals.
• Lean Focus – Use Only the Parts We Need based on
story complexity, risks (more risk often but not always = more
due diligence), and mission / business criticality.
• Hybrid Solutioning involves multiple perspectives. At the
highest level, we drive from Top-Down business mission,
goals, objectives, KPIs and triangulate from Bottom-Up
technology and other enablers – to ensure accurate
solutioning as well as timely and cost-effective delivery.
• Economy of Scale is attained in many ways.
For example, uniting, sharing, and democratizing work
between formal data scientists and empowered business/data
analysts (aka Citizen Data Scientists).
Achieve maximum business outcomes by leveraging analytics/data science to deliver results
on/ahead of time, on/under budget, with enduring results and easily reproducible consistency.
12
LEVERAGE TECHNOLOGY
Architecture, Engineering, Methods, Libraries,
CM, QA, Security, SysOps, DevOps
DATA >> INFO >> KNOWLEDGE >> ACTION
Improve BIZ (Revenue, Profit, Market Share, Etc)
Always Grow BIZ Value –– Data Intelligence
BEST PRACTICES & ~SLAs
Continual Improvement, Serviceability,
Reliability, Performance, Governance
SERVICE ORIENTED Mindset Driven By Clear Mission, Values, Goals & Priorities:
Cost-Benefit + “Everyone is a Customer” Approach
Culture Credos: Be the Solution, Be the Boss, Value Each Other, A-Team, Executional Excellence, …
Example
Business Pillars
Enabled by
Data Technology
(Analytics & Operations)
13
Low Level
Processes
Get Stuff, Do Stuff, Put Stuff, Etc
Raw
Data
Structured, Semi-Structured, Unstructured
Information Technology
Data
Engineering
Data ►► Information
Software
Engineering
Tech ►► Biz Tools
Project/Product
Management,
QA, Security
Actualization
Apps, Visualizations, Analytics/AI/ML,
Reporting, Alerting, Extended Consumption
Value-Scape
Fulfillment Pyramid
Empower
BUSINESS
Gains thru
Technology
BUSINESS VALUE.
Improve Revenue, Profit, Market Share,
Mission Effectiveness, Efficiencies, ROI, TCO,
Quality, Timeliness, Accessibility, Safety, and other KPIs
14
Data
Integration
Structured, Semi-Structured, Unstructured
Data Platforms, Underlying
Network & Storage Tiers
From Landing and Data Lakes to Refined Data Stores/Hubs
Presentation
Optimization
Data ►► Information
Geo + Media
Management
Unstructured Content
►► Concepts
Data Curation,
Governance,
Meta, Catalog
Actualization
Apps, Visualizations, Analytics/AI/ML,
Reporting, Alerting, Extended Consumption
BUSINESS VALUE.
Improve Revenue, Profit, Market Share,
Mission Effectiveness, Efficiencies, ROI, TCO,
Quality, Timeliness, Accessibility, Safety, and other KPIs
Data
Virtualization / Fabric
Simplify Disparate Sources (& Targets)
Value-Scape
Fulfillment Pyramid
DATA
Landscape
15
Simplifying ROI with Accurate, Focused Results
• ROI’s inherently simple formula can quickly become quite cumbersome,
e.g., how to measure Customer Satisfaction. This is a main reason why
ROI is often discussed but seldom assessed (see empty seat in comic).
• But quite often remarkably accurate results can be obtained by blending a
Simple Yelp / Zagat “# stars” approach with Zachman based
perspectives to yield an At-a-Glance ROI guide for decisioning.
• Incorporate Lateral Thinking and Impact Analysis for accurate ROI.
Analyze direct and indirect costs, how talent is leveraged across various
data focused roles, impact across all environment tiers, transition and
training costs, ecosystem, interoperability, vendor viability, etc.
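As a rough illustration of the at-a-glance idea above, the sketch below pairs the basic ROI formula with a star-style rating. The six perspective names follow the Zachman interrogatives (What, How, Where, Who, When, Why). The function names, the sample figures, and the choice of a plain average are assumptions made for illustration only; they are not taken from ROI-AM.

```python
from __future__ import annotations

from statistics import mean


def roi(gain: float, cost: float) -> float:
    """Classic ROI: net gain divided by cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (gain - cost) / cost


# Hypothetical rating dimensions: one 1-5 star rating per Zachman interrogative.
ZACHMAN_PERSPECTIVES = ("what", "how", "where", "who", "when", "why")


def star_score(ratings: dict[str, int]) -> float:
    """Average the 1-5 star ratings into a single at-a-glance score."""
    for perspective in ZACHMAN_PERSPECTIVES:
        if perspective not in ratings:
            raise ValueError(f"missing rating for '{perspective}'")
        if not 1 <= ratings[perspective] <= 5:
            raise ValueError(f"rating for '{perspective}' must be 1 to 5")
    return mean(ratings[p] for p in ZACHMAN_PERSPECTIVES)


# Illustrative numbers only.
example_ratings = {"what": 4, "how": 3, "where": 5, "who": 4, "when": 3, "why": 5}
print("ROI:", roi(gain=150_000, cost=100_000))
print("Stars:", star_score(example_ratings))
```

A plain average keeps the guide simple; weighting the perspectives differently per initiative would be an equally reasonable variation.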
Effective ROI Requires Lateral Thinking & Impact Analysis (But Can Still be Simple)
Source: Daniel Kuperman “Marketing Humor” series (click)
Example Lateral Considerations for ROI for
Analytics / Data Science
• In-Place Analytics and Citizen Data Science can accelerate 1-off studies
and sometimes more (depending on tools). And they can save $$$ by
leveraging relatively inexpensive object storage, minimizing use of data
integration labor/tools, and providing economy of scale by empowering all
data savvy users with advanced analytics and asset sharing. BUT they can
also grow data silos, “swamps,” and other technical debt. And they can
defer the ability to perform broad and deep analytics, e.g., comparing and
contrasting YoY, across market segments, campaigns / initiatives, etc.
• Save ~75% Storage Costs for certain DW platforms that use object
storage natively with high performance (S3/Blob/ADLS). For example,
Snowflake, Microsoft Synapse (as of Nov 2019 release), EMR, etc. (FYI: later
in 2020 AWS is slated to introduce S3 for AWS Outposts, which will
influence this topic.)
16
A Glimpse of the ROI Ascendancy Model (ROI-AM)
NOTES: • The term ROI-AM has not yet been officially released (ETA 2021-Q1). The underlying techniques have many years of success stories in
Fortune 500 and Gov arenas. ROI-AM has been presented at major national and regional conferences (incl expert speaker engagements).
• The “Fiscal Technology Landscape (FITL)” is a key part of ROI-AM. Details (deck/whitepaper) available upon request.
• CMMI is NOT a part of ROI-AM, and not particularly endorsed by this author. It does align with ROI’s maturity theme to help people get the
idea. CMMI V2 released 2019-2020 is the first AGILE version -- much better than V1. CMMI ideas such as continual improvement are great!
17
BIZ ACTIVITIES >>
Conventional Data Flow –– Simple Landscape
18
BIZ ACTIVITIES >>
Most Actionable
#Discuss
[Diagram: conventional data flow across Raw and Curated data; “CUSTOM” / “TURBO” labels appear at three points in the flow]
• Purple Dotted Lines depict “Accelerator” Patterns
• Best of Both Worlds: Accelerate Time-to-Market while Eliminating Tech Debt
• Fyi Details are under the Circles and below the Data Drums
Tools listed on the slide:
Snowflake, Firebolt (eval)
Hadoop, HBase, $ BigTable
Parquet, CSV, JSON, Etc
Druid, $ MemSQL, $ Vertica
ELK, Cassandra, $ Dyn’DB
>>ETC<<
Conventional Data Flow –– plus Accelerator Patterns and Tools
20
• (Over)Abundant GREEN highlights NEW Components
• Fyi Specific AWS and Azure Components available in Complementary Slide (upon request)
• Cannot just “Deploy” per CRISP-DM
• INTEGRATE is the word
Sample Transactional & Analytic Multi-Purpose Landscape – Integration is CRITICAL
21
• (Over)Abundant GREEN highlights NEW Components
• #FUTURE: Make GCP and OCI Versions
• Cannot just “Deploy” per CRISP-DM
• INTEGRATE is the word
Breakdown (Just FYI – FUN ☺ Complementary Slide with AWS and Azure Components)
22
Analytics & Data Science Lifecycles
► Old and New ◄
23
Cross-Industry Standard Process for
Data Mining:
• Focused on Data Mining Silos
• Advent 1997
CRISP-DM (1997+)
Current Methodologies to Drive Analytics / Data Science Projects
Highlights:
• CRISP-DM: Wikipedia, Towards Data Science
• Data Preparation is often said to hold
“most of the work”
• Modeling (click) is the ML Core:
o Model / Algorithm Selection and Creation
(click for decision tree options, etc)
o Model Test Plan
o Parameter Testing & Tuning
24
Current Methodologies to Drive Analytics / Data Science Projects
Some Modern Alternatives
• SAS Institute: SEMMA — Sample Explore Modify Model Assess (click)
• IBM: ASUM-DM — Analytics Solutions Unified Method (click)
• Microsoft: TDSP — Team Data Science Process lifecycle (click)
• Collective University Study in Germany & South Korea, 2020: CRISP-ML(Q) — Focus on QA (click)
• Note about Model Selection & Evaluation:
This deck takes the same position as the University Study above: “There are plenty of ML models and it is out of the scope of this paper to compare and list their characteristics.
However, there are introductory books on classical methods… (click)”
Where They Fall Short (why not widespread yet?)
• People associate them with proprietary tools of their respective vendors.
• While filling certain gaps in CRISP-DM, they leave more to address:
• Scalability / Performance
• Accountability / SLAs
• DevSecOps (DevOps, Security, Compliances)
• Holistic QA and Continual Improvement
• Democratized Analytics / Citizen Data Science
• etc
• Applicable in all modern initiatives, and especially medium to large enterprises.
• Underlying SEAL whitepaper (click) has more info. (See “Info on Other Industry Methods” section.)
◄══ Personal Favorite
25
OLD: Cross-Industry Standard Process for Data Mining
(CRISP-DM)
• Focused on Data Mining Silos
• Advent: 1997
• Good Explanation (here)
NEW: Scalable Enterprise Analytics Lifecycle
(SEAL)
➢ Modernized Version of CRISP-DM (w/ considerations of other methods)
➢ Advent: 2020 based on Past Experience in Fortune 1000 and Government
➢ Read on …
CRISP-DM (1997+)
Scalable Enterprise Analytics Lifecycle (SEAL)
Juxtaposing OLD and New: Methodologies to Drive Analytics / Data Science Projects
[Diagram: SEAL lifecycle, compact view. From Start, the phases are Define Charter incl Goals & SLAs, Wrangle & Refine (Data), Fulfill & Validate Charter, and Implement & Improve, with Optimize feedback loops.]
26
Diving In with SEAL: Scalable Enterprise Analytics Lifecycle
➢ Multi-Faceted / Multi-Team support for data science as well as
conventional analytics and citizen/democratized analytics (self-service)
➢ Accommodates Modern Reference Architectures
(recall our Value-Scape)
A Modernization of CRISP-DM (and Other Proven Methods)
➢ Lifecycle is Closed Loop with Feedback and Optimization
➢ Charter and Goals Driven so “Understanding” permeates and
persists (makes continual improvement possible)
➢ SLAs are Optional but Typical in Modern Enterprises
Highlights
27
Diving Deeper with SEAL: Scalable Enterprise Analytics Lifecycle
[Diagram: SEAL lifecycle, detailed view. Phases: Define Charter incl Goals & SLAs; Wrangle & Refine; Fulfill & Validate Charter; Implement & Improve. Stages shown from Start: Business Understanding; Data Understanding; Data Acquisition, Preparation & Blending; Analytic Solutioning; AI/ML Modeling; Evaluate; Scale; Actualize / Visualize; Deployment / DevSecOps; Integration (Internal + External); Monitoring. A Data element and Optimize feedback loops are also shown.]
… SEAL Highlights Continued:
➢ Scales with Big Data Volume, Velocity, Variety, Veracity, Value (click)
➢ Integrates with Enterprise and Multi-Enterprise Ecosystem
➢ Incorporates Modern DevSecOps (DevOps + Security)
➢ Continuous Improvement is Built-In
Last Glimpse –– Juxtaposed with “Deeper” SEAL:
CRISP-DM (1997+)
• Focused on Data Mining Silos
• Advent: 1997
• Good Explanation (here)
Scalable Enterprise Analytics Lifecycle (SEAL)
28
Diving Deeper with SEAL: Scalable Enterprise Analytics Lifecycle
A Modernization of CRISP-DM (and Other Proven Methods)
[Diagram: SEAL lifecycle, detailed view, annotated with Callouts 1 through 10.]
➢ CRISP-DM Parts of SEAL (Jeff’s SEAL article + 3rd Party article)
➢ Read on for Distinguishing Callouts …
29
Diving Deeper with SEAL: Scalable Enterprise Analytics Lifecycle
A Modernization of CRISP-DM (and Other Proven Methods)
[Diagram: SEAL lifecycle, detailed view, highlighting Callout 1.]
Callout #1:
Added Charter with Goals & SLAs
LEAN, BULLET ORIENTED CONTENT:
✓Goals – seek to compare/contrast, predict, prescribe.
Outcomes are generally measurable.
✓Value Proposition Summary – incl brief bullets for
Who, What, When, Where, Why, How
✓Cost-Benefit – can be simple score, e.g. 1 low to 5 high
✓Risks & Mitigations
✓SLA Summary (Optional) – Quality, Performance,
Availability / Reliability, Security (e.g. click1, click2).
Details can be deferred until the “Fulfill” and “Implement” phases.
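The lean charter above lends itself to a small structured record. The sketch below is one hypothetical way to capture it in Python. The class names, field names, and sample values are assumptions for illustration; only the 1 (low) to 5 (high) scale and the Cost : Benefit : Risk ordering come from the charter template in this deck.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Score:
    """Cost : Benefit : Risk, each rated 1 (low) to 5 (high)."""
    cost: int
    benefit: int
    risk: int

    def __post_init__(self) -> None:
        for name in ("cost", "benefit", "risk"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    def __str__(self) -> str:
        return f"{self.cost} : {self.benefit} : {self.risk}"


@dataclass
class Charter:
    """Hypothetical lean charter record; all field names are illustrative."""
    goals: list[str]
    value_proposition: dict[str, str]  # Who, What, When, Where, Why, How
    biz: Score
    tech: Score
    risks_and_mitigations: dict[str, str] = field(default_factory=dict)
    slas: dict[str, str] = field(default_factory=dict)  # optional SLA summary


charter = Charter(
    goals=["Predict weekly demand per market segment"],
    value_proposition={"who": "Sales ops", "why": "Reduce stockouts"},
    biz=Score(cost=2, benefit=5, risk=1),
    tech=Score(cost=3, benefit=5, risk=2),
    slas={"performance": "Deferred until the Fulfill phase"},
)
print(f"BIZ {charter.biz} | TECH {charter.tech}")
```

Keeping the SLA summary as an optional field mirrors the point that SLA details can be filled in later in the lifecycle.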
30
SWOT Summary Value Proposition (for ST Customers unless stated otherwise)
• Strengths:
o Whatever.
• Weaknesses:
o Whatever.
• Opportunities:
o Whatever.
• Threats:
o Whatever.
♦ WHAT: • Whatever.
• Whatever.
• Whatever.
♦ WHO: • Whatever.
♦ WHEN: • Whatever.
♦ HOW: • Whatever.
• Whatever.
♦ WHERE: • Whatever.
♦ WHY: • Whatever.
♦ BUSINESS VALUE VALIDATION:
o Proactive: Preemptive Survey plus AI / ML Predictive performance metrics
(OKRs / KPIs) around revenue, profit, market share gain, ROI, etc.
o Reactive: Net Promoter (NPS) Contractor Survey plus Actual performance metrics
(OKRs / KPIs) around revenue, profit, market share gain, ROI, etc.
Example Charter Template Part 1 of 3 –– MVU / PRD / Enhanced Biz Strategy Canvas (BSC)
Columns: # | Goal Type(s) | Opportunity Goals and Profile | Cost : Ben : Risk & Value (1-5 Hi : 1-5 Hi)
#: 1
Goal Type(s):
• Mkt Dev: Large Niche + Surge
• Prod Dev: Add-On
Opportunity Goals and Profile:
SHORT DESCRIPTION:
• Details
• Cascade Value Options:
o Etc.
o Etc.
• Goal(s): Summary.
• Market: Grow Customer Base, New Inbound Channel.
• Finance: Increase Revenue.
• Product Add-On: Whatever.
• Product Lines: One, Two.
• Deployment / Packaging Options: Whatever.
Cost : Ben : Risk & Value:
• BIZ 2 : 5 : 1
• TECH 3 : 5 : 2
• VPROP: High
• PRIORITY: High
31
# Nutshell Channels / Touch Points Risks / Unknowns Mitigations
3 • SHORT
DESCRIPTION
• Mkt Dev:
Large Niche + Surge
• Prod Dev:
Add-On
• Sales:
Whatever.
• Distribution:
Whatever.
• Communication:
Whatever.
• Command & Control (DoD):
Whatever.
a) Whatever.
b) Whatever.
c) Whatever.
Proactive:
a) Whatever.
b) Whatever.
c) Whatever.
Reactive:
• Whatever.
Key Partners / Vendors Key Resources, Activities, Dependencies Customer Segments & Relationships
Critical:
• Whatever.
• And/or Whatever.
Also see Mashups section below, especially
partners for the “Data” drilldown.
Resources and Activities:
• Biz: Need Inhouse Role as liaison with Whatever.
• Tech: Support service request feed … … and Closed Loop back to
Whatever.
Other Dependencies: See Mashups section below.
Segments
• Related / Niche Markets.
Relationships:
• Initial: Whoever for Large Surges / Special Events.
• Potential Future:
o Whoever (recurring based on bla bla bla).
o Whoever (occasional when whatever).
Cost Structure Monetization, Revenue Streams & Models Mashups & Critical Integrations
Choose (usually just 1):
• Value Driven (High Value Proposition) vs
Cost Driven (Low Price).
• OR Cost Driven (Low Price).
• OR Whatever.
Monetize by: Whatever.
Revenue Stream: Recurring.
From Whoever1: Subscription (potentially Tiered by
Features/Usage) and/or Per Use/End Client.
From Whoever2: #TBD if can monetize, e.g. via Whatever.
Genl Behavior: • Initial: Large Surges (due to Event in VProp “When”).
• Long Term -- can also Sustain via broadened
Whatever Network and satisfied End-Customers.
Data & Analytics:
• See MVU Board Part 3.
Functional Product Dev for:
• Whatever / Partner Portal + Key Resources, Activities (above).
Example Charter Template Part 2 of 3 –– MVU / PRD / Enhanced Biz Strategy Canvas (BSC)
32
# Project Nutshell
3 • SHORT DESCRIPTION • Mkt Dev: Large Niche + Surge • Prod Dev: Add-On
Semi #Techie
Example Charter Template Part 3 of 3 –– MVU / PRD / Enhanced Biz Strategy Canvas (BSC)
SLAs INTERNAL
• Quality:
Whatever.
• Performance:
Whatever.
• Availability / Reliability:
Whatever.
• Security / Compliance:
Whatever.
SLAs EXTERNAL
• Quality:
Whatever.
• Performance:
Whatever.
• Availability / Reliability:
Whatever.
• Security / Compliance:
Whatever.
Data INTERNAL Data EXTERNAL (All inbound unless stated otherwise, e.g. for Closed Loop)
• Whatever (e.g. Social Network):
Whatever.
• Whatever (e.g. Property Assets):
Whatever.
• Whatever (e.g. External Demographics):
Whatever.
• Validation:
For Proactive AI / ML predictive performance metrics
(see Profile “Value Validation” section), need access to
Whatever.
• #BONUS:
Can potentially (eventually) SELL Whatever data,
optionally blend w/ External data. E.g. Whoever can
use in Insurance Risk and Rates determination, etc.
General Usage:
• 2 Purposes: Pre-Event (e.g. Line up Whatever based on Forecasting) + Post-Event (e.g. Adjust Campaign / Continual Improvement).
• 2 Patterns: Med to Very High Velocity and Volume (Med Pre-Event. Med to Very High Post-Event.).
• Interactions –– Pre-Event:
o E.g. Identify real estate properties by Prosperity (across metro, urban, rural), Additional Demographics, Competitive Landscape, etc.
o See AI / ML details in Value Proposition “WHAT” and “HOW” sections (MVU Board Part 1).
• Interactions –– Post-Event:
o Feed Whatever Queue.
o Closed Loop feed completed Whatever info back to Whoever.
Sources Critical – Data Owners of Whatever Info:
• Source1 (Gov): Whatever1.
• And/Or Source2 (Cmcl): Whatever2.
Sources Optional – For Early Warning Whatever Info, etc: (Start with Just 1 or 2 of the following)
• Preferred (Gov+Cmcl): for Whatever1 Events: Whatever org / dataset.
• Maybe Cmcl: for Whatever2 org / dataset.
• Maybe Gov: for Whatever3 org / dataset.
33
Diving Deeper with SEAL: Scalable Enterprise Analytics Lifecycle
A Modernization of CRISP-DM (and Other Proven Methods)
➢ Interaction and Dependencies across Roles:
Data Engineer, Data Scientist, DevOps Engineer
➢ Sets the Stage for Subsequent Phases …
(Diagram Source: Unknown. #TODO: Replace with custom diagram)
34
Diving Deeper with SEAL: Scalable Enterprise Analytics Lifecycle
A Modernization of CRISP-DM (and Other Proven Methods)
[Diagram: SEAL lifecycle, detailed view, highlighting Callouts 2 and 3.]
Callout #2:
More than just Data Preparation
✓ Wrangle, Visualize when Possible (“for best results”)
✓ Acquire, Prepare, Verify Data Quality
✓ Blend, Refine, Transform
Callout #3:
Support for Conventional Analytics
✓ Analytics & Data Science can Share the Same Lifecycle
✓ Consistent Results + Economy of Scale
✓ Share Data Assets and Workflows
✓ Certain Tools Enable This Sharing (e.g. Alteryx, KNIME, etc)
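To make the “Verify Data Quality” step in Callout #2 a bit more concrete, here is a minimal sketch using only the Python standard library. The record layout, the field names, and the two checks (missing values and duplicate rows) are assumptions for illustration; a real project would likely rely on whatever profiling its chosen tooling provides.

```python
def profile_quality(rows, required_fields):
    """Count missing values and duplicate rows in a list of dict records."""
    report = {
        "rows": len(rows),
        "missing": {name: 0 for name in required_fields},
        "duplicates": 0,
    }
    seen = set()
    for row in rows:
        # A field counts as missing when absent, None, or an empty string.
        for name in required_fields:
            if row.get(name) in (None, ""):
                report["missing"][name] += 1
        # Rows are duplicates when all required fields match an earlier row.
        key = tuple(row.get(name) for name in required_fields)
        if key in seen:
            report["duplicates"] += 1
        seen.add(key)
    return report


# Hypothetical sample records.
sample = [
    {"customer_id": 1, "region": "West"},
    {"customer_id": 2, "region": ""},
    {"customer_id": 1, "region": "West"},
]
report = profile_quality(sample, ["customer_id", "region"])
print(report)
```

A report like this can be shared between data scientists and citizen data scientists, which fits the asset-sharing theme of Callout #3.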
35
Diving Deeper with SEAL: Scalable Enterprise Analytics Lifecycle
A Modernization of CRISP-DM (and Other Proven Methods)
[Diagram: SEAL lifecycle, detailed view, highlighting Callouts 4, 5, and 6.]
Callout #4:
Add Scalability Checkpoint
In Addition to Evaluation of Accuracy / Quality:
✓ Validate Scale Up and Out Scenarios
✓ Cover Data Volume, Velocity, Variety, Veracity, Value (click)
Callout #5:
Add Feedback Loop to Data Wrangle & Refine
✓ Immediate Gratification >> Agile Spiral
✓ Also update Business and Data Understanding
for Current and Future Stories
Callout #6:
Add Actualize / Visualize Validations
✓ Test 360 Fulfillment of core data as well as edge cases, etc
✓ Prototype / Preview Actualizations,
e.g. End Visualizations, Alerts, Data Sharing / Feeds
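The scalability checkpoint in Callout #4 can be sketched as a simple timing harness. This is a minimal example under stated assumptions: the function names, the synthetic workload, and the per-row threshold are all hypothetical, and the sketch only exercises data Volume on a single machine. Scale-out, Velocity, and Variety scenarios would need their own tests.

```python
import time


def scale_check(workload, sizes, max_seconds_per_row):
    """Time a workload at growing data volumes and compare with an SLA threshold."""
    results = []
    for n in sizes:
        if n <= 0:
            raise ValueError("sizes must be positive")
        data = list(range(n))  # synthetic input; swap in representative data
        start = time.perf_counter()
        workload(data)
        elapsed = time.perf_counter() - start
        results.append({
            "rows": n,
            "seconds": elapsed,
            "within_sla": elapsed / n <= max_seconds_per_row,
        })
    return results


def score_rows(data):
    """Stand-in for a model scoring step (illustrative only)."""
    return [x * 2 + 1 for x in data]


for result in scale_check(score_rows, sizes=[1_000, 10_000, 100_000],
                          max_seconds_per_row=1e-4):
    print(result)
```

If the per-row time climbs as volume grows, that is an early hint the solution may not scale linearly, which is exactly what this checkpoint is meant to surface before deployment.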
36
Diving Deeper with SEAL: Scalable Enterprise Analytics Lifecycle
A Modernization of CRISP-DM (and Other Proven Methods)
Callouts #7 and #8:
Add Integrations and DevSecOps
In Addition to Basic Deployment, Prepare and Implement:
✓ Integrations and Orchestrations
• Internal Interoperabilities
(DevOps, CI / Continuous Delivery, SysOps, Compliance Checks, UAT,
Continuous Deployment, Alerts, Inhouse Consumers/Feeds, etc)
• External Interactions
(Outside Consumers, Interfaces, Partners, etc)
✓ Proof of Performance & Scalability
✓ Proof of Security & Compliance Safeguards / Mitigations
Callouts #9 and #10:
Add Monitoring & Improvement
✓ Retrospective
✓ Continuous Improvement Feeds
(Manual, Drift Detection, Feature Relevance/Noise over time,
Imputing Data Quality / Replacement, Model Degradation, etc
– e.g. click for model retraining situations)
✓ React Within Project Scope + Enterprise Ripples (if any)
✓ Feedback to Business & Technical “Understanding”
✓ Feedback to / from Project Charter, Goals, SLAs
(potentially adjust)
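Drift detection, one of the continuous improvement feeds listed above, can be illustrated with the Population Stability Index (PSI). PSI is a swapped-in example technique, not something SEAL prescribes. The function name, the equal-width binning, and the epsilon floor are assumptions in this sketch. A commonly quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift, but any threshold should be agreed in the charter or SLAs.

```python
import math


def psi(baseline, current, bins=10, eps=1e-6):
    """Population Stability Index between a baseline sample and a current sample."""
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins
    if width == 0:
        width = 1.0  # all baseline values identical; use a nominal bin width

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp values outside the baseline range
            counts[idx] += 1
        # Floor at eps so empty bins do not cause log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    expected = proportions(baseline)
    actual = proportions(current)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


# Illustrative data: the "current" feature values have shifted upward.
baseline = [float(i) for i in range(100)]
shifted = [v + 50 for v in baseline]
print("PSI (no change):", psi(baseline, baseline))
print("PSI (shifted):", psi(baseline, shifted))
```

A score like this can feed the monitoring stage and trigger the feedback paths to the charter, goals, and SLAs when it crosses the agreed threshold.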
[Diagram: SEAL lifecycle, detailed view, highlighting Callouts 7, 8, 9, and 10.]
37
Related Kewl & Emerging Tech
► Accelerators, ROI, Maintainability, Etc ◄
#TODO
Refine / Expand this section
in Future Version
38
Related Kewl & Emerging Tech
► Continuous Innovation ◄
(Source: Gartner Inc. – www.gartner.com)
Need
Iterations
or New
Tech to
Sustain
39 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Data Journey –– Reference Architecture Options (Innovation Opps in Purple)
[Diagram: reference architecture for the data journey, with numbered steps 1a through 5b. Labels recovered from the slide:]
• Sources: ENTERPRISE BUSINESS SOURCES; Heterogeneous Sources; REAL-TIME DATA SOURCES; DBMS’s; RFID, Social, WEB, POS, GPS; IoT/Hub/Edge; GEO (various)
• Data inputs (incl MED SPECIFIC INPUTS): Patient Data, Historic Events, Vaccination Coverage, HER, Survey Data, Pandemic Influenza Data, Census Data, Health Care, Research Data, Geo-Spatial, Other
• Flow stages: INGEST; STREAM MONITOR; STORE AND PROCESS; ANALYZE; LEARN; Produce (Mult Options); ACT
• Stores: ENTERPRISE DATA LAKE (ADLS, S3, Hadoop / HDFS); REFINED DATA HUB / DW; Expedited Track (for special cases)
• Consumption: ANALYTICS TOOLKIT; Data Discovery & Business Intelligence; VISUALIZATIONS; REAL-TIME ALERTS; DOWNSTREAM DATA CONSUMERS; ACTUALIZATION: INSIGHTS & FOLLOW-UP
• DATA >> ACTIONABLE INFO
• QUALITY & SECURITY “BY DESIGN” (End-to-End incl QA Checkpoints, SSO, SOD). Also See Security Framework Figure and Governance Callouts
40
Related Emerging Tech
► DevSecOps ◄
#TODO
Refine / Expand this
in Future Version
(Source: Gartner Inc. – www.gartner.com)
41
Artificial Intelligence / Machine Learning (AI / ML)
Lifecycle Optimization --- Product Highlights with High Focus On
Innovation, ROI, Cost-Benefit, Endurance, EcoSystem, Vendor Viability, Etc
Products Why Remarks
• Alteryx (.com) “Citizen Data Science Platform” that empowers
conventional business/data analysts to perform formal
AI/ML while sharing assets and workflows with formal data
scientists and AI engineers. Other terms for Citizen Data
Science: Democratized Data Science, Self-Service Analytics.
Alteryx is a Gartner MQ Leader.
Drag-and-Drop No Code support (and optional command-line).
“Visualytics” approach renders graphics all along data
pipeline – easy to find anomalies early vs waiting for final
visuals. Also supports lite-moderate Data Integration
(prep/blend/transform).
• DataRobot (.com) “AutoML” -- Automate the normally laborious tasks of trial
and error, and more. Another form of Democratized Data
Science, but still requires some statistical background.
Automation of ML Overall (click), Feature Selection (click),
Data Exploration & Preparation (click), Deployment
(on-prem, managed AI cloud, hybrid & multi-cloud).
Data Virtualization / Integration
Products Why Remarks
• Denodo (.com) or Lyftron (.com) Expedite and centralize access to all data everywhere. Data
Catalog can exceed industry leaders (offline discussion).
Denodo is Top ~3 and Gartner MQ Leader – read Small
Print about Data Virtualization vs DI. Read mostly + lite
write support. Lyftron is an emerging contender at ~75% lower cost.
• Delphix (.com) Virtual Data Cloning of DBMS (only) data sources. Supports
full read+write, data masking, access controls, catalog.
Purported “data virtualization” is really virtual data cloning
– although it is immediate so excellent for large scale
testing and CI/CD using real or scrubbed/masked
production data. Clones consume ~zero initial space (grows
only on deltas).
• Magpie (Silect.is) Automated Data Exploration and Data Prep combined with
Data Catalog and Governance/Security features.
Powered by Apache Spark plus proprietary features. SaaS
supports AWS, Azure, GCP.
#TODO: Refine / Add Info
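The Delphix remark above ("Clones consume ~zero initial space (grows only on deltas)") describes copy-on-write behavior. A minimal Python sketch of that idea follows. It is not Delphix's implementation or API; `VirtualClone` and its methods are hypothetical names used only to illustrate the concept.

```python
class VirtualClone:
    """Hypothetical copy-on-write view over a shared, read-only source."""

    def __init__(self, source):
        self._source = source   # shared data, never modified by the clone
        self._deltas = {}       # only the entries this clone has changed

    def read(self, key):
        # Prefer the clone's own change; otherwise fall through to the source.
        if key in self._deltas:
            return self._deltas[key]
        return self._source[key]

    def write(self, key, value):
        # Writes land in the delta store, so the source stays untouched.
        self._deltas[key] = value

    def extra_space(self):
        # Space attributable to the clone is just its deltas.
        return len(self._deltas)


source = {"row1": "alice", "row2": "bob"}
clone = VirtualClone(source)
space_before = clone.extra_space()   # a fresh clone holds no data of its own
clone.write("row2", "MASKED")        # e.g. masking a production value
```

This is why such clones suit large-scale testing: many test environments can share one source copy and pay storage only for what each one changes.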
42
Lifecycle Optimization --- Product Highlights with High Focus On
Innovation, ROI, Cost-Benefit, Endurance, EcoSystem, Vendor Viability, Etc
DevSecOps and ALM (Lifecycle) Optimization
Products Why Remarks
• ProofMethod (.io) Centralized Monitoring and Insights of
End-to-End Plan, Build, Test, Deploy
activities.
Works with Jira, GitHub, Jenkins (and more coming). Blended metrics with
visibility for past week, month, quarter, year, etc. Shows Plan Metrics (issues by
lead time and counts by status/type/priority/etc), Code Metrics (avg commits by
time, top/least contributors by pulls, commits, merges), Test Metrics (results over
time), and Pipeline Metrics (results over time, duration by build/run, etc).
• DataRobot (.com) – Yes Again ☺ Automated MLOps and Monitoring (e.g.
Drift Detection, Feature Relevance vs
Noise over Time, etc)
Centralized UI panel (and command-line) for deployment, monitoring/insight,
management, and governance of machine learning models in production
environments – across any cloud.
Also see AI/ML section for use of DataRobot in Development, etc.
• Liquibase (.org and .com)
fka Datical
Automated DataOps including CI/CD for
DBMS’s (click for integrations list).
DataOps tracking, versioning, and CI/CD incremental deployment support for
database changes. Millions of developers to date on open source version. Also
offers paid support options (e.g. $33 to $99 per connection per month). Mainly for
SQL Databases plus recently added MongoDB, Cassandra (in 2020-H1) -- adding
more NoSQL platforms over time.
• CloudHealth (VMware.com) Multi-Cloud Infrastructure Financial
Management & Insights, Provisioning
Policies, and Security Risk Analysis.
CloudHealth also integrates with third party monitoring tools for additional depth
(DataDog, New Relic, WaveFront, etc).
AWS CloudWatch updates Oct 12, 2020 (click) add improved UI and works with
Cost Explorer and Trusted Advisor. But CloudHealth is Multi-Cloud + has other
distinguishers such as ease of management across accounts,
identifying/terminating un/under-used assets, RI optimization (#TBD how
different for Savings Plans), highlighting old/upgradeable EC2/VM instance types,
highlighting/managing storage tiering costs, optimizing containerization costs by
comparing utilization vs provisioning (click for more info).
#TODO: Refine / Add Info
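The "Drift Detection" item in the MLOps row above can be illustrated with a Population Stability Index (PSI), one widely used drift score. The sketch below is a generic, standard-library Python version and makes no claim about how DataRobot or any other product computes drift. The function name `psi`, the ten equal-width bins, and the `1e-6` floor for empty bins are assumptions chosen for the example.

```python
import math


def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline and a recent sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0   # guard against a constant baseline

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)   # clamp out-of-range values
            counts[idx] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


baseline = [i / 100 for i in range(100)]   # feature values at training time
shifted = [x + 0.5 for x in baseline]      # same feature after a shift
score_same = psi(baseline, baseline)
score_shifted = psi(baseline, shifted)
```

A score near zero means the recent sample looks like the baseline. A commonly cited rule of thumb treats values above roughly 0.25 as a significant shift, but the threshold should be tuned per feature and use case.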
43
Contact Info –– For Future Questions plus Updates of this Deck and Underlying Whitepaper
• Many Thanks to Data Con LA (Subash and Team) … and YOU!
• Wide Open to (pro-bono) Q&A after this session. Also see Co-Authoring Opps on this and other topics.
Jeff Bertman, CTO & Lead Data Scientist/Engineer, Dfuse Technologies
• Jeff.Bertman@dfusetech.com (and Jeff.TechBreeze@gmail.com per blog/whitepapers)
• Cell/Text 818-321-3111, Headquarters 877-553-3873
• Also available on WhatsApp (same as cell #), MS Teams, Slack, Google Meet, Zoom, Discord, etc
• FUTURE VERSIONS of this deck in Public Online Folder (https://1drv.ms/u/s!AuvOPf8_XJZVmVjLFWVqLD126XgC?e=83UM7C)
• Underlying Whitepaper – Wide Open to Co-Author Contributions for Future Versions:
Medium.com (https://link.medium.com/i864jb47f9) or ~same article on LinkedIn (click)
• Dfuse Technologies Main Site (www.DfuseTech.com) – Nationwide Commercial & Gov/DoD
Q&A, Contact Info, Future Updates
Closing Remarks
44
Make it a Great Day :)
45
Appendix
Some Example Artifacts: Generic and Fictional (but Realistic)
46
Appendix: Roadmapping Innovation Highlights
Heads-up on Related Future Article: Preemptive Agile Roadmapping (PAR)
47
Symbol Description Remarks
Done Completed work
In Progress Aka Work in Progress (WIP)
[Outplan] Popup Work not previously planned on Roadmap For example, due to new government regulations, time-to-market race, etc.
Kickoff Meeting Incl Interoperability Requirements, Risks & Mitigations, etc.
Moved to Future
Typically due to Outplan work. Sometimes we can squeeze schedule or use flex
resources to accommodate. Other times we slide like this.
Moved SOONER than originally planned Occurs when running ahead of schedule (various reasons, e.g. new staff, etc).
Cautionary Note Generally requires discussion with stakeholders.
#TBD “Dust” tag to resolve something unknown Can impact Roadmap, depending on outcome or delay in resolving.
#WAIT “Dust” tag indicates something is pending Can impact Roadmap if dependency arrives/occurs late.
#DEPEND “Dust” tag indicates some other kind of dependency Can impact Roadmap if dependency arrives/occurs late.
#IDEA Discussion tag about a topic to discuss Discussion typically includes Biz and/or Tech Stakeholders (dep on which Roadmap).
NOTES: • Beyond straight planning and prioritizing, Roadmap Meetings (1:1 and Roadmap “Unity Sessions”)
are a great way to socialize how we are shaping our Future together ☺
• I generally prefer sharing Roadmaps via a SaaS app such as Aha or RoadMunk, etc.
But PowerPoint or Google Slides can also suffice.
Appendix: Legend for Technical & Business Roadmaps (Next 2 Slides)
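The legend's "Dust" and discussion tags (#TBD, #WAIT, #DEPEND, #IDEA) and the [Outplan] marker are plain-text conventions, so roadmap items carrying them can be pulled out mechanically for a follow-up list. The sketch below shows one way to do that in Python. The function names and the regular expression are assumptions for illustration; the sample lines are copied from the roadmap slides that follow.

```python
import re

# Matches the legend's hashtags and any "[Outplan ...]" marker.
TAG_PATTERN = re.compile(r"#(TBD|WAIT|DEPEND|IDEA)\b|\[Outplan[^\]]*\]")


def find_tags(item):
    """Return the legend tags found in one roadmap line, in order."""
    tags = []
    for match in TAG_PATTERN.finditer(item):
        text = match.group(0)
        # Normalize variants such as "[Outplan HUGE]" to "[Outplan]".
        tags.append("[Outplan]" if text.startswith("[Outplan") else text)
    return tags


def needs_followup(items):
    """Keep only roadmap lines that carry at least one legend tag."""
    return [(item, find_tags(item)) for item in items if find_tags(item)]


sample = [
    "• [Outplan] Google DFP Data Feed Mods Ph 1",
    "• Salesforce Integration Ph 2: Implem",
    "(#WAIT for YT to Populate Estimated Earnings)",
    "• [Outplan HUGE] System/Network Admin Gap-Fill",
]
flagged = needs_followup(sample)
```

A list like `flagged` could seed the agenda for a Roadmap "Unity Session", since every tagged item either changed the plan or may still change it.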
48
Rev: 2019-09-17a
Q1 2020
ABC Biz Expansion – Wave D
• Estimated Earnings Refinement
(SChat supersedes Fan Funding)
• Cross-Enterprise Workflows LOB Group 2
• Platform Expansion Ph 2b:
Full for AVD, PTV; MVP for FB
• SAP Migration PRD + Pilot
#TBD Need more granular contract info?
XYZ Consumer Expansion
• CDP Direct Automation Ph 2: PRD
(Replace EMI weekly/monthly ingest)
• FB General Ph 2: MVP + Full
Core Intelligence Platform (CIP)
• Social Analytics Data Group 2
• YT Video Enhance Ph 3: MVP + Implem
• TWITCH Ph1 MVP Automation
• #TBD Enhance/Start New Platforms:
Verizon Oath (supersedes Go90), Pinterest,
Sony VUE, SOHU Global Wave B, Xumo
• Platform Driven Integration Changes
Q3 2019
ABC Biz Expansion – Wave B
• Paid Features Ph 1: PRD + MVP Interim Ops
• Cross-Enterprise Workflows Pilot
• Talent Referrals Interim Semi-Auto Ops
• Contracts Ph 2: Retro Metrics
• [Outplan] Google DFP Data Feed Mods Ph 1
• [Outplan] Platinum Tier Fees: MVP Pilot
XYZ Consumer Expansion
• FB Video Data Ph 1: PRD + MVP1
• [Outplan] FB Driven Major Data
Enhancements – Implem
Core Intelligence Platform (CIP)
• Social Analytics Data Group 1a
• Multi-Platform Contracts PRD + MVP
• [Outplan] Multi-Platform Platinum (ripple)
• Salesforce Integration Ph 1b: MVP
• Google DFP (Direct Sales) Changes
• Platform Driven Integration Changes
Q4 2019
ABC Biz Expansion – Wave C
• Paid Features Ph 2: Auto-Pay to Scale
• Cross-Enterprise Workflows LOB Group 1
• Platform Expansion Ph 2a MVPs
o Amazon Video Direct (AVD)
o Pluto TV (PTV)
• [Outplan] Google DFP Data Feed Mods Ph 2
• Platinum Tier Fees: Full Implem Oct 16 Cycle
XYZ Consumer Expansion
• FB Video Data Ph 2: MVP2 + Full
(#WAIT for YT to Populate Estimated Earnings)
• TWITCH General Ph 1: PRD + MVP
Core Intelligence Platform (CIP)
• Social Analytics Data Groups 1b + 4
• Salesforce Integration Ph 2: Implem
• YT Video Enhance Ph 2: PRD
(e.g. Video Topic Xref to Game Types, etc)
• YT Bulk -- Add Lower Tiers Part 1
• Platform Driven Integration Changes
2019 – 2020 Data Engineering BUSINESS Roadmap (Fictional but Realistic)
Q2 / Q3 2020
ABC Biz Expansion – Wave E
• Affiliate Enhancements Ph 2
• #TBD Director Payments
• Cross-Enterprise LOB Group 3
• Platform Expansion Ph3:
Full for FB, MVP for Comcast
• SAP Migration Full Implem
• Acctg QA Auto Ph 2: Tax Withhold
• Acctg QA Auto Ph 3: Wire Xfer + ACH
• Talent Referrals Full Prod
XYZ Consumer Expansion
• CDP Direct Auto Ph 3: Implem
(Replace EMI weekly/monthly ingest)
• #TBD TWITCH Ph 2: Implem
Core Intel Platform (CIP)
• Social Analytics Data Group 3
• YT Bulk -- Add Lower Tiers Part 2
• #IDEA Video Topic Ph3: Lexicon
• JIRA Video Classification f/ SF
• YT Legacy Ruby Migration
(Upgrade or Python)
• Platform Driven Integ Changes
49
Rev: 2019-09-17a
Q1 2020
NEXTGEN Cloud
• Core Ph 5: Analytics BIG Boost 2 (ABB)
Implem: Migrate Dim Models to Snowflake
• Enterprise Data Sharing Initiative (EDSI)
Ph 2 PRD
ENHANCE TECH HUB
• Data Readiness Dashboard (DRD) Ph 2
• Data Surge Framework (DSF) Ph 1
#TBD Q4 2018(?) PRD + Lite MVP
CORE Platforms
• MDM Ph 2 PRD Off-YT OTT PPW
• DISTRO Ph 1 PRD Spec for Integrtn
• Platform Integration/ETL Chgs/Maint
Prod Support O&M, Etc
(Mult Depts, ~250 Rqsts/Mo)
Q3 2019
NEXTGEN Cloud
• Core Ph3: Implement thru Staging
• Enterprise Data Sharing Initiative (EDSI)
Ph 1 Implem
ENHANCE TECH HUB
• Data Mgmt Framework (DMF) Ph 2
• Core Intel Platform (CIP) Multi-Tenant Ph 2
• Email Accts Expansion Framework (OTT etc)
CORE Platforms
• [Outplan HUGE] System/Network Admin Gap-Fill
• E-Commerce Encrypt Upgrade (Full Prod)
• Sec GDPR Consumer Data Purge Auto (Mult Depts)
• Biz Contracts + Invoices Doc (Mult Depts)
• Analytics Social Media Accts MFA (Mult Depts)
• YT Tech Survey Response (API etc)
• [Outplan] Degraded OnPrem HW Workarounds
(after long A/C outage mid-summer)
• Platform Integration/ETL Chgs/Maint
Prod Support O&M, Etc
(Mult Depts, ~100 Rqsts/Mo)
Q4 2019
NEXTGEN Cloud
• Core Ph4: Implement Production
incl Analytics Big Boost 1 (ABB) Implem
• Core Ph4: Analytics BIG Boost 2 (ABB):
Migrate Dim Models to Snowflake
ENHANCE TECH HUB
• Data Anomaly Detection (DAD) Ph 2
• Data Readiness Dashboard (DRD) Ph 2
CORE Platforms
• #TBD MDM Need Sooner than Q1-Q2-2020
(e.g. for OTT Pmt Integ Ph 3)?
• Platform Integration/ETL Chgs/Maint
Prod Support O&M, Etc
(Mult Depts, ~150 Rqsts/Mo)
Q2 / Q3 2020
NEXTGEN Cloud
• Enterprise Data Sharing Initiative
(EDSI) Ph 2 Implem
• Multi-Cloud Optimization Ph 2
PRD + Implem
ENHANCE TECH HUB
• Data Readiness Dash (DRD)
Ph 3 – Full Implem
• Data Surge Framework (DSF) Ph 2
CORE Platforms
• Q1?: MDM Ph 2 Implem OTT PPW
• Q2?: DISTRO Ph 2 MVP Implem
• #TBD Intra-Video Metrics Ph1 PRD
• Platform Integration/ETL Chgs/Maint
Prod Support O&M, Etc
(Mult Depts, ~300 Rqsts/Mo)
2019 – 2020 Data Engineering TECHNICAL Roadmap (Fictional but Realistic)
50
Appendix: Sample 100K Foot Context Diagram (Data & Process)
Core Services
Our
Company
Notes: • Sample Diagram –– Generic and Fictional (but Realistic).
• This Slide is NOT part of a Typical Analytics Presentation – but can provide background info for MVU / MVP.
51
Appendix: Sample Workflow with Swimlanes for Various Participants
Notes: • Sample Diagram –– Generic and Fictional (but Realistic).
• This Slide is NOT part of a Typical Analytics Presentation – but can provide background info for MVU / MVP.
Core-Auth
OurCompany
Core
#TODO: Add Data Stores or Replace with Data-Centric Repository
Sample
52
Appendix: Agile Retrospective (Fictional but Realistic Example) #Discuss
Next Slide Easier to Read >>>
53
Appendix: Agile Retrospective (Fictional but Realistic Example) #Discuss
• Highlight the Most Important Action at the TOP of the Doc – No Fishing for Follow-up.
• Same for Each Topic – Nutshell, Health, etc
• Easy to Find Critical Reusable Assets
(minimize dependence on email)
54
Appendix: Agile Retrospective (Fictional but Realistic Example) #Discuss
• Don’t Forget about the Climate – How Do People Feel? What Are the Attitudes?
• Highlight Special Actions/Follow-ups – Even if they end up in Tickets, Stories, Etc
55
Appendix: Lite SOP (Fictional but Realistic Example) #Discuss
Next Slide Easier to Read >>>
56
Appendix: Lite SOP (Fictional but Realistic Example) #Discuss
• Start with the BASICS (What >> Then How)
• Keep it LEAN
(“Ceremony” when Needed)
• Then the DETAILS

Modernizing the Analytics and Data Science Lifecycle for the Scalable Enterprise: The SEAL Method

  • 1.
    Modernizing the Analyticsand Data Science Lifecycle for the Scalable Enterprise: The SEAL Method CTO and Head of Engineering & Analytics, Dfuse Technologies (“Data Fusion”) Former Director of Data for Warner Bros Digital Networks Jeff Bertman Click2: For FUTURE VERSIONS of this Presentation Deck, click this public online folder (in Microsoft OneDrive). Click4: Data Con Speaker Page (Scroll to “Bertman”) Also Contact via: - www.LinkedIn.com/in/JeffBertman - Jeff.Bertman@DfuseTech.com - Jeff.TechBreeze@gmail.com - Mobile +1 818-321-3111 - More contact info and links at end of deck Click5: Dfuse Technologies Main Site (www.DfuseTech.com) Click3: Whitepaper underlying this Presentation (Details) #NOTES to Audience: (1) Caution: This deck contains some Hollywood “GLITZ” that could be harmful to your Boringzola. Please be prepared to Smile 😊 (2) Thanks for the honor of Serving the Data Con LA Community! Click1: Data Con Main Site (www.DataConLA.com)
  • 2.
    2(Rev 2020-11-01a)2 Version 2020-11-01a:Future Updates are available in two ways. First, thru Dec 31, 2020, updates will be posted on DataConLA.com. Long Term Future Updates of this Deck + Underlying Whitepaper are available via links and contact info on first and second-to-last slides. Agenda Topic Slides Remarks Agenda and Special Reviewer Notes 1 – 3 Incl this Table of Contents. Intros 4 – 5 Setting the Stage 6– 8 Goals,, General Approach, Biz and Tech Landscapes 9 – 21 Analytics and Data Science Lifecycles –– Old and New (SEAL) 22 – 36 ➢ SEAL is explained on slides 25 to 36 Emerging Tech (Relevant to SEAL) 37 – 42 Q&A and Wrap-up 43 – 44 Appendix 45 – 58 Some Additional Sample Artifacts: Roadmaps, Diagrams, Agile Retrospective, Lite SOP, etc
  • 3.
    33 Notes to Audience IMPORTANTIMPORTANT • Underlying Whitepaper: Medium.com (https://link.medium.com/i864jb47f9) or ~same article on LinkedIn (click). • Also See Appendix (at end): “If I had more time, this would have been shorter.” -- Paraphrased from B. Pascal (mathematician) and H. D. Thoreau (philosopher). • Some Slides are More Dense Than Typical: Largely because basic info is supplemented with whitepaper style Callouts on certain slides. • Minor Animations are used to Facilitate Readability for Dense Slides: So not all content is displayed up-front (until you click through). • PDF versions of the deck do Not Support the Dazzle (above): On the flip side, PowerPoint dazzle in full decks sometimes runs poorly on slow network connections. • Future Updates available: See first and second-to-last slides for contact info and links.
  • 4.
    44 ➢ Data isour Flagship Boutique – Dfuse is short for “Data Fusion”. Additional Practices cover broader FBI Needs (see “Beyond Data” below). ➢ Our fans call us "Data Doctors" – and we are well known as presenters and expert/keynote panelists in Technical Forums sponsored by Amazon AWS, Oracle, Microsoft, Snowflake, Hitachi Data, DigiMarCon (Marketing Tech Forum), et al. ➢ Ranked #84 in Inc 5000’s IT Services and #107 in IT Development for 2019 and 2020. (See pic to right, and for other awards click here) ➢ Core Service Pillars (Biz and Tech): ”Data Everything,” E-Commerce, Health Industry, Financials, Marketing, Sales, Telecom, Insurance, Gov/Military, Web/Mobile Dev, Tech/Cloud Infrastructure, DevOps, Cybersecurity, Compliances ➢ Major Success Stories for Huge Enterprises in Fortune 100/1000 (Apple, CVS Health, Amgen, EY, Wells Fargo, New York & Company, et al) plus Related Government Spaces (DoD, FDA, NIH, HHS, Defense Health Agency, DoL, et al). And our CTO’s personal background includes Verizon (saved $100 Million in 13 month period), Comcast, WBros, GEICO, CIGNA, Airlines. ➢ SMEs incl Authors, Expert Speakers, Former AWS Chief Data Scientist, Retired CIO Navy ONR, et al. ➢ Dfuse Certified Partnerships include Amazon AWS, Oracle (gold), Microsoft (silver), Symantec, Cofence (security), Proofpoint (enterprise security), Alteryx (Gartner MQ Citizen Data Science), MemSQL (top rated Operational Analytics for Millions of Transactions/Sec), et al. ➢ Dfuse also conveys how we DEfuse or simplify the complexities of modern, advanced technologies to achieve real, measurable success stories. ➢ ROI is a Specialty Topic we present at major national and regional conferences, mentor customers, and achieve 360 initiatives. We make it Goal Driven, e.g., to maximize market share, revenue, profit, cost savings, efficiencies, quality, fulfillment, security, productivity, and more. Dfuse Technologies –– Corporate Highlights
  • 5.
    5 Relevant Success Stories •Small to HUGE Scale Analytic and Operational Environments supporting B2C, B2B, and B2G (E-Commerce and more). • Corporate Scorecard – numerous gains in B2C, B2B, B2G environments. A GEICO MVP, for example, increased internet sales (approx. 26%) and gross profit (approx. 10%). • Built all GEICO’s Internet Analytics System from scratch. • Saved Verizon $100 Million in 13 month period. • Grew > 3X in < 1 Year: Data Modernization for WB Digital / Machinima. • Founded & Led Enterprise Data Sharing Initiatives for WBros, Gov Intel, et al. • Real Time Analytics / Fraud Detection for Insurance, Marketing, eCommerce, et al. • Increased ROI, TCO, TEC, and Various Cost Savings and Efficiencies for numerous business and technical processes. • Routinely work with CxOs of small to huge multi-enterprises (incl Fortune 100/1000 + Gov). Certifications and Speaker Engagements Speaker at National & Regional Conferences – Marketing, Biz Apps, Tech -- Focus on Big Data & Analytics -- (incl Expert Speaker, Keynote, and SME Panels) Work Experience (15+ Years Experience as Hybrid Professional: Chief Architect & Biz Optimization SME) Jeff Bertman –– Career Highlights & Credentials
  • 6.
    6 Presentation Abstract –Brief Discussion It’s no secret that the roots of Data Science date back to the 1960’s and were first mainstreamed in the 1990’s with the emergence of Data Mining. This occurred when commercially affordable computers started offering the horsepower and storage necessary to perform advanced statistics to scale. However, the words “to scale” have evolved over time. The leap to “Big Data” is only one serial aspect of growth. Beyond the typical 1-off studies that catalyzed the field of Data Mining, Data Science now fulfills enterprise and multi-enterprise use cases spanning much broader and deeper data sets and integrations. For example, AI and Machine Learning frameworks can interoperate with a variety of other systems to drive alerting, feedback loops, predictive frameworks, prescriptive engines, continual learning, and more. The deployment of AI/ML processes themselves often involves integration with contemporary DevOps tools. Brief Discussion It’s no secret Now segue to SEAL – the Scalable Enterprise Analytic Lifecycle. In this presentation, you’ll learn how to cover the major bases of a modern Data Science projects – and Citizen Data Science as well – from conception, learning, and evaluation through integration, implementation, monitoring, and continual improvement. And as the name implies, your deployments will be performant and scale as expected in today’s environments. Now seque to SEAL
  • 7.
    7 This deck containsmore GLITZ than usual ☺ And there are layers on several slides. For Best Viewing, DOWNLOAD the PowerPoint Show File (vs viewing online). Thanks! Show Business WARNING
  • 8.
    8 ▪ Thousands ofcontent creators (aka talent partners) ▪ Millions of videos on numerous platforms ▪ Billions of aggregate views / month Expanding Within & Beyond Your Enterprise requires even Greater Scale ▪ BI/Data supports WB Digital Networks, Other WB Divisions, External Companies ▪ Distribution supports other WB Initiatives Most Enterprises These Days Routinely Work in the . . . Recent Years Millions… BILLIONS+Thousands… ▪ Cornerstone Technologies (Big Data focus) #DISCLAIMER: This example is the only WB related slide (non-proprietary). For Example: ▪ Cornerstone Technologies (Big Data focus)
  • 9.
  • 10.
    10 High Level Goaland Considerations GOAL: Challenges 1) Need Speed & Agility Balanced with Due Diligence: 2) Need Innovation Balanced with Pragmatics: Solutioning Approach • The “SEAL” Lifecycle presented here... • Lean Focus -- Use Only the Parts We Need… • Hybrid Solutioning… • Economy of Scale… Achieve maximum business outcomes by leveraging analytics/data science to deliver results on/ahead of time, on/under budget, with enduring results and easily reproducible consistency.
  • 11.
    11 High Level Goaland Considerations GOAL: Challenges 1) Need Speed & Agility Balanced with Due Diligence: It’s tough to provide end-to-end solutioning from brainstorming and concept through evaluation, development, delivery, and integration into all the moving parts of an enterprise. 2) Need Innovation Balanced with Pragmatics: Foster creativity, lateral thinking, and swift delivery while avoiding typical pitfalls such as business specifics or data sources not being accurately vetted (especially when working swiftly), disconnects between business and technical solution which yields inaccurate results, or an under-performant solution. Also need cohesive, cost-effective, adaptive methods to ensure consistent delivery for each story plus integration into the enterprise / multi-enterprise ecosystem. Solutioning Approach • The “SEAL” Lifecycle presented here is rooted on other, mainstream proven methods –– cleanly enhanced and supplemented with critical gap fillers to accomplish our stated goals. • Lean Focus – Use Only the Parts We Need based on story complexity, risks (more risk often but not always = more due diligence), and mission / business criticality. • Hybrid Solutioning involves multiple perspectives. At the highest level, we drive from Top-Down business mission, goals, objectives, KPIs and triangulate from Bottom-Up technology and other enablers – to ensure accurate solutioning as well as timely and cost-effective delivery. • Economy of Scale is attained in many ways. For example, uniting, sharing, and democratizing work between formal data scientists and empowered business/data analysts (aka Citizen Data Scientists). Achieve maximum business outcomes by leveraging analytics/data science to deliver results on/ahead of time, on/under budget, with enduring results and easily reproducible consistency.
  • 12.
    12 LEVERAGE TECHNOLOGY Architecture, Engineering,Methods, Libraries, CM, QA, Security, SysOps, DevOps DATA >> INFO >> KNOWLEDGE >> ACTION Improve BIZ (Revenue, Profit, Market Share, Etc) Always Grow BIZ Value –– Data Intelligence BEST PRACTICES & ~SLAs Continual Improvement, Serviceability, Reliability, Performance, Governance SERVICE ORIENTED Mindset Driven By Clear Mission, Values, Goals & Priorities: Cost-Benefit + “Everyone is a Customer” Approach Culture Credos: Be the Solution, Be the Boss, Value Each Other, A-Team, Executional Excellence, … Example Business Pillars Enabled by Data Technology (Analytics & Operations)
  • 13.
    13 Low Level Processes Get Stuff,Do Stuff, Put Stuff, Etc Raw Data Structured, Semi-Structured, Unstructured Information Technology Data Engineering Data ►► Information Software Engineering Tech ►► Biz Tools Project/Product Management, QA, Security Actualization Apps, Visualizations, Analytics/AI/ML, Reporting, Alerting, Extended Consumption Value-Scape Fulfillment Pyramid Empower BUSINESS Gains thru Technology BUSINESS VALUE. Improve Revenue, Profit, Market Share, Mission Effectiveness, Efficiencies, ROI, TCO, Quality, Timeliness, Accessibility, Safety, and other KPIs
  • 14.
    14 Data Integration Structured, Semi-Structured, Unstructured DataPlatforms, Underlying Network & Storage Tiers From Landing and Data Lakes to Refined Data Stores/Hubs Presentation Optimization Data ►► Information Geo + Media Management Unstructured Content ►► Concepts Data Curation, Governance, Meta, Catalog Actualization Apps, Visualizations, Analytics/AI/ML, Reporting, Alerting, Extended Consumption BUSINESS VALUE. Improve Revenue, Profit, Market Share, Mission Effectiveness, Efficiencies, ROI, TCO, Quality, Timeliness, Accessibility, Safety, and other KPIs Data Virtualization / Fabric Simplify Disparate Sources (& Targets) Value-Scape Fulfillment Pyramid DATA Landscape
  • 15.
    15 Simplifying ROI withAccurate, Focused Results • ROI’s inherently simple formula can quickly become quite cumbersome, e.g., how to measure Customer Satisfaction. This is a main reason why ROI is often discussed but seldom assessed (see empty seat in comic). • But quite often remarkably accurate results can be obtained by blending a Simple Yelp / Zagat “# stars” approach with Zachman based perspectives to yield an At-a-Glance ROI guide for decisioning. • Incorporate Lateral Thinking and Impact Analysis for accurate ROI. Analyze direct and indirect costs, how talent is leveraged across various data focused roles, impact across all environment tiers, transition and training costs, ecosystem, interoperability, vendor viability, etc. Effective ROI Requires Lateral Thinking & Impact Analysis (But Can Still be Simple) Source: Daniel Kuperman “Marketing Humor” series (click) Example Lateral Considerations for ROI for Analytics / Data Science • In-Place Analytics and Citizen Data Science can accelerate 1-off studies and sometimes more (depending on tools). And they can save $$$ by leveraging relatively inexpensive object storage, minimizing use of data integration labor/tools, and providing economy of scale by empowering all data savvy users with advanced analytics and asset sharing. BUT they can also grow data silos, “swamps,” and other technical debt. And they can defer the ability to perform broad and deep analytics, e.g., comparing and contrasting YoY, across market segments, campaigns / initiatives, etc. • Save ~75% Storage Costs for certain DW platforms that use object storage natively with high performance (S3/Blob/ADLS). For example, Snowflake, Microsoft Synapse (as of Nov 2019 release), EMR, etc. Fyi later in 2020 AWS is slated to introduce S3 for AWS Outposts which will influence this topic).
  • 16.
    16 A Glimpse ofthe ROI Ascendancy Model (ROI-AM) NOTES: • The term ROI-AM has not yet been officially released (ETA 2021-Q1). The underlying techniques have many years of success stories in Fortune 500 and Gov arenas. ROI-AM has been presented at major national and regional conferences (incl expert speaker engagements). • The “Fiscal Technology Landscape (FITL)” is a key part of ROI-AM. Details (deck/whitepaper) available upon request. • CMMI is NOT a part of ROI-AM, and not particularly endorsed by this author. It does align with ROI’s maturity theme to help people get the idea. CMMI V2 released 2019-2020 is the first AGILE version -- much better than V1. CMMI ideas such as continual improvement are great!
  • 17.
    17 BIZ ACTIVITIES >> ConventionalData Flow –– Simple Landscape
  • 18.
    18 BIZ ACTIVITIES >> MostActionable CUSTOM TURBO CUSTOM TURBO CUSTOM TURBO #Discuss Raw Curated • Purple Dotted Lines depict “Accelerator” Patterns Snowflake, Firebolt (eval) >>ETC<< >>ETC<< Hadoop, HBase, $ BigTable Parquet, CSV, JSON, Etc Druid, $ MemSQL, $ Vertica ELK, Cassandra, $ Dyn’DB • Best of Both Worlds: Accelerate Time-to-Market while Eliminate Tech Debt • Fyi Details are under the Circles and below the Data Drums Conventional Data Flow –– plus Accelerator Patterns and Tools
  • 19.
    19 BIZ ACTIVITIES >> MostActionable #Discuss Raw Curated • Purple Dotted Lines depict “Accelerator” Patterns Snowflake, Firebolt (eval) >>ETC<< >>ETC<< Hadoop, HBase, $ BigTable Parquet, CSV, JSON, Etc Druid, $ MemSQL, $ Vertica ELK, Cassandra, $ Dyn’DB • Best of Both Worlds: Accelerate Time-to-Market while Eliminate Tech Debt • Fyi Details are under the Circles and below the Data Drums Conventional Data Flow –– plus Accelerator Patterns and Tools
  • 20.
    2020 • (Over)Abundant GREEN highlights NEW Components • FyiSpecific AWS and Azure Components available in Complementary Slide (upon request) • Cannot just “Deploy” per CRISP-DM • INTEGRATE is the word Sample Transactional & Analytic Multi-Purpose Landscape – Integration is CRITICAL
  • 21.
    2121 • (Over)Abundant GREEN highlights NEW Components • #FUTURE: MakeGCP and OCI Versions • Cannot just “Deploy” per CRISP-DM. • INTEGRATE is the word Breakdown (Just FYI – FUN ☺ Complementary Slide with AWS and Azure Components)
  • 22.
    2222 Analytics & DataScience Lifecycles ► Old and New ◄
  • 23.
    23 Cross-Industry Standard Processfor Data Mining: • Focused on Data Mining Silos • Advent 1997 CRISP-DM (1997+) Current Methodologies to Drive Analytics / Data Science Projects Highlights: • CRISP-DM: Wikipedia, Towards Data Science • Data Preparation is often said to hold “most of the work” • Modeling (click) is the ML Core: o Model / Algorithm Selection and Creation (click for decision tree options, etc) o Model Test Plan o Parameter Testing & Tuning
  • 24.
    24 Current Methodologies toDrive Analytics / Data Science Projects Some Modern Alternatives • SAS Institute: SEMMA — Sample Explore Modify Model Assess (click) • IBM: ASUM-DM — Analytics Solutions Unified Method (click) • Microsoft: TDSP — Team Data Science Process lifecycle (click) • Collective University Study in Germany & South Korea, 2020: CRISP-ML(Q) — Focus on QA (click) • Note about Model Selection & Evaluation: Like the University Study above, “There are plenty of ML models and it is out of the scope of this paper to compare and list their characteristics. However, there are introductory books on classical methods… (click)” Where They Fall Short (why not widespread yet?) • People associate them with proprietary tools of their respective vendors. • While filling certain gaps in CRISP-DM, there are more to address: • Scalability / Performance • Accountability / SLAs • DevSecOps (DevOps, Security, Compliances) • Holistic QA and Continual Improvement • Democratized Analytics / Citizen Data Science • etc • Applicable in all modern initiatives, and especially medium to large enterprises. • Underlying SEAL whitepaper (click) has more info. (See “Info on Other Industry Methods” section.) ◄══ Personal Favorite
  • 25.
    25 Juxtaposing OLD and NEW: Methodologies to Drive Analytics / Data Science Projects
      OLD: Cross-Industry Standard Process for Data Mining (CRISP-DM)
      • Focused on Data Mining Silos
      • Advent: 1997
      • Good Explanation (here)
      NEW: Scalable Enterprise Analytics Lifecycle (SEAL)
      ➢ Modernized Version of CRISP-DM (w/ considerations of other methods)
      ➢ Advent: 2020 based on Past Experience in Fortune 1000 and Government
      ➢ Read on …
      [Diagrams: CRISP-DM (1997+) beside SEAL (New). SEAL labels: Start; Charter; Define Charter incl Goals & SLAs; Wrangle & Refine; Fulfill & Validate; Implement & Improve; Data; Optimize (twice)]
  • 26.
    26 Diving In with SEAL: Scalable Enterprise Analytics Lifecycle
      A Modernization of CRISP-DM (and Other Proven Methods)
      Highlights
      ➢ Multi-Faceted / Multi-Team support for data science as well as conventional analytics and citizen/democratized analytics (self-service)
      ➢ Accommodates Modern Reference Architectures (recall our Value-Scape)
      ➢ Lifecycle is Closed Loop with Feedback and Optimization
      ➢ Charter and Goals Driven so “Understanding” permeates and persists (makes continual improvement possible)
      ➢ SLAs are Optional but Typical in Modern Enterprises
      [Diagram labels: Start; Charter; Define Charter incl Goals & SLAs; Wrangle & Refine; Fulfill & Validate; Implement & Improve; Data; Optimize (twice)]
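The closed-loop idea on this slide can be sketched in a few lines of Python. This is only an illustration, not part of the SEAL whitepaper: the phase names come from the diagram, while the phase order (inferred from the callout numbering on later slides) and the helper names `next_phase` and `walk` are assumptions made for the example.

```python
from enum import Enum
from typing import List


class Phase(Enum):
    """The four SEAL phases named on the diagram (order is an assumption)."""
    CHARTER = "Charter"
    WRANGLE_REFINE = "Wrangle & Refine"
    FULFILL_VALIDATE = "Fulfill & Validate"
    IMPLEMENT_IMPROVE = "Implement & Improve"


_ORDER = list(Phase)  # Enum members iterate in definition order


def next_phase(current: Phase) -> Phase:
    """Advance one phase; the last phase wraps back to the first (closed loop)."""
    index = _ORDER.index(current)
    return _ORDER[(index + 1) % len(_ORDER)]


def walk(start: Phase, steps: int) -> List[Phase]:
    """Return the phases visited when taking `steps` transitions from `start`."""
    path = [start]
    for _ in range(steps):
        path.append(next_phase(path[-1]))
    return path


if __name__ == "__main__":
    # Four transitions from Charter complete one loop and land on Charter again.
    for phase in walk(Phase.CHARTER, 4):
        print(phase.value)
```

A real lifecycle would also carry feedback (the "Optimize" arrows) between non-adjacent phases; the wrap-around here only models the outer loop.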
  • 27.
    27 Diving Deeper with SEAL: Scalable Enterprise Analytics Lifecycle
      [Diagram labels. Phases: Charter; Wrangle & Refine; Fulfill & Validate; Implement & Improve. Other labels: Start; Define Charter incl Goals & SLAs; Business Understanding; Data Understanding; Data Acquisition, Preparation & Blending; AI/ML Modeling; Analytic Solutioning; Evaluate; Scale; Actualize / Visualize; Deployment / DevSecOps; Integration (Internal + External); Monitoring; Data; Optimize (twice)]
      … SEAL Highlights Continued:
      ➢ Scales with Big Data Volume, Velocity, Variety, Veracity, Value (click)
      ➢ Integrates with Enterprise and Multi-Enterprise Ecosystem
      ➢ Incorporates Modern DevSecOps (DevOps + Security)
      ➢ Continuous Improvement is Built-In
      Last Glimpse –– CRISP-DM (1997+) Juxtaposed with “Deeper” SEAL (New):
      • Focused on Data Mining Silos
      • Advent: 1997
      • Good Explanation (here)
  • 28.
    28 Diving Deeper with SEAL: Scalable Enterprise Analytics Lifecycle
      A Modernization of CRISP-DM (and Other Proven Methods)
      [Diagram: SEAL lifecycle as on slide 27, annotated with Callouts 1 to 10]
      ➢ CRISP-DM Parts of SEAL (Jeff’s SEAL article + 3rd Party article)
      ➢ Read on for Distinguishing Callouts …
  • 29.
    29 Diving Deeper with SEAL: Scalable Enterprise Analytics Lifecycle
      A Modernization of CRISP-DM (and Other Proven Methods)
      [Diagram: SEAL lifecycle as on slide 27, annotated with Callout 1]
      Callout #1: Added Charter with Goals & SLAs
      LEAN, BULLET ORIENTED CONTENT:
      ✓ Goals – seek to compare/contrast, predict, prescribe. Outcomes are generally measurable.
      ✓ Value Proposition Summary – incl brief bullets for Who, What, When, Where, Why, How
      ✓ Cost-Benefit – can be simple score, e.g. 1 low to 5 high
      ✓ Risks & Mitigations
      ✓ SLA Summary (Optional) – Quality, Performance, Availability / Reliability, Security (e.g. click1, click2). Details can be deferred until the “Fulfill” and “Implement” phases.
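One way to keep a charter lean and machine-checkable is to hold it as a small record. The sketch below is an assumption-laden illustration rather than the author's template: the class and field names (`Charter`, `SlaSummary`, `net_benefit`) are hypothetical, and only the listed content items and the 1 (low) to 5 (high) scoring range come from the callout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SlaSummary:
    """Optional SLA summary; details can be filled in during later phases."""
    quality: str = "TBD"
    performance: str = "TBD"
    availability_reliability: str = "TBD"
    security: str = "TBD"


@dataclass
class Charter:
    goals: List[str]                    # e.g. compare/contrast, predict, prescribe
    value_proposition: Dict[str, str]   # keys such as Who, What, When, Where, Why, How
    cost: int                           # simple score: 1 (low) to 5 (high)
    benefit: int                        # simple score: 1 (low) to 5 (high)
    risks_and_mitigations: Dict[str, str] = field(default_factory=dict)
    sla: Optional[SlaSummary] = None    # SLAs are optional

    def __post_init__(self) -> None:
        if not self.goals:
            raise ValueError("a charter needs at least one goal")
        for name in ("cost", "benefit"):
            score = getattr(self, name)
            if not 1 <= score <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {score}")

    def net_benefit(self) -> int:
        """Hypothetical helper: benefit score minus cost score."""
        return self.benefit - self.cost


charter = Charter(
    goals=["Predict quarterly churn"],
    value_proposition={"Who": "Retention team", "Why": "Reduce churn"},
    cost=2,
    benefit=5,
    risks_and_mitigations={"Sparse history": "Start with a pilot segment"},
)
print(charter.net_benefit())  # prints 3
```

Validating the scores at construction time keeps an out-of-range cost or benefit from slipping into later prioritization.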
  • 30.
    30 Example Charter Template Part 1 of 3 –– MVU / PRD / Enhanced Biz Strategy Canvas (BSC)
      SWOT Summary
      • Strengths: o Whatever.
      • Weaknesses: o Whatever.
      • Opportunities: o Whatever.
      • Threats: o Whatever.
      Value Proposition (for ST Customers unless stated otherwise)
      ♦ WHAT: • Whatever. • Whatever. • Whatever.
      ♦ WHO: • Whatever.
      ♦ WHEN: • Whatever.
      ♦ HOW: • Whatever. • Whatever.
      ♦ WHERE: • Whatever.
      ♦ WHY: • Whatever.
      ♦ BUSINESS VALUE VALIDATION:
        o Proactive: Preemptive Survey plus AI / ML Predictive performance metrics (OKRs / KPIs) around revenue, profit, market share gain, ROI, etc.
        o Reactive: Net Promoter (NPS) Contractor Survey plus Actual performance metrics (OKRs / KPIs) around revenue, profit, market share gain, ROI, etc.
      Columns: # | Goal Type(s) | Opportunity Goals and Profile | Cost : Ben : Risk & Value (1-5 Hi : 1-5 Hi)
      Row 1:
        Goal Type(s): • Mkt Dev: Large Niche + Surge • Prod Dev: Add-On
        Opportunity Goals and Profile: SHORT DESCRIPTION: • Details • Cascade Value Options: o Etc. o Etc. • Goal(s): Summary. • Market: Grow Customer Base, New Inbound Channel. • Finance: Increase Revenue. • Product Add-On: Whatever. • Product Lines: One, Two. • Deployment / Packaging Options: Whatever.
        Cost : Ben : Risk & Value: • BIZ 2 : 5 : 1 • TECH 3 : 5 : 2 • VPROP: High • PRIORITY: High
  • 31.
    31 Example Charter Template Part 2 of 3 –– MVU / PRD / Enhanced Biz Strategy Canvas (BSC)
      Columns: # | Nutshell | Channels / Touch Points | Risks / Unknowns | Mitigations
      Row 3:
        Nutshell: • SHORT DESCRIPTION • Mkt Dev: Large Niche + Surge • Prod Dev: Add-On
        Channels / Touch Points: • Sales: Whatever. • Distribution: Whatever. • Communication: Whatever. • Command & Control (DoD): Whatever.
        Risks / Unknowns: a) Whatever. b) Whatever. c) Whatever.
        Mitigations: Proactive: a) Whatever. b) Whatever. c) Whatever. Reactive: • Whatever.
      Key Partners / Vendors
        Critical: • Whatever. • And/or Whatever. Also see Mashups section below, especially partners for the “Data” drilldown.
      Key Resources, Activities, Dependencies
        Resources and Activities: • Biz: Need Inhouse Role as liaison with Whatever. • Tech: Support service request feed … and Closed Loop back to Whatever.
        Other Dependencies: See Mashups section below.
      Customer Segments & Relationships
        Segments: • Related / Niche Markets.
        Relationships: • Initial: Whoever for Large Surges / Special Events. • Potential Future: o Whoever (recurring based on bla bla bla). o Whoever (occasional when whatever).
      Cost Structure
        Choose (usually just 1): • Value Driven (High Value Proposition) vs Cost Driven (Low Price). • OR Cost Driven (Low Price). • OR Whatever.
      Monetization, Revenue Streams & Models
        Monetize by: Whatever. Revenue Stream: Recurring. From Whoever1: Subscription (potentially Tiered by Features/Usage) and/or Per Use/End Client. From Whoever2: #TBD if can monetize, e.g. via Whatever.
        Genl Behavior: • Initial: Large Surges (due to Event in VProp “When”). • Long Term -- can also Sustain via broadened Whatever Network and satisfied End-Customers.
      Mashups & Critical Integrations
        Data & Analytics: • See MVU Board Part 3.
        Functional Product Dev for: • Whatever / Partner Portal + Key Resources, Activities (above).
  • 32.
    32 Example Charter Template Part 3 of 3 –– MVU / PRD / Enhanced Biz Strategy Canvas (BSC) –– Semi #Techie
      Columns: # | Project Nutshell
      Row 3: • SHORT DESCRIPTION • Mkt Dev: Large Niche + Surge • Prod Dev: Add-On
      SLAs INTERNAL: • Quality: Whatever. • Performance: Whatever. • Availability / Reliability: Whatever. • Security / Compliance: Whatever.
      SLAs EXTERNAL: • Quality: Whatever. • Performance: Whatever. • Availability / Reliability: Whatever. • Security / Compliance: Whatever.
      Data INTERNAL | Data EXTERNAL (All inbound unless stated otherwise, e.g. for Closed Loop)
        • Whatever (e.g. Social Network): Whatever.
        • Whatever (e.g. Property Assets): Whatever.
        • Whatever (e.g. External Demographics): Whatever.
        • Validation: For Proactive AI / ML predictive performance metrics (see Profile “Value Validation” section), need access to Whatever.
        • #BONUS: Can potentially (eventually) SELL Whatever data, optionally blend w/ External data. E.g. Whoever can use in Insurance Risk and Rates determination, etc.
      General Usage:
        • 2 Purposes: Pre-Event (e.g. Line up Whatever based on Forecasting) + Post-Event (e.g. Adjust Campaign / Continual Improvement).
        • 2 Patterns: Med to Very High Velocity and Volume (Med Pre-Event. Med to Very High Post-Event.).
        • Interactions –– Pre-Event:
          o E.g. Identify real estate properties by Prosperity (across metro, urban, rural), Additional Demographics, Competitive Landscape, etc.
          o See AI / ML details in Value Proposition “WHAT” and “HOW” sections (MVU Board Part 1).
        • Interactions –– Post-Event:
          o Feed Whatever Queue.
          o Closed Loop feed completed Whatever info back to Whoever.
      Sources Critical – Data Owners of Whatever Info:
        • Source1 (Gov): Whatever1.
        • And/Or Source2 (Cmcl): Whatever2.
      Sources Optional – For Early Warning Whatever Info, etc (Start with Just 1 or 2 of the following):
        • Preferred (Gov+Cmcl): for Whatever1 Events: Whatever org / dataset.
        • Maybe Cmcl: for Whatever2 org / dataset.
        • Maybe Gov: for Whatever3 org / dataset.
  • 33.
    33 Diving Deeper with SEAL: Scalable Enterprise Analytics Lifecycle
      A Modernization of CRISP-DM (and Other Proven Methods)
      ➢ Interaction and Dependencies across Roles: Data Engineer, Data Scientist, DevOps Engineer
      ➢ Sets the Stage for Subsequent Phases …
      (Diagram Source: Unknown #TODO: Replace with custom diagram)
  • 34.
    34 Diving Deeper with SEAL: Scalable Enterprise Analytics Lifecycle
      A Modernization of CRISP-DM (and Other Proven Methods)
      [Diagram: SEAL lifecycle as on slide 27, annotated with Callouts 2 and 3]
      Callout #2: More than just Data Preparation
      ✓ Wrangle, Visualize when Possible (“for best results”)
      ✓ Acquire, Prepare, Verify Data Quality
      ✓ Blend, Refine, Transform
      Callout #3: Support for Conventional Analytics
      ✓ Analytics & Data Science can Share the Same Lifecycle
      ✓ Consistent Results + Economy of Scale
      ✓ Share Data Assets and Workflows
      ✓ Certain Tools Enable This Sharing (e.g. Alteryx, KNIME, etc)
  • 35.
    35 Diving Deeper with SEAL: Scalable Enterprise Analytics Lifecycle
      A Modernization of CRISP-DM (and Other Proven Methods)
      [Diagram: SEAL lifecycle as on slide 27, annotated with Callouts 4, 5 and 6]
      Callout #4: Add Scalability Checkpoint
      In Addition to Evaluation of Accuracy / Quality:
      ✓ Validate Scale Up and Out Scenarios
      ✓ Cover Data Volume, Velocity, Variety, Veracity, Value (click)
      Callout #5: Add Feedback Loop to Data Wrangle & Refine
      ✓ Immediate Gratification >> Agile Spiral
      ✓ Also update Business and Data Understanding for Current and Future Stories
      Callout #6: Add Actualize / Visualize Validations
      ✓ Test 360 Fulfillment of core data as well as edge cases, etc
      ✓ Prototype / Preview Actualizations, e.g. End Visualizations, Alerts, Data Sharing / Feeds
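A scalability checkpoint (Callout #4) can start as something very small: time the same workload at growing data volumes and flag any step where runtime grows much faster than the data. The sketch below is one possible shape for that check, covering Volume only. The function names, the report layout and the 1.5 tolerance are all assumptions chosen for illustration, not values from the SEAL whitepaper.

```python
import time
from typing import Callable, Dict, List, Sequence


def measure(workload: Callable[[int], object], sizes: Sequence[int]) -> Dict[int, float]:
    """Run `workload(n)` once per size and record wall-clock seconds."""
    timings: Dict[int, float] = {}
    for n in sizes:
        start = time.perf_counter()
        workload(n)
        timings[n] = time.perf_counter() - start
    return timings


def scaling_report(timings: Dict[int, float], max_ratio: float = 1.5) -> List[dict]:
    """Compare consecutive runs.

    ratio = (time growth) / (volume growth). A ratio near 1 means roughly
    linear scaling; a ratio above `max_ratio` marks the step as failing.
    """
    sizes = sorted(timings)
    report = []
    for small, large in zip(sizes, sizes[1:]):
        if timings[small] <= 0:
            raise ValueError("timings must be positive")
        volume_growth = large / small
        time_growth = timings[large] / timings[small]
        ratio = time_growth / volume_growth
        report.append({"from": small, "to": large, "ratio": ratio, "ok": ratio <= max_ratio})
    return report


def passes_checkpoint(timings: Dict[int, float], max_ratio: float = 1.5) -> bool:
    """True when every consecutive step stays within the tolerance."""
    return all(step["ok"] for step in scaling_report(timings, max_ratio))


if __name__ == "__main__":
    # Hypothetical workload: sorting n integers.
    observed = measure(lambda n: sorted(range(n, 0, -1)), [10_000, 100_000, 1_000_000])
    for step in scaling_report(observed):
        print(step)
```

Single runs are noisy, so a real checkpoint would repeat each measurement and would also need scenarios for velocity and variety, which this sketch does not attempt.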
  • 36.
    36 Diving Deeper with SEAL: Scalable Enterprise Analytics Lifecycle
      A Modernization of CRISP-DM (and Other Proven Methods)
      [Diagram: SEAL lifecycle as on slide 27, annotated with Callouts 7, 8, 9 and 10]
      Callouts #7 and #8: Add Integrations and DevSecOps
      In Addition to Basic Deployment, Prepare and Implement:
      ✓ Integrations and Orchestrations
        • Internal Interoperabilities (DevOps, CI/CDeliver, SysOps, Compliance Checks, UAT, CDeploy, Alerts, Inhouse Consumers/Feeds, etc)
        • External Interactions (Outside Consumers, Interfaces, Partners, etc)
      ✓ Proof of Performance & Scalability
      ✓ Proof of Security & Compliance Safeguards / Mitigations
      Callouts #9 and #10: Add Monitoring & Improvement
      ✓ Retrospective
      ✓ Continuous Improvement Feeds (Manual, Drift Detection, Feature Relevance/Noise over time, Imputing Data Quality / Replacement, Model Degradation, etc – e.g. click for model retraining situations)
      ✓ React Within Project Scope + Enterprise Ripples (if any)
      ✓ Feedback to Business & Technical “Understanding”
      ✓ Feedback to / from Project Charter, Goals, SLAs (potentially adjust)
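Drift detection, one of the improvement feeds listed under Callouts #9 and #10, can be illustrated with the Population Stability Index (PSI). PSI is a technique chosen here for the example; the deck does not prescribe it. The sketch bins a baseline sample, bins a newer sample with the same edges, and sums (actual − expected) · ln(actual / expected) over the bins. The bin count, the `eps` floor for empty bins and the function name are assumptions, and both samples are assumed to be non-empty.

```python
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI of `actual` against the `expected` baseline, using equal-width bins."""
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("baseline sample has no spread; cannot build bins")
    width = (hi - lo) / bins

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            index = int((v - lo) / width)
            index = min(max(index, 0), bins - 1)  # clamp out-of-range values to the edge bins
            counts[index] += 1
        # Floor each share at eps so the logarithm below is always defined.
        return [max(c / len(values), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


if __name__ == "__main__":
    baseline = [i / 100 for i in range(100)]
    shifted = [x + 0.5 for x in baseline]
    print(population_stability_index(baseline, baseline))
    print(population_stability_index(baseline, shifted))
```

A commonly quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, but the right threshold for triggering retraining depends on the feature and the model, so treat those numbers as a starting point only.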
  • 37.
    37 Related Kewl & Emerging Tech ► Accelerators, ROI, Maintainability, Etc ◄ #TODO Refine / Expand this section in Future Version
  • 38.
    38 Related Kewl & Emerging Tech ► Continuous Innovation ◄ (Source: Gartner Inc. – www.gartner.com) Need Iterations or New Tech to Sustain
  • 39.
    39 Data Journey –– Reference Architecture Options (Innovation Opps in Purple)
      © Hortonworks Inc. 2011 – 2016. All Rights Reserved
      [Diagram labels: ENTERPRISE BUSINESS SOURCES; Heterogeneous Sources (• Patient Data • Historic Events • Vaccination Coverage • HER • Survey Data • Pandemic Influenza Data • Other); MED SPECIFIC INPUTS (• Census Data • Health Care • Research Data • Geo-Spatial • Other DBMS’s); REAL-TIME DATA SOURCES (RFID, Social, WEB, POS, GPS, IoT/Hub/Edge, GEO (various)); INGEST; STREAM; MONITOR; REAL-TIME ALERTS; ACT; Expedited Track (for special cases); ENTERPRISE DATA LAKE; STORE AND PROCESS (ADLS, S3, hadoop / HDFS); REFINED DATA HUB / DW; ANALYTICS TOOLKIT (ANALYZE, LEARN); Data Discovery & Business Intelligence; Produce VISUALIZATIONS; Produce (Mult Options); DOWNSTREAM DATA CONSUMERS; ACTUALIZATION: INSIGHTS & FOLLOW-UP; DATA >> ACTIONABLE INFO; QUALITY & SECURITY “BY DESIGN” (End-to-End incl QA Checkpoints, SSO, SOD), Also See Security Framework Figure and Governance Callouts. Step markers: 1a, 1b, 2, 3, 4a, 4b, 5a, 5b]
  • 40.
    40 Related Emerging Tech ► DevSecOps ◄ #TODO Refine / Expand this in Future Version (Source: Gartner Inc. – www.gartner.com)
  • 41.
    41 Lifecycle Optimization --- Product Highlights with High Focus On Innovation, ROI, Cost-Benefit, Endurance, EcoSystem, Vendor Viability, Etc
      Artificial Intelligence / Machine Learning (AI / ML)
      Columns: Products | Why | Remarks
      • Alteryx (.com): “Citizen Data Science Platform” that empowers conventional business/data analysts to perform formal AI/ML while sharing assets and workflows with formal data scientists and AI engineers. Other terms for Citizen Data Science: Democratized Data Science, Self-Service Analytics. Alteryx is a Gartner MQ Leader. Drag-and-Drop No Code support (and optional command-line). “Visualytics” approach renders graphics all along data pipeline – easy to find anomalies early vs waiting for final visuals. Also supports lite-moderate Data Integration (prep/blend/transform).
      • DataRobot (.com): “AutoML” -- Automate the normally laborious tasks of trial and error, and more. Another form of Democratized Data Science, but still requires some statistical background. Automation of ML Overall (click), Feature Selection (click), Data Exploration & Preparation (click), Deployment (on-prem, managed AI cloud, hybrid & multi-cloud).
      Data Virtualization / Integration
      Columns: Products | Why | Remarks
      • Denodo (.com) or Lyftron (.com): Expedite and centralize access to all data everywhere. Data Catalog can exceed industry leaders (offline discussion). Denodo is Top ~3 and Gartner MQ Leader – read Small Print about Data Virtualization vs DI. Read mostly + lite write support. Lyftron is an emerging contender at ~75% lower cost.
      • Delphix (.com): Virtual Data Cloning of DBMS (only) data sources. Supports full read+write, data masking, access controls, catalog. Purported “data virtualization” is really virtual data cloning – although it is immediate so excellent for large scale testing and CI/CD using real or scrubbed/masked production data. Clones consume ~zero initial space (grows only on deltas).
      • Magpie (Silect.is): Automated Data Exploration and Data Prep combined with Data Catalog and Governance/Security features. Powered by Apache Spark plus proprietary features. SaaS supports AWS, Azure, GCP.
      #TODO: Refine / Add Info
  • 42.
    42 Lifecycle Optimization --- Product Highlights with High Focus On Innovation, ROI, Cost-Benefit, Endurance, EcoSystem, Vendor Viability, Etc
      DevSecOps and ALM (Lifecycle) Optimization
      Columns: Products | Why | Remarks
      • ProofMethod (.io): Centralized Monitoring and Insights of End-to-End Plan, Build, Test, Deploy activities. Works with Jira, GitHub, Jenkins (and more coming). Blended metrics with visibility for past week, month, quarter, year, etc. Shows Plan Metrics (issues by lead time and counts by status/type/priority/etc), Code Metrics (avg commits by time, top/least contributors by pulls, commits, merges), Test Metrics (results over time), Pipeline Metrics (results over time, duration by build/run, etc).
      • DataRobot (.com) – Yes Again ☺: Automated MLOps and Monitoring (e.g. Drift Detection, Feature Relevance vs Noise over Time, etc). Centralized UI panel (and command-line) for deployment, monitoring/insight, management, and governance of machine learning models in production environments – across any cloud. Also see AI/ML section for use of DataRobot in Development, etc.
      • Liquibase (.org and .com) fka Datical: Automated DataOps including CI/CD for DBMS’s (click for integrations list). DataOps tracking, versioning, and CI/CD incremental deployment support for database changes. Millions of developers to date on open source version. Also paid support options (e.g. $33 to $99 per connection per month). Mainly for SQL Databases plus recently added MongoDB, Cassandra (in 2020-H1) -- adding more NoSQL platforms over time.
      • CloudHealth (VMware.com): Multi-Cloud Infrastructure Financial Management & Insights, Provisioning Policies, and Security Risk Analysis. CloudHealth also integrates with third party monitoring tools for additional depth (DataDog, New Relic, WaveFront, etc). AWS CloudWatch updates Oct 12, 2020 (click) add improved UI and works with Cost Explorer and Trusted Advisor. But CloudHealth is Multi-Cloud + has other distinguishers such as ease of management across accounts, identifying/terminating un/under-used assets, RI optimization (#TBD how different for Savings Plans), highlighting old/upgradeable EC2/VM instance types, highlighting/managing storage tiering costs, optimizing containerization costs by comparing utilization vs provisioning (click for more info).
      #TODO: Refine / Add Info
  • 43.
    43 Contact Info –– For Future Questions plus Updates of this Deck and Underlying Whitepaper • Many Thanks to Data Con LA (Subash and Team) … and YOU! • Wide Open to (pro-bono) Q&A after this session. Also see Co-Authoring Opps on this and other topics. Jeff Bertman, CTO & Lead Data Scientist/Engineer, Dfuse Technologies • Jeff.Bertman@dfusetech.com (and Jeff.TechBreeze@gmail.com per blog/whitepapers) • Cell/Text 818-321-3111, Headquarters 877-553-3873 • Also available on WhatsApp (same as cell #), MS Teams, Slack, Google Meet, Zoom, Discord, etc • FUTURE VERSIONS of this deck in Public Online Folder (https://1drv.ms/u/s!AuvOPf8_XJZVmVjLFWVqLD126XgC?e=83UM7C) • Underlying Whitepaper – Wide Open to Co-Author Contributions for Future Versions: Medium.com (https://link.medium.com/i864jb47f9) or ~same article on LinkedIn (click) • Dfuse Technologies Main Site (www.DfuseTech.com) – Nationwide Commercial & Gov/DoD Q&A, Contact Info, Future Updates Closing Remarks
  • 44.
    44 Make it a Great Day :)
  • 45.
    45 Appendix Some Example Artifacts: Generic and Fictional (but Realistic)
  • 46.
    46 Appendix: Roadmapping Innovation Highlights Heads-up on Related Future Article: Preemptive Agile Roadmapping (PAR)
  • 47.
    47 Appendix: Legend for Technical & Business Roadmaps (Next 2 Slides)
      Columns: Symbol | Description | Remarks
      • Done: Completed work.
      • In Progress: Aka Work in Progress (WIP).
      • [Outplan]: Popup Work not previously planned on Roadmap. For example, due to new government regulations, time-to-market race, etc.
      • Kickoff Meeting: Incl Interoperability Requirements, Risks & Mitigations, etc.
      • Moved to Future: Typically due to Outplan work. Sometimes we can squeeze schedule or use flex resources to accommodate. Other times we slide like this.
      • Moved SOONER than originally planned: Occurs when running ahead of schedule (various reasons, e.g. new staff, etc).
      • Cautionary Note: Generally requires discussion with stakeholders.
      • #TBD: “Dust” tag to resolve something unknown. Can impact Roadmap, depending on outcome or delay in resolving.
      • #WAIT: “Dust” tag indicates something is pending. Can impact Roadmap if dependency arrives/occurs late.
      • #DEPEND: “Dust” tag indicates some other kind of dependency. Can impact Roadmap if dependency arrives/occurs late.
      • #IDEA: Discussion tag about a topic to discuss. Discussion typically includes Biz and/or Tech Stakeholders (dep on which Roadmap).
      NOTES:
      • Beyond straight planning and prioritizing, Roadmap Meetings (1:1 and Roadmap “Unity Sessions”) are a great way to socialize how we are shaping our Future together ☺
      • I generally prefer sharing Roadmaps via a SaaS app such as Aha or RoadMunk, etc. But PowerPoint or Google Slides can also suffice.
  • 48.
    48 2019 – 2020 Data Engineering BUSINESS Roadmap (Fictional but Realistic) –– Rev: 2019-09-17a
      Q3 2019
        ABC Biz Expansion – Wave B: • Paid Features Ph 1: PRD + MVP Interim Ops • Cross-Enterprise Workflows Pilot • Talent Referrals Interim Semi-Auto Ops • Contracts Ph 2: Retro Metrics • [Outplan] Google DFP Data Feed Mods Ph 1 • [Outplan] Platinum Tier Fees: MVP Pilot
        XYZ Consumer Expansion: • FB Video Data Ph 1: PRD + MVP1 • [Outplan] FB Driven Major Data Enhancements – Implem
        Core Intelligence Platform (CIP): • Social Analytics Data Group 1a • Multi-Platform Contracts PRD + MVP • [Outplan] Multi-Platform Platinum (ripple) • Salesforce Integration Ph 1b: MVP • Google DFP (Direct Sales) Changes • Platform Driven Integration Changes
      Q4 2019
        ABC Biz Expansion – Wave C: • Paid Features Ph 2: Auto-Pay to Scale • Cross-Enterprise Workflows LOB Group 1 • Platform Expansion Ph 2a MVPs o Amazon Video Direct (AVD) o Pluto TV (PTV) • [Outplan] Google DFP Data Feed Mods Ph 2 • Platinum Tier Fees: Full Implem Oct 16 Cycle
        XYZ Consumer Expansion: • FB Video Data Ph 2: MVP2 + Full (#WAIT for YT to Populate Estimated Earnings) • TWITCH General Ph 1: PRD + MVP
        Core Intelligence Platform (CIP): • Social Analytics Data Groups 1b + 4 • Salesforce Integration Ph 2: Implem • YT Video Enhance Ph 2: PRD (e.g. Video Topic Xref to Game Types, etc) • YT Bulk -- Add Lower Tiers Part 1 • Platform Driven Integration Changes
      Q1 2020
        ABC Biz Expansion – Wave D: • Estimated Earnings Refinement (SChat supersedes Fan Funding) • Cross-Enterprise Workflows LOB Group 2 • Platform Expansion Ph 2b: Full for AVD, PTV; MVP for FB • SAP Migration PRD + Pilot #TBD Need more granular contract info?
        XYZ Consumer Expansion: • CDP Direct Automation Ph 2: PRD (Replace EMI weekly/monthly ingest) • FB General Ph 2: MVP + Full
        Core Intelligence Platform (CIP): • Social Analytics Data Group 2 • YT Video Enhance Ph 3: MVP + Implem • TWITCH Ph1 MVP Automation • #TBD Enhance/Start New Platforms: Verizon Oath (supersedes Go90), Pinterest, Sony VUE, SOHU Global Wave B, Xumo • Platform Driven Integration Changes
      Q2 / Q3 2020
        ABC Biz Expansion – Wave E: • Affiliate Enhancements Ph 2 • #TBD Director Payments • Cross-Enterprise LOB Group 3 • Platform Expansion Ph3: Full for FB, MVP for Comcast • SAP Migration Full Implem • Acctg QA Auto Ph 2: Tax Withhold • Acctg QA Auto Ph 3: Wire Xfer + ACH • Talent Referrals Full Prod
        XYZ Consumer Expansion: • CDP Direct Auto Ph 3: Implem (Replace EMI weekly/monthly ingest) • #TBD TWITCH Ph 2: Implem
        Core Intel Platform (CIP): • Social Analytics Data Group 3 • YT Bulk -- Add Lower Tiers Part 2 • #IDEA Video Topic Ph3: Lexicon • JIRA Video Classification f/ SF • YT Legacy Ruby Migration (Upgrade or Python) • Platform Driven Integ Changes
  • 49.
    49 2019 – 2020 Data Engineering TECHNICAL Roadmap (Fictional but Realistic) –– Rev: 2019-09-17a
      Q3 2019
        NEXTGEN Cloud: • Core Ph3: Implement thru Staging • Enterprise Data Sharing Initiative (EDSI) Ph 1 Implem
        ENHANCE TECH HUB: • Data Mgmt Framework (DMF) Ph 2 • Core Intel Platform (CIP) Multi-Tenant Ph 2 • Email Accts Expansion Framework (OTT etc)
        CORE Platforms: • [Outplan HUGE] System/Network Admin Gap-Fill • E-Commerce Encrypt Upgrade (Full Prod) • Sec GDPR Consumer Data Purge Auto (Mult Depts) • Biz Contracts + Invoices Doc (Mult Depts) • Analytics Social Media Accts MFA (Mult Depts) • YT Tech Survey Response (API etc) • [Outplan] Degraded OnPrem HW Workarounds (after long A/C outage mid-summer) • Platform Integration/ETL Chgs/Maint Prod Support O&M, Etc (Mult Depts, ~100 Rqsts/Mo)
      Q4 2019
        NEXTGEN Cloud: • Core Ph4: Implement Production incl Analytics Big Boost 1 (ABB) Implem • Core Ph4: Analytics BIG Boost 2 (ABB): Migrate Dim Models to Snowflake
        ENHANCE TECH HUB: • Data Anomaly Detection (DAD) Ph 2 • Data Readiness Dashboard (DRD) Ph 2
        CORE Platforms: • #TBD MDM Need Sooner than Q1-Q2-2020 (e.g. for OTT Pmt Integ Ph 3)? • Platform Integration/ETL Chgs/Maint Prod Support O&M, Etc (Mult Depts, ~150 Rqsts/Mo)
      Q1 2020
        NEXTGEN Cloud: • Core Ph 5: Analytics BIG Boost 2 (ABB) Implem: Migrate Dim Models to Snowflake • Enterprise Data Sharing Initiative (EDSI) Ph 2 PRD
        ENHANCE TECH HUB: • Data Readiness Dashboard (DRD) Ph 2 • Data Surge Framework (DSF) Ph 1 #TBD Q4 2018(?) PRD + Lite MVP
        CORE Platforms: • MDM Ph 2 PRD Off-YT OTT PPW • DISTRO Ph 1 PRD Spec for Integrtn • Platform Integration/ETL Chgs/Maint Prod Support O&M, Etc (Mult Depts, ~250 Rqsts/Mo)
      Q2 / Q3 2020
        NEXTGEN Cloud: • Enterprise Data Sharing Initiative (EDSI) Ph 2 Implem • Multi-Cloud Optimization Ph 2 PRD + Implem
        ENHANCE TECH HUB: • Data Readiness Dash (DRD) Ph 3 – Full Implem • Data Surge Framework (DSF) Ph 2
        CORE Platforms: • Q1?: MDM Ph 2 Implem OTT PPW • Q2?: DISTRO Ph 2 MVP Implem • #TBD Intra-Video Metrics Ph1 PRD • Platform Integration/ETL Chgs/Maint Prod Support O&M, Etc (Mult Depts, ~300 Rqsts/Mo)
  • 50.
    50 Appendix: Sample 100K Foot Context Diagram (Data & Process) Core Services Our Company Notes: • Sample Diagram –– Generic and Fictional (but Realistic). • This Slide is NOT part of a Typical Analytics Presentation – but can provide background info for MVU / MVP.
  • 51.
    51 Appendix: Sample Workflow with Swimlanes for Various Participants Notes: • Sample Diagram –– Generic and Fictional (but Realistic). • This Slide is NOT part of a Typical Analytics Presentation – but can provide background info for MVU / MVP. Core-Auth Our Company Core #TODO: Add Data Stores or Replace with Data-Centric Repository Sample
  • 52.
    52 Appendix: Agile Retrospective (Fictional but Realistic Example) #Discuss Next Slide Easier to Read >>>
  • 53.
    53 Appendix: Agile Retrospective (Fictional but Realistic Example) #Discuss • Highlight the Most Important Action at the TOP of the Doc – No Fishing for Follow-up. • Same for Each Topic – Nutshell, Health, etc • Easy to Find Critical Reusable Assets (minimize dependence on email)
  • 54.
    54 Appendix: Agile Retrospective (Fictional but Realistic Example) #Discuss • Don’t Forget about the Climate – How do People Feel? What are the Attitudes? • Highlight Special Actions/Follow-ups – Even if they end up in Tickets, Stories, Etc
  • 55.
    55 Appendix: Lite SOP (Fictional but Realistic Example) #Discuss Next Slide Easier to Read >>>
  • 56.
    56 Appendix: Lite SOP (Fictional but Realistic Example) #Discuss • Start with the BASICS (What >> Then How) • Keep it LEAN (“Ceremony” when Needed) • Then the DETAILS