Nathan Bijnens
Manager, Belux CSU Data Team
Data Mesh in Azure using Cloud Scale Analytics
What we’ve heard
Ideally, organizations want to have:
• Less time spent preparing data
• Robust data governance
• A platform that delivers actionable insights to the business
• The ability to increase the value of hidden data
• Improved operational efficiency
• Reduced cost of data engineering
• Data and analytics operationalization
• Enabled lines of business
• A unified ecosystem
Barriers to achieving business outcomes:
• Need for frictionless data governance
• Difficult to balance access and data protection
• Poor data quality
• Disparate systems and data silos
• Too slow moving from data to decision
• Project prioritization
Every application that creates data needs, and will have, a database
[Diagram: Application A and Application B, each with its own database]
Every application that creates data, at least in the context of data management, needs and will have a database. Even stateless applications that create data have “databases”; in these scenarios, the database typically sits in RAM or in a temporary file.
Consequently, when we have two applications, we assume that each application has its own ‘database’. When the two applications interoperate, we expect data to be transferred from one application to the other.
We can’t escape from data integration
[Diagram: data integration between Application A and Application B]
Data transformation is always required because an application’s database schema is designed to meet that application’s specific requirements. Since requirements differ from application to application, their schemas are expected to differ, and data integration is required whenever data moves between them.
Whenever data is transferred, data integration is right around the corner. Whether you do ETL or ELT, virtual or physical, batch or real-time, there’s no escape from the data integration dilemma.
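To make the point concrete, here is a minimal sketch of the integration step between two applications. The record types CustomerA and CustomerB and the function integrate are hypothetical: Application A stores a full name and an ISO date string, while Application B expects split names and a typed date, so a transformation is unavoidable even for this tiny record.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical schema of Application A: full name in one field, date as ISO text.
@dataclass
class CustomerA:
    full_name: str
    signup: str  # e.g. "2021-03-15"


# Hypothetical schema of Application B: split name fields and a typed date.
@dataclass
class CustomerB:
    first_name: str
    last_name: str
    signup_date: date


def integrate(record: CustomerA) -> CustomerB:
    """Map one record from Application A's schema to Application B's schema."""
    # Split on the first space; a single-word name leaves last_name empty.
    first, _, last = record.full_name.partition(" ")
    return CustomerB(
        first_name=first,
        last_name=last,
        signup_date=date.fromisoformat(record.signup),
    )


print(integrate(CustomerA("Ada Lovelace", "2021-03-15")))
```

Whether this mapping runs as ETL or ELT, in batch or in real time, only changes where it executes, not whether it is needed.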
Business Drivers
• Lack of data ownership
• Lack of data quality
• Difficult to see interdependencies
• Model conflicts across business concerns
• Tremendous effort for integration and coordination leads to bypasses
• Business and IT work in silos
• Disconnect between the data producers and data consumers
• Central team becomes the bottleneck
• Difficult to apply policy and governance
• Hard to see technical dependencies
• Small changes become risky due to unexpected consequences
• Technical ownership rather than data ownership
Many enterprises are saddled with outdated data architectures that do not scale to the needs of large, multidisciplinary organizations.
Problems with Existing Architectures
There’s a deep assumption that centralization is the solution to data management. This includes
centralizing all data and management activities into one central team, building one data platform,
using one ETL framework, using one canonical model, etc.
Centralized Architecture
• Single team with centralized knowledge and book of work
• Centralized pipelines for all extraction/ingestion activities
• Centralized transformations to create harmonized data
• Central platform serves as a large integration database: all execution and analysis is done on the same platform
[Diagram: transactional sources (data providers) feed a central engineering team, which serves analytical consumers (data consumers)]
Transformational Trends in the Data Landscape
• Increase of computing power: a massive increase in computing power, driven by hardware innovation (SSD storage, in-memory storage, GPU advances), lets us move data to compute faster.
• Eco-system connectivity: cloud and APIs make it easier to integrate, and Software and Platform as a Service (SaaS, PaaS) offerings push connectivity and API usage even further.
• Explosion of tools: new (open source) concepts are introduced, such as NoSQL database types, blockchain, new database designs, distributed models (Hadoop), new analytical methods, etc.
• Exponential growth of data, especially from external sources like open and social data. Internal, external, structured, and unstructured data are all used to deliver additional insights.
• Increased regulatory attention: stronger regulatory requirements, such as GDPR and BCBS 239, are coming into effect worldwide, so data quality and lineage become more important every day.
• Increase of read/write ratio: more intensive data consumption has changed the ratio; data is read more often, there is more real-time consumption, and more searches are performed.
Data as a Product
Data is no longer a side effect; it’s a product.
• Who are my "customers"?
• What do my "customers" need?
• Are they happy with the data? Are they using it?
• How do I let my "customers" know my data exists?
• What is in it for the "customer"?
Data Product Owner
[Diagram: a domain team made up of a Data Product Owner, a data engineer, a software developer, and an infra engineer]
How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh (martinfowler.com)
Zhamak Dehghani
Data Product Properties
• Discoverable: an overview of the product in a central data catalog provides easy discoverability
• Addressable: help users access the product programmatically
• Trustworthy: Data Product Owners provide monitored SLOs; data is cleansed and up to standard
• Self-describing: minimal friction for data engineers and scientists to use the data
• Interoperable: open standards for harmonization; field type formatting
• Secure: access control policies; use of SSO and RBAC
How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh (martinfowler.com)
Zhamak Dehghani
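The six properties can be read as a checklist for the metadata a data product publishes. The sketch below assumes a hypothetical DataProductDescriptor with roughly one field per property, approximates "self-describing" as "has a schema", and uses a small shared type vocabulary as a stand-in for open standards. It is an illustration, not the schema of any Azure or Purview API.

```python
from dataclasses import dataclass, field

# Hypothetical shared type vocabulary standing in for "open standards".
SHARED_TYPES = {"string", "integer", "decimal", "date", "timestamp", "boolean"}


@dataclass
class DataProductDescriptor:
    """Hypothetical catalog entry for a data product (illustrative field names)."""
    name: str                 # Discoverable: what users search for in the catalog
    domain: str               # Discoverable: owning domain
    owner: str                # Data Product Owner accountable for the product
    address: str              # Addressable: where programs read it from
    schema: dict[str, str]    # Self-describing: field name -> type
    freshness_slo_hours: int  # Trustworthy: a monitored SLO (0 = none defined)
    allowed_roles: set[str] = field(default_factory=set)  # Secure: RBAC roles


def missing_properties(p: DataProductDescriptor) -> list[str]:
    """List the properties this descriptor does not yet satisfy, in a fixed order."""
    checks = {
        "discoverable": bool(p.name and p.domain and p.owner),
        "addressable": bool(p.address),
        "self-describing": bool(p.schema),
        "interoperable": bool(p.schema)
        and all(t in SHARED_TYPES for t in p.schema.values()),
        "trustworthy": p.freshness_slo_hours > 0,
        "secure": bool(p.allowed_roles),
    }
    return [prop for prop, ok in checks.items() if not ok]


def can_read(p: DataProductDescriptor, user_roles: set[str]) -> bool:
    """RBAC check: a user may read if they hold at least one allowed role."""
    return bool(p.allowed_roles & user_roles)


forecast = DataProductDescriptor(
    name="Sales Forecast",
    domain="Sales",
    owner="Sales data product team",
    address="https://example.com/data-products/sales-forecast",
    schema={"region": "string", "month": "date", "forecast": "decimal"},
    freshness_slo_hours=24,
    allowed_roles={"sales-analyst"},
)
print(missing_properties(forecast))  # []
```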
Data Mesh
Data Mesh is a new, decentralized, socio-technical approach to managing data, designed to work with organizational complexity and continuous growth. It enables large organizations to get value from their data at scale through reusability, analytics, and ML. It builds on the Domain-Driven Design methodology.
[Diagram: Data Mesh builds on Domain-Driven Design; domain zones publish data products that are consumed by other domains]
How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh (martinfowler.com)
Zhamak Dehghani
Centralized Implementation is not working!
[Diagram: the GSR, Finance, HR, Travel, Sales, and Clinical Ops lines of business all served by one centralized platform]
• LOBs are the SMEs, and the shared service team is not able to cope with the projects
• Dataset sprawl
• Competing needs within the organization:
  • IT needs to standardize
  • LOBs need to implement analytics
• Primitive data strategy
Introduction to Data Domains
[Diagram: the Marketing, Customer Services, and Order Management domains, with operational systems, integration services, and data products such as Search Keywords, Promotions, Top Selling Products, Orders, and Customer Profiles]
• A domain is a collection of people, typically organized around a common business purpose.
• Domains create and serve data products to other domains and end users, independently of other domains.
• Domains ensure data is accessible, usable, available, and meets the defined quality criteria.
• Domains evolve data products based on user feedback and retire data products when they become irrelevant.
Domain Zones
[Diagram: Microsoft Enterprise Data Mesh, with a management zone and data domains (Engineering, Finance, HR, Innovation, Program 1, Operations) that publish data products]
Domain Zone
• An environment for each LOB
• LOBs implement data services, e.g., Exploration Service, Data Order System
• LOBs build and share data products, e.g., Sales Forecast, Clean Room Performance
• Automated using templates: security, integration, monitoring, etc.
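Automating domain zones with templates means every LOB starts from the same baseline and only customizes what it is allowed to. A minimal sketch, assuming a hypothetical BASELINE_TEMPLATE dictionary and a hypothetical rule that the security section may not be overridden; a real deployment would express this in infrastructure-as-code templates rather than Python dictionaries.

```python
from copy import deepcopy
from typing import Optional

# Hypothetical baseline that every domain zone inherits (keys are illustrative).
BASELINE_TEMPLATE = {
    "security": {"rbac": True, "private_endpoints": True},
    "integration": {"catalog": "central"},
    "monitoring": {"diagnostics": True, "retention_days": 30},
}

# Sections an LOB is not allowed to change (an assumption for this sketch).
PROTECTED_SECTIONS = {"security"}


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict: base with override applied recursively."""
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


def provision_domain_zone(lob: str, overrides: Optional[dict] = None) -> dict:
    """Build the configuration for one LOB's domain zone from the shared template."""
    overrides = overrides or {}
    blocked = PROTECTED_SECTIONS.intersection(overrides)
    if blocked:
        raise ValueError(f"{lob} may not override: {sorted(blocked)}")
    zone = deep_merge(BASELINE_TEMPLATE, overrides)
    zone["name"] = f"dz-{lob.lower()}"
    return zone


finance = provision_domain_zone("Finance", {"monitoring": {"retention_days": 90}})
```

Keeping the baseline in one place is what lets a central platform team enforce enterprise requirements while each LOB still moves independently.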
Enterprise Requirements
• Security & Privacy
• Governance & Compliance
• Availability & Recovery
• Performance & Scalability
• Skills & Training
• Licensing & Usage
• Observation & Monitoring
Domain Architecture
Shift towards Domain Ownership
A new type of ecosystem architecture shifts left, towards a modern distributed architecture that enables domain-specific data and data products and empowers each domain to handle its own data pipelines.
[Diagram: source-oriented domains turn data from data providers into data products; consumption-oriented domains apply consumer-specific transformations for their data consumers; both rely on supporting governance and a domain-agnostic platform infrastructure]
Domain Zones and Data Products
[Diagram: an HR domain zone with data products Recruitment, Time Tracking, Employee Value and Performance, Training and Development, and Engagement and Retention; further labels: Engineering Operations, New Project: Digital Twin, Clean Room Personnel]
Data Domain Considerations from the Field
• Map your data domains organically, during the onboarding of data providers and consumers.
• Reference your business capabilities (e.g., strategy and processes) while mapping your data domains.
• Isolate your data domains and enable communication through data products like APIs or events.
• Create and document a shared, ubiquitous language that different domains can use to communicate.
• Determine boundaries for both business and technical granularity.
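Isolating domains behind data products means a consumer depends on a published contract, not on the producer's tables. The sketch below assumes a hypothetical OrderPlaced event, versioned and serialized as JSON, whose field names are part of the shared language between an order management domain and its consumers.

```python
import json
from dataclasses import asdict, dataclass

SCHEMA_VERSION = 1  # hypothetical contract version


@dataclass(frozen=True)
class OrderPlaced:
    """Hypothetical event published by an Order Management domain."""
    schema_version: int
    order_id: str
    customer_id: str
    total_eur: float


def publish(event: OrderPlaced) -> str:
    """Producer side: serialize to the agreed JSON shape; internal tables stay private."""
    return json.dumps({"type": "OrderPlaced", **asdict(event)})


def consume(message: str) -> OrderPlaced:
    """Consumer side: depend only on the contract, rejecting unknown types or versions."""
    payload = json.loads(message)
    if payload.pop("type", None) != "OrderPlaced":
        raise ValueError("unexpected event type")
    if payload.get("schema_version") != SCHEMA_VERSION:
        raise ValueError(f"unsupported schema version {payload.get('schema_version')}")
    return OrderPlaced(**payload)


message = publish(OrderPlaced(SCHEMA_VERSION, "o-42", "c-7", 99.5))
```

Versioning the contract explicitly lets the producing domain evolve its internals freely and change the published shape only through a deliberate new version.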
Enterprise Scale for Analytics
Cloud-scale Analytics Framework
Enterprise Scale: Azure Landing Zones
The main purpose of a “Landing Zone”
is to ensure that when a workload
lands on Azure, the required
“plumbing” is already in place,
providing greater agility and
compliance with enterprise security
and governance requirements.
Data Management Landing Zone
• Business Glossary
• Data Discovery
• SLAs
• Business Rules
• Reference Data Management
• Master Record Management
• Data Policy
• Access Governance
• Loss Prevention
• Privacy Operations
• Risk Assessment
• Repository for Data Models
• Integration
• API Documentation
• Automation for provisioning landing zones, data integrations, and products
• Pre-configured network and monitoring setup
• Standard images for deploying analytics and AI services
• Azure Subscription, Azure Policy
Data Landing Zone
• Core networking: preconfigured network and monitoring setup
• Data lake services: data lake configured with layers and connectivity
• Ingest and processing: Spark and scheduling engines
• Upload: blobs where third parties can upload their data
• Metadata services: scanners for data governance/metadata required by the landing zone
• Shared products: analytics engines for exploratory analytics
Data Integration
Data integration teams are responsible for ingesting data into a read data source. The data shouldn’t have any transformation applied apart from data quality checks and data type verification.
Example data integrations:
• Pull SAP data into the landing zone
• Streaming interface to pull data from heat sensors
Data Products
Data products fulfil a specific business need using data. They manage, organize, and make sense of data across domains and present the insights gained from it. A data product is the result of one or many data integrations and/or other data products.
Example data products:
• Financial reporting pulling customers and sales together
• Streaming machine data from a read data source
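Because a data product can be built from data integrations and/or other data products, its inputs form a dependency graph. A minimal sketch, assuming a hypothetical in-memory lineage map modelled loosely on the financial reporting example (the SAP and heat sensor integration names are illustrative), walks that graph to find every data integration a product ultimately depends on. In practice, a catalog such as Azure Purview could hold this lineage instead.

```python
# Hypothetical lineage: each data product lists its direct inputs, which are
# either data integrations or other data products.
INTEGRATIONS = {"sap_customers", "sap_sales", "heat_sensor_stream"}
PRODUCTS: dict[str, list[str]] = {
    "customers": ["sap_customers"],
    "sales": ["sap_sales"],
    "financial_reporting": ["customers", "sales"],
    "machine_telemetry": ["heat_sensor_stream"],
}


def upstream_integrations(product: str) -> set[str]:
    """Return every data integration a product depends on, directly or via other products."""
    if product not in PRODUCTS:
        raise KeyError(f"unknown data product: {product}")
    found: set[str] = set()
    seen: set[str] = set()
    stack = [product]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already expanded (also guards against accidental cycles)
        seen.add(node)
        for source in PRODUCTS[node]:
            if source in PRODUCTS:
                stack.append(source)  # another data product: keep walking
            elif source in INTEGRATIONS:
                found.add(source)  # a data integration: a leaf of the graph
            else:
                raise ValueError(f"{node} depends on unknown source {source}")
    return found


print(sorted(upstream_integrations("financial_reporting")))  # ['sap_customers', 'sap_sales']
```

Walking the same graph in the opposite direction would answer the impact question raised under Business Drivers: which data products are affected when a data integration changes.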
Infrastructure as Code
Example Reference Architecture for Data Mesh in a Small Company
[Diagram labels: a Data Management Landing Zone with the Data Governance Team and Azure Purview; a Data Landing Zone with a Data Onboarding Team (data integration), a Data Engineering Team, and several Data Product Teams; services: Azure Event Hubs, Azure Data Factory, Azure Data Lake Storage Gen2 (storing read-optimized domain data), Azure Databricks, and Synapse Analytics as a shared service; data integrations are transformed into read-optimized data products that serve real-time applications and operational systems, self-service BI and semantic models, analytical applications, and data-driven applications]
Optimize Existing Implementation Patterns
Take a new approach to data management that supports and evolves with your strategy.
The data management and analytics scenario supports a range of patterns to
build on your current data infrastructure, to help you modernize and scale from where you are.
Data Warehouse, Data Lake, Data Lakehouse, Data Mesh, Data Fabric
Integrating your DWH in a Data Mesh
• From be-all and end-all to yet another data product in your mesh
• Ownership based on your preference:
  • The DWH is a data product on its own, managed by one data product team
  • The DWH serves as a "wrapper" for multiple data products, managed by multiple teams
• The DWH consumes data from multiple data products
• Multiple data products consume data from the DWH
Agile Data Management
Prepare your company to:
• Enforce data governance and security.
• Serve data as a product rather than a byproduct.
• Provide an ecosystem of data products.
• Create data domains to serve lines of business.
• Empower teams to drive analytics solutions that deliver value to the business.
• Modernize your teams and operations.
Multi Organization Data Mesh
[Diagram: several organizations (each labeled Contoso) joined in one mesh, each with its own management zone and Finance and HR data domains that publish data products]
Interested in learning more? Reach out to Nathan.Bijnens@microsoft.com
Links
DDD
• Best Practice - An Introduction To Domain-Driven Design | Microsoft Docs
• Introduction into Domain-Driven Design (DDD) (jannikwempe.com)
• IBM Automation Event-Driven Reference Architecture – Domain Driven Design (ibm-cloud-architecture.github.io)
Data Mesh
• How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh (martinfowler.com)
• Data Mesh in Practice: How Europe's Leading Online Platform for Fashion Goes Beyond the Data Lake - Databricks