Modern Data Architecture
A Data Modernization Green Paper
By Ranjan Bhattacharya, Chief Data Officer
P.O. Box 211 Hampton Falls, NH 03844 | info@eqengineered.com | eqengineered.com | 617.448.4255
Table of Contents
Executive Summary
Introduction
An Evolutionary Journey of Enterprise Data Architecture
    The Enterprise Data Warehouse
    Key Technological Trends and Architectural Shifts
    NoSQL Databases and Polyglot Persistence
    Data Lake and ELT
Patterns for Building a Modern Data Architecture
    Architectural Pattern for Descriptive & Diagnostic Stage Analytics
    Architectural Pattern for Predictive Stage Analytics
    Architectural Pattern for Prescriptive Stage Analytics
Conclusion
Executive Summary
Organizations have been collecting, storing, and accessing data from the beginning of
computerization. Insights gained from analyzing the data enable them to identify new
opportunities, improve core processes, enable continuous learning and differentiation, remain
competitive, and thrive in an increasingly challenging business environment.
The well-established data architecture, consisting of a data warehouse, fed from multiple
operational data stores, and fronted by BI tools, has served most organizations well.
However, over the last two decades, with the explosion of internet-scale data, and the advent
of new approaches to data and computational processing, this tried-and-true data
architecture has come under strain, and has created both challenges and opportunities for
organizations.
In this green paper, we will discuss modern approaches to data architecture that have
evolved to address these challenges and provide a framework for companies to build a data
architecture and better adapt to increasing demands of the modern business environment.
This discussion of data architecture will be tied to the Data Maturity Journey introduced in
EQengineered’s June 2021 green paper on Data Modernization.
Introduction
Data storage has always been an integral part of computer technology. In the early days of
commercial computing, the technology for storing and retrieving data was clunky and slow,
and required users to become experts. In the 80s, with the introduction of relational
databases, data storage and access started becoming easier for even non-experts to use.
A major transformation in the way organizations use data came about with the data
warehousing paradigm in the early 90s. Business leaders could now access business
intelligence dashboards from a centralized data store, fed from multiple operational data
systems. Up to this point, managing the various components of the data architecture was
relatively straightforward for most organizations.
Over the last two decades, several transformative technology trends like the following have
emerged in the business landscape causing foundational changes to business norms:
• Internet-scale data generation and its impact on data volume, velocity, and variety,
• The rise of cloud computing and its impact on scale and availability, and
• Adoption of machine learning and artificial intelligence techniques for business analytics.
For companies to succeed in this transformed landscape, adoption of new technologies and
architectures that align with strategic business priorities is imperative.
An Evolutionary Journey of Enterprise Data Architecture
Today, most organizations implement a variation of the following generic data processing flow
and supporting processes and infrastructure:
[Figure: Data Processing Stages. Generation (business data, operational data); Ingestion
(extraction, transformation); Storage (repositories for query and analytics); Analytics (tools
for running queries and analytics); Consumption (reports, dashboards, APIs). Supporting
processes: Data Flow Pipeline & Workflow; Quality & Security; Performance & Scalability;
Disaster Recovery & Failover.]
The Enterprise Data Warehouse
The dominating data processing architecture of the 90s was the one that integrated
operational systems with the enterprise data warehouse (EDW). The EDW forms the core
component of business intelligence (BI), serving as a central repository of data from multiple
sources, both current and historical.
[Figure: EDW architecture across the Generation, Ingestion, Storage, Analytics, and
Consumption stages. Components: OLTP databases; applications, CRMs, ERPs; APIs, flat files;
ETL processes; data warehouse; query tools; BI tools; reports/dashboards; APIs.]
This architecture enables an enterprise-wide capability for analytics and reporting, thus
allowing organizations to respond quickly to changing business opportunities. For traditional
organizational data-processing needs, this architecture can satisfy quality requirements like
data integrity, consistency, availability, and failover. The flexibility of this architecture has also
served well in the migration from on-premise installations to cloud.
Key Technological Trends and Architectural Shifts
The introduction of newer technologies and platforms like internet-scale systems, migration to
cloud computing, ML/AI based analytics, IoT (Internet of Things), and edge devices like smart
phones have necessitated fundamental shifts to the data architecture to meet the enhanced
requirements for scalability, consistency, and disaster recovery.
The table below lists how enterprise data components are impacted due to this shift.
Each stage is listed with its key attributes before the shift (From), its key attributes after
the shift (To), and the rationale.
Generation
• From: on-prem, mostly relational databases; structured data; batch
• To: cloud-based relational and NoSQL databases; structured and unstructured data; real-time
and batch
• Rationale: moving to the cloud to take advantage of the increased flexibility and
scalability; polyglot persistence (see below) uses the most suitable database technology for
the task
Ingestion
• From: on-prem ETL tools; structured data; batch
• To: cloud-based ELT tools; structured and unstructured data; real-time and batch
• Rationale: ETL processes are being replaced by more flexible and performant ELT ones (see
below)
Storage
• From: on-prem centralized data warehouse
• To: cloud-based data warehouse; distributed storage; data lake
• Rationale: increased flexibility in data collection from multiple sources, scale, and
analytics
Analytics
• From: historical and descriptive; batch; data mining
• To: prescriptive and predictive; real-time and batch; AI/ML models
• Rationale: moving from descriptive to predictive and prescriptive analytics; use of AI/ML
approaches
Consumption
• From: dashboards and reports; centralized administration
• To: AI/ML model driven insights; self-serve
• Rationale: analytics now available to non-technical users through self-serve portals
NoSQL Databases and Polyglot Persistence
Relational database technologies are well-suited for enterprise transaction processing with
their ACID—atomicity, consistency, isolation, and durability—guarantees, which ensure data
validity despite errors and failures. To implement these ACID properties, relational databases
make some tradeoffs with regard to scalability and availability by assuming a single-instance
deployment, with tools for replication and disaster recovery. Handling a larger volume of data
simply means running the database on a larger server.
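To make the atomicity guarantee concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the account names, and the helpers make_db, transfer, and balances are hypothetical and exist only for illustration. The transfer credits the destination first on purpose, so that a failed debit shows the whole transaction being rolled back.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "id TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as a single atomic transaction."""
    try:
        # The connection context manager commits if the block succeeds
        # and rolls back every statement in it if an exception escapes.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit is undone too.
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account id."""
    return dict(conn.execute("SELECT id, balance FROM accounts"))


if __name__ == "__main__":
    db = make_db()
    print(transfer(db, "alice", "bob", 30), balances(db))
    print(transfer(db, "alice", "bob", 500), balances(db))
```

The second transfer fails on the debit, and the credit that preceded it is rolled back, so the data stays valid despite the error.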
Concerns about data consistency and availability were not part of mainstream enterprise
architectural thinking before the confluence of several technological shifts like the widespread
adoption of cloud computing to handle the increasing volume and variety of unstructured data
serving a globally distributed user base. Companies quickly realized that simply moving a
relational database to a larger server cannot meet the scalability and availability
demands of the business. Organizations need to think about distributing the load across
multiple database instances. However, relational databases are not designed to guarantee
availability in the face of the network failures that are inherent in a distributed system.
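One common way to distribute load across multiple database instances is to partition records by a hash of their key. The sketch below is an illustration only, not a description of any particular product: shard_for and ShardedStore are hypothetical names, and plain dictionaries stand in for the separate database instances.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Route reads and writes to one of several independent stores."""

    def __init__(self, shard_count):
        # Each dict stands in for a separate database instance.
        self.shards = [{} for _ in range(shard_count)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key, default=None):
        return self.shards[shard_for(key, len(self.shards))].get(key, default)


if __name__ == "__main__":
    store = ShardedStore(4)
    for i in range(1000):
        store.put(f"user-{i}", {"visits": i})
    print([len(shard) for shard in store.shards])
```

A stable hash such as SHA-256 is used instead of Python's built-in hash(), which is randomized per process for strings and would route the same key to different shards across restarts.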
In addition, modern Internet based applications generate a wider variety of unstructured data
that can consist of images, audio and video data, chat transcripts, application logs, social
media interactions, and other content that is difficult for relational technologies to handle.
A new class of database technologies, collectively known as NoSQL databases, better suited
for distributed operations, and storing various forms of unstructured data, has become
popular in the last decade.
The following table lists the different types of data and access patterns for which NoSQL
databases are more appropriate.
Key-Value
• Strengths: scalability; availability; fast read/write access
• Weaknesses: limited query and ACID capabilities; not able to update data partially
• Use cases: user sessions; chat data; caching
• Tools: Memcached, Redis, Riak, AWS DynamoDB, Azure Cosmos DB
Document
• Strengths: flexible data model; distributed deployment
• Weaknesses: schema-less design may impact maintainability; not able to update data
partially; limited SQL-like query capabilities
• Use cases: IoT data capture; product catalogs; content management
• Tools: MongoDB, CouchDB, AWS DynamoDB, Azure Cosmos DB
Column
• Strengths: scalability; availability; high read/write performance; distributed deployment
• Weaknesses: some have limited SQL-like query capabilities
• Use cases: catalog searches; time series data
• Tools: Cassandra, HBase, AWS Keyspaces, Azure Cosmos DB Cassandra API
Graph
• Strengths: network or hierarchical relationships
• Weaknesses: limited scalability; limited transactional capabilities
• Use cases: social relationships; hierarchies; recommendations; fraud detection
• Tools: Neo4j, AWS Neptune, Azure Cosmos DB Gremlin API
An organization may opt for a mix of different databases, with relational databases providing
the core and critical transactional components of a system, while one or more NoSQL
databases support other use cases. This trend of picking the right database
technology for the right use case has become known as polyglot persistence.
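The idea of polyglot persistence can be sketched in a few lines. In the hypothetical Shop application below, orders go to a relational store (sqlite3), user sessions go to a key-value store, and the product catalog goes to a document store. KeyValueStore and DocumentStore are simple in-memory stand-ins, not the APIs of Redis or MongoDB, and all class and field names are assumptions made for illustration.

```python
import sqlite3


class KeyValueStore:
    """In-memory stand-in for a key-value store (for example, a session cache)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


class DocumentStore:
    """In-memory stand-in for a document store holding schema-less records."""

    def __init__(self):
        self._docs = {}

    def save(self, doc_id, doc):
        self._docs[doc_id] = dict(doc)

    def find(self, **criteria):
        """Return documents whose fields match all the given criteria."""
        return [
            d for d in self._docs.values()
            if all(d.get(k) == v for k, v in criteria.items())
        ]


class Shop:
    """Hypothetical application that picks a store per use case."""

    def __init__(self):
        # Transactional data: relational store with ACID guarantees.
        self.orders = sqlite3.connect(":memory:")
        self.orders.execute(
            "CREATE TABLE orders ("
            "id INTEGER PRIMARY KEY, customer TEXT NOT NULL, "
            "sku TEXT NOT NULL, qty INTEGER NOT NULL)"
        )
        self.sessions = KeyValueStore()  # fast lookups by session id
        self.catalog = DocumentStore()   # flexible product attributes

    def place_order(self, customer, sku, qty):
        with self.orders:  # commit on success, roll back on error
            cur = self.orders.execute(
                "INSERT INTO orders (customer, sku, qty) VALUES (?, ?, ?)",
                (customer, sku, qty),
            )
        return cur.lastrowid
```

In a real system each stand-in would be replaced by a client for the chosen database, while the application code keeps the same separation of concerns.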
Data Lake and ELT
The needs of traditional analytics are well-met by the data warehouse architecture discussed
above. Interactive BI dashboards, visualization tools, and reports can describe what is
happening in a business, and answer questions like “What Happened?” and “Why did
Something Happen?” These tools mostly deal with structured data, well-defined at source,
and ingested through batch processes.
The ETL (Extract-Transform-Load) pattern has been the workhorse of ingesting data from
diverse sources into the data warehouse for a long time.
The ETL pattern includes the following steps:
• Extract: data is copied from different source systems to a staging area, typically in batch
mode.
• Transform: data cleansing, enrichment, and transformation are done in preparation for
loading into the data warehouse.
• Load: the transformed data is loaded into the data warehouse tables.
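The three steps above can be sketched as a small batch job. This is a minimal illustration under stated assumptions: the sample CSV, the fact_orders table, and the function names are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; the second row carries an invalid amount.
RAW_CSV = """order_id,customer,amount
1, Alice ,10.50
2,bob,not-a-number
3,Carol,7
"""


def extract(text):
    """Extract: read source rows into a staging list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cleanse names, convert types, and drop invalid rows."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # rows that fail cleansing are dropped before loading
        clean.append((int(row["order_id"]), row["customer"].strip().title(), amount))
    return clean


def load(conn, rows):
    """Load: write the transformed rows into a fixed warehouse schema."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    with conn:
        conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)


def run_etl(text, conn):
    load(conn, transform(extract(text)))


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    run_etl(RAW_CSV, warehouse)
    print(warehouse.execute("SELECT * FROM fact_orders").fetchall())
```

Note that the transformation runs before the load, so only rows that fit the warehouse schema ever reach it.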
Although the ETL approach is good for bulk data movement, it has a few limitations in the
new world of internet scale, real-time data, and AI/ML based analytics, including:
• It is primarily batch-oriented and not suitable for real-time data
• EDW schemas are difficult to change and adapt to changing source data structures
• Data that does not have a schema defined for it cannot be loaded into the EDW and is not
available to the analytics team
• Since in this model the EDW serves as a repository of historical data, the data that cannot
be loaded into the EDW is lost
• For the data that is loaded into the EDW, its lineage information is not preserved
• Because of the dependency on an EDW schema, unstructured and binary data are difficult
to load
The ELT (Extract-Load-Transform) pattern is a variation on the ETL approach, and it consists
of the following steps:
• Extract and Load: data is copied in its raw form from different source systems to a
centralized repository or staging area, sometimes called a data lake
• Transform: data from this central repository is moved after cleansing, enrichment, and
transformation to the most appropriate location for analytics, which may be a data
warehouse, or even a NoSQL database.
The advantages of this approach include:
• Data is always maintained in its raw form in the data lake, preserving history and lineage
information
• Data from diverse sources (real-time and streaming, structured, unstructured, and
binary) can all be stored in the data lake
• Data in the data lake can be processed after the fact when a need arises
• Analytics tools can access data directly from the data lake even if it is not available in the
data warehouse
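The ELT steps and their advantages can be illustrated with a minimal sketch. It assumes a local directory standing in for the data lake, JSON-lines files for the raw data, and an in-memory SQLite database standing in for the warehouse. The function names, file layout, and orders table are hypothetical.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def extract_and_load(lake_dir, source, records):
    """Extract and Load: append raw records, unmodified, to the lake."""
    path = Path(lake_dir) / f"{source}.jsonl"
    with path.open("a", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
    return path


def transform(lake_dir, source, conn):
    """Transform: read raw records from the lake and load usable ones."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, amount REAL, lake_file TEXT)"
    )
    path = Path(lake_dir) / f"{source}.jsonl"
    loaded = 0
    with conn, path.open(encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            if isinstance(rec.get("amount"), (int, float)):
                # Keep the lake file name as simple lineage information.
                conn.execute(
                    "INSERT OR REPLACE INTO orders VALUES (?, ?, ?)",
                    (rec["order_id"], float(rec["amount"]), path.name),
                )
                loaded += 1
    return loaded


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as lake:
        raw = [
            {"order_id": 1, "amount": 10.5},
            {"order_id": 2, "amount": "n/a", "note": "free text"},
        ]
        extract_and_load(lake, "orders", raw)
        warehouse = sqlite3.connect(":memory:")
        print(transform(lake, "orders", warehouse))
```

The record that does not fit the warehouse table is skipped by the transform but remains in the lake file, so it can still be processed later if a need arises.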
Patterns for Building a Modern Data Architecture
In an earlier green paper, Building an Effective Data & Analytics Operating Model, we
presented a data maturity journey that progresses through the descriptive, diagnostic,
predictive, and prescriptive stages.
As an organization plans on a data modernization initiative, selecting the appropriate set of
technologies and tools from the breathtakingly diverse and complex choices may appear
quite daunting. To select an architecture best suited for their specific needs, it helps to think
in terms of the maturity journey, the appropriate use cases, and corresponding patterns of
data architecture.
Maturity Level: Target Use Cases
Descriptive & Diagnostic
• Reporting, dashboards, ad-hoc analysis
• Multiple data sources, both structured and some unstructured
• Third-party data sources from SaaS vendors
• Monolithic applications with centralized data stores
Predictive
• Business intelligence and occasional operational AI/ML
• Streaming data
• Diverse data types (including text, images, and video)
• Microservice based applications with federated data stores
Prescriptive
• All of the above, and
• AI/ML driven, real-time capabilities for both internal and external users
The important thing to note here is the recommendation to migrate to cloud-based data
platforms, which offer several advantages over traditional on-prem technologies,
including being easier to get started with, scale, and mature.
Here is a list of popular cloud-based technologies used in the modern data architecture stack:
Technology Category: Cloud Products
• NoSQL databases: see above
• Data warehouse: Snowflake, AWS Redshift, Azure SQL Data Warehouse
• Data lake: Databricks Delta Lake, AWS S3, Azure Data Lake (Hadoop)
• Stream processing: Kafka, AWS Kinesis, Azure Event Hubs
• ETL/ELT: Fivetran, dbt, Airflow
• BI tools: Tableau, Power BI, Looker
• Data science and AI/ML platforms: Databricks Spark ML, AWS SageMaker, Azure ML
• MLOps: MLflow, Seldon, Kubeflow, Apache Airflow, AWS SageMaker, Azure ML
Architectural Pattern for Descriptive & Diagnostic Stage Analytics
The architectural pattern for descriptive and diagnostic stage analytics is appropriate for
organizations of all sizes that are at the initial milestones of their analytics journey. It uses an
architecture quite similar to that of the traditional EDW architecture, with a few
enhancements.
The architecture includes the:
• Ability to handle multiple data sources, both structured and some unstructured
• Use of cloud-based technologies: relational databases for storing operational data, data
warehouse, NoSQL data stores
Because of the similarity of this architecture to traditional EDW setups, it is easier to get
started. Most importantly, however, this architecture lays the groundwork for moving to the
subsequent maturity levels of the data journey without requiring a major rearchitecting effort.
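[Figure: Architectural pattern for descriptive and diagnostic stage analytics across the
Generation, Ingestion, Storage, Analytics, and Consumption stages. Components: OLTP
databases; applications, CRMs, ERPs; APIs, flat files; NoSQL stores; logs; ETL processes; data
warehouse; query tools; BI tools; reports/dashboards; APIs.]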
Architectural Pattern for Predictive Stage Analytics
The pattern for predictive analytics is the next milestone in the data maturity journey and is
appropriate for organizations with more complex data needs, including support for more
diverse data types, streaming data, and support for both operational and analytics use cases.
The salient components of this pattern include the:
• Use of the ELT workflow and a cloud-based data lake to handle the diversity of source
data
• Use of prepackaged AutoML platforms for a relatively simpler approach to predictive
analytics
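Streaming data is typically summarized over time windows before it reaches dashboards or models. As a minimal sketch of that idea, and not of any specific stream processing product, the hypothetical function below counts events per fixed (tumbling) window. The event format of (timestamp in seconds, event type) is an assumption made for illustration.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per (window start, event type) in fixed-size windows.

    `events` is an iterable of (timestamp_seconds, event_type) pairs.
    """
    counts = defaultdict(int)
    for ts, kind in events:
        # Every timestamp falls into exactly one non-overlapping window.
        window_start = int(ts // window_seconds) * window_seconds
        counts[(window_start, kind)] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = [(0, "click"), (30, "click"), (59.9, "view"), (60, "click"), (125, "view")]
    print(tumbling_window_counts(sample, 60))
```

A production pipeline would read the events from a collector or message broker and handle late or out-of-order events, which this sketch ignores.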
[Figure: Architectural pattern for predictive stage analytics across the Generation, Ingestion,
Storage, Analytics, and Consumption stages. Components: OLTP databases; applications, CRMs,
ERPs; APIs, flat files; NoSQL stores; logs; event collectors; ELT workflow; stream processing;
data warehouse; data lake; query tools; BI tools; data science tools; reports/dashboards; APIs;
embedded analytics.]
Architectural Pattern for Prescriptive Stage Analytics
At the end stage of the data maturity journey, the pattern for prescriptive analytics makes
AI/ML a core capability and supports a robust MLOps workflow covering exploration, testing,
deployment, and monitoring.
This architecture is powerful and flexible enough to address the most sophisticated analytics
needs of an organization.
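Two of the MLOps concerns named above, model versioning at deployment time and monitoring afterwards, can be sketched in a few lines. This is a simplified illustration, not the API of MLflow or any other platform. ModelRegistry, needs_alert, the accuracy metric, and the 0.05 tolerance are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ModelRegistry:
    """Track model versions and which one currently serves production."""

    versions: Dict[int, float] = field(default_factory=dict)  # version -> accuracy
    production: Optional[int] = None

    def register(self, accuracy):
        """Record a newly trained model and return its version number."""
        version = len(self.versions) + 1
        self.versions[version] = accuracy
        return version

    def promote_if_better(self, version):
        """Deploy a version only if it beats the current production model."""
        current = self.versions.get(self.production, float("-inf"))
        if self.versions[version] > current:
            self.production = version
            return True
        return False


def needs_alert(live_accuracy, baseline_accuracy, tolerance=0.05):
    """Flag a deployed model whose live accuracy falls too far below baseline."""
    return live_accuracy < baseline_accuracy - tolerance


if __name__ == "__main__":
    registry = ModelRegistry()
    v1 = registry.register(0.80)
    registry.promote_if_better(v1)
    v2 = registry.register(0.85)
    registry.promote_if_better(v2)
    print(registry.production, needs_alert(0.70, registry.versions[registry.production]))
```

Keeping every registered version, not only the deployed one, is what makes it possible to roll back or compare models later.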
Conclusion
Harnessing the capabilities of a mature data and analytics practice will allow organizations to
create significant value and differentiate themselves from their competitors. There is a wide
choice of tools from which to build a modern data architecture stack.
Though there is no one-size-fits-all approach, the above framework helps an organization
build a robust and adaptable data architecture aligned to its strategic and business
imperatives.
[Figure: Architectural pattern for prescriptive stage analytics across the Generation,
Ingestion, Storage, Analytics, and Consumption stages. Components: OLTP databases;
applications, CRMs, ERPs; APIs, flat files; NoSQL stores; logs; event collectors; ELT workflow;
stream processing; data warehouse; data lake; query tools; BI tools; data science tools;
reports/dashboards; APIs; embedded analytics. AI/ML platform (Exploration & Training,
Deployment, Operationalization): labelling; feature engineering; model training & tuning;
scaling; testing; model versioning; data versioning; monitoring & alerting; model tracking.]

More Related Content

What's hot

Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data Virtualization
Denodo
 
Master Data Management (MDM) 101 & Oracle Trading Community Architecture (TCA...
Master Data Management (MDM) 101 & Oracle Trading Community Architecture (TCA...Master Data Management (MDM) 101 & Oracle Trading Community Architecture (TCA...
Master Data Management (MDM) 101 & Oracle Trading Community Architecture (TCA...
Rhapsody Technologies, Inc.
 

What's hot (20)

Data Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced AnalyticsData Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced Analytics
 
Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data Virtualization
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
Data Governance
Data GovernanceData Governance
Data Governance
 
Data-Ed Webinar: Data Governance Strategies
Data-Ed Webinar: Data Governance StrategiesData-Ed Webinar: Data Governance Strategies
Data-Ed Webinar: Data Governance Strategies
 
Enterprise Data Architecture Deliverables
Enterprise Data Architecture DeliverablesEnterprise Data Architecture Deliverables
Enterprise Data Architecture Deliverables
 
Agile Data Warehouse Modeling: Introduction to Data Vault Data Modeling
Agile Data Warehouse Modeling: Introduction to Data Vault Data ModelingAgile Data Warehouse Modeling: Introduction to Data Vault Data Modeling
Agile Data Warehouse Modeling: Introduction to Data Vault Data Modeling
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft Azure
 
Activate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogActivate Data Governance Using the Data Catalog
Activate Data Governance Using the Data Catalog
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Requirements for a Master Data Management (MDM) Solution - Presentation
Requirements for a Master Data Management (MDM) Solution - PresentationRequirements for a Master Data Management (MDM) Solution - Presentation
Requirements for a Master Data Management (MDM) Solution - Presentation
 
Data Catalog for Better Data Discovery and Governance
Data Catalog for Better Data Discovery and GovernanceData Catalog for Better Data Discovery and Governance
Data Catalog for Better Data Discovery and Governance
 
Data Democratization for Faster Decision-making and Business Agility (ASEAN)
Data Democratization for Faster Decision-making and Business Agility (ASEAN)Data Democratization for Faster Decision-making and Business Agility (ASEAN)
Data Democratization for Faster Decision-making and Business Agility (ASEAN)
 
You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data Architecture
 
Master Data Management (MDM) 101 & Oracle Trading Community Architecture (TCA...
Master Data Management (MDM) 101 & Oracle Trading Community Architecture (TCA...Master Data Management (MDM) 101 & Oracle Trading Community Architecture (TCA...
Master Data Management (MDM) 101 & Oracle Trading Community Architecture (TCA...
 
Modern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemModern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform System
 
Collibra Data Citizen '19 - Bridging Data Privacy with Data Governance
Collibra Data Citizen '19 - Bridging Data Privacy with Data Governance Collibra Data Citizen '19 - Bridging Data Privacy with Data Governance
Collibra Data Citizen '19 - Bridging Data Privacy with Data Governance
 
Emerging Trends in Data Architecture – What’s the Next Big Thing
Emerging Trends in Data Architecture – What’s the Next Big ThingEmerging Trends in Data Architecture – What’s the Next Big Thing
Emerging Trends in Data Architecture – What’s the Next Big Thing
 
Data modelling 101
Data modelling 101Data modelling 101
Data modelling 101
 

Similar to Modern Data Architecture

Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Nathan Bijnens
 
Data and Application Modernization in the Age of the Cloud
Data and Application Modernization in the Age of the CloudData and Application Modernization in the Age of the Cloud
Data and Application Modernization in the Age of the Cloud
redmondpulver
 

Similar to Modern Data Architecture (20)

When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data Architecture
 
oracle-adw-melts snowflake-report.pdf
oracle-adw-melts snowflake-report.pdforacle-adw-melts snowflake-report.pdf
oracle-adw-melts snowflake-report.pdf
 
Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...
Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...
Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...
 
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)
 
Data Mesh using Microsoft Fabric
Data Mesh using Microsoft FabricData Mesh using Microsoft Fabric
Data Mesh using Microsoft Fabric
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
The Shifting Landscape of Data Integration
The Shifting Landscape of Data IntegrationThe Shifting Landscape of Data Integration
The Shifting Landscape of Data Integration
 
Data and Application Modernization in the Age of the Cloud
Data and Application Modernization in the Age of the CloudData and Application Modernization in the Age of the Cloud
Data and Application Modernization in the Age of the Cloud
 
A Logical Architecture is Always a Flexible Architecture (ASEAN)
A Logical Architecture is Always a Flexible Architecture (ASEAN)A Logical Architecture is Always a Flexible Architecture (ASEAN)
A Logical Architecture is Always a Flexible Architecture (ASEAN)
 
StreamCentral for the IT Professional
StreamCentral for the IT ProfessionalStreamCentral for the IT Professional
StreamCentral for the IT Professional
 
Unlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data VirtualizationUnlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data Virtualization
 
Flash session -streaming--ses1243-lon
Flash session -streaming--ses1243-lonFlash session -streaming--ses1243-lon
Flash session -streaming--ses1243-lon
 
Best Practices in the Cloud for Data Management (US)
Best Practices in the Cloud for Data Management (US)Best Practices in the Cloud for Data Management (US)
Best Practices in the Cloud for Data Management (US)
 
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
 
Evolving Big Data Strategies: Bringing Data Lake and Data Mesh Vision to Life
Evolving Big Data Strategies: Bringing Data Lake and Data Mesh Vision to LifeEvolving Big Data Strategies: Bringing Data Lake and Data Mesh Vision to Life
Evolving Big Data Strategies: Bringing Data Lake and Data Mesh Vision to Life
 
Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)
 
Big data and oracle
Big data and oracleBig data and oracle
Big data and oracle
 
Why Data Virtualization? An Introduction
Why Data Virtualization? An IntroductionWhy Data Virtualization? An Introduction
Why Data Virtualization? An Introduction
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An Introduction
 
Data Virtualization. An Introduction (ASEAN)
Data Virtualization. An Introduction (ASEAN)Data Virtualization. An Introduction (ASEAN)
Data Virtualization. An Introduction (ASEAN)
 

More from Mark Hewitt

Modernizing the Enterprise Monolith: EQengineered Consulting Green Paper
Modernizing the Enterprise Monolith: EQengineered Consulting Green PaperModernizing the Enterprise Monolith: EQengineered Consulting Green Paper
Modernizing the Enterprise Monolith: EQengineered Consulting Green Paper
Mark Hewitt
 

More from Mark Hewitt (18)

Modern Data Architecture

  • 1. Modern Data Architecture A Data Modernization Green Paper By Ranjan Bhattacharya, Chief Data Officer
  • 2. P.O. Box 211 Hampton Falls, NH 03844 | info@eqengineered.com | eqengineered.com | 617.448.4255
    Table of Contents
    Executive Summary, 3
    Introduction, 4
    An Evolutionary Journey of Enterprise Data Architecture, 5
      The Enterprise Data Warehouse, 5
      Key Technological Trends and Architectural Shifts, 6
      NoSQL Databases and Polyglot Persistence, 7
      Data Lake and ELT, 9
    Patterns for Building a Modern Data Architecture, 11
      Architectural Pattern for Descriptive & Diagnostic Stage Analytics, 13
      Architectural Pattern for Predictive Stage Analytics, 14
      Architectural Pattern for Prescriptive Stage Analytics, 15
    Conclusion, 15
  • 3. Executive Summary
    Organizations have been collecting, storing, and accessing data from the beginning of computerization. Insights gained from analyzing the data enable them to identify new opportunities, improve core processes, enable continuous learning and differentiation, remain competitive, and thrive in an increasingly challenging business environment.
    The well-established data architecture, consisting of a data warehouse, fed from multiple operational data stores, and fronted by BI tools, has served most organizations well. However, over the last two decades, with the explosion of internet-scale data, and the advent of new approaches to data and computational processing, this tried-and-true data architecture has come under strain, and has created both challenges and opportunities for organizations.
    In this green paper, we will discuss modern approaches to data architecture that have evolved to address these challenges and provide a framework for companies to build a data architecture and better adapt to increasing demands of the modern business environment. This discussion of data architecture will be tied to the Data Maturity Journey introduced in EQengineered’s June 2021 green paper on Data Modernization.
  • 4. Introduction
    Data storage has always been an integral part of computer technology. In the early days of commercial computing, the technology for storing and retrieving data was clunky and slow, and required users to become experts. In the 80s, with the introduction of relational databases, data storage and access started becoming easier for even non-experts to use.
    A major transformation in the way organizations use data came about with the data warehousing paradigm in the early 90s. Business leaders could now access business intelligence dashboards from a centralized data store, fed from multiple operational data systems. Up to this point, managing the various components of the data architecture was relatively straightforward for most organizations.
    Over the last two decades, several transformative technology trends have emerged in the business landscape, causing foundational changes to business norms:
    • Internet-scale data generation and its impact on data volume, velocity, and variety
    • The rise of cloud computing and its impact on scale and availability
    • Adoption of machine learning and artificial intelligence techniques for business analytics
    For companies to succeed in this transformed landscape, adoption of new technologies and architectures that align with strategic business priorities is imperative.
  • 5. An Evolutionary Journey of Enterprise Data Architecture
    Today, most organizations implement a variation of the following generic data processing flow and supporting processes and infrastructure:
    [Figure: Data Processing Stages]
      Generation: business data; operational data
      Ingestion: extraction; transformation
      Storage: repositories for query and analytics
      Analytics: tools for running queries and analytics
      Consumption: reports; dashboards; APIs
      Supporting processes and infrastructure: data flow pipeline & workflow; quality & security; performance & scalability; disaster recovery & failover
    The Enterprise Data Warehouse
    The dominant data processing architecture of the 90s was the one that integrated operational systems with the enterprise data warehouse (EDW). The EDW forms the core component of business intelligence (BI), serving as a central repository of data from multiple sources, both current and historical. This architecture enables an enterprise-wide capability for analytics and reporting, thus allowing organizations to respond quickly to changing business opportunities. For traditional organizational data-processing needs, this architecture can satisfy quality requirements like data integrity, consistency, availability, and failover. The flexibility of this architecture has also served well in the migration from on-premises installations to the cloud.
    [Figure: EDW architecture across the data processing stages (Generation, Ingestion, Storage, Analytics, Consumption). Components shown: OLTP databases; applications, CRMs, ERPs; APIs, flat files; ETL processes; data warehouse; query tools; BI tools; reports/dashboards; APIs]
  • 6. Key Technological Trends and Architectural Shifts
    The introduction of newer technologies and platforms like internet-scale systems, cloud computing, ML/AI-based analytics, IoT (Internet of Things), and edge devices like smartphones has necessitated fundamental shifts in the data architecture to meet the enhanced requirements for scalability, consistency, and disaster recovery. The table below lists how enterprise data components are impacted by this shift.
    Stage: Generation
      From: on-prem, mostly relational databases; structured data; batch
      To: cloud-based relational and NoSQL databases; structured and unstructured data; real-time and batch
      Rationale: moving to the cloud to take advantage of the increased flexibility and scalability; polyglot persistence (see below) uses the most suitable database technology for the task
    Stage: Ingestion
      From: on-prem ETL tools; structured data; batch
      To: cloud-based ELT tools; structured and unstructured data; real-time and batch
      Rationale: ETL processes are being replaced by more flexible and performant ELT ones (see below)
    Stage: Storage
      From: on-prem centralized data warehouse
      To: cloud-based data warehouse; distributed storage; data lake
      Rationale: increased flexibility in data collection from multiple sources, scale, and analytics
    Stage: Analytics
      From: historical and descriptive; batch; data mining
      To: prescriptive and predictive; real-time and batch; AI/ML models
      Rationale: moving from descriptive to predictive analytics; use of AI/ML approaches
    Stage: Consumption
      From: dashboards and reports; centralized administration
      To: AI/ML model-driven insights; self-serve
      Rationale: analytics now available to non-technical users through self-serve portals
  • 7. NoSQL Databases and Polyglot Persistence
    Relational database technologies are well-suited for enterprise transaction processing with their ACID (atomicity, consistency, isolation, and durability) guarantees, which ensure data validity despite errors and failures. To implement these ACID properties, relational databases make some tradeoffs with regard to scalability and availability by assuming a single-instance deployment, with tools for replication and disaster recovery. Handling a larger volume of data simply means running the database on a larger server.
    Concerns about data consistency and availability were not part of mainstream enterprise architectural thinking before the confluence of several technological shifts, like the widespread adoption of cloud computing to handle the increasing volume and variety of unstructured data serving a globally distributed user base.
    Companies quickly realized that simply moving a relational database to a larger server cannot meet the scalability and availability demands of the business. Organizations need to think about distributing the load across multiple database instances. However, relational databases are not designed to guarantee availability in the face of the network failures that are inherent in a distributed system.
    In addition, modern Internet-based applications generate a wider variety of unstructured data that can consist of images, audio and video data, chat transcripts, application logs, social media interactions, and other content that is difficult for relational technologies to handle.
    A new class of database technologies, collectively known as NoSQL databases, better suited for distributed operations and for storing various forms of unstructured data, has become popular in the last decade.
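The atomicity and consistency guarantees described above can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for an enterprise relational database; the `accounts` table, the `transfer` helper, and the `balances` helper are hypothetical names introduced only for this illustration.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst` as one transaction; return True on success."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if an exception is raised inside it.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the earlier credit
        # to `dst` has been rolled back together with the failed debit.
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account id."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()
```

Calling `transfer(conn, "alice", "bob", 500)` in this sketch returns `False` and leaves both balances untouched, because the failed debit undoes the credit that preceded it. That all-or-nothing behavior is what becomes harder to guarantee once the data is spread across multiple database instances.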
  • 8. The following table lists the different types of data and access patterns for which NoSQL databases are more appropriate.
    Database type: Key-Value
      Strengths: scalability; availability; fast read/write access
      Weaknesses: limited query and ACID capabilities; not able to update data partially
      Use cases: user sessions; chat data; caching
      Tools: Memcached, Redis, Riak, AWS DynamoDB, Azure Cosmos DB
    Database type: Document
      Strengths: flexible data model; distributed deployment
      Weaknesses: schema-less design may impact maintainability; not able to update data partially; limited SQL-like query capabilities
      Use cases: IoT data capture; product catalogs; content management
      Tools: MongoDB, CouchDB, AWS DynamoDB, Azure Cosmos DB
    Database type: Column
      Strengths: scalability; availability; high read/write performance; distributed deployment
      Weaknesses: some have limited SQL-like query capabilities
      Use cases: catalog searches; time series data
      Tools: Cassandra, HBase, AWS Keyspaces, Azure Cosmos DB Cassandra API
    Database type: Graph
      Strengths: network or hierarchical relationships
      Weaknesses: limited scalability; limited transactional capabilities
      Use cases: social relationships; hierarchies; recommendations; fraud detection
      Tools: Neo4j, AWS Neptune, Azure Cosmos DB Gremlin API
    An organization may opt for a mix of different databases, with relational databases providing the core and critical transactional components of a system, while one or more of the NoSQL databases provide support for different use cases. This trend of picking the right database technology for the right use case has become known as polyglot persistence.
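As a rough illustration of polyglot persistence, the sketch below routes three kinds of data to three kinds of stores. Everything in it is an assumption made for the example: the `PolyglotStore` class and its methods are hypothetical, `sqlite3` stands in for the relational core, and plain Python dictionaries stand in for a key-value store (such as Redis) and a document store (such as MongoDB).

```python
import json
import sqlite3


class PolyglotStore:
    """Hypothetical facade that picks a storage technology per use case."""

    def __init__(self):
        # Relational store for critical transactional data (orders).
        self.orders = sqlite3.connect(":memory:")
        self.orders.execute(
            "CREATE TABLE orders ("
            "id INTEGER PRIMARY KEY, "
            "customer TEXT NOT NULL, "
            "total_cents INTEGER NOT NULL)"
        )
        # Dict standing in for a key-value store: fast lookups by key only.
        self.sessions = {}
        # Dict of JSON strings standing in for a schema-less document store.
        self.catalog = {}

    def place_order(self, customer, total_cents):
        """Insert an order inside a transaction and return its generated id."""
        with self.orders:
            cursor = self.orders.execute(
                "INSERT INTO orders (customer, total_cents) VALUES (?, ?)",
                (customer, total_cents),
            )
        return cursor.lastrowid

    def save_session(self, session_id, data):
        """Store the whole session value under its key (no partial updates)."""
        self.sessions[session_id] = data

    def put_product(self, product_id, document):
        """Store a product document; documents need not share the same fields."""
        self.catalog[product_id] = json.dumps(document)

    def get_product(self, product_id):
        """Load a product document back into a Python dict."""
        return json.loads(self.catalog[product_id])
```

The design choice mirrored here is that each access pattern goes to the store whose strengths match it: orders need transactional guarantees, sessions need fast key lookups, and catalog entries need a flexible data model.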
  • 9. Data Lake and ELT
    The needs of traditional analytics are well-met by the data warehouse architecture discussed above. Interactive BI dashboards, visualization tools, and reports can describe what is happening in a business, and answer questions like “What happened?” and “Why did something happen?” These tools mostly deal with structured data, well-defined at source, and ingested through batch processes.
    The ETL (Extract-Transform-Load) pattern has been the workhorse of ingesting data from diverse sources into the data warehouse for a long time. The ETL pattern includes the following steps:
    • Extract: data is copied from different source systems to a staging area, typically in batch mode
    • Transform: data cleansing, enrichment, and transformation are done in preparation for loading into the data warehouse
    • Load: the transformed data is loaded into the data warehouse tables
    Although the ETL approach is good for bulk data movement, it has a few limitations in the new world of internet scale, real-time data, and AI/ML-based analytics, including:
    • It is primarily batch-oriented and not suitable for real-time data
    • EDW schemas are difficult to change and adapt to changing source data structures
    • Data that does not have a schema defined for it cannot be loaded into the EDW and is not available to the analytics team
    • Since in this model the EDW serves as a repository of historical data, the data that cannot be loaded into the EDW is lost
    • For the data that is loaded into the EDW, its lineage information is not preserved
    • Because of the dependency on an EDW schema, unstructured and binary data are difficult to load
  • 10. The ELT (Extract-Load-Transform) pattern is a variation on the ETL approach, and it consists of the following steps:
    • Extract and Load: data is copied from different source systems, in its raw form, to a centralized repository or staging area, sometimes called a data lake
    • Transform: data from this central repository is moved, after cleansing, enrichment, and transformation, to the most appropriate location for analytics, which may be a data warehouse or even a NoSQL database
    The advantages of this approach include:
    • Data is always maintained in its raw form in the data lake, preserving history and lineage information
    • Data from diverse sources (real-time and streaming, structured, unstructured, and binary) can all be stored in the data lake
    • Data in the data lake can be processed after the fact when a need arises
    • Analytics tools can access data directly from the data lake even if it is not available in the data warehouse
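The difference between the two patterns can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a production pipeline: in-memory lists stand in for the data warehouse and the data lake, and the `transform`, `etl`, and `elt` functions and the `order_id`/`amount` schema are hypothetical names chosen for the example.

```python
import json


def transform(record):
    """Cleanse one raw record into the warehouse schema, or return None if it does not fit."""
    try:
        return {
            "order_id": int(record["order_id"]),
            "amount": round(float(record["amount"]), 2),
        }
    except (KeyError, TypeError, ValueError):
        return None


def etl(source_records):
    """ETL: transform first, then load; records that do not fit the schema are dropped."""
    warehouse = []
    for record in source_records:
        row = transform(record)
        if row is not None:
            warehouse.append(row)
    return warehouse


def elt(source_records):
    """ELT: land every record raw in the lake, then transform from the lake."""
    # Extract and Load: keep an untouched copy of every record.
    lake = [json.dumps(record) for record in source_records]
    # Transform: derive warehouse rows from the raw copies.
    warehouse = []
    for raw in lake:
        row = transform(json.loads(raw))
        if row is not None:
            warehouse.append(row)
    return lake, warehouse


source = [
    {"order_id": "1", "amount": "19.990"},
    {"order_id": "2", "amount": "5"},
    {"event": "page_view", "url": "/pricing"},  # no warehouse schema for this record
]
```

With this sample input, both functions produce the same two warehouse rows, but only `elt` still holds the page-view record, in its raw form, so it can be processed later if a need arises.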
  • 11. Patterns for Building a Modern Data Architecture
    In an earlier green paper, Building an Effective Data & Analytics Operating Model, we presented a data maturity journey.
    As an organization plans a data modernization initiative, selecting the appropriate set of technologies and tools from the breathtakingly diverse and complex choices may appear quite daunting. To select the architecture best suited for its specific needs, it helps to think in terms of the maturity journey, the appropriate use cases, and the corresponding patterns of data architecture.
    Maturity level: Descriptive & Diagnostic
      Target use cases: reporting, dashboards, ad-hoc analysis; multiple data sources, both structured and some unstructured; third-party data sources from SaaS vendors; monolithic applications with centralized data stores
    Maturity level: Predictive
      Target use cases: business intelligence and occasional operational AI/ML; streaming data; diverse data types (including text, images, and video); microservice-based applications with federated data stores
    Maturity level: Prescriptive
      Target use cases: all of the above, and AI/ML-driven, real-time capabilities for both internal and external users
  • 12. The important thing to note here is the recommendation to migrate to cloud-based data platforms, which offer several advantages over traditional on-prem technologies, including being easier to get started with, to scale, and to mature.
    Here is a list of popular cloud-based technologies used in the modern data architecture stack:
    NoSQL databases: see above
    Data warehouse: Snowflake, AWS Redshift, Azure SQL Data Warehouse
    Data lake: Databricks Delta Lake, AWS S3, Azure Data Lake (Hadoop)
    Stream processing: Kafka, AWS Kinesis, Azure Event Hubs
    ETL/ELT: Fivetran, dbt, Airflow
    BI tools: Tableau, Power BI, Looker
    Data science and AI/ML platforms: Databricks Spark ML, AWS SageMaker, Azure ML
    MLOps: MLflow, Seldon, Kubeflow, Apache Airflow, AWS SageMaker, Azure ML
  • 13. Architectural Pattern for Descriptive & Diagnostic Stage Analytics
    The architectural pattern for descriptive and diagnostic stage analytics is appropriate for organizations of all sizes that are at the initial milestones of their analytics journey. It uses an architecture quite similar to the traditional EDW architecture, with a few enhancements. The architecture includes:
    • Ability to handle multiple data sources, both structured and some unstructured
    • Use of cloud-based technologies: relational databases for storing operational data, data warehouse, NoSQL data stores
    Because of the similarity of this architecture to traditional EDW setups, it is easier to get started. Most importantly, however, this architecture lays the groundwork for moving to the subsequent maturity levels of the data journey without requiring a major rearchitecting effort.
    [Figure: descriptive and diagnostic stage architecture across Generation, Ingestion, Storage, Analytics, and Consumption. Components shown: OLTP databases; applications, CRMs, ERPs; APIs, flat files; NoSQL stores; logs; ETL processes; data warehouse; NoSQL stores; query tools; BI tools; reports/dashboards; APIs]
  • 14. Architectural Pattern for Predictive Stage Analytics
    The pattern for predictive analytics is the next milestone in the data maturity journey and is appropriate for organizations with more complex data needs, including support for more diverse data types, streaming data, and both operational and analytics use cases. The salient components of this pattern include:
    • Use of the ELT workflow and a cloud-based data lake to handle the diversity of source data
    • Use of prepackaged AutoML platforms for a relatively simpler approach to predictive analytics
    [Figure: predictive stage architecture across Generation, Ingestion, Storage, Analytics, and Consumption. Components shown: OLTP databases; applications, CRMs, ERPs; APIs, flat files; NoSQL stores; logs; event collectors; ELT workflow; stream processing; data warehouse; NoSQL stores; data lake; query tools; BI tools; data science tools; reports/dashboards; APIs; embedded analytics]
  • 15. Architectural Pattern for Prescriptive Stage Analytics
    At the end stage of the data maturity journey, the pattern for prescriptive analytics supports a robust MLOps workflow, with exploration, testing, deployment, and monitoring, with AI/ML as a core capability. This architecture is both powerful and flexible enough to address the most sophisticated analytics needs of an organization.
    [Figure: prescriptive stage architecture across Generation, Ingestion, Storage, Analytics, and Consumption. Components shown: OLTP databases; applications, CRMs, ERPs; APIs, flat files; NoSQL stores; logs; event collectors; ELT workflow; stream processing; data warehouse; NoSQL stores; data lake; query tools; BI tools; data science tools; reports/dashboards; APIs; embedded analytics; and an AI/ML platform covering exploration & training (labelling, feature engineering, model training & tuning), deployment (scaling, testing), and operationalization (model versioning, data versioning, monitoring & alerting, model tracking)]
    Conclusion
    Harnessing the capabilities of a mature data and analytics practice will allow organizations to create significant value and differentiate themselves from their competitors. There is a wide choice of tools from which to build a modern data architecture stack. Though there is no one-size-fits-all approach, the above framework helps an organization build a robust and adaptable data architecture aligned to its strategic and business imperatives.