For those contemplating re-architecting or building greenfield data lakes, data hubs or data warehouses in a cloud environment, talk to our Altis AWS Practice Lead, Guillaume Jaudouin, about why you should be considering the "tour de force" combination of AWS and Snowflake.
2. Company overview
• Established in 1998
• Offices in Sydney, Melbourne, Canberra, Auckland and London
• Vendor independent
• Largest ANZ specialist Data & Analytics Consultancy
Introduction
About Altis Consulting
4. What do we do? Consulting Services
Our information management model
5. AWS Experience – Sample Snowflake Projects
• Strategy/roadmap
• Data platform architecture design
• Implementation of greenfield data warehouse
• Snowflake design
• Production readiness assessment
• Implementation of real-time web analytics
• Migration from Redshift
• Data platform architecture design
• Implementation of greenfield data warehouse
• Data lake integration
• Strategy/roadmap
• Data platform architecture design
• Implementation scoping
• Architecture guidance
• Data warehouse implementation
• Data platform architecture design
• Proof of concept
• Data migration from legacy DW
Company 1
Company 2
Company 3
Company 4
Company 5
Company 6
7. Integrated Data Hub
Use Case: Data & Analytics Platform including Self-serve
Customer Challenge: Siloed data, multiple datamarts and multiple versions of the truth
Solution:
- Data sources: Operational Data Sources
- Data ingestion: Dell Boomi
- Data Store: S3, Snowflake
- Data Processing: Matillion
- Data Presentation: Tableau
Benefits:
• Certified gas production dashboards distributed to the field, allowing near-real-time input/visibility of field commentary back at head office
• Centralised/certified copy of the truth for data & analytics purposes across all subject areas
• Support for Advanced Analytics capabilities (e.g. predictive maintenance)
• Geospatial planning of work activities
[Architecture diagram: Integrated Data Hub layers]
Source Systems: Well view, MDM, OpsDB, others
DataLake:
- An exact copy of source-system data, unchanged
Staging (Extract):
- Transient
- Required by Matillion
- Only the latest data from S3
Staging (Stage):
- Persisted
- Not integrated
- Volatile (data gets updated)
- Best practice for data loading flexibility
Transformation Layer:
- Transient
ODS:
- Persisted
- Integrated
- Volatile (data gets updated)
- Most recent view of IG data
- PPDM aligned
DW:
- Persisted
- Integrated
- Non-volatile
- Current and historical view of IG data
- Subject oriented
Consumers: DWH users, Advanced Analytics / Data Science use cases, batched integrations (data integration), Advanced Analytics custom application use cases (time-series, near real-time)
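The difference between the volatile ODS and the non-volatile DW described above can be illustrated with a minimal Python sketch. It is not taken from the project: the `WellStatus` record, its fields and the two loader functions are hypothetical names chosen for illustration, and in-memory structures stand in for Snowflake tables.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class WellStatus:
    """Hypothetical source record; the field names are illustrative only."""
    well_id: str
    status: str
    loaded_on: date


def load_ods(ods: dict, record: WellStatus) -> None:
    # Volatile layer: the newest record overwrites the previous one per key.
    ods[record.well_id] = record


def load_dw(dw: list, record: WellStatus) -> None:
    # Non-volatile layer: rows are only appended, so history is preserved.
    dw.append(record)


ods, dw = {}, []
for rec in [
    WellStatus("W-001", "drilling", date(2019, 1, 1)),
    WellStatus("W-001", "producing", date(2019, 3, 1)),
]:
    load_ods(ods, rec)
    load_dw(dw, rec)

print(ods["W-001"].status)      # most recent view only
print([r.status for r in dw])   # current and historical view
```

The same two source records end up as one row in the ODS and two rows in the DW, which is the practical meaning of "most recent view" versus "current and historical view".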
8. DW Modernization
Use Case: Greenfield Data Warehouse
Customer Challenge: No framework for warehousing and reporting on sales. Reports generated directly from source system databases via stored procedures. Limited ability to utilize modern analytics and artificial intelligence technologies to improve decision making.
Solution:
• Data Sources: RDS / Flat Files
• Data Store: Amazon S3
• ETL: Matillion
• Data Warehouse: Snowflake
• Reporting and analytics: Tableau
Benefits:
• Established Data Warehouse according to best practice
• Created metadata driven ETL framework which supports rapid integration of new data sources
• Built sales subject area allowing reproduction of existing reports and adding self-service reports in Tableau
• Business well positioned to leverage Analytics and AI technologies
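To illustrate what a metadata driven ETL framework can look like, here is a minimal Python sketch. It is an assumption-laden illustration, not the framework built on this project (which used Matillion): the `SourceConfig` fields, the `landing_stage` stage name and the table names are all hypothetical. The idea is that a new source is added as one metadata row, and the load statement is generated from it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceConfig:
    """One metadata row per source. All names here are hypothetical."""
    name: str          # logical source name
    s3_prefix: str     # folder under the landing stage
    target_table: str  # Snowflake table to load
    file_type: str     # e.g. "CSV" or "JSON"


def build_copy_statement(src: SourceConfig, stage: str = "landing_stage") -> str:
    """Render a Snowflake COPY INTO statement from one metadata row.

    The metadata is assumed to be trusted configuration, not user input.
    """
    return (
        f"COPY INTO {src.target_table} "
        f"FROM @{stage}/{src.s3_prefix}/ "
        f"FILE_FORMAT = (TYPE = {src.file_type})"
    )


# Adding a new data source means adding one row here, not writing a new job.
SOURCES = [
    SourceConfig("sales_orders", "rds/sales_orders", "stg.sales_orders", "CSV"),
    SourceConfig("gift_cards", "files/gift_cards", "stg.gift_cards", "CSV"),
]

statements = [build_copy_statement(s) for s in SOURCES]
for sql in statements:
    print(sql)
```

In practice the metadata would more likely live in a control table than in code, so that sources can be added without a deployment.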
[Architecture diagram. Stages: Data Acquisition, Data Transformation, Data Warehouse. Components: AWS source systems, Core DB, Retail Gift Cards, batch data extract, Amazon S3 object store (landing), batch data processing, Amazon S3 (processed), Snowflake, presentation.]
9. Digital Data Analytics
Use Case: Web Analytics
Customer Challenge: Server stats data could not be accessed in near real-time.
Solution:
- Data sources: Web server stats
- Data Ingestion: Kinesis Streams and Firehose
- Data Store: S3, Snowflake
- Data Processing: Python and Luigi
- Reporting and analytics: Tableau
Benefits:
- Real-time data ingestion
- JSON parsing allowing extraction of relevant data
- Web server stats can be accessed soon after the event
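The JSON parsing step can be sketched in plain Python as follows. This is a minimal illustration rather than the project's code: it assumes the web server events land as newline-delimited JSON, and the field names in `RELEVANT_FIELDS` are hypothetical.

```python
import json

# Hypothetical field names; the real event schema is not given in the deck.
RELEVANT_FIELDS = ("timestamp", "url", "status")


def extract_relevant(raw: str) -> list:
    """Parse newline-delimited JSON and keep only the relevant fields."""
    rows = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # a real pipeline would route bad records to a reject area
        if not isinstance(event, dict):
            continue  # ignore valid JSON that is not an object
        rows.append({field: event.get(field) for field in RELEVANT_FIELDS})
    return rows


sample = (
    '{"timestamp": "2019-05-01T10:00:00Z", "url": "/home", "status": 200, "ua": "x"}\n'
    "not json\n"
    '{"timestamp": "2019-05-01T10:00:01Z", "url": "/cart", "status": 500}\n'
)
rows = extract_relevant(sample)
for row in rows:
    print(row)
```

Fields outside `RELEVANT_FIELDS` (such as `ua` in the sample) are dropped, and malformed lines are skipped instead of stopping the load.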
[Architecture diagram. Stages: Data Acquisition, Data Transformation, Data Warehouse. Components: web servers, Amazon Kinesis Streams, Amazon Kinesis Firehose, Amazon S3 object store (staging), ETL on EC2, Snowflake, presentation.]
Source Systems
- Modelled in 3NF
- Current view of IG
- Kept in sync with MDM (hopefully!)
- Operational use / Reporting
ODS
- Modelled in 3NF (just like source systems!)
- Most-recent view of IG
- Source for DWH
- One big IG source system!
- Batch-updated, not real-time
- No use-case at present (Operational reporting... but late?)
- Part of IG's Architecture Standards
ODS Requirements
- Persist the most recent copy of data
- Integrate across source systems
- Take a balanced approach to normalisation, focusing on deduplication, ease of use and ease of maintenance
- Align naming conventions and concepts with PPDM
- Batch updated
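The ODS requirements above (integrate across source systems, deduplicate, align naming, batch update) can be sketched in Python as follows. Everything specific here is an assumption for illustration: the source column names, the `COLUMN_MAP` mappings and the shared target names merely stand in for PPDM-aligned naming, and a dictionary stands in for the ODS table.

```python
# Hypothetical column mappings from each source system to one shared naming
# convention (standing in for PPDM-aligned names).
COLUMN_MAP = {
    "opsdb": {"WELL_NO": "well_id", "WELL_NM": "well_name", "UPD_TS": "updated_at"},
    "mdm": {"id": "well_id", "name": "well_name", "modified": "updated_at"},
}


def conform(source: str, row: dict) -> dict:
    """Rename source columns to the shared convention and tag the origin."""
    renamed = {COLUMN_MAP[source][col]: val for col, val in row.items()}
    renamed["source_system"] = source
    return renamed


def integrate(batches: dict) -> dict:
    """Merge one batch per source, keeping the latest row per business key."""
    ods = {}
    for source, rows in batches.items():
        for row in rows:
            rec = conform(source, row)
            current = ods.get(rec["well_id"])
            # ISO-8601 date strings compare correctly as plain strings.
            if current is None or rec["updated_at"] > current["updated_at"]:
                ods[rec["well_id"]] = rec
    return ods


batch = {
    "opsdb": [{"WELL_NO": "W-001", "WELL_NM": "Alpha 1", "UPD_TS": "2019-03-01"}],
    "mdm": [
        {"id": "W-001", "name": "Alpha-1", "modified": "2019-04-15"},
        {"id": "W-002", "name": "Bravo 2", "modified": "2019-02-10"},
    ],
}
ods = integrate(batch)
for well_id, rec in sorted(ods.items()):
    print(well_id, rec["well_name"], rec["source_system"])
```

"Latest update wins" is only one possible deduplication rule; a survivorship rule that prefers a nominated master system would be an equally valid choice.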
ODS Uses
- One stop IG source system for applications requiring data from multiple source systems (i.e. PPAC)
- Operational reporting (TBC)
DWH
- Facts
- Dimensions
- Business Rules
- Single Source of Truth
- Strategic and Tactical reporting
- Predictive Analytics (Machine Learning / Data Science)
- Needed for ad hoc reporting and visualisation
- Self Service
BI Tools
- Analyse DWH data
- Visualise data
- Data updated when the DWH updates
- No business logic (already present in the DWH)
- Many reports, same data
- Drill-up, drill-down, drill-across, drill-through
- Dimensional Modelling
- Star Schema
- Auditability
- Traceability
- EDW (Enterprise)
- Decision Support System
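As a small illustration of facts, dimensions and drill-up in a star schema, the following Python sketch aggregates one fact table at two levels of a single dimension. The tables, keys and volumes are invented for the example and are not project data.

```python
from collections import defaultdict

# Hypothetical dimension: one row per well, keyed by a surrogate key.
DIM_WELL = {
    1: {"well_name": "Alpha 1", "field": "North"},
    2: {"well_name": "Alpha 2", "field": "North"},
    3: {"well_name": "Bravo 1", "field": "South"},
}

# Hypothetical fact rows: (well_key, month, gas_volume).
FACT_PRODUCTION = [
    (1, "2019-01", 120.0),
    (2, "2019-01", 80.0),
    (3, "2019-01", 50.0),
    (1, "2019-02", 110.0),
]


def total_by(attribute: str) -> dict:
    """Sum the fact measure grouped by one dimension attribute."""
    totals = defaultdict(float)
    for well_key, _month, volume in FACT_PRODUCTION:
        totals[DIM_WELL[well_key][attribute]] += volume
    return dict(totals)


by_well = total_by("well_name")  # detailed level
by_field = total_by("field")     # drill-up from well to field
print(by_well)
print(by_field)
```

Both results come from the same fact rows, which is the sense of "many reports, same data": the business rules sit in the warehouse model, and the BI tool only changes the grouping level.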