This document summarizes a webinar presentation about modernizing data architecture for customer insights and machine learning. It discusses how a company called EarEcstasy modernized its data systems to better understand customer usage as it transitioned from a B2B to B2C model. The presentation outlines three outcomes of EarEcstasy's data transformation: 1) modernizing and consolidating its data infrastructure, 2) innovating for new revenues through personalization and forecasting, and 3) enabling real-time engagement with interactive experiences. It provides examples of how EarEcstasy implemented a modern data lake architecture on AWS to achieve these outcomes.
2. Summit Webinar Edition | Vietnam
Topics
Customer Insights and Machine Learning (Level 200–300) | Know your customers – Modern data architecture
Make the change! Migrating to AWS (Level 200) | Transform & modernize traditional Microsoft applications with containers
Migrating to AWS (Level 200) | Managing a DB migration project – Best practices
3. Ask the AWS Experts
Our Experts are online to answer any questions you have during the
presentation.
Ask your questions via the Questions Window on the GoToWebinar Control
Panel
5. Meet EarEcstasy, as they move from B2B to B2C
* This case is representative of a common customer journey, but EarEcstasy isn’t an actual business
EarEcstasy manufactures headsets. They have run a traditional B2B business since 2005, selling through distribution and retail channels.
In 2018, they launched their first “Smart Buds”. These wireless headsets have voice enablement, GPS tracking, and heart-rate monitors built in, and the device syncs with the user’s mobile phone via Bluetooth. The mobile app also supports scene detection.
6. EarEcstasy needs to answer new questions and move faster
Raymond, Head of Product | Lim, Head of Finance
In which regions are the new earbuds selling well?
What is the demand forecast by product category?
What is the social sentiment about our products?
How do quality issues impact cost of production?
Can I look at supplier performance over time?
How can we reduce our inventory holding costs?
7. To answer new questions quickly, we look to a modern data architecture design
Traditional approach: massive upfront costs, overprovisioned capacity, long implementation times
Modern approach: pay as you go for what you use, decoupled pipelines and engines, an experimentation platform
Pipeline stages: Ingest/Collect → Store → Process/Analyze → Consume/Visualize
9. Start with a set of specific questions to answer, then work
backwards to the data required
Lim, Head of Finance
How do quality issues impact cost of production?
Can I look at supplier performance over time?
How can we reduce our inventory holding costs?
Order History /
Returns (CRM)
Inventory /
Production (ERP)
10. Modern data architecture: data sources → ingest → serving
Insights to enhance business applications and new digital services
Data sources: transactions (ERP)
DATA PIPELINES: Ingest/Collect → Store → Process/Analyze → Consume/Visualize
Serving: data analysts
12. Modern data architecture: data sources → ingest → serving
Insights to enhance business applications and new digital services
Data sources: transactions (ERP), exported via expdp
DATA PIPELINES feed the Data Lake
Serving: data analysts, via the Data Warehouse (Amazon Redshift) or Direct Query (Amazon Athena)
She asks for the SMALLEST amount of data to answer her questions.
If it isn’t good enough, she asks for another small slice to be loaded to the DATA LAKE
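This "small slice, direct query" pattern can be sketched as the request an analyst's tooling would send to Athena. The database, table, and bucket names below are hypothetical; with boto3 installed, the resulting dict would be passed to `athena.start_query_execution(**params)`.

```python
# Sketch: assembling the parameters for an Athena direct query over the
# data lake. All names are illustrative, not from the presentation.

def athena_query_params(database, sql, results_bucket):
    """Build the request body for Athena's StartQueryExecution API."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {
            # Athena writes result files to S3
            "OutputLocation": f"s3://{results_bucket}/athena-results/"
        },
    }

# Lim asks for the smallest slice first: supplier performance over time.
params = athena_query_params(
    database="finance_lake",
    sql="SELECT supplier_id, avg(defect_rate) FROM returns GROUP BY supplier_id",
    results_bucket="earecstasy-query-results",
)
print(params["ResultConfiguration"]["OutputLocation"])
```

If the first slice isn't enough, only the next slice is loaded and queried; nothing forces a full warehouse load up front.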
13. Amazon Redshift – Modern Data Warehousing
Fast, scalable, fully managed data warehouse at 1/10th the cost
Massively parallel, scales from gigabytes to exabytes
Queries data across your Redshift data warehouse and Amazon S3 data lake
Fast at scale: columnar storage technology to improve I/O efficiency and scale query performance
Open file formats: analyze optimized data formats on direct-attached disks, and all open file formats in S3
Cost-effective: start at $0.25 per hour; as low as $250–$333 per uncompressed terabyte per year
Secure: audit everything; encrypt data end-to-end; extensive certification and compliance
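The "queries data across your Redshift data warehouse and Amazon S3 data lake" capability is Redshift Spectrum, which works by mapping an external schema onto cataloged S3 data. The schema, database, and IAM role names below are hypothetical placeholders.

```python
# Sketch: the SQL a Redshift administrator would run to expose data-lake
# tables alongside local warehouse tables. Names are illustrative.

SPECTRUM_DDL = """
CREATE EXTERNAL SCHEMA IF NOT EXISTS lake
FROM DATA CATALOG
DATABASE 'earecstasy_lake'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftSpectrumRole';
"""

# A single query can then join warehouse tables with lake tables:
JOIN_SQL = """
SELECT o.region, sum(o.amount)
FROM warehouse.orders o
JOIN lake.returns r ON r.order_id = o.order_id
GROUP BY o.region;
"""

print(SPECTRUM_DDL.strip().splitlines()[0])
```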
14. Characteristics of a Data Lake
Future-proof
Flexible access
Dive in anywhere
Collect anything
15. Start with a set of specific questions to answer, then work
backwards to the data required
Raymond, Head of Product
In which regions are the new earbuds selling well?
What is the demand forecast by product category?
What is the social sentiment about our products?
Trending /
Mentions (Social)
Order History /
Returns (CRM)
NOW IN THE DATA LAKE
17. Modern data architecture: data sources → ingest → serving
Insights to enhance business applications and new digital services
Data sources: transactions (ERP), social media
DATA PIPELINES: events are captured as a data stream with Amazon Kinesis and land in the Data Lake
Serving: business users via Amazon QuickSight, the Data Warehouse (Amazon Redshift), and stream data in Amazon Elasticsearch Service
He first looks to the DATA LAKE, and imports only the category data he needs.
He imports JUST ENOUGH data to see if the market is responding to products.
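The event-capture step above amounts to serializing each device or social event and choosing a partition key. The stream and field names below are hypothetical; with boto3, the resulting dict would go to `kinesis.put_record(**record)`.

```python
import json

# Sketch: shaping a Smart Buds event for Amazon Kinesis. Partitioning by
# device ID keeps each device's events ordered within one shard.

def kinesis_record(stream, event, partition_field):
    """Build the request body for Kinesis's PutRecord API."""
    return {
        "StreamName": stream,
        "Data": json.dumps(event).encode("utf-8"),  # payload is raw bytes
        "PartitionKey": str(event[partition_field]),
    }

record = kinesis_record(
    stream="smartbud-events",
    event={"device_id": "sb-0042", "heart_rate": 92, "lat": 10.78, "lon": 106.70},
    partition_field="device_id",
)
```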
18. Common data pipeline configuration
Highly decoupled configurations scale better, are more fault tolerant, and are cost-optimized:
Raw data (Amazon S3) → triggered code (AWS Lambda) → ETL (Hadoop on Amazon EMR) → staged data in the data lake (Amazon S3) → triggered code (AWS Lambda) → ETL & catalog management (AWS Glue) → data warehouse (Amazon Redshift)
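The "triggered code" steps in a pipeline like this are typically Lambda handlers reacting to S3 notifications. Below is a minimal sketch: the event shape is the standard S3 notification format, while the prefix convention and job names are illustrative assumptions.

```python
# Sketch of a decoupled pipeline trigger: an AWS Lambda handler that reads
# the S3 PUT notification and decides which stage to kick off next
# (e.g. an EMR step for raw data, a Glue job for staged data).

def handler(event, context=None):
    jobs = []
    for rec in event.get("Records", []):
        bucket = rec["s3"]["bucket"]["name"]
        key = rec["s3"]["object"]["key"]
        if key.startswith("raw/"):          # raw zone -> Hadoop ETL on EMR
            jobs.append(("emr-etl", f"s3://{bucket}/{key}"))
        elif key.startswith("staged/"):     # staged zone -> Glue to Redshift
            jobs.append(("glue-to-redshift", f"s3://{bucket}/{key}"))
    return jobs

sample = {"Records": [{"s3": {"bucket": {"name": "earecstasy-data"},
                              "object": {"key": "raw/orders/2019-01.csv"}}}]}
print(handler(sample))
```

Because each stage only reacts to objects landing in S3, stages can fail and retry independently, which is what makes the decoupled layout fault tolerant.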
19. Data security
and management
Encryption
Access Controls
Monitoring and Metrics
Audit Trails
Automation
Serverless Computing
Data Discovery and
Protection
Data Visualization
Data movement
Physical Appliances
Hybrid Storage
Private Networks
File Data
WAN Acceleration
Third-party Applications
Streaming Data
Storage types – a complete set of building blocks: file, block, object, archival
20. Modern data architecture: data sources → ingest → serving
Insights to enhance business applications and new digital services
Data sources: transactions (ERP), social media
DATA PIPELINES carry data into the Data Lake; EVENT PIPELINES carry events and insights
Serving: data analysts and business users
22. EarEcstasy has its first direct relationship with consumers
Krzysztof, Data Scientist | Bala, Head of Marketing
What are our customer segments, based on usage?
Can we predict music preferences from location and heart rate?
Are there additional signals in the voice commands?
Can we infer user activity, from scenes in pictures?
How are people using the Smart Buds?
How to understand what they listen to and when?
Which kinds of customers are increasing or decreasing their usage?
23. Start with a set of specific questions to answer, then work
backwards to the data required
Bala, Head of Marketing
How are people using the Smart Buds?
How to understand what they listen to and when?
Which kinds of customers are increasing or decreasing their usage?
Media
consumption
(Partner API)
Registration,
usage [time/place]
(Mobile app)
24. Start with a set of specific questions to answer, then work
backwards to the data required
Krzysztof, Data Scientist
What are our customer segments, based on usage?
Can we predict music preferences from location and heart rate?
Are there additional signals in the voice commands?
Can we infer user activity, from scenes in pictures?
Heart rate, voice, GPS, images (device data)
DATA LAKE, OR NOT?
Registration,
usage [time/place]
(Mobile app)
LOAD TO DATA LAKE
26. Modern data architecture: data sources → ingest → serving
Insights to enhance business applications and new digital services
Data sources: transactions, connected devices, web logs / clickstream
DATA PIPELINES and EVENT PIPELINES feed the Data Lake
Serving: data scientists (in a sandbox for ML / analytics / deep learning) and business users
27. Modern data architecture: data sources → ingest → serving
Innovate for new revenues – personalization and forecasting
Data sources: transactions (ERP), social media, connected devices, web logs / clickstream
DATA PIPELINES and EVENT PIPELINES feed the Data Lake and the ML / analytics layer
Serving: data analysts, data scientists, business users
29. EarEcstasy offers a personalized life soundtrack
Personalized, based on
past preferences,
people with similar behaviors,
and environments detected
30. Use EarEcstasy voice enablement to play music
“I’m tired, play me some music!”
Amazon Transcribe / Comprehend extract the request content: Action: PLAY, Category: MUSIC, Genre: <RECOMMEND>
Connected device data arrives via Amazon Kinesis Streams: Location: <FIND GPS>, Mood: <FIND HR>
The recommendation draws on HISTORY and PEOPLE LIKE YOU: “Twenty One Pilots!”
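A rough sketch of the intent-extraction step: Amazon Transcribe would return the text of the utterance, and the simple keyword rules below stand in for Amazon Comprehend's language understanding. The rules and mood mapping are purely illustrative, not EarEcstasy's actual logic.

```python
# Sketch: mapping a transcribed voice command to an action payload.
# Keyword matching here is a stand-in for an NLU service.

MOOD_WORDS = {"tired": "LOW_ENERGY", "excited": "HIGH_ENERGY"}

def parse_command(transcript):
    text = transcript.lower()
    intent = {"action": None, "category": None, "mood_hint": None}
    if "play" in text:
        intent["action"] = "PLAY"
    if "music" in text:
        intent["category"] = "MUSIC"
    for word, mood in MOOD_WORDS.items():
        if word in text:
            intent["mood_hint"] = mood   # combined later with heart rate
    return intent

cmd = parse_command("I'm tired, play me some music!")
print(cmd)
```

Downstream, the mood hint would be merged with the heart-rate and GPS signals from the device stream before the recommender picks a genre.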
31. Use the mobile app to take a picture to identify activity
Object identification (Amazon Rekognition, Image): CHAIR 97%, LAPTOP 95%, LAMP 88%, DESK 82%
Image classification (Amazon SageMaker): A QUIET OFFICE
Combined with <HISTORY>: WORKING!
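The step from object labels to an activity guess can be sketched with a small rule: if enough high-confidence labels match a known scene, infer the activity. The object set, threshold, and rule are illustrative assumptions standing in for the SageMaker image-classification model.

```python
# Sketch: turning Rekognition-style (label, confidence) pairs into an
# activity inference. Mapping and threshold are illustrative only.

OFFICE_OBJECTS = {"CHAIR", "LAPTOP", "LAMP", "DESK"}

def infer_activity(labels, threshold=0.85):
    """Return an activity guess from confident object labels."""
    confident = {name for name, conf in labels if conf >= threshold}
    if len(confident & OFFICE_OBJECTS) >= 2:
        return "WORKING"
    return "UNKNOWN"

labels = [("CHAIR", 0.97), ("LAPTOP", 0.95), ("LAMP", 0.88), ("DESK", 0.82)]
print(infer_activity(labels))  # -> WORKING
```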
32. Modern data architecture: data sources → ingest → serving
Real-time engagement and interactive customer experiences
Data sources: transactions (ERP), social media, connected devices, web logs / clickstream
DATA PIPELINES and EVENT PIPELINES feed the Data Lake, ML / analytics, and AI services that predict and recommend
Serving: data analysts, data scientists, business users, engagement platforms, automation / events (data → event → action)
33. Business Outcomes on a Modern Data Architecture
Outcome 1 : Modernize and consolidate
• Insights to enhance business applications and create new digital
services
Outcome 2 : Innovate for new revenues
• Personalization, demand forecasting, risk analysis
Outcome 3 : Real-time engagement
• Interactive customer experience, event-driven automation, fraud
detection
34. Ready to build better business from your ideas?
Shortlist projects that directly impact customer engagement and adoption.
Build simple data pipelines that allow you to test new ideas and fill your data lake.
Ask our solution architects and professional services teams to help you build.
38. Diagram: virtual machines vs. containers on the same server.
With a hypervisor, each VM (VM 1, VM 2) carries its own guest OS, libraries, and binaries and runs one app — 2 apps per server.
With containers (C 1–C 6) sharing the host OS and common bins/libs from Docker images (OS Image 1, OS Image 2), the same server runs 6 apps.
2 apps vs. 6 apps.
39. Why do we care?
44. Windows Instance Example
m4.4xlarge → $1.736/hr (Singapore), CPU utilization ~15%
Change instance type to m4.xlarge → $0.434/hr: CPU utilization ~60%, with ~40% headroom remaining
75% savings
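The savings figure follows directly from the two hourly rates; a quick check:

```python
# Right-sizing arithmetic from the slide: m4.4xlarge at $1.736/hr down to
# m4.xlarge at $0.434/hr (Singapore on-demand rates quoted above).

def hourly_savings(old_rate, new_rate):
    """Fraction of spend saved by moving to the cheaper instance."""
    return (old_rate - new_rate) / old_rate

pct = hourly_savings(1.736, 0.434)
print(f"{pct:.0%}")  # -> 75%
```

The instance is a quarter of the size and a quarter of the price, so as long as utilization stays under capacity (here ~60% with headroom), the 75% saving is pure win.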
45. Bring Your Own License (BYOL) – Windows Server
Running two m4.4xlarge instances:
On-Demand: ~2,500 USD
3-year Reserved Instance: ~1,600 USD (36% savings)
3-year reservation on a Dedicated Host: ~800 USD (a further 50% savings)
47. What we have found so far
Savings: Reserved Instances, bigger instances, right sizing
Increased utilization
Headache-free IT: easy deployment, easy patching, isolation
56. A Task is a set of containers (e.g. SQL Server, web sites) that runs on a Cluster of instances. Each task declares its resource demand (e.g. memory, CPU) and placement constraints (e.g. HR, Finance, instance size).
57. A Service runs tasks on the cluster and answers: How many tasks? What deployment strategy? What auto scaling strategy?
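Those three questions map directly onto fields of an ECS service definition. The cluster, service, and task-definition names below are hypothetical; with boto3, the dict would be passed to `ecs.create_service(**service)`.

```python
# Sketch: an ECS service definition answering the slide's three questions.
# Names are illustrative placeholders.

service = {
    "cluster": "earecstasy-cluster",
    "serviceName": "web-frontend",
    "taskDefinition": "web-frontend:3",
    "desiredCount": 4,                      # how many tasks?
    "deploymentConfiguration": {            # deployment strategy?
        "maximumPercent": 200,              # allow doubling during a rollout
        "minimumHealthyPercent": 50,        # never drop below half capacity
    },
    "placementStrategy": [                  # spread tasks across AZs
        {"type": "spread", "field": "attribute:ecs.availability-zone"},
    ],
}
print(service["desiredCount"])
```

Auto scaling is configured separately (Application Auto Scaling targets the service's desired count), so the service definition only fixes the baseline.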
68. What to Expect from the Session
• Database Migration Context
• Database Migration Tools
• Introduction to the AWS Migration Framework
• Database Migration Effort
• Customer References
• Next Steps
72. Amazon DynamoDB
Fast and flexible NoSQL database service for any scale
Fast, consistent performance: consistent single-digit millisecond latency; DAX in-memory performance reduces response times to microseconds
Highly scalable: automatic scaling to hundreds of terabytes of data serving millions of requests per second
Fully managed: automatic provisioning, infrastructure management, scaling, and configuration with zero downtime
Business-critical reliability: data is replicated across fault-tolerant Availability Zones, with fine-grained access control
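For a feel of the data model, here is an item in the low-level attribute-typed JSON that DynamoDB's PutItem API expects ("S" = string, "N" = number). The table and attribute names are hypothetical; with boto3, this would be the `Item` argument to `dynamodb.put_item(TableName="DeviceEvents", Item=item)`.

```python
import json

# Sketch: a DynamoDB item in wire format. Note that numbers are sent as
# strings to preserve arbitrary precision.

item = {
    "device_id":  {"S": "sb-0042"},
    "event_time": {"S": "2019-06-01T08:15:00Z"},
    "heart_rate": {"N": "92"},
    "activity":   {"S": "WORKING"},
}
print(json.dumps(item, sort_keys=True)[:40])
```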
75. AWS Database Migration Service (AWS DMS)
DMS migrates databases to AWS easily and
securely with minimal downtime. It can migrate
your data to and from most widely used
commercial and open-source databases.
Amazon Aurora
S3 Bucket
DynamoDB
76. Keep Your Apps Running During the Migration
Application users on the customer premises stay connected over the Internet/VPN to AWS while AWS Database Migration Service replicates:
1. Start a replication instance
2. Connect to source and target databases
3. Select tables, schemas, or databases
4. Let AWS DMS create tables, load data, and keep them in sync
5. Switch applications over to the target at your convenience
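The "select tables, schemas, or databases" step corresponds to a DMS table-mapping document. The schema name below is hypothetical; this JSON would be passed as `TableMappings` when creating a replication task.

```python
import json

# Sketch: a minimal DMS selection rule including every table in one schema.
# Schema name is an illustrative placeholder.

table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-sales-schema",
            "object-locator": {"schema-name": "SALES", "table-name": "%"},
            "rule-action": "include",
        }
    ]
}
print(json.dumps(table_mappings)[:20])
```

Additional rules (exclusions, renames, column transformations) extend the same `rules` array, which is why scoping a migration rarely requires touching the source database itself.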
77. AWS Schema Conversion Tool (AWS SCT)
SCT helps automate many database schema and
code conversion tasks when migrating between
database engines or data warehouse engines
Amazon Aurora
78. SCT can tell you how hard the migration will be
1. Connect SCT to the source and target databases.
2. Run the Assessment Report.
3. Read the Executive Summary.
4. Follow the detailed instructions.
80. Tools for Migration Project Phases
Phase | Service/Tool | Notes
Assessment | AWS Schema Conversion Tool | Reports on the database objects, complexity, and types of migration issues
Schema Migration | AWS Schema Conversion Tool | Copies or migrates a schema, depending on whether the migration is homogeneous or heterogeneous
Data Migration | AWS Database Migration Service, AWS Schema Conversion Tool | Bulk load and change data capture (CDC) options; extraction and load for large data warehouses, including AWS Snowball integration
Application Migration | AWS Schema Conversion Tool | SQL statement migration in application code
Data Validation | AWS Database Migration Service | Ensure data is the same on source and target
Functional Testing | Various tools on Marketplace | Ensure the application runs as intended
Performance Testing | Various tools on Marketplace | Ensure the application performs as intended
81. Tools for Migration Scenarios
Scenario | Example | Recommendation
Homogeneous migration to the same database version and edition | Migration of Oracle Database 11gR2 Enterprise Edition from on-premises to EC2 | Use the native replication technology to create a standby database, then fail over to the standby
Homogeneous migration to a different version | Migration of MySQL 5.5 to MySQL 5.7 | AWS Schema Conversion Tool and AWS Database Migration Service
Homogeneous migration to a different edition | Migration of SQL Server Enterprise Edition to Standard Edition | AWS Schema Conversion Tool and AWS Database Migration Service
Heterogeneous migration | Migration from Oracle Database to PostgreSQL | AWS Schema Conversion Tool and AWS Database Migration Service
86. AWS Migration Framework - Readiness & Planning
• Project Control: strategy (business driver), key stakeholders and team, plan (scope, schedule, resources), cost estimation
• Portfolio discovery
• Migration plan
• Operations integration
• Security
Project Control focuses on ensuring there is a migration strategy in place that is supported by key stakeholders in the organisation. Additionally, we define the team that will carry out the work, with associated timelines and cost estimations.
Sample decision points:
• Who is the executive sponsor?
• Are there any compelling events that will affect the migration strategy?
• Do we have the right resources? How are they organized?
• What are the timeframes we are working with?
• Do we have the necessary budget?
87. AWS Migration Framework - Readiness & Planning (continued)
Portfolio discovery: quickly understand which applications are Cloud Eligible, Cloud Friendly, or Cloud Native, then execute a deep-dive analysis on just that subset of applications.
88. Application Assessment
• Business driver and intended ROI?
• Migration sponsor (business owner, C-level)?
• ISV application? Does the ISV support the target?
• Maintenance window for the migration?
• Design documentation?
• Original developers/DBAs still available?
89. Database Assessment
• How many database objects (tables, triggers, SPs, users, etc.)?
• How much data?
• Complexity of the SPs and triggers?
• Proprietary DB features?
• Non-standard or custom data types?
• Character set conversions?
• Time zone or UTC?
• User authentication method?
• Licensing mechanism (cores, users, ULA etc.)
90. Application Technical Assessment
• Database Access:
• SQL statements throughout the code?
• Calls to a data abstraction layer?
• API calls?
• ANSI SQL used where possible?
• SQL complexity, e.g. analytics with many joins or simple CRUD?
• Number of lines of SQL code?
• Application access, e.g. LDAP, DB Users, etc.
91. The 6Rs of Migration Planning
Diagram: discover, assess & prioritize applications, then determine the migration path for each application:
• Use migration tools and automate (rehost)
• Modify the underlying infrastructure, with manual config, deploy, and install (replatform)
• Purchase COTS/SaaS & licensing, determine the new platform, with manual install & setup and integration (repurchase)
• Redesign the application/infrastructure architecture, app code development, full ALM/SDLC (re-architect)
Then validate and transition to production.
93. AWS Migration Framework - Activate
• Determine your application priorities and group
integrated applications together
• Outline the success criteria for each application
migration
• Create your AWS landing zone (accounts, VPC, subnets,
IAM roles, VPN/Direct Connect, etc.)
• Configure DMS, SCT and other migration tools
• Team creation
• POC/pilot
• Prioritized Backlog
⎼ Application groups
⎼ Migration strategy
⎼ Success criteria
• Ops Integration –
Foundation and Landing
Zone (target zone setup)
• Setup Factory (Tools,
Teams, Processes)
• Pilot Migration
ACTIVATE
94. Building a Migration Team
Application architect/developer: Application expert who can identify
what components are important, complex, redundant, etc.
Source DBA: Knows the database design, schema, features used and
what must be migrated to the target.
Target DBA: An expert in the target database to help map features
from the source DB with the Source DBA.
AWS Solution Architect: Determines the correct target architecture in
AWS and is familiar with DMS/SCT.
Application/Database Developers: Customer and/or partner
resources to migrate the stored procedures, triggers and application
code.
95. Hiring and Developing Talent
New skills are needed for the target DB, and often for AWS if migrating from on-premises.
Develop training plans for existing employees.
Hire in required skills where necessary.
Retrain, redeploy, or make redundant those whose skills are no longer relevant.
96. Pilot/POC
Choose a reasonably complex module/component to
migrate to validate your assumptions in the Activate phase
You should:
• Obtain more accurate migration assessments
• Determine what can be automated
• Learn how the migration tools behave (limitations, bugs,
improvements needed)
• Learn what skills are missing from your team
98. AWS Migration Framework - Execute
• Always have a backup plan!
• Execute according to the lessons from the Pilot/POC
• It typically takes the same amount of time to migrate the DB as to migrate the application (assuming a data abstraction layer)
• Determine how you will cut over:
  • Parallel run: expensive and difficult
  • Minimal downtime: DMS + CDC
  • Large maintenance window: application and data verification before go-live
Process: Discover → Design → Build → Integrate → Validate → Cutover
100. AWS Migration Framework - Optimize
• DMS instance and task optimization
• Database tuning
• Database instance right sizing
• Application tuning
• Application instance right sizing
• Use EC2 and RDS stop/start to optimize costs
• Purchase reserved instances and use spot
instances
• Look for contention and evaluate caching, NoSQL,
federation and adoption of a microservices
strategy
• Perform HA/DR scenarios and optimize the use of AWS managed services to help, e.g. RDS Multi-AZ, Auto Scaling
• Application optimization
• Process optimization
• Operational optimization
• Cost optimization
OPTIMIZE
102. Database migration – a multi-phase process
Phase | Description | Automation | Effort (%)
1 | Assessment | SCT | 2
2 | Database schema conversion | SCT/DMS | 14
3 | Application conversion/remediation | SCT | 25
4 | Scripts conversion | SCT | 7
5 | Integration with 3rd-party applications | – | 3
6 | Data migration | DMS | 4
7 | Functional testing of the entire system | – | 29
8 | Performance tuning | SCT | 2
9 | Integration and deployment | – | 7
10 | Training and knowledge transfer | – | 2
11 | Documentation and version control | – | 2
12 | Post-production support | – | 3
104. Oracle to Aurora Migration Playbook
• Topic-by-topic overview of Oracle to Aurora
PostgreSQL migrations and “hands-on” best
practices
• How to migrate from proprietary features and
the different database objects
• Migration best practices
SCT DMS Playbook
Schema Data Best practices
https://aws.amazon.com/dms/getting-started/
110. Next Steps
• Talk to your AWS account team and AWS Partner
• Ask us about funding for POCs and commercial DB
migrations (e.g. Oracle Database to Aurora)
• Read Documentation, White Papers, Playbooks
• Links:
• DMS & SCT: https://aws.amazon.com/dms/
• Getting Started Guides and Playbooks:
https://aws.amazon.com/dms/getting-started/
111. Thank You
You will receive today’s webinar recording and presentation deck; look out for them in your inbox.