Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019
1. PUBLIC SECTOR SUMMIT
Public Sector Brussels
04.09.19
2. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. PUBLIC SECTOR SUMMIT
Building a Modern Data Platform in the Cloud
Javier Ramirez
AWS Tech Evangelist
@supercoco9
DAT1
Brussels
04.09.19
3.
Related breakouts
Everything You Need to Know About Big Data: From
Architectural Principles to the Best Practices
Manos Samatas, Solutions Architect, Amazon Web Services
Tableau and AWS: Analytics in the Cloud
4.
Agenda
Challenges of data engineering and analytics
Building a data lake with S3. Ingesting data into the cloud
Data catalog and ETL with AWS Glue
Data warehouse with Redshift, Spectrum, and Athena
Business dashboards with QuickSight
Customer presentation
Demo
5.
A brief opinionated history of data analytics
Before 2009: The DBA years
Problem: My reports make my database server very slow
Solution: Overnight DB dump; read-only replica
2009-2011: The Hadoop epiphany
Problem: My data doesn’t fit in one machine, and it’s not only transactional
Solution: Hadoop; Map/Reduce all the things
2012-2014: The Message Broker and NoSQL Age
Problem: My data is very fast; Map/Reduce is hard to use
Solution: Kafka/RabbitMQ; Cassandra/HBase/Storm; basic ETL; Hive
2015-2017: The Spark kingdom and the spreadsheet wars
Problem: Duplicating batch/stream is inefficient; I need to cleanse my source data; Hadoop ecosystem is hard to manage; my data scientists don’t like Java; I am not sure which data we are already processing
Solution: Kafka/Spark; complex ETL; create new departments for data governance; spreadsheet all the things
2017-2018: The myth of DataOps
Problem: Streaming is hard; my schemas have evolved; I cannot query old and new data together; my cluster is running old versions and upgrading is hard; I want to use ML
Solution: Kafka/Flink (Java or Scala required); complex ETL with a pinch of ML; Apache Atlas; commercial distributions
6.
Some problems during all periods
• My team spends more time maintaining the cluster than adding functionality
• Security and monitoring are hard
• Most of the time my cluster sits idle; then it becomes a bottleneck
• I don’t have the time to experiment
• Data preparation, cleansing, and basic transformations take a disproportionately
high amount of my time. And it’s so frustrating
7.
Some simple things that scare me (and eat my productivity)
• Text encodings
• Empty strings. Literal "NULL" strings. Uppercase and lowercase
• Date and time formats: which date would you say 1/4/19 is? And 1553589297?
• CSV, especially if uploaded by end users
• A big JSON file in which row 176,543 has a property never seen before
• The same JSON file when all the numbers are strings
• XML
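The date examples on this slide are ambiguous for a concrete reason, sketched below in Python with only the standard library. The format strings and the choice of UTC are assumptions made for illustration; the slide does not say which reading is intended.

```python
from datetime import datetime, timezone

raw_date = "1/4/19"

# The same string parses to two different dates depending on the assumed convention.
day_first = datetime.strptime(raw_date, "%d/%m/%y").date()    # day/month/year reading
month_first = datetime.strptime(raw_date, "%m/%d/%y").date()  # month/day/year reading

# A bare integer is usually a Unix epoch timestamp in seconds. The time zone
# is not encoded in the number, so UTC is assumed here.
epoch_seconds = 1553589297
as_utc = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

print(day_first.isoformat())    # 2019-04-01
print(month_first.isoformat())  # 2019-01-04
print(as_utc.isoformat())       # 2019-03-26T08:34:57+00:00
```

Neither parse fails, which is the problem: without metadata about the source, the wrong convention is applied silently.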
8.
Let’s make data engineering and analytics less scary
9.
More data lakes & analytics on AWS than anywhere else
10.
A data lake is a centralized repository that allows
you to store all your structured and unstructured
data at any scale
11.
Data Lakes, Analytics, and ML Portfolio from AWS
Broadest, deepest set of analytic services
Machine Learning: Amazon SageMaker, AWS Deep Learning AMIs, Amazon Rekognition, Amazon Lex, AWS DeepLens, Amazon Comprehend, Amazon Translate, Amazon Transcribe, Amazon Polly
Analytics: Amazon Athena, Amazon EMR, Amazon Redshift, Amazon Elasticsearch Service, Amazon Kinesis, Amazon QuickSight
On-premises Data Movement: AWS Direct Connect, AWS Snowball, AWS Snowmobile, AWS Database Migration Service, AWS Storage Gateway
Real-time Data Movement: AWS IoT Core, Amazon Kinesis Data Firehose, Amazon Kinesis Data Streams, Amazon Kinesis Video Streams
Data Lake on AWS: Storage | Archival Storage | Data Catalog
12.
Data Movement From On-premises Datacenters
AWS Snowball, Snowball Edge and Snowmobile
Petabyte- and exabyte-scale data transport solutions that use secure appliances to transfer large amounts of data into and out of the AWS cloud
AWS Direct Connect
Establish a dedicated network connection from your premises to AWS; reduce your network costs, increase bandwidth throughput, and get a more consistent network experience than Internet-based connections
AWS Storage Gateway
Lets your on-premises applications use AWS for storage; includes a highly optimized data transfer mechanism, bandwidth management, and a local cache
AWS Database Migration Service
Migrate databases from the most widely used commercial and open-source offerings to AWS quickly and securely, with minimal downtime to applications
13.
14.
Data Movement From Real-time Sources
Amazon Kinesis Video Streams
Securely stream video from connected devices to AWS for analytics, machine learning (ML), and other processing
Amazon Kinesis Data Firehose
Capture, transform, and load data streams into AWS data stores for near real-time analytics with existing business intelligence tools
Amazon Kinesis Data Streams
Build custom, real-time applications that process data streams using popular stream processing frameworks
AWS IoT Core
Supports billions of devices and trillions of messages, and can process and route those messages to AWS endpoints and to other devices reliably and securely
Managed Streaming for Kafka
Fully managed open-source platform for building real-time streaming data pipelines and applications
15.
Amazon S3: Object Storage
Security and Compliance
Three different forms of encryption; encrypts data in transit when replicating across regions; log and monitor with CloudTrail; use ML to discover and protect sensitive data with Macie
Flexible Management
Classify, report, and visualize data usage trends; objects can be tagged to see storage consumption, cost, and security; build lifecycle policies to automate tiering and retention
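A lifecycle policy of the kind mentioned under Flexible Management can be written as a plain configuration document. The sketch below builds one in Python, in the shape accepted by boto3's `put_bucket_lifecycle_configuration`. The rule ID, the prefix, and the day thresholds are hypothetical values chosen for illustration, and no AWS call is made.

```python
import json

def build_lifecycle_configuration(prefix, ia_after_days, glacier_after_days, expire_after_days):
    """Return a lifecycle configuration that tiers objects down, then expires them."""
    if not ia_after_days < glacier_after_days < expire_after_days:
        raise ValueError("thresholds must increase: IA < Glacier < expiration")
    return {
        "Rules": [
            {
                "ID": "tier-and-expire-" + prefix.strip("/"),  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }

# Hypothetical policy: infrequent access after 30 days, archive after 90, delete after 365.
config = build_lifecycle_configuration("raw/", 30, 90, 365)
print(json.dumps(config, indent=2))
```

Applying it would be a call such as `s3.put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=config)` on a boto3 S3 client, with a bucket name of your own.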
Durability, Availability & Scalability
Built for eleven nines (99.999999999%) of durability; data distributed across 3 physical facilities in an AWS Region; can be replicated automatically to any other AWS Region
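To make the eleven nines figure concrete, the arithmetic below treats 99.999999999% as an annual per-object durability and assumes object losses are independent. Both are simplifying assumptions for illustration, not a statement of how S3 measures durability.

```python
from fractions import Fraction

# Eleven nines: 99.999999999% durability, so the complement is 1 in 10^11.
annual_loss_probability = Fraction(1, 10**11)

def expected_annual_losses(object_count):
    """Expected number of objects lost per year under the independence assumption."""
    return object_count * annual_loss_probability

def years_per_expected_loss(object_count):
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)

stored = 10_000_000  # hypothetical number of stored objects
print(float(expected_annual_losses(stored)))  # 0.0001
print(years_per_expected_loss(stored))        # 10000
```

With ten million objects, the expected loss is one object every ten thousand years on average.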
Query in Place
Run analytics & ML on the data lake without data movement; S3 Select can retrieve a subset of data, improving analytics performance by up to 400%
16.
Amazon Glacier: Backup and Archive
Durability, Availability & Scalability
Built for eleven nines (99.999999999%) of durability; data distributed across 3 physical facilities in an AWS Region; can be replicated automatically to any other AWS Region
Secure
Log and monitor with CloudTrail; Vault Lock enables WORM storage capabilities, helping satisfy compliance requirements
Retrieves data in minutes
Three retrieval options to fit your use case; expedited retrievals with Glacier Select can return data in minutes
Inexpensive
Lowest-cost AWS object storage class, allowing you to archive large amounts of data at a very low cost
17.
Data Preparation Accounts for ~80% of the Work
Building training sets
Cleaning and organizing data
Collecting data sets
Mining data for patterns
Refining algorithms
Other
https://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#6493d6c76f63
18.
AWS Glue: Data Catalog
Make data discoverable
• Automatically discovers data and stores schema
• Catalog makes data searchable, and available for ETL
• Catalog contains table and job definitions
• Computes statistics to make queries efficient
• Run ad hoc or on a schedule; serverless – only pay when
crawler runs
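What discovering data and storing its schema involves can be illustrated with a toy schema inferrer in plain Python. This is not how AWS Glue crawlers are implemented; the type names and the widening rule are assumptions made for the sketch.

```python
import json

def type_name(value):
    """Map a JSON value to a simple column type name."""
    if isinstance(value, bool):   # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if value is None:
        return "null"
    return "string"

def merge_types(current, new):
    """Widen a column type when records disagree."""
    if current is None or current == new or current == "null":
        return new
    if new == "null":
        return current
    if {current, new} == {"bigint", "double"}:
        return "double"
    return "string"  # fall back to the most permissive type

def infer_schema(json_lines):
    """Infer {column: type} from newline-delimited JSON records."""
    schema = {}
    for line in json_lines:
        for key, value in json.loads(line).items():
            schema[key] = merge_types(schema.get(key), type_name(value))
    return schema

# Hypothetical records: a number stored as a string, and a property that
# appears for the first time in the last row.
records = [
    '{"id": 1, "price": 10, "city": "Brussels"}',
    '{"id": 2, "price": 9.5, "city": null}',
    '{"id": "3", "price": 7, "new_property": true}',
]
print(infer_schema(records))
```

The third record shows two of the cases from the earlier slide on scary data: `id` degrades to a string column, and `new_property` is added to the schema only once it is seen.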
[Diagram: Glue Data Catalog; discover data and extract schema; compliance]
19.
AWS Glue: ETL Service
Make ETL scripting and deployment easy
• Automatically generates ETL code: Spark (Scala/Python) or Python shell script
• Code is customizable (demo later on. Yay!)
• Endpoints provided to edit, debug, and test code
• Jobs are scheduled or event-based
• Serverless
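The basic transformations such a job performs often look like the following. This is a plain-Python sketch rather than Glue-generated Spark code, and the field names and the list of null markers are hypothetical.

```python
NULL_MARKERS = {"", "null", "none", "n/a"}  # assumed spellings of a missing value

def clean_value(value):
    """Trim strings, map null markers to None, and convert numeric strings."""
    if not isinstance(value, str):
        return value
    text = value.strip()
    if text.lower() in NULL_MARKERS:
        return None
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            continue
    return text

def clean_record(record):
    """Normalize keys to lower case and clean every value."""
    return {key.strip().lower(): clean_value(value) for key, value in record.items()}

raw = {"ID": "42", "Price": " 9.50 ", "City": "NULL", "Name": "  Ana "}
print(clean_record(raw))
```

Note that Python's `float` also accepts strings such as `nan` and `inf`, so a production job would need a stricter numeric rule than this sketch uses.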
20.
Data Lakes, Analytics, and ML Portfolio from AWS (same portfolio overview as slide 11)
21.
Amazon EMR: Big Data Processing
Low cost
Flexible billing with per-second billing, EC2 Spot, Reserved Instances, and auto-scaling to reduce costs 50–80%
Easy
Launch fully managed Hadoop & Spark in minutes; no cluster setup, node provisioning, or cluster tuning
Latest versions
Updated with the latest open source frameworks within 30 days of release
Use S3 storage
Process data directly in the S3 data lake securely with high performance using the EMRFS connector
22.
Amazon Redshift: Data Warehousing
Fast at scale
Columnar storage technology to improve I/O efficiency and scale query performance
Secure
Audit everything; encrypt data end-to-end; extensive certification and compliance
Open file formats
Analyze optimized data formats on the latest SSD, and all open data formats in Amazon S3
Inexpensive
As low as $1,000 per terabyte per year, 1/10th the cost of traditional data warehouse solutions; start at $0.25 per hour
23.
Amazon Redshift Spectrum
Extend the data warehouse to exabytes of data in the S3 data lake
[Diagram: Redshift data and the S3 data lake, connected by the Redshift Spectrum query engine]
• Exabyte-scale Redshift SQL queries against S3
• Join data across Redshift and S3
• Scale compute and storage separately
• Stable query performance and unlimited concurrency
• CSV, ORC, Avro, & Parquet data formats
• Pay only for the amount of data scanned
24.
Let’s play a game: SQL on an exabyte of data
Werner Vogels, Amazon’s CTO, AWS Summit San Francisco 2017
https://youtu.be/RpPf38L0HHU?t=3963
25.
Numbers are fun
Werner Vogels, Amazon’s CTO, AWS Summit San Francisco 2017
https://youtu.be/RpPf38L0HHU?t=3963
26.
Numbers are fun
Werner Vogels, Amazon’s CTO, AWS Summit San Francisco 2017
https://youtu.be/RpPf38L0HHU?t=3963
27.
Amazon Athena: Interactive Analysis
Interactive query service to analyze data in Amazon S3 using standard SQL
No infrastructure to set up or manage and no data to load
Ability to run SQL queries on data archived in Amazon Glacier (coming soon)
Query Instantly
Zero setup cost; just point to S3 and start querying
Open
ANSI SQL interface, JDBC/ODBC drivers, multiple formats, compression types, and complex joins and data types
Easy
Serverless: zero infrastructure, zero administration; integrated with QuickSight
Pay per query
Pay only for queries run; save 30–90% on per-query costs through compression
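Because the charge depends on bytes scanned, the effect of compression is simple arithmetic. The sketch below assumes a price of 5 USD per terabyte scanned and a 4:1 compression ratio; both numbers are illustrative assumptions, so check current pricing before relying on them.

```python
TB = 1024 ** 4  # bytes per terabyte as used in this sketch (binary definition, an assumption)

def query_cost_usd(bytes_scanned, usd_per_tb=5.0):
    """Cost of one query that scans the given number of bytes."""
    return bytes_scanned / TB * usd_per_tb

def savings_fraction(uncompressed_bytes, compression_ratio):
    """Fraction of the per-query cost saved by scanning compressed data."""
    compressed_bytes = uncompressed_bytes / compression_ratio
    return 1 - query_cost_usd(compressed_bytes) / query_cost_usd(uncompressed_bytes)

raw_cost = query_cost_usd(2 * TB)             # 2 TB of uncompressed text
compressed_cost = query_cost_usd(2 * TB / 4)  # the same data at a 4:1 ratio
print(raw_cost, compressed_cost, savings_fraction(2 * TB, 4))  # 10.0 2.5 0.75
```

Columnar formats reduce the bill further in the same way, because a query then scans only the columns it references.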
28.
Amazon QuickSight
Easy
Empower everyone
Seamless connectivity
Fast analysis
Serverless
Now with ML superpowers!
29.
Data Lakes from AWS
Data Lake on AWS
Cost-effective
Scalable and durable
Secure
Open and comprehensive
Analytics
Machine Learning
Real-time Data Movement
On-premises Data Movement
30.
AWS Provides Highest Levels of Security
Secure
Compliance: AWS Artifact, Amazon Inspector, AWS CloudHSM, Amazon Cognito, AWS CloudTrail
Security: Amazon GuardDuty, AWS Shield, AWS WAF, Amazon Macie, VPC
Encryption: AWS Certificate Manager, AWS Key Management Service, encryption at rest, encryption in transit, bring your own keys, HSM support
Identity: AWS IAM, AWS SSO, Amazon Cloud Directory, AWS Directory Service, AWS Organizations
Customers need multiple levels of security, identity and access management, encryption, and compliance to secure their data lake
31.
Compliance: Virtually Every Regulatory Agency
Global
CSA: Cloud Security Alliance Controls
ISO 9001: Global Quality Standard
ISO 27001: Security Management Controls
ISO 27017: Cloud Specific Controls
ISO 27018: Personal Data Protection
PCI DSS Level 1: Payment Card Standards
SOC 1: Audit Controls Report
SOC 2: Security, Availability, & Confidentiality Report
SOC 3: General Controls Report
United States
CJIS: Criminal Justice Information Services
DoD SRG: DoD Data Processing
FedRAMP: Government Data Standards
FERPA: Educational Privacy Act
FIPS: Government Security Standards
FISMA: Federal Information Security Management
GxP: Quality Guidelines and Regulations
FFIEC: Financial Institutions Regulation
HIPAA: Protected Health Information
ITAR: International Arms Regulations
MPAA: Protected Media Content
NIST: National Institute of Standards and Technology
SEC Rule 17a-4(f): Financial Data Standards
VPAT/Section 508: Accountability Standards
Asia Pacific
FISC [Japan]: Financial Industry Information Systems
IRAP [Australia]: Australian Security Standards
K-ISMS [Korea]: Korean Information Security
MTCS Tier 3 [Singapore]: Multi-Tier Cloud Security Standard
My Number Act [Japan]: Personal Information Protection
Europe
C5 [Germany]: Operational Security Attestation
Cyber Essentials Plus [UK]: Cyber Threat Protection
G-Cloud [UK]: UK Government Standards
IT-Grundschutz [Germany]: Baseline Protection Methodology
32.
Data Lakes from AWS
Data Lake on AWS
Cost-effective
Scalable and durable
Secure
Open and comprehensive
Analytics
Machine Learning
Real-time Data Movement
On-premises Data Movement
33.
AWS Runs the Largest Global Cloud Infrastructure
Scalable and durable
For example: Amazon S3 holds trillions of objects and regularly peaks at millions of requests per second
[Chart: customer data over time]
“…the scale at which AWS operates its public cloud storage services dwarfs the other vendors in this Magic Quadrant.”
- Gartner Magic Quadrant for Public Cloud Storage Services, Worldwide, Raj Bala, Arun Chandrasekaran, John McArthur, July 24, 2017
34. CHALLENGE
Need to create constant feedback loop
for designers
Gain up-to-the-minute understanding of
gamer satisfaction to guarantee gamers
are engaged, thus resulting in the most
popular game played in the world
Fortnite | 125+ million players
35.
Epic Games uses Data Lakes and analytics
Entire analytics platform running on AWS
S3 leveraged as a Data Lake
All telemetry data is collected with Kinesis
Real-time analytics done through Spark on EMR,
DynamoDB to create scoreboards and real-time queries
Use Amazon EMR for large batch data processing
Game designers use data to inform their decisions
[Architecture diagram. Sources: game clients, game servers, launcher, game services, plus other sources (APIs, databases, S3). Ingestion: Kinesis. Near-real-time pipelines: Spark on EMR and DynamoDB, with Grafana, the scoreboards API, limited raw data (real-time ad-hoc SQL), and user ETL (metric definition). Batch pipelines: S3 (data lake) and ETL using EMR, with Tableau/BI and ad-hoc SQL.]
36.
Data Lakes from AWS
Data Lake on AWS
Lowest cost
Scalable and durable
Secure
Open and comprehensive
Analytics
Machine Learning
Real-time Data Movement
On-premises Data Movement
37.
Pay Only for the Resources You Use as You Scale
Lowest Cost
• Pay-as-you-go for the resources you consume
• As low as $5 per TB scanned with Athena (about $0.005/GB)
• EMR and Athena can automatically scale down resources after a job completes, reducing costs
• Commit to a set term and save up to 75% with Reserved Instances
• Run on spare compute capacity with EMR and save up to 90% with Spot
Traditional approach leads to wasted capacity
[Chart, Traditional: Rigid. Fixed server capacity against varying demand. Unmet demand: upset players, missed revenue. Excess capacity: wasted $$$]
AWS approach: pay for the capacity you use
[Chart, AWS: Elastic. Capacity follows demand]
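The two charts can be reduced to a small calculation. The demand figures and the fixed capacity below are made-up numbers for illustration, and the elastic case is idealized: it ignores scaling lag and minimum billing increments.

```python
def rigid_capacity_report(demand, capacity):
    """Compare a fixed capacity with demand, one value per time period."""
    wasted = sum(max(capacity - d, 0) for d in demand)  # paid for but unused
    unmet = sum(max(d - capacity, 0) for d in demand)   # demand that could not be served
    return {"provisioned": capacity * len(demand), "wasted": wasted, "unmet": unmet}

def elastic_capacity_report(demand):
    """With elastic capacity, provisioned units track demand in every period."""
    return {"provisioned": sum(demand), "wasted": 0, "unmet": 0}

demand = [20, 35, 80, 120, 60, 25]  # hypothetical server-hours needed per period
print(rigid_capacity_report(demand, capacity=100))
# {'provisioned': 600, 'wasted': 280, 'unmet': 20}
print(elastic_capacity_report(demand))
# {'provisioned': 340, 'wasted': 0, 'unmet': 0}
```

In this example the rigid setup pays for 600 units, wastes 280 of them, and still fails to serve 20 units at the peak.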
38.
AWS databases and analytics
Broad and deep portfolio, built for builders
Analytics: Amazon Redshift (data warehousing), Amazon EMR (Hadoop + Spark), Athena (interactive analytics), Kinesis Analytics (real-time), Amazon Elasticsearch Service (operational analytics)
Databases: RDS (MySQL, PostgreSQL, MariaDB, Oracle, SQL Server), Aurora (MySQL, PostgreSQL), DynamoDB (key value, document), ElastiCache (Redis, Memcached), Neptune (graph), Timestream (time series), QLDB (ledger database), RDS on VMware
Business Intelligence & Machine Learning: Amazon QuickSight, Amazon SageMaker, Amazon Comprehend, Amazon Rekognition, Amazon Lex, Amazon Transcribe, AWS DeepLens
Blockchain: Managed Blockchain, Blockchain Templates
Data Lake: S3/Amazon Glacier, AWS Glue (ETL & Data Catalog), Lake Formation (data lakes)
Data Movement: Database Migration Service | Snowball | Snowmobile | Kinesis Data Firehose | Kinesis Data Streams | Data Pipeline | Direct Connect
AWS Marketplace: 250+ solutions; 730+ database solutions; 600+ analytics solutions; 25+ blockchain solutions; 20+ data lake solutions; 30+ solutions
39.
40.
ENTERPRISE INFORMATION MANAGEMENT @ VOO
Modern Data Platform on AWS: 13:15 – 14:05
41.
AGENDA
VOO & Micropole Belgium
The problem of managing the Enterprise Information
Why Public Cloud? Why AWS?
Architecture and AWS services used
The results
Next steps
42.
VOO, PART OF NETHYS GROUP
[Organization chart, labels translated from French]
Gas & Electricity
Energy
Invest
Telecoms/Media
Holdings
Quadruple Play
Renewable energies
Financial holdings
Management of electricity and gas distribution networks
ICT services
B2B
Pay television
Regional daily newspaper
Magazine
News & TV
43.
VOO PRESENCE IN WALLONIA AND BRUSSELS
44.
VOO – QUADRUPLE PLAY OFFERING
- ~20 analogue TV channels
- ~150 digital TV channels (SD, HD, 3D, 4K)
- VOOcorder & .évasion (PVR)
- Digital TV card
- VOD
- Be tv
- VOOmotion & Be tv Go, available for PC, tablet and smartphones
- Internet up to 400 Mbps
- “unlimited” Packs
- Wi-Fi modems compliant with the “ac” standard
- Wi-Fi homespots (Wifree)
- Fixed telephony
- VOOmobile via an MVNO agreement
Television and Internet, a simple offer, tailored to your needs
The no-frills essential experience!
A generous offer at attractive rates
The All-in Pack that makes your life easier
The Pack that is as mobile as you are
45.
MICROPOLE BELGIUM, PART OF MICROPOLE GROUP
1250 business consultants & engineers around the world
30 years of expertise in advanced analytics and BI
A team of 12 certified AWS experts
3 years of AWS partnership
We will soon become an Advanced AWS partner
Data Intelligence & Performance
Data Governance & Architecture
Machine Learning
Blockchain
Advanced Analytics in the Cloud
46.
HISTORY OF BI & BIG DATA AT VOO
[Diagram labels: OSS, BSS]
Up to 2014
Typical legacy environment with multiple dedicated, siloed DWH/reporting environments
Source systems and data management environments all hosted on a private cloud
Capacity upgrades and performance tuning slow, but more or less manageable
During 2014 - 2015 - 2016
Launch of mobile services and exploitation of network and set-top box data to improve customer experience and usage-based campaign management
Consequences
Explosion of storage requirements
Performance issues
Lack of adequate tools to manage « big data »
In addition: reporting/analytical environments fragmented and unsecured
47.
GENESIS OF THE MEMENTO PROJECT
April-May 2017 - definition: detailed study with Micropole to define a business and technical architecture, a future organization and data governance model, and an implementation roadmap
June 2017 - implementation: kickoff of the Memento project with 2 project tracks
1. Implementation of an “ARCHITECTURE” allowing the “unification of DATA” coming from different sources (internal/external) to provide the company with reliable “INFORMATION” in real time (reporting/analysis)
2. Setup of an “IT and Business ORGANISATION + an Operational GOVERNANCE” strengthening a culture that supports the definition and implementation of a “data strategy”
48.
Memento project objectives
Improve Customer Experience
Subscriber acquisition
Optimise operations
Increase the level of profitability
Increase customer retention
GDPR compliance
Provide decision support data for the 6 strategic axes of VOO
49.
WHY CHOOSE A PUBLIC CLOUD?
No upfront investments, no need for a detailed capacity plan, no long delays to order and install hardware
Ultra-fast installation and configuration of the different solution components
« Pay as you go » / « pay as you use » principle (capacity extensions only when required and payments accordingly)
Elasticity, resilience and high availability
Use of Managed Services offered in the Cloud drastically reduces license and maintenance costs
50.
WHY AWS?
Market leader in public cloud (source: Gartner)
Broadest offering in PaaS
AWS has the most complete offering in managed services compared to Azure or Google Cloud Platform
Managed service definition (example for a database): it does not require installation, maintenance, patching, or licensing. Furthermore, backups are managed, and high availability is integrated and requires only minimal configuration.
This is a tremendous accelerator and a considerable cost saving compared to traditional software. Examples: S3, Redshift, EMR, DynamoDB etc.
Infrastructure administration is reduced to a minimum and is done inside the EIM team
51.
MEMENTO ARCHITECTURE
[Architecture diagram with four layers: SOURCES, STAGE, INTEGRATE, EXPOSE]
Sources: CRM 4, CRM 7, ISU, ACBIS, Numbers, Effortel, FAST, SERAM, Jira, ALLOT, …; every SAP transactional system; BW; Telenet; social media; web logs; sensor data
Extract SAP data: simplified flows
Stage (raw/native format): Data Lake
Process (batch & real-time): Big Data Cluster; optional: filter or aggregate
Store & expose: Enterprise Data Warehouse (primary access point); NoSQL (high-frequency queries, detailed data); Controlled Sandbox
Archive
Consumers: Reporting, Analysis, Visualisation; predictive tools (e.g. churn prediction); power users and data scientists, exploration (sandbox mode); call center application
Flow types: batch, real time, NRT (mini batches)
52.
MEMENTO IMPLEMENTATION ON AWS
53.
RESULTS
A unified data platform
That enables any analytical use case
Goes beyond analytics usage
Shortens development cycles and answers business requirements faster
Scales without boundaries, and you pay only for what you use
Well-defined architecture, structures and standards allow parallel work by a large(r) team and embrace agile
Allows business teams to focus on data analysis (and not on data crunching)
GDPR compliance (privacy by design)
54.
NEXT STEPS
Finalise the integration of all existing enterprise data sources
Finalise remaining use cases and decommission the Legacy environment
Integrate all new data sources (from new projects and products)
Extend the usage to other companies within the Nethys Group
Respond in an agile way to more challenging and complex business use cases, including the use of Machine Learning
55.
THANKS FOR YOUR ATTENTION
56.
57.
Demo Overview
https://aws.amazon.com/blogs/big-data/harmonize-query-and-visualize-data-from-various-providers-using-aws-glue-amazon-athena-and-amazon-quicksight/
58.
59.
Typical steps of building a data lake
Step 1: Set up storage
Step 2: Move data
Step 3: Cleanse, prep, and catalog data
Step 4: Configure and enforce security and compliance policies
Step 5: Make data available for analytics
60.
Building data lakes can still take months
61.
AWS Lake Formation (join the preview)
Build, secure, and manage a data lake in days
Build a data lake in days, not months
Build and deploy a fully managed data lake with a few clicks
Enforce security policies across multiple services
Centrally define security, governance, and auditing policies in one place and enforce those policies for all users and all applications
Combine different analytics approaches
Empower analyst and data scientist productivity, giving them self-service discovery and safe access to all data from a single catalog
62.
How it works: AWS Lake Formation
[Diagram labels. Sources: OLTP, ERP, CRM, LOB, devices, web, sensors, social, Kinesis. Lake Formation: S3, IAM, KMS, Data Catalog. Consumers: Athena, Amazon Redshift, AI services, Amazon EMR, Amazon QuickSight]
Build Data Lakes quickly
• Identify, crawl, and catalog sources
• Ingest and clean data
• Transform into optimal formats
Simplify security management
• Enforce encryption
• Define access policies
• Implement audit logging
Enable self-service and combined analytics
• Analysts discover all data available for analysis from a single data catalog
• Use multiple analytics tools over the same data
63.
Customer interest in AWS Lake Formation
“We are very excited about the launch of AWS Lake Formation, which provides a central point of control to easily load, clean, secure, and catalog data from thousands of clients to our AWS-based data lake, dramatically reducing our operational load. … Additionally, AWS Lake Formation will be HIPAA compliant from day one …”
- Aaron Symanski, CTO, Change Healthcare
“I can’t wait for my team to get our hands on AWS Lake Formation. With an enterprise-ready option like Lake Formation, we will be able to spend more time deriving value from our data rather than doing the heavy lifting involved in manually setting up and managing our data lake.”
- Joshua Couch, VP Engineering, Fender Digital
64.
65.
Thank you!
Javier Ramirez
@supercoco9
66.
Select AWS Glue customers
67.
68.
Demo Overview
https://aws.amazon.com/blogs/big-data/harmonize-query-and-visualize-data-from-various-providers-using-aws-glue-amazon-athena-and-amazon-quicksight/