© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
Deriving Value with Next Gen Analytics and ML Architectures
Rahul Pathak, GM Big Data & Data Lakes
March 19, 2019
Fortnite
125+ million players
Data provides a constant feedback loop for game designers
Up-to-the-minute analysis of gamer satisfaction to drive gamer engagement
Resulting in the most popular game played in the world
30 PB+ data lake in S3, growing by 2 PB every month
Customers want more value from their data
Growing exponentially
From new sources
Increasingly diverse
Used by many people
Analyzed by many applications
Cloud data lakes are the future
Data Lake
Customers want:
To move to a single store; i.e., a data lake in the cloud
To store data securely in standard formats
To grow to any scale, with low costs
To analyze their data in a variety of ways
To democratize data access and analysis
Why choose AWS for data lakes and analytics?
Most comprehensive
Most secure
Easiest to build
Most cost-effective
Most customers & partners
Most comprehensive
Broadest and deepest portfolio, purpose-built for builders
Data Movement: Migration & Streaming Services
Data Lake Infrastructure & Management: Infrastructure | Data Catalog & ETL | Security & Management
Analytics: Data Warehousing | Big Data Processing | Interactive Query | Operational Analytics | Real-time Analytics | Serverless Data Processing
Visualization & Machine Learning: Dashboards | Predictive Analytics
Most comprehensive
Broadest and deepest portfolio, purpose-built for builders
Data Movement: Database Migration Service | Snowball | Snowmobile | Kinesis Data Firehose | Kinesis Data Streams | Managed Streaming for Kafka
Data Lake Infrastructure & Management: S3/Glacier | Glue | Lake Formation
Analytics: Redshift | EMR (Spark & Hadoop) | Athena | Elasticsearch Service | Kinesis Data Analytics | Glue (Spark & Python)
Visualization & Machine Learning: QuickSight | SageMaker | Comprehend | Lex | Polly | Rekognition | Translate | Transcribe | Deep Learning AMIs | + 10 more
The Amazon ML stack
Broadest & deepest set of capabilities
AI Services
Vision: Rekognition Image | Rekognition Video | Textract
Speech: Polly | Transcribe
Language: Translate | Comprehend & Comprehend Medical
Chatbots: Lex
Forecasting: Forecast
Recommendations: Personalize
ML Services
Amazon SageMaker (Build, Train, Deploy): Pre-built algorithms & notebooks | Data labeling (Ground Truth) | One-click model training & tuning | Reinforcement learning | Optimization (Neo) | One-click deployment & hosting | Algorithms & models (AWS Marketplace for ML)
ML Frameworks & Infrastructure
Frameworks | Interfaces
Infrastructure: EC2 P3 & P3dn | EC2 C5 | FPGAs | Greengrass | Elastic Inference
Most secure
Services for security and governance
Compliance: AWS Artifact | Amazon Inspector | AWS CloudHSM | Amazon Cognito | AWS CloudTrail
Security: Amazon GuardDuty | AWS Shield | AWS WAF | Amazon Macie | Amazon VPC
Encryption: AWS Certificate Manager | AWS Key Management Service | Encryption at rest | Encryption in transit | Bring your own keys, HSM support
Identity: AWS IAM | AWS SSO | Amazon Cloud Directory | AWS Directory Service | AWS Organizations
Customers need multiple levels of security, identity and access management, encryption, and compliance to secure their data lake
Most secure: certifications
Global
CSA: Cloud Security Alliance Controls
ISO 9001: Global Quality Standard
ISO 27001: Security Management Controls
ISO 27017: Cloud-Specific Controls
ISO 27018: Personal Data Protection
PCI DSS Level 1: Payment Card Standards
SOC 1: Audit Controls Report
SOC 2: Security, Availability, & Confidentiality Report
SOC 3: General Controls Report
United States
CJIS: Criminal Justice Information Services
DoD SRG: DoD Data Processing
FedRAMP: Government Data Standards
FERPA: Educational Privacy Act
FIPS: Government Security Standards
FISMA: Federal Information Security Management
GxP: Quality Guidelines and Regulations
FFIEC: Financial Institutions Regulation
HIPAA: Protected Health Information
ITAR: International Traffic in Arms Regulations
MPAA: Protected Media Content
NIST: National Institute of Standards and Technology
SEC Rule 17a-4(f): Financial Data Standards
VPAT/Section 508: Accessibility Standards
Asia Pacific
FISC [Japan]: Financial Industry Information Systems
IRAP [Australia]: Australian Security Standards
K-ISMS [Korea]: Korean Information Security
MTCS Tier 3 [Singapore]: Multi-Tier Cloud Security Standard
My Number Act [Japan]: Personal Information Protection
Europe
C5 [Germany]: Operational Security Attestation
Cyber Essentials Plus [UK]: Cyber Threat Protection
G-Cloud [UK]: UK Government Standards
IT-Grundschutz [Germany]: Baseline Protection Methodology
Most cost-effective
Decouple compute and storage; choice of pay-as-you-go (PAYG) analytics services
Storage: S3 storage tiers & intelligent tiering, from $0.023 per GB/month to as low as $0.004 per GB/month
Compute: Spot & Reserved Instances save up to 90% off On-Demand prices
EMR: autoscaling; 57% less than on-premises, per an IDC report
Redshift: less than a tenth of the cost of traditional solutions
Athena & QuickSight: serverless; pay only for what is used
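The storage figures above can be turned into a rough monthly estimate. A minimal sketch, assuming flat per-GB prices equal to the two figures quoted on the slide (real S3 pricing varies by region and volume tier and adds request, retrieval, and transfer charges) and 1 TB = 1,024 GB. The on-demand hourly price passed to the discount helper is a hypothetical input, and 90% is the slide's best case rather than a typical Spot discount.

```python
from decimal import Decimal

# Per-GB-month prices quoted on the slide (2019); flat rates are a simplifying assumption.
PRICE_PER_GB_MONTH = {
    "highest_quoted": Decimal("0.023"),
    "lowest_quoted": Decimal("0.004"),
}
GB_PER_TB = 1024  # assumption: binary terabytes


def monthly_storage_cost(terabytes: int, tier: str) -> Decimal:
    """Flat-rate monthly storage cost in USD for `terabytes` of data at `tier`."""
    return Decimal(terabytes) * GB_PER_TB * PRICE_PER_GB_MONTH[tier]


def discounted_hourly_price(on_demand_hourly: Decimal, discount: Decimal) -> Decimal:
    """Hourly price after a fractional discount, e.g. Decimal("0.90") for 90% off."""
    return on_demand_hourly * (1 - discount)


if __name__ == "__main__":
    for tier in PRICE_PER_GB_MONTH:
        print(tier, monthly_storage_cost(100, tier))
```

For 100 TB this gives about $2,355 per month at the top of the quoted range and about $410 at the bottom, which is the gap that tiering rarely read data is meant to exploit.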
More data lakes and analytics than anywhere else
More than 10,000 data lakes on AWS
Most partners to complement AWS offerings
Data movement solutions
Migration & Streaming Services
Data Movement
Most ways to move data to the data lake
Data movement from on-premises datacenters:
Dedicated network connection
Secure appliances
Ruggedized shipping containers
Database migration
Gateway that lets applications write to the cloud
Data movement from real-time sources:
Connect devices to AWS
Real-time data streams
Real-time video streams
Amazon S3 | Amazon Glacier | AWS Glue
Synchronizing data across environments
Professional services and partners to help migration
Data lake infrastructure & management solutions
Infrastructure | Data Catalog & ETL | Security & Management
Robust data lake infrastructure
Diagram components: S3 | Lake Formation & Glue | Snowball | Snowmobile | Kinesis Data Streams | Kinesis Data Firehose | Redshift | EMR | Athena | Kinesis | Elasticsearch Service | SageMaker | Comprehend | Rekognition
Durable and available; exabyte scale
Secure, compliant, auditable
Object-level controls for fine-grained access
Fast performance by retrieving subsets of data
Decoupling of compute and storage
On-demand resources, tiering, cost choices
Build on robust data lake infrastructure with Amazon S3
✔ 99.999999999% (11 nines) durability
✔ Global replication capabilities
✔ Management features
✔ Cost-effective storage classes
✔ Most partner integrations
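To make the durability figure concrete: if it is read as an independent annual survival probability for each object (the same reading behind AWS's own illustration that storing 10,000,000 objects means, on average, losing one object every 10,000 years), the expected losses follow directly. A minimal sketch under that assumption, using exact fractions to avoid floating-point rounding:

```python
from fractions import Fraction

# 99.999999999% ("11 nines") as an exact fraction.
ELEVEN_NINES = Fraction(99_999_999_999, 100_000_000_000)


def expected_losses_per_year(num_objects: int, durability: Fraction = ELEVEN_NINES) -> Fraction:
    """Expected number of objects lost per year, assuming independent per-object loss."""
    return num_objects * (1 - durability)


def years_per_expected_loss(num_objects: int, durability: Fraction = ELEVEN_NINES) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_losses_per_year(num_objects, durability)


if __name__ == "__main__":
    print(years_per_expected_loss(10_000_000))  # prints 10000
```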
“Zestimates are more up-to-
date and accurate, because
they’re built with the absolute
latest data. That’s a huge
benefit for our users, who
depend on this information
to influence their buying or
selling decisions.”
—Jasjeet Thind, Vice President of Data
Science and Engineering, Zillow Group
Set up a catalog, ETL, and data prep with AWS Glue
Serverless provisioning, configuration, and scaling to run your ETL jobs on Apache Spark
Pay only for the resources used for jobs
Crawl your data sources, identify data formats, and suggest schemas and transformations
Automates the effort of building, maintaining, and running ETL jobs
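The crawler step (identifying data formats and suggesting schemas) can be pictured as sampling records and deriving a column-to-type mapping. The sketch below is a hypothetical toy for newline-delimited JSON, not Glue's classifier logic: it uses Hive-style type names like those stored in the Glue Data Catalog, widens `bigint` to `double`, falls back to `string` on any other conflict, and simply treats nested values as strings.

```python
from __future__ import annotations

import json


def infer_type(value: object) -> str:
    """Map one JSON value to a Hive-style type name (toy rules)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "string"  # strings, and (in this toy) nested objects and arrays


def merge_types(old: str | None, new: str) -> str:
    """Combine the type seen so far with a newly observed type."""
    if old is None or old == "null":
        return new
    if new == "null" or new == old:
        return old
    if {old, new} == {"bigint", "double"}:
        return "double"  # widen integers to floating point
    return "string"  # any other conflict falls back to string


def infer_schema(json_lines: list[str]) -> dict[str, str]:
    """Infer a column -> type mapping from sample JSON-lines records."""
    schema: dict[str, str] = {}
    for line in json_lines:
        for column, value in json.loads(line).items():
            schema[column] = merge_types(schema.get(column), infer_type(value))
    return schema


if __name__ == "__main__":
    sample = [
        '{"player_id": 1, "score": 10, "region": "EU"}',
        '{"player_id": 2, "score": 12.5, "region": null}',
    ]
    print(infer_schema(sample))
```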
“Beeswax uses Amazon S3 and
AWS Glue Data Catalog to build a
highly reliable data lake that is
fully managed by AWS. Our
platform leverages the AWS Glue
Data Catalog integration with
Amazon EMR in Hive and
SparkSQL applications to deliver
reporting and optimization
features to our customers.”
—Ram Kumar Rengaswamy, CTO, Beeswax
Challenges to making a secure data lake
Typical steps of building a data lake:
1. Set up storage
2. Move data
3. Cleanse, prep, and catalog data
4. Configure and enforce security and compliance policies
5. Make data available for analytics
Build a secure data lake in days with AWS Lake Formation
Move, store, catalog, and clean your data faster with machine learning
Enforce security policies across multiple services
Empower analysts and data scientists to gain and manage new insights
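The idea behind enforcing security policies across multiple services is that permissions live in one place and every engine checks the same grants. The toy below illustrates only that idea; the class, function, engine, principal, and table names are hypothetical, and this is not the Lake Formation API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CentralGrants:
    """One grant store shared by all query engines (hypothetical)."""

    grants: dict[tuple[str, str, str], set[str]] = field(default_factory=dict)

    def grant(self, principal: str, database: str, table: str, actions: set[str]) -> None:
        self.grants.setdefault((principal, database, table), set()).update(actions)

    def is_allowed(self, principal: str, database: str, table: str, action: str) -> bool:
        return action in self.grants.get((principal, database, table), set())


def read_table(engine: str, grants: CentralGrants, principal: str, database: str, table: str) -> str:
    """Every engine runs the same check before reading; no per-engine policy copies."""
    if not grants.is_allowed(principal, database, table, "SELECT"):
        raise PermissionError(f"{engine}: {principal} may not read {database}.{table}")
    return f"{engine} read {database}.{table} for {principal}"


if __name__ == "__main__":
    grants = CentralGrants()
    grants.grant("analyst", "sales", "orders", {"SELECT"})
    for engine in ("sql-engine", "spark-cluster"):
        print(read_table(engine, grants, "analyst", "sales", "orders"))
```

Granting once and checking everywhere is what removes the need to keep separate per-service policies in sync.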
“With an enterprise-ready
option like Lake Formation,
we will be able to spend more
time deriving value from our
data rather than doing the
heavy lifting involved
in manually setting up and
managing our data lake.”
—Joshua Couch, VP Engineering
at Fender Digital
Analytics solutions
Data Warehousing | Big Data Processing | Interactive Query | Operational Analytics | Real-time Analytics | Serverless Data Processing
Big data processing with Apache Spark & Hadoop on Amazon EMR
Enterprise-grade | Easy | Lowest cost
Easy-to-use notebooks
Low cost vs. on-premises
Elastic autoscaling
Reliable: 99.9% SLA
Secure with encryption and keys
Flexible, open-source choice
FINRA’s legacy system did not scale to handle 130 billion events per day. They needed to run complex surveillance queries over 40+ PB of data.
FINRA migrated their big data appliance to an S3 data lake and uses EMR for ingestion and processing.
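For scale, the daily event count averages out to roughly 1.5 million events per second, assuming events are spread evenly over the day (peaks would be higher):

```latex
\frac{130 \times 10^{9}\ \text{events/day}}{86{,}400\ \text{s/day}} \approx 1.5 \times 10^{6}\ \text{events/s}
```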
Data warehouse for business reporting with Amazon Redshift
Fast: up to 10x faster than traditional data warehouses
Easy to set up, deploy, and manage
Cost-effective
Scales on demand for large data volumes and high query concurrency
Query data in open formats directly from the data lake
“20 percent of our queries now
complete in less than one
second. Best of all, we didn’t
have to change anything to
get this speed-up with
Redshift, which supports our
mission-critical workloads.”
—Greg Rokita, Executive Director
of Technology, Edmunds
Challenge
Needed to analyze data to find insights, identify opportunities, and evaluate business performance
The Oracle data warehouse did not scale, was difficult to maintain, and was costly
Solution
Deployed a data lake with Amazon S3 and ran analytics with Amazon Redshift, Amazon Redshift Spectrum, and Amazon EMR
Result: doubled the data stored (100 PB), lowered costs, and gained insights faster
50 PB of data
600,000 analytics jobs/day
Migrated from an on-premises data warehouse
Built a data warehouse with Redshift and a data lake with S3
Analytics on the data lake with Amazon Athena, Amazon Redshift Spectrum, and Amazon EMR
Report delivery went from months to days
Real-time analytics for timely insights with Amazon Kinesis
Make streaming data available to multiple real-time analytics applications
Run streaming applications without managing any infrastructure
Durable, to reduce the probability of data loss
Scalable to process data from hundreds of thousands of sources with low latency
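Scaling to many sources rests on spreading records across shards. In Kinesis Data Streams, each record's partition key is MD5-hashed to a 128-bit integer, and the record goes to the shard whose hash-key range contains that value. The sketch below computes that routing locally without calling AWS. It assumes N shards with evenly split ranges (roughly what a newly created stream gets; resharding can leave uneven ranges) and reads the digest as a big-endian unsigned integer.

```python
from __future__ import annotations

import hashlib

MAX_HASH_KEY = 2**128 - 1


def hash_key(partition_key: str) -> int:
    """MD5 of the partition key, read as a 128-bit unsigned big-endian integer."""
    return int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")


def even_shard_ranges(num_shards: int) -> list[tuple[int, int]]:
    """Split [0, 2**128 - 1] into contiguous, inclusive ranges, one per shard."""
    step = (MAX_HASH_KEY + 1) // num_shards
    ranges = []
    for i in range(num_shards):
        start = i * step
        end = MAX_HASH_KEY if i == num_shards - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def shard_for(partition_key: str, ranges: list[tuple[int, int]]) -> int:
    """Index of the shard whose hash-key range contains the key's hash."""
    h = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return index
    raise ValueError("hash key not covered by any shard range")


if __name__ == "__main__":
    ranges = even_shard_ranges(4)
    for key in ("device-1", "device-2", "device-3"):
        print(key, shard_for(key, ranges))
```

Because the mapping is deterministic, all records with the same partition key land on the same shard, which preserves per-key ordering; a high-cardinality key such as a device ID spreads load across shards.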
“Amazon Kinesis makes it simple
to scale our solution end to end,
including the capture, processing,
and delivery of actionable
insights. This empowers our
customers to better understand
their user base.”
— Indu Narayan, Director of Data, Yieldmo
Operational analytics for logs and search with Amazon Elasticsearch Service
Fully managed; deploy a production-ready cluster in minutes
Direct access to Elasticsearch open-source APIs, Logstash, and Kibana
VPC support; at-rest and in-transit encryption
Scale up and down easily
“Ultimately, we are improving our
software products and offering
better service to our customers
because of the real-time visibility
we’re getting into log data.”
“Amazon Elasticsearch Service
enables data forensic activities
to take place and help find and
fix application problems faster.”
—Tommy Li, Senior Software Architect,
Autodesk
Interactive analysis with Amazon Athena
Interactive query service to analyze data in Amazon S3 using standard SQL
No infrastructure to set up or manage, and no data to load
Ability to run SQL queries on data archived in Amazon Glacier (coming soon)
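Because Athena queries files in place, cost tracks the bytes each query scans. A back-of-envelope sketch, assuming on-demand pricing of $5 per TB scanned, with bytes rounded up to the nearest megabyte and a 10 MB minimum per query (check current pricing for your region); treating 1 TB as 2**40 bytes is also an assumption of this sketch.

```python
from decimal import Decimal

PRICE_PER_TB = Decimal("5")  # assumption: USD per TB scanned
MB = 2**20
TB = 2**40
MIN_BILLED_BYTES = 10 * MB   # assumption: 10 MB minimum per query


def billed_bytes(bytes_scanned: int) -> int:
    """Round up to a whole megabyte, then apply the per-query minimum."""
    rounded = -(-bytes_scanned // MB) * MB  # integer ceiling division
    return max(rounded, MIN_BILLED_BYTES)


def query_cost(bytes_scanned: int) -> Decimal:
    """Estimated USD cost of one query that scans `bytes_scanned` bytes."""
    return Decimal(billed_bytes(bytes_scanned)) / TB * PRICE_PER_TB


if __name__ == "__main__":
    print(query_cost(TB))          # a full terabyte scanned
    print(query_cost(50 * 2**30))  # 50 GB scanned
```

Since cost follows bytes scanned rather than rows returned, compressed, columnar, partitioned data (for example, Parquet partitioned by date) lowers it directly, because each query reads fewer bytes.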
“One of the big attractions of
Amazon Athena is that it’s
serverless and purely
consumption-based.”
—Matt Chesler, director of DevOps
at Movable Ink
“We only pay when we’re actually querying the
data, and we don’t have to keep a cluster
running all the time. Using Amazon Athena,
we’re able to query seven years’ worth of
data—adding up to hundreds of terabytes—
get results at least 50 percent faster, and
save nearly $15,000 per month.”
Serverless analytics
Deliver on-demand analytics on the data lake
Diagram components: Devices, Web, Sensors, Social, AWS IoT | S3 data lake | Glue (ETL & Data Catalog) | Athena | QuickSight | AI/ML
Serverless: zero infrastructure, zero administration
Never pay for idle resources
Availability and fault tolerance built in
Automatically scales resources with usage
Visualization & machine learning solutions
Dashboards | Predictive Analytics
Visual insights for everyone with Amazon QuickSight
Pay only for what you use
Scale to tens of thousands of users
Embedded analytics
Build end-to-end BI solutions
Priced to allow access to everyone
Authors: create and publish dashboards; $18/user/month, billed annually
Readers: secure access to dashboards anytime, anywhere; $0.30/session, up to a maximum of $5/user/month
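Reader pricing is per session with a monthly cap, so each reader costs min(sessions × $0.30, $5). A minimal sketch of a monthly bill using the slide's 2019 prices, computed in integer cents to avoid floating-point rounding. It assumes authors are billed at the annual-commitment rate and ignores any other charges.

```python
from __future__ import annotations

AUTHOR_CENTS_PER_MONTH = 1800     # $18/user/month, billed annually
READER_CENTS_PER_SESSION = 30     # $0.30/session
READER_CAP_CENTS_PER_MONTH = 500  # capped at $5/user/month


def reader_cents(sessions: int) -> int:
    """One reader's monthly charge: per session, up to the cap."""
    return min(sessions * READER_CENTS_PER_SESSION, READER_CAP_CENTS_PER_MONTH)


def monthly_bill_cents(num_authors: int, sessions_per_reader: list[int]) -> int:
    """Total monthly charge for a team of authors and readers."""
    readers = sum(reader_cents(s) for s in sessions_per_reader)
    return num_authors * AUTHOR_CENTS_PER_MONTH + readers


if __name__ == "__main__":
    cents = monthly_bill_cents(2, [0, 10, 40])
    print(f"${cents / 100:.2f}")  # prints $44.00
```

The cap makes heavy readers predictable: 16 sessions cost $4.80, and from the 17th session in a month the charge stays at $5.00. A reader who opens no dashboards in a month costs nothing.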
Introducing ML Insights
ML Anomaly Detection
ML Forecasting
Auto Narratives
Customer use case: embedding
“Amazon QuickSight will allow us to quickly build fast, interactive dashboards that will seamlessly integrate with our Next Gen Stats applications. With the Amazon QuickSight Readers and pay-per-session pricing, we are able to extend these secure, customized and easy to use dashboards for each Club without having to provision servers or manage infrastructure – all while only paying for actual usage. We love the direction, and look forward to expanding use of Amazon QuickSight.”
Matt Swensson, VP Emerging Products, NFL
Use case: 500+ users (NFL teams, broadcasters, internal research team)
Previous tools: custom-built web application
Auth: SAML-based SSO
With over 20,000 Rio Tinto CRM
users globally, QuickSight is
providing an interactive solution
to explore thousands of data
points quickly and to ensure
safety in every decision
Visual insights for everyone with AWS ML & AI services
Frameworks and interfaces for machine learning practitioners
Platform services that make it easy for any developer to get started and get deep with ML
Application services that enable developers to plug pre-built AI functionality into their apps
1. Raw data is stored in Amazon S3.
2. Initial training data is annotated by human labelers.
3. An active learning model is trained from the human-labeled data.
4. Training data the model understands is labeled automatically.
5. Ambiguous data is sent to human labelers for annotation.
6. Human-labeled data is then sent back to retrain and improve the machine learning model.
7. An accurate training data set is ready for use in Amazon SageMaker.
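The labeling loop above (confident predictions are labeled automatically, ambiguous items go to humans, and human labels retrain the model) can be sketched in a few lines. Everything here is a hypothetical stand-in: `LookupModel` is a toy that is confident only about items it was trained on, `human_label` represents the labeling workforce, and the 0.9 threshold is arbitrary. This is not the SageMaker Ground Truth API.

```python
from __future__ import annotations

from typing import Callable, Iterable


class LookupModel:
    """Toy stand-in for a model: confident only about items it has seen in training."""

    def __init__(self) -> None:
        self.known: dict[str, str] = {}

    def train(self, labeled: dict[str, str]) -> None:
        self.known.update(labeled)

    def predict(self, item: str) -> tuple[str, float]:
        if item in self.known:
            return self.known[item], 1.0
        return "unknown", 0.0


def labeling_round(
    items: Iterable[str],
    model: LookupModel,
    human_label: Callable[[str], str],
    threshold: float = 0.9,
) -> tuple[dict[str, str], dict[str, str]]:
    """One pass: auto-label confident items, send the rest to humans, then retrain."""
    auto: dict[str, str] = {}
    human: dict[str, str] = {}
    for item in items:
        label, confidence = model.predict(item)
        if confidence >= threshold:
            auto[item] = label  # data the model understands is labeled automatically
        else:
            human[item] = human_label(item)  # ambiguous data goes to human labelers
    model.train(human)  # human labels are fed back to improve the model
    return auto, human


def pretend_human(name: str) -> str:
    """Hypothetical human labeler: labels a file by its base name."""
    return name.split(".")[0]


if __name__ == "__main__":
    model = LookupModel()
    model.train({"cat.jpg": "cat"})  # initial human-annotated training data
    print(labeling_round(["cat.jpg", "dog.jpg"], model, pretend_human))
    print(labeling_round(["dog.jpg"], model, pretend_human))
```

In the second round, the item that needed a human the first time is labeled automatically, which is how the share of human work shrinks as the model improves.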
Using Amazon Translate,
Lionbridge is able to scale
machine translation in order
to localize content faster and
in more languages.
Using Translate, Lionbridge
was able to reduce translation
costs by 20 percent.
Near real-time fraud detection in TurboTax: detecting account takeover and identity theft
Develop machine learning models that not only detect fraud offline but also enable the product to block it online
Closing Thoughts…
• Data is growing 10x every 5 years; plan for scale and plan for change
• Use open data formats to maximize your technical agility
• Clean, well-governed data is the foundation for machine learning
• AWS provides composable services that make it easy to build data-driven applications that drive business value
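The first point implies a steep compound rate. Treating 10x in 5 years as steady annual growth at rate r:

```latex
(1 + r)^{5} = 10
\;\Rightarrow\; r = 10^{1/5} - 1 \approx 0.585,
\qquad
t_{2} = \frac{5 \ln 2}{\ln 10} \approx 1.5\ \text{years}
```

That is about 58.5% growth per year, or a doubling roughly every year and a half.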
And, finally…
Do your taxes by 4/15!
(and no cheating if you’re using TurboTax)
Thank you!
Rahul Pathak
rapathak@amazon.com