The Power of Data Orchestration:
Storage Acceleration and Servitization at Shopee
Tianbao Ding
Haoning Sun
Private &
Confidential
1 Storage Status
2 Storage Acceleration
3 Storage Servitization
4 Future Plan
Storage Status—Architecture
Product: App (Search, Recommendation, etc.)
Platform: Data Management Platform (DMP)
Compute Engine: Spark, Flink, Presto
Resource Scheduler: Yarn
Storage: HDFS, Ozone
Storage Status—HDFS
Metric | Value
Number of nodes | Thousands
Storage capacity | Hundreds of PB
Number of files | Billions
Max QPS | Hundreds of thousands
Storage Status—Presto
Metric | Value
Number of nodes | Thousands of instances
TP90 | About 2 min
Input | Dozens of PB per day
Number of queries | Hundreds of thousands per day
HDFS: unstable performance
Presto: unstable query performance
Q: Can queries run faster?
Storage Status—Presto Accelerate Query
Before: Presto → HDFS
After (add cache): Presto → Alluxio → HDFS
Storage Status—Presto Accelerate Query
Storage Status—Alluxio+Presto Typical Architecture
• Mount HDFS into Alluxio
• Presto accesses HDFS via Alluxio
• Alluxio manages the cache
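Mounting HDFS into Alluxio means every Alluxio path resolves to an HDFS path through the mount table. The sketch below shows only the longest-prefix lookup behind that resolution. The mount points and the nameservices `ns1`/`ns2` are hypothetical, and real Alluxio resolution also handles nested mounts, permissions and metadata sync.

```python
from typing import Dict


def resolve_ufs_path(alluxio_path: str, mount_table: Dict[str, str]) -> str:
    """Translate an Alluxio path to the HDFS (UFS) path of its longest matching mount point."""
    # Normalize trailing slashes; the root mount "/" becomes the empty prefix "".
    mounts = {mp.rstrip("/"): ufs.rstrip("/") for mp, ufs in mount_table.items()}
    best = None
    for prefix in mounts:
        # Match whole path components only: "/hive/warehouse" covers
        # "/hive/warehouse/db" but not "/hive/warehouse2".
        covers = alluxio_path == prefix or alluxio_path.startswith(prefix + "/")
        if covers and (best is None or len(prefix) > len(best)):
            best = prefix
    if best is None:
        raise ValueError(f"no mount point covers {alluxio_path}")
    # Replace the mount-point prefix with the mounted HDFS location.
    return mounts[best] + alluxio_path[len(best):]
```

Suppose `/` is mounted to `hdfs://ns1/` and `/hive/warehouse` to `hdfs://ns2/user/hive/warehouse`. Then `/hive/warehouse/db/t/part-0` resolves to the second nameservice, and any other path falls back to the root mount.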
Storage Status—Shortcomings
• Specific caching policies are needed
• The first read from Alluxio is slow (the data is not cached yet)
1 Storage Status
2 Storage Acceleration
3 Storage Servitization
4 Future Plan
Storage Acceleration—Solution
Storage Acceleration—Architecture
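Components: Kafka (HDFS audit log), HMS, Hot Table, Operator, Cache Manager, Alluxio, HDFS, Computing Application
• Kafka → Cache Manager: path delete/create events
• Hot Table → Cache Manager: update policy
• Operator → Cache Manager: load table
• Cache Manager → Alluxio: load/unload/mount
• HDFS → Alluxio: load data
• Cache Manager → HMS: set/clear partition property
• Computing Application → HMS: get tag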
Presto query log → Hot Table (Hive table)
• Partitioned by date
• Counts the number of visits to each table every day
Storage Acceleration—Hot Table
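As a rough illustration of the Hot Table computation, the sketch below counts how many queries touched each table on each date. The log record shape is an assumption: a date plus the list of tables the query read. The actual Presto query log schema is not shown in the slides.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical query-log record: (query date "YYYY-MM-DD", tables the query read).
QueryRecord = Tuple[str, List[str]]


def daily_table_visits(records: Iterable[QueryRecord]) -> Dict[str, Counter]:
    """Count visits per table per day; each date corresponds to one Hot Table partition."""
    visits: Dict[str, Counter] = defaultdict(Counter)
    for query_date, tables in records:
        # A table referenced several times in one query counts as one visit.
        visits[query_date].update(set(tables))
    return dict(visits)
```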
• Scheduled every day
Hot Table → select the most frequently visited tables (weighted) in the last seven days → take the most recent m partitions of each table → load them from HDFS to Alluxio → persist the relationship → set the tag in HMS
Note: This is an alpha version; it will be optimized continuously in subsequent iterations.
Storage Acceleration—Update Policy
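The daily update policy can be sketched as follows. The slides do not say how visits are weighted. The linearly decaying weight used here (today counts 7, six days ago counts 1) and the cut-off `k` are illustrative assumptions. `m` is the number of recent partitions per table, as on the slide, and partitions are assumed to be named like `dt=YYYY-MM-DD`.

```python
from datetime import date, timedelta
from typing import Dict, List


def select_hot_tables(daily_visits: Dict[date, Dict[str, int]], today: date, k: int) -> List[str]:
    """Rank tables by weighted visits over the last seven days and keep the top k."""
    scores: Dict[str, int] = {}
    for age in range(7):  # age 0 = today, age 6 = six days ago
        day = today - timedelta(days=age)
        weight = 7 - age  # assumption: more recent visits weigh more
        for table, visits in daily_visits.get(day, {}).items():
            scores[table] = scores.get(table, 0) + weight * visits
    # Highest score first; ties broken by table name for a deterministic order.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


def recent_partitions(partitions: List[str], m: int) -> List[str]:
    """Pick the m most recent partitions; 'dt=YYYY-MM-DD' sorts lexicographically by date."""
    return sorted(partitions, reverse=True)[:m]
```

Each selected partition would then go through the remaining steps on the slide: load from HDFS to Alluxio, persist the relationship, and set the tag in HMS.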
HDFS / Alluxio / HMS / Presto
• Tag present in HMS: Presto reads the data on Alluxio
• No tag: Presto reads the data from HDFS
• Key: cache; value: ${DC}/Alluxio/ebj@${Alluxio_nameservice}
• If the partition exists, set the property in the partition properties
• Otherwise, set the property in the table properties
Storage Acceleration—HMS Tag
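The tagging rule above can be expressed as a small sketch. HMS is modeled here as plain dictionaries of table and partition properties, and all helper names are hypothetical. A real implementation would go through a Hive Metastore client. `read_source` illustrates one reading of the slide: tagged data is read through Alluxio, and untagged data is read from HDFS.

```python
from typing import Dict, Optional

CACHE_KEY = "cache"  # property key from the slide
Props = Dict[str, str]


def cache_tag_value(dc: str, alluxio_nameservice: str) -> str:
    """Fill the slide's template ${DC}/Alluxio/ebj@${Alluxio_nameservice}."""
    return f"{dc}/Alluxio/ebj@{alluxio_nameservice}"


def _target(table_props: Props, partition_props: Optional[Props]) -> Props:
    # Partition properties are used when the partition exists, table properties otherwise.
    return partition_props if partition_props is not None else table_props


def set_cache_tag(table_props: Props, partition_props: Optional[Props], value: str) -> None:
    _target(table_props, partition_props)[CACHE_KEY] = value


def clear_cache_tag(table_props: Props, partition_props: Optional[Props]) -> None:
    _target(table_props, partition_props).pop(CACHE_KEY, None)


def read_source(table_props: Props, partition_props: Optional[Props]) -> str:
    """Hypothetical routing: 'alluxio' when the cache tag is present, else 'hdfs'."""
    return "alluxio" if CACHE_KEY in _target(table_props, partition_props) else "hdfs"
```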
Example:
Storage Acceleration—HMS Tag
HDFS audit log → Flink (filter) → Kafka topic
• Audit log format: cmd=xx src=xx dst=xx
Storage Acceleration—Kafka
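The filtering step can be sketched in plain Python; the production pipeline runs this in Flink, which is not shown here. The sketch makes two assumptions. First, each audit line carries tab-separated key=value fields, as in the standard HDFS audit log. Second, only the create and delete commands matter to the Cache Manager. The exact command set used at Shopee is not stated.

```python
import re
from typing import Dict, Optional

# Assumption: only the "path delete/create" events from the architecture slide are needed.
WANTED_CMDS = {"create", "delete"}

# key=value pairs; each value runs until the next tab (audit fields are tab-separated).
_FIELD = re.compile(r"(\w+)=([^\t]*)")


def parse_audit_line(line: str) -> Dict[str, str]:
    """Extract the key=value fields (allowed, ugi, ip, cmd, src, dst, ...) from one audit line."""
    return dict(_FIELD.findall(line.rstrip("\n")))


def to_event(line: str) -> Optional[Dict[str, str]]:
    """Keep only permitted create/delete operations, reduced to the cmd/src/dst fields."""
    fields = parse_audit_line(line)
    if fields.get("allowed") == "false" or fields.get("cmd") not in WANTED_CMDS:
        return None
    return {"cmd": fields["cmd"], "src": fields.get("src", ""), "dst": fields.get("dst", "null")}
```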
PATH
• mount
• unmount
• load
• query
HIVE TABLE
• mount
• unmount
• load: load all the recorded paths, or load the recent n partitions
• query
ADMIN
• monitoring and operations
Storage Acceleration—REST API
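A client of such an API might build its requests as in the sketch below. Several details are hypothetical: the endpoint layout `/api/v1/<resource>/<action>`, the host name and the query parameter names. The slides list only the resources and their operations, and the ADMIN endpoints are left out.

```python
from urllib.parse import urlencode

# Operations per resource as listed on the slide (ADMIN monitoring is omitted here).
OPERATIONS = {
    "path": {"mount", "unmount", "load", "query"},
    "table": {"mount", "unmount", "load", "query"},
}


def build_request_url(base_url: str, resource: str, action: str, **params: str) -> str:
    """Build a URL for the hypothetical layout <base>/api/v1/<resource>/<action>?<params>."""
    if action not in OPERATIONS.get(resource, set()):
        raise ValueError(f"unsupported operation: {resource}/{action}")
    url = f"{base_url.rstrip('/')}/api/v1/{resource}/{action}"
    # urlencode percent-encodes values, so a path such as "/a/b" becomes "%2Fa%2Fb".
    return f"{url}?{urlencode(params)}" if params else url
```

For example, loading the three most recent partitions of a table could be expressed as `build_request_url("http://cache-manager:8080", "table", "load", table="db.t", recent_partitions="3")`.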
Storage Acceleration—Perf Effect
• 6 merged, 2 WIP, 1 fixed by Alluxio.
Hadoop 2.10
• Fix HdfsVersion miss hadoop 2.10 config (merged)
• Fix integration/yarn/pom.xml enforcer-plugin miss hadoop 2.10.x config (merged)
• Fix common.go miss hadoop 2.10 configuration (merged)
Command Line
• Improve shell command support ebj nameservice (merged)
• Fix for Alluxio.logs.dir (fixed by Alluxio)
Web Page
• Fix isMounted should not invoke ufs, if not /metrics page very slowly (merged)
• Fix FormatUtils.getSizeFromBytes method should supports EB (merged)
NameServices
• Fix unescape the ufs url of Alluxio fsadmin report metrics result (WIP)
Metrics
• Fix cache radio total not include cacheMisses (WIP)
Storage Acceleration—Community Contribution
1 Storage Status
2 Storage Acceleration
3 Storage Servitization
4 Future Plan
Storage Servitization—Status
▪ Most data is stored in HDFS
▪ Various programming languages are used
▪ HDFS has limited support for non-Java clients
Fuse for HDFS
▪ Deploy the Alluxio Fuse service on physical machines
▪ Deploy the Alluxio Fuse service on a Kubernetes cluster
S3 for HDFS
▪ Use the S3 API to access the Alluxio proxy service
Storage Servitization—Solutions
High-Level Architecture
▪ Kernel module
▪ User-level daemon
Storage Servitization—Fuse
WHAT IS IT
▪ Filesystem in Userspace (FUSE)
Requirements
▪ libfuse
Implementation
▪ JNR-Fuse
▪ JNI-Fuse
Deployment
▪ Standalone Fuse
▪ Fuse on workers
Limitations
▪ Random writes are not supported
Storage Servitization—Alluxio Fuse
Storage Servitization—Alluxio CSI
Fuse deployment modes
▪ On the nodeserver pod
▪ On a separate pod (new feature)
WHAT IS IT
▪ Container Storage Interface: a standard storage interface for containers
WHAT IS IT
▪ A Fuse sidecar container in each Pod mounts the Alluxio directory
Features
▪ Each Pod is configured independently, which gives high flexibility
▪ Each Pod runs its own Fuse container, so Pods do not affect each other
▪ Each Fuse process occupies a container, so this solution consumes more resources
Storage Servitization—K8s sidecar for Alluxio
Storage Servitization—Summary
Solution | Fuse on physical machine | K8s-csi: Fuse on nodeserver pod | K8s-csi: Fuse on separate pod | K8s-sidecar
Maintenance cost | high | low | higher | higher
Resource usage | low | lower | high | high
Independence | high | low | high | high
▪ Bucket: a container for objects stored in Amazon S3
▪ Object: the fundamental entity stored in Amazon S3
▪ Key: the unique identifier for an object within a bucket (also called the key name)
▪ Region: the AWS Region you choose to store the buckets you create
Storage Servitization—S3
▪ Alluxio can mount HDFS data
▪ Alluxio provides a proxy service
▪ The proxy supports the basic operations of the S3 API
▪ S3 SDKs are available for many programming languages
Storage Servitization—S3 for HDFS
Access HDFS data via Alluxio using the S3 protocol
▪ First-level directories map to buckets
▪ The remaining subdirectory and file path maps to the object key
Storage Servitization—Alluxio Proxy for S3 mapping
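The mapping can be sketched as a pair of conversion functions, plus the URL a path-style S3 request would use. The endpoint form `http://<proxy-host>:39999/api/v1/s3` follows the Alluxio S3 API documentation as we understand it. Treat the host, port and path prefix as assumptions to check against your Alluxio version.

```python
from typing import Tuple
from urllib.parse import quote


def path_to_bucket_key(alluxio_path: str) -> Tuple[str, str]:
    """'/warehouse/db/t/part-0' -> ('warehouse', 'db/t/part-0'): first directory is the bucket."""
    parts = alluxio_path.strip("/").split("/", 1)
    if not parts[0]:
        raise ValueError("the root directory is not a bucket")
    key = parts[1] if len(parts) > 1 else ""
    return parts[0], key


def bucket_key_to_path(bucket: str, key: str) -> str:
    """Inverse mapping: bucket plus key back to an Alluxio path."""
    return "/" + bucket + ("/" + key if key else "")


def object_url(proxy_endpoint: str, bucket: str, key: str) -> str:
    """Path-style object URL; the key is percent-encoded except for '/'."""
    return f"{proxy_endpoint.rstrip('/')}/{bucket}/{quote(key, safe='/')}"
```

With this mapping, any S3 SDK pointed at the proxy endpoint sees each first-level Alluxio directory as a bucket, which is what lets non-Java clients reach HDFS data.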
Storage Servitization—Proxy Authentication
▪ Authentication parser
▪ Validator
▪ Secret Manager
▪ Signature Calculation
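The four components can be illustrated with AWS Signature Version 4, the scheme current S3 SDKs use by default. The slide does not say whether the proxy validates V4, V2 or both, so this sketch covers only V4 and makes two simplifications. It takes the string-to-sign as given, omitting construction of the canonical request, and an in-memory dictionary stands in for the secret manager.

```python
import hashlib
import hmac
import re
from typing import Dict

# Authentication parser: SigV4 Authorization header layout.
_AUTH_RE = re.compile(
    r"AWS4-HMAC-SHA256 Credential=(?P<access_key>[^/]+)/(?P<date>\d{8})/(?P<region>[^/]+)/"
    r"(?P<service>[^/]+)/aws4_request,\s*SignedHeaders=(?P<signed_headers>[^,]+),\s*"
    r"Signature=(?P<signature>[0-9a-f]{64})"
)


def parse_authorization(header: str) -> Dict[str, str]:
    match = _AUTH_RE.fullmatch(header.strip())
    if match is None:
        raise ValueError("not a SigV4 Authorization header")
    return match.groupdict()


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sign(secret_key: str, date: str, region: str, service: str, string_to_sign: str) -> str:
    """Signature calculation: derive the SigV4 signing key, then HMAC the string-to-sign."""
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    k_signing = _hmac(k_service, "aws4_request")
    return hmac.new(k_signing, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


def validate(header: str, string_to_sign: str, secrets: Dict[str, str]) -> bool:
    """Validator: fetch the secret for the access key and compare signatures in constant time."""
    fields = parse_authorization(header)
    secret = secrets.get(fields["access_key"])  # stand-in for the secret manager
    if secret is None:
        return False
    expected = sign(secret, fields["date"], fields["region"], fields["service"], string_to_sign)
    return hmac.compare_digest(expected, fields["signature"])
```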
Storage Servitization—Service Architecture
Storage Servitization—Community contribution
proxy
▪ Fix wrong format of s3 bucket creationDate (merged)
▪ Support parse authorization headers for s3 proxy (WIP)
fuse
▪ Fix wrong method call to get username and wrong parameter assignment (merged)
csi
▪ Replace invalid env with args in nodeserver (merged)
doc
▪ Fix bug case of S3 REST API (merged)
▪ Fix wrong file name in k8s doc (merged)
▪ Fix ambiguous description for impersonation in CN doc (merged)
ozone
▪ Update ozone from 1.1.0 to 1.2.1 (closed)
▪ 6 merged, 1 WIP, 1 closed.
1 Storage Status
2 Storage Acceleration
3 Storage Servitization
4 Future Plan
Storage speed up
▪ Speed up Spark and Hive
▪ Implement an adaptive cache policy in the Cache Manager
Storage service
▪ Support more POSIX APIs
▪ Optimize CSI
Future Plan
Thank You
Data Infra Meetup | Uber's Data Storage Evolution
 
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
 
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
 
AI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI Era
 
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
 
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
 
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ MetaAI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
 
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber ScaleAI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
 
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWSAlluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
 
Alluxio + Eckerson Webinar | Simplifying and Accelerating Data Access for AI/...
Alluxio + Eckerson Webinar | Simplifying and Accelerating Data Access for AI/...Alluxio + Eckerson Webinar | Simplifying and Accelerating Data Access for AI/...
Alluxio + Eckerson Webinar | Simplifying and Accelerating Data Access for AI/...
 
Alluxio Monthly Webinar - Accelerate AI Path to Production
Alluxio Monthly Webinar - Accelerate AI Path to ProductionAlluxio Monthly Webinar - Accelerate AI Path to Production
Alluxio Monthly Webinar - Accelerate AI Path to Production
 
Alluxio Webinar - Maximize GPU Utilization for Model Training
Alluxio Webinar - Maximize GPU Utilization for Model TrainingAlluxio Webinar - Maximize GPU Utilization for Model Training
Alluxio Webinar - Maximize GPU Utilization for Model Training
 
Alluxio Product school Webinar - Distributed Caching for Generative AI
Alluxio Product school Webinar - Distributed Caching for Generative AIAlluxio Product school Webinar - Distributed Caching for Generative AI
Alluxio Product school Webinar - Distributed Caching for Generative AI
 

Último

Kawika Technologies pvt ltd Software Development Company in Trivandrum
Kawika Technologies pvt ltd Software Development Company in TrivandrumKawika Technologies pvt ltd Software Development Company in Trivandrum
Kawika Technologies pvt ltd Software Development Company in TrivandrumKawika Technologies
 
Your Vision, Our Expertise: TECUNIQUE's Tailored Software Teams
Your Vision, Our Expertise: TECUNIQUE's Tailored Software TeamsYour Vision, Our Expertise: TECUNIQUE's Tailored Software Teams
Your Vision, Our Expertise: TECUNIQUE's Tailored Software TeamsJaydeep Chhasatia
 
Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!Neo4j
 
Kubernetes go-live checklist for your microservices.pptx
Kubernetes go-live checklist for your microservices.pptxKubernetes go-live checklist for your microservices.pptx
Kubernetes go-live checklist for your microservices.pptxPrakarsh -
 
How Does the Epitome of Spyware Differ from Other Malicious Software?
How Does the Epitome of Spyware Differ from Other Malicious Software?How Does the Epitome of Spyware Differ from Other Malicious Software?
How Does the Epitome of Spyware Differ from Other Malicious Software?AmeliaSmith90
 
Top Software Development Trends in 2024
Top Software Development Trends in  2024Top Software Development Trends in  2024
Top Software Development Trends in 2024Mind IT Systems
 
Deep Learning for Images with PyTorch - Datacamp
Deep Learning for Images with PyTorch - DatacampDeep Learning for Images with PyTorch - Datacamp
Deep Learning for Images with PyTorch - DatacampVICTOR MAESTRE RAMIREZ
 
AI Embracing Every Shade of Human Beauty
AI Embracing Every Shade of Human BeautyAI Embracing Every Shade of Human Beauty
AI Embracing Every Shade of Human BeautyRaymond Okyere-Forson
 
JS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AIJS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AIIvo Andreev
 
Introduction-to-Software-Development-Outsourcing.pptx
Introduction-to-Software-Development-Outsourcing.pptxIntroduction-to-Software-Development-Outsourcing.pptx
Introduction-to-Software-Development-Outsourcing.pptxIntelliSource Technologies
 
About .NET 8 and a first glimpse into .NET9
About .NET 8 and a first glimpse into .NET9About .NET 8 and a first glimpse into .NET9
About .NET 8 and a first glimpse into .NET9Jürgen Gutsch
 
ERP For Electrical and Electronics manufecturing.pptx
ERP For Electrical and Electronics manufecturing.pptxERP For Electrical and Electronics manufecturing.pptx
ERP For Electrical and Electronics manufecturing.pptxAutus Cyber Tech
 
20240330_고급진 코드를 위한 exception 다루기
20240330_고급진 코드를 위한 exception 다루기20240330_고급진 코드를 위한 exception 다루기
20240330_고급진 코드를 위한 exception 다루기Chiwon Song
 
online pdf editor software solutions.pdf
online pdf editor software solutions.pdfonline pdf editor software solutions.pdf
online pdf editor software solutions.pdfMeon Technology
 
ARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdf
ARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdfARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdf
ARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdfTobias Schneck
 
Cybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and BadCybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and BadIvo Andreev
 
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...Jaydeep Chhasatia
 
Watermarking in Source Code: Applications and Security Challenges
Watermarking in Source Code: Applications and Security ChallengesWatermarking in Source Code: Applications and Security Challenges
Watermarking in Source Code: Applications and Security ChallengesShyamsundar Das
 
Growing Oxen: channel operators and retries
Growing Oxen: channel operators and retriesGrowing Oxen: channel operators and retries
Growing Oxen: channel operators and retriesSoftwareMill
 

Último (20)

Kawika Technologies pvt ltd Software Development Company in Trivandrum
Kawika Technologies pvt ltd Software Development Company in TrivandrumKawika Technologies pvt ltd Software Development Company in Trivandrum
Kawika Technologies pvt ltd Software Development Company in Trivandrum
 
Your Vision, Our Expertise: TECUNIQUE's Tailored Software Teams
Your Vision, Our Expertise: TECUNIQUE's Tailored Software TeamsYour Vision, Our Expertise: TECUNIQUE's Tailored Software Teams
Your Vision, Our Expertise: TECUNIQUE's Tailored Software Teams
 
Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!
 
Kubernetes go-live checklist for your microservices.pptx
Kubernetes go-live checklist for your microservices.pptxKubernetes go-live checklist for your microservices.pptx
Kubernetes go-live checklist for your microservices.pptx
 
How Does the Epitome of Spyware Differ from Other Malicious Software?
How Does the Epitome of Spyware Differ from Other Malicious Software?How Does the Epitome of Spyware Differ from Other Malicious Software?
How Does the Epitome of Spyware Differ from Other Malicious Software?
 
Top Software Development Trends in 2024
Top Software Development Trends in  2024Top Software Development Trends in  2024
Top Software Development Trends in 2024
 
Deep Learning for Images with PyTorch - Datacamp
Deep Learning for Images with PyTorch - DatacampDeep Learning for Images with PyTorch - Datacamp
Deep Learning for Images with PyTorch - Datacamp
 
Sustainable Web Design - Claire Thornewill
Sustainable Web Design - Claire ThornewillSustainable Web Design - Claire Thornewill
Sustainable Web Design - Claire Thornewill
 
AI Embracing Every Shade of Human Beauty
AI Embracing Every Shade of Human BeautyAI Embracing Every Shade of Human Beauty
AI Embracing Every Shade of Human Beauty
 
JS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AIJS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AI
 
Introduction-to-Software-Development-Outsourcing.pptx
Introduction-to-Software-Development-Outsourcing.pptxIntroduction-to-Software-Development-Outsourcing.pptx
Introduction-to-Software-Development-Outsourcing.pptx
 
About .NET 8 and a first glimpse into .NET9
About .NET 8 and a first glimpse into .NET9About .NET 8 and a first glimpse into .NET9
About .NET 8 and a first glimpse into .NET9
 
ERP For Electrical and Electronics manufecturing.pptx
ERP For Electrical and Electronics manufecturing.pptxERP For Electrical and Electronics manufecturing.pptx
ERP For Electrical and Electronics manufecturing.pptx
 
20240330_고급진 코드를 위한 exception 다루기
20240330_고급진 코드를 위한 exception 다루기20240330_고급진 코드를 위한 exception 다루기
20240330_고급진 코드를 위한 exception 다루기
 
online pdf editor software solutions.pdf
online pdf editor software solutions.pdfonline pdf editor software solutions.pdf
online pdf editor software solutions.pdf
 
ARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdf
ARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdfARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdf
ARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdf
 
Cybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and BadCybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and Bad
 
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...
 
Watermarking in Source Code: Applications and Security Challenges
Watermarking in Source Code: Applications and Security ChallengesWatermarking in Source Code: Applications and Security Challenges
Watermarking in Source Code: Applications and Security Challenges
 
Growing Oxen: channel operators and retries
Growing Oxen: channel operators and retriesGrowing Oxen: channel operators and retries
Growing Oxen: channel operators and retries
 

The Power of Data Orchestration: Storage Acceleration and Servitization at Shopee

  • 1. The Power of Data Orchestration: Storage Acceleration and Servitization at Shopee (Tianbao Ding, Haoning Sun)
  • 2. Private & Confidential. Agenda (Storage Acceleration and Servitization at Shopee): 1 Storage Status; 2 Storage Acceleration; 3 Storage Servitization; 4 Future Plan
  • 3. Storage Status—Architecture (layers):
    ▪ Product: App (Search, Recommendation, etc.)
    ▪ Platform: Data Management Platform (DMP)
    ▪ Compute Engine: Spark, Flink, Presto
    ▪ Resource Scheduler: Yarn
    ▪ Storage: HDFS, Ozone
  • 4. Storage Status—HDFS:
    ▪ Number of nodes: thousands
    ▪ Storage capacity: hundreds of PB
    ▪ Number of files: billions
    ▪ Max QPS: hundreds of thousands
  • 5. Storage Status—Presto:
    ▪ Number of nodes: thousands of instances
    ▪ TP90 (90th-percentile query time): about 2 min
    ▪ Input: dozens of PB per day
    ▪ Number of queries: hundreds of thousands per day
  • 6. Storage Status—Presto Accelerate Query: unstable HDFS performance makes Presto query times unstable. Q: Can queries run faster?
  • 7. Storage Status—Presto Accelerate Query: instead of reading HDFS directly, Presto reads through Alluxio, which adds a cache in front of HDFS.
  • 8. Storage Status—Alluxio+Presto Typical Architecture:
    ▪ Alluxio mounts HDFS
    ▪ Presto accesses HDFS via Alluxio
    ▪ Alluxio manages the cache
  • 9. Storage Status—Shortcomings:
    ▪ Specific caching policies are needed
    ▪ The first read from Alluxio is slow, because the data is not cached yet
  • 10. Agenda (repeated): 1 Storage Status; 2 Storage Acceleration; 3 Storage Servitization; 4 Future Plan
  • 11. Storage Acceleration—Solution
  • 12. Storage Acceleration—Architecture. Components: Kafka (fed by the HDFS audit log), HMS, computing applications, operators, Hot Table, Cache Manager, Alluxio, HDFS. Interactions labeled in the diagram: load/unload/mount; path delete/create events; set/clear partition property; load data; load table; update policy; input; output; get tag.
  • 13. Storage Acceleration—Hot Table: built from the Presto query log and stored as a Hive table.
    ▪ Partitioned by date
    ▪ Records the number of visits to each table per day
  • 14. Storage Acceleration—Update Policy (scheduled daily):
    ▪ From the Hot Table, select the most frequently visited tables of the last seven days (weighted)
    ▪ For each table, take its m most recent partitions
    ▪ Load them from HDFS into Alluxio
    ▪ Persist the relationship
    ▪ Set the tag in HMS
    Note: this is an alpha version; subsequent iterations will continue to optimize it.
  • 15. Storage Acceleration—HMS Tag: Presto reads data from Alluxio when it carries the tag, and from HDFS when it has no tag.
    ▪ key: cache, value: ${DC}/Alluxio/ebj@${Alluxio_nameservice}
    ▪ If the partition exists, set the property on the partition
    ▪ Otherwise, set the property on the table
  • 16. Storage Acceleration—HMS Tag: example
  • 17. Storage Acceleration—Kafka: a Flink job reads the HDFS audit log, filters it, and writes the matching events to a Kafka topic.
    ▪ Audit log format: cmd=xx src=xx dst=xx
  • 18. Storage Acceleration—REST API:
    ▪ PATH: mount, unmount, load, query
    ▪ HIVE TABLE: mount, unmount, load (all recorded paths, or the n most recent partitions), query
    ▪ ADMIN: monitoring and operations
  • 19. Storage Acceleration—Performance Effect
  • 20. Storage Acceleration—Community Contribution: 6 merged, 2 WIP, 1 fixed by Alluxio.
    TYPE | PR | STATUS
    Hadoop 2.10 | Fix HdfsVersion miss hadoop 2.10 config | merged
    Hadoop 2.10 | Fix integration/yarn/pom.xml enforcer-plugin miss hadoop 2.10.x config | merged
    Hadoop 2.10 | Fix common.go miss hadoop 2.10 configuration | merged
    Command Line | Improve shell command support ebj nameservice | merged
    Command Line | Fix for Alluxio.logs.dir | fixed by Alluxio
    Web Page | Fix isMounted should not invoke ufs, if not /metrics page very slowly | merged
    Web Page | Fix FormatUtils.getSizeFromBytes method should supports EB | merged
    NameServices | Fix unescape the ufs url of Alluxio fsadmin report metrics result | WIP
    Metrics | Fix cache radio total not include cacheMisses | WIP
  • 21. Agenda (repeated): 1 Storage Status; 2 Storage Acceleration; 3 Storage Servitization; 4 Future Plan
  • 22. Storage Servitization—Status:
    ▪ Most data is stored in HDFS
    ▪ Various development languages are in use
    ▪ HDFS has insufficient support for non-Java clients
  • 23. Storage Servitization—Solutions:
    ▪ Fuse for HDFS: deploy the Alluxio Fuse service on physical machines, or on a Kubernetes cluster
    ▪ S3 for HDFS: use the S3 API to access the Alluxio proxy service
  • 24. Storage Servitization—Fuse. What is it: FileSystem in Userspace. High-level architecture:
    ▪ Kernel
    ▪ User-level daemon
  • 25. Storage Servitization—Alluxio Fuse:
    ▪ Requirements: libfuse
    ▪ Implementation: JNR-Fuse, JNI-Fuse
    ▪ Deployment: Standalone Fuse, Fuse on Workers
    ▪ Limitations: random writes are not supported
  • 26. Storage Servitization—Alluxio CSI. What is it: a standard storage interface for containers. Fuse deployment modes:
    ▪ On the nodeserver pod
    ▪ On a separate pod (new feature)
  • 27. Storage Servitization—K8s sidecar for Alluxio. What is it: a Fuse sidecar container in a Pod mounts the Alluxio directory. Features:
    ▪ Each Pod is configured independently, which gives high flexibility
    ▪ Each Pod runs its own Fuse container, so Pods do not affect each other
    ▪ Each Fuse process occupies a container, so the solution consumes more resources
  • 28. Storage Servitization—Summary (comparison of Fuse deployment modes):
    Mode | Maintenance cost | Resource usage | Independence
    Fuse | high | low | high
    K8s CSI, Fuse on nodeserver pod | low | lower | low
    K8s CSI, Fuse on separate pod | higher | high | high
    K8s sidecar | higher | high | high
  • 29. Storage Servitization—S3. Amazon S3 concepts (buckets, objects, keys, regions):
    ▪ Bucket: a container for objects stored in Amazon S3
    ▪ Object: the fundamental entity stored in Amazon S3
    ▪ Key: an object key (or key name) is the unique identifier for an object within a bucket
    ▪ Region: the region you choose to store the buckets you create
  • 30. Storage Servitization—S3 for HDFS: access HDFS data via Alluxio using the S3 protocol.
    ▪ Alluxio can mount HDFS data
    ▪ Alluxio provides a Proxy service
    ▪ The Proxy is compatible with the basic operations of the S3 API
    ▪ S3 SDKs support many development languages
  • 31. Storage Servitization—Alluxio Proxy S3 mapping:
    ▪ A first-level directory maps to a bucket
    ▪ Subdirectories and file paths map to the object key
  • 32. Storage Servitization—Proxy Authentication:
    ▪ Authentication parser
    ▪ Validator
    ▪ Secret Manager
    ▪ Signature Calculation
  • 33. Storage Servitization—Service Architecture
  • 34. Storage Servitization—Community Contribution: 6 merged, 1 WIP, 1 closed.
    TYPE | PR | STATUS
    proxy | Fix wrong format of s3 bucket creationDate | merged
    proxy | Support parse authorization headers for s3 proxy | WIP
    fuse | Fix wrong method call to get username and wrong parameter assignment | merged
    csi | Replace invalid env with args in nodeserver | merged
    doc | Fix bug case of S3 REST API | merged
    doc | Fix wrong file name in k8s doc | merged
    doc | Fix ambiguous description for impersonation in CN doc | merged
    ozone | Update ozone from 1.1.0 to 1.2.1 | closed
  • 35. Agenda (repeated): 1 Storage Status; 2 Storage Acceleration; 3 Storage Servitization; 4 Future Plan
  • 36. Future Plan:
    ▪ Storage speed-up: speed up Spark and Hive; implement an adaptive cache policy in the CacheManager
    ▪ Storage service: support more POSIX APIs; optimize CSI
  • 37. Thank You (Storage Acceleration and Servitization at Shopee)
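Slides 13 and 14 describe a daily job that ranks tables by weighted visit counts over the last seven days and then caches the m most recent partitions of each hot table, but the slides do not give the weighting. The sketch below assumes a hypothetical linear decay (a visit today weighs 1.0, a visit six days ago weighs 1/7) and partition names of the form `dt=YYYY-MM-DD`. `hot_tables`, `recent_partitions`, and `decay_weight` are illustrative names, not Shopee's code.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Iterable, List, Tuple

WINDOW_DAYS = 7  # slide 14: "last seven days"


def decay_weight(days_ago: int) -> float:
    """Hypothetical linear decay: today weighs 1.0, six days ago weighs 1/7."""
    return (WINDOW_DAYS - days_ago) / WINDOW_DAYS


def hot_tables(rows: Iterable[Tuple[date, str, int]], today: date, top_k: int) -> List[str]:
    """Rank tables by decay-weighted visits; rows mimic the date-partitioned Hot Table."""
    scores = defaultdict(float)
    for day, table, visits in rows:
        days_ago = (today - day).days
        if 0 <= days_ago < WINDOW_DAYS:  # ignore rows outside the seven-day window
            scores[table] += visits * decay_weight(days_ago)
    # Highest score first; ties broken by table name so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [table for table, _ in ranked[:top_k]]


def recent_partitions(partitions: Iterable[str], m: int) -> List[str]:
    """Return the m newest partitions; assumes 'dt=YYYY-MM-DD' names, which sort by date."""
    return sorted(partitions, reverse=True)[:m]


if __name__ == "__main__":
    today = date(2022, 6, 8)
    rows = [
        (today, "db.orders", 10),                      # score 10.0
        (today - timedelta(days=1), "db.items", 20),   # score 20 * 6/7
        (today - timedelta(days=6), "db.users", 50),   # score 50 * 1/7
        (today - timedelta(days=7), "db.logs", 1000),  # outside the window, ignored
    ]
    print(hot_tables(rows, today, top_k=2))  # ['db.items', 'db.orders']
    print(recent_partitions(["dt=2022-06-01", "dt=2022-06-08", "dt=2022-06-05"], m=2))
    # ['dt=2022-06-08', 'dt=2022-06-05']
```

The decay keeps a table that was hot a week ago from crowding out one that became hot yesterday. Any weighting that favors recent visits would fit the slide's description equally well.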
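Slide 15 specifies the HMS tag: key `cache`, value `${DC}/Alluxio/ebj@${Alluxio_nameservice}`. The tag is set on the partition if the partition exists and on the table otherwise, and Presto reads tagged data from Alluxio and untagged data from HDFS. The sketch below follows those rules. Plain dicts stand in for the table and partition parameter maps kept in the Hive Metastore, and no real HMS client is called. The helper names and the sample `dc1`/`ns1` values are hypothetical.

```python
from typing import Dict, Optional

CACHE_KEY = "cache"  # slide 15: property key of the tag


def cache_tag_value(dc: str, alluxio_nameservice: str) -> str:
    """Build the tag value in the slide 15 format ${DC}/Alluxio/ebj@${Alluxio_nameservice}."""
    return f"{dc}/Alluxio/ebj@{alluxio_nameservice}"


def _target(table_params: Dict[str, str], partition_params: Optional[Dict[str, str]]) -> Dict[str, str]:
    # Use the partition's parameters when the partition exists, otherwise the table's.
    return partition_params if partition_params is not None else table_params


def set_cache_tag(table_params: Dict[str, str], partition_params: Optional[Dict[str, str]],
                  dc: str, nameservice: str) -> None:
    _target(table_params, partition_params)[CACHE_KEY] = cache_tag_value(dc, nameservice)


def clear_cache_tag(table_params: Dict[str, str], partition_params: Optional[Dict[str, str]]) -> None:
    _target(table_params, partition_params).pop(CACHE_KEY, None)


def cached_location(table_params: Dict[str, str],
                    partition_params: Optional[Dict[str, str]]) -> Optional[str]:
    """Reader side: a tag value means 'read from Alluxio'; None means 'read from HDFS'."""
    return _target(table_params, partition_params).get(CACHE_KEY)


if __name__ == "__main__":
    table, partition = {}, {}
    set_cache_tag(table, partition, dc="dc1", nameservice="ns1")
    print(partition)                          # {'cache': 'dc1/Alluxio/ebj@ns1'}
    print(cached_location(table, partition))  # dc1/Alluxio/ebj@ns1
    clear_cache_tag(table, partition)
    print(cached_location(table, partition))  # None
```

An existing but untagged partition is represented by an empty dict, not `None`. This is why the check uses `is not None` rather than truthiness.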
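Slide 17 shows a Flink job that filters the HDFS audit log by its `cmd`, `src`, and `dst` fields before publishing path events to Kafka. The sketch below shows only the filtering logic in plain Python. It uses no Flink or Kafka API. It assumes the usual HDFS audit layout of tab-separated `key=value` fields after a log prefix. The watched command set and path prefixes are hypothetical stand-ins for whatever the cache manager actually tracks.

```python
from typing import Dict, Iterable, Iterator, Optional

# Hypothetical filter: commands that change paths the cache manager may have loaded.
WATCHED_CMDS = {"create", "delete", "rename"}


def parse_audit_line(line: str) -> Dict[str, str]:
    """Extract key=value fields from one audit line (assumes tab-separated fields)."""
    fields: Dict[str, str] = {}
    for token in line.rstrip("\n").split("\t"):
        key, sep, value = token.partition("=")  # split on the first '=' only: paths may contain '='
        if sep:
            # The first token still carries the log prefix, e.g. '... FSNamesystem.audit: allowed'.
            fields[key.rsplit(" ", 1)[-1]] = value
    return fields


def cache_events(lines: Iterable[str], prefixes: Iterable[str]) -> Iterator[Dict[str, Optional[str]]]:
    """Yield {cmd, src, dst} for allowed, watched commands under any of the given path prefixes."""
    prefix_tuple = tuple(prefixes)
    for line in lines:
        fields = parse_audit_line(line)
        if fields.get("allowed") != "true" or fields.get("cmd") not in WATCHED_CMDS:
            continue
        src = fields.get("src", "")
        if src.startswith(prefix_tuple):
            dst = fields.get("dst")
            yield {"cmd": fields["cmd"], "src": src, "dst": None if dst in (None, "null") else dst}


if __name__ == "__main__":
    sample = ("2022-06-08 10:00:00,123 INFO FSNamesystem.audit: allowed=true\t"
              "ugi=alice (auth:SIMPLE)\tip=/10.0.0.1\tcmd=delete\t"
              "src=/warehouse/db.db/t/dt=2022-06-01\tdst=null\tperm=null\tproto=rpc")
    for event in cache_events([sample], prefixes=["/warehouse/"]):
        print(event)  # {'cmd': 'delete', 'src': '/warehouse/db.db/t/dt=2022-06-01', 'dst': None}
```

Filtering before Kafka keeps the topic small. Most audit entries are reads (`open`, `getfileinfo`, and similar), and those do not invalidate cached data.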
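Slide 31 gives the Alluxio Proxy's S3 mapping: the first-level directory is the bucket, and the remaining path is the object key. A minimal sketch of that mapping in both directions follows. The function names and the sample path are illustrative, not Alluxio's implementation.

```python
from typing import Tuple


def alluxio_to_s3(path: str) -> Tuple[str, str]:
    """Map an Alluxio path to (bucket, key): first-level directory -> bucket, rest -> key."""
    bucket, _, key = path.lstrip("/").partition("/")
    if not bucket:
        raise ValueError(f"path {path!r} has no first-level directory to use as a bucket")
    return bucket, key


def s3_to_alluxio(bucket: str, key: str) -> str:
    """Inverse mapping: the bucket becomes the first-level directory, the key the rest of the path."""
    if not bucket or "/" in bucket:
        raise ValueError(f"invalid bucket name {bucket!r}")
    return f"/{bucket}/{key}" if key else f"/{bucket}"


if __name__ == "__main__":
    print(alluxio_to_s3("/warehouse/db.db/t/dt=2022-06-01/part-0.orc"))
    # ('warehouse', 'db.db/t/dt=2022-06-01/part-0.orc')
    print(s3_to_alluxio("warehouse", "db.db/t/dt=2022-06-01/part-0.orc"))
    # /warehouse/db.db/t/dt=2022-06-01/part-0.orc
```

Because the mapping is purely positional, an HDFS directory mounted at a first-level Alluxio path becomes one S3 bucket, and clients in any language with an S3 SDK can list and read it by key.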
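Slide 32 lists four proxy authentication pieces: an authentication parser, a validator, a secret manager, and signature calculation. The community-contribution table on slide 34 also mentions parsing authorization headers for the S3 proxy. The sketch below shows one way these pieces could fit together. It assumes clients sign requests with AWS Signature Version 4 (`AWS4-HMAC-SHA256`), the scheme current AWS SDKs use. A plain mapping stands in for the secret manager. Building the canonical request and the string to sign is left out, so the caller passes the string to sign. All function names and credentials are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, Mapping

ALGORITHM = "AWS4-HMAC-SHA256"


def parse_authorization(header: str) -> Dict[str, str]:
    """Authentication parser: split a SigV4 Authorization header into its parts."""
    algorithm, _, rest = header.partition(" ")
    if algorithm != ALGORITHM:
        raise ValueError(f"unsupported algorithm {algorithm!r}")
    fields: Dict[str, str] = {}
    for part in rest.split(","):
        name, _, value = part.strip().partition("=")
        fields[name] = value
    # Credential scope: <access key>/<yyyymmdd>/<region>/<service>/aws4_request
    access_key, day, region, service, terminator = fields["Credential"].split("/")
    if terminator != "aws4_request":
        raise ValueError("malformed credential scope")
    return {"access_key": access_key, "date": day, "region": region, "service": service,
            "signed_headers": fields["SignedHeaders"], "signature": fields["Signature"]}


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def calculate_signature(secret: str, day: str, region: str, service: str, string_to_sign: str) -> str:
    """Signature calculation: derive the SigV4 signing key, then HMAC the string to sign."""
    k_date = _hmac_sha256(("AWS4" + secret).encode("utf-8"), day)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    k_signing = _hmac_sha256(k_service, "aws4_request")
    return hmac.new(k_signing, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


def validate(header: str, string_to_sign: str, secrets: Mapping[str, str]) -> bool:
    """Validator: look up the secret for the access key and compare signatures."""
    auth = parse_authorization(header)
    secret = secrets.get(auth["access_key"])
    if secret is None:
        return False
    expected = calculate_signature(secret, auth["date"], auth["region"], auth["service"], string_to_sign)
    # Constant-time comparison so timing does not reveal how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), auth["signature"].encode("utf-8"))


if __name__ == "__main__":
    secrets = {"AKIDEXAMPLE": "example-secret"}  # hypothetical credentials
    # Placeholder string to sign; the last line would normally be SHA-256(canonical request).
    sts = "AWS4-HMAC-SHA256\n20220608T000000Z\n20220608/us-east-1/s3/aws4_request\n" + "0" * 64
    sig = calculate_signature("example-secret", "20220608", "us-east-1", "s3", sts)
    header = (f"{ALGORITHM} Credential=AKIDEXAMPLE/20220608/us-east-1/s3/aws4_request, "
              f"SignedHeaders=host;x-amz-date, Signature={sig}")
    print(validate(header, sts, secrets))        # True
    print(validate(header, sts + "x", secrets))  # False
```

The proxy never receives the secret itself. It recomputes the signature from its own copy of the secret and accepts the request only if the two match.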