Netflix: Integrating Spark at Petabyte Scale
Ashwin Shankar
Cheolsoo Park
Outline
1. Netflix big data platform
2. Spark @ Netflix
3. Multi-tenancy problems
4. Predicate pushdown
5. S3 file listing
6. S3 insert overwrite
7. Zeppelin, IPython notebooks
8. Use case (Pig vs. Spark)
Netflix Big Data Platform
Netflix data pipeline
[Diagram: event data flows from cloud apps through Suro/Kafka and Ursula to S3 (500 bn/day, 15m); dimension data flows daily from Cassandra SSTables through Aegisthus to S3.]
Netflix big data platform
[Diagram: platform layers labeled Data Warehouse, Service, Tools, Gateways, Clients, and Clusters; components shown include the Big Data API/Portal, Metacat, and Adhoc, Prod, and Test clusters.]
Our use cases
•  Batch jobs (Pig, Hive)
•  ETL jobs
•  Reporting and other analysis
•  Interactive jobs (Presto)
•  Iterative ML jobs (Spark)
Spark @ Netflix
Mix of deployments
•  Spark on Mesos
   •  Self-serving AMI
   •  Full BDAS (Berkeley Data Analytics Stack)
   •  Online streaming analytics
•  Spark on YARN
   •  Spark as a service
   •  YARN application on EMR Hadoop
   •  Offline batch analytics
Spark on YARN
•  Multi-tenant cluster in AWS cloud
•  Hosting MR, Spark, Druid
•  EMR Hadoop 2.4 (AMI 3.9.0)
•  D2.4xlarge EC2 instance type
•  … nodes / … TB total memory
Deployment
S3 s3://bucket/spark/1.5/spark-1.5.tgz, spark-defaults.conf (spark.yarn.jar=1440443677)
s3://bucket/spark/1.4/spark-1.4.tgz, spark-defaults.conf (spark.yarn.jar=1440304023)
/spark/1.5/1440443677/spark-assembly.jar
/spark/1.5/1440720326/spark-assembly.jar
/spark/1.4/1440304023/spark-assembly.jar
/spark/1.4/1440989711/spark-assembly.jar
name: spark
version: 1.5
tags: ['type:spark', 'ver:1.5']
jars:
- 's3://bucket/spark/1.5/spark-1.5.tgz'
Download latest tarball from S3 via Genie
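The layout above can be read as a small lookup: each version's spark-defaults.conf pins spark.yarn.jar to a build timestamp, and that timestamp selects one assembly jar under /spark/<version>/<timestamp>/. The sketch below models that lookup in plain Scala. The names `SparkDeployment`, `resolve`, and `repin` are hypothetical, and treating a rollback as re-pinning an older timestamp is an assumption drawn from the directory layout, not a description of how Genie works internally.

```scala
object SparkDeployment {
  // Build timestamps that have an assembly jar, per Spark version (values from the slide).
  val builds: Map[String, List[Long]] = Map(
    "1.5" -> List(1440443677L, 1440720326L),
    "1.4" -> List(1440304023L, 1440989711L)
  )

  // Timestamp that each version's spark-defaults.conf pins via spark.yarn.jar.
  val pinned: Map[String, Long] = Map("1.5" -> 1440443677L, "1.4" -> 1440304023L)

  // Path of the assembly jar for a version and build timestamp.
  def assemblyPath(version: String, timestamp: Long): String =
    s"/spark/$version/$timestamp/spark-assembly.jar"

  // Jar a job would run with; None if the version is unknown or the pinned build is missing.
  def resolve(version: String, pins: Map[String, Long] = pinned): Option[String] =
    for {
      ts <- pins.get(version)
      if builds.getOrElse(version, Nil).contains(ts)
    } yield assemblyPath(version, ts)

  // Hypothetical roll forward/back: pin another existing build; None if that build does not exist.
  def repin(version: String, timestamp: Long,
            pins: Map[String, Long] = pinned): Option[Map[String, Long]] =
    if (builds.getOrElse(version, Nil).contains(timestamp)) Some(pins.updated(version, timestamp))
    else None
}
```

With this model, switching builds is a one-value change: `SparkDeployment.repin("1.5", 1440720326L)` returns new pins, and `resolve` then points at the other jar without touching the jars themselves.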
Advantages
1. Automate deployment
2. Support multiple versions
3. Deploy new code in minutes
4. Roll back bad code in less than a minute
Multi-tenancy Problems
Dynamic allocation
Courtesy of “Dynamic allocate cluster resources to your Spark application” at Hadoop Summit 2015
// spark-defaults.conf
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.executorIdleTimeout 5
spark.dynamicAllocation.initialExecutors 3
spark.dynamicAllocation.maxExecutors 500
spark.dynamicAllocation.minExecutors 3
spark.dynamicAllocation.schedulerBacklogTimeout 5
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout 5
spark.dynamicAllocation.cachedExecutorIdleTimeout 900
// yarn-site.xml
yarn.nodemanager.aux-services
•  spark_shuffle, mapreduce_shuffle
yarn.nodemanager.aux-services.spark_shuffle.class
•  org.apache.spark.network.yarn.YarnShuffleService
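To make the settings above concrete, here is a simplified model of how the executor target grows while tasks stay backlogged. Spark's documentation describes the request policy as exponential (1, 2, 4, ... executors per round); everything else in the sketch is an assumption for illustration: the object and method names are hypothetical, and the model ignores executor removal and the cap imposed by the number of pending tasks.

```scala
object DynamicAllocationModel {
  // Values taken from the spark-defaults.conf settings above.
  val initialExecutors = 3
  val maxExecutors = 500
  val schedulerBacklogTimeoutSec = 5
  val sustainedBacklogTimeoutSec = 5

  // Executor target after each request round while the backlog persists:
  // round k adds 2^(k-1) executors (1, 2, 4, ...), capped at maxExecutors.
  def targets(rounds: Int): List[Int] =
    (0 until rounds).scanLeft(initialExecutors) { (current, round) =>
      // The shift is clamped so the increment cannot overflow an Int.
      math.min(maxExecutors, current + (1 << math.min(round, 20)))
    }.toList.tail

  // Seconds of continuous backlog before the given round (1-based) fires.
  def secondsUntilRound(round: Int): Int =
    schedulerBacklogTimeoutSec + (round - 1) * sustainedBacklogTimeoutSec
}
```

Under these assumptions `DynamicAllocationModel.targets(4)` yields 4, 6, 10, 18, and the 500-executor cap is reached by the ninth round, about 45 seconds into a sustained backlog.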
Problem 1: SPARK-6954
Problem 2: SPARK-7955
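import sqlContext.implicits._ // enables the $"column" syntax
val data = sqlContext
  .table("dse.admin_genie_job_d")
  .filter($"dateint" >= 20150601 and $"dateint" <= 20150830)
data.persist() // mark the DataFrame for caching
data.count()   // run an action so the cache is populated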
Problem 3: SPARK-7451, SPARK-8167
•  Symptom
   •  Spark executors/tasks randomly fail, causing job failures
•  Cause
   •  Preempted executors/tasks are counted as failures
•  Solution
   •  Preempted executors/tasks should be considered as killed
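The difference between the cause and the solution comes down to which attempt outcomes count toward the failure limit. The following sketch shows that accounting for the attempts of a single task. All names are hypothetical, and the limit is modeled on spark.task.maxFailures (default 4); this is not the code from the Spark patches.

```scala
object TaskFailureAccounting {
  sealed trait TaskEnd
  case object Succeeded extends TaskEnd
  case object Failed extends TaskEnd            // genuine task error
  case object ExecutorPreempted extends TaskEnd // executor taken away by the cluster scheduler

  // Hypothetical limit, in the spirit of spark.task.maxFailures (default 4).
  val maxFailures = 4

  // Before the fix: every non-successful attempt counts toward the limit.
  def countNaive(attempts: Seq[TaskEnd]): Int = attempts.count(_ != Succeeded)

  // After the fix: preempted attempts are treated as killed and are not counted.
  def countPreemptionAware(attempts: Seq[TaskEnd]): Int = attempts.count(_ == Failed)

  // The job is aborted once the counted failures reach the limit.
  def jobAborts(failureCount: Int): Boolean = failureCount >= maxFailures
}
```

A task that is preempted four times and fails once would abort the job under the naive count, but not once preemptions are excluded.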
Problem 4: YARN-2730
•  Symptom
   •  MR jobs get timed out during localization when running with Spark jobs on the same cluster
•  Cause
   •  NM localizes one job at a time. Since the Spark runtime jar is big, localizing Spark jobs may take long, blocking MR jobs
•  Solution
   •  Stage the Spark runtime jar on HDFS with high replication
   •  Make NM localize multiple jobs concurrently
Predicate Pushdown
Case | Behavior
Predicates with partition cols on partitioned table | Single partition scan
Predicates with partition and non-partition cols on partitioned table | Single partition scan
No predicate on partitioned table, e.g. sqlContext.table("nccp_log").take(10) | Full scan
No predicate on non-partitioned table | Single partition scan
Predicate pushdown for metadata
[Diagram: query planning stages Parser, Analyzer, Optimizer, SparkPlanner; at the ResolveRelation step, HiveMetastoreCatalog calls getAllPartitions().]
What if your table has 1.6M partitions?
SPARK-6910
•  Symptom
   •  Querying against a heavily partitioned Hive table is slow
•  Cause
   •  Predicates are not pushed down into the Hive metastore, so Spark does a full scan for table metadata
•  Solution
   •  Push down binary comparison expressions via getPartitionsByFilter() into the Hive metastore
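As an illustration of the solution above, the sketch below separates the comparisons that touch partition columns from the rest and renders them as one filter string. It is a simplified stand-in, not the SPARK-6910 patch: the `Comparison` type and function names are hypothetical, values are limited to integers, and the exact filter syntax the metastore accepts is assumed rather than taken from the slides.

```scala
object PartitionFilterPushdown {
  // A binary comparison such as dateint >= 20150601 (hypothetical representation).
  final case class Comparison(column: String, op: String, value: Long)

  private val pushableOps = Set("=", "<", "<=", ">", ">=")

  // Keep only comparisons on partition columns; the rest still run in Spark after the scan.
  def pushable(predicates: Seq[Comparison], partitionCols: Set[String]): Seq[Comparison] =
    predicates.filter(p => partitionCols.contains(p.column) && pushableOps.contains(p.op))

  // Render the pushable comparisons as one filter string joined by "and".
  def toFilter(predicates: Seq[Comparison], partitionCols: Set[String]): String =
    pushable(predicates, partitionCols)
      .map(p => s"${p.column} ${p.op} ${p.value}")
      .mkString(" and ")
}
```

For a query on dateint, hour, and a non-partition column, only the first two end up in the string, so the metastore can return the matching partitions instead of all of them.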
[Diagram, with pushdown: the same stages (Parser, Analyzer, Optimizer, SparkPlanner); partitions are fetched with getPartitionsByFilter() at the HiveTableScans / HiveTableScan step.]
S3 File Listing
Input split computation
•  mapreduce.input.fileinputformat.list-status.num-threads
   •  The number of threads to use to list and fetch block locations for the specified input paths.
•  Setting this property in Spark jobs doesn't help
File listing for partitioned table
[Diagram: a Seq[RDD] with one HadoopRDD per partition path; each input dir is listed sequentially via the S3N file system.]
SPARK-9926, SPARK-10340
•  Symptom
   •  Input split computation for partitioned Hive table on S3 is slow
•  Cause
   •  Listing files on a per partition basis is slow
   •  S3N file system computes data locality hints
•  Solution
   •  Bulk list partitions in parallel using AmazonS3Client
   •  Bypass data locality computation for S3 objects
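The three listing strategies discussed here can be compared on a toy object store. The sketch below uses an in-memory key list instead of S3, so it shows only the shape of the requests, not real latency. Every name in it is hypothetical, and it does not use AmazonS3Client or reproduce the actual patches.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object BulkListingSketch {
  // Stand-in for an object store: a flat list of keys, no real directories.
  val keys: Vector[String] = Vector(
    "warehouse/nccp_log/dateint=20150801/hour=0/part-0",
    "warehouse/nccp_log/dateint=20150801/hour=0/part-1",
    "warehouse/nccp_log/dateint=20150801/hour=1/part-0",
    "warehouse/nccp_log/dateint=20150802/hour=0/part-0"
  )

  // One "list" request: every key under a prefix.
  def list(prefix: String): Vector[String] = keys.filter(_.startsWith(prefix))

  // Partition directory of a key: everything before the last '/'.
  def partitionOf(key: String): String = key.substring(0, key.lastIndexOf('/'))

  // Sequential: one request per partition path, one after another.
  def listSequentially(partitions: Seq[String]): Map[String, Vector[String]] =
    partitions.map(p => p -> list(p + "/")).toMap

  // Parallel: the same requests issued concurrently.
  def listInParallel(partitions: Seq[String]): Map[String, Vector[String]] = {
    val requests = partitions.map(p => Future(p -> list(p + "/")))
    Await.result(Future.sequence(requests), 30.seconds).toMap
  }

  // Bulk: a single request for the table prefix, grouped by partition afterwards.
  def listInBulk(tablePrefix: String): Map[String, Vector[String]] =
    list(tablePrefix).groupBy(partitionOf)
}
```

All three return the same partition-to-files map; they differ in how many requests are made and whether those requests wait on each other.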
S3 bulk listing
[Diagram: a ParArray[RDD] with one HadoopRDD per partition path; input dirs are bulk listed in parallel via AmazonS3Client.]
Performance improvement
[Chart: time in seconds (axis 0 to 16000) against # of partitions (1, 24, 240, 720), comparing 1.5 RC2 with S3 bulk listing.]
SELECT * FROM nccp_log WHERE dateint=20150801 and hour=0 LIMIT 10;
S3 Insert Overwrite
Problem 1: Hadoop output committer
•  How it works:
   •  Each task writes output to a temp dir
   •  Output committer renames first successful task's temp dir to final destination
•  Problems with S3:
   •  S3 rename is copy and delete
   •  S3 is eventually consistent
   •  FileNotFoundException during “rename”
S3 output committer
•  How it works:
   •  Each task writes output to local disk
   •  Output committer copies first successful task's output to S3
•  Advantages:
   •  Avoid redundant S3 copy
   •  Avoid eventual consistency
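A minimal sketch of the commit step described above, under stated assumptions: attempts are plain records, "first successful" is read as the successful attempt with the lowest attempt id, and the upload is modeled as an entry in a map from final key to local file. The names `Attempt`, `winners`, and `commit` are hypothetical and this is not Netflix's committer.

```scala
object DirectCommitSketch {
  // One attempt of a task: where its output sits on local disk and whether it succeeded.
  final case class Attempt(taskId: Int, attemptId: Int, localFile: String, succeeded: Boolean)

  // For each task, pick the first successful attempt (assumed: lowest attemptId).
  def winners(attempts: Seq[Attempt]): Seq[Attempt] =
    attempts.filter(_.succeeded)
      .groupBy(_.taskId)
      .values
      .map(_.minBy(_.attemptId))
      .toSeq
      .sortBy(_.taskId)

  // "Upload" each winner straight to its final key: no temp dir and no rename.
  def commit(attempts: Seq[Attempt], destPrefix: String): Map[String, String] =
    winners(attempts).map(a => s"$destPrefix/part-${a.taskId}" -> a.localFile).toMap
}
```

Because each final key is written exactly once, the step that caused trouble on S3 (copy, then delete, then read back) never happens in this model.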
Problem 2: Hive insert overwrite
•  How it works:
   •  Delete and rewrite existing output in partitions
•  Problems with S3:
   •  S3 is eventually consistent
   •  FileAlreadyExistsException during “rewrite”
Batchid pattern
•  How it works:
   •  Never delete existing output in partitions
   •  Each job inserts a unique subpartition called “batchid”
•  Advantages:
   •  Avoid eventual consistency
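The batchid pattern can be sketched as an append-only map from partition to batchid subpartitions. The slides do not say how readers choose among batchids, so resolving a partition to its highest batchid is an assumption here, as are all names in the code.

```scala
object BatchIdPattern {
  // Partition -> batchid subpartitions written so far; nothing is ever deleted.
  type Table = Map[String, Vector[Long]]

  // An "overwrite" appends a new, unique batchid under the partition.
  def insertOverwrite(table: Table, partition: String, batchId: Long): Table = {
    val existing = table.getOrElse(partition, Vector.empty)
    require(!existing.contains(batchId), s"batchid $batchId already used in $partition")
    table.updated(partition, existing :+ batchId)
  }

  // Assumption: readers resolve a partition to its highest batchid.
  def currentLocation(table: Table, partition: String): Option[String] =
    table.get(partition).filter(_.nonEmpty).map(ids => s"$partition/batchid=${ids.max}")
}
```

Two successive overwrites of the same partition leave both batchids in place, and only the resolved location changes, so no existing object is deleted or rewritten.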
Zeppelin, IPython Notebooks
Big data portal
•  One stop shop for all big data related tools and services
•  Built on top of Big Data API
Out of box examples
On demand notebooks
•  Zero installation
•  Dependency management via Docker
•  Notebook persistence
•  Elastic resources
Quick facts about Titan
•  Task execution platform leveraging Apache Mesos.
•  Manages underlying EC2 instances.
•  Process supervision and uptime in the face of failures.
•  Auto scaling
Notebook Infrastructure
[Diagram: in the Titan cluster, host machine A (54.X.X.X) runs Zeppelin in Docker container A (172.X.X.X) and host machine B (54.X.X.X) runs Pyspark in Docker container B (172.X.X.X), using ephemeral ports / --net=host mode; the YARN cluster hosts two Spark AMs.]
Use Case: Pig vs. Spark
Iterative job
1. Duplicate data and aggregate them differently.
2. Merging aggregates back.
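The two steps above can be illustrated with plain Scala collections. The slides do not show what the job actually computed, so the `Event` fields, the two grouping keys, and the "share of total" merge are hypothetical choices made only to show the shape of the computation.

```scala
object MultiAggregation {
  // Hypothetical record type; the real job's schema is not given in the slides.
  final case class Event(country: String, device: String, hours: Double)

  // Step 1: aggregate the same records along two different keys.
  def byCountry(events: Seq[Event]): Map[String, Double] =
    events.groupBy(_.country).map { case (k, es) => k -> es.map(_.hours).sum }

  def byDevice(events: Seq[Event]): Map[String, Double] =
    events.groupBy(_.device).map { case (k, es) => k -> es.map(_.hours).sum }

  // Step 2: merge the aggregates back onto each record as its share of both totals.
  def merge(events: Seq[Event]): Seq[(Event, Double, Double)] = {
    val countryTotals = byCountry(events)
    val deviceTotals = byDevice(events)
    events.map(e => (e, e.hours / countryTotals(e.country), e.hours / deviceTotals(e.device)))
  }
}
```

Each record ends up with one value per aggregation, which is the "merge back" shape: the input is read once per grouping and the results are joined onto the original rows.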
Performance improvement
[Chart: run time in hh:mm:ss (axis 0:00:00 to 2:09:36) for job 1, job 2, and job 3, comparing Pig with Spark 1.2.]
Our contributions
SPARK-6018
SPARK-6662
SPARK-6909
SPARK-6910
SPARK-7037
SPARK-7451
SPARK-7850
SPARK-8355
SPARK-8572
SPARK-8908
SPARK-9270
SPARK-9926
SPARK-10001
SPARK-10340
Q&A
Thank You
