© 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 1
Samuel Kommu
Technical Marketing Engineer
sakommu@cisco.com
Hadoop Summit 2014
V3
• Hadoop Overview
• Scheduling and Prioritizing
Compute
Network
• Visibility Plugins
• Hadoop Optimization and Tuning
• Recommendations
• Q & A
Big Data @ Cisco - www.cisco.com/go/bigdata
Multi-year network and compute analysis testing
(In conjunction with partners)
Hadoop World
2011, 2012 & 2013
Hadoop Summit
2012 & 2013
Certifications and Solutions with UCS C-Series and
Nexus Series Switches
Cloudera Hadoop Certified Technology
Oracle NoSQL Validated Solution
Visibility & Monitoring
100 Terabytes of data could take several days just to read
In one test it took 11 days
With NAS/SAN you are still limited by the throughput of the storage systems
100 Terabytes of data
100 Terabytes of direct attached storage
Hadoop Distributed File System
The same job took 15 minutes once it went parallel, spreading the load across 1000 nodes
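The two timings quoted above (11 days for a single reader, 15 minutes across 1000 nodes) can be turned into average read rates. The following is a rough arithmetic sketch only; it assumes decimal terabytes and an even spread of data across nodes, neither of which the slides state.

```python
TB = 10**12  # assumption: decimal terabytes; the slides do not specify the unit


def throughput_mb_per_s(total_bytes: float, seconds: float, readers: int = 1) -> float:
    """Average read rate per reader in MB/s (1 MB = 10**6 bytes)."""
    return total_bytes / seconds / readers / 10**6


data = 100 * TB
serial = throughput_mb_per_s(data, 11 * 24 * 3600)            # one reader, 11 days
parallel = throughput_mb_per_s(data, 15 * 60, readers=1000)   # 1000 nodes, 15 minutes

print(f"single reader:        {serial:.0f} MB/s")    # 105 MB/s
print(f"per node, 1000 nodes: {parallel:.0f} MB/s")  # 111 MB/s
print(f"speed-up:             {11 * 24 * 60 / 15:.0f}x")  # 1056x
```

Under these assumptions each node reads at roughly the same rate as the single reader, so the gain comes almost entirely from parallelism rather than from faster storage.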
• Data Ingest & Replication
External connectivity
East-west traffic (replication of data blocks)
• Map Phase - Raw data is analyzed and converted to name/value pairs.
The workload translates to multiple batches of Map tasks
The Reducer can start the reduce phase ONLY after the entire Map set is complete
• Mostly an I/O and compute function
[Figure: unstructured data in the Hadoop Distributed File System is read by Map tasks; their keyed output (Key 1 to Key 4) passes through the Shuffle Phase to Reduce tasks, which write the Result/Output]
• Shuffle Phase - All name/value pairs are sorted and grouped by their keys.
• The Reducer PULLS the data from the Mapper nodes
• High network activity
• Reduce Phase - All values associated with a key are processed for results, in three phases
Copy - get the intermediate results from each data node's local disk
Merge - to reduce the number of files
Reduce method
• Output Replication Phase - The Reducer replicates the result to multiple nodes
Highest network activity
• Network activity depends on workload behavior
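The Map, Shuffle, and Reduce phases described above can be sketched in a few lines of single-process Python. This is only an illustration of the data flow using a word count; it is not Hadoop code, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(record: str) -> List[Tuple[str, int]]:
    """Map: convert one raw record into name/value pairs."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all values by key and sort the keys."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: process all values associated with one key."""
    return key, sum(values)


def word_count(records: Iterable[str]) -> Dict[str, int]:
    # The reduce step only starts once every record has been mapped.
    mapped = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


result = word_count(["to be or not to be", "to thine own self be true"])
print(result["to"], result["be"], result["or"])  # 3 3 1
```

In a real cluster the grouped values sit on different nodes, which is why the shuffle step shows up as network traffic rather than as a dictionary update.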
[Figure: Hadoop Distributed File System unstructured data flows through Map tasks, the Shuffle Phase, and Reduce tasks to the Result/Output]
Analyze (Search - Count): Ingress vs. Egress Data Set 1:0.3
Extract Transform Load (ETL - TeraSort): Ingress vs. Egress Data Set 1:1
Explode (Normalizing Data): Ingress vs. Egress Data Set 1:2
Analyze: simulated with Shakespeare Wordcount
Extract Transform Load (ETL): simulated with Yahoo TeraSort
Extract Transform Load (ETL): simulated with Yahoo TeraSort with output replication
Job patterns have varying impact on network utilization
Small Flows/Messaging (admin related, heartbeats, keep-alive, delay-sensitive application messaging)
Small - Medium Incast (Hadoop Shuffle)
Large Flows (HDFS Ingest)
Large Incast (Hadoop Replication)
[Figure: three panels, Scheduling, Rescheduling, and Prioritize/De-Prioritize, each showing Job 1, Job 2, and Job 3 placed on Core 1 to Core 4]
[Figure: Client 1 and Client 2 submit jobs to the Resource Manager; three Node Managers host Containers and App Masters. Arrows: Job Submission, Resource Request, Node Status, MapReduce Status]
Switch Buffer Usage with Network QoS Policy to prioritize HBase Update/Read Operations
[Chart: Latency (us) over Time. Series: UPDATE - Average Latency (us); READ - Average Latency (us); QoS - UPDATE - Average Latency (us); QoS - READ - Average Latency (us)]
[Chart: Buffer Used over Timeline. Labels: Hadoop TeraSort, HBase]
Read/Update Latency Comparison of Non-QoS vs. QoS Policy: ~60% improvement for Read
Traditional Networking – Switching/Routing Decisions are per flow
What if we could divide a flow into multiple parts –
that could take independent network paths
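One known way to divide a flow into independently routable parts is to cut it wherever there is an idle gap between packets, an approach often called flowlet switching. The slides do not name the mechanism, so the sketch below is an assumption for illustration only; the gap threshold, the packet timestamps, and the round-robin path choice are all hypothetical.

```python
from typing import List


def split_into_flowlets(timestamps_us: List[int], gap_us: int) -> List[List[int]]:
    """Group packet timestamps (sorted, in microseconds) into flowlets.

    A new flowlet starts whenever the idle time since the previous
    packet exceeds gap_us.
    """
    flowlets: List[List[int]] = []
    for ts in timestamps_us:
        if flowlets and ts - flowlets[-1][-1] <= gap_us:
            flowlets[-1].append(ts)   # still inside the current burst
        else:
            flowlets.append([ts])     # idle gap exceeded: start a new flowlet
    return flowlets


# Hypothetical packet arrival times for a single flow.
packets = [0, 10, 20, 500, 510, 1200]
flowlets = split_into_flowlets(packets, gap_us=100)

# Illustrative path choice: alternate flowlets between two spines.
paths = ["via Spine 1", "via Spine 2"]
for index, flowlet in enumerate(flowlets):
    print(paths[index % len(paths)], flowlet)
```

If the gap is larger than the latency difference between paths, packets of the same flow are unlikely to arrive out of order even though consecutive flowlets take different paths.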
[Figure: Leaf 1 and Leaf 2 connected through Spine 1 and Spine 2; links marked Normal or Congested]
In traditional networks available today, Leaf 1 is not aware of the congestion between Spine 2 and Leaf 2
Congestion awareness across the network is essential for optimal traffic distribution
Impact of large flows on small flows without Dynamic Packet Prioritization
Large flows tend to use up resources:
• Bandwidth
• Buffer
Unless smaller flows are prioritized, large flows could have an adverse impact on the smaller flows
ECMP: decisions are per leaf and do not consider end-to-end connectivity/congestion
DLB: load balancing decisions are made based on end-to-end connectivity/congestion
[Figure: Leaf 1 to Leaf 2 via Spine 1 and Spine 2, with one congested link. ECMP: 50% usage on each path. DLB: 66% usage and 33% usage]
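The 50% versus 66%/33% figures can be reproduced with a small model. The sketch below is one plausible reading, not the actual DLB algorithm: it assumes traffic is split in proportion to the capacity available end to end on each path, and the 40 and 20 Gbps values are hypothetical.

```python
from typing import Dict


def ecmp_split(paths: Dict[str, float]) -> Dict[str, float]:
    """Static ECMP: equal share per path, ignoring downstream capacity."""
    share = 1.0 / len(paths)
    return {name: share for name in paths}


def capacity_aware_split(paths: Dict[str, float]) -> Dict[str, float]:
    """Assumed model: share proportional to available end-to-end capacity."""
    total = sum(paths.values())
    return {name: capacity / total for name, capacity in paths.items()}


# Hypothetical available capacity per path in Gbps; the path through
# Spine 2 is partly consumed by congestion further downstream.
available_gbps = {"via Spine 1": 40.0, "via Spine 2": 20.0}

for name, share in ecmp_split(available_gbps).items():
    print(f"ECMP {name}: {share:.0%}")
for name, share in capacity_aware_split(available_gbps).items():
    print(f"DLB  {name}: {share:.0%}")
```

With these inputs the equal split sends half the traffic toward the congested path, while the capacity-aware split sends about two thirds of it the other way.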
Buffer usage is mostly on the Leafs, not on the Spines
[Chart labels: Map Progress, Reduce Progress]
OS : Red Hat 6.2
Hadoop : Cloudera 3 U5
Memcached
Symmetric - No link failures Asymmetric – One failed link
Tests with Memcached
Dynamic Packet Prioritization helps improve
memcached performance
Note: These tests were disk bound. Further tests are planned with servers containing a higher number of drives
ECMP
DLB
• Hardware
1 x Nexus 9508 (Spine)
2 x Nexus 9396 (Leaf)
8 x UCS C240 M3
2 x Xeon(R) CPU E5-2690 0 @ 2.90GHz
100G RAM
4 x 1TB Drives
UCS VIC 1225
• Software
Red Hat 6.2
Cloudera 5
NXOS 6.1(2)I2(1)
• Open Framework
Modules will be on Github
• Based on
Apache Hadoop Rest APIs
NXOS Python APIs
leaf-001# python bootflash:hadoopModule/vpmHadoop.py localNodes
---------------------------------------------------------------
Node Name    Port    Speed  InRate      OutRate     Buffer
---------------------------------------------------------------
c240-m3-017  Eth1/1  10     1385580232  1128094464  0
c240-m3-018  Eth1/2  10     1882651664  1273181224  50
c240-m3-020  Eth1/3  10     1236743432  1238902664  0
c240-m3-021  Eth1/4  10     675482600   1292790704  0
---------------------------------------------------------------
In/Out Rate is the avg of last 30 seconds in Gbits/sec
NXAPI©
• Open Framework
• Based on
Apache Hadoop Rest APIs
NXOS Python APIs
Scripts on GitHub: https://github.com/datacenter/VisibilityPlugins
leaf-001# python bootflash:hadoopModule/vpmHadoop.py allNodesJobs
Node Name    Port    Neighbor   Speed
--------------------------------------
c240-m3-001  Eth2/2  spine-001  40
c240-m3-002  Eth2/2  spine-001  40
c240-m3-004  Eth2/2  spine-001  40
c240-m3-005  Eth2/2  spine-001  40
c240-m3-017  Eth1/1  local      10
c240-m3-018  Eth1/2  local      10
c240-m3-020  Eth1/3  local      10
c240-m3-021  Eth1/4  local      10
Job ID  Job Name  Host Name    Application  Progress %
---------------------------------------------------------------------
0887    TeraGen   c240-m3-001  MAPREDUCE    37.7623
---------------------------------------------------------------------
Scripts on GitHub: https://github.com/datacenter/VisibilityPlugins
NXAPI©
• Another example
With Job info
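Job details like those in the table above are available from the YARN ResourceManager REST API (`/ws/v1/cluster/apps`). The sketch below shows one way to turn that response into similar rows. It is not the vpmHadoop.py source; the ResourceManager URL is whatever your cluster uses, and the sample payload, including the application ID, is made up for illustration.

```python
import json
import urllib.request
from typing import Any, Dict, List, Tuple


def fetch_running_apps(rm_url: str) -> Dict[str, Any]:
    """GET the running applications from a YARN ResourceManager.

    rm_url is an assumption supplied by the caller, e.g. "http://<rm-host>:8088".
    """
    url = f"{rm_url}/ws/v1/cluster/apps?states=RUNNING"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def job_rows(payload: Dict[str, Any]) -> List[Tuple[str, str, str, str, float]]:
    """Extract (id, name, AM host, application type, progress %) per app."""
    apps = (payload.get("apps") or {}).get("app", [])  # "apps" is null when idle
    return [
        (
            app["id"],
            app["name"],
            app.get("amHostHttpAddress", "").split(":")[0],
            app["applicationType"],
            app["progress"],
        )
        for app in apps
    ]


# Made-up sample payload in the shape the API returns.
sample = {"apps": {"app": [{
    "id": "application_1400000000000_0887",
    "name": "TeraGen",
    "amHostHttpAddress": "c240-m3-001:8042",
    "applicationType": "MAPREDUCE",
    "progress": 37.7623,
}]}}

for row in job_rows(sample):
    print(*row)
```

Keeping the parsing separate from the HTTP call makes it possible to exercise the row extraction without a live cluster.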
Instantaneous Results
Historic Data using Graphite
Topology
[Chart: Time taken for 1TB Sort, SSD vs. Non-SSD Drives. Y axis: Time Taken in Seconds, 0 to 2500. Bars: SSD, Non-SSD]
[Chart: Time taken for 1TB Sort, THP Always vs THP Never. Y axis: Time Taken in Seconds, 0 to 3000. Bars: Never, Always]
To disable:
echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
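Before or after running the command above, it can help to check which transparent huge page (THP) mode is active. The kernel marks the active mode with square brackets in the `enabled` file. The sketch below reads that value; the second path is the upstream kernel location, added here as an assumption for non-RHEL 6 systems, and the helper names are hypothetical.

```python
from pathlib import Path
from typing import Optional

THP_PATHS = (
    "/sys/kernel/mm/redhat_transparent_hugepage/enabled",  # RHEL 6 path used on the slide
    "/sys/kernel/mm/transparent_hugepage/enabled",         # upstream kernel path (assumption)
)


def parse_thp_setting(text: str) -> Optional[str]:
    """Return the bracketed (active) mode, e.g. 'never' from 'always [never]'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def current_thp_setting() -> Optional[str]:
    """Read the active THP mode from the first path that exists, if any."""
    for candidate in THP_PATHS:
        path = Path(candidate)
        if path.exists():
            return parse_thp_setting(path.read_text())
    return None


print(parse_thp_setting("always madvise [never]"))  # never
print(current_thp_setting())  # depends on the host; None if neither file exists
```

Note that a value written with `echo` does not survive a reboot, so a check like this is also useful in startup scripts.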
• Network Attributes
• Architecture
• Availability
• Capacity, Scale & Oversubscription
• Flexibility
• Management & Visibility
Integration Considerations
Availability
• Single NIC failure doubles the job completion time.
• Dual NIC has no impact on job completion time
• Effective load-sharing of traffic flow on two NICs. NIC bonding is configured in Linux, using the LACP mode of bonding
• Recommended to change the hashing to src-dst-ip-port (both on the network and in Linux NIC bonding) for optimal load-sharing
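The reason for hashing on src-dst-ip-port can be shown with a small model. The sketch below uses CRC32 as a stand-in hash, which is an assumption and not the hash that the Linux bonding driver or the switch actually uses; the addresses and ports are hypothetical.

```python
import zlib
from typing import Tuple


def pick_nic(key: Tuple, nic_count: int = 2) -> int:
    """Map a hash key to one of nic_count links (stand-in hash: CRC32)."""
    return zlib.crc32(repr(key).encode()) % nic_count


# Hypothetical: 16 TCP connections between the same two hosts.
src_ip, dst_ip, dst_port = "10.0.0.17", "10.0.0.18", 50010
flows = [(src_ip, dst_ip, src_port, dst_port) for src_port in range(40000, 40016)]

# Hashing on IP addresses only: every connection yields the same key.
nics_by_ip = {pick_nic((s, d)) for s, d, _, _ in flows}

# Hashing on IP addresses and ports: each connection has its own key.
nics_by_ip_port = {pick_nic(flow) for flow in flows}

print("links used, src-dst-ip:     ", sorted(nics_by_ip))
print("links used, src-dst-ip-port:", sorted(nics_by_ip_port))
```

With IP-only hashing all traffic between two hosts is pinned to a single link, whereas including the ports gives each connection a chance to land on either link.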
1161 min
286 min
Single 1GE: 100% Utilized
Dual 1GE: 75% Utilized
10GE: 40% Utilized
Big Data @ Cisco -
www.cisco.com/go/bigdata
Q & A
For further questions and a
chance to win Beats headset
meet us in Booth P12

Mais conteúdo relacionado

Mais procurados

HP Storage pre virtuálne systémy (Prehľad riešení na zálohovanie a ukladanie ...
HP Storage pre virtuálne systémy (Prehľad riešení na zálohovanie a ukladanie ...HP Storage pre virtuálne systémy (Prehľad riešení na zálohovanie a ukladanie ...
HP Storage pre virtuálne systémy (Prehľad riešení na zálohovanie a ukladanie ...ASBIS SK
 
Learning from ZFS to Scale Storage on and under Containers
Learning from ZFS to Scale Storage on and under ContainersLearning from ZFS to Scale Storage on and under Containers
Learning from ZFS to Scale Storage on and under Containersinside-BigData.com
 
Hacmp – high availability pdf
Hacmp – high availability  pdf Hacmp – high availability  pdf
Hacmp – high availability pdf asihan
 
Data Center Network Trends - Lin Nease
Data Center Network Trends - Lin NeaseData Center Network Trends - Lin Nease
Data Center Network Trends - Lin NeaseHPDutchWorld
 
Increase Efficiency of Solaris Operations & Hardware Life Cycle
Increase Efficiency of Solaris Operations & Hardware Life CycleIncrease Efficiency of Solaris Operations & Hardware Life Cycle
Increase Efficiency of Solaris Operations & Hardware Life CycleJomaSoft
 
Hadoop World 2011: Hadoop and Performance - Todd Lipcon & Yanpei Chen, Cloudera
Hadoop World 2011: Hadoop and Performance - Todd Lipcon & Yanpei Chen, ClouderaHadoop World 2011: Hadoop and Performance - Todd Lipcon & Yanpei Chen, Cloudera
Hadoop World 2011: Hadoop and Performance - Todd Lipcon & Yanpei Chen, ClouderaCloudera, Inc.
 
HPC Best Practices: Application Performance Optimization
HPC Best Practices: Application Performance OptimizationHPC Best Practices: Application Performance Optimization
HPC Best Practices: Application Performance Optimizationinside-BigData.com
 
Microsofts Configurable Cloud
Microsofts Configurable CloudMicrosofts Configurable Cloud
Microsofts Configurable CloudChris Genazzio
 
Oracle Database Appliance - RAC in a box Some strings attached
Oracle Database Appliance - RAC in a box Some strings attached Oracle Database Appliance - RAC in a box Some strings attached
Oracle Database Appliance - RAC in a box Some strings attached Fuad Arshad
 
HP 3Par StoreServ Storage: HP All Flash Array SSD
HP 3Par StoreServ Storage: HP All Flash Array SSDHP 3Par StoreServ Storage: HP All Flash Array SSD
HP 3Par StoreServ Storage: HP All Flash Array SSDUnitiv
 
Cisco Live! :: Content Delivery Networks (CDN)
Cisco Live! :: Content Delivery Networks (CDN)Cisco Live! :: Content Delivery Networks (CDN)
Cisco Live! :: Content Delivery Networks (CDN)Bruno Teixeira
 
Data Center Network Topologies
Data Center Network TopologiesData Center Network Topologies
Data Center Network Topologiesrjain51
 
PLNOG 13: Maciej Grabowski: HP Moonshot
PLNOG 13: Maciej Grabowski: HP MoonshotPLNOG 13: Maciej Grabowski: HP Moonshot
PLNOG 13: Maciej Grabowski: HP MoonshotPROIDEA
 
DPDK Summit 2015 - Sprint - Arun Rajagopal
DPDK Summit 2015 - Sprint - Arun RajagopalDPDK Summit 2015 - Sprint - Arun Rajagopal
DPDK Summit 2015 - Sprint - Arun RajagopalJim St. Leger
 
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)BigDataEverywhere
 
Next Generation Nexus 9000 Architecture
Next Generation Nexus 9000 ArchitectureNext Generation Nexus 9000 Architecture
Next Generation Nexus 9000 ArchitectureCisco Canada
 
Introduction to Cloud Data Center and Network Issues
Introduction to Cloud Data Center and Network IssuesIntroduction to Cloud Data Center and Network Issues
Introduction to Cloud Data Center and Network IssuesJason TC HOU (侯宗成)
 
Optimizing Dell PowerEdge Configurations for Hadoop
Optimizing Dell PowerEdge Configurations for HadoopOptimizing Dell PowerEdge Configurations for Hadoop
Optimizing Dell PowerEdge Configurations for HadoopMike Pittaro
 

Mais procurados (20)

HP Storage pre virtuálne systémy (Prehľad riešení na zálohovanie a ukladanie ...
HP Storage pre virtuálne systémy (Prehľad riešení na zálohovanie a ukladanie ...HP Storage pre virtuálne systémy (Prehľad riešení na zálohovanie a ukladanie ...
HP Storage pre virtuálne systémy (Prehľad riešení na zálohovanie a ukladanie ...
 
Learning from ZFS to Scale Storage on and under Containers
Learning from ZFS to Scale Storage on and under ContainersLearning from ZFS to Scale Storage on and under Containers
Learning from ZFS to Scale Storage on and under Containers
 
Hacmp – high availability pdf
Hacmp – high availability  pdf Hacmp – high availability  pdf
Hacmp – high availability pdf
 
Data Center Network Trends - Lin Nease
Data Center Network Trends - Lin NeaseData Center Network Trends - Lin Nease
Data Center Network Trends - Lin Nease
 
Increase Efficiency of Solaris Operations & Hardware Life Cycle
Increase Efficiency of Solaris Operations & Hardware Life CycleIncrease Efficiency of Solaris Operations & Hardware Life Cycle
Increase Efficiency of Solaris Operations & Hardware Life Cycle
 
Hadoop World 2011: Hadoop and Performance - Todd Lipcon & Yanpei Chen, Cloudera
Hadoop World 2011: Hadoop and Performance - Todd Lipcon & Yanpei Chen, ClouderaHadoop World 2011: Hadoop and Performance - Todd Lipcon & Yanpei Chen, Cloudera
Hadoop World 2011: Hadoop and Performance - Todd Lipcon & Yanpei Chen, Cloudera
 
HPC Best Practices: Application Performance Optimization
HPC Best Practices: Application Performance OptimizationHPC Best Practices: Application Performance Optimization
HPC Best Practices: Application Performance Optimization
 
Microsofts Configurable Cloud
Microsofts Configurable CloudMicrosofts Configurable Cloud
Microsofts Configurable Cloud
 
Oracle Database Appliance - RAC in a box Some strings attached
Oracle Database Appliance - RAC in a box Some strings attached Oracle Database Appliance - RAC in a box Some strings attached
Oracle Database Appliance - RAC in a box Some strings attached
 
HP 3Par StoreServ Storage: HP All Flash Array SSD
HP 3Par StoreServ Storage: HP All Flash Array SSDHP 3Par StoreServ Storage: HP All Flash Array SSD
HP 3Par StoreServ Storage: HP All Flash Array SSD
 
Cisco Live! :: Content Delivery Networks (CDN)
Cisco Live! :: Content Delivery Networks (CDN)Cisco Live! :: Content Delivery Networks (CDN)
Cisco Live! :: Content Delivery Networks (CDN)
 
Data Center Network Topologies
Data Center Network TopologiesData Center Network Topologies
Data Center Network Topologies
 
PLNOG 13: Maciej Grabowski: HP Moonshot
PLNOG 13: Maciej Grabowski: HP MoonshotPLNOG 13: Maciej Grabowski: HP Moonshot
PLNOG 13: Maciej Grabowski: HP Moonshot
 
DPDK Summit 2015 - Sprint - Arun Rajagopal
DPDK Summit 2015 - Sprint - Arun RajagopalDPDK Summit 2015 - Sprint - Arun Rajagopal
DPDK Summit 2015 - Sprint - Arun Rajagopal
 
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)
 
ha_module5
ha_module5ha_module5
ha_module5
 
Next Generation Nexus 9000 Architecture
Next Generation Nexus 9000 ArchitectureNext Generation Nexus 9000 Architecture
Next Generation Nexus 9000 Architecture
 
Oracle Storage a ochrana dat
Oracle Storage a ochrana datOracle Storage a ochrana dat
Oracle Storage a ochrana dat
 
Introduction to Cloud Data Center and Network Issues
Introduction to Cloud Data Center and Network IssuesIntroduction to Cloud Data Center and Network Issues
Introduction to Cloud Data Center and Network Issues
 
Optimizing Dell PowerEdge Configurations for Hadoop
Optimizing Dell PowerEdge Configurations for HadoopOptimizing Dell PowerEdge Configurations for Hadoop
Optimizing Dell PowerEdge Configurations for Hadoop
 

Semelhante a BigData Clusters Redefined

The Data Center and Hadoop
The Data Center and HadoopThe Data Center and Hadoop
The Data Center and HadoopMichael Zhang
 
3 Ways to Connect to the Oracle Cloud
3 Ways to Connect to the Oracle Cloud3 Ways to Connect to the Oracle Cloud
3 Ways to Connect to the Oracle CloudSimon Haslam
 
Oracle RAC 12c Practical Performance Management and Tuning OOW13 [CON8825]
Oracle RAC 12c Practical Performance Management and Tuning OOW13 [CON8825]Oracle RAC 12c Practical Performance Management and Tuning OOW13 [CON8825]
Oracle RAC 12c Practical Performance Management and Tuning OOW13 [CON8825]Markus Michalewicz
 
3. ami big data hadoop on ucs seminar may 2013
3. ami big data hadoop on ucs seminar may 20133. ami big data hadoop on ucs seminar may 2013
3. ami big data hadoop on ucs seminar may 2013Taldor Group
 
Introduction to Segment Routing
Introduction to Segment RoutingIntroduction to Segment Routing
Introduction to Segment RoutingMyNOG
 
Thu 430pm solarflare_tolley_v1[1]
Thu 430pm solarflare_tolley_v1[1]Thu 430pm solarflare_tolley_v1[1]
Thu 430pm solarflare_tolley_v1[1]Bruce Tolley
 
Introduction to SDN and Network Programmability - BRKRST-1014 | 2017/Las Vegas
Introduction to SDN and Network Programmability - BRKRST-1014 | 2017/Las VegasIntroduction to SDN and Network Programmability - BRKRST-1014 | 2017/Las Vegas
Introduction to SDN and Network Programmability - BRKRST-1014 | 2017/Las VegasBruno Teixeira
 
Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store
Azure + DataStax Enterprise (DSE) Powers Office365 Per User StoreAzure + DataStax Enterprise (DSE) Powers Office365 Per User Store
Azure + DataStax Enterprise (DSE) Powers Office365 Per User StoreDataStax Academy
 
IPv6 Security - Myths and Reality
IPv6 Security - Myths and RealityIPv6 Security - Myths and Reality
IPv6 Security - Myths and RealitySwiss IPv6 Council
 
Segment Routing Advanced Use Cases - Cisco Live 2016 USA
Segment Routing Advanced Use Cases - Cisco Live 2016 USASegment Routing Advanced Use Cases - Cisco Live 2016 USA
Segment Routing Advanced Use Cases - Cisco Live 2016 USAJose Liste
 
SRv6-TOI-rev3i-EXTERNAL.pdf
SRv6-TOI-rev3i-EXTERNAL.pdfSRv6-TOI-rev3i-EXTERNAL.pdf
SRv6-TOI-rev3i-EXTERNAL.pdfYunLiu75
 
Software Stacks to enable SDN and NFV
Software Stacks to enable SDN and NFVSoftware Stacks to enable SDN and NFV
Software Stacks to enable SDN and NFVYoshihiro Nakajima
 
Cisco EuroMPI'13 vendor session presentation
Cisco EuroMPI'13 vendor session presentationCisco EuroMPI'13 vendor session presentation
Cisco EuroMPI'13 vendor session presentationJeff Squyres
 
Model-driven Telemetry: The Foundation of Big Data Analytics
Model-driven Telemetry: The Foundation of Big Data AnalyticsModel-driven Telemetry: The Foundation of Big Data Analytics
Model-driven Telemetry: The Foundation of Big Data AnalyticsCisco Canada
 
PLNOG15: Practical deployments of Kea, a high performance scalable DHCP - Tom...
PLNOG15: Practical deployments of Kea, a high performance scalable DHCP - Tom...PLNOG15: Practical deployments of Kea, a high performance scalable DHCP - Tom...
PLNOG15: Practical deployments of Kea, a high performance scalable DHCP - Tom...PROIDEA
 
Webinar: Untethering Compute from Storage
Webinar: Untethering Compute from StorageWebinar: Untethering Compute from Storage
Webinar: Untethering Compute from StorageAvere Systems
 
Big Data Benchmarking with RDMA solutions
Big Data Benchmarking with RDMA solutions Big Data Benchmarking with RDMA solutions
Big Data Benchmarking with RDMA solutions Mellanox Technologies
 

Semelhante a BigData Clusters Redefined (20)

The Data Center and Hadoop
The Data Center and HadoopThe Data Center and Hadoop
The Data Center and Hadoop
 
3 Ways to Connect to the Oracle Cloud
3 Ways to Connect to the Oracle Cloud3 Ways to Connect to the Oracle Cloud
3 Ways to Connect to the Oracle Cloud
 
Oracle RAC 12c Practical Performance Management and Tuning OOW13 [CON8825]
Oracle RAC 12c Practical Performance Management and Tuning OOW13 [CON8825]Oracle RAC 12c Practical Performance Management and Tuning OOW13 [CON8825]
Oracle RAC 12c Practical Performance Management and Tuning OOW13 [CON8825]
 
Open v ran
Open v ranOpen v ran
Open v ran
 
3. ami big data hadoop on ucs seminar may 2013
3. ami big data hadoop on ucs seminar may 20133. ami big data hadoop on ucs seminar may 2013
3. ami big data hadoop on ucs seminar may 2013
 
Introduction to Segment Routing
Introduction to Segment RoutingIntroduction to Segment Routing
Introduction to Segment Routing
 
Thu 430pm solarflare_tolley_v1[1]
Thu 430pm solarflare_tolley_v1[1]Thu 430pm solarflare_tolley_v1[1]
Thu 430pm solarflare_tolley_v1[1]
 
Introduction to SDN and Network Programmability - BRKRST-1014 | 2017/Las Vegas
Introduction to SDN and Network Programmability - BRKRST-1014 | 2017/Las VegasIntroduction to SDN and Network Programmability - BRKRST-1014 | 2017/Las Vegas
Introduction to SDN and Network Programmability - BRKRST-1014 | 2017/Las Vegas
 
Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store
Azure + DataStax Enterprise (DSE) Powers Office365 Per User StoreAzure + DataStax Enterprise (DSE) Powers Office365 Per User Store
Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store
 
IPv6 Security - Myths and Reality
IPv6 Security - Myths and RealityIPv6 Security - Myths and Reality
IPv6 Security - Myths and Reality
 
Segment Routing Advanced Use Cases - Cisco Live 2016 USA
Segment Routing Advanced Use Cases - Cisco Live 2016 USASegment Routing Advanced Use Cases - Cisco Live 2016 USA
Segment Routing Advanced Use Cases - Cisco Live 2016 USA
 
SRv6-TOI-rev3i-EXTERNAL.pdf
SRv6-TOI-rev3i-EXTERNAL.pdfSRv6-TOI-rev3i-EXTERNAL.pdf
SRv6-TOI-rev3i-EXTERNAL.pdf
 
Software Stacks to enable SDN and NFV
Software Stacks to enable SDN and NFVSoftware Stacks to enable SDN and NFV
Software Stacks to enable SDN and NFV
 
Cisco EuroMPI'13 vendor session presentation
Cisco EuroMPI'13 vendor session presentationCisco EuroMPI'13 vendor session presentation
Cisco EuroMPI'13 vendor session presentation
 
Model-driven Telemetry: The Foundation of Big Data Analytics
Model-driven Telemetry: The Foundation of Big Data AnalyticsModel-driven Telemetry: The Foundation of Big Data Analytics
Model-driven Telemetry: The Foundation of Big Data Analytics
 
PLNOG15: Practical deployments of Kea, a high performance scalable DHCP - Tom...
PLNOG15: Practical deployments of Kea, a high performance scalable DHCP - Tom...PLNOG15: Practical deployments of Kea, a high performance scalable DHCP - Tom...
PLNOG15: Practical deployments of Kea, a high performance scalable DHCP - Tom...
 
Webinar: Untethering Compute from Storage
Webinar: Untethering Compute from StorageWebinar: Untethering Compute from Storage
Webinar: Untethering Compute from Storage
 
Big Data Benchmarking with RDMA solutions
Big Data Benchmarking with RDMA solutions Big Data Benchmarking with RDMA solutions
Big Data Benchmarking with RDMA solutions
 
Empower Hive with Spark
Empower Hive with SparkEmpower Hive with Spark
Empower Hive with Spark
 
Hive Now Sparks
Hive Now SparksHive Now Sparks
Hive Now Sparks
 

Mais de DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

BigData Clusters Redefined

  • 1. Samuel Kommu, Technical Marketing Engineer, sakommu@cisco.com. Hadoop Summit 2014, V3. © 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential.
  • 2. Hadoop Overview; Scheduling and Prioritizing (Compute, Network); Visibility Plugins; Hadoop Optimization and Tuning; Recommendations; Q & A
  • 3. Big Data @ Cisco - www.cisco.com/go/bigdata
      • Multi-year network and compute analysis testing (in conjunction with partners)
      • Hadoop World 2011, 2012 & 2013; Hadoop Summit 2012 & 2013
      • Certifications and solutions with UCS C-Series and Nexus Series switches: Cloudera Hadoop Certified Technology, Oracle NoSQL Validated Solution
      • Visibility & Monitoring
  • 5. 100 terabytes of data could take several days just to read. In one test it took 11 days.
  • 6. NAS/SAN: reading 100 terabytes of data is still slow, since you are still limited by the throughput of the storage systems.
  • 7. 100 terabytes of direct attached storage with the Hadoop Distributed File System: the same job took 15 minutes once they went parallel, spreading the load across 1000 nodes.
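The gap between the 11-day read and the 15-minute parallel run is easier to see as throughput. The following is a back-of-the-envelope sketch only: it assumes 1 TB = 10**12 bytes and an even spread across the 1000 nodes, neither of which the slides state.

```python
# Back-of-the-envelope throughput implied by the slide figures.
# Assumption: 1 TB = 10**12 bytes and the load is spread evenly over all nodes.
TB = 10**12
DATA_BYTES = 100 * TB
NODES = 1000


def throughput_bytes_per_sec(total_bytes: int, seconds: float) -> float:
    """Average throughput needed to move total_bytes in the given time."""
    return total_bytes / seconds


serial = throughput_bytes_per_sec(DATA_BYTES, 11 * 24 * 3600)  # 11 days
parallel = throughput_bytes_per_sec(DATA_BYTES, 15 * 60)       # 15 minutes
per_node = parallel / NODES

print(f"single reader:     {serial / 1e6:.1f} MB/s")
print(f"cluster aggregate: {parallel / 1e9:.1f} GB/s")
print(f"per node:          {per_node / 1e6:.1f} MB/s")
print(f"speed-up:          {parallel / serial:.0f}x")
```

Under these assumptions each node sustains roughly the same rate as the single reader; the gain comes from running 1000 of them at once.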
  • 8. Diagram: Unstructured Data → Hadoop Distributed File System → Map tasks → Shuffle Phase (Key 1 to Key 4) → Reduce tasks → Result/Output
      • The Data Ingest & Replication: external connectivity; East-West traffic (replication of data blocks)
      • Map Phase: raw data is analyzed and converted to name/value pairs. The workload translates to multiple batches of Map tasks. The Reducer can start the reduce phase ONLY after the entire Map set is complete.
      • Mostly an IO/compute function
  • 9. Diagram: Unstructured Data → Hadoop Distributed File System → Map tasks → Shuffle Phase (Key 1 to Key 4) → Reduce tasks → Result/Output
      • Shuffle Phase: all name/value pairs are sorted and grouped by their keys.
      • The Reducer is PULLING the data from the Mapper nodes.
      • High network activity
      • Reduce Phase: all values associated with a key are processed for results, in three phases: Copy (get the intermediate results from each data node's local disk), Merge (to reduce the number of files), and the Reduce method.
      • Output Replication Phase: the Reducer replicates the results to multiple nodes. Highest network activity.
      • Network activity is dependent on workload behavior.
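The map, shuffle, and reduce phases described on these two slides can be sketched in a few lines as an in-memory word count. This is only an illustration of the data flow on a single machine; the function names are made up for the example, and nothing here touches HDFS or the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: convert raw text into (name, value) pairs."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: sort and group all values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: process all values associated with one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Every map output is collected before any reduce call runs,
    # mirroring the rule that reduce starts only after the map set completes.
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["to be or not to be"]))
```

In a real cluster the shuffle step is where reducers pull map output across the network, which is why the slides flag it as a period of high network activity.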
  • 10. Ingress vs. egress data set by job type:
      • Analyze (Search - Count): 1:0.3
      • Extract Transform Load (ETL - TeraSort): 1:1
      • Explode (Normalizing Data): 1:2
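As a rough illustration of what these ratios mean for output volume, the sketch below scales an input size by the ratio for each job type. The pairing of ratio to job type follows the order on the slide, and the replication factor of 3 is an assumption (it is the HDFS default, but the slides do not state what was used).

```python
# Output bytes per input byte, as listed on the slide (pairing assumed from slide order).
EGRESS_RATIO = {"analyze": 0.3, "etl": 1.0, "explode": 2.0}


def egress_tb(job_type: str, ingress_tb: float) -> float:
    """Job output size in TB for a given input size."""
    return ingress_tb * EGRESS_RATIO[job_type]


def stored_tb(job_type: str, ingress_tb: float, replication: int = 3) -> float:
    """Total bytes written once the output is replicated (replication=3 is an assumption)."""
    return egress_tb(job_type, ingress_tb) * replication


for job in EGRESS_RATIO:
    print(f"{job:8s} 10 TB in -> {egress_tb(job, 10):.1f} TB out, "
          f"{stored_tb(job, 10):.1f} TB stored with 3 replicas")
```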
  • 11. Job patterns have varying impact on network utilization:
      • Analyze: simulated with Shakespeare Wordcount
      • Extract Transform Load (ETL): simulated with Yahoo TeraSort
      • Extract Transform Load (ETL): simulated with Yahoo TeraSort with output replication
  • 12. Traffic types:
      • Small flows/messaging (admin related, heartbeats, keep-alives, delay-sensitive application messaging)
      • Small to medium incast (Hadoop shuffle)
      • Large flows (HDFS ingest)
      • Large incast (Hadoop replication)
  • 14. Scheduling, Rescheduling, and Prioritize/De-Prioritize (three diagrams, each placing Job 1, Job 2, and Job 3 on Core 1 through Core 4)
  • 15. Diagram: Client 1 and Client 2 submit jobs to the Resource Manager, which works with the Node Managers hosting Containers and App Masters. Arrows: Job Submission, Resource Request, Node Status, MapReduce Status.
  • 16. Switch Buffer Usage with Network QoS Policy to prioritize HBase Update/Read Operations (chart: buffer used over the timeline for Hadoop TeraSort and HBase). HBase Read/Update Latency, Comparison of Non-QoS vs. QoS Policy (chart: average UPDATE and READ latency in us over time, with and without QoS). ~60% improvement for Read.
  • 18. Traditional networking: switching/routing decisions are per flow. What if we could divide a flow into multiple parts that could take independent network paths?
  • 19. In traditional networks available today, Leaf 1 is not aware of the congestion between Spine 2 and Leaf 2. Congestion awareness across the network is essential for optimal traffic distribution. (Diagram: Leaf 1, Leaf 2, Spine 1, Spine 2, with links marked Normal or Congested.)
  • 20. Impact of large flows on small flows without Dynamic Packet Prioritization. Large flows tend to use up resources:
      • Bandwidth
      • Buffer
     Unless smaller flows are prioritized, large flows could potentially have an adverse impact on the smaller flows.
  • 21. ECMP vs. DLB (two diagrams, each with Leaf 1, Leaf 2, Spine 1, Spine 2)
      • ECMP: decisions are per leaf and do not consider end-to-end connectivity/congestion (diagram labels: 50% usage, congested link).
      • DLB: load balancing decisions are made based on end-to-end connectivity/congestion (diagram labels: 66% usage, 33% usage).
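The difference between static per-flow hashing and congestion-aware path selection can be sketched as follows. This is a toy model, not the switch implementation: `zlib.crc32` stands in for whatever hash the hardware uses, and the path names and utilization figures are hypothetical.

```python
import zlib
from typing import Dict, List, Tuple

# (src_ip, dst_ip, protocol, src_port, dst_port)
Flow = Tuple[str, str, str, int, int]


def ecmp_pick(flow: Flow, paths: List[str]) -> str:
    """Static ECMP: hash the 5-tuple, so every packet of a flow takes the same path
    regardless of how loaded that path is. crc32 is a stand-in hash (assumption)."""
    key = "|".join(str(field) for field in flow).encode()
    return paths[zlib.crc32(key) % len(paths)]


def congestion_aware_pick(paths: List[str], utilization: Dict[str, float]) -> str:
    """Congestion-aware choice: prefer the path with the lowest end-to-end utilization."""
    return min(paths, key=lambda path: utilization[path])


paths = ["via-spine-1", "via-spine-2"]
flow = ("10.0.0.17", "10.0.0.18", "tcp", 50000, 50010)
utilization = {"via-spine-1": 0.20, "via-spine-2": 0.90}  # hypothetical values

print("ECMP picks:            ", ecmp_pick(flow, paths))
print("congestion-aware picks:", congestion_aware_pick(paths, utilization))
```

The hash-based choice never looks at `utilization`, which is the limitation the slides point at: a leaf can keep sending a large flow toward an already congested spine link.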
  • 22. Buffer usage is mostly on leafs, not on spines. (Charts: Map Progress, Reduce Progress.)
  • 24. Test setup: OS: Red Hat 6.2; Hadoop: Cloudera 3 U5; Memcached. Symmetric: no link failures. Asymmetric: one failed link.
  • 25. Tests with Memcached: Dynamic Packet Prioritization helps improve memcached performance. Note: these tests were disk bound; we will be testing with servers containing a higher number of drives.
  • 26. ECMP DLB
  • 28. Test bed
      • Hardware: 1 x Nexus 9508 (Spine); 2 x Nexus 9396 (Leaf); 8 x UCS C240 M3, each with 2 x Xeon(R) CPU E5-2690 0 @ 2.90GHz, 100G RAM, 4 x 1TB drives, UCS VIC 1225
      • Software: Red Hat 6.2; Cloudera 5; NXOS 6.1(2)I2(1)
  • 29. Open framework; modules will be on GitHub. Based on Apache Hadoop REST APIs and NXOS Python APIs.
  • 30. Open framework, based on Apache Hadoop REST APIs and NXOS Python APIs (NXAPI). Scripts on GitHub: https://github.com/datacenter/VisibilityPlugins
      leaf-001# python bootflash:hadoopModule/vpmHadoop.py localNodes
      ---------------------------------------------------------------
      Node Name     Port    Speed  InRate      OutRate     Buffer
      ---------------------------------------------------------------
      c240-m3-017   Eth1/1  10     1385580232  1128094464  0
      c240-m3-018   Eth1/2  10     1882651664  1273181224  50
      c240-m3-020   Eth1/3  10     1236743432  1238902664  0
      c240-m3-021   Eth1/4  10     675482600   1292790704  0
      ---------------------------------------------------------------
      In/Out Rate is the avg of last 30 seconds in Gbits/sec
  • 31. Another example, with job info (NXAPI). Scripts on GitHub: https://github.com/datacenter/VisibilityPlugins
      leaf-001# python bootflash:hadoopModule/vpmHadoop.py allNodesJobs
      Node Name     Port    Neighbor   Speed
      --------------------------------------
      c240-m3-001   Eth2/2  spine-001  40
      c240-m3-002   Eth2/2  spine-001  40
      c240-m3-004   Eth2/2  spine-001  40
      c240-m3-005   Eth2/2  spine-001  40
      c240-m3-017   Eth1/1  local      10
      c240-m3-018   Eth1/2  local      10
      c240-m3-020   Eth1/3  local      10
      c240-m3-021   Eth1/4  local      10
      Job ID  Job Name  Host Name    Application  Progress %
      ---------------------------------------------------------------------
      0887    TeraGen   c240-m3-001  MAPREDUCE    37.7623
      ---------------------------------------------------------------------
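The job table above has the same fields that the YARN ResourceManager REST API exposes for running applications (`id`, `name`, `applicationType`, `progress`, `amHostHttpAddress`). The sketch below shows how such rows could be derived from a `/ws/v1/cluster/apps` response. It is not taken from the plugin source: the sample payload is hypothetical, and mapping the short job ID to the last segment of the application ID is an assumption.

```python
import json
from typing import List, Tuple

# Hypothetical payload shaped like a ResourceManager /ws/v1/cluster/apps response.
SAMPLE_RESPONSE = """
{"apps": {"app": [
  {"id": "application_1400000000000_0887",
   "name": "TeraGen",
   "applicationType": "MAPREDUCE",
   "progress": 37.7623,
   "amHostHttpAddress": "c240-m3-001:8042"}
]}}
"""


def job_rows(payload: str) -> List[Tuple[str, str, str, str, float]]:
    """Turn an apps response into (job id, job name, host, application type, progress) rows."""
    # The API returns "apps": null when nothing is running, hence the `or {}`.
    apps = (json.loads(payload).get("apps") or {}).get("app", [])
    rows = []
    for app in apps:
        short_id = app["id"].rsplit("_", 1)[-1]            # assumption: table shows the last segment
        host = app.get("amHostHttpAddress", "").split(":")[0]
        rows.append((short_id, app["name"], host, app["applicationType"], app["progress"]))
    return rows


for row in job_rows(SAMPLE_RESPONSE):
    print("{:<8}{:<10}{:<14}{:<12}{}".format(*row))
```

Against a live cluster the payload would come from an HTTP GET on the ResourceManager web port (8088 by default) rather than from a string constant.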
  • 32. Instantaneous results; historic data using Graphite; topology.
  • 34. Time taken for 1TB Sort: SSD vs. Non-SSD Drives (bar chart; y-axis: time taken in seconds, 0 to 2500)
  • 35. Time taken for 1TB Sort: THP Always vs. THP Never (bar chart; y-axis: time taken in seconds, 0 to 3000). To disable THP (transparent huge pages): echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
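Before and after running the `echo never` command it helps to confirm which mode is active. The sysfs file lists the available modes and marks the active one with square brackets, so a small parser is enough. This is a minimal sketch; the default path is the Red Hat 6 location from the slide, and other kernels use `/sys/kernel/mm/transparent_hugepage/enabled` instead.

```python
RHEL6_THP_PATH = "/sys/kernel/mm/redhat_transparent_hugepage/enabled"


def active_thp_mode(sysfs_text: str) -> str:
    """Return the bracketed (active) mode from text such as 'always madvise [never]'."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError("no active mode found in: %r" % sysfs_text)


def read_thp_mode(path: str = RHEL6_THP_PATH) -> str:
    """Read the sysfs file and return the active mode (needs a Linux host with THP support)."""
    with open(path) as handle:
        return active_thp_mode(handle.read())


print(active_thp_mode("[always] madvise never"))
print(active_thp_mode("always madvise [never]"))
```

Note that a value written with `echo` does not persist across reboots, so the check is worth repeating after a restart.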
  • 37. Integration Considerations. Network attributes: Architecture; Availability; Capacity, Scale & Oversubscription; Flexibility; Management & Visibility.
  • 38. Availability
      • Single NIC failure doubles the job completion time.
      • Dual NIC has no impact on job completion time.
      • Effective load-sharing of traffic flow on two NICs: NIC bonding configured in Linux, with the LACP mode of bonding.
      • Recommended to change the hashing to src-dst-ip-port (both on the network and for NIC bonding in Linux) for optimal load-sharing.
      • Chart labels: 1161 min, 286 min
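Why including ports in the hash matters can be shown with a toy model of two bonded NICs. The hash below is a simplified illustration modelled on the layer3+4 formula described in older Linux bonding documentation; treat the exact formula, the addresses, and the port numbers as assumptions rather than the behavior of any particular kernel or switch.

```python
import ipaddress
from typing import List


def ip_only_hash(src_ip: str, dst_ip: str, links: int) -> int:
    """Pick a link from the IP pair alone: one host pair always maps to one link."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return ((src ^ dst) & 0xFFFF) % links


def ip_port_hash(src_ip: str, dst_ip: str, src_port: int, dst_port: int, links: int) -> int:
    """Pick a link from IPs and ports (src-dst-ip-port), so separate flows can differ."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return ((src_port ^ dst_port) ^ ((src ^ dst) & 0xFFFF)) % links


# Eight hypothetical TCP flows between the same two hosts, differing only in source port.
SRC, DST, DST_PORT, LINKS = "10.0.0.17", "10.0.0.18", 50010, 2
src_ports: List[int] = list(range(50000, 50008))

by_ip = [ip_only_hash(SRC, DST, LINKS) for _ in src_ports]
by_ip_port = [ip_port_hash(SRC, DST, port, DST_PORT, LINKS) for port in src_ports]

print("IP-only hash links: ", by_ip)
print("IP+port hash links: ", by_ip_port)
```

With the IP-only hash every flow between the two hosts lands on the same NIC, while the IP-and-port hash spreads the flows over both, which is the load-sharing effect the recommendation is after.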
  • 39. Single 1GE: 100% utilized. Dual 1GE: 75% utilized. 10GE: 40% utilized.
  • 40. Q & A. Big Data @ Cisco - www.cisco.com/go/bigdata. For further questions and a chance to win a Beats headset, meet us in Booth P12.