SlideShare uma empresa Scribd logo
1 de 32
Adaptive Transfer Adjustment in Efficient Bulk
Data Transfer Management for Climate
Datasets
Alex Sim, Mehmet Balman, Dean Williams,
Arie Shoshani, Vijaya Natarajan
Lawrence Berkeley National Laboratory
Lawrence Livermore National Laboratory
The 22nd IASTED International Conference on Parallel and Distributed Computing and Systems
PDCS2010 - Nov 9th, 2010
Earth System Grid
ESG (Earth System Grid)
Supports the infrastructure for climate research
Provide technology to access, distributed, transport, catalog climate
simulation data
Production since 2004
ESG-I (1999-2001)
ESG-II (2002-2006)
ESG-CET(2006-present)
ANL, LANL, LBNL, LLNL,
NCAR, ORNL, NERSC, …
NCAR CCSM ESG portal
237 TB of data at four locations: (NCAR, LBNL, ORNL, LANL)
965,551 files
Includes the past 7 years of joint DOE/NSF climate modeling experiments
LLNL CMIP-3 (IPCC AR4) ESG portal
35 TB of data at one location
83,337 files
model data from 13 countries
Generated by a modeling campaign coordinated by the
Intergovernmental Panel on Climate Change (IPCC)
Over 565 scientific peer-review publications
ESG (Earth System Grid) / Climate Simulation Data
ESG (Earth System Grid) / Climate Simulation Data
ESG web portals distributed worldwide
Over 2,700 sites
120 countries
25,000 users
Over 1 PB downloaded
Climate Data : ever-increasing sizes
Early 1990’s (e.g., AMIP1, PMIP, CMIP1)
modest collection of monthly mean 2D files: ~1 GB
Late 1990’s (e.g., AMIP2)
large collection of monthly mean and 6-hourly 2D and 3D fields:
~500 GB
2000’s (e.g., IPCC/CMIP3)
fairly comprehensive output from both ocean and atmospheric
components; monthly, daily, and 3 hourly: ~35 TB
2011:
The IPCC 5th Assessment Report (AR5) in 2011: expected 5 to 15 PB
The Climate Science Computational End Station (CCES) project at
ORNL: expected around 3 PB
The North American Regional Climate Change Assessment Program
(NARCCAP): expected around 1 PB
The Cloud Feedback Model Intercomparison Project (CFMIP) archives:
expected to be .3 PB
ESG (Earth System Grid) / Climate Simulation Data
Results from the Parallel Climate Model (PCM)
depicting wind vectors, surface pressure, sea
surface temperature, and sea ice concentration.
Prepared from data published in the ESG using the
FERRET analysis tool by Gary Strand, NCAR.
Massive data collections:
● shared by thousands of
researchers
● distributed among many data
nodes around the world
Replication of published core
collection (for scalability and
availability)
Bulk Data Movement in ESG
● Move terabytes to petabytes (many thousands of files)
● Extreme variance in file sizes
● Reliability and Robustness
● Asynchronous long-lasting operation
● Recovery from transient failures and automatic restart
● Support for checksum verification
● On-demand transfer request status
● Estimation of request completion time
Replication Use-case
Compute Nodes
Compute Nodes
Data Flow
Data Nodes
Gateways
Compute Nodes
Clients
Data Nodes
Data Node
Japan Gateway
Germany Gateway
Data Node
Data Nodes
Data Node
Data Node
Data Nodes Data Nodes
Data Nodes
Australia Gateway
NCAR Gateway
LLNL Gateway
ORNL Gateway
Canada Gateway
LBNL Gateway
UK Gateway
Compute Nodes
Compute Nodes
WAN
Distributed File System
. . .
: : : : : :
BeStMan . . .
Data Node
Load Balancing Module
Parallelism / Concurrency
Module
File Replication Module
Transfer Servers
. . .
. . .
. . .
. . .
. . .
Hot File Replications
Transfer Servers Transfer Servers
Data Flow
Faster Data Transfers
End-to-end bulk data transfer (latency wall)
 TCP based solutions
 Fast TCP, Scalable TCP etc
 UDP based solutions
 RBUDP, UDT etc
 Most of these solutions require kernel level changes
 Not preferred by most domain scientists
Application Level Tuning
 Take an application-level transfer protocol (i.e.
GridFTP) and tune-up for better performance:
 Using Multiple (Parallel) streams
 Tuning Buffer size
(efficient utilization of available network capacity)
Level of Parallelism in End-to-end Data Transfer
 number of parallel data streams connected to a data transfer
service for increasing the utilization of network bandwidth
 number of concurrent data transfer operations that are
initiated at the same time for better utilization of system
resources.
Parallel TCP Streams
 Instead of a single connection at a time, multiple TCP
streams are opened to a single data transfer service in
the destination host.
 We gain larger bandwidth in TCP especially in a
network with less packet loss rate; parallel connections
better utilize the TCP buffer available to the data
transfer, such that N connections might be N times
faster than a single connection
 Multiple TCP streams puts extra system overhead
Parallel TCP Streams
Bulk Data Mover (BDM)
● Monitoring and statistics collection
● Support for multiple protocols (GridFTP, HTTP)
● Load balancing between multiple servers
● Data channel connection caching, pipelining
● Parallel TCP stream, concurrent data transfer operations
● Multi-threaded transfer queue management
● Adaptability to the available bandwidth
Bulk Data Mover (BDM)
DBFileFile File
…
FileFileFileFile
File File File
…
WAN
Network Connection and
Concurrency Manager
Transfer
Queue
Network
Connections
Transfer
Control Manager
Transfer Queue
Monitor and Manager
Concurrent Connections
…
DB/Storage
Queue Manager
Local Storage
…
File
File
…
File
File
…
File
File
…
Transfer Queue Management
* Plots generated from NetLogger
time time
timetime
The number of concurrent
transfers on the left
column shows consistent
over time in well-managed
transfers shown at the
bottom row, compared to
the ill or non-managed
data connections shown at
the top row. It leads to the
higher overall throughput
performance on the lower-
right column.
Adaptive Transfer Management
Bulk Data Mover (BDM)
● number parallel TCP streams?
● number of concurrent data transfer operations?
● Adaptability to the available bandwidth
Parallel Stream vs Concurrent Transfer
16x232x1 4x8
• Same number of total streams, but different number of concurrent connections
Parameter Estimation
 Can we predict this
behavior?
 Yes, we can come up with
a good estimation for the
parallelism level
 Network statistics
 Extra measurement
 Historical data
Parallel Stream Optimization
single stream, theoretical calculation of
throughput based on MSS, RTT and packet loss
rate:
n streams gains as much as total throughput of
n single stream: (not correct)
A better model: a relation is established
between RTT, p and the number of streams n:
Parameter Estimation
 Might not reflect the best possible current settings
(Dynamic Environment)
 What if network condition changes?
 Requires three sample transfers (curve fitting)
 need to probe the system and make
measurements with external profilers
 Does require a complex model for parameter
optimization
Adaptive Tuning
 Instead of predictive sampling, use data from
actual transfer
 transfer data by chunks (partial transfers) and
also set control parameters on the fly.
 measure throughput for every transferred data
chunk
 gradually increase the number of parallel
streams till it comes to an equilibrium point
Adaptive Tuning
 No need to probe the system and make
measurements with external profilers
 Does not require any complex model for
parameter optimization
 Adapts to changing environment
But, overhead in changing parallelism level
Fast start (exponentially increase the number of
parallel streams)
Adaptive Tuning
 Start with single stream (n=1)
 Measure instant throughput for every data chunk transferred
(fast start)
 Increase the number of parallel streams (n=n*2),
 transfer the data chunk
 measure instant throughput
 If current throughput value is better than previous one,
continue
 Otherwise, set n to the old value and gradually increase
parallelism level (n=n+1)
 If no throughput gain by increasing number of streams
(found the equilibrium point)
 Increase chunk size (delay measurement period)
Dynamic Tuning Algorithm
Dynamic Tuning Algorithm
Dynamic Tuning Algorithm
Parallel Streams (estimate starting point)
Log-log scale
Can we predict
this behavior?
Power-law model
Achievable throughput in percentage over the
number of streams with low/medium/high RTT;
(a) RTT=1ms, (b) RTT=5ms, (c) RTT=10ms, (d) RTT=30ms, (e)
RTT=70ms, (f) RTT=140ms (c=100 (n/c)<1 k =300 max RRT)
Power-law model
T = (n / c) (RTT / k)
80-20 rule-Pareto dist.
0.8 = (n / c) (RTT / k)
n = (e ( k * ln 0.8 / RTT )
) ∙ c
Extend power-law model:
Unlike other models in the literature (trying to find an approximation model for
the multiple streams and throughput relationship), this model only focuses on the
initial behavior of the transfer performance. When RTT is low, the achievable
throughput starts high with the low number of streams and quickly approaches to
the optimal throughput. When RTT is high, more number of streams is needed
for higher achievable throughput.
Get prepared to next-generation networks:
-100Gbps
-RDMA
Future Work
The 22nd IASTED International Conference on Parallel and Distributed Computing and Systems
PDCS2010 - Nov 9th, 2010
Earth System Grid
Bulk Data Mover
http://sdm.lbl.gov/bdm/
Earth System Grid
http://www.earthsystemgrid.org
http://esg-pcmdi.llnl.gov/
Support emails
esg-support@earthsystemgrid.org
srm@lbl.gov

Mais conteúdo relacionado

Mais procurados

Download-manuals-surface water-waterlevel-40howtocompiledischargedata
 Download-manuals-surface water-waterlevel-40howtocompiledischargedata Download-manuals-surface water-waterlevel-40howtocompiledischargedata
Download-manuals-surface water-waterlevel-40howtocompiledischargedatahydrologyproject001
 
5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streamsKrish_ver2
 
Efficient processing of Rank-aware queries in Map/Reduce
Efficient processing of Rank-aware queries in Map/ReduceEfficient processing of Rank-aware queries in Map/Reduce
Efficient processing of Rank-aware queries in Map/ReduceSpiros Oikonomakis
 
Many Task Applications for Grids and Supercomputers
Many Task Applications for Grids and SupercomputersMany Task Applications for Grids and Supercomputers
Many Task Applications for Grids and SupercomputersIan Foster
 
Paper2_CSE6331_Vivek_1001053883
Paper2_CSE6331_Vivek_1001053883Paper2_CSE6331_Vivek_1001053883
Paper2_CSE6331_Vivek_1001053883Vivek Sharma
 
HACC: Fitting the Universe Inside a Supercomputer
HACC: Fitting the Universe Inside a SupercomputerHACC: Fitting the Universe Inside a Supercomputer
HACC: Fitting the Universe Inside a Supercomputerinside-BigData.com
 
Hadoop secondary sort and a custom comparator
Hadoop secondary sort and a custom comparatorHadoop secondary sort and a custom comparator
Hadoop secondary sort and a custom comparatorSubhas Kumar Ghosh
 
Rinfret, Jonathan poster(2)
Rinfret, Jonathan poster(2)Rinfret, Jonathan poster(2)
Rinfret, Jonathan poster(2)Jonathan Rinfret
 
A Study on Task Scheduling in Could Data Centers for Energy Efficacy
A Study on Task Scheduling in Could Data Centers for Energy Efficacy A Study on Task Scheduling in Could Data Centers for Energy Efficacy
A Study on Task Scheduling in Could Data Centers for Energy Efficacy Ehsan Sharifi
 
Pcgrid presentation qos p2p grid
Pcgrid presentation   qos p2p gridPcgrid presentation   qos p2p grid
Pcgrid presentation qos p2p gridmarcuswac
 
LOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTING
LOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTINGLOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTING
LOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTINGijccsa
 
Mining high speed data streams: Hoeffding and VFDT
Mining high speed data streams: Hoeffding and VFDTMining high speed data streams: Hoeffding and VFDT
Mining high speed data streams: Hoeffding and VFDTDavide Gallitelli
 
PIC Tier-1 (LHCP Conference / Barcelona)
PIC Tier-1 (LHCP Conference / Barcelona)PIC Tier-1 (LHCP Conference / Barcelona)
PIC Tier-1 (LHCP Conference / Barcelona)Josep Flix
 
NNPDF3.0: parton distributions for the LHC Run II
NNPDF3.0: parton distributions for the LHC Run IINNPDF3.0: parton distributions for the LHC Run II
NNPDF3.0: parton distributions for the LHC Run IIjuanrojochacon
 
An Enhanced Support Vector Regression Model for Weather Forecasting
An Enhanced Support Vector Regression Model for Weather ForecastingAn Enhanced Support Vector Regression Model for Weather Forecasting
An Enhanced Support Vector Regression Model for Weather ForecastingIOSR Journals
 
Project Matsu: Elastic Clouds for Disaster Relief
Project Matsu: Elastic Clouds for Disaster ReliefProject Matsu: Elastic Clouds for Disaster Relief
Project Matsu: Elastic Clouds for Disaster ReliefRobert Grossman
 
829 tdwg-2015-nicolson-kew-strings-to-things
829 tdwg-2015-nicolson-kew-strings-to-things829 tdwg-2015-nicolson-kew-strings-to-things
829 tdwg-2015-nicolson-kew-strings-to-thingsnickyn
 
Continental division of load and balanced ant
Continental division of load and balanced antContinental division of load and balanced ant
Continental division of load and balanced antIJCI JOURNAL
 

Mais procurados (19)

Download-manuals-surface water-waterlevel-40howtocompiledischargedata
 Download-manuals-surface water-waterlevel-40howtocompiledischargedata Download-manuals-surface water-waterlevel-40howtocompiledischargedata
Download-manuals-surface water-waterlevel-40howtocompiledischargedata
 
5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streams
 
Efficient processing of Rank-aware queries in Map/Reduce
Efficient processing of Rank-aware queries in Map/ReduceEfficient processing of Rank-aware queries in Map/Reduce
Efficient processing of Rank-aware queries in Map/Reduce
 
Many Task Applications for Grids and Supercomputers
Many Task Applications for Grids and SupercomputersMany Task Applications for Grids and Supercomputers
Many Task Applications for Grids and Supercomputers
 
Paper2_CSE6331_Vivek_1001053883
Paper2_CSE6331_Vivek_1001053883Paper2_CSE6331_Vivek_1001053883
Paper2_CSE6331_Vivek_1001053883
 
FDSE2015
FDSE2015FDSE2015
FDSE2015
 
HACC: Fitting the Universe Inside a Supercomputer
HACC: Fitting the Universe Inside a SupercomputerHACC: Fitting the Universe Inside a Supercomputer
HACC: Fitting the Universe Inside a Supercomputer
 
Hadoop secondary sort and a custom comparator
Hadoop secondary sort and a custom comparatorHadoop secondary sort and a custom comparator
Hadoop secondary sort and a custom comparator
 
Rinfret, Jonathan poster(2)
Rinfret, Jonathan poster(2)Rinfret, Jonathan poster(2)
Rinfret, Jonathan poster(2)
 
A Study on Task Scheduling in Could Data Centers for Energy Efficacy
A Study on Task Scheduling in Could Data Centers for Energy Efficacy A Study on Task Scheduling in Could Data Centers for Energy Efficacy
A Study on Task Scheduling in Could Data Centers for Energy Efficacy
 
Pcgrid presentation qos p2p grid
Pcgrid presentation   qos p2p gridPcgrid presentation   qos p2p grid
Pcgrid presentation qos p2p grid
 
LOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTING
LOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTINGLOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTING
LOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTING
 
Mining high speed data streams: Hoeffding and VFDT
Mining high speed data streams: Hoeffding and VFDTMining high speed data streams: Hoeffding and VFDT
Mining high speed data streams: Hoeffding and VFDT
 
PIC Tier-1 (LHCP Conference / Barcelona)
PIC Tier-1 (LHCP Conference / Barcelona)PIC Tier-1 (LHCP Conference / Barcelona)
PIC Tier-1 (LHCP Conference / Barcelona)
 
NNPDF3.0: parton distributions for the LHC Run II
NNPDF3.0: parton distributions for the LHC Run IINNPDF3.0: parton distributions for the LHC Run II
NNPDF3.0: parton distributions for the LHC Run II
 
An Enhanced Support Vector Regression Model for Weather Forecasting
An Enhanced Support Vector Regression Model for Weather ForecastingAn Enhanced Support Vector Regression Model for Weather Forecasting
An Enhanced Support Vector Regression Model for Weather Forecasting
 
Project Matsu: Elastic Clouds for Disaster Relief
Project Matsu: Elastic Clouds for Disaster ReliefProject Matsu: Elastic Clouds for Disaster Relief
Project Matsu: Elastic Clouds for Disaster Relief
 
829 tdwg-2015-nicolson-kew-strings-to-things
829 tdwg-2015-nicolson-kew-strings-to-things829 tdwg-2015-nicolson-kew-strings-to-things
829 tdwg-2015-nicolson-kew-strings-to-things
 
Continental division of load and balanced ant
Continental division of load and balanced antContinental division of load and balanced ant
Continental division of load and balanced ant
 

Semelhante a Pdcs2010 balman-presentation

Presentation southernstork 2009-nov-southernworkshop
Presentation southernstork 2009-nov-southernworkshopPresentation southernstork 2009-nov-southernworkshop
Presentation southernstork 2009-nov-southernworkshopbalmanme
 
Query optimization for_sensor_networks
Query optimization for_sensor_networksQuery optimization for_sensor_networks
Query optimization for_sensor_networksHarshavardhan Achrekar
 
Dynamic adaptation balman
Dynamic adaptation balmanDynamic adaptation balman
Dynamic adaptation balmanbalmanme
 
Lblc sseminar jun09-2009-jun09-lblcsseminar
Lblc sseminar jun09-2009-jun09-lblcsseminarLblc sseminar jun09-2009-jun09-lblcsseminar
Lblc sseminar jun09-2009-jun09-lblcsseminarbalmanme
 
Balman climate-c sc-ads-2011
Balman climate-c sc-ads-2011Balman climate-c sc-ads-2011
Balman climate-c sc-ads-2011balmanme
 
REDUCING THE MONITORING REGISTER FOR THE DETECTION OF ANOMALIES IN SOFTWARE D...
REDUCING THE MONITORING REGISTER FOR THE DETECTION OF ANOMALIES IN SOFTWARE D...REDUCING THE MONITORING REGISTER FOR THE DETECTION OF ANOMALIES IN SOFTWARE D...
REDUCING THE MONITORING REGISTER FOR THE DETECTION OF ANOMALIES IN SOFTWARE D...csandit
 
Streaming exa-scale data over 100Gbps networks
Streaming exa-scale data over 100Gbps networksStreaming exa-scale data over 100Gbps networks
Streaming exa-scale data over 100Gbps networksbalmanme
 
Experiences with High-bandwidth Networks
Experiences with High-bandwidth NetworksExperiences with High-bandwidth Networks
Experiences with High-bandwidth Networksbalmanme
 
Taming Big Data!
Taming Big Data!Taming Big Data!
Taming Big Data!Ian Foster
 
Fast Data Collection with Interference and Life Time in Tree Based Wireless S...
Fast Data Collection with Interference and Life Time in Tree Based Wireless S...Fast Data Collection with Interference and Life Time in Tree Based Wireless S...
Fast Data Collection with Interference and Life Time in Tree Based Wireless S...IJMER
 
ORCHESTRATING BULK DATA TRANSFERS ACROSS GEO-DISTRIBUTED DATACENTERS
ORCHESTRATING BULK DATA TRANSFERS ACROSS GEO-DISTRIBUTED DATACENTERSORCHESTRATING BULK DATA TRANSFERS ACROSS GEO-DISTRIBUTED DATACENTERS
ORCHESTRATING BULK DATA TRANSFERS ACROSS GEO-DISTRIBUTED DATACENTERSNexgen Technology
 
Orchestrating bulk data transfers across
Orchestrating bulk data transfers acrossOrchestrating bulk data transfers across
Orchestrating bulk data transfers acrossnexgentech15
 
Orchestrating Bulk Data Transfers across Geo-Distributed Datacenters
 Orchestrating Bulk Data Transfers across Geo-Distributed Datacenters Orchestrating Bulk Data Transfers across Geo-Distributed Datacenters
Orchestrating Bulk Data Transfers across Geo-Distributed Datacentersnexgentechnology
 
A location based least-cost scheduling for data-intensive applications
A location based least-cost scheduling for data-intensive applicationsA location based least-cost scheduling for data-intensive applications
A location based least-cost scheduling for data-intensive applicationsIAEME Publication
 
From System Logs to System Optimization with the Power of Data Science and Ma...
From System Logs to System Optimization with the Power of Data Science and Ma...From System Logs to System Optimization with the Power of Data Science and Ma...
From System Logs to System Optimization with the Power of Data Science and Ma...Globus
 

Semelhante a Pdcs2010 balman-presentation (20)

Presentation southernstork 2009-nov-southernworkshop
Presentation southernstork 2009-nov-southernworkshopPresentation southernstork 2009-nov-southernworkshop
Presentation southernstork 2009-nov-southernworkshop
 
Query optimization for_sensor_networks
Query optimization for_sensor_networksQuery optimization for_sensor_networks
Query optimization for_sensor_networks
 
Dynamic adaptation balman
Dynamic adaptation balmanDynamic adaptation balman
Dynamic adaptation balman
 
Lblc sseminar jun09-2009-jun09-lblcsseminar
Lblc sseminar jun09-2009-jun09-lblcsseminarLblc sseminar jun09-2009-jun09-lblcsseminar
Lblc sseminar jun09-2009-jun09-lblcsseminar
 
Balman climate-c sc-ads-2011
Balman climate-c sc-ads-2011Balman climate-c sc-ads-2011
Balman climate-c sc-ads-2011
 
REDUCING THE MONITORING REGISTER FOR THE DETECTION OF ANOMALIES IN SOFTWARE D...
REDUCING THE MONITORING REGISTER FOR THE DETECTION OF ANOMALIES IN SOFTWARE D...REDUCING THE MONITORING REGISTER FOR THE DETECTION OF ANOMALIES IN SOFTWARE D...
REDUCING THE MONITORING REGISTER FOR THE DETECTION OF ANOMALIES IN SOFTWARE D...
 
Journal paper 1
Journal paper 1Journal paper 1
Journal paper 1
 
Streaming exa-scale data over 100Gbps networks
Streaming exa-scale data over 100Gbps networksStreaming exa-scale data over 100Gbps networks
Streaming exa-scale data over 100Gbps networks
 
Experiences with High-bandwidth Networks
Experiences with High-bandwidth NetworksExperiences with High-bandwidth Networks
Experiences with High-bandwidth Networks
 
Taming Big Data!
Taming Big Data!Taming Big Data!
Taming Big Data!
 
Fast Data Collection with Interference and Life Time in Tree Based Wireless S...
Fast Data Collection with Interference and Life Time in Tree Based Wireless S...Fast Data Collection with Interference and Life Time in Tree Based Wireless S...
Fast Data Collection with Interference and Life Time in Tree Based Wireless S...
 
Bg4101335337
Bg4101335337Bg4101335337
Bg4101335337
 
cpc-152-2-2003
cpc-152-2-2003cpc-152-2-2003
cpc-152-2-2003
 
ORCHESTRATING BULK DATA TRANSFERS ACROSS GEO-DISTRIBUTED DATACENTERS
ORCHESTRATING BULK DATA TRANSFERS ACROSS GEO-DISTRIBUTED DATACENTERSORCHESTRATING BULK DATA TRANSFERS ACROSS GEO-DISTRIBUTED DATACENTERS
ORCHESTRATING BULK DATA TRANSFERS ACROSS GEO-DISTRIBUTED DATACENTERS
 
Orchestrating bulk data transfers across
Orchestrating bulk data transfers acrossOrchestrating bulk data transfers across
Orchestrating bulk data transfers across
 
Orchestrating Bulk Data Transfers across Geo-Distributed Datacenters
 Orchestrating Bulk Data Transfers across Geo-Distributed Datacenters Orchestrating Bulk Data Transfers across Geo-Distributed Datacenters
Orchestrating Bulk Data Transfers across Geo-Distributed Datacenters
 
DIET_BLAST
DIET_BLASTDIET_BLAST
DIET_BLAST
 
A location based least-cost scheduling for data-intensive applications
A location based least-cost scheduling for data-intensive applicationsA location based least-cost scheduling for data-intensive applications
A location based least-cost scheduling for data-intensive applications
 
From System Logs to System Optimization with the Power of Data Science and Ma...
From System Logs to System Optimization with the Power of Data Science and Ma...From System Logs to System Optimization with the Power of Data Science and Ma...
From System Logs to System Optimization with the Power of Data Science and Ma...
 
Telegraph Cq English
Telegraph Cq EnglishTelegraph Cq English
Telegraph Cq English
 

Mais de balmanme

Network-aware Data Management for Large Scale Distributed Applications, IBM R...
Network-aware Data Management for Large Scale Distributed Applications, IBM R...Network-aware Data Management for Large Scale Distributed Applications, IBM R...
Network-aware Data Management for Large Scale Distributed Applications, IBM R...balmanme
 
Network-aware Data Management for High Throughput Flows Akamai, Cambridge, ...
Network-aware Data Management for High Throughput Flows   Akamai, Cambridge, ...Network-aware Data Management for High Throughput Flows   Akamai, Cambridge, ...
Network-aware Data Management for High Throughput Flows Akamai, Cambridge, ...balmanme
 
Hpcwire100gnetworktosupportbigscience 130725203822-phpapp01-1
Hpcwire100gnetworktosupportbigscience 130725203822-phpapp01-1Hpcwire100gnetworktosupportbigscience 130725203822-phpapp01-1
Hpcwire100gnetworktosupportbigscience 130725203822-phpapp01-1balmanme
 
A 100 gigabit highway for science: researchers take a 'test drive' on ani tes...
A 100 gigabit highway for science: researchers take a 'test drive' on ani tes...A 100 gigabit highway for science: researchers take a 'test drive' on ani tes...
A 100 gigabit highway for science: researchers take a 'test drive' on ani tes...balmanme
 
Balman stork cw09
Balman stork cw09Balman stork cw09
Balman stork cw09balmanme
 
Available technologies: algorithm for flexible bandwidth reservations for dat...
Available technologies: algorithm for flexible bandwidth reservations for dat...Available technologies: algorithm for flexible bandwidth reservations for dat...
Available technologies: algorithm for flexible bandwidth reservations for dat...balmanme
 
Berkeley lab team develops flexible reservation algorithm for advance network...
Berkeley lab team develops flexible reservation algorithm for advance network...Berkeley lab team develops flexible reservation algorithm for advance network...
Berkeley lab team develops flexible reservation algorithm for advance network...balmanme
 
Nersc dtn-perf-100121.test_results-nercmeeting-jan21-2010
Nersc dtn-perf-100121.test_results-nercmeeting-jan21-2010Nersc dtn-perf-100121.test_results-nercmeeting-jan21-2010
Nersc dtn-perf-100121.test_results-nercmeeting-jan21-2010balmanme
 
Cybertools stork-2009-cybertools allhandmeeting-poster
Cybertools stork-2009-cybertools allhandmeeting-posterCybertools stork-2009-cybertools allhandmeeting-poster
Cybertools stork-2009-cybertools allhandmeeting-posterbalmanme
 
Presentation summerstudent 2009-aug09-lbl-summer
Presentation summerstudent 2009-aug09-lbl-summerPresentation summerstudent 2009-aug09-lbl-summer
Presentation summerstudent 2009-aug09-lbl-summerbalmanme
 
Balman dissertation Copyright @ 2010 Mehmet Balman
Balman dissertation Copyright @ 2010 Mehmet BalmanBalman dissertation Copyright @ 2010 Mehmet Balman
Balman dissertation Copyright @ 2010 Mehmet Balmanbalmanme
 
Aug17presentation.v2 2009-aug09-lblc sseminar
Aug17presentation.v2 2009-aug09-lblc sseminarAug17presentation.v2 2009-aug09-lblc sseminar
Aug17presentation.v2 2009-aug09-lblc sseminarbalmanme
 
Analyzing Data Movements and Identifying Techniques for Next-generation Networks
Analyzing Data Movements and Identifying Techniques for Next-generation NetworksAnalyzing Data Movements and Identifying Techniques for Next-generation Networks
Analyzing Data Movements and Identifying Techniques for Next-generation Networksbalmanme
 
MemzNet: Memory-Mapped Zero-copy Network Channel -- Streaming exascala data o...
MemzNet: Memory-Mapped Zero-copy Network Channel -- Streaming exascala data o...MemzNet: Memory-Mapped Zero-copy Network Channel -- Streaming exascala data o...
MemzNet: Memory-Mapped Zero-copy Network Channel -- Streaming exascala data o...balmanme
 
Opening ndm2012 sc12
Opening ndm2012 sc12Opening ndm2012 sc12
Opening ndm2012 sc12balmanme
 
Sc10 nov16th-flex res-presentation
Sc10 nov16th-flex res-presentation Sc10 nov16th-flex res-presentation
Sc10 nov16th-flex res-presentation balmanme
 
Welcome ndm11
Welcome ndm11Welcome ndm11
Welcome ndm11balmanme
 
2011 agu-town hall-100g
2011 agu-town hall-100g2011 agu-town hall-100g
2011 agu-town hall-100gbalmanme
 
Rdma presentation-kisti-v2
Rdma presentation-kisti-v2Rdma presentation-kisti-v2
Rdma presentation-kisti-v2balmanme
 
APM project meeting - June 13, 2012 - LBNL, Berkeley, CA
APM project meeting - June 13, 2012 - LBNL, Berkeley, CAAPM project meeting - June 13, 2012 - LBNL, Berkeley, CA
APM project meeting - June 13, 2012 - LBNL, Berkeley, CAbalmanme
 

Mais de balmanme (20)

Network-aware Data Management for Large Scale Distributed Applications, IBM R...
Network-aware Data Management for Large Scale Distributed Applications, IBM R...Network-aware Data Management for Large Scale Distributed Applications, IBM R...
Network-aware Data Management for Large Scale Distributed Applications, IBM R...
 
Network-aware Data Management for High Throughput Flows Akamai, Cambridge, ...
Network-aware Data Management for High Throughput Flows   Akamai, Cambridge, ...Network-aware Data Management for High Throughput Flows   Akamai, Cambridge, ...
Network-aware Data Management for High Throughput Flows Akamai, Cambridge, ...
 
Hpcwire100gnetworktosupportbigscience 130725203822-phpapp01-1
Hpcwire100gnetworktosupportbigscience 130725203822-phpapp01-1Hpcwire100gnetworktosupportbigscience 130725203822-phpapp01-1
Hpcwire100gnetworktosupportbigscience 130725203822-phpapp01-1
 
A 100 gigabit highway for science: researchers take a 'test drive' on ani tes...
A 100 gigabit highway for science: researchers take a 'test drive' on ani tes...A 100 gigabit highway for science: researchers take a 'test drive' on ani tes...
A 100 gigabit highway for science: researchers take a 'test drive' on ani tes...
 
Balman stork cw09
Balman stork cw09Balman stork cw09
Balman stork cw09
 
Available technologies: algorithm for flexible bandwidth reservations for dat...
Available technologies: algorithm for flexible bandwidth reservations for dat...Available technologies: algorithm for flexible bandwidth reservations for dat...
Available technologies: algorithm for flexible bandwidth reservations for dat...
 
Berkeley lab team develops flexible reservation algorithm for advance network...
Berkeley lab team develops flexible reservation algorithm for advance network...Berkeley lab team develops flexible reservation algorithm for advance network...
Berkeley lab team develops flexible reservation algorithm for advance network...
 
Nersc dtn-perf-100121.test_results-nercmeeting-jan21-2010
Nersc dtn-perf-100121.test_results-nercmeeting-jan21-2010Nersc dtn-perf-100121.test_results-nercmeeting-jan21-2010
Nersc dtn-perf-100121.test_results-nercmeeting-jan21-2010
 
Cybertools stork-2009-cybertools allhandmeeting-poster
Cybertools stork-2009-cybertools allhandmeeting-posterCybertools stork-2009-cybertools allhandmeeting-poster
Cybertools stork-2009-cybertools allhandmeeting-poster
 
Presentation summerstudent 2009-aug09-lbl-summer
Presentation summerstudent 2009-aug09-lbl-summerPresentation summerstudent 2009-aug09-lbl-summer
Presentation summerstudent 2009-aug09-lbl-summer
 
Balman dissertation Copyright @ 2010 Mehmet Balman
Balman dissertation Copyright @ 2010 Mehmet BalmanBalman dissertation Copyright @ 2010 Mehmet Balman
Balman dissertation Copyright @ 2010 Mehmet Balman
 
Aug17presentation.v2 2009-aug09-lblc sseminar
Aug17presentation.v2 2009-aug09-lblc sseminarAug17presentation.v2 2009-aug09-lblc sseminar
Aug17presentation.v2 2009-aug09-lblc sseminar
 
Analyzing Data Movements and Identifying Techniques for Next-generation Networks
Analyzing Data Movements and Identifying Techniques for Next-generation NetworksAnalyzing Data Movements and Identifying Techniques for Next-generation Networks
Analyzing Data Movements and Identifying Techniques for Next-generation Networks
 
MemzNet: Memory-Mapped Zero-copy Network Channel -- Streaming exascala data o...
MemzNet: Memory-Mapped Zero-copy Network Channel -- Streaming exascala data o...MemzNet: Memory-Mapped Zero-copy Network Channel -- Streaming exascala data o...
MemzNet: Memory-Mapped Zero-copy Network Channel -- Streaming exascala data o...
 
Opening ndm2012 sc12
Opening ndm2012 sc12Opening ndm2012 sc12
Opening ndm2012 sc12
 
Sc10 nov16th-flex res-presentation
Sc10 nov16th-flex res-presentation Sc10 nov16th-flex res-presentation
Sc10 nov16th-flex res-presentation
 
Welcome ndm11
Welcome ndm11Welcome ndm11
Welcome ndm11
 
2011 agu-town hall-100g
2011 agu-town hall-100g2011 agu-town hall-100g
2011 agu-town hall-100g
 
Rdma presentation-kisti-v2
Rdma presentation-kisti-v2Rdma presentation-kisti-v2
Rdma presentation-kisti-v2
 
APM project meeting - June 13, 2012 - LBNL, Berkeley, CA
APM project meeting - June 13, 2012 - LBNL, Berkeley, CAAPM project meeting - June 13, 2012 - LBNL, Berkeley, CA
APM project meeting - June 13, 2012 - LBNL, Berkeley, CA
 

Último

Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 

Último (20)

Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 

Pdcs2010 balman-presentation

  • 1. Adaptive Transfer Adjustment in Efficient Bulk Data Transfer Management for Climate Datasets Alex Sim, Mehmet Balman, Dean Williams, Arie Shoshani, Vijaya Natarajan Lawrence Berkeley National Laboratory Lawrence Livermore National Laboratory The 22nd IASTED International Conference on Parallel and Distributed Computing and Systems PDCS2010 - Nov 9th, 2010 Earth System Grid
  • 2. ESG (Earth System Grid) Supports the infrastructure for climate research Provide technology to access, distributed, transport, catalog climate simulation data Production since 2004 ESG-I (1999-2001) ESG-II (2002-2006) ESG-CET(2006-present) ANL, LANL, LBNL, LLNL, NCAR, ORNL, NERSC, …
  • 3. NCAR CCSM ESG portal 237 TB of data at four locations: (NCAR, LBNL, ORNL, LANL) 965,551 files Includes the past 7 years of joint DOE/NSF climate modeling experiments LLNL CMIP-3 (IPCC AR4) ESG portal 35 TB of data at one location 83,337 files model data from 13 countries Generated by a modeling campaign coordinated by the Intergovernmental Panel on Climate Change (IPCC) Over 565 scientific peer-review publications ESG (Earth System Grid) / Climate Simulation Data
  • 4. ESG (Earth System Grid) / Climate Simulation Data ESG web portals distributed worldwide Over 2,700 sites 120 countries 25,000 users Over 1 PB downloaded
  • 5. Climate Data : ever-increasing sizes Early 1990’s (e.g., AMIP1, PMIP, CMIP1) modest collection of monthly mean 2D files: ~1 GB Late 1990’s (e.g., AMIP2) large collection of monthly mean and 6-hourly 2D and 3D fields: ~500 GB 2000’s (e.g., IPCC/CMIP3) fairly comprehensive output from both ocean and atmospheric components; monthly, daily, and 3 hourly: ~35 TB 2011: The IPCC 5th Assessment Report (AR5) in 2011: expected 5 to 15 PB The Climate Science Computational End Station (CCES) project at ORNL: expected around 3 PB The North American Regional Climate Change Assessment Program (NARCCAP): expected around 1 PB The Cloud Feedback Model Intercomparison Project (CFMIP) archives: expected to be .3 PB
  • 6. ESG (Earth System Grid) / Climate Simulation Data Results from the Parallel Climate Model (PCM) depicting wind vectors, surface pressure, sea surface temperature, and sea ice concentration. Prepared from data published in the ESG using the FERRET analysis tool by Gary Strand, NCAR. Massive data collections: ● shared by thousands of researchers ● distributed among many data nodes around the world Replication of published core collection (for scalability and availability)
  • 7. Bulk Data Movement in ESG ● Move terabytes to petabytes (many thousands of files) ● Extreme variance in file sizes ● Reliability and Robustness ● Asynchronous long-lasting operation ● Recovery from transient failures and automatic restart ● Support for checksum verification ● On-demand transfer request status ● Estimation of request completion time
  • 8. Replication Use-case Compute Nodes Compute Nodes Data Flow Data Nodes Gateways Compute Nodes Clients Data Nodes Data Node Japan Gateway Germany Gateway Data Node Data Nodes Data Node Data Node Data Nodes Data Nodes Data Nodes Australia Gateway NCAR Gateway LLNL Gateway ORNL Gateway Canada Gateway LBNL Gateway UK Gateway Compute Nodes Compute Nodes WAN Distributed File System . . . : : : : : : BeStMan . . . Data Node Load Balancing Module Parallelism / Concurrency Module File Replication Module Transfer Servers . . . . . . . . . . . . . . . Hot File Replications Transfer Servers Transfer Servers Data Flow
  • 9. Faster Data Transfers End-to-end bulk data transfer (latency wall)  TCP based solutions  Fast TCP, Scalable TCP etc  UDP based solutions  RBUDP, UDT etc  Most of these solutions require kernel level changes  Not preferred by most domain scientists
  • 10. Application Level Tuning  Take an application-level transfer protocol (i.e. GridFTP) and tune-up for better performance:  Using Multiple (Parallel) streams  Tuning Buffer size (efficient utilization of available network capacity) Level of Parallelism in End-to-end Data Transfer  number of parallel data streams connected to a data transfer service for increasing the utilization of network bandwidth  number of concurrent data transfer operations that are initiated at the same time for better utilization of system resources.
  • 11. Parallel TCP Streams  Instead of a single connection at a time, multiple TCP streams are opened to a single data transfer service in the destination host.  We gain larger bandwidth in TCP especially in a network with less packet loss rate; parallel connections better utilize the TCP buffer available to the data transfer, such that N connections might be N times faster than a single connection  Multiple TCP streams puts extra system overhead
  • 13. Bulk Data Mover (BDM) ● Monitoring and statistics collection ● Support for multiple protocols (GridFTP, HTTP) ● Load balancing between multiple servers ● Data channel connection caching, pipelining ● Parallel TCP stream, concurrent data transfer operations ● Multi-threaded transfer queue management ● Adaptability to the available bandwidth
  • 14. Bulk Data Mover (BDM) DBFileFile File … FileFileFileFile File File File … WAN Network Connection and Concurrency Manager Transfer Queue Network Connections Transfer Control Manager Transfer Queue Monitor and Manager Concurrent Connections … DB/Storage Queue Manager Local Storage … File File … File File … File File …
  • 15. Transfer Queue Management * Plots generated from NetLogger time time timetime The number of concurrent transfers on the left column shows consistent over time in well-managed transfers shown at the bottom row, compared to the ill or non-managed data connections shown at the top row. It leads to the higher overall throughput performance on the lower- right column.
  • 17. Bulk Data Mover (BDM) ● number parallel TCP streams? ● number of concurrent data transfer operations? ● Adaptability to the available bandwidth
  • 18. Parallel Stream vs Concurrent Transfer 16x232x1 4x8 • Same number of total streams, but different number of concurrent connections
  • 19. Parameter Estimation  Can we predict this behavior?  Yes, we can come up with a good estimation for the parallelism level  Network statistics  Extra measurement  Historical data
  • 20. Parallel Stream Optimization single stream, theoretical calculation of throughput based on MSS, RTT and packet loss rate: n streams gains as much as total throughput of n single stream: (not correct) A better model: a relation is established between RTT, p and the number of streams n:
  • 21. Parameter Estimation  Might not reflect the best possible current settings (Dynamic Environment)  What if network condition changes?  Requires three sample transfers (curve fitting)  need to probe the system and make measurements with external profilers  Does require a complex model for parameter optimization
  • 22. Adaptive Tuning  Instead of predictive sampling, use data from actual transfer  transfer data by chunks (partial transfers) and also set control parameters on the fly.  measure throughput for every transferred data chunk  gradually increase the number of parallel streams till it comes to an equilibrium point
  • 23. Adaptive Tuning  No need to probe the system and make measurements with external profilers  Does not require any complex model for parameter optimization  Adapts to changing environment But, overhead in changing parallelism level Fast start (exponentially increase the number of parallel streams)
  • 24. Adaptive Tuning  Start with single stream (n=1)  Measure instant throughput for every data chunk transferred (fast start)  Increase the number of parallel streams (n=n*2),  transfer the data chunk  measure instant throughput  If current throughput value is better than previous one, continue  Otherwise, set n to the old value and gradually increase parallelism level (n=n+1)  If no throughput gain by increasing number of streams (found the equilibrium point)  Increase chunk size (delay measurement period)
  • 28. Parallel Streams (estimate starting point)
  • 29. Log-log scale Can we predict this behavior?
  • 30. Power-law model Achievable throughput in percentage over the number of streams with low/medium/high RTT; (a) RTT=1ms, (b) RTT=5ms, (c) RTT=10ms, (d) RTT=30ms, (e) RTT=70ms, (f) RTT=140ms (c=100 (n/c)<1 k =300 max RRT) Power-law model T = (n / c) (RTT / k) 80-20 rule-Pareto dist. 0.8 = (n / c) (RTT / k) n = (e ( k * ln 0.8 / RTT ) ) ∙ c
  • 31. Extend power-law model: Unlike other models in the literature (trying to find an approximation model for the multiple streams and throughput relationship), this model only focuses on the initial behavior of the transfer performance. When RTT is low, the achievable throughput starts high with the low number of streams and quickly approaches to the optimal throughput. When RTT is high, more number of streams is needed for higher achievable throughput. Get prepared to next-generation networks: -100Gbps -RDMA Future Work
  • 32. The 22nd IASTED International Conference on Parallel and Distributed Computing and Systems PDCS2010 - Nov 9th, 2010 Earth System Grid Bulk Data Mover http://sdm.lbl.gov/bdm/ Earth System Grid http://www.earthsystemgrid.org http://esg-pcmdi.llnl.gov/ Support emails esg-support@earthsystemgrid.org srm@lbl.gov