From: "Rachel Lance" <rlance@lbl.gov>
Subject: [CSSeminars] REMINDER: Berkeley Lab - Computing Sciences Seminar - Monday, 8/17/2009, 2:00pm TODAY
Date: Mon, August 17, 2009 1:36 pm
To: CSSeminars@hpcrd.lbl.gov
Berkeley Lab - Computing Sciences Seminar - Reminder
TODAY, August 17, 2:00pm - 3:00pm, Bldg. 50F, Room 1647
Berkeley Lab - Computing Sciences Seminar
*/Date/:*
Monday, August 17, 2009
*/Time/:*
2:00pm - 3:00pm
*/Location/:*
Bldg. 50F, Room 1647
*/Speaker/:*
Mehmet Balman
Department of Computer Science
Louisiana State University
*/Title/:*
Advance Network Reservation and Provisioning for Science
*/Abstract/:*
Scientific applications already generate many terabytes and even
petabytes of data from supercomputer runs and large-scale
experiments. The need for transferring data chunks of
ever-increasing sizes through the network shows no sign of abating.
Hence, we need high-bandwidth, high-speed networks such as DoE's
ESnet (Energy Sciences Network) that manage the available bandwidth
effectively. OSCARS (ESnet On-demand Secure Circuits and Advance
Reservation System) serves as the network provisioning agent on
ESnet. Currently, using OSCARS, a user can request a reservation
of bandwidth x MB/sec for a duration of y hours
starting at time t. OSCARS checks network availability and capacity
for the specified window of time, and allocates it for that user if
it is available. Otherwise, it reports to the user that it is unable
to do the allocation. Accordingly, it falls upon the user to search
for a time-frame of a required bandwidth by trial-and-error, not
having knowledge of the network's available capacity at a certain
instant of time. We report a novel algorithm, where the user
specifies the total volume that needs to be transferred, a maximum
bandwidth that he/she can use, and a desired time window within
which the transfer should be done. The proposed algorithm can find
alternate allocation possibilities, including earliest time for
completion, or shortest transfer duration - leaving the choice to
the user. The proposed algorithm is quite practical when applied to
large networks with thousands of routers and links. We have
implemented our algorithm for testing and incorporation into a
future version of OSCARS. We will finish the talk with a short
demonstration.
*/Host of Seminar/:*
Arie Shoshani
-----------
Aug17presentation.v2 2009-aug09-lblc sseminar
1. Advance Network Reservation
and Provisioning for Science
Mehmet Balman, Arie Shoshani, Alex Sim, SDM
(with the help of Evangelos Chaniotakis,
David Robertson, Mary Thompson, ESNet)
2. Introduction
• Next generation research networks such as ESNet (Energy
Sciences Network) provide high-speed on-demand data
access between collaborating institutions by delivering
network-as-a-service.
• Currently, ESNet provides yes/no answers to a reservation
request for (bandwidth, start_time, end_time).
• We present an approach to improve the ESNet advance
network reservation system (OSCARS) by presenting to the
clients, the possible reservation options and alternatives for
earliest completion time and shortest transfer duration.
3. Outline
• Motivation
– Data Deluge and Resource Management
• Advance Network Reservation
– ESNet and OSCARS
• Problem
– Time Dependent Transport Networks
• Algorithm and Methodology
– Concept: using Time Windows
• Implementation Details
– Objects developed for the new package
– Modular approach for integration into OSCARS
• Demo
• Questions
4. Motivation
We are in a new era that offers new opportunities to
conduct scientific research with the help of
computation
Computationally intensive science: particle physics, climate
modelling, bio-informatics simulations
Scientific simulations and experimental facilities
generate massive data sets
Climate modelling data
35 terabytes shared by more than 2500 users worldwide
Next generation archive will be more than 650 terabytes
Large Hadron Collider
Expected to generate 100 gigabits per second
5. Motivation
Large-scale applications necessitate collaboration
Data need to be transferred to remote sites for further
analysis (validate with simulations)
Need on-demand high-speed data access between
collaborating parties
High performance visualization
Large volume data analysis
Require mass storage systems
Need coordination and management of resources
(BeStMan: Berkeley Storage Manager)
6. ESNet (Energy Sciences Network)
Provides high bandwidth network interconnect between
more than 40 sites
Connecting experimental facilities, supercomputing
centers and thousands of DOE scientists
Delivering network as a service (OSCARS)
Predictable performance
Efficient resource utilization
Guaranteed bandwidth
7. OSCARS
The ESNet On-Demand Secure Circuits and Advance
Reservation System (OSCARS)
Sets up a QoS path for guaranteed bandwidth
End-to-end provisioning between multiple domains
Guaranteed bandwidth (at certain time, for a certain
bandwidth and length of time)
OSCARS components include a reservation manager, a bandwidth
scheduler, and a path setup system
Needs to have information about current and future states of
the network
8. OSCARS Network Reservation
Users make reservations over a web service interface
Reservation request:
source/destination end-points
Requested bandwidth
start/end times
The shortest path from source to destination is calculated
based on the engineering metric on each link, and a
bandwidth-guaranteed path is set up to commit and eventually
complete the reservation request for the given time period
9. OSCARS Topology
Components (Graph):
node (router), port, link (connecting two ports)
engineering metric (~latency)
maximum bandwidth (capacity)
Reservation:
source, destination, path, time
(time t1, t3) A -> B -> D (900Mbps)
(time t2, t3) A -> C -> D (400Mbps)
(time t4, t5) A -> B -> D (800Mbps)
[Figure: topology graph with links A - B 1000Mbps, A - C 800Mbps, B - C 300Mbps, B - D 900Mbps, C - D 500Mbps, above a timeline t1..t5 showing Reservation 1, Reservation 2 and Reservation 3]
10. OSCARS Topology
Making a reservation:
need to ensure availability of the requested bandwidth from source to
destination for the requested time interval
(time t1, t2) A to D 500Mbps (yes)
(time t1, t2) A to D 600Mbps (no)
(time t1, t3) A to C 500Mbps (no)
- (bandwidth splitting not allowed)
Active reservation
reservation 1: (time t1, t3) A -> B -> D (900Mbps)
reservation 2: (time t2, t3) A -> C -> D (400Mbps)
reservation 3: (time t4, t5) A -> B -> D (800Mbps)
[Figure: topology graph with links A - B 1000Mbps, A - C 800Mbps, B - C 300Mbps, B - D 900Mbps, C - D 500Mbps]
11. Reservation
For every new reservation request
$R = \{ n_{source}, n_{destination}, M_{bandwidth}, t_{start}, t_{end} \}$:
committed reservations between $t_{start}$ and $t_{end}$ are
examined
a snapshot graph $G'$ of the network topology is
generated
by extracting available bandwidth information for each
port in the time period $(t_{start}, t_{end})$
$G' = G(t_{start}, t_{end})$: status of the network in advance
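The snapshot step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the OSCARS implementation: the `snapshot` function, the dictionary layout and the integer time stamps (t1..t5 written as 1..5) are hypothetical, while the link capacities and reservations are the ones shown on the OSCARS Topology slides. For simplicity it subtracts every reservation that overlaps the interval, which is conservative when two reservations on one link overlap the interval but not each other.

```python
# Link capacities in Mbps (undirected links, keyed by a sorted node pair).
CAPACITY = {
    ("A", "B"): 1000,
    ("A", "C"): 800,
    ("B", "C"): 300,
    ("B", "D"): 900,
    ("C", "D"): 500,
}

# Committed reservations: (start, end, path, bandwidth in Mbps).
RESERVATIONS = [
    (1, 3, ["A", "B", "D"], 900),
    (2, 3, ["A", "C", "D"], 400),
    (4, 5, ["A", "B", "D"], 800),
]


def snapshot(capacity, reservations, t_start, t_end):
    """Return the available bandwidth per link for (t_start, t_end)."""
    available = dict(capacity)
    for start, end, path, bandwidth in reservations:
        # A reservation matters only if it overlaps the requested interval.
        if start < t_end and t_start < end:
            for u, v in zip(path, path[1:]):
                link = tuple(sorted((u, v)))
                available[link] -= bandwidth
    return available


# Snapshot G' = G(t1, t2): only reservation 1 is active.
print(snapshot(CAPACITY, RESERVATIONS, 1, 2))
```

Running a path search on the returned dictionary instead of on the raw capacities is what lets a request be answered "in advance" for its own time period.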
12. Example
(time t1, t2):
A to D (600Mbps) NO
A to D (500Mbps) YES
Link: available / reserved (capacity)
A - B: 100Mbps / 900Mbps (1000Mbps)
A - C: 800Mbps / 0Mbps (800Mbps)
B - C: 300Mbps / 0Mbps (300Mbps)
B - D: 0Mbps / 900Mbps (900Mbps)
C - D: 500Mbps / 0Mbps (500Mbps)
Active reservation
reservation 1: (time t1, t3) A -> B -> D (900Mbps)
reservation 2: (time t2, t3) A -> C -> D (400Mbps)
reservation 3: (time t4, t5) A -> B -> D (800Mbps)
13. Example
(time t1, t3):
A to D (500Mbps) NO
A to C (500Mbps) NO
(not max-FLOW!)
Link: available / reserved (capacity)
A - B: 100Mbps / 900Mbps (1000Mbps)
A - C: 400Mbps / 400Mbps (800Mbps)
B - C: 300Mbps / 0Mbps (300Mbps)
B - D: 0Mbps / 900Mbps (900Mbps)
C - D: 100Mbps / 400Mbps (500Mbps)
Active reservation
reservation 1: (time t1, t3) A -> B -> D (900Mbps)
reservation 2: (time t2, t3) A -> C -> D (400Mbps)
reservation 3: (time t4, t5) A -> B -> D (800Mbps)
14. End-to-End Data movement
End-to-end High Performance Data Movement
Bandwidth network reservation
Bandwidth provisioning in client sites
Storage allocation
Therefore, we need coordination between Storage
Resource Managers and Network Resource Allocation
But the requested bandwidth cannot be guaranteed
Trial and error until an available reservation is found
15. Advance Network Reservation
Clients are not given other possible options
Does not provide an optimal choice for the client
May cause ineffective use of the overall system
Overloads the system with trial-and-error attempts
How can we enhance the OSCARS reservation
system?
Submit constraints and the system suggests possible
reservations satisfying requirements
16. A new service
Source / destination end-points
Maximum bandwidth that can be used
Amount of data requested to be transferred (Volume)
Earliest start time
Latest completion time
Criteria (reserve a path for earliest completion, or reserve a
path for shortest transfer duration)
17. Alternative
Users provide maximum bandwidth they can use, total size of the
data requested to be transferred, the earliest start time, and the
latest completion time
Users can set criteria such that they would like to reserve a path for
earliest completion time or reserve a path for shortest transfer
duration.
$R_s' = \{ n_{source}, n_{destination}, M_{MAXbandwidth}, D_{dataSize}, t_{EarliestStart}, t_{LatestEnd} \}$
The reservation engine finds the reservation
$R = \{ n_{source}, n_{destination}, M_{bandwidth}, t_{start}, t_{end} \}$
for the earliest completion or for the shortest duration,
where $M_{bandwidth} \le M_{MAXbandwidth}$ and $t_{EarliestStart} \le t_{start} < t_{end} \le t_{LatestEnd}$.
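As a rough illustration of these constraints, the request and the resulting reservation can be modelled as two small records with a check between them. The class and field names below are hypothetical (the slides name only the symbols), and the volume check is an added assumption: it simply requires that bandwidth times duration covers the requested data size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """User constraints (field names are illustrative, not from OSCARS)."""
    source: str
    destination: str
    max_bandwidth: float   # M_MAXbandwidth
    data_size: float       # D_dataSize, in bandwidth x time units
    earliest_start: float  # t_EarliestStart
    latest_end: float      # t_LatestEnd


@dataclass(frozen=True)
class Reservation:
    """Concrete reservation R proposed by the reservation engine."""
    source: str
    destination: str
    bandwidth: float       # M_bandwidth
    start: float           # t_start
    end: float             # t_end


def satisfies(res: Reservation, req: Request) -> bool:
    """Check the stated constraints, plus that the volume fits."""
    same_endpoints = (res.source, res.destination) == (req.source, req.destination)
    bandwidth_ok = 0 < res.bandwidth <= req.max_bandwidth
    window_ok = req.earliest_start <= res.start < res.end <= req.latest_end
    # Added assumption: the reservation must carry the whole data volume.
    volume_ok = res.bandwidth * (res.end - res.start) >= req.data_size
    return same_endpoints and bandwidth_ok and window_ok and volume_ok


request = Request("A", "D", max_bandwidth=200, data_size=800,
                  earliest_start=1, latest_end=13)
print(satisfies(Reservation("A", "D", 100, 1, 9), request))
print(satisfies(Reservation("A", "D", 300, 9, 13), request))
```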
18. Max-bandwidth
The maximum bandwidth available for allocation from
a source node to a destination node
Modified version of Kruskal's and Dijkstra's algorithms
– Shortest path,
– Min-cost path
– Minimum spanning tree
– Max bandwidth path
• (The bandwidth of a path is the minimum of all links over the path)
• Fast and efficient (does not visit all possible paths)
19. Path Finding
Criteria: max bandwidth
can be min hop, min eng metric or
f(bandwidth, hop count, eng metric)
Link: bandwidth / eng metric
A - B: 1000Mbps / eng metric 10
A - C: 800Mbps / eng metric 20
B - C: 300Mbps / eng metric 20
B - D: 900Mbps / eng metric 30
C - D: 500Mbps / eng metric 100
20. Path Finding
(1) Visit A
B (parent A) 1000/10/1 hop
C (parent A) 800/20/1 hop
(2) Visit B
C (parent A) 800/20/1 hop
D (parent B) 900/30/2 hops
(3) Visit D
Max bandwidth from A to D is 900
Advantage of algorithm
- visiting all paths is n!
- visiting edges in the worst case is n^2
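A max-bandwidth (widest) path search of this kind can be sketched as a Dijkstra variant that always expands the node with the largest bottleneck bandwidth found so far. This is a minimal sketch and not the presenters' code: the function name and data layout are assumptions, and the engineering metric and hop count used on the slides as additional criteria are left out.

```python
import heapq

# Link bandwidths in Mbps from the Path Finding slide (undirected).
LINKS = {
    ("A", "B"): 1000,
    ("A", "C"): 800,
    ("B", "C"): 300,
    ("B", "D"): 900,
    ("C", "D"): 500,
}


def max_bandwidth_path(links, source, destination):
    """Return (bottleneck bandwidth, path) of the widest path, or (0, [])."""
    adjacency = {}
    for (u, v), bandwidth in links.items():
        adjacency.setdefault(u, []).append((v, bandwidth))
        adjacency.setdefault(v, []).append((u, bandwidth))

    best = {source: float("inf")}     # best known bottleneck per node
    parent = {source: None}
    heap = [(-best[source], source)]  # max-heap via negated bandwidth
    visited = set()

    while heap:
        neg_bw, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == destination:
            break
        for neighbor, link_bw in adjacency.get(node, []):
            # The bandwidth of a path is the minimum over its links.
            candidate = min(-neg_bw, link_bw)
            if candidate > best.get(neighbor, 0):
                best[neighbor] = candidate
                parent[neighbor] = node
                heapq.heappush(heap, (-candidate, neighbor))

    if destination not in visited:
        return 0, []
    path, node = [], destination
    while node is not None:
        path.append(node)
        node = parent[node]
    return best[destination], path[::-1]


print(max_bandwidth_path(LINKS, "A", "D"))
```

On this topology the search expands A, then B, then D, the same visiting order as in the walk-through, and stops without exploring every path.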
22. Time-dependent Graph
We deal with a dynamic network such that the bandwidth value for
every link is time dependent:
link = e(RouterA-port1, RouterB-port2), with bandwidth $link_{bandwidth}(t_{time})$
Graph algorithms for time-dependent dynamic networks have been
studied in the literature, especially for max-flow and shortest path
algorithms
The most common approach is discrete-time algorithms, in
which time is modeled as a set of discrete values and a static
graph is constructed for every time interval.
23. Example Problem
A vehicle travelling from city A to city B
There are multiple cities between A and B connected with separate
highways.
Each highway has a specific speed limit (maximum bandwidth)
But we need to reduce our speed if there is high traffic load on the
road
We know the load on each highway for every time period
(reservations)
The first question is which path the vehicle should follow in order to
reach city B from city A as early as possible?
Or, we can delay our journey and start later if the total travel time
would be reduced. Thus, the second question is to find the route
along with the starting time for shortest travel duration.
24. Challenge
But, we are dealing with bandwidth reservation where
allocation should be set in advance when a request is
received.
We have to set the speed limit before starting and
cannot change that during the journey
Advance Bandwidth Reservation
Therefore, known time-dependent graph algorithms do
not fit into our problem domain.
25. Approach
Search interval is divided into time windows
A time window represents a period of time where we
have a stable status of available bandwidth of all related
links
A snapshot of the network topology in this time
window
The algorithm should be fast and scalable. Presenting
clients/users possible reservation requests and alternate
options
26. Time Windows
Reservation 1: (time t1, t6) A -> B -> D (900Mbps)
Reservation 2: (time t4, t7) A -> C -> D (400Mbps)
Reservation 3: (time t9, t12) A -> B -> D (700Mbps)
[Figure: topology graph with links A - B 1000Mbps, A - C 800Mbps, B - C 300Mbps, B - D 900Mbps, C - D 500Mbps, above a timeline t1..t13 showing Reservation 1, Reservation 2 and Reservation 3]
27. Time Windows
• Time windows between t1 and t13
[Figure: timeline t1..t13 showing Reservation 1, Reservation 2 and Reservation 3, split at the reservation boundaries t1, t4, t6, t7, t9, t12, t13 into time windows]
t1 - t4: Res 1
t4 - t6: Res 1,2
t6 - t7: Res 2
t7 - t9: (none)
t9 - t12: Res 3
t12 - t13: (none)
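Splitting a search interval into time windows can be sketched as follows: every reservation start or end time inside the interval becomes a boundary, and each pair of neighbouring boundaries forms a window with a fixed set of active reservations. The function name and data layout are assumptions made for this sketch; the reservation times are the ones from the slide, with t1..t13 written as 1..13.

```python
def build_time_windows(reservations, t_begin, t_end):
    """Split [t_begin, t_end] at reservation start/end times.

    reservations: dict mapping a name to (start, end).
    Returns a list of (window_start, window_end, active_names).
    """
    points = {t_begin, t_end}
    for start, end in reservations.values():
        for t in (start, end):
            if t_begin < t < t_end:
                points.add(t)
    bounds = sorted(points)

    windows = []
    for lo, hi in zip(bounds, bounds[1:]):
        # A reservation is active in a window if the two intervals overlap.
        active = sorted(
            name for name, (start, end) in reservations.items()
            if start < hi and lo < end
        )
        windows.append((lo, hi, active))
    return windows


RESERVATIONS = {"Res 1": (1, 6), "Res 2": (4, 7), "Res 3": (9, 12)}
for window in build_time_windows(RESERVATIONS, 1, 13):
    print(window)
```

Because the set of active reservations is constant inside a window, one snapshot of available bandwidth per window is enough.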
28. Time windows
Available bandwidth per link in each time window
Window (active)       A - B      A - C     B - C     B - D     C - D
t1 - t4 (Res 1)       100 Mbps   800 Mbps  300 Mbps  0 Mbps    500 Mbps
t4 - t6 (Res 1,2)     100 Mbps   400 Mbps  300 Mbps  0 Mbps    100 Mbps
t6 - t7 (Res 2)       1000 Mbps  400 Mbps  300 Mbps  900 Mbps  100 Mbps
t7 - t9 (none)        1000 Mbps  800 Mbps  300 Mbps  900 Mbps  500 Mbps
29. Time windows
Available bandwidth per link in merged time windows
Window (active)       A - B      A - C     B - C     B - D     C - D
t1 - t6 (Res 1,2)     100 Mbps   400 Mbps  300 Mbps  0 Mbps    100 Mbps
t6 - t9 (Res 2)       1000 Mbps  400 Mbps  300 Mbps  900 Mbps  100 Mbps
30. Search Time Windows
• Search through these time windows in a sequential order
to check whether we can satisfy the requested allocation
for that time window.
• First, check the duration of the time window
– Can we satisfy the user request in that time window?
(we know the max bandwidth the user can support)
• Then, calculate the max bandwidth available in the time
window
31. Example
Reservation 1: (time t1, t6) A -> B -> D (900Mbps)
Reservation 2: (time t4, t7) A -> C -> D (400Mbps)
Reservation 3: (time t9, t12) A -> B -> D (700Mbps)
[Figure: topology graph with links A - B 1000Mbps, A - C 800Mbps, B - C 300Mbps, B - D 900Mbps, C - D 500Mbps]
Ex: from A to D
max bandwidth = 200Mbps
volume = 200Mbps x 4 time slots
earliest completion
earliest start = t1, latest finish = t13
32. Search Time Windows
[Figure: timeline of time windows with boundaries t1, t4, t6, t7, t9, t12, t13 and active reservations Res 1, Res 1,2, Res 2, Res 3]
Max bandwidth from A to D (window length in time slots)
1. t1 - t4 (Res 1): 900Mbps (3)
2. t4 - t6 (Res 1,2): 100Mbps (2)
3. t1 - t6 (Res 1,2): 100Mbps (5)
4. t6 - t7 (Res 2): 900Mbps (1)
5. t4 - t7 (Res 1,2): 100Mbps (3)
6. t1 - t7 (Res 1,2): 100Mbps (6)
7. t7 - t9: 900Mbps (2)
8. t6 - t9 (Res 2): 900Mbps (3)
9. t4 - t9 (Res 1,2): 100Mbps (5)
10. t1 - t9 (Res 1,2): 100Mbps (8)
Reservation: (A to D) (100Mbps) start=t1 end=t9
33. Search Time Windows
[Figure: timeline of time windows with boundaries t1, t4, t6, t7, t9, t12, t13 and active reservations Res 1, Res 1,2, Res 2, Res 3]
Max bandwidth from A to D (window length in time slots)
1. t9 - t12 (Res 3): 200Mbps (3)
2. t12 - t13: 900Mbps (1)
3. t9 - t13 (Res 3): 200Mbps (4)
shortest duration?
Reservation: (A to D) (200Mbps) start=t9 end=t13
from A to D, max bandwidth = 200Mbps
volume = 175Mbps x 4 time slots
earliest start = t1, latest finish = t13
earliest completion: (A to D) (100Mbps) start=t1 end=t8
shortest duration: (A to D) (200Mbps) start=t9 end=t12.5
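The selection step can be illustrated with a short sketch. It takes the windows and max-bandwidth values listed on the two Search Time Windows slides as given input (it does not recompute them from the topology), caps each by the user's maximum bandwidth, and keeps the windows in which the volume fits. The function names, the tuple layout, and the choice to start each transfer at the beginning of its window are assumptions of this sketch, not a description of the actual implementation.

```python
def fit(start, end, available_bw, max_bw, volume):
    """Try to place the transfer at the start of window (start, end).

    Volume is in bandwidth x time-slot units. Returns
    (bandwidth, start, finish), or None if the volume does not fit.
    """
    bandwidth = min(available_bw, max_bw)
    if bandwidth <= 0 or bandwidth * (end - start) < volume:
        return None
    return bandwidth, start, start + volume / bandwidth


def suggest(windows, max_bw, volume):
    """Return (earliest completion, shortest duration) candidates."""
    candidates = [
        c for c in (fit(s, e, bw, max_bw, volume) for s, e, bw in windows)
        if c is not None
    ]
    if not candidates:
        return None, None
    earliest = min(candidates, key=lambda c: (c[2], c[2] - c[1]))
    shortest = min(candidates, key=lambda c: (c[2] - c[1], c[2]))
    return earliest, shortest


# (start, end, max bandwidth A -> D in Mbps) as listed on the slides,
# with t1..t13 written as 1..13.
WINDOWS = [
    (1, 4, 900), (4, 6, 100), (1, 6, 100), (6, 7, 900), (4, 7, 100),
    (1, 7, 100), (7, 9, 900), (6, 9, 900), (4, 9, 100), (1, 9, 100),
    (9, 12, 200), (12, 13, 900), (9, 13, 200),
]

earliest, shortest = suggest(WINDOWS, max_bw=200, volume=175 * 4)
print("earliest completion:", earliest)
print("shortest duration:", shortest)
```

With a volume of 175 x 4 this reproduces the two answers on the slide (100Mbps from t1 to t8, and 200Mbps from t9 to t12.5); with 200 x 4 it yields t1 to t9 and t9 to t13.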
34. Implementation Details
Abstract classes:
• Graph
– Node
• list of ports owned by this node
• Up/down?
– Port
• Max bandwidth, engineering metric, Destination Port (Link)
• Up/down?
• Reservation list
– Reservation
• Start time, end time, reserved bandwidth, Path ( list of port IDs )
• Active/inactive?
• Unique ID (node, port, reservation)
– Comparable, immutable object (GID object)
• Using Java collections (Set, Map, Linked List)
35. Implementation Details
• Time Window list
– Time window object
• List of active reservations in this time window
• Load reservations in the system
• Update time window list by retrieving recently
added reservations
• Return list of active reservations in a given time
window
36. Time Window list
Time windows list
initially: now, infinite
new reservation: reservation 1, start t1, end t10
boundaries: now, t1, t10, infinite
t1 - t10: Res 1
new reservation: reservation 2, start t12, end t20
boundaries: now, t1, t10, t12, t20, infinite
t1 - t10: Res 1
t12 - t20: Res 2
37. Time Window list
new reservation: reservation 3, start t9, end t17
boundaries: now, t1, t9, t10, t12, t17, t20, infinite
Time windows between t1 and t20
t1 - t9: Res 1
t9 - t10: Res 1,3
t10 - t12: Res 3
t12 - t17: Res 2,3
t17 - t20: Res 2
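The incremental update of the time window list can be sketched with a sorted list of boundaries that starts as (now, infinite) and gains two boundaries per new reservation. The class below is a hypothetical illustration written in Python (the slides describe a Java implementation); its name, methods and integer time stamps are assumptions.

```python
import bisect
import math


class TimeWindowList:
    """Sorted boundary list; each added reservation splits windows."""

    def __init__(self, now=0):
        self.bounds = [now, math.inf]
        self.reservations = {}

    def add(self, name, start, end):
        """Register a reservation and insert its start/end as boundaries."""
        self.reservations[name] = (start, end)
        for t in (start, end):
            i = bisect.bisect_left(self.bounds, t)
            if i == len(self.bounds) or self.bounds[i] != t:
                self.bounds.insert(i, t)

    def active(self, lo, hi):
        """Names of reservations overlapping the window (lo, hi)."""
        return sorted(
            name for name, (start, end) in self.reservations.items()
            if start < hi and lo < end
        )

    def windows(self, t_begin, t_end):
        """Windows lying between t_begin and t_end."""
        return [
            (lo, hi, self.active(lo, hi))
            for lo, hi in zip(self.bounds, self.bounds[1:])
            if t_begin <= lo and hi <= t_end
        ]


twl = TimeWindowList(now=0)
twl.add("Res 1", 1, 10)
twl.add("Res 2", 12, 20)
twl.add("Res 3", 9, 17)
print(twl.bounds)
for window in twl.windows(1, 20):
    print(window)
```

Keeping the boundaries sorted means a new reservation touches only the windows it overlaps, which fits the stated goal of updating the list by retrieving recently added reservations.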
38. Implementation Details
• Value
– bandwidth values used to calculate path in each step
(searching time windows)
– Keeps only related link values
• ValueBucket
– Register reservation list
– Initialize with a reachable set
– Query value object by giving a set of active reservations
• Keeps the status of the topology for a specific time
interval
39. Implementation Details
• Flow
– Register graph object
– Find the reachable set with the given maximum hop count
– Load a value object
– Find maximum bandwidth from source to destination
– No unnecessary memory allocation
• Suggest
– Register graph object
– Register reservation list
– Update time window list if necessary
– Search time windows
– Suggest a reservation request for earliest completion time or shortest
duration
40. Implementation
• Graph object
• Reservation list
– Register graph
– Register reservations
• Query (source, destination, max bandwidth, volume, max hop count)
– Find reachable set from source to destination
– Search time windows
• If the reservation request cannot fit into the time window, skip it
• Get active reservations for the time window
• Query and obtain a value object for the time window
• Calculate max bandwidth using the value object
• Examine whether the request can be satisfied
– Return a reservation request
– Start time, end time
– Bandwidth to allocate
– Path Value (bandwidth, eng metric, hop count)
41. Modular Design for easy integration
into OSCARS
• Graph object, and Reservation objects already exist in OSCARS
– No need to replace them
• Other objects need to be added to OSCARS, including:
– Time Window object,
– Flow object,
– Value Bucket object,
– Suggest object
• Using “Registration” (reference) method, not “Loading” method
– E.g. in “flow”, a new graph only needs to be registered; no need to recreate a
new object
– This approach supports modularity
43. Demo
Generated graph has 12 nodes (node1 to node12 800Mbps available)
(node1 to node5 800Mbps available )
Reservations from node1 to node12
1) max bandwidth 500, volume 3600000 (2 hours x 500), start now
2) max bandwidth 300, volume 2160000 (2 hours x 300), start after 1 hour
3) max bandwidth 800, volume 2880000 (1 hour x 800), start after 4 hours
4) max bandwidth 200, volume 1440000 (2 hours x 200), start after 6 hours
5) max bandwidth 300, volume 2160000 (2 hours x 300), start after 7 hours
For each:
Ask for a reservation request for earliest completion time
Apply the reservation
node1 to node12 max bandwidth 700, volume 4320000 (2 hours x 600)
node1 to node5 max bandwidth 700, volume 4320000 (2 hours x 600)
44. Demo
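[Figure: timeline from now to hour 8 showing the reservations (bandwidth labels 500, 300, 300, 800, 200, 300) and the resulting time windows]
Time windows (hours from now):  0-1  1-2  2-3  3-4  4-5  5-6  6-7  7-8
Available bandwidth from node1 to node12:
300 0 500 800 0 800 600 300
Available bandwidth from node1 to node5 (node1 to node8):
500 200 700 800 200 800 800 500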
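The node1 to node12 availability row can be checked with a few lines of arithmetic. The sketch assumes that volume is bandwidth (Mbps) times duration in seconds, which matches the figures on the previous slide (for example 500 x 7200 = 3600000), and that each of the five requests was granted at its maximum bandwidth starting at its earliest start time; the function and variable names are hypothetical.

```python
CAPACITY = 800  # Mbps available from node1 to node12 before any reservation
HOUR = 3600     # seconds

# (max bandwidth in Mbps, volume in Mb, start in hours from now)
REQUESTS = [
    (500, 3_600_000, 0),
    (300, 2_160_000, 1),
    (800, 2_880_000, 4),
    (200, 1_440_000, 6),
    (300, 2_160_000, 7),
]


def hourly_available(capacity, requests, hours):
    """Available bandwidth for each one-hour window, starting now."""
    available = [capacity] * hours
    for bandwidth, volume, start in requests:
        # Whole hours needed when transferring at full bandwidth.
        duration = volume // (bandwidth * HOUR)
        for hour in range(start, min(start + duration, hours)):
            available[hour] -= bandwidth
    return available


print(hourly_available(CAPACITY, REQUESTS, 8))
```

Under these assumptions the result matches the row shown for node1 to node12.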
46. Thanks
• Motivation
– Data Deluge and Resource Management
• Advance Network Reservation
– ESNet and OSCARS
• Problem
– Time Dependent Transport Networks
• Algorithm and Methodology
– Time Windows
• Implementation Details
– Integration into OSCARS
• Demo
• Questions?