Master Degree training program research project. The presentation introduces main objectives of the thesis and describes (without providing in-depth details) the most important aspects of the activity.
1. August 2013
[research overview]
A distributed algorithm for
STATELESS LOAD BALANCING
Abstract: Distributing data packets across stations with scalable and optimal store and
retrieval functionality. Ensuring load balance without collecting load information
from stations.
Keywords: Distributed-Systems, Algorithms, Big-Data, Cloud, Balancing
University tutor:
Prof. Eng. O. Tomarchio
Università di Catania
Dipartimento di Ingegneria Elettrica, Elettronica e Informatica
Company supervisor:
Eng. A. Maddalena
Medilink srl
Team Leader - R&D Manager
Trainee:
Dr. A. Tino
Università di Catania
Facoltà di Ingegneria Informatica Specialistica
Medilink srl
Sezione Ricerca e Sviluppo
2. August 2013
PROBLEM DESCRIPTION
Many stations and much data to store. Data can be fragmented into small units (packets) and sent to
stations. Balancing the load raises several problems.
Problems and the solutions adopted by modern algorithms:
Problem: Which station to choose for a packet?
Solution: Choose based on information collected from stations, or by uniformly distributed random algorithms.
Problem: How to send a packet to a station?
Solution: IP address database, centralized solutions, distributed IP tables.
Problem: How to retrieve a packet? How to locate the station it is stored in?
Solution: The pair (packet-id, station-id) must be memorized after choosing the destination station.
Problem: How to balance packets among different stations?
Solution: Round-robin (stateless) approaches, or approaches based on station loads.
Medilink srl
Sezione Ricerca e Sviluppo
Tutor:
Prof. Eng. Orazio Tomarchio
DIIEI
Università di Catania
Supervisor:
Eng. Andrea Maddalena
Software Development
Medilink srl
Research trainee:
Dr. Andrea Tino
Università degli Studi di Catania
Ingegneria Informatica
3. August 2013
DEFINING TARGETS
The goal is a load-balancing algorithm that meets the following objectives.
distributed system
No centralization. If one station fails, the system keeps running. Performance
degradation is, however, allowed.
stateless
The algorithm does not need any information about a station's current load
to perform station selection.
packet lookup
Retrieving a packet from a station must be as efficient as possible.
scalability
The architecture must be scalable. More stations can be added (also at
runtime). Detached stations must not cause the system to fail.
4. August 2013
WHAT ABOUT THE OTHERS?
(to be continued)
Load balancing is a well-known field in the literature. Common practices exist.
A typical architecture centralizes load balancing in a
single network component responsible for that task.
The load balancer typically knows everything about all
stations. Its task is to open connections to stations
upon request. The decision it makes is which station to
open a connection to.
Very often, common architectures, such as those from Cisco and IBM,
organize servers in clusters and pools to handle group
configurations.
The balancer is not physically connected to stations.
Everything is done through TCP/IP and a list of IPs is kept.
In any case, the balancer has a complete knowledge.
5. August 2013
WHAT ABOUT THE OTHERS?
(end)
Load balancing is a known field in literature. There are famous algorithms out there.
dummy/naive
First alive, static assignment. Stateless approach. Provides poor balancing.
round-robin
Rotating IP addresses. Stateless. Needs to keep track of the destination station.
Good balancing on servers with uniform capabilities.
weighted round-robin
Like round-robin, but the rotation halts on stations with higher weights. Needs to
keep track of the destination station. Good balancing under static conditions.
hash oriented
Uses hashes of IP-header entries to calculate the destination station. Stateless.
Direct data retrieval, bad balancing.
station state
Decision taken based on each station's state (e.g. current load). Introduces
overhead on the network. Good balancing in all conditions.
predictive
Station state is monitored at a few fixed intervals. Predictions of the current state
are made based on ascending/descending trends.
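The round-robin and weighted round-robin schemes listed on this slide can be sketched in a few lines. This is a minimal illustration only: the station names, the weights and the `lookup` table are hypothetical and do not come from the presentation. It shows why these schemes, although they ignore station load, still need to keep track of the destination station.

```python
from itertools import cycle


def round_robin(stations):
    """Return an endless rotation over the stations; no load info is consulted."""
    return cycle(stations)


def weighted_round_robin(weights):
    """Return a rotation in which each station appears `weight` times per turn.

    `weights` maps a station name to a positive integer (hypothetical values).
    """
    expanded = [station for station, w in weights.items() for _ in range(w)]
    return cycle(expanded)


# The rotation does not depend on the packet-id, so the pair
# (packet-id, station-id) must be recorded to retrieve the packet later.
lookup = {}
rr = round_robin(["s0", "s1", "s2"])
for packet_id in range(6):
    lookup[packet_id] = next(rr)

# Station "s0" has twice the weight of "s1", so the rotation lingers on it.
wrr = weighted_round_robin({"s0": 2, "s1": 1})
wrr_sequence = [next(wrr) for _ in range(6)]
```

Here `lookup` grows with the number of stored packets, which is the bookkeeping cost the slide refers to as keeping track of the destination station.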
6. August 2013
KEY CONCEPT: NO CENTRALIZATION
The architecture must not include any centralized device or station. Think about P2P, but a little
bit more reliable and less chaotic.
Topology must ensure
the absence of
centralized schemes.
System deployed in
each station as a
distributed architecture.
Networking like P2P but
data exchange and
stations are more reliable.
Packets are routed!
7. August 2013
KEY CONCEPT: DIRECT ADDRESSING
When assigning a station to a packet, the system will not save data about this association
anywhere. At retrieval, given the packet-id, the station must be located immediately.
On packet forwarding: destination station is
computed but not memorized anywhere. The
packet will be stored at the corresponding
station with no further overhead.
On packet retrieval: destination station is
computed without relying on other info.
Destination station is reached and packet
correctly fetched.
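The direct-addressing property can be illustrated with a small sketch. The presentation does not disclose how the destination station is computed, so the hash-modulo mapping below is an assumption chosen only for illustration, and the names `station_for`, `store` and `retrieve` are hypothetical. The point it shows is that the same computation is used on forwarding and on retrieval, with no (packet-id, station-id) table in between.

```python
import hashlib


def station_for(packet_id: str, station_count: int) -> int:
    """Compute the destination station from the packet-id alone.

    Assumption: a SHA-256 hash reduced modulo the pool size. This is a
    stand-in, not the mapping used in the thesis.
    """
    digest = hashlib.sha256(packet_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % station_count


# Each station is modelled as a plain dictionary (hypothetical pool of 4).
stations = [dict() for _ in range(4)]


def store(packet_id: str, payload: bytes) -> None:
    """Forwarding: compute the station, store the packet, remember nothing else."""
    stations[station_for(packet_id, len(stations))][packet_id] = payload


def retrieve(packet_id: str) -> bytes:
    """Retrieval: recompute the station from the packet-id and fetch directly."""
    return stations[station_for(packet_id, len(stations))][packet_id]


store("pkt-1", b"payload")
fetched = retrieve("pkt-1")
```

Locating a packet costs one hash evaluation regardless of how many packets are stored, provided the pool size does not change between storing and retrieving.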
8. August 2013
KEY CONCEPT: STATELESS BALANCING
To balance data load on stations, no information is required from the stations. The packet is
assigned to a station without any further operation.
Data load balancing does not require
data from stations before station
assignment or at any later time.
Stations hold (almost) the same
number of packets all the time.
Balancing data loads generates no
overhead on the network or in
processing time.
9. August 2013
SUMMARIZING KEY CONCEPTS
To balance data load on stations, no information is required from the stations. The packet is
assigned to a station without any further operation.
distributed system
Allows the architecture to benefit from
P2P properties: scalability, flexibility
and fault tolerance.
direct addressing
Fast resource management. Packets
can be located with constant
complexity algorithms.
stateless balancing
No need to introduce overhead in
communications. No need to wait for
or store state data from stations.
10. August 2013
SHOWING EARLY RESULTS
The simplest simulations show very good load balancing on basic station pools.
Simulations on a basic pool of 10 stations, with 1000 packets fed to the pool. Difference shown.
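A simulation of the same size (10 stations, 1000 packets) can be sketched as follows. The presentation does not describe the balancing algorithm itself, so this sketch assumes a hash-modulo assignment as a placeholder. The `spread` value, the gap between the most and the least loaded station, is one possible way to quantify imbalance and is not necessarily the quantity plotted in the original slide.

```python
import hashlib
from collections import Counter


def station_for(packet_id, station_count):
    """Placeholder stateless assignment (assumption): hash the id, take the modulo."""
    digest = hashlib.sha256(str(packet_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % station_count


def simulate(packet_count=1000, station_count=10):
    """Feed `packet_count` packets to the pool and count packets per station."""
    loads = Counter({station: 0 for station in range(station_count)})
    for packet_id in range(packet_count):
        loads[station_for(packet_id, station_count)] += 1
    return loads


loads = simulate()
# Gap between the most and the least loaded station.
spread = max(loads.values()) - min(loads.values())

for station, load in sorted(loads.items()):
    print(f"station {station}: {load} packets")
print(f"spread: {spread}")
```

Perfect balance would give 100 packets per station. Running the sketch with larger `packet_count` and `station_count` values is a quick way to see how the spread behaves before moving to faster implementations.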
11. August 2013
NOT A FIELD OF DAISIES
There are many problems to solve. In particular, accurate simulations are needed.
Good simulations should try to emulate real scenarios with hundreds of thousands
of packets => big loads sent to stations, and many more stations => big station pools.
The simulations developed so far are slow (MathWorks MATLAB, Wolfram
Mathematica). Mathematical environments + functional languages cannot provide
good performance. Need for better simulations => parallelization is possible!
Numerical problems on the way. Need for numerical methods => need for good
and fast libraries!
Parallelization would definitely speed up simulations. Need for coded simulations =>
C/C++: good performance. Parallelization libraries + good performance:
architecture-dependent parallel libraries.
12. August 2013
WHERE TO GO FROM HERE
The simplest simulations show very good load balancing on basic station pools.
Coding new simulations in C/C++. Very fast, but also difficult!
Integrating libraries for numerical methods.
Integrating libraries for cryptography and networking.
Integrating Intel Cilk or Intel TBB libraries for multi-core parallelization.
Need for high performance architectures: 4-core or 6-core.
13. August 2013
MORE THINGS TO HANDLE
The balancing architecture found so far is good, but more questions arise.
What if packets do not all have the same size? => Balancing with a known packet-size
(continuous?) distribution.
How to handle dynamic station attachment to/detachment from the pool?
Naive simulations show quite interesting (undesired) behaviors. What are the causes?
How to solve these problems?
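The attachment/detachment question can be made concrete with a short experiment. The sketch below assumes, purely for illustration, a naive hash-modulo assignment. It is not the algorithm of the thesis, and the presentation does not say that this is the cause of the undesired behaviors mentioned above. It measures how many packets would change station when the pool grows from 10 to 11 stations.

```python
import hashlib


def station_for(packet_id, station_count):
    """Naive stateless assignment (assumption): hash the id, take the modulo."""
    digest = hashlib.sha256(str(packet_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % station_count


def moved_fraction(packet_count, stations_before, stations_after):
    """Fraction of packets whose computed station changes when the pool is resized."""
    moved = sum(
        1
        for packet_id in range(packet_count)
        if station_for(packet_id, stations_before)
        != station_for(packet_id, stations_after)
    )
    return moved / packet_count


# Attach one station to a pool of 10.
fraction = moved_fraction(10_000, 10, 11)
print(f"packets assigned to a different station: {fraction:.1%}")
```

For a uniform hash, a packet keeps its station only when the hash gives the same remainder modulo 10 and modulo 11, which happens for about 1 in 11 packets. Under this naive mapping most packets would therefore have to be relocated, or could no longer be found by direct addressing, after a single attachment.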