1. Hungarian ClusterGrid and its applications
Szalai Ferenc, NIIF Institute
szferi@niif.hu
http://www.clustergrid.hu
http://gug.grid.niif.hu
2. History
● Early 2002: project start, design phase
● Jul 2002: Condor-based production system started
● Early 2003: Condor replaced at the grid level by our own middleware
● Jul 2003: development of new operating-system-level services started
● Nov 2005: development of the new-generation ClusterGrid middleware (Grid Underground) started
● Dec 2005: migration to the new operating-system-level services
● Feb 2006: switch to the new middleware
3. ClusterGrid Architecture
● ClusterGrid is a collection of individual clusters and supercomputers, integrated by grid middleware
● Network-oriented architecture
– uses MPLS VPN and 802.1Q techniques for separation -> „small private internet”
● Central services virtualized with Xen:
– root DNS, monitoring (Munin, Nagios), Debian repository mirror, entry points
● Distributed storage
● Clusters: standard diskless Beowulf architecture based on the Debian GNU/Linux distribution
5. Storage
● Goal
– build a national distributed storage infrastructure for grid, HPC and disaster-recovery backup
– reach at least 100 TB, and be clever and cheap :)
● Solution:
– IP (iSCSI) and/or Ethernet (AoE) based storage elements
– intelligent grid storage management using grid services
● Current state: AoE-based storage at two sites (NIIF, SZTAKI), 22 TB
6. ClusterGrid applications
● Hundreds of registered users, dozens of them active
● Over 80% utilization
● Mainly parameter-scanning applications from:
– bioinformatics, statistical physics, information science, biochemistry, etc.
● Main problem: users are not familiar with managing huge numbers of jobs, with large-scale data processing, or with porting applications to different platforms (Solaris, Linux, etc.)
● Strong user support integrated with the normal helpdesk
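Parameter scanning means running the same program over a grid of input values, with every combination submitted as an independent job. The Python sketch below shows how even a small grid expands into many job specifications; the function name `parameter_sweep`, the `--name=value` argument convention and the job dictionary layout are hypothetical illustrations, not part of ClusterGrid.

```python
import itertools


def parameter_sweep(grid, prefix="sweep"):
    """Expand {parameter: [values]} into one independent job spec per combination."""
    names = sorted(grid)  # fixed order so job numbering is reproducible
    jobs = []
    for index, values in enumerate(itertools.product(*(grid[n] for n in names))):
        params = dict(zip(names, values))
        jobs.append({
            "name": f"{prefix}-{index:04d}",
            "params": params,
            # Hypothetical convention: pass each parameter as --name=value.
            "args": [f"--{n}={v}" for n, v in params.items()],
        })
    return jobs


if __name__ == "__main__":
    jobs = parameter_sweep({"temperature": [1.0, 2.0], "seed": [1, 2, 3]})
    print(len(jobs))        # 6
    print(jobs[0]["args"])  # ['--seed=1', '--temperature=1.0']
```

The number of jobs is the product of the value-list lengths, so a few parameters with tens of values each already yield thousands of jobs, which is exactly the bookkeeping burden the slide describes.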
7. Use case: Virtual Screening
● Virtual screening: find candidate molecules for new drugs using a brute-force technique
● Large virtual screening run: find molecules for the human histamine receptor 4 (HHR4)
– known since 2001, a member of the GPCR family
● Databases used: 8 million molecules
● Uses a closed-source, binary-only, licensed application: FlexX (Biosolveit.de)
● Total calculation time: 2 months
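A back-of-the-envelope estimate puts the run in perspective. It assumes the 2 months are wall-clock time, taken as 60 days, and introduces a hypothetical average docking time per molecule, t_dock; only the 8 million molecules and the 2 months come from the slide. The second line is Little's law: the average number of busy CPUs equals the throughput times the time each molecule occupies a CPU.

```latex
\[
r = \frac{8 \times 10^{6}\ \text{molecules}}{60\ \text{days} \times 86\,400\ \text{s/day}}
  = \frac{8 \times 10^{6}}{5.184 \times 10^{6}\ \text{s}}
  \approx 1.54\ \text{molecules/s}
\]
\[
N_{\text{CPU}} \approx r \cdot t_{\text{dock}},
\qquad \text{e.g. } t_{\text{dock}} = 60\ \text{s} \;\Rightarrow\; N_{\text{CPU}} \approx 93
\]
```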
8. Grid UnderGround
● New-generation ClusterGrid middleware
● Used in the production system since Feb 2006
● Design goals:
– pure web-service-based framework (no WSRF)
– uses selected GGF and W3C standards
– simplify service development
– focus on core services (information, storage, job management, security, monitoring)
– KISS: Keep It Simple, Stupid
– suitable for both desktops and HPC: low memory and CPU usage
– open source development (http://www.sourceforge.net/projects/gug)
9. GUG Architecture
● Pure Python framework:
– the framework runs as a single daemon
– manages threads
– handles network communication over HTTP(S)/SOAP
– every service is a dynamically loadable plugin of the framework; services use backends to separate interfaces from functionality
● Mandatory services:
– Manager service: manages the simple lifecycle of the other services; remote management is also possible
– Grid Information System: a p2p system that routes advertisements and service descriptions (better than UDDI)
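The plugin idea can be illustrated with a small Python sketch: a Manager imports service modules at run time with `importlib`, tracks a simple loaded/started/stopped lifecycle, and hands each service a backend callable that does the real work. The `Service` class convention, the `echo_service` module and all method names are invented for this example; they are not GUG's actual plugin API, and the HTTP(S)/SOAP transport is omitted.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path


class Manager:
    """Hypothetical GUG-style Manager: loads service plugins and tracks their lifecycle."""

    def __init__(self):
        self.services = {}  # service name -> service instance
        self.state = {}     # service name -> "loaded" | "started" | "stopped"

    def load(self, name, module_name, backend=None):
        # Import the plugin at run time; by this sketch's convention every
        # plugin module exposes a class called `Service`.
        module = importlib.import_module(module_name)
        self.services[name] = module.Service(backend)
        self.state[name] = "loaded"

    def start(self, name):
        self.services[name].start()
        self.state[name] = "started"

    def stop(self, name):
        self.services[name].stop()
        self.state[name] = "stopped"

    def call(self, name, method, *args):
        # In GUG this dispatch would sit behind HTTP(S)/SOAP; here it is a direct call.
        if self.state.get(name) != "started":
            raise RuntimeError(f"service {name!r} is not running")
        return getattr(self.services[name], method)(*args)


# Source of a throwaway plugin: the Service class is the interface,
# the backend callable it receives does the actual work.
PLUGIN_SOURCE = textwrap.dedent('''
    class Service:
        def __init__(self, backend):
            self.backend = backend or (lambda text: text)

        def start(self):
            pass

        def stop(self):
            pass

        def echo(self, text):
            return self.backend(text)
''')


def demo():
    """Write the plugin to a temporary directory, then load and start it."""
    plugin_dir = tempfile.mkdtemp()
    Path(plugin_dir, "echo_service.py").write_text(PLUGIN_SOURCE)
    sys.path.insert(0, plugin_dir)
    importlib.invalidate_caches()  # make the freshly written module importable
    manager = Manager()
    manager.load("echo", "echo_service", backend=str.upper)
    manager.start("echo")
    return manager


if __name__ == "__main__":
    manager = demo()
    print(manager.call("echo", "echo", "hello grid"))  # HELLO GRID
```

Swapping `str.upper` for another callable changes what the service does without touching its interface, which is the interface/backend split the slide describes.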
11. GUG Core Services
● VOService (security)
– every entity is identified by an X.509 certificate
– every VO should set up at least one VO service
– manages authorization information, organized into a tree
– manages VO membership like a mailing list
● Job management components
– Exec: runs and manages jobs on SMP systems (useful on desktops)
– Job Controller: uses the GGF BES interface and GGF JSDL; interfaces with common LRMSs (e.g. Condor, Exec), no scheduling
– SuperScheduler: uses the same interface and data model as the Job Controller; it is a grid-level scheduler
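For readers unfamiliar with JSDL, the following Python sketch builds a minimal job description with the standard library's `xml.etree.ElementTree`. The namespace URIs and element names follow the JSDL 1.0 specification (GFD.56) and its POSIX application extension; the helper name `make_jsdl`, its parameters and the default output file names are hypothetical, and whether GUG's Job Controller accepts exactly this minimal subset is an assumption.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by JSDL 1.0 (GFD.56) and its POSIX application extension.
JSDL = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"
ET.register_namespace("jsdl", JSDL)
ET.register_namespace("jsdl-posix", POSIX)


def make_jsdl(job_name, executable, arguments=(), stdout="stdout.txt", stderr="stderr.txt"):
    """Return a minimal JSDL JobDefinition (as a string) for a POSIX executable."""
    job_def = ET.Element(f"{{{JSDL}}}JobDefinition")
    desc = ET.SubElement(job_def, f"{{{JSDL}}}JobDescription")
    ident = ET.SubElement(desc, f"{{{JSDL}}}JobIdentification")
    ET.SubElement(ident, f"{{{JSDL}}}JobName").text = job_name
    app = ET.SubElement(desc, f"{{{JSDL}}}Application")
    posix = ET.SubElement(app, f"{{{POSIX}}}POSIXApplication")
    # Child order follows the schema: Executable, Argument*, Output, Error.
    ET.SubElement(posix, f"{{{POSIX}}}Executable").text = executable
    for arg in arguments:
        ET.SubElement(posix, f"{{{POSIX}}}Argument").text = arg
    ET.SubElement(posix, f"{{{POSIX}}}Output").text = stdout
    ET.SubElement(posix, f"{{{POSIX}}}Error").text = stderr
    return ET.tostring(job_def, encoding="unicode")


if __name__ == "__main__":
    print(make_jsdl("testjob", "/bin/echo", ["hello", "grid"]))
```

Saving the returned string to a file (for example with `pathlib.Path("testjob.jsdl").write_text(...)`) gives a document of the kind passed to `grid job submit` in the command-line example.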
12. GUG Core Services
● Storage management components:
– file-based architecture
– Storage Controller: stores and returns files using a transport-independent protocol like SRM
– ShareDirectory: directory and file sharing (same interface as the Storage Controller)
– File System Service: metadata catalog
– Storage Manager: provides a POSIX-like interface (mkdir, ls, mv, cp, etc.), creates replicas on Storage Controllers, and manages file system entity types (file, directory, shared directory, etc.) as plugins
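The division of labour between the Storage Manager, the File System Service and the Storage Controllers can be sketched in Python: a metadata catalog maps POSIX-like paths to entries, and file data is written as replicas onto the least-loaded controllers. Everything here (class names, the replica-placement rule, the in-memory dictionaries, absolute paths without trailing slashes) is a hypothetical simplification rather than GUG's implementation; SRM-style transfer, ShareDirectory and `mv` are left out.

```python
import itertools
import posixpath


class StorageController:
    """Hypothetical stand-in for a Storage Controller: keeps file data by ID."""

    def __init__(self, name):
        self.name = name
        self.blobs = {}  # file ID -> bytes

    def put(self, file_id, data):
        self.blobs[file_id] = data

    def get(self, file_id):
        return self.blobs[file_id]


class StorageManager:
    """POSIX-like namespace on top of replicated Storage Controllers."""

    def __init__(self, controllers, replicas=2):
        self.controllers = controllers
        self.replicas = replicas
        # Metadata catalog (the File System Service's role): path -> entry.
        # The "type" field stands in for pluggable entity types.
        self.catalog = {"/": {"type": "directory"}}
        self._ids = itertools.count()

    def _check_parent(self, path):
        parent = posixpath.dirname(path)
        if self.catalog.get(parent, {}).get("type") != "directory":
            raise FileNotFoundError(parent)

    def mkdir(self, path):
        self._check_parent(path)
        self.catalog[path] = {"type": "directory"}

    def put(self, path, data):
        self._check_parent(path)
        file_id = f"f{next(self._ids)}"
        # Place replicas on the controllers currently holding the fewest files.
        targets = sorted(self.controllers, key=lambda c: len(c.blobs))[: self.replicas]
        for controller in targets:
            controller.put(file_id, data)
        self.catalog[path] = {"type": "file", "id": file_id,
                              "replicas": [c.name for c in targets]}

    def get(self, path):
        entry = self.catalog[path]
        # Any surviving replica is enough to serve the read.
        for controller in self.controllers:
            if controller.name in entry["replicas"] and entry["id"] in controller.blobs:
                return controller.get(entry["id"])
        raise OSError(f"no replica of {path} is available")

    def cp(self, src, dst):
        self.put(dst, self.get(src))

    def ls(self, path):
        prefix = path.rstrip("/") + "/"
        names = []
        for p in self.catalog:
            rest = p[len(prefix):]
            if p.startswith(prefix) and rest and "/" not in rest:
                names.append(rest)
        return sorted(names)


if __name__ == "__main__":
    sm = StorageManager([StorageController(n) for n in ("a", "b", "c")])
    sm.mkdir("/grid")
    sm.mkdir("/grid/tmp")
    sm.put("/grid/tmp/szoveg", b"12345678")
    sm.cp("/grid/tmp/szoveg", "/grid/tmp/szoveg.1")
    print(sm.ls("/grid/tmp"))  # ['szoveg', 'szoveg.1']
```

Because every file has copies on more than one controller, losing a single controller does not make the file unreadable; the catalog only needs to know where the replicas live.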
13. GUG Services and UI
● Additional services:
– Compiler service: creates binaries from source for all available platforms, using the job management components
– SNMP-based monitoring coming soon
● User Interface:
– modular command-line interface, the 'grid' command:
$ grid storage ls /grid/tmp R
/grid/tmp:
d 20060412 14:04 proba
/grid/tmp/proba:
8 20060412 14:05 szoveg
8 20060412 14:06 szoveg.1
8 20060412 14:06 masnev
$ grid job submit testjob.jsdl
– graphical and web interfaces coming soon
14. Future: KnowARC
● EU-funded FP6 project
● Goal: create the best lightweight, interoperable, standards-based grid middleware ever, with strong industrial support
● Solution: merge the best features of NorduGrid ARC and GUG
● Partners: Oslo Univ. (NO), Lund Univ. (SE), Uppsala (SE), Lübeck (DE), NBI (DK), SUN (HU), s + c ag (DE), Geneva Hospital (CH), Josif Safarik Inst. (SK), NIIF
● More information: http://www.knowarc.eu