OpenNebula Conf | Lightning talk: Managing a Scientific Computing Facility with OpenNebula by Sara Vallero
1. OpenNebula Conf - December 2-4, 2014 - Berlin
Managing a Scientific Computing Facility
with OpenNebula
Sara Vallero
on behalf of the INFN-Torino computing team
The present work is partially funded under contract 20108T4XTM of
Programmi di Ricerca Scientifica di Rilevante Interesse Nazionale (Italy).
2. The INFN Torino Computing Centre
INFN: Italian National Institute for Nuclear Physics
• fundamental physics studies
• several units in major Italian cities
Torino Unit
STORAGE RESOURCES
• 1600 TB (gross) total
COMPUTATIONAL RESOURCES
• 69 hypervisors (KVM)
• 1200 job-slots
• 200 virtual machines
LAN/WAN
• 10Gbps links
Cloud project started in 2009
Stakeholders
• WLCG grid Tier2 (primarily for the ALICE experiment at CERN)
• grid Tier2 for the BESIII experiment at IHEP Beijing
• Computing for upcoming experiments:
  • PANDA at FAIR Darmstadt
  • Belle-2 at KEK Tsukuba
• Virtual Analysis Facility for ALICE (interactive analysis, elasticity)
• Medical Image Processing (local research group)
• Theory (local research group)
• Virtual farms on-demand
3. Two clusters for different VM classes
SERVICES-CLASS VMs (pets)
• provide critical services
• in/out-bound connectivity
• live migration
• server-class hardware
• no particular local disk I/O requirements
• shared image repository
• resiliency-optimized file system for shared system disks (RAID1)
[Diagram: storage layout, with Image Repository Servers and Datastores serving the Services Cluster and the Workers Cluster]
WORKERS-CLASS VMs (cattle)
• computational work-force (e.g. grid worker nodes)
• private IP only
• high storage I/O performance
• lower-class hardware
• locally cached image repository for fast start-up
• performance-optimized file system
[Diagram: Gluster replicated volume providing the shared datastore for running VMs and a cache datastore for the image repository]
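For illustration only, here is a minimal sketch of how a GlusterFS-backed datastore like the one in the diagram could be registered through OpenNebula's XML-RPC API. The endpoint, credentials, volume name and datastore name are placeholders rather than the actual Torino setup; the attribute names (DS_MAD, TM_MAD, DISK_TYPE=GLUSTER, GLUSTER_HOST, GLUSTER_VOLUME) follow the OpenNebula 4.x GlusterFS datastore guide.

```python
import xmlrpc.client

# Placeholders: oned endpoint, credentials and Gluster volume are examples,
# not the actual Torino configuration.
ONE_ENDPOINT = "http://one-frontend.example.org:2633/RPC2"
SESSION = "oneadmin:password"          # "user:password" session string

server = xmlrpc.client.ServerProxy(ONE_ENDPOINT)

# Image datastore backed by a GlusterFS replicated volume, with attributes
# as in the OpenNebula GlusterFS datastore guide (QEMU then accesses the
# disks through libgfapi).
ds_template = """
NAME           = "gluster_workers"
DS_MAD         = fs
TM_MAD         = shared
DISK_TYPE      = GLUSTER
GLUSTER_HOST   = gluster-server.example.org:24007
GLUSTER_VOLUME = one_workers
"""

# one.datastore.allocate(session, template, cluster_id); -1 = default cluster
rc = server.one.datastore.allocate(SESSION, ds_template, -1)
print("Datastore ID" if rc[0] else "Error:", rc[1])
```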
4. Current and planned activities
(move to new OpenNebula tools)
1. Toolkit for virtual farm on-demand provisioning
• Virtual Routers (OpenWRT appliances)
• Elastic public IP
• iSCSI datastore for persistent disk space
• EC2 interface
• CloudInit contextualisation
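As an illustration of the CloudInit-based contextualisation above, the following is a hedged sketch (not the actual Torino toolkit) of instantiating a VM template over the XML-RPC API while injecting cloud-init user data through the CONTEXT section. Endpoint, template ID, VM name and the cloud-config payload are assumptions; cloud-init's OpenNebula datasource reads the USER_DATA attribute.

```python
import base64
import xmlrpc.client

ONE_ENDPOINT = "http://one-frontend.example.org:2633/RPC2"   # placeholder
SESSION = "oneadmin:password"                                # placeholder
TEMPLATE_ID = 42                                             # hypothetical VM template

server = xmlrpc.client.ServerProxy(ONE_ENDPOINT)

# Illustrative cloud-config payload picked up by cloud-init's OpenNebula datasource
user_data = """#cloud-config
runcmd:
  - echo "node configured by cloud-init" > /etc/motd
"""

# Extra template merged at instantiation time: CONTEXT carrying base64 user data
extra = """
CONTEXT = [
  NETWORK           = "YES",
  USER_DATA         = "{data}",
  USERDATA_ENCODING = "base64"
]
""".format(data=base64.b64encode(user_data.encode()).decode())

# one.template.instantiate(session, id, name, on_hold, extra_template, persistent)
# (the exact argument list varies slightly across OpenNebula versions)
rc = server.one.template.instantiate(
    SESSION, TEMPLATE_ID, "farm-node-0", False, extra, False)
print("VM ID" if rc[0] else "Error:", rc[1])
```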
2. Elasticity
• automatic reallocation of VMs according to the application's needs (wherever appropriate)
• caveat: this works only in the infinite-resources approximation, while we usually run at saturation
• in place only for the Virtual Analysis Facility so far
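The Virtual Analysis Facility uses its own elasticity tooling; purely to illustrate the idea of reallocating VMs with demand, here is a hedged sketch of a scale-up/scale-down loop against the XML-RPC API. The load hooks, thresholds, template ID and VM IDs are hypothetical.

```python
import xmlrpc.client

ONE_ENDPOINT = "http://one-frontend.example.org:2633/RPC2"   # placeholder
SESSION = "oneadmin:password"                                # placeholder
WORKER_TEMPLATE_ID = 7                                       # hypothetical

server = xmlrpc.client.ServerProxy(ONE_ENDPOINT)

def pending_jobs():
    """Hypothetical hook into the batch system (e.g. queued analysis jobs)."""
    return 12

def idle_worker_ids():
    """Hypothetical hook returning VM IDs of workers with no running jobs."""
    return [301, 302]

queue = pending_jobs()
if queue > 10:
    # Scale up: instantiate one more worker from the registered template.
    server.one.template.instantiate(SESSION, WORKER_TEMPLATE_ID,
                                    "elastic-worker", False, "", False)
elif queue == 0:
    # Scale down: stop idle workers ("shutdown" on the 4.x series of the time,
    # "terminate" on OpenNebula 5 and later).
    for vm_id in idle_worker_ids():
        server.one.vm.action(SESSION, "shutdown", vm_id)
```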
3. National federated cloud for scientific computing
• upcoming INFN-wide project mostly based on OpenStack
• need to interoperate with OpenStack-based geographical services (e.g. Keystone)
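For the Keystone interoperation mentioned above, a minimal sketch of requesting a token from an OpenStack Keystone v2.0 endpoint with only the Python standard library; the endpoint, tenant and credentials are placeholders.

```python
import json
import urllib.request

# Placeholder Keystone v2.0 endpoint and credentials (not a real service).
KEYSTONE_URL = "https://keystone.example.org:5000/v2.0/tokens"

payload = {
    "auth": {
        "tenantName": "infn-torino",
        "passwordCredentials": {"username": "cloud-user", "password": "secret"},
    }
}

req = urllib.request.Request(
    KEYSTONE_URL,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# The scoped token is then passed as X-Auth-Token to other OpenStack services.
token = body["access"]["token"]["id"]
print("Got token:", token[:8], "...")
```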
4. Monitoring as a service
• based on the ELK stack (ElasticSearch, Logstash, Kibana)
• uniform monitoring interface for applications/infrastructure
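As a toy example of the monitoring-as-a-service idea: pushing a document into Elasticsearch over its REST API with the standard library (in a real ELK pipeline, events would normally flow through Logstash first). The host, index and document fields are assumptions.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Placeholder Elasticsearch node and index; Logstash would usually sit in front.
ES_URL = "http://elasticsearch.example.org:9200/infra-monitoring/event"

doc = {
    "@timestamp": datetime.now(timezone.utc).isoformat(),
    "host": "hypervisor-01",          # hypothetical hypervisor name
    "metric": "running_vms",
    "value": 42,
}

req = urllib.request.Request(
    ES_URL,
    data=json.dumps(doc).encode(),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    print("Indexed:", json.load(resp).get("_id"))
```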