Shared services – the future of HPC & big data facilities for UK research?
Martin Hamilton, Jisc
David Fergusson & Bruno Silva, Francis Crick Institute
Andreas Biternas, King’s College London
Thomas King, Queen Mary University of London
Photo credit: CC-BY-NC-ND Jisc
HPC & Big Data 2016
Shared services for HPC & big data
1. About Jisc
– Who, why and what?
– Success stories
2. Recent developments
3. Personal perspectives & panel discussion
– David Fergusson & Bruno Silva, Francis Crick Institute
– Andreas Biternas, King’s College London
– Thomas King, Queen Mary University of London
1. About Jisc
HPC & Big Data 2016
1. About Jisc
Jisc is the UK higher education, further education and
skills sectors’ not-for-profit organisation for digital
services and solutions. This is what we do:
› Operate shared digital infrastructure and services
for universities and colleges
› Negotiate sector-wide deals, e.g. with IT vendors
and commercial publishers
› Provide trusted advice and practical assistance
1. About Jisc
1. About Jisc
Janet network
[Image credit: Dan Perry]
1. About Jisc
Janet network
[Image credit: Dan Perry]
1. About Jisc
[Diagram: Janet external connectivity map – peerings with content providers, internet exchanges and partner networks (e.g. Netflix, Akamai, Google, Amazon, Microsoft, BBC, GÉANT, LINX, IXManchester, IXLeeds, HEAnet, NHS N3) at 1, 10 and 100 Gbit/s, via nodes in Leeds, Manchester (Telecity Harbour Exchange, Telehouse North & West) and Glasgow & Edinburgh. Total external connectivity ≈ 1 Tbit/s.]
1. About Jisc
Doing more,
for less
1. About Jisc
www.jisc.ac.uk/about/vat-cost-sharing-group
VAT Cost Sharing Group
› Largest in UK
(we believe)
› 93% of HEIs
› 256 institutions
participating
2. Recent developments
HPC & Big Data 2016
2. Recent developments
www.jisc.ac.uk/financial-x-ray
Financial X-Ray
› Easily understand and compare
overall costs for services
› Develop business cases for
changes to IT infrastructure
› Mechanism for dialogue
between finance and IT
departments
› Highlight comparative cost of
shared and commercial third
party services
2. Recent developments
Assent (formerly Project
Moonshot)
› Single, unifying technology that enables
you to effectively manage and control
access to a wide range of web and non-web
services and applications.
› These include cloud infrastructures, High
Performance Computing, Grid Computing
and commonly deployed services such as
email, file store, remote access and
instant messaging
www.jisc.ac.uk/assent
2. Recent developments
Equipment sharing
› Brokered industry access to £60m
public investment in HPC
› Piloting the Kit-Catalogue software,
helping institutions to share details
of high value equipment
› Newcastle University alone is
sharing £16m+ of >£20K value
equipment via Kit-Catalogue
Photo credit: HPC Midlands
http://bit.ly/jiscsharing
2. Recent developments
http://bit.ly/jiscsharing
Equipment sharing
› Working with EPSRC and University of
Southampton to operationalise
equipment.data as a national service
› 45 organisations sharing details of over
12,000 items of equipment
› Conservative estimate: £240m value
› Evidencing utilisation & sharing?
2. Recent developments
Janet Reach:
› £4M funding from BIS to work
towards a Janet which is "open and
accessible" to industry
› Provides industry access to university
e-infrastructure facilities to facilitate
further investment in science,
engineering and technology with the
active participation of business and
industry
› Modelled on Innovate UK
competition process
bit.ly/janetreach
2. Recent developments
Janet Reach:
› £4M funding from BIS to work
towards a Janet which is "open and
accessible" to industry
› Provides industry access to university
e-infrastructure facilities to facilitate
further investment in science,
engineering and technology with the
active participation of business and
industry
› Modelled on Innovate UK
competition process
bit.ly/jisc-hpc
2. Recent developments
Research Data
Management Shared
Service
› Procurement under way
› Aiming to pilot for 24 months
starting this summer
› 13 pilot institutions
› Research Data Network
› Find out more:
researchdata.jiscinvolve.org
2. Recent developments
Research Data
Discovery Service
› Alpha!
› Uses CKAN to aggregate
research data from institutions
› Test system has 16.7K datasets
from 14 organisations so far
› Search and browse:
ckan.data.alpha.jisc.ac.uk
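For readers who want to try the aggregator, CKAN exposes its standard action API (package_search); a minimal sketch is below. The search term and printed fields are illustrative, and the alpha endpoint may change.

```python
# Minimal sketch: query the CKAN-based aggregator via CKAN's action API.
# The base URL is the test system mentioned above; the query term is illustrative.
import requests

BASE = "https://ckan.data.alpha.jisc.ac.uk"  # alpha test aggregator


def search_datasets(query, rows=5):
    """Return (match count, dataset records) using CKAN's package_search action."""
    resp = requests.get(
        f"{BASE}/api/3/action/package_search",
        params={"q": query, "rows": rows},
        timeout=30,
    )
    resp.raise_for_status()
    result = resp.json()["result"]
    return result["count"], result["results"]


if __name__ == "__main__":
    count, hits = search_datasets("genomics")
    print(f"{count} datasets matched")
    for ds in hits:
        print("-", ds.get("title", ds.get("name")))
```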
3. Personal perspectives & panel discussion
HPC & Big Data 2016
3. Personal perspectives
www.jisc.ac.uk/shared-data-centre
3. Personal perspectives
www.emedlab.ac.uk
3. Personal perspectives
› David Fergusson
› Head of Scientific Computing
› Bruno Silva
› HPC Lead
› Francis Crick Institute
eMedLab:
Merging HPC and Cloud for
Biomedical Research
Dr Bruno Silva
eMedLab Service Operations Manager
HPC Lead - The Francis Crick Institute
bruno.silva@crick.ac.uk 01/12/2015
Institutional Collaboration
Research Data
Multidisciplinary research
DIY...
Federated Institutional support
[Diagram: a central eMedLab Ops team surrounded by institutional support teams at each partner institution.]
No funding available for dedicated staff!
Winning bid
• 6048 cores (E5-2695v2)
• 252 IBM Flex servers, each with
• 24 cores
• 512GB RAM per compute server
• 240GB SSD (2x120GB RAID0)
• 2x10Gb Ethernet
• 3:1 Mellanox Ethernet fabric
• IBM GSS26 – Scratch 1.2PB
• IBM GSS24 – General Purpose (Bulk) 4.3PB
• Cloud OS – OpenStack
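As a quick sanity check, the headline capacity follows directly from the per-server figures above (a back-of-the-envelope sketch, not vendor documentation):

```python
# Back-of-the-envelope totals for the winning bid, from the per-server figures above.
servers = 252
cores_per_server = 24
ram_gb_per_server = 512
local_ssd_gb_per_server = 240  # 2 x 120GB in RAID0

total_cores = servers * cores_per_server                        # 6048 cores
total_ram_tb = servers * ram_gb_per_server / 1024               # 126 TB of RAM
total_local_ssd_tb = servers * local_ssd_gb_per_server / 1000   # ~60 TB of local SSD

print(f"{total_cores} cores, {total_ram_tb:.0f} TB RAM, "
      f"{total_local_ssd_tb:.0f} TB local SSD")
```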
Benchmark results
preliminary
• Aggregate HPL (one run per server – embarrassingly parallel)
• Peak: 460 Gflops × 252 ≈ 116 Tflops
• Max – 94%
• Min – 84%
• VM ≈ Bare metal HPL runs (16 core)
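The aggregate peak quoted above is just the per-server HPL peak scaled by the server count; a small sketch reproducing the numbers on this slide (the max/min percentages are interpreted as fractions of the per-server peak):

```python
# Reproduce the aggregate HPL figures quoted on this slide.
servers = 252
peak_per_server_gflops = 460  # one HPL run per server (embarrassingly parallel)

aggregate_peak_tflops = servers * peak_per_server_gflops / 1000
print(f"Aggregate peak: {aggregate_peak_tflops:.0f} Tflops")   # ~116 Tflops

# Efficiency spread across servers, as a fraction of the per-server peak
for label, eff in [("max", 0.94), ("min", 0.84)]:
    print(f"{label}: {eff * peak_per_server_gflops:.0f} Gflops per server")
```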
Benchmark results
preliminary – bare metal only
• Storage throughput
Bulk File System (gpfsperf, GB/s)
             Create   Read             Write
             Seq      Seq     Random   Seq     Random
16M blocks   100      88      131      96      89
512K blocks  –        86      22       97      60

Scratch File System (gpfsperf, GB/s)
             Create   Read             Write
             Seq      Seq     Random   Seq     Random
16M blocks   141      84      107      137     125
512K blocks  –        83      20       137     28
eMedLab Service
eMedLab service
elasticluster
eMedLab Governance &
Support Model
Projects
• Principal Investigator / Project lead
• Reports to eMedLab governance
• Controls who has access to project resources
• Project Systems Administrator
• Institutional resource and / or
• Specialised research team member(s)
• Works closely with eMedLab support
• Researchers
• Those who utilise the software and data available in eMedLab for the project
Governance
[Organogram: MRC eMedLab Project Board (Board); Executive Committee (Exec); Resource Allocation Committee (RAC); Technical Governance Group (TGG); Research Projects; Operations.]
Federated Institutional support
Operations Team Support
(Support to facilitators and Systems Administrators)
Institutional Support
(direct support to research)
Tickets
Training
Documentation
elasticluster
Pilot Projects
Pilot Projects
• Spiros Denaxas - Integrating EHR into i2b2 data marts
Pilot Projects
• Taane Clark – Biobank Data Analysis – evaluation of analysis
tools
Pilot Projects
• Michael Barnes - TranSMART
Pilot Projects
• Chela James - Gene discovery, rapid genome sequencing,
somatic mutation analysis and high-definition phenotyping
[Diagram: OpenStack VM lifecycle – an image (installed OS) plus a flavour (CPU / RAM / disk) produce VM instances 1…N on a network; instances can be started, stopped, held or checkpointed and reached via the Horizon console, SSH (external IP or tunnel), web interfaces, etc.]
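The workflow in the figure above (pick an image and a flavour, attach to a network, boot instances, reach them via Horizon or SSH) is the standard OpenStack pattern; a minimal sketch using the openstacksdk client is shown below. The cloud, image, flavour and network names are placeholders rather than eMedLab values.

```python
# Minimal sketch of the image -> flavour -> instance workflow from the figure,
# using the standard openstacksdk client. All names are placeholders.
import openstack

conn = openstack.connect(cloud="emedlab")  # assumes a clouds.yaml entry of this name

image = conn.compute.find_image("ubuntu-20.04")
flavor = conn.compute.find_flavor("m1.medium")
network = conn.network.find_network("project-net")

server = conn.compute.create_server(
    name="pilot-vm-1",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
server = conn.compute.wait_for_server(server)  # block until the instance is ACTIVE
print(server.name, server.status)
```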
Pilot Projects
• Peter Van Loo – Scalable, Collaborative, Cancer Genomics
Cluster
elasticluster
Pilot Projects
• Javier Herrero - Collaborative Medical Genomics Analysis Using
Arvados
Challenges
[Overview diagram: Support, Integration, Presentation, Performance, Security, Allocation]
Support
Challenges - Support
• High Barrier to entry
• Provide environments that resemble HPC or Desktop, or more intuitive interfaces
• Engender new thinking about workflows
• Promote Planning and Resource management
• Train support staff as well as researchers
• Resource-intensive support
• Promote community-based support and documentation
• Provide basic common tools and templates
• Upskill and mobilise local IT staff in departments
• Move IT support closer to the research project – Research Technologist
Integration
Challenges - Integration
• Suitability of POSIX Parallel file systems for Cloud Storage
• Working closely with IBM
• Copy-on-write feature of SS (GPFS) is quite useful for fast instance creation
• SS has actually quite a lot of the scaffolding required for a good object store
• Presentation of SS or NAS to VMs requires an additional AAAI layer
• Working closely with Red Hat and OCF to deliver IdM
• Presentation of SS to VMs introduces stability problems that could be worked
around with additional SS licenses and some bespoke scripting
• Non-standard Network and Storage architecture
• Additional effort by vendors to ensure a stable, performant, up-to-date
infrastructure – great efforts by everyone involved!
• Network re-design
Performance
Challenges - Performance
• File System Block Re-Mapping
• SS performs extremely well with 16MB blocks – we want to leverage this
• Hypervisor overhead (not all cores used for compute)
• Minimise number of cores “wasted” on cloud management
• On the other hand fewer cores means more memory bandwidth
• VM IO performance potentially affected by virtual network stack
• Leverage features available in the Mellanox NICs such as RoCE, SR-IOV, and
offload capabilities
Challenges – Performance
Block Re-Mapping
• SS (GPFS) is very good at handling many small files – by design
• VMs perform random IO reads and a few writes with their storage
• VM storage (and Cinder storage pools) are very large files on top of GPFS
• VM block size does not match SS (GPFS) block size
Bulk File System (gpfsperf, GB/s)
             Create   Read             Write
             Seq      Seq     Random   Seq     Random
16M blocks   100      88      131      96      89
512K blocks  –        86      22       97      60

Scratch File System (gpfsperf, GB/s)
             Create   Read             Write
             Seq      Seq     Random   Seq     Random
16M blocks   141      84      107      137     125
512K blocks  –        83      20       137     28
Challenges – Performance
Block Re-Mapping
• Idea: turn random into sequential IO (see the conceptual sketch after the table below)
• Have a GPFS standing
Bulk File System (gpfsperf, GB/s)
             Create   Read             Write
             Seq      Seq     Random   Seq     Random
16M blocks   100      88      131      96      89
512K blocks  –        86      22       97      60

Scratch File System (gpfsperf, GB/s)
             Create   Read             Write
             Seq      Seq     Random   Seq     Random
16M blocks   141      84      107      137     125
512K blocks  –        83      20       137     28
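The slides only state the idea, so as one way of picturing "turn random into sequential IO": a log-structured remapping layer appends every block a VM writes to the end of a large backing file (which the underlying file system sees as sequential, large-block IO) and keeps an index of where each virtual block now lives. The sketch below is purely conceptual and is not the eMedLab implementation.

```python
# Conceptual sketch only: a log-structured remapper that turns random VM block
# writes into sequential appends on the backing file system, keeping an
# in-memory index from virtual block number to offset in the log.
# Illustrates the idea on the slide; NOT the eMedLab implementation.

class LogStructuredRemapper:
    def __init__(self, backing_file, block_size=512 * 1024):
        self.block_size = block_size
        self.log = open(backing_file, "ab+")   # append-only log file
        self.index = {}                        # virtual block no. -> offset in log

    def write_block(self, vblock, data):
        """A random write from the VM becomes a sequential append to the log."""
        assert len(data) == self.block_size
        offset = self.log.seek(0, 2)           # current end of log
        self.log.write(data)                   # append (sequential on backing FS)
        self.index[vblock] = offset            # remember latest location

    def read_block(self, vblock):
        """Reads look up the latest location of the virtual block."""
        offset = self.index[vblock]
        self.log.seek(offset)
        return self.log.read(self.block_size)
```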
Presentation
Challenges - Presentation
• Access to eMedLab through VPN only
• Increases security
• Limits upload throughput
• Rigid, non-standard networking
• Immediately provides a secure environment with complete separation
• Projects only need to add VMs to the existing network
• Very inflexible, limits the possibility of a shared ecosystem of “public”
services
• Introduces great administration overheads when creating new projects –
space for improvement
[Diagrams: VM instances 1…N on project/tenant networks (including elasticluster clusters), reached via https://vpn.emedlab.ac.uk; a variant adds a public network and DMZ.]
Security
Challenges - Security
• Presentation of SS shared storage to VMs raises security concerns
• VMs will have root access – even with root squash, a user can sidestep identity
• Re-export SS with a server-side authentication NAS protocol
• Alternatively, abstract shared storage with another service such as iRODS
• Ability of OpenStack users to maintain security of VMs
• Particularly problematic when deploying “from scratch” systems
• A competent, dedicated PSA mitigates this
Allocation
Challenges - Allocation
• Politics and Economics of “unscheduled” cloud
• Resource allocation in rigid portions of infrastructure (large, medium, small)
• Onus of resource utilisation is with Project team
• A charging model may have to be introduced to promote good behaviour
• The infrastructure supplier does not care about efficiency, as long as cost is recovered
• Scheduling over unallocated portions of infrastructure may help maximise utilisation
• Benefits applications that function as Directed Acyclic Graphs (DAGs) – see the sketch after this list
• Private cloud is finite and limited
• Once it is fully allocated, projects will be on a waiting list, rather than a queue
• Cloud bursting can “de-limit” the cloud, if funding permits it
• This would be a talk on its own.
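To make the DAG point concrete: a workload expressed as a Directed Acyclic Graph decomposes into tasks that can be dispatched into whatever unallocated capacity happens to be free, which is why such applications benefit from opportunistic scheduling. A toy sketch with a hypothetical task graph (not an eMedLab scheduler):

```python
# Toy sketch: run a DAG of tasks opportunistically, dispatching any task whose
# dependencies are complete whenever spare capacity ("slots") is available.
# The task graph and slot count are hypothetical, for illustration only.

deps = {                       # task -> tasks it depends on
    "align": [],
    "call_variants": ["align"],
    "annotate": ["call_variants"],
    "qc_report": ["align"],
}


def opportunistic_order(deps, free_slots=2):
    done, batches = set(), []
    remaining = dict(deps)
    while remaining:
        ready = [t for t, d in remaining.items() if set(d) <= done]
        batch = ready[:free_slots]          # take only what spare capacity allows
        if not batch:
            raise ValueError("cycle detected or unsatisfiable dependency")
        batches.append(batch)
        done.update(batch)
        for t in batch:
            remaining.pop(t)
    return batches


print(opportunistic_order(deps))
# e.g. [['align'], ['call_variants', 'qc_report'], ['annotate']]
```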
Future Developments
Future Developments
• VM and Storage performance analysis
• Create optimal settings recommendations for Project Systems Administrators and Ops team
• Revisit Network configuration
• Provide a simpler, more standard OpenStack environment
• Simplify service delivery, account creation, other administrative tasks
• Research Data Management for Shared Data
• Could be a service within the VM services ecosystem
• IRODS is a possibility
• Explore potential of Scratch
• Integration with Assent (Moonshot tech)
• Access to infrastructure through remote credentials and local authorisation
• First step to securely sharing data across sites (Safe Share project)
Conclusions
• eMedLab is ground-breaking in terms of:
• Institutional collaboration around a shared infrastructure
• Federated support model
• Large scale High Performance Computing Cloud (it can be done!)
• Enabling large-scale, highly customisable workloads for biomedical research
• Linux cluster still required (POSIX legacy applications)
• SS guarantees this flexibility at very high performance
• We can introduce Bare Metal (Ironic) if needed for a highly versatile platform
• Automated scheduling of granular workloads
• Can be done inside the Cloud
• True Partnership – OCF, Red Hat, IBM, Lenovo, and Mellanox
• Partnership working very well
• All vendors highly invested in eMedLab’s success
The Technical Design Group
• Mike Atkins – UCL (Project Manager)
• Andy Cafferkey – EBI
• Richard Christie – QMUL (Chair)
• Pete Clapham – Sanger
• David Fergusson – the Crick
• Thomas King – QMUL
• Richard Passey – UCL
• Bruno Silva – the Crick
Institutional Support Teams
UCL:
Facilitator: David Wong
PSA: Faruque Sarker
Crick:
Facilitator: David Fergusson/Bruno Silva
PSA: Adam Huffman, Luke Raimbach, John Bouquiere
LSHTM:
Facilitator: Jackie Stewart
PSA: Steve Whitbread, Kuba Purebski
Institutional Support Teams
Sanger:
Facilitator: Tim Cutts, Josh Randall
PSA: Peter Clapham, James Beal
EMBL-EBI:
Facilitator: Steven Newhouse/Andy Cafferkey
PSA: Gianni Dalla Torre
QMUL:
Tom King
Operations Team
Thomas Jones (UCL) Pete Clapham (Sanger)
William Hay (UCL) James Beale (Sanger)
Luke Sudbery (UCL)
Tom King (QMUL)
Bruno Silva (Ops Manager, Crick)
Adam Huffman (Crick) Andy Cafferkey (EMBL-EBI)
Luke Raimbach (Crick) Rich Boyce (EMBL-EBI)
Stefan Boeing (Data Manager, Crick) David Ocana (EMBL-EBI)
I’ll stop here…
Thank You!
[Appendix diagrams: the OpenStack VM lifecycle figure (image + flavour → VM instances 1…N, Horizon console, SSH access); tenant networks with OpenStack Cinder block storage (single-VM access) and a connection to the Internet; project/tenant networks (elasticluster) behind https://vpn.emedlab.ac.uk.]
[Excerpts from the original eMedLab proposal figures: "…example research themes to be studied in the Academy Labs; by exploiting the commonalities underlying the datasets, we shall build tools and algorithms that cut across the spectrum of diseases."
Fig 1 – layered architecture: genomic, imaging and clinical datasets for cancer, rare and cardiovascular diseases; tools & analytics; access to infrastructure; storage, compute, security and networking; with information-flow links to partners and resources including GSK, DDN, Intel, IBM, Aridhia, the Farr Institute, Genomics England, UCLH BRC, ELIXIR, ENCODE, 1000 Genomes and Ensembl, plus proposed and external funding.
Fig 3 – a private, secure collaborative space in which eMedLab, EBI, FARR@UCLP, King’s Health Partners and partner projects share data and resources.]
Winning bid
• Standard Compute cluster
• Ethernet network fabric
• Spectrum Scale storage
• Cloud OS
Initial requirements
• Hardware geared towards very high data throughput work – capable of
running an HPC cluster and a Cloud based on VMs
• Cloud OS (open source and commercial option)
• Tiered storage system for:
• High performance data processing
• Data Sharing
• Project storage
• VM storage
Bid responses – interesting facts
• Majority providing OpenStack as the Cloud OS
• Half included an HPC and a Cloud environment
• One provided a VMware-based solution
• One provided an OpenStack-only solution
• Half of the tender responses offered Lustre
• One provided Ceph for VM storage
3. Personal perspectives
› Andreas Biternas
› HPC & Linux Lead
› King’s College London
Challenges of having a server farm in the centre of London
Andreas Biternas
Faculty of Natural and Mathematical Sciences
King’s College HPC infrastructure in the JISC DC
• Cost of space: roughly £25k per square metre in Strand;
• Power:
• Expensive switches and UPS which require annual maintenance;
• Unreliable power supply due to high demand in the centre of London;
• Cooling:
• Expensive cooling system, similar to the one in the Virtus DC;
• High cost of running and maintaining the system;
• Weight: due to the age of the building there are strict weight restrictions,
as an auditorium is below the server farm (!);
• Noise pollution: there is strong noise from the server farm up to 2 floors
below;
Problems and costs of having a server farm on the Strand campus
King’s college infrastructure in Virtus DC
• In total, 25 cabinets out of the ~200 racks in Data Hall 1:
• 16 cabinets HPC cluster ADA+Rosalind;
• Rest King’s Central IT infrastructure: fileservers, firewalls etc.;
• Rosalind, a consortium between the Faculty of Natural and
Mathematical Sciences, the South London and Maudsley NHS
Foundation Trust BRC (Biomedical Research Centre) and
Guy’s and St Thomas’ NHS Foundation Trust BRC;
• Rosalind has around 5000 cores, ~150 Teraflops,
HPC and Cloud part using OpenStack;
Features of Virtus Datacentre
• Power:
• Two redundant central power connections;
• UPS & onsite power generator;
• Two redundant PSUs in each rack;
• Cooling:
• Chilled water system cooled via fresh air;
• Configured as hot and cold aisles;
• Services:
• Remote hands;
• Installation and maintenance;
• Office, storage space and wifi;
• Secure access control environment;
• Better internet connection;
• No “single” connections;
• Fully resilient network;
• The bandwidth requirements of
large data sets were being met;
Connectivity with Virtus Datacentre
• Due to the contract with JISC, tenants (Francis Crick Institute,
Queen Mary University, King’s College etc.) have special rates;
• Costs:
• Standard fee for each rack which includes costs of space, cooling,
connectivity etc.;
• Power consumed from each rack at normal market (education) prices;
Costs of Virtus Datacentre
3. Personal perspectives
› Thomas King
› Head of Research Infrastructure
› Queen Mary University of London
Queen Mary
University of London
Tom King
Head of Research Infrastructure, IT Services
Who are we?
• 20,000 students and 4,000 staff
• 5 campuses in London
• 3 faculties
Humanities & Social Sciences
Science & Engineering
Barts & the London School of Medicine and Dentistry
Copyright Tim Peake, ESA, NASA
Old World IT
• Small central provision
• Lots of independent teams offering a lot of overlap in services and
bespoke solutions
• 21 machine rooms
IT Transformation Programme 2012-15
• Centralisation of staff and services (~200 people)
• Consolidation into two data centres
• On-site ~20 racks
• Off-site facility within fibre channel latency distance
• Highly virtualised environment
• Enterprise services run in active-active
• JISC Janet6 upgrades
Research IT
• Services we support –
HPC
Research Storage
Hardware hosting
Clinical and secure systems
• Enterprise virtualisation is not what we’re after
• Five nines is not our issue – bang for buck
• No room at the inn
• Build our own on-site?
• The OAP home
Benefits of shared data centre
• Buying power and tenants’ association
• Better PUE than smaller on-site DC
contribution to sustainability commitment
• Transparent costing for power use
• Network redundancy – L2 and L3 of JISC network
• Collaboration – it’s all about the data
• Cloudier projects
Emotional detachment from blinking LEDs
Direction of funding – GridPP, Environmental omics Cloud
That’s all folks…
Except where otherwise noted, this
work is licensed under CC-BY
Martin Hamilton
Futurist, Jisc, London
@martin_hamilton
martin.hamilton@jisc.ac.uk
HPC & Big Data 2016