2. Where we came from (2003)
- 32 dual-core compute nodes
- 32 * 2 != 64
- Writing MPI code is hard!
- Data storage over NFS to “master” node
- “Rocks” cluster distro
- Revolutionary at the time!
3. Where we came from (2010)
- Most of the original cluster removed
- Replaced with a single Dell PowerEdge R910
- 64 cores, 128 GB RAM, 8 TB storage
- Threading is easier* than MPI!
- Data is local
- Easier to manage!
4. To infinity and beyond (2013)
- A little bit back to the “old” model
- Mixture of “thin” and “thick” nodes
- Networked storage
- Pure CentOS
- Supermicro boxen
- Pretty exciting!
8. Scaling out storage with GlusterFS
- Developed by Gluster, Inc. (now part of Red Hat)
- Abstracts backend storage (file systems, technology, etc)
- Can do replicate, distribute, replicate+distribute, geo-replication (off site!), etc (see sketch below)
- Scales “out”, not “up”
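To make that concrete, a sketch of creating a two-way replicated volume: the volume name and servers match the df output on the next slide, but the brick paths and root prompt are assumptions.

[root@wingu0: ~]# gluster volume create homes replica 2 \
    wingu0:/bricks/homes wingu1:/bricks/homes
[root@wingu0: ~]# gluster volume start homes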
9. How we use GlusterFS
[aorth@hpc: ~]$ df -h
Filesystem     Size  Used Avail Use% Mounted on
wingu1:/homes   31T  9.5T   21T  32% /home
wingu0:/apps    31T  9.5T   21T  32% /export/apps
wingu1:/data    31T  9.5T   21T  32% /export/data
- Persistent paths for homes, data, and applications across the cluster
- These volumes are replicated, so essentially application-layer RAID1
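For reference, mounting these with the GlusterFS FUSE client from /etc/fstab could look roughly like this (the mount options are a plausible sketch, not necessarily our exact configuration):

# GlusterFS FUSE mounts (sketch)
wingu1:/homes   /home          glusterfs   defaults,_netdev   0 0
wingu0:/apps    /export/apps   glusterfs   defaults,_netdev   0 0
wingu1:/data    /export/data   glusterfs   defaults,_netdev   0 0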
11. SLURM
- Project from Lawrence Livermore National Laboratory (LLNL)
- Manages resources
- Users request CPU, memory, and node allocations
- Queues / prioritizes jobs, logs usage, etc
- More like an accountant than a bouncer
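A minimal example of such a resource request (the flag values are arbitrary and the node name is illustrative):

[aorth@hpc: ~]$ srun --nodes=1 --cpus-per-task=4 --mem=8G hostname
compute0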
13. How we use SLURM
- Can submit “batch” jobs (long-running jobs, invoking a program many times with different variables, etc); see the sketch after this list
- Can run “interactively” (something that needs keyboard interaction)
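As promised above, a sketch of a batch script; the job name, resource numbers, and BLAST invocation are assumptions (the module name comes from a later slide):

#!/usr/bin/env bash
#SBATCH --job-name=blastn-example   # hypothetical job name
#SBATCH --cpus-per-task=4           # CPUs to request
#SBATCH --mem=8G                    # memory to request
#SBATCH --output=blastn-%j.log      # %j expands to the job ID

# Load the application via environment modules (see later slides)
module load blast/2.2.28+

# Use however many CPUs SLURM actually granted
blastn -query query.fa -db nt -num_threads "$SLURM_CPUS_PER_TASK"

Submit it with “sbatch blastn-example.sh”.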
Make it easy for users to do the “right thing”:
[aorth@hpc: ~]$ interactive -c 10
salloc: Granted job allocation 1080
[aorth@compute0: ~]$
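Note: “interactive” is a site-local wrapper, not part of SLURM itself. A minimal sketch, assuming it just forwards a CPU count to salloc and starts a shell on the allocated node:

#!/usr/bin/env bash
# Hypothetical sketch of the site-local "interactive" wrapper
cpus=1
while getopts "c:" opt; do
    case "$opt" in
        c) cpus="$OPTARG" ;;
        *) echo "Usage: interactive [-c CPUS]" >&2; exit 1 ;;
    esac
done
shift $((OPTIND - 1))

# salloc grabs the allocation; srun --pty starts an interactive
# shell on the allocated compute node (hence the compute0 prompt)
exec salloc --cpus-per-task="$cpus" srun --pty bash -i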
14. Managing applications
- Environment modules - http://modules.sourceforge.net
- Dynamically load support for packages in a user’s environment
- Makes it easy to support multiple versions, complicated packages with $PERL5LIB, package dependencies, etc
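Conceptually, loading a module just prepends the package’s directories to variables like $PATH. A rough sketch (the “before” $PATH value is made up; the blast path matches the next slide):

[aorth@hpc: ~]$ echo $PATH
/usr/local/bin:/usr/bin:/bin
[aorth@hpc: ~]$ module load blast/2.2.28+
[aorth@hpc: ~]$ echo $PATH
/export/apps/blast/2.2.28+/bin:/usr/local/bin:/usr/bin:/bin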
15. Managing applications
Install once, use everywhere...
[aorth@hpc: ~]$ module avail blast
blast/2.2.25+  blast/2.2.26  blast/2.2.26+  blast/2.2.28+
[aorth@hpc: ~]$ module load blast/2.2.28+
[aorth@hpc: ~]$ which blastn
/export/apps/blast/2.2.28+/bin/blastn
Works anywhere on the cluster!
16. Users and Groups
- Consistent UID/GIDs across systems
- LDAP + SSSD (also from Red Hat) is a great match
- 389 Directory Server works great with CentOS
- SSSD is simpler than pam_ldap and does caching
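A rough idea of the client side, assuming a stock SSSD LDAP domain (the server URI and search base are placeholders, not our real ones):

# Hypothetical minimal /etc/sssd/sssd.conf
[sssd]
services = nss, pam
domains = default

[domain/default]
id_provider = ldap
auth_provider = ldap
ldap_uri = ldap://ldap.example.org
ldap_search_base = dc=example,dc=org
cache_credentials = true    # SSSD caches lookups and credentials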