1
Photo credit: Aaron Gardner
Bridging the Gap - Facilitating Collaborative Life Science
Research in Commercial & Enterprise Environments
March 2017 - NEREN SEMINAR
2
I’m Chris.

I’m an infrastructure geek (and failed scientist)
I work for the BioTeam.
Photo credit: Cindy Jessel
@chris_dag
3
www.BioTeam.net
Independent Consulting Shop
Run by scientists forced to learn IT to “get science done”
Virtual company with nationwide staff
15+ years “bridging the gap” between hardcore science, HPC & IT
Honest. Objective. Vendor & Technology Agnostic.
We are hiring :)
4
Content Warning
I am not an “expert”
… or a “thought leader”
I try to speak honestly about what I
see, do and experience “on the
ground” as an IT worker
My views are biased by the types of
work I perform. Filter my words
through your own expertise …
I’m worried about time so I may skip slides
— full PDF of slide deck will be available.
5
Q1’17 Current State:

Commercial LifeSci Research Computing
6
01: Science Evolves Faster Than IT
‣ Rate of scientific innovation is incredible
‣ Same innovation rate seen with lab side instruments
‣ Scientific and instrument requirements change far faster than IT
organizations can build, rebuild or refresh complex infrastructure
‣ In the face of science world changing month-to-month:
‣ … best funded, most aggressive shops can only refresh large
installations every ~2 years. Most refresh on 3-4 year cycles.
‣ Gulp!
7
02: We’ve lost the centralization battle
‣ Old way:
‣ Centralize all HPC and Research Computing functions into a single-site,
centrally managed & supported environment
‣ Bring the users and the data to the shared environment
‣ This no longer works as well as it used to …
‣ Terabyte-scale instruments have diffused EVERYWHERE and will continue
to pop up “everywhere”
‣ Building/campus LANs can’t support tera|peta-scale data movement
‣ Does not address external collaborators or data sources well
8
03: Petabytes for “free”
‣ There are petabytes of very interesting open-access data available for free
on the internet
‣ There are many valid business and scientific reasons for a research
computing user wanting to bring some of this data in-house to facilitate new
or existing research programs but …
‣ Massive technical challenges (Ingest, ‘trash tier storage’, etc.)
‣ Massive organizational challenges:
‣ It takes a ton of work and resources to host peta-scale “free” data
‣ Organizations struggling to build governance/approval models tied to
actual business or scientific goals
9
04: Userbase now spanning the enterprise
‣ Life was a lot easier when the only users of research computing were
scientists and R&D organizations
‣ Easy to build domain expertise and bias our infrastructure to favor power and capability over 99.99%
uptime. Researchers will tolerate occasional downtime if the “payoff” is faster systems or bigger storage
‣ Much harder when the full enterprise needs “data intensive science”
‣ Those pesky corporate types want SLAs and 24x7 support :)
‣ Userbase diversity is incredible: manufacturing, process optimization,
commercial operations, sales operations, compliance, risk management,
etc.
‣ Far far harder to support, train, enable and “mentor”
10
05: Data Types Getting Weird
‣ We are very good at handling terabytes and petabytes of static structured or
unstructured data - storage tech and operational practices for this have
evolved over DECADES
‣ Ingesting, storing and computing against data streams requires entirely new
tech, skills and infrastructure
‣ Sensor telemetry from bioreactors in manufacturing
‣ Environmental sensor data streams from greenhouses
‣ Website clickstream and advertising metrics from Commercial Ops
‣ etc. etc.
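
To make the contrast with static data concrete, here is a minimal sketch of stream-style processing in Python. It keeps only a small window of recent readings instead of storing the full data set first. The function name `rolling_mean` and the temperature values are invented for illustration and are not taken from any real bioreactor.

```python
from collections import deque
from statistics import mean
from typing import Iterable, Iterator


def rolling_mean(readings: Iterable[float], window: int) -> Iterator[float]:
    """Yield the mean of the most recent `window` readings as each one arrives."""
    recent = deque(maxlen=window)  # older readings fall off automatically
    for value in readings:
        recent.append(value)
        yield mean(recent)


# Hypothetical bioreactor temperature readings (degrees C), invented for illustration.
temperatures = [37.0, 37.2, 36.8, 39.0, 37.1]
smoothed = list(rolling_mean(temperatures, window=3))
```

The same generator works on an unbounded source (a socket, a message queue consumer) because it never needs the whole series in memory.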
11
06: Our Networks Suck
‣ Enterprise network architectures are optimized for lots of small concurrent
traffic flows. They have issues with “elephant flows” where a single network
flow may be using 1 Gbps, 10 Gbps or 40 Gbps of bandwidth to move a big data file
‣ Our network cores can barely handle 10gig when they should be running at
40gig and 100gig so they can do 10gig to top-of-rack trivially
‣ Our building-to-building and lab-to-lab links are woefully undersized
‣ Our connections to the outside world are woefully undersized
‣ Cost of Cisco networking at 40gb and higher is simply ludicrous
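
A rough back-of-the-envelope sketch of why link speed matters for these transfers. It assumes decimal terabytes, a single flow that can use the whole link, and an optional `efficiency` factor for protocol overhead. The function name and parameters are illustrative assumptions, not measurements from any real network.

```python
def transfer_hours(size_terabytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move `size_terabytes` over a link running at `link_gbps`."""
    bits = size_terabytes * 1e12 * 8               # decimal terabytes to bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


for gbps in (1, 10, 40, 100):
    print(f"{gbps:>3} Gbps: {transfer_hours(1, gbps):.2f} h per TB")
```

Under these idealized assumptions 1 TB needs about 2.2 hours at 1 Gbps and roughly 13 minutes at 10 Gbps, so a petabyte over a 1 Gbps link is a matter of months.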
12
07: Our Firewalls Suck
‣ Stuck with legacy model and operational assumptions (“Yes we can do deep
packet inspection on EVERYTHING …” & “Yeah it makes total sense to only
put a firewall at the perimeter of our network”)
‣ That $90,000 firewall advertised as “10gig ready” can’t actually handle a
large scientific data transfer because inside the box they are actually
aggregating 10x cheap 1gig network paths and calling it “10 gig”
‣ Feed it a single file transfer stream @ 10gbps and watch it thrash and drop
throughput by 90%.
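
The 90% figure follows from simple arithmetic if each flow is pinned to one internal path. The sketch below is a deliberately simplified model of that idea, not a description of how any specific firewall product is built. The round-robin flow placement and the function name are assumptions made for illustration.

```python
def aggregate_throughput_gbps(flow_demands_gbps, paths=10, path_gbps=1.0):
    """Throughput through a device built from `paths` parallel internal paths.

    Assumption: each flow is pinned to a single path (round-robin by index),
    so no single flow can ever exceed `path_gbps`.
    """
    load = [0.0] * paths
    for i, demand in enumerate(flow_demands_gbps):
        load[i % paths] += demand
    # Each path delivers at most its own capacity.
    return sum(min(path_load, path_gbps) for path_load in load)


single_elephant = aggregate_throughput_gbps([10.0])   # one 10 Gbps stream
many_mice = aggregate_throughput_gbps([0.1] * 100)    # 100 flows of 0.1 Gbps each
```

In this model the many small flows fill all ten paths and approach 10 Gbps in total, while the single 10 Gbps stream is capped at 1 Gbps, which is the 90% drop described above.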
13
Summarizing our key challenges
What keeps us from the collaborative computing promised land?
Collaborative Research: Key Challenges
‣ Network speeds: Internal & External
‣ Deploying ScienceDMZ architectures to take “data intensive science” load
off of networks built for business users
‣ Network security methods: Core & Edge
‣ Federated Identity Management
‣ Obtaining the domain expertise required to enable, mentor and fully support
the massively expanding class of collaborative researchers who need
sophisticated compute and analytics
14
15
Ok dude. All your challenges are tech related.
What about the human side of research facilitation?
16
Collaborative Research Challenges: Human Factors
‣ Wishful thinking rather than critical thinking about what the organization
REALLY wants to encourage. We see a lot of “build the database/catalog/
warehouse/repository/lake/commons and they will come” pitches with zero
support for follow-through. 

‣ Collab/research facilitators with enough seniority to be thinking “Where
are the collaborative opportunities, how do they align with the business
needs, what data is actually useful to others?”
17
Collaborative Research Challenges: Human Factors, 2
‣ The BIGGEST ISSUE OF ALL:
‣ What’s in it personally for the collaborating parties?
‣ Does this get them promoted, published, solve their research problem,
answer their burning questions, etc. or does it detract from these things by
taking time away from activities more beneficial to the org or person?
‣ Does the ‘system’ support or inhibit collaboration through activities like
budget allocations, staffing, approval processes, etc.?
‣ Org charts, corporate culture and operating models can either encourage
or stifle any collaborative efforts that may exist. h/t - Simon Twigger!
18
Collaborative Research Challenges: Human Factors, 3
‣ Research Facilitators in Industry: Someone needs to be out there learning
about the ‘silos of excellence’ and seeing the opportunities for collaboration
‣ Some scientists are too heads down in their own area to see beyond
immediate needs. Having a human to make this happen could be huge,
way more effective than all the technological ‘solutions’ we usually throw
at this problem.
‣ Impedance mismatch: A real issue. We need something like an E-Harmony
for matchmaking between collaborators with the same motivation levels!
h/t - Simon Twigger!
19
Collaborative Research Computing: Internal
Supporting internal efforts in commercial pharma/biotech
20
Facilitating Internal Collaboration
‣ Harder than multi-party collaboration in some ways
‣ Few companies incentivize or otherwise actively encourage collaboration across
departmental boundaries
‣ Or if they do “encourage” it, it is often just empty talk; the reality on the ground when it
comes to performance reviews, HR and local management may be different
‣ Talk is cheap. Taking steps to encourage, track and reward people is not.
‣ Other main issue is “impedance mismatch” between potential collaborators
‣ Often two groups that may wish to collaborate may have different timeframes,
interest levels and available resources. Tough to find perfect alignment
21
Internal Collaboration: How we do it (1)
‣ Regular HPC/computing training classes where all are welcome and attendees span
various business units. Serendipitous opportunities abound
‣ Mailing list, Slack etc. methods for consumers of research computing services to
actively communicate, share code and troubleshooting assistance
‣ Road-shows and “lunch and learn” sessions with rotating cast of speakers, delivered
across multiple sites. Speakers are often users/consumers with great stories and
data to talk about
‣ Having most apps and data sets on a large single namespace storage system makes
the act of collaboration easier for all comers; Private GitLab or other code hosting
portal for users to share code and tooling also helps
22
Internal Collaboration: How we do it (2)
‣ Publishing data catalogs so people understand what is available for use and
exploration is very helpful. Does not have to be complex - even a simple Wiki or web
page can work
‣ “Research Facilitators” who can embed with departments or groups for weeklong or
monthlong periods are very useful
‣ … at driving new use cases and collaborations
‣ … at collecting valuable domain knowledge needed for long term support of users
and departments
‣ … at dissolving barriers between IT and people asking interesting questions
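
As a sketch of how small the "simple Wiki or web page" data catalog mentioned on this slide can be, the Python below renders a few catalog entries as plain text that could be pasted into a wiki page. Every data set name, path, owner and size here is invented for illustration.

```python
# Hypothetical catalog entries; names, paths, owners and sizes are invented.
CATALOG = [
    {"name": "rnaseq-2016", "path": "/data/shared/rnaseq-2016",
     "owner": "genomics", "size_tb": 12.5},
    {"name": "bioreactor-telemetry", "path": "/data/shared/bioreactor",
     "owner": "manufacturing", "size_tb": 0.8},
]


def render_catalog(entries):
    """Return a plain-text listing of data sets, sorted by name."""
    lines = [f"{'Data set':<24}{'Owner':<16}{'Size (TB)':>10}  Path"]
    for entry in sorted(entries, key=lambda e: e["name"]):
        lines.append(
            f"{entry['name']:<24}{entry['owner']:<16}"
            f"{entry['size_tb']:>10.1f}  {entry['path']}"
        )
    return "\n".join(lines)


print(render_catalog(CATALOG))
```

Even a listing this basic answers the two questions potential collaborators ask first: what exists, and who owns it.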
23
Internal Collaboration: Challenges
‣ Fighting for permission to deploy real, useful collaboration tools vs. management who just keep
saying “SharePoint, SharePoint, SharePoint …”
‣ The new crop of potential collaborators may sit at sites not previously covered by research
computing infrastructure or support resources
‣ As data types and tooling get more diverse and more complex it is a constant battle to retain the
internal IT “domain knowledge” necessary to help compute consumers be successful in their
efforts
‣ Research IT / R&D organizations have long known the value of hiring “research facilitators” or
embeddable support/consultants. This awareness is far less common outside of Research.
‣ Non-research/Non-product groups are often not funded at levels that allow them to think about
novel support / staffing / collaboration structures
24
Collaborative Research Computing: Multi-Party
Supporting multi-party collaboration in commercial pharma/biotech
25
Multi-Party Collaboration: How we do it (1)
‣ Supporting this work is straightforward. We don’t have to evangelize or encourage —
they know what they want to do and “our” job is to deploy & facilitate
‣ We usually don’t even have to train people. The collaborators know their data and
tooling far better than we do
‣ Important to understand in the commercial space that it is common for organizations
to be collaborators in one area and fierce competitors in other areas/markets
‣ This means that NOBODY is punching holes in firewalls and adding external people to
the local Active Directory server.
‣ Almost all of the complex multi-party collaborations that BioTeam is involved with in
this space are occurring within dedicated IaaS cloud environments
26
Multi-Party Collaboration: How we do it (2)
‣ IaaS cloud environments like a private Amazon AWS VPC are the default neutral
meeting ground for complex multi-party/multi-organization collaborations
‣ Why?
• Nobody has to invite strangers behind their firewall or VPN
• Vast amounts of storage, compute and analytics resources at-hand
• Security controls are powerful and very fine-grained. Often 1000x more capable
than the security controls we typically see “inhouse”
• Data sets may already be hosted on Amazon and if not, high-velocity data ingest is
something that can be engineered and built
• AWS is on Internet2 — good access to national research centers and academia
27
Multi-Party Collaboration: Challenges
‣ The biggest challenge is identity management, authorization and access control
‣ Building a federated ID service that can do role based access control amongst multiple people and
institutions is neither quick nor simple
‣ The people who control Active Directory “at-home” rarely interact with mere mortals and securing
approval to expose/federate an internal directory to “the cloud” can be a long and complex process
‣ In a 40,000 person global enterprise there may be only 2 folk who truly understand the deep technical
details involved with ADFS, AD, SAML, Federation and related topics. Finding those people and stealing
them for your team is hard work.
‣ Those crazy academic collaborators use weird stuff for ID management like “Shibboleth” :) that
corporate IT suits have a very hard time understanding and dealing with
‣ Other challenges: Long term storage and hosting of data if terabyte or petabyte volumes of data are
involved. Where does this go after active collaboration ends?
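
To illustrate what role-based access control across several institutions involves, here is a minimal sketch in Python. It assumes each collaborator arrives with an identity provider name and a list of group claims (the kind of attributes a SAML assertion might carry). The domains, group names, roles and permissions are all hypothetical, and a real deployment would rely on an actual federation service rather than in-memory dictionaries.

```python
from dataclasses import dataclass

# Hypothetical role-to-permission mapping for one shared collaboration project.
ROLE_PERMISSIONS = {
    "analyst": {"read_data", "run_jobs"},
    "data_steward": {"read_data", "write_data"},
}

# Hypothetical mapping from (identity provider, group claim) to a project role.
CLAIM_TO_ROLE = {
    ("pharma-a.example", "compbio"): "analyst",
    ("university-b.example", "lab-staff"): "data_steward",
}


@dataclass(frozen=True)
class FederatedUser:
    name: str
    identity_provider: str
    groups: tuple


def permissions_for(user: FederatedUser) -> set:
    """Union of permissions for every role the user's group claims map to."""
    granted = set()
    for group in user.groups:
        role = CLAIM_TO_ROLE.get((user.identity_provider, group))
        if role is not None:
            granted |= ROLE_PERMISSIONS[role]
    return granted


def is_allowed(user: FederatedUser, action: str) -> bool:
    """True only if one of the user's mapped roles grants `action`."""
    return action in permissions_for(user)


alice = FederatedUser("alice", "pharma-a.example", ("compbio",))
print(is_allowed(alice, "run_jobs"), is_allowed(alice, "write_data"))
```

The hard part in practice is the `CLAIM_TO_ROLE` table: every participating organization has to agree on which of its internal groups map to which shared roles, and each must approve exposing those claims.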
28
A reasonable question to ask …
‣ Why is all this collaborative scientific computing stuff on Amazon instead of a regional
specialty facility like MGHPCC?
‣ Lots of reasons but none are insurmountable …
• Awareness & ease of access
• Inertia and laziness
• 3rd party vendor & solution presence within AWS
• …
29
But …
‣ There is an interesting trend BioTeam has observed that may play into this …
‣ We are predicting a number of high-profile ‘cloud pullback’ projects this year and
next. We are actively working on at least one right now involving large-scale scientific
computing and petabyte+ volume of data.
‣ The VERY INTERESTING thing is that these projects that are being “pulled back” from
public clouds ARE NOT going back on-premises.
• … they are going to specialty facilities that appear similar in nature/mission to
MGHPCC
• End result: You may see a larger commercial/industrial presence at shared facilities
more commonly associated with academic or .gov supercomputing. Industry/
Academic collaborations may get much easier in the future if this trend holds up.
30
end; Thanks!
slideshare.net/chrisdag/ chris@bioteam.net @chris_dag

Mais conteúdo relacionado

Mais procurados

2015 04 bio it world
2015 04 bio it world2015 04 bio it world
2015 04 bio it world
Chris Dwan
 
Bi isn't big data and big data isn't BI (updated)
Bi isn't big data and big data isn't BI (updated)Bi isn't big data and big data isn't BI (updated)
Bi isn't big data and big data isn't BI (updated)
mark madsen
 
Architecting a Data Platform For Enterprise Use (Strata NY 2018)
Architecting a Data Platform For Enterprise Use (Strata NY 2018)Architecting a Data Platform For Enterprise Use (Strata NY 2018)
Architecting a Data Platform For Enterprise Use (Strata NY 2018)
mark madsen
 
Everything has changed except us
Everything has changed except usEverything has changed except us
Everything has changed except us
mark madsen
 
Lean approach to IT development
Lean approach to IT developmentLean approach to IT development
Lean approach to IT development
Mark Krebs
 

Mais procurados (20)

Cloud Sobriety for Life Science IT Leadership (2018 Edition)
Cloud Sobriety for Life Science IT Leadership (2018 Edition)Cloud Sobriety for Life Science IT Leadership (2018 Edition)
Cloud Sobriety for Life Science IT Leadership (2018 Edition)
 
Multi-Tenant Pharma HPC Clusters
Multi-Tenant Pharma HPC ClustersMulti-Tenant Pharma HPC Clusters
Multi-Tenant Pharma HPC Clusters
 
Bio-IT for Core Facility Managers
Bio-IT for Core Facility ManagersBio-IT for Core Facility Managers
Bio-IT for Core Facility Managers
 
Mapping Life Science Informatics to the Cloud
Mapping Life Science Informatics to the CloudMapping Life Science Informatics to the Cloud
Mapping Life Science Informatics to the Cloud
 
Practical Petabyte Pushing
Practical Petabyte PushingPractical Petabyte Pushing
Practical Petabyte Pushing
 
2014 BioIT World - Trends from the trenches - Annual presentation
2014 BioIT World - Trends from the trenches - Annual presentation2014 BioIT World - Trends from the trenches - Annual presentation
2014 BioIT World - Trends from the trenches - Annual presentation
 
2021 Trends from the Trenches
2021 Trends from the Trenches2021 Trends from the Trenches
2021 Trends from the Trenches
 
2012: Trends from the Trenches
2012: Trends from the Trenches2012: Trends from the Trenches
2012: Trends from the Trenches
 
2015 04 bio it world
2015 04 bio it world2015 04 bio it world
2015 04 bio it world
 
Disruptive Innovation: how do you use these theories to manage your IT?
Disruptive Innovation: how do you use these theories to manage your IT?Disruptive Innovation: how do you use these theories to manage your IT?
Disruptive Innovation: how do you use these theories to manage your IT?
 
Bi isn't big data and big data isn't BI (updated)
Bi isn't big data and big data isn't BI (updated)Bi isn't big data and big data isn't BI (updated)
Bi isn't big data and big data isn't BI (updated)
 
Big Data and Bad Analogies
Big Data and Bad AnalogiesBig Data and Bad Analogies
Big Data and Bad Analogies
 
Drinking from the Fire Hose: Practical Approaches to Big Data Preparation and...
Drinking from the Fire Hose: Practical Approaches to Big Data Preparation and...Drinking from the Fire Hose: Practical Approaches to Big Data Preparation and...
Drinking from the Fire Hose: Practical Approaches to Big Data Preparation and...
 
Data Architecture: OMG It’s Made of People
Data Architecture: OMG It’s Made of PeopleData Architecture: OMG It’s Made of People
Data Architecture: OMG It’s Made of People
 
Assumptions about Data and Analysis: Briefing room webcast slides
Assumptions about Data and Analysis: Briefing room webcast slidesAssumptions about Data and Analysis: Briefing room webcast slides
Assumptions about Data and Analysis: Briefing room webcast slides
 
Architecting a Data Platform For Enterprise Use (Strata NY 2018)
Architecting a Data Platform For Enterprise Use (Strata NY 2018)Architecting a Data Platform For Enterprise Use (Strata NY 2018)
Architecting a Data Platform For Enterprise Use (Strata NY 2018)
 
Everything has changed except us
Everything has changed except usEverything has changed except us
Everything has changed except us
 
Lean approach to IT development
Lean approach to IT developmentLean approach to IT development
Lean approach to IT development
 
Defining a Practical Path to Artificial Intelligence
Defining a Practical Path to Artificial Intelligence Defining a Practical Path to Artificial Intelligence
Defining a Practical Path to Artificial Intelligence
 
The Big Data Value PPP: A Standardisation Opportunity for Europe
The Big Data Value PPP: A Standardisation Opportunity for EuropeThe Big Data Value PPP: A Standardisation Opportunity for Europe
The Big Data Value PPP: A Standardisation Opportunity for Europe
 

Destaque

Destaque (10)

Survey of clustered_parallel_file_systems_004_lanl.ppt
Survey of clustered_parallel_file_systems_004_lanl.pptSurvey of clustered_parallel_file_systems_004_lanl.ppt
Survey of clustered_parallel_file_systems_004_lanl.ppt
 
Ncar globally accessible user environment
Ncar globally accessible user environmentNcar globally accessible user environment
Ncar globally accessible user environment
 
Architecture of the Upcoming OrangeFS v3 Distributed Parallel File System
Architecture of the Upcoming OrangeFS v3 Distributed Parallel File SystemArchitecture of the Upcoming OrangeFS v3 Distributed Parallel File System
Architecture of the Upcoming OrangeFS v3 Distributed Parallel File System
 
Ibm spectrum scale fundamentals workshop for americas part 6 spectrumscale el...
Ibm spectrum scale fundamentals workshop for americas part 6 spectrumscale el...Ibm spectrum scale fundamentals workshop for americas part 6 spectrumscale el...
Ibm spectrum scale fundamentals workshop for americas part 6 spectrumscale el...
 
HSM migration with EasyHSM and Nirvana
HSM migration with EasyHSM and NirvanaHSM migration with EasyHSM and Nirvana
HSM migration with EasyHSM and Nirvana
 
EasyHSM Overview
EasyHSM OverviewEasyHSM Overview
EasyHSM Overview
 
A escolha da profissão!
A escolha da profissão!   A escolha da profissão!
A escolha da profissão!
 
Big Lab Problems Solved with Spectrum Scale: Innovations for the Coral Program
Big Lab Problems Solved with Spectrum Scale: Innovations for the Coral ProgramBig Lab Problems Solved with Spectrum Scale: Innovations for the Coral Program
Big Lab Problems Solved with Spectrum Scale: Innovations for the Coral Program
 
Ibm spectrum scale_backup_n_archive_v03_ash
Ibm spectrum scale_backup_n_archive_v03_ashIbm spectrum scale_backup_n_archive_v03_ash
Ibm spectrum scale_backup_n_archive_v03_ash
 
Ibm spectrum scale fundamentals workshop for americas part 1 components archi...
Ibm spectrum scale fundamentals workshop for americas part 1 components archi...Ibm spectrum scale fundamentals workshop for americas part 1 components archi...
Ibm spectrum scale fundamentals workshop for americas part 1 components archi...
 

Semelhante a Facilitating Collaborative Life Science Research in Commercial & Enterprise Environments

Solve User Problems: Data Architecture for Humans
Solve User Problems: Data Architecture for HumansSolve User Problems: Data Architecture for Humans
Solve User Problems: Data Architecture for Humans
mark madsen
 
Practical analytics john enoch white paper
Practical analytics john enoch white paperPractical analytics john enoch white paper
Practical analytics john enoch white paper
John Enoch
 

Semelhante a Facilitating Collaborative Life Science Research in Commercial & Enterprise Environments (20)

Python's Role in the Future of Data Analysis
Python's Role in the Future of Data AnalysisPython's Role in the Future of Data Analysis
Python's Role in the Future of Data Analysis
 
How to Prepare for a Career in Data Science
How to Prepare for a Career in Data ScienceHow to Prepare for a Career in Data Science
How to Prepare for a Career in Data Science
 
Innovation med big data – chr. hansens erfaringer
Innovation med big data – chr. hansens erfaringerInnovation med big data – chr. hansens erfaringer
Innovation med big data – chr. hansens erfaringer
 
Democratizing Advanced Analytics Propels Instant Analysis Results to the Ubiq...
Democratizing Advanced Analytics Propels Instant Analysis Results to the Ubiq...Democratizing Advanced Analytics Propels Instant Analysis Results to the Ubiq...
Democratizing Advanced Analytics Propels Instant Analysis Results to the Ubiq...
 
Big dataplatform operationalstrategy
Big dataplatform operationalstrategyBig dataplatform operationalstrategy
Big dataplatform operationalstrategy
 
Data Collaboration Stack
Data Collaboration StackData Collaboration Stack
Data Collaboration Stack
 
Keynote: Graphs in Government_Lance Walter, CMO
Keynote:  Graphs in Government_Lance Walter, CMOKeynote:  Graphs in Government_Lance Walter, CMO
Keynote: Graphs in Government_Lance Walter, CMO
 
What Managers Need to Know about Data Science
What Managers Need to Know about Data ScienceWhat Managers Need to Know about Data Science
What Managers Need to Know about Data Science
 
Solve User Problems: Data Architecture for Humans
Solve User Problems: Data Architecture for HumansSolve User Problems: Data Architecture for Humans
Solve User Problems: Data Architecture for Humans
 
Big Data Pushes Enterprises into Data-Driven Mode, Makes Demands for More App...
Big Data Pushes Enterprises into Data-Driven Mode, Makes Demands for More App...Big Data Pushes Enterprises into Data-Driven Mode, Makes Demands for More App...
Big Data Pushes Enterprises into Data-Driven Mode, Makes Demands for More App...
 
2019 BioIt World - Post cloud legacy edition
2019 BioIt World - Post cloud legacy edition2019 BioIt World - Post cloud legacy edition
2019 BioIt World - Post cloud legacy edition
 
Practical analytics john enoch white paper
Practical analytics john enoch white paperPractical analytics john enoch white paper
Practical analytics john enoch white paper
 
Building successful data science teams
Building successful data science teamsBuilding successful data science teams
Building successful data science teams
 
Expert Big Data Tips
Expert Big Data TipsExpert Big Data Tips
Expert Big Data Tips
 
Converged IT and Data Commons
Converged IT and Data CommonsConverged IT and Data Commons
Converged IT and Data Commons
 
Make compliance fulfillment count double
Make compliance fulfillment count doubleMake compliance fulfillment count double
Make compliance fulfillment count double
 
The Evolving Role of the Data Engineer - Whitepaper | Qubole
The Evolving Role of the Data Engineer - Whitepaper | QuboleThe Evolving Role of the Data Engineer - Whitepaper | Qubole
The Evolving Role of the Data Engineer - Whitepaper | Qubole
 
Asking Why
Asking WhyAsking Why
Asking Why
 
Big Data & DS Analytics for PAARL
Big Data & DS Analytics for PAARLBig Data & DS Analytics for PAARL
Big Data & DS Analytics for PAARL
 
FAIR data_ Superior data visibility and reuse without warehousing.pdf
FAIR data_ Superior data visibility and reuse without warehousing.pdfFAIR data_ Superior data visibility and reuse without warehousing.pdf
FAIR data_ Superior data visibility and reuse without warehousing.pdf
 

Mais de Chris Dagdigian

Mais de Chris Dagdigian (6)

Cloud Security for Life Science R&D
Cloud Security for Life Science R&DCloud Security for Life Science R&D
Cloud Security for Life Science R&D
 
Bio-IT & Cloud Sobriety: 2013 Beyond The Genome Meeting
Bio-IT & Cloud Sobriety: 2013 Beyond The Genome MeetingBio-IT & Cloud Sobriety: 2013 Beyond The Genome Meeting
Bio-IT & Cloud Sobriety: 2013 Beyond The Genome Meeting
 
Bio-IT Asia 2013: Informatics & Cloud - Best Practices & Lessons Learned
Bio-IT Asia 2013: Informatics & Cloud - Best Practices & Lessons LearnedBio-IT Asia 2013: Informatics & Cloud - Best Practices & Lessons Learned
Bio-IT Asia 2013: Informatics & Cloud - Best Practices & Lessons Learned
 
AWS re:Invent - Accelerating Research
AWS re:Invent - Accelerating ResearchAWS re:Invent - Accelerating Research
AWS re:Invent - Accelerating Research
 
Trends from the Trenches (Singapore Edition)
Trends from the Trenches (Singapore Edition)Trends from the Trenches (Singapore Edition)
Trends from the Trenches (Singapore Edition)
 
Practical Cloud & Workflow Orchestration
Practical Cloud & Workflow OrchestrationPractical Cloud & Workflow Orchestration
Practical Cloud & Workflow Orchestration
 

Último

Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Sérgio Sacani
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
Areesha Ahmad
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
Areesha Ahmad
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
MohamedFarag457087
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
NazaninKarimi6
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Último (20)

FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
 
Facilitating Collaborative Life Science Research in Commercial & Enterprise Environments

  • 1. 1 Photo credit: Aaron Gardner Bridging the Gap - Facilitating Collaborative Life Science Research in Commercial & Enterprise Environments March 2017 - NEREN SEMINAR
  • 2. 2 I’m Chris. I’m an infrastructure geek (and failed scientist) I work for the BioTeam. Photo credit: Cindy Jessel @chris_dag
  • 3. 3 www.BioTeam.net Independent Consulting Shop Run by scientists forced to learn IT to “get science done” Virtual company with nationwide staff 15+ years “bridging the gap” between hardcore science, HPC & IT Honest. Objective. Vendor & Technology Agnostic. We are hiring :)
  • 4. 4 Content Warning I am not an “expert” … or a “thought leader” I try to speak honestly about what I see, do and experience “on the ground” as an IT worker My views are biased by the types of work I perform. Filter my words through your own expertise … I’m worried about time so I may skip slides — full PDF of slide deck will be available.
  • 5. 5 Q1’17 Current State: Commercial LifeSci Research Computing
  • 6. 6 01: Science Evolves Faster Than IT ‣ Rate of scientific innovation is incredible ‣ Same innovation rate seen with lab side instruments ‣ Scientific and instrument requirements change far faster than IT organizations can build, rebuild or refresh complex infrastructure ‣ In the face of science world changing month-to-month: ‣ … best funded, most aggressive shops can only refresh large installations every ~2 years. Most refresh on 3-4 year cycles. ‣ Gulp!
  • 7. 7 02: We’ve lost the centralization battle ‣ Old way: ‣ Centralize all HPC and Research Computing functions into a single-site, centrally managed & supported environment ‣ Bring the users and the data to the shared environment ‣ This no longer works as well as it used to … ‣ Terabyte-scale instruments have diffused EVERYWHERE and will continue to pop up “everywhere” ‣ Building/campus LANs can’t support tera|peta-scale data movement ‣ Does not address external collaborators or data sources well
  • 8. 8 03: Petabytes for “free” ‣ There are petabytes of very interesting open-access data available for free on the internet ‣ There are many valid business and scientific reasons for a research computing user wanting to bring some of this data in-house to facilitate new or existing research programs but … ‣ Massive technical challenges (Ingest, ‘trash tier storage’, etc.) ‣ Massive organizational challenges: ‣ It takes a ton of work and resources to host peta-scale “free” data ‣ Organizations struggling to build governance/approval models tied to actual business or scientific goals
  • 9. 9 04: Userbase now spanning the enterprise ‣ Life was a lot easier when the only users of research computing were scientists and R&D organizations ‣ Easy to build domain expertise and bias our infrastructure to favor power and capability over 99.99% uptime. Researchers will tolerate occasional downtime if the “payoff” is faster systems or bigger storage ‣ Much harder when the full enterprise needs “data intensive science” ‣ Those pesky corporate types want SLAs and 24x7 support :) ‣ Userbase diversity is incredible: manufacturing, process optimization, commercial operations, sales operations, compliance, risk management, etc. ‣ Far far harder to support, train, enable and “mentor”
  • 10. 10 05: Data Types Getting Weird ‣ We are very good at handling terabytes and petabytes of static structured or unstructured data - storage tech and operational practices for this have evolved over DECADES ‣ Ingesting, storing and computing against data streams requires entirely new tech, skills and infrastructure ‣ Sensor telemetry from bioreactors in manufacturing ‣ Environmental sensor data streams from greenhouses ‣ Website clickstream and advertising metrics from Commercial Ops ‣ etc. etc.
  • 11. 11 06: Our Networks Suck ‣ Enterprise network architectures are optimized for lots of small concurrent traffic flows. They have issues with “elephant flows” where a single network flow may be using 1gb, 10gb or 40gb of bandwidth to move a big data file ‣ Our network cores can barely handle 10gig when they should be running at 40gig and 100gig so they can do 10gig to top-of-rack trivially ‣ Our building-to-building and lab-to-lab links are woefully undersized ‣ Our connections to the outside world are woefully undersized ‣ Cost of Cisco networking at 40gb and higher is simply ludicrous
  • 12. 12 07: Our Firewalls Suck ‣ Stuck with legacy model and operational assumptions (“Yes we can do deep packet inspection on EVERYTHING …” & “Yeah it makes total sense to only put a firewall at the perimeter of our network”) ‣ That $90,000 firewall advertised as “10gig ready” can’t actually handle a large scientific data transfer because inside the box they are actually aggregating 10x cheap 1gig network paths and calling it “10 gig” ‣ Feed it a single file transfer stream @ 10gbps and watch it thrash and drop throughput by 90%.
  • 13. 13 Summarizing our key challenges What keeps us from the collaborative computing promised land?
  • 14. 14 Collaborative Research: Key Challenges ‣ Network speeds: Internal & External ‣ Deploying ScienceDMZ architectures to take “data intensive science” load off of networks built for business users ‣ Network security methods: Core & Edge ‣ Federated Identity Management ‣ Obtaining the domain expertise required to enable, mentor and fully support the massively expanding class of collaborative researchers who need sophisticated compute and analytics
  • 15. 15 Ok dude. All your challenges are tech related. What about the human side of research facilitation?
  • 16. 16 Collaborative Research Challenges: Human Factors ‣ Wishful thinking rather than critical thinking about what the organization REALLY wants to encourage. We see a lot of “build the database/catalog/ warehouse/repository/lake/commons and they will come” pitches with zero support for follow-through. 
 ‣ Collab/research facilitators with enough seniority to be thinking “Where are the collaborative opportunities, how do they align with the business needs, what data is actually useful to others?”
  • 17. 17 Collaborative Research Challenges: Human Factors, 2 ‣ The BIGGEST ISSUE OF ALL: ‣ What’s in it personally for the collaborating parties? ‣ Does this get them promoted, published, solve their research problem, answer their burning questions, etc. or does it detract from these things by taking time away from activities more beneficial to the org or person? ‣ Does the ‘system’ support or inhibit collaboration through activities like budget allocations, staffing, approval processes, etc. ? ‣ Org charts, corporate culture and operating models can either encourage or stifle any collaborative efforts that may exist. h/t - Simon Twigger!
  • 18. 18 Collaborative Research Challenges: Human Factors, 3 ‣ Research Facilitators in Industry: Someone needs to be out there learning about the ‘silos of excellence’ and seeing the opportunities for collaboration ‣ Some scientists are too heads down in their own area to see beyond immediate needs. Having a human to make this happen could be huge, way more effective than all the technological ‘solutions’ we usually throw at this problem. ‣ Impedance mismatch: A real issue. We need something like an eHarmony for matchmaking between collaborators with the same motivation levels! h/t - Simon Twigger!
  • 19. 19 Collaborative Research Computing: Internal Supporting internal efforts in commercial pharma/biotech
  • 20. 20 Facilitating Internal Collaboration ‣ Harder than multi-party collaboration in some ways ‣ Few companies incentivize or otherwise actively encourage collaboration across departmental boundaries ‣ Or if they do “encourage” it is often just empty talk; the reality on the ground when it comes to performance reviews, HR and local management may be different ‣ Talk is cheap. Taking steps to encourage, track and reward people is not. ‣ Other main issue is “impedance mismatch” between potential collaborators ‣ Often two groups that may wish to collaborate may have different timeframes, interest levels and available resources. Tough to find perfect alignment
  • 21. 21 Internal Collaboration: How we do it (1) ‣ Regular HPC/computing training classes where all are welcome and attendees span various business units. Serendipitous opportunities abound ‣ Mailing list, Slack etc. methods for consumers of research computing services to actively communicate, share code and troubleshooting assistance ‣ Road-shows and “lunch and learn” sessions with rotating cast of speakers, delivered across multiple sites. Speakers are often users/consumers with great stories and data to talk about ‣ Having most apps and data sets on a large single namespace storage system makes the act of collaboration easier for all comers; Private GitLab or other code hosting portal for users to share code and tooling also helps
  • 22. 22 Internal Collaboration: How we do it (2) ‣ Publishing data catalogs so people understand what is available for use and exploration is very helpful. Does not have to be complex - even a simple Wiki or web page can work ‣ “Research Facilitators” who can embed with departments or groups for weeklong or monthlong periods are very useful ‣ … at driving new use cases and collaborations ‣ … at collecting valuable domain knowledge needed for long term support of users and departments ‣ … dissolving barriers between IT and people asking interesting questions
  • 23. 23 Internal Collaboration: Challenges ‣ Fighting for permission to deploy real, useful collaboration tools vs. management who just keep saying “SharePoint, SharePoint, SharePoint …” ‣ The new crop of potential collaborators may sit at sites not previously covered by research computing infrastructure or support resources ‣ As data types and tooling get more diverse and more complex it is a constant battle to retain the internal IT “domain knowledge” necessary to help compute consumers be successful in their efforts ‣ Research IT / R&D organizations have long known the value of hiring “research facilitators” or embeddable support/consultants. This awareness is far less common outside of Research. ‣ Non-research/Non-product groups are often not funded at levels that allow them to think about novel support / staffing / collaboration structures
  • 24. 24 Collaborative Research Computing: Multi-Party Supporting multi-party collaboration in commercial pharma/biotech
  • 25. 25 Multi-Party Collaboration: How we do it (1) ‣ Supporting this work is straightforward. We don’t have to evangelize or encourage; they know what they want to do and “our” job is to deploy & facilitate ‣ We usually don’t even have to train people. The collaborators know their data and tooling far better than we do ‣ Important to understand in the commercial space that it is common for organizations to be collaborators in one area and fierce competitors in other areas/markets ‣ This means that NOBODY is punching holes in firewalls and adding external people to the local Active Directory server. ‣ Almost all of the complex multi-party collaborations that BioTeam is involved with in this space are occurring within dedicated IaaS cloud environments
  • 26. 26 Multi-Party Collaboration: How we do it (2) ‣ IaaS cloud environments like a private Amazon AWS VPC are the default neutral meeting ground for complex multi-party/multi-organization collaborations ‣ Why? • Nobody has to invite strangers behind their firewall or VPN • Vast amounts of storage, compute and analytics resources at-hand • Security controls are powerful and very fine-grained. Often 1000x more capable than the security controls we typically see “inhouse” • Data sets may already be hosted on Amazon and if not, high-velocity data ingest is something that can be engineered and built • AWS is on Internet2: good access to national research centers and academia
  • 27. 27 Multi-Party Collaboration: Challenges ‣ The biggest challenge is identity management, authorization and access control ‣ Building a federated ID service that can do role based access control amongst multiple people and institutions is neither quick nor simple ‣ The people who control Active Directory “at-home” rarely interact with mere mortals and securing approval to expose/federate an internal directory to “the cloud” can be a long and complex process ‣ In a 40,000 person global enterprise there may be only 2 folk who truly understand the deep technical details involved with ADFS, AD, SAML, Federation and related topics. Finding those people and stealing them for your team is hard work. ‣ Those crazy academic collaborators use weird stuff for ID management like “Shibboleth” :) that corporate IT suits have a very hard time understanding and dealing with ‣ Other challenges: Long term storage and hosting of data if terabyte or petabyte volumes of data are involved. Where does this go after active collaboration ends?
  • 28. 28 A reasonable question to ask … ‣ Why is all this collaborative scientific computing stuff on Amazon instead of a regional specialty facility like MGHPCC? ‣ Lots of reasons but none are insurmountable … • Awareness & ease of access • Inertia and laziness • 3rd party vendor & solution presence within AWS • …
  • 29. 29 But … ‣ There is an interesting trend BioTeam has observed that may play into this … ‣ We are predicting a number of high-profile ‘cloud pullback’ projects this year and next. We are actively working on at least one right now involving large-scale scientific computing and petabyte+ volume of data. ‣ The VERY INTERESTING thing is that these projects that are being “pulled back” from public clouds ARE NOT going back on-premise. • … they are going to specialty facilities that appear similar in nature/mission to MGHPCC • End result: You may see a larger commercial/industrial presence at shared facilities more commonly associated with academic or .gov supercomputing. Industry/Academic collaborations may get much easier in the future if this trend holds up.
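
The networking points on slides 11 and 12 (elephant flows, undersized links, and the “10gig ready” firewall that is really 10x 1gig paths) come down to simple arithmetic. The sketch below is a rough back-of-envelope illustration, not a measurement. The 100 TB dataset size, the function names, and the assumption that a box built from aggregated paths pins each individual flow to a single internal path are all hypothetical choices made for the example; real devices and real transfer tools will behave differently.

```python
# Back-of-envelope illustration only. Dataset size, link speeds and the
# "one flow is pinned to one internal path" model are assumptions.


def transfer_hours(size_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours to move size_tb (decimal terabytes) over a link of link_gbps.

    efficiency is the assumed fraction of line rate actually achieved.
    """
    bits = size_tb * 1e12 * 8  # terabytes -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


def single_flow_throughput(demand_gbps: float, n_paths: int, path_gbps: float) -> float:
    """Throughput one flow gets from a device built from n_paths aggregated paths.

    Assumes the flow cannot be split, so it is capped at one path's speed
    no matter how many paths the device has in total.
    """
    if n_paths < 1:
        raise ValueError("device needs at least one path")
    return min(demand_gbps, path_gbps)


if __name__ == "__main__":
    # How long a hypothetical 100 TB data set takes at various link speeds.
    for gbps in (1, 10, 40, 100):
        print(f"100 TB over {gbps:>3} Gb/s: {transfer_hours(100, gbps):7.1f} hours")

    # One 10 Gb/s elephant flow through a "10 gig" box made of 10 x 1 Gb/s paths.
    advertised = 10.0
    actual = single_flow_throughput(demand_gbps=10.0, n_paths=10, path_gbps=1.0)
    print(f"single flow gets {actual} Gb/s, "
          f"{1 - actual / advertised:.0%} below the advertised rate")
```

Under these assumptions the aggregate capacity of the firewall is irrelevant to a single large transfer: only the speed of the one path the flow lands on matters, which is consistent with the 90% throughput drop described on slide 12.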