2018 Bio-IT World Agile in Wet Labs Speeds Big Data
1. Using Agile Techniques in Wet Labs
to Speed the Creation of Even More Big Data
Bruce Kozuma, Principal Systems Analyst
Kendra West, Scrum Master, Data Sciences and Data Engineering
Thursday 2018/05/17, Bio-IT World
2. About the Authors
• Bruce Kozuma is a Principal
Systems Analyst in IT
• Connect via LinkedIn:
https://linkedin.com/in/bkozuma
• Kendra West is a Scrum Master in
Data Sciences and Data Engineering
• Connect via LinkedIn:
https://linkedin.com/in/kendraleighwest
3. Over 3,400 Broadies working together
• Core Members: ~10
• Institute Members: ~38
• Associate Members: ~322
• Employees: ~1000
• Post-Docs, Fellows & Scholars in Residence: ~100
• Visiting Scientists, Staff & Researchers: ~750
• Students: ~550
• Post-Docs/Partner Institutions: ~600
4. About the Broad Institute of MIT and Harvard
• Propelling the understanding and
treatment of disease
• Collaborating deeply
• Reaching globally
• Empowering scientists
• Building partnerships
• Sharing data and knowledge
• Promoting inclusion
5. The Agile Manifesto
Individuals & Interactions > Processes & Tools
*Delivering Value > Comprehensive Documentation
Customer Collaboration > Contract Negotiation
Responding to Change > Following a Plan
*adapted to fit organizational needs
6. What is the Agile approach?
• We follow Twelve Agile Principles behind the Manifesto:
• Our highest priority is to satisfy the customer
through early and continuous delivery
of value
• Welcome changing requirements, even late in
development; Agile processes harness change
for the customer's competitive advantage
• Deliver frequently, from a couple of weeks to
a couple of months, with a preference to the
shorter timescale
• Value delivery is the primary measure of progress
Frequent delivery and feedback
7. What is the Agile approach?
• We follow Twelve Agile Principles behind the Manifesto:
• Business people and developers must work together
daily throughout the project
• The most efficient and effective method of conveying
information to and within a development team is
face-to-face conversation
• The best architectures, requirements, and designs
emerge from self-organizing teams
• At regular intervals, the team reflects on how to
become more effective, then tunes and adjusts its
behavior accordingly
Teams communicating openly
8. What is the Agile approach?
• We follow Twelve Agile Principles behind the Manifesto:
• Build projects around motivated individuals;
Give them the environment and support they need,
and trust them to get the job done
• Agile processes promote sustainable development;
The sponsors, developers, and users should be able
to maintain a constant pace indefinitely
• Continuous attention to technical excellence
and good design enhances agility
• Simplicity – the art of maximizing the amount
of work not done – is essential
Doing our best work
9. What is Scrum?
• An Agile framework
• Born in Boston
• 90% of Agile teams worldwide use Scrum
• Borrows its name from rugby
10. Scrum Values, Pillars, and Elements
Scrum values
OpeneSs
Courage
Respect
FocUs
ComMitment
Scrum pillars
• Transparency
• Inspection
• Adaptation
Scrum team
• Product Owner
• Scrum Master
• Development Team
Scrum events
• The Sprint
• Sprint Planning
• Daily Scrum
• Sprint Review
• Sprint Retrospective
Scrum artifacts
• Product Backlog
• Sprint Backlog
• Increment
• Definition of Done
11. The Broad’s mission embodies many Agile values!
Broad Mission
• Propelling the understanding
and treatment of disease
• Collaborating deeply
• Reaching globally
• Empowering scientists
• Building partnerships
• Sharing data and knowledge
• Promoting inclusion
Agile themes
• Frequent delivery & feedback
• Teams communicating openly
• Doing our best work
Too many arrows!
12. How to measure Big Data?
• Classic way is via Doug Laney’s Volume, Velocity, Variety model
• Volume: size of data (e.g., total size of a data set, number of records, number
of files, size of files)
• Velocity: rate at which data is produced and changed (e.g., production of BAMs,
changes in UCSC genome releases GRCh37 vs hg17)
• Variety:
• Diversity of formats (e.g., FASTQ, BAM, VCF, CRAM)
• Non-aligned data structures (e.g., CDISC)
• Inconsistent data semantics (e.g., cell line names)
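As a minimal sketch of how the three dimensions might be measured for a set of sequencing files, the Python below summarizes a file listing. The `FileRecord` type, the `summarize` function, and the sample records (names, sizes, dates) are hypothetical illustrations, not Broad tooling or data.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical description of one data file (not a Broad schema)."""
    name: str
    size_gb: float
    created: date


def summarize(records: list[FileRecord]) -> dict:
    """Return simple Volume, Velocity, and Variety measures.

    Assumes a non-empty listing in which every file name has an extension.
    """
    formats = Counter(r.name.rsplit(".", 1)[-1].upper() for r in records)
    first = min(r.created for r in records)
    last = max(r.created for r in records)
    days = (last - first).days + 1  # inclusive span of production days
    return {
        "volume_gb": sum(r.size_gb for r in records),  # Volume: total size
        "file_count": len(records),                    # Volume: number of files
        "files_per_day": len(records) / days,          # Velocity: production rate
        "formats": dict(formats),                      # Variety: format diversity
    }


# Made-up records for illustration only.
records = [
    FileRecord("sample1.bam", 85.0, date(2018, 5, 1)),
    FileRecord("sample2.bam", 90.0, date(2018, 5, 1)),
    FileRecord("sample1.vcf", 1.0, date(2018, 5, 2)),
    FileRecord("sample3.cram", 20.0, date(2018, 5, 4)),
]
summary = summarize(records)
print(summary)
```

Inconsistent semantics, the third aspect of Variety, is not captured by counting file extensions and would need a separate check.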
13. Thesis of this talk!
• Using Agile techniques in wet labs and computational science speeds production of
big data in multiple dimensions
• Volume
• Increases number of samples sequenced
• Lowers cost of sequencing and analysis, and barriers to clinical sequencing
• Velocity
• Reduces cycle time of physical sample preparation prior to sequencing
• Improves use of people and resources in lab work
• Variety
• Increases types of samples being sequenced (e.g., types of cells, diseases,
ethnic and geographic diversity, nomenclatures, APIs, and repositories)
14. Broad Institute launched
Initial $100M gift from Broad Foundations;
A 10-year “experiment” in collaborative
science
Broad doubles in size
Governed by MIT-Harvard leadership;
Administratively managed within MIT
Headquarters building opens
250,000 sq. ft. at 415 Main Street
Broads double initial gift to $200M
Unrestricted for Broad research and
operations
Creation of Stanley Center
Founding $100M, 10-year gift from
Stanley Medical Research Institute
“Experiment” declared a success
Broads announce new endowment of $400 million
Combined $600M Current Use + Endowment Gift
Carlos Slim Foundation provides $65M
New initiative in genomic disease research;
1st U.S. collaboration to receive funding
Stanley
building opens
at 75 Ames Street
Second gift of $74M
Slim Initiative for Genomic
Medicine for the Americas
10th anniversary
$100M gift from Broad Foundations
to launch next decade of science
Creation of the Klarman Cell Observatory
Klarman Family Foundation gift of $33M
Commitment of $650M
Ted Stanley invests in
psychiatric research
2002 2004 2006 2007 2008 2009 2010 2012 2013 2014 2015
Broad Genomics
GP and DSP align
Genomics Platform
BSP Arrays and
Sequencing merge
Volume – Size of sequenced sample x # samples
100,000 genomes
~ 70 PB of data
~ 825K BAM files
~ 1.2 billion hours
of streaming music
Two major research groups come together
Whitehead/MIT Center for Genome Research;
Harvard Institute of Chemistry and Cell Biology
Broad Institute, Inc. established
501(c)3 formed 9/08; Operations begin 7/09
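The volume figures on this slide can be sanity-checked with simple arithmetic. The sketch below assumes decimal units (1 PB = 1,000,000 GB) and uses the 80 to 90 GB per-BAM size given in the speaker notes; the implied streaming bitrate is derived here from the slide's numbers and is not a figure stated in the talk.

```python
GB_PER_PB = 1_000_000  # assumption: decimal units, 1 PB = 10^6 GB

bam_files = 825_000    # ~825K BAM files (from the slide)
total_pb = 70          # ~70 PB of data (from the slide)
music_hours = 1.2e9    # ~1.2 billion hours of streaming music (from the slide)

# Average size per BAM file implied by the slide's totals.
avg_bam_gb = total_pb * GB_PER_PB / bam_files

# Total volume if every BAM sat at the low or high end of the 80-90 GB range.
low_pb = bam_files * 80 / GB_PER_PB
high_pb = bam_files * 90 / GB_PER_PB

# Audio bitrate implied by equating 70 PB with 1.2 billion hours of music.
total_bits = total_pb * 1e15 * 8
implied_kbps = total_bits / (music_hours * 3600) / 1000

print(f"average BAM size: {avg_bam_gb:.1f} GB")
print(f"80-90 GB per BAM gives {low_pb:.2f} to {high_pb:.2f} PB")
print(f"implied streaming bitrate: {implied_kbps:.0f} kbit/s")
```

The implied average of roughly 85 GB per file falls inside the 80 to 90 GB range, so the slide's figures are mutually consistent under these assumptions.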
15. Velocity
• Sequencing cost per genome has fallen to ~$1K
• Cost to analyze a genome has also
fallen to ~$5
• Why does this matter?
Precision/personalized
medicine involves more
sequencing
• Assert: Agile increases the
velocity of cost reduction
via shorter cycle times,
cheaper reagents, reusable
software, better use of
people, etc.
16. Velocity – Sample preparation and sequencing
• Reduces cycle time of physical sample preparation prior to sequencing
• Improves use of people and resources in lab work
• How? Using Dynamic Work Design
• Principle #1: Constant reconciliation of intent and activity
• Principle #2: Regular use of structured problem solving
• Principle #3: Optimal challenge
• Principle #4: Connect the human chain
17. Velocity – Sample preparation and sequencing
• Genomics Platform achieves these results through better technology:
• Instruments
• Software
• Reagents
• Training
• Organization
18. Velocity – Sample preparation and sequencing
• Dynamic Work Design shares many similarities with Agile/Scrum and uses many of
the same techniques:
• Visual management
• Morning production meeting
• Pull system (Kanban)
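A pull system can be sketched in a few lines: a stage takes new work only when it has free capacity under its work-in-progress (WIP) limit. The `Board` class, the stage names, and the limits below are hypothetical illustrations, not the Genomics Platform's actual process, and the sketch simplifies by finishing one item in the final stage per step.

```python
from collections import deque


class Board:
    """Minimal Kanban board: each stage pulls work only when under its WIP limit."""

    def __init__(self, wip_limits: dict[str, int]):
        self.stages = list(wip_limits)  # stage names, in workflow order
        self.wip_limits = wip_limits
        self.columns = {stage: deque() for stage in self.stages}
        self.backlog = deque()
        self.done = []

    def add(self, item: str) -> None:
        """Queue new work; it waits in the backlog until a stage pulls it."""
        self.backlog.append(item)

    def tick(self) -> None:
        """Advance one step, from the last stage backwards, so work is pulled."""
        last = self.stages[-1]
        if self.columns[last]:
            # Simplification: the final stage finishes its oldest item each step.
            self.done.append(self.columns[last].popleft())
        for i in range(len(self.stages) - 1, -1, -1):
            stage = self.stages[i]
            upstream = self.columns[self.stages[i - 1]] if i > 0 else self.backlog
            # Pull from upstream only while this stage has free capacity.
            while upstream and len(self.columns[stage]) < self.wip_limits[stage]:
                self.columns[stage].append(upstream.popleft())


# Hypothetical two-stage lab workflow with made-up sample IDs.
board = Board({"prep": 2, "sequencing": 1})
for sample in ["S1", "S2", "S3", "S4"]:
    board.add(sample)
for _ in range(3):
    board.tick()
print(board.done, dict(board.columns), list(board.backlog))
```

Because stages are processed from the end of the workflow backwards, an item moves at most one stage per step, and upstream work waits rather than piling up in front of a full stage.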
21. Velocity – People and resources
• PRISM, for multiplexed screening of compounds against
cancer cell lines (wet lab)
• Dependency Map, a public
portal for cancer data (wet
lab, COTS software,
software development)
Agile practices used
• Retrospectives
• Standups
• Sprints
• Kaizen
• Visual board
22. Velocity – People and resources
• Improving use of people and resources in data
science by enabling reuse
• Data Biosphere: modular and interoperable
components that can be assembled into diverse
data environments. The Data Biosphere should be
based on four governing principles. It should be:
• (1) modular, composed of functional components with well-specified interfaces;
• (2) community-driven, created by many groups to foster a diversity of ideas;
• (3) open, developed under open-source licenses that enable extensibility and reuse, with
users able to add custom, proprietary modules as needed;
• (4) standards-based, consistent with standards developed by coalitions such as the Global
Alliance for Genomics and Health (GA4GH)
Agile values
• Deliver value
• Work together
• Self-organizing teams
• Simplicity
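The first Data Biosphere principle, modular components with well-specified interfaces, can be illustrated with a small sketch. The `DataStore` protocol and both implementations are hypothetical and are not part of any Data Biosphere or GA4GH specification; they only show how code written against an interface lets components be swapped or combined.

```python
from typing import Protocol


class DataStore(Protocol):
    """Hypothetical interface: any component that returns bytes for an object ID."""

    def fetch(self, object_id: str) -> bytes: ...


class InMemoryStore:
    """One interchangeable component: serves objects from a dictionary."""

    def __init__(self, objects: dict[str, bytes]):
        self._objects = objects

    def fetch(self, object_id: str) -> bytes:
        return self._objects[object_id]


class PrefixedStore:
    """Another component: wraps any DataStore and namespaces its object IDs."""

    def __init__(self, inner: DataStore, prefix: str):
        self._inner = inner
        self._prefix = prefix

    def fetch(self, object_id: str) -> bytes:
        return self._inner.fetch(f"{self._prefix}/{object_id}")


def object_size(store: DataStore, object_id: str) -> int:
    """Works with any component that satisfies the DataStore interface."""
    return len(store.fetch(object_id))


# Made-up object for illustration only.
base = InMemoryStore({"study1/sample.vcf": b"##fileformat=VCFv4.2\n"})
scoped = PrefixedStore(base, "study1")
print(object_size(base, "study1/sample.vcf"), object_size(scoped, "sample.vcf"))
```

`object_size` never needs to know which component it was given, which is the property that makes modules reusable across different data environments.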
23. Variety
• Increases types of samples being sequenced in additional dimensions, e.g.,
• Types and sources of cells
• Types of diseases
• Ethnic and geographic diversity
• Nomenclatures, APIs, and repositories
• Agile practices being applied in each case, speeding the processing of samples
and the creation of both sample metadata and genomic data
24. Variety – Types and sources of cells
• Agile principles being used by Broad labs involved
in Human Cell Atlas to manage wet lab work (e.g.,
visual boards,
retrospectives)
• Agile used to develop
portals to enable patients,
at scale, to sign up and
consent for studies, and
for sample processing
25. Variety – Ethnic and geographic diversity
• In 2016, 81% of participants in Genome-Wide Association Studies (GWAS) were of
European descent, while participants of African, Latin American, native, or
indigenous descent made up less than 4%
• Agile practices used to
further studies in under-represented
populations (e.g., visual management,
short delivery cycles)
26. Variety – Types of diseases
• Agile practices used to aid the study of a wider range of
diseases, e.g.,
• The Sabeti Lab uses Agile
practices in their work on
infectious diseases to
enable real-time sharing of
genomic data
27. Variety – Nomenclatures, APIs, and repositories
• Nomenclatures are critically important to sharing
data and promoting collaboration (e.g., cell lines)
• Broad scientists, both wet lab and data, are key
contributors to organizations and alliances that
provide, and promote sharing of data through, public
(and coordinated) APIs
• Agile practices
used by both
groups in their
daily work!
28. How the Broad encourages adoption of Agile
• Encourages collaboration within the Broad, e.g.,
• Platforms (e.g., Genomics, Data Sciences)
• Programs (e.g., Cancer, Infectious Disease and Microbiome)
• Academic labs (e.g., Sabeti Lab, Regev Lab)
• Employs Agile within scientific groups and administration, e.g.,
• Data Sciences Platform has Agile coaches, Scrum Masters, and Product Owners as job
descriptions/titles
• Broad Information Technology Services employs Scrum for specific projects
• Supports affinity groups and offers related training
• Agile Academia, focused specifically on educating and spreading use of Agile
• PM@Broad, focused on traditional project management, though PMI is embracing Agile…
• People Development workshops (e.g., Influencing without Authority, Matrix Management)
29. Recapitulation – Thesis of this talk!
• Using Agile techniques in wet labs and computational science speeds production of
big data in multiple dimensions
• Volume
• Increases number of samples sequenced
• Lowers cost of sequencing and analysis, and barriers to clinical sequencing
• Velocity
• Reduces cycle time of physical sample preparation prior to sequencing
• Improves use of people and resources in lab work
• Variety
• Increases types of samples being sequenced (e.g., types of cells, diseases,
ethnic and geographic diversity, nomenclatures, APIs, and repositories)
30. Acknowledgements
Thank you to the many people who helped pave the way for current and
future success! A few notable individuals:
• Mark Baker
• Michelle Campo
• Jean Chang
• Raymond Coderre
• Sheila Dodge
• Vicky Guo
• Andrew Hollinger
• Eric Jones
• Jen Lapan
• Yenarae Lee
• Anthony Losada
• William Mayo
• Peter Ragone
• Jennifer Roth
• Katie Shakun
• David Siedzik
• Rocky Stroud
• Diolinda Vaz
• Sarah Winnicki
Broad Alumni
• Sadiya Akasha
• Zeyna Haddad
Editor's Notes
All citations in modified MLA format: <author>. <title of source>. <title of container>, <other contributors>, <version>, <number>, <publisher>, <publication date in format <year>, <month> <day>>. Retrieved from <url> on <year>, <month> <day>.
“Manifesto for Agile Software Development”, 2001. Retrieved from http://agilemanifesto.org on 2018, May 14.
“Principles behind the Agile Manifesto”, 2001. Retrieved from http://agilemanifesto.org/principles.html on 2018, May 14
Kendra West, Zeyna Haddad. “Agile ToolKit @ Broad Workshop”, 2017, October 12.
Diego Lo Giudice, Holger Kisker, Nasry Angel. “How Can You Scale Your Agile Adoption?”, 2014, February 05. Forrester. Retrieved from https://www.forrester.com/report/How+Can+You+Scale+Your+Agile+Adoption/-/E-RES110444#AST962998%202013 on 2018, May 14.
Jeff Sutherland, J.J. Sutherland, “SCRUM The Art of Doing Twice the Work in Half the Time”, 2014. Retrieved from https://www.scruminc.com/new-scrum-the-book/ on 2018, May 14.
The actual order is Commitment, Courage, Focus, Openness, Respect
It’s not an acrostic, it’s a mesostic.
Wikipedia. Acrostic. Retrieved from https://en.wikipedia.org/wiki/Acrostic on 2018, May 14.
Wikipedia. Mesostic. Retrieved from https://en.wikipedia.org/wiki/Mesostic on 2018, May 14.
“The Scrum Guide™”, 2017, November. Retrieved from http://www.scrumguides.org/scrum-guide.html on 2018, May 14.
We are OPEN about what we’re working on and our progress.
We have COURAGE to change; to take on new challenges; to have frank conversations.
We RESPECT each other’s time; ideas; skills. We respect our customers.
We are FOCUSED on our goal; shield each other from distractions.
We COMMIT to completing our work; to delivering value to the customer.
Ken Schwaber, Jeff Sutherland. “The Scrum Guide™”, 2017, November. Retrieved from http://www.scrumguides.org/scrum-guide.html on 2018, May 14.
Doug Laney. “3D Data Management: Controlling Data Volume, Velocity, and Variety”. META Group (now Gartner Group). 2001, February 06. Retrieved from https://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf on 2018, May 14.
Gartner, Inc. “Big Data”. Gartner IT Glossary > Big Data. Retrieved from https://www.gartner.com/it-glossary/big-data/ on 2018, May 14.
Seth Grimes. “4 VS for Big Data Analytics”. Breakthrough Analysis (blog). 2013, July 31. Retrieved from https://breakthroughanalysis.com/2013/07/31/4-vs-for-big-data-analytics/ on 2018, May 14.
International Business Machines Corp. “The Four V’s of Big Data”. Retrieved from http://www.ibmbigdatahub.com/infographic/four-vs-big-data on 2018, May 14.
“Dimensions of Big Data”. Klarity, Social Media Broadcasts (SMB) Limited. 2015, July 27. Retrieved from http://www.klarity-analytics.com/2015/07/27/dimensions-of-big-data/ on 2018, May 14.
Gil Press. “A Very Short History of Big Data”. Forbes Media LLC. 2013, December 21. Retrieved from https://www.forbes.com/sites/gilpress/2013/05/09/a-very-short-history-of-big-data/#897211d65a18 on 2018, May 14.
Steve Lohr. “The Origins of ‘Big Data’: An Etymological Detective Story”. The New York Times Company. 2013, February 01. Retrieved from https://bits.blogs.nytimes.com/2013/02/01/the-origins-of-big-data-an-etymological-detective-story/?_r=0 on 2018, May 14.
University of California Santa Cruz, “List of UCSC genome releases”, “Frequently Asked Questions: Assembly Releases and Versions”. Retrieved from https://genome.ucsc.edu/FAQ/FAQreleases.html#release1 on 2018, May 14.
You can skip the rest of the talk if you get this
BAM files are 80 – 90 GB each
Just establishing Broad’s Big Data credentials in terms of size
Zachary D. Stephens, Skylar Y. Lee, Faraz Faghri, et al. “Big Data: Astronomical or Genomical?”, PLOS Biology, 2015, July 15. Retrieved from http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002195 on 2018, May 14.
“DNA Sequencing Costs: Data”, National Human Genome Research Institute. Retrieved from https://www.genome.gov/sequencingcostsdata/ on 2018, May 14.
“Cost per Genome”, National Human Genome Research Institute. Retrieved from https://www.genome.gov/images/content/costpermb_2017.jpg on 2018, May 14.
Sheila Dodge, Don Kieffer, Nelson Repenning, et al. “Using Dynamic Work Design to Help Cure Cancer (and other diseases)”, MIT Sloan School of Management, 2016, June. Retrieved from http://mitsloan.mit.edu/shared/ods/documents/Repenning_Cancer_full.pdf&PubID=15032 on 2018, May 14.
MIT Sloan Executive Education. “Speeding the cure for cancer: Financial engineering and Dynamic Work Design”, 2017, August 25. Retrieved from https://executive.mit.edu/blogpost/speeding-the-cure-for-cancer-financial-engineering-and-dynamic-work-design on 2018, May 14.
Alix Stuart, “From Cogs to Creators: Fueling employee engagement with dynamic work design”, Alumni Magazine, 2016. Retrieved from http://mitsloan.mit.edu/alumnimagazine/2016/fall/innovation-at-work.php on 2018, May 14.
Achilles. Retrieved from https://portals.broadinstitute.org/achilles on 2018, May 14.
PRISM. Retrieved from https://www.broadinstitute.org/news/7944 on 2018, May 14.
Dependency Map. Retrieved from https://depmap.org/portal/ on 2018, May 14.
Jamie Ducharme, “Local Researchers Mapped the Many Ways Cancer Cells Dodge Death”, Boston Magazine, 2017, July 31. Retrieved from https://www.bostonmagazine.com/health/2017/07/31/broad-cancer-dependency-map/ on 2018, May 14.
Benedict Paten, et al. “A Data Biosphere for Biomedical Research”, Medium, 2017, Oct 16. Retrieved from https://medium.com/@benedictpaten/a-data-biosphere-for-biomedical-research-d212bbfae95d on 2018, May 14.
Human Cell Atlas. Retrieved from https://www.humancellatlas.org on 2018, May 14.
Metastatic Prostate Cancer Project. Retrieved from https://mpcproject.org/home on 2018, May 14.
Emily Mullin. “Solving the Lack of Diversity in Genomic Research”, MIT Technology Review, 2016, October 25. Retrieved from https://www.technologyreview.com/s/602671/solving-the-lack-of-diversity-in-genomic-research/ on 2018, May 14.
Heather Lindsey, “Bringing Diversity to Genomic Data: Under-Represented Ethnic Minorities Sometimes Misclassified, Misdiagnosed”, Clinical Laboratory News, 2017, June 1. Retrieved from https://www.aacc.org/publications/cln/articles/2017/june/bringing-diversity-to-genomic-data-under-represented-ethnic-minorities on 2018, May 14.
Zhai Yun Tan. “Genetic test accuracy stymied by lack of diversity in genomic research”, MedCityNews, 2016, August 18. Retrieved from https://medcitynews.com/2016/08/ack-of-diversity-in-genomic-research/?rf=1 on 2018, May 14.
NeuroGAP-Psychosis. Retrieved from https://www.broadinstitute.org/neurogap/neurogap-psychosis on 2018, May 14.
“Sherlock: Detecting disease with CRISPR”. Retrieved from https://www.broadinstitute.org/videos/sherlock-detecting-disease-crispr on 2018, May 14.
Sabeti Lab. Retrieved from https://www.sabetilab.org/ on 2018, May 14.
Global Alliance for Genomics & Health. Retrieved from https://www.ga4gh.org/ on 2018, May 14.
Genomic Data Commons. Retrieved from https://www.cancer.gov/about-nci/organization/ccg/research/computational-genomics/gdc on 2018, May 14.
Horia Slusanschi, “Introducing the PMI Agile Practice Guide”, Projectmanagement.com, 2017, May 21. Retrieved from https://www.projectmanagement.com/blog-post/29761/Introducing-the-PMI-Agile-Practice-Guide on 2018, May 14.