Presentation given at the Big Data in Pharma Europe conference, London February 19th 2014 (http://bigdatapharma-europe.com/). Updated for Enterprise Search Europe Summit April 29th (http://www.enterprisesearcheurope.com/2014/Tuesday.aspx).
Overview of the innovation approach taken within Technology Services at AstraZeneca, showcasing 6 examples of pilots and proofs-of-concept, with a case study of how to implement a revolution in search analytics, using R&D as a springboard for the enterprise.
Insight into AstraZeneca's Technology Services.
1. Fostering Collaboration Using Analytics
& Real-time Big Data Search:
Insight into Technology Services
Nick Brown
Technology Services Lead (EMEA)
AstraZeneca
2. AstraZeneca History
Health Connects Us All
AstraZeneca is a biopharmaceutical company with R&D at its core. Our business is
providing innovative, effective medicines that make a real difference to patients.
We have grown from agrochemicals and paints, to pharmaceuticals and biologics. As we
virtualise our R&D activities, working increasingly with external researchers, access and
how we leverage information is critical to our success.
Unfortunately there is no silver bullet and search is still an evolving art. How
we innovate and analyse our information will play a huge part in our future.
3. Culture of Innovation
Through Technology Services
Drive technology standards across AstraZeneca by
fostering collaborative pilots to simplify landscape
Nurture ideas that deliver immediate value to
the business, leading to step-changes
Create a safe environment to explore innovative
ideas closely with the business functions
4. Fail Fast Whilst
Delivering Business Value
Our technology services team works with novel approaches from start-ups, research labs,
biotechnology companies & entrepreneurs using a 3 step model for rapid business value.
5 day Proof-of-concepts (PoC) use existing functionality. Fail fast if unsuccessful.
3 month Pilots with a senior leader and a single business problem to solve.
All pilots must deliver value, and if an approach proves disruptive, with step-wise
improvements, we work closely with the business to engage and drive implementation.
PoC: 11, Pilot: 9, Implement: 3
5. Mobile Ideation Pilot
Gamification & Lightweight
Co-developed a mobile ideation platform
to enable AstraZeneca and its partners to
tap into the ideas of the collective workforce.
Expanded from PoC to working prototype
very rapidly, and there is now keen interest from
other areas of the business in using it.
If broken down into small pieces, the crowd can help analyse big data sets
6. Open Up And Unlock
Potential Of Our Big Data
Many people outside AstraZeneca have already solved big data challenges, so we
piloted multiple crowd-sourcing platforms, such as a community of >100,000 data scientists.
Using online competitions, we received solutions from
experts in oil & gas, meteorology and mathematics. We
made available millions of historical prescription and call
pattern data points for one of our major brands, to model
and identify key impact metrics of promotional material.
Identified 3 key metrics taken back into the field
Evaluated different models; an ecosystem approach
to crowd-sourcing is probably right for us now
We can learn from every industry that faces different big data challenges
7. Identifying New Entities Across
Big Data Textual Haystacks
Scientific abstracts → Algorithm to identify novel entities → Comparison to existing CI databases → Identification of potential NME opportunities
Initiated a proof-of-concept with Thomson Reuters to leverage techniques applied in
news editorials to identify new potential drug candidates not seen in CI databases.
Drug entities are captured by information companies through analyst reports, websites
and publications, but candidates from smaller biotechs or academics can be missed.
Identified >50 late-stage drug candidates from BRIC-MT with huge potential for
our in-licensing teams across AstraZeneca, as they are relatively unknown externally.
It’s often the small data hidden in big data that offers the real gems
8. Capturing The Real Voice
For Cancer Patients
Sponsored the IBM Extreme Blue internship program, in which students are
challenged to solve a business problem in 10 weeks. The team developed an
oncology patient website to initially record patients' real voices, phonetic
algorithms, and a mobile Android application that captures GPS location and predicts
sentences that laryngectomy patients would use in real life.
The team went on to win the 2013 European Expo within IBM across all projects and were interviewed
by Computer Weekly (press release). We are now working with colleagues in oncology to understand
the next steps and discussing wider options with local UK charities and foundations.
Big data analytics has the potential to help patients in their daily lives
9. Case Study: PoC to Implementation
Distributed R&D
Photo Credit: http://cdn-wac.emirates247.com/polopoly_fs/1.509718.1370831315!/image/256556252.jpg
10. R&D Search Pilots
3 way head-to-head competition
In Q1 2013, assessed 20 enterprise search platforms & piloted 3 companies in an internal
competition to revolutionise search within R&D. Our tests included indexing 50M documents,
semantic tagging, text analytics and building a search based application with visualisation.
Sinequa selected as most advanced big data analytics & real time search
11. R&D Search
Platform Implementation
In July 2013, licensed Sinequa for R&D search with the intention of establishing
the hardware platform in Q3 and releasing to R&D in Q4.
>120 connectors to unstructured & structured, internal & external data
Accurate semantic mark-up with most advanced text-mining capabilities
Intelligent, intuitive search hides advanced & complex search features
Generate insight, analytics & alerts across billions of knowledge facts
Rapidly build business intelligence applications, including mobile
12. Virtual Team
Connected By Passion
To build our applications rapidly, we supplement our team with external
experts, including running competitions on open innovation platforms like
TopCoder.
13. R&D Search & Analytics
Real-time, Big Data
The 4V’s of R&D Search
Volume: scale of data
Variety: different forms of data
Velocity: analysis of streaming data
Veracity: uncertainty of data
For more information on R&D Search, contact Nick Brown
Over 80% of our scientific
information is unstructured and
distributed in silos across our
business. By adopting Big Data
approaches, we aim to improve
access and our decision making.
By 2014
Implement R&D Search across
iMED, GMD and MedImmune.
Springboard for one enterprise
search platform for AstraZeneca.
Filter only HQ scientific content
External (publications, patents, trials,
grants, news, conference reports)
Internal (SharePoint, Documentum,
fileshares, Oracle, O365, bespoke)
Daily incremental, real-time news
Automatically tagged with 20 scientific
vocabularies
Deduplication of >100M documents
>20 Terabytes of internal content
Over 1 billion knowledge
connections
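The slides do not say how the deduplication is performed. As a minimal sketch of one common approach, assuming exact-match deduplication on normalised text (the function names and the normalisation rule are illustrative, not taken from the platform), documents can be grouped by a content hash:

```python
import hashlib
import re


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivial formatting differences are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(documents: dict[str, str]) -> dict[str, list[str]]:
    """Group document ids by a hash of their normalised text.

    Returns a mapping from the first id seen for each distinct text
    to the list of later ids that duplicate it.
    Assumption: only exact matches after normalisation count as duplicates.
    """
    first_seen: dict[str, str] = {}
    duplicates: dict[str, list[str]] = {}
    for doc_id, text in documents.items():
        digest = hashlib.sha256(normalise(text).encode("utf-8")).hexdigest()
        if digest in first_seen:
            duplicates[first_seen[digest]].append(doc_id)
        else:
            first_seen[digest] = doc_id
            duplicates[doc_id] = []
    return duplicates
```

Hashing keeps memory per document constant, which matters at the >100M document scale quoted above. Near-duplicates (for example, the same report with a different footer) would need a similarity-based technique instead.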
14. R&D Search
Screenshot
R&D Search searches across all internal and external content,
using a relevancy algorithm developed to find key scientific
documents and leveraging all synonyms under the hood.
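To illustrate what "leveraging synonyms under the hood" can mean, here is a minimal sketch of query expansion. The synonym table, function names and matching rule are all hypothetical; the slides do not describe the actual relevancy algorithm.

```python
import re

# Hypothetical synonym table; a real system would draw on curated scientific vocabularies.
SYNONYMS = {
    "aspirin": ["acetylsalicylic acid", "ASA"],
    "heart attack": ["myocardial infarction"],
}


def expand_query(query: str, synonyms: dict[str, list[str]]) -> list[str]:
    """Return the query followed by any known synonyms, without repeats."""
    cleaned = query.strip()
    terms = [cleaned]
    for alternative in synonyms.get(cleaned.lower(), []):
        if alternative not in terms:
            terms.append(alternative)
    return terms


def search(query: str, documents: dict[str, str], synonyms: dict[str, list[str]]) -> list[str]:
    """Return ids of documents containing the query or a synonym as a whole word or phrase."""
    patterns = [
        re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in expand_query(query, synonyms)
    ]
    return [
        doc_id
        for doc_id, text in documents.items()
        if any(pattern.search(text) for pattern in patterns)
    ]
```

The user types one term and never sees the expansion, which is the sense in which the complexity stays hidden.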
15. Big Data Analytics
Not Just Search & Find
Teams search their rich
internal sources but now
find relevant documents
and any associated drugs,
genes, mechanisms,
diseases and even people.
From the start of our project,
our intention was to use a big
data engine to turn scientific
information into business
intelligence through search-based
applications.
16. R&D KOL (Key Opinion Leaders)
Visual Insight
Search isn’t just about finding people! In days, we can build visualisations that extract
insight enabling business decisions (e.g. KOLs) without a single document ever being read.
17. R&D Intelligence
Powerful Analytics
We built R&D Intelligence to find things you don’t know about! It computes
sentence-level co-occurrence between any two entities instantly to spot new opportunities.
Fantastic for drug repositioning, finding new life-cycle management ideas and target
identification, while letting scientists view only the sentence evidence they have rights to.
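Sentence-level co-occurrence can be sketched in a few lines. This is an illustrative toy, not the R&D Intelligence implementation: the sentence splitter is a naive regular expression, entities are matched by simple substring, and all names are assumptions.

```python
import re
from collections import Counter
from itertools import combinations


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cooccurrence(documents: list[str], entities: list[str]) -> Counter:
    """Count how many sentences mention each unordered pair of entities.

    Assumption: entities are given in lower case and matched as substrings.
    """
    counts: Counter = Counter()
    for text in documents:
        for sentence in split_sentences(text):
            lowered = sentence.lower()
            present = sorted(e for e in entities if e in lowered)
            for pair in combinations(present, 2):
                counts[pair] += 1
    return counts
```

Pairs with a non-zero count but no known relationship are the candidates worth a scientist's attention. Restricting the evidence shown to sentences a user has rights to would be a separate filtering step on top of this count.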
18. R&D Experts
Find & Connect within AZ & MedImmune
Find and connect to the key experts on
any scientific topic across R&D
Automatically updated profiles
Minimise duplication
Increase cross R&D collaboration
Advertise yourself
Enables social network analysis
19. R&D ChemSearch
Hunt By Chemical Sub-Structures
Users can draw a compound and search for
exact, sub-structure or similar structures.
Search against hundreds of millions of AZ
compounds in the R&D search library
Find documents with sub-structures
20. R&D Pulse
Alerts To New Content
R&D Pulse aims to give users access to only the latest information (past 2 weeks) across
all internal and external content. Users can click to view a story
instantly, set up daily or weekly alerts, and use common search strategies. They
can also star favourite articles to come back to or read later.
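The "past 2 weeks" window described above amounts to a date filter over the index. A minimal sketch, assuming each item carries a title and a publication date (the data shape and function name are hypothetical):

```python
from datetime import date, timedelta


def recent_items(
    items: list[tuple[str, date]], today: date, window_days: int = 14
) -> list[str]:
    """Return titles published within the last `window_days`, newest first."""
    cutoff = today - timedelta(days=window_days)
    recent = [(title, published) for title, published in items if cutoff <= published <= today]
    recent.sort(key=lambda item: item[1], reverse=True)
    return [title for title, _ in recent]
```

A daily or weekly alert could reuse the same function with `window_days` set to 1 or 7.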
21. R&D Search On The Move
Mobile Client
Accessible via Amazon Web Services using PingFederate (authentication) and DataPower
(access & exchange), enabling mobile applications to query our big
data search index. This keeps our cloud services lightweight and quick, but elastic enough to
expand as demand increases. Our business applications are typically built responsively in
HTML5 and CSS3 to accommodate smartphone, tablet and laptop users.
22. Piloting Novel Technologies For
Measurably Accurate Text Analytics
Even with R&D Search, we continue to test additional enhancements to our engine
IBM Watson
Converting unstructured data into structured knowledge is key
With IBM’s Emerging Technology Research, piloted a rule-based analytics engine
that creates structure from unstructured text as accurately as manual curation!
Unstructured text or tables → MATA → Structured data
23. Preventing Users From Big Data
Overload Using Trend Analytics
PoC with Saama Technologies to demonstrate data science capabilities using
over 500 million connections from 60 million scientific documents
Using tools like Google BigQuery, we were able to process all of this information, across
5 data types, using each pairwise combination of 10 scientific vocabularies, with trend
analytics in seconds
Use analytics to alert to only critical information in the huge torrent flow
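To give a feel for the size of the problem: if "pairwise combination" means unordered pairs, 10 vocabularies yield 45 pairs to analyse. The sketch below enumerates them and computes a simple trend for one connection as a least-squares slope of yearly counts. The vocabulary names and the choice of trend measure are assumptions for illustration; the slides do not specify how the trend analytics were computed.

```python
from itertools import combinations

# Hypothetical placeholder names for the 10 scientific vocabularies.
VOCABULARIES = [f"vocabulary_{i}" for i in range(1, 11)]


def vocabulary_pairs(vocabularies: list[str]) -> list[tuple[str, str]]:
    """Return every unordered pair of distinct vocabularies."""
    return list(combinations(vocabularies, 2))


def trend_slope(counts_by_year: dict[int, int]) -> float:
    """Least-squares slope of counts against year (connections gained per year)."""
    years = sorted(counts_by_year)
    n = len(years)
    if n < 2:
        return 0.0  # not enough points to define a trend
    mean_year = sum(years) / n
    mean_count = sum(counts_by_year[y] for y in years) / n
    numerator = sum((y - mean_year) * (counts_by_year[y] - mean_count) for y in years)
    denominator = sum((y - mean_year) ** 2 for y in years)
    return numerator / denominator
```

Ranking connections by slope and alerting only on the steepest ones is one way to surface critical information without asking users to read the whole stream.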
24. R&D Search is a foundation
Some towers will continue to be built
We are still creating the foundations for our big data engines, some will grow,
new ones will develop – but the future is very exciting for data scientists.
25. Thank You
Acknowledgements & Questions
This presentation describes work that has taken place in the past 12 months, clearly
not possible without the enormous support from many people, with great thanks to:
AstraZeneca: Ravi Sajja, Paul Fitzpatrick, Nasko Radev, Rob Hernandez, Susan Donohoe, Ming Chen, Fari Song, Steve Woodward, Akshay Tankhiwale, Kris Nayak, Sunny Advani, Youssef Belghali.
Sinequa: Christian Sestier, Tim Bell, Ariane Cavet, Frédéric Lardé & Alex Bilger.
Pebble Code: John Mildinhall, Tak Tran, Mark Durrant, Nancy Lee & Toby Hunt.
IBM: Tim Donovan, Jessica Evans, Matthew Lee, Joshua Lund, Sylvain Garcon, James Magowen, James Luke, Edd Biddle, Alan Knox & Henry Grahame-Smith.
Thomson Reuters: Matthew Gowen, Redmond Garvey & Annabel Griffiths.
Saama Technologies: Anil Nair, Krunal Patel, Laeeq Siddique, Aditya Phatak & Mark Hanson.
Get In Touch
For more information or to discuss a novel,
exciting technology that you think we would
be interested in, please reach out to me at
nick.brown@astrazeneca.com