Applications of Data Science in Drug Discovery, Financial Services, Project Management, Human Resources and Marketing.
By Dr. Laila Alabidi at the JOSA Data Science Meetup on 17/8/2019.
Data Science in Action
1. Data Science in Action
Applications of Data Science in Drug
Discovery, Financial Services, Project
Management, Human Resources and
Marketing
2.
3. What this talk is about
My Journey
- Change of career
- Lessons learned
Industries covered
- Pharmaceutical
- Project management
- Financial services within a major UK bank
- Human resources and recruitment
- Marketing
4. Druuuuugssssss *the boring kind
Pubmed: a repository of all medical/biological
related literature
Every day hundreds of papers are published.
How can an expert in ontology know about
advances in diabetes research that will impact
his/her research?
Expedite the arrival at a Eureka! moment
Benevolent AI
Headquartered in London UK
Startup
Partnered with AstraZeneca
5. AI to the rescue!
- GPUs
- TPUs
- Lots of money
Combining free unstructured text with hand-curated
chemical-drug, protein-gene and gene-drug
databases to gain insight and help the drug
discovery scientist reach their Eureka! moment.
NLP
- POS tagging
- Syntactic parsing
- Entity detection
Graph Theory
- Inferred edges
- Path analysis
Reinforcement Learning
- Software engineers
- Data Scientists
- Bioinformaticians
- Drug discovery scientists
- Clever business type people
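A minimal sketch of the "path analysis" idea from this slide, assuming a toy knowledge graph whose edges were already extracted from text and curated databases. The entity names (`drug_A`, `protein_X`, and so on), the `EDGES` list and both function names are hypothetical placeholders, and plain breadth-first search stands in for whatever method was actually used. A path between a drug and a disease is the kind of evidence that could suggest an inferred edge between them.

```python
from collections import deque

# Hypothetical toy graph: (source, relation, target). Not real biology.
EDGES = [
    ("drug_A", "binds", "protein_X"),
    ("protein_X", "activates", "gene_Y"),
    ("gene_Y", "associated_with", "disease_Z"),
    ("drug_B", "binds", "protein_W"),
]


def build_adjacency(edges):
    """Map each node to its outgoing (relation, target) pairs."""
    adjacency = {}
    for source, relation, target in edges:
        adjacency.setdefault(source, []).append((relation, target))
    return adjacency


def shortest_path(edges, start, goal):
    """Breadth-first search; returns the edges on a shortest path, or None."""
    adjacency = build_adjacency(edges)
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, target in adjacency.get(node, []):
            if target not in visited:
                visited.add(target)
                queue.append((target, path + [(node, relation, target)]))
    return None


path = shortest_path(EDGES, "drug_A", "disease_Z")
for source, relation, target in path:
    print(f"{source} --{relation}--> {target}")
```

In this toy graph `drug_A` reaches `disease_Z` in three hops, while `drug_B` has no path at all, so only the first pair would be a candidate for an inferred drug-disease edge.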
6. Show me the money!
Onset of open banking in the UK
Major banks want to get ahead of the curve by
extracting maximal insight from their data
Millions of transaction records
Transaction Classification -- what is a salary
payment? Are we losing money to competitors
when it comes to savings?
Pensions: how to identify trends in pensions and
attract/retain customers
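As a rough illustration of transaction classification, here is a rule-based baseline of the kind one might try before any learned model. The keyword lists, the category labels and the function name `classify_transaction` are all assumptions made for this sketch; nothing here reflects the bank's actual data or method.

```python
# Hypothetical keyword lists; real transaction descriptions are far messier.
SALARY_KEYWORDS = ("salary", "payroll", "wages")
SAVINGS_KEYWORDS = ("savings", "isa")


def classify_transaction(description, amount):
    """Label a transaction from its free-text description and signed amount.

    Positive amounts are money in, negative amounts are money out.
    Matching is done on whole tokens so that "isa" does not match "visa".
    """
    tokens = set(description.lower().split())
    if amount > 0 and any(keyword in tokens for keyword in SALARY_KEYWORDS):
        return "salary"
    if amount < 0 and any(keyword in tokens for keyword in SAVINGS_KEYWORDS):
        return "external_savings"
    return "other"


examples = [
    ("ACME LTD PAYROLL", 2500.0),
    ("TRANSFER TO SAVINGS ACCT", -200.0),
    ("VISA COFFEE SHOP", -3.5),
]
for description, amount in examples:
    print(description, "->", classify_transaction(description, amount))
```

A baseline like this also yields cheap, noisy labels that can later be checked against expert annotation.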
Mudano
Consultancy
Startup
Visionaries in implementing AI in the project
management domain.
7. Give me time! Give me clarity!
Think Kanban, Atlassian JIRA
Think like a project manager-- how do I keep
track of all tickets of all projects of all my
human resources?
How do I streamline a project to cut waste
and maximise production?
How do I identify pain points in a project, i.e.
issues that might delay delivery?
8. I want to hire the best!
Think like a recruiter.
How do I identify the optimum person for a job?
- Specific experience
- Specific qualification
- Likely to be ready to move
- Go beyond keyword matches
Often the client knows the person they want to
hire -- and wants someone similar!
Pre-seed startup
Emerged from a startup incubator in London
(Founders Factory)
9. What to look for?
- Compare the companies a candidate worked for
- Large companies are distinctly different to smaller ones
- Similarity matrix -- dimensionality reduction
- Look for candidates who have similar job titles
- Semantic search
- From job descriptions
- Propensity to move: who is likely to be open to new job opportunities?
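The "similar job titles" idea can be sketched with cosine similarity over word counts. This is only a token-overlap baseline: a real semantic search would more likely compare learned embeddings, which is what "go beyond keyword matches" points at. The candidate data and the function names below are hypothetical.

```python
import math
from collections import Counter


def vectorise(text):
    """Bag-of-words vector: token -> count."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(reference_title, candidates):
    """Return candidate ids ordered from most to least similar title."""
    reference = vectorise(reference_title)
    scored = [
        (cosine_similarity(reference, vectorise(title)), candidate_id)
        for candidate_id, title in candidates.items()
    ]
    return [candidate_id for _, candidate_id in sorted(scored, reverse=True)]


# Hypothetical candidates keyed by an id.
candidates = {
    "c1": "senior data scientist",
    "c2": "data engineer",
    "c3": "marketing manager",
}
ranking = rank_candidates("lead data scientist", candidates)
print(ranking)
```

The same ranking function works for "someone similar to the person the client already has in mind": use that person's title as the reference.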
10. Who should I advertise to?
- Identifying target audiences online
- Demographics
- Online behaviour
- Location data
Want to know:
- Who is likely to visit a store
- Who is likely to click on a link
- Keeping things in line with GDPR
Part of the Omnicom Group
Operates like a startup (flat management, fluid
job description, room to innovate)
With the backing of a large organisation
11. The common theme? Recommender Systems
Recommend --
- The protein which activates a gene
- A drug that activates the protein
- The ideal candidate for a job
- The most relevant notifications for a project
- The right audience for an advertisement
13. Getting the data--
- Buy it
- Scrape it
- Mine it (via apps, cookies, etc.)
- Download it
14. Identifying data quality: startup pitfall
- Many startups jump straight into the DS model
- Don't allow time for data quality checks
- Or understanding the data
- Decide on the list of desirables in advance
- Check for missing variables
- Correlations
- Check the volume of data when joined to other data sets
- Ask: can we impute the data?
- What can we do with missing/incomplete/inaccurate data?
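Three of the checks above (missing values, join volume, imputation) can be sketched in a few lines. This assumes rows are plain dictionaries with `None` marking a missing value; the column names, the sample data and all function names are made up for the example, and mean imputation is just one simple option among many.

```python
def missing_rate(rows, column):
    """Fraction of rows where the column is absent or None."""
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


def join_coverage(left_rows, right_rows, key):
    """Fraction of left rows whose key also appears in the right data set."""
    right_keys = {row[key] for row in right_rows}
    matched = sum(1 for row in left_rows if row[key] in right_keys)
    return matched / len(left_rows)


def impute_mean(rows, column):
    """Return new rows with missing values replaced by the observed mean."""
    observed = [row[column] for row in rows if row.get(column) is not None]
    if not observed:
        raise ValueError(f"no observed values to impute {column!r}")
    mean = sum(observed) / len(observed)
    return [
        {**row, column: mean if row.get(column) is None else row[column]}
        for row in rows
    ]


# Hypothetical sample data.
customers = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 30.0},
    {"id": 4, "amount": None},
]
profiles = [{"id": 1}, {"id": 3}, {"id": 5}]

print(missing_rate(customers, "amount"))
print(join_coverage(customers, profiles, "id"))
print([row["amount"] for row in impute_mean(customers, "amount")])
```

Here half the amounts are missing and only half the customers survive the join, which is exactly the kind of finding worth knowing before any model is built.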
15. The Data Science Model: circle of life
- Don't be clever
- Start simple
- Iterate
- Play with the data! Feature selection, feature engineering
- Understand the data -- gain a little domain knowledge
- Measured by whether you can hold a conversation with a domain expert
- Understand what is required!
- Determine what the desired outcome is
- Good precision or recall
- AUC
- If unsupervised how to determine quality
- Specific gain for the business? (more revenue? More efficient work? New discoveries)
- Precision might come at the expense of discovery!
- Recall might be at the expense of efficiency
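The precision/recall trade-off can be made concrete with a small worked example. The scores and labels below are invented, and `precision_recall` and `predict` are illustrative helper names; the point is only that moving the decision threshold trades one metric against the other.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def predict(scores, threshold):
    """Turn model scores into binary predictions."""
    return [1 if score >= threshold else 0 for score in scores]


# Invented model scores and ground-truth labels.
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0]

strict = precision_recall(labels, predict(scores, 0.7))
lenient = precision_recall(labels, predict(scores, 0.3))
print("threshold 0.7:", strict)
print("threshold 0.3:", lenient)
```

With the strict threshold every prediction is correct but one true positive is missed (precision 1.0, recall 2/3). With the lenient threshold nothing is missed but one prediction is wrong (precision 0.75, recall 1.0).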
16. Supervised vs unsupervised
- Do you have training data?
- Is that training data reliable?
- What is the source?
- Mechanical Turk?
- Expert annotation?
- Is it biased?
- Is my training data copious?
- Can I combine golden corpora with silver?
17. ML vs DL
- Do you need transparency or explainability?
- E.g. legal or financial services
- How much data do you have?
- Does it support DL?
- Do you have the technology?
- Time?
- Money?
- DL models on GPUs are expensive
18. Presenting results
Conveying the:
- Significance
- Importance
- Limitations
Of a project to stakeholders/clients/management
- People who are non-experts in the field
19. Presenting the results: pearls of wisdom
- Don’t lie
- Don’t exaggerate
- Be clear
- Be honest
- Try to think like a stakeholder
- What do they want from the project?
- How do I present the importance and usefulness of the results?
- Explain the benefit of the complicated/time-consuming/expensive DS approach compared with the
easier/cheaper/faster methods