Semantic Natural Language Understanding with Spark, UIMA & Machine Learned Ontologies

A text mining system must go well beyond indexing and search to appear truly intelligent. First, it should understand language beyond keyword matching: distinguishing between “Jane has the flu,” “Jane may have the flu,” “Jane is concerned about the flu,” “Jane’s sister has the flu, but she doesn’t,” and “Jane had the flu when she was 9” is of critical importance. This is a natural language processing problem. Second, it should “read between the lines” and make likely inferences even if they’re not explicitly written (for example, if Jane has had a fever, a headache, fatigue, and a runny nose for three days, not as part of an ongoing condition, then she likely has the flu). This is a semi-supervised machine learning problem. Third, it should automatically learn the right contextual inferences to make (for example, learning on its own that fatigue is sometimes a flu symptom, only because it appears in many diagnosed patients, without a human ever explicitly stating that rule). This is an association-mining problem, which can be tackled via deep learning or via more guided machine-learning techniques.

This is a live demo of an end-to-end system that makes nontrivial clinical inferences from free-text patient records and provides real-time inferencing at scale. The architecture is built out of open source big data components: Kafka and Spark Streaming for real-time data ingestion and processing, Spark for modeling, and Titan and Elasticsearch for enabling low-latency access to results. The data science components include a UIMA pipeline with custom annotators, machine-learning models for implicit inferences, and dynamic ontologies based on deep learning with Word2Vec for representing and learning new relationships between concepts. Source code is publicly available to enable you to hack away on your own.

SEMANTIC NATURAL LANGUAGE UNDERSTANDING WITH SPARK, UIMA & MACHINE-LEARNED ONTOLOGIES
David Talby, @davidtalby, CTO, Atigeo
Claudiu Branzan, @melcutz, Principal Lead, Atigeo

THE PROBLEM
Who needs to be vaccinated?
Who fits this clinical trial?
Who is at risk for sepsis?
Who is getting meds they’re allergic to?
Who on this protocol did not have this side effect?

AT THE BEGINNING, THERE WAS SEARCH
Scalable & robust indexing pipeline
Tokenizers & analyzers
Synonyms, spellers & auto-suggest
File formats & header boosting
Rankers, link & reputation boosting

THEN THERE WAS SEMANTIC SEARCH
“cheap red prom dresses”
“laptops under $500”
“italian restaurants near me that deliver”
“captain america civil war tonight”
“nba scores”
Dictionary-Based Attribute Extraction
Dell - XPS 15.6 4K Ultra HD Touch-Screen Laptop - Intel Core i5 - 8GB Memory - 256GB Solid State Drive - Silver
Machine-Learned Attribute Extraction
If you go for the ambience, you'll be disappointed. If you go for good, inexpensive and authentic Mexican food, then you're in the right place.
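
To make dictionary-based attribute extraction concrete, here is a minimal Python sketch applied to the laptop title from this slide. The dictionaries, the attribute names (`brand`, `memory_gb`, `ssd_gb`, `color`) and the " - " field separator are assumptions made for illustration; they are not taken from the talk or its source code.

```python
import re

# Hypothetical lookup dictionaries; a real catalog would supply far larger ones.
BRANDS = {"dell", "hp", "lenovo", "apple"}
COLORS = {"silver", "black", "white", "red"}

def extract_attributes(title: str) -> dict:
    """Pull structured attributes out of a product title via lookups and patterns."""
    attrs = {}
    # Assumption: fields are separated by " - ", as in the slide's example title.
    for field in (part.strip() for part in title.split(" - ")):
        lowered = field.lower()
        if lowered in BRANDS:
            attrs["brand"] = field
        elif lowered in COLORS:
            attrs["color"] = field
        else:
            memory = re.fullmatch(r"(\d+)GB Memory", field)
            storage = re.fullmatch(r"(\d+)GB Solid State Drive", field)
            if memory:
                attrs["memory_gb"] = int(memory.group(1))
            elif storage:
                attrs["ssd_gb"] = int(storage.group(1))
    return attrs

title = ("Dell - XPS 15.6 4K Ultra HD Touch-Screen Laptop - Intel Core i5 - "
         "8GB Memory - 256GB Solid State Drive - Silver")
attributes = extract_attributes(title)
print(attributes)
```

Fields that match neither a dictionary nor a pattern (here the model name and the processor) are simply skipped. Free text such as the restaurant review has no such field structure, which is where the slide's machine-learned extraction comes in.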

AND THEN, YOU NEED TO UNDERSTAND LANGUAGE
Prescribing sick days due to diagnosis of influenza. [Positive]
Jane complains about flu-like symptoms. [Speculative]
Jane may be experiencing some sort of flu episode. [Possible]
Jane’s RIDT came back negative for influenza. [Negative]
Jane is at high risk for flu if she’s not vaccinated. [Conditional]
Jane’s older brother had the flu last month. [Family history]
Jane had a severe case of flu last year. [Patient history]
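
As a rough illustration of how such assertion labels could be assigned, the following Python sketch matches cue phrases in priority order and falls back to Positive. The cue lists and their ordering are toy assumptions chosen to fit the seven sentences on this slide; they are not the annotators used in the demo, and a production system would also need scope handling.

```python
import re

# Toy cue patterns, checked in priority order; the first matching label wins.
# These are illustrative assumptions, not the talk's actual rules.
CUES = [
    ("Family history", r"\b(brother|sister|mother|father)\b"),
    ("Conditional", r"\b(if|unless)\b"),
    ("Negative", r"\b(negative for|ruled out|no evidence of)\b"),
    ("Patient history", r"\b(had|last year|history of)\b"),
    ("Possible", r"\b(may|might|possibly)\b"),
    ("Speculative", r"\b(complains|flu-like|suspected)\b"),
]

def assertion_status(sentence: str) -> str:
    """Return the first label whose cue pattern occurs in the sentence."""
    lowered = sentence.lower()
    for label, pattern in CUES:
        if re.search(pattern, lowered):
            return label
    return "Positive"  # no cue found: treat the mention as asserted

examples = [
    "Prescribing sick days due to diagnosis of influenza.",
    "Jane complains about flu-like symptoms.",
    "Jane may be experiencing some sort of flu episode.",
    "Jane’s RIDT came back negative for influenza.",
    "Jane is at high risk for flu if she’s not vaccinated.",
    "Jane’s older brother had the flu last month.",
    "Jane had a severe case of flu last year.",
]
labels = [assertion_status(s) for s in examples]
for sentence, label in zip(examples, labels):
    print(f"{label:16} {sentence}")
```

The priority order matters: the family-history sentence also contains "had", so the family cue has to be checked before the patient-history cue.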

LANGUAGE GETS COMPLEX & DOMAIN SPECIFIC
Joe expressed concerns about the risks of bird flu. [Nothing]
Joe shows no signs of stroke, except for numbness. [Double negative]
Nausea, vomiting and ankle swelling negative. [Compound]
(It gets worse: in reality, a lot of text isn’t valid English.)
Patient denies alcohol abuse. [Speculative]
Allergies: Penicillin, Dust, Sneezing. [Compound]

LET’S BUILD THIS!
The input (patient records)
The processing framework
The output
The query engines

SENTENCE DETECTION
SECTION DETECTION
TOKENIZER, LEMMATIZER
STOPWORD REMOVAL
NEGATION DETECTION
CONDITIONAL SCOPE
SPECULATIVE SCOPE
DATE, NUMBER, UNIT, QUANTITY
CONCEPT EXTRACTION
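
The stages above can be pictured as a chain of annotators that each add typed, offset-based annotations over the same text. The Python sketch below imitates that idea for four of the stages. It is not the UIMA API (UIMA is a Java framework), and the `Annotation` class, the stopword list, the unit list and the negation cues are all hypothetical simplifications.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    kind: str   # annotation type, e.g. "Sentence" or "Token"
    begin: int  # start offset into the document text
    end: int    # end offset (exclusive)

# Tiny illustrative stopword list (an assumption, not the demo's list).
STOPWORDS = {"the", "a", "an", "of", "for", "and", "is", "has", "had"}

def detect_sentences(text, annotations):
    # Naive splitter: a sentence runs up to and including '.', '!' or '?'.
    for m in re.finditer(r"[^.!?]+[.!?]", text):
        leading = len(m.group()) - len(m.group().lstrip())
        annotations.append(Annotation("Sentence", m.start() + leading, m.end()))

def tokenize(text, annotations):
    for m in re.finditer(r"\w+", text):
        annotations.append(Annotation("Token", m.start(), m.end()))

def detect_quantities(text, annotations):
    # A number followed by a unit from a very small, made-up unit list.
    for m in re.finditer(r"\d+(?:\.\d+)?\s*(?:mg|ml|days)\b", text):
        annotations.append(Annotation("Quantity", m.start(), m.end()))

def detect_negation_cues(text, annotations):
    # Marks cue words only; resolving the negated scope is out of scope here.
    for m in re.finditer(r"\b(?:no|not|without)\b", text, re.IGNORECASE):
        annotations.append(Annotation("NegationCue", m.start(), m.end()))

PIPELINE = [detect_sentences, tokenize, detect_quantities, detect_negation_cues]

def run_pipeline(text):
    annotations = []
    for annotator in PIPELINE:
        annotator(text, annotations)
    return annotations

def covered(text, annotations, kind):
    """Return the text spans covered by annotations of the given type."""
    return [text[a.begin:a.end] for a in annotations if a.kind == kind]

text = "Jane has had a fever for 3 days. She shows no signs of stroke."
result = run_pipeline(text)
# Stopword removal expressed as a filter over Token annotations.
content_tokens = [t for t in covered(text, result, "Token")
                  if t.lower() not in STOPWORDS]
print(covered(text, result, "Sentence"))
print(content_tokens)
```

Keeping annotations as offsets into the original text, rather than rewriting the text, lets later stages such as negation or conditional scope refer back to the exact spans produced by earlier ones.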

MACHINE LEARNED ANNOTATORS
Grammatical patterns: If … then …
Direct inferences: Age < 18 ==> Child
Lookups: RIDT (lab test)
Under-diagnosed conditions: Flu, Depression
Implied by context: relevant labs normal
Sometimes, it’s easier to just code an annotation’s business logic.
But sometimes it’s easier to learn it from examples.
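
The contrast between coding an annotation and learning it can be sketched in a few lines of Python. The first function hard-codes the slide's direct inference (Age < 18 ==> Child). The second estimates from labeled examples how strongly each finding is associated with a flu diagnosis, using lift, P(flu | finding) / P(flu). The records are invented for illustration, and lift is just one simple association measure swapped in here; the talk does not state which model its learned annotators use.

```python
from collections import Counter
from typing import Optional

def annotate_age_group(age: int) -> Optional[str]:
    # Coded business logic from the slide: Age < 18 ==> Child.
    # Returning None for other ages is an assumption of this sketch.
    return "Child" if age < 18 else None

# Hypothetical labeled examples: (observed findings, diagnosed with flu?)
RECORDS = [
    ({"fever", "fatigue", "headache"}, True),
    ({"fever", "fatigue"}, True),
    ({"fatigue", "runny nose"}, True),
    ({"ankle swelling"}, False),
    ({"headache"}, False),
    ({"fatigue", "ankle swelling"}, False),
]

def learn_lift(records):
    """Lift of each finding for the flu label: P(flu | finding) / P(flu)."""
    base_rate = sum(1 for _, flu in records if flu) / len(records)
    seen, seen_with_flu = Counter(), Counter()
    for findings, flu in records:
        for finding in findings:
            seen[finding] += 1
            if flu:
                seen_with_flu[finding] += 1
    return {f: (seen_with_flu[f] / seen[f]) / base_rate for f in seen}

lift = learn_lift(RECORDS)
# Findings that occur with flu more often than the base rate would predict.
suggestive = sorted(f for f, value in lift.items() if value > 1.0)
print(annotate_age_group(9), suggestive)
```

In this toy data, fatigue ends up suggestive of flu even though it also appears in a non-flu record, which mirrors the "sometimes a flu symptom" case: nobody wrote that rule down, it falls out of the counts.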

Second Demo: Machine Learned Annotator

WHAT ABOUT EXPANDING & UPDATING ONTOLOGIES?
Word2Vec
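
One way word embeddings can feed ontology enrichment is by proposing, for a known concept, the terms whose vectors lie closest to it as candidate related concepts for review. The Python sketch below shows only that nearest-neighbor step. The three-dimensional vectors are made up and stand in for embeddings that a trained Word2Vec model would provide; the vocabulary and the `nearest` helper are likewise hypothetical.

```python
import math

# Made-up 3-dimensional vectors standing in for trained Word2Vec embeddings.
EMBEDDINGS = {
    "flu": (0.9, 0.1, 0.0),
    "influenza": (0.88, 0.12, 0.02),
    "fever": (0.7, 0.3, 0.1),
    "penicillin": (0.0, 0.2, 0.9),
    "amoxicillin": (0.05, 0.25, 0.85),
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def nearest(term, k=2):
    """Rank the other vocabulary terms by cosine similarity to `term`."""
    query = EMBEDDINGS[term]
    scored = [(other, cosine(query, vec))
              for other, vec in EMBEDDINGS.items() if other != term]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

flu_candidates = [term for term, _ in nearest("flu")]
drug_candidates = [term for term, _ in nearest("penicillin", k=1)]
print(flu_candidates, drug_candidates)
```

Similarity alone does not say what kind of relationship holds (synonym, symptom, drug class), so candidates like these would still need to be typed or reviewed before being added to an ontology.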

SUMMARY & APPLICATIONS
Who needs to be vaccinated?
Who fits this clinical trial?
Who is at risk for sepsis?
Who is getting meds they’re allergic to?
Who on this protocol did not have this side effect?

APPENDIX
In case the live demo gets cold feet on stage

9. First Demo: Annotators & Assertions

14. LET’S BUILD THIS TOO!

15. Third Demo: Ontology Enrichment

17. @Atigeo @melcutz @davidtalby

18. © 2015 Atigeo, Corporation. All rights reserved. Atigeo and the xPatterns logo are trademarks of Atigeo. The information herein is for informational purposes only and represents the current view of Atigeo as of the date of this presentation. Because Atigeo must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Atigeo, and Atigeo cannot guarantee the accuracy of any information provided after the date of this presentation. ATIGEO MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
