Armando@dataAI.uk
Jointly embedding text and
knowledge graphs for information
extraction
Armando Vieira
Data Scientist @dataAI and @Stratified Medical
Summary
Why do machines struggle to “understand” text?
The challenges of discovering new knowledge in text
Deep Learning to the rescue
Words as distributed vectors
Combining text with knowledge graphs
Wouldn't it be great if...
we could extract “knowledge” expressed in text
into a machine-readable format?
Or if...
we could turn all biomedical information into
an automated drug discovery process?
NLP: the traditional way
Why is understanding text so hard
for a machine?
The verb nightmare
Nested structures
Syntax is doable; semantics is hard
Other challenges (negations,…)
Long range interactions
Deep learning to the rescue
How distributed representations
solve the curse of dimensionality
problem
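A minimal sketch of the contrast, using invented toy vectors rather than learned ones: a one-hot (local) code needs one dimension per word and makes every pair of distinct words equally dissimilar, whereas a distributed code reuses a few dimensions and can express graded similarity. The words and values below are assumptions for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

vocab = ["cat", "dog", "car"]

# Local (one-hot) code: one dimension per word, so the size grows with the vocabulary.
one_hot = {w: [1.0 if i == j else 0.0 for j in range(len(vocab))]
           for i, w in enumerate(vocab)}

# Distributed code: a few shared dimensions; the values are invented for illustration.
dense = {
    "cat": [0.9, 0.8],
    "dog": [0.8, 0.9],
    "car": [0.9, -0.7],
}

one_hot_sim = cosine(one_hot["cat"], one_hot["dog"])        # distinct one-hot vectors never overlap
dense_sim_related = cosine(dense["cat"], dense["dog"])      # high for related words
dense_sim_unrelated = cosine(dense["cat"], dense["car"])    # lower for unrelated words
```

With one-hot codes the similarity carries no information, so a model has to see every word combination separately; with dense codes, what is learned about "cat" can transfer to "dog".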
Distributed representations are
powerful
The Skip-gram algorithm
Idea: words that occur together are semantically related
Mikolov et al. (2013)
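As a rough illustration of the idea, skip-gram turns running text into (center word, context word) training pairs taken from a sliding window; the model is then trained to predict the context word from the center word. The sketch below covers only the pair generation step. The function name, the example sentence and the window size are assumptions chosen for the example.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

sentence = "microglia secrete cytokines in the brain".split()
pairs = skipgram_pairs(sentence, window=1)
```

A larger window tends to capture more topical relatedness, a smaller one more syntactic similarity; the right value depends on the task.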
But it's not the end of the story
The verb nightmare
Nested relation structures
Syntax is doable; semantics is hard
Other challenges (negations,…)
Long range correlations
Neural Embeddings
Credit: Omer Levy
Mikolov et al. (2013)
What does each similarity term
mean?
Observe the joint features with explicit representations!
uncrowned Elizabeth
majesty Katherine
second impregnate
… …
Words as vector operations
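The familiar "king - man + woman ≈ queen" arithmetic can be sketched with hand-made vectors. The two dimensions (roughly "royalty" and "gender") and all values are invented for illustration; real embeddings are learned and typically have hundreds of dimensions. The helper name `analogy` is hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hand-made 2-D vectors (royalty, gender); values are assumptions, not learned.
vectors = {
    "king":  [0.9, 0.8],
    "queen": [0.9, -0.8],
    "man":   [0.1, 0.8],
    "woman": [0.1, -0.8],
    "apple": [-0.7, 0.0],
}

def analogy(a, b, c, vectors):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding a, b and c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

# "man is to king as woman is to ...?"
result = analogy("man", "king", "woman", vectors)
```

Excluding the three query words from the candidates is a common convention, since the nearest neighbour of the target vector is otherwise often one of the inputs.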
Gensim implementation in Python
How to train the embedding?
Advantages
• Efficient coding of words and relations
• Capture both local and global semantics
• Easy to parallelize
• Completely unsupervised
• Can easily handle ambiguity
Limitations of word embeddings
• They are (bi)linear machines
• Perform poorly on infrequent words
• Cannot incorporate external knowledge
Knowledge graphs
Why is it hard to expand knowledge?
• Sparsely connected
• Highest-degree nodes are sometimes irrelevant
• Some relation types are too vague
• Need to integrate local and global (contextual) information
Combining text and graphs
What’s inside a knowledge graph?
Idea: combine KG and text corpus
The algorithm
Chang Xu et al.
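The slide text does not spell out the objective, so the following is only a sketch of one plausible way to combine the two signals, not the cited authors' formulation. It assumes a skip-gram style text term plus a translation-style penalty (head + relation ≈ tail) for knowledge-graph triples, with the same vector shared by both terms. All names, vectors and the `weight` parameter are hypothetical.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def text_loss(word_vec, context_vec):
    """Skip-gram style loss for one observed (word, context) pair: -log sigmoid(w . c)."""
    return -math.log(1.0 / (1.0 + math.exp(-dot(word_vec, context_vec))))

def kg_penalty(head_vec, rel_vec, tail_vec):
    """Translation-style penalty for one triple: ||head + relation - tail||^2."""
    return sum((h + r - t) ** 2 for h, r, t in zip(head_vec, rel_vec, tail_vec))

def joint_loss(word_vec, context_vec, head_vec, rel_vec, tail_vec, weight=0.5):
    """Weighted sum of the text term and the knowledge-graph term (assumed form)."""
    return text_loss(word_vec, context_vec) + weight * kg_penalty(head_vec, rel_vec, tail_vec)

# Invented toy vectors; "paris" is used both as a word and as a graph entity.
paris = [1.0, 0.0]
france = [1.0, 1.0]
capital_of = [0.0, 1.0]
context = [0.5, 0.5]

consistent = joint_loss(paris, context, paris, capital_of, france)
inconsistent = joint_loss(paris, context, paris, capital_of, [-1.0, -1.0])
```

Because the entity vector appears in both terms, gradient steps on the graph term also move the word vector, which is how external knowledge could reach words that are rare in the corpus.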
Data
Wikipedia 2014
• 3.5 billion word tokens
• Vocabulary size: 2 million
Freebase
• 44 million topics
• 2.4 billion facts
• > 1500 relation types
Results
Corpus of data ??
Beating humans in IQ tests?
Analogy 1: Isotherm is to temperature as isobar is to:
(A) atmosphere, (B) wind, (C) pressure, (D) latitude, (E) current
Analogy 2: Identify two words (one from each set of
brackets) that form a connection (analogy) when paired with
the words in capitals: CHAPTER (book, verse, read), ACT
(stage, audience, play).
Classification: Which is the odd one out?
(i) calm, (ii) quiet, (iii) relaxed, (iv) serene, (v) unruffled.
Synonym: Which word is closest to IRRATIONAL?
(i) intransigent, (ii) irredeemable, (iii) unsafe, (iv) lost, (v) nonsensical.
Antonym: Which word is most opposite to MUSICAL?
(i) discordant, (ii) loud, (iii) lyrical, (iv) verbal, (v) euphonious.
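Questions of the "odd one out" kind can be attacked with embeddings through a simple heuristic: pick the word whose average similarity to the others is lowest. This is a sketch of that heuristic only, not necessarily the method used in the cited work, and the words and 2-D vectors are invented so that no answer to the questions above is implied.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def odd_one_out(words, vectors):
    """Return the word with the lowest mean cosine similarity to the other words."""
    def mean_sim(word):
        others = [w for w in words if w != word]
        return sum(cosine(vectors[word], vectors[w]) for w in others) / len(others)
    return min(words, key=mean_sim)

# Invented 2-D vectors for illustration only.
toy_vectors = {
    "apple": [0.9, 0.1],
    "pear":  [0.8, 0.2],
    "plum":  [0.85, 0.15],
    "truck": [0.1, 0.9],
}

answer = odd_one_out(list(toy_vectors), toy_vectors)
```

Synonym and antonym questions are harder for this kind of scoring, because antonyms often appear in similar contexts and so end up close together in the embedding space.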
On average, yes!
Huang et al., June 2015
Resources
http://technology.stitchfix.com/blog/2015/03/11/word-is-worth-a-thousand-vectors/ (Chris Moody)
https://levyomer.wordpress.com (Omer Levy)
How about biomedical data?
Little data (25 million documents)
Complex interactions between entities
Fat tail
Incorporate constraints from physics, chemistry &
biology
Non-linearities: complex manifold
From here…
Neuroinflammation is the local reaction of the brain to infection, trauma, toxic
molecules or protein aggregates. The brain resident macrophages, microglia, are
able to trigger an appropriate response involving secretion of cytokines and
chemokines, resulting in the activation of astrocytes and recruitment of peripheral
immune cells. IL-1β plays an important role in this response; yet its production and
mode of action in the brain are not fully understood and its precise implication in
neurodegenerative diseases needs further characterization. Our results indicate that
the capacity to form a functional NLRP3 inflammasome and secretion of IL-1β is
limited to the microglial compartment in the mouse brain. We were not able to
observe IL-1β secretion from astrocytes, nor do they express all NLRP3
inflammasome components. Microglia were able to produce IL-1β in response to
different classical inflammasome activators, such as ATP, Nigericin or Alum. Similarly,
microglia secreted IL-18 and IL-1α, two other inflammasome-linked pro-inflammatory
factors. Cell stimulation with α-synuclein, a neurodegenerative disease-related
peptide, did not result in the release of active IL-1β by microglia, despite a weak pro-
inflammatory effect. Amyloid-β peptides were able to activate the NLRP3
inflammasome in microglia and IL-1β secretion occurred in a P2X7 receptor-
independent manner. Thus microglia-dependent inflammasome activation can play
an important role in the brain and especially in neuroinflammatory conditions.
To here
If protein A interacts with gene G in cell type C,
what other proteins related to A may interact with
gene G in cell type C1?
If chemical Q attaches to target T at protein P, what
chemicals may attach to target T1 at protein P1?
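To make the "machine-readable" end point concrete, a few statements from the abstract above can be written as subject-relation-object triples and queried. This is only a hand-built sketch: the relation names are hypothetical, the triples were typed in by hand rather than extracted automatically, and a real system would add entity normalisation and provenance.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "relation", "object"])

# Facts paraphrased from the abstract; relation names are assumptions.
facts = [
    Triple("microglia", "secretes", "IL-1β"),
    Triple("microglia", "secretes", "IL-18"),
    Triple("microglia", "secretes", "IL-1α"),
    Triple("amyloid-β", "activates", "NLRP3 inflammasome"),
    Triple("α-synuclein", "does_not_trigger_release_of", "IL-1β"),
]

def query(facts, subject=None, relation=None, obj=None):
    """Return every triple matching the fields that are not None."""
    return [
        f for f in facts
        if (subject is None or f.subject == subject)
        and (relation is None or f.relation == relation)
        and (obj is None or f.object == obj)
    ]

# What does the abstract say microglia secrete?
secreted = [f.object for f in query(facts, subject="microglia", relation="secretes")]
```

Questions like the ones above then become link-prediction problems: given known triples, rank candidate triples that are absent from the graph but plausible.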
Looking for new knowledge
We are not really trying to understand language.
Rather, we want to
extract and “validate” novel knowledge.

Extracting Knowledge from Pydata London 2015