Semantic Analysis in Language Technology
http://stp.lingfil.uu.se/~santinim/sais/2016/sais_2016.htm

Information Extraction (I)
Named Entity Recognition (NER)

Marina Santini
santinim@stp.lingfil.uu.se
Department of Linguistics and Philology
Uppsala University, Uppsala, Sweden
Spring 2016

Previous Lecture: Distributional Semantics
•  Starting from Shakespeare and IR (term-document matrix)…
•  Moving to context "windows" taken from the Brown corpus…
•  Ending up with PPMI to weigh word distribution…
•  Mentioning the cosine metric to compare vectors…

IR: Term-document matrix
•  Each cell: the count of term t in a document d, N_{t,d} (the term frequency of t in d)
•  Each document is a count vector in ℕ^|V| (|V| = vocabulary size): a column below

            As You Like It  Twelfth Night  Julius Caesar  Henry V
battle                   1              1              8       15
soldier                  2              2             12       36
fool                    37             58              1        5
clown                    6            117              0        0

Document similarity: Term-document matrix
•  Two documents are similar if their vectors (the columns of the matrix) are similar

The words in a term-document matrix
•  Two words are similar if their vectors (the rows of the matrix) are similar

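A minimal sketch of both comparisons, assuming cosine similarity (the metric mentioned in the previous lecture) as the way to compare vectors. It stores the Shakespeare counts in a NumPy array, compares documents through their columns and words through their rows. The variable names and the `cosine` helper are illustrative choices, not part of the original slides.

```python
import numpy as np

# Term-document counts from the table: rows = words, columns = plays
docs = ["As You Like It", "Twelfth Night", "Julius Caesar", "Henry V"]
words = ["battle", "soldier", "fool", "clown"]
counts = np.array([
    [1, 1, 8, 15],    # battle
    [2, 2, 12, 36],   # soldier
    [37, 58, 1, 5],   # fool
    [6, 117, 0, 0],   # clown
], dtype=float)


def cosine(u, v):
    """Dot product of u and v divided by the product of their lengths."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


# Document similarity: compare two columns
doc_sim = cosine(counts[:, docs.index("Julius Caesar")], counts[:, docs.index("Henry V")])
# Word similarity: compare two rows
word_sim = cosine(counts[words.index("battle")], counts[words.index("soldier")])
print(f"Julius Caesar vs Henry V: {doc_sim:.3f}")
print(f"battle vs soldier: {word_sim:.3f}")
```

With these counts, Julius Caesar and Henry V come out much closer to each other than Twelfth Night and Julius Caesar do. Likewise, battle and soldier are much closer than battle and fool.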
Term-context matrix for word similarity
•  Two words are similar in meaning if their context vectors are similar

             aardvark  computer  data  pinch  result  sugar  …
apricot             0         0     0      1       0      1
pineapple           0         0     0      1       0      1
digital             0         2     1      0       1      0
information         0         1     6      0       4      0
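The previous lecture built context vectors from co-occurrence "windows". Below is a minimal sketch of how the cells of a term-context matrix can be counted. It assumes a symmetric window of ±4 tokens and a hypothetical toy sentence; neither is the Brown-corpus setup used in class.

```python
from collections import Counter, defaultdict


def term_context_counts(tokens, window=2):
    """For each target word, count the words within +/- window positions around it."""
    counts = defaultdict(Counter)
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:                      # the target is not its own context
                counts[target][tokens[j]] += 1
    return counts


# Hypothetical toy corpus, already tokenized and lower-cased
tokens = "we add a pinch of sugar to the apricot jam".split()
matrix = term_context_counts(tokens, window=4)
print(dict(matrix["apricot"]))   # the context vector of "apricot" (non-zero cells only)
```

Summing such counts over a large corpus, with a fixed vocabulary of context words as the columns, gives a term-context matrix like the one above.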
	
  
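Computing PPMI on a term-context matrix
•  Matrix F with W rows (words) and C columns (contexts)
•  f_ij is the number of times w_i occurs in context c_j

$$p_{ij} = \frac{f_{ij}}{\sum_{i=1}^{W}\sum_{j=1}^{C} f_{ij}}$$

$$p_{i*} = \frac{\sum_{j=1}^{C} f_{ij}}{\sum_{i=1}^{W}\sum_{j=1}^{C} f_{ij}} \qquad p_{*j} = \frac{\sum_{i=1}^{W} f_{ij}}{\sum_{i=1}^{W}\sum_{j=1}^{C} f_{ij}}$$

$$\mathrm{pmi}_{ij} = \log_2 \frac{p_{ij}}{p_{i*}\,p_{*j}} \qquad \mathrm{ppmi}_{ij} = \begin{cases} \mathrm{pmi}_{ij} & \text{if } \mathrm{pmi}_{ij} > 0 \\ 0 & \text{otherwise} \end{cases}$$

•  Numerator of p_{*j}: the count of all the words that occur in that context
•  Numerator of p_{i*}: the count of all the contexts where the word appears
•  Denominator: the sum of all words in all contexts = all the numbers in the matrix
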
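A minimal NumPy sketch of the PPMI formulas above, applied to the four rows of the term-context table. The elided "…" columns are left out, so the resulting numbers only illustrate the computation.

```python
import numpy as np

# f_ij from the table: W = 4 words (rows) x C = 6 contexts (columns); the "…" columns are left out
words = ["apricot", "pineapple", "digital", "information"]
contexts = ["aardvark", "computer", "data", "pinch", "result", "sugar"]
F = np.array([
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 2, 1, 0, 1, 0],
    [0, 1, 6, 0, 4, 0],
], dtype=float)

total = F.sum()                               # double sum over i and j: every number in the matrix
p_ij = F / total                              # joint probabilities
p_i = F.sum(axis=1, keepdims=True) / total    # p_i*: sum over j (all contexts of word i)
p_j = F.sum(axis=0, keepdims=True) / total    # p_*j: sum over i (all words in context j)

with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log2(p_ij / (p_i * p_j))         # zero cells give -inf; the all-zero "aardvark" column gives nan
    ppmi = np.where(pmi > 0, pmi, 0.0)        # keep positive PMI only; -inf and nan become 0

print(np.round(ppmi, 2))
```

For example, apricot occurs once with pinch out of 19 counts in total, and both the apricot row and the pinch column sum to 2. This gives ppmi = log2((1/19) / ((2/19)(2/19))) = log2(19/4).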
  
Summation: Sigma Notation (i)

$$\sum_{n=1}^{4} n = 1 + 2 + 3 + 4 = 10$$

It means: sum whatever appears after the Sigma; here we sum n.
What are the values of n? They are shown below and above the Sigma:
Below → the index variable and its starting value (e.g. n = 1);
Above → the value at which the sum stops (e.g. 4).
In this case, n goes from 1 to 4, i.e. it takes the values 1, 2, 3 and 4.
(http://www.mathsisfun.com/algebra/sigma-notation.html)

$$p_{ij} = \frac{f_{ij}}{\sum_{i=1}^{W}\sum_{j=1}^{C} f_{ij}}$$

Here the term after the two Sigmas is f_ij, the value being summed, so it must stay: we can't delete f(i,j)!

Summation: Sigma Notation (ii)
•  Additional examples
•  Sums can be nested

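Sigma notation maps directly onto loops. Each Σ is a loop over its index, and nested sums are nested loops. The small illustration below checks the sum of n for n = 1 to 4, then the double sum over i and j of f_ij, as in the PPMI denominators. The 2×3 matrix is hypothetical.

```python
# Sigma with n from 1 to 4: one loop; Python's range excludes its upper bound, hence 5
single = 0
for n in range(1, 5):
    single += n

# Nested sum over a hypothetical W x C matrix f: for each i, the inner loop runs over every j
f = [
    [1, 0, 2],
    [3, 1, 0],
]
W, C = len(f), len(f[0])
nested = 0
for i in range(W):          # i = 1..W in the formula (0-based here)
    for j in range(C):      # j = 1..C in the formula
        nested += f[i][j]

print(single, nested)       # 10 7
# Python's built-in sum gives the same results
assert single == sum(range(1, 5))
assert nested == sum(sum(row) for row in f)
```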
  
Alternative notations… (Levy, 2012)
•  When the range of the sum can be understood from context, it can be left out;
•  or when we want to be vague about the precise range of the sum. For example, suppose that there are n variables, x_1 through x_n.
•  In order to say that the sum of all n variables is equal to 1, we might simply write:

$$\sum_i x_i = 1$$

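Formulas: Sigma Notation

$$p_{ij} = \frac{f_{ij}}{\sum_{i=1}^{W}\sum_{j=1}^{C} f_{ij}} \qquad p_{i*} = \frac{\sum_{j=1}^{C} f_{ij}}{\sum_{i=1}^{W}\sum_{j=1}^{C} f_{ij}} \qquad p_{*j} = \frac{\sum_{i=1}^{W} f_{ij}}{\sum_{i=1}^{W}\sum_{j=1}^{C} f_{ij}}$$

•  Numerator of p_ij: f_ij = a single cell
•  Denominators (the same in all three formulas): sum the cells of all the words and of all the contexts, i.e. every cell of the matrix
•  Numerator of p_i*: sum the cells of all the contexts (all the columns of row i)
•  Numerator of p_*j: sum the cells of all the words (all the rows of column j)
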
Living lexicon: built upon an underlying, continuously updated corpus

Drawbacks: updated but unstable & incomplete: missing words, missing linguistic information, etc.
Multilinguality, function words, etc.
Similarity:
•  Given the underlying statistical model, these words are similar

Fredrik Olsson
Gavagai blog
•  Further reading (Magnus Sahlgren): https://www.gavagai.se/blog/2015/09/30/a-brief-history-of-word-embeddings/

End of previous lecture

Acknowledgements
Most slides borrowed or adapted from:
Dan Jurafsky and Christopher Manning, Coursera
Dan Jurafsky and James H. Martin
J&M (2015, draft): https://web.stanford.edu/~jurafsky/slp3/

Preliminary: What's Information Extraction (IE)?
•  IE ≈ text analytics ≈ text mining ≈ e-discovery, etc. (the terms are often used as near-synonyms)
•  The ultimate goal is to convert unstructured text into structured information (so that information of interest can easily be picked out).
•  Unstructured data/text: emails, PDF files, social media posts, tweets, text messages, blogs, basically any running text...
•  Structured data/text: databases and structured formats (SQL, XML, etc.), ontologies, dictionaries, etc.

Information Extraction and Named Entity Recognition
Introducing the tasks: Getting simple structured information out of text

Information Extraction
•  Information extraction (IE) systems
   •  Find and understand limited relevant parts of texts
   •  Gather information from many pieces of text
   •  Produce a structured representation of relevant information:
      •  relations (in the database sense), a.k.a.,
      •  a knowledge base
•  Goals:
   1.  Organize information so that it is useful to people
   2.  Put information in a semantically precise form that allows further inferences to be made by computer algorithms

Information Extraction: factual info
•  IE systems extract clear, factual information
   •  Roughly: Who did what to whom when?
•  E.g.,
   •  Gathering earnings, profits, board members, headquarters, etc. from company reports
      •  The headquarters of BHP Billiton Limited, and the global headquarters of the combined BHP Billiton Group, are located in Melbourne, Australia.
      •  headquarters("BHP Billiton Limited", "Melbourne, Australia")
   •  Learning drug-gene product interactions from medical research literature

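To make "relations in the database sense" concrete, the minimal sketch below stores the extracted headquarters fact as a row in an in-memory SQLite table and queries it back. The table and column names are hypothetical.

```python
import sqlite3

# An in-memory database standing in for a small knowledge base
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE headquarters (organization TEXT, location TEXT)")

# The extracted relation: headquarters("BHP Billiton Limited", "Melbourne, Australia")
conn.execute(
    "INSERT INTO headquarters VALUES (?, ?)",
    ("BHP Billiton Limited", "Melbourne, Australia"),
)

# Once the fact is structured, it can be queried like any other database record
row = conn.execute(
    "SELECT location FROM headquarters WHERE organization = ?",
    ("BHP Billiton Limited",),
).fetchone()
print(row[0])
conn.close()
```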
  
Low-level information extraction
•  Is now available (and, I think, popular) in applications like Apple or Google mail, and web indexing
•  Often seems to be based on regular expressions and name lists

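Below is a minimal sketch of the regular-expressions-plus-name-lists style described above. The date pattern and the tiny organization list are hypothetical stand-ins, much simpler than whatever real mail clients use.

```python
import re

# Hypothetical name list (gazetteer) of organization names
ORG_NAMES = ["IBM", "Big Blue", "Reuters"]
ORG_RE = re.compile(r"\b(?:" + "|".join(re.escape(name) for name in ORG_NAMES) + r")\b")

# Hypothetical regular expression for dates such as "30 September 2015"
MONTHS = "January|February|March|April|May|June|July|August|September|October|November|December"
DATE_RE = re.compile(rf"\b\d{{1,2}} (?:{MONTHS}) \d{{4}}\b")


def extract(text):
    """Return (matched string, label) pairs found by the date pattern and the name list."""
    found = [(m.group(), "DATE") for m in DATE_RE.finditer(text)]
    found += [(m.group(), "ORG") for m in ORG_RE.finditer(text)]
    return found


result = extract("Big Blue told Reuters on 30 September 2015 that IBM would report earnings.")
print(result)
```

A name list only covers the names it contains: an organization missing from the list is missed, and mentions such as "it" or "the company" are never matched.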
  
Low-level information extraction
•  A very important sub-task: find and classify names in text.
•  An entity is a discrete thing like "IBM Corporation"
•  "Named" means called "IBM" or "Big Blue", not "it" or "the company"
•  In practice, often extended to things like times, dates, instances of products and chemical/biological substances such as proteins, which aren't really entities but are easy-to-recognize semantic classes

Named Entity Recognition (NER)

•  A very important sub-task: find and classify names in text, for example:
   •  The decision by the independent MP Andrew Wilkie to withdraw his support for the minority Labor government sounded dramatic but it should not further threaten its stability. When, after the 2010 election, Wilkie, Rob Oakeshott, Tony Windsor and the Greens agreed to support Labor, they gave just two guarantees: confidence and supply.

You have a text, and you want to:
1.  find the things that are names: European Commission, John Lloyd Jones, etc.
2.  give them labels: ORG, PERS, etc.

  
Entity classes highlighted in the example above: Person, Date, Location, Organization

•  The uses:
   •  Named entities can be indexed, linked off, etc.
   •  Sentiment can be attributed to companies or products
   •  A lot of IE relations are associations between named entities
   •  For question answering, answers are often named entities.
•  Concretely:
   •  Many web pages tag various entities, with links to bio or topic pages, etc.
   •  Reuters' OpenCalais, Evri, AlchemyAPI, Yahoo's Term Extraction, …
   •  Apple/Google/Microsoft/… smart recognizers for document content

Summary: Getting simple structured information out of text

Evaluation of Named Entity Recognition
The extension of Precision, Recall, and the F measure to sequences

The Named Entity Recognition Task
Task: Predict entities in a text

Foreign      ORG
Ministry     ORG
spokesman    O
Shen         PER
Guofang      PER
told         O
Reuters      ORG
:            :

Standard evaluation is per entity, not per token, with precision (P) and recall (R)

P = TP / (TP + FP);   R = TP / (TP + FN)
FP = false alarm (it is not a NE, but it has been classified as a NE)
FN = it truly is a NE, but the system failed to recognise it

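Because evaluation is per entity, precision and recall compare sets of (start, end, type) spans rather than single tokens. The minimal sketch below assumes exact-match scoring and takes F1 as the harmonic mean 2PR/(P+R). It returns 0.0 when a ratio is undefined, which is a convention chosen here. The gold spans follow the example above; the system output is a hypothetical prediction that mislabels Reuters.

```python
def prf1(gold, predicted):
    """Entity-level precision, recall and F1; an entity counts only if span and type match exactly."""
    tp = len(gold & predicted)        # correctly recognized entities
    fp = len(predicted - gold)        # false alarms
    fn = len(gold - predicted)        # missed entities
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# Tokens: Foreign(0) Ministry(1) spokesman(2) Shen(3) Guofang(4) told(5) Reuters(6)
# Spans are (start token, end token exclusive, type)
gold = {(0, 2, "ORG"), (3, 5, "PER"), (6, 7, "ORG")}
predicted = {(0, 2, "ORG"), (3, 5, "PER"), (6, 7, "PER")}   # hypothetical system output
print(prf1(gold, predicted))
```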
  
Precision/Recall/F1 for IE/NER
•  Recall and precision are straightforward for tasks like IR and text categorization, where there is only one grain size (documents)
•  The measure behaves a bit funnily for IE/NER when there are boundary errors (which are common):
   •  First Bank of Chicago announced earnings …
   •  If, e.g., the system marks only "Bank of Chicago", this counts as both a fp (the wrong span) and a fn (the missed "First Bank of Chicago")
   •  Selecting nothing would have been better (only one fn)
•  Some other metrics (e.g., the MUC scorer) give partial credit (according to complex rules)

Summary: Be careful when interpreting the P/R/F1 measures
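The boundary-error effect can be checked with simple counts. The minimal sketch below assumes exact-match spans and a system that tagged only "Bank of Chicago" (tokens 1 to 3) instead of the gold "First Bank of Chicago" (tokens 0 to 3).

```python
def error_counts(gold, predicted):
    """Return (TP, FP, FN) for exact-match entity spans."""
    return len(gold & predicted), len(predicted - gold), len(gold - predicted)


# Tokens: First(0) Bank(1) of(2) Chicago(3) announced(4) earnings(5)
gold = {(0, 4, "ORG")}                      # "First Bank of Chicago"
boundary_error = {(1, 4, "ORG")}            # "Bank of Chicago": wrong left boundary
nothing = set()                             # the system selects nothing

print(error_counts(gold, boundary_error))   # (0, 1, 1): one fp and one fn
print(error_counts(gold, nothing))          # (0, 0, 1): only one fn
```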
  
Sequence Models for Named Entity Recognition

The ML sequence model approach to NER
Training
1.  Collect a set of representative training documents
2.  Label each token for its entity class or other (O)
3.  Design feature extractors appropriate to the text and classes
4.  Train a sequence classifier to predict the labels from the data

Testing
1.  Receive a set of testing documents
2.  Run sequence model inference to label each token
3.  Appropriately output the recognized entities

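The training and testing steps can be sketched end to end. To keep the sketch short, step 4 uses a deliberately simple stand-in: a "classifier" that memorizes the most frequent label of each word, so the only feature is the current word. This is not a sequence model, since it ignores neighboring words and labels. It only shows where training data, inference and entity output fit. All names and data are hypothetical.

```python
from collections import Counter, defaultdict


def train(labeled_sentences):
    """Training (step 4 stand-in): remember the most frequent label of each word."""
    label_counts = defaultdict(Counter)
    for sentence in labeled_sentences:
        for token, label in sentence:
            label_counts[token][label] += 1
    return {token: c.most_common(1)[0][0] for token, c in label_counts.items()}


def label_tokens(model, tokens):
    """Testing step 2: label each token; words never seen in training default to O."""
    return [model.get(token, "O") for token in tokens]


def output_entities(tokens, labels):
    """Testing step 3: join runs of tokens with the same non-O label into entities."""
    entities, current, current_label = [], [], None
    for token, label in zip(tokens, labels):
        if label != "O" and label == current_label:
            current.append(token)                      # extend the open entity
            continue
        if current:                                    # close the open entity
            entities.append((" ".join(current), current_label))
        current, current_label = ([token], label) if label != "O" else ([], None)
    if current:
        entities.append((" ".join(current), current_label))
    return entities


# Hypothetical annotated training documents (steps 1-2): one label per token
training = [
    [("Shen", "PER"), ("Guofang", "PER"), ("told", "O"), ("Reuters", "ORG")],
    [("Reuters", "ORG"), ("reported", "O")],
]
model = train(training)
test_tokens = ["Shen", "Guofang", "spoke", "to", "Reuters"]
print(output_entities(test_tokens, label_tokens(model, test_tokens)))
```

Because this output step merges any run of identical labels, it cannot separate two adjacent names of the same class. That is the limitation the IO vs. IOB encoding addresses.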
  
NER pipeline
Representative documents → Human annotation → Annotated documents → Feature extraction → Training data → Sequence classifiers → NER system

Encoding classes for sequence labeling

             IO encoding   IOB encoding
Fred         PER           B-PER
showed       O             O
Sue          PER           B-PER
Mengqiu      PER           B-PER
Huang        PER           I-PER
's           O             O
new          O             O
painting     O             O

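Decoding both encodings back into entities shows why IOB is needed. With IO tags, the adjacent names Sue and Mengqiu Huang merge into one PER entity, while the B- prefix keeps them apart. The minimal sketch below uses illustrative decoding functions; an I- tag that does not continue an open entity of the same class is simply treated as starting a new one.

```python
def decode_io(tokens, tags):
    """IO: an entity is a maximal run of tokens carrying the same non-O class."""
    entities, start = [], None
    for i, tag in enumerate(tags + ["O"]):   # the sentinel "O" closes a final entity
        if start is not None and tag != tags[start]:
            entities.append((" ".join(tokens[start:i]), tags[start]))
            start = None
        if start is None and tag != "O":
            start = i
    return entities


def decode_iob(tokens, tags):
    """IOB: B-X opens a new entity of class X; I-X continues the open one."""
    entities, current, label = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and current and tag[2:] == label:
            current.append(token)            # continue the open entity
            continue
        if current:                          # any other tag closes the open entity
            entities.append((" ".join(current), label))
            current, label = [], None
        if tag != "O":                       # B-X (or a stray I-X) opens a new entity
            current, label = [token], tag[2:]
    if current:
        entities.append((" ".join(current), label))
    return entities


tokens = ["Fred", "showed", "Sue", "Mengqiu", "Huang", "'s", "new", "painting"]
io_tags = ["PER", "O", "PER", "PER", "PER", "O", "O", "O"]
iob_tags = ["B-PER", "O", "B-PER", "B-PER", "I-PER", "O", "O", "O"]
print(decode_io(tokens, io_tags))    # Sue and Mengqiu Huang are merged
print(decode_iob(tokens, iob_tags))  # Sue and Mengqiu Huang stay separate
```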
  
Features for sequence labeling
•  Words
   •  Current word (essentially like a learned dictionary)
   •  Previous/next word (context)
•  Other kinds of inferred linguistic classification
   •  Part-of-speech tags
•  Label context
   •  Previous (and perhaps next) label

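The minimal sketch below shows a feature extractor covering the three groups above. It returns a dictionary mapping feature names to values, a common input format for MaxEnt/CRF toolkits; the feature names themselves are hypothetical. The part-of-speech tags are assumed to come from a separate tagger. The previous label is whatever the model predicted for the preceding token.

```python
def token_features(tokens, pos_tags, i, prev_label):
    """Features for token i: word identity, neighboring words, POS tag and label context."""
    return {
        "word": tokens[i],                                            # current word
        "prev_word": tokens[i - 1] if i > 0 else "<s>",               # left context
        "next_word": tokens[i + 1] if i + 1 < len(tokens) else "</s>",  # right context
        "pos": pos_tags[i],                                           # inferred linguistic class
        "prev_label": prev_label,                                     # label context
    }


tokens = ["Shen", "Guofang", "told", "Reuters"]
pos_tags = ["NNP", "NNP", "VBD", "NNP"]   # hypothetical tagger output
print(token_features(tokens, pos_tags, 1, prev_label="PER"))
```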
  
Features: Word substrings
[Figure: counts of the substrings "oxa", ":" and "field" in names of five classes (drug, company, movie, place, person), with the example names Cotrimoxazole, Wethersfield and Alien Fury: Countdown to Invasion]
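Character substrings can be turned into features by listing the character n-grams of a name. The minimal sketch below assumes n-grams of length 3 to 5, with ^ and $ marking the start and end of the word; both choices are arbitrary. It shows that "Cotrimoxazole" yields the substring "oxa" and "Wethersfield" yields "field".

```python
def char_ngrams(word, n_min=3, n_max=5):
    """All character n-grams of the lower-cased word, padded with ^ and $ to mark its boundaries."""
    padded = f"^{word.lower()}$"
    return {
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    }


print(sorted(g for g in char_ngrams("Wethersfield") if len(g) == 5))
print("oxa" in char_ngrams("Cotrimoxazole"))
```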
Features: Word shapes
•  Word Shapes
   •  Map words to a simplified representation that encodes attributes such as length, capitalization, numerals, Greek letters, internal punctuation, etc.

   Varicella-zoster   Xx-xxx
   mRNA               xXXX
   CPA1               XXXd
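One way to compute word shapes, sketched under a hypothetical rule chosen because it reproduces the three examples above. Uppercase letters map to X, lowercase to x, digits to d, and other characters stay as they are. Words of up to four characters keep their full shape. Longer words keep the shapes of their first two and last two characters and collapse the middle into the sorted set of shape classes it contains. Greek letters do not get a class of their own here: they are mapped to X/x like Latin letters.

```python
def word_shape(word):
    """Map a word to a simplified shape string (hypothetical rule, see above)."""
    def char_shape(c):
        if c.isupper():
            return "X"
        if c.islower():
            return "x"
        if c.isdigit():
            return "d"
        return c                      # punctuation such as '-' is kept as-is

    if len(word) <= 4:
        return "".join(char_shape(c) for c in word)
    middle = sorted({char_shape(c) for c in word[2:-2]})
    return (
        "".join(char_shape(c) for c in word[:2])
        + "".join(middle)
        + "".join(char_shape(c) for c in word[-2:])
    )


for w in ["Varicella-zoster", "mRNA", "CPA1"]:
    print(w, word_shape(w))
```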
Sequence models
•  Once you have designed the features, apply a sequence classifier (cf. PoS tagging), such as:
   •  Maximum Entropy Markov Models
   •  Conditional Random Fields
   •  etc.

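Both MEMMs and linear-chain CRFs label a whole sentence at once by searching for the best-scoring tag sequence, typically with the Viterbi algorithm. The minimal sketch below covers only that decoding step. It assumes the model has already produced a score for each label at each token and a score for each label-to-label transition, both as additive (log-space) scores. The numbers are invented for illustration and do not come from a trained model.

```python
def viterbi(emission, transition, labels):
    """Return the highest-scoring label sequence.

    emission[t][y]   : score of label y at position t
    transition[a][b] : score of moving from label a to label b
    """
    best = [{y: emission[0][y] for y in labels}]   # best score of a path ending in y at t = 0
    back = [{}]
    for t in range(1, len(emission)):
        best.append({})
        back.append({})
        for y in labels:
            prev, score = max(
                ((p, best[t - 1][p] + transition[p][y]) for p in labels),
                key=lambda pair: pair[1],
            )
            best[t][y] = score + emission[t][y]
            back[t][y] = prev
    # Follow the back-pointers from the best final label
    path = [max(labels, key=lambda y: best[-1][y])]
    for t in range(len(emission) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


labels = ["B-PER", "I-PER", "O"]
# Invented scores for the tokens "Shen Guofang told"
emission = [
    {"B-PER": 2.0, "I-PER": 0.5, "O": 0.0},
    {"B-PER": 1.0, "I-PER": 1.2, "O": 0.2},
    {"B-PER": -1.0, "I-PER": -1.0, "O": 2.0},
]
transition = {
    "B-PER": {"B-PER": -1.0, "I-PER": 1.0, "O": 0.0},
    "I-PER": {"B-PER": -1.0, "I-PER": 0.5, "O": 0.0},
    "O": {"B-PER": 0.0, "I-PER": -10.0, "O": 0.5},   # I-PER right after O is strongly penalized
}
print(viterbi(emission, transition, labels))
```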
  
The end

Frontiers of Computational Journalism week 2 - Text Analysis
 
MACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSISMACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSIS
 
Data Science Using Python.pptx
Data Science Using Python.pptxData Science Using Python.pptx
Data Science Using Python.pptx
 

Mais de Marina Santini

Can We Quantify Domainhood? Exploring Measures to Assess Domain-Specificity i...
Can We Quantify Domainhood? Exploring Measures to Assess Domain-Specificity i...Can We Quantify Domainhood? Exploring Measures to Assess Domain-Specificity i...
Can We Quantify Domainhood? Exploring Measures to Assess Domain-Specificity i...Marina Santini
 
Towards a Quality Assessment of Web Corpora for Language Technology Applications
Towards a Quality Assessment of Web Corpora for Language Technology ApplicationsTowards a Quality Assessment of Web Corpora for Language Technology Applications
Towards a Quality Assessment of Web Corpora for Language Technology ApplicationsMarina Santini
 
A Web Corpus for eCare: Collection, Lay Annotation and Learning -First Results-
A Web Corpus for eCare: Collection, Lay Annotation and Learning -First Results-A Web Corpus for eCare: Collection, Lay Annotation and Learning -First Results-
A Web Corpus for eCare: Collection, Lay Annotation and Learning -First Results-Marina Santini
 
An Exploratory Study on Genre Classification using Readability Features
An Exploratory Study on Genre Classification using Readability FeaturesAn Exploratory Study on Genre Classification using Readability Features
An Exploratory Study on Genre Classification using Readability FeaturesMarina Santini
 
Lecture: Word Sense Disambiguation
Lecture: Word Sense DisambiguationLecture: Word Sense Disambiguation
Lecture: Word Sense DisambiguationMarina Santini
 
Semantic Role Labeling
Semantic Role LabelingSemantic Role Labeling
Semantic Role LabelingMarina Santini
 
Semantics and Computational Semantics
Semantics and Computational SemanticsSemantics and Computational Semantics
Semantics and Computational SemanticsMarina Santini
 
Lecture 9: Machine Learning in Practice (2)
Lecture 9: Machine Learning in Practice (2)Lecture 9: Machine Learning in Practice (2)
Lecture 9: Machine Learning in Practice (2)Marina Santini
 
Lecture 8: Machine Learning in Practice (1)
Lecture 8: Machine Learning in Practice (1) Lecture 8: Machine Learning in Practice (1)
Lecture 8: Machine Learning in Practice (1) Marina Santini
 
Lecture 5: Interval Estimation
Lecture 5: Interval Estimation Lecture 5: Interval Estimation
Lecture 5: Interval Estimation Marina Santini
 
Lecture 3b: Decision Trees (1 part)
Lecture 3b: Decision Trees (1 part)Lecture 3b: Decision Trees (1 part)
Lecture 3b: Decision Trees (1 part) Marina Santini
 
Lecture 3: Basic Concepts of Machine Learning - Induction & Evaluation
Lecture 3: Basic Concepts of Machine Learning - Induction & EvaluationLecture 3: Basic Concepts of Machine Learning - Induction & Evaluation
Lecture 3: Basic Concepts of Machine Learning - Induction & EvaluationMarina Santini
 
Lecture 2: Preliminaries (Understanding and Preprocessing data)
Lecture 2: Preliminaries (Understanding and Preprocessing data)Lecture 2: Preliminaries (Understanding and Preprocessing data)
Lecture 2: Preliminaries (Understanding and Preprocessing data)Marina Santini
 
Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?Marina Santini
 
Lecture 1: Introduction to the Course (Practical Information)
Lecture 1: Introduction to the Course (Practical Information)Lecture 1: Introduction to the Course (Practical Information)
Lecture 1: Introduction to the Course (Practical Information)Marina Santini
 
Lecture: Joint, Conditional and Marginal Probabilities
Lecture: Joint, Conditional and Marginal Probabilities Lecture: Joint, Conditional and Marginal Probabilities
Lecture: Joint, Conditional and Marginal Probabilities Marina Santini
 
Mathematics for Language Technology: Introduction to Probability Theory
Mathematics for Language Technology: Introduction to Probability TheoryMathematics for Language Technology: Introduction to Probability Theory
Mathematics for Language Technology: Introduction to Probability TheoryMarina Santini
 

Mais de Marina Santini (20)

Can We Quantify Domainhood? Exploring Measures to Assess Domain-Specificity i...
Can We Quantify Domainhood? Exploring Measures to Assess Domain-Specificity i...Can We Quantify Domainhood? Exploring Measures to Assess Domain-Specificity i...
Can We Quantify Domainhood? Exploring Measures to Assess Domain-Specificity i...
 
Towards a Quality Assessment of Web Corpora for Language Technology Applications
Towards a Quality Assessment of Web Corpora for Language Technology ApplicationsTowards a Quality Assessment of Web Corpora for Language Technology Applications
Towards a Quality Assessment of Web Corpora for Language Technology Applications
 
A Web Corpus for eCare: Collection, Lay Annotation and Learning -First Results-
A Web Corpus for eCare: Collection, Lay Annotation and Learning -First Results-A Web Corpus for eCare: Collection, Lay Annotation and Learning -First Results-
A Web Corpus for eCare: Collection, Lay Annotation and Learning -First Results-
 
An Exploratory Study on Genre Classification using Readability Features
An Exploratory Study on Genre Classification using Readability FeaturesAn Exploratory Study on Genre Classification using Readability Features
An Exploratory Study on Genre Classification using Readability Features
 
Relation Extraction
Relation ExtractionRelation Extraction
Relation Extraction
 
Lecture: Word Sense Disambiguation
Lecture: Word Sense DisambiguationLecture: Word Sense Disambiguation
Lecture: Word Sense Disambiguation
 
Lecture: Word Senses
Lecture: Word SensesLecture: Word Senses
Lecture: Word Senses
 
Sentiment Analysis
Sentiment AnalysisSentiment Analysis
Sentiment Analysis
 
Semantic Role Labeling
Semantic Role LabelingSemantic Role Labeling
Semantic Role Labeling
 
Semantics and Computational Semantics
Semantics and Computational SemanticsSemantics and Computational Semantics
Semantics and Computational Semantics
 
Lecture 9: Machine Learning in Practice (2)
Lecture 9: Machine Learning in Practice (2)Lecture 9: Machine Learning in Practice (2)
Lecture 9: Machine Learning in Practice (2)
 
Lecture 8: Machine Learning in Practice (1)
Lecture 8: Machine Learning in Practice (1) Lecture 8: Machine Learning in Practice (1)
Lecture 8: Machine Learning in Practice (1)
 
Lecture 5: Interval Estimation
Lecture 5: Interval Estimation Lecture 5: Interval Estimation
Lecture 5: Interval Estimation
 
Lecture 3b: Decision Trees (1 part)
Lecture 3b: Decision Trees (1 part)Lecture 3b: Decision Trees (1 part)
Lecture 3b: Decision Trees (1 part)
 
Lecture 3: Basic Concepts of Machine Learning - Induction & Evaluation
Lecture 3: Basic Concepts of Machine Learning - Induction & EvaluationLecture 3: Basic Concepts of Machine Learning - Induction & Evaluation
Lecture 3: Basic Concepts of Machine Learning - Induction & Evaluation
 
Lecture 2: Preliminaries (Understanding and Preprocessing data)
Lecture 2: Preliminaries (Understanding and Preprocessing data)Lecture 2: Preliminaries (Understanding and Preprocessing data)
Lecture 2: Preliminaries (Understanding and Preprocessing data)
 
Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?
 
Lecture 1: Introduction to the Course (Practical Information)
Lecture 1: Introduction to the Course (Practical Information)Lecture 1: Introduction to the Course (Practical Information)
Lecture 1: Introduction to the Course (Practical Information)
 
Lecture: Joint, Conditional and Marginal Probabilities
Lecture: Joint, Conditional and Marginal Probabilities Lecture: Joint, Conditional and Marginal Probabilities
Lecture: Joint, Conditional and Marginal Probabilities
 
Mathematics for Language Technology: Introduction to Probability Theory
Mathematics for Language Technology: Introduction to Probability TheoryMathematics for Language Technology: Introduction to Probability Theory
Mathematics for Language Technology: Introduction to Probability Theory
 

Último

ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)cama23
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomnelietumpap1
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxAshokKarra1
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptxiammrhaywood
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4MiaBumagat1
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 

Último (20)

ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptx
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 

IE: Named Entity Recognition (NER)

  • 1. Semantic Analysis in Language Technology
    http://stp.lingfil.uu.se/~santinim/sais/2016/sais_2016.htm
    Information Extraction (I): Named Entity Recognition (NER)
    Marina Santini, santinim@stp.lingfil.uu.se
    Department of Linguistics and Philology, Uppsala University, Uppsala, Sweden
    Spring 2016
  • 2. Previous Lecture: Distributional Semantics
    • Starting from Shakespeare and IR (term-document matrix)…
    • Moving to context "windows" taken from the Brown corpus…
    • Ending up with PPMI to weight word distributions…
    • Mentioning the cosine metric to compare vectors…
  • 3. IR: Term-document matrix
    • Each cell: the count of term t in document d, N_{t,d} (the term frequency of t in d)
    • Each document is a count vector in ℕ^|V| (|V| = vocabulary size): a column below

                 As You Like It   Twelfth Night   Julius Caesar   Henry V
      battle            1                1               8            15
      soldier           2                2              12            36
      fool             37               58               1             5
      clown             6              117               0             0
  • 4. Document similarity: Term-document matrix
    • Two documents are similar if their vectors are similar
    (Same term-document matrix as slide 3, read by columns: each document is a column vector.)
  • 5. The words in a term-document matrix
    • Two words are similar if their vectors are similar
    (Same term-document matrix as slide 3, read by rows: each word is a row vector.)
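A minimal sketch of the two readings of the term-document matrix, using the Shakespeare counts from slides 3 to 5 and cosine similarity (the metric mentioned in the previous-lecture recap). The helper names `cosine`, `document_vector` and `word_vector` are illustrative, not from the slides.

```python
import math

# Term-document counts N_{t,d}: rows = terms, columns = plays.
PLAYS = ["As You Like It", "Twelfth Night", "Julius Caesar", "Henry V"]
TERMS = ["battle", "soldier", "fool", "clown"]
COUNTS = [
    [1, 1, 8, 15],     # battle
    [2, 2, 12, 36],    # soldier
    [37, 58, 1, 5],    # fool
    [6, 117, 0, 0],    # clown
]


def cosine(u, v):
    """Cosine of the angle between two count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms


def document_vector(play):
    """A document is a column of the matrix."""
    j = PLAYS.index(play)
    return [row[j] for row in COUNTS]


def word_vector(term):
    """A word is a row of the matrix."""
    return COUNTS[TERMS.index(term)]


print(round(cosine(document_vector("Julius Caesar"), document_vector("Henry V")), 3))
print(round(cosine(document_vector("As You Like It"), document_vector("Julius Caesar")), 3))
print(round(cosine(word_vector("battle"), word_vector("soldier")), 3))
print(round(cosine(word_vector("battle"), word_vector("fool")), 3))
```

With these counts the two history plays come out at roughly 0.98, while As You Like It against Julius Caesar is roughly 0.13; likewise battle/soldier is far more similar than battle/fool.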
  • 6. Term-context matrix for word similarity
    • Two words are similar in meaning if their context vectors are similar

                    aardvark   computer   data   pinch   result   sugar   …
      apricot           0          0        0      1        0       1
      pineapple         0          0        0      1        0       1
      digital           0          2        1      0        1       0
      information       0          1        6      0        4       0
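  • 7. Computing PPMI on a term-context matrix
    • Matrix F with W rows (words) and C columns (contexts)
    • f_{ij} is the number of times w_i occurs in context c_j

      $$p_{ij} = \frac{f_{ij}}{\sum_{i=1}^{W}\sum_{j=1}^{C} f_{ij}}$$
      $$p_{i*} = \frac{\sum_{j=1}^{C} f_{ij}}{\sum_{i=1}^{W}\sum_{j=1}^{C} f_{ij}} \qquad p_{*j} = \frac{\sum_{i=1}^{W} f_{ij}}{\sum_{i=1}^{W}\sum_{j=1}^{C} f_{ij}}$$
      $$\mathrm{pmi}_{ij} = \log_2 \frac{p_{ij}}{p_{i*}\, p_{*j}}$$
      $$\mathrm{ppmi}_{ij} = \begin{cases} \mathrm{pmi}_{ij} & \text{if } \mathrm{pmi}_{ij} > 0 \\ 0 & \text{otherwise} \end{cases}$$

    • Numerator of p_{i*}: the count of all the contexts where the word appears
    • Numerator of p_{*j}: the count of all the words that occur in that context
    • Shared denominator: the sum of all words in all contexts = all the numbers in the matrix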
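The formulas above can be sketched in a few lines of plain Python, applied to the apricot/pineapple/digital/information counts from slide 6 (the "…" column is left out, which is an assumption). Cells with a zero count are given PPMI 0 directly, since log2(0) is undefined.

```python
import math

# Term-context counts (rows = words w_i, columns = contexts c_j); "…" column omitted.
CONTEXTS = ["aardvark", "computer", "data", "pinch", "result", "sugar"]
COUNTS = {
    "apricot":     [0, 0, 0, 1, 0, 1],
    "pineapple":   [0, 0, 0, 1, 0, 1],
    "digital":     [0, 2, 1, 0, 1, 0],
    "information": [0, 1, 6, 0, 4, 0],
}


def ppmi(counts):
    """Return {word: [ppmi_ij for each context j]} following the slide's formulas."""
    words = list(counts)
    n_contexts = len(counts[words[0]])
    total = sum(sum(row) for row in counts.values())        # double sum over i and j
    row_totals = {w: sum(counts[w]) for w in words}          # numerators of p_i*
    col_totals = [sum(counts[w][j] for w in words)           # numerators of p_*j
                  for j in range(n_contexts)]
    result = {}
    for w in words:
        row = []
        for j, f in enumerate(counts[w]):
            if f == 0:
                row.append(0.0)          # log2(0) is undefined: treat as PPMI 0
                continue
            p_ij = f / total
            p_i = row_totals[w] / total
            p_j = col_totals[j] / total
            pmi = math.log2(p_ij / (p_i * p_j))
            row.append(max(pmi, 0.0))    # keep only positive PMI
        result[w] = row
    return result


scores = ppmi(COUNTS)
for word, row in scores.items():
    print(word, [round(x, 2) for x in row])
```

For example, the matrix sums to 19, so PPMI(information, data) = log2((6/19) / ((11/19)·(7/19))) = log2(114/77) ≈ 0.57, while PPMI(information, computer) = log2(19/33) < 0 is clipped to 0.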
  • 8. Summation: Sigma Notation (i)
    • $\sum_{n=1}^{4} n$ means: sum whatever appears after the Sigma, so here we sum n.
    • Which values does n take? They are shown below and above the Sigma: below is the index variable and its starting value (e.g. start from 1); above is the end of the range (e.g. up to 4). Here n goes from 1 to 4, i.e. 1, 2, 3 and 4 (http://www.mathsisfun.com/algebra/sigma-notation.html).
    • In $p_{ij} = f_{ij} / \sum_{i=1}^{W}\sum_{j=1}^{C} f_{ij}$, f_{ij} cannot be cancelled out: the numerator is a single cell, while the denominator adds up every cell of the matrix.
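Expanding the Sigma example term by term (a worked illustration added here, not taken from the slide):

```latex
\sum_{n=1}^{4} n = 1 + 2 + 3 + 4 = 10
```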
  • 9. Summation: Sigma Notation (ii)
    • Additional examples
    • Sums can be nested
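As a worked nested sum, here is the PPMI denominator evaluated on the term-context matrix of slide 6, assuming W = 4 words and C = 6 contexts (the "…" column ignored). The inner sum is computed first and gives each row total:

```latex
\sum_{i=1}^{W}\sum_{j=1}^{C} f_{ij}
  = \sum_{i=1}^{W}\Big(\sum_{j=1}^{C} f_{ij}\Big)
  = \underbrace{2}_{\text{apricot}} + \underbrace{2}_{\text{pineapple}}
  + \underbrace{4}_{\text{digital}} + \underbrace{11}_{\text{information}}
  = 19
```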
  • 10. Alternative notations… (Levy, 2012)
    • When the range of the sum can be understood from context, it can be left out,
    • or when we want to be vague about the precise range of the sum. For example, suppose that there are n variables, x_1 through x_n.
    • To say that the sum of all n variables is equal to 1, we might simply write: $\sum_i x_i = 1$
  • 11. Formulas: Sigma Notation
      $$p_{ij} = \frac{f_{ij}}{\sum_{i=1}^{W}\sum_{j=1}^{C} f_{ij}} \qquad p_{i*} = \frac{\sum_{j=1}^{C} f_{ij}}{\sum_{i=1}^{W}\sum_{j=1}^{C} f_{ij}} \qquad p_{*j} = \frac{\sum_{i=1}^{W} f_{ij}}{\sum_{i=1}^{W}\sum_{j=1}^{C} f_{ij}}$$
    • p_{ij}: the numerator f_{ij} is a single cell; the denominators sum the cells of all the words and of all the contexts
    • p_{i*}: the numerator sums the cells of all contexts (all the columns)
    • p_{*j}: the numerator sums the cells of all the words (all the rows)
  • 12. Living lexicon: built upon an underlying, continuously updated corpus
    • Drawbacks: updated but unstable and incomplete: missing words, missing linguistic information, etc.
    • Multilinguality, function words, etc.
  • 13. Similarity:
    • Given the underlying statistical model, these words are similar
    (Fredrik Olsson)
  • 14. Gavagai blog
    • Further reading (Magnus Sahlgren): https://www.gavagai.se/blog/2015/09/30/a-brief-history-of-word-embeddings/
  • 15. End of previous lecture
  • 16. Acknowledgements
    Most slides borrowed or adapted from:
    • Dan Jurafsky and Christopher Manning, Coursera
    • Dan Jurafsky and James H. Martin, J&M (2015, draft): https://web.stanford.edu/~jurafsky/slp3/
  • 17. Preliminary: What's Information Extraction (IE)?
    • IE = text analytics = text mining = e-discovery, etc.
    • The ultimate goal is to convert unstructured text into structured information (so that information of interest can easily be picked up).
    • Unstructured data/text: email, PDF files, social media posts, tweets, text messages, blogs, basically any running text...
    • Structured data/text: databases (XML, SQL, etc.), ontologies, dictionaries, etc.
  • 18. Information Extraction and Named Entity Recognition
    Introducing the tasks: Getting simple structured information out of text
  • 19. Information Extraction
    • Information extraction (IE) systems
      • Find and understand limited relevant parts of texts
      • Gather information from many pieces of text
      • Produce a structured representation of relevant information:
        • relations (in the database sense), a.k.a.
        • a knowledge base
    • Goals:
      1. Organize information so that it is useful to people
      2. Put information in a semantically precise form that allows further inferences to be made by computer algorithms
  • 20. Information Extraction: factual info
    • IE systems extract clear, factual information
    • Roughly: Who did what to whom when?
    • E.g.,
      • Gathering earnings, profits, board members, headquarters, etc. from company reports
        • "The headquarters of BHP Billiton Limited, and the global headquarters of the combined BHP Billiton Group, are located in Melbourne, Australia."
        • headquarters("BHP Billiton Limited", "Melbourne, Australia")
      • Learning drug-gene product interactions from medical research literature
  • 21. Low-level information extraction
    • Is now available (and, I think, popular) in applications like Apple or Google mail, and web indexing
    • Often seems to be based on regular expressions and name lists
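A toy sketch of the "regular expressions and name lists" style, producing the headquarters(...) tuple from the BHP Billiton sentence of slide 20. The pattern `HQ_PATTERN` and the name list `KNOWN_ORGS` are hypothetical and hand-written for this one sentence shape; real low-level extractors use many such rules and much larger lists.

```python
import re

# Hypothetical name list (gazetteer) of organizations we trust.
KNOWN_ORGS = {"BHP Billiton Limited", "BHP Billiton Group"}

# Hypothetical hand-written pattern for one relation type.
HQ_PATTERN = re.compile(
    r"[Tt]he headquarters of (?P<org>[A-Z][\w ]*?(?:Limited|Group|Inc\.))"
    r".*?\bare located in (?P<loc>[A-Z]\w+(?:, [A-Z]\w+)*)"
)


def extract_headquarters(sentence):
    """Return ("headquarters", org, location), or None if the rule does not fire."""
    m = HQ_PATTERN.search(sentence)
    if m is None or m.group("org") not in KNOWN_ORGS:   # name-list check
        return None
    return ("headquarters", m.group("org"), m.group("loc"))


sentence = ("The headquarters of BHP Billiton Limited, and the global headquarters "
            "of the combined BHP Billiton Group, are located in Melbourne, Australia.")
print(extract_headquarters(sentence))
```

The rule is precise but brittle: any rephrasing ("is headquartered in …") or an organization missing from the list makes it return None, which is one motivation for the learned sequence models later in the lecture.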
  • 23. Named Entity Recognition (NER)
    • A very important sub-task: find and classify names in text.
    • An entity is a discrete thing like "IBM Corporation"
    • "Named" means called "IBM" or "Big Blue", not "it" or "the company"
    • Often extended in practice to times, dates, instances of products, proteins and chemical/biological substances: these aren't really entities, but they are easy-to-recognize semantic classes
  • 24. Named Entity Recognition (NER)
    • A very important sub-task: find and classify names in text, for example:
      "The decision by the independent MP Andrew Wilkie to withdraw his support for the minority Labor government sounded dramatic but it should not further threaten its stability. When, after the 2010 election, Wilkie, Rob Oakeshott, Tony Windsor and the Greens agreed to support Labor, they gave just two guarantees: confidence and supply."
    • You have a text, and you want to:
      1. find things that are names: European Commission, John Lloyd Jones, etc.
      2. give them labels: ORG, PERS, etc.
  • 25. Named Entity Recognition (NER)
    • A very important sub-task: find and classify names in text, for example:
      "The decision by the independent MP Andrew Wilkie to withdraw his support for the minority Labor government sounded dramatic but it should not further threaten its stability. When, after the 2010 election, Wilkie, Rob Oakeshott, Tony Windsor and the Greens agreed to support Labor, they gave just two guarantees: confidence and supply."
    • Entity classes marked on the slide: Person, Date, Location, Organization
  • 26. Named Entity Recognition (NER)
    • The uses:
      • Named entities can be indexed, linked off, etc.
      • Sentiment can be attributed to companies or products
      • A lot of IE relations are associations between named entities
      • For question answering, answers are often named entities.
    • Concretely:
      • Many web pages tag various entities, with links to bio or topic pages, etc.
      • Reuters' OpenCalais, Evri, AlchemyAPI, Yahoo's Term Extraction, …
      • Apple/Google/Microsoft/… smart recognizers for document content
  • 27. Summary: Getting simple structured information out of text
  • 28. Evaluation of Named Entity Recognition
    The extension of Precision, Recall, and the F measure to sequences
  • 29. The Named Entity Recognition Task
    Task: Predict entities in a text

      Foreign     ORG
      Ministry    ORG
      spokesman   O
      Shen        PER
      Guofang     PER
      told        O
      Reuters     ORG
      :           :

    Standard evaluation is per entity, not per token
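Since evaluation is per entity, token labels first have to be grouped into entity spans. A minimal sketch for the IO labels above, assuming that a maximal run of tokens with the same non-O label forms one entity; the function name `io_entities` is illustrative.

```python
# Tokens and IO labels from the slide (the trailing ": :" row is left out).
TOKENS = ["Foreign", "Ministry", "spokesman", "Shen", "Guofang", "told", "Reuters"]
LABELS = ["ORG", "ORG", "O", "PER", "PER", "O", "ORG"]


def io_entities(tokens, labels):
    """Group maximal runs of the same non-O label into (start, end, type, text) spans.

    `end` is exclusive, as in Python slicing.
    """
    entities = []
    i = 0
    while i < len(labels):
        if labels[i] == "O":
            i += 1
            continue
        j = i
        while j < len(labels) and labels[j] == labels[i]:
            j += 1
        entities.append((i, j, labels[i], " ".join(tokens[i:j])))
        i = j
    return entities


for entity in io_entities(TOKENS, LABELS):
    print(entity)
```

The seven tokens yield three entities (Foreign Ministry, Shen Guofang, Reuters), and these three spans, not the seven tokens, are the units that precision and recall count.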
  • 30. P/R
    • $P = \frac{TP}{TP + FP}$;  $R = \frac{TP}{TP + FN}$
    • FP = false alarm: it is not a NE, but it has been classified as a NE
    • FN = it is a NE, but the system failed to recognize it
  • 31. Precision/Recall/F1 for IE/NER
    • Recall and precision are straightforward for tasks like IR and text categorization, where there is only one grain size (documents)
    • The measure behaves a bit funnily for IE/NER when there are boundary errors (which are common):
      • First Bank of Chicago announced earnings …
      • If the system labels only "Bank of Chicago" as the entity, this counts as both a FP and a FN
      • Selecting nothing would have been better
    • Some other metrics (e.g., MUC scorer) give partial credit (according to complex rules)
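A minimal sketch of exact-match, per-entity P/R/F1 that reproduces the boundary-error effect. Spans are (start, end, type) token offsets; "First Bank of Chicago" is tokens 0 to 3, and the PER span at 10 to 11 is a hypothetical second entity added only so that precision is defined in both scenarios.

```python
def entity_prf(gold, predicted):
    """Per-entity precision, recall and F1: an entity counts only on an exact match."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)
    fp = len(predicted - gold)   # false alarms
    fn = len(gold - predicted)   # missed entities
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "p": p, "r": r, "f1": f1}


# Hypothetical gold standard: the ORG from the slide plus one extra PER entity.
GOLD = {(0, 4, "ORG"), (10, 11, "PER")}
boundary_error = {(1, 4, "ORG"), (10, 11, "PER")}   # tagged only "Bank of Chicago"
nothing_for_org = {(10, 11, "PER")}                  # did not tag the ORG at all

print(entity_prf(GOLD, boundary_error))
print(entity_prf(GOLD, nothing_for_org))
```

The boundary error costs one FP and one FN (P = R = F1 = 0.5), while skipping the ORG costs only one FN (P = 1.0, R = 0.5, F1 ≈ 0.67), which is why selecting nothing scores better.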
  • 32. Summary: Be careful when interpreting the P/R/F1 measures
  • 33. Sequence Models for Named Entity Recognition
  • 34. The ML sequence model approach to NER
    Training
      1. Collect a set of representative training documents
      2. Label each token for its entity class or other (O)
      3. Design feature extractors appropriate to the text and classes
      4. Train a sequence classifier to predict the labels from the data
    Testing
      1. Receive a set of testing documents
      2. Run sequence model inference to label each token
      3. Appropriately output the recognized entities
  • 35. NER pipeline
    Representative documents → Human annotation → Annotated documents → Feature extraction → Training data → Sequence classifiers → NER system
  • 36. Encoding classes for sequence labeling

                 IO encoding   IOB encoding
      Fred       PER           B-PER
      showed     O             O
      Sue        PER           B-PER
      Mengqiu    PER           B-PER
      Huang      PER           I-PER
      's         O             O
      new        O             O
      painting   O             O
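A minimal sketch of decoding IOB labels into entity spans, using the example above. It assumes the usual convention that B-X starts an entity, I-X continues an entity of the same type, and a stray I-X (with no open entity of type X) starts a new one; `iob_entities` is an illustrative name.

```python
TOKENS = ["Fred", "showed", "Sue", "Mengqiu", "Huang", "'s", "new", "painting"]
IOB = ["B-PER", "O", "B-PER", "B-PER", "I-PER", "O", "O", "O"]


def iob_entities(tokens, labels):
    """Return (start, end, type, text) spans from IOB labels; `end` is exclusive."""
    entities = []
    start, etype = None, None
    for i, label in enumerate(labels):
        continues = label.startswith("I-") and label[2:] == etype
        if not continues:
            # Close the open entity (if any) before an O, a B-X, or a type change.
            if start is not None:
                entities.append((start, i, etype, " ".join(tokens[start:i])))
                start, etype = None, None
            if label != "O":
                start, etype = i, label[2:]   # B-X, or a stray I-X, opens an entity
    if start is not None:
        entities.append((start, len(labels), etype, " ".join(tokens[start:])))
    return entities


for entity in iob_entities(TOKENS, IOB):
    print(entity)
```

IOB yields three people (Fred, Sue, Mengqiu Huang). Read with IO labels, the adjacent run PER PER PER would collapse into the single span "Sue Mengqiu Huang": this is the ambiguity that the B- prefix resolves.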
  • 37. Features for sequence labeling
    • Words
      • Current word (essentially like a learned dictionary)
      • Previous/next word (context)
    • Other kinds of inferred linguistic classification
      • Part-of-speech tags
    • Label context
      • Previous (and perhaps next) label
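A minimal sketch of a feature template covering the feature types listed above, for one token position. The feature-string format, the `<START>`/`<END>` padding symbols and the POS tags are hypothetical; toolkits differ in how they encode features, and in a CRF the label context is usually handled by transition features rather than an explicit previous-label feature.

```python
def token_features(tokens, pos_tags, i, prev_label):
    """Return a set of string features for token i (a hypothetical feature template)."""
    prev_word = tokens[i - 1] if i > 0 else "<START>"
    next_word = tokens[i + 1] if i + 1 < len(tokens) else "<END>"
    return {
        "word=" + tokens[i],            # current word: acts like a learned dictionary
        "prev_word=" + prev_word,       # left context
        "next_word=" + next_word,       # right context
        "pos=" + pos_tags[i],           # inferred linguistic classification
        "prev_label=" + prev_label,     # label context
    }


tokens = ["Shen", "Guofang", "told", "Reuters"]
pos_tags = ["NNP", "NNP", "VBD", "NNP"]   # hypothetical tagger output
print(sorted(token_features(tokens, pos_tags, 1, "PER")))
```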
  • 38. Features: Word substrings
    [Figure: how many names in each class (drug, company, movie, place, person) contain the substrings "oxa", ":" and "field"; example names: Cotrimoxazole (drug), Wethersfield (place), Alien Fury: Countdown to Invasion (movie).]
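Substring features are commonly implemented as character n-grams. A minimal sketch, assuming n-gram sizes 1 to 5 and `<`/`>` as word-boundary markers (both are choices made here, not from the slide), checked against the three example names:

```python
def char_ngrams(word, sizes=range(1, 6)):
    """All character n-grams of `word` for the given sizes; < and > mark the word edges."""
    padded = "<" + word + ">"
    return {padded[i:i + n] for n in sizes for i in range(len(padded) - n + 1)}


for name in ["Cotrimoxazole", "Wethersfield", "Alien Fury: Countdown to Invasion"]:
    grams = char_ngrams(name)
    print(name, {s: s in grams for s in ["oxa", ":", "field"]})
```

Each example name fires exactly one of the three substrings, so a classifier can learn, for instance, that "oxa" is strong evidence for a drug name even for drugs never seen in training.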
  • 39. Features: Word shapes
    • Word shapes
      • Map words to a simplified representation that encodes attributes such as length, capitalization, numerals, Greek letters, internal punctuation, etc.

        Varicella-zoster   Xx-xxx
        mRNA               xXXX
        CPA1               XXXd
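One possible shape function, written as an assumption so that it reproduces the three examples above: words of up to four characters map every character to a class symbol; longer words keep the first two and last two symbols and collapse the middle into the sorted set of symbols it contains. This sketch does not give Greek letters a class of their own.

```python
def char_class(c):
    """Map one character to a shape symbol: X upper, x lower, d digit, else itself."""
    if c.isupper():
        return "X"
    if c.islower():
        return "x"
    if c.isdigit():
        return "d"
    return c          # internal punctuation such as '-' is kept as-is


def word_shape(word):
    """Short words keep every symbol; longer words keep the first two and last two
    symbols and collapse the middle to the sorted set of symbols it contains."""
    classes = [char_class(c) for c in word]
    if len(classes) <= 4:
        return "".join(classes)
    middle = "".join(sorted(set(classes[2:-2])))
    return "".join(classes[:2]) + middle + "".join(classes[-2:])


for w in ["Varicella-zoster", "mRNA", "CPA1"]:
    print(w, word_shape(w))
```

Collapsing the middle keeps the shape vocabulary small, so long words with the same capitalization and punctuation pattern share one feature.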
  • 40. Sequence models
    • Once you have designed the features, apply a sequence classifier (cf. PoS tagging), such as:
      • Maximum Entropy Markov Models
      • Conditional Random Fields
      • etc.
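Both MEMMs and linear-chain CRFs are typically decoded with the Viterbi algorithm, which implements the "run sequence model inference" step of the testing procedure. A minimal sketch with additive scores; the emission and transition values below are made up for illustration, whereas a trained model would compute them from the features.

```python
def viterbi(n_tokens, labels, emit, trans):
    """Highest-scoring label sequence when scores add up along the chain.

    emit[i][y]        : score of label y at position i
    trans[(y_prev, y)]: score of label y following label y_prev
    """
    best = [{y: emit[0][y] for y in labels}]   # best score of a path ending in y
    back = [{}]                                 # backpointers
    for i in range(1, n_tokens):
        best.append({})
        back.append({})
        for y in labels:
            prev = max(labels, key=lambda yp: best[i - 1][yp] + trans[(yp, y)])
            best[i][y] = best[i - 1][prev] + trans[(prev, y)] + emit[i][y]
            back[i][y] = prev
    last = max(labels, key=lambda y: best[-1][y])
    path = [last]
    for i in range(n_tokens - 1, 0, -1):        # follow backpointers right to left
        path.append(back[i][path[-1]])
    return list(reversed(path)), best[-1][last]


# Hypothetical scores for "Shen Guofang told".
LABELS = ["PER", "O"]
EMIT = [
    {"PER": 2.0, "O": 1.0},   # Shen
    {"PER": 1.0, "O": 1.5},   # Guofang
    {"PER": 0.0, "O": 2.0},   # told
]
TRANS = {("PER", "PER"): 1.0, ("PER", "O"): 0.0, ("O", "PER"): 0.0, ("O", "O"): 0.4}

print(viterbi(len(EMIT), LABELS, EMIT, TRANS))
```

Looking at emission scores alone, Guofang would be labeled O (1.5 > 1.0), but the PER→PER transition makes PER PER O the best path overall, with total score 6.0.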