SlideShare uma empresa Scribd logo
1 de 27
Baixar para ler offline
Treparel
Delftechpark 26
2628 XH Delft
The Netherlands
www.treparel.com
Text Analytics
in the
EU Fusepool
project
II-SDV ’13 - Nice
Anton Heijs
CTO
anton@treparel.com
April 15, 2013
Agenda	
  
•  Introduc.on	
  of	
  the	
  EU	
  Fusepool	
  project,	
  Treparel	
  and	
  the	
  
consor.um	
  partners	
  
•  Background	
  on	
  objec.ves	
  of	
  the	
  Fusepool	
  project	
  
•  User	
  adap.ve	
  system	
  
•  Data	
  pooling	
  and	
  linking	
  
•  Large	
  scale	
  machine	
  learning	
  and	
  text	
  analy.cs	
  
•  Summary	
  
Treparel KMX – All rights reserved 2013 2www.treparel.com
The	
  EU	
  Fusepool	
  Consor.um	
  
Treparel (Delft, The Netherlands) is a global software provider in Big Data Text Analytics and
Visualization.
Global companies, government agencies, software vendors or data publishers are using Treparel
KMX text analysis software to gain faster, reliable, precise insights in large complex unstructured data
sets (like application notes, blogs, email and patents) allowing them to make better informed
decisions.
As part of the Fusepool consortium Treparel integrates the KMX Text Analytics technology for
advanced classification, clustering and visualization of large complex document collections.
Treparel KMX – All rights reserved 2013 3www.treparel.com
Each	
  partner	
  provides	
  cri.cal	
  building	
  blocks:	
  
  BUAS:	
  Dynamic	
  user	
  interfaces	
  
  ENOLL:	
  End	
  users	
  with	
  specific	
  needs	
  
  GEOX:	
  NER,	
  UI	
  design	
  
  SEARCHBOX:	
  text	
  search	
  and	
  similarity	
  seman.c	
  
matching	
  
  TREPAREL:	
  text/patent	
  classifica.on	
  engines	
  
  XEROX:	
  user-­‐adap.ve	
  learning	
  know-­‐how	
  
	
  	
  	
  	
  	
  	
  BUT	
  BREAK-­‐THROUGH	
  BY	
  INTEGRATION	
  
The	
  Big	
  Picture	
  
Treparel KMX – All rights reserved 2013 4www.treparel.com
WP4
Find &
match
WP5
GUI &
Visuals
WP2
Platform,
sourcing
WP3
Extract &
enrich
WP1
Reqs &
Testing
WP6 User Involvement
Learning Learning Learning LearningFeedback
Work	
  packages	
  
The EC Fusepool project is a 2 year EU project and started in July 2012
Treparel KMX – All rights reserved 2013 5www.treparel.com
Fusing	
  and	
  pooling	
  informa.on	
  	
  
for	
  product	
  development	
  
Vision: User-adaptive system
Living Lab: Rapid app development
Data processing: Sourcing & interlinking
Machine learning: Matching & optimizing
Integrated Use Cases
Treparel KMX – All rights reserved 2013 6www.treparel.com
Background	
  
•  SMEs	
  have	
  a	
  need	
  for	
  technology	
  intelligence	
  for	
  detec.ng	
  
and	
  responding	
  to	
  opportuni.es	
  and	
  threats	
  
•  This	
  a	
  partly	
  driven	
  by	
  growth	
  and	
  complexity	
  of	
  
patents	
  and	
  lawsuits	
  
•  Consumer	
  intelligence	
  to	
  detect	
  opinions	
  and	
  needs	
  of	
  
consumers	
  for	
  product	
  development	
  
•  Open	
  innova.on	
  requiring	
  coopera.on	
  (links	
  between	
  data,	
  
e.g.	
  finding	
  business	
  partners)	
  
•  Focus:	
  Machine	
  Learning	
  algorithms	
  to	
  improve	
  matching	
  	
  
Treparel KMX – All rights reserved 2013 7www.treparel.com
User-­‐adap.ve	
  system	
  
•  Focus:	
  monitor	
  and	
  learn	
  specific	
  needs	
  and	
  preferences	
  
of	
  a	
  user	
  to	
  align	
  features,	
  func.onali.es,	
  and	
  graphical	
  
interfaces	
  
•  AdapDve:	
  machine	
  learning	
  from	
  crowd-­‐sourcing	
  (rather	
  
than	
  rule-­‐based)	
  
•  User-­‐aligned	
  prioriDzaDon:	
  more	
  usable	
  and	
  customized	
  
interfaces,	
  sugges.ons	
  based	
  on	
  ac.vity	
  &	
  user	
  feedback	
  
Treparel KMX – All rights reserved 2013 8www.treparel.com
User-­‐adap.ve	
  matching	
  
•  Main	
  goal:	
  automated	
  user-­‐adap.ve	
  matching	
  of	
  users	
  
to:	
  
–  Patent	
  analysis	
  
–  Finding	
  funding	
  opportuni.es	
  
–  Partner	
  matching	
  
•  Key	
  asset:	
  informa.on	
  provided	
  by	
  the	
  user	
  	
  
•  User	
  Data	
  Credo:	
  	
  accuracy	
  improves	
  with	
  quan.ty	
  and	
  
quality	
  of	
  user	
  data	
  while	
  variety	
  (breadth)	
  increases	
  
with	
  number	
  of	
  users	
  
•  Living	
  lab:	
  Co-­‐crea.on	
  between	
  creators	
  and	
  consumers	
  
of	
  the	
  Fusepool	
  plaWorm	
  
Treparel KMX – All rights reserved 2013 9www.treparel.com
Data	
  sourcing	
  
•  Sources:	
  internal	
  &	
  external	
  content	
  from	
  web	
  harves.ng	
  
and	
  structured	
  data	
  sources	
  
–  using	
  content	
  databases	
  and	
  linked	
  open	
  data	
  
•  Scope:	
  ini.al	
  data	
  corpus	
  includes	
  all	
  explicitly	
  in-­‐	
  and	
  
excluded	
  sources	
  
•  Gained	
  value	
  from	
  InformaDon:	
  recommenda.ons	
  
based	
  on	
  machine	
  learning	
  from	
  feedback	
  
Treparel KMX – All rights reserved 2013 10www.treparel.com
Data	
  handling	
  
1.  Text	
  analysis	
  and	
  feature	
  extracDon:	
  ML	
  &	
  NLP	
  methods	
  
for	
  categorizing,	
  named	
  en.ty	
  extrac.on,	
  etc.	
  
2.  Shared	
  metadata	
  models:	
  mapping	
  text	
  features	
  to	
  
exis.ng/custom	
  ontologies	
  and	
  genera.on	
  of	
  seman.c	
  
triplets	
  
→  High-­‐level	
  abstrac.on	
  &	
  persistence	
  for	
  reuse	
  
→  Lightweight	
  storage:	
  mostly	
  metadata	
  only,	
  text	
  
indexing	
  and	
  abstrac.on	
  uses	
  schema-­‐free	
  key-­‐
value	
  (enabling	
  ac.onable	
  facets)	
  
Treparel KMX – All rights reserved 2013 11www.treparel.com
Data	
  interlinking	
  
•  Contextualize:	
  terms	
  are	
  interlinked	
  with	
  same	
  and	
  
similar	
  terms	
  across	
  sources:	
  
–  Enrich	
  the	
  extracted	
  content	
  with	
  exis.ng	
  informa.on	
  available	
  
in	
  the	
  Internet	
  
–  Interlink	
  as	
  much	
  informa.on	
  as	
  possible	
  to	
  increase	
  the	
  value	
  of	
  
knowledge	
  extrac.on	
  
–  Use	
  available	
  public	
  sector	
  resources	
  in	
  Seman.c	
  Web	
  and	
  LOD	
  
format	
  
•  Metadata:	
  when	
  a	
  user	
  uploads	
  texts	
  to	
  be	
  matched	
  with	
  
other	
  content,	
  only	
  the	
  metadata	
  descriptors	
  are	
  
transmiYed	
  
•  Data	
  privacy:	
  data	
  fusion	
  from	
  diverse	
  sources	
  without	
  
endangering	
  user	
  privacy	
  
Treparel KMX – All rights reserved 2013 12www.treparel.com
Searching	
  &	
  finding	
  
•  Key	
  search-­‐oriented	
  features:	
  
–  Search	
  through	
  all	
  content	
  in	
  the	
  data	
  pool	
  
–  Faceted	
  search	
  (categories,	
  metadata,	
  en..es)	
  
–  Integra.on	
  of	
  Linked	
  Open	
  Data	
  (LOD)	
  results	
  
–  Cross-­‐lingual	
  indexing	
  and	
  cross-­‐referencing	
  
–  “Did	
  you	
  mean?”-­‐func.onality	
  in	
  case	
  of	
  typos	
  and	
  auto-­‐
comple.on	
  of	
  search	
  queries	
  
•  User-­‐adapDve:	
  indexing	
  and	
  integra.on	
  based	
  on	
  users	
  
needs	
  (e.g.	
  user	
  profiling)	
  
Treparel KMX – All rights reserved 2013 13www.treparel.com
Adapta.on	
  &	
  refinement	
  
•  AdapDve	
  search:	
  results	
  are	
  aligned	
  to	
  user	
  preferences	
  
based	
  on	
  analysis	
  of	
  user	
  implicit	
  and	
  explicit	
  feedback	
  	
  
•  MulD-­‐task	
  ranking:	
  good	
  trade-­‐off	
  between	
  user-­‐
independent	
  search	
  (high	
  coverage	
  but	
  lower	
  precision)	
  
and	
  a	
  very	
  customized	
  approach	
  
•  Query	
  intent	
  discovery:	
  analysis	
  of	
  the	
  query	
  structure	
  
and	
  interlinking	
  of	
  queries	
  
Treparel KMX – All rights reserved 2013 14www.treparel.com
Correla.ng	
  &	
  matching	
  
•  Search	
  guided	
  navigaDon:	
  seman.c	
  matching	
  extracts	
  
contextual	
  rela.onships	
  to	
  list	
  related	
  content	
  
–  sugges.ons	
  organized	
  by	
  categories	
  
–  exposing	
  facets	
  within	
  related	
  content	
  
•  Distributed	
  rule	
  and	
  event	
  model:	
  defines	
  states,	
  
ac.ons,	
  and	
  consequences	
  (e.g.	
  no.fica.ons,	
  
visualiza.ons)	
  for	
  reasoning	
  based	
  on	
  light-­‐weight	
  
ontologies	
  
Treparel KMX – All rights reserved 2013 15www.treparel.com
Crowd	
  sourcing	
  &	
  	
  
supervised	
  automa.on	
  
•  RelaDonal	
  learning:	
  related	
  instances	
  are	
  used	
  to	
  reason	
  
about	
  the	
  focal	
  instance	
  
–  Ra.onality	
  of	
  content	
  (links	
  to	
  other	
  content,	
  people,	
  etc.)	
  
provide	
  rich	
  informa.on	
  
–  Similari.es/dissimilari.es	
  to	
  other	
  content	
  is	
  established	
  purely	
  
on	
  rela.onal	
  proper.es	
  
Treparel KMX – All rights reserved 2013 16www.treparel.com
Visualiza.on	
  
Clustering	
   Classifica.on	
  
Text	
  Preprocessing	
  and	
  Indexing	
  
Acquire	
  documents	
  
Present	
  Results	
  
Taxonomies,	
  
Ontologies	
  
Seman.c	
  Analysis	
  
Document	
  level	
  analysis	
  
using	
  the	
  KMX	
  technology	
  	
  
KMX	
  unique	
  func.ons:	
  
•  Extract	
  concepts	
  in	
  context	
  
using	
  clustering	
  and	
  
classifica.on	
  of	
  documents	
  
•  Use	
  classifica.on	
  to	
  create	
  
ranked	
  lists	
  and	
  to	
  tag	
  subsets	
  
•  Support	
  of	
  binary	
  and	
  mul.-­‐
class	
  Classifica.on	
  
•  Integra.on	
  with	
  other	
  
applica.ons	
  through	
  KMX	
  API	
  
Treparel KMX – All rights reserved 2012 www.treparel.com 17
Query &
Search Tools
Treparel KMX – All rights reserved 2013 18www.treparel.com
NER	
  in	
  the	
  landscaping	
  	
  
Sentence	
  level	
  analysis	
  using	
  	
  
Named	
  En.ty	
  Recogni.on	
  
•  The	
  aim	
  of	
  NER	
  is	
  to	
  iden.fy	
  en..es	
  in	
  unstructured	
  text	
  
documents.	
  
o  To	
  locate	
  :	
  mark-­‐up	
  the	
  en..es	
  
o  To	
  classify	
  :	
  into	
  predefined	
  categories/domains	
  
•  Aim	
  of	
  usage	
  
o  To	
  recognise	
  trends	
  (trained	
  and	
  new	
  high	
  frequency)	
  
o  To	
  find	
  all	
  „trained”	
  en..es	
  
o  To	
  „discover”	
  new	
  en.tes	
  
•  NER	
  approaches	
  
o  Sta.s.cs	
  based	
  (supervised	
  machine	
  learning)	
  
o  Rule	
  based	
  (regular	
  expressions)	
  
Treparel KMX – All rights reserved 2013 19www.treparel.com
Building	
  a	
  NER	
  model	
  
Treparel KMX – All rights reserved 2013 20www.treparel.com
3	
  NER	
  model	
  examples	
  
  Training	
  text:	
  500	
  patents	
  from	
  EPO	
  
  Training:	
  model	
  building	
  by	
  Stanford	
  NER	
  
•  LTE	
  (long	
  term	
  evoluDon)	
  
•  F1:	
  88%	
  
•  New	
  en..es:	
  „GSM	
  BSS”,	
  „LTE	
  TDD”	
  
•  False	
  pos:	
  „Loca.on	
  Area	
  LA”	
  
•  Elements	
  
•  F1:	
  98%	
  
•  False	
  pos:	
  „argon/hydrogen”	
  
•  Cancer	
  
•  F1:	
  87%	
  
•  New	
  en..es:	
  „myeloma	
  cell”,	
  „tumor	
  .ssue”	
  
•  False	
  pos:	
  „cell	
  mortality”,	
  „the	
  test	
  compound”	
  
Treparel KMX – All rights reserved 2013 21www.treparel.com
Using	
  NER	
  in	
  a	
  GUI	
  
Treparel KMX – All rights reserved 2013 22www.treparel.com
Raw tex
Extract	
  terms	
  indica.ng	
  
domains	
  using	
  patent	
  
classifica.on	
  codes	
  
Content	
  /	
  En.ty	
  
Hub	
  
Generate	
  Vector	
  Space	
  Model	
  
Document Vectors
Generate	
  Classifiers	
  to	
  es.mate	
  domains	
  
Annotated tex with
domain labels
Domain
labels
Training Vectors
DocId : 1122
Title : Pesticide device
Classification Code :
A61B
Text: Hihaho
Etc etc
Table
A61 : pesticide
A61B : pesticide solvent
Etc etc
Many Vectors
Doc-Id : 1122
750 terms + weights
Classifier	
  on	
  pes.cide	
  solvents	
  
25 Positive vectors
Doc-Id’s
Label
750 terms + weights
25 negative vectors
Doc-Id’s
Label
750 terms + weights
Doc-Id : 1122
Labels + scores
Title : Pesticide device
Classification Code : A61B
Text: Hihaho …Etc etc
1
2
3
4
5
6
7
8
9
10	
  
11	
  
Combining	
  text	
  analysis	
  
approaches	
  
Treparel KMX – All rights reserved 2013 23www.treparel.com
•  Data	
  sourcing:	
  retrieves	
  data	
  from	
  data	
  sources	
  via	
  data	
  
integrator	
  
•  Storage:	
  raw	
  (e.g.	
  text)	
  or	
  processed	
  data	
  stored	
  in	
  
database	
  or	
  triple	
  store	
  
•  Using	
  ML	
  to	
  enable	
  learning	
  from	
  the	
  crowd	
  
•  Push	
  &	
  Pull:	
  	
  
  interface	
  to	
  consumers	
  (web	
  and	
  mobile	
  apps)	
  
  suppor.ng	
  quality	
  and	
  access	
  control	
  from	
  portal	
  
•  Portal:	
  business	
  logic,	
  storage	
  of	
  registered	
  data	
  sources,	
  
access	
  control	
  using	
  web	
  frontend	
  
Somware	
  architecture	
  
Treparel KMX – All rights reserved 2013 24www.treparel.com
Open	
  Call	
  for	
  Users	
  
•  ApplicaDons	
  received	
  from	
  Finland,	
  Germany,	
  Spain,	
  Greece,	
  
Bulgaria,	
  Hungary,	
  Italy,	
  UK,	
  Belgium,	
  China,	
  Switzerland,	
  France,	
  
Denmark,	
  Portugal,	
  Ireland	
  and	
  The	
  Netherlands	
  	
  
•  Business	
  areas	
  covered:	
  	
  
–  Bio-­‐medical,	
  
–  Pharma	
  and	
  biotech,	
  
–  ICT/Telecommunica.ons,	
  	
  
–  Digital	
  media,	
  	
  
–  Renewable	
  energies,	
  	
  
–  Educa.on,	
  	
  
–  Innova.on/consultancy	
  services	
  for	
  SMEs.	
  	
  
•  Mix	
  of	
  profiles	
  from	
  SMEs,	
  Research,	
  SME	
  intermediaries	
  
(incubators,	
  Science	
  parks,	
  Living	
  Labs,	
  etc),	
  developers	
  and	
  more	
  
•  MulDple	
  areas	
  of	
  research,	
  development	
  and	
  innova.on	
  areas	
  are	
  
covered.	
  
25Treparel KMX – All rights reserved 2013 25www.treparel.com
•  Data	
  as	
  a	
  service:	
  scale	
  economies	
  of	
  scale	
  in	
  management	
  of	
  data	
  	
  
•  Data	
  pooling:	
  processes	
  need	
  .mely	
  aggrega.on	
  and	
  redistribu.on	
  
of	
  diverse	
  data	
  but	
  building	
  own	
  is	
  redundant	
  and	
  prohibi.ve	
  for	
  
SMEs	
  
  provide	
  services	
  on	
  top	
  of	
  pool	
  with	
  high	
  quality	
  data	
  
  provide	
  access	
  to	
  services	
  on	
  demand	
  
•  Success	
  criteria:	
  
  Early	
  provision	
  of	
  scalable	
  basic	
  Fusepool	
  services	
  
  SME	
  involvement	
  and	
  uptake	
  of	
  Fusepool	
  services	
  
  Machine	
  learning	
  for	
  data	
  cura.on	
  &	
  user	
  adapta.on	
  
•  Required	
  steps:	
  
  Stepwise	
  integra.on	
  of	
  exis.ng	
  &	
  new	
  technologies	
  
  Early	
  and	
  ongoing	
  feedback	
  from	
  end	
  users	
  
EC	
  perspec.ve	
  
Treparel KMX – All rights reserved 2013 26www.treparel.com
The	
  EC	
  FP	
  7	
  program	
  Fusepool	
  is	
  all	
  about:	
  	
  
•  Building	
  a	
  plaWorm	
  with	
  web	
  enabled	
  services	
  for	
  
•  Data	
  pooling	
  
•  Large	
  scale	
  text	
  analysis	
  
•  Large	
  scale	
  machine	
  learning	
  of	
  user	
  input	
  
•  Enable	
  SME’s	
  with	
  analy.cs	
  for	
  improving	
  their	
  
innova.on	
  and	
  compe..ve	
  strengths	
  using	
  
•  SME’s	
  involvement	
  and	
  feedback	
  to	
  the	
  Fusepool	
  services	
  
•  Machine	
  learning	
  for	
  data	
  cura.on	
  &	
  user	
  adapta.on	
  
Summary	
  
Treparel KMX – All rights reserved 2013 27www.treparel.com
Welcome to visit Fusepool at:
www.fusepool.net

Mais conteúdo relacionado

Mais procurados

A SEMANTIC BASED APPROACH FOR INFORMATION RETRIEVAL FROM HTML DOCUMENTS USING...
A SEMANTIC BASED APPROACH FOR INFORMATION RETRIEVAL FROM HTML DOCUMENTS USING...A SEMANTIC BASED APPROACH FOR INFORMATION RETRIEVAL FROM HTML DOCUMENTS USING...
A SEMANTIC BASED APPROACH FOR INFORMATION RETRIEVAL FROM HTML DOCUMENTS USING...cscpconf
 
A Framework for Content Preparation to Support Open-Corpus Adaptive Hypermedia
A Framework for Content Preparation to Support Open-Corpus Adaptive HypermediaA Framework for Content Preparation to Support Open-Corpus Adaptive Hypermedia
A Framework for Content Preparation to Support Open-Corpus Adaptive HypermediaKillian Levacher
 
Survey of Object Oriented Database
Survey of Object Oriented DatabaseSurvey of Object Oriented Database
Survey of Object Oriented DatabaseEditor IJMTER
 
Paper id 25201463
Paper id 25201463Paper id 25201463
Paper id 25201463IJRAT
 
Metadata harvesting
Metadata harvestingMetadata harvesting
Metadata harvestingAndrewLIS688
 
A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...
A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...
A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...csandit
 
Object-Oriented Database Model For Effective Mining Of Advanced Engineering M...
Object-Oriented Database Model For Effective Mining Of Advanced Engineering M...Object-Oriented Database Model For Effective Mining Of Advanced Engineering M...
Object-Oriented Database Model For Effective Mining Of Advanced Engineering M...cscpconf
 
Making data FAIR using InterMine
Making data FAIR using InterMineMaking data FAIR using InterMine
Making data FAIR using InterMineJustin Clark-Casey
 
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...vty
 
XML Retrieval: A Survey
XML Retrieval: A SurveyXML Retrieval: A Survey
XML Retrieval: A Surveyijceronline
 
External CV support in Dataverse 5.7
External CV support in Dataverse 5.7External CV support in Dataverse 5.7
External CV support in Dataverse 5.7vty
 
A survey of techniques for achieving metadata interoperability
A survey of techniques for achieving metadata interoperabilityA survey of techniques for achieving metadata interoperability
A survey of techniques for achieving metadata interoperabilityunyil96
 
Dataverse opportunities
Dataverse opportunitiesDataverse opportunities
Dataverse opportunitiesvty
 
Setting up Dataverse repository for research data
Setting up Dataverse repository for research dataSetting up Dataverse repository for research data
Setting up Dataverse repository for research datavty
 

Mais procurados (18)

A SEMANTIC BASED APPROACH FOR INFORMATION RETRIEVAL FROM HTML DOCUMENTS USING...
A SEMANTIC BASED APPROACH FOR INFORMATION RETRIEVAL FROM HTML DOCUMENTS USING...A SEMANTIC BASED APPROACH FOR INFORMATION RETRIEVAL FROM HTML DOCUMENTS USING...
A SEMANTIC BASED APPROACH FOR INFORMATION RETRIEVAL FROM HTML DOCUMENTS USING...
 
A Framework for Content Preparation to Support Open-Corpus Adaptive Hypermedia
A Framework for Content Preparation to Support Open-Corpus Adaptive HypermediaA Framework for Content Preparation to Support Open-Corpus Adaptive Hypermedia
A Framework for Content Preparation to Support Open-Corpus Adaptive Hypermedia
 
Dspace OAI-PMH
Dspace OAI-PMHDspace OAI-PMH
Dspace OAI-PMH
 
Survey of Object Oriented Database
Survey of Object Oriented DatabaseSurvey of Object Oriented Database
Survey of Object Oriented Database
 
Paper id 25201463
Paper id 25201463Paper id 25201463
Paper id 25201463
 
Metadata harvesting
Metadata harvestingMetadata harvesting
Metadata harvesting
 
Distributed databases
Distributed  databasesDistributed  databases
Distributed databases
 
A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...
A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...
A SEMANTIC BASED APPROACH FOR KNOWLEDGE DISCOVERY AND ACQUISITION FROM MULTIP...
 
Semantic Web Nature
Semantic Web NatureSemantic Web Nature
Semantic Web Nature
 
Object-Oriented Database Model For Effective Mining Of Advanced Engineering M...
Object-Oriented Database Model For Effective Mining Of Advanced Engineering M...Object-Oriented Database Model For Effective Mining Of Advanced Engineering M...
Object-Oriented Database Model For Effective Mining Of Advanced Engineering M...
 
Making data FAIR using InterMine
Making data FAIR using InterMineMaking data FAIR using InterMine
Making data FAIR using InterMine
 
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
 
XML Retrieval: A Survey
XML Retrieval: A SurveyXML Retrieval: A Survey
XML Retrieval: A Survey
 
External CV support in Dataverse 5.7
External CV support in Dataverse 5.7External CV support in Dataverse 5.7
External CV support in Dataverse 5.7
 
Ordbms
OrdbmsOrdbms
Ordbms
 
A survey of techniques for achieving metadata interoperability
A survey of techniques for achieving metadata interoperabilityA survey of techniques for achieving metadata interoperability
A survey of techniques for achieving metadata interoperability
 
Dataverse opportunities
Dataverse opportunitiesDataverse opportunities
Dataverse opportunities
 
Setting up Dataverse repository for research data
Setting up Dataverse repository for research dataSetting up Dataverse repository for research data
Setting up Dataverse repository for research data
 

Destaque

Finding trends in large scale document sets
Finding trends in large scale document setsFinding trends in large scale document sets
Finding trends in large scale document setsTreparel
 
Treparel lt innovate summit june 27, 2013
Treparel lt innovate summit june 27, 2013Treparel lt innovate summit june 27, 2013
Treparel lt innovate summit june 27, 2013Treparel
 
2014: Treparel Big Data Text Analytics & Visualization
2014: Treparel Big Data Text Analytics & Visualization2014: Treparel Big Data Text Analytics & Visualization
2014: Treparel Big Data Text Analytics & VisualizationTreparel
 
Machine Learning based Text Classification introduction
Machine Learning based Text Classification introductionMachine Learning based Text Classification introduction
Machine Learning based Text Classification introductionTreparel
 
Text and Data Visualization Introduction 2012
Text and Data Visualization Introduction 2012Text and Data Visualization Introduction 2012
Text and Data Visualization Introduction 2012Treparel
 
Treparel - KMX Patent Analytics 2014
Treparel - KMX Patent Analytics 2014Treparel - KMX Patent Analytics 2014
Treparel - KMX Patent Analytics 2014Treparel
 
Support Vector Machines (SVM) - Text Analytics algorithm introduction 2012
Support Vector Machines (SVM) - Text Analytics algorithm introduction 2012Support Vector Machines (SVM) - Text Analytics algorithm introduction 2012
Support Vector Machines (SVM) - Text Analytics algorithm introduction 2012Treparel
 

Destaque (7)

Finding trends in large scale document sets
Finding trends in large scale document setsFinding trends in large scale document sets
Finding trends in large scale document sets
 
Treparel lt innovate summit june 27, 2013
Treparel lt innovate summit june 27, 2013Treparel lt innovate summit june 27, 2013
Treparel lt innovate summit june 27, 2013
 
2014: Treparel Big Data Text Analytics & Visualization
2014: Treparel Big Data Text Analytics & Visualization2014: Treparel Big Data Text Analytics & Visualization
2014: Treparel Big Data Text Analytics & Visualization
 
Machine Learning based Text Classification introduction
Machine Learning based Text Classification introductionMachine Learning based Text Classification introduction
Machine Learning based Text Classification introduction
 
Text and Data Visualization Introduction 2012
Text and Data Visualization Introduction 2012Text and Data Visualization Introduction 2012
Text and Data Visualization Introduction 2012
 
Treparel - KMX Patent Analytics 2014
Treparel - KMX Patent Analytics 2014Treparel - KMX Patent Analytics 2014
Treparel - KMX Patent Analytics 2014
 
Support Vector Machines (SVM) - Text Analytics algorithm introduction 2012
Support Vector Machines (SVM) - Text Analytics algorithm introduction 2012Support Vector Machines (SVM) - Text Analytics algorithm introduction 2012
Support Vector Machines (SVM) - Text Analytics algorithm introduction 2012
 

Semelhante a Text Analytics in the EU Fusepool project (at II-SDV 2013 conference)

MetadataTheory: Metadata Standards (6th of 10)
MetadataTheory: Metadata Standards (6th of 10)MetadataTheory: Metadata Standards (6th of 10)
MetadataTheory: Metadata Standards (6th of 10)Nikos Palavitsinis, PhD
 
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Geoffrey Fox
 
Service integration to Enhance RDM: RSpace electronic lab notebook at the Uni...
Service integration to Enhance RDM: RSpace electronic lab notebook at the Uni...Service integration to Enhance RDM: RSpace electronic lab notebook at the Uni...
Service integration to Enhance RDM: RSpace electronic lab notebook at the Uni...ResearchSpace
 
II-SDV 2014 Insights into a Next Generation IP and R&D Dashboard (Jeroen Klei...
II-SDV 2014 Insights into a Next Generation IP and R&D Dashboard (Jeroen Klei...II-SDV 2014 Insights into a Next Generation IP and R&D Dashboard (Jeroen Klei...
II-SDV 2014 Insights into a Next Generation IP and R&D Dashboard (Jeroen Klei...Dr. Haxel Consult
 
Building Data Ecosystems for Accelerated Discovery
Building Data Ecosystems for Accelerated DiscoveryBuilding Data Ecosystems for Accelerated Discovery
Building Data Ecosystems for Accelerated Discoveryadamkraut
 
Session 2.1 ontological representation of the telecom domain for advanced a...
Session 2.1   ontological representation of the telecom domain for advanced a...Session 2.1   ontological representation of the telecom domain for advanced a...
Session 2.1 ontological representation of the telecom domain for advanced a...semanticsconference
 
Crossing the Analytics Chasm and Getting the Models You Developed Deployed
Crossing the Analytics Chasm and Getting the Models You Developed DeployedCrossing the Analytics Chasm and Getting the Models You Developed Deployed
Crossing the Analytics Chasm and Getting the Models You Developed DeployedRobert Grossman
 
Intake 38 data access 4
Intake 38 data access 4Intake 38 data access 4
Intake 38 data access 4Mahmoud Ouf
 
2017 IEEE Projects 2017 For Cse ( Trichy, Chennai )
2017 IEEE Projects 2017 For Cse ( Trichy, Chennai )2017 IEEE Projects 2017 For Cse ( Trichy, Chennai )
2017 IEEE Projects 2017 For Cse ( Trichy, Chennai )SBGC
 
Jisc Research Data Shared Service - Spring Update
Jisc Research Data Shared Service - Spring UpdateJisc Research Data Shared Service - Spring Update
Jisc Research Data Shared Service - Spring UpdateJisc RDM
 
Evaluating Machine Learning Algorithms for Materials Science using the Matben...
Evaluating Machine Learning Algorithms for Materials Science using the Matben...Evaluating Machine Learning Algorithms for Materials Science using the Matben...
Evaluating Machine Learning Algorithms for Materials Science using the Matben...Anubhav Jain
 
[DSC Europe 23] Djordje Grozdic - Transforming Business Process Automation wi...
[DSC Europe 23] Djordje Grozdic - Transforming Business Process Automation wi...[DSC Europe 23] Djordje Grozdic - Transforming Business Process Automation wi...
[DSC Europe 23] Djordje Grozdic - Transforming Business Process Automation wi...DataScienceConferenc1
 
Presentation from ALA Midwinter 2014 on Elsevier's new Text and Data Mining P...
Presentation from ALA Midwinter 2014 on Elsevier's new Text and Data Mining P...Presentation from ALA Midwinter 2014 on Elsevier's new Text and Data Mining P...
Presentation from ALA Midwinter 2014 on Elsevier's new Text and Data Mining P...Chris Shillum
 
Making research data more resourceful - Jisc digital festival 2015
Making research data more resourceful - Jisc digital festival 2015Making research data more resourceful - Jisc digital festival 2015
Making research data more resourceful - Jisc digital festival 2015Jisc
 

Semelhante a Text Analytics in the EU Fusepool project (at II-SDV 2013 conference) (20)

Service Integration to Enhance RDM
Service Integration to Enhance RDMService Integration to Enhance RDM
Service Integration to Enhance RDM
 
MIDESS
MIDESSMIDESS
MIDESS
 
MetadataTheory: Metadata Standards (6th of 10)
MetadataTheory: Metadata Standards (6th of 10)MetadataTheory: Metadata Standards (6th of 10)
MetadataTheory: Metadata Standards (6th of 10)
 
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
 
Service integration to Enhance RDM: RSpace electronic lab notebook at the Uni...
Service integration to Enhance RDM: RSpace electronic lab notebook at the Uni...Service integration to Enhance RDM: RSpace electronic lab notebook at the Uni...
Service integration to Enhance RDM: RSpace electronic lab notebook at the Uni...
 
II-SDV 2014 Insights into a Next Generation IP and R&D Dashboard (Jeroen Klei...
II-SDV 2014 Insights into a Next Generation IP and R&D Dashboard (Jeroen Klei...II-SDV 2014 Insights into a Next Generation IP and R&D Dashboard (Jeroen Klei...
II-SDV 2014 Insights into a Next Generation IP and R&D Dashboard (Jeroen Klei...
 
Building Data Ecosystems for Accelerated Discovery
Building Data Ecosystems for Accelerated DiscoveryBuilding Data Ecosystems for Accelerated Discovery
Building Data Ecosystems for Accelerated Discovery
 
NISO/DCMI Webinar: Metadata for Public Sector Administration
NISO/DCMI Webinar: Metadata for Public Sector AdministrationNISO/DCMI Webinar: Metadata for Public Sector Administration
NISO/DCMI Webinar: Metadata for Public Sector Administration
 
Session 2.1 ontological representation of the telecom domain for advanced a...
Session 2.1   ontological representation of the telecom domain for advanced a...Session 2.1   ontological representation of the telecom domain for advanced a...
Session 2.1 ontological representation of the telecom domain for advanced a...
 
OpenKM commercial
OpenKM commercialOpenKM commercial
OpenKM commercial
 
Crossing the Analytics Chasm and Getting the Models You Developed Deployed
Crossing the Analytics Chasm and Getting the Models You Developed DeployedCrossing the Analytics Chasm and Getting the Models You Developed Deployed
Crossing the Analytics Chasm and Getting the Models You Developed Deployed
 
Text Mining
Text MiningText Mining
Text Mining
 
Intake 38 data access 4
Intake 38 data access 4Intake 38 data access 4
Intake 38 data access 4
 
2017 IEEE Projects 2017 For Cse ( Trichy, Chennai )
2017 IEEE Projects 2017 For Cse ( Trichy, Chennai )2017 IEEE Projects 2017 For Cse ( Trichy, Chennai )
2017 IEEE Projects 2017 For Cse ( Trichy, Chennai )
 
Jisc Research Data Shared Service - Spring Update
Jisc Research Data Shared Service - Spring UpdateJisc Research Data Shared Service - Spring Update
Jisc Research Data Shared Service - Spring Update
 
Evaluating Machine Learning Algorithms for Materials Science using the Matben...
Evaluating Machine Learning Algorithms for Materials Science using the Matben...Evaluating Machine Learning Algorithms for Materials Science using the Matben...
Evaluating Machine Learning Algorithms for Materials Science using the Matben...
 
[DSC Europe 23] Djordje Grozdic - Transforming Business Process Automation wi...
[DSC Europe 23] Djordje Grozdic - Transforming Business Process Automation wi...[DSC Europe 23] Djordje Grozdic - Transforming Business Process Automation wi...
[DSC Europe 23] Djordje Grozdic - Transforming Business Process Automation wi...
 
Presentation from ALA Midwinter 2014 on Elsevier's new Text and Data Mining P...
Presentation from ALA Midwinter 2014 on Elsevier's new Text and Data Mining P...Presentation from ALA Midwinter 2014 on Elsevier's new Text and Data Mining P...
Presentation from ALA Midwinter 2014 on Elsevier's new Text and Data Mining P...
 
Making research data more resourceful - Jisc digital festival 2015
Making research data more resourceful - Jisc digital festival 2015Making research data more resourceful - Jisc digital festival 2015
Making research data more resourceful - Jisc digital festival 2015
 
2020 | Metadata Day | LinkedIn
2020 | Metadata Day | LinkedIn2020 | Metadata Day | LinkedIn
2020 | Metadata Day | LinkedIn
 

Último

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 

Último (20)

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 

Text Analytics in the EU Fusepool project (at II-SDV 2013 conference)

  • 1. Treparel Delftechpark 26 2628 XH Delft The Netherlands www.treparel.com Text Analytics in the EU Fusepool project II-SDV ’13 - Nice Anton Heijs CTO anton@treparel.com April 15, 2013
  • 2. Agenda   •  Introduc.on  of  the  EU  Fusepool  project,  Treparel  and  the   consor.um  partners   •  Background  on  objec.ves  of  the  Fusepool  project   •  User  adap.ve  system   •  Data  pooling  and  linking   •  Large  scale  machine  learning  and  text  analy.cs   •  Summary   Treparel KMX – All rights reserved 2013 2www.treparel.com
  • 3. The  EU  Fusepool  Consor.um   Treparel (Delft, The Netherlands) is a global software provider in Big Data Text Analytics and Visualization. Global companies, government agencies, software vendors or data publishers are using Treparel KMX text analysis software to gain faster, reliable, precise insights in large complex unstructured data sets (like application notes, blogs, email and patents) allowing them to make better informed decisions. As part of the Fusepool consortium Treparel integrates the KMX Text Analytics technology for advanced classification, clustering and visualization of large complex document collections. Treparel KMX – All rights reserved 2013 3www.treparel.com
  • 4. Each  partner  provides  cri.cal  building  blocks:     BUAS:  Dynamic  user  interfaces     ENOLL:  End  users  with  specific  needs     GEOX:  NER,  UI  design     SEARCHBOX:  text  search  and  similarity  seman.c   matching     TREPAREL:  text/patent  classifica.on  engines     XEROX:  user-­‐adap.ve  learning  know-­‐how              BUT  BREAK-­‐THROUGH  BY  INTEGRATION   The  Big  Picture   Treparel KMX – All rights reserved 2013 4www.treparel.com
  • 5. WP4 Find & match WP5 GUI & Visuals WP2 Platform, sourcing WP3 Extract & enrich WP1 Reqs & Testing WP6 User Involvement Learning Learning Learning LearningFeedback Work  packages   The EC Fusepool project is a 2 year EU project and started in July 2012 Treparel KMX – All rights reserved 2013 5www.treparel.com
  • 6. Fusing  and  pooling  informa.on     for  product  development   Vision: User-adaptive system Living Lab: Rapid app development Data processing: Sourcing & interlinking Machine learning: Matching & optimizing Integrated Use Cases Treparel KMX – All rights reserved 2013 6www.treparel.com
  • 7. Background   •  SMEs  have  a  need  for  technology  intelligence  for  detec.ng   and  responding  to  opportuni.es  and  threats   •  This  a  partly  driven  by  growth  and  complexity  of   patents  and  lawsuits   •  Consumer  intelligence  to  detect  opinions  and  needs  of   consumers  for  product  development   •  Open  innova.on  requiring  coopera.on  (links  between  data,   e.g.  finding  business  partners)   •  Focus:  Machine  Learning  algorithms  to  improve  matching     Treparel KMX – All rights reserved 2013 7www.treparel.com
  • 8. User-­‐adap.ve  system   •  Focus:  monitor  and  learn  specific  needs  and  preferences   of  a  user  to  align  features,  func.onali.es,  and  graphical   interfaces   •  AdapDve:  machine  learning  from  crowd-­‐sourcing  (rather   than  rule-­‐based)   •  User-­‐aligned  prioriDzaDon:  more  usable  and  customized   interfaces,  sugges.ons  based  on  ac.vity  &  user  feedback   Treparel KMX – All rights reserved 2013 8www.treparel.com
  • 9. User-­‐adap.ve  matching   •  Main  goal:  automated  user-­‐adap.ve  matching  of  users   to:   –  Patent  analysis   –  Finding  funding  opportuni.es   –  Partner  matching   •  Key  asset:  informa.on  provided  by  the  user     •  User  Data  Credo:    accuracy  improves  with  quan.ty  and   quality  of  user  data  while  variety  (breadth)  increases   with  number  of  users   •  Living  lab:  Co-­‐crea.on  between  creators  and  consumers   of  the  Fusepool  plaWorm   Treparel KMX – All rights reserved 2013 9www.treparel.com
  • 10. Data  sourcing   •  Sources:  internal  &  external  content  from  web  harves.ng   and  structured  data  sources   –  using  content  databases  and  linked  open  data   •  Scope:  ini.al  data  corpus  includes  all  explicitly  in-­‐  and   excluded  sources   •  Gained  value  from  InformaDon:  recommenda.ons   based  on  machine  learning  from  feedback   Treparel KMX – All rights reserved 2013 10www.treparel.com
  • 11. Data  handling   1.  Text  analysis  and  feature  extracDon:  ML  &  NLP  methods   for  categorizing,  named  en.ty  extrac.on,  etc.   2.  Shared  metadata  models:  mapping  text  features  to   exis.ng/custom  ontologies  and  genera.on  of  seman.c   triplets   →  High-­‐level  abstrac.on  &  persistence  for  reuse   →  Lightweight  storage:  mostly  metadata  only,  text   indexing  and  abstrac.on  uses  schema-­‐free  key-­‐ value  (enabling  ac.onable  facets)   Treparel KMX – All rights reserved 2013 11www.treparel.com
  • 12. Data  interlinking   •  Contextualize:  terms  are  interlinked  with  same  and   similar  terms  across  sources:   –  Enrich  the  extracted  content  with  exis.ng  informa.on  available   in  the  Internet   –  Interlink  as  much  informa.on  as  possible  to  increase  the  value  of   knowledge  extrac.on   –  Use  available  public  sector  resources  in  Seman.c  Web  and  LOD   format   •  Metadata:  when  a  user  uploads  texts  to  be  matched  with   other  content,  only  the  metadata  descriptors  are   transmiYed   •  Data  privacy:  data  fusion  from  diverse  sources  without   endangering  user  privacy   Treparel KMX – All rights reserved 2013 12www.treparel.com
  • 13. Searching  &  finding   •  Key  search-­‐oriented  features:   –  Search  through  all  content  in  the  data  pool   –  Faceted  search  (categories,  metadata,  en..es)   –  Integra.on  of  Linked  Open  Data  (LOD)  results   –  Cross-­‐lingual  indexing  and  cross-­‐referencing   –  “Did  you  mean?”-­‐func.onality  in  case  of  typos  and  auto-­‐ comple.on  of  search  queries   •  User-­‐adapDve:  indexing  and  integra.on  based  on  users   needs  (e.g.  user  profiling)   Treparel KMX – All rights reserved 2013 13www.treparel.com
  • 14. Adapta.on  &  refinement   •  AdapDve  search:  results  are  aligned  to  user  preferences   based  on  analysis  of  user  implicit  and  explicit  feedback     •  MulD-­‐task  ranking:  good  trade-­‐off  between  user-­‐ independent  search  (high  coverage  but  lower  precision)   and  a  very  customized  approach   •  Query  intent  discovery:  analysis  of  the  query  structure   and  interlinking  of  queries   Treparel KMX – All rights reserved 2013 14www.treparel.com
  • 15. Correla.ng  &  matching   •  Search  guided  navigaDon:  seman.c  matching  extracts   contextual  rela.onships  to  list  related  content   –  sugges.ons  organized  by  categories   –  exposing  facets  within  related  content   •  Distributed  rule  and  event  model:  defines  states,   ac.ons,  and  consequences  (e.g.  no.fica.ons,   visualiza.ons)  for  reasoning  based  on  light-­‐weight   ontologies   Treparel KMX – All rights reserved 2013 15www.treparel.com
  • 16. Crowd  sourcing  &     supervised  automa.on   •  RelaDonal  learning:  related  instances  are  used  to  reason   about  the  focal  instance   –  Ra.onality  of  content  (links  to  other  content,  people,  etc.)   provide  rich  informa.on   –  Similari.es/dissimilari.es  to  other  content  is  established  purely   on  rela.onal  proper.es   Treparel KMX – All rights reserved 2013 16www.treparel.com
  • 17. Visualiza.on   Clustering   Classifica.on   Text  Preprocessing  and  Indexing   Acquire  documents   Present  Results   Taxonomies,   Ontologies   Seman.c  Analysis   Document  level  analysis   using  the  KMX  technology     KMX  unique  func.ons:   •  Extract  concepts  in  context   using  clustering  and   classifica.on  of  documents   •  Use  classifica.on  to  create   ranked  lists  and  to  tag  subsets   •  Support  of  binary  and  mul.-­‐ class  Classifica.on   •  Integra.on  with  other   applica.ons  through  KMX  API   Treparel KMX – All rights reserved 2012 www.treparel.com 17 Query & Search Tools
  • 18. Treparel KMX – All rights reserved 2013 18www.treparel.com NER  in  the  landscaping    
  • 19. Sentence  level  analysis  using     Named  En.ty  Recogni.on   •  The  aim  of  NER  is  to  iden.fy  en..es  in  unstructured  text   documents.   o  To  locate  :  mark-­‐up  the  en..es   o  To  classify  :  into  predefined  categories/domains   •  Aim  of  usage   o  To  recognise  trends  (trained  and  new  high  frequency)   o  To  find  all  „trained”  en..es   o  To  „discover”  new  en.tes   •  NER  approaches   o  Sta.s.cs  based  (supervised  machine  learning)   o  Rule  based  (regular  expressions)   Treparel KMX – All rights reserved 2013 19www.treparel.com
  • 20. Building  a  NER  model   Treparel KMX – All rights reserved 2013 20www.treparel.com
  • 21. 3  NER  model  examples     Training  text:  500  patents  from  EPO     Training:  model  building  by  Stanford  NER   •  LTE  (long  term  evoluDon)   •  F1:  88%   •  New  en..es:  „GSM  BSS”,  „LTE  TDD”   •  False  pos:  „Loca.on  Area  LA”   •  Elements   •  F1:  98%   •  False  pos:  „argon/hydrogen”   •  Cancer   •  F1:  87%   •  New  en..es:  „myeloma  cell”,  „tumor  .ssue”   •  False  pos:  „cell  mortality”,  „the  test  compound”   Treparel KMX – All rights reserved 2013 21www.treparel.com
  • 22. Using  NER  in  a  GUI   Treparel KMX – All rights reserved 2013 22www.treparel.com
  • 23. Raw tex Extract  terms  indica.ng   domains  using  patent   classifica.on  codes   Content  /  En.ty   Hub   Generate  Vector  Space  Model   Document Vectors Generate  Classifiers  to  es.mate  domains   Annotated tex with domain labels Domain labels Training Vectors DocId : 1122 Title : Pesticide device Classification Code : A61B Text: Hihaho Etc etc Table A61 : pesticide A61B : pesticide solvent Etc etc Many Vectors Doc-Id : 1122 750 terms + weights Classifier  on  pes.cide  solvents   25 Positive vectors Doc-Id’s Label 750 terms + weights 25 negative vectors Doc-Id’s Label 750 terms + weights Doc-Id : 1122 Labels + scores Title : Pesticide device Classification Code : A61B Text: Hihaho …Etc etc 1 2 3 4 5 6 7 8 9 10   11   Combining  text  analysis   approaches   Treparel KMX – All rights reserved 2013 23www.treparel.com
  • 24. •  Data  sourcing:  retrieves  data  from  data  sources  via  data   integrator   •  Storage:  raw  (e.g.  text)  or  processed  data  stored  in   database  or  triple  store   •  Using  ML  to  enable  learning  from  the  crowd   •  Push  &  Pull:       interface  to  consumers  (web  and  mobile  apps)     suppor.ng  quality  and  access  control  from  portal   •  Portal:  business  logic,  storage  of  registered  data  sources,   access  control  using  web  frontend   Somware  architecture   Treparel KMX – All rights reserved 2013 24www.treparel.com
  • 25. Open  Call  for  Users   •  ApplicaDons  received  from  Finland,  Germany,  Spain,  Greece,   Bulgaria,  Hungary,  Italy,  UK,  Belgium,  China,  Switzerland,  France,   Denmark,  Portugal,  Ireland  and  The  Netherlands     •  Business  areas  covered:     –  Bio-­‐medical,   –  Pharma  and  biotech,   –  ICT/Telecommunica.ons,     –  Digital  media,     –  Renewable  energies,     –  Educa.on,     –  Innova.on/consultancy  services  for  SMEs.     •  Mix  of  profiles  from  SMEs,  Research,  SME  intermediaries   (incubators,  Science  parks,  Living  Labs,  etc),  developers  and  more   •  MulDple  areas  of  research,  development  and  innova.on  areas  are   covered.   25Treparel KMX – All rights reserved 2013 25www.treparel.com
  • 26. •  Data  as  a  service:  scale  economies  of  scale  in  management  of  data     •  Data  pooling:  processes  need  .mely  aggrega.on  and  redistribu.on   of  diverse  data  but  building  own  is  redundant  and  prohibi.ve  for   SMEs     provide  services  on  top  of  pool  with  high  quality  data     provide  access  to  services  on  demand   •  Success  criteria:     Early  provision  of  scalable  basic  Fusepool  services     SME  involvement  and  uptake  of  Fusepool  services     Machine  learning  for  data  cura.on  &  user  adapta.on   •  Required  steps:     Stepwise  integra.on  of  exis.ng  &  new  technologies     Early  and  ongoing  feedback  from  end  users   EC  perspec.ve   Treparel KMX – All rights reserved 2013 26www.treparel.com
  • 27. The  EC  FP  7  program  Fusepool  is  all  about:     •  Building  a  plaWorm  with  web  enabled  services  for   •  Data  pooling   •  Large  scale  text  analysis   •  Large  scale  machine  learning  of  user  input   •  Enable  SME’s  with  analy.cs  for  improving  their   innova.on  and  compe..ve  strengths  using   •  SME’s  involvement  and  feedback  to  the  Fusepool  services   •  Machine  learning  for  data  cura.on  &  user  adapta.on   Summary   Treparel KMX – All rights reserved 2013 27www.treparel.com Welcome to visit Fusepool at: www.fusepool.net