TEXT MINING, TERM MINING,
AND VISUALIZATION
IMPROVING THE IMPACT OF SCHOLARLY
PUBLISHING
MONDAY 16 APRIL 2012
NICE, FRANCE
Marjorie M.K. Hlava, President
Jay Ven Eman, CEO
Access Innovations, Inc.
mhlava@accessinn.com
J_ven_eman@accessinn.com
What we will cover today
• Term and Text Mining
• The basics of visualization
• Case studies
• Using subject terms as metrics
• Applications
• Visualizing the results
Definitions
• Term Mining - using controlled vocabulary tags in text to
find patterns and directions
• Text Mining - a systematic, algorithmic method of processing
and comparing text to find patterns
• Term & text mining
 Many similarities
 Can be complementary; not mutually exclusive
Term mining
• Precise
 Meaningful semantic relationships; contextual
 Replicable; repeatable; consistent
 Vetted; controlled
 Based on a controlled vocabulary
 Trends; gaps; relationship analysis; visualizations
 Less data processing load
Text mining
 Algorithmic; formulaic
 Neural nets, statistical, latent semantic, co-occurrence
 Serendipitous relationships
 Sentiment; hot topics; trends
 False drops; noise
 Misleading semantic relationships
 Heavy processing load
Why take a visual look?
• Humans can process information 17 times faster in visual
presentations
• Now data can be analyzed, manipulated and presented as visual
displays.
• To see the trends effectively, we need to put the data into rich,
graphable formats
Visualization of data
• Needs
− Measurement
− Metrics
− Numbers
• Shows
− Adjacency
− Relationships
− Trends
− Co-occurrence
− Conceptual distance
• Is richer with
− Linking
− Semantic enrichment
− Classification
• Supports
− Forecasting
− Trend analysis
− Segmentation
− Distribution
Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony
The use of visual display to convey
knowledge is ancient
The art in maps is a longstanding tradition
Superimposing data is now common
A mashup example
Traffic Injury Map
• Data sources: UK Data Archive, US National Highway Traffic Safety Administration
• Base map: Google Maps
• Accident categories include children, automobile, bicycle, etc.
• Data: time, place, type
Source: JISC TechWatch: Data Mash-ups, September 2010
Mashup of bird flight migrations and
weather patterns
http://www.youtube.com/watch?v=uPff1t4pXiI&feature=youtu.be
http://www.youtube.com/watch?v=nokQBjk1s_8&feature=player_embedded
How does it work?
 Develop controlled vocabulary
» Prefer one with hierarchy
 Apply to full text
» Or to the “heads”
 Decide on data points to convey information
 Divide the XML into graphable sections
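The first two steps above can be sketched in a few lines. This is a minimal illustration, not the Data Harmony rule base: the `THESAURUS` entries, the synonym lists, and the function names `tag_text` and `broader_chain` are all hypothetical, and matching is plain whole-word lookup.

```python
import re
from collections import Counter

# Hypothetical mini-thesaurus: preferred term -> (broader term, synonyms).
THESAURUS = {
    "Lasers": ("Optics", ["laser", "lasers"]),
    "Optics": (None, ["optics", "optical"]),
    "Amplifiers": ("Circuits", ["amplifier", "amplifiers"]),
    "Circuits": (None, ["circuit", "circuits"]),
}

def tag_text(text):
    """Count how often each preferred term is matched in the text."""
    counts = Counter()
    lowered = text.lower()
    for term, (_broader, synonyms) in THESAURUS.items():
        for syn in synonyms:
            # Whole-word match, so "laser" is not counted inside "lasers".
            pattern = r"\b" + re.escape(syn) + r"\b"
            counts[term] += len(re.findall(pattern, lowered))
    return {term: n for term, n in counts.items() if n > 0}

def broader_chain(term):
    """Return the path from a term up to its top-level ancestor."""
    chain = [term]
    while THESAURUS[chain[-1]][0] is not None:
        chain.append(THESAURUS[chain[-1]][0])
    return chain
```

A hierarchy is what makes the later roll-ups possible: `broader_chain("Lasers")` walks from the narrow term to its top-level parent, so counts can be shown at whichever level suits the graph.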
Start with data – like this XML file
Index or tag using subject terms from
thesaurus or taxonomy
 date, category, taxonomy term, frequency
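As a rough sketch of how tagged XML could become the four data points named above, the block below counts terms per date and category. The record layout (`record`, `date`, `category`, `term` elements) is an assumption made for illustration; the XML file shown on the slide is not reproduced here.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical record layout; element names are illustrative only.
SAMPLE_XML = """
<records>
  <record>
    <date>2010</date>
    <category>journal</category>
    <term>Lasers</term>
    <term>Optics</term>
  </record>
  <record>
    <date>2010</date>
    <category>journal</category>
    <term>Lasers</term>
  </record>
  <record>
    <date>2011</date>
    <category>proceedings</category>
    <term>Optics</term>
  </record>
</records>
"""

def term_rows(xml_text):
    """Return sorted (date, category, taxonomy term, frequency) rows."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for record in root.iter("record"):
        date = record.findtext("date")
        category = record.findtext("category")
        for term in record.findall("term"):
            counts[(date, category, term.text)] += 1
    return sorted((d, c, t, n) for (d, c, t), n in counts.items())
```

Each row is one graphable data point, so the same table can feed a time series, a category breakdown, or a term ranking.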
Many views of one set of data
Load to a visualization program
Like Prefuse
Or Pajek
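Loading into Pajek means writing the term network in its plain-text .net format: a `*Vertices` section of numbered labels followed by an `*Edges` section of weighted pairs. The sketch below assumes the input is a list of (term, term, weight) tuples; the function name `to_pajek` and the sample terms are hypothetical.

```python
def to_pajek(edges):
    """Serialize weighted term pairs as Pajek .net text."""
    # Pajek numbers vertices from 1; sort labels for a stable numbering.
    labels = sorted({term for a, b, _w in edges for term in (a, b)})
    index = {label: i for i, label in enumerate(labels, start=1)}
    lines = [f"*Vertices {len(labels)}"]
    lines += [f'{i} "{label}"' for label, i in index.items()]
    lines.append("*Edges")
    lines += [f"{index[a]} {index[b]} {w}" for a, b, w in edges]
    return "\n".join(lines) + "\n"
```

Writing the returned string to a file with a `.net` extension should give something Pajek can open; other tools need their own loaders.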
National Information Center
for Educational Media
Albuquerque’s own
» Sandia developed VxInsight
» Access Innovations = NICEM
Same data – several views
Primary and secondary education in the US
Shows the US “Valley of Science”:
little science is taught in the elementary years
Using visualization to show
 From a society / publisher perspective
» Identify Core, Boundary and Cross Border
» Provides Indicators
 Activity
 Growth
 Relatedness
 Centrality
» Locates Journal domains
 From a thesaurus perspective
» Identifies terms that are too broadly defined
» Potential Improvements in thesaurus structure using topic
structures
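One possible way to turn the four indicators above into numbers is sketched below. The definitions are assumptions chosen for illustration, not the measures used in the study: activity is a yearly document count, growth is relative change between two years, relatedness is the Jaccard overlap of two terms' document sets, and centrality is the number of distinct co-indexed terms. `DOCS` is made-up data.

```python
# Hypothetical indexed documents: (year, set of subject terms).
DOCS = [
    (2010, {"Lasers", "Optics"}),
    (2010, {"Optics"}),
    (2011, {"Lasers", "Optics"}),
    (2011, {"Lasers", "Sensors"}),
    (2011, {"Lasers"}),
]

def activity(term, year):
    """Number of documents indexed with the term in a given year."""
    return sum(1 for y, terms in DOCS if y == year and term in terms)

def growth(term, start, end):
    """Relative change in activity; None if there was no activity at start."""
    before = activity(term, start)
    return (activity(term, end) - before) / before if before else None

def relatedness(a, b):
    """Jaccard overlap of the document sets of two terms."""
    docs_a = {i for i, (_y, terms) in enumerate(DOCS) if a in terms}
    docs_b = {i for i, (_y, terms) in enumerate(DOCS) if b in terms}
    union = docs_a | docs_b
    return len(docs_a & docs_b) / len(union) if union else 0.0

def centrality(term):
    """Degree centrality: how many distinct terms co-occur with this one."""
    neighbours = set()
    for _y, terms in DOCS:
        if term in terms:
            neighbours |= terms - {term}
    return len(neighbours)
```

A term with low centrality and low relatedness to the rest of the set is a candidate boundary topic; one connected to many terms sits nearer the core.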
Case Study:
Mapping IEEE thesaurus space
 We are interested in an expanded map that
includes adjacencies to the IEEE data
» Expanded term set shows adjacent white space;
opportunities for expansion
 Overlaps and edges of the science
» We need comparison data
 Learn the directions in the field
» Low occurrence rate in IEEE documents?
» Linkage to terms in IEEE documents?
 Where do we find these terms? How can we
add them?
The process
 Built a rule base to auto index IEEE content
» “90% accuracy out of the box on journal data”*
» “80% out of the box on proceedings data”*
 The overlapping data sets
» Auto indexed 1.2 million Xplore records
» Auto indexed 10 years of US Patent data
» Auto indexed 10 years of Medline
 Term sets used
» IEEE thesaurus terms rule base
» Medical Subject Headings (MeSH) (and simple rule base)
» Defense Technical Information Center (DTIC) Thesaurus (and simple rule base)
» Similar level of detail to current IEEE thesaurus terms
Defining expanded term space
1. The data - Select related corpus
» IEEE: 2k terms, 1.2M documents
» DTIC: 14k terms, 475k patents
» MeSH: 24k terms, 525k PubMed docs
2. Identify related terms
» Use the IEEE Thesaurus to index the three collections
» Use MeSH and DTIC to also index the three collections
3. Resulting term set
» The co-indexed items from the three collections
4. Term:Term Matrix
» Where do the articles and their indexing intersect?
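The term:term matrix can be sketched as a co-occurrence count: every pair of terms assigned to the same document adds one to that pair's cell. The vocabulary-prefixed labels in the usage note are illustrative placeholders, not actual thesaurus entries, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(docs):
    """Count how many documents each pair of terms shares."""
    pairs = Counter()
    for terms in docs:
        # Sort so each unordered pair is always stored the same way round.
        for a, b in combinations(sorted(set(terms)), 2):
            pairs[(a, b)] += 1
    return pairs

def as_matrix(pairs):
    """Expand pair counts into a symmetric term-by-term matrix."""
    labels = sorted({term for pair in pairs for term in pair})
    matrix = [[0] * len(labels) for _ in labels]
    for (a, b), n in pairs.items():
        i, j = labels.index(a), labels.index(b)
        matrix[i][j] = matrix[j][i] = n
    return labels, matrix
```

For example, three documents indexed as `["IEEE:Lasers", "MeSH:Laser Therapy"]`, `["IEEE:Lasers", "MeSH:Laser Therapy", "DTIC:Optical Radar"]` and `["DTIC:Optical Radar"]` give a 3 x 3 matrix in which the IEEE and MeSH terms intersect twice. Cells that link terms from different vocabularies are where the expanded term space shows up.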
Visualization Strategies
Matrix
Visualization
Software
All data up-posted to the top level
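Up-posting is read here as rolling each term's count up to its top-level broader term, so that a map can be drawn at the coarsest level of the hierarchy. That reading, the `BROADER` mapping, and the function names are assumptions for illustration.

```python
from collections import Counter

# Hypothetical hierarchy: term -> broader term (None marks a top-level term).
BROADER = {
    "Lasers": "Optics",
    "Optics": "Physics",
    "Physics": None,
    "Amplifiers": "Circuits",
    "Circuits": None,
}

def top_level(term):
    """Follow broader-term links until a top-level term is reached."""
    while BROADER.get(term) is not None:
        term = BROADER[term]
    return term

def up_post(counts):
    """Add every term's count to its top-level ancestor."""
    rolled = Counter()
    for term, n in counts.items():
        rolled[top_level(term)] += n
    return dict(rolled)
```

Rolling up trades detail for legibility: a few top-level nodes are easier to read on a map than thousands of leaf terms.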
Many map options
IEEE Experience
Previous Experience
IEEE Portfolio
[Figure: map of IEEE societies and councils, with node labels such as Sensors Council, Computer Society, Signal Proc Soc, Photonics Soc]
Radial Visualization
Publication Strategy
JASIST reference
Conference Strategy
Use a Thesaurus to Label Maps
[Figure: map of topic clusters (e.g. Turbines, Circuits, Lasers, Semiconductors, Med Instruments) labeled with broader sector terms: Agriculture, Food, Consumer Products, Construction, Automotive + Defense, Industrial Products, Leisure, Energy, Telecom, Computer HW/SW, Electronics, Chemicals, Pharma, Metals, Health Care]
Questions Answered
 Is there a way, using our own information, to forecast our
direction?
 Where is the industry headed? What about by technology
sector?
 Does our coverage match our mission and vision?
 Can we become smarter about our data and potential
markets using our collection in new ways?
 Are the societies publishing and talking about what their
charter indicates they cover?
 What are the trends – are topics emerging/cooling?
 Can we use technology and our own data to explore these
questions while enhancing our data?
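The emerging/cooling question can be approached by fitting a trend line to each term's yearly document counts. The sketch below uses an ordinary least-squares slope over consecutive years; the `threshold` value and the three labels are arbitrary assumptions, not figures from the study.

```python
def slope(series):
    """Least-squares slope of counts over consecutive years (needs 2+ points)."""
    n = len(series)
    mean_x = (n - 1) / 2
    mean_y = sum(series) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(series))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

def classify(series, threshold=0.5):
    """Label a term's yearly counts as emerging, cooling, or stable."""
    s = slope(series)
    if s > threshold:
        return "emerging"
    if s < -threshold:
        return "cooling"
    return "stable"
```

With yearly counts of `[2, 4, 6, 8]` a term is labeled emerging, and with `[9, 7, 4, 2]` it is labeled cooling. Raw counts favor large topics, so in practice the counts might first be normalized by total output per year.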
The research team
 Access Innovations / Data Harmony
» Founded in 1978
» Data enrichment and normalization
» Suite of Semantic Enrichment tools
 SciTechStrategies
» Understanding data through visualization
 IEEE Indexing & Abstracting Group
We looked at
visualization of data
 Finding the Metrics
» Measurement
» Numbers
» Terms as indicators
 Ways to show
» Adjacency
» Relationships
» Trends
» Co-occurrence
» Conceptual distance
 How to enrich with
» Linking
» Semantic enrichment
» Classification
 Maps supporting
» Forecasting
» Trend analysis
» Segmentation
» Distribution
Effective maps require
 Contextual data
 Detailed data
 Classification methods
 At least two directions in the matrix
 A little art for fun
It just takes a little imagination
Thank you
Marjorie M.K. Hlava
President
mhlava@accessinn.com
Jay Ven Eman, CEO
J_ven_eman@accessinn.com
Access Innovations
505-998-0800

Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...pradhanghanshyam7136
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
Plant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxPlant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxUmeshTimilsina1
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxDr. Ravikiran H M Gowda
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...Amil baba
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxPooja Bhuva
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structuredhanjurrannsibayan2
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Pooja Bhuva
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.MaryamAhmad92
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - Englishneillewis46
 
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptxCOMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptxannathomasp01
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17Celine George
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxJisc
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and ModificationsMJDuyan
 

Último (20)

Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Plant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxPlant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptx
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptx
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptxCOMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptx
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 

Text Mining, Term Mining, and Visualization - Improving the Impact of Scholarly Publishing

  • 1. TEXT MINING, TERM MINING, AND VISUALIZATION: IMPROVING THE IMPACT OF SCHOLARLY PUBLISHING • MONDAY 16 APRIL 2012, NICE, FRANCE • Marjorie M.K. Hlava, President, mhlava@accessinn.com • Jay Ven Eman, CEO, J_ven_eman@accessinn.com • Access Innovations, Inc.
  • 2. What we will cover today • Term and Text Mining • The basics of visualization • Case studies • Using subject terms as metrics • Applications • Visualizing the results
  • 3. Definitions • Term Mining – using controlled vocabulary tags in text to find patterns and directions • Text Mining – a systematic, algorithmic comparison method to find patterns in text • Term & text mining − Many similarities − Can be complementary; not mutually exclusive
  • 4. Term mining • Precise − Meaningful semantic relationships; contextual − Replicable; repeatable; consistent − Vetted; controlled − Based on a controlled vocabulary − Trends; gaps; relationship analysis; visualizations − Less data processing load
  • 5. Text mining • Algorithmic; formulaic • Neural nets, statistical, latent semantic, co-occurrence • Serendipitous relationships • Sentiment; hot topics; trends • False drops; noise • Misleading semantic relationships • Heavy processing load
  • 6. Why take a visual look? • Humans can process information 17 times faster in visual presentations • Now data can be analyzed, manipulated and presented as visual displays. • To see the trends effectively we need to make the data into rich graph-able formats
  • 7. Visualization of data • Needs − Measurement − Metrics − Numbers • Shows − Adjacency − Relationships − Trends − Co-occurrence − Conceptual distance • Is richer with − Linking − Semantic enrichment − Classification • Supports − Forecasting − Trend analysis − Segmentation − Distribution
  • 8. Man’s attention to visual display to convey knowledge is ancient
  • 9. The art in maps is a longstanding tradition
  • 10. Superimposing data is now common • A mash-up example: Traffic Injury Map • UK Data Archive • US National Highway Safety Administration • Google Maps base • Accident categories include children, automobile, bicycle, etc. • Data: time, place, type • Source: JISC TechWatch: Data Mash-ups, September 2010
  • 11. Mash-up of bird flight migrations and weather patterns http://www.youtube.com/watch?v=uPff1t4pXiI&feature=youtu.be
  • 12. http://www.youtube.com/watch?v=nokQBjk1s_8&feature=player_embedded
  • 13. How does it work? • Develop controlled vocabulary » Prefer one with hierarchy • Apply to full text » Or to the “heads” • Decide on data points to convey information • Divide the XML into graphable sections
  • 14. Start with data – like this XML file
  • 15. Index or tag using subject terms from thesaurus or taxonomy • date, category, taxonomy term, frequency
  • 16. Many views of one set of data
  • 17. Load to a visualization program, like Prefuse
  • 18. Or Pajek
  • 19. (no text on slide)
  • 20. National Information Center for Educational Media • Albuquerque’s own » Sandia developed VxInsight » Access Innovations = NICEM • Same data – several views • Primary and Secondary Education in US • Shows the US Valley of Science • Little science taught in elementary years
  • 21. (no text on slide)
  • 22. (no text on slide)
  • 23. Using visualization to show • From a society / publisher perspective » Identify Core, Boundary and Cross Border » Provides indicators: Activity, Growth, Relatedness, Centrality » Locates journal domains • From a thesaurus perspective » Identifies terms that are too broadly defined » Potential improvements in thesaurus structure using topic structures
  • 24. Case Study: Mapping IEEE thesaurus space • We are interested in an expanded map that includes adjacencies to the IEEE data » Expanded term set shows adjacent white space; opportunities for expansion • Overlaps and edges of the science » We need comparison data • Learn the directions in the field » Low occurrence rate in IEEE documents? » Linkage to terms in IEEE documents? • Where do we find these terms? How can we add them?
  • 25. The process • Built a rule base to auto index IEEE content » “90% accuracy out of the box on journal data”* » “80% out of the box on proceedings data”* • The overlapping data sets » Auto indexed 1.2 million Xplore records » Auto indexed 10 years of US Patent data » Auto indexed 10 years of Medline • Term sets used » IEEE thesaurus terms rule base » Medical Subject Headings (MeSH) (and simple rule base) » Defense Technical Information Center (DTIC) Thesaurus (and simple rule base) » Similar level of detail to current IEEE thesaurus terms
  • 26. Defining expanded term space • 1. The data: select related corpus • IEEE 2k terms, 1.2M documents • 14k DTIC, 475k patents • 24k MeSH, PubMed 525k docs
  • 27. Defining expanded term space • IEEE 2k terms, 1.2M documents • 2. Identify related terms: use the IEEE Thesaurus to index the three collections
  • 28. Defining expanded term space • IEEE 2k terms, 1.2M documents • 2. Identify related terms: use MeSH and DTIC to also index the three collections
  • 29. Defining expanded term space • IEEE 2k terms, 1.2M documents • 3. Resulting term set: the co-indexed items from the three collections
  • 30. Defining expanded term space • 4. Term:Term Matrix: where do the articles and their indexing intersect?
  • 31. Visualization Strategies • Matrix • Visualization Software
  • 32. All data up-posted to the top level
  • 33. Many map options • IEEE Experience • Previous Experience
  • 34. IEEE Portfolio (map labels): Sensors Council; Nucl Plasma Sci Soc; Nanotech Council; Ultrason, Ferro …; Prod Saf Engng Soc; Oceanic Engng Soc; Geosci Rem Sens Soc; Council Supercond; Compon, Packag …; Instr Measur Soc; Magnetics Soc; Dielectr El Insul Soc; Electromag Compat Soc; Antennas Propag Soc; Power Electron Soc; Electron Dev Soc; Circuits & Systems; Power & Energy Soc; Industry Appl Soc; Solid St Circuits Soc; Industr Electr Soc; Microwave Theory Soc; Aerosp Electr Sys Soc; Sys Man Cyber Society; Computer Intelligence Society; Systems Council; Reliability Society; Education Society; Prof Commun Society; Computer Society; Robot Autom Soc; Social Impl Techn; Council Electr Design Auto; Signal Proc Soc; Intell Transp Sys Soc; Commun Soc; Info Theory Soc; Vehicular Techn Soc; Consumer Electr Soc; Broadcast Techn Soc; Photonics Soc; Eng Med Biol Sci
  • 35. Radial Visualization
  • 36. Publication Strategy • JASIST reference
  • 37. Conference Strategy
  • 38. Use a Thesaurus to Label Maps (map labels): Turbines Measurement Circuits Amplifiers Displays Games Toys Flow Cooling Heating Components Gearing Brakes Dynamics Vehicles, Parts Disk Optics Photochem Molding Conductors Coatings Lasers Lamps Motors Plants, Micro-orgs Control Boats Oilfield Services Med Instruments Welding Conveyers Rubber Acyclic Comp Footwear Lubricants Radiology Catalysis Macromolecules Sprayers Electrochem Fitness Hygiene Cleaning Printing Paper IC Engines Magn/Elect Magnets Textiles Layers Medical Devices Clocks Pipes Valves Blasting Cables Appliances Outerwear Exhaust Pumps Packaging Aircraft Semiconductors (sector labels): Agriculture Food Consumer Products Construction Automotive + Defense Industrial Products Leisure Energy Telecom Computer HW/SW Electronics Chemicals Pharma Metals Health Care
  • 39. Questions Answered • Is there a way, using our own information, to forecast our direction? • Where is the industry headed? What about by technology sector? • Does our coverage match our mission and vision? • Can we become smarter about our data and potential markets using our collection in new ways? Are the societies publishing and talking about what their charter indicates they cover? • What are the trends – are topics emerging/cooling? • Can we use technology and our own data to explore these questions while enhancing our data?
  • 40. The research team • Access Innovations / Data Harmony » Founded in 1978 » Data enrichment and normalization » Suite of Semantic Enrichment tools • SciTechStrategies » Understanding data through visualization • IEEE Indexing & Abstracting Group
  • 41. We looked at visualization of data • Finding the metrics » Measurement » Numbers » Terms as indicators • Ways to show » Adjacency » Relationships » Trends » Co-occurrence » Conceptual distance • How to enrich with » Linking » Semantic enrichment » Classification • Maps supporting » Forecasting » Trend analysis » Segmentation » Distribution
  • 42. Effective maps require • Contextual data • Detailed data • Classification methods • At least two directions in the matrix • A little art for fun
  • 43. It just takes a little imagination • Thank you • Marjorie M.K. Hlava, President, mhlava@accessinn.com • Jay Ven Eman, CEO, J_ven_eman@accessinn.com • Access Innovations, 505-998-0800
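
The steps on slides 13, 15, 30 and 32 (tag records with thesaurus terms, count term frequency by date, up-post to the top level, and build a term:term matrix) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny hierarchy, the records, and every function name below are hypothetical, and it does not reproduce the Data Harmony rule base or the visualization tools named in the slides.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy hierarchy: term -> broader term (None marks a top-level term).
BROADER = {
    "Lasers": "Optics",
    "Photonics": "Optics",
    "Amplifiers": "Circuits",
    "Optics": None,
    "Circuits": None,
}

# Hypothetical tagged records: (year, set of thesaurus terms assigned to the record).
RECORDS = [
    (2010, {"Lasers", "Amplifiers"}),
    (2011, {"Photonics", "Amplifiers"}),
    (2011, {"Lasers", "Photonics"}),
]


def top_level(term):
    """Follow broader-term links until a term with no parent is reached."""
    while BROADER.get(term) is not None:
        term = BROADER[term]
    return term


def up_post(records):
    """Replace every term in every record by its top-level ancestor."""
    return [(year, {top_level(t) for t in terms}) for year, terms in records]


def term_frequency_by_year(records):
    """Count how many records carry each term in each year."""
    counts = Counter()
    for year, terms in records:
        for term in terms:
            counts[(year, term)] += 1
    return counts


def cooccurrence(records):
    """Term:term matrix as a sparse dict: unordered term pair -> number of shared records."""
    counts = Counter()
    for _year, terms in records:
        for a, b in combinations(sorted(terms), 2):
            counts[(a, b)] += 1
    return counts


if __name__ == "__main__":
    print(term_frequency_by_year(RECORDS))
    print(cooccurrence(RECORDS))
    print(cooccurrence(up_post(RECORDS)))
```

The pair counts are the kind of edge weights a network tool can draw as a map; sorting each record's terms before pairing keeps (A, B) and (B, A) in a single cell of the matrix.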

Editor's Notes

  1. Access Innovations and its software brand, Data Harmony, are known for high-quality data: clean, well formed, and accurately semantically enriched. They updated the IEEE thesaurus in 2005 and built a rule base for indexing at the same time. At the launch of the project, the application of the terms to IEEE content was 90% accurate on journal data (that is, 90% of the suggested terms were ones a well-trained indexer would choose from the controlled vocabulary) and 80% accurate on the more difficult proceedings data. The rule base has improved since then, and the IEEE production team now needs to spot-check only about 10% of the documents to ensure that a high standard of indexing is maintained. This has allowed IEEE to process many more documents with the same team and has made the work more enjoyable. Because the Data Harmony software removes many of the clerical aspects of indexing, the indexers have time to think about the content, the thesaurus terms, what should be added, and what other information can be collected to keep enriching the files. The accuracy is high enough that we indexed the entire contents of the Xplore database, back to the earliest records, in a single overnight process. Then, to explore the edges of science, we also indexed the 1.2 million records using the Medical Subject Headings (MeSH) and Defense Technical Information Center (DTIC) thesauri, with similar accuracy.
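
The accuracy figure in the note above is described as the share of suggested terms that a trained indexer would also choose. A minimal sketch of that calculation follows; the function name and the sample term lists are hypothetical, and the note does not say how the figure was actually measured or aggregated, so pooling over documents here is an assumption.

```python
def suggestion_accuracy(documents):
    """Share of machine-suggested terms that the human indexer also accepted.

    `documents` is a list of (suggested_terms, accepted_terms) pairs.
    Counts are pooled over all documents (an assumed way of aggregating).
    """
    suggested_total = 0
    accepted_total = 0
    for suggested, accepted in documents:
        suggested = set(suggested)
        suggested_total += len(suggested)
        accepted_total += len(suggested & set(accepted))
    if suggested_total == 0:
        return 0.0
    return accepted_total / suggested_total


# Hypothetical sample: 10 suggested terms in total, 9 of them accepted.
SAMPLE = [
    (["Lasers", "Optics", "Amplifiers", "Circuits", "Sensors"],
     ["Lasers", "Optics", "Amplifiers", "Circuits"]),
    (["Antennas", "Radar", "Filters", "Noise", "Coding"],
     ["Antennas", "Radar", "Filters", "Noise", "Coding", "Modulation"]),
]

if __name__ == "__main__":
    print(f"{suggestion_accuracy(SAMPLE):.0%}")
```

Note that this measure only looks at terms the software suggested; terms an indexer adds that were never suggested (such as "Modulation" in the sample) do not lower it.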