Found in Space: Creating and Visualizing IEEE Document Space
1. SciTech Strategies, Inc.
Better Maps Better Decisions
Found in Space:
Creating and Visualizing IEEE Document Space
IEEE William Pickering
Access Innovations / Data Harmony Marjorie M.K. Hlava
SciTech Strategies Dick Klavans
June 13, 2011
2. Agenda
IEEE Challenge
» Where are our publication strengths?
» What are the emerging topics?
Access Innovations' Response
» Expanding the IEEE Thesaurus
SciTech's Support
» Mapping the Expanded IEEE Thesaurus
Lessons Learned
SciTech Strategies Better Maps Better Decisions Well Formed Data • Semantic Enrichment • Access Innovations • Data Harmony
4. About IEEE…
Founded in 1884, IEEE is the world’s largest professional
association advancing technology for the benefit of humanity.
We publish 150 technical journals, transactions and
magazines, sponsor nearly 1200 conferences annually,
develop technology standards, and support the professional
interests of more than 400,000 members in over 160
countries.
Members participate in 38 societies and 7 councils
The IEEE Xplore® digital library provides access to IEEE
journals, transactions, letters, magazines and conference
proceedings, IET and other 3rd Party journals and conference
proceedings, IEEE Standards and IEEE educational courses.
– Approaching 3 million documents
5. Specific Challenges
Is there a way, using our own information, to forecast our direction?
Where is the industry headed? What about by technology sector?
Does our coverage match our mission and vision?
Can we become smarter about our data and potential markets using
our collection in new ways?
Are the societies publishing and talking about what their charter
indicates they cover?
What are the trends – are topics emerging/cooling?
Can we use technology and our own data to explore these questions
while enhancing our data?
6. Access Innovations' Response
Access Innovations' Thesaurus
Expanding the IEEE Thesaurus
Requirements for Visualization
7. Access Innovations / Data Harmony
Founded in 1978
Suite of Semantic Enrichment tools
Updated the IEEE Thesaurus in 2005
Built a rule base to auto index IEEE content
» “90% accuracy out of the box on journal data”*
» “80% out of the box on proceedings data”*
Auto indexed 1.2 million Xplore records
» With the IEEE thesaurus terms rule base
» With the MeSH rule base
» With DTIC rule base
*Adam D. Philippidis, Manager, Indexing & Database Production, IEEE
8. Mapping IEEE thesaurus space
We are more interested in an expanded map that
includes adjacencies to the IEEE data
» Expanded term set shows adjacent white space; opportunities
for expansion
» Similar process to that for simple map except …
» We need additional terms to add
Criteria for additional terms
» Low occurrence rate in IEEE documents
» Linkage to terms in IEEE documents
» Similar level of detail to current IEEE thesaurus terms
Where do we find these terms? How can we add them?
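The first two criteria above lend themselves to a simple filter. The sketch below is only an illustration under stated assumptions: the `Candidate` fields, the thresholds `max_rate` and `min_links`, and the sample terms are all hypothetical and do not come from the project. The third criterion (similar level of detail) is a judgment call and is not modeled here.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    term: str                # candidate term from an outside thesaurus
    ieee_doc_count: int      # IEEE documents auto-indexed with this term
    linked_ieee_terms: int   # distinct IEEE thesaurus terms it co-occurs with

def select_candidates(cands, total_docs, max_rate=0.001, min_links=3):
    """Keep terms that are rare in IEEE documents yet linked to IEEE terms.

    max_rate and min_links are assumed thresholds, chosen for illustration.
    """
    kept = []
    for c in cands:
        rate = c.ieee_doc_count / total_docs
        if rate <= max_rate and c.linked_ieee_terms >= min_links:
            kept.append(c.term)
    return kept
```

With 1.2 million documents and these hypothetical thresholds, a term found in 300 documents and linked to 12 IEEE terms would be kept, while a term found in 90,000 documents would be rejected as already well covered.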
9. Defining expanded term space
1. Select related corpus
[Diagram labels: IEEE (2k terms, 1.2M documents); DTIC (14k); MeSH (24k); PubMed (525k docs); patents (475k)]
10. Defining expanded term space
2. Identify related terms
[Diagram labels: IEEE (2k terms, 1.2M documents)]
12. Defining expanded term space
3. Resulting term set
[Diagram labels: IEEE (2k terms, 1.2M documents)]
13. Defining expanded term space
4. Term:Term Matrix
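The slides do not say how the term:term matrix was computed. One common approach, assumed here purely for illustration, is to count how often two terms are assigned to the same document and then normalize by term frequency (cosine). The function names and the sample terms below are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

def cooccurrence(docs):
    """Count term and term-pair frequencies.

    docs: iterable of term collections, one per indexed document.
    Pairs are stored with their two terms in alphabetical order.
    """
    term_freq = Counter()
    pair_freq = Counter()
    for terms in docs:
        unique = sorted(set(terms))
        term_freq.update(unique)
        pair_freq.update(combinations(unique, 2))
    return term_freq, pair_freq

def cosine(a, b, term_freq, pair_freq):
    """Cosine-normalized co-occurrence between terms a and b."""
    if term_freq[a] == 0 or term_freq[b] == 0:
        return 0.0
    key = (a, b) if a < b else (b, a)
    return pair_freq[key] / sqrt(term_freq[a] * term_freq[b])
```

Two terms that always appear together score 1.0 and terms that never share a document score 0.0, so the resulting matrix is symmetric and independent of how common each term is.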
14. Requirements for Visualization
From a society / publisher perspective
» Which topical areas form our core? periphery?
» Where is the coverage dense? thin?
» Which topical areas are most active? least active?
» Which topical areas seem to be emerging? declining?
» Which topical areas are interrelated? isolated?
» What are the overlaps between journals / segments?
» Where are the potential expansion points?
From a thesaurus perspective
» What terms are too broadly defined?
» How do actual topical relationships differ from the thesaurus
structure?
15. SciTech Strategies, Inc.
Founded in 1982 (Center for Research Planning)
Using Bibliometrics to Identify 'Micro-communities'
Better Maps
» Accuracy
Better Decisions
» Peripheral Vision
17. Publication Strategy
JASIST reference
18. Requirements
From a society / publisher perspective
» Identify Core, Boundary and Cross Border
» Provides Indicators
Activity
Growth
Relatedness
Centrality
» Locates Journal domains
From a thesaurus perspective
» Identifies terms that are too broadly defined
» Potential Improvements in thesaurus structure
using topic structures
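The slides name the indicators (Activity, Growth, Relatedness, Centrality) without defining them. The sketch below shows one plausible reading of three of them, and every definition in it is an assumption made for illustration: activity as a yearly document count, growth as relative change between two years, and centrality as the summed link weight of a term. Relatedness would then simply be the link weight between two terms.

```python
def activity(counts_by_year, year):
    """Assumed definition: number of documents on the topic in a given year."""
    return counts_by_year.get(year, 0)

def growth(counts_by_year, start, end):
    """Assumed definition: relative change in document count from start to end.

    Returns None when the start year has no documents, since the
    relative change is undefined in that case.
    """
    base = counts_by_year.get(start, 0)
    if base == 0:
        return None
    return (counts_by_year.get(end, 0) - base) / base

def centrality(term, links):
    """Assumed definition: weighted degree of a term.

    links: dict mapping (term_a, term_b) pairs to a relatedness weight.
    """
    return sum(w for (a, b), w in links.items() if term in (a, b))
```

Under these assumed definitions, a topic with 40 documents in one year and 50 in the next has a growth of 0.25, which is the kind of signal that would separate emerging topics from declining ones.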
19. Visualization Strategies
Visualization
Matrix Software
21. Instrumentation
Compon, Packag …; Dielectr El Insul Soc; Instr Measur Soc; Ultrason, Ferro …; Electromag Compat Soc
Prod Saf Engng Soc; Council Supercond; Magnetics Soc; Sensors Council; Antennas Propag Soc
Nanotech Council; Oceanic Engng Soc; Geosci Rem Sens Soc; Nucl Plasma Sci Soc
22. Power / Circuits
Power Electron Soc; Power & Energy Soc; Industry Appl Soc; Industr Electr Soc
Electron Dev Soc; Circuits & Systems; Solid St Circuits Soc; Microwave Theory Soc
23. Additional Profiles
Photonics Soc; Eng Med Biol Sci; Electromag Compat Soc; Antennas Propag Soc
Commun Soc; Vehicular Techn Soc; Consumer Electr Soc; Broadcast Techn Soc
Aerosp Electr Sys Soc; Intell Transp Sys Soc; Info Theory Soc
24. Diverse Profiles
Reliability Society; Prof Commun Society; Education Society; Council Electr Design Auto
Robot Autom Soc; Social Impl Techn; Sys Man Cyber Society; Computer Intelligence Society
Control Sys Soc; Systems Council; Computer Society; Signal Proc Soc
25. IEEE Portfolio
[Figure: combined map of the IEEE society and council profiles shown on the preceding slides]
26. Lessons Learned
Map didn't 'feel right'
Many Terms are too broadly defined
Effective Maps require
» More contextual data
» More detailed data
» Natural classification methods
27. Maps didn’t feel right
Previous Experience IEEE Experience
28. Terms are too Broadly Defined
29. Use a Thesaurus to Label Maps
[Figure: example map with topical labels, including Construction, Packaging, Automotive, Defense, Aircraft, Turbines, Pumps, Valves, Toys, Radiology, Agriculture, Pharma, Telecom, Semiconductors, Optics, Lasers, Displays, Electronics, Catalysis, Circuits, Textiles, Magnets, Amplifiers, Chemicals, Coatings]
30. Future Improvements
Current Term:Term Matrix Proposed Term:Term Matrix
31. Future Improvements
Use citations and/or text to generate maps
Use thesaurus to label maps
32. Thank you
33. IEEE Portfolio
[Figure: combined map of the IEEE society and council profiles, repeated from slide 25]
40. SciTech Strategies, Inc.
Better Maps Better Decisions
Thank You
IEEE William Pickering
Access Innovations / Data Harmony Marjorie M.K. Hlava
SciTech Strategies Dick Klavans
June 13, 2011
Editor's Notes
A 125-year-old professional society with over 148 journals, conference transactions and magazines. Sponsors approximately 800 conferences annually. Total membership over 400,000 as of Dec 31, 2009. Spans the globe, with participation in 160 countries.
We knew there was “gold in them thare hills!” but how to unlock it? As a leading source of research materials, could we extract new directions? Are the societies living up to their charters and covering the topical areas they think they are? Are there trends that are just a spike in interest, or are they really emerging? Are they still being vigorously investigated, or were they just a flash in the pan? What other things might we learn? Introducing Dick Klavans.
Access Innovations and its software brand Data Harmony are known for the high caliber of their data: it is clean, well formed and very accurately semantically enriched. They updated the IEEE thesaurus in 2005, building a rule base for use in indexing at the same time. At the launch of the project, the application of the terms to the IEEE content was 90% accurate (that is, 90% of the terms suggested are what well-trained indexers would use from a controlled vocabulary) and 80% accurate on the more difficult proceedings data. Since then the rule base has improved, and the IEEE production team only needs to spot check about 10% of the documents to ensure that a high standard of indexing is maintained. This has allowed IEEE to process many more documents with the same team and has made the process more fun at the same time. Because the Data Harmony software removes many of the clerical aspects of indexing, the indexers have time to think about the content, the thesaurus terms, what should be added and what other information can be collected to continue to enrich the files. The accuracy is high enough that we simply indexed the entire contents of the Xplore database, back to the earliest records, in a single overnight process. Then, to explore the edges of science, we also indexed the 1.2 million records using the Medical Subject Headings (MeSH) and Defense Technical Information Center (DTIC) thesauri, with similar accuracy results.
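The accuracy figure described above is the share of suggested terms that a trained indexer would also have chosen. A minimal sketch of that calculation follows; the function name and the sample term sets are hypothetical, and how the comparison was actually carried out at IEEE is not described in these notes.

```python
def suggestion_accuracy(suggested, indexer_terms):
    """Fraction of auto-suggested terms that the indexer would also use."""
    suggested = set(suggested)
    if not suggested:
        return 0.0
    return len(suggested & set(indexer_terms)) / len(suggested)
```

For example, if the rule base suggests ten terms for a document and an indexer agrees with nine of them, the accuracy for that document is 0.9.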
Two bases for the collaboration: our reputation for accuracy (in this case, how to do the layouts so that the 'picture' is accurate) and our commitment to peripheral vision (doing global vs. local maps).