Chalitha Perera | Cross Media Concept and Entity Driven Search for Enterprise
1. Work Together Effectively
Cross Media Concept and Entity Driven Search for Enterprise
Chalitha Perera and Dileepa Jayakody
R&D Engineers
• Zaizi is headquartered in London with an office in Colombo, Sri Lanka
• Focused on delivering enterprise content management solutions
• Our Skills
Zaizi R&D Department
• Giving sense to the content
– Enriching it semantically
• Adding value to ECM/CMS
– More structured content, easy to manage, link and search
• Improving search
– Across different domains and data sources, with a better user experience
• Machine Learning applied research
Problem
• Unstructured Text Content
– Text documents, PDFs, Word …
• Rapid growth in multimedia content
• Heterogeneous Data Sources
– ECMs (Alfresco, SharePoint), file system, Confluence, JIRA …
• Data is not useful without effective methods for
– Knowledge Extraction
– Information Retrieval
Current Enterprise Search Limitations
• Limited to keyword-based search
• Search context is not considered
• Ambiguity of terms
• Low precision
• Inability to properly handle multimedia files
Desired Traits of the Solution
• Semantically enhance documents
– Unstructured text
– Multimedia documents
• Cross media search
• Search with semantic concepts and entities
• Federated Search
– Search across different content repositories
– User permissions
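One way to picture federated search with user permissions is the minimal sketch below. It is not Sensefy's implementation: the `Repository` class, the `Doc` record, the group names and the sample documents are all hypothetical, and each "repository" is just an in-memory list. The point is only that one query fans out to several repositories and each hit is filtered by the user's access groups before results are merged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    allowed_groups: frozenset  # groups permitted to read this document


class Repository:
    """Hypothetical in-memory stand-in for a content repository."""

    def __init__(self, name, docs):
        self.name = name
        self.docs = docs

    def search(self, term, user_groups):
        # Return only documents that match the term AND that the user may read.
        term = term.lower()
        return [
            (self.name, d.doc_id)
            for d in self.docs
            if term in d.text.lower() and d.allowed_groups & user_groups
        ]


def federated_search(repos, term, user_groups):
    """Send the same query to every repository and concatenate the hits."""
    groups = frozenset(user_groups)
    results = []
    for repo in repos:
        results.extend(repo.search(term, groups))
    return results


# Hypothetical sample data.
alfresco = Repository("alfresco", [
    Doc("a1", "Quarterly sales report", frozenset({"sales"})),
    Doc("a2", "Sales onboarding guide", frozenset({"sales", "hr"})),
])
confluence = Repository("confluence", [
    Doc("c1", "Sales team wiki page", frozenset({"everyone"})),
])

hits = federated_search([alfresco, confluence], "sales", {"hr", "everyone"})
print(hits)  # [('alfresco', 'a2'), ('confluence', 'c1')]
```

The user in the example belongs to `hr` and `everyone`, so the sales-only report `a1` never appears in the merged result even though it matches the keyword.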
Sensefy
• Semantic Enterprise Search Engine
• Cross Media Search
• Federated Search
• Smart Search Assistance
• Open Source
Repository Crawler
• Four types of connectors
– Repository Connectors
– Authority Connectors
– Transformation Connectors
– Output Connectors
• Connect different source repositories with different target indexes
– Source repositories (Alfresco, SharePoint, Confluence, etc.)
– Target indexes (Solr, Elasticsearch, Amazon CloudSearch)
• Security Model to enforce source repository security policies
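The four connector roles can be sketched as a small pipeline. This is an illustration under assumptions, not the crawler's actual API: every function and class name below is hypothetical, the source repository and target index are in-memory stand-ins, and the security model is reduced to matching a user's access tokens against the ACL copied from the source.

```python
def repository_connector():
    """Hypothetical source: yields documents together with their source ACL."""
    yield {"id": "doc-1", "body": "  Annual Report  ", "acl": ["finance"]}
    yield {"id": "doc-2", "body": "Team Handbook", "acl": ["everyone"]}


def authority_connector(user):
    """Hypothetical authority lookup: maps a user to access tokens."""
    tokens = {"alice": ["finance", "everyone"], "bob": ["everyone"]}
    return tokens.get(user, [])


def transformation_connector(doc):
    """Normalize the body before indexing; the ACL is carried through unchanged."""
    out = dict(doc)
    out["body"] = doc["body"].strip().lower()
    return out


class OutputConnector:
    """Hypothetical target index (a stand-in for a real search index)."""

    def __init__(self):
        self.index = {}

    def add(self, doc):
        self.index[doc["id"]] = doc


def crawl(source, transform, output):
    """Pull every document from the source, transform it, and index it."""
    for doc in source():
        output.add(transform(doc))


def visible_ids(output, user):
    """Enforce the source ACL at query time using the user's access tokens."""
    tokens = set(authority_connector(user))
    return sorted(
        doc_id for doc_id, doc in output.index.items()
        if tokens & set(doc["acl"])
    )


target = OutputConnector()
crawl(repository_connector, transformation_connector, target)
print(visible_ids(target, "alice"))  # ['doc-1', 'doc-2']
print(visible_ids(target, "bob"))    # ['doc-2']
```

Keeping the ACL on the indexed document is what lets the target index honor the source repository's security policy without calling back to the source on every query.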
Media In Context (MICO) Platform
• MICO provides an integrated platform for
– Cross media analysis
– Metadata publishing
– Metadata querying
• Sensefy uses MICO as the cross media analysis engine to extract entities and concepts from multimedia
Semantic Content Enrichment
• Named Entity Recognition
– People, places, organizations and concepts
• Entity Linking
– DBpedia, YAGO, custom enterprise knowledge bases
• Entity Disambiguation
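The three enrichment steps can be illustrated with a toy example. This is a deliberately simple sketch, not the technique Sensefy uses: recognition is a dictionary lookup, linking returns candidates from a tiny hand-made knowledge base, and disambiguation picks the candidate whose context words overlap most with the text. The `kb:` identifiers, the candidate lists and the context words are all hypothetical.

```python
import re

# Hypothetical knowledge base: surface form -> candidate entity IDs.
CANDIDATES = {
    "ronaldo": ["kb:Cristiano_Ronaldo", "kb:Ronaldo_Nazario"],
    "london": ["kb:London"],
}

# Hypothetical context words associated with each entity.
CONTEXT = {
    "kb:Cristiano_Ronaldo": {"portugal", "madrid", "juventus"},
    "kb:Ronaldo_Nazario": {"brazil", "inter", "barcelona"},
    "kb:London": {"england", "uk", "thames"},
}


def enrich(text):
    """Return a mapping from each recognized mention to one linked entity ID."""
    tokens = re.findall(r"[a-z]+", text.lower())
    context = set(tokens)
    annotations = {}
    for token in tokens:
        # Recognition + linking: is this token a known surface form?
        candidates = CANDIDATES.get(token)
        if not candidates:
            continue
        # Disambiguation: prefer the candidate sharing the most context words
        # with the text (on a tie, the first listed candidate wins).
        annotations[token] = max(
            candidates, key=lambda entity: len(CONTEXT[entity] & context)
        )
    return annotations


print(enrich("Ronaldo scored for Brazil before flying to London"))
# {'ronaldo': 'kb:Ronaldo_Nazario', 'london': 'kb:London'}
print(enrich("Ronaldo joined Juventus from Madrid"))
# {'ronaldo': 'kb:Cristiano_Ronaldo'}
```

The same surface form "ronaldo" resolves to different entities depending on the surrounding words, which is the ambiguity that plain keyword indexing cannot resolve.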
Entity Search with Suggestions
• Named Entity Suggestions
• Ability to query with disambiguated entities
• Search results with high precision
– Keyword search for “ronaldo” returns documents about both “Cristiano Ronaldo” and “Ronaldo”
– Entity search returns only the documents related to the selected entity
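The precision difference can be made concrete with a small sketch. It assumes documents were already annotated with entity IDs at indexing time; the documents, the `kb:` identifiers, the labels and the relevance judgment are all hypothetical and exist only to show the contrast between the two query styles.

```python
# Hypothetical index: each document carries the entity IDs found during enrichment.
DOCS = [
    {"id": 1, "text": "Cristiano Ronaldo signs for Juventus",
     "entities": {"kb:Cristiano_Ronaldo"}},
    {"id": 2, "text": "Ronaldo wins the World Cup with Brazil",
     "entities": {"kb:Ronaldo_Nazario"}},
    {"id": 3, "text": "Ronaldo scores a hat-trick for Portugal",
     "entities": {"kb:Cristiano_Ronaldo"}},
    {"id": 4, "text": "London hosts the final",
     "entities": {"kb:London"}},
]

# Hypothetical display labels used for suggestions.
LABELS = {
    "kb:Cristiano_Ronaldo": "Cristiano Ronaldo",
    "kb:Ronaldo_Nazario": "Ronaldo",
    "kb:London": "London",
}


def suggest(prefix):
    """Suggest entities with a label word starting with the typed prefix."""
    prefix = prefix.lower()
    return sorted(
        entity for entity, label in LABELS.items()
        if any(word.startswith(prefix) for word in label.lower().split())
    )


def keyword_search(term):
    """Plain substring match on the document text."""
    return [d["id"] for d in DOCS if term.lower() in d["text"].lower()]


def entity_search(entity_id):
    """Match on the disambiguated entity annotation instead of the text."""
    return [d["id"] for d in DOCS if entity_id in d["entities"]]


def precision(result_ids, relevant_ids):
    """Fraction of returned documents that are actually relevant."""
    if not result_ids:
        return 0.0
    return len(set(result_ids) & relevant_ids) / len(result_ids)


relevant = {1, 3}  # documents about Cristiano Ronaldo
keyword_hits = keyword_search("ronaldo")
entity_hits = entity_search("kb:Cristiano_Ronaldo")

print(suggest("ron"))                     # ['kb:Cristiano_Ronaldo', 'kb:Ronaldo_Nazario']
print(keyword_hits, precision(keyword_hits, relevant))
print(entity_hits, precision(entity_hits, relevant))
```

In this toy data the keyword query returns documents 1, 2 and 3, so one of its three hits is about the wrong person, while the entity query returns only documents 1 and 3.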