Alyona Medelyan (Pingar)
                      @zelandiya

    THE NEXT-GENERATION
             SHAREPOINT:
POWERED BY TEXT ANALYTICS
AGENDA
• Information tasks
• Text analytics
• APIs
• Demos
• Conclusions
Information tasks
What do they cost us?
How does SharePoint help?
Avg. hours per week
[Bar chart; task labels not recoverable. Values: 14.5, 13.3, 9.6, 9.5, 8.8, 8.3, 6.8, 6.7, 5.6, 5.6, 4.3, 4.2, 1]
= $37K / year / person
Source: IDC, Hidden Cost of Information (2005)
SHAREPOINT SAVES TIME
• Interact with SP from Outlook
• Create docs collaboratively
• Customize search configuration
• Use sites, sets & libraries
• Define Managed Metadata
• Configure forms
• Design Workflow
Text Analytics
What is it and how does it work?
What tasks does it solve?
WHAT IS TEXT ANALYTICS?
                unstructured data



Linguistics                                  Search
   Statistics                          Data Extraction
  Text Processing                    Document Organization
Machine Learning                    Business Intelligence
Natural Language Processing          Opinion Mining
     Text Mining
TEXT ANALYTICS SAVES MORE TIME
• Compose search reports
• Extract entities
• Mine opinions & sentiment
• Cluster search results
• Redact
• Summarize
• Generate metadata
• Fill databases
• Profanity check
… automatically
Text Analytics Software
What companies offer text analytics?
What are open source tools like?
TEXT ANALYTICS: GLOBAL PERSPECTIVE

User adoption grew by 25% in 2010,
 creating an $835 million market, because:

• Unstructured data grows (e.g. social) → Text analytics!
• Text analytics is central to effective information access
• Many successes in NLP: IBM Watson, Wolfram Alpha



                                    Full report by Seth Grimes:
                                  http://altaplana.com/TA2011
APPLICATIONS OF TEXT ANALYTICS
Search & info access               39%
Customer experience management     39%
Brand management                   39%
Research                           36%
Competitive intelligence           33%
Customer service                   26%
E-discovery                        15%
Life sciences                      15%
Product design                     15%
Online commerce                    11%
Finance                            10%
Other                               9%
Content management                  8%
Insurance & fraud                   8%
Military intelligence               7%
Law enforcement                     6%

Source: http://altaplana.com/TA2011
SEARCH & INFO ACCESS
 METADATA EXTRACTION

Document                  Easy to extract:                Metadata
                          File type, name & location,
                          creation & modification date,
                          authors

           Difficult to extract:
           Keywords,
           people & companies mentioned,
           suppliers & addresses mentioned
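The "easy to extract" half of this slide can be sketched in a few lines of Python, because the file system already stores those values. This is only an illustration: the function name `easy_metadata` and the demo file are made up for the example, creation date is left out because its meaning differs between operating systems, and authors are left out because they live inside the document format rather than in the file system.

```python
import os
import tempfile
from datetime import datetime, timezone


def easy_metadata(path):
    """Collect metadata that the file system already knows about a document."""
    info = os.stat(path)
    name = os.path.basename(path)
    return {
        "name": name,
        "file_type": os.path.splitext(name)[1].lstrip(".").lower(),
        "location": os.path.dirname(os.path.abspath(path)),
        "size_bytes": info.st_size,
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }


# Demo on a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "release_note.txt")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write("As of today, MetaStock has several new functions.")
    demo = easy_metadata(demo_path)
    print(demo["name"], demo["file_type"], demo["modified"].isoformat())
```

Nothing comparable exists for the "difficult" items: keywords, people and companies have to be inferred from the text itself.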
SEARCH & INFO ACCESS
KEYWORD EXTRACTION

Document     Candidates                                         Keywords



           Hi All,
           As of today, MetaStock has several new functions.
           The most important new feature is the ability to
           display forward heat rate charts.
           Also, notice that the interface looks different -- this
           reflects and accommodates the new features.
           If you have any questions regarding this new
           version of MetaStock, please contact Bella Santuri.
SEARCH & INFO ACCESS
KEYWORD EXTRACTION

Document     Candidates       Properties       Keywords

Properties: Frequency, Position, Corpus stats, Relatedness

           Hi All,
           As of today, MetaStock has several new functions.
           The most important new feature is the ability to
           display forward heat rate charts.
           Also, notice that the interface looks different -- this
           reflects and accommodates the new features.
           If you have any questions regarding this new
           version of MetaStock, please contact Bella Santuri.
SEARCH & INFO ACCESS
KEYWORD EXTRACTION

Document      Candidates       Properties         Scoring        Keywords

Scoring: Heuristic scoring, Machine learning

           Hi All,
           As of today, MetaStock has several new functions.
           The most important new feature is the ability to
           display forward heat rate charts.
           Also, notice that the interface looks different -- this
           reflects and accommodates the new features.
           If you have any questions regarding this new
           version of MetaStock, please contact Bella Santuri.
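The Document, Candidates, Properties, Scoring, Keywords pipeline can be sketched roughly as follows. This is a toy illustration, not any vendor's algorithm: the stopword list, the scoring weights and all function names are assumptions made for the example. It uses only two of the listed properties (frequency and position) and a hand-written heuristic score instead of machine learning.

```python
import re
from collections import Counter

EMAIL = """Hi All,
As of today, MetaStock has several new functions.
The most important new feature is the ability to
display forward heat rate charts.
Also, notice that the interface looks different -- this
reflects and accommodates the new features.
If you have any questions regarding this new
version of MetaStock, please contact Bella Santuri."""

# Assumed stopword list, chosen by hand for this example only.
STOPWORDS = {
    "a", "all", "also", "and", "any", "as", "has", "have", "hi", "if", "is",
    "most", "new", "of", "please", "several", "that", "the", "this", "to", "you",
}


def candidates(text, max_len=3):
    """Yield (phrase, relative position) for word n-grams of up to max_len words.

    N-grams may not start or end with a stopword. Punctuation is ignored,
    so a candidate can span a sentence boundary in this simplified version.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    for start in range(len(tokens)):
        for length in range(1, max_len + 1):
            gram = tokens[start:start + length]
            if len(gram) < length:
                break
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            yield " ".join(gram), start / len(tokens)


def candidate_properties(text):
    """Compute frequency and first-occurrence position for every candidate."""
    frequency = Counter()
    first_position = {}
    for phrase, position in candidates(text):
        frequency[phrase] += 1
        first_position.setdefault(phrase, position)
    return {
        phrase: {"frequency": frequency[phrase], "position": first_position[phrase]}
        for phrase in frequency
    }


def heuristic_score(props):
    """Assumed heuristic: frequent candidates and early candidates score higher."""
    return props["frequency"] * (1.0 - 0.5 * props["position"])


def extract_keywords(text, top_k=5):
    props = candidate_properties(text)
    ranked = sorted(props, key=lambda p: heuristic_score(props[p]), reverse=True)
    return ranked[:top_k]


keywords = extract_keywords(EMAIL)
print(keywords)  # "metastock" ranks first: it is the only repeated candidate
```

A machine-learning scorer would replace `heuristic_score` with a model trained on documents that have manually assigned keywords.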
SEARCH & INFO ACCESS
NAMES EXTRACTION

Document      Examples       Properties       Learning        Names



           If you have any questions regarding this new version of
           MetaStock, please contact Bella Santuri.


Examples: Training data (annotations)
Properties: NLP, Heuristics, Text mining
Learning: Machine Learning
<SEARCH + TEXT ANALYTICS> COMPANIES




 Pingar, BasisTech, AlchemyAPI, LanguageComputer, OpenCalais, Extractiv
BRAND & CUSTOMER MANAGEMENT
   SENTIMENT ANALYSIS

Documents (Reviews, Tweets, Surveys)     Sentiment Analysis     Visualization, Summary

Naïve approach: Sentiment-words dictionary!

Negative    Positive
suck        fantastic
terrible    excellent
awful       awesome

BUT:
"If you are reading this because it is your darling fragrance, please
wear it at home exclusively, and tape the windows shut."

No sentiment words!
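The naïve dictionary approach and its failure on the fragrance review can be reproduced in a short sketch. The word lists are the six words from the slide; the function name and the counting rule (positive hits minus negative hits) are assumptions for illustration.

```python
import re

NEGATIVE = {"suck", "terrible", "awful"}
POSITIVE = {"fantastic", "excellent", "awesome"}


def naive_sentiment(text):
    """Label text by counting dictionary hits: positive minus negative."""
    words = re.findall(r"[a-z]+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


review = (
    "If you are reading this because it is your darling fragrance, "
    "please wear it at home exclusively, and tape the windows shut."
)

print(naive_sentiment("The demo was excellent"))  # positive
print(naive_sentiment(review))  # neutral, although the review is clearly negative
```

The review contains none of the dictionary words, so the counter has nothing to work with.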
BRAND & CUSTOMER MANAGEMENT
   SENTIMENT ANALYSIS

Documents (Reviews, Tweets, Surveys)     Examples     Properties     Learning     Visualization, Summary

Training data (annotations)
Lexicon induction
Properties: Presence, Position, Part-of-Speech, Negation, Generalization
Machine Learning

Important:
• Identifying sentiment bearing sentences
• Attaching sentiment to a topic!
SENTIMENT ANALYSIS COMPANIES
Attensity
AlchemyAPI
Lexalytics
Saplo
Medallia
SAS
RESEARCH
    TEXT SUMMARIZATION
          Address      Hi All,
    Announcement       As of today, MetaStock has several new functions.
           Details     The most important new feature is the ability to
                       display forward heat rate charts.
       More details    Also, notice that the interface looks different -- this
                       reflects and accommodates the new features.
         Conclusion    If you have any questions regarding this new
                       version of MetaStock, please contact Bella Santuri.

Extractive summary:   As of today, MetaStock has several new functions.
Sentence compression: MetaStock has several new functions.
                      The new interface looks different.
Abstractive summary: MetaStock has new features and a new interface.
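Of the three summary types, the extractive one is the simplest to sketch: pick the sentence whose words are most typical of the whole document. The version below is a minimal illustration under assumptions (a hand-picked stopword list, average word frequency as the score, a crude sentence splitter) and is not how any of the products named in this deck work.

```python
import re
from collections import Counter

EMAIL = (
    "Hi All,\n"
    "As of today, MetaStock has several new functions. "
    "The most important new feature is the ability to display forward heat rate charts. "
    "Also, notice that the interface looks different -- this reflects and "
    "accommodates the new features. "
    "If you have any questions regarding this new version of MetaStock, "
    "please contact Bella Santuri."
)

# Assumed stopword list for this example only.
STOPWORDS = {
    "a", "all", "also", "and", "any", "as", "has", "have", "hi", "if", "is",
    "of", "that", "the", "this", "to", "you",
}


def content_words(text):
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(text):
    """Return the single sentence with the highest average word frequency."""
    # Split after sentence-ending punctuation or at line breaks.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+|\n+", text) if s.strip()]
    frequency = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        return sum(frequency[w] for w in words) / len(words) if words else 0.0

    return max(sentences, key=score)


summary = summarize(EMAIL)
print(summary)  # selects the announcement sentence for this e-mail
```

Sentence compression and abstractive summarization need to rewrite text rather than select it, so they do not reduce to a scoring function like this.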
TEXT SUMMARIZATION COMPANIES




Lexalytics, Pingar
COMPETITIVE INTELLIGENCE:
ENTITY & ENTITY RELATION EXTRACTION




     Companies:
     OpenCalais, Extractiv, Pingar, Evri, AlchemyAPI, Zemanta
FRAUD INVESTIGATION:
NORMALIZATION OF DATES & NAMES




           Companies:
           Cicero, BasisTech
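To give a feel for what normalization means here, the sketch below maps differently written dates and names onto one canonical form so that records can be matched. It is an assumption-laden toy: the list of date formats, the day-first reading of `03/03/2011`, and both function names are choices made for the example, and commercial tools handle far more variation (transliteration, nicknames, other calendars).

```python
from datetime import datetime

# Assumed set of formats; ambiguous numeric dates are read day-first.
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %B %Y", "%B %d, %Y", "%d %b %Y")


def normalize_date(raw):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    cleaned = " ".join(raw.split())
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize_name(raw):
    """Turn 'LAST, First' or irregular spacing/casing into 'First Last'.

    str.title() is a simplification and mangles names such as 'McDonald'.
    """
    cleaned = " ".join(raw.split())
    if "," in cleaned:
        last, first = [part.strip() for part in cleaned.split(",", 1)]
        cleaned = f"{first} {last}"
    return cleaned.title()


for value in ("3 March 2011", "March 3, 2011", "03/03/2011", "2011-03-03"):
    print(value, "->", normalize_date(value))

for value in ("SANTURI, Bella", "bella   santuri"):
    print(value, "->", normalize_name(value))
```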
OPEN-SOURCE TOOLS
• NLTK – Apache license, Book, Python & academic datasets, nltk.org
• LingPipe – Commercial licenses, Tutorials, Coreference & Chinese segment, alias-i.com/lingpipe
• OpenNLP – Apache license, Parsing, MaxEnt ML, incubator.apache.org/opennlp
• GATE – restricted GPL, Training courses, Applications & framework, gate.ac.uk
• Stanford NLP – full GPL, Online docs, Full library, nlp.stanford.edu
APIs
What’s an API and how does it work?
What are the advantages of the API model?
Which API is the right one for you?
API ACCESS
Developer creates an application (SDK, usage examples)
API: an interface that ensures communication
ENGINE: software engine solves a specific task

• A call is an XML message describing the request
• Includes API authentication
• Calls via a web service
• A protocol specifies how XML needs to be encoded: SOAP, REST
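As a rough illustration of "a call is an XML message describing the request" that "includes API authentication", the sketch below builds a SOAP 1.1 style envelope with Python's standard library. Only the SOAP envelope namespace is real; the service namespace, the `ApiKey`, `ExtractKeywords` and `Text` element names, and the key value are hypothetical and do not describe any particular vendor's API.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
API_NS = "http://example.com/textanalytics"            # hypothetical service namespace


def build_request(api_key, text):
    """Build an XML message: authentication in the header, the request in the body."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")

    header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    ET.SubElement(header, f"{{{API_NS}}}ApiKey").text = api_key  # hypothetical element

    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{API_NS}}}ExtractKeywords")   # hypothetical operation
    ET.SubElement(call, f"{{{API_NS}}}Text").text = text

    return ET.tostring(envelope, encoding="unicode")


message = build_request("my-demo-key", "MetaStock has several new functions.")
print(message)
```

In practice an SDK or a generated client usually produces this message, so the developer only calls a function and never writes the XML by hand.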
REST API ACCESS FROM A BROWSER
API request
http://search.yahooapis.com/WebSearchService/V1/webSearch?appid=YahooDemo&query=madonna&context=Italian+sculptors+and+painters+of+the+renaissance+favored+the+Virgin+Mary+for+inspiration
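The request above is just a base URL plus URL-encoded parameters, which is why it can be pasted into a browser. The sketch below rebuilds it with Python's standard library and decodes it again. It deliberately does not send the request, since this demo service may no longer be available; the variable names are arbitrary.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "http://search.yahooapis.com/WebSearchService/V1/webSearch"

params = {
    "appid": "YahooDemo",  # API authentication: the application ID
    "query": "madonna",
    "context": (
        "Italian sculptors and painters of the renaissance "
        "favored the Virgin Mary for inspiration"
    ),
}

# urlencode turns spaces into "+" and joins the parameters with "&".
request_url = f"{BASE_URL}?{urlencode(params)}"
print(request_url)

# Decoding the query string recovers the original parameters.
decoded = parse_qs(urlsplit(request_url).query)
print(decoded["context"][0])
```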
API response
SOAP API ACCESS FROM VS2010
SOAP API ACCESS IN POWERSHELL




Read complete blog post “Bulk metadata extraction in SharePoint”:
http://bit.ly/powershell-migrate
API = EASY INTEGRATION & FLEXIBILITY
• Integrate into existing architecture
  via any programming language
• Improve known flaws in the current system/process
• Minimize adoption barriers within the company:
  little or no training required for staff
• Only pay for the features you need
• Flexible deployment:
   • Host API on site = Secure data exchange
   • Access the API in the cloud = Save on tech support & hardware
WHICH API IS BEST FOR YOU?
I need to take some text and get a list of the
important entities/keywords/phrases.

APIs compared:
• Y: Term Extractor
• OpenCalais
• BeliefNetworks
• OpenAmplify
• AlchemyAPI (2nd)
• Evri (1st)

Criteria:
• API restrictions
• Supported languages
• Quality of results
• Semantic links
• Synonyms/Duplicates

Blog post on API comparison: faganm.com/blog
HOW TO CHOOSE AN API:
• Define a specific task
• Think of what features are important
• Get prepared:
  • Subscribe for API keys
  • Get SDKs
  • Learn libraries
• Find representative data
• Build a test framework
• Compare results
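The last three steps (representative data, a test framework, comparing results) could look roughly like the sketch below. It assumes the task is keyword extraction and that you have hand-labelled "gold" keywords for your test documents. The two extractors are stand-in functions invented for the example; in a real comparison each one would wrap a call to a candidate API.

```python
def precision_recall(predicted, gold):
    """Precision and recall of predicted keywords against gold keywords."""
    predicted_set = {p.lower() for p in predicted}
    gold_set = {g.lower() for g in gold}
    hits = len(predicted_set & gold_set)
    precision = hits / len(predicted_set) if predicted_set else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    return precision, recall


def compare(apis, test_set):
    """Average precision and recall per API.

    apis: dict of name -> function(text) returning a list of keywords
    test_set: non-empty list of (text, gold keywords) pairs
    """
    report = {}
    for name, extract in apis.items():
        scores = [precision_recall(extract(text), gold) for text, gold in test_set]
        report[name] = (
            sum(p for p, _ in scores) / len(scores),
            sum(r for _, r in scores) / len(scores),
        )
    return report


# Representative data with hand-labelled keywords (tiny, for illustration).
test_set = [
    ("MetaStock can now display forward heat rate charts.",
     ["MetaStock", "heat rate charts"]),
]

# Stand-in extractors; replace each with a wrapper around a real API call.
apis = {
    "first_two_words": lambda text: text.split()[:2],
    "fixed_guess": lambda text: ["MetaStock", "heat rate charts", "display"],
}

report = compare(apis, test_set)
for name, (precision, recall) in report.items():
    print(f"{name}: precision={precision:.2f} recall={recall:.2f}")
```

Quality of results is only one criterion; supported languages, API restrictions and cost still have to be checked separately.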
METADATA EXTRACTION
IN SHAREPOINT
Demo
Pingar’s add-on for SharePoint 2010
built using a text analytics API
INTEGRATING APIS
INTO SCANNING
Video
Using Fuji Xerox SmartConnect and Pingar API
to scan documents in batch into SharePoint



                       http://www.youtube.com/watch?v=kluVp25upag
THE NEXT-GENERATION SHAREPOINT:
POWERED BY TEXT ANALYTICS
• What can be automated?
  • Metadata extraction, Data entry, Opinion mining,
    Sanitization, Doc approval, Summarization, …

• How to integrate text analytics
  into existing SharePoint applications?
  • Easy! Via an API

• How to find the right text analytics API?
  • Review what’s available
    Set up an experiment
    Compare results
Thank you to all of our Sponsors

More Related Content

Viewers also liked

How Obama Won With Big Data (Sam Zindel at Big Data Brighton)
How Obama Won With Big Data (Sam Zindel at Big Data Brighton)How Obama Won With Big Data (Sam Zindel at Big Data Brighton)
How Obama Won With Big Data (Sam Zindel at Big Data Brighton)Brandwatch
 
OSDF 2013 - Autopsy 3: Extensible Desktop Forensics by Brian Carrier
OSDF 2013 - Autopsy 3: Extensible Desktop Forensics by Brian CarrierOSDF 2013 - Autopsy 3: Extensible Desktop Forensics by Brian Carrier
OSDF 2013 - Autopsy 3: Extensible Desktop Forensics by Brian CarrierBasis Technology
 
Optimizing multilingual search in SOLR
Optimizing multilingual search in SOLROptimizing multilingual search in SOLR
Optimizing multilingual search in SOLRBasis Technology
 
Rosette Search Essentials for Elasticsearch
Rosette Search Essentials for ElasticsearchRosette Search Essentials for Elasticsearch
Rosette Search Essentials for ElasticsearchBasis Technology
 
Moving Beyond Entity Extraction to Entity Resolution - Human Language Technol...
Moving Beyond Entity Extraction to Entity Resolution - Human Language Technol...Moving Beyond Entity Extraction to Entity Resolution - Human Language Technol...
Moving Beyond Entity Extraction to Entity Resolution - Human Language Technol...Basis Technology
 
Simple fuzzy Name Matching in Elasticsearch - Graham Morehead
Simple fuzzy Name Matching in Elasticsearch - Graham MoreheadSimple fuzzy Name Matching in Elasticsearch - Graham Morehead
Simple fuzzy Name Matching in Elasticsearch - Graham MoreheadBasis Technology
 
Future of ai on the jvm
Future of ai on the jvmFuture of ai on the jvm
Future of ai on the jvmAdam Gibson
 
World Domination with Pentaho EE?
World Domination with Pentaho EE?World Domination with Pentaho EE?
World Domination with Pentaho EE?Jos van Dongen
 
How can iceland produce so many professional players sept 2010
How can iceland produce so many professional players sept 2010How can iceland produce so many professional players sept 2010
How can iceland produce so many professional players sept 2010Robin.Russell
 
Basic NLP with Python and NLTK
Basic NLP with Python and NLTKBasic NLP with Python and NLTK
Basic NLP with Python and NLTKFrancesco Bruni
 
Natural Language Toolkit (NLTK), Basics
Natural Language Toolkit (NLTK), Basics Natural Language Toolkit (NLTK), Basics
Natural Language Toolkit (NLTK), Basics Prakash Pimpale
 
Building Data Integration and Transformations using Pentaho
Building Data Integration and Transformations using PentahoBuilding Data Integration and Transformations using Pentaho
Building Data Integration and Transformations using PentahoAshnikbiz
 
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)Roland Bouman
 
Business Intelligence and Big Data Analytics with Pentaho
Business Intelligence and Big Data Analytics with Pentaho Business Intelligence and Big Data Analytics with Pentaho
Business Intelligence and Big Data Analytics with Pentaho Uday Kothari
 

Viewers also liked (16)

How Obama Won With Big Data (Sam Zindel at Big Data Brighton)
How Obama Won With Big Data (Sam Zindel at Big Data Brighton)How Obama Won With Big Data (Sam Zindel at Big Data Brighton)
How Obama Won With Big Data (Sam Zindel at Big Data Brighton)
 
OSDF 2013 - Autopsy 3: Extensible Desktop Forensics by Brian Carrier
OSDF 2013 - Autopsy 3: Extensible Desktop Forensics by Brian CarrierOSDF 2013 - Autopsy 3: Extensible Desktop Forensics by Brian Carrier
OSDF 2013 - Autopsy 3: Extensible Desktop Forensics by Brian Carrier
 
Optimizing multilingual search in SOLR
Optimizing multilingual search in SOLROptimizing multilingual search in SOLR
Optimizing multilingual search in SOLR
 
Rosette Search Essentials for Elasticsearch
Rosette Search Essentials for ElasticsearchRosette Search Essentials for Elasticsearch
Rosette Search Essentials for Elasticsearch
 
Moving Beyond Entity Extraction to Entity Resolution - Human Language Technol...
Moving Beyond Entity Extraction to Entity Resolution - Human Language Technol...Moving Beyond Entity Extraction to Entity Resolution - Human Language Technol...
Moving Beyond Entity Extraction to Entity Resolution - Human Language Technol...
 
Simple fuzzy Name Matching in Elasticsearch - Graham Morehead
Simple fuzzy Name Matching in Elasticsearch - Graham MoreheadSimple fuzzy Name Matching in Elasticsearch - Graham Morehead
Simple fuzzy Name Matching in Elasticsearch - Graham Morehead
 
Data analytics
Data analyticsData analytics
Data analytics
 
Final Doc_1.1
Final Doc_1.1Final Doc_1.1
Final Doc_1.1
 
Future of ai on the jvm
Future of ai on the jvmFuture of ai on the jvm
Future of ai on the jvm
 
World Domination with Pentaho EE?
World Domination with Pentaho EE?World Domination with Pentaho EE?
World Domination with Pentaho EE?
 
How can iceland produce so many professional players sept 2010
How can iceland produce so many professional players sept 2010How can iceland produce so many professional players sept 2010
How can iceland produce so many professional players sept 2010
 
Basic NLP with Python and NLTK
Basic NLP with Python and NLTKBasic NLP with Python and NLTK
Basic NLP with Python and NLTK
 
Natural Language Toolkit (NLTK), Basics
Natural Language Toolkit (NLTK), Basics Natural Language Toolkit (NLTK), Basics
Natural Language Toolkit (NLTK), Basics
 
Building Data Integration and Transformations using Pentaho
Building Data Integration and Transformations using PentahoBuilding Data Integration and Transformations using Pentaho
Building Data Integration and Transformations using Pentaho
 
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
 
Business Intelligence and Big Data Analytics with Pentaho
Business Intelligence and Big Data Analytics with Pentaho Business Intelligence and Big Data Analytics with Pentaho
Business Intelligence and Big Data Analytics with Pentaho
 

Similar to The Next Generation SharePoint: Powered by Text Analytics

Mesh Labs Introduction June 2012
Mesh Labs Introduction June 2012Mesh Labs Introduction June 2012
Mesh Labs Introduction June 2012Umesh Ramalingachar
 
Should You Choose Java or Python for Data Science?
Should You Choose Java or Python for Data Science?Should You Choose Java or Python for Data Science?
Should You Choose Java or Python for Data Science?Narola Infotech
 
Left Brain, Right Brain: How to Unify Enterprise Analytics
Left Brain, Right Brain: How to Unify Enterprise AnalyticsLeft Brain, Right Brain: How to Unify Enterprise Analytics
Left Brain, Right Brain: How to Unify Enterprise AnalyticsInside Analysis
 
01 necto introduction_ready
01 necto introduction_ready01 necto introduction_ready
01 necto introduction_readywww.panorama.com
 
Tagging Up - MMS and Taxonomy In SharePoint 2010
Tagging Up - MMS and Taxonomy In SharePoint 2010Tagging Up - MMS and Taxonomy In SharePoint 2010
Tagging Up - MMS and Taxonomy In SharePoint 2010Chris McNulty
 
Belladati Meetup Singapore Workshop
Belladati Meetup Singapore WorkshopBelladati Meetup Singapore Workshop
Belladati Meetup Singapore Workshopbelladati
 
ICIC 2013 New Product Introductions CEPT
ICIC 2013 New Product Introductions CEPTICIC 2013 New Product Introductions CEPT
ICIC 2013 New Product Introductions CEPTDr. Haxel Consult
 
The Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape OverviewThe Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape OverviewDr. Ananth Krishnamoorthy
 
Tour de France Azure PaaS 6/7 Ajouter de l'intelligence
Tour de France Azure PaaS 6/7 Ajouter de l'intelligenceTour de France Azure PaaS 6/7 Ajouter de l'intelligence
Tour de France Azure PaaS 6/7 Ajouter de l'intelligenceAlex Danvy
 
The Future of Data Science
The Future of Data ScienceThe Future of Data Science
The Future of Data ScienceDataWorks Summit
 
The New Normal: Predictive Power on the Front Lines
The New Normal: Predictive Power on the Front LinesThe New Normal: Predictive Power on the Front Lines
The New Normal: Predictive Power on the Front LinesInside Analysis
 
Turn Data into Business Value – Starting with Data Analytics on Oracle Cloud ...
Turn Data into Business Value – Starting with Data Analytics on Oracle Cloud ...Turn Data into Business Value – Starting with Data Analytics on Oracle Cloud ...
Turn Data into Business Value – Starting with Data Analytics on Oracle Cloud ...Lucas Jellema
 
Brochure data science learning path board-infinity (1)
Brochure   data science learning path board-infinity (1)Brochure   data science learning path board-infinity (1)
Brochure data science learning path board-infinity (1)NirupamNishant2
 
Enterprise Search from Microsoft
Enterprise Search  from MicrosoftEnterprise Search  from Microsoft
Enterprise Search from MicrosoftAmplexor
 
Conversational Artificial Intelligence with Ben Tomlinson and Wayne Thompson
Conversational Artificial Intelligence with Ben Tomlinson and Wayne ThompsonConversational Artificial Intelligence with Ben Tomlinson and Wayne Thompson
Conversational Artificial Intelligence with Ben Tomlinson and Wayne ThompsonDatabricks
 
The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016 The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016 Dataiku
 
Architecting an Open Source AI Platform 2018 edition
Architecting an Open Source AI Platform   2018 editionArchitecting an Open Source AI Platform   2018 edition
Architecting an Open Source AI Platform 2018 editionDavid Talby
 
Oracle BI 11g Insync presentation
Oracle BI 11g Insync presentationOracle BI 11g Insync presentation
Oracle BI 11g Insync presentationInSync Conference
 

Similar to The Next Generation SharePoint: Powered by Text Analytics (20)

Mesh Labs Introduction June 2012
Mesh Labs Introduction June 2012Mesh Labs Introduction June 2012
Mesh Labs Introduction June 2012
 
Proposed Talk Outline for Pycon2017
Proposed Talk Outline for Pycon2017 Proposed Talk Outline for Pycon2017
Proposed Talk Outline for Pycon2017
 
Should You Choose Java or Python for Data Science?
Should You Choose Java or Python for Data Science?Should You Choose Java or Python for Data Science?
Should You Choose Java or Python for Data Science?
 
Left Brain, Right Brain: How to Unify Enterprise Analytics
Left Brain, Right Brain: How to Unify Enterprise AnalyticsLeft Brain, Right Brain: How to Unify Enterprise Analytics
Left Brain, Right Brain: How to Unify Enterprise Analytics
 
01 necto introduction_ready
01 necto introduction_ready01 necto introduction_ready
01 necto introduction_ready
 
Tagging Up - MMS and Taxonomy In SharePoint 2010
Tagging Up - MMS and Taxonomy In SharePoint 2010Tagging Up - MMS and Taxonomy In SharePoint 2010
Tagging Up - MMS and Taxonomy In SharePoint 2010
 
Belladati Meetup Singapore Workshop
Belladati Meetup Singapore WorkshopBelladati Meetup Singapore Workshop
Belladati Meetup Singapore Workshop
 
ICIC 2013 New Product Introductions CEPT
ICIC 2013 New Product Introductions CEPTICIC 2013 New Product Introductions CEPT
ICIC 2013 New Product Introductions CEPT
 
The Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape OverviewThe Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape Overview
 
Tour de France Azure PaaS 6/7 Ajouter de l'intelligence
Tour de France Azure PaaS 6/7 Ajouter de l'intelligenceTour de France Azure PaaS 6/7 Ajouter de l'intelligence
Tour de France Azure PaaS 6/7 Ajouter de l'intelligence
 
The Future of Data Science
The Future of Data ScienceThe Future of Data Science
The Future of Data Science
 
The New Normal: Predictive Power on the Front Lines
The New Normal: Predictive Power on the Front LinesThe New Normal: Predictive Power on the Front Lines
The New Normal: Predictive Power on the Front Lines
 
Turn Data into Business Value – Starting with Data Analytics on Oracle Cloud ...
Turn Data into Business Value – Starting with Data Analytics on Oracle Cloud ...Turn Data into Business Value – Starting with Data Analytics on Oracle Cloud ...
Turn Data into Business Value – Starting with Data Analytics on Oracle Cloud ...
 
Brochure data science learning path board-infinity (1)
Brochure   data science learning path board-infinity (1)Brochure   data science learning path board-infinity (1)
Brochure data science learning path board-infinity (1)
 
Enterprise Search from Microsoft
Enterprise Search  from MicrosoftEnterprise Search  from Microsoft
Enterprise Search from Microsoft
 
Conversational Artificial Intelligence with Ben Tomlinson and Wayne Thompson
Conversational Artificial Intelligence with Ben Tomlinson and Wayne ThompsonConversational Artificial Intelligence with Ben Tomlinson and Wayne Thompson
Conversational Artificial Intelligence with Ben Tomlinson and Wayne Thompson
 
The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016 The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016
 
Sptechcon2011 mms2010
Sptechcon2011 mms2010Sptechcon2011 mms2010
Sptechcon2011 mms2010
 
Architecting an Open Source AI Platform 2018 edition
Architecting an Open Source AI Platform   2018 editionArchitecting an Open Source AI Platform   2018 edition
Architecting an Open Source AI Platform 2018 edition
 
Oracle BI 11g Insync presentation
Oracle BI 11g Insync presentationOracle BI 11g Insync presentation
Oracle BI 11g Insync presentation
 

Recently uploaded

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx


The Next Generation SharePoint: Powered by Text Analytics

  • 1. Alyona Medelyan (Pingar) @zelandiya THE NEXT-GENERATION SHAREPOINT: POWERED BY TEXT ANALYTICS
  • 2. AGENDA • Information tasks • Text analytics • APIs • Demos • Conclusions
  • 3. Information tasks What do they cost us? How does SharePoint help?
  • 4. Avg. hours per week 14.5 13.3 = $37K year / person 9.6 9.5 8.8 8.3 6.8 6.7 5.6 5.6 4.3 4.2 1 Source: IDC, Hidden Cost of Information (2005)
  • 5. SHAREPOINT SAVES TIME ✓ Interact with SP from Outlook ✓ Create docs collaboratively ✓ Customize search configuration ✓ Use sites, sets & libraries ✓ Define Managed Metadata ✓ Configure forms ✓ Design Workflow
  • 6. Text Analytics What is it and how does it work? What tasks does it solve?
  • 7. WHAT IS TEXT ANALYTICS? unstructured data Linguistics Search Statistics Data Extraction Text Processing Document Organization Machine Learning Business Intelligence Natural Language Processing Opinion Mining Text Mining
  • 8. TEXT ANALYTICS SAVES MORE TIME … automatically ✓ Compose search reports ✓ Extract entities ✓ Mine opinions & sentiment ✓ Cluster search results ✓ Redact ✓ Summarize ✓ Generate metadata ✓ Fill databases ✓ Profanity check
  • 9. Text Analytics Software What companies offer text analytics? What are open source tools like?
  • 10. TEXT ANALYTICS: GLOBAL PERSPECTIVE User adoption grew by 25% in 2010, creating an $835 million market, because: • Unstructured data grows (e.g. social) → Text analytics! • Text analytics is central to effective information access • Many successes in NLP: IBM Watson, Wolfram Alpha Full report by Seth Grimes: http://altaplana.com/TA2011
  • 11. APPLICATIONS OF TEXT ANALYTICS Search & info access 39% Customer experience management 39% Brand management 39% Research 36% Competitive intelligence 33% Customer service 26% E-discovery 15% Life sciences 15% Product design 15% Online commerce 11% Finance 10% Other 9% Content management 8% Insurance & fraud 8% Military intelligence 7% Law enforcement 6% Source: http://altaplana.com/TA2011
  • 12. SEARCH & INFO ACCESS → METADATA EXTRACTION Document → Metadata Easy to extract: File type, name & location, creation & modification date, authors Difficult to extract: Keywords, people & companies mentioned, suppliers & addresses mentioned
  • 13. SEARCH & INFO ACCESS KEYWORD EXTRACTION Document Candidates Keywords Hi All, As of today, MetaStock has several new functions. The most important new feature is the ability to display forward heat rate charts. Also, notice that the interface looks different -- this reflects and accommodates the new features. If you have any questions regarding this new version of MetaStock, please contact Bella Santuri.
  • 14. SEARCH & INFO ACCESS KEYWORD EXTRACTION Document Candidates Keywords Hi All, As of today, MetaStock has several new functions. The most important new feature is the ability to display forward heat rate charts. Also, notice that the interface looks different -- this reflects and accommodates the new features. If you have any questions regarding this new version of MetaStock, please contact Bella Santuri.
  • 15. SEARCH & INFO ACCESS KEYWORD EXTRACTION Document Candidates Properties Keywords Properties: Frequency, Position, Corpus stats, Relatedness Hi All, As of today, MetaStock has several new functions. The most important new feature is the ability to display forward heat rate charts. Also, notice that the interface looks different -- this reflects and accommodates the new features. If you have any questions regarding this new version of MetaStock, please contact Bella Santuri.
  • 16. SEARCH & INFO ACCESS KEYWORD EXTRACTION Document Candidates Properties Scoring Keywords Scoring: Heuristic scoring, Machine learning Hi All, As of today, MetaStock has several new functions. The most important new feature is the ability to display forward heat rate charts. Also, notice that the interface looks different -- this reflects and accommodates the new features. If you have any questions regarding this new version of MetaStock, please contact Bella Santuri.
  • 17. SEARCH & INFO ACCESS NAMES EXTRACTION Document Examples Properties Learning Names If you have any questions regarding this new version of MetaStock, please contact Bella Santuri. NLP, Heuristics, Text mining; Training data (annotations); Machine Learning
  • 18. <SEARCH + TEXT ANALYTICS> COMPANIES Pingar, BasisTech, AlchemyAPI, LanguageComputer, OpenCalais, Extractiv
  • 19. BRAND & CUSTOMER MANAGEMENT → SENTIMENT ANALYSIS Reviews Document Document Visualization Tweets Sentiment Analysis Summary Surveys Naïve approach: Sentiment-words dictionary! Negative: suck, terrible, awful Positive: fantastic, excellent, awesome BUT: "If you are reading this because it is your darling fragrance, please wear it at home exclusively, and tape the windows shut." No sentiment words!
  • 20. BRAND & CUSTOMER MANAGEMENT → SENTIMENT ANALYSIS Reviews Document Document Visualization Tweets Examples Properties Learning Summary Surveys Properties: Presence, Position, Part-of-Speech, Negation; Training data (annotations); Lexicon induction, Generalization; Machine Learning Important: Identifying sentiment bearing sentences Attaching sentiment to a topic!
  • 22. RESEARCH → TEXT SUMMARIZATION Address: Hi All, Announcement: As of today, MetaStock has several new functions. Details: The most important new feature is the ability to display forward heat rate charts. More details: Also, notice that the interface looks different -- this reflects and accommodates the new features. Conclusion: If you have any questions regarding this new version of MetaStock, please contact Bella Santuri. Extractive summary: As of today, MetaStock has several new functions. Sentence compression: MetaStock has several new functions. The new interface looks different. Abstractive summary: MetaStock has new features and a new interface.
  • 24. COMPETITIVE INTELLIGENCE: ENTITY & ENTITY RELATION EXTRACTION Companies: OpenCalais, Extractiv, Pingar, Evri, AlchemyAPI, Zemanta
  • 25. FRAUD INVESTIGATION: NORMALIZATION OF DATES & NAMES Companies: Cicero, BasisTech
  • 26. OPEN-SOURCE TOOLS • NLTK – Apache license, Book, Python & academic datasets, nltk.org • LingPipe – Commercial licenses, Tutorials, Coreference & Chinese segment, alias-i.com/lingpipe • OpenNLP – Apache license, Parsing, MaxEnt ML, incubator.apache.org/opennlp • GATE – restricted GPL, Training courses, Applications & framework, gate.ac.uk • Stanford NLP – full GPL, Online docs, Full library, nlp.stanford.edu
  • 27. APIs What’s an API and how does it work? What are the advantages of the API model? Which API is the right one for you?
  • 28. API ACCESS Developer: creates an application. API: an interface that ensures communication. ENGINE: software engine solves a specific task. SDK: usage examples. A protocol (• SOAP • REST) specifies how XML needs to be encoded; a call is an XML message describing the request; includes API authentication; calls via a web service
  • 29. REST API ACCESS FROM A BROWSER API request http://search.yahooapis.com/WebSearchService/V1/webSearch?appid=YahooDemo&query=madonna&context=Italian+sculptors+and+painters+of+the+renaissance+favored+the+Virgin+Mary+for+inspiration API response
  • 30. SOAP API ACCESS FROM VS2010
  • 31. SOAP API ACCESS IN POWERSHELL Read complete blog post “Bulk metadata extraction in SharePoint”: http://bit.ly/powershell-migrate
  • 32. API = EASY INTEGRATION & FLEXIBILITY • Integrate into existing architecture via any programming language • Improve known flaws in the current system/process • Minimize adoption barriers within the company: little or no training required for staff • Only pay for the features you need • Flexible deployment: • Host API on site = Secure data exchange • Access the API in the cloud = Save on tech support & hardware
  • 33. WHICH API IS BEST FOR YOU? I need to take some text and get a list of the important entities/keywords/phrases. APIs: Y: Term Extractor, OpenCalais, BeliefNetworks, OpenAmplify, AlchemyAPI (2nd), Evri (1st) Criteria: API restrictions, Supported languages, Quality of results, Semantic links, Synonyms/Duplicates Blog post on API comparison: faganm.com/blog
  • 34. HOW TO CHOOSE AN API: • Define a specific task • Think of what features are important • Get prepared: • Subscribe for API keys • Get SDKs • Learn libraries • Find representative data • Build a test framework • Compare results
  • 35. METADATA EXTRACTION IN SHAREPOINT Demo Pingar’s add-on for SharePoint 2010 built using a text analytics API
  • 36. INTEGRATING APIS INTO SCANNING Video Using Fuji Xerox SmartConnect and Pingar API to scan documents in batch into SharePoint http://www.youtube.com/watch?v=kluVp25upag
  • 37.
  • 38. THE NEXT-GENERATION SHAREPOINT: POWERED BY TEXT ANALYTICS • What can be automated? • Metadata extraction, Data entry, Opinion mining, Sanitization, Doc approval, Summarization, … • How to integrate text analytics into existing SharePoint applications? • Easy! Via an API • How to find the right text analytics API? • Review what’s available, set up an experiment, compare results
  • 39. Thank you to all of our Sponsors
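
The keyword extraction pipeline on slides 13 to 16 (document, candidates, properties, scoring, keywords) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Pingar's method: the stopword list, the `extract_keywords` name and the scoring formula (frequency weighted by how early a word first appears) are all hypothetical, and the corpus statistics and relatedness properties from slide 15 are left out.

```python
import re
from collections import Counter

# Sample document from the slides.
EMAIL = (
    "Hi All, As of today, MetaStock has several new functions. "
    "The most important new feature is the ability to display forward heat rate charts. "
    "Also, notice that the interface looks different -- this reflects and "
    "accommodates the new features. If you have any questions regarding this "
    "new version of MetaStock, please contact Bella Santuri."
)

# Hypothetical stopword list, chosen only for this example.
STOPWORDS = {
    "a", "all", "also", "and", "any", "as", "has", "have", "hi", "if", "is",
    "most", "of", "please", "several", "that", "the", "this", "to", "today", "you",
}


def extract_keywords(text, top_n=5):
    """Rank single-word candidates by a simple heuristic score."""
    words = re.findall(r"[a-z]+", text.lower())

    # Step 1: candidates are all words that are not stopwords.
    candidates = [w for w in words if w not in STOPWORDS]
    if not candidates:
        return []

    # Step 2: properties are frequency and position of first occurrence.
    freq = Counter(candidates)
    first_pos = {}
    for i, w in enumerate(words):
        first_pos.setdefault(w, i)

    # Step 3: heuristic scoring (assumed formula): frequent, early words win.
    total = len(words)

    def score(w):
        return freq[w] * (1.0 - first_pos[w] / total)

    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(freq, key=lambda w: (-score(w), w))[:top_n]


if __name__ == "__main__":
    print(extract_keywords(EMAIL, top_n=5))
```

With frequency and position alone, a generic word such as "new" outranks "MetaStock" in this email, which suggests why the slides also list corpus statistics and machine-learned scoring.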

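The naïve dictionary approach from slide 19 can be sketched as follows. The two word lists are the ones shown on the slide; the `naive_sentiment` function and its counting rule are assumptions made for illustration only.

```python
import re

# Word lists taken from slide 19.
NEGATIVE = {"suck", "terrible", "awful"}
POSITIVE = {"fantastic", "excellent", "awesome"}


def naive_sentiment(text):
    """Label text by counting dictionary hits (assumed rule: majority wins)."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    review = (
        "If you are reading this because it is your darling fragrance, "
        "please wear it at home exclusively, and tape the windows shut."
    )
    # The review is clearly negative, but it contains no dictionary words.
    print(naive_sentiment(review))
```

The fragrance review contains none of the dictionary words, so a counter like this cannot label it as negative, which is the limitation the slide points out.
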
Editor's Notes

  1. Opening slide please include
  2. How many hours per week does an average computer user spend on searching? What the heck is text analytics: a 101 introduction course… How APIs work and why they are great for both business people and developers.
  3. What are your primary applications where text comes into play?