.consulting .solutions .partnership
Text Analysis with SAP HANA
© msg | September 2015 | Text Analysis with SAP HANA - IT Conference on SAP Technologies by msg
1. Motivation - Big Data
2. Text Analysis with SAP HANA
3. Enhancement Options
1. Motivation - Big Data
Big Data - taking a closer look
• Big Data is a hot topic today, but what is hidden in “Big Data”?
• According to Merrill Lynch, 80-90% of all potentially usable business information may originate in unstructured form
(Structure, Models and Meaning: Is "unstructured" data merely unmodeled?, Intelligent Enterprise, March 1, 2005.)
• According to Computerworld, unstructured information might account for more than 70%–80% of all data in organizations
(Holzinger, Andreas; et al. (2013). "Combining HCI, Natural Language Processing, and Knowledge Discovery - Potential of IBM Content Analytics as an Assistive Technology in the Biomedical Field" in Human-Computer Interaction and Knowledge Discovery in Complex, Unstructured, Big Data. Lecture Notes in Computer Science. Springer. pp. 13–24)
• This data is expected to grow to as much as 40 zettabytes by 2020
• The data may originate from:
− Social Networks
− Call Centers
− “Letters” from customers
− ...
What is the Problem with Unstructured Data?
• It is unstructured!
− Not organized
− No predefined data model
− No metadata, or data and metadata mixed together
→ Limited or no access to the data via classical programs
• But the data contains valuable information
→ We have a lot of business-relevant information, but we cannot access it
How can we solve that issue?
• Text analysis: extracting high-quality information from texts
• Typical process of a text analysis:
− Parsing the text
− Adding features such as linguistic information
− Storing the results in the database in a structured form
• Examples of typical text analysis tasks:
− Entity recognition: Is it an organization, a person, or a place? This includes domain facts such as requests.
− Sentiment analysis: What attitudinal information is “hidden” in the text?
− Relationship, fact and event extraction
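The three-step process above can be illustrated with a deliberately naive Python sketch. It is not how SAP HANA works internally: the regex tokenizer, the single "capitalized" feature and the in-memory SQLite table `tokens` are assumptions chosen only to show how free text turns into queryable rows.

```python
import re
import sqlite3

def parse(text: str) -> list[str]:
    """Step 1: parse the text into word tokens (naive regex tokenizer)."""
    return re.findall(r"\w+", text)

def annotate(tokens: list[str]) -> list[tuple[int, str, str]]:
    """Step 2: add a feature per token (here only a crude capitalization flag)."""
    return [
        (position, token, "CAPITALIZED" if token[:1].isupper() else "WORD")
        for position, token in enumerate(tokens)
    ]

def store(conn: sqlite3.Connection, doc_id: int, rows: list[tuple[int, str, str]]) -> None:
    """Step 3: insert the annotated tokens into a table in structured form."""
    conn.executemany(
        "INSERT INTO tokens (doc_id, position, token, kind) VALUES (?, ?, ?, ?)",
        [(doc_id, position, token, kind) for position, token, kind in rows],
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tokens (doc_id INTEGER, position INTEGER, token TEXT, kind TEXT)")
store(conn, 1, annotate(parse("Anna called the SAP hotline.")))

# Once stored, the text can be queried like any other structured data.
capitalized = conn.execute(
    "SELECT token FROM tokens WHERE kind = 'CAPITALIZED' ORDER BY position"
).fetchall()
print(capitalized)  # [('Anna',), ('SAP',)]
```

Real text analysis replaces each of these toy steps with linguistic processing, but the shape of the result (one row per extracted unit, linked to the source document) is the same idea.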
2. Text Analysis with SAP HANA
What does this have to do with SAP HANA?
© SAP SE
Text Analysis with HANA - Basics
• Starting point: a database table containing the text
• Supported data types are:
− TEXT
− BINTEXT
− NVARCHAR
− VARCHAR
− NCLOB
− CLOB
− BLOB
Text Analysis with HANA - Basics
Fulltext index including options (see system view SYS.FULLTEXT_INDEXES)
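A fulltext index with text analysis is created with SQL. The Python helper below only assembles such a statement as a string. It is a sketch, assuming the `CREATE FULLTEXT INDEX ... CONFIGURATION '...' TEXT ANALYSIS ON` form described in the SAP HANA Search Developer Guide (check the exact option list for your release). The function name and the sample table `CUSTOMER_FEEDBACK` are hypothetical; the type check mirrors the supported data types listed on the previous slide.

```python
# Data types that can carry a fulltext index, as listed on the slide.
SUPPORTED_TYPES = {"TEXT", "BINTEXT", "NVARCHAR", "VARCHAR", "NCLOB", "CLOB", "BLOB"}

def create_fulltext_index_sql(index: str, table: str, column: str, column_type: str,
                              configuration: str = "EXTRACTION_CORE",
                              text_analysis: bool = True) -> str:
    """Build a CREATE FULLTEXT INDEX statement (hypothetical helper, sketch only)."""
    if column_type.upper() not in SUPPORTED_TYPES:
        raise ValueError(f"column type {column_type!r} cannot carry a fulltext index")
    sql = (f'CREATE FULLTEXT INDEX "{index}" ON "{table}" ("{column}") '
           f"CONFIGURATION '{configuration}'")
    if text_analysis:
        # TEXT ANALYSIS ON requests the $TA_ result table (per the Search Developer Guide).
        sql += " TEXT ANALYSIS ON"
    return sql

print(create_fulltext_index_sql("IDX_FEEDBACK", "CUSTOMER_FEEDBACK", "FEEDBACK_TEXT", "NCLOB"))
```

The configuration argument is where LINGANALYSIS_* or EXTRACTION_CORE* names would go; after the index exists, its options can be inspected in SYS.FULLTEXT_INDEXES.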
Text Analysis with HANA - Basics
Index properties on the table
Text Analysis with HANA - Basics
Fulltext index table $TA_*
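The text analysis results end up in a separate table named after the index. The sketch below assumes the `$TA_<index name>` naming in the schema of the source table and the result columns `TA_TOKEN`, `TA_NORMALIZED`, `TA_TYPE` and `TA_COUNTER` as described in the SAP HANA Text Analysis Developer Guide; the helper names and the schema `SHOP` are hypothetical. Because the table name starts with `$`, it is written as a quoted identifier.

```python
def ta_table(schema: str, index: str) -> str:
    """Quoted name of the result table of a fulltext index (assumed $TA_<index> naming)."""
    return f'"{schema}"."$TA_{index}"'

def entities_of_type_sql(schema: str, index: str) -> str:
    """Parameterized query listing extracted entities of one type (bind the type to '?')."""
    return (
        f"SELECT TA_TOKEN, TA_NORMALIZED, TA_TYPE FROM {ta_table(schema, index)} "
        "WHERE TA_TYPE = ? ORDER BY TA_COUNTER"
    )

print(entities_of_type_sql("SHOP", "IDX_FEEDBACK"))
```

With a Python DB-API driver (for example SAP's `hdbcli`), the entity type such as `PERSON` would be passed as a bound parameter rather than concatenated into the string.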
Text Analysis with HANA – Linguistic Analysis
LINGANALYSIS_BASIC = Tokenization
Text Analysis with HANA – Linguistic Analysis
LINGANALYSIS_STEMS = Tokenization + Stems
Text Analysis with HANA – Linguistic Analysis
LINGANALYSIS_FULL = Tokenization + Stems + Part-of-Speech Tagging
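The difference between the three LINGANALYSIS levels is which attributes are filled per token. The toy Python model below imitates that progression with a hand-written, hypothetical four-word lexicon. HANA uses its own language-specific linguistic resources, and the assumption that the part-of-speech tag is stored in `TA_TYPE` should be checked against the Text Analysis Developer Guide.

```python
import re

# Hypothetical mini-lexicon: surface form -> (stem, part-of-speech tag).
LEXICON = {
    "customers": ("customer", "noun"),
    "complained": ("complain", "verb"),
    "about": ("about", "preposition"),
    "delays": ("delay", "noun"),
}
LEVELS = ("BASIC", "STEMS", "FULL")

def analyze(text: str, level: str) -> list[dict[str, str]]:
    """Imitate LINGANALYSIS_BASIC / _STEMS / _FULL on a toy lexicon."""
    if level not in LEVELS:
        raise ValueError(f"unknown level {level!r}")
    rows = []
    for token in re.findall(r"\w+", text):
        row = {"TA_TOKEN": token}  # BASIC: tokenization only
        stem, pos = LEXICON.get(token.lower(), (token.lower(), "unknown"))
        if level in ("STEMS", "FULL"):
            row["TA_STEM"] = stem  # STEMS: tokens + stems
        if level == "FULL":
            row["TA_TYPE"] = pos  # FULL: tokens + stems + part-of-speech tags
        rows.append(row)
    return rows

for level in LEVELS:
    print(level, analyze("Customers complained about delays", level)[1])
```

Each level is a superset of the previous one, which is why FULL is the most expensive but also the most informative setting.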
Text Analysis with HANA – Entity Extraction
• To get more information out of the data, SAP delivers several predefined configurations
• These configurations focus on entity and fact extraction, each with a specific focus
• Types of extraction:
− EXTRACTION_CORE
− EXTRACTION_CORE_ENTERPRISE
− EXTRACTION_CORE_PUBLIC_SECTOR
− EXTRACTION_CORE_VOICEOFCUSTOMER
Text Analysis with HANA – Entity Extraction
EXTRACTION_CORE = Basic Entity Extraction (People, Organizations, Places)
Text Analysis with HANA – Entity Extraction
EXTRACTION_CORE_VOICEOFCUSTOMER = Basic Entity Extraction + Sentiments
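Once EXTRACTION_CORE_VOICEOFCUSTOMER has filled the result table, the sentiment rows can be aggregated per document. The sketch below works on rows already fetched from the `$TA_` table as `(document id, TA_TYPE)` pairs. The sentiment type names are assumed from SAP's Voice of the Customer documentation; the weights and the sample rows are purely hypothetical.

```python
from collections import defaultdict

# Hypothetical weights for the (assumed) Voice of the Customer sentiment types.
SENTIMENT_WEIGHTS = {
    "StrongPositiveSentiment": 2,
    "WeakPositiveSentiment": 1,
    "NeutralSentiment": 0,
    "WeakNegativeSentiment": -1,
    "StrongNegativeSentiment": -2,
}

def sentiment_scores(rows: list[tuple[int, str]]) -> dict[int, int]:
    """Sum sentiment weights per document; non-sentiment entities are ignored."""
    scores: dict[int, int] = defaultdict(int)
    for doc_id, ta_type in rows:
        if ta_type in SENTIMENT_WEIGHTS:
            scores[doc_id] += SENTIMENT_WEIGHTS[ta_type]
    return dict(scores)

sample_rows = [  # hypothetical rows, not real output
    (1, "PERSON"),
    (1, "StrongPositiveSentiment"),
    (2, "WeakNegativeSentiment"),
    (2, "StrongNegativeSentiment"),
]
print(sentiment_scores(sample_rows))  # {1: 2, 2: -3}
```

In practice, the same aggregation could also be pushed into SQL with a `GROUP BY` on the result table, which avoids moving the rows to the client.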
3. Enhancement Options
Text Analysis with HANA – Custom Dictionary
• In many use cases, you may need to extend the dictionary with terms from your business domain
• Structure of a dictionary
© SAP SE
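Custom dictionaries are XML files. The Python sketch below generates a minimal one with the standard library. The namespace `http://www.sap.com/ta/4.0` and the element names `dictionary`, `entity_category`, `entity_name` and `variant` are assumptions based on the SAP HANA Text Analysis Extraction Customization Guide; the category and terms are hypothetical examples.

```python
import xml.etree.ElementTree as ET

TA_NS = "http://www.sap.com/ta/4.0"  # assumed dictionary namespace
ET.register_namespace("", TA_NS)     # serialize it as the default namespace

def build_dictionary(category: str, entries: dict[str, list[str]]) -> str:
    """Return dictionary XML: one entity category, standard forms with variants."""
    root = ET.Element(f"{{{TA_NS}}}dictionary")
    cat = ET.SubElement(root, f"{{{TA_NS}}}entity_category", name=category)
    for standard_form, variants in entries.items():
        entity = ET.SubElement(cat, f"{{{TA_NS}}}entity_name", standard_form=standard_form)
        for variant in variants:
            ET.SubElement(entity, f"{{{TA_NS}}}variant", name=variant)
    return ET.tostring(root, encoding="unicode")

xml_text = build_dictionary("SPORTS_EQUIPMENT", {"kettlebell": ["kettle bell", "KB"]})
print(xml_text)
```

Each variant is mapped to its standard form, so "kettle bell" and "KB" would both be reported as the entity "kettlebell" of the custom type. An XML declaration can be prepended if the repository requires one.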
Text Analysis with HANA – Workflow of Enhancement
1. Find the extraction configuration that best fits your needs
2. Copy the configuration into the target folder
3. Create a new custom dictionary
4. Reference the dictionary in your configuration copy
5. Recreate the fulltext index using your custom configuration
Text Analysis with HANA – Workflow of Enhancement
1. Find the extraction configuration that best fits your needs
Text Analysis with HANA – Workflow of Enhancement
2. Copy the configuration into the target folder
Important: The file suffix must be *.hdbtextconfig
Text Analysis with HANA – Workflow of Enhancement
3. Create a new custom dictionary
Important: The file suffix must be *.hdbtextdict
Text Analysis with HANA – Workflow of Enhancement
4. Reference the dictionary in your configuration copy
Important: You have to specify the full path of the dictionary
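Because the configuration must reference the dictionary by its full path, a small helper can assemble and sanity-check such references. This is a sketch, assuming the XS classic repository notation `<package>::<file>`; the helper, its naming rule for packages and the package `acme.crossfit.ta` are hypothetical.

```python
import re

# Dotted repository package path such as "acme.crossfit.ta" (assumed naming rule).
PACKAGE_RE = re.compile(r"\w+(\.\w+)*")

def repo_reference(package: str, file_name: str, suffix: str) -> str:
    """Return a full '<package>::<file>' reference after basic checks."""
    if not PACKAGE_RE.fullmatch(package):
        raise ValueError(f"not a dotted package path: {package!r}")
    if not file_name.endswith(suffix):
        raise ValueError(f"{file_name!r} must end with {suffix!r}")
    return f"{package}::{file_name}"

print(repo_reference("acme.crossfit.ta", "crossfit.hdbtextdict", ".hdbtextdict"))
```

Checking the suffix up front catches the most common mistake from the previous steps: a dictionary saved without the .hdbtextdict suffix is not recognized as a dictionary.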
Text Analysis with HANA – Workflow of Enhancement
5. Recreate the fulltext index using your custom configuration
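Recreating the index means dropping it and creating it again with the custom configuration. The statements below are a sketch, assuming that a repository configuration is referenced as `<package>::<configuration name>` without the `.hdbtextconfig` suffix (please verify this against the Text Analysis Developer Guide); the index, table, column and package names are hypothetical.

```python
def recreate_index_sql(index: str, table: str, column: str, configuration: str) -> list[str]:
    """Statements to rebuild a fulltext index with a custom TA configuration (sketch)."""
    return [
        f'DROP FULLTEXT INDEX "{index}"',
        f'CREATE FULLTEXT INDEX "{index}" ON "{table}" ("{column}") '
        f"CONFIGURATION '{configuration}' TEXT ANALYSIS ON",
    ]

for statement in recreate_index_sql("IDX_POSTS", "FORUM_POSTS", "POST_TEXT",
                                    "acme.crossfit.ta::crossfit"):
    print(statement)
```

Running both statements in order rebuilds the text analysis results, so the custom dictionary entries only show up after this step.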
Text Analysis with HANA – Enhancement of Sentiment Analysis
• Special case: enhancing the sentiment analysis
• You can directly enhance or tailor the files delivered by SAP
Text Analysis with HANA – What’s next?
• Assume that we are in an “industry”-specific context or mining for “slang”-like facts and entities
• Sports are a good example of this!
• We use CrossFit® as an example, as there are some funny facts to extract
• Question: How can we extract complex entities from a text?
• Examples:
− Did somebody attend a CrossFit training?
− Does somebody want to join a CrossFit box?
Text Analysis with HANA – Text Analysis Extraction Rules
Setup and Status Quo
Text Analysis with HANA – Text Analysis Extraction Rules
• Extraction rules (CGUL rules, Custom Grouper User Language): a pattern-based language that uses character- or token-based regular expressions combined with linguistic attributes to define custom entity types.
• Goals of the rule sets:
− Extract complex facts based on relations between entities and predicates.
− Define entity-to-entity relations that associate entities such as times, dates, and locations with other entities.
− Identify entities in domain-specific language.
− Capture facts expressed in new, popular “slang”.
Text Analysis with HANA – Text Analysis Extraction Rules
Building blocks of an extraction rule: regular expressions, tokens, dictionaries and luck ☺
Text Analysis with HANA – Tokens, Operators, Expression Markers and Directives
• Tokens define the syntactic units of the text analysis
<string, STEM: <stem>, POS: <postag>>
• Example: <activat.*, STEM: activat.*, POS: V>
• Several operators are available for matching:
− Standard operators, e.g. character wildcard “.”, alternation “|”
− Iteration operators, e.g. zero or one occurrence of the preceding item “?”; zero or more occurrences of the preceding item “*”
− Grouping and containment operators, e.g. item group “( )”, range group “[ ]”
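To make the token syntax concrete, here is a toy Python emulation of `<string, STEM: <stem>, POS: <postag>>`: each constraint is a regular expression that must match the whole token text, stem or tag. It only illustrates the matching idea and is not CGUL itself; the tokens, stems and tags are hand-written assumptions, and iteration operators across tokens are not supported.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class Token:
    text: str
    stem: str
    pos: str

@dataclass
class TokenPattern:
    """Toy counterpart of <string, STEM: <stem>, POS: <postag>>; None = unconstrained."""
    text: Optional[str] = None
    stem: Optional[str] = None
    pos: Optional[str] = None

    def matches(self, token: Token) -> bool:
        pairs = ((self.text, token.text), (self.stem, token.stem), (self.pos, token.pos))
        # Every given constraint must match the complete value, like an anchored regex.
        return all(regex is None or re.fullmatch(regex, value) for regex, value in pairs)

def find(patterns: list[TokenPattern], tokens: list[Token]) -> list[int]:
    """Start positions where consecutive tokens match the pattern sequence."""
    n = len(patterns)
    return [
        i for i in range(len(tokens) - n + 1)
        if all(p.matches(t) for p, t in zip(patterns, tokens[i:i + n]))
    ]

tokens = [Token("We", "we", "Pron"), Token("activated", "activate", "V"),
          Token("the", "the", "Det"), Token("new", "new", "Adj"), Token("box", "box", "Nn")]

# Slide example: <activat.*, STEM: activat.*, POS: V>
print(find([TokenPattern("activat.*", "activat.*", "V")], tokens))            # [1]
# Alternation "|" inside a token: an adjective followed by "box" or "gym"
print(find([TokenPattern(pos="Adj"), TokenPattern(text="box|gym")], tokens))  # [3]
```

The point of combining surface form, stem and part of speech is that one pattern covers "activate", "activated" and "activating" while still excluding non-verb uses.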
Text Analysis with HANA – Tokens, Operators, Expression Markers and Directives
• Expression markers delimit the span of text in which the searched terms must occur
• Several markers are available:
− Paragraph Marker: Specifies the beginning and end of a paragraph – [P] <expr> [/P]
− Entity Marker: Limits an expression to one or several entity types – [TE <type>] <expr> [/TE]
− Sentence Marker: Specifies the beginning and end of a sentence – [SN] <expr> [/SN]
− Clause Container: Matches the entire clause if the expression is matched somewhere in the clause – [CC] <expr> [/CC]
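The effect of a sentence marker such as `[SN] <expr> [/SN]` is that a match may not run across a sentence boundary. The toy Python sketch below mimics this with a naive punctuation-based sentence splitter (an assumption; HANA uses its own linguistic sentence detection) and contrasts it with an unrestricted search over the whole text.

```python
import re

def sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", text.strip())

def match_within_sentences(pattern: str, text: str) -> list[str]:
    """Sentences in which the pattern matches; matches never cross a boundary."""
    return [s for s in sentences(text) if re.search(pattern, s)]

text = ("I attended a yoga class. CrossFit is next week. "
        "Yesterday I attended a CrossFit training!")
pattern = r"\battended\b.*\bCrossFit\b"

# True, but this match starts in the first sentence and crosses sentence boundaries.
print(bool(re.search(pattern, text)))
print(match_within_sentences(pattern, text))  # ['Yesterday I attended a CrossFit training!']
```

Without the sentence restriction, the yoga class would wrongly count as a CrossFit attendance, which is exactly the kind of false positive the markers are meant to prevent.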
Text Analysis with HANA – Tokens, Operators, Expression Markers and Directives
• Directives allow the definition of character classes, groups of tokens and relation types
• #define (character class): denotes character expressions
Example: #define ALPHA: [A-Za-z]
• #subgroup (group of tokens): defines a group of one or more tokens
Example: #subgroup Cloud: <HCP>|<AWS>|<Azure>
• #group (relation type): defines custom facts and entity types consisting of one or more tokens
Example:
#group HANA: <HANA>
#group HANANATIVE: %(HANA) <native>
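References such as `%(HANA)` let one definition reuse another. The toy Python sketch below parses directive lines like the ones on this slide and expands the references textually, which shows how a `#group` is composed from smaller parts. It works under simplifying assumptions (it treats `#define`, `#subgroup` and `#group` alike and ignores all other CGUL syntax) and is not the HANA rule compiler.

```python
import re

DIRECTIVE = re.compile(r"#(define|subgroup|group)\s+(\w+)\s*:\s*(.+)")
REFERENCE = re.compile(r"%\((\w+)\)")

def parse_directives(source: str) -> dict[str, str]:
    """Map each directive name to its (unexpanded) body."""
    definitions = {}
    for line in source.splitlines():
        match = DIRECTIVE.match(line.strip())
        if match:
            definitions[match.group(2)] = match.group(3).strip()
    return definitions

def expand(name: str, definitions: dict[str, str], seen: tuple[str, ...] = ()) -> str:
    """Recursively replace %(NAME) references, wrapping each in parentheses."""
    if name in seen:
        raise ValueError(f"cyclic reference via {name!r}")
    return REFERENCE.sub(
        lambda m: "(" + expand(m.group(1), definitions, seen + (name,)) + ")",
        definitions[name],
    )

rules = """
#define ALPHA: [A-Za-z]
#subgroup Cloud: <HCP>|<AWS>|<Azure>
#group HANA: <HANA>
#group HANANATIVE: %(HANA) <native>
"""
defs = parse_directives(rules)
print(expand("HANANATIVE", defs))  # (<HANA>) <native>
```

Wrapping each expansion in parentheses keeps alternations such as `<HCP>|<AWS>|<Azure>` from leaking into the surrounding expression when a subgroup is reused.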
Text Analysis with HANA – Text Analysis Extraction Rules
Step 1 – Create a dictionary (It is all about entities)
Text Analysis with HANA – Text Analysis Extraction Rules
Step 2 – Create a custom configuration
Text Analysis with HANA – Text Analysis Extraction Rules
Recreate the fulltext index with the custom configuration
Text Analysis with HANA – Text Analysis Extraction Rules
Next step: Create a simple plain text rule (*.hdbtextrule) and adapt the configuration
Text Analysis with HANA – Text Analysis Extraction Rules
Result of the plain rule
Text Analysis with HANA – Text Analysis Extraction Rules
Refactor and enhance the rule
Text Analysis with HANA – Text Analysis Extraction Rules
Reduce the extracted entities using the PreProcessor Configuration
Text Analysis with HANA – Summary
• SAP HANA offers a broad range of functionality
• One very powerful feature is text analysis
• Besides the delivered content, you have many options to adapt the text analysis so that it extracts the entities and facts you need
• Since SPS09, rules are compiled upon activation (no separate compilation necessary)
• Creating custom dictionaries and text rules is cumbersome: there is no support in the IDE
• The results of the text analysis form the basis of predictive analytics (also part of SAP HANA ☺)
Q&A
Dr. Christian Lechner
Principal IT Consultant
+49 (0) 171 7617190
christian.lechner@msg-systems.com
msg systems ag (Headquarters)
Robert-Buerkle-Str. 1, 85737 Ismaning
Germany
www.msg-systems.com
Text Analysis with HANA – Resources
• SAP HANA Search Developer Guide (Fulltext Index Options)
help.sap.com -> Search Developer Guide
• SAP HANA Text Analysis Developer Guide:
help.sap.com -> TA Developer Guide
• SAP HANA Text Analysis Language Reference Guide:
help.sap.com -> TA Language Reference Guide
• SAP HANA Text Analysis Extraction Customization Guide:
help.sap.com -> TA Extraction Customization Guide
• YouTube Playlist of SAP HANA Academy:
Text Analysis and Search

Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAnitaRaj43
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)Samir Dash
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 

Último (20)

ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 

Text Analysis with SAP HANA

  • 2. Text Analysis with SAP HANA
    © msg | September 2015 | Text Analysis with SAP HANA - IT Conference on SAP Technologies by msg
    1 Motivation - Big Data (slide 3)
    2 Text Analysis with SAP HANA (slide 7)
    3 Enhancement Options (slide 21)
  • 3. Text Analysis with SAP HANA (agenda repeated; section 1: Motivation - Big Data)
  • 4. Big Data - taking a closer look
    • Big Data is a hot topic today, but what is hidden in “Big Data”?
    • According to Merrill Lynch, 80-90% of all potentially usable business information may originate in unstructured form
      (Structure, Models and Meaning: Is "unstructured" data merely unmodeled?, Intelligent Enterprise, March 1, 2005.)
    • According to Computerworld, unstructured information might account for more than 70%–80% of all data in organizations
      (Holzinger, Andreas; et al. (2013). "Combining HCI, Natural Language Processing, and Knowledge Discovery - Potential of IBM Content Analytics as an Assistive Technology in the Biomedical Field" in Human-Computer Interaction and Knowledge Discovery in Complex, Unstructured, Big Data. Lecture Notes in Computer Science. Springer. pp. 13–24)
    • This data is expected to grow to 40 zettabytes by 2020
    • The data might originate from:
      − Social networks
      − Call centers
      − “Letters” from customers
      − ...
  • 5. What is the Problem with Unstructured Data?
    • It is unstructured!
      − Not organized
      − No predefined data model
      − No metadata, or a mix of data and metadata
      Limited or no access to the data via classical programs
    • But the data contains valuable information
      We have a lot of information that is relevant to the business, but we cannot access it
  • 6. How can we solve that issue?
    • Text analysis: extracting high-quality information from texts
    • Typical process of a text analysis:
      − Parsing the text
      − Adding features such as linguistic information
      − Inserting the results into the database in a structured manner
    • Examples of typical text analysis tasks:
      − Entity recognition: is it an organization, a person, or a place, including domain facts such as requests?
      − Sentiment analysis: what attitudinal information is “hidden” in the text?
      − Relationship, fact, and event extraction
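The three-step process above (parse, add features, insert in structured form) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how SAP HANA implements it. A regular expression stands in for the parser. The "features" are only a running counter, a lower-cased form and a character offset. An in-memory SQLite table stands in for the HANA table, and its name and columns are hypothetical.

```python
import re
import sqlite3
from typing import List, Tuple

# (doc_id, counter, token, normalized, char_offset): hypothetical row layout
Row = Tuple[int, int, str, str, int]


def analyze_to_rows(doc_id: int, text: str) -> List[Row]:
    """Steps 1 and 2: parse the text into word tokens and attach simple features."""
    rows = []
    for counter, match in enumerate(re.finditer(r"\w+", text), start=1):
        token = match.group()
        # Features: running counter, lower-cased form, character offset in the text.
        rows.append((doc_id, counter, token, token.lower(), match.start()))
    return rows


def load(conn: sqlite3.Connection, doc_id: int, text: str) -> None:
    """Step 3: insert the features in structured form so plain SQL can query them."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tokens ("
        "doc_id INTEGER, counter INTEGER, token TEXT, normalized TEXT, char_offset INTEGER)"
    )
    conn.executemany(
        "INSERT INTO tokens VALUES (?, ?, ?, ?, ?)", analyze_to_rows(doc_id, text)
    )
    conn.commit()
```

As an example, take `conn = sqlite3.connect(":memory:")` and `load(conn, 1, "CrossFit is fun. Who wants to join a crossfit box?")`. The query `SELECT COUNT(*) FROM tokens WHERE normalized = 'crossfit'` then finds both spellings, because they share the same normalized form.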
  • 7. Text Analysis with SAP HANA (agenda repeated; section 2: Text Analysis with SAP HANA)
  • 8. What does this have to do with SAP HANA?
    © SAP SE
  • 9. Text Analysis with HANA - Basics
    • Starting point: a database table containing the text
    • Supported data types:
      − TEXT
      − BINTEXT
      − NVARCHAR
      − VARCHAR
      − NCLOB
      − CLOB
      − BLOB
  • 10. Text Analysis with HANA - Basics
    Fulltext index incl. options (see system view SYS.FULLTEXT_INDEXES)
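The helper below is a sketch of the statement behind such an index. It only assembles the SQL text and never connects to a database. CONFIGURATION, LANGUAGE DETECTION and TEXT ANALYSIS ON are clauses of HANA's CREATE FULLTEXT INDEX statement. However, the index, table and column names are hypothetical, and the available options vary by HANA release, so check the Search Developer Guide before relying on it. The identifiers are double-quoted, which makes them case-sensitive in HANA.

```python
from typing import Optional, Sequence


def fulltext_index_ddl(
    index: str,
    table: str,
    column: str,
    configuration: str = "EXTRACTION_CORE",
    languages: Optional[Sequence[str]] = None,
    text_analysis: bool = True,
) -> str:
    """Assemble a CREATE FULLTEXT INDEX statement; nothing is executed here.

    Inputs are not escaped, so pass only trusted identifiers and values.
    """
    parts = [
        f'CREATE FULLTEXT INDEX "{index}" ON "{table}" ("{column}")',
        f"CONFIGURATION '{configuration}'",
    ]
    if languages:
        # Restrict automatic language detection to the listed language codes.
        quoted = ", ".join(f"'{code}'" for code in languages)
        parts.append(f"LANGUAGE DETECTION ({quoted})")
    parts.append(f"TEXT ANALYSIS {'ON' if text_analysis else 'OFF'}")
    return "\n  ".join(parts)
```

For example, `fulltext_index_ddl("IDX_FEEDBACK", "FEEDBACK", "FEEDBACK_TEXT", languages=["EN", "DE"])` yields a statement containing `CONFIGURATION 'EXTRACTION_CORE'`, `LANGUAGE DETECTION ('EN', 'DE')` and `TEXT ANALYSIS ON`.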
  • 11. (image-only slide)
  • 12. Text Analysis with HANA - Basics
    Index properties on the table
  • 13. Text Analysis with HANA - Basics
    Fulltext index table $TA_*
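The $TA_* table is an ordinary table, so the analysis results can be queried with plain SQL. The sketch below mocks a small part of it in SQLite to show a typical aggregation. The column subset is assumed to match HANA's generated table: the source key (ID here) plus TA_RULE, TA_COUNTER, TA_TOKEN and TA_TYPE. The index name IDX_FEEDBACK and any rows you insert are hypothetical.

```python
import sqlite3
from typing import List, Tuple

# HANA names the result table "$TA_<index name>"; IDX_FEEDBACK is a hypothetical index.
# The double quotes make the "$" legal in the identifier.
TA_TABLE = '"$TA_IDX_FEEDBACK"'


def create_mock_ta_table(conn: sqlite3.Connection) -> None:
    """Stand-in for the generated table: the source key (ID) plus a subset of TA_* columns."""
    conn.execute(
        f"CREATE TABLE {TA_TABLE} ("
        "ID INTEGER, TA_RULE TEXT, TA_COUNTER INTEGER, TA_TOKEN TEXT, TA_TYPE TEXT)"
    )


def count_by_type(conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    """Count extracted tokens per type, a typical first query on the results."""
    return conn.execute(
        f"SELECT TA_TYPE, COUNT(*) FROM {TA_TABLE} GROUP BY TA_TYPE ORDER BY TA_TYPE"
    ).fetchall()
```

The same SELECT should also work against the real $TA table in HANA, where the quotes around the name are needed for the same reason.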
  • 14. Text Analysis with HANA – Linguistic Analysis
    LINGANALYSIS_BASIC = Tokenization
  • 15. Text Analysis with HANA – Linguistic Analysis
    LINGANALYSIS_STEMS = Tokenization + Stems
  • 16. Text Analysis with HANA – Linguistic Analysis
    LINGANALYSIS_FULL = Tokenization + Stems + (Part-of-Speech) Tagging
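The three LINGANALYSIS levels build on each other, and the toy sketch below mimics that layering in Python. Its regex tokenizer, suffix-stripping stemmer and tiny part-of-speech lexicon are stand-ins invented for illustration. They are far cruder than HANA's linguistic analysis, and the tag names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Crude suffix list standing in for real morphological stemming (toy only).
SUFFIXES = ("ing", "ed", "es", "s")
# Tiny hypothetical part-of-speech lexicon; real taggers use far richer models.
POS_LEXICON = {"the": "Det", "athletes": "Nn", "were": "V", "training": "V"}
LEVELS = ("BASIC", "STEMS", "FULL")  # mirrors LINGANALYSIS_BASIC/_STEMS/_FULL


@dataclass
class Token:
    text: str
    stem: Optional[str] = None
    pos: Optional[str] = None


def stem(word: str) -> str:
    lower = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if lower.endswith(suffix) and len(lower) - len(suffix) >= 3:
            return lower[: -len(suffix)]
    return lower


def analyze(text: str, level: str) -> List[Token]:
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    # BASIC: tokenization only (words and single punctuation marks).
    tokens = [Token(t) for t in re.findall(r"\w+|[^\w\s]", text)]
    if level in ("STEMS", "FULL"):
        for token in tokens:
            token.stem = stem(token.text)
    if level == "FULL":
        for token in tokens:
            token.pos = POS_LEXICON.get(token.text.lower(), "Unknown")
    return tokens
```

Take "The athletes were training." as input. BASIC returns five tokens without stems. STEMS adds stems such as "train". FULL additionally tags "training" as V in this toy lexicon.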
  • 17. Text Analysis with HANA – Entity Extraction
    • To get more information out of the data, SAP delivers several configurations
    • Each configuration targets entity and fact extraction for a specific purpose
    • Types of extraction:
      − EXTRACTION_CORE
      − EXTRACTION_CORE_ENTERPRISE
      − EXTRACTION_CORE_PUBLIC_SECTOR
      − EXTRACTION_CORE_VOICEOFCUSTOMER
  • 18. (image-only slide)
  • 19. Text Analysis with HANA – Entity Extraction
    EXTRACTION_CORE = Basic Entity Extraction (People, Organizations, Places)
  • 20. Text Analysis with HANA – Entity Extraction
    EXTRACTION_CORE_VOICEOFCUSTOMER = Basic Entity Extraction + Sentiments
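A toy lookup extractor can picture the difference between EXTRACTION_CORE and EXTRACTION_CORE_VOICEOFCUSTOMER. This is only a sketch. The two word lists and the type labels are hypothetical, and the delivered configurations use SAP's own linguistic resources and type names rather than a plain lookup.

```python
import re
from typing import List, Tuple

# Hypothetical toy lexicons, not SAP's resources or type names.
ENTITIES = {"alice": "PERSON", "msg": "ORGANIZATION", "munich": "PLACE"}
SENTIMENTS = {"great": "POSITIVE", "love": "POSITIVE", "slow": "NEGATIVE"}


def extract(text: str, with_sentiment: bool = False) -> List[Tuple[str, str]]:
    """Return (token, type) pairs in text order.

    with_sentiment=False mimics a core extraction (entities only);
    with_sentiment=True also reports sentiment-bearing words.
    """
    results = []
    for match in re.finditer(r"\w+", text):
        word = match.group().lower()
        if word in ENTITIES:
            results.append((match.group(), ENTITIES[word]))
        elif with_sentiment and word in SENTIMENTS:
            results.append((match.group(), SENTIMENTS[word]))
    return results
```

Run on "Alice from msg says Munich is great", the entity-only call finds the person, the organization and the place. With `with_sentiment=True`, it also reports "great".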
  • 21. Text Analysis with SAP HANA (agenda repeated; section 3: Enhancement Options)
  • 22. Text Analysis with HANA – Custom Dictionary
    • In some use cases, you might need to enhance the dictionary to cover your business domain
    • Structure of a dictionary
      © SAP SE
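A custom dictionary groups entity names into categories, and each name has a standard form plus optional variants. The sketch below writes such a file with Python's xml.etree.ElementTree. It makes assumptions about the SAP text analysis dictionary schema: the element names (dictionary, entity_category, entity_name, variant), the attributes and the namespace should be checked against the Extraction Customization Guide. The CROSSFIT_PLACE category is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Assumed namespace of SAP text analysis dictionaries; verify it for your HANA release.
TA_NS = "http://www.sap.com/ta/4.0"


def build_dictionary(categories: Dict[str, Dict[str, List[str]]]) -> str:
    """Serialize {category: {standard_form: [variants]}} as dictionary XML."""
    ET.register_namespace("", TA_NS)  # emit the namespace as the default xmlns
    root = ET.Element(f"{{{TA_NS}}}dictionary")
    for category, entries in categories.items():
        category_el = ET.SubElement(root, f"{{{TA_NS}}}entity_category", name=category)
        for standard_form, variants in entries.items():
            name_el = ET.SubElement(
                category_el, f"{{{TA_NS}}}entity_name", standard_form=standard_form
            )
            for variant in variants:
                ET.SubElement(name_el, f"{{{TA_NS}}}variant", name=variant)
    return ET.tostring(root, encoding="unicode")
```

For example, `build_dictionary({"CROSSFIT_PLACE": {"CrossFit box": ["box", "CrossFit gym"]}})` returns the XML text. It could then be stored in a file with the *.hdbtextdict suffix.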
  • 23. Text Analysis with HANA – Workflow of Enhancement
    1. Find the extraction configuration that best fits your needs
    2. Copy the configuration into the target folder
    3. Create a new custom dictionary
    4. Reference the dictionary in your copy of the configuration
    5. Recreate the fulltext index using your custom configuration
  • 24. (image-only slide)
  • 25. Text Analysis with HANA – Workflow of Enhancement
    1. Find the extraction configuration that best fits your needs
  • 26. Text Analysis with HANA – Workflow of Enhancement
    2. Copy the configuration into the target folder
    Important: the file suffix is *.hdbtextconfig
  • 27. Text Analysis with HANA – Workflow of Enhancement
    3. Create a new custom dictionary
    Important: the file suffix is *.hdbtextdict
  • 28. Text Analysis with HANA – Workflow of Enhancement
    4. Reference the dictionary in your copy of the configuration
    Important: you have to specify the full path of the dictionary
  • 29. Text Analysis with HANA – Workflow of Enhancement
    5. Recreate the fulltext index using your custom configuration
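Step 5 can be scripted. The sketch below takes any DB-API 2.0 cursor (SAP's hdbcli driver provides one). It runs a DROP FULLTEXT INDEX and then a CREATE FULLTEXT INDEX that names the custom configuration. The index, table and column names are hypothetical, and so is the configuration reference acme.ta::CROSSFIT. Its package::name form is an assumption, so check the Text Analysis Developer Guide for how your release references custom configurations.

```python
from typing import List, Protocol


class Cursor(Protocol):
    """The one DB-API 2.0 method used here (for example, an hdbcli cursor)."""

    def execute(self, operation: str) -> object: ...


def recreate_fulltext_index(
    cursor: Cursor, index: str, table: str, column: str, configuration: str
) -> List[str]:
    """Drop the fulltext index and create it again with a custom configuration."""
    statements = [
        f'DROP FULLTEXT INDEX "{index}"',
        f'CREATE FULLTEXT INDEX "{index}" ON "{table}" ("{column}") '
        f"CONFIGURATION '{configuration}' TEXT ANALYSIS ON",
    ]
    for statement in statements:
        cursor.execute(statement)
    return statements
```

To review the statements before running them against a system, pass a small recording fake instead of a real cursor.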
  • 30. Text Analysis with HANA – Enhancement of Sentiment Analysis
    • Special case: enhancement of sentiments
    • You can directly enhance or tailor the files delivered by SAP
  • 31. Text Analysis with HANA – What’s next?
    • Assume that we are in an “industry”-specific context or mining for “slang”-like facts and entities
    • Sports are a good example of this!
    • We use the example of CrossFit®, as there are some funny facts to extract
    • Question: how can we extract complex entities from a text?
    • Examples:
      − Did somebody attend a CrossFit training?
      − Does somebody want to join a CrossFit box?
  • 32. Text Analysis with HANA – Text Analysis Extraction Rules
    Setup and Status Quo
  • 33. Text Analysis with HANA – Text Analysis Extraction Rules
    • Extraction rules (CGUL rules): a pattern-based language that combines character- or token-based regular expressions with linguistic attributes to define custom entity types
    • Goals of the rule sets:
      − Extract complex facts based on relations between entities and predicates
      − Express entity-to-entity relations that associate entities such as times, dates, and locations with other entities
      − Identify entities in domain-specific language
      − Capture facts expressed in new, popular “slang”
  • 34. Text Analysis with HANA – Text Analysis Extraction Rules
    Extraction rule = regular expressions + tokens + dictionaries + luck ☺
  • 35. Text Analysis with HANA – Tokens, Operators, Expression Markers and Directives
    • Tokens define the syntactic units of the text analysis: <string, STEM: <stem>, POS: <postag>>
    • Example: <activat.*, STEM: activat.*, POS: V>
    • Several operators are available for matching:
      − Standard operators, e.g. the character wildcard “.” and alternation “|”
      − Iteration operators, e.g. zero or one occurrence of the preceding item “?”; zero or more occurrences of the preceding item “*”
      − Grouping and containment operators, e.g. item group “( )” and range group “[ ]”
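The following toy Python matcher makes the token notation concrete for a single token pattern such as <activat.*, STEM: activat.*, POS: V>. It only checks each given attribute with an anchored regular expression against tokens that are already analyzed. It is not the CGUL compiler, and the sample tokens and POS tag values are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional


class Token(NamedTuple):
    text: str
    stem: str
    pos: str


class TokenPattern(NamedTuple):
    """One token slot; None means 'any value' for that attribute."""

    text: Optional[str] = None
    stem: Optional[str] = None
    pos: Optional[str] = None

    def matches(self, token: Token) -> bool:
        checks = ((self.text, token.text), (self.stem, token.stem), (self.pos, token.pos))
        # Each given attribute must match the whole value, like an anchored regex.
        return all(
            pattern is None or re.fullmatch(pattern, value) is not None
            for pattern, value in checks
        )


def find_tokens(tokens: List[Token], pattern: TokenPattern) -> List[Token]:
    """Return the tokens that satisfy every attribute constraint of the pattern."""
    return [token for token in tokens if pattern.matches(token)]
```

In this sketch, "activated" tagged V matches the example pattern. "activation" tagged as a noun does not, because its POS constraint fails.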
  • 36. Text Analysis with HANA – Tokens, Operators, Expression Markers and Directives
    • Expression markers define the delimiters of the searched terms
    • Several markers are available:
      − Paragraph marker: specifies the beginning and end of a paragraph – [P]
      − Entity marker: limits an expression to one or several entity types – [TE] <expr> [/TE]
      − Sentence marker: specifies the beginning and end of a sentence – [SN] [/SN]
      − Clause container: matches the entire clause if the expression is matched somewhere in the clause – [CC] <expr> [/CC]
  • 37. Text Analysis with HANA – Tokens, Operators, Expression Markers and Directives
    • Directives define character classes, groups of tokens, and relation types
    • #define (character class): denotes character expressions
      Example: #define ALPHA: [A-Za-z]
    • #subgroup (group of tokens): defines a group of one or more tokens
      Example: #subgroup Cloud: <HCP>|<AWS>|<Azure>
    • #group (relation type): defines custom facts and entity types consisting of one or more tokens
      Example: #group HANA: <HANA>
               #group HANANATIVE: %(HANA) <native>
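A toy matcher over whitespace-separated tokens can illustrate how named alternatives (as in the #subgroup example) combine with a %(...) reference into a token sequence. It is not CGUL. The hypothetical GROUPS table plays the role of the directives, the tuple ("%", name) stands in for %(name), and only exact token equality is supported.

```python
from typing import Dict, List, Sequence, Set, Tuple, Union

# A pattern element is a literal token or ("%", name) referring to a named group.
Element = Union[str, Tuple[str, str]]

# Hypothetical named groups mirroring the slide's Cloud and HANA examples.
GROUPS: Dict[str, Set[str]] = {
    "Cloud": {"HCP", "AWS", "Azure"},
    "HANA": {"HANA"},
}


def element_matches(element: Element, token: str) -> bool:
    if isinstance(element, tuple):
        _, name = element
        return token in GROUPS[name]  # any alternative of the group matches
    return token == element  # literal token: exact match only


def find_sequences(tokens: Sequence[str], pattern: Sequence[Element]) -> List[Tuple[str, ...]]:
    """Return every run of consecutive tokens that matches the pattern element by element."""
    width = len(pattern)
    hits = []
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        if all(element_matches(e, t) for e, t in zip(pattern, window)):
            hits.append(tuple(window))
    return hits
```

On "We build HANA native apps on HCP and AWS", the HANANATIVE-like pattern `[("%", "HANA"), "native"]` finds "HANA native". The Cloud group finds both "HCP" and "AWS".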
  • 38. (image-only slide)
  • 39. Text Analysis with HANA – Text Analysis Extraction Rules
    Step 1 – Create a dictionary (It is all about entities)
  • 40. Text Analysis with HANA – Text Analysis Extraction Rules
    Step 2 – Create a custom configuration
  • 41. Text Analysis with HANA – Text Analysis Extraction Rules
    Recreate the fulltext index with the custom configuration
  • 42. Text Analysis with HANA – Text Analysis Extraction Rules
    Next step: create a simple plain text rule (*.hdbtextrule) and adapt the configuration
  • 43. Text Analysis with HANA – Text Analysis Extraction Rules
    Result of the plain rule
  • 44. Text Analysis with HANA – Text Analysis Extraction Rules
    Refactor and enhance the rule
  • 45. Text Analysis with HANA – Text Analysis Extraction Rules
    Reduce the extracted entities using the PreProcessor Configuration
  • 46. Text Analysis with HANA – Summary
    • SAP HANA contains a lot of functionality
    • One very powerful feature is text analysis
    • Beyond the delivered content, you have many options to adapt the text analysis so that it extracts the entities and facts you need
    • Since SPS09, rules are compiled upon activation (no separate compilation necessary)
    • Creating custom dictionaries and text rules is cumbersome: there is no support in the IDE
    • The results of the text analysis form the basis for predictive analytics (also part of SAP HANA ☺)
  • 47. Q&A
  • 48. .consulting .solutions .partnership
    Dr. Christian Lechner
    Principal IT Consultant
    +49 (0) 171 7617190
    christian.lechner@msg-systems.com
    msg systems ag (Headquarters)
    Robert-Buerkle-Str. 1, 85737 Ismaning
    Germany
    www.msg-systems.com
  • 49. Text Analysis with HANA – Resources
    • SAP HANA Search Developer Guide (Fulltext Index Options): help.sap.com -> Search Developer Guide
    • SAP HANA Text Analysis Developer Guide: help.sap.com -> TA Developer Guide
    • SAP HANA Text Analysis Language Reference Guide: help.sap.com -> TA Language Reference Guide
    • SAP HANA Text Analysis Extraction Customization Guide: help.sap.com -> TA Extraction Customization Guide
    • YouTube playlist of SAP HANA Academy: Text Analysis and Search