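Building a Chinese Natural Language Understanding Engine for Financial Scenarios (打造面向金融場景的中文自然語言理解引擎)
Data R&D Center (數據研究發展中心)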
陳皓遠
About me
• Member of the AI group, CTBC Data R&D Center
• Past experience in
  • Cyber security and defense industry
  • Smartphone industry
• Familiar with
  • Machine learning
  • Natural language processing
  • Software development
  • Cloud native architecture design
Team
• The CTBC Data R&D Center AI group was founded in 2018
• The AI group is composed of data scientists and software developers
• Our mission is to deliver AI-based solutions for banking scenarios
• We currently focus on
  • Computer Vision (CV)
  • Natural Language Processing (NLP)
Retrieved from https://www.ithome.com.tw/news/131697
Achievement
NLP
• Pluto: A Deep Learning Based Watchdog for Anti Money Laundering
  • First vertical AI paradigm in the RegTech field across CTBC globally
  • Reduces daily human effort on adverse media screening by 67%
  • Publication
    • https://www.aclweb.org/anthology/W19-5515
CV
• NIST Face Recognition Vendor Test (FRVT)
  • Ranked 35th globally
  • Ranked 2nd among Taiwanese companies
• X-ATM for fraud avoidance
Rank  Company                       Country  FRR
10    SenseTime (商湯)              China    0.0092
18    Face++ (曠視)                 China    0.0145
26    CyberLink (訊連)              Taiwan   0.0195
29    Tencent Deepsea (騰訊)        China    0.0215
35    CTBC BANK (中國信託)          Taiwan   0.0250
39    Gorilla Technology (大猩猩)   Taiwan   0.0291
55    Kneron Inc. (耐能)            Taiwan   0.0902
Outline
• Background
• Proposed Solution
• Evaluation
• Prototype
• Conclusion
Digital channels play an important role
Global Views Monthly (遠見雜誌), 2018 Digital Finance Survey (2018數位金融力調查)
Retrieved from https://www.gvm.com.tw/article.html?id=54981
Abundant Platforms for Conversational Assistants
• Messaging platforms
• Google Home
• Amazon Echo
Eno, your Capital One dialogue assistant
• A task-oriented dialogue system
• Chats in natural language
• Implemented on Amazon Alexa
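Motivation
• Build a task-oriented dialogue system in Mandarin, on heterogeneous conversational platforms, to serve customers in banking scenarios
Prerequisite
• A natural language understanding (NLU) module, consisting of
  • Intent recognition (IR)
  • Named entity recognition (NER)
Example: 美元定存六個月期的利率是多少 ("What is the interest rate on a six-month USD time deposit?")
• Intent
  • 查詢利率 (query interest rate)
• Entity
  • 幣別 (currency): 美元 (USD)
  • 帳戶類型 (account type): 定存 (time deposit)
  • 期數 (term): 六個月 (six months)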
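The example parse above can be written down as a small data structure. This is a minimal sketch, not the presenters' implementation: the class names `Entity` and `NLUResult` and the helper `to_slots` are hypothetical, and the values are copied by hand from the slide rather than produced by a model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entity:
    label: str  # entity type, e.g. "幣別" (currency)
    value: str  # surface text taken from the utterance


@dataclass
class NLUResult:
    text: str                                   # raw user utterance
    intent: str                                 # output of intent recognition (IR)
    entities: List[Entity] = field(default_factory=list)  # output of NER


# Hand-written parse of the slide's example utterance (not model output).
example = NLUResult(
    text="美元定存六個月期的利率是多少",
    intent="查詢利率",
    entities=[
        Entity("幣別", "美元"),
        Entity("帳戶類型", "定存"),
        Entity("期數", "六個月"),
    ],
)


def to_slots(result: NLUResult) -> Dict[str, str]:
    """Flatten the entities into a slot dictionary for a dialogue manager."""
    return {entity.label: entity.value for entity in result.entities}
```

A downstream dialogue manager could then branch on `example.intent` and read the slot values from `to_slots(example)`.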
Key Components in NLU
Approach: supervised learning method
• Intent Recognizer
  • Classification problem
• Named Entity Extractor
  • Sequence labeling problem
Pipeline
• Preprocessing: tokenizer, POS tagger
• Vectorization / embeddings
• Modeling
  • Deep Neural Networks (DNN)
  • Conditional Random Field (CRF)
  • Recurrent Neural Network (RNN)
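Data Preparation
• Intent dataset
  • 1016 samples over 3 distinct classes
  • 試算匯兌 (currency exchange calculation), 查詢存款利率 (deposit interest rate inquiry), 查詢台外幣餘額 (TWD/foreign currency balance inquiry)
• Named entity dataset
  • 977 samples over 6 distinct entities
  • amount, money, duration, currency, acnt_type, timestamp
Many thanks to 數位金融處 (Digital Finance Division) and 個金數位營運處 (Retail Banking Digital Operations Division)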
Intent Classification Techniques
Feature engineering
• Preprocessing
  • Tokenization (ckiptagger)
• Feature extraction
  • Bag of Words with word-count encoding (scikit-learn)
Model training
• Model
  • Deep Neural Network (DNN) (tensorflow)
Example
• Text: 現在美金一年期定存是多少 ("How much is a one-year USD time deposit now?")
• Tokens: 現在 美金 一年期 定存 是 多少
• Vocabulary: [ "現在", "台幣", "美金", "日圓", "一年期", "定存", "是", "多少" ]
• Feature vector: [ 1, 0, 1, 0, 1, 1, 1, 1 ]
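The word-count encoding can be sketched in a few lines. The slides use scikit-learn for this step; the version below is plain Python so that it stays self-contained, and the function name `bag_of_words` is an illustrative choice, not something from the deck. The vocabulary and tokens are the ones shown on the slide.

```python
from collections import Counter

# Vocabulary and tokens as shown on the slide.
VOCABULARY = ["現在", "台幣", "美金", "日圓", "一年期", "定存", "是", "多少"]


def bag_of_words(tokens, vocabulary=VOCABULARY):
    """Return one count per vocabulary entry, in vocabulary order."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


tokens = ["現在", "美金", "一年期", "定存", "是", "多少"]
vector = bag_of_words(tokens)
print(vector)  # [1, 0, 1, 0, 1, 1, 1, 1]
```

Words outside the vocabulary are simply ignored, and word order is discarded. That is acceptable for classifying a whole utterance but is one reason the NER models use context-aware features instead.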
Named Entity Recognition Techniques
Model I: CRF for Word-Level Feature
Feature engineering
• Preprocessing
  • Tokenization (ckiptagger)
  • POS tagging (ckiptagger)
• Feature extraction
  • Text and POS tags within context
  • Context window: 3 tokens
Model training
• Model
  • Conditional Random Field (CRF) (scikit-learn)
Example
• Text: 現在美金一年期定存是多少
• Tokens: 現在(Nd) 美金(Na) 一年期(Na) 定存(Na) 是(SHI) 多少(Neqa)
• Feature vector: …, ( -1:現在, -1:Nd, 0:美金, 0:Na, 1:一年期, 1:Na ), …
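The 3-token context window can be illustrated as follows. This is a sketch under assumptions: the feature key names (`"-1:word"`, `"0:pos"`, and so on) and the function `token_features` are made up for illustration, and boundary handling (simply omitting missing neighbours) is one possible choice. One dictionary per token is the input shape that CRF toolkits such as sklearn-crfsuite accept; the slides do not say which CRF package was used beyond "scikit-learn".

```python
def token_features(tokens, pos_tags, i):
    """Collect the word and POS tag of token i and of its left and right neighbours."""
    features = {}
    for offset in (-1, 0, 1):
        j = i + offset
        if 0 <= j < len(tokens):  # neighbours beyond the sentence edge are skipped
            features[f"{offset}:word"] = tokens[j]
            features[f"{offset}:pos"] = pos_tags[j]
    return features


# Tokens and POS tags from the slide's example sentence.
tokens = ["現在", "美金", "一年期", "定存", "是", "多少"]
pos_tags = ["Nd", "Na", "Na", "Na", "SHI", "Neqa"]

# One feature dictionary per token, i.e. one training sequence for a CRF.
sentence_features = [token_features(tokens, pos_tags, i) for i in range(len(tokens))]
```

Here `sentence_features[1]` describes 美金 together with its neighbours 現在 and 一年期, which corresponds to the feature tuple shown on the slide.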
Named Entity Recognition Techniques
Model II: Bi-LSTM-CRF for Word-Level Embedding
• Preprocessing
  • Tokenization (ckiptagger)
• Model
  • Embedding layer (keras): embedding learning
  • Long Short-Term Memory (LSTM) layer (keras): feature learning
  • CRF layer (keras): model training
Example
• Text: 現在美金一年期定存是多少
• Tokens: 現在 美金 一年期 定存 是 多少
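An embedding layer consumes integer token ids rather than strings, so the tokens have to be indexed and padded to a fixed length first. The sketch below shows one common way to do that. It is not taken from the slides: the reserved ids (0 for padding, 1 for unknown words) and the helper names `build_vocab` and `encode` are assumptions made for illustration.

```python
PAD, UNK = 0, 1  # assumed convention: 0 = padding, 1 = out-of-vocabulary token


def build_vocab(sentences):
    """Assign an integer id to every distinct token seen in the training sentences."""
    vocab = {"<PAD>": PAD, "<UNK>": UNK}
    for sentence in sentences:
        for token in sentence:
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(tokens, vocab, max_len):
    """Map tokens to ids, truncate to max_len, and right-pad with PAD."""
    ids = [vocab.get(token, UNK) for token in tokens][:max_len]
    return ids + [PAD] * (max_len - len(ids))


vocab = build_vocab([["現在", "美金", "一年期", "定存", "是", "多少"]])
encoded = encode(["現在", "日圓", "定存"], vocab, max_len=5)
print(encoded)  # [2, 1, 5, 0, 0]
```

In this example 日圓 is not in the training vocabulary, so it maps to the unknown id, and the two trailing zeros are padding.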
Evaluation
Methodology
Confusion Matrix
               Actual Yes            Actual No
Predicted Yes  True Positive (TP)    False Positive (FP)
Predicted No   False Negative (FN)   True Negative (TN)
Reference: https://en.wikipedia.org/wiki/Confusion_matrix
Metrics
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
$$\text{F1-Score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
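The three metrics follow directly from the confusion-matrix counts. Below is a small sketch with made-up counts (90 true positives, 10 false positives, 30 false negatives); the function name and the choice to return 0.0 when a denominator is zero are assumptions for illustration, not part of the talk.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative counts only; these are not results from the slides.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(round(precision, 2), round(recall, 2), round(f1, 2))  # 0.9 0.75 0.82
```

True negatives do not appear in any of the three formulas, which is why these metrics suit entity extraction, where most tokens belong to no entity at all.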
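Evaluation
Intent classification: Precision and Recall (bar chart; data labels in axis order)
Intent                                                  Precision  Recall  F1-Score
查詢台外幣餘額 (TWD/foreign currency balance inquiry)   0.91       0.94    0.93
查詢存款利率 (deposit interest rate inquiry)            0.98       0.95    0.96
試算匯兌 (currency exchange calculation)                0.97       0.96    0.96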
Evaluation
Named Entity Recognition: Precision (bar chart; data labels in axis order)
Entity                  CRF   BiLSTM+CRF
currency (幣別)         0.79  0.98
duration (期數)         0.75  0.93
timestamp (時間點)      0.85  0.80
acnt_type (帳戶類型)    0.74  0.89
money (錢)              0.55  0.81
amount (金額)           0.90  0.96
Evaluation
Named Entity Recognition: Recall (bar chart; data labels in axis order)
Entity                  CRF   BiLSTM+CRF
currency (幣別)         0.82  0.95
duration (期數)         0.55  0.67
timestamp (時間點)      0.78  0.79
acnt_type (帳戶類型)    0.67  0.80
money (錢)              0.52  0.89
amount (金額)           0.94  0.72
Evaluation
Named Entity Recognition: F1-Score (bar chart; data labels in axis order)
Entity                  CRF   BiLSTM+CRF
currency (幣別)         0.81  0.97
duration (期數)         0.64  0.71
timestamp (時間點)      0.82  0.72
acnt_type (帳戶類型)    0.68  0.84
money (錢)              0.52  0.88
amount (金額)           0.92  0.82
Prototype
Conversational AI with the Rasa framework: https://github.com/RasaHQ/rasa
[Figure: Rasa architecture diagram, with the NLU component labeled]
Prototype
Why Rasa?
Rasa characteristics
• Own our data
  • Preserve privacy
  • Do not hand data over to big tech companies
• Open source
  • Transparency
  • Community support
• Extensible architecture
  • Task-oriented dialogue architecture
  • Customizable components
CTBC strategy
• Customize Mandarin-based components
• Integration with core technology
• Compliance with security and regulation
• Customized scenarios
• Ownership of core technology
Prototype
• Intent recognition
  • CKIP Tokenizer (customized)
  • EmbeddingIntentClassifier (built-in)
• Named Entity Recognition
  • CKIP Tokenizer (customized)
  • Bi-LSTM-CRF for Word-Level Embedding (customized)
Prototype
Demo
Conclusion
Summary
• NLU is a key module in task-oriented dialogue systems
• The intent recognizer and the entity extractor are the key components of NLU, built with machine learning techniques and annotated data
• DNNs generally perform better than traditional methods, but not on all tasks
• Rasa, an open-source framework, supports building conversational assistants from scratch
Conclusion
What's next
• Transfer learning by initializing with pre-trained word embeddings
• Word-based embeddings vs. char-based embeddings
• Model engineering
Q&A
