Talk @ Taiwan AI Academy, November 17, 2018
Textual Data Analytics in Finance
Dr. Chuan-Ju Wang (王釧茹)
Research Center for Information Technology
Innovation, Academia Sinica
Computational Finance and Data Analytics
Laboratory (CFDA Lab)
http://cfda.csie.org
Chuan-Ju Wang (CITI, AS) Talk @ Taiwan AI Academy November 17, 2018
Quant — Data Scientist
Source: http://www.indeed.com/jobtrends
Source: http://www.computerweekly.com/blogs/Data-Matters/2014/06/data-scientist-the-new-quant.html
Data Science in Finance
Text Analytics
❖ Big Data
❖ Structured Data
❖ user logs, sensor logs, click through logs, …
❖ Unstructured Data
❖ web texts, user conversations, public opinions, reports…
❖ Big Data for Unstructured Text – Text Analytics
❖ Goal — Turn text into data for analysis, via application of
natural language processing (NLP) and analytical methods
https://insidebigdata.com/2015/06/05/text-analytics-the-next-generation-of-big-data/
Textual Sentiment Analysis for
Financial Risk Prediction
On the Risk Prediction and Analysis of Soft
Information in Finance Reports. European Journal of
Operational Research (EJOR), 257(1), 243-250, 2017.
Soft and Hard Information in Finance
❖ The growing amount of financial data makes it increasingly important
to learn how to extract valuable information for various financial
applications.
❖ In finance, there are typically two kinds of information:
❖ Soft information: text, including opinions, ideas, and market
commentary.
❖ Hard information: numerical values, such as financial measures and
historical prices.
❖ Our work aims to exploit soft information for financial risk prediction.
Risk Proxy: Stock Return Volatility
❖ Stock return
$$R_t = \frac{S_t - S_{t-1}}{S_{t-1}}$$
❖ Stock return volatility
❖ A common risk metric measured by the standard
deviation of returns over a period of time.
$$v_{[t-n,t]} = \sqrt{\frac{\sum_{i=t-n}^{t} (R_i - \bar{R})^2}{n}}, \quad \text{where } \bar{R} = \frac{\sum_{i=t-n}^{t} R_i}{n+1}.$$
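The two definitions on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names and the sample prices are made up, and S_t is assumed to be a closing price observed at regular intervals.

```python
import math


def simple_returns(prices):
    """R_t = (S_t - S_{t-1}) / S_{t-1} for each consecutive pair of prices."""
    return [(cur - prev) / prev for prev, cur in zip(prices, prices[1:])]


def volatility(returns):
    """Standard deviation of n + 1 returns; squared deviations are divided by n."""
    n = len(returns) - 1
    if n < 1:
        raise ValueError("need at least two returns")
    mean = sum(returns) / (n + 1)
    return math.sqrt(sum((r - mean) ** 2 for r in returns) / n)


prices = [100.0, 102.0, 101.0, 105.0]  # made-up prices, for illustration only
returns = simple_returns(prices)
vol = volatility(returns)
```

Dividing by n while averaging over n + 1 returns matches the usual sample standard deviation.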
Financial Sentiment Analysis
❖ In this work, we apply sentiment analysis to the
risk prediction task.
❖ A finance-specific sentiment lexicon is adopted for analysis.
❖ Two machine learning techniques are adopted for the task:
❖ Regression approach: Predict the stock return volatilities.
❖ Ranking approach: Rank the companies according to
their relative risk levels.
Financial Sentiment Lexicon
❖ Words often carry different meanings in finance than in general
usage, for example:
❖ vice: immoral or wicked behavior (general usage)
❖ vice: secondary (in finance context)
❖ Almost three-fourths of the words in 10-K financial reports
from 1994 to 2008 that the widely used Harvard Psychosociological
Dictionary identifies as negative are typically not considered
negative in financial contexts.
Six Finance-Specific Lexicons
❖ Loughran and McDonald (2011)
❖ When Is a Liability Not a Liability? Textual Analysis, Dictionaries,
and 10-Ks. Journal of Finance.
Problem Formulation
❖ Prediction target: future stock return volatility (regression) and
future relative risk levels (ranking)
❖ Features
❖ Soft textual information: All words or financial sentiment words
❖ Hard numerical information: Each company's volatility over the
twelve months before the report
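[Figure: timeline centered on the report filing date (2006/3/22). v(-12) covers the preceding twelve months, from 2005/3/22; v(+12) covers the following twelve months, to 2007/3/22.]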
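A minimal sketch of how the soft and hard features listed above might be combined into one vector per report. Everything specific here is an assumption made for illustration: the four-word lexicon, the log(1 + tf) term weighting, the use of log volatility, and the function names are not taken from the talk.

```python
import math
import re
from collections import Counter

# Hypothetical mini-lexicon; a real finance lexicon contains far more words.
SENTIMENT_WORDS = ["default", "deficit", "doubt", "profit"]


def text_features(report_text, vocabulary):
    """Soft information: log(1 + term frequency) for each vocabulary word."""
    tokens = re.findall(r"[a-z]+", report_text.lower())
    counts = Counter(tokens)
    return [math.log1p(counts[word]) for word in vocabulary]


def build_features(report_text, past_volatility, vocabulary=SENTIMENT_WORDS):
    """Append the hard feature, log of v(-12), to the textual features."""
    return text_features(report_text, vocabulary) + [math.log(past_volatility)]


features = build_features(
    "The deficit may lead to default. Default risk is high.",
    past_volatility=0.02,  # made-up value of v(-12)
)
```

Passing the full vocabulary instead of `SENTIMENT_WORDS` would give the "all words" variant; a regression or ranking model would then be trained on these vectors with v(+12) as the target.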
Corpora: The 10-K Corpus
❖ A Form 10-K is an annual report required by the U.S. Securities and Exchange Commission (SEC)
❖ Only Section 7, “Management’s Discussion and Analysis of Financial Condition and Results of Operations” (MD&A), is used
❖ The Sarbanes-Oxley Act of 2002 explains the drastic increase in report length during the 2002-2003 period
Experimental Results
Financial Sentiment Terms Analysis
[Figure: stemmed terms color-coded by category (Fin-Neg, Fin-Pos, Fin-Lit, Fin-Unc, Non) in two panels labeled SEN and ORG, each numbered 1 to 5. Terms shown include amend, deficit, forbear, delist, default, sureti, discontinu, unabl, disput, concern, profit, violat, regain, breach, doubt, waiver, and coven.]
Each stem stands for a family of lexicon words:
❖ deficit: deficit, deficits
❖ default: default, defaulted, defaulting, defaults
❖ delist: delist, delisted, delisting, delists
❖ amend: amend, amendable, amendatory, amended, amending, amendment, amendments, amends
❖ forbear: forbear, forbearance, forbearances, forbearing, forbears
FIN10K Prototype Demo
https://cfda.csie.org/10K/
FIN10K: A Web-based Information System for
Financial Report Analysis and Visualization.
ACM CIKM (Demo paper), 2016.
Financial Keyword Expansion via
Continuous Word Vector Representations
Discovering Finance Keywords via Continuous
Space Language Models. ACM Transactions on
Management Information Systems, 7(3), 7:1-7:17, 2016.
Sentiment Analysis — the Lexicon
❖ For sentiment analysis, the lexicon is one of the most
important and common resources.
❖ It usually has a great impact on results and the
corresponding analyses.
❖ In finance, the lexicon is usually semi-manually generated.
❖ This results in an inadequate set of words.
❖ In this work, we use continuous space language models
to expand finance keywords automatically.
Continuous Space Language Models
❖ “You shall know a word by the company it keeps” (J. R. Firth, 1957)
❖ One of the most successful ideas of modern statistical NLP!
Continuous Space Language Models
❖ Continuous space language models
❖ a.k.a. Continuous word embeddings
❖ Words are represented as low-dimensional dense vectors.
❖ Recent studies show their superiority in capturing
syntactic and contextual regularities in language.
Keyword Expansion
❖ Our Proposed Keyword Expansion Method
❖ Adapt continuous word embeddings to incorporate syntactic
information, so that the retrieved keywords are closer in
meaning to the seeds.
❖ Learn vector representations of words from a large
collection of financial reports (domain-specific)
❖ Words in the financial sentiment lexicon are used as seed
words to retrieve their top N neighbors by cosine similarity.
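The seed-word lookup described above can be sketched as a nearest-neighbor search. The toy three-dimensional vectors and the function names are hypothetical. In practice the vectors would be learned from a large collection of financial reports, and the syntactic extension mentioned on the slide is not modeled here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expand_keywords(seed_words, embeddings, top_n):
    """For each seed word, return the top_n most similar other words."""
    expanded = {}
    for seed in seed_words:
        if seed not in embeddings:
            continue  # seeds without a learned vector cannot be expanded
        scored = [
            (cosine_similarity(embeddings[seed], vector), word)
            for word, vector in embeddings.items()
            if word != seed
        ]
        scored.sort(reverse=True)  # highest similarity first
        expanded[seed] = [word for _, word in scored[:top_n]]
    return expanded


# Made-up embeddings, for illustration only.
EMBEDDINGS = {
    "default": [0.9, 0.1, 0.0],
    "delinquent": [0.8, 0.2, 0.1],
    "breach": [0.7, 0.3, 0.0],
    "profit": [0.0, 0.1, 0.9],
    "gain": [0.1, 0.0, 0.8],
}

result = expand_keywords(["default"], EMBEDDINGS, top_n=2)
```

With these toy vectors, the seed "default" expands to "delinquent" and "breach", while "profit" and "gain" fall outside the top two.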
Keyword Expansion
❖ Keyword Expansion with Syntactic Information
The New 10-K Corpus
Four Prediction Tasks
❖ Four prediction tasks are conducted.
❖ To demonstrate that our approach is effective for
discovering predictive keywords
1) Post-event volatility
2) Stock volatility
3) Abnormal trading volume
4) Excess returns
Post-event Volatility Prediction
FIN10K Prototype Demo
https://cfda.csie.org/10K/
FIN10K: A Web-based Information System for Financial Report Analysis
and Visualization. ACM CIKM (Demo paper), 2016.
Beyond Word-Level Analysis
❖ Multi-word expression detection and analysis
❖ Beyond Word-Level to Sentence-Level Sentiment Analysis for
Financial Reports
❖ RiskFinder: A Sentence-level Risk Detector for Financial Reports,
NAACL’18
❖ https://cfda.csie.org/RiskFinder/
❖ FRIDAYS: A Financial Risk Information Detecting and Analyzing
System, AAAI’18
❖ https://cfda.csie.org/FRIDAYS/
Summary
❖ If structured data is big, then unstructured data is huge.
❖ 20% (structured) vs. 80% (unstructured)
❖ There is massive potential waiting to be leveraged in
the analysis of unstructured data in finance.
Thank You for Listening!
