This webinar introduced natural language processing techniques used in financial markets. It described how word embedding methods like bag-of-words, TF-IDF, Word2Vec, and BERT convert text into digital representations. These representations are used to generate sentiment scores from news headlines that can predict stock and bond returns over short horizons. The webinar recommended Quantra's online course on natural language processing in trading for its blend of theory, practical applications, and programming exercises applying these techniques to predict corporate bond returns.
3. Who are we?
Team
We are a group of traders, coders, and analysts who love to teach and share our experience.
Mission
To bridge the gap between theory and practice.
Vision
All retail investors use quant and algorithmic trading, by upskilling themselves and through simplified technology.
4. How we help
Online classroom training for serious learners seeking a better role or looking to start their own trading business: 6 months long, 300+ hours of content, a personal learning coach, hands-on project work, 17+ faculty members, and verified certification.
Self-paced interactive courses on various topics, with Python- and Excel-based modelling, offered by various experts in the domain.
Free backtesting platform with daily and minute data from NSE and NYSE.
Contact: blueshift-support@quantinsti.com | quantra@quantinsti.com | connect@quantinsti.com
5. ● Train a machine learning model to calculate a sentiment score from a news headline.
● Implement and compare word embedding methods such as Bag of Words (BoW), TF-IDF, Word2Vec and BERT.
● Predict stock and bond returns from news headlines.
● Describe the applications of natural language processing.
● Automate and paper trade the strategies covered in the course.
Natural Language Processing in Trading
URL: https://quantra.quantinsti.com/course/natural-language-processing-trading
SELF-PACED ONLINE COURSE
LEVEL: Advanced | DURATION: 8 hours
AUTHORED BY
Terry Benzschawel is the Founder and Principal at Benzschawel Scientific, LLC. Before that, he had worked with Citigroup's Institutional Clients Business as a Managing Director, heading the Quantitative Credit Trading group. He has also authored two books on Credit Modeling.
COURSE FEATURES
● Interactive exercises
● Lifetime access
● Downloadable strategy codes
● Certificate from QuantInsti
WHAT LEARNERS SAY ABOUT QUANTRA
“Quantra is a marvellous source for Alpha strategies and a powerhouse
of great instructors with market experience. Also, Quantra gives a clear
research path so that one can research his own Alphas. I recommend it
for traders and researchers.”
Níkolas Pareschi
Instructor at Investidor de Sucesso, Brazil
6. Speaker Introductions
Terry Benzschawel is the Founder and Principal at Benzschawel Scientific, LLC. Terry has worked as a credit strategist with a focus on client-oriented solutions across all credit markets. Before that, he worked at Chase Manhattan and Citi, building algorithms to predict corporate bankruptcy and to detect credit fraud in card transactions. He has authored two books on Credit Modeling.
Terry Benzschawel
Founder and Principal at Benzschawel
Scientific, LLC
Ishan Shah
AVP, Content & Research at
QuantInsti
Ishan Shah is AVP and leads the content & research team at Quantra by QuantInsti. Prior to that, he worked with Barclays in the Global Markets team and with Bank of America Merrill Lynch. He has rich experience in financial markets, spanning various asset classes and roles.
7. Poll - 1
Do you use Sentiment Analysis in Trading?
A. I do
B. I don’t
C. Never heard of ‘Sentiment Analysis in Trading’
8. Agenda
➔ How is Natural Language Processing applied in financial markets?
➔ Different word embedding methods
➔ Aggregating Daily Sentiment Score on Quantra learning portal
➔ How does Quantra learning portal provide a unique learning experience?
9. How is Natural Language Processing applied in financial markets?
Natural language processing in financial applications is most often used to gauge the sentiment (positive, negative
or neutral) of a given headline or text. In addition to directional sentiment, applications also often use a measure of
relevance to the asset or asset class in question. Once these measures are obtained, they are often summed over a
given period to make predictions about subsequent market moves.
10. Natural Language Processing in Financial Markets
● Natural language processing in financial markets has most often been applied in equity markets to predict price changes over a day or days.
● More recently, attempts have been made to apply NLP to predict price changes in corporate bond markets.
● The general approach consists of first turning words in written text, say in news headlines or stories, into their
digital representations. This is called “embedding.”
● Next, these embedded texts are used to generate sentiment scores (positive, negative, or neutral) relative to the
market(s) whose returns they are trying to predict.
● The sentiment scores related to the market are summed over a period (e.g., a day) and used to predict the next
period’s price change.
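As a minimal sketch of this aggregation step (the timestamps, scores, and column names below are hypothetical, invented for illustration), headline-level sentiment can be summed per day and lagged to form a next-day trading signal:

```python
import pandas as pd

# Hypothetical headline-level sentiment scores: +1 positive, -1 negative, 0 neutral
scores = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2023-01-02 09:15", "2023-01-02 14:40",
        "2023-01-03 10:05", "2023-01-03 15:30",
    ]),
    "sentiment": [1, -1, -1, 0],
})

# Sum headline sentiment within each calendar day to get a daily score
daily = scores.set_index("timestamp")["sentiment"].resample("D").sum()

# Lag by one day: day t's aggregate score is used to predict day t+1's move,
# e.g. go long if the lagged score is positive, short if negative
signal = daily.shift(1)
print(pd.DataFrame({"daily_score": daily, "next_day_signal": signal}))
```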
11. Word Embedding Methods
Word embedding is the process of converting text into a digital representation in the computer. In the next few slides
I highlight several of the most important methods. It is important to note that this is a rapidly evolving field with many
innovations.
Word Embedding Methods
● Bag of Words
● Term Frequency-Inverse Document Frequency (TF-IDF)
● Word2Vec
● Embeddings from Language Models (ELMo)
● Bidirectional Encoder Representations from Transformers (BERT)
12. Bag of Words
● Bag of Words (BoW) is an algorithm that counts how many times a word appears in a document
- Those word counts allow us to compare documents and gauge their similarities for applications like search and document
classification
- Each document in the corpus is represented by a count vector of equal length: a word-count vector, an output stripped of context
- The frequency of each word is effectively converted to represent the probability of that word's occurrence in the document
- Probabilities that surpass certain levels will activate nodes in the network and influence the document's classification
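A minimal Bag of Words sketch with scikit-learn's CountVectorizer (the headlines are invented for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus of news headlines (illustrative only)
headlines = [
    "stocks rally as fed holds rates",
    "bond yields fall as fed holds rates",
    "company defaults on bond payment",
]

# Each headline becomes a vector of raw word counts over the shared vocabulary
vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(headlines)

print(vectorizer.get_feature_names_out())  # learned vocabulary
print(bow.toarray())                       # one count vector per headline
```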
13. Term Frequency-Inverse Document Frequency (TF-IDF)
● With TF-IDF, words are given weight – TF-IDF measures relevance, not frequency
- Word counts are replaced with TF-IDF scores across the whole dataset
● TF-IDF measures the number of times that words appear in a given document (that’s “term frequency”).
- Because words such as “and” or “the” appear frequently in all documents, those must be discounted
- That’s the inverse-document frequency part. The more documents a word appears in, the less valuable that word is as a signal
to differentiate any given document
- That’s intended to leave only the frequent and distinctive words as markers
- Each word's TF-IDF score is normalized so that the scores for a given document add up to one
- Those marker words are then fed to the neural net as features in order to determine the topic covered by the document that
contains them
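The same toy corpus run through scikit-learn's TfidfVectorizer shows the re-weighting (again, the headlines are invented; note that scikit-learn normalizes each document vector to unit length by default, which is one common normalization convention):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

headlines = [
    "stocks rally as fed holds rates",
    "bond yields fall as fed holds rates",
    "company defaults on bond payment",
]

# Words shared across headlines (e.g. "fed", "rates") are discounted by the
# inverse-document-frequency term; distinctive words (e.g. "defaults") score higher
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(headlines)

print(vectorizer.get_feature_names_out())
print(tfidf.toarray().round(2))  # rows are normalized TF-IDF vectors
```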
14. Word2Vec
● Word2Vec models are two-layer neural networks that are trained to reconstruct linguistic contexts of words.
● Word2vec takes as its input a large corpus of text and produces a vector space, typically of several hundred dimensions,
with each unique word in the corpus being assigned a corresponding vector in the space
● Word vectors are positioned in the vector space such that words that share common contexts in the corpus are located in
close proximity to one another in the space.
- It doesn't distinguish between different meanings of the same token
- For example, the word "bank" can refer to a financial institution or a river bank; traditional Word2Vec cannot capture this distinction
● Word2vec trains words against other words that neighbor them in the input corpus
- It does so using context to predict a target word (continuous bag of words - CBOW) or using a word to predict a target context,
which is called skip-gram
15. Word2Vec
● Word2vec trains words against other words that neighbor them
in the input corpus
- It does so using context to predict a target word (continuous
bag of words - CBOW) or using a word to predict a target
context, which is called skip-gram
● Train the network by feeding it word pairs found in training
documents
- The network learns the statistics from the number of times each
pairing shows up
- For example, the network is probably going to get many more training samples of ("Soviet", "Union") than of ("Soviet", "Sasquatch")
- After training, if you give the network "Soviet" as input, the output probabilities will be much higher for words that frequently co-occur with it, such as "Union", than for unrelated words such as "Sasquatch"
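A minimal Word2Vec sketch using gensim (the toy corpus is invented and far too small for meaningful vectors; it only illustrates the API and the CBOW/skip-gram switch):

```python
from gensim.models import Word2Vec

# Toy corpus: each item is a tokenized headline (illustrative only)
sentences = [
    ["stocks", "rally", "as", "fed", "holds", "rates"],
    ["bond", "yields", "fall", "as", "fed", "holds", "rates"],
    ["company", "defaults", "on", "bond", "payment"],
    ["fed", "signals", "rate", "cut", "and", "stocks", "rally"],
]

# sg=1 selects skip-gram (word -> context); sg=0 selects CBOW (context -> word)
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1, epochs=200)

# Words that share contexts in the corpus end up close together in the space
print(model.wv.most_similar("stocks", topn=3))
print(model.wv["bond"].shape)  # a 50-dimensional vector per word
```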
16. Embeddings from Language Models (ELMo)
● ELMo uses bi-directional LSTMs to generate features for downstream tasks, which brings two advantages:
1. ELMo representations are purely character based and can learn the complex characteristics of word usage
2. They capture how a word's usage changes with the context in which it appears
● The bi-directional LSTM consists of 2 parts: a forward LM and a backward LM
- The forward LM tries to predict the next word given all the previous words, from left to right: p(t_1, t_2, ..., t_N) = Π_{k=1}^{N} p(t_k | t_1, ..., t_{k-1})
- For each position k and each layer j = 1, ..., L, the LSTM outputs a context-dependent representation h_{k,j}; the top layer's output is passed through a Softmax to predict the next word t_{k+1}
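In practice, ELMo is usually loaded pre-trained rather than trained from scratch. A minimal sketch using the TensorFlow Hub module (the module URL and output keys follow TF Hub's published ELMo interface; treat the exact details as an assumption to verify against the current docs):

```python
import tensorflow as tf
import tensorflow_hub as hub

# Load the pre-trained ELMo module from TensorFlow Hub (URL is an assumption;
# check the TF Hub catalogue for the current version)
elmo = hub.load("https://tfhub.dev/google/elmo/3")

# Embed two headlines: the token "bank" should receive different vectors in
# each sentence because ELMo representations depend on the surrounding context
sentences = tf.constant([
    "the bank raised its lending rates",
    "the river burst its bank after heavy rain",
])
outputs = elmo.signatures["default"](sentences)

# "elmo" is the weighted sum of the bi-LSTM layers, shape (batch, tokens, 1024)
print(outputs["elmo"].shape)
```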
17. The BERT Model Architecture
● Bidirectional Encoder Representations from Transformers
● Unsupervised Pre-training
● Pre-train deep bidirectional representations by jointly
conditioning on both left and right context in all layers
● Instead of a recurrent neural network, it uses attention, which boosts the speed with which these models can be trained and lends itself to parallelization
● Can be extended with an additional dense layer to improve accuracy on downstream tasks
● The word embeddings trained by BERT are general-purpose language representations, learned through standard pre-training tasks such as word masking and contextual prediction
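For a quick taste of applying a pre-trained transformer to headlines, Hugging Face's transformers pipeline wraps a BERT-family model fine-tuned for sentiment (the headline is invented, and the pipeline's default checkpoint is a general-purpose DistilBERT model, not the model built in the course):

```python
from transformers import pipeline

# Downloads a default BERT-family checkpoint fine-tuned for sentiment
classifier = pipeline("sentiment-analysis")

# Score an illustrative news headline
result = classifier("Company beats earnings estimates and raises guidance")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.99}]
```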
20. Poll - 2
Given the news sentiment scores below, what is the correct aggregated sentiment class as of Day 2, 9:30 am?
Data format: Day | Time | Sentiment Score
A. Day 1 | 3 pm | 1
B. Day 1 | 7 pm | -1
C. Day 2 | 10 am | -1
D. Day 2 | 5 pm | 0
Answer choices:
1. Positive or 1
2. Negative or -1
3. Neutral or 0
4. Cannot be determined
22. Summary
● This introductory webinar describes the use of Natural Language Processing (NLP) techniques for building 1-day-horizon trading strategies in the corporate bond and equity markets using news headlines.
● We described various methods for converting text into digital representations and for extracting sentiment
scores from those embeddings.
● Looking ahead to the course, we find that approaches using the latest advances in NLP are better suited to predicting future returns on credit indices by using news headlines directly as inputs, rather than headline sentiments.
23. Why I recommend Quantra for learning
● The Quantra program on NLP provides a unique blend of underlying theory, practical applications and
programming exercises.
● Attendees will come away with a broad understanding of natural language processing techniques, their
implementation, and the challenges in applying those techniques to problems in finance.
● The applications described in the training program are actual applications to problems in predicting corporate
bond returns.
● The instructors in the program include pioneers in the field of machine learning in finance who have
successfully applied those methods to real world problems and share that experience in the lessons.