Studied the feasibility of applying state-of-the-art deep learning models, such as end-to-end memory networks and neural attention-based models, to machine comprehension and subsequent question answering in corporate settings with huge
amounts of unstructured textual data. Used pre-trained embeddings such as word2vec and GloVe to avoid huge training costs.
3. Deep learning
What is Deep learning?
Deep learning is a new area of Machine learning research that uses multi-
layered Artificial neural networks. The objective is to learn multiple levels of
representation and abstraction that help to make sense of data such as images,
sound, and text. It is becoming increasingly relevant for three
key reasons:
An infinitely flexible function – universal function approximation via Neural networks
All-purpose parameter fitting – using gradient descent and its derivative algorithms
Fast and scalable – availability of cheap GPUs for fast matrix multiplications
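The parameter-fitting idea can be illustrated with a minimal sketch, assuming a toy one-variable linear model rather than a neural network. The function name fit_line, the learning rate, and the step count are illustrative choices, not taken from the slides.

```python
def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y = w*x + b by minimizing mean squared error with gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Move each parameter a small step against its gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # toy data generated from y = 2x + 1
w, b = fit_line(xs, ys)
```

Deep learning libraries apply the same update rule to millions of parameters, with the gradients computed automatically by backpropagation.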
Typical applications of Deep learning
Convolutional Neural Networks (CNN) in computer vision and machine translation
Recurrent Neural Networks (RNN) like LSTM/GRU in language modeling
Tree Neural Networks (TNN) in sentiment analysis
Reinforcement learning in Game playing and intelligent agents
4. Basic building blocks of deep learning
Most DL networks (including question answering models) are composed
of these basic building blocks:
• Fully Connected Network
• Word Embedding
• Convolutional Neural Network
• Recurrent Neural Network
6. What is a Question Answering System?
The basic idea of an automated QA system is to extract information from
documents and, given a user query, provide a short and concise answer that
meets the user’s information needs.
Traditional QA systems are basically of two types:
Information Retrieval (IR) based QA – match-and-rank based, broad-domain
QA using mostly unstructured data. Example: search engines
Knowledge-based (KB) QA – semantic representation of the query using structured
data like triple stores or SQL. Examples: Freebase, DBpedia, and Wolfram
Alpha
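The match-and-rank idea behind IR-based QA can be sketched as follows. This is a minimal illustration that assumes plain word overlap as the matching score; real search engines use far richer ranking functions. The helper names tokenize and rank_sentences and the sample sentences are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_sentences(question, sentences):
    """Return the sentences ordered by word overlap with the question."""
    q_counts = Counter(tokenize(question))
    scored = []
    for sentence in sentences:
        s_counts = Counter(tokenize(sentence))
        # Multiset intersection counts the words shared by both texts.
        overlap = sum((q_counts & s_counts).values())
        scored.append((overlap, sentence))
    # Highest overlap first; the stable sort keeps input order for ties.
    return [s for _, s in sorted(scored, key=lambda pair: -pair[0])]


docs = [
    "The Eiffel Tower is located in Paris.",
    "Mount Everest is the highest mountain on Earth.",
    "The Nile flows through Egypt.",
]
best = rank_sentences("Where is the Eiffel Tower located?", docs)[0]
```

Note that this returns the best-matching sentence rather than a short answer; extracting the answer phrase itself is the part that needs extra machinery.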
Question types
Factoid questions – DeepMind CNN/DailyMail dataset
Cloze style questions – MCTest dataset and bAbI
Open domain question answering – WikiQA and LAMBADA
11. Motivations – What deep learning can do for QA systems?
Traditional QA pipelines rely heavily on manual feature engineering. The aim of
deep learning models is to eliminate this.
The goal is to build systems that can directly read documents and then answer
questions based on those documents.
RNNs have been successful in language modeling and generation but have
not achieved much success in QA because they cannot store enough context in their
hidden states. Answering complex questions requires supporting facts
from far back in the input.
RNNs also suffer from the vanishing gradient problem when too many time steps are used.
Solution: incorporate an explicit memory in the model, together with a way to address
that memory for reading and writing.
13. What are Memory Networks?
A class of models that combine a large memory with a learning component that
can read from and write to it.
They incorporate reasoning with attention over memory (RAM).
Most ML models have limited memory, which is more or less all that is needed for
“low-level” tasks, e.g. object detection.
Long-term memory is required to read a story and then, for example, answer
questions about it.
It is also required for dialog: to remember previous dialog (short- and
long-term) and respond.
The models are scalable: they can store and read large amounts of data in memory,
up to an entire KB.
14. All MemNNs have four component networks (which may or
may not have shared parameters):
I: (input feature map) converts incoming data to the internal feature
representation.
G: (generalization) updates memories given new input.
O: (output feature map) produces new output (in feature representation space) given the
memories.
R: (response) converts output O into the response seen by the outside world.
Step 1: controller converts incoming data to internal
feature representation (I)
Step 2: write head updates the memories and writes the data
into memory (G)
Step 3: given the external input, the read head reads
the memory and fetches relevant data (O)
Step 4: controller combines the external data with
memory contents returned by read head to generate
output (O, R)
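The four steps above can be sketched in a few lines. This is a toy illustration under strong assumptions: it uses fixed bag-of-words vectors instead of learned embeddings, has no trainable parameters, and answers by returning the most attended memory. The class name TinyMemoryNetwork and the vocabulary are hypothetical.

```python
import math


class TinyMemoryNetwork:
    """Untrained toy model that mirrors the I, G, O, R components."""

    def __init__(self, vocab):
        self.index = {word: i for i, word in enumerate(vocab)}
        self.memory = []  # list of (feature vector, original text)

    def I(self, text):
        """Input feature map: convert text to a bag-of-words vector."""
        vec = [0.0] * len(self.index)
        for word in text.lower().split():
            word = word.strip(".?")
            if word in self.index:
                vec[self.index[word]] += 1.0
        return vec

    def G(self, text):
        """Generalization: write the new input into the next memory slot."""
        self.memory.append((self.I(text), text))

    def O(self, question):
        """Output: softmax attention weights over all memory slots."""
        q = self.I(question)
        scores = [sum(a * b for a, b in zip(q, m)) for m, _ in self.memory]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # shift for stability
        total = sum(exps)
        return [e / total for e in exps]

    def R(self, weights):
        """Response: return the text of the most attended memory."""
        best = max(range(len(weights)), key=weights.__getitem__)
        return self.memory[best][1]


vocab = ["mary", "john", "went", "to", "the", "kitchen", "garden", "where", "is"]
net = TinyMemoryNetwork(vocab)
net.G("Mary went to the kitchen.")
net.G("John went to the garden.")
weights = net.O("Where is John?")
answer = net.R(weights)
```

In a trained memory network the feature maps and the scoring function are learned, and the response component generates an answer word instead of echoing a stored sentence.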
17. Datasets available to train/test QA models
Facebook bAbI SimpleQuestions – A set of 20 tasks for testing text understanding
and reasoning. For each task, there are 10,000 questions for training and 1,000 for
testing. Each task tests the machine on a specific skill set.
https://research.fb.com/downloads/babi/
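A minimal sketch of reading bAbI-style task files, assuming the usual layout: every line starts with a number, a line numbered 1 starts a new story, and question lines carry the question, the answer, and the supporting fact IDs separated by tabs. The function name parse_babi and the sample lines are illustrative.

```python
def parse_babi(lines):
    """Return (story_sentences, question, answer) triples from bAbI-style lines."""
    story = []
    examples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        idx, text = line.split(" ", 1)
        if int(idx) == 1:
            story = []  # a line numbered 1 starts a new story
        if "\t" in text:
            # Question line: question, answer, supporting fact IDs.
            question, answer, _supporting = text.split("\t")
            examples.append((list(story), question.strip(), answer))
        else:
            story.append(text)
    return examples


sample = [
    "1 Mary moved to the bathroom.",
    "2 John went to the hallway.",
    "3 Where is Mary? \tbathroom\t1",
    "4 Daniel went back to the hallway.",
    "5 Where is Daniel? \thallway\t4",
]
examples = parse_babi(sample)
```

Each example keeps only the statements seen before its question, which matches how the tasks are meant to be answered.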
Facebook bAbI Children's Book Test (CBT) – Text passages and corresponding
questions drawn from Project Gutenberg children's books. 669,343 training
questions, 8,000 dev questions and 10,000 test questions.
MCTest – consists of 500 stories and 2,000 questions. Because the stories are fictional, the answer
can typically be found only in the story itself. Requires machines to answer
multiple-choice reading comprehension questions about fictional stories, directly
tackling the high-level goal of open-domain machine comprehension.
http://research.microsoft.com/en-us/um/redmond/projects/mctest/
18. Language Modeling Broadened to Account for Discourse Aspects (LAMBADA
dataset) – consists of 10,022 passages, divided into 4,869 development and 5,153
test passages (extracted from 1,331 and 1,332 disjoint novels, respectively). The
average passage consists of 4.6 sentences in the context plus 1 target sentence, for
a total length of 75.4 tokens (dev) / 75 tokens (test).
http://clic.cimec.unitn.it/lambada/
DeepMind CNN and DailyMail dataset – Collection of news articles and
corresponding cloze queries. The two datasets contain many documents (90k and 197k,
respectively), and each document has about 4 questions on average. Each
question is a sentence with one missing word or phrase, which can be found in the
accompanying document/context.
http://cs.nyu.edu/~kcho/DMQA/
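To make the cloze format concrete, here is a minimal sketch of turning a sentence and its answer into a cloze query. The helper name make_cloze and the example sentence are hypothetical, and the released datasets also anonymize entities, which this sketch does not attempt.

```python
def make_cloze(sentence, answer, placeholder="@placeholder"):
    """Replace the first occurrence of the answer with a placeholder token."""
    if answer not in sentence:
        raise ValueError("answer must occur in the sentence")
    return sentence.replace(answer, placeholder, 1)


query = make_cloze("The capital of France is Paris.", "Paris")
```

The model is then asked to recover the removed word or phrase from the accompanying document.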
19. Stanford Question Answering Dataset (SQuAD) – reading comprehension dataset
consisting of questions posed by crowd-workers on a set of Wikipedia articles. The
answer to every question is a segment of text, or span, from the corresponding
reading passage. There are 100,000+ question-answer pairs on 500+ articles.
https://rajpurkar.github.io/SQuAD-explorer/explore/1.1/dev/
AI2 Science Exams – Elementary science questions from US state and regional
science exams. 170 multi-state and 108 4th grade questions.
http://allenai.org/data/science-exam-questions.html
WikiQA – 3,047 questions sampled from Bing query logs. Each question is associated
with a Wikipedia page. All sentences in the summary paragraph of the page
become the candidate answers. Only about one third of the questions have a correct answer in the
candidate answer set.
https://www.microsoft.com/en-us/research/publication/wikiqa-a-challenge-dataset-for-open-domain-question-answering/
28. Experimental setup in Google Cloud
Component: Description
Operating System: Ubuntu 16.04 VM on Intel octa-core CPU with 6.5 GB RAM
Graphics Card: NVIDIA Tesla K80 with 12 GB RAM and 2080 CUDA cores
Graphics Toolkit: CUDA 8.0 with cuDNN 6.0
Python Package Manager: Anaconda (Continuum Analytics) for Python 2.7
Deep learning library: Keras v2.0.2 with Theano v0.9.0 backend
Other Python modules:
• Bcolz v1.0.0 for fast saving/loading of trained weights
• NumPy v1.12.1 for all multi-dimensional numeric manipulations
• Scikit-learn v0.18.1 for preprocessing, pipelining, feature extraction, decomposition, dataset splits and all general non-deep machine learning algorithms
• cPickle for saving models
• NLTK toolkit for traditional linguistic tasks
• Matplotlib v2.0.0 for visualizing data
• Pydot v1.0.28 and GraphViz v2.38.0 for visualizing deep models
• OpenBLAS 0.2.19 for fast linear algebra operations
• Pandas v0.19.2 for structured data manipulation
• Protobuf 3.0.0 for protocol buffering
• Flask v0.12 for web display
38. Future work
Train a Dynamic Memory Network on the bAbI dataset
Train a Key-Value Memory Network on the bAbI dataset
Evaluate the performance of current models on other datasets like
LAMBADA and Stanford SQuAD
Explore the possibility of transfer learning so that models trained on open
source datasets can be applied to corporate datasets with only fine-tuning
Explore the use of trained models in dialog modeling for helpdesk
question answering