Language Model.pptx

Research on character-level language modelling with LSTMs for semi-supervised learning. The objective is to learn the inner-layer representations of the language model and transfer them to a classification model.

Generalizing NLP pipelines by using bidirectional LSTMs to learn character- (byte-) level embeddings of UTF-8-encoded financial news headlines, with byte values of up to 8 bits (0-255), in order to study the relationships between character vectors in headlines and transfer the learned representations into classification models. Many traditional NLP steps (lemmatization, POS tagging, NER, stemming, ...) are skipped by working at the byte level, making the process more universal in scope rather than task-specific.

  1. Character(Byte)-Level Language Model: Sequence-to-Sequence Language Model for Transfer Learning (Firas Obeid)
  2. Agenda
     • Why Unstructured Data?
     • Character2Vector using Byte-Level Encoding
     • Embeddings
     • Tips for the LSTM Layer
     • Transfer Learning
     • Code implementation with the model
  3. Why the hype around Unstructured Data?
     • Natural language processing (NLP) has become mainstream through a focus on creating value from unstructured data.
     • The share of firms that use only unstructured data has shot up from 2% in 2018 to 17% in 2020, and only 3% of the firms surveyed report that they do not use alternative data sources, down from 30% in 2018.
     • Refinitiv's 2020 survey, released last week, shows that 72% of firms' models were negatively impacted by COVID-19. Some 12% of firms declared their models obsolete, and 15% are building new ones. The main problem was the lack of agility to quickly adapt and include new data sets in models as circumstances changed.
  4. Benefits of Char2vec
     • With character embeddings, a vector can be formed for every word, even out-of-vocabulary ones (no bag-of-words needed). Word embeddings, by contrast, can only handle words seen in training. (A toy example follows this slide.)
     • Good fit for misspelled words.
     • Handles infrequent words better than word2vec embeddings, since the latter suffer from too few training opportunities for rare words.
     • Reduces model complexity and improves performance (in terms of speed).
     • All this comes at the cost of training on longer, sparser sequences, and therefore longer training and optimization time!
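The out-of-vocabulary point can be seen in a toy sketch: a word-level vocabulary has no entry for a misspelled headline token, while a character-level vocabulary still covers every character. The vocabularies and the headline below are illustrative, not taken from the slides.

```python
# Toy illustration (hypothetical vocabularies): a word-level lookup fails on a
# misspelled token, while a character-level lookup still covers it.
word_vocab = {"stocks": 0, "rally": 1, "fed": 2}
char_vocab = {c: i for i, c in enumerate("abcdefghijklmnopqrstuvwxyz ")}

headline = "stokcs rally"   # "stocks" is misspelled

# Word level: the misspelled token is out-of-vocabulary.
word_ids = [word_vocab.get(w, "<OOV>") for w in headline.split()]
print(word_ids)             # ['<OOV>', 1]

# Character level: every character still has an embedding index.
char_ids = [char_vocab[c] for c in headline]
print(char_ids)             # [18, 19, 14, 10, 2, 18, 26, 17, 0, 11, 11, 24]
```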
  5. Why not go down to the Byte Level?
     • A single byte can store 256 values (8-bit slots); standard ASCII uses 128 of them (7 bits), so for ASCII text there is no difference between reading characters and reading bytes.
     • I will use only the 128 ASCII code points (0-127) that cover English text (7-bit slots). (A minimal encoding sketch follows this slide.)
     • 0-31 and 127: control characters (non-printable).
     • 32-126: letters (upper/lower case), digits, symbols and signs.
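A minimal sketch of this byte view, assuming UTF-8 input restricted to the ASCII range; the sample headline is illustrative.

```python
# Minimal sketch: a headline as a sequence of byte ids (0-255) under UTF-8.
headline = "Fed raises rates by 0.25%"
byte_ids = list(headline.encode("utf-8"))   # one integer per byte
print(byte_ids[:5])                         # [70, 101, 100, 32, 114]

# For plain-English (ASCII) text every byte falls in 0-127, i.e. 7 bits;
# 32-126 are printable characters, 0-31 and 127 are control characters.
assert all(b < 128 for b in byte_ids)
printable = [b for b in byte_ids if 32 <= b <= 126]
```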
  6. Embeddings vs One-hot Encoding
     • Binary mode returns an array denoting which tokens exist at least once in the input, while int mode replaces each token by an integer, thus preserving their order. (A small comparison sketch follows this slide.)
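A small sketch of that contrast, using a toy character vocabulary rather than any particular library API: integer ("int") encoding keeps order, while a binary (multi-hot) encoding only records which characters occur.

```python
import numpy as np

# Toy character vocabulary (illustrative).
vocab = list("abcdefghijklmnopqrstuvwxyz ")
char_to_id = {c: i for i, c in enumerate(vocab)}

text = "fed cuts"

# "int" mode: one integer per character, order preserved.
int_encoded = np.array([char_to_id[c] for c in text])
print(int_encoded)          # [ 5  4  3 26  2 20 19 18]

# "binary" (multi-hot) mode: marks which characters occur at least once;
# order and counts are lost.
binary_encoded = np.zeros(len(vocab), dtype=int)
binary_encoded[int_encoded] = 1
print(binary_encoded)       # 1s at the positions of 'f','e','d',' ','c','u','t','s'
```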
  7. Embeddings Layer
     • Gives relationships between characters, based on how characters accompany each other.
     • Dense vector representation (n-dimensional) of floating-point values: maps each char/byte to a dense vector.
     • Embeddings are trainable weights/parameters of the model, equivalent to the weights learned by a dense layer.
     • In our case each unique character/byte is represented by an N-dimensional vector of floating-point values; the learned embedding forms a lookup table, and each character is encoded by "looking up" its dense vector in that table.
     • A simple integer encoding of our characters is not efficient for the model to interpret, since a linear classifier only learns a weight per single feature, not the relationship (probability distribution) between features (characters) or their encodings.
     • A higher-dimensional embedding can capture "fine-grained" relationships between characters, but takes more data to learn (256 dimensions in our case). (A layer sketch follows this slide.)
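A minimal Keras sketch of such a layer, assuming a 128-entry byte vocabulary and the 256-dimensional vectors mentioned on the slide; the input ids are illustrative.

```python
import tensorflow as tf

# Trainable lookup table: 128 byte ids -> 256-dimensional dense vectors.
embedding = tf.keras.layers.Embedding(input_dim=128, output_dim=256)

byte_ids = tf.constant([[70, 101, 100, 32, 99, 117, 116, 115]])  # "Fed cuts"
vectors = embedding(byte_ids)
print(vectors.shape)   # (1, 8, 256): one trainable 256-d vector per byte
```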
  8. Tips for LSTM Inputs
     • The LSTM input layer must be 3D [i.e. batch_input_shape=(batch_size, n_timesteps, n_features)].
     • The three input dimensions are: samples, time steps, and features (sequences, sequence_length, characters).
     • The LSTM input layer is defined by the input_shape argument on the first hidden layer.
     • The input_shape argument takes a tuple of two values that define the number of time steps and features.
     • The number of samples is assumed to be 1 or more; set the batch dimension of batch_input_shape (or the input_length) to None when it is not fixed.
     • The reshape() function on NumPy arrays can be used to reshape 1D or 2D data to 3D.
     • The reshape() function takes a tuple as an argument that defines the new shape.
     • The LSTM returns the entire sequence of outputs for each sample (one vector per timestep per sample) if you set return_sequences=True.
     • A stateful RNN only makes sense if each input sequence in a batch starts exactly where the corresponding sequence in the previous batch left off. Our RNN is stateless, since the samples are separate headlines rather than a continuous text corpus. (A shape sketch follows this slide.)
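A shape-only sketch of these tips, with illustrative sizes (32 headlines, 80 timesteps, 256 features) rather than the author's actual data.

```python
import numpy as np
import tensorflow as tf

n_samples, n_timesteps, n_features = 32, 80, 256   # illustrative sizes

# reshape() turns flat 2D data into the 3D shape the LSTM expects.
flat = np.random.rand(n_samples, n_timesteps * n_features).astype("float32")
x = flat.reshape((n_samples, n_timesteps, n_features))

model = tf.keras.Sequential([
    # input_shape omits the batch dimension: (timesteps, features).
    tf.keras.layers.LSTM(128, return_sequences=True,    # one output per timestep
                         input_shape=(n_timesteps, n_features)),
    tf.keras.layers.LSTM(64),                           # stateless by default
])
print(model(x).shape)   # (32, 64)
```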
  9. Semi-Supervised (Transfer) Learning
     • Word2Vec uses two embedding layers per token to predict the probability of the words before and after it. An LSTM language model can handle this without random sampling: just take the max logit (or probability) of the output to predict the next word or character. (A transfer sketch follows this slide.)
     • Unlabeled data can compensate for scarce labeled data in asset pricing.
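A hedged sketch of the transfer step: reuse the embedding and LSTM layers of the pretrained character-level language model as a frozen encoder and train a small classification head on the labeled headlines. The layer sizes and the binary sentiment head are assumptions, not the author's exact architecture.

```python
import tensorflow as tf

# Encoder with the same architecture as the pretrained LM's lower layers;
# in practice its weights would be loaded from the trained language model,
# e.g. via encoder.set_weights(...) or layer-by-layer copying (assumption).
encoder = tf.keras.Sequential([
    tf.keras.layers.Embedding(128, 256),
    tf.keras.layers.LSTM(128),
], name="pretrained_char_encoder")
encoder.trainable = False        # freeze the transferred representation

classifier = tf.keras.Sequential([
    encoder,
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # e.g. positive/negative headline
])
classifier.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```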
  10. Tips for Choosing Learning Rate
