Context
LLMs are powerful for knowledge generation and reasoning, and they are pre-trained on publicly available data.

To augment LLMs with private data, in-context learning has emerged as the standard paradigm: the relevant context is inserted into the input prompt, and the LLM's reasoning capabilities are used to generate a response. This offers a simple and effective way to use LLMs for data augmentation.
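As a minimal illustration of the pattern in plain Python (the get_completion helper is a hypothetical stand-in for any LLM API, and the sample context is invented), in-context learning amounts to pasting retrieved context into the prompt:

```python
# In-context learning: paste the retrieved private context into the prompt.
# `get_completion` is a hypothetical stand-in for any LLM completion API.

def build_prompt(context: str, question: str) -> str:
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

context = "Acme's Q3 revenue was $12M, up 8% quarter over quarter."
prompt = build_prompt(context, "How did Acme's revenue change in Q3?")
# response = get_completion(prompt)  # call your LLM of choice here
```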
To perform LLM data augmentation in a performant, efficient, and cheap manner, we need to solve two components, both sketched below:
- Data Ingestion
- Data Indexing
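A minimal sketch of both steps with LlamaIndex (module paths and class names follow recent llama-index releases and may differ in older versions):

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Data Ingestion: load raw files into Document objects.
documents = SimpleDirectoryReader("./data").load_data()

# Data Indexing: embed the documents and build an index over them.
index = VectorStoreIndex.from_documents(documents)
```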
LlamaIndex provides indices over your unstructured and structured data for use with LLMs. These indices help to abstract away common boilerplate and pain points of in-context learning:
- Storing context in an easy-to-access format for prompt insertion.
- Dealing with prompt limitations (e.g. 4096 tokens for Davinci) when the context is too big.
- Dealing with text splitting (see the sketch after this list).
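Text splitting, for example, is typically handled by a node parser that chunks documents to fit the prompt budget. A sketch assuming the SentenceSplitter from recent llama-index releases (names vary by version):

```python
from llama_index.core import Document
from llama_index.core.node_parser import SentenceSplitter

long_text = "..."  # any document too large for a single prompt

# Split the document into overlapping chunks sized for the prompt budget.
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=64)
nodes = splitter.get_nodes_from_documents([Document(text=long_text)])
```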
Use cases
- Just Starting Out (Vector Store Indices)
- Connecting LlamaIndex to an External Data Source of Documents
- Summarization over Documents
- Combining Information Across Multiple Indices
- Routing a Query to the Right Index
- Using Keyword Filters
- Including Hierarchical Context in Your Answer
How Each Index Works
Vector Store Index
The vector store index stores each Node and a corresponding embedding in a Vector Store.
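At query time, the query is embedded, the most similar nodes are retrieved from the vector store, and they are inserted into the LLM prompt as context. A minimal sketch reusing the index built above (API per recent llama-index releases):

```python
# Querying the index embeds the query, retrieves the top-k most similar
# nodes from the vector store, and inserts them into the LLM prompt.
query_engine = index.as_query_engine(similarity_top_k=3)
response = query_engine.query("What do the documents say about revenue?")
print(response)
```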
Data Connectors
This module contains the data connectors for LlamaIndex. Each connector inherits from a BaseReader class, connects to a data source, and loads Document objects from that data source.
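The same contract makes custom connectors straightforward. A sketch of a hypothetical reader (the InMemoryReader class and its data source are illustrative; import paths follow recent llama-index releases):

```python
from llama_index.core import Document
from llama_index.core.readers.base import BaseReader

class InMemoryReader(BaseReader):
    """Illustrative connector that 'loads' Documents from an in-memory list."""

    def __init__(self, records: list[str]):
        self.records = records

    def load_data(self) -> list[Document]:
        # Every connector ultimately returns a list of Document objects.
        return [Document(text=record) for record in self.records]

documents = InMemoryReader(["first note", "second note"]).load_data()
```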
Built-in readers include:
- GitHub repository reader
- BeautifulSoup web page reader
- Discord reader
- Google Docs reader
- Notion page reader
- Simple directory reader
- YouTube transcript reader
- File/audio reader
- Image loader (text extraction)
- A generic interface for a data document
- ...and many more on LlamaHub (https://llamahub.ai/)
Example of usage with the Gmail connector
This loader searches your Gmail account and parses the resulting emails into Documents.
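A sketch of the classic LlamaHub loading pattern (the download_loader call and the query parameter follow the LlamaHub listing for this loader; exact signatures vary by version):

```python
from llama_index import download_loader

# Fetch the Gmail connector from LlamaHub, then load matching emails;
# each matching email is parsed into a Document object.
GmailReader = download_loader("GmailReader")
loader = GmailReader(query="from:me label:inbox")
documents = loader.load_data()
```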