The document discusses Retrieval-Augmented Generation (RAG), a technique for improving responses from large language models by providing additional context from external knowledge sources. It outlines why current language models give inconsistent responses and lack genuine comprehension. As a solution, it proposes combining RAG with model fine-tuning. It then walks through an example RAG pipeline powering a question-answering system for Munich Airport, describing the components needed and the hosting options for large language models.
Fact-based Generative AI: Leverage existing knowledge to generate specific, up-to-date yet tailored results
1.
Fact-based
Generative AI
Leverage existing knowledge to generate
specific, up-to-date yet tailored results.
Stefan Weber
Senior Director Software Development
Telelink Business Services
OutSystems MVP – AWS Community Builder
2.
Topics
1. Challenge – Why AI does not tell the truth
2. Solution – Retrieval-Augmented Generation and fine-tuning a Large Language Model
3. Demo – Munich Airport QnA tailored answering
4. Flow – Implementing a RAG pipeline with OutSystems, OpenAI and Qdrant
5. Run – Choose where to host your Large Language Model
6. Forge – Ready-made components for your RAG flow
3. Challenge
Large Language Models (LLMs) are inconsistent. Sometimes they provide accurate responses to questions; at other times they parrot unrelated facts from their training corpus. These lapses stem from a systemic limitation: LLMs possess a statistical understanding of word relationships but lack genuine comprehension of meaning.
4.
Solution
Retrieval-Augmented Generation (RAG)
RAG is a technique for improving the quality of responses generated by an LLM. Information from external knowledge sources, along with further instructions, is supplied to the model to produce fact-based results.
Model Fine-Tuning
LLM fine-tuning adjusts and adapts a pre-trained large language model to perform specific tasks or to serve a particular domain more effectively. While fine-tuning is effective at emulating behaviors, it is a poor fit for cases that require extensive domain knowledge, such as the legal or financial sectors.
RAG and model fine-tuning are not mutually exclusive; they can be used in combination to ensure high-quality and consistent results.
6. RAG Flow
Turn information into data – Extract data from information sources and create semantic vector embeddings.
Query – Perform a semantic similarity search across the vectorized data.
Synthesize – Prepare one-shot or chain-of-thought prompt instructions and inject the search results.
Generate – Let LLM completions generate tailored results based on the prompt.
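The four steps above can be sketched end to end. This is a minimal, self-contained illustration: a toy bag-of-words "embedding" and an in-memory list stand in for the real embedding model and vector database (OpenAI and Qdrant in the demo), and the final generation step is stubbed out as prompt assembly, since the actual completion call goes to an LLM API. The sample documents and function names are invented for illustration.

```python
import math
from collections import Counter

# Step 1: Turn information into data - a toy "embedding" (bag-of-words counts).
# A real pipeline would call an embedding model (e.g. OpenAI) here instead.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# In-memory stand-in for a vector database such as Qdrant.
documents = [
    "Terminal 2 handles all Lufthansa departures.",
    "The airport lounge opens at 6 am daily.",
]
index = [(doc, embed(doc)) for doc in documents]

# Step 2: Query - semantic similarity search across the vectorized data.
def search(question: str, top_k: int = 1) -> list[str]:
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]

# Step 3: Synthesize - inject the search results into prompt instructions.
def build_prompt(question: str, context: list[str]) -> str:
    return (
        "Answer using only the context below.\n"
        "Context:\n" + "\n".join(context) + f"\nQuestion: {question}"
    )

# Step 4: Generate - in the real flow this prompt is sent to an LLM completion API.
question = "Which terminal do Lufthansa flights depart from?"
prompt = build_prompt(question, search(question))
```

The swap points are deliberate: replacing `embed` with a real embedding call and `index`/`search` with a vector-database client upgrades this sketch to the production flow without changing its shape.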
6
7. Building a custom Retrieval-Augmented Generation pipeline – Building Blocks
Text Cleaning
Document Segmentation
Deduplication
Entity Resolution
Corpus Diversity
Annotations
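Two of these building blocks, document segmentation and deduplication, can be sketched as follows. The chunk size and overlap values are illustrative defaults, not figures from the deck.

```python
import hashlib

def segment(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split a document into overlapping character windows for embedding."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def deduplicate(chunks: list[str]) -> list[str]:
    """Drop exact duplicate chunks via content hashing, preserving order."""
    seen, unique = set(), []
    for chunk in chunks:
        digest = hashlib.sha256(chunk.strip().lower().encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(chunk)
    return unique
```

The overlap keeps sentences that straddle a chunk boundary retrievable from at least one chunk; the normalization before hashing catches near-identical copies that differ only in case or surrounding whitespace.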
8. Running Large Language Models
Vendor
Using the public APIs of LLM vendors: OpenAI, Aleph Alpha, Cohere, Anthropic, …
Using a vendor's public API is the most cost-effective way to get started with LLMs and generative AI. At the same time, you have no influence over the data lifecycle, and there are fine-tuning limitations.
Public Cloud Runtimes
Hosting a model using a runtime of a public cloud provider: AWS SageMaker / Bedrock, Azure OpenAI, Hugging Face.
Full control of the data lifecycle and security. Parts of the data transformation can be offloaded to the platform to reduce latency.
Own Datacenter
Build your own runtime environment or use a prebuilt runtime.
9.
Forge Components
Integration Components
Azure OpenAI – OutSystems Platform Maintenance Team
OpenAI Embeddings – Stefan Weber
Qdrant Vector Database – Stefan Weber
AWS Bedrock Runtime – Stefan Weber
Demo Application
Vector Embeddings Demo – Stefan Weber
Information Extraction Components
Adobe Acrobat Services – Stefan Weber
AWS Textract – OutSystems Platform Maintenance Team
Prompt Templating
Handlebars.Net – Miguel Antunes
Custom Code
Microsoft Semantic Kernel – Microsoft
LangChain – LangChain Inc. (e.g. via AWS Lambda Integration)
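The prompt-templating component listed above (Handlebars.Net) fills named placeholders in a prompt with retrieved context. The same idea can be sketched in Python with the standard library's string.Template; the template text itself is illustrative.

```python
from string import Template

# A Handlebars-style prompt template with named placeholders.
PROMPT = Template(
    "Answer the question using only the context below.\n"
    "Context:\n$context\n"
    "Question: $question"
)

def render(question: str, context_chunks: list[str]) -> str:
    """Inject search results into the prompt before sending it to the LLM."""
    return PROMPT.substitute(
        context="\n".join(context_chunks),
        question=question,
    )
```

Keeping the template separate from the retrieval code lets non-developers tune the instructions without touching the pipeline, which is the main draw of a templating component.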
10.
Links
OutSystems, OpenAI Embeddings and Qdrant Vector Database – Find Similar
OutSystems, OpenAI Embeddings and Qdrant Vector Database – Answer Right
Get Started with OutSystems and Amazon Bedrock
Master Prompt Engineering
RAG vs Fine Tuning (Medium Member Article)
OpenAI
Qdrant Vector Database
Amazon Bedrock
11. Stefan Weber
Senior Director Software Development
Telelink Business Services
OutSystems MVP – AWS Community Builder
stefan.weber@tbs.tech
+49 1590 1888452
https://www.tbs.tech
https://www.linkedin.com/in/stefanweber1/