White Paper: Leveraging Enterprise Data with AI (RAG)
Enterprise AI (RAG): Compliance, Reliability, and Costs. The Complete Guide.
Understanding, framing and deploying a useful, secure and compliant RAG architecture to leverage internal business knowledge.
1. RAG in the enterprise
Architecture, costs, and compliance: the complete guide
Companies hold a considerable documentary asset: procedures, contracts, meeting minutes, knowledge bases, support tickets, CRM content, product sheets, HR documents, internal standards, and security policies. Yet this knowledge often remains difficult to find, interpret, and reuse.
RAG, or Retrieval-Augmented Generation, connects generative AI to a company’s internal data in order to produce answers that are more contextualized, more verifiable, and better adapted to business use cases.
This guide explains what a RAG architecture is, when to use it, how to design it, how much it can cost, which risks to anticipate, and how to approach security, confidentiality, and compliance issues.
2. Introduction
Generative AI has made visible a new way of interacting with information: asking a question in natural language and receiving a structured answer. For companies, the promise is strong: save time, reduce dependence on a few internal experts, streamline access to knowledge, and improve the quality of operational decisions.
But a general-purpose model, used on its own, does not know the company’s internal documents. It can help draft, reformulate, or structure content, but it cannot spontaneously answer based on your contracts, procedures, quality database, or technical documentation. This is precisely the role of RAG: to provide the model with the right documentary excerpts at the right time, so that it answers from a controlled business context.
RAG is therefore not simply about “connecting ChatGPT to files.” It is a complete architecture combining data, document search, security, governance, user experience, evaluation, and maintenance. Microsoft's documentation on RAG in Azure AI Search, for example, stresses that enterprise implementations must address complex topics: query understanding, multi-source access, token constraints, response times, access rights, and content governance.
In this white paper, you will understand:
what RAG is, in simple terms;
when it is relevant;
when it should not be used;
which technical components make up a RAG architecture;
which structural decisions must be made;
which costs to anticipate;
how to approach security, GDPR, the AI Act, and confidentiality;
how to launch a RAG project progressively and realistically.
Key takeaway
RAG is a pragmatic response to a very common problem: companies have a lot of knowledge, but it is scattered, hard to access, and rarely usable in natural language.
3. Understanding RAG simply
What does RAG mean?
RAG stands for Retrieval-Augmented Generation.
The principle is simple:
the user asks a question;
the system searches internal documents for the most relevant passages;
these passages are added to the context sent to the language model;
the model generates an answer based on these excerpts;
the answer can cite the sources used.
AWS describes RAG as a technique that augments a large language model with external data, such as internal documents, to provide the model with the context required to produce a useful answer for a specific use case.
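The five steps above fit in a few lines of code. The sketch below is a deliberately minimal illustration, not a production design: the three-document corpus and its IDs are hypothetical, word overlap stands in for a real search engine, and `llm` is any function that takes a prompt and returns text (a placeholder for whichever model client is used).

```python
import re
from typing import Callable

# Hypothetical mini-corpus standing in for indexed internal documents.
DOCUMENTS = {
    "HR-012": "Employees may work remotely up to two days per week after manager approval.",
    "HR-031": "Expense reports must be submitted within 30 days with receipts attached.",
    "SUP-004": "Critical incidents are escalated to the on-call manager within 15 minutes.",
}


def tokenize(text: str) -> set[str]:
    """Lowercased word set: a crude stand-in for a real search index."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, k: int = 2) -> list[tuple[str, str]]:
    """Step 2: score each passage by word overlap with the question, keep the top k."""
    q = tokenize(question)
    scored = [(len(q & tokenize(text)), doc_id, text) for doc_id, text in DOCUMENTS.items()]
    scored = [item for item in scored if item[0] > 0]  # drop passages with no overlap
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(doc_id, text) for _, doc_id, text in scored[:k]]


def build_prompt(question: str, passages: list[tuple[str, str]]) -> str:
    """Step 3: put the excerpts, tagged with their IDs, into the model context."""
    excerpts = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return (
        "Answer using only the excerpts below and cite their IDs in brackets.\n"
        "If the excerpts do not contain the answer, say that you don't know.\n\n"
        f"Excerpts:\n{excerpts}\n\nQuestion: {question}"
    )


def answer(question: str, llm: Callable[[str], str]) -> str:
    """Steps 1 to 5: question, retrieval, enriched prompt, generation with citations."""
    passages = retrieve(question)
    if not passages:
        return "I don't know: no relevant internal source was found."
    return llm(build_prompt(question, passages))
```

Calling `answer("How many days of remote work per week?", llm=...)` retrieves HR-012 first; passing `llm=lambda prompt: prompt` is a convenient way to inspect the exact context the model would receive.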
A simple analogy
Imagine an expert who has to answer a complex question. Without documents, they rely on memory. With RAG, the right files are automatically opened at the right pages before they answer.
The language model plays the role of the writer who formulates the answer. The search engine plays the role of the document specialist who retrieves the sources. The RAG architecture organizes the collaboration between the two.
RAG vs. a traditional chatbot
A traditional chatbot answers based on what it learned during training or from its initial prompt. It can be fluent, but it may produce answers that are too general or not aligned with your internal rules.
A RAG chatbot, on the other hand, first searches your internal sources: HR policies, product documentation, quality procedures, contracts, FAQs, support tickets, and sales documents. The answer is therefore more grounded in your operational reality.
RAG vs. fine-tuning
Fine-tuning consists of retraining or specializing a model using examples. It can be relevant for learning a style, a response structure, a classification, or a specific behavior.
But for internal documents that change regularly, RAG is often better suited. It allows knowledge to be updated by reindexing or synchronizing sources, without retraining the model. It also makes source citation, traceability, and access-right management easier.
Approach | Useful for | Limits
Simple prompt | One-off tasks, writing, reformulation | Not connected to internal data
Fine-tuning | Style, classification, recurring behaviors | Less suitable for changing knowledge
RAG | Document search, sourced answers, internal knowledge | Highly dependent on data quality and retrieval quality
Key takeaway
RAG is generally more relevant than fine-tuning when the main objective is to use an evolving internal document base with sourced answers.
4. When should RAG be used in enterprise?
Internal HR assistant
Problem: employees regularly ask the same questions about leave, remote work, expense reports, benefits, and onboarding processes.
RAG solution: connect an assistant to HR policies, internal agreements, FAQs, and procedures.
Business value: fewer repetitive requests, greater employee autonomy, consistent and traceable answers.
Augmented customer support
Problem: support teams need to search through past tickets, product sheets, escalation procedures, and knowledge bases.
RAG solution: suggest answers based on internal sources, with citations of the documents used.
Business value: reduced handling time, higher-quality answers, reuse of past incident knowledge.
Intelligent document search engine
Problem: documents are scattered across SharePoint, Google Drive, Notion, Confluence, ERP, CRM, or PDF files.
RAG solution: index priority sources and allow users to ask questions in natural language.
Business value: faster access to information, reduced dependence on experts, better reuse of knowledge.
Legal or compliance assistant
Problem: teams need to quickly find clauses, obligations, procedures, or internal decisions.
RAG solution: query contracts, policies, reference materials, internal notes, and validated archives.
Business value: faster research, more efficient preparation of analyses, better traceability.
Point of attention: this type of assistant must be carefully scoped. It can help retrieve and summarize information, but it should not be presented as automated legal advice.
Sales assistant
Problem: sales teams waste time finding the right pitches, customer cases, offer sheets, objection-handling responses, or proposal elements.
RAG solution: create an assistant connected to sales documentation and approved content.
Business value: faster responses to prospects, message consistency, faster ramp-up for new team members.
Technical knowledge base
Problem: technical documentation is extensive, partly obsolete, and difficult to browse.
RAG solution: query manuals, tickets, changelogs, diagrams, maintenance procedures, and internal guides.
Business value: faster diagnosis, fewer errors, support for field teams.
Employee onboarding
Problem: a new employee needs to absorb many documents quickly.
RAG solution: an onboarding assistant able to answer from validated internal documents.
Business value: smoother onboarding and less time spent by managers answering repetitive questions.
Key takeaway
The best RAG use cases are those where answers already exist in documents but are difficult to find, synthesize, or contextualize.
5. When RAG is not the right solution
RAG is not a universal solution.
Data that is too poor or poorly structured
If documents are incomplete, contradictory, obsolete, or scattered without logic, RAG is likely to produce poor answers. A RAG assistant does not magically fix a weak document base.
Need for complex business reasoning
RAG excels at retrieving and synthesizing information. It is less suitable when the system needs to apply complex business rules, calculate scenarios, arbitrate between several constraints, or execute transactional processes.
In these cases, a business application, a rules engine, or a structured workflow may be more appropriate.
Unreliable data
If no one knows which document is authoritative, the AI will not know either. The project must start by clarifying authorized sources, obsolete documents, and who is responsible for updates.
Automation rather than document search
If the problem is mainly about sending reminders, filling out a form, routing a ticket, or updating a CRM, a traditional automation may be enough.
Need for absolute guarantees
A RAG system can reduce hallucinations, improve traceability, and provide sources. It does not guarantee that every answer will be perfect. For critical decisions, human validation, deterministic rules, or control processes must be planned.
Key takeaway
Good scoping also means knowing when to say no to RAG. If the need is actually a workflow, a rules engine, or a traditional business application, it should be recognized as such.
6. Typical architecture of a RAG solution
A production-grade RAG architecture combines several building blocks.
Internal sources
├─ PDF, Word, PowerPoint
├─ SharePoint / Google Drive / Notion / Confluence
├─ CRM / ERP / support tickets
└─ business databases
↓
Document ingestion
├─ text extraction
├─ OCR if necessary
├─ cleaning
├─ structuring
└─ metadata detection
↓
Chunking
├─ by sections
├─ by titles
├─ by paragraphs
└─ with controlled overlap
↓
Embeddings + indexing
├─ embedding model
├─ vector database
├─ optional lexical index
└─ metadata and permissions
↓
User question
↓
Retrieval
├─ vector search
├─ keyword / BM25 search
├─ hybrid search
├─ access-right filters
└─ reranking
↓
Enriched prompt
├─ question
├─ relevant excerpts
├─ response instructions
└─ citation rules
↓
LLM
↓
User answer
├─ summary
├─ citations
├─ limitations
└─ possible actions
↓
Logs, feedback, evaluation, monitoring
Data sources
Sources can be simple, such as a folder of PDFs, or complex, such as a set of internal systems. The priority is not to connect everything, but to connect the right sources.
Document ingestion
Ingestion consists of extracting, cleaning, and transforming documents so they can be used. Scanned PDFs, tables, images, diagrams, and poorly formatted documents may require OCR, advanced parsing, or manual reprocessing.
Amazon Bedrock Knowledge Bases, for example, describes managed workflows from ingestion to retrieval: documents are converted into text chunks, turned into embeddings, and stored in a compatible vector database.
Chunking
Chunking consists of splitting documents into pieces. Chunks that are too long dilute the information. Chunks that are too short lose context. The strategy depends on the document type: procedure, contract, FAQ, technical documentation, or support ticket.
LlamaIndex notes that changing chunk size and overlap changes the embeddings that are calculated, and that a smaller chunk may be more precise while a larger chunk may be more general.
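To make the size/overlap trade-off concrete, here is a minimal fixed-size chunker. It counts words rather than model tokens (a simplifying assumption), and `chunk_size` and `overlap` correspond to the two parameters LlamaIndex refers to; the default values are illustrative, not recommendations.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `chunk_size` words; consecutive windows share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```

For example, `chunk_words("0 1 2 3 4 5 6 7 8 9", chunk_size=4, overlap=1)` returns `["0 1 2 3", "3 4 5 6", "6 7 8 9"]`: each chunk repeats the last word of the previous one, so context at the boundaries is not lost.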
Embeddings and vector database
Embeddings transform text into numerical vectors that make it possible to measure semantic proximity between a question and documentary passages.
The vector database stores these vectors and makes it possible to retrieve the passages closest to a query.
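At its core, a vector database stores one vector per chunk and returns the chunks whose vectors are most similar to the query vector, commonly by cosine similarity. In the sketch below the three-dimensional vectors are invented toy values; real embeddings come from an embedding model and have hundreds or thousands of dimensions, and production vector databases use approximate nearest-neighbour indexes rather than scanning every vector as this class does.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Brute-force stand-in for a vector database: compares the query with every stored vector."""

    def __init__(self) -> None:
        self._items: list[tuple[str, list[float]]] = []

    def add(self, chunk_id: str, vector: list[float]) -> None:
        self._items.append((chunk_id, vector))

    def search(self, query: list[float], k: int = 3) -> list[tuple[str, float]]:
        scored = [(cid, cosine(query, vec)) for cid, vec in self._items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


store = InMemoryVectorStore()
store.add("remote-work", [0.9, 0.1, 0.0])  # toy vectors, not real embeddings
store.add("expenses", [0.1, 0.9, 0.0])
store.add("incidents", [0.0, 0.2, 0.9])
top = store.search([0.8, 0.2, 0.0], k=2)  # a query vector close to "remote-work"
```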
Hybrid search and reranking
Vector search is useful, but it is not always enough. Modern architectures often combine:
vector search;
keyword search;
metadata filters;
reranking;
citations;
access-right controls.
Microsoft describes common RAG pipeline techniques: full-text search, vector search, chunking, hybrid search, query rewriting, and reranking. Pinecone also explains that two-stage reranking can improve quality by combining large-scale retrieval with more precise reordering.
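A common way to merge keyword and vector results is Reciprocal Rank Fusion (RRF), which Azure AI Search uses for hybrid queries: each chunk earns 1 / (k + rank) from every list it appears in, with the constant k commonly set to 60. The `rerank` function sketches the two-stage idea: re-score a short list of fused candidates with a slower, more precise scorer. `score_fn` is a placeholder (for instance, a cross-encoder reranking model) and nothing here depends on a specific product.

```python
from collections import defaultdict
from typing import Callable


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of chunk IDs: score(id) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def rerank(question: str, candidates: list[str],
           score_fn: Callable[[str, str], float], top_n: int = 5) -> list[str]:
    """Second stage: re-score the fused candidates with a more precise (and slower) scorer."""
    return sorted(candidates, key=lambda c: score_fn(question, c), reverse=True)[:top_n]


keyword_hits = ["c2", "c7", "c1"]  # e.g. from a BM25 index
vector_hits = ["c1", "c2", "c9"]   # e.g. from the vector database
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Here `fused` is `["c2", "c1", "c7", "c9"]`: the two chunks found by both retrievers rise to the top, which is the behaviour hybrid search is meant to reward.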
Access-right management
A critical point: a user must not receive an answer based on a document they are not allowed to access. The architecture must apply permissions at retrieval time, not only in the interface.
Microsoft highlights the need for granular access control when private content is exposed to LLMs, with mechanisms such as security trimming, query-time filters, and network isolation.
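Applying permissions at retrieval time can be sketched as follows. Each chunk carries the groups allowed to read its source document, assumed here to be copied from the source system's access-control list at ingestion; the filter runs before scoring, so a restricted chunk can never reach the prompt, however relevant it is. In a real deployment this is typically a metadata filter executed inside the search engine or vector database; `overlap` is a toy relevance function used only for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    text: str
    allowed_groups: frozenset[str]  # copied from the source document's ACL at ingestion


def security_trim(chunks: list[Chunk], user_groups: set[str]) -> list[Chunk]:
    """Keep only chunks shared with at least one of the user's groups."""
    return [c for c in chunks if c.allowed_groups & user_groups]


def retrieve_for_user(query: str, chunks: list[Chunk], user_groups: set[str],
                      score: Callable[[str, str], float], k: int = 3) -> list[Chunk]:
    visible = security_trim(chunks, user_groups)  # filter first, rank second
    return sorted(visible, key=lambda c: score(query, c.text), reverse=True)[:k]


def overlap(query: str, text: str) -> float:
    """Toy relevance score: number of shared lowercase words."""
    return float(len(set(query.lower().split()) & set(text.lower().split())))


chunks = [
    Chunk("hr-7", "salary grid for 2026", frozenset({"hr"})),
    Chunk("kb-3", "how to request a salary certificate", frozenset({"all-staff"})),
]
```

An employee outside HR searching for "salary grid" only ever gets `kb-3`, even though `hr-7` is the better textual match.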
Key takeaway
A serious RAG architecture is a complete chain: ingestion, document quality, search, permissions, prompt, model, citations, logs, and continuous improvement.
7. Important technical decisions
Which LLM model should be chosen?
The choice of model depends on several criteria:
answer quality;
cost per token;
latency;
maximum context length;
regional availability;
contractual commitments;
multilingual capability;
compatibility with security requirements.
A very powerful model is not always necessary. For many RAG use cases, the right compromise is to send simple questions to a fast, economical model and route complex cases to a more advanced one.
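One way to implement this compromise is a small router in front of the model calls. The sketch below is heuristic and hypothetical: the model names are placeholders, and the thresholds and keyword list are illustrative values to be tuned on real traffic. An alternative is to let a classifier, or the economical model itself, triage the questions.

```python
import re

# Placeholder identifiers: substitute the models actually contracted.
ECONOMY_MODEL = "economy-model"
PREMIUM_MODEL = "premium-model"

# Illustrative words that often signal comparison or reasoning rather than simple lookup.
COMPLEX_MARKERS = {"compare", "why", "analyze", "analyse", "impact", "difference", "versus"}


def choose_model(question: str, retrieved_chunks: int,
                 max_simple_words: int = 25, max_simple_chunks: int = 3) -> str:
    """Send short, narrowly sourced lookups to the cheap model; escalate everything else."""
    words = re.findall(r"[a-z]+", question.lower())
    looks_complex = (
        len(words) > max_simple_words
        or retrieved_chunks > max_simple_chunks
        or any(word in COMPLEX_MARKERS for word in words)
    )
    return PREMIUM_MODEL if looks_complex else ECONOMY_MODEL
```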
External API or self-hosted model?
Option | Advantages | Limits
External API | Fast to deploy, high-performing models, reduced maintenance | Vendor dependency, contractual clauses to verify, variable costs
Self-hosted model | Greater control over data and infrastructure, possible sovereignty | Infrastructure, MLOps, and supervision to run; higher costs and skill requirements
Hybrid | Optimization by use case | More complex architecture
Which vector database?
The choice depends on volume, latency, filters, the security model, the cloud environment, and internal skills. An SME with a few thousand documents does not have the same needs as an international group with several million chunks.
Which chunking strategy?
It must be tested. The right splitting strategy depends on the documents. A FAQ can be split question by question. A contract must preserve clauses and subclauses. A procedure must keep steps together.
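Structure-aware splitting can be as simple as cutting on the markers each document type already contains. The sketch assumes documents have been converted to text in which FAQ questions start with `Q:` and section headings start with `# ` (Markdown style, as many parsers produce); both markers are assumptions to adapt to the actual corpus. Because the split happens only at headings, the numbered steps inside a procedure section stay in the same chunk.

```python
import re


def split_on(marker_regex: str, text: str) -> list[str]:
    """Cut the text just before every line starting with the marker; the marker stays in its chunk."""
    parts = re.split(rf"(?m)^(?={marker_regex})", text)
    return [part.strip() for part in parts if part.strip()]


def split_faq(text: str) -> list[str]:
    """One chunk per question/answer pair."""
    return split_on(r"Q:", text)


def split_procedure(text: str) -> list[str]:
    """One chunk per section, so the steps of a procedure stay together."""
    return split_on(r"# ", text)
```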
How should metadata be managed?
Metadata is essential:
document type;
date;
version;
author;
department;
confidentiality;
language;
client or project concerned;
access rights;
status: draft, validated, obsolete.
Metadata makes it possible to filter, prioritize, and explain answers.
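The sketch below shows two typical uses of these fields at query time: excluding drafts and obsolete material, and keeping only the newest version when several versions of a document survive. Field names and values are hypothetical and should mirror whatever the ingestion pipeline actually extracts; `citation_label` shows how the same metadata can explain an answer to the user.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ChunkMeta:
    chunk_id: str
    doc_id: str
    version: int
    updated: date
    status: str       # "draft", "validated" or "obsolete"
    department: str


def eligible(chunks: list[ChunkMeta], department: Optional[str] = None) -> list[ChunkMeta]:
    """Keep validated chunks only, optionally restricted to one department."""
    return [c for c in chunks
            if c.status == "validated" and (department is None or c.department == department)]


def latest_versions(chunks: list[ChunkMeta]) -> list[ChunkMeta]:
    """For each document, keep only the chunks of its highest surviving version."""
    newest: dict[str, int] = {}
    for c in chunks:
        newest[c.doc_id] = max(newest.get(c.doc_id, c.version), c.version)
    return [c for c in chunks if c.version == newest[c.doc_id]]


def citation_label(c: ChunkMeta) -> str:
    """Human-readable source reference shown next to an answer."""
    return f"{c.doc_id} v{c.version} ({c.updated.isoformat()})"
```

Applied to two validated versions of the same procedure, `latest_versions(eligible(chunks))` keeps only the chunks of the newer one.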
How can hallucinations be limited?
No method eliminates them entirely. However, they can be reduced through:
high-quality retrieval;
mandatory citations;
instructions not to answer without a source;
confidence thresholds;
displaying limitations;
regular evaluation;
user feedback;
adversarial testing;
human validation for sensitive cases.
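Several of these measures (confidence thresholds, mandatory citations, no answer without a source) can be enforced in code rather than left to the prompt alone. In this sketch, `scored_chunks` holds (id, text, score) triples from retrieval, `llm` is a placeholder for the model call, and the 0.5 threshold is an arbitrary example whose meaning depends on the retriever's scoring scale. The final check is deterministic: an answer that cites nothing, or cites an ID that was not retrieved, is withheld.

```python
import re
from typing import Callable


def grounded_answer(question: str,
                    scored_chunks: list[tuple[str, str, float]],
                    llm: Callable[[str], str],
                    min_score: float = 0.5) -> str:
    # 1. Confidence threshold: ignore weakly related chunks entirely.
    sources = [(cid, text) for cid, text, score in scored_chunks if score >= min_score]
    if not sources:
        return "No validated source answers this question."
    # 2. Instruct the model to answer only from the sources and to cite them.
    context = "\n".join(f"[{cid}] {text}" for cid, text in sources)
    draft = llm(f"Answer only from these sources and cite them as [id].\n{context}\n\nQ: {question}")
    # 3. Deterministic check: every bracketed citation must point at a retrieved source.
    cited = set(re.findall(r"\[([^\]]+)\]", draft))
    if not cited or not cited <= {cid for cid, _ in sources}:
        return "Answer withheld: it does not cite the retrieved sources."
    return draft
```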
How should quality be evaluated?
A RAG system must be evaluated using real questions. Criteria may include:
accuracy;
completeness;
faithfulness to sources;
ability to say “I don’t know”;
relevance of citations;
latency;
satisfaction rate;
reduction in search time;
absence of information leakage.
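Part of this evaluation can be automated on a set of real questions annotated with the documents that should answer them. The sketch computes two of the criteria: how often the system cites an expected source, and how often it correctly declines when no source exists. `run` is a hypothetical hook returning the source IDs the system cited (an empty set meaning it declined); accuracy, completeness, and faithfulness still need human or model-assisted review.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class EvalCase:
    question: str
    # Documents that should support the answer; empty means the system should decline.
    expected_sources: set[str] = field(default_factory=set)


def evaluate(cases: list[EvalCase],
             run: Callable[[str], set[str]]) -> dict[str, Optional[float]]:
    hits = answerable = correct_refusals = unanswerable = 0
    for case in cases:
        cited = run(case.question)
        if case.expected_sources:
            answerable += 1
            hits += bool(cited & case.expected_sources)  # at least one expected source cited
        else:
            unanswerable += 1
            correct_refusals += not cited  # declined instead of inventing an answer
    return {
        "citation_hit_rate": hits / answerable if answerable else None,
        "correct_refusal_rate": correct_refusals / unanswerable if unanswerable else None,
    }
```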
Key takeaway
Technical decisions are not only technical. They influence cost, security, answer quality, maintenance, and user adoption.
8. The costs of a RAG project
The cost of a RAG project rarely depends on a single element. It mainly depends on the level of ambition, document quality, integrations, and usage volume.
Main cost items
Item | Description | Impact
Scoping | Business workshops, objectives, success criteria | Essential to avoid a useless POC
Document audit | Sources, quality, rights, obsolescence | Often underestimated
Data preparation | Cleaning, structuring, OCR, metadata | Can become a major cost item
Development | Ingestion, retrieval, interface, orchestration | Core of the project
Integrations | SSO, SharePoint, CRM, ERP, ticketing | Depends on the information system
LLM | Cost of API calls or model infrastructure | Depends on volume
Embeddings | Initial indexing and updates | Depends on document volume
Vector database | Storage, queries, availability | Depends on scale
Security | IAM, logs, encryption, testing | Non-negotiable
Monitoring | Tracking costs, errors, quality | Essential in production
Maintenance | Fixes, reindexing, improvement | Recurring cost
Indicative costs of the main LLM providers
Prices change often. The amounts below are orders of magnitude observed on May 27, 2026, and must be verified before any contractual commitment.
Provider | Example models | Indicative public prices (USD per 1M tokens)
OpenAI | GPT-5.5, GPT-5.4, GPT-5.4 mini | GPT-5.5: $5 input, $30 output; GPT-5.4: $2.50 input, $15 output; GPT-5.4 mini: $0.75 input, $4.50 output
Anthropic | Claude Opus 4.7, Sonnet 4.6, Haiku 4.5 | Opus 4.7: $5 input, $25 output; Sonnet 4.6: $3 input, $15 output; Haiku 4.5: $1 input, $5 output
Google | Gemini 3.5 Flash, Gemini 2.5 Pro, Gemini 2.5 Flash, Flash-Lite | Gemini 3.5 Flash (standard): $1.50 input, $9 output; Gemini 2.5 Pro: $1.25 input (up to 200k tokens), $10 output; Gemini 2.5 Flash: $0.30 input (text/image/video), $2.50 output; Flash-Lite: $0.10 input, $0.40 output
Mistral AI | Mistral Small 4, Large 3, Medium 3.5 | Small 4: $0.15 input, $0.60 output; Large 3: $0.50 input, $1.50 output; Medium 3.5: $1.50 input, $7.50 output
Simplified calculation example
Assume:
200 users;
20 questions per business day per user;
22 days per month;
1,500 input tokens per question, including context;
500 output tokens per answer.
This represents approximately:
132 million input tokens per month;
44 million output tokens per month.
The monthly cost depends heavily on the chosen model. At the indicative prices above, this volume costs about $31 per month on Flash-Lite, about $297 on GPT-5.4 mini, and about $1,980 on GPT-5.5, excluding embeddings, infrastructure, and retries. The key point is therefore to manage cost based on actual usage, not only on displayed prices.
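The arithmetic of this example can be reproduced, and re-run with other assumptions, in a few lines. Prices are copied from the indicative table above (USD per 1M tokens) for a subset of the listed models and will go stale; embeddings, vector storage, infrastructure, and retries are deliberately left out.

```python
# Indicative prices from the table above: (input, output) in USD per 1M tokens.
PRICES = {
    "GPT-5.5": (5.00, 30.00),
    "GPT-5.4 mini": (0.75, 4.50),
    "Claude Haiku 4.5": (1.00, 5.00),
    "Mistral Small 4": (0.15, 0.60),
    "Flash-Lite": (0.10, 0.40),
}


def monthly_tokens(users: int = 200, questions_per_day: int = 20, days: int = 22,
                   input_tokens: int = 1_500, output_tokens: int = 500) -> tuple[int, int]:
    """Total (input, output) tokens per month; input includes the retrieved context."""
    questions = users * questions_per_day * days
    return questions * input_tokens, questions * output_tokens


def monthly_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    price_in, price_out = PRICES[model]
    return tokens_in / 1_000_000 * price_in + tokens_out / 1_000_000 * price_out


tokens_in, tokens_out = monthly_tokens()
for model in PRICES:
    print(f"{model:<16} ${monthly_cost(model, tokens_in, tokens_out):>9,.2f} per month")
```

With the default parameters this reproduces the 132 million input and 44 million output tokens above; doubling `input_tokens` (more context per question) shows directly how context length drives the bill.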
What drives costs up
sending too much context to the model;
using a premium model for every question;
needlessly reindexing the entire document base;
multiplying LLM calls in an agentic workflow;
not using cache;
not limiting retrieved documents;
not monitoring usage;
connecting non-priority sources;
processing large volumes of scanned PDFs without a strategy.
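Needless full reindexing, for instance, can be avoided by recording a content hash for each indexed document and re-embedding only what changed. A minimal sketch, assuming documents are available as ID-to-text mappings and that the hashes from the previous indexing run are stored somewhere (where is left open):

```python
import hashlib


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(current_docs: dict[str, str],
                 indexed_hashes: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (documents to re-embed, documents to delete from the index)."""
    to_embed = [doc_id for doc_id, text in current_docs.items()
                if indexed_hashes.get(doc_id) != content_hash(text)]  # new or modified
    to_delete = [doc_id for doc_id in indexed_hashes if doc_id not in current_docs]
    return sorted(to_embed), sorted(to_delete)
```

Deleting vanished documents matters as much as embedding new ones: stale chunks are one way obsolete procedures keep resurfacing in answers.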
How to control costs
choose the cheapest model that reaches the expected quality level;
limit the context sent;
use reranking to send fewer but more relevant chunks;
cache recurring answers or contexts;
separate simple and complex questions;
track costs by team, use case, source, and query type;
define quotas;
start with a restricted document perimeter.
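Limiting the context and sending fewer but more relevant chunks can be enforced with a simple context packer that keeps the best-ranked chunks until a token budget or chunk cap is reached. Token counts here use a rough four-characters-per-token estimate, an assumption that holds only approximately for English; the provider's own tokenizer should be used for real budgeting. The budget and cap values are illustrative.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate (assumption): about 4 characters per token in English text."""
    return max(1, len(text) // 4)


def pack_context(ranked_chunks: list[str], budget_tokens: int = 1_000,
                 max_chunks: int = 5) -> list[str]:
    """Keep chunks in rank order until the token budget or the chunk cap is reached."""
    packed: list[str] = []
    used = 0
    for chunk in ranked_chunks[:max_chunks]:
        cost = estimate_tokens(chunk)
        if used + cost > budget_tokens:
            break  # stop at the first chunk that does not fit: the context stays a top-N prefix
        packed.append(chunk)
        used += cost
    return packed
```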
Key takeaway
The cost of RAG depends less on the “model price” than on the complete architecture: document volume, number of users, context length, usage frequency, retrieval quality, and level of integration.
9. Security, confidentiality, and compliance
GDPR and personal data
As soon as a RAG system processes personal data, GDPR must be taken into account: purpose, legal basis, minimization, retention period, data subject rights, security, subcontracting, and documentation.
The CNIL reminds organizations that GDPR does not prevent AI innovation, but it requires risks to individuals to be considered when personal data is used. It particularly emphasizes the need for a legal basis, minimization, and documentation of processing operations.
In a RAG project, the questions to ask are concrete:
do the documents contain personal data?
sensitive data?
HR data?
customer data?
health data?
confidential contractual information?
information covered by professional secrecy?
do all users have the right to access this information?
DPIA
A Data Protection Impact Assessment may be necessary or strongly recommended when the processing presents high risks. The CNIL recommends carrying out a DPIA for the development of AI systems in several situations, for example when sensitive data is collected, when personal data is processed at scale, when data about vulnerable individuals is used, or when datasets are cross-referenced.
AI Act
The AI Act introduces a risk-based approach. Obligations vary depending on whether the system is prohibited, high-risk, subject to transparency obligations, or minimal-risk. According to the European Commission, prohibited practices and AI literacy obligations have applied since February 2, 2025, and the rules on general-purpose AI models since August 2, 2025. The timeline for high-risk systems was adjusted under the Digital Omnibus: depending on the category, some rules are planned for December 2, 2027 and others for August 2, 2028.
For a company deploying an internal RAG assistant, the main issue is to qualify the use case: simple document assistant, HR tool, decision-support tool, tool used in recruitment, training, compliance, credit, health, safety, or other sensitive areas. European guidelines on the classification of high-risk AI systems were still subject to consultations and adjustments in May 2026.
Security of RAG systems
The main risks are:
leakage of sensitive data;
incorrect application of access rights;
prompt injection;
indirect prompt injection through documents;
exposure of the system prompt;
document poisoning;
unsourced answers;
logs containing sensitive data;
over-permissioned agents;
vendor dependency;
lack of auditability.
The OWASP Top 10 for LLM Applications 2025 notably ranks prompt injection, sensitive information disclosure, supply chain, data/model poisoning, excessive agency, system prompt leakage, vector and embedding weaknesses, misinformation, and unbounded consumption among the major risks of LLM applications.
The UK NCSC emphasizes that prompt injection is not comparable to simple SQL injection: LLMs do not intrinsically separate data from instructions. It recommends treating LLMs as “inherently confusable” components and reducing impacts through design, privilege limitation, monitoring, and deterministic guardrails.
Recommended measures
SSO and strong authentication;
per-user access control;
document filtering at query time;
encryption in transit and at rest;
environment segmentation;
separation of dev, test, and production;
controlled logging;
log retention policy;
masking or pseudonymization where relevant;
security testing;
AI red teaming;
internal usage policy;
supplier contract review;
subcontracting clauses;
monitoring of transfers outside the EU;
documentation of choices.
The NIST AI RMF Generative AI Profile provides a voluntary framework for integrating trust, governance, measurement, and risk-management considerations into generative AI systems. The Cloud Security Alliance also recommends applying Zero Trust principles to LLM environments: least privilege, micro-segmentation, continuous monitoring, management of human and non-human identities, and governance.
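Controlled logging, masking, and pseudonymization can be partly automated before anything is written to logs. The sketch below replaces e-mail addresses and phone-like numbers with placeholders and swaps the user ID for a keyed hash (HMAC-SHA-256), so usage can still be analyzed per user without storing the identifier itself. The regular expressions are illustrative and deliberately broad, and the key must be managed like any other secret. Pseudonymized data generally remains personal data under GDPR, so this reduces risk without taking logs out of scope.

```python
import hashlib
import hmac
import re

# Illustrative patterns; deliberately broad, so they may also mask some dates or IDs.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d .-]{7,}\d")


def mask(text: str) -> str:
    """Replace e-mail addresses first, then phone-like digit sequences."""
    return PHONE.sub("<PHONE>", EMAIL.sub("<EMAIL>", text))


def pseudonymize(user_id: str, secret_key: bytes) -> str:
    """Keyed hash: stable for a given user; linking it back requires the secret key."""
    return hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def log_record(user_id: str, question: str, secret_key: bytes) -> dict[str, str]:
    return {"user": pseudonymize(user_id, secret_key), "question": mask(question)}
```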
Key takeaway
RAG security does not rely on a magic prompt. It relies on architecture: access rights, isolation, logs, encryption, supervision, testing, and governance.
10. Common mistakes to avoid
Starting with technology
Choosing a vector database or a model before clarifying the business need often leads to an attractive but useless POC.
Connecting all data
The more sources there are, the more risks increase: document noise, obsolescence, unclear rights, costs, latency. It is better to start with a restricted and reliable corpus.
Neglecting document quality
A high-performing RAG system relies on reliable, up-to-date, and well-structured documents. Corpus quality is often more important than model choice.
Forgetting access rights
This is one of the most dangerous mistakes. An assistant must never become a workaround for accessing confidential documents.
Not planning citations
Without sources, the user cannot verify. Citations improve trust and make corrections easier.
Not testing answers
A set of real questions must be created, with expected answers and evaluation criteria. Otherwise, quality will remain subjective.
Underestimating maintenance
Documents change, sources move, models evolve, prices vary, and users discover new use cases: a RAG system must be maintained.
Confusing prototype and product
A prototype may ignore some topics: fine-grained security, monitoring, UX, error recovery, costs. A product cannot.
Launching a POC without success criteria
A POC must answer clear questions: do we save time? Are the answers reliable? Do users adopt it? Is the cost acceptable?
Key takeaway
Most RAG failures do not come from the model, but from poor scoping, a weak corpus, forgotten security, or lack of evaluation.
11. Recommended method for launching a RAG project
Scroll recommends a progressive approach in ten steps.
1. Identify the business need
Formulate the problem without talking about AI: “support teams lose 20 minutes finding a procedure,” “new employees always ask the same questions,” “sales teams do not know which document to use.”
2. Prioritize use cases
Not all use cases are equal. Priority should be given to those that combine business value, document feasibility, and manageable risk.
3. Audit available data
List the sources, their quality, freshness, format, owner, and confidentiality level.
4. Check security/GDPR constraints
Identify personal, sensitive, or confidential data. Determine rights, roles, and documentation obligations.
5. Design a target architecture
Choose the building blocks: ingestion, embeddings, vector database, model, interface, logs, monitoring, authentication.
6. Develop a limited prototype
Limit the scope: one team, one corpus, one use case, one simple interface.
7. Test on real cases
Use real user questions, not only ideal demonstrations.
8. Measure quality
Track accuracy, citations, justified non-answer rate, satisfaction, time saved, errors, and costs.
9. Deploy progressively
Expand in waves: more users, more sources, more features.
10. Maintain and improve
Reindex, correct, monitor, train, adjust prompts, review costs, improve sources.
Key takeaway
A successful RAG project looks more like a business product than an isolated technical experiment.
12. Checklist before launching a RAG project
Business problem
Who is losing time today?
What problem are we trying to solve?
Which decision or task will be improved?
What would be a useful result in 3 months?
Users
Who will use the tool?
How many users?
Which profiles?
Which different access rights?
What level of training?
Data
Which sources will be used?
Are the documents reliable?
Are they up to date?
Who maintains them?
Are there duplicates?
Are there obsolete documents?
Are the formats usable?
Security and compliance
Do the documents contain personal data?
Sensitive data?
Confidential information?
Are access rights clear?
Is a DPIA required?
Which providers will have access to the data?
Where will the data be hosted?
What is the retention period for logs?
Answer and quality
Must the model cite its sources?
Must it refuse to answer without a source?
What level of precision is expected?
Who validates the answers?
How is a bad answer corrected?
How is quality measured?
Costs
What initial budget?
What recurring budget?
How many expected queries?
Which model should be used?
Which usage thresholds?
Which monitoring?
Deployment
Prototype or product?
Which pilot team?
What timeline?
What user support?
What maintenance plan?
Key takeaway
If you cannot answer this checklist, the project is probably not ready to be developed.
13. Example of a fictional but realistic RAG project
Context
A B2B services company with 600 employees has:
1,200 internal procedures;
300 contract templates;
450 support FAQs;
20,000 historical tickets;
project committee meeting minutes;
a poorly structured SharePoint base;
several obsolete documents that are still accessible.
Support, sales, and operations teams spend a lot of time looking for information. Answers vary depending on who is asked. New employees depend heavily on internal experts.
Initial problem
The company wants to allow its teams to ask questions in natural language:
“What is the escalation procedure for a critical incident?”
“Which clause template should be used for a public-sector client?”
“How should a late reimbursement request be handled?”
“What are the prerequisites before going into production?”
It wants sourced answers, limited to validated documents, while respecting access rights.
Chosen approach
The project starts with a limited scope: support procedures and validated FAQs. Contracts and historical tickets are excluded from the first batch because they contain more sensitive data and more document noise.
Architecture
SSO through the internal identity provider;
ingestion of validated documents from SharePoint;
text extraction and cleaning;
metadata: department, date, version, owner, status;
chunking by title and section;
embeddings;
vector database with access-right filters;
hybrid search;
reranking;
answers with citations;
logs anonymized as much as possible;
user feedback;
quality/cost dashboard.
Identified risks
obsolete documents still present;
inconsistent SharePoint rights;
contradictory procedures;
out-of-scope questions;
risk of users copy-pasting customer data into prompts;
unrealistic user expectations.
Measures taken
manually validated corpus;
exclusion of non-approved documents;
refusal to answer without a source;
display of the documents used;
usage policy;
short user training;
cost thresholds;
weekly review of feedback at launch.
Expected gains
reduced search time;
more consistent answers;
faster onboarding;
fewer requests to experts;
identification of documentation gaps.
Limits
the assistant does not replace experts;
it does not yet handle contracts;
complex answers require validation;
quality depends on document updates.
Next steps
After 8 weeks of pilot:
measure quality;
identify unanswered questions;
correct the corpus;
progressively integrate non-sensitive contract templates;
add simple workflows, for example opening a ticket or suggesting a procedure.
Key takeaway
A realistic RAG project starts small, proves its value, corrects its data, then expands progressively.
14. Conclusion
RAG is one of the most useful architectures for applying generative AI to a company’s internal data. It transforms a difficult-to-use document base into a conversational interface capable of searching, synthesizing, and citing its sources.
But a good RAG project is not simply about connecting a chatbot to documents. Its success depends on several factors:
a clear business need;
a reliable corpus;
a solid retrieval architecture;
strict access-right management;
security by design;
attention to GDPR and the AI Act;
cost management;
a simple user experience;
continuous evaluation;
real maintenance.
RAG is not magic. But when properly designed, it can become a very concrete tool to reduce search time, capitalize on internal knowledge, streamline operations, and improve the quality of business answers.
Key takeaway
The value of RAG does not come only from the model. It comes from the whole system: data, architecture, security, usage, governance, and adoption.
15. Scroll call to action
Are you considering creating an AI assistant connected to your internal data?
Scroll helps you scope the need, assess feasibility, design the architecture, and develop a solution adapted to your business, technical, and regulatory constraints.
Our approach is pragmatic:
start from the business need;
audit available data;
identify risks;
choose a suitable architecture;
develop a useful prototype;
measure quality;
prepare a secure deployment.
Let’s discuss your RAG project
Before building, have your AI project scoped: use cases, data, architecture, costs, security, compliance, and deployment roadmap.
Sources used
Specifications
Brief provided by the user for the positioning, structure, target audience, tone, and constraints of the white paper.
RAG and modern architectures
Microsoft Learn, “Retrieval-augmented generation in Azure AI Search”, accessed May 27, 2026. Source used for enterprise RAG issues: query understanding, multi-source access, token constraints, governance, security, classic RAG, and agentic retrieval.
Microsoft Cloud Blog, “Common retrieval augmented generation techniques explained”, February 4, 2025. Source used for RAG techniques: full-text search, vector search, chunking, hybrid search, query rewriting, and reranking.
AWS Prescriptive Guidance, “Understanding Retrieval Augmented Generation”. Source used for the definition of RAG, the basic steps, and the role of embeddings, the vector database, and the orchestrator.
AWS, “Amazon Bedrock Knowledge Bases”. Source used for managed RAG workflows, ingestion, vector stores, advanced chunking, GraphRAG, reranking, and source attribution.
LlamaIndex, “Basic Strategies”. Source used for considerations around chunk size, chunk overlap, embeddings, and hybrid search.
Pinecone, “Rerankers and Two-Stage Retrieval”. Source used to explain the value of reranking and two-stage retrieval.
GDPR, CNIL, and personal data
CNIL, “Développement des systèmes d’IA : les recommandations de la CNIL pour respecter le RGPD”, July 22, 2025. Source used for GDPR principles applicable to AI: legal basis, minimization, security, sensitive data, DPIA, and documentation.
CNIL, passages on legal basis, minimization, and the responsibilities of the data controller / processor.
CNIL, passages on DPIA and risk criteria.
European Data Protection Board, thematic page “Artificial intelligence”, including Opinion 28/2024 on certain data protection aspects related to AI models.
AI Act
European Commission, “AI Act | Shaping Europe’s digital future”. Source used for the risk-based approach, GPAI obligations, transparency obligations, application timeline, and adjustments linked to the Digital Omnibus.
AI Act Service Desk, “Timeline for the Implementation of the EU AI Act”. Source used for the progressive implementation timeline and 2025–2027 milestones, with mention of the Digital Omnibus context.
European Commission, May 19, 2026 consultation on guidelines for classifying high-risk AI systems. Source used to note that qualification of high-risk systems remains something to verify case by case.
European Commission, “Standardisation of the AI Act”. Source used for the link between high-risk rules, harmonized standards, and the adjusted timeline.
AI, LLM, and RAG security
OWASP GenAI Security Project, “2025 Top 10 Risk & Mitigations for LLMs and Gen AI Apps”. Source used for the main LLM risks: prompt injection, sensitive information disclosure, supply chain, poisoning, excessive agency, system prompt leakage, vector and embedding weaknesses, misinformation, and unbounded consumption.
OWASP, “LLM01:2025 Prompt Injection”. Source used to specify that RAG and fine-tuning do not fully eliminate prompt injection risks.
NIST, “Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile”, published July 26, 2024. Source used for the generative AI risk-management framework.
NCSC, “Prompt injection is not SQL injection (it may be worse)”, December 8, 2025. Source used to explain why prompt injection is a structural risk of LLMs and why it must be managed through architecture, impact limitation, and monitoring.
Cloud Security Alliance, “Using Zero Trust to Secure Enterprise Information in LLM Environments”, February 3, 2026. Source used for Zero Trust recommendations applied to LLM environments: least privilege, micro-segmentation, continuous monitoring, IAM, and governance.
CISA, “New Best Practices Guide for Securing AI Data Released”, May 22, 2025. Source used for best practices to secure data used to train and operate AI systems.
LLM pricing and API costs
OpenAI, “API Pricing”, accessed May 27, 2026. Source used for GPT-5.5, GPT-5.4, GPT-5.4 mini, batch API, and cost-related options.
Anthropic, “Claude API Pricing”, accessed May 27, 2026. Source used for Claude Opus 4.7, Sonnet 4.6, and Haiku 4.5 pricing.
Google AI for Developers, “Gemini Developer API pricing”, accessed May 27, 2026. Source used for Gemini 3.5 Flash, Gemini 2.5 Pro, Gemini 2.5 Flash, Gemini 2.5 Flash-Lite, and Gemini embeddings pricing.
Mistral AI Docs, model sheets for Mistral Small 4, Mistral Large 3, and Mistral Medium 3.5, accessed May 27, 2026. Sources used for indicative prices per million tokens and context characteristics.
Scroll
Custom AI and development agency. Paris.
20 Rue des Taillandiers
75011 Paris
contact@agence-scroll.com
+33 6 48 03 90 27
© 2026 — Scroll Agency