— Practice / RAG

RAG development.

Retrieval-augmented generation lets a model answer from your own documents, with citations, instead of guessing. We build production RAG pipelines on Cloudflare Workers: chunking, embeddings, a vector store, and generation, each tuned to your corpus rather than lifted from a tutorial.

— Why us

Grounded and cited, not guessed.

The AI search on this site is a working RAG pipeline — it answers from our own pages and links the sources. We publish an open-source RAG template that other teams deploy in five minutes, and we build the same for clients: no LangChain, no framework, every primitive mapped to one service you can reason about.

Deploy your own in five minutes: git clone https://github.com/setkernel/cf-rag-template

See the open-source template · Try our AI search · Our AI practice

— What we build

The whole pipeline, honestly built.

Chunking and ingestion

The unglamorous part that decides answer quality — splitting your content so retrieval returns the right context, and keeping it in sync as the source changes.
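A minimal sketch of the splitting step, assuming fixed-size word windows with overlap. Production chunking is usually structure-aware (headings, paragraphs), and `chunkDocument`, its `Chunk` shape, and the default sizes are hypothetical illustrations, not code from cf-rag-template.

```typescript
// Hypothetical chunk shape: a stable id lets re-ingestion overwrite by id.
interface Chunk {
  id: string; // `${docId}#${index}`
  docId: string;
  index: number;
  text: string;
}

// Split a document into overlapping windows of words.
// size and overlap are assumed defaults; tune them per corpus.
function chunkDocument(docId: string, text: string, size = 200, overlap = 40): Chunk[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: Chunk[] = [];
  const step = size - overlap;
  for (let start = 0; start < words.length; start += step) {
    const index = chunks.length;
    chunks.push({
      id: `${docId}#${index}`,
      docId,
      index,
      text: words.slice(start, start + size).join(" "),
    });
    // Stop once this window reaches the end, so the tail is not emitted twice.
    if (start + size >= words.length) break;
  }
  return chunks;
}
```

The overlap gives a passage that straddles a boundary a better chance of appearing intact in one chunk. The stable `docId#index` ids let a re-ingest overwrite a changed document's chunks by id, though stale trailing chunks still need deleting if the document shrinks.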

Embeddings

Turning your text into vectors with a model matched to the corpus and the budget — on Cloudflare Workers AI or OpenAI, whichever fits.
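A sketch of batching chunks through an embedding model and into a vector index. `EmbeddingBinding` and `VectorIndex` are assumed minimal shapes for the slices of the Workers AI and Vectorize bindings used here, written as interfaces so the function can be exercised with stubs. The model name is one Workers AI embedding model, and `BATCH_SIZE` is an assumed value to check against the model's documented input limits.

```typescript
// Assumed minimal shapes of the bindings this sketch touches; the real
// Workers types are larger. Typing them narrowly keeps the function testable.
interface EmbeddingBinding {
  run(model: string, inputs: { text: string[] }): Promise<{ data: number[][] }>;
}
interface VectorIndex {
  upsert(
    vectors: { id: string; values: number[]; metadata?: Record<string, string> }[],
  ): Promise<unknown>;
}

const EMBEDDING_MODEL = "@cf/baai/bge-base-en-v1.5"; // 768-dim; swap to fit corpus and budget
const BATCH_SIZE = 50; // assumption: verify against the model's input limit

// Embed chunks in batches and upsert them; returns how many were stored.
async function embedAndStore(
  ai: EmbeddingBinding,
  index: VectorIndex,
  chunks: { id: string; docId: string; text: string }[],
): Promise<number> {
  let stored = 0;
  for (let i = 0; i < chunks.length; i += BATCH_SIZE) {
    const batch = chunks.slice(i, i + BATCH_SIZE);
    const { data } = await ai.run(EMBEDDING_MODEL, { text: batch.map((c) => c.text) });
    // One vector per input text, in order; anything else is a bug upstream.
    if (data.length !== batch.length) throw new Error("embedding count mismatch");
    await index.upsert(
      batch.map((c, j) => ({ id: c.id, values: data[j], metadata: { docId: c.docId } })),
    );
    stored += batch.length;
  }
  return stored;
}
```

Storing `docId` as metadata keeps a path from every retrieved vector back to the page it came from, which is what a citation needs later.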

Vector search

A vector store, Cloudflare Vectorize or another, with hybrid retrieval (vector similarity plus keyword matching) where it helps, so the model gets the passages that actually answer the question.
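A sketch of ranking and fusion, assuming reciprocal rank fusion (RRF) as the way to merge a vector ranking with a keyword ranking. RRF is one common choice, not necessarily the one a given pipeline uses. The brute-force `rankByVector` stands in for a store such as Vectorize, which runs the nearest-neighbour search for you; all function names are hypothetical.

```typescript
// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Brute-force vector ranking: chunk ids sorted by similarity to the query.
function rankByVector(query: number[], docs: Map<string, number[]>): string[] {
  return Array.from(docs.entries())
    .map(([id, v]) => ({ id, score: cosine(query, v) }))
    .sort((x, y) => y.score - x.score)
    .map((x) => x.id);
}

// Reciprocal rank fusion: each list contributes 1 / (k + rank), rank from 1.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, i) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1));
    });
  }
  return Array.from(scores.entries())
    .sort((x, y) => y[1] - x[1])
    .map(([id]) => id);
}
```

Because RRF uses only rank positions, it needs no calibration between cosine scores and keyword scores, which live on different scales. `k = 60` is the constant commonly used with RRF.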

Grounded generation with citations

The model answers strictly from retrieved context and cites the source, guided by a system prompt that tells it to decline rather than invent, so a wrong answer traces back to a missing document, not a hallucination.
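A sketch of how retrieved passages can be numbered into the prompt and how `[n]` markers in the answer map back to source links. The prompt wording, the `Source` and `Message` shapes, and both function names are assumptions for illustration; the `{ role, content }` message list follows the common chat format.

```typescript
interface Source { title: string; url: string; text: string; }
interface Message { role: "system" | "user"; content: string; }

// Illustrative wording only; real prompts are tuned per corpus.
const SYSTEM_PROMPT = [
  "Answer only from the numbered sources provided.",
  "Cite every claim with its source number in square brackets, e.g. [1].",
  "If the sources do not contain the answer, say you don't know.",
].join(" ");

// Number each retrieved passage so the model can cite it as [n].
function buildGroundedMessages(question: string, sources: Source[]): Message[] {
  const context = sources
    .map((s, i) => `[${i + 1}] ${s.title} (${s.url})\n${s.text}`)
    .join("\n\n");
  return [
    { role: "system", content: SYSTEM_PROMPT },
    { role: "user", content: `Sources:\n${context}\n\nQuestion: ${question}` },
  ];
}

// Map [n] markers in the answer back to source URLs, in first-cited order,
// dropping any number that does not correspond to a retrieved source.
function citedUrls(answer: string, sources: Source[]): string[] {
  const urls: string[] = [];
  const marker = /\[(\d+)\]/g;
  let m: RegExpExecArray | null;
  while ((m = marker.exec(answer)) !== null) {
    const n = Number(m[1]);
    const url = n >= 1 && n <= sources.length ? sources[n - 1].url : undefined;
    if (url !== undefined && !urls.includes(url)) urls.push(url);
  }
  return urls;
}
```

Dropping out-of-range numbers means a citation the model invents never reaches the reader as a link.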

Evals

Retrieval and answer quality measured in the pipeline, so a change that quietly degrades results fails the build instead of shipping.
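A sketch of a retrieval eval that can gate a build, assuming a hand-written set of questions paired with the chunk ids that should answer them. Hit rate at k is one simple retrieval metric; `EvalCase`, `hitRateAtK`, `assertNoRegression`, and the tolerance are hypothetical.

```typescript
// A golden question and the chunk ids that should be retrieved for it.
interface EvalCase { question: string; expectedIds: string[]; }

// Fraction of cases where at least one expected chunk appears in the top k.
function hitRateAtK(
  cases: EvalCase[],
  retrieve: (question: string) => string[],
  k: number,
): number {
  if (cases.length === 0) return 0;
  let hits = 0;
  for (const c of cases) {
    const top = retrieve(c.question).slice(0, k);
    if (c.expectedIds.some((id) => top.includes(id))) hits++;
  }
  return hits / cases.length;
}

// CI gate: throw if the score falls more than `tolerance` below the baseline.
function assertNoRegression(score: number, baseline: number, tolerance = 0.02): void {
  if (score < baseline - tolerance) {
    throw new Error(`retrieval hit rate ${score.toFixed(3)} fell below baseline ${baseline}`);
  }
}
```

Run as a CI step, the uncaught error exits the process with a non-zero status, so a quiet drop in retrieval quality becomes a failed build.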

The right surface

A search box, an API, an agent tool, or an MCP server: we build the pipeline into the surface that fits, and it is cheap to run because Workers bills CPU time, not the time a request spends waiting on the model or the vector store.

— How it works

The engagement.

Same five-step method as every SetKernel build — Brief, Architect, Sprint, Ship, Operate — each with a written artefact you review. We start from a short written brief: the questions the system should answer, the documents it should answer from, and what a good answer looks like. You get a scoped price and a fit / no-fit answer within one business day.

— Where we work

Atlantic Canada, and worldwide.

We are a complete technology partner in Halifax, Nova Scotia, Canada, and we work remotely with teams well beyond the region. A RAG pipeline is cloud-native by nature — where your team sits does not change the build.

— Questions

Before you write.

What is RAG, in one sentence?

Retrieval-augmented generation retrieves the most relevant passages from your own content and hands them to a language model as context, so the answer is grounded in your documents and can cite them — rather than relying on whatever the model happened to memorise.

RAG or fine-tuning — which do I need?

Usually RAG. Fine-tuning changes how a model writes; RAG changes what it knows, and it updates the moment your documents do, with citations you can check. Most "the model should know our stuff" problems are retrieval problems, not training problems. We will tell you honestly if yours is the exception.

How do you stop it from making things up?

Three layers. The system prompt instructs the model to answer only from retrieved context and to say so when the answer is not there; retrieval is tuned so the right passages actually surface; and evals catch regressions. A wrong answer becomes a missing or mis-chunked document, something you can fix, not an unexplained hallucination.

Can we see your work first?

Yes. The AI search on this site is a live RAG pipeline with citations. Read the open-source cf-rag-template and its companion essay on building RAG on Cloudflare without LangChain. We would rather show you working code than a pitch.

How do we start?

Send a short written brief — the questions to answer, the documents to answer from, the deadline. We reply in writing within one business day with fit / no-fit and, if fit, a scope and price. No discovery call before the brief.

— Engage

Have a pile of documents a model should be able to answer from?

Tell us in two paragraphs — the questions, the documents, what a good answer looks like. We reply in writing within one business day.

AI agent development · MCP server development · Write a brief