Designing a Self-Hosted AI Assistant (BottyGPT Architecture)


In my last post, I talked about where your AI knowledge should live — and why I chose to self-host DocsGPT rather than rely on third-party tools.

This is the follow-up to that: a look at how the system actually comes together.

This post isn’t about implementation details just yet. It’s about the shape of the system — how the pieces fit together, where things live, and why I made the decisions I did.


The Goal

At a high level, I wanted something pretty simple:

  • One AI assistant
  • One source of truth
  • Available everywhere I publish

That means whether you're reading docs or browsing my site, you're talking to the same system, with the same context and the same understanding of the content.

No fragmentation, no duplicated setups, no weird inconsistencies.

And... not a ton of time spent designing the avatar...


The Core Pattern

The entire setup follows a simple pattern:

One backend, many frontends

There’s a single DocsGPT-powered backend acting as a RAG system, and it’s shared across:

  • my main Ghost site
  • my Docusaurus docs site

Both use the same:

  • apiHost
  • apiKey
  • widget implementation

So no matter where you interact with it, the assistant behaves the same way — same scope, same answers, same citations.

That consistency was important to me. If the assistant is part of the experience, it shouldn’t feel different depending on where you are.


The System at a Glance

At runtime, everything lives on a single VM, orchestrated with Docker Compose.

Inside that, there are a handful of key components:

1. The Widget (UI Layer)

This is what you actually see on the site.

It’s embedded into:

  • the Ghost theme (via default.hbs)
  • the docs site (via script or React component)

Its job is simple:

  • send queries to the backend
  • render responses
  • show citations

It doesn’t know anything about the data itself — it just talks to the API.
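The embed itself is tiny. Here's a sketch of what the snippet in `default.hbs` might look like, assuming the widget ships as a script bundle — the script URL and the `DocsGPTWidget` init call are illustrative, not the exact DocsGPT API:

```html
<!-- Hypothetical embed: script URL and init function are illustrative -->
<script src="https://example.com/docsgpt-widget.js"></script>
<script>
  // Same apiHost + apiKey on every site, so each surface
  // talks to the one shared backend.
  DocsGPTWidget({
    apiHost: "https://assistant.example.com",
    apiKey: "PUBLIC_WIDGET_KEY",
  });
</script>
```

The docs site uses the same configuration, just wrapped in a React component instead of a raw script tag.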


2. DocsGPT Backend (API)

This is the core of the system.

It handles:

  • incoming queries
  • retrieval from the vector database
  • LLM orchestration
  • response formatting

It’s exposed publicly as an API, and everything flows through it.
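To make that flow concrete, here's a minimal sketch of what a widget-to-backend query could look like, using only Python's standard library. The endpoint path and payload field names are assumptions for illustration, not the exact DocsGPT contract:

```python
import json
import urllib.request

API_HOST = "https://assistant.example.com"  # hypothetical apiHost

def build_query(question: str, api_key: str = "PUBLIC_WIDGET_KEY") -> dict:
    """Shape of the payload the widget might send (field names are illustrative)."""
    return {"question": question, "api_key": api_key, "history": []}

def ask(question: str) -> dict:
    """POST the query to the backend and return the parsed JSON answer."""
    payload = json.dumps(build_query(question)).encode()
    req = urllib.request.Request(
        f"{API_HOST}/api/answer",  # assumed endpoint path
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Everything the widget does reduces to this one round trip; retrieval, orchestration, and citation assembly all happen behind that single endpoint.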


3. Worker + Background Jobs

Some tasks don’t belong in the request cycle.

Things like:

  • embedding documents
  • ingestion
  • longer-running operations

These run through a worker (Celery) using Redis as a broker.
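The pattern itself is simple: the API enqueues a job and returns immediately, while a worker drains the queue. In my setup that's Celery with Redis as the broker; here's the same idea sketched with Python's stdlib so the shape is visible without any infrastructure:

```python
import queue
import threading

jobs: "queue.Queue" = queue.Queue()
results: dict = {}

def worker() -> None:
    """Drain the queue: this stands in for a Celery worker process."""
    while True:
        doc = jobs.get()
        if doc is None:  # sentinel: shut the worker down
            break
        # In the real system this step would embed the document
        # and write the vectors to Qdrant.
        results[doc] = f"embedded:{doc}"
        jobs.task_done()

# The API-side call site: enqueue and return, never block the request cycle.
t = threading.Thread(target=worker, daemon=True)
t.start()
jobs.put("getting-started.md")
jobs.put("architecture.md")
jobs.join()     # wait for the demo; a real API handler would not do this
jobs.put(None)  # stop the worker
t.join()
```

Swap the `queue.Queue` for Redis and the thread for a Celery worker process, and you have the production shape.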


4. Data Layer

There are three main pieces here:

  • Qdrant → vector search (embeddings + retrieval)
  • Redis → task queue + short-term caching
  • MongoDB → metadata and operational state

Each one does one job, which keeps the system modular.


5. Frontend + Static Assets

The DocsGPT frontend (served via Nginx) handles:

  • UI assets
  • admin interface
  • widget resources

This sits alongside the API, but stays separate in responsibility.


Where It All Lives

Everything runs on a single Google Compute Engine VM in the Montréal region.

That includes:

  • backend API
  • worker
  • vector database
  • Redis + Mongo
  • frontend + Nginx

It’s all containerized and orchestrated through Docker Compose.

No Kubernetes. No distributed system. Just one well-defined runtime.
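As a rough sketch, the Compose file declares one service per component above. The service names, images, and ports here are illustrative placeholders, not my actual file:

```yaml
# Illustrative docker-compose.yml shape -- names and images are placeholders
services:
  backend:
    image: docsgpt/backend        # hypothetical image name
    depends_on: [redis, mongo, qdrant]
  worker:
    image: docsgpt/backend        # same image, worker entrypoint
    command: celery -A app worker
    depends_on: [redis]
  redis:
    image: redis:7
  mongo:
    image: mongo:7
  qdrant:
    image: qdrant/qdrant
  frontend:
    image: nginx:alpine
    ports: ["80:80", "443:443"]
```

One `docker compose up -d` brings the whole assistant online, which is most of the appeal.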


Why This Architecture

There are a few principles behind this setup.

1. Simplicity scales surprisingly far

One VM is easy to reason about.

  • one place to SSH
  • one set of logs
  • one deployment target

For the scale I’m operating at, that simplicity is a feature, not a limitation.


2. Consistency across experiences

By centralizing the backend, I avoid:

  • duplicated embeddings
  • inconsistent answers
  • fragmented context

The assistant behaves like a system, not a feature bolted onto individual sites.


3. Data residency matters

Everything runs in Canada.

That’s a deliberate choice — it keeps data within a region I’m comfortable with and makes the trust story a lot cleaner, especially for Canadian clients.


4. Predictable performance and cost

Running a dedicated VM with a local vector database means:

  • no surprise per-request costs
  • fewer network hops
  • more consistent latency

It’s not the most “cloud-native” approach, but it is very predictable.


Tradeoffs (On Purpose)

This setup isn’t trying to do everything.

There are a couple of intentional tradeoffs:

  • no automatic horizontal scaling
  • single VM = single point of failure
  • no multi-region redundancy

And that’s fine — for now.

The goal here wasn’t to build a hyperscale system. It was to build something:

  • understandable
  • controllable
  • and easy to evolve

What’s Next

This post is the high-level view.

In the next one, I’ll break down how this actually runs in practice — the VM setup, Docker Compose, and how all the services are wired together day-to-day.

Because the interesting part isn't just the architecture; it's how it behaves when you actually run it.