Dad's Manual

Thesis

AI doesn't replace you. It amplifies the parts of you that are reproducible. The part only you have, intent, is still almost entirely yours.

The whole piece builds toward one equation:

Agent + Knowledge + Brain + Memory + Senses + Interface + Tools = Capability

Intent + Capability = Work getting done

Humans have always been this stack. AI is a new stack with the same shape. Combine them and work that used to take weeks takes hours. But the thing that starts any of it, wanting something done, is still yours.

The same framework describes one chat window and a team of agents running a business. What changes is the number of agents, the depth of each block, and how they're connected. The shape holds.

The framework: eight blocks

Each block has a human version, an AI version, a one-line "why this matters," and a drill-down for the curious.

A British dad figure labelled with the eight blocks of the framework: Intent at the heart, Brain at the crown, Knowledge at the neck, Memory at the temple, Senses at the face, Interface at the hands, Tools at the mug of tea, Agent as the whole figure. — Same eight blocks, human and AI. The dotted heart on the AI is not decoration.

A humanoid AI figure labelled with the same eight blocks, with a hand-drawn dotted heart and a question mark marking the absent Intent in its chest.

1. Intent

The thing that starts everything. Before the tools, before the models, before the brainpower: the wanting.

Human: you, deciding you want something
AI: not yet (*), a contested and occasionally alarming open question
Why it matters: every block below amplifies intent. No intent, no amplification. "Use AI more" is useless advice. "I want to get X done, faster" is where it starts.

Intent is the one block you cannot outsource. You can outsource knowledge (tell the AI what you know). You can outsource brainpower (ask the AI to think). You can outsource memory (let the AI remember for you). You can even outsource deciding (ask for a recommendation and take it). What you cannot outsource is wanting in the first place. Every block below exists to serve yours. Without it, none of them run.

* Terminator caveat acknowledged. See you in 2029.

Drill-down

The practical version of this: when an AI session goes sideways, it's almost always because the intent wasn't clear. Vague inputs produce vague outputs. "Help me with my emails" is a wish. "Draft three replies to the ones from the school, polite but firm because the headmaster is being unreasonable again" is an intent.

The better you get at articulating intent, the less time you'll spend wrangling unusable AI responses. This is the single most underrated skill in the field, and nobody teaches it, because it isn't really AI. It's communication. The dad who shows a picture and says "number 2 on the sides, scissor-trim on top, leave the fringe" gets the haircut he wanted. The dad who says "just a little off" comes home looking like a banker from 1987. Same principle.

The industry calls this "prompt engineering," but don't be put off by the grandness of the term. It's just asking clearly.
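One way to make "asking clearly" a habit is to treat an intent as a short form with a few fields to fill in. The sketch below is illustrative only: the Intent class, its field names, and the word-limit constraint are invented for this manual and are not part of any AI product.

```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    """A hypothetical checklist for stating what you want (not a real API)."""
    task: str                      # what you want done
    audience: str = ""             # who it is for
    tone: str = ""                 # how it should sound
    constraints: list[str] = field(default_factory=list)  # limits, must-haves

    def to_prompt(self) -> str:
        """Join the filled-in fields into one clear request."""
        parts = [self.task.strip()]
        if self.audience:
            parts.append(f"Audience: {self.audience}.")
        if self.tone:
            parts.append(f"Tone: {self.tone}.")
        for item in self.constraints:
            parts.append(f"Constraint: {item}.")
        return " ".join(parts)


# A wish: only the task is filled in.
wish = Intent(task="Help me with my emails.")

# An intent: the same form with the blanks filled in.
intent = Intent(
    task="Draft three replies to the emails from the school.",
    audience="the headmaster",
    tone="polite but firm",
    constraints=["keep each reply under 120 words"],  # made-up example limit
)

print(wish.to_prompt())
print(intent.to_prompt())
```

The code does nothing clever. The point is that the second prompt leaves the model far less to guess.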

2. Agent

The thing that does the work. When people say "AI", this is usually what they mean.

Human: you, your colleagues, the people you hire
AI: a growing family of specialists, including chat models, image generators, video generators, voice models, code models, and increasingly, autonomous agents that string these together
Why it matters: "AI" is a family, not a single thing. Picking the right agent for the task is half the win. You don't call a plumber for a haircut.

Drill-down

Think of AI agents like kitchen appliances. Within the family of "blenders" there are Nutribullets, Vitamixes, and Magic Bullets. They're all blenders, but you'd choose differently depending on what you're making. AI works the same way: a family of underlying kinds, each with specific products.

Language models (LLMs). The workhorse family. They process and generate text. Underneath ChatGPT is a model called GPT (made by OpenAI). Underneath Claude is Claude (Anthropic). Underneath Gemini is Gemini (Google). The product is the shop window; the model is what's behind the counter. Products: ChatGPT, Claude, Gemini, Perplexity, Copilot. Use for: thinking, writing, summarising, explaining, deciding. If in doubt, start here.

Image models. A family (mostly built on a technique called diffusion) that produces images from text descriptions or reference pictures. Products: Midjourney, Flux, Google Imagen, Ideogram, Adobe Firefly. Use for: illustrations, logos, concept art, Christmas cards.

Video models. A newer family that produces short clips, typically 5–10 seconds long. Products: Kling, Runway, Luma, Sora, Veo. Use for: social content, short montages. Not yet great for long-form.

Voice and audio models. Three flavours: transcription (Whisper), voice generation (ElevenLabs), music generation (Suno, Udio).

Decision-making systems (reinforcement learning). A quieter family that learns by trial and error rather than by reading text. Famously: AlphaGo. Also used in robotics, self-driving cars, and game-playing AI. You're unlikely to use one directly, but it's worth knowing the family exists.

Scientific models. Trained to respect specific laws, not just spot patterns in data. Example: PINNs (Physics-Informed Neural Networks), used in fluid dynamics, engineering simulations, climate modelling. Not consumer-facing, but you'll hear more as industries build their own.

Rule-based systems. The oldest family. No learning, just explicit logic. Expert systems, spam filters, the decision tree in your car's fault diagnostics. Technically still AI, and in the right context, still the right answer.

Model, product, agent: three things often confused

A model is the underlying AI engine. Think GPT-4, Claude Sonnet, Midjourney v7. You don't usually interact with models directly.
A product is the branded app wrapping a model. ChatGPT, Claude.ai, the Midjourney Discord bot. This is what most people mean when they say "AI."
An agent is a model (or product) given a role, knowledge, tools, and a goal, and let loose to pursue it semi-autonomously. Agents are what Keith's setup uses (see Altitude 3). A chat window is not an agent; an agent is a chat window with delegated intent, memory, and hands.

This manual uses "Agent" as the friendly shorthand for "the thing doing the work," because the word has caught on generally. But when someone in the group says "I built an agent," they probably mean the third thing, not the first two.

Persona files: giving each agent a "who am I"

Every serious agent carries a persistent set of instructions defining who it is, what it does, what it doesn't do, and how it writes. Different tools and frameworks call this by different names, but it's the same idea:

CLAUDE.md: Claude Code projects
soul.md: Hermes and other character-first frameworks
AGENTS.md: emerging convention in several open-source stacks
.cursorrules: the Cursor IDE
.clinerules / .windsurfrules: Cline, Windsurf, similar tools
Custom Instructions: the field in ChatGPT Projects and Claude Projects
System prompt: the generic name that sits underneath all of the above

The file (or field) is where you write things like "You are a research analyst. Respond concisely. Never invent citations. Flag uncertainty explicitly." A 200-word persona file well done is the difference between a competent agent and a generic assistant. Done badly or not at all, the agent defaults to a vaguely helpful, verbose, uncertain non-specialist.
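For the curious, here is roughly what happens to a persona file under the bonnet: its text is placed ahead of your request as the system prompt. This is a minimal sketch, not any vendor's actual code. The role/content message layout is a common convention across chat APIs, but exact field names vary by provider, so treat it as an assumption. The function names and the sample request are made up.

```python
from pathlib import Path

# Hypothetical persona text, echoing the example instructions above.
PERSONA = """You are a research analyst.
Respond concisely.
Never invent citations.
Flag uncertainty explicitly.
"""


def load_persona(path: Path, default: str = PERSONA) -> str:
    """Read a persona file if it exists, else fall back to the default text."""
    return path.read_text(encoding="utf-8") if path.is_file() else default


def build_messages(persona_text: str, user_request: str) -> list[dict]:
    """Put the persona first as a system message, then the user's request."""
    return [
        {"role": "system", "content": persona_text.strip()},
        {"role": "user", "content": user_request.strip()},
    ]


persona = load_persona(Path("AGENTS.md"))
messages = build_messages(persona, "Summarise the attached market report.")
for message in messages:
    print(message["role"], "->", message["content"][:40])
```

Because the persona rides along with every request, you write it once and the agent stays in character without being reminded.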

Foundation models and where this is heading. The newest concept in this space is the foundation model: a single large model trained so broadly that it can be adapted to many tasks (text, images, audio, reasoning) rather than specialised for one. GPT-4, Gemini, and Claude are all foundation models now. The taxonomy above is still useful, but expect the families to bleed into each other as one model learns to handle everything.

3. Knowledge

What the agent already knows before you ask. The equivalent of everything you learned at school, plus every book you've read, plus every job you've had.

Human: school, university, work experience, reading, apprenticeship, years of doing
AI: pre-training (read most of the internet, learned language and world), fine-tuning (apprenticed in a speciality), RLHF (learned from human feedback), safety training (learned what not to do)
Why it matters: an AI knows an enormous amount in general, and absolutely nothing about you. Your company, your kids, your life, your business: not in its training. You'll have to supply that knowledge separately.

Drill-down

There are actually three kinds of knowledge in play, and confusing them is the source of most AI frustration.

Parametric knowledge. What's baked into the model during training. Fixed. Hard to change. This is why you can't "teach ChatGPT your company's jargon" by telling it once in a chat. The chat ends, and the knowledge evaporates.

Contextual knowledge. What's in the current conversation. Everything you've pasted in, uploaded, or told the model since the session started. Powerful, but limited by the context window (see Memory) and gone when the session ends.

Retrieved knowledge (RAG, for "retrieval-augmented generation"). External knowledge the agent can look up on demand, usually from a vector database you've set up. This is how enterprise setups like Keith's give agents access to CRM data, documents, and policies without retraining anything.

Most BDD-level usage will lean on the first two. A serious personal setup eventually grows into the third.

The honest limitation: no matter how clever the AI, if the knowledge it needs isn't in one of these three buckets, it will either guess (plausibly, sometimes wrongly) or tell you it doesn't know. Noticing this distinction is half of working with AI well.
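The retrieved-knowledge idea can be sketched in a few lines: look up the most relevant note, then paste it into the prompt so it becomes contextual knowledge. To stay self-contained, this example swaps the vector database for plain word overlap, which is much cruder than real retrieval. The documents, function names, and prompt wording are all invented for illustration.

```python
import re

# Hypothetical household "knowledge base": facts no model was trained on.
DOCUMENTS = [
    "School pickup is at 3:15pm on weekdays, except Friday when it is 12:30pm.",
    "The car insurance renews on 14 March and is paid by direct debit.",
    "The boiler was last serviced in October by the usual engineer.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Rank documents by how many words they share with the question."""
    q_words = tokenize(question)
    scored = [(len(q_words & tokenize(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]


def build_prompt(question: str, documents: list[str]) -> str:
    """Paste the retrieved text into the prompt as contextual knowledge."""
    context = "\n".join(retrieve(question, documents)) or "(nothing found)"
    return f"Use only this context:\n{context}\n\nQuestion: {question}"


print(build_prompt("When does the car insurance renew?", DOCUMENTS))
```

When nothing relevant is found, the prompt says so. That is the honest limitation in miniature: the model is told the bucket is empty rather than left to guess.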

4. Brain

The processing engine. What the thinking actually runs on.

Human: one biological brain, roughly 86 billion neurons, runs on sandwiches
AI: neural networks running on specialised chips (GPUs, TPUs). Bigger models are more capable, but also more expensive and slower.
Why it matters: not all AIs are the same class of mind. A small model running locally on your phone is not the same thing as GPT-4 on a cloud server. When answers feel flat, you might be on the wrong brain.

Drill-down

Brains come in tiers. You don't need to care about the technical details, but you should care that the differences are real.

Tier 1: consumer defaults. The free or cheapest paid tier of any major model. Fine for 80% of what most dads will ever ask. ChatGPT free, Claude free, Gemini free.

Tier 2: capable paid tiers. Where serious work happens. ChatGPT Plus, Claude Pro, Gemini Advanced. Around £15–25 a month each. The models are bigger, smarter, and more tolerant of long or messy inputs.

Tier 3: pro and team tiers. For heavy users. Larger context windows, priority access, usually some agent-building tools bundled in. £50–200 a month.

Open-weight models, local and hosted. A separate family worth knowing about. You can run free open-weight models (Llama, Kimi, Qwen, DeepSeek, Mistral) on your own hardware via Ollama, LM Studio, and similar, or rent the same models on managed cloud via Ollama Cloud, Together, Groq, Fireworks, and Replicate. Two years ago these were fine for dabbling, not serious work. Today they are genuinely competitive for most tasks: summarisation, extraction, routing, a lot of reasoning, substantial writing. The frontier proprietary models (Claude, GPT-4, Gemini) still lead for the most demanding reasoning, agentic tool use, and long-context alignment, but the gap is narrower than it was, and closing. What changed: open-weight model quality caught up, and managed hosting made running these models practical at a fraction of frontier prices.

One useful rule: if a task matters and a free-tier answer feels weak, try the same prompt on a paid tier before concluding AI can't do it. Most "AI is useless" stories are actually "I used a small brain for a big problem" stories.

Hybrid routing: pick the cheapest brain that does the job. The operating pattern for anyone running multiple AI tasks is to match brain to task. Bulk work (summarising, categorising, triage, routing) goes to a fast cheap model, often open-weight on managed cloud. Client-facing writing, tricky reasoning, or analysis goes to a capable expensive one, usually frontier proprietary. Specialised jobs (vision, code, legal) go to a specialist. Paying top-tier prices to sort an inbox is the same mistake as hiring a partner to do the filing.
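In code, hybrid routing is little more than a lookup table. The sketch below follows the rule described above. The tier names, task labels, and per-task prices are invented placeholders, not real model names or real pricing.

```python
# Hypothetical tier names and prices, for illustration only.
MODEL_TIERS = {
    "cheap_fast": {"cost_per_task": 0.001},
    "frontier": {"cost_per_task": 0.05},
    "specialist": {"cost_per_task": 0.03},
}

# Which kind of task goes to which tier, following the rule in the text.
ROUTES = {
    "summarise": "cheap_fast",
    "categorise": "cheap_fast",
    "triage": "cheap_fast",
    "client_writing": "frontier",
    "tricky_reasoning": "frontier",
    "vision": "specialist",
    "code": "specialist",
    "legal": "specialist",
}


def route(task_type: str) -> str:
    """Return the tier for a task; unknown tasks go to the capable tier."""
    return ROUTES.get(task_type, "frontier")


def estimate_cost(task_types: list[str]) -> float:
    """Sum the assumed per-task cost for a batch of tasks."""
    return sum(MODEL_TIERS[route(t)]["cost_per_task"] for t in task_types)


# A made-up week: 100 emails to triage, 5 client replies to write.
inbox = ["triage"] * 100 + ["client_writing"] * 5
print("routed:", round(estimate_cost(inbox), 3))
print("all frontier:", round(len(inbox) * MODEL_TIERS["frontier"]["cost_per_task"], 3))
```

Sending unknown task types to the capable tier is a deliberate choice in this sketch: when in doubt, overspend slightly rather than risk a weak answer. A cost-sensitive setup might default the other way.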

5. Memory

What the agent holds onto, and what it forgets.

Human: working memory (what's in your head right now), long-term memory (years of accumulated experience), external memory (your notebook, Obsidian vault, that shoebox of receipts)
AI: context window (the current chat, wiped when it ends), parametric weights (baked in from training, almost impossible to update), external stores (RAG, vector databases, auto-memory systems that remember you across sessions)
Why it matters: memory is where AI becomes personal. A fresh chat forgets you every time. A properly configured one remembers. That's the difference between a tool and an assistant.

Drill-down

Working memory matters more than people realise. Every AI chat has a context window: the amount of text the model can hold in mind at once. Today's high-end models can hold the equivalent of a long novel. Hit the limit and the earliest parts of the conversation start to drop off, silently. If the AI suddenly "forgets" something you told it an hour ago, you've probably hit the wall.
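The silent drop-off is easier to see in a toy example. The sketch below assumes the simplest possible policy: keep the newest messages that fit and discard the oldest. Real products handle overflow in different ways (some truncate, some summarise), and they count tokens rather than words, so both the policy and the word-counting shortcut here are assumptions.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one word counts as one token."""
    return len(text.split())


def fit_to_window(messages: list[str], window: int) -> list[str]:
    """Keep the most recent messages that fit; drop the oldest ones first."""
    kept: list[str] = []
    used = 0
    for message in reversed(messages):   # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > window:
            break                        # everything older is dropped too
        kept.append(message)
        used += cost
    return list(reversed(kept))          # restore chronological order


chat = [
    "My daughter is allergic to peanuts.",   # said first
    "We are planning a birthday party.",
    "Suggest a menu for twelve children.",   # said last
]

# Each message is 6 words, so a 12-word window only fits the last two.
print(fit_to_window(chat, window=12))
```

With the small window, the allergy note is the part that falls off, and nothing warns you. That is why important facts belong in long-term memory or get repeated late in a long chat.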

Long-term memory is new, and a game-changer. ChatGPT, Claude, and most serious agents now offer some version of "remember things about me across sessions": your preferences, your household context, your work, your name. Set this up once and every future conversation starts with context you'd otherwise have to re-explain. The thirty seconds you spend configuring memory pays back every time you open the app.

External memory is where serious setups live. A personal knowledge base (Obsidian, Notion, Tana) that the AI can read from is the dad equivalent of a second brain, and one that a well-built agent can query. Keith's business setup is this at scale: shared knowledge bases that every agent reads from, so nobody is ever the only one who knows something.
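Memory that survives between sessions can be as plain as a small file of facts that gets read back in at the start of each chat. This is a sketch of the idea only. The file layout, function names, and stored facts are made up, and it is not how any particular app implements its memory feature.

```python
import json
import tempfile
from pathlib import Path


def load_memory(path: Path) -> dict:
    """Return saved facts, or an empty dict the first time."""
    if path.is_file():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def remember(path: Path, key: str, value: str) -> None:
    """Add or update one fact and write the file back to disk."""
    memory = load_memory(path)
    memory[key] = value
    path.write_text(json.dumps(memory, indent=2), encoding="utf-8")


def start_session(path: Path, request: str) -> str:
    """Prepend remembered facts so a new chat starts with context."""
    facts = [f"- {k}: {v}" for k, v in load_memory(path).items()]
    header = "Known about the user:\n" + "\n".join(facts) if facts else ""
    return f"{header}\n\n{request}".strip()


store = Path(tempfile.mkdtemp()) / "memory.json"   # throwaway location
remember(store, "family", "two adults, two kids")
remember(store, "allergies", "peanuts")
print(start_session(store, "Plan a weekend away."))
```

The facts live outside the chat, so closing the window loses nothing. Every new session begins with the household context already on the table.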

6. Senses

How the agent perceives what you show it.

Human: sight, hearing, touch, smell, taste
AI: reads text, sees images and PDFs, hears audio, processes video. When a model can do several of these at once it's called multimodal.
Why it matters: you are not limited to typing. Take a photo of a receipt. Record a voice note. Drop in a PDF of a school letter. The modern AIs absorb all of it.

Drill-down

The unlock here is that modern chat apps (ChatGPT, Claude, Gemini) accept far more than text. A few things worth knowing:

Photos. Point your phone at something and ask about it. A label in Arabic, a broken fuse box, a receipt to expense, a maths question from your kid's homework. Often faster than typing a description.

Documents. Drop a PDF, a Word document, a scanned school letter. The AI reads it. Good for summaries, finding specific clauses, translating jargon, preparing questions before a meeting.

Voice. All major apps now accept voice input. For dads who are better talkers than typers (or driving, or doing the school run), this is the unlock. Voice in, structured text out.

Video and audio files. Less common in daily use but widely supported: meeting recordings, voice notes, dashcam clips. Upload, ask for a summary, get one in seconds.

A quiet revolution: if you're still only typing to AI, you're using a fraction of what it can do. The single most useful BDD-level habit to build this year is show it, don't describe it.

7. Interface

How you reach the agent, and how it reaches back into your world.

Human: phone, keyboard, microphone, voice, eye contact
AI: chat apps, voice mode, code editors, browser agents, desktop agents, robots, raw APIs
Why it matters: choosing the right interface is half the battle. Chat for conversations. Voice for hands-free. Browser for web tasks. IDE for code. Matching the interface to the task is an underrated skill.

Drill-down

The interface is where AI meets your actual life. A few that matter for BDDs:

Chat apps (web and mobile). The default. ChatGPT, Claude, Gemini. Great for anything that fits a conversation. Almost certainly where you're starting.

Voice mode. The mobile app of any major chat tool, with voice turned on. Genuinely useful for the school run, the commute, a walk. Ask questions, dictate emails, get summaries hands-free.

Browser agents. Claude in Chrome, ChatGPT Atlas, ChatGPT Operator. The AI takes control of a browser tab and uses the web for you: books flights, fills forms, finds things. Early, occasionally spectacular, occasionally confused. Worth trying.

Desktop agents. Cowork (the app you may be reading this in), Claude Code. The AI works with files on your computer. For anyone with folders of documents, photos, or spreadsheets to wrangle.

Code editors. Cursor, Claude Code, GitHub Copilot. The AI sits in the editor with you, reading and writing code. If you code, or want to start.

Agent-building platforms. If you want to build your own agent rather than rent a pre-made one, platforms like Hermes, Gumloop, Make, n8n, and Zapier stitch together models, tools, and triggers without coding. Make and Zapier are workflow-automation veterans that recently grew AI teeth. n8n is open-source and self-hostable. Gumloop and Hermes are AI-native. The learning curve is steeper than a chat app, lower than writing code.

APIs. The plumbing underneath all of the above. Relevant if you're building something, irrelevant otherwise.

The single mistake most dads make is staying in the chat-app interface for things better done in a voice or browser interface. If you find yourself typing out what you could say, switch to voice. If you find yourself copy-pasting between the AI and a website, try a browser agent.

8. Tools

What the agent can reach for to actually do the work, beyond talking.

Human: hammers, spreadsheets, search engines, email, your team
AI: web search, code execution, file handling, Gmail, Calendar, Slack, Linear, your Drive, image generators, video generators, and increasingly, other agents
Why it matters: a smart agent without tools can only talk. An agent with tools can actually do things: send the email, book the flight, draft the spreadsheet, file the receipt.

Drill-down

Tools are the difference between advice and action. A chat model with no tools can tell you what to say in an email. A chat model with tool access to your Gmail can draft it in your drafts folder for you to review and send. Same brain, different reach.

A useful rough taxonomy:

Information tools. Web search, database queries, knowledge-base retrieval. How the AI finds things.

Action tools. Send email, book calendar, write file, post to Slack, create a task in Linear. How the AI does things.

Creation tools. Generate image, generate video, generate code, generate audio. How the AI makes things.

Meta tools. Call another agent. The thing that turns one helpful AI into Keith's multi-agent workspace.

For most dads, the highest-value tools to wire up first are these: web search (real-time answers), Calendar (scheduling help), Gmail or equivalent (inbox triage), and a file-handling interface (let it work with your documents). Each of those, once connected, compounds what the agent can do.
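Behind the scenes, "giving an agent tools" usually means handing it a list of named functions it is allowed to call. The sketch below shows that shape with two pretend tools. The function names, the request format, and the canned replies are assumptions for illustration. Real tools would call a search service or your email provider.

```python
from typing import Callable


# Hypothetical stand-ins: real tools would call a search or email service.
def web_search(query: str) -> str:
    return f"(pretend search results for '{query}')"


def draft_email(to: str, body: str) -> str:
    return f"Draft saved for {to}: {body}"


# The registry lists what this agent is allowed to reach for.
TOOLS: dict[str, Callable[..., str]] = {
    "web_search": web_search,    # information tool
    "draft_email": draft_email,  # action tool
}


def run_tool(request: dict) -> str:
    """Carry out one tool request of the form {'tool': name, 'args': {...}}."""
    name = request.get("tool")
    if name not in TOOLS:
        return f"No such tool: {name}"
    return TOOLS[name](**request.get("args", {}))


print(run_tool({"tool": "web_search", "args": {"query": "RAK hotels"}}))
print(run_tool({"tool": "draft_email",
                "args": {"to": "school", "body": "We can attend on Friday."}}))
print(run_tool({"tool": "book_flight", "args": {}}))
```

The last call fails politely because no flight-booking tool was registered. An agent can only do what its tool list permits, which is also how you keep it on a short lead.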

Troy Hodgson's BDD AI tools list is maintained by the group and kept current. Worth bookmarking.

Specific products named in the drill-downs are current as of April 2026. A reference point, not a forever list.

How the blocks compose

Once you see the eight blocks, the next thing to notice is that they compose.

A single agent with its own knowledge, memory, senses, interface, and tools is already useful. That's a chat window with ChatGPT or Claude. Powerful on its own.

But nothing stops you from running several agents in parallel. One handles email. One handles calendar. One handles research. Each has its own role (intent delegated from you), its own knowledge, its own tools. They share a memory layer, a knowledge base everyone reads from. They can call each other when they need to. The whole network runs toward a larger intent: "keep my week on track," or "find, qualify, and follow up on new business leads."

That's what agentic workflows are. Not a different thing: the same eight blocks, plural and connected. The industry is moving from prompts (one ask, one answer) to workflows (ongoing, self-directed, multi-step). That's what people mean by "AI agents": the blocks composed and pointed at a standing goal.

This matters for the reader in two ways:

Wherever you are today is fine. Using one chat window is a legitimate use of the framework. The blocks don't demand composition; they permit it.
Scaling up adds agents, not new concepts. If you ever want what Keith described, a team of agents running parts of your business, you don't need a different mental model. You use more of the same one.
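For readers who like to see the shape in code, composition can be sketched as several small agents writing to one shared memory, run in a set order. Everything here is hypothetical: the class names, the two stand-in agents, and the fixed running order. A real setup would have a model behind each agent and an orchestrator deciding the order.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Agent:
    """One agent: a role plus a function that does that role's work."""
    role: str
    work: Callable[[str, dict], str]


@dataclass
class Workspace:
    """Several agents that read and write one shared memory."""
    agents: dict[str, Agent] = field(default_factory=dict)
    shared_memory: dict[str, str] = field(default_factory=dict)

    def add(self, agent: Agent) -> None:
        self.agents[agent.role] = agent

    def run(self, goal: str, order: list[str]) -> dict[str, str]:
        """Run agents in the given order; each sees what earlier ones wrote."""
        for role in order:
            result = self.agents[role].work(goal, self.shared_memory)
            self.shared_memory[role] = result
        return self.shared_memory


# Hypothetical stand-ins for real model calls.
def research(goal: str, memory: dict) -> str:
    return f"notes on '{goal}'"


def write_up(goal: str, memory: dict) -> str:
    return f"summary based on {memory['research']}"


team = Workspace()
team.add(Agent("research", research))
team.add(Agent("writer", write_up))
print(team.run("keep my week on track", order=["research", "writer"]))
```

The writer never talks to the researcher directly. It reads what the researcher left in shared memory, which is the "knowledge base everyone reads from" in miniature.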

Worked examples: three altitudes

Instead of three parallel tasks, three altitudes of the same framework. Same blocks throughout. What changes is how many instances of each, and how deeply they're used.

Where are you flying today?

Altitude 1: One agent, one chat

"I want to understand this article someone sent me."

Someone in the group chat drops a link. It's a 3,000-word piece from a newspaper you don't usually read, about a subject you don't usually think about, and you don't have the time or energy to read it properly. You want the shape of it in two minutes so you can respond without pretending.

Block by block

Intent. You, wanting: "I want this article in plain English, shorter, and I want to know if it's worth reading properly."
Agent. One general language model. ChatGPT, Claude, or Gemini: it genuinely doesn't matter which one. The default app on your phone is fine.
Knowledge. Nothing extra. The model's general training is enough to understand the article.
Brain. Whichever tier you're on. The free tier handles this without breaking a sweat.
Memory. Just the current chat.
Senses. You paste the article text in. Or, if it's a screenshot or a PDF, you drop that in.
Interface. The phone app. On the sofa, one-handed.
Tools. None. Pure conversation.

Result: a two-paragraph summary with the key claims, a note on what's well-argued and what's thin, and a recommendation on whether the full piece is worth your time. Thirty seconds from paste to answer.

The point: this is AI at its most useful and least impressive. One chat window, one conversation, one article. No setup, no configuration, no agents. This is enough. The majority of us will get enormous value out of nothing more than this, forever.

Altitude 2: One agent, real work

"Plan a family weekend in Ras Al Khaimah."

It's April. The kids are climbing the walls. You've half-decided to take them somewhere within a three-hour drive for a long weekend, but the three-browser-tab research phase you'd normally go through has not happened. You want something sketched out before dinner.

Block by block

Intent. You, with a real goal: "Two adults, two kids, three nights, leaving Friday morning. Budget around AED 3,000 for the hotel portion. Kids need a pool they'd actually swim in. One adult-only dinner."
Agent. Still one general language model, but now a capable tier.
Knowledge. The model knows RAK in general: hotels, beaches, the main resorts, how long the drive is. It does not know your budget, your kids' ages, your dates, or that you promised someone a decent spa. You'll add that context.
Brain. Cloud model, standard paid tier.
Memory. Set up household context once in the app's memory (family size, ages, allergies, preferences). Every future trip-planning chat starts knowing this.
Senses. Upload photos from last year's trip to somewhere you loved. Drop in a PDF of a brochure that caught your eye.
Interface. Start on the laptop. Switch to voice on the drive to pick up the kids, to refine the shortlist out loud.
Tools. Web search for current rates and availability. Calendar integration to block the dates. Booking handoff at the end if you want to go straight to a reservation.

Result: a shortlist of three resorts with quick pros and cons, a suggested itinerary for the weekend, and a booking link for your top pick. Twenty minutes of back-and-forth where two hours of tabs used to live.

The point: this is where most of us should be within a month of reading this manual. One capable agent, properly configured, with the right tools available. No multi-agent systems, no code, no CRM. AI as a competent personal fixer.

Altitude 3: Many agents, composed

"Run the top of my sales funnel."

This example is described with permission, drawing on Keith Egan's multi-agent setup shared in the BDD group chat.

You've got a business. Leads come in from a website form, LinkedIn, referrals, and the occasional speaking event. Someone (you, or an assistant, or nobody at all) has to qualify them, research the company, send a first response, schedule a call, take notes on the call, send a follow-up with a proposal, and remind you about it all when the time comes. This is a meaningful chunk of someone's week.

Block by block

Intent. Yours, at the business level: "New leads get qualified and routed within an hour. Research gets done before my first call. Follow-ups don't drop. I see a weekly summary on Friday."
Agents. Several, each with a specific role:
- A lead-qualifier reads each inbound form submission, scores it against your ideal-customer profile, and routes high-value leads to you immediately.
- A research agent pulls public information on the prospect's company: size, sector, recent news, who else you know there.
- A content drafter writes a first response and a tailored follow-up, matching your tone.
- A call analyst transcribes and summarises the conversation, flagging next actions.
- A reporting agent pulls it all into a Friday digest.
- An orchestrator decides which agent runs when, and passes information between them.
Knowledge. Shared knowledge base with your CRM data, past deals, product docs, and tone-of-voice guidelines. Each agent reads what's relevant to its role.
Brain. A mix. A cheap fast model for triage and routing. A capable model for writing. A specialised model for call analysis.
Memory. Shared long-term store so agents don't re-learn things. Per-agent working memory for each task.
Senses. Inbound emails, call transcripts, uploaded documents, signals from your CRM, LinkedIn activity if you wire it in.
Interface. Mostly invisible. The agents work in the background. You see summaries in Slack, a dashboard on your laptop, the Friday report in your inbox.
Tools. CRM write access, email send, calendar booking, web search, document generation, and connections to each other (agents calling agents).

Result: Keith's setup, roughly. Qualification goes from "hours" to "instant." Research is done before every first call. Follow-ups stop dropping. The Friday report arrives without you doing anything. The work didn't vanish; it just isn't all yours anymore.
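To make the lead-qualifier's job less abstract, here is a rule-based stand-in for it: score each lead against an ideal-customer profile, then route it. This is not Keith's implementation. The profile, the scoring rules, the threshold, and the sample leads are all invented, and in a real setup a model would likely do the scoring with far more nuance.

```python
# Hypothetical ideal-customer profile and threshold, for illustration only.
IDEAL_CUSTOMER = {
    "sectors": {"logistics", "retail"},
    "min_employees": 50,
}
HIGH_VALUE_SCORE = 2


def score_lead(lead: dict) -> int:
    """Give one point for each way the lead matches the profile."""
    score = 0
    if lead.get("sector") in IDEAL_CUSTOMER["sectors"]:
        score += 1
    if lead.get("employees", 0) >= IDEAL_CUSTOMER["min_employees"]:
        score += 1
    if lead.get("source") == "referral":
        score += 1
    return score


def route_lead(lead: dict) -> str:
    """High scorers go to the owner now; the rest go to the research agent."""
    return "owner_now" if score_lead(lead) >= HIGH_VALUE_SCORE else "research_queue"


leads = [
    {"name": "Acme Freight", "sector": "logistics", "employees": 200, "source": "website"},
    {"name": "Tiny Studio", "sector": "design", "employees": 4, "source": "linkedin"},
]
for lead in leads:
    print(lead["name"], "->", route_lead(lead))
```

Rules this simple would sit happily in the rule-based family from the Agent block. The multi-agent version earns its keep when the judgement call needs reading between the lines of a form submission.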

The point: this is the shape of where the industry is going, and it's less exotic than it looks. The framework hasn't changed. There are just more instances of each block, networked. If you understand the eight blocks, you understand what's happening here, even if you never build one yourself.

A note on altitudes. None of these three is "better" than the others. Altitude 1 is the right altitude for "I want to understand this article." Altitude 3 would be absurd overkill for the same task. The skill isn't always flying higher; it's flying at the right altitude for the task in front of you.

If you got from a Commodore 64 to here, this won't break you. The parts are fewer than you think.