Video
Demystifying AI, LLMs, and RAG
Why demystify?
AI buzz is everywhere, but when you actually dive into the code, terms like “vectors,” “embeddings,” and “RAG” can feel a bit like alphabet soup. In this video, Kevin starts by reminding us that these aren’t just marketing slogans: they’re the very primitives your next AI-powered service will rely on. Getting them straight up front means smoother development, fewer surprises in production, and easier conversations when spreading the AI joy to your colleagues.
Vectors and embeddings: the DNA of AI
We can think of vectors as being points along a journey to some spot. The more points there are in the journey, the more detail we have. The more detail we have, the more easily we can compare one journey with another. This is where embeddings and semantic search come in.
- Embeddings are just vectors produced by a specific algorithm or model. They are model-specific: vectors produced by one embedding model can’t be meaningfully compared with vectors from another, so they differ for each LLM you interact with.
- They’re the foundation of similarity searches and clustering. These are key when you start asking, “Which documents best answer this query?” or “What conversations are similar to this user’s message?”
By the end of this section, you’ll have a mental picture of your data living in a giant multi-dimensional galaxy—and embeddings as the star charts guiding your quest.
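To make the star-chart idea concrete, here’s a minimal sketch of comparing “journeys” with cosine similarity. The toy four-dimensional vectors and document names are invented for illustration; a real embedding model emits hundreds or thousands of dimensions, but the comparison works the same way:

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings"; a real model would produce these.
doc_cat = [0.9, 0.1, 0.0, 0.2]
doc_dog = [0.7, 0.3, 0.2, 0.4]
doc_tax = [0.0, 0.9, 0.8, 0.1]

query = [0.85, 0.15, 0.05, 0.25]  # the embedded user query

scores = {name: cosine_similarity(query, vec)
          for name, vec in [("cat", doc_cat), ("dog", doc_dog), ("tax", doc_tax)]}
best = max(scores, key=scores.get)
print(best)  # the document whose vector points in the most similar direction
```

A vector database does essentially this at scale, with indexes that avoid comparing the query against every stored vector.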
Prompts and prompt engineering: your conversation starter
If embeddings are the map, prompts are the directions you give to your AI agent. Kevin emphasizes that writing great prompts is part art, part science:
- Context: What background information does the model need?
- Instruction: What exactly do you want it to do?
- Format: How should it present its answer?
Prompt engineering is about iterating these three elements until your model reliably does what you expect. Kevin walks through a simple “summarize this paragraph” example, showing how small tweaks—like adding “in two sentences” or “as a bullet list”—dramatically change the output. It’s a bit like calibrating a lens: adjust the focus until the picture is sharp.
Except with LLMs, the thing you’re trying to focus your lens on can be unpredictable, even random.
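The three elements above can be sketched as a simple prompt template. The helper name and the example field contents are invented for illustration, not part of any particular library:

```python
def build_prompt(context, instruction, fmt):
    # Assemble the three elements: context, instruction, and output format.
    return (
        f"Context:\n{context}\n\n"
        f"Instruction:\n{instruction}\n\n"
        f"Format:\n{fmt}\n"
    )

prompt = build_prompt(
    context="Paragraph: Our service deploys nightly and rolls back on failed health checks.",
    instruction="Summarize this paragraph.",
    fmt="In two sentences, as a bullet list.",
)
print(prompt)
```

Changing only the `fmt` argument (say, from “as a bullet list” to “as a single sentence”) is exactly the kind of small tweak Kevin demonstrates.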
RAG (Retrieval-Augmented Generation): when you need real data
Large language models are amazing, but they only know what they were trained on. Most model providers publish a training “cutoff” date, after which the model has no knowledge. Possibly even more important than keeping a model up to date is giving it the ability to use private, application- or enterprise-specific information in its inference.
Retrieval-Augmented Generation fixes that by:
- Retrieving the most relevant documents from your internal data store.
- Augmenting the prompt with those documents as context.
- Generating an answer that’s grounded in up-to-date, accurate facts.
Kevin shows a flowchart: user question → vector search against your knowledge base → pass top-N docs + question to the LLM → return an answer that hopefully doesn’t hallucinate.
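That flowchart can be sketched end to end in a few lines. Here a toy bag-of-words “embedding” stands in for a real embedding model, and the final LLM call is left as a stub; the vocabulary, documents, and function names are all invented for illustration:

```python
import re

def tokenize(text):
    # Crude tokenizer: lowercase words only.
    return re.findall(r"[a-z]+", text.lower())

def embed(text):
    # Toy "embedding": word counts over a tiny fixed vocabulary.
    # A real system would call an embedding model here instead.
    vocab = ["reset", "password", "invoice", "billing", "login"]
    words = tokenize(text)
    return [words.count(w) for w in vocab]

def similarity(a, b):
    # Dot product as a stand-in for cosine similarity.
    return sum(x * y for x, y in zip(a, b))

knowledge_base = [
    "To reset your password, open Settings and choose Login > Reset password.",
    "Invoices are emailed monthly; see the Billing tab for past invoices.",
]

def rag_answer(question, top_n=1):
    # Retrieve: vector search against the knowledge base.
    q_vec = embed(question)
    ranked = sorted(knowledge_base,
                    key=lambda doc: similarity(q_vec, embed(doc)),
                    reverse=True)
    # Augment: put the top-N docs into the prompt as context.
    prompt = "Context:\n" + "\n".join(ranked[:top_n]) + f"\n\nQuestion: {question}"
    # Generate: a real pipeline would send `prompt` to an LLM here;
    # this sketch just returns the augmented prompt.
    return prompt

result = rag_answer("How do I reset my password?")
```

Because the answer is grounded in the retrieved documents rather than the model’s training data, the pipeline stays accurate as your knowledge base changes.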
Agentic AI: from “ask” to “act”
Up to now, we’ve talked about reactive AI: you ask, it answers. Agentic AI is the next step—AI that can make decisions, take actions, and iterate on its own:
- Autonomy: Your agent can plan multi-step workflows (e.g., “research product specs, schedule a meeting, and draft a summary”).
- Adaptation: It learns from outcomes. If a meeting gets rescheduled, it automatically updates the plan.
- Integration: Your agent calls APIs, triggers pipelines, sends notifications—whatever it takes to hit its goal.
Kevin breaks down several simple examples of how orchestrating one or more agents can add superpowers to any application.
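A bare-bones version of such an orchestration loop might look like the sketch below. The tools, the plan, and the shared state are invented placeholders, not a real agent framework; the point is the plan → act → observe shape:

```python
def research_specs(state):
    # Pretend tool: fetch product specs and stash them in shared state.
    state["specs"] = "spec sheet v2"
    return "found specs"

def draft_summary(state):
    # Pretend tool: draft a summary from whatever earlier steps gathered.
    return f"summary based on {state.get('specs', 'nothing')}"

TOOLS = {"research": research_specs, "summarize": draft_summary}

def run_agent(plan):
    state, log = {}, []
    for step in plan:
        outcome = TOOLS[step](state)  # act: invoke the tool for this step
        log.append((step, outcome))   # observe: record the outcome
    return log

log = run_agent(["research", "summarize"])
```

A production agent would add the adaptation piece: inspecting each outcome and revising the remaining plan, rather than marching through a fixed list.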
Putting it all together
In Kevin’s demo, he:
- Builds embeddings from a set of documentation.
- Writes a prompt template that asks for step-by-step troubleshooting.
- Wires up a RAG pipeline so the model’s answers always reference the latest docs.
- Wraps it in an agent.
Takeaways for your next project
- Master the foundations: vectors and embeddings aren’t optional—they’re your data’s identity in AI space.
- Iterate your prompts: treat prompt engineering as an iterative craft—tune context, instruction, and format for clarity and precision.
- RAG is non-negotiable: if you need accurate, up-to-date answers, retrieval-augmented generation is your go-to.
- Think agentically: once you’ve nailed reactive workflows, push into autonomous agents that can plan, act, and learn.
Kevin closes with a reminder that building agentic AI isn’t about replacing humans; it’s about amplifying their abilities. By automating routine decisions and integrating seamlessly with your existing systems, you free your team to focus on the parts of the job that truly need creativity and judgment.