Lab

Building is easy. Knowing what to build is the work. | Strategic clarity from hands-on work.

I Built an AI Product on a 1,500-Year-Old Tamil Corpus

Related Article

  • The Spark: 3,864 Tamil devotional verses. 50,724 word-by-word meanings. Centuries of commentary. All of it locked behind static HTML on a WordPress site. I wanted to ask questions of the corpus — not just read it. "Which Alwars sang about Srirangam?" shouldn't require manual cross-referencing across 25 prabandhams.

  • The Build: Full-stack AI application. FastAPI backend with a 22-tool Claude agent, PostgreSQL + pgvector for relational + vector queries in one database, Vyakyarth (Indic embedding model) running locally for zero-cost semantic search, and a Next.js chat UI with SSE streaming. Scraped the entire corpus, extracted 42,731 named entities via Haiku, embedded all 3,864 verses, and shipped a conversational interface that answers natural language questions grounded in the actual text.

  • Architecture Highlights:

    • Single PostgreSQL instance handles relational data, full-text search (Tamil uses the "simple" text-search config, since no Tamil stemmer exists), and vector similarity via pgvector. No separate vector DB needed at this scale.
    • Hybrid search: FTS for exact Tamil word matches + Vyakyarth embeddings for semantic/cross-lingual queries, merged via Reciprocal Rank Fusion (rank-based, avoids the score normalization trap).
    • 22-tool agent with a 5-call safety limit. 11 corpus analytics tools run pure SQL — instant results, zero AI cost. Name resolution baked into tools so the agent skips ID lookup calls, cutting per-session cost roughly in half.
    • Four-layer anti-hallucination: system prompt hardening, citation enforcement via {{verse:ID}} markers, post-response verse ID validation against the DB, and negative evidence in tool results ("these 12 Alwars did NOT mention this place").
    • Three caching layers: Anthropic prompt caching (54% input token reduction), query response cache (repeat queries cost $0), corpus tool memoization (deterministic SQL on static data).
    • NER extraction: 42,731 entities across places, deity epithets, and theological concepts. 642 canonical mappings for deduplication. Divya Desam validation matches NER places against the canonical 108 sacred sites using prefix-stripped normalization + bidirectional substring matching.
  • AI Stack: Claude Sonnet 4.5 for agent and explanation endpoints. Claude Haiku 4.5 for batch NER extraction ($20 for the full corpus). Vyakyarth (krutrim-ai-labs) for embeddings — 768-dim, runs locally, zero API cost. SSE streaming with a split strategy: non-streaming tool loops, streaming final synthesis.

  • The Takeaway: At $0.028 per user query and ~1,200 sessions per $30 budget, the economics work because the architecture treats the LLM as the last resort — SQL answers what SQL can, embeddings handle similarity, and Claude only fires when you actually need reasoning.
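
The rank-fusion step above is small enough to show in full. A minimal sketch, assuming plain ranked lists of verse IDs as input; the function name, the conventional k=60 constant, and the sample IDs are illustrative, not the production code:

```python
def rrf_merge(ranked_lists, k=60):
    """Merge ranked result lists via Reciprocal Rank Fusion.

    Scores depend only on rank position, never on raw FTS or cosine
    scores, which sidesteps cross-system score normalization entirely.
    """
    scores = {}
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# FTS hits (exact Tamil word matches) and vector hits (semantic)
fts = ["v205", "v101", "v033"]
vec = ["v205", "v412", "v101"]
merged = rrf_merge([fts, vec])  # ["v205", "v101", "v412", "v033"]
```

A document ranked well by both systems ("v205") beats one ranked well by only one, without ever comparing a ts_rank score to a cosine distance.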


Applying the MCP Filter: Two Products, Different Verdicts

Original Post

  • The Spark: I needed a diagnostic to answer one question: should this product be agentic? Whiteboard designs can look identical. The build exposes the truth.

  • The Build: Two MCP servers—a Tamil translation workflow and a movie recommendation system. Same patterns applied to both. One confirmed the agentic hypothesis. The other falsified it.

  • Architecture Highlights:

    • Context is a budget. 78,000 tokens before any work starts on the translation workflow—the naive version simply fails.
    • 9 architectural patterns applied: combined operations, batch operations, lean responses, tiered retrieval, state persistence, smart defaults, workflow guidance, external knowledge injection, incremental learning loops.
    • 90% context reduction. But more importantly: translation stayed complex. Movies collapsed into search + filtering.
  • The Takeaway: The question to ask anyone pitching an "AI agent": what happens when you optimize for context? Does the orchestration stay complex, or does it collapse into a query layer?
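
Two of the nine patterns, lean responses and tiered retrieval, can be sketched with a toy movie catalog. Everything here (data, field names, function names) is hypothetical, not the actual MCP server:

```python
# Tier 1 returns only what the agent needs to decide; tier 2 is on demand.
MOVIES = {
    "m1": {"title": "Nayakan", "year": 1987,
           "synopsis": "A long plot synopsis that would bloat context..."},
    "m2": {"title": "Anbe Sivam", "year": 2003,
           "synopsis": "Another long synopsis..."},
}

def search_movies(query):
    """Lean response: id + title only, never full records."""
    q = query.lower()
    return [{"id": mid, "title": m["title"]}
            for mid, m in MOVIES.items() if q in m["title"].lower()]

def get_movie(movie_id):
    """Tiered retrieval: the full record, fetched only when asked for."""
    return MOVIES[movie_id]
```

The point of the exercise: if this two-tier shape is all the "agent" ever needs, the product is a query layer, not an agentic system.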


MCP Apps: UI Capabilities for MCP Clients

Original Post

  • The Spark: MCP tools return data. But what if they could return interactive UI?

  • The Build: MCP Apps are now live as an official MCP extension. Tools can return dashboards, forms, visualizations, and multi-step workflows that render directly in the conversation.

  • The Takeaway: The boundary between tool output and user interface just disappeared.


A Methodology for Evaluating LLMs on Any Task

Original Post

  • The Spark: "Which LLM is best?" is the wrong question. "Best for what?" is the right one. Traditional evals tell you which model is smartest. I needed to know which model fits my specific task.

  • The Build: A cross-evaluation methodology. Same prompt → multiple models → multiple outputs. Each model evaluates all outputs against dimensions I define. Then meta-compare the comparison matrices.

  • Architecture Highlights:

    • Consensus: where all evaluators agree—likely objective truths
    • Divergence: where evaluators disagree—reveals what each model values
    • Case study: transcript formatting. ChatGPT 5.2, Claude Opus 4.5, Gemini 3 Pro. ChatGPT won 4/8 consensus dimensions. But the divergence told me more.
  • AI Stack: ChatGPT → hybrid reference doc. Claude → tight executive brief. Gemini → magazine-style Q&A. These aren't bugs. They're design philosophies.

  • The Takeaway: Models have personalities. Match the model's philosophy to your task, not its benchmark score.
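
The consensus/divergence split reduces to a small computation over the evaluation matrices. A sketch with invented scores (the evaluator names match the case study; the numbers are illustrative):

```python
# scores[evaluator][candidate] = that evaluator's score on one dimension
def winners(scores):
    """Which candidate each evaluator ranks first on this dimension."""
    return {ev: max(cand, key=cand.get) for ev, cand in scores.items()}

def classify(scores):
    """'consensus' if every evaluator picks the same winner, else 'divergence'."""
    picks = set(winners(scores).values())
    return "consensus" if len(picks) == 1 else "divergence"

formatting = {
    "chatgpt": {"chatgpt": 9, "claude": 7, "gemini": 6},
    "claude":  {"chatgpt": 8, "claude": 7, "gemini": 6},
    "gemini":  {"chatgpt": 9, "claude": 8, "gemini": 7},
}
verdict = classify(formatting)  # "consensus": all three pick chatgpt
```

Consensus dimensions are the likely objective truths; divergence dimensions are where each model's values show through.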


Can AI Review Your UX? Comparing Comet vs. Atlas on SPAs

Original Post

  • The Spark: Asking an AI browser to review the UX of a single-page application seemed like a natural fit. Could it actually do the job?

  • The Build: Compared Comet vs. Atlas on SPA UX review tasks.

  • The Takeaway: AI browsers are becoming viable tools for product feedback loops.


The Information Asymmetry System: Why AI Will Transform Markets

Original Post

  • The Spark: The Economist declared AI's war on the "rip-off economy." But the real story isn't just technology—it's feedback loops that kept information asymmetries stable for 50 years.

  • The Build: MIT Systems Dynamics analysis mapping three interconnected feedback loops explaining why the problem persisted and why AI might actually solve it.

  • Architecture Highlights:

    • R1 (Vicious Cycle): Seller knowledge → information asymmetry → consumer vulnerability → market opacity. Zero negative links. Self-amplifying.
    • B1 (AI Solution): Information gap → consumer AI adoption → transparency tools → gap DECREASES. One negative link. Self-correcting.
    • R2 (Arms Race): Transparency tools → seller AI counter-response → gap recreates at higher sophistication. The future equilibrium.
    • Critical insight: Consumer AI adoption (weeks-months) is 10-20x faster than enterprise AI deployment (12-24+ months). This asymmetry creates an extended B1 advantage window.
  • The Takeaway: 2024-2028 represents a multi-year consumer AI advantage window before R2 equilibrium emerges.
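
The loop structure, though not the rates, can be made concrete with a toy difference-equation model. Every number here is invented; only the R1/B1 wiring comes from the analysis above:

```python
def simulate(steps, r=0.05, b=0.0, adoption_rate=0.0):
    """Toy stock model: the information gap compounds via R1 (rate r)
    and is eroded via B1 in proportion to consumer AI adoption (rate b)."""
    gap, adoption = 1.0, 0.0
    for _ in range(steps):
        adoption = min(1.0, adoption + adoption_rate * gap)
        gap = max(0.0, gap + r * gap - b * adoption * gap)
    return gap

r1_only = simulate(40)                              # vicious cycle alone
with_b1 = simulate(40, b=0.2, adoption_rate=0.05)   # balancing loop engaged
```

With no balancing link the gap compounds without bound; once adoption passes the break-even point (b * adoption > r), B1 dominates and the gap collapses. R2 would add a third equation raising r as seller AI responds.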


Don't Forget Your Design (Patterns)

Original Post

  • The Spark: Building an Agentic AI system without design patterns is like building a distributed system in 2010 without understanding REST or microservices. You'll ship something. It will work. Then it will become unmaintainable.

  • The Build: GyanAgent—a personal knowledge management system. Ingests content from YouTube, podcasts, newsletters, RSS, web articles. Uses AI to cluster insights and surface what matters.

  • Architecture Highlights:

    • 11/23 Gang of Four patterns emerged organically: Factory Method, Prototype, Adapter, Facade, Proxy, Chain of Responsibility, Command, Mediator, Observer, Strategy, Template Method.
    • Factory Method: swapped LLM providers 12+ times during development. Each swap: one environment variable. Zero code changes.
    • Strategy: tuned clustering parameters across 47 runs. Each experiment: change one YAML value. No code deployments.
    • Template Method: Network Watcher agent took 120 lines of domain logic. Lifecycle orchestration already done.
  • The Takeaway: Frameworks are tools. Patterns are blueprints. You can build a house with just tools, but without blueprints you'll end up with rooms that don't connect.
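
The Factory Method swap described above fits in a few lines. The provider classes are hypothetical stubs standing in for real SDK wrappers; only the env-var dispatch mechanism is the pattern itself:

```python
import os

# Hypothetical provider clients; real ones would wrap vendor SDKs.
class OllamaClient:
    def complete(self, prompt): return f"[ollama] {prompt}"

class OpenAIClient:
    def complete(self, prompt): return f"[openai] {prompt}"

PROVIDERS = {"ollama": OllamaClient, "openai": OpenAIClient}

def make_llm():
    """Factory Method: the provider is chosen by one env var.
    Call sites never name a concrete class, so a swap is zero code changes."""
    name = os.environ.get("LLM_PROVIDER", "ollama")
    return PROVIDERS[name]()
```

Swapping providers for the 13th time is then `export LLM_PROVIDER=openai` and nothing else.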


How I Run My Second Brain for $8/Month

Original Post

  • The Spark: I was drowning. Hundreds of newsletters unread. RSS feeds piling up. Podcasts queued. YouTube "Watch Later" mocking me. The worst part? Buried in that pile were insights that could've changed my decisions.

  • The Build: GyanAgent (Sanskrit for knowledge/wisdom)—a personal second brain that reads my newsletters, traverses RSS feeds, listens to podcasts, and "watches" my YouTube videos. It transcribes, summarizes, and organizes everything into topics. Runs 8 autonomous agents on a schedule.

  • Architecture Highlights:

    • Multi-source ingestion: newsletters, RSS, podcasts, YouTube
    • Transcription + summarization pipeline
    • Topic-based organization with a clean UI
    • Feedback loop: my reactions train the agent over time
  • Tools: Local AI stack. $8/month infrastructure cost. Demo: demo.gyanagent.ai

  • The Takeaway: The best AI tools are the ones that compound knowledge while you're not looking.


The Two Capital Loops Powering the AI Boom

Original Post

  • The Spark: OpenAI orchestrated deals up to $200 billion with Nvidia and AMD in two weeks. Everyone sees procurement contracts. I see two fundamentally different feedback loops.

  • The Build: Systems dynamics analysis of OpenAI's capital architecture.

  • Architecture Highlights:

    • Nvidia Loop (Control): Nvidia invests $100B in OpenAI → OpenAI spends it back on Nvidia chips. Closed financial circuit. Each cycle increases dependency, increases Nvidia's leverage, enables more aggressive terms.
    • AMD Loop (Alignment): OpenAI buys 6 GW of AMD GPUs → AMD rewards with penny-priced warrants worth up to 10% of the company. Symbiotic rather than parasitic. OpenAI's success drives AMD value, which benefits OpenAI's warrant position.
    • OpenAI became the meta-loop operator—simultaneously customer, partner, and kingmaker.
  • The Takeaway: In the AI era, how you architect capital flows is as important as how you architect neural networks.


Ramanujan's Magic Squares

Original Post

  • The Spark: Can modern AI decipher the handwritten notes of a mathematical genius?

  • The Build: Tested GPT-5, Claude Sonnet 4.5, and Gemini 2.5 Pro on Ramanujan's magic square manuscripts.

  • The Takeaway: The frontier models can read mathematical genius. The question is whether they can extend it.
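
For context, Ramanujan's most widely reproduced square encodes his birth date (22 December 1887) in its top row, with magic constant 139. Whether this exact square was in the tested manuscripts isn't stated above, but the property the models had to read is easy to verify mechanically:

```python
# Ramanujan's birthday magic square: the top row encodes 22.12.18.87.
SQUARE = [
    [22, 12, 18, 87],
    [88, 17,  9, 25],
    [10, 24, 89, 16],
    [19, 86, 23, 11],
]

def is_magic(sq):
    """Check that every row, column, and both diagonals share one constant."""
    n = len(sq)
    target = sum(sq[0])
    lines = ([sum(row) for row in sq] +
             [sum(col) for col in zip(*sq)] +
             [sum(sq[i][i] for i in range(n)),
              sum(sq[i][n - 1 - i] for i in range(n))])
    return all(s == target for s in lines), target

ok, constant = is_magic(SQUARE)  # (True, 139)
```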


When Art Meets AI: Animating 1858 Photography

Original Post

  • The Spark: In 1858, Linnaeus Tripe became the first person to photograph Thanjavur's Brihadīśvara Temple. What if that static moment could move?

  • The Build: Used Google's Veo2/Veo3 to animate Tripe's historical photographs into video.

  • Tools: Veo2, Veo3, source images from Wikimedia Commons.

  • The Takeaway: AI video generation isn't just about creating new content—it's about breathing life into history.


A/B Summarization Bakeoff: gemma3:12b vs qwen3:14b

Original Post

  • The Spark: My summarization results were good, not great. I suspected the model was the bottleneck.

  • The Build: Swapped gemma3:12b for qwen3:14b in my summarization pipeline. Ran A/B comparison.

  • The Takeaway: Model selection is empirical, not theoretical. Swap and measure.
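
A bakeoff harness stays model-agnostic if the generate call is injected. A sketch under that assumption; the stub generator stands in for a real local Ollama call, and all names are illustrative:

```python
def ab_summarize(generate, models, texts):
    """Run each model over the same inputs, collecting outputs side by side
    so the A/B comparison reads off one row per input."""
    return [{"text": t, **{m: generate(m, t) for m in models}}
            for t in texts]

# Stub standing in for a real model call (e.g. an Ollama HTTP request).
def stub_generate(model, text):
    return f"{model}: {text[:20]}..."

rows = ab_summarize(stub_generate, ["gemma3:12b", "qwen3:14b"],
                    ["A long article about model selection."])
```

Swapping qwen3:14b in for gemma3:12b is then a one-string change, and every input gets both outputs for direct comparison.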


Late-Night Modular AI Challenge

Original Post

  • The Spark: Late-evening challenge: how quickly could I chain my custom-built AI modules to dig into a specific question?

  • The Build: Modular AI pipeline using Ollama, Llama, Gemma, Qwen.

  • The Takeaway: Modular architecture pays dividends when you need to move fast.


How I AI-ed "How I AI?"

Original Post

  • The Spark: I learn fastest from people doing the work. So I pointed AI at a show about AI—Claire Vo's "How I AI?" podcast. I wanted to see what patterns emerge when practitioners describe their actual workflows.

  • The Build: A lightweight pipeline to ingest the full podcast corpus (21 episodes) and treat it like product telemetry. Every tool mention, workflow step, and role got tagged. 70 use-cases. 106 tools.

  • Architecture Highlights:

    • Corpus ingestion with structured tagging
    • Signal tracking: Agentic (20%), MCP/Connectors (10%), Multimodal (16%)
    • Tool-role mapping to see who uses what
    • Pattern extraction: PRD→Prototype→Code→Ship; Transcript→Summarize→Publish; Agent+MCP loops
  • Tools:

    • Most frequent pairing in the data: Cursor × Claude in the IDE
    • Surfaces where work lives: Docs/Wiki (20%), PR/CI (9%), Slack (6%)
    • Taxonomy emerged: Create → Build → Analyze → Operate → Surface
  • The Takeaway: Tools turn over; craftsmanship doesn't.
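
The signal-tracking numbers above come down to counting tags over use-cases. A sketch with invented tags (the real tags were extracted from the 21-episode corpus):

```python
from collections import Counter

# Illustrative tagged use-cases; the production data had 70 of these.
use_cases = [
    {"id": 1, "signals": ["agentic", "multimodal"]},
    {"id": 2, "signals": ["mcp"]},
    {"id": 3, "signals": ["agentic"]},
    {"id": 4, "signals": []},
]

def signal_share(cases):
    """Percent of use-cases carrying each signal tag."""
    counts = Counter(s for c in cases for s in set(c["signals"]))
    total = len(cases)
    return {sig: round(100 * n / total) for sig, n in counts.items()}

shares = signal_share(use_cases)  # {"agentic": 50, "mcp": 25, "multimodal": 25}
```

The same counting over tool-role pairs yields the Cursor × Claude finding.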


The Learning Lab: Interactive Book Visualization

Original Post

  • The Spark: I was reading Anil Ananthaswamy's "Why Machines Learn" and hit that familiar gap—the one between reading a concept and actually understanding it. Static text wasn't enough. I wanted to see the math come alive.

  • The Build: A weekend project. I built a Streamlit platform that turns static book chapters into a dynamic playground—interactive visualizations, Feynman-style explanations, real-time parameter controls.

  • Architecture Highlights:

    • Plug-and-play: add new books/chapters by dropping in a Python file
    • Three-panel layout: navigation left, visualization center, controls right
    • Engineered prompt for structured output: Everyday Analogies, Visual Aids, Real-World Applications, Gotchas, Key Takeaways, Further Reading
  • AI Stack:

    • GPT-5 for generating structured content (concise, on-target)
    • Claude Code for vibe-coding the Streamlit app itself
  • The Takeaway: The most exciting use of AI is as a creative partner—building tools that deepen your own understanding.
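
The plug-and-play chapter loading can be sketched with stdlib module discovery. The TITLE convention below is an assumption for illustration, not the app's actual contract:

```python
import importlib.util
from pathlib import Path

def discover_chapters(directory):
    """Load every *.py file in a directory as a chapter module.

    Each file is assumed to expose a TITLE constant (and, in the real
    app, something like a render() entry point for the Streamlit panel).
    Dropping a new file in is the entire registration step.
    """
    chapters = {}
    for path in sorted(Path(directory).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        chapters[getattr(module, "TITLE", path.stem)] = module
    return chapters
```

The three-panel UI then only ever iterates over whatever `discover_chapters` returns, so new books never touch the platform code.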