I build AI agents your security team will actually approve.

For US tech companies past Series A that need agents, not chatbots.

Your team wants AI to actually do work. Your security team wants nothing to leak. I build for both.

Book a 30-minute call

No pitch. We map your highest-friction workflow and you decide if a paid Audit is worth it.

Scoped access

Least-privilege by default.

Every agent only sees what it needs. Documented permission matrix. No high-risk action runs without approval.

Full audit trail

Every action logged.

Tool calls, inputs, outputs, approvers. Stored in your infrastructure. Exportable for compliance review.

Human in the loop

Approval gates on anything risky.

Customer-facing messages, writes to production, irreversible actions. Your team approves. Always.

What every agent I build will and won't do.

Will not autonomously
  • Delete or permanently destroy data, files, repos, tickets, records, or accounts
  • Send customer-facing messages (email, chat, SMS, support reply) without human approval
  • Execute financial transactions (payments, refunds, transfers, contract signing)
  • Write to production systems without scoped permissions and a rollback plan
  • Access secrets or credentials except through approved vault patterns (1Password, Doppler, AWS Secrets Manager)
  • Bypass existing approval workflows that humans rely on
  • Take irreversible actions without a human-in-the-loop gate
  • Use customer PII outside the boundaries set by your data classification
Every agent always has
  • A documented permission matrix (who can do what)
  • Audit logs for every tool call (exportable, queryable)
  • Human-approval gates on customer-facing or high-blast-radius actions
  • Eval suite for known failure modes
  • Rollback or undo plan for any state-changing operation
  • Failure-mode visibility (agent refuses unsafe requests and says why)
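
A minimal sketch of how the permission matrix, approval gate, and audit log above could fit together. This is an illustration under stated assumptions, not the code shipped in any engagement: the `PERMISSIONS` table, the `call_tool` wrapper, the `AuditLog` class, and the agent and tool names are all hypothetical.

```python
import json
import time
from dataclasses import dataclass, field

# Hypothetical permission matrix: agent -> tool -> risk level.
# A tool that is not listed is denied by default (least privilege).
PERMISSIONS = {
    "support-agent": {
        "read_ticket": "low",
        "send_customer_reply": "high",  # customer-facing, so it needs approval
    },
}


@dataclass
class AuditLog:
    """In-memory stand-in for an audit sink that would live in client storage."""
    entries: list = field(default_factory=list)

    def record(self, **event):
        event["ts"] = time.time()
        self.entries.append(event)

    def export(self) -> str:
        # One JSON object per line, so the log is easy to query and export.
        return "\n".join(json.dumps(e, sort_keys=True, default=str) for e in self.entries)


def call_tool(agent, tool, args, run, log, approver=None):
    """Run a tool only if the matrix allows it and any required approval exists."""
    risk = PERMISSIONS.get(agent, {}).get(tool)
    if risk is None:
        reason = "tool not in permission matrix"
        log.record(agent=agent, tool=tool, inputs=args, outcome="denied", reason=reason)
        return {"ok": False, "reason": reason}

    approved_by = None
    if risk == "high":
        # The approver callback stands in for a human decision (for example a chat prompt).
        approved_by = approver(agent, tool, args) if approver else None
        if approved_by is None:
            reason = "human approval required"
            log.record(agent=agent, tool=tool, inputs=args, outcome="blocked", reason=reason)
            return {"ok": False, "reason": reason}

    output = run(**args)
    log.record(agent=agent, tool=tool, inputs=args, output=output,
               approver=approved_by, outcome="executed")
    return {"ok": True, "output": output}
```

In this sketch a refused call returns the reason it was refused, and every attempt, allowed or not, lands in the log with its inputs and approver.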

How an agent fits inside your infrastructure

~/agent-stack
$ describe-stack
LLM gateway      LiteLLM             egress: client VPC only
Orchestrator     Mastra              runs in client infra
Observability    Langfuse            self-hosted, client owns data
Tool layer       custom MCP          least-privilege scoped
Approval gates   Slack + webhooks
Audit sink       client S3 / GCS
Eval suite       Langfuse + custom   correctness, PII, regression
Handover         runbook + repo      client owns everything
$ describe-boundary
All components run inside client infrastructure.
No data leaves the boundary except via the LLM gateway.
Gateway logs every outbound call.
Every tool call is scoped, logged, and reversible.
$ status
ready.

Every outbound call is routed, logged, and governed by policy.
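
A minimal sketch of the boundary rule described above: outbound calls are allowed only to the gateway, and every attempt is logged. It assumes a single allowlisted host; the hostname, the `ALLOWED_EGRESS_HOSTS` set, and the `check_egress` helper are hypothetical, and a real deployment would enforce this at the network layer as well as in code.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: the only host outbound traffic may reach.
ALLOWED_EGRESS_HOSTS = {"llm-gateway.internal.example"}


def check_egress(url, outbound_log):
    """Return True if the URL targets an allowlisted host; log the attempt either way."""
    host = urlparse(url).hostname
    allowed = host in ALLOWED_EGRESS_HOSTS
    outbound_log.append({"url": url, "host": host, "allowed": allowed})
    return allowed
```

A caller would check `check_egress(url, outbound_log)` before making any request and refuse to send when it returns `False`.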

Offer

Agent Opportunity Audit

$3,500 / one week

I map your highest-friction workflow, identify three to five automation candidates, and write you a deployment plan with risk and ROI estimates. No prototype, no commitment beyond the week.

What you get
  • Workflow audit and process map
  • Three to five candidate workflows ranked by ROI and risk
  • Tool permission matrix and data classification draft
  • Recommended pilot scope with timeline and price
  • 30-minute readout call
Book a 30-minute call

We map your workflow. You decide if a paid Audit is worth it.

Who you're working with

I'm Sarthak, an engineer based in New Delhi. I build production AI agent systems for US tech companies, focused on the unglamorous parts most AI consultants skip: permissions, audit trails, evals, and rollback. If you can't show your security team how an agent works, you can't ship it.

For the last two years, I've worked on AI training and agent systems via Turing, Ignitech, and G2i, on projects for OpenAI, Anthropic, Meta, and others. That work taught me what production-grade AI systems require beyond the demo. Now I'm bringing that into agent builds for US tech companies that need real workflow automation without the data egress risk.

Common questions

Why not just build this internally?
You can. The question is what your team's time is worth. Building a production agent system end-to-end (MCP servers, permissions, audit logs, evals, observability, integration with your existing tools) usually takes a senior engineer six to ten weeks of focused work, because they have to learn the agent stack while building it. I've already done that learning on systems for OpenAI, Anthropic, Meta, and other frontier labs. Hire me for four weeks to ship a hardened pilot, or spend roughly eight weeks of your engineer's time. For the first agent, the math usually favors hiring me. After that, your team owns and extends it.
Why not use n8n / Zapier / Make?
Those are great for deterministic workflows. If X then Y. The minute you need actual reasoning (is this customer angry, is this PII, should this escalate) you've outgrown them. Most teams I work with already have n8n or Zapier. We use them as the orchestration layer underneath the agent. Complementary, not competitive.
Why not use ChatGPT Enterprise / Claude Team?
Those are chat tools. Great if you want your team to ask AI questions. They don't do work. They can't read your Linear board, update your Zendesk tickets, or post to your Slack, not without an engineer building MCP servers between them. That's the gap I fill: turning chat into agents that take action with guardrails.
Why not wait for our SaaS vendors to ship agents?
Some will. Some won't. The ones who do will charge per seat and only work within their own product. If you want one agent that crosses tools (pulls from Zendesk, updates Linear, posts to Slack, checks HubSpot), no single vendor is going to ship that for you. That's where custom agents stay valuable indefinitely.
What happens if you disappear?
Three things. One, you own all the code, the infrastructure, and the runbook. It lives in your accounts, not mine. Two, every engagement includes a handover doc that lets an internal engineer maintain and extend the system without me. Three, the architecture I build is intentionally boring. Standard tools (Mastra, LiteLLM, Langfuse), standard patterns, no proprietary magic. If I disappeared tomorrow, your team could keep this running. That's a design constraint I impose on every project.

Map your highest-friction workflow.

One call. No pitch. You leave with a clearer picture of where agents will actually help.