<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://aictrlnet.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://aictrlnet.com/" rel="alternate" type="text/html" /><updated>2026-03-11T13:33:48+00:00</updated><id>https://aictrlnet.com/feed.xml</id><title type="html">AICtrlNet</title><subtitle>AI orchestration with humans in the loop</subtitle><author><name>Srirajasekhar &quot;Bobby&quot; Koritala</name></author><entry><title type="html">Your Team of 5, Working Like 50: How SMBs Are Using AI to Multiply, Not Replace</title><link href="https://aictrlnet.com/blog/2026/03/your-team-of-5-working-like-50/" rel="alternate" type="text/html" title="Your Team of 5, Working Like 50: How SMBs Are Using AI to Multiply, Not Replace" /><published>2026-03-11T00:00:00+00:00</published><updated>2026-03-11T00:00:00+00:00</updated><id>https://aictrlnet.com/blog/2026/03/your-team-of-5-working-like-50</id><content type="html" xml:base="https://aictrlnet.com/blog/2026/03/your-team-of-5-working-like-50/"><![CDATA[<p>You don’t need to hire 10 more people.</p>

<p>If you’re running a small business or a growing team, you’ve felt the squeeze. There’s always more work than people. The backlog grows faster than headcount. Every new client means stretching your team thinner.</p>

<p>The standard advice is: hire. But hiring is slow, expensive, and risky. A single bad hire at a 10-person company is a 10% productivity hit. A good hire takes 3-6 months to become fully productive. And in this labor market, finding the right person takes months before they even start.</p>

<p>What if you could multiply the capacity of the team you already have — without hiring anyone?</p>

<p>That’s what AI does for small and mid-sized businesses when it’s done right. Not replacing your people. Multiplying them.</p>

<hr />

<h2 id="the-multiplication-effect">The Multiplication Effect</h2>

<p>Here’s a real scenario. A distribution company — 12 employees, serving 180+ retailers. Before AI:</p>

<table>
  <thead>
    <tr>
      <th>Task</th>
      <th>Who Does It</th>
      <th>Time Per Week</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Processing retailer orders</td>
      <td>Operations manager</td>
      <td>15 hours</td>
    </tr>
    <tr>
      <td>Following up on late payments</td>
      <td>Bookkeeper</td>
      <td>8 hours</td>
    </tr>
    <tr>
      <td>Answering product availability questions</td>
      <td>Sales rep</td>
      <td>10 hours</td>
    </tr>
    <tr>
      <td>Generating weekly inventory reports</td>
      <td>Warehouse manager</td>
      <td>6 hours</td>
    </tr>
    <tr>
      <td>Scheduling deliveries</td>
      <td>Logistics coordinator</td>
      <td>12 hours</td>
    </tr>
    <tr>
      <td><strong>Total routine work</strong></td>
      <td><strong>5 people</strong></td>
      <td><strong>51 hours/week</strong></td>
    </tr>
  </tbody>
</table>

<p>51 hours per week of routine, repetitive work spread across 5 people. That’s more than a full-time employee’s worth of capacity — consumed by tasks that follow the same pattern every time.</p>

<p>After AI handles the routine:</p>

<table>
  <thead>
    <tr>
      <th>Task</th>
      <th>What Changes</th>
      <th>Time Saved</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Processing retailer orders</td>
      <td>AI extracts order details from WhatsApp messages, creates purchase orders, routes for confirmation</td>
      <td>12 hrs saved</td>
    </tr>
    <tr>
      <td>Following up on late payments</td>
      <td>AI sends automated reminders, flags overdue accounts, drafts escalation emails</td>
      <td>6 hrs saved</td>
    </tr>
    <tr>
      <td>Product availability questions</td>
      <td>AI responds instantly from real-time inventory data via WhatsApp</td>
      <td>8 hrs saved</td>
    </tr>
    <tr>
      <td>Weekly inventory reports</td>
      <td>AI generates reports automatically, flags anomalies</td>
      <td>5 hrs saved</td>
    </tr>
    <tr>
      <td>Scheduling deliveries</td>
      <td>AI optimizes routes, proposes schedules, handles rescheduling</td>
      <td>9 hrs saved</td>
    </tr>
    <tr>
      <td><strong>Total time recovered</strong></td>
      <td> </td>
      <td><strong>40 hours/week</strong></td>
    </tr>
  </tbody>
</table>

<p>40 hours recovered. That’s a full-time employee’s worth of capacity — returned to the team that already exists.</p>

<p>The operations manager now spends those 12 hours on supplier negotiations and new retailer partnerships. The sales rep uses those 8 hours for relationship-building and upselling. The warehouse manager uses those 5 hours optimizing inventory levels and reducing waste.</p>

<p>Same team. Same headcount. Dramatically more output. That’s the multiplication effect.</p>

<hr />

<h2 id="why-this-isnt-the-same-as-ai-replacing-workers">Why This Isn’t the Same as “AI Replacing Workers”</h2>

<p>Anthropic just published <a href="https://www.anthropic.com/research/labor-market-impacts">the most comprehensive study on AI and the labor market to date</a>. One finding that matters for every small business owner:</p>

<p><strong>AI could theoretically handle 94% of computer and math tasks. Only 33% are actually being automated in practice.</strong></p>

<p>The gap isn’t because the technology doesn’t work. It’s because most AI tools are built for enterprises with dedicated IT teams — not for a 15-person company where the owner is also the ops manager, the sales lead, and the IT department.</p>

<p>For SMBs, the AI conversation has been stuck in two unhelpful camps:</p>

<table>
  <thead>
    <tr>
      <th>Camp</th>
      <th>What They Say</th>
      <th>The Problem</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Fear camp</strong></td>
      <td>“AI will replace your workers”</td>
      <td>You can’t afford to <em>lose</em> anyone — you need every person</td>
    </tr>
    <tr>
      <td><strong>Hype camp</strong></td>
      <td>“AI will transform everything”</td>
      <td>No actionable path, requires PhD-level setup</td>
    </tr>
  </tbody>
</table>

<p>Neither camp is talking about what small businesses actually need: <strong>more capacity from the team they already have, without complexity they can’t manage.</strong></p>

<hr />

<h2 id="where-ai-multiplies-a-small-team">Where AI Multiplies a Small Team</h2>

<p>Not every task is a good fit for AI. The tasks that multiply your team share three characteristics:</p>

<ol>
  <li><strong>They follow a repeatable pattern.</strong> Order processing, report generation, data extraction, scheduling.</li>
  <li><strong>They consume disproportionate time.</strong> 5-15 hours per week on something that’s necessary but not strategic.</li>
  <li><strong>The judgment calls are occasional, not constant.</strong> 90% of the time, the answer is predictable. The 10% that needs a human is where your team’s expertise actually matters.</li>
</ol>

<p>Here’s how this plays out across common SMB functions:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Where AI Multiplies vs. Where Humans Lead
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

AI Handles (Volume)              Humans Lead (Judgment)
─────────────────                ─────────────────────
Order intake &amp; data entry        Negotiating terms
Invoice matching &amp; routing       Resolving disputes
FAQ &amp; standard questions         Complex customer issues
Report generation                Strategic interpretation
Appointment scheduling           Relationship building
Payment reminders                Collections escalation
Inventory tracking               Purchasing decisions
Delivery route planning          Exception handling

Volume work ──► AI               Judgment work ──► Your team
(unlimited scale)                (irreplaceable expertise)
</code></pre></div></div>

<p>The ratio matters. If 60% of your team’s time is volume work and 40% is judgment work, AI doesn’t replace 60% of your team. It <em>frees</em> 60% of their time for the judgment work that actually grows your business.</p>

<hr />

<h2 id="the-entry-point-matters">The Entry Point Matters</h2>

<p>Here’s what I’ve learned from watching SMBs adopt AI: the entry point determines whether it sticks.</p>

<p>If the entry point is “log into a new dashboard, learn a new tool, configure complex workflows” — adoption dies. Your team doesn’t have time for that. They’re already running at 110%.</p>

<p>If the entry point is “send a message on WhatsApp” — adoption happens in minutes.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Traditional AI Adoption         vs.   Entry Point Adoption
━━━━━━━━━━━━━━━━━━━━━━━               ━━━━━━━━━━━━━━━━━━━━

1. Sign up for platform               1. Send a WhatsApp message
2. Watch 45-min tutorial              2. AI responds with results
3. Configure workflows                3. That's it
4. Connect integrations
5. Train team on new UI
6. Hope they actually use it

Time to value: weeks                  Time to value: minutes
Adoption rate: ~20%                   Adoption rate: ~80%
</code></pre></div></div>

<p>This is why channel-agnostic architecture matters for SMBs. Your team lives in WhatsApp, Slack, email, SMS. AI that meets them where they already are — instead of demanding they come to a new platform — is the difference between 20% adoption and 80% adoption.</p>

<p>A retailer sends a WhatsApp message: “Do you have 50 units of SKU-4821?” The AI checks real-time inventory and responds in seconds. No dashboard. No login. No context-switching. The workflow happens in the app they’re already using 3 hours a day.</p>
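
<p>To make that flow concrete, here’s a minimal sketch of what a channel-native handler could look like. Everything in it (the function names, the in-memory inventory, the reply wording) is a hypothetical illustration of the pattern, not any particular product’s API.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import re

# Hypothetical in-memory inventory; in practice this would be a live
# database or ERP lookup.
INVENTORY = {"SKU-4821": 120, "SKU-1077": 8}

def handle_message(text: str) -&gt; str:
    """Answer a stock question arriving over any chat channel."""
    match = re.search(r"(\d+)\s+units?\s+of\s+(SKU-\d+)", text, re.IGNORECASE)
    if not match:
        # Anything the pattern can't parse goes to a human.
        return "Let me check with the team and get right back to you."
    qty, sku = int(match.group(1)), match.group(2).upper()
    on_hand = INVENTORY.get(sku, 0)
    if on_hand &gt;= qty:
        return f"Yes: {on_hand} units of {sku} in stock. Reserve {qty}?"
    return f"Only {on_hand} units of {sku} on hand. Flag for the sales team?"

# handle_message("Do you have 50 units of SKU-4821?")
# returns: "Yes: 120 units of SKU-4821 in stock. Reserve 50?"
</code></pre></div></div>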

<hr />

<h2 id="the-progression-start-simple-scale-up">The Progression: Start Simple, Scale Up</h2>

<p>The biggest mistake SMBs make with AI is trying to automate everything at once. The second biggest mistake is automating nothing because the first attempt was too ambitious.</p>

<p>The right approach is progressive:</p>

<table>
  <thead>
    <tr>
      <th>Stage</th>
      <th>What You Automate</th>
      <th>Impact</th>
      <th>Time Investment</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Week 1</strong></td>
      <td>One high-volume, low-risk task (e.g., order intake)</td>
      <td>Immediate time savings, team sees the value</td>
      <td>1-2 hours setup</td>
    </tr>
    <tr>
      <td><strong>Month 1</strong></td>
      <td>2-3 routine workflows (e.g., add invoicing, scheduling)</td>
      <td>Full-time-equivalent capacity recovered</td>
      <td>3-4 hours total</td>
    </tr>
    <tr>
      <td><strong>Month 3</strong></td>
      <td>Cross-functional workflows (e.g., order → fulfillment → billing)</td>
      <td>End-to-end process acceleration</td>
      <td>Ongoing refinement</td>
    </tr>
    <tr>
      <td><strong>Month 6</strong></td>
      <td>AI handling most routine work, team focused on growth</td>
      <td>Operating like a team 3-5x your size</td>
      <td>Maintenance only</td>
    </tr>
  </tbody>
</table>

<p>At each stage, your team is involved. They see what AI is doing. They correct it when it’s wrong. They build confidence in what it handles well. And they gradually hand off more — not because someone told them to, but because they trust the results.</p>

<p>This is fundamentally different from “deploy AI and hope for the best.” It’s a collaboration model where the team’s expertise trains the system, and the system’s speed amplifies the team.</p>

<hr />

<h2 id="the-capacity-math">The Capacity Math</h2>

<p>Let’s make this concrete. Take a 10-person company where each person spends roughly half their time on routine, pattern-based work:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Before AI
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

10 people × 40 hrs/week = 400 hrs total capacity
  ├── 200 hrs routine work (order processing, reports,
  │   scheduling, data entry, follow-ups)
  └── 200 hrs judgment work (strategy, relationships,
      problem-solving, creative decisions)

Effective strategic capacity: 200 hrs/week

After AI (Month 6)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

10 people × 40 hrs/week = 400 hrs total capacity
  ├──  40 hrs reviewing AI output + handling exceptions
  └── 360 hrs judgment work (strategy, relationships,
      problem-solving, creative decisions, NEW growth
      initiatives that didn't fit before)

Effective strategic capacity: 360 hrs/week (+80%)
</code></pre></div></div>

<p>That’s the multiplication effect. To get 360 strategic hours the old way, you’d need 18 people splitting their time the same 50/50 way. You didn’t hire 8 more people; you freed 80% more capacity for the work that actually grows revenue.</p>

<p>And unlike hiring, the AI capacity scales instantly. When you land a big new client and order volume doubles, you don’t need to hire and train. The AI handles the increased volume. Your team handles the increased relationship complexity.</p>

<hr />

<h2 id="but-im-not-technical">“But I’m Not Technical”</h2>

<p>Good. You don’t need to be.</p>

<p>The SMB owner who runs her business from WhatsApp doesn’t need to understand language models or workflow engines. She needs to say “I want order confirmations sent automatically” and have it work.</p>

<p>That’s a design philosophy, not a feature list. AI for SMBs should be:</p>

<ul>
  <li><strong>Conversational setup</strong>: Describe what you want in plain language, not configuration screens</li>
  <li><strong>Channel-native</strong>: Works in WhatsApp, Slack, email — wherever your team already operates</li>
  <li><strong>Progressive</strong>: Start with one workflow, add more as confidence builds</li>
  <li><strong>Supervised by default</strong>: AI handles the volume, your team handles the judgment — and you can always see what AI is doing</li>
  <li><strong>Expert-supported</strong>: When you need help setting up something complex, you talk to a human who configures it with you — not a documentation wiki</li>
</ul>

<p>That last point matters. The automation market is split between DIY platforms (figure it out yourself) and enterprise consultants (six figures for implementation). Neither works for SMBs.</p>

<p>What works is DWY — Doing With You. You describe what you want to automate. An expert hops on a call, configures it, and hands you a working system. You learn how it works. You can modify it later. But you’re not starting from zero, and you’re not paying enterprise prices.</p>

<hr />

<h2 id="the-competitive-advantage-window">The Competitive Advantage Window</h2>

<p>Here’s the thing about AI adoption for small businesses: the window is open right now, and it won’t stay open forever.</p>

<p>The Anthropic data shows that only 33% of AI’s potential is being realized. For SMBs, that number is likely even lower — most AI investment and tooling has been enterprise-focused.</p>

<p>That means the SMB that adopts AI now — even simple workflow automation — gains a massive efficiency advantage over competitors who are still doing everything manually. The 5-person team that operates like a 25-person team wins deals, serves more customers, and grows faster.</p>

<p>But this advantage is temporary. Within 2-3 years, AI-powered workflow automation will be table stakes. The early adopters will have mature, optimized systems. The late adopters will be scrambling to catch up while their competitors are already operating at 3-5x capacity.</p>

<p>The best time to start was last year. The second best time is now.</p>

<hr />

<h2 id="start-with-one-workflow">Start With One Workflow</h2>

<p>You don’t need to automate your entire business. You need to automate one thing that wastes too much time, see the result, and build from there.</p>

<p>Pick the workflow that:</p>
<ul>
  <li>Happens every day (or multiple times per day)</li>
  <li>Follows a predictable pattern</li>
  <li>Takes 5+ hours per week of someone’s time</li>
  <li>Would free that person to do something more valuable</li>
</ul>

<p>That’s your starting point. One workflow. One week. See what happens when your team gets those hours back.</p>

<p>Then do it again.</p>

<hr />

<h2 id="about-aictrlnet">About AICtrlNet</h2>

<p>AICtrlNet is AI-powered universal automation with governance built in. Three layers of automation reach — 10,000+ tools through platform adapters, any API through self-extending agents, any web app through browser automation. Works where your team already works: WhatsApp, Slack, email, SMS, browser, and file uploads. 177 workflow templates across 8 industries, ready to deploy. Expert support (DWY) built into every tier.</p>

<p>AI that automates anything. Governance for everything.</p>

<table>
  <tbody>
    <tr>
      <td><a href="https://github.com/Bodaty/aictrlnet-community">Explore AICtrlNet on GitHub</a></td>
      <td><a href="https://hitlai.net/trial">Start a free trial</a></td>
    </tr>
  </tbody>
</table>

<hr />

<p><em>Bobby Koritala is the founder of AICtrlNet and Bodaty. He holds multiple patents in AI systems and has spent nine years deploying AI in regulated industries including healthcare, finance, and logistics.</em></p>

<hr />

<h2 id="sources">Sources</h2>

<ul>
  <li><a href="https://www.anthropic.com/research/labor-market-impacts">Labor market impacts of AI: A new measure and early evidence — Anthropic Research</a></li>
  <li><a href="https://the-decoder.com/anthropics-new-study-shows-ai-is-nowhere-near-its-theoretical-job-disruption-potential/">Anthropic’s new study shows AI is nowhere near its theoretical job disruption potential — The Decoder</a></li>
</ul>]]></content><author><name>Bobby Koritala</name></author><category term="smb" /><category term="ai-automation" /><category term="productivity" /><summary type="html"><![CDATA[You don't need to hire 10 more people. AI can multiply the capacity of the team you already have — without complexity you can't manage. Here's how SMBs are recovering 40+ hours per week.]]></summary></entry><entry><title type="html">The 94% Gap: Why AI Isn’t Deployed and What Enterprises Are Missing</title><link href="https://aictrlnet.com/blog/2026/03/the-94-percent-gap/" rel="alternate" type="text/html" title="The 94% Gap: Why AI Isn’t Deployed and What Enterprises Are Missing" /><published>2026-03-09T00:00:00+00:00</published><updated>2026-03-09T00:00:00+00:00</updated><id>https://aictrlnet.com/blog/2026/03/the-94-percent-gap</id><content type="html" xml:base="https://aictrlnet.com/blog/2026/03/the-94-percent-gap/"><![CDATA[<p>Here’s the most important number in AI right now, and it’s not about model performance or funding rounds.</p>

<p><strong>94% vs. 33%.</strong></p>

<p>Anthropic’s recent labor market study — the most rigorous analysis of AI’s real-world impact to date — <a href="https://www.anthropic.com/research/labor-market-impacts">found that AI could theoretically speed up 94% of all computer and mathematical tasks</a>. But only 33% are actually being affected in practice.</p>

<p>That’s a 61-point gap between what AI <em>can</em> do and what organizations are <em>letting</em> it do.</p>

<p>This isn’t a technology problem. The models work. The capabilities are proven. The ROI math checks out. And yet two-thirds of the potential value is sitting on the table, untouched.</p>

<p>Why?</p>

<p>Because enterprises don’t have a way to turn the dial.</p>

<hr />

<h2 id="the-binary-trap">The Binary Trap</h2>

<p>Talk to any CTO or VP of Engineering about AI adoption, and you’ll hear the same pattern.</p>

<p>The technology team runs a pilot. It works. The results are impressive — maybe even transformative. A process that took 3 days takes 10 minutes. Error rates drop. Throughput triples.</p>

<p>Then the pilot hits the governance review.</p>

<p>“Who approved the AI’s decision?”</p>

<p>“What happens if it’s wrong?”</p>

<p>“Where’s the audit trail?”</p>

<p>“Can we explain this to a regulator?”</p>

<p>The pilot stalls. Not because the AI didn’t work — but because there’s no infrastructure between “manual process” and “fully automated.” The organization is forced into a binary choice:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>The Binary Trap
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

  Option A                          Option B
  ┌──────────┐                      ┌──────────┐
  │          │                      │          │
  │  Don't   │    Nothing in        │  Full    │
  │  use AI  │ ◄─ between ─►        │  auto    │
  │          │                      │          │
  │  (safe)  │                      │  (risky) │
  └──────────┘                      └──────────┘
       ↑                                 ↑
    Where most                      Where the ROI
    enterprises                     lives — but
    stay stuck                      no one signs off
</code></pre></div></div>

<p>Option A: Keep doing it manually. Safe, compliant, and increasingly uncompetitive.</p>

<p>Option B: Automate everything. Fast, but no one — not the CTO, not Legal, not Compliance — will sign off without guardrails.</p>

<p>The result? Most enterprises stay stuck at Option A, or run a handful of constrained pilots that never reach production. The 94% stays theoretical.</p>

<hr />

<h2 id="what-the-data-reveals-about-the-gap">What the Data Reveals About the Gap</h2>

<p>The Anthropic study introduced a metric called “observed exposure” — comparing theoretical AI capability with actual real-world usage. The gaps are striking across every occupational category:</p>

<table>
  <thead>
    <tr>
      <th>Occupation Category</th>
      <th>Theoretical Capability</th>
      <th>Actual Deployment</th>
      <th>Gap</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Computer &amp; Mathematical</td>
      <td>94%</td>
      <td>33%</td>
      <td>61 pts</td>
    </tr>
    <tr>
      <td>Office &amp; Administrative</td>
      <td>90%</td>
      <td>~28%</td>
      <td>62 pts</td>
    </tr>
    <tr>
      <td>Business &amp; Financial</td>
      <td>86%</td>
      <td>~25%</td>
      <td>61 pts</td>
    </tr>
    <tr>
      <td>Management</td>
      <td>70%</td>
      <td>~20%</td>
      <td>50 pts</td>
    </tr>
    <tr>
      <td>Legal</td>
      <td>67%</td>
      <td>~18%</td>
      <td>49 pts</td>
    </tr>
  </tbody>
</table>

<p>The pattern is consistent: roughly two-thirds of AI’s potential value is unrealized. And this isn’t because the tasks are too complex or the models aren’t accurate enough. Anthropic’s researchers found that “the gap between what AI can theoretically do and what it is actually doing is closing fast” — in other words, usage is beginning to catch up with capability, and the advantage will go to the organizations that can actually deploy.</p>

<p>Something is blocking deployment. And it’s not the AI.</p>

<hr />

<h2 id="three-reasons-enterprises-cant-close-the-gap">Three Reasons Enterprises Can’t Close the Gap</h2>

<h3 id="1-no-progressive-autonomy-infrastructure">1. No Progressive Autonomy Infrastructure</h3>

<p>The biggest blocker is structural. Enterprises don’t have a way to gradually increase AI’s role in a workflow.</p>

<p>Consider how a mature enterprise would ideally adopt AI for, say, invoice processing:</p>

<table>
  <thead>
    <tr>
      <th>Phase</th>
      <th>AI Role</th>
      <th>Human Role</th>
      <th>Risk Level</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1</td>
      <td>Extracts data, suggests actions</td>
      <td>Reviews and approves every action</td>
      <td>Minimal</td>
    </tr>
    <tr>
      <td>2</td>
      <td>Processes routine invoices, flags exceptions</td>
      <td>Reviews exceptions only</td>
      <td>Low</td>
    </tr>
    <tr>
      <td>3</td>
      <td>Handles end-to-end for known vendors</td>
      <td>Monitors dashboards, handles escalations</td>
      <td>Moderate</td>
    </tr>
    <tr>
      <td>4</td>
      <td>Manages the full cycle including edge cases</td>
      <td>Reviews outcomes, handles vendor disputes</td>
      <td>Low (trust established)</td>
    </tr>
  </tbody>
</table>

<p>This progression is intuitive. Everyone agrees it makes sense. But implementing it requires infrastructure that most organizations don’t have:</p>

<ul>
  <li>Per-workflow configuration of AI autonomy levels</li>
  <li>Real-time monitoring of AI decisions with the ability to intervene</li>
  <li>Automatic escalation when AI encounters uncertainty</li>
  <li>Audit trails that satisfy compliance at every phase</li>
  <li>The ability to dial autonomy up or down per department, per role, per workflow</li>
</ul>

<p>Without this infrastructure, the only options are “human does it” or “AI does it.” The graduated middle — where the real value lives — doesn’t exist.</p>
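
<p>As a sketch of what that infrastructure implies, here’s one way per-workflow autonomy configuration could be modeled in code. The phase names, policy shape, and thresholds are illustrative assumptions, not a reference implementation:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from dataclasses import dataclass
from enum import IntEnum

class Phase(IntEnum):
    """Graduated autonomy levels, mirroring the invoice example above."""
    SUGGEST = 1        # AI proposes, a human approves every action
    ROUTINE = 2        # AI handles routine cases, flags exceptions
    KNOWN_VENDORS = 3  # AI runs end-to-end for known vendors
    FULL_CYCLE = 4     # AI manages edge cases, humans review outcomes

@dataclass
class WorkflowPolicy:
    name: str
    phase: Phase
    confidence_floor: float  # below this, always escalate to a human

POLICIES = {
    "invoice_processing": WorkflowPolicy("invoice_processing", Phase.ROUTINE, 0.85),
    "vendor_onboarding": WorkflowPolicy("vendor_onboarding", Phase.SUGGEST, 0.95),
}

def needs_human(workflow: str, confidence: float, is_exception: bool) -&gt; bool:
    policy = POLICIES[workflow]
    if confidence &lt; policy.confidence_floor:
        return True              # uncertain: escalate regardless of phase
    if policy.phase == Phase.SUGGEST:
        return True              # Phase 1: everything is reviewed
    if policy.phase == Phase.ROUTINE:
        return is_exception      # Phase 2: only exceptions are reviewed
    return False                 # Phases 3-4: humans monitor dashboards
</code></pre></div></div>

<p>The point of the shape: dialing autonomy up or down per department becomes a one-line policy change, not a re-architecture.</p>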

<h3 id="2-governance-retrofitting-doesnt-work">2. Governance Retrofitting Doesn’t Work</h3>

<p>I’ve watched dozens of enterprises try to add governance after the fact. The pattern is always the same.</p>

<p>The team builds the AI system. It works beautifully in the lab. Then they try to bolt on governance for the compliance review. And they discover that the data they need was never collected:</p>

<ul>
  <li>“What was the AI’s reasoning?” — Not logged.</li>
  <li>“Who reviewed this decision?” — No tracking mechanism.</li>
  <li>“What was the confidence score?” — Discarded after prediction.</li>
  <li>“Can we replay this decision with different parameters?” — Architecture doesn’t support it.</li>
</ul>

<p><a href="https://www2.deloitte.com/us/en/insights/topics/artificial-intelligence/ai-governance-challenges.html">Deloitte found that 62% of enterprise AI projects experience significant delays during compliance review</a>, with an average delay of 4.3 months. Not because the AI was bad — because the governance infrastructure didn’t exist.</p>

<p>The lesson: governance has to be built into the execution layer from day one. If every AI action is evaluated, logged, and auditable from the start, the compliance review becomes a formality rather than a project-killing bottleneck.</p>

<h3 id="3-the-operating-model-is-missing">3. The Operating Model Is Missing</h3>

<p>Technology alone doesn’t close the gap. You also need an operating model — a framework for how humans and AI work together that scales across departments.</p>

<p>AT&amp;T learned this firsthand. Processing <a href="https://venturebeat.com/orchestration/8-billion-tokens-a-day-forced-at-and-t-to-rethink-ai-orchestration-and-cut">8 billion tokens per day</a>, their chief data officer Andy Markus restructured their entire orchestration layer. The result: 90% cost reduction and 3x throughput across 100,000+ employees.</p>

<p>But the cost savings weren’t the insight. The insight was the operating model they built:</p>

<ul>
  <li>Specialized agents handling domain-specific work</li>
  <li>Humans maintaining supervisory control over workflows</li>
  <li>Role-based access enforced between agents</li>
  <li>Complete audit trails for every decision</li>
  <li>Progressive autonomy that increased as confidence was validated</li>
</ul>

<p>Markus described the philosophy: “I believe the future of agentic AI is many, many, many small language models… We find small language models to be just about as accurate as a large language model on a given domain area.”</p>

<p>AT&amp;T had the engineering team to build this from scratch. Most enterprises don’t. But every enterprise scaling AI will need the same operating model — specialized AI orchestrated under human oversight, with progressive autonomy that matches organizational confidence.</p>

<hr />

<h2 id="what-closing-the-gap-actually-requires">What Closing the Gap Actually Requires</h2>

<p>The enterprises successfully moving past 33% share a common architecture. It’s not about any single tool — it’s about five capabilities working together:</p>

<h3 id="1-a-control-spectrum-not-a-switch">1. A Control Spectrum, Not a Switch</h3>

<p>The ability to configure AI autonomy on a gradient, not a binary. Different departments, different workflows, different risk tolerances.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>How Autonomy Should Vary Across an Enterprise
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Marketing       ████████████████████████████░░  Phase 5
                AI runs campaigns, humans review outcomes

Customer Svc    ██████████████████████░░░░░░░░  Phase 4
                AI handles Tier 1, humans handle escalations

Finance         █████████████░░░░░░░░░░░░░░░░░  Phase 2
                AI drafts, humans approve everything

Legal           ████████░░░░░░░░░░░░░░░░░░░░░░  Phase 2
                AI researches, attorneys decide

Compliance      ██████░░░░░░░░░░░░░░░░░░░░░░░░  Phase 1
                AI suggests, humans do everything

                ◄── Conservative        Autonomous ──►
</code></pre></div></div>

<p>This isn’t just a UI preference — it maps directly to how different functions tolerate risk. Marketing can afford a wrong first draft. Legal can’t afford a wrong interpretation. The system must accommodate both.</p>

<h3 id="2-pre-action-evaluation">2. Pre-Action Evaluation</h3>

<p>The difference between logging and governance is the difference between knowing what happened and preventing what shouldn’t.</p>

<p>Every AI action — every API call, every data transformation, every notification, every decision — needs to be evaluated <em>before</em> it executes. Not monitored after the fact. Evaluated in real time, with the ability to allow, deny, or escalate to a human based on configurable policies.</p>

<p>This is what transforms AI from “risky experiment” to “production infrastructure.” When Legal can see that every AI decision is evaluated against their policies before it takes effect, the compliance review conversation changes completely.</p>
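
<p>In code terms, pre-action evaluation is a gate that every action passes through before it runs. Here’s a minimal sketch of that gate; all names and the example policy are hypothetical:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from enum import Enum

class Verdict(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ESCALATE = "escalate"  # route to a human before anything executes

def evaluate(action: dict, policies: list) -&gt; Verdict:
    """Check an action against every applicable policy BEFORE it runs."""
    for policy in policies:
        verdict = policy(action)
        if verdict is not Verdict.ALLOW:
            return verdict     # first deny/escalate wins; nothing executes
    return Verdict.ALLOW

# Example policy: payments above a threshold always go to a human.
def payment_limit(action: dict) -&gt; Verdict:
    if action.get("type") == "payment" and action.get("amount", 0) &gt; 10_000:
        return Verdict.ESCALATE
    return Verdict.ALLOW

verdict = evaluate({"type": "payment", "amount": 25_000}, [payment_limit])
# verdict is Verdict.ESCALATE: the payment waits for human sign-off
</code></pre></div></div>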

<h3 id="3-full-audit-trails-by-default">3. Full Audit Trails by Default</h3>

<p>Not “we can turn on logging if needed.” Every decision, every input, every output, every confidence signal — captured automatically from day one.</p>

<p>Six months from now, when a regulator or a client or an internal auditor asks “what happened and why,” you need the answer in 30 seconds. Not a 3-month forensic investigation.</p>
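
<p>Concretely, “captured automatically” means the execution layer emits a complete record for every decision as a side effect of running it. A sketch of what one record might hold; the field names are illustrative:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import datetime
import json

def audit_record(workflow, action, verdict, confidence, reviewer=None):
    """Emit one audit entry per AI decision, at execution time."""
    return json.dumps({
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "workflow": workflow,      # which process made the decision
        "action": action,          # the full input, not a summary
        "verdict": verdict,        # allow / deny / escalate
        "confidence": confidence,  # kept, never discarded after prediction
        "reviewed_by": reviewer,   # None until a human signs off
    })

print(audit_record("invoice_processing",
                   {"vendor": "ACME", "amount": 1200},
                   "allow", 0.97))
</code></pre></div></div>

<p>Because the record is written when the decision executes, answering “what happened and why” is a query, not a forensic project.</p>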

<h3 id="4-multi-agent-orchestration">4. Multi-Agent Orchestration</h3>

<p>Complex enterprise workflows aren’t single-model problems. They require specialized agents coordinating across tasks — one handling data extraction, another analyzing patterns, another generating recommendations, another routing for approval.</p>

<p>The orchestration layer determines how these agents coordinate, which ones have authority over which decisions, and how human oversight is maintained across the full workflow. Without orchestration, you have individual AI tools. With it, you have an AI-augmented operating model.</p>
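
<p>One way to picture that layer: specialized agents run in sequence, each with narrow authority, and a supervisor owns the approval gate. A deliberately simplified sketch with stubbed-out agents:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Each agent does one narrow job; none has authority beyond its own step.
def extract(doc):
    return {"vendor": "ACME", "amount": 1200}  # stub data extractor

def analyze(data):
    return {**data, "anomaly": data["amount"] &gt; 100_000}

def recommend(data):
    return {**data, "action": "hold" if data["anomaly"] else "pay"}

PIPELINE = [extract, analyze, recommend]

def orchestrate(doc, approve):
    """Run specialized agents in sequence under a human approval gate."""
    state = doc
    for agent in PIPELINE:
        state = agent(state)
    if state["action"] == "hold":
        return approve(state)  # a human decides; the orchestrator never pays a hold
    return state

# orchestrate({"raw": "invoice.pdf"}, approve=lambda s: {**s, "action": "escalated"})
</code></pre></div></div>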

<h3 id="5-progressive-trust-building">5. Progressive Trust Building</h3>

<p>The infrastructure must support a natural progression from conservative to autonomous. This means:</p>

<ul>
  <li>Metrics that quantify AI accuracy and reliability per workflow</li>
  <li>Dashboards that show decision quality over time</li>
  <li>Automatic recommendations for when to increase autonomy</li>
  <li>Easy rollback when confidence drops</li>
</ul>

<p>The goal isn’t to reach full automation everywhere. It’s to reach the <em>appropriate</em> level of automation for each function — and to get there at a pace that builds rather than erodes organizational trust.</p>
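
<p>The loop can start as simply as tracking validated accuracy per workflow and recommending a phase change when it clears a threshold, with automatic rollback when it drops. A sketch under those assumptions (the thresholds are hypothetical):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def recommend_phase(current_phase, recent_accuracy,
                    promote_at=0.98, demote_at=0.90):
    """Suggest raising or lowering autonomy based on measured accuracy."""
    if recent_accuracy &gt;= promote_at and current_phase &lt; 6:
        return current_phase + 1  # earned more autonomy
    if recent_accuracy &lt; demote_at and current_phase &gt; 1:
        return current_phase - 1  # easy rollback when confidence drops
    return current_phase          # hold steady

# 200 recent decisions, 197 validated correct: accuracy 0.985
print(recommend_phase(current_phase=2, recent_accuracy=197 / 200))  # 3
</code></pre></div></div>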

<hr />

<h2 id="the-cost-of-staying-at-33">The Cost of Staying at 33%</h2>

<p>The gap isn’t just a missed opportunity. It’s an active competitive disadvantage.</p>

<table>
  <thead>
    <tr>
      <th>Metric</th>
      <th>Enterprise at 33% Deployment</th>
      <th>Enterprise Closing the Gap</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Knowledge worker productivity</strong></td>
      <td>40-60% of time on rote tasks</td>
      <td>80%+ of time on judgment work</td>
    </tr>
    <tr>
      <td><strong>Time to process</strong> (invoices, claims, tickets)</td>
      <td>Days</td>
      <td>Hours or minutes</td>
    </tr>
    <tr>
      <td><strong>Cost per transaction</strong></td>
      <td>High (human-intensive)</td>
      <td>60-90% lower (AT&amp;T proved this)</td>
    </tr>
    <tr>
      <td><strong>Compliance posture</strong></td>
      <td>Manual, inconsistent, audit-vulnerable</td>
      <td>Automated, consistent, fully auditable</td>
    </tr>
    <tr>
      <td><strong>Talent utilization</strong></td>
      <td>Expensive people doing commodity work</td>
      <td>Expensive people doing high-value work</td>
    </tr>
    <tr>
      <td><strong>Ability to scale</strong></td>
      <td>Linear (hire more people)</td>
      <td>Exponential (increase AI autonomy)</td>
    </tr>
  </tbody>
</table>

<p>Microsoft’s Cyber Pulse report found that <a href="https://www.microsoft.com/en-us/security/blog/2026/02/10/80-of-fortune-500-use-active-ai-agents-observability-governance-and-security-shape-the-new-frontier/">over 80% of Fortune 500 companies are already running active AI agents</a> — but fewer than half have implemented specific AI security safeguards. They have the agents. They don’t have the infrastructure to govern them at scale.</p>

<p>That’s 33%. And the enterprises that build the infrastructure to get past it will outperform those that don’t.</p>

<hr />

<h2 id="the-window-is-open--for-now">The Window Is Open — For Now</h2>

<p>The Anthropic researchers noted that the gap between theoretical capability and actual deployment is “closing fast.” Models are improving. Adoption pressure is building. The enterprises that build progressive autonomy infrastructure now — while the gap still exists and the competitive advantage is available — will define how AI operates in their industry.</p>

<p>The ones that wait will find themselves playing catch-up against competitors who already have mature, governed AI operating at Phase 4 or Phase 5.</p>

<p>The 94% gap isn’t permanent. But how it closes — whether through thoughtful progressive adoption or through chaotic, ungoverned automation — depends on the infrastructure decisions enterprises make today.</p>

<p>Companies aren’t deploying AI because they don’t have a way to turn the dial. The enterprises that build the dial win.</p>

<hr />

<h2 id="about-aictrlnet">About AICtrlNet</h2>

<p>AICtrlNet is AI-powered universal automation with governance built in. Three layers of automation reach — 10,000+ tools through platform adapters, any API through self-extending agents, any web app through browser automation. Six phases of autonomy so every department controls the pace of AI adoption. All governed, all auditable, all yours.</p>

<p>AI that automates anything. Governance for everything.</p>

<table>
  <tbody>
    <tr>
      <td><a href="https://github.com/Bodaty/aictrlnet-community">Explore AICtrlNet on GitHub</a></td>
      <td><a href="https://hitlai.net/trial">Start a free trial</a></td>
    </tr>
  </tbody>
</table>

<hr />

<p><em>Bobby Koritala is the founder of AICtrlNet and Bodaty. He holds multiple patents in AI systems and has spent nine years deploying AI in regulated industries including healthcare, finance, and logistics.</em></p>

<hr />

<h2 id="sources">Sources</h2>

<ul>
  <li><a href="https://www.anthropic.com/research/labor-market-impacts">Labor market impacts of AI: A new measure and early evidence — Anthropic Research</a></li>
  <li><a href="https://venturebeat.com/orchestration/8-billion-tokens-a-day-forced-at-and-t-to-rethink-ai-orchestration-and-cut">8 billion tokens a day forced AT&amp;T to rethink AI orchestration — and cut costs by 90% — VentureBeat</a></li>
  <li><a href="https://www.microsoft.com/en-us/security/blog/2026/02/10/80-of-fortune-500-use-active-ai-agents-observability-governance-and-security-shape-the-new-frontier/">80% of Fortune 500 use active AI Agents — Microsoft Security Blog</a></li>
  <li><a href="https://the-decoder.com/anthropics-new-study-shows-ai-is-nowhere-near-its-theoretical-job-disruption-potential/">Anthropic’s new study shows AI is nowhere near its theoretical job disruption potential — The Decoder</a></li>
  <li><a href="https://www2.deloitte.com/us/en/insights/topics/artificial-intelligence/ai-governance-challenges.html">AI Governance challenges in enterprise — Deloitte Insights</a></li>
</ul>]]></content><author><name>Bobby Koritala</name></author><category term="enterprise-ai" /><category term="strategy" /><category term="ai-governance" /><summary type="html"><![CDATA[AI could handle 94% of computer tasks. Only 33% are automated. The 61-point gap isn't a technology problem — it's an adoption infrastructure problem. Here's what enterprises need to close it.]]></summary></entry><entry><title type="html">Yes, Jobs Are Changing. Here’s What’s Actually Happening.</title><link href="https://aictrlnet.com/blog/2026/03/yes-jobs-are-changing/" rel="alternate" type="text/html" title="Yes, Jobs Are Changing. Here’s What’s Actually Happening." /><published>2026-03-06T00:00:00+00:00</published><updated>2026-03-06T00:00:00+00:00</updated><id>https://aictrlnet.com/blog/2026/03/yes-jobs-are-changing</id><content type="html" xml:base="https://aictrlnet.com/blog/2026/03/yes-jobs-are-changing/"><![CDATA[<p>I’ve spent nine years building AI systems for enterprises. Healthcare. Finance. Logistics. I hold multiple patents in the space. I’ve watched AI go from research curiosity to boardroom priority.</p>

<p>And I’ve never seen this much anxiety.</p>

<p>Every week, someone asks me: “Is AI going to take my job?” Sometimes it’s a junior developer. Sometimes it’s a VP. Sometimes it’s a friend at a dinner party who read a headline and can’t sleep.</p>

<p>I’m not going to give you a platitude. I’m going to give you the data, what I’ve seen firsthand, and what I honestly think happens next.</p>

<hr />

<h2 id="the-anthropic-study-what-the-data-actually-says">The Anthropic Study: What the Data Actually Says</h2>

<p>Anthropic — the company behind Claude — just published what may be the most rigorous labor market study on AI to date. Their researchers developed a new metric called “observed exposure” that compares the theoretical capabilities of AI with <a href="https://www.anthropic.com/research/labor-market-impacts">actual usage data from millions of real interactions</a>. They didn’t just theorize about what AI <em>could</em> do. They measured what it’s <em>actually</em> doing.</p>

<p>Here’s what they found:</p>

<table>
  <thead>
    <tr>
      <th>Metric</th>
      <th>Finding</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Theoretical capability</strong></td>
      <td>AI could speed up 94% of computer/math tasks, 90% of office/admin tasks</td>
    </tr>
    <tr>
      <td><strong>Actual deployment</strong></td>
      <td>Only 33% of computer/math tasks are actually being affected today</td>
    </tr>
    <tr>
      <td><strong>Young worker hiring</strong></td>
      <td>14% drop for ages 22-25 in AI-exposed occupations since 2022</td>
    </tr>
    <tr>
      <td><strong>Unemployment impact</strong></td>
      <td>No systematic increase in unemployment for exposed workers</td>
    </tr>
    <tr>
      <td><strong>Most exposed workers</strong></td>
      <td>Older, higher-paid, more educated — not minimum-wage workers</td>
    </tr>
  </tbody>
</table>

<p>The gap between what AI <em>can</em> do and what it <em>is</em> doing is massive. And it tells us something important about what’s really going on.</p>

<h3 id="most-exposed-occupations">Most Exposed Occupations</h3>

<table>
  <thead>
    <tr>
      <th>Occupation</th>
      <th>Task Exposure</th>
      <th>Type of Work</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Computer programmers</td>
      <td>75%</td>
      <td>Code generation, debugging, documentation</td>
    </tr>
    <tr>
      <td>Customer service reps</td>
      <td>67%</td>
      <td>Response drafting, ticket routing, FAQ handling</td>
    </tr>
    <tr>
      <td>Data entry keyers</td>
      <td>67%</td>
      <td>Structured data processing, form filling</td>
    </tr>
    <tr>
      <td>Medical record specialists</td>
      <td>60%+</td>
      <td>Record processing, coding, classification</td>
    </tr>
  </tbody>
</table>

<p>These aren’t low-wage jobs being automated. These are knowledge workers — the most educated, highest-paid segment of the workforce.</p>

<hr />

<h2 id="the-anxiety-is-real--and-some-of-it-is-warranted">The Anxiety Is Real — And Some of It Is Warranted</h2>

<p>Let me be honest: the anxiety isn’t irrational.</p>

<p>Phil Fersht, founder of HFS Research and the analyst who <a href="https://www.linkedin.com/in/philfersht/">coined “Services-as-Software,”</a> put it starkly: “Companies are not firing people. They are quietly closing the front door on the next generation of knowledge workers.”</p>

<p>He’s right about the pattern. The Anthropic data confirms it — a 14% drop in hiring for 22-25 year olds in exposed roles. Not layoffs. Quiet attrition through the front door.</p>

<p>If your job consists primarily of tasks that AI can do faster and cheaper — assembling information, following templates, processing structured data, generating first drafts — then yes, that work is being automated. It should be. Not because your contribution doesn’t matter, but because that work is a waste of what you’re actually capable of.</p>

<p>The data entry clerk who spends 8 hours a day copying numbers between systems? That job is going away. And frankly, it should have gone away a decade ago — the technology existed, organizations just hadn’t adopted it.</p>

<p>The junior analyst who spends 40 hours building a deck that summarizes publicly available data? The summarization part is already automated. It took 40 hours of human effort. It takes AI 4 minutes.</p>

<p>This is real. Pretending otherwise doesn’t help anyone.</p>

<hr />

<h2 id="but-the-replacement-narrative-is-wrong">But the Replacement Narrative Is Wrong</h2>

<p>Here’s where the headlines get it wrong.</p>

<p>Fortune ran with <a href="https://fortune.com/2026/03/06/ai-job-losses-report-anthropic-research-great-recession-for-white-collar-workers/">“A ‘Great Recession for white-collar workers’ is absolutely possible.”</a> Axios announced that <a href="https://www.axios.com/2026/03/05/anthropic-ai-jobs-claude">“Anthropic launches AI job destruction detector.”</a> The framing is binary: AI replaces humans. One-for-one substitution. A robot sits in your chair and does your job.</p>

<p>That’s not what’s happening. And the Anthropic data proves it — if AI were simply replacing humans, we’d see systematic unemployment spikes in exposed occupations. We don’t.</p>

<p>What’s actually happening is more nuanced and, I’d argue, more significant:</p>

<p><strong>Roles are being redesigned, not eliminated.</strong></p>

<p>The junior analyst isn’t being fired. The junior analyst role is being redefined. Instead of spending 40 hours on data gathering and summarization, they spend 4 hours reviewing AI output and 36 hours on analysis, client interaction, and strategic thinking — work that used to be reserved for people with 5-10 years of experience.</p>

<p>That’s not a loss. That’s an acceleration.</p>

<p><strong>The 14% hiring drop is companies making the wrong choice.</strong></p>

<p>When Anthropic reports that young worker hiring has slowed 14% in exposed occupations, that’s not inevitable AI displacement. That’s companies making a short-sighted decision: “AI can do the grunt work, so we don’t need entry-level people to do it.”</p>

<p>This will backfire spectacularly in 3-5 years. Those entry-level roles aren’t just labor — they’re training grounds. They’re how organizations build institutional knowledge, develop future leaders, and maintain the human judgment that AI can’t replicate. The companies closing the front door on junior hires are hollowing out their own talent pipeline.</p>

<p><strong>The 94% vs 33% gap is the real story.</strong></p>

<p>The most important number in the Anthropic study isn’t the 14% hiring drop. It’s the gap: 94% theoretical capability, 33% actual deployment.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>AI Capability vs. Actual Deployment
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Computer &amp; Math    ██████████████████████████████████░░░  94% theoretical
                   ████████████                           33% actual

Office &amp; Admin     █████████████████████████████████░░░░  90% theoretical
                   ██████████                             ~28% actual

                   ◄─── The Gap ───►
                   This isn't a technology problem.
                   It's an adoption infrastructure problem.
</code></pre></div></div>

<p>AI could be handling almost everything. It’s handling a third. Why?</p>

<p>Not because the technology doesn’t work. Not because people don’t want it. Because organizations don’t have a way to progressively adopt AI at a pace that matches their confidence.</p>

<hr />

<h2 id="the-real-problem-no-one-has-a-dial">The Real Problem: No One Has a Dial</h2>

<p>Think about how most companies adopt AI today.</p>

<p><strong>Option A: Don’t use it.</strong> Too risky, too uncertain, wait and see. Stay at 0%.</p>

<p><strong>Option B: Go all-in.</strong> Replace the workflow, replace the role, automate everything. Jump to 100%.</p>

<p>There’s no middle ground. No way to say: “Let AI handle the data gathering, but a human reviews every recommendation. Let AI draft the email, but a human approves it before it sends. Let AI run the report, but flag anything unusual for human review.”</p>

<p>That middle ground is where the real productivity gains live. And it’s where the anxiety dissolves.</p>

<p>When you tell a knowledge worker “AI is going to do your job,” they panic. When you tell them “AI is going to handle the parts of your job you hate, and you’re going to focus on the parts that actually require your brain,” they lean in.</p>

<p>The difference isn’t the technology. It’s the operating model.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>The AI Adoption Spectrum (How It Should Work)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

  0%                                        100%
  ├────────┼────────┼────────┼────────┼────────┤
  │        │        │        │        │        │
  No AI    AI       AI       AI       AI       Full
           Suggests Drafts   Executes Handles  Auto
           ↑        ↑        ↑        ↑        ↑
           Human    Human    Human    Human    Human
           Decides  Reviews  Monitors Handles  Reviews
                                      Issues   Outcomes

  Most companies are stuck choosing between the far left
  and the far right. The value is in the middle.
</code></pre></div></div>

<p>Companies need a dial — a way to set how much autonomy AI gets, per task, per department, per role. Start conservative. Build confidence. Increase autonomy. At whatever pace makes sense for that team.</p>
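
<p>In implementation terms, the dial is configuration rather than architecture: autonomy stored as data that the organization can change at any time. A minimal sketch; the phase numbers and task names are hypothetical:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Autonomy as data: one dial per (department, task), adjustable any time.
AUTONOMY = {
    ("compliance", "policy_checks"):  1,  # AI suggests, humans do everything
    ("finance",    "invoice_drafts"): 2,  # AI drafts, humans approve
    ("support",    "tier1_tickets"):  4,  # AI handles, humans take escalations
    ("marketing",  "campaign_copy"):  5,  # AI runs it, humans review outcomes
}

def set_autonomy(department: str, task: str, phase: int) -&gt; None:
    """Turning the dial is a reversible config change, not a rewrite."""
    if not 0 &lt;= phase &lt;= 6:
        raise ValueError("phase must be between 0 (no AI) and 6 (full auto)")
    AUTONOMY[(department, task)] = phase
</code></pre></div></div>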

<p>The companies that figure this out will close the 94% gap without the anxiety, without the talent pipeline damage, and without the headlines about AI replacing workers.</p>

<p>The companies that don’t will keep oscillating between fear and hype, stuck at 33%.</p>

<hr />

<h2 id="what-ive-seen-in-practice">What I’ve Seen in Practice</h2>

<p>I’ve deployed AI in healthcare settings where a wrong decision could literally harm a patient. I’ve deployed it in financial services where regulatory violations mean millions in fines. I’ve deployed it in logistics where a missed deadline cascades into supply chain failures.</p>

<p>In every single case, the successful deployments had one thing in common: <strong>humans and AI working together, with clearly defined boundaries that evolved over time.</strong></p>

<p>Here’s what that looks like in practice:</p>

<table>
  <thead>
    <tr>
      <th>Timeline</th>
      <th>Human-AI Balance</th>
      <th>AI Cognitive Load</th>
      <th>What Happens</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Month 1</strong></td>
      <td>AI flags, humans review everything</td>
      <td>20%</td>
      <td>Building trust. AI processes claims, humans validate every flag.</td>
    </tr>
    <tr>
      <td><strong>Month 3</strong></td>
      <td>AI auto-resolves routine, humans handle complex</td>
      <td>50%</td>
      <td>Confidence growing. Team trusts AI on patterns it’s proven accurate on.</td>
    </tr>
    <tr>
      <td><strong>Month 6</strong></td>
      <td>AI handles end-to-end for most cases</td>
      <td>80%</td>
      <td>Humans focus on edge cases, appeals, and decisions requiring empathy.</td>
    </tr>
    <tr>
      <td><strong>Month 12</strong></td>
      <td>AI runs the volume, humans run the exceptions</td>
      <td>90%</td>
      <td>3x throughput, higher accuracy. No one was replaced. Everyone’s role evolved.</td>
    </tr>
  </tbody>
</table>

<p>That progression isn’t accidental. It’s designed. It requires infrastructure that lets you configure how much AI does, monitor how well it’s doing, and adjust as confidence builds.</p>

<p>AT&amp;T demonstrated this at massive scale. Their chief data officer, Andy Markus, <a href="https://venturebeat.com/orchestration/8-billion-tokens-a-day-forced-at-and-t-to-rethink-ai-orchestration-and-cut">told VentureBeat</a> that after restructuring their AI orchestration — specialized agents handling domain-specific work, with humans maintaining supervisory control and full audit trails — they achieved a 90% cost reduction and 3x throughput increase across 100,000+ employees.</p>

<p>Markus put it simply: “I believe the future of agentic AI is many, many, many small language models… We find small language models to be just about as accurate as a large language model on a given domain area.”</p>

<p>The point: it’s not about one giant AI replacing everyone. It’s about orchestrated, specialized AI working alongside humans, with the humans controlling the pace.</p>

<hr />

<h2 id="the-job-thats-actually-disappearing">The Job That’s Actually Disappearing</h2>

<p>Here’s my honest take on what AI is eliminating:</p>

<p><strong>The rote components of knowledge work.</strong> Not the job — the rote parts within the job. The data gathering, the formatting, the first-draft generation, the copying between systems, the scheduling, the summarizing.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Anatomy of a Knowledge Worker's Day
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Before AI:
┌──────────────────────────────┬─────────────────────┐
│ ████████████████████████████ │ ███████████████████ │
│ Rote Work (60%)              │ Judgment Work (40%) │
│ Gathering, formatting,       │ Analysis, strategy, │
│ summarizing, copying,        │ relationships,      │
│ scheduling, first drafts     │ creative thinking   │
└──────────────────────────────┴─────────────────────┘

After AI:
┌──────────┬──────────────────────────────────────────┐
│ ████████ │ ████████████████████████████████████████ │
│ AI (20%) │ Human Judgment Work (80%)                │
│ Handles  │ More analysis, deeper strategy,          │
│ the rote │ better relationships, higher-value work  │
└──────────┴──────────────────────────────────────────┘

Same person. Same role. Dramatically more impact.
</code></pre></div></div>

<p>These rote tasks make up 30-60% of most knowledge workers’ days. They’re necessary but not valuable. They don’t require human judgment, creativity, or empathy. They’re the reason knowledge workers feel busy but not productive.</p>

<p>AI is eliminating that layer. And what’s underneath — the analysis, the judgment, the relationship-building, the creative problem-solving, the strategic thinking — is what humans are actually good at.</p>

<p>The anxiety comes from conflating the task with the role. “AI can summarize a document” doesn’t mean “AI can replace the analyst.” It means the analyst stops summarizing and starts analyzing.</p>

<hr />

<h2 id="what-should-you-actually-do">What Should You Actually Do?</h2>

<p>Whether you’re an individual professional or a business leader, here’s what the data suggests:</p>

<h3 id="if-youre-a-professional">If You’re a Professional</h3>

<p><strong>1. Audit your own task mix.</strong> What percentage of your week is rote work vs. judgment work? The rote portion is what AI will handle. The judgment portion is your future.</p>

<table>
  <thead>
    <tr>
      <th>Task Type</th>
      <th>AI Impact</th>
      <th>Your Move</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Data gathering &amp; summarization</td>
      <td>Fully automatable</td>
      <td>Learn to review AI output, not create from scratch</td>
    </tr>
    <tr>
      <td>Template-based writing</td>
      <td>Mostly automatable</td>
      <td>Focus on strategy and messaging, let AI draft</td>
    </tr>
    <tr>
      <td>Scheduling &amp; coordination</td>
      <td>Fully automatable</td>
      <td>Redirect time to relationship-building</td>
    </tr>
    <tr>
      <td>Analysis &amp; interpretation</td>
      <td>AI-assisted, not replaced</td>
      <td>Develop deeper domain expertise</td>
    </tr>
    <tr>
      <td>Client relationships &amp; trust</td>
      <td>Not automatable</td>
      <td>This is your moat — invest heavily here</td>
    </tr>
    <tr>
      <td>Creative problem-solving</td>
      <td>Not automatable</td>
      <td>The skill that becomes more valuable every year</td>
    </tr>
    <tr>
      <td>Ethical judgment &amp; empathy</td>
      <td>Not automatable</td>
      <td>Uniquely human, increasingly critical</td>
    </tr>
  </tbody>
</table>

<p><strong>2. Learn to work with AI, not against it.</strong> The most valuable professionals in 2027 won’t be the ones who avoid AI or the ones replaced by it. They’ll be the ones who know how to direct AI effectively — reviewing its output, catching its mistakes, combining its speed with their judgment.</p>

<p><strong>3. Invest in the skills AI can’t replicate.</strong> Complex problem-solving across ambiguous situations. Building trust and relationships. Making ethical judgments. Understanding context that isn’t in the data. Communicating with empathy. These skills become more valuable, not less, in an AI-augmented world.</p>

<h3 id="if-youre-a-business-leader">If You’re a Business Leader</h3>

<p><strong>1. Don’t skip the progression.</strong> Going from “no AI” to “replaced the team” is how you get the headlines Phil Fersht is writing about. Go from Phase 1 to Phase 2 to Phase 3. Let confidence build naturally.</p>

<p><strong>2. Redesign roles, don’t eliminate them.</strong> Take the rote work off your team’s plate. Redirect that capacity toward higher-value work. You’ll get more output, better quality, and a team that’s engaged instead of anxious.</p>

<p><strong>3. Build the infrastructure for progressive autonomy.</strong> You need a system that lets you configure how much AI does per workflow, per department, per role — and adjust it over time. Without that infrastructure, you’re stuck choosing between 0% and 100%.</p>

<p><strong>4. Keep hiring junior talent.</strong> The short-term savings from not hiring entry-level workers will cost you dearly in 3-5 years. Hire them, give them AI tools, and watch them develop faster than any generation before them. A junior analyst with AI assistance can produce senior-level analysis on day one — while learning the judgment skills that make them irreplaceable over time.</p>

<hr />

<h2 id="the-optimistic-case">The Optimistic Case</h2>

<p>I’m an optimist, and I’ll tell you why.</p>

<p>Every major technological shift in history has followed the same pattern: initial anxiety, real short-term disruption, long-term expansion of human capability.</p>

<table>
  <thead>
    <tr>
      <th>Technology</th>
      <th>What People Feared</th>
      <th>What Actually Happened</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Printing press (1440)</td>
      <td>Scribes eliminated</td>
      <td>Publishers, journalists, educators, entire knowledge economy created</td>
    </tr>
    <tr>
      <td>Spreadsheet (1979)</td>
      <td>Accountants replaced</td>
      <td>Rote computation eliminated, accountants became strategic advisors</td>
    </tr>
    <tr>
      <td>Internet (1990s)</td>
      <td>Retail workers displaced</td>
      <td>E-commerce, digital marketing, SaaS, social media — millions of new jobs</td>
    </tr>
    <tr>
      <td>AI (2020s)</td>
      <td>Knowledge workers replaced</td>
      <td>?</td>
    </tr>
  </tbody>
</table>

<p>AI will follow the same arc. The transition won’t be painless — it never is. Some specific roles will shrink. Some specific tasks will disappear entirely. Some industries will be disrupted faster than others.</p>

<p>But on the other side of this transition, humans will be doing more meaningful, more creative, more impactful work than ever before. The rote work that consumes 40-60% of the average knowledge worker’s day will be handled by AI. What remains — and what grows — is the work that’s actually worth doing.</p>

<p>The question isn’t whether this happens. The Anthropic data shows it’s already happening. The question is whether we manage the transition intelligently — with progressive adoption, human-AI collaboration, and infrastructure that gives organizations control over the pace of change — or whether we keep oscillating between panic and hype while the gap between what’s possible and what’s deployed continues to grow.</p>

<p>I know which side I’m building for.</p>

<hr />

<h2 id="about-aictrlnet">About AICtrlNet</h2>

<p>AICtrlNet is AI-powered universal automation with governance built in. Three layers of automation reach — 10,000+ tools through platform adapters, any API through self-extending agents, any web app through browser automation. Six phases of autonomy so every team controls the pace of AI adoption. All governed, all auditable, all yours.</p>

<p>AI that automates anything. Governance for everything.</p>

<table>
  <tbody>
    <tr>
      <td><a href="https://github.com/Bodaty/aictrlnet-community">Explore AICtrlNet on GitHub</a></td>
      <td><a href="https://hitlai.net/trial">Start a free trial</a></td>
    </tr>
  </tbody>
</table>

<hr />

<p><em>Bobby Koritala is the founder of AICtrlNet and Bodaty. He holds multiple patents in AI systems and has spent nine years deploying AI in regulated industries including healthcare, finance, and logistics.</em></p>

<hr />

<h2 id="sources">Sources</h2>

<ul>
  <li><a href="https://www.anthropic.com/research/labor-market-impacts">Labor market impacts of AI: A new measure and early evidence — Anthropic Research</a></li>
  <li><a href="https://venturebeat.com/orchestration/8-billion-tokens-a-day-forced-at-and-t-to-rethink-ai-orchestration-and-cut">8 billion tokens a day forced AT&amp;T to rethink AI orchestration — and cut costs by 90% — VentureBeat</a></li>
  <li><a href="https://the-decoder.com/anthropics-new-study-shows-ai-is-nowhere-near-its-theoretical-job-disruption-potential/">Anthropic’s new study shows AI is nowhere near its theoretical job disruption potential — The Decoder</a></li>
  <li><a href="https://fortune.com/2026/03/06/ai-job-losses-report-anthropic-research-great-recession-for-white-collar-workers/">A ‘Great Recession for white-collar workers’ is absolutely possible — Fortune</a></li>
  <li><a href="https://www.axios.com/2026/03/05/anthropic-ai-jobs-claude">Anthropic launches tool to monitor jobs lost to AI systems — Axios</a></li>
</ul>]]></content><author><name>Bobby Koritala</name></author><category term="ai" /><category term="future-of-work" /><category term="thought-leadership" /><summary type="html"><![CDATA[Anthropic's landmark labor market study found AI could handle 94% of computer tasks — but only 33% are actually automated. The gap tells us everything about what's really happening with AI and jobs.]]></summary></entry><entry><title type="html">Every Enterprise Wants AI Agents. Only 5% Can Actually Deploy Them.</title><link href="https://aictrlnet.com/blog/2026/03/enterprise-ai-agent-deployment-gap/" rel="alternate" type="text/html" title="Every Enterprise Wants AI Agents. Only 5% Can Actually Deploy Them." /><published>2026-03-04T00:00:00+00:00</published><updated>2026-03-04T00:00:00+00:00</updated><id>https://aictrlnet.com/blog/2026/03/enterprise-ai-agent-deployment-gap</id><content type="html" xml:base="https://aictrlnet.com/blog/2026/03/enterprise-ai-agent-deployment-gap/"><![CDATA[<p>Gartner predicts 40% of enterprise applications will have AI agents embedded by the end of 2026 — up from less than 5% in 2025<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. MCP went from 100,000 to 97 million monthly downloads in a single year. The agent revolution isn’t coming. It’s here.</p>

<p>And yet.</p>

<p>PwC surveyed organizations globally and found that 79% have adopted AI agents in some form. But only 5% have made it to full production deployment<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>.</p>

<p>Let that sink in. Four out of five enterprises are experimenting with AI agents. One in twenty has actually shipped them at scale.</p>

<p>Gartner goes further: they predict more than 40% of agentic AI projects will be abandoned or significantly scaled back by the end of 2027 — destroyed by hidden costs, data quality issues, and governance failures<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>.</p>

<p>The gap between “we’re using AI agents” and “AI agents are running our business” is enormous. And it’s not a capability gap.</p>

<hr />

<h2 id="the-deployment-funnel">The Deployment Funnel</h2>

<p>Here’s what the enterprise AI agent journey looks like in practice:</p>

<div class="mermaid">
graph TD
    subgraph funnel["Enterprise AI Agent Deployment Funnel"]
        A["79% — Adopted AI agents<br />(PwC 2025)"] --&gt; B["~40% — Active pilot projects<br />(industry average)"]
        B --&gt; C["~15% — Considering full autonomy<br />(Gartner 2025)"]
        C --&gt; D["5% — Full production deployment<br />(PwC 2025)"]
    end

    subgraph blockers["Where Projects Die"]
        B -.- B1["Security review: 97% lack<br />proper access controls (IBM)"]
        C -.- C1["Compliance review: 63% have<br />no governance policies (IBM)"]
        D -.- D1["40%+ will be abandoned<br />by 2027 (Gartner)"]
    end

    style A fill:#e6f3ff
    style B fill:#fff0e6
    style C fill:#ffe6e6
    style D fill:#e6ffe6,stroke:#009900,stroke-width:2px
    style B1 fill:#f0f0f0,stroke:#ccc
    style C1 fill:#f0f0f0,stroke:#ccc
    style D1 fill:#f0f0f0,stroke:#ccc
</div>

<p>The funnel narrows dramatically — not because the AI gets worse, but because the governance questions get harder. And most organizations don’t have answers.</p>

<hr />

<h2 id="the-real-bottleneck">The Real Bottleneck</h2>

<p>It’s not the AI. The models are extraordinary. The frameworks — LangChain, CrewAI, AutoGen, Claude Code — are production-ready. The capability is there.</p>

<p>The bottleneck is everything <em>around</em> the AI.</p>

<p>IBM’s 2025 Cost of a Data Breach report tells the story in hard numbers. Shadow AI breaches — incidents involving unauthorized or ungoverned AI tools — cost organizations $4.63 million on average. That’s $670,000 more than standard data breaches<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>.</p>

<p>And here’s the damning detail: among organizations that experienced AI-related breaches, 97% lacked proper access controls for their AI tools<sup id="fnref:4:1" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>. It wasn’t that they had governance and it failed. They didn’t have governance at all.</p>

<p>The broader picture is worse. IBM found that 63% of organizations have no AI governance policies whatsoever<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>. No access controls. No audit trails. No policy enforcement. No human-in-the-loop for high-risk actions. Nothing.</p>

<p>Enterprises aren’t stuck because agents don’t work. They’re stuck because they can’t answer the fundamental question every executive, compliance officer, and board member asks:</p>

<p><strong>“Who’s accountable when this goes wrong?”</strong></p>

<p>When nobody can answer that question, the project doesn’t ship. No matter how good the demo was.</p>

<hr />

<h2 id="the-accountability-chain">The Accountability Chain</h2>

<p>Here’s what the question really looks like inside an enterprise:</p>

<div class="mermaid">
graph TD
    subgraph chain["The Accountability Chain"]
        CEO["CEO / Board<br />'What's our AI risk exposure?'"]
        CISO["CISO<br />'Can we audit every agent action?'"]
        CTO["CTO<br />'Can we control what agents do?'"]
        LEGAL["General Counsel<br />'Are we compliant with EU AI Act?'"]
        COMPLIANCE["Compliance<br />'Where's the audit trail?'"]
        PM["Product Manager<br />'This AI is amazing!'"]
    end

    PM --&gt;|proposes deployment| CTO
    CTO --&gt;|security review| CISO
    CISO --&gt;|legal review| LEGAL
    LEGAL --&gt;|compliance check| COMPLIANCE
    COMPLIANCE --&gt;|risk assessment| CEO

    CEO --&gt;|"No governance = No approval"| BLOCKED["Project Blocked"]
    CEO --&gt;|"Governance in place = Approved"| DEPLOYED["Production Deployment"]

    style PM fill:#e6ffe6
    style CTO fill:#fff0e6
    style CISO fill:#fff0e6
    style LEGAL fill:#ffe6e6
    style COMPLIANCE fill:#ffe6e6
    style CEO fill:#e6f3ff
    style BLOCKED fill:#ffcccc,stroke:#cc0000,stroke-width:2px
    style DEPLOYED fill:#ccffcc,stroke:#009900,stroke-width:2px
</div>

<p>The product manager who built the amazing AI demo is maybe 20% of the deployment decision. The other 80% is stakeholders who will never see the demo but will absolutely kill the project.</p>

<p>Every person in this chain needs answers that governance provides. Without it, the chain breaks at the first reviewer who can’t check their box.</p>

<hr />

<h2 id="the-counterintuitive-truth-governance-accelerates-deployment">The Counterintuitive Truth: Governance Accelerates Deployment</h2>

<p>Here’s what surprises people: governance doesn’t slow down deployment. It accelerates it.</p>

<p>Enterprises with a formal AI strategy report an 80% success rate on AI initiatives — compared to just 37% for those without one<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>. That’s not a marginal difference. That’s the difference between an AI program and an AI graveyard.</p>

<p>Joe Depa, EY’s Global Chief Innovation Officer, puts it bluntly:</p>

<blockquote>
  <p>“Governance really should be the way you get to ‘yes’ responsibly.”<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup></p>
</blockquote>

<p>The data backs this up at the financial level too. Google Cloud’s 2025 State of AI report found that early AI agent adopters who invested in governance infrastructure achieved 88% positive ROI — compared to 74% for generative AI projects broadly<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup>.</p>

<p>Think about it from the buyer’s perspective. The CTO who can show the board a full audit trail of every AI agent action — who approved it, what policy governed it, what the outcome was — that CTO gets the green light to expand AI across the organization. The CTO who says “trust us, the AI is accurate” gets a pilot that never graduates.</p>

<div class="mermaid">
graph LR
    subgraph without["Without Governance"]
        A1["Pilot"] --&gt; A2["Security Review"]
        A2 --&gt; A3["'Where's the audit trail?'"]
        A3 --&gt; A4["Project Stalled"]
        A4 --&gt; A5["Canceled<br />(avg 4.3 month delay)"]
    end

    subgraph with["With Governance"]
        B1["Pilot + Governance"] --&gt; B2["Security Review"]
        B2 --&gt; B3["'Here's the full audit trail'"]
        B3 --&gt; B4["Approved"]
        B4 --&gt; B5["Production<br />(80% success rate)"]
    end

    style A1 fill:#e6f3ff
    style A2 fill:#fff0e6
    style A3 fill:#ffe6e6
    style A4 fill:#ffcccc
    style A5 fill:#ffcccc,stroke:#cc0000,stroke-width:2px
    style B1 fill:#e6f3ff
    style B2 fill:#fff0e6
    style B3 fill:#e6ffe6
    style B4 fill:#e6ffe6
    style B5 fill:#ccffcc,stroke:#009900,stroke-width:2px
</div>

<p>Governance isn’t the brake. It’s the thing that gets AI past compliance, past legal, past the CISO’s desk, and into production.</p>

<hr />

<h2 id="the-regulatory-window-is-closing">The Regulatory Window Is Closing</h2>

<p>If the business case for governance isn’t enough, the regulatory case is about to become mandatory.</p>

<h3 id="eu-ai-act--august-2026">EU AI Act — August 2026</h3>

<p>The EU AI Act’s high-risk requirements take effect in August 2026<sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">9</a></sup>. If your AI agents are making decisions about employment, creditworthiness, access to essential services, or critical infrastructure, you’re in scope.</p>

<p>Penalties: up to <strong>35 million euros</strong> or <strong>7% of global annual revenue</strong>, whichever is higher. For context, 7% of a $10B company’s revenue is $700 million. This isn’t a parking ticket.</p>

<p>The Act requires:</p>
<ul>
  <li><strong>Human oversight mechanisms</strong> that allow operators to understand AI system capabilities and limitations, and to intervene or interrupt operation</li>
  <li><strong>Record-keeping</strong> sufficient for retrospective analysis of AI system outputs</li>
  <li><strong>Transparency</strong> about AI decision-making processes</li>
  <li><strong>Risk management systems</strong> proportionate to the AI’s impact</li>
</ul>

<h3 id="nist-ai-agent-standards--february-2026">NIST AI Agent Standards — February 2026</h3>

<p>In February 2026, NIST launched an AI Agent Standards Initiative — the first federal effort to define safety and governance standards specifically for autonomous AI agents<sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">10</a></sup>. Not models. Not chatbots. <em>Agents</em> — autonomous systems that take actions in the real world.</p>

<p>This matters because NIST frameworks have a way of becoming de facto requirements. The NIST AI Risk Management Framework is already referenced by sector regulators including the CFPB, FDA, SEC, and EEOC. When NIST publishes agent-specific standards, enterprise procurement teams will add them to their vendor evaluation checklists.</p>

<h3 id="the-governance-maturity-gap">The Governance Maturity Gap</h3>

<p>Deloitte found that only 1 in 5 companies has a mature governance model for AI agents<sup id="fnref:11" role="doc-noteref"><a href="#fn:11" class="footnote" rel="footnote">11</a></sup>. The other four are operating on a combination of ad-hoc policies, good intentions, and hope.</p>

<p>The regulatory window between “we should probably do something about AI governance” and “we needed governance yesterday” is closing fast.</p>

<hr />

<h2 id="the-market-is-telling-you">The Market Is Telling You</h2>

<p>When competitors, acquirers, and analysts all converge on the same message simultaneously, it’s not a coincidence. It’s a signal.</p>

<h3 id="n8n-25b-on-governance--automation">n8n: $2.5B on Governance + Automation</h3>

<p>n8n — the workflow automation platform — closed a Series C at a <strong>$2.5 billion valuation</strong> in October 2025<sup id="fnref:12" role="doc-noteref"><a href="#fn:12" class="footnote" rel="footnote">12</a></sup>. The pitch wasn’t just automation. It was automation with enterprise controls, self-hosted options, and governance features that let compliance teams say yes.</p>

<h3 id="uipath-workhq-launches-with-governance-as-headline">UiPath: WorkHQ Launches with Governance as Headline</h3>

<p>UiPath is launching WorkHQ in April 2026 with <strong>“governance guardrails”</strong> as a headline feature<sup id="fnref:13" role="doc-noteref"><a href="#fn:13" class="footnote" rel="footnote">13</a></sup>. The company that pioneered RPA — that built a $10B+ business on “let software robots do repetitive tasks” — is now leading with governance in its agentic AI positioning.</p>

<p>When UiPath puts governance in the headline, not the footnote, they’re telling you where the market is going.</p>

<h3 id="proofpoint-acquires-acuvity">Proofpoint Acquires Acuvity</h3>

<p>In February 2026, Proofpoint acquired Acuvity — a startup focused specifically on governance for the “agentic workspace”<sup id="fnref:14" role="doc-noteref"><a href="#fn:14" class="footnote" rel="footnote">14</a></sup>. A major cybersecurity company paid acquisition money for this exact problem.</p>

<p>Proofpoint protects over 80% of the Fortune 100. When they make an acquisition, it’s because their customers are asking for it. And their customers are asking for AI agent governance.</p>

<h3 id="the-convergence">The Convergence</h3>

<div class="mermaid">
graph TD
    subgraph signals["Market Convergence on AI Agent Governance"]
        G["Gartner<br />'40%+ of agentic projects will fail<br />without governance'"]
        N["n8n<br />'$2.5B valuation with<br />enterprise governance'"]
        U["UiPath<br />'WorkHQ: governance guardrails<br />as headline feature'"]
        P["Proofpoint<br />'Acquired Acuvity for<br />agent governance'"]
        NI["NIST<br />'AI Agent Standards<br />Initiative launched'"]
        EU["EU<br />'AI Act high-risk rules<br />effective Aug 2026'"]
    end

    G --&gt; SIGNAL["Signal: Governance is a<br />deployment prerequisite,<br />not a nice-to-have"]
    N --&gt; SIGNAL
    U --&gt; SIGNAL
    P --&gt; SIGNAL
    NI --&gt; SIGNAL
    EU --&gt; SIGNAL

    style G fill:#e6f3ff
    style N fill:#e6f3ff
    style U fill:#e6f3ff
    style P fill:#e6f3ff
    style NI fill:#fff0e6
    style EU fill:#fff0e6
    style SIGNAL fill:#e6ffe6,stroke:#009900,stroke-width:2px
</div>

<p>When Gartner, UiPath, cybersecurity companies, and federal regulators all converge on the same message — governance isn’t a nice-to-have for AI agents, it’s a deployment prerequisite — this isn’t a trend. It’s a requirement.</p>

<hr />

<h2 id="what-the-5-do-differently">What the 5% Do Differently</h2>

<p>So what separates the 5% that reach production from the 79% that adopt AI agents? Based on the data, three things:</p>

<h3 id="1-they-build-governance-from-day-one">1. They Build Governance from Day One</h3>

<p>Not as an afterthought. Not as a compliance checkbox after the pilot. From day one.</p>

<p>The 80% success rate for enterprises with formal AI strategies vs. 37% without isn’t just correlation. When you design for governance from the start, you answer the compliance questions before they’re asked. You build audit trails into the architecture, not bolted on as a logging layer. You define policies before the first agent action, not after the first incident.</p>

<h3 id="2-they-treat-agents-as-production-infrastructure">2. They Treat Agents as Production Infrastructure</h3>

<p>Itamar Golan, founder of Prompt Security, said it clearly: “Treat agents as production infrastructure, not a productivity app: least privilege, scoped tokens, allowlisted actions, strong authentication on every integration, and auditability end-to-end.”</p>

<p>The enterprises stuck at pilot are treating AI agents like experiments. The 5% in production are treating them like any other critical system — with access controls, monitoring, incident response plans, and governance frameworks.</p>

<h3 id="3-they-use-governance-to-accelerate-not-gate">3. They Use Governance to Accelerate, Not Gate</h3>

<p>The 5% don’t see governance as a gate that projects must pass through. They see it as infrastructure that makes deployment <em>faster</em>.</p>

<p>When every agent action is logged, the security review takes days, not months. When policies are enforced automatically, the compliance team signs off with confidence. When human oversight is built in, the board approves expansion.</p>

<p>Google Cloud’s finding — 88% positive ROI for governed agent deployments vs. 74% for ungoverned GenAI<sup id="fnref:8:1" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup> — tells the story. Governance doesn’t reduce the return on AI investment. It increases it.</p>

<hr />

<h2 id="the-path-forward">The Path Forward</h2>

<p>The enterprises that deploy AI agents at scale in 2026 and 2027 won’t be the ones with the best models. They won’t be the ones with the most sophisticated prompts. They won’t even be the ones that started earliest.</p>

<p>They’ll be the ones that figured out governance first.</p>

<p>Not governance as a bottleneck. Not governance as a compliance checkbox. Governance as the infrastructure that lets AI agents operate at scale — with every action logged, every policy enforced, every escalation handled, and every stakeholder confident that the system is under control.</p>

<p>Automate anything. Govern everything.</p>

<p>That’s not a contradiction. It’s the only way forward.</p>

<hr />

<h2 id="about-aictrlnet">About AICtrlNet</h2>

<p>AICtrlNet is AI-powered universal automation with governance built in. Three layers of automation reach — 10,000+ tools through platform adapters, any API through self-extending agents, any web app through browser automation. All governed.</p>

<p>The Runtime Gateway evaluates every AI agent action before execution — regardless of framework:</p>

<ul>
  <li><strong>Pre-action evaluation</strong>: ALLOW, DENY, or ESCALATE every action</li>
  <li><strong>ML-powered risk scoring</strong>: Prioritize human attention where it matters</li>
  <li><strong>Full audit trails</strong>: Every action, every decision, every override — documented</li>
  <li><strong>Policy enforcement</strong>: Define what agents can do by role, department, risk level</li>
  <li><strong>Six phases of autonomy</strong>: From AI-assisted to fully autonomous — each team chooses</li>
  <li><strong>Regulatory mapping</strong>: Built-in support for EU AI Act, NIST AI RMF, SOC 2</li>
</ul>

<p>Whether you’re running OpenClaw, Claude Code, LangChain agents, or custom autonomous systems, governance is built in from the start.</p>

<table>
  <tbody>
    <tr>
      <td><strong><a href="https://github.com/Bodaty/aictrlnet-community">Try the open source Community Edition</a></strong></td>
      <td><strong><a href="https://hitlai.net/trial">Start a free 14-day trial</a></strong></td>
    </tr>
  </tbody>
</table>

<hr />

<p><em>Bobby Koritala is the founder of AICtrlNet and holds multiple AI patents. He’s spent 9 years building AI systems in healthcare, finance, and logistics.</em></p>

<hr />

<h2 id="references">References</h2>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>Gartner. (2025). “By the end of 2026, 40% of enterprise applications will have agentic AI embedded, up from less than 5% in 2025.” <a href="https://www.gartner.com/en/articles/intelligent-agent-in-ai">gartner.com</a> <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p>PwC. (2025). “2025 Global AI Study: AI agents have arrived — but most are stuck at pilot.” 79% adoption, 5% at full production. <a href="https://www.pwc.com/gx/en/issues/artificial-intelligence/global-ai-study.html">pwc.com</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p>Gartner. (2025). “Gartner Predicts Over 40% of Agentic AI Projects Will Be Canceled by End of 2027.” <a href="https://www.gartner.com/en/newsroom/press-releases/2025-03-04-gartner-predicts-agentic-ai-project-failures">gartner.com</a> <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p>IBM. (2025). “Cost of a Data Breach Report 2025.” Shadow AI breaches cost $4.63M on average — $670K more than standard breaches. 97% of AI-breached organizations lacked proper access controls. <a href="https://www.ibm.com/reports/data-breach">ibm.com</a> <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:4:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:5" role="doc-endnote">
      <p>IBM. (2025). “Global AI Adoption Index 2025.” 63% of organizations lack AI governance policies. <a href="https://www.ibm.com/thought-leadership/institute-business-value/en-us/report/ai-adoption">ibm.com</a> <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:6" role="doc-endnote">
      <p>Enterprise AI Survey. (2025). Organizations with a formal AI strategy report 80% success rate vs 37% without. Cited in multiple industry analyses including MIT Sloan Management Review and BCG. <a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:7" role="doc-endnote">
      <p>Joe Depa, EY Global Chief Innovation Officer. (2026). “Governance really should be the way you get to ‘yes’ responsibly.” Quoted in EY Global AI Barometer. <a href="#fnref:7" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:8" role="doc-endnote">
      <p>Google Cloud. (2025). “The State of AI in 2025.” Early AI agent adopters achieved 88% positive ROI vs 74% for GenAI broadly. <a href="https://cloud.google.com/blog/transform/state-of-ai-report-2025">cloud.google.com</a> <a href="#fnref:8" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:8:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:9" role="doc-endnote">
      <p>European Commission. (2024). “The EU Artificial Intelligence Act.” High-risk AI system requirements effective August 2026. Penalties up to 35M euros or 7% of global annual revenue. <a href="https://artificialintelligenceact.eu/">artificialintelligenceact.eu</a> <a href="#fnref:9" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:10" role="doc-endnote">
      <p>NIST. (2026). “NIST Launches AI Agent Standards Initiative.” First federal standards effort for autonomous AI agent safety and governance. <a href="https://www.nist.gov/artificial-intelligence/ai-agent-standards">nist.gov</a> <a href="#fnref:10" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:11" role="doc-endnote">
      <p>Deloitte. (2026). “The State of AI in the Enterprise, 8th Edition.” Only 1 in 5 companies has a mature AI agent governance model. <a href="https://www.deloitte.com/us/en/what-we-do/capabilities/applied-artificial-intelligence/content/state-of-ai-in-the-enterprise.html">deloitte.com</a> <a href="#fnref:11" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:12" role="doc-endnote">
      <p>n8n. (2025). “n8n raises Series C at $2.5B valuation.” October 2025. <a href="https://n8n.io/blog/series-c/">n8n.io</a> <a href="#fnref:12" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:13" role="doc-endnote">
      <p>UiPath. (2026). “UiPath announces WorkHQ, launching April 2026 with governance guardrails as a headline feature.” <a href="https://www.uipath.com/newsroom">uipath.com</a> <a href="#fnref:13" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:14" role="doc-endnote">
      <p>Proofpoint. (2026). “Proofpoint acquires Acuvity to bring AI agent governance to the enterprise.” February 2026. <a href="https://www.proofpoint.com/us/newsroom/press-releases/proofpoint-acquires-acuvity">proofpoint.com</a> <a href="#fnref:14" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Bobby Koritala</name></author><category term="ai-agents" /><category term="ai-governance" /><category term="enterprise" /><category term="deployment" /><summary type="html"><![CDATA[79% of enterprises have adopted AI agents. Only 5% made it to production. The bottleneck isn't capability — it's governance. Here's the data, the regulatory reality, and what the 5% do differently.]]></summary></entry><entry><title type="html">Govern Any AI Agent in 5 Minutes: A Technical Guide</title><link href="https://aictrlnet.com/blog/2026/02/govern-any-ai-agent-in-5-minutes/" rel="alternate" type="text/html" title="Govern Any AI Agent in 5 Minutes: A Technical Guide" /><published>2026-02-26T00:00:00+00:00</published><updated>2026-02-26T00:00:00+00:00</updated><id>https://aictrlnet.com/blog/2026/02/govern-any-ai-agent-in-5-minutes</id><content type="html" xml:base="https://aictrlnet.com/blog/2026/02/govern-any-ai-agent-in-5-minutes/"><![CDATA[<h2 id="unlock-enterprise-ai-automation-in-5-minutes">Unlock Enterprise AI Automation in 5 Minutes</h2>

<p>Your team is using AI agents — OpenClaw, Claude Code, LangChain, custom tools. They’re automating incredible things: writing code, managing infrastructure, processing data, driving workflows.</p>

<p>The capability is real. Now make it enterprise-ready.</p>

<p>This guide shows you how to connect any AI agent to the AICtrlNet Runtime Gateway in 5 minutes — so your team keeps the automation power, and your enterprise gets the visibility, control, and audit trails it needs to say yes.</p>

<hr />

<h2 id="what-youll-get">What You’ll Get</h2>

<p>After completing this guide, your AI agents are enterprise-ready:</p>

<ul>
  <li><strong>Every agent action</strong> is evaluated before execution</li>
  <li><strong>Risk scores</strong> (0.0-1.0) prioritize what needs human attention</li>
  <li><strong>ALLOW/DENY/ESCALATE</strong> decisions — AI keeps moving, governance runs inline</li>
  <li><strong>Complete audit trail</strong> — the answer to every compliance question</li>
  <li><strong>One-click suspend</strong> — immediate control when you need it</li>
</ul>

<p>The result: your team automates faster because governance removes the objections that block deployment.</p>

<hr />

<h2 id="prerequisites">Prerequisites</h2>

<ul>
  <li>An autonomous AI agent running (OpenClaw, Claude Code, custom agent — any of them)</li>
  <li>Python 3.9+</li>
  <li>An AICtrlNet account (<a href="https://hitlai.net/trial">free trial</a> works)</li>
</ul>

<hr />

<h2 id="step-1-install-the-sdk-30-seconds">Step 1: Install the SDK (30 seconds)</h2>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip <span class="nb">install </span>aictrlnet-runtime-sdk
</code></pre></div></div>

<hr />

<h2 id="step-2-get-your-api-credentials-60-seconds">Step 2: Get Your API Credentials (60 seconds)</h2>

<ol>
  <li>Log into <a href="https://hitlai.net">hitlai.net</a></li>
  <li>Navigate to <strong>Settings → API Keys</strong></li>
  <li>Click <strong>Create API Key</strong></li>
  <li>Copy the key (you won’t see it again)</li>
</ol>

<p>Set it as an environment variable:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">AICTRLNET_API_KEY</span><span class="o">=</span><span class="s2">"your-api-key-here"</span>
<span class="nb">export </span><span class="nv">AICTRLNET_API_URL</span><span class="o">=</span><span class="s2">"https://api.aictrlnet.com"</span>
</code></pre></div></div>

<hr />

<h2 id="step-3-register-your-agent-60-seconds">Step 3: Register Your Agent (60 seconds)</h2>

<p>Create a file called <code class="language-plaintext highlighter-rouge">register_agent.py</code>:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">asyncio</span>
<span class="kn">from</span> <span class="nn">aictrlnet_runtime_sdk</span> <span class="kn">import</span> <span class="p">(</span>
    <span class="n">AsyncAICtrlNetClient</span><span class="p">,</span>
    <span class="n">RuntimeRegistrationRequest</span><span class="p">,</span>
    <span class="n">AICtrlNetConfig</span>
<span class="p">)</span>

<span class="k">async</span> <span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="n">config</span> <span class="o">=</span> <span class="n">AICtrlNetConfig</span><span class="p">.</span><span class="n">from_env</span><span class="p">()</span>
    <span class="n">client</span> <span class="o">=</span> <span class="n">AsyncAICtrlNetClient</span><span class="p">(</span><span class="n">config</span><span class="p">)</span>

    <span class="c1"># Register your agent — works with any type
</span>    <span class="n">registration</span> <span class="o">=</span> <span class="k">await</span> <span class="n">client</span><span class="p">.</span><span class="n">register</span><span class="p">(</span><span class="n">RuntimeRegistrationRequest</span><span class="p">(</span>
        <span class="n">runtime_type</span><span class="o">=</span><span class="s">"openclaw"</span><span class="p">,</span>  <span class="c1"># or "claude_code", "cursor", "langchain", "custom"
</span>        <span class="n">instance_name</span><span class="o">=</span><span class="s">"my-dev-machine"</span><span class="p">,</span>
        <span class="n">metadata</span><span class="o">=</span><span class="p">{</span>
            <span class="s">"owner"</span><span class="p">:</span> <span class="s">"engineering-team"</span><span class="p">,</span>
            <span class="s">"environment"</span><span class="p">:</span> <span class="s">"development"</span>
        <span class="p">}</span>
    <span class="p">))</span>

    <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"Registered! Runtime ID: </span><span class="si">{</span><span class="n">registration</span><span class="p">.</span><span class="n">runtime_id</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>

    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s">".aictrlnet_runtime_id"</span><span class="p">,</span> <span class="s">"w"</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
        <span class="n">f</span><span class="p">.</span><span class="n">write</span><span class="p">(</span><span class="n">registration</span><span class="p">.</span><span class="n">runtime_id</span><span class="p">)</span>

<span class="n">asyncio</span><span class="p">.</span><span class="n">run</span><span class="p">(</span><span class="n">main</span><span class="p">())</span>
</code></pre></div></div>

<p>Run it:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>python register_agent.py
<span class="c"># Registered! Runtime ID: rt_abc123...</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">runtime_type</code> tells the gateway what kind of agent it’s governing — but the governance pipeline is the same regardless. ALLOW/DENY/ESCALATE works identically whether the action came from OpenClaw or your custom Python script.</p>

<hr />

<h2 id="step-4-wrap-agent-actions-with-governance-120-seconds">Step 4: Wrap Agent Actions with Governance (120 seconds)</h2>

<p>This is the key part. Wrap your agent’s action execution with the governance gateway.</p>

<p>Create <code class="language-plaintext highlighter-rouge">governed_agent.py</code>:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">asyncio</span>
<span class="kn">from</span> <span class="nn">aictrlnet_runtime_sdk</span> <span class="kn">import</span> <span class="p">(</span>
    <span class="n">AsyncAICtrlNetClient</span><span class="p">,</span>
    <span class="n">GovernanceGateway</span><span class="p">,</span>
    <span class="n">AICtrlNetConfig</span>
<span class="p">)</span>

<span class="k">async</span> <span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="n">config</span> <span class="o">=</span> <span class="n">AICtrlNetConfig</span><span class="p">.</span><span class="n">from_env</span><span class="p">()</span>
    <span class="n">client</span> <span class="o">=</span> <span class="n">AsyncAICtrlNetClient</span><span class="p">(</span><span class="n">config</span><span class="p">)</span>

    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s">".aictrlnet_runtime_id"</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
        <span class="n">runtime_id</span> <span class="o">=</span> <span class="n">f</span><span class="p">.</span><span class="n">read</span><span class="p">().</span><span class="n">strip</span><span class="p">()</span>

    <span class="c1"># Create governance gateway
</span>    <span class="n">gateway</span> <span class="o">=</span> <span class="n">GovernanceGateway</span><span class="p">(</span>
        <span class="n">client</span><span class="o">=</span><span class="n">client</span><span class="p">,</span>
        <span class="n">runtime_id</span><span class="o">=</span><span class="n">runtime_id</span>
    <span class="p">)</span>

    <span class="c1"># Your actual execution logic (whatever your agent does)
</span>    <span class="k">async</span> <span class="k">def</span> <span class="nf">execute_command</span><span class="p">(</span><span class="n">cmd</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span>
        <span class="kn">import</span> <span class="nn">subprocess</span>
        <span class="n">result</span> <span class="o">=</span> <span class="n">subprocess</span><span class="p">.</span><span class="n">run</span><span class="p">(</span><span class="n">cmd</span><span class="p">,</span> <span class="n">shell</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">capture_output</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">text</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">result</span><span class="p">.</span><span class="n">stdout</span>

    <span class="c1"># Wrap with governance — every call now gets evaluated
</span>    <span class="n">governed_execute</span> <span class="o">=</span> <span class="n">gateway</span><span class="p">.</span><span class="n">wrap</span><span class="p">(</span>
        <span class="n">action_type</span><span class="o">=</span><span class="s">"shell_command"</span><span class="p">,</span>
        <span class="n">func</span><span class="o">=</span><span class="n">execute_command</span>
    <span class="p">)</span>

    <span class="c1"># This command will be evaluated BEFORE execution
</span>    <span class="k">try</span><span class="p">:</span>
        <span class="n">result</span> <span class="o">=</span> <span class="k">await</span> <span class="n">governed_execute</span><span class="p">(</span><span class="s">"ls -la /tmp"</span><span class="p">)</span>
        <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"Result: </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
    <span class="k">except</span> <span class="n">gateway</span><span class="p">.</span><span class="n">ActionDenied</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
        <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"Denied: </span><span class="si">{</span><span class="n">e</span><span class="p">.</span><span class="n">reason</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
    <span class="k">except</span> <span class="n">gateway</span><span class="p">.</span><span class="n">ActionEscalated</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
        <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"Needs approval: </span><span class="si">{</span><span class="n">e</span><span class="p">.</span><span class="n">approval_url</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>

<span class="n">asyncio</span><span class="p">.</span><span class="n">run</span><span class="p">(</span><span class="n">main</span><span class="p">())</span>
</code></pre></div></div>

<p>That’s it. Every action your agent takes now passes through the Runtime Gateway before executing.</p>

<hr />

<h2 id="step-5-view-in-dashboard-30-seconds">Step 5: View in Dashboard (30 seconds)</h2>

<ol>
  <li>Go to <a href="https://hitlai.net">hitlai.net</a> and open the Runtime Gateway</li>
  <li>You’ll see your registered agent instance</li>
  <li>Click on it to see:
    <ul>
      <li>All actions evaluated</li>
      <li>Risk scores</li>
      <li>Decisions (ALLOW / DENY / ESCALATE)</li>
      <li>Full audit trail</li>
    </ul>
  </li>
</ol>

<hr />

<h2 id="what-just-happened">What Just Happened</h2>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  Agent wants to run: ls -la /tmp
                │
                ▼
  ┌─────────────────────────────────┐
  │   AICtrlNet Runtime Gateway     │
  │                                 │
  │   1. Receive action request     │
  │   2. Evaluate through pipeline  │
  │      (Quality, Governance,      │
  │       Security, Monitoring)     │
  │   3. Calculate risk score       │
  │   4. Apply policy               │
  │   5. Log to audit trail         │
  └─────────────────────────────────┘
                │
        ┌───────┼───────┐
        ▼       ▼       ▼
     ALLOW    DENY   ESCALATE
   (execute) (block) (route to
                      human)
</code></pre></div></div>

<p>This is tool-agnostic. The gateway doesn’t know or care which agent generated the action. It evaluates the action itself — what it does, what it touches, what the risk level is.</p>
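
<p>To make that concrete, here is a minimal sketch of wrapping two unrelated action types with the same gateway. It reuses the <code class="language-plaintext highlighter-rouge">gateway.wrap()</code> call from Step 4; the wrapped functions themselves (<code class="language-plaintext highlighter-rouge">send_slack_message</code>, <code class="language-plaintext highlighter-rouge">write_report</code>) are hypothetical stand-ins for whatever your agent actually does.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Runs inside the main() coroutine from Step 4, where `gateway` already
# exists. send_slack_message and write_report are hypothetical stand-ins
# for whatever your agent actually does.

async def send_slack_message(channel: str, text: str) -&gt; str:
    # your agent's real Slack integration goes here
    return f"sent to {channel}"

async def write_report(path: str, content: str) -&gt; str:
    with open(path, "w") as f:
        f.write(content)
    return path

async def demo(gateway):
    # Two different action types, one governance pipeline.
    governed_slack = gateway.wrap(action_type="send_message", func=send_slack_message)
    governed_write = gateway.wrap(action_type="file_write", func=write_report)

    # Both calls are evaluated (ALLOW / DENY / ESCALATE) before they run.
    await governed_slack("#ops", "Deploy finished")
    await governed_write("/tmp/report.md", "# Weekly report")
</code></pre></div></div>

<p>Same pipeline, same dashboard, regardless of what the wrapped function does.</p>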

<hr />

<h2 id="default-policies">Default Policies</h2>

<p>Out of the box, the gateway uses sensible defaults:</p>

<table>
  <thead>
    <tr>
      <th>Action Type</th>
      <th>Default Policy</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Read operations</td>
      <td>ALLOW</td>
    </tr>
    <tr>
      <td>Write to temp directories</td>
      <td>ALLOW</td>
    </tr>
    <tr>
      <td>Write elsewhere</td>
      <td>ESCALATE</td>
    </tr>
    <tr>
      <td>Network requests</td>
      <td>ALLOW with logging</td>
    </tr>
    <tr>
      <td>Destructive commands (rm, drop, delete)</td>
      <td>ESCALATE</td>
    </tr>
    <tr>
      <td>Credential access</td>
      <td>DENY</td>
    </tr>
  </tbody>
</table>

<p>Customize these in <strong>Settings → Governance Policies</strong> — per department, per team, per agent type, per risk level.</p>
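
<p>If it helps to see the defaults as logic rather than a table, here is a plain-Python restatement. This is an illustration only, not the gateway's actual implementation; the real pipeline also applies risk scoring and any custom policies you define, and the fallback for unknown action types is our assumption.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># A plain-Python restatement of the default policy table above.
# Illustration only: the real gateway also applies risk scoring
# and any custom policies you define.

DESTRUCTIVE = ("rm ", "drop ", "delete ")

def default_decision(action_type: str, target: str = "") -&gt; str:
    if action_type == "credential_access":
        return "DENY"
    if any(word in target.lower() for word in DESTRUCTIVE):
        return "ESCALATE"  # destructive commands get a human
    if action_type == "read":
        return "ALLOW"
    if action_type == "write":
        return "ALLOW" if target.startswith("/tmp") else "ESCALATE"
    if action_type == "network_request":
        return "ALLOW"  # allowed, but always logged
    return "ESCALATE"  # assumption: unknown actions go to a human

assert default_decision("write", "/tmp/scratch.txt") == "ALLOW"
assert default_decision("shell_command", "rm -rf /data") == "ESCALATE"
assert default_decision("credential_access", "~/.aws/credentials") == "DENY"
</code></pre></div></div>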

<hr />

<h2 id="scaling-to-your-whole-team">Scaling to Your Whole Team</h2>

<p>Rolling this out to multiple developers or multiple agent types:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">asyncio</span>
<span class="kn">from</span> <span class="nn">aictrlnet_runtime_sdk</span> <span class="kn">import</span> <span class="n">AsyncAICtrlNetClient</span><span class="p">,</span> <span class="n">AICtrlNetConfig</span><span class="p">,</span> <span class="n">RuntimeRegistrationRequest</span>

<span class="k">async</span> <span class="k">def</span> <span class="nf">register_team</span><span class="p">():</span>
    <span class="n">config</span> <span class="o">=</span> <span class="n">AICtrlNetConfig</span><span class="p">.</span><span class="n">from_env</span><span class="p">()</span>
    <span class="n">client</span> <span class="o">=</span> <span class="n">AsyncAICtrlNetClient</span><span class="p">(</span><span class="n">config</span><span class="p">)</span>

    <span class="n">agents</span> <span class="o">=</span> <span class="p">[</span>
        <span class="p">{</span><span class="s">"name"</span><span class="p">:</span> <span class="s">"alice-openclaw"</span><span class="p">,</span> <span class="s">"type"</span><span class="p">:</span> <span class="s">"openclaw"</span><span class="p">,</span> <span class="s">"owner"</span><span class="p">:</span> <span class="s">"alice@company.com"</span><span class="p">},</span>
        <span class="p">{</span><span class="s">"name"</span><span class="p">:</span> <span class="s">"bob-claude-code"</span><span class="p">,</span> <span class="s">"type"</span><span class="p">:</span> <span class="s">"claude_code"</span><span class="p">,</span> <span class="s">"owner"</span><span class="p">:</span> <span class="s">"bob@company.com"</span><span class="p">},</span>
        <span class="p">{</span><span class="s">"name"</span><span class="p">:</span> <span class="s">"carol-custom"</span><span class="p">,</span> <span class="s">"type"</span><span class="p">:</span> <span class="s">"custom"</span><span class="p">,</span> <span class="s">"owner"</span><span class="p">:</span> <span class="s">"carol@company.com"</span><span class="p">},</span>
        <span class="p">{</span><span class="s">"name"</span><span class="p">:</span> <span class="s">"ci-langchain"</span><span class="p">,</span> <span class="s">"type"</span><span class="p">:</span> <span class="s">"langchain"</span><span class="p">,</span> <span class="s">"owner"</span><span class="p">:</span> <span class="s">"devops@company.com"</span><span class="p">},</span>
    <span class="p">]</span>

    <span class="k">for</span> <span class="n">agent</span> <span class="ow">in</span> <span class="n">agents</span><span class="p">:</span>
        <span class="n">reg</span> <span class="o">=</span> <span class="k">await</span> <span class="n">client</span><span class="p">.</span><span class="n">register</span><span class="p">(</span><span class="n">RuntimeRegistrationRequest</span><span class="p">(</span>
            <span class="n">runtime_type</span><span class="o">=</span><span class="n">agent</span><span class="p">[</span><span class="s">"type"</span><span class="p">],</span>
            <span class="n">instance_name</span><span class="o">=</span><span class="n">agent</span><span class="p">[</span><span class="s">"name"</span><span class="p">],</span>
            <span class="n">metadata</span><span class="o">=</span><span class="p">{</span><span class="s">"owner"</span><span class="p">:</span> <span class="n">agent</span><span class="p">[</span><span class="s">"owner"</span><span class="p">],</span> <span class="s">"department"</span><span class="p">:</span> <span class="s">"engineering"</span><span class="p">}</span>
        <span class="p">))</span>
        <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"Registered </span><span class="si">{</span><span class="n">agent</span><span class="p">[</span><span class="s">'name'</span><span class="p">]</span><span class="si">}</span><span class="s">: </span><span class="si">{</span><span class="n">reg</span><span class="p">.</span><span class="n">runtime_id</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>

<span class="n">asyncio</span><span class="p">.</span><span class="n">run</span><span class="p">(</span><span class="n">register_team</span><span class="p">())</span>
</code></pre></div></div>

<p>Different agents, different owners, same governance pipeline. One dashboard to see everything.</p>

<hr />

<h2 id="autonomy-levels-per-department">Autonomy Levels per Department</h2>

<p>The Runtime Gateway supports per-department autonomy policies:</p>

<ul>
  <li><strong>Engineering</strong>: Near-full autonomy for dev environments, supervised for production</li>
  <li><strong>Legal</strong>: AI-assisted only — AI drafts, humans approve everything</li>
  <li><strong>Marketing</strong>: Full automation for content workflows, supervised for budget decisions</li>
  <li><strong>Support</strong>: Full automation for Tier 1 tickets, supervised for enterprise customers</li>
</ul>

<p>Configure this in the dashboard or via the policy API. Each department gets the autonomy level that matches their risk tolerance.</p>
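
<p>As a sketch of what scripted configuration might look like: the <code class="language-plaintext highlighter-rouge">set_autonomy_policy</code> method and the rule format below are assumptions for illustration, not documented API. Check <a href="https://docs.aictrlnet.com">docs.aictrlnet.com</a> for the real policy endpoints.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical sketch: set_autonomy_policy and the rule format below are
# assumptions for illustration, not documented API. Check the docs for
# the real policy endpoints before relying on any of this.
import asyncio
from aictrlnet_runtime_sdk import AsyncAICtrlNetClient, AICtrlNetConfig

DEPARTMENT_POLICIES = {
    "engineering": {"environment:development": "autonomous",
                    "environment:production": "supervised"},
    "legal":       {"*": "ai_assisted"},
    "marketing":   {"workflow:content": "autonomous",
                    "workflow:budget": "supervised"},
}

async def apply_policies():
    client = AsyncAICtrlNetClient(AICtrlNetConfig.from_env())
    for department, rules in DEPARTMENT_POLICIES.items():
        await client.set_autonomy_policy(department=department, rules=rules)

asyncio.run(apply_policies())
</code></pre></div></div>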

<hr />

<h2 id="next-steps">Next Steps</h2>

<ol>
  <li><strong>Set up team policies</strong> — define what should ALLOW, DENY, or ESCALATE per team</li>
  <li><strong>Configure notifications</strong> — get Slack/email alerts for ESCALATE decisions (a client-side sketch follows this list)</li>
  <li><strong>Enable ML risk scoring</strong> — let the system learn your patterns (Business tier)</li>
  <li><strong>Connect more agents</strong> — the same gateway works for every tool your team adopts</li>
  <li><strong>Explore three-layer reach</strong> — Platform adapters (10,000+ tools), self-extending agents (any API), browser automation (any web app)</li>
</ol>
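
<p>For step 2 above, the dashboard sends Slack and email alerts natively. If you want a client-side fallback while you wire that up, a minimal sketch looks like this; it pairs the <code class="language-plaintext highlighter-rouge">ActionEscalated</code> exception from Step 4 with Slack's standard incoming-webhook JSON format (the webhook URL is a placeholder).</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Client-side fallback only: the dashboard sends these alerts natively.
# SLACK_WEBHOOK_URL is a placeholder for your own incoming webhook.
import json
import urllib.request

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"

def notify_escalation(action_type: str, approval_url: str) -&gt; None:
    payload = {"text": f"AI agent action `{action_type}` needs approval: {approval_url}"}
    req = urllib.request.Request(
        SLACK_WEBHOOK_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

# Drop into the Step 4 try/except:
#     except gateway.ActionEscalated as e:
#         notify_escalation("shell_command", e.approval_url)
</code></pre></div></div>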

<hr />

<h2 id="about-aictrlnet">About AICtrlNet</h2>

<p>AICtrlNet is AI-powered universal automation with governance built in. Three layers of automation reach — 10,000+ tools through platform adapters, any API through self-extending agents, any web app through browser automation. Whether you’re running OpenClaw, Claude Code, or custom agents, the Runtime Gateway gives you the governance that lets your enterprise say yes.</p>

<p>AI that automates anything. Governance for everything.</p>

<p><strong>Start your free 14-day trial</strong>: <a href="https://hitlai.net/trial">hitlai.net/trial</a></p>

<hr />

<ul>
  <li><strong>Open Source</strong>: <a href="https://github.com/Bodaty/aictrlnet-community">github.com/Bodaty/aictrlnet-community</a> — Runtime Gateway, MIT licensed</li>
  <li><strong>Free Trial</strong>: <a href="https://hitlai.net/trial">hitlai.net/trial</a> — 14 days, full governance features</li>
  <li><strong>Documentation</strong>: <a href="https://docs.aictrlnet.com">docs.aictrlnet.com</a></li>
</ul>

<hr />

<p><em>Questions? Open a discussion on <a href="https://github.com/Bodaty/aictrlnet-community/discussions">GitHub</a> or reach out to support@aictrlnet.com.</em></p>]]></content><author><name>Bobby Koritala</name></author><category term="tutorial" /><category term="ai-automation" /><category term="ai-agents" /><summary type="html"><![CDATA[Your team is using AI agents — OpenClaw, Claude Code, LangChain, custom tools. They're automating incredible things. This guide makes them enterprise-ready in 5 minutes.]]></summary></entry><entry><title type="html">The OpenClaw Moment Has Evolved: AI That Automates Anything Is Here</title><link href="https://aictrlnet.com/blog/2026/02/the-openclaw-moment-what-it-means-now/" rel="alternate" type="text/html" title="The OpenClaw Moment Has Evolved: AI That Automates Anything Is Here" /><published>2026-02-26T00:00:00+00:00</published><updated>2026-02-26T00:00:00+00:00</updated><id>https://aictrlnet.com/blog/2026/02/the-openclaw-moment-what-it-means-now</id><content type="html" xml:base="https://aictrlnet.com/blog/2026/02/the-openclaw-moment-what-it-means-now/"><![CDATA[<p>Something remarkable happened over the past few months. An Austrian engineer named Peter Steinberger built a hobby project called “Clawdbot” in November 2025. By late January 2026, it had evolved into OpenClaw — and amassed over 200,000 GitHub stars.</p>

<p>OpenClaw isn’t just another chatbot. It has “hands.” It can execute shell commands, manage local files, and navigate messaging platforms like WhatsApp and Slack with persistent, root-level permissions.</p>

<p>For the first time, autonomous AI agents have proven they can automate almost anything a developer can do. The capability is real. The productivity gains are extraordinary.</p>

<p>And now the question isn’t whether AI agents can automate your work. It’s how your enterprise harnesses that power responsibly.</p>

<hr />

<h2 id="the-adoption-explosion--and-the-governance-gap">The Adoption Explosion — and the Governance Gap</h2>

<p>“It’s not an isolated, rare thing; it’s happening across almost every organization,” says Pukar Hamal, CEO of SecurityPal. “There are companies finding engineers who have given OpenClaw access to their devices.”</p>

<p>Cisco’s AI Threat &amp; Security Research team called OpenClaw “groundbreaking” from a capability perspective. The productivity gains are real — developers report 10x acceleration on routine tasks.</p>

<p>But here’s the gap: employees are adopting AI automation faster than enterprises can govern it. No visibility into what agents are doing. No audit trails. No way for IT or security teams to know what’s happening.</p>

<p>This isn’t a reason to block AI agents. It’s a reason to govern them — so your teams get the automation power they want, and your enterprise gets the visibility it needs.</p>

<hr />

<h2 id="five-takeaways-from-the-openclaw-moment">Five Takeaways from the OpenClaw Moment</h2>

<p>VentureBeat recently published an analysis of what this means for enterprises<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">1</a></sup>. Here’s what stood out — and what it means for anyone building or deploying AI systems.</p>

<h3 id="1-you-need-less-preparation-than-you-think">1. You Need Less Preparation Than You Think</h3>

<p>The prevailing wisdom suggested enterprises needed massive infrastructure overhauls and perfectly curated data sets before AI could be useful. OpenClaw shattered that myth.</p>

<p>“There is a surprising insight there: you actually don’t need to do too much preparation,” says Tanmai Gopal, Co-founder &amp; CEO at PromptQL. “Everybody thought we needed new software and new AI-native companies to come and do things. It will catalyze more disruption as leadership realizes that we don’t actually need to prep so much to get AI to be productive.”</p>

<p>Modern AI models can navigate messy, uncurated data by treating intelligence as a service. The barrier to entry just collapsed.</p>

<h3 id="2-governance-enables-adoption-not-the-opposite">2. Governance Enables Adoption, Not the Opposite</h3>

<p>Without governance, AI automation stalls at the pilot stage. Without audit trails, compliance blocks deployment. Without risk scoring, every action needs human review — defeating the purpose of automation.</p>

<p>Organizations like AIUC are already providing certification standards (AIUC-1) that enterprises can put agents through to obtain insurance coverage. Governance isn’t a tax on AI automation — it’s the permission slip that lets enterprises deploy it at scale.</p>

<h3 id="3-the-security-model-is-broken">3. The Security Model Is Broken</h3>

<p>Itamar Golan, founder of Prompt Security, put it bluntly: “Treat agents as production infrastructure, not a productivity app: least privilege, scoped tokens, allowlisted actions, strong authentication on every integration, and auditability end-to-end.”<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">2</a></sup></p>

<p>The old security model assumed humans were the actors. When AI agents become the actors — with persistent permissions and autonomous decision-making — everything changes.</p>

<h3 id="4-saas-is-being-disrupted-again">4. SaaS Is Being Disrupted (Again)</h3>

<p>The 2026 “SaaSpocalypse” saw massive value erased from software indices as investors realized agents could disrupt traditional SaaS models. If an agent can navigate any interface, why pay for specialized software?</p>

<p>The platforms that survive will be the ones that provide value agents can’t replicate: governance, compliance, trust, and human oversight.</p>

<h3 id="5-you-cant-stop-your-employees">5. You Can’t Stop Your Employees</h3>

<p>Brianne Kimmel of Worklife Ventures frames this as a talent retention issue: “People are trying these on evenings and weekends, and it’s hard for companies to ensure employees aren’t trying the latest technologies.”<sup id="fnref:3:1" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">1</a></sup></p>

<p>Your best engineers will use the best tools. Blocking them doesn’t work — they’ll find workarounds or leave for companies that enable them.</p>

<p>The answer isn’t blocking. It’s governing.</p>

<hr />

<h2 id="what-enterprises-actually-need">What Enterprises Actually Need</h2>

<p>Here’s what the OpenClaw moment revealed about enterprise requirements:</p>

<p><strong>Visibility</strong>: Know what agents are running, what they’re doing, and what permissions they have.</p>

<p><strong>Risk Scoring</strong>: Not all actions are equal. Deleting a test file is different from emailing a client. ML-powered risk assessment helps prioritize human attention.</p>

<p><strong>Pre-Action Governance</strong>: Evaluate actions <em>before</em> they execute, not after. The difference between logging and governance is the difference between knowing what happened and preventing what shouldn’t.</p>

<p><strong>Audit Trails</strong>: When compliance asks “who approved this?” you need an answer. Every action, every decision, every override — documented.</p>

<p><strong>The Control Spectrum</strong>: Not every department needs the same level of autonomy. Marketing might run at full speed while Legal stays fully supervised. One size doesn’t fit all.</p>

<p><strong>Suspend and Override</strong>: When something goes wrong, you need the ability to suspend an agent immediately — across your entire fleet if necessary.</p>
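<p>To make those requirements concrete, here’s a minimal sketch of a pre-action check that combines risk scoring, suspend-and-override, escalation, and an audit line. Every name in it is hypothetical; it shows the shape of the idea, not any product’s actual API:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical sketch of a pre-action governance check.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AgentAction:
    agent_id: str
    action_type: str      # e.g. "delete_test_file", "email_client"
    payload: dict
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

# Not all actions are equal: deleting a test file vs. emailing a client.
RISK_WEIGHTS = {"delete_test_file": 0.2, "email_client": 0.7, "wire_transfer": 0.95}

def evaluate(action: AgentAction, suspended: set) -&gt; str:
    """Return ALLOW, ESCALATE, or DENY before the action executes."""
    if action.agent_id in suspended:      # suspend and override, fleet-wide
        return "DENY"
    risk = RISK_WEIGHTS.get(action.action_type, 0.5)   # unknown = medium risk
    if risk &gt;= 0.9:                       # high risk: route to a human
        return "ESCALATE"
    return "ALLOW"

def audit(action: AgentAction, decision: str) -&gt; None:
    """Append-only answer to 'who approved this?': action, time, decision."""
    print(f"{action.timestamp.isoformat()} {action.agent_id} "
          f"{action.action_type} -&gt; {decision}")
</code></pre></div></div>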

<hr />

<h2 id="the-path-forward">The Path Forward</h2>

<p>OpenClaw proved that AI can automate anything. The technology is here. The productivity gains are real. The genie isn’t going back in the bottle.</p>

<p>The enterprises that thrive in the agentic era won’t be the ones who block AI agents. They’ll be the ones who govern them — and deploy them faster because of it.</p>

<p>They’ll give employees the AI automation tools they want — with the visibility, risk management, and audit trails the organization needs.</p>

<p>They’ll treat AI agents as production infrastructure, not toys.</p>

<p>And they’ll recognize that governance isn’t the brake on AI automation.</p>

<p>It’s the accelerator — the thing that gets AI past compliance, past legal, past the CTO’s desk, and into production.</p>

<hr />

<h2 id="about-aictrlnet">About AICtrlNet</h2>

<p>AICtrlNet is AI-powered universal automation with governance built in. Three layers of automation reach — 10,000+ tools through platform adapters, any API through self-extending agents, any web app through browser automation. All governed.</p>

<p>Whether you’re running OpenClaw, Claude Code, LangChain agents, or custom autonomous systems, the Runtime Gateway evaluates every action before execution:</p>

<ul>
  <li><strong>Pre-action evaluation</strong>: ALLOW, DENY, or ESCALATE every action</li>
  <li><strong>ML-powered risk scoring</strong>: Prioritize human attention where it matters</li>
  <li><strong>Fleet management</strong>: Visibility across all agents in your organization</li>
  <li><strong>Six phases of autonomy</strong>: From AI-assisted to fully autonomous — you choose</li>
  <li><strong>Suspend and override</strong>: Immediate control when you need it</li>
</ul>

<p>AI that automates anything. Governance for everything.</p>

<p><strong>Start with a free 14-day trial</strong> of the Business edition. The Community Edition is also available as open source.</p>

<p><strong>Get started</strong>: <a href="/openclaw">aictrlnet.com/openclaw</a></p>

<hr />

<p><em>Bobby Koritala is the founder of AICtrlNet and holds multiple AI patents. He’s spent 9 years building AI systems in healthcare, finance, and logistics.</em></p>

<hr />

<h2 id="references">References</h2>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:3" role="doc-endnote">
      <p>VentureBeat. (2026). “What the OpenClaw moment means for enterprises: 5 big takeaways.” <a href="https://venturebeat.com/technology/what-the-openclaw-moment-means-for-enterprises-5-big-takeaways">venturebeat.com</a> <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:3:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:6" role="doc-endnote">
      <p>VentureBeat. (2026). “OpenClaw proves agentic AI works. It also proves your security model doesn’t.” <a href="https://venturebeat.com/security/openclaw-agentic-ai-security-risk-ciso-guide">venturebeat.com</a> <a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Bobby Koritala</name></author><category term="ai-automation" /><category term="ai-agents" /><category term="openclaw" /><category term="enterprise" /><summary type="html"><![CDATA[OpenClaw proved AI can automate anything. The question isn't capability anymore — it's how your enterprise harnesses that power responsibly.]]></summary></entry><entry><title type="html">OpenAI Just Validated the Autonomous Agent Category — Here’s What It Means</title><link href="https://aictrlnet.com/blog/2026/02/openai-validates-autonomous-agent-category/" rel="alternate" type="text/html" title="OpenAI Just Validated the Autonomous Agent Category — Here’s What It Means" /><published>2026-02-17T00:00:00+00:00</published><updated>2026-02-17T00:00:00+00:00</updated><id>https://aictrlnet.com/blog/2026/02/openai-validates-autonomous-agent-category</id><content type="html" xml:base="https://aictrlnet.com/blog/2026/02/openai-validates-autonomous-agent-category/"><![CDATA[<p>Two days ago, Peter Steinberger — the creator of OpenClaw, the fastest-growing open source project in GitHub history — <a href="https://techcrunch.com/2026/02/15/openclaw-creator-peter-steinberger-joins-openai/">joined OpenAI</a><sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. Sam Altman personally recruited him. Mark Zuckerberg had already reached out via WhatsApp.</p>

<p>OpenClaw, which went from zero to over 200,000 stars in under three months<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>, is transitioning to an independent open source foundation with OpenAI’s backing. Steinberger’s new role: driving “the next generation of personal agents.”</p>

<p>This isn’t just a talent acquisition. It’s a signal — and a warning.</p>

<h2 id="what-openai-is-really-saying">What OpenAI Is Really Saying</h2>

<p>Sam Altman has been saying it for months: “The future is extremely multi-agent”<sup id="fnref:1:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. But hiring Steinberger makes it concrete. OpenAI isn’t just building models — they’re betting that autonomous agents, the kind that have root-level access to your machine and can execute shell commands, browse the web, and manage files on your behalf, are the next platform shift.</p>

<p>And they’re right. The AI agent market is projected to grow from $7.8 billion in 2025 to $52.6 billion by 2030 — a 46.3% CAGR<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>. Gartner predicts 40% of enterprise applications will feature task-specific AI agents by end of 2026, up from less than 5% today<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>.</p>

<p>The agents are coming. The question is what comes next.</p>

<h2 id="the-governance-gap-nobodys-closing">The Governance Gap Nobody’s Closing</h2>

<p>Here’s the uncomfortable truth that the Steinberger hire exposes:</p>

<p><strong>The industry is investing billions in making agents more capable. Almost nobody is investing in making them governable.</strong></p>

<p>The numbers tell the story. Microsoft’s Cyber Pulse report, published just five days before the Steinberger announcement, found that <strong>over 80% of Fortune 500 companies are already running active AI agents</strong> — but 29% of employees admit to using unsanctioned agents, and fewer than half of enterprises have implemented specific AI security safeguards<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>.</p>

<p>Gravitee’s State of AI Agent Security survey made it even more concrete: only <strong>14.4% of organizations</strong> report that all their AI agents go live with full security and IT approval. More than half of all agents operate without any security oversight or logging. And 88% of organizations have confirmed or suspected security incidents related to AI agents<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>.</p>

<div class="mermaid">
graph TD
    subgraph "The Enterprise AI Agent Reality (2026)"
        A["80% of Fortune 500<br />running AI agents"] --&gt; B["Only 14.4% have<br />full security approval"]
        B --&gt; C["88% have confirmed<br />or suspected incidents"]
        C --&gt; D["29% of employees using<br />unsanctioned agents"]
    end

    style A fill:#e6f3ff,stroke:#0066cc
    style B fill:#fff0e6,stroke:#cc6600
    style C fill:#ffe6e6,stroke:#cc0000
    style D fill:#ffcccc,stroke:#cc0000,stroke-width:3px
</div>

<p>Read that again: The vast majority of Fortune 500 companies have AI agents in production, and almost none of them have adequate governance in place.</p>

<p>This is Shadow AI at scale. And unlike Shadow IT — where the worst case was an unauthorized SaaS subscription — Shadow AI agents can read your codebase, send emails on your behalf, execute system commands, and access sensitive data. With root permissions.</p>

<h2 id="the-market-already-knows-this-is-real">The Market Already Knows This Is Real</h2>

<p>Three days before Steinberger joined OpenAI, Proofpoint acquired Acuvity — a startup focused on AI security and governance for the “agentic workspace”<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup>. The deal explicitly cited governance for tools like OpenClaw and MCP servers.</p>

<p>This wasn’t a speculative acquisition. This was a major cybersecurity company saying: the governance market for autonomous agents is real, it’s urgent, and it’s big enough to acquire for.</p>

<p>And they’re not alone. The Agentic AI Foundation (AAIF) recently formed under the Linux Foundation to provide vendor-neutral governance for MCP, A2A, and other agent protocols. When foundations start forming, it means the category is no longer experimental — it’s infrastructure.</p>

<h2 id="why-this-matters-for-every-enterprise">Why This Matters for Every Enterprise</h2>

<p>Here’s what the OpenClaw-to-OpenAI pipeline means in practice:</p>

<p><strong>1. Autonomous agents are about to get corporate backing.</strong> OpenClaw was already the fastest-growing project in GitHub history as one developer’s side project. Now it has OpenAI’s resources behind it. Expect adoption to accelerate, not slow down.</p>

<p><strong>2. “Block it” is not a strategy.</strong> As Worklife Ventures’ Brianne Kimmel noted, employees are already “trying these on evenings and weekends, and it’s hard for companies to ensure employees aren’t trying the latest technologies”<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup>. Blocking doesn’t work — they’ll find workarounds or leave for companies that let them move fast.</p>

<p><strong>3. The security model needs to be rebuilt.</strong> As Prompt Security’s Itamar Golan put it: “Treat agents as production infrastructure, not a productivity app: least privilege, scoped tokens, allowlisted actions, strong authentication on every integration, and auditability end-to-end”<sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">9</a></sup>.</p>

<p><strong>4. Pre-action governance is the new standard.</strong> Logging what agents did after the fact isn’t governance — it’s forensics. Real governance means evaluating every action <em>before</em> it executes.</p>

<div class="mermaid">
graph LR
    subgraph "Post-Action Logging (What Most Companies Do)"
        P1["Agent acts"] --&gt; P2["Log the action"] --&gt; P3["Discover the problem<br />hours or days later"]
    end

    subgraph "Pre-Action Governance (What's Actually Needed)"
        G1["Agent proposes action"] --&gt; G2{"Runtime Gateway<br />evaluates"}
        G2 --&gt;|"ALLOW"| G3["Execute + log"]
        G2 --&gt;|"ESCALATE"| G4["Human reviews<br />then decides"]
        G2 --&gt;|"DENY"| G5["Block + explain"]
    end

    style P3 fill:#ffe6e6,stroke:#cc0000
    style G2 fill:#e6f3ff,stroke:#0066cc,stroke-width:3px
    style G3 fill:#e6ffe6,stroke:#00cc00
    style G4 fill:#fff0e6,stroke:#cc6600
    style G5 fill:#ffe6e6,stroke:#cc0000
</div>

<h2 id="the-tool-agnostic-imperative">The Tool-Agnostic Imperative</h2>

<p>Here’s the thing most people miss: <strong>the governance challenge isn’t OpenClaw-specific.</strong> OpenClaw is one tool. Claude Code is another. LangChain, CrewAI, AutoGen, Semantic Kernel — the frameworks are multiplying. Custom internal agents are proliferating even faster.</p>

<p>Any governance solution that’s built for one tool is already obsolete. What enterprises need is a governance layer that sits between the agent and the action — regardless of which framework, model, or tool generated it.</p>

<p>That’s the architecture we’ve been building at AICtrlNet since before OpenClaw went viral. Our Runtime Gateway evaluates every agent action through Quality, Governance, Security, and Monitoring dimensions before execution. It doesn’t care whether the action came from OpenClaw, Claude Code, a LangChain workflow, or a custom Python script.</p>
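<p>As a sketch of that shape (with hypothetical names, not our real interface), the governance layer can be a plain wrapper around whatever function executes tool calls, regardless of the framework that produced them:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical sketch: a framework-agnostic governance hook.
from typing import Callable

def govern(evaluate: Callable[[str, dict], str]):
    """Wrap any tool-executing function behind a pre-action decision."""
    def wrap(execute: Callable[[str, dict], object]):
        def guarded(tool: str, args: dict):
            decision = evaluate(tool, args)   # decided before execution
            if decision == "ALLOW":
                return execute(tool, args)
            if decision == "ESCALATE":
                raise PermissionError(f"{tool}: awaiting human approval")
            raise PermissionError(f"{tool}: denied by policy")
        return guarded
    return wrap

# The same wrapper can guard an OpenClaw shell command, a LangChain tool,
# or a custom script; the gateway never needs to know which framework called.
</code></pre></div></div>

<p>The point is the placement: the decision sits between the agent and the action, so swapping frameworks doesn’t change the governance.</p>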

<p>This isn’t a pitch deck. It’s a shipping product:</p>

<ul>
  <li><strong>171 conversation tools</strong> across 11 categories</li>
  <li><strong>29 adapters</strong> connecting AI frameworks, messaging platforms, databases, and compliance systems</li>
  <li><strong>183 workflow templates</strong> across 20+ industries</li>
  <li><strong>43 AI agents</strong> with graduated autonomy — our <a href="/blog/2026/02/11/your-ai-demo-is-lying-to-you/">Control Spectrum</a> defines 6 phases from “AI suggests, human decides” to full automation</li>
  <li><strong>6 messaging channels</strong>: Slack, Discord, Telegram, WhatsApp, SMS, Email</li>
  <li><strong>Self-extending agents</strong> that research, generate, and validate new integrations at runtime</li>
  <li><strong>Dry-run mode</strong> to test any workflow without side effects</li>
</ul>

<p>The Community Edition is <a href="https://github.com/Bodaty/aictrlnet-community">open source</a>. The Business Edition is live with ML-enhanced risk scoring, fleet management, and our Done-With-You expert guidance model.</p>

<h2 id="the-window-is-closing">The Window Is Closing</h2>

<p>If OpenAI is investing in making agents more autonomous, someone needs to invest in making them governable.</p>

<p>The enterprises that thrive in the agentic era won’t be the ones who block AI agents or the ones who let them run unchecked. They’ll be the ones who govern them — with visibility, risk management, audit trails, and human oversight built into the execution layer.</p>

<p>The window between “agents are useful” and “agents caused a compliance incident” is closing fast. The 88% incident rate in Gravitee’s survey<sup id="fnref:6:1" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup> tells you it has already closed for most organizations.</p>

<p>OpenAI just placed their bet on the future of autonomous agents. The question for every enterprise is: <strong>who’s placing the bet on governing them?</strong></p>

<hr />

<p><strong>Ready to add governance to your AI agents?</strong></p>

<ul>
  <li><strong>Open Source</strong>: <a href="https://github.com/Bodaty/aictrlnet-community">github.com/Bodaty/aictrlnet-community</a> — Runtime Gateway, MIT licensed</li>
  <li><strong>Documentation</strong>: <a href="https://docs.aictrlnet.com">docs.aictrlnet.com</a></li>
  <li><strong>Enterprise Trial</strong>: <a href="https://hitlai.net/trial">hitlai.net/trial</a> — 14 days, no credit card</li>
  <li><strong>The OpenClaw governance challenge</strong>: <a href="https://aictrlnet.com/openclaw">aictrlnet.com/openclaw</a></li>
</ul>

<hr />

<h2 id="references">References</h2>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>TechCrunch. (2026, February 15). “OpenClaw creator Peter Steinberger joins OpenAI.” <a href="https://techcrunch.com/2026/02/15/openclaw-creator-peter-steinberger-joins-openai/">techcrunch.com</a> <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:1:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p>Willison, S. (2026, February 15). “Three months of OpenClaw.” <a href="https://simonwillison.net/2026/Feb/15/openclaw/">simonwillison.net</a>. OpenClaw’s first commit was November 25, 2025; reached 200K+ stars by mid-February 2026, including 25,310 stars in a single day on January 26. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p>MarketsandMarkets. (2025). “AI Agents Market — Global Forecast to 2030.” USD $7.84 billion in 2025 to USD $52.62 billion by 2030, CAGR of 46.3%. <a href="https://www.marketsandmarkets.com/PressReleases/ai-agents.asp">marketsandmarkets.com</a> <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p>Gartner. (2025, August 26). “Gartner Predicts 40% of Enterprise Apps Will Feature Task-Specific AI Agents by 2026.” <a href="https://www.gartner.com/en/newsroom/press-releases/2025-08-26-gartner-predicts-40-percent-of-enterprise-apps-will-feature-task-specific-ai-agents-by-2026-up-from-less-than-5-percent-in-2025">gartner.com/en/newsroom</a> <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:5" role="doc-endnote">
      <p>Microsoft. (2026, February 10). “80% of Fortune 500 use active AI Agents: Observability, governance, and security shape the new frontier.” Microsoft Security Blog. <a href="https://www.microsoft.com/en-us/security/blog/2026/02/10/80-of-fortune-500-use-active-ai-agents-observability-governance-and-security-shape-the-new-frontier/">microsoft.com</a> <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:6" role="doc-endnote">
      <p>Gravitee. (2026). “State of AI Agent Security 2026.” Survey of 919 participants across 5 industries. Only 14.4% report all AI agents going live with full security/IT approval; 88% confirmed or suspected security incidents. <a href="https://www.gravitee.io/state-of-ai-agent-security">gravitee.io</a> <a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:6:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:7" role="doc-endnote">
      <p>Proofpoint. (2026, February 12). “Proofpoint Acquires Acuvity to Deliver AI Security and Governance Across the Agentic Workspace.” <a href="https://www.proofpoint.com/us/newsroom/press-releases/proofpoint-acquires-acuvity-deliver-ai-security-and-governance-across">proofpoint.com</a> <a href="#fnref:7" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:8" role="doc-endnote">
      <p>VentureBeat. (2026). “What the OpenClaw moment means for enterprises: 5 big takeaways.” Kimmel, B. (Worklife Ventures): “People are trying these on evenings and weekends, and it’s hard for companies to ensure employees aren’t trying the latest technologies.” <a href="https://venturebeat.com/technology/what-the-openclaw-moment-means-for-enterprises-5-big-takeaways">venturebeat.com</a> <a href="#fnref:8" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:9" role="doc-endnote">
      <p>VentureBeat. (2026). “OpenClaw proves agentic AI works. It also proves your security model doesn’t.” Golan, I. (Prompt Security). <a href="https://venturebeat.com/security/openclaw-agentic-ai-security-risk-ciso-guide">venturebeat.com</a> <a href="#fnref:9" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Bobby Koritala</name></author><category term="governance" /><category term="ai-agents" /><category term="openclaw" /><category term="enterprise" /><summary type="html"><![CDATA[When OpenAI hires the creator of the fastest-growing GitHub project ever and backs its transition to a foundation, they're not just making a hire. They're placing a bet on the future — and exposing a governance gap nobody's closing.]]></summary></entry><entry><title type="html">Why Enterprises Won’t Buy Your AI (Yet)</title><link href="https://aictrlnet.com/blog/2026/02/why-enterprises-wont-buy-your-ai/" rel="alternate" type="text/html" title="Why Enterprises Won’t Buy Your AI (Yet)" /><published>2026-02-14T00:00:00+00:00</published><updated>2026-02-14T00:00:00+00:00</updated><id>https://aictrlnet.com/blog/2026/02/why-enterprises-wont-buy-your-ai</id><content type="html" xml:base="https://aictrlnet.com/blog/2026/02/why-enterprises-wont-buy-your-ai/"><![CDATA[<p>You built something impressive. The AI works. The demo is killer. The pilot went great.</p>

<p>And then the enterprise deal stalls.</p>

<p>Not because the AI isn’t good. Not because they don’t see the value. But because somewhere between the demo and the purchase order, someone asked questions you couldn’t answer.</p>

<p>I’ve been on both sides of this conversation — selling AI to enterprises and evaluating AI for enterprise deployment. Here’s what’s actually happening when deals die, and what you need to fix.</p>

<h2 id="the-governance-gap-is-real--and-its-measured">The Governance Gap Is Real — and It’s Measured</h2>

<p>Let’s start with data, not opinions.</p>

<p>Gartner surveyed 360 IT application leaders in 2025. The headline: only 15% are even <em>considering</em> fully autonomous AI agents. Not deploying. Considering. The barriers? “A lack of trust in vendors to provide suitable security, governance, and hallucination protection.” Only 13% strongly agreed they had the right governance structures in place to manage AI agents<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>

<p>Meanwhile, McKinsey’s 2025 State of AI report shows 88% of companies are using AI in at least one function — but two-thirds are stuck in pilot mode, unable to scale. The pilot-to-production gap isn’t a technology problem. It’s a governance problem<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>.</p>

<p>ISACA’s 2025 survey makes it worse: only 31% of organizations have a formal, comprehensive AI policy, even though 83% of professionals believe employees in their organization are actively using AI<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>. That gap — between AI usage and AI governance — is where enterprise deals die.</p>

<p>Here’s what that gap looks like from the buyer’s side:</p>

<div class="mermaid">
graph LR
    subgraph vendor["Your Side (Vendor)"]
        Demo[Impressive Demo] --&gt; Pilot[Successful Pilot]
        Pilot --&gt; Proposal[Enterprise Proposal]
    end

    subgraph gap["The Governance Gap"]
        Proposal --&gt; Q1["Security Review"]
        Q1 --&gt; Q2["Compliance Review"]
        Q2 --&gt; Q3["Legal Review"]
        Q3 --&gt; Q4["Procurement Governance Checklist"]
    end

    subgraph outcome["Outcome"]
        Q4 --&gt;|Governance story exists| Win["Signed Contract"]
        Q4 --&gt;|Governance story missing| Lose["Deal Stalls / Dies"]
    end

    style Demo fill:#e6f3ff
    style Pilot fill:#e6f3ff
    style Proposal fill:#e6f3ff
    style Q1 fill:#fff0e6
    style Q2 fill:#fff0e6
    style Q3 fill:#fff0e6
    style Q4 fill:#ffe6e6,stroke:#cc0000,stroke-width:2px
    style Win fill:#e6ffe6
    style Lose fill:#ffcccc
</div>

<p>Your demo got you in the door. Your governance story — or lack of one — determines whether you walk out with a deal.</p>

<h2 id="the-questions-that-kill-deals">The Questions That Kill Deals</h2>

<p>Enterprise buyers aren’t trying to stump you. They’re trying not to get fired.</p>

<p>When they evaluate your AI, they’re thinking about what happens when something goes wrong. Because something <em>will</em> go wrong. And when it does, they need to explain to their boss, their compliance team, and possibly their board why they bought your product.</p>

<p>Here are the questions that separate “interesting demo” from “signed contract”:</p>

<h3 id="question-1-what-happens-when-the-ai-is-wrong">Question 1: “What happens when the AI is wrong?”</h3>

<p><strong>What they’re really asking</strong>: “When this makes a mistake, how do we know? How do we fix it? Who’s responsible?”</p>

<p><strong>Bad answer</strong>: “It rarely makes mistakes.”</p>

<p><strong>Good answer</strong>: “High-risk actions require human approval before execution. We maintain full audit trails. Error rates are monitored in real-time with alerting thresholds. Here’s the escalation path when issues are detected.”</p>

<p><strong>Why the good answer works</strong>: The EU AI Act’s Article 14, enforceable for high-risk systems from August 2026, explicitly requires “human oversight” mechanisms that allow operators to “understand the AI system’s capacities and limitations” and to “intervene on or interrupt” its operation<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>. Buyers in regulated industries know this. If you can’t demonstrate human oversight, you’re asking them to buy a compliance liability.</p>

<h3 id="question-2-can-you-prove-what-the-ai-decided-and-why">Question 2: “Can you prove what the AI decided and why?”</h3>

<p><strong>What they’re really asking</strong>: “When our auditors ask, can we show them exactly what happened?”</p>

<p><strong>Bad answer</strong>: “The AI’s decision-making is based on sophisticated machine learning algorithms.”</p>

<p><strong>Good answer</strong>: “Every action is logged with the input context, model confidence, decision made, and outcome. You can export audit reports by date range, user, action type, or outcome. Here’s an example audit trail.”</p>

<p><strong>Why the good answer works</strong>: SOC 2 Type II, which most enterprise procurement teams require for software vendors, now includes specific controls for AI systems. NIST’s AI Risk Management Framework — increasingly referenced by federal regulators including the CFPB, FDA, SEC, and EEOC — requires organizations to document AI decision factors sufficient for retrospective analysis<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>. “Sophisticated algorithms” doesn’t satisfy an auditor. “Here’s every input, output, and reasoning chain, exportable in three formats” does.</p>

<h3 id="question-3-how-do-we-control-what-the-ai-is-allowed-to-do">Question 3: “How do we control what the AI is allowed to do?”</h3>

<p><strong>What they’re really asking</strong>: “Can we prevent the AI from doing things that would embarrass us?”</p>

<p><strong>Bad answer</strong>: “The AI is very sophisticated and handles most situations well.”</p>

<p><strong>Good answer</strong>: “You define policies that specify what actions are allowed, denied, or require approval. Policies can be scoped by user role, action type, risk level, or time of day. Here’s the policy editor.”</p>

<p><strong>Why the good answer works</strong>: Gartner predicts that through 2026, at least 80% of unauthorized AI transactions will be caused by <em>internal</em> violations of enterprise policies — not external attacks<sup id="fnref:1:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. Enterprise buyers have learned this the hard way. They need policy enforcement, not promises that the AI “handles things well.”</p>

<h3 id="question-4-who-approves-high-risk-decisions">Question 4: “Who approves high-risk decisions?”</h3>

<p><strong>What they’re really asking</strong>: “Is there a human in the loop when it matters?”</p>

<p><strong>Bad answer</strong>: “The AI handles everything automatically, which is what makes it so efficient.”</p>

<p><strong>Good answer</strong>: “You configure which actions require human approval. When the AI proposes a high-risk action, it routes to the appropriate approver with full context. Approvers can approve, deny, or modify. Here’s the approval workflow.”</p>

<p><strong>Why the good answer works</strong>: The Gartner survey found that only 19% of respondents had high or complete trust in their vendor’s ability to provide adequate hallucination protection<sup id="fnref:1:2" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. Enterprise buyers <em>know</em> AI hallucinates. They’re not asking if it will be wrong — they’re asking what happens when it is. “Automatic everything” sounds like “automatic mistakes.”</p>

<h3 id="question-5-how-do-you-handle-our-specific-compliance-requirement">Question 5: “How do you handle [our specific compliance requirement]?”</h3>

<p><strong>What they’re really asking</strong>: “Do you understand our regulatory environment, or are we your guinea pig?”</p>

<p><strong>Bad answer</strong>: “We’re working on compliance features.”</p>

<p><strong>Good answer</strong>: “Here’s our compliance documentation for [HIPAA/SOC2/GDPR/your framework]. Here’s how our governance features map to your requirements. Here’s the customer reference who had similar requirements.”</p>

<p><strong>Why the good answer works</strong>: Regulatory enforcement has arrived. The FTC’s “Operation AI Comply” targeted deceptive AI marketing. Italy fined OpenAI 15 million euros for GDPR violations. The EU AI Act carries penalties up to 35 million euros or 7% of global annual turnover for prohibited AI practices<sup id="fnref:4:1" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>. Enterprise buyers aren’t asking about compliance because they’re curious. They’re asking because their legal team made them ask.</p>

<h2 id="the-enterprise-evaluation-pipeline">The Enterprise Evaluation Pipeline</h2>

<p>Here’s the thing most AI vendors miss: the technical buyer who loved your demo is maybe 20% of the purchase decision. The other 80% is a gauntlet of stakeholders who will never see your demo but will absolutely kill your deal.</p>

<div class="mermaid">
graph TD
    subgraph evaluation["Enterprise AI Evaluation Pipeline"]
        T[Technical Evaluation<br />Does it work?] --&gt; S[Security Review<br />Is it safe?]
        S --&gt; C[Compliance Review<br />Is it legal?]
        C --&gt; P[Procurement Review<br />Is it governable?]
        P --&gt; L[Legal Review<br />Is the contract sound?]
        L --&gt; B[Budget Approval<br />Is it worth it?]
    end

    T -.- TQ["Your demo answers this"]
    S -.- SQ["Pen tests, SOC 2, data handling"]
    C -.- CQ["Audit trails, HITL, explainability"]
    P -.- PQ["Vendor governance checklist<br />AI-specific evaluation criteria"]
    L -.- LQ["Liability for AI errors<br />Data ownership, IP"]
    B -.- BQ["ROI including governance costs"]

    style T fill:#e6ffe6
    style S fill:#fff0e6
    style C fill:#fff0e6
    style P fill:#ffe6e6
    style L fill:#ffe6e6
    style B fill:#e6f3ff
    style TQ fill:#f0f0f0,stroke:#ccc
    style SQ fill:#f0f0f0,stroke:#ccc
    style CQ fill:#f0f0f0,stroke:#ccc
    style PQ fill:#f0f0f0,stroke:#ccc
    style LQ fill:#f0f0f0,stroke:#ccc
    style BQ fill:#f0f0f0,stroke:#ccc
</div>

<p>Most AI vendors prepare for Stage 1 — the technical evaluation. Maybe Stage 2. They get blindsided by Stages 3-5.</p>

<p>The Deloitte “State of AI in the Enterprise” 2026 report, surveying 3,235 senior leaders across 24 countries, found that 74% of organizations want their AI initiatives to grow revenue, but only 20% have seen that happen<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>. The gap isn’t the AI. The gap is everything between “it works” and “we can deploy it at scale with accountability.”</p>

<h2 id="what-enterprise-buyers-actually-need">What Enterprise Buyers Actually Need</h2>

<p>Let me translate enterprise requirements into product features:</p>

<h3 id="they-need-audit-trails">They Need: Audit Trails</h3>
<p><strong>You Provide</strong>: Complete logging of all AI actions</p>

<p>Every decision the AI makes should be logged with:</p>
<ul>
  <li>Timestamp</li>
  <li>User/context that triggered it</li>
  <li>Input data provided</li>
  <li>AI confidence level</li>
  <li>Decision made (allow/deny/escalate)</li>
  <li>Outcome after execution</li>
  <li>Any human involvement</li>
</ul>

<p>Not just for debugging — for compliance, legal discovery, and regulatory audit. ISACA’s guidance is explicit: “every action taken by an AI system should be logged via an audit trail that captures who initiated the action — whether it was a human, an application, or an AI agent — along with the reason for it”<sup id="fnref:3:1" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>.</p>
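<p>As a sketch, one record per action covering those fields might look like this (the schema is an illustration, not a mandated format):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Illustrative audit record; field names are assumptions, not a standard.
from dataclasses import dataclass, asdict
import json

@dataclass(frozen=True)
class AuditRecord:
    timestamp: str          # ISO 8601, UTC
    triggered_by: str       # human, application, or agent id
    input_context: dict     # what the AI saw
    confidence: float       # model confidence, 0.0 to 1.0
    decision: str           # "allow" | "deny" | "escalate"
    outcome: str            # what actually happened after execution
    human_involved: str     # approver id, or "" if none

record = AuditRecord(
    timestamp="2026-02-14T09:30:00Z",
    triggered_by="agent:invoice-bot",
    input_context={"invoice_id": "INV-1042"},
    confidence=0.87,
    decision="escalate",
    outcome="approved_then_sent",
    human_involved="user:jdoe",
)
print(json.dumps(asdict(record)))   # one queryable, exportable line per action
</code></pre></div></div>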

<h3 id="they-need-policy-enforcement">They Need: Policy Enforcement</h3>
<p><strong>You Provide</strong>: Configurable rules for AI behavior</p>

<p>Enterprises need to tell the AI:</p>
<ul>
  <li>What actions are always allowed</li>
  <li>What actions are never allowed</li>
  <li>What actions need approval</li>
  <li>Who can approve what</li>
  <li>Under what conditions rules change</li>
</ul>

<p>This isn’t micromanagement. It’s how enterprises manage any system with significant impact. And with 80% of unauthorized AI transactions coming from internal policy violations rather than external attacks, policy enforcement is the first line of defense.</p>
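<p>In sketch form (names hypothetical), a policy table can be as simple as ordered rules scoped by role and action type, with first match winning:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Illustrative policy rules: first matching rule wins.
POLICIES = [
    {"role": "*",       "action": "wire_transfer", "effect": "deny"},     # never allowed
    {"role": "manager", "action": "send_email",    "effect": "allow"},    # always allowed
    {"role": "analyst", "action": "send_email",    "effect": "approve"},  # needs approval
    {"role": "*",       "action": "*",             "effect": "approve"},  # default: ask a human
]

def decide(role: str, action: str) -&gt; str:
    for rule in POLICIES:
        if rule["role"] in (role, "*") and rule["action"] in (action, "*"):
            return rule["effect"]
    return "deny"   # unreachable given the wildcard default; kept for safety
</code></pre></div></div>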

<h3 id="they-need-human-in-the-loop">They Need: Human-in-the-Loop</h3>
<p><strong>You Provide</strong>: Approval workflows for high-risk actions</p>

<p>The AI should be able to:</p>
<ul>
  <li>Identify when an action is high-risk</li>
  <li>Pause before execution</li>
  <li>Route to an appropriate human</li>
  <li>Provide context for the decision</li>
  <li>Wait for approval before proceeding</li>
  <li>Log the human’s decision</li>
</ul>

<p>This doesn’t mean humans approve everything. It means humans approve what matters. Deloitte’s data shows 85% of companies expect to customize autonomous AI agents — but customization without human oversight guardrails is what keeps compliance teams up at night<sup id="fnref:6:1" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>.</p>
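<p>Sketched in code (names hypothetical, persistence elided), the approval loop is small. What matters is that the action pauses, the approver gets context, and the decision is logged:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch of an approval workflow: pause, route with context, record the call.
import uuid

APPROVAL_QUEUE = {}   # a real system would persist this in a queue or database

def request_approval(action_type: str, context: dict, approver: str) -&gt; str:
    """Pause a high-risk action and route it to a human with full context."""
    request_id = str(uuid.uuid4())
    APPROVAL_QUEUE[request_id] = {
        "action": action_type,
        "context": context,      # the approver sees why, not just what
        "approver": approver,
        "status": "pending",
    }
    return request_id

def resolve(request_id: str, approved: bool, reason: str) -&gt; None:
    """Record the human's decision; the agent proceeds only on approval."""
    req = APPROVAL_QUEUE[request_id]
    req["status"] = "approved" if approved else "denied"
    req["reason"] = reason       # logged for the audit trail
</code></pre></div></div>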

<h3 id="they-need-access-controls">They Need: Access Controls</h3>
<p><strong>You Provide</strong>: Role-based permissions and scoping</p>

<p>Different users need different levels of access:</p>
<ul>
  <li>Admins configure policies</li>
  <li>Managers approve high-risk actions</li>
  <li>Users operate within their scope</li>
  <li>Auditors can review but not modify</li>
</ul>

<p>Standard RBAC, applied to AI operations.</p>
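<p>In sketch form (role names illustrative), that’s a permission map plus one check:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Minimal RBAC sketch for AI operations.
PERMISSIONS = {
    "admin":   {"configure_policies", "approve", "operate", "review"},
    "manager": {"approve", "operate", "review"},
    "user":    {"operate"},
    "auditor": {"review"},        # read-only: can review, never modify
}

def can(role: str, permission: str) -&gt; bool:
    return permission in PERMISSIONS.get(role, set())
</code></pre></div></div>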

<h3 id="they-need-monitoring--alerting">They Need: Monitoring &amp; Alerting</h3>
<p><strong>You Provide</strong>: Dashboards and alerts for AI behavior</p>

<p>Enterprises need visibility into:</p>
<ul>
  <li>Volume of AI actions</li>
  <li>Error rates and trends</li>
  <li>Approval latencies</li>
  <li>Policy violations</li>
  <li>Unusual patterns</li>
</ul>

<p>When something goes wrong, they need to know immediately — not when a customer complains.</p>
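<p>The simplest useful version is a rolling error rate with an alert threshold, sketched below (the numbers are examples, not recommendations):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Illustrative threshold alerting on a rolling window of AI outcomes.
from collections import deque

recent = deque(maxlen=500)   # last 500 actions; True means the action errored

def record_outcome(is_error: bool, threshold: float = 0.05) -&gt; None:
    """Alert the moment the rolling error rate crosses the threshold."""
    recent.append(is_error)
    error_rate = sum(recent) / len(recent)
    if len(recent) &gt;= 100 and error_rate &gt; threshold:   # wait for a stable sample
        print(f"ALERT: error rate {error_rate:.1%} over last {len(recent)} actions")
</code></pre></div></div>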

<h2 id="the-6-phases-of-enterprise-trust">The 6 Phases of Enterprise Trust</h2>

<p>Enterprises don’t go from “demo” to “full deployment” overnight. They need to build trust incrementally. Here’s how the Control Spectrum maps to the enterprise purchasing and deployment journey:</p>

<table>
  <thead>
    <tr>
      <th>Phase</th>
      <th>AI Does</th>
      <th>Human Does</th>
      <th>Enterprise Milestone</th>
      <th>Governance Required</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1. Evaluation</td>
      <td>Suggests</td>
      <td>Decides + Acts</td>
      <td>Pilot approved</td>
      <td>Basic audit logging</td>
    </tr>
    <tr>
      <td>2. Controlled Pilot</td>
      <td>Drafts</td>
      <td>Reviews + Acts</td>
      <td>Department rollout</td>
      <td>Policy enforcement + HITL</td>
    </tr>
    <tr>
      <td>3. Limited Production</td>
      <td>Acts (low-risk)</td>
      <td>Reviews exceptions</td>
      <td>Procurement signed</td>
      <td>Full audit trails + RBAC</td>
    </tr>
    <tr>
      <td>4. Scaled Deployment</td>
      <td>Optimizes</td>
      <td>Monitors</td>
      <td>Enterprise-wide</td>
      <td>Monitoring dashboards + alerts</td>
    </tr>
    <tr>
      <td>5. Trusted Automation</td>
      <td>Decides (medium-risk)</td>
      <td>Oversees</td>
      <td>Renewal / expansion</td>
      <td>Compliance reporting + bias monitoring</td>
    </tr>
    <tr>
      <td>6. Strategic Autonomy</td>
      <td>Operates</td>
      <td>Audits</td>
      <td>Board-level AI strategy</td>
      <td>Continuous governance + regulatory mapping</td>
    </tr>
  </tbody>
</table>

<p>Your governance layer needs to support this entire journey. Start them at Phase 1. Graduate them as they build confidence. Give them the controls to move at their pace.</p>

<p>The Bain 2025 executive survey found that among the 59% of companies meaningfully adopting AI, use cases met or exceeded expectations 80% of the time — yet only 31% of use cases reached full production, double the prior year’s rate but still a minority<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup>. The bottleneck isn’t AI capability. It’s the governance infrastructure to move from Phase 2 to Phase 3 and beyond.</p>

<h2 id="the-enterprise-ready-checklist">The Enterprise-Ready Checklist</h2>

<p>Before you pitch to enterprises, can you answer “yes” to all of these?</p>

<p><strong>Auditability</strong></p>
<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Every AI action is logged with full context</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Logs are queryable and exportable</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Retention policies are configurable</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Logs are tamper-evident</li>
</ul>

<p><strong>Policy Enforcement</strong></p>
<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Admins can define what the AI is allowed to do</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Policies can be scoped by user, action type, or context</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Policy changes are logged</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Policy violations are blocked and logged</li>
</ul>

<p><strong>Human-in-the-Loop</strong></p>
<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />High-risk actions can require approval</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Approvers get context for decisions</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Approval/denial is logged with reason</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Escalation paths are configurable</li>
</ul>

<p><strong>Access Controls</strong></p>
<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Role-based permissions are supported</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />SSO integration is available</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Session management is enterprise-grade</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />MFA is available</li>
</ul>

<p><strong>Compliance</strong></p>
<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />SOC2 Type II or equivalent</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Industry-specific compliance (HIPAA, PCI, etc.) if relevant</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Data residency options</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Security documentation available</li>
</ul>

<p><strong>Monitoring</strong></p>
<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Real-time dashboards for AI operations</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Alerting on anomalies</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Error rate tracking</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Performance metrics</li>
</ul>

<p>If you have gaps, that’s okay. But know them before the enterprise buyer finds them.</p>

<h2 id="how-to-add-governance-without-rebuilding">How to Add Governance Without Rebuilding</h2>

<p>The good news: you don’t have to rebuild your AI to add governance. You need to add a governance <em>layer</em>.</p>

<p>This is exactly why we built AICtrlNet’s Runtime Gateway. It sits between your AI and your actions:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Your AI --&gt; Proposes Action --&gt; Runtime Gateway --&gt; Executes (or not)
                                    |
                              Policy Check
                              Audit Log
                              Approval Workflow (if needed)
</code></pre></div></div>

<p>Here’s what adding it looks like:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Before: AI acts directly
</span><span class="n">result</span> <span class="o">=</span> <span class="n">ai</span><span class="p">.</span><span class="n">analyze</span><span class="p">(</span><span class="n">document</span><span class="p">)</span>
<span class="n">send_email</span><span class="p">(</span><span class="n">result</span><span class="p">.</span><span class="n">recommendation</span><span class="p">)</span>

<span class="c1"># After: AI proposes, governance decides
</span><span class="n">result</span> <span class="o">=</span> <span class="n">ai</span><span class="p">.</span><span class="n">analyze</span><span class="p">(</span><span class="n">document</span><span class="p">)</span>
<span class="n">action</span> <span class="o">=</span> <span class="n">Action</span><span class="p">(</span>
    <span class="nb">type</span><span class="o">=</span><span class="s">"send_email"</span><span class="p">,</span>
    <span class="n">content</span><span class="o">=</span><span class="n">result</span><span class="p">.</span><span class="n">recommendation</span><span class="p">,</span>
    <span class="n">context</span><span class="o">=</span><span class="p">{</span><span class="s">"document_id"</span><span class="p">:</span> <span class="n">document</span><span class="p">.</span><span class="nb">id</span><span class="p">}</span>
<span class="p">)</span>

<span class="n">decision</span> <span class="o">=</span> <span class="n">gateway</span><span class="p">.</span><span class="n">evaluate</span><span class="p">(</span><span class="n">action</span><span class="p">)</span>

<span class="k">if</span> <span class="n">decision</span><span class="p">.</span><span class="n">status</span> <span class="o">==</span> <span class="s">"ALLOW"</span><span class="p">:</span>
    <span class="n">send_email</span><span class="p">(</span><span class="n">result</span><span class="p">.</span><span class="n">recommendation</span><span class="p">)</span>
    <span class="n">log_action</span><span class="p">(</span><span class="n">action</span><span class="p">,</span> <span class="n">decision</span><span class="p">)</span>

<span class="k">elif</span> <span class="n">decision</span><span class="p">.</span><span class="n">status</span> <span class="o">==</span> <span class="s">"ESCALATE"</span><span class="p">:</span>
    <span class="n">create_approval_request</span><span class="p">(</span><span class="n">action</span><span class="p">,</span> <span class="n">decision</span><span class="p">.</span><span class="n">approver</span><span class="p">)</span>

<span class="k">elif</span> <span class="n">decision</span><span class="p">.</span><span class="n">status</span> <span class="o">==</span> <span class="s">"DENY"</span><span class="p">:</span>
    <span class="n">log_blocked</span><span class="p">(</span><span class="n">action</span><span class="p">,</span> <span class="n">decision</span><span class="p">.</span><span class="n">reason</span><span class="p">)</span>
    <span class="n">notify_user</span><span class="p">(</span><span class="s">"Action was blocked"</span><span class="p">,</span> <span class="n">decision</span><span class="p">.</span><span class="n">reason</span><span class="p">)</span>
</code></pre></div></div>

<p>Your AI still does the thinking. The gateway adds the governance. Enterprise deals close.</p>

<h2 id="the-regulatory-landscape-isnt-waiting-for-you">The Regulatory Landscape Isn’t Waiting for You</h2>

<p>Let me be direct about the regulatory timeline, because this is what’s driving enterprise urgency:</p>

<p><strong>Already enforceable</strong>: The EU AI Act’s prohibited practices provisions took effect February 2, 2025. NIST’s AI Risk Management Framework is referenced by sector regulators (CFPB, FDA, SEC, EEOC). The FTC is actively enforcing against deceptive AI claims.</p>

<p><strong>August 2, 2026</strong>: Full enforcement for high-risk AI systems under the EU AI Act — including AI used in employment, credit decisions, education, and law enforcement. Penalties: up to 35 million euros or 7% of global annual turnover<sup id="fnref:4:2" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>.</p>

<p><strong>Ongoing</strong>: SOC 2 Type II now includes AI-specific controls. FINRA’s 2026 report puts AI governance under regulatory scrutiny for financial services. State-level regulations like NYC’s Local Law 144 require annual bias audits for automated employment decision tools.</p>

<p>Every enterprise buyer knows this calendar. If your AI product helps them comply, you’re a solution. If your AI product creates compliance risk, you’re a problem.</p>

<h2 id="the-bottom-line">The Bottom Line</h2>

<p>Your AI is probably great. The technology works. The demo is impressive.</p>

<p>But enterprises don’t buy demos. They buy systems they can trust, control, and explain to auditors.</p>

<p>The gap between your impressive AI and their signed contract is called governance. It’s not about making your AI less capable. It’s about making it <em>enterprise-capable</em>.</p>

<ul>
  <li><strong>Audit trails</strong> answer “what happened?”</li>
  <li><strong>Policy enforcement</strong> answers “what’s allowed?”</li>
  <li><strong>Human-in-the-loop</strong> answers “who’s responsible?”</li>
  <li><strong>Monitoring</strong> answers “what’s going wrong?”</li>
</ul>

<p>Add these, and you’re not just selling AI. You’re selling AI that enterprises can actually buy.</p>

<hr />

<p><strong>Ready to close enterprise deals?</strong></p>

<ul>
  <li><strong>GitHub</strong>: <a href="https://github.com/Bodaty/aictrlnet-community">Bodaty/aictrlnet-community</a> – Add governance to any AI</li>
  <li><strong>Documentation</strong>: <a href="https://docs.aictrlnet.com">docs.aictrlnet.com</a></li>
  <li><strong>Enterprise Trial</strong>: <a href="https://hitlai.net/trial">hitlai.net/trial</a></li>
</ul>

<p>Your AI is impressive. Let’s make it enterprise-ready.</p>

<hr />

<h2 id="references">References</h2>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>Gartner. (2025). “Survey Finds Just 15% of IT Application Leaders Are Considering, Piloting, or Deploying Fully Autonomous AI Agents.” <a href="https://www.gartner.com/en/newsroom/press-releases/2025-09-30-gartner-survey-finds-just-15-percent-of-it-application-leaders-are-considering-piloting-or-deploying-fully-autonomous-ai-agents">gartner.com/en/newsroom/press-releases/2025-09-30</a> <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:1:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:1:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p>McKinsey &amp; Company. (2025). “The State of AI: How Organizations Are Rewiring to Capture Value.” <a href="https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai-how-organizations-are-rewiring-to-capture-value">mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p>ISACA. (2025). “AI Use Is Outpacing Policy and Governance.” <a href="https://www.isaca.org/about-us/newsroom/press-releases/2025/ai-use-is-outpacing-policy-and-governance-isaca-finds">isaca.org/about-us/newsroom/press-releases/2025</a> <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:3:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p>European Commission. (2024). “The EU Artificial Intelligence Act: Article 14 - Human Oversight.” <a href="https://artificialintelligenceact.eu/article/14/">artificialintelligenceact.eu/article/14</a> <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:4:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:4:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a></p>
    </li>
    <li id="fn:5" role="doc-endnote">
      <p>NIST. (2025). “AI Risk Management Framework (AI RMF 1.0).” <a href="https://www.nist.gov/itl/ai-risk-management-framework">nist.gov/itl/ai-risk-management-framework</a> <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:6" role="doc-endnote">
      <p>Deloitte. (2026). “The State of AI in the Enterprise, 8th Edition.” <a href="https://www.deloitte.com/us/en/what-we-do/capabilities/applied-artificial-intelligence/content/state-of-ai-in-the-enterprise.html">deloitte.com/us/en/what-we-do/capabilities/applied-artificial-intelligence/content/state-of-ai-in-the-enterprise.html</a> <a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:6:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:7" role="doc-endnote">
      <p>Bain &amp; Company. (2025). “Executive Survey: AI Moves from Pilots to Production.” <a href="https://www.bain.com/insights/executive-survey-ai-moves-from-pilots-to-production/">bain.com/insights/executive-survey-ai-moves-from-pilots-to-production</a> <a href="#fnref:7" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Bobby Koritala</name></author><category term="governance" /><category term="enterprise" /><category term="sales" /><summary type="html"><![CDATA[Your AI is impressive. Your governance story is missing. Here's what enterprise buyers actually need to see — backed by procurement data and win/loss analysis.]]></summary></entry><entry><title type="html">AI Magic Has a Shelf Life</title><link href="https://aictrlnet.com/blog/2026/02/ai-magic-has-a-shelf-life/" rel="alternate" type="text/html" title="AI Magic Has a Shelf Life" /><published>2026-02-13T00:00:00+00:00</published><updated>2026-02-13T00:00:00+00:00</updated><id>https://aictrlnet.com/blog/2026/02/ai-magic-has-a-shelf-life</id><content type="html" xml:base="https://aictrlnet.com/blog/2026/02/ai-magic-has-a-shelf-life/"><![CDATA[<p>There’s a moment in every AI project that feels like pure magic.</p>

<p>You wire up the API. You send a prompt. And the AI just… <em>does the thing</em>. It writes the email. It analyzes the document. It makes the decision. It works.</p>

<p>Magic.</p>

<p>But here’s what nobody tells you: <strong>magic has a shelf life</strong>.</p>

<p>That magical moment — when everything just works — doesn’t last. What feels like magic today becomes technical debt tomorrow. The same flexibility that made it easy to get started becomes the brittleness that makes it impossible to scale.</p>

<p>And the difference between AI that stays magical and AI that becomes a maintenance nightmare? <strong>Governance.</strong></p>

<p>Google’s research team put it bluntly in their landmark paper on ML systems: “It is remarkably easy to incur massive ongoing maintenance costs at the system level when applying machine learning”<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. They found that in mature production systems, the actual machine learning code accounts for roughly 5% of the total codebase — the other 95% is configuration, data pipelines, monitoring, and all the infrastructure that keeps the magic alive. They called machine learning “the high-interest credit card of technical debt.”</p>

<p>That metaphor stuck with me, because it’s exactly what I’ve watched happen to dozens of AI projects. The magic is easy to borrow. The interest payments are what kill you.</p>

<h2 id="the-three-stages-of-ai-magic">The Three Stages of AI Magic</h2>

<h3 id="stage-1-holy-shit-it-works-week-1-4">Stage 1: “Holy Shit, It Works!” (Week 1-4)</h3>

<p>This is the honeymoon phase. Everything is amazing:</p>

<ul>
  <li>The AI understands natural language!</li>
  <li>It handles edge cases you didn’t anticipate!</li>
  <li>It’s so much faster than the manual process!</li>
  <li>You can’t believe how easy this was!</li>
</ul>

<p>You demo it to stakeholders. Everyone is impressed. You feel like a wizard.</p>

<p>This is the stage where AI projects get funded, champions get promoted, and blog posts get written. And honestly? The excitement is deserved. The capabilities really are transformative.</p>

<p><strong>What you don’t see yet</strong>: The AI is making subtle mistakes you haven’t noticed. The prompts work for your test cases but break on real data. There’s no logging, so you have no idea what’s actually happening. You have no baseline metrics, so you can’t measure degradation later even if you wanted to.</p>

<p>According to Gartner, at least 30% of generative AI projects will be abandoned after proof of concept by the end of 2025, due to poor data quality, inadequate risk controls, escalating costs, or unclear business value<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>. Most of those projects felt magical at Stage 1. The magic wasn’t the problem — the lack of infrastructure around the magic was.</p>

<h3 id="stage-2-wait-what-did-it-do-month-2-6">Stage 2: “Wait, What Did It Do?” (Month 2-6)</h3>

<p>The honeymoon ends. Reality sets in:</p>

<ul>
  <li>A customer complains about a weird response</li>
  <li>Someone asks “why did the AI decide that?” and you can’t answer</li>
  <li>The AI did something you didn’t expect, and you can’t reproduce it</li>
  <li>You realize you have no idea how many errors are happening</li>
  <li>The prompts that worked in testing fail on production data</li>
  <li>The model provider ships an update and your carefully tuned prompts break overnight</li>
</ul>

<p>You start adding patches. Retry logic. Error handling. Logging (finally). Special cases. Prompt tweaks. Each fix takes a day. Each fix breaks something else. You’re playing whack-a-mole with an increasingly complex system.</p>

<p><strong>What you don’t see yet</strong>: You’re building a house of cards. Every patch adds complexity. Every special case adds another thing to maintain. And the underlying model is drifting — the data distribution in production doesn’t match what you tested against.</p>

<p>This is model drift in action. Research shows that up to 91% of ML models suffer from model drift, and 32% of production scoring pipelines experience significant distributional shifts within the first six months of deployment<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>. Your model isn’t getting worse because the code is broken. It’s getting worse because the world is changing and the model isn’t keeping up.</p>

<p>The Zillow Offers catastrophe is the most expensive example. Zillow’s Zestimate algorithm — which had worked brilliantly in a stable real estate market — couldn’t adapt when pandemic-era conditions shifted the data distribution. The result: $528 million in losses in a single quarter, a 25% stock plunge, and 2,000+ layoffs<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>. The model was still running. It was still making predictions with high confidence. It was just confidently wrong, and nobody had the governance infrastructure to catch it before the damage was done.</p>

<h3 id="stage-3-this-is-a-nightmare-month-6">Stage 3: “This Is a Nightmare” (Month 6+)</h3>

<p>The magic is gone. Now you have:</p>

<ul>
  <li>Spaghetti prompts with 47 special cases</li>
  <li>No way to test changes without breaking something</li>
  <li>An audit trail that says “AI did a thing” with no details</li>
  <li>Escalating support tickets you can’t debug</li>
  <li>Fear of changing anything because you don’t understand how it works</li>
  <li>Technical debt that grows faster than you can pay it down</li>
</ul>

<p>You’re maintaining an AI system, but you don’t control it. The AI controls you.</p>

<p><strong>This is where most AI projects end up.</strong> Not because AI is bad, but because governance was an afterthought.</p>

<p>Stripe’s developer research found that the typical developer already spends 42% of their time dealing with technical debt and bad code — not building new features<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>. Now add AI-specific debt on top of that: model monitoring, prompt maintenance, retraining pipelines, data quality checks, drift detection. The maintenance burden doesn’t add to the existing 42%. It multiplies it.</p>

<h2 id="why-magic-decays">Why Magic Decays</h2>

<p>The decay from magic to nightmare follows a predictable pattern. And unlike traditional software bugs, AI decay is insidious — the system keeps running, keeps producing output, keeps looking functional. It just gradually gets worse.</p>

<div class="mermaid">
graph TD
    A["Week 1-4<br /><b>Peak Magic</b><br />Everything works!<br />Accuracy: 95%+"] --&gt; B["Month 2-3<br /><b>Silent Drift</b><br />Subtle errors appear<br />Accuracy: 88-92%"]
    B --&gt; C["Month 4-6<br /><b>Patch Cascade</b><br />Fixes create new bugs<br />Accuracy: 78-85%"]
    C --&gt; D["Month 6-9<br /><b>Confidence Erosion</b><br />Team stops trusting AI<br />Accuracy: 65-78%"]
    D --&gt; E["Month 9-12<br /><b>Maintenance Trap</b><br />More time fixing than building<br />Accuracy: Unmeasured"]
    E --&gt; F{"Outcome"}
    F --&gt;|"No Governance"| G["Project Abandoned<br />or Limps Along"]
    F --&gt;|"Governance Added Late"| H["Painful Retrofit<br />6+ month delay"]
    F --&gt;|"Governance from Day 1"| I["Sustained Value<br />Controlled evolution"]

    style A fill:#22c55e,color:#fff
    style B fill:#84cc16,color:#fff
    style C fill:#eab308,color:#000
    style D fill:#f97316,color:#fff
    style E fill:#ef4444,color:#fff
    style G fill:#991b1b,color:#fff
    style H fill:#d97706,color:#fff
    style I fill:#16a34a,color:#fff
</div>

<p>The curve above isn’t hypothetical. McKinsey’s 2025 State of AI report found that 88% of companies now use AI regularly — but only one-third have begun to scale their AI programs at the enterprise level. Two-thirds remain stuck in experiment or pilot mode<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>. They’re living somewhere on this curve, watching magic decay without the infrastructure to stop it.</p>

<h3 id="1-flexibility-becomes-fragility">1. Flexibility Becomes Fragility</h3>

<p>The same flexibility that made AI easy to start — “just send it natural language!” — makes it hard to control at scale.</p>

<ul>
  <li>You can’t test every possible input</li>
  <li>You can’t predict every possible output</li>
  <li>Small changes in prompts cause big changes in behavior</li>
  <li>The AI’s behavior drifts as the underlying model updates</li>
</ul>

<p>What felt like magic (“it handles anything!”) becomes a liability (“we have no idea what it will do”).</p>

<p>Traditional software has deterministic tests. You put in X, you get out Y, every time. AI doesn’t work that way. The same input can produce different outputs. The outputs change when the model updates. The boundary between “working correctly” and “failing silently” is fuzzy and constantly shifting.</p>

<p>Google’s research identified this as “entanglement” — in ML systems, changing anything changes everything. Adjusting one feature, tweaking one prompt, updating one data source can cascade through the entire system in unpredictable ways<sup id="fnref:1:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. There’s no isolation. There’s no modularity. The whole thing is one giant, entangled ball of learned correlations.</p>

<h3 id="2-speed-becomes-opacity">2. Speed Becomes Opacity</h3>

<p>The same speed that made AI impressive — “it made a decision in 200ms!” — makes it impossible to oversee.</p>

<ul>
  <li>1,000 decisions per day means no human can review them all</li>
  <li>Mistakes compound before anyone notices</li>
  <li>By the time you find a problem, it’s already affected hundreds of customers</li>
</ul>

<p>What felt like magic (“so fast!”) becomes a black box (“what is it doing in there?”).</p>

<p>This is the AI version of “move fast and break things” — except in production, breaking things means breaking customer trust, violating compliance requirements, and creating liabilities that compound silently. A financial model making wrong predictions for three days can move millions of dollars in the wrong direction before anyone catches it. A healthcare system misclassifying risk for a week can affect patient outcomes. Speed without observability isn’t an advantage. It’s a risk multiplier.</p>

<h3 id="3-autonomy-becomes-unpredictability">3. Autonomy Becomes Unpredictability</h3>

<p>The same autonomy that made AI powerful — “it just figures it out!” — makes it unreliable at scale.</p>

<ul>
  <li>The AI makes confident decisions based on incomplete information</li>
  <li>It optimizes for patterns in data that may not reflect reality</li>
  <li>It can’t tell you when it’s uncertain</li>
  <li>It doesn’t know what it doesn’t know</li>
</ul>

<p>What felt like magic (“it thinks for itself!”) becomes a risk (“we can’t trust it to think correctly”).</p>

<p>This isn’t a theoretical concern. MIT research has found that large language models are confidently wrong a significant percentage of the time — expressing high certainty on incorrect answers without any self-awareness of the error<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup>. When a human is uncertain, they hesitate, ask clarifying questions, express doubt. When an AI is uncertain, it often just picks the most probable completion and states it with the same confidence as everything else.</p>

<p>Without governance infrastructure that tracks confidence, routes uncertain decisions to humans, and monitors for patterns of failure, these confident-but-wrong decisions accumulate as invisible debt.</p>

<h2 id="the-technical-debt-youre-not-measuring">The Technical Debt You’re Not Measuring</h2>

<p>Most teams measure technical debt in code: “How much of our codebase needs refactoring?”</p>

<p>AI introduces entirely new categories of technical debt that traditional engineering metrics completely miss. Google’s research identified at least a dozen ML-specific debt patterns, but four hit hardest in practice:</p>

<h3 id="decision-debt">Decision Debt</h3>

<p>Every AI decision you can’t explain is debt. Every “why did it do that?” you can’t answer is debt. Eventually, you need to understand your system — and if you didn’t build for understanding, you’re bankrupt.</p>

<p>This is the debt that kills you in regulated industries. When an auditor asks “how does this system make decisions?” and the answer is “it learned patterns from training data,” that’s not an answer — that’s an admission that you don’t know. The EU AI Act, HIPAA, SOC 2, and a growing list of regulatory frameworks all require explainability. Decision debt is compliance risk with compound interest.</p>

<p><strong>Governance pays it down</strong>: Every action is logged with context — what input triggered it, what the confidence was, what alternatives were considered, what the outcome was. You can always explain what happened and why.</p>
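
<p>As a concrete illustration, here’s a minimal sketch of such a decision record in Python. The field names and the JSON-lines sink are assumptions for illustration, not AICtrlNet’s actual schema:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># A minimal sketch of a structured decision record.
# Field names are illustrative assumptions, not a fixed schema.
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json
import uuid

@dataclass
class DecisionRecord:
    input_summary: str   # what input triggered the action
    action: str          # what the AI decided to do
    confidence: float    # the model's confidence score
    alternatives: list   # other options that were considered
    risk: str            # low / medium / high / critical
    outcome: str = "pending"  # updated once the result is known
    decision_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def audit_log(record: DecisionRecord) -&gt; None:
    # Append-only JSON lines: cheap to write, easy to query later.
    with open("decisions.jsonl", "a") as f:
        f.write(json.dumps(asdict(record)) + "\n")
</code></pre></div></div>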

<h3 id="trust-debt">Trust Debt</h3>

<p>Every time the AI does something weird and you can’t prevent it from happening again, trust erodes. Customers lose confidence. Internal stakeholders get skeptical. The magic becomes “we don’t really trust it.”</p>

<p>Trust debt is the hardest to recover from because it’s emotional, not technical. Once a VP sees the AI make a bad call on their deal, that VP will never fully trust the system again — no matter how many improvements you make. Trust is asymmetric: it takes months to build and seconds to destroy.</p>

<p><strong>Governance pays it down</strong>: Policy enforcement means weird behavior is caught or prevented. Trust is built through demonstrable control — not “trust me, it works” but “here are the audit logs showing 99.7% accuracy over the last 90 days.”</p>

<h3 id="drift-debt">Drift Debt</h3>

<p>Every day your model runs without monitoring for drift is a day the gap between “what the model learned” and “what’s actually happening” grows wider. This isn’t a bug. It’s the fundamental nature of statistical models in a non-stationary world.</p>

<div class="mermaid">
graph LR
    subgraph "The Drift Debt Cycle"
        T["Training Data<br />(Historical)"] --&gt; M["Model<br />(Frozen Assumptions)"]
        M --&gt; P["Production Data<br />(Evolving Reality)"]
        P --&gt; G["Gap Widens<br />(Silent Degradation)"]
        G --&gt; E["Errors Compound<br />(Undetected)"]
        E --&gt; C["Crisis<br />(Visible Failure)"]
        C --&gt; R["Retrain<br />(Expensive Fix)"]
        R --&gt; T
    end

    subgraph "With Governance"
        T2["Training Data"] --&gt; M2["Model"]
        M2 --&gt; P2["Production Data"]
        P2 --&gt; MO["Monitor<br />(Drift Detection)"]
        MO --&gt;|"Alert"| A2["Adjust<br />(Early Intervention)"]
        A2 --&gt; M2
    end

    style G fill:#f97316,color:#fff
    style E fill:#ef4444,color:#fff
    style C fill:#991b1b,color:#fff
    style MO fill:#22c55e,color:#fff
    style A2 fill:#16a34a,color:#fff
</div>

<p>Without drift monitoring, you only discover degradation when something visibly breaks — a customer complaint, a compliance violation, a Zillow-scale financial loss. With governance, you catch drift early and intervene before it compounds.</p>

<p><strong>Governance pays it down</strong>: Continuous monitoring tracks model performance against baselines. Drift alerts trigger before errors reach customers. Retraining is proactive, not reactive.</p>
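
<p>What might that monitoring look like? Here’s a minimal sketch comparing production score distributions against a frozen baseline using the population stability index (PSI). The 0.2 alert threshold is a common rule of thumb rather than a universal constant, and <code>notify_oncall</code> is a placeholder for whatever alerting hook you actually use:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># A minimal drift check: compare production score distributions
# against a frozen baseline using the population stability index.
# The 0.2 threshold is a common rule of thumb, not a universal constant.
import numpy as np

def psi(baseline: np.ndarray, production: np.ndarray, bins: int = 10) -&gt; float:
    """Population stability index between baseline and production samples."""
    edges = np.histogram_bin_edges(baseline, bins=bins)
    expected, _ = np.histogram(baseline, bins=edges)
    actual, _ = np.histogram(production, bins=edges)
    # Convert counts to proportions; epsilon avoids log(0) and division by zero
    expected = expected / expected.sum() + 1e-6
    actual = actual / actual.sum() + 1e-6
    return float(np.sum((actual - expected) * np.log(actual / expected)))

def check_drift(baseline_scores, production_scores, threshold: float = 0.2) -&gt; float:
    score = psi(np.asarray(baseline_scores), np.asarray(production_scores))
    if score &gt;= threshold:
        notify_oncall(f"Drift alert: PSI={score:.3f}")  # placeholder alerting hook
    return score
</code></pre></div></div>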

<h3 id="compliance-debt">Compliance Debt</h3>

<p>Every AI action that you can’t audit is a liability. When regulators ask “how do you ensure AI decisions are fair, safe, and correct?” you need an answer.</p>

<p>And if you didn’t build audit infrastructure from the start, retrofitting it means reconstructing decision context that was never captured. You can’t log what happened six months ago. That data is gone.</p>

<p><strong>Governance pays it down</strong>: Audit trails, human-in-the-loop for high-risk decisions, and policy enforcement create the paper trail compliance requires — from day one, not as a retrofit.</p>

<h2 id="governance-how-magic-stays-magical">Governance: How Magic Stays Magical</h2>

<p>Governance isn’t about slowing AI down. It’s about keeping AI <em>useful</em> as you scale.</p>

<p>Here’s what governance adds at each decay point:</p>

<h3 id="flexibility--controlled-flexibility">Flexibility –&gt; Controlled Flexibility</h3>

<p>Instead of “AI does whatever it interprets,” you get “AI operates within defined boundaries.”</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Without governance: Anything goes
</span><span class="n">response</span> <span class="o">=</span> <span class="n">ai</span><span class="p">.</span><span class="n">generate</span><span class="p">(</span><span class="n">prompt</span><span class="p">)</span>
<span class="n">execute</span><span class="p">(</span><span class="n">response</span><span class="p">)</span>  <span class="c1"># What did we just do?
</span>
<span class="c1"># With governance: Bounded operations
</span><span class="n">action</span> <span class="o">=</span> <span class="n">ai</span><span class="p">.</span><span class="n">propose_action</span><span class="p">(</span><span class="n">context</span><span class="p">)</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">gateway</span><span class="p">.</span><span class="n">evaluate</span><span class="p">(</span><span class="n">action</span><span class="p">)</span>  <span class="c1"># Check against policy
</span>
<span class="k">if</span> <span class="n">result</span><span class="p">.</span><span class="n">decision</span> <span class="o">==</span> <span class="s">"ALLOW"</span><span class="p">:</span>
    <span class="n">execute</span><span class="p">(</span><span class="n">action</span><span class="p">)</span>
    <span class="n">audit_log</span><span class="p">(</span><span class="n">action</span><span class="p">,</span> <span class="n">result</span><span class="p">)</span>
<span class="k">elif</span> <span class="n">result</span><span class="p">.</span><span class="n">decision</span> <span class="o">==</span> <span class="s">"ESCALATE"</span><span class="p">:</span>
    <span class="n">route_to_human</span><span class="p">(</span><span class="n">action</span><span class="p">)</span>
</code></pre></div></div>

<p>The AI is still flexible within its boundaries. But the boundaries are explicit, testable, and auditable.</p>

<h3 id="speed--observed-speed">Speed –&gt; Observed Speed</h3>

<p>Instead of “AI decides in 200ms and we hope for the best,” you get “AI decides in 200ms and we know what happened.”</p>

<p>Every action is logged with:</p>
<ul>
  <li>What input triggered it</li>
  <li>What the AI’s confidence was</li>
  <li>What decision was made</li>
  <li>What the outcome was</li>
  <li>Whether a human was involved</li>
</ul>

<p>Speed stays. Visibility appears. And when something goes wrong three months later, you can trace it back to the exact decision, the exact input, and the exact confidence score.</p>
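
<p>Tracing back is then just a query over the decision records. Continuing the logging sketch from earlier (same illustrative field names):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Trace-back is a filter over the append-only log.
import json

def trace(path="decisions.jsonl", outcome=None):
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            if outcome is None or record["outcome"] == outcome:
                yield record

# Example: every decision that produced an error despite low confidence
suspicious = [r for r in trace(outcome="error") if r["confidence"] &lt; 0.7]
</code></pre></div></div>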

<h3 id="autonomy--graduated-autonomy">Autonomy –&gt; Graduated Autonomy</h3>

<p>Instead of “AI is fully autonomous or fully manual,” you get a spectrum. This is why we built graduated autonomy phases (which I’ve written about separately) — you start supervised and earn trust through demonstrated reliability.</p>

<p>The key insight: autonomy isn’t binary. An AI system that’s been running for six months with a 99.5% accuracy rate on low-risk decisions has <em>earned</em> more autonomy than a system deployed last week. But earning autonomy requires data — audit logs, accuracy metrics, drift measurements — that only exist if you built governance in from the start.</p>

<p>Without governance, you have no data to justify more autonomy. Without data, you’re stuck in either “approve everything manually” (slow) or “let the AI do whatever” (risky). Neither scales. Graduated autonomy does.</p>
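
<p>In code, “earning autonomy” can be as simple as a promotion check over the audit log. A sketch, where the sample size and accuracy bar are illustrative policy choices rather than fixed rules:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># A data-driven autonomy promotion check. The sample size and
# accuracy bar are illustrative policy choices, not fixed rules.
def eligible_for_promotion(records, min_decisions: int = 1000,
                           min_accuracy: float = 0.995) -&gt; bool:
    low_risk = [r for r in records if r["risk"] == "low"]
    if len(low_risk) &lt; min_decisions:
        return False  # not enough evidence to justify more autonomy
    accuracy = sum(r["outcome"] == "correct" for r in low_risk) / len(low_risk)
    return accuracy &gt;= min_accuracy
</code></pre></div></div>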

<h2 id="the-shelf-life-extender">The Shelf Life Extender</h2>

<p>Here’s the pattern that keeps AI magic alive:</p>

<p><strong>Week 1</strong>: Ship with basic governance from day one (a minimal policy sketch follows this list)</p>
<ul>
  <li>Log all AI actions with structured context</li>
  <li>Define high/medium/low risk categories</li>
  <li>Require human approval for high-risk decisions</li>
  <li>Build audit trails into the architecture, not as an afterthought</li>
  <li>Establish baseline performance metrics so you can detect degradation later</li>
</ul>
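
<p>Those risk categories work best as data rather than scattered code paths. A minimal sketch, where the category names and routes are assumptions to adapt:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Day-one risk policy as data: easy to review, version, and audit.
# Categories, examples, and routes are illustrative; define your own.
RISK_POLICY = {
    "low":      "auto_approve",      # e.g., status lookups, draft replies
    "medium":   "execute_and_flag",  # e.g., outbound customer emails
    "high":     "human_approval",    # e.g., refunds, contract changes
    "critical": "human_only",        # e.g., legal or medical decisions
}

def route_for(risk_category: str) -&gt; str:
    # Unknown categories fail safe: escalate to a human.
    return RISK_POLICY.get(risk_category, "human_approval")
</code></pre></div></div>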

<p><strong>Month 1</strong>: Calibrate based on reality</p>
<ul>
  <li>Review the logs — actual production behavior vs. what you expected</li>
  <li>Identify patterns in errors and edge cases</li>
  <li>Adjust risk categories based on real data</li>
  <li>Tighten policies where the AI surprised you</li>
  <li>Start measuring model drift against your baseline</li>
</ul>

<p><strong>Month 3</strong>: Graduate carefully</p>
<ul>
  <li>Move low-error categories to higher autonomy — based on data, not hope</li>
  <li>Keep high-error categories under supervision</li>
  <li>Build confidence through measurable performance, not demos</li>
  <li>Document what you’ve learned for compliance and audit</li>
</ul>

<p><strong>Month 6+</strong>: Maintain and evolve</p>
<ul>
  <li>Continuous monitoring of error rates and drift metrics</li>
  <li>Regular policy reviews as the business context changes</li>
  <li>Gradual autonomy increases based on sustained performance</li>
  <li>Proactive retraining before drift becomes a crisis</li>
  <li>Governance debt stays paid because you never stopped paying it</li>
</ul>

<p>The AI that’s still magical in month 12 is the one that was governed from month 1. The AI that was abandoned in month 6 is the one that treated governance as “something we’ll add later.”</p>

<h2 id="the-honest-truth-about-ai-magic">The Honest Truth About AI Magic</h2>

<p>AI magic is real. The capabilities are genuinely transformative. The productivity gains are substantial. The future is exciting.</p>

<p>But magic without management is just chaos with good marketing.</p>

<p>The teams that succeed with AI long-term aren’t the ones who shipped the most impressive demos. They’re the ones who built systems that stay controllable as they scale. They’re the one-third of organizations that McKinsey identified as actually scaling AI, rather than the two-thirds still stuck in pilot mode wondering why the magic faded<sup id="fnref:6:1" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>.</p>

<ul>
  <li>Governance isn’t friction. It’s infrastructure.</li>
  <li>Audit trails aren’t overhead. They’re insurance.</li>
  <li>Human oversight isn’t a fallback. It’s a feature.</li>
  <li>Drift monitoring isn’t paranoia. It’s basic engineering.</li>
</ul>

<p>Your AI magic has a shelf life. Governance extends it. And the best time to start governing is before the magic fades — not after.</p>

<hr />

<p><strong>Start governing before the magic fades:</strong></p>

<ul>
  <li><strong>GitHub</strong>: <a href="https://github.com/Bodaty/aictrlnet-community">Bodaty/aictrlnet-community</a></li>
  <li><strong>Docs</strong>: <a href="https://docs.aictrlnet.com">docs.aictrlnet.com</a></li>
  <li><strong>Trial</strong>: <a href="https://hitlai.net/trial">hitlai.net/trial</a></li>
</ul>

<p>The magic is real. Make it last.</p>

<hr />

<h2 id="references">References</h2>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>Sculley, D. et al. (2015). “Hidden Technical Debt in Machine Learning Systems.” <em>Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS 2015)</em>. <a href="https://research.google/pubs/hidden-technical-debt-in-machine-learning-systems/">research.google/pubs/hidden-technical-debt-in-machine-learning-systems</a> <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:1:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p>Gartner. (2024). “Gartner Predicts 30% of Generative AI Projects Will Be Abandoned After Proof of Concept By End of 2025.” <a href="https://www.gartner.com/en/newsroom/press-releases/2024-07-29-gartner-predicts-30-percent-of-generative-ai-projects-will-be-abandoned-after-proof-of-concept-by-end-of-2025">gartner.com/en/newsroom/press-releases/2024-07-29-gartner-predicts-30-percent-of-generative-ai-projects-will-be-abandoned</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p>Evidently AI. (2024). “What is Data Drift in ML, and How to Detect and Handle It.” <a href="https://www.evidentlyai.com/ml-in-production/data-drift">evidentlyai.com/ml-in-production/data-drift</a> <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p>CNN Business. (2021). “Zillow’s Home-Buying Debacle Shows How Hard It Is to Use AI to Value Real Estate.” <a href="https://edition.cnn.com/2021/11/09/tech/zillow-ibuying-home-zestimate">edition.cnn.com/2021/11/09/tech/zillow-ibuying-home-zestimate</a> <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:5" role="doc-endnote">
      <p>Stripe. (2018). “The Developer Coefficient: How Developer Productivity Unlocks Global GDP.” <a href="https://stripe.com/files/reports/the-developer-coefficient.pdf">stripe.com/files/reports/the-developer-coefficient.pdf</a> <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:6" role="doc-endnote">
      <p>McKinsey &amp; Company. (2025). “The State of AI in 2025: Agents, Innovation, and Transformation.” <a href="https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai">mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai</a> <a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:6:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:7" role="doc-endnote">
      <p>MIT CSAIL. (2024). “Calibrating Large Language Model Confidence.” <a href="https://www.csail.mit.edu/research">csail.mit.edu/research</a> <a href="#fnref:7" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Bobby Koritala</name></author><category term="governance" /><category term="ai-agents" /><category term="technical-debt" /><summary type="html"><![CDATA[What feels like magic today becomes technical debt tomorrow. Governance isn't friction — it's what separates toys from tools.]]></summary></entry><entry><title type="html">The 1% Problem: Why 99% Accurate AI Isn’t Good Enough</title><link href="https://aictrlnet.com/blog/2026/02/the-one-percent-problem/" rel="alternate" type="text/html" title="The 1% Problem: Why 99% Accurate AI Isn’t Good Enough" /><published>2026-02-12T00:00:00+00:00</published><updated>2026-02-12T00:00:00+00:00</updated><id>https://aictrlnet.com/blog/2026/02/the-one-percent-problem</id><content type="html" xml:base="https://aictrlnet.com/blog/2026/02/the-one-percent-problem/"><![CDATA[<p>Let’s do some math that will ruin your day.</p>

<p>Your AI is 99% accurate. That sounds great, right? Ninety-nine percent! A-plus! Ship it!</p>

<p>Now let’s see what 99% accuracy actually means at scale.</p>

<h2 id="the-math-nobody-wants-to-do">The Math Nobody Wants to Do</h2>

<p><strong>1,000 decisions per day × 99% accuracy = 10 mistakes per day</strong></p>

<p>That’s 10 wrong answers. 10 bad recommendations. 10 actions that shouldn’t have happened. Every single day.</p>

<p><strong>10 mistakes × 30 days = 300 mistakes per month</strong></p>

<p>Still feeling good about 99%?</p>

<p>Let’s keep going.</p>

<p><strong>300 mistakes × 12 months = 3,600 mistakes per year</strong></p>

<p>Three thousand six hundred times your AI confidently did the wrong thing. And every one of those mistakes happened while the system believed it was right.</p>

<p>Now here’s the question that matters: <strong>What did those mistakes cost?</strong></p>

<h2 id="the-accuracy-you-measured-isnt-the-accuracy-you-have">The Accuracy You Measured Isn’t the Accuracy You Have</h2>

<p>Before we talk about cost, we need to talk about a dirty secret in AI: your 99% accuracy probably isn’t 99% in production.</p>

<p>Accuracy measurements are almost always done on test sets—carefully curated data that represents the happy path. Production accuracy is consistently lower, and the gap can be enormous.</p>

<p>Stanford’s 2025 AI Index Report found that LLMs still struggle significantly on complex reasoning benchmarks, and the report specifically warns about overfitting—models that perform exceptionally well on specific benchmark tests but fail to generalize to new, unseen data in real-world applications<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. When the Stanford HAI team studied LLM performance on legal queries, they found hallucination rates between 69% and 88%<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>. That’s not a rounding error. That’s a system confidently making up answers the vast majority of the time, in a domain where accuracy matters most.</p>

<p>Why does production accuracy diverge so sharply from test accuracy? Five reasons:</p>

<ol>
  <li><strong>Distribution shift</strong>: Production data doesn’t match training data. Your customers find inputs you never imagined.</li>
  <li><strong>Edge cases at scale</strong>: When you process a million requests, the one-in-ten-thousand edge case shows up a hundred times.</li>
  <li><strong>Adversarial conditions</strong>: Some users actively try to break things. Your test set didn’t include them.</li>
  <li><strong>Cascading errors</strong>: One wrong decision corrupts the input for the next decision. Error compounds on error.</li>
  <li><strong>Confidence vs. correctness</strong>: AI can be supremely confident and supremely wrong. OpenAI’s o3 series exhibited hallucination rates of 33–51% on factual recall benchmarks, more than double earlier models<sup id="fnref:2:1" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>.</li>
</ol>

<p>That 99% accuracy you measured in testing? It might be 95% in production. Or 90%.</p>

<p><strong>At 95% accuracy: 50 mistakes per day. 18,000 per year.</strong></p>

<p><strong>At 90% accuracy: 100 mistakes per day. 36,000 per year.</strong></p>

<p>Still think you don’t need governance?</p>

<h2 id="what-mistakes-actually-cost-a-framework-with-real-examples">What Mistakes Actually Cost: A Framework with Real Examples</h2>

<p>Not all AI mistakes are equal. A chatbot giving a slightly awkward response is a shrug. An AI approving a fraudulent transaction is a lawsuit. The distribution of those mistakes is what determines whether your AI is a business asset or a ticking bomb.</p>

<p>Here’s a framework for thinking about mistake severity, grounded in real incidents:</p>

<table>
  <thead>
    <tr>
      <th>Severity</th>
      <th>Example</th>
      <th>Real-World Precedent</th>
      <th>Estimated Cost Range</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Minor</strong></td>
      <td>Wrong product recommendation, awkward phrasing</td>
      <td>DPD’s chatbot writing poems calling DPD “the worst delivery firm in the world” (Jan 2024)</td>
      <td>Reputational damage, social media virality</td>
    </tr>
    <tr>
      <td><strong>Moderate</strong></td>
      <td>Incorrect information requiring human rework</td>
      <td>Air Canada’s chatbot inventing a bereavement discount policy that didn’t exist—airline held liable for $812 in damages<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup></td>
      <td>$100–$5K per incident in rework + liability</td>
    </tr>
    <tr>
      <td><strong>Major</strong></td>
      <td>Systematic pricing errors, compliance violations</td>
      <td>Zillow’s iBuying algorithm overestimating home values, leading to $528M in losses in a single quarter and the shutdown of the entire division<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup></td>
      <td>$10K–$1M+ per incident depending on blast radius</td>
    </tr>
    <tr>
      <td><strong>Critical</strong></td>
      <td>Patient safety errors, regulatory violations, systemic financial exposure</td>
      <td>Healthcare AI systems where, as STAT News reported, companies positioned “AI chatbots between patients and clinicians without addressing fundamental questions about medical liability”<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup></td>
      <td>Litigation, regulatory action, loss of operating license</td>
    </tr>
  </tbody>
</table>

<p>I want to be clear: I’m not giving you a neat cost-per-mistake formula, because anyone who claims to have one is selling you something. The cost of a mistake depends entirely on context—your industry, your customers, your regulatory environment, and your blast radius.</p>

<p>What I am telling you is that at 3,600 mistakes per year, even a favorable distribution hits the major and critical categories regularly. Random chance guarantees it.</p>

<h2 id="the-scale-problem-is-getting-worse-not-better">The Scale Problem Is Getting Worse, Not Better</h2>

<p>The AI Incident Database, maintained by the Responsible AI Collaborative, has documented a sharp increase in AI-related incidents. Stanford’s 2025 AI Index Report found that documented AI safety incidents surged from 149 in 2023 to 233 in 2024—a 56.4% increase in a single year<sup id="fnref:1:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. The AIAAIC Repository, which tracks AI-related controversies more broadly, had cataloged over 1,000 incidents and 411 distinct issues by September 2024<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>.</p>

<p>This isn’t because AI is getting worse. It’s because AI is getting deployed at scale.</p>

<p>When you have a few hundred AI decisions per day in a controlled pilot, the 1% rarely matters. When you scale to thousands or millions of daily decisions across production workloads, the 1% becomes a statistical certainty. Every day.</p>

<div class="mermaid">
graph TD
    subgraph scale["The Scale Multiplier"]
        Pilot["Pilot: 100 decisions/day<br />1 mistake/day<br />365 mistakes/year"]
        Prod["Production: 10,000 decisions/day<br />100 mistakes/day<br />36,500 mistakes/year"]
        Enterprise["Enterprise Scale: 1M decisions/day<br />10,000 mistakes/day<br />3.65M mistakes/year"]
    end

    Pilot --&gt;|"10x scale"| Prod
    Prod --&gt;|"100x scale"| Enterprise

    Pilot --- P1["Manageable.<br />Humans catch most."]
    Prod --- P2["Dangerous.<br />Humans can't review all."]
    Enterprise --- P3["Catastrophic without governance.<br />No human can keep up."]

    style Pilot fill:#e6ffe6,stroke:#00cc00
    style Prod fill:#fff0e6,stroke:#cc6600
    style Enterprise fill:#ffe6e6,stroke:#cc0000
    style P1 fill:#e6ffe6,stroke:#00cc00
    style P2 fill:#fff0e6,stroke:#cc6600
    style P3 fill:#ffe6e6,stroke:#cc0000
</div>

<p>And here’s the kicker: the insurance industry is catching on. The Insurance Services Office (ISO) has introduced Generative AI exclusions for commercial general liability policies. Berkley has rolled out the first “Absolute” AI exclusion in several specialty liability lines. If your AI makes a mistake and you don’t have governance to demonstrate due diligence, your insurance may not cover it<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup>.</p>

<p>Let me say that again: <strong>your general liability insurance may explicitly exclude AI-caused harm.</strong></p>

<h2 id="the-real-question-which-1-gets-through">The Real Question: Which 1% Gets Through?</h2>

<p>Here’s what keeps me up at night: you don’t get to choose which mistakes happen.</p>

<p>Random chance means your 1% failure rate will eventually hit:</p>

<ul>
  <li>The VIP customer account</li>
  <li>The regulatory compliance workflow</li>
  <li>The financial transaction that triggers an audit</li>
  <li>The healthcare decision that affects patient safety</li>
  <li>The security action that opens a vulnerability</li>
</ul>

<p>You can’t predict when. You can’t prevent it entirely. You can only be ready when it happens.</p>

<p>And “being ready” means having <strong>governance in place before you need it</strong>.</p>

<p>NIST’s AI Risk Management Framework, updated in 2024, now explicitly treats AI as a “living system requiring continuous governance”—not a one-time compliance checkbox<sup id="fnref:5:1" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>. The framework emphasizes that valid and reliable AI is the foundation of trustworthiness, and that reliability must be continuously measured in production, not just on test sets.</p>

<h2 id="what-governance-actually-does">What Governance Actually Does</h2>

<p>Governance doesn’t make your AI more accurate. It makes your AI’s <em>mistakes</em> less catastrophic.</p>

<p>Here’s how:</p>

<h3 id="1-risk-based-routing">1. Risk-Based Routing</h3>

<p>Not every decision needs the same level of oversight. Governance routes decisions based on risk:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Low risk (80% of decisions) → Auto-approve
Medium risk (15%)           → Flag for review
High risk (4%)              → Require approval
Critical risk (1%)          → Human-only
</code></pre></div></div>

<p>You still get the speed benefits of AI for routine decisions. You just add safety checks where they matter.</p>

<h3 id="2-confidence-thresholds">2. Confidence Thresholds</h3>

<p>When AI isn’t sure, it shouldn’t guess. Governance lets you set confidence thresholds:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="n">confidence</span> <span class="o">&gt;</span> <span class="mf">0.95</span><span class="p">:</span>
    <span class="c1"># High confidence → auto-execute
</span>    <span class="n">execute</span><span class="p">(</span><span class="n">action</span><span class="p">)</span>
<span class="k">elif</span> <span class="n">confidence</span> <span class="o">&gt;</span> <span class="mf">0.70</span><span class="p">:</span>
    <span class="c1"># Medium confidence → execute but flag for review
</span>    <span class="n">execute</span><span class="p">(</span><span class="n">action</span><span class="p">)</span>
    <span class="n">queue_for_review</span><span class="p">(</span><span class="n">action</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
    <span class="c1"># Low confidence → escalate to human
</span>    <span class="n">escalate</span><span class="p">(</span><span class="n">action</span><span class="p">)</span>
</code></pre></div></div>

<p>The AI still does the work. But it asks for help when it’s not sure. This matters because research consistently shows that LLMs are often confidently wrong—high certainty on incorrect answers is not an edge case, it’s a feature of how these models work.</p>

<h3 id="3-blast-radius-limits">3. Blast Radius Limits</h3>

<p>Even when mistakes happen, governance limits the damage:</p>

<ul>
  <li><strong>Transaction limits</strong>: AI can approve up to $1,000. Above that, human approval required.</li>
  <li><strong>Rate limits</strong>: AI can send 100 emails per hour. Above that, pause and review.</li>
  <li><strong>Rollback windows</strong>: AI actions can be undone within 15 minutes. After that, they’re permanent.</li>
</ul>

<p>One mistake becomes one mistake—not a cascade of thousands. This is the difference between Zillow losing $528M because an unchecked algorithm ran wild, and catching a pricing anomaly before it compounds.</p>
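
<p>A sketch of what these limits look like in code. The caps, rate, and window mirror the examples above; the <code>action</code> fields (<code>kind</code>, <code>amount</code>, <code>executed_at</code>) are assumptions for illustration:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Blast-radius limits as explicit, testable constants.
# The caps, rate, and rollback window mirror the examples above;
# the action fields (kind, amount, executed_at) are illustrative.
from datetime import datetime, timedelta, timezone

MAX_TRANSACTION_USD = 1_000
MAX_EMAILS_PER_HOUR = 100
ROLLBACK_WINDOW = timedelta(minutes=15)

def within_blast_radius(action, emails_sent_this_hour: int) -&gt; bool:
    if action.kind == "transaction" and action.amount &gt; MAX_TRANSACTION_USD:
        return False  # above the AI's spending authority: require approval
    if action.kind == "email" and emails_sent_this_hour &gt;= MAX_EMAILS_PER_HOUR:
        return False  # rate limit hit: pause and review
    return True

def can_rollback(action) -&gt; bool:
    return datetime.now(timezone.utc) - action.executed_at &lt; ROLLBACK_WINDOW
</code></pre></div></div>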

<h3 id="4-audit-trails">4. Audit Trails</h3>

<p>When the 1% happens (and it will), you need to know:</p>

<ul>
  <li>What decision was made</li>
  <li>What data informed it</li>
  <li>What the AI’s confidence was</li>
  <li>Why it wasn’t caught</li>
  <li>How to prevent it next time</li>
</ul>

<p>Governance creates the paper trail that turns mistakes into learning opportunities—and keeps you out of the courtroom. Air Canada learned this the hard way when a tribunal ruled they were liable for their chatbot’s fabricated policies, specifically because the company couldn’t demonstrate adequate oversight<sup id="fnref:3:1" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>.</p>

<h2 id="the-allow--deny--escalate-framework">The ALLOW / DENY / ESCALATE Framework</h2>

<p>This is the core of how AICtrlNet handles the 1% problem. Every action gets one of three decisions:</p>

<p><strong>ALLOW</strong>: The action is low-risk and within policy. Execute automatically. This is most actions—you don’t want governance slowing down routine work.</p>

<p><strong>DENY</strong>: The action violates policy or exceeds limits. Block it. Log why. Alert if needed. This catches the obviously wrong actions before they happen.</p>

<p><strong>ESCALATE</strong>: The action is high-risk, low-confidence, or ambiguous. Route it to a human. This is where the 1% gets caught—not by making AI smarter, but by adding human judgment where it matters.</p>

<div class="mermaid">
graph TD
    Action["AI Proposes Action"] --&gt; Gateway["Runtime Gateway<br />Evaluates Risk + Confidence"]

    Gateway --&gt;|"Low risk, high confidence"| ALLOW["ALLOW<br />Auto-execute + Log"]
    Gateway --&gt;|"Policy violation or limit exceeded"| DENY["DENY<br />Block + Alert + Log"]
    Gateway --&gt;|"High risk or low confidence"| ESCALATE["ESCALATE<br />Route to Human"]

    ESCALATE --&gt; Human["Human Reviews<br />Full Context Preserved"]
    Human --&gt;|"Approve"| Execute["Execute + Log"]
    Human --&gt;|"Reject"| Block["Block + Log"]

    ALLOW --- A1["99% of decisions<br />Full speed, no bottleneck"]
    DENY --- D1["Obvious violations<br />Caught before damage"]
    ESCALATE --- E1["The 1% that matters<br />Human judgment applied"]

    style ALLOW fill:#e6ffe6,stroke:#00cc00
    style DENY fill:#ffe6e6,stroke:#cc0000
    style ESCALATE fill:#fff0e6,stroke:#cc6600
    style Human fill:#e6e6ff,stroke:#0000cc
    style A1 fill:#e6ffe6,stroke:#00cc00
    style D1 fill:#ffe6e6,stroke:#cc0000
    style E1 fill:#fff0e6,stroke:#cc6600
</div>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">aictrlnet</span> <span class="kn">import</span> <span class="n">RuntimeGateway</span>

<span class="n">gateway</span> <span class="o">=</span> <span class="n">RuntimeGateway</span><span class="p">()</span>

<span class="k">for</span> <span class="n">action</span> <span class="ow">in</span> <span class="n">ai_proposed_actions</span><span class="p">:</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">gateway</span><span class="p">.</span><span class="n">evaluate</span><span class="p">(</span><span class="n">action</span><span class="p">)</span>

    <span class="k">if</span> <span class="n">result</span><span class="p">.</span><span class="n">decision</span> <span class="o">==</span> <span class="s">"ALLOW"</span><span class="p">:</span>
        <span class="n">execute</span><span class="p">(</span><span class="n">action</span><span class="p">)</span>
        <span class="n">log</span><span class="p">(</span><span class="n">action</span><span class="p">,</span> <span class="s">"auto_approved"</span><span class="p">)</span>

    <span class="k">elif</span> <span class="n">result</span><span class="p">.</span><span class="n">decision</span> <span class="o">==</span> <span class="s">"DENY"</span><span class="p">:</span>
        <span class="n">reject</span><span class="p">(</span><span class="n">action</span><span class="p">)</span>
        <span class="n">log</span><span class="p">(</span><span class="n">action</span><span class="p">,</span> <span class="s">"blocked"</span><span class="p">,</span> <span class="n">result</span><span class="p">.</span><span class="n">reason</span><span class="p">)</span>

    <span class="k">elif</span> <span class="n">result</span><span class="p">.</span><span class="n">decision</span> <span class="o">==</span> <span class="s">"ESCALATE"</span><span class="p">:</span>
        <span class="n">ticket</span> <span class="o">=</span> <span class="n">create_approval_request</span><span class="p">(</span><span class="n">action</span><span class="p">,</span> <span class="n">result</span><span class="p">.</span><span class="n">reason</span><span class="p">)</span>
        <span class="n">notify_approver</span><span class="p">(</span><span class="n">ticket</span><span class="p">)</span>
        <span class="n">log</span><span class="p">(</span><span class="n">action</span><span class="p">,</span> <span class="s">"escalated"</span><span class="p">,</span> <span class="n">result</span><span class="p">.</span><span class="n">reason</span><span class="p">)</span>
</code></pre></div></div>

<p>The 1% that would have been mistakes? They’re now approval requests. The 99% that’s fine? Still fast, still automated.</p>

<h2 id="the-control-spectrum-matching-oversight-to-the-1-problem">The Control Spectrum: Matching Oversight to the 1% Problem</h2>

<p>Different situations demand different levels of governance—and the right level depends on how much damage that 1% can do. A 99% accurate AI sorting internal support tickets needs different oversight than a 99% accurate AI making lending decisions.</p>

<p>That’s why we built the Control Spectrum, and it maps directly to the 1% problem:</p>

<table>
  <thead>
    <tr>
      <th>Phase</th>
      <th>AI Does</th>
      <th>Human Does</th>
      <th>When the 1% Hits…</th>
      <th>Good For</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1. Foundation</td>
      <td>Suggests</td>
      <td>Decides + Acts</td>
      <td>Human catches it before action</td>
      <td>High-risk domains, new deployments</td>
    </tr>
    <tr>
      <td>2. Assistance</td>
      <td>Drafts</td>
      <td>Reviews + Acts</td>
      <td>Human catches it during review</td>
      <td>Trust-building, regulated workflows</td>
    </tr>
    <tr>
      <td>3. Automation</td>
      <td>Acts (low-risk)</td>
      <td>Reviews exceptions</td>
      <td>Governance flags it for review</td>
      <td>Routine ops with known risk profiles</td>
    </tr>
    <tr>
      <td>4. Optimization</td>
      <td>Optimizes</td>
      <td>Monitors</td>
      <td>Monitoring alerts on anomalies</td>
      <td>Mature, stable workflows</td>
    </tr>
    <tr>
      <td>5. Intelligence</td>
      <td>Decides (medium-risk)</td>
      <td>Oversees</td>
      <td>Audit trail enables post-hoc review</td>
      <td>Clear policies, validated AI</td>
    </tr>
    <tr>
      <td>6. Autonomy</td>
      <td>Operates</td>
      <td>Audits</td>
      <td>Blast radius limits contain damage</td>
      <td>Fully validated, bounded scope</td>
    </tr>
  </tbody>
</table>

<p>The key insight: <strong>a 99% accurate AI at Phase 6 autonomy is dangerous. The same AI at Phase 3 automation—with human review of exceptions—is production-ready.</strong></p>

<p>You don’t have to choose between “AI does everything” and “humans do everything.” You calibrate oversight to risk. And as trust builds, you move up the spectrum—never faster than your governance can support.</p>

<h2 id="what-99-accuracy--governance-looks-like">What 99% Accuracy + Governance Looks Like</h2>

<p>Let’s redo our math with governance in place.</p>

<p><strong>1,000 decisions per day:</strong></p>
<ul>
  <li>850 low-risk → auto-approved (no change, full speed)</li>
  <li>120 medium-risk → executed with review flag</li>
  <li>25 high-risk → human approval required</li>
  <li>5 critical → human-only</li>
</ul>

<p><strong>Of the 10 daily mistakes (1% of 1,000):</strong></p>
<ul>
  <li>8 are caught by review flags or approval gates before they cause damage</li>
  <li>2 slip through but are contained by blast radius limits and rollback windows</li>
</ul>

<p><strong>The difference:</strong></p>
<ul>
  <li>Without governance, 3,600 mistakes per year run unchecked. Some will be catastrophic. You won’t know which ones until after the damage is done.</li>
  <li>With governance, the same 3,600 mistakes are triaged. Most are caught. The ones that slip through are contained. Every one is logged for analysis and improvement.</li>
</ul>

<p>Same AI. Same accuracy. Dramatically different outcome.</p>

<p>This isn’t theoretical. This is the difference between Zillow—where an unconstrained algorithm accumulated $528M in losses before anyone intervened<sup id="fnref:4:1" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>—and a system that would have flagged the pricing anomalies on day one.</p>

<h2 id="the-insurance-argument-youll-need-soon">The Insurance Argument You’ll Need Soon</h2>

<p>Here’s a trend worth watching closely: the insurance industry is repricing AI risk.</p>

<p>Armilla Insurance Services, underwritten by Lloyd’s of London, launched an AI-specific liability insurance product that explicitly covers hallucinations, degrading model performance, and algorithmic failures. But here’s the catch: to qualify for coverage, you need to demonstrate governance controls<sup id="fnref:7:1" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup>.</p>

<p>Meanwhile, traditional insurers are moving the other direction—adding AI exclusions to existing policies. The ISO’s new Generative AI exclusions for commercial general liability policies mean that claims for bodily injury, property damage, and advertising injury arising from AI may not be covered.</p>

<p>The message from the insurance industry is clear: <strong>if you’re deploying AI without governance, you’re self-insuring against AI risk.</strong> And if the Zillow and Air Canada cases taught us anything, it’s that AI risk is real, quantifiable, and expensive.</p>

<p>Companies that can demonstrate governance—audit trails, confidence-based routing, human oversight, blast radius controls—will get better coverage at better rates. Companies that can’t will face exclusions, higher premiums, or no coverage at all.</p>

<h2 id="the-bottom-line">The Bottom Line</h2>

<p>99% accuracy isn’t good enough. Not because 99% is bad—it’s genuinely impressive—but because 1% at scale is a lot of mistakes. And the gap between test accuracy and production accuracy means your actual error rate is probably worse than you think.</p>

<p>Governance doesn’t make your AI smarter. It makes the mistakes that do happen smaller, catchable, and recoverable.</p>

<p>The question isn’t “how accurate is your AI?”</p>

<p>The question is “what happens when your AI is wrong?”</p>

<p>If you don’t have an answer, you need governance. And if you’re waiting until something goes wrong to build it, you’re already too late. Just ask Zillow.</p>

<hr />

<p><strong>Add governance to your AI:</strong></p>

<ul>
  <li><strong>GitHub</strong>: <a href="https://github.com/Bodaty/aictrlnet-community">Bodaty/aictrlnet-community</a></li>
  <li><strong>Documentation</strong>: <a href="https://docs.aictrlnet.com">docs.aictrlnet.com</a></li>
  <li><strong>Free Trial</strong>: <a href="https://hitlai.net/trial">hitlai.net/trial</a></li>
</ul>

<p>The 1% is coming. Be ready.</p>

<hr />

<h2 id="references">References</h2>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>Stanford HAI. (2025). “The 2025 AI Index Report.” <a href="https://hai.stanford.edu/ai-index/2025-ai-index-report">hai.stanford.edu/ai-index/2025-ai-index-report</a> <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:1:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p>All About AI. (2026). “AI Hallucination Report: Which AI Hallucinates the Most?” Aggregating research from Stanford HAI, OpenAI, and Google on production hallucination rates across models and domains. <a href="https://www.allaboutai.com/resources/ai-statistics/ai-hallucinations/">allaboutai.com/resources/ai-statistics/ai-hallucinations</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:2:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p>British Columbia Civil Resolution Tribunal. (2024). “Moffatt v. Air Canada.” Tribunal held Air Canada liable for its chatbot’s fabricated bereavement fare discount policy. <a href="https://www.cbc.ca/news/canada/british-columbia/air-canada-chatbot-lawsuit-1.7116416">cbc.ca/news/canada/british-columbia/air-canada-chatbot-lawsuit-1.7116416</a> <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:3:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p>Stanford Graduate School of Business. (2021). “Flip Flop: Why Zillow’s Algorithmic Home Buying Venture Imploded.” Analysis of Zillow’s $528M Q3 2021 loss from algorithmic pricing errors in its iBuying division. <a href="https://www.gsb.stanford.edu/insights/flip-flop-why-zillows-algorithmic-home-buying-venture-imploded">gsb.stanford.edu/insights/flip-flop-why-zillows-algorithmic-home-buying-venture-imploded</a> <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:4:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:5" role="doc-endnote">
      <p>NIST. (2024). “AI Risk Management Framework (AI RMF 1.0) and Generative AI Profile (NIST-AI-600-1).” Framework treats AI as a living system requiring continuous governance, with valid and reliable operation as the foundation of trustworthiness. <a href="https://www.nist.gov/itl/ai-risk-management-framework">nist.gov/itl/ai-risk-management-framework</a> <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:5:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:6" role="doc-endnote">
      <p>AIAAIC Repository. (2024). “AI, Algorithmic, and Automation Incidents and Controversies.” Independent tracking of over 1,000 AI incidents and 411 distinct issues as of September 2024. <a href="https://www.aiaaic.org/aiaaic-repository">aiaaic.org/aiaaic-repository</a> <a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:7" role="doc-endnote">
      <p>Setnor Byer Insurance &amp; Risk. (2025). “New AI-specific Insurance Exclusions Underscore Risks Associated with Generative Artificial Intelligence.” Covers ISO’s new GAI exclusions for commercial general liability and Berkley’s absolute AI exclusion in specialty lines. <a href="https://setnorbyer.com/new-ai-specific-insurance-exclusions-underscore-risks-associated-with-generative-artificial-intelligence/">setnorbyer.com/new-ai-specific-insurance-exclusions-underscore-risks-associated-with-generative-artificial-intelligence</a> <a href="#fnref:7" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:7:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
  </ol>
</div>]]></content><author><name>Bobby Koritala</name></author><category term="governance" /><category term="ai-agents" /><category term="risk" /><summary type="html"><![CDATA[At 1,000 decisions per day, 99% accuracy means 10 disasters. Governance is how you catch them before they ship.]]></summary></entry></feed>