Here's how you can pave the way for AI-first delivery teams:
1. Formalized SDLC artifacts (Spec-driven development as the latest implementation example)
The SDLC stages stay the same: Idea → Feature/User Story → System Design/Implementation Plan → Code Implementation → Testing. What changes is formalization: strict standardization of inputs and outputs at each stage.
Traditional flow: Developers look at a story and, if the scope seems clear, jump straight to implementation. They might sketch a partial design without all the details. They skip formal planning and just code their way through. Established teams could get away with this, and they had little reason to do otherwise: creating all those artifacts felt tedious, and there was no real consumer for the effort.
AI-first flow: Agents generate detailed artifacts at every stage. Next-stage agents consume those artifacts. The mechanism: detailed specs reduce ambiguity, agents execute more reliably, and humans validate at boundaries instead of babysitting every line. SDD—spec-driven development—captures the pattern.
Here’s why formalization matters: vague stories lead to one of two failure modes. Either you get an incorrect implementation, or you get correct logic wrapped in inconsistent architecture or code-level design, which tanks maintainability. Detailed specs should include acceptance criteria, dependencies, edge cases, and architecture/coding guidelines: the stuff developers used to carry in their heads or infer from tribal knowledge.
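To make the idea of a formalized artifact concrete, here is a minimal sketch of a spec as a typed structure with a readiness check. The `StorySpec` class, its field names, and the rule that dependencies may be empty are all assumptions for illustration, not a standard SDD format.

```python
from dataclasses import dataclass, field


@dataclass
class StorySpec:
    """Hypothetical spec artifact handed from one stage's agent to the next."""
    title: str
    acceptance_criteria: list[str] = field(default_factory=list)
    dependencies: list[str] = field(default_factory=list)  # may legitimately be empty
    edge_cases: list[str] = field(default_factory=list)
    guidelines: list[str] = field(default_factory=list)  # architecture/coding rules

    def missing_sections(self) -> list[str]:
        """Return the names of required sections that are still empty."""
        required = {
            "acceptance_criteria": self.acceptance_criteria,
            "edge_cases": self.edge_cases,
            "guidelines": self.guidelines,
        }
        return [name for name, items in required.items() if not items]


def ready_for_implementation(spec: StorySpec) -> bool:
    """Assumed rule: hand off only when no required section is empty."""
    return not spec.missing_sections()
```

A vague story such as `StorySpec(title="Export invoices")` fails the check and reports which sections still need to be written, which is the point at which a human reviewer or an upstream agent would step in.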
AI handles the discipline and mechanics of generating these artifacts. Humans review for correctness and alignment. This division of labor is more likely to stick than previous attempts at process formalization, precisely because AI removes the tedium that made everyone skip steps before.
The trade-off: more upfront structure for faster, more consistent execution. Teams gain throughput but sacrifice the flexibility to shortcut from vague story to working code in a single leap.
2. Specialized, swappable agents
Productive teams don’t standardize on a single AI tool. They run heterogeneous portfolios: Cursor for agentic coding, Claude Code for autonomous tasks, GitHub Copilot for inline suggestions, custom agents for testing and business analysis tasks. The pattern that works: properly scoped, single-responsibility agents at the edges, coordinated through shared workflows and context.
The volatility argument matters. The field shifts rapidly. Today’s benchmark leader might fall behind within months. Committing long-term to one platform risks lock-in to yesterday’s capabilities. But here’s the tension: chasing every new release burns time without payoff. Switching from Claude Code to Codex or vice versa just because it gained a few percentage points on benchmarks wastes effort without clear benefit.
The balance teams are finding: maintain a limited selection per task type/role, two or three endorsed tools. Keep a small team continuously experimenting with the latest releases. They anchor the broader team to market trends without forcing everyone into permanent experimentation mode. Review the portfolio regularly: retire what’s falling behind, double down on what’s delivering.
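One way to keep such a portfolio honest is to encode the cap directly. The sketch below is purely illustrative: the `ToolPortfolio` class, the role names, and the hard limit of three are assumptions drawn from the "two or three endorsed tools" guideline above.

```python
MAX_ENDORSED_PER_ROLE = 3  # assumption: the upper end of "two or three"


class ToolPortfolio:
    """Hypothetical registry of endorsed tools per task type/role."""

    def __init__(self) -> None:
        self._endorsed: dict[str, list[str]] = {}

    def endorse(self, role: str, tool: str) -> None:
        """Add a tool for a role; refuse once the role is at the cap."""
        tools = self._endorsed.setdefault(role, [])
        if tool in tools:
            return  # already endorsed, nothing to do
        if len(tools) >= MAX_ENDORSED_PER_ROLE:
            raise ValueError(f"{role!r} is at the cap; retire a tool first")
        tools.append(tool)

    def retire(self, role: str, tool: str) -> None:
        """Remove a tool; raises ValueError if it was not endorsed."""
        self._endorsed.get(role, []).remove(tool)

    def endorsed(self, role: str) -> list[str]:
        """Return a copy of the endorsed tools for a role."""
        return list(self._endorsed.get(role, []))
```

The cap forces the regular review the text describes: adopting a fourth tool for a role requires explicitly retiring one that is falling behind.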
Why understanding internals matters: Agents are black boxes to varying degrees. Prompts, tools, context management, context retrieval, low-latency sync engines, and model selection all influence outcomes. Different platforms expose different levels of control. The farther you are from a complete black-box solution, the more you can tune behavior when (not if) results degrade. Vendors build their competitive moats from these combined mechanisms, which means transparency will increasingly become a challenge.
Heterogeneity requires standardized integration points. Agents coordinate through shared systems of record (VCS, issue tracker, design files, source code), a layer of curated markdown files maintained by teams to direct agents on how to distill project knowledge from project data, and versioned workflows.
3. Project knowledge layer (Curated context)
Enterprises carry mountains of data: thousands of Jira epics, hundreds of Confluence pages, architecture diagrams in various formats, test cases, bug reports, and documents nobody’s looked at in months. All of it created by humans, for humans. Agents can’t parse the volume or extract implicit context at scale.
The distinction that matters: data exists, but agents need knowledge, meaning a condensed, structured subset of project data along with meta-information that points to detailed sources. Project knowledge includes application architecture components, dependencies, project structure, naming conventions, code style guides, service/front-end/data-access layer implementation patterns, shared components, external library usage, security protocols, and more. In short, the things any engineer should know to be productive on the project.
The mechanism behind agent effectiveness: Context engines ingest, normalize, search, rank, and retrieve knowledge to shape context. Advanced companies invest in proprietary context engines or adopt platforms that offer tighter control. But every team can benefit from a simpler approach—manually curated markdown files defining key concepts.
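The search-and-rank part of that pipeline can be sketched over a handful of curated sections. This is a deliberately naive keyword-overlap ranker, far simpler than any production context engine; the function names and the in-memory `sections` dictionary (standing in for parsed markdown files) are assumptions for illustration.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Normalize: lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_sections(query: str, sections: dict[str, str], top_k: int = 2) -> list[str]:
    """Return up to top_k section names, ordered by query-term occurrences."""
    query_terms = set(tokenize(query))
    scores: dict[str, int] = {}
    for name, body in sections.items():
        counts = Counter(tokenize(name + " " + body))
        scores[name] = sum(counts[term] for term in query_terms)
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda n: (-scores[n], n))
    return [n for n in ranked if scores[n] > 0][:top_k]
```

Even a ranker this crude shows why curation matters: it can only surface what someone has written down in a form that names the concepts an agent will ask about.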
Why curate manually: This is tribal knowledge that was never captured in writing or quickly became obsolete because nobody had ongoing incentive to maintain it. Now there’s a buyer. Agents need this context for every transaction. Humans need it too, to achieve the desired outcome or to understand why an agent ignored a constraint and repeated the same mistake twice.
The trade-off: upfront curation effort for sustained agent quality. Teams invest time structuring knowledge once, then maintain it incrementally as it evolves. The payoff: agents make fewer errors, humans spend less time debugging annoyingly poor outputs.
4. Human gates at stage boundaries
AI-first sounds autonomous. In practice, teams insert human gates after every major stage. The reason: on varied, contextual work, unsupervised workflows compound errors, and the cause is plain probability multiplied across stages. Unverified agent output passed to the next stage carries its inaccuracies forward, so quality degrades with each hand-off.
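The compounding effect is easy to put numbers on. The sketch below assumes each stage succeeds independently with the same probability, which is a simplification, and the 90% figure is an illustrative assumption rather than a measured value.

```python
def chain_success_probability(per_stage_accuracy: float, stages: int) -> float:
    """Probability that every stage in an ungated chain is correct,
    assuming independent stages with identical accuracy."""
    if not 0.0 <= per_stage_accuracy <= 1.0:
        raise ValueError("per_stage_accuracy must be between 0 and 1")
    if stages < 0:
        raise ValueError("stages must be non-negative")
    return per_stage_accuracy ** stages
```

Under these assumptions, a five-stage chain at 90% per stage ends up fully correct only about 59% of the time (0.9 to the fifth power), while a single stage reviewed on its own stays at 90%. That gap is the argument for validating at each boundary.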
The framing: humans anchor the edges. They’re responsible for planning agent inputs and validating outputs. The middle (implementation) runs autonomously.
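A gated workflow of this shape can be sketched as a loop that pauses for approval only at chosen boundaries. Everything here is hypothetical: `run_with_gates`, the stage names, and the idea of modelling an artifact as a plain string are assumptions for illustration, and a real gate would block on a human review rather than call a function.

```python
from typing import Callable

Stage = Callable[[str], str]       # takes the previous artifact, returns the next
Gate = Callable[[str, str], bool]  # (stage name, artifact) -> approved?


def run_with_gates(
    idea: str,
    stages: list[tuple[str, Stage]],
    gate: Gate,
    gated: set[str],
) -> str:
    """Run stages in order; after each stage named in `gated`, ask the gate
    for approval and stop the workflow if the output is rejected."""
    artifact = idea
    for name, stage in stages:
        artifact = stage(artifact)
        if name in gated and not gate(name, artifact):
            raise RuntimeError(f"reviewer rejected the output of stage {name!r}")
    return artifact
```

Leaving a stage such as implementation out of `gated` lets it run without interruption, while the planning and testing boundaries still stop for a human decision.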
The exception: highly repetitive backlogs with narrow scope—straightforward migrations, bulk refactoring, test generation for uniform patterns. Workflows can run longer here without intervention. Insert validation gates elsewhere; quality resets at each boundary.
Where efficiency actually comes from: not from removing humans entirely, but from shifting where human effort is applied. AI handles tedious implementation, while humans focus on judgment calls, thoughtful preparation (clear specifications and scoped tasks), and rigorous review (validating logic, checking architectural alignment, and confirming edge cases).