Speech Summary
Key AI Agents Developed and Use Cases
1. Code AI Agent
- Target: Legacy software issues, especially low test coverage and spaghetti code.
- Process: Takes source code, generates unit tests via LLM dialogs, compiles and runs tests, iterates for correctness.
- Uses tools like runtime tests, mutation tests, and Sonar for code quality.
- Automatically generates pull requests for tests and code improvements, always reviewed by humans.
- Benefits: Improves test coverage, code structure, and developer confidence.
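Illustrative sketch (not from the talk): the generate-compile-test loop described for the Code AI agent might look roughly like this in Java. The LlmClient interface, the Maven commands, and the file naming are assumptions, not the speakers' implementation.

```java
import java.nio.file.*;

public class CodeAiAgentSketch {

    /** Hypothetical wrapper around the privately hosted model (not the speakers' API). */
    interface LlmClient {
        String complete(String prompt);
    }

    static final int MAX_ITERATIONS = 5;

    public static void coverFile(Path sourceFile, LlmClient llm) throws Exception {
        String source = Files.readString(sourceFile);
        String feedback = "";

        for (int i = 0; i < MAX_ITERATIONS; i++) {
            // Ask the model for a unit test class; retries feed the build output back in.
            String testSource = llm.complete(
                    "Write a JUnit test class for this code:\n" + source
                    + "\nPrevious build output:\n" + feedback);
            Files.writeString(sourceFile.resolveSibling("GeneratedTest.java"), testSource);

            // Compile and run via the project's build tool (Maven assumed here).
            Process build = new ProcessBuilder("mvn", "-q", "test")
                    .redirectErrorStream(true).start();
            String output = new String(build.getInputStream().readAllBytes());
            if (build.waitFor() == 0) {
                return; // tests compile and pass: open a pull request for human review (not shown)
            }
            feedback = output; // iterate with the failure output as extra context
        }
    }
}
```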
2. Document AI Agent
- Generates up-to-date code documentation for source files with summaries at multiple levels.
- Supports 50 programming languages and can output in any human language.
- Eases onboarding by providing readable, localized documentation instead of large, hard-to-read BRDs.
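Illustrative sketch of the multi-level idea described above: summarize each source file, then roll the file summaries up into a package-level overview. The LlmClient interface, the Java-only file filter, and the Markdown output files are assumptions.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.stream.Stream;

public class DocumentAiAgentSketch {

    interface LlmClient {
        String complete(String prompt);
    }

    public static void documentTree(Path root, LlmClient llm, String humanLanguage) throws IOException {
        Map<Path, List<String>> fileSummariesByPackage = new HashMap<>();

        // Level 1: summarize each source file (the real agent handles many languages; Java shown here).
        try (Stream<Path> files = Files.walk(root)) {
            for (Path file : files.filter(p -> p.toString().endsWith(".java")).toList()) {
                String summary = llm.complete(
                        "Summarize this source file in " + humanLanguage + ":\n" + Files.readString(file));
                Files.writeString(file.resolveSibling(file.getFileName() + ".md"), summary);
                fileSummariesByPackage
                        .computeIfAbsent(file.getParent(), k -> new ArrayList<>())
                        .add(summary);
            }
        }

        // Level 2: combine the file summaries into one overview per package/directory.
        for (var entry : fileSummariesByPackage.entrySet()) {
            String packageSummary = llm.complete(
                    "Combine these file summaries into a package overview in " + humanLanguage + ":\n"
                    + String.join("\n", entry.getValue()));
            Files.writeString(entry.getKey().resolve("PACKAGE.md"), packageSummary);
        }
    }
}
```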
3. Log AI Agent
- Connects to logging infrastructure (e.g., Splunk, OpenSearch) and streams logs to privately hosted LLMs.
- Extracts actionable insights such as performance degradations or security issues.
- Allows custom prompts tailored to organization needs.
- Operates alongside the existing system without requiring a software release; no data leaves the organization's infrastructure.
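Illustrative sketch: a batch of log lines is sent with a custom prompt to a privately hosted model over plain HTTP, so nothing leaves the network. The endpoint URL, JSON shape, and model name are assumptions, not the speakers' setup.

```java
import java.net.URI;
import java.net.http.*;
import java.util.List;

public class LogAiAgentSketch {

    // Assumed local endpoint of the privately hosted model; not a specific product's API.
    private static final URI LOCAL_LLM = URI.create("http://localhost:11434/api/generate");

    public static String analyse(List<String> logBatch, String customPrompt) throws Exception {
        String prompt = customPrompt + "\n\nLogs:\n" + String.join("\n", logBatch);

        // Minimal JSON body with hand-rolled escaping; a real implementation would use a JSON library.
        String escaped = prompt.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
        String body = "{\"model\":\"devstral\",\"prompt\":\"" + escaped + "\"}";

        HttpRequest request = HttpRequest.newBuilder(LOCAL_LLM)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        // The call stays inside the private network: no log data leaves the infrastructure.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```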
4. Translate AI Agent
- Aims to translate legacy code (e.g., COBOL to Java) using an indirect multi-agent approach.
- Uses Document AI to understand legacy code and generates new idiomatic code in the target language.
- Not a one-day solution but accelerates migration by weeks.
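Illustrative sketch of the indirect approach: describe the legacy code first, then generate idiomatic target-language code from that description rather than transliterating it line by line. Both agent interfaces are placeholders, not the speakers' code.

```java
public class TranslateAiAgentSketch {

    interface DocumentAgent { String describe(String legacySource); }
    interface CodeGenAgent  { String generate(String description, String targetLanguage); }

    public static String translate(String cobolSource, DocumentAgent docAgent, CodeGenAgent codeGen) {
        // Step 1: capture the behaviour of the legacy code as natural-language documentation.
        String description = docAgent.describe(cobolSource);
        // Step 2: generate fresh, idiomatic code in the target language from that description.
        return codeGen.generate(description, "Java");
    }
}
```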
5. TDD (Test-Driven Development) Agent
- Takes user stories and target project; iteratively generates tests and code following TDD principles.
- Extends happy-path tests with edge cases and delivers pull requests for review.
- Helps instill TDD habits through enforced workflows.
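Illustrative sketch of the test-first loop: the test is generated from the user story before any production code, then code is regenerated until the test passes. LlmClient and TestRunner are assumed placeholders for the agent's internals.

```java
public class TddAgentSketch {

    interface LlmClient  { String complete(String prompt); }
    interface TestRunner { boolean runTests(String testSource, String productionSource); }

    public static String implement(String userStory, LlmClient llm, TestRunner runner) {
        // 1. Write the test first, from the user story alone (happy path).
        String test = llm.complete("Write a failing JUnit test for this user story:\n" + userStory);

        // 2. Generate production code until the test passes (bounded retries).
        String code = "";
        for (int i = 0; i < 5 && !runner.runTests(test, code); i++) {
            code = llm.complete("Make this test pass:\n" + test + "\nCurrent code:\n" + code);
        }

        // 3. Extend the happy-path test with edge cases (a real agent would iterate again before the PR).
        test = llm.complete("Add edge-case tests to:\n" + test);
        return code; // handed over as a pull request for human review
    }
}
```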
6. User Story Agent
- Parses large Business Requirement Documents (BRDs) to extract, list, and reference user stories.
- Detects contradictory statements and out-of-scope content.
- Connects naturally with TDD agent for downstream coding.
Agent Assembly Line and Human-in-the-Loop
- Envisioned workflow chains agents: user story extraction -> TDD coding -> code improvements, etc.
- Human approval is enforced at every pull request stage to maintain control and quality.
- Complete automation discouraged to avoid error accumulation.
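Illustrative sketch of the assembly line with a human approval gate between stages. The agent interfaces and the PullRequest type are placeholders; the talk only specifies that every pull request is reviewed by a person and that nothing moves downstream without approval.

```java
import java.util.List;

public class AssemblyLineSketch {

    record PullRequest(String title, String diff) {}

    interface UserStoryAgent { List<String> extractStories(String brd); }
    interface TddAgent       { PullRequest implement(String userStory); }
    interface CodeAiAgent    { PullRequest improve(String mergedCode); }
    interface HumanReviewer  { boolean approve(PullRequest pr); }   // the human in the loop

    public static void run(String brd, UserStoryAgent stories, TddAgent tdd,
                           CodeAiAgent codeAi, HumanReviewer reviewer) {
        for (String story : stories.extractStories(brd)) {
            PullRequest feature = tdd.implement(story);
            if (!reviewer.approve(feature)) {
                continue; // nothing is merged or passed downstream without approval
            }
            PullRequest cleanup = codeAi.improve(feature.diff());
            reviewer.approve(cleanup); // second gate before the improvement is merged
        }
    }
}
```

The rejection branch stopping the chain reflects the point above: letting stages run unattended would let errors accumulate from one agent to the next.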
Technical and Cost Details
- Agents written in Java, runnable on Windows, Linux, Mac.
- Use free pre-trained models downloaded once; no training or ML expertise needed.
- Typical server with GPU (16GB VRAM) costs about $200/month; operational cost under $2/hour.
- No cloud vendor lock-in; models hosted privately for security.
- Models switched every 2-3 weeks depending on client preferences and availability.
Quality Control Challenges
- Code outputs are easier to validate automatically using compilation, testing, coverage, style checks.
- Documentation and BRD interpretation require human review; no reliable automatic quality metrics exist.
- Human-in-the-loop remains essential to verify outputs, especially where AI is non-deterministic.
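Illustrative sketch of why code output is easier to gate automatically: each check maps to a command or a number, while documentation goes straight to a person. The Maven command and the JaCoCo report path are assumptions, not the speakers' actual stack.

```java
import java.nio.file.*;

public class QualityGateSketch {

    public static boolean acceptGeneratedCode(Path project) throws Exception {
        // Objective, automatable checks: does it compile, do the tests pass?
        if (run(project, "mvn", "-q", "verify") != 0) return false;

        // A real gate would parse a coverage percentage from the build's report and enforce a
        // threshold; here we only check that the report was produced (path assumed for JaCoCo).
        Path report = project.resolve("target/site/jacoco/index.html");
        if (!Files.exists(report)) return false;

        return true; // eligible for a pull request; a human still reviews it
    }

    // Documentation or BRD interpretation has no comparable metric: it goes straight to a human.
    public static void acceptGeneratedDocs(String documentation) {
        System.out.println("Queued for human review:\n" + documentation);
    }

    private static int run(Path dir, String... command) throws Exception {
        return new ProcessBuilder(command).directory(dir.toFile()).inheritIO().start().waitFor();
    }
}
```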
Q&A Highlights and Additional Notes
- Pull requests are currently not interactive; humans often create new PRs after reviewing agent outputs.
- Refactoring by Code AI is local to files, respecting code structure; around 200 lines is an optimal change size.
- For Log AI, agents stream logs and query them via prompts, but generating dashboards and alerts is still a forthcoming development.
- Different LLMs are used according to client concerns: Microsoft Phi, Google Gemma 3, and Mistral Devstral are preferred; hallucination-prone models such as Gemma 2 are avoided.
- Plans to leverage agent orchestration protocols such as A2A or MCP for better management.
Actionable Items
- Engage with Capgemini to kick off or optimize your enterprise Gen AI agent journey.
- Define clear boundaries and human approval steps (e.g., pull requests) in your AI workflows.
- Explore integrating agents for legacy code test coverage and refactoring to improve reliability.
- Use Document AI agent for comprehensive, multilingual code documentation.
- Deploy Log AI agent with your logging infrastructure to surface actionable insights early.
- Consider agent assembly lines connecting User Story Agent to TDD Agent to scale development.
- Maintain human oversight to ensure quality in agent-generated outputs.
- Monitor and evaluate different LLM models to pick the best fit for your security and performance needs.
- Stay updated on evolving best practices and tooling for agent orchestration and workflow automation.
Overall, this presentation emphasizes pragmatic, secure, low-cost Gen AI agent applications that enhance software development, especially legacy modernization, testing rigor, documentation, and operational insight while preserving critical human control.
Applying agentic AI to process millions of lines of code in one day using private AI
12:00 - 12:30, 27th of May (Tuesday) 2025 / DEV AI & DATA STAGE
Part of a developer's work can be delegated to machines, and I'm not talking about Copilot, but about Gen AI agents that relieve developers of the most demanding responsibilities such as:
• Covering millions of lines of code with unit tests in one day
• Creating user stories from large business documents
• Mass code improvements based on static code analysis
• Documenting code, including summaries for modules, packages, etc.
• Analyzing millions of log files for performance and security
• Creating initial iterations of new features based on user stories
All of this is done by our agents safely, without using the Internet and without high hardware requirements.
LEVEL: Basic / Advanced / Expert
TRACK: AI/ML, Software Architecture
TOPICS: AI, Backend, IT Architecture, Java, Software Engineering