Going Full Agentic for Scientific Software Development

Categories: github, ai, python

Author: Andrea Zonca

Published: November 10, 2025

The landscape of scientific software development is being transformed by AI coding agents. Over the past few weeks, I’ve been exploring GitHub Copilot’s AI coding agent capabilities for maintaining healpy, and the experience has been remarkable. What started as an experiment has evolved into a workflow that feels like managing a team of experienced developers rather than coding alone.

In the last two weeks alone, this workflow helped me close 12 healpy issues, which gives a concrete sense of the momentum. The count comes straight from the GitHub search API:

```bash
gh api 'search/issues?q=repo:healpy/healpy+is:issue+involves:zonca+closed:%3E=2025-10-27' \
  --jq '.total_count'
```

The Workflow: Assigning Issues to AI Agents

My approach has been straightforward: I take existing issues from the healpy repository—some of which have been open for 5 years or more—and assign them to GitHub Copilot. The AI agent then automatically analyzes the issue, explores the codebase, and opens a pull request with a proposed fix.
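I usually do the assignment from the issue page on GitHub, but it can also be scripted. A minimal sketch using the GitHub CLI, assuming the Copilot coding agent is enabled for the repository; the issue number is a placeholder, and the `@copilot` assignee handle is the detail worth verifying against current GitHub documentation:

```bash
# Hand an existing issue to the Copilot coding agent.
# Issue number 2345 is a placeholder; this requires the Copilot
# coding agent to be enabled for the repository.
gh issue edit 2345 --repo healpy/healpy --add-assignee "@copilot"
```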

Once Copilot completes its first pass at solving the problem, it requests my review. This is where the real collaboration begins.

The Quality Spectrum: From Quick Wins to Iterative Refinement

The results have spanned a wide spectrum. The simplest issues were fixed correctly on the first try, requiring no changes from me. These were often straightforward bug fixes or documentation updates where the solution was clear-cut.

However, most issues required a more collaborative approach. Typically, I would give feedback two or three times before I was satisfied with the result (a command-line sketch of this loop follows the list). The most common types of feedback I found myself giving were:

  • Requesting more comprehensive tests: While Copilot would often add tests, I frequently asked for additional test cases to cover edge cases or ensure more thorough validation.
  • Requesting changelog entries: Scientific software projects need proper documentation of changes. I regularly asked Copilot to add appropriate references to the changelog, explaining what was fixed and why it matters to users.
  • Refining implementation details: Sometimes the approach was correct but needed adjustment to match the project’s coding style or architectural patterns.
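Delivering that feedback is just a normal pull request review: Copilot reads the comments and pushes a new iteration. A sketch with the GitHub CLI, using a hypothetical PR number and example wording; note that Copilot acts on comments that mention `@copilot` from users with write access:

```bash
# Request changes on an agent-authored PR. The @copilot mention is
# what prompts the agent to iterate; PR number 6012 is a placeholder.
gh pr review 6012 --repo healpy/healpy --request-changes \
  --body "@copilot Please add edge-case tests and a changelog entry explaining the fix."
```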

Managing Multiple Pull Requests Simultaneously

One of the most impressive aspects of this workflow is the ability to work on multiple issues in parallel. At my peak, I was managing five different pull requests at the same time, with Copilot working on each one independently.
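A quick listing keeps the parallel work visible. A sketch with the GitHub CLI; the author filter is my assumption about the coding agent's bot login and may need adjusting:

```bash
# List open PRs authored by the Copilot coding agent.
# "app/copilot-swe-agent" is an assumed bot login; verify on your PRs.
gh pr list --repo healpy/healpy --state open --author "app/copilot-swe-agent"
```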

This parallel approach fundamentally changed my role. Instead of being the sole developer writing every line of code, I became a technical reviewer and project manager, guiding multiple AI agents toward the right solutions. It genuinely felt like having a team of five experienced developers working for me—each one capable, but needing direction and review to ensure quality meets project standards.

Adding Another Layer: Automated Code Review with Codex

To enhance this workflow further, I enabled Codex, the AI-powered code review agent developed by OpenAI. This adds another dimension to the process:

  1. Copilot creates and updates pull requests
  2. Codex automatically reviews the changes and provides feedback
  3. I can then ask Copilot to address Codex’s feedback
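Step 3 can be as simple as a pull request comment that mentions the agent. A hedged sketch, again with a placeholder PR number:

```bash
# Ask Copilot to act on reviewer feedback already posted on the PR.
# Copilot responds to @copilot mentions from users with write access.
gh pr comment 6012 --repo healpy/healpy \
  --body "@copilot Please address the points raised in Codex's review."
```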

This creates a fascinating dynamic where AI agents are reviewing each other’s work. It’s like having five developers and an additional maintainer on the team. Codex often catches issues I might miss in my initial review, such as potential security vulnerabilities, performance concerns, or edge cases that need better handling.

The immediate feedback loop is invaluable. Instead of waiting for human reviewers, the code gets an initial review instantly, allowing me to iterate faster and catch issues earlier in the development process.

Streamlining with Auto-Merge

The final piece of this agentic workflow is enabling GitHub’s auto-merge feature. Once the iterative review process is complete and all parties (human and AI) are satisfied with the changes, I enable auto-merge. This ensures that:

  1. All required tests pass one final time
  2. The pull request meets all branch protection requirements
  3. The merge happens automatically without manual intervention

This eliminates the need to babysit pull requests waiting for final test runs. I can approve a PR, enable auto-merge, and move on to reviewing the next one, confident that it will merge once everything is green.
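In practice this final step is two commands (or two taps in the web UI). A minimal sketch with a placeholder PR number, assuming your branch protection allows squash merges:

```bash
# Approve, then enable auto-merge so the PR lands once all required
# checks pass. Swap --squash for the merge strategy your repo uses.
gh pr review 6012 --repo healpy/healpy --approve
gh pr merge 6012 --repo healpy/healpy --auto --squash
```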

For more details on setting up and using auto-merge effectively, see my previous post on auto-merging GitHub pull requests.

The Mobile Advantage: Reviewing On the Go

An unexpected benefit of this workflow is how well it works from a mobile device. Since my primary role has shifted from writing code to providing feedback and guidance, I can effectively manage the process from my phone.

I no longer need to be in front of my laptop to keep development moving forward. I can:

  • Review pull requests during coffee breaks
  • Provide quick feedback to AI agents while commuting
  • Approve changes and enable auto-merge from anywhere
  • Check test results and Codex reports on the go

This flexibility means the AI agents can continue working even when I’m away from my desk. I give feedback, they iterate on the solution, and the development process continues smoothly. It’s a remarkably efficient use of time that wouldn’t be possible with traditional development workflows.

What This Means for Scientific Software Development

This agentic approach to software development represents a significant shift in how we can maintain scientific software projects. The bottleneck is no longer the time it takes to write code—it’s my capacity to review and guide the work effectively.

For scientific software projects that are often maintained by researchers with limited time, this is transformative. Issues that languished for years can now be addressed systematically. The AI agents provide the development capacity, while the human maintainer provides the scientific domain knowledge and quality standards.

The workflow isn’t perfect: it requires active management, clear communication of requirements, and careful review. But it is strikingly effective, and it’s only going to improve as these AI coding agents continue to evolve.

If you maintain scientific software, I highly recommend exploring this workflow. Start with a few straightforward issues, get comfortable with the review process, and gradually scale up. You might find, as I did, that it fundamentally changes how you approach software maintenance and development.

Getting Started

If you’re interested in trying this workflow yourself:

  1. Explore GitHub Copilot and sign up for access (academic institutions often have free access)
  2. Review my guide on using GitHub Copilot for scientific computing
  3. Enable Codex on your repository for automated code review
  4. Set up auto-merge to streamline your workflow
  5. Start with simple issues to build confidence with the process
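For step 5, here is one way to surface candidate issues from the terminal; the label is only an example, so substitute whatever your project actually uses:

```bash
# Find low-risk issues to hand to the agent first.
# The "good first issue" label is an example; use your project's labels.
gh issue list --repo healpy/healpy --state open --label "good first issue"
```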

The future of scientific software development is looking increasingly collaborative—with AI agents as valuable team members working alongside human expertise.


Yes, of course, this post was also created by Copilot, working from my raw speech-to-text input.