Using GenAI to Build Research Software: A Conversation About What Works

Does GenAI have a place in research software development? We chat to Leeds biophysicist and self-taught developer, Daniel Rollins, about what works and what doesn't.

I used GenAI to develop software for my research and now I’m nervous about sharing it.

As an RSE, I regularly hear researchers express uncertainty about sharing code they’ve developed with Generative AI (GenAI) assistance. Recently, I had a conversation with Daniel Rollins, a biophysicist who has built a software tool called playNano using a combination of manual coding, vibe coding and code completion tools (Table 1).

As Daniel walked me through his development process, I was struck by how thoughtfully he’d approached it—building modularly with good test coverage, being consultative in his design, and managing the code base in alignment with open research principles. Yet despite this care, he still felt hesitant about sharing his work publicly.

This tension isn’t unusual. Even experienced developers can feel anxious about sharing code, but GenAI adds new questions: How can you be confident in code you didn’t write entirely yourself? What do you need to disclose? Where are the guidelines? These concerns are particularly acute for scientific and computational research software, where code correctness directly affects research validity and reproducibility.

There aren’t yet established community guidelines for GenAI use in research software development. But researchers like Daniel are already developing practical approaches that balance the power of these tools with scientific rigor. Our conversation surfaced several practices that might help others navigating similar territory.

Term	What it is	Examples
Code completion tools	AI that suggests the next few lines as you type, based on context	GitHub Copilot, Tabnine
Vibe coding	Conversational development where you describe what you want in natural language and AI generates substantial code blocks	Lovable, Claude, Cursor with chat
AI coding agents	Autonomous AI that can plan, write, test, and debug entire features with minimal human direction	Devin, GitHub Copilot Agents, OpenAI Codex
AI pair programmer	AI assistant that works alongside you, offering suggestions and answering questions as you code	GitHub Copilot, Aider, Claude Code
Prompt coding (or ‘prompt-to-code’)	Using AI models to generate code based on detailed, specific prompts	Cursor, Warp Code, Microsoft Copilot

Table 1: Guide to GenAI coding tools. All require foundational coding knowledge to use effectively. The examples listed are for illustration only; research and compare different options before selecting tools for your work.

1. Learn the basics before accelerating with AI

Daniel didn’t jump straight into asking ChatGPT to build entire features. He first gained foundational programming skills by accessing online materials and taking advantage of free software development courses geared towards researchers, such as those offered by our team (browse our courses). This base level of knowledge meant he was able to understand and critique what the AI generated—which is essential, because AI generates a lot of questionable code.

“I needed to understand enough to know when something looked wrong,” he explained. For anyone who uses GenAI to support copywriting, the principle is the same: you need a baseline grasp of the language you’re writing in to use the tools effectively, and there aren’t really any shortcuts to getting there. On the flip side, reviewing and refactoring AI-generated code is excellent practice for becoming a better developer.

2. Think about risk and context

Not all code carries the same risk. Daniel’s tool visualizes microscopy data—if something’s wrong, an experienced microscopist will notice artifacts in the images. That’s very different from, say, AI-generated code performing statistical calculations or making algorithmic decisions where errors might be invisible.

Before handing work to GenAI, consider: What could go wrong if this code has bugs? Could it introduce bias or incorrect behaviour that wouldn’t be obvious? You might decide certain sections—calculation engines, decision-making algorithms—should remain human-written.

3. Design your tests first

Daniel approached each new feature by thinking through: What should this component actually do? What behaviour would I expect? Planning tests upfront made it easier to write unambiguous specifications for the AI and verify the results afterward.

“Having some example or test data you can use as a ground truth is essential for verifying that the AI-generated code actually works as intended,” Daniel explained. This approach aligns well with Test-Driven Development (TDD) principles, where you write tests before implementation – a methodology that translates particularly well to working with AI tools, as it forces you to clarify expected behaviour before generating code.

“You can also ask the AI to help design the tests,” Daniel noted. “That conversation often clarifies what you’re actually trying to build.”
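As a minimal sketch of this test-first idea, the example below writes the expected behaviour down as assertions against a tiny hand-made ground truth. The function name `flatten_rows` and the row-levelling task are hypothetical illustrations chosen for this sketch; they are not taken from playNano. The implementation stands in for the kind of small piece you might ask an AI tool to generate and then check against the tests.

```python
from statistics import median


def flatten_rows(image):
    """Subtract each row's median from that row (hypothetical example task)."""
    flattened = []
    for row in image:
        offset = median(row)
        flattened.append([value - offset for value in row])
    return flattened


def test_flatten_rows_removes_row_offsets():
    # Hand-made ground truth: two rows with the same shape but different offsets.
    image = [[1.0, 2.0, 3.0], [11.0, 12.0, 13.0]]
    expected = [[-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]]
    assert flatten_rows(image) == expected


def test_flatten_rows_leaves_input_unchanged():
    # The function should return new rows rather than modify its input.
    image = [[5.0, 5.0, 8.0]]
    flatten_rows(image)
    assert image == [[5.0, 5.0, 8.0]]
```

Saved in a file whose name starts with `test_`, functions like these can be collected and run by a test runner such as pytest, so the same checks can be repeated every time the generated code changes.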

TDD is one of many good practices covered in our SWD3 course, Software Development Practices for Research.

4. Build in small, manageable pieces

Rather than asking GenAI to generate entire systems, Daniel developed the software modularly. Each piece was small enough to understand and test individually.

“I was never asking the bot to do too much at once,” he said. “I also got the AI to suggest different structural approaches and provide pros and cons for several options, then I chose the approach based on that.”

Small pieces are less intimidating to review—and the process of reviewing them is how you learn.

5. Use AI to review AI (with caution)

If you lack a human reviewer, you can use a different AI tool to review code. Daniel experimented with this, though we both acknowledged it’s not a replacement for human oversight—it’s more of an additional check.

6. Get human feedback wherever possible

This was the big one. Daniel reached out to our team for feedback on his codebase organization and some specific sections of code. We could offer reassurance that the structure looked healthy and aligned with good practices.

“It’s hard to avoid programming in isolation,” Daniel reflected, “but even small amounts of human feedback make a huge difference to your confidence.”

If you don’t have programmers available for a full code review, ask a colleague to review your pseudocode, or talk through your code’s logic with them over a coffee. You might be surprised how valuable it is just to explain your approach to someone else. This is why many programmers use ‘rubber ducking’, a debugging technique in which the programmer explains their code in natural language to a rubber duck to help identify mistakes and weak areas.

7. Use GenAI for thinking, not just typing

Daniel found that chatting with AI about design decisions—before generating code—was often more valuable than having it write code directly.

“Software development is really about understanding a problem well enough to model it in code,” he explained. “Using AI to generate a complete solution isn’t very effective, whereas using it to clarify your thinking can be much more helpful.”

One thing to be mindful of is that, unless you expressly tell the AI to challenge you, you may find it too agreeable. Explicitly ask it to identify potential problems with your approach or to propose alternatives. This pushes you to think more critically about your design choices.

8. Be transparent about your use

Daniel was concerned about disclosing his use of GenAI, so we discussed how to do this transparently without reducing trust in the code. For example, mention in the README which functional components relied significantly on GenAI assistance; add comments or docstring notes in the code itself where substantial portions are AI-generated; or note it in commit messages. Each of these steps increases transparency around the development process, which is valued in open-source and research communities.

Being transparent isn’t about apologizing—it’s about documenting your process, just as you would with any other research methodology.
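As an illustration of what a docstring note might look like, here is a minimal sketch. The function `rescale` and the wording of the provenance note are hypothetical, made up for this example; there is no established standard format, so adapt the wording to your own project.

```python
def rescale(values, lower=0.0, upper=1.0):
    """Linearly rescale values to the range [lower, upper].

    Provenance note (example wording only): first draft generated with a
    GenAI chat assistant, then reviewed, refactored and tested by the author.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: avoid division by zero, map everything to lower.
        return [lower for _ in values]
    scale = (upper - lower) / (hi - lo)
    return [lower + (v - lo) * scale for v in values]
```

Keeping the note next to the code it describes means it travels with the function if the file is copied or reused elsewhere, which a README entry alone would not.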

9. Invite collaboration and scrutiny

Regardless of whether AI is used, the best way to improve a codebase is to get people using it and contributing to it. Make it clear how people can raise issues or contribute to the project, and share your work to invite feedback and collaboration. While these might be the things you’re most nervous about, they’re ironically the most helpful steps you can take for your code’s quality and longevity. We discussed ongoing RSE involvement in the project’s development and maintenance—something we can offer through our consulting service—and explored outlets for the work such as the Journal of Open Source Software (JOSS). Daniel was keen to demonstrate the tool’s application in a real scientific use case where results could be properly scrutinized.

10. Protect sensitive data

Finally, when working with AI coding assistants, avoid sharing sensitive information like API keys, passwords, customer data, or proprietary algorithms. Even when developing modularly, it’s easy to accidentally paste code containing credentials or real data when asking for help.

Before sharing code with AI tools, sanitise it by replacing sensitive values with placeholder data or dummy examples. Use environment variables for credentials rather than hardcoding them, and work with synthetic or anonymized datasets when demonstrating problems to AI assistants. Our SWD3 course covers much of this good practice, which applies in relation to version control and open sourcing of code too.
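The two habits above can be sketched in a few lines of Python. This is an illustrative sketch only: the environment variable name `MY_SERVICE_API_KEY`, the function names, and the `sample_id` and `height_nm` fields are all hypothetical placeholders, not part of any real service or dataset.

```python
import os
import random


def get_api_key(name="MY_SERVICE_API_KEY"):
    """Read a credential from the environment instead of hardcoding it."""
    key = os.environ.get(name)
    if key is None:
        raise RuntimeError(f"Set the {name} environment variable before running.")
    return key


def make_synthetic_readings(n_rows, seed=0):
    """Build fake rows with the same structure as real data but no real content."""
    rng = random.Random(seed)  # fixed seed so the fake data is reproducible
    return [
        {"sample_id": f"sample_{i:03d}", "height_nm": round(rng.uniform(0.0, 10.0), 2)}
        for i in range(n_rows)
    ]
```

Because the key never appears in the source file, the code can be pasted into a chat window or pushed to a public repository without leaking it, and the synthetic rows give the AI tool something realistic in structure to work with.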

“I make it a habit to use fake data whenever I’m asking AI for help,” Daniel noted. “It takes an extra minute, but it means I never have to worry about what I’ve shared.”

Many AI providers state that they may use conversation data for training purposes, depending on the product and its settings, so treat any code you share with AI tools as potentially public information.

Where do we go from here?

Software developers often talk about ‘inner loops’ and ‘outer loops’ in development (see [1-2]). The inner loop is your rapid cycle: generate code, run it, check if it works, iterate. The outer loop is broader: integration testing, peer review, deployment.

What all of these practices have in common is that they help you tighten and multiply your inner loops when working with GenAI. By building in small pieces (#4), writing tests first (#3), and using AI to review AI (#5), you’re creating more opportunities for fast feedback within your development process, which should lead to better quality code. Your outer loops—like getting human feedback (#6) or inviting collaboration (#9)—provide the deeper validation that your overall approach is sound.

Community guidelines are still emerging for use of GenAI in scientific research software development. The University has published guidance on AI use in research, and resources like this NCRM guide offer starting points. Additionally, Post-Graduate Research (PGR) students considering using GenAI for software that forms part of an assessment should refer to the University’s guidance on the Use of GenAI in PGR Assessments.

What’s clear from conversations like this one is that researchers are already using these tools thoughtfully and developing their own good practices. For Daniel, the tools offered tangible benefits: “They sped up the process of writing my code so I could move my research forward quicker,” he noted, and even small conveniences like code prediction tools making fewer typos than he does added up to significant time savings. The challenge now is sharing these practical experiences and emerging practices more widely—which is exactly what Daniel is doing by being open about his process.

References

[1] L. Brown, “The Three Developer Loops: A New Framework for AI-Assisted Coding,” IT Revolution, Oct. 20, 2025. [Online]. Available: https://itrevolution.com/articles/the-three-developer-loops-a-new-framework-for-ai-assisted-coding/

[2] M. Møldrup, “Increase quality and speed up development by tweaking the inner and outer development Loops for AI Projects,” Full Spectrum Data Scientist, Feb. 8, 2025. [Online]. Available: https://www.fullspectrumdatascientist.com/posts/development-loops/

Cover image credit: Christopher Combe, Licence: cc-by-2.0

The RSE team offers limited ad-hoc support for software development. If you’d like feedback on your code or want to discuss your approach to using GenAI tools, get in touch.
