AI Code Gen Tools Need Solid Foundations

Last updated on 06/03/2024

Last year, AI code gen tools made a huge splash in the software development world. Again. Someone had the bright idea: “Hey, source code is text. Maybe we can generate that!” What started as one big offering has grown into dozens of tools that promise to write our code for us.

These tools excite some, terrify others, and leave many wondering what exactly is going on. To better understand these tools and what they can do, let’s look at the two main ways they’re used.

Pair Programming Without A Human

You ever notice a programmer with a small toy on their desk? Maybe an action figure or a rubber duck? Sometimes it’s just desk kitsch. Sometimes it’s there so that we have an inanimate object to ask questions of. For whatever reason, externalizing a question can help us answer it. It’s so useful that it has a name: rubber ducking.

If a mute little ducky is helpful, how much more helpful would one that can respond be? Well, it depends. If the duck is correct, all is well and right with the world. We save a bunch of time we’d otherwise spend context switching out of the IDE. We don’t get distracted halfway through a web search for documentation and wander off to check Reddit. Big savings.

What about the unhappy path though?

What if the duck is wrong?

My personal experience with code gen tools is that the duck is wrong. Not all the time, but often enough (I’d estimate 10-15% of the time) to be a real concern. Here are some failure cases I’ve seen:

Hallucinating API calls that don’t exist
Assuming Python still has no switch statement, even though the match statement arrived in Python 3.10
Ignoring best practices
Answering questions without the context of the repository

I’ve been writing programs for a little over 30 years now, and I’ve been doing it for a living going on 20. Fixing busted and bad code is something I’m pretty good at by now. More to the point, recognizing bad code is something that’s easy for me.

When the duck is wrong, I notice quickly. I can push back against the bad code and get to something I actually like in short order. For me, even a failure case can be useful, so long as I don’t get complacent and accept the duck’s answer without questioning it.

Most of your development staff won’t be senior, though. Some will be fresh out of college or a bootcamp. Do they know where to look if the tool hallucinates an API call? Can they recognize bad code? Will they have enough practice correcting bad code to fix it the right way?

Probably not. And that’s bad. What’s worse is that now that they have this tool, they’ll rely on it. It provides a low-friction way of getting help, with much lower friction than asking a senior engineer. That can actually increase the amount of time someone struggles with an issue on the unhappy path: when the tool keeps giving wrong answers, they keep asking it instead of the person who could have unblocked them.

Code Completion

Generative tools attached to the IDE are like having a rubber duck that can talk back. When they do code completion, it’s like having a rubber duck that can pair with you and make suggestions. At first blush, this sounds great! And honestly, it can be. When it works well, it’s delightful. It doesn’t always work, though. It has all of the same accuracy and correctness problems as the case above, plus one critical new one.

Generative Tools Make It Easy to Write Lots of Code

That’s what we want as programmers, right? To write lots of code. Well, yes, but also no. There’s a principle in software development called don’t repeat yourself (DRY), and code gen tools make it easy to violate. My signal that I need to think about keeping my code DRY is that a function is taking too long to write. “There has to be a more concise way; I need to refactor,” I say. Code gen tools take away the very thing that trips my Spidey-sense.
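As a hypothetical illustration of the repetition that completion makes cheap, here are two near-identical functions and a refactored version that keeps the shared logic in one place. The names format_user, format_order, and format_record are made up for this sketch:

```python
# Repetitive version: each function re-implements the same formatting logic.
def format_user(user: dict) -> str:
    fields = [f"{key}={user[key]}" for key in ("id", "name", "email")]
    return "User(" + ", ".join(fields) + ")"


def format_order(order: dict) -> str:
    fields = [f"{key}={order[key]}" for key in ("id", "total", "status")]
    return "Order(" + ", ".join(fields) + ")"


# DRY version: one function, parameterized by the label and the keys to include.
def format_record(label: str, record: dict, keys: tuple) -> str:
    fields = [f"{key}={record[key]}" for key in keys]
    return f"{label}(" + ", ".join(fields) + ")"


if __name__ == "__main__":
    user = {"id": 1, "name": "Ada", "email": "ada@example.com"}
    order = {"id": 7, "total": 19.99, "status": "shipped"}
    # Both approaches produce identical strings.
    assert format_user(user) == format_record("User", user, ("id", "name", "email"))
    assert format_order(order) == format_record("Order", order, ("id", "total", "status"))
    print(format_record("User", user, ("id", "name", "email")))
```

With only two copies the savings are small. The concern is the third and fourth copy: when completion makes each one free to write, the friction that would have prompted the refactor never shows up.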

What Should We Do?

Perhaps surprisingly, my advice isn’t to avoid code generation tools. Nor is it to provide them only to your most seasoned engineering staff. The failure cases of code gen tools fall into two categories:

Reducing communication within a team
Introducing bad code into the repository

There are tools for dealing with both of these problems. The technical one, bad code, is the easier to handle. Robust testing practices, a well-oiled code review machine, and auditing all help. These are things you should be doing already, and likely are if you’re in the business of writing software.
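As a minimal sketch of what robust testing can mean for generated code, here is a hypothetical generated helper, median, alongside unit tests written with Python’s standard unittest module. The tests pin down edge cases (even-length input, empty input) that a plausible-looking but wrong suggestion could get past a quick read:

```python
import statistics
import unittest


def median(values: list) -> float:
    """Return the median of a non-empty list (hypothetical generated helper)."""
    if not values:
        raise ValueError("median() of empty list")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    # Even-length input: average the two middle values. A plausible but wrong
    # version would return ordered[mid] here; test_even_length catches that.
    return (ordered[mid - 1] + ordered[mid]) / 2


class MedianTests(unittest.TestCase):
    def test_odd_length(self):
        self.assertEqual(median([3, 1, 2]), 2)

    def test_even_length(self):
        self.assertEqual(median([4, 1, 3, 2]), 2.5)

    def test_matches_standard_library(self):
        # Cross-check against statistics.median from the standard library.
        data = [5.0, 1.5, 9.0, 2.25, 7.0, 3.0]
        self.assertEqual(median(data), statistics.median(data))

    def test_empty_input_raises(self):
        with self.assertRaises(ValueError):
            median([])


if __name__ == "__main__":
    unittest.main()
```

In practice the helper would have a reference implementation to lean on, as it does here. The point is that the tests, not the reviewer’s trust in the tool, decide whether the suggestion is accepted.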

Social issues are always harder to solve. The first question to ask is, “Why is it difficult for my programmers to talk to each other?” Are people afraid of admitting they don’t know something? Are senior developers not incentivized to mentor juniors? Is time not allocated for it? Getting to the root of these questions may reveal why your teams aren’t talking more.

Code generation tools are powerful, and can be quite effective. Before using them, though, you need to make sure your house (and repositories) are in order.