
Why AI-generated code isn’t good enough (and how it will get better)

feature
Mar 17, 2025 | 14 mins

Did we normalize AI-generated code too soon? Here’s where it falls short and how it’s improving.


Large language models (LLMs) seemed to arrive in a flash. Monumental productivity gains were promised. Coding assistants flourished. Millions of multi-line code blocks were generated with a key press and merged. It worked like magic. But at the back of everyone’s mind was a nagging thought: can I actually trust this code?

It feels laughable to question the merits of AI in software development in 2025, as it’s already inextricably entrenched. Microsoft reports that 150 million developers use GitHub Copilot. Stack Overflow’s 2024 survey found 61.8% of developers use AI within their development process. Google claims over a quarter of its new code is AI-generated. 

In short, “AI-generated code is already the norm,” says Chris Anley, chief scientist at NCC Group. But is it really up to the task?

The problems with AI-generated code 

“Let’s be real: LLMs are not software engineers,” says Steve Wilson, chief product officer at Exabeam and author of O’Reilly’s The Developer’s Playbook for Large Language Model Security. “LLMs are like interns with goldfish memory. They’re great for quick tasks but terrible at keeping track of the big picture.”

As reliance on AI increases, that “big picture” is being sidelined. Ironically, by some accounts the total developer workload is growing: The 2025 State of Software Delivery report found that the majority of developers now spend more time debugging AI-generated code and resolving security vulnerabilities.

“AI output is usually pretty good, but it’s still not quite reliable enough,” says Bhavani Vangala, co-founder and vice president of engineering at Onymos. “It needs to be a lot more accurate and consistent. Developers still always need to review, debug, and adjust it.”

To improve AI-generated code, we must address key concerns: distrust, code quality issues, context limitations, hallucinations, and security risks. AI shows incredible promise, but human oversight remains critical.

Bloat and context limits

AI code completion tools tend to generate new code from scratch rather than reuse or refactor existing code, leading to technical debt. Worse, they tend to duplicate code, missing opportunities for code reuse and increasing the volume of code that must be maintained. “Code bloat and maintainability issues arise when verbose or inefficient code adds to technical debt,” notes Sreekanth Gopi, prompt engineer and senior principal consultant at Neuroheart.ai.

GitClear’s 2025 AI Copilot Code Quality report analyzed 211 million lines of code changes and found that in 2024, the frequency of duplicated code blocks increased eightfold. “Since AI-authored code began its surge in mid-2022, there has been more evidence every year that code duplication keeps growing,” says Bill Harding, CEO of Amplenote and GitClear. In addition to piling on unnecessary technical debt, cloned code blocks are linked to more defects: anywhere from 15% to 50% more, research suggests.

These issues stem from AI’s limited context. “AI is better the more context it has, but there is a limit on how much information can be supplied to an AI model,” says Rod Cope, chief technical officer at Perforce Software. GitHub reports Copilot Chat has a 64k-128k token context window, equating to about 30 to 100 small files or five to 20 large ones. While context windows are growing, they’re still insufficient to grasp full software architectures or suggest proper refactoring.
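
As a rough illustration of why those limits matter, here is a back-of-the-envelope sketch (not GitHub’s methodology) that estimates how many source files even fit into a single prompt, assuming a commonly cited rule of thumb of roughly four characters per token and a hypothetical 64k-token budget.

# Rough sketch: estimate how many source files fit in a context window,
# assuming ~4 characters per token. Real tokenizers vary by language and content.
from pathlib import Path

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Crude token estimate; good enough for order-of-magnitude planning."""
    return int(len(text) / chars_per_token)

def files_that_fit(repo_dir: str, window_tokens: int = 64_000) -> int:
    """Count how many .py files fit before the token budget runs out."""
    used, count = 0, 0
    for path in sorted(Path(repo_dir).rglob("*.py")):
        tokens = estimate_tokens(path.read_text(errors="ignore"))
        if used + tokens > window_tokens:
            break
        used += tokens
        count += 1
    return count

if __name__ == "__main__":
    print(files_that_fit(".", window_tokens=64_000))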

No ‘big picture’ thinking

While AI excels at pattern recognition, it doesn’t see the “why” behind the code. This limits its ability to make trade-offs around business logic, user experience, or long-term maintainability. “AI lacks the full context and problem-solving abilities that senior engineers bring to the table,” says Nick Durkin, chief technical officer at Harness.

Coding is inherently a creative and people-centric activity. “AI cannot build new things that previously did not exist,” says Tobie Morgan Hitchcock, chief executive officer and co-founder of SurrealDB. “Developers use creativity and knowledge of human preference to build solutions that are specifically designed for the end user.”

As a result, AI tools often “waste more time than they save” on tasks like generating entire programs, or wherever broader context is required, says NCC’s Anley. “The quality of the code generated drops significantly when they’re asked to write longer-form routines.”

Hallucinations and security risks

Hallucinations remain a concern. “AI doesn’t just make mistakes; it makes them confidently,” says Exabeam’s Wilson. “It will invent open-source packages that don’t exist, introduce subtle security vulnerabilities, and do it all with a straight face.”

These errors often stem from a poor data corpus. As Durkin explains, AI trained on synthetic data risks creating an echo chamber, leading to model collapse.

Cory Hymel, vice president of research and innovation at Crowdbotics, likewise points to a lack of high-quality training data as the biggest hurdle. For instance, OpenAI Codex, the model that originally powered GitHub Copilot, was trained on publicly available code containing errors that affect quality.

Security vulnerabilities are another issue. “AI-generated code may contain exploitable flaws,” says Neuroheart.ai’s Gopi. And while AI is quick to locate bugs, it struggles to truly fix them. A research paper from OpenAI found that AI agents “fail to root cause, resulting in partial or flawed solutions.” The paper notes:

Agents pinpoint the source of an issue remarkably quickly, using keyword searches across the whole repository to quickly locate the relevant file and functions, often far faster than a human would. However, they often exhibit a limited understanding of how the issue spans multiple components or files, and fail to address the root cause, leading to solutions that are incorrect or insufficiently comprehensive.

Other industry reports point to rising defects in AI-assisted code. For instance, Apiiro research found that exposures of personally identifiable information (PII) and payment data in code repositories have surged three-fold since mid-2023, a rise it attributes to the adoption of AI-assisted development.

Legal gray areas could also stunt the use of AI code and introduce compliance issues: some AI tools claim ownership of the code they output, while others retain IP for model retraining purposes. “Many companies are concerned about protecting proprietary data and ensuring it is not inadvertently used to train external models,” says Adam Kentosh, field chief technical officer of North America at Digital.ai.

Distrust and adoption barriers

“It all comes down to trust: do people trust what AI generates for building new applications?” asks Dan Fernandez, vice president of product management at Salesforce. Google’s 2024 DORA report found that, on average, developers only “somewhat” trust AI-generated code.

“The biggest barrier to adoption is trust in AI’s accuracy,” says Durkin. Unlike a human developer, AI has no intrinsic conscience or accountability, he says, making compliance and reliability checks more crucial for AI outputs.

AI’s opacity makes it difficult to trust in critical applications. “Trust is a big issue when it comes to any AI-provided code, but for legacy code in particular, which is where most software investment happens,” says Jeff Gabriel, executive vice president of engineering at Contentful.

“The biggest hurdle is likely internal opposition to AI at many companies,” says Joseph Thacker, solo founder of rez0corp and a bug bounty hunter, noting that senior staff often block sanctioned AI use.

How AI-generated code will improve

Although AI-generated code faces obstacles, solutions are emerging, many of which revisit fundamental coding best practices. “The challenges are multi-faceted, but we’re already seeing these challenges addressed,” says Shuyin Zhao, vice president of product for GitHub Copilot.

Validating AI outputs

Just as with human-generated code, rigorous testing must be applied to AI-generated code. “Developers should still carefully review, refine, and optimize AI-generated code to ensure it meets the highest standards for security, performance, and maintainability,” says Kevin Cochrane, chief marketing officer at Vultr.

Automated testing of AI outputs will be key. Perforce’s Cope recommends taking a page out of the devops playbook with automated testing, static code analysis, and masking sensitive data for training AI models. “Many of these tools are already engineered to support AI or, if not, will do so very soon.”
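
A minimal version of that devops-style gate might look like the following sketch, which runs the test suite and a static security scanner before an AI-generated change is accepted. The specific tools invoked here (pytest and bandit) are illustrative choices, not ones named by Cope; substitute whatever your pipeline already uses.

# Minimal pre-merge gate: run tests and static analysis before accepting
# an AI-generated change. Tool choices are illustrative, not prescriptive.
import subprocess
import sys

CHECKS = [
    ["pytest", "-q"],               # automated test suite
    ["bandit", "-q", "-r", "src"],  # static security analysis over src/
]

def run_checks() -> int:
    for cmd in CHECKS:
        print(f"running: {' '.join(cmd)}")
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"check failed: {' '.join(cmd)}")
            return result.returncode
    print("all checks passed")
    return 0

if __name__ == "__main__":
    sys.exit(run_checks())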

“Increased code throughput from AI puts pressure on downstream processes and systems, necessitating robust automation in QA testing to ensure continued reliability,” adds Digital.ai’s Kentosh.

AI can also play a role in policing itself: double-checking code quality, using predictive models to identify potential risks, and conducting security scans. “More widespread use of responsible AI (RAI) filters to screen for harmful content, security vulnerabilities, and notify users of public code matching are all important,” says GitHub’s Zhao.

Progressive rollouts can also help avoid drawbacks by gauging the effect of individual code changes. “Techniques like canary deployments, feature flagging, or feature management allow teams to validate code with limited exposure,” says Durkin.
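
A bare-bones feature flag captures the idea: route a small, deterministic slice of traffic to the AI-generated code path and keep everyone else on the known-good implementation. The flag name, rollout percentage, and checkout functions below are hypothetical.

# Feature-flag sketch: expose the new, AI-assisted path to a small slice of
# users; everyone else stays on the existing implementation.
import hashlib

ROLLOUT_PERCENT = 5  # start by exposing the new path to ~5% of users

def in_canary(user_id: str, flag: str = "ai_generated_checkout") -> bool:
    """Deterministically bucket a user so they get a consistent experience."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < ROLLOUT_PERCENT

def checkout(user_id: str, cart: list) -> str:
    if in_canary(user_id):
        return checkout_v2(cart)  # new, AI-assisted implementation
    return checkout_v1(cart)      # existing, known-good implementation

def checkout_v1(cart: list) -> str:
    return f"legacy checkout for {len(cart)} items"

def checkout_v2(cart: list) -> str:
    return f"new checkout for {len(cart)} items"

if __name__ == "__main__":
    print(checkout("user-42", ["book", "mug"]))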

Better training data

It all comes down to the training data because, as the saying goes, “garbage in, garbage out.” As such, Zhao believes we need “more sanitization and use of high-quality code samples as training data.” Avoiding model collapse requires feeding AI models additive data rather than regurgitated outputs.

Feeding LLMs project-specific context, like custom libraries, style guides, software bills of materials, or security knowledge, can also improve accuracy. “Ensuring AI models are trained on trusted data and fine-tuned for specific applications will help improve the accuracy of AI-generated code and minimize hallucinations in outputs,” says Salesforce’s Fernandez.

Certain IDE-based solutions and technologies are emerging to grant developers more real-time context, too. Onymos’s Vangala proposes that retrieval-augmented generation (RAG) will help reference version-specific software libraries or code repositories.
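
A toy sketch of that RAG flow: before asking the model for code, retrieve the documentation snippets for the exact library versions in use and prepend them to the prompt. The snippets and the keyword-overlap scoring below are deliberately simplistic placeholders, not a production retrieval pipeline.

# Toy retrieval-augmented generation (RAG) flow: look up version-specific
# docs and put them in front of the request so the model codes against the
# APIs actually in use. Snippets and scoring are placeholders.
DOC_SNIPPETS = {
    "requests==2.31": "requests.get(url, timeout=...) returns a Response; call .json() for the body.",
    "pydantic==2.x": "Use model_validate()/model_dump(); the v1 .dict() and .parse_obj() are deprecated.",
}

def retrieve(question: str, top_k: int = 1) -> list[str]:
    """Rank snippets by naive keyword overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(
        DOC_SNIPPETS.items(),
        key=lambda kv: len(q_words & set(kv[1].lower().split())),
        reverse=True,
    )
    return [f"{name}: {text}" for name, text in scored[:top_k]]

def build_prompt(question: str) -> str:
    context = "\n".join(retrieve(question))
    return f"Use only the APIs described below.\n{context}\n\nTask: {question}"

if __name__ == "__main__":
    print(build_prompt("serialize a pydantic model to a dict"))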

Finely tuned models

Instead of relying on massive general models, companies are shifting toward smaller, specialized models for specific coding tasks. “The largest model isn’t necessary for every use case in the developer life cycle,” says Fernandez. “We’re exploring a federated architecture of smaller models, where low-powered LLMs handle many tasks for developers.”

Improved training and finely tuned models will likely result in a higher degree of accuracy, but the best results may operate behind corporate firewalls. “2025 will see the rise of fine-tuned models trained on companies’ existing code that run ‘behind the wall’ significantly outperforming publicly available models,” says Crowdbotics’s Hymel.

Enhanced prompt engineering

Another aspect is improved prompt engineering. “We’ll also need to work on how we prompt, which includes the additional context and potential fine-tuning for system-specific scenarios,” says Contentful’s Gabriel.

“Prompt engineering is going to be a necessary part of a software engineer’s job,” says Vangala. To get there, the onus is on developers to upskill. “We need to teach our developers how to write better prompts to get the kind of AI output we want.”
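
In practice, a better prompt often just means bundling project constraints with the request. The template below is one hypothetical way to do that, pinning the language version, style rules, and target function signature rather than asking for code cold.

# Hypothetical prompt template that packages project constraints with the
# request instead of asking for code with no context.
PROMPT_TEMPLATE = """You are contributing to an existing codebase.

Constraints:
- Python {python_version}, type hints required
- Follow the team style guide: {style_rules}
- Do not add new third-party dependencies

Task: implement this function and nothing else:
{signature}

Include unit-test cases for edge conditions.
"""

def build_prompt(signature: str) -> str:
    return PROMPT_TEMPLATE.format(
        python_version="3.12",
        style_rules="snake_case names, max line length 100, docstrings on public functions",
        signature=signature,
    )

if __name__ == "__main__":
    print(build_prompt("def parse_iso_date(value: str) -> datetime.date: ..."))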

New AI-enabled solutions will also help. “The biggest impact will come from better models and better coding applications which provide more context,” says rez0corp’s Thacker, pointing to solutions like Cursor and the recent upgrades to GitHub Copilot.

New agentic AI tools

AI agents will be a continued focal point for improving software engineering overall, bringing self-checking capabilities. “New reasoning models can now iterate and verify their own work, reducing hallucinations,” says Exabeam’s Wilson.

For instance, GitHub has added Copilot Autofix, which can detect vulnerabilities and provide fix suggestions in real time, and a build and repair agent to Copilot Workspace. “Perhaps the biggest, most exciting thing we’ll continue to see is the use of agents to improve code quality,” says GitHub’s Zhao.

“I expect that AI-generated code will be normalized over the next year,” says Fernandez, pointing to the ongoing rise of AI-powered agents for software developers that extend beyond code generation to testing, documentation, and code reviews.

“Developers should also investigate the myriad of tools available to find those that work and consider how to fill the gaps with those that don’t,” says Gabriel. This will require both individual and organizational investment, he adds.

Looking to the future, many anticipate open source leading further AI democratization. “I expect we’ll see a lot more open-source models emerge to address specific use cases,” says David DeSanto, chief product officer at GitLab.

Governance around AI usage

Enhancing developers’ confidence in AI-generated code will also rely on setting guardrails for responsible usage. “With the appropriate guardrails in place to ensure responsible and trusted AI outputs, businesses and developers will become more comfortable starting with AI-generated code,” says Salesforce’s Fernandez.

To get there, leadership must establish clear directions. “Ultimately, it’s about setting clear boundaries for those with access to AI-generated code and putting it through stricter processes to build developer confidence,” says Durkin.

“Ensuring transparency in model training data helps mitigate ethical and intellectual property risks,” says Neuroheart.ai’s Gopi. Transparency is crucial from an IP standpoint, too. “Having no hold on AI output is critical for advancing AI code generation as a whole,” says GitLab’s DeSanto, who references GitLab Duo’s transparency commitment regarding its underlying models and usage of data.

For security-conscious organizations, on-premises AI may be the answer to avoiding data privacy issues. Running self-hosted models in air-gapped, offline deployments allows AI to be used in regulated environments while maintaining data security, says DeSanto.
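
A minimal sketch of that self-hosted pattern: the prompt and the code never leave the network because the model is served in-house. This assumes a runtime that exposes an OpenAI-compatible chat endpoint locally; the URL, port, and model name below are placeholders, not any specific product’s API.

# Self-hosted pattern sketch: call an internally hosted model over the local
# network. Endpoint, port, and model name are placeholders for illustration.
import json
import urllib.request

ENDPOINT = "http://localhost:8000/v1/chat/completions"  # internal-only host

def complete(prompt: str, model: str = "internal-code-model") -> str:
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    req = urllib.request.Request(
        ENDPOINT, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(complete("Refactor this function to remove duplicated validation logic."))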

Striking a balance between human and AI

All experts interviewed for this piece believe AI will assist developers rather than replace them wholesale. In fact, most view keeping developers in the loop as imperative for retaining code quality. “For now, human oversight remains essential when using AI-generated code,” says Digital.ai’s Kentosh.

“Building applications will mostly remain in the hands of the creative professionals using AI to supplement their work,” says SurrealDB’s Hitchcock. “Human oversight is absolutely necessary and required in the use of AI coding assistants, and I don’t see that changing,” adds Zhao.

Why? Partially, the ethical challenges. “Complete automation remains unattainable, as human oversight is critical for addressing complex architectures and ensuring ethical standards,” says Gopi. That said, AI reasoning is expected to improve. According to Wilson, the next phase is AI “becoming a legitimate engineering assistant that doesn’t just write code, but understands it.”

Others are even more bullish. “I think that the most valuable AI-driven systems will be those that can be handed over to AI coding entirely,” says Contentful’s Gabriel, although he acknowledges this is not yet a consistent reality. For now, the prevailing outlook still has AI and humans cooperating side by side. “Developers will become more supervisors rather than writing every line of code,” says Perforce’s Cope.

The end goal is to capture AI’s productivity gains without sliding into over-reliance. “If developers rely too heavily on AI without a solid understanding of the underlying code, we risk losing creativity and technical depth, which are crucial for innovation,” says Kentosh.

Wild ride ahead

Amazon recently claimed its AI helped upgrade its Java applications, saving an estimated $260 million. Others are under pressure to prove similar results. “Most companies have made an investment in some type of AI-assisted development service or copilot at this point and will need to see a return on their investment,” says Kentosh.

Whatever the reservations, AI adoption continues to accelerate. “Most every developer I know is using AI in some capacity,” adds Thacker. “For many of them, AI is writing the majority of the code they produce each day.”

Yet while AI handles repetitive tasks effectively, human intervention is still required to carry the work the last mile. “The majority of code bases are boilerplate and repeatable,” says Crowdbotics’s Hymel. “We’ll see AI being used to lay 51%+ of the ‘groundwork’ of an application that is then taken over by humans to complete.”

The bottom line? “AI-generated code isn’t great yet,” says Wilson. “But if you’re ignoring it, you’re already behind. The next 12 months are going to be a wild ride.”