Matt Asay
Contributing Writer

Enterprise essentials for generative AI

analysis
Aug 25, 2025 | 10 mins

Having a "vision" is not enough. Enterprises need clear objectives, solid data, and a design plan with built-in evaluations and humans in the loop.

[Image: user at laptop with a genAI chatbot assistant. Credit: Summit Art Creations / Shutterstock]

Every day brings a new, better large language model (LLM) or a new approach to finding signal in all the AI noise. It's exhausting to try to keep up. But here's a comforting yet uncomfortable truth about enterprise AI: Most of what's loud today won't persist tomorrow. Models trend like memes, frameworks spawn like rabbits, and at any given moment a new "this time it's different" pattern elbows yesterday's breakthrough into irrelevance. Even so, you don't need to chase every shiny AI object. You just need to master a handful of durable skills and decisions that compound over time.

Think of these durable skills and decisions as the "operating system" of enterprise AI work: the core upon which everything else runs. Get those elements right and all the other stuff (agents, retrieval-augmented generation [RAG], memory, whatever gets rebranded next) becomes a plug-in.

Focus on the job, not the model

The most consequential AI decision is figuring out what problem you're trying to solve in the first place. This sounds obvious, yet most AI projects still begin with, "We should use agents!" instead of, "We need to cut case resolution times by 30%." Most AI failures trace back to unclear objectives, lack of data readiness (more on that below), and lack of evaluation. Success starts with defining the business problem and establishing key performance indicators (KPIs). This seems ridiculously simple. You can't declare victory if you haven't established what victory looks like. However, this all-important first step is commonly overlooked, as I've noted.

Hence, it's critical to translate the business goal into a crisp task spec (a minimal sketch follows this list):

  • Inputs: what the system actually receives (structured fields, PDFs, logs)
  • Constraints: latency, accuracy thresholds, regulatory boundaries
  • Success definition: the metric the business will celebrate (fewer escalations, faster cycle time, lower cost per ticket, etc.)
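
Here is what such a spec can look like when it lives in code rather than a slide deck. This is a minimal sketch: the field names, thresholds, and the example task are mine, not a standard schema.

    # A hypothetical task spec captured as a small, versioned data structure.
    # Field names and thresholds are illustrative only.
    from dataclasses import dataclass, field

    @dataclass
    class TaskSpec:
        name: str
        inputs: list[str]               # what the system actually receives
        max_latency_ms: int             # constraint: interactive latency budget
        min_accuracy: float             # constraint: quality threshold
        regulatory_notes: list[str] = field(default_factory=list)
        success_metric: str = ""        # the number the business will celebrate

    support_triage = TaskSpec(
        name="support-case-triage",
        inputs=["case_subject", "case_body", "customer_tier"],
        max_latency_ms=1500,
        min_accuracy=0.92,
        regulatory_notes=["no customer PII leaves the EU region"],
        success_metric="cut case resolution time by 30%",
    )

Everything downstream, from model choice to retrieval design to the evaluation suite, can reference this object instead of tribal knowledge.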

This task spec drives everything else: whether you even need generative AI (often you won't), which patterns fit, and how you'll prove value. It's also how you stop your project from growing into an unmaintainable "AI experience" that does many things poorly.

Make data clean, governed, and retrievable

Your enterprise's advantage is not your model; it's your data. But "we have a lot of data" is not a strategy. Useful AI depends on three things:

  • Fitness for use: You want data that's clean enough, labeled enough, and recent enough for the task. Perfection is a tax you don't need to pay; fitness is what matters. Long before genAI became a thing, I wrote, "For years we've oversold the glamorous side of data science … while overlooking the simple reality that much of data science is cleaning and preparing data, and this aspect of data science is fundamental to doing data science well." That's never been more true.
  • Governance: Know what data you can use, how you can use it, and under what policy.
  • Retrievability: You need to get the right slice of data to the model at inference time. That's not a model problem; it's a data modeling and indexing problem.

Approaches to retrieval-augmented generation will continue to morph, but here's a principle that won't: The system can only be as good as the context you retrieve. As I've suggested, without organization-specific context such as policies, data, and workflows, even great models will miss the point. We therefore must invest in the following (a retrieval sketch follows this list):

  • Document normalization: Consistent formats and chunking should align with how your users ask questions.
  • Indexing strategy: Hybrid search (lexical plus vector) is table stakes; tune for the tasks you actually run.
  • Freshness pipelines: Your index is a dynamic asset, not a quarterly project. Memory is the "killer app" for AI, as I've written, but much of that memory must be kept fresh to be useful, particularly for real-time applications.
  • Meta-permissions: Retrieval must respect row/column/object-level access, not just "who can use the chatbot."
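
To make the indexing and permissions points concrete, here is a rough sketch of permission-aware hybrid retrieval using reciprocal rank fusion. The search callables and the Doc shape are placeholders for whatever lexical and vector backends you actually run; nothing here assumes a specific product's API.

    from dataclasses import dataclass

    @dataclass
    class Doc:
        doc_id: str
        text: str
        allowed_roles: frozenset        # row/object-level permissions travel with the doc

    def rrf_merge(ranked_lists, k=60):
        # Reciprocal rank fusion: a simple way to merge rankings without tuning weights.
        scores = {}
        for ranking in ranked_lists:
            for rank, doc in enumerate(ranking):
                entry = scores.setdefault(doc.doc_id, [0.0, doc])
                entry[0] += 1.0 / (k + rank + 1)
        merged = sorted(scores.values(), key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in merged]

    def retrieve(query, user_roles, lexical_search, vector_search, top_k=5):
        candidates = rrf_merge([lexical_search(query), vector_search(query)])
        # Enforce permissions at retrieval time, not just at the chatbot's front door.
        visible = [d for d in candidates if user_roles & d.allowed_roles]
        return visible[:top_k]

    # Toy usage with stubbed search backends.
    docs = [Doc("hr-1", "PTO policy ...", frozenset({"hr", "manager"})),
            Doc("eng-1", "Deploy runbook ...", frozenset({"engineer"}))]
    hits = retrieve("vacation policy", {"manager"},
                    lexical_search=lambda q: docs, vector_search=lambda q: docs)

The detail that matters is that permissions travel with the documents and are enforced at retrieval time, so swapping the underlying index later doesn't change the contract.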

In other words, treat your retrieval layer like an API contract. Stability and clarity there outlast any particular RAG library.

Evaluation is software testing for AI (run it like CI)

If your "evaluation" is two PMs and a demo room, you don't have evaluation. LLMs fail gracefully right up until they don't. The way out is automated, repeatable, task-aligned evals. Great AI requires systematic, skeptical evaluation, not vibes-driven development. Hence, success depends on treating model behavior like crash-test engineering, not magic. That means golden sets (representative prompts/inputs and expected outputs, ideally derived from real production traces), numeric- and rubric-based scoring, guardrail checks, and regression gates (no new model, prompt, or retrieval change ships without passing your evaluation suite).

Evaluations are how you get off the treadmill of endless prompt fiddling and onto a track where improvements are proven. They also enable developers to swap models in or out with confidence. You wouldn't ship back-end code without tests, so stop shipping AI that way.
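
As an illustration, a regression gate over a golden set can be as small as the sketch below. The generate() callable stands in for whatever sits behind your inference gateway, and the golden-set format and 95% pass bar are assumptions, not recommendations.

    import json

    def contains_expected(expected, actual):
        # Simple rubric: pass if the expected answer appears in the model output.
        return expected.lower() in actual.lower()

    def run_eval(generate, golden_path="golden_set.jsonl", pass_bar=0.95):
        with open(golden_path) as f:
            cases = [json.loads(line) for line in f]
        passed = sum(
            contains_expected(case["expected"], generate(case["prompt"]))
            for case in cases
        )
        score = passed / len(cases)
        # Regression gate: nothing ships below the bar.
        assert score >= pass_bar, f"eval regression: {score:.2%} < {pass_bar:.0%}"
        return score

Wire something like this into CI and a model, prompt, or retrieval change that drops the score simply doesn't ship.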

Design systems, not demos

The earliest wins in enterprise AI came from heroic demos. You know, the stuff you wade through on X all day. ("Wow, I can't believe I can create a full-length movie with a two-line prompt!") That hype-ware has its place, but truly great AI is dull, as I've noted: "Anyone who's pushed real software to production knows that getting code to compile, pass tests, and run reliably in the wild is a far tougher slog than generating the code in the first place."

Sustainable wins come from composable systems with boring interfaces (a gateway sketch follows this list):

  • Inference gateways abstract model selection behind a stable API.
  • Orchestration layers sequence tools: Retrieval → Reasoning → Action → Verification.
  • State and memory are explicit: short-term (per task), session-level (per user), and durable (auditable).
  • Observability comes from logs, traces, cost and latency telemetry, and drift detection.
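
Here is a minimal sketch of what the inference-gateway piece can look like. The request/response shapes and the routing table are hypothetical; the point is that callers see one stable contract while providers stay swappable.

    from dataclasses import dataclass, field
    from typing import Protocol

    @dataclass
    class InferenceRequest:
        task: str                        # ties back to the task spec
        prompt: str
        max_tokens: int = 512
        metadata: dict = field(default_factory=dict)

    @dataclass
    class InferenceResponse:
        text: str
        model: str
        latency_ms: float
        cost_usd: float                  # cost and latency live in the contract

    class Provider(Protocol):
        def complete(self, req: InferenceRequest) -> InferenceResponse: ...

    class Gateway:
        def __init__(self, providers: dict[str, Provider], routing: dict[str, str]):
            self.providers = providers
            self.routing = routing       # task name -> provider name; versioned outside code in practice

        def complete(self, req: InferenceRequest) -> InferenceResponse:
            provider = self.providers[self.routing[req.task]]
            return provider.complete(req)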

"AI agents" will keep evolving, but they're just planners plus tools plus policies. In an enterprise, the policies (permissions, approvals, escalation paths) are the hard part. Build those in early.

Latency, cost, and UX are product features

Enterprises don't abandon AI because it's "not smart enough." They abandon it because it's too slow, too expensive, or too weird for users. Here are a few examples (a cost-tracking sketch follows this list):

  • Latency: For interactive flows, aim under ~700ms for visible progress and under ~1.5s for a "feels instant" reply. This will have a huge impact on your customer experience. Use smaller or distilled models wherever you can and stage responses (e.g., quick summary first, deep analysis on demand).
  • Cost: Track tokens like a P&L. Cache aggressively (semantic caching matters), reuse embeddings, and pick models by task need, not ego. Most tasks don't need your largest model (or a model at all).
  • UX: Users want predictability more than surprise. Offer controls ("cite sources," "show steps"), affordances to correct errors ("edit query," "thumbs down" to retrain), and consistent failure modes.
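
Treating tokens like a P&L can start as simply as the sketch below: a per-request ledger with a latency budget check. The model names, prices, and the 1.5-second budget are illustrative, not benchmarks.

    from dataclasses import dataclass, field

    PRICE_PER_1K_TOKENS = {"small-model": 0.0004, "large-model": 0.01}  # hypothetical rates

    @dataclass
    class LedgerEntry:
        task: str
        model: str
        prompt_tokens: int
        output_tokens: int
        latency_s: float

        @property
        def cost_usd(self):
            rate = PRICE_PER_1K_TOKENS[self.model]
            return (self.prompt_tokens + self.output_tokens) / 1000 * rate

    @dataclass
    class Ledger:
        entries: list = field(default_factory=list)
        latency_budget_s: float = 1.5

        def record(self, entry):
            self.entries.append(entry)
            if entry.latency_s > self.latency_budget_s:
                print(f"[warn] {entry.task} blew the latency budget: {entry.latency_s:.2f}s")

        def cost_per_interaction(self):
            return sum(e.cost_usd for e in self.entries) / max(len(self.entries), 1)

From there it is a short step to dashboards that report cost per interaction alongside the business metric in your task spec.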

AI doesn't change the laws of enterprise "physics." If you can show "we cut average handle time by 19% at $0.03 per interaction," your budget conversations around AI become easy, just like any other enterprise technology.

Security, privacy, and compliance are essential design inputs

Nothing kills momentum faster than a late-stage "Legal says no." Bring them in early and design with constraints as first-class requirements. Enough said. This is the shortest section but arguably the most important.

Keep people in the loop

The fastest way to production is rarely "full autonomy." It's human-in-the-loop: Assist → Suggest → Approve → Automate. You start with the AI doing the grunt work (drafts, summaries, extractions), and your people verify. Over time, your evals and telemetry make specific steps safe to auto-approve.
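
A rough sketch of that ladder in code might look like the following. The confidence threshold, the auto-approved step list, and the reviewer hook are assumptions about your workflow, not a prescribed engine.

    AUTO_APPROVED_STEPS = {"summarize_ticket"}   # membership earned via evals and telemetry

    def handle(step, draft, confidence, reviewer_approve, auto_threshold=0.98):
        """Route an AI-produced draft: auto-apply only steps that evals have shown
        to be safe; otherwise keep a human in the loop."""
        if step in AUTO_APPROVED_STEPS and confidence >= auto_threshold:
            return {"action": "auto_apply", "output": draft}
        if reviewer_approve(step, draft):
            return {"action": "human_approved", "output": draft}
        return {"action": "rejected", "output": None}

    # Usage: the reviewer hook could be a ticket queue, a chat approval, or a UI button.
    result = handle("refund_customer", "Refund $42 to order #1234", 0.91,
                    reviewer_approve=lambda step, draft: False)

Steps graduate into the auto-approved set only after your evals and telemetry show they have earned it.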

There are at least two benefits to this approach. The first is quality: Humans catch the 1% that wrecks trust. The second is adoption: Your team feels augmented, not replaced. That matters if you want real usage rather than quiet revolt. It's also essential since the best approach to AI (in software development and beyond) augments skilled people with fast-but-unthinking AI.

Portability or 'don't marry your model'

Andy Oliver is right: "The latest GPT, Claude, Gemini, and o-series models have different strengths and weaknesses, so it pays to mix and match." Not only that, but the models are in constant flux, as is their pricing and, very likely, your enterprise's risk posture. As such, you don't want to be hardwired to any particular model. If swapping a model means rewriting your app, you only built a demo, not a system. You also built a problem. Hence, successful deployments follow these principles (a dual-run sketch follows this list):

  • Abstract behind an inference layer with consistent request/response schemas (including tool call formats and safety signals).
  • Keep prompts and policies versioned outside code so you can A/B and roll back without redeploying.
  • Dual run during migrations: Send the same request to old and new models and compare via evaluation harness before cutting over.
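
A dual run can be as simple as the sketch below: the same golden cases go through the current and candidate models, scored by the evaluation harness you already trust. The score() callable and the 1% margin are placeholders.

    def dual_run(golden_cases, current_model, candidate_model, score, margin=0.01):
        # Send identical prompts through both models and compare average scores.
        current_total, candidate_total = 0.0, 0.0
        for case in golden_cases:
            current_total += score(case, current_model(case["prompt"]))
            candidate_total += score(case, candidate_model(case["prompt"]))
        n = len(golden_cases)
        current_avg, candidate_avg = current_total / n, candidate_total / n
        return {
            "current": current_avg,
            "candidate": candidate_avg,
            # Cut over only if the candidate is at least as good, within the margin.
            "cut_over": candidate_avg >= current_avg - margin,
        }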

Portability isn't just insurance; it's how you negotiate better with vendors and adopt improvements without fear.

Things that matter less than you think

I've been talking about how to ensure success, yet surely some (many!) people who have read this far are thinking, "Sure, but really it's about prompt engineering." Or a better model. Or whatever. These are AI traps. Don't get carried away by:

  • The perfect prompt. Good prompts help; great retrieval, evaluations, and UX help more.
  • The biggest model. Most enterprise tasks thrive on right-sized models plus strong context. Context is the key.
  • Tomorrow's acronym. Agents, RAG, memory: these are ingredients. Data, evaluation, and orchestration are what make it all work.
  • A single vendor to rule them all. Consolidation is nice, but only if your abstractions keep you from being stuck.

These principles and pitfalls may sound sexy and new when applied to AI, but they're the same things that make or break enterprise applications generally. Ultimately, the vendors and enterprises that win in AI will be those that deliver an exceptional developer experience or that follow the principles I've laid out and avoid the pitfalls.

Matt Asay

Matt Asay runs developer marketing at Oracle. Previously Asay ran developer relations at MongoDB, and before that he was a Principal at Amazon Web Services and Head of Developer Ecosystem for Adobe. Prior to Adobe, Asay held a range of roles at open source companies: VP of business development, marketing, and community at MongoDB; VP of business development at real-time analytics company Nodeable (acquired by Appcelerator); VP of business development and interim CEO at mobile HTML5 start-up Strobe (acquired by Facebook); COO at Canonical, the Ubuntu Linux company; and head of the Americas at Alfresco, a content management startup. Asay is an emeritus board member of the Open Source Initiative (OSI) and holds a JD from Stanford, where he focused on open source and other IP licensing issues. The views expressed in Matt's posts are Matt's, and don't represent the views of his employer.
