Databricks Data + AI Summit 2026: What actually matters for enterprises investing in AI

By Valeryia Tsybulka, Data Engineer

TL;DR

  • Databricks Data + AI Summit 2026 drew over 30,000 attendees to San Francisco, and ITRex tracked the sessions, product releases, and technical deep dives as a Databricks partner.
  • Ali Ghodsi, the CEO, framed the entire event this way: AI doesn’t have an intelligence problem; it has a context problem. Every major release, from Genie Ontology to Unity AI Gateway, is a piece of that argument.
  • The biggest theme was governance: Unity AI Gateway now polices what agents can do, what they cost, and which MCP servers they’re allowed to touch.
  • LTAP and Lakebase are trying to kill the “ETL tax,” the cost of keeping transactional and analytical copies of the same data in sync. This is a real change, but it requires a 12 to 18 month architectural bet.
  • Agent Bricks has over 100,000 agents built on it and now supports any framework you want, including LangGraph, CrewAI, Claude Code SDK, and OpenAI’s SDKs. Framework lock-in is no longer the constraint.
  • The honest takeaway: most organizations whose AI initiatives underperform are failing because their data foundation and governance weren’t ready for AI agents.

If you didn’t have a chance to attend the latest Databricks summit, either in person or online, here’s why it matters: more than 20,000 organizations worldwide, including over 70% of the Fortune 500, run on Databricks. When the company changes direction on its keynote stage, that direction shows up in enterprise data infrastructure decisions for years to come. This event in particular set the framework for how a sizable portion of the market will build, govern, and pay for AI through 2027 and beyond.

ITRex followed the summit as a Databricks partner, digging into the technical sessions and product documentation with the questions our clients actually ask: Is this production-ready? What does it cost to operate? What breaks when you push it past a pilot? This recap reflects that practitioner’s lens.

The headline framing, delivered directly by CEO Ali Ghodsi, was that AI has a serious context problem. Models are good enough. What’s missing in most enterprises is the governed, trustworthy data layer that lets a model or an agent act on accurate business context instead of guessing. At DAIS 2026, nearly every announcement about agentic AI, governance, and lakehouse infrastructure traced back to that single claim.

The shift from "can we build AI" to "can we govern AI"

For the past two summits, the conversation centered on whether enterprises could build AI agents at all. That question is largely settled. Organizations are already using agents to analyze data, draft reports, write code, automate workflows, and answer questions against enterprise knowledge bases. The conversation at DAIS 2026 had moved on to something harder: how do you operate these systems safely across the organization without losing track of what they’re doing or what they cost?

This is the same maturity curve we’ve watched plenty of our clients move through. The first AI project is usually a proof of concept, scoped narrowly enough to avoid hard governance questions. The second and third projects are where things get expensive, because now there are multiple agents interacting with sensitive data, multiple teams building independently, and nobody with a clear view of total spend or risk exposure. Databricks built an entire product category this year specifically to address that gap.

Genie One: An agent for business teams, not just engineers

Genie One became generally available at the summit. It’s positioned as an “agentic coworker” for marketing, finance, and sales teams, available on the web, iOS, and Android, with scheduling, alerts, and MCP tool integration built in. It generates documents and reports grounded in Unity Catalog metadata and validates user access rights before returning an answer. That detail matters more than it sounds. Many enterprise AI deployments stall because no one can confidently confirm that the answer a user received only reflected data they were authorized to see, and working backwards to prove that after the fact is difficult. Genie One bridges that gap by design.

Here’s the catch, and Databricks said this plainly in its own technical sessions: Genie One’s answers are only as good as the semantic layer underneath them. If your business glossary and metric definitions are inconsistent, scattered, or out of date, you’ll get an agent that’s confidently wrong instead of an analyst who’s slowly right. Before rolling Genie One out broadly, the prerequisite is Unity Catalog Metrics and a maintained business glossary, not a bigger and more powerful model.

Genie Ontology: Solving the stale context problem

Genie Ontology, the second release, approaches the same problem from a different perspective. It’s a self-updating knowledge graph that continuously pulls business context from Databricks and its connected tools, such as Slack, Jira, Confluence, Google Drive, and SharePoint, so agents don’t have to rebuild their understanding of the business from scratch on each call, or rely on documentation that hasn’t been touched since last year.

This is an actual engineering solution to a real, boring problem: context drift. Most enterprise knowledge bases become out of date within months of being written. An ontology that updates itself using the tools that people actually use is a significantly different approach than a static RAG pipeline pointing to a wiki that no one maintains.

Unity AI Gateway: The part that should get budget first

If there’s one announcement from DAIS 2026 that deserves to be at the top of every CIO’s list, it’s Unity AI Gateway. It extends Unity Catalog’s governance model beyond data assets into the runtime layer, meaning it governs what models, agents, MCP servers, and AI tools are actually doing while they’re doing it.

In practice, that means:

  • Spend caps and smart model routing across providers

  • Policy enforcement over what agents can access at runtime, not just what data they’re allowed to query

  • A single governance point for MCP server integrations, which is becoming its own sprawl problem as teams connect agents to more tools
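
To make those three controls concrete, here is a minimal Python sketch of the kind of checks a runtime gateway performs. It is not Unity AI Gateway's API: the `Model` and `GatewayPolicy` classes, the tier numbers, the prices, and the server names are all hypothetical, chosen only to illustrate a spend cap, cost-aware routing, and an MCP allowlist sharing one audit trail.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Model:
    name: str
    tier: int                 # higher = more capable (illustrative scale)
    usd_per_1k_tokens: float  # illustrative price, not a real rate card


@dataclass
class GatewayPolicy:
    """Hypothetical per-team policy, not a Unity AI Gateway object."""
    monthly_cap_usd: float
    models: list[Model]
    allowed_mcp_servers: set[str]
    spent_usd: float = 0.0
    audit_log: list[str] = field(default_factory=list)

    def route(self, min_tier: int, estimated_tokens: int) -> str | None:
        """Return the cheapest capable model, or None if the cap would be exceeded."""
        candidates = [m for m in self.models if m.tier >= min_tier]
        if not candidates:
            self.audit_log.append("DENY reason=no_capable_model")
            return None
        cheapest = min(candidates, key=lambda m: m.usd_per_1k_tokens)
        cost = cheapest.usd_per_1k_tokens * estimated_tokens / 1000
        if self.spent_usd + cost > self.monthly_cap_usd:
            self.audit_log.append("DENY reason=spend_cap")
            return None
        self.spent_usd += cost
        self.audit_log.append(f"ALLOW model={cheapest.name} cost_usd={cost:.4f}")
        return cheapest.name

    def may_call_mcp(self, server: str) -> bool:
        """Allow only MCP servers on the allowlist, and log every decision."""
        allowed = server in self.allowed_mcp_servers
        self.audit_log.append(f"{'ALLOW' if allowed else 'DENY'} mcp={server}")
        return allowed


policy = GatewayPolicy(
    monthly_cap_usd=1.0,
    models=[Model("small", 1, 0.001), Model("large", 2, 0.01)],
    allowed_mcp_servers={"jira", "confluence"},
)
first = policy.route(min_tier=1, estimated_tokens=10_000)   # fits the budget
second = policy.route(min_tier=2, estimated_tokens=50_000)  # fits the budget
third = policy.route(min_tier=2, estimated_tokens=60_000)   # would exceed the cap
```

The design point is that the spend check, the routing decision, and the tool allowlist sit in one place and write to one log, so a denied request is as visible to an auditor as an approved one.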

Databricks presented two examples on stage that are worth repeating to any executive who is skeptical of investing in governance before a single agent ships. Uber reportedly spent its entire annual AI budget in a single quarter as engineers adopted AI tools like Claude Code and Cursor en masse. PepsiCo described the operational nightmare of managing hundreds of different data environments across its global footprint. Spend governance for AI is no longer just a technical courtesy. It’s a finance department issue, and if no one is watching, it shows up in the budget in months, not years.

If you take one thing from this entire summit and act on it before anything else, make it this: deploy governance before scaling agent workloads. Retrofitting auditability and spend control onto agents that are already running in production is harder, slower, and pricier than building it in from the start. We’ve seen this pattern play out with multiple clients, and it never gets cheaper to fix later.

Data quality is still the bottleneck; generative AI hasn't changed that

It would have been easy for a summit that focused on agentic AI to treat data quality as a solved problem. It didn’t. Session after session repeated the same point: AI projects fail because of data issues far more often than because of model limitations, and that hasn’t changed in the generative AI era. Fragmented sources, inconsistent metric definitions, and unclear data ownership produce the same bad outcomes whether you’re running a basic dashboard or a fleet of autonomous agents.

The organizations Databricks pointed to as success stories shared a consistent profile: trusted data sources with documented lineage, consistent metadata management, a unified architecture without duplicate copies of the same data floating around, and clear ownership of data assets across business domains. None of that is new advice. What’s changed is the cost of skipping it. A bad dashboard misleads one analyst. A bad agent can take an autonomous action based on the same wrong number across thousands of transactions, without anyone noticing until the cost or the customer impact becomes apparent.

This is the part of the summit message we’d ask our clients to sit with the longest. It’s tempting to treat governance and data quality as the unglamorous prerequisite to the fun part, the agents and the automation. Databricks’ own field data says the opposite: the organizations moving fastest from pilot to production are the ones who treated governance as the starting point.

Lakehouse architecture is becoming the operational layer, not just the analytics layer

The lakehouse concept isn’t new, but DAIS 2026 marked a significant shift in what it’s expected to do. It is evolving from a pattern for consolidating analytics and AI workloads to a foundation capable of handling real-time operational transactions, which were previously reserved for separate OLTP databases.

Two announcements reflect that shift.

LTAP (Lake Transactional/Analytical Processing) is Databricks’ forthcoming architecture, announced at DAIS 2026 but not yet generally available. It is designed to unify transactional and analytical workloads on a single logical copy of data stored in open formats, removing the CDC pipelines and hidden replication layers that even “zero ETL” approaches still rely on behind the scenes.

Lakebase, the serverless Postgres component that makes LTAP work, is already generally available and processing 12 million database launches a day, which gives Databricks a real production baseline behind the architectural pitch. The 2026 additions include:

  • Cross-cloud disaster recovery

  • Git-style branching for spinning up isolated dev and test environments in seconds

  • A beta hybrid vector and full-text search feature with 32x compression that supports billion-plus vector indexes affordably
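
A quick back-of-envelope calculation shows why the 32x figure matters at billion-vector scale. The sketch below assumes 768-dimensional float32 embeddings and assumes the 32x ratio applies to raw vector storage. Both are our assumptions for illustration, not numbers from Databricks. A 32x ratio is what you would get by going from 32 bits to 1 bit per dimension, but the announcement doesn't say which technique is used.

```python
def index_size_gb(num_vectors: int, dims: int,
                  bytes_per_dim: float = 4.0,
                  compression: float = 1.0) -> float:
    """Raw vector storage in GB (1 GB = 1e9 bytes), ignoring index overhead."""
    return num_vectors * dims * bytes_per_dim / compression / 1e9


# Assumed workload: one billion 768-dimensional float32 embeddings.
raw_gb = index_size_gb(1_000_000_000, 768)
compressed_gb = index_size_gb(1_000_000_000, 768, compression=32)

print(f"uncompressed: {raw_gb:,.0f} GB")        # about 3 TB
print(f"32x compressed: {compressed_gb:,.0f} GB")
```

Under those assumptions, the raw vectors shrink from roughly 3 TB to under 100 GB, which is the difference between a dedicated cluster and something that fits in memory on a single large node. Real indexes carry extra overhead for graph or inverted structures, so treat this as a lower bound.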

Here’s where we’d push back gently on the keynote energy: LTAP is genuinely compelling, but it’s an architectural bet with a 12-to-18-month horizon for most organizations. If you’re building a greenfield data application, it’s worth evaluating seriously now, while the cost of change is low. If you’ve already got an established OLTP and OLAP setup with years of integrations built around it, the right move is a careful pilot, not a wholesale migration based on summit momentum.

We’d say the same thing about Lakehouse RT, the new real-time analytics layer running on Databricks’ Reyden engine at roughly 12,000 queries per second with sub-100 ms latency directly on governed Delta and Iceberg tables. It’s still in beta. Treat it as exactly that until Databricks’ own documentation says otherwise, and confirm GA status before committing production workloads to it.

Agent Bricks & the end of framework lock-in

For teams actually building agents, the most practically useful announcement might be the Agent Bricks expansion. Over 100,000 agents have already been built on the platform, and Databricks reports processing more than 1 quadrillion tokens a year in agent workloads, which tells you the product isn’t a developer-preview toy anymore.

The more interesting detail came from Databricks’ own retrospective on a year of Agent Bricks deployments: the actual agent loop, the part everyone gets excited about, is roughly 1% of the engineering work. The remaining 99%, which includes token capacity, deployment, security, evaluation, monitoring, context management, and team sharing, is hidden technical debt that is not budgeted for upfront. Agent Bricks is built to absorb that 99%, and the 2026 update makes it framework-agnostic: LangGraph, Agno, CrewAI, the Claude Code SDK, and OpenAI’s Agent SDKs all run on it, with horizontal autoscaling through Databricks Apps.

That matters for a specific, practical reason: it removes one of the more common excuses we hear for delaying a governance investment, which is “we haven’t picked a framework yet.” You no longer need to lock into one agent framework to get production-grade deployment, observability, and cost controls. Pick the framework that fits your team’s skills, and let the platform handle the operational layer underneath it.

A companion release, Omnigent, an open-source meta-harness now also available as a managed beta, sits above whatever frameworks you’re already running, adding cost-budget controls and shared sessions across them. Databricks co-founder and CTO Matei Zaharia called it a “harness of harnesses.” It’s early and beta, but it’s a sign of where multi-framework orchestration is heading.

OpenSharing: Vendor neutrality, extended to AI assets

One announcement that drew less attention than the agentic releases but deserves more: OpenSharing, the successor to the Delta Sharing protocol, is now a project under the Linux Foundation. It extends secure, zero-copy data sharing beyond structured datasets to AI assets themselves: agent skills, models, and Genie Agents. On-premises storage partners, including MinIO, NetApp, Nutanix, HPE, and Qumulo, implement it natively, which means organizations with data that legally or contractually can’t move to the cloud still get zero-copy access without standing up custom integration work.

For enterprises that have spent the last decade trying to avoid vendor lock-in on the data side, OpenSharing closes a gap that’s been quietly opening on the AI side. Sharing a table across organizational boundaries has been solved for a while. Sharing an agent or a model the same way hasn’t been, until now.

Where the genuine risk is

A useful summit recap doesn’t just repeat the announcements. It tells you where things will go wrong if you act on them carelessly, and a few patterns are worth flagging plainly:

  • Starting with agents before fixing the data underneath them. Genie Ontology can pull context automatically from your existing documentation, but it inherits every error and inconsistency already sitting in that documentation. It doesn’t fix bad data. It just makes bad data faster to act on.

  • Deploying agents without Unity AI Gateway in place. Ungoverned spending and unauditable agent actions are far more expensive to clean up after deployment than to prevent in the first place, whether the cleanup involves a security incident or simply an uncomfortable budget conversation with finance.

  • Confusing LTAP with the “zero ETL” marketing language you’ve heard from other vendors. Instead of hiding the replication pipeline, LTAP is designed to eliminate it. That’s a real architectural difference, but LTAP isn’t generally available yet, so verify the claim against the technical documentation rather than the keynote slide before any migration planning starts.

  • Treating beta features as production-ready. Lakebase Search, Lakehouse RT, Genie ZeroOps, and the Omnigent managed service are all in beta or private preview as of this summit. Betas from Databricks are usually solid, but “usually solid” and “ready for a production dependency” are different standards. Confirm GA status at docs.databricks.com before any of these end up load-bearing in your architecture.

What we'd actually recommend, in order

If you’re trying to translate three days of keynotes into a sequenced plan, here’s the order we’d suggest based on our read of the announcements and what we already know from delivering data and AI projects for enterprise clients.

  1. Start with governance. Unity AI Gateway, with spend caps and policy controls configured, is the control plane that makes every later step auditable. This isn’t the exciting part of the roadmap, but it’s the part that prevents the expensive surprises.

  2. Build the semantic layer second. Unity Catalog Metrics, a maintained business glossary, and clearly defined domains need to be in place before Genie Ontology or Genie One can ground their answers in anything reliable. This is the work that determines whether your agent deployment is useful or just fast.

  3. Turn on observability from day one, not after something breaks. MLflow 3.0 tracing shows which tool an agent called, what it retrieved, and what it cost. That’s the diagnostic foundation for explaining agent behavior to stakeholders who, fairly, want to know why an autonomous system did what it did.

  4. Pilot Lakebase branching on something new, not your production system. Git-style database branching for isolated dev and test environments is genuinely useful for teams building AI applications with complicated data dependencies, and a greenfield project is the low-risk place to learn how it behaves.

  5. Treat Genie Code’s BI import as a migration accelerator to validate, not a drop-in replacement. Its ability to import Tableau or Power BI workbooks and auto-generate dashboards is still in beta. Check the output against your existing metric definitions before you decommission anything that’s currently working.
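
To illustrate what step 3 asks for, here is a minimal Python sketch of tool-call tracing: every tool an agent invokes leaves a record of what ran, with which inputs, what came back, how long it took, and what it cost. It deliberately does not use MLflow's API. The `traced_tool` decorator, the in-memory `TRACE` list, the `lookup_revenue` tool, and the per-call cost are all hypothetical stand-ins for whatever tracing backend you adopt.

```python
import functools
import time

TRACE = []  # in-memory stand-in for a real tracing backend


def traced_tool(name, usd_per_call=0.0):
    """Record each successful call to the wrapped tool as one trace entry."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)  # exceptions propagate unrecorded in this sketch
            TRACE.append({
                "tool": name,
                "args": args,
                "kwargs": kwargs,
                "result_preview": repr(result)[:80],
                "latency_s": time.perf_counter() - start,
                "cost_usd": usd_per_call,  # assumed flat cost per call
            })
            return result
        return wrapper
    return decorator


@traced_tool("lookup_revenue", usd_per_call=0.002)
def lookup_revenue(region):
    # Hypothetical tool: a real one would query a governed table.
    return {"EMEA": 1.2e6, "APAC": 0.9e6}[region]


lookup_revenue("EMEA")
lookup_revenue("APAC")
total_cost = sum(entry["cost_usd"] for entry in TRACE)
```

With records like these in place from the first deployment, "why did the agent do that, and what did it cost?" becomes a query rather than an investigation.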

The bigger picture for your AI investment

Strip away the product names, and the summit was making one consistent argument: AI success in 2026 depends less on which model you use and more on whether your organization has built the operational discipline to run AI safely and measure what it actually delivers. Databricks pointed to companies like AstraZeneca, Mastercard, Novo Nordisk, and Mercedes-Benz as case studies, but they weren’t praised for picking the smartest model. They were praised for having governed data, clear ownership, and measurable outcomes tied to specific business problems before they scaled anything.

That’s consistent with what we see across our own client engagements, and it’s exactly why we don’t lead with the technology when a client comes to us wanting “an AI agent.” Instead, ITRex evaluates what’s actually broken, what data exists to fix it, and whether an agent is the right tool or whether a simpler automation gets them there faster and cheaper. Sometimes, Databricks is the right platform for that work. Occasionally, a different stack fits the client’s existing investments and team skills better, and we’ll say so. Our priority is the outcome you’re paying for, not the platform we’re partnered with.

If your team is weighing where Unity Catalog, Lakebase, or agentic AI fits into your roadmap, our Databricks consulting and implementation services cover everything from governance setup and semantic layer design to production agent deployment. Contact us to scope your project.