Weekly Column

Jun 27, 2026

AI moved from demos toward production plumbing this week: OpenAI previewed GPT-5.6 Sol and a Broadcom-built inference chip, Google added computer-use capabilities to Gemini 3.5 Flash, AWS pushed isolated MicroVM sandboxes and managed RAG, and infrastructure vendors sharpened the rack-scale economics of agents, retrieval and enterprise deployment.

The week in one paragraph

The industry’s centre of gravity keeps moving from “who has the smartest model?” to “who can turn models into dependable systems?” OpenAI previewed GPT-5.6 Sol and, more strategically, described Jalapeño, a Broadcom-built inference chip aimed at the cost and latency profile of large-language-model serving. Google DeepMind added computer-use capabilities to Gemini 3.5 Flash, while Google Cloud framed agentic AI as a perimeter and governance problem as much as a model problem. AWS launched Lambda MicroVMs for isolated, stateful execution of user- or AI-generated code and expanded Bedrock’s managed knowledge-base layer. NVIDIA and AWS pushed GPU-accelerated retrieval and new Blackwell instances; Arm argued that the CPU’s role in AI racks is expanding, not disappearing. The takeaway: agents are becoming an infrastructure category, and the winners will be the platforms that combine models, retrieval, sandboxing, governance, cost control and developer ergonomics.

The big AI/platform moves

OpenAI had the loudest week. Its official news feed says GPT-5.6 Sol is a next-generation model with stronger coding, science and cybersecurity capabilities and an upgraded safety stack. The more market-shaping announcement may be Jalapeño, the LLM-optimized inference processor it unveiled with Broadcom. The point is not merely that OpenAI wants custom silicon; that has been visible for some time. The point is that inference has become a product-design constraint. As AI moves into always-on enterprise workflows, customer support, coding agents and multimodal assistants, the cost curve of serving tokens becomes as important as benchmark leadership.

OpenAI also published research and customer material around agents transforming work, Codex for longer-running tasks, standards for advanced AI, and open-source security efforts under Daybreak and Patch the Planet. Read together, these releases show a company trying to control more of the stack: frontier models, enterprise usage analytics, coding workflows, security remediation, standards and inference economics. For customers, that integration is attractive; for competitors and regulators, it raises the familiar question of whether the AI platform layer becomes too vertically concentrated.

Google’s week was more product-and-platform oriented. Google DeepMind introduced computer use in Gemini 3.5 Flash, making browser and interface interaction a built-in capability rather than a bespoke wrapper around a model. That matters because “computer use” is one of the missing bridges between chatbot intelligence and useful agents: the model must not only answer, but operate software, observe state, recover from errors and respect policy boundaries. Google Cloud’s companion security post on VPC Service Controls for agentic AI is an important counterweight. The more agents can act, the more enterprises need destination-based perimeters, data-exfiltration controls and observability around tool use.

AWS moved aggressively on the agent runtime. Lambda MicroVMs are a new primitive inside Lambda for isolated, stateful execution environments with Firecracker isolation, near-instant launch/resume and explicit lifecycle control. The target use cases are exactly where agents create risk: AI coding assistants, interactive data tools, vulnerability scanners and applications that run user-supplied or model-generated code. In parallel, Amazon Bedrock Managed Knowledge Base abstracts the RAG stack — connectors, parsing, embeddings, re-ranking, retrieval and scaling — into a managed service. The strategic signal is clear: AWS wants agent builders to compose models, current web knowledge, enterprise documents and secure sandboxes without becoming infrastructure teams.

Data stack and enterprise software

The data-platform story this week was not a single blockbuster release; it was the continued merger of data engineering, application backends and AI retrieval. Databricks wrote about turning video into searchable, actionable intelligence, serverless Postgres for AI applications, ETL migration and customer deployments. The direction is consistent with Databricks’ broader lakehouse thesis: more enterprise information — structured tables, unstructured documents, telemetry, images and video — is being pulled into governed AI workflows.

Google Cloud made BigQuery managed Python UDFs generally available, letting teams define and execute Python scalar functions directly within SQL. That may sound incremental, but it is part of a wider pattern: analytical systems are absorbing application logic, model-adjacent transformations and developer workflows that once sat outside the warehouse. Microsoft Azure’s PostgreSQL-in-VS-Code optimization post points in the same direction from the database developer side. The winning data platforms are trying to reduce context switches: write SQL, invoke Python, tune Postgres, build apps and connect agents without leaving the platform’s control plane.

Snowflake was quieter in the sources reviewed this week, while Databricks and Google Cloud supplied more visible product signals. That does not mean the competitive dynamic has paused. It means buyers should watch whether platforms are competing on benchmark claims or on the operational details that decide adoption: lineage, cost governance, identity, vector search, observability, app deployment and developer experience. In enterprise AI, the boring surface area is increasingly the moat.

What the leaders are saying

Sundar Pichai, CEO of Google/Alphabet — In Google’s I/O 2026 remarks, Pichai said, “Ten years since we pivoted the company to be AI-first, we still see AI as the most profound way to advance our mission and improve people’s lives at scale.” Why it matters: Google is positioning its advantage as a full-stack system — silicon, models, products and platforms — rather than a single model release.
Sundar Pichai, CEO of Google/Alphabet — In the same public remarks, he said Google is taking a “differentiated, full-stack approach to AI innovation, from our custom silicon and secure foundation, to our world-class research and models, to our products and platforms.” Why it matters: This is the clearest articulation of the strategy also visible in this week’s Gemini computer-use and Google Cloud security updates.
OpenAI, official product/research announcement — OpenAI’s public materials this week describe GPT-5.6 Sol as a model with stronger coding, science and cybersecurity capabilities, and Jalapeño as a custom chip built for LLM inference. No independently verifiable executive quote from Sam Altman was available in the primary sources reviewed. Why it matters: Even without a quote, the company’s actions speak loudly: OpenAI is treating inference infrastructure and safety systems as core product surfaces.
Broadcom/OpenAI, official partnership announcement — The Broadcom/OpenAI announcement frames Jalapeño as an LLM-optimized inference processor designed to improve performance, efficiency and scale. No executive quote could be verified from the accessible primary text. Why it matters: Broadcom’s custom-silicon business is being pulled deeper into the AI platform layer, beyond generic accelerator supply.
Pang Ngernsupaluck, author of Microsoft Azure’s post on agentic cloud operations — The post argues that cloud operations are moving “from insight to action,” with agents helping translate observability and operational signals into remediation workflows. Why it matters: Microsoft is folding agents into operations, not just productivity apps, which could make Azure’s control plane more automated and stickier.
Satadal Bhattacharjee, Arm — Arm’s AI infrastructure post says, “The industry is moving beyond a simple accelerator-first view of AI infrastructure.” Why it matters: This is the hardware counter-narrative to GPU-only thinking: agentic inference stresses CPUs, memory, networking and orchestration as much as accelerator throughput.
NVIDIA, official AWS collaboration post — NVIDIA’s post says production AI systems require low-latency inference, fast vector search, strong GPU price-performance and infrastructure that scales without multiplying operational complexity. Why it matters: NVIDIA is framing itself as an end-to-end production AI infrastructure company, with retrieval and data analytics included alongside training and inference.
Cloudflare, official developer-platform post — Cloudflare says self-managed OAuth became necessary as “agentic tools drove demand for delegated access.” Why it matters: Identity and delegated authorization are becoming first-class requirements for agents that act across SaaS, cloud and developer tools.
GitHub, Copilot engineering post by Shibani Basava and Carlos Castro — GitHub’s evaluation of the Copilot agentic harness emphasizes performance across models and token efficiency while preserving the ability to choose among more than 20 models. Why it matters: Model choice is becoming a platform feature; developers will expect agent frameworks to route by task, cost and latency rather than hard-code one provider.
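To make the routing point in the GitHub item concrete, here is a minimal sketch of what "route by task, cost and latency" could look like in code. It is an illustration only, not GitHub's harness: the model names, prices and latency figures are invented placeholders, and `ModelOption` and `route` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    """One entry in a hypothetical model catalog (all values are made up)."""
    name: str
    tasks: frozenset
    cost_per_1k_tokens: float
    p50_latency_ms: int


# Placeholder catalog: names and numbers are assumptions for illustration.
CATALOG = [
    ModelOption("small-fast", frozenset({"chat", "autocomplete"}), 0.10, 200),
    ModelOption("mid-general", frozenset({"chat", "code-review"}), 0.50, 600),
    ModelOption("large-reasoning", frozenset({"code-review", "refactor"}), 2.00, 2500),
]


def route(task, max_latency_ms, max_cost_per_1k, catalog=CATALOG):
    """Pick the cheapest model that supports the task within both limits."""
    candidates = [
        m for m in catalog
        if task in m.tasks
        and m.p50_latency_ms <= max_latency_ms
        and m.cost_per_1k_tokens <= max_cost_per_1k
    ]
    if not candidates:
        raise LookupError(f"no model satisfies the constraints for task {task!r}")
    # Cheapest first; latency breaks ties between equally priced models.
    return min(candidates, key=lambda m: (m.cost_per_1k_tokens, m.p50_latency_ms))
```

The design choice worth noticing is that the provider never appears in the calling code: the caller states a task and its limits, and the catalog can change without touching the agent.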

Products and repos worth watching

Three product categories stood out. First, secure execution for agents: AWS Lambda MicroVMs are designed for stateful, isolated sessions that can run untrusted code with VM-level boundaries. That is a direct answer to the risk created by coding agents and data-analysis agents. Expect similar primitives across clouds, because “let the model run code” is not viable without isolation, lifecycle controls and cost management.

Second, managed retrieval: Bedrock Managed Knowledge Base, NVIDIA cuVS acceleration in OpenSearch Serverless and Databricks’ push around searchable video all point to the same bottleneck. Enterprises do not just need models; they need the model to find the right proprietary context with permissions intact. Retrieval is moving from a bespoke RAG project to a managed platform layer. The hype risk is that “managed RAG” can hide quality problems: parsing, chunking, stale permissions and poor evaluation still determine whether answers are trusted.
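The "permissions intact" requirement is easier to see in a toy example. The sketch below is not how Bedrock, OpenSearch or Databricks implement retrieval; it is a hypothetical keyword matcher (`Chunk` and `retrieve` are invented names) whose only purpose is to show the access check happening before ranking, so that a chunk the user cannot read never reaches the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A piece of indexed text plus the groups allowed to read it."""
    doc_id: str
    text: str
    allowed_groups: frozenset


def retrieve(query, user_groups, chunks, top_k=2):
    """Return the best-matching chunks the user is permitted to see."""
    terms = set(query.lower().split())
    # Filter on permissions first; ranking only ever sees visible chunks.
    visible = [c for c in chunks if c.allowed_groups & user_groups]
    # Toy relevance score: number of query words that appear in the chunk.
    scored = [(len(terms & set(c.text.lower().split())), c) for c in visible]
    scored = [(score, c) for score, c in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]


# Sample data, invented for the example.
CHUNKS = [
    Chunk("hr-1", "salary bands for engineering", frozenset({"hr"})),
    Chunk("eng-1", "engineering onboarding guide", frozenset({"eng", "hr"})),
    Chunk("pub-1", "office opening hours", frozenset({"everyone"})),
]
```

In this toy setup a user in the `eng` group who asks about "engineering salary" gets only the onboarding guide, while an `hr` user gets the salary document first. A stale `allowed_groups` value is exactly the kind of quality problem a managed service can hide.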

Third, developer-accessible model serving: Hugging Face showed how to run a private OpenAI-compatible vLLM server on HF Jobs in one command, billed by hardware usage. This is not the same as a fully managed production endpoint, but it matters for evals, experiments and batch generation. The easier it becomes to spin up private serving, the more teams can test open models against their own workloads before committing to a vendor.

The open-source and developer ecosystem also produced useful signals: IBM Research’s CUGA examples on Hugging Face for agentic apps, GitHub’s Copilot harness evaluation, Cloudflare’s OAuth expansion, and security-oriented posts from Cloudflare and Google Cloud. The common thread is that agents are now pushing on every layer developers touch: auth, CI/CD, evaluation, sandboxes, model routing, observability and policy.

Regulation, risk and market context

Two policy themes deserve attention. Google published a white paper advocating a pragmatic approach to AI governance in America, while GitHub joined a coalition seeking changes to California’s AI Transparency Act to protect open source. Cloudflare also highlighted the White House’s post-quantum executive order and argued it is time to get to work. These are not disconnected issues. AI regulation, open-source liability, content provenance and cryptographic migration all affect the same buyers: enterprises and public-sector organizations trying to deploy new systems without creating compliance debt.

The market context is equally important. OpenAI’s chip work with Broadcom puts pressure on the accelerator ecosystem, but it does not dislodge NVIDIA. NVIDIA’s advantage is increasingly a full stack: GPUs, networking, CUDA libraries, vector-search acceleration, cloud reference architectures and partner validation. Arm’s argument that CPUs matter more in agentic racks complicates the picture further. If inference becomes a heterogeneous pipeline (prefill, decode, retrieval, tool calls, policy checks and application orchestration), then memory bandwidth, CPU efficiency and networking become boardroom-level topics.

For enterprise buyers, the practical risk is over-buying the story and under-investing in operations. Agents need permissions, audit trails, rollback, isolation, evals, human escalation and spend controls. Model improvements will keep arriving, but production value will accrue to organizations that can safely connect models to real systems.
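As a rough illustration of three items on that list (permissions, audit trails and spend controls), the sketch below wraps every tool call in a guard. It is a simplified assumption of how such a wrapper might look, not any vendor's API: `AgentGuard`, `PolicyError` and the per-call cost estimate are hypothetical.

```python
from dataclasses import dataclass, field


class PolicyError(Exception):
    """Raised when a tool call is blocked by permissions or budget."""


@dataclass
class AgentGuard:
    allowed_tools: frozenset
    budget_usd: float
    spent_usd: float = 0.0
    audit_log: list = field(default_factory=list)

    def call(self, tool_name, tool_fn, est_cost_usd, *args):
        """Run tool_fn only if the tool is permitted and the budget allows it."""
        if tool_name not in self.allowed_tools:
            self.audit_log.append((tool_name, "denied: not permitted"))
            raise PolicyError(f"tool {tool_name!r} is not permitted")
        if self.spent_usd + est_cost_usd > self.budget_usd:
            self.audit_log.append((tool_name, "denied: over budget"))
            raise PolicyError(f"tool {tool_name!r} would exceed the budget")
        result = tool_fn(*args)
        # Charge and log only after the tool has returned successfully.
        self.spent_usd += est_cost_usd
        self.audit_log.append((tool_name, "allowed"))
        return result
```

Denied calls are logged as well as allowed ones, because the audit trail is most useful when it records what the agent tried to do and was stopped from doing. Rollback, isolation and human escalation would need their own mechanisms and are out of scope for this sketch.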

What to watch next week

Watch for follow-through on OpenAI’s GPT-5.6 Sol access model and any deeper details on Jalapeño’s availability, workloads and supply chain. Track whether AWS customers start treating Lambda MicroVMs as the default sandbox for coding-agent and data-agent products. Look for Google to connect Gemini computer use more explicitly to Workspace, Cloud and enterprise governance. In data platforms, monitor whether Databricks, Snowflake, Google and Microsoft compete more directly on managed retrieval, operational databases for AI apps and multimodal governance. In hardware, listen for more evidence that AI infrastructure buying is moving from “GPU count” to rack-level throughput per watt, memory and networking efficiency. The weekend takeaway: agentic AI is becoming less magical and more infrastructural — which is exactly what has to happen before it becomes durable enterprise software.

