Product Release, 2024-12-12

Google Vertex AI joins the multi-provider AI surface as the third LLM provider

Google Vertex AI joins OpenAI and Anthropic Claude as the platform's third LLM provider. Gemini Pro and Flash models are addressable through the unified AI configuration layer, and a JavaScript model builder API exposes provider-specific options (regional endpoint override, grounding citations, thinking budget) without breaking the neutral application surface.

Google Vertex AI's Gemini models differ from OpenAI and Anthropic along several operational axes: regional endpoint management, grounding citations that link to Google Search results, and a thinking-budget parameter for extended reasoning. To absorb a third provider while keeping the application surface neutral, the adapter has to handle these differences at the boundary, without leaking provider-specific types upward into application code. This release ships the Vertex AI adapter and a JavaScript model builder that exposes provider-specific options to scripts that choose to use them.

Gemini model coverage

Gemini Pro and Flash. Both model families are addressable through the AI configuration layer; the active model is declared per environment as a configuration value. Gemini Flash's lower latency and cost make it the natural choice for high-volume retrieval-augmentation (RAG) steps, while Gemini Pro covers reasoning-intensive tasks that benefit from the larger model.
Regional endpoint override. Vertex AI issues an endpoint per Google Cloud region. The AI configuration layer accepts a regional endpoint override alongside the project ID and credentials, so multi-region deployments route calls to the nearest Vertex endpoint without requiring separate configuration entries per region.
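
As a minimal sketch of how a per-environment model declaration and a regional endpoint override might fit together, the snippet below resolves the endpoint for a call. The configuration shape, the field names (region, endpointOverride), the model identifiers, and the resolveVertexEndpoint helper are all hypothetical and are not the platform's actual schema. The only external fact assumed is that Vertex AI regional endpoints follow the <region>-aiplatform.googleapis.com pattern.

```javascript
// Hypothetical per-environment AI configuration. Field names and values are
// illustrative placeholders, not the platform's real schema.
const aiConfig = {
  staging: {
    provider: "vertex",
    model: "gemini-flash",          // placeholder model identifier
    projectId: "example-project",   // placeholder Google Cloud project ID
    region: "europe-west4",
  },
  production: {
    provider: "vertex",
    model: "gemini-pro",            // placeholder model identifier
    projectId: "example-project",
    region: "us-central1",
    // Assumed optional override: when set, it wins over the regional default.
    endpointOverride: "https://vertex-proxy.example.internal",
  },
};

// Returns the endpoint a call should use for the given environment.
function resolveVertexEndpoint(config, environment) {
  const entry = config[environment];
  if (!entry) {
    throw new Error(`No AI configuration for environment "${environment}"`);
  }
  if (entry.endpointOverride) {
    return entry.endpointOverride;
  }
  // Default: the regional Vertex AI endpoint derived from the region name.
  return `https://${entry.region}-aiplatform.googleapis.com`;
}

console.log(resolveVertexEndpoint(aiConfig, "staging"));
console.log(resolveVertexEndpoint(aiConfig, "production"));
```

In this sketch the override is a single optional field next to the project ID, so an environment without it falls back to the endpoint derived from its region.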

JavaScript model builder API

Provider-specific options at the call site. The neutral AI configuration layer covers the options every provider shares: system prompt, max tokens, and temperature. The JavaScript model builder surfaces the Vertex-specific options (thinking budget, grounding toggle, safety settings) for scripts that need them. Scripts that do not use the builder call through the neutral interface and are unaffected by the Vertex-specific extensions.
Grounding citations. When grounding is enabled, Vertex AI links claims in the model's response to Google Search sources. The adapter surfaces these links as structured metadata alongside the response text; application code can render them as inline references or omit them, depending on the use case.
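
The split between neutral and Vertex-specific options, and the handling of citation metadata, can be illustrated with a small sketch. Everything named here is an assumption made for illustration: the VertexModelBuilder class, its method names, the providerOptions.vertex key, and the { text, citations: [{ title, uri }] } response shape are hypothetical and do not describe the platform's published API.

```javascript
// Hypothetical builder. Class and method names are illustrative only.
class VertexModelBuilder {
  constructor(neutralOptions = {}) {
    // Options every provider shares: systemPrompt, maxTokens, temperature.
    this.neutral = { ...neutralOptions };
    // Vertex-specific options are kept apart from the neutral ones.
    this.vertex = {};
  }

  thinkingBudget(tokens) {
    if (!Number.isInteger(tokens) || tokens < 0) {
      throw new RangeError("thinkingBudget must be a non-negative integer");
    }
    this.vertex.thinkingBudget = tokens;
    return this;
  }

  grounding(enabled) {
    this.vertex.grounding = Boolean(enabled);
    return this;
  }

  safetySettings(settings) {
    this.vertex.safetySettings = [...settings];
    return this;
  }

  // Produces a plain options object; provider-specific values stay namespaced.
  build() {
    return { ...this.neutral, providerOptions: { vertex: { ...this.vertex } } };
  }
}

// Renders response text with numbered references, or plain text when the
// response carries no citation metadata (assumed response shape).
function renderWithCitations(response) {
  const citations = response.citations ?? [];
  if (citations.length === 0) {
    return response.text;
  }
  const refs = citations.map((c, i) => `[${i + 1}] ${c.title} <${c.uri}>`);
  return `${response.text}\n\n${refs.join("\n")}`;
}

const options = new VertexModelBuilder({ temperature: 0.2, maxTokens: 512 })
  .thinkingBudget(1024)
  .grounding(true)
  .build();
console.log(options);
```

Keeping the Vertex options under their own key is one way to let code that only reads the neutral fields ignore them entirely; the real adapter may organise this differently.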

Three providers, one application surface

Configuration-only provider switching. A deployment that runs Gemini Flash for retrieval augmentation and Anthropic Claude for final synthesis declares both in the AI configuration layer; no application-level import references a provider-specific type.
Per-scope assignment. The AI configuration hierarchy (global, company, department) applies to Vertex AI exactly as it does to the other providers. Departments that standardise on Google Cloud route all AI calls through Vertex without forking the application code shared with other departments.
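
One way to picture per-scope assignment is a lookup that layers department settings over company settings over global defaults. The sketch below assumes that the most specific scope wins and that providers are assigned per task (retrieval, synthesis). Both assumptions, along with the hierarchy shape, the resolveAiConfig helper, and the model identifiers, are hypothetical and are not taken from the platform's documentation.

```javascript
// Hypothetical configuration hierarchy. Scope names follow the post
// (global, company, department); the structure itself is illustrative.
const hierarchy = {
  global: {
    retrieval: { provider: "openai", model: "placeholder-model" },
    synthesis: { provider: "anthropic", model: "claude-placeholder" },
  },
  companies: {
    acme: {
      config: {
        retrieval: { provider: "vertex", model: "gemini-flash" },
      },
      departments: {
        research: {
          config: {
            synthesis: { provider: "vertex", model: "gemini-pro" },
          },
        },
      },
    },
  },
};

// Merges global, then company, then department settings, so the most
// specific scope overrides the broader ones (assumed precedence).
function resolveAiConfig(h, companyId, departmentId) {
  const company = h.companies?.[companyId];
  const department = company?.departments?.[departmentId];
  return {
    ...h.global,
    ...(company?.config ?? {}),
    ...(department?.config ?? {}),
  };
}

// Application code asks for a task, never for a provider-specific type.
const researchConfig = resolveAiConfig(hierarchy, "acme", "research");
console.log(researchConfig.retrieval.provider, researchConfig.synthesis.provider);
```

Under these assumptions, a department with no entry of its own inherits the company and global assignments unchanged.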

With three providers behind the AI configuration layer, the multi-provider architecture has now been validated against three distinct API contracts in production. Integrating subsequent providers, such as IBM watsonx and Cohere, is adapter work only; the application layer and the AI configuration model are stable.

