Vertex AI embedding models — text-embedding-005, text-embedding-large-exp-03-07 and the multilingual variants — now resolve through the platform's AI configuration layer alongside OpenAI and Cohere. The same RAG, vector-search and semantic-deduplication pipelines that talk to OpenAI today talk to Vertex with a single configuration change ; application code is unaffected.
Global model configuration accepts an endpoint override for on-premises Vertex deployments, where Google's customer-side managed service runs inside the customer's VPC and the public endpoint is unreachable. The override applies per-environment so dev and prod can point at different Vertex instances without conditional code.
Organisations standardising on Google Cloud infrastructure inherit a complete Google-native AI path : Vertex for the LLM, Vertex for embeddings, BigQuery for the OLAP store, Hangouts Meet for the collaboration layer. Provider choice remains a per-axis decision — embedding-model and chat-model can target different providers when the workload calls for it.