
CNCF Landscape: Cloud Native AI (CNAI)

Tags: cncf, ai, ml-serving, vector-database, vllm, milvus, qdrant, llm, distributed-training, cnai


With a trend multiplier of 2.0x — the highest of any category — CNAI is the fastest-growing part of the CNCF landscape. This new category represents the convergence of cloud native infrastructure with AI/ML workloads, bringing Kubernetes-grade reliability to AI deployments.

Why CNAI Is Exploding

The CNAI category exists because running AI/ML workloads in production requires the same infrastructure patterns that cloud native was built for: containerization, orchestration, auto-scaling, and observability. As organizations deploy LLMs, vector databases, and ML pipelines at scale, they're reaching for Kubernetes-native tools rather than traditional ML platforms.

vLLM: The ML Serving Standard

vLLM (74,870 stars, 15,022 forks) is the go-to inference engine for large language models. It provides high-throughput serving, using PagedAttention for memory-efficient KV-cache management.

vLLM has become the default serving layer for organizations deploying open-source LLMs because it supports most popular open model architectures, provides continuous batching for high throughput, and exposes an OpenAI-compatible API. If you're serving LLMs in production, vLLM is almost certainly in your stack.

Key capabilities: Continuous batching, PagedAttention, speculative decoding, multi-GPU serving, OpenAI-compatible API, and ecosystem integrations (LangChain, LlamaIndex).
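Because vLLM speaks the OpenAI wire format, any HTTP client can talk to it. The sketch below builds a chat-completion request against a locally running vLLM server; the URL, port, and model name are assumptions for illustration, and the actual network call is only defined, not executed here.

```python
import json
import urllib.request

# Assumed endpoint: vLLM's OpenAI-compatible server, started with
# `vllm serve <model>`, listens on port 8000 by default.
VLLM_URL = "http://localhost:8000/v1/chat/completions"  # assumption

def build_chat_request(model: str, prompt: str, max_tokens: int = 128) -> bytes:
    """Build an OpenAI-style chat-completion request body."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.7,
    }
    return json.dumps(payload).encode("utf-8")

def query_vllm(model: str, prompt: str) -> str:
    """Send the request to a running vLLM server (requires a live server)."""
    req = urllib.request.Request(
        VLLM_URL,
        data=build_chat_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return body["choices"][0]["message"]["content"]

# Inspect the request body without needing a server running:
body = json.loads(build_chat_request("meta-llama/Llama-3.1-8B-Instruct", "Hello"))
print(body["messages"][0]["content"])
```

The same payload works against any OpenAI-compatible backend, which is exactly why vLLM adopted the format: swapping a hosted API for self-hosted vLLM is a one-line URL change.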

Milvus: The Vector Database Leader

Milvus (43,542 stars, 3,922 forks; CNCF incubating) is the most popular purpose-built vector database in the CNCF landscape. It powers RAG (Retrieval-Augmented Generation) architectures by storing and querying high-dimensional embeddings at scale.

Milvus supports multiple index types (IVF_FLAT, HNSW, DiskANN), hybrid search (dense + sparse), multi-tenancy, and can scale to billions of vectors across distributed storage. It's the backbone of many production RAG systems.

Key capabilities: Billion-scale vector storage, hybrid search, real-time upserts, multi-vector support, role-based access control, and cloud-native deployment on Kubernetes.
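Hybrid search is worth unpacking: a dense (embedding) similarity score is fused with a sparse (keyword) relevance score. The toy sketch below shows the idea with a weighted linear fusion over a two-document corpus; the corpus, weights, and function names are illustrative assumptions, not Milvus APIs (in practice you'd use pymilvus against a running cluster).

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def sparse_score(query_terms, doc_terms):
    """Fraction of query terms present in the document (crude sparse signal)."""
    q = set(query_terms)
    return len(q & set(doc_terms)) / len(q)

def hybrid_search(query_vec, query_terms, docs, alpha=0.7):
    """Rank docs by alpha * dense + (1 - alpha) * sparse, best first."""
    scored = []
    for doc in docs:
        dense = cosine(query_vec, doc["vector"])
        sparse = sparse_score(query_terms, doc["terms"])
        scored.append((alpha * dense + (1 - alpha) * sparse, doc["id"]))
    return sorted(scored, reverse=True)

docs = [
    {"id": "a", "vector": [1.0, 0.0], "terms": ["kubernetes", "serving"]},
    {"id": "b", "vector": [0.0, 1.0], "terms": ["vector", "database"]},
]
# Query embedding close to doc "a", with one keyword hit in each doc:
print(hybrid_search([0.9, 0.1], ["vector", "serving"], docs))
```

Production systems typically replace the linear fusion with reciprocal rank fusion or a learned reranker, but the shape of the problem is the same: two ranked signals merged into one.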

Qdrant: High-Performance Vector Search

Qdrant (29,953 stars, 2,149 forks) provides fast, filtered vector search with a Rust-based engine. It's popular for applications that need both semantic search and metadata filtering.

Qdrant excels at combining dense vector similarity with precise metadata filtering — essential for production search where you need "documents similar to X that also match filters Y and Z." Its gRPC API and native filtering make it a strong choice for real-time applications.

Key capabilities: Payload-based filtering, sparse vectors, multi-collection queries, quantization for memory efficiency, and an API-first design.
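The "similar to X, matching filters Y and Z" pattern can be sketched in a few lines: score only the points whose payload satisfies every filter condition. This brute-force stand-in shows the concept; the point/payload shapes and the `must` dict are illustrative assumptions, not Qdrant's actual client API (Qdrant evaluates filters inside its HNSW traversal rather than pre-filtering a list).

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def filtered_search(query, points, must, top_k=3):
    """Top-k points by similarity, restricted to payloads matching `must`."""
    hits = [
        (cosine(query, p["vector"]), p["id"])
        for p in points
        if all(p["payload"].get(k) == v for k, v in must.items())
    ]
    return sorted(hits, reverse=True)[:top_k]

points = [
    {"id": 1, "vector": [1.0, 0.0], "payload": {"lang": "en", "year": 2024}},
    {"id": 2, "vector": [0.9, 0.1], "payload": {"lang": "de", "year": 2024}},
    {"id": 3, "vector": [0.0, 1.0], "payload": {"lang": "en", "year": 2023}},
]
# "similar to [1, 0] AND lang == en AND year == 2024" -> only point 1 qualifies
print(filtered_search([1.0, 0.0], points, {"lang": "en", "year": 2024}))
```

The design point this illustrates: filtering after approximate search can return too few results when the filter is selective, which is why engines like Qdrant push the filter into the index traversal itself.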

Chroma: Developer-Friendly Embedding Database

Chroma (27,074 stars, 2,160 forks) is a vector database built for developers: an embedding database with a Pythonic API that is easy to integrate.

Chroma is the go-to for prototyping RAG applications and local development. It supports in-memory and persistent storage, works with any embedding model, and provides built-in metadata filtering. For teams getting started with vector search, Chroma is often the first vector DB they touch.

Key capabilities: In-memory and persistent storage, automatic embedding generation, metadata filtering, open-source and self-hostable, and simple Python/JS APIs.
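Chroma's appeal is the tiny surface area of its collection API: add documents with ids and embeddings, then query. The stand-in class below mimics that add/query pattern in pure Python to show why it's so easy to pick up; `MiniCollection` and its method shapes are illustrative assumptions, not the chromadb library itself.

```python
class MiniCollection:
    """Toy in-memory store mimicking Chroma's add/query pattern."""

    def __init__(self):
        self._items = {}  # id -> (embedding, metadata)

    def add(self, ids, embeddings, metadatas=None):
        metadatas = metadatas or [{}] * len(ids)
        for i, e, m in zip(ids, embeddings, metadatas):
            self._items[i] = (e, m)

    def query(self, query_embedding, n_results=2, where=None):
        """Return the ids of the n_results nearest items matching `where`."""
        def dist(a, b):  # squared Euclidean distance
            return sum((x - y) ** 2 for x, y in zip(a, b))
        candidates = [
            (dist(query_embedding, e), i)
            for i, (e, m) in self._items.items()
            if not where or all(m.get(k) == v for k, v in where.items())
        ]
        return [i for _, i in sorted(candidates)[:n_results]]

col = MiniCollection()
col.add(
    ids=["doc1", "doc2", "doc3"],
    embeddings=[[0.0, 0.0], [1.0, 1.0], [0.1, 0.0]],
    metadatas=[{"topic": "ai"}, {"topic": "devops"}, {"topic": "ai"}],
)
print(col.query([0.0, 0.1], n_results=2, where={"topic": "ai"}))
```

Real Chroma adds automatic embedding generation on top (you can pass raw text and let it embed), which this sketch omits, but the add-then-query workflow is the same shape.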

The CNAI Ecosystem

Beyond these four top projects, CNAI encompasses several critical subcategories:

  • ML Serving: vLLM, Triton Inference Server
  • Vector Databases: Milvus, Qdrant, Chroma, Weaviate
  • Distributed Training: Ray, DeepSpeed, Kubeflow Training
  • Model Observability: EvidentlyAI, Arize Phoenix, Langfuse
  • Data Architecture: LakeFS, DVC, Great Expectations
  • AutoML: AutoGluon, FLAML, Optuna
  • Open Enterprise AI Blueprints: Patterns for deploying AI responsibly

When to Use What

  • Serving LLMs in production? vLLM — it's the standard for a reason.
  • Building RAG systems? Milvus (scale) or Qdrant (filtered search) or Chroma (prototyping).
  • MLOps on Kubernetes? Kubeflow, Ray for distributed training.
  • Need model monitoring? Langfuse or Phoenix for trace-based evaluation.

See Also