Mem0
Mem0 documents a universal, self-improving memory layer for LLM applications at docs.mem0.ai, enabling persistent context across sessions via automatic extraction, deduplication, and semantic retrieval. The Mem0 Platform (app.mem0.ai) is a managed service with REST APIs and dashboard; Mem0 Open Source (`pip install mem0ai`) supports self-hosted deployments with pluggable vector and graph stores per docs.mem0.ai/open-source/overview. Integrations cover LangChain, CrewAI, Vercel AI SDK, and 20+ frameworks; the Python SDK uses `MemoryClient` for cloud and `Memory` for local mode.
Weaviate
Weaviate documents an open-source vector database at docs.weaviate.io/weaviate for storing objects and vector embeddings with semantic, keyword, and hybrid search, RAG, reranking, and agent workflows. The ecosystem includes self-hosted Docker/Kubernetes installs, Weaviate Cloud (console.weaviate.cloud), Query Agent, and Weaviate Embeddings for managed inference. Client libraries include Python (`weaviate-client` v4, requires Weaviate 1.23.7+), TypeScript, Go, and Java with REST, gRPC, and GraphQL APIs per the official documentation.
Qdrant
Qdrant documents an AI-native vector search engine at qdrant.tech/documentation for storing, indexing, and querying high-dimensional vectors with optional payloads, supporting dense, sparse, and multi-vector configurations. Official guides cover Docker/Kubernetes self-hosting, Qdrant Cloud on AWS/GCP/Azure, Hybrid Cloud, Private Cloud, and Qdrant Edge for embedded retrieval. Client libraries include Python (`qdrant-client`), JavaScript/TypeScript (`@qdrant/js-client-rest`), Rust, Go, Java, and .NET with REST and gRPC APIs per the API reference at api.qdrant.tech.
Chroma
Chroma documents an open-source embedding database at docs.trychroma.com for storing and querying vectors, metadata, and full-text fields in Python and JavaScript clients. Official guides cover ephemeral in-memory collections, persistent local storage, self-hosted server deployments, and Chroma Cloud at trychroma.com with authentication tokens. The docs describe collection CRUD, `add`/`query`/`get`/`update`/`delete` APIs, embedding functions (default and third-party), hybrid search, and multitenancy patterns for RAG and agent memory workloads per the documentation index.
PlanetScale
PlanetScale documents a relational database platform at planetscale.com/docs supporting both Vitess-backed MySQL clusters and PostgreSQL-compatible databases with branching, deploy requests, query insights, and optional horizontal sharding for large Vitess workloads. Official docs describe locally attached NVMe "Metal" storage for high IOPS, multi-AZ primaries with replicas, the `@planetscale/database` serverless HTTP driver for edge and serverless hosts that block outbound TCP, and standard MySQL connections via dashboard credentials or the `pscale connect` proxy CLI. PlanetScale also documents vector support alongside relational data for both MySQL and Postgres engines per the documentation index at planetscale.com/docs/llms.txt.
Turso
Turso documents libSQL—an open-source SQLite fork—and the Turso CLI (`tursodb`) at docs.turso.tech for embedded, local-first SQLite databases with optional Turso Cloud hosting for multi-region replication. Official guides cover installing the CLI (`curl -sSL tur.so/install | sh`), opening `.db` files or `:memory:` databases, schema migrations, and vector search extensions described in the Turso documentation index. Turso Cloud adds managed groups, database tokens, and HTTP/libSQL remote URLs provisioned through the Turso dashboard and `turso` CLI commands (`turso db create`, `turso db tokens create`) per quickstart pages.
Supabase
Supabase documents an open-source Postgres development platform at supabase.com/docs providing a hosted backend with database, authentication, storage, Edge Functions, realtime subscriptions, and vector search for web and mobile apps. Client libraries and REST/GraphQL APIs let teams provision projects via the dashboard or CLI, manage Row Level Security policies, and integrate AI workflows through pgvector-backed tables and partner marketplace tools. Supabase positions the stack as a Firebase alternative built on standard Postgres, with local development via the Supabase CLI and typed client SDKs for JavaScript, Flutter, Swift, and other frameworks per the documentation index.
CoreWeave
CoreWeave documents inference products at docs.coreweave.com/products/inference spanning Serverless, Dedicated (BYOW on H100/B200/A100-class GPUs), and CKS options, all exposing OpenAI API-compatible endpoints per the inference introduction. The Inference API at api.coreweave.com (v1alpha1) manages gateways, deployments, and capacity claims over REST/JSON, gRPC, or Connect with Bearer tokens requiring Inference Viewer or Inference Admin roles. Getting-started guides walk through gateway creation with IAM authentication, body-based routing on the model field, and chat completion requests against deployed weights in CoreWeave Object Storage.
NVIDIA NIM
NVIDIA NIM documents performance-optimized inference microservices at docs.api.nvidia.com/nim and docs.nvidia.com/nim that expose industry-standard APIs (OpenAI-compatible `/v1/chat/completions`, `/v1/completions`, `/v1/responses`, Anthropic-compatible `/v1/messages`) from containerized models backed by TensorRT-LLM, vLLM, or SGLang per deployment. Teams can self-host GPU-accelerated models on cloud, data center, or RTX workstations, or prototype via NVIDIA-hosted NIM API endpoints through the Developer Program. Management endpoints such as `/v1/health/ready` and `/v1/metrics` support readiness probes and Prometheus metrics on self-hosted containers per the LLM API reference.
AssemblyAI
AssemblyAI documents Voice AI APIs at assemblyai.com/docs where developers transcribe and analyze audio via REST at `https://api.assemblyai.com` and real-time WebSockets at `wss://streaming.assemblyai.com` (EU pre-recorded host `api.eu.assemblyai.com` per cloud residency docs). Pre-recorded transcription requires an explicit `speech_models` array on every `POST /v2/transcript` request—docs recommend `universal-3-pro` with `universal-2` fallback for 99-language coverage. The platform also publishes a Voice Agent API for speech-to-speech agents, Speech Understanding features (diarization, sentiment, summarization), Guardrails, and an LLM Gateway to run frontier models on transcripts.
Deepgram
Deepgram documents speech-to-text at developers.deepgram.com with WebSocket streaming on `/v1/listen` for general real-time transcription (Nova-3 model, diarization, and search features per API reference) and `/v2/listen` for conversational Flux models with integrated end-of-turn detection (StartOfTurn, EndOfTurn, EagerEndOfTurn events). Official SDKs expose `deepgram.listen.v1.connect` and `deepgram.listen.v2.connect` for binary audio streams. Docs contrast Flux—optimized for voice agents with lower turn-detection latency—against Nova-3 for meetings, IVR, and agent-assist workloads, and describe latency measurement guides targeting sub-300 ms streaming for Nova-3. Self-hosted deployments can run Flux on dedicated Engine nodes with `/v2/listen` enabled per self-hosted configuration guides.
fal
fal documents a serverless platform at fal.ai/docs where teams deploy custom models as Python `fal.App` classes with `@fal.endpoint` handlers on auto-scaling H100/A100/B200 runners, or call 1,000+ hosted Model APIs through a unified client. The workflow uses `fal run` for temporary cloud testing and `fal deploy` for persistent endpoints (for example `your-username/my-model` via `fal_client.subscribe` or `https://queue.fal.run/`). Docs describe `setup()` for one-time model loading, machine_type GPU selection, auth modes (private vs public), per-second Serverless billing versus hourly fal Compute for training, and built-in App Analytics with Prometheus-compatible metrics.
RunPod
RunPod documents a serverless platform at docs.runpod.io where teams deploy containerized AI handlers without managing servers, paying only for compute time used. Developers write Python handler functions with the Runpod SDK (`runpod.serverless.start`), package Docker images, and expose queue-based endpoints at `https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync` or `/run` with `Authorization: Bearer RUNPOD_API_KEY`. Docs cover streaming handlers, load-balancing endpoints with custom HTTP frameworks, Pods for persistent GPUs, network volumes, and a REST API at rest.runpod.io for programmatic resource management.
Snowflake Cortex
Snowflake documents Cortex as a suite of managed AI features in docs.snowflake.com: Cortex AI Functions (AI_COMPLETE, AI_EXTRACT, AI_SENTIMENT, AI_TRANSLATE, and related SQL/Python surfaces) run unstructured analytics on text and images with models from OpenAI, Anthropic, Meta, Mistral AI, and DeepSeek subject to CORTEX_USER privileges. Cortex Search provides hybrid retrieval for RAG, Cortex Analyst turns natural-language questions into SQL over semantic models, and Cortex Agents orchestrate Analyst, Search, custom UDFs, and optional web search via agent APIs. Cortex Code adds CLI and Snowsight assistance for data engineering workflows, and Snowflake documents MCP support within the Cortex Code tooling hierarchy.
Baseten
Baseten documents a training and inference platform at docs.baseten.co where teams deploy models via the open-source Truss framework or call hosted Model APIs without standing up infrastructure. Config-only Truss deployments point at Hugging Face checkpoints, select GPU resources, and engines such as TensorRT-LLM; `truss push` builds optimized containers and exposes OpenAI-compatible sync endpoints like `https://model-{model_id}.api.baseten.co/environments/production/sync/v1`. Custom architectures use a Truss `Model` class with `load` and `predict` in `model.py`. Model APIs provide immediate OpenAI-SDK-style access to catalog models (DeepSeek, Qwen, GLM, and others listed in docs) using `BASETEN_API_KEY`.
Modal
Modal documents a serverless cloud at modal.com where engineers run compute-intensive Python with zero infrastructure configuration: deploy OpenAI-compatible LLM services, batch workflows, job queues, GPU training and fine-tuning, and thousands of isolated Sandboxes for agent-generated code. Official guides show defining apps with `@app.function`, container images via `modal.Image`, and GPU types in code rather than YAML. Modal states pricing is per-second serverless usage with pooled capacity across major clouds, and supports calling functions from JavaScript/Go clients in addition to Python.
Fireworks AI
Fireworks AI documents a REST platform at docs.fireworks.ai where developers call language, image, and embedding models with Bearer API keys from the dashboard or `firectl api-key create`. Models use globally unique IDs such as `accounts/<account>/models/<model-id>` and can be served via serverless inference for popular open weights (for example Llama 3.1 70B listed on fireworks.ai/models) or private dedicated GPU deployments for custom base models and LoRA addons. Official docs distinguish serverless per-token billing with best-effort uptime from dedicated deployments billed per GPU-second with private capacity, and state that prompts and generated outputs are not logged except for documented exceptions such as the FireFunction model or opt-in advanced features.
Portkey
Portkey documents an AI gateway at docs.portkey.ai that unifies access to more than 250 models through a Portkey SDK or OpenAI-compatible base URL (`PORTKEY_GATEWAY_URL`) with provider routing headers. Official quickstarts show three-line Python or TypeScript integrations that start monitoring LLM requests for resilience, security, and performance. Portkey states the open-source gateway is free to self-host while the managed service includes a free tier of 10k requests per month, edge-hosted workers adding roughly 20–40ms latency versus direct API calls, ISO 27001 and SOC 2 certifications, and optional configurations that skip storing request/response bodies.
Braintrust
Braintrust documents an AI observability platform at braintrust.dev where teams instrument applications to capture traces (inputs, outputs, latency, token usage, nested tool calls), analyze logs, annotate with human feedback, run experiments and scorers, and iterate on prompts before deployment. Official docs describe a workflow spanning Instrument → Observe → Annotate → Evaluate → Deploy, with auto-instrumentation for major providers (OpenAI, Anthropic, Gemini, Bedrock, Azure, and others listed in the integrations directory) and frameworks such as LangChain, LangGraph, Vercel AI SDK, and Pydantic AI. Span types documented include task, llm, function, tool, and score spans, each capturing metrics and metadata for debugging and building evaluation datasets.
Helicone
Helicone documents an AI Gateway at ai-gateway.helicone.ai that lets teams call 100+ models from OpenAI, Anthropic, Google, Groq, and other vendors through an OpenAI-compatible base URL while logging every request to the Helicone dashboard. Official quickstart guides show signing up at helicone.ai, creating API keys in the US control plane, and pointing standard OpenAI SDK clients at the gateway with automatic observability. Helicone states credits carry 0% markup versus provider list prices, support automatic fallbacks when a provider is down, and allow bringing your own provider keys instead of using Helicone-managed credentials.
LiteLLM
LiteLLM is an open-source Python library and proxy stack documented at docs.litellm.ai that exposes a single `completion()` interface across providers such as OpenAI, Anthropic, Vertex AI, Bedrock, and Ollama using OpenAI-compatible request and response shapes. The project documents a Router with retry, fallback, and load-balancing across deployments, optional observability callbacks (Langfuse, MLflow, Helicone, and others listed in observability guides), and a self-hosted LiteLLM Proxy (LLM Gateway) with virtual keys, spend tracking, guardrails, and an admin UI. Recent documentation also describes an MCP Gateway that centralizes MCP tool access with per-key, per-team, and per-organization permissions.
LangSmith
LangSmith is LangChain's hosted and self-hostable platform for tracing, monitoring, and improving LLM applications. Official documentation at docs.langchain.com describes instrumenting apps via environment variables, framework integrations (OpenAI, Anthropic, CrewAI, Vercel AI SDK, Pydantic AI, and others listed on the integrations page), or the LangSmith SDK so teams can inspect multi-step runs, compare prompt versions, build datasets, run offline and online evaluations, configure automations, and collect feedback queues—without assembling bespoke analytics for agent loops.
Hugging Face Hub
Hugging Face operates the Hugging Face Hub—a central place to browse and host machine-learning artifacts—alongside Spaces for demo apps and documentation for calling models through HTTP APIs using Hugging Face access tokens. Official docs outline creating accounts and tokens (`Settings → Access Tokens`), downloading files with Git LFS-compatible clients, versioning repositories, and invoking models through Inference Providers / serverless patterns published in huggingface.co documentation rather than stitching together bespoke hosting.
Replicate
Replicate is a hosted platform for executing third-party and custom machine-learning models over HTTP without provisioning GPUs yourself. Official documentation explains how to authenticate with API tokens, create asynchronous predictions, stream outputs, retrieve model metadata, wire webhooks for completion events, and optionally deploy or fine-tune checkpoints (for example FLUX image workflows) published to the Replicate catalog.
Together AI
Together AI operates a developer platform for running prominent open-source and vendor-weight models from Together-hosted GPUs. Documentation centers on issuing API keys, installing the Together Python (`together`) or npm (`together-ai`) SDKs, or calling HTTPS endpoints such as `https://api.together.ai/v1/chat/completions` with Bearer authentication. Guides cover streaming chat completions, function calling, structured outputs, model catalog browsing, GPU reservations for steady traffic, and fine-tuning or dedicated cluster offerings published in the broader docs hierarchy.
Groq Cloud API
GroqCloud exposes hosted language, speech, and compound workloads through Groq’s HTTP APIs. Documentation highlights compatibility with OpenAI client libraries when you point `base_url` at Groq’s OpenAI-compatible endpoint and supply a Groq API key, alongside first-party Groq SDKs for Python and JavaScript. Pricing pages publish per-model token rates (USD) for on-demand inference.
Weights & Biases (W&B)
Weights & Biases sells W&B, a cloud-hosted developer platform outlined at docs.wandb.ai where machine-learning practitioners instrument training jobs with first-party SDKs (`wandb`), stream scalars/media/system telemetry into hosted dashboards, collaborate through shared projects/workspaces, and manage hyperparameter Sweeps orchestrated according to Sweeps YAML plus controller policies described in vendor documentation rather than improvised spreadsheets. Companion guides publish patterns for versioning datasets/models through Artifacts, linking reproducible checkpoints plus evaluation payloads, emitting reports, tying runs to notebooks, integrating with prevalent PyTorch/Keras/JAX/Hugging Face/higher-level trainers, monitoring production inference where product SKUs advertise it, and upgrading team security controls—all scoped to whichever features your organization enables on wandb.ai.
OpenRouter
OpenRouter is a model gateway that exposes many third-party AI models through one OpenAI-compatible API. Teams can compare providers, set routing preferences, and switch models without rewriting core client logic for each vendor SDK. The service publishes per-model pricing and supports pay-as-you-go usage.
Langfuse
Langfuse is an open-source product for LLM application observability: it ingests traces and spans from your stack, supports datasets and prompt/version workflows, and offers optional Langfuse Cloud or self-hosted deployment. It integrates with popular Python/JS SDKs and frameworks that emit OpenTelemetry-compatible telemetry, so teams can debug agent loops, compare prompt iterations, and monitor production quality metrics without building a custom analytics pipeline from scratch.
Postgres MCP
pg-mcp-server is a Model Context Protocol server that bridges AI agents and PostgreSQL databases. It exposes schema metadata (tables, columns, indexes, foreign keys) as MCP resources, and lets agents execute read-only SQL queries or transactional writes. Ideal for developers who want Claude, Cursor, or other LLM-powered tools to answer questions about a live database without manual SQL. Supports connection string configuration, SSL modes, and Row-level security awareness.