Developer Tools AI Tools

Jina AI

Jina AI documents Search Foundation APIs at docs.jina.ai for RAG and agent pipelines. The Reader API at POST https://r.jina.ai/ converts a single URL into LLM-friendly markdown or JSON with optional headers such as X-Engine (browser/direct), X-Return-Format, and X-Respond-With readerlm-v2 per docs. The Search API at POST https://s.jina.ai/ returns SERP-style web results optimized for downstream LLM use with query parameters q, gl, hl, num, and page. Embeddings API at POST https://api.jina.ai/v1/embeddings supports models including jina-embeddings-v5-text-small, jina-embeddings-v5-text-nano, jina-embeddings-v4, jina-embeddings-v3, and jina-clip-v2 with task types such as retrieval.query and retrieval.passage. Reranker API at POST https://api.jina.ai/v1/rerank includes jina-reranker-v3 for refining retrieval hits. EU endpoints eu.r.jina.ai and eu.s.jina.ai keep processing in EU jurisdiction per docs.

Voyage AI

Developer ToolsUsage-based API prici…

Voyage AI documents state-of-the-art embedding models and rerankers at docs.voyageai.com for retrieval-augmented generation and semantic search. The embeddings API at POST https://api.voyageai.com/v1/embeddings converts text into dense vectors; the Python `voyageai` package exposes `voyageai.Client.embed()` with models such as voyage-4-large, voyage-4, voyage-4-lite, voyage-code-3, and domain-specific variants per docs.voyageai.com/docs/embeddings. Docs describe `input_type` query vs document prompts for retrieval tasks, optional `output_dimension` (256–2048), and `output_dtype` quantization (float, int8, binary). Rerankers score query–document relevance to refine initial retrieval results. Voyage AI is part of MongoDB's AI stack (MongoDB docs reference Voyage embeddings) and integrates with vector stores and LLMs as modular RAG components.

Meilisearch

Developer ToolsOpen source

Meilisearch documents an open-source search engine at meilisearch.com/docs for fast, typo-tolerant full-text search, faceting, filtering, and sorting. Meilisearch Cloud offers hosted deployment; self-hosted options include Docker and native binaries with REST API and official SDKs (JavaScript, Python, Rust, PHP, Java, .NET, Dart, Go). Recent docs and blog posts describe semantic and hybrid search capabilities, AI-powered search experiences, and the official meilisearch-mcp server for LLM clients. Meilisearch positions itself as a developer-friendly alternative focused on sub-50ms search experiences with simple index/document APIs.

turbopuffer

Developer ToolsUsage-based pricing

turbopuffer documents a fast search engine at turbopuffer.com/docs built natively on object storage (S3, GCS, Azure Blob) with NVMe/memory caching for compute. Architecture docs describe a write-ahead log on object storage, SPFresh centroid-based ANN indexes for vectors, inverted BM25 indexes for full-text, exact metadata indexes with native filtering, and branching copy-on-write namespaces. The API supports vector ANN queries, BM25 full-text, hybrid multi-queries, regex/trigram search, filters, and encryption with customer keys per turbopuffer.com/docs/index and turbopuffer.com/docs/architecture. Docs cite production-scale limits observed (4T+ documents, 10M+ writes/s, 25k+ queries/s) with tradeoffs: higher write latency from object-storage durability and occasional cold queries on uncached namespaces.

Typesense

Developer ToolsOpen source

Typesense documents an open-source search engine at typesense.org/docs for fast typo-tolerant keyword search, faceting, and vector retrieval. Vector search docs at typesense.org/docs/30.2/api/vector-search describe KNN search on imported embeddings or auto-generated embeddings via OpenAI, Google PaLM API, or built-in Hugging Face models in huggingface.co/typesense/models (use the `ts` namespace prefix). Features include semantic search, hybrid search with rank fusion and adjustable `alpha` weighting, similar-document queries by ID, HNSW approximate search with optional `flat_search_cutoff` brute-force mode, and cosine `vector_distance` scoring. Deploy via Typesense Cloud or self-hosted Docker/binaries with REST API and official client libraries.

Pinecone

Developer ToolsStarter free tier + S…

Pinecone documents a fully managed vector database at docs.pinecone.io for storing, indexing, and querying high-dimensional embeddings at production scale. Serverless indexes support document schemas mixing dense vectors, sparse vectors, and full-text search fields with metadata filtering per docs.pinecone.io/guides/get-started/concepts. Official SDKs include Python, Node.js, Java, and Go; REST API access uses documented rate limits and plan tiers (Starter, Standard, Enterprise). Pinecone also documents Pinecone Assistant, Dedicated Read Nodes, BYOC, and Nexus offerings on pinecone.io alongside MCP integrations (Pinecone MCP Server and Pinecone Docs MCP Server) for agent workflows.

Milvus

Developer ToolsOpen source

Milvus documents a high-performance vector database at milvus.io/docs for storing, indexing, and searching embedding vectors with metadata filtering and hybrid search. Deployment options include Milvus Lite (`pip install pymilvus` for notebooks/edge), Milvus Standalone (single Docker image), and Milvus Distributed on Kubernetes per milvus.io/docs/v2.6.x/install-overview. Official SDKs include PyMilvus, Go, Java, Node.js, and C#; Zilliz Cloud offers managed Milvus. Architecture separates access, coordinator, worker, and storage layers with object storage backends (MinIO, S3, Azure Blob) per milvus.io/docs/architecture_overview.

LanceDB

Developer ToolsOpen source

LanceDB documents a multimodal lakehouse for AI at docs.lancedb.com, built on the open-source Lance columnar format for storing vectors, metadata, raw bytes, and embeddings in unified tables. LanceDB OSS is an embedded library with Python, TypeScript, and Rust SDKs for local development; LanceDB Enterprise is a distributed managed lakehouse for search, curation, feature engineering, and training workflows per docs.lancedb.com. Features include vector/semantic search, BM25 full-text search, hybrid search with SQL filters, versioning, and cloud object-store integration (S3, GCS, Azure).

Mem0

Developer ToolsMem0 Platform usage-b…

Mem0 documents a universal, self-improving memory layer for LLM applications at docs.mem0.ai, enabling persistent context across sessions via automatic extraction, deduplication, and semantic retrieval. The Mem0 Platform (app.mem0.ai) is a managed service with REST APIs and dashboard; Mem0 Open Source (`pip install mem0ai`) supports self-hosted deployments with pluggable vector and graph stores per docs.mem0.ai/open-source/overview. Integrations cover LangChain, CrewAI, Vercel AI SDK, and 20+ frameworks; the Python SDK uses `MemoryClient` for cloud and `Memory` for local mode.

Weaviate

Developer ToolsOpen source

Weaviate documents an open-source vector database at docs.weaviate.io/weaviate for storing objects and vector embeddings with semantic, keyword, and hybrid search, RAG, reranking, and agent workflows. The ecosystem includes self-hosted Docker/Kubernetes installs, Weaviate Cloud (console.weaviate.cloud), Query Agent, and Weaviate Embeddings for managed inference. Client libraries include Python (`weaviate-client` v4, requires Weaviate 1.23.7+), TypeScript, Go, and Java with REST, gRPC, and GraphQL APIs per the official documentation.

Qdrant

Developer ToolsOpen source

Qdrant documents an AI-native vector search engine at qdrant.tech/documentation for storing, indexing, and querying high-dimensional vectors with optional payloads, supporting dense, sparse, and multi-vector configurations. Official guides cover Docker/Kubernetes self-hosting, Qdrant Cloud on AWS/GCP/Azure, Hybrid Cloud, Private Cloud, and Qdrant Edge for embedded retrieval. Client libraries include Python (`qdrant-client`), JavaScript/TypeScript (`@qdrant/js-client-rest`), Rust, Go, Java, and .NET with REST and gRPC APIs per the API reference at api.qdrant.tech.

Chroma

Developer ToolsOpen source

Chroma documents an open-source embedding database at docs.trychroma.com for storing and querying vectors, metadata, and full-text fields in Python and JavaScript clients. Official guides cover ephemeral in-memory collections, persistent local storage, self-hosted server deployments, and Chroma Cloud at trychroma.com with authentication tokens. The docs describe collection CRUD, `add`/`query`/`get`/`update`/`delete` APIs, embedding functions (default and third-party), hybrid search, and multitenancy patterns for RAG and agent memory workloads per the documentation index.

PlanetScale

Developer ToolsBase and Enterprise p…

PlanetScale documents a relational database platform at planetscale.com/docs supporting both Vitess-backed MySQL clusters and PostgreSQL-compatible databases with branching, deploy requests, query insights, and optional horizontal sharding for large Vitess workloads. Official docs describe locally attached NVMe "Metal" storage for high IOPS, multi-AZ primaries with replicas, the `@planetscale/database` serverless HTTP driver for edge and serverless hosts that block outbound TCP, and standard MySQL connections via dashboard credentials or the `pscale connect` proxy CLI. PlanetScale also documents vector support alongside relational data for both MySQL and Postgres engines per the documentation index at planetscale.com/docs/llms.txt.

Turso

Developer ToolsOpen source

Turso documents libSQL—an open-source SQLite fork—and the Turso CLI (`tursodb`) at docs.turso.tech for embedded, local-first SQLite databases with optional Turso Cloud hosting for multi-region replication. Official guides cover installing the CLI (`curl -sSL tur.so/install | sh`), opening `.db` files or `:memory:` databases, schema migrations, and vector search extensions described in the Turso documentation index. Turso Cloud adds managed groups, database tokens, and HTTP/libSQL remote URLs provisioned through the Turso dashboard and `turso` CLI commands (`turso db create`, `turso db tokens create`) per quickstart pages.

Supabase

Developer ToolsFree + Paid

Supabase documents an open-source Postgres development platform at supabase.com/docs providing a hosted backend with database, authentication, storage, Edge Functions, realtime subscriptions, and vector search for web and mobile apps. Client libraries and REST/GraphQL APIs let teams provision projects via the dashboard or CLI, manage Row Level Security policies, and integrate AI workflows through pgvector-backed tables and partner marketplace tools. Supabase positions the stack as a Firebase alternative built on standard Postgres, with local development via the Supabase CLI and typed client SDKs for JavaScript, Flutter, Swift, and other frameworks per the documentation index.

CoreWeave

Developer ToolsUsage-based GPU infer…

CoreWeave documents inference products at docs.coreweave.com/products/inference spanning Serverless, Dedicated (BYOW on H100/B200/A100-class GPUs), and CKS options, all exposing OpenAI API-compatible endpoints per the inference introduction. The Inference API at api.coreweave.com (v1alpha1) manages gateways, deployments, and capacity claims over REST/JSON, gRPC, or Connect with Bearer tokens requiring Inference Viewer or Inference Admin roles. Getting-started guides walk through gateway creation with IAM authentication, body-based routing on the model field, and chat completion requests against deployed weights in CoreWeave Object Storage.

NVIDIA NIM

Developer ToolsDeveloper Program hos…

NVIDIA NIM documents performance-optimized inference microservices at docs.api.nvidia.com/nim and docs.nvidia.com/nim that expose industry-standard APIs (OpenAI-compatible `/v1/chat/completions`, `/v1/completions`, `/v1/responses`, Anthropic-compatible `/v1/messages`) from containerized models backed by TensorRT-LLM, vLLM, or SGLang per deployment. Teams can self-host GPU-accelerated models on cloud, data center, or RTX workstations, or prototype via NVIDIA-hosted NIM API endpoints through the Developer Program. Management endpoints such as `/v1/health/ready` and `/v1/metrics` support readiness probes and Prometheus metrics on self-hosted containers per the LLM API reference.

AssemblyAI

Developer ToolsPay-as-you-go per aud…

AssemblyAI documents Voice AI APIs at assemblyai.com/docs where developers transcribe and analyze audio via REST at `https://api.assemblyai.com` and real-time WebSockets at `wss://streaming.assemblyai.com` (EU pre-recorded host `api.eu.assemblyai.com` per cloud residency docs). Pre-recorded transcription requires an explicit `speech_models` array on every `POST /v2/transcript` request—docs recommend `universal-3-pro` with `universal-2` fallback for 99-language coverage. The platform also publishes a Voice Agent API for speech-to-speech agents, Speech Understanding features (diarization, sentiment, summarization), Guardrails, and an LLM Gateway to run frontier models on transcripts.

Deepgram

Developer ToolsPay-as-you-go per aud…

Deepgram documents speech-to-text at developers.deepgram.com with WebSocket streaming on `/v1/listen` for general real-time transcription (Nova-3 model, diarization, and search features per API reference) and `/v2/listen` for conversational Flux models with integrated end-of-turn detection (StartOfTurn, EndOfTurn, EagerEndOfTurn events). Official SDKs expose `deepgram.listen.v1.connect` and `deepgram.listen.v2.connect` for binary audio streams. Docs contrast Flux—optimized for voice agents with lower turn-detection latency—against Nova-3 for meetings, IVR, and agent-assist workloads, and describe latency measurement guides targeting sub-300 ms streaming for Nova-3. Self-hosted deployments can run Flux on dedicated Engine nodes with `/v2/listen` enabled per self-hosted configuration guides.

fal

Developer ToolsPer-second Serverless…

fal documents a serverless platform at fal.ai/docs where teams deploy custom models as Python `fal.App` classes with `@fal.endpoint` handlers on auto-scaling H100/A100/B200 runners, or call 1,000+ hosted Model APIs through a unified client. The workflow uses `fal run` for temporary cloud testing and `fal deploy` for persistent endpoints (for example `your-username/my-model` via `fal_client.subscribe` or `https://queue.fal.run/`). Docs describe `setup()` for one-time model loading, machine_type GPU selection, auth modes (private vs public), per-second Serverless billing versus hourly fal Compute for training, and built-in App Analytics with Prometheus-compatible metrics.

RunPod

Developer ToolsPer-second serverless…

RunPod documents a serverless platform at docs.runpod.io where teams deploy containerized AI handlers without managing servers, paying only for compute time used. Developers write Python handler functions with the Runpod SDK (`runpod.serverless.start`), package Docker images, and expose queue-based endpoints at `https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync` or `/run` with `Authorization: Bearer RUNPOD_API_KEY`. Docs cover streaming handlers, load-balancing endpoints with custom HTTP frameworks, Pods for persistent GPUs, network volumes, and a REST API at rest.runpod.io for programmatic resource management.

Snowflake Cortex

Developer ToolsToken-based Cortex us…

Snowflake documents Cortex as a suite of managed AI features in docs.snowflake.com: Cortex AI Functions (AI_COMPLETE, AI_EXTRACT, AI_SENTIMENT, AI_TRANSLATE, and related SQL/Python surfaces) run unstructured analytics on text and images with models from OpenAI, Anthropic, Meta, Mistral AI, and DeepSeek subject to CORTEX_USER privileges. Cortex Search provides hybrid retrieval for RAG, Cortex Analyst turns natural-language questions into SQL over semantic models, and Cortex Agents orchestrate Analyst, Search, custom UDFs, and optional web search via agent APIs. Cortex Code adds CLI and Snowsight assistance for data engineering workflows, and Snowflake documents MCP support within the Cortex Code tooling hierarchy.

Baseten

Developer ToolsUsage-based inference…

Baseten documents a training and inference platform at docs.baseten.co where teams deploy models via the open-source Truss framework or call hosted Model APIs without standing up infrastructure. Config-only Truss deployments point at Hugging Face checkpoints, select GPU resources, and engines such as TensorRT-LLM; `truss push` builds optimized containers and exposes OpenAI-compatible sync endpoints like `https://model-{model_id}.api.baseten.co/environments/production/sync/v1`. Custom architectures use a Truss `Model` class with `load` and `predict` in `model.py`. Model APIs provide immediate OpenAI-SDK-style access to catalog models (DeepSeek, Qwen, GLM, and others listed in docs) using `BASETEN_API_KEY`.

Modal

Developer ToolsPer-second serverless…

Modal documents a serverless cloud at modal.com where engineers run compute-intensive Python with zero infrastructure configuration: deploy OpenAI-compatible LLM services, batch workflows, job queues, GPU training and fine-tuning, and thousands of isolated Sandboxes for agent-generated code. Official guides show defining apps with `@app.function`, container images via `modal.Image`, and GPU types in code rather than YAML. Modal states pricing is per-second serverless usage with pooled capacity across major clouds, and supports calling functions from JavaScript/Go clients in addition to Python.

Fireworks AI

Developer ToolsServerless per-token …

Fireworks AI documents a REST platform at docs.fireworks.ai where developers call language, image, and embedding models with Bearer API keys from the dashboard or `firectl api-key create`. Models use globally unique IDs such as `accounts/<account>/models/<model-id>` and can be served via serverless inference for popular open weights (for example Llama 3.1 70B listed on fireworks.ai/models) or private dedicated GPU deployments for custom base models and LoRA addons. Official docs distinguish serverless per-token billing with best-effort uptime from dedicated deployments billed per GPU-second with private capacity, and state that prompts and generated outputs are not logged except for documented exceptions such as the FireFunction model or opt-in advanced features.

Portkey

Developer ToolsOpen source

Portkey documents an AI gateway at docs.portkey.ai that unifies access to more than 250 models through a Portkey SDK or OpenAI-compatible base URL (`PORTKEY_GATEWAY_URL`) with provider routing headers. Official quickstarts show three-line Python or TypeScript integrations that start monitoring LLM requests for resilience, security, and performance. Portkey states the open-source gateway is free to self-host while the managed service includes a free tier of 10k requests per month, edge-hosted workers adding roughly 20–40ms latency versus direct API calls, ISO 27001 and SOC 2 certifications, and optional configurations that skip storing request/response bodies.

Braintrust

Developer ToolsFree + Paid

Braintrust documents an AI observability platform at braintrust.dev where teams instrument applications to capture traces (inputs, outputs, latency, token usage, nested tool calls), analyze logs, annotate with human feedback, run experiments and scorers, and iterate on prompts before deployment. Official docs describe a workflow spanning Instrument → Observe → Annotate → Evaluate → Deploy, with auto-instrumentation for major providers (OpenAI, Anthropic, Gemini, Bedrock, Azure, and others listed in the integrations directory) and frameworks such as LangChain, LangGraph, Vercel AI SDK, and Pydantic AI. Span types documented include task, llm, function, tool, and score spans, each capturing metrics and metadata for debugging and building evaluation datasets.

Helicone

Developer ToolsFree + Paid

Helicone documents an AI Gateway at ai-gateway.helicone.ai that lets teams call 100+ models from OpenAI, Anthropic, Google, Groq, and other vendors through an OpenAI-compatible base URL while logging every request to the Helicone dashboard. Official quickstart guides show signing up at helicone.ai, creating API keys in the US control plane, and pointing standard OpenAI SDK clients at the gateway with automatic observability. Helicone states credits carry 0% markup versus provider list prices, support automatic fallbacks when a provider is down, and allow bringing your own provider keys instead of using Helicone-managed credentials.

LiteLLM

Developer ToolsOpen source

LiteLLM is an open-source Python library and proxy stack documented at docs.litellm.ai that exposes a single `completion()` interface across providers such as OpenAI, Anthropic, Vertex AI, Bedrock, and Ollama using OpenAI-compatible request and response shapes. The project documents a Router with retry, fallback, and load-balancing across deployments, optional observability callbacks (Langfuse, MLflow, Helicone, and others listed in observability guides), and a self-hosted LiteLLM Proxy (LLM Gateway) with virtual keys, spend tracking, guardrails, and an admin UI. Recent documentation also describes an MCP Gateway that centralizes MCP tool access with per-key, per-team, and per-organization permissions.

LangSmith

Developer ToolsFree + Paid

LangSmith is LangChain's hosted and self-hostable platform for tracing, monitoring, and improving LLM applications. Official documentation at docs.langchain.com describes instrumenting apps via environment variables, framework integrations (OpenAI, Anthropic, CrewAI, Vercel AI SDK, Pydantic AI, and others listed on the integrations page), or the LangSmith SDK so teams can inspect multi-step runs, compare prompt versions, build datasets, run offline and online evaluations, configure automations, and collect feedback queues—without assembling bespoke analytics for agent loops.

Hugging Face Hub

Developer ToolsFree tier

Hugging Face operates the Hugging Face Hub—a central place to browse and host machine-learning artifacts—alongside Spaces for demo apps and documentation for calling models through HTTP APIs using Hugging Face access tokens. Official docs outline creating accounts and tokens (`Settings → Access Tokens`), downloading files with Git LFS-compatible clients, versioning repositories, and invoking models through Inference Providers / serverless patterns published in huggingface.co documentation rather than stitching together bespoke hosting.

Replicate

Developer ToolsPay-per-prediction bi…

Replicate is a hosted platform for executing third-party and custom machine-learning models over HTTP without provisioning GPUs yourself. Official documentation explains how to authenticate with API tokens, create asynchronous predictions, stream outputs, retrieve model metadata, wire webhooks for completion events, and optionally deploy or fine-tune checkpoints (for example FLUX image workflows) published to the Replicate catalog.

Together AI

Developer ToolsUsage-based inference…

Together AI operates a developer platform for running prominent open-source and vendor-weight models from Together-hosted GPUs. Documentation centers on issuing API keys, installing the Together Python (`together`) or npm (`together-ai`) SDKs, or calling HTTPS endpoints such as `https://api.together.ai/v1/chat/completions` with Bearer authentication. Guides cover streaming chat completions, function calling, structured outputs, model catalog browsing, GPU reservations for steady traffic, and fine-tuning or dedicated cluster offerings published in the broader docs hierarchy.

Groq Cloud API

Developer ToolsFree + Paid

GroqCloud exposes hosted language, speech, and compound workloads through Groq’s HTTP APIs. Documentation highlights compatibility with OpenAI client libraries when you point `base_url` at Groq’s OpenAI-compatible endpoint and supply a Groq API key, alongside first-party Groq SDKs for Python and JavaScript. Pricing pages publish per-model token rates (USD) for on-demand inference.

Weights & Biases (W&B)

Developer ToolsFree + Paid

Weights & Biases sells W&B, a cloud-hosted developer platform outlined at docs.wandb.ai where machine-learning practitioners instrument training jobs with first-party SDKs (`wandb`), stream scalars/media/system telemetry into hosted dashboards, collaborate through shared projects/workspaces, and manage hyperparameter Sweeps orchestrated according to Sweeps YAML plus controller policies described in vendor documentation rather than improvised spreadsheets. Companion guides publish patterns for versioning datasets/models through Artifacts, linking reproducible checkpoints plus evaluation payloads, emitting reports, tying runs to notebooks, integrating with prevalent PyTorch/Keras/JAX/Hugging Face/higher-level trainers, monitoring production inference where product SKUs advertise it, and upgrading team security controls—all scoped to whichever features your organization enables on wandb.ai.

OpenRouter

Developer ToolsFree + Paid

OpenRouter is a model gateway that exposes many third-party AI models through one OpenAI-compatible API. Teams can compare providers, set routing preferences, and switch models without rewriting core client logic for each vendor SDK. The service publishes per-model pricing and supports pay-as-you-go usage.

Langfuse

Developer ToolsOpen source

Langfuse is an open-source product for LLM application observability: it ingests traces and spans from your stack, supports datasets and prompt/version workflows, and offers optional Langfuse Cloud or self-hosted deployment. It integrates with popular Python/JS SDKs and frameworks that emit OpenTelemetry-compatible telemetry, so teams can debug agent loops, compare prompt iterations, and monitor production quality metrics without building a custom analytics pipeline from scratch.

Postgres MCP

Developer ToolsFree tier

pg-mcp-server is a Model Context Protocol server that bridges AI agents and PostgreSQL databases. It exposes schema metadata (tables, columns, indexes, foreign keys) as MCP resources, and lets agents execute read-only SQL queries or transactional writes. Ideal for developers who want Claude, Cursor, or other LLM-powered tools to answer questions about a live database without manual SQL. Supports connection string configuration, SSL modes, and Row-level security awareness.