Maps the authoritative OWASP "Top 10 for Large Language Model Applications" (version 1.1) taxonomy, LLM01 Prompt Injection through LLM10 Model Theft, into an actionable readiness checklist for architects red-teaming Retrieval-Augmented Generation systems, agents, plugins, training pipelines, or hosted inference gateways. Official project pages summarize each risk bucket: prompt injection bypassing safeguards, unchecked outputs enabling downstream exploits, poisoned corpora distorting reasoning, abusive workloads starving capacity, brittle supply-chain dependencies, sensitive data resurfacing inside generations, excessively privileged plugins and agents, misplaced trust producing compliance failures, and loss of proprietary model weights via API abuse. The skill pairs each category with tangible controls (policy, monitoring, toolchain limits) anchored to genai.owasp.org releases rather than anecdotes.
Use cases
- Security architecture reviews ahead of deploying an agent that can mutate production tickets
- Vendor diligence workshops comparing two LLM platforms’ logging, egress, or secret-handling posture
- Red-team drills focused on multimodal ingestion paths (documents, MCP tools, SaaS integrations)
- Legal or compliance teams translating AI risk registers into prioritized engineering milestones
- Post-incident remediation after unintended tool execution or hallucinated disclosures
Key features
- LLM01 Prompt Injection: catalog every untrusted corpus (documents, transcripts, MCP responses) flowing into prompts; insist on deterministic allow-lists plus secondary validation wherever model output could escalate privileges.
- LLM02 Insecure Output Handling: treat model JSON/markdown/logs as hostile until schema-validated; forbid unsanitized LLM summaries from driving shells, SQL interpreters, or admin APIs.
- LLM03 Training Data Poisoning: if fine-tuning or continual learning is planned, freeze dataset provenance, monitor drift, and require reproducible provenance attestations for every training corpus, in line with OWASP guidance.
- LLM04 Model Denial of Service: establish rate budgets, concurrency caps, watchdogs on repeated heavy tool calls, and autoscaling safeguards so adversaries cannot crater shared inference planes.
- LLM05 Supply Chain Vulnerabilities: pin checkpoints, attest container images/SDKs, enumerate third-party model hosts, and block shadow upgrades without CI gate approval.
- LLM06 Sensitive Information Disclosure: map secrets, PHI, regulated IDs, embeddings caches, and purge/retention windows; forbid echoing verbatim customer payloads back into logs or transcripts unless encrypted.
- LLM07 Insecure Plugin Design: ensure each plugin declares least-privilege scopes, validates arguments server-side, and cannot chain into arbitrary network pivots.
- LLM08 Excessive Agency: require human approvals, reversible actions, limits and telemetry on tool fan-out, and emergency kill switches whenever agents mutate external state.
- LLM09 Overreliance: pair model answers with deterministic verification (unit tests, policy engines, calculators), especially for regulated advice or safety-critical workloads.
- LLM10 Model Theft: tighten API metering, egress controls, cryptographic packaging of weights, and anomaly detection on bulk downloads/scrapers, and define incident-response paths in case weights leak.
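Several of the controls above (the deterministic allow-list from LLM01, schema validation from LLM02, server-side argument checks from LLM07, and the approval gate from LLM08) can be combined in a single dispatch function. The following is a minimal sketch, not a reference implementation: the tool names, the `ToolSpec` structure, the `dispatch` function, and the `approved_by` parameter are all hypothetical and do not come from OWASP material.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, Optional


class RejectedCall(Exception):
    """Raised when model output fails any validation step."""


@dataclass(frozen=True)
class ToolSpec:
    handler: Callable[..., str]
    arg_types: Dict[str, type]  # exact argument names and required types
    mutates_state: bool         # True means a human must approve the call


# Hypothetical ticket tools standing in for real integrations.
def read_ticket(ticket_id: int) -> str:
    return f"ticket {ticket_id}: (contents)"


def close_ticket(ticket_id: int, reason: str) -> str:
    return f"ticket {ticket_id} closed: {reason}"


# Deterministic allow-list: anything not named here can never run.
ALLOWED_TOOLS: Dict[str, ToolSpec] = {
    "read_ticket": ToolSpec(read_ticket, {"ticket_id": int}, mutates_state=False),
    "close_ticket": ToolSpec(
        close_ticket, {"ticket_id": int, "reason": str}, mutates_state=True
    ),
}


def dispatch(raw_model_output: str, approved_by: Optional[str] = None) -> str:
    """Validate a model-proposed tool call, then run it only if every check passes."""
    try:
        call = json.loads(raw_model_output)
    except json.JSONDecodeError as exc:
        raise RejectedCall("output is not valid JSON") from exc

    # Require exactly {"tool": <str>, "args": <object>} and nothing else.
    if not isinstance(call, dict) or set(call) != {"tool", "args"}:
        raise RejectedCall("unexpected top-level shape")
    tool, args = call["tool"], call["args"]
    if not isinstance(tool, str) or not isinstance(args, dict):
        raise RejectedCall("tool must be a string and args an object")

    spec = ALLOWED_TOOLS.get(tool)
    if spec is None:
        raise RejectedCall(f"tool {tool!r} is not on the allow-list")

    # Server-side argument validation: exact key set, exact types.
    if set(args) != set(spec.arg_types):
        raise RejectedCall("argument names do not match the tool schema")
    for name, expected in spec.arg_types.items():
        if type(args[name]) is not expected:  # type() check also rejects bool for int
            raise RejectedCall(f"argument {name!r} must be {expected.__name__}")

    # Approval gate for anything that changes external state.
    if spec.mutates_state and not approved_by:
        raise RejectedCall(f"tool {tool!r} requires human approval")

    return spec.handler(**args)


if __name__ == "__main__":
    print(dispatch('{"tool": "read_ticket", "args": {"ticket_id": 7}}'))
```

In this sketch the model never chooses what is executable; it can only propose a call that the application either accepts or rejects. A production system would likely replace the hand-rolled type checks with a JSON Schema validator and record who approved each state-changing call.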
When to Use This Skill
- Quarterly resilience reviews tied to SOC2/ISO control mapping for generative workloads
- Design reviews before enabling net-new MCP integrations or unmanaged plugin marketplaces
- Board-level reporting that needs a widely recognized, community-maintained categorization sourced from OWASP
Expected Output
A scored checklist with owners, detective controls, and preventive mitigations keyed to OWASP GenAI documentation, plus gap statements for auditors.
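One possible shape for that output is sketched below. The field names, the 0 to 3 scoring scale, the threshold, and the example owners and controls are assumptions made for illustration; the skill itself does not prescribe a schema.

```python
from dataclasses import dataclass, field
from typing import List

MAX_SCORE = 3  # assumed scale: 0 = no control, 3 = control implemented and tested


@dataclass
class ChecklistItem:
    risk_id: str   # OWASP identifier, e.g. "LLM01"
    title: str
    owner: str
    score: int     # assessor-assigned, 0..MAX_SCORE
    preventive: List[str] = field(default_factory=list)
    detective: List[str] = field(default_factory=list)

    def gaps(self) -> List[str]:
        """List which control types are missing for this risk."""
        missing = []
        if not self.preventive:
            missing.append("no preventive mitigation recorded")
        if not self.detective:
            missing.append("no detective control recorded")
        return missing


def gap_statements(items: List[ChecklistItem], threshold: int = 2) -> List[str]:
    """Return one line per item that scores below threshold or lacks a control type."""
    lines = []
    for item in items:
        missing = item.gaps()
        if item.score < threshold or missing:
            detail = (
                "; ".join(missing)
                if missing
                else "controls recorded but score below threshold"
            )
            lines.append(
                f"{item.risk_id} {item.title} "
                f"(owner: {item.owner}, score {item.score}/{MAX_SCORE}): {detail}"
            )
    return lines


# Illustrative entries only; owners and controls are placeholders.
checklist = [
    ChecklistItem("LLM01", "Prompt Injection", "platform-security", 3,
                  preventive=["tool allow-list"],
                  detective=["injection-pattern alerts"]),
    ChecklistItem("LLM04", "Model Denial of Service", "sre", 1,
                  preventive=["per-tenant rate budget"]),
]

if __name__ == "__main__":
    for line in gap_statements(checklist):
        print(line)
```

Keeping preventive and detective controls as separate lists lets the gap statement say which kind of control is absent, which tends to be the first question an auditor asks.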
Frequently Asked Questions
- Which document version anchors this checklist?
  The public landing page summarizes the v1.1 categories; reconcile quarterly with genai.owasp.org because titles may evolve alongside community releases.
- How does this differ from generic CWE reviews?
  The LLM-specific buckets highlight prompt surfaces, stochastic outputs, and agent autonomy, failure modes that CWE-style reviews and the classic web-application Top 10 seldom capture.
- Do we replace threat modeling?
  No. This complements STRIDE and data-flow exercises by giving LLM-heavy systems a lingua franca that auditors already recognize.
Related
Security review for AI-generated code
Reviews AI-generated code for security failure modes that AI assistants commonly miss: prompt injection risks, credential exposure, dependency vulnerabilities, insecure deserialization, and access control gaps. This skill catches what agents miss when they optimize for functionality over safety, especially in code that handles user input, authentication, or external data.
Designing with LLM structured outputs
This skill covers when and how to ask an LLM for machine-readable payloads: define a JSON Schema (or the vendor's equivalent), enable the structured-output feature your provider documents, validate responses in application code, and handle refusals or validation errors explicitly. It applies to tool-calling agents, extraction pipelines, configuration emitters, and any workflow where brittle text parsing creates production risk.
NIST AI Risk Management Framework (AI RMF 1.0) lifecycle checklist
Anchors facilitation workshops to NIST's voluntary Artificial Intelligence Risk Management Framework (AI RMF 1.0, formally NIST AI 100-1, DOI https://doi.org/10.6028/NIST.AI.100-1). The playbook issued alongside the Framework emphasizes structuring programs around the mutually reinforcing core functions GOVERN → MAP → MEASURE → MANAGE rather than improvising unrelated security tickets. NIST also publishes companion assets such as the Trustworthy and Responsible AI Resource Center playbook (airc.nist.gov), a roadmap, crosswalks, and, for generative workloads, the Generative Artificial Intelligence Profile (NIST AI 600-1, July 26, 2024, DOI https://doi.org/10.6028/NIST.AI.600-1), so teams can reconcile novel failure modes against documented categories of trustworthiness. This operational skill folds those authoritative layers into scripted prompts for cross-functional councils that must evidence documentation, escalation paths, quantitative trustworthiness analyses, prioritized mitigations, and alignment with externally referenced stakeholder expectations, not marketing slides.