Best LLM Scanners
Isometric overview of garak's generator, probe, and detector pipeline scanning an open-weight model
Tools

What Is Garak LLM Scanner? A Practitioner's Guide to NVIDIA's Open-Source LLM Vulnerability Tool

Garak is NVIDIA's open-source LLM vulnerability scanner that red-teams language models for jailbreaks, prompt injection, hallucination, data leakage, and

By Bestllmscanners Editorial · · 8 min read

If you have landed here wondering what is garak LLM scanner, the shortest answer is: it is an open-source framework that does for large language models what nmap does for networks — systematically probes a target for known weakness classes and reports what breaks. Garak is short for Generative AI Red-teaming and Assessment Kit. NVIDIA maintains it under Apache 2.0, and it has become a recurring default recommendation in LLM security evaluations because it ships ready-to-run probes rather than asking you to author attack prompts from scratch.

How Garak Works

Garak’s internal design follows a five-component pipeline. Understanding each piece clarifies what you actually get in the output.

Generators are adapters that talk to a target model. The framework ships generators for OpenAI API endpoints, Hugging Face Hub models, Replicate, AWS Bedrock, Cohere, Groq, LiteLLM, local GGUF files (llama.cpp v1046+), and any REST-accessible system. In practice this means you can point garak at a self-hosted model, a managed API, or a model served through an inference proxy without writing glue code.

Probes are the attack surface of the tool. Each probe implements one or more adversarial prompts aimed at a specific failure mode. The probe library covers:

  • DAN-style and derivative jailbreak patterns that attempt to strip alignment guardrails
  • Encoding-based prompt injection (base64, ROT13, leetspeak, and similar obfuscation layers)
  • Toxic content generation, including slurs, hate speech scaffolds, and harmful instruction completion
  • Hallucination and misinformation elicitation
  • Data leakage, including training data replay attempts
  • Malware generation via natural-language requests
  • Cross-site scripting payloads embedded in model output
  • Glitch token exploitation, where tokens that cause unpredictable model behavior are inserted into prompts
  • Package hallucination, relevant for models used in code generation pipelines

Harnesses structure how probes are executed against the generator. The default is a probewise harness that runs each probe class in sequence.

Detectors evaluate the model’s responses after probing. A detector for toxic output, for instance, checks whether the response exceeds a toxicity threshold; a detector for a jailbreak probe checks whether the model produced the constrained content it was asked to refuse. Detectors are separated from probes, so you can pair a probe with multiple detectors or swap in a custom classifier without modifying the probe.

Evaluators aggregate detector verdicts into structured reports. Garak writes results to JSONL — one record per probe attempt — which makes it straightforward to pipe into a SIEM, a notebook, or a CI gate. A hit log records confirmed vulnerability instances separately for triage.

The full architecture and design rationale are documented in the project’s research paper, published June 2024 by Derczynski, Galinkin, Martin, Majumdar, and Inie.

What Garak Covers and What It Does Not

Garak’s probe set maps reasonably well to the OWASP LLM Top 10 failure categories. Its prompt-injection probes address LLM01; toxic output and malware generation probes touch LLM02 (insecure output handling); data leakage probes cover LLM06; hallucination probes relate to LLM09. The mapping is not formal — garak does not annotate its findings with OWASP IDs — but if you use the OWASP LLM Top 10 as a coverage checklist, most of the high-severity items have at least one corresponding probe.

What garak does not do: it is not a runtime firewall. Running a garak scan tells you how a model behaves under adversarial prompting during the scan window; it does not intercept or block live inference traffic. For runtime content filtering and guardrail enforcement, you need a separate layer — our LLM Guard input/output scanning walkthrough covers the open-source toolkit that screens live prompts and responses, and the defensive AI tooling overview at guardml.io compares production-grade options.

Garak is also a black-box scanner by default. It generates prompts and inspects outputs; it does not inspect model weights, training data, or deployment configuration. White-box analysis of model internals requires different tooling.

Installation and Basic Usage

python -m pip install -U garak

That is the full install command. Once installed, a minimal scan targeting a Hugging Face model looks like:

python -m garak --model_type huggingface --model_name <org/model>

For OpenAI-compatible endpoints:

python -m garak --model_type openai --model_name gpt-4o

By default, garak selects a standard probe set and runs it against the target. You can narrow the scope with --probes to run specific probe classes, which is useful when you want to focus on a single OWASP category or when runtime costs are a constraint. The v0.15.1 release includes 31 versioned releases of accumulated probe and detector improvements.

Who Uses Garak and Why

The project describes its mission as lifting “the LLM security poverty line” — the observation that thorough LLM security evaluation has been accessible only to teams with dedicated red-team capacity and custom tooling. By packaging a broad probe library with a single pip install, garak makes systematic LLM evaluation available to any AppSec or MLOps engineer who can run a Python script.

Teams integrating LLMs into production products use garak as a pre-release gate: run a scan against the candidate model, review the hit log, decide whether failure rates on specific probe categories are acceptable for the deployment context. It fits naturally into CI/CD pipelines where JSONL output can be parsed and a build can be failed on high-severity hits. For the deeper technical treatment of garak’s probe categories, generators, and CI threshold parsing, see Garak LLM Vulnerability Scanner: How It Works and When to Use It, and for how garak stacks against PyRIT and Promptfoo as a CI gate, see Automated LLM Red-Teaming in CI.

Security researchers use it differently — as a baseline to measure how effective a proposed guardrail is, by running garak before and after deploying a content-safety classifier and comparing hit rates. The structured JSONL output makes before/after comparisons tractable.

Red teams use garak’s probe library as a starting point rather than an endpoint. The tool’s modular design lets you add custom probes for organization-specific risk scenarios — for example, probes that test for leakage of internal system-prompt content specific to your deployment.

For teams that have already done prompt-injection threat modeling, the detailed coverage notes at aisec.blog are a useful companion to garak’s injection probe results, providing attack context that the scanner’s hit log does not include.

Residual Risk

No scanner eliminates the class of vulnerability it tests. Garak’s probes represent known attack patterns as of the probe library’s last update; novel jailbreak techniques or encoding tricks not yet in the library will not generate hits. Coverage gaps are expected, particularly in adversarial scenarios where an attacker tailors prompts to a specific model’s behavior rather than using generic payload templates. One structural gap worth naming: garak runs single-turn probes, so the patient multi-turn adversary is better simulated by a programmable framework like PyRIT — see our PyRIT explainer. Treat a clean garak run as evidence that a model is not trivially breakable by known attacks, not as evidence that it is secure against a motivated adversary. For where garak sits among the full set of open-source and enterprise tools, see Best LLM Security Scanners: Open-Source and Enterprise Compared.

The scan also reflects a point in time. Models update, fine-tunes shift behavior, and system prompts change in production. Periodic rescanning as part of a continuous evaluation program closes more of that residual risk than a one-time pre-deploy gate.


Sources

Sources

  1. NVIDIA/garak — GitHub
  2. garak: A Framework for Security Probing Large Language Models (arXiv 2406.11036)
  3. garak — official project site
  4. Garak: Open-source LLM vulnerability scanner — Help Net Security
Subscribe

Best LLM Scanners — in your inbox

Comparing LLM security scanners and detection tools. — delivered when there's something worth your inbox.

No spam. Unsubscribe anytime.

Related

Comments