Pencheff is built around the principle that evidence-backed, adversarial testing should be as rigorous as a formal audit — readable by engineers, executives, and compliance teams on the same page.
Integrations and operations
Findings, reports, dashboards, exports, integrations, and retests all read from the same normalized record.
Pencheff favors repeatable checks, then uses AI for triage, enrichment, orchestration, and remediation where it adds signal.
From the Pencheff docs
Release notes
v0.7.0 — IP-clean expansion (2026-05-08)
Closes the four IP-risk surfaces that existed in v0.6 (CodeQL CLI on
customer code, Semgrep --config=auto, Llama Guard licence
acknowledgement, no DCO / license-audit CI) and ships the
twelve-category gap matrix from the strategic plan: vuln-DB
aggregator with AI enrichment, partner-pentest integrations, OSS
probe + DAST rule libraries, runtime LLM guardrail, runtime API
discovery, GitHub Check Run + SARIF, container admission webhook,
and supporting docs/UI for everything.
Phase 0 — IP-risk fixes
- CodeQL ripped and replaced — Semgrep OSS (pinned packs only) + Bandit + gosec + Brakeman + ESLint-security as the new SAST pack.
- Semgrep config tightened to an explicit OSS Registry pack list; override via PENCHEFF_SEMGREP_PACKS.
- Llama Guard 3 hardened: opt-in only via PENCHEFF_LLAMA_GUARD_ENABLED=1, license notice surfaced in every JudgeResult.reason, default judge falls through to Granite Guardian (Apache-2.0).
- DCO bot enforced on every commit (.github/workflows/dco.yml).
- License-audit CI + auto-generated THIRD_PARTY_NOTICES.md (tools/license_audit.py).
- SPDX header check for new/changed files (tools/spdx_check.py --changed-only).
- NOTICE and CONTRIBUTING.md published.
Phase 1 — Foundation
- Refactored CVE feed to a pluggable BulkFeedSource protocol; new RustSec (CC0) and GoVulnDB (BSD-3) feeds via the OsvBulkSource skeleton (more ecosystems trivial to add).
- GET /advisories/{id} and GET /advisories?package=&ecosystem= with AI-enriched exploit walkthrough + fix recipe (Pencheff's answer to Snyk's curated DB; provenance JSONL on every run).
- Partner pentest integrations — HackerOne / Bugcrowd / Cobalt — with HMAC webhook signing primitive shared with the generic webhook integration.
- Per-release SBOM published to GitHub Releases on every v*.*.* tag, signed with cosign keyless via Sigstore.
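The HMAC webhook signing primitive mentioned above can be sketched roughly as follows. This is a minimal illustration built on Python's standard hmac module; the function names, the choice of HMAC-SHA256 over the raw request body, and the hex encoding are assumptions rather than Pencheff's confirmed wire format.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Return a hex HMAC-SHA256 signature of the raw body (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, signature: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected, signature)
```

A receiver would typically run verify_signature on the exact bytes it received, before parsing the JSON, so that re-serialisation cannot change the signed content.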
Phase 2 — Probe & rule libraries
- pencheff-probes community LLM red-team corpus with permissive-only JSONL schema + DoNotAnswer importer (tools/import_donotanswer_probes.py); HarmBench / AgentHarm / BeaverTails explicitly excluded for license reasons.
- pencheff-rules community DAST rule library — Pencheff Pulse JSON format with the Nuclei→Pulse converter (tools/nuclei2pulse.py) plus AI rule synthesiser with strict validator (rejects destructive payloads, disallowed methods, non-permissive PoCs).
- SAST tree-sitter pack with Solidity sub-pack (4 hand-curated rules); Lua / Scala / Dart / Kotlin / Swift / COBOL / Erlang scaffolded.
Phase 3 — Runtime + integration surfaces
- Pencheff Sentry — runtime LLM guardrail. HTTP proxy sidecar + LiteLLM plugin + MCP middleware. Blocks prompt injection / PII / unsafe HTML / token-ceiling violations inline. Separate package pencheff-sentry on PyPI. (Docs)
- API discovery from runtime traffic — synthesises OpenAPI 3.1 from captured ProxyFlow rows; drift detector emits api_drift findings (shadow / phantom / method-drift). (Docs)
- GitHub Check Run + SARIF + Pencheff Suggest — Check Run with inline annotations on every PR scan, SARIF upload to Security → Code scanning, PR-comment suppression command parser. (Docs)
Phase 4 — Container, support, certs
- Container registry push webhooks for DockerHub / ECR / GCR / ACR (Pub/Sub envelope auto-decoded, Event Grid validation handshake handled). Each push enqueues a Trivy scan.
- Kubernetes ValidatingAdmissionWebhook (Go) — refuses pods whose images carry unfixed critical CVEs. Helm chart published to oci://ghcr.io/balasriharsha-ch/charts/pencheff-admission. Fail-closed by default. (Docs)
- "Verify with humans" finding-card flow — submit any finding to HackerOne / Bugcrowd / Cobalt; partner callback flips verification_status based on the triager's verdict. (Docs)
- Procedural items (trademark searches, GitHub Secret-Scanning Partner program application, SOC 2 + ISO 27001:2022, support-tier hires) tracked in docs/procedural-checklist.md.
Migration — what to do when upgrading
- Repo-scan stats keys shift: stats.codeql → stats.semgrep, stats.bandit, stats.gosec, stats.brakeman, stats.eslint. Old stats.codeql rows from pre-v0.7 scans stay in the DB; the UI filters them as legacy SAST.
- If you opted in to Llama Guard before v0.7, set PENCHEFF_LLAMA_GUARD_ENABLED=1 to keep using it — the default is now Granite Guardian.
- The toolchain Docker image picks up Bandit / gosec / Brakeman / ESLint-security on next rebuild. CodeQL artefacts are dropped.
- Run tools/license_audit.py --write-notices before your first PR — the auto-generated THIRD_PARTY_NOTICES.md is now the source of truth.
- New env vars: PENCHEFF_SEMGREP_PACKS (override SAST pack list), PENCHEFF_LLAMA_GUARD_ENABLED (opt-in Llama Guard judge).
v0.8.6 — Threat model on every scan, automatically (2026-05-08)
The v0.8.5 work made threat modeling a reusable engagement asset, but operators still had to manually generate a model before they got the adaptive scan benefit. This release closes the loop: every scan now gets a threat model, with two paths chosen by profile.
Auto-engagement on the deep profile
Every --profile deep scan against a URL with no engagement_id:
- Finds or creates an engagement keyed by deep-{target_id[:8]} — one canonical engagement per target, deterministic slug.
- Generates and persists a DREAD threat model on that engagement on first run.
- Pins the scan to that engagement and uses the model for module priority biasing.
Subsequent deep scans of the same target reuse the same engagement and the same threat model — findings accumulate, threat-model edits stick across runs.
Fly-by threat model on every other scan
quick, standard, api-only, compliance, cicd: when no engagement
is supplied, the dispatcher synthesises a DREAD model from the target
URL on the fly (~1 ms — pure-Python matrix lookup), uses it for the
module priority bias, and does not persist it. The bias is stamped
into Scan.summary.threat_model_bias for the dashboard, but no
engagement is touched.
Source label on every scan
Scan.summary.threat_model_source records which path generated the
bias for forensic clarity:
- "engagement" — operator-supplied engagement carried a model.
- "auto_engagement" — deep scan auto-created or reused the engagement.
- "fly_by" — non-deep scan, no persistence.
5 new tests (apps/api/tests/test_auto_threat_model.py) cover the
helper that finds-or-creates the deep-scan engagement, slug-collision
safety, closed-engagement skipping, and missing-target-metadata
fallbacks.
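The slug rule and the three source labels described in this release can be illustrated with a short sketch. The function names are hypothetical, and the selection logic assumes that any operator-supplied engagement already carries a model; the real dispatcher may handle more cases (for example closed engagements or missing target metadata).

```python
from typing import Optional


def deep_engagement_slug(target_id: str) -> str:
    """Deterministic slug: 'deep-' plus the first 8 characters of the target id."""
    return f"deep-{target_id[:8]}"


def threat_model_source(profile: str, engagement_id: Optional[str]) -> str:
    """Pick the label stamped into Scan.summary.threat_model_source.

    Assumption: a supplied engagement_id always means the engagement has a model.
    """
    if engagement_id is not None:
        return "engagement"
    if profile == "deep":
        return "auto_engagement"
    return "fly_by"
```

Because the slug depends only on the target id, repeated deep scans of one target resolve to the same engagement.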
v0.8.5 — Threat modeling, ThreatModelAgent, markdown viewer (2026-05-08)
Threat modeling — engagement-scoped STRIDE / DREAD with adaptive scan profile
- New: POST /engagements/{id}/threat-model generates a deterministic STRIDE or DREAD model from a target URL or explicit asset list. GET / PUT / DELETE complete the CRUD.
- New: Engagement.threat_model JSONB column (migration 0040) and Engagement.threat_model_updated_at for staleness signals.
- Adaptive scan profile — when a scan is started against an engagement that has a threat model, the dispatcher reorders the profile's modules so highest-DREAD categories run first. The chosen bias is stamped into Scan.summary.threat_model_bias so the dashboard can show why a particular module fired first.
- ThreatModelAgent added to the swarm's Phase 2 — runs in parallel with the breaker agents as a "lens" (no exclusive scan tools, only the shared get_findings / test_endpoint). Emits an INFO-severity finding summarising threat coverage per asset.
- Web UI at /engagements/[id]/threat-model — table view (STRIDE rows or DREAD scored threats), markdown view, raw-JSON view; one-click Generate / Regenerate / Clear; surfaces the module priority bias.
- Report inclusion — markdown report renders a ## Threat model section between executive summary and findings when the underlying scan was scoped to an engagement with a model.
- 18 service tests — STRIDE/DREAD output shape, asset inference, scoring thresholds, module-bias deterministic ordering, markdown rendering, matrix completeness check.
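The adaptive reordering described above (highest-DREAD categories first, deterministic ordering) might look something like the sketch below. The module names, category names, and the two lookup tables are hypothetical; only the idea of a stable sort by descending score is taken from the notes.

```python
from typing import Dict, List


def reorder_modules(
    modules: List[str],
    module_category: Dict[str, str],
    category_score: Dict[str, float],
) -> List[str]:
    """Order modules by descending DREAD score of their category.

    sorted() is stable, so modules with equal scores keep their profile
    order, which keeps the result deterministic. Modules without a known
    category score 0 and fall to the end.
    """
    def score(module: str) -> float:
        return category_score.get(module_category.get(module, ""), 0.0)

    return sorted(modules, key=lambda m: -score(m))
```

For instance, with scores {"spoofing": 9.0, "tampering": 8.0}, an auth module mapped to spoofing would run before injection modules mapped to tampering.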
Markdown viewer in the dashboard
Finding descriptions, executive summaries, and threat-model output now render as proper Markdown:
- GitHub-flavoured tables, strikethrough, task lists (via remark-gfm).
- Fenced code blocks with syntax highlighting (via rehype-highlight).
- Blocks fenced with ```mermaid render as SVG diagrams (via mermaid v11, dynamic-imported on the client so SSR is unaffected).
- <Markdown> is a reusable component (apps/web/components/markdown.tsx) used on the scan-detail and finding-detail pages.
Fixes the bug where the Assessments view rendered ## Proof of impact,
pipe-delimited tables, and bullet lists as plain text.
Pre-existing test fix as a side-effect
ActiveDirectoryAgent and MobileAppAgent from v0.8.0 were missing
entries in BREAKER_TOOL_ALLOCATIONS, which made
test_admin_access_agent.py fail with KeyError: 'ActiveDirectoryAgent'.
Empty allocations added; the swarm orchestrator + session-cleanup
tests are updated for the new total of 13 breakers.
v0.8.4 — Live CVE / NVD / EPSS / KEV data on every SCA scan (2026-05-08)
The SCA module already queried OSV.dev live per dependency, but EPSS and
KEV feeds were only refreshed when an operator manually called
refresh_cve_feed, and per-package OSV results were cached forever once
seen. Now every scan pulls live:
- NVD 2.0 enrichment per CVE — CWE list, CPE URIs, NVD-issued CVSS v3.1 score & vector, canonical advisory URL. Cached 14 days (PENCHEFF_NVD_TTL_DAYS). Set NVD_API_KEY to raise the rate limit from 5/30 s to 50/30 s.
- OSV per-package cache now has a 24 h TTL (PENCHEFF_OSV_TTL_HOURS, set to 0 for always-live).
- EPSS + CISA KEV are auto-refreshed at the start of every SCA scan when the local cache is older than PENCHEFF_FEED_TTL_HOURS (default 24 h, set to 0 for always-live).
- Fail-open semantics — a network failure during refresh returns the stale-but-known row rather than dropping all SCA findings. Live-data intent fails open, not closed.
- Structured finding fields — epss, epss_percentile, kev, kev_short_desc, kev_due_date, cwe_ids, advisory_url, nvd_cvss_score, nvd_cvss_vector, fix_version, package, ecosystem are now on Finding.metadata (no longer buried in description text). The canonical NVD URL is promoted to position 0 of references so DOCX / PR comment / finding card renderers link to NVD before OSV.
36 unit tests cover the NVD parser, TTL caching, fail-open paths, and the SCA scan-time refresh contract.
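The TTL and fail-open behaviour described in this release can be sketched as a small cache wrapper. The class name, the injected clock, and the choice of OSError as the "network failure" signal are assumptions made for illustration; the notes only state that a TTL of 0 means always-live and that a failed refresh returns the stale row.

```python
import time
from typing import Callable, Dict, Tuple


class FailOpenTTLCache:
    """Cache that refreshes after ttl_seconds and serves stale rows on failure."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._rows: Dict[str, Tuple[float, object]] = {}

    def get(self, key: str, fetch: Callable[[str], object]) -> object:
        now = self._clock()
        cached = self._rows.get(key)
        # ttl == 0 disables the fresh-hit path, i.e. always try live data.
        if cached is not None and self._ttl > 0 and now - cached[0] < self._ttl:
            return cached[1]
        try:
            value = fetch(key)
        except OSError:
            if cached is not None:
                return cached[1]  # fail open: stale-but-known row
            raise  # nothing cached, so there is nothing to fall back to
        self._rows[key] = (now, value)
        return value
```

The trade-off is that a scan during an outage may report slightly old EPSS or KEV data, which the notes treat as preferable to dropping SCA findings altogether.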
v0.8.3 — pencheff CLI is the canonical entry point (2026-05-08)
After pip install pencheff the package installer now puts a
pencheff executable on the user's PATH — the same shape as aws
or kubectl. The [project.scripts] entry was already present; this
release makes it the documented form everywhere.
- Added pencheff --version / -V for parity with aws --version. Reads the installed package metadata via importlib.metadata.
- Replaced every python -m pencheff … reference across the GitHub Action, GitLab CI template, Azure DevOps pipeline, Jenkins doc, root + plugin READMEs, and 17 doc pages with the bare pencheff form.
- The legacy python -m pencheff … invocation continues to work unchanged — the package keeps a valid __main__ module.
- Installation docs now show which pencheff + pencheff --version as the post-install verification.
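Reading the version from installed package metadata, as the --version flag is described as doing, can be sketched with argparse and importlib.metadata. The helper names and the "unknown" fallback are assumptions; the actual CLI may be wired differently.

```python
import argparse
from importlib import metadata


def installed_version(dist_name: str) -> str:
    """Look up a distribution's version, falling back when it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return "unknown"


def build_parser(dist_name: str = "pencheff") -> argparse.ArgumentParser:
    """Build a parser whose -V / --version prints '<prog> <version>' and exits."""
    parser = argparse.ArgumentParser(prog="pencheff")
    parser.add_argument(
        "-V",
        "--version",
        action="version",
        version=f"%(prog)s {installed_version(dist_name)}",
    )
    return parser
```

Reading the metadata at runtime avoids keeping a second hard-coded version string in the source tree.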
v0.8.2 — API key scope coverage to every public router (2026-05-08)
The default-deny scope layer introduced in v0.8.1 is now wired into
every public-facing FastAPI router — repos, sboms,
dependencies, repeater, intruder, proxy, traffic,
engagements, schedules, notes, comments, fix-proposals,
dashboard, and unified-findings join the v0.8.1 set
(scans, findings, targets, reports, assets, integrations).
The advertised scope catalog (37 scopes, 20 categories) now matches exactly what the dependency layer enforces — no silent 403s on a route that didn't opt in.
- last_used_at writes are debounced to one update per 60 s per key — a busy CI key polling every few seconds no longer issues a write per request.
- Auth-flow integration tests added (21 cases) covering revoked, expired, cross-org, detached-membership, and mismatched-workspace paths, plus require_scope and session_only invariants.
- /repos/install-url is correctly marked session-only (interactive GitHub App handshake); the /repos/callback redirect was already unauthenticated.
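The 60-second debounce on last_used_at writes could be approximated as below. This is an in-memory sketch with hypothetical names; a real deployment with several API workers would more likely compare against the stored timestamp, which the notes do not describe.

```python
import time
from typing import Callable, Dict


class LastUsedDebouncer:
    """Allow at most one last_used_at write per key per interval."""

    def __init__(self, interval_s: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self._interval = interval_s
        self._clock = clock
        self._last_write: Dict[str, float] = {}

    def should_write(self, key_id: str) -> bool:
        """Return True (and record the write) only if the interval has elapsed."""
        now = self._clock()
        last = self._last_write.get(key_id)
        if last is not None and now - last < self._interval:
            return False
        self._last_write[key_id] = now
        return True
```

A request handler would call should_write(key_id) and skip the database update whenever it returns False.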
v0.8.1 — Programmatic access: PENCHEFF_API_KEY with scoped permissions (2026-05-07)
PENCHEFF_API_KEY — per-user API keys with fine-grained permissions
Every user can now mint API keys for scripts, CI pipelines, and scheduled jobs. Manage them at Settings → API keys in the dashboard.
- Format — pcf_live_<43-char-secret>. Stored as SHA-256; the plaintext is shown exactly once at creation.
- Org-pinned — every key names exactly one organisation.
- Workspace-pinned — keys may be scoped to a specific workspace (any member can mint these), or left org-wide (workspace_id: null, owners and admins only).
- Fine-grained scopes — category:action strings. Wildcards: scans:*, *:read, *:*.
- Default-deny — endpoints opt in to scope checks; routers without a require_scope declaration reject API-keyed callers regardless of scopes held.
- Session-only endpoints — billing, branding, org admin / member management, and the API-key router itself never accept a key. A leaked key cannot mint more keys, change billing, or modify membership.
- Membership re-check on every request — if the issuing user is removed from the org, all of their keys for that org stop working immediately (no cache).
- Audit logged — api_key.create, api_key.update, api_key.revoke are written to audit_logs with the key ID and prefix.
See the API keys reference for the full scope catalog, recipes (CI/CD, SIEM forwarders, fan-out automation), and security notes.
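As a rough illustration of the key format and the wildcard scopes described above, the sketch below mints a key, hashes it, and checks a required scope against granted ones. All function names are hypothetical, and using secrets.token_urlsafe(32) to obtain a 43-character secret is an assumption about how such a key could be generated, not a statement about Pencheff's implementation.

```python
import hashlib
import secrets
from typing import Iterable, Tuple


def hash_key(plaintext: str) -> str:
    """SHA-256 hex digest, the only form that would be stored."""
    return hashlib.sha256(plaintext.encode("utf-8")).hexdigest()


def mint_key() -> Tuple[str, str]:
    """Return (plaintext, digest); token_urlsafe(32) yields 43 URL-safe chars."""
    plaintext = f"pcf_live_{secrets.token_urlsafe(32)}"
    return plaintext, hash_key(plaintext)


def scope_allows(granted: str, required: str) -> bool:
    """Match one category:action scope, treating '*' as a wildcard on either part."""
    g_cat, g_act = granted.split(":", 1)
    r_cat, r_act = required.split(":", 1)
    return g_cat in ("*", r_cat) and g_act in ("*", r_act)


def has_scope(granted_scopes: Iterable[str], required: str) -> bool:
    """True if any granted scope covers the required one."""
    return any(scope_allows(g, required) for g in granted_scopes)
```

Under this reading, a CI key holding scans:* and *:read could start scans and read findings but could not write findings.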
v0.8.0 — AD/mobile/ASM MCP tools, production hardening, GitLab CI & Azure DevOps (2026-05-07)
New MCP tools (3)
- scan_active_directory(session_id, domain, username, password, dc_ip?, modules?) — Orchestrated Active Directory enumeration: BloodHound relationship graph, Certipy ESC1–ESC15 certificate template abuse, CrackMapExec/NetExec SMB enumeration, Impacket secretsdump/Kerberoast/AS-REP roast. Selectable via the modules list — run one or all four. See Active Directory docs.
- scan_mobile_app(session_id, apk_path, platform?, modules?, mobsf_url?) — Static analysis of Android APKs and iOS IPAs: MobSF REST API enrichment, apktool decompile, AndroidManifest.xml security checks (debuggable, allowBackup, cleartext, exported components, minSdkVersion), and jadx-based secrets sweep (15+ patterns including AWS, GCP, Firebase, Stripe, GitHub, JWTs, PEM keys). See Mobile Security docs.
- scan_asm(session_id, org, root_domain, modules?) — Continuous Attack Surface Monitoring: passive subdomain discovery (subfinder + crt.sh), certificate transparency log watch (new issuances in last 7 days), and asset inventory change detection (diffs vs. last snapshot). Results persisted to ~/.pencheff/asm_inventory.db.
Agent swarm: 10 → 12 Phase 2 breakers
- ActiveDirectoryAgent — fires scan_active_directory when AD credentials are present; analyses BloodHound attack paths, Certipy ESC chains, and SMB share exposure; emits structured findings with step-by-step PoC commands.
- MobileAppAgent — fires scan_mobile_app against any APK/IPA supplied at session creation; triages MobSF findings by severity; flags hardcoded secrets with smali/Java class path and line number.
Production API hardening
- The FastAPI app now refuses to start in ENVIRONMENT=production mode if JWT_SECRET is still the insecure default or FERNET_KEY is empty. This prevents silent misconfiguration in operator deployments.
- Unhandled exception handler now returns "Internal server error." in production instead of the full ExceptionType: message string, preventing internal stack details from leaking to clients.
CI/CD integrations
- GitLab CI — reusable .gitlab-ci.yml template in apps/gitlab-ci/. Include it in any GitLab project; configure via PENCHEFF_* CI/CD variables. Runs on MR events and default-branch pushes; report artifact retained 30 days. See GitLab CI docs.
- Azure DevOps — parameterized azure-pipelines.yml task in apps/azure-devops/. Use via extends: or copy the steps: section inline. Publishes the report as a build artifact. See Azure DevOps docs.
ASM dashboard tab
- New /asm route in the web dashboard (apps/web/app/asm/page.tsx) — shows total asset count, new subdomains in last 24 h, expiring certs, and an asset table with type badges. "Run Discovery" button ready for backend wiring.
PyPI
- Published as pencheff==0.5.0 — pip install --upgrade pencheff.
- MCP tool count: 49 → 52.
v0.7.0 — AI agent swarm, consent screen, LLM trace persistence, evidence screenshots (2026-05-06)
Pencheff's single-agent loop is replaced as the default execution path by a 17-agent parallel swarm. Every scan now requires explicit operator consent, and every LLM call made by every agent is persisted for audit and reproduction.
AI agent swarm
- New default scan mode: one ReconAgent → 10 parallel breaker agents → 6 parallel synthesis agents, all coordinated by the swarm orchestrator in apps/api/pencheff_api/services/agent_runner.py.
- The 10 Phase 2 breakers fan out concurrently from a frozen ReconSnapshot: InjectionAgent, ClientSideAgent, AuthAgent, AuthzAgent, APIAgent, InfraAgent, CloudAgent, LLMRedTeamAgent, SupplyChainAgent, K8sAgent.
- The 6 Phase 3 synthesis agents read the merged findings in parallel: ChainAgent, ComplianceAgent, ProofOfImpactAgent, PayloadCraftingAgent, EvidenceCaptureAgent, AdminAccessAgent.
- Typical deep-scan numbers: ~33 min wallclock, ~411 K input / ~86 K output tokens, ~109 LLM calls.
- See AI agent swarm for full operator documentation.
Consent screen at scan creation
- Every POST /scans now requires a consent_payload field: an authorization statement (≥ 50 chars) and an acknowledged checkbox. The API returns 422 if either is absent.
- Consent is stored on Scan.consent_payload (JSONB) and included in audit exports.
- The scan-creation UI in the web dashboard presents the disclosed-actions catalogue per agent class before accept.
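The consent rule above (a statement of at least 50 characters plus an acknowledged checkbox) can be sketched as a plain validation function. The field names authorization_statement and acknowledged are hypothetical, since the notes do not give the exact keys of consent_payload; an API layer would turn a non-empty error list into the 422 response.

```python
from typing import Any, List

MIN_STATEMENT_CHARS = 50  # threshold stated in the release notes


def consent_errors(consent_payload: Any) -> List[str]:
    """Return validation errors; an empty list means the consent is acceptable."""
    if not isinstance(consent_payload, dict):
        return ["consent_payload is required"]
    errors: List[str] = []
    statement = consent_payload.get("authorization_statement")  # hypothetical key
    if not isinstance(statement, str) or len(statement.strip()) < MIN_STATEMENT_CHARS:
        errors.append("authorization statement must be at least 50 characters")
    if consent_payload.get("acknowledged") is not True:  # hypothetical key
        errors.append("consent must be acknowledged")
    return errors
```

Existing CI scripts that call POST /scans would need to send an equivalent block alongside their current fields.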
LLM trace persistence
- Every LLM call made by every swarm agent is written to the new scan_llm_traces table (agent name, turn, request messages, response, token counts, optional reasoning block).
- New endpoint GET /scans/{id}/llm-traces returns the full trace array for a completed scan. Useful for cost auditing, reproduction, and debugging.
- Compact summary lines appear in the assessment log per call.
Evidence screenshots
- EvidenceCaptureAgent (Phase 3) takes a Playwright screenshot per verified high/critical finding with PII redacted.
- Stored at ~/.pencheff/evidence/<scan_id>/<finding_id>.png inside the worker container; served via GET /scans/{id}/evidence/{finding_id}.png (auth required, 404 if missing).
New pencheff MCP tools
- capture_evidence — Playwright screenshot of a vulnerable URL with PII redaction.
- scan_llm_red_team — probe an AI/LLM endpoint for prompt injection, jailbreak, and system-prompt extraction using the OWASP LLM Top-10 payload library.
- playwright_navigate — GET-only page navigation inheriting session auth cookies.
- playwright_screenshot — screenshot the current page state.
- playwright_enumerate_links — read-only enumeration of visible links on the active page.
- playwright_logout — log out and close the browser context.
- set_auth_state (orchestrator-internal), attach_oast (orchestrator-internal), import_endpoints (orchestrator-internal), copy_finding (orchestrator-internal), pentest_destroy (orchestrator-internal) — used by the swarm orchestrator to manage breaker sessions; not callable by agents.
Killswitch
- Set SWARM_ENABLED=false on the API container to revert all new scans to the legacy single-agent path immediately. In-flight scans are unaffected.
What didn't change
- No breaking changes to the scan creation API request shape beyond the new required consent_payload field. Existing integrations (CI scripts, SDK callers) need to add this field; all other fields and defaults are unchanged.
- The GET /scans, GET /scans/{id}, GET /scans/{id}/findings, GET /scans/{id}/progress, and DELETE /scans/{id} endpoints are unchanged.
- Deterministic scan profiles (deterministic_only) are unaffected — the swarm only replaces the LLM-driven phase.
v0.6.0 — Auto-fix PRs, IDE extensions, Triage 2.0, unified findings (2026-05-02)
Closes the Snyk-parity gap on the defensive surface while keeping Pencheff's offensive lead.
Auto-fix PRs for SCA
- New deterministic version-bump patcher across 9 manifest formats: requirements.txt, pyproject.toml, Pipfile, package.json, go.mod, Cargo.toml, Gemfile, composer.json, pom.xml. SCA findings flow through the existing propose_fix → apply → PR pipeline with no LLM cost. Lockfiles deliberately not edited — the PR body instructs the developer to run the right installer.
- See Auto-fix PRs.
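To give a feel for what a deterministic version bump involves, the sketch below rewrites an exact == pin in requirements.txt text. It is a simplified illustration with a hypothetical function name; it handles only exact pins for one package and ignores extras, ranges, and the other eight manifest formats, which the real patcher would need to cover.

```python
import re


def bump_requirement(text: str, package: str, fixed_version: str) -> str:
    """Replace the pinned version of `package` (name==version) with fixed_version.

    Only the version token changes; trailing comments, environment markers,
    and all other lines are left untouched.
    """
    pattern = re.compile(
        rf"^(\s*{re.escape(package)}\s*==\s*)([^\s;#]+)",
        re.IGNORECASE | re.MULTILINE,
    )
    return pattern.sub(lambda m: m.group(1) + fixed_version, text)
```

Because the rewrite is a pure text substitution, the same input always produces the same patch, which is what lets such a fix run without any LLM call.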
IDE extensions (VSCode + JetBrains)
- New pencheff lsp CLI command starts a hand-rolled Language Server over stdio. Tails ~/.pencheff/history/*.json and republishes diagnostics whenever scan results change.
- VSCode extension at apps/vscode/; JetBrains plugin at apps/jetbrains/ (Kotlin + LSP4IJ). Any LSP-aware editor (Neovim, Emacs, …) works via pencheff lsp directly.
- See IDE extensions.
EPSS + KEV + SSVC + reachability prioritisation
- Every finding gets risk_score (0–100), ssvc_decision (act / attend / track_star / track), and reachability (exploited / reachable / present / unknown) computed at insert from CVSS × EPSS × KEV × SSVC × reachability.
- Dashboard sorts by risk_score DESC NULLS LAST. The Priority Strip surfaces the components inline on every finding card.
- See EPSS, KEV & SSVC and Reachability classifier.
Triage 2.0
- Pro-tier POST /findings/{id}/triage returns a structured walkthrough — walkthrough / blast_radius / exploit_scenario / fix_outline / confidence — anchored on the live evidence on the finding (DAST request/response, taint trace, EPSS/KEV/SSVC).
- Cached on finding.ai_triage. Reuses the FIX_LLM_API_KEY already configured for the auto-fix proposer.
- See Triage 2.0.
Unified findings stream
- New GET /unified-findings merges DAST / SAST / SCA / IaC / secrets into a single sortable, filterable queue. Replaces the scan-by-scan navigation for the "what should I fix first" use case.
- New dashboard page at /findings. Filter chips for source, severity, reachability; pagination with stable order across pages.
- See Unified findings stream and the API reference.
Repository SBOMs
- New POST /repos/{repo_id}/sbom generates an SBOM for the latest commit on the repository's default branch and stores it on the repository.
- New GET /repos/{repo_id}/sbom returns the latest stored SBOM.
- Repository pages display the SBOM in both a Table view and a raw JSON view, with one-click JSON download.
- A new generation replaces any previous SBOM for that repository.
- See SBOM generation and the Repos API.
Migrations
- 0026_ssvc_decision — findings.ssvc_decision + index.
- 0027_reachability — findings.reachability + composite index.
- 0028_ai_triage — findings.ai_triage JSONB.
- 0029_drop_unused_tables — drops legacy tables (no-op for fresh deploys; safety net for partial-migration recovery).
Run alembic upgrade head (or rebuild the API container — it runs
the migration step automatically).
v0.5.0 — LLM red team: OWASP LLM Top 10 + Crescendo + PAIR + judges + cloud auth (2026-04-29)
A major release. Pencheff gains a third target kind — llm — that
turns a chat-completions endpoint into a fully-instrumented red-team
target with full OWASP LLM Top 10 (2025) coverage, multi-turn
escalation, iterative attacker-driven search, optional judge models
(Llama Guard / Granite Guardian / OpenAI Moderation / executable),
embedding-similarity grading, KB-grounded factuality checks, and
mappings to MITRE ATLAS / NIST AI RMF / EU AI Act alongside OWASP.
New target kind: llm
- POST /targets accepts kind: "llm" with an llm_config block. Provider presets: openai-chat, custom (request body template + response JSONPath), executable (local command, JSON over stdin/stdout), websocket, bedrock (SigV4 via boto3), vertex (Google ADC token caching), azure-openai (Entra OAuth), browser (Playwright drives a chat UI). Auth headers ride under credentials.headers — any number of arbitrary K-V pairs, Fernet-encrypted.
- The web UI's /targets/new and /targets/{id}/edit both expose the full LLM form: provider preset, model, system-prompt baseline, dynamic header rows, redteam config, judge / attacker / embedder JSON blocks, thresholds, budget, retries, RPS/RPM caps.
OWASP LLM Top 10 (2025) coverage
- New MCP tool scan_llm_red_team(session_id, categories?, techniques?, max_payloads?). Runs all 10 categories: LLM01 prompt injection, LLM02 sensitive information disclosure, LLM03 supply chain, LLM04 data and model poisoning, LLM05 improper output handling, LLM06 excessive agency, LLM07 system prompt leakage, LLM08 vector / embedding weaknesses, LLM09 misinformation, LLM10 unbounded consumption. Each category ships a curated YAML payload library; each finding aggregates by (category, technique) so reports show one Finding per technique with up to 5 evidence rows rather than N near-duplicate clones.
- New scan profile shape for LLM kind: quick = 25 payloads, standard = 75, deep = 250. Round-robin across techniques so quick profiles never starve any single technique class.
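The round-robin selection across techniques might be sketched as follows. The function name and the dictionary-of-lists input are assumptions; the point is only that a small budget (such as the 25 payloads of the quick profile) takes one payload from each technique in turn before taking a second from any of them.

```python
from typing import Dict, List


def round_robin_select(payloads_by_technique: Dict[str, List[str]], budget: int) -> List[str]:
    """Pick up to `budget` payloads, cycling through techniques in order."""
    queues = {name: list(items) for name, items in payloads_by_technique.items()}
    selected: List[str] = []
    while len(selected) < budget and any(queues.values()):
        for name in queues:
            if len(selected) >= budget:
                break
            if queues[name]:
                selected.append(queues[name].pop(0))
    return selected
```

With three techniques holding 3, 1, and 2 payloads and a budget of 4, the first pass takes one payload from each technique and only the fourth slot goes back to the first technique.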
Multi-turn Crescendo + PAIR iterative search
- The crescendo strategy is now a real 5-turn TestCase that builds context turn-by-turn. The dispatcher carries assistant replies forward as messages[] history; an optional judge can short-circuit a clearly-refusing escalation to save budget.
- New redteam.iterative: "pair" mode — Prompt Automatic Iterative Refinement. With an attacker LLM configured, the loop sends the base prompt, reads the target's reply, asks the attacker to refine, and re-sends until VULNERABLE or pair_iterations exhausted. Static-template fallback (iterative: "static") remains for air-gapped environments.
Strategies + composite stacking
- 21 deterministic prompt transforms: base64, hex, rot13, morse, leetspeak, homoglyph, jailbreak, authoritative-markup, citation, best-of-n, ascii-smuggling, emoji-smuggling, image-markdown, audio-transcript, video-transcript, camelcase, pig-latin, crescendo, plus user-defined plugin strategies.
- composite_strategies chains transforms left-to-right (base64+leetspeak, jailbreak+ascii-smuggling, …). Languages wrap each prompt with a target-language directive — non-English locales typically have weaker safeguards.
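Left-to-right chaining of transforms, as composite_strategies is described, can be illustrated with three of the listed encodings. The registry and function names are hypothetical, and only base64, hex, and rot13 are implemented here because they map directly onto the standard library.

```python
import base64
import codecs
from typing import Callable, Dict

# Hypothetical registry covering a few of the listed transforms.
TRANSFORMS: Dict[str, Callable[[str], str]] = {
    "base64": lambda s: base64.b64encode(s.encode("utf-8")).decode("ascii"),
    "hex": lambda s: s.encode("utf-8").hex(),
    "rot13": lambda s: codecs.encode(s, "rot13"),
}


def apply_composite(spec: str, prompt: str) -> str:
    """Apply each transform named in 'a+b+c' to the prompt, left to right."""
    for name in spec.split("+"):
        prompt = TRANSFORMS[name](prompt)
    return prompt
```

Order matters: rot13+base64 base64-encodes the rotated text, whereas base64+rot13 rotates the letters of the base64 output, so the two specs produce different probes.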
Judges + grading
- LlmJudge supports five providers: openai-chat (any OpenAI-compatible JSON-grading model), executable (local command), llama-guard (Llama Guard 3 with the official safe / unsafe S1..S14 parser → OWASP LLM mapping), granite-guardian (IBM Granite Guardian 3.x Yes/No protocol), and openai-moderation (OpenAI /moderations API — recommended for reasoning-model targets because it scores the visible output rather than the chain-of-thought).
- New redteam.embedder block adds embedding-similarity grading. TestCases declare success_embeddings: [...]; cosine match against any anchor at ≥ threshold promotes AMBIGUOUS verdicts to VULNERABLE. v1 supports OpenAI-compat /embeddings and Cohere embed.
- New redteam.factuality block (LLM09 only). KB-grounded contradiction check via the configured judge. KB can be inline, file:// path, or HTTP URL.
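The embedding-similarity promotion rule can be sketched with a plain cosine similarity. The function names are hypothetical and the vectors are assumed to have been fetched already from whichever embeddings endpoint is configured; only the rule itself (an AMBIGUOUS verdict becomes VULNERABLE when any anchor matches at or above the threshold) comes from the notes.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def grade_with_embeddings(
    verdict: str,
    response_vec: Sequence[float],
    anchor_vecs: List[Sequence[float]],
    threshold: float,
) -> str:
    """Promote AMBIGUOUS to VULNERABLE when any anchor is similar enough."""
    if verdict == "AMBIGUOUS" and any(
        cosine(response_vec, anchor) >= threshold for anchor in anchor_vecs
    ):
        return "VULNERABLE"
    return verdict
```

Other verdicts pass through unchanged, so the embedder can only strengthen an uncertain result and never overturns a clear one.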
Attacker-LLM driven synthesis
- redteam.llm_synthesis: { enabled: true, n: 10 } plus an attacker block generates novel TestCases targeted at the discovered profile — purpose, limitations, tools, user context. One attacker call per scan; cached by profile hash.
Datasets, guardrails, variables, intents
- Built-in datasets: donotanswer, harmbench, beavertails, cyberseceval, toxic-chat. External datasets via file:// or HTTPS URL (JSON / YAML list).
- Built-in guardrails: pii, secrets, unsafe-code, tool-authz. guardrail_bypass: true adds active bypass-template variants.
- redteam.variables: {...} substitutes {{var}} placeholders in prompts, turns, system, success indicators, refusal patterns, description, remediation. Useful for application-specific probes.
- redteam.policies and redteam.intents accept user-defined policy violations and (multi-turn) intent strings — first-class TestCases dispatched alongside the OWASP modules.
Operational / cost controls
- Token-bucket rate limiter is shared per (endpoint, RPS) so 10 OWASP modules dispatching concurrently respect a single per-key cap. 429 responses honour the upstream Retry-After header automatically and stall every concurrent dispatcher to prevent thundering-herd retries.
- Per-scan budget: max_calls, max_tokens, max_cost_usd — hard kill switch. Per-call max_latency_ms and max_tokens_per_call thresholds emit explicit LLM10 findings when violated.
- Retry with exponential backoff (retries, backoff_s) on 429 / 500 / 502 / 503 / 504. In-process LRU cache deduplicates identical probes (cache, cache_size).
- New CRITICAL finding LLM endpoint unreachable / unauthorised fires when ≥50% of probes return non-2xx (401/403 → CRITICAL, 404/429 → HIGH, others → MEDIUM). Closes the "Grade A despite every probe 401'd" silent-fail bug.
- PII redaction: emails, SSNs, cards, phone numbers, common API key patterns (sk-…, xoxb-…) are masked in evidence snippets before they reach Findings or the share-by-link route.
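The unreachable-endpoint rule in the list above can be sketched as a small classifier. The function name is hypothetical, and picking the most common failing status to decide the severity is an assumption; the notes give the 50% trigger and the status-to-severity mapping but not how mixed failures are resolved.

```python
from collections import Counter
from typing import Iterable, Optional


def unreachable_severity(status_codes: Iterable[int]) -> Optional[str]:
    """Return a severity when at least half of the probes were non-2xx, else None."""
    codes = list(status_codes)
    failures = [c for c in codes if not 200 <= c < 300]
    if not codes or 2 * len(failures) < len(codes):
        return None
    # Assumption: the dominant failing status decides the severity.
    dominant = Counter(failures).most_common(1)[0][0]
    if dominant in (401, 403):
        return "CRITICAL"
    if dominant in (404, 429):
        return "HIGH"
    return "MEDIUM"
```

A scan whose probes all come back 401 therefore surfaces an explicit finding instead of silently reporting no vulnerabilities.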
Compliance: AI frameworks
- Every LLM finding maps to MITRE ATLAS, NIST AI RMF, and EU AI Act alongside OWASP LLM Top 10. Tables in plugins/pencheff/pencheff/config.py (MITRE_ATLAS_MAP, NIST_AI_RMF_MAP, EU_AI_ACT_MAP).
Reporting
- New renderers: render_html (self-contained, embedded CSS, no JS — email-able), render_csv (stable columns, Excel-friendly), render_red_team_markdown, render_junit_xml, render_prometheus_metrics. Diff helper diff_red_team_findings powers regression detection across runs.
- New API route GET /scans/{a}/compare/{b} returns the structured diff (regressions, fixes, common failures) plus per-side summaries. Web UI at /scans/compare?a=…&b=… includes a JUnit-XML download for the regressions list.
- New API route POST /scans/{id}/share?ttl_seconds=N issues a Fernet-encrypted token. Public route GET /share/llm/{token} renders HTML / Markdown / CSV / JSON without auth — only valid for kind: "llm" scans.
- Canonical Grafana dashboard at docs/grafana/pencheff-llm-redteam.json — eight panels consuming the Prometheus exporter.
Integrations
- Slack / webhook / Jira payloads now include a per-OWASP-LLM category breakdown and the top failed techniques when target.kind == "llm". The same generic integration matchers apply (per-target scoping, per-event filtering, severity gating).
- Scheduled scans now accept LLM targets (validates llm_config on schedule create).
Plugin SDK
- Three new discovery directories under ~/.pencheff/: custom_llm_strategies/, custom_llm_judges/, custom_llm_providers/. Drop a Python file with a name class attribute and a method matching the protocol; gate discovery on PENCHEFF_ENABLE_CUSTOM_MODULES=1. Plugins win over built-ins on name collision so a deployment can override the canonical jailbreak template with a deployment-specific one.
CLI
- New subcommand pencheff llm-redteam with --strategies, --datasets, --guardrails, --judge-{provider,endpoint,model}, --max-rps, --max-cost-usd, --retries, --fail-on, --output-format {markdown,json,junit,csv,html,prometheus}, --output-file, and --compare-to PRIOR_JSON for CI-friendly regression gating.
Bug fixes
- Headers from the Credentials.headers schema field now flow correctly into LLM probes. Previously, CredentialStore.add_from_dict read from the custom_headers dict key but the API schema exposed it as headers, causing every LLM probe to ship with no Authorization header → silent 401s on every request.
Schema migration
- Migration 0022 adds kind (string, indexed) and llm_config (JSONB) to the targets table; backfills kind = 'repo' for any row whose repository_id IS NOT NULL. Existing URL targets remain kind = 'url'. Adds composite index ix_targets_workspace_kind_created.
See the LLM red team feature page for the full walkthrough, and the Plugin SDK guide for custom strategies / judges / providers.
v0.4.1 — Mobile static analysis, search + pagination across the SaaS UI, Engagements removed (2026-04-28)
A targeted release. Pencheff gains an OWASP-Mobile-Top-10-aware static analyzer for APK/IPA files; the SaaS UI gets paginated, searchable target and assessment lists everywhere; and the Engagements feature (experimental in v0.4.0) is fully removed in favor of the simpler target → assessment workflow.
Mobile static analysis (Phase 1)
- New MCP tool scan_mobile_static(session_id, apk_path?, ipa_path?, types?, use_mobsf?): analyzes an Android APK or iOS IPA without an emulator or rooted device. Decompiles via apktool + jadx (Android) or unzips and parses Info.plist (iOS), then sweeps for OWASP Mobile Top 10 issues:
  - AndroidManifest: debuggable=true, allowBackup=true, usesCleartextTraffic=true, exported activities / services / receivers / providers without permission, missing networkSecurityConfig, dangerously low minSdkVersion.
  - Hardcoded secrets in jadx-decompiled Java: AWS / Google / Firebase / Slack / GitHub / Stripe / Twilio / SendGrid / Mailgun keys, JWTs, PEM private keys, password assignments.
  - Insecure crypto: DES, 3DES, RC4, ECB mode, MD5, SHA-1, hardcoded SecretKeySpec / IvParameterSpec, java.util.Random.
  - Cleartext URLs in compiled code.
  - iOS Info.plist: NSAllowsArbitraryLoads and ATS exceptions for media / WebView, custom URL schemes (deeplink hijacking risk), embedded provisioning profiles.
  - iOS binary hardening: missing PIE flag (via otool -hv, macOS only).
- New scan profile mobile-static. Pass pentest_init(profile="mobile-static") then scan_mobile_static(apk_path=...).
- Compliance maps for mobile_misconfig, mobile_secrets, mobile_crypto, mobile_storage, mobile_communication, and mobile_binary categories added to PCI-DSS, NIST 800-53, SOC 2, ISO 27001:2022, and HIPAA. New OWASP_MOBILE_TOP_10 (M1–M10) name resolution on every finding.
- Hardening: defusedxml for the manifest parser (no XXE / billion-laughs), zip-slip guard on IPA extraction, 5 MB cap on per-file scans with possessive-quantifier JWT regex (no ReDoS).
- Tools: apktool, jadx, mobsfscan, qark, aapt / aapt2, androguard, otool, class-dump, and plistutil are allow-listed for run_security_tool. Set MOBSF_API_KEY to opt into MobSF enrichment via use_mobsf=true.
Dynamic instrumentation (Frida / objection / drozer) is Phase 2 and
remains out of scope for scan_mobile_static.
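To make the AndroidManifest checks concrete, here is a minimal sketch of how a few of them could be expressed over an already-decoded manifest. The check_manifest function and its issue strings are hypothetical, and the standard-library parser is used only to keep the example self-contained; for untrusted input, a hardened parser such as the defusedxml package mentioned above would be the safer choice.

```python
import xml.etree.ElementTree as ET

# Attribute names in the manifest live in the Android XML namespace.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"
RISKY_APP_FLAGS = ("debuggable", "allowBackup", "usesCleartextTraffic")
COMPONENT_TAGS = ("activity", "service", "receiver", "provider")


def check_manifest(manifest_xml: str) -> list[str]:
    """Return a list of issue descriptions for a decoded AndroidManifest.xml."""
    root = ET.fromstring(manifest_xml)
    issues: list[str] = []
    app = root.find("application")
    if app is None:
        return issues
    # Application-level flags that are risky when explicitly set to true.
    for flag in RISKY_APP_FLAGS:
        if app.get(ANDROID_NS + flag) == "true":
            issues.append(f"{flag}=true")
    if app.get(ANDROID_NS + "networkSecurityConfig") is None:
        issues.append("missing networkSecurityConfig")
    # Exported components that are not protected by a permission.
    for tag in COMPONENT_TAGS:
        for comp in app.findall(tag):
            exported = comp.get(ANDROID_NS + "exported") == "true"
            if exported and comp.get(ANDROID_NS + "permission") is None:
                comp_name = comp.get(ANDROID_NS + "name")
                issues.append(f"exported {tag} without permission: {comp_name}")
    return issues


SAMPLE = """
<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.app">
  <application android:debuggable="true" android:allowBackup="false">
    <activity android:name=".Main" android:exported="true"/>
    <service android:name=".Sync" android:exported="true" android:permission="com.example.SYNC"/>
  </application>
</manifest>
"""
issues = check_manifest(SAMPLE.strip())
```

The sketch assumes the manifest has already been decoded to text XML (for example by apktool); the binary form packed inside an APK cannot be parsed this way.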
SaaS UI: search + pagination on every list
- /dashboard, /targets, /scans, /targets/{id}, and /repos/{id} now ship a search input (filtering name / URL / kind for targets, and report № / status / grade / target name for assessments) and a paginator on the same row, opposite the search.
- Targets paginate at 6 per page, assessments at 20.
- The paginator is always visible alongside the search: even single-page result sets render Page 1 of 1 with disabled Prev / Next, so users see the same control whether the workspace has 4 assessments or 400.
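The always-visible paginator reduces to one small rule: never report fewer than one page, and disable a button when there is nowhere to go. A minimal sketch follows; PageState and paginator_state are hypothetical names, and the real UI is not written in Python.

```python
import math
from dataclasses import dataclass


@dataclass
class PageState:
    label: str
    prev_enabled: bool
    next_enabled: bool


def paginator_state(total_items: int, page: int, per_page: int) -> PageState:
    """Compute the label and button states for a paginator that is always rendered."""
    # At least one page, so an empty or short list still renders "Page 1 of 1".
    total_pages = max(1, math.ceil(total_items / per_page))
    # Clamp out-of-range page numbers instead of failing.
    page = min(max(page, 1), total_pages)
    return PageState(
        label=f"Page {page} of {total_pages}",
        prev_enabled=page > 1,
        next_enabled=page < total_pages,
    )


small = paginator_state(total_items=4, page=1, per_page=20)
large = paginator_state(total_items=400, page=1, per_page=20)
```

With 4 assessments the control shows Page 1 of 1 with both buttons disabled; with 400 it shows Page 1 of 20 with only Next enabled.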
Engagements — removed
- The entire /engagements route, the Workbench dropdown entry, and the engagement selector inside the Commission Scan modal are gone. Scans now POST without an engagement_id. Findings collected against an Engagement in v0.4.0 are still queryable through the Scans / Targets surface.
- The Workbench dropdown's Assets link is also removed; the /assets page itself remains for direct linking and ASM API consumers.
Tool count: 49 → 50 MCP tools, attack modules 53 → 57.
v0.4.0 — Engage swarm, lifecycle integrations, repos as targets, PAT private repos (2026-04-28)
Major release. Pencheff gains a 9-phase autonomous engagement, a unified target/repo model, and integrations that fire on the full finding lifecycle — not just scan completion.
pencheff engage — 9-phase autonomous swarm
- 30 specialist playbooks registered in pencheff.playbooks.REGISTRY. 28 are adapted from 0xSteph/pentest-ai-agents; two new ones, crawl_first and api_authenticator, own the HTTP-first reconnaissance + login-discovery flow.
- 9 phases (was 7): scope → crawl → auth → recon → vuln → exploit → postex → detect → report. The two new phases populate session.discovered.endpoints with the real surface before auth runs, so the auth phase picks a discovered login URL instead of guessing from a static 14-path list, and every downstream module tests the actual endpoints rather than just the base URL.
- Subdomain fan-out: pencheff engage --max-subdomains 100 runs crawl + auth + vuln + exploit on each discovered subdomain, with findings merged back into the master session.
- Tier 1 / Tier 2 model + OPSEC noise tagging (quiet / moderate / loud) + MITRE ATT&CK mapping on every finding.
- Engagement DB at ~/.pencheff/engagements.db for cross-session state (engagements, hosts, services, vulns, credentials, chains, session_log).
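The benefit of running crawl before auth can be sketched as a lookup that prefers discovered endpoints and only falls back to a static guess. Everything below is an assumption made for illustration: the keyword heuristic, the pick_login_url function, and the two-entry fallback list (the real static list has 14 paths that the notes do not enumerate).

```python
# Phase order as listed in the release notes.
PHASES = ["scope", "crawl", "auth", "recon", "vuln", "exploit", "postex", "detect", "report"]

# Hypothetical stand-in for the static list of common login paths.
FALLBACK_LOGIN_PATHS = ["/login", "/api/login"]

# Hypothetical heuristic: treat URLs containing these words as login endpoints.
LOGIN_KEYWORDS = ("login", "signin")


def pick_login_url(base_url: str, discovered: list[str]) -> str:
    """Prefer a login endpoint found by the crawl; otherwise guess from the static list."""
    for url in discovered:
        if any(keyword in url.lower() for keyword in LOGIN_KEYWORDS):
            return url
    return base_url.rstrip("/") + FALLBACK_LOGIN_PATHS[0]


crawled = [
    "https://app.example.test/api/v2/products",
    "https://app.example.test/api/v2/auth/login",
]
from_crawl = pick_login_url("https://app.example.test", crawled)
from_guess = pick_login_url("https://app.example.test/", [])
```

Because crawl precedes auth in the phase list, the discovered endpoints are already available when the login URL is chosen.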
SaaS UI: Engage profile
- "Engage (full swarm)" added to the commission-scan modal — drives the same 9-phase pipeline from the dashboard.
- Live progress streaming: each phase + each playbook + each subdomain emits a scan-log line and an SSE event as it runs. The progress bar moves visibly across the 9 phases instead of frozen at 5% for ~10 minutes.
API-first authentication
- Default credential-based login now uses HTTP API probing across 14 common login endpoints instead of Playwright. ~2-second login vs 15–30s, no Chromium dep, no SPA hydration races, no Cloudflare Turnstile triggers. Playwright stays as the escape hatch for SSO / SAML / MFA / CAPTCHA flows when explicit login_steps are supplied.
Integrations: lifecycle events + per-target scope
- Two new destinations: Google Chat (webhook) and Jira (creates one issue per finding_new, comments on the existing issue for finding_changed when the issue key is on the finding's external_refs).
- Per-target scope: every integration carries a target_ids array. NULL = all targets; populated = only fire for scans against those targets. Targets here include both DAST URL targets and repo-mirror targets.
- Per-event filter: events: ["scan_started", "scan_done", "scan_failed", "finding_new", "finding_changed"]. Wire e.g. a PagerDuty integration scoped to scan_failed + finding_new for one production target, while a Slack channel takes the full firehose for everything.
- Five lifecycle hooks instead of one. The Celery notify_event(scan_id, event_type, finding_id?, change_summary?, error?) task is the single dispatch surface; hooks at scan start / done / failed and at every finding-mutation endpoint (verify, suppress, unsuppress, recheck) enqueue it.
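The per-target scope and per-event filter combine into a single predicate evaluated for each integration when an event is dispatched. The sketch below mirrors the PagerDuty / Slack example from the list; the Integration dataclass and should_fire are hypothetical names, and severity gating is left out.

```python
from dataclasses import dataclass
from typing import Optional

ALL_EVENTS = ["scan_started", "scan_done", "scan_failed", "finding_new", "finding_changed"]


@dataclass
class Integration:
    name: str
    target_ids: Optional[list[str]]  # None (NULL) means "all targets"
    events: list[str]


def should_fire(integration: Integration, target_id: str, event_type: str) -> bool:
    """Return True when the event passes both the target scope and the event filter."""
    in_scope = integration.target_ids is None or target_id in integration.target_ids
    return in_scope and event_type in integration.events


pagerduty = Integration("pagerduty", target_ids=["prod-api"], events=["scan_failed", "finding_new"])
slack = Integration("slack", target_ids=None, events=list(ALL_EVENTS))

fired = [
    i.name
    for i in (pagerduty, slack)
    if should_fire(i, target_id="staging-web", event_type="finding_new")
]
```

Here only Slack fires, because the PagerDuty integration is scoped to a different target.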
Repos as first-class Targets
- New column targets.repository_id UUID NULL FK → repositories(id) ON DELETE CASCADE. Every Repository auto-mirrors as a Target row on registration; deleting the Repository cascades to the mirror.
- Repo-mirror targets show up everywhere URL targets do: the Targets dashboard, the integrations target multi-select, GET /targets. They carry kind: "repo" so the UI can render a badge and route the commission-scan modal to /repos/{id}/scan instead of /scans.
- DAST scan against a repo-mirror target → 400 with a clear pointer to the repo-scan endpoint.
PAT-authenticated private repos
- New column repositories.token_encrypted for Fernet-encrypted Personal Access Tokens.
- POST /repos/github accepts an optional token field. With a token → validates it against the GitHub REST API, persists it encrypted, sets private=True. Without a token → existing public-clone behaviour.
- Repo-scan worker decrypts the PAT and uses it as the x-access-token password for git clone. Re-registering the same repo URL with a new token rotates the stored credential without disturbing scan history or the mirror Target.
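One common way to pass a token to git over HTTPS is to place it in the clone URL with x-access-token as the username. The sketch below shows only that URL construction; authenticated_clone_url is a hypothetical helper, and whether the worker embeds the token in the URL or supplies it through a credential helper is not stated in the notes.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def authenticated_clone_url(repo_url: str, token: str) -> str:
    """Build an HTTPS clone URL with x-access-token as user and the PAT as password."""
    parts = urlsplit(repo_url)
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError("token auth requires an https:// clone URL")
    # Percent-encode the token so special characters cannot break the URL.
    encoded_token = quote(token, safe="")
    netloc = f"x-access-token:{encoded_token}@{parts.hostname}"
    if parts.port:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, "", ""))


clone_url = authenticated_clone_url("https://github.com/acme/widgets.git", "tok_123")
```

A URL that carries a credential should never be logged; passing the token through a credential helper or environment variable avoids exposing it in process listings.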
/targets/new and /repos redesign
- "Local folder" registration removed entirely from both pages. The worker can't honestly know which paths it'll see at scan time, so every repo path is now GitHub-based.
- 3-source picker on /targets/new (Repository): Public GitHub URL · Private GitHub (PAT) · Pencheff GitHub App.
- Same model on /repos with a 2-tab toggle (Public / Private PAT) plus the always-on GitHub App card at the top.
- Detailed inline collapsible instructions for both flows: how to create a fine-grained or classic PAT (with exact scope/permission recommendations) and how to install the Pencheff GitHub App (step-by-step + permissions table + adding more repos later + removing access).
Migrations
- 0018: add integrations.target_ids (UUID[]) + integrations.events (varchar[]) + GIN indexes.
- 0019: add targets.repository_id (UUID FK CASCADE) + idempotent backfill of one mirror Target per existing Repository.
- 0020: add repositories.token_encrypted (bytea NULL).
v1.0 — Expanded security workflows (2026-04-21)
Major release. Pencheff now covers the full enterprise DAST + AppSec surface in one tool.
SCA + SBOM + IaC + container
- scan_dependencies: parse manifests for npm, PyPI, Go, crates.io, RubyGems, Packagist, Maven → OSV.dev CVE query → EPSS + CISA KEV enrichment.
- generate_sbom: produce SPDX 2.3 + CycloneDX 1.5 natively; prefers syft when installed.
- check_licenses: policy-driven license compliance (allows, denies, unknown behaviour).
- reachability.annotate: mark unimported deps as low-reachability to suppress noise.
- scan_dockerfile, scan_kubernetes, scan_terraform, scan_helm, scan_container_image.
Network VA
- scan_host_vulns: Pencheff service detection → CVE lookup.
- scan_network_misconfig: Redis, Mongo, Elastic, Memcached, Docker, MySQL, PG, SNMP.
- scan_authenticated_host: SSH / WinRM / SMB package audit.
- scan_industrial_protocols: Modbus, BACnet, S7, EtherNet/IP, DNP3.
- Local SQLite CVE cache with EPSS + CISA KEV refresh.
Intercepting proxy + fuzzer + YAML automation
- start_proxy / stop_proxy: mitmproxy + pure-Python fallback.
- fuzz_parameter: request-template differential fuzzer with bundled XSS / SQLi / dir / param wordlists and 7 encoders.
- run_policy: full YAML ScanPolicy schema v1, assertions, thresholds, reports, schedule.
- New passive scanner with 25+ regex rules across flows + active traffic.
Attack Surface Management + scheduling + collaboration
- asm_discover: subfinder + crt.sh + optional Shodan.
- asm_diff / asm_cert_watch: change detection + CT log watch.
- Cron-driven scheduled scans (Celery Beat).
- Finding SLA tracking (severity → due date → hourly breach monitor).
- Comments, assignment, tags, first-class collab endpoints.
- 7 integrations: Slack, Teams, Discord, PagerDuty, Opsgenie, Splunk HEC, signed generic webhook.
Risk scoring
- EPSS + CISA KEV enrichment on every finding.
- risk_score = cvss × (1 + epss) × (2 if kev else 1) sorts reports by actual exploit likelihood.
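The formula translates directly into code. The sketch below applies it to three made-up findings (the IDs and scores are illustrative, not real data) to show how a known-exploited, high-EPSS issue can outrank one with a higher raw CVSS score.

```python
def risk_score(cvss: float, epss: float, kev: bool) -> float:
    """risk_score = cvss x (1 + epss) x (2 if kev else 1), as stated in the release notes."""
    return cvss * (1 + epss) * (2 if kev else 1)


# Illustrative findings only; the values are invented for the example.
findings = [
    {"id": "A", "cvss": 9.8, "epss": 0.02, "kev": False},
    {"id": "B", "cvss": 7.5, "epss": 0.90, "kev": True},
    {"id": "C", "cvss": 5.0, "epss": 0.50, "kev": False},
]

# Highest risk first.
ranked = sorted(
    findings,
    key=lambda f: risk_score(f["cvss"], f["epss"], f["kev"]),
    reverse=True,
)
ranked_ids = [f["id"] for f in ranked]
```

Finding B scores 7.5 × 1.9 × 2 = 28.5 and sorts ahead of finding A (9.8 × 1.02 ≈ 10.0), even though A has the higher CVSS base score.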
Plugin SDK
- BaseTestModule formalised with lifecycle hooks.
- Auto-discovery from ~/.pencheff/custom_modules/ behind PENCHEFF_ENABLE_CUSTOM_MODULES=1.
- pencheff init-module scaffold generator.
API + dashboard
- 9 new DB tables: schedules, assets, integrations, sboms, dependencies, proxy_sessions, finding_comments, finding_assignments, finding_tags.
- 7 new routers, 4 new Celery tasks (scheduled dispatcher, asset discovery, SLA monitor, integration fan-out).
- 5 new dashboard pages: /schedules, /assets, /integrations, /sbom/[scanId], /dependencies/[scanId].
- Nav bar updated with all new links.
Total
- MCP tools: 49 → 81
- Scan profiles: 6 → 13
- External tool allowlist: +14
- DB tables: +9
- Next.js pages: +6
- Compliance frameworks: 6 (OWASP, PCI-DSS, NIST, SOC 2, ISO 27001, HIPAA)
v0.2.1 — (2026-02-15)
Baseline release — DAST + exploit-first pentest agent.
Related