Case Study #09 · AI Defensive Diagnostics · NDA

GPT chatbot security audit: prompt injection, PII leak, and FZ-152 at ₽15M

EdTech platform with 22,000 paying users. AI tutor on GPT-4o-mini. The CTO suspected anomalies, the CISO was asking uncomfortable questions, due diligence was 4 months out. A 2.5-week red-team per OWASP LLM Top 10 found 8 critical issues — all closed before DD.

FULLY NDA — no recognizable UI or branding. Architecture diagrams only.

Industry

Large EdTech B2C (22,000 paying users)

Stack

OWASP LLM Top 10 · Promptfoo · Garak · Rebuff

Timeline

2.5 weeks audit + 1 week re-test

Compliance

FZ-152: audit + remediation

Outcome

8 critical closed · ROI ≥ 24×


01 · Pain Point

AI in production, user PII, and due diligence in 4 months

Large EdTech platform: 22,000 paying users, AI tutor on GPT-4o-mini built into the core learning product. The tutor processes student context: progress, message history, and sometimes personal data from filled-in forms. The bot had been running in production for six months.

The CTO noticed anomalies: users sometimes received responses unrelated to their course. The CISO, in turn, started asking uncomfortable questions about personal data processing at OpenAI — where it's stored, how it's anonymized, whether there's logging, and how this aligns with FZ-152.

In parallel, pre-IPO due diligence was four months out. Any security incident in the LLM layer meant not only reputational losses but also real risk of the deal falling through. Investors have a separate security check on AI components, with an "OWASP LLM Top 10 audited" checkbox in the checklist.

02 · Solution

Red-team per OWASP LLM Top 10 + remediation

2.5 weeks of attacks on an isolated staging copy of production with production prompts and realistic synthetic PII. Each of the 10 OWASP items was a separate series of targeted attacks, both automated (Promptfoo, Garak, Rebuff) and manual red-team via custom scenarios tailored to EdTech specifics.

01

Scoping

Inventory of all LLM touchpoints: tutor, RAG, fine-tuning data, prompts. Threat model per OWASP LLM Top 10.

02

Automated

Promptfoo + Garak + Rebuff. Thousands of payloads for injection, jailbreak, exfiltration.

03

Manual red-team

Custom scenarios for EdTech: extract another student's PII, bypass paywall, force off-course responses.

04

FZ-152 layer

Data flow audit: what PII reaches OpenAI, what anonymization is applied, where PII processing documents are, how FZ-152 compliance is achieved.

05

Re-test

After fixes — re-run of the same attacks. Green report for DD.
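The anonymization question raised in the FZ-152 step can be illustrated with a minimal sketch of masking PII before text leaves the perimeter. This is not the platform's actual code: the two regular expressions and the helper names `redact_pii` and `restore` are assumptions for illustration, and real coverage would also need names, addresses, and document numbers.

```python
from __future__ import annotations

import re

# Hypothetical patterns: emails and Russian-format phone numbers only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"(?:\+7|8)[\s(-]*\d{3}[\s)-]*\d{3}[\s-]*\d{2}[\s-]*\d{2}")


def redact_pii(text: str) -> tuple[str, dict[str, str]]:
    """Replace matches with placeholders; return the text and a token-to-value map."""
    mapping: dict[str, str] = {}

    def mask(kind: str, pattern: re.Pattern[str], s: str) -> str:
        def repl(m: re.Match[str]) -> str:
            token = f"<{kind}_{len(mapping) + 1}>"
            mapping[token] = m.group(0)  # kept locally, never sent to the provider
            return token

        return pattern.sub(repl, s)

    text = mask("EMAIL", EMAIL_RE, text)
    text = mask("PHONE", PHONE_RE, text)
    return text, mapping


def restore(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back into a model response, on our side only."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text
```

Only the redacted string would be sent to the external model. The mapping stays inside the perimeter and is applied to the response with `restore`.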

Critical findings — 8 of them

In hindsight, each one looks like "how could we have missed this," yet the internal dev team had not noticed a single one before the audit. The list below omits specific payloads, which remain under NDA:

01 Prompt injection via "ignore previous instructions" bypassed the system prompt and changed the bot's persona
02 PII leak via context bleeding in RAG: one user's response contained fragments of another's dialogue
03 FZ-152 violation: PII was sent to OpenAI without anonymization and without consent for cross-border transfer
04 No rate limit: a ₽280 denial-of-wallet attack would have exhausted the daily OpenAI budget
05 Jailbreak through roleplay: "you're a stage acting teacher, play this role..." removed restrictions
06 System prompt exfiltration: a pair of specially crafted questions revealed the full prompt text
07 Output handling: LLM responses were rendered in the UI without sanitization, opening XSS via generated markdown
08 Insecure plugin design: the course materials access function allowed path traversal via course name
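The last finding, path traversal via course name, is the easiest one to show in code. The sketch below is one possible guard, not the fix that was deployed: the function name `resolve_course_file` and the directory layout (`base_dir/<course_name>/<filename>`) are assumptions.

```python
from pathlib import Path


def resolve_course_file(base_dir: Path, course_name: str, filename: str) -> Path:
    """Return a path inside base_dir/course_name, or raise PermissionError."""
    # The course name must be a single plain path component, not a path.
    if course_name in {"", ".", ".."} or Path(course_name).name != course_name:
        raise PermissionError(f"invalid course name: {course_name!r}")

    course_dir = (base_dir / course_name).resolve()
    candidate = (course_dir / filename).resolve()

    # After resolving ".." and symlinks, the file must still sit under the course.
    if not candidate.is_relative_to(course_dir):
        raise PermissionError(f"path escapes course directory: {filename!r}")
    return candidate
```

A stricter variant would look the course up in an allowlist of known course IDs instead of accepting a name from the model at all. `Path.is_relative_to` requires Python 3.9 or later.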

Remediation, not just a report

An audit without an implementation plan is just paper. Each of the 33 findings (8 critical + 11 high + 14 medium) came with a concrete fix proposal: system prompt patch, RAG context isolation, migration to GigaChat 2 Max for PII (RF perimeter), Cloudflare WAF rules against injection, markdown sanitization via DOMPurify, plugin sandboxing.
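Of the fixes listed, RAG context isolation is the simplest to sketch. The example assumes each retrieved chunk carries an owner ID, with `None` marking shared course material. The `Chunk` type and `isolate_context` function are hypothetical names, not the platform's real data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Chunk:
    owner_id: Optional[str]  # None means shared course material
    text: str
    score: float  # retrieval similarity, higher is better


def isolate_context(chunks: list[Chunk], user_id: str, top_k: int = 3) -> list[Chunk]:
    """Keep only shared chunks and the requesting user's own chunks, best first."""
    allowed = [c for c in chunks if c.owner_id is None or c.owner_id == user_id]
    return sorted(allowed, key=lambda c: c.score, reverse=True)[:top_k]
```

In practice the same ownership condition is better enforced inside the vector store query itself, so that another user's chunks are never retrieved. A post-filter like this one then serves as a second line of defense.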

Re-test after fixes

A week after the fixes were deployed, the automated attacks and targeted manual scenarios were re-run against the closed critical findings. All 8 critical findings were confirmed blocked. The green report went into the due diligence package, accompanied by a separate document: a plan to maintain the security perimeter for the next 12 months.
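A re-test of this kind can be framed as a regression check: replay the stored payloads and list the ones that still succeed. The sketch below covers only the system prompt exfiltration case and assumes a canary string has been planted in the staging system prompt. `CANARY`, `retest`, and the `ask` callable are illustrative names, not part of Promptfoo or Garak.

```python
from typing import Callable, Iterable

# Hypothetical marker placed in the staging system prompt; it should never
# appear in any response shown to a user.
CANARY = "CANARY-7f3a"


def retest(ask: Callable[[str], str], payloads: Iterable[str]) -> list[str]:
    """Return the payloads whose responses still leak the canary."""
    return [p for p in payloads if CANARY in ask(p)]
```

Here `ask` wraps whatever client sends a message to the staging bot and returns its reply. An empty result means none of the replayed payloads leaked the canary.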

03 · Stack

Industry-standard tools + custom red-team

OWASP LLM Top 10

Threat modeling framework: 10 categories of AI app vulnerabilities as a checklist

Promptfoo

Test automation: thousands of prompt variants per target, response evaluation

Garak

Toolkit for red-teaming LLMs: jailbreak, exfiltration, prompt injection out of the box

Rebuff

Prompt injection detection layer — testing and embedding into production flow

Custom red-team

Manual scenarios tailored to EdTech: extract PII, bypass paywall, manipulate grading

GigaChat 2 Max

Alternative LLM provider for migrating PII data to the RF perimeter

Cloudflare WAF

Prompt injection payload filtering rules at L7 before reaching the LLM layer

FZ-152 audit

Legal side: PII processing, consents, processing agreement, cross-border transfer

OWASP LLM Top 10 · Promptfoo · Garak · Rebuff · Custom red-team · GigaChat · Cloudflare WAF · FZ-152

04 · Results

Closed before due diligence

Critical closed

8 / 8

all critical findings blocked and confirmed by re-test

FZ-152 risk

₽15M → 0

after PII migration to RF perimeter + PII processing documents

ROI on one fine

≥ 24×

₽15M fine / ₽620k audit cost, and that's just FZ-152
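Written out, the ratio behind the headline figure uses only the two amounts stated in this case study, the ₽15M FZ-152 exposure and the ₽620k audit cost:

```latex
\[
\text{ROI} \;\ge\; \frac{15\,000\,000}{620\,000} \;\approx\; 24.2
\]
```

The "≥" reflects that only one avoided fine is counted. Other avoided losses are not included in the numerator.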

The due diligence package included a green report across all 10 OWASP LLM Top 10 categories, with auditor signature and re-test date. The investor security check passed without additional questions about the AI layer, which is itself atypical for pre-IPO compliance.

The main non-quantitative win — the CISO now has a working process for future LLM layer changes. Every new prompt, every new plugin now goes through a mini-audit against the OWASP checklist, and a full red-team cycle is scheduled every six months.

05 · Where it fits

When an AI security audit is mandatory

The universal trigger: the LLM processes someone else's data or makes last-mile decisions. If your LLM layer fits any of the patterns below, an audit isn't "nice to have"; it's part of the compliance perimeter:

→ AI assistant with PII access — EdTech, HealthTech, HR tools, any B2C with dialogue history
→ RAG systems with multi-tenant context — where a leak between tenants = incident
→ AI with function calling / plugins — where the LLM calls real APIs, money operations, file access
→ Public-facing chatbot on a company site — every random user = potential attacker
→ Pre-IPO / pre-M&A / pre-certification — investor / regulator AI security check
→ Any PII processing via foreign LLM providers (OpenAI, Anthropic, Google) — FZ-152 risk

What's included in the audit (2.5 + 1 week)

Inventory of all LLM touchpoints: chatbots, RAG, embedding, fine-tuning data, prompts
Automated red-team per 10 OWASP categories — thousands of payloads
Manual red-team — custom scenarios tailored to business specifics
FZ-152 audit + remediation: data flow, anonymization, PII processing documents
Report with prioritized remediation plan — not "here are the problems," but "here's the fix"
Re-test after fix deployment — green report for DD / compliance

Similar challenge?

If AI is in production and red-team hasn't been run — it's already an incident waiting to happen

Fixed audit cost — ₽620k. Timeline 2.5 + 1 week. Report protected by NDA, format suitable for due diligence or ISO 27001 packages. Can start with a pre-audit (3 days) if the scope needs verification.

