GPT chatbot security audit: prompt injection, PII leak, and FZ-152 at ₽15M
EdTech platform with 22,000 paying users. AI tutor on GPT-4o-mini. The CTO suspected anomalies, the CISO was asking uncomfortable questions, due diligence was 4 months out. A 2.5-week red-team per OWASP LLM Top 10 found 8 critical issues — all closed before DD.
AI in production, user PII, and due diligence in 4 months
Large EdTech platform: 22,000 paying users, AI tutor on GPT-4o-mini built into the core learning product. The tutor processes student context: progress, message history, sometimes personal data from filled-in forms. The bot had been running in production for six months since launch.
The CTO noticed anomalies: users sometimes received responses unrelated to their course. The CISO, in turn, started asking uncomfortable questions about personal data processing at OpenAI — where it's stored, how it's anonymized, whether there's logging, and how this aligns with FZ-152.
In parallel, pre-IPO due diligence was four months out. Any security incident in the LLM layer meant not only reputational losses but also real risk of the deal falling through. Investors have a separate security check on AI components, with an "OWASP LLM Top 10 audited" checkbox in the checklist.
Red-team per OWASP LLM Top 10 + remediation
2.5 weeks of attacks on an isolated staging copy of production with production prompts and realistic synthetic PII. Each of the 10 OWASP items was a separate series of targeted attacks, both automated (Promptfoo, Garak, Rebuff) and manual red-team via custom scenarios tailored to EdTech specifics.
- Inventory of all LLM touchpoints: tutor, RAG, fine-tuning data, prompts. Threat model per OWASP LLM Top 10.
- Automated attacks: Promptfoo + Garak + Rebuff. Thousands of payloads for injection, jailbreak, exfiltration.
- Manual red-team: custom scenarios for EdTech, such as extracting another student's PII, bypassing the paywall, forcing off-course responses.
- Data flow audit: what PII reaches OpenAI, what anonymization is applied, where the PII processing documents are, how FZ-152 compliance is achieved.
- Re-test: after fixes, a re-run of the same attacks. Green report for DD.
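The data flow audit step can be illustrated with a minimal sketch that scans outbound prompts for PII-like strings before they leave for the provider. The function names and the two regex patterns are assumptions made for illustration; the actual audit tooling is not disclosed, and a real scan would cover far more PII types (names, document numbers, addresses).

```python
import re

# Illustrative patterns only (assumption): emails and phone-like digit runs.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s()-]{8,}\d"),
}


def find_pii(text: str) -> dict[str, list[str]]:
    """Return PII-like matches found in one outbound prompt, keyed by type."""
    hits = {}
    for kind, pattern in PII_PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[kind] = found
    return hits


def audit_outbound(prompts: list[str]) -> list[tuple[int, dict[str, list[str]]]]:
    """List (index, hits) for every prompt that would carry PII to the provider."""
    return [(i, hits) for i, p in enumerate(prompts) if (hits := find_pii(p))]
```

Run against a sample of logged staging requests, a non-empty result is evidence that raw PII reaches the external provider and needs anonymization or rerouting.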
Critical findings — 8 of them
In hindsight, each one looks like "how could we have missed this." Before the audit, not one had been noticed by the internal dev team. The list, with specific payloads withheld under NDA:
- 01 Prompt injection via "ignore previous instructions" bypassed the system prompt and changed the bot's persona
- 02 PII leak via context bleeding in RAG: one user's response contained fragments of another's dialogue
- 03 FZ-152 violation: PII was sent to OpenAI without anonymization and without consent for cross-border transfer
- 04 No rate limit: a ₽280 denial-of-wallet attack would have exhausted the daily OpenAI budget
- 05 Jailbreak through roleplay: "you're a stage acting teacher, play this role..." removed restrictions
- 06 System prompt exfiltration: a pair of specially crafted questions revealed the full prompt text
- 07 Output handling: LLM responses were rendered in the UI without sanitization, opening XSS via generated markdown
- 08 Insecure plugin design: the course materials access function allowed path traversal via the course name
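Finding 08 is the easiest one to show in code. Below is a minimal sketch of a path traversal guard for a materials-access function, assuming materials live under a single root directory. `MATERIALS_ROOT` and `resolve_course_file` are hypothetical names, not the platform's actual plugin code.

```python
from pathlib import Path

# Hypothetical root directory for course materials (assumption).
MATERIALS_ROOT = Path("/srv/courses")


def resolve_course_file(course_name: str, filename: str,
                        root: Path = MATERIALS_ROOT) -> Path:
    """Resolve a requested file and refuse anything outside the materials root."""
    base = root.resolve()
    # resolve() collapses ".." segments and follows symlinks,
    # so the containment check runs on the real target path.
    candidate = (base / course_name / filename).resolve()
    if not candidate.is_relative_to(base):
        raise PermissionError(f"path escapes materials root: {course_name!r}")
    return candidate
```

The point of the design is that the check happens after normalization: a course name like `../../etc` is rejected because of where it ends up, not because of what characters it contains.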
Remediation, not just a report
An audit without an implementation plan is just paper. Each of the 33 findings (8 critical + 11 high + 14 medium) came with a concrete fix proposal: system prompt patch, RAG context isolation, migration to GigaChat 2 Max for PII (RF perimeter), Cloudflare WAF rules against injection, markdown sanitization via DOMPurify, plugin sandboxing.
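RAG context isolation, one of the fixes listed above, comes down to filtering by owner before ranking, never after. A minimal in-memory sketch under that assumption follows; the `Chunk` type, the `"shared"` owner label, and the keyword-overlap ranking are illustrative stand-ins for a real vector store, not the platform's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    owner_id: str  # user the chunk belongs to; "shared" for course material
    text: str


def retrieve(chunks: list[Chunk], user_id: str, query: str, k: int = 3) -> list[Chunk]:
    """Return up to k chunks visible to user_id, ranked by naive keyword overlap."""
    # Isolation first: other users' chunks never enter the candidate set.
    visible = [c for c in chunks if c.owner_id in (user_id, "shared")]
    terms = set(query.lower().split())
    ranked = sorted(
        visible,
        key=lambda c: len(terms & set(c.text.lower().split())),
        reverse=True,
    )
    return ranked[:k]
```

With a real vector database the same idea is usually expressed as a mandatory metadata filter on every query, so that a relevant chunk from another student's dialogue cannot be returned no matter how well it scores.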
Re-test after fixes
A week after fix deployment, the automated attacks were re-run together with targeted manual scenarios against the closed critical findings. All 8 critical findings were confirmed blocked. The green report went into the due diligence package, along with a separate document: a plan to maintain the security perimeter for the next 12 months.
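A re-test of this kind can be sketched as a small regression harness that replays stored payloads per finding and checks each response for a leak marker. Everything here is an assumption for illustration: `ask_bot` stands for whatever client calls the staging bot, `CANARY` is a hypothetical string planted in the staging system prompt, and "blocked" is reduced to "the canary does not appear in the response".

```python
from typing import Callable

# Hypothetical canary string planted in the staging system prompt (assumption).
CANARY = "CANARY-7f3a"


def retest(ask_bot: Callable[[str], str],
           payloads: dict[str, list[str]]) -> dict[str, bool]:
    """Map each finding id to True if every stored payload for it is blocked."""
    report = {}
    for finding_id, attacks in payloads.items():
        # A finding counts as closed only if no payload leaks the canary.
        report[finding_id] = all(CANARY not in ask_bot(p) for p in attacks)
    return report
```

A real re-test would need a richer oracle than a single canary (persona changes and off-course answers do not leak a string), but the structure of "same payloads, per-finding verdict" stays the same.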
Industry-standard tools + custom red-team
- OWASP LLM Top 10: threat modeling framework, 10 categories of AI app vulnerabilities as a checklist
- Promptfoo: test automation, thousands of prompt variants per target, response evaluation
- Garak: toolkit for red-teaming LLMs, with jailbreak, exfiltration, and prompt injection probes out of the box
- Rebuff: prompt injection detection layer, tested and embedded into the production flow
- Custom red-team: manual scenarios tailored to EdTech, such as extracting PII, bypassing the paywall, manipulating grading
- GigaChat 2 Max: alternative LLM provider for migrating PII data to the RF perimeter
- Cloudflare WAF: prompt injection payload filtering rules at L7 before requests reach the LLM layer
- FZ-152: the legal side, covering PII processing, consents, processing agreement, cross-border transfer
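The idea of filtering injection payloads before they reach the LLM layer can be sketched at the application level as a simple pattern check. This is not Cloudflare WAF rule syntax and not Rebuff's API; the patterns and the `looks_like_injection` function are illustrative assumptions.

```python
import re

# Illustrative patterns only (assumption); real rule sets are larger and tuned.
SUSPICIOUS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior|above)\s+instructions", re.I),
    re.compile(r"(reveal|print|repeat)\s+(your\s+)?(system\s+)?prompt", re.I),
    re.compile(r"you\s+are\s+now\s+", re.I),
]


def looks_like_injection(message: str) -> bool:
    """Return True if the message matches a known injection phrasing."""
    return any(p.search(message) for p in SUSPICIOUS)
```

Pattern filters like this are easy to evade with paraphrase or encoding, so they only make sense as one cheap layer in front of the other fixes, not as a replacement for them.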
Closed before due diligence
All 8 critical findings blocked and confirmed by re-test
after PII migration to RF perimeter + PII processing documents
15M / 620k audit cost — and that's just FZ-152
The due diligence package included a green report across 10 OWASP LLM Top 10 categories with auditor signature and re-test date. The investor security check passed without additional questions about the AI layer — which is itself atypical for pre-IPO compliance.
The main non-quantitative win — the CISO now has a working process for future LLM layer changes. Every new prompt, every new plugin now goes through a mini-audit against the OWASP checklist, and a full red-team cycle is scheduled every six months.
When an AI security audit is mandatory
The universal trigger: the LLM processes someone else's data or makes last-mile decisions. If your LLM layer fits any of the patterns below, an audit isn't "nice to have," it's part of the compliance perimeter:
- → AI assistant with PII access — EdTech, HealthTech, HR tools, any B2C with dialogue history
- → RAG systems with multi-tenant context — where a leak between tenants = incident
- → AI with function calling / plugins — where the LLM calls real APIs, money operations, file access
- → Public-facing chatbot on a company site — every random user = potential attacker
- → Pre-IPO / pre-M&A / pre-certification — investor / regulator AI security check
- → Any PII processing via foreign LLM providers (OpenAI, Anthropic, Google) — FZ-152 risk
What the audit includes
- Inventory of all LLM touchpoints: chatbots, RAG, embeddings, fine-tuning data, prompts
- Automated red-team per 10 OWASP categories — thousands of payloads
- Manual red-team — custom scenarios tailored to business specifics
- FZ-152 audit + remediation: data flow, anonymization, PII processing documents
- Report with prioritized remediation plan — not "here are the problems," but "here's the fix"
- Re-test after fix deployment — green report for DD / compliance
If AI is in production and red-team hasn't been run — it's already an incident waiting to happen
Fixed audit cost — ₽620k. Timeline 2.5 + 1 week. Report protected by NDA, format suitable for due diligence or ISO 27001 packages. Can start with a pre-audit (3 days) if the scope needs verification.