GMBSPYLAB: FB Ad Library → Telegram feed
Production pipeline collecting ads from FB Ad Library via Playwright with residential proxies and cookie rotation. Replays GraphQL requests from logged-in sessions — fully bypasses Meta dev-app verification. 30-75k new creatives/day in 30+ GEOs, output to Telegram with topic-per-GEO.
Existing spy services: expensive, slow, don't cover the needed GEOs
Commercial spy services (AdHeart, AdSpy, AdLibrary aggregators) charge $300-800/mo for a subscription and cover a limited set of GEOs. For niche markets (Tier-2 EU, LATAM, APAC) data either doesn't exist or arrives with a 1-2 day lag.
Accessing the Meta API through a developer app requires App Review and ID verification, which is effectively closed to most affiliate verticals. The facebook.com/ads/library UI, meanwhile, only works in a logged-in browser session, so static scraping doesn't work.
Goal: collect a real-time stream of ads from 30+ GEOs in-house, into our own DB, with filtering by format (image/video/PWA) and active spend duration, tag normalization, and posting to Telegram.
4-layer pipeline: collect → normalize → store → distribute
Playwright + cookie rotation + residential proxies
8 cookie accounts (4 thick + 4 thin) are exported from Dolphin Anty / AdsPower into Playwright storage_state.json files. 12 paid residential proxies (Bright Data / IPRoyal) rotate, each bound geographically to the target GEO.
On each session, headless Chromium opens facebook.com/ads/library, extracts the dynamic tokens (doc_id, fb_dtsg, lsd), and then sends POST requests directly to /api/graphql/, which is tens of times faster than DOM parsing.
Soft-block recovery + checkpoint detection + token refresh
Meta's doc_id changes about once every 30 minutes; the pipeline re-extracts tokens automatically without interrupting the flow. On a checkpoint (CAPTCHA / 2FA prompt) the account is marked as "cooldown" and parked in the pool until it recovers. The floor rate is 2.5 s/request and the ceiling is ~250 POST/hour/account, which totals ~2000 requests/hour across the 8-account pool.
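The rate limits above can be sketched as a small per-account throttle. This is a minimal illustration, not the pipeline's actual code: the class name `AccountThrottle`, the rolling one-hour window, and the helper `pool_ceiling` are assumptions. Only the 2.5 s floor, the ~250/hour ceiling, and the 8-account pool come from the text.

```python
from collections import deque
from dataclasses import dataclass, field

MIN_INTERVAL_S = 2.5   # floor between requests (from the text)
MAX_PER_HOUR = 250     # per-account hourly ceiling (from the text)
POOL_SIZE = 8          # cookie accounts in the pool (from the text)


@dataclass
class AccountThrottle:
    """Hypothetical per-account limiter using a rolling one-hour window."""
    min_interval: float = MIN_INTERVAL_S
    max_per_hour: int = MAX_PER_HOUR
    sent: deque = field(default_factory=deque)  # timestamps of recent sends

    def next_allowed(self, now: float) -> float:
        """Return the earliest time (>= now) at which a request may be sent."""
        # Drop timestamps that have left the rolling one-hour window.
        while self.sent and self.sent[0] <= now - 3600:
            self.sent.popleft()
        earliest = now
        if self.sent:
            # Enforce the minimum spacing after the most recent request.
            earliest = max(earliest, self.sent[-1] + self.min_interval)
        if len(self.sent) >= self.max_per_hour:
            # Window is full: wait until the oldest request ages out.
            earliest = max(earliest, self.sent[0] + 3600)
        return earliest

    def record(self, when: float) -> None:
        """Register that a request was sent at time `when`."""
        self.sent.append(when)


def pool_ceiling(accounts: int = POOL_SIZE, per_account: int = MAX_PER_HOUR) -> int:
    """Upper bound on requests/hour when every account is healthy."""
    return accounts * per_account
```

With all 8 accounts healthy, `pool_ceiling()` gives the ~2000 requests/hour quoted above; each account in cooldown lowers that bound by 250.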
Postgres 16 + arq + MinIO for media
SQLAlchemy 2 async + asyncpg + alembic migrations, with indexes on advertiser-id × GEO × first-seen-date. arq on Redis handles background tasks (media download, dedup). Images and videos go to MinIO / R2 via aioboto3. Dedup via perceptual hash reduces up to 75k raw creatives to 5-12k unique per day.
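To illustrate the dedup step, here is a minimal sketch of perceptual-hash deduplication. The text does not say which hash variant is used, so this assumes a difference hash (dHash) computed on an already-downscaled grayscale grid. The function names and the `max_distance` threshold are hypothetical, and in practice the grid would come from an image library resize.

```python
from typing import Iterable

Grid = list[list[int]]  # grayscale pixels, rows x cols, values 0-255


def dhash(grid: Grid) -> int:
    """Difference hash: one bit per horizontally adjacent pixel pair."""
    bits = 0
    for row in grid:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def dedup(hashes: Iterable[tuple[str, int]], max_distance: int = 4) -> list[str]:
    """Keep the first creative of each near-duplicate cluster.

    Quadratic in the number of kept items; a real pipeline would likely
    use an index (assumption), but the comparison rule is the same.
    """
    kept: list[tuple[str, int]] = []
    for ad_id, h in hashes:
        if all(hamming(h, seen) > max_distance for _, seen in kept):
            kept.append((ad_id, h))
    return [ad_id for ad_id, _ in kept]
```

Because dHash encodes only whether brightness rises or falls between neighbouring pixels, a re-encoded or slightly brightened copy of the same creative tends to produce the same or a nearby hash, which is what lets the raw stream collapse to a much smaller unique set.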
Telegram via aiogram 3 — two surfaces
v1 (Forum supergroup): createForumTopic is called for each of the 30+ GEOs (EU-27 + UK + BR + TZ/MX/IN/CA/ZA/RS/...). Posts go to a topic per country, so it is easy to filter by market.
v2 (Hashtag feed channel): a single stream, tagged with #GEO #PWA #VIDEO #1DAY #FBPAGE.
The activity-refresher cron bumps #NDAY daily (shows "running for N days"). The affiliate link rewriter replaces the CTA URL with our own partner link.
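The hashtag scheme for the v2 feed can be sketched as a pure function over an ad record. The `Ad` fields and function names are hypothetical. Counting the first day as day 1 is an assumption inferred from the #1DAY tag, and #FBPAGE is left out because the text does not say when it applies.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Ad:
    geo: str          # country code, e.g. "BR"
    media_type: str   # "image" or "video"
    is_pwa: bool
    first_seen: date


def days_running(ad: Ad, today: date) -> int:
    """Count the first day as day 1, matching a #1DAY tag on a new ad."""
    return (today - ad.first_seen).days + 1


def build_tags(ad: Ad, today: date) -> list[str]:
    """Build the hashtag line for one post in the feed channel."""
    tags = [f"#{ad.geo.upper()}"]
    if ad.is_pwa:
        tags.append("#PWA")
    if ad.media_type == "video":
        tags.append("#VIDEO")
    tags.append(f"#{days_running(ad, today)}DAY")
    return tags
```

A daily refresher job would call `build_tags` again with the current date and edit the existing post, so only the #NDAY tag changes while the ad stays active.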
Technologies
- Playwright (Chromium headless)
- Dolphin Anty / AdsPower (cookie export)
- Bright Data / IPRoyal residential proxies
- GraphQL POST replay (doc_id / fb_dtsg / lsd)
- Python 3.12 + uv (package manager)
- SQLAlchemy 2 async + asyncpg
- PostgreSQL 16 + alembic migrations
- Pydantic v2 for all schemas
- arq + Redis 7 (background tasks)
- APScheduler (timer cron)
- MinIO / Cloudflare R2 (aioboto3)
- SQLite (cache, local state)
- aiogram 3 (Bot API)
- Forum supergroup with createForumTopic
- Hashtag-feed channel + N-day refresher
- Healthcheck → Discord/TG alerts
- Playwright-FB (primary, logged-in)
- ScrapeCreators (commercial fallback non-EU)
- Meta Graph API (for verified IDs)
- EU DSA Repository (regulatory data)
- structlog (structured logs)
- 16/16 tests passing (pytest)
- Healthcheck probe + emergency halt
- Discord webhooks for status
The outcome
after dedup: 5-12k unique
EU-27 + UK + BR + 30 non-EU markets
/mo inclusive — proxies + VPS + R2. vs $300-800 from commercial spy services
Where it fits for clients: the same approach (Playwright + cookie pool + GraphQL replay + Telegram distribution) applies to any source with a logged-in web UI — Avito, WB Seller, Ozon Seller, OkCupid, LinkedIn Sales Navigator, TikTok / Google ad libraries. On a call we work out which exact source you need and discuss architecture under NDA.
Audit for 5,000 ₽, with a concrete report and cost estimate
I'll tell you what to implement in your business first, what the payback will be, and whether your task needs AI at all (sometimes it doesn't).
Or just send me your question and I'll reply within 2 hours