Case Study · Production scraper · Performance / Ad intelligence

GMBSPYLAB: FB Ad Library → Telegram feed

Production pipeline that collects ads from the FB Ad Library via Playwright, with residential proxies and cookie rotation. It replays GraphQL requests from logged-in sessions, fully bypassing Meta dev-app verification. Output: 30-75k new creatives/day across 30+ GEOs, posted to Telegram with one topic per GEO.

Type
Production scraper SaaS under NDA
Stack
Python · Playwright · aiogram 3 · Postgres · arq
Timeline
~2 months to P0 → active development
Scale
30+ GEOs · 30-75k creatives/day
01 · Pain Point

Existing spy services are expensive, slow, and miss the GEOs we need

Commercial spy services (AdHeart, AdSpy, AdLibrary aggregators) charge $300-800/mo per subscription and cover only a limited set of GEOs. For niche markets (Tier-2 EU, LATAM, APAC), data either doesn't exist or arrives with a 1-2 day lag.

Access to the Meta API through a developer app requires passing App Review and ID verification, which is effectively closed to most affiliate verticals. Meanwhile, the facebook.com/ads/library UI only works in a logged-in browser session, so static scraping is not an option.

Goal: collect a real-time stream of ads from 30+ GEOs in-house, into our own DB, with filtering by format (image/video/PWA) and active-spend duration, normalized tags, and posting to TG.

02 · Architecture

4-layer pipeline: collect → normalize → store → distribute

01
COLLECT

Playwright + cookie rotation + residential proxies

Cookies for 8 accounts (4 thick + 4 thin) are exported from Dolphin Anty / AdsPower into Playwright storage_state.json files. 12 paid residential proxies (Bright Data / IPRoyal) rotate with geographic binding to the target GEO.

For each session, headless Chromium opens facebook.com/ads/library, extracts the dynamic tokens (doc_id, fb_dtsg, lsd), and then sends POST requests directly to /api/graphql/, which is tens of times faster than DOM parsing.

02
RESILIENCE

Soft-block recovery + checkpoint detection + token refresh

Meta's doc_id changes roughly every 30 minutes; the pipeline re-extracts tokens automatically without interrupting the flow. On a checkpoint (CAPTCHA / 2FA prompt), the account is marked "cooldown" and parked in the pool until it recovers. Rate limits: at least 2.5 s between requests and at most ~250 POST/hour per account, for a total of ~2,000 requests/hour across the pool.
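The two per-account limits interact: a 2.5 s floor alone would allow up to 3600 / 2.5 = 1,440 requests/hour, so the ~250/hour ceiling is the binding constraint, and 8 accounts × 250 gives the ~2,000/hour pool figure. Below is a minimal sketch of a limiter that enforces both, assuming a sliding one-hour window and float timestamps passed in by the caller; `AccountRateLimiter` and `requests_in_first_hour` are hypothetical names, not the pipeline's code.

```python
from collections import deque


class AccountRateLimiter:
    """Hypothetical per-account limiter: a minimum gap between requests
    plus a cap on requests inside a sliding window (default: one hour).

    Timestamps are plain float seconds (e.g. from time.monotonic()), so
    the logic can be tested without actually sleeping.
    """

    def __init__(self, min_interval: float = 2.5, max_per_window: int = 250,
                 window: float = 3600.0) -> None:
        self.min_interval = min_interval
        self.max_per_window = max_per_window
        self.window = window
        self._sent: deque[float] = deque()  # send times still inside the window

    def _evict(self, now: float) -> None:
        # Forget requests that have slid out of the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()

    def earliest_next(self, now: float) -> float:
        """Earliest time at which the next request may be sent."""
        self._evict(now)
        t = now
        if self._sent:
            # Floor: keep at least min_interval after the previous request.
            t = max(t, self._sent[-1] + self.min_interval)
        if len(self._sent) >= self.max_per_window:
            # Ceiling: wait until the oldest request leaves the window.
            t = max(t, self._sent[0] + self.window)
        return t

    def record(self, at: float) -> None:
        """Register a request sent at time `at`."""
        self._sent.append(at)


def requests_in_first_hour(limiter: AccountRateLimiter) -> int:
    """Greedy simulation: send as early as the limiter allows for one hour."""
    now, sent = 0.0, 0
    while (t := limiter.earliest_next(now)) < 3600.0:
        limiter.record(t)
        now, sent = t, sent + 1
    return sent
```

Because the 2.5 s floor is much shorter than 3600 / 250 = 14.4 s, this limiter lets an account spend its hourly quota in a burst of about ten minutes (250 × 2.5 s = 625 s). Setting `min_interval=14.4` would spread the same 250 requests evenly across the hour instead.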

03
STORE

Postgres 16 + arq + MinIO for media

SQLAlchemy 2 (async) + asyncpg with alembic migrations, and indexes on advertiser-id × GEO × first-seen-date. arq on Redis runs background tasks (media download, dedup). Images and videos go to MinIO / R2 via aioboto3. Dedup uses a perceptual hash: up to 75k raw creatives collapse to 5-12k unique per day.
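The case study doesn't say which perceptual hash is used. As one illustration, here is a difference hash (dHash) with Hamming-distance dedup, assuming each creative has already been downscaled to a 9×8 grayscale grid (e.g. with Pillow, not shown). The function names and the distance threshold of 5 bits are hypothetical.

```python
def dhash(gray: list[list[int]]) -> int:
    """Difference hash of an 8-row x 9-column grayscale grid (0-255 values).

    Each bit records whether a pixel is brighter than its right-hand
    neighbour, giving 8 x 8 = 64 bits. Resizing the original image to
    9x8 grayscale is assumed to happen beforehand.
    """
    if len(gray) != 8 or any(len(row) != 9 for row in gray):
        raise ValueError("expected 8 rows of 9 pixels")
    bits = 0
    for row in gray:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | int(left > right)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def unique_creatives(hashes: dict[str, int], max_distance: int = 5) -> list[str]:
    """Keep a creative only if its hash is farther than max_distance bits
    from every creative already kept (greedy, first seen wins)."""
    kept: list[tuple[str, int]] = []
    for creative_id, h in hashes.items():
        if all(hamming(h, other) > max_distance for _, other in kept):
            kept.append((creative_id, h))
    return [creative_id for creative_id, _ in kept]
```

The greedy scan compares each new hash against every kept one, which is O(n²); at tens of thousands of creatives per day, a BK-tree or a similar metric index keeps near-duplicate lookups fast.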

04
DISTRIBUTE

Telegram via aiogram 3 — two surfaces

v1, forum supergroup: one topic per GEO, created with createForumTopic for each of the 30+ GEOs (EU-27 + UK + BR + TZ/MX/IN/CA/ZA/RS/...). Each ad is posted to its country's topic, which makes filtering by market easy.

v2, hashtag feed channel: a single stream tagged with #GEO #PWA #VIDEO #1DAY #FBPAGE. An activity-refresher cron bumps the #NDAY tag daily (i.e. "running for N days"). An affiliate-link rewriter replaces the CTA URL with our own partner link.

03 · Stack

Technologies

Collect / Browser
  • Playwright (Chromium headless)
  • Dolphin Anty / AdsPower (cookie export)
  • Bright Data / IPRoyal residential proxies
  • GraphQL POST replay (doc_id / fb_dtsg / lsd)
Backend
  • Python 3.12 + uv (package manager)
  • SQLAlchemy 2 async + asyncpg
  • PostgreSQL 16 + alembic migrations
  • Pydantic v2 for all schemas
Queues / Cron / Storage
  • arq + Redis 7 (background tasks)
  • APScheduler (scheduled cron jobs)
  • MinIO / Cloudflare R2 (aioboto3)
  • SQLite (cache, local state)
Telegram + alerts
  • aiogram 3 (Bot API)
  • Forum supergroup with createForumTopic
  • Hashtag-feed channel + N-day refresher
  • Healthcheck → Discord/TG alerts
Multi-source ingest
  • Playwright-FB (primary, logged-in)
  • ScrapeCreators (commercial fallback for non-EU)
  • Meta Graph API (for verified IDs)
  • EU DSA Repository (regulatory data)
Quality / monitoring
  • structlog (structured logs)
  • 16/16 tests passing (pytest)
  • Healthcheck probe + emergency halt
  • Discord webhooks for status
04 · Results

The outcome

Creatives per day
30-75k

after dedup: 5-12k unique

GEO coverage
30+

EU-27 + UK + BR + 30 non-EU markets

Pricing
$30-60

/mo all-in (proxies + VPS + R2), vs $300-800/mo for commercial spy services

Where this fits for clients: the same approach (Playwright + cookie pool + GraphQL replay + Telegram distribution) applies to any source with a logged-in web UI, such as Avito, WB Seller, Ozon Seller, OkCupid, LinkedIn Sales Navigator, or the TikTok / Google ad libraries. On a call we work out exactly which source you need and discuss the architecture under NDA.

Ready to start?

An audit for 5,000 ₽, with a concrete report and cost estimate

I'll tell you what to implement in your business first, what the payback will be, and whether your task needs AI at all (sometimes it doesn't).

Or just send me your question: I'll reply within 2 hours