Governance - Moderation Workflow
Every review goes through a two-stage pipeline: automated text filtering first, then
human moderator review for borderline or flagged content. The pipeline is defined in
lib/moderation/textFilter.ts and lib/moderation/flagging.ts. The automated stage sets
the review’s initial moderation_status; moderators change it afterwards via PATCH /api/reviews/[id]/moderate.
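As a rough illustration of the human pass, a moderator client might assemble the PATCH request as sketched here. Only the route comes from the text above; the `buildModerateRequest` helper, the `ModerationAction` values, and the JSON body shape (`action`, `note`) are assumptions made for this example, not the documented contract.

```typescript
// Hypothetical moderator actions; the real endpoint may accept different values.
type ModerationAction = "approve" | "reject" | "quarantine";

interface ModerateRequest {
  url: string;
  init: { method: "PATCH"; headers: Record<string, string>; body: string };
}

// Builds (but does not send) a request for PATCH /api/reviews/[id]/moderate.
// The JSON body shape is an assumption for illustration only.
function buildModerateRequest(
  reviewId: string,
  action: ModerationAction,
  note?: string
): ModerateRequest {
  return {
    url: `/api/reviews/${encodeURIComponent(reviewId)}/moderate`,
    init: {
      method: "PATCH",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ action, note: note ?? null }),
    },
  };
}

// Usage: hand the result to fetch, e.g. fetch(moderateReq.url, moderateReq.init).
const moderateReq = buildModerateRequest("rev_123", "approve");
```

Keeping request construction separate from sending makes the call easy to unit-test without a running server.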
Automated Filtering Pipeline
flowchart TD
A([Review submitted]) --> B["textFilter.ts\nRegex pass"]
B --> C{Hard block\ndetected?}
C -- "PII / real name / phone\n/ email / social handle\n/ explicit content\n/ threat / doxxing" --> D["status = rejected\nReturns 422 to reviewer"]
C -- Clean --> E["flagging.ts\nRisk score computed"]
E --> F{flagged_score}
F -- "> 0.7 high risk" --> G["status = quarantined\nAdmin alert"]
F -- "0.3–0.7 medium" --> H["status = pending\nAdded to moderation queue"]
F -- "< 0.3 low risk" --> I["status = approved\nPublished immediately"]
G --> J{Moderator decision}
H --> J
J -- Approve --> I
J -- Reject --> D
J -- Quarantine-extend --> G
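The score-routing step of the flowchart can be sketched as a small pure function. The thresholds (0.3 and 0.7) come from the diagram; the function name, the constant names, and the assumption that `flagged_score` lies in the range 0 to 1 are illustrative and may not match flagging.ts.

```typescript
type ModerationStatus = "approved" | "pending" | "quarantined" | "rejected";

// Thresholds taken from the flowchart; the constant names are illustrative.
const HIGH_RISK_THRESHOLD = 0.7;
const LOW_RISK_THRESHOLD = 0.3;

// Maps a risk score to the initial status of a review that has already
// passed the hard-block filter. Assumes scores are normalized to [0, 1].
function statusForScore(flaggedScore: number): ModerationStatus {
  if (!isFinite(flaggedScore) || flaggedScore < 0 || flaggedScore > 1) {
    throw new RangeError(`flagged_score out of range: ${flaggedScore}`);
  }
  if (flaggedScore > HIGH_RISK_THRESHOLD) return "quarantined"; // admin alert
  if (flaggedScore < LOW_RISK_THRESHOLD) return "approved"; // published immediately
  return "pending"; // 0.3 to 0.7 inclusive: moderation queue
}
```

With these comparisons, the boundary values 0.3 and 0.7 both fall into the medium band, which matches the "> 0.7" and "< 0.3" labels in the diagram.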
Hard Block Rules (textFilter.ts)
| Category | Pattern Examples |
|---|---|
| Real full names | First + Last name pattern (/\b[A-Z][a-z]+ [A-Z][a-z]+\b/) |
| Phone numbers | US/intl formats, WhatsApp numbers |
| Email addresses | user@domain.tld patterns |
| Social handles | @username, t.me/, instagram.com/ |
| Workplace references | “works at”, “works for”, employer name patterns |
| Physical addresses | Street numbers + street name patterns |
| Explicit sexual content | Curated term list |
| Threats / harassment | Curated phrase list |
| Underage indicators | Age references below 18 |
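A minimal sketch of how a regex pass over these categories could look. The full-name pattern is the one shown in the table; every other pattern, the category labels, and the `findHardBlocks` helper are simplified assumptions and not the actual contents of textFilter.ts.

```typescript
interface HardBlockRule {
  category: string;
  pattern: RegExp;
}

// Illustrative patterns only. None use the "g" flag, so RegExp.test is stateless.
const HARD_BLOCK_RULES: HardBlockRule[] = [
  // From the table above.
  { category: "real_full_name", pattern: /\b[A-Z][a-z]+ [A-Z][a-z]+\b/ },
  // Assumed, simplified patterns below.
  { category: "email", pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/ },
  { category: "phone", pattern: /\+?\d[\d\s().-]{7,}\d/ },
  { category: "social_handle", pattern: /(^|\s)@\w{2,}|t\.me\/|instagram\.com\//i },
  { category: "workplace", pattern: /\bworks (at|for)\b/i },
];

// Returns the category of every rule that matches the review text.
// An empty array means no hard block was detected.
function findHardBlocks(text: string): string[] {
  return HARD_BLOCK_RULES
    .filter((rule) => rule.pattern.test(text))
    .map((rule) => rule.category);
}
```

Note that the full-name pattern matches any two adjacent capitalized words, so phrases such as "New York" would also trigger it; a production filter would likely need an allowlist or additional context checks.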
Moderation SLA
| Item | Target Resolution | Owner |
|---|---|---|
| Quarantined | < 24 hours | Admin |
| Pending (medium risk) | < 72 hours | Admin / Moderator |
| Removal requests | < 7 days | Admin |
| Abuse reports | < 48 hours | Admin |
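The targets in this table translate directly into deadline arithmetic. The sketch below assumes the SLA clock starts when the item is opened; the `SlaItem` labels and both helper functions are hypothetical names chosen for this example.

```typescript
// Keys are illustrative labels for the rows of the SLA table.
type SlaItem = "quarantined" | "pending" | "removal_request" | "abuse_report";

// Target resolution windows in hours, transcribed from the table.
const SLA_HOURS: Record<SlaItem, number> = {
  quarantined: 24,
  pending: 72,
  removal_request: 7 * 24,
  abuse_report: 48,
};

const MS_PER_HOUR = 60 * 60 * 1000;

// Time by which the item should have been resolved.
function slaDeadline(item: SlaItem, openedAt: Date): Date {
  return new Date(openedAt.getTime() + SLA_HOURS[item] * MS_PER_HOUR);
}

// True once the target window has fully elapsed without resolution.
function isSlaBreached(item: SlaItem, openedAt: Date, now: Date = new Date()): boolean {
  return now.getTime() >= slaDeadline(item, openedAt).getTime();
}
```

Because the targets are written as strict upper bounds ("< 24 hours"), the sketch treats an item as breached at the exact moment the window ends.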
Moderation State Machine
stateDiagram-v2
[*] --> approved: Review submitted (low risk)
[*] --> pending: Review submitted (medium risk)
[*] --> quarantined: Review submitted (high risk)
[*] --> rejected: Hard block detected
pending --> approved: Moderator approves
pending --> rejected: Moderator rejects
pending --> quarantined: Moderator escalates
quarantined --> approved: Moderator clears
quarantined --> rejected: Moderator rejects
approved --> quarantined: New abuse report (re-review)
approved --> rejected: Moderator reverses
rejected --> [*]
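The moderator-driven transitions in the state diagram can be enforced with a lookup table, for example before the moderate endpoint persists a change. This is a sketch under that assumption; `ReviewState`, `TRANSITIONS`, `canTransition`, and `applyTransition` are illustrative names, not identifiers from the codebase.

```typescript
type ReviewState = "pending" | "quarantined" | "approved" | "rejected";

// Allowed transitions between existing states, transcribed from the diagram.
// Initial assignment at submission time is handled by the automated stage.
const TRANSITIONS: Record<ReviewState, ReviewState[]> = {
  pending: ["approved", "rejected", "quarantined"],
  quarantined: ["approved", "rejected"],
  approved: ["quarantined", "rejected"],
  rejected: [], // terminal state
};

function canTransition(from: ReviewState, to: ReviewState): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// Returns the new state, or throws if the diagram does not allow the move.
function applyTransition(from: ReviewState, to: ReviewState): ReviewState {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal moderation transition: ${from} -> ${to}`);
  }
  return to;
}
```

Encoding the diagram as data keeps the rules in one place, so a change to the workflow only requires editing the table.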