Governance - Moderation Workflow

Reviews move through a two-stage pipeline: every review is filtered automatically, and borderline or flagged content then goes to a human moderator. The pipeline is defined in lib/moderation/textFilter.ts and lib/moderation/flagging.ts. The automated pass sets the review's moderation_status; a moderator changes it via PATCH /api/reviews/[id]/moderate.
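As a rough illustration of the moderator call, the sketch below builds a PATCH request for the endpoint named above. Only the route and the moderation_status field name come from this page; the JSON body shape, the buildModerateRequest helper, and the ModeratorDecision type are hypothetical, so check the actual route handler before relying on them.

```typescript
// Hypothetical helper: only the route and the moderation_status name come from the docs.
type ModeratorDecision = "approved" | "rejected" | "quarantined";

interface ModerateRequest {
  url: string;
  init: { method: "PATCH"; headers: Record<string, string>; body: string };
}

function buildModerateRequest(reviewId: string, decision: ModeratorDecision): ModerateRequest {
  return {
    // Encode the id so unexpected characters cannot alter the path.
    url: `/api/reviews/${encodeURIComponent(reviewId)}/moderate`,
    init: {
      method: "PATCH",
      headers: { "Content-Type": "application/json" },
      // Assumed body shape: a single field carrying the new status.
      body: JSON.stringify({ moderation_status: decision }),
    },
  };
}
```

The returned url and init can be passed directly to fetch(url, init) from an authenticated admin client.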

Automated Filtering Pipeline

flowchart TD
    A([Review submitted]) --> B["textFilter.ts\nRegex pass"]
    B --> C{Hard block\ndetected?}
    C -- "PII / real name / phone\n/ email / social handle\n/ explicit content\n/ threat / doxxing" --> D["status = rejected\nReturns 422 to reviewer"]
    C -- Clean --> E["flagging.ts\nRisk score computed"]
    E --> F{flagged_score}
    F -- "> 0.7 high risk" --> G["status = quarantined\nAdmin alert"]
    F -- "0.3–0.7 medium" --> H["status = pending\nAdded to moderation queue"]
    F -- "< 0.3 low risk" --> I["status = approved\nPublished immediately"]
    G --> J{Moderator decision}
    H --> J
    J -- Approve --> I
    J -- Reject --> D
    J -- Quarantine-extend --> G
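The score branch in the flowchart can be sketched as a small mapping function. The thresholds (0.3 and 0.7) come from the diagram; the function name statusForScore, the assumption that flagged_score lies in [0, 1], and the choice to treat exactly 0.3 and 0.7 as medium risk are illustrative and may differ from what flagging.ts does.

```typescript
type ModerationStatus = "approved" | "pending" | "quarantined" | "rejected";

// Thresholds from the flowchart.
const LOW_RISK_BELOW = 0.3;
const HIGH_RISK_ABOVE = 0.7;

// Hypothetical helper: maps a risk score to the initial moderation status.
function statusForScore(flaggedScore: number): ModerationStatus {
  // Assumption: scores are normalized to [0, 1]; this check also rejects NaN.
  if (!(flaggedScore >= 0 && flaggedScore <= 1)) {
    throw new RangeError(`flagged_score out of range: ${flaggedScore}`);
  }
  if (flaggedScore > HIGH_RISK_ABOVE) return "quarantined"; // high risk, admin alert
  if (flaggedScore >= LOW_RISK_BELOW) return "pending";     // medium risk, moderation queue
  return "approved";                                        // low risk, published immediately
}
```

Keeping the thresholds as named constants makes it easier to tune them without touching the branching logic.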

Hard Block Rules (textFilter.ts)

| Category | Pattern Examples |
| --- | --- |
| Real full names | First + Last name pattern (/\b[A-Z][a-z]+ [A-Z][a-z]+\b/) |
| Phone numbers | US/intl formats, WhatsApp numbers |
| Email addresses | user@domain.tld patterns |
| Social handles | @username, t.me/, instagram.com/ |
| Workplace references | "works at", "works for", employer name patterns |
| Physical addresses | Street numbers + street name patterns |
| Explicit sexual content | Curated term list |
| Threats / harassment | Curated phrase list |
| Underage indicators | Age references below 18 |
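A minimal sketch of how a few of these rules could be applied as a regex pass. Only the full-name pattern is quoted from the table; the email, social handle, and workplace patterns are simplified stand-ins, and the HardBlockRule and findHardBlocks names are hypothetical rather than taken from textFilter.ts.

```typescript
interface HardBlockRule {
  category: string;
  pattern: RegExp;
}

// Patterns use no "g" flag, so RegExp.test() stays stateless between calls.
const RULES: HardBlockRule[] = [
  // Quoted from the rules table.
  { category: "Real full names", pattern: /\b[A-Z][a-z]+ [A-Z][a-z]+\b/ },
  // Simplified illustrative patterns (assumptions, not the production rules).
  { category: "Email addresses", pattern: /\b[\w.+-]+@[\w-]+(\.[\w-]+)+\b/ },
  { category: "Social handles", pattern: /(^|\s)@\w{2,}|t\.me\/|instagram\.com\//i },
  { category: "Workplace references", pattern: /\bworks (at|for)\b/i },
];

// Returns the categories whose pattern matches; an empty array means "clean".
function findHardBlocks(text: string): string[] {
  return RULES.filter((rule) => rule.pattern.test(text)).map((rule) => rule.category);
}

console.log(findHardBlocks("She works at the bakery"));
```

Note that the full-name pattern matches any two adjacent capitalized words, so phrases such as "New York" would also trigger it; a production filter would likely need an allowlist or extra context checks.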

Moderation SLA

| Status | Target Resolution | Owner |
| --- | --- | --- |
| Quarantined | < 24 hours | Admin |
| Pending (medium risk) | < 72 hours | Admin / Moderator |
| Removal requests | < 7 days | Admin |
| Abuse reports | < 48 hours | Admin |
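To make the targets concrete, the sketch below computes a resolution deadline from the time an item entered the queue. The hour values mirror the SLA table; the key names, resolutionDeadline, and isOverdue are hypothetical, and the sketch assumes the clock starts when the item is opened.

```typescript
// SLA targets in hours, mirroring the table (7 days = 168 hours).
const SLA_HOURS = {
  quarantined: 24,
  pending: 72,
  removal_request: 7 * 24,
  abuse_report: 48,
} as const;

type SlaItem = keyof typeof SLA_HOURS;

const MS_PER_HOUR = 60 * 60 * 1000;

// Hypothetical helper: the instant by which the item should be resolved.
function resolutionDeadline(kind: SlaItem, openedAt: Date): Date {
  return new Date(openedAt.getTime() + SLA_HOURS[kind] * MS_PER_HOUR);
}

// Targets are strict ("< 24 hours"), so reaching the deadline counts as overdue.
function isOverdue(kind: SlaItem, openedAt: Date, now: Date = new Date()): boolean {
  return now.getTime() >= resolutionDeadline(kind, openedAt).getTime();
}

const opened = new Date("2024-01-01T00:00:00Z");
console.log(resolutionDeadline("quarantined", opened).toISOString());
// prints 2024-01-02T00:00:00.000Z
```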

Moderation State Machine

stateDiagram-v2
    [*] --> approved: Review submitted (low risk)
    [*] --> pending: Review submitted (medium risk)
    [*] --> quarantined: Review submitted (high risk)
    [*] --> rejected: Hard block detected
    pending --> approved: Moderator approves
    pending --> rejected: Moderator rejects
    pending --> quarantined: Moderator escalates
    quarantined --> approved: Moderator clears
    quarantined --> rejected: Moderator rejects
    approved --> quarantined: New abuse report (re-review)
    approved --> rejected: Moderator reverses
    rejected --> [*]
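The transitions between statuses in the diagram can be encoded as a lookup table and checked before any update is applied. This is a sketch under assumptions: the TRANSITIONS table copies the diagram's edges between statuses, while canTransition and transition are hypothetical names, and the "Quarantine-extend" self-loop from the flowchart is left out because the state diagram does not list it.

```typescript
type ModerationStatus = "pending" | "quarantined" | "approved" | "rejected";

// Allowed moves between statuses, copied from the state diagram.
const TRANSITIONS: Record<ModerationStatus, readonly ModerationStatus[]> = {
  pending: ["approved", "rejected", "quarantined"],
  quarantined: ["approved", "rejected"],
  approved: ["quarantined", "rejected"],
  rejected: [], // terminal state
};

function canTransition(from: ModerationStatus, to: ModerationStatus): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// Returns the new status, or throws if the diagram does not allow the move.
function transition(from: ModerationStatus, to: ModerationStatus): ModerationStatus {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal moderation transition: ${from} -> ${to}`);
  }
  return to;
}
```

A handler for the moderate endpoint could call transition with the stored status and the requested one, and return an error response when it throws.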