Ghost Writer Documentation
Complete reference for the Ghost Protocol methodology, 5-stage pipeline, 40-point QA system, platform specs, voice profiles, detection evasion, and API integration.
1. Ghost Protocol Methodology
Ghost Writer operates under three core laws that govern every piece of content:
- Controlled Chaos — Deliberate imperfection in every piece. Sentence fragments, unexpected metaphors, paragraphs that run long because the thought demands it. Never uniform.
- Signature Voice — Every content type gets a specific voice profile. The engine writes as a defined persona with consistent tone, vocabulary, and structural habits.
- Invisible Architecture — Detection evasion is baked into generation, not bolted on. The writing passes because of how it's constructed, not because of post-processing.
2. 5-Stage Pipeline
Every piece of content flows through five sequential stages:
- Profile — Load voice profile, set perplexity/burstiness targets, identify platform constraints.
- Writer — GPT-5.4 generation with Ghost Protocol system prompt.
- QA Engine — 40-point check across 10 blocks.
- Adapter — Format for target platform (18 supported).
- Polish — Human-pass simulation with 2–3 small edits.
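The hand-off between stages can be sketched as a simple function chain. The stage bodies below are placeholders that only record ordering (the real Writer and QA stages are model calls, and the names are illustrative, not the engine's internals):

```python
from dataclasses import dataclass, field

@dataclass
class Draft:
    """State passed between stages (fields are illustrative, not the real engine's)."""
    topic: str
    platform: str
    voice: str
    text: str = ""
    log: list = field(default_factory=list)

# Placeholder stages: each appends its name so the ordering is visible.
def profile(d):  d.log.append("profile");  return d
def writer(d):   d.log.append("writer");   d.text = f"[draft about {d.topic}]"; return d
def qa(d):       d.log.append("qa");       return d
def adapter(d):  d.log.append("adapter");  return d
def polish(d):   d.log.append("polish");   return d

PIPELINE = [profile, writer, qa, adapter, polish]

def run_pipeline(topic: str, platform: str, voice: str) -> Draft:
    """Run a draft through the five stages, in order."""
    draft = Draft(topic=topic, platform=platform, voice=voice)
    for stage in PIPELINE:
        draft = stage(draft)
    return draft
```

Because every stage takes and returns a `Draft`, stages can be swapped or re-run individually without touching the rest of the chain.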
3. The 40-Point QA System
Every piece is validated against 40 checks organized into 10 blocks. Hard checks must pass; soft checks inform quality scoring.
Block A: Statistical (#1–7)
| ID | Name | Type | Target | Description |
|---|---|---|---|---|
| #1 | Sentence Length Variance | Hard | stdev ≥ 5 | Sentence length standard deviation must meet minimum for burstiness. |
| #2 | Vocabulary Richness TTR | Soft | ≥ 0.45 | Type-token ratio indicates lexical diversity. |
| #3 | Hapax Legomena Ratio | Soft | ≥ 0.25 | Ratio of words used once to total unique words. |
| #4 | Average Sentence Length | Soft | 8–25 words | Within human-typical range. |
| #5 | Short Sentence Presence | Hard | ≥ 1 sentence ≤ 5 words | At least one short sentence or fragment. |
| #6 | Long Sentence Presence | Soft | ≥ 1 sentence ≥ 25 words | At least one longer, complex sentence. |
| #7 | N-gram Diversity | Soft | Varied distribution | Token distribution should not be overly predictable. |
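The Block A thresholds can be computed with nothing beyond the standard library. The sentence and word splitters below are naive stand-ins for the engine's real tokenizer:

```python
import re
import statistics

def sentences(text: str) -> list[str]:
    """Naive splitter on ., !, ? (a production engine would use a real tokenizer)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]

def words(text: str) -> list[str]:
    return re.findall(r"[A-Za-z']+", text.lower())

def block_a_metrics(text: str) -> dict:
    sents = sentences(text)
    lengths = [len(words(s)) for s in sents]
    toks = words(text)
    counts = {}
    for t in toks:
        counts[t] = counts.get(t, 0) + 1
    unique = len(counts)
    hapax = sum(1 for c in counts.values() if c == 1)
    return {
        "sentence_stdev": statistics.stdev(lengths) if len(lengths) > 1 else 0.0,  # check #1: >= 5
        "ttr": unique / len(toks) if toks else 0.0,                                # check #2: >= 0.45
        "hapax_ratio": hapax / unique if unique else 0.0,                          # check #3: >= 0.25
        "avg_sentence_len": statistics.mean(lengths) if lengths else 0.0,          # check #4: 8-25
        "has_short": any(n <= 5 for n in lengths),                                 # check #5 (hard)
        "has_long": any(n >= 25 for n in lengths),                                 # check #6
    }
```

Note that `statistics.stdev` is the sample standard deviation, which only exists for two or more sentences; single-sentence inputs fall back to 0.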
Block B: Classifier Resistance (#8–12)
| ID | Name | Type | Target | Description |
|---|---|---|---|---|
| #8 | Conjunction Starters | Hard | ≥ 1 paragraph | At least one paragraph starts with And/But/So. |
| #9 | Fragment Usage | Soft | Contains fragments | Content includes sentence fragments. |
| #10 | Parenthetical Asides | Soft | Contains () or — | Parentheticals or em-dashes present. |
| #11 | Temperature Variance | Soft | 0.85–0.95 | Generation temperature at creation. |
| #12 | Model Attribution Defense | Soft | Varied patterns | Patterns that resist model-specific attribution. |
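Checks #8–#10 reduce to plain pattern matches. The fragment test below is a crude length proxy, and #11–#12 are generation-time properties that cannot be verified from the finished text alone:

```python
import re

def block_b_checks(text: str) -> dict:
    """Text-level pattern checks #8-#10 (sketch, not the engine's exact rules)."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    sents = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        # #8 (hard): at least one paragraph opens with And/But/So.
        "conjunction_starter": any(re.match(r"(And|But|So)\b", p) for p in paragraphs),
        # #9: crude fragment proxy -- any "sentence" of four words or fewer.
        "has_fragment": any(len(s.split()) <= 4 for s in sents),
        # #10: parentheticals or em-dashes present.
        "has_aside": bool(re.search(r"\([^)]*\)|—", text)),
    }
```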
Block C: Linguistic (#13–18)
| ID | Name | Type | Target | Description |
|---|---|---|---|---|
| #13 | Phrase Blacklist | Hard | 0 hits | Zero hits from 120+ banned AI-detectable phrases. |
| #14 | Lexical Diversity | Soft | TTR ≥ 0.50 | Vocabulary richness threshold. |
| #15 | Readability Variance | Soft | Flesch-Kincaid 20–100 | Readability score within range. |
| #16 | Syntactic Variety | Soft | stdev ≥ 4 | Sentence structure variation. |
| #17 | Emotional Authenticity | Soft | Voice-driven | Tone matches voice profile. |
| #18 | Metaphor/Analogy Presence | Soft | ≥ 1 | At least one metaphor or analogy. |
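A sketch of the #13 scan. The four phrases below are illustrative stand-ins, not the actual 120+ entry list:

```python
# Illustrative stand-ins only; the production blacklist has 120+ entries.
PHRASE_BLACKLIST = [
    "delve into",
    "in today's fast-paced world",
    "it's important to note",
    "game-changer",
]

def blacklist_hits(text: str) -> list[str]:
    """Return every banned phrase found in the text (check #13 requires zero hits)."""
    lowered = text.lower()
    return [p for p in PHRASE_BLACKLIST if p in lowered]
```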
Block D: Watermark (#19–20)
| ID | Name | Type | Target | Description |
|---|---|---|---|---|
| #19 | Unicode Normalization | Hard | Clean | No invisible characters or watermark artifacts. |
| #20 | Metadata Clean | Hard | None | No embedded metadata or hidden markers. |
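Checks #19–#20 reduce to scanning for invisible Unicode "format" characters and confirming the text is NFC-normalized. A minimal sketch:

```python
import unicodedata

def watermark_scan(text: str) -> dict:
    """Checks #19-#20: flag invisible characters and non-NFC sequences.

    Category Cf ("format") covers zero-width spaces/joiners, the BOM,
    and bidi controls -- the usual carriers for text watermarks.
    """
    invisible = [c for c in text if unicodedata.category(c) == "Cf"]
    return {
        "clean": not invisible and unicodedata.is_normalized("NFC", text),
        "invisible_chars": [f"U+{ord(c):04X}" for c in invisible],
    }
```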
Block E: Scoring (#21–25)
Confidence targeting, sentence-level clean, plagiarism check, anti-humanizer resistance, language authenticity.
Block F: Bias (#26–28)
Non-native bias clear, domain patterns validated, length optimization.
Block G: Adversarial (#29–31)
Pattern diversity, translation proof, authorship consistency.
Block H: Infrastructure (#32–34)
Multi-detector validation, plain text normalization, platform compliance.
Block I: Evaluation (#35–37)
Third-party benchmark, FPR exploitation clear, AI-assisted classification.
Block J: Governance (#38–40)
Disclosure compliance, audit trail, provenance proof.
4. Platform Specs
All 18 supported platforms with character limits, truncation rules, format, and best practices.
| Platform | Max Chars | Truncation | Format | Best Length | Key Rules |
|---|---|---|---|---|---|
| LinkedIn | 3,000 | 140 mobile | plain | 300–1200 | 3 hashtags max, line breaks only |
| X/Twitter | 280 / 25K premium | — | plain | 200–280 | Thread format (1/n) |
| Reddit | 40,000 | — | markdown | 400–1500 words | TL;DR for >300 words |
| Instagram | 2,200 | 125 | plain | 125–500 | 3–5 hashtags, emojis = 2 chars |
| Facebook | 63,206 | 125 mobile | plain | 40–250 | Front-load message |
| TikTok | 4,000 | 100 | plain | 100–300 | Hook-first |
| YouTube | 5,000 | 200 | plain | 200–1000 | Timestamps, chapters |
| Substack | unlimited | — | html/markdown | 800–3000 words | H2/H3, pull quotes |
| Email | — | — | html+plain | 50–300 words | subject < 60, preheader < 90 |
| Blog | unlimited | — | markdown | 800–2500 words | H2/H3, meta < 160 |
| White Paper | unlimited | — | markdown | 2000–5000 words | Exec summary, citations |
| Threads | 500 | 500 | plain | 100–500 | Complete thought |
| Medium | unlimited | — | markdown | 800–2500 words | 5 tags, subtitle |
| Pinterest | 500 | — | plain | 100–300 | Keyword-rich, no hashtag spam |
| GBP | 1,500 | — | plain | 150–300 | CTA button types |
| Website | unlimited | — | html | 300–800 words | Conversion copy |
| Reply | context-matched | — | context-matched | 50–200 words | Acknowledge + answer |
| Reddit Comment | 10,000 | — | markdown | 50–300 words | Conversational |
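The Adapter stage can validate a draft against these specs before formatting. The dict below mirrors three character-based rows of the table (word-count rows omitted), and the function names are illustrative:

```python
# Three rows from the spec table above (character-based platforms only).
PLATFORM_SPECS = {
    "linkedin": {"max_chars": 3000, "best": (300, 1200)},
    "x":        {"max_chars": 280,  "best": (200, 280)},
    "tiktok":   {"max_chars": 4000, "best": (100, 300)},
}

def check_platform_fit(platform: str, text: str) -> dict:
    """Report whether a draft fits the platform's hard limit and best-length range."""
    spec = PLATFORM_SPECS[platform]
    n = len(text)
    lo, hi = spec["best"]
    return {
        "within_limit": n <= spec["max_chars"],
        "in_best_range": lo <= n <= hi,
        "chars": n,
    }
```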
5. Voice Profiles
Four built-in profiles plus custom import:
| Profile | Style | Avg Words | Stdev |
|---|---|---|---|
| john-williams | Direct, opinionated, coaching analogies | 16 | 9 |
| agency | Professional, data-driven | 18 | 7 |
| technical | Precise, specification-heavy | 14 | 6 |
| casual | Conversational, fragment-heavy | 12 | 11 |
Voice Import
Paste 2–5 writing samples → engine analyzes sentence length, vocabulary, structure, tone → extracts fingerprint → saves as custom profile.
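The analysis step might reduce to something like the following sketch, which extracts the two sentence-length statistics and the TTR shown in the profiles table; the real engine also captures structure and tone:

```python
import re
import statistics

def extract_fingerprint(samples: list[str]) -> dict:
    """Reduce 2-5 writing samples to sentence-length stats and TTR (sketch only)."""
    text = " ".join(samples)
    sents = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sents]
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    return {
        "sentence_length": {
            "mean": round(statistics.mean(lengths), 1),
            "stdev": round(statistics.stdev(lengths), 1) if len(lengths) > 1 else 0.0,
        },
        "vocabulary": {
            "ttr": round(len(set(tokens)) / len(tokens), 2) if tokens else 0.0,
        },
    }
```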
6. Detection Methodology
How each detector works and how Ghost Writer defeats it:
GPTZero
Scores perplexity and burstiness alongside 7 other indicators. We target perplexity above 30 and burstiness stdev above 5, and inject fragments and unexpected word choices.
Pangram v3
Classifies as AI / AI-Assisted / Human with model attribution. We vary token distribution patterns and use voice-specific vocabulary.
Originality.ai v2
Advertises 99% accuracy and catches paraphrased AI text. We generate human-patterned text from scratch rather than paraphrasing AI output.
7. API Reference
POST /api/writing-agent — Generate content

Request:

```json
{
  "platform": "linkedin",
  "voice": "john-williams",
  "topic": "Why PMax works better with brand campaigns",
  "context": "B2B SaaS audience",
  "length": "500"
}
```

Response:

```json
{
  "content": "...",
  "platform": "linkedin",
  "checks": { "passed": 38, "failed": 2, "details": [...] }
}
```
POST /api/writing-agent-check — Check existing text

Request:

```json
{
  "text": "Your existing content to analyze..."
}
```

Response:

```json
{
  "readability": { "fleschKincaid": 65, "grade": "8th grade", ... },
  "tone": ["confident", "direct"],
  "aiScore": 0.12,
  "suggestions": [...]
}
```
POST /api/writing-agent-voice — Voice profile management

Request (analyze samples):

```json
{
  "action": "analyze",
  "samples": ["Sample 1...", "Sample 2...", "Sample 3..."]
}
```

Response:

```json
{
  "fingerprint": {
    "sentenceLength": { "mean": 16, "stdev": 9, ... },
    "vocabulary": { "ttr": 0.52, "domainTerms": [...], ... }
  }
}
```
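A minimal Python client for the generate endpoint. `BASE_URL` and the helper names are assumptions; the path and payload fields come from the reference above:

```python
import json
import urllib.request

BASE_URL = "http://localhost:3000"  # hypothetical host; point at your deployment

def build_payload(platform: str, voice: str, topic: str,
                  context: str = "", length: str = "500") -> dict:
    """Assemble the request body shown in the reference above."""
    return {"platform": platform, "voice": voice, "topic": topic,
            "context": context, "length": length}

def generate_content(**kwargs) -> dict:
    """POST to /api/writing-agent and return the parsed JSON response."""
    req = urllib.request.Request(
        BASE_URL + "/api/writing-agent",
        data=json.dumps(build_payload(**kwargs)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```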
8. Research & Citations
- Mitchell et al. (2023) — DetectGPT
- Bao et al. (2023) — Fast-DetectGPT
- Liang et al. (2023) — GPT Detectors Biased Against Non-Native Writers
- Hans et al. (2024) — Binoculars
- MASH (2026) — Style Humanization
- AuthorMist (2025) — RL-Based Evasion