ML Confidence Scores

FenLiu uses a trained XGBoost classifier to predict whether a post is likely to be approved or rejected. The model learns from your review history and provides a second opinion alongside the existing spam score.

Overview

Every new post entering the review queue is scored by the ML model at fetch time. The result is a confidence score between 0.0 and 1.0:

  • 1.0 — model is highly confident the post will be approved
  • 0.5 — model is uncertain (50/50)
  • 0.0 — model is highly confident the post will be rejected

The score appears as a badge in the ML column of the review table.

Reading the ML Badge

| Badge | Meaning |
| --- | --- |
| 0.82 ✓ (green) | Model and spam score agree: both suggest approve |
| 0.14 ✓ (red) | Model and spam score agree: both suggest reject |
| 0.73 ⚠ (green) | Model says approve, spam score says reject: they disagree |
| 0.21 ⚠ (red) | Model says reject, spam score says approve: they disagree |
| 0.52 ? (amber, pulsing) | Uncertain zone (0.4–0.6): your judgement matters most here |
| — | Post was not scored (no model loaded, or pre-Phase C post) |

The ⚠ badge is worth a second look: it means the two signals point in different directions. Neither signal is definitive, but the disagreement itself is useful information.

The pulsing amber badge marks the uncertain zone (confidence 0.4–0.6). These are the posts where the model has the least confidence and where a human decision adds the most value. By default, the queue sorts these to the top.
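The badge rules above can be sketched as a small function. This is an illustration only, not FenLiu's actual code: the function name, the inclusive handling of the 0.4 and 0.6 boundaries, and in particular the spam-score cut-off at which the spam score "suggests reject" are assumptions, since the page does not state them.

```python
from typing import Optional

UNCERTAIN_LOW = 0.4   # lower edge of the uncertain zone
UNCERTAIN_HIGH = 0.6  # upper edge of the uncertain zone

# ASSUMPTION: hypothetical cut-off; the real spam-score threshold is not documented here.
ASSUMED_SPAM_REJECT_THRESHOLD = 50.0


def ml_badge(confidence: Optional[float], spam_score: float,
             spam_reject_threshold: float = ASSUMED_SPAM_REJECT_THRESHOLD) -> str:
    """Describe the badge for one post (hypothetical helper)."""
    if confidence is None:
        # No model loaded, or the post predates ML scoring.
        return "not scored"
    if UNCERTAIN_LOW <= confidence <= UNCERTAIN_HIGH:
        # Uncertain zone: shown as the pulsing amber badge.
        return f"{confidence:.2f} ? (amber)"
    model_approves = confidence > 0.5
    spam_approves = spam_score < spam_reject_threshold
    mark = "✓" if model_approves == spam_approves else "⚠"
    colour = "green" if model_approves else "red"
    return f"{confidence:.2f} {mark} ({colour})"
```

With the assumed threshold, `ml_badge(0.73, 90)` yields a green ⚠ badge because the model leans approve while the spam score leans reject.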

Queue Ordering

When the ML model is loaded, the review queue defaults to uncertainty-first ordering: posts with confidence closest to 0.5 appear first. This surfaces the posts where your review makes the most difference and lets you handle clear-cut cases (high or low confidence) last.

You can change the sort order using the column headers:

| Sort | Behaviour |
| --- | --- |
| ML ↕ | Uncertainty-first (default when model loaded); most uncertain at top |
| Score | Spam score descending |
| Date | Newest first |
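Uncertainty-first ordering amounts to sorting by distance from 0.5. The sketch below illustrates the idea; the `ml_confidence` field name and the choice to place unscored posts last are assumptions, not documented FenLiu behaviour.

```python
from typing import Any, Dict, List, Optional


def uncertainty_key(confidence: Optional[float]) -> float:
    """Distance from 0.5; smaller means more uncertain."""
    if confidence is None:
        # ASSUMPTION: unscored posts sort after all scored posts.
        return float("inf")
    return abs(confidence - 0.5)


def uncertainty_first(posts: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return posts ordered with the most uncertain confidence first."""
    return sorted(posts, key=lambda p: uncertainty_key(p["ml_confidence"]))


queue = [
    {"id": 1, "ml_confidence": 0.82},
    {"id": 2, "ml_confidence": 0.52},
    {"id": 3, "ml_confidence": 0.14},
    {"id": 4, "ml_confidence": None},
    {"id": 5, "ml_confidence": 0.45},
]
ordered_ids = [p["id"] for p in uncertainty_first(queue)]
```

Here the posts scored 0.52 and 0.45 come first, and the clear-cut 0.82 and 0.14 posts follow.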

Filtering by ML Confidence

The filter bar includes an ML Confidence dropdown when the model is loaded. Choose one of three presets, or All to leave the ML filter off:

| Preset | Range | Use when |
| --- | --- | --- |
| Likely Approve | ≥ 0.7 | You want to batch-check posts the model is confident about approving |
| Unsure | 0.4–0.6 | You want to focus on posts where the model has no strong opinion |
| Likely Reject | ≤ 0.3 | You want to batch-check posts the model is confident about rejecting |
| All | | No ML filter (default) |

Posts with no ML score (—) always pass through the filter regardless of which preset is selected.

The ML confidence filter and spam score filter are independent — you can combine them. For example, filter to Unsure with a spam score of 20–50 to isolate borderline posts where both signals are ambiguous.

Note: there is an intentional gap between the presets. Scores in the 0.3–0.4 and 0.6–0.7 ranges represent a mild lean rather than a confident prediction, so no preset matches them. These posts appear only in the unfiltered (All) view.
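The preset ranges, the pass-through rule for unscored posts, and the gap between presets can be expressed in a few lines. This is a sketch under stated assumptions: the preset keys and function name are hypothetical, and the range boundaries are treated as inclusive.

```python
from typing import Dict, Optional, Tuple

# Preset ranges as documented; the key names are hypothetical.
PRESETS: Dict[str, Tuple[float, float]] = {
    "likely_approve": (0.7, 1.0),
    "unsure": (0.4, 0.6),
    "likely_reject": (0.0, 0.3),
}


def passes_ml_filter(confidence: Optional[float], preset: Optional[str]) -> bool:
    """Return True if a post should be shown under the given preset.

    preset=None stands for "All" (no ML filter).
    """
    if preset is None:
        return True
    if confidence is None:
        # Unscored posts always pass, whichever preset is selected.
        return True
    low, high = PRESETS[preset]
    return low <= confidence <= high
```

A post scored 0.35 falls in the gap: it fails all three presets and is shown only when no ML filter is applied.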

What the Model Knows (and Doesn't)

What it uses:

  • Post content (TF-IDF text features)
  • Spam score at fetch time
  • Hashtag count, attachment count, video flag
  • Engagement (boosts, likes, replies)
  • Author bot flag
  • Fediverse instance

What it cannot see:

  • Image or video content (only that attachments exist)
  • Account history beyond author_is_bot
  • Whether a photo is high quality

This gap is most visible on cat hashtag streams, where rejection is often about image quality rather than spam. The model will score anything that looks like legitimate cat content as a likely approve; it cannot evaluate whether the photo is good.

Model Not Loaded

If the ML model has not been trained yet (or ML_MODELS_DIR points to an empty directory), a yellow notice appears above the review queue:

ML model not loaded — run fenliu-train fit to enable confidence scores. Confidence column will show — until then.

To train and enable the model:

# Inside the container
podman exec -it <container> fenliu-train fit --output-dir /app/data/models

After the next container restart, confidence scores will appear for all newly fetched posts. Existing posts in the queue will continue to show —.

Model Training and Updates

The model retrains automatically every night at 02:00 and the new version is loaded on the next container restart. See Automated Scheduling — ML Retraining for full details.

You can also retrain on demand at any time:

podman exec -it <container> fenliu-train fit --output-dir /app/data/models

The new artifacts take effect on the next restart.

Accuracy and Limitations

Based on the 110-day analysis (July 2026, 39k reviews):

Metric Value
Cross-validated accuracy 86.1% (±2.8%)
Agreement rate with human decisions 87.5%
Margin over spam-score-only baseline +24.9 pp

The model handles clear-cut cases (confident spam, obvious quality content) very well. Disagreements are almost entirely in the spam score 0–50 range — posts where the spam scorer is itself uncertain.

False positives (model approves, you reject) are most common on visual-heavy streams like #catsofmastodon and #caturday, where you're rejecting for photo quality rather than spam. The model cannot evaluate image quality.

The ML score is a second opinion, not a decision. You always make the final call.