Automated Stream Scheduling¶

FenLiu can automatically fetch new posts from your monitored hashtag streams on a configurable schedule. This eliminates the need to manually click the "Fetch" button for each stream.

Overview¶

Once enabled, the background scheduler will:

Fetch posts from active streams at regular intervals
Apply auto-reject filters if configured
Adapt fetch intervals based on stream activity (busier streams get checked more frequently)
Run continuously in the background without requiring manual intervention

Enabling Scheduling¶

By default, scheduling is enabled for all new streams with a fetch interval of 60 minutes (1 hour).

Via API¶

To get the current schedule for a stream:

curl -X GET http://localhost:8000/v1/hashtags/{stream_id}/schedule \
  -H "X-API-Key: your-api-key"

Response:

{
  "stream_id": 1,
  "hashtag": "photography",
  "enable_scheduling": true,
  "fetch_interval_minutes": 60,
  "next_scheduled_fetch": "2026-03-06T10:30:00+00:00",
  "last_check": "2026-03-06T09:30:00+00:00"
}

To update the schedule:

curl -X PATCH http://localhost:8000/v1/hashtags/{stream_id}/schedule \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "enable_scheduling": true,
    "fetch_interval_minutes": 30
  }'
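The same update can be issued from a script. The following is a minimal sketch using only the Python standard library; the helper name `build_schedule_patch` is hypothetical, and the base URL, stream ID, and API key are the placeholder values from the curl example. It builds the request without sending it, so it can be inspected first.

```python
import json
import urllib.request


def build_schedule_patch(base_url, stream_id, api_key, **fields):
    """Build (but do not send) a PATCH request for a stream's schedule.

    Hypothetical helper: only the fields passed in are included in the body,
    mirroring the documented rule that omitted fields stay unchanged.
    """
    body = json.dumps(fields).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/hashtags/{stream_id}/schedule",
        data=body,
        method="PATCH",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


request = build_schedule_patch(
    "http://localhost:8000",
    1,
    "your-api-key",
    enable_scheduling=True,
    fetch_interval_minutes=30,
)

# To send it against a running instance:
#   with urllib.request.urlopen(request) as response:
#       print(json.load(response))
```

Keeping request construction separate from sending makes it easy to log or test the payload before it reaches the server.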

Configuration¶

Fetch Interval¶

The fetch_interval_minutes controls how often the stream is checked for new posts.

Minimum: 5 minutes
Maximum: 1440 minutes (24 hours)
Default: 60 minutes (1 hour)

Shorter intervals mean more frequent checks but consume more API quota from the Fediverse instance. Adjust based on:

How active the hashtag community is
Your API rate limits
How current you need the posts to be

Enabling/Disabling¶

Set enable_scheduling to false to temporarily disable scheduled fetches for a stream without deleting it. This is useful when:

Troubleshooting issues
Temporarily pausing a stream
Reducing API load

Adaptive Scheduling¶

FenLiu includes smart adaptive scheduling that automatically adjusts fetch intervals based on activity:

Busy streams (>10 posts per fetch): Interval reduced by 20% (minimum 15 minutes)
Quiet streams (<2 posts per fetch): Interval increased by 20% (maximum 4 hours/240 minutes)
Normal streams (2-10 posts): Interval remains unchanged

This means you don't have to tune intervals by hand: each stream's interval follows the number of posts its recent fetches returned.

Example¶

If you set a stream to fetch every 60 minutes:

First fetch returns 15 posts → interval becomes 48 minutes (60 × 0.8)
Next fetch returns 8 posts → interval stays 48 minutes
Following fetch returns 1 post → interval becomes 58 minutes (48 × 1.2 = 57.6, rounded)
Another fetch returns 0 posts → interval becomes 70 minutes (58 × 1.2 = 69.6, rounded)
Continuing quiet activity → interval increases toward the 240-minute limit (4 hours)
Then activity picks up with 12 posts → interval is reduced by 20% again; repeated busy fetches keep shortening it until it reaches the 15-minute minimum
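The adjustment rule can be sketched in a few lines of Python. This is an illustration of the documented behavior, not FenLiu's actual implementation: the function and constant names are hypothetical, and rounding to the nearest minute is an assumption inferred from the figures in the example above. How intervals set manually outside the 15-240 range are treated is not documented, so this sketch simply clamps them.

```python
BUSY_THRESHOLD = 10          # more than this many posts per fetch counts as busy
QUIET_THRESHOLD = 2          # fewer than this many posts per fetch counts as quiet
ADAPTIVE_MIN_MINUTES = 15    # adaptive lower bound
ADAPTIVE_MAX_MINUTES = 240   # adaptive upper bound (4 hours)


def adapt_interval(current_minutes, posts_fetched):
    """Return the next fetch interval given the size of the last fetch."""
    if posts_fetched > BUSY_THRESHOLD:
        # Busy stream: shorten by 20%, but never below the adaptive minimum.
        return max(ADAPTIVE_MIN_MINUTES, round(current_minutes * 0.8))
    if posts_fetched < QUIET_THRESHOLD:
        # Quiet stream: lengthen by 20%, but never above the adaptive maximum.
        return min(ADAPTIVE_MAX_MINUTES, round(current_minutes * 1.2))
    # Normal stream (2-10 posts): leave the interval alone.
    return current_minutes


# Replay the first four fetches from the example, starting at 60 minutes.
interval = 60
history = []
for posts in (15, 8, 1, 0):
    interval = adapt_interval(interval, posts)
    history.append(interval)

print(history)  # [48, 48, 58, 70]
```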

Auto-Reject Integration¶

If you have the auto-reject blocked posts setting enabled, it will be honored during scheduled fetches:

Posts are fetched from the hashtag
Posts are saved to the database
Posts matching any active reblog-control filter (blocked users/hashtags) are automatically rejected
Only posts that were not auto-rejected appear in the review queue
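The filtering step can be pictured as a simple partition of fetched posts. The sketch below is illustrative only: the post dictionary shape (`author`, `hashtags`), the function name, and the case-insensitive matching are assumptions, not FenLiu's data model.

```python
def split_auto_rejected(posts, blocked_users, blocked_hashtags):
    """Partition posts into (rejected, pending_review).

    Assumes each post is a dict with an "author" string and a "hashtags"
    list; this shape is hypothetical and chosen for illustration.
    """
    blocked_users = {user.lower() for user in blocked_users}
    blocked_hashtags = {tag.lower().lstrip("#") for tag in blocked_hashtags}

    rejected, pending_review = [], []
    for post in posts:
        tags = {tag.lower().lstrip("#") for tag in post["hashtags"]}
        if post["author"].lower() in blocked_users or tags & blocked_hashtags:
            rejected.append(post)        # matches a blocked user or hashtag
        else:
            pending_review.append(post)  # goes on to the review queue
    return rejected, pending_review


sample_posts = [
    {"id": 1, "author": "alice@example.social", "hashtags": ["photography"]},
    {"id": 2, "author": "spammer@example.social", "hashtags": ["photography"]},
    {"id": 3, "author": "bob@example.social", "hashtags": ["Photography", "#Ads"]},
]
rejected, pending_review = split_auto_rejected(
    sample_posts,
    blocked_users=["spammer@example.social"],
    blocked_hashtags=["ads"],
)
```

With this sample data, posts 2 and 3 are rejected (blocked author and blocked hashtag respectively) and post 1 remains for review.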

Viewing Schedule Status¶

In the web interface, the streams list displays:

enable_scheduling: Toggle icon showing if scheduling is active
fetch_interval_minutes: Current interval in minutes
next_scheduled_fetch: When the next automatic fetch will occur
last_check: Last time posts were actually fetched (manual or scheduled)

Troubleshooting¶

Scheduled Fetches Not Running¶

Check:

Is the stream's enable_scheduling flag set to true?
Is the stream's active flag set to true?
Has the background scheduler started? Look for "StreamScheduler started" in the application logs at startup.
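Part of this checklist can be automated from the schedule response. The following sketch inspects a response dictionary in the documented shape and reports likely problems; the function name is hypothetical, and treating an overdue `next_scheduled_fetch` as a sign that the scheduler is not running is an assumption. The stream's active flag is not part of the schedule response shown in this guide, so it is not checked here.

```python
from datetime import datetime, timezone


def diagnose_schedule(schedule, now=None):
    """Return a list of human-readable problems found in a schedule response."""
    now = now or datetime.now(timezone.utc)
    problems = []

    if not schedule.get("enable_scheduling"):
        problems.append("enable_scheduling is false")

    next_fetch = schedule.get("next_scheduled_fetch")
    if next_fetch is None:
        problems.append("no next_scheduled_fetch is set")
    elif datetime.fromisoformat(next_fetch) < now:
        # Assumption: an overdue fetch suggests the scheduler is not running.
        problems.append("next_scheduled_fetch is in the past")

    return problems


sample_schedule = {
    "stream_id": 1,
    "hashtag": "photography",
    "enable_scheduling": True,
    "fetch_interval_minutes": 60,
    "next_scheduled_fetch": "2026-03-06T10:30:00+00:00",
    "last_check": "2026-03-06T09:30:00+00:00",
}

on_time = diagnose_schedule(
    sample_schedule, now=datetime(2026, 3, 6, 10, 0, tzinfo=timezone.utc)
)
overdue = diagnose_schedule(
    sample_schedule, now=datetime(2026, 3, 6, 11, 0, tzinfo=timezone.utc)
)
```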

Fetch Interval Not Updating¶

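Adaptive scheduling leaves the interval unchanged when a fetch returns 2-10 posts or when the interval has already reached a bound. Its bounds are narrower than the 5-1440 minute range accepted for manual settings: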

Minimum: 15 minutes (no faster than every 15 minutes)
Maximum: 4 hours/240 minutes (no slower than every 4 hours)

These bounds prevent overfetching (wasting API quota) and underfetching (missing activity).

API Rate Limits¶

If you're hitting rate limits from the Fediverse instance:

Increase the fetch_interval_minutes (e.g., 120 or 180 for very active hashtags)
Disable scheduling for less critical streams
Check the Fediverse instance's rate limit policy

Best Practices¶

Start with 60 minutes: The default 1-hour interval works well for most use cases
Let adaptive scheduling work: Don't manually adjust intervals too frequently; let the system learn
Monitor first runs: Watch the first few automatic fetches to ensure everything works as expected
Check logs: Review application logs to see fetch activity and any errors
Set reasonable bounds: The 5-1440 minute range covers most scenarios; values outside it are rejected with 400 Bad Request

API Reference¶

Get Stream Schedule¶

Endpoint: GET /v1/hashtags/{stream_id}/schedule

Authentication: API Key required

Response:

{
  "stream_id": 1,
  "hashtag": "photography",
  "enable_scheduling": true,
  "fetch_interval_minutes": 60,
  "next_scheduled_fetch": "2026-03-06T10:30:00+00:00",
  "last_check": "2026-03-06T09:30:00+00:00"
}

Update Stream Schedule¶

Endpoint: PATCH /v1/hashtags/{stream_id}/schedule

Authentication: API Key required

Request Body:

{
  "enable_scheduling": true,
  "fetch_interval_minutes": 45
}

Both fields are optional; omit a field to leave its current value unchanged.

Response: Updated schedule object

Validation:

fetch_interval_minutes must be between 5 and 1440
Invalid values return 400 Bad Request
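A client can apply the same range check before sending a request and avoid a round trip that would end in 400 Bad Request. This is a client-side sketch, not the server's validation code: the function name is hypothetical, and requiring an integer for `fetch_interval_minutes` and a boolean for `enable_scheduling` is an assumption based on the example payloads.

```python
MIN_INTERVAL_MINUTES = 5
MAX_INTERVAL_MINUTES = 1440


def validate_schedule_update(payload):
    """Return a list of validation errors for a schedule update payload."""
    errors = []

    if "enable_scheduling" in payload and not isinstance(
        payload["enable_scheduling"], bool
    ):
        errors.append("enable_scheduling must be true or false")

    if "fetch_interval_minutes" in payload:
        value = payload["fetch_interval_minutes"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            errors.append("fetch_interval_minutes must be an integer")
        elif not MIN_INTERVAL_MINUTES <= value <= MAX_INTERVAL_MINUTES:
            errors.append(
                f"fetch_interval_minutes must be between "
                f"{MIN_INTERVAL_MINUTES} and {MAX_INTERVAL_MINUTES}"
            )

    return errors


ok = validate_schedule_update({"enable_scheduling": True, "fetch_interval_minutes": 45})
too_short = validate_schedule_update({"fetch_interval_minutes": 2})
```

An empty list means the payload passes the documented range check; both fields remain optional, so an empty payload is also accepted.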

Nightly ML Model Retraining¶

FenLiu retrains the XGBoost content classifier automatically every night at 02:00.

How it starts¶

The retraining job is registered when the application starts, as part of the same APScheduler setup that handles stream fetches:

App starts
  └─ lifespan() in main.py
       └─ start_scheduler()
            └─ StreamScheduler.start()
                 ├─ loads all active streams and schedules fetch jobs
                 ├─ schedules daily cleanup job
                 └─ schedules nightly retraining job (cron: 02:00)

No external cron daemon, systemd timer, or separate process is needed — the retraining job is managed entirely within the running FenLiu container.

What happens at 02:00¶

All review_feedback rows are loaded from the database.
If fewer than 500 rows exist, training is skipped and a notice is logged — no artifacts are written.
Otherwise, a new XGBoost model is trained on the full dataset.
Before saving, existing artifacts are copied to .prev backups:
models/xgboost_model.prev.pkl
models/feature_pipeline.prev.pkl
models/model_metadata.prev.json
New artifacts replace the current ones in ML_MODELS_DIR.
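The skip threshold and the backup step can be sketched as follows. This is an illustration of the sequence above, not FenLiu's code: the function names are hypothetical, the current artifact file names are inferred from the `.prev` names listed, and the training step itself is left as a comment.

```python
import shutil
import tempfile
from pathlib import Path

MIN_TRAINING_ROWS = 500
# Current artifact names, inferred from the documented .prev backups.
ARTIFACTS = ("xgboost_model.pkl", "feature_pipeline.pkl", "model_metadata.json")


def backup_artifacts(models_dir):
    """Copy each existing artifact to its .prev sibling (e.g. xgboost_model.prev.pkl)."""
    models_dir = Path(models_dir)
    backed_up = []
    for name in ARTIFACTS:
        current = models_dir / name
        if current.exists():
            previous = current.with_name(f"{current.stem}.prev{current.suffix}")
            shutil.copy2(current, previous)
            backed_up.append(previous.name)
    return backed_up


def nightly_retrain(row_count, models_dir):
    """Mirror the documented flow: skip on too little data, else back up first."""
    if row_count < MIN_TRAINING_ROWS:
        return "skipped"  # nothing is written when training is skipped
    # (model training on the full dataset would happen here)
    backup_artifacts(models_dir)
    # (new artifacts would be written to models_dir here)
    return "trained"


# Demo in a throwaway directory so no real artifacts are touched.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "xgboost_model.pkl").write_bytes(b"old model")
    skipped = nightly_retrain(row_count=120, models_dir=tmp)
    backups_after_skip = sorted(p.name for p in Path(tmp).glob("*.prev.*"))
    trained = nightly_retrain(row_count=41823, models_dir=tmp)
    backups_after_train = sorted(p.name for p in Path(tmp).glob("*.prev.*"))
```

In the demo, the first call leaves the directory untouched because 120 rows is below the threshold, and the second call produces `xgboost_model.prev.pkl`.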

When the new model takes effect¶

The running app does not hot-swap the model mid-session. New artifacts are loaded from disk on the next container restart. If you restart every morning (e.g. 05:00–06:00), the sequence is:

02:00 — retraining job runs, new artifacts saved to ML_MODELS_DIR
05:30 — container restarts
05:30 — lifespan loads new artifacts: ml_inference_service.load(models_dir)
05:31 — all new posts scored with the updated model

Verifying a retraining run¶

Check the application logs for entries around 02:00:

INFO  Nightly retraining job started
INFO  Training on 41823 rows
INFO  Retraining complete — new artifacts saved to /app/data/models

Or check artifact timestamps inside the container:

podman exec -it <container> ls -lh /app/data/models/
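The timestamp check can also be scripted. The sketch below lists artifacts modified at or after a given time, skipping the `.prev` backups; the function name is hypothetical and it assumes the script can read the models directory directly (for example, inside the container or through a mounted volume).

```python
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def artifacts_updated_since(models_dir, since):
    """Map artifact file name -> modification time for files changed since `since`."""
    updated = {}
    for path in sorted(Path(models_dir).iterdir()):
        if path.is_file() and ".prev." not in path.name:
            modified = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
            if modified >= since:
                updated[path.name] = modified
    return updated


# Demo with a throwaway directory and hand-set modification times.
run_time = datetime(2026, 3, 6, 2, 0, tzinfo=timezone.utc)
with tempfile.TemporaryDirectory() as tmp:
    model = Path(tmp) / "xgboost_model.pkl"
    stale = Path(tmp) / "feature_pipeline.pkl"
    model.write_bytes(b"new")
    stale.write_bytes(b"old")
    os.utime(model, (run_time.timestamp() + 60, run_time.timestamp() + 60))
    os.utime(stale, (run_time.timestamp() - 3600, run_time.timestamp() - 3600))
    fresh = artifacts_updated_since(tmp, run_time)
```

If the result is empty the morning after a scheduled run, the job was probably skipped or failed, and the logs are the next place to look.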

Manual retraining¶

You can retrain at any time without waiting for the scheduled job:

# Validate first (no artifacts saved)
podman exec -it <container> fenliu-train validate

# Train and save
podman exec -it <container> fenliu-train fit --output-dir /app/data/models

Changes take effect on the next container restart.

Rollback¶

If a new model behaves unexpectedly, restore the previous version:

podman exec -it <container> sh -c "
  cp /app/data/models/xgboost_model.prev.pkl /app/data/models/xgboost_model.pkl
  cp /app/data/models/feature_pipeline.prev.pkl /app/data/models/feature_pipeline.pkl
  cp /app/data/models/model_metadata.prev.json /app/data/models/model_metadata.json
"

Then restart the container to load the previous artifacts.

Under the Hood¶

FenLiu uses APScheduler for background task scheduling:

Scheduler is initialized when the application starts
All active streams with enable_scheduling=true are loaded and scheduled
Nightly ML retraining is registered as a cron job at 02:00
Scheduler runs in an async event loop alongside the web server
Graceful shutdown ensures all pending tasks complete before the application exits