Automated Stream Scheduling

FenLiu can automatically fetch new posts from your monitored hashtag streams on a configurable schedule. This eliminates the need to manually click the "Fetch" button for each stream.

Overview

Once enabled, the background scheduler will:

  • Fetch posts from active streams at regular intervals
  • Apply auto-reject filters if configured
  • Adapt fetch intervals based on stream activity (busier streams get checked more frequently)
  • Run continuously in the background without requiring manual intervention

Enabling Scheduling

By default, scheduling is enabled for all new streams with a fetch interval of 60 minutes (1 hour).

Via API

To get the current schedule for a stream:

curl -X GET http://localhost:8000/v1/hashtags/{stream_id}/schedule \
  -H "X-API-Key: your-api-key"

Response:

{
  "stream_id": 1,
  "hashtag": "photography",
  "enable_scheduling": true,
  "fetch_interval_minutes": 60,
  "next_scheduled_fetch": "2026-03-06T10:30:00+00:00",
  "last_check": "2026-03-06T09:30:00+00:00"
}
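
As a rough illustration of how a client might consume this response, the sketch below parses the payload with Python's standard library and works out how long remains until the next fetch. The helper name minutes_until_next_fetch is hypothetical and not part of FenLiu, and the sketch assumes both timestamps are present.

```python
import json
from datetime import datetime, timezone

# Sample payload copied from the response shown above.
RESPONSE_BODY = """
{
  "stream_id": 1,
  "hashtag": "photography",
  "enable_scheduling": true,
  "fetch_interval_minutes": 60,
  "next_scheduled_fetch": "2026-03-06T10:30:00+00:00",
  "last_check": "2026-03-06T09:30:00+00:00"
}
"""


def minutes_until_next_fetch(schedule: dict, now: datetime) -> float:
    """Return minutes from `now` until the next scheduled fetch (hypothetical helper)."""
    next_fetch = datetime.fromisoformat(schedule["next_scheduled_fetch"])
    return (next_fetch - now).total_seconds() / 60


schedule = json.loads(RESPONSE_BODY)
now = datetime(2026, 3, 6, 10, 0, tzinfo=timezone.utc)
print(minutes_until_next_fetch(schedule, now))  # 30.0
```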

To update the schedule:

curl -X PATCH http://localhost:8000/v1/hashtags/{stream_id}/schedule \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "enable_scheduling": true,
    "fetch_interval_minutes": 30
  }'
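
The same update could be issued from a script. The following is a minimal sketch using only Python's standard library; build_schedule_update is a hypothetical helper name, and the base URL and API key are the placeholders from the curl example. It builds the request without sending it, so it can be inspected safely.

```python
import json
import urllib.request


def build_schedule_update(base_url: str, stream_id: int, api_key: str,
                          **fields) -> urllib.request.Request:
    """Build (but do not send) a PATCH request for a stream's schedule."""
    return urllib.request.Request(
        url=f"{base_url}/v1/hashtags/{stream_id}/schedule",
        data=json.dumps(fields).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="PATCH",
    )


request = build_schedule_update(
    "http://localhost:8000", 1, "your-api-key",
    enable_scheduling=True, fetch_interval_minutes=30,
)
print(request.get_method(), request.full_url)

# To send it against a running instance:
#     with urllib.request.urlopen(request) as resp:
#         print(json.load(resp))
```

Only the fields passed as keyword arguments end up in the body, which matches the API's behavior of leaving omitted fields unchanged.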

Configuration

Fetch Interval

The fetch_interval_minutes controls how often the stream is checked for new posts.

  • Minimum: 5 minutes
  • Maximum: 1440 minutes (24 hours)
  • Default: 60 minutes (1 hour)

Shorter intervals mean more frequent checks but consume more API quota from the Fediverse instance. Adjust based on:

  • How active the hashtag community is
  • Your API rate limits
  • How current you need the posts to be

Enabling/Disabling

Set enable_scheduling to false to temporarily disable scheduled fetches for a stream without deleting it. This is useful when:

  • Troubleshooting issues
  • Temporarily pausing a stream
  • Reducing API load

Adaptive Scheduling

FenLiu includes smart adaptive scheduling that automatically adjusts fetch intervals based on activity:

  • Busy streams (>10 posts per fetch): Interval reduced by 20% (minimum 15 minutes)
  • Quiet streams (<2 posts per fetch): Interval increased by 20% (maximum 4 hours/240 minutes)
  • Normal streams (2-10 posts): Interval remains unchanged

This means you don't have to manually tune intervals—the system learns and optimizes based on actual activity patterns.

Example

If you set a stream to fetch every 60 minutes:

  1. First fetch returns 15 posts → interval becomes 48 minutes (60 × 0.8)
  2. Next fetch returns 8 posts → interval stays 48 minutes
  3. Following fetch returns 1 post → interval becomes 58 minutes (48 × 1.2 = 57.6, rounded)
  4. Another fetch returns 0 posts → interval becomes 70 minutes (58 × 1.2 = 69.6, rounded)
  5. Continuing quiet activity → interval increases toward 240-minute limit (4 hours)
  6. Then activity picks up with 12 posts → interval is cut by 20% again, and keeps shrinking on each busy fetch until it reaches the 15-minute minimum
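
The adjustment rules and the first four steps of this example can be reproduced with a short sketch. This is not FenLiu's actual code: adapt_interval and the constant names are hypothetical, and rounding to the nearest minute is an assumption inferred from the figures in the example.

```python
BUSY_THRESHOLD = 10    # more than 10 posts per fetch counts as busy
QUIET_THRESHOLD = 2    # fewer than 2 posts per fetch counts as quiet
MIN_INTERVAL = 15      # adaptive lower bound, in minutes
MAX_INTERVAL = 240     # adaptive upper bound, in minutes


def adapt_interval(current_minutes: int, posts_fetched: int) -> int:
    """Return the next interval under the documented 20% rules (hypothetical helper)."""
    if posts_fetched > BUSY_THRESHOLD:
        # Busy stream: shorten by 20%, but never below the adaptive minimum.
        return max(MIN_INTERVAL, round(current_minutes * 0.8))
    if posts_fetched < QUIET_THRESHOLD:
        # Quiet stream: lengthen by 20%, but never above the adaptive maximum.
        return min(MAX_INTERVAL, round(current_minutes * 1.2))
    # Normal activity (2-10 posts): leave the interval alone.
    return current_minutes


interval = 60
for posts in [15, 8, 1, 0]:
    interval = adapt_interval(interval, posts)
    print(f"{posts:>2} posts -> {interval} minutes")
```

Running the loop prints 48, 48, 58, and 70 minutes, matching steps 1 to 4.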

Auto-Reject Integration

If you have the auto-reject blocked posts setting enabled, it will be honored during scheduled fetches:

  1. Posts are fetched from the hashtag
  2. Posts are saved to the database
  3. Posts matching any active reblog-control filter (blocked users/hashtags) are automatically rejected
  4. Only posts that were not auto-rejected appear in the review queue
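
The filtering step can be pictured with the sketch below. The Post class, the apply_auto_reject function, and the status values are all hypothetical stand-ins rather than FenLiu's data model, and case-insensitive hashtag matching is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Post:
    author: str
    hashtags: set[str]
    status: str = "pending"


def apply_auto_reject(posts, blocked_users, blocked_hashtags):
    """Mark posts from blocked users or with blocked hashtags as rejected."""
    blocked_tags = {tag.lower() for tag in blocked_hashtags}
    for post in posts:
        tags = {tag.lower() for tag in post.hashtags}
        if post.author in blocked_users or tags & blocked_tags:
            post.status = "rejected"
    # Only posts that were not rejected remain for manual review.
    return [post for post in posts if post.status == "pending"]


posts = [
    Post("alice@example.social", {"photography"}),
    Post("spammer@example.social", {"photography"}),
    Post("bob@example.social", {"photography", "Ads"}),
]
queue = apply_auto_reject(posts, {"spammer@example.social"}, {"ads"})
print([post.author for post in queue])  # ['alice@example.social']
```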

Viewing Schedule Status

When you view your streams in the web interface, the streams list displays:

  • enable_scheduling: Toggle icon showing if scheduling is active
  • fetch_interval_minutes: Current interval in minutes
  • next_scheduled_fetch: When the next automatic fetch will occur
  • last_check: Last time posts were actually fetched (manual or scheduled)

Troubleshooting

Scheduled Fetches Not Running

Check:

  1. Is the stream's enable_scheduling flag set to true?
  2. Is the stream's active flag set to true?
  3. Has the background scheduler started? (Check application logs)
  4. Are you seeing "StreamScheduler started" in the logs at startup?
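
If you check these conditions often, they are easy to script. The sketch below assumes you already have the stream's flags as a dictionary and the application log as text; diagnose_stream is a hypothetical helper, not a FenLiu command.

```python
def diagnose_stream(stream: dict, log_text: str) -> list[str]:
    """Return the checklist items that fail for a stream (hypothetical helper)."""
    problems = []
    if not stream.get("enable_scheduling"):
        problems.append("enable_scheduling is not true")
    if not stream.get("active"):
        problems.append("active is not true")
    if "StreamScheduler started" not in log_text:
        problems.append('"StreamScheduler started" not found in logs')
    return problems


stream = {"enable_scheduling": True, "active": False}
print(diagnose_stream(stream, "INFO  StreamScheduler started"))
# ['active is not true']
```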

Fetch Interval Not Updating

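Adaptive scheduling only adjusts intervals within its own bounds, which are narrower than the 5-1440 minute range accepted for manual settings. Once the interval reaches a bound, further busy or quiet fetches leave it unchanged: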

  • Minimum: 15 minutes (no faster than every 15 minutes)
  • Maximum: 4 hours/240 minutes (no slower than every 4 hours)

These bounds prevent overfetching (wasting API quota) and underfetching (missing activity).

API Rate Limits

If you're hitting rate limits from the Fediverse instance:

  • Increase the fetch_interval_minutes (e.g., 120 or 180 for very active hashtags)
  • Disable scheduling for less critical streams
  • Check the Fediverse instance's rate limit policy

Best Practices

  1. Start with 60 minutes: The default 1-hour interval works well for most use cases
  2. Let adaptive scheduling work: Don't manually adjust intervals too frequently; let the system learn
  3. Monitor first runs: Watch the first few automatic fetches to ensure everything works as expected
  4. Check logs: Review application logs to see fetch activity and any errors
  5. Stay within bounds: fetch_interval_minutes must be between 5 and 1440 minutes; the API rejects values outside that range

API Reference

Get Stream Schedule

Endpoint: GET /v1/hashtags/{stream_id}/schedule

Authentication: API Key required

Response:

{
  "stream_id": 1,
  "hashtag": "photography",
  "enable_scheduling": true,
  "fetch_interval_minutes": 60,
  "next_scheduled_fetch": "2026-03-06T10:30:00+00:00",
  "last_check": "2026-03-06T09:30:00+00:00"
}

Update Stream Schedule

Endpoint: PATCH /v1/hashtags/{stream_id}/schedule

Authentication: API Key required

Request Body:

{
  "enable_scheduling": true,
  "fetch_interval_minutes": 45
}

Both fields are optional; omit a field to leave its current value unchanged.

Response: Updated schedule object

Validation:

  • fetch_interval_minutes must be between 5 and 1440
  • Invalid values return 400 Bad Request
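
A client can apply the same rule before sending a request. The sketch below is an assumption about how such a check could look, not FenLiu's server-side validator; validate_schedule_update is a hypothetical name, the bounds are treated as inclusive, and the type checks go beyond what is documented here.

```python
MIN_FETCH_INTERVAL = 5      # minutes
MAX_FETCH_INTERVAL = 1440   # minutes (24 hours)


def validate_schedule_update(body: dict) -> list[str]:
    """Return validation errors for a PATCH body; an empty list means it looks acceptable."""
    errors = []
    if "enable_scheduling" in body and not isinstance(body["enable_scheduling"], bool):
        errors.append("enable_scheduling must be a boolean")
    if "fetch_interval_minutes" in body:
        value = body["fetch_interval_minutes"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            errors.append("fetch_interval_minutes must be an integer")
        elif not MIN_FETCH_INTERVAL <= value <= MAX_FETCH_INTERVAL:
            errors.append("fetch_interval_minutes must be between 5 and 1440")
    return errors


print(validate_schedule_update({"fetch_interval_minutes": 45}))  # []
print(validate_schedule_update({"fetch_interval_minutes": 3}))
# ['fetch_interval_minutes must be between 5 and 1440']
```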

Nightly ML Model Retraining

FenLiu retrains the XGBoost content classifier automatically every night at 02:00.

How it starts

The retraining job is registered when the application starts, as part of the same APScheduler setup that handles stream fetches:

App starts
  └─ lifespan() in main.py
       └─ start_scheduler()
            └─ StreamScheduler.start()
                 ├─ loads all active streams and schedules fetch jobs
                 ├─ schedules daily cleanup job
                 └─ schedules nightly retraining job (cron: 02:00)

No external cron daemon, systemd timer, or separate process is needed — the retraining job is managed entirely within the running FenLiu container.
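
To make the "cron: 02:00" trigger concrete, the sketch below computes when a daily 02:00 job would next fire. It uses only the standard library and does not reproduce APScheduler's implementation; next_daily_run is a hypothetical helper, and the timezone the job runs in is not stated here, so the example uses naive datetimes.

```python
from datetime import datetime, time, timedelta


def next_daily_run(now: datetime, at: time = time(2, 0)) -> datetime:
    """Return the first moment strictly after `now` whose wall-clock time is `at`."""
    candidate = datetime.combine(now.date(), at, tzinfo=now.tzinfo)
    if candidate <= now:
        # Today's slot has already passed, so the job fires tomorrow.
        candidate += timedelta(days=1)
    return candidate


print(next_daily_run(datetime(2026, 3, 6, 1, 15)))  # 2026-03-06 02:00:00
print(next_daily_run(datetime(2026, 3, 6, 14, 0)))  # 2026-03-07 02:00:00
```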

What happens at 02:00

  1. All review_feedback rows are loaded from the database.
  2. If fewer than 500 rows exist, training is skipped and a notice is logged — no artifacts are written.
  3. Otherwise, a new XGBoost model is trained on the full dataset.
  4. Before saving, existing artifacts are copied to .prev backups:
       • models/xgboost_model.prev.pkl
       • models/feature_pipeline.prev.pkl
       • models/model_metadata.prev.json
  5. New artifacts replace the current ones in ML_MODELS_DIR.
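
The sequence above can be sketched as follows. This is an illustration under stated assumptions, not the project's retraining code: run_retraining and backup_path are hypothetical names, the train argument stands in for the real XGBoost training step and is assumed to return each artifact as bytes, and the current artifact file names are inferred from the .prev names listed above.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable

MIN_TRAINING_ROWS = 500
ARTIFACT_NAMES = ("xgboost_model.pkl", "feature_pipeline.pkl", "model_metadata.json")


def backup_path(artifact: Path) -> Path:
    """Map xgboost_model.pkl to xgboost_model.prev.pkl."""
    return artifact.with_name(f"{artifact.stem}.prev{artifact.suffix}")


def run_retraining(rows: list, models_dir: Path,
                   train: Callable[[list], dict[str, bytes]]) -> bool:
    """Retrain and rotate artifacts; return False when training is skipped."""
    if len(rows) < MIN_TRAINING_ROWS:
        print(f"Skipping retraining: only {len(rows)} rows")
        return False
    new_artifacts = train(rows)  # train before touching anything on disk
    models_dir.mkdir(parents=True, exist_ok=True)
    for name in ARTIFACT_NAMES:
        current = models_dir / name
        if current.exists():
            shutil.copy2(current, backup_path(current))  # keep the previous version
        current.write_bytes(new_artifacts[name])
    return True


with tempfile.TemporaryDirectory() as tmp:
    models = Path(tmp)
    fake_train = lambda rows: {name: f"v{len(rows)}".encode() for name in ARTIFACT_NAMES}
    print(run_retraining([0] * 100, models, fake_train))  # False, nothing written
    print(run_retraining([0] * 500, models, fake_train))  # True
    print(run_retraining([0] * 600, models, fake_train))  # True
    print((models / "xgboost_model.prev.pkl").read_bytes())  # b'v500'
```

Training before any file is touched means a failed training run leaves the existing artifacts and backups as they were.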

When the new model takes effect

The running app does not hot-swap the model mid-session. New artifacts are loaded from disk on the next container restart. If you restart every morning (e.g. 05:00–06:00), the sequence is:

02:00 — retraining job runs, new artifacts saved to ML_MODELS_DIR
05:30 — container restarts
05:30 — lifespan loads new artifacts: ml_inference_service.load(models_dir)
05:31 — all new posts scored with the updated model

Verifying a retraining run

Check the application logs for entries around 02:00:

INFO  Nightly retraining job started
INFO  Training on 41823 rows
INFO  Retraining complete — new artifacts saved to /app/data/models

Or check artifact timestamps inside the container:

podman exec -it <container> ls -lh /app/data/models/

Manual retraining

You can retrain at any time without waiting for the scheduled job:

# Validate first (no artifacts saved)
podman exec -it <container> fenliu-train validate

# Train and save
podman exec -it <container> fenliu-train fit --output-dir /app/data/models

Changes take effect on the next container restart.

Rollback

If a new model behaves unexpectedly, restore the previous version:

podman exec -it <container> sh -c "
  cp /app/data/models/xgboost_model.prev.pkl /app/data/models/xgboost_model.pkl
  cp /app/data/models/feature_pipeline.prev.pkl /app/data/models/feature_pipeline.pkl
  cp /app/data/models/model_metadata.prev.json /app/data/models/model_metadata.json
"

Then restart the container to load the previous artifacts.


Under the Hood

FenLiu uses APScheduler for background task scheduling:

  • Scheduler is initialized when the application starts
  • All active streams with enable_scheduling=true are loaded and scheduled
  • Nightly ML retraining is registered as a cron job at 02:00
  • Scheduler runs in an async event loop alongside the web server
  • Graceful shutdown ensures all pending tasks complete before the application exits