OctoReport Docs

System Reliability

High-availability system architecture based on multi-instance failover, task observability, and health checks to ensure stable service operation.

💡 Core Goal: 99.9% service availability, automatic failure recovery, complete task tracking capability.

1. Multi-Instance Failover

1.1 RSSHub Multi-Instance Architecture

Architecture Design:

  • Support multiple RSSHub instances (official + self-hosted)
  • Independent health checks for each instance
  • Automatic failover (primary fails → backup instance)
  • Instance priority sorting (self-hosted > official rsshub.app)

Instance Configuration Example:

Admin Panel → RSSHub Instance Management

Instance List:
┌───────────────────────────┬──────────┬─────────┬───────────┐
│ Instance URL              │ Priority │ Status  │ Auth Mode │
├───────────────────────────┼──────────┼─────────┼───────────┤
│ https://my-rsshub.com     │ 1 (High) │ Healthy │ BEARER    │
│ https://rsshub.app        │ 2        │ Healthy │ NONE      │
│ https://backup.rsshub.com │ 3        │ Down    │ KEY       │
└───────────────────────────┴──────────┴─────────┴───────────┘
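
The priority rule can be illustrated with a minimal sketch. The `instances` array below is a hypothetical in-memory mirror of the instance list, and `healthyByPriority` is an illustrative helper, not a function from the product's codebase.

```javascript
// Hypothetical in-memory copy of the instance list (field names are assumptions).
const instances = [
  { url: "https://rsshub.app", priority: 2, healthy: true, authMode: "NONE" },
  { url: "https://backup.rsshub.com", priority: 3, healthy: false, authMode: "KEY" },
  { url: "https://my-rsshub.com", priority: 1, healthy: true, authMode: "BEARER" },
];

// Return the healthy instances, ordered so that priority 1 comes first.
// filter() creates a new array, so the sort does not mutate the input.
function healthyByPriority(list) {
  return list
    .filter((inst) => inst.healthy)
    .sort((a, b) => a.priority - b.priority);
}

console.log(healthyByPriority(instances).map((inst) => inst.url));
// [ 'https://my-rsshub.com', 'https://rsshub.app' ]
```

The first element is the instance a request would try first; the rest form the failover order.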

1.2 Failover Process

Automatic Failover Mechanism:

  1. Request primary instance (highest priority healthy instance)
  2. Detect failure (timeout, 500 error, connection failure)
  3. Mark instance as unhealthy (lower priority or temporarily disable)
  4. Try next instance (by priority order)
  5. Log failover (audit trail)

Process Example:

Timeline:

10:00:00 - User creates RSS data source
• Selected instance: my-rsshub.com (priority 1)
• Request URL: https://my-rsshub.com/bilibili/user/video/123
• Result: ✅ Success (200 OK)

11:00:00 - Scheduled task execution
• Selected instance: my-rsshub.com (priority 1)
• Request URL: https://my-rsshub.com/bilibili/user/video/123
• Result: ❌ Failed (Connection Timeout)
• Actions:
  1. Mark my-rsshub.com as "unhealthy"
  2. Switch to backup instance rsshub.app (priority 2)
  3. Request URL: https://rsshub.app/bilibili/user/video/123
  4. Result: ✅ Success (200 OK)
• Log: "Failover from my-rsshub.com to rsshub.app"

12:00:00 - Health check recovery
• Detected my-rsshub.com has recovered
• Restore its priority
• Use my-rsshub.com again on next execution
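
The 11:00:00 step of the timeline could be sketched roughly as follows. This is an illustration under assumptions: `requestWithFailover` and the caller-supplied `fetchFeed` function are hypothetical names, and the real service may record failures differently.

```javascript
// Try each healthy instance in priority order. On failure, mark the instance
// unhealthy (passive check) and move on; log the switch once a backup succeeds.
// `fetchFeed` is a hypothetical async function that resolves with the feed body
// or rejects on timeout, server error, or connection failure.
async function requestWithFailover(instances, route, fetchFeed, log = console.log) {
  const ordered = instances
    .filter((inst) => inst.healthy)
    .sort((a, b) => a.priority - b.priority);

  let failed = null;
  for (const inst of ordered) {
    try {
      const body = await fetchFeed(inst.url + route);
      if (failed) {
        log(`Failover from ${new URL(failed.url).host} to ${new URL(inst.url).host}`);
      }
      return { instance: inst.url, body };
    } catch (err) {
      inst.healthy = false; // a later recovery check would restore it
      failed = inst;
    }
  }
  throw new Error("All RSSHub instances are unavailable");
}
```

Passing the fetch function as a parameter keeps the sketch independent of any particular HTTP client.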

1.3 Health Check Mechanism

Check Methods:

  • Active Check: Send test request every 5 minutes
  • Passive Check: Mark immediately on user request failure
  • Recovery Check: Attempt recovery every 10 minutes for unhealthy instances

Check Metrics:

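┌─────────────────────────┬───────────────────┬─────────────────────┐
│ Metric                  │ Healthy Threshold │ Unhealthy Threshold │
├─────────────────────────┼───────────────────┼─────────────────────┤
│ Response Time           │ < 5 seconds       │ > 10 seconds        │
│ Error Rate              │ < 5%              │ > 20%               │
│ Connection Success Rate │ > 95%             │ < 80%               │
└─────────────────────────┴───────────────────┴─────────────────────┘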
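
One way to apply these thresholds is sketched below. Two points are assumptions: that any single metric past its unhealthy threshold is enough to mark the instance unhealthy, and the "degraded" label for values that fall between the two thresholds, which the metrics leave unspecified.

```javascript
// Classify an instance from its recent metrics.
// Rates are fractions (0.05 = 5%); response time is in seconds.
function classifyInstance({ responseTimeSec, errorRate, connectSuccessRate }) {
  const unhealthy =
    responseTimeSec > 10 || errorRate > 0.2 || connectSuccessRate < 0.8;
  if (unhealthy) return "unhealthy";

  const healthy =
    responseTimeSec < 5 && errorRate < 0.05 && connectSuccessRate > 0.95;
  // "degraded" is an assumed label for the gap between the two thresholds.
  return healthy ? "healthy" : "degraded";
}

console.log(
  classifyInstance({ responseTimeSec: 1.2, errorRate: 0.01, connectSuccessRate: 0.99 })
); // healthy
```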

[Placeholder: failover flow diagram showing the switchover from primary to backup instance]

2. Task Observability

2.1 Complete Task Logs

Log Contents:

  • id: Unique task ID
  • type: Task type (COLLECT, CLEAN, REPORT_GENERATE)
  • status: Status (PENDING, PROCESSING, SUCCESS, FAILED)
  • createdAt: Creation time
  • startedAt: Start time
  • completedAt: Completion time
  • duration: Execution duration (milliseconds)
  • creditsUsed: Credits consumed
  • message: Execution message
  • error: Error information (if failed)

Log Viewing:

Sidebar → Task Logs

Supported Filters:
• By type: Data Collection / Content Cleaning / Report Generation
• By status: All / Success / Failed / In Progress
• By time: Last 24 hours / Last 7 days / Last 30 days
• By data source: Select specific source
• By report: Select specific template

Sample Logs:
┌──────────────────┬──────────┬─────────┬──────────┬─────────┐
│ Time             │ Type     │ Status  │ Duration │ Credits │
├──────────────────┼──────────┼─────────┼──────────┼─────────┤
│ 10:00:15         │ Collect  │ Success │ 2.3s     │ 20      │
│ 10:05:30         │ Report   │ Success │ 45s      │ 150     │
│ 10:10:00         │ Collect  │ Failed  │ 0.5s     │ 0       │
│ 10:15:45         │ Ask      │ Success │ 3s       │ 15      │
└──────────────────┴──────────┴─────────┴──────────┴─────────┘

Click task to view details:
{
  "id": "task_abc123",
  "type": "COLLECT",
  "status": "FAILED",
  "message": "Connection timeout",
  "error": "Failed to connect to rsshub.app after 3 retries",
  "startedAt": "2025-10-27T10:10:00Z",
  "duration": 500,
  "creditsUsed": 0
}

2.2 Real-time Status Monitoring

Task State Machine:

PENDING (Waiting)
   ↓
PROCESSING (Running)
   ↓
SUCCESS / FAILED

State Transition Rules:
• PENDING → PROCESSING: Worker starts processing
• PROCESSING → SUCCESS: Execution succeeds
• PROCESSING → FAILED: Execution fails (timeout/error/insufficient balance)
• No state rollback (one-way flow)
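
The one-way rule can be enforced with a small transition table. This is a minimal sketch; `TRANSITIONS` and `transition` are illustrative names, not part of the product's API.

```javascript
// Allowed next states for each status. Terminal states allow nothing,
// which is what makes the flow one-way.
const TRANSITIONS = Object.freeze({
  PENDING: ["PROCESSING"],
  PROCESSING: ["SUCCESS", "FAILED"],
  SUCCESS: [],
  FAILED: [],
});

// Return a copy of the task in the next state, or throw on an illegal move.
function transition(task, next) {
  const allowed = TRANSITIONS[task.status] ?? [];
  if (!allowed.includes(next)) {
    throw new Error(`Illegal transition: ${task.status} -> ${next}`);
  }
  return { ...task, status: next };
}

const running = transition({ id: "task_abc123", status: "PENDING" }, "PROCESSING");
console.log(running.status); // PROCESSING
```

Attempting `transition(running, "PENDING")` throws, since a rollback is not in the table.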

In-Progress Tasks:

  • Task log shows "In Progress" label
  • Real-time execution duration updates
  • Support viewing Worker logs (advanced feature)

2.3 Performance Metrics Tracking

Key Metrics:

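┌────────────────────────────┬───────────────┬────────────────────┬────────────────────────────────────────┐
│ Metric                     │ Normal Range  │ Abnormal Threshold │ Impact                                 │
├────────────────────────────┼───────────────┼────────────────────┼────────────────────────────────────────┤
│ Data Collection Duration   │ 1-5 seconds   │ > 30 seconds       │ Slow data source response              │
│ Report Generation Duration │ 10-60 seconds │ > 5 minutes        │ Slow LLM response or too much content  │
│ Task Success Rate          │ > 95%         │ < 80%              │ Configuration error or service anomaly │
└────────────────────────────┴───────────────┴────────────────────┴────────────────────────────────────────┘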

3. Failure Recovery Mechanism

3.1 Automatic Retry Strategy

Retry Scenarios:

  • ✅ Network timeout
  • ✅ Temporary service unavailable (503)
  • ✅ Rate limit (429 Too Many Requests)
  • ❌ Configuration error (404 Not Found, no retry)
  • ❌ Authentication failure (401 Unauthorized, no retry)

Retry Strategy:

Exponential Backoff

1st failure → Wait 1 second → Retry
2nd failure → Wait 2 seconds → Retry
3rd failure → Wait 4 seconds → Retry
4th failure → Give up, mark task failed

Max retries: 3 times
Total timeout: 30 seconds

Sample Logs:
2025-10-27 10:00:00 [INFO] Attempting request (attempt 1)
2025-10-27 10:00:05 [WARN] Timeout, retrying in 1s (retry 1/3)
2025-10-27 10:00:11 [WARN] Timeout, retrying in 2s (retry 2/3)
2025-10-27 10:00:18 [WARN] Timeout, retrying in 4s (retry 3/3)
2025-10-27 10:00:27 [ERROR] Max retries exceeded, task failed
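
The retry rules above can be sketched as follows. The error shape (`timeout` flag, `status` code) and the function names are assumptions made for illustration, and the 30-second total timeout is not enforced in this sketch.

```javascript
const MAX_RETRIES = 3;
const BASE_DELAY_MS = 1000;

// Retryable: network timeout, 503, 429. Everything else (404, 401, ...) is not.
// The { timeout, status } error shape is an assumption for this sketch.
function isRetryable(err) {
  if (err.timeout) return true;
  return err.status === 503 || err.status === 429;
}

// Delay before retry n (1-based): 1s, 2s, 4s.
function backoffDelayMs(retryNumber) {
  return BASE_DELAY_MS * 2 ** (retryNumber - 1);
}

const realSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run `operation`, retrying retryable failures with exponential backoff.
// The 4th failure (after 3 retries) is rethrown to the caller.
async function withRetry(operation, sleep = realSleep) {
  let failures = 0;
  while (true) {
    try {
      return await operation();
    } catch (err) {
      failures += 1;
      if (!isRetryable(err) || failures > MAX_RETRIES) throw err;
      await sleep(backoffDelayMs(failures));
    }
  }
}
```

The `sleep` parameter exists so the waiting can be replaced in tests; production callers would use the default.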

3.2 Degradation Strategy

Service Degradation Scenarios:

  • Firecrawl unavailable → Auto-degrade to Browserless
  • Primary LLM model unavailable → Switch to backup model (admin configured)
  • RSSHub instance unavailable → Switch to other instances

Degradation Example (Web Scraping):

scrapePageDetail() function degradation flow:

1. Try Firecrawl (preferred)
   ↓ Failed (timeout/API error)
2. Log: "Firecrawl failed, falling back to Browserless"
   ↓
3. Try Browserless (backup)
   ↓ Success
4. Return result + indicate provider used

Task log shows:
{
  "provider": "browserless",
  "fallback": true,
  "reason": "firecrawl_timeout"
}
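
A provider fallback chain of this kind might look like the sketch below. It is not the actual scrapePageDetail() implementation: `scrapeWithFallback`, the `providers` list, and the `err.code` field are hypothetical stand-ins for the real Firecrawl and Browserless clients.

```javascript
// Try each provider in order and report which one produced the result.
// `providers` is an ordered list of { name, scrape } objects supplied by the caller.
async function scrapeWithFallback(url, providers, log = console.warn) {
  let reason = null;
  for (let i = 0; i < providers.length; i++) {
    const { name, scrape } = providers[i];
    try {
      const content = await scrape(url);
      // fallback is true whenever the preferred (first) provider was skipped.
      return { content, provider: name, fallback: i > 0, reason };
    } catch (err) {
      reason = `${name}_${err.code ?? "error"}`; // e.g. "firecrawl_timeout"
      const next = providers[i + 1];
      if (next) log(`${name} failed, falling back to ${next.name}`);
    }
  }
  throw new Error(`All providers failed (last reason: ${reason})`);
}
```

Returning `provider`, `fallback`, and `reason` alongside the content gives the task log the same three fields shown in the example above.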

3.3 Manual Intervention Capability

Admin Operations:

  • Disable unhealthy instance (RSSHub Instance Management → Disable)
  • Manually retry task (Task Logs → Click "Retry")
  • Adjust priority (RSSHub Instance Management → Modify Priority)

User Operations:

  • Manually trigger execution (Data Source/Report List → "Execute Now")
  • View failure reason (Task Logs → View Error Details)
  • Modify config and retry (Edit Data Source/Report → Save → Execute Now)

[Placeholder: failure recovery flow diagram showing the complete process from failure detection to auto-retry to degradation]

4. Data Consistency Guarantee

4.1 Transaction Protection

Critical Operations Use Transactions:

  • ✅ Credit deduction + Task creation (atomic)
  • ✅ Content saving + Deduplication (atomic)
  • ✅ Report generation + Step result saving (atomic)

Transaction Example:

// Save content and deduplicate by URL in one transaction.
// Identifiers were lost in extraction: the model name (content), the strategy
// value ('REPLACE'), and the updated fields (archived, updatedAt) are placeholders.
// sourceUrl follows the unique index named in section 4.2.
await prisma.$transaction(async (tx) => {
  // 1. Check whether this URL has already been saved
  const existing = await tx.content.findUnique({
    where: { sourceUrl: url }
  })

  // 2. Apply the deduplication strategy to the existing record
  if (existing && strategy === 'REPLACE') {
    await tx.content.update({
      where: { id: existing.id },
      data: { archived: true, updatedAt: new Date() }
    })
  }

  // 3. Save the new content
  return await tx.content.create({
    data: { sourceUrl: url, title, content }
  })
})

4.2 Concurrency Control

Prevent Data Races:

  • Optimistic Lock: Use version number control (version field)
  • Pessimistic Lock: Critical operations use FOR UPDATE (e.g., credit deduction)
  • Unique Constraints: Database-level duplicate prevention (e.g., sourceUrl index)

Concurrency Scenario Example:

User submits 2 report generation tasks simultaneously:

Task A:
1. Query credit balance: 1000
2. Deduct 200 → Balance 800
3. Create Report A
4. Commit transaction ✅

Task B:
1. Query credit balance: 800 (Task A already deducted)
2. Deduct 200 → Balance 600
3. Create Report B
4. Commit transaction ✅

Result: Both tasks succeed, balance correct (600)

Without transaction protection:
Task A and B query balance simultaneously → Both 1000
Task A deducts → Balance 800
Task B deducts → Balance 800 (Wrong! Should be 600)
Result: Credit inconsistency ❌
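
The lost-update case can be reproduced and prevented in a few lines using the optimistic `version` check. This is an in-memory sketch only: `account`, `tryDeduct`, and `deductWithRetry` are hypothetical, and a real deployment would perform the same compare-and-set inside the database.

```javascript
// In-memory stand-in for a credit row with a `version` column.
const account = { balance: 1000, version: 0 };

// Compare-and-set: the write applies only if the row is unchanged since the read.
function tryDeduct(row, snapshot, amount) {
  if (row.version !== snapshot.version) return false; // another writer got there first
  if (snapshot.balance < amount) throw new Error("Insufficient credits");
  row.balance = snapshot.balance - amount;
  row.version += 1;
  return true;
}

// Read, attempt the write, and re-read on conflict.
function deductWithRetry(row, amount, maxAttempts = 3) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const snapshot = { ...row };
    if (tryDeduct(row, snapshot, amount)) return row.balance;
  }
  throw new Error("Too many concurrent updates");
}

// Tasks A and B both read balance 1000 before either writes.
const snapshotA = { ...account };
const snapshotB = { ...account };
tryDeduct(account, snapshotA, 200); // A succeeds: balance 800, version 1
const bApplied = tryDeduct(account, snapshotB, 200); // B's stale write is rejected
if (!bApplied) deductWithRetry(account, 200); // B re-reads 800 and deducts to 600
console.log(account.balance); // 600
```

Without the version check, Task B would have written 1000 - 200 = 800 over Task A's result.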

5. System Monitoring Metrics

5.1 Key Performance Indicators (KPIs)

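┌───────────────────────┬─────────────┬─────────────┬──────────────────────┐
│ Metric                │ Target      │ Current     │ Monitoring Method    │
├───────────────────────┼─────────────┼─────────────┼──────────────────────┤
│ System Availability   │ > 99.9%     │ 99.95%      │ Health Checks        │
│ Task Success Rate     │ > 95%       │ 97.3%       │ Task Log Statistics  │
│ Average Response Time │ < 3 seconds │ 2.1 seconds │ Performance Tracking │
│ Data Consistency      │ 100%        │ 100%        │ Transaction Audit    │
└───────────────────────┴─────────────┴─────────────┴──────────────────────┘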

5.2 Alert Mechanism

Alert Triggers:

  • ✅ Task success rate < 80% (within 1 hour)
  • ✅ All RSSHub instances unavailable
  • ✅ Database connection pool exhausted
  • ✅ Redis connection failure
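
The first trigger (success rate below 80% within 1 hour) could be evaluated from task logs roughly as follows. The function name is illustrative, and counting only finished tasks by their `completedAt` time is an assumption about how the window is measured.

```javascript
const WINDOW_MS = 60 * 60 * 1000; // 1 hour
const MIN_SUCCESS_RATE = 0.8;

// True when fewer than 80% of the tasks finished in the last hour succeeded.
function shouldAlertOnSuccessRate(tasks, now = Date.now()) {
  const recent = tasks.filter(
    (t) =>
      (t.status === "SUCCESS" || t.status === "FAILED") &&
      now - Date.parse(t.completedAt) <= WINDOW_MS
  );
  if (recent.length === 0) return false; // nothing finished, nothing to judge
  const succeeded = recent.filter((t) => t.status === "SUCCESS").length;
  return succeeded / recent.length < MIN_SUCCESS_RATE;
}
```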

Alert Notifications:

  • Admin email notification
  • Red warning badge in admin panel
  • System log recording

6. User Experience Guarantee

6.1 Transparent Error Messages

User-Friendly Error Messages:

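┌──────────────────────┬───────────────────────────────────────────────────────────┐
│ Technical Error      │ User-Facing Message                                       │
├──────────────────────┼───────────────────────────────────────────────────────────┤
│ Connection timeout   │ Network connection timeout, please retry later            │
│ Insufficient credits │ Insufficient credits, please top up and retry             │
│ Invalid RSSHub route │ RSS route configuration error, please check route format  │
│ Rate limit exceeded  │ API call rate too high, please retry in 5 minutes         │
└──────────────────────┴───────────────────────────────────────────────────────────┘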
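
A lookup of this kind can be sketched as a simple map from technical error to user-facing message. The four pairs are the ones listed above; the `DEFAULT_MESSAGE` fallback and the `toUserMessage` name are assumptions added for illustration.

```javascript
// Technical error -> user-facing message, using the pairs listed above.
const USER_MESSAGES = new Map([
  ["Connection timeout", "Network connection timeout, please retry later"],
  ["Insufficient credits", "Insufficient credits, please top up and retry"],
  ["Invalid RSSHub route", "RSS route configuration error, please check route format"],
  ["Rate limit exceeded", "API call rate too high, please retry in 5 minutes"],
]);

// Assumed generic fallback for errors that have no dedicated message.
const DEFAULT_MESSAGE = "Task failed, please check Task Logs for details";

function toUserMessage(technicalError) {
  return USER_MESSAGES.get(technicalError) ?? DEFAULT_MESSAGE;
}

console.log(toUserMessage("Connection timeout"));
// Network connection timeout, please retry later
```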

6.2 Self-Service Troubleshooting Tools

Provided Tools:

  • Task Logs: View detailed execution logs and error information
  • Consumption Details: Track credit consumption and refund records
  • Health Status: View system component status (future feature)
  • Documentation Search: Quickly find solutions to common issues

7. Frequently Asked Questions

Q1: What if all RSSHub instances are unavailable?

A: The system will:

  1. Mark the task as failed
  2. Automatically refund any credits already charged
  3. Send an alert to the administrator

Recommendation: Configure multiple instances (official + self-hosted) to reduce this risk.

Q2: What if a task stays in "In Progress" status?

A: Possible reasons:

  • Worker process exception: Contact admin to check Worker status
  • Task is actually executing: Complex reports may take 5-10 minutes
  • Timeout not yet detected: The system automatically marks the task as timed out after 30 minutes
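
The 30-minute rule could be checked with a helper like the one below. This is a sketch: `isTimedOut` is a hypothetical name, and measuring from `startedAt` is an assumption, since the source does not say which timestamp the limit is measured from.

```javascript
const STUCK_AFTER_MS = 30 * 60 * 1000; // 30-minute limit from the answer above

// True when a task has been PROCESSING for longer than the limit.
function isTimedOut(task, now = Date.now()) {
  if (task.status !== "PROCESSING" || !task.startedAt) return false;
  return now - Date.parse(task.startedAt) > STUCK_AFTER_MS;
}

const stuck = { status: "PROCESSING", startedAt: "2025-10-27T10:10:00Z" };
console.log(isTimedOut(stuck, Date.parse("2025-10-27T10:45:00Z"))); // true
```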

Q3: Why do tasks sometimes retry automatically?

A: System retries in these cases:

  • Network timeout
  • Temporary service unavailable (503)
  • Rate limit (429, retry after waiting)

Won't retry for:

  • Configuration or authentication errors (404, 401)
  • Insufficient credits
  • Business logic errors (Invalid data)

Q4: How do I view system health status?

A: Current viewing methods:

  • Task Logs: View recent task success rate
  • RSSHub Instance Management (Admin): View instance status
  • Future Feature: System status page (display all component health)

Next Steps

  • URL Deduplication Technology - Learn about content deduplication mechanisms
  • Atomic Billing Mechanism - Learn about credit deduction guarantees
  • Optimization & Troubleshooting - Solve common issues