Knowledge Base Management
What is a Knowledge Base
A Knowledge Base (Library) is a container in OctoReport for categorizing and organizing content.
Core Functions:
- Content Classification: Store different topics in separate categories
- Multi-Source Aggregation: One library can link to multiple data sources, automatically collecting all their content
- Report Generation: Report templates can extract content from specified libraries for analysis
- Conversational Q&A: The Ask feature answers questions based on library content
Relationship Explanation:
- 1 library can link to multiple data sources
- 1 data source can also link to multiple libraries
- Content collected by data sources is automatically stored in all linked libraries
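The relationship rules above can be pictured with a small in-memory model. This is only a sketch for illustration: the `LinkRegistry` class and its method names are hypothetical and are not part of OctoReport.

```python
from collections import defaultdict


class LinkRegistry:
    """Toy in-memory model of library <-> data source links (hypothetical)."""

    def __init__(self):
        self.libraries_by_source = defaultdict(set)  # source name -> library names
        self.content_by_library = defaultdict(list)  # library name -> collected titles

    def link(self, source, library):
        # A source may link to many libraries, and a library to many sources.
        self.libraries_by_source[source].add(library)

    def sources_for(self, library):
        # Reverse lookup: every source currently linked to this library.
        return {s for s, libs in self.libraries_by_source.items() if library in libs}

    def collect(self, source, title):
        # Content collected by a source is stored in every linked library.
        for library in self.libraries_by_source[source]:
            self.content_by_library[library].append(title)


registry = LinkRegistry()
registry.link("36Kr Tech News", "AI Industry News")
registry.link("36Kr Tech News", "Startup Investment News")
registry.link("Machine Learning Blog", "AI Industry News")
registry.collect("36Kr Tech News", "Example article")
```

After the `collect` call, the same article appears in both libraries linked to "36Kr Tech News", which is the fan-out behavior described above.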
Placeholder: Library architecture diagram, showing relationships between data sources, libraries, and reports/Ask
ℹ️ Note
Libraries are the core of content management. Planning the library structure carefully makes report generation and Q&A more efficient.
Creating a Library
Creation Steps
- Click "Knowledge Base Management" in the left sidebar
- Click "New Library" button
- Fill in basic information:
  - Name: Library name (required)
  - Description: Purpose description (optional, recommended)
- Click "Save"
Configuration Examples
Example 1: AI Industry News Library
Example 2: Government Tender Information Library
Example 3: Competitor Analysis Library
Best Practices
- Concise, clear names: Aim for 2-8 words that make the theme immediately clear
- Detailed descriptions: Specify the library's purpose, the types of data sources linked, and the intended use
- Topic-based categorization: Don't create overly broad libraries (e.g., "All News"); subdivide by industry and topic
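The naming guidance above can be turned into a quick pre-flight check. The sketch below is an assumption-laden illustration: `check_library` is a hypothetical helper, not an OctoReport function, and the description text in the usage lines is made up for the example.

```python
def check_library(name, description=""):
    """Return a list of warnings for a library definition (illustrative rules only)."""
    warnings = []
    if not name.strip():
        # Name is the only required field.
        warnings.append("name is required")
    else:
        word_count = len(name.split())
        if not 2 <= word_count <= 8:
            warnings.append(f"name has {word_count} word(s); 2-8 recommended")
    if not description.strip():
        warnings.append("description is optional but recommended")
    return warnings


# Hypothetical description text, used only to exercise the check.
ok = check_library("AI Industry News", "AI coverage from RSS and search sources, used for weekly reports")
too_short = check_library("News")
```

A word count cannot detect an overly broad topic ("All News" has two words and passes), so the topic-based rule still needs a human review.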
Linking Data Sources
Method 1: Link from Library Page (Recommended)
- Enter library details page
- Click "Link Data Source" button
- Select data source from dropdown list
- Click "Confirm"
Method 2: Link from Data Source Page
- Go to "Data Source Management"
- When creating or editing a data source, select target library in "Linked Libraries" field
- Save data source
Many-to-Many Relationships
Libraries and data sources support many-to-many linking:
Scenario 1: One data source linked to multiple libraries
Data Source: "36Kr Tech News"
├─ Linked Library: "AI Industry News"
├─ Linked Library: "Startup Investment News"
└─ Linked Library: "Product Design Inspiration"
Scenario 2: One library linked to multiple data sources
Library: "AI Industry News"
├─ Linked Data Source: "36Kr Tech News" (RSS)
├─ Linked Data Source: "Machine Learning Blog" (RSS)
├─ Linked Data Source: "Google AI News" (Google News)
└─ Linked Data Source: "AI Keyword Search" (Search Source)
⚠️ Note
- Linking is bidirectional: creating the link from either the library page or the data source page has the same effect
- Unlinking does not delete content that was already collected; it remains in the library
- A newly linked data source only contributes content collected after the link is made; there is no historical backfill
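The three notes above can be modelled in a few lines. This is a rough sketch under stated assumptions: the `Library` class is hypothetical, the dates are invented, and "no historical backfill" is approximated by comparing an item's timestamp with the time the link was created.

```python
from datetime import datetime


class Library:
    """Hypothetical model of one library's links and stored content."""

    def __init__(self, name):
        self.name = name
        self.linked_since = {}  # source name -> time the link was created
        self.items = []         # (source name, title) pairs

    def link(self, source, when):
        self.linked_since[source] = when

    def unlink(self, source):
        # Removing the link leaves already collected items in place.
        self.linked_since.pop(source, None)

    def receive(self, source, title, published):
        # Accept content only from linked sources, and only content that
        # appeared after the link was made (no historical backfill).
        since = self.linked_since.get(source)
        if since is not None and published >= since:
            self.items.append((source, title))
            return True
        return False


lib = Library("AI Industry News")
lib.link("36Kr Tech News", datetime(2024, 1, 10))
older = lib.receive("36Kr Tech News", "Older article", datetime(2024, 1, 5))
newer = lib.receive("36Kr Tech News", "Newer article", datetime(2024, 1, 12))
lib.unlink("36Kr Tech News")
after_unlink = lib.receive("36Kr Tech News", "Later article", datetime(2024, 1, 15))
```

Only "Newer article" ends up in the library, and it stays there after the unlink.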
Placeholder: Library and data source linking diagram, showing many-to-many relationships
Viewing and Filtering Content
Content List
Enter library details page to see all collected content:
Display Information:
- Title: Content title
- Source: Which data source it came from
- Collection Time: When content was collected
- Status: Whether the content has been cleaned and whether it has expired
Sorting Options:
- Default by "Collection Time" descending (newest first)
- Can switch to "Title" sorting
Filtering Features
Filter by Data Source:
- Click "Data Source" dropdown menu
- Select specific data source to show only its content
Filter by Time Range:
- Click "Time Range" selector
- Choose preset ranges (last 7/30/90 days) or custom dates
Filter by Cleaning Status:
- Cleaned: Summary and keywords have already been extracted by the LLM
- Uncleaned: Retains original HTML content
- All: Show all content
Filter by Expiration Status:
- Valid Content: Current latest version (default)
- Expired Content: Old versions marked as expired due to URL deduplication
- All: Show all content
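The four filters combine as a logical AND. The sketch below illustrates that behavior on an in-memory list; `ContentItem` and `filter_items` are hypothetical names, and the field layout is an assumption based on the list columns described earlier.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ContentItem:
    title: str
    source: str
    collected_at: datetime
    cleaned: bool = False
    expired: bool = False


def filter_items(items, source=None, since=None, cleaned=None, expired=False):
    """Apply the four filters; None means "All" for that filter."""
    result = []
    for item in items:
        if source is not None and item.source != source:
            continue
        if since is not None and item.collected_at < since:
            continue
        if cleaned is not None and item.cleaned != cleaned:
            continue
        if expired is not None and item.expired != expired:
            continue
        result.append(item)
    # Default list order: newest first by collection time.
    return sorted(result, key=lambda i: i.collected_at, reverse=True)


now = datetime(2024, 3, 1)
items = [
    ContentItem("A", "36Kr Tech News", now - timedelta(days=2), cleaned=True),
    ContentItem("B", "Machine Learning Blog", now - timedelta(days=40)),
    ContentItem("C", "36Kr Tech News", now - timedelta(days=1), expired=True),
]
recent_valid = filter_items(items, since=now - timedelta(days=7))
everything = filter_items(items, expired=None)
```

The default `expired=False` mirrors the "Valid Content" default, so expired versions only appear when that filter is set to "All" or "Expired Content".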
Content Details
Click any content title to view detailed information:
Basic Information:
- Title, source URL, collection time, data source name
Content Preview:
- If cleaned: Shows summary and keywords
- If uncleaned: Shows original HTML (can click "Trigger Cleaning")
Actions:
- View Original: Jump to original URL
- Trigger Cleaning: Manually trigger LLM cleaning (consumes credits)
- Delete: Remove from library (doesn't affect other libraries)
Usage Tips
- Regular quality checks: Check for irrelevant content, adjust data source configuration
- Manual cleaning: For important content, manually trigger cleaning for better summaries
- Use filtering: Before generating reports, use filters to confirm library has sufficient relevant content
Management Operations
Edit Library
Click "Edit" button on library details page to modify name or description.
Delete Library
Click "Delete" button on library list page.
⚠️ Warning: Deleting a library permanently deletes all of its content. Linked data sources are unaffected.
Clear Content
Click "Clear Content" to empty the library while keeping its configuration. This is suitable for testing or for restarting collection.
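The difference between clearing and deleting can be sketched as follows. The `Workspace` class is a hypothetical stand-in, and treating links as part of the "configuration" that survives a clear is an assumption, not documented behavior.

```python
class Workspace:
    """Hypothetical in-memory stand-in for libraries and data sources."""

    def __init__(self):
        self.sources = set()
        self.libraries = {}  # name -> {"description", "sources", "items"}

    def add_library(self, name, description=""):
        self.libraries[name] = {"description": description, "sources": set(), "items": []}

    def link(self, library, source):
        self.sources.add(source)
        self.libraries[library]["sources"].add(source)

    def clear_content(self, library):
        # Empties the content but keeps name, description, and links (assumed).
        self.libraries[library]["items"].clear()

    def delete_library(self, library):
        # Removes the library and its content; data sources themselves remain.
        del self.libraries[library]


ws = Workspace()
ws.add_library("AI Industry News", "Example description")
ws.link("AI Industry News", "36Kr Tech News")
ws.libraries["AI Industry News"]["items"].append("Example article")

ws.clear_content("AI Industry News")
links_after_clear = set(ws.libraries["AI Industry News"]["sources"])

ws.delete_library("AI Industry News")
```

After `delete_library`, the library entry is gone while "36Kr Tech News" is still present in `ws.sources`.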
Best Practices
✅ Subdivide Libraries by Topic
Recommended:
Library 1: "AI Research Progress"
Library 2: "AI Business Applications"
Library 3: "AI Policy & Regulations"
Not Recommended:
Library: "All AI-Related Content"
Reason: Subdivided libraries are easier to manage and use, and report generation can extract precisely the relevant content.
✅ Properly Use Many-to-Many Linking
Scenario: Same data source may cover multiple topics
Example:
Data Source: "Tech Media General News"
├─ Linked Library: "AI Industry News" (AI articles)
├─ Linked Library: "Blockchain Updates" (blockchain articles)
└─ Linked Library: "Tech Company Funding" (financing news)
Benefit: Collect once, use multiple times, save costs.
✅ Regular Checks and Optimization
Checklist:
- Weekly: check content volume to confirm data sources are working normally
- Monthly: check content quality and remove irrelevant content
- Based on usage frequency, consider merging or splitting libraries
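The weekly volume check could be scripted against exported content metadata. This is a minimal sketch, assuming the items are available as (source name, collection time) pairs; `silent_sources` is a hypothetical helper and the dates are invented.

```python
from collections import Counter
from datetime import datetime, timedelta


def silent_sources(items, linked_sources, now, days=7):
    """Return linked sources with no items collected in the last `days` days."""
    cutoff = now - timedelta(days=days)
    recent = Counter(source for source, collected_at in items if collected_at >= cutoff)
    # Counter returns 0 for sources that never appear in the recent window.
    return sorted(s for s in linked_sources if recent[s] == 0)


now = datetime(2024, 3, 8)
items = [
    ("36Kr Tech News", now - timedelta(days=1)),
    ("Machine Learning Blog", now - timedelta(days=20)),
]
quiet = silent_sources(items, ["36Kr Tech News", "Machine Learning Blog"], now)
```

A source that shows up in `quiet` has delivered nothing in the past week and is worth checking in its configuration.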
FAQ
Q1: What's the difference between libraries and data sources?
- Data Source: Defines where to collect content from (search, RSS, email, etc.)
- Library: Defines how to categorize and use content (report generation, Ask Q&A)
Q2: Does deleting a library affect data sources?
No. The data sources continue collecting content, but there is no longer a linked library to store it in.
Q3: Why does content appear duplicated?
Possible reasons: the same data source is linked to multiple libraries (this is normal), or the deduplication strategy is set to UPDATE, in which case old versions are kept and marked as expired.
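The UPDATE case can be illustrated with a short sketch of URL-based deduplication. The function name, the dictionary fields, and the example URL are all hypothetical; only the "mark the old version as expired" behavior comes from the description above.

```python
def store_with_update(library_items, new_item):
    """Add new_item; mark any earlier valid item with the same URL as expired."""
    for item in library_items:
        if item["url"] == new_item["url"] and not item["expired"]:
            item["expired"] = True
    library_items.append({**new_item, "expired": False})
    return library_items


items = []
store_with_update(items, {"url": "https://example.com/a", "title": "v1"})
store_with_update(items, {"url": "https://example.com/a", "title": "v2"})
valid_titles = [i["title"] for i in items if not i["expired"]]
```

Both versions remain stored, which is why the content looks duplicated when the expiration filter is set to "All"; with the default "Valid Content" filter only the latest version is shown.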
Next Steps
- Report Generation - Use libraries to generate reports
- Ask - Q&A based on libraries
- Configuration Tips - Optimize library configuration