Search Sources
What Are Search Sources
Search sources query search engine APIs for your keywords and automatically collect the results. They are suited to monitoring specific topics, tracking brand mentions, and discovering industry trends.
Core Advantages:
- Actively discover content (rather than passive subscription)
- Support complex keyword combinations
- Optional detail page scraping
Supported Search Engines
1. Google Search 🔍
Features:
- World's largest search engine with broadest coverage
- Uses Google Custom Search API
- Supports detail page scraping (requires additional configuration)
Use Cases:
- Global news monitoring
- English content search
- Broad topic coverage
Configuration Requirements:
- Google API Key (obtain from the Google Cloud Console)
- Search Engine ID (create a custom search engine to get one)
Cost: ~10 credits/item
2. Jina AI Search 🤖
Features:
- AI-driven semantic search
- Focuses on high-quality content
- Supports detail page scraping
Use Cases:
- Technical documentation search
- High-quality content filtering
- Semantic relevance matching
Configuration Requirements:
- Jina API Key (get from https://jina.ai)
Cost: ~10 credits/item
3. Firecrawl Search 🔥
Features:
- Professional web scraping service
- Native detail scraping support (returns Markdown content directly during search)
- Returns structured Markdown format
Use Cases:
- Scenarios requiring complete content
- Structured data extraction
- High-quality content cleaning
Configuration Requirements:
- Firecrawl API Key (get from https://firecrawl.dev)
Cost:
- Search: ~10 credits/item
- Detail scraping: Included in search (no extra charge)
💡 Tip: Firecrawl is the only engine that returns Markdown directly during search, so no secondary scraping is needed.
4. Metaso Search (秘塔AI) 🌟
Features:
- Chinese AI search engine
- Focuses on Chinese content
- Supports both web search and academic search modes
- Does not support direct detail scraping (only returns summaries)
Use Cases:
- Chinese content monitoring
- Domestic news search
- Academic literature discovery
Configuration Requirements:
- Metaso API Key (get from https://metaso.cn)
Search Scope:
- webpage - Web search (default)
- academic - Academic search
Cost: ~3 credits per search
[Figure: comparison cards for the 4 search engines]
Configuration Parameters
1. Keywords
Required, search keywords or phrases.
Examples:
- "人工智能 大模型" (Chinese for "artificial intelligence, large model")
- "OpenAI GPT-4"
- "renewable energy policy"
Tips:
- Use quotes for an exact match: "exact phrase"
- Use a space for an AND relationship: AI GPT
- Combine multiple keywords for better relevance
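The tips above can be combined in a small helper. This is a minimal sketch, not part of the product: `build_query` is a hypothetical name, and it relies only on the quoting and space-as-AND behavior described in the tips.

```python
def build_query(terms, exact_phrases=()):
    """Quote exact phrases, then join everything with spaces (AND)."""
    parts = [f'"{phrase.strip()}"' for phrase in exact_phrases if phrase.strip()]
    parts += [term.strip() for term in terms if term.strip()]
    return " ".join(parts)


# One exact phrase plus two plain keywords.
query = build_query(["policy", "2024"], exact_phrases=["climate change"])
print(query)
```

The resulting string can be pasted into the Keywords field as-is.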
2. Max Results
Optional, maximum number of results per search.
Default: 10
Range:
- Google Search: 1-10 (Google API limit)
- Jina/Firecrawl/Metaso: 1-50
Cost Tip: More results = more credits consumed (charged per item)
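As an illustration of the ranges above, the following sketch keeps a requested value inside the per-engine limits. The function name and the engine labels `google` and `jina` are assumptions, and whether the real system clamps or rejects out-of-range values is not stated in this guide.

```python
# Ranges and default copied from the list above; engine labels are illustrative.
MAX_RESULTS_RANGE = {
    "google": (1, 10),     # Google API limit
    "jina": (1, 50),
    "firecrawl": (1, 50),
    "metaso": (1, 50),
}
DEFAULT_MAX_RESULTS = 10


def clamp_max_results(engine, requested=None):
    """Return the default when nothing is requested, else clamp to the range."""
    low, high = MAX_RESULTS_RANGE[engine]
    if requested is None:
        return DEFAULT_MAX_RESULTS
    return max(low, min(high, requested))


print(clamp_max_results("google", 25))  # capped by the Google range
print(clamp_max_results("jina", 25))    # already inside the range
```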
3. Fetch Detail
Optional, whether to scrape detail page content of search results.
Default:
- Google/Jina/Firecrawl: true (scrape by default)
- Metaso: Not supported (always returns summary)
How It Works:
- Firecrawl: Returns Markdown directly during search (no extra overhead)
- Google/Jina: After search, uses the Firecrawl → Browserless fallback chain for secondary scraping
- Metaso: Only returns snippet, doesn't support detail scraping
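The per-engine behavior described above can be summarized as a small decision function. This is a sketch under assumptions: the function name, the engine labels, and the returned labels are hypothetical and only mirror the rules in this section.

```python
def plan_detail_fetch(engine, fetch_detail=True):
    """Describe how detail content would be obtained for one engine."""
    if engine == "metaso":
        return "snippet_only"        # detail scraping is not supported
    if not fetch_detail:
        return "snippet_only"        # detail scraping switched off
    if engine == "firecrawl":
        return "inline_markdown"     # Markdown arrives with the search results
    return "secondary_scrape"        # Google/Jina: Firecrawl -> Browserless chain


for engine in ("google", "jina", "firecrawl", "metaso"):
    print(engine, plan_detail_fetch(engine))
```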
Detail Scraping Mechanism
Scraping Strategy
Firecrawl First + Browserless Fallback:
- First, try the Firecrawl v2 Scrape API
- If that fails, fall back automatically to Browserless (headless Chrome)
- If that also fails, keep the original snippet
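The fallback order above can be sketched as a loop over scrapers. This is not the system's actual implementation: the scraper callables are stand-ins supplied by the caller, and treating any exception or empty result as a failure is an assumption.

```python
from typing import Callable, Optional, Sequence, Tuple

Scraper = Callable[[str], Optional[str]]


def scrape_with_fallback(url, snippet, scrapers: Sequence[Tuple[str, Scraper]]):
    """Try each scraper in order; keep the snippet if all of them fail."""
    for name, scrape in scrapers:
        try:
            content = scrape(url)
        except Exception:
            continue  # assumption: any error counts as a failure
        if content:
            return name, content
    return "snippet", snippet


def failing(url):
    # Stand-in for a Firecrawl call that fails.
    raise RuntimeError("simulated failure")


def stub_browserless(url):
    # Stand-in for a successful Browserless call.
    return f"<content of {url}>"


chain = [("firecrawl", failing), ("browserless", stub_browserless)]
print(scrape_with_fallback("https://example.com", "short summary", chain))
```

Passing the scrapers in as callables keeps the order explicit and makes the chain easy to exercise with stubs.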
CAPTCHA Detection
The system automatically detects CAPTCHA pages to avoid saving invalid content:
- Detection keywords: "verify you are human", "captcha", "robot check"
- When a CAPTCHA is detected, the summary is used instead of the detail content
- Such pages are not counted in the scraping success stats
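A keyword check of this kind might look like the sketch below. The function names are hypothetical, and case-insensitive substring matching is an assumption; this guide does not say how the keywords are matched.

```python
CAPTCHA_KEYWORDS = ("verify you are human", "captcha", "robot check")


def looks_like_captcha(page_text):
    """Return True if any detection keyword appears in the page text."""
    lowered = page_text.lower()  # assumption: matching ignores case
    return any(keyword in lowered for keyword in CAPTCHA_KEYWORDS)


def choose_content(detail_text, snippet):
    """Use the summary when the detail page is missing or is a CAPTCHA page."""
    if detail_text is None or looks_like_captcha(detail_text):
        return snippet
    return detail_text


print(choose_content("Please verify you are human to continue", "Summary text"))
```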
Concurrency Limits
To avoid API rate limiting, the system controls concurrency automatically:
- Firecrawl: Max 5 concurrent requests
- Browserless: Max 3 concurrent requests
- Adjustable in the admin panel (/admin/system-config)
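One common way to enforce limits like these is a semaphore per backend. The sketch below uses Python's `asyncio.Semaphore` with the limits quoted above; the function names are hypothetical and the real system may use a different mechanism.

```python
import asyncio

LIMITS = {"firecrawl": 5, "browserless": 3}  # values from the list above


async def run_limited(backend, jobs, limit=None):
    """Run coroutine factories with at most `limit` in flight at once."""
    semaphore = asyncio.Semaphore(limit or LIMITS[backend])

    async def guarded(job):
        async with semaphore:
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))


async def demo():
    in_flight = 0
    peak = 0

    async def fake_scrape():
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # stand-in for a network call
        in_flight -= 1
        return "ok"

    results = await run_limited("browserless", [fake_scrape] * 10)
    return len(results), peak  # number of results, highest observed concurrency


print(asyncio.run(demo()))
```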
Statistics
After each search completes, detail scraping stats are shown:
Detail Scraping Stats:
- Total: 10
- Success: 8
- Failed: 2
- Firecrawl: 6
- Browserless: 2
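To show how the figures relate, here is a sketch that tallies the same counters from a list of per-result outcomes. The outcome labels and the function name are assumptions; any label other than the two scrapers is treated as a failure.

```python
from collections import Counter


def summarize(outcomes):
    """outcomes: one label per result, e.g. 'firecrawl', 'browserless', 'failed'."""
    counts = Counter(outcomes)
    success = counts["firecrawl"] + counts["browserless"]
    return {
        "Total": len(outcomes),
        "Success": success,
        "Failed": len(outcomes) - success,
        "Firecrawl": counts["firecrawl"],
        "Browserless": counts["browserless"],
    }


# Same numbers as the stats shown above.
outcomes = ["firecrawl"] * 6 + ["browserless"] * 2 + ["failed"] * 2
stats = summarize(outcomes)
print("Detail Scraping Stats: " + " - ".join(f"{k}: {v}" for k, v in stats.items()))
```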
[Figure: detail scraping fallback flow diagram]
Configuration Examples
Example 1: Google Search + Fetch Details
Description:
- Search keywords: renewable energy policy 2024
- Return 10 results
- Auto-scrape detail pages for each result
- Use Firecrawl → Browserless fallback chain
Cost Estimate:
- Search: 10 items × 10 credits = 100 credits
- Detail scraping: Included in search
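The per-item arithmetic above can be written as a one-line estimate. This sketch uses the approximate per-item rates quoted in this guide; the function name and engine labels are assumptions, and Metaso is left out because its cost is quoted differently.

```python
# Approximate per-item rates quoted in this guide.
CREDITS_PER_ITEM = {"google": 10, "jina": 10, "firecrawl": 10}


def estimate_search_cost(engine, items):
    """Estimated credits for one search returning `items` results."""
    return CREDITS_PER_ITEM[engine] * items


print(estimate_search_cost("google", 10))  # 10 items x 10 credits = 100
```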
Example 2: Firecrawl Search (Recommended)
Description:
- Use Firecrawl search (select subtype firecrawl)
- Returns Markdown content directly during search
- No secondary scraping needed, faster
- Highest content quality (professional cleaning)
Example 3: Metaso AI Search (Chinese)
Description:
- Use Metaso AI search (select subtype metaso)
- Web search mode (webpage)
- Only returns summary (doesn't support fetch_detail)
- Suitable for Chinese content monitoring
Example 4: Academic Search
Description:
- Use Metaso AI academic search
- Search scope: academic (academic mode)
- Returns papers and academic articles
- Suitable for research and literature review
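For orientation, a Metaso configuration could be assembled and serialized roughly as follows. Only the values `metaso`, `webpage`, and `academic` come from this guide; the key names (`subtype`, `keywords`, `search_scope`, `max_results`), the function name, and the sample keywords are assumptions, so check them against the actual config schema.

```python
import json

ALLOWED_SCOPES = {"webpage", "academic"}  # scope values listed for Metaso


def metaso_config(keywords, scope="webpage", max_results=10):
    """Build a config dict; key names are hypothetical, not the real schema."""
    if scope not in ALLOWED_SCOPES:
        raise ValueError(f"unknown search scope: {scope!r}")
    return {
        "subtype": "metaso",
        "keywords": keywords,
        "search_scope": scope,
        "max_results": max_results,
    }


config = metaso_config("renewable energy policy", scope="academic")
print(json.dumps(config, indent=2))
```

There is no fetch_detail entry because Metaso returns summaries only.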
Best Practices
✅ Keyword Optimization
Use Exact Phrases:
- ❌ AI (too broad)
- ✅ "GPT-4 Turbo release notes" (exact match)
Combine Multiple Keywords:
- ❌ news (too many results)
- ✅ "climate change" policy 2024 (multiple keywords)
✅ Cost Optimization
Disable Unnecessary Detail Scraping:
- Only need title and summary → fetch_detail: false
- Saves ~50% of the cost
Choose Appropriate Search Engine:
- Chinese content → Metaso (~3 credits per search)
- English content + details → Firecrawl (~10 credits/item)
- Broad coverage → Google (~10 credits/item)
✅ Schedule Strategy
News Monitoring:
- Schedule: Every 12 hours
- Dedup strategy: KEEP_OLD (avoid duplicate scraping)
Keyword Tracking:
- Schedule: 1-2 times daily
- Dedup strategy: UPDATE (get latest version)
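The difference between the two dedup strategies can be illustrated with a toy store. This sketch assumes results are keyed by URL and that KEEP_OLD ignores a repeated item while UPDATE overwrites it, which is inferred from the strategy names; the function name and return labels are hypothetical.

```python
def apply_dedup(store, item, strategy):
    """Merge one search result into `store`, a dict keyed by URL."""
    url = item["url"]
    if url not in store:
        store[url] = item
        return "inserted"
    if strategy == "KEEP_OLD":
        return "skipped"      # keep the stored copy, ignore the new one
    if strategy == "UPDATE":
        store[url] = item     # overwrite with the latest version
        return "updated"
    raise ValueError(f"unknown dedup strategy: {strategy!r}")


store = {}
first = {"url": "https://example.com/a", "title": "v1"}
second = {"url": "https://example.com/a", "title": "v2"}
print(apply_dedup(store, first, "KEEP_OLD"))
print(apply_dedup(store, second, "KEEP_OLD"), store[first["url"]]["title"])
print(apply_dedup(store, second, "UPDATE"), store[first["url"]]["title"])
```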
⚠️ Common Issues
Issue 1: Search Results Fewer Than Expected
Reasons:
- Keywords too specific
- Search engine API limits
Solutions:
- Broaden keywords
- Try different search engines
Issue 2: High Detail Scraping Failure Rate
Reasons:
- Target site has anti-scraping mechanisms
- CAPTCHA verification present
Solutions:
- Use Firecrawl search (higher bypass rate)
- Disable fetch_detail and use the summary only
Issue 3: Duplicate Content
Reasons:
- Schedule too frequent
- Dedup strategy misconfigured
Solutions:
- Reduce search frequency (once daily)
- Use KEEP_OLD dedup strategy
Next Steps
- RSS Sources - Subscribe to website updates
- Web & Email Sources - Scrape specific pages
- Sources Overview - Learn about all source types