llm-public-opinion-analytics-assistant

Multi-platform hot search crawler and LLM-powered public opinion analysis system with clustering, sentiment analysis, and multi-channel push notifications

---
name: llm-public-opinion-analytics-assistant
description: Multi-platform hot search crawler and LLM-powered public opinion analysis system with clustering, sentiment analysis, and multi-channel push notifications
triggers:
  - set up public opinion monitoring system
  - analyze hot topics from multiple platforms
  - configure sentiment analysis for social media
  - create hot search crawler with push notifications
  - implement topic clustering and sentiment tracking
  - build multi-platform trending data aggregator
  - deploy LLM-based opinion analytics
  - monitor and analyze public sentiment trends
---

# LLM-Based Intelligent Public Opinion Analytics Assistant

> Skill by [ara.so](https://ara.so) — Data Skills collection.

## Overview

This project is an intelligent public opinion analysis assistant that combines real-time data from **26 trending lists across 15 mainstream platforms** with large language model (LLM) analysis capabilities. It provides conversational hot search queries, topic-specific searches, topic clustering analysis, and sentiment analysis through a web interface. The system supports keyboard shortcuts for crawler control, multi-platform data retrieval with direct navigation, and multi-channel hot topic push notifications (email, WeChat, Enterprise WeChat, Telegram).

## Key Features

- **Multi-Platform Data Collection**: Crawls 26 trending lists from 15 platforms
- **LLM-Powered Analysis**: Topic clustering, sentiment analysis, and trend detection
- **Conversational Interface**: Natural language queries for data exploration
- **Video Content Analysis**: Extracts information even from video-based news
- **Multi-Channel Notifications**: Push notifications by email, WeChat, Enterprise WeChat (WeChat Work), and Telegram bot
- **Crawler Control**: Quick start/stop via keyboard shortcuts
- **Database Storage**: MySQL-based data persistence

## Installation

### Prerequisites

**Browser Driver Setup** (Required for news detail extraction):

1. **Check browser version**:
   - Open Edge/Chrome → Settings → About
   - Note your version (e.g., `115.0.5790.102`)

2. **Download matching driver**:
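   - Chrome: https://chromedriver.chromium.org/ (for Chrome 115 and later, drivers are published at https://googlechromelabs.github.io/chrome-for-testing/)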
   - Edge: https://developer.microsoft.com/en-us/microsoft-edge/tools/webdriver/

3. **Install driver**:
   ```bash
   # Linux/macOS
   sudo mv chromedriver /usr/local/bin/
   sudo chmod +x /usr/local/bin/chromedriver
   
   # Windows: Add driver directory to PATH
   # e.g., C:\WebDriver\chromedriver.exe
   ```

4. **Verify installation**:
   ```bash
   chromedriver --version
   ```
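
The same check can be scripted. The sketch below uses only the standard library to look for a driver on `PATH` and print its version. The helper names `find_driver` and `driver_version` are illustrative and not part of the project.

```python
import shutil
import subprocess
from typing import Optional


def find_driver(names=("chromedriver", "msedgedriver")) -> Optional[str]:
    """Return the path of the first driver found on PATH, or None."""
    for name in names:
        path = shutil.which(name)
        if path:
            return path
    return None


def driver_version(path: str) -> str:
    """Run `<driver> --version` and return the first line it prints."""
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, timeout=10
    )
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    driver = find_driver()
    if driver is None:
        print("No browser driver found on PATH")
    else:
        print(f"{driver}: {driver_version(driver)}")
```

Compare the major version it prints with the browser version noted in step 1; a mismatch is a common cause of Selenium start-up failures.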

### Environment Setup

```bash
# Clone repository
git clone https://github.com/hmmnxkl/LLM-Based-Intelligent-Public-Opinion-Analytics-Assistant.git
cd LLM-Based-Intelligent-Public-Opinion-Analytics-Assistant

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

### Database Configuration

1. **Install MySQL** (8.0+ recommended)

2. **Create database and tables**:
   ```python
   # Reference init.py for schema
   import mysql.connector
   
   conn = mysql.connector.connect(
       host='localhost',
       user='your_user',
       password='your_password'
   )
   cursor = conn.cursor()
   
   # Create database
   cursor.execute("CREATE DATABASE IF NOT EXISTS hotsearch_db CHARACTER SET utf8mb4")
   cursor.execute("USE hotsearch_db")
   
   # Create tables (see init.py for full schema)
   cursor.execute("""
   CREATE TABLE IF NOT EXISTS hot_searches (
       id INT AUTO_INCREMENT PRIMARY KEY,
       platform VARCHAR(50),
       title VARCHAR(500),
       url VARCHAR(1000),
       `rank` INT,  -- RANK is a reserved word in MySQL 8.0, so the column name must be quoted
       heat_value VARCHAR(100),
       timestamp DATETIME,
       content TEXT,
       sentiment VARCHAR(50),
       INDEX idx_platform_timestamp (platform, timestamp)
   ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
   """)
   
   conn.commit()
   cursor.close()
   conn.close()
   ```

### Configuration Files

**Create `.env` file in project root**:
```bash
# Database Configuration
MYSQL_HOST=localhost
MYSQL_PORT=3306
MYSQL_USER=your_user
MYSQL_PASSWORD=your_password
MYSQL_DATABASE=hotsearch_db

# LLM Configuration (OpenAI-compatible API)
OPENAI_API_KEY=your_api_key_here
OPENAI_BASE_URL=https://api.openai.com/v1
OPENAI_MODEL=gpt-4

# Or use Huawei Pangu Model (recommended for Chinese)
# PANGU_API_KEY=your_pangu_key
# PANGU_BASE_URL=your_pangu_endpoint

# Push Notification Channels
# Email (SMTP)
SMTP_HOST=smtp.gmail.com
SMTP_PORT=587
SMTP_USER=your_email@gmail.com
SMTP_PASSWORD=your_app_password
EMAIL_RECIPIENTS=recipient1@example.com,recipient2@example.com

# Enterprise WeChat Bot
WECHAT_WORK_WEBHOOK=https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=your_key

# Telegram Bot
TELEGRAM_BOT_TOKEN=your_bot_token
TELEGRAM_CHAT_ID=your_chat_id
```
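
Once these variables are in the process environment (for example via `load_dotenv()`, as the sample test script does), they still need type conversion before use. The following is a minimal sketch of that step; `load_settings` and the shape of the returned dict are assumptions made for illustration, not the project's own loader.

```python
import os
from typing import Mapping


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the variables from .env into a dict with typed values."""
    recipients = env.get("EMAIL_RECIPIENTS", "")
    # Variables that must all be non-empty for a channel to count as configured
    required = {
        "email": ("SMTP_HOST", "SMTP_USER", "SMTP_PASSWORD"),
        "wechat_work": ("WECHAT_WORK_WEBHOOK",),
        "telegram": ("TELEGRAM_BOT_TOKEN", "TELEGRAM_CHAT_ID"),
    }
    return {
        "mysql": {
            "host": env.get("MYSQL_HOST", "localhost"),
            "port": int(env.get("MYSQL_PORT", "3306")),
            "user": env.get("MYSQL_USER", ""),
            "password": env.get("MYSQL_PASSWORD", ""),
            "database": env.get("MYSQL_DATABASE", "hotsearch_db"),
        },
        "smtp_port": int(env.get("SMTP_PORT", "587")),
        # Comma-separated string -> list of addresses, ignoring blanks
        "email_recipients": [r.strip() for r in recipients.split(",") if r.strip()],
        "channels": [
            name for name, keys in required.items() if all(env.get(k) for k in keys)
        ],
    }
```

Passing a plain dict instead of `os.environ` makes the function easy to check without touching real credentials.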

**Crawler Configuration** (`hotsearchcrawler/settings.py`):
```python
# MySQL settings
MYSQL_HOST = 'localhost'
MYSQL_PORT = 3306
MYSQL_USER = 'your_user'
MYSQL_PASSWORD = 'your_password'
MYSQL_DATABASE = 'hotsearch_db'

# Optional: Platform-specific cookies
COOKIES = {
    'weibo': 'your_weibo_cookie',
    'douyin': 'your_douyin_cookie'
}

# Concurrent requests
CONCURRENT_REQUESTS = 16
DOWNLOAD_DELAY = 1
```
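
The `COOKIES` values above look like raw `Cookie` header strings copied from a browser session. Scrapy requests accept cookies as a dict, so a spider would need to split such a string first. How the project does this is not shown here; the helper below is a hypothetical sketch of that conversion.

```python
def cookie_string_to_dict(raw: str) -> dict:
    """Split 'k1=v1; k2=v2' into {'k1': 'v1', 'k2': 'v2'}."""
    cookies = {}
    for part in raw.split(";"):
        # Split on the first '=' only, since cookie values may contain '='
        name, sep, value = part.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies
```

The result can be passed as the `cookies` argument of `scrapy.Request`.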

## Usage

### Starting the System

**1. Launch the web application**:
```bash
python app.py
```
Access at: `http://localhost:5000`

**2. Start crawlers** (via web interface or CLI):
```bash
# Manual crawler start for testing
python run_spiders.py

# Or test individual spider
cd hotsearchcrawler
scrapy crawl weibo_spider
scrapy crawl bilibili_spider
```

### Core API Usage

#### Conversational Query Interface

```python
import os

from hotsearch_analysis_agent.agent import OpinionAnalysisAgent

# Initialize agent
agent = OpinionAnalysisAgent(
    api_key=os.getenv('OPENAI_API_KEY'),
    base_url=os.getenv('OPENAI_BASE_URL'),
    model=os.getenv('OPENAI_MODEL', 'gpt-4')
)

# Query hot searches
response = agent.query("Show me top trending topics about AI")
print(response['analysis'])

# Topic clustering
clusters = agent.cluster_topics("人工智能", days=7)
for cluster in clusters:
    print(f"Cluster: {cluster['theme']}")
    print(f"Articles: {len(cluster['articles'])}")
    print(f"Sentiment: {cluster['sentiment']}")

# Sentiment analysis
sentiment = agent.analyze_sentiment("特定主题关键词", platform="weibo")
print(f"Positive: {sentiment['positive']}%")
print(f"Negative: {sentiment['negative']}%")
print(f"Neutral: {sentiment['neutral']}%")
```
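
The agent's internals are not documented here. As an illustration of how the three percentages relate to per-item labels (such as the `sentiment` column in `hot_searches`), here is a small sketch; `sentiment_percentages` is a hypothetical helper, not a project function.

```python
from collections import Counter

LABELS = ("positive", "negative", "neutral")


def sentiment_percentages(labels):
    """Turn a list of sentiment labels into rounded percentages per label."""
    counts = Counter(labels)
    total = sum(counts[label] for label in LABELS)
    if total == 0:
        # No recognised labels: report zeros rather than dividing by zero
        return {label: 0.0 for label in LABELS}
    return {label: round(100 * counts[label] / total, 1) for label in LABELS}
```

Labels outside the three expected values are ignored, so the percentages always describe the recognised items only.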

#### Direct Database Access

```python
import os
from datetime import datetime, timedelta

import mysql.connector

conn = mysql.connector.connect(
    host=os.getenv('MYSQL_HOST'),
    user=os.getenv('MYSQL_USER'),
    password=os.getenv('MYSQL_PASSWORD'),
    database=os.getenv('MYSQL_DATABASE')
)

cursor = conn.cursor(dictionary=True)

# Get recent hot searches
cursor.execute("""
    SELECT platform, title, heat_value, url, timestamp
    FROM hot_searches
    WHERE timestamp >= %s
    ORDER BY `rank` ASC
    LIMIT 50
""", (datetime.now() - timedelta(hours=24),))

hot_topics = cursor.fetchall()

for topic in hot_topics:
    print(f"[{topic['platform']}] {topic['title']} - {topic['heat_value']}")
```

#### Setting Up Push Notifications

```python
from hotsearch_analysis_agent.push_service import PushService

# Initialize push service
push_service = PushService()

# Create push task
task_config = {
    'name': 'AI Tech Trending Monitor',
    'keywords': ['人工智能', '大模型', 'AI技术'],
    'platforms': ['weibo', 'bilibili', 'zhihu'],
    'schedule': '0 9,18 * * *',  # Twice daily at 9 AM and 6 PM
    'channels': ['email', 'wechat_work', 'telegram'],
    'analysis_depth': 'detailed',  # 'summary' or 'detailed'
    'min_heat_threshold': 100000
}

push_service.create_task(task_config)

# Test push notification
push_service.test_push(
    channel='email',
    subject='Test: AI Trending Report',
    content='This is a test notification.'
)
```
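
Note that `min_heat_threshold` is numeric while `heat_value` is stored as `VARCHAR(100)`, so some conversion has to happen before the two can be compared. The sketch below assumes heat strings may contain thousands separators or the Chinese unit suffixes 万 (10,000) and 亿 (100,000,000); both the format assumption and the helper names are illustrative.

```python
import re

_UNITS = {"万": 10_000, "亿": 100_000_000}


def parse_heat(value: str) -> int:
    """Convert strings such as '1,234,567' or '12.5万' to an integer (0 if unparseable)."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*([万亿]?)", value.replace(",", ""))
    if not match:
        return 0
    number, unit = match.groups()
    return round(float(number) * _UNITS.get(unit, 1))


def above_threshold(rows, threshold):
    """Keep only rows whose parsed heat value reaches the threshold."""
    return [row for row in rows if parse_heat(row["heat_value"]) >= threshold]
```

Platforms report heat in different formats, so a real deployment would likely need per-platform parsing rules.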

### Crawler Management

```python
from hotsearchcrawler.crawler_manager import CrawlerManager

manager = CrawlerManager()

# Start all crawlers
manager.start_all()

# Start specific platform
manager.start_spider('weibo_spider')

# Stop all crawlers
manager.stop_all()

# Get crawler status
status = manager.get_status()
print(f"Active crawlers: {status['active']}")
print(f"Items scraped: {status['items_count']}")
```

## Common Patterns

### Pattern 1: Daily Hot Topic Report

```python
from hotsearch_analysis_agent.report_generator import ReportGenerator
from datetime import datetime

generator = ReportGenerator()

# Generate daily report
report = generator.generate_daily_report(
    date=datetime.now(),
    topics=['科技', '财经', '国际'],
    include_sentiment=True,
    include_clustering=True,
    output_format='markdown'
)

# Save report
with open(f"report_{datetime.now().strftime('%Y%m%d')}.md", 'w', encoding='utf-8') as f:
    f.write(report)

# Auto-push report
generator.push_report(report, channels=['email', 'wechat_work'])
```

### Pattern 2: Real-Time Keyword Monitoring

```python
from hotsearch_analysis_agent.monitor import KeywordMonitor
import time

monitor = KeywordMonitor()

# Define alert keywords
critical_keywords = ['安全事故', '数据泄露', '产品召回']

monitor.add_keywords(critical_keywords)

# Start monitoring
while True:
    alerts = monitor.check_new_mentions()
    
    for alert in alerts:
        print(f"ALERT: {alert['keyword']} mentioned in {alert['platform']}")
        print(f"Title: {alert['title']}")
        print(f"Heat: {alert['heat_value']}")
        print(f"URL: {alert['url']}")
        
        # Immediate push notification
        monitor.push_alert(alert, priority='high')
    
    time.sleep(300)  # Check every 5 minutes
```
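
A polling loop like this can push the same item on every cycle unless something remembers what was already sent. Whether `check_new_mentions()` handles that internally is not stated, so the sketch below shows one simple safeguard: an in-memory set keyed on platform and URL. `AlertDeduplicator` is a hypothetical class.

```python
class AlertDeduplicator:
    """Remember which (platform, url) pairs have already been pushed."""

    def __init__(self):
        self._seen = set()

    def filter_new(self, alerts):
        """Return only alerts not seen before, and mark them as seen."""
        fresh = []
        for alert in alerts:
            key = (alert["platform"], alert["url"])
            if key not in self._seen:
                self._seen.add(key)
                fresh.append(alert)
        return fresh
```

In the loop above, iterating over `dedup.filter_new(alerts)` instead of `alerts` would suppress repeats. The set lives in memory only, so it resets when the process restarts.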

### Pattern 3: Multi-Platform Topic Correlation

```python
from hotsearch_analysis_agent.correlator import TopicCorrelator

correlator = TopicCorrelator()

# Find correlated topics across platforms
topic_keyword = "芯片技术"
correlation = correlator.find_cross_platform_correlation(
    keyword=topic_keyword,
    platforms=['weibo', 'zhihu', 'toutiao', 'bilibili'],
    time_window_hours=48
)

print(f"Topic: {topic_keyword}")
print(f"Total mentions: {correlation['total_mentions']}")
print(f"Platform distribution: {correlation['platform_dist']}")
print(f"Peak time: {correlation['peak_timestamp']}")
print(f"Related topics: {', '.join(correlation['related_topics'])}")
```

### Pattern 4: Sentiment Trend Analysis

```python
from hotsearch_analysis_agent.sentiment_tracker import SentimentTracker
import matplotlib.pyplot as plt

tracker = SentimentTracker()

# Track sentiment over time
sentiment_history = tracker.track_sentiment(
    keyword="新能源汽车",
    days=30,
    platforms=['weibo', 'zhihu']
)

# Visualize trend
dates = [s['date'] for s in sentiment_history]
positive = [s['positive'] for s in sentiment_history]
negative = [s['negative'] for s in sentiment_history]

plt.figure(figsize=(12, 6))
plt.plot(dates, positive, label='Positive', color='green')
plt.plot(dates, negative, label='Negative', color='red')
plt.xlabel('Date')
plt.ylabel('Sentiment Score (%)')
plt.title('Sentiment Trend: 新能源汽车')
plt.legend()
plt.savefig('sentiment_trend.png')
```

## Testing

### Test Individual Components

```bash
# Test crawler functionality
python runspider-test.py

# Test push notification
python test_push_task.py

# Test LLM analysis
python -m hotsearch_analysis_agent.test_analysis
```

### Sample Test Script

```python
# test_system.py
import os
from dotenv import load_dotenv
from hotsearch_analysis_agent.agent import OpinionAnalysisAgent

load_dotenv()

def test_query():
    agent = OpinionAnalysisAgent()
    result = agent.query("What are the top 5 trending topics today?")
    assert result is not None
    assert 'analysis' in result
    print("✓ Query test passed")

def test_clustering():
    agent = OpinionAnalysisAgent()
    clusters = agent.cluster_topics("科技", days=3)
    assert len(clusters) > 0
    print(f"✓ Clustering test passed ({len(clusters)} clusters found)")

def test_sentiment():
    agent = OpinionAnalysisAgent()
    sentiment = agent.analyze_sentiment("人工智能")
    assert 'positive' in sentiment
    assert 'negative' in sentiment
    print("✓ Sentiment analysis test passed")

if __name__ == '__main__':
    test_query()
    test_clustering()
    test_sentiment()
    print("\nAll tests passed!")
```

## Troubleshooting

### Browser Driver Issues

**Error**: `selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH`

**Solution**:
```bash
# Verify driver location
which chromedriver  # Linux/macOS
where chromedriver  # Windows

# Add to PATH if missing
export PATH=$PATH:/path/to/driver/directory  # Linux/macOS

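# Or pass the driver path explicitly from Python. Selenium 4 uses a Service object;
# the old executable_path argument has been removed in recent releases.
python - <<'EOF'
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

driver = webdriver.Chrome(service=Service('/usr/local/bin/chromedriver'))
driver.quit()
EOF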
```

### Database Connection Errors

**Error**: `mysql.connector.errors.ProgrammingError: Access denied for user`

**Solution**:
```sql
-- Grant proper privileges
GRANT ALL PRIVILEGES ON hotsearch_db.* TO 'your_user'@'localhost';
FLUSH PRIVILEGES;
```

### Crawler Not Collecting Data

**Diagnostics**:
```python
# Check crawler logs
import logging
logging.basicConfig(level=logging.DEBUG)

# Verify platform accessibility
import requests
response = requests.get('https://weibo.com/hot/search', timeout=10)
print(f"Status: {response.status_code}")

# Test an individual spider (shell commands; run them in a terminal, not in Python):
#   cd hotsearchcrawler
#   scrapy crawl weibo_spider -L DEBUG
```

### LLM Analysis Returning Empty Results

**Check**:
- API key validity and rate limits
- Network connectivity to LLM endpoint
- Input text encoding (must be UTF-8)

```python
# Debug LLM connection (written for openai>=1.0; older releases used openai.ChatCompletion.create)
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv('OPENAI_API_KEY'),
    base_url=os.getenv('OPENAI_BASE_URL')
)

try:
    response = client.chat.completions.create(
        model=os.getenv('OPENAI_MODEL', 'gpt-4'),
        messages=[{"role": "user", "content": "Test"}]
    )
    print("✓ LLM connection successful")
except Exception as e:
    print(f"✗ LLM error: {e}")
```

### Push Notifications Not Sending

**Email (SMTP)**:
```python
# Test SMTP connection
import os
import smtplib

try:
    server = smtplib.SMTP(os.getenv('SMTP_HOST'), int(os.getenv('SMTP_PORT')))
    server.starttls()
    server.login(os.getenv('SMTP_USER'), os.getenv('SMTP_PASSWORD'))
    print("✓ SMTP connection successful")
    server.quit()
except Exception as e:
    print(f"✗ SMTP error: {e}")
```

**WeChat Work**:
```python
# Test webhook
import os
import requests

webhook_url = os.getenv('WECHAT_WORK_WEBHOOK')
data = {
    "msgtype": "text",
    "text": {"content": "Test notification"}
}
response = requests.post(webhook_url, json=data, timeout=10)
print(f"Response: {response.json()}")  # errcode 0 means the message was accepted
```
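
**Telegram**:

The Telegram channel can be checked the same way by calling the Bot API `sendMessage` method directly. This is a minimal standard-library sketch rather than the project's push code; the function names are illustrative, and the token and chat ID are read from the variables defined in `.env`.

```python
import json
import os
import urllib.parse
import urllib.request


def build_send_message(token: str, chat_id: str, text: str):
    """Return (url, form-encoded body) for a Bot API sendMessage call."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    body = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode("utf-8")
    return url, body


def send_test_message() -> bool:
    """Send a test message; True if Telegram reports ok."""
    url, body = build_send_message(
        os.environ["TELEGRAM_BOT_TOKEN"],
        os.environ["TELEGRAM_CHAT_ID"],
        "Test notification",
    )
    # Supplying data makes urlopen issue a POST request
    with urllib.request.urlopen(url, data=body, timeout=10) as resp:
        return bool(json.load(resp).get("ok", False))
```

An HTTP error raised by `urlopen` here usually points to a wrong token or a chat ID the bot has not been allowed to message.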

## Advanced Configuration

### Custom LLM Model (Huawei Pangu)

```python
# hotsearch_analysis_agent/llm_config.py
import os

from pangu_client import PanguClient

client = PanguClient(
    api_key=os.getenv('PANGU_API_KEY'),
    endpoint=os.getenv('PANGU_BASE_URL')
)

def analyze_with_pangu(text, task='sentiment'):
    response = client.complete(
        prompt=f"分析以下文本的{task}:\n{text}",
        max_tokens=2000,
        temperature=0.7
    )
    return response['text']
```

### Adding New Platform Crawlers

```python
# hotsearchcrawler/spiders/custom_spider.py
from datetime import datetime

import scrapy
from hotsearchcrawler.items import HotSearchItem

class CustomPlatformSpider(scrapy.Spider):
    name = 'custom_spider'
    start_urls = ['https://example.com/trending']
    
    def parse(self, response):
        for item in response.css('.trending-item'):
            yield HotSearchItem(
                platform='custom_platform',
                title=item.css('.title::text').get(),
                url=item.css('a::attr(href)').get(),
                rank=item.css('.rank::text').get(),
                heat_value=item.css('.heat::text').get(),
                timestamp=datetime.now()
            )
```
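
Selector output is raw text: titles may carry stray whitespace, links may be relative, and `rank` arrives as a string even though the `hot_searches` table stores it as `INT`. The sketch below shows one way to clean these fields before building the item; `normalize_fields` is a hypothetical helper, not part of the project.

```python
import re
from urllib.parse import urljoin


def normalize_fields(base_url, title, href, rank_text):
    """Clean raw selector output before it is stored."""
    match = re.search(r"[0-9]+", rank_text or "")
    return {
        "title": (title or "").strip(),
        # Resolve relative links against the page they were found on
        "url": urljoin(base_url, href or ""),
        # None when the rank element is missing or has no digits
        "rank": int(match.group()) if match else None,
    }
```

Inside a spider, `response.urljoin(href)` resolves links in the same way without passing the base URL explicitly.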

### Custom Analysis Pipelines

```python
# hotsearch_analysis_agent/custom_analyzer.py
from hotsearch_analysis_agent.base_analyzer import BaseAnalyzer

class IndustrySpecificAnalyzer(BaseAnalyzer):
    def __init__(self, industry_keywords):
        super().__init__()
        self.industry_keywords = industry_keywords
    
    def filter_relevant_topics(self, topics):
        return [
            t for t in topics 
            if any(kw in t['title'] for kw in self.industry_keywords)
        ]
    
    def generate_industry_report(self, topics):
        relevant = self.filter_relevant_topics(topics)
        sentiment = self.batch_sentiment_analysis(relevant)
        clusters = self.cluster_by_subtopic(relevant)
        
        return {
            'total_mentions': len(relevant),
            'sentiment_distribution': sentiment,
            'topic_clusters': clusters,
            'key_influencers': self.identify_influencers(relevant)
        }
```

## Resources

- **Official Repository**: https://github.com/hmmnxkl/LLM-Based-Intelligent-Public-Opinion-Analytics-Assistant
- **Huawei Pangu Model**: https://ai.gitcode.com/ascend-tribe/openpangu-embedded-7b-model
- **Scrapy Documentation**: https://docs.scrapy.org/
- **Selenium WebDriver**: https://www.selenium.dev/documentation/

Source

Creator's repository · aradotso/data-skills
