---
name: llm-public-opinion-analytics
description: Multi-platform public opinion analysis assistant with web scraping, LLM-powered analytics, topic clustering, sentiment analysis, and multi-channel alerts
triggers:
  - analyze public opinion trends across social media platforms
  - scrape hot search rankings from Chinese platforms
  - set up sentiment analysis for news topics
  - cluster trending topics using LLM
  - configure multi-channel alerts for hot topics
  - build a public opinion monitoring system
  - analyze trending news with deep learning
  - track social media hot searches in real-time
---

# LLM-Based Public Opinion Analytics Assistant

> Skill by [ara.so](https://ara.so) — Data Skills collection.

## Overview

This project is a public opinion analysis assistant that combines real-time data from **26 ranking lists** on **15 mainstream platforms** with large language model (LLM) analysis. It offers conversational hot-search queries, topic-specific searches, topic clustering, and sentiment analysis. The system supports:

- Real-time web scraping from platforms like Weibo, Bilibili, Douyin, Baidu, etc.
- LLM-powered content analysis (including video content extraction)
- Multi-channel push notifications (WeChat, Enterprise WeChat, Telegram, Email)
- Keyboard shortcuts for crawler control
- Quick data lookup and platform jumping

## Installation

### Prerequisites

1. **Python Environment**: Python 3.8+
2. **MySQL Database**: MySQL 5.7+ or 8.0+
3. **Browser Driver**: ChromeDriver or EdgeDriver

### Step 1: Browser Driver Setup

Download the driver matching your browser version:

- **Chrome**: [ChromeDriver Downloads](https://chromedriver.chromium.org/) (drivers for Chrome 115 and later are published on the Chrome for Testing dashboard)
- **Edge**: [EdgeDriver Downloads](https://developer.microsoft.com/en-us/microsoft-edge/tools/webdriver/)

Add the driver to your system PATH:

```bash
# macOS/Linux
export PATH=$PATH:/path/to/driver/directory

# Windows: Add to System Environment Variables
```

Verify installation:

```bash
chromedriver --version
# or
msedgedriver --version
```
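
The same check can be scripted. The sketch below uses only the standard library to report whether each driver resolves on `PATH`; the helper name `find_drivers` is hypothetical and not part of the project.

```python
import shutil
from typing import Dict, Optional, Sequence


def find_drivers(
    names: Sequence[str] = ("chromedriver", "msedgedriver"),
) -> Dict[str, Optional[str]]:
    """Map each driver name to its resolved path on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_drivers().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

Only one of the two drivers needs to be present, depending on which browser the crawler uses.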

### Step 2: Clone and Install Dependencies

```bash
git clone https://github.com/hmmnxkl/LLM-Based-Intelligent-Public-Opinion-Analytics-Assistant.git
cd LLM-Based-Intelligent-Public-Opinion-Analytics-Assistant

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

### Step 3: Database Setup

Create the MySQL database and tables:

```python
# Reference init.py for schema
import os

import mysql.connector

conn = mysql.connector.connect(
    host=os.getenv('MYSQL_HOST', 'localhost'),
    user=os.getenv('MYSQL_USER'),
    password=os.getenv('MYSQL_PASSWORD')
)

cursor = conn.cursor()
cursor.execute("CREATE DATABASE IF NOT EXISTS hotsearch_db CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci")
cursor.execute("USE hotsearch_db")

# Create tables (see init.py for full schema)
cursor.execute("""
    CREATE TABLE IF NOT EXISTS hot_search_items (
        id INT AUTO_INCREMENT PRIMARY KEY,
        platform VARCHAR(50),
        title VARCHAR(500),
        url TEXT,
        rank_index INT,
        heat_value VARCHAR(100),
        collected_at DATETIME,
        content TEXT,
        sentiment VARCHAR(20),
        INDEX idx_platform (platform),
        INDEX idx_collected (collected_at)
    )
""")

conn.commit()

# Release the cursor and connection once setup is done
cursor.close()
conn.close()
```
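
As a sketch of how a scraped item might be written to this table, the helper below builds a parameterized `INSERT` for the columns defined above. The function name `build_insert` and the sample item are hypothetical; the project's own pipeline in `pipelines.py` may work differently. The `%s` placeholders are the style `mysql.connector` expects.

```python
from datetime import datetime
from typing import Any, Dict, Tuple

# Columns taken from the hot_search_items schema above
COLUMNS = ("platform", "title", "url", "rank_index", "heat_value", "collected_at")


def build_insert(item: Dict[str, Any]) -> Tuple[str, tuple]:
    """Return (sql, params) for a parameterized INSERT; missing keys become NULL."""
    placeholders = ", ".join(["%s"] * len(COLUMNS))
    sql = (
        f"INSERT INTO hot_search_items ({', '.join(COLUMNS)}) "
        f"VALUES ({placeholders})"
    )
    params = tuple(item.get(col) for col in COLUMNS)
    return sql, params


# Hypothetical sample item
sql, params = build_insert({
    "platform": "weibo",
    "title": "example title",
    "url": "https://example.com/item",
    "rank_index": 1,
    "heat_value": "123456",
    "collected_at": datetime(2026, 5, 20, 9, 0, 0),
})
# With an open connection: cursor.execute(sql, params); conn.commit()
```

Passing values as parameters rather than formatting them into the SQL string avoids quoting problems with titles that contain apostrophes or other special characters.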

### Step 4: Environment Configuration

Create a `.env` file in the project root:

```bash
# MySQL Configuration
MYSQL_HOST=localhost
MYSQL_PORT=3306
MYSQL_USER=your_mysql_user
MYSQL_PASSWORD=your_mysql_password
MYSQL_DATABASE=hotsearch_db

# LLM Configuration (OpenAI-compatible API)
OPENAI_API_KEY=your_api_key
OPENAI_API_BASE=https://api.openai.com/v1
MODEL_NAME=gpt-4

# Or use Huawei Pangu Model (local deployment)
# PANGU_MODEL_PATH=/path/to/pangu/model
# PANGU_API_URL=http://localhost:8080

# Push Notification Channels
# WeChat Work Bot
WECHAT_WORK_BOT_WEBHOOK=your_webhook_url

# WeChat Work App
WECHAT_WORK_CORP_ID=your_corp_id
WECHAT_WORK_AGENT_ID=your_agent_id
WECHAT_WORK_SECRET=your_secret

# Telegram
TELEGRAM_BOT_TOKEN=your_bot_token
TELEGRAM_CHAT_ID=your_chat_id

# Email (SMTP)
SMTP_HOST=smtp.gmail.com
SMTP_PORT=587
SMTP_USER=your_email@gmail.com
SMTP_PASSWORD=your_app_password
SMTP_RECIPIENTS=recipient1@example.com,recipient2@example.com
```
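
Before starting the services, it can help to confirm that the required keys are actually set. The following is a minimal standard-library sketch: the parser handles only plain `KEY=value` lines and comments, and the set of keys treated as required is an assumption, not something the project defines.

```python
from typing import Dict, List, Sequence

# Assumed minimum set of keys; adjust to the channels you actually use
REQUIRED_KEYS = ("MYSQL_USER", "MYSQL_PASSWORD", "OPENAI_API_KEY")


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: Dict[str, str],
                 required: Sequence[str] = REQUIRED_KEYS) -> List[str]:
    """Return required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


sample = """
# MySQL Configuration
MYSQL_USER=analyst
MYSQL_PASSWORD=
"""
print(missing_keys(parse_env_text(sample)))  # ['MYSQL_PASSWORD', 'OPENAI_API_KEY']
```

To check a real file, read it with `open('.env', encoding='utf-8')` and pass the text to `parse_env_text`.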

## Core Components

### 1. Web Scraping System (`hotsearchcrawler/`)

The crawler cluster supports 15 platforms with 26 ranking lists:

```bash
# Run all spiders
python run_spiders.py

# Test specific spider
python runspider-test.py weibo  # Test Weibo scraper
```

#### Crawler Configuration

Edit `hotsearchcrawler/settings.py`:

```python
import os

# MySQL settings
MYSQL_HOST = os.getenv('MYSQL_HOST', 'localhost')
MYSQL_PORT = int(os.getenv('MYSQL_PORT', 3306))
MYSQL_USER = os.getenv('MYSQL_USER')
MYSQL_PASSWORD = os.getenv('MYSQL_PASSWORD')
MYSQL_DATABASE = os.getenv('MYSQL_DATABASE', 'hotsearch_db')

# Optional: Platform-specific cookies
COOKIES = {
    'weibo': 'your_weibo_cookies',
    'bilibili': 'your_bilibili_cookies'
}

# Crawler settings
CONCURRENT_REQUESTS = 16
DOWNLOAD_DELAY = 1
RANDOMIZE_DOWNLOAD_DELAY = True
```
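
Assuming the crawler runs on Scrapy (the settings names match Scrapy's), `RANDOMIZE_DOWNLOAD_DELAY = True` makes each wait a random value between 0.5 and 1.5 times `DOWNLOAD_DELAY`. The helper below is a hypothetical illustration of that range, not project code.

```python
from typing import Tuple


def effective_delay_range(download_delay: float,
                          randomize: bool = True) -> Tuple[float, float]:
    """Return the (min, max) seconds Scrapy waits between requests to a site."""
    if not randomize:
        return (download_delay, download_delay)
    return (0.5 * download_delay, 1.5 * download_delay)


print(effective_delay_range(1))  # (0.5, 1.5)
print(effective_delay_range(3))  # (1.5, 4.5)
```

With the settings above, requests to the same site are therefore spaced roughly 0.5 to 1.5 seconds apart.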

#### Available Platforms

- Social Media: Weibo, Douyin, Kuaishou
- Video: Bilibili, Tencent Video
- News: Baidu, Toutiao, Zhihu
- E-commerce: Taobao, JD.com
- Gaming: Steam, Tap Tap
- Others: Tieba, Douban, etc.

### 2. Analysis System (`hotsearch_analysis_agent/`)

LLM-powered analysis engine for topic clustering, sentiment analysis, and report generation.

```python
import os

from hotsearch_analysis_agent.analyzer import HotSearchAnalyzer

# Initialize analyzer
analyzer = HotSearchAnalyzer(
    api_key=os.getenv('OPENAI_API_KEY'),
    api_base=os.getenv('OPENAI_API_BASE'),
    model_name=os.getenv('MODEL_NAME', 'gpt-4')
)

# Analyze topics
topics = analyzer.fetch_topics(
    platform='weibo',
    start_date='2026-05-01',
    end_date='2026-05-20'
)

# Topic clustering
clusters = analyzer.cluster_topics(topics, n_clusters=5)

# Sentiment analysis
for topic in topics:
    sentiment = analyzer.analyze_sentiment(topic['title'], topic['content'])
    print(f"{topic['title']}: {sentiment}")

# Generate report
report = analyzer.generate_report(
    query="人工智能与前沿科技",  # "AI and frontier technology"
    platforms=['weibo', 'bilibili', 'zhihu'],
    days=7
)
print(report)
```
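
Once sentiment labels are available, a per-platform tally gives a quick overview of the mood on each platform. The sketch below assumes each row is a dict with `platform` and `sentiment` keys, mirroring the `hot_search_items` columns; the function name and the label strings are hypothetical, since the labels the analyzer actually returns are not documented here.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Mapping, Optional


def sentiment_by_platform(
    rows: Iterable[Mapping[str, Optional[str]]],
) -> Dict[str, Dict[str, int]]:
    """Count sentiment labels per platform; missing labels count as 'unknown'."""
    summary: Dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        label = row.get("sentiment") or "unknown"
        summary[row["platform"]][label] += 1
    return {platform: dict(counts) for platform, counts in summary.items()}


# Hypothetical rows
rows = [
    {"platform": "weibo", "sentiment": "positive"},
    {"platform": "weibo", "sentiment": "negative"},
    {"platform": "weibo", "sentiment": "positive"},
    {"platform": "zhihu", "sentiment": None},
]
print(sentiment_by_platform(rows))
```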

#### Custom LLM Integration

```python
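# Using Huawei Pangu Model (local deployment)
import os

from hotsearch_analysis_agent.llm import PanguLLM

pangu = PanguLLM(
    model_path=os.getenv('PANGU_MODEL_PATH'),
    api_url=os.getenv('PANGU_API_URL')
)

# Substitute the article text into the prompt before sending it
news_content = "<news text to analyze>"  # placeholder
response = pangu.generate(
    # Prompt: "Analyze the sentiment of the following news:"
    prompt=f"分析以下新闻的情感倾向:\n{news_content}",
    max_tokens=500
)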
```

### 3. Web Application (`app.py`)

FastAPI-based web interface for interactive queries and control.

```bash
# Start the web application
python app.py

# Default runs on http://localhost:8000
```

#### API Endpoints

```python
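# Server side: how the router is mounted on the FastAPI app
from fastapi import FastAPI
from hotsearch_analysis_agent.api import router

app = FastAPI()
app.include_router(router)

# Client side: example API calls (run separately while the server is up)
import httpx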

# Query hot searches
response = httpx.get('http://localhost:8000/api/hot-search', params={
    'platform': 'weibo',
    'limit': 20
})

# Search by keyword
response = httpx.post('http://localhost:8000/api/search', json={
    'keyword': '人工智能',  # "artificial intelligence"
    'platforms': ['weibo', 'zhihu'],
    'days': 7
})

# Start crawler
response = httpx.post('http://localhost:8000/api/crawler/start', json={
    'platforms': ['weibo', 'bilibili']
})

# Stop crawler
response = httpx.post('http://localhost:8000/api/crawler/stop')
```

## Push Notification System

Configure and test multi-channel alerts:

```python
# test_push_task.py
from hotsearch_analysis_agent.push import PushManager

manager = PushManager()

# Configure push task
task = {
    'name': 'AI Tech Monitor',
    'query': '人工智能',  # "artificial intelligence"
    'platforms': ['weibo', 'zhihu', 'bilibili'],
    'schedule': '0 9,18 * * *',  # Cron format: 9 AM and 6 PM daily
    'channels': ['wechat_work', 'telegram', 'email'],
    'min_heat': 100000  # Minimum heat value threshold
}

manager.create_task(task)

# Test push manually
report = """
## AI Technology Hot Topics - 2026-05-20

### Key Findings
- GPT-6 context window leaked: 2M tokens
- DeepSeek V4 uses Huawei Ascend chips
- Chinese LLM API calls lead globally for 5 weeks

[Full report content...]
"""

# Send to WeChat Work
manager.send_wechat_work(report)

# Send to Telegram
manager.send_telegram(report)

# Send email
manager.send_email(
    subject="AI Technology Hot Topics - 2026-05-20",
    content=report
)
```
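
The `schedule` field uses five-field cron syntax (minute, hour, day of month, month, day of week). As a rough illustration of how `'0 9,18 * * *'` expands, the sketch below lists the daily run times for expressions whose minute and hour fields are plain numbers or comma-separated lists. It is a hypothetical helper, not a full cron parser: ranges, steps, and wildcards in those two fields are not handled.

```python
from typing import List


def daily_run_times(cron_expr: str) -> List[str]:
    """Expand the minute and hour fields of a cron expression into HH:MM strings."""
    fields = cron_expr.split()
    if len(fields) != 5:
        raise ValueError("expected a five-field cron expression")
    minute_field, hour_field = fields[0], fields[1]
    minutes = sorted(int(m) for m in minute_field.split(","))
    hours = sorted(int(h) for h in hour_field.split(","))
    return [f"{h:02d}:{m:02d}" for h in hours for m in minutes]


print(daily_run_times("0 9,18 * * *"))  # ['09:00', '18:00']
```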

### Push Channel Configuration

```python
# WeChat Work Bot (Group Webhook)
import os

import requests

def send_wechat_work_bot(content):
    webhook = os.getenv('WECHAT_WORK_BOT_WEBHOOK')
    data = {
        "msgtype": "markdown",
        "markdown": {
            "content": content
        }
    }
    response = requests.post(webhook, json=data, timeout=10)
    response.raise_for_status()  # surface HTTP errors instead of ignoring them

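# Telegram Bot
# Note: this synchronous call matches python-telegram-bot v13.x; from v20 on,
# Bot.send_message is a coroutine and must be awaited.
from telegram import Bot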

def send_telegram(content):
    bot = Bot(token=os.getenv('TELEGRAM_BOT_TOKEN'))
    chat_id = os.getenv('TELEGRAM_CHAT_ID')
    bot.send_message(chat_id=chat_id, text=content, parse_mode='Markdown')

# Email via SMTP
import smtplib
from email.mime.text import MIMEText

def send_email(subject, content):
    # 'html' assumes content is already HTML; pass 'plain' to send raw Markdown as text
    msg = MIMEText(content, 'html', 'utf-8')
    msg['Subject'] = subject
    msg['From'] = os.getenv('SMTP_USER')
    msg['To'] = os.getenv('SMTP_RECIPIENTS')
    
    with smtplib.SMTP(os.getenv('SMTP_HOST'), int(os.getenv('SMTP_PORT'))) as server:
        server.starttls()
        server.login(os.getenv('SMTP_USER'), os.getenv('SMTP_PASSWORD'))
        server.send_message(msg)
```
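
Long reports can exceed a channel's message size limit; the Telegram Bot API, for example, caps a text message at 4096 characters. The sketch below splits a report on line boundaries so that each chunk can be sent separately. The function name `split_message` is hypothetical, and limits for the other channels should be checked in their own documentation.

```python
from typing import List


def split_message(text: str, limit: int = 4096) -> List[str]:
    """Split text into chunks of at most `limit` characters, preferring line breaks."""
    chunks: List[str] = []
    current = ""
    for line in text.splitlines(keepends=True):
        # A single line longer than the limit has to be cut mid-line
        while len(line) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:limit])
            line = line[limit:]
        # Start a new chunk when this line would overflow the current one
        if len(current) + len(line) > limit:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks


# Usage sketch: send each chunk in order
# for part in split_message(report):
#     send_telegram(part)
```

Note that splitting inside a Markdown construct (for example, in the middle of a code block) can break formatting in the receiving client, so very long formatted reports may be better sent as plain text or as a file.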

## Common Usage Patterns

### Pattern 1: Daily Hot Topic Monitoring

```python
from datetime import datetime, timedelta
from hotsearch_analysis_agent.analyzer import HotSearchAnalyzer
from hotsearch_analysis_agent.push import PushManager

analyzer = HotSearchAnalyzer()
push_manager = PushManager()

# Get yesterday's hot topics
yesterday = datetime.now() - timedelta(days=1)
topics = analyzer.fetch_topics(
    platforms=['weibo', 'zhihu', 'bilibili'],
    start_date=yesterday.strftime('%Y-%m-%d'),
    heat_threshold=50000
)

# Cluster and analyze
clusters = analyzer.cluster_topics(topics, n_clusters=5)

# Generate report
report = analyzer.generate_report_from_clusters(clusters)

# Push to all channels
push_manager.broadcast(report, channels=['wechat_work', 'telegram', 'email'])
```
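
Because `heat_value` is stored as a string (`VARCHAR(100)` in the schema), comparing it with a numeric threshold such as `heat_threshold=50000` requires a conversion step somewhere. How the project does this is not shown; the sketch below is one possible approach, assuming values arrive either as plain digits with optional thousands separators or with the common Chinese unit suffixes 万 (10,000) and 亿 (100,000,000).

```python
from decimal import Decimal, InvalidOperation
from typing import Optional

# Assumed unit suffixes; extend if a platform uses other formats
UNITS = {"万": Decimal(10_000), "亿": Decimal(100_000_000)}


def parse_heat(raw: str) -> Optional[int]:
    """Convert a heat string such as '12.5万' or '1,234,567' to an int, or None."""
    text = raw.strip().replace(",", "")
    multiplier = Decimal(1)
    if text and text[-1] in UNITS:
        multiplier = UNITS[text[-1]]
        text = text[:-1]
    try:
        return int(Decimal(text) * multiplier)
    except (InvalidOperation, ValueError, OverflowError):
        return None


print(parse_heat("12.5万"))     # 125000
print(parse_heat("1,234,567"))  # 1234567
print(parse_heat("n/a"))        # None
```

`Decimal` is used instead of `float` so that values like `12.5万` convert exactly.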

### Pattern 2: Keyword Alert System

```python
# Monitor specific keywords and send immediate alerts
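from hotsearch_analysis_agent.monitor import KeywordMonitor
from hotsearch_analysis_agent.push import PushManager

push_manager = PushManager()  # used by the on_match callback below

monitor = KeywordMonitor(
    keywords=['芯片', 'AI', '大模型', '华为'],  # chips, AI, large models, Huawei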
    platforms=['weibo', 'toutiao', 'zhihu'],
    check_interval=300  # Check every 5 minutes
)

def on_match(topic):
    """Callback when keyword is matched"""
    alert = f"""
    🔔 Keyword Alert: {topic['title']}
    Platform: {topic['platform']}
    Heat: {topic['heat_value']}
    URL: {topic['url']}
    """
    push_manager.send_telegram(alert)

monitor.start(callback=on_match)
```
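
The matching rule `KeywordMonitor` applies is not documented here. As an illustration of the simplest option, the sketch below does a case-insensitive substring match of each keyword against a topic title; the function name `matched_keywords` is hypothetical.

```python
from typing import Iterable, List


def matched_keywords(title: str, keywords: Iterable[str]) -> List[str]:
    """Return the keywords that occur in the title, ignoring case."""
    lowered = title.casefold()
    return [kw for kw in keywords if kw.casefold() in lowered]


keywords = ['芯片', 'AI', '大模型', '华为']
print(matched_keywords("华为发布新AI芯片", keywords))  # ['芯片', 'AI', '华为']
```

Plain substring matching is a reasonable fit for Chinese keywords, but short Latin keywords such as `AI` will also match inside longer English words (for example "fair"), so a word-boundary check may be needed for those.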

### Pattern 3: Deep Content Analysis

```python
# Analyze news detail pages (including video content)
from hotsearch_analysis_agent.analyzer import HotSearchAnalyzer
from hotsearch_analysis_agent.content_extractor import ContentExtractor

analyzer = HotSearchAnalyzer()
extractor = ContentExtractor()

# Get detailed content from URL
url = 'https://www.bilibili.com/video/BV13pSoBBEvX/'
content = extractor.extract(url)

print(f"Title: {content['title']}")
print(f"Type: {content['type']}")  # 'video' or 'article'
print(f"Content: {content['text'][:500]}...")  # Extracted transcript/text

# Analyze sentiment
sentiment = analyzer.analyze_sentiment(content['title'], content['text'])
print(f"Sentiment: {sentiment}")

# Extract entities
entities = analyzer.extract_entities(content['text'])
print(f"Entities: {entities}")
```

### Pattern 4: Custom Report Generation

```python
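# Generate custom analytical report
from datetime import datetime

from hotsearch_analysis_agent.analyzer import HotSearchAnalyzer

analyzer = HotSearchAnalyzer()

report_config = {
    'title': '科技行业周报',  # "Tech Industry Weekly Report"
    'query': '人工智能 OR 芯片 OR 量子计算',  # AI OR chips OR quantum computing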
    'platforms': ['all'],
    'date_range': 7,
    'sections': [
        'core_findings',  # Key discoveries
        'news_details',   # Detailed news list
        'trend_analysis', # Trend analysis
        'entity_network'  # Entity relationship graph
    ],
    'output_format': 'markdown'
}

report = analyzer.generate_custom_report(**report_config)

# Save to file
with open(f"report_{datetime.now().strftime('%Y%m%d')}.md", 'w', encoding='utf-8') as f:
    f.write(report)
```

## Troubleshooting

### Issue 1: Browser Driver Errors

```
selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH
```

**Solution**: Ensure ChromeDriver/EdgeDriver is in system PATH and matches browser version.

```bash
# Check driver version
chromedriver --version

# Check Chrome version
google-chrome --version  # Linux
# or open chrome://version in browser

# Download the matching version from https://chromedriver.chromium.org/
# (drivers for Chrome 115+ are listed on the Chrome for Testing dashboard)
```

### Issue 2: Database Connection Failures

```
mysql.connector.errors.ProgrammingError: Access denied for user
```

**Solution**: Verify MySQL credentials in `.env` and ensure user has proper permissions.

```sql
-- Grant permissions
GRANT ALL PRIVILEGES ON hotsearch_db.* TO 'your_user'@'localhost';
FLUSH PRIVILEGES;
```

### Issue 3: LLM API Rate Limits

```
openai.error.RateLimitError: Rate limit exceeded
```

**Solution**: Throttle requests, as in the decorator below, or switch to a local model. With `openai` 1.x the same error surfaces as `openai.RateLimitError`.

```python
import time
from functools import wraps

def rate_limit(calls_per_minute=10):
    min_interval = 60.0 / calls_per_minute
    last_called = [0.0]
    
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            elapsed = time.time() - last_called[0]
            wait_time = min_interval - elapsed
            if wait_time > 0:
                time.sleep(wait_time)
            result = func(*args, **kwargs)
            last_called[0] = time.time()
            return result
        return wrapper
    return decorator

@rate_limit(calls_per_minute=10)
def call_llm(prompt):
    return analyzer.generate(prompt)
```
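
Throttling spaces calls out, but an occasional rate-limit error can still get through. A common complement, not something the project is confirmed to implement, is retrying with exponential backoff. The sketch below is generic: `retry_with_backoff` is a hypothetical helper, and the `sleep` parameter exists only so the waiting can be replaced in tests.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    exceptions: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(); on failure wait base_delay * 2**attempt, then try again."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(retries + 1):
        try:
            return func()
        except exceptions:
            if attempt == retries:
                raise  # out of retries: re-raise the last error
            sleep(base_delay * (2 ** attempt))
    raise RuntimeError("unreachable")


# Usage sketch, narrowing `exceptions` to the client's rate-limit error:
# result = retry_with_backoff(lambda: call_llm(prompt), retries=3, base_delay=2.0)
```

With the defaults, a call that keeps failing waits 1, 2, and 4 seconds between attempts before the final error is raised.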

### Issue 4: Crawler Being Blocked

**Solution**: Rotate user agents and add delays:

```python
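# In hotsearchcrawler/settings.py
# RandomUserAgentMiddleware comes from the scrapy-user-agents package
# (pip install scrapy-user-agents)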
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy_user_agents.middlewares.RandomUserAgentMiddleware': 400,
}

DOWNLOAD_DELAY = 3
RANDOMIZE_DOWNLOAD_DELAY = True
CONCURRENT_REQUESTS_PER_DOMAIN = 2
```

### Issue 5: Encoding Issues with Chinese Text

**Solution**: Ensure UTF-8 encoding throughout:

```python
# Database connection
import os

import mysql.connector

conn = mysql.connector.connect(
    host=os.getenv('MYSQL_HOST'),
    user=os.getenv('MYSQL_USER'),
    password=os.getenv('MYSQL_PASSWORD'),
    database=os.getenv('MYSQL_DATABASE'),
    charset='utf8mb4',
    collation='utf8mb4_unicode_ci'
)

# File operations
with open('report.md', 'w', encoding='utf-8') as f:
    f.write(report)
```
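
The reason for `utf8mb4` rather than MySQL's legacy `utf8` character set is byte width: legacy `utf8` stores at most three bytes per character, while emoji and some rarer CJK characters need four. A quick standard-library check (the helper name is hypothetical):

```python
def utf8_width(ch: str) -> int:
    """Return how many bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


for ch in ("A", "热", "🔔"):
    print(repr(ch), utf8_width(ch))
# 'A' 1
# '热' 3
# '🔔' 4
```

Titles scraped from social platforms often contain emoji, so a table or connection left on the three-byte character set can reject or corrupt those rows.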

## Advanced Configuration

### Using Huawei Pangu Model (Local Deployment)

Download and deploy the model:

```bash
# Download from https://ai.gitcode.com/ascend-tribe/openpangu-embedded-7b-model
# Start model service
python -m hotsearch_analysis_agent.llm.pangu_server --model_path /path/to/model --port 8080
```

Configure in code:

```python
from hotsearch_analysis_agent.analyzer import HotSearchAnalyzer
from hotsearch_analysis_agent.llm import PanguLLM

analyzer = HotSearchAnalyzer(
    llm=PanguLLM(api_url='http://localhost:8080')
)
```

### Distributed Crawling

Scale up with multiple crawler instances:

```bash
# Instance 1: Weibo, Zhihu
python run_spiders.py --platforms weibo,zhihu

# Instance 2: Bilibili, Douyin
python run_spiders.py --platforms bilibili,douyin

# Instance 3: News platforms
python run_spiders.py --platforms baidu,toutiao
```
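
When the platform list grows, the split across instances can be generated rather than written by hand. The sketch below distributes platforms round-robin and prints one command per instance, using the `--platforms` flag shown above; the helper names are hypothetical.

```python
from typing import List, Sequence


def partition_platforms(platforms: Sequence[str], instances: int) -> List[List[str]]:
    """Distribute platforms round-robin over the given number of instances."""
    if instances < 1:
        raise ValueError("instances must be >= 1")
    groups = [list(platforms[i::instances]) for i in range(instances)]
    return [group for group in groups if group]  # drop empty groups


def spider_commands(platforms: Sequence[str], instances: int) -> List[str]:
    """Build one run_spiders.py command line per non-empty group."""
    return [
        f"python run_spiders.py --platforms {','.join(group)}"
        for group in partition_platforms(platforms, instances)
    ]


for cmd in spider_commands(["weibo", "zhihu", "bilibili", "douyin", "baidu"], 2):
    print(cmd)
# python run_spiders.py --platforms weibo,bilibili,baidu
# python run_spiders.py --platforms zhihu,douyin
```

Round-robin keeps group sizes within one of each other; if some platforms are much slower to crawl than others, grouping by expected load may work better.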

## Project Structure Reference

```
.
├── app.py                          # Web application entry
├── run_spiders.py                  # Crawler launcher
├── runspider-test.py               # Crawler testing
├── test_push_task.py               # Push notification testing
├── init.py                         # Database initialization
├── requirements.txt                # Python dependencies
├── .env                            # Environment configuration
├── hotsearchcrawler/               # Crawler cluster
│   ├── spiders/                    # Platform-specific spiders
│   ├── settings.py                 # Crawler settings
│   └── pipelines.py                # Data pipelines
└── hotsearch_analysis_agent/       # Analysis system
    ├── analyzer.py                 # Core analysis engine
    ├── llm/                        # LLM integrations
    ├── push/                       # Push notification modules
    ├── api/                        # Web API endpoints
    └── content_extractor.py        # Content extraction utilities
```

Source

Creator's repository · aradotso/data-skills
