---
name: tooluniverse-literature-deep-research
description: Deep literature review — PubMed, EuropePMC, bioRxiv preprints, citation networks, evidence synthesis. Disambiguates queries, runs collision-aware searches, grades evidence T1-T4, and produces structured reports. Use for systematic literature review, meta-analysis evidence collection, and detailed answer-with-citations workflows.
disable-model-invocation: true
---

# Literature Deep Research

Systematic literature research: disambiguate, search with collision-aware queries, grade evidence, produce structured reports.

**KEY PRINCIPLES**:

1. Disambiguate first.
2. Right-size the deliverable.
3. Grade every claim T1-T4.
4. Keep every section, even if it only says "limited evidence".
5. Attribute every claim to a source.
6. Query in English first; respond in the user's language.
7. The report is the deliverable, not a search log.

---

## LOOK UP, DON'T GUESS

Search PubMed/EuropePMC FIRST, before reasoning from memory. A published paper beats recall.

**Factoid search strategy:**
1. Extract KEY TERMS (most specific nouns/verbs)
2. `EuropePMC_search_articles(query="term1 term2 term3", limit=5)`
3. No results -> BROADEN (remove most restrictive term)
4. Too many -> NARROW (add specific terms)
5. Answer usually in abstract of top results
6. Failed query -> try DIFFERENT TERMS/synonyms, don't repeat
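
The broaden/narrow loop can be sketched as below. `search` is a hypothetical callable standing in for a tool such as `EuropePMC_search_articles`. It is assumed to return the total hit count alongside the top `limit` hits, because "too many" cannot be judged from five results alone. The `too_many` threshold and the round limit are illustrative, not values defined by this skill.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: search(query, limit) -> (total_hit_count, top_hits).
# In practice this would wrap a call like EuropePMC_search_articles(query=..., limit=...).
SearchFn = Callable[[str, int], Tuple[int, list]]


def factoid_search(
    search: SearchFn,
    terms: Sequence[str],
    narrowing_terms: Sequence[str] = (),
    limit: int = 5,
    too_many: int = 200,
    max_rounds: int = 6,
) -> Tuple[str, list]:
    """Broaden on zero hits, narrow on too many, never re-run the same query.

    `terms` is ordered from least to most restrictive, so broadening drops the
    last one. `narrowing_terms` are appended one at a time when a query returns
    more than `too_many` hits.
    """
    current: List[str] = list(terms)
    extras: List[str] = list(narrowing_terms)
    tried: set = set()
    last_query, last_hits = "", []
    for _ in range(max_rounds):
        query = " ".join(current)
        if not query or query in tried:
            break  # nothing new to try without repeating a query
        tried.add(query)
        total, hits = search(query, limit)
        last_query, last_hits = query, hits
        if total == 0 and len(current) > 1:
            current.pop()  # BROADEN: drop the most restrictive term
        elif total > too_many and extras:
            current.append(extras.pop(0))  # NARROW: add a more specific term
        else:
            break  # usable set: read the abstracts of the top hits
    return last_query, last_hits
```

An empty hit list at the end means broadening ran out of terms; per step 6, call it again with synonyms rather than the same words.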

---

## COMPUTE, DON'T DESCRIBE
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
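
For example, instead of describing an enrichment test, compute it. This sketch returns a one-sided hypergeometric p-value using only the standard library; with SciPy installed, `scipy.stats.hypergeom.sf(k - 1, M, n, N)` should return the same value. The toy numbers are illustrative, not data.

```python
from math import comb


def enrichment_p_value(k: int, M: int, n: int, N: int) -> float:
    """One-sided hypergeometric p-value, P(X >= k).

    M: population size (e.g. all genes tested)
    n: population members in the category (e.g. genes in the pathway)
    N: sample size (e.g. genes in the hit list)
    k: observed overlap between the hit list and the pathway
    """
    if not (0 <= k and 0 <= n <= M and 0 <= N <= M):
        raise ValueError("require 0 <= k, 0 <= n <= M and 0 <= N <= M")
    # Sum the probability mass of every overlap at least as large as k.
    tail = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, min(n, N) + 1))
    return tail / comb(M, N)


# Toy numbers: 10 genes, 4 in the pathway, 3 hits, 2 of them in the pathway.
print(f"{enrichment_p_value(k=2, M=10, n=4, N=3):.4f}")  # 0.3333
```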

## Workflow

```
Phase 0: Clarify + Mode Select → Phase 1: Disambiguate + Profile → Phase 2: Literature Search → Phase 3: Evidence Grading → Report
```

---

## Phase 0: Mode Selection

| Mode | When | Deliverable |
|------|------|-------------|
| **Factoid** | Single concrete question | 1-page fact-check report + bibliography |
| **Mini-review** | Narrow topic | 1-3 page narrative |
| **Full Deep-Research** | Comprehensive overview | 15-section report + bibliography |

### Factoid Mode (Fast Path)
```markdown
# [TOPIC]: Fact-check Report
## Question
## Answer (with evidence rating)
## Source(s)
## Verification Notes
## Limitations
```
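
Because every section is mandatory, it helps to write the factoid headers up front and fill them in as evidence arrives. A minimal sketch; the `_Pending._` placeholder text is an assumption, not part of the template.

```python
FACTOID_SECTIONS = (
    "Question",
    "Answer (with evidence rating)",
    "Source(s)",
    "Verification Notes",
    "Limitations",
)


def factoid_skeleton(topic: str, placeholder: str = "_Pending._") -> str:
    """Return the fact-check report with every header present and placeholder bodies."""
    lines = [f"# {topic}: Fact-check Report", ""]
    for section in FACTOID_SECTIONS:
        lines += [f"## {section}", "", placeholder, ""]
    return "\n".join(lines)
```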

### Domain Detection

| Pattern | Domain | Action |
|---------|--------|--------|
| Gene/protein symbol | Biological target | Full bio disambiguation |
| Drug name | Drug | Drug disambiguation (1.5) |
| Disease name | Disease | Disease disambiguation (1.6) |
| CS/ML topic | General academic | Skip bio tools, literature-only |
| Cross-domain | Interdisciplinary | Resolve each entity in its domain |

### Cross-Skill Delegation
- Gene/protein deep-dive: `tooluniverse-target-research`
- Drug profile: `tooluniverse-drug-research`
- Disease profile: `tooluniverse-disease-research`

Use this skill for **literature synthesis**. Use specialized skills for **entity profiling**. For max depth, run both.
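
The Domain Detection table and the delegation list above amount to a lookup. A sketch, assuming the entity has already been classified; the domain keys (`gene_protein`, `drug`, and so on) and `plan_entities` are hypothetical names, and more than one distinct domain is treated as the interdisciplinary case.

```python
from typing import Dict, List, NamedTuple


class Route(NamedTuple):
    action: str          # what Phase 1 does for this entity
    delegate: List[str]  # specialised skills for deeper entity profiling


# Mirrors the Domain Detection table and the Cross-Skill Delegation list.
# Keys are hypothetical labels; classifying the entity is left to the caller.
ROUTES: Dict[str, Route] = {
    "gene_protein": Route("full bio disambiguation (1.1-1.3)", ["tooluniverse-target-research"]),
    "drug": Route("drug disambiguation (1.5)", ["tooluniverse-drug-research"]),
    "disease": Route("disease disambiguation (1.6)", ["tooluniverse-disease-research"]),
    "cs_ml": Route("skip bio tools, literature-only", []),
}


def plan_entities(entity_domains: List[str]) -> List[Route]:
    """One route per distinct domain, in first-seen order.

    More than one distinct domain is the interdisciplinary case: each entity
    is resolved in its own domain and the results are merged afterwards.
    """
    distinct = list(dict.fromkeys(entity_domains))
    unknown = [d for d in distinct if d not in ROUTES]
    if unknown:
        raise ValueError(f"unknown domain label(s): {unknown}")
    return [ROUTES[d] for d in distinct]
```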

---

## Phase 1: Subject Disambiguation + Profile

### 1.1 Biological Target Resolution
```
UniProt_search → UniProt_get_entry_by_accession → UniProt_id_mapping
ensembl_lookup_gene → MyGene_get_gene_annotation
```

### 1.2 Naming Collision Detection
Check the first 20 results. If more than 20% are off-topic, build a negative filter: `NOT [collision1] NOT [collision2]`.
For gene families, exclude the related family members: `"ADAR" NOT "ADAR2" NOT "ADARB1"`. For cross-domain collisions, add context terms.
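
The collision check and the negative filter can be sketched as follows. `on_topic` is a hypothetical judgement function (in practice, you reading the titles). OR-joining context terms inside the `AND (...)` group is an assumption about how the `([context])` slot of the 2.1 query template is meant to be filled.

```python
from typing import Callable, Iterable, Sequence


def off_topic_fraction(titles: Sequence[str], on_topic: Callable[[str], bool], window: int = 20) -> float:
    """Fraction of the first `window` results judged off-topic."""
    top = list(titles)[:window]
    if not top:
        return 0.0
    return sum(1 for t in top if not on_topic(t)) / len(top)


def build_query(term: str, collisions: Iterable[str] = (), context: Iterable[str] = ()) -> str:
    """Quoted term, optional OR-group of quoted context terms, then one NOT per collision."""
    query = f'"{term}"'
    ctx = [f'"{c}"' for c in context]
    if ctx:
        query += " AND (" + " OR ".join(ctx) + ")"
    for c in collisions:
        query += f' NOT "{c}"'
    return query


def collision_aware_query(term, titles, on_topic, collisions, threshold=0.20) -> str:
    """Add the negative filter only when more than `threshold` of results are off-topic."""
    if off_topic_fraction(titles, on_topic) > threshold:
        return build_query(term, collisions)
    return build_query(term)
```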

### 1.3 Baseline Profile (Bio Targets)
```
InterPro_get_protein_domains, UniProt_get_ptm_processing_by_accession, HPA_get_subcellular_location,
GTEx_get_median_gene_expression, GO_get_annotations_for_gene, Reactome_map_uniprot_to_pathways,
STRING_get_protein_interactions, intact_get_interactions, OpenTargets_get_target_tractability_by_ensemblID
```
GPCR targets: delegate to `tooluniverse-target-research`.

### 1.5 Drug Disambiguation
- **Identity**: `OpenTargets_get_drug_chembId_by_generic_name`, `ChEMBL_get_drug`, `PubChem_get_CID_by_compound_name`, `drugbank_get_drug_basic_info_by_drug_name_or_id`
- **Targets**: `ChEMBL_get_drug_mechanisms`, `OpenTargets_get_associated_targets_by_drug_chemblId`, `DGIdb_get_drug_gene_interactions`
- **Safety**: `OpenTargets_get_drug_adverse_events_by_chemblId`, `OpenTargets_get_drug_indications_by_chemblId`, `search_clinical_trials`

### 1.6 Disease Disambiguation
```
OpenTargets disease search → EFO/MONDO IDs
DisGeNET_get_disease_genes, DisGeNET_search_disease
CTD_get_disease_chemicals
```

### 1.7 Compound Queries (e.g., "metformin in breast cancer")
Resolve both entities, then cross-reference via CTD_get_chemical_gene_interactions, CTD_get_chemical_diseases, OpenTargets drug-target/drug-disease tools. Intersect shared targets/pathways.
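
Intersecting shared targets and pathways is plain set arithmetic once the tool outputs are reduced to identifiers. A sketch, assuming that reduction has already happened (parsing CTD, OpenTargets or Reactome responses is not shown); the gene and pathway IDs are placeholders, not real annotations.

```python
from typing import Dict, Iterable, List, Set


def pathways_of(genes: Iterable[str], gene_pathways: Dict[str, Set[str]]) -> Set[str]:
    """Union of pathway IDs annotated to any of the given genes."""
    out: Set[str] = set()
    for g in genes:
        out |= gene_pathways.get(g, set())
    return out


def cross_reference(
    drug_targets: Iterable[str],
    disease_genes: Iterable[str],
    gene_pathways: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """Shared targets and shared pathways between a drug and a disease."""
    drug, disease = set(drug_targets), set(disease_genes)
    return {
        "shared_targets": sorted(drug & disease),
        "shared_pathways": sorted(pathways_of(drug, gene_pathways) & pathways_of(disease, gene_pathways)),
    }


# Placeholder identifiers only.
result = cross_reference(
    drug_targets=["GENE_A", "GENE_B"],
    disease_genes=["GENE_B", "GENE_C"],
    gene_pathways={"GENE_A": {"PW_1"}, "GENE_B": {"PW_2"}, "GENE_C": {"PW_1"}},
)
print(result)  # {'shared_targets': ['GENE_B'], 'shared_pathways': ['PW_1', 'PW_2']}
```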

### 1.8 General Academic / 1.9 Interdisciplinary
Non-bio: skip bio tools, use ArXiv/DBLP/OSF. Cross-domain: resolve bio entities with 1.1-1.3, search CS/general in parallel, merge and cross-reference.

---

## Phase 2: Literature Search

**Methodology stays internal. Report shows findings, not process.**

### 2.1 Query Strategy
- **Step 1: Seeds** (15-30 core papers): domain-specific title searches with date/sort filters.
- **Step 2: Citation expansion**: `PubMed_get_cited_by`, `EuropePMC_get_citations/references`, `PubMed_get_related`, `SemanticScholar_get_recommendations`, `OpenCitations_get_citations`
- **Step 3: Collision-filtered broader queries**: `"[TERM]" AND ([context]) NOT [collision]`
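
One round of citation expansion can be sketched as a de-duplicated union over seeds and tools. Each fetcher is a hypothetical callable wrapping one of the expansion tools listed; ranking candidates by how many links point at them is an assumed heuristic, not part of this skill.

```python
from collections import Counter
from typing import Callable, Iterable, List

# Each fetcher stands in for one expansion tool (cited-by, references, related,
# recommendations) and returns paper IDs linked to a seed. Hypothetical interface.
Fetcher = Callable[[str], Iterable[str]]


def expand_citations(seeds: Iterable[str], fetchers: List[Fetcher], max_new: int = 200) -> List[str]:
    """One round of citation expansion around the seed set.

    Candidates are ranked by how many (seed, tool) links point at them; ties
    are broken by ID so the output is deterministic.
    """
    seed_list = list(dict.fromkeys(seeds))  # de-duplicate, keep order
    seed_set = set(seed_list)
    links: Counter = Counter()
    for seed in seed_list:
        for fetch in fetchers:
            for pid in set(fetch(seed)):  # count each (seed, tool) link once
                if pid not in seed_set:
                    links[pid] += 1
    ranked = sorted(links, key=lambda pid: (-links[pid], pid))
    return ranked[:max_new]
```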

### 2.2 Literature Tools: core set + adaptive by domain

Run the **core multi-field set on every review** (it catches what any single index misses), then add the domain rows that match the subject. Don't fire every source blindly: 6–10 well-chosen indexes beat 20 noisy ones.

**ALWAYS run (core, all disciplines)**: `PubMed_search_articles`, `EuropePMC_search_articles`, `SemanticScholar_search_papers`, and exactly one OpenAlex tool: `openalex_search_works` (query param `search`/`query`) **or** `openalex_literature_search` (query param `search_keywords`). Use the parameter name that belongs to the tool you picked; mixing them up silently returns off-topic results.

**Then add by domain:**

| Domain | Add these | Notes |
|--------|-----------|-------|
| Biomedical / clinical | `PMC_search_papers` (full text), `PubTator3_LiteratureSearch` (entity & `relations:` queries), `PubMed_Guidelines_Search` (clinical guidelines) | PubTator normalizes gene/drug/disease entities |
| Biology (ecology/evolution/plant) | **EuropePMC as PRIMARY** + OpenAlex | PubMed returns 0–1 for non-clinical biology |
| CS / ML / AI | `ArXiv_search_papers`, `DBLP_search_publications` | arXiv + CS bibliography |
| Physics / HEP / astro | `InspireHEP_search_papers` | 1.6M+ particle/astro records |
| Broad / hard-to-find / OA | `Crossref_search_works`, `CORE_search_papers`, `DOAJ_search_articles`, `Fatcat_search_scholar` | DOI registry + OA aggregators + Internet Archive Scholar |
| Regional / EU-funded | `OpenAIRE_search_publications`, `HAL_search_archive` | EU open science + French national archive |
| Datasets / software / outputs | `Figshare_search_articles`, `Zenodo_search_records` | Citable DOIs for data & code |
| Preprints (latest) | `EuropePMC_search_articles(source='PPR')`, `OSF_search_preprints`, `BioRxiv_get_preprint`/`MedRxiv_get_preprint` (DOI lookup) | bioRxiv/medRxiv/PsyArXiv etc. |

- **Multi-source**: `advanced_literature_search_agent` (12+ DBs; needs an Azure key). Fallback: query the core set individually.
- **Citation impact**: `iCite_search_publications` (RCR/APT), `iCite_get_publications` (by PMID), `scite_get_tallies` (support/contradict). iCite covers PubMed records only; for CS, use SemanticScholar.

A domain-specific index returning 0 (e.g. ArXiv on a pure-clinical topic) is normal; only worry if the whole core set is empty.
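
The core-plus-domain selection can be written as a small planner. The domain keys are hypothetical labels for the table rows; the tool names and the two OpenAlex query parameters come from this section. `core_set_empty` encodes the rule that only an all-empty core set is a problem.

```python
from typing import Dict, List, Sequence, Tuple

CORE = ["PubMed_search_articles", "EuropePMC_search_articles", "SemanticScholar_search_papers"]

# The two OpenAlex tools take different query parameters; pick one per review.
OPENALEX_QUERY_PARAM = {
    "openalex_search_works": "search",  # also accepts "query"
    "openalex_literature_search": "search_keywords",
}

# Domain rows from the table above; the keys are hypothetical labels.
DOMAIN_TOOLS: Dict[str, List[str]] = {
    "biomedical": ["PMC_search_papers", "PubTator3_LiteratureSearch", "PubMed_Guidelines_Search"],
    "biology": [],  # EuropePMC and OpenAlex are already core; EuropePMC becomes primary
    "cs_ml": ["ArXiv_search_papers", "DBLP_search_publications"],
    "physics": ["InspireHEP_search_papers"],
    "broad_oa": ["Crossref_search_works", "CORE_search_papers", "DOAJ_search_articles", "Fatcat_search_scholar"],
    "regional_eu": ["OpenAIRE_search_publications", "HAL_search_archive"],
    "datasets": ["Figshare_search_articles", "Zenodo_search_records"],
    # Plus EuropePMC_search_articles(source='PPR'); the *Rxiv tools are DOI lookups.
    "preprints": ["OSF_search_preprints", "BioRxiv_get_preprint", "MedRxiv_get_preprint"],
}


def plan_sources(domains: Sequence[str], openalex_tool: str = "openalex_search_works") -> Tuple[List[str], str]:
    """Core set plus one OpenAlex tool, then domain additions without duplicates.

    Returns the tool list and the query-parameter name the chosen OpenAlex tool expects.
    """
    if openalex_tool not in OPENALEX_QUERY_PARAM:
        raise ValueError(f"unknown OpenAlex tool: {openalex_tool}")
    tools = CORE + [openalex_tool]  # new list; CORE itself is not modified
    for domain in domains:
        for tool in DOMAIN_TOOLS[domain]:
            if tool not in tools:
                tools.append(tool)
    return tools, OPENALEX_QUERY_PARAM[openalex_tool]


def core_set_empty(hit_counts: Dict[str, int], openalex_tool: str = "openalex_search_works") -> bool:
    """True only when every core index returned nothing; domain-specific zeros are normal."""
    return all(hit_counts.get(t, 0) == 0 for t in CORE + [openalex_tool])
```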

### 2.3-2.4 Full-Text & PubMed Zero-Result Fallback

Full-text: see `FULLTEXT_STRATEGY.md` for the three-tier strategy.

**CRITICAL**: PubMed returns 0 for ~30% of valid queries. **Always retry with EuropePMC** when PubMed returns empty. This is not optional.

### 2.5 Tool Failure / OA Handling
Retry a failing tool once, then switch to its fallback. Key fallback chain: PubMed_get_cited_by -> EuropePMC_get_citations -> OpenCitations. For OA status, use Unpaywall if configured; otherwise use the OA flags from Europe PMC, PMC, or OpenAlex.
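
The retry-then-fallback rule, including the PubMed zero-result retry, can be sketched as a chain walker. Each entry pairs a tool name with a zero-argument callable that would wrap the real tool call (that wrapping is assumed and not shown). A tool that raises is retried once; an empty result skips straight to the next tool.

```python
from typing import Any, Callable, Optional, Sequence, Tuple

# (tool name, zero-argument callable wrapping the real tool call): hypothetical shape
Step = Tuple[str, Callable[[], Any]]


def run_with_fallback(chain: Sequence[Step], retries: int = 1) -> Tuple[Optional[str], Any]:
    """Walk the chain; retry a raising tool `retries` times, fall back on empty results.

    Returns (name of the tool that answered, its result) or (None, None).
    """
    for name, call in chain:
        for _ in range(retries + 1):
            try:
                result = call()
            except Exception:  # broad on purpose in this sketch: any tool error
                continue       # retry this tool, then move down the chain
            if result:
                return name, result
            break  # empty result: retrying will not help, use the fallback
    return None, None
```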

---

## Phase 3: Evidence Grading

| Tier | Label | Bio Example | CS/ML Example |
|------|-------|-------------|---------------|
| **T1** | Mechanistic | CRISPR KO + rescue, RCT | Formal proof, controlled ablation |
| **T2** | Functional | siRNA knockdown phenotype | Benchmark with baselines |
| **T3** | Association | GWAS, screen hit | Observational, case study |
| **T4** | Mention | Review article | Survey, workshop abstract |

Cite inline: `Target X regulates Y [T1: PMID:12345678]`. For each theme, summarize the evidence distribution (how many claims fall in each tier).
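
Inline tags and the per-theme distribution can be generated from a list of graded claims. `Claim` and its fields are hypothetical; the tag format matches the inline example above, and the PMIDs in any usage are placeholders.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple

TIERS = ("T1", "T2", "T3", "T4")


class Claim(NamedTuple):
    text: str
    tier: str   # one of TIERS
    pmid: str
    theme: str  # report section/theme the claim belongs to


def cite(claim: Claim) -> str:
    """Render a claim with its inline tier + PMID tag."""
    if claim.tier not in TIERS:
        raise ValueError(f"unknown evidence tier: {claim.tier!r}")
    return f"{claim.text} [{claim.tier}: PMID:{claim.pmid}]"


def tier_distribution(claims: Iterable[Claim]) -> Dict[str, Dict[str, int]]:
    """Per theme, how many claims sit at each tier (zeros included)."""
    counts: Dict[str, Counter] = {}
    for c in claims:
        counts.setdefault(c.theme, Counter())[c.tier] += 1
    return {theme: {t: cnt[t] for t in TIERS} for theme, cnt in counts.items()}
```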

---

## Report Output

| File | Mode |
|------|------|
| `[topic]_report.md` | Full |
| `[topic]_factcheck_report.md` | Factoid |
| `[topic]_bibliography.json` + `.csv` | All |

**Progressive update**: create report with all section headers immediately. Fill after each phase. Write Executive Summary LAST.

Use the 15-section template from `REPORT_TEMPLATE.md`. Domain adaptations: bio (architecture/expression/GO/disease), drug (properties/MOA/PK/safety), disease (epi/patho/genes/treatments), general (history/theories/evidence/applications).
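
The output files in the table above, and the JSON/CSV bibliography pair, can be produced as below. The slug rule for `[topic]` and the bibliography fields are assumptions. The table names no report file for Mini-review, so the sketch returns only the bibliography files for that mode.

```python
import csv
import json
import re
from pathlib import Path
from typing import Dict, List

REPORT_FILE = {"full": "{stem}_report.md", "factoid": "{stem}_factcheck_report.md"}
MODES = ("full", "mini-review", "factoid")


def topic_slug(topic: str) -> str:
    """Lower-case, underscore-separated stem for `[topic]` (assumed naming rule)."""
    return re.sub(r"[^a-z0-9]+", "_", topic.lower()).strip("_")


def output_files(topic: str, mode: str) -> List[str]:
    """Report file for the mode (if the table names one) plus the bibliography pair."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    stem = topic_slug(topic)
    files = [f"{stem}_bibliography.json", f"{stem}_bibliography.csv"]
    if mode in REPORT_FILE:
        files.insert(0, REPORT_FILE[mode].format(stem=stem))
    return files


def write_bibliography(entries: List[Dict[str, str]], topic: str, outdir: Path) -> None:
    """Write the same entries as JSON and as CSV; missing fields become blank cells."""
    stem = outdir / f"{topic_slug(topic)}_bibliography"
    stem.with_suffix(".json").write_text(json.dumps(entries, indent=2), encoding="utf-8")
    fields = sorted({key for entry in entries for key in entry})
    with open(stem.with_suffix(".csv"), "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields)
        writer.writeheader()
        writer.writerows(entries)
```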

---

## Communication

- Give brief progress updates only: "Resolving identifiers...", "Building paper set...", "Grading evidence..."
- Do NOT expose raw tool outputs, dedup counts, or search-round details.

---

## References

- `TOOL_NAMES_REFERENCE.md` -- 123 tools with parameters
- `REPORT_TEMPLATE.md` -- template, domain adaptations, bibliography, completeness checklist
- `FULLTEXT_STRATEGY.md` -- three-tier full-text verification
- `WORKFLOW.md` -- compact cheat-sheet
- `EXAMPLES.md` -- worked examples

Source: creator's repository, mims-harvard/tooluniverse
