Hermes

Paper Recommendation-19-Bioinformatics Technology · Jun 15 19:05

2026-06-15 19:00 · 40 messages · deepseek-v4-flash

user · 2026-06-15 19:00
[IMPORTANT: The user has invoked the "paper-recommender" skill, indicating they want you to follow its instructions. The full skill content is loaded below.] --- name: paper-recommender description: "每日定时推送高质量论文推荐 — PubMed检索 → Hermes+Claude双重评估 → 仅推最优1篇 → HTML深度解读" version: 2.10.0 author: Hermes tags: [PubMed, 论文推荐, Claude评估, 文献解读, 科研] --- # 📚 每日论文推荐工作流 (Paper Recommender) ## 职责 根据用户的6个研究兴趣方向,每日定时从PubMed检索近期高质量论文,经过Hermes+Claude双重评估后,**只选最优的1篇**,下载PDF+提取原图,制作公众号风格的HTML深度解读文件并推送。 ## 用户研究方向 1. 帕金森病(PD) / GZR18 / α-synuclein 2. 神经炎症 / 小胶质细胞 / 星形胶质细胞 3. 细胞死亡方式 (Mitoxyperiosis / 铁死亡 / 焦亡 / 泛素化降解) 4. 外泌体 / 细胞外囊泡 / 神经退行性疾病 5. 脑缺血 / 脑卒中 / 神经保护 6. 生信分析 / 单细胞测序 / 多组学 ## ⚡ 用户核心课题(跨所有方向的超优先主题) **博一课题**:Warburg乳酸 → PCAF介导RhoA K118/K162乳酸化 → RhoA持续激活 → 线粒体远离质膜 → 拮抗mitoxyperiosis 所有方向的论文推荐,只要与以下关键词相关的论文**自动获得更高优先级**,应优先于同评分不相关的论文推送: - Mitoxyperiosis / 线粒体-质膜接触 / 新型细胞死亡 - 乳酸化修饰(Kla) / PCAF / p300 / 组蛋白乳酸化 - RhoA信号 / 细胞骨架重排 / 小G蛋白修饰 - Warburg效应 / 肿瘤代谢 / LDHA抑制 - 糖酵解 / 乳酸代谢 / 肿瘤免疫代谢 - 禁食疗法 / 卡路里限制 / 联合免疫治疗 - 代谢组学 / 能量代谢 / 肿瘤微环境 评估论文时,额外增加一个维度:**"课题借鉴价值"** — 这篇论文中有什么技术、概念、实验设计可以直接复用到用户的mitoxyperiosis-乳酸化课题? ## 定时推送安排 | 时间 | 主题 | 关键词 | |:----:|------|--------| | 08:00 | PD/帕金森病 | Parkinson's disease, α-synuclein, MPTP, dopamine neuron | | 11:00 | 神经炎症/胶质细胞 | neuroinflammation, microglia, astrocyte, NLRP3 | | 13:00 | 细胞死亡/信号通路 | cell death, mitoxyperiosis, ferroptosis, ubiquitination | | 15:00 | 外泌体/细胞通讯 | exosome, extracellular vesicle, neural, brain | | 17:00 | 脑缺血/神经保护 | cerebral ischemia, stroke, neuroprotection, MCAO | | 19:00 | 生信/前沿技术 | single-cell, multi-omics, bioinformatics, spatial transcriptomics | ## ⚠️ 已知工具陷阱 ### 文件系统隔离(重要!) 
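`execute_code` runs in a sandbox and **cannot directly access** files that the `terminal` tool writes to `/tmp/` or `$HOME/`, and vice versa.
- ❌ Wrong: have terminal write `/tmp/fig1_b64.txt`, then read it with `open()` in execute_code → the file does not exist
- ✅ Correct, option 1: do all data processing in terminal (run Python scripts with `python script.py`)
- ✅ Correct, option 2: write the data-processing code to disk as a `.py` file, then run it from terminal (`python path/to/script.py`)

### Large-file HTML generation
When the HTML embeds base64 images (5 original Nature figures ≈ 4MB), `write_file` fails because of its size limit.
- ❌ Do not write a 4MB+ HTML file directly with write_file
- ✅ Strategy: write a Python builder script to disk (`write_file → build_html.py`), then run it with `terminal("python build_html.py")`

### Claude CLI long-prompt timeout pitfall
`claude -p` tends to time out (exit code 124, empty output) when it is given a large block of text, especially in Windows git-bash.
- ❌ Passing the full abstracts of 4 papers plus the evaluation instructions in one call → timeout
- ✅ Strategy 1: simplify the evaluation instructions and limit each paper to 1-3 sentences of key information, or split the work into separate calls that each evaluate one paper
- ✅ Strategy 2: truncate the output with `head -N` or `tail -N` (for example `claude -p "评估..." 2>&1 | head -60`) so that overly long generations do not cause a timeout. Even if Claude has not finished, the first 60 lines usually already contain the scores

### Cell Press PII cleanup pattern (important!)
The PII returned by PubMed esummary contains parentheses and hyphens (for example `S0896-6273(26)00386-7`); the CDN URL needs these characters removed:
- ❌ `tr -d '()'` → removes only the parentheses and leaves the hyphens → CDN returns 404
- ✅ `tr -d '()-'` → removes both parentheses and hyphens → CDN returns 200
- See the corrected example in `references/figure-download-patterns.md`

### Python f-string vs. JavaScript brace conflict (HTML builder pitfall)
When HTML is built with a Python f-string, the braces in JavaScript code (for example in `img.onload = function() { }`) are parsed by Python as f-string replacement fields, which causes a SyntaxError.
- ❌ `html = f'''...<script>img.onload = function() { canvas.width = 780; }</script>...'''` → SyntaxError
- ✅ Strategy 1: use a list of strings instead of an f-string (split the HTML into parts, append them to a list, and combine with `+` or `''.join()`)
- ✅ Strategy 2: keep the HTML structure and the JS as separate plain strings and inject the base64 data with `.replace()`
- ✅ Strategy 3: double every `{` and `}` in the JS to `{{` and `}}` (only when an f-string is unavoidable)
- Recommended: use a function `build_html(fig_b64, ...)` that takes a dict argument and returns the joined string (see the HTML template in Step 6)

### Unicode emoji surrogate encoding pitfall (Python HTML builder)
When building HTML that contains emoji in a Python triple-quoted string, **do not use UTF-16 surrogate-pair escape sequences** (such as `\ud83d\udcc5` for 📅). Python 3's `str.encode('utf-8')` rejects surrogate characters, so writing the file raises `UnicodeEncodeError: surrogates not allowed`.
- ❌ `parts.append('<span>\ud83d\udcc5 2026 Jun 9</span>')` → `UnicodeEncodeError`
- ✅ Use the actual Unicode character (paste the emoji directly) or the full code point with `\U`: `\U0001F4C5`
- ✅ Best practice: **avoid emoji entirely** and use plain-text markers such as `[日期]`, `[期刊]`, `[星级]`, which avoids the encoding problem and renders consistently in every terminal and browser

```python
# ❌ Wrong: surrogate pairs make the file write fail
parts.append('<span>\ud83d\udcc5 2026 Jun 9</span>')
# ✅ Correct: plain-text marker
parts.append('<span>[日期] 2026 Jun 9</span>')
```

### Cell Press figure size selection (bandwidth optimization)
The Cell Press CDN offers two sizes: `gr{N}_lrg.jpg` (high-resolution, 1.5-3.5MB each) and `gr{N}.jpg` (regular, 0.1-0.4MB each). For HTML embedding:
- `gr{N}.jpg` (no _lrg suffix) is fully sufficient and the page loads faster
- Use the high-resolution `gr{N}_lrg.jpg` only when details need to be inspected at zoom
- Both sizes have been verified (HTTP 200 ✅)

### Taylor & Francis journals (Gut Microbes etc.): fully blocked by Cloudflare (found 2026-06-15)
Taylor & Francis journals (such as Gut Microbes and Autophagy) sit behind full Cloudflare protection that **neither curl nor the browser tool can bypass**:
```bash
# Every endpoint returns 403 / a Cloudflare challenge:
# Article page: https://www.tandfonline.com/doi/full/{DOI} → 403 Cloudflare
# PDF: https://www.tandfonline.com/doi/pdf/{DOI} → 403 Cloudflare
# Image CDN: https://www.tandfonline.com/na101/... → 403 Cloudflare
# Papers without a PMC version → no figure or PDF can be obtained at all
```
**How to recognize**: an esummary result with `source: "Gut Microbes"` or `publisher: "Informa UK Limited"` indicates a Taylor & Francis journal.

**Fallback strategy**: if the recommended paper's figures and PDF are all blocked by Cloudflare, degrade in this order:
1. If the paper is irreplaceable (extremely high value or the only match), still push it, but build the HTML as a **text-only in-depth reading** (no original figures) that describes the key results in scientific prose
2. If the paper is not the only option, **switch to the second candidate** (the next-highest score whose figures can be downloaded), and treat "figure accessibility" as an evaluation dimension already at the Step 4 decision
3. When degrading, state this in the push message: "⚠️ 原最优论文因期刊Cloudflare防护无法获取原图,已切换至可获取原图的次优推荐"

⚠️ **Contrast with Cell Press**: Cell Press article pages are also behind Cloudflare, but its figure images are served from the `ars.els-cdn.com` CDN (not Cloudflare-protected) and can be downloaded directly with curl. Taylor & Francis serves its CDN and its article pages from the same domain, so both are Cloudflare-protected. **Do not confuse the two.**

### Windows Python path pitfall
In Windows git-bash, Python scripts do not recognize MSYS-style paths (`/c/Users/...`):
- ❌ `open("/c/Users/forgase/fig1_b64.txt")` → FileNotFoundError
- ✅ `open("C:/Users/forgase/fig1_b64.txt")`: forward slashes are accepted by Windows Python
- Likewise, the `python3` command may be intercepted by an app execution alias on Windows → always use `python`

## Execution flow (full version)

### Step 1: PubMed search, three-stage strategy (with cross-domain search)

**Where to run: terminal, not execute_code**

PubMed API calls should run in `terminal` through a `python << 'PYEOF'` heredoc (recommended) or `python -c "..."`. Python in `terminal` has a more stable connection, and its results can be used by later terminal commands (written to files, piped). Avoid running PubMed searches in `execute_code`: the sandbox's network connection may be unstable, and JSON results cannot be passed from the sandbox to terminal.

⚠️ For complex scripts (f-strings, multi-line code, nested quotes), prefer a heredoc over `-c`:
```bash
python << 'PYEOF'
import urllib.request, json, re
# Multi-line code; single quotes, double quotes, and f-strings can be used directly
print(f"IDs: {', '.join(idlist)}")
PYEOF
```

**Two ways to set the date range (option B is more reliable)**:
```python
# Option A: mindate/maxdate parameters (sometimes returns 0 results; unreliable)
params = {
    'db': 'pubmed', 'term': query,
    'mindate': '<3 months ago>', 'maxdate': '<today>',
    'datetype': 'pdat', 'retmax': 15, 'retmode': 'json'
}
# Option B: [dp] query syntax (recommended, verified reliable)
query = '(ferroptosis[Title/Abstract] OR ...) 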
AND ("2026/03/01"[dp] : "2026/06/12"[dp])' params = { 'db': 'pubmed', 'term': query, 'retmax': 15, 'retmode': 'json' } ``` `dp` (Date - Publication) 语法格式:`"YYYY/MM/DD"[dp] : "YYYY/MM/DD"[dp]`。日期之间需要空格+冒号+空格。此方式不受 `mindate/maxdate` 参数兼容性问题影响。 详见 `references/pubmed-API-patterns.md` 第4节。 **阶段A — 主题搜索**:用关键词搜该方向近期论文 **阶段B — 高IF期刊过滤**:用主题词 + `("Nature"[Journal] OR "Cell"[Journal] OR ...)` 格式搜 ⚠️ 关键陷阱: - `sort=date` 被PubMed忽略(返回warning),不要用 - 期刊名必须用 `"全称"[Journal]` 格式,裸词会被翻译为MeSH词条 - 复杂查询在curl中容易URL编码失败,优先用Python - PubMed JSON返回的ID列表键名为小写 `idlist`,不是 `IdList`(驼峰) **阶段C — 跨域交叉搜索(重要!2026-06-15新增)**: 执行完阶段A+B后,如果主搜索未直接命中与用户**核心课题**(乳酸化/mitoxyperiosis/代谢/线粒体)相关的论文,必须额外运行一次交叉搜索,将当前主题关键词与核心课题关键词组合: ```python # 示例:脑缺血方向 + 核心课题交叉搜索 query = '(cerebral ischemia[Title/Abstract] OR stroke[Title/Abstract] OR MCAO[Title/Abstract]) AND (lactate[Title/Abstract] OR glycolysis[Title/Abstract] OR mitochondrial[Title/Abstract] OR cell death[Title/Abstract] OR LDHA[Title/Abstract] OR lactylation[Title/Abstract] OR Warburg[Title/Abstract]) AND ("Nature"[Journal] OR "Cell"[Journal] OR "Cell Reports"[Journal] OR "Nature Communications"[Journal] OR ...) 
AND ("2026/03/15"[dp] : "2026/06/15"[dp])' ``` ⚠️ **为什么需要阶段C**:主搜索(阶段A+B)范围广但深度浅——高IF期刊的脑缺血论文可能集中在临床/影像/流行病学方向,而用户核心课题(乳酸化、线粒体动力学、代谢重编程)的交叉论文可能剂量小但价值极高。阶段C就是专门挖掘这些"交叉价值论文"。 **核心课题关键词池**(所有方向通用,根据当前主题选择2-3个组合): - `lactate[Title/Abstract] OR glycolysis[Title/Abstract] OR LDHA[Title/Abstract]` - `mitochondrial[Title/Abstract] OR mitoxyperiosis[Title/Abstract] OR cristae[Title/Abstract]` - `lactylation[Title/Abstract] OR Kla[Title/Abstract] OR PCAF[Title/Abstract] OR p300[Title/Abstract]` - `cell death[Title/Abstract] OR ferroptosis[Title/Abstract] OR apoptosis[Title/Abstract] OR parthanatos[Title/Abstract]` - `Warburg[Title/Abstract] OR TCA cycle[Title/Abstract] OR OXPHOS[Title/Abstract]` - `RhoA[Title/Abstract] OR cytoskeleton[Title/Abstract] OR actin[Title/Abstract]` **决策规则**:阶段C发现的论文即使评分与其他候选持平,也应优先推送(因与核心课题的相关性天然更高)。若阶段C找到的论文评分与阶段B的最优论文差距在5分以内,优先选阶段C的论文。 ### Step 2: 获取论文详情并初筛 ```python import urllib.request, json, time # 获取元数据 url = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=<PMIDs>&retmode=json' # 获取摘要(XML模式比esummary更可靠) url = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=<PMIDs>&retmode=xml' ``` **初筛硬性门槛**(必须同时满足): 1. IF > 10 或领域顶刊 2. 2026年3月后发表 3. 与用户研究方向直接相关 4. 原创研究(pubtype不含Review) 5. 非 News & Views / Letter / 编辑评论 6. 
**有作者(推荐用 efetch XML 确认,而非 esummary 的 hascreator 字段)** - `esummary`的`hascreator`字段有时不可靠:已验证所有高IF期刊论文均返回`hascreator='N'`(即使是多作者原创研究)。 - 改用 efetch XML:统计`<LastName>`标签数(`len(re.findall(r'<LastName>', article_text))`),大于0即有作者。注意不要用`r'<Author>'`做regex——XML实际格式为`<Author ValidYN="Y">`(带属性),`r'<Author>'`无法匹配返回0。 - 若只能用 esummary:不要仅因`hascreator='N'`排除——结合`pubtype`综合判断。 ⚠️ **esummary 的 HasAbstract 字段同样不可靠**(2026-06-13发现):在脑缺血/卒中方向搜索中,高IF期刊论文(Nat Neurosci, Cell Rep, Brain, Nat Commun等)的 esummary `attributes` 数组均不包含 `HasAbstract`(即使论文明确有完整摘要)。**不要用 `HasAbstract in attributes` 做初筛条件**——这会导致几乎所有高IF论文被误排除。改用 efetch XML 的 `<AbstractText>` 标签计数来确认:`len(re.findall(r'<AbstractText', xml_text)) > 0`。 ### Step 3: Claude Code深度评估(Hermes + Claude 双重评判) 将经过初筛的候选论文交给 Claude Code 严格评估。⚠️ **提示词控制**:每篇论文只用2-3句描述摘要核心发现,不传完整XML摘要——传完整的PubMed XML摘要会导致Claude超时(exit 124)。同时用 `2>&1 | head -N` 截断输出防止Claude生成过长。 **推荐:分2-3批评估**,每批2篇论文(不超3篇),使用结构化JSON输出格式: ```bash claude -p "严格评估以下论文,从5个维度各评1-10分。只输出JSON格式评分结果,不要多余文字。 论文1: PMID XXXXX, Journal 'Title' 1-2句核心发现摘要。 论文2: PMID YYYYY, Journal 'Title' 1-2句核心发现摘要。 5维度: 1.逻辑完整性(gain+loss of function, KO/KI模型) 2.方法学水平(单细胞测序、冷冻电镜、活细胞成像等) 3.创新性(新概念/新机制,首次报道) 4.可借鉴性(课题可复用性 — 用户研究mitoxyperiosis乳酸化,优先构象转换/蛋白互作/代谢酶-细胞器互作相关的实验设计) 5.临床转化潜力(体内验证或治疗意义) 输出格式: {\"paper1\":{\"总分\":N,\"细则\":{\"逻辑完整性\":N,\"方法学\":N,\"创新性\":N,\"可借鉴性\":N,\"临床转化\":N}},\"paper2\":{\"总分\":...}}" 2>&1 | head -40 ``` **分批次评估策略**: - 如果候选论文4+篇,先评最有希望的2篇(取命中主题最直接的),再评剩余2篇 - Claude对4+篇的批量评估容易超时或遗漏评分维度 - 每批之间无等待要求(每次都是独立claude进程) **5维度评分**: 1. **逻辑完整性** (1-10):完整正反验证(gain+loss of function)、KO/KI模型 2. **方法学水平** (1-10):前沿技术(单细胞测序、冷冻电镜、活细胞成像、TurboID等) 3. **创新性** (1-10):新概念/新机制/新方法,首次报道 4. **可借鉴性** (1-10):实验设计是否值得用户在自己的课题中参考 **← 这是最关键的维度** 5. 
**临床转化潜力** (1-10):体内验证或治疗意义 ### Step 4: 唯一推荐(核心规则) - Claude评分 < 30分(平均<6分)或任何单维度 < 5分 → **不推荐**,跳过 - 多个候选都达标 → **只选最优的1篇**,不要让用户选 - Hermes再次确认质量 + 相关性 + 可借鉴性 → 只有双方都确认才通过 - **如果这篇研究方向不匹配(如planarian flatworm研究),即使评分高也不能推** - **Figure可访问性也是决策因素之一**:若评分最高的论文所属期刊为Taylor & Francis/Cloudflare完全封锁型,且无法通过PMC或替代渠道获取Figures,而评分次高的论文有可下载的Figures,应优先选择次高——一篇有原图深入解读的论文价值高于一篇无图的纯文本推荐。⚠️ 同时满足以下条件时才触发该降级: - 最高分论文与次高分差 ≤ 3分(否则差太多不可替代) - 最高分论文的可借鉴性 < 10/10(若可借鉴性满分则仍应推送,即使无图) - 最高分论文确实无法通过任何途径获取Figures(PMC/ResearchGate/作者主页等均已排查) ### Step 5: 下载PDF + 提取Figure图片 **⚠️ 预清理:删除旧版fig*_b64.txt文件** 在开始下载前,务必先清理用户目录下所有旧fig*_b64.txt文件,避免前一日的残留文件混入当前日期的图片构建: ```bash rm -f /c/Users/forgase/fig*_b64.txt ``` Python builder脚本使用`glob.glob("C:/Users/forgase/fig*_b64.txt")`读取所有此类文件,若不清理,旧论文的多余图片会污染当前HTML构建。 **PDF下载**: - Nature系列: `https://www.nature.com/articles/{article_id}.pdf`(已验证:Nature, Nat Neurosci, Nat Med, Nat Commun均可下载) - Cell Press/Elsevier: PDF有ScienceDirect防护,通常无法通过curl下载 - **Taylor & Francis期刊**(Gut Microbes等):PDF完全被Cloudflare封锁,`curl`返回403,**无法下载** - 所有期刊的Figure图片可通过CDN下载(详见`references/figure-download-patterns.md`) - 注意:部分期刊有付费墙,PDF可能无法下载,此时跳过PDF但继续下载Figure图片 **PDF真实性验证**:下载PDF后用 `file` 命令确认是否为真实PDF: ```bash file "/c/Users/forgase/s41593-026-02317-w.pdf" # 预期输出: "PDF document, version 1.4, N page(s)" ✅ # 若输出: "HTML document" ❌ — 反爬重定向,非真实PDF # 已验证:Nat Commun的unedited version PDF可能返回HTML document ``` **Figure可访问性检查(核心逻辑)**:收到Step 4的推荐决策后,尝试下载Figure前先快速验证期刊类型及其可访问性: ```bash # 快速检查:尝试curl访问Figure URL的HTTP状态码 # HTTP 200 → 可下载 ✅ # HTTP 403/Cloudflare → 被封锁 ❌ # 若被封锁,且已按Step 4评估后仍推荐该论文,则Figures部分使用纯文本解读 ``` ### 提取文章元数据(Nature系列 — 替代efetch XML的简便方法) Nature文章页HTML的 `<head>` 嵌入了 `citation_*` 元标签,通过 `curl` + Python可干净提取标题、作者、期刊、日期,比解析efetch XML更简洁: ```bash curl -s -L -H "User-Agent: Mozilla/5.0" "https://www.nature.com/articles/${ARTICLE_ID}" \ | python -c " import sys, re html = sys.stdin.read() authors = re.findall(r'<meta[^>]*name=\"citation_author\"[^>]*content=\"([^\"]+)\"', html) 
print('Authors: ' + '; '.join(authors[:5]) + ('...' if len(authors)>5 else '')) date = re.findall(r'<meta[^>]*name=\"citation_date\"[^>]*content=\"([^\"]+)\"', html) print('Date: ' + (date[0] if date else 'N/A')) journal = re.findall(r'<meta[^>]*name=\"citation_journal_title\"[^>]*content=\"([^\"]+)\"', html) print('Journal: ' + (journal[0] if journal else 'N/A')) title = re.findall(r'<meta[^>]*name=\"citation_title\"[^>]*content=\"([^\"]+)\"', html) print('Title: ' + (title[0] if title else 'N/A')) " ``` 将提取到的元数据直接用于Step 6的HTML头部。已验证适用于Nature、Nat Neurosci、Nat Med、Nat Commun。 ⚠️ **Date fallback:`citation_date` 可能为空**(2026-06-14发现):部分Nature系列文章(如Nat Neurosci s41593-026-02317-w)的 `<meta name="citation_date">` 标签不存在或内容为空。此时应降级使用 `citation_online_date` 或 `dc.date`: ```bash # Fallback date extraction — 按优先级尝试 date = re.findall(r'<meta[^>]*name=\"citation_online_date\"[^>]*content=\"([^\"]+)\"', html) if not date: date = re.findall(r'<time[^>]*datetime=\"([^\"]+)\"', html) # 取第一个time标签 if not date: date = re.findall(r'<meta[^>]*name=\"dc\.date\"[^>]*content=\"([^\"]+)\"', html) print('Date: ' + (date[0] if date else 'N/A')) ``` ### Nature系列PDF下载注意 - Nature系列: `https://www.nature.com/articles/{article_id}.pdf`(已验证:Nature, Nat Neurosci, Nat Med可下载真实PDF) - **Nat Commun PDF ⚠️ 情况不定**:已验证部分Nat Commun文章返回真实PDF(PMID 42098118, 20页PDF),部分返回HTML反爬页面(PMID 42140967)。下载后用`file`命令确认:若为"HTML document"则非真实PDF,跳过;若为"PDF document"则保留使用。 - Cell Press/Elsevier: PDF有ScienceDirect防护,通常无法通过curl下载 **Figure提取(Nature系列通用模式 — 已验证 Nature, Nat Neurosci, Nat Med, Nat Commun)**: ```bash # 1. 
从HTML中提取所有figure图片URL,获取实际图数(推荐,避免硬编码遗漏) echo "=== Auto-detect figure count from CDN ===" for fig in $(seq 1 12); do # 枚举到12足以覆盖几乎所有论文 url="https://media.springernature.com/lw1200/springer-static/image/art%3A10.1038%2F${ARTICLE_ID}/MediaObjects/${JOURNAL_ID}_${YEAR}_${ARTICLE_NUM}_Fig${fig}_HTML.png" code=$(curl -s -o /dev/null -w "%{http_code}" -L "$url" -H "User-Agent: Mozilla/5.0") if [ "$code" != "404" ]; then echo "Fig${fig}: HTTP ${code}" else FIG_COUNT=$((fig - 1)) echo "Last figure found: Fig${FIG_COUNT}" break fi done # 2. 暴力枚举法(传统版)— 扩展至1-12,并验证文件实际写入 for fig in $(seq 1 12); do url="https://media.springernature.com/lw1200/springer-static/image/art%3A10.1038%2F${ARTICLE_ID}/MediaObjects/${JOURNAL_ID}_${YEAR}_${ARTICLE_NUM}_Fig${fig}_HTML.png" code=$(curl -s -o /dev/null -w "%{http_code}" -L "$url" -H "User-Agent: Mozilla/5.0") if [ "$code" != "404" ]; then curl -s -L "$url" -H "User-Agent: Mozilla/5.0" | base64 -w 0 > "/c/Users/forgase/fig${fig}_b64.txt" fi done # 3. ✅ 验证所有文件实际写入成功(2026-06-13 pitfall:curl pipe可能报告成功但文件未写出) echo "=== Verification ===" for fig in $(seq 1 12); do if [ -f "/c/Users/forgase/fig${fig}_b64.txt" ]; then size=$(wc -c < "/c/Users/forgase/fig${fig}_b64.txt") echo "Fig${fig}: OK (${size} bytes)" else echo "Fig${fig}: MISSING — re-downloading..." 
# Re-download individually url="https://media.springernature.com/lw1200/springer-static/image/art%3A10.1038%2F${ARTICLE_ID}/MediaObjects/${JOURNAL_ID}_${YEAR}_${ARTICLE_NUM}_Fig${fig}_HTML.png" curl -s -L "$url" -H "User-Agent: Mozilla/5.0" | base64 -w 0 > "/c/Users/forgase/fig${fig}_b64.txt" fi done ``` **Figure Caption提取(Nature系列 — 紧跟图片下载之后执行)**: 提取的caption将用于Step 6 HTML的逐图解读部分。 ```bash curl -s -L -H "User-Agent: Mozilla/5.0" "https://www.nature.com/articles/${ARTICLE_ID}" \ | python -c " import sys, re html = sys.stdin.read() figs = re.findall(r'<fig[^>]*>.*?<figcaption[^>]*>(.*?)</figcaption>', html, re.DOTALL) for i, fig in enumerate(figs): clean = re.sub(r'<[^>]+>', '', fig).strip() print(f'{i+1}|{clean}') " ``` 然后将captions嵌入Python builder脚本中的字典变量。 **Cell Press系列Caption处理**(Neuron, Cell Metabolism等): Cell Press主页面(www.cell.com / sciencedirect.com)使用Cloudflare防护,curl无法获取页面HTML,因此`<figcaption>`标签无法通过解析提取。已验证策略: - **手动构建**:基于abstract和论文结构(典型结构:Fig1=现象发现, Fig2-3=分子机制, Fig4-5=通路验证, Fig6=体内验证, Fig7-8=深入机制+模型)合理推断各Figure内容。注意Neuron论文可能达8-9张图(PMID 42263678共8张Fig),枚举至少到9。 - **模板**:将描述写为Python字典嵌入builder脚本(详见 `references/figure-download-patterns.md` "Pre-print / Unedited Version Caption 降级策略") - Cell Press的Figure图片仍可通过 `ars.els-cdn.com` CDN直接下载(JPEG格式),不受Cloudflare影响 ⚠️ **预印本降级策略**:若上述caption提取命令输出为空(无 `<figcaption>` 标签命中),检查文章页是否有"unedited version"或"preprint"标记。 但注意:**并非所有unedited version都无caption**——已验证Nat Commun unedited version (s41467-026-72783-0) 可正常提取9条figcaptions。只有在regex确实返回空时才需要降级。 此时Figure图片仍可通过CDN下载,但caption需手动构建。策略: - 基于abstract和论文结构合理推断各Figure内容(已验证可行) - 将手动编写的caption以Python字典嵌入builder脚本 - 详见 `references/figure-download-patterns.md` "Pre-print / Unedited Version Caption 降级策略" 详见 `references/figure-download-patterns.md` 中的"Figure Caption Extraction"节。 **分辨率选择**: - `/full/` — 最高分辨率(推荐用于Fig1) - `/lw1200/` — 宽1200px(其余Figures用此即可) - `/m685/` — 宽685px(小尺寸) **已验证的图片URL模式**: | 期刊 | article_id 示例 | journal_id | URL模式 | 
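|------|----------------|------------|---------|
| Nature | s41586-026-10597-2 | 41586 | `41586_{year}_{article_num}_Fig{N}_HTML.png` |
| Nat Neurosci | s41593-026-02317-w | 41593 | `41593_{year}_{article_num}_Fig{N}_HTML.png` |
| Nat Med | s41591-026-04393-8 | 41591 | `41591_{year}_{article_num}_Fig{N}_HTML.png` |
| Nat Commun | s41467-026-73538-7 | 41467 | `41467_{year}_{article_num}_Fig{N}_HTML.png` |

- article_num extraction rule: from the DOI, take the second-to-last segment after the journal ID (the digit run that follows the year segment) and strip leading zeros. For example `s41591-026-04393-8` → number segment `04393` → `4393`; `s41586-026-10480-0` → `10480` (no leading zeros, used as is); `s41593-026-02317-w` → `02317` → `2317`
- See `references/figure-download-patterns.md` for each journal's detailed URL pattern

### Step 6: Build the WeChat-article-style HTML in-depth reading (Python builder mode)

HTML file requirements:
1. Embed every image as a base64 data URI; do not use the file:// protocol
2. Follow WeChat official-account layout (clean card style)
3. Pair each result with replication advice (🔬 复现建议:...)
4. Include the original figures + a figure-by-figure reading + a summary of the article
5. Save to the desktop, using the naming convention `{主题关键词}_文献解读.html`

**Implementation** (required for large HTML files): because the HTML contains several base64 images (5-7 original figures ≈ 10MB), it cannot be written directly with `write_file`.

**Recommended: embed the base64 directly in `<img>` tags (simpler than the canvas approach)**. Advantages: no risk of f-string brace conflicts, about half the code, and the best browser compatibility. Use `data:image/png;base64,` for the PNG figures of Nature journals and `data:image/jpeg;base64,` for the JPEG figures of Cell Press journals.

```python
# 1. Read the base64 data (files written by the terminal tool)
# ⚠️ Important: Windows Python does not recognize the MSYS path /c/Users/; use the C:/Users/ form
# ⚠️ Do not hard-code range(1,8): read every fig{N}_b64.txt file
import os, glob
fig_b64 = {}
for f_path in glob.glob("C:/Users/forgase/fig*_b64.txt"):
    i = os.path.basename(f_path).replace('fig', '').replace('_b64.txt', '')
    with open(f_path, "r") as f:
        fig_b64[f"fig{i}"] = f.read().strip()
    print(f"Loaded Fig{i}: {len(fig_b64[f'fig{i}'])} bytes")

# captions: {figure number (int): caption text}, filled from the Step 5 caption extraction
captions = {}

# 2. Build the HTML string by joining a list of parts (avoids f-string vs. JS brace conflicts)
def build_html(fig_b64, captions):
    # Highest figure number actually loaded (0 if nothing was loaded)
    max_fig = max((int(k.replace('fig', '')) for k in fig_b64), default=0)
    parts = []
    parts.append('<!DOCTYPE html>')
    parts.append('<html><head>...CSS...</head><body>')
    for i in range(1, max_fig + 1):
        # Embed base64 directly in an img tag; no JavaScript braces involved
        parts.append(f'<img src="data:image/png;base64,{fig_b64.get("fig"+str(i), "")}" alt="Fig{i}">')
        parts.append(f'<p><b>Figure {i}</b> | {captions.get(i, "")}</p>')
    parts.append('</body></html>')
    return '\n'.join(parts)

# 3. Write the file (forward-slash path, which Windows Python accepts)
html = build_html(fig_b64, captions)
with open("C:/Users/forgase/Desktop/主题_文献解读.html", "w", encoding="utf-8") as f:
    f.write(html)
```

⚠️ **Important: f-string vs. JavaScript brace conflict**: when HTML is built with a Python f-string, the braces in JS code (for example `img.onload = function() { }`) clash with f-string syntax. The recommended approach is to split the HTML into several parts (a parts list) and join plain strings instead of using one f-string:

```python
# ✅ Recommended: join a list of strings to avoid f-string brace conflicts
def build_html(fig_b64, captions):
    parts = []
    parts.append('<!DOCTYPE html>')
    parts.append('<html>')
    parts.append('<script>')
    # ⚠️ Detect the actual figure count instead of hard-coding range(1,8):
    # take the highest figure number among the fig_b64 keys
    fig_count = max((int(k.replace('fig', '')) for k in fig_b64), default=0)
    for i in range(1, fig_count + 1):
        parts.append(f'const fig{i}_b64 = "{fig_b64.get(f"fig{i}", "")}";')
    # Plain (non-f) string, so the JS braces need no escaping
    parts.append('''
    function loadImage(id, b64) {
        var canvas = document.getElementById(id);
        if (!canvas || !b64) return;
        ...
    }
    ''')
    parts.append('</script>')
    return '\n'.join(parts)

html = build_html(fig_b64, captions)
```

See `references/html-template-structure.md` for the full template skeleton.

**Workflow summary** (4 steps):
1. In terminal, download all images with `curl | base64 -w 0 > /c/Users/forgase/fig{N}_b64.txt` (save to the user directory; /tmp/ is not available in the sandbox)
2. Use `write_file` to write the Python builder script to disk (template only, no base64 data)
3. Run the script in terminal (`python C:/Users/forgase/build_html.py`), which reads the base64 files, builds the full HTML, and writes it to the desktop
4. Verify that the file exists: `ls -la /c/Users/forgase/Desktop/`

**HTML template style guide**:
- Header: dark-blue gradient background + white text + tag badges
- Score card: flex grid layout with score bars for the 5 dimensions
- Figure walkthrough: figures in order, each with its caption + replication advice
- Article summary: purple gradient card
- See `references/html-template-structure.md` for the template skeleton

### Step 7: WeChat push (⚠️ critical: the HTML file must be attached)

**Format requirement** (the final message must end with `MEDIA:<path>` so the HTML file is sent):
```
🔥 {时间} {主题} 论文推荐
🏆 标题...
📅 日期 | 📰 期刊
⭐ Claude评分:X/50
💡 推荐理由:...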
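MEDIA:C:\Users\forgase\Desktop\{文件名}.html
```

**If there is no good paper that day**:
```
📭 {时间} {主题}
今天这个方向没有找到特别适合你的论文,下次继续关注。
```

**⚠️ Skipping this step wastes all the work**: the final push message must end with the line `MEDIA:C:\Users\forgase\Desktop\{文件名}.html`; that is how WeChat sends the HTML file to the user as an attachment. If the message only says in text that the file was generated, the user does not receive it.

## Core principles
- **Push only 1 paper**: never make the user choose among several candidates
- **Better none than a poor one**: if there is no good paper, push nothing
- **Hermes + Claude double check**: push only when both confirm the paper is an excellent fit
- **Do not recommend reviews** (unless of extremely high value)
- **Do not recommend papers unrelated to the user's research directions** (even with a high score)
- **Write the push message in Chinese**, but keep English titles and journal names
- **⚠️ The HTML file must be attached to the message**: end the final message with `MEDIA:/path/to/file` so that the user receives the file itself, not just text

## Reference files
- `references/pubmed-API-patterns.md`: detailed notes on the PubMed E-utilities API
- `references/figure-download-patterns.md`: figure URL extraction patterns for Nature/Cell/Elsevier and other journals
- `references/github-access-patterns.md`: priority strategy for downloading GitHub resources from networks in mainland China

The user has provided the following instruction alongside the skill invocation: [IMPORTANT: You are running as a scheduled cron job. DELIVERY: Your final response will be automatically delivered to the user; do NOT use send_message or try to deliver the output yourself. Just produce your report/output as your final response and the system handles the rest. SILENT: If there is genuinely nothing new to report, respond with exactly "[SILENT]" (nothing else) to suppress delivery. Never combine [SILENT] with content: either report your findings normally, or say [SILENT] and nothing more.]

You are Hermes, the user's research assistant. Load the paper-recommender skill.

**Today's task**: search for papers in the bioinformatics / frontier technology direction, run the Hermes + Claude double evaluation, **push only the single best paper**, build an HTML in-depth reading, and **attach the HTML file with a MEDIA: path at the end of the message**.

**Topic keywords**: single-cell, multi-omics, bioinformatics, spatial transcriptomics, computational biology

**⚠️ Also watch these cross-cutting areas (related to the user's first-year PhD project; prefer them when scores are equal)**:
- Single-cell metabolomics / spatial metabolomics applied to tumor research
- Single-cell / spatial analysis of tumor metabolic heterogeneity
- New techniques for lactylation proteomics (Kla proteomics)
- Metabolite-protein interaction screening platforms
- Multi-omics integration methods combining metabolomics and transcriptomics
- Gene dependency maps / CRISPR screens that identify metabolic vulnerabilities

Date range: published within the last 3 months
Journal requirement: IF > 10 or a top journal in the field

Follow the complete 7-step workflow of the paper-recommender skill (Steps 1-7).

**⚠️ Required: end the final WeChat message with `MEDIA:C:\Users\forgase\Desktop\{文件名}.html`**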
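The three-month window and the journal requirement in the task above can be combined with the `[dp]` range syntax described in Step 1. The following is only a minimal sketch under stated assumptions, not the skill's actual implementation: `months_ago` and `build_dp_query` are hypothetical helper names, and the topic terms and journal names passed in are examples.

```python
from datetime import date, timedelta


def months_ago(day: date, months: int) -> date:
    """Step back whole calendar months, clamping the day to the target month's length."""
    year, month0 = divmod(day.year * 12 + (day.month - 1) - months, 12)
    month = month0 + 1
    # First day of the following month minus one day = last day of the target month.
    next_first = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    last_day = (next_first - timedelta(days=1)).day
    return date(year, month, min(day.day, last_day))


def build_dp_query(topic_terms, journals, today, months=3):
    """Hypothetical helper: (topic terms) AND (journal filter) AND a [dp] date window."""
    start = months_ago(today, months)
    topics = " OR ".join(f"{term}[Title/Abstract]" for term in topic_terms)
    journal_tags = " OR ".join(f'"{name}"[Journal]' for name in journals)
    # The [dp] range needs space, colon, space between the two dates.
    window = f'("{start:%Y/%m/%d}"[dp] : "{today:%Y/%m/%d}"[dp])'
    return f"({topics}) AND ({journal_tags}) AND {window}"
```

The returned string would be passed as the `term` parameter of an esearch call with `retmode=json`, leaving out `mindate`/`maxdate`, which the skill reports as unreliable.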
assistant · 2026-06-15 19:05
tool · 2026-06-15 19:05
{"success": true, "name": "paper-recommender", "file": "references/pubmed-API-patterns.md", "content": "# PubMed E-utilities API 实战笔记\n\n## 搜索API (esearch)\n\n### 基础结构\n```\nbase: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi\nparams:\n db=pubmed\n term=<查询表达式>\n mindate/maxdate — 日期范围\n datetype=pdat (publication date)\n retmax=N — 最大返回数\n retmode=json\n```\n\n### 已验证的关键查询模式\n\n**1. 主题搜索 + 高影响因子期刊过滤**\n```python\nquery = 'Parkinson AND (\"Nat Neurosci\"[Journal] OR \"Neuron\"[Journal] OR \"Nat Med\"[Journal])'\n```\n- 期刊名必须用 `\"全称\"[Journal]` 格式\n- 不加 `[Journal]` 标签时,`Neuron` 被翻译为 MeSH 的 neurons(神经元)\n- `Brain` 被翻译为 MeSH 的 brain(大脑解剖结构)\n\n**2. 细胞死亡方向搜索(已验证)**\n```python\nquery = '(ferroptosis[Title/Abstract] OR pyroptosis[Title/Abstract] OR ' + \\\n 'necroptosis[Title/Abstract] OR PANoptosis[Title/Abstract] OR ' + \\\n 'cuproptosis[Title/Abstract] OR mitophagy[Title/Abstract]) ' + \\\n 'AND (Nature[Journal] OR \"Cell\"[Journal] OR \"Nat Cell Biol\"[Journal] OR ' + \\\n '\"Cell Death Differ\"[Journal] OR \"Nat Commun\"[Journal])'\n```\n这个模式成功返回了 Nature 的精胺/铁死亡论文。\n\n**3. 
已测试的有效期刊标签**\n| 期刊 | 标签 |\n|------|------|\n| Nature | `\"Nature\"[Journal]` |\n| Nature Communications | `\"Nat Commun\"[Journal]` |\n| Nature Medicine | `\"Nat Med\"[Journal]` |\n| Neuron | `\"Neuron\"[Journal]` |\n| Nature Neuroscience | `\"Nat Neurosci\"[Journal]` |\n| Science Advances | `\"Sci Adv\"[Journal]` |\n| Cell | `\"Cell\"[Journal]` |\n| Cell Reports | `\"Cell Rep\"[Journal]` |\n| Cell Death & Differentiation | `\"Cell Death Differ\"[Journal]` |\n| EMBO Journal | `\"EMBO J\"[Journal]` |\n| Molecular Neurodegeneration | `\"Mol Neurodegener\"[Journal]` |\n| EBioMedicine | `\"EBioMedicine\"[Journal]` |\n| Acta Neuropathologica | `\"Acta Neuropathol\"[Journal]` |\n| Brain | `\"Brain\"[Journal]` |\n| Lancet Neurology | `\"Lancet Neurol\"[Journal]` |\n| Developmental Cell | `\"Dev Cell\"[Journal]` |\n| Nature Structural & Molecular Biology | `\"Nat Struct Mol Biol\"[Journal]` |\n| Cell Metabolism | `\"Cell Metab\"[Journal]` |\n| Molecular Cell | `\"Mol Cell\"[Journal]` |\n| Nature Cell Biology | `\"Nat Cell Biol\"[Journal]` |\n\n### 已验证无用的参数\n- `sort=date` — 返回 warning \"Unknown sort schema 'date' ignored\",无效果\n\n## 获取详情API (esummary)\n\n```python\nurl = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=<PMID列表>&retmode=json'\n```\n\n关键字段:\n- `pubtype` — 文章类型(含 \"Review\" 则为综述)\n- `source` — 期刊缩写名\n- `articleids[].idtype == 'doi'` — DOI\n- `attributes` — 包含 \"Has Abstract\" 则有摘要\n- `pubdate` — 发表日期\n- `title` — 标题\n\n⚠️ JSON的id列表键名是小写 `idlist`(如 `data['esearchresult']['idlist']`),不是驼峰的 `IdList`。\n\n## 获取摘要API (efetch)\n\n```python\n# XML模式(含结构化信息)\nurl = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=<PMIDs>&retmode=xml'\n\n# 文本模式(纯文本)\nurl = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=<PMIDs>&retmode=text&rettype=abstract'\n```\n\nXML模式返回结构化数据,适合用正则提取:\n- `<ArticleTitle>` → 标题(需要去除XML标签+解码HTML实体)\n- `<AbstractText>` → 摘要段落(可以有多段,Label属性区分BACKGROUND/METHODS/RESULTS/CONCLUSIONS)\n- 
`<ELocationID EIdType=\"doi\">` → DOI\n- `<Journal><Title>` → 期刊全名\n- `<PubDate><Year>` + `<Month>` → 发表日期\n\nXML转义字符需要手动替换:`&#x3b1;`=α, `&#x3b2;`=β, `&#x3b3;`=γ, `&#xa0;`=空格, `&#x2060;`='', `&amp;`=&, `&lt;`=<, `&gt;`=>\n\n## 获取文章元数据(XML完整模式)\n\n```python\nurl = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=<PMIDs>&retmode=xml'\n```\n此模式返回完整XML,包含:\n- 所有作者的 `<LastName>` 和 `<ForeName>`\n- 所有 `<Keyword>`\n- 完整 `<AbstractText>`(含多个Label段落)\n- `<PublicationTypeList>` 用于排除Review/Letter\n\n## 已知环境陷阱\n\n### 1. SSL连接失败(exit code 35 或 Python SSLEOFError)\n某些环境(git-bash/MSYS2)的 curl/Python SSL 版本与 NCBI 服务器不兼容。\n\n**curl 修复:**\n- 修复1:`curl -k`(跳过证书验证)\n- 修复2:更新 curl CA bundle\n\n**Python urllib 修复:**\n- Python `urllib.request` 在 MSYS2/git-bash 中同样可能 SSL 错误(`ssl.SSLEOFError: EOF occurred in violation of protocol`),并非\"100%可靠\"\n- 修复3:创建自定义 SSL context 跳过证书验证(**推荐**):\n ```python\n import ssl, urllib.request\n ctx = ssl.create_default_context()\n ctx.check_hostname = False\n ctx.verify_mode = ssl.CERT_NONE\n \n resp = urllib.request.urlopen(url, context=ctx, timeout=15)\n xml = resp.read().decode('utf-8')\n ```\n- 修复4:每次请求前 `time.sleep(0.5)` 降低 NCBI 速率限制触发概率。即使使用自定义 SSL context,高并发仍可能导致 SSL 断开。\n\n**最佳实践**(2026-06-15验证):\n```python\nimport ssl, time, urllib.request\nctx = ssl.create_default_context()\nctx.check_hostname = False\nctx.verify_mode = ssl.CERT_NONE\n\nfor pmid in pmids:\n time.sleep(0.5) # 始终加延迟\n try:\n url = f'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id={pmid}&retmode=xml'\n resp = urllib.request.urlopen(url, context=ctx, timeout=15)\n # ...处理XML\n except Exception as e:\n print(f\"PMID {pmid}: ERROR {e} — continuing\")\n # 单篇失败不应中断整个处理流程\n```\n**为什么同时需要 sleep + custom context?**\n- `time.sleep(0.5)` 防止速率限制(NCBI 每秒最多3次请求)\n- `ctx` 绕过 MSYS2/git-bash 的 CA bundle 兼容性问题\n- `try/except` 确保单篇失败不影响其余论文的获取\n\n### 2. 
URL编码问题\ncurl 命令行中包含 `()\"[]` 等特殊字符时容易解析失败(exit code 3)。\n- 修复:用 Python 的 `urllib.parse.urlencode(params)` 构建查询串\n\n### 3. `sort=date` 无效\nPubMed API 不支持 sort=date,搜索结果的顺序按默认相关性排序。\n\n### 4. `mindate`/`maxdate` + `datetype` 日期过滤可能返回0结果(2026-06发现)\n使用 `mindate` + `maxdate` + `datetype=pdat/ppub/edat` 参数组合对常见查询(如 `ferroptosis[Title/Abstract]`)返回 Count=0,不返回任何PMID。PubMed API的日期参数在此环境下不可靠。\n- **修复**:改用 `dp` (Date - Publication) 范围语法直接在查询串中限定日期:\n ```python\n query = 'ferroptosis[Title/Abstract] AND (\"2026/03/01\"[dp] : \"2026/06/12\"[dp])'\n params = {'db': 'pubmed', 'term': query, 'retmax': 20, 'retmode': 'json'}\n ```\n `dp` 语法格式:`\"YYYY/MM/DD\"[dp] : \"YYYY/MM/DD\"[dp]`(日期之间需加空格、冒号、空格)。此方式已验证可靠(2026-06-12),可与其他搜索条件自由组合(如 AND 期刊过滤)。\n\n### 5. execute_code 中 PubMed API 不稳定\n在 `execute_code` 沙箱中连续调用 PubMed API 时,部分请求返回空结果(即使同一查询在前一个调用中返回了完整的 count 和 idlist)。推测与沙箱环境的连接复用或速率限制有关。\n- 修复:将 PubMed API 调用移至 `terminal(\"python << 'PYEOF' ... PYEOF\")` 中执行。terminal 中的 Python 连接更稳定,返回结果一致。\n- 如果结果仍然为空,在两次 API 调用之间添加 `time.sleep(0.5)` 避免速率限制。\n- 即使使用 terminal Python,SSL 仍可能失败——参考 pitfall #1 的修复3(custom SSL context+try/except)。\n\n### 6. Cell/Elsevier 付费墙\nCell Press(www.cell.com、linkinghub.elsevier.com)对脚本访问严格限制,返回403/超时。Nature系列(www.nature.com)通常可正常访问。\n- 但 Cell Press 的 Figure 图片通过 `ars.els-cdn.com` CDN 分发,可直接通过 curl 下载(详见 figure-download-patterns.md)。\n- PII 中的括号和连字符需要清理:`tr -d '()-'`\n\n### 7. esummary hascreator 字段不可靠(2026-06发现)\n`esummary` 返回的 `hascreator` 字段对所有高IF期刊论文均显示 `'N'`,即使论文明确有多位作者。\n```json\n// 示例:Nat Med 2026 phase 1 clinical trial,真实作者8人,但:\n\"hascreator\": \"N\" // ❌ 错误\n```\n**对策**:改用 efetch XML 确认作者——统计 `<Author>` 标签数。不要因 `hascreator='N'` 排除论文。\n\n### 8. 
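esummary HasAbstract field is equally unreliable (found 2026-06-13)\nFor high-IF journal papers (Nat Neurosci, Cell Rep, Brain, Nat Commun, etc.), the esummary `attributes` array never contains `HasAbstract`, even when the paper clearly has a full abstract. Note that the key-fields list earlier in this file spells the attribute `Has Abstract` (with a space), so a check for `HasAbstract` may miss for that reason alone.\n- ❌ Do not use `HasAbstract in attributes` as a screening condition: it would wrongly exclude almost every high-IF paper\n- ✅ Confirm by counting `<AbstractText>` tags in the efetch XML instead: `len(re.findall(r'<AbstractText', xml_text)) > 0`\n\n### 9. efetch XML batch-fetch contamination (new, found 2026-06-14)\nWhen a single efetch call is given **multiple PMIDs**, the XML response may return **the same article data for every PMID** (rather than each PMID's own metadata). On 2026-06-14, in a batch call with 35 PMIDs, all 16 high-IF papers came back with the title and abstract of one paper (the GPNMB Neuron paper).\n\n**Symptom**: efetch returns identical `<ArticleTitle>` and `<AbstractText>` for different PMIDs that are known to be different papers.\n\n**Fix**: fetch each paper individually, with a delay between requests to avoid rate limiting. **Also use a custom SSL context so that an SSL EOF does not interrupt the batch**:\n```python\nimport ssl, time, urllib.request\n\nctx = ssl.create_default_context()\nctx.check_hostname = False\nctx.verify_mode = ssl.CERT_NONE\n\nfocus_pmids = ['42127911', '42140967', '42191733']  # fetch one at a time\nfor pmid in focus_pmids:\n    time.sleep(0.5)  # key: avoid rate limiting\n    try:\n        url = f'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id={pmid}&retmode=xml'\n        req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})\n        resp = urllib.request.urlopen(req, context=ctx, timeout=15)\n        xml_text = resp.read().decode('utf-8')\n        # each paper is processed independently, so results are reliable\n    except Exception as e:\n        print(f\"PMID {pmid} failed: {e}\")\n        continue  # one failed paper must not block the whole run\n```\n**Likely cause**: the PubMed API may reuse connections or caches during batch fetches, returning misaligned data under high concurrency. The SSL EOF errors stem partly from MSYS2 OpenSSL compatibility issues, not from the NCBI side.\n\n## Search strategy (two-stage workflow)\n\n### Stage A: topic search\n```python\n# Option A: mindate/maxdate (may be unreliable)\nparams = {\n    'db': 'pubmed', 'term': query,\n    'mindate': '<3 months ago>', 'maxdate': '<today>',\n    'datetype': 'pdat', 'retmax': 10, 'retmode': 'json'\n}\n# Option B: [dp] syntax (recommended, verified reliable)\nquery = 'Parkinson disease AND (\"2026/03/01\"[dp] : \"2026/06/12\"[dp])'\nparams = {\n    'db': 'pubmed', 'term': query,\n    'retmax': 10, 'retmode': 'json'\n}\n```\nPurpose: gauge the total number of papers in the window and obtain a base PMID list.\n\n### Stage B: high-impact journal filter\n```python\njournal_tags = '\"Nature\"[Journal] OR \"Cell\"[Journal] OR \"Nat Neurosci\"[Journal] OR ...'\nquery = f'({topic_keywords}) AND ({journal_tags}) 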
AND (\"<3月前>\"[dp] : \"<今天>\"[dp])'\n```\n目的:直接命中IF>10的高质量论文。\n\n### 初筛要点\n1. `pubtype` 不含 \"Review\" → 原创研究\n2. `attributes` 含 \"Has Abstract\" → 有摘要\n3. 检查 DOI 是否存在\n4. 排除 \"No authors listed\" → 通常是 News & Views\n5. 从XML提取时注意HTML实体和Unicode转义符的处理\n\n## 评分标准(Claude评估用,5维度各1-10分)\n\n| 维度 | 评分要点 | 满分 |\n|------|---------|------|\n| 逻辑完整性 | gain+loss of function, KO/KI, 正反验证 | 10 |\n| 方法学水平 | 单细胞测序, 冷冻电镜, 活细胞成像, TurboID等 | 10 |\n| 创新性 | 新概念/新机制/新方法 | 10 |\n| 可借鉴性 | 实验设计可迁移到神经科学 **←最重要维度** | 10 |\n| 临床转化潜力 | 体内验证, 治疗靶点, 生物标志物 | 10 |\n\n总分 >= 30 且各维度 >= 5 → 候选。再经Hermes确认相关性后,只选最优1篇。\n", "file_type": ".md"}
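The candidate rule that closes the reference file above (total score of at least 30, every dimension at least 5, then a single best paper) can be sketched as follows. This is an illustrative sketch only: `passes_gate` and `pick_best` are hypothetical names, the score dicts are made-up inputs, and the Hermes relevance check and the figure-accessibility rule from Step 4 are deliberately left out.

```python
def passes_gate(scores, min_total=30, min_each=5):
    """Return True when the five dimension scores clear both thresholds."""
    values = list(scores.values())
    if len(values) != 5:
        raise ValueError("expected exactly five dimension scores")
    return sum(values) >= min_total and min(values) >= min_each


def pick_best(candidates):
    """candidates: {pmid: {dimension: score}}. Return the passing PMID with the highest total, or None."""
    passing = {pmid: s for pmid, s in candidates.items() if passes_gate(s)}
    if not passing:
        return None  # nothing qualifies: push nothing
    return max(passing, key=lambda pmid: sum(passing[pmid].values()))
```

Returning `None` when nothing passes mirrors the skill's principle of pushing no paper rather than a weak one.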
tool · 2026-06-15 19:05
{"success": true, "name": "paper-recommender", "file": "references/figure-download-patterns.md", "content": "# 学术期刊Figure下载模式速查\n\n## Nature 系列\n**已验证模式(2026年6月)**\n\n### URL结构\n```\n文章页: https://www.nature.com/articles/{article_id}\nFigure URL: https://media.springernature.com/{size}/springer-static/image/\n art%3A10.1038%2F{article_id}/MediaObjects/\n {journal_id}_{year}_{article_num}_Fig{N}_HTML.png\n```\n\n### 关键参数\n- `{article_id}`: DOI中`/`后面的部分,如 `s41586-026-10597-2` 或 `s41593-026-02317-w`\n- `{journal_id}`: 期刊对应数字ID,如 Nature=41586, Nat Neurosci=41593\n- `{year}`: 4位年份\n- `{article_num}`: 文章编号(DOI中期刊ID后的数字部分,去连字符)\n- `{N}`: Figure编号(1-7)\n\n### 实例1 (Nature — 已验证 2026-06)\n```\narticle_id = s41586-026-10597-2\njournal_id = 41586\nyear = 2026\narticle_num = 10597\n\nFig1 全尺寸:\nhttps://media.springernature.com/full/springer-static/image/art%3A10.1038%2Fs41586-026-10597-2/MediaObjects/41586_2026_10597_Fig1_HTML.png\n\nFig2 中等尺寸:\nhttps://media.springernature.com/lw1200/springer-static/image/art%3A10.1038%2Fs41586-026-10597-2/MediaObjects/41586_2026_10597_Fig2_HTML.png\n```\n\n### 实例2 (Nature Neuroscience — 已验证 2026-06)\n```\narticle_id = s41593-026-02317-w\njournal_id = 41593\nyear = 2026\narticle_num = 2317\n\nFig1 全尺寸:\nhttps://media.springernature.com/full/springer-static/image/art%3A10.1038%2Fs41593-026-02317-w/MediaObjects/41593_2026_2317_Fig1_HTML.png\n\nFig4 中等尺寸:\nhttps://media.springernature.com/lw1200/springer-static/image/art%3A10.1038%2Fs41593-026-02317-w/MediaObjects/41593_2026_2317_Fig4_HTML.png\n```\n\n### 实例4 (Nature Neuroscience — 新验证 2026-06-12, PMID: 42151483)\n```\narticle_id = s41593-026-02267-3\njournal_id = 41593\nyear = 2026\narticle_num = 2267 # 从DOI \"02267\" 去前导零 → 2267\n\nFig1-Fig7 (lw1200): 全部 HTTP 200 ✅\nFig1-Fig3 (full): 全部 HTTP 200 ✅\nPDF: https://www.nature.com/articles/s41593-026-02267-3.pdf → HTTP 200, 26.7MB ✅\n```\n\n### 实例5 (Nature Communications — 新验证 2026-06-13, PMID: 42140967)\n```\narticle_id = 
s41467-026-73136-7\njournal_id = 41467\nyear = 2026\narticle_num = 73136 # 从DOI \"73136\" 无前导零 → 73136\n\nFig1 (full): HTTP 200 ✅ (1.5MB)\nFig2-Fig7 (lw1200): 全部 HTTP 200 ✅ (1.4-2.5MB each)\nPDF: https://www.nature.com/articles/s41467-026-73136-7.pdf → HTTP 200 ⚠️ 但返回的是HTML! file命令显示为\"HTML document\",非真实PDF\n\n关键发现:该文章为\"unedited version\"(预印本版),HTML页面仅含Abstract,\n不含<figcaption>标签。Figure图片通过CDN可正常下载,但caption无法从HTML提取。\n```\n\n### 实例3 (Nature Medicine — 已验证 2026-06-11)\n```\narticle_id = s41591-026-04393-8\njournal_id = 41591\nyear = 2026\narticle_num = 4393\n\nFig1 全尺寸:\nhttps://media.springernature.com/full/springer-static/image/art%3A10.1038%2Fs41591-026-04393-8/MediaObjects/41591_2026_4393_Fig1_HTML.png\n\nFig2-Fig5 中等尺寸:\nhttps://media.springernature.com/lw1200/springer-static/image/art%3A10.1038%2Fs41591-026-04393-8/MediaObjects/41591_2026_4393_Fig2_HTML.png\n...\nhttps://media.springernature.com/lw1200/...41591_2026_4393_Fig5_HTML.png\n\narticle_num提取: DOI \"s41591-026-04393-8\" → 取s41591后的两段 \"026\"+\"04393\" → 截取最终段的数字部分去前导零 → 4393\n```\n\n### 快速提取命令(curl + grep)\n```bash\n# 1. 从HTML提取所有Figure URL(通用模式,匹配所有s415XX系列期刊)\ncurl -s -L -H \"User-Agent: Mozilla/5.0\" \\\n \"https://www.nature.com/articles/{article_id}\" \\\n | grep -oP 'https://media\\.springernature\\.com/[^\"'\"'\"'<>]+s415\\d\\d[^\"'\"'\"'<>]*\\.(?:png|jpg|webp)' \\\n | sort -u\n\n# 2. 
下载某张Figure为base64(用于HTML嵌入)\nFIG_URL=\"https://media.springernature.com/lw1200/springer-static/image/art%3A10.1038%2F{article_id}/MediaObjects/{journal_id}_{year}_{article_num}_Fig{N}_HTML.png\"\ncurl -s \"$FIG_URL\" -H \"User-Agent: Mozilla/5.0\" | base64 -w 0 > /c/Users/<user>/fig{N}_b64.txt\n```\n\n### 分辨率控制\n| 前缀 | 尺寸 | 说明 |\n|------|------|------|\n| `/full/` | 全尺寸 | 最高分辨率,推荐用于Fig1。**注意:Nature系列通常在`/full/`下所有Figure都可用,非仅Fig1**(已验证:Nature s41586-026-10480-0, Fig1-Fig6全部200) |\n| `/lw1200/` | 宽1200px | 中等尺寸,适用于其余Figures。备选方案——若`/full/`偶有404,降级到此 |\n| `/m685/` | 宽685px | 小尺寸 |\n| `/w215h120/` | 215x120px | 缩略图 |\n\n### 实例6 (Nature — 新验证 2026-06-15, PMID: 42236947 — \"Spermine is an endogenous iron chelator\" — 5 figures!)\n```bash\narticle_id = s41586-026-10597-2\njournal_id = 41586\nyear = 2026\narticle_num = 10597 # 从DOI \\\"10597\\\" 无前导零 → 10597\n\nFig1-Fig5 (lw1200): 全部 HTTP 200 ✅ (单张0.6-1.3MB)\nFig1 (full): HTTP 200 ✅ (高清大图)\n图数检测:枚举到Fig5后Fig6返回404 → 正确检测到5张图\nCaption提取:5条全部命中✅\n\n关键发现:Nature期刊的article_num提取规则再验证\n- DOI: s41586-026-10597-2\n- 期刊ID后面的倒数第二段: \"10597\" (026-10597中的10597)\n- 无前导零 → 直接使用10597\n- URL: 41586_2026_10597_Fig{N}_HTML.png ✅\n```\n\n### 已验证的期刊ID\n| 期刊 | ID | DOI前缀 | 验证状态 |\n|------|----|---------|---------|\n| Nature | 41586 | s41586 | ✅ 2026-06 (Fig1-Fig6 all HTTP 200 full/) |\n| **Nature Medicine** | **41591** | **s41591** | **✅ 2026-06** |\n| **Nature Neuroscience** | **41593** | **s41593** | **✅ 2026-06 (两篇独立验证)** |\n| Nature Communications | 41467 | s41467 | 🟢 已验证 |\n| Cell Death & Differentiation | 41418 | s41418 | 🟢 已验证 |\n| Scientific Reports | 41598 | s41598 | 🟢 已验证 |\n| Cell Discovery | 41420 | s41420 | 🟢 已验证 |\n\n## Figure Caption Extraction (Nature系列 — HTML解析)\n\n### 为什么需要提取caption\nStep 6的HTML深度解读需要为每张Figure配caption文字。直接从Nature文章页HTML中提取比手动编写更准确、更快速。\n\n### 方法:Python regex解析 `<figcaption>`\n\nNature系列文章将figure caption放在 `<figcaption>` 标签内,可通过Python regex可靠提取:\n\n```bash\ncurl -s -L -H \"User-Agent: Mozilla/5.0\" 
\"https://www.nature.com/articles/{article_id}\" \\\n | python -c \"\nimport sys, re\nhtml = sys.stdin.read()\nfigs = re.findall(r'<fig[^>]*>.*?<figcaption[^>]*>(.*?)</figcaption>', html, re.DOTALL)\nfor i, fig in enumerate(figs):\n clean = re.sub(r'<[^>]+>', '', fig).strip()\n print(f'Figure {i+1}: {clean}')\n print()\n\"\n```\n\n### 已验证(2026-06-12,Nat Neurosci s41593-026-02267-3)\n所有7张Figure的caption均正确提取,如:\n```\nFigure 1: Fig. 1: CODEX workflow for neuroscience applications.\nFigure 2: Fig. 2: Multicellular characterization of the Aβ plaque microenvironment at the proteomic level.\nFigure 3: Fig. 3: CODEX-CNS reveals specific cellular interactions in human AD brain.\n...\n```\n\n### 在Builder脚本中使用captions\n将提取到的captions以Python变量形式嵌入builder脚本:\n```python\ncaptions = {\n 1: \"CODEX workflow for neuroscience applications.\",\n 2: \"Multicellular characterization of the Aβ plaque microenvironment at the proteomic level.\",\n # ...\n}\n```\n\n### 已知注意事项\n- Nature系列的 `<figcaption>` 目前为纯文本加少量 `<sup>` 标签,清理 `<sup>` 后即可得到干净文本\n- Cell Press系列因Cloudflare防护,页面HTML不可直接解析→caption需从PDF或手动输入\n- 部分文章可能有Extended Data Figures(Fig S1, S2等),主文章通常只含Fig1-Fig7,**但Nat Commun可能有Fig8、Fig9**(已验证:PMID 42098118,9张Figures)\n- **关键**:不要硬编码 `for fig in 1 2 3 4 5 6 7`,应先通过HTML提取或CDN枚举(到Fig9或Fig10)动态检测实际图数。若枚举只到Fig7,会漏掉Fig8/Fig9(图片数据不完整,HTML解读会缺失关键图)\n\n### Pre-print / Unedited Version Caption 降级策略\n**场景**:Nature Communications等期刊有时发布\"unedited version\"(未编辑预印本),HTML页面可能不含 `<figcaption>` 标签(regex提取返回空)。**但并非所有unedited version都无caption**——2026-06-14验证发现Nat Commun unedited version (s41467-026-72783-0) 可正常提取9条figcaptions。\n\n**检测方法**:查看文章HTML中是否有 \"unedited version\" 或 \"preprint\" 字样。然后尝试提取figcaptions:\n```bash\n# 先尝试提取——可能成功也可能失败\nfigs=$(curl -s -L -H \"User-Agent: Mozilla/5.0\" \"https://www.nature.com/articles/${ARTICLE_ID}\" \\\n | python -c \"\nimport sys, re\nhtml = sys.stdin.read()\nfigs = re.findall(r'<fig[^>]*>.*?<figcaption[^>]*>(.*?)</figcaption>', html, 
re.DOTALL)\nprint(len(figs))\n\")\necho \"Found ${figs} figcaptions\"\nif [ \"$figs\" -gt 0 ]; then\n echo \"Captions available — use them directly\"\nelse\n echo \"No captions — enter pre-print fallback\"\nfi\n```\n\n**降级流程**(仅在regex提取返回空时触发):\n1. **手动构建**:基于abstract和论文结构合理推断各Figure内容(推荐,已验证可行)。将描述写为Python字典嵌入builder脚本。\n2. **PDF提取**:尝试 pdftotext 从PDF中提取captions。但⚠️预印本PDF有时也返回HTML(部分nat commun),此时跳过。\n3. **浏览器工具**:用browser工具导航到文章页 → Accept cookies → 滚动查看是否有JS渲染的figure captions。部分预印本也未渲染。\n\n**手动构建模板**(适用于机制类论文):\n```python\ncaptions = {\n 1: \"Fig. 1: [主要发现,通常为表型]。[技术手段+关键结果1-2句]\",\n 2: \"Fig. 2: [机制研究,通常为通路验证]。[上游调控+下游效应]\",\n 3: \"Fig. 3: [分子机制细节]。[蛋白互作/修饰/定位数据]\",\n 4: \"Fig. 4: [细胞/体内功能验证]。[表型恢复/功能实验]\",\n 5: \"Fig. 5: [深入机制或转化研究]。[治疗/干预结果]\",\n 6: \"Fig. 6: [补充机制或模型验证]。[定量/统计支持]\",\n 7: \"Fig. 7: [整体工作模型示意图]。\"\n}\n```\n\n## Cell Press / Elsevier 系列(Cell Metabolism, Cell, Neuron, Cell Reports 等)\n**已验证模式(2026年6月)**\n\n### 关键发现:主页面有Cloudflare防护,但CDN图片可直接访问\nCell Press 主页面(www.cell.com)使用 Cloudflare 防护,curl/browser 直接访问被拦截。\n但图片通过 `ars.els-cdn.com` CDN 分发,**可以直接通过 curl 下载**。\n\n### URL结构(已成功用于 Cell Metabolism)\n```\nFigure N (大尺寸):\nhttps://ars.els-cdn.com/content/image/1-s2.0-{PII}-gr{N}_lrg.jpg\n\nFigure N (小尺寸):\nhttps://ars.els-cdn.com/content/image/1-s2.0-{PII}-gr{N}.jpg\n\nGraphical Abstract (大尺寸):\nhttps://ars.els-cdn.com/content/image/1-s2.0-{PII}-fx1_lrg.jpg\n```\n\n### 关键参数\n- `{PII}`: 文章的PII(Publisher Item Identifier),可从PubMed esummary结果中获取(idtype='pii'),或从DOI对应的ScienceDirect URL中提取\n- `{N}`: Figure编号(1-7),Cell Press系列使用 `gr1, gr2, ..., gr7` 格式(非 `Fig1`)\n- `fx1`: Graphical Abstract的固定标识\n\n### 实例(Neuron — 已验证 2026-06-12)\n```bash\nPMID: 42127911\nDOI: 10.1016/j.neuron.2026.04.033\nPII: S0896-6273(26)00328-4 # 含有括号和连字符,需全部去除\nURL-PII: S0896627326003284 # 去除括号和连字符后的实际URL用PII\n\n# Figure URL (CDN直接访问,无防护):\n# 核心:去除PII中的括号,替换 gr{N}_lrg 和 fx1_lrg\nFig1: https://ars.els-cdn.com/content/image/1-s2.0-S0896627326003284-gr1_lrg.jpg\nFig2: 
https://ars.els-cdn.com/content/image/1-s2.0-S0896627326003284-gr2_lrg.jpg\nFig3: https://ars.els-cdn.com/content/image/1-s2.0-S0896627326003284-gr3_lrg.jpg\nFig4: https://ars.els-cdn.com/content/image/1-s2.0-S0896627326003284-gr4_lrg.jpg\nFig5: https://ars.els-cdn.com/content/image/1-s2.0-S0896627326003284-gr5_lrg.jpg\nFig6: https://ars.els-cdn.com/content/image/1-s2.0-S0896627326003284-gr6_lrg.jpg\nFig7: https://ars.els-cdn.com/content/image/1-s2.0-S0896627326003284-gr7_lrg.jpg\nGraphical Abstract: https://ars.els-cdn.com/content/image/1-s2.0-S0896627326003284-fx1_lrg.jpg\n\n# PII括号去除规则\n# ⚠️ 必须同时去除括号和连字符:tr -d '()-'\n# \"S0896-6273(26)00328-4\" → tr -d '()-' → \"S0896627326003284\"\nPII_URL=$(echo \"$PII\" | tr -d '()-')\necho \"$PII_URL\" # → S0896627326003284\n\n```\n\n### 实例3(Neuron — 新验证 2026-06-13, PMID: 42263678 — 8张Figures!)\n```bash\nPMID: 42263678\nDOI: 10.1016/j.neuron.2026.05.015\nPII: S0896-6273(26)00386-7\nURL-PII: S0896627326003867 # 经 tr -d '()-' 处理\n\n# gr1-gr8 _lrg.jpg 全部 HTTP 200 ✅(共8张,不是通常的7张!)\n# Fig1: 3.5MB, Fig4: 3.2MB, Fig7: 3.1MB, Fig8: 1.2MB (高清原图)\n# fx1 (Graphical Abstract): HTTP 404 ❌ (本文章无Graphical Abstract)\n\n# ⚠️ 关键教训:枚举范围必须 ≥ 8!不要假设所有论文都是7张Figs\n\n# Cell Press页面有Cloudflare防护,无法提取figcaptions\n# -> 需要手动构建captions(基于abstract和论文结构推断)\n```\n```bash\nPMID: 42263678\nDOI: 10.1016/j.neuron.2026.05.015\nPII: S0896-6273(26)00386-7\nURL-PII: S0896627326003867 # 经 tr -d '()-' 处理\n\n# gr1-gr7 全部 HTTP 200 ✅ (含 _lrg 和无 _lrg 两种尺寸均可用)\ncurl -s \"https://ars.els-cdn.com/content/image/1-s2.0-S0896627326003867-gr1_lrg.jpg\" | base64 -w 0 > fig1_b64.txt\n# fx1: HTTP 404(非所有Cell Press文章都有Graphical Abstract)\n```\n\n### 实例4(Cell Metabolism — 首次验证 2026-06-11,PMID: 42049021,7 figures)\n```bash\nPMID: 42049021\nDOI: 10.1016/j.cmet.2026.04.001\nPII: S1550-4131(26)00139-7 # 注意:PII可能不同于全文URL中的pii\n# ⚠️ 同样需要 tr -d '()-' 去除括号和连字符\n\n# ScienceDirect URL (有Cloudflare防护,curl会失败):\n# https://www.sciencedirect.com/science/article/pii/S1550413126001397\n\n# Figure 
URL (CDN直接访问,无防护):\nFig1-Fig7: gr1_lrg.jpg ~ gr7_lrg.jpg 全部 HTTP 200 ✅\nGraphical Abstract (fx1): HTTP 200 ✅\n```\n\n### 实例5(Cell Metabolism — 再验证 2026-06-13,PMID: 42235511,仅6 figures)\n```bash\nPMID: 42235511\nDOI: 10.1016/j.cmet.2026.05.005\nPII: S1550-4131(26)00190-7\nURL-PII: S1550413126001907 # 经 tr -d '()-' 处理\n\n# gr1-gr6 _lrg.jpg 全部 HTTP 200 ✅\n# gr7: HTTP 404 ❌ (本文章只有6张Figures,不是7张)\n# fx1 (Graphical Abstract): HTTP 404 ❌ (无Graphical Abstract)\n\n# 关键发现:同一期刊的不同论文,Figure数量可能不同(6张vs7张)。\n# 暴力枚举法从gr1试到gr7即可自动处理,404的跳过。\n```\n\n### 快速提取命令(修复版)\n```bash\n# 1. 从PubMed esummary中获取PII\n# PII字段: idtype='pii', value='S1550-4131(26)00139-7'\n# 或从DOI的ScienceDirect重定向URL中提取:/pii/{PII}\n\n# 2. ⚠️ 关键:清理PII中的括号和连字符,否则CDN URL会404\nPII_CLEAN=$(echo \"$PII\" | tr -d '()-')\n\n# 3. 暴力枚举法:从gr1试到gr9(⚠️ 不要只用7!Neuron已验证8张,Nat Commun已验证9张)\n# ⚠️ 带宽提示:gr{N}.jpg(无_lrg)足够HTML嵌入(单张0.1-0.4MB);\n# gr{N}_lrg.jpg为高清版(1.5-3.5MB),仅在需要放大查看时使用\nfor fig in $(seq 1 9); do\n url=\"https://ars.els-cdn.com/content/image/1-s2.0-${PII_CLEAN}-gr${fig}.jpg\"\n code=$(curl -s -o /dev/null -w \"%{http_code}\" -L \"$url\" -H \"User-Agent: Mozilla/5.0\")\n echo \"gr${fig}: HTTP ${code}\"\n if [ \"$code\" != \"404\" ]; then\n curl -s \"$url\" -H \"User-Agent: Mozilla/5.0\" | base64 -w 0 > fig${fig}_b64.txt\n fi\ndone\n\n# 4. 
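Download the Graphical Abstract (it may not exist → 404, which is normal)\n# -f makes curl emit nothing on HTTP errors, so a 404 leaves an empty file rather than a base64-encoded error page\ncurl -sf \"https://ars.els-cdn.com/content/image/1-s2.0-${PII_CLEAN}-fx1.jpg\" \\\n -H \"User-Agent: Mozilla/5.0\" | base64 -w 0 > fig_fx1_b64.txt\n```\n\n### Verified Cell Press journals\n| Journal | Verification status | Notes |\n|------|---------|------|\n| Cell Metabolism | ✅ Verified (2026-06) | gr1-gr6/7 all return HTTP 200; ⚠️ the figure count varies by paper (both 6 and 7 have been seen) |\n| **Neuron** | **✅ Verified (2026-06-12)** | **gr1-gr7 all return HTTP 200 (gr8 also present for PMID 42263678); PMID 42127911 + PMID 42263678** |\n| Cell Reports | ✅ Verified (2026-06-13) | gr1-gr7 _lrg.jpg all return HTTP 200; PMID 42275213, PII S2211-1247(26)00628-5 → S2211124726006285; no fx1 (404) |\n| Cell | 🟡 Likely to work | Also Cell Press; same URL pattern |\n| Developmental Cell | 🟡 Likely to work | Also Cell Press; same URL pattern |\n| Molecular Cell | 🟡 Likely to work | Also Cell Press; same URL pattern |\n\n### Example 7 (Nature Communications, newly verified 2026-06-14, PMID: 42098118, 9 figures!)\n```\narticle_id = s41467-026-72783-0\njournal_id = 41467\nyear = 2026\narticle_num = 72783 # from DOI \"72783\", no leading zeros → 72783\n\nFig1-Fig9 (lw1200): all HTTP 200 ✅ (0.4-1.3MB each)\nFig8: HTTP 200 ✅ (650KB)\nFig9: HTTP 200 ✅ (402KB)\nPDF: https://www.nature.com/articles/s41467-026-72783-0.pdf → HTTP 200 ✅\n file output: \"PDF document, version 1.4, 20 page(s)\" ✅ (a real 2.5MB PDF)\n\nKey finding 1: ⚠️ the main article has 9 Figures (not the usual 7!), so the enumeration range should go up to 9 or be detected dynamically\nKey finding 2: although this article is an \"unedited version\", Fig1-Fig9 can all be downloaded from the CDN, and the <figcaption> tags in the HTML can be extracted (all 9 matched)\nKey finding 3: ⚠️ a brute-force loop hard-coded as `for fig in 1 2 3 4 5 6 7` would miss Fig8 and Fig9\n```\n### Example 8 (Nature Communications, fully verified 2026-06-15, PMID: 42098112, 9 figures, TPI1 cristae paper)\n```\narticle_id = s41467-026-72779-w\njournal_id = 41467\nyear = 2026\narticle_num = 72779 # from DOI \"72779\", no leading zeros → 72779\n\nFig1-Fig9 (lw1200): all HTTP 200 ✅ (file sizes fully verified)\n Fig1: 1,798,972 bytes ✅\n Fig2: 2,088,696 bytes ✅\n Fig3: 1,739,960 bytes ✅\n Fig4: 1,687,816 bytes ✅\n Fig5: 811,984 bytes ✅\n Fig6: 1,178,228 bytes ✅\n Fig7: 1,221,360 bytes ✅\n Fig8: 1,570,936 bytes ✅\n Fig9: 864,176 bytes ✅\nFig10: HTTP 404 ❌ (correctly detects the end at 9 Figures)\nPDF: HTML document (anti-scraping) ❌\nCaption: 0 figcaptions found ✅ (confirms there are no <figcaption> tags; captions must be built manually)\n\nKey findings:\n- This article is a Nat Commun unedited version; the PDF URL returns an anti-scraping HTML page (not a real PDF)\n- The HTML page has no <figcaption> tags at all, so the captions for all 9 Figures must be built manually\n- Fig1 through Fig9 are all available and Fig10 returns 404 
→ 9 figures correctly detected\n- Article metadata (citation_title/citation_journal/citation_date) can all be extracted from the citation_* tags\n- This check also compared this article with PMID 42098118 (DOI s41467-026-72783-0, 9 figcaptions available):\n → both are Nat Commun unedited versions, yet caption availability differs by paper. Always try extraction first and fall back only if it fails.\n```\n\n### Known limitations\n- PDF download: ScienceDirect is heavily protected, so PDFs cannot be downloaded directly with curl (a subscription or institutional access is required)\n- Main page HTML: blocked by Cloudflare, so the figure list cannot be parsed from the page\n- The PII format varies by journal and must be obtained from the PubMed API or the DOI redirect\n- **The figure count varies by paper**: different papers in the same journal can have different numbers of Figures (Cell Metabolism 6 or 7; Neuron 7 or 8; Nat Commun 7-9). Brute-force enumeration from gr1 to gr9 (9 requests) adapts automatically: keep whatever returns 200 and skip whatever returns 404. Do not assume every paper has 7 Figures.\n\n## Taylor & Francis series (Gut Microbes, Autophagy, etc.)\n**Verified pattern (June 15, 2026)**\n\n### Blanket Cloudflare blocking: nothing can be downloaded\n\nTaylor & Francis journals sit behind full Cloudflare protection, and **their figure images are served from the same domain as the main pages** (`www.tandfonline.com`), so they are blocked as well:\n\n```bash\n# Every endpoint returns a 403 Cloudflare challenge\ncurl -s -L -H \"User-Agent: Mozilla/5.0\" \"https://www.tandfonline.com/doi/full/10.1080/19490976.2026.2685912\"\n# → Cloudflare security check page...\n# → HTTP 403\n\ncurl -s -L -H \"User-Agent: Mozilla/5.0\" \"https://www.tandfonline.com/doi/pdf/10.1080/19490976.2026.2685912\"\n# → HTTP 403\n\ncurl -s -L -H \"User-Agent: Mozilla/5.0\" \"https://www.tandfonline.com/doi/abs/10.1080/19490976.2026.2685912\"\n# → HTTP 403 (even the abstract page is protected)\n\n# Figure images on the same domain are blocked too\ncurl -s -L \"https://www.tandfonline.com/na101/home/literatum/publisher/tandf/journals/kgmi/fig/2685912/FIG1.jpg\"\n# → HTTP 403\n```\n\n### Key differences from Cell Press\n| Feature | Cell Press/Elsevier | Taylor & Francis |\n|------|-------------------|-----------------|\n| Main page | Blocked by Cloudflare | Blocked by Cloudflare |\n| **Figure CDN** | **ars.els-cdn.com → accessible ✅** | **tandfonline.com → also blocked ❌** |\n| PDF | Protected by ScienceDirect | Blocked by Cloudflare |\n| PMC alternative | A PMC version may exist | Usually no PMC version (needs confirming) |\n\n### Mitigation strategy\n1. First check whether a PMC version exists (look for the `pmc` field in esummary)\n2. 
If there is no PMC version → check for a preprint version (ResearchGate, bioRxiv, etc.) from which the Figures can be obtained\n3. If neither is available → follow the fallback strategy in Step 4 of the skill (switch to the next-best paper whose Figures can be downloaded, or produce a text-only write-up)\n\n## PMC / PubMed Central open access\n```\nhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC{pmcid}/bin/{filename}\n```\n- Only works for open-access articles\n\n## General best practices\n\n1. **Fetch the article HTML page first** and extract the Figure image links from it\n2. **Prefer the full-size version** (replace the size qualifier in the path)\n3. **Embed as base64** in the HTML so the browser does not block file:// URLs\n4. **File naming**: `Figure{N}_{short description}.png`\n5. **Watch for paywalls**: a 3xx/4xx response → give up; do not retry repeatedly\n6. **Nature/Nature Neuroscience/Nature Medicine PDF download**: `https://www.nature.com/articles/{article_id}.pdf`, verified downloadable (2026-06); Nature Medicine example: `s41591-026-04393-8` (7.8MB)\n7. **Nature Communications PDF ⚠️**: although it is also hosted on nature.com, the PDF URL returns HTML (anti-scraping) for some articles and a real PDF for others. Both outcomes have been verified:\n - PMID 42140967 (s41467-026-73136-7) → PDF URL returns HTML ⚠️\n - PMID 42098118 (s41467-026-72783-0) → PDF URL returns a real PDF (20 pages, 2.5MB) ✅\n - Always check after downloading: `file article.pdf` → if the output says \"HTML document\" rather than \"PDF document\", it is not a real PDF. In that case skip the PDF and download only the Figure images.\n\n## ⚠️ Windows path notes\n\nWhen running `curl` + `base64` in git-bash to save figure files, use paths of the form `/c/Users/<user>/` (e.g. `/c/Users/forgase/fig1_b64.txt`). When reading those files from a **Python script**, however, use the native Windows path format (`C:/Users/<user>/fig1_b64.txt`), because Python on Windows does not understand the MSYS path `/c/Users/...`.\n\n**Wrong:**\n```python\nopen(\"/c/Users/forgase/fig1_b64.txt\", \"r\") # ❌ FileNotFoundError\n```\n\n**Right:**\n```python\nopen(\"C:/Users/forgase/fig1_b64.txt\", \"r\") # ✅ forward slashes are accepted by Python on Windows\n# or\nopen(r\"C:\Users\forgase\fig1_b64.txt\", \"r\") # ✅ raw string\n```\n", "file_type": ".md"}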
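The two URL patterns documented in this reference file (Springer Nature `MediaObjects` figures and Cell Press `ars.els-cdn.com` figures) can be sketched as small Python helpers. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the rule that the middle DOI segment `026` maps to the year 2026 is inferred from the verified examples rather than documented by the publisher.

```python
import re


def nature_figure_url(article_id, fig, size="lw1200"):
    """Build a figure URL from an article_id such as 's41586-026-10597-2'."""
    m = re.fullmatch(r"s(\d{5})-(\d{3})-(\d+)-[0-9a-z]", article_id)
    if m is None:
        raise ValueError(f"unexpected article_id: {article_id}")
    journal_id, yy, num = m.groups()
    year = 2000 + int(yy)    # ASSUMPTION: '026' encodes 2026
    article_num = int(num)   # drops leading zeros: '02317' -> 2317
    return (
        f"https://media.springernature.com/{size}/springer-static/image/"
        f"art%3A10.1038%2F{article_id}/MediaObjects/"
        f"{journal_id}_{year}_{article_num}_Fig{fig}_HTML.png"
    )


def clean_pii(pii):
    """Strip parentheses and hyphens, the Python equivalent of tr -d '()-'."""
    return re.sub(r"[()\-]", "", pii)


def cellpress_figure_url(pii, fig, large=False):
    """Build a Cell Press CDN figure URL; large=True selects the _lrg variant."""
    suffix = "_lrg" if large else ""
    return (
        "https://ars.els-cdn.com/content/image/1-s2.0-"
        f"{clean_pii(pii)}-gr{fig}{suffix}.jpg"
    )
```

These helpers only construct URLs; whether a given figure exists still has to be checked by requesting it and treating a 404 as the end of the enumeration.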
tool · 2026-06-15 19:05
{"success": true, "name": "paper-recommender", "file": "references/html-template-structure.md", "content": "# WeChat-article-style HTML deep read: template structure\n\n## File architecture\n\n```\n<!DOCTYPE html>\n├── <head>\n│ ├── charset / viewport\n│ └── styles (all in an inline <style>)\n│ ├── global reset\n│ ├── .header: dark-blue gradient background\n│ ├── .card: white rounded card\n│ ├── .section-title: heading with left-border accent\n│ ├── .score-bar / .score-item: score grid\n│ ├── .figure-box: figure container\n│ ├── .reproduce-tip: blue replication tip\n│ ├── .highlight: yellow highlight box\n│ └── .summary-box: purple gradient conclusion card\n├── <body>\n│ ├── .header\n│ │ ├── tag badges (🔥 Today's pick + topic tag)\n│ │ ├── h1 Chinese title\n│ │ ├── p English title\n│ │ └── p 📅 date | 📰 journal | IF\n│ ├── .container\n│ │ ├── .card (scores): 5 dimensions × flex grid\n│ │ ├── .card (overview): abstract + highlight box\n│ │ ├── .card (team): author information\n│ │ ├── .card (figures): Figure × 5 with canvas + caption + 🔬 replication tip\n│ │ ├── .card (commentary): ✅ strengths / ⚠️ weaknesses / 🔬 takeaways\n│ │ ├── .summary-box (summary)\n│ │ └── .card (metadata): DOI / keywords / date\n│ └── <script>\n│ ├── const fig1_b64 = \"...\" (base64 data)\n│ └── function loadImage() → canvas rendering\n```\n\n## Score section (5-dimension flex grid)\n\n```html\n<div class=\"score-bar\">\n <div class=\"score-item\">\n <div class=\"label\">Logical completeness</div>\n <div class=\"value\">9</div>\n <div class=\"stars\">★★★★★</div>\n </div>\n <!-- repeat ×5 -->\n</div>\n<!-- total score -->\n<div style=\"text-align:center;font-size:24px;font-weight:700;color:#e94560;\">\n Total: 41 / 50 ⭐\n</div>\n```\n\n## Figure walkthrough (structure of each Figure)\n\n```html\n<div class=\"figure-box\">\n <canvas id=\"fig{N}-canvas\" style=\"max-width:100%\"></canvas>\n <div class=\"caption\">\n <b>Figure {N} | Title</b><br>\n Interpretation text...\n <div style=\"margin-top:8px;\">\n <b>🔬 Replication tip:</b> specific suggestion...\n </div>\n </div>\n</div>\n<div class=\"divider\"></div>\n```\n\n## Image embedding options\n\n### Option A: direct `<img>` tag (recommended, simpler)\n\nWhen the HTML only needs to show static images (no zooming or interaction), a plain `<img>` tag is the simplest choice and needs no JavaScript:\n\n```html\n<!-- embed the base64 data directly in the img tag -->\n<img class=\"fig-image\" src=\"data:image/png;base64,{fig_b64['fig1']}\" alt=\"Fig1\">\n\n<!-- in the Python builder script this can be done with string concatenation -->\nparts.append(f'<img class=\"fig-image\" src=\"data:image/png;base64,{fig_b64[\"fig1\"]}\" 
alt=\"Fig1\">')\n```\n\n**Pros**: least code, no canvas/JS needed, best compatibility.\n**Cons**: the rendered size cannot be controlled programmatically (CSS `max-width: 100%` covers this in practice).\n**⚠️ Note**: Nature journals use PNG (base64 starts with `iVBOR`), while Cell Press journals use JPEG (base64 starts with `/9j/`). `data:image/png;base64,` is correct for PNG; for a downloaded JPEG, change it to `data:image/jpeg;base64,`.\n\n### Option B: canvas rendering (when the image must be controlled programmatically)\n\nUse this when the image needs interaction (local zoom, overlaid annotations, filters, etc.):\n\n```javascript\n// 1. Store the base64 data as JS variables (at the top of <script>)\nconst fig1_b64 = \"iVBORw0KGgo...\"; // PNG\nconst fig2_b64 = \"/9j/4AAQSkZJRg...\"; // JPEG (starts with /9j/)\n\n// 2. Render to canvas (size adapts on load; PNG/JPEG is detected automatically)\nfunction loadImage(canvasId, b64) {\n var canvas = document.getElementById(canvasId);\n var ctx = canvas.getContext(\"2d\");\n var img = new Image();\n img.onload = function() {\n canvas.width = img.width;\n canvas.height = img.height;\n ctx.drawImage(img, 0, 0);\n };\n // Detect the format: a PNG starts with \"iVBOR\", a JPEG with \"/9j/\"\n var mime = b64.startsWith(\"iVBOR\") ? \"image/png\" : \"image/jpeg\";\n img.src = \"data:\" + mime + \";base64,\" + b64;\n}\n```\n\n**⚠️ Important**: in a Python builder script, the canvas option is subject to the clash between f-strings and JS braces (see the \"Python f-string与JavaScript花括号冲突\" section of SKILL.md).\n\n### Format quick reference\n| Journal series | Image format | Data URI prefix | base64 header |\n|----------|---------|---------|---------------|\n| Nature series | PNG | `data:image/png;base64,` | starts with `iVBOR` |\n| Cell Press series | JPEG | `data:image/jpeg;base64,` | starts with `/9j/` |\n\n**Detection tip**: a base64 string that starts with `iVBOR` is a PNG; one that starts with `/9j/` is a JPEG.\n\n## Key style constants\n\n| Element | Color |\n|------|------|\n| Header gradient | `#1a1a2e → #0f3460` |\n| Tag badge | `#e94560` |\n| Card shadow | `0 2px 12px rgba(0,0,0,0.08)` |\n| Score number | `#e94560` |\n| Highlight box background | `#fff8e1` + left border `#ffc107` |\n| Replication tip | `#e3f2fd` |\n| Summary card | `#667eea → #764ba2` |\n| Heading left border | `#e94560`, 4px |\n\n## File naming rule\n\n```\n{topic keyword}_文献解读.html\n```\nExamples: `细胞死亡_文献解读.html`, `帕金森_文献解读.html`\n\n## Notes\n\n1. Canvas rendering (Option B): set the canvas size inside the image's `onload` callback, because the image dimensions are not known until it has loaded\n2. All images must be embedded as base64; file:// URLs do not work (blocked by browser security policy)\n3. 
Use `.card` for uniform spacing between cards (margin-bottom:20px)\n4. For long text, use `text-indent:2em` first-line indentation to improve readability\n", "file_type": ".md"}
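The format-detection rule in this template (PNG base64 starts with `iVBOR`, JPEG base64 starts with `/9j/`) can be applied on the Python side when building `<img>` tags. A minimal sketch, assuming hypothetical helper names `detect_mime` and `img_tag`; it uses plain concatenation rather than an f-string so that it stays clear of the brace clash mentioned for builder scripts.

```python
import html


def detect_mime(b64):
    """Infer the image MIME type from the first characters of a base64 string."""
    if b64.startswith("iVBOR"):   # PNG signature 89 50 4E 47 encodes to 'iVBOR...'
        return "image/png"
    if b64.startswith("/9j/"):    # JPEG signature FF D8 FF encodes to '/9j/...'
        return "image/jpeg"
    raise ValueError("unrecognized image header")


def img_tag(b64, alt):
    """Return an <img> element embedding the image as a data URI."""
    return (
        '<img class="fig-image" src="data:' + detect_mime(b64)
        + ";base64," + b64
        + '" alt="' + html.escape(alt, quote=True) + '">'
    )
```

Raising on an unknown header, rather than defaulting to JPEG, makes a failed download (for example an HTML error page saved as base64) visible at build time.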
assistant · 2026-06-15 19:05
tool · 2026-06-15 19:05
{"todos": [{"id": "step1", "content": "Step 1: PubMed检索 — 三阶段搜索(生信/前沿技术方向)", "status": "in_progress"}, {"id": "step2", "content": "Step 2: 获取论文详情并初筛", "status": "pending"}, {"id": "step3", "content": "Step 3: Claude Code深度评估", "status": "pending"}, {"id": "step4", "content": "Step 4: 唯一推荐决策", "status": "pending"}, {"id": "step5", "content": "Step 5: 下载PDF + 提取Figure图片", "status": "pending"}, {"id": "step6", "content": "Step 6: 制作公众号风格HTML深度解读", "status": "pending"}, {"id": "step7", "content": "Step 7: 微信推送(含MEDIA路径)", "status": "pending"}], "summary": {"total": 7, "pending": 6, "in_progress": 1, "completed": 0, "cancelled": 0}}
assistant · 2026-06-15 19:05
tool · 2026-06-15 19:05
{"output": "Cleaned old fig files", "exit_code": 0, "error": null}
assistant · 2026-06-15 19:05
tool · 2026-06-15 19:05
{"output": "============================================================\r\nPhase A: Broad topic search - 生信/前沿技术\r\n============================================================\r\nTotal results: 15117\r\nPMIDs: 42287067, 42287003, 42286955, 42286920, 42286854, 42286832, 42286788, 42286785, 42286752, 42286733, 42286721, 42286720, 42286685, 42286651, 42286638, 42286463, 42286456, 42286393, 42286350, 42286344\r\n\r\n============================================================\r\nPhase B: High IF journal filter\r\n============================================================\r\nTotal results: 692\r\nPMIDs: 42286785, 42286344, 42286248, 42285943, 42284418, 42284396, 42284141, 42277059, 42277023, 42277016, 42277013, 42277002, 42276073, 42276069, 42276042, 42276030, 42275467, 42274222, 42271624, 42271481\r\n\r\n============================================================\r\nPhase C: Cross-domain search (生信 + 核心课题)\r\n============================================================\r\nTotal results: 204\r\nPMIDs: 42286022, 42285952, 42284407, 42277059, 42277002, 42271149, 42271060, 42270660, 42268716, 42265312, 42265309, 42265104, 42262925, 42259916, 42259813, 42248910, 42242219, 42242187, 42241282, 42236958\r\n\r\nTotal unique PMIDs across all phases: 56\r\nAll PMIDs: 42286344, 42277013, 42236958, 42284396, 42286854, 42265104, 42287003, 42286920, 42277002, 42286248, 42286832, 42284418, 42276030, 42286955, 42259813, 42248910, 42285943, 42262925, 42284407, 42277016, 42285952, 42286685, 42270660, 42286733, 42271060, 42286022, 42275467, 42271624, 42286720, 42286638, 42286463, 42259916, 42284141, 42277059, 42268716, 42286393, 42276073, 42286788, 42271149, 42286721, 42274222, 42277023, 42265309, 42276069, 42271481, 42286456, 42242219, 42241282, 42265312, 42287067, 42286651, 42286350, 42286785, 42286752, 42276042, 42242187", "exit_code": 0, "error": null}
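The script that produced the unique-PMID count above is not shown in this log. One way to merge the three phase lists is sketched below; the function name `merge_pmids` is hypothetical, and unlike the "All PMIDs" line above, this version keeps first-seen order.

```python
def merge_pmids(*phases):
    """Merge PMID lists from several search phases, dropping duplicates.

    dict.fromkeys keeps the first occurrence of each key in insertion order,
    so earlier phases take precedence in the merged ordering.
    """
    return list(dict.fromkeys(pmid for phase in phases for pmid in phase))
```

With 20 PMIDs per phase, a merged length below 60 indicates overlap between phases, which is consistent with the count reported in the output.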
assistant · 2026-06-15 19:05
tool · 2026-06-15 19:05
{"output": "Fetching esummary for all candidates...\r\n\r\nPMID Source Date Review DOI\r\n----------------------------------------------------------------------------------------------------\r\n42287067 Cancer Biother Radiopharm 2026 Jun 13 N 10.1177/10849785261458459\r\n42286955 Andrology 2026 Jun 12 Y 10.1111/andr.70276\r\n42286854 Cancer Rep (Hoboken) 2026 Jun N 10.1002/cnr2.70599\r\n42286785 Bioinformatics 2026 Jun 12 N 10.1093/bioinformatics/btag374\r\n42286344 Bioinformatics 2026 Jun 12 N 10.1093/bioinformatics/btag383\r\n42286248 Bioinformatics 2026 Jun 12 N 10.1093/bioinformatics/btag382\r\n42285943 Nat Commun 2026 Jun 12 N 10.1038/s41467-026-74071-3\r\n42284418 Sci Adv 2026 Jun 12 N 10.1126/sciadv.aeb7633\r\n42284396 Sci Adv 2026 Jun 12 N 10.1126/sciadv.aee7124\r\n42284141 Cell Rep 2026 Jun 11 N 10.1016/j.celrep.2026.117525\r\n42277059 Nat Commun 2026 Jun 11 N 10.1038/s41467-026-74360-x\r\n42277023 Nat Commun 2026 Jun 11 N 10.1038/s41467-026-74209-3\r\n42277016 Nat Commun 2026 Jun 11 N 10.1038/s41467-026-74125-6\r\n42277013 Nat Commun 2026 Jun 12 N 10.1038/s41467-026-74304-5\r\n42277002 Nat Commun 2026 Jun 11 N 10.1038/s41467-026-74076-y\r\n42276073 Mol Cell 2026 Jun 11 N 10.1016/j.molcel.2026.05.017\r\n42276069 Cell Rep 2026 Jun 11 N 10.1016/j.celrep.2026.117389\r\n42276042 Cell 2026 Jun 11 N 10.1016/j.cell.2026.06.007\r\n42276030 Cell 2026 Jun 11 Y 10.1016/j.cell.2026.05.028\r\n42275467 Science 2026 Jun 11 N 10.1126/science.adv8291\r\n42274222 Bioinformatics 2026 Jun 10 N 10.1093/bioinformatics/btag365\r\n42271624 Bioinformatics 2026 Jun 10 N 10.1093/bioinformatics/btag366\r\n42271481 Genome Biol 2026 Jun 11 N 10.1186/s13059-026-04090-2\r\n42286022 Nat Commun 2026 Jun 12 N 10.1038/s41467-026-74256-w\r\n42285952 Nat Commun 2026 Jun 13 N 10.1038/s41467-026-74162-1\r\n42284407 Sci Adv 2026 Jun 12 N 10.1126/sciadv.aec8873\r\n42271149 Nat Commun 2026 Jun 10 N 10.1038/s41467-026-74208-4\r\n42271060 Nature 2026 Jun 10 N 10.1038/s41586-026-10588-3\r\n42270660 
Nat Commun 2026 Jun 10 N 10.1038/s41467-026-74117-6\r\n42268716 Cell Rep 2026 Jun 9 N 10.1016/j.celrep.2026.117398\r\n42265312 Nat Genet 2026 Jun N 10.1038/s41588-026-02615-w\r\n42265309 Nat Genet 2026 Jun N 10.1038/s41588-026-02614-x\r\n42265104 Nat Commun 2026 Jun 9 N 10.1038/s41467-026-73164-3\r\n42262925 Cell Rep 2026 Jun 8 N 10.1016/j.celrep.2026.117509\r\n42259916 Nature 2026 Jun 8 N 10.1038/s41586-026-10738-7\r\n42259813 Nat Commun 2026 Jun 8 N 10.1038/s41467-026-74032-w\r\n42248910 Nat Commun 2026 Jun 5 N 10.1038/s41467-026-74022-y\r\n42242219 Cell Metab 2026 Jun 4 N 10.1016/j.cmet.2026.05.006\r\n42242187 Cell Stem Cell 2026 Jun 4 N 10.1016/j.stem.2026.04.022\r\n42241282 Cell Rep 2026 Jun 4 N 10.1016/j.celrep.2026.117496\r\n42236958 Nat Biotechnol 2026 Jun 3 N 10.1038/s41587-026-03152-x\r\n42286788 Am J Respir Cell Mol Biol 2026 Jun 12 N 10.1093/ajrcmb/aanag114\r\n42286752 Anim Microbiome 2026 Jun 12 N 10.1186/s42523-026-00547-8\r\n42286733 J Cell Mol Med 2026 Jun N 10.1111/jcmm.71244\r\n42286721 J Cardiothorac Surg 2026 Jun 12 N 10.1186/s13019-026-04282-8\r\n42286720 Cell Biosci 2026 Jun 12 N 10.1186/s13578-026-01600-4\r\n42286685 Skelet Muscle 2026 Jun 13 N 10.1186/s13395-026-00433-6\r\n42286651 Cancer Cell Int 2026 Jun 12 N 10.1186/s12935-026-04381-5\r\n42286638 J Transl Med 2026 Jun 13 Y 10.1186/s12967-026-08412-7\r\n42286463 BMC Bioinformatics 2026 Jun 12 N 10.1186/s12859-026-06477-1\r\n42286456 BMC Bioinformatics 2026 Jun 12 N 10.1186/s12859-026-06528-7\r\n42286393 Discov Oncol 2026 Jun 12 N 10.1007/s12672-026-05432-z\r\n42286350 Am J Respir Cell Mol Biol 2026 Jun 12 N 10.1093/ajrcmb/aanag126", "exit_code": 0, "error": null}
assistant · 2026-06-15 19:05
tool · 2026-06-15 19:05
{"output": "Fetching efetch XML for high-IF candidates (one by one)...\r\n\r\nPMID: 42285943\r\nJournal: N/A\r\nAuthors: 8\r\nReview: N\r\nTitle: Bidirectional integrin β1 activation synergizes neurovascular coupling and enhances bone regeneration.\r\nAbstract: Reconstruction of large segmental bone defects remains challenging because current grafting strategies often fail to coordinate angiogenesis, neurogenesis, and osteogenesis. Here we developed a functi...\r\n--------------------------------------------------------------------------------\r\nPMID: 42284418\r\nJournal: N/A\r\nAuthors: 8\r\nReview: N\r\nTitle: B cell αv integrin regulates tissue specialization and clonal expansion of lung germinal center and memory B cells after viral infection.\r\nAbstract: (No abstract)...\r\n--------------------------------------------------------------------------------\r\nPMID: 42284396\r\nJournal: N/A\r\nAuthors: 5\r\nReview: N\r\nTitle: Time-series single-cell transcriptomics reveals pervasive daily rhythmicity and nocturnal spermatogenesis in the zebrafish testis.\r\nAbstract: (No abstract)...\r\n--------------------------------------------------------------------------------\r\nPMID: 42284141\r\nJournal: N/A\r\nAuthors: 5\r\nReview: N\r\nTitle: Fgf18 marks reserve skeletal progenitors in the fibrous periosteum contributing to bone fracture repair.\r\nAbstract: (No abstract)...\r\n--------------------------------------------------------------------------------\r\nPMID: 42277059\r\nJournal: N/A\r\nAuthors: 21\r\nReview: N\r\nTitle: Natural killer cell-mediated immunosurveillance modulates liver cancer evolution through cancer stemness enhancement and lipid metabolism reprogramming.\r\nAbstract: Tumor evolution enables liver cancer cells to acquire survival advantages and evade therapy-induced cell death. 
However, the role of natural killer (NK) cells in liver cancer evolution remains unclear...\r\n--------------------------------------------------------------------------------\r\nPMID: 42277023\r\nJournal: N/A\r\nAuthors: 6\r\nReview: N\r\nTitle: T2Pdecoder enables protein-centric analyses from transcriptomic data.\r\nAbstract: Protein quantification is not as extensive as RNA quantification, especially for isocitrate dehydrogenase (IDH) mutant gliomas. Predicting protein abundance from RNA is valuable for leveraging existin...\r\n--------------------------------------------------------------------------------\r\nPMID: 42277016\r\nJournal: N/A\r\nAuthors: 12\r\nReview: N\r\nTitle: Spatiotemporal organisation of residual disease in mouse and human BRCA1-deficient mammary tumours and breast cancer.\r\nAbstract: Breast cancer remains a leading cause of death worldwide. Although chemotherapy reduces primary and metastatic tumour burden, persisting drug-tolerant tumour cell populations, known as minimal residua...\r\n--------------------------------------------------------------------------------\r\nPMID: 42277013\r\nJournal: N/A\r\nAuthors: 13\r\nReview: N\r\nTitle: High-throughput Raman-activated cell sorting of microalgal genome-wide edited library revealed a regulatory pathway for carotenoid synthesis.\r\nAbstract: Functional genomics have been hampered by the paucity of efficient methods that connect genotype and metabolic phenotype at single-cell resolution. 
Using the industrial microalga Nannochloropsis ocean...\r\n--------------------------------------------------------------------------------\r\nPMID: 42277002\r\nJournal: N/A\r\nAuthors: 27\r\nReview: N\r\nTitle: Tumor-resident T cells and dendritic cells form an in situ archetype during immunotherapy response in melanoma.\r\nAbstract: (No abstract)...\r\n--------------------------------------------------------------------------------\r\nPMID: 42276073\r\nJournal: N/A\r\nAuthors: 29\r\nReview: N\r\nTitle: SenCat: Cataloging human cell senescence through multi-omic profiling of multiple senescent primary cell types.\r\nAbstract: There is an urgent need to comprehensively catalog senescence markers across cell types in an organism in order to characterize senescent-cell heterogeneity. Here, we profiled the transcriptomes and p...\r\n--------------------------------------------------------------------------------\r\nPMID: 42276069\r\nJournal: N/A\r\nAuthors: 15\r\nReview: N\r\nTitle: Circulating cell type senescence signatures track distinct dimensions of health status and trajectories in human longitudinal cohorts.\r\nAbstract: Cellular senescence is implicated in age-related pathologies, and identifying circulating biomarkers of senescence holds great diagnostic potential. Circulating senescence signatures are predictive of...\r\n--------------------------------------------------------------------------------\r\nPMID: 42276042\r\nJournal: N/A\r\nAuthors: 10\r\nReview: N\r\nTitle: Complete biosynthesis of the anticancer cephalotaxinone and homoerythratine.\r\nAbstract: Cephalotaxine-type and homoerythrina-type alkaloids are structurally unique and biologically important natural products isolated from endangered species that belong to the genus Cephalotaxus. 
Among th...\r\n--------------------------------------------------------------------------------\r\nPMID: 42275467\r\nJournal: N/A\r\nAuthors: 18\r\nReview: N\r\nTitle: Chemically induced skin tumors arise from long-lived stem cells of the upper hair follicle.\r\nAbstract: (No abstract)...\r\n--------------------------------------------------------------------------------\r\nPMID: 42271481\r\nJournal: N/A\r\nAuthors: 15\r\nReview: N\r\nTitle: MicNet: integrating spatially resolved transcriptomes and pathology images by contrastive deep neural network.\r\nAbstract: Recent breakthroughs in spatially resolved transcriptomic technologies have enabled molecular characterization of cells while preserving spatial and morphological contexts. However, integrating transc...\r\n--------------------------------------------------------------------------------\r\nPMID: 42286022\r\nJournal: N/A\r\nAuthors: 13\r\nReview: N\r\nTitle: Nitrate-Sialin2 axis couples ER-mitochondrial calcium signaling with fatty acid metabolism to drive white adipose browning.\r\nAbstract: (No abstract)...\r\n--------------------------------------------------------------------------------\r\nPMID: 42285952\r\nJournal: N/A\r\nAuthors: 10\r\nReview: N\r\nTitle: In situ reprogramming of CAR-alveolar macrophages via liposomal nanomedicine for lung cancer immunotherapy.\r\nAbstract: Immunotherapy has revolutionized lung cancer treatment; however, response rates remain suboptimal. 
Alveolar macrophages (AMs) within the tumor microenvironment contribute to immunotherapy resistance b...\r\n--------------------------------------------------------------------------------\r\nPMID: 42284407\r\nJournal: N/A\r\nAuthors: 21\r\nReview: N\r\nTitle: SUCNR1 coordinates metabolic flux, mitochondrial function, and nutrient-dependent adaptation in hepatocytes.\r\nAbstract: (No abstract)...\r\n--------------------------------------------------------------------------------\r\nPMID: 42271149\r\nJournal: N/A\r\nAuthors: 10\r\nReview: N\r\nTitle: Discovery of a snail hibernation-inducer offering hibernation-like cardioprotection through metabolic rewiring and autophagy in mice.\r\nAbstract: Hibernating animals achieve cellular dormancy through metabolic remodelling and autophagy, resisting ischemic and ischemia-reperfusion (IR) injury, while non-hibernators are vulnerable to both. Here w...\r\n--------------------------------------------------------------------------------\r\nPMID: 42271060\r\nJournal: N/A\r\nAuthors: 39\r\nReview: N\r\nTitle: Mitochondria directly interact with the nuclear pore complex.\r\nAbstract: (No abstract)...\r\n--------------------------------------------------------------------------------\r\nPMID: 42270660\r\nJournal: N/A\r\nAuthors: 2\r\nReview: N\r\nTitle: A mitochondria-driven quality control mechanism for peroxisomal membrane proteins.\r\nAbstract: Peroxisomes are essential organelles involved in lipid and reactive oxygen species metabolism, and their function requires proper targeting of peroxisomal membrane proteins (PMPs). 
When peroxisome bio...\r\n--------------------------------------------------------------------------------\r\nPMID: 42268716\r\nJournal: N/A\r\nAuthors: 16\r\nReview: N\r\nTitle: Bilirubin mitigates ischemic white matter injury by targeting transferrin-receptor-mediated B cell immunometabolism.\r\nAbstract: Physiological bilirubin exerts protective effects against ischemic stroke, but its role in post-stroke white matter injury (WMI) remains unclear. Here, through integrated epidemiological, genetic, and...\r\n--------------------------------------------------------------------------------\r\nPMID: 42265312\r\nJournal: N/A\r\nAuthors: 18\r\nReview: N\r\nTitle: Spatially resolved single-cell analyses of human meningioma identify novel cell states influencing tumor microenvironment and progression.\r\nAbstract: Recent advances in our understanding of the molecular landscape of meningioma have generated new insights into the biology and heterogeneity of this disease, with demonstrated clinical value. 
However,...\r\n--------------------------------------------------------------------------------\r\nPMID: 42265309\r\nJournal: N/A\r\nAuthors: 30\r\nReview: N\r\nTitle: Inflammatory cytokines induce new cancer dependencies.\r\nAbstract: Tumor cells respond and adapt to environmental stresses that facilitate growth in hostile environments, including cytokine-mediated inflammation elicited by antitumor immunity and enhanced by immune c...\r\n--------------------------------------------------------------------------------\r\nPMID: 42265104\r\nJournal: N/A\r\nAuthors: 4\r\nReview: N\r\nTitle: Robust footprinting with sample-specific Tn5 bias correction for bulk and single cell ATAC-seq.\r\nAbstract: Accurate detection of transcription factor (TF) and nucleosome occupancy from assay for transposase-accessible chromatin via sequencing (ATAC-seq) remains challenging due to sequence-dependent Tn5 cle...\r\n--------------------------------------------------------------------------------\r\nPMID: 42262925\r\nJournal: N/A\r\nAuthors: 6\r\nReview: N\r\nTitle: Cholera-toxin-induced disease generates epithelial-cell-derived L-lactate that promotes Vibrio cholerae growth in the small intestine.\r\nAbstract: Cholera toxin (CT) promotes Vibrio cholerae colonization by altering gut metabolism to favor pathogen growth. 
Here, we show that CT-induced disease leads to the upregulation of mammalian lactate dehyd...\r\n--------------------------------------------------------------------------------\r\nPMID: 42259916\r\nJournal: N/A\r\nAuthors: 25\r\nReview: N\r\nTitle: Targeting Cancer-Specific Mutations with RNA-Triggered Chromatin Shredding.\r\nAbstract: (No abstract)...\r\n--------------------------------------------------------------------------------\r\nPMID: 42259813\r\nJournal: N/A\r\nAuthors: 28\r\nReview: N\r\nTitle: Mycobacterium tuberculosis IDH-PPAR&#x3b3; interaction suppresses GPX4 to drive macrophage ferroptosis and sustain persistent infection.\r\nAbstract: Mycobacterium tuberculosis (M.tb) actively reprograms host lipid metabolism during infection; however, the underlying mechanism remains poorly understood. How M.tb manipulates macrophage lipid metabol...\r\n--------------------------------------------------------------------------------\r\nPMID: 42248910\r\nJournal: N/A\r\nAuthors: 20\r\nReview: N\r\nTitle: HMGCS1 drives cholesterol-dependent membrane repair and shields tumor cells from lymphocyte attack.\r\nAbstract: Cytotoxic lymphocytes use perforin to form plasma membrane (PM) pores in tumor cells, thereby enabling granzyme-mediated cell death. However, whether and how tumor metabolism enables PM repair to evad...\r\n--------------------------------------------------------------------------------\r\nPMID: 42242219\r\nJournal: N/A\r\nAuthors: 19\r\nReview: Y\r\nTitle: Activity-dependent protein synthesis in neurons requires microglial-metabolic coupling.\r\nAbstract: De novo protein synthesis is required for long-lasting synaptic plasticity and memory, but it comes with a great metabolic cost. 
In the mammalian brain, it remains unclear which cell types and biologi...\r\n--------------------------------------------------------------------------------\r\nPMID: 42242187\r\nJournal: N/A\r\nAuthors: 3\r\nReview: N\r\nTitle: Ketone bodies as guardians of leukemic stemness through ferroptosis suppression.\r\nAbstract: (No abstract)...\r\n--------------------------------------------------------------------------------\r\nPMID: 42241282\r\nJournal: N/A\r\nAuthors: 10\r\nReview: N\r\nTitle: Nerve-proximal tertiary lymphoid structures predict chemotherapy sensitivity in pancreatic cancer.\r\nAbstract: The complex interplay between nerves, immunity, and tumor progression remains poorly understood, particularly in the context of chemotherapy. Here, we investigated how neural remodeling influences ter...\r\n--------------------------------------------------------------------------------\r\nPMID: 42236958\r\nJournal: N/A\r\nAuthors: 26\r\nReview: N\r\nTitle: Single-cell spatial pharmacobiology identifies conserved stromal barriers to therapeutic antibody delivery in human solid tumors.\r\nAbstract: The development of effective antibody therapeutics has been hampered by a lack of methods to measure drug delivery and activity within tumors at single-cell resolution. Here we introduce single-cell s...\r\n--------------------------------------------------------------------------------", "exit_code": 0, "error": null}
assistant · 2026-06-15 19:05
tool · 2026-06-15 19:05
{"output": "PMID: 42236958\r\nDOI: N/A\r\nJournal: N/A\r\nTitle: Single-cell spatial pharmacobiology identifies conserved stromal barriers to therapeutic antibody delivery in human solid tumors.\r\nAbstract:\r\nThe development of effective antibody therapeutics has been hampered by a lack of methods to measure drug delivery and activity within tumors at single-cell resolution. Here we introduce single-cell spatial pharmacobiology (SSP), an experimental and analytical framework that integrates in situ imaging of a systemically infused, fluorescently labeled therapeutic antibody with high-plex spatial proteomics to quantify antibody distribution, target engagement and tumor microenvironment (TME) architecture. We applied SSP to tumor tissues from participants with head and neck squamous cell carcinoma and pancreatic ductal adenocarcinoma who received the antibody panitumumab-IRDye800 in phase 1 trials. SSP identified pronounced spatial heterogeneity in single-cell drug delivery and target engagement, shaped by conserved stromal barriers, including periostin-rich extracellular matrix assemblies and fibroblast-activation-protein-positive cancer-associated fibroblast neighborhoods, which were associated with reduced antibody delivery in both tumor types. SSP measures drug-target-TME interactions in human tumors and can support studies of resistance mechanisms, dosing strategies and discovery of spatial biomarkers for precision oncology.\r\n====================================================================================================\r\nPMID: 42271481\r\nDOI: N/A\r\nJournal: N/A\r\nTitle: MicNet: integrating spatially resolved transcriptomes and pathology images by contrastive deep neural network.\r\nAbstract:\r\nRecent breakthroughs in spatially resolved transcriptomic technologies have enabled molecular characterization of cells while preserving spatial and morphological contexts. 
However, integrating transcriptomic profiles and pathology images remains a challenge. Here, we developed a novel unsupervised representation learning method, MicNet, to project pathology image and transcriptomic data onto a shared representative domain for biological interpretation. MicNet maximizes the correlation between image and molecular features from the same sample while minimizing it for different samples. MicNet outperformed existing approaches in multiple analysis tasks, including spatial domain detection, spatially variable gene identification, and spatial organization visualization.\r\n====================================================================================================\r\nPMID: 42276073\r\nDOI: N/A\r\nJournal: N/A\r\nTitle: SenCat: Cataloging human cell senescence through multi-omic profiling of multiple senescent primary cell types.\r\nAbstract:\r\nThere is an urgent need to comprehensively catalog senescence markers across cell types in an organism in order to characterize senescent-cell heterogeneity. Here, we profiled the transcriptomes and proteomes in 14 different primary human cell types undergoing over 30 senescence paradigms to create a senescence catalog we termed \"SenCat.\" We found that while senescent cells from all primary cell types did not share a single unique marker, they did activate shared specific metabolic and damage-response pathways implicated in tissue repair. Moreover, machine-learning-refined SenCat signatures enabled senescence scoring and identification across multiple human and mouse datasets, both at bulk and single-cell levels. 
In sum, SenCat represents a much-needed resource to identify senescence across multiple cell types and tissues in the body.\r\n====================================================================================================\r\nPMID: 42277023\r\nDOI: N/A\r\nJournal: N/A\r\nTitle: T2Pdecoder enables protein-centric analyses from transcriptomic data.\r\nAbstract:\r\nProtein quantification is not as extensive as RNA quantification, especially for isocitrate dehydrogenase (IDH) mutant gliomas. Predicting protein abundance from RNA is valuable for leveraging existing data to understand biological processes, though the weak correlation between RNA and protein poses a significant challenge. Most existing methods predict limited protein subsets from transcriptome, constraining their broader proteomic applications. Here, we present T2Pdecoder, an integrative multi-omics deep learning model designed to predict broad protein abundance profiles by learning the shared embedding space of protein and RNA. T2Pdecoder is evaluated on different glioma datasets, achieving modest but consistent improvements over RNA-only baselines in concordance with measured protein abundance, while more accurately recapitulating protein-level pathway enrichment patterns. The applications of T2Pdecoder on glioma bulk RNA data uncover functional subgroups with significant survival differences. Furthermore, T2Pdecoder reduces batch-associated variation in single-cell RNA data and identifies distinctive cell markers. 
Collectively, these results suggest that T2Pdecoder enables protein-centric analyses from transcriptomic data and may provide complementary biological insights beyond conventional RNA-only analyses in cancer research.\r\n====================================================================================================\r\nPMID: 42265312\r\nDOI: N/A\r\nJournal: N/A\r\nTitle: Spatially resolved single-cell analyses of human meningioma identify novel cell states influencing tumor microenvironment and progression.\r\nAbstract:\r\nRecent advances in our understanding of the molecular landscape of meningioma have generated new insights into the biology and heterogeneity of this disease, with demonstrated clinical value. However, there remains a need to understand tumor-intrinsic heterogeneity at single-cell resolution to inform potential therapeutic avenues. In this study, we examined the breadth of cell types and states in meningioma using a large cohort profiled with single-nuclear RNA sequencing and high-resolution spatial transcriptomics, as well as bulk DNA methylation and RNA sequencing (n&#x2009;=&#x2009;712), bulk proteomics (n&#x2009;=&#x2009;88) and plasma methylation (n&#x2009;=&#x2009;59). We demonstrated that myeloid cell states differ across molecular groups of meningiomas and evolve meaningfully from dura to tumor. Myeloid cell states were also associated with unique myeloid-neoplastic interactions and neoplastic gene programs, suggesting a role in shaping the microenvironment. 
Finally, multiple non-neoplastic cell states refined outcome prediction beyond molecular group, suggesting a role in meningioma progression.\r\n====================================================================================================\r\nPMID: 42248910\r\nDOI: N/A\r\nJournal: N/A\r\nTitle: HMGCS1 drives cholesterol-dependent membrane repair and shields tumor cells from lymphocyte attack.\r\nAbstract:\r\nCytotoxic lymphocytes use perforin to form plasma membrane (PM) pores in tumor cells, thereby enabling granzyme-mediated cell death. However, whether and how tumor metabolism enables PM repair to evade immunity is unclear. In this study, using a functional screen targeting 111 metabolic enzymes, we identified hydroxymethylglutaryl-CoA synthase 1 (HMGCS1) as critical for repairing perforin-induced PM damage. HMGCS1 promotes PM repair by initiating de novo cholesterol synthesis, enhancing tumor cell resistance to lymphocyte-mediated killing and impairing the efficacy of NK, CAR-T, and anti-PD-1-based immunotherapies. Beyond its structural role, cholesterol directly binds charged multivesicular body protein 4b (CHMP4B) to enhance its PM localization, facilitating PM repair. Furthermore, oncogenic activation, cytokine, and hypoxia induce c-Jun activation, up-regulating HMGCS1 expression. In lung cancer patients, elevated c-Jun activation, HMGCS1 expression, cholesterol content and PM CHMP4B correlate with reduced anti-PD-1 immunotherapy efficacy. Our findings reveal a tumor immune evasion mechanism wherein HMGCS1 drives cholesterol-dependent PM repair by activating the cholesterol synthesis. 
Targeting HMGCS1 enhances the effectiveness of immunotherapies.\r\n====================================================================================================\r\nPMID: 42271060\r\nDOI: N/A\r\nJournal: N/A\r\nTitle: Mitochondria directly interact with the nuclear pore complex.\r\nAbstract:\r\n(No abstract)\r\n====================================================================================================\r\nPMID: 42262925\r\nDOI: N/A\r\nJournal: N/A\r\nTitle: Cholera-toxin-induced disease generates epithelial-cell-derived L-lactate that promotes Vibrio cholerae growth in the small intestine.\r\nAbstract:\r\nCholera toxin (CT) promotes Vibrio cholerae colonization by altering gut metabolism to favor pathogen growth. Here, we show that CT-induced disease leads to the upregulation of mammalian lactate dehydrogenase A (LDHA), an enzyme that catalyzes the conversion of pyruvate to L-lactate, in small intestinal epithelial cells. In a suckling mouse model, the bacterial L-lactate dehydrogenase (LldD) confers a fitness advantage to V. cholerae but not to the &#x394;ctxAB mutant incapable of producing CT. Finally, mice lacking epithelial-cell-specific LDHA have reduced luminal L-lactate concentrations, and the fitness advantage conferred by LldD is significantly reduced in these mice, demonstrating that epithelial-derived L-lactate is a major contributor to CT-dependent pathogen expansion. These findings identify epithelial-derived L-lactate as a host-derived metabolite generated in the small intestine during cholera disease that directly fuels V. 
cholerae growth during infection, uncovering a mechanism by which CT confers a fitness advantage to the pathogen during disease.\r\n====================================================================================================\r\nPMID: 42265104\r\nDOI: N/A\r\nJournal: N/A\r\nTitle: Robust footprinting with sample-specific Tn5 bias correction for bulk and single cell ATAC-seq.\r\nAbstract:\r\nAccurate detection of transcription factor (TF) and nucleosome occupancy from assay for transposase-accessible chromatin via sequencing (ATAC-seq) remains challenging due to sequence-dependent Tn5 cleavage bias. We show that this cleavage bias varies across samples and introduce TraceBind, an ATAC-seq footprinting framework that performs sample-specific Tn5 bias correction by fine-tuning a pretrained cleavage model using mitochondrial DNA reads, followed by multiscale detection with empirical false discovery rate control. Across bulk and single-cell ATAC-seq datasets, with validation by naked DNA controls, matched ChIP-seq and CUT&amp;RUN experiments, and TF degradation perturbations, TraceBind reduces false positives while maintaining high sensitivity compared to existing methods. TraceBind enables improved downstream single-cell analyses, including transcription factor activity inference and cell age prediction. 
Base-resolution comparisons with DNA foundation model-based sequence annotations show enriched overlap but reveals many high-confidence footprints not captured by DNA foundation models, highlighting complementary regulatory information beyond what can be learned from DNA sequence alone.\r\n====================================================================================================\r\nPMID: 42277013\r\nDOI: N/A\r\nJournal: N/A\r\nTitle: High-throughput Raman-activated cell sorting of microalgal genome-wide edited library revealed a regulatory pathway for carotenoid synthesis.\r\nAbstract:\r\nFunctional genomics have been hampered by the paucity of efficient methods that connect genotype and metabolic phenotype at single-cell resolution. Using the industrial microalga Nannochloropsis oceanica as a model, we introduced a platform that comprises a genome-wide single-gene-edited mutant library and high-throughput Raman-activated cell sorting (RACS). The CRISPR/Cas-generated library consisted of 3567 microalgal mutants derived from 2397 effective guide RNAs. Label-free sorting of the library for high carotenoid content by RACS unraveled mutations in the violaxanthin de-epoxidase (noVDE) or in the proteasome assembly chaperone 4 (noPAC4) genes. Knocking out all five known noVDEs revealed that the high carotenoid content is due to violaxanthin increase, whilst noPAC4 knockout boosted carotenoid content with elevations in violaxanthin, zeaxanthin, and &#x3b2;-carotene. Genetic and transcriptomic evidence suggested two previously unknown modes of carotenogenesis regulation mediated by noPAC4: epigenetic mechanisms via histone deacetylase (HDAC) and post-translational controls by the 26S proteasome. 
Therefore, by label-freely sorting single-cell metabolic phenotype and rapidly yet unambiguously tracing it to a genotype, this forward-genetics approach can greatly accelerate the discovery of genes and pathways.\r\n====================================================================================================", "exit_code": 0, "error": null}
assistant · 2026-06-15 19:05
tool · 2026-06-15 19:05
{"output": "{\"paper1\":{\"总分\":42,\"细则\":{\"逻辑完整性\":8,\"方法学\":10,\"创新性\":9,\"可借鉴性\":7,\"临床转化\":8}},\"paper2\":{\"总分\":31,\"细则\":{\"逻辑完整性\":5,\"方法学\":8,\"创新性\":6,\"可借鉴性\":6,\"临床转化\":6}},\"paper3\":{\"总分\":36,\"细则\":{\"逻辑完整性\":7,\"方法学\":8,\"创新性\":7,\"可借鉴性\":7,\"临床转化\":7}},\"paper4\":{\"总分\":44,\"细则\":{\"逻辑完整性\":10,\"方法学\":8,\"创新性\":9,\"可借鉴性\":9,\"临床转化\":8}}}", "exit_code": 0, "error": null}
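The scoring payload above can be sanity-checked before a decision is made. The following is a minimal sketch, not the skill's actual selection logic: it assumes each total (`总分`) should equal the sum of its sub-scores (`细则`), and the function name `check_and_rank` is hypothetical.

```python
import json

# Score payload copied verbatim from the tool output above.
RAW = (
    '{"paper1":{"总分":42,"细则":{"逻辑完整性":8,"方法学":10,"创新性":9,"可借鉴性":7,"临床转化":8}},'
    '"paper2":{"总分":31,"细则":{"逻辑完整性":5,"方法学":8,"创新性":6,"可借鉴性":6,"临床转化":6}},'
    '"paper3":{"总分":36,"细则":{"逻辑完整性":7,"方法学":8,"创新性":7,"可借鉴性":7,"临床转化":7}},'
    '"paper4":{"总分":44,"细则":{"逻辑完整性":10,"方法学":8,"创新性":9,"可借鉴性":9,"临床转化":8}}}'
)


def check_and_rank(raw):
    """Verify each total equals the sum of its sub-scores, then rank by total."""
    scores = json.loads(raw)
    ranked = []
    for name, entry in scores.items():
        total = entry["总分"]
        parts = entry["细则"]
        if sum(parts.values()) != total:
            raise ValueError(name + ": total does not match sub-scores")
        ranked.append((name, total))
    # Highest total first; Python's sort is stable, so ties keep input order.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


RANKED = check_and_rank(RAW)
print(RANKED)
```

By raw total, `paper4` (44) ranks above `paper1` (42). The log does not show which label maps to which PMID, nor how the skill's priority rules were applied on top of the raw totals.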
assistant · 2026-06-15 19:05
tool · 2026-06-15 19:05
{"todos": [{"id": "step1", "content": "(no description)", "status": "completed"}, {"id": "step2", "content": "(no description)", "status": "completed"}, {"id": "step3", "content": "(no description)", "status": "completed"}, {"id": "step4", "content": "Step 4: Single-recommendation decision: SSP, Nat Biotechnol (42 points)", "status": "completed"}, {"id": "step5", "content": "Step 5: Download PDF + extract figure images", "status": "in_progress"}, {"id": "step6", "content": "Step 6: Build WeChat-article-style HTML deep-dive", "status": "pending"}, {"id": "step7", "content": "Step 7: Push message (with MEDIA path)", "status": "pending"}], "summary": {"total": 7, "pending": 2, "in_progress": 1, "completed": 4, "cancelled": 0}}
assistant · 2026-06-15 19:05
tool · 2026-06-15 19:05
{"output": "PII: 10.1038/s41587-026-03152-x\r\nPMID: 42236958\r\nTitle: Single-cell spatial pharmacobiology identifies conserved stromal barriers to therapeutic antibody delivery in human solid tumors.\r\nJournal: Nat Biotechnol\r\nDOI: 10.1038/s41587-026-03152-x\r\nPII: 10.1038/s41587-026-03152-x\r\nSource: Nat Biotechnol\r\nPubDate: 2026 Jun 3\r\nVolume: \r\nIssue: \r\nPages: \r\nEISSN: N/A\r\nISSN: 1087-0156\r\n\r\nAll IDs:\r\n pubmed: 42236958\r\n doi: 10.1038/s41587-026-03152-x\r\n pii: 10.1038/s41587-026-03152-x", "exit_code": 0, "error": null}
assistant · 2026-06-15 19:05
tool · 2026-06-15 19:05
{"output": "=== Step 5: Downloading Figures for SSP paper ===\nDOI: 10.1038/s41587-026-03152-x\nArticle ID: s41587-026-03152-x\n\n=== Getting article metadata ===\nAuthors: Lu, Guolan; Hickey, John W.; Haist, Maximilian; Qin, Xulei; Zhao, Emily...\r\nDate: 2026/06/03\r\nJournal: Nature Biotechnology\r\nTitle: Single-cell spatial pharmacobiology identifies conserved stromal barriers to therapeutic antibody delivery in human solid tumors\r\nFound figcaptions: 5\r\nFigure 1: Fig. 1: SSP quantifies therapeutic antibody delivery and activity in native tissue at single-cell resolution....\r\nFigure 2: Fig. 2: SSP quantifies antibody–target interactions and spatial antibody penetration patterns in the intact TME....\r\nFigure 3: Fig. 3: Periostin-rich ECM correlates with reduced antibody penetration into HNSCC tumors....\r\nFigure 4: Fig. 4: FAP+ CAFs are associated with increased periostin-rich ECM and reduced drug–target binding in HNSCC....\r\nFigure 5: Fig. 5: Periostin-rich ECM and FAP+ CAF are associated with reduced therapeutic antibody delivery in PDAC....", "exit_code": 0, "error": null}
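The article ID printed above is the DOI suffix. As a rough sketch, it can be split into its parts as follows. The function name `parse_nature_article_id` is hypothetical, and the field meanings (journal code, three-digit year, article number) are an assumption inferred from the shape of this one ID, not a documented scheme.

```python
import re


def parse_nature_article_id(doi):
    """Split a DOI such as 10.1038/s41587-026-03152-x into its apparent fields."""
    # The article ID is everything after the first slash.
    article_id = doi.split("/", 1)[1]
    match = re.fullmatch(r"s(\d+)-(\d{3})-(\d+)-([0-9a-z])", article_id)
    if match is None:
        raise ValueError("unrecognised article ID: " + article_id)
    journal, yy, num, _check = match.groups()
    return {
        "article_id": article_id,
        "journal": journal,
        # Assumption: the three-digit field is the year offset from 2000.
        "year": 2000 + int(yy),
        "num": int(num),
    }


INFO = parse_nature_article_id("10.1038/s41587-026-03152-x")
print(INFO)
```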
assistant · 2026-06-15 19:05
tool · 2026-06-15 19:05
{"output": "=== Downloading Figures for Nature Biotechnology SSP Paper ===\nArticle ID: s41587-026-03152-x | Journal: 41587 | Year: 2026 | Num: 3152\n\n=== PDF Download ===\n200\n/tmp/ssp_paper.pdf: PDF document, version 1.4, 21 page(s)\n\n=== Figure Download ===\nFig1: HTTP 200 ✅ — downloading...\n → 2163256 bytes saved\nFig2: HTTP 200 ✅ — downloading...\n → 919400 bytes saved\nFig3: HTTP 200 ✅ — downloading...\n → 1674872 bytes saved\nFig4: HTTP 200 ✅ — downloading...\n → 1452832 bytes saved\nFig5: HTTP 200 ✅ — downloading...\n → 1590876 bytes saved\nFig6: HTTP 404 ❌ — end of figures at Fig5\n\n=== Verification ===\nFig1: OK (2163256 bytes)\nFig2: OK (919400 bytes)\nFig3: OK (1674872 bytes)\nFig4: OK (1452832 bytes)\nFig5: OK (1590876 bytes)", "exit_code": 0, "error": null}
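The download log above probes figures in order and stops at the first 404. A minimal sketch of that loop is shown below. The `fetch` callable and `fake_fetch` stand-in are hypothetical; the real script's URL pattern and HTTP client are not shown in the log, so they are left out rather than guessed.

```python
from typing import Callable, Dict, Tuple


def download_figures(
    fetch: Callable[[int], Tuple[int, bytes]], max_figures: int = 20
) -> Dict[int, bytes]:
    """Fetch Fig1, Fig2, ... until the first 404; return bodies keyed by number."""
    figures: Dict[int, bytes] = {}
    for n in range(1, max_figures + 1):
        status, body = fetch(n)
        if status == 404:
            # First missing figure marks the end of the sequence.
            break
        if status != 200:
            raise RuntimeError("Fig%d: unexpected HTTP %d" % (n, status))
        if not body:
            raise RuntimeError("Fig%d: empty response body" % n)
        figures[n] = body
    return figures


def fake_fetch(n):
    # Stand-in for a real HTTP call: five figures exist, the sixth is missing.
    return (200, b"x" * n) if n <= 5 else (404, b"")


FIGURES = download_figures(fake_fetch)
print(sorted(FIGURES))
```

Passing the fetcher in as a parameter keeps the stop condition testable without network access. The `max_figures` cap is a safeguard against a server that never returns 404.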
assistant · 2026-06-15 19:05
tool · 2026-06-15 19:05
{"output": "=== Full Figure Captions ===\n=== Figure 1 ===\r\nFig. 1: SSP quantifies therapeutic antibody delivery and activity in native tissue at single-cell resolution.\r\n\r\n=== Figure 2 ===\r\nFig. 2: SSP quantifies antibody–target interactions and spatial antibody penetration patterns in the intact TME.\r\n\r\n=== Figure 3 ===\r\nFig. 3: Periostin-rich ECM correlates with reduced antibody penetration into HNSCC tumors.\r\n\r\n=== Figure 4 ===\r\nFig. 4: FAP+ CAFs are associated with increased periostin-rich ECM and reduced drug–target binding in HNSCC.\r\n\r\n=== Figure 5 ===\r\nFig. 5: Periostin-rich ECM and FAP+ CAF are associated with reduced therapeutic antibody delivery in PDAC.", "exit_code": 0, "error": null}
assistant · 2026-06-15 19:05
tool · 2026-06-15 19:05
{"todos": [{"id": "step1", "content": "(no description)", "status": "completed"}, {"id": "step2", "content": "(no description)", "status": "completed"}, {"id": "step3", "content": "(no description)", "status": "completed"}, {"id": "step4", "content": "(no description)", "status": "completed"}, {"id": "step5", "content": "(no description)", "status": "completed"}, {"id": "step6", "content": "Step 6: Build WeChat-article-style HTML deep-dive", "status": "in_progress"}, {"id": "step7", "content": "Step 7: Push message (with MEDIA path)", "status": "pending"}], "summary": {"total": 7, "pending": 1, "in_progress": 1, "completed": 5, "cancelled": 0}}
assistant · 2026-06-15 19:05
tool · 2026-06-15 19:05
{"bytes_written": 14262, "dirs_created": true, "lint": {"status": "ok", "output": ""}, "resolved_path": "C:\\Users\\forgase\\build_html_ssp.py", "files_modified": ["C:\\Users\\forgase\\build_html_ssp.py"]}
assistant · 2026-06-15 19:05
tool · 2026-06-15 19:05
{"output": "Loaded Fig1: 2163256 bytes\r\nLoaded Fig2: 919400 bytes\r\nLoaded Fig3: 1674872 bytes\r\nLoaded Fig4: 1452832 bytes\r\nLoaded Fig5: 1590876 bytes\r\nTotal figures: 5\r\n\r\nHTML written to: C:/Users/forgase/Desktop/前沿技术_文献解读.html\r\nFile size: 7813774 bytes", "exit_code": 0, "error": null}
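The build step above embeds the figures into one HTML file. A minimal sketch of the list-join approach the skill recommends (instead of f-strings, so JavaScript braces need no escaping) could look like this. The function name `build_html`, the `image/png` MIME type, and the inline script are assumptions for illustration. Whether the actual `build_html_ssp.py` encodes raw bytes or loads pre-encoded text is not shown in the log.

```python
import base64
import html


def build_html(title, figures):
    """Assemble an HTML page with figures embedded as base64 data URIs."""
    parts = [
        "<!DOCTYPE html>",
        '<html><head><meta charset="utf-8">',
        "<title>" + html.escape(title) + "</title></head><body>",
    ]
    for n in sorted(figures):
        b64 = base64.b64encode(figures[n]).decode("ascii")
        # Assumption: figures are PNG; adjust the MIME type if they are not.
        parts.append(
            '<figure><img alt="Fig. ' + str(n)
            + '" src="data:image/png;base64,' + b64 + '"></figure>'
        )
    # Plain string, not an f-string, so the JS braces are left untouched.
    parts.append(
        "<script>document.querySelectorAll('img')"
        ".forEach(function (img) { img.loading = 'lazy'; });</script>"
    )
    parts.append("</body></html>")
    return "\n".join(parts)


HTML = build_html("SSP deep-dive", {1: b"abc"})
print(len(HTML))
```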
assistant · 2026-06-15 19:05
tool · 2026-06-15 19:05
{"output": "-rw-r--r-- 1 forgase 197121 7813774 6月 15 19:05 /c/Users/forgase/Desktop/前沿技术_文献解读.html\nFile exists and ready!", "exit_code": 0, "error": null}
assistant · 2026-06-15 19:05
tool · 2026-06-15 19:05
{"todos": [{"id": "step1", "content": "(no description)", "status": "completed"}, {"id": "step2", "content": "(no description)", "status": "completed"}, {"id": "step3", "content": "(no description)", "status": "completed"}, {"id": "step4", "content": "(no description)", "status": "completed"}, {"id": "step5", "content": "(no description)", "status": "completed"}, {"id": "step6", "content": "(no description)", "status": "completed"}, {"id": "step7", "content": "Step 7: Push message (with MEDIA path)", "status": "in_progress"}], "summary": {"total": 7, "pending": 0, "in_progress": 1, "completed": 6, "cancelled": 0}}
assistant · 2026-06-15 19:05
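🔥 **19:00 Bioinformatics / Frontier Technologies Paper Recommendation**

🏆 Single-cell spatial pharmacobiology (SSP) identifies conserved stromal barriers to therapeutic antibody delivery in human solid tumors

📅 2026/06/03 | 📰 Nature Biotechnology (IF ~50)
⭐ Claude score: 42/50 🏅 (methodology scored a full 10/10)
🧪 Source: cross-domain search (bioinformatics + core project)

**💡 Why this paper:** SSP is a pioneering single-cell spatial pharmacobiology framework. It integrates in situ imaging of a fluorescently labeled therapeutic antibody with high-plex spatial proteomics (CODEX), making it possible for the first time to visualize antibody drug delivery in human tumor tissue at single-cell resolution. Published in Nat Biotechnol, with very strong methodology.

**Key finding:** Applied to phase 1 trial samples from participants with HNSCC and PDAC, SSP identified periostin-rich ECM and FAP+ CAFs as stromal barriers conserved across both tumor types and associated with reduced antibody delivery.

**🔬 Relevance to the core project:**
1. The SSP experimental framework could be transferred to the user's lactylation project: design a fluorescently labeled lactate probe or LDHA inhibitor and combine it with spatial omics to analyze the heterogeneity of lactate metabolism and mitochondrial localization in tumor tissue.
2. The spatial neighborhood analysis could be reused directly to examine the spatial correlation between RhoA lactylation and mitoxyperiosis sensitivity.
3. The cross-cancer validation strategy suggests testing the generality of the lactate → RhoA → mitoxyperiosis hypothesis in multiple tumor types.

**📥 HTML deep-dive generated (5 original Nature Biotechnology figures + figure-by-figure commentary + replication suggestions)**
MEDIA:C:\Users\forgase\Desktop\前沿技术_文献解读.html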