Hermes

20260613_011159_62fea7

2026-06-13 01:11 · 66 messages · deepseek-v4-flash

user · 2026-06-13 01:11
Download the video from a Xiaohongshu note to local disk.
assistant · 2026-06-13 01:15
OK, let me first look at the directory structure and scripts of the xhs-down skill.
tool · 2026-06-13 01:15
{"total_count": 10, "files": ["C:/Users/forgase/.claude/skills/xhs-down/scripts\\__pycache__\\xhs_paths.cpython-311.pyc", "C:/Users/forgase/.claude/skills/xhs-down/references\\xhs-patterns.md", "C:/Users/forgase/.claude/skills/xhs-down/scripts\\xhs_profile_covers.py", "C:/Users/forgase/.claude/skills/xhs-down/scripts\\xhs_author.py", "C:/Users/forgase/.claude/skills/xhs-down/scripts\\downie_handoff.py", "C:/Users/forgase/.claude/skills/xhs-down/scripts\\xhs_paths.py", "C:/Users/forgase/.claude/skills/xhs-down/scripts\\xhs_download.py", "C:/Users/forgase/.claude/skills/xhs-down/scripts\\validate_manifest.py", "C:/Users/forgase/.claude/skills/xhs-down/agents\\openai.yaml", "C:/Users/forgase/.claude/skills/xhs-down/SKILL.md"]}
assistant · 2026-06-13 01:15
tool · 2026-06-13 01:15
{"content": "1|#!/usr/bin/env python3\n2|\"\"\"\n3|XHS Note Downloader — 小红书笔记下载器\n4|Download title, text (markdown), images (no watermark), and video (best quality).\n5|\n6|Usage:\n7| python3 xhs_download.py \"https://www.xiaohongshu.com/explore/xxx\"\n8| python3 xhs_download.py URL1 URL2 -o ./output -c \"a1=xxx; webId=yyy\"\n9|\"\"\"\n10|\n11|from __future__ import annotations\n12|\n13|import argparse\n14|import json\n15|import re\n16|import sys\n17|import time\n18|from pathlib import Path\n19|from urllib.parse import urlparse\n20|\n21|from xhs_paths import DEFAULT_OUTPUT_DIR\n22|\n23|try:\n24| import requests\n25|except ImportError:\n26| print(\"需要 requests 库: pip install requests\")\n27| sys.exit(1)\n28|\n29|# ─── Constants ────────────────────────────────────────────────────────────────\n30|\n31|XHS_NOTE_URL_PATTERN = re.compile(\n32| r\"xiaohongshu\\.com/(?:explore/|discovery/item/|user/profile/[^/?#]+/)([a-f0-9]{24})\"\n33|)\n34|\n35|XHS_SHORT_LINK_PATTERN = re.compile(r\"https?://xhslink\\.com/\\S+\")\n36|\n37|SNS_IMG_CDN_DOMAINS = [\n38| \"sns-img-bd.xhscdn.com\",\n39| \"sns-img-hw.xhscdn.com\",\n40| \"sns-img-qc.xhscdn.com\",\n41| \"sns-img-ws.xhscdn.com\",\n42| \"sns-img-al.xhscdn.com\",\n43| \"sns-img-ak.xhscdn.com\",\n44|]\n45|\n46|FALLBACK_CDN = \"ci.xiaohongshu.com\"\n47|\n48|DEFAULT_HEADERS = {\n49| \"User-Agent\": (\n50| \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) \"\n51| \"AppleWebKit/537.36 (KHTML, like Gecko) \"\n52| \"Chrome/131.0.0.0 Safari/537.36\"\n53| ),\n54| \"Accept\": \"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\",\n55| \"Accept-Language\": \"zh-CN,zh;q=0.9,en;q=0.8\",\n56| \"Referer\": \"https://www.xiaohongshu.com/\",\n57|}\n58|\n59|WINDOWS_RESERVED = re.compile(r\"^(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])$\", re.I)\n60|CHINESE_PUNCT = re.compile(\"[,。;:!?、\\u2018\\u2019\\u201c\\u201d【】《》…—~·()\\\\[\\\\]{}()]\")\n61|\n62|\n63|# ─── URL & ID Extraction 
─────────────────────────────────────────────────────\n64|\n65|\n66|def extract_note_id(url: str) -> str | None:\n67| \"\"\"Extract 24-char hex note ID from XHS URL.\"\"\"\n68| m = XHS_NOTE_URL_PATTERN.search(url)\n69| return m.group(1) if m else None\n70|\n71|\n72|def resolve_short_url(url: str) -> str:\n73| \"\"\"Resolve xhslink.com short URLs to full xiaohongshu.com URLs.\n74|\n75| Short links redirect via HTTP 302 to the full URL containing\n76| the note ID and xsec_token (which allows cookie-free access).\n77| \"\"\"\n78| try:\n79| resp = requests.head(\n80| url,\n81| headers={\"User-Agent\": DEFAULT_HEADERS[\"User-Agent\"]},\n82| allow_redirects=True,\n83| timeout=10,\n84| )\n85| if \"xiaohongshu.com\" in resp.url:\n86| return resp.url\n87| except Exception:\n88| pass\n89| # Fallback: GET with full headers\n90| try:\n91| resp = requests.get(\n92| url,\n93| headers=DEFAULT_HEADERS,\n94| allow_redirects=True,\n95| timeout=15,\n96| )\n97| return resp.url\n98| except Exception:\n99| return url\n100|\n101|\n102|def normalize_url(raw: str) -> str:\n103| \"\"\"Normalize any XHS URL input to a usable fetch URL.\n104|\n105| Handles:\n106| - Direct explore/discovery URLs (pass through)\n107| - xhslink.com short links (resolve via redirect)\n108| - Share text blobs like \"标题... 
http://xhslink.com/o/xxx 复制后打开...\"\n109| \"\"\"\n110| raw = raw.strip()\n111|\n112| # Already a full xiaohongshu.com URL\n113| if extract_note_id(raw):\n114| return raw\n115|\n116| # Extract short link from share text\n117| short_match = XHS_SHORT_LINK_PATTERN.search(raw)\n118| if short_match:\n119| short_url = short_match.group(0)\n120| print(f\" Resolving short link: {short_url[:50]}...\")\n121| resolved = resolve_short_url(short_url)\n122| if extract_note_id(resolved):\n123| return resolved\n124| print(\" [WARN] Short link did not resolve to a note URL\")\n125|\n126| return raw\n127|\n128|\n129|# ─── HTML Parsing ─────────────────────────────────────────────────────────────\n130|\n131|\n132|def parse_initial_state(html: str, note_id: str) -> dict | None:\n133| \"\"\"Extract __INITIAL_STATE__ from HTML and find the note object.\"\"\"\n134| match = re.search(r\"__INITIAL_STATE__\\s*=\\s*(\\{.+\\})\", html, re.DOTALL)\n135| if not match:\n136| return None\n137|\n138| raw = match.group(1)\n139| # XHS sometimes uses JS `undefined` — replace with JSON-safe null\n140| raw = re.sub(r\"\\bundefined\\b\", \"null\", raw)\n141|\n142| try:\n143| state = json.loads(raw)\n144| except json.JSONDecodeError:\n145| return None\n146|\n147| note_map = state.get(\"note\", {}).get(\"noteDetailMap\", {})\n148| detail = note_map.get(note_id)\n149| if not detail:\n150| # Fallback: first available note in the map\n151| detail = next(iter(note_map.values()), None) if note_map else None\n152| if not detail:\n153| return None\n154|\n155| return detail.get(\"note\")\n156|\n157|\n158|# ─── Note Data Extraction ────────────────────────────────────────────────────\n159|\n160|\n161|def extract_note_data(note: dict) -> dict:\n162| \"\"\"Parse raw note object into structured data.\"\"\"\n163| # Tags\n164| tags = [t[\"name\"] for t in note.get(\"tagList\", []) if t.get(\"name\")]\n165|\n166| # Date: timestamp → YYYYMMDD\n167| ts = note.get(\"time\")\n168| if isinstance(ts, (int, float)):\n169| from 
datetime import datetime, timezone\n170|\n171| publish_date = datetime.fromtimestamp(ts / 1000, tz=timezone.utc).strftime(\n172| \"%Y%m%d\"\n173| )\n174| elif isinstance(ts, str):\n175| publish_date = ts[:8]\n176| else:\n177| publish_date = \"\"\n178|\n179| # Author\n180| user = note.get(\"user\", {})\n181| author = {\n182| \"userId\": user.get(\"userId\", \"\"),\n183| \"nickname\": user.get(\"nickname\", \"\"),\n184| \"avatar\": user.get(\"avatar\", \"\"),\n185| }\n186|\n187| # Images — no watermark + LivePhoto detection\n188| images = []\n189| for img in note.get(\"imageList\", []):\n190| url_default = img.get(\"urlDefault\", \"\") or img.get(\"url\", \"\")\n191| if url_default:\n192| # LivePhoto: video layer stored in stream.h264[0].masterUrl\n193| live_url = \"\"\n194| stream = img.get(\"stream\") or {}\n195| h264 = stream.get(\"h264\") or []\n196| if h264 and isinstance(h264, list):\n197| live_url = h264[0].get(\"masterUrl\", \"\")\n198| # Fallback: livePhoto field\n199| if not live_url:\n200| live_photo = img.get(\"livePhoto\") or {}\n201| live_url = live_photo.get(\"url\", \"\")\n202| images.append(\n203| {\n204| \"url\": pick_best_image_url(url_default),\n205| \"original\": url_default,\n206| \"width\": img.get(\"width\"),\n207| \"height\": img.get(\"height\"),\n208| \"livePhotoUrl\": live_url,\n209| }\n210| )\n211|\n212| # Video — best quality\n213| video = None\n214| v = note.get(\"video\")\n215| if v:\n216| stream = v.get(\"media\", {}).get(\"stream\", {})\n217| best_url = select_best_video(stream) or v.get(\"url\", \"\")\n218| if best_url:\n219| video = {\n220| \"url\": best_url,\n221| \"cover\": (v.get(\"cover\") or {}).get(\"urlDefault\", \"\"),\n222| \"duration\": v.get(\"duration\"),\n223| }\n224|\n225| # Stats\n226| info = note.get(\"interactInfo\", {})\n227|\n228| def _count(val):\n229| if isinstance(val, (int, float)):\n230| return int(val)\n231| if isinstance(val, str):\n232| return int(val) if val.isdigit() else 0\n233| return 0\n234|\n235| 
stats = {\n236| \"likes\": _count(info.get(\"likedCount\")),\n237| \"collects\": _count(info.get(\"collectedCount\")),\n238| \"comments\": _count(info.get(\"commentCount\")),\n239| \"shares\": _count(info.get(\"shareCount\")),\n240| }\n241|\n242| return {\n243| \"noteId\": note.get(\"noteId\", \"\"),\n244| \"title\": note.get(\"title\", \"\"),\n245| \"description\": note.get(\"desc\", \"\"),\n246| \"type\": \"video\" if note.get(\"video\") else \"image\",\n247| \"tags\": tags,\n248| \"publishDate\": publish_date,\n249| \"author\": author,\n250| \"images\": images,\n251| \"video\": video,\n252| \"stats\": stats,\n253| }\n254|\n255|\n256|# ─── Image URL Processing ────────────────────────────────────────────────────\n257|\n258|\n259|def extract_image_token(url: str) -> str:\n260| \"\"\"\n261| Extract image token from XHS CDN URL.\n262| Aligned with JoeanAmier/XHS-Downloader (10.4k stars):\n263| url.split('/')[5:] → join('/') → split('!')[0]\n264| Token may be multi-segment (contains directory prefix).\n265| \"\"\"\n266| if not url:\n267| return \"\"\n268|\n269| # Method 1: split(\"/\")[5:] — works for sns-webpic/sns-img format URLs\n270| parts = url.split(\"/\")\n271| if len(parts) > 5:\n272| token = \"/\".join(parts[5:]).split(\"!\")[0]\n273| if token:\n274| return token\n275|\n276| # Method 2: full pathname — works for ci.xiaohongshu.com short-path URLs\n277| try:\n278| parsed = urlparse(url)\n279| segments = [s for s in parsed.path.split(\"/\") if s]\n280| return \"/\".join(segments).split(\"!\")[0]\n281| except Exception:\n282| return \"\"\n283|\n284|\n285|def pick_best_image_url(url_default: str) -> str:\n286| \"\"\"Convert XHS image URL to no-watermark original quality.\"\"\"\n287| token = extract_image_token(url_default)\n288| if token:\n289| return f\"https://{SNS_IMG_CDN_DOMAINS[0]}/{token}\"\n290| # Fallback: strip query params\n291| try:\n292| parsed = urlparse(url_default)\n293| return f\"{parsed.scheme}://{parsed.netloc}{parsed.path}\"\n294| except 
Exception:\n295| return url_default\n296|\n297|\n298|# ─── Video URL Selection ─────────────────────────────────────────────────────\n299|\n300|\n301|def select_best_video(stream: dict) -> str | None:\n302| \"\"\"Select highest resolution video URL. Priority: h265 > h264.\"\"\"\n303| candidates = []\n304| for codec in [\"h265\", \"h264\", \"av1\"]:\n305| for s in stream.get(codec, []):\n306| master = s.get(\"masterUrl\", \"\")\n307| backups = s.get(\"backupUrls\", [])\n308| url = master or (backups[0] if backups else \"\")\n309| if url:\n310| w = s.get(\"width\", 0) or 0\n311| h = s.get(\"height\", 0) or 0\n312| candidates.append({\"url\": url, \"res\": w * h, \"codec\": codec})\n313|\n314| if not candidates:\n315| return None\n316|\n317| candidates.sort(key=lambda x: (x[\"res\"], x[\"codec\"] == \"h265\"), reverse=True)\n318| return candidates[0][\"url\"]\n319|\n320|\n321|# ─── CDN Retry ────────────────────────────────────────────────────────────────\n322|\n323|\n324|def build_candidate_urls(primary_url: str) -> list[str]:\n325| \"\"\"Build fallback CDN URL list for multi-node retry.\"\"\"\n326| urls = [primary_url]\n327| sns_match = re.search(r\"sns-img-\\w+\\.xhscdn\\.com\", primary_url)\n328| if sns_match:\n329| primary_domain = sns_match.group(0)\n330| for domain in SNS_IMG_CDN_DOMAINS:\n331| alt = primary_url.replace(primary_domain, domain)\n332| if alt != primary_url and alt not in urls:\n333| urls.append(alt)\n334| # ci.xiaohongshu.com fallback\n335| token = re.sub(r\"^https?://[^/]+/\", \"\", primary_url)\n336| if token:\n337| urls.append(f\"https://{FALLBACK_CDN}/{token}\")\n338| return urls\n339|\n340|\n341|# ─── File Download ────────────────────────────────────────────────────────────\n342|\n343|\n344|def download_file(\n345| url: str, save_path: Path, session: requests.Session, timeout: int = 30\n346|) -> bool:\n347| \"\"\"Download a file with multi-CDN fallback retry.\"\"\"\n348| candidates = build_candidate_urls(url)\n349|\n350| for i, 
candidate_url in enumerate(candidates):\n351| try:\n352| resp = session.get(candidate_url, timeout=timeout, stream=True)\n353| if resp.status_code == 200:\n354| save_path.parent.mkdir(parents=True, exist_ok=True)\n355| with open(save_path, \"wb\") as f:\n356| for chunk in resp.iter_content(chunk_size=8192):\n357| f.write(chunk)\n358| return True\n359| elif resp.status_code == 429:\n360| wait = min(2**i, 32)\n361| print(f\" rate limited, waiting {wait}s...\")\n362| time.sleep(wait)\n363| except requests.exceptions.Timeout:\n364| host = urlparse(candidate_url).netloc\n365| print(f\" timeout: {host}\")\n366| except Exception as e:\n367| print(f\" error: {e}\")\n368|\n369| return False\n370|\n371|\n372|# ─── Markdown Generation ─────────────────────────────────────────────────────\n373|\n374|\n375|def generate_markdown(data: dict, include_images: bool = True) -> str:\n376| \"\"\"Generate markdown content from structured note data.\"\"\"\n377| lines = [f\"# {data['title']}\", \"\"]\n378|\n379| # Meta\n380| lines.append(f\"作者: {data['author']['nickname']}\")\n381| lines.append(f\"来源: https://www.xiaohongshu.com/explore/{data['noteId']}\")\n382| if data[\"publishDate\"]:\n383| d = data[\"publishDate\"]\n384| if len(d) == 8:\n385| lines.append(f\"日期: {d[:4]}-{d[4:6]}-{d[6:8]}\")\n386| else:\n387| lines.append(f\"日期: {d}\")\n388| note_type = \"视频\" if data.get(\"video\") else \"图文\"\n389| lines.append(f\"类型: {note_type}\")\n390| if data[\"tags\"]:\n391| lines.append(f\"标签: {' '.join('#' + t for t in data['tags'])}\")\n392| lines.append(\"\")\n393|\n394| # Body\n395| if data[\"description\"]:\n396| lines.append(data[\"description\"])\n397| lines.append(\"\")\n398|\n399| # Image references\n400| if include_images and data[\"images\"]:\n401| lines.append(\"## 图片\")\n402| lines.append(\"\")\n403| for i, img in enumerate(data[\"images\"], 1):\n404| ext = guess_extension(img[\"url\"])\n405| lines.append(f\"![{i}]({i}{ext})\")\n406| lines.append(\"\")\n407|\n408| # Stats\n409| 
stats = data[\"stats\"]\n410| lines.append(\"## 数据\")\n411| lines.append(\"\")\n412| lines.append(\"| 指标 | 数量 |\")\n413| lines.append(\"|------|------|\")\n414| lines.append(f\"| 点赞 | {stats['likes']} |\")\n415| lines.append(f\"| 收藏 | {stats['collects']} |\")\n416| lines.append(f\"| 评论 | {stats['comments']} |\")\n417| lines.append(f\"| 分享 | {stats['shares']} |\")\n418| lines.append(\"\")\n419|\n420| return \"\\n\".join(lines)\n421|\n422|\n423|def guess_extension(url: str) -> str:\n424| \"\"\"Guess image file extension from URL.\"\"\"\n425| path = urlparse(url).path.lower()\n426| if \".png\" in path:\n427| return \".png\"\n428| if \".webp\" in path:\n429| return \".webp\"\n430| if \".gif\" in path:\n431| return \".gif\"\n432| return \".jpg\"\n433|\n434|\n435|# ─── Path Sanitization ───────────────────────────────────────────────────────\n436|\n437|\n438|def sanitize_segment(value: str, max_length: int = 80) -> str:\n439| \"\"\"\n440| Sanitize a path segment for filesystem compatibility.\n441| Strips Chinese punctuation, illegal chars, Windows reserved names.\n442| \"\"\"\n443| result = value\n444| # Remove angle brackets\n445| result = re.sub(r\"[<>]\", \"\", result)\n446| # Replace illegal/problematic chars with underscore\n447| result = re.sub(r'[\":\\/\\\\|?*\\x00-\\x1f]', \"_\", result)\n448| # Replace Chinese punctuation (Chrome downloads API rejects these)\n449| result = CHINESE_PUNCT.sub(\"_\", result)\n450| # Collapse .. sequences\n451| result = re.sub(r\"\\.{2,}\", \"_\", result)\n452| # Strip leading/trailing dots\n453| result = re.sub(r\"^\\.+\", \"\", result)\n454| result = re.sub(r\"\\.+$\", \"\", result)\n455| # Collapse multiple underscores\n456| result = re.sub(r\"_+\", \"_\", result)\n457| # Collapse whitespace\n458| result = re.sub(r\"\\s+\", \" \", result)\n459| result = result.strip(\". 
\")\n460|\n461| # Escape Windows reserved names\n462| name_part = result.split(\".\")[0]\n463| if WINDOWS_RESERVED.match(name_part):\n464| result = f\"_{result}\"\n465|\n466| # Truncate\n467| if len(result) > max_length:\n468| result = result[:max_length].rstrip(\"_ .\")\n469|\n470| return result or \"untitled\"\n471|\n472|\n473|def render_naming_template(template: str, data: dict) -> str:\n474| \"\"\"Render directory naming template with note data.\"\"\"\n475| result = template\n476| replacements = {\n477| \"{author}\": sanitize_segment(data[\"author\"][\"nickname\"]),\n478| \"{title}\": sanitize_segment(data[\"title\"]),\n479| \"{date}\": data[\"publishDate\"],\n480| \"{noteId}\": data[\"noteId\"],\n481| }\n482| for placeholder, value in replacements.items():\n483| result = result.replace(placeholder, value)\n484|\n485| # Sanitize each path segment\n486| segments = [sanitize_segment(s) for s in result.split(\"/\") if s]\n487| # Limit depth to 4\n488| if len(segments) > 4:\n489| head = segments[:3]\n490| tail = \"_\".join(segments[3:])\n491| segments = head + [tail]\n492|\n493| return \"/\".join(segments) if segments else \"untitled\"\n494|\n495|\n496|# ─── Main Download Logic ─────────────────────────────────────────────────────\n497|\n498|\n499|def download_note(\n500| url: str,\n501|", "total_lines": 762, "file_size": 26110, "truncated": true, "hint": "Use offset=501 to continue reading (showing 1-500 of 762 lines)", "is_binary": false, "is_image": false}
tool · 2026-06-13 01:15
{"content": "1|import os\n2|from pathlib import Path\n3|\n4|\n5|def get_output_dir():\n6| override = os.environ.get(\"XHS_DOWNLOAD_DIR\") or os.environ.get(\"XHS_OUTPUT_DIR\")\n7| path = Path(override).expanduser() if override else Path.cwd() / \"xhs-download\"\n8| path.mkdir(parents=True, exist_ok=True)\n9| return path\n10|\n11|\n12|DEFAULT_OUTPUT_DIR = get_output_dir()\n13|", "total_lines": 12, "file_size": 329, "truncated": false, "is_binary": false, "is_image": false}
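The `xhs_paths.py` output above resolves the output directory at import time and creates it as a side effect. The precedence it implies, together with the `-o` flag in `xhs_download.py`, can be sketched as a pure function. This is a minimal illustration, not the skill's code: `resolve_output_dir` and its parameters are hypothetical names, and the sketch deliberately skips the `mkdir` call.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_output_dir(
    cli_value: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
    cwd: Optional[Path] = None,
) -> Path:
    """Pick the output directory: CLI flag, then env vars, then ./xhs-download."""
    if cli_value:
        return Path(cli_value).expanduser()
    # Same order as xhs_paths.py: XHS_DOWNLOAD_DIR is checked before XHS_OUTPUT_DIR.
    override = env.get("XHS_DOWNLOAD_DIR") or env.get("XHS_OUTPUT_DIR")
    if override:
        return Path(override).expanduser()
    return (cwd or Path.cwd()) / "xhs-download"


# Hypothetical inputs, for illustration only.
print(resolve_output_dir(None, {"XHS_OUTPUT_DIR": "/tmp/a", "XHS_DOWNLOAD_DIR": "/tmp/b"}))
print(resolve_output_dir("out", {"XHS_DOWNLOAD_DIR": "/tmp/b"}))
```

Keeping the lookup free of side effects makes the precedence easy to test; the real module still creates the directory when it is imported.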
tool · 2026-06-13 01:15
{"content": "1|---\n2|name: \"xhs-down\"\n3|description: \"Use when the user wants to download Xiaohongshu note content, including text, original images, and best-quality video, or to batch-download all notes from an author profile using browser automation. Trigger on requests like 下载小红书, 保存笔记, 去水印, 抓取 XHS 图片, 抓取 RED 视频, 批量下载小红书博主, 下载这个作者全部笔记, or build an XHS downloader/scraper. Covers note markdown export, original-image reconstruction, video download, and author-profile collection. For real download requests, run the bundled scripts and return note/profile/output evidence instead of a pure checklist. Do not use for Douyin downloads, TikTok downloads, or pure writing tasks.\"\n4|---\n5|\n6|# XHS Down\n7|\n8|Downloader workflow for Xiaohongshu notes and author profiles.\n9|\n10|## Trigger hints\n11|\n12|- Trigger on: 下载小红书、保存笔记、去水印、抓图、抓视频、批量下载小红书博主、XHS downloader、RED 下载\n13|- Also trigger when the user wants note text plus original images/video, not just screenshots\n14|- Do not trigger on: 抖音下载、纯写作改写、无下载语境的小红书内容创作\n15|\n16|## Trigger map\n17|\n18|- Strong triggers:\n19| - \"下载这篇小红书笔记\"\n20| - \"把这个博主的笔记全抓下来\"\n21| - \"去水印保存原图\"\n22| - \"抓取 RED 视频和正文\"\n23| - \"做一个小红书下载器\"\n24|- Soft triggers:\n25| - User wants note text + media together, not just screenshots\n26| - User wants an author archive rather than a single saved image\n27|- Hard exclusions:\n28| - 抖音/Douyin/TikTok 下载\n29| - 纯写作、纯改写、纯选题\n30|\n31|## Overlap routing\n32|\n33|- If the user clearly means Douyin, route to `douyin-down`.\n34|- If the user wants to write Xiaohongshu content rather than download existing notes, route to the content-creation workflow instead.\n35|\n36|## When to use\n37|\n38|- Download one or more Xiaohongshu notes to markdown plus media.\n39|- Remove watermark-bearing image URLs by reconstructing original CDN paths.\n40|- Batch-download all notes from an author profile.\n41|- Build or reuse a Xiaohongshu scraping/download pipeline.\n42|\n43|## When not to use\n44|\n45|- Douyin 
download: use `douyin-down`.\n46|- A pure writing or repackaging task with no download requirement.\n47|\n48|## Prerequisites\n49|\n50|- `requests`\n51|- `agent-browser` for author-profile collection\n52|- A logged-in browser session when cookies or profile scraping are required\n53|\n54|## Workflow\n55|\n56|0. For profile cover-only downloads, run `scripts/xhs_profile_covers.py` and validate `manifest.json` with `scripts/validate_manifest.py`.\n57|1. For single-note downloads, run `scripts/xhs_download.py` directly with one or more note URLs.\n58|2. Add cookies only when authentication is required or when video access needs a logged-in session.\n59|3. For author-profile collection, use `scripts/xhs_author.py` with `agent-browser --auto-connect`.\n60|4. Prefer share links that include `xsec_token` for profile discovery.\n61|5. Keep output naming explicit when the user is preparing a downstream asset library.\n62|\n63|## Output Root\n64|\n65|Default output directory:\n66|\n67|```text\n68|`./xhs-download` by default, or `XHS_OUTPUT_DIR` / `XHS_DOWNLOAD_DIR` when set\n69|```\n70|\n71|The bundled scripts use that directory by default. Temporary overrides are allowed with `-o`, `XHS_DOWNLOAD_DIR`, or `XHS_OUTPUT_DIR`.\n72|\n73|## Downie Handoff\n74|\n75|Downie 4 is available on this Mac and can be used as a fast downloader once a supported page URL or direct media URL is known:\n76|\n77|```bash\n78|scripts/downie_handoff.py \"https://example.com/video-page\"\n79|```\n80|\n81|Use Downie for handoff/fallback downloading, especially for video pages Downie already supports or direct media URLs. Keep `xhs_download.py` as the source for XHS metadata, markdown, original image reconstruction, and cases needing cookies or XHS-specific parsing. 
Do not claim Downie can recover App-only hidden URLs; it still needs a real URL to process.\n82|\n83|## Bundled files\n84|\n85|- `scripts/xhs_download.py`\n86|- `scripts/xhs_author.py`\n87|- `scripts/downie_handoff.py`\n88|- `scripts/xhs_profile_covers.py`\n89|- `scripts/validate_manifest.py`\n90|- `references/xhs-patterns.md`\n91|\n92|## Operating rules\n93|\n94|- Treat the author workflow as browser-automation-first; API signing paths are intentionally not the default.\n95|- Preserve markdown output plus original media whenever possible.\n96|- Keep rate limiting and cookie requirements in mind for large batch runs.\n97|\n98|## Execution contract\n99|\n100|- If the user gives real Xiaohongshu note URLs or author profile targets and asks for download, you must execute the bundled script path or stop at a clearly stated blocker such as missing cookies/session state. Do not answer with only a checklist.\n101|- If the user asks for workflow explanation only, you may stay advisory, but the first line must say `执行模式:未执行(仅流程建议)`.\n102|- For author-archive collection, prefer the browser-automation path before speculating about API-only shortcuts.\n103|\n104|## Evidence contract\n105|\n106|- Every final answer must start with one of:\n107| - `执行模式:实际下载`\n108| - `执行模式:未执行(仅流程建议)`\n109|- For `执行模式:实际下载`, the answer must also include:\n110| - `目标对象:单篇笔记/作者主页`\n111| - `执行命令:...`\n112| - output directory or exported artifact summary\n113| - whether cookies or browser session were used\n114|- Do not claim the archive is complete unless you have exported-note or media-count evidence.\n115|\n116|## Failure contract\n117|\n118|- On bad note URL, missing `xsec_token`, browser session failure, or partial media export, return:\n119| - failure reason\n120| - command attempted\n121| - whether note download or author discovery succeeded\n122|- Never blur single-note success into full-author success.\n123|\n124|## Acceptance prompts\n125|\n126|- Trigger: `请使用 $xhs-down 下载这篇小红书笔记。`\n127|- 
Real execution: `请使用 $xhs-down 实际准备一次作者主页批量下载,先给执行命令,再给输出证据。`\n128|- Deep path: `请使用 $xhs-down 先做 author discovery,再做单篇或批量导出,并输出每一步结果。`\n129|\n130|## Notes\n131|\n132|- `.ruff_cache` from the source skill is intentionally not migrated.\n133|- This Codex version keeps the original download logic but uses a simplified routing contract.\n134|", "total_lines": 133, "file_size": 6109, "truncated": false, "is_binary": false, "is_image": false}
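The evidence contract in the SKILL.md above requires every final answer to open with one of two mode lines. A small check along these lines could catch answers that skip it. This is only a sketch: `first_line_declares_mode` is a hypothetical helper, not part of the skill, and the two strings are copied from the contract as shown.

```python
# The two opening lines permitted by the evidence contract.
ALLOWED_MODES = (
    "执行模式:实际下载",
    "执行模式:未执行(仅流程建议)",
)


def first_line_declares_mode(answer: str) -> bool:
    """Return True if the first non-blank line starts with an allowed mode line."""
    lines = answer.strip().splitlines()
    return bool(lines) and lines[0].strip().startswith(ALLOWED_MODES)


print(first_line_declares_mode("执行模式:实际下载\n目标对象:单篇笔记"))
print(first_line_declares_mode("Download finished."))
```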
assistant · 2026-06-13 01:15
tool · 2026-06-13 01:15
{"content": "501| output_dir: str = \".\",\n502| cookies: str = \"\",\n503| naming: str = \"{author}_{title}_{date}\",\n504| skip_images: bool = False,\n505| skip_video: bool = False,\n506| skip_text: bool = False,\n507| skip_live_photo: bool = False,\n508| proxy: str = \"\",\n509|) -> dict | None:\n510| \"\"\"\n511| Download a complete XHS note.\n512|\n513| Accepts any XHS URL format: explore links, discovery links,\n514| xhslink.com short links, or raw share text with embedded URLs.\n515|\n516| Returns structured note data dict on success, None on failure.\n517| \"\"\"\n518| # Normalize: resolve short links, extract from share text\n519| fetch_url = normalize_url(url)\n520|\n521| note_id = extract_note_id(fetch_url)\n522| if not note_id:\n523| print(f\"[ERR] Cannot extract note ID: {url}\")\n524| return None\n525|\n526| print(f\"Note ID: {note_id}\")\n527|\n528| # Setup session\n529| session = requests.Session()\n530| session.headers.update(DEFAULT_HEADERS)\n531| if cookies:\n532| session.headers[\"Cookie\"] = cookies\n533| if proxy:\n534| session.proxies = {\"http\": proxy, \"https\": proxy}\n535|\n536| # Fetch page — use the full resolved URL (may contain xsec_token\n537| # which allows cookie-free access via share links)\n538| print(\"Fetching page...\")\n539|\n540| try:\n541| resp = session.get(fetch_url, timeout=30)\n542| resp.raise_for_status()\n543| except Exception as e:\n544| print(f\"[ERR] Failed to fetch page: {e}\")\n545| return None\n546|\n547| # Parse note data\n548| raw_note = parse_initial_state(resp.text, note_id)\n549| if not raw_note:\n550| # Fallback: try clean explore URL (works with cookies)\n551| fallback_url = f\"https://www.xiaohongshu.com/explore/{note_id}\"\n552| if fallback_url != fetch_url:\n553| try:\n554| resp = session.get(fallback_url, timeout=30)\n555| raw_note = parse_initial_state(resp.text, note_id)\n556| except Exception:\n557| pass\n558| if not raw_note:\n559| print(\"[ERR] Failed to parse page data (may need 
cookies)\")\n560| return None\n561|\n562| data = extract_note_data(raw_note)\n563| note_type = \"video\" if data.get(\"video\") else \"image\"\n564| print(\n565| f\" {data['title']} | @{data['author']['nickname']} | {note_type}\"\n566| )\n567| print(\n568| f\" {len(data['images'])} images | \"\n569| f\"likes:{data['stats']['likes']} collects:{data['stats']['collects']}\"\n570| )\n571|\n572| # Build output directory\n573| dir_name = render_naming_template(naming, data)\n574| save_dir = Path(output_dir) / dir_name\n575| save_dir.mkdir(parents=True, exist_ok=True)\n576|\n577| results = {\"images\": 0, \"video\": False, \"text\": False, \"live_photos\": 0}\n578|\n579| # Download images\n580| if not skip_images and data[\"images\"]:\n581| print(f\"\\nDownloading {len(data['images'])} images...\")\n582| for i, img in enumerate(data[\"images\"], 1):\n583| ext = guess_extension(img[\"url\"])\n584| save_path = save_dir / f\"{i}{ext}\"\n585| status = \"OK\" if download_file(img[\"url\"], save_path, session) else \"FAIL\"\n586| print(f\" [{i}/{len(data['images'])}] {save_path.name} ... {status}\")\n587| results[\"images\"] += 1 if status == \"OK\" else 0\n588| if i < len(data[\"images\"]):\n589| time.sleep(0.3)\n590|\n591| # Download LivePhotos (video layer of images that have motion)\n592| live_count = sum(1 for img in data[\"images\"] if img.get(\"livePhotoUrl\"))\n593| if not skip_live_photo and live_count > 0:\n594| print(f\"\\nDownloading {live_count} LivePhotos...\")\n595| for i, img in enumerate(data[\"images\"], 1):\n596| live_url = img.get(\"livePhotoUrl\", \"\")\n597| if not live_url:\n598| continue\n599| save_path = save_dir / f\"{i}_live.mov\"\n600| if download_file(live_url, save_path, session, timeout=60):\n601| print(f\" {save_path.name} ... OK\")\n602| results[\"live_photos\"] += 1\n603| else:\n604| print(f\" {save_path.name} ... 
FAIL\")\n605| time.sleep(0.3)\n606|\n607| # Download video\n608| if not skip_video and data.get(\"video\") and data[\"video\"].get(\"url\"):\n609| print(\"\\nDownloading video...\")\n610| video_path = save_dir / \"video.mp4\"\n611| if download_file(data[\"video\"][\"url\"], video_path, session, timeout=120):\n612| print(\" video.mp4 ... OK\")\n613| results[\"video\"] = True\n614| else:\n615| print(\" video.mp4 ... FAIL\")\n616|\n617| # Generate markdown\n618| if not skip_text:\n619| print(\"\\nGenerating markdown...\")\n620| md = generate_markdown(data, include_images=not skip_images)\n621| md_path = save_dir / \"note.md\"\n622| md_path.write_text(md, encoding=\"utf-8\")\n623| print(\" note.md ... OK\")\n624| results[\"text\"] = True\n625|\n626| # Summary\n627| print(f\"\\n{'=' * 40}\")\n628| print(f\"Saved to: {save_dir}\")\n629| print(f\" Images: {results['images']}/{len(data['images'])}\")\n630| if live_count > 0:\n631| print(f\" Live: {results['live_photos']}/{live_count}\")\n632| if data.get(\"video\"):\n633| print(f\" Video: {'OK' if results['video'] else 'FAIL'}\")\n634| print(f\" Text: {'OK' if results['text'] else '-'}\")\n635|\n636| return data\n637|\n638|\n639|def _merge_metadata(existing: list, incoming: list) -> list:\n640| \"\"\"Merge metadata lists by noteId while preserving first-seen order.\"\"\"\n641| merged: dict[str, dict] = {}\n642| order: list[str] = []\n643|\n644| for item in existing + incoming:\n645| if not isinstance(item, dict):\n646| continue\n647| note_id = item.get(\"noteId\")\n648| if not note_id:\n649| continue\n650| if note_id not in merged:\n651| order.append(note_id)\n652| merged[note_id] = item\n653|\n654| return [merged[note_id] for note_id in order]\n655|\n656|\n657|def _load_existing_metadata(path: Path) -> list:\n658| if not path.exists():\n659| return []\n660| try:\n661| data = json.loads(path.read_text(encoding=\"utf-8\"))\n662| except (json.JSONDecodeError, OSError):\n663| return []\n664| return data if isinstance(data, 
list) else []\n665|\n666|\n667|# ─── CLI Entry Point ─────────────────────────────────────────────────────────\n668|\n669|\n670|def main():\n671| parser = argparse.ArgumentParser(\n672| description=\"XHS Note Downloader — 小红书笔记下载器\",\n673| formatter_class=argparse.RawDescriptionHelpFormatter,\n674| epilog=(\n675| \"Examples:\\n\"\n676| \" %(prog)s https://www.xiaohongshu.com/explore/abc123def456\\n\"\n677| \" %(prog)s URL1 URL2 URL3 -o ./downloads\\n\"\n678| ' %(prog)s URL -c \"a1=xxx; webId=yyy\"\\n'\n679| ' %(prog)s URL --naming \"{date}_{author}_{title}\"\\n'\n680| ),\n681| )\n682| parser.add_argument(\"urls\", nargs=\"+\", help=\"Note URLs (support multiple)\")\n683| default_output = str(DEFAULT_OUTPUT_DIR)\n684| parser.add_argument(\n685| \"-o\", \"--output\", default=default_output,\n686| help=f\"Output directory (default: {default_output})\",\n687| )\n688| parser.add_argument(\"-c\", \"--cookies\", default=\"\", help=\"Cookie string\")\n689| parser.add_argument(\"--cookie-file\", help=\"Path to cookie file\")\n690| parser.add_argument(\n691| \"--naming\",\n692| default=\"{author}_{title}_{date}\",\n693| help=\"Directory naming template (default: {author}_{title}_{date})\",\n694| )\n695| parser.add_argument(\"--no-images\", action=\"store_true\", help=\"Skip images\")\n696| parser.add_argument(\"--no-video\", action=\"store_true\", help=\"Skip video\")\n697| parser.add_argument(\"--no-text\", action=\"store_true\", help=\"Skip markdown\")\n698| parser.add_argument(\"--no-live-photo\", action=\"store_true\", help=\"Skip LivePhoto\")\n699| parser.add_argument(\n700| \"--proxy\", default=\"\", help=\"HTTP/SOCKS proxy (e.g. 
http://127.0.0.1:7890)\"\n701| )\n702| parser.add_argument(\n703| \"--delay\",\n704| type=float,\n705| default=2.0,\n706| help=\"Delay between notes in seconds (default: 2)\",\n707| )\n708| parser.add_argument(\"--json\", action=\"store_true\", help=\"Output JSON metadata\")\n709|\n710| args = parser.parse_args()\n711|\n712| # Load cookies\n713| cookies = args.cookies\n714| if args.cookie_file:\n715| cookie_path = Path(args.cookie_file)\n716| if cookie_path.exists():\n717| cookies = cookie_path.read_text().strip()\n718| else:\n719| print(f\"[WARN] Cookie file not found: {args.cookie_file}\")\n720|\n721| # Download each note\n722| success = 0\n723| all_data = []\n724| for i, url in enumerate(args.urls):\n725| if i > 0:\n726| print(f\"\\nWaiting {args.delay}s...\\n\")\n727| time.sleep(args.delay)\n728|\n729| print(f\"{'=' * 40}\")\n730| print(f\"[{i + 1}/{len(args.urls)}] {url}\")\n731| print(f\"{'=' * 40}\")\n732|\n733| data = download_note(\n734| url,\n735| args.output,\n736| cookies,\n737| args.naming,\n738| args.no_images,\n739| args.no_video,\n740| args.no_text,\n741| args.no_live_photo,\n742| args.proxy,\n743| )\n744| if data:\n745| success += 1\n746| all_data.append(data)\n747|\n748| print(f\"\\nDone: {success}/{len(args.urls)} notes downloaded\")\n749|\n750| if args.json and all_data:\n751| json_path = Path(args.output) / \"metadata.json\"\n752| metadata = _merge_metadata(_load_existing_metadata(json_path), all_data)\n753| json_path.write_text(\n754| json.dumps(metadata, ensure_ascii=False, indent=2), encoding=\"utf-8\"\n755| )\n756| print(f\"Metadata saved to: {json_path}\")\n757|\n758| return 0 if success == len(args.urls) else 1\n759|\n760|\n761|if __name__ == \"__main__\":\n762| sys.exit(main())\n763|", "total_lines": 762, "file_size": 26110, "truncated": false, "is_binary": false, "is_image": false}
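Since the request is about video, the part of the script that matters most is `select_best_video`: it collects every stream with a usable URL, then ranks by pixel count, with h265 winning ties. The sketch below restates that ranking in a self-contained form. `pick_best_stream` and the sample data are hypothetical; the field names (`masterUrl`, `backupUrls`, `width`, `height`) are the ones the script reads.

```python
from typing import Optional


def pick_best_stream(stream: dict) -> Optional[str]:
    """Return the URL of the highest-resolution stream; h265 wins ties."""
    candidates = []
    for codec in ("h265", "h264", "av1"):
        for entry in stream.get(codec) or []:
            backups = entry.get("backupUrls") or []
            # Prefer the master URL, fall back to the first backup URL.
            url = entry.get("masterUrl") or (backups[0] if backups else "")
            if not url:
                continue
            pixels = (entry.get("width") or 0) * (entry.get("height") or 0)
            candidates.append((pixels, codec == "h265", url))
    if not candidates:
        return None
    # Compare on (pixels, is_h265): resolution first, codec preference second.
    return max(candidates, key=lambda c: (c[0], c[1]))[2]


# Hypothetical stream data, shaped like the fields the script reads.
sample = {
    "h264": [
        {"masterUrl": "https://example.com/a_1080.mp4", "width": 1920, "height": 1080},
    ],
    "h265": [
        {"masterUrl": "https://example.com/b_720.mp4", "width": 1280, "height": 720},
        {"masterUrl": "", "backupUrls": ["https://example.com/b_1080.mp4"],
         "width": 1920, "height": 1080},
    ],
}
print(pick_best_stream(sample))
```

In the sample, the h264 and h265 entries at 1920x1080 tie on resolution, so the h265 backup URL is chosen even though its master URL is empty.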
assistant · 2026-06-13 01:15
OK, let me first check the format of the cookie file, then run the download.
tool · 2026-06-13 01:15
{"content": "1|[\n2| {\n3| \"name\": \"x-rednote-holderctry\",\n4| \"value\": \"CN\",\n5| \"domain\": \".xiaohongshu.com\",\n6| \"path\": \"/\",\n7| \"expires\": 1812819922.377634,\n8| \"httpOnly\": true,\n9| \"secure\": true\n10| },\n11| {\n12| \"name\": \"id_token\",\n13| \"value\": \"VjEAAD51OQYLzMtXunRUuNdBvuMl0A3J3xdUmokdgxrvQ3Z42w9u85As8HKXz6gaNaNqsKq2apmfpGcqizPR+GUgNnk5NDm0DtF//FCOzVnfxixNGbQyKkWCLNksZQIFNTayk1hb\",\n14| \"domain\": \".xiaohongshu.com\",\n15| \"path\": \"/\",\n16| \"expires\": 1812819922.377586,\n17| \"httpOnly\": true,\n18| \"secure\": true\n19| },\n20| {\n21| \"name\": \"web_session\",\n22| \"value\": \"040069b2a5ad95ca39363b7419384b51ff1eff\",\n23| \"domain\": \".xiaohongshu.com\",\n24| \"path\": \"/\",\n25| \"expires\": 1812819922.3775,\n26| \"httpOnly\": true,\n27| \"secure\": true\n28| },\n29| {\n30| \"name\": \"gid\",\n31| \"value\": \"yjdDSSDyd2yfyjdDSSD8fWk9dYTjCk6Uv048EqiJhE1T8A283jAvvk888JJYjJy8fJ0828jK\",\n32| \"domain\": \".xiaohongshu.com\",\n33| \"path\": \"/\",\n34| \"expires\": 1815843897.437168,\n35| \"httpOnly\": false,\n36| \"secure\": false\n37| },\n38| {\n39| \"name\": \"x-rednote-datactry\",\n40| \"value\": \"CN\",\n41| \"domain\": \".xiaohongshu.com\",\n42| \"path\": \"/\",\n43| \"expires\": 1812819922.377613,\n44| \"httpOnly\": true,\n45| \"secure\": true\n46| },\n47| {\n48| \"name\": \"acw_tc\",\n49| \"value\": \"0a00db8717812838834031609e0618ebf9b77400141bb7fd2eb70cc5888727\",\n50| \"domain\": \"www.xiaohongshu.com\",\n51| \"path\": \"/\",\n52| \"expires\": 1781285683.58038,\n53| \"httpOnly\": true,\n54| \"secure\": false\n55| },\n56| {\n57| \"name\": \"sec_poison_id\",\n58| \"value\": \"1f7f2be5-61a4-49ec-83e0-78bb1fd2233a\",\n59| \"domain\": \".xiaohongshu.com\",\n60| \"path\": \"/\",\n61| \"expires\": 1781284489,\n62| \"httpOnly\": false,\n63| \"secure\": false\n64| },\n65| {\n66| \"name\": \"websectiga\",\n67| \"value\": \"cf46039d1971c7b9a650d87269f31ac8fe3bf71d61ebf9d9a0a87efb424b816c\",\n68| \"domain\": 
\".xiaohongshu.com\",\n69| \"path\": \"/\",\n70| \"expires\": 1781543084,\n71| \"httpOnly\": false,\n72| \"secure\": false\n73| },\n74| {\n75| \"name\": \"a1\",\n76| \"value\": \"19ebccb0de8k9xgmpya402vwlk0i9qyy17wqggt3f50000228921\",\n77| \"domain\": \".xiaohongshu.com\",\n78| \"path\": \"/\",\n79| \"expires\": 1812819884,\n80| \"httpOnly\": false,\n81| \"secure\": false\n82| },\n83| {\n84| \"name\": \"loadts\",\n85| \"value\": \"1781283884506\",\n86| \"domain\": \".xiaohongshu.com\",\n87| \"path\": \"/\",\n88| \"expires\": 1812819884,\n89| \"httpOnly\": false,\n90| \"secure\": false\n91| },\n92| {\n93| \"name\": \"webBuild\",\n94| \"value\": \"6.19.0\",\n95| \"domain\": \".xiaohongshu.com\",\n96| \"path\": \"/\",\n97| \"expires\": -1,\n98| \"httpOnly\": false,\n99| \"secure\": false\n100| },\n101| {\n102| \"name\": \"unread\",\n103| \"value\": \"{%22ub%22:%2263f593d20000000012032b62%22%2C%22ue%22:%2263dbcc480000000019020c09%22%2C%22uc%22:24}\",\n104| \"domain\": \".xiaohongshu.com\",\n105| \"path\": \"/\",\n106| \"expires\": -1,\n107| \"httpOnly\": false,\n108| \"secure\": false\n109| },\n110| {\n111| \"name\": \"ets\",\n112| \"value\": \"1781283884379\",\n113| \"domain\": \".xiaohongshu.com\",\n114| \"path\": \"/\",\n115| \"expires\": 1783875884.379787,\n116| \"httpOnly\": false,\n117| \"secure\": false\n118| },\n119| {\n120| \"name\": \"xsecappid\",\n121| \"value\": \"xhs-pc-web\",\n122| \"domain\": \".xiaohongshu.com\",\n123| \"path\": \"/\",\n124| \"expires\": 1812819884,\n125| \"httpOnly\": false,\n126| \"secure\": false\n127| },\n128| {\n129| \"name\": \"webId\",\n130| \"value\": \"2a464b0c0aa404c693087d6e8133abcb\",\n131| \"domain\": \".xiaohongshu.com\",\n132| \"path\": \"/\",\n133| \"expires\": 1812819884,\n134| \"httpOnly\": false,\n135| \"secure\": false\n136| },\n137| {\n138| \"name\": \"abRequestId\",\n139| \"value\": \"b01fbc57-5a62-5c14-b4bd-1bfd547aef1d\",\n140| \"domain\": \".xiaohongshu.com\",\n141| \"path\": \"/\",\n142| \"expires\": 
1812819883.580553,\n143| \"httpOnly\": false,\n144| \"secure\": false\n145| }\n146|]", "total_lines": 145, "file_size": 3451, "truncated": false, "is_binary": false, "is_image": false}
assistant · 2026-06-13 01:15
The cookie file is in JSON format, but the script expects a cookie string. I'll write a helper script to convert it and run the download.
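A minimal sketch of that conversion, assuming the export is a JSON array of objects with `name` and `value` fields, as in the file shown above. The function names `cookie_header_from_export` and `load_cookie_header` are hypothetical and are not part of the xhs-down skill.

```python
import json
from pathlib import Path


def cookie_header_from_export(cookies: list[dict]) -> str:
    """Join exported cookie objects into a 'name=value; name=value' string."""
    parts = [
        f"{c['name']}={c['value']}"
        for c in cookies
        # Skip entries with a missing or empty name/value.
        if c.get("name") and c.get("value")
    ]
    return "; ".join(parts)


def load_cookie_header(path: str) -> str:
    """Read a JSON cookie export from disk and convert it (hypothetical helper)."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return cookie_header_from_export(data)
```

The resulting string can then be passed to the downloader through its `-c` / `--cookies` option.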
tool · 2026-06-13 01:15
{"bytes_written": 1177, "dirs_created": true, "lint": {"status": "ok", "output": ""}, "resolved_path": "C:\\Users\\forgase\\Downloads\\_download_xhs_video.py", "files_modified": ["C:\\Users\\forgase\\Downloads\\_download_xhs_video.py"]}
assistant · 2026-06-13 01:15
tool · 2026-06-13 01:15
{"output": "Python was not found; run without arguments to install from the Microsoft Store, or disable this shortcut from Settings > Apps > Advanced app settings > App execution aliases.", "exit_code": 49, "error": null}
assistant · 2026-06-13 01:15
tool · 2026-06-13 01:15
{"output": "/c/Users/forgase/AppData/Local/hermes/hermes-agent/venv/Scripts/python\nPython 3.11.15", "exit_code": 0, "error": null}
assistant · 2026-06-13 01:15
tool · 2026-06-13 01:15
{"output": "Running: C:\\Users\\forgase\\AppData\\Local\\hermes\\hermes-agent\\venv\\Scripts\\python.exe C:\\Users\\forgase\\.claude\\skills\\xhs-down\\scripts\\xhs_download.py https://www.xiaohongshu.com/discovery/item/6a2c2225000000001702ef10 -o C:/Users/forgase/Downloads -c ...\r\n========================================\r\n[1/1] https://www.xiaohongshu.com/discovery/item/6a2c2225000000001702ef10\r\n========================================\r\nNote ID: 6a2c2225000000001702ef10\r\nFetching page...\r\n[ERR] Failed to parse page data (may need cookies)\r\n\r\nDone: 0/1 notes downloaded\r\n\r\n\r\nReturn code: 1", "exit_code": 0, "error": null}
assistant · 2026-06-13 01:15
Page parsing failed. Let me check what the page actually returned.
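One way to inspect the page is to pull the whole `__INITIAL_STATE__` object out of the HTML. A non-greedy regex such as `\{.+?\}` stops at the first closing brace, so the sketch below scans for the matching brace instead. It assumes the state is a single object literal and that bare `undefined` values are the only non-JSON tokens. The function name `extract_initial_state` is hypothetical.

```python
from __future__ import annotations

import json
import re


def extract_initial_state(html: str) -> dict | None:
    """Return the parsed __INITIAL_STATE__ object, or None if it is absent."""
    start = html.find("__INITIAL_STATE__")
    if start < 0:
        return None
    brace = html.find("{", start)
    if brace < 0:
        return None

    depth = 0
    in_string = False
    escaped = False
    for i in range(brace, len(html)):
        ch = html[i]
        if in_string:
            # Inside a string literal, braces do not count toward nesting.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                raw = html[brace:i + 1]
                # Assumption: bare `undefined` values are the only non-JSON
                # tokens; replace them (but not "undefined" keys) with null.
                raw = re.sub(r"(?<=[:,\[])undefined(?=[,}\]])", "null", raw)
                return json.loads(raw)
    return None
```

Because the replacement only touches `undefined` between delimiters, a key that is literally `"undefined"` is left as it is, which makes an empty placeholder entry easier to spot.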
tool · 2026-06-13 01:15
{"bytes_written": 2249, "dirs_created": true, "lint": {"status": "ok", "output": ""}, "resolved_path": "C:\\Users\\forgase\\Downloads\\_debug_xhs.py", "files_modified": ["C:\\Users\\forgase\\Downloads\\_debug_xhs.py"]}
assistant · 2026-06-13 01:15
tool · 2026-06-13 01:15
{"output": "=== Try 1: https://www.xiaohongshu.com/discovery/item/6a2c2225000000001702ef10 ===\r\nStatus: 200\r\nLength: 704079\r\nHas __INITIAL_STATE__: True\r\nMatch found, length: 254\r\n\r\n=== Try 2: https://www.xiaohongshu.com/explore/6a2c2225000000001702ef10 ===\r\nStatus: 200\r\nFinal URL: https://www.xiaohongshu.com/404?source=/404/sec_EEUiAjJV?redirectPath=https%3A%2F%2Fwww.xiaohongshu.com%2Fexplore%2F6a2c2225000000001702ef10&error_code=300031&error_msg=%E5%BD%93%E5%89%8D%E7%AC%94%E8%AE%B0%E6%9A%82%E6%97%B6%E6%97%A0%E6%B3%95%E6%B5%8F%E8%A7%88&uuid=eefa458f-10a8-4613-b661-3129a1168e45&verifyMsg=\r\nLength: 704079\r\nHas __INITIAL_STATE__: True", "exit_code": 0, "error": null}
assistant · 2026-06-13 01:15
tool · 2026-06-13 01:15
{"bytes_written": 3261, "dirs_created": true, "lint": {"status": "ok", "output": ""}, "resolved_path": "C:\\Users\\forgase\\Downloads\\_debug_xhs2.py", "files_modified": ["C:\\Users\\forgase\\Downloads\\_debug_xhs2.py"]}
assistant · 2026-06-13 01:15
tool · 2026-06-13 01:15
{"output": "Response length: 704079\r\nNo match with simple regex\r\nPattern '__INITIAL_STATE__\\s*=\\s*(\\{.+?\\})' matched, length: 254", "exit_code": 0, "error": null}
assistant · 2026-06-13 01:15
tool · 2026-06-13 01:15
{"bytes_written": 1394, "dirs_created": true, "lint": {"status": "error", "output": "SyntaxError: invalid syntax (line 33, column 52)"}, "resolved_path": "C:\\Users\\forgase\\Downloads\\_debug_xhs3.py", "files_modified": ["C:\\Users\\forgase\\Downloads\\_debug_xhs3.py"]}
assistant · 2026-06-13 01:15
tool · 2026-06-13 01:15
{"success": true, "diff": "--- a/C:\\Users\\forgase\\Downloads\\_debug_xhs3.py\n+++ b/C:\\Users\\forgase\\Downloads\\_debug_xhs3.py\n@@ -30,7 +30,7 @@\n else:\n print(\"__INITIAL_STATE__ not found!\")\n # Search for noteDetailMap or other patterns\n- for term in [\"noteDetailMap\", \"noteMap\", \"note\":]:\n+ for term in [\"noteDetailMap\", \"noteMap\"]:\n p = r.text.find(term)\n if p >= 0:\n print(f\"Found '{term}' at {p}\")\n", "files_modified": ["C:\\Users\\forgase\\Downloads\\_debug_xhs3.py"], "lint": {"status": "ok", "output": ""}, "resolved_path": "C:\\Users\\forgase\\Downloads\\_debug_xhs3.py"}
assistant · 2026-06-13 01:15
tool · 2026-06-13 01:15
{"output": "Traceback (most recent call last):\r\n File \"C:\\Users\\forgase\\Downloads\\_debug_xhs3.py\", line 8, in <module>\r\n cookies_data = json.loads(cookie_path.read_text(encoding=\"utf-8\"))\r\n ^^^^\r\nNameError: name 'json' is not defined", "exit_code": 1, "error": null}
assistant · 2026-06-13 01:15
tool · 2026-06-13 01:15
{"content": "1|#!/usr/bin/env python3\n2|\"\"\"Find __INITIAL_STATE__ with proper context.\"\"\"\n3|import re\n4|import requests\n5|from pathlib import Path\n6|\n7|cookie_path = Path(\"C:/Users/forgase/.xhs-mcp/cookies.json\")\n8|cookies_data = json.loads(cookie_path.read_text(encoding=\"utf-8\"))\n9|cookie_parts = [f\"{c['name']}={c['value']}\" for c in cookies_data if c.get('name') and c.get('value')]\n10|cookie_str = \"; \".join(cookie_parts)\n11|", "total_lines": 37, "file_size": 1385, "truncated": true, "hint": "Use offset=11 to continue reading (showing 1-10 of 37 lines)", "is_binary": false, "is_image": false}
assistant · 2026-06-13 01:15
tool · 2026-06-13 01:15
{"success": true, "diff": "--- a/C:\\Users\\forgase\\Downloads\\_debug_xhs3.py\n+++ b/C:\\Users\\forgase\\Downloads\\_debug_xhs3.py\n@@ -1,5 +1,6 @@\n #!/usr/bin/env python3\n \"\"\"Find __INITIAL_STATE__ with proper context.\"\"\"\n+import json\n import re\n import requests\n from pathlib import Path\n", "files_modified": ["C:\\Users\\forgase\\Downloads\\_debug_xhs3.py"], "lint": {"status": "ok", "output": ""}, "_warning": "C:\\Users\\forgase\\Downloads\\_debug_xhs3.py was last read with offset/limit pagination (partial view). Re-read the whole file before overwriting it.", "resolved_path": "C:\\Users\\forgase\\Downloads\\_debug_xhs3.py"}
assistant · 2026-06-13 01:15
tool · 2026-06-13 01:15
{"output": "Found at position 692249\r\n__INITIAL_STATE__={\"global\":{\"appSettings\":{\"notificationInterval\":30,\"prefetchTimeout\":3001,\"prefetchRedisExpires\":259200000,\"webLabRedisExpires\":864000000,\"searchFilterGuideConfig\":{\"maxDailyShow\":1,\"maxTotalShow\":3,\"showInterval\":1,\"validDays\":15,\"autoCloseDelay\":5000},\"retryFeeds\":true,\"worldCupSSRDowngrade\":false,\"worldCupNightRedirectEnable\":\"enable\",\"grayModeConfig\":{\"global\":false,\"dateRange\":[\"2023-08-01 00:00:00\",\"2023-08-19 23:59:59\"],\"greyRule\":{\"layout\":{\"enable\":false,\"pages\":[\"Explore\"]},\"pages\":[\"Explore\"]},\"disableLikeNotes\":[\"64ce36f7000000000c036ba5\"],\"disableSearchHint\":false},\"NIO\":true,\"ICPInfoList\":[{\"label\":\"沪ICP备13030189号\",\"link\":\"\\u002F\\u002Fbeian.miit.gov.cn\\u002F\",\"title\":\"小红书_沪ICP备\"},{\"label\":\"营业执照\",\"link\":\"\\u002F\\u002Ffe-video-qc.xhscdn.com\\u002Ffe-platform\\u002F5581076bd6b6af2e0e943abb024ad0e16f2ebff6.pdf\",\"title\":\"小红书_营业执照\"},{\"label\":\"2024沪公网安备31010102002533号\",\"link\":\"\\u002F\\u002Fwww.beian.gov.cn\\u002Fportal\\u002FregisterSystemInfo?recordcode=31010102002533\",\"title\":\"小红书_沪公网安备\"},{\"label\":\"增值电信业务经营许可证:沪B2-20150021\",\"link\":\"\\u002F\\u002Ffe-video-qc.xhscdn.com\\u002Ffe-platform-file\\u002F104101b831hhkkll23u0678gtks7tu70004en2n231udpe\",\"title\":\"小红书_网文\"},{\"label\":\"医疗器械网络交易服务第三方平台备案:(沪)网械平台备字[2019]第00006号\",\"link\":\"\\u002F\\u002Ffe-video-qc.xhscdn.com\\u002Ffe-platform\\u002F410dce57bc12a6d7e5808060e47644fbe46f68ff.pdf\",\"title\":\"小红书_医疗器械网络交易服务\"},{\"label\":\"互联网药品信息服务资格证书:(沪)-经营性-2023-0144\",\"link\":\"\\u002F\\u002Ffe-video-qc.xhscdn.com\\u002Ffe-platform\\u002Ff37a08cacc088061beb38329c387c32fc48fc6fe.pdf\",\"title\":\"小红书_互联网药品信息服务\"},{\"label\":\"违法不良信息举报电话:4006676810\",\"link\":\"\\u002F\\u002Fwww.shjbzx.cn\",\"title\":\"小红书_上海市互联网举报中心\"},{\"label\":\"上海市互联网举报中心\",\"link\":\"\\u002F\\u002Fwww.shjbzx.cn\",\"title\":\"小红书_上海市互联网举报中心\"},{\"label\":\"网上有害信息举报专区\",\"
link\":\"\\u002F\\u002Fwww.12377.cn\",\"title\":\"网上有害信息举报专区\"},{\"label\":\"自营经营者信息\",\"link\":\"\\u002F\\u002Fdc.xhscdn.com\\u002F06c2adb0-b353-11e9-9d0c-7be9ff8961c1\\u002F自营经营者信息公示.pdf\",\"title\":\"小红书_沪公网安备\"},{\"label\":\"网络文化经营许可证:沪网文(2024)1344-086号\",\"li", "exit_code": 0, "error": null}
assistant · 2026-06-13 01:15
tool · 2026-06-13 01:15
{"bytes_written": 3764, "dirs_created": true, "lint": {"status": "ok", "output": ""}, "resolved_path": "C:\\Users\\forgase\\Downloads\\_debug_xhs4.py", "files_modified": ["C:\\Users\\forgase\\Downloads\\_debug_xhs4.py"]}
assistant · 2026-06-13 01:15
tool · 2026-06-13 01:15
{"output": "Extracted initial state length: 11789\r\nState keys: ['global', 'user', 'board', 'UserFileStore', 'login', 'worldCup26', 'feed', 'layout', 'aiAbExp', 'search', 'conversation', 'traceSourceStore', 'AiSugDataForChat', 'activity', 'sportSearchOnebox', 'note', 'nioStore', 'liveList', 'liveAbExp', 'redAbExp', 'red', 'notification']\r\nnote keys: ['prevRouteData', 'prevRoute', 'commentTarget', 'isImgFullscreen', 'gotoPage', 'firstNoteId', 'autoOpenNote', 'topCommentId', 'noteDetailMap', 'serverRequestInfo', 'volume', 'recommendVideoMap', 'videoFeedType', 'rate', 'currentNoteId', 'forceScrollToComment', 'mediaWidth', 'noteHeight', 'redFirstNoteId']\r\nnoteDetailMap keys: ['null']\r\nTrying first key: null\r\nNote data keys: []\r\nHas video: False\r\n\r\nTitle: \r\nDesc: \r\nType: image", "exit_code": 0, "error": null}
assistant · 2026-06-13 01:15
tool · 2026-06-13 01:15
{"bytes_written": 1641, "dirs_created": true, "lint": {"status": "ok", "output": ""}, "resolved_path": "C:\\Users\\forgase\\Downloads\\_debug_xhs5.py", "files_modified": ["C:\\Users\\forgase\\Downloads\\_debug_xhs5.py"]}
assistant · 2026-06-13 01:15
tool · 2026-06-13 01:15
{"success": true, "diff": "--- a/C:\\Users\\forgase\\Downloads\\_debug_xhs5.py\n+++ b/C:\\Users\\forgase\\Downloads\\_debug_xhs5.py\n@@ -38,7 +38,7 @@\n print(f\" Context: ...{text[max(0,p-150):p+200]}...\")\n \n # Also search for video-related patterns\n-for pat in [r'\"video\":\\{', r'\"masterUrl\"', r'\"stream\"', r'\"media\"', r'\"noteDetailMap\"']:\n+for pat in [r'\"video\":\\{', r'\"masterUrl\"', r'\"stream\"', r'\"media\":', r'\"noteDetailMap\"']:\n matches = list(re.finditer(pat, text))\n if matches:\n print(f\"\\nPattern '{pat}': {len(matches)} matches\")\n", "files_modified": ["C:\\Users\\forgase\\Downloads\\_debug_xhs5.py"], "lint": {"status": "ok", "output": ""}, "resolved_path": "C:\\Users\\forgase\\Downloads\\_debug_xhs5.py"}
assistant · 2026-06-13 01:15
tool · 2026-06-13 01:15
{"output": "Traceback (most recent call last):\r\n File \"C:\\Users\\forgase\\Downloads\\_debug_xhs5.py\", line 8, in <module>\r\n cookies_data = json.loads(cookie_path.read_text(encoding=\"utf-8\"))\r\n ^^^^\r\nNameError: name 'json' is not defined", "exit_code": 1, "error": null}
assistant · 2026-06-13 01:15
tool · 2026-06-13 01:15
{"success": true, "diff": "--- a/C:\\Users\\forgase\\Downloads\\_debug_xhs5.py\n+++ b/C:\\Users\\forgase\\Downloads\\_debug_xhs5.py\n@@ -1,5 +1,6 @@\n #!/usr/bin/env python3\n \"\"\"Search for note data in HTML page.\"\"\"\n+import json\n import requests\n import re\n from pathlib import Path\n", "files_modified": ["C:\\Users\\forgase\\Downloads\\_debug_xhs5.py"], "lint": {"status": "ok", "output": ""}, "resolved_path": "C:\\Users\\forgase\\Downloads\\_debug_xhs5.py"}
assistant · 2026-06-13 01:15
tool · 2026-06-13 01:15
{"output": "Found note ID '6a2c2225000000001702ef10' at 0 positions\r\n\r\nPattern '\"noteDetailMap\"': 1 matches\r\n at 702352: ...\"noteDetailMap\":{\"undefined\":{\"comments\":{\"list\":[],\"cursor\":\"\",\"hasMore\":true,\"loading\":false,\"firstRequestFinish\":false},\"currentTime\":0,\"note\":{}}},\"serverRequestInfo\":{\"state\":\"success\",\"errorCode...", "exit_code": 0, "error": null}
assistant · 2026-06-13 01:15
Using the short link together with `xsec_token` may work. Let me try resolving the short link and using the script directly.
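A rough sketch of the idea: once a short link has been resolved (for example by following redirects with `requests.get(short_url, allow_redirects=True).url`), the note ID and `xsec_token` can be read from the final URL. Which query parameters the site actually requires is an assumption here; this sketch keeps only `xsec_token` and `xsec_source`. The function name `note_url_with_token` is hypothetical.

```python
from __future__ import annotations

import re
from urllib.parse import parse_qs, urlencode, urlparse

# 24-character hex note ID under /explore/ or /discovery/item/
NOTE_ID = re.compile(r"/(?:explore|discovery/item)/([a-f0-9]{24})")


def note_url_with_token(resolved_url: str) -> str | None:
    """Build a trimmed note URL that keeps xsec_token, or None if it is missing."""
    parsed = urlparse(resolved_url)
    match = NOTE_ID.search(parsed.path)
    if not match:
        return None

    query = parse_qs(parsed.query)
    token = query.get("xsec_token", [""])[0]
    if not token:
        return None

    params = {"xsec_token": token}
    source = query.get("xsec_source", [""])[0]
    if source:
        params["xsec_source"] = source

    # urlencode re-escapes characters such as '=' in the token.
    return (
        "https://www.xiaohongshu.com/discovery/item/"
        f"{match.group(1)}?{urlencode(params)}"
    )
```

A URL without `xsec_token` returns `None`, so the caller can fall back to another approach rather than request a page that has no usable note data.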
tool · 2026-06-13 01:15
{"bytes_written": 2415, "dirs_created": true, "lint": {"status": "ok", "output": ""}, "resolved_path": "C:\\Users\\forgase\\Downloads\\_debug_shortlink.py", "files_modified": ["C:\\Users\\forgase\\Downloads\\_debug_shortlink.py"]}
assistant · 2026-06-13 01:15
tool · 2026-06-13 01:15
{"output": "Resolved URL: https://www.xiaohongshu.com/discovery/item/6a2c2225000000001702ef10?app_platform=android&ignoreEngage=true&app_version=9.30.0&share_from_user_hidden=true&xsec_source=app_share&type=video&xsec_token=CB5slPom7beBylqUADSvyZpR9m4W7UZWG3q-VKw91n5GY%3D&author_share=1&shareRedId=ODlGMUhKSE02NzUyOTgwNjg5OTc6Pjw5&apptime=1781281376&share_id=53203f16244146ce9693f2ebd6bfd9ee&share_channel=copy_link&appuid=63d1eedd0000000026006960&xhsshare=CopyLink\r\nStatus: 200\r\nHas xsec_token!\r\nFetch status: 200\r\nHas __INITIAL_STATE__: True\r\nHas noteDetailMap: True\r\nInitial state len: 18332\r\nnoteDetailMap keys: ['6a2c2225000000001702ef10']\r\nKey '6a2c2225000000001702ef10': has video=True, title=是哪个科研牛马在熬夜改fig图🧐\r\n Video URL: http://sns-video-v6.xhscdn.com/stream/1/110/258/01ea2c21d92f6ef4010370019ebc65a454_258.mp4?sign=9391", "exit_code": 0, "error": null}
assistant · 2026-06-13 01:15
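Found the key! The resolved short link includes `xsec_token`, which makes it possible to fetch the note data without cookies. The video URL has already been extracted. Now re-run the download script with the short link: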
tool · 2026-06-13 01:15
{"bytes_written": 1140, "dirs_created": true, "lint": {"status": "ok", "output": ""}, "resolved_path": "C:\\Users\\forgase\\Downloads\\_run_xhs_download.py", "files_modified": ["C:\\Users\\forgase\\Downloads\\_run_xhs_download.py"]}
assistant · 2026-06-13 01:15