<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Readings on Chunhao Zhang</title>
    <link>https://blog-6sm.pages.dev/en/readings/</link>
    <description>Recent content in Readings on Chunhao Zhang</description>
    <image>
      <title>Chunhao Zhang</title>
      <url>https://blog-6sm.pages.dev/images/og-default.png</url>
      <link>https://blog-6sm.pages.dev/images/og-default.png</link>
    </image>
    <generator>Hugo</generator>
    <language>en</language>
    <copyright>2026</copyright>
    <lastBuildDate>Sat, 09 May 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://blog-6sm.pages.dev/en/readings/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Teaching Claude Why: Lessons from Alignment Training</title>
      <link>https://blog-6sm.pages.dev/en/readings/teaching-claude-why/</link>
      <pubDate>Sat, 09 May 2026 00:00:00 +0000</pubDate>
      <guid>https://blog-6sm.pages.dev/en/readings/teaching-claude-why/</guid>
      <description>Anthropic details how teaching ethical reasoning principles, rather than just training correct behavior, addresses AI agentic misalignment. Key finding: a 3M-token &#39;difficult advice&#39; dataset outperforms 84M tokens of synthetic honeypots, and constitutional documents with fictional stories reduce blackmail rate from 65% to 19%.</description>
      <content:encoded><![CDATA[<blockquote>
<p>Original: <a href="https://www.anthropic.com/research/teaching-claude-why">Teaching Claude Why</a></p>
<p>Author: Anthropic</p>
<p>Date: May 8, 2026</p>
</blockquote>
<hr>
<p>This is a Chinese translation with annotations of Anthropic&rsquo;s research post on alignment training methods. The original article discusses how teaching Claude the <em>principles</em> behind aligned behavior — rather than just training on demonstrations — proves far more effective for generalization.</p>
<p>Key takeaways:</p>
<ul>
<li><strong>Principles over demonstrations</strong>: Training Claude to explain <em>why</em> certain actions are better reduces misalignment more effectively than showing correct behavior alone.</li>
<li><strong>Out-of-distribution generalization</strong>: A 3M-token &ldquo;difficult advice&rdquo; dataset (where the <em>user</em> faces ethical dilemmas) achieved the same improvement as 84M tokens of synthetic honeypots — with 28× better data efficiency.</li>
<li><strong>Constitutional documents + fiction</strong>: High-quality documents about Claude&rsquo;s constitution combined with fictional stories of aligned AI reduced blackmail rate from 65% to 19%.</li>
<li><strong>Improvements persist through RL</strong>: More aligned initialization snapshots maintained their advantage throughout reinforcement learning.</li>
<li><strong>Diverse environments matter</strong>: Simply adding tool definitions and system prompts to training environments — even without requiring tool use — improved alignment generalization.</li>
</ul>
<p>For the full annotated Chinese translation, please see the <a href="/en/readings/teaching-claude-why/">Chinese version</a>.</p>
<p>For the original article, visit <a href="https://www.anthropic.com/research/teaching-claude-why">Anthropic&rsquo;s research page</a>.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Natural Language Autoencoders: Turning Claude&#39;s Thoughts into Text</title>
      <link>https://blog-6sm.pages.dev/en/readings/natural-language-autoencoders/</link>
      <pubDate>Fri, 08 May 2026 00:00:00 +0000</pubDate>
      <guid>https://blog-6sm.pages.dev/en/readings/natural-language-autoencoders/</guid>
      <description>Anthropic introduces Natural Language Autoencoders (NLAs), a method that converts AI model activations into readable natural language explanations. NLAs have been used to detect unverbalized evaluation awareness in Claude&#39;s safety testing and to audit hidden misalignment motivations.</description>
      <content:encoded><![CDATA[<blockquote>
<p>Original post: <a href="https://www.anthropic.com/research/natural-language-autoencoders">Natural Language Autoencoders</a></p>
<p>Full paper: <a href="https://transformer-circuits.pub/2026/nla/index.html">transformer-circuits.pub/2026/nla</a></p>
<p>Code: <a href="https://github.com/kitft/natural_language_autoencoders">github.com/kitft/natural_language_autoencoders</a></p>
<p>Interactive demo: <a href="http://neuronpedia.org/nla">neuronpedia.org/nla</a></p>
</blockquote>
<hr>
<h2 id="summary">Summary</h2>
<p>Anthropic introduces <strong>Natural Language Autoencoders (NLAs)</strong>, a method for converting a language model&rsquo;s internal activations into human-readable natural language explanations. The approach trains two model components jointly: an <strong>Activation Verbalizer</strong> that translates activations into text, and an <strong>Activation Reconstructor</strong> that recovers the original activation from the text alone. The quality of explanations is measured by how accurately the activation can be reconstructed.</p>
<p>Key findings and applications:</p>
<ul>
<li><strong>Unverbalized evaluation awareness</strong>: NLAs reveal that Claude suspects it&rsquo;s being safety-tested far more often than it explicitly says — detecting evaluation awareness in 16% of destructive-action tests and 26% of SWE-bench problems, versus &lt;1% in real claude.ai usage.</li>
<li><strong>Hidden motivation auditing</strong>: In a controlled auditing game with an intentionally misaligned model, NLA-equipped auditors uncovered the hidden root motivation 12–15% of the time without access to training data, compared to &lt;3% without NLAs.</li>
<li><strong>Pre-deployment use</strong>: NLAs were used in alignment audits for Claude Mythos Preview and Claude Opus 4.6 before release.</li>
</ul>
<p>Current limitations include hallucinated details in explanations and high computational cost. Anthropic has open-sourced training code and released interactive demos on Neuronpedia for several open models.</p>
<hr>
<p><em>The Chinese version of this page contains a full annotated translation. Please refer to the <a href="https://www.anthropic.com/research/natural-language-autoencoders">original article</a> for the complete English text.</em></p>
]]></content:encoded>
    </item>
    <item>
      <title>DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence</title>
      <link>https://blog-6sm.pages.dev/en/readings/deepseek-v4/</link>
      <pubDate>Sat, 25 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://blog-6sm.pages.dev/en/readings/deepseek-v4/</guid>
      <description>DeepSeek-V4 introduces two MoE models (1.6T/284B params) with hybrid compressed attention (CSA&#43;HCA), manifold-constrained hyper-connections, and the Muon optimizer, cutting single-token inference FLOPs to 27% and KV cache size to 10% of the V3.2 values at 1M-token context.</description>
      <content:encoded><![CDATA[<blockquote>
<p>Original paper: <a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf">DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence</a></p>
<p>Authors: DeepSeek-AI</p>
<p>Model checkpoints: <a href="https://huggingface.co/collections/deepseek-ai/deepseek-v4">https://huggingface.co/collections/deepseek-ai/deepseek-v4</a></p>
</blockquote>
<hr>
<h2 id="summary">Summary</h2>
<p>DeepSeek-V4 presents a preview of two strong MoE language models — <strong>DeepSeek-V4-Pro</strong> (1.6T total / 49B activated) and <strong>DeepSeek-V4-Flash</strong> (284B total / 13B activated) — both supporting a context length of <strong>one million tokens</strong>.</p>
<p><strong>Key architectural innovations:</strong></p>
<ul>
<li><strong>Hybrid Compressed Attention</strong>: Combines Compressed Sparse Attention (CSA, compression rate m=4 with top-k sparse selection) and Heavily Compressed Attention (HCA, compression rate m&prime;=128 with dense attention) in an interleaved configuration. At 1M-token context, this reduces single-token inference FLOPs to 27% and KV cache size to 10% of the DeepSeek-V3.2 values.</li>
<li><strong>Manifold-Constrained Hyper-Connections (<em>m</em>HC)</strong>: Constrains the residual mapping matrix to the manifold of doubly stochastic matrices (Birkhoff polytope), ensuring spectral norm ≤ 1 for stable deep-layer signal propagation. Uses Sinkhorn-Knopp iterations (t=20) for projection.</li>
<li><strong>Muon Optimizer</strong>: Adopted for most modules with hybrid Newton-Schulz iterations for orthogonalization. Paired with Anticipatory Routing (decoupling backbone and routing network updates) and SwiGLU clamping for training stability.</li>
</ul>
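<p>The Sinkhorn-Knopp projection mentioned in the <em>m</em>HC item can be illustrated with a minimal, dependency-free sketch. It is not the paper's implementation: the function names <code>sinkhorn_knopp</code> and <code>spectral_norm</code>, the exponential used to make entries positive, and the small example matrix are all assumptions made for illustration. Only the idea of alternating row and column normalization for 20 iterations comes from the summary above.</p>

```python
import math


def _normalize_rows(m):
    """Scale every row so that it sums to 1."""
    return [[x / sum(row) for x in row] for row in m]


def _transpose(m):
    return [list(col) for col in zip(*m)]


def sinkhorn_knopp(logits, n_iters=20):
    """Map a square matrix of reals to an approximately doubly stochastic one.

    Hypothetical helper: exponentiating first is an assumption, chosen only
    to make every entry strictly positive before normalizing.
    """
    m = [[math.exp(x) for x in row] for row in logits]
    for _ in range(n_iters):
        m = _normalize_rows(m)                          # rows sum to 1
        m = _transpose(_normalize_rows(_transpose(m)))  # columns sum to 1
    return m


def spectral_norm(m, n_steps=200):
    """Estimate the largest singular value by power iteration on M^T M."""
    n = len(m)
    v = [1.0 + 0.1 * j for j in range(n)]  # arbitrary non-zero start vector
    for _ in range(n_steps):
        u = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]  # u = M v
        w = [sum(m[i][j] * u[i] for i in range(n)) for j in range(n)]  # w = M^T u
        scale = math.sqrt(sum(x * x for x in w))
        v = [x / scale for x in w]
    u = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
    return math.sqrt(sum(x * x for x in u))


# Illustrative input only; the values carry no meaning.
mixing = sinkhorn_knopp([[0.5, -1.0, 0.2], [0.0, 0.3, -0.4], [1.1, 0.1, 0.6]])
print("row sums:", [round(sum(row), 6) for row in mixing])
print("column sums:", [round(sum(col), 6) for col in zip(*mixing)])
print("spectral norm estimate:", round(spectral_norm(mixing), 6))
```

<p>Because the all-ones vector is both a left and a right singular vector of a doubly stochastic matrix, its largest singular value is exactly 1, which is the property the summary credits with keeping deep-layer signal propagation stable.</p>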
<p><strong>Post-training paradigm shift</strong>: Replaces mixed RL with domain-specific expert training (SFT → GRPO RL) followed by multi-teacher <strong>On-Policy Distillation (OPD)</strong> with full-vocabulary KL divergence. Over 10 teacher models are distilled into a single unified model.</p>
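<p>As a rough illustration of what a full-vocabulary KL objective looks like, the sketch below computes a per-token KL divergence over every vocabulary entry and averages it over a sequence. The direction of the divergence (student relative to teacher), the averaging, and the helper names <code>log_softmax</code>, <code>full_vocab_kl</code>, and <code>distillation_loss</code> are assumptions made for this example; the summary above states only that OPD uses a full-vocabulary KL divergence.</p>

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_z for x in logits]


def full_vocab_kl(student_logits, teacher_logits):
    """KL(student || teacher) summed over every vocabulary entry at one position.

    The direction is an assumption for illustration, not taken from the paper.
    """
    log_p = log_softmax(student_logits)
    log_q = log_softmax(teacher_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def distillation_loss(student_seq, teacher_seq):
    """Average the per-token KL over a sequence sampled from the student."""
    per_token = [full_vocab_kl(s, t) for s, t in zip(student_seq, teacher_seq)]
    return sum(per_token) / len(per_token)


# Toy two-token sequence over a three-word vocabulary (made-up logits).
student = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
teacher = [[1.5, 0.7, -0.5], [0.0, 0.4, 0.2]]
print("loss:", distillation_loss(student, teacher))
```

<p>Summing over the whole vocabulary, rather than scoring only the sampled token, gives the student a dense signal at every position, at the cost of needing the teacher's full output distribution.</p>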
<p><strong>Infrastructure highlights</strong>: Fine-grained EP communication-computation overlap (MegaMoE, open-sourced); TileLang-based kernel development; batch-invariant and deterministic kernels; FP4 QAT with lossless FP4-to-FP8 dequantization; DSec sandbox platform managing hundreds of thousands of concurrent sandbox instances.</p>
<p><strong>Results</strong>: DeepSeek-V4-Pro-Max outperforms all prior open-source models on knowledge benchmarks, matches GPT-5.2 on reasoning, ranks 23rd on Codeforces, achieves proof-perfect 120/120 on Putnam-2025, and surpasses Gemini-3.1-Pro on long-context benchmarks.</p>
<hr>
<p><em>The Chinese version of this page contains a full annotated translation of the paper. Please refer to the <a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf">original PDF</a> for the complete English text.</em></p>
]]></content:encoded>
    </item>
    <item>
      <title>What 81,000 People Told Us About the Economics of AI</title>
      <link>https://blog-6sm.pages.dev/en/readings/81k-economics/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://blog-6sm.pages.dev/en/readings/81k-economics/</guid>
      <description>A Chinese translation and commentary on Anthropic&#39;s survey of 81,000 Claude users about AI&#39;s economic impact.</description>
      <content:encoded><![CDATA[<p>This is a Chinese translation with commentary of the original article by Anthropic. Read the original here:</p>
<p><strong><a href="https://www.anthropic.com/research/81k-economics">What 81,000 people told us about the economics of AI</a></strong></p>
<p><em>By Maxim Massenkoff, Anthropic · April 22, 2026</em></p>
<p>For the annotated Chinese translation, switch to the <a href="/en/readings/81k-economics/">Chinese version</a>.</p>
]]></content:encoded>
    </item>
  </channel>
</rss>
