How do you cache the cargo registry in GitHub Actions?

Interpretation

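Interviewers in China often treat "speeding up CI" as a litmus test of engineering maturity.
On GitHub-hosted runners every job starts on a fresh virtual machine by default, so ~/.cargo/registry and ~/.cargo/git start out empty and every dependency is downloaded again, which adds avoidable time to each cargo build.
The interviewer wants to confirm: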

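Whether you understand Cargo's cache paths and cache key strategy;
Whether you can write a YAML snippet that is maintainable, actually hits the cache, and lets stale caches be retired;
Whether you know how to combine a China-based mirror with caching, so that a timed-out connection to servers outside mainland China does not fail the job before the cache can be saved.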

Key concepts

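Cargo cache directory layout
- ~/.cargo/registry/index: the crates.io index (a git clone under the git protocol, cached index files under the sparse protocol)
- ~/.cargo/registry/cache: downloaded .crate archives
- ~/.cargo/git/db: bare clones of git dependencies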
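GitHub Actions cache mechanism
- actions/cache@v4 stores compressed caches, up to 10 GB per repository by default;
- when the key matches exactly, the cache is downloaded from GitHub's cache storage and unpacked, so later steps do not need to fetch anything from crates.io;
- when the key misses, restore-keys are tried as prefixes and the most recently created match is restored, which gives an "incremental" cache: only the changed dependencies are downloaded, and a new cache is saved under the new key when the job succeeds.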
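Cache key design principles
- include the hash of Cargo.lock, so that any dependency change produces a new key instead of reusing a stale cache as an exact match;
- include runner.os, so that caches from Linux and Windows runners are never mixed;
- keep a version prefix such as v1-cargo-, so that all caches can be retired by hand by bumping it to v2-cargo-.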
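
The three principles above can be sketched as a single step. This is a minimal illustration, not the only valid layout; CACHE_VERSION is a hypothetical variable name introduced here to make the manual bump explicit.

```yaml
env:
  CACHE_VERSION: v1   # hypothetical name; change to v2 to abandon every existing cache

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Cache cargo registry
        uses: actions/cache@v4
        with:
          path: |
            ~/.cargo/registry/index
            ~/.cargo/registry/cache
            ~/.cargo/git/db
          # Lookup 1: exact match on version prefix + OS + Cargo.lock hash.
          key: ${{ env.CACHE_VERSION }}-cargo-${{ runner.os }}-${{ hashFiles('**/Cargo.lock') }}
          # Lookup 2: on a miss, fall back to the newest cache sharing this prefix.
          restore-keys: |
            ${{ env.CACHE_VERSION }}-cargo-${{ runner.os }}-
```

Because the version sits at the front of both the key and the restore prefix, bumping it makes the exact match and the fallback miss together, so the next run starts from an empty cache.
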
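Acceleration in mainland China
- setting CARGO_NET_GIT_FETCH_WITH_CLI=true makes Cargo use the system git executable instead of the built-in libgit2, which avoids libgit2 timeouts on git fetches;
- when using the rsproxy.cn or tuna.tsinghua.edu.cn mirror, the cache still hits, because the key is derived from Cargo.lock and replacing the download source does not change Cargo.lock. Cargo does store downloads in a subdirectory named after the source, so switching mirrors causes one full re-download before the cache is useful again.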
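
As a rough sketch of the mirror plus cache combination, a step can write a source replacement into ~/.cargo/config.toml before any cargo command runs. The source name rsproxy-sparse and the index URL follow what rsproxy.cn publishes as far as I know, but treat both as assumptions and check the mirror's own instructions. This mainly helps on runners located inside mainland China.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Point Cargo at a mirror (assumed rsproxy.cn settings)
        run: |
          mkdir -p ~/.cargo
          cat >> ~/.cargo/config.toml <<'EOF'
          [source.crates-io]
          replace-with = "rsproxy-sparse"

          [source.rsproxy-sparse]
          registry = "sparse+https://rsproxy.cn/index/"
          EOF

      # Cargo.lock is untouched by source replacement, so a key built from
      # hashFiles('**/Cargo.lock') stays the same with or without the mirror.
      - name: Build
        run: cargo build --release --locked
```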

Answer

Below is a .github/workflows/ci.yml snippet that can be used as-is and also accounts for network conditions in mainland China:

name: CI

on: [push, pull_request]

env:
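  CARGO_NET_GIT_FETCH_WITH_CLI: "true"   # use the system git for git fetches (e.g. git dependencies) to avoid libgit2 timeouts
  CARGO_REGISTRIES_CRATES_IO_PROTOCOL: sparse  # sparse index: stable since Rust 1.68, the default since 1.70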

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install Rust
        uses: dtolnay/rust-toolchain@stable

      - name: Cache cargo registry
        uses: actions/cache@v4
        with:
          path: |
            ~/.cargo/registry/index
            ~/.cargo/registry/cache
            ~/.cargo/git/db
          key: v1-cargo-${{ runner.os }}-${{ hashFiles('**/Cargo.lock') }}
          restore-keys: |
            v1-cargo-${{ runner.os }}-

      - name: Build
        run: cargo build --release --locked

Key points explained

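Keep all three path entries: if the index is cached without the .crate archives (or the other way round), the cache is only partly useful and Cargo still has to download the missing part;
The key contains the hash of Cargo.lock, so any dependency upgrade changes the key and an outdated cache can never be restored as an exact match, which prevents "ghost caches";
restore-keys keeps a prefix match, so a feature branch can still reuse the cache created on the main branch, which shortens its first run;
The --locked flag makes Cargo fail if Cargo.lock would need to change, so CI builds exactly the versions the cache key was computed from, the same as a local build, and avoids the awkward case of "caching old versions but compiling new ones".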

Further thoughts

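Cache limits and eviction policy
GitHub deletes caches that have not been accessed for 7 days, and once a repository exceeds its 10 GB quota it evicts the least recently accessed caches first. Very large projects can additionally cache ~/.cargo/bin to keep an installed sccache binary and use sccache to cache compiled artifacts as well, as long as the total stays within the 10 GB quota.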
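
One way the sccache idea might look is sketched below. It makes several assumptions: sccache is installed with cargo install, only the sccache binary under ~/.cargo/bin is cached rather than the whole directory, and the local cache directory is placed in the workspace via SCCACHE_DIR (the .sccache directory name is made up for this example).

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable

      - name: Cache sccache binary and compiled objects
        uses: actions/cache@v4
        with:
          path: |
            ~/.cargo/bin/sccache
            .sccache
          key: v1-sccache-${{ runner.os }}-${{ hashFiles('**/Cargo.lock') }}
          restore-keys: |
            v1-sccache-${{ runner.os }}-

      # RUSTC_WRAPPER is deliberately not set yet, otherwise this install
      # would try to call an sccache binary that does not exist.
      - name: Install sccache if it was not restored
        run: command -v sccache >/dev/null || cargo install sccache --locked

      - name: Build through sccache
        env:
          RUSTC_WRAPPER: sccache
          SCCACHE_DIR: ${{ github.workspace }}/.sccache
        run: |
          cargo build --release --locked
          sccache --show-stats   # prints hit and miss counts for this build
```

Compiled objects grow much faster than the registry cache, so this cache is the one most likely to push a repository toward the 10 GB quota.
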
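Cache isolation across a build matrix
If the matrix includes cross-compilation targets (such as aarch64-unknown-linux-musl), add ${{ matrix.target }} to the key so that x86_64 and arm caches do not pollute each other. This matters most when compiled output such as target/ is cached along with the registry.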
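
A minimal sketch of a matrix keyed by target follows. The two targets are examples only, and cargo check is used here because a full cross build would also need a cross linker, which this fragment does not set up.

```yaml
jobs:
  check:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        target:
          - x86_64-unknown-linux-gnu
          - aarch64-unknown-linux-musl
    steps:
      - uses: actions/checkout@v4

      - uses: dtolnay/rust-toolchain@stable
        with:
          targets: ${{ matrix.target }}

      - name: Cache cargo registry and target directory
        uses: actions/cache@v4
        with:
          path: |
            ~/.cargo/registry/index
            ~/.cargo/registry/cache
            ~/.cargo/git/db
            target
          # matrix.target appears in both the key and the fallback prefix,
          # so one target never restores another target's cache.
          key: v1-cargo-${{ runner.os }}-${{ matrix.target }}-${{ hashFiles('**/Cargo.lock') }}
          restore-keys: |
            v1-cargo-${{ runner.os }}-${{ matrix.target }}-

      - name: Check
        run: cargo check --locked --target ${{ matrix.target }}
```
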
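Self-hosted runners
Companies in China often run self-hosted runners on Alibaba Cloud ECS. In that setup the cache directories can be mounted from a shared NAS volume, which skips actions/cache entirely and restores in seconds, but concurrent writes then need locking.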
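Monitoring the cache hit rate
At the end of the job, run cargo tree --duplicates to detect duplicated dependencies and combine it with the cache-hit output of actions/cache/restore@v4 (true only on an exact key match). Exporting the hit rate to Prometheus + Grafana lets data drive the optimization of the dependency tree.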
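
The collection side of this could be sketched as below, using the separate restore and save actions so that the cache-hit output is available to later steps. Writing to the job summary is only a stand-in: how the numbers reach Prometheus (for example through a Pushgateway) depends on your setup and is left out.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable

      - name: Restore cargo registry
        id: cargo-cache
        uses: actions/cache/restore@v4
        with:
          path: |
            ~/.cargo/registry/index
            ~/.cargo/registry/cache
            ~/.cargo/git/db
          key: v1-cargo-${{ runner.os }}-${{ hashFiles('**/Cargo.lock') }}
          restore-keys: |
            v1-cargo-${{ runner.os }}-

      - name: Build
        run: cargo build --release --locked

      - name: Record cache result and duplicate dependencies
        run: |
          {
            echo "cache-hit: ${{ steps.cargo-cache.outputs.cache-hit }}"
            echo "matched key: ${{ steps.cargo-cache.outputs.cache-matched-key }}"
            cargo tree --duplicates
          } >> "$GITHUB_STEP_SUMMARY"

      # actions/cache/restore never saves, so save explicitly on a miss.
      - name: Save cargo registry
        if: steps.cargo-cache.outputs.cache-hit != 'true'
        uses: actions/cache/save@v4
        with:
          path: |
            ~/.cargo/registry/index
            ~/.cargo/registry/cache
            ~/.cargo/git/db
          key: ${{ steps.cargo-cache.outputs.cache-primary-key }}
```

A restore through restore-keys does not count as a hit here, so a non-empty matched key combined with a cache-hit that is not true indicates a partial restore.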