Performance benchmarks

Micro-benchmarks run with criterion.rs, mainly to verify that the auth hot path does not become a gateway bottleneck.

Auth hot path (`futu-auth`)

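Every request that reaches the gateway goes through these steps: SHA-256 hash → KeyStore scan → scope mapping → limit `check_and_commit` → metrics counter bump. The total should be in the hundreds of nanoseconds for a single-key store; with a 10-key store the KeyStore scan dominates and the total is about 2.5 µs.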
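To illustrate why the scope-mapping step is the cheapest link in this chain, here is a minimal sketch of a scope lookup written as a plain `match`. The `Scope` variants and the protocol ID ranges are placeholders invented for illustration; the actual mapping in `futu-auth` is not shown in this document.

```rust
/// Hypothetical scope set; the real `futu-auth` scopes are not shown here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Scope {
    System,
    Trade,
    Quote,
    Unknown,
}

/// Maps a protocol ID to a scope with a plain `match`.
/// The ID ranges are placeholders chosen for illustration only.
#[inline]
pub fn scope_for_proto_id(proto_id: u32) -> Scope {
    match proto_id {
        1000..=1999 => Scope::System,
        2000..=2999 => Scope::Trade,
        3000..=3999 => Scope::Quote,
        _ => Scope::Unknown,
    }
}
```

A `match` over integer ranges compiles to a few compares and needs no allocation or hashing, which is consistent with a sub-nanosecond measurement once the call is inlined.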

| Function | Time per call | Notes |
| --- | --- | --- |
| `scope_for_proto_id(u32)` | ~0.3-0.6 ns | Pure `match`, inlined |
| `hash_plaintext(32-byte)` | ~217 ns | SHA-256; may be faster with hardware acceleration |
| `KeyStore::verify` (1 key) | ~430 ns | Constant-time comparison |
| `KeyStore::verify` (10 keys) | ~2.0 µs | Linear scan |
| `KeyStore::verify` (100 keys) | ~16.8 µs | Still in the microsecond range |
| `RuntimeCounters::check_and_commit` (full CheckCtx) | ~82 ns | Runs all 7 steps (market/symbol/side/hours/value/rate/daily) |
| `RuntimeCounters::check_full_skip_rate` | ~44 ns | Called from the handler layer; skips the rate step |
| `MetricsRegistry::record_event` | ~55 ns | DashMap insert-or-increment |
| `MetricsRegistry::render_prometheus` (160+ rows) | ~26 µs | Runs only when `/metrics` is scraped |
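The `KeyStore::verify` rows combine a constant-time comparison with a linear scan, which is why cost grows with the number of keys. The following is a minimal sketch of that pattern, not the `futu-auth` implementation: `StoredKey`, its fields, and the choice to keep scanning after a match are assumptions made for illustration, and the candidate is taken as an already-computed 32-byte digest.

```rust
/// Hypothetical stored entry: a key ID plus the SHA-256 digest of its plaintext.
pub struct StoredKey {
    pub id: u32,
    pub digest: [u8; 32],
}

/// Compares two digests without an early exit, so the running time does not
/// depend on the position of the first mismatching byte.
fn constant_time_eq(a: &[u8; 32], b: &[u8; 32]) -> bool {
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}

/// Linear scan over the store. Every entry is compared, even after a match
/// (an assumption of this sketch), so the cost is proportional to `keys.len()`.
pub fn verify(keys: &[StoredKey], candidate: &[u8; 32]) -> Option<u32> {
    let mut found = None;
    for key in keys {
        if constant_time_eq(&key.digest, candidate) {
            found = Some(key.id);
        }
    }
    found
}
```

An optimizing compiler is free to rewrite a hand-rolled loop like this, so production code usually relies on a vetted crate such as `subtle` for the comparison itself.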

Total auth overhead: a typical REST request (10-key KeyStore + full check + audit + metrics) costs ≈ 2.5 µs, far less than the rest of the gateway's work (protobuf decode, I/O, business handler).
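As a quick check on that total, the sketch below adds up the measured per-step costs for the 10-key case. It assumes the audit event costs about the same as one `record_event` call and uses the upper end of the scope-lookup range; the constant and function names are illustrative.

```rust
/// Per-step costs in nanoseconds, taken from the measurements in this section.
pub const VERIFY_10_KEYS_NS: f64 = 2_000.0;
pub const SCOPE_LOOKUP_NS: f64 = 0.6; // upper end of the ~0.3-0.6 ns range
pub const CHECK_AND_COMMIT_NS: f64 = 82.0;
pub const RECORD_EVENT_NS: f64 = 55.0;

/// Sums the auth steps of one request against a 10-key store.
/// Assumption: the audit event and the metrics bump each cost one `record_event`.
pub fn auth_budget_ns() -> f64 {
    VERIFY_10_KEYS_NS + SCOPE_LOOKUP_NS + CHECK_AND_COMMIT_NS + 2.0 * RECORD_EVENT_NS
}
```

Under these assumptions the sum is about 2.19 µs, so the ≈ 2.5 µs figure reads as a rounded-up budget with some headroom.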

Running

# First run: save a baseline
cargo bench -p futu-auth -- --save-baseline v1.3

# After later changes: compare against the baseline
cargo bench -p futu-auth -- --baseline v1.3

HTML report: `target/criterion/report/index.html`.
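For readers unfamiliar with how per-call nanosecond figures are obtained, here is a rough std-only timing loop. It is a simplified stand-in and not how criterion works internally: criterion adds adaptive warm-up, statistical sampling, and outlier analysis on top of this basic idea. The `checksum` workload is hypothetical and unrelated to `futu-auth`.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Mean wall-clock nanoseconds per call of `f` over `iters` timed calls.
pub fn mean_ns_per_call<F: FnMut()>(iters: u32, mut f: F) -> f64 {
    assert!(iters > 0, "iters must be positive");
    // Short warm-up so caches and branch predictors settle before timing.
    for _ in 0..iters / 10 {
        f();
    }
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed().as_nanos() as f64 / f64::from(iters)
}

/// Hypothetical stand-in workload; not a function from `futu-auth`.
pub fn checksum(data: &[u8]) -> u32 {
    data.iter()
        .fold(0u32, |acc, &b| acc.wrapping_mul(31).wrapping_add(u32::from(b)))
}

/// Times `checksum` on a 32-byte input. `black_box` keeps the optimizer from
/// deleting the call or precomputing its result.
pub fn bench_checksum(iters: u32) -> f64 {
    let input = [7u8; 32];
    mean_ns_per_call(iters, || {
        black_box(checksum(black_box(&input)));
    })
}
```

Calling `bench_checksum(1_000_000)` returns a single mean; a single mean hides variance, which is one reason to prefer criterion's baselines when comparing changes.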

Other crates

- `cargo bench -p futu-codec`: 44-byte frame header encode/decode + AES-128 body
- Other crates have no standalone benchmarks (business-logic-heavy and I/O-heavy parts rely on integration tests)

Machine environment

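The numbers above were measured on a MacBook (Apple Silicon) with an optimized build and LTO reported as off (the workspace release profile sets `lto = true`, but benches run under the default bench profile). Because Cargo's bench profile inherits from release unless `[profile.bench]` overrides it, the effective LTO setting is worth confirming before comparing runs. On production Linux x86_64 the numbers should be in the same order of magnitude.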

Load modeling

Cost breakdown for a single trading API request:

Client → gateway, inbound:
  - TCP or HTTP parsing          ~µs range
  - Bearer token extraction      nanosecond range
  - KeyStore::verify             ~2 µs (10-key store)
  - scope_for_path/proto         <1 ns
  - check_and_commit (trade)     ~82 ns
  - audit event emission         ~55 ns
  → auth subtotal: ~2.5 µs

Business processing:
  - protobuf decode              ~µs
  - business handler             10-100 µs (depends on the operation)
  - forward to Futu backend + network RTT  10-100 ms ← this is the dominant cost

Outbound:
  - protobuf encode              ~µs
  - TCP send                     ~µs

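Conclusion: auth overhead is under 0.01 ms, far below the network RTT to the Futu backend (which is in the millisecond range even when colocated in the same HK data center). `check_and_commit`, which takes DashMap's concurrent locks, did not become a bottleneck.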
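To put the conclusion in proportion, the sketch below computes the share of end-to-end latency spent in auth for the most favorable case in the model: the fastest handler (10 µs) and the shortest RTT (10 ms). Each entry listed only as "~µs" is assumed to be 1 µs; that value, and the function names, are placeholders for illustration.

```rust
/// Best-case end-to-end latency in microseconds, using the low end of each
/// range in the load model. Entries given only as "~µs" are assumed to be 1 µs.
pub fn best_case_total_us() -> f64 {
    let auth = 2.5;
    let decode = 1.0; // assumed
    let handler = 10.0;
    let backend_rtt = 10_000.0; // 10 ms
    let encode = 1.0; // assumed
    let send = 1.0; // assumed
    auth + decode + handler + backend_rtt + encode + send
}

/// Fraction of total latency attributable to auth (both in microseconds).
pub fn auth_share(auth_us: f64, total_us: f64) -> f64 {
    auth_us / total_us
}
```

With these inputs `auth_share(2.5, best_case_total_us())` is roughly 0.00025, about 0.025% of the request, and the share only shrinks as the handler time or RTT grows.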
