Audit & observability¶

Three signal sources¶

Regular logs — tracing output to stderr / journald, levels DEBUG/INFO/WARN/ERROR, for development and debugging.
Audit JSONL — events tagged target=futu_audit only, for compliance, post-hoc forensics, attack investigation.
Prometheus metrics — counter aggregates for alerting + dashboards.

Audit JSONL¶

Enable:

futu-opend --audit-log /var/log/futu-audit.jsonl
# or
futu-opend --audit-log /var/log/futu/       # directory → daily rotating futu-audit.log.YYYY-MM-DD

futu-mcp / futucli take the same flag.

Event schema¶

{
  "timestamp": "2026-04-15T10:23:45.123Z",
  "level": "WARN",                      // reject=WARN, allow=INFO, trade=WARN
  "target": "futu_audit",
  "iface": "rest" | "grpc" | "ws" | "mcp" | "cli",
  "endpoint": "/api/order" | "proto_id=2202" | "futu_place_order",
  "key_id": "bot_a" | "<missing>" | "<invalid>" | "<none>",
  "outcome": "allow" | "reject" | "success" | "failure",
  "reason": "limit: rate limit exceeded: 5 in 60s (cap 3)",
  "scope": "trade:real",                // on allow
  "args_hash": "8a3f2b9c"               // on trade events: first 8 hex of SHA-256
}
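
The `args_hash` field is described above as the first 8 hex characters of a SHA-256 digest. A minimal sketch of that truncation follows. It assumes the arguments are serialized as compact, key-sorted JSON before hashing; the daemon's actual canonical form is not documented here, and the function name `args_hash` is illustrative only.

```python
import hashlib
import json


def args_hash(args: dict) -> str:
    """Return the first 8 hex characters of SHA-256 over the serialized args.

    Assumption: compact JSON with sorted keys is used as the canonical form.
    The real serialization used by the daemon may differ.
    """
    payload = json.dumps(args, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:8]
```

Sorting the keys makes the digest independent of argument order, so two requests with the same arguments would map to the same `args_hash` under this assumption.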

Common jq queries¶

# recent rejects (-c prints one event per line, so tail -20 returns 20 events)
jq -c 'select(.outcome=="reject")' /var/log/futu-audit.jsonl | tail -20

# orders from a specific key
jq 'select(.key_id=="bot_a" and (.endpoint|test("order|place|modify")))' \
  /var/log/futu-audit.jsonl

# reject reason histogram
jq -r 'select(.outcome=="reject") | .reason' /var/log/futu-audit.jsonl \
  | awk -F': ' '{print $1}' \
  | sort | uniq -c | sort -rn

# request distribution per iface
jq -r '.iface' /var/log/futu-audit.jsonl | sort | uniq -c
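
Where jq and awk are not available, the reject-reason histogram above can be approximated in Python. This is a sketch, not part of the tooling: the function name `reject_reason_histogram` is hypothetical, and it mirrors the awk step by keeping only the text before the first `": "` in `reason`.

```python
import json
from collections import Counter
from typing import Iterable


def reject_reason_histogram(lines: Iterable[str]) -> Counter:
    """Count reject events by the reason prefix before the first ': '.

    Events without a reason field are grouped under an empty prefix.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the JSONL file
        event = json.loads(line)
        if event.get("outcome") != "reject":
            continue
        reason = event.get("reason") or ""
        counts[reason.split(": ", 1)[0]] += 1
    return counts
```

Passing an open file handle (for example `open("/var/log/futu-audit.jsonl")`) works because the function only iterates over lines; `counts.most_common()` then gives the same ordering as `sort | uniq -c | sort -rn`.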

Batch analysis with DuckDB¶

-- load JSONL
CREATE TABLE audit AS SELECT * FROM read_json_auto('/var/log/futu-audit.jsonl');

-- daily order count for one key (bot_a)
SELECT DATE(timestamp) AS day, COUNT(*) AS orders
FROM audit
WHERE key_id = 'bot_a' AND endpoint LIKE '%order%'
GROUP BY day
ORDER BY day;

Prometheus metrics¶

Scrape config:

prometheus.yml

scrape_configs:
  - job_name: futu-opend
    static_configs: [{ targets: ['opend:22222'] }]
  - job_name: futu-mcp
    static_configs: [{ targets: ['mcp:38765'] }]

Three counters¶

# HELP futu_auth_events_total Auth / trade events by iface, outcome, key_id
# TYPE futu_auth_events_total counter
futu_auth_events_total{iface="rest",outcome="allow",key_id="bot_a"} 1234

# HELP futu_auth_limit_rejects_total Limit-check rejects by iface, key_id, reason
# TYPE futu_auth_limit_rejects_total counter
futu_auth_limit_rejects_total{iface="grpc",key_id="bot_b",reason="rate"} 7

# HELP futu_ws_filtered_pushes_total Pushes filtered out for client lacking scope
# TYPE futu_ws_filtered_pushes_total counter
futu_ws_filtered_pushes_total{required_scope="trade",key_id="bot_c"} 42
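
For a quick check without a Prometheus server, exposition text like the samples above can be summed by one label. The sketch below is a simplified parser, not a full implementation of the exposition format: it assumes every sample carries a label set and that label values contain no escaped quotes or `}` characters. The function name `sum_by_label` is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict

# Matches sample lines such as: metric_name{a="x",b="y"} 123
_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def sum_by_label(text: str, metric: str, label: str) -> Dict[str, float]:
    """Sum the values of one metric, grouped by the value of one label."""
    totals: Dict[str, float] = defaultdict(float)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if match is None or match.group("name") != metric:
            continue
        labels = dict(_LABEL.findall(match.group("labels")))
        totals[labels.get(label, "")] += float(match.group("value"))
    return dict(totals)
```

This is roughly what `sum by (outcome) (futu_auth_events_total)` computes on the instant values, without any of the rate or reset handling PromQL provides.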

Reason buckets¶

The reason label on futu_auth_limit_rejects_total is a finite set:

reason	Meaning
`rate`	Rate-limit exceeded
`daily`	Daily cap exceeded
`per_order`	Per-order cap exceeded
`market`	Market whitelist
`symbol`	Symbol whitelist
`side`	Direction whitelist
`hours`	Time window
`other`	Other (not covered by `classify_limit_reason`)

Alert rule examples¶

alerts.yml

groups:
  - name: futu-opend
    rules:
      - alert: FutuAuthRejectSpike
        expr: rate(futu_auth_events_total{outcome="reject"}[5m]) > 10
        for: 5m
        annotations:
          summary: "Auth reject rate high ({{ $value }}/s)"
          description: "possibly an attack or misconfigured key"

      - alert: FutuRateLimitFrequent
        expr: rate(futu_auth_limit_rejects_total{reason="rate"}[15m]) > 1
        for: 15m
        annotations:
          summary: "Key {{ $labels.key_id }} repeatedly hitting rate limit"

      - alert: FutuDailyCapNearLimit
        expr: increase(futu_auth_limit_rejects_total{reason="daily"}[24h]) > 5
        for: 5m
        annotations:
          summary: "Key {{ $labels.key_id }} hit the daily cap multiple times in the last 24h"
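
To calibrate the rate-based thresholds above, it helps to convert a per-second rate into an absolute event count over the window. The helper below is plain arithmetic under a simplifying assumption (a steady rate, ignoring Prometheus extrapolation and counter resets); the name `events_needed` is hypothetical.

```python
def events_needed(rate_per_second: float, window_seconds: int) -> float:
    """Events one series must add within the window to average the given rate."""
    return rate_per_second * window_seconds


# FutuAuthRejectSpike: rate(...[5m]) > 10 means more than 10 * 300 rejects in 5 minutes.
SPIKE_EVENTS = events_needed(10, 5 * 60)

# FutuRateLimitFrequent: rate(...[15m]) > 1 means more than 1 * 900 rejects in 15 minutes.
RATE_LIMIT_EVENTS = events_needed(1, 15 * 60)
```

Because `rate()` is evaluated per series, each `iface` / `outcome` / `key_id` combination has to cross the threshold on its own unless the expression is wrapped in `sum(...)`.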

Grafana dashboard¶

Starting with v1.4.103, release tarballs bundle a pre-built Grafana dashboard JSON: examples/grafana/futu-opend-dashboard.json.

Pre-built dashboard — import steps¶

Extract the release tarball and locate examples/grafana/futu-opend-dashboard.json
In Grafana, go to Dashboards → New → Import
Click Upload JSON file and upload the file from step 1
On the import page, pick your Prometheus datasource (the dashboard uses a ${DS_PROMETHEUS} variable so no JSON edit is needed)
Click Import — the dashboard loads, and $interface / $key_id variables auto-populate from your Prometheus labels

Dashboard contents¶

Two multi-select variables at the top ($interface / $key_id, both support All):

Row	Panel	Type	Core PromQL
Auth	Auth events — allow vs reject	timeseries	`sum by (outcome) (rate(futu_auth_events_total[5m]))`
Auth	Request rate by interface (stacked)	timeseries	`sum by (iface) (rate(futu_auth_events_total[5m]))`
Auth	Top rejected keys	table	`topk(10, sum by (key_id) (increase(futu_auth_events_total{outcome="reject"}[$__range])))`
Auth	Limit-reject reasons (pie)	piechart	`sum by (reason) (increase(futu_auth_limit_rejects_total[$__range]))`
Limits	Limit rejects by reason (per second)	timeseries	`sum by (reason) (rate(futu_auth_limit_rejects_total[5m]))`
Limits	Top keys by limit-rejects	table	`topk(10, sum by (key_id, reason) (increase(futu_auth_limit_rejects_total[$__range])))`
WS Push	WS push filtered (scope mismatch)	timeseries	`sum by (required_scope) (rate(futu_ws_filtered_pushes_total[5m]))`
WS Push	Top keys by WS push filtered	table	`topk(10, sum by (key_id, required_scope) (increase(futu_ws_filtered_pushes_total[$__range])))`
Summary	Allow rate / Total events / Limit rejects / WS filtered	stat × 4	range-aggregate

Total: 12 data panels + 4 row headers = 16 items, schemaVersion 38 (compatible with Grafana v10.x).

Manual panels (fallback — you already have your own dashboard)¶

If you don't want to import the whole dashboard and just want to bolt a few panels onto an existing board, copy these PromQL queries directly:

Request rate by iface — sum by (iface) (rate(futu_auth_events_total[5m]))
Allow vs Reject — sum by (outcome) (rate(futu_auth_events_total[5m]))
Top rejected keys — topk(10, sum by (key_id) (rate(futu_auth_events_total{outcome="reject"}[1h])))
Limit reject breakdown — sum by (reason) (increase(futu_auth_limit_rejects_total[$__range]))
WS filter dropped — sum by (required_scope) (rate(futu_ws_filtered_pushes_total[5m]))

Push / trade health (non-Prometheus)¶

In the current release, Prometheus exposes only the three counters above. Push stream health (F3 staleness / F4 circuit breaker / F5 subscriber info) lives on the /api/push-subscriber-info sync endpoint, not in Prometheus; see the "Push-chain self-healing (v1.4.84)" section below. To bring these fields into Grafana, add a JSON API datasource pointed at /api/push-subscriber-info (not included in the default dashboard).

How they fit together¶

Day-to-day monitoring → Grafana dashboard
Alert fires → look up the specific event in audit JSONL
Deep investigation → audit JSONL + DuckDB / jq

Metrics cover trends (numeric aggregates), audit covers forensics (specific events), and logs cover debugging (why did it fail).

Push-chain self-healing (v1.4.84)¶

v1.4.84 §9 introduced a 6-layer CMD3020 chain recovery to keep the push channel in a steady state over the long term. The v1.4.84 A3 canary gates serve as the real-machine verification exit. Operators need to know how these components work and how to observe their state.

F3 staleness detector (30s interval / 60s threshold)¶

A background task scans the push channel every 30 seconds:

If stale for >60s and there are active subscriptions → auto trigger re-subscribe
daemon log: tracing::warn! "v1.4.84 §9 F3: push stream stale >60s, auto re-subscribe"
/api/push-subscriber-info counter resubscribe_triggers bumps
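
The staleness rule above can be sketched as a pure function that a 30-second scan loop would call. This is an illustration under stated assumptions, not the daemon's code: `PushChannelState` and `check_staleness` are hypothetical names, and only the time-based trigger described in this section is modeled.

```python
from dataclasses import dataclass

SCAN_INTERVAL_S = 30         # from the text: the scan runs every 30 seconds
STALE_THRESHOLD_MS = 60_000  # from the text: stale after more than 60 seconds


@dataclass
class PushChannelState:
    last_push_received_at_ms: int
    active_subscriptions: int
    resubscribe_triggers: int = 0


def check_staleness(state: PushChannelState, now_ms: int) -> bool:
    """Return True and bump the counter when a re-subscribe should fire."""
    stale = now_ms - state.last_push_received_at_ms > STALE_THRESHOLD_MS
    if stale and state.active_subscriptions > 0:
        state.resubscribe_triggers += 1
        return True
    return False
```

Keeping the check free of I/O makes the two conditions (stale for more than 60s, at least one active subscription) easy to test independently of the scheduler.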

F4 circuit breaker (30s cooldown)¶

If F3's auto re-subscribe fails to restore the stream within 60s → circuit trips:

Dispatcher skips further push events for 30s to avoid spinning on bad state
30s elapses or any successful push arrives → auto-reset
Observation: /api/push-subscriber-info.is_circuit_tripped_now + counter circuit_breaker_trips
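
A minimal sketch of the trip / cooldown / auto-reset behavior described above. The class and method names are hypothetical, and the decision of when to call `trip` (F3 failing to restore the stream within 60s) is left to the caller.

```python
from typing import Optional

COOLDOWN_MS = 30_000  # from the text: 30s cooldown


class PushCircuitBreaker:
    def __init__(self) -> None:
        self.circuit_breaker_trips = 0
        self._tripped_at_ms: Optional[int] = None

    def trip(self, now_ms: int) -> None:
        """Trip the circuit; a trip while already tripped is ignored."""
        if not self.is_tripped(now_ms):
            self._tripped_at_ms = now_ms
            self.circuit_breaker_trips += 1

    def on_successful_push(self) -> None:
        """Any successful push resets the circuit immediately."""
        self._tripped_at_ms = None

    def is_tripped(self, now_ms: int) -> bool:
        """True while the cooldown is running; auto-resets once it elapses."""
        if self._tripped_at_ms is None:
            return False
        if now_ms - self._tripped_at_ms >= COOLDOWN_MS:
            self._tripped_at_ms = None
            return False
        return True
```

In this sketch `is_tripped` plays the role of `is_circuit_tripped_now`, and the trip counter only increases on a transition from closed to tripped.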

F2 retry (TradeReQuery 0ms / 1s / 3s / 9s)¶

Backend queries triggered by order/fill push notifications (query_orders / query_account_info / query_order_fills) use a 4-attempt exponential backoff:

Attempt 1: 0ms (immediate)
Attempt 2: 1s
Attempt 3: 3s
Attempt 4: 9s
daemon log tag: "v1.4.84 §9 F2: retry" with attempt number
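
The schedule above can be sketched as a small retry helper. It assumes each listed value is the delay applied before that attempt (the text does not say whether the values are per-attempt delays or cumulative offsets). `query_with_retry` is a hypothetical name, and `sleep` is injectable so the schedule can be tested without waiting.

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")

# From the text: attempt 1 is immediate, then 1s, 3s, 9s.
RETRY_DELAYS_S: Sequence[float] = (0.0, 1.0, 3.0, 9.0)


def query_with_retry(
    query: Callable[[], T],
    delays: Sequence[float] = RETRY_DELAYS_S,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run query, waiting the scheduled delay before each attempt."""
    if not delays:
        raise ValueError("delays must contain at least one attempt")
    last_error: Optional[Exception] = None
    for delay in delays:
        if delay > 0:
            sleep(delay)
        try:
            return query()
        except Exception as exc:  # a sketch: real code would catch narrower errors
            last_error = exc
    assert last_error is not None
    raise last_error
```

Under this reading, a query that keeps failing gives up after 13 seconds of total waiting (1 + 3 + 9).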

F5 `/api/push-subscriber-info` field reference¶

Field	Type	Meaning
`push_stream_healthy`	bool	Composite (circuit not tripped + `consecutive_parse_errors` <5 + last push <60s)
`last_push_received_at_ms`	int	Unix ms timestamp of the most recent push
`consecutive_parse_errors`	int	Consecutive parse failures (F3 fires when >=5)
`total_parse_errors`	int	Cumulative parse errors (monotonic counter)
`resubscribe_triggers`	int	How many times F3 fired auto re-sub
`circuit_breaker_trips`	int	How many times F4 has tripped
`is_circuit_tripped_now`	bool	Whether the circuit is currently tripped
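
As a cross-check when reading the endpoint output, the composite `push_stream_healthy` can be recomputed client-side from the other fields. This sketch follows the three conditions listed in the table; the function name is hypothetical, and the daemon's own evaluation may differ in edge cases (for example, before any push has been received).

```python
def recompute_push_stream_healthy(info: dict, now_ms: int) -> bool:
    """Recompute the composite health flag from the raw F5 fields."""
    return (
        not info["is_circuit_tripped_now"]
        and info["consecutive_parse_errors"] < 5
        and now_ms - info["last_push_received_at_ms"] < 60_000
    )
```

A mismatch between this value and the reported `push_stream_healthy` usually just reflects the time elapsed between the daemon's evaluation and the client's `now_ms`.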

F6 orphan order scan (30s interval / 5min threshold)¶

A background task periodically scans for status=1 Unsubmitted orders:

If an order is stuck in Unsubmitted for >5 minutes → daemon warn log "v1.4.84 §9 F6: orphan order detected acc_id=X order_id=Y age=Zs"
Helps diagnose cases such as the broker losing the fill notification or the daemon failing to rebuild its subscription.
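
The scan rule above reduces to a filter over the order cache. The sketch below is illustrative only: the `Order` shape and the `unsubmitted_since_s` field are assumptions, while the warning text follows the log format quoted above.

```python
from dataclasses import dataclass
from typing import Iterable, List

STATUS_UNSUBMITTED = 1      # from the text: status=1 means Unsubmitted
ORPHAN_THRESHOLD_S = 5 * 60  # from the text: stuck for more than 5 minutes


@dataclass
class Order:
    acc_id: int
    order_id: str
    status: int
    unsubmitted_since_s: float  # hypothetical: when the order entered status=1


def find_orphans(orders: Iterable[Order], now_s: float) -> List[str]:
    """Return one warning line per order stuck in Unsubmitted past the threshold."""
    warnings: List[str] = []
    for order in orders:
        age_s = now_s - order.unsubmitted_since_s
        if order.status == STATUS_UNSUBMITTED and age_s > ORPHAN_THRESHOLD_S:
            warnings.append(
                f"v1.4.84 §9 F6: orphan order detected acc_id={order.acc_id} "
                f"order_id={order.order_id} age={int(age_s)}s"
            )
    return warnings
```

With a 30-second scan interval, an order is reported at most about 30 seconds after it crosses the 5-minute mark.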

Canary real-machine verify (v1.4.84)¶

Location: scripts/canary.sh. 6 gates from v1.4.82 + 4 new gates from v1.4.84 A3 = 10 gates total.

Prerequisites¶

daemon already running (futu-opend up)
env vars $ACCOUNT / $PWD set (non-interactive login credentials)

Usage¶

# Run all gates
./scripts/canary.sh

# Run a single gate
./scripts/canary.sh canary_7_push_health_f5_live
./scripts/canary.sh canary_10_f3_staleness_auto_resub

Gate reference¶

Gate	Introduced	What it verifies
`canary_1_subscribe_push`	v1.4.82	Subscribe → WS push event received
`canary_2_place_order_cache`	v1.4.82	PlaceOrder → visible in `/api/orders` within 0ms
`canary_3_subscribe_wrong_fields`	v1.4.82	Subscribe with bad fields → loud error (`deny_unknown_fields` guard)
`canary_4_sim_place_order_hint`	v1.4.82	sim PlaceOrder with bad params → returns sim hint
`canary_5_history_kline_validation`	v1.4.82	history-kline validation → loud error
`canary_6_cmd3020_recovery`	v1.4.84	CMD3020 chain recovery real-machine (placeholder, depends on backend fault injection)
`canary_7_push_health_f5_live`	v1.4.84	`/api/push-subscriber-info` returns all 7 live fields listed in the F5 field reference
`canary_8_orphan_scan_f6`	v1.4.84	5.5min wait + daemon log shows orphan warn
`canary_9_f2_retry_exp_backoff`	v1.4.84	daemon log contains F2 retry smoke test
`canary_10_f3_staleness_auto_resub`	v1.4.84	`resubscribe_triggers` counter bumps

SKIP semantics (NOT FAIL)¶

Canary gates SKIP (not FAIL) in these cases:

daemon not running (no response on :22222)
daemon log file missing ($LOG_FILE unset or empty)
No retry / re-sub events in the observation window (the system is healthy — not a bug)

SKIP doesn't block a release; FAIL does. On a healthy daemon, some of gates 7-10 often SKIP; a full PASS typically requires real-machine fault injection or waiting long enough for the background scan to fire.
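
The release rule above (only FAIL blocks) can be expressed as a short aggregation over gate results. This is a sketch of the semantics, not of `scripts/canary.sh` itself; `GateResult` and `release_blocked` are hypothetical names.

```python
from enum import Enum
from typing import Mapping


class GateResult(Enum):
    PASS = "PASS"
    SKIP = "SKIP"
    FAIL = "FAIL"


def release_blocked(results: Mapping[str, GateResult]) -> bool:
    """A release is blocked only when at least one gate reports FAIL."""
    return any(result is GateResult.FAIL for result in results.values())
```

Treating SKIP as non-blocking keeps a healthy daemon, which produces no retry or re-subscribe events to observe, from holding up a release.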

Push stream troubleshooting (v1.4.84)¶

When ops sees push_stream_healthy=false or the FutuAuthRejectSpike alert fires, follow this triage flow:

Check /api/push-subscriber-info first
```
curl -s http://localhost:22222/api/push-subscriber-info | jq
```
Inspect push_stream_healthy / last_push_received_at_ms / consecutive_parse_errors / is_circuit_tripped_now.
If push_stream_healthy=false, dig deeper:
consecutive_parse_errors >= 5 → F3 is firing, auto re-sub within 30s
is_circuit_tripped_now=true → F4 in cooldown, auto-reset in 30s
last_push_received_at_ms >60s ago → channel stale, F3 is firing

Check daemon log for trigger history

grep "v1.4.84 §9 F3\|v1.4.84 §9 F4\|v1.4.84 §9 F6" /var/log/futu-opend.log | tail -20

Escalation scenarios
resubscribe_triggers keeps growing but push_stream_healthy stays false → long-term backend failure or bad subscription parameters; manual restart + backend-status check required
circuit_breaker_trips >3 in a short window → F3 re-sub isn't working, backend side may need admin intervention
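
The two escalation scenarios can be checked by comparing two snapshots of /api/push-subscriber-info taken some time apart. The sketch below assumes the caller chooses the snapshot spacing, since the text does not define how short a "short window" is; `needs_escalation` is a hypothetical name.

```python
from typing import List


def needs_escalation(prev: dict, curr: dict) -> List[str]:
    """Compare two endpoint snapshots and list the escalation reasons, if any."""
    reasons: List[str] = []
    still_unhealthy = not prev["push_stream_healthy"] and not curr["push_stream_healthy"]
    if still_unhealthy and curr["resubscribe_triggers"] > prev["resubscribe_triggers"]:
        reasons.append("re-subscribes keep firing but the stream stays unhealthy")
    if curr["circuit_breaker_trips"] - prev["circuit_breaker_trips"] > 3:
        reasons.append("more than 3 circuit breaker trips between snapshots")
    return reasons
```

An empty list means the self-healing layers still appear to be making progress; a non-empty list corresponds to the manual restart and backend checks described above.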