Audit & observability

Three signal sources

  1. Regular logs: tracing output to stderr / journald, levels DEBUG/INFO/WARN/ERROR, for development and debugging.
  2. Audit JSONL: events tagged target=futu_audit only, for compliance, post-hoc forensics, attack investigation.
  3. Prometheus metrics: counter aggregates for alerting + dashboards.

Audit JSONL

Enable:

futu-opend --audit-log /var/log/futu-audit.jsonl
# or
futu-opend --audit-log /var/log/futu/       # directory → daily rotating futu-audit.log.YYYY-MM-DD

futu-mcp / futucli take the same flag.

Event schema

{
  "timestamp": "2026-04-15T10:23:45.123Z",
  "level": "WARN",                      // reject=WARN, allow=INFO, trade=WARN
  "target": "futu_audit",
  "iface": "rest" | "grpc" | "ws" | "mcp" | "cli",
  "endpoint": "/api/order" | "proto_id=2202" | "futu_place_order",
  "key_id": "bot_a" | "<missing>" | "<invalid>" | "<none>",
  "outcome": "allow" | "reject" | "success" | "failure",
  "reason": "limit: rate limit exceeded: 5 in 60s (cap 3)",
  "scope": "trade:real",                // on allow
  "args_hash": "8a3f2b9c"               // on trade events: first 8 hex of SHA-256
}
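
For consumers that read the audit file programmatically, it can help to check each line against the schema above. The following is a minimal sketch, not the daemon's own code: `AuditEvent` and `parse_audit_line` are hypothetical names, and treating `reason`, `scope`, and `args_hash` as optional is an assumption based on the comments in the schema.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Allowed values copied from the schema above.
IFACES = {"rest", "grpc", "ws", "mcp", "cli"}
OUTCOMES = {"allow", "reject", "success", "failure"}


@dataclass
class AuditEvent:
    timestamp: str
    level: str
    iface: str
    endpoint: str
    key_id: str
    outcome: str
    reason: Optional[str] = None     # assumption: may be absent
    scope: Optional[str] = None      # documented as set on allow
    args_hash: Optional[str] = None  # documented as set on trade events


def parse_audit_line(line: str) -> AuditEvent:
    """Parse one JSONL line and validate the enumerated fields."""
    raw = json.loads(line)
    if raw.get("target") != "futu_audit":
        raise ValueError("not an audit event")
    if raw["iface"] not in IFACES:
        raise ValueError(f"unknown iface: {raw['iface']!r}")
    if raw["outcome"] not in OUTCOMES:
        raise ValueError(f"unknown outcome: {raw['outcome']!r}")
    return AuditEvent(
        timestamp=raw["timestamp"],
        level=raw["level"],
        iface=raw["iface"],
        endpoint=raw["endpoint"],
        key_id=raw["key_id"],
        outcome=raw["outcome"],
        reason=raw.get("reason"),
        scope=raw.get("scope"),
        args_hash=raw.get("args_hash"),
    )
```

Rejecting unknown `iface` / `outcome` values is a deliberate choice for this sketch: a forensic tool is usually better off failing loudly on an unexpected line than silently miscounting it.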

Common jq queries

# recent rejects
jq -c 'select(.outcome=="reject")' /var/log/futu-audit.jsonl | tail -20

# orders from a specific key
jq 'select(.key_id=="bot_a" and (.endpoint|test("order|place|modify")))' \
  /var/log/futu-audit.jsonl

# reject reason histogram
jq -r 'select(.outcome=="reject") | .reason' /var/log/futu-audit.jsonl \
  | awk -F': ' '{print $1}' \
  | sort | uniq -c | sort -rn

# request distribution per iface
jq -r '.iface' /var/log/futu-audit.jsonl | sort | uniq -c
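
When jq and awk are not available, the reject reason histogram above can be approximated in a few lines of Python. This is a sketch under the assumption that every line is a standalone JSON object and that the reason prefix is the text before the first ": ", mirroring the awk field separator; `reject_reason_histogram` is a hypothetical helper name.

```python
import json
from collections import Counter
from typing import Iterable


def reject_reason_histogram(lines: Iterable[str]) -> Counter:
    """Count reject events by the prefix of their reason string."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        event = json.loads(line)
        if event.get("outcome") != "reject":
            continue
        # Same bucketing as awk -F': ' '{print $1}'.
        reason = event.get("reason") or ""
        counts[reason.split(": ", 1)[0]] += 1
    return counts
```

Typical use would be `reject_reason_histogram(open("/var/log/futu-audit.jsonl")).most_common()`, which returns the buckets sorted by descending count, like `sort -rn`.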

Batch analysis with DuckDB

-- load JSONL
CREATE TABLE audit AS SELECT * FROM read_json_auto('/var/log/futu-audit.jsonl');

-- daily order count per key
SELECT DATE(timestamp) AS day, COUNT(*) AS orders
FROM audit
WHERE key_id = 'bot_a' AND endpoint LIKE '%order%'
GROUP BY day;

Prometheus metrics

Scrape config:

prometheus.yml
scrape_configs:
  - job_name: futu-opend
    static_configs: [{ targets: ['opend:22222'] }]
  - job_name: futu-mcp
    static_configs: [{ targets: ['mcp:38765'] }]

Three counters

# HELP futu_auth_events_total Auth / trade events by iface, outcome, key_id
# TYPE futu_auth_events_total counter
futu_auth_events_total{iface="rest",outcome="allow",key_id="bot_a"} 1234

# HELP futu_auth_limit_rejects_total Limit-check rejects by iface, key_id, reason
# TYPE futu_auth_limit_rejects_total counter
futu_auth_limit_rejects_total{iface="grpc",key_id="bot_b",reason="rate"} 7

# HELP futu_ws_filtered_pushes_total Pushes filtered out for client lacking scope
# TYPE futu_ws_filtered_pushes_total counter
futu_ws_filtered_pushes_total{required_scope="trade",key_id="bot_c"} 42
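
For quick checks without a Prometheus server, the text output shown above can be parsed directly. The sketch below handles only simple sample lines of the form `name{labels} value` and does not decode escape sequences inside label values. `parse_samples` is a hypothetical helper, not part of any futu tool.

```python
import re
from typing import Dict, Iterator, Tuple

# name, optional {labels}, value, optional integer timestamp
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)(?:\s+-?\d+)?$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text: str) -> Iterator[Tuple[str, Dict[str, str], float]]:
    """Yield (metric name, labels, value) for each sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP / TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this sketch does not understand
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        yield match.group("name"), labels, float(match.group("value"))
```

Feeding it the body of a scrape response and filtering on the metric name gives per-label totals that can be compared against what Grafana shows.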

Reason buckets

The reason label on futu_auth_limit_rejects_total is a finite set:

| reason | Meaning |
| --- | --- |
| rate | Rate-limit exceeded |
| daily | Daily cap exceeded |
| per_order | Per-order cap exceeded |
| market | Market whitelist |
| symbol | Symbol whitelist |
| side | Direction whitelist |
| hours | Time window |
| other | Other (not covered by classify_limit_reason) |

Alert rule examples

alerts.yml
groups:
  - name: futu-opend
    rules:
      - alert: FutuAuthRejectSpike
        expr: rate(futu_auth_events_total{outcome="reject"}[5m]) > 10
        for: 5m
        annotations:
          summary: "Auth reject rate high ({{ $value }}/s)"
          description: "possibly an attack or misconfigured key"

      - alert: FutuRateLimitFrequent
        expr: rate(futu_auth_limit_rejects_total{reason="rate"}[15m]) > 1
        for: 15m
        annotations:
          summary: "Key {{ $labels.key_id }} repeatedly hitting rate limit"

      - alert: FutuDailyCapNearLimit
        expr: increase(futu_auth_limit_rejects_total{reason="daily"}[1d]) > 5
        for: 5m
        annotations:
          summary: "Key {{ $labels.key_id }} hit the daily cap more than 5 times in the last 24h"

Grafana dashboard

Starting with v1.4.103, release tarballs bundle a pre-built Grafana dashboard JSON: examples/grafana/futu-opend-dashboard.json.

Pre-built dashboard — import steps

  1. Extract the release tarball and locate examples/grafana/futu-opend-dashboard.json
  2. In Grafana, go to Dashboards → New → Import
  3. Click Upload JSON file and upload the file from step 1
  4. On the import page, pick your Prometheus datasource (the dashboard uses a ${DS_PROMETHEUS} variable so no JSON edit is needed)
  5. Click Import — the dashboard loads, and $interface / $key_id variables auto-populate from your Prometheus labels

Dashboard contents

Two multi-select variables at the top ($interface / $key_id, both support All):

| Row | Panel | Type | Core PromQL |
| --- | --- | --- | --- |
| Auth | Auth events — allow vs reject | timeseries | sum by (outcome) (rate(futu_auth_events_total[5m])) |
| Auth | Request rate by interface (stacked) | timeseries | sum by (iface) (rate(futu_auth_events_total[5m])) |
| Auth | Top rejected keys | table | topk(10, sum by (key_id) (increase(futu_auth_events_total{outcome="reject"}[$__range]))) |
| Auth | Limit-reject reasons (pie) | piechart | sum by (reason) (increase(futu_auth_limit_rejects_total[$__range])) |
| Limits | Limit rejects by reason (per second) | timeseries | sum by (reason) (rate(futu_auth_limit_rejects_total[5m])) |
| Limits | Top keys by limit-rejects | table | topk(10, sum by (key_id, reason) (increase(futu_auth_limit_rejects_total[$__range]))) |
| WS Push | WS push filtered (scope mismatch) | timeseries | sum by (required_scope) (rate(futu_ws_filtered_pushes_total[5m])) |
| WS Push | Top keys by WS push filtered | table | topk(10, sum by (key_id, required_scope) (increase(futu_ws_filtered_pushes_total[$__range]))) |
| Summary | Allow rate / Total events / Limit rejects / WS filtered | stat × 4 | range-aggregate |

Total: 12 data panels + 4 row headers = 16 items, schemaVersion 38 (compatible with Grafana v10.x).

Manual panels (fallback — you already have your own dashboard)

If you don't want to import the whole dashboard and just want to bolt a few panels onto an existing board, copy these PromQL queries directly:

  • Request rate by iface: sum by (iface) (rate(futu_auth_events_total[5m]))
  • Allow vs Reject: sum by (outcome) (rate(futu_auth_events_total[5m]))
  • Top rejected keys: topk(10, sum by (key_id) (rate(futu_auth_events_total{outcome="reject"}[1h])))
  • Limit reject breakdown: sum by (reason) (increase(futu_auth_limit_rejects_total[$__range]))
  • WS filter dropped: sum by (required_scope) (rate(futu_ws_filtered_pushes_total[5m]))

Push / trade health (non-Prometheus)

In the current release, Prometheus exposes only the three counters above. Push stream health (F3 staleness / F4 circuit breaker / F5 subscriber info) lives on the /api/push-subscriber-info sync endpoint, not in Prometheus; see the "Push-chain self-healing (v1.4.84)" section below. To surface these fields in Grafana, add a JSON API datasource pointed at /api/push-subscriber-info (not included in the default dashboard).

How they fit together

  • Day-to-day monitoring → Grafana dashboard
  • Alert fires → look up the specific event in audit JSONL
  • Deep investigation → audit JSONL + DuckDB / jq

Metrics cover trends (numeric aggregates), audit covers forensics (specific events), and logs cover debugging (why did it fail).

Push-chain self-healing (v1.4.84)

v1.4.84 §9 introduced a 6-layer CMD3020 chain recovery mechanism to keep the push channel healthy over long uptimes. The v1.4.84 A3 canary gates are the exit criterion for real-machine verification. Operators need to know how these components work and how to observe their state.

F3 staleness detector (30s interval / 60s threshold)

A background task scans the push channel every 30 seconds:

  • If the stream has been stale for >60s and there are active subscriptions → automatically trigger a re-subscribe
  • Daemon log: tracing::warn! "v1.4.84 §9 F3: push stream stale >60s, auto re-subscribe"
  • The resubscribe_triggers counter in /api/push-subscriber-info is incremented
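
The decision the F3 scan makes on each tick can be sketched as a pure function. This is an illustration of the rule described above, not the daemon's implementation. `should_resubscribe` and its parameter names are hypothetical.

```python
SCAN_INTERVAL_S = 30          # how often the background task runs
STALE_THRESHOLD_MS = 60_000   # stream counts as stale after 60s of silence


def should_resubscribe(now_ms: int,
                       last_push_received_at_ms: int,
                       active_subscriptions: int) -> bool:
    """Return True when the stream is stale and someone is still subscribed."""
    stale_for_ms = now_ms - last_push_received_at_ms
    return stale_for_ms > STALE_THRESHOLD_MS and active_subscriptions > 0
```

With a 30s scan interval and a 60s threshold, a silent stream would be noticed somewhere between 60s and 90s after the last push, depending on where the scan tick falls.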

F4 circuit breaker (30s cooldown)

If F3's auto re-subscribe fails to restore the stream within 60s → circuit trips:

  • Dispatcher skips further push events for 30s to avoid spinning on bad state
  • 30s elapses or any successful push arrives → auto-reset
  • Observation: /api/push-subscriber-info.is_circuit_tripped_now + counter circuit_breaker_trips
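
A minimal model of the trip / cooldown / reset behaviour described above, assuming millisecond timestamps supplied by the caller. `PushCircuitBreaker` is a hypothetical class for illustration. The real dispatcher may track more state.

```python
from dataclasses import dataclass
from typing import Optional

COOLDOWN_MS = 30_000  # 30s cooldown from the section heading


@dataclass
class PushCircuitBreaker:
    tripped_at_ms: Optional[int] = None
    trips: int = 0  # plays the role of circuit_breaker_trips

    def trip(self, now_ms: int) -> None:
        """Open the circuit; repeated calls while open are ignored."""
        if self.tripped_at_ms is None:
            self.tripped_at_ms = now_ms
            self.trips += 1

    def on_successful_push(self) -> None:
        """Any successful push resets the circuit immediately."""
        self.tripped_at_ms = None

    def is_tripped(self, now_ms: int) -> bool:
        """True while in cooldown; auto-resets once 30s have elapsed."""
        if self.tripped_at_ms is None:
            return False
        if now_ms - self.tripped_at_ms >= COOLDOWN_MS:
            self.tripped_at_ms = None
            return False
        return True
```

Passing the clock in as an argument keeps the model deterministic, which makes the 30s boundary easy to test.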

F2 retry (TradeReQuery 0ms / 1s / 3s / 9s)

Backend queries triggered by order/fill push notify (query_orders / query_account_info / query_order_fills) use 4-attempt exponential backoff:

  • Attempt 1: 0ms (immediate)
  • Attempt 2: 1s
  • Attempt 3: 3s
  • Attempt 4: 9s
  • daemon log tag: "v1.4.84 §9 F2: retry" with attempt number
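
The schedule above can be sketched as a small retry helper. Whether each delay is measured from the previous attempt or from the first one is not stated in this section. The sketch assumes the former. `retry_with_schedule` is a hypothetical name, and the `sleep` parameter exists only so the helper can be exercised without real waiting.

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")

# Delay before attempts 1..4, in seconds (0ms / 1s / 3s / 9s).
RETRY_DELAYS_S = (0.0, 1.0, 3.0, 9.0)


def retry_with_schedule(op: Callable[[], T],
                        delays: Sequence[float] = RETRY_DELAYS_S,
                        sleep: Callable[[float], None] = time.sleep) -> T:
    """Run op up to len(delays) times, waiting delays[i] before attempt i+1."""
    if not delays:
        raise ValueError("delays must contain at least one entry")
    last_error: Optional[Exception] = None
    for delay in delays:
        if delay > 0:
            sleep(delay)
        try:
            return op()
        except Exception as exc:  # a real caller would narrow this
            last_error = exc
    assert last_error is not None
    raise last_error
```

Under this reading the worst case adds 1 + 3 + 9 = 13 seconds of waiting before the final failure is surfaced.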

F5 /api/push-subscriber-info field reference

| Field | Type | Meaning |
| --- | --- | --- |
| push_stream_healthy | bool | Composite (circuit not tripped + consecutive_parse_errors <5 + last push <60s) |
| last_push_received_at_ms | int | Unix ms timestamp of the most recent push |
| consecutive_parse_errors | int | Consecutive parse failures (F3 fires when >=5) |
| total_parse_errors | int | Cumulative parse errors (monotonic counter) |
| resubscribe_triggers | int | How many times F3 fired auto re-sub |
| circuit_breaker_trips | int | How many times F4 has tripped |
| is_circuit_tripped_now | bool | Whether the circuit is currently tripped |
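
The composite push_stream_healthy flag can be recomputed on the client side from the other fields, which is useful for cross-checking a response. The function below is a sketch of the three conditions listed in the table. `recompute_push_stream_healthy` is a hypothetical name and the daemon's exact comparison operators are an assumption.

```python
def recompute_push_stream_healthy(info: dict, now_ms: int) -> bool:
    """Re-derive the composite flag from a /api/push-subscriber-info payload."""
    circuit_ok = not info["is_circuit_tripped_now"]
    parse_ok = info["consecutive_parse_errors"] < 5
    fresh = now_ms - info["last_push_received_at_ms"] < 60_000
    return circuit_ok and parse_ok and fresh
```

If the recomputed value disagrees with the reported one, clock skew between the client and the daemon is the first thing to rule out, since the freshness check depends on `now_ms`.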

F6 orphan order scan (30s interval / 5min threshold)

A background task periodically scans for status=1 Unsubmitted orders:

  • If an order is stuck in Unsubmitted for >5 minutes → daemon warn log "v1.4.84 §9 F6: orphan order detected acc_id=X order_id=Y age=Zs"
  • Diagnoses cases like broker losing the fill notify, daemon failing to rebuild its subscription, etc.
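
One pass of the orphan scan might look roughly like this. The `Order` record and its `created_at_s` field are hypothetical stand-ins for whatever the daemon tracks internally. Only the status code, the 5-minute threshold, and the log line format come from the description above.

```python
from typing import Iterable, List, NamedTuple

STATUS_UNSUBMITTED = 1
ORPHAN_THRESHOLD_S = 5 * 60  # 5 minutes


class Order(NamedTuple):
    acc_id: int
    order_id: str
    status: int
    created_at_s: float  # hypothetical: when the order entered this status


def find_orphans(orders: Iterable[Order], now_s: float) -> List[str]:
    """Return one warn-style log line per order stuck in Unsubmitted."""
    warnings = []
    for order in orders:
        age_s = now_s - order.created_at_s
        if order.status == STATUS_UNSUBMITTED and age_s > ORPHAN_THRESHOLD_S:
            warnings.append(
                "v1.4.84 §9 F6: orphan order detected "
                f"acc_id={order.acc_id} order_id={order.order_id} age={int(age_s)}s"
            )
    return warnings
```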

Canary real-machine verify (v1.4.84)

Location: scripts/canary.sh. 6 gates from v1.4.82 + 4 new gates from v1.4.84 A3 = 10 gates total.

Prerequisites

  • daemon already running (futu-opend up)
  • env vars $ACCOUNT / $PWD set (non-interactive login credentials)

Usage

# Run all gates
./scripts/canary.sh

# Run a single gate
./scripts/canary.sh canary_7_push_health_f5_live
./scripts/canary.sh canary_10_f3_staleness_auto_resub

Gate reference

| Gate | Introduced | What it verifies |
| --- | --- | --- |
| canary_1_subscribe_push | v1.4.82 | Subscribe → WS push event received |
| canary_2_place_order_cache | v1.4.82 | PlaceOrder → visible in /api/orders within 0ms |
| canary_3_subscribe_wrong_fields | v1.4.82 | Subscribe with bad fields → loud error (deny_unknown_fields guard) |
| canary_4_sim_place_order_hint | v1.4.82 | sim PlaceOrder with bad params → returns sim hint |
| canary_5_history_kline_validation | v1.4.82 | history-kline validation → loud error |
| canary_6_cmd3020_recovery | v1.4.84 | CMD3020 chain recovery real-machine (placeholder, depends on backend fault injection) |
| canary_7_push_health_f5_live | v1.4.84 | /api/push-subscriber-info returns all 5 live fields |
| canary_8_orphan_scan_f6 | v1.4.84 | 5.5min wait + daemon log shows orphan warn |
| canary_9_f2_retry_exp_backoff | v1.4.84 | daemon log contains F2 retry smoke test |
| canary_10_f3_staleness_auto_resub | v1.4.84 | resubscribe_triggers counter bumps |

SKIP semantics (NOT FAIL)

Canary gates SKIP (not FAIL) in these cases:

  • daemon not running (no response on :22222)
  • daemon log file missing ($LOG_FILE unset or empty)
  • No retry / re-sub events in the observation window (the system is healthy — not a bug)

SKIP doesn't block a release; FAIL does. On a healthy daemon, some of gates 7-10 often SKIP; a full PASS typically requires real-machine fault injection or waiting long enough for the background scan to fire.
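
The release rule can be stated in a couple of lines. This sketch only models the PASS / SKIP / FAIL semantics described here. `GateResult` and `release_blocked` are hypothetical names, and how scripts/canary.sh actually reports results is not assumed.

```python
from enum import Enum
from typing import Mapping


class GateResult(Enum):
    PASS = "PASS"
    SKIP = "SKIP"
    FAIL = "FAIL"


def release_blocked(results: Mapping[str, GateResult]) -> bool:
    """A release is blocked only if at least one gate FAILed; SKIPs are ignored."""
    return any(result is GateResult.FAIL for result in results.values())
```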

Push stream troubleshooting (v1.4.84)

When push_stream_healthy=false is observed or the FutuAuthRejectSpike alert fires, triage as follows:

  1. Check /api/push-subscriber-info first

    curl -s http://localhost:22222/api/push-subscriber-info | jq
    
    Inspect push_stream_healthy / last_push_received_at_ms / consecutive_parse_errors / is_circuit_tripped_now.

  2. If push_stream_healthy=false, dig deeper:

    • consecutive_parse_errors >= 5 → F3 is firing, auto re-sub within 30s
    • is_circuit_tripped_now=true → F4 in cooldown, auto-reset in 30s
    • last_push_received_at_ms >60s ago → channel stale, F3 is firing

  3. Check daemon log for trigger history

    grep "v1.4.84 §9 F3\|v1.4.84 §9 F4\|v1.4.84 §9 F6" /var/log/futu-opend.log | tail -20

  4. Escalation scenarios

    • resubscribe_triggers keeps growing but push_stream_healthy stays false → long-term backend failure or bad subscription parameters; manual restart + backend-status check required
    • circuit_breaker_trips >3 in a short window → F3 re-sub isn't working, backend side may need admin intervention
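
The checks in this flow can be scripted against two snapshots of /api/push-subscriber-info taken some time apart. The sketch below is one possible encoding, with assumptions: `triage` is a hypothetical helper, a single growth between snapshots stands in for "keeps growing", and the interval between snapshots stands in for "a short window".

```python
from typing import List


def triage(prev: dict, curr: dict, now_ms: int) -> List[str]:
    """Compare an earlier and a current snapshot and list what stands out."""
    findings: List[str] = []
    if curr["push_stream_healthy"]:
        return findings
    if curr["consecutive_parse_errors"] >= 5:
        findings.append("parse errors >= 5: F3 firing, auto re-sub within 30s")
    if curr["is_circuit_tripped_now"]:
        findings.append("circuit tripped: F4 cooldown, auto-reset in 30s")
    if now_ms - curr["last_push_received_at_ms"] > 60_000:
        findings.append("no push for >60s: channel stale, F3 firing")
    # Escalation checks need the earlier snapshot for comparison.
    if curr["resubscribe_triggers"] > prev["resubscribe_triggers"]:
        findings.append("escalate: re-subscribes growing while still unhealthy")
    if curr["circuit_breaker_trips"] - prev["circuit_breaker_trips"] > 3:
        findings.append("escalate: more than 3 circuit trips between snapshots")
    return findings
```

An empty list means the current snapshot reports a healthy stream. Any entry starting with "escalate" corresponds to a scenario in the last step, where manual intervention is expected.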