Billy Tse
HomeRoadmapBlogContact
Playground
Buy me a bug

© 2026 Billy Tse

OnlyFansLinkedInGitHubEmail
Back to Blog
May 25, 2026•26 min read

Kalman Filter 深度拆解:點解一條公式可以由 Apollo 用到 SORT、SLAM、IMU sensor fusion?

由 1960 年 Rudolf E. Kálmán 嘅原始 paper 講起,徹底拆解 Kalman Filter 嘅 Bayesian 推導、Predict-Update cycle、Kalman Gain 嘅幾何意義,再延伸到 EKF / UKF、SORT/DeepSORT 入面嘅 constant-velocity model、SLAM 同 IMU sensor fusion 嘅實戰應用。附 NumPy + PyTorch 實作。

AIComputer Vision

原始論文:Rudolf E. Kálmán, A New Approach to Linear Filtering and Prediction Problems, ASME 1960 經典教材:Thrun, Burgard, Fox — Probabilistic Robotics(MIT Press, 2005) 現代應用代表:SORT (2016)、DeepSORT (2017)、ORB-SLAM、Apollo Guidance Computer

TL;DR

上幾篇 blog 我哋睇咗 deep learning 點樣解 ReID、pose estimation 等 high-level 嘅 vision 問題。但好多 production system 嘅 backbone 其實仲係一條 1960 年寫嘅遞推公式——Kalman Filter(卡爾曼濾波器)。

佢嘅核心問題簡單到不可思議:有 noisy 嘅 measurement,又有 noisy 嘅 dynamic model,點樣最 optimal 咁估計 state?

核心重點:

  • 🎯 Recursive Bayesian Estimation:唔需要儲存歷史 measurement,每步只用上一個 posterior + 今次 measurement 更新
  • 🧩 Predict-Update Cycle:先用 motion model predict 出 prior,再用 measurement 修正成 posterior
  • 📐 Kalman Gain 係靈魂:自動 trade-off prediction uncertainty 同 measurement uncertainty——邊個準 listen 邊個多啲
  • ⚡ Linear + Gaussian 下係 MMSE optimal:唔係 heuristic,係數學上證明 optimal 嘅 estimator
  • 🚀 由 Apollo 11 飛到 SORT:60 年嚟由 navigation、radar tracking、IMU fusion、SLAM、目標追蹤無一缺席
  • 🔧 延伸:EKF / UKF / Particle Filter 處理 non-linear / non-Gaussian 嘅現實系統

目錄

背景:State Estimation 嘅古老問題

想象你係 1960 年代 NASA 嘅工程師。Apollo 太空船要由地球飛去月球——你有:

  • Noisy IMU measurement(加速度計每秒搖兩搖)
  • 間斷嘅 ground radar fix(每幾分鐘先有一次)
  • 理論上嘅 orbital mechanics model(重力、推進力)

問題:任何一個 sensor 都唔可靠,但合埋一齊應該可以估到準確嘅位置。點樣 fuse?

Loading diagram...

三條 naive 路線

做法問題
只信 sensor每秒抖動嚴重,noise 累積
只信 motion model長期會 drift(積分 noise 爆炸)
固定權重平均權重唔可能適用所有情況(高速 vs 低速、近 radar vs 遠 radar)

💡 Kalman 嘅突破:用 Bayesian inference 做 sensor fusion,而且權重唔係 hand-tuned,係由 prediction 同 measurement 嘅 uncertainty 自動 derive。Apollo 11 嘅 onboard computer 嘅 navigation 模組就用咗 Kalman Filter——係佢將「估計」變成可計算嘅工程問題。

核心 Setup:Linear Gaussian State Space Model

Kalman Filter 假設系統可以由兩條 linear equation 描述:

Process Model(State Transition)

xk=Fkxk−1+Bkuk+wk,wk∼N(0,Qk)\boldsymbol{x}_k = \boldsymbol{F}_k \boldsymbol{x}_{k-1} + \boldsymbol{B}_k \boldsymbol{u}_k + \boldsymbol{w}_k, \quad \boldsymbol{w}_k \sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{Q}_k)xk​=Fk​xk−1​+Bk​uk​+wk​,wk​∼N(0,Qk​)

Observation Model

zk=Hkxk+vk,vk∼N(0,Rk)\boldsymbol{z}_k = \boldsymbol{H}_k \boldsymbol{x}_k + \boldsymbol{v}_k, \quad \boldsymbol{v}_k \sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{R}_k)zk​=Hk​xk​+vk​,vk​∼N(0,Rk​)
符號意義Tracking example(2D 物件)
xk\boldsymbol{x}_kxk​State vector[x,y,x˙,y˙]T[x, y, \dot{x}, \dot{y}]^T[x,y,x˙,y˙​]T(位置 + 速度)
Fk\boldsymbol{F}_kFk​State transition matrixconstant-velocity model
uk\boldsymbol{u}_kuk​Control inputtracking 通常 = 0
wk\boldsymbol{w}_kwk​Process noise未建模嘅加速度
zk\boldsymbol{z}_kzk​Measurementdetector 出嘅 bbox 中心
Hk\boldsymbol{H}_kHk​Observation matrix只觀察位置,唔觀察速度
vk\boldsymbol{v}_kvk​Measurement noisedetector 抖動
Qk,Rk\boldsymbol{Q}_k, \boldsymbol{R}_kQk​,Rk​Process / Measurement covariance整個 KF 嘅 hyper-parameter

🎯 注意三個 critical assumptions:

  1. Linear:state transition 同 observation 都係 linear

  2. Gaussian noise:所有 noise 都 zero-mean Gaussian

  3. Markov:xk\boldsymbol{x}_kxk​ 只依賴 xk−1\boldsymbol{x}_{k-1}xk−1​

呢三個假設一齊就保證咗:posterior 永遠都係 Gaussian——所以只需要 track μ,Σ\boldsymbol{\mu}, \boldsymbol{\Sigma}μ,Σ 兩個 sufficient statistics 就夠。

Predict-Update Cycle:5 條方程改變世界

Loading diagram...

Predict(事前估計)

x^k∣k−1=Fkx^k−1∣k−1+Bkuk\hat{\boldsymbol{x}}_{k|k-1} = \boldsymbol{F}_k \hat{\boldsymbol{x}}_{k-1|k-1} + \boldsymbol{B}_k \boldsymbol{u}_kx^k∣k−1​=Fk​x^k−1∣k−1​+Bk​uk​ Pk∣k−1=FkPk−1∣k−1FkT+Qk\boldsymbol{P}_{k|k-1} = \boldsymbol{F}_k \boldsymbol{P}_{k-1|k-1} \boldsymbol{F}_k^T + \boldsymbol{Q}_kPk∣k−1​=Fk​Pk−1∣k−1​FkT​+Qk​

直觀解讀:用 motion model 將 belief 推前一步。Covariance 一定變大(加咗 Q\boldsymbol{Q}Q)——你冇睇到 sensor 之前,自信心一定下降。

Update(事後修正)

Innovation(measurement 同 prediction 嘅 residual):

yk=zk−Hkx^k∣k−1\boldsymbol{y}_k = \boldsymbol{z}_k - \boldsymbol{H}_k \hat{\boldsymbol{x}}_{k|k-1}yk​=zk​−Hk​x^k∣k−1​

Innovation covariance:

Sk=HkPk∣k−1HkT+Rk\boldsymbol{S}_k = \boldsymbol{H}_k \boldsymbol{P}_{k|k-1} \boldsymbol{H}_k^T + \boldsymbol{R}_kSk​=Hk​Pk∣k−1​HkT​+Rk​

Kalman Gain:

Kk=Pk∣k−1HkTSk−1\boldsymbol{K}_k = \boldsymbol{P}_{k|k-1} \boldsymbol{H}_k^T \boldsymbol{S}_k^{-1}Kk​=Pk∣k−1​HkT​Sk−1​

Posterior state + covariance:

x^k∣k=x^k∣k−1+Kkyk\hat{\boldsymbol{x}}_{k|k} = \hat{\boldsymbol{x}}_{k|k-1} + \boldsymbol{K}_k \boldsymbol{y}_kx^k∣k​=x^k∣k−1​+Kk​yk​ Pk∣k=(I−KkHk)Pk∣k−1\boldsymbol{P}_{k|k} = (\boldsymbol{I} - \boldsymbol{K}_k \boldsymbol{H}_k) \boldsymbol{P}_{k|k-1}Pk∣k​=(I−Kk​Hk​)Pk∣k−1​

🔑 Kalman Gain 嘅靈魂:

  • 當 R→0\boldsymbol{R} \to 0R→0(sensor 完美)→ K→H−1\boldsymbol{K} \to \boldsymbol{H}^{-1}K→H−1 → posterior ≈ measurement(信晒 sensor)

  • 當 R→∞\boldsymbol{R} \to \inftyR→∞(sensor 廢)→ K→0\boldsymbol{K} \to 0K→0 → posterior ≈ prediction(信晒 model)

  • 中間自動 interpolate,權重由 covariance 自然 derive,唔需要 hand-tune

一個具體 1D 例子:估體溫

你身體實際體溫係 x=37.0°Cx = 37.0°Cx=37.0°C(unknown)。你有一支體溫計,每次量度有 ±0.5°C noise。

Setup:

  • F=1,H=1,Q=0.01F = 1, H = 1, Q = 0.01F=1,H=1,Q=0.01(體溫慢慢漂)、R=0.25R = 0.25R=0.25(sensor noise variance = 0.5² = 0.25)
  • 初始 belief:x^0=36.5,P0=1.0\hat{x}_0 = 36.5, P_0 = 1.0x^0​=36.5,P0​=1.0

Round 1:量到 37.3°C

Predict:

  • x^−=1⋅36.5=36.5\hat{x}^- = 1 \cdot 36.5 = 36.5x^−=1⋅36.5=36.5
  • P−=1⋅1.0⋅1+0.01=1.01P^- = 1 \cdot 1.0 \cdot 1 + 0.01 = 1.01P−=1⋅1.0⋅1+0.01=1.01

Update:

  • y=37.3−36.5=0.8y = 37.3 - 36.5 = 0.8y=37.3−36.5=0.8
  • S=1⋅1.01⋅1+0.25=1.26S = 1 \cdot 1.01 \cdot 1 + 0.25 = 1.26S=1⋅1.01⋅1+0.25=1.26
  • K=1.01/1.26=0.802K = 1.01 / 1.26 = 0.802K=1.01/1.26=0.802
  • x^=36.5+0.802⋅0.8=37.14°C\hat{x} = 36.5 + 0.802 \cdot 0.8 = 37.14°Cx^=36.5+0.802⋅0.8=37.14°C
  • P=(1−0.802)⋅1.01=0.200P = (1 - 0.802) \cdot 1.01 = 0.200P=(1−0.802)⋅1.01=0.200

觀察:初始 belief 唔可靠(P=1.0 >> R=0.25),Kalman Gain ≈ 0.8 → 主要信 measurement。

Round 2:量到 36.8°C

Predict:x^−=37.14,P−=0.210\hat{x}^- = 37.14, P^- = 0.210x^−=37.14,P−=0.210

Update:

  • y=36.8−37.14=−0.34y = 36.8 - 37.14 = -0.34y=36.8−37.14=−0.34
  • S=0.210+0.25=0.460S = 0.210 + 0.25 = 0.460S=0.210+0.25=0.460
  • K=0.210/0.460=0.457K = 0.210 / 0.460 = 0.457K=0.210/0.460=0.457
  • x^=37.14+0.457⋅(−0.34)=36.98°C\hat{x} = 37.14 + 0.457 \cdot (-0.34) = 36.98°Cx^=37.14+0.457⋅(−0.34)=36.98°C
  • P=(1−0.457)⋅0.210=0.114P = (1 - 0.457) \cdot 0.210 = 0.114P=(1−0.457)⋅0.210=0.114

💡 注意 Kalman Gain 自動 down-weight 咗第二次 measurement(由 0.802 → 0.457)。原因:經過 Round 1,model 對 state 嘅 confidence 已經提升(P 由 1.0 → 0.200),所以下一次 measurement 嘅 relative weight 自然細啲。呢個正係 KF 嘅 elegance——你冇 tune 過任何嘢,行為自動 adapt。

Bayesian 推導:點解係 MMSE Optimal?

Loading diagram...

Kalman Filter 本質上係 recursive Bayesian filter 喺 linear-Gaussian 假設下嘅 closed-form solution。

Prediction Step(Chapman-Kolmogorov)

p(xk∣z1:k−1)=∫p(xk∣xk−1) p(xk−1∣z1:k−1) dxk−1p(\boldsymbol{x}_k | \boldsymbol{z}_{1:k-1}) = \int p(\boldsymbol{x}_k | \boldsymbol{x}_{k-1}) \, p(\boldsymbol{x}_{k-1} | \boldsymbol{z}_{1:k-1}) \, d\boldsymbol{x}_{k-1}p(xk​∣z1:k−1​)=∫p(xk​∣xk−1​)p(xk−1​∣z1:k−1​)dxk−1​

Gaussian × linear transformation 嘅 marginal 仲係 Gaussian → 直接 reduce 到 mean 同 covariance 嘅 propagation(即上面 Predict 嘅兩條方程)。

Update Step(Bayes Rule)

p(xk∣z1:k)=p(zk∣xk) p(xk∣z1:k−1)p(zk∣z1:k−1)p(\boldsymbol{x}_k | \boldsymbol{z}_{1:k}) = \frac{p(\boldsymbol{z}_k | \boldsymbol{x}_k) \, p(\boldsymbol{x}_k | \boldsymbol{z}_{1:k-1})}{p(\boldsymbol{z}_k | \boldsymbol{z}_{1:k-1})}p(xk​∣z1:k​)=p(zk​∣z1:k−1​)p(zk​∣xk​)p(xk​∣z1:k−1​)​

Gaussian × Gaussian = Gaussian(exponent 相加,still quadratic)。經過代數整理(completing the square),就得到上面 5 條 Update 方程。

🎯 MMSE Optimality 嘅意義:喺 linear-Gaussian 假設下,Kalman Filter 嘅 estimate x^k∣k\hat{\boldsymbol{x}}_{k|k}x^k∣k​ 同時係:

  • MMSE estimator:minimize E[∥xk−x^k∥2]\mathbb{E}[\|\boldsymbol{x}_k - \hat{\boldsymbol{x}}_k\|^2]E[∥xk​−x^k​∥2]

  • MAP estimator:maximize p(xk∣z1:k)p(\boldsymbol{x}_k | \boldsymbol{z}_{1:k})p(xk​∣z1:k​)

  • BLUE estimator(Best Linear Unbiased Estimator)

三個 criteria 一致——呢個係好少數情況下「優化準則 ambiguity 消失」嘅 happy case,亦解釋咗點解 KF 喺工程上咁穩定。

NumPy 實作:50 行寫齊核心 logic

pythonimport numpy as np class KalmanFilter: def __init__(self, F, H, Q, R, x0, P0, B=None): """ F: (n, n) state transition H: (m, n) observation matrix Q: (n, n) process noise cov R: (m, m) measurement noise cov x0: (n,) initial state mean P0: (n, n) initial state cov B: (n, p) control matrix (optional) """ self.F, self.H, self.Q, self.R = F, H, Q, R self.x, self.P = x0.copy(), P0.copy() self.B = B self.I = np.eye(F.shape[0]) def predict(self, u=None): self.x = self.F @ self.x if u is not None and self.B is not None: self.x = self.x + self.B @ u self.P = self.F @ self.P @ self.F.T + self.Q return self.x.copy() def update(self, z): y = z - self.H @ self.x # innovation S = self.H @ self.P @ self.H.T + self.R # innovation cov K = self.P @ self.H.T @ np.linalg.inv(S) # Kalman gain self.x = self.x + K @ y # Joseph form 更 numerically stable: # self.P = (I - KH) P (I - KH)ᵀ + K R Kᵀ self.P = (self.I - K @ self.H) @ self.P return self.x.copy() def mahalanobis(self, z): """用嚟 data association(SORT / DeepSORT 用嘅 gating)。""" y = z - self.H @ self.x S = self.H @ self.P @ self.H.T + self.R return float(y @ np.linalg.inv(S) @ y)

用嚟 track 一個 2D constant-velocity 物件

pythondt = 1.0 F = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1]]) H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]]) Q = np.diag([0.01, 0.01, 0.1, 0.1]) # 細加速度 noise R = np.diag([1.0, 1.0]) # detector noise x0 = np.array([0.0, 0.0, 1.0, 0.0]) # 估計從 (0,0) 開始以 vₓ=1 移動 P0 = np.eye(4) * 10.0 kf = KalmanFilter(F, H, Q, R, x0, P0) for t, z in enumerate(measurements): # noisy 2D detections kf.predict() kf.update(z) print(f"t={t}: pos=({kf.x[0]:.2f}, {kf.x[1]:.2f}), vel=({kf.x[2]:.2f}, {kf.x[3]:.2f})")

🔑 Numerical Stability Tip:production code 永遠用 Joseph form 更新 P,因為 floating-point error 可能令 (I−KH)P(\boldsymbol{I} - \boldsymbol{KH})\boldsymbol{P}(I−KH)P 失去 symmetry / positive-definiteness。Joseph form (I−KH)P(I−KH)T+KRKT(\boldsymbol{I}-\boldsymbol{KH})\boldsymbol{P}(\boldsymbol{I}-\boldsymbol{KH})^T + \boldsymbol{KRK}^T(I−KH)P(I−KH)T+KRKT guaranteed 保 symmetry。Apollo 嗰陣已經發現呢個問題。

Case Study 1:SORT / DeepSORT 入面嘅 KF

SORT (2016) 同 DeepSORT (2017) 係 multi-object tracking 嘅經典 baseline,到 2026 年仲係好多 production system 嘅默認 tracker。佢哋嘅 backbone 就係一個 8 維 constant-velocity Kalman Filter:

State Vector

x=[u,v,s,r,u˙,v˙,s˙]T\boldsymbol{x} = [u, v, s, r, \dot{u}, \dot{v}, \dot{s}]^Tx=[u,v,s,r,u˙,v˙,s˙]T
  • u,vu, vu,v:bbox 中心
  • sss:bbox 面積(scale)
  • rrr:aspect ratio(假設 constant,唔 track 速度)
  • 點解 r 冇速度?因為大部分物件 aspect ratio 變化好細,加 r˙\dot{r}r˙ 反而加 noise

Predict-Update Cycle 同 detector 結合

Loading diagram...

Mahalanobis Gating

DeepSORT 用 KF predict 嘅 covariance 計 Mahalanobis distance 嚟 reject 唔合理 association:

d(1)(i,j)=(zj−Hx^i)TSi−1(zj−Hx^i)d^{(1)}(i, j) = (\boldsymbol{z}_j - \boldsymbol{H}\hat{\boldsymbol{x}}_i)^T \boldsymbol{S}_i^{-1} (\boldsymbol{z}_j - \boldsymbol{H}\hat{\boldsymbol{x}}_i)d(1)(i,j)=(zj​−Hx^i​)TSi−1​(zj​−Hx^i​)

如果 d(1)>χ0.952(4)=9.4877d^{(1)} > \chi^2_{0.95}(4) = 9.4877d(1)>χ0.952​(4)=9.4877 → reject。

🎯 SORT 嘅 Lesson:一個 1960 年嘅 algorithm,配上一個 modern detector,就足以喺 MOT challenge 攞高分。複雜嘅 deep learning tracker(如 ByteTrack、OC-SORT)本質上都係 SORT + 一啲 modification——KF 一直都係靈魂。

Case Study 2:IMU + GPS Sensor Fusion

智能手機、無人機、自駕車入面,IMU(100Hz, drift)+ GPS(1Hz, accurate but jittery)嘅 fusion 係 KF 嘅 textbook application。

Sensor特性對 KF 嘅角色
IMU (accel + gyro)高頻率 (100-1000Hz)、長期 drift、短期準Process model(用嚟 predict)
GPS低頻率 (1-10Hz)、絕對位置、有 jitterMeasurement(用嚟 correct drift)
Magnetometer絕對方向、受磁場干擾Measurement(correct heading)
Wheel encoder高頻、accumulate error、轉彎打滑Measurement (擺喺 H 入面)

⚡ Trade-off 嘅美感:IMU 短期準、GPS 長期準——KF 完美咁將兩者結合。冇 KF 嘅手機定位會抖到嚇親,冇 GPS 嘅純 IMU 行幾秒就 drift 幾米。Q\boldsymbol{Q}Q(IMU noise)同 R\boldsymbol{R}R(GPS noise)嘅 tuning 就係 navigation engineer 嘅 art。

Extended Kalman Filter (EKF):處理 Non-Linearity

現實系統好少係 linear——車軚轉彎、camera projection、機械臂 forward kinematics 都係 nonlinear。

Setup

xk=f(xk−1,uk)+wk\boldsymbol{x}_k = f(\boldsymbol{x}_{k-1}, \boldsymbol{u}_k) + \boldsymbol{w}_kxk​=f(xk−1​,uk​)+wk​ zk=h(xk)+vk\boldsymbol{z}_k = h(\boldsymbol{x}_k) + \boldsymbol{v}_kzk​=h(xk​)+vk​

EKF 嘅核心 idea:Linearize via Jacobian

喺當前 estimate 附近用 Taylor 展開 first-order:

Fk≈∂f∂x∣x^k−1∣k−1,Hk≈∂h∂x∣x^k∣k−1\boldsymbol{F}_k \approx \left.\frac{\partial f}{\partial \boldsymbol{x}}\right|_{\hat{\boldsymbol{x}}_{k-1|k-1}}, \quad \boldsymbol{H}_k \approx \left.\frac{\partial h}{\partial \boldsymbol{x}}\right|_{\hat{\boldsymbol{x}}_{k|k-1}}Fk​≈∂x∂f​​x^k−1∣k−1​​,Hk​≈∂x∂h​​x^k∣k−1​​

之後 predict / update 嘅 5 條方程一樣,只係 state propagation 用真 f,hf, hf,h,covariance 傳播用 Jacobian。

經典應用:EKF-SLAM

Loading diagram...

機器人位置 + 所有 landmark 位置一齊放入 state vector,sensor model(如 range-bearing)係 nonlinear → EKF。Calculus 上漂亮,但 state 維度爆炸(n landmarks → 2n+3 維),covariance 矩陣係 dense 嘅 → 後嚟被 FastSLAM、Graph SLAM 取代。

⚠️ EKF 嘅死穴:linearization error。如果系統高度 nonlinear(例如車軚急轉),first-order Taylor 展開唔夠準,EKF 嘅 covariance 估錯 → filter 會 over-confident → 之後 reject 正確 measurement → 進一步 drift → filter divergence。

Unscented Kalman Filter (UKF):Sigma Points 嘅藝術

UKF (Julier & Uhlmann, 2000) 用 deterministic sampling 取代 linearization:

  1. 由 (μ,Σ)(\boldsymbol{\mu}, \boldsymbol{\Sigma})(μ,Σ) 生成 2n+12n+12n+1 個 sigma points(cover ±σ 嘅幾個關鍵方向)
  2. 將每個 sigma point 透過 nonlinear fff / hhh propagate(無需 Jacobian)
  3. 由 transformed sigma points 重新計算 weighted mean + covariance
Loading diagram...
FilterLinearizationAccuracy on nonlinearCost
KF不適用(要求 linear)—O(n³)
EKFFirst-order Jacobian衰(高度 nonlinear)O(n³) + 計 Jacobian
UKFSigma point samplingUp to 3rd order momentO(n³),但常數較大

💡 UKF 嘅實戰優勢:

  1. 唔使計 Jacobian——對於複雜 dynamics(如四旋翼飛行、機械臂)大幅減 implementation effort

  2. Nonlinearity 高嘅情況 accuracy 明顯好過 EKF

  3. Discontinuity 都 work(EKF 喺 discontinuity 上會死,因為 derivative undefined)

但 UKF 同樣假設 noise 係 Gaussian——遇到 multi-modal posterior 仲係要上 Particle Filter。

同 Particle Filter 嘅對比

Loading diagram...
FilterPosterior 表達LinearityGaussianityCost
KF(μ,Σ)(\boldsymbol{\mu}, \boldsymbol{\Sigma})(μ,Σ)Linear onlyGaussian onlyO(n³)
EKF(μ,Σ)(\boldsymbol{\mu}, \boldsymbol{\Sigma})(μ,Σ)Mildly nonlinearGaussian onlyO(n³)
UKF(μ,Σ)(\boldsymbol{\mu}, \boldsymbol{\Sigma})(μ,Σ)Highly nonlinearGaussian onlyO(n³)
Particle Filter{(x(i),w(i))}i=1N\{(\boldsymbol{x}^{(i)}, w^{(i)})\}_{i=1}^{N}{(x(i),w(i))}i=1N​AnyAny (包括 multi-modal)O(Nn),N 大

🎯 何時揀邊個?

  • System linear + Gaussian noise → KF(永遠係 optimal)

  • Mildly nonlinear, well-localized posterior → EKF(最常用嘅 SLAM、IMU fusion)

  • Highly nonlinear, 唔想計 Jacobian → UKF

  • Multi-modal posterior(如 robot kidnapping problem、ambiguous data association)→ Particle Filter

  • Posterior 維度極高 + sparse → Information Filter / Graph SLAM

Practical Tips:點 Tune Q 同 R?

呢個係 KF 工程上最痛嘅問題。Q(process noise)同 R(measurement noise)唔啱,filter 行為就會崩。

Tuning Strategy

症狀可能原因修正方向
Filter 反應慢,跟唔到 sudden motionQ 太細(過信 model)增加 Q(讓 KF 容許更大 motion deviation)
Filter 抖動嚴重,跟住 noise 行R 太細(過信 sensor)增加 R(讓 KF 平滑 measurement)
Innovation yky_kyk​ 一直 biasedModel 有 systematic error檢查 F / H 矩陣係咪 mis-specified
Covariance P 越嚟越細,但 estimate 衰Filter divergenceEKF:減細 step 或者轉 UKF;KF:檢查 Q 係咪太細

Innovation 統計診斷

如果 model 正確,Normalized Innovation Squared (NIS) 應該服從 χ2\chi^2χ2 分佈:

ϵk=ykTSk−1yk∼χm2\epsilon_k = \boldsymbol{y}_k^T \boldsymbol{S}_k^{-1} \boldsymbol{y}_k \sim \chi^2_{m}ϵk​=ykT​Sk−1​yk​∼χm2​

喺一個 sliding window 上面計平均,如果經常 outside 95% confidence interval → KF tuning 有問題。

pythondef chi2_check(innovations, S_matrices, window=100): """快速診斷 KF 係咪 tune 啱。""" nis = [y @ np.linalg.inv(S) @ y for y, S in zip(innovations, S_matrices)] avg_nis = np.mean(nis[-window:]) m = innovations[0].shape[0] # 預期 avg ≈ m,下界 χ²(0.025), 上界 χ²(0.975) return avg_nis, m

⚠️ 自動 tuning:可以用 EM 或者 ALS(Autocovariance Least Squares)由 innovation 序列自動估 Q, R——但需要 stationary system。對於 production tracker,通常 hand-tune + grid search 就夠。SORT 直接寫死 Q, R 而冇任何 adaptive logic——簡單就係強大。

PyTorch Differentiable Kalman Filter

2020 年代開始流行 將 KF 嵌入 deep learning 嘅 end-to-end pipeline——用 NN 學 Q, R, F,KF 做 inductive bias。

pythonimport torch import torch.nn as nn class DifferentiableKF(nn.Module): def __init__(self, state_dim, obs_dim): super().__init__() self.state_dim, self.obs_dim = state_dim, obs_dim # F, H 可以係 learnable 或者 fixed self.F = nn.Parameter(torch.eye(state_dim)) self.H = nn.Parameter(torch.zeros(obs_dim, state_dim)) # Q, R 用 NN 預測(input-dependent noise model) self.Q_net = nn.Sequential( nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, state_dim), ) self.R_net = nn.Sequential( nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, obs_dim), ) def forward(self, x, P, z): # Predict x_pred = x @ self.F.T Q = torch.diag_embed(self.Q_net(x).exp()) P_pred = self.F @ P @ self.F.T + Q # Update y = z - x_pred @ self.H.T R = torch.diag_embed(self.R_net(z).exp()) S = self.H @ P_pred @ self.H.T + R K = P_pred @ self.H.T @ torch.linalg.inv(S) x_new = x_pred + (y.unsqueeze(-1).transpose(-1, -2) @ K.transpose(-1, -2)).squeeze(-2) I = torch.eye(self.state_dim, device=x.device) P_new = (I - K @ self.H) @ P_pred return x_new, P_new

💡 Differentiable KF 嘅 use case:

  • Learnable sensor model:CNN feature → measurement,再 KF fuse(如 DeepVO、BackpropKF)

  • Input-dependent Q, R:用 NN 預測 noise(如 occluded frame 應該大 R)

  • End-to-end training:loss 可以由 final trajectory error backprop 到 NN

但記住:Gradient through matrix inverse 同 batched linear solve 喺 GPU 上面係 expensive 操作——production 通常仲係 NumPy/Eigen CPU。

限制同常見坑

1. Linear + Gaussian 假設太強

現實世界好少完全 linear,更少完全 Gaussian。重 tail(如 GPS occasional huge error)→ KF 會 over-react → divergence。Robust KF(Huber loss innovation、Student-t noise) 係 mitigation。

2. Cold Start 問題

Initial x0,P0\boldsymbol{x}_0, \boldsymbol{P}_0x0​,P0​ 揀錯 → 頭幾步 estimate 飄走。常見做法:P0\boldsymbol{P}_0P0​ 設大啲(high prior uncertainty),等 KF 自己 converge。

3. Observability

如果 state 嘅某啲維度從 measurement 上面 unobservable(如 SLAM 入面 robot 唔郁時嘅 landmark depth),KF 嘅 covariance 喺呢個方向永遠唔減 → estimate 飄走。需要 motion 提供 information。

4. Outliers

Detector miss 一個 frame、GPS 跳一下——一個 outlier 就可能令 KF 飄走。Mitigation:

  • Gating:用 Mahalanobis distance reject outlier
  • Robust update:Innovation 太大就跳過 update step(只 predict)

5. Asynchronous Sensors

IMU 100Hz、GPS 1Hz、camera 30Hz——同 KF 嘅 fixed-rate 假設衝突。Solution:

  • Multi-rate KF:predict step 用最快 sensor 嘅 rate,update step 喺各 sensor available 時做
  • Event-based KF:完全 asynchronous,每次 measurement 嚟到就 update

技術啟示

1. Bayesian Inference + Closed-Form 嘅威力

Kalman Filter 嘅本質係 Bayesian inference 喺特殊條件下嘅 closed-form solution。現代 deep learning 入面好多 trick(VAE、diffusion model、normalizing flow)都可以追溯到 Gaussian-based probabilistic reasoning——掌握 KF 等於掌握咗呢個傳統嘅核心。

2. Recursive 比 Batch 永遠更實用

KF 嘅 recursive structure 令佢可以喺 embedded device 上面 real-time 跑,唔需要 buffer 歷史 data。同樣嘅 philosophy 反映喺:

  • Online learning vs batch learning
  • Streaming sketches vs full pass
  • Transformer 嘅 KV cache vs full attention

3. Uncertainty 比 Point Estimate 更值錢

KF output 嘅唔只係 x^\hat{\boldsymbol{x}}x^,仲有 P\boldsymbol{P}P(covariance)。下游 decision making(如自駕車嘅 safety margin、SORT 嘅 gating)正正需要呢個 uncertainty——deep learning 點解咁多 model 走去學 uncertainty(Bayesian NN、ensemble)?因為佢哋仲未追到 KF 嘅水平。

4. Simple Beats Complex(when assumptions hold)

SORT 用 1960 年嘅 KF 喺 2016 年攞 SOTA,2026 年仲喺 production——只要 assumption 合理,simple model 永遠 beat complex model。Always check if a Kalman Filter would work before reaching for a neural network.

總結

Kalman Filter 唔係一個 algorithm——係一個 mindset:

  • 用 probability 表達 belief
  • 用 dynamics 推前
  • 用 measurement 修正
  • 永遠 track uncertainty

60 年來,由 Apollo 飛去月球,到你手機嘅 GPS、到 SORT 追物件、到 SLAM 起地圖——核心都係呢條遞推。深度學習革命之後,KF 唔但冇被淘汰,反而以 differentiable form 同 inductive bias 嘅角色再生。

一張 cheat sheet

場景推薦原因
Linear system + Gaussian noiseKFMMSE optimal,5 條公式搞掂
多目標追蹤(MOT)KF + Hungarian (SORT-family)Mahalanobis gating + constant velocity
IMU + GPS fusionEKF(mildly nonlinear)已成 navigation 業界 standard
Robot SLAMEKF / UKF / Graph SLAM細場景 EKF,大場景 Graph SLAM
Quadrotor / 機械臂 controlUKF高度 nonlinear,唔想計 Jacobian
Robot kidnapping / ambiguousParticle FilterMulti-modal posterior
End-to-end learnable systemDifferentiable KFNN 學 Q, R, F,KF 做 inductive bias

相關資源

  • 📄 原始論文:Kalman 1960 — A New Approach to Linear Filtering and Prediction Problems
  • 📚 教科書 (必讀):Thrun, Burgard, Fox — Probabilistic Robotics (MIT Press, 2005)
  • 📚 應用導向:Bar-Shalom — Estimation with Applications to Tracking and Navigation
  • 💻 Python 庫:FilterPy(Roger Labbe 寫嘅 KF/EKF/UKF/PF 庫)
  • 📖 Tutorial(強烈推薦):Kalman and Bayesian Filters in Python(Jupyter notebook 互動學習)
  • 🔗 SORT:arXiv:1602.00763
  • 🔗 DeepSORT:arXiv:1703.07402
  • 🔗 BackpropKF:arXiv:1605.07148(differentiable KF 嘅 seminal paper)
  • 🔗 UKF:Julier & Uhlmann, Unscented filtering and nonlinear estimation, Proc. IEEE 2004
  • 🔗 延伸閱讀:RTMO:點樣將 Coordinate Classification 塞入 YOLO,做到 One-Stage Real-Time 多人 Pose Estimation?(multi-person detection,可以接 KF 做 tracking)

Kalman Filter 教咗我哋一件事:好嘅 estimator 唔係冇 noise,而係識得 quantify noise。一個會講「我估佢喺度,但我都唔太肯定」嘅 system,永遠比「我覺得佢一定喺度」嘅 system 更可靠。下次 design system 嘅時候,問自己一句:你個 model 識唔識講「我唔知」? 📡✨

Back to all articles
目錄