Coarse Solver × Fine Solver：Parareal 點樣將時間都可以「並行」？由 Lions 2001 到 Neural-Parareal、RandNet-Parareal

📄 原始論文：Lions, Maday, Turinici, "Résolution d'EDP par un schéma en temps pararéel" (C. R. Acad. Sci., 2001) 🧪 Wikipedia entry：Parareal 🤖 Modern variants：PPINN (Karniadakis 2020)、Neural-Parareal (ITER 2024)、RandNet-Parareal (NeurIPS 2024)、Spectral Coarse Solver (Gander et al. 2025)

TL;DR

問題：解 ODE / PDE 嘅時候，時間方向係 inherently sequential——你要算 $u(t_1)$ 先算到 $u(t_2)$ 。空間方向 parallelize 好容易（domain decomposition），但時間方向點 parallelize？

Parareal 嘅答案：用兩個 solver。

⚙️ 核心 trick

Coarse solver $G$ ：cheap、quick、唔太準。負責sequentially 喺成條時間軸做粗略預測。

Fine solver $F$ ：expensive、accurate。負責喺每個 time slice 內部 parallel 做精細修正。

用一個 predictor-corrector iteration 將 $G$ 嘅粗答案逐步推向 $F$ 嘅精確答案。

經過幾個 iteration $K$ 之後就 converge，speedup ≈ $N/K$ （ $N$ = time slice 數量）。

呢個 idea 由 Lions、Maday、Turinici 喺 2001 年提出，原本係解 fluid dynamics、分子動力學、climate simulation。但近幾年呢個框架同 deep learning 撞撞下擦出新火花——用 neural network 做 coarse solver，speedup 由 ~10× 推到 100×+。今篇 blog 拆 coarse / fine solver 嘅 mathematical structure、convergence、同最新 neural variants。

為咩時間方向咁難 parallelize？

考慮一個簡單 ODE：

\frac{du}{dt} = f(u, t), \quad u(0) = u_0, \quad t \in [0, T]

如果你想喺電腦解：將 $[0, T]$ split 做好多細 step，逐 step 推：

u_{n+1} = u_n + \Delta t \cdot f(u_n, t_n) \quad \text{(forward Euler)}

呢個 update 有個根本嘅 sequential dependency： $u_{n+1}$ 要用 $u_n$ 。你冇可能直接「跳去」算 $u_{1000}$ ，因為你冇 $u_{999}$ 。

🚫 Sequential bottleneck
假設你解 1 億個 time step、每個 step 要 1ms，serial 計時 = 27 個鐘。就算你有 1000 隻 GPU，傳統 sequential time stepping 用唔到佢哋——時間方向冇 parallelism 可以食。

對比之下，空間方向好容易 parallelize：你可以將 spatial mesh split 做幾百份、每份扔俾一隻 GPU 算（domain decomposition）。但 spatial parallelism 有 limit——當 mesh 唔再夠細粒度俾你 split 嘅時候，再加 GPU 都冇用。

呢個就係 parallel-in-time (PinT) 方法嘅出發點：喺時間方向都搞啲 parallelism 出嚟。

Parareal 嘅核心 idea：兩個 solver 互補

Parareal 將時間軸 $[0, T]$ 切成 $N$ 個 sub-interval：

0 = T_0 < T_1 < T_2 < \cdots < T_N = T

每個 sub-interval $[T_n, T_{n+1}]$ 嘅長度係 $\Delta T = T/N$ 。

然後準備兩個 solver：

Loading diagram...

Solver	特性	Cost per sub-interval	Accuracy
Coarse $G$	大 step、low-order、簡化 model	$\tau_G$ （細）	低
Fine $F$	細 step、high-order、full model	$\tau_F$ （大）	高

通常 $\tau_F \gg \tau_G$ （一個 sub-interval fine solver 可能慢過 coarse solver 100 倍）。呢個 cost gap 就係 Parareal 食嘅 speedup。

Coarse 同 fine 點樣分工？

🎯 核心分工

Coarse solver $G$ ：sequentially 喺成條時間軸推一次粗答案，作為「初始 guess」。

Fine solver $F$ ：喺每個 sub-interval 內部 parallel 跑——已經有 coarse 提供嘅初始值，每個 sub-interval 之間冇 sequential dependency 啦。

然後用 predictor-corrector update 修正 coarse 嘅錯，再 iterate。

關鍵 insight：每個 fine solve 都係 independent 嘅——只要你提供 sub-interval 嘅起點，佢就可以喺自己嗰個 GPU 跑，唔需要等其他 sub-interval。Parallelism 喺呢度出。

Parareal Algorithm：完整 mathematical formulation

Notation

$U_n^k$ ：iteration $k$ 喺 time $T_n$ 嘅 Parareal solution
$G(U_n^k)$ ：用 coarse solver 喺 $[T_n, T_{n+1}]$ propagate $U_n^k$
$F(U_n^k)$ ：用 fine solver 喺 $[T_n, T_{n+1}]$ propagate $U_n^k$
$U_{\text{exact}}(T_n)$ ：真實 solution

Initialization (k = 0)：純 coarse pass

Sequentially 跑 coarse solver 一次：

U_0^0 = u_0, \quad U_{n+1}^0 = G(U_n^0), \quad n = 0, 1, \ldots, N-1

呢步係 sequential 嘅，但因為 $G$ 平，咁所以快——通常 $N \cdot \tau_G \ll \tau_F$ 都可能。

Iteration $k \to k+1$ ：predictor-corrector

呢個係 Parareal 嘅靈魂。對每個 $n$ ：

\boxed{ U_{n+1}^{k+1} = G(U_n^{k+1}) + F(U_n^k) - G(U_n^k) }

逐個 term 解釋：

Term	意義	運算 stage
$F(U_n^k)$	用上一輪嘅 initial value $U_n^k$ 跑 fine solver，得到 accurate prediction	Parallel across $n$ （key step）
$G(U_n^k)$	同上，但用 coarse solver——已經喺 iteration $k$ 算過，cache 住	已經有
$F(U_n^k) - G(U_n^k)$	Coarse solver 嘅 error——係 fine 同 coarse 之間嘅 discrepancy	—
$G(U_n^{k+1})$	用新嘅 initial value $U_n^{k+1}$ 再跑一次 coarse	Sequential across $n$ （forced）

點解 update 係咁？

直覺解釋：

💡 Predictor-Corrector Insight

Fine solver $F$ 太貴，唔可以每個 iteration 都重新 sequentially propagate 過晒。

所以用 coarse $G$ 做 sequential propagation（cheap），再用 $F - G$ 嘅 difference 做 correction。

$F(U_n^k) - G(U_n^k)$ 就係「coarse 嘅錯有幾大」——將呢個 correction 加返落新嘅 coarse propagation 上面。

Iteration 越多次， $U^k$ 越 converge 到真值， $F - G$ 嘅 error 自然就會被「self-correct」。

數學上嘅 fixed point

如果 algorithm converge 到 $U^* = U^{k+1} = U^k$ ，update 變成：

U_{n+1}^* = G(U_n^*) + F(U_n^*) - G(U_n^*) = F(U_n^*)

即係Parareal 嘅 converged solution = 完全用 fine solver sequentially 跑出嚟嘅 solution。所以：

✅ Parareal 唔係 approximation
喺 converge 之後，你得到嘅就係 fine solver 直接 serial 跑嘅 exact 結果。Parareal 唔會犧牲精度——佢係用更少嘅 wall time 拎到同 fine serial 一樣嘅 answer。

點樣 parallelize？

Loading diagram...

每個 iteration 嘅 wall time：

T_{\text{iter}} = \underbrace{\tau_F}_{\text{parallel fine, on N processors}} + \underbrace{N \cdot \tau_G}_{\text{sequential coarse correction}}

對比 serial fine solver： $T_{\text{serial}} = N \cdot \tau_F$ 。

Convergence：點解經過幾個 iteration 就夠？

Theoretical bound

對 linear ODE，可以證 Parareal 嘅 error 隨 iteration superlinearly decay：

\|U_n^k - U_{\text{exact}}(T_n)\| \leq C \cdot \frac{(C \cdot N)^k}{k!} \cdot \|G - F\|_{\text{op}}^k

幾個重要意思：

Property	Implication
Error decay 有 $k!$ 喺分母	Superlinear convergence——通常 5-10 個 iteration 就夠
Error 係 $G - F$ 嘅函數	Coarse 越接近 fine，越快收斂
Error 同 $N$ 有 $(CN)^k$ 嘅關係	$N$ 太大會延遲 convergence——sub-interval 唔好太多

實際上 K iterations 夠用？

實際 PDE / ODE 入面，通常 $K = 5 \sim 20$ iteration 就 converge 到 single-precision 精度。對應嘅 speedup：

\text{Speedup} = \frac{T_{\text{serial}}}{T_{\text{Parareal}}} = \frac{N \cdot \tau_F}{K \cdot (\tau_F + N \tau_G)} \approx \frac{N}{K} \quad \text{(when } N \tau_G \ll \tau_F\text{)}

📊 Speedup intuition

如果 $N = 100$ slice， $K = 10$ iteration converge：理論 speedup ≈ 10×

如果 $N = 1000$ slice， $K = 10$ iteration converge：理論 speedup ≈ 100×

但實際永遠細啲，因為 coarse solver 都食時間，加上 communication overhead

Coarse solver 嘅選擇：traditional approaches

點樣揀 $G$ ？呢個係 Parareal 一直最重要嘅 design decision。

Strategy	Idea	Pros	Cons
Bigger time step	$G$ 同 $F$ 用同一個 method，但 $G$ 用大 100× 嘅 $\Delta t$	實作簡單	Coarse error 可能 dominant
Lower-order method	$F$ 用 RK4， $G$ 用 forward Euler	實作簡單	Stiff problems 唔 work
Coarser spatial mesh	$F$ mesh = $1024^2$ ， $G$ mesh = $64^2$	幾乎免費（已經有 coarse mesh）	需要 inter-grid transfer operator
Reduced-order model	Project 落 low-dim subspace（POD、Krylov）	可以好快	Build POD basis 要先有 snapshot
Simplified physics	$F$ = full Navier-Stokes， $G$ = Stokes（drop nonlinear term）	有 physical insight	Domain-specific

呢啲 traditional 方法有個共通限制：coarse solver 仍然係手寫嘅 numerical scheme，accuracy 受限於人工設計。如果 coarse 同 fine 嘅 gap 太大，convergence 會慢；如果 gap 太細，coarse 又唔夠平。

呢個 tension 直接帶我哋去近年最熱嘅方向——用 neural network 做 coarse solver。

Neural network 做 coarse solver：新一波 Parareal

PPINN (Karniadakis et al., 2020)

📄 Paper：Meng et al., PPINN: Parareal physics-informed neural network for time-dependent PDEs, CMAME 2020

PPINN 嘅 setup 同經典 Parareal 倒轉：

🔄 PPINN 嘅 twist

Coarse $G$ ：傳統 numerical solver（fast, low accuracy）

Fine $F$ ：Physics-Informed Neural Network (PINN) 喺每個 sub-interval train 一個小 PINN

點解咁做？因為 train 一個 PINN 跨成個 $[0, T]$ 嘅 long time horizon 好難 converge——training 太貴 + 容易 mode collapse

Decompose 落短 sub-interval 之後，每個 PINN 只 train 一段細時間，獨立 train、獨立 inference

Loading diagram...

PPINN 喺 long-time integration 上面嘅 advantage：train 短段時間 PINN 比 train 整段更穩定、更易 converge。

Parareal with PINN as Coarse Propagator (2023)

📄 Paper：Ibrahim et al., Parareal with a physics-informed neural network as coarse propagator, Euro-Par 2023

呢篇就真正用 PINN 做 coarse solver：

🚀 Idea

PINN 訓練一次之後，inference 極快——一次 forward pass

用 GPU 跑 PINN coarse propagator，CPU 跑 fine numerical solver

Heterogeneous hardware utilization：唔再浪費 GPU 喺 coarse 嘅 mesh-based computation

Demonstrated on Black-Scholes equation：speedup 比傳統 numerical coarse solver 好

呢個 setup 第一次認真探討「neural coarse solver 唔單止 cheap，仲可以更 accurate」嘅可能性。

Neural-Parareal (ITER, 2024)

📄 Paper：Pamela et al., Neural-Parareal: Self-improving acceleration of fusion MHD simulations, Comp. Phys. Commun. 2024

呢個係 fusion plasma simulation（ITER tokamak）嘅 production use case。創新點：dynamic、self-improving coarse solver。

Loading diagram...

核心思想：Parareal 嘅 fine solver 自然會生成大量 high-fidelity training data——用呢啲 data train neural operator 做下一輪嘅 coarse solver。越跑越快。

🔬 Fusion MHD 嘅意義
ITER 模擬一個完整 tokamak discharge 喺現代 HPC cluster 都要好幾日。Neural-Parareal 將呢類 simulation 變到 deployable，係 fusion plasma research 嘅 enabler。

RandNet-Parareal (NeurIPS 2024)

📄 Paper：Gattiglio, Grigoryeva, Tamborrino, RandNet-Parareal: a time-parallel PDE solver using Random Neural Networks, NeurIPS 2024

呢篇係近年 Parareal × NN 最 elegant 嘅工作。核心觀察：

✨ RandNet 嘅 trick

Train PINN 或 neural operator 好貴——可能比 Parareal 慳嘅時間更貴

用 Random Neural Networks (RandNets)：weights 全部 random、只 train 最後一層 linear readout（即係 ELM / Reservoir Computing 風格）

Train cost ≈ 一個 linear regression

學嘅唔係 solution 本身，而係 coarse - fine 嘅 discrepancy

結果：相比 fine solver serial 嘅 125× speedup，相比經典 Parareal 嘅 22× speedup

呢個 setup 嘅 beauty：冇 deep learning training overhead，但又拎到 neural-quality 嘅 coarse correction。Universal approximator without training cost。

Parareal with Spectral Coarse Solver (Gander et al., 2025)

📄 Paper：Gander, Ohlberger, Rave, A Parareal Algorithm with Spectral Coarse Solver, 2025

最新工作（2025）。不用 neural network，但用 randomized SVD + reduced basis methods 構建 coarse solver。

核心思想：Coarse solver = local truncated singular value decomposition of fine transfer operator。透過 randomized algorithms 將 SVD 變到 embarrassingly parallel——predict 階段嘅 spectral approximation 比 single-step coarse solver 強好多。

呢個方向證明：neural network 唔係必須，spectral / reduced-order coarse solvers 仍然有 lots of room。

Coarse solver 嘅選擇 cheat sheet（2026 版）

Coarse solver type	訓練成本	Inference speed	精度	最適合
Bigger time step	0	非常快	低	Smooth, non-stiff ODE
Coarser mesh	0	快	中	Multi-scale PDE（已經有 multigrid）
Reduced-order model (POD)	中（要 snapshot）	快	中-高	Repeated parametric runs
PINN	高（每個 sub-interval / parameter 都要 retrain）	非常快（forward pass）	中-高	Parametric study、limited data 場景
Neural Operator (FNO, DeepONet)	非常高（一次性，offline）	非常快	高	Production HPC（fusion、weather）
RandNet	低（只 train readout layer）	非常快	中-高	Best of both worlds
Spectral / Reduced Basis	中（randomized SVD）	快	高	有 spectral structure 嘅 PDE

實作示範：簡單 1D heat equation

pythonimport numpy as np
from concurrent.futures import ProcessPoolExecutor

# 1D heat equation: u_t = alpha * u_xx
def fine_solver(u0, dt_fine, n_steps, alpha, dx):
    """High-accuracy fine solver: small dt, RK4."""
    u = u0.copy()
    for _ in range(n_steps):
        # RK4 update with small dt
        k1 = alpha * laplacian(u, dx)
        k2 = alpha * laplacian(u + 0.5 * dt_fine * k1, dx)
        k3 = alpha * laplacian(u + 0.5 * dt_fine * k2, dx)
        k4 = alpha * laplacian(u + dt_fine * k3, dx)
        u = u + (dt_fine / 6) * (k1 + 2*k2 + 2*k3 + k4)
    return u

def coarse_solver(u0, dT, alpha, dx):
    """Cheap coarse solver: one big forward Euler step."""
    return u0 + dT * alpha * laplacian(u0, dx)

def laplacian(u, dx):
    """Second-order central difference."""
    lap = np.zeros_like(u)
    lap[1:-1] = (u[2:] - 2*u[1:-1] + u[:-2]) / dx**2
    return lap

def parareal(u0, T, N, K, alpha, dx, dt_fine):
    """Parareal algorithm.
    
    Args:
        u0: initial condition (spatial array)
        T: final time
        N: number of time slices
        K: max iterations
        alpha: diffusion coeff
        dx: spatial step
        dt_fine: fine solver time step
    """
    dT = T / N
    n_fine_steps = int(dT / dt_fine)
    
    # Storage: U[k][n] = solution at iteration k, time T_n
    U = [[None] * (N + 1) for _ in range(K + 1)]
    U[0][0] = u0
    
    # ===== Iteration 0: pure coarse pass (sequential) =====
    for n in range(N):
        U[0][n+1] = coarse_solver(U[0][n], dT, alpha, dx)
    
    # ===== Parareal iterations =====
    for k in range(K):
        # Step 1: fine solver in PARALLEL across all sub-intervals
        with ProcessPoolExecutor(max_workers=N) as pool:
            fine_results = list(pool.map(
                fine_solver,
                [U[k][n] for n in range(N)],
                [dt_fine] * N,
                [n_fine_steps] * N,
                [alpha] * N,
                [dx] * N,
            ))
        
        # Step 2: coarse solver on previous iteration (cached)
        coarse_old = [coarse_solver(U[k][n], dT, alpha, dx) for n in range(N)]
        
        # Step 3: Predictor-corrector update (sequential but cheap)
        U[k+1][0] = u0
        for n in range(N):
            coarse_new = coarse_solver(U[k+1][n], dT, alpha, dx)
            # Predictor-corrector: U_{n+1}^{k+1} = G(U_n^{k+1}) + F(U_n^k) - G(U_n^k)
            U[k+1][n+1] = coarse_new + fine_results[n] - coarse_old[n]
        
        # Check convergence
        max_change = max(
            np.linalg.norm(U[k+1][n] - U[k][n])
            for n in range(N + 1)
        )
        print(f"Iteration {k+1}: max change = {max_change:.6e}")
        if max_change < 1e-8:
            print(f"Converged at iteration {k+1}")
            return U[k+1]
    
    return U[K]

呢個 minimal implementation 演示晒兩個關鍵點：

Fine solver call 喺 ProcessPoolExecutor 入面 parallelize——呢度係速度嘅嚟源
Predictor-corrector update 仍然 sequential，但因為只 call coarse solver，所以快

實際 production code（XBraid、PFASST、libpfasst 等）會處理 stiff equation、adaptive time step、multilevel hierarchies 等等，但核心算法仲係呢個。

Limitations 同 open problems

Hyperbolic / advection-dominated PDEs

⚠️ Parareal 嘅死穴
對 hyperbolic PDE（advection、wave equation），Parareal 出名地唔 converge 或者 converge 慢——因為呢類 PDE 冇 dissipation，coarse 嘅 phase error 唔會被 damped。

即係話：Parareal 喺 parabolic PDE（heat、diffusion）work 好好，但 wave-like 嘅 problem 仲未完全解決。

近年嘅 workaround：

PFASST（Parallel Full Approximation Scheme in Space and Time）：multilevel 嘅 Parareal
Krylov-enhanced Parareal：加 Krylov subspace acceleration
ParaExp：specialize 喺 oscillatory ODE

Strong scaling limit

Speedup 上限係 $N/K$ 。如果 problem 本身需要 $K = O(N)$ iteration 先 converge，Parareal 完全冇 speedup——慘到比 serial 仲慢（因為要算 coarse pass）。

Communication overhead

每個 iteration 結束之後，所有 processor 要 sync 同交換 sub-interval boundary value。N 大嘅時候 communication dominate runtime。

Coarse solver design 仍然係 art

雖然有 RandNet / NO / spectral 等新 approach，但冇一個 universal recipe——每個 problem 都要 case-by-case 調 coarse solver。

總結：點解 coarse vs fine solver 係一個值得深入嘅 idea

🎯 5 個 takeaway

時間方向唔係 inherently sequential——只要你接受「approximate first, correct later」嘅範式，時間都可以 parallelize

Coarse / fine 嘅分工係 universal pattern——唔單止喺 PDE，喺 multi-scale modeling、agentic AI、optimizer pipeline 都見到呢個結構

Convergence 速度由 coarse-fine gap 決定——所以 neural network coarse solver 越 accurate 越好

RandNet 證明咗：你唔需要 expensive training 都可以拎到 neural-quality coarse solver

Hyperbolic PDE 仍然係 open problem——下一代 Parareal 可能要結合 implicit method 或 multilevel hierarchy 先解決

呢個 algorithm 喺 2001 年提出嘅時候，主要 use case 係 fluid dynamics / climate simulation。但 25 年之後嘅今日，深度學習嘅興起反而令 Parareal 嘅 coarse solver 設計擁有前所未有嘅 flexibility——neural operator、PINN、RandNet 三條路線同時推進，speedup 由 ~10× 推到 100×+。

更深一層嘅啟示：「sequential inherently → parallel via correction」係一個 transferable pattern。**Speculative decoding（LLM inference）**就係一個直接嘅類比——一個 small draft model（coarse）做 quick prediction，再由 large target model（fine）做 verification。同一個哲學，不同領域。下一次當你見到一個 sequential bottleneck 嘅時候，諗下：可唔可以將佢拆成 coarse + fine？

\frac{du}{dt} = f(u, t), \quad u(0) = u_0, \quad t \in [0, T]

如果你想喺電腦解：將 $[0, T]$ split 做好多細 step，逐 step 推：

u_{n+1} = u_n + \Delta t \cdot f(u_n, t_n) \quad \text{(forward Euler)}

呢個 update 有個根本嘅 sequential dependency： $u_{n+1}$ 要用 $u_n$ 。你冇可能直接「跳去」算 $u_{1000}$ ，因為你冇 $u_{999}$ 。

🚫 Sequential bottleneck
假設你解 1 億個 time step、每個 step 要 1ms，serial 計時 = 27 個鐘。就算你有 1000 隻 GPU，傳統 sequential time stepping 用唔到佢哋——時間方向冇 parallelism 可以食。

呢個就係 parallel-in-time (PinT) 方法嘅出發點：喺時間方向都搞啲 parallelism 出嚟。

Parareal 嘅核心 idea：兩個 solver 互補

Parareal 將時間軸 $[0, T]$ 切成 $N$ 個 sub-interval：

0 = T_0 < T_1 < T_2 < \cdots < T_N = T

每個 sub-interval $[T_n, T_{n+1}]$ 嘅長度係 $\Delta T = T/N$ 。

然後準備兩個 solver：

Loading diagram...

Solver	特性	Cost per sub-interval	Accuracy
Coarse $G$	大 step、low-order、簡化 model	$\tau_G$ （細）	低
Fine $F$	細 step、high-order、full model	$\tau_F$ （大）	高

通常 $\tau_F \gg \tau_G$ （一個 sub-interval fine solver 可能慢過 coarse solver 100 倍）。呢個 cost gap 就係 Parareal 食嘅 speedup。

Coarse 同 fine 點樣分工？

🎯 核心分工

Coarse solver $G$ ：sequentially 喺成條時間軸推一次粗答案，作為「初始 guess」。

Fine solver $F$ ：喺每個 sub-interval 內部 parallel 跑——已經有 coarse 提供嘅初始值，每個 sub-interval 之間冇 sequential dependency 啦。

然後用 predictor-corrector update 修正 coarse 嘅錯，再 iterate。

Parareal Algorithm：完整 mathematical formulation

Notation

$U_n^k$ ：iteration $k$ 喺 time $T_n$ 嘅 Parareal solution
$G(U_n^k)$ ：用 coarse solver 喺 $[T_n, T_{n+1}]$ propagate $U_n^k$
$F(U_n^k)$ ：用 fine solver 喺 $[T_n, T_{n+1}]$ propagate $U_n^k$
$U_{\text{exact}}(T_n)$ ：真實 solution

Initialization (k = 0)：純 coarse pass

Sequentially 跑 coarse solver 一次：

U_0^0 = u_0, \quad U_{n+1}^0 = G(U_n^0), \quad n = 0, 1, \ldots, N-1

呢步係 sequential 嘅，但因為 $G$ 平，咁所以快——通常 $N \cdot \tau_G \ll \tau_F$ 都可能。

Iteration $k \to k+1$ ：predictor-corrector

呢個係 Parareal 嘅靈魂。對每個 $n$ ：

\boxed{ U_{n+1}^{k+1} = G(U_n^{k+1}) + F(U_n^k) - G(U_n^k) }

逐個 term 解釋：

Term	意義	運算 stage
$F(U_n^k)$	用上一輪嘅 initial value $U_n^k$ 跑 fine solver，得到 accurate prediction	Parallel across $n$ （key step）
$G(U_n^k)$	同上，但用 coarse solver——已經喺 iteration $k$ 算過，cache 住	已經有
$F(U_n^k) - G(U_n^k)$	Coarse solver 嘅 error——係 fine 同 coarse 之間嘅 discrepancy	—
$G(U_n^{k+1})$	用新嘅 initial value $U_n^{k+1}$ 再跑一次 coarse	Sequential across $n$ （forced）

點解 update 係咁？

直覺解釋：

💡 Predictor-Corrector Insight

Fine solver $F$ 太貴，唔可以每個 iteration 都重新 sequentially propagate 過晒。

所以用 coarse $G$ 做 sequential propagation（cheap），再用 $F - G$ 嘅 difference 做 correction。

$F(U_n^k) - G(U_n^k)$ 就係「coarse 嘅錯有幾大」——將呢個 correction 加返落新嘅 coarse propagation 上面。

Iteration 越多次， $U^k$ 越 converge 到真值， $F - G$ 嘅 error 自然就會被「self-correct」。

數學上嘅 fixed point

如果 algorithm converge 到 $U^* = U^{k+1} = U^k$ ，update 變成：

U_{n+1}^* = G(U_n^*) + F(U_n^*) - G(U_n^*) = F(U_n^*)

即係Parareal 嘅 converged solution = 完全用 fine solver sequentially 跑出嚟嘅 solution。所以：

✅ Parareal 唔係 approximation
喺 converge 之後，你得到嘅就係 fine solver 直接 serial 跑嘅 exact 結果。Parareal 唔會犧牲精度——佢係用更少嘅 wall time 拎到同 fine serial 一樣嘅 answer。

點樣 parallelize？

Loading diagram...

每個 iteration 嘅 wall time：

T_{\text{iter}} = \underbrace{\tau_F}_{\text{parallel fine, on N processors}} + \underbrace{N \cdot \tau_G}_{\text{sequential coarse correction}}

對比 serial fine solver： $T_{\text{serial}} = N \cdot \tau_F$ 。

Convergence：點解經過幾個 iteration 就夠？

Theoretical bound

對 linear ODE，可以證 Parareal 嘅 error 隨 iteration superlinearly decay：

\|U_n^k - U_{\text{exact}}(T_n)\| \leq C \cdot \frac{(C \cdot N)^k}{k!} \cdot \|G - F\|_{\text{op}}^k

幾個重要意思：

Property	Implication
Error decay 有 $k!$ 喺分母	Superlinear convergence——通常 5-10 個 iteration 就夠
Error 係 $G - F$ 嘅函數	Coarse 越接近 fine，越快收斂
Error 同 $N$ 有 $(CN)^k$ 嘅關係	$N$ 太大會延遲 convergence——sub-interval 唔好太多

實際上 K iterations 夠用？

實際 PDE / ODE 入面，通常 $K = 5 \sim 20$ iteration 就 converge 到 single-precision 精度。對應嘅 speedup：

\text{Speedup} = \frac{T_{\text{serial}}}{T_{\text{Parareal}}} = \frac{N \cdot \tau_F}{K \cdot (\tau_F + N \tau_G)} \approx \frac{N}{K} \quad \text{(when } N \tau_G \ll \tau_F\text{)}

📊 Speedup intuition

如果 $N = 100$ slice， $K = 10$ iteration converge：理論 speedup ≈ 10×

如果 $N = 1000$ slice， $K = 10$ iteration converge：理論 speedup ≈ 100×

但實際永遠細啲，因為 coarse solver 都食時間，加上 communication overhead

Coarse solver 嘅選擇：traditional approaches

點樣揀 $G$ ？呢個係 Parareal 一直最重要嘅 design decision。

Strategy	Idea	Pros	Cons
Bigger time step	$G$ 同 $F$ 用同一個 method，但 $G$ 用大 100× 嘅 $\Delta t$	實作簡單	Coarse error 可能 dominant
Lower-order method	$F$ 用 RK4， $G$ 用 forward Euler	實作簡單	Stiff problems 唔 work
Coarser spatial mesh	$F$ mesh = $1024^2$ ， $G$ mesh = $64^2$	幾乎免費（已經有 coarse mesh）	需要 inter-grid transfer operator
Reduced-order model	Project 落 low-dim subspace（POD、Krylov）	可以好快	Build POD basis 要先有 snapshot
Simplified physics	$F$ = full Navier-Stokes， $G$ = Stokes（drop nonlinear term）	有 physical insight	Domain-specific

呢個 tension 直接帶我哋去近年最熱嘅方向——用 neural network 做 coarse solver。

Neural network 做 coarse solver：新一波 Parareal

PPINN (Karniadakis et al., 2020)

📄 Paper：Meng et al., PPINN: Parareal physics-informed neural network for time-dependent PDEs, CMAME 2020

PPINN 嘅 setup 同經典 Parareal 倒轉：

🔄 PPINN 嘅 twist

Coarse $G$ ：傳統 numerical solver（fast, low accuracy）

Fine $F$ ：Physics-Informed Neural Network (PINN) 喺每個 sub-interval train 一個小 PINN

點解咁做？因為 train 一個 PINN 跨成個 $[0, T]$ 嘅 long time horizon 好難 converge——training 太貴 + 容易 mode collapse

Decompose 落短 sub-interval 之後，每個 PINN 只 train 一段細時間，獨立 train、獨立 inference

Loading diagram...

PPINN 喺 long-time integration 上面嘅 advantage：train 短段時間 PINN 比 train 整段更穩定、更易 converge。

Parareal with PINN as Coarse Propagator (2023)

📄 Paper：Ibrahim et al., Parareal with a physics-informed neural network as coarse propagator, Euro-Par 2023

呢篇就真正用 PINN 做 coarse solver：

🚀 Idea

PINN 訓練一次之後，inference 極快——一次 forward pass

用 GPU 跑 PINN coarse propagator，CPU 跑 fine numerical solver

Heterogeneous hardware utilization：唔再浪費 GPU 喺 coarse 嘅 mesh-based computation

Demonstrated on Black-Scholes equation：speedup 比傳統 numerical coarse solver 好

呢個 setup 第一次認真探討「neural coarse solver 唔單止 cheap，仲可以更 accurate」嘅可能性。

Neural-Parareal (ITER, 2024)

📄 Paper：Pamela et al., Neural-Parareal: Self-improving acceleration of fusion MHD simulations, Comp. Phys. Commun. 2024

呢個係 fusion plasma simulation（ITER tokamak）嘅 production use case。創新點：dynamic、self-improving coarse solver。

Loading diagram...

核心思想：Parareal 嘅 fine solver 自然會生成大量 high-fidelity training data——用呢啲 data train neural operator 做下一輪嘅 coarse solver。越跑越快。

🔬 Fusion MHD 嘅意義
ITER 模擬一個完整 tokamak discharge 喺現代 HPC cluster 都要好幾日。Neural-Parareal 將呢類 simulation 變到 deployable，係 fusion plasma research 嘅 enabler。

RandNet-Parareal (NeurIPS 2024)

📄 Paper：Gattiglio, Grigoryeva, Tamborrino, RandNet-Parareal: a time-parallel PDE solver using Random Neural Networks, NeurIPS 2024

呢篇係近年 Parareal × NN 最 elegant 嘅工作。核心觀察：

✨ RandNet 嘅 trick

Train PINN 或 neural operator 好貴——可能比 Parareal 慳嘅時間更貴

用 Random Neural Networks (RandNets)：weights 全部 random、只 train 最後一層 linear readout（即係 ELM / Reservoir Computing 風格）

Train cost ≈ 一個 linear regression

學嘅唔係 solution 本身，而係 coarse - fine 嘅 discrepancy

結果：相比 fine solver serial 嘅 125× speedup，相比經典 Parareal 嘅 22× speedup

呢個 setup 嘅 beauty：冇 deep learning training overhead，但又拎到 neural-quality 嘅 coarse correction。Universal approximator without training cost。

Parareal with Spectral Coarse Solver (Gander et al., 2025)

📄 Paper：Gander, Ohlberger, Rave, A Parareal Algorithm with Spectral Coarse Solver, 2025

最新工作（2025）。不用 neural network，但用 randomized SVD + reduced basis methods 構建 coarse solver。

呢個方向證明：neural network 唔係必須，spectral / reduced-order coarse solvers 仍然有 lots of room。

Coarse solver 嘅選擇 cheat sheet（2026 版）

Coarse solver type	訓練成本	Inference speed	精度	最適合
Bigger time step	0	非常快	低	Smooth, non-stiff ODE
Coarser mesh	0	快	中	Multi-scale PDE（已經有 multigrid）
Reduced-order model (POD)	中（要 snapshot）	快	中-高	Repeated parametric runs
PINN	高（每個 sub-interval / parameter 都要 retrain）	非常快（forward pass）	中-高	Parametric study、limited data 場景
Neural Operator (FNO, DeepONet)	非常高（一次性，offline）	非常快	高	Production HPC（fusion、weather）
RandNet	低（只 train readout layer）	非常快	中-高	Best of both worlds
Spectral / Reduced Basis	中（randomized SVD）	快	高	有 spectral structure 嘅 PDE

實作示範：簡單 1D heat equation

pythonimport numpy as np
from concurrent.futures import ProcessPoolExecutor

# 1D heat equation: u_t = alpha * u_xx
def fine_solver(u0, dt_fine, n_steps, alpha, dx):
    """High-accuracy fine solver: small dt, RK4."""
    u = u0.copy()
    for _ in range(n_steps):
        # RK4 update with small dt
        k1 = alpha * laplacian(u, dx)
        k2 = alpha * laplacian(u + 0.5 * dt_fine * k1, dx)
        k3 = alpha * laplacian(u + 0.5 * dt_fine * k2, dx)
        k4 = alpha * laplacian(u + dt_fine * k3, dx)
        u = u + (dt_fine / 6) * (k1 + 2*k2 + 2*k3 + k4)
    return u

def coarse_solver(u0, dT, alpha, dx):
    """Cheap coarse solver: one big forward Euler step."""
    return u0 + dT * alpha * laplacian(u0, dx)

def laplacian(u, dx):
    """Second-order central difference."""
    lap = np.zeros_like(u)
    lap[1:-1] = (u[2:] - 2*u[1:-1] + u[:-2]) / dx**2
    return lap

def parareal(u0, T, N, K, alpha, dx, dt_fine):
    """Parareal algorithm.
    
    Args:
        u0: initial condition (spatial array)
        T: final time
        N: number of time slices
        K: max iterations
        alpha: diffusion coeff
        dx: spatial step
        dt_fine: fine solver time step
    """
    dT = T / N
    n_fine_steps = int(dT / dt_fine)
    
    # Storage: U[k][n] = solution at iteration k, time T_n
    U = [[None] * (N + 1) for _ in range(K + 1)]
    U[0][0] = u0
    
    # ===== Iteration 0: pure coarse pass (sequential) =====
    for n in range(N):
        U[0][n+1] = coarse_solver(U[0][n], dT, alpha, dx)
    
    # ===== Parareal iterations =====
    for k in range(K):
        # Step 1: fine solver in PARALLEL across all sub-intervals
        with ProcessPoolExecutor(max_workers=N) as pool:
            fine_results = list(pool.map(
                fine_solver,
                [U[k][n] for n in range(N)],
                [dt_fine] * N,
                [n_fine_steps] * N,
                [alpha] * N,
                [dx] * N,
            ))
        
        # Step 2: coarse solver on previous iteration (cached)
        coarse_old = [coarse_solver(U[k][n], dT, alpha, dx) for n in range(N)]
        
        # Step 3: Predictor-corrector update (sequential but cheap)
        U[k+1][0] = u0
        for n in range(N):
            coarse_new = coarse_solver(U[k+1][n], dT, alpha, dx)
            # Predictor-corrector: U_{n+1}^{k+1} = G(U_n^{k+1}) + F(U_n^k) - G(U_n^k)
            U[k+1][n+1] = coarse_new + fine_results[n] - coarse_old[n]
        
        # Check convergence
        max_change = max(
            np.linalg.norm(U[k+1][n] - U[k][n])
            for n in range(N + 1)
        )
        print(f"Iteration {k+1}: max change = {max_change:.6e}")
        if max_change < 1e-8:
            print(f"Converged at iteration {k+1}")
            return U[k+1]
    
    return U[K]

呢個 minimal implementation 演示晒兩個關鍵點：

Fine solver call 喺 ProcessPoolExecutor 入面 parallelize——呢度係速度嘅嚟源
Predictor-corrector update 仍然 sequential，但因為只 call coarse solver，所以快

實際 production code（XBraid、PFASST、libpfasst 等）會處理 stiff equation、adaptive time step、multilevel hierarchies 等等，但核心算法仲係呢個。

Limitations 同 open problems

Hyperbolic / advection-dominated PDEs

⚠️ Parareal 嘅死穴
對 hyperbolic PDE（advection、wave equation），Parareal 出名地唔 converge 或者 converge 慢——因為呢類 PDE 冇 dissipation，coarse 嘅 phase error 唔會被 damped。

即係話：Parareal 喺 parabolic PDE（heat、diffusion）work 好好，但 wave-like 嘅 problem 仲未完全解決。

近年嘅 workaround：

PFASST（Parallel Full Approximation Scheme in Space and Time）：multilevel 嘅 Parareal
Krylov-enhanced Parareal：加 Krylov subspace acceleration
ParaExp：specialize 喺 oscillatory ODE

Strong scaling limit

Speedup 上限係 $N/K$ 。如果 problem 本身需要 $K = O(N)$ iteration 先 converge，Parareal 完全冇 speedup——慘到比 serial 仲慢（因為要算 coarse pass）。

Communication overhead

每個 iteration 結束之後，所有 processor 要 sync 同交換 sub-interval boundary value。N 大嘅時候 communication dominate runtime。

Coarse solver design 仍然係 art

雖然有 RandNet / NO / spectral 等新 approach，但冇一個 universal recipe——每個 problem 都要 case-by-case 調 coarse solver。

總結：點解 coarse vs fine solver 係一個值得深入嘅 idea

🎯 5 個 takeaway

時間方向唔係 inherently sequential——只要你接受「approximate first, correct later」嘅範式，時間都可以 parallelize

Coarse / fine 嘅分工係 universal pattern——唔單止喺 PDE，喺 multi-scale modeling、agentic AI、optimizer pipeline 都見到呢個結構

Convergence 速度由 coarse-fine gap 決定——所以 neural network coarse solver 越 accurate 越好

RandNet 證明咗：你唔需要 expensive training 都可以拎到 neural-quality coarse solver

Hyperbolic PDE 仍然係 open problem——下一代 Parareal 可能要結合 implicit method 或 multilevel hierarchy 先解決

相關資源

經典論文

📄 Original Parareal：Lions, Maday, Turinici, "Résolution d'EDP par un schéma en temps pararéel", C. R. Acad. Sci., 2001
📚 Wikipedia：Parareal
📖 Survey：Gander, "50 Years of Time Parallel Time Integration"（2015）

Neural network × Parareal

📄 PPINN：Meng, Li, Zhang, Karniadakis (2020)
📄 Parareal-PINN coarse：Ibrahim et al. (2023)
📄 Neural-Parareal (ITER fusion)：Pamela et al. (2024)
📄 RandNet-Parareal：Gattiglio, Grigoryeva, Tamborrino (NeurIPS 2024)
📄 Allen-Cahn + operator learning：Parareal for Allen-Cahn

Software / Implementations

🔧 XBraid（LLNL，C/C++ multigrid-in-time）
🔧 PFASST++ / libpfasst（Lawrence Berkeley，PFASST framework）
🔧 pySDC（Python spectral deferred correction + Parareal）

TL;DR

目錄

為咩時間方向咁難 parallelize？

Parareal 嘅核心 idea：兩個 solver 互補

Coarse 同 fine 點樣分工？

Parareal Algorithm：完整 mathematical formulation

Notation

Initialization (k = 0)：純 coarse pass

Iteration k→k+1k \to k+1k→k+1：predictor-corrector

點解 update 係咁？

數學上嘅 fixed point

點樣 parallelize？

Convergence：點解經過幾個 iteration 就夠？

Theoretical bound

實際上 K iterations 夠用？

Coarse solver 嘅選擇：traditional approaches

Neural network 做 coarse solver：新一波 Parareal

PPINN (Karniadakis et al., 2020)

Parareal with PINN as Coarse Propagator (2023)

Neural-Parareal (ITER, 2024)

RandNet-Parareal (NeurIPS 2024)

Parareal with Spectral Coarse Solver (Gander et al., 2025)

Coarse solver 嘅選擇 cheat sheet（2026 版）

實作示範：簡單 1D heat equation

Limitations 同 open problems

Hyperbolic / advection-dominated PDEs

Strong scaling limit

Communication overhead

Coarse solver design 仍然係 art

總結：點解 coarse vs fine solver 係一個值得深入嘅 idea

相關資源

經典論文

Neural network × Parareal

最新發展（2025）

Software / Implementations

相關 blog

TL;DR

目錄

為咩時間方向咁難 parallelize？

Parareal 嘅核心 idea：兩個 solver 互補

Coarse 同 fine 點樣分工？

Parareal Algorithm：完整 mathematical formulation

Notation

Initialization (k = 0)：純 coarse pass

Iteration k→k+1k \to k+1k→k+1：predictor-corrector

點解 update 係咁？

數學上嘅 fixed point

點樣 parallelize？

Convergence：點解經過幾個 iteration 就夠？

Theoretical bound

實際上 K iterations 夠用？

Coarse solver 嘅選擇：traditional approaches

Neural network 做 coarse solver：新一波 Parareal

PPINN (Karniadakis et al., 2020)

Parareal with PINN as Coarse Propagator (2023)

Neural-Parareal (ITER, 2024)

RandNet-Parareal (NeurIPS 2024)

Parareal with Spectral Coarse Solver (Gander et al., 2025)

Coarse solver 嘅選擇 cheat sheet（2026 版）

實作示範：簡單 1D heat equation

Limitations 同 open problems

Hyperbolic / advection-dominated PDEs

Strong scaling limit

Communication overhead

Coarse solver design 仍然係 art

總結：點解 coarse vs fine solver 係一個值得深入嘅 idea

相關資源

經典論文

Neural network × Parareal

最新發展（2025）

Software / Implementations

相關 blog

Iteration $k \to k+1$ ：predictor-corrector

Iteration $k \to k+1$ ：predictor-corrector