2026-01-11 · 5 min read

Deep-FEM-UAV-Wing Dev Log (2/2): Dataset Validation → GNN Training → FEM vs AI Comparison

Completing the surrogate model pipeline: validating 200 FEM cases, training a GraphSAGE GNN, and building a side-by-side comparison dashboard. Includes performance analysis and lessons learned.

GNN · PyTorch Geometric · FEM · Surrogate Model · GraphSAGE · Gradio · Deep Learning

Goal: Validate 200 FEM cases, train a GraphSAGE surrogate model, and build a FEM vs AI comparison dashboard — completing the end-to-end pipeline from Part 1.

0) 3-minute summary — What did I build?

I trained an AI model that predicts wing stress in about 0.1 seconds per case, versus roughly 60 seconds for an FEM solve. Then I built a dashboard to compare FEM ground truth vs AI predictions side-by-side.

👉 Live Demo: Hugging Face Space

(Dataset validation): Checked 200 FEM cases for NaN/Inf, extreme values, and failure rates. All 200 passed.
(Graph construction): Converted surface meshes into PyTorch Geometric graphs with node features (position, normal) and edge connectivity.
(GNN training): Trained GraphSAGE (3 layers, 128 hidden) with masked loss to handle root singularity. Best MAE: 0.79 MPa.
(Inference + visualization): Generated AI prediction GLBs with unified colormap for fair comparison.
(Gradio dashboard): Side-by-side 3D viewer with engineering report and error metrics.

Results

FEM vs AI side-by-side comparison: Unified colormap (viridis, 98th percentile normalization)

Left: FEM simulation (Ground Truth) / Right: AI prediction (GraphSAGE)

1) Stage 4 — Dataset Validation

Before training, I validated all 200 FEM cases to ensure data quality.

1-1) Validation checks

# Key validation criteria
checks = {
    "nan_inf": "No NaN or Inf values in stress/displacement",
    "stress_range": "Max stress within reasonable bounds (< 1 GPa)",
    "node_count": "Sufficient surface nodes (> 100)",
    "displacement": "Displacement magnitude reasonable (< 1m)",
}
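A minimal sketch of how these four checks might be applied to a single case. It assumes stress is stored in Pa and displacement in meters as NumPy arrays; the function name `validate_case`, the argument names, and the array layout are hypothetical and not taken from the project's scripts.

```python
import numpy as np

def validate_case(stress_pa, displacement_m,
                  max_stress_pa=1e9, min_nodes=100, max_disp_m=1.0):
    """Return {check_name: passed} for one FEM case (hypothetical layout).

    stress_pa:      (N,) von Mises stress per surface node, in Pa
    displacement_m: (N, 3) displacement vector per surface node, in m
    """
    stress = np.asarray(stress_pa, dtype=float)
    disp = np.asarray(displacement_m, dtype=float)

    finite = bool(np.isfinite(stress).all() and np.isfinite(disp).all())
    # Range checks only make sense on non-empty, finite arrays.
    usable = finite and stress.size > 0 and disp.size > 0
    disp_mag = np.linalg.norm(disp, axis=1) if usable else None
    return {
        "nan_inf": finite,
        "stress_range": usable and bool(stress.max() < max_stress_pa),
        "node_count": stress.shape[0] > min_nodes,
        "displacement": usable and bool(disp_mag.max() < max_disp_m),
    }
```

A case counts as passed only when every value in the returned dictionary is `True`, so a single NaN fails both `nan_inf` and the two range checks.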

1-2) Results

Metric	Value
Total cases	200
Passed	200 (100%)
Failed	0
Avg surface nodes	~4,200
Avg max stress	~50 MPa

All 200 cases passed validation. The dataset is clean and ready for training.

2) Stage 5 — GNN Training (GraphSAGE)

2-1) Why GraphSAGE?

For mesh-based surrogate modeling, the key question is: how do we represent irregular mesh topology?

Approach	Pros	Cons
MLP (node-wise)	Simple	Ignores connectivity
CNN (voxelize)	Standard	Resolution loss, memory
GNN (graph)	Native mesh topology	More complex

GraphSAGE was chosen because:

Handles variable-size meshes naturally
Aggregates neighbor information (stress depends on surrounding structure)
Inductive: generalizes to unseen geometries

2-2) Graph construction

Each surface mesh becomes a PyG Data object:

Data(
    x=[N, 6],           # Node features: [x, y, z, nx, ny, nz]
    edge_index=[2, E],  # Edge connectivity (from mesh faces)
    y=[N, 1],           # Target: von Mises stress (normalized)
    loss_mask=[N],      # True for nodes outside root singularity band
)
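One way the `edge_index` could be derived from mesh faces is sketched below. It assumes triangular surface faces given as an `(F, 3)` integer array; the helper name `faces_to_edge_index` is hypothetical, and the project's own conversion may differ (for example, it may rely on a PyG transform instead).

```python
import numpy as np

def faces_to_edge_index(faces):
    """Build a [2, E] undirected edge list from triangle faces of shape (F, 3)."""
    faces = np.asarray(faces, dtype=np.int64)
    # Each triangle (a, b, c) contributes the edges a-b, b-c and c-a.
    pairs = np.concatenate([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [2, 0]]])
    # Store both directions so message passing is symmetric.
    pairs = np.concatenate([pairs, pairs[:, ::-1]])
    # Triangles that share an edge would otherwise list it twice.
    pairs = np.unique(pairs, axis=0)
    return pairs.T

# Two triangles sharing the edge 1-2.
edge_index = faces_to_edge_index([[0, 1, 2], [1, 3, 2]])
print(edge_index.shape)  # (2, 10): 5 unique edges, stored in both directions
```

The node feature matrix `x` would then be the positions and normals stacked column-wise, e.g. `np.hstack([pos, normals])` for two `(N, 3)` arrays.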

Loss mask rationale: FEM produces artificially high stress at fixed boundaries (root singularity). We therefore exclude from the loss every node whose spanwise position (the y coordinate in x, not the target y) is below 5% of the span.
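The mask itself can be computed from node positions alone. The sketch below assumes the span runs along the y axis (column 1 of the position array) and that the root sits at the minimum y; `root_loss_mask` and its parameters are hypothetical names.

```python
import numpy as np

def root_loss_mask(pos, band_fraction=0.05, span_axis=1):
    """True for nodes outside the root band, False for nodes inside it.

    Assumes the root sits at the minimum coordinate along span_axis.
    """
    span_coord = np.asarray(pos, dtype=float)[:, span_axis]
    root = span_coord.min()
    span = span_coord.max() - root
    # Keep nodes at or beyond 5% of the span measured from the root.
    return span_coord >= root + band_fraction * span
```

Measuring the band relative to the minimum coordinate keeps the mask correct even if the wing root is not located exactly at y = 0.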

2-3) Model architecture

GraphSAGE (3 layers)
├── SAGEConv(6 → 128) + ReLU + Dropout(0.1)
├── SAGEConv(128 → 128) + ReLU + Dropout(0.1)
├── SAGEConv(128 → 128) + ReLU + Dropout(0.1)
└── Linear(128 → 1)

Key hyperparameters:

Hidden dim: 128
Layers: 3
Dropout: 0.1
Aggregation: mean
Optimizer: Adam (lr=1e-3)
Scheduler: ReduceLROnPlateau (patience=10)

2-4) Training setup

# Requires CUDA for reasonable training time
python scripts/train_gnn.py

Train/Val/Test split: 160/20/20 cases
Epochs: 200 (early stopping patience=30)
Loss: MSE on normalized stress (masked)
Hardware: RTX A4000 (trained on Windows, inference on Mac)
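To make the masked loss concrete, here is a small NumPy sketch of MSE restricted to nodes where `loss_mask` is true. It illustrates the idea only and is not the training script's code; the function name `masked_mse` is hypothetical.

```python
import numpy as np

def masked_mse(pred, target, loss_mask):
    """Mean squared error over nodes where loss_mask is True."""
    pred = np.asarray(pred, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    mask = np.asarray(loss_mask, dtype=bool).ravel()
    if not mask.any():
        raise ValueError("loss_mask excludes every node")
    diff = pred[mask] - target[mask]
    return float(np.mean(diff ** 2))

# The first node mimics a root-singularity spike and is masked out.
target = np.array([9.0, 1.0, 2.0, 3.0])
pred = np.array([1.0, 1.5, 2.0, 2.5])
mask = np.array([False, True, True, True])
print(masked_mse(pred, target, mask))              # spike ignored
print(masked_mse(pred, target, np.ones(4, bool)))  # 16.125: spike dominates
```

In PyTorch the same idea is usually written by indexing both tensors with the mask before calling the loss, e.g. `F.mse_loss(out[mask], y[mask])`.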

2-5) Training results

Best epoch: 142
Train Loss: 0.0053
Val Loss:   0.0010
Test MAE:   0.79 MPa (masked)

The model converged smoothly with no signs of overfitting: validation loss stayed below training loss, likely in part because dropout is active only during training.

3) Stage 6 — Inference + Visualization

3-1) Unified colormap issue

Initial problem: FEM used viridis, AI used jet colormap. This made visual comparison misleading.

Before fix:

FEM: viridis (purple → yellow)
AI: jet (blue → red)
Different color scales

After fix:

Both use viridis
Same normalization: [min, 98th percentile] of FEM stress
Fair visual comparison

# Unified color scale: use GT (FEM) stress range
valid_gt = gt_stress[loss_mask]
vmin = float(np.min(valid_gt))
vmax = float(np.percentile(valid_gt, 98))  # 98th to avoid outlier domination

normalized = (pred_stress - vmin) / (vmax - vmin)
colors = viridis_colormap(normalized)
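A self-contained variant of the same idea, sketched with NumPy only. It applies one FEM-derived range to both fields and clips the result, so values above the 98th percentile (or AI predictions below the FEM minimum) saturate instead of leaving [0, 1]. The function name `shared_color_values` is hypothetical, and whether the project's colormap helper clips internally is not stated in this log.

```python
import numpy as np

def shared_color_values(gt_stress, pred_stress, loss_mask, upper_pct=98):
    """Map FEM and AI stress onto one [0, 1] scale taken from the FEM field."""
    gt = np.asarray(gt_stress, dtype=float)
    pred = np.asarray(pred_stress, dtype=float)
    valid_gt = gt[np.asarray(loss_mask, dtype=bool)]
    vmin = float(valid_gt.min())
    vmax = float(np.percentile(valid_gt, upper_pct))
    scale = max(vmax - vmin, 1e-12)  # guard against a constant stress field

    def to_unit(values):
        # Clip so values outside [vmin, vmax] saturate at the ends of the scale.
        return np.clip((values - vmin) / scale, 0.0, 1.0)

    return to_unit(gt), to_unit(pred)
```

Both returned arrays can then be passed to the same colormap, which guarantees that equal stress values get equal colors in the two viewers.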

3-2) Deformation visualization issue

Another issue: AI wing appeared "bent" in the viewer.

Cause: I was applying displacement to AI visualization (deform_scale=10.0), but FEM visualization showed original geometry.

Fix: Use original pos for both FEM and AI GLBs. Deformation is only for engineering analysis, not comparison.

3-3) Gradio dashboard

The final dashboard provides:

Side-by-side 3D viewer: FEM (left) vs AI (right)
Engineering report: Geometry, material, stress metrics, safety factor
AI accuracy metrics: MAE, RMSE, max error (all nodes / masked)

import gradio as gr

with gr.Blocks() as demo:
    gr.Markdown("# Deep-FEM-UAV-Wing")
    gr.Markdown("**FEM vs AI Comparison Dashboard**")

    # Two equal columns: FEM ground truth on the left, AI prediction on the right
    with gr.Row():
        with gr.Column():
            gr.Markdown("### FEM (Ground Truth)")
            fem_viewer = gr.Model3D()
        with gr.Column():
            gr.Markdown("### AI Prediction")
            ai_viewer = gr.Model3D()

    report = gr.Markdown()  # Engineering report

demo.launch()  # Start the local web server
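The accuracy metrics listed for the dashboard (MAE, RMSE, max error, each over all nodes and over masked nodes) can be computed along these lines. This is a sketch assuming per-node stress in MPa as NumPy arrays; `error_metrics` is a hypothetical name.

```python
import numpy as np

def error_metrics(pred_mpa, gt_mpa, loss_mask):
    """MAE, RMSE and max absolute error, over all nodes and over masked nodes."""
    err = np.abs(np.asarray(pred_mpa, float) - np.asarray(gt_mpa, float)).ravel()
    mask = np.asarray(loss_mask, dtype=bool).ravel()

    def summarize(e):
        return {"mae": float(e.mean()),
                "rmse": float(np.sqrt((e ** 2).mean())),
                "max": float(e.max())}

    # "masked" drops the root-singularity band, "all" keeps every node.
    return {"all": summarize(err), "masked": summarize(err[mask])}
```

Reporting both views side by side shows how much of the total error comes from the root band that was excluded during training.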

4) Performance Analysis

4-1) Overall metrics (200 cases)

Metric	Value
MAE (masked)	0.79 MPa
Relative error	~3.3% of max stress
Val/Train loss ratio	0.19 (no overfitting)
Inference time	~0.1 sec/case (vs ~60 sec for FEM)

4-2) Per-case analysis (Test set: Cases 001-005)

Case	Masked MAE	FEM Max (MPa)	AI Max (MPa)	Note
001	1.99 MPa	134.2	37.0	Peak underestimated
002	1.02 MPa	178.4	14.0	Peak underestimated
003	0.61 MPa	49.4	17.7	Good (by MAE)
004	0.84 MPa	26.8	13.2	Good (by MAE)
005	0.77 MPa	118.2	10.9	Peak underestimated
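The size of the peak gap is easiest to see as a ratio of AI peak to FEM peak. The short calculation below uses only the numbers from the table above.

```python
# (case, FEM max in MPa, AI max in MPa), copied from the table above
cases = [("001", 134.2, 37.0), ("002", 178.4, 14.0), ("003", 49.4, 17.7),
         ("004", 26.8, 13.2), ("005", 118.2, 10.9)]

peak_ratio = {case: ai_max / fem_max for case, fem_max, ai_max in cases}
for case, ratio in peak_ratio.items():
    print(f"Case {case}: AI peak is {ratio:.0%} of FEM peak")
```

This prints 28%, 8%, 36%, 49%, and 9% for cases 001 to 005, so even the two cases with the lowest MAE recover less than half of the FEM peak.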

4-3) Key finding: Peak stress underestimation

The most significant limitation is peak stress underestimation:

Average stress: Well predicted (MAE ~0.79 MPa)
Peak stress: Consistently underestimated (in the five test cases above, the AI peak is roughly 8-49% of the FEM peak)

Why does this happen?

GraphSAGE aggregates neighbor information via mean pooling:

$$ h_v' = W_1 h_v + W_2 \cdot \text{MEAN}(\{h_u : u \in N(v)\}) $$

This averaging smooths out local extrema. Stress concentrations (high gradients) get diluted by surrounding lower-stress nodes.
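A toy illustration of that smoothing effect, using plain repeated averaging on a chain of nodes. This is a deliberately simplified stand-in: a trained SAGEConv layer has learned weights and a separate self term, so it is not forced to behave exactly like this, but the example shows the tendency that mean aggregation introduces.

```python
import numpy as np

# Chain of 7 nodes with a single stress concentration in the middle.
stress = np.array([10.0, 10.0, 10.0, 100.0, 10.0, 10.0, 10.0])
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < len(stress)]
             for i in range(len(stress))}

def mean_aggregate(values):
    """Replace each node value by the mean over itself and its neighbors."""
    return np.array([np.mean([values[i]] + [values[j] for j in neighbors[i]])
                     for i in range(len(values))])

smoothed = stress.copy()
for layer in range(3):  # three rounds, matching the model's three layers
    smoothed = mean_aggregate(smoothed)
    print(layer + 1, round(float(smoothed.max()), 1))
```

The peak of 100 drops to 40 after the first round and to about 33 after the third, while the surrounding nodes rise: the total is spread out rather than preserved at the hotspot.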

4-4) Is this overfitting?

No. Evidence:

Val loss (0.0010) < Train loss (0.0053)
Test MAE consistent with validation MAE
Model generalizes to unseen geometries

The peak underestimation is a structural limitation of mean-aggregation GNNs, not overfitting.

5) Limitations & Future Work

5-1) Current limitations

Limitation	Impact	Mitigation
Peak underestimation	Safety factor overestimated	Use FEM for final verification
Training data bounds	Extrapolation unreliable	Restrict to trained geometry range
Single pressure value	Limited generalization	Train with variable pressure
Surface-only	No internal stress	Use volumetric GNN if needed

5-2) Potential improvements

Attention-based aggregation: Replace mean pooling with attention to preserve peaks

GAT instead of GraphSAGE
$h_v' = \sum_{u \in N(v)} \alpha_{uv} \, W h_u$ (attention weights $\alpha_{uv}$)

Peak-weighted loss: Add penalty for peak stress error

$\text{loss} = \text{MSE}(\text{pred}, \text{gt}) + \lambda \cdot \left| \max(\text{pred}) - \max(\text{gt}) \right|^2$

Multi-scale architecture: Capture both local peaks and global distribution

For example, a U-Net style encoder-decoder on graphs.

Physics-informed constraints: Add equilibrium/boundary condition losses
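Of the ideas above, the peak-weighted loss is the simplest to prototype. The NumPy sketch below follows the formula given for it; the weight `lam` is a hypothetical value that would need tuning, and none of this has been tested in the project.

```python
import numpy as np

def peak_weighted_loss(pred, gt, lam=0.5):
    """MSE plus lam times the squared difference of the two field maxima."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    mse = np.mean((pred - gt) ** 2)
    peak_penalty = (pred.max() - gt.max()) ** 2
    return float(mse + lam * peak_penalty)

gt = np.array([1.0, 1.0, 5.0, 1.0])
smooth_pred = np.array([1.0, 1.0, 2.0, 1.0])  # underestimates the peak
print(peak_weighted_loss(smooth_pred, gt, lam=0.0))  # 2.25 (plain MSE)
print(peak_weighted_loss(smooth_pred, gt, lam=0.5))  # 6.75 (peak term adds 4.5)
```

In training, both the MSE and the maxima would presumably be taken over masked nodes only, so the root singularity does not define the target peak.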

5-3) When to use this surrogate model

Use case	Recommended?
Early design screening	✅ Yes
Parameter sweep	✅ Yes
Real-time optimization	✅ Yes
Final safety verification	❌ No (use FEM)
Regulatory certification	❌ No (use FEM)

6) Conclusion

What I built

An end-to-end pipeline for AI-based structural analysis prediction:

Blender → Gmsh → CalculiX → PyG → Gradio
(Geometry)  (Mesh)   (FEM)    (GNN)  (Dashboard)

Key achievements

200 cases generated, meshed, and solved automatically
GraphSAGE surrogate trained with 0.79 MPa MAE
600x speedup (60s FEM → 0.1s AI)
Side-by-side comparison dashboard for validation

Key lessons

Data quality matters: Inconsistent meshes (winding, normals) caused many bugs
Loss masking is essential: Root singularity would dominate training
Visualization must match: Unified colormap/scale for fair comparison
Know your limitations: GNN smoothing affects peak prediction

Final thoughts

Surrogate modeling is powerful for accelerating design iteration, but it's not a replacement for physics-based simulation. The key is knowing when each tool is appropriate.

For this project:

AI: Fast screening, parameter exploration
FEM: Final verification, safety-critical decisions

The code is available at: GitHub - Deep-FEM-UAV-Wing