Gothic Grandma LLC · Open Source

GG.Flow

Visual pipeline platform for multimodal scientific research

Lead Platform Engineer · 2026–Present · Alpha · Open Source

Python · TypeScript · FastAPI · Electron · React Flow · SQLite · PostgreSQL
Key Achievement: 148 nodes across 9 domains · Language-agnostic node SDK with ShellNode CLI wrappers · Content-addressable caching · Auto-generation from Python, R, CLI, YAML · Plugin architecture via Python entry points · IDE-style dockable panel interface

GG.Flow is a visual pipeline orchestration platform purpose-built for scientific research. You build analysis pipelines by connecting nodes on a canvas — the same way you'd draw a methods diagram, except each node actually runs. Every connection carries typed data. Every parameter is visible and editable. Every run is logged. The pipeline is the documentation.

GG.Flow unifies EEG, MRI, machine learning, statistics, visualization, behavioral analysis, and data I/O into a single visual environment — connecting tools written in Python, MATLAB, R, and CLI (FSL, FreeSurfer, ANTs) without requiring researchers to learn each tool's interface.

Open source. GG.Flow is the only open-source component of the Gothic Grandma ecosystem. A graduate student in Nairobi should be able to build the same MRI preprocessing pipeline as a well-funded research hospital. Public release planned for 2026.

Multimodal Pipeline Example

EEG branch: Load EEG → Filter → ICA → Epochs
MRI branch: Load MRI → BET → FLIRT
Downstream: ML / Stats → Viz
Every node: typed ports · visible parameters · cached results · full provenance

What Problem It Solves

A new graduate student inherits a folder of analysis scripts. Half reference hardcoded paths on the old student's machine. A critical script depends on an FSL version that's no longer installed. The student spends their first semester getting the pipeline to run — not doing science.

Meanwhile, labs produce increasingly multimodal data: EEG, MRI, EMG, kinematics, questionnaires. Getting from raw data to a publication figure means stitching together Python notebooks, FSL commands, MATLAB scripts, R statistics, and bash glue. It works — until someone graduates, until a reviewer asks "what happens if you change the bandpass cutoff?" and answering requires re-running a fragile chain.

The analysis pipeline is the most valuable and most fragile artifact in any neuroscience lab. GG.Flow makes it visual, reproducible, and permanent.

How It Works

148 Nodes Across 9 Domains
EEG.Flow — 32 MNE-based nodes
MR.Flow — 46 FSL/FreeSurfer/ANTs nodes
ML.Flow — 22 ML + evaluation nodes
Stats.Flow — 15 statistical tests
Viz.Flow — 12 visualization nodes
IO.Flow — 15 data loading/export nodes
Beh.Flow — 6 behavioral analysis nodes

Flow packages register via Python entry points — install a package, restart the server, nodes appear in the palette.

Language-Agnostic Node SDK

All nodes extend SyncNode (blocking) or BaseNode (async) with typed input/output ports and configurable parameters. 11 port types: DataFrame, TimeSeries, Epochs, Image, Matrix, Scalar, String, Boolean, Dict, List, Any.

  • Python functions: wrap any function with type hints as a node
  • CLI tools: ShellNode wraps FSL, FreeSurfer, ANTs, MATLAB
  • R packages: via rpy2 integration
  • Annotated markers: typing.Annotated with Input()/Parameter() to classify parameters explicitly

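As a rough illustration of the Annotated-marker idea, the sketch below sorts a function's arguments into input ports and parameters. The Input and Parameter classes are local stand-ins, not GG.Flow's actual SDK classes, and classify_arguments and bandpass are hypothetical names used only for this example.

```python
from dataclasses import dataclass
from typing import Annotated, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class Input:
    """Stand-in marker: the argument arrives through an input port."""
    port_type: str = "Any"


@dataclass(frozen=True)
class Parameter:
    """Stand-in marker: the argument is a user-editable setting."""
    default: object = None


def classify_arguments(func):
    """Split a function's Annotated arguments into ports and parameters."""
    hints = get_type_hints(func, include_extras=True)
    hints.pop("return", None)
    inputs, parameters = {}, {}
    for name, hint in hints.items():
        if get_origin(hint) is not Annotated:
            continue  # unmarked arguments are ignored in this sketch
        base_type, *markers = get_args(hint)
        for marker in markers:
            if isinstance(marker, Input):
                inputs[name] = marker.port_type
            elif isinstance(marker, Parameter):
                parameters[name] = (base_type, marker.default)
    return inputs, parameters


def bandpass(
    signal: Annotated[list, Input("TimeSeries")],
    low_hz: Annotated[float, Parameter(1.0)],
    high_hz: Annotated[float, Parameter(40.0)],
) -> list:
    """Hypothetical node function; the filtering itself is omitted."""
    return signal


inputs, parameters = classify_arguments(bandpass)
```

Here inputs maps signal to the TimeSeries port type, while low_hz and high_hz land in parameters with their base type and default. Explicit markers like these remove the guesswork about which arguments carry data and which are settings.
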
Auto-Generation Engine

Generate GG.Flow nodes automatically from existing code:

  • Python modules — introspects signatures and type hints
  • CLI tools — parses --help output
  • YAML specs — manual but precise definitions
  • R packages — via rpy2

Quality tier system: TIER_1 (full type hints + docstring), TIER_2 (partial), TIER_3 (no hints; skipped by default). Each generated file is tagged with its quality tier and review status.
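One way the tiering could work for Python callables is sketched below with the standard inspect module. The exact boundary between tiers is an assumption made for this example: TIER_1 requires every parameter and the return value to be hinted plus a docstring, TIER_3 means no hints at all, and everything in between is TIER_2. The function names are hypothetical.

```python
import inspect


def quality_tier(func):
    """Return 'TIER_1', 'TIER_2', or 'TIER_3' for a callable."""
    signature = inspect.signature(func)
    params = list(signature.parameters.values())
    hinted = [p for p in params if p.annotation is not inspect.Parameter.empty]
    has_return_hint = signature.return_annotation is not inspect.Signature.empty
    has_docstring = bool(inspect.getdoc(func))

    if not hinted and not has_return_hint:
        return "TIER_3"  # nothing to infer port types from
    fully_hinted = len(hinted) == len(params) and has_return_hint
    if fully_hinted and has_docstring:
        return "TIER_1"
    return "TIER_2"


def zscore(values: list, ddof: int = 0) -> list:
    """Standardize values (body omitted in this sketch)."""
    return values


def smooth(values, window: int = 5):
    return values


def legacy(x, y):
    return x


tiers = {f.__name__: quality_tier(f) for f in (zscore, smooth, legacy)}
```

A generator following this scheme would emit nodes for zscore and smooth and skip legacy unless asked to include TIER_3 functions.
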

Content-Addressable Caching

Every node execution is hashed: SHA-256(node_type, config, input_hashes). If inputs and parameters haven't changed, the node returns cached results instantly.

  • Change a downstream statistical test — upstream preprocessing doesn't re-run
  • Add a new analysis branch for all subjects — only the new nodes execute
  • For MRI pipelines that take hours per subject, this can save weeks of compute per year

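The hashing scheme can be sketched in a few lines. This is a minimal illustration, assuming the config is JSON-serializable and that the key is computed over a canonical JSON encoding; the actual encoding GG.Flow uses is not specified here, and cache_key and NodeCache are hypothetical names.

```python
import hashlib
import json


def cache_key(node_type, config, input_hashes):
    """SHA-256 over node type, parameters, and the hashes of all inputs."""
    payload = json.dumps(
        {
            "node_type": node_type,
            "config": config,
            "input_hashes": list(input_hashes),
        },
        sort_keys=True,  # key order in config must not change the hash
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class NodeCache:
    """In-memory stand-in for a content-addressable result store."""

    def __init__(self):
        self._results = {}
        self.executions = 0

    def run(self, node_type, config, input_hashes, compute):
        key = cache_key(node_type, config, input_hashes)
        if key not in self._results:
            self.executions += 1  # cache miss: the node actually runs
            self._results[key] = compute()
        return key, self._results[key]


cache = NodeCache()
raw_hash = hashlib.sha256(b"raw eeg bytes").hexdigest()
key_a, _ = cache.run("Filter", {"l_freq": 1.0, "h_freq": 40.0}, [raw_hash], lambda: "filtered")
key_b, _ = cache.run("Filter", {"h_freq": 40.0, "l_freq": 1.0}, [raw_hash], lambda: "filtered")
key_c, _ = cache.run("Filter", {"l_freq": 0.5, "h_freq": 40.0}, [raw_hash], lambda: "filtered")
```

Because each node's key includes the hashes of its inputs, a parameter change propagates downstream automatically: only the changed node and the nodes that depend on it get new keys.
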
Batch Execution with Failure Isolation

Run any pipeline across multiple subjects with variable substitution and per-subject failure isolation. If subject 23 fails, the other 49 complete normally. Fix the issue, re-run only subject 23.

Per-subject progress tracking via WebSocket. Configurable parallelism.
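Failure isolation of this kind can be approximated with the standard library's thread pool. The sketch below is illustrative only: run_batch, demo_pipeline, and the path template are hypothetical, and the real executor may use processes or other pools instead.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_batch(pipeline, subjects, max_workers=4):
    """Run pipeline(subject) for each subject; one failure never stops the rest."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(pipeline, subject): subject for subject in subjects}
        for future in as_completed(futures):
            subject = futures[future]
            try:
                results[subject] = future.result()
            except Exception as exc:  # record the error and keep going
                failures[subject] = exc
    return results, failures


def demo_pipeline(subject, input_template="data/{subject}/eeg.fif"):
    """Hypothetical pipeline: substitutes the subject ID into a path template."""
    path = input_template.format(subject=subject)
    if subject == "sub-23":
        raise ValueError(f"corrupt recording: {path}")
    return f"processed {path}"


subjects = [f"sub-{i:02d}" for i in range(1, 51)]
results, failures = run_batch(demo_pipeline, subjects)
```

After fixing the underlying problem, re-running is just run_batch(demo_pipeline, list(failures)), which touches only the subjects that failed.
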

Sub-Pipelines

Any saved pipeline becomes a single node inside another pipeline. Its ports are inferred from the pipeline's source and sink nodes. Nesting is limited to 10 levels, and circular references are detected.
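A depth limit combined with cycle detection could look like the sketch below. It assumes the nesting relationships are available as a plain mapping from pipeline name to the sub-pipelines it embeds, and it treats the top-level pipeline as level 0 so that up to 10 nested levels are allowed; both choices are assumptions for illustration, as are all names.

```python
MAX_DEPTH = 10


class PipelineReferenceError(Exception):
    """Raised for circular or too deeply nested sub-pipelines."""


def check_sub_pipelines(name, references, _stack=()):
    """Walk nested sub-pipelines and return the deepest nesting level reached."""
    if name in _stack:
        cycle = " -> ".join(_stack + (name,))
        raise PipelineReferenceError(f"circular reference: {cycle}")
    if len(_stack) > MAX_DEPTH:
        raise PipelineReferenceError(f"nesting deeper than {MAX_DEPTH} levels")
    deepest = len(_stack)
    for child in references.get(name, []):
        deepest = max(deepest, check_sub_pipelines(child, references, _stack + (name,)))
    return deepest


def is_valid(name, references):
    """Convenience wrapper: True if the pipeline passes both checks."""
    try:
        check_sub_pipelines(name, references)
    except PipelineReferenceError:
        return False
    return True


chain = {f"p{i}": [f"p{i + 1}"] for i in range(10)}  # p0 embeds p1, ... p9 embeds p10
too_deep = {f"p{i}": [f"p{i + 1}"] for i in range(11)}
circular = {"preproc": ["artifact_removal"], "artifact_removal": ["preproc"]}
```

Tracking the current path in _stack, rather than a global visited set, means the same sub-pipeline can be reused in several branches without being mistaken for a cycle.
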

ShellNode — CLI Tool Wrappers

Wrap any command-line tool as a pipeline node via ShellNode. This is how GG.Flow integrates FSL, FreeSurfer, ANTs, MATLAB, and any other CLI tool without requiring Python bindings.

Each ShellNode builds a command list, executes it in a subprocess, and exposes output files as typed ports. Includes parse_output_file(directory, pattern) for finding results by glob pattern after execution.
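The build, execute, and collect cycle can be sketched as follows. This is not the SDK's ShellNode class: ShellNodeSketch is a hypothetical stand-in, the command simply invokes the Python interpreter so the example runs without FSL installed, and returning the first sorted match from parse_output_file is an assumption about its behavior.

```python
import subprocess
import sys
from pathlib import Path


def parse_output_file(directory, pattern):
    """Return the first file in `directory` matching the glob `pattern`."""
    matches = sorted(Path(directory).glob(pattern))
    if not matches:
        raise FileNotFoundError(f"no file matching {pattern!r} in {directory}")
    return matches[0]


class ShellNodeSketch:
    """Hypothetical stand-in for a ShellNode subclass."""

    output_pattern = "*.txt"

    def build_command(self, out_dir, config):
        # A real node would return a tool's command line here; the Python
        # interpreter is used instead so the sketch runs anywhere.
        script = "import sys, pathlib; pathlib.Path(sys.argv[1]).write_text(sys.argv[2])"
        target = str(Path(out_dir) / "result.txt")
        return [sys.executable, "-c", script, target, config["message"]]

    def run(self, out_dir, config):
        command = self.build_command(out_dir, config)
        # check=True turns a non-zero exit code into an exception.
        subprocess.run(command, check=True, capture_output=True, text=True)
        return {"output_file": parse_output_file(out_dir, self.output_pattern)}
```

Passing the command as a list rather than a shell string avoids quoting problems with paths that contain spaces, which are common in lab data directories.
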

Plugin Architecture

Flow packages register via Python entry_points. Install a package, restart the server, and its nodes appear in the palette — no configuration required. This makes GG.Flow extensible by third-party developers and lab-specific tooling.

Three built-in pipeline templates ship for quick starts: EEG Preprocessing, MRI Structural, and Cross-Modal ML. Labs can also publish their own starter templates alongside their flow packages.

IDE-Style Dockable Interface

The frontend uses FlexLayout-React to provide a fully dockable, resizable panel system. Node palette, canvas, inspector, execution log, run history, results viewer, and console — all arrangeable to match the researcher's workflow.

8 Zustand stores manage pipeline state, project selection, execution progress, job tracking, layout, logs, palette filtering, and theming (dark/light modes).

Architecture

  • Backend: FastAPI orchestrator with async SQLAlchemy ORM, content-addressable cache, WebSocket progress broadcasting
  • Frontend: Electron 28 + React 18 + React Flow + Zustand + FlexLayout-React + TailwindCSS
  • SDK: SyncNode/BaseNode base classes, 11 port types, ShellNode CLI wrappers, ggflow-autogen CLI tool
  • Execution: Kahn's topological sort, pluggable executor pools (thread/process/GPU/async), graceful per-node error handling
  • Database: multi-project strategy in which a registry DB tracks all projects and per-project DBs store pipelines/nodes/edges/runs. SQLite default, PostgreSQL supported
  • Serialization: hybrid serde with format versioning (scalars inline, DataFrames to Parquet, arrays to NumPy, images to PNG, mixed dicts to .npz + manifest)
  • Monorepo: uv (Python) + pnpm (JavaScript) workspace. 256 Python files, 50 TypeScript files across 9 packages
  • CI/CD: GitLab CI pipeline with lint (ruff), test (pytest --cov), and build (Vite production) stages
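For readers unfamiliar with Kahn's algorithm, the sketch below orders a pipeline graph so that every node runs after its upstream nodes. The function name and the edge list are illustrative: the edges are one plausible wiring of the multimodal example, not a description of the scheduler's actual data structures.

```python
from collections import deque


def topological_order(nodes, edges):
    """Kahn's algorithm; edges is a list of (upstream, downstream) pairs."""
    indegree = {node: 0 for node in nodes}
    downstream = {node: [] for node in nodes}
    for src, dst in edges:
        downstream[src].append(dst)
        indegree[dst] += 1

    # Nodes with no unmet dependencies are ready to run.
    ready = deque(node for node in nodes if indegree[node] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in downstream[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(nodes):
        raise ValueError("pipeline graph contains a cycle")
    return order


nodes = ["Load EEG", "Filter", "ICA", "Epochs", "Load MRI", "BET", "FLIRT", "ML / Stats", "Viz"]
edges = [
    ("Load EEG", "Filter"), ("Filter", "ICA"), ("ICA", "Epochs"),
    ("Load MRI", "BET"), ("BET", "FLIRT"),
    ("Epochs", "ML / Stats"), ("FLIRT", "ML / Stats"),
    ("ML / Stats", "Viz"),
]
order = topological_order(nodes, edges)
```

Nodes that sit in the ready queue at the same time have no dependency on each other, which is what lets an executor pool run the EEG and MRI branches in parallel.
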

Codebase: 256 Python files (~54K LOC), 50 TypeScript files, 21 test suites, 9 modular packages, 3 pipeline templates. Managed as a uv + pnpm monorepo with GitLab CI.

Origin

GG.Flow grew out of the MUSE ecosystem. When you're building a simulation engine that models dozens of biological and environmental systems, you develop strong opinions about how computation should be orchestrated — how heterogeneous processes should be composed, cached, monitored, and made robust to failure.

Working in neuroscience labs at the same time, I saw the same orchestration problems everywhere: fragile script chains, lost institutional knowledge, irreproducible analyses, multimodal data that nobody can integrate cleanly. GG.Flow is the convergence: orchestration thinking from a large-scale simulation engine applied to the concrete problems of scientific data analysis.

Impact

  • Student onboarding — new students see the full analysis as a visual graph, understand it in an afternoon
  • Reviewer responses — "what if you change X?" becomes: change the parameter, re-run, cache handles the rest
  • Lab continuity — when someone graduates, the pipeline stays: versioned, shareable, executable
  • Open science — share the pipeline file alongside your paper; anyone can inspect or reproduce exactly what you did
  • Grant language — "analyses conducted using GG.Flow with content-addressable caching and complete provenance tracking" reads differently from "custom scripts in MATLAB and Python"

Related Tools

How does this connect? GG.Flow is the open-source expression of the same principle driving everything I build: complex behavior emerges from well-orchestrated simple components. The same architecture that runs 100K simulation entities also orchestrates 148 research analysis nodes.