Precision Neural Engineering Lab
BIDS-SQL
Database infrastructure for neuroimaging dataBIDS-SQL is a database schema and Python library for storing, querying, and managing BIDS-formatted neuroimaging datasets. It transforms the file-based BIDS structure into queryable relational tables—enabling complex queries across subjects, sessions, modalities, and processing pipelines.
The schema captures everything from raw acquisition metadata to processing provenance: which files were processed, by which pipeline stages, with what QC metrics. It's the data persistence layer that enables MR.Flow's dataset management and will power future neuroimaging tools.
Infrastructure, not end-user facing. BIDS-SQL is the foundational data layer that other tools build on—the architecture that determines what queries are possible and how efficiently workflows can run.
Schema Architecture
datasets subjects sessions files metadata scansDataset descriptions, participant info, file entities (task, run, acquisition), JSON sidecar storage
eeg_metadata eeg_channels eeg_events eeg_electrodes eeg_coordsystem50+ recording parameter fields, channel info, event markers, electrode positions
mri_metadata mri_intended_for fmri_events physio_metadata dwi_metadataAcquisition parameters, fieldmap relationships, physiological recordings
processing_runs processing_stages qc_metrics ica_components bad_channels rejected_epochsFull pipeline tracking with timing, status, errors, QC thresholds, component classifications
What Problem It Solves
BIDS (Brain Imaging Data Structure) organizes neuroimaging data as files in directories. This works for storage, but falls apart when you need to:
- Query across 200 subjects: "Which subjects completed Task A with good QC?"
- Track processing: "Which files have been preprocessed? Which failed?"
- Join metadata: "Get all EEG files where sampling rate > 500 Hz and impedance < 10 kΩ"
- Maintain provenance: "What pipeline version processed this file? What QC metrics resulted?"
BIDS-SQL makes BIDS queryable. Instead of walking directories and parsing JSON, you write SQL.
How It Works
Dual Backend: SQLite for Local, PostgreSQL for Shared
- SQLite — Single-file database for local projects, no server needed
- PostgreSQL — Multi-user access for shared lab databases
- Identical Python API — Same code works with either backend
- Schema versioning — Modular SQL files applied in order
Import/Export: BIDS Directory ↔ Database Roundtrip
- BIDSImporter — Parse BIDS directories with PyBIDS, populate database
- BIDSExporter — Export database contents back to BIDS-formatted directories
- Entity normalization — Auto-strip 'sub-', 'ses-' prefixes for clean queries
- Metadata extraction — JSON sidecars stored as queryable key-value pairs
Query Helpers for Common Operations
- File queries — Filter by datatype, task, run, session, subject, acquisition
- QC queries — Get metrics with threshold evaluation, summary statistics
- Processing queries — Track pipeline stages, find failed jobs, check completion
- Metadata search — Search across sidecar fields with SQL patterns
Processing Provenance: Full Pipeline Tracking
- Pipeline runs — Track inputs, outputs, execution status, timing, errors
- Multi-stage support — Record each processing stage independently
- QC metrics — Store metrics with categories (signal_quality, artifacts, motion) and thresholds
- ICA tracking — ICLabel classifications with probabilities and removal decisions
- Bad channel history — Detection method, interpolation records
- Epoch rejection — Reasons, peak amplitudes, timestamps
Example Queries
-- Get all EEG files for subjects who completed TaskA
SELECT f.filepath, s.subject_id
FROM files f
JOIN subjects s ON f.subject_id = s.id
WHERE f.datatype = 'eeg' AND f.task = 'TaskA';
-- Find files that failed preprocessing
SELECT f.filepath, pr.error_message
FROM processing_runs pr
JOIN files f ON pr.input_file_id = f.id
WHERE pr.status = 'failed';
-- QC summary: average metrics by subject
SELECT s.subject_id, AVG(qc.value) as avg_snr
FROM qc_metrics qc
JOIN files f ON qc.file_id = f.id
JOIN subjects s ON f.subject_id = s.id
WHERE qc.metric_name = 'snr'
GROUP BY s.subject_id;
Architecture Decisions
- Modular schemas — Core + extensions (EEG, MRI, provenance) can be mixed
- Foreign key constraints — Data integrity with ON DELETE CASCADE
- Strategic indexing — Indexes on commonly filtered fields (dataset_id, datatype, task)
- BIDS compliance — Entity naming matches the spec exactly
- File lineage — Derivatives track source files via source_file_id
Impact
- Powers MR.Flow's dataset management and job tracking
- Enables complex queries that would require custom scripts with file-based BIDS
- Full reproducibility through processing provenance
- Foundation for future neuroimaging tools in the Neurotrophy ecosystem
Related Tools
How does this connect? This is the data layer that enables tools like MR.Flow to query and coordinate neuroimaging workflows. For context on how infrastructure architecture shapes what's possible, see Architecture → Processing → Behavior.