Proprio

Real-time wearable data pipeline

Co-Founder & Technical Lead · 2017–2021 · PlatformSTL

Python · C# · Node.js · scikit-learn · AWS · AWS Lambda · MongoDB · Swift · Apple Watch · IMU Sensors · Real-time ML

Proprio was a full-stack real-time data platform: Swift watch app → AWS Lambda ingestion → MongoDB → ML classification → web dashboards. It was funded by a $100K SBIR grant, and its per-user models were shown to be 20% more accurate than published state-of-the-art approaches.

End-to-end ownership of a streaming pipeline that processed continuous IMU data from Apple Watches, ran patient-specific ML classifiers, and served results to clinician-facing dashboards.

A sensor-to-dashboard pipeline with real-time streaming, serverless ingestion, document storage, ML inference, and multi-tenant visualization.

Figure: Confusion matrix showing activity classification accuracy.

Engineering Challenge

The problem: continuous sensor data from wearables, but no infrastructure to make it useful.

  • High-frequency capture — IMU data at 30Hz from Apple Watch
  • Two-hop delivery — watch → iPhone (local), iPhone → AWS (on WiFi)
  • Unreliable connectivity — gaps in uploads, packet loss, out-of-order delivery
  • Per-user models — population models fail; each user needs calibration
  • Multi-tenant access — patients, clinicians, and researchers need different views

The solution: a resilient upload pipeline with local buffering on the iPhone, deduplication and gap handling on ingest, and an ML pipeline that trains per-user classifiers.
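
The local-buffering idea can be illustrated with a small sketch. The production client was a Swift iPhone app; this is a language-neutral illustration in Python, not the original code. The class name `UploadBuffer`, the `send` callback, and the sequence-number scheme are all assumptions made for the example. The point it shows is that a batch is dropped from the buffer only after the backend acknowledges it, so a failed upload is simply retried on the next flush.

```python
from collections import OrderedDict
from typing import Callable, List


class UploadBuffer:
    """Holds sample batches locally until the backend acknowledges them."""

    def __init__(self, send: Callable[[int, List[dict]], bool]) -> None:
        self._send = send            # hypothetical uploader; returns True on ack
        self._pending: "OrderedDict[int, List[dict]]" = OrderedDict()
        self._next_seq = 0

    def add_batch(self, samples: List[dict]) -> int:
        """Queue a batch under a monotonically increasing sequence number."""
        seq = self._next_seq
        self._pending[seq] = samples
        self._next_seq += 1
        return seq

    def flush(self, on_wifi: bool) -> int:
        """Upload pending batches in order; return how many were acknowledged."""
        if not on_wifi:
            return 0                 # stay buffered until Wi-Fi is available
        acked = 0
        for seq in list(self._pending):
            if not self._send(seq, self._pending[seq]):
                break                # stop at the first failure, retry later
            del self._pending[seq]   # drop only after acknowledgement
            acked += 1
        return acked

    @property
    def pending(self) -> List[int]:
        """Sequence numbers still waiting for an acknowledgement."""
        return list(self._pending)
```

Because a retry can resend a batch the server already stored, a scheme like this relies on the ingest side being able to discard duplicates.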

System Architecture

Data Ingestion (AWS Lambda + Node.js)

  • Serverless ingestion — Lambda functions handle bursty uploads from iPhones
  • Deduplication — handle retries and out-of-order delivery gracefully
  • Gap detection — identify missing time windows, request re-upload if available
  • Batched writes — aggregate samples before storage
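
As a rough illustration of the deduplication and gap-detection steps, here is a minimal sketch. The ingestion layer itself ran on Node.js; Python is used here only for illustration. The field name `t_ms`, the choice to key duplicates on the timestamp within a single user's stream, and the 1.5x tolerance are assumptions, not details from the original system.

```python
from typing import Iterable, List, Tuple

SAMPLE_PERIOD_MS = 1000 / 30   # 30 Hz capture rate, roughly 33.3 ms per sample


def dedupe_and_sort(samples: Iterable[dict]) -> List[dict]:
    """Drop repeated timestamps (from retries) and restore time order."""
    unique = {}
    for s in samples:
        unique.setdefault(s["t_ms"], s)   # first copy of a timestamp wins
    return [unique[t] for t in sorted(unique)]


def find_gaps(samples: List[dict], tolerance: float = 1.5) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) spans where consecutive samples are too far apart."""
    gaps = []
    for prev, cur in zip(samples, samples[1:]):
        if cur["t_ms"] - prev["t_ms"] > tolerance * SAMPLE_PERIOD_MS:
            gaps.append((prev["t_ms"], cur["t_ms"]))
    return gaps


# An out-of-order batch with one retried sample and one missing stretch.
batch = [{"t_ms": 66}, {"t_ms": 0}, {"t_ms": 33}, {"t_ms": 33}, {"t_ms": 300}]
clean = dedupe_and_sort(batch)
missing = find_gaps(clean)   # [(66, 300)]
```

The spans returned by `find_gaps` are what an ingest function could hand back to the phone as a re-upload request, if the data is still buffered there.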

Storage Layer (MongoDB)

  • Document model — flexible schema for heterogeneous sensor data
  • Time-series indexing — optimized for range queries on timestamped data
  • Per-user partitioning — data isolation for multi-tenant access
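
One plausible shape for this layer is sketched below using plain Python dictionaries, so it runs without a database. The field names (`user_id`, `start`, `end`, `samples`), the bucketing of samples into one document per run, and the compound index are assumptions for illustration; the original schema is not described here. With pymongo, `INDEX_KEYS` could be passed to `Collection.create_index` and the filter to `Collection.find`.

```python
from datetime import datetime, timedelta
from typing import List

# Hypothetical compound index: user first (tenant isolation), then time (range scans).
INDEX_KEYS = [("user_id", 1), ("start", 1)]

SAMPLE_RATE_HZ = 30   # capture rate stated earlier in this page


def make_bucket(user_id: str, start: datetime, samples: List[dict]) -> dict:
    """Group a run of samples for one user into a single document."""
    return {
        "user_id": user_id,
        "start": start,
        "end": start + timedelta(seconds=len(samples) / SAMPLE_RATE_HZ),
        "count": len(samples),
        "samples": samples,
    }


def range_filter(user_id: str, since: datetime, until: datetime) -> dict:
    """Query filter for one user's buckets that start in [since, until)."""
    return {
        "user_id": user_id,
        "start": {"$gte": since, "$lt": until},
    }
```

Leading the index with `user_id` means every query is scoped to one user before the time range is applied, which is one simple way to line up data isolation with range-query performance.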

ML Pipeline (Python + scikit-learn)

  • Feature extraction — time-domain and frequency-domain features from raw IMU
  • Per-user training — calibration sessions generate labeled data for individual classifiers
  • Model registry — versioned classifiers deployed per user
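
To make the feature-extraction step concrete, the sketch below computes a few common time-domain features (mean, standard deviation, RMS) and one frequency-domain feature (dominant frequency) for a single window of one IMU axis. It uses only the standard library and a naive DFT so that it is self-contained; the actual feature set and windowing used in Proprio are not specified here, and the function names are hypothetical. In practice, feature vectors like these would be the input to a per-user scikit-learn classifier.

```python
import cmath
import math
from statistics import fmean, pstdev
from typing import Dict, List

SAMPLE_RATE_HZ = 30.0   # capture rate stated earlier in this page


def dominant_frequency(window: List[float], rate_hz: float = SAMPLE_RATE_HZ) -> float:
    """Frequency (Hz) of the strongest non-DC component, via a naive DFT."""
    n = len(window)
    mean = fmean(window)
    centered = [x - mean for x in window]      # remove the DC offset
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(centered)
        )
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * rate_hz / n


def window_features(window: List[float]) -> Dict[str, float]:
    """Time-domain and frequency-domain features for one axis of one window."""
    return {
        "mean": fmean(window),
        "std": pstdev(window),
        "rms": math.sqrt(fmean(x * x for x in window)),
        "dominant_hz": dominant_frequency(window),
    }


# Two seconds of a synthetic 3 Hz movement sampled at 30 Hz.
signal = [math.sin(2 * math.pi * 3 * i / SAMPLE_RATE_HZ) for i in range(60)]
features = window_features(signal)
```

The naive DFT is quadratic in the window length, which is acceptable for short windows in an example; an FFT would be the usual choice for real workloads.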

Client Apps

  • Swift watch app — background 30Hz IMU capture, streams to paired iPhone
  • Swift iPhone app — local buffering, WiFi-triggered uploads to AWS
  • C# data tools — MongoDB queries, data structuring for analysis
  • Web dashboards — role-based views for patients, clinicians, researchers

Full-stack ownership: watch app, iPhone app, serverless backend, document storage, ML training pipeline, and multi-tenant web dashboards.

My Role

As Co-Founder & Technical Lead, I owned the full technical stack:

  • Designed and implemented the end-to-end data pipeline from watch to dashboard
  • Built the serverless ingestion layer and MongoDB data model
  • Co-invented the patient-specific classification method (published, peer-reviewed)
  • Wrote the SBIR grant that secured $100K funding
  • Ran 50+ customer discovery interviews and clinical validation studies

The company pivoted when we hit the R01 funding barrier. The experience shaped how I think about building data-intensive platforms: real-time pipelines, per-user state, and multi-tenant access patterns.

Results

  • 20% better accuracy than published state-of-the-art (per-user models vs population models)
  • $100K SBIR grant — wrote and won federal funding for R&D
  • Peer-reviewed publication — validated methodology in production
  • NSF I-Corps + BioGenerator — customer discovery and startup accelerator

Figure: t-SNE visualization of patient movement data, showing per-user clustering in feature space.