Overview
MLRunX tracks ML experiments as first-class runs. Each run can store params, tags, metrics, events, and artifacts while remaining easy to query and compare across projects.
Current release focus: Rust API gateway + SQLite-backed local deployment, Next.js dashboard, project-scoped API keys, share links, and a Python SDK with async batching and offline spooling.
Why MLRunX
- Run-centric model with searchable metadata and comparison views.
- Performance-first backend (Rust/Axum + gRPC path).
- Local-first deployment with minimal ops overhead.
- Scaffolded path to scale-out backends (ClickHouse/Postgres/MinIO).
Quick Start
Start API (standalone)
```bash
git clone https://github.com/ibusnowden/MLRunX.git
cd MLRunX
cargo run --bin mlrunx-api
# HTTP :3001, gRPC :50051, SQLite ./mlrunx.db
```
Start Dashboard
```bash
cd apps/ui
npm install
npm run dev
# UI on http://localhost:3000
```
Docker option
```bash
docker run -p 3001:3001 -p 50051:50051 -v mlrunx-data:/data \
  ghcr.io/ibusnowden/mlrunx:latest
```
Use Hosted API
From the training machine:
```bash
uv pip install --upgrade mlrunx

export MLRUNX_SERVER_URL=https://mlrunx.your-domain.com
export MLRUNX_API_KEY=mlrunx_...
export MLRUNX_PROJECT_ID=019c...

python run.py
```
Minimal run code:
```python
import mlrunx

run = mlrunx.init(
    project_id="019c...",
    name="char-gpt-scratch",
    tags={"framework": "scratch", "dataset": "names"},
)

for step in range(1000):
    loss = train_step()
    val_loss = eval_step()
    run.log({"loss": loss, "val_loss": val_loss}, step=step)

run.finish(status="finished")
```
Python SDK
The SDK is asynchronous and non-blocking. Calls to `run.log()` are queued and flushed in the background to avoid slowing training loops.
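The queue-and-flush pattern described above can be sketched roughly as follows. This is an illustrative model, not the SDK's actual implementation: the `EventBuffer` name, the `_flush` hook, and the use of a plain thread are all assumptions, and a real flusher would send each batch to the API over HTTP or gRPC instead of appending to a list.

```python
import queue
import threading


class EventBuffer:
    """Toy model of a non-blocking logger: log() only enqueues;
    a daemon thread drains the queue and flushes in batches."""

    def __init__(self, batch_size=1000, timeout_s=1.0):
        self.batch_size = batch_size  # cf. MLRUNX_BATCH_SIZE
        self.timeout_s = timeout_s    # cf. MLRUNX_BATCH_TIMEOUT_MS
        self.queue = queue.Queue()
        self.flushed = []  # stand-in for "batch sent to server"
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def log(self, event):
        # Non-blocking from the caller's perspective: just enqueue.
        self.queue.put(event)

    def _run(self):
        batch = []
        while True:
            try:
                item = self.queue.get(timeout=self.timeout_s)
            except queue.Empty:
                # Queue aged past the timeout: flush whatever we have.
                if batch:
                    self._flush(batch)
                    batch = []
                continue
            if item is StopIteration:  # shutdown sentinel
                if batch:
                    self._flush(batch)
                return
            batch.append(item)
            if len(batch) >= self.batch_size:
                self._flush(batch)
                batch = []

    def _flush(self, batch):
        # A real SDK would POST / gRPC-stream this batch to the API.
        self.flushed.append(list(batch))

    def close(self):
        self.queue.put(StopIteration)
        self._worker.join()
```

The training loop only ever pays the cost of a `Queue.put`; batching and network latency live entirely on the worker thread.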
Install
```bash
pip install mlrunx
```
Basic usage
```python
import mlrunx

run = mlrunx.init(
    project="demo-project",
    name="train-resnet50",
    tags={"model": "resnet50", "dataset": "imagenet"},
)

run.log_params({"lr": 0.001, "batch_size": 32})

for step in range(1000):
    loss, acc = train_step()
    run.log({"loss": loss, "accuracy": acc}, step=step)

run.finish()
```
Context manager
```python
import mlrunx

with mlrunx.init(project="demo-project") as run:
    run.log_params({"optimizer": "adamw", "epochs": 10})
    for step in range(200):
        run.log({"loss": train_step()}, step=step)
# automatically flushes and closes
```
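Under the hood, context-manager support typically amounts to `__exit__` calling `finish()` for you. The sketch below shows the general shape, assuming (this is a guess, not confirmed by the SDK docs) that a run raised out of is marked `"failed"` while a clean exit is marked `"finished"`:

```python
class Run:
    """Minimal stand-in for an SDK run object to show the
    context-manager lifecycle; not the real mlrunx class."""

    def __init__(self):
        self.status = None

    def log(self, metrics, step=None):
        pass  # the real SDK would enqueue these events

    def finish(self, status="finished"):
        self.status = status

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Mark the run failed if the block raised, finished otherwise.
        self.finish(status="failed" if exc_type else "finished")
        return False  # never swallow the caller's exception
```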
Run API
Core methods for everyday tracking:
```python
run = mlrunx.init(project="my-project", name="exp-01")

run.log({"loss": 0.41, "accuracy": 0.87}, step=140)
run.log_params({"lr": 0.0005, "dropout": 0.1})
run.log_tags({"owner": "ibra", "stage": "baseline"})

run.finish(status="finished")
```
Offline spool behavior
When the API is unreachable, queued events are written to a local disk spool and replayed once connectivity returns. Control it with:
```bash
export MLRUNX_SPOOL_ENABLED=true
export MLRUNX_SPOOL_DIR=~/.mlrunx/spool
export MLRUNX_SPOOL_MAX_SIZE=100000000
```
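A spool like this can be modeled as an append-only JSONL file that is replayed in order when the connection recovers. The sketch below is an assumption about the mechanism, not the SDK's actual file layout: `spool_event`, `replay_spool`, and the `events.jsonl` filename are all illustrative names.

```python
import json
import os


def spool_event(spool_dir, event):
    """Append one event to the on-disk spool (offline path)."""
    os.makedirs(spool_dir, exist_ok=True)
    path = os.path.join(spool_dir, "events.jsonl")
    with open(path, "a") as f:
        f.write(json.dumps(event) + "\n")


def replay_spool(spool_dir, send):
    """Re-send spooled events in original order, then clear the spool."""
    path = os.path.join(spool_dir, "events.jsonl")
    if not os.path.exists(path):
        return 0
    sent = 0
    with open(path) as f:
        for line in f:
            send(json.loads(line))  # send = real network call in the SDK
            sent += 1
    os.remove(path)  # clear only after a successful replay
    return sent
```

JSONL keeps appends cheap and crash-tolerant, which is why it is a common choice for this kind of spool; a size cap like `MLRUNX_SPOOL_MAX_SIZE` would bound how large the file may grow.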
Configuration
Common runtime variables used by the server and SDK.
| Variable | Default | Description |
|---|---|---|
| MLRUNX_SERVER_URL | http://localhost:3001 | SDK target API URL |
| MLRUNX_API_KEY | None | Auth key for SDK requests |
| MLRUNX_BATCH_SIZE | 1000 | Max events per flush batch |
| MLRUNX_BATCH_TIMEOUT_MS | 1000 | Max queue age before flush |
| MLRUNX_COALESCE_METRICS | true | Keep latest metric per step |
| MLRUNX_SPOOL_ENABLED | true | Enable offline disk spool |
| MLRUNX_OFFLINE | false | Force offline-only mode |
| API_HTTP_PORT | 3001 | Rust API HTTP port |
| API_GRPC_PORT | 50051 | Rust API gRPC port |
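One way an SDK might resolve the variables in the table above, falling back to the documented defaults when unset, is sketched below. Only the variable names and defaults come from the table; the `load_config` helper itself is illustrative, not part of the mlrunx API.

```python
import os

# Defaults taken from the configuration table; values are strings
# as they would arrive from the environment.
DEFAULTS = {
    "MLRUNX_SERVER_URL": "http://localhost:3001",
    "MLRUNX_BATCH_SIZE": "1000",
    "MLRUNX_BATCH_TIMEOUT_MS": "1000",
    "MLRUNX_COALESCE_METRICS": "true",
    "MLRUNX_SPOOL_ENABLED": "true",
    "MLRUNX_OFFLINE": "false",
}


def load_config(env=os.environ):
    """Merge the environment over the defaults and coerce types."""
    cfg = {k: env.get(k, v) for k, v in DEFAULTS.items()}
    return {
        "server_url": cfg["MLRUNX_SERVER_URL"],
        "api_key": env.get("MLRUNX_API_KEY"),  # no default: required for auth
        "batch_size": int(cfg["MLRUNX_BATCH_SIZE"]),
        "batch_timeout_ms": int(cfg["MLRUNX_BATCH_TIMEOUT_MS"]),
        "coalesce_metrics": cfg["MLRUNX_COALESCE_METRICS"].lower() == "true",
        "spool_enabled": cfg["MLRUNX_SPOOL_ENABLED"].lower() == "true",
        "offline": cfg["MLRUNX_OFFLINE"].lower() == "true",
    }
```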
Architecture
Current architecture is monolith-first: one API server with clear internal boundaries for future service extraction.
```text
Python SDK --> Rust API (Axum + Tonic) --> SQLite (v0.1 default)
                     |                         |
                     |                         +--> API keys, share tokens,
                     |                              run metadata, metrics/events/params
                     +--> Next.js UI (TypeScript)
```
Scale-out path (scaffolded in repo)
```text
SDK --> Ingest Service --> ClickHouse (metrics)
UI <-> API Gateway  <-> PostgreSQL (metadata)
        Processor   --> MinIO      (artifacts)
```
Project Layout
```text
MLRunX/
├── apps/
│   ├── api/          # Rust API gateway
│   └── ui/           # Next.js dashboard
├── sdks/
│   ├── python/       # Python SDK
│   └── integrations/ # Framework hooks
├── services/
│   ├── ingest/       # Scaffolded ingest service
│   └── processor/    # Scaffolded rollup processor
├── crates/proto/     # Shared protobuf contracts
├── infra/docker/     # Compose stack and local infra
├── docs/             # Architecture/specs/ops docs
└── bench/            # Benchmarks and thresholds
```