Data Flow¶
Detailed walkthrough of data flow patterns in ALPCRUN.CH.
Task Submission Flow¶
Step-by-Step: Client to Worker¶
1. Client → Queue Manager: CreateSession
- Client sends session configuration
- Queue manager allocates queues
- Returns session ID
2. Client → Central Cache: SetSharedData (optional)
- Client uploads shared data
- Cache stores with version number
- Returns success
3. Client → Queue Manager: Open ClientStream
- Bidirectional gRPC stream established
- Connection kept alive
4. Client → Queue Manager: Send Batch (task)
- Task batch sent over stream
- Queue manager enqueues in session's task queue
5. Queue Manager: Scheduling
- Identifies available workers
- Sends BIND instruction to worker
- Worker ID added to session
6. Worker → Queue Manager: Open WorkerStream
- Worker connects with session ID
- Bidirectional stream established
7. Queue Manager → Worker: Send Batch (task)
- Dequeues task from session queue
- Sends to worker over stream
8. Worker → Node Cache: GetSharedData (if needed)
- Worker requests shared data
- Node cache checks local storage
- If miss: forward to central cache
- Returns data to worker
9. Worker: Process Task
- Deserializes task
- Performs computation
- Serializes result
10. Worker → Queue Manager: Send Batch (result)
- Result batch sent over stream
- Queue manager enqueues in session's result queue
11. Queue Manager → Client: Send Batch (result)
- Dequeues result from session queue
- Sends to client over stream
12. Client: Process Result
- Deserializes result
- Aggregates or stores
Sequence Diagram¶
Client Queue Mgr Worker Node Cache Central Cache
| | | | |
|--CreateSession->| | | |
|<--Session ID----| | | |
| | | | |
|-----SetSharedData----------------------------------------->|
|<---Success----------------------------------------------|
| | | | |
|--ClientStream-->| | | |
|<--Stream Open---| | | |
| | | | |
|--Send Task----->| | | |
| |--BIND------->| | |
| | |--WorkerStream-->| |
| |<--Stream-----| | |
| |--Task------->| | |
| | |--GetSharedData->| |
| | | |--Forward------->|
| | | |<--Data----------|
| | |<--Data----------| |
| | | | |
| | | [Process] | |
| | | | |
| |<--Result-----| | |
|<--Result--------| | | |
| | | | |
Session Lifecycle¶
State Transitions¶
┌─────────┐
│ Created │ (CreateSession)
└────┬────┘
│
▼
┌─────────┐
│ Active │ (ClientStream open, tasks flowing)
└────┬────┘
│
├──────────────┐
│ │
▼ ▼
┌─────────┐ ┌─────────┐
│ Idle │ │ Closed │ (CloseSession)
└────┬────┘ └─────────┘
│
│ (timeout)
▼
┌─────────┐
│ Closed │
└─────────┘
Session States¶
Created: - Session exists in registry - Queues allocated - No streams connected - No workers bound
Active: - Client stream connected - Tasks being submitted - Workers bound and processing - Results being returned
Idle: - No activity for idle_timeout duration - Client may be disconnected - Workers may be unbound - Session still exists (can be reactivated)
Closed: - Session removed from registry - Queues drained (if cleanup enabled) - All workers unbound - Resources released
Worker Binding Flow¶
Scheduler Logic¶
Every scheduler tick (default: 1s):
1. Get all sessions sorted by priority (desc)
2. For each session:
a. Check if session has pending tasks
b. Get current worker count for session
c. If workers < min_workers OR (workers < max_workers AND queue not empty):
- Find available (idle) workers
- Send BIND instruction
- Mark worker as bound
3. For each session:
a. Check idle timeout
b. If idle AND workers > 0:
- Send UNBIND to workers
- Mark workers as available
Worker State Machine¶
┌──────┐
│ Init │
└───┬──┘
│
▼
┌──────────┐
│ Idle │<─────────┐
└────┬─────┘ │
│ │
│ (BIND) │
▼ │
┌──────────┐ │
│ Bound │ │
└────┬─────┘ │
│ │
│ (Processing) │
▼ │
┌──────────┐ │
│ Busy │ │
└────┬─────┘ │
│ │
│ (UNBIND) │
└────────────────┘
Shared Data Flow¶
Initial Setup¶
Client Central Cache Node Cache Worker
| | | |
|--SetSharedData-------->| | |
| (v1) | | |
|<--Success--------------| | |
| | | |
|--Send Task (v1)------->Queue Mgr--Task(v1)->| |
| | |
| |--GetSharedData-->|
| | (v1) |
| |--Request v1----->|
| | |
| |<--Forward req----|
| |<--Get v1------------| |
| |--Data v1----------->| |
| |--Data v1-------->|
| |<--Cache v1-------|
| | |
| |--Process-------->|
Update Flow¶
Client Central Cache Node Cache Worker
| | | |
|--UpdateSharedData----->| | |
| (v2) | | |
|<--Success--------------| | |
| | | |
|--Send Task (v2)------->Queue Mgr--Task(v2)->| |
| | |
| |--GetSharedData-->|
| | (v2) |
| | [Cache miss v2] |
| |--Request v2----->|
| |<--Get v2------------| |
| |--Data v2----------->| |
| |--Data v2-------->|
| |<--Cache v2-------|
| | |
Batching and Flow Control¶
Client-Side Batching¶
Task Generator Batcher Stream Queue Manager
| | | |
|--Task 1----------->| | |
|--Task 2----------->| | |
|--Task 3----------->| [Batch full] | |
| |--Batch(1,2,3)--->| |
| | |--Enqueue---------->|
| | | |
|--Task 4----------->| | |
|--Task 5----------->| | |
| [Timer expires] | | |
| |--Batch(4,5)----->| |
| | |--Enqueue---------->|
Worker-Side Flow Control¶
Queue Manager Worker Stream Worker Result Queue
| | | |
|--Task Batch 1------>| | |
|--Task Batch 2------>| [Max inflight] | |
|--Task Batch 3------>| [Buffer full] | |
| [BLOCKED] | | |
| |--Process 1------->| |
| | |--Result 1---------->|
| |<--Result 1--------| |
| [Can send again] | | |
|--Task Batch 4------>| | |
Dead Letter Flow¶
Task Failure Handling¶
Worker Queue Manager Client (ClientStream) Dead Letter Queue
| | | |
|--Process Task--------->| | |
| [ERROR] | | |
|--Result (error)------->| | |
| |--Move to DLQ------------->| |
| | | |
| | | |
| |
Client (ClientStreamDeadLetters) | |
| | |
|--Request dead letters---------------------------->| |
|<--Batch (with errors)------------------------------------|
| [Inspect error] |
| [Decide: retry or log] |
|--Send retry task-------------------------------->| |
Multi-Session Flow¶
Concurrent Sessions¶
Client A Queue Mgr Worker Pool Client B
| | | |
|--Session A----->| | |
| |--BIND W1-------->| |
| |--BIND W2-------->| |
| | | |
| | | <--Session B--|
| | |<--BIND W3--------|
| | |<--BIND W4--------|
| | | |
|--Tasks A------->|--Tasks---------->| |
| | |<--Tasks----------|<--Tasks B--
| | | |
| | | [Process A & B] |
| | | |
|<--Results A-----|<--Results--------| |
| | |--Results-------->|--Results B->
Priority-Based Scheduling¶
Time Queue Mgr Worker Pool Sessions
t0 | | High (Pri=10)
| | Med (Pri=5)
| | Low (Pri=1)
| |
t1 |--BIND(High)----->| W1,W2,W3 [High gets workers first]
| |
t2 |--BIND(Med)------>| W4,W5 [Med gets remaining]
| |
t3 | | [Low waits]
| |
t4 | [High completes] |
|<--UNBIND---------| W1,W2,W3
|--BIND(Low)------>| W1,W2,W3 [Low can now run]
Cache Hierarchy¶
Node Cache Pull-Through¶
Worker 1 Node Cache Central Cache Worker 2
| | | |
|--Get v1------>| | |
| | [MISS] | |
| |--Get v1--------->| |
| |<--Data v1--------| |
|<--Data v1-----| | |
| | [CACHED] | |
| | | |
| | | <--Get v1--|
| | | [HIT] |
| | | Data v1--->|
| |<-----------------+-----------------|
Throughput Optimization¶
Pipeline Parallelism¶
Stage 1 (10 workers) Stage 2 (5 workers) Stage 3 (2 workers)
| | |
[Process] [Process] [Aggregate]
| | |
v v v
Session A ----------> Session B ----------> Session C
(Priority 10) (Priority 8) (Priority 6)
Adaptive Batching¶
Throughput
^
| ┌─────────────── [Large batches, high throughput]
| /
| /
| / [Adaptive adjustment]
| /
|/___________ [Small batches, low throughput]
└─────────────────────> Time
Next Steps¶
- Components: Component architecture details
- Overview: Architecture principles
- API Reference: API documentation