begin persistence/attachment overhaul
This commit is contained in:
29
persistent_workers.md
Normal file
29
persistent_workers.md
Normal file
@@ -0,0 +1,29 @@
|
||||
## Persistent Remote Worker Architecture
|
||||
|
||||
This repository now includes a small controller/attach framework that keeps both Flamenco and SheepIt workers running on each remote Windows host even after the SSH session closes.
|
||||
|
||||
### Components
|
||||
|
||||
| File | Purpose |
|
||||
| --- | --- |
|
||||
| `remote_worker_controller.ps1` | Runs on each remote host. Launches the real worker process (Flamenco or SheepIt), redirects its stdout/stderr to a log file, listens for commands, and restarts the worker if it crashes. |
|
||||
| `remote_worker_attach.ps1` | Also runs remotely. Streams the controller’s log file back to the local SSH session and forwards commands (`pause`, `resume`, `quit`, etc.) to the worker’s stdin. It can also be invoked in a `-CommandOnly` mode to send a single command without attaching. |
|
||||
| `unified_flamenco_launcher.ps1` / `unified_sheepit_launcher.ps1` | Local launchers that copy the helper scripts to each host, start controllers in the background, and open attach sessions as requested. |
|
||||
|
||||
All metadata, logs and helper scripts live under `C:\ProgramData\UnifiedWorkers\<worker-type>\<worker-name>\`.
|
||||
|
||||
### Typical Flow
|
||||
|
||||
1. **Start/attach**: The launcher ensures the controller + payload are present on the target host, starts the controller through `Start-Process`, then launches `remote_worker_attach.ps1` via SSH so you can see live output and type commands. Closing the SSH window only ends the attach session—the worker keeps running.
|
||||
2. **Reattach**: Run the launcher again and choose the same worker. If a controller reports that a worker is already running, the launcher simply opens another attach session.
|
||||
3. **Pause/Resume/Quit all (SheepIt)**: The SheepIt launcher now exposes menu options that iterate over every enabled worker and invoke the attach helper in `-CommandOnly` mode to send the requested command.
|
||||
|
||||
### Manual Verification Checklist
|
||||
|
||||
1. **Controller deployment**: From the launcher, start a worker once and confirm `%ProgramData%\UnifiedWorkers` appears on the remote host with `controller.ps1`, `attach-helper.ps1`, `logs\worker.log`, and `state\worker-info.json`.
|
||||
2. **Persistence**: Attach to a worker, close the SSH window, then reattach. The log stream should resume and the worker PID reported in metadata should remain unchanged.
|
||||
3. **Command channel**: While attached, type `pause`, `resume`, and `quit`. The controller should log each command and the worker should react accordingly. Repeat the same commands using the SheepIt “Pause/Resume/Quit All” menu entries to validate the automation path.
|
||||
4. **Failure recovery**: Manually terminate the worker process on a remote host. The controller should log the exit, update metadata, and restart the worker up to the configured retry limit.
|
||||
|
||||
These steps ensure both Flamenco and SheepIt workflows operate as intended with the new persistent worker model. Adjust the retry/backoff values inside `remote_worker_controller.ps1` if you need more aggressive or conservative restart behavior.
|
||||
|
||||
Reference in New Issue
Block a user