Compare commits

3 Commits

| Author | SHA1 | Message | Date |
|------|------|------|------|
| Nathan | 6d4cdb0549 | zipseq 7zinst of 0 bottleneck fix | 2025-11-11 16:31:09 -07:00 |
| Nathan | 55a2074c62 | fix zipseq ignoring zip metadata and not replacing with 7z | 2025-11-11 12:45:44 -07:00 |
| Nathan | a152d17136 | work config | 2025-11-11 10:27:48 -07:00 |
3 changed files with 683 additions and 26 deletions


@@ -19134,7 +19134,6 @@ _**Agent (model default, mode Agent)**_
---
The issue: `UpdateSequences.bat` is in `Renders\04 Securing Your Seam\`, so going up one level (`%script_dir%..`) gives `Renders\` instead of the project root. Checking the batch file logic:
@@ -19160,7 +19159,6 @@ The issue: `UpdateSequences.bat` is in `Renders\04 Securing Your Seam\`, so goin
---
The issue: `UpdateSequences.bat` is in `Renders\04 Securing Your Seam\`, so going up one level (`..`) gives `Renders\` instead of the project root. `ZipSeqArchv.bat` works because it's directly in `Renders\`. Updating `UpdateSequences.bat` to go up two levels to reach the project root:
@@ -19193,7 +19191,6 @@ Lines added: 1, lines removed: 1
---
Fixed. `UpdateSequences.bat` is in subdirectories like `Renders\04 Securing Your Seam\`, so it needs to go up two levels to reach the project root, not one.
The script now goes up two levels (`..\..`) to find `P:\250827_FestivalTurf\.config\config.json`.
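The path arithmetic can be checked outside the batch file. The following is a minimal Python sketch, not part of the repository: the helper name `find_project_root` is hypothetical, the full script path is assembled from the two paths quoted above on the assumption that `Renders\` sits directly under the project root, and `PureWindowsPath` is used so the example runs on any OS.

```python
from pathlib import PureWindowsPath


def find_project_root(script_path: str, levels_up: int) -> PureWindowsPath:
    """Return the directory `levels_up` levels above the script's own folder."""
    script_dir = PureWindowsPath(script_path).parent
    # parents[0] is one level above script_dir, parents[1] is two levels above
    return script_dir.parents[levels_up - 1]


# Assumed location of one deployed copy of the batch file
script = r"P:\250827_FestivalTurf\Renders\04 Securing Your Seam\UpdateSequences.bat"

one_up = find_project_root(script, 1)   # lands in Renders\, the old behaviour
two_up = find_project_root(script, 2)   # lands in the project root
config = two_up / ".config" / "config.json"
```

Going up one level stops at `Renders\`, while two levels reaches the folder that holds `.config\config.json`.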
@@ -19202,3 +19199,636 @@ Redeploy with `ConfigLoader.ps1` to update all `UpdateSequences.bat` files in th
---
_**User (2025-11-11 19:15Z)**_
@2025-11-11_01-50Z-update-configloader-to-overwrite-gitattributes-and-gitignore.md we pulled some changes from home. the zipper is removing all non-7z archives, but it's not replacing them with new 7z archives. We need the zips to be kept if their metadata is still consistent, and only replaced with 7z if the metadata are not consistent.
---
_**Agent (model default, mode Agent)**_
---
---
Checking the upgrade logic from `.zip` to `.7z` to ensure `.zip` files are only replaced when metadata changes:
---
<tool-use data-tool-type="read" data-tool-name="read_file">
<details><summary>Tool use: **read_file** • Read file: zip_sequences.py</summary>
</details>
</tool-use>
---
---
Checking the `zip_sequence` function for `.zip` deletion logic:
---
<tool-use data-tool-type="read" data-tool-name="read_file">
<details><summary>Tool use: **read_file** • Read file: zip_sequences.py</summary>
</details>
</tool-use>
---
---
Fixing the logic: when a `.zip` exists, `.7z` doesn't, and metadata matches, keep the `.zip` and don't create a `.7z`. Updating the code:
---
<tool-use data-tool-type="read" data-tool-name="read_file">
<details><summary>Tool use: **read_file** • Read file: zip_sequences.py</summary>
</details>
</tool-use>
---
<tool-use data-tool-type="write" data-tool-name="search_replace">
<details><summary>Tool use: **code_edit** • Edit file: zip_sequences.py</summary>
**Chunk 1**
Lines added: 3, lines removed: 2
```diff
@@ -650,14 +650,15 @@
# Check if we need to upgrade from .zip to .7z
old_zip_path = None
+ old_stored_state = None
if ZIPPER_TYPE == "7z":
old_zip_path = zip_path.with_suffix(".zip")
if old_zip_path.exists():
old_state_path = state_path_for(old_zip_path)
old_stored_state = load_state(old_state_path)
- # If old .zip exists and archive doesn't, we'll check state later
+ # If old .zip exists and .7z doesn't, use old .zip's state for comparison
if not zip_path.exists() and old_stored_state is not None:
stored_state = old_stored_state
- # If archive exists and we have stored state, do quick check before computing full state
+ # If .7z archive exists and we have stored state, do quick check before computing full state
if zip_path.exists() and stored_state is not None:
# Quick check: if directory mtime is older than archive, likely unchanged
```
**Chunk 2**
Lines added: 1, lines removed: 1
```diff
@@ -667,5 +668,5 @@
# If directory wasn't modified since archive was created, skip state computation
if dir_mtime <= archive_mtime:
- # Still need to check for old .zip cleanup
+ # Still need to check for old .zip cleanup (we have .7z, so .zip is obsolete)
if old_zip_path and old_zip_path.exists():
old_zip_path.unlink(missing_ok=True)
```
**Chunk 3**
Lines added: 15, lines removed: 7
```diff
@@ -685,11 +686,19 @@
# Check if state changed
if stored_state is not None and not state_changed(seq_state, stored_state):
- # Target archive is up to date, but we might still need to clean up old .zip
- if old_zip_path and old_zip_path.exists():
- old_zip_path.unlink(missing_ok=True)
- old_state_path = state_path_for(old_zip_path)
- if old_state_path.exists():
- old_state_path.unlink(missing_ok=True)
- continue
+ # Metadata matches stored state
+ if zip_path.exists():
+ # .7z exists and is up to date, clean up old .zip if it exists
+ if old_zip_path and old_zip_path.exists():
+ old_zip_path.unlink(missing_ok=True)
+ old_state_path = state_path_for(old_zip_path)
+ if old_state_path.exists():
+ old_state_path.unlink(missing_ok=True)
+ elif old_zip_path and old_zip_path.exists() and old_stored_state is not None:
+ # .7z doesn't exist, but .zip exists and metadata matches
+ # Keep the .zip file, don't create .7z
+ continue
+ else:
+ # No archive exists, but state matches (shouldn't happen, but be safe)
+ continue
work_items.append((seq_dir, zip_path, state_path, seq_state))
```
</details>
</tool-use>
---
---
Checking for linting errors:
---
<tool-use data-tool-type="generic" data-tool-name="read_lints">
<details>
<summary>Tool use: **read_lints** • Read lints for 1 file</summary>
Lint paths:
- `zip_sequences.py`
Lint results
**No lint errors found**
</details>
</tool-use>
---
---
Fixed the logic. The script now:
1. **Keeps `.zip` files when metadata matches**: If a `.zip` exists, no `.7z` exists, and the metadata matches the current state, it keeps the `.zip` and does not create a `.7z`.
2. **Only replaces `.zip` with `.7z` when metadata changes**: If the metadata doesn't match (files have changed), it creates a new `.7z` and deletes the old `.zip`.
3. **Cleans up obsolete `.zip` files**: If a `.7z` already exists and is up-to-date, it deletes any old `.zip` files.
The key change is on lines 696-699: when a `.zip` exists, no `.7z` exists, and the metadata matches, the script skips creating a `.7z` and keeps the `.zip` file.
This ensures that `.zip` files are only replaced with `.7z` when the files have actually changed, not just because the script is running.
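The three cases above can be sketched as a small decision function. This is an illustrative sketch, not the code in `zip_sequences.py`: the `Action` enum and the `decide` function are hypothetical names, and the sketch assumes an up-to-date `.7z` is skipped rather than recompressed, as the summary describes.

```python
from enum import Enum


class Action(Enum):
    COMPRESS = "create a new .7z (the old .zip is replaced)"
    KEEP_ZIP = "keep the existing .zip, do not create a .7z"
    CLEAN_OLD_ZIP = "keep the .7z, delete the obsolete .zip"
    SKIP = "nothing to do"


def decide(seven_z_exists: bool, zip_exists: bool, state_matches: bool) -> Action:
    """Pick what to do for one sequence directory.

    `state_matches` means the stored metadata equals the current
    directory state (hypothetical flag standing in for the state check).
    """
    if not state_matches:
        # Files changed since the last archive: rebuild as .7z
        return Action.COMPRESS
    if seven_z_exists:
        # .7z is current; any leftover .zip is redundant
        return Action.CLEAN_OLD_ZIP if zip_exists else Action.SKIP
    if zip_exists:
        # Only a .zip exists and it is still current: leave it alone
        return Action.KEEP_ZIP
    # No archive but matching state: mirrors the "be safe" skip in the diff
    return Action.SKIP
```

For example, `decide(False, True, True)` returns `Action.KEEP_ZIP`, which is the case the user reported as broken.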
---
_**User (2025-11-11 19:15Z)**_
Very low CPU and RAM allocation on this. Let's make Max7zInst=0 be our original balls-to-the-walls process.
[zip] RAM: 102.2GB available (80% of total), 40.9GB for compression (40%), 8.0GB per job (dict: 1024MB) → RAM limit: 5, requested: None, final: 5
---
_**Agent (model default, mode Agent)**_
---
Updating the auto-calculation logic for `Max7zInst=0` to be more aggressive, using more CPU and RAM:
---
<tool-use data-tool-type="read" data-tool-name="read_file">
<details><summary>Tool use: **read_file** • Read file: zip_sequences.py</summary>
</details>
</tool-use>
---
---
<tool-use data-tool-type="read" data-tool-name="read_file">
<details><summary>Tool use: **read_file** • Read file: zip_sequences.py</summary>
</details>
</tool-use>
---
---
---
Updating the auto-calculation to be more aggressive when `Max7zInst=0`:
---
<tool-use data-tool-type="write" data-tool-name="search_replace">
<details><summary>Tool use: **code_edit** • Edit file: zip_sequences.py</summary>
**Chunk 1**
Lines added: 6, lines removed: 5
```diff
@@ -282,12 +282,13 @@
# Auto-calculate based on RAM if Max7zInst not configured
+ # "Balls-to-the-walls" mode: use maximum resources
if available_ram is not None:
- # 7z uses ~2-3x dictionary size in RAM, but with overhead use 8x for safety
- # This accounts for 7z's internal buffers, OS overhead, and other processes
- FIXED_RAM_PER_JOB = FIXED_DICT_SIZE_MB * 8 * 1024 * 1024 # 8GB per job
+ # 7z uses ~2-3x dictionary size in RAM, use 3x for aggressive mode
+ # This is more realistic and allows more concurrent workers
+ FIXED_RAM_PER_JOB = FIXED_DICT_SIZE_MB * 3 * 1024 * 1024 # 3GB per job
# available_ram is already 80% of total (20% reserved for system)
- # Use only 40% of that for compression jobs (very conservative to prevent swapping)
- compression_ram = int(available_ram * 0.4)
+ # Use 95% of available RAM for compression jobs (aggressive mode)
+ compression_ram = int(available_ram * 0.95)
# Calculate worker limit based on fixed per-job RAM
```
**Chunk 2**
Lines added: 1, lines removed: 1
```diff
@@ -305,5 +306,5 @@
log(
"zip",
- f"RAM: {ram_gb:.1f}GB available (80% of total), {compression_ram_gb:.1f}GB for compression (40%), {ram_per_job_gb:.1f}GB per job (dict: {FIXED_DICT_SIZE_MB}MB) → "
+ f"RAM: {ram_gb:.1f}GB available (80% of total), {compression_ram_gb:.1f}GB for compression (95%), {ram_per_job_gb:.1f}GB per job (dict: {FIXED_DICT_SIZE_MB}MB) → "
f"RAM limit: {ram_limit}, requested: {requested}, final: {final_limit}",
verbose_only=True,
```
</details>
</tool-use>
---
---
<tool-use data-tool-type="generic" data-tool-name="read_lints">
<details>
<summary>Tool use: **read_lints** • Read lints for 1 file</summary>
Lint paths:
- `zip_sequences.py`
Lint results
**No lint errors found**
</details>
</tool-use>
---
---
---
Updated the auto-calculation for `Max7zInst=0` to be more aggressive.
**Changes:**
1. **Increased RAM usage**: From 40% to 95% of available RAM for compression
2. **Reduced per-job RAM estimate**: From 8GB to 3GB per job (more realistic for a 1GB dictionary)
**Results for your system:**
- **Before**: 40.9GB for compression ÷ 8GB per job = **5 workers**
- **After**: ~97GB for compression ÷ 3GB per job = **~32 workers**
This should use more CPU and RAM. The dictionary size remains 1GB, so compression ratio is unchanged.
With `Max7zInst=0`, the script will use up to ~32 concurrent 7z instances, maximizing throughput. If you want to limit it, set `Max7zInst` to a specific number in your config.
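The before/after worker counts can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: `ram_worker_limit` is a hypothetical helper that follows the formula in the diff above, and the 102.2GB figure is taken from the log line quoted in the request.

```python
FIXED_DICT_SIZE_MB = 1024  # 1GB dictionary, as reported in the log line


def ram_worker_limit(available_ram: int, ram_fraction: float, dict_multiplier: int) -> int:
    """Number of concurrent 7z jobs that fit in the RAM budget.

    available_ram   -- bytes already reduced to 80% of total
    ram_fraction    -- share of available_ram given to compression
    dict_multiplier -- assumed RAM per job as a multiple of the dictionary
    """
    ram_per_job = FIXED_DICT_SIZE_MB * dict_multiplier * 1024 * 1024
    compression_ram = int(available_ram * ram_fraction)
    return max(1, compression_ram // ram_per_job)


available = int(102.2 * 1024 ** 3)               # 102.2GB available

before = ram_worker_limit(available, 0.40, 8)    # conservative settings
after = ram_worker_limit(available, 0.95, 3)     # aggressive settings
print(before, after)  # prints "5 32"
```

Because 95% is applied to RAM that is already 80% of the total, the aggressive setting budgets roughly 76% of physical memory for compression.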
---
_**User (2025-11-11 21:54Z)**_
it's now trying to do 32 jobs with 3gb allocated to each, even when there's only one sequence to zip. let's make it account for the number of 7z jobs there are to do.
---
_**Agent (model default, mode Agent)**_
---
---
Updating the logic to cap workers at the number of sequences to zip. Checking how `max_workers` is called:
---
<tool-use data-tool-type="read" data-tool-name="read_file">
<details><summary>Tool use: **read_file** • Read file: zip_sequences.py</summary>
</details>
</tool-use>
---
---
<tool-use data-tool-type="grep" data-tool-name="grep">
<details>
<summary>Tool use: **grep** • Grep for "def run_zip" in "zip_sequences.py" • 1 match</summary>
Output mode: content
| File | Content | Line |
|------|------|------|
| `zip_sequences.py` | `def run_zip(requested_workers: int \| None, *, verbose: bool) -> int:` | L639 |
</details>
</tool-use>
---
---
---
<tool-use data-tool-type="read" data-tool-name="read_file">
<details><summary>Tool use: **read_file** • Read file: zip_sequences.py</summary>
</details>
</tool-use>
---
---
---
Updating `max_workers` to cap workers at the number of work items:
---
<tool-use data-tool-type="read" data-tool-name="read_file">
<details><summary>Tool use: **read_file** • Read file: zip_sequences.py</summary>
</details>
</tool-use>
---
---
Capping workers at the number of work items:
---
<tool-use data-tool-type="write" data-tool-name="search_replace">
<details><summary>Tool use: **code_edit** • Edit file: zip_sequences.py</summary>
**Chunk 1**
Lines added: 5, lines removed: undefined
```diff
@@ -295,4 +295,9 @@
ram_limit = max(1, compression_ram // FIXED_RAM_PER_JOB)
+ # Cap at number of actual work items (can't have more workers than jobs)
+ num_work_items = len(work_items) if work_items else 0
+ if num_work_items > 0:
+ ram_limit = min(ram_limit, num_work_items)
+
# Use RAM limit directly (no CPU limit)
final_limit = ram_limit
```
**Chunk 2**
Lines added: 1, lines removed: 1
```diff
@@ -307,5 +312,5 @@
"zip",
f"RAM: {ram_gb:.1f}GB available (80% of total), {compression_ram_gb:.1f}GB for compression (95%), {ram_per_job_gb:.1f}GB per job (dict: {FIXED_DICT_SIZE_MB}MB) → "
- f"RAM limit: {ram_limit}, requested: {requested}, final: {final_limit}",
+ f"RAM limit: {ram_limit}, work items: {num_work_items}, requested: {requested}, final: {final_limit}",
verbose_only=True,
verbose=verbose
```
</details>
</tool-use>
---
---
Updating the configured `MAX_7Z_INSTANCES` case to also cap at the number of work items:
---
<tool-use data-tool-type="write" data-tool-name="search_replace">
<details><summary>Tool use: **code_edit** • Edit file: zip_sequences.py</summary>
**Chunk 1**
Lines added: 4, lines removed: 1
```diff
@@ -265,6 +265,9 @@
# Check if Max7zInst is configured
if MAX_7Z_INSTANCES is not None:
- # Use configured maximum instances, but still respect user's --jobs
+ # Use configured maximum instances, but still respect user's --jobs and work items
final_limit = MAX_7Z_INSTANCES
+ num_work_items = len(work_items) if work_items else 0
+ if num_work_items > 0:
+ final_limit = min(final_limit, num_work_items)
if requested and requested > 0:
final_limit = min(final_limit, requested)
```
**Chunk 2**
Lines added: 1, lines removed: 1
```diff
@@ -274,5 +277,5 @@
"zip",
f"Using Max7zInst={MAX_7Z_INSTANCES} from config → "
- f"requested: {requested}, final: {final_limit}",
+ f"work items: {num_work_items}, requested: {requested}, final: {final_limit}",
verbose_only=True,
verbose=verbose
```
</details>
</tool-use>
---
---
Updating the default case when RAM detection fails:
---
<tool-use data-tool-type="write" data-tool-name="search_replace">
<details><summary>Tool use: **code_edit** • Edit file: zip_sequences.py</summary>
**Chunk 1**
Lines added: 12, lines removed: 3
```diff
@@ -323,9 +323,18 @@
# RAM detection failed, use a safe default (no CPU limit)
- if verbose:
- log("zip", "RAM detection failed and Max7zInst not set, using default worker limit of 4", verbose_only=True, verbose=verbose)
default_limit = 4
+ num_work_items = len(work_items) if work_items else 0
+ if num_work_items > 0:
+ default_limit = min(default_limit, num_work_items)
if requested and requested > 0:
- default_limit = requested
+ default_limit = min(default_limit, requested)
+ if verbose:
+ log(
+ "zip",
+ f"RAM detection failed and Max7zInst not set, using default worker limit → "
+ f"work items: {num_work_items}, requested: {requested}, final: {default_limit}",
+ verbose_only=True,
+ verbose=verbose
+ )
return (default_limit, fixed_dict_size_bytes)
```
</details>
</tool-use>
---
---
<tool-use data-tool-type="generic" data-tool-name="read_lints">
<details>
<summary>Tool use: **read_lints** • Read lints for 1 file</summary>
Lint paths:
- `zip_sequences.py`
Lint results
**No lint errors found**
</details>
</tool-use>
---
---
---
Updated the code to cap worker count at the number of sequences to zip. Changes:
1. **Auto-calculate mode (`Max7zInst=0`)**: Caps workers at the number of work items
2. **Configured mode (`Max7zInst=N`)**: Also caps at the number of work items
3. **Default fallback**: Also caps at the number of work items
Now when there's only 1 sequence to zip, it will use 1 worker instead of trying to spawn 32. The verbose output will show:
- `work items: 1`
- `final: 1` (instead of 32)
The script still uses aggressive RAM allocation when there are many sequences, but won't exceed the actual number of jobs.
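The capping rule shared by all three modes can be sketched in isolation. `cap_workers` is a hypothetical helper name, not a function in `zip_sequences.py`; it only mirrors the `min(...)` steps shown in the diffs above.

```python
from typing import Optional


def cap_workers(limit: int, num_work_items: int, requested: Optional[int] = None) -> int:
    """Clamp a worker limit to the number of jobs and to an explicit request.

    limit          -- limit from config, RAM calculation, or the default of 4
    num_work_items -- sequences that actually need compressing
    requested      -- optional user-supplied job count (assumed to mirror --jobs)
    """
    if num_work_items > 0:
        # Never start more workers than there are jobs
        limit = min(limit, num_work_items)
    if requested and requested > 0:
        # An explicit request can lower the limit but never raise it
        limit = min(limit, requested)
    return limit


single = cap_workers(32, 1)                  # one sequence to zip
many = cap_workers(32, 100)                  # more jobs than the RAM limit
asked = cap_workers(32, 100, requested=8)    # user asked for fewer workers
```

When the work list is empty the limit is left unchanged, which matches the `if num_work_items > 0` guard in the diffs.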
---


@@ -1,6 +1,6 @@
{
"dailyFormat": "daily_YYMMDD",
- "structDir": "D:\\0 ProjectStructure",
+ "structDir": "A:\\1 Amazon_Active_Projects\\3 ProjectStructure",
"zipper": "7z",
"compression": 9,
"Max7zInst": 0


@@ -264,8 +264,11 @@ def max_workers(
# Check if Max7zInst is configured
if MAX_7Z_INSTANCES is not None:
- # Use configured maximum instances, but still respect user's --jobs
+ # Use configured maximum instances, but still respect user's --jobs and work items
final_limit = MAX_7Z_INSTANCES
+ num_work_items = len(work_items) if work_items else 0
+ if num_work_items > 0:
+ final_limit = min(final_limit, num_work_items)
if requested and requested > 0:
final_limit = min(final_limit, requested)
@@ -273,7 +276,7 @@ def max_workers(
log(
"zip",
f"Using Max7zInst={MAX_7Z_INSTANCES} from config → "
- f"requested: {requested}, final: {final_limit}",
+ f"work items: {num_work_items}, requested: {requested}, final: {final_limit}",
verbose_only=True,
verbose=verbose
)
@@ -281,18 +284,24 @@ def max_workers(
return (final_limit, fixed_dict_size_bytes)
# Auto-calculate based on RAM if Max7zInst not configured
+ # "Balls-to-the-walls" mode: use maximum resources
if available_ram is not None:
- # 7z uses ~2-3x dictionary size in RAM, but with overhead use 8x for safety
- # This accounts for 7z's internal buffers, OS overhead, and other processes
- FIXED_RAM_PER_JOB = FIXED_DICT_SIZE_MB * 8 * 1024 * 1024 # 8GB per job
+ # 7z uses ~2-3x dictionary size in RAM, use 3x for aggressive mode
+ # This is more realistic and allows more concurrent workers
+ FIXED_RAM_PER_JOB = FIXED_DICT_SIZE_MB * 3 * 1024 * 1024 # 3GB per job
# available_ram is already 80% of total (20% reserved for system)
- # Use only 40% of that for compression jobs (very conservative to prevent swapping)
- compression_ram = int(available_ram * 0.4)
+ # Use 95% of available RAM for compression jobs (aggressive mode)
+ compression_ram = int(available_ram * 0.95)
# Calculate worker limit based on fixed per-job RAM
ram_limit = max(1, compression_ram // FIXED_RAM_PER_JOB)
+ # Cap at number of actual work items (can't have more workers than jobs)
+ num_work_items = len(work_items) if work_items else 0
+ if num_work_items > 0:
+ ram_limit = min(ram_limit, num_work_items)
# Use RAM limit directly (no CPU limit)
final_limit = ram_limit
if requested and requested > 0:
@@ -304,8 +313,8 @@ def max_workers(
ram_per_job_gb = FIXED_RAM_PER_JOB / (1024 ** 3)
log(
"zip",
- f"RAM: {ram_gb:.1f}GB available (80% of total), {compression_ram_gb:.1f}GB for compression (40%), {ram_per_job_gb:.1f}GB per job (dict: {FIXED_DICT_SIZE_MB}MB) → "
- f"RAM limit: {ram_limit}, requested: {requested}, final: {final_limit}",
+ f"RAM: {ram_gb:.1f}GB available (80% of total), {compression_ram_gb:.1f}GB for compression (95%), {ram_per_job_gb:.1f}GB per job (dict: {FIXED_DICT_SIZE_MB}MB) → "
+ f"RAM limit: {ram_limit}, work items: {num_work_items}, requested: {requested}, final: {final_limit}",
verbose_only=True,
verbose=verbose
)
@@ -313,11 +322,20 @@ def max_workers(
return (final_limit, fixed_dict_size_bytes)
# RAM detection failed, use a safe default (no CPU limit)
- if verbose:
- log("zip", "RAM detection failed and Max7zInst not set, using default worker limit of 4", verbose_only=True, verbose=verbose)
default_limit = 4
+ num_work_items = len(work_items) if work_items else 0
+ if num_work_items > 0:
+ default_limit = min(default_limit, num_work_items)
if requested and requested > 0:
- default_limit = requested
+ default_limit = min(default_limit, requested)
+ if verbose:
+ log(
+ "zip",
+ f"RAM detection failed and Max7zInst not set, using default worker limit → "
+ f"work items: {num_work_items}, requested: {requested}, final: {default_limit}",
+ verbose_only=True,
+ verbose=verbose
+ )
return (default_limit, fixed_dict_size_bytes)
# For zip compression, use existing estimation-based approach
@@ -649,16 +667,17 @@ def run_zip(requested_workers: int | None, *, verbose: bool) -> int:
# Check if we need to upgrade from .zip to .7z
old_zip_path = None
+ old_stored_state = None
if ZIPPER_TYPE == "7z":
old_zip_path = zip_path.with_suffix(".zip")
if old_zip_path.exists():
old_state_path = state_path_for(old_zip_path)
old_stored_state = load_state(old_state_path)
- # If old .zip exists and archive doesn't, we'll check state later
+ # If old .zip exists and .7z doesn't, use old .zip's state for comparison
if not zip_path.exists() and old_stored_state is not None:
stored_state = old_stored_state
- # If archive exists and we have stored state, do quick check before computing full state
+ # If .7z archive exists and we have stored state, do quick check before computing full state
if zip_path.exists() and stored_state is not None:
# Quick check: if directory mtime is older than archive, likely unchanged
try:
@@ -666,7 +685,7 @@ def run_zip(requested_workers: int | None, *, verbose: bool) -> int:
archive_mtime = zip_path.stat().st_mtime_ns
# If directory wasn't modified since archive was created, skip state computation
if dir_mtime <= archive_mtime:
- # Still need to check for old .zip cleanup
+ # Still need to check for old .zip cleanup (we have .7z, so .zip is obsolete)
if old_zip_path and old_zip_path.exists():
old_zip_path.unlink(missing_ok=True)
old_state_path = state_path_for(old_zip_path)
@@ -684,13 +703,21 @@ def run_zip(requested_workers: int | None, *, verbose: bool) -> int:
# Check if state changed
if stored_state is not None and not state_changed(seq_state, stored_state):
- # Target archive is up to date, but we might still need to clean up old .zip
- if old_zip_path and old_zip_path.exists():
- old_zip_path.unlink(missing_ok=True)
- old_state_path = state_path_for(old_zip_path)
- if old_state_path.exists():
- old_state_path.unlink(missing_ok=True)
- continue
+ # Metadata matches stored state
+ if zip_path.exists():
+ # .7z exists and is up to date, clean up old .zip if it exists
+ if old_zip_path and old_zip_path.exists():
+ old_zip_path.unlink(missing_ok=True)
+ old_state_path = state_path_for(old_zip_path)
+ if old_state_path.exists():
+ old_state_path.unlink(missing_ok=True)
+ elif old_zip_path and old_zip_path.exists() and old_stored_state is not None:
+ # .7z doesn't exist, but .zip exists and metadata matches
+ # Keep the .zip file, don't create .7z
+ continue
+ else:
+ # No archive exists, but state matches (shouldn't happen, but be safe)
+ continue
work_items.append((seq_dir, zip_path, state_path, seq_state))