zipseq ignore _CURRENT

---

_**User (2025-11-11 01:50Z)**_

@zip_sequences.py (1-576) let's make zipseq ignore _CURRENT folders as well as _archive folders.

---

_**Agent (model default, mode Agent)**_

---

Updating `zip_sequences.py` to ignore `_CURRENT` folders in addition to `_archive` folders.

---

<tool-use data-tool-type="read" data-tool-name="read_file">
<details><summary>Tool use: **read_file** • Read file: zip_sequences.py</summary>
</details>
</tool-use>

---

Updating the code to ignore `_CURRENT` folders:

---

<tool-use data-tool-type="write" data-tool-name="search_replace">
<details><summary>Tool use: **code_edit** • Edit file: zip_sequences.py</summary>

**Chunk 1**
Lines added: 1, lines removed: 1

```diff
@@ -123,5 +123,5 @@
 
 def is_archive_path(path: Path) -> bool:
-    return any(part == "_archive" for part in path.parts)
+    return any(part in ("_archive", "_CURRENT") for part in path.parts)
 
 
```

</details>
</tool-use>

---

<tool-use data-tool-type="write" data-tool-name="search_replace">
<details><summary>Tool use: **code_edit** • Edit file: zip_sequences.py</summary>

**Chunk 1**
Lines added: 1, lines removed: 1

```diff
@@ -129,5 +129,5 @@
     for dirpath, dirnames, filenames in os.walk(root):
         path = Path(dirpath)
-        dirnames[:] = [d for d in dirnames if d != "_archive"]
+        dirnames[:] = [d for d in dirnames if d not in ("_archive", "_CURRENT")]
         if is_archive_path(path):
             continue
```

</details>
</tool-use>

---

<tool-use data-tool-type="write" data-tool-name="search_replace">
<details><summary>Tool use: **code_edit** • Edit file: zip_sequences.py</summary>

**Chunk 1**
Lines added: 1, lines removed: 1

```diff
@@ -140,5 +140,5 @@
     for dirpath, dirnames, filenames in os.walk(seq_dir):
         path = Path(dirpath)
-        dirnames[:] = [d for d in dirnames if d != "_archive"]
+        dirnames[:] = [d for d in dirnames if d not in ("_archive", "_CURRENT")]
         if is_archive_path(path):
             continue
```

</details>
</tool-use>

---

<tool-use data-tool-type="generic" data-tool-name="read_lints">
<details>
<summary>Tool use: **read_lints** • Read lints for 1 file</summary>

Lint paths:

- `zip_sequences.py`

Lint results

Lint errors for `zip_sequences.py`

| Message | Range | Severity |
|---------|-------|----------|
| Variable "source" is not accessed | 55:9-55:15 | DIAGNOSTIC_SEVERITY_HINT |

</details>
</tool-use>

---

Updated `zip_sequences.py` to ignore `_CURRENT` folders in addition to `_archive` folders. Changes:

1. **`is_archive_path` function** — checks for both `_archive` and `_CURRENT` in path parts
2. **`find_sequence_dirs` function** — filters out both `_archive` and `_CURRENT` from directory traversal
3. **`iter_sequence_files` function** — filters out both `_archive` and `_CURRENT` from directory traversal

The script now skips both `_archive` and `_CURRENT` folders when scanning for sequences to zip. A quick sanity check of the new filter is sketched just below.
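
For reference, a minimal standalone sketch of the updated filter. The function body matches the diff above; the example paths are hypothetical:

```python
from pathlib import Path

def is_archive_path(path: Path) -> bool:
    # Skip a path if any component is an _archive or _CURRENT folder.
    return any(part in ("_archive", "_CURRENT") for part in path.parts)

# Example (hypothetical paths):
assert is_archive_path(Path("renders/sq010/_CURRENT/frame.0001.exr"))
assert is_archive_path(Path("renders/sq010/_archive/frame.0001.exr"))
assert not is_archive_path(Path("renders/sq010/v003/frame.0001.exr"))
```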

---

_**User (2025-11-11 01:50Z)**_

I'm hitting a hard RAM bottleneck. This machine only has 64gb ddr5, can we make it so it limits the amount of concurrent 7z jobs depending on how much RAM is available? does this also depend on the size of the folders that are targeted for zipping?

---

_**Agent (model default, mode Plan)**_

---

Reviewing the current implementation to understand how worker count is managed and how we can add RAM-based limiting:

---

<tool-use data-tool-type="read" data-tool-name="read_file">
<details><summary>Tool use: **read_file** • Read file: zip_sequences.py</summary>
</details>
</tool-use>

---

<tool-use data-tool-type="read" data-tool-name="read_file">
<details><summary>Tool use: **read_file** • Read file: zip_sequences.py</summary>
</details>
</tool-use>

---

<tool-use data-tool-type="read" data-tool-name="read_file">
<details><summary>Tool use: **read_file** • Read file: zip_sequences.py</summary>
</details>
</tool-use>

---

Reviewing the code. A few questions to finalize the approach:

1. RAM reservation: How much RAM should we reserve for the system (e.g., 8–16 GB), or should we use a percentage (e.g., use up to 80% of available RAM)?

2. RAM estimation strategy:
   - a) Simple: limit concurrent jobs based on total available RAM (e.g., assume each job uses ~2–4 GB)
   - b) Size-based: estimate RAM per job from folder size and adjust dynamically
   - c) Hybrid: start with size-based estimates, fall back to simple limits if estimation fails

3. Should this apply only to 7z compression, or also to zip compression?

Folder size does matter: larger folders being compressed simultaneously will use more RAM, and 7z with high compression levels can use significant memory, especially for large sequences. A rough sketch of a size-based estimate follows below.
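
This is not something the script does yet; a rough sketch of what a size-based per-job estimate could look like. The constants (base cost, size factor, cap) and the helper names are illustrative assumptions, not measured values or existing functions:

```python
from pathlib import Path

# Assumed tuning constants: a per-job floor, a rough fraction of the source
# folder size that 7z buffers/dictionaries might need, and a per-job cap.
BASE_RAM = 512 * 1024**2        # ~512 MB floor per job
SIZE_FACTOR = 0.15              # rough fraction of folder size
MAX_PER_JOB = 8 * 1024**3       # cap a single job's estimate at 8 GB

def estimate_ram_for_folder(folder_size_bytes: int) -> int:
    """Very rough per-job RAM estimate for one 7z compression job."""
    return min(MAX_PER_JOB, max(BASE_RAM, int(folder_size_bytes * SIZE_FACTOR)))

def folder_size(path: Path) -> int:
    # Sum file sizes under the folder (no error handling, for brevity).
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())
```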

---

_**User (2025-11-11 01:50Z)**_

1. I think 7z uses 80% by default, but if you have 3 instances trying to use 80%, therein lies the trouble.
2. hybrid approach sounds good
3. I'm not sure, does zip compression also need high ram allocation?

---

_**Agent (model default, mode Plan)**_

---

Researching zip compression RAM usage, then drafting the plan:

---

<tool-use data-tool-type="search" data-tool-name="codebase_search">
<details>
<summary>Tool use: **codebase_search** • Searched codebase "How does Python zipfile compression use memory or RAM?" • **14** results</summary>

| File | Lines |
|------|-------|
| `zip_sequences.py` | L199-326 |
| `zip_sequences.py` | L195-326 |
| `zip_sequences.py` | L202-382 |
| `compress_blend_files.bat` | L65 |
| `zip_sequences.py` | L42-114 |
| `zip_sequences.py` | L70 |
| `zip_sequences.py` | L329-387 |
| `zip_sequences.py` | L385-470 |
| `zip_sequences.py` | L387-517 |
| `zip_sequences.py` | L138-169 |
| `CONFIG.md` | L24 |
| `compress_blend_files.bat` | L63-116 |
| `zip_sequences.py` | L473-565 |
| `zip_sequences.py` | L181-195 |

</details>
</tool-use>

---

Creating the plan. Python's `zipfile` is more memory-efficient than 7z (typically hundreds of MB to a few GB), so we'll focus RAM limiting on 7z and apply conservative limits to zip if needed. A sketch of how the worker cap could be derived from available RAM follows below.
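
As a sketch of the worker-cap math only; the function name and signature are placeholders, not the script's current `max_workers()`:

```python
import os

def ram_limited_workers(available_ram: int, per_job_estimates: list[int],
                        requested: int | None = None) -> int:
    """Illustrative worker cap: the smaller of the CPU-based and RAM-based limits."""
    cpu_limit = requested or os.cpu_count() or 1
    worst_case = max(per_job_estimates, default=1)     # size the cap for the biggest job
    ram_limit = max(1, available_ram // max(1, worst_case))
    return max(1, min(cpu_limit, ram_limit))

# Example: 48 GB usable and queued 7z jobs estimated at 6/4/2 GB gives a RAM limit of 8,
# so on this machine the CPU limit would usually end up being the binding constraint.
```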

---

<tool-use data-tool-type="unknown" data-tool-name="create_plan">
<details>
<summary>Tool use: **create_plan**</summary>

Parameters:

```json
{
"plan": "# Add RAM-Based Worker Limiting to zip_sequences.py\n\n## Overview\nModify `zip_sequences.py` to limit concurrent compression jobs based on available RAM. Use a hybrid approach: estimate RAM per job from folder size for 7z compression, with fallback to simple per-job estimates.\n\n## Implementation Details\n\n### 1. Add RAM detection function\n- Create `get_available_ram()` function in `zip_sequences.py`\n- Use `psutil` if available (cross-platform), fallback to platform-specific methods (Windows: `ctypes` with `GlobalMemoryStatusEx`)\n- Return available RAM in bytes\n- Reserve ~20% of total RAM for system (leaving 80% usable, but accounting for multiple 7z instances)\n\n### 2. Estimate RAM usage per job\n- Create `estimate_ram_per_job(seq_dir: Path, seq_state: dict) -> int` function\n- For 7z: estimate based on folder size (total bytes from `seq_state`)\n - Formula: `max(base_ram, folder_size_bytes * compression_factor)`\n - Base RAM: ~500MB per job\n - Compression factor: ~0.1-0.2x for 7z (7z uses ~80% of available, but we need to account for multiple jobs)\n - For very large folders (>10GB), cap estimate at reasonable max (e.g., 8GB per job)\n- For zip: use conservative estimate (~1GB per job, zipfile is more memory-efficient)\n- Fallback: if estimation fails, use simple per-job estimate (2GB for 7z, 1GB for zip)\n\n### 3. Modify `max_workers()` function\n- Rename to `calculate_max_workers()` or extend to accept RAM constraints\n- Add parameters: `requested: int | None`, `work_items: list | None = None`\n- Calculate CPU-based limit (existing logic)\n- If `work_items` provided and using 7z:\n - Get available RAM\n - Estimate RAM per job for each work item\n - Calculate max workers: `min(cpu_limit, floor(available_ram / max(estimated_ram_per_job)))`\n - Ensure minimum of 1 worker\n- If using zip: apply conservative RAM limiting (e.g., max 4-6 workers regardless of CPU)\n- Return the minimum of CPU limit and RAM limit\n\n### 4. Update `run_zip()` function\n- Calculate work items first (existing logic)\n- Pass work items to `calculate_max_workers()` to get RAM-aware worker count\n- Log RAM-based worker limit if verbose\n- Use calculated worker count for ThreadPoolExecutor\n\n### 5. Handle missing dependencies\n- Make `psutil` optional (try/except import)\n- If `psutil` unavailable, use platform-specific fallback\n- Log warning if RAM detection fails, fall back to CPU-based limits\n\n## Files to Modify\n- `zip_sequences.py`: Add RAM detection, estimation, and worker limiting logic\n\n## Configuration\n- No config changes needed initially\n- Could add `maxRamUsage` or `ramReservePercent` to config.json later if needed\n\n## Testing Considerations\n- Test on systems with different RAM amounts (32GB, 64GB, 128GB+)\n- Test with large folders (>10GB) and small folders (<1GB)\n- Verify fallback behavior when psutil unavailable\n- Ensure minimum 1 worker is always used",
"name": "Add RAM-based worker limiting to zip_sequences",
"todos": [
  {"content": "Add get_available_ram() function with psutil support and Windows fallback", "status": "pending", "id": "add-ram-detection"},
  {"content": "Add estimate_ram_per_job() function that estimates RAM based on folder size for 7z, conservative estimate for zip", "status": "pending", "id": "add-ram-estimation"},
  {"content": "Modify max_workers() to accept work_items and calculate RAM-aware worker limits", "status": "pending", "id": "modify-max-workers", "dependencies": ["add-ram-detection", "add-ram-estimation"]},
  {"content": "Update run_zip() to calculate work items first, then pass to max_workers() for RAM-aware limit", "status": "pending", "id": "update-run-zip", "dependencies": ["modify-max-workers"]},
  {"content": "Add verbose logging for RAM detection, estimation, and worker limit calculations", "status": "pending", "id": "add-logging", "dependencies": ["modify-max-workers"]}
],
"overview": "Implement hybrid RAM-based worker limiting that estimates RAM usage per job based on folder size (for 7z) and dynamically adjusts concurrent workers to prevent RAM exhaustion on systems with limited memory."
}
```

Additional data:

```json
{
  "status": "accepted",
  "reviewData": {"status": "Requested", "selectedOption": "none", "isShowingInput": false}
}
```

Result:

```json
{
  "accepted": {
    "finalTodos": [
      {"content": "Add get_available_ram() function with psutil support and Windows fallback", "status": "pending", "id": "add-ram-detection"},
      {"content": "Add estimate_ram_per_job() function that estimates RAM based on folder size for 7z, conservative estimate for zip", "status": "pending", "id": "add-ram-estimation"},
      {"content": "Modify max_workers() to accept work_items and calculate RAM-aware worker limits", "status": "pending", "id": "modify-max-workers", "dependencies": ["add-ram-detection", "add-ram-estimation"]},
      {"content": "Update run_zip() to calculate work items first, then pass to max_workers() for RAM-aware limit", "status": "pending", "id": "update-run-zip", "dependencies": ["modify-max-workers"]},
      {"content": "Add verbose logging for RAM detection, estimation, and worker limit calculations", "status": "pending", "id": "add-logging", "dependencies": ["modify-max-workers"]}
    ]
  }
}
```

Status: **completed**

</details>
</tool-use>
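
For the first todo, a minimal sketch of what `get_available_ram()` could look like, assuming `psutil` as an optional dependency with a Windows `ctypes`/`GlobalMemoryStatusEx` fallback, as the plan describes. Names and behaviour are a sketch, not the final implementation:

```python
import ctypes
import sys

def get_available_ram() -> int | None:
    """Best-effort available RAM in bytes; None if it cannot be determined."""
    try:
        import psutil  # optional dependency
        return int(psutil.virtual_memory().available)
    except ImportError:
        pass
    if sys.platform == "win32":
        class MEMORYSTATUSEX(ctypes.Structure):
            _fields_ = [
                ("dwLength", ctypes.c_ulong),
                ("dwMemoryLoad", ctypes.c_ulong),
                ("ullTotalPhys", ctypes.c_ulonglong),
                ("ullAvailPhys", ctypes.c_ulonglong),
                ("ullTotalPageFile", ctypes.c_ulonglong),
                ("ullAvailPageFile", ctypes.c_ulonglong),
                ("ullTotalVirtual", ctypes.c_ulonglong),
                ("ullAvailVirtual", ctypes.c_ulonglong),
                ("ullAvailExtendedVirtual", ctypes.c_ulonglong),
            ]
        status = MEMORYSTATUSEX()
        status.dwLength = ctypes.sizeof(MEMORYSTATUSEX)
        if ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(status)):
            return int(status.ullAvailPhys)
    return None  # caller falls back to CPU-only limits
```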

---

_**User (2025-11-11 01:50Z)**_

Implement the plan as specified, it is attached for your reference. Do NOT edit the plan file itself.

To-do's from the plan have already been created. Do not create them again. Mark them as in_progress as you work, starting with the first one. Don't stop until you have completed all the to-dos.

---

_**Agent (model default, mode Agent)**_

---

Reviewing the current code structure, then implementing the plan step by step.

---

<tool-use data-tool-type="read" data-tool-name="read_file">
<details><summary>Tool use: **read_file** • Read file: zip_sequences.py</summary>
</details>
</tool-use>

---

<tool-use data-tool-type="generic" data-tool-name="todo_write">
<details><summary>Tool use: **todo_write** • Todo List</summary>

- [-] Add get_available_ram() function with psutil support and Windows fallback
- [ ] Add estimate_ram_per_job() function that estimates RAM based on folder size for 7z, conservative estimate for zip
- [ ] Modify max_workers() to accept work_items and calculate RAM-aware worker limits
- [ ] Update run_zip() to calculate work items first, then pass to max_workers() for RAM-aware limit
- [ ] Add verbose logging for RAM detection, estimation, and worker limit calculations

</details>
</tool-use>

---

Diff applied to `zip_sequences.py` in this commit:

```diff
@@ -122,13 +122,13 @@ def log(mode: str, message: str, *, verbose_only: bool = False, verbose: bool =
 
 
 def is_archive_path(path: Path) -> bool:
-    return any(part == "_archive" for part in path.parts)
+    return any(part in ("_archive", "_CURRENT") for part in path.parts)
 
 
 def find_sequence_dirs(root: Path) -> Iterator[Path]:
     for dirpath, dirnames, filenames in os.walk(root):
         path = Path(dirpath)
-        dirnames[:] = [d for d in dirnames if d != "_archive"]
+        dirnames[:] = [d for d in dirnames if d not in ("_archive", "_CURRENT")]
         if is_archive_path(path):
             continue
         has_frames = any(Path(dirpath, f).suffix.lower() in SEQUENCE_EXTENSIONS for f in filenames)
@@ -139,7 +139,7 @@ def find_sequence_dirs(root: Path) -> Iterator[Path]:
 def iter_sequence_files(seq_dir: Path) -> Iterator[Path]:
     for dirpath, dirnames, filenames in os.walk(seq_dir):
         path = Path(dirpath)
-        dirnames[:] = [d for d in dirnames if d != "_archive"]
+        dirnames[:] = [d for d in dirnames if d not in ("_archive", "_CURRENT")]
         if is_archive_path(path):
             continue
         for filename in filenames:
```