Compare commits


3 Commits

| Author | SHA1 | Message | Date |
|--------|------|---------|------|
| Nathan | 6c20b2de52 | work config | 2025-11-12 09:49:40 -07:00 |
| Nathan | b7ae8880e7 | more 7zinst behavior modification | 2025-11-12 09:01:32 -07:00 |
| Nathan | 0da1dbd09a | HOME config | 2025-11-11 18:09:35 -07:00 |
3 changed files with 2712 additions and 49 deletions

View File

@@ -4560,6 +4560,300 @@ Lines added: 1, lines removed: 1
---
<tool-use data-tool-type="generic" data-tool-name="read_lints">
<details>
<summary>Tool use: **read_lints** • Read lints for 1 file</summary>
Lint paths:
- `zip_sequences.py`
Lint results
Lint errors for `zip_sequences.py`
| Message | Range | Severity |
|-------|-------|----------|
| Variable "source" is not accessed | 73:9-73:15 | DIAGNOSTIC_SEVERITY_HINT |
| "seq_dir" is not accessed | 186:26-186:33 | DIAGNOSTIC_SEVERITY_HINT |
| Variable "zip_path" is not accessed | 326:18-326:26 | DIAGNOSTIC_SEVERITY_HINT |
| Variable "state_path" is not accessed | 326:28-326:38 | DIAGNOSTIC_SEVERITY_HINT |
</details>
</tool-use>
---
---
Updated the code to use `0` for auto-calculate instead of `null`:
1. **DEFAULT_CONFIG**: Changed from `None` to `0`
2. **Logic**: `0` is treated the same as `None` (auto-calculate), so it triggers the RAM-based calculation
3. **config.json**: Set to `0`
Now:
- `"Max7zInst": 0` = auto-calculate based on RAM
- `"Max7zInst": 1` = limit to 1 instance
- `"Max7zInst": 2` = limit to 2 instances
- etc.
This makes the config clearer: `0` means automatic, and any positive number sets a hard limit.
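For illustration, a minimal sketch of how that mapping could be applied when choosing the worker cap (hypothetical helper name; the script's actual implementation may differ):

```python
# Hypothetical helper mirroring the Max7zInst semantics described above.
def resolve_worker_cap(max_7z_inst: int, ram_based_limit: int) -> int:
    """Return the concurrent-7z cap: 0 (or unset) means auto-calculate from RAM."""
    if max_7z_inst and max_7z_inst > 0:
        return max_7z_inst       # explicit hard limit from config.json
    return ram_based_limit       # 0 falls back to the RAM-based estimate

print(resolve_worker_cap(0, 3))  # -> 3 (auto-calculated)
print(resolve_worker_cap(1, 3))  # -> 1 (hard limit)
```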
---
_**User (2025-11-11 01:50Z)**_
I think we're just dealing with massive sequences. This is only 1 instance, so even if I had 2x the ram, I would only be able to afford 2!
---
_**Agent (model default, mode Agent)**_
---
Large sequences are using far more RAM than estimated. Making the dictionary size configurable and reducing the default to lower per-instance usage:
---
<tool-use data-tool-type="read" data-tool-name="read_file">
<details><summary>Tool use: **read_file** • Read file: zip_sequences.py</summary>
</details>
</tool-use>
---
<tool-use data-tool-type="read" data-tool-name="read_file">
<details><summary>Tool use: **read_file** • Read file: zip_sequences.py</summary>
</details>
</tool-use>
---
<tool-use data-tool-type="read" data-tool-name="read_file">
<details><summary>Tool use: **read_file** • Read file: config.json</summary>
</details>
</tool-use>
---
---
_**User (2025-11-11 01:50Z)**_
Nope, we're keeping the dictsize as large as possible. I just need to keep in mind my limitations on a per-project basis.
---
_**Agent (model default, mode Agent)**_
---
The current setup supports this. Keep the dictionary size at 1GB for best compression, and set `Max7zInst` per project in `.config/config.json` to limit concurrent instances based on your RAM.
For example:
- `"Max7zInst": 1` - Only 1 concurrent 7z instance (safest for large sequences)
- `"Max7zInst": 2` - 2 concurrent instances (if you have more RAM)
- `"Max7zInst": 0` - Auto-calculate (may still hit limits with very large sequences)
The dictionary size stays at 1GB for maximum compression, and you control concurrency per project. This lets you tune each project based on sequence sizes and available RAM.
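As a rough planning aid, here is a minimal sketch of how many 1GB-dictionary instances fit in a given RAM budget, assuming roughly 11x the dictionary size per LZMA instance (an assumed multiplier, not a figure read from the script):

```python
# Rough estimate of how many 1GB-dictionary 7z instances fit in a RAM budget.
# The ~11x multiplier is an assumption about LZMA compression overhead.
def instances_for_budget(budget_gb: float, dict_gb: float = 1.0, factor: float = 11.0) -> int:
    per_instance_gb = dict_gb * factor            # ~11 GB per 1 GB dictionary
    return max(1, int(budget_gb // per_instance_gb))

print(instances_for_budget(30.0))  # ~30 GB budget -> 2 concurrent instances
```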
---
_**User (2025-11-11 06:13Z)**_
this is on another system. ignore the 7z PATH error. Are there any other relevant errors?
& "R:\Creative\artsy\maya\2025-02 BondingArtifact\Renders\ZipSeqArchv.bat"
[zip] RAM: 76.7GB available (80% of total), 30.7GB for compression (40%), 8.0GB per job (dict: 1024MB) → RAM limit: 3, requested: None, final: 3
[zip] Renders\0_GaslightRender\2024-11-23\GaslightRender_smol -> Renders\_zipped\0_GaslightRender\2024-11-23\GaslightRender_smol.7z
[zip] Renders\3_background\2025-01-13\background -> Renders\_zipped\3_background\2025-01-13\background.7z
[zip] Renders\3_background\2025-10-26\3_background_dark -> Renders\_zipped\3_background\2025-10-26\3_background_dark.7z
[zip] Renders\3_background\2025-10-30\3_background_dark -> Renders\_zipped\3_background\2025-10-30\3_background_dark.7z
[zip] Renders\4_StartupScreen\2025-02-05\4_StartupScreen_evensrenumbered -> Renders\_zipped\4_StartupScreen\2025-02-05\4_StartupScreen_evensrenumbered.7z
[zip] Renders\4_StartupScreen\2025-02-08\Shot2Redo_1024 -> Renders\_zipped\4_StartupScreen\2025-02-08\Shot2Redo_1024.7z
[zip] Renders\4_StartupScreen\2025-02-08\Shot3Redo_128 -> Renders\_zipped\4_StartupScreen\2025-02-08\Shot3Redo_128.7z
[zip] Renders\4_StartupScreen\2025-02-09\Shot1 -> Renders\_zipped\4_StartupScreen\2025-02-09\Shot1.7z
[zip] Renders\4_StartupScreen\2025-02-10\Shot2 -> Renders\_zipped\4_StartupScreen\2025-02-10\Shot2.7z
[zip] Renders\4_StartupScreen\2025-02-12\Shot3 -> Renders\_zipped\4_StartupScreen\2025-02-12\Shot3.7z
[zip] Renders\4_StartupScreen\2025-02-13\Shot6 -> Renders\_zipped\4_StartupScreen\2025-02-13\Shot6.7z
[zip] Renders\4_StartupScreen\2025-02-14\Shot8 -> Renders\_zipped\4_StartupScreen\2025-02-14\Shot8.7z
[zip] Renders\4_StartupScreen\2025-02-19\StartupScreen_finalfix -> Renders\_zipped\4_StartupScreen\2025-02-19\StartupScreen_finalfix.7z
[zip] Renders\4_StartupScreen\2025-10-21\Shot4 -> Renders\_zipped\4_StartupScreen\2025-10-21\Shot4.7z
[zip] Renders\4_StartupScreen\2025-10-24\4_StartupScreen_dark -> Renders\_zipped\4_StartupScreen\2025-10-24\4_StartupScreen_dark.7z
[zip] Renders\4_StartupScreen\2025-10-24\Shot1_dark -> Renders\_zipped\4_StartupScreen\2025-10-24\Shot1_dark.7z
[zip] Renders\4_StartupScreen\2025-10-24\Shot2_dark -> Renders\_zipped\4_StartupScreen\2025-10-24\Shot2_dark.7z
[zip] Renders\4_StartupScreen\2025-10-24\Shot3_dark -> Renders\_zipped\4_StartupScreen\2025-10-24\Shot3_dark.7z
[zip] Renders\4_StartupScreen\2025-10-25\Shot1_dark -> Renders\_zipped\4_StartupScreen\2025-10-25\Shot1_dark.7z
[zip] Renders\4_StartupScreen\2025-10-25\Shot3_dark -> Renders\_zipped\4_StartupScreen\2025-10-25\Shot3_dark.7z
[zip] Renders\4_StartupScreen\2025-11-02\Shot1_dark -> Renders\_zipped\4_StartupScreen\2025-11-02\Shot1_dark.7z
[zip] Renders\4_StartupScreen\2025-11-03\Shot1_dark -> Renders\_zipped\4_StartupScreen\2025-11-03\Shot1_dark.7z
[zip] Renders\4_StartupScreen\2025-11-04\4_StartupScreen_dark -> Renders\_zipped\4_StartupScreen\2025-11-04\4_StartupScreen_dark.7z
[zip] Renders\4_StartupScreen\2025-11-04\Shot2_dark -> Renders\_zipped\4_StartupScreen\2025-11-04\Shot2_dark.7z
[zip] Renders\4_StartupScreen\2025-11-04\Shot3_dark -> Renders\_zipped\4_StartupScreen\2025-11-04\Shot3_dark.7z
[zip] Renders\4_StartupScreen\2025-11-07\4_StartupScreen_dark -> Renders\_zipped\4_StartupScreen\2025-11-07\4_StartupScreen_dark.7z
[zip] Renders\4_StartupScreen\2025-11-07\Shot2_dark -> Renders\_zipped\4_StartupScreen\2025-11-07\Shot2_dark.7z
[zip] Renders\4_StartupScreen\2025-11-07\Shot3_dark -> Renders\_zipped\4_StartupScreen\2025-11-07\Shot3_dark.7z
[zip] Renders\5_goodbye\2025-02-23\219_goodbye_RR1 -> Renders\_zipped\5_goodbye\2025-02-23\219_goodbye_RR1.7z
[zip] Renders\5_goodbye\2025-02-23\219_goodbye_RR1_evensrenamed -> Renders\_zipped\5_goodbye\2025-02-23\219_goodbye_RR1_evensrenamed.7z
[zip] Renders\5_goodbye\2025-02-23\219_goodbye_RR2 -> Renders\_zipped\5_goodbye\2025-02-23\219_goodbye_RR2.7z
[zip] Renders\5_goodbye\2025-02-23\219_goodbye_RR2_evensrenamed -> Renders\_zipped\5_goodbye\2025-02-23\219_goodbye_RR2_evensrenamed.7z
[zip] Renders\5_goodbye\2025-02-23\219_goodbye_RR3 -> Renders\_zipped\5_goodbye\2025-02-23\219_goodbye_RR3.7z
[zip] Renders\5_goodbye\2025-02-23\goodbye -> Renders\_zipped\5_goodbye\2025-02-23\goodbye.7z
[zip] Renders\5_goodbye\2025-02-23\goodbye_test -> Renders\_zipped\5_goodbye\2025-02-23\goodbye_test.7z
[zip] Renders\4_StartupScreen\2025-02-08\Shot1Redo_1024 -> Renders\_zipped\4_StartupScreen\2025-02-08\Shot1Redo_1024.7z
Sequence sync failed: 7z compression requested but 7z executable not found in PATH. Please install 7z (e.g., via Chocolatey: choco install 7zip) or set zipper to 'zip' in config.json
Traceback (most recent call last):
File "R:\Creative\artsy\maya\0 ProjectStructure\zip_sequences.py", line 841, in <module>
raise SystemExit(main())
~~~~^^
File "R:\Creative\artsy\maya\0 ProjectStructure\zip_sequences.py", line 835, in main
updated = run_zip(args.jobs, verbose=args.verbose)
File "R:\Creative\artsy\maya\0 ProjectStructure\zip_sequences.py", line 719, in run_zip
updated_paths.extend(future.result())
~~~~~~~~~~~~~^^
File "C:\Python313\Lib\concurrent\futures\_base.py", line 449, in result
return self.__get_result()
~~~~~~~~~~~~~~~~~^^
File "C:\Python313\Lib\concurrent\futures\_base.py", line 401, in __get_result
raise self._exception
File "C:\Python313\Lib\concurrent\futures\thread.py", line 59, in run
result = self.fn(*self.args, **self.kwargs)
File "R:\Creative\artsy\maya\0 ProjectStructure\zip_sequences.py", line 628, in process_zip
zip_sequence(seq_dir, zip_path, per_job_memory_limit)
~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "R:\Creative\artsy\maya\0 ProjectStructure\zip_sequences.py", line 444, in zip_sequence
raise RuntimeError(
...<3 lines>...
)
RuntimeError: 7z compression requested but 7z executable not found in PATH. Please install 7z (e.g., via Chocolatey: choco install 7zip) or set zipper to 'zip' in config.json
. was unexpected at this time.
---
_**Agent (model default, mode Agent)**_
---
The only other error is the batch syntax error at the end: `. was unexpected at this time.`
This comes from Windows cmd, not from the Python script; it most likely originates in the batch file that launches it. Checking the batch file:
---
<tool-use data-tool-type="generic" data-tool-name="glob_file_search">
<details>
<summary>Tool use: **glob_file_search** • Searched codebase "ZipSeqArchv.bat" • **1** result in **1** directory</summary>
Directory: **d:\0 ProjectStructure** (1 file)
| File |
|------|
| `ZipSeqArchv.bat` |
</details>
</tool-use>
---
---
<tool-use data-tool-type="read" data-tool-name="read_file">
<details><summary>Tool use: **read_file** • Read file: ZipSeqArchv.bat</summary>
</details>
</tool-use>
---
---
@@ -4569,6 +4863,41 @@ Lines added: 1, lines removed: 1
---
The batch file itself looks fine. The `. was unexpected at this time.` message is most likely a side effect of how the batch file handles the non-zero exit code from Python.
The Python script is working correctly:
- RAM detection: 76.7GB available
- Worker calculation: 3 workers (RAM limit)
- Processing sequences as expected
- Only fails when trying to run 7z (PATH issue, which you're ignoring)
So the script logic appears correct; the only real issue is the missing 7z executable.
To verify, you could temporarily set `"zipper": "zip"` in the config to test without 7z, but otherwise the script is functioning as expected.
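For reference, a minimal sketch of the kind of PATH check behind that error (an assumed check, not the script's exact code):

```python
import shutil

# Assumed pre-flight check: fail early when 7z is requested but not on PATH,
# mirroring the RuntimeError text in the traceback above.
def pick_zipper(configured: str) -> str:
    if configured == "7z" and shutil.which("7z") is None:
        raise RuntimeError(
            "7z compression requested but 7z executable not found in PATH. "
            "Install 7-Zip or set zipper to 'zip' in config.json"
        )
    return configured
```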
---
_**User (2025-11-11 06:13Z)**_
threadripper epic
---
_**Agent (model default, mode Agent)**_
---
Threadripper system with ~96GB RAM. The script is using it well:
- **76.7GB available** (80% of total)
- **3 workers** calculated from RAM (30.7GB / 8GB per job)
- **36 sequences** queued, compressed 3 at a time
- CPU at **86%** utilization
The RAM-based worker calculation is working as intended. With that much RAM, you can run multiple 7z instances in parallel while keeping the 1GB dictionary size for maximum compression. The script should handle large sequences efficiently on this system.
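The 3-worker figure follows directly from the numbers in the log line; a quick sketch of that arithmetic (constants copied from the log output, not from the script):

```python
# Reproduce the worker count from the log:
# "RAM: 76.7GB available (80% of total), 30.7GB for compression (40%),
#  8.0GB per job (dict: 1024MB) -> RAM limit: 3"
available_gb = 76.7                     # 80% of ~96 GB total RAM
compression_gb = available_gb * 0.40    # ~30.7 GB reserved for compression
per_job_gb = 8.0                        # estimated RAM per 7z job (1 GB dict)
workers = max(1, int(compression_gb // per_job_gb))
print(workers)  # -> 3
```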
---

File diff suppressed because one or more lines are too long

View File

@@ -58,6 +58,12 @@ DEFAULT_CONFIG = {
} }
def log(mode: str, message: str, *, verbose_only: bool = False, verbose: bool = False) -> None:
if verbose_only and not verbose:
return
print(f"[{mode}] {message}", flush=True)
def load_config() -> dict: def load_config() -> dict:
# First try to load from project's .config folder (current working directory) # First try to load from project's .config folder (current working directory)
# Then fall back to ProjectStructure repo config (next to zip_sequences.py) # Then fall back to ProjectStructure repo config (next to zip_sequences.py)
@@ -70,22 +76,28 @@ def load_config() -> dict:
("repo", repo_config), ("repo", repo_config),
] ]
log("init", "Loading configuration sources...")
for source, config_path in config_paths: for source, config_path in config_paths:
try: try:
if config_path.exists(): if config_path.exists():
log("init", f"Reading {source} config at {config_path}")
text = config_path.read_text(encoding="utf-8") text = config_path.read_text(encoding="utf-8")
try: try:
data = json.loads(text) data = json.loads(text)
if isinstance(data, dict): if isinstance(data, dict):
merged = DEFAULT_CONFIG.copy() merged = DEFAULT_CONFIG.copy()
merged.update(data) merged.update(data)
log("init", f"Configuration loaded from {source}")
return merged return merged
except json.JSONDecodeError: except json.JSONDecodeError:
log("init", f"Config file at {config_path} is invalid JSON; skipping")
continue continue
except OSError: except OSError:
log("init", f"Unable to read config at {config_path}; skipping")
continue continue
# If no config found, return defaults # If no config found, return defaults
log("init", "No config files found; using default settings")
return DEFAULT_CONFIG.copy() return DEFAULT_CONFIG.copy()
@@ -197,22 +209,32 @@ def estimate_ram_per_job(seq_dir: Path, seq_state: dict) -> int:
total_bytes = sum(entry.get("size", 0) for entry in seq_state.get("files", [])) total_bytes = sum(entry.get("size", 0) for entry in seq_state.get("files", []))
if ZIPPER_TYPE == "7z": if ZIPPER_TYPE == "7z":
# Base RAM: 500MB per job # Fixed dictionary size: 1GB (1024MB)
base_ram = 500 * 1024 * 1024 # 500 MB FIXED_DICT_SIZE_MB = 1024
FIXED_DICT_SIZE_BYTES = FIXED_DICT_SIZE_MB * 1024 * 1024
# Compression factor: 7z can use significant RAM, especially for large files # Dictionary RAM: ~11x dictionary size (1GB dict = 11GB RAM for dictionary operations)
# Use 0.15x factor (conservative estimate accounting for 7z's 80% usage) # This is the main memory consumer for 7z LZMA compression
compression_factor = 0.15 dict_ram = FIXED_DICT_SIZE_BYTES * 11 # 11GB for 1GB dictionary
# For very large folders (>10GB), cap at 8GB per job # Input buffer: 7z processes in chunks, but for very large sequences needs more RAM
max_ram_per_job = 8 * 1024 * 1024 * 1024 # 8 GB # For sequences >100GB, use a larger buffer factor to handle file metadata and processing
large_folder_threshold = 10 * 1024 * 1024 * 1024 # 10 GB if total_bytes > 500 * 1024 * 1024 * 1024: # >500GB (extremely large)
# Extremely large sequences: use 8% of size, capped at 64GB
if total_bytes > large_folder_threshold: # This accounts for file metadata, directory structures, and processing overhead
estimated_ram = max_ram_per_job input_buffer = min(int(total_bytes * 0.08), 64 * 1024 * 1024 * 1024)
overhead = 8 * 1024 * 1024 * 1024 # 8GB overhead for extremely large sequences
elif total_bytes > 100 * 1024 * 1024 * 1024: # >100GB
# Very large sequences: use 5% of size, capped at 32GB
input_buffer = min(int(total_bytes * 0.05), 32 * 1024 * 1024 * 1024)
overhead = 4 * 1024 * 1024 * 1024 # 4GB for large sequences
else: else:
estimated_ram = max(base_ram, int(total_bytes * compression_factor)) # Smaller sequences: use 15% of size, capped at 2GB
estimated_ram = min(estimated_ram, max_ram_per_job) input_buffer = min(int(total_bytes * 0.15), 2 * 1024 * 1024 * 1024)
overhead = 1 * 1024 * 1024 * 1024 # 1GB for smaller sequences
# Total RAM estimate
estimated_ram = dict_ram + input_buffer + overhead
return estimated_ram return estimated_ram
else: else:
@@ -245,7 +267,7 @@ def max_workers(
# If no work items provided, return CPU-based limit # If no work items provided, return CPU-based limit
if work_items is None or len(work_items) == 0: if work_items is None or len(work_items) == 0:
return (cpu_limit, None) return (cpu_limit, {})
# Try to calculate RAM-based limit # Try to calculate RAM-based limit
available_ram = get_available_ram() available_ram = get_available_ram()
@@ -254,7 +276,7 @@ def max_workers(
# RAM detection failed, fall back to CPU limit # RAM detection failed, fall back to CPU limit
if verbose: if verbose:
log("zip", "RAM detection failed, using CPU-based worker limit", verbose_only=True, verbose=verbose) log("zip", "RAM detection failed, using CPU-based worker limit", verbose_only=True, verbose=verbose)
return (cpu_limit, None) return (cpu_limit, {})
# For 7z: use fixed dictionary size and calculate workers # For 7z: use fixed dictionary size and calculate workers
if ZIPPER_TYPE == "7z": if ZIPPER_TYPE == "7z":
@@ -272,6 +294,12 @@ def max_workers(
if requested and requested > 0: if requested and requested > 0:
final_limit = min(final_limit, requested) final_limit = min(final_limit, requested)
# Create RAM limits dict (all use fixed dict size, but return as dict for consistency)
ram_limits_dict: dict[Path, int] = {}
if work_items:
for seq_dir, _, _, _ in work_items:
ram_limits_dict[seq_dir] = fixed_dict_size_bytes
if verbose: if verbose:
log( log(
"zip", "zip",
@@ -281,45 +309,97 @@ def max_workers(
verbose=verbose verbose=verbose
) )
return (final_limit, fixed_dict_size_bytes) return (final_limit, ram_limits_dict)
# Auto-calculate based on RAM if Max7zInst not configured # Auto-calculate based on RAM if Max7zInst not configured
# "Balls-to-the-walls" mode: use maximum resources # Use bin-packing algorithm with size-aware RAM estimation
if available_ram is not None: if available_ram is not None:
# 7z uses ~2-3x dictionary size in RAM, use 3x for aggressive mode
# This is more realistic and allows more concurrent workers
FIXED_RAM_PER_JOB = FIXED_DICT_SIZE_MB * 3 * 1024 * 1024 # 3GB per job
# available_ram is already 80% of total (20% reserved for system) # available_ram is already 80% of total (20% reserved for system)
# Use 95% of available RAM for compression jobs (aggressive mode) # Use 95% of available RAM for compression jobs
compression_ram = int(available_ram * 0.95) compression_ram = int(available_ram * 0.95)
# Calculate worker limit based on fixed per-job RAM # Estimate RAM for each work item and create list with (seq_dir, estimated_ram, work_item)
ram_limit = max(1, compression_ram // FIXED_RAM_PER_JOB) work_items_with_ram: list[tuple[Path, int, tuple[Path, Path, Path, dict]]] = []
ram_limits_dict: dict[Path, int] = {}
for seq_dir, zip_path, state_path, seq_state in work_items:
try:
estimated_ram = estimate_ram_per_job(seq_dir, seq_state)
work_items_with_ram.append((seq_dir, estimated_ram, (seq_dir, zip_path, state_path, seq_state)))
ram_limits_dict[seq_dir] = estimated_ram
except Exception:
# If estimation fails, use a safe default (12GB minimum for 1GB dict)
default_ram = 12 * 1024 * 1024 * 1024 # 12GB
work_items_with_ram.append((seq_dir, default_ram, (seq_dir, zip_path, state_path, seq_state)))
ram_limits_dict[seq_dir] = default_ram
# Sort by estimated RAM (largest first) for bin-packing
work_items_with_ram.sort(key=lambda x: x[1], reverse=True)
# Bin-packing algorithm: pack largest items first
bins: list[list[tuple[Path, int, tuple[Path, Path, Path, dict]]]] = []
bin_remaining: list[int] = []
for seq_dir, estimated_ram, work_item in work_items_with_ram:
# Try to fit in existing bin
placed = False
for i, remaining in enumerate(bin_remaining):
if remaining >= estimated_ram:
bins[i].append((seq_dir, estimated_ram, work_item))
bin_remaining[i] -= estimated_ram
placed = True
break
# If doesn't fit, create new bin
if not placed:
bins.append([(seq_dir, estimated_ram, work_item)])
bin_remaining.append(compression_ram - estimated_ram)
# Worker count is number of bins
worker_count = len(bins)
# Cap at number of actual work items (can't have more workers than jobs) # Cap at number of actual work items (can't have more workers than jobs)
num_work_items = len(work_items) if work_items else 0 num_work_items = len(work_items) if work_items else 0
if num_work_items > 0: if num_work_items > 0:
ram_limit = min(ram_limit, num_work_items) worker_count = min(worker_count, num_work_items)
# Use RAM limit directly (no CPU limit) # Respect user's --jobs if provided
final_limit = ram_limit
if requested and requested > 0: if requested and requested > 0:
final_limit = min(final_limit, requested) worker_count = min(worker_count, requested)
if verbose: if verbose:
ram_gb = available_ram / (1024 ** 3) ram_gb = available_ram / (1024 ** 3)
compression_ram_gb = compression_ram / (1024 ** 3) compression_ram_gb = compression_ram / (1024 ** 3)
ram_per_job_gb = FIXED_RAM_PER_JOB / (1024 ** 3) total_estimated_gb = sum(ram for _, ram, _ in work_items_with_ram) / (1024 ** 3)
log( log(
"zip", "zip",
f"RAM: {ram_gb:.1f}GB available (80% of total), {compression_ram_gb:.1f}GB for compression (95%), {ram_per_job_gb:.1f}GB per job (dict: {FIXED_DICT_SIZE_MB}MB) → " f"RAM: {ram_gb:.1f}GB available (80% of total), {compression_ram_gb:.1f}GB for compression (95%)",
f"RAM limit: {ram_limit}, work items: {num_work_items}, requested: {requested}, final: {final_limit}", verbose_only=True,
verbose=verbose
)
log(
"zip",
f"Estimated RAM per sequence: {total_estimated_gb:.1f}GB total across {num_work_items} sequences",
verbose_only=True,
verbose=verbose
)
if len(bins) > 0:
bin_sizes = [sum(ram for _, ram, _ in bin_items) / (1024 ** 3) for bin_items in bins]
log(
"zip",
f"Bin-packing: {worker_count} workers, bin sizes: {[f'{s:.1f}GB' for s in bin_sizes[:5]]}{'...' if len(bin_sizes) > 5 else ''}",
verbose_only=True,
verbose=verbose
)
log(
"zip",
f"Final worker count: {worker_count} (requested: {requested})",
verbose_only=True, verbose_only=True,
verbose=verbose verbose=verbose
) )
return (final_limit, fixed_dict_size_bytes) # Return worker count and RAM limits dict
return (worker_count, ram_limits_dict)
# RAM detection failed, use a safe default (no CPU limit) # RAM detection failed, use a safe default (no CPU limit)
default_limit = 4 default_limit = 4
@@ -328,6 +408,14 @@ def max_workers(
default_limit = min(default_limit, num_work_items) default_limit = min(default_limit, num_work_items)
if requested and requested > 0: if requested and requested > 0:
default_limit = min(default_limit, requested) default_limit = min(default_limit, requested)
# Create RAM limits dict with safe defaults (12GB per job for 1GB dict)
ram_limits_dict: dict[Path, int] = {}
if work_items:
default_ram = 12 * 1024 * 1024 * 1024 # 12GB default
for seq_dir, _, _, _ in work_items:
ram_limits_dict[seq_dir] = default_ram
if verbose: if verbose:
log( log(
"zip", "zip",
@@ -336,21 +424,25 @@ def max_workers(
verbose_only=True, verbose_only=True,
verbose=verbose verbose=verbose
) )
return (default_limit, fixed_dict_size_bytes) return (default_limit, ram_limits_dict)
# For zip compression, use existing estimation-based approach # For zip compression, use existing estimation-based approach
# Estimate RAM per job for each work item # Estimate RAM per job for each work item
ram_estimates = [] ram_estimates = []
ram_limits_dict: dict[Path, int] = {}
for seq_dir, zip_path, state_path, seq_state in work_items: for seq_dir, zip_path, state_path, seq_state in work_items:
try: try:
estimated_ram = estimate_ram_per_job(seq_dir, seq_state) estimated_ram = estimate_ram_per_job(seq_dir, seq_state)
ram_estimates.append(estimated_ram) ram_estimates.append(estimated_ram)
ram_limits_dict[seq_dir] = estimated_ram
except Exception: except Exception:
# If estimation fails, use fallback estimate # If estimation fails, use fallback estimate
ram_estimates.append(1024 * 1024 * 1024) # 1GB fallback for zip fallback_ram = 1024 * 1024 * 1024 # 1GB fallback for zip
ram_estimates.append(fallback_ram)
ram_limits_dict[seq_dir] = fallback_ram
if not ram_estimates: if not ram_estimates:
return (cpu_limit, None) return (cpu_limit, {})
max_ram_per_job = max(ram_estimates) max_ram_per_job = max(ram_estimates)
ram_limit = max(1, available_ram // max_ram_per_job) ram_limit = max(1, available_ram // max_ram_per_job)
@@ -368,13 +460,7 @@ def max_workers(
verbose=verbose verbose=verbose
) )
return (final_limit, None) return (final_limit, ram_limits_dict)
def log(mode: str, message: str, *, verbose_only: bool = False, verbose: bool = False) -> None:
if verbose_only and not verbose:
return
print(f"[{mode}] {message}")
def is_archive_path(path: Path) -> bool: def is_archive_path(path: Path) -> bool:
@@ -516,11 +602,11 @@ def zip_sequence(seq_dir: Path, zip_path: Path, per_job_memory_limit: int | None
"-t7z", # Use 7z format, not zip "-t7z", # Use 7z format, not zip
] ]
# Add fixed dictionary size if specified (7z memory usage is controlled by dictionary size) # Always use fixed dictionary size: 1GB (1024MB)
if per_job_memory_limit is not None: # The per_job_memory_limit parameter is the estimated RAM usage (for logging/info only)
# per_job_memory_limit is actually the fixed dictionary size in bytes # We keep dictionary at 1GB for best compression regardless of RAM estimate
dict_size_mb = per_job_memory_limit // (1024 * 1024) FIXED_DICT_SIZE_MB = 1024
cmd.append(f"-md={dict_size_mb}m") cmd.append(f"-md={FIXED_DICT_SIZE_MB}m")
cmd.extend([ cmd.extend([
str(temp_zip_abs), str(temp_zip_abs),
@@ -655,9 +741,20 @@ def process_expand(zip_path: Path, state: dict, *, verbose: bool) -> None:
def run_zip(requested_workers: int | None, *, verbose: bool) -> int: def run_zip(requested_workers: int | None, *, verbose: bool) -> int:
work_items: list[tuple[Path, Path, Path, dict]] = [] work_items: list[tuple[Path, Path, Path, dict]] = []
log("init", f"Scanning sequences under {RENDER_ROOT.resolve()}")
total_scanned = 0
quick_skipped = 0
state_skipped = 0
empty_dirs = 0
queued = 0
if RENDER_ROOT.exists(): if RENDER_ROOT.exists():
for seq_dir in find_sequence_dirs(RENDER_ROOT): for seq_dir in find_sequence_dirs(RENDER_ROOT):
total_scanned += 1
rel = seq_dir.relative_to(RENDER_ROOT)
if total_scanned <= 5 or total_scanned % 10 == 0:
log("scan", f"[{total_scanned}] Inspecting {rel}")
# Get the target archive path (will be .7z if ZIPPER_TYPE is "7z") # Get the target archive path (will be .7z if ZIPPER_TYPE is "7z")
zip_path = archive_path_for(seq_dir) zip_path = archive_path_for(seq_dir)
state_path = state_path_for(zip_path) state_path = state_path_for(zip_path)
@@ -685,6 +782,9 @@ def run_zip(requested_workers: int | None, *, verbose: bool) -> int:
archive_mtime = zip_path.stat().st_mtime_ns archive_mtime = zip_path.stat().st_mtime_ns
# If directory wasn't modified since archive was created, skip state computation # If directory wasn't modified since archive was created, skip state computation
if dir_mtime <= archive_mtime: if dir_mtime <= archive_mtime:
quick_skipped += 1
if quick_skipped <= 5:
log("scan", f"Skipping {rel} (unchanged since archive)")
# Still need to check for old .zip cleanup (we have .7z, so .zip is obsolete) # Still need to check for old .zip cleanup (we have .7z, so .zip is obsolete)
if old_zip_path and old_zip_path.exists(): if old_zip_path and old_zip_path.exists():
old_zip_path.unlink(missing_ok=True) old_zip_path.unlink(missing_ok=True)
@@ -699,11 +799,17 @@ def run_zip(requested_workers: int | None, *, verbose: bool) -> int:
# Compute current state only if we need to # Compute current state only if we need to
seq_state = compute_state(seq_dir) seq_state = compute_state(seq_dir)
if not seq_state["files"]: if not seq_state["files"]:
empty_dirs += 1
if empty_dirs <= 5:
log("scan", f"{rel} has no files; skipping")
continue continue
# Check if state changed # Check if state changed
if stored_state is not None and not state_changed(seq_state, stored_state): if stored_state is not None and not state_changed(seq_state, stored_state):
# Metadata matches stored state # Metadata matches stored state
state_skipped += 1
if state_skipped <= 5:
log("scan", f"{rel} metadata unchanged; archive up to date")
if zip_path.exists(): if zip_path.exists():
# .7z exists and is up to date, clean up old .zip if it exists # .7z exists and is up to date, clean up old .zip if it exists
if old_zip_path and old_zip_path.exists(): if old_zip_path and old_zip_path.exists():
@@ -720,16 +826,30 @@ def run_zip(requested_workers: int | None, *, verbose: bool) -> int:
continue continue
work_items.append((seq_dir, zip_path, state_path, seq_state)) work_items.append((seq_dir, zip_path, state_path, seq_state))
queued += 1
if queued <= 5 or queued % 5 == 0:
total_bytes = sum(entry.get("size", 0) for entry in seq_state.get("files", []))
size_gb = total_bytes / (1024 ** 3)
log("scan", f"Queued {rel} for compression (~{size_gb:.2f}GB) [{queued} total]")
if not work_items: if not work_items:
if not RENDER_ROOT.exists(): if not RENDER_ROOT.exists():
log("zip", "Render root 'Renders' not found; nothing to zip.") log("zip", "Render root 'Renders' not found; nothing to zip.")
else: else:
log("zip", "Archives already up to date; no sequences needed zipping.") log("zip", "Archives already up to date; no sequences needed zipping.")
log(
"scan",
f"Summary: scanned {total_scanned}, quick-skipped {quick_skipped}, "
f"state-skipped {state_skipped}, empty {empty_dirs}, queued {queued}",
)
return 0 return 0
# Calculate RAM-aware worker count based on work items # Calculate RAM-aware worker count based on work items
worker_count, per_job_memory_limit = max_workers(requested_workers, work_items, verbose=verbose) worker_count, ram_limits_dict = max_workers(requested_workers, work_items, verbose=verbose)
log(
"init",
f"Preparing to compress {len(work_items)} sequence(s) with {worker_count} worker(s)",
)
updated_paths: list[Path] = [] updated_paths: list[Path] = []
@@ -738,7 +858,7 @@ def run_zip(requested_workers: int | None, *, verbose: bool) -> int:
with ThreadPoolExecutor(max_workers=worker_count) as executor: with ThreadPoolExecutor(max_workers=worker_count) as executor:
future_map = { future_map = {
executor.submit(process_zip, seq_dir, zip_path, state_path, seq_state, per_job_memory_limit, verbose=verbose): seq_dir executor.submit(process_zip, seq_dir, zip_path, state_path, seq_state, ram_limits_dict.get(seq_dir), verbose=verbose): seq_dir
for seq_dir, zip_path, state_path, seq_state in work_items for seq_dir, zip_path, state_path, seq_state in work_items
} }
@@ -758,6 +878,11 @@ def run_zip(requested_workers: int | None, *, verbose: bool) -> int:
verbose_only=True, verbose_only=True,
verbose=verbose, verbose=verbose,
) )
log(
"scan",
f"Summary: scanned {total_scanned}, quick-skipped {quick_skipped}, "
f"state-skipped {state_skipped}, empty {empty_dirs}, queued {queued}",
)
removed = cleanup_orphan_archives(verbose=verbose) removed = cleanup_orphan_archives(verbose=verbose)
if removed: if removed:
@@ -851,6 +976,13 @@ def cleanup_orphan_archives(*, verbose: bool) -> int:
def main() -> int: def main() -> int:
args = parse_args() args = parse_args()
log("init", "zip_sequences starting up...")
log("init", f"Working directory: {Path.cwd()}")
log("init", f"Mode: {args.mode}, zipper: {ZIPPER_TYPE}, jobs arg: {args.jobs or 'auto'}")
if ZIPPER_TYPE == "7z":
exe = SEVEN_Z_EXE or "not found"
max_inst = MAX_7Z_INSTANCES if MAX_7Z_INSTANCES is not None else "auto"
log("init", f"7z executable: {exe}, Max7zInst: {max_inst}")
if args.mode == "expand": if args.mode == "expand":
# For expand mode, use simple CPU-based worker calculation # For expand mode, use simple CPU-based worker calculation