zipseq file type mix handling

---

_**User (2025-12-29 17:37Z)**_

does zipseq currently scan other types in the seq than png? e.g. jpg, exr

---

_**Agent (model default, mode Agent)**_

---

Checking which file types `zipseq` scans:

---

<tool-use data-tool-type="read" data-tool-name="read_file">

<details><summary>Tool use: **read_file** • Read file: zip_sequences.py</summary>

</details>

</tool-use>

---

<tool-use data-tool-type="read" data-tool-name="read_file">

<details><summary>Tool use: **read_file** • Read file: zip_sequences.py</summary>

</details>

</tool-use>

---
Yes. `zipseq` scans multiple file types, not just PNG.

```45:52:zip_sequences.py
SEQUENCE_EXTENSIONS = {
    ".png",
    ".jpg",
    ".jpeg",
    ".tif",
    ".tiff",
    ".exr",
}
```

The `find_sequence_dirs()` function uses this set to identify sequence directories:

```491:491:zip_sequences.py
has_frames = any(Path(dirpath, f).suffix.lower() in SEQUENCE_EXTENSIONS for f in filenames)
```

Supported extensions:

- `.png`
- `.jpg`
- `.jpeg`
- `.tif`
- `.tiff`
- `.exr`

All of these are included in archives and tracked in manifests.
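
As a quick illustration, here is a minimal, self-contained sketch of that check; `looks_like_sequence_dir` and the sample paths are hypothetical, but the suffix test mirrors the `has_frames` line above:

```python
from pathlib import Path

# Same extension set as zip_sequences.py
SEQUENCE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".exr"}

def looks_like_sequence_dir(dirpath: str, filenames: list[str]) -> bool:
    """Return True if any file in the listing has a sequence extension (case-insensitive)."""
    return any(Path(dirpath, f).suffix.lower() in SEQUENCE_EXTENSIONS for f in filenames)

print(looks_like_sequence_dir("shots/sh010", ["frame_0001.EXR", "notes.txt"]))  # True
print(looks_like_sequence_dir("shots/sh020", ["readme.md", "render.mp4"]))      # False
```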
---

_**User (2025-12-29 17:37Z)**_

if there's a sequence with both jpg and png (by mistake) will it scan and compress both?

---

_**Agent (model default, mode Agent)**_

---
Yes. If a sequence directory contains both JPG and PNG files, both will be scanned and compressed.

How it works:

1. `find_sequence_dirs()` checks if a directory has any files matching `SEQUENCE_EXTENSIONS` (`.png`, `.jpg`, `.jpeg`, `.tif`, `.tiff`, `.exr`). If it finds any, it treats the directory as a sequence directory.
2. Once identified as a sequence directory, `iter_sequence_files()` includes all files in that directory (except `Thumbs.db` and archive paths), regardless of extension:

```517:524:zip_sequences.py
def iter_sequence_files(seq_dir: Path) -> Iterator[Path]:
    for dirpath, dirnames, filenames in os.walk(seq_dir):
        path = Path(dirpath)
        dirnames[:] = [d for d in dirnames if d not in ("_archive", "_CURRENT")]
        if is_archive_path(path):
            continue
        for filename in filenames:
            # Ignore Thumbs.db files completely
            if filename.lower() == "thumbs.db":
                continue
            yield path / filename
```

So if a directory has both `frame_001.jpg` and `frame_001.png`, both will be:

- Included in the state manifest
- Compressed into the archive

The extension check only determines whether a directory is a sequence directory; it doesn't filter which files get included once it's identified as one.
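
Since `zipseq` itself won't filter the stray files out, a standalone check along these lines could flag an accidental mix before archiving (a sketch only; `image_extension_counts` and the `shots/sh010` path are hypothetical, not part of `zip_sequences.py`):

```python
from collections import Counter
from pathlib import Path

SEQUENCE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".exr"}

def image_extension_counts(seq_dir: Path) -> Counter:
    """Count image files per extension in a sequence directory (non-recursive sketch)."""
    return Counter(
        p.suffix.lower()
        for p in seq_dir.iterdir()
        if p.is_file() and p.suffix.lower() in SEQUENCE_EXTENSIONS
    )

counts = image_extension_counts(Path("shots/sh010"))
if len(counts) > 1:
    # e.g. Counter({".png": 240, ".jpg": 240}) -- zipseq would archive both sets
    print(f"Mixed sequence detected: {dict(counts)}")
```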
---

_**User (2025-12-29 17:37Z)**_

I had one with jpgs that I removed so only the pngs remain, but the manifest isn't changing.

---

_**Agent (model default, mode Agent)**_

---

Checking the state comparison logic to see why the manifest isn't updating after removing JPGs:

---

<tool-use data-tool-type="read" data-tool-name="read_file">

<details><summary>Tool use: **read_file** • Read file: zip_sequences.py</summary>

</details>

</tool-use>

---

<tool-use data-tool-type="search" data-tool-name="codebase_search">

<details>
<summary>Tool use: **codebase_search** • Searched codebase "How does the quick skip check work when directory mtime is older than archive?" • **14** results</summary>

| File | Lines |
|------|-------|
| `zip_sequences.py` | L981-1156 |
| `zip_sequences.py` | L979-1156 |
| `scripts_old/0MoveToCurrent.bat` | L61 |
| `zip_sequences.py` | L563-650 |
| `zip_sequences.py` | L471-513 |
| `zip_sequences.py` | L516-560 |
| `zip_sequences.py` | L1159-1286 |
| `zip_sequences.py` | L676-969 |
| `zip_sequences.py` | L861-969 |
| `zip_sequences.py` | L984-1286 |
| `UpdateSequences.ps1` | L54-96 |
| `zip_sequences.py` | L584-657 |
| `zip_sequences.py` | L861-976 |
| `zip_sequences.py` | L557-650 |

</details>

</tool-use>

---
The quick-skip optimization is likely the cause. It skips state computation if the directory mtime is <= the archive mtime, which can miss deletions when the directory mtime doesn't update (for example, when the removed files sat in a subfolder, since deleting them doesn't touch the top-level sequence directory's mtime). Checking the logic and making it more reliable:
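
For context, the current quick-skip boils down to a single mtime comparison, condensed here into a hypothetical helper (`archive_looks_current`); the real check is inlined in `run_zip`, as the diff below shows:

```python
from pathlib import Path

def archive_looks_current(seq_dir: Path, zip_path: Path) -> bool:
    """The existing quick-skip test, condensed: treat the sequence as unchanged
    if the directory is not newer than its archive."""
    return seq_dir.stat().st_mtime_ns <= zip_path.stat().st_mtime_ns

# Deleting files inside a nested subfolder typically does not bump seq_dir's own
# mtime, so this test can report "unchanged" even though the contents differ.
```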
---

<tool-use data-tool-type="read" data-tool-name="read_file">

<details><summary>Tool use: **read_file** • Read file: zip_sequences.py</summary>

</details>

</tool-use>

---
The quick-skip optimization can miss file deletions when the directory mtime doesn't update. Adding a pre-check: if any files from the stored state are missing, we'll recompute the state instead of skipping.

---

<tool-use data-tool-type="write" data-tool-name="search_replace">

<details><summary>Tool use: **search_replace** • Edit file: zip_sequences.py</summary>

**Chunk 1**

Lines added: 29, lines removed: 15

```diff
@@ -1030,19 +1030,33 @@
         if zip_path.exists() and stored_state is not None:
             # Quick check: if directory mtime is older than archive, likely unchanged
-            try:
-                dir_mtime = seq_dir.stat().st_mtime_ns
-                archive_mtime = zip_path.stat().st_mtime_ns
-                # If directory wasn't modified since archive was created, skip state computation
-                if dir_mtime <= archive_mtime:
-                    quick_skipped += 1
-                    if quick_skipped <= 5:
-                        log("scan", f"Skipping {rel} (unchanged since archive)")
-                    # Still need to check for old .zip cleanup (we have .7z, so .zip is obsolete)
-                    if old_zip_path and old_zip_path.exists():
-                        old_zip_path.unlink(missing_ok=True)
-                        old_state_path = state_path_for(old_zip_path)
-                        if old_state_path.exists():
-                            old_state_path.unlink(missing_ok=True)
-                    continue
+            # But first verify that all files in stored state still exist (catches deletions)
+            try:
+                # Quick check: verify stored files still exist (catches file deletions)
+                stored_files_exist = True
+                for entry in stored_state.get("files", []):
+                    # Skip Thumbs.db entries (they're filtered out anyway)
+                    if Path(entry.get("path", "")).name.lower() == "thumbs.db":
+                        continue
+                    file_path = seq_dir / entry["path"]
+                    if not file_path.exists():
+                        stored_files_exist = False
+                        break
+
+                # Only do mtime check if all stored files still exist
+                if stored_files_exist:
+                    dir_mtime = seq_dir.stat().st_mtime_ns
+                    archive_mtime = zip_path.stat().st_mtime_ns
+                    # If directory wasn't modified since archive was created, skip state computation
+                    if dir_mtime <= archive_mtime:
+                        quick_skipped += 1
+                        if quick_skipped <= 5:
+                            log("scan", f"Skipping {rel} (unchanged since archive)")
+                        # Still need to check for old .zip cleanup (we have .7z, so .zip is obsolete)
+                        if old_zip_path and old_zip_path.exists():
+                            old_zip_path.unlink(missing_ok=True)
+                            old_state_path = state_path_for(old_zip_path)
+                            if old_state_path.exists():
+                                old_state_path.unlink(missing_ok=True)
+                        continue
             except OSError:
                 # If stat fails, fall through to full state computation
```

</details>

</tool-use>

---
<tool-use data-tool-type="generic" data-tool-name="read_lints">

<details>
<summary>Tool use: **read_lints** • Read lints for 1 file</summary>

Lint paths:

- `zip_sequences.py`

Lint results

**No lint errors found**

</details>

</tool-use>

---
Fixed the issue. The quick-skip optimization was skipping state computation whenever the directory mtime was <= the archive mtime, which can miss file deletions if the directory mtime doesn't update.

**The fix:** Before the quick-skip, the code now verifies that every file in the stored state still exists. If any are missing (like the JPGs you removed), it bypasses the quick-skip and recomputes the full state, which detects the change and updates the manifest.

Now when you run `zipseq`:

1. It checks whether the files in the stored state still exist
2. If the JPGs are missing, it recomputes the state
3. It detects the difference and updates the manifest
4. It recompresses the archive with only the remaining PNGs

The manifest should now update correctly when files are removed, even if the directory mtime doesn't change.
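
Condensed into a standalone helper (the `stored_files_exist` function name is hypothetical; the actual change inlines this loop in `run_zip`), the new pre-check amounts to:

```python
from pathlib import Path

def stored_files_exist(seq_dir: Path, stored_state: dict) -> bool:
    """Return False if any file recorded in the stored state has been deleted."""
    for entry in stored_state.get("files", []):
        # Thumbs.db entries are ignored, matching the scanner's own filtering
        if Path(entry.get("path", "")).name.lower() == "thumbs.db":
            continue
        if not (seq_dir / entry["path"]).exists():
            return False
    return True

# Only when this returns True does the mtime-based quick-skip apply; otherwise the
# full state is recomputed, so removed JPGs are picked up even if the directory
# mtime never changed.
```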
---