ProjectStructure/.specstory/history/2025-12-29_17-30Z-zipseq-thumbs-db-exclusion.md
2025-12-29 13:29:25 -07:00

Zipseq Thumbs.db exclusion (2025-12-29 17:30Z)

User (2025-12-29 17:37Z)

zipseq needs to completely ignore Thumbs.db. It should be removed from any existing manifests. If the manifest is changed, but only Thumbs.db is changed (removed) it should not be rezipped.


Agent (model default, mode Agent)


Examining zip_sequences.py to understand how it handles manifests and files.


Tool use: **read_file** • Read file: zip_sequences.py


Updating zip_sequences.py to ignore Thumbs.db files. Modifying file iteration, state loading, and state comparison.


Tool use: **read_file** • Read file: zip_sequences.py

Tool use: **read_file** • Read file: zip_sequences.py

Tool use: **read_file** • Read file: zip_sequences.py

Tool use: **read_file** • Read file: zip_sequences.py


Updating the code to ignore Thumbs.db files:


Tool use: **search_replace** • Edit file: zip_sequences.py

Chunk 1 Lines added: 3, lines removed: 0

@@ -522,4 +522,7 @@
              continue
          for filename in filenames:
+             # Ignore Thumbs.db files completely
+             if filename.lower() == "thumbs.db":
+                 continue
              yield path / filename
  

Tool use: **search_replace** • Edit file: zip_sequences.py

Chunk 1 Lines added: 6, lines removed: 0

@@ -567,4 +567,10 @@
      try:
          state = json.loads(state_path.read_text())
+         # Remove Thumbs.db entries from loaded state
+         if "files" in state:
+             state["files"] = [
+                 entry for entry in state.get("files", [])
+                 if Path(entry.get("path", "")).name.lower() != "thumbs.db"
+             ]
          # Normalize timestamps in loaded state to ensure consistency
          # This handles state files created before normalization was added

Tool use: **search_replace** • Edit file: zip_sequences.py

Chunk 1 Lines added: 4, lines removed: 1

@@ -597,7 +597,10 @@
      
      def normalize_state(state: dict) -> dict:
-         """Normalize timestamps in state to filesystem precision."""
+         """Normalize timestamps in state to filesystem precision and filter out Thumbs.db."""
          normalized = {"files": []}
          for entry in state.get("files", []):
+             # Ignore Thumbs.db files in state comparison
+             if Path(entry.get("path", "")).name.lower() == "thumbs.db":
+                 continue
              mtime_ns = entry["mtime_ns"]
              if is_windows:

Chunk 2 Lines added: 1, lines removed: 1

@@ -610,5 +613,5 @@
          return normalized
      
-     # Compare normalized states
+     # Compare normalized states (Thumbs.db already filtered out)
      normalized_seq = normalize_state(seq_state)
      normalized_stored = normalize_state(stored_state)


Checking for lint errors and verifying that zip_sequence uses the filtered file list:


Tool use: **read_lints** • Read lints for 1 file

Lint paths:

  • zip_sequences.py

Lint results

No lint errors found


Tool use: **read_file** • Read file: zip_sequences.py


Verifying that zip_sequence uses the filtered iterator:


Tool use: **read_file** • Read file: zip_sequences.py


Summary of Changes

  1. iter_sequence_files() — Skips Thumbs.db files during iteration, so they're never included in new states or archives.

  2. load_state() — Removes any Thumbs.db entries from existing manifests when loading state files.

  3. state_changed() — Filters out Thumbs.db entries from both states before comparison, so if the only change is removing Thumbs.db, it won't trigger a rezip.

Results:

  • Thumbs.db files are ignored during scanning
  • Existing manifests have Thumbs.db entries removed when loaded
  • If a manifest change is only due to Thumbs.db removal, it won't trigger recompression

The zip_sequence() function automatically benefits from these changes since it uses iter_sequence_files(), which now filters out Thumbs.db.

No linting errors were found. The changes are ready to use.
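
The filtering described above can be sketched in isolation. This is a minimal illustration, not the code in zip_sequences.py: the helper names is_thumbs_db, strip_thumbs, and only_thumbs_differs are hypothetical, and the state layout (a "files" list of entries with "path" and "mtime_ns") is assumed from the diffs shown earlier.

```python
from pathlib import Path


def is_thumbs_db(rel_path: str) -> bool:
    """Return True when the entry's file name is Thumbs.db (case-insensitive)."""
    return Path(rel_path).name.lower() == "thumbs.db"


def strip_thumbs(state: dict) -> dict:
    """Return a copy of the state with Thumbs.db entries removed."""
    kept = [e for e in state.get("files", []) if not is_thumbs_db(e.get("path", ""))]
    return {**state, "files": kept}


def only_thumbs_differs(stored: dict, current: dict) -> bool:
    """True when the two states match once Thumbs.db entries are ignored."""
    return strip_thumbs(stored)["files"] == strip_thumbs(current)["files"]


# Hypothetical manifests: the stored one still lists Thumbs.db.
stored = {
    "files": [
        {"path": "frame_001.png", "mtime_ns": 1},
        {"path": "Thumbs.db", "mtime_ns": 2},
    ]
}
current = {"files": [{"path": "frame_001.png", "mtime_ns": 1}]}
```

With these inputs, only_thumbs_differs(stored, current) is True, so a manifest that differs only by a removed Thumbs.db entry would not count as changed and would not trigger a rezip.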


User (2025-12-29 17:37Z)

does zipseq currently scan other types in the seq than png? e.g. jpg, exr


Agent (model default, mode Agent)


Checking which file types zipseq scans:


Tool use: **read_file** • Read file: zip_sequences.py

Tool use: **read_file** • Read file: zip_sequences.py


Yes. zipseq scans multiple file types, not just PNG.

SEQUENCE_EXTENSIONS = {
    ".png",
    ".jpg",
    ".jpeg",
    ".tif",
    ".tiff",
    ".exr",
}

The find_sequence_dirs() function uses this set to identify sequence directories:

        has_frames = any(Path(dirpath, f).suffix.lower() in SEQUENCE_EXTENSIONS for f in filenames)

Supported extensions:

  • .png
  • .jpg
  • .jpeg
  • .tif
  • .tiff
  • .exr

All of these are included in archives and tracked in manifests.


User (2025-12-29 17:37Z)

if there's a sequence with both jpg and png (by mistake) will it scan and compress both?


Agent (model default, mode Agent)


Yes. If a sequence directory contains both JPG and PNG files, both will be scanned and compressed.

How it works:

  1. find_sequence_dirs() checks if a directory has any files matching SEQUENCE_EXTENSIONS (.png, .jpg, .jpeg, .tif, .tiff, .exr). If it finds any, it treats the directory as a sequence directory.
  2. Once identified as a sequence directory, iter_sequence_files() includes all files in that directory (except Thumbs.db and archive paths), regardless of extension:

def iter_sequence_files(seq_dir: Path) -> Iterator[Path]:
    for dirpath, dirnames, filenames in os.walk(seq_dir):
        path = Path(dirpath)
        dirnames[:] = [d for d in dirnames if d not in ("_archive", "_CURRENT")]
        if is_archive_path(path):
            continue
        for filename in filenames:
            # Ignore Thumbs.db files completely
            if filename.lower() == "thumbs.db":
                continue
            yield path / filename

So if a directory has both frame_001.jpg and frame_001.png, both will be:

  • Included in the state manifest
  • Compressed into the archive

The extension check only determines whether a directory is a sequence directory; it doesn't filter which files get included once it's identified as one.
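
The difference between detection and inclusion can be reproduced with a small standalone sketch. It is a simplification under stated assumptions: looks_like_sequence_dir and iter_files are hypothetical stand-ins for find_sequence_dirs() and iter_sequence_files(), and the archive-path filtering from the real function is left out.

```python
import os
import tempfile
from pathlib import Path

SEQUENCE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".exr"}


def looks_like_sequence_dir(filenames: list[str]) -> bool:
    """Detection step: one matching extension is enough."""
    return any(Path(f).suffix.lower() in SEQUENCE_EXTENSIONS for f in filenames)


def iter_files(seq_dir: Path):
    """Inclusion step: every file is yielded except Thumbs.db."""
    for dirpath, _dirnames, filenames in os.walk(seq_dir):
        for filename in filenames:
            if filename.lower() == "thumbs.db":
                continue
            yield Path(dirpath) / filename


def demo() -> list[str]:
    """Build a mixed directory and list what would be included."""
    with tempfile.TemporaryDirectory() as tmp:
        seq_dir = Path(tmp)
        for name in ("frame_001.jpg", "frame_001.png", "notes.txt", "Thumbs.db"):
            (seq_dir / name).write_bytes(b"")
        assert looks_like_sequence_dir(os.listdir(seq_dir))
        return sorted(p.name for p in iter_files(seq_dir))
```

Calling demo() returns frame_001.jpg, frame_001.png, and notes.txt: both image formats are picked up, and so is the unrelated text file, because no extension filter is applied after detection.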


User (2025-12-29 17:37Z)

I had one with jpgs that I removed so only the pngs remain, but the manifest isn't changing.


Agent (model default, mode Agent)


Checking the state comparison logic to see why the manifest isn't updating after removing JPGs:


Tool use: **read_file** • Read file: zip_sequences.py

Tool use: **codebase_search** • Searched codebase "How does the quick skip check work when directory mtime is older than archive?" • **14** results
File Lines
zip_sequences.py L981-1156
zip_sequences.py L979-1156
scripts_old/0MoveToCurrent.bat L61
zip_sequences.py L563-650
zip_sequences.py L471-513
zip_sequences.py L516-560
zip_sequences.py L1159-1286
zip_sequences.py L676-969
zip_sequences.py L861-969
zip_sequences.py L984-1286
UpdateSequences.ps1 L54-96
zip_sequences.py L584-657
zip_sequences.py L861-976
zip_sequences.py L557-650


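The quick-skip optimization is likely the cause. It skips state computation when the directory mtime is <= the archive mtime, which can miss deletions when the top-level directory's mtime doesn't change (for example, when the removed files sit in a nested subdirectory, since a deletion only updates the mtime of the file's immediate parent directory). Checking the logic and making it more reliable: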


Tool use: **read_file** • Read file: zip_sequences.py


The quick-skip optimization can miss file deletions when the directory mtime doesn't update. Adding a pre-check: if any files from the stored state are missing, we'll recompute the state instead of skipping.


Tool use: **search_replace** • Edit file: zip_sequences.py

Chunk 1 Lines added: 29, lines removed: 15

@@ -1030,19 +1030,33 @@
              if zip_path.exists() and stored_state is not None:
                  # Quick check: if directory mtime is older than archive, likely unchanged
-                 try:
-                     dir_mtime = seq_dir.stat().st_mtime_ns
-                     archive_mtime = zip_path.stat().st_mtime_ns
-                     # If directory wasn't modified since archive was created, skip state computation
-                     if dir_mtime <= archive_mtime:
-                         quick_skipped += 1
-                         if quick_skipped <= 5:
-                             log("scan", f"Skipping {rel} (unchanged since archive)")
-                         # Still need to check for old .zip cleanup (we have .7z, so .zip is obsolete)
-                         if old_zip_path and old_zip_path.exists():
-                             old_zip_path.unlink(missing_ok=True)
-                             old_state_path = state_path_for(old_zip_path)
-                             if old_state_path.exists():
-                                 old_state_path.unlink(missing_ok=True)
-                         continue
+                 # But first verify that all files in stored state still exist (catches deletions)
+                 try:
+                     # Quick check: verify stored files still exist (catches file deletions)
+                     stored_files_exist = True
+                     for entry in stored_state.get("files", []):
+                         # Skip Thumbs.db entries (they're filtered out anyway)
+                         if Path(entry.get("path", "")).name.lower() == "thumbs.db":
+                             continue
+                         file_path = seq_dir / entry["path"]
+                         if not file_path.exists():
+                             stored_files_exist = False
+                             break
+                     
+                     # Only do mtime check if all stored files still exist
+                     if stored_files_exist:
+                         dir_mtime = seq_dir.stat().st_mtime_ns
+                         archive_mtime = zip_path.stat().st_mtime_ns
+                         # If directory wasn't modified since archive was created, skip state computation
+                         if dir_mtime <= archive_mtime:
+                             quick_skipped += 1
+                             if quick_skipped <= 5:
+                                 log("scan", f"Skipping {rel} (unchanged since archive)")
+                             # Still need to check for old .zip cleanup (we have .7z, so .zip is obsolete)
+                             if old_zip_path and old_zip_path.exists():
+                                 old_zip_path.unlink(missing_ok=True)
+                                 old_state_path = state_path_for(old_zip_path)
+                                 if old_state_path.exists():
+                                     old_state_path.unlink(missing_ok=True)
+                             continue
                  except OSError:
                      # If stat fails, fall through to full state computation



Tool use: **read_lints** • Read lints for 1 file

Lint paths:

  • zip_sequences.py

Lint results

No lint errors found



Fixed the issue. The quick-skip optimization was skipping state computation when the directory mtime was <= the archive mtime, which can miss file deletions if the directory mtime doesn't update.

The fix: Before the quick-skip, the code now verifies that all files in the stored state still exist. If any are missing (like the JPGs you removed), it bypasses the quick-skip and recomputes the full state, which detects the change and updates the manifest.

Now when you run zipseq:

  1. It checks if stored files still exist
  2. If JPGs are missing, it recomputes the state
  3. It detects the difference and updates the manifest
  4. It recompresses the archive with only the remaining PNGs

The manifest should now update correctly when files are removed, even if the directory mtime doesn't change.
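
The existence pre-check can be tried on its own with the sketch below. It assumes the same manifest layout as the diff above (a "files" list whose entries carry a relative "path"); all_stored_files_exist and demo are hypothetical names used only for illustration.

```python
import tempfile
from pathlib import Path


def all_stored_files_exist(seq_dir: Path, stored_state: dict) -> bool:
    """Return False as soon as one stored entry is missing on disk."""
    for entry in stored_state.get("files", []):
        rel = entry.get("path", "")
        # Thumbs.db entries are ignored, matching the filtering elsewhere.
        if Path(rel).name.lower() == "thumbs.db":
            continue
        if not (seq_dir / rel).exists():
            return False
    return True


def demo() -> tuple[bool, bool]:
    """Check a directory before and after deleting one stored file."""
    with tempfile.TemporaryDirectory() as tmp:
        seq_dir = Path(tmp)
        names = ["frame_001.jpg", "frame_001.png"]
        for name in names:
            (seq_dir / name).write_bytes(b"")
        # The stored state also lists a Thumbs.db that never existed on disk.
        stored = {"files": [{"path": n} for n in names] + [{"path": "Thumbs.db"}]}
        before = all_stored_files_exist(seq_dir, stored)
        (seq_dir / "frame_001.jpg").unlink()
        after = all_stored_files_exist(seq_dir, stored)
        return before, after
```

Here demo() returns (True, False): the missing Thumbs.db is ignored, while the deleted JPG makes the check fail, so the quick-skip would be bypassed. Note that a check like this only covers deletions. Editing a file in place typically leaves the directory mtime unchanged as well, so that case would still depend on the full state comparison being reached.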