Zipseq Thumbs.db exclusion (2025-12-29 17:30Z)
User (2025-12-29 17:37Z)
zipseq needs to completely ignore Thumbs.db. It should be removed from any existing manifests. If the manifest is changed, but only Thumbs.db is changed (removed) it should not be rezipped.
Agent (model default, mode Agent)
Examining zip_sequences.py to understand how it handles manifests and files.
Tool use: **read_file** • Read file: zip_sequences.py
Updating zip_sequences.py to ignore Thumbs.db files. Modifying file iteration, state loading, and state comparison.
Tool use: **read_file** • Read file: zip_sequences.py
Tool use: **read_file** • Read file: zip_sequences.py
Tool use: **read_file** • Read file: zip_sequences.py
Tool use: **read_file** • Read file: zip_sequences.py
Updating the code to ignore Thumbs.db files:
Tool use: **search_replace** • Edit file: zip_sequences.py
Chunk 1 Lines added: 3, lines removed: 0
@@ -522,4 +522,7 @@
             continue
         for filename in filenames:
+            # Ignore Thumbs.db files completely
+            if filename.lower() == "thumbs.db":
+                continue
             yield path / filename
Tool use: **search_replace** • Edit file: zip_sequences.py
Chunk 1 Lines added: 6, lines removed: 0
@@ -567,4 +567,10 @@
     try:
         state = json.loads(state_path.read_text())
+        # Remove Thumbs.db entries from loaded state
+        if "files" in state:
+            state["files"] = [
+                entry for entry in state.get("files", [])
+                if Path(entry.get("path", "")).name.lower() != "thumbs.db"
+            ]
         # Normalize timestamps in loaded state to ensure consistency
         # This handles state files created before normalization was added
Tool use: **search_replace** • Edit file: zip_sequences.py
Chunk 1 Lines added: 4, lines removed: 1
@@ -597,7 +597,10 @@
     def normalize_state(state: dict) -> dict:
-        """Normalize timestamps in state to filesystem precision."""
+        """Normalize timestamps in state to filesystem precision and filter out Thumbs.db."""
         normalized = {"files": []}
         for entry in state.get("files", []):
+            # Ignore Thumbs.db files in state comparison
+            if Path(entry.get("path", "")).name.lower() == "thumbs.db":
+                continue
             mtime_ns = entry["mtime_ns"]
             if is_windows:
Chunk 2 Lines added: 1, lines removed: 1
@@ -610,5 +613,5 @@
         return normalized
-    # Compare normalized states
+    # Compare normalized states (Thumbs.db already filtered out)
     normalized_seq = normalize_state(seq_state)
     normalized_stored = normalize_state(stored_state)
Checking for lint errors and verifying that zip_sequence uses the filtered file list:
Tool use: **read_lints** • Read lints for 1 file
Lint paths:
zip_sequences.py
Lint results
No lint errors found
Tool use: **read_file** • Read file: zip_sequences.py
Verifying that zip_sequence uses the filtered iterator:
Tool use: **read_file** • Read file: zip_sequences.py
Summary of Changes
- iter_sequence_files(): Skips Thumbs.db files during iteration, so they're never included in new states or archives.
- load_state(): Removes any Thumbs.db entries from existing manifests when loading state files.
- state_changed(): Filters out Thumbs.db entries from both states before comparison, so if the only change is removing Thumbs.db, it won't trigger a rezip.
Results:
- Thumbs.db files are ignored during scanning
- Existing manifests have Thumbs.db entries removed when loaded
- If a manifest change is only due to Thumbs.db removal, it won't trigger recompression
The zip_sequence() function automatically benefits from these changes since it uses iter_sequence_files(), which now filters out Thumbs.db.
No linting errors were found. The changes are ready to use.
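A minimal, standalone sketch of the comparison behaviour described above, not the actual code in zip_sequences.py. The helper names `strip_thumbs` and `states_differ` and the `size` field are hypothetical; only the `files`, `path`, and `mtime_ns` keys appear in the edits above. It also assumes manifest paths may use either slash style, so separators are normalized before taking the file name.

```python
from pathlib import PurePosixPath


def strip_thumbs(state: dict) -> dict:
    """Return a copy of a manifest-like dict without Thumbs.db entries."""
    kept = []
    for entry in state.get("files", []):
        # Normalize separators so the check works for "sub\\Thumbs.db" too
        # (assumption: manifests may contain either separator).
        rel = entry.get("path", "").replace("\\", "/")
        if PurePosixPath(rel).name.lower() == "thumbs.db":
            continue
        kept.append(entry)
    return {"files": kept}


def states_differ(current: dict, stored: dict) -> bool:
    """Compare two states while ignoring Thumbs.db on both sides."""
    return strip_thumbs(current) != strip_thumbs(stored)


# A stored manifest that still lists Thumbs.db, and a fresh scan that does not.
stored = {"files": [
    {"path": "frame_001.png", "size": 10, "mtime_ns": 1},
    {"path": "Thumbs.db", "size": 5, "mtime_ns": 2},
]}
current = {"files": [{"path": "frame_001.png", "size": 10, "mtime_ns": 1}]}

thumbs_only_change = states_differ(current, stored)   # no rezip needed
real_change = states_differ({"files": []}, stored)    # a frame disappeared
```

Because both sides are filtered before comparison, a manifest that differs only by a Thumbs.db entry compares equal, which is what prevents an unnecessary rezip.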
User (2025-12-29 17:37Z)
does zipseq currently scan other types in the seq than png? e.g. jpg, exr
Agent (model default, mode Agent)
Checking which file types zipseq scans:
Tool use: **read_file** • Read file: zip_sequences.py
Tool use: **read_file** • Read file: zip_sequences.py
Yes. zipseq scans multiple file types, not just PNG.
SEQUENCE_EXTENSIONS = {
    ".png",
    ".jpg",
    ".jpeg",
    ".tif",
    ".tiff",
    ".exr",
}
The find_sequence_dirs() function uses this set to identify sequence directories:
has_frames = any(Path(dirpath, f).suffix.lower() in SEQUENCE_EXTENSIONS for f in filenames)
Supported extensions:
- .png
- .jpg
- .jpeg
- .tif
- .tiff
- .exr
All of these are included in archives and tracked in manifests.
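A small standalone sketch of the extension check, assuming it behaves like the `has_frames` expression quoted above. The wrapper function `has_frames()` is hypothetical; in the script the expression is inline inside find_sequence_dirs().

```python
from pathlib import Path

SEQUENCE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".exr"}


def has_frames(filenames: list[str]) -> bool:
    """True if any file name has a sequence extension (case-insensitive)."""
    return any(Path(name).suffix.lower() in SEQUENCE_EXTENSIONS for name in filenames)


mixed_case = has_frames(["shot_0001.EXR", "notes.txt"])  # upper-case suffix still matches
no_frames = has_frames(["notes.txt", "Thumbs.db"])       # no image suffix present
```

Lower-casing the suffix is what lets files such as `shot_0001.EXR` count as frames.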
User (2025-12-29 17:37Z)
if there's a sequence with both jpg and png (by mistake) will it scan and compress both?
Agent (model default, mode Agent)
Yes. If a sequence directory contains both JPG and PNG files, both will be scanned and compressed.
How it works:
- find_sequence_dirs() checks if a directory has any files matching SEQUENCE_EXTENSIONS (.png, .jpg, .jpeg, .tif, .tiff, .exr). If it finds any, it treats the directory as a sequence directory.
- Once identified as a sequence directory, iter_sequence_files() includes all files in that directory (except Thumbs.db and archive paths), regardless of extension:
def iter_sequence_files(seq_dir: Path) -> Iterator[Path]:
    for dirpath, dirnames, filenames in os.walk(seq_dir):
        path = Path(dirpath)
        dirnames[:] = [d for d in dirnames if d not in ("_archive", "_CURRENT")]
        if is_archive_path(path):
            continue
        for filename in filenames:
            # Ignore Thumbs.db files completely
            if filename.lower() == "thumbs.db":
                continue
            yield path / filename
So if a directory has both frame_001.jpg and frame_001.png, both will be:
- Included in the state manifest
- Compressed into the archive
The extension check only determines whether a directory is a sequence directory; it doesn't filter which files get included once it's identified as one.
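To see this on disk, here is a simplified, self-contained variant of the iterator run against a temporary directory. It is a sketch only: `iter_files_simplified` is a hypothetical name, and the `is_archive_path()` check is left out because its definition isn't shown here.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterator


def iter_files_simplified(seq_dir: Path) -> Iterator[Path]:
    """Yield every file under seq_dir except Thumbs.db, with no extension filter."""
    for dirpath, dirnames, filenames in os.walk(seq_dir):
        # Prune archive-style subdirectories in place so os.walk skips them.
        dirnames[:] = [d for d in dirnames if d not in ("_archive", "_CURRENT")]
        for filename in filenames:
            if filename.lower() == "thumbs.db":
                continue
            yield Path(dirpath) / filename


with tempfile.TemporaryDirectory() as tmp:
    seq_dir = Path(tmp)
    for name in ("frame_001.jpg", "frame_001.png", "Thumbs.db"):
        (seq_dir / name).touch()
    names = sorted(p.name for p in iter_files_simplified(seq_dir))
```

Both the JPG and the PNG come back from the iterator; only Thumbs.db is dropped.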
User (2025-12-29 17:37Z)
I had one with jpgs that I removed so only the pngs remain, but the manifest isn't changing.
Agent (model default, mode Agent)
Checking the state comparison logic to see why the manifest isn't updating after removing JPGs:
Tool use: **read_file** • Read file: zip_sequences.py
Tool use: **codebase_search** • Searched codebase "How does the quick skip check work when directory mtime is older than archive?" • **14** results
| File | Lines |
|---|---|
| zip_sequences.py | L981-1156 |
| zip_sequences.py | L979-1156 |
| scripts_old/0MoveToCurrent.bat | L61 |
| zip_sequences.py | L563-650 |
| zip_sequences.py | L471-513 |
| zip_sequences.py | L516-560 |
| zip_sequences.py | L1159-1286 |
| zip_sequences.py | L676-969 |
| zip_sequences.py | L861-969 |
| zip_sequences.py | L984-1286 |
| UpdateSequences.ps1 | L54-96 |
| zip_sequences.py | L584-657 |
| zip_sequences.py | L861-976 |
| zip_sequences.py | L557-650 |
The quick-skip optimization is likely the cause. It skips state computation if the directory mtime is <= the archive mtime, which can miss deletions when the directory mtime doesn't update. Checking the logic and making it more reliable:
Tool use: **read_file** • Read file: zip_sequences.py
The quick-skip optimization can miss file deletions when the directory mtime doesn't update. Adding a pre-check: if any files from the stored state are missing, we'll recompute the state instead of skipping.
Tool use: **search_replace** • Edit file: zip_sequences.py
Chunk 1 Lines added: 29, lines removed: 15
@@ -1030,19 +1030,33 @@
         if zip_path.exists() and stored_state is not None:
             # Quick check: if directory mtime is older than archive, likely unchanged
-            try:
-                dir_mtime = seq_dir.stat().st_mtime_ns
-                archive_mtime = zip_path.stat().st_mtime_ns
-                # If directory wasn't modified since archive was created, skip state computation
-                if dir_mtime <= archive_mtime:
-                    quick_skipped += 1
-                    if quick_skipped <= 5:
-                        log("scan", f"Skipping {rel} (unchanged since archive)")
-                    # Still need to check for old .zip cleanup (we have .7z, so .zip is obsolete)
-                    if old_zip_path and old_zip_path.exists():
-                        old_zip_path.unlink(missing_ok=True)
-                        old_state_path = state_path_for(old_zip_path)
-                        if old_state_path.exists():
-                            old_state_path.unlink(missing_ok=True)
-                    continue
+            # But first verify that all files in stored state still exist (catches deletions)
+            try:
+                # Quick check: verify stored files still exist (catches file deletions)
+                stored_files_exist = True
+                for entry in stored_state.get("files", []):
+                    # Skip Thumbs.db entries (they're filtered out anyway)
+                    if Path(entry.get("path", "")).name.lower() == "thumbs.db":
+                        continue
+                    file_path = seq_dir / entry["path"]
+                    if not file_path.exists():
+                        stored_files_exist = False
+                        break
+
+                # Only do mtime check if all stored files still exist
+                if stored_files_exist:
+                    dir_mtime = seq_dir.stat().st_mtime_ns
+                    archive_mtime = zip_path.stat().st_mtime_ns
+                    # If directory wasn't modified since archive was created, skip state computation
+                    if dir_mtime <= archive_mtime:
+                        quick_skipped += 1
+                        if quick_skipped <= 5:
+                            log("scan", f"Skipping {rel} (unchanged since archive)")
+                        # Still need to check for old .zip cleanup (we have .7z, so .zip is obsolete)
+                        if old_zip_path and old_zip_path.exists():
+                            old_zip_path.unlink(missing_ok=True)
+                            old_state_path = state_path_for(old_zip_path)
+                            if old_state_path.exists():
+                                old_state_path.unlink(missing_ok=True)
+                        continue
             except OSError:
                 # If stat fails, fall through to full state computation
Tool use: **read_lints** • Read lints for 1 file
Lint paths:
zip_sequences.py
Lint results
No lint errors found
Fixed the issue. The quick-skip optimization was skipping state computation when the directory mtime was <= the archive mtime, which can miss file deletions if the directory mtime doesn't update.
The fix: Before the quick-skip, the code now verifies that all files in the stored state still exist. If any are missing (like the JPGs you removed), it bypasses the quick-skip and recomputes the full state, which will detect the change and update the manifest.
Now when you run zipseq:
- It checks if stored files still exist
- If JPGs are missing, it recomputes the state
- It detects the difference and updates the manifest
- It recompresses the archive with only the remaining PNGs
The manifest should now update correctly when files are removed, even if the directory mtime doesn't change.
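The existence pre-check can be tried in isolation with a sketch like the one below. The function name `stored_files_exist()` is hypothetical (in the script the logic is inline in the scan loop), and the manifest entries are reduced to just a `path` key for illustration.

```python
import tempfile
from pathlib import Path


def stored_files_exist(seq_dir: Path, stored_state: dict) -> bool:
    """Return False as soon as any file listed in the stored state is missing."""
    for entry in stored_state.get("files", []):
        rel = entry.get("path", "")
        # Thumbs.db entries are ignored, so their absence is not a change.
        if Path(rel).name.lower() == "thumbs.db":
            continue
        if not (seq_dir / rel).exists():
            return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    seq_dir = Path(tmp)
    (seq_dir / "frame_001.png").touch()  # only the PNG remains on disk

    # Manifest still lists the deleted JPG: the quick-skip must not be taken.
    after_jpg_removal = stored_files_exist(
        seq_dir, {"files": [{"path": "frame_001.png"}, {"path": "frame_001.jpg"}]}
    )
    # Manifest lists a Thumbs.db that is gone: this alone should not force a rescan.
    thumbs_only_missing = stored_files_exist(
        seq_dir, {"files": [{"path": "frame_001.png"}, {"path": "Thumbs.db"}]}
    )
```

Note that this check only catches deletions of files already in the manifest; newly added files still depend on the directory mtime comparison or a full state computation.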