<!-- Generated by SpecStory, Markdown v2.1.0 -->

<!-- cursor Session 709b2b05-b26b-4f58-8cd9-1a63b137e96e (2025-12-29 17:30Z) -->

# Zipseq Thumbs.db exclusion (2025-12-29 17:30Z)

_**User (2025-12-29 17:37Z)**_

zipseq needs to completely ignore Thumbs.db. It should be removed from any existing manifests. If the manifest is changed, but only Thumbs.db is changed (removed) it should not be rezipped.

---

_**Agent (model default, mode Agent)**_

---
Examining `zip_sequences.py` to understand how it handles manifests and files.

---

<tool-use data-tool-type="read" data-tool-name="read_file">
<details><summary>Tool use: **read_file** • Read file: zip_sequences.py</summary>

</details>
</tool-use>

---

---

Updating `zip_sequences.py` to ignore `Thumbs.db` files. Modifying file iteration, state loading, and state comparison.

---
<tool-use data-tool-type="read" data-tool-name="read_file">
<details><summary>Tool use: **read_file** • Read file: zip_sequences.py</summary>

</details>
</tool-use>

---

<tool-use data-tool-type="read" data-tool-name="read_file">
<details><summary>Tool use: **read_file** • Read file: zip_sequences.py</summary>

</details>
</tool-use>

---

<tool-use data-tool-type="read" data-tool-name="read_file">
<details><summary>Tool use: **read_file** • Read file: zip_sequences.py</summary>

</details>
</tool-use>

---

<tool-use data-tool-type="read" data-tool-name="read_file">
<details><summary>Tool use: **read_file** • Read file: zip_sequences.py</summary>

</details>
</tool-use>

---

---

Updating the code to ignore `Thumbs.db` files:

---
<tool-use data-tool-type="write" data-tool-name="search_replace">

<details><summary>Tool use: **search_replace** • Edit file: zip_sequences.py</summary>

**Chunk 1**
Lines added: 3, lines removed: undefined

```diff
@@ -522,4 +522,7 @@
             continue
         for filename in filenames:
+            # Ignore Thumbs.db files completely
+            if filename.lower() == "thumbs.db":
+                continue
             yield path / filename

```

</details>

</tool-use>

---
<tool-use data-tool-type="write" data-tool-name="search_replace">

<details><summary>Tool use: **search_replace** • Edit file: zip_sequences.py</summary>

**Chunk 1**
Lines added: 6, lines removed: undefined

```diff
@@ -567,4 +567,10 @@
     try:
         state = json.loads(state_path.read_text())
+        # Remove Thumbs.db entries from loaded state
+        if "files" in state:
+            state["files"] = [
+                entry for entry in state.get("files", [])
+                if Path(entry.get("path", "")).name.lower() != "thumbs.db"
+            ]
         # Normalize timestamps in loaded state to ensure consistency
         # This handles state files created before normalization was added
```

</details>

</tool-use>

---
<tool-use data-tool-type="write" data-tool-name="search_replace">

<details><summary>Tool use: **search_replace** • Edit file: zip_sequences.py</summary>

**Chunk 1**
Lines added: 4, lines removed: 1

```diff
@@ -597,7 +597,10 @@

     def normalize_state(state: dict) -> dict:
-        """Normalize timestamps in state to filesystem precision."""
+        """Normalize timestamps in state to filesystem precision and filter out Thumbs.db."""
         normalized = {"files": []}
         for entry in state.get("files", []):
+            # Ignore Thumbs.db files in state comparison
+            if Path(entry.get("path", "")).name.lower() == "thumbs.db":
+                continue
             mtime_ns = entry["mtime_ns"]
             if is_windows:
```

**Chunk 2**
Lines added: 1, lines removed: 1

```diff
@@ -610,5 +613,5 @@
         return normalized

-    # Compare normalized states
+    # Compare normalized states (Thumbs.db already filtered out)
     normalized_seq = normalize_state(seq_state)
     normalized_stored = normalize_state(stored_state)
```

</details>

</tool-use>

---
---

Checking for lint errors and verifying that `zip_sequence` uses the filtered file list:

---

<tool-use data-tool-type="generic" data-tool-name="read_lints">
<details>
<summary>Tool use: **read_lints** • Read lints for 1 file</summary>

Lint paths:

- `zip_sequences.py`

Lint results

**No lint errors found**
</details>
</tool-use>

---

<tool-use data-tool-type="read" data-tool-name="read_file">
<details><summary>Tool use: **read_file** • Read file: zip_sequences.py</summary>

</details>
</tool-use>

---

---

Verifying that `zip_sequence` uses the filtered iterator:

---

<tool-use data-tool-type="read" data-tool-name="read_file">
<details><summary>Tool use: **read_file** • Read file: zip_sequences.py</summary>

</details>
</tool-use>

---

---
## Summary of Changes

1. **`iter_sequence_files()`**: Skips `Thumbs.db` files during iteration, so they're never included in new states or archives.

2. **`load_state()`**: Removes any `Thumbs.db` entries from existing manifests when loading state files.

3. **`state_changed()`**: Filters out `Thumbs.db` entries from both states (inside its `normalize_state()` helper) before comparison, so if the only change is removing `Thumbs.db`, it won't trigger a rezip.

Results:
- `Thumbs.db` files are ignored during scanning
- Existing manifests have `Thumbs.db` entries removed when loaded
- If a manifest change is only due to `Thumbs.db` removal, it won't trigger recompression

The `zip_sequence()` function automatically benefits from these changes since it uses `iter_sequence_files()`, which now filters out `Thumbs.db`.

No linting errors were found. The changes are ready to use.

---
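The three filters summarized above can be sketched in isolation as follows. This is a minimal, self-contained illustration rather than the code in `zip_sequences.py`: the names `IGNORED_NAMES`, `is_ignored`, `iter_files`, and `strip_ignored` are hypothetical, the `size` field in the manifest entries is an assumption, and the real `state_changed()` also normalizes timestamps, which is left out here.

```python
import os
from pathlib import Path
from typing import Iterator

# Hypothetical constant; the session's edits inline the "thumbs.db" string instead.
IGNORED_NAMES = {"thumbs.db"}


def is_ignored(path_like: str) -> bool:
    """Return True when the file name is Thumbs.db, in any letter case."""
    return Path(path_like).name.lower() in IGNORED_NAMES


def iter_files(seq_dir: Path) -> Iterator[Path]:
    """Walk seq_dir and yield every file except ignored ones."""
    for dirpath, _dirnames, filenames in os.walk(seq_dir):
        for filename in filenames:
            if not is_ignored(filename):
                yield Path(dirpath) / filename


def strip_ignored(state: dict) -> dict:
    """Return a copy of a manifest-like dict without ignored entries."""
    files = [e for e in state.get("files", []) if not is_ignored(e.get("path", ""))]
    return {**state, "files": files}


def state_changed(current: dict, stored: dict) -> bool:
    """Compare two states after dropping ignored entries from both sides."""
    return strip_ignored(current)["files"] != strip_ignored(stored)["files"]


# An old manifest that still lists Thumbs.db, and a fresh scan that does not.
stored = {"files": [
    {"path": "frame_001.png", "size": 10, "mtime_ns": 1},
    {"path": "Thumbs.db", "size": 5, "mtime_ns": 2},
]}
current = {"files": [{"path": "frame_001.png", "size": 10, "mtime_ns": 1}]}

print(state_changed(current, stored))  # False: only Thumbs.db differs
```

Filtering on both sides of the comparison is what keeps an old manifest that still lists `Thumbs.db` from forcing a rezip.

---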
_**User (2025-12-29 17:37Z)**_

does zipseq currently scan other types in the seq than png? e.g. jpg, exr

---
_**Agent (model default, mode Agent)**_

---

Checking which file types `zipseq` scans:

---

<tool-use data-tool-type="read" data-tool-name="read_file">
<details><summary>Tool use: **read_file** • Read file: zip_sequences.py</summary>

</details>
</tool-use>

---

<tool-use data-tool-type="read" data-tool-name="read_file">
<details><summary>Tool use: **read_file** • Read file: zip_sequences.py</summary>

</details>
</tool-use>

---

---
Yes. `zipseq` scans multiple file types, not just PNG.

```45:52:zip_sequences.py
SEQUENCE_EXTENSIONS = {
    ".png",
    ".jpg",
    ".jpeg",
    ".tif",
    ".tiff",
    ".exr",
}
```

The `find_sequence_dirs()` function uses this set to identify sequence directories:

```491:491:zip_sequences.py
has_frames = any(Path(dirpath, f).suffix.lower() in SEQUENCE_EXTENSIONS for f in filenames)
```

Supported extensions:
- `.png`
- `.jpg`
- `.jpeg`
- `.tif`
- `.tiff`
- `.exr`

All of these are included in archives and tracked in manifests.

---
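To see the extension check above in action without the rest of the script, here is a small sketch. The `has_frames` function is a hypothetical wrapper around the one-line expression quoted from `find_sequence_dirs()`; the file names are made up for illustration.

```python
from pathlib import Path
from typing import Iterable

SEQUENCE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".exr"}


def has_frames(filenames: Iterable[str]) -> bool:
    """True if at least one name has a recognized frame extension (case-insensitive)."""
    return any(Path(name).suffix.lower() in SEQUENCE_EXTENSIONS for name in filenames)


print(has_frames(["shot_0001.EXR", "notes.txt"]))  # True: .EXR is lowercased before the lookup
print(has_frames(["notes.txt", "Thumbs.db"]))      # False: no frame extension present
```

Because the suffix is lowercased before the set lookup, `shot_0001.EXR` and `shot_0001.exr` are treated the same way.

---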
_**User (2025-12-29 17:37Z)**_

if there's a sequence with both jpg and png (by mistake) will it scan and compress both?

---

_**Agent (model default, mode Agent)**_

---
Yes. If a sequence directory contains both JPG and PNG files, both will be scanned and compressed.

How it works:
1. `find_sequence_dirs()` checks if a directory has any files matching `SEQUENCE_EXTENSIONS` (`.png`, `.jpg`, `.jpeg`, `.tif`, `.tiff`, `.exr`). If it finds any, it treats the directory as a sequence directory.
2. Once identified as a sequence directory, `iter_sequence_files()` includes all files in that directory (except `Thumbs.db` and archive paths), regardless of extension:

```517:524:zip_sequences.py
def iter_sequence_files(seq_dir: Path) -> Iterator[Path]:
    for dirpath, dirnames, filenames in os.walk(seq_dir):
        path = Path(dirpath)
        dirnames[:] = [d for d in dirnames if d not in ("_archive", "_CURRENT")]
        if is_archive_path(path):
            continue
        for filename in filenames:
            # Ignore Thumbs.db files completely
            if filename.lower() == "thumbs.db":
                continue
            yield path / filename
```

So if a directory has both `frame_001.jpg` and `frame_001.png`, both will be:
- Included in the state manifest
- Compressed into the archive

The extension check only determines whether a directory is a sequence directory; it doesn't filter which files get included once it's identified as one.

---
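Since nothing filters by extension after a directory is recognized, a mixed JPG/PNG sequence goes unnoticed. One way to surface it is a per-extension count. The sketch below is not part of `zip_sequences.py`; `frame_extension_counts` is a hypothetical helper, and it only looks at files directly inside the directory.

```python
import tempfile
from collections import Counter
from pathlib import Path

SEQUENCE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".exr"}


def frame_extension_counts(seq_dir: Path) -> Counter:
    """Count files per recognized frame extension directly inside seq_dir."""
    return Counter(
        p.suffix.lower()
        for p in seq_dir.iterdir()
        if p.is_file() and p.suffix.lower() in SEQUENCE_EXTENSIONS
    )


with tempfile.TemporaryDirectory() as tmp:
    seq_dir = Path(tmp)
    # A sequence that accidentally contains one stray JPG next to the PNG frames.
    for name in ("frame_001.png", "frame_002.png", "frame_001.jpg", "notes.txt"):
        (seq_dir / name).touch()

    counts = frame_extension_counts(seq_dir)
    if len(counts) > 1:
        print(f"mixed frame types in {seq_dir.name}: {dict(counts)}")
```

Note that this sketch counts `.jpg` and `.jpeg` as different types; whether that should be reported as a mix is a design choice left open here.

---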
_**User (2025-12-29 17:37Z)**_

I had one with jpgs that I removed so only the pngs remain, but the manifest isn't changing.

---

_**Agent (model default, mode Agent)**_

---
Checking the state comparison logic to see why the manifest isn't updating after removing JPGs:

---

<tool-use data-tool-type="read" data-tool-name="read_file">
<details><summary>Tool use: **read_file** • Read file: zip_sequences.py</summary>

</details>
</tool-use>

---

<tool-use data-tool-type="search" data-tool-name="codebase_search">
<details>
<summary>Tool use: **codebase_search** • Searched codebase "How does the quick skip check work when directory mtime is older than archive?" • **14** results</summary>

| File | Lines |
|------|-------|
| `zip_sequences.py` | L981-1156 |
| `zip_sequences.py` | L979-1156 |
| `scripts_old/0MoveToCurrent.bat` | L61 |
| `zip_sequences.py` | L563-650 |
| `zip_sequences.py` | L471-513 |
| `zip_sequences.py` | L516-560 |
| `zip_sequences.py` | L1159-1286 |
| `zip_sequences.py` | L676-969 |
| `zip_sequences.py` | L861-969 |
| `zip_sequences.py` | L984-1286 |
| `UpdateSequences.ps1` | L54-96 |
| `zip_sequences.py` | L584-657 |
| `zip_sequences.py` | L861-976 |
| `zip_sequences.py` | L557-650 |

</details>
</tool-use>

---
---

The quick-skip optimization is likely the cause. It skips state computation when the directory mtime is no newer than the archive mtime (`dir_mtime <= archive_mtime`), which can miss deletions when the directory mtime doesn't update. Checking the logic and making it more reliable:

---

<tool-use data-tool-type="read" data-tool-name="read_file">
<details><summary>Tool use: **read_file** • Read file: zip_sequences.py</summary>

</details>
</tool-use>

---

---

The quick-skip optimization can miss file deletions when the directory mtime doesn't update. Adding a pre-check: if any file from the stored state is missing, the state is recomputed instead of skipped.

---
<tool-use data-tool-type="write" data-tool-name="search_replace">

<details><summary>Tool use: **search_replace** • Edit file: zip_sequences.py</summary>

**Chunk 1**
Lines added: 29, lines removed: 15

```diff
@@ -1030,19 +1030,33 @@
         if zip_path.exists() and stored_state is not None:
             # Quick check: if directory mtime is older than archive, likely unchanged
-            try:
-                dir_mtime = seq_dir.stat().st_mtime_ns
-                archive_mtime = zip_path.stat().st_mtime_ns
-                # If directory wasn't modified since archive was created, skip state computation
-                if dir_mtime <= archive_mtime:
-                    quick_skipped += 1
-                    if quick_skipped <= 5:
-                        log("scan", f"Skipping {rel} (unchanged since archive)")
-                    # Still need to check for old .zip cleanup (we have .7z, so .zip is obsolete)
-                    if old_zip_path and old_zip_path.exists():
-                        old_zip_path.unlink(missing_ok=True)
-                        old_state_path = state_path_for(old_zip_path)
-                        if old_state_path.exists():
-                            old_state_path.unlink(missing_ok=True)
-                    continue
+            # But first verify that all files in stored state still exist (catches deletions)
+            try:
+                # Quick check: verify stored files still exist (catches file deletions)
+                stored_files_exist = True
+                for entry in stored_state.get("files", []):
+                    # Skip Thumbs.db entries (they're filtered out anyway)
+                    if Path(entry.get("path", "")).name.lower() == "thumbs.db":
+                        continue
+                    file_path = seq_dir / entry["path"]
+                    if not file_path.exists():
+                        stored_files_exist = False
+                        break
+
+                # Only do mtime check if all stored files still exist
+                if stored_files_exist:
+                    dir_mtime = seq_dir.stat().st_mtime_ns
+                    archive_mtime = zip_path.stat().st_mtime_ns
+                    # If directory wasn't modified since archive was created, skip state computation
+                    if dir_mtime <= archive_mtime:
+                        quick_skipped += 1
+                        if quick_skipped <= 5:
+                            log("scan", f"Skipping {rel} (unchanged since archive)")
+                        # Still need to check for old .zip cleanup (we have .7z, so .zip is obsolete)
+                        if old_zip_path and old_zip_path.exists():
+                            old_zip_path.unlink(missing_ok=True)
+                            old_state_path = state_path_for(old_zip_path)
+                            if old_state_path.exists():
+                                old_state_path.unlink(missing_ok=True)
+                        continue
             except OSError:
                 # If stat fails, fall through to full state computation
```

</details>

</tool-use>

---
---

---

<tool-use data-tool-type="generic" data-tool-name="read_lints">
<details>
<summary>Tool use: **read_lints** • Read lints for 1 file</summary>

Lint paths:

- `zip_sequences.py`

Lint results

**No lint errors found**
</details>
</tool-use>

---

---
Fixed the issue. The quick-skip optimization was skipping state computation whenever the directory mtime was no newer than the archive mtime (`dir_mtime <= archive_mtime`), which can miss file deletions if the directory mtime doesn't update.

**The fix:** Before the quick-skip, the code now verifies that all files in the stored state still exist. If any are missing (like the JPGs you removed), it bypasses the quick-skip and recomputes the full state, which will detect the change and update the manifest.

Now when you run `zipseq`:
1. It checks whether the files listed in the stored state still exist
2. If JPGs are missing, it recomputes the state
3. It detects the difference and updates the manifest
4. It recompresses the archive with only the remaining PNGs

The manifest should now update correctly when files are removed, even if the directory mtime doesn't change.

---
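The existence pre-check described above can be tried on its own with a throwaway directory. This is a minimal sketch under stated assumptions: `stored_files_exist` is written here as a standalone function (in the session's edit it is a flag computed inline), and the manifest entries carry only a `path` key.

```python
import tempfile
from pathlib import Path


def stored_files_exist(seq_dir: Path, stored_state: dict) -> bool:
    """Return False as soon as one file listed in the stored state is missing."""
    for entry in stored_state.get("files", []):
        rel_path = entry.get("path", "")
        if Path(rel_path).name.lower() == "thumbs.db":
            continue  # ignored entries never force a rescan
        if not (seq_dir / rel_path).exists():
            return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    seq_dir = Path(tmp)
    (seq_dir / "frame_001.png").touch()
    (seq_dir / "frame_001.jpg").touch()
    stored = {"files": [
        {"path": "frame_001.png"},
        {"path": "frame_001.jpg"},
        {"path": "Thumbs.db"},  # listed in an old manifest but never on disk
    ]}

    before = stored_files_exist(seq_dir, stored)
    (seq_dir / "frame_001.jpg").unlink()  # simulate removing the stray JPG
    after = stored_files_exist(seq_dir, stored)

print(before, after)  # True False
```

The pre-check only looks at files the manifest already knows about, so it catches deletions but not additions; newly added files still depend on the mtime comparison or the full state computation.

---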