Cache
The openMind Hub Client uses a central cache that is shared across libraries, which prevents the same file from being downloaded more than once.
Cache File Structure
The structure of cache files is as follows:
<CACHE_DIR>
├─ <MODELS>
├─ <DATASETS>
├─ <SPACES>
CACHE_DIR is usually located in the user's home directory. It can be set with the cache_dir parameter.
MODELS, DATASETS, and SPACES are the cache directories for models, datasets, and spaces, respectively. They share a common root directory. Each repository directory is named in the following format: <repository type>--<namespace>--<repository name>, where the repository type is models, datasets, or spaces, and the namespace (an organization or user name) is omitted if the repository has none. The following is an example.
<CACHE_DIR>
├─ models--ByteDance--SDXL-Lightning
├─ models--ByteDance--sd2.1-base-zsnr-laionaes5
├─ models--bert-base-cased
├─ datasets--glue
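The naming rule can be illustrated with a short sketch. The helper name `repo_folder_name` and its parameters are hypothetical and are not part of the openMind Hub Client API; the function only mirrors the format described above.

```python
def repo_folder_name(repo_id: str, repo_type: str = "model") -> str:
    """Build a cache folder name such as 'models--ByteDance--SDXL-Lightning'.

    Hypothetical helper for illustration only; not an openMind Hub Client function.
    """
    prefixes = {"model": "models", "dataset": "datasets", "space": "spaces"}
    if repo_type not in prefixes:
        raise ValueError(f"unknown repo_type: {repo_type!r}")
    # The namespace is optional, so "bert-base-cased" contributes a single part.
    parts = [prefixes[repo_type], *repo_id.split("/")]
    return "--".join(parts)


print(repo_folder_name("ByteDance/SDXL-Lightning"))
print(repo_folder_name("bert-base-cased"))
print(repo_folder_name("glue", repo_type="dataset"))
```

The three calls reproduce the folder names shown in the example tree.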
Cache Functions
Every file in a cache directory is downloaded from the corresponding repository. The cache ensures that a file is not downloaded again if it already exists locally and has not been updated. If a file has been updated and the latest version is requested, the latest version is downloaded, and the previous version is retained in case it is needed again. To make this possible, all cache directories must have the same structure. Each repository cache directory must contain the following:
<CACHE_DIR>
├─ datasets--glue
│ ├─ refs
│ ├─ blobs
│ ├─ snapshots
...
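As a minimal sketch, the layout above can be checked with the standard library. The function name `missing_subdirs` is hypothetical, and the example runs against a temporary directory rather than a real cache.

```python
import tempfile
from pathlib import Path

# Subfolders that the text above says every repository cache folder contains.
EXPECTED_SUBDIRS = ("refs", "blobs", "snapshots")


def missing_subdirs(repo_cache_dir: Path) -> list:
    """Return the expected subfolders that are absent from a repository cache folder."""
    return [name for name in EXPECTED_SUBDIRS if not (repo_cache_dir / name).is_dir()]


with tempfile.TemporaryDirectory() as tmp:
    repo_dir = Path(tmp) / "datasets--glue"
    # Simulate a partially populated cache folder.
    (repo_dir / "refs").mkdir(parents=True)
    (repo_dir / "blobs").mkdir()
    print(missing_subdirs(repo_dir))  # ['snapshots']
```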
Because downloads are resumable, the openMind Hub Client does not automatically delete cached files whose download failed. You are advised to delete such files manually when they are no longer needed.
Refs
The refs folder contains files that store the latest commit identifier (commit hash) of a given branch. For example:
If you fetch files from the main branch of a repository, the refs folder will contain a file named main, which contains the commit hash of the current head.
If the latest commit hash of the main branch is de9f2c, refs/main will also contain de9f2c.
If an update is committed to the same branch with the new hash 7fd25e, downloading the file again updates refs/main so that it contains 7fd25e.
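The lookup described above can be sketched as follows. This assumes that snapshot folders are named after commit hashes, as in the directory listing in the Blobs section; `resolve_ref` is a hypothetical helper, not an openMind Hub Client function, and the example uses a temporary directory.

```python
import tempfile
from pathlib import Path


def resolve_ref(repo_cache_dir: Path, ref: str = "main") -> Path:
    """Map a branch name to its snapshot folder by reading refs/<ref>."""
    commit_hash = (repo_cache_dir / "refs" / ref).read_text().strip()
    return repo_cache_dir / "snapshots" / commit_hash


with tempfile.TemporaryDirectory() as tmp:
    repo_dir = Path(tmp) / "models--chatglm3-6B"
    (repo_dir / "refs").mkdir(parents=True)

    # First download: the head of main is de9f2c.
    (repo_dir / "refs" / "main").write_text("de9f2c\n")
    print(resolve_ref(repo_dir).name)  # de9f2c

    # A new commit lands on main; the ref file is rewritten.
    (repo_dir / "refs" / "main").write_text("7fd25e\n")
    print(resolve_ref(repo_dir).name)  # 7fd25e
```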
Blobs
The blobs folder contains the actual files you have downloaded. Each file is named after its hash value.
[  96]  .
└── [ 160]  models--chatglm3-6B
    ├── [ 160]  blobs
    │   ├── [1201M]  447d41b7c5e7b2558905c98733469aa9e132540c91e13c4cdd7bfc58b60cc650
    │   ├── [ 398]  b098244a71fbe69ce149682d9072a7629f7e908c
    │   └── [1.4K]  c2d28f08b86bacac392140d0f6b26c05d567321f
    ├── [  96]  refs
    │   └── [  40]  main
    └── [ 128]  snapshots
        ├── [ 128]  de9f2ce1b3afad3e85a0bd17d9b100db4b3
        │   ├── [  52]  README.md -> ../../blobs/b098244a71fbe69ce149682d9072a7629f7e908c
        │   └── [  76]  pytorch_model.bin -> ../../blobs/447d41b7c5e7b2558905c98733469aa9e132540c91e13c4cdd7bfc58b60cc650
        └── [ 128]  7fd25eb4b0d3255bfef95601890afbd17d9
            ├── [  52]  README.md -> ../../blobs/c2d28f08b86bacac392140d0f6b26c05d567321f
            └── [  76]  pytorch_model.bin -> ../../blobs/447d41b7c5e7b2558905c98733469aa9e132540c91e13c4cdd7bfc58b60cc650
Each entry in a snapshot folder is a link to a file in the blobs folder. With this sharing mechanism, a file that was downloaded for version de9f2c and is not modified in version 7fd25e keeps the same hash value, so both snapshots point to the same blob. Therefore, the file (pytorch_model.bin) does not need to be downloaded again.
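The deduplication logic can be sketched as a comparison between the hashes a revision needs and the blobs already on disk. The `manifest` mapping (file name to blob hash) and the function `files_to_download` are hypothetical; how the client actually obtains the hashes for a revision is not covered here.

```python
import tempfile
from pathlib import Path


def files_to_download(manifest: dict, blobs_dir: Path) -> list:
    """Return the file names whose blob hash is not yet present in blobs_dir.

    manifest maps file name -> blob hash for the requested revision
    (a hypothetical input used only for this illustration).
    """
    cached = {path.name for path in blobs_dir.iterdir()}
    return sorted(name for name, blob_hash in manifest.items() if blob_hash not in cached)


with tempfile.TemporaryDirectory() as tmp:
    blobs = Path(tmp) / "blobs"
    blobs.mkdir()
    # Blobs already cached after downloading version de9f2c.
    (blobs / "447d41b7c5e7b2558905c98733469aa9e132540c91e13c4cdd7bfc58b60cc650").touch()
    (blobs / "b098244a71fbe69ce149682d9072a7629f7e908c").touch()

    # Version 7fd25e changes README.md but keeps pytorch_model.bin.
    manifest_7fd25e = {
        "README.md": "c2d28f08b86bacac392140d0f6b26c05d567321f",
        "pytorch_model.bin": "447d41b7c5e7b2558905c98733469aa9e132540c91e13c4cdd7bfc58b60cc650",
    }
    print(files_to_download(manifest_7fd25e, blobs))  # ['README.md']
```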
.no_exist
In addition to the blobs, refs, and snapshots folders, a .no_exist folder may be found in the cache. This folder records files that you tried to download but that do not exist in the model repository. Its main purpose is to avoid repeating HTTP requests for files that are already known to be missing, which accelerates model loading.
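The idea can be sketched as a negative cache built from empty marker files. The layout `.no_exist/<commit hash>/<file name>` and the helper names below are assumptions made for illustration; the actual on-disk layout used by the openMind Hub Client is not specified in this document.

```python
import tempfile
from pathlib import Path


def marker_path(repo_cache_dir: Path, commit_hash: str, filename: str) -> Path:
    """Location of the marker file (assumed layout, for illustration only)."""
    return repo_cache_dir / ".no_exist" / commit_hash / filename


def mark_missing(repo_cache_dir: Path, commit_hash: str, filename: str) -> None:
    """Record that the file does not exist in the repository at this commit."""
    marker = marker_path(repo_cache_dir, commit_hash, filename)
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()


def is_known_missing(repo_cache_dir: Path, commit_hash: str, filename: str) -> bool:
    """Return True if an earlier attempt already found the file to be missing."""
    return marker_path(repo_cache_dir, commit_hash, filename).is_file()


with tempfile.TemporaryDirectory() as tmp:
    repo_dir = Path(tmp) / "models--chatglm3-6B"
    print(is_known_missing(repo_dir, "de9f2c", "optional_config.json"))  # False
    # After a request reports that the file is absent, remember the result.
    mark_missing(repo_dir, "de9f2c", "optional_config.json")
    print(is_known_missing(repo_dir, "de9f2c", "optional_config.json"))  # True
```

A caller would check `is_known_missing` before issuing a request and skip the request when it returns True.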