Download APIs
om_hub_download
Download a single file. The following is an example:
```python
from openmind_hub import om_hub_download

om_hub_download(
    repo_id="PyTorch-NPU/t5_small",
    filename="config.json",
    subfolder=None,
    repo_type=None,
    revision=None,
    cache_dir=None,
    local_dir="demo/t5_small",
    local_dir_use_symlinks="auto",
    user_agent=None,
    force_download=False,
    proxies=None,
    token=None,
    local_files_only=False,
    endpoint=None,
    resume_download=True,
)
```
- repo_id (str): target repository, in the format username/repository name. The username and repository name can contain letters, digits, dots (.), underscores (_), and hyphens (-).
- filename (str): name of the file to be downloaded.
- subfolder (str, optional): parent directory of the file in the repository. Set this parameter to None if the file has no parent directory or if filename already includes the parent directory.
- repo_type (str, optional): repository type, which can be "model", "dataset", or "space". The default value None indicates "model".
- revision (str, optional): branch name. Letters, digits, underscores (_), and hyphens (-) are allowed.
- cache_dir (str, Path, or None, optional): local directory where the cache is stored. Defaults to ~/.cache/openmind/hub.
- local_dir (str, Path, or None, optional): local directory to which the file is downloaded. By default, the symbolic link is created only in the cache.
- local_dir_use_symlinks ("auto" or bool, optional): used together with local_dir. If set to "auto", the system decides whether to create a symbolic link based on the file size. If True, symbolic links are created for all files. If False, no symbolic links are created. Defaults to "auto".
- user_agent (Dict, str, or None, optional): user agent information.
- force_download (bool, optional): whether to download the file even if it already exists in the cache. Defaults to False.
- proxies (Dict, optional): proxy information.
- token (str, optional): access token. It is not required for a public repository; for a private repository, a token with read permission on the target repository is required.
- local_files_only (bool, optional): if True, no download is attempted, and the cached file path is returned only if the file exists in the local cache. Defaults to False.
- endpoint (str, optional): domain name or IP address to access. None indicates the default production domain name.
- resume_download (bool, optional): whether to resume a previously interrupted download. Defaults to True.
- kwargs: accepted only for compatibility with third-party components.
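The naming rules above for repo_id and revision can be checked locally before a request is sent. The following is a minimal sketch: is_valid_repo_id and is_valid_revision are hypothetical helpers (not part of openmind_hub) that encode only the character rules stated in this section. The server may apply further restrictions, such as length limits, that are not described here.

```python
import re

# Hypothetical helpers: they encode only the character rules documented above.
# "username/repository name", each part made of letters, digits, ".", "_" or "-".
_REPO_ID_RE = re.compile(r"[A-Za-z0-9._-]+/[A-Za-z0-9._-]+")
# Branch names: letters, digits, "_" or "-" (dots are not listed as allowed).
_REVISION_RE = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_repo_id(repo_id: str) -> bool:
    """Return True if repo_id has the form 'username/repository name'."""
    return _REPO_ID_RE.fullmatch(repo_id) is not None


def is_valid_revision(revision: str) -> bool:
    """Return True if revision uses only the documented branch-name characters."""
    return _REVISION_RE.fullmatch(revision) is not None


if __name__ == "__main__":
    print(is_valid_repo_id("PyTorch-NPU/t5_small"))  # True
    print(is_valid_repo_id("t5_small"))  # False: the username part is missing
    print(is_valid_revision("main"))  # True
```

Using fullmatch rather than match ensures that the whole string, not just a prefix, follows the rules.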
snapshot_download
Download a snapshot of a repository: all of its files, or the subset selected by allow_patterns and ignore_patterns. The following is an example:
```python
from openmind_hub import snapshot_download

snapshot_download(
    repo_id="PyTorch-NPU/t5_small",
    repo_type=None,
    revision=None,
    cache_dir=None,
    local_dir="demo/t5_small",
    local_dir_use_symlinks="auto",
    library_name=None,
    library_version=None,
    user_agent=None,
    proxies=None,
    resume_download=True,
    force_download=False,
    token=None,
    local_files_only=False,
    allow_patterns="allowed_folder/*",  # Download only files in the allowed_folder directory.
    ignore_patterns=["*.log", "*.txt"],  # Skip .log and .txt files.
    max_workers=8,
    tqdm_class=None,
    endpoint=None,
)
```
- repo_id (str): target repository, in the format username/repository name. The username and repository name can contain letters, digits, dots (.), underscores (_), and hyphens (-).
- repo_type (str, optional): repository type, which can be "model", "dataset", or "space". The default value None indicates "model".
- revision (str, optional): branch name. Letters, digits, underscores (_), and hyphens (-) are allowed.
- cache_dir (str, Path, or None, optional): local directory where the cache is stored. Defaults to ~/.cache/openmind/hub.
- local_dir (str, Path, or None, optional): local directory to which the files are downloaded. By default, the symbolic links are created only in the cache.
- local_dir_use_symlinks ("auto" or bool, optional): used together with local_dir. If set to "auto", the system decides whether to create a symbolic link based on the file size. If True, symbolic links are created for all files. If False, no symbolic links are created. Defaults to "auto".
- library_name (str, optional): name of the library that initiates the request.
- library_version (str, optional): version of the library that initiates the request.
- user_agent (str or Dict, optional): user agent information sent with the request.
- proxies (Dict, optional): proxy information.
- resume_download (bool, optional): whether to resume a previously interrupted download. Defaults to True.
- force_download (bool, optional): whether to download the files even if they already exist in the cache. Defaults to False.
- token (str, optional): access token. It is not required for a public repository; for a private repository, a token with read permission on the target repository is required.
- local_files_only (bool, optional): if True, no download is attempted, and the cached path is returned only if the files exist in the local cache. Defaults to False.
- allow_patterns (List[str] or str, optional): only files matching these patterns are downloaded. For example, allow_patterns="allowed_folder/*" downloads only the files in the allowed_folder directory.
- ignore_patterns (List[str] or str, optional): files matching these patterns are not downloaded. For example, ignore_patterns="*.log" skips all .log files.
- max_workers (int, optional): number of threads used for downloading. Defaults to 8.
- tqdm_class (tqdm.auto.tqdm_asyncio, optional): tqdm class used for the progress bar. Defaults to None, which uses the default progress bar.
- endpoint (str, optional): domain name or IP address to access. None indicates the default production domain name.
- kwargs: accepted only for compatibility with third-party components.
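To preview which files a combination of allow_patterns and ignore_patterns selects, the filtering can be approximated with the Unix shell-style matching of Python's fnmatch module. This is a sketch under the assumption that openmind_hub matches the patterns against repository-relative file paths in a similar way; filter_repo_files is a hypothetical helper, and the library's exact matching rules may differ.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional, Union

Patterns = Optional[Union[str, List[str]]]


def _as_list(patterns: Patterns) -> Optional[List[str]]:
    """Normalize a single pattern or a list of patterns to a list (or None)."""
    if patterns is None:
        return None
    if isinstance(patterns, str):
        return [patterns]
    return list(patterns)


def filter_repo_files(
    files: Iterable[str],
    allow_patterns: Patterns = None,
    ignore_patterns: Patterns = None,
) -> List[str]:
    """Hypothetical helper: keep files that match an allow pattern and no ignore pattern."""
    allow = _as_list(allow_patterns)
    ignore = _as_list(ignore_patterns)
    selected = []
    for path in files:
        # If allow patterns are given, the file must match at least one of them.
        if allow is not None and not any(fnmatchcase(path, p) for p in allow):
            continue
        # The file must not match any ignore pattern.
        if ignore is not None and any(fnmatchcase(path, p) for p in ignore):
            continue
        selected.append(path)
    return selected


if __name__ == "__main__":
    repo_files = [
        "allowed_folder/model.bin",
        "allowed_folder/train.log",
        "other/model.bin",
        "README.txt",
    ]
    print(filter_repo_files(repo_files, "allowed_folder/*", ["*.log", "*.txt"]))
    # ['allowed_folder/model.bin']
```

Note that in fnmatch, "*" also matches "/", so "allowed_folder/*" would include files in subdirectories of allowed_folder as well.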
get_om_file_metadata
Obtain the metadata of a file. The following is an example:
```python
from openmind_hub import get_om_file_metadata

get_om_file_metadata(
    url="https://modelers.cn/api/v1/file/PyTorch-NPU/t5_small/info?ref=main&path=README.md",
    token=None,
    proxies=None,
    timeout=10,
)
```
- url (str): URL of the file metadata to query. The URL must start with the production domain name, which can be changed by setting the environment variable OPENMIND_HUB_ENDPOINT.
- token (str, optional): access token. It is not required for a public repository; for a private repository, a token with read permission on the target repository is required.
- proxies (Dict, optional): proxy information.
- timeout (int or float, optional): timeout of the request, in seconds. Defaults to 10.
- kwargs: accepted only for compatibility with third-party components.
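The example URL above has the shape {endpoint}/api/v1/file/{repo_id}/info?ref={branch}&path={file path}. Assuming this shape holds for other files (it is inferred from that single example, not from a specification), a hypothetical helper can build such URLs and honor the OPENMIND_HUB_ENDPOINT environment variable. The default domain below is likewise taken from the example.

```python
import os
from typing import Optional
from urllib.parse import urlencode

# Assumption: default production domain, taken from the example URL above.
DEFAULT_ENDPOINT = "https://modelers.cn"


def build_file_info_url(
    repo_id: str,
    path: str,
    revision: str = "main",
    endpoint: Optional[str] = None,
) -> str:
    """Hypothetical helper that mirrors the structure of the example metadata URL."""
    if endpoint is None:
        # The documented environment variable overrides the default domain.
        endpoint = os.environ.get("OPENMIND_HUB_ENDPOINT", DEFAULT_ENDPOINT)
    # Keep "/" readable in nested file paths; other special characters are
    # form-encoded by urlencode (for example, a space becomes "+").
    query = urlencode({"ref": revision, "path": path}, safe="/")
    return f"{endpoint.rstrip('/')}/api/v1/file/{repo_id}/info?{query}"


if __name__ == "__main__":
    print(build_file_info_url("PyTorch-NPU/t5_small", "README.md", endpoint=DEFAULT_ENDPOINT))
    # https://modelers.cn/api/v1/file/PyTorch-NPU/t5_small/info?ref=main&path=README.md
```

The resulting string could then be passed as the url argument of get_om_file_metadata.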
om_hub_url
Build the download URL of a file from the given repository ID and file name. The following is an example:
```python
from openmind_hub import om_hub_url

om_hub_url(
    repo_id="PyTorch-NPU/t5_small",
    filename="config.json",
    subfolder=None,
    revision=None,
    endpoint=None,
)
```
- repo_id (str): target repository, in the format username/repository name. The username and repository name can contain letters, digits, dots (.), underscores (_), and hyphens (-).
- filename (str): name of the file to be downloaded.
- subfolder (str, optional): parent directory of the file to be downloaded. This parameter is not needed if filename already includes the parent directory.
- revision (str, optional): branch of the file to be downloaded. Letters, digits, underscores (_), and hyphens (-) are allowed.
- endpoint (str, optional): domain name or IP address to access. None indicates the default production domain name.
- kwargs: accepted only for compatibility with third-party components.
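The subfolder and filename parameters describe the same repository path in two ways: subfolder="sub" with filename="config.json" refers to the same file as filename="sub/config.json". A minimal sketch of that equivalence follows. resolve_repo_path and encode_repo_path are hypothetical helpers, and the percent-encoding step is an assumption about how a URL builder would typically escape special characters, not a description of the internals of om_hub_url.

```python
from typing import Optional
from urllib.parse import quote


def resolve_repo_path(filename: str, subfolder: Optional[str] = None) -> str:
    """Hypothetical helper: combine subfolder and filename into one repository path."""
    # An empty subfolder is treated the same as None.
    if subfolder:
        filename = f"{subfolder.strip('/')}/{filename}"
    return filename


def encode_repo_path(filename: str, subfolder: Optional[str] = None) -> str:
    """Percent-encode the combined path for use in a URL, keeping '/' separators."""
    return quote(resolve_repo_path(filename, subfolder), safe="/")


if __name__ == "__main__":
    print(resolve_repo_path("config.json", "sub"))  # sub/config.json
    print(resolve_repo_path("sub/config.json"))  # sub/config.json
    print(encode_repo_path("my config.json", "sub dir"))  # sub%20dir/my%20config.json
```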
http_get
In general, use the om_hub_download method to download files, as it provides a more complete, secure, and easy-to-use download function. When using http_get, be careful not to overwrite important files. The following is an example:
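```python
from openmind_hub import http_get, om_hub_url

# Opening the file in "wb" mode truncates it, so choose the target path carefully.
# The with statement closes the file once the download finishes.
with open("/demo/file", "wb") as temp_file:
    http_get(
        url=om_hub_url(repo_id="owner/repo", filename="filename"),
        temp_file=temp_file,
        proxies=None,
        headers=None,
        displayed_filename=None,
        resume_size=0,
        expected_size=None,
        _nb_retries=5,
    )
```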
- url (str): URL of the file to be downloaded. The URL must start with the production domain name, which can be changed by setting the environment variable OPENMIND_HUB_ENDPOINT.
- temp_file (BinaryIO): file object to which the downloaded content is written. Each download overwrites this file.
- proxies (Dict, optional): proxy information.
- headers (Dict[str, str], optional): request headers built with the build_om_headers method, which include the access token.
- displayed_filename (str, optional): name of the file to be downloaded.
- resume_size (int or float, optional): size of the part of the file that has already been downloaded, in KB. Required only for resumable downloads.
- expected_size (int, optional): expected size of the file to be downloaded, in KB.
- _nb_retries (int, optional): maximum number of automatic retries when a request fails. Defaults to 5.
- kwargs: accepted only for compatibility with third-party components.
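Because temp_file is overwritten by each download, one way to follow the advice above is to download into a temporary file in the destination directory and move it into place only after the download has finished. The sketch below uses only the standard library. download_atomically is a hypothetical helper, and write_fn stands for whatever writes the content, for example lambda f: http_get(url=om_hub_url(repo_id="owner/repo", filename="filename"), temp_file=f).

```python
import os
import tempfile
from typing import BinaryIO, Callable


def download_atomically(dest_path: str, write_fn: Callable[[BinaryIO], object]) -> str:
    """Hypothetical helper: write to a temporary file, then move it to dest_path."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    os.makedirs(dest_dir, exist_ok=True)
    # Create the temporary file next to the destination so that os.replace
    # stays on a single file system.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".incomplete")
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            write_fn(tmp_file)
        # Replace the destination only after the content was written completely.
        os.replace(tmp_path, dest_path)
    except BaseException:
        # On any failure, delete the partial file and leave dest_path unchanged.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return dest_path
```

If write_fn raises an exception (for example, after http_get exhausts its retries), any existing file at dest_path keeps its previous content.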
try_to_load_from_cache
Search the cache for the path of a file. If the file is not found in the cache, None is returned. The following is an example:
```python
from openmind_hub import try_to_load_from_cache

try_to_load_from_cache(
    repo_id="PyTorch-NPU/t5_small",
    filename="config.json",
    cache_dir=None,
    revision=None,
    repo_type=None,
)
```
- repo_id (str): name of the repository where the cached file is located, in the format username/repository name. The username and repository name can contain letters, digits, dots (.), underscores (_), and hyphens (-).
- filename (str): name of the cached file.
- cache_dir (str, Path, or None, optional): cache directory. Defaults to ~/.cache/openmind/hub.
- revision (str, optional): branch of the cached file in the repository. Letters, digits, underscores (_), and hyphens (-) are allowed. Defaults to "main".
- repo_type (str, optional): repository type, which can be "model", "dataset", or "space". The default value None indicates "model".
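A common pattern is to check the cache first and download only on a cache miss. The sketch below passes the lookup and the download in as callables so that it runs without network access; resolve_file is a hypothetical helper. In practice, the two callables would wrap try_to_load_from_cache and om_hub_download with the same repo_id and filename, as in the commented example at the end of the block, assuming om_hub_download returns the local path of the file (its return value is not stated in this section).

```python
from typing import Callable, Optional


def resolve_file(
    lookup: Callable[[], Optional[str]],
    download: Callable[[], str],
) -> str:
    """Hypothetical helper: return the cached path if found, otherwise download."""
    cached = lookup()
    # Defensive check: treat any non-string result (such as None) as a cache miss.
    if isinstance(cached, str):
        return cached
    return download()


# Example wiring (requires openmind_hub and, on a cache miss, network access):
# from openmind_hub import om_hub_download, try_to_load_from_cache
# path = resolve_file(
#     lambda: try_to_load_from_cache(repo_id="PyTorch-NPU/t5_small", filename="config.json"),
#     lambda: om_hub_download(repo_id="PyTorch-NPU/t5_small", filename="config.json"),
# )
```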