Download APIs

om_hub_download

Download a single file. The following is an example:

python
from openmind_hub import om_hub_download

om_hub_download(
    repo_id="PyTorch-NPU/t5_small",
    filename="config.json",
    subfolder=None,
    repo_type=None,
    revision=None,
    cache_dir=None,
    local_dir="demo/t5_small",
    local_dir_use_symlinks="auto",
    user_agent=None,
    force_download=False,
    proxies=None,
    token=None,
    local_files_only=False,
    endpoint=None,
    resume_download=True,
)
  • repo_id (str): target repository, in the format of username/repository name. For the username and repository name, letters, digits, dots (.), underscores (_), and hyphens (-) are allowed.
  • filename (str): file to be downloaded.
  • subfolder (str, optional): parent directory of the file. If the file does not have a parent directory or the filename contains the parent directory, set this parameter to None.
  • repo_type (str, optional): repository type, which can be "model", "dataset", or "space". The default value is None, which indicates "model".
  • revision (str, optional): branch name. Letters, digits, underscores (_), and hyphens (-) are allowed.
  • cache_dir (str, Path, or None, optional): local path where the cache is stored. Defaults to ~/.cache/openmind/hub.
  • local_dir (str, Path, or None, optional): local path to which the file is downloaded. By default, the .symlink file is created only in the cache.
  • local_dir_use_symlinks ("auto" or bool, optional): used together with local_dir. If it is set to "auto", the system determines whether to create a .symlink file based on the file size. If the value is True, the system creates a .symlink file for all files. If the value is False, the system does not create a .symlink file for any file. The default value is "auto".
  • user_agent (Dict, str, or None, optional): user agent information.
  • force_download (bool, optional): whether to download the file even if it already exists in the cache. Defaults to False.
  • proxies (Dict, optional): proxy information.
  • token (str, optional): ignore this parameter for a public repository; for a private repository, an access token that has the read permission on the target repository is required.
  • local_files_only (bool, optional): if True, nothing is downloaded; the cache path is returned only if the file already exists in the local cache. Defaults to False.
  • endpoint (str, optional): domain name or IP address to be accessed. None indicates the default production environment domain name.
  • resume_download (bool, optional): whether to resume the previously interrupted download. The default value is True.
  • kwargs: required only for compatibility with third-party components.
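
The repo_id rule above (username/repository name, each part limited to letters, digits, dots, underscores, and hyphens) can be checked locally before a request is sent. The following is a minimal sketch: is_valid_repo_id is a hypothetical helper, not part of openmind_hub, it assumes ASCII letters, and the library may enforce further rules such as length limits.

```python
import re

# Hypothetical helper, not part of openmind_hub.
# Exactly one "username/repository name" pair; each part may contain only
# ASCII letters, digits, dots, underscores, and hyphens (assumed rule).
_REPO_ID_RE = re.compile(r"[A-Za-z0-9._-]+/[A-Za-z0-9._-]+")


def is_valid_repo_id(repo_id: str) -> bool:
    """Return True if repo_id matches the documented character rule."""
    return _REPO_ID_RE.fullmatch(repo_id) is not None


print(is_valid_repo_id("PyTorch-NPU/t5_small"))  # well-formed
print(is_valid_repo_id("t5_small"))              # username part is missing
```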

snapshot_download

Download files from a repository. The following is an example:

python
from openmind_hub import snapshot_download

snapshot_download(
    repo_id="PyTorch-NPU/t5_small",
    repo_type=None,
    revision=None,
    cache_dir=None,
    local_dir="demo/t5_small",
    local_dir_use_symlinks="auto",
    library_name=None,
    library_version=None,
    user_agent=None,
    proxies=None,
    resume_download=True,
    force_download=False,
    token=None,
    local_files_only=False,
    allow_patterns="allowed_folder/*",    # Download only files in the allowed_folder directory.
    ignore_patterns=["*.log", "*.txt"],    # Do not download .log and .txt files.
    max_workers=8,
    tqdm_class=None,
    endpoint=None,
)
  • repo_id (str): target repository, in the format of username/repository name. For the username and repository name, letters, digits, dots (.), underscores (_), and hyphens (-) are allowed.
  • repo_type (str, optional): repository type, which can be "model", "dataset", or "space". The default value is None, which indicates "model".
  • revision (str, optional): branch name. Letters, digits, underscores (_), and hyphens (-) are allowed.
  • cache_dir (str, Path, or None, optional): local path where the cache is stored. Defaults to ~/.cache/openmind/hub.
  • local_dir (str, Path, or None, optional): local path to which the file is downloaded. By default, the .symlink file is created only in the cache.
  • local_dir_use_symlinks ("auto" or bool, optional): used together with local_dir. If it is set to "auto", the system determines whether to create a .symlink file based on the file size. If the value is True, the system creates a .symlink file for all files. If the value is False, the system does not create a .symlink file for any file. The default value is "auto".
  • library_name (str, optional): name of the library that initiates an external request.
  • library_version (str, optional): version of the library that initiates an external request.
  • user_agent (str or Dict, optional): information about the user who initiates an external request.
  • proxies (Dict, optional): proxy information.
  • resume_download (bool, optional): whether to resume the previously interrupted download. The default value is True.
  • force_download (bool, optional): whether to download the files even if they already exist in the cache. Defaults to False.
  • token (str, optional): ignore this parameter for a public repository; for a private repository, an access token that has the read permission on the target repository is required.
  • local_files_only (bool, optional): if True, nothing is downloaded; the cache path is returned only if the files already exist in the local cache. Defaults to False.
  • allow_patterns (List[str] or str, optional): only files that match these patterns are downloaded. For example, allow_patterns="allowed_folder/*" downloads only the files in the allowed_folder directory.
  • ignore_patterns (List[str] or str, optional): files that match these patterns are not downloaded. For example, ignore_patterns="*.log" skips all .log files.
  • max_workers (int, optional): number of threads used for download. Defaults to 8.
  • tqdm_class (tqdm.auto.tqdm_asyncio, optional): tqdm class used by the progress bar. Defaults to None, indicating that the default progress bar is used.
  • endpoint (str, optional): domain name or IP address to be accessed. None indicates the default production environment domain name.
  • kwargs: required only for compatibility with third-party components.
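
To see how allow_patterns and ignore_patterns interact, the selection can be simulated locally. The following is a minimal sketch that assumes shell-style (fnmatch) matching against repository-relative paths; filter_files and the file names are hypothetical, and the matching that snapshot_download actually performs may differ in detail.

```python
from fnmatch import fnmatchcase


def filter_files(paths, allow_patterns=None, ignore_patterns=None):
    """Keep paths that match an allow pattern and match no ignore pattern."""
    # Both parameters accept a single pattern or a list of patterns.
    if isinstance(allow_patterns, str):
        allow_patterns = [allow_patterns]
    if isinstance(ignore_patterns, str):
        ignore_patterns = [ignore_patterns]

    selected = []
    for path in paths:
        if allow_patterns is not None and not any(
            fnmatchcase(path, pattern) for pattern in allow_patterns
        ):
            continue  # not covered by any allow pattern
        if ignore_patterns is not None and any(
            fnmatchcase(path, pattern) for pattern in ignore_patterns
        ):
            continue  # explicitly ignored
        selected.append(path)
    return selected


# Hypothetical repository listing, for illustration only.
files = [
    "allowed_folder/model.bin",
    "allowed_folder/train.log",
    "allowed_folder/notes.txt",
    "README.md",
]
print(filter_files(files, "allowed_folder/*", ["*.log", "*.txt"]))
```

In this sketch a file must pass both filters, so ignore_patterns takes precedence when a file matches both lists.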

get_om_file_metadata

Obtain the file information. The following is an example:

python
from openmind_hub import get_om_file_metadata

get_om_file_metadata(
    url='https://modelers.cn/api/v1/file/PyTorch-NPU/t5_small/info?ref=main&path=README.md',
    token=None,
    proxies=None,
    timeout=10,
)
  • url (str): URL of the file information to be obtained. The URL must start with the production environment domain name (which can be changed by setting the environment variable OPENMIND_HUB_ENDPOINT).
  • token (str, optional): ignore this parameter for a public repository; for a private repository, an access token that has the read permission on the target repository is required.
  • proxies (Dict, optional): proxy information.
  • timeout (int or float, optional): timeout interval of an external request, in seconds. Defaults to 10.
  • kwargs: required only for compatibility with third-party components.
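
The requirement that url start with the production environment domain name can be checked before the call. The following is a minimal sketch: url_matches_endpoint is a hypothetical helper, and the default endpoint is an assumption taken from the example URL above, not a value confirmed by the library.

```python
import os

# Assumed default, taken from the example URL; the library's real default may differ.
DEFAULT_ENDPOINT = "https://modelers.cn"


def url_matches_endpoint(url: str) -> bool:
    """Return True if url is located under the configured endpoint."""
    endpoint = os.environ.get("OPENMIND_HUB_ENDPOINT") or DEFAULT_ENDPOINT
    endpoint = endpoint.rstrip("/")
    # Require a "/" after the endpoint so that a look-alike host such as
    # "https://modelers.cn.example.org" is not accepted.
    return url == endpoint or url.startswith(endpoint + "/")


print(url_matches_endpoint(
    "https://modelers.cn/api/v1/file/PyTorch-NPU/t5_small/info?ref=main&path=README.md"
))
```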

om_hub_url

Concatenate the file download URL based on the given repository name and file name. The following is an example:

python
from openmind_hub import om_hub_url

om_hub_url(
    repo_id="PyTorch-NPU/t5_small",
    filename="config.json",
    subfolder=None,
    revision=None,
    endpoint=None,
)
  • repo_id (str): target repository, in the format of username/repository name. For the username and repository name, letters, digits, dots (.), underscores (_), and hyphens (-) are allowed.
  • filename (str): file to be downloaded.
  • subfolder (str, optional): parent directory of the file to be downloaded. If the value of filename contains the parent directory, you do not need to set this parameter.
  • revision (str, optional): branch of the file to be downloaded. The value consists of letters, digits, underscores (_), and hyphens (-).
  • endpoint (str, optional): domain name or IP address to be accessed. None indicates the default production environment domain name.
  • kwargs: required only for compatibility with third-party components.
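
The relationship between subfolder and filename can be illustrated as follows. This is a minimal sketch that assumes the two values are joined with a forward slash; repo_file_path and the checkpoints directory are hypothetical, and the final URL layout produced by om_hub_url is not shown here.

```python
from typing import Optional


def repo_file_path(filename: str, subfolder: Optional[str] = None) -> str:
    """Combine subfolder and filename into one repository-relative path."""
    if subfolder:
        return f"{subfolder.strip('/')}/{filename}"
    return filename


# Under this assumption, both calls describe the same file.
print(repo_file_path("config.json", subfolder="checkpoints"))
print(repo_file_path("checkpoints/config.json"))
```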

http_get

In most cases, use om_hub_download to download files, because it provides a more complete, secure, and easy-to-use download function. http_get writes directly to the file you pass in, so take care not to overwrite important files. The following is an example:

python
from openmind_hub import http_get, om_hub_url

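# Open the target file in a with block so that it is closed after the download.
with open('/demo/file', 'wb') as temp_file:
    http_get(
        url=om_hub_url(repo_id="owner/repo", filename="filename"),
        temp_file=temp_file,
        proxies=None,
        headers=None,
        displayed_filename=None,
        resume_size=0,
        expected_size=None,
        _nb_retries=5,
    )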
  • url (str): URL of the file to be downloaded. The URL must start with the production environment domain name (which can be changed by setting the environment variable OPENMIND_HUB_ENDPOINT).
  • temp_file (BinaryIO): file used to store the downloaded content. Each download operation will overwrite this file.
  • proxies (Dict, optional): proxy information.
  • headers (Dict[str, str], optional): request header constructed using the build_om_headers method, which contains access token information.
  • displayed_filename (str, optional): name of the file to be downloaded.
  • resume_size (int or float, optional): size of the downloaded file, in KB. This parameter is required only for resumable download.
  • expected_size (int, optional): expected size of the file to be downloaded, in KB.
  • _nb_retries (int, optional): maximum number of automatic retries when a request fails. The default value is 5.
  • kwargs: required only for compatibility with third-party components.
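
One way to guard against overwriting an important file is to open the target in exclusive-creation mode, which fails if the file already exists. The following is a minimal sketch using only the Python standard library; open_new_file is a hypothetical helper, and this mode is not suitable for a resumable download, which has to append to an existing file.

```python
import os
import tempfile


def open_new_file(path):
    """Open path for binary writing; raise FileExistsError if it already exists."""
    return open(path, "xb")


with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "file")

    # First use: the file does not exist yet, so it is created.
    with open_new_file(target) as temp_file:
        temp_file.write(b"downloaded bytes")  # http_get would write here

    # Second use: the existing file is protected.
    try:
        open_new_file(target)
    except FileExistsError:
        print("refusing to overwrite", os.path.basename(target))
```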

try_to_load_from_cache

Search for the path of a file in the cache. If the file is not in the cache, None is returned. The following is an example:

python
from openmind_hub import try_to_load_from_cache

try_to_load_from_cache(
    repo_id="PyTorch-NPU/t5_small",
    filename="config.json",
    cache_dir=None,
    revision=None,
    repo_type=None,
)
  • repo_id (str): name of the repository where the cache file is located. The format is username/repository name. For the username and repository name, letters, digits, dots (.), underscores (_), and hyphens (-) are allowed.
  • filename (str): name of a cache file.
  • cache_dir (str, Path, or None, optional): cache directory. Defaults to ~/.cache/openmind/hub.
  • revision (str, optional): branch of the cache file in the repository. Letters, digits, underscores (_), and hyphens (-) are allowed. Defaults to "main".
  • repo_type (str, optional): repository type, which can be "model", "dataset", or "space". The default value is None, which indicates "model".
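
Because None signals a cache miss, a cache lookup combines naturally with a download fallback. The following is a minimal sketch of that pattern: resolve_path is a hypothetical helper that takes the lookup and download steps as callables, and the stand-in functions and paths below are for illustration only.

```python
from typing import Callable, Optional


def resolve_path(
    lookup: Callable[[], Optional[str]],
    download: Callable[[], str],
) -> str:
    """Return the cached path if there is one; otherwise download the file."""
    cached = lookup()
    if cached is not None:
        return cached
    return download()


# Stand-ins for illustration; the paths are hypothetical.
def cache_miss():
    return None


def fake_download():
    return "/tmp/demo/config.json"


print(resolve_path(cache_miss, fake_download))
```

With openmind_hub, lookup could wrap try_to_load_from_cache and download could wrap om_hub_download, assuming that om_hub_download returns the local path of the downloaded file.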