Trainer APIs
openmind.TrainingArguments Class
The TrainingArguments class is used to configure parameters of training tasks, including hyperparameters, model saving paths, log recording options, and learning rates required during training.
Parameters
- Parameters supported by the TrainingArguments classes of both PyTorch and MindSpore
| Name | PyTorch Type | MindSpore Type | Description | Default Value for PyTorch | Default Value for MindSpore |
|---|---|---|---|---|---|
| output_dir | str | str | Output directory | None | "./output" |
| overwrite_output_dir | bool | bool | Whether to overwrite the output directory | False | False |
| seed | int | int | Random seed | 42 | 42 |
| use_cpu | bool | bool | Whether to use a CPU | False | False |
| do_train | bool | bool | Whether to perform training | False | False |
| do_eval | bool | bool | Whether to perform evaluation | False | False |
| do_predict | bool | bool | Whether to perform inference | False | False |
| num_train_epochs | float | float | Total number of training epochs | 3.0 | 3.0 |
| resume_from_checkpoint | str | str | Preloaded weights | None | None |
| evaluation_strategy | Union[IntervalStrategy, str] | Union[IntervalStrategy, str] | Evaluation strategy | "no" | "no" |
| per_device_train_batch_size | int | int | Training batch size of each device | 8 | 8 |
| per_device_eval_batch_size | int | int | Evaluation batch size of each device | 8 | 8 |
| per_gpu_train_batch_size | int | int | Training batch size of each GPU (Not recommended.) | None | None |
| per_gpu_eval_batch_size | int | int | Evaluation batch size of each GPU (Not recommended.) | None | None |
| gradient_accumulation_steps | int | int | Gradient accumulation steps | 1 | 1 |
| ignore_data_skip | bool | bool | Whether to ignore data skipping during resumable training | False | False |
| dataloader_drop_last | bool | bool | Whether the data loader drops the last incomplete batch | False | True |
| dataloader_num_workers | int | int | Number of processes in the data loader | 0 | 8 |
| optim | Union[OptimizerNames, str] | Union[OptimizerType, str] | Optimizer | "adamw_torch" | "fp32_adamw" |
| adam_beta1 | float | float | Adam optimizer beta1 | 0.9 | 0.9 |
| adam_beta2 | float | float | Adam optimizer beta2 | 0.999 | 0.999 |
| adam_epsilon | float | float | Adam optimizer epsilon | 1e-8 | 1e-8 |
| weight_decay | float | float | Weight decay | 0.0 | 0.0 |
| lr_scheduler_type | Union[SchedulerType, str] | Union[LrSchedulerType, str] | Type of the learning rate scheduler | "linear" | "cosine" |
| learning_rate | float | float | Learning rate. | 5e-5 | 5e-5 |
| warmup_ratio | float | float | Warmup ratio | 0.0 | None |
| warmup_steps | int | int | Warmup steps | 0 | 0 |
| max_grad_norm | float | float | Maximum norm of gradient clipping | 1.0 | 1.0 |
| logging_strategy | Union[IntervalStrategy, str] | Union[LoggingIntervalStrategy, str] | Logging strategy | "steps" | "steps" |
| logging_steps | float | float | Number of logging steps | 500 | 1 |
| save_steps | float | float | Weight saving steps | 500 | 500 |
| save_strategy | str | Union[SaveIntervalStrategy, str] | Weight saving strategy | "steps" | "steps" |
| save_total_limit | int | int | Maximum number of weights that can be saved | None | 5 |
| save_on_each_node | bool | bool | Whether to save weights on each node (rather than only on the primary node) | False | True |
| hub_model_id | str | str | Hub model ID | None | None |
| hub_strategy | Union[HubStrategy, str] | Union[HubStrategy, str] | Hub push strategy | "every_save" | "every_save" |
| hub_token | str | str | Hub token | None | None |
| hub_private_repo | bool | bool | Hub private repository | False | False |
| hub_always_push | bool | bool | Whether to always push a model to the Hub | False | False |
| data_seed | int | int | Random seed used by the data sampler | None | None |
| eval_steps | float | float | Number of update steps between two evaluations | None | None |
| push_to_hub | bool | bool | Whether to push a model to the Hub | False | False |
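Three of the shared parameters above jointly determine the effective global batch size: the per-device batch size, the gradient accumulation steps, and the number of devices. A plain-Python arithmetic sketch (the helper name is illustrative, not part of the API):

```python
def effective_batch_size(per_device_train_batch_size: int,
                         gradient_accumulation_steps: int,
                         world_size: int) -> int:
    """Number of samples that contribute to one optimizer update:
    per-device batch size x accumulation steps x number of devices."""
    return per_device_train_batch_size * gradient_accumulation_steps * world_size

# Defaults (8 per device, no accumulation) on a single device:
print(effective_batch_size(8, 1, 1))   # 8
# 8 per device, gradients accumulated over 4 steps, on 8 devices:
print(effective_batch_size(8, 4, 8))   # 256
```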
- Parameters supported only by PyTorch
| Name | Type | Description | Default Value |
|---|---|---|---|
| optim_args | str | Optimizer parameters | None |
| label_names | List[str] | Label name | None |
| load_best_model_at_end | bool | Whether to load the optimal model at the end | False |
| metric_for_best_model | str | Metric for the optimal model | None |
| greater_is_better | bool | Whether a greater metric is better | None |
| label_smoothing_factor | float | Label smoothing factor | 0.0 |
| include_inputs_for_metrics | bool | Whether the metric includes inputs | False |
| prediction_loss_only | bool | Whether to return only the loss when evaluation or prediction is performed | False |
| eval_accumulation_steps | int | Number of prediction steps to accumulate output tensors | None |
| eval_delay | float | Number of epochs or steps to wait before the first evaluation, depending on evaluation_strategy | None |
| max_steps | int | Maximum number of training steps | -1 |
| lr_scheduler_kwargs | dict | Additional arguments for the scheduler | {} |
| log_level | str | Log level | "passive" |
| log_level_replica | str | Log level used on replicas | "warning" |
| log_on_each_node | bool | Whether to log on each node, or only on the primary node, in distributed training | True |
| logging_dir | str | Directory that saves logs | None |
| logging_first_step | bool | Whether to record the first global_step | False |
| logging_nan_inf_filter | bool | Whether to filter 'nan' and 'inf' loss for log recording | True |
| save_safetensors | bool | Whether to save the weights in safetensor format | True |
| save_only_model | bool | Whether to save only model status during checkpointing | False |
| jit_mode_eval | bool | Whether to use PyTorch JIT for inference | False |
| use_ipex | bool | Whether to use Intel extensions (Not supported) | False |
| bf16 | bool | Whether to use the bf16 format | False |
| fp16 | bool | Whether to use the fp16 format | False |
| tf32 | bool | Whether to use the tf32 format (Not supported) | None |
| fp16_opt_level | str | Apex AMP optimization level for fp16 training ("O0" to "O3") | "O1" |
| fp16_backend | str | Backend used by fp16 | "auto" |
| half_precision_backend | str | Backend used for mixed-precision training | "auto" |
| bf16_full_eval | bool | Whether to use bf16 during evaluation | False |
| fp16_full_eval | bool | Whether to use fp16 during evaluation | False |
| disable_tqdm | bool | Whether to disable the progress bar | None |
| remove_unused_columns | bool | Whether to automatically delete columns that are not used by the model's forward method | True |
| fsdp | Union[List[FSDPOption], str] | Whether to use FSDP | None |
| fsdp_config | Union[dict, str] | FSDP configuration | None |
| local_rank | int | Process ID for distributed training | -1 |
| tpu_num_cores | int | Number of cores used for TPU training (Not supported) | None |
| past_index | int | Index when hidden states are used for prediction | -1 |
| ddp_backend | str | Backend for DDP distributed training | None |
| run_name | str | Running descriptor | None |
| deepspeed | str | DeepSpeed configuration | None |
| accelerator_config | str | Accelerate configuration | None |
| debug | Union[str, List[DebugOption]] | Enablement for one or more debugging functions | None |
| length_column_name | str | Column name for precomputed length | "length" |
| group_by_length | bool | Whether to group together samples with roughly the same length in the training dataset | False |
| ddp_find_unused_parameters | bool | Whether to pass find_unused_parameters to DistributedDataParallel | None |
| report_to | List[str] | List of integrations to report the results and logs | "all" |
| ddp_bucket_cap_mb | int | Value of bucket_cap_mb passed to DistributedDataParallel | None |
| ddp_broadcast_buffers | bool | Whether to pass the value of ddp_broadcast_buffers to DistributedDataParallel | None |
| ddp_timeout | int | Timeout for DDP calls | 1800 |
| dataloader_pin_memory | bool | Whether to pin memory in the data loader | True |
| dataloader_persistent_workers | bool | Whether to keep data loader worker processes (and their dataset instances) alive after the dataset has been consumed once | False |
| dataloader_prefetch_factor | int | Number of batches preloaded by each worker | None |
| skip_memory_metrics | bool | Whether to skip adding the memory profiler report to metrics | True |
| gradient_checkpointing | bool | Whether to use gradient checkpoints to save memory | False |
| gradient_checkpointing_kwargs | dict | Arguments related to gradient checkpoints | None |
| auto_find_batch_size | bool | Whether to find a batch size that fits into memory automatically through exponential decay | False |
| full_determinism | bool | Whether to call enable_full_determinism instead of set_seed to ensure reproducible results in distributed training | False |
| torchdynamo | str | Backend compiler of TorchDynamo (Not supported) | None |
| ray_scope | str | Scope to use when doing hyperparameter search with Ray | "last" |
| use_mps_device | bool | Whether to use mps device (Not supported) | False |
| torch_compile | bool | Whether to use PyTorch 2.0 to compile the model (Not supported) | False |
| torch_compile_backend | str | Backend used by torch.compile (Not supported) | None |
| torch_compile_mode | str | torch.compile mode (Not supported) | None |
| split_batches | bool | Whether to split batches generated by the data loader to devices | None |
| include_tokens_per_second | bool | Whether to calculate tokens per second of each device | None |
| include_num_input_tokens_seen | bool | Whether to track the number of input tokens seen throughout training | None |
| neftune_noise_alpha | float | NEFTune noise alpha; setting a value activates NEFTune noise embeddings | None |
| optim_target_modules | Union[str, List[str]] | Target module to be optimized | None |
- Parameters supported only by MindSpore
| Name | Type | Description | Default Value |
|---|---|---|---|
| only_save_strategy | bool | Whether a task directly exits after saving the strategy file | False |
| auto_trans_ckpt | bool | Whether to enable automatic weight transformation | False |
| src_strategy | str | Distributed strategy file for preloaded weights | None |
| batch_size | int | Training batch size of each device. per_device_train_batch_size will be overwritten. | None |
| sink_mode | bool | Whether to directly sink data to devices through channels. | True |
| sink_size | int | Size of sunk data for training or evaluation in each step. | 2 |
| mode | int | Running in GRAPH_MODE (0) or PYNATIVE_MODE (1). | 0 |
| resume_training | bool | Whether to enable resumable training | False |
| remote_save_url | str | OBS saving path | None |
| device_id | int | Device ID | 0 |
| device_target | str | Target device where the task is executed. Value options: 'Ascend', 'GPU', or 'CPU'. | "Ascend" |
| enable_graph_kernel | bool | Whether to use graph fusion. | False |
| graph_kernel_flags | str | Level of graph fusion. | "--opt_level=0" |
| save_graphs | bool | Whether to save computational graphs | False |
| save_graphs_path | str | Path for saving computational graphs | "./graph" |
| max_call_depth | int | Maximum depth of a function call. | 10000 |
| max_device_memory | str | Maximum available memory of the device. | "1024GB" |
| use_parallel | bool | Whether to enable the parallel mode | False |
| parallel_mode | int | Parallel mode: data parallel (0), semi-automatic parallel (1), automatic parallel (2), or hybrid parallel (3). | 1 |
| gradients_mean | bool | Whether to execute the mean operator after gradient AllReduce | False |
| loss_repeated_mean | bool | Whether to execute the mean operator backwards during repeated computation. | False |
| enable_alltoall | bool | Indicates whether to generate the AllToAll communication operator during communication. | False |
| enable_parallel_optimizer | bool | Whether to enable optimizer parallelism | False |
| full_batch | bool | If the entire batch dataset is loaded in automatic parallel mode, set full_batch to True. This API is not recommended. Replace it with dataset_strategy. | True |
| dataset_strategy | Union[str, tuple] | Dataset sharding strategy. | "full_batch" |
| search_mode | str | Strategy search mode, which is valid only in automatic parallel mode. This API is an experimental API. Exercise caution when using this API. | "sharding_propagation" |
| data_parallel | int | Data parallelism | 1 |
| gradient_accumulation_shard | bool | Whether the gradient accumulation variable is sharded along the data parallelism dimension. | False |
| parallel_optimizer_threshold | int | Sets the threshold for parameter optimizer. | 64 |
| optimizer_weight_shard_size | int | Sets the size of the communicator for which the optimizer weight is sharded. | -1 |
| strategy_ckpt_save_file | str | Path for saving distribution strategy files. | "./ckpt_strategy.ckpt" |
| model_parallel | int | Model parallelism | 1 |
| expert_parallel | int | Expert parallelism | 1 |
| pipeline_stage | int | Pipeline parallelism | 1 |
| gradient_aggregation_group | int | Size of a gradient communication operation fusion group. | 4 |
| micro_batch_num | int | Number of micro-batches for pipeline parallel computation | 1 |
| micro_batch_interleave_num | int | Number of concurrent copies | 1 |
| use_seq_parallel | bool | Whether to enable sequence parallelism. | False |
| vocab_emb_dp | bool | Whether to split the vocabulary only along the data parallelism dimension. | True |
| expert_num | int | Number of experts. | 1 |
| capacity_factor | float | Expert capacity factor. | 1.05 |
| aux_loss_factor | float | Auxiliary loss contribution factor. | 0.05 |
| num_experts_chosen | int | Number of experts selected for each token. | 1 |
| recompute | bool | Recomputation | False |
| select_recompute | bool | Selective recomputation | False |
| parallel_optimizer_comm_recompute | bool | Whether to recalculate the AllGather communication introduced by the optimizer parallelism. | False |
| mp_comm_recompute | bool | Whether to recalculate the communication operations introduced by model parallelism. | True |
| recompute_slice_activation | bool | Whether to slice the cell output stored in the memory. | False |
| layer_scale | bool | Whether to enable layer decay. | False |
| layer_decay | float | Layer decay coefficient. | 0.65 |
| lr_end | float | End learning rate. | 1e-6 |
| warmup_lr_init | float | Initial learning rate in the warm-up phase. | 0.0 |
| warmup_epochs | int | Performs linear warmup during the warmup_epochs portion of the total number of steps. | None |
| lr_scale | bool | Whether to enable learning rate scaling. | False |
| lr_scale_factor | int | Learning rate scaling factor. | 256 |
| python_multiprocessing | bool | Whether to enable the Python multi-process mode. | False |
| numa_enable | bool | Whether to enable NUMA by default. | False |
| prefetch_size | int | Sets the queue capacity of threads in a pipe. | 1 |
| wrapper_type | str | Class name of the wrapper. | "MFTrainOneStepCell" |
| scale_sense | Union[str, float] | Value or class name of scale sense. | "DynamicLossScaleUpdateCell" |
| loss_scale_value | int | Initial loss scaling factor. | 65536 |
| loss_scale_factor | int | Increment and decrement factors of the loss scaling coefficient. | 2 |
| loss_scale_window | int | Maximum number of consecutive training steps for increasing the loss scaling coefficient. | 1000 |
| use_clip_grad | bool | Whether to enable gradient clipping. | True |
| train_dataset | str | Training dataset path | None |
| eval_dataset | str | Evaluation dataset path | None |
| dataset_task | str | Task type corresponding to a dataset | None |
| dataset_type | str | Dataset type | None |
| train_dataset_in_columns | list[str] | Training dataset input column names | None |
| train_dataset_out_columns | list[str] | Training dataset output column names | None |
| eval_dataset_in_columns | list[str] | Evaluation dataset input column names | None |
| eval_dataset_out_columns | list[str] | Evaluation dataset output column names | None |
| shuffle | bool | Whether to shuffle the training dataset | True |
| repeat | int | Number of repetitions of the training dataset | 1 |
| metric_type | Union[List[str], str] | Metric type. | None |
| save_seconds | int | Checkpoints are saved every X seconds. | None |
| integrated_save | bool | Whether to merge and save the split tensors in the automatic parallelism scenario. | None |
| eval_epochs | int | Number of epoch intervals between evaluations. 1 indicates that evaluation is performed at the end of each epoch. | None |
| profile | bool | Whether to enable profiling | False |
| profile_start_step | int | Start step of profiling | 1 |
| profile_end_step | int | End step of profiling | 10 |
| init_start_profile | bool | Whether to enable data collection during profiler initialization. | False |
| profile_communication | bool | Whether to collect communication performance data during multi-device training. | False |
| profile_memory | bool | Whether to collect tensor memory data. | True |
| auto_tune | bool | Whether to enable automatic data acceleration. | False |
| filepath_prefix | str | Save path and file prefix of the optimized global configuration. | "./autotune" |
| autotune_per_step | int | Sets the step interval for adjusting the automatic data acceleration configuration. | 10 |
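Putting the tables above to use, a minimal construction sketch (assuming the openmind package is installed; the values shown are illustrative, not recommendations):

```python
from openmind import TrainingArguments

# Shared parameters from the tables above; values are illustrative.
args = TrainingArguments(
    output_dir="./output",           # directory for checkpoints and logs
    num_train_epochs=3.0,
    per_device_train_batch_size=8,
    learning_rate=5e-5,
    logging_steps=100,
    save_strategy="steps",
    save_steps=500,
    seed=42,
)
```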
train_batch_size
Obtains the training batch size.
Prototype
def train_batch_size()
eval_batch_size
Obtains the evaluation batch size.
Prototype
def eval_batch_size()
world_size
Obtains the number of parallel processes.
Prototype
def world_size()
process_index
Obtains the index of the current process.
Prototype
def process_index()
local_process_index
Obtains the index of the current local process.
Prototype
def local_process_index()
should_log
Determines whether the current process should generate logs. Currently, this API is supported only by PyTorch.
Prototype
def should_log()
should_save
Determines whether the current process should write to disk (for example, to save models and checkpoints). Currently, this API is supported only by PyTorch.
Prototype
def should_save()
_setup_devices
Sets the device. Currently, this API is supported only by PyTorch.
Prototype
def _setup_devices()
device
Obtains the device used by the current process. Currently, this API is supported only by PyTorch.
Prototype
def device()
get_process_log_level
Obtains the process log level. Currently, this API is supported only by PyTorch.
Prototype
def get_process_log_level()
main_process_first
Context manager that lets the main process run a block of code first, before the other processes. Currently, this API is supported only by PyTorch.
Prototype
def main_process_first(local: bool = True, desc: str = "work")
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| local | Whether it is local | bool | Not supported. |
| desc | Work description | str | Not supported. |
get_warmup_steps
Obtains the number of warmup iteration steps.
Prototype
def get_warmup_steps(num_training_steps: int)
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| num_training_steps | Number of training iteration steps | int | int |
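get_warmup_steps resolves warmup_steps and warmup_ratio into a concrete step count. The sketch below mirrors the resolution rule used by transformers-style trainers (an explicit warmup_steps takes precedence; otherwise the ratio is applied to the total step count); openmind's exact implementation may differ, and the helper name is illustrative:

```python
import math

def get_warmup_steps_sketch(num_training_steps: int,
                            warmup_steps: int = 0,
                            warmup_ratio: float = 0.0) -> int:
    """Illustrative helper (not the openmind source): an explicit
    warmup_steps takes precedence; otherwise warmup_ratio is applied
    to the total number of training steps, rounding up."""
    if warmup_steps > 0:
        return warmup_steps
    return math.ceil(num_training_steps * warmup_ratio)

print(get_warmup_steps_sketch(10000, warmup_steps=500))   # 500: explicit steps win
print(get_warmup_steps_sketch(10000, warmup_ratio=0.25))  # 2500: 25% of total steps
```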
to_dict
Serializes instances into a dictionary.
Prototype
def to_dict()
to_json_string
Serializes instances into a JSON string.
Prototype
def to_json_string()
to_sanitized_dict
Serializes instances into a parameter dictionary that can be used for TensorBoard. Currently, this API is supported only by PyTorch.
Prototype
def to_sanitized_dict()
set_training
Sets training parameters.
Prototype
def set_training(
learning_rate: float = 5e-5,
batch_size: int = 8,
weight_decay: float = 0,
num_epochs: float = 3,
max_steps: int = -1,
gradient_accumulation_steps: int = 1,
seed: int = 42,
**kwargs,
)
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| learning_rate | Initial learning rate | float | float |
| batch_size | Training batch size of each device | int | int |
| weight_decay | Weight decay | float | float |
| num_epochs | Total number of training epochs | float | float |
| max_steps | Maximum number of training steps | int | Not supported. |
| gradient_accumulation_steps | Number of update steps of gradient accumulation | int | int |
| seed | Random seed set at the beginning of training | int | int |
| kwargs["gradient_checkpointing"] | If the value is True, gradient checkpoints are used to save memory, but the backpropagation speed is slowed down. | bool | Not supported. |
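A usage sketch for set_training (assuming the openmind package is installed; the values are illustrative):

```python
from openmind import TrainingArguments

args = TrainingArguments(output_dir="./output")
# Configure the core training hyperparameters in one call;
# values are illustrative, not recommendations.
args.set_training(
    learning_rate=1e-4,
    batch_size=16,
    weight_decay=0.01,
    num_epochs=2.0,
    gradient_accumulation_steps=4,
    seed=42,
)
```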
set_evaluate
Sets evaluation parameters.
Prototype
def set_evaluate(
strategy: Union[str, IntervalStrategy] = "no",
steps: int = 500,
batch_size: int = 8,
**kwargs,
)
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| strategy | Evaluation strategy used during training. The options are as follows: "no": no evaluation is performed during training; "steps": evaluation is performed (and logged) every steps; "epoch": evaluation is performed at the end of each epoch. | Union[str, IntervalStrategy] | Union[str, IntervalStrategy] |
| steps | Number of update steps between two evaluations if strategy="steps". | int | int |
| batch_size | Batch size of each device used for evaluation | int | int |
| kwargs["accumulation_steps"] | Number of prediction steps to accumulate the output tensors before the output tensors are moved to the CPU | int | Not supported. |
| kwargs["delay"] | Number of epochs or steps to wait before the first evaluation is performed, which depends on the evaluation_strategy | float | Not supported. |
| kwargs["loss_only"] | Ignores all outputs except losses | bool | Not supported. |
| kwargs["jit_mode"] | Whether to use PyTorch JIT in inference | bool | Not supported. |
set_testing
Sets test parameters.
Prototype
def set_testing(
batch_size: int = 8,
**kwargs,
)
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| batch_size | Batch size of each device used for testing | int | int |
| kwargs["loss_only"] | Ignores all outputs except losses | bool | bool |
| kwargs["jit_mode"] | Whether to use PyTorch JIT in inference | bool | Not supported. |
set_save
Sets all parameters related to checkpoint saving.
Prototype
def set_save(
strategy: Union[str, IntervalStrategy] = "steps",
steps: int = 500,
total_limit: Optional[int] = None,
on_each_node: bool = False,
)
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| strategy | Weight saving strategy. The options are as follows: "no": checkpoints are not saved during training; "epoch": checkpoints are saved at the end of each epoch; "steps": checkpoints are saved every save_steps. | Union[str, IntervalStrategy] | Union[str, IntervalStrategy] |
| steps | Number of update steps between two checkpoint savings if strategy="steps" | int | int |
| total_limit | Limits the total number of checkpoints. Older checkpoints in output_dir are deleted. | int | int |
| on_each_node | Whether to save models and checkpoints on each node or only on the primary node during multi-node distributed training | bool | bool |
set_logging
Sets all parameters related to log recording.
Prototype
def set_logging(
strategy: Union[str, IntervalStrategy] = "steps",
steps: int = 500,
report_to: Union[str, List[str]] = "none",
level: str = "passive",
first_step: bool = False,
nan_inf_filter: bool = False,
on_each_node: bool = False,
replica_level: str = "passive",
)
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| strategy | Training log saving strategy. The options are as follows: "no": no logs are recorded during training; "epoch": logs are recorded at the end of each epoch; "steps": logs are recorded every steps. | Union[str, IntervalStrategy] | Union[str, IntervalStrategy] |
| steps | Number of update steps between two log records if strategy="steps". | int | int |
| report_to | List of integrations to report the results and logs to | Union[str, List[str]] | Not supported. |
| level | Logger log level to be used on the main process. The options include "debug", "info", "warning", "error", "critical", and "passive". | str | Not supported. |
| first_step | Whether to record and evaluate the first global_step. | bool | Not supported. |
| nan_inf_filter | Whether to filter the nan and inf losses in logs. | bool | Not supported. |
| on_each_node | Whether log_level is used for logging on each node or only on the primary node during distributed training. | bool | Not supported. |
| replica_level | Logger log level used on replicas | str | Not supported. |
set_push_to_hub
Sets all parameters for synchronizing checkpoints with the Hub.
Prototype
def set_push_to_hub(
model_id: str,
strategy: Union[str, HubStrategy] = "every_save",
token: Optional[str] = None,
private_repo: bool = False,
always_push: bool = False,
)
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| model_id | Name of the repository synchronized with the local output_dir. It can be a simple model ID, in which case the model will be pushed to your namespace. It can also be the name of the repository, for example, "user_name/model". | str | str |
| strategy | Defines the strategy for pushing data to the Hub. The options are as follows: "end": push the model, its configuration, tokenizer, and model card when the Trainer.save_model method is invoked; "every_save": push the model, its configuration, tokenizer, and model card each time the model is saved. The push is asynchronous and does not block training; if saving is very frequent, a new push is attempted only after the previous one completes, and a final push is made at the end of training; "checkpoint": like "every_save", but the latest checkpoint is also pushed to a subfolder named last-checkpoint, making it easy to resume training with trainer.train(resume_from_checkpoint="last-checkpoint"); "all_checkpoints": like "checkpoint", but all checkpoints are pushed (so the final repository contains one folder per checkpoint). | Union[str, HubStrategy] | Union[str, HubStrategy] |
| token | Token used to push a model to the Hub | str | str |
| private_repo | If the value is True, a Hub repository will be set to private. | bool | bool |
| always_push | If the value is False, the Trainer will skip checkpoint pushing when the previous push is not complete. | bool | bool |
set_optimizer
Sets all parameters related to the optimizer and its hyperparameters.
Prototype
def set_optimizer(
name: Union[str, OptimizerNames],
learning_rate: float = 5e-5,
weight_decay: float = 0,
beta1: float = 0.9,
beta2: float = 0.999,
epsilon: float = 1e-8,
**kwargs,
)
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| name | Optimizer type | Union[str, OptimizerNames] | Union[str, OptimizerType] |
| learning_rate | Initial learning rate | float | float |
| lr_end | End learning rate. | Not supported. | float |
| weight_decay | Weight decay | float | float |
| beta1 | beta1 hyperparameter of the Adam optimizer or its variants | float | float |
| beta2 | beta2 hyperparameter of the Adam optimizer or its variants | float | float |
| epsilon | epsilon hyperparameter of the Adam optimizer or its variants | float | float |
| kwargs["args"] | Optional parameter passed to AnyPrecisionAdamW (valid only when optim="adamw_anyprecision"). The default value is None. | str | Not supported. |
set_lr_scheduler
Sets all parameters related to the learning rate scheduler and its hyperparameters.
Prototype
def set_lr_scheduler(
name: Union[str, SchedulerType] = "linear",
num_epochs: float = 3.0,
max_steps: int = -1,
warmup_ratio: float = 0,
warmup_steps: int = 0,
)
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| name | Type of the learning rate scheduler | Union[str, SchedulerType] | Union[str, LrSchedulerType] |
| num_epochs | Total number of training epochs | float | float |
| max_steps | Maximum number of training steps | int | Not supported. |
| warmup_ratio | Ratio of total training steps used for a linear warmup from 0 to learning_rate. | float | float |
| warmup_steps | Number of steps used for a linear warmup from 0 to learning_rate. | int | int |
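A sketch combining set_optimizer and set_lr_scheduler (assuming the openmind package is installed; the values are illustrative):

```python
from openmind import TrainingArguments

args = TrainingArguments(output_dir="./output")
# Optimizer hyperparameters; values are illustrative.
args.set_optimizer(name="adamw_torch", learning_rate=5e-5, weight_decay=0.01)
# Cosine schedule with 5% of total steps used for linear warmup.
args.set_lr_scheduler(name="cosine", num_epochs=3.0, warmup_ratio=0.05)
```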
set_dataloader
Sets a data loader.
Prototype
def set_dataloader(
train_batch_size: int = 8,
eval_batch_size: int = 8,
drop_last: bool = False,
num_workers: int = 0,
ignore_data_skip: bool = False,
sampler_seed: Optional[int] = None,
**kwargs,
)
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| train_batch_size | Training batch size | int | int |
| eval_batch_size | Evaluation batch size | int | int |
| drop_last | Whether to drop the last incomplete batch | bool | bool |
| num_workers | Number of subprocesses used for data loading | int | int |
| ignore_data_skip | Whether to skip the batches and epochs during training resumption to get the data loading at the same stage as in the previous training | bool | bool |
| sampler_seed | Random seed for data sampler | int | int |
| kwargs["pin_memory"] | Whether to pin memory in the data loader. The default value is True. | bool | Not supported. |
| kwargs["persistent_workers"] | If the value is True, the data loader does not close the worker process after a dataset has been consumed once. This allows the dataset instance of the worker process to be active. The training may be accelerated, but the RAM usage increases. The default value is False. | bool | Not supported. |
| kwargs["auto_find_batch_size"] | Automatically finds a batch size that will fit into memory. accelerate needs to be installed. The default value is False. | bool | Not supported. |
| kwargs["prefetch_factor"] | Number of batches that each process loads in advance. | int | Not supported. |
openmind.Trainer Class
The Trainer class is used to implement functions such as model training, evaluation, and inference. It is the core component of training and provides many methods and functions to manage the entire training process, including data loading, model forward propagation, loss calculation, and gradient update.
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| args | Arguments used to configure datasets, hyperparameters, and optimizers | TrainingArguments | TrainingArguments |
| task | Task type | Not supported. | str |
| model | Model instance used for training, evaluation, or prediction | Union[PreTrainedModel, torch.nn.Module] | Union[mindformers.models.PreTrainedModel, str] |
| model_name | Model name. | Not supported. | str |
| pet_method | PET method name. | Not supported. | str |
| tokenizer | Tokenizer | PreTrainedTokenizerBase | mindformers.models.PreTrainedTokenizerBase |
| train_dataset | Training dataset | Dataset | Union[str, mindspore.dataset.BaseDataset] |
| eval_dataset | Evaluation dataset | Union[Dataset, Dict[str, Dataset]] | Union[str, mindspore.dataset.BaseDataset] |
| data_collator | Function for batch data processing | DataCollator | Not supported. |
| image_processor | Processor for image pre-processing. | Not supported. | mindformers.models.BaseImageProcessor |
| audio_processor | Processor for audio pre-processing. | Not supported. | mindformers.models.BaseAudioProcessor |
| optimizers | Optimizer | Tuple[torch.optim.Optimizer, torch.optim.lr_scheduler.LambdaLR] | mindspore.nn.Optimizer |
| compute_metrics | Function for computing metrics during evaluation | Callable[[EvalPrediction], Dict] | Union[dict, set] |
| callbacks | Callback function list | List[TrainerCallback] | Union[List[mindspore.train.Callback], mindspore.train.Callback] |
| eval_callbacks | Evaluation callback function list. | Not supported. | Union[List[mindspore.train.Callback], mindspore.train.Callback] |
| model_init | Function that instantiates the model to be used | Callable[[], PreTrainedModel] | Not supported. |
| preprocess_logits_for_metrics | Function that preprocesses the output results before computing metrics | Callable[[torch.Tensor, torch.Tensor], torch.Tensor] | Not supported. |
| save_config | Saves the configuration of the current task. | Not supported. | bool |
train
Performs training steps.
Prototype
def train(*args, **kwargs)
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| train_checkpoint | Checkpoint used to restore the network weights for training | Not supported. | Union[str, bool] |
| resume_from_checkpoint | Preloaded weights | Union[str, bool] | Union[str, bool] |
| trial | Trial run or hyperparameter dictionary for hyperparameter search | Union["optuna.Trial", Dict[str, Any]] | Not supported. |
| ignore_keys_for_eval | List of keys in the model output that should be ignored when used to collect evaluation predictions during training | List[str] | Not supported. |
| resume_training | Resumable training switch | Not supported. | bool |
| auto_trans_ckpt | Automatic weight transformation switch | Not supported. | bool |
| src_strategy | Distributed strategy file for preloaded weights | Not supported. | str |
| do_eval | Whether to perform evaluation during training. | Not supported. | bool |
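The resume_from_checkpoint argument accepts either a path or a boolean. The sketch below shows how such a Union[str, bool] value is commonly resolved against an output directory of checkpoint-&lt;step&gt; subfolders; the get_last_checkpoint helper and the directory-naming convention are illustrative assumptions, not part of the openmind API.

```python
import os
import re
import tempfile
from typing import Optional, Union

def get_last_checkpoint(output_dir: str) -> Optional[str]:
    """Return the checkpoint-<step> subdirectory with the highest step,
    or None if the directory holds no checkpoints (illustrative helper)."""
    pattern = re.compile(r"^checkpoint-(\d+)$")
    candidates = []
    for entry in os.scandir(output_dir):
        match = pattern.match(entry.name)
        if entry.is_dir() and match:
            candidates.append((int(match.group(1)), entry.path))
    return max(candidates)[1] if candidates else None

def resolve_resume(resume_from_checkpoint: Union[str, bool, None],
                   output_dir: str) -> Optional[str]:
    # str -> use the given path; True -> latest checkpoint in output_dir;
    # False or None -> train from scratch.
    if isinstance(resume_from_checkpoint, str):
        return resume_from_checkpoint
    if resume_from_checkpoint:
        return get_last_checkpoint(output_dir)
    return None

# Demonstration with a throwaway directory layout.
with tempfile.TemporaryDirectory() as tmp:
    for step in (100, 500, 250):
        os.makedirs(os.path.join(tmp, f"checkpoint-{step}"))
    print(os.path.basename(resolve_resume(True, tmp)))  # checkpoint-500
```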
evaluate
Performs evaluation.
Prototype
def evaluate(*args, **kwargs)
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| eval_dataset | Evaluation dataset | Union[Dataset, Dict[str, Dataset]] | Union[str, mindspore.dataset.BaseDataset, mindspore.dataset.Dataset, Iterable] |
| eval_checkpoint | Checkpoint weights used for evaluation | Not supported. | Union[str, bool] |
| ignore_keys | List of keys in the model output that should be ignored when used to collect predictions | List[str] | Not supported. |
| metric_key_prefix | Metric name prefix | str | Not supported. |
| auto_trans_ckpt | Automatic weight transformation switch | Not supported. | bool |
| src_strategy | Distributed strategy file for preloaded weights | Not supported. | str |
predict
Executes inference.
Prototype
def predict(*args, **kwargs)
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| predict_checkpoint | Checkpoint weights used for inference | Not supported. | Union[str, bool] |
| test_dataset | Inference dataset | Dataset | Not supported. |
| ignore_keys | List of keys in the model output that should be ignored when used to collect predictions | List[str] | Not supported. |
| metric_key_prefix | Metric name prefix | str | Not supported. |
| input_data | Input data for inference | Not supported. | Union[GeneratorDataset, Tensor, np.ndarray, Image, str, list] |
| batch_size | Batch size | Not supported. | int |
| auto_trans_ckpt | Automatic weight transformation switch | Not supported. | bool |
| src_strategy | Distributed strategy file for preloaded weights | Not supported. | str |
add_callback
Adds a callback function to the current callback list.
Prototype
def add_callback(callback)
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| callback | Callback to add, passed as an instance or a class | Union[type, TrainerCallback] | Union[type, mindspore.train.Callback] |
pop_callback
Deletes a callback from the current callback list and returns it. If the callback cannot be found, None is returned (no error is thrown).
Prototype
def pop_callback(callback)
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| callback | Callback to remove and return, passed as an instance or a class | Union[type, TrainerCallback] | Union[type, mindspore.train.Callback] |
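add_callback, pop_callback, and remove_callback all accept either a callback instance or a callback class (hence the Union[type, ...] signatures). Below is a minimal pure-Python sketch of those semantics, modeled on the transformers CallbackHandler convention; openmind's internal implementation may differ, and the CallbackList and LoggingCallback names are hypothetical.

```python
class CallbackList:
    """Illustrative container mimicking add/pop/remove callback semantics."""

    def __init__(self):
        self.callbacks = []

    def add_callback(self, callback):
        # A class is instantiated; an instance is stored as-is.
        self.callbacks.append(callback() if isinstance(callback, type) else callback)

    def pop_callback(self, callback):
        # Remove and return the first match; return None (no error) if absent.
        for cb in self.callbacks:
            same_instance = cb is callback
            same_class = isinstance(callback, type) and isinstance(cb, callback)
            if same_instance or same_class:
                self.callbacks.remove(cb)
                return cb
        return None

    def remove_callback(self, callback):
        # Like pop_callback, but the removed callback is discarded.
        self.pop_callback(callback)

class LoggingCallback:  # hypothetical callback class for the demo
    pass

handler = CallbackList()
handler.add_callback(LoggingCallback)           # pass the class itself
popped = handler.pop_callback(LoggingCallback)  # returns the stored instance
print(type(popped).__name__)                    # LoggingCallback
print(handler.pop_callback(LoggingCallback))    # None
```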
remove_callback
Deletes a callback function from the current callback list.
Prototype
def remove_callback(callback)
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| callback | Callback to remove, passed as an instance or a class | Union[type, TrainerCallback] | Union[type, mindspore.train.Callback] |
save_model
Saves a model so that you can reload it using the from_pretrained() method.
Prototype
def save_model(*args, **kwargs)
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| output_dir | Model saving path | str | str |
| _internal_call | Whether the method is being called internally by the Trainer. If args.push_to_hub is True and _internal_call is False (the default), the model is also pushed to the repository hub. | bool | bool |
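The interaction between args.push_to_hub and _internal_call reduces to a single predicate: a save triggers a push only when pushing is enabled and the call is not an internal one. The sketch below mirrors the transformers Trainer logic; treat it as illustrative for openmind, and the should_push helper as a hypothetical name.

```python
def should_push(push_to_hub: bool, _internal_call: bool = False) -> bool:
    # Push only for explicit (non-internal) save_model calls when
    # args.push_to_hub is enabled.
    return push_to_hub and not _internal_call

print(should_push(True))                       # True: model is pushed
print(should_push(True, _internal_call=True))  # False: internal save, no push
print(should_push(False))                      # False: pushing disabled
```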
init_hf_repo
Creates and initializes a git repository at args.hub_model_id.
Prototype
def init_hf_repo()
push_to_hub
Uploads the model and tokenizer to the args.hub_model_id repository on the Hub.
Prototype
def push_to_hub(
commit_message: Optional[str] = "End of training",
blocking: bool = True,
**kwargs,
)
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| commit_message | Commit message for the push. The default value is "End of training". | str | str |
| blocking | Whether the function should return only after the git push completes. The default value is True. | bool | bool |