Trainer APIs
openmind.TrainingArguments Class
The TrainingArguments class is used to configure parameters of training tasks, including hyperparameters, model saving paths, log recording options, and learning rates required during training.
Parameters
- Parameters supported by the TrainingArguments classes of both PyTorch and MindSpore
| Name | PyTorch Type | MindSpore Type | Description | Default Value for PyTorch | Default Value for MindSpore |
|---|---|---|---|---|---|
| output_dir | str | str | Output directory | None | "./output" |
| overwrite_output_dir | bool | bool | Whether to overwrite the output directory | False | False |
| seed | int | int | Random seed | 42 | 42 |
| use_cpu | bool | bool | Whether to use a CPU | False | False |
| do_train | bool | bool | Whether to perform training | False | False |
| do_eval | bool | bool | Whether to perform evaluation | False | False |
| do_predict | bool | bool | Whether to perform inference | False | False |
| num_train_epochs | float | float | Total number of training epochs | 3.0 | 3.0 |
| resume_from_checkpoint | str | str | Preloaded weights | None | None |
| evaluation_strategy | Union[IntervalStrategy, str] | Union[IntervalStrategy, str] | Evaluation strategy | "no" | "no" |
| per_device_train_batch_size | int | int | Training batch size of each device | 8 | 8 |
| per_device_eval_batch_size | int | int | Evaluation batch size of each device | 8 | 8 |
| per_gpu_train_batch_size | int | int | Training batch size of each GPU (Not recommended.) | None | None |
| per_gpu_eval_batch_size | int | int | Evaluation batch size of each GPU (Not recommended.) | None | None |
| gradient_accumulation_steps | int | int | Gradient accumulation steps | 1 | 1 |
| ignore_data_skip | bool | bool | Whether to ignore data skipping during resumable training | False | False |
| dataloader_drop_last | bool | bool | Whether the data loader drops the last incomplete batch | False | True |
| dataloader_num_workers | int | int | Number of processes in the data loader | 0 | 8 |
| optim | Union[OptimizerNames, str] | Union[OptimizerType, str] | Optimizer | "adamw_torch" | "fp32_adamw" |
| adam_beta1 | float | float | Adam optimizer beta1 | 0.9 | 0.9 |
| adam_beta2 | float | float | Adam optimizer beta2 | 0.999 | 0.999 |
| adam_epsilon | float | float | Adam optimizer epsilon | 1e-8 | 1e-8 |
| weight_decay | float | float | Weight decay | 0.0 | 0.0 |
| lr_scheduler_type | Union[SchedulerType, str] | Union[LrSchedulerType, str] | Type of the learning rate scheduler | "linear" | "cosine" |
| learning_rate | float | float | Learning rate. | 5e-5 | 5e-5 |
| warmup_ratio | float | float | Warmup ratio | 0.0 | None |
| warmup_steps | int | int | Warmup steps | 0 | 0 |
| max_grad_norm | float | float | Maximum norm of gradient clipping | 1.0 | 1.0 |
| logging_strategy | Union[IntervalStrategy, str] | Union[LoggingIntervalStrategy, str] | Logging strategy | "steps" | "steps" |
| logging_steps | float | float | Number of logging steps | 500 | 1 |
| save_steps | float | float | Weight saving steps | 500 | 500 |
| save_strategy | str | Union[SaveIntervalStrategy, str] | Weight saving strategy | "steps" | "steps" |
| save_total_limit | int | int | Maximum number of weights that can be saved | None | 5 |
| save_on_each_node | bool | bool | Whether to save weights on each node (rather than only on the primary node) | False | True |
| hub_model_id | str | str | Hub model ID | None | None |
| hub_strategy | Union[HubStrategy, str] | Union[HubStrategy, str] | Hub push strategy | "every_save" | "every_save" |
| hub_token | str | str | Hub token | None | None |
| hub_private_repo | bool | bool | Hub private repository | False | False |
| hub_always_push | bool | bool | Whether to always push a model to the Hub | False | False |
| data_seed | int | int | Random seed used by the data sampler | None | None |
| eval_steps | float | float | Number of update steps between two evaluations | None | None |
| push_to_hub | bool | bool | Whether to push a model to the Hub | False | False |
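Three of the shared parameters above jointly determine the effective global batch size: the per-device batch size, the gradient accumulation steps, and the number of devices. A plain-Python arithmetic sketch (the helper name is illustrative, not part of the API):

```python
def effective_batch_size(per_device_train_batch_size: int,
                         gradient_accumulation_steps: int,
                         world_size: int) -> int:
    """Number of samples that contribute to one optimizer update:
    per-device batch size x accumulation steps x number of devices."""
    return per_device_train_batch_size * gradient_accumulation_steps * world_size

# Defaults (8 per device, no accumulation) on a single device:
print(effective_batch_size(8, 1, 1))   # 8
# 8 per device, gradients accumulated over 4 steps, on 8 devices:
print(effective_batch_size(8, 4, 8))   # 256
```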
- Parameters supported only by PyTorch
| Name | Type | Description | Default Value |
|---|---|---|---|
| optim_args | str | Optimizer parameters | None |
| label_names | List[str] | Label name | None |
| load_best_model_at_end | bool | Whether to load the optimal model at the end | False |
| metric_for_best_model | str | Metric for the optimal model | None |
| greater_is_better | bool | Whether a greater metric is better | None |
| label_smoothing_factor | float | Label smoothing factor | 0.0 |
| include_inputs_for_metrics | bool | Whether the metric includes inputs | False |
| prediction_loss_only | bool | Whether to return only the loss when evaluation or prediction is performed | False |
| eval_accumulation_steps | int | Number of prediction steps to accumulate output tensors | None |
| eval_delay | float | Number of epochs or steps to wait before the first evaluation, depending on evaluation_strategy | None |
| max_steps | int | Maximum number of training steps | -1 |
| lr_scheduler_kwargs | dict | Additional arguments for the scheduler | {} |
| log_level | str | Log level | "passive" |
| log_level_replica | str | Log level used on replicas | "warning" |
| log_on_each_node | bool | Whether to log on each node, or only on the primary node, in distributed training | True |
| logging_dir | str | Directory that saves logs | None |
| logging_first_step | bool | Whether to record the first global_step | False |
| logging_nan_inf_filter | bool | Whether to filter 'nan' and 'inf' loss for log recording | True |
| save_safetensors | bool | Whether to save the weights in safetensor format | True |
| save_only_model | bool | Whether to save only model status during checkpointing | False |
| jit_mode_eval | bool | Whether to use PyTorch JIT for inference | False |
| use_ipex | bool | Whether to use Intel extensions (Not supported) | False |
| bf16 | bool | Whether to use the bf16 format | False |
| fp16 | bool | Whether to use the fp16 format | False |
| tf32 | bool | Whether to use the tf32 format (Not supported) | None |
| fp16_opt_level | str | Apex AMP optimization level for fp16 training ("O0" to "O3") | "O1" |
| fp16_backend | str | Backend used by fp16 | "auto" |
| half_precision_backend | str | Backend used for mixed-precision training | "auto" |
| bf16_full_eval | bool | Whether to use bf16 during evaluation | False |
| fp16_full_eval | bool | Whether to use fp16 during evaluation | False |
| disable_tqdm | bool | Whether to disable the progress bar | None |
| remove_unused_columns | bool | Whether to automatically delete columns that are not used by the model's forward method | True |
| fsdp | Union[List[FSDPOption], str] | Whether to use FSDP | None |
| fsdp_config | Union[dict, str] | FSDP configuration | None |
| local_rank | int | Process ID for distributed training | -1 |
| tpu_num_cores | int | Number of cores used for TPU training (Not supported) | None |
| past_index | int | Index when hidden states are used for prediction | -1 |
| ddp_backend | str | Backend for DDP distributed training | None |
| run_name | str | Running descriptor | None |
| deepspeed | str | DeepSpeed configuration | None |
| accelerator_config | str | Accelerate configuration | None |
| debug | Union[str, List[DebugOption]] | Enablement for one or more debugging functions | None |
| length_column_name | str | Column name for precomputed length | "length" |
| group_by_length | bool | Whether to group together samples with roughly the same length in the training dataset | False |
| ddp_find_unused_parameters | bool | Whether to pass find_unused_parameters to DistributedDataParallel | None |
| report_to | List[str] | List of integrations to report the results and logs | "all" |
| ddp_bucket_cap_mb | int | Value of bucket_cap_mb passed to DistributedDataParallel | None |
| ddp_broadcast_buffers | bool | Whether to pass the value of ddp_broadcast_buffers to DistributedDataParallel | None |
| ddp_timeout | int | Timeout for DDP calls | 1800 |
| dataloader_pin_memory | bool | Whether to pin memory in the data loader | True |
| dataloader_persistent_workers | bool | Whether to keep data loader worker processes (and their dataset instances) alive after the dataset has been consumed once | False |
| dataloader_prefetch_factor | int | Number of batches preloaded by each worker | None |
| skip_memory_metrics | bool | Whether to skip adding the memory profiler report to metrics | True |
| gradient_checkpointing | bool | Whether to use gradient checkpoints to save memory | False |
| gradient_checkpointing_kwargs | dict | Arguments related to gradient checkpoints | None |
| auto_find_batch_size | bool | Whether to find a batch size that fits into memory automatically through exponential decay | False |
| full_determinism | bool | Whether to call enable_full_determinism instead of set_seed to ensure reproducible results in distributed training | False |
| torchdynamo | str | Backend compiler of TorchDynamo (Not supported) | None |
| ray_scope | str | Scope to use when doing hyperparameter search with Ray | "last" |
| use_mps_device | bool | Whether to use mps device (Not supported) | False |
| torch_compile | bool | Whether to use PyTorch 2.0 to compile the model (Not supported) | False |
| torch_compile_backend | str | Backend used by torch.compile (Not supported) | None |
| torch_compile_mode | str | torch.compile mode (Not supported) | None |
| split_batches | bool | Whether to split batches generated by the data loader to devices | None |
| include_tokens_per_second | bool | Whether to calculate tokens per second of each device | None |
| include_num_input_tokens_seen | bool | Whether to track the number of input tokens seen throughout training | None |
| neftune_noise_alpha | float | NEFTune noise alpha; setting a value activates NEFTune noise embeddings | None |
| optim_target_modules | Union[str, List[str]] | Target module to be optimized | None |
- Parameters supported only by MindSpore
| Name | Type | Description | Default Value |
|---|---|---|---|
| only_save_strategy | bool | Whether a task directly exits after saving the strategy file | False |
| auto_trans_ckpt | bool | Whether to enable automatic weight transformation | False |
| src_strategy | str | Distributed strategy file for preloaded weights | None |
| batch_size | int | Training batch size of each device. per_device_train_batch_size will be overwritten. | None |
| sink_mode | bool | Whether to directly sink data to devices through channels. | True |
| sink_size | int | Size of sunk data for training or evaluation in each step. | 2 |
| mode | int | Running in GRAPH_MODE (0) or PYNATIVE_MODE (1). | 0 |
| resume_training | bool | Whether to enable resumable training | False |
| remote_save_url | str | OBS saving path | None |
| device_id | int | Device ID | 0 |
| device_target | str | Target device where the task is executed. Value options: 'Ascend', 'GPU', or 'CPU'. | "Ascend" |
| enable_graph_kernel | bool | Whether to use graph fusion. | False |
| graph_kernel_flags | str | Level of graph fusion. | "--opt_level=0" |
| save_graphs | bool | Whether to save computational graphs | False |
| save_graphs_path | str | Path for saving computational graphs | "./graph" |
| max_call_depth | int | Maximum depth of a function call. | 10000 |
| max_device_memory | str | Maximum available memory of the device. | "1024GB" |
| use_parallel | bool | Whether to enable the parallel mode | False |
| parallel_mode | int | Parallel mode: data parallel (0), semi-automatic parallel (1), automatic parallel (2), or hybrid parallel (3). | 1 |
| gradients_mean | bool | Whether to execute the mean operator after gradient AllReduce | False |
| loss_repeated_mean | bool | Whether to execute the mean operator backwards during repeated computation. | False |
| enable_alltoall | bool | Indicates whether to generate the AllToAll communication operator during communication. | False |
| enable_parallel_optimizer | bool | Whether to enable optimizer parallelism | False |
| full_batch | bool | If the entire batch dataset is loaded in automatic parallel mode, set full_batch to True. This API is not recommended. Replace it with dataset_strategy. | True |
| dataset_strategy | Union[str, tuple] | Dataset sharding strategy. | "full_batch" |
| search_mode | str | Strategy search mode, which is valid only in automatic parallel mode. This API is an experimental API. Exercise caution when using this API. | "sharding_propagation" |
| data_parallel | int | Data parallelism | 1 |
| gradient_accumulation_shard | bool | Whether the gradient accumulation variable is sharded along the data parallelism dimension. | False |
| parallel_optimizer_threshold | int | Sets the threshold for parameter optimizer. | 64 |
| optimizer_weight_shard_size | int | Sets the size of the communicator for which the optimizer weight is sharded. | -1 |
| strategy_ckpt_save_file | str | Path for saving distribution strategy files. | "./ckpt_strategy.ckpt" |
| model_parallel | int | Model parallelism | 1 |
| expert_parallel | int | Expert parallelism | 1 |
| pipeline_stage | int | Pipeline parallelism | 1 |
| gradient_aggregation_group | int | Size of a gradient communication operation fusion group. | 4 |
| micro_batch_num | int | Number of micro-batches for pipeline parallel computation | 1 |
| micro_batch_interleave_num | int | Number of concurrent copies | 1 |
| use_seq_parallel | bool | Whether to enable sequence parallelism. | False |
| vocab_emb_dp | bool | Whether to split the vocabulary only along the data parallelism dimension. | True |
| expert_num | int | Number of experts. | 1 |
| capacity_factor | float | Expert capacity factor. | 1.05 |
| aux_loss_factor | float | Auxiliary loss contribution factor. | 0.05 |
| num_experts_chosen | int | Number of experts selected for each token. | 1 |
| recompute | bool | Recomputation | False |
| select_recompute | bool | Selective recomputation | False |
| parallel_optimizer_comm_recompute | bool | Whether to recalculate the AllGather communication introduced by the optimizer parallelism. | False |
| mp_comm_recompute | bool | Whether to recalculate the communication operations introduced by model parallelism. | True |
| recompute_slice_activation | bool | Whether to slice the cell output stored in the memory. | False |
| layer_scale | bool | Whether to enable layer decay. | False |
| layer_decay | float | Layer decay coefficient. | 0.65 |
| lr_end | float | End learning rate. | 1e-6 |
| warmup_lr_init | float | Initial learning rate in the warm-up phase. | 0.0 |
| warmup_epochs | int | Performs linear warmup during the warmup_epochs portion of the total number of steps. | None |
| lr_scale | bool | Whether to enable learning rate scaling. | False |
| lr_scale_factor | int | Learning rate scaling factor. | 256 |
| python_multiprocessing | bool | Whether to enable the Python multi-process mode. | False |
| numa_enable | bool | Whether to enable NUMA by default. | False |
| prefetch_size | int | Sets the queue capacity of threads in a pipe. | 1 |
| wrapper_type | str | Class name of the wrapper. | "MFTrainOneStepCell" |
| scale_sense | Union[str, float] | Value or class name of scale sense. | "DynamicLossScaleUpdateCell" |
| loss_scale_value | int | Initial loss scaling factor. | 65536 |
| loss_scale_factor | int | Increment and decrement factors of the loss scaling coefficient. | 2 |
| loss_scale_window | int | Maximum number of consecutive training steps for increasing the loss scaling coefficient. | 1000 |
| use_clip_grad | bool | Whether to enable gradient clipping. | True |
| train_dataset | str | Training dataset path | None |
| eval_dataset | str | Evaluation dataset path | None |
| dataset_task | str | Task type corresponding to a dataset | None |
| dataset_type | str | Dataset type | None |
| train_dataset_in_columns | list[str] | Training dataset input column names | None |
| train_dataset_out_columns | list[str] | Training dataset output column names | None |
| eval_dataset_in_columns | list[str] | Evaluation dataset input column names | None |
| eval_dataset_out_columns | list[str] | Evaluation dataset output column names | None |
| shuffle | bool | Whether to shuffle the training dataset | True |
| repeat | int | Number of repetitions of the training dataset | 1 |
| metric_type | Union[List[str], str] | Metric type. | None |
| save_seconds | int | Checkpoints are saved every X seconds. | None |
| integrated_save | bool | Whether to merge and save the split tensors in the automatic parallelism scenario. | None |
| eval_epochs | int | Number of epoch intervals between evaluations. 1 indicates that evaluation is performed at the end of each epoch. | None |
| profile | bool | Whether to enable profiling | False |
| profile_start_step | int | Start step of profiling | 1 |
| profile_end_step | int | End step of profiling | 10 |
| init_start_profile | bool | Whether to enable data collection during profiler initialization. | False |
| profile_communication | bool | Whether to collect communication performance data during multi-device training. | False |
| profile_memory | bool | Whether to collect tensor memory data. | True |
| auto_tune | bool | Whether to enable automatic data acceleration. | False |
| filepath_prefix | str | Save path and file prefix of the optimized global configuration. | "./autotune" |
| autotune_per_step | int | Sets the step interval for adjusting the automatic data acceleration configuration. | 10 |
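Putting the tables above to use, a minimal construction sketch (assuming the openmind package is installed; the values shown are illustrative, not recommendations):

```python
from openmind import TrainingArguments

# Shared parameters from the tables above; values are illustrative.
args = TrainingArguments(
    output_dir="./output",           # directory for checkpoints and logs
    num_train_epochs=3.0,
    per_device_train_batch_size=8,
    learning_rate=5e-5,
    logging_steps=100,
    save_strategy="steps",
    save_steps=500,
    seed=42,
)
```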
train_batch_size
Obtains the training batch size.
Prototype
def train_batch_size()
eval_batch_size
Obtains the evaluation batch size.
Prototype
def eval_batch_size()
world_size
Obtains the number of parallel processes.
Prototype
def world_size()
process_index
Obtains the index of the current process.
Prototype
def process_index()
local_process_index
Obtains the index of the current local process.
Prototype
def local_process_index()
should_log
Determines whether the current process should generate logs. Currently, this API is supported only by PyTorch.
Prototype
def should_log()
should_save
Determines whether the current process should write to disk (for example, to save models and checkpoints). Currently, this API is supported only by PyTorch.
Prototype
def should_save()
_setup_devices
Sets the device. Currently, this API is supported only by PyTorch.
Prototype
def _setup_devices()
device
Obtains the device used by the current process. Currently, this API is supported only by PyTorch.
Prototype
def device()
get_process_log_level
Obtains the process log level. Currently, this API is supported only by PyTorch.
Prototype
def get_process_log_level()
main_process_first
Context manager that lets the main process run a block of code first, before the other processes. Currently, this API is supported only by PyTorch.
Prototype
def main_process_first(local: bool = True, desc: str = "work")
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| local | Whether it is local | bool | Not supported. |
| desc | Work description | str | Not supported. |
get_warmup_steps
Obtains the number of warmup iteration steps.
Prototype
def get_warmup_steps(num_training_steps: int)
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| num_training_steps | Number of training iteration steps | int | int |
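get_warmup_steps resolves warmup_steps and warmup_ratio into a concrete step count. The sketch below mirrors the resolution rule used by transformers-style trainers (an explicit warmup_steps takes precedence; otherwise the ratio is applied to the total step count); openmind's exact implementation may differ, and the helper name is illustrative:

```python
import math

def get_warmup_steps_sketch(num_training_steps: int,
                            warmup_steps: int = 0,
                            warmup_ratio: float = 0.0) -> int:
    """Illustrative helper (not the openmind source): an explicit
    warmup_steps takes precedence; otherwise warmup_ratio is applied
    to the total number of training steps, rounding up."""
    if warmup_steps > 0:
        return warmup_steps
    return math.ceil(num_training_steps * warmup_ratio)

print(get_warmup_steps_sketch(10000, warmup_steps=500))   # 500: explicit steps win
print(get_warmup_steps_sketch(10000, warmup_ratio=0.25))  # 2500: 25% of total steps
```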
to_dict
Serializes instances into a dictionary.
Prototype
def to_dict()
to_json_string
Serializes instances into a JSON string.
Prototype
def to_json_string()
to_sanitized_dict
Serializes instances into a parameter dictionary that can be used for TensorBoard. Currently, this API is supported only by PyTorch.
Prototype
def to_sanitized_dict()
set_training
Sets training parameters.
Prototype
def set_training(
learning_rate: float = 5e-5,
batch_size: int = 8,
weight_decay: float = 0,
num_epochs: float = 3,
max_steps: int = -1,
gradient_accumulation_steps: int = 1,
seed: int = 42,
**kwargs,
)
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| learning_rate | Initial learning rate | float | float |
| batch_size | Training batch size of each device | int | int |
| weight_decay | Weight decay | float | float |
| num_epochs | Total number of training epochs | float | float |
| max_steps | Maximum number of training steps | int | Not supported. |
| gradient_accumulation_steps | Number of update steps of gradient accumulation | int | int |
| seed | Random seed set at the beginning of training | int | int |
| kwargs["gradient_checkpointing"] | If the value is True, gradient checkpoints are used to save memory, but the backpropagation speed is slowed down. | bool | Not supported. |
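A usage sketch for set_training (assuming the openmind package is installed; the values are illustrative):

```python
from openmind import TrainingArguments

args = TrainingArguments(output_dir="./output")
# Configure the core training hyperparameters in one call;
# values are illustrative, not recommendations.
args.set_training(
    learning_rate=1e-4,
    batch_size=16,
    weight_decay=0.01,
    num_epochs=2.0,
    gradient_accumulation_steps=4,
    seed=42,
)
```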
set_evaluate
Sets evaluation parameters.
Prototype
def set_evaluate(
strategy: Union[str, IntervalStrategy] = "no",
steps: int = 500,
batch_size: int = 8,
**kwargs,
)
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| strategy | Evaluation strategy used during training. The options are as follows: "no": no evaluation is performed during training; "steps": evaluation is performed (and logged) every steps; "epoch": evaluation is performed at the end of each epoch. | Union[str, IntervalStrategy] | Union[str, IntervalStrategy] |
| steps | Number of update steps between two evaluations if strategy="steps". | int | int |
| batch_size | Batch size of each device used for evaluation | int | int |
| kwargs["accumulation_steps"] | Number of prediction steps to accumulate the output tensors before the output tensors are moved to the CPU | int | Not supported. |
| kwargs["delay"] | Number of epochs or steps to wait before the first evaluation is performed, which depends on the evaluation_strategy | float | Not supported. |
| kwargs["loss_only"] | Ignores all outputs except losses | bool | Not supported. |
| kwargs["jit_mode"] | Whether to use PyTorch JIT in inference | bool | Not supported. |
set_testing
Sets test parameters.
Prototype
def set_testing(
batch_size: int = 8,
**kwargs,
)
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| batch_size | Batch size of each device used for testing | int | int |
| kwargs["loss_only"] | Ignores all outputs except losses | bool | bool |
| kwargs["jit_mode"] | Whether to use PyTorch JIT in inference | bool | Not supported. |
set_save
Sets all parameters related to checkpoint saving.
Prototype
def set_save(
strategy: Union[str, IntervalStrategy] = "steps",
steps: int = 500,
total_limit: Optional[int] = None,
on_each_node: bool = False,
)
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| strategy | Weight saving strategy. The options are as follows: "no": checkpoints are not saved during training; "epoch": checkpoints are saved at the end of each epoch; "steps": checkpoints are saved every save_steps. | Union[str, IntervalStrategy] | Union[str, IntervalStrategy] |
| steps | Number of update steps between two checkpoint savings if strategy="steps" | int | int |
| total_limit | Limits the total number of checkpoints. Older checkpoints in output_dir are deleted. | int | int |
| on_each_node | Whether to save models and checkpoints on each node or only on the primary node during multi-node distributed training | bool | bool |
set_logging
Sets all parameters related to log recording.
Prototype
def set_logging(
strategy: Union[str, IntervalStrategy] = "steps",
steps: int = 500,
report_to: Union[str, List[str]] = "none",
level: str = "passive",
first_step: bool = False,
nan_inf_filter: bool = False,
on_each_node: bool = False,
replica_level: str = "passive",
)
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| strategy | Training log saving strategy. The options are as follows: "no": no logs are recorded during training; "epoch": logs are recorded at the end of each epoch; "steps": logs are recorded every steps. | Union[str, IntervalStrategy] | Union[str, IntervalStrategy] |
| steps | Number of update steps between two log records if strategy="steps". | int | int |
| report_to | List of integrations to report the results and logs to | Union[str, List[str]] | Not supported. |
| level | Logger log level to be used on the main process. The options include "debug", "info", "warning", "error", "critical", and "passive". | str | Not supported. |
| first_step | Whether to record and evaluate the first global_step. | bool | Not supported. |
| nan_inf_filter | Whether to filter the nan and inf losses in logs. | bool | Not supported. |
| on_each_node | Whether log_level is used for logging on each node or only on the primary node during distributed training. | bool | Not supported. |
| replica_level | Logger log level used on replicas | str | Not supported. |
set_push_to_hub
Sets all parameters for synchronizing checkpoints with the Hub.
Prototype
def set_push_to_hub(
model_id: str,
strategy: Union[str, HubStrategy] = "every_save",
token: Optional[str] = None,
private_repo: bool = False,
always_push: bool = False,
)
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| model_id | Name of the repository synchronized with the local output_dir. It can be a simple model ID, in which case the model will be pushed to your namespace. It can also be the name of the repository, for example, "user_name/model". | str | str |
| strategy | Defines the strategy for pushing data to the Hub. The options are as follows: "end": push the model, its configuration, tokenizer, and model card when the Trainer.save_model method is invoked; "every_save": push the model, its configuration, tokenizer, and model card each time the model is saved. The push is asynchronous and does not block training; if saving is very frequent, a new push is attempted only after the previous one completes, and a final push is made at the end of training; "checkpoint": like "every_save", but the latest checkpoint is also pushed to a subfolder named last-checkpoint, making it easy to resume training with trainer.train(resume_from_checkpoint="last-checkpoint"); "all_checkpoints": like "checkpoint", but all checkpoints are pushed (so the final repository contains one folder per checkpoint). | Union[str, HubStrategy] | Union[str, HubStrategy] |
| token | Token used to push a model to the Hub | str | str |
| private_repo | If the value is True, a Hub repository will be set to private. | bool | bool |
| always_push | If the value is False, the Trainer will skip checkpoint pushing when the previous push is not complete. | bool | bool |
set_optimizer
Sets all parameters related to the optimizer and its hyperparameters.
Prototype
def set_optimizer(
name: Union[str, OptimizerNames],
learning_rate: float = 5e-5,
weight_decay: float = 0,
beta1: float = 0.9,
beta2: float = 0.999,
epsilon: float = 1e-8,
**kwargs,
)
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| name | Optimizer type | Union[str, OptimizerNames] | Union[str, OptimizerType] |
| learning_rate | Initial learning rate | float | float |
| lr_end | End learning rate. | Not supported. | float |
| weight_decay | Weight decay | float | float |
| beta1 | beta1 hyperparameter of the Adam optimizer or its variants | float | float |
| beta2 | beta2 hyperparameter of the Adam optimizer or its variants | float | float |
| epsilon | epsilon hyperparameter of the Adam optimizer or its variants | float | float |
| kwargs["args"] | Optional parameter passed to AnyPrecisionAdamW (valid only when optim="adamw_anyprecision"). The default value is None. | str | Not supported. |
set_lr_scheduler
Sets all parameters related to the learning rate scheduler and its hyperparameters.
Prototype
def set_lr_scheduler(
name: Union[str, SchedulerType] = "linear",
num_epochs: float = 3.0,
max_steps: int = -1,
warmup_ratio: float = 0,
warmup_steps: int = 0,
)
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| name | Type of the learning rate scheduler | Union[str, SchedulerType] | Union[str, LrSchedulerType] |
| num_epochs | Total number of training epochs | float | float |
| max_steps | Maximum number of training steps | int | Not supported. |
| warmup_ratio | Ratio of total training steps used for a linear warmup from 0 to learning_rate. | float | float |
| warmup_steps | Number of steps used for a linear warmup from 0 to learning_rate. | int | int |
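A sketch combining set_optimizer and set_lr_scheduler (assuming the openmind package is installed; the values are illustrative):

```python
from openmind import TrainingArguments

args = TrainingArguments(output_dir="./output")
# Optimizer hyperparameters; values are illustrative.
args.set_optimizer(name="adamw_torch", learning_rate=5e-5, weight_decay=0.01)
# Cosine schedule with 5% of total steps used for linear warmup.
args.set_lr_scheduler(name="cosine", num_epochs=3.0, warmup_ratio=0.05)
```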
set_dataloader
Sets a data loader.
Prototype
def set_dataloader(
train_batch_size: int = 8,
eval_batch_size: int = 8,
drop_last: bool = False,
num_workers: int = 0,
ignore_data_skip: bool = False,
sampler_seed: Optional[int] = None,
**kwargs,
)
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| train_batch_size | Training batch size | int | int |
| eval_batch_size | Evaluation batch size | int | int |
| drop_last | Whether to drop the last incomplete batch | bool | bool |
| num_workers | Number of subprocesses used for data loading | int | int |
| ignore_data_skip | Whether to skip the batches and epochs during training resumption to get the data loading at the same stage as in the previous training | bool | bool |
| sampler_seed | Random seed for data sampler | int | int |
| kwargs["pin_memory"] | Whether to pin memory in the data loader. The default value is True. | bool | Not supported. |
| kwargs["persistent_workers"] | If the value is True, the data loader does not close the worker process after a dataset has been consumed once. This allows the dataset instance of the worker process to be active. The training may be accelerated, but the RAM usage increases. The default value is False. | bool | Not supported. |
| kwargs["auto_find_batch_size"] | Automatically finds a batch size that will fit into memory. accelerate needs to be installed. The default value is False. | bool | Not supported. |
| kwargs["prefetch_factor"] | Number of batches that each process loads in advance. | int | Not supported. |
openmind.Trainer Class
The Trainer class is used to implement functions such as model training, evaluation, and inference. It is the core component of training and provides many methods and functions to manage the entire training process, including data loading, model forward propagation, loss calculation, and gradient update.
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| args | Arguments used to configure datasets, hyperparameters, and optimizers | TrainingArguments | TrainingArguments |
| task | Task type | Not supported. | str |
| model | Model instance used for training, evaluation, or prediction | Union[PreTrainedModel, torch.nn.Module] | Union[mindformers.models.PreTrainedModel, str] |
| model_name | Model name. | Not supported. | str |
| pet_method | PET method name. | Not supported. | str |
| tokenizer | Tokenizer | PreTrainedTokenizerBase | mindformers.models.PreTrainedTokenizerBase |
| train_dataset | Training dataset | Dataset | Union[str, mindspore.dataset.BaseDataset] |
| eval_dataset | Evaluation dataset | Union[Dataset, Dict[str, Dataset]] | Union[str, mindspore.dataset.BaseDataset] |
| data_collator | Function for batch data processing | DataCollator | Not supported. |
| image_processor | Processor for image pre-processing. | Not supported. | mindformers.models.BaseImageProcessor |
| audio_processor | Processor for audio pre-processing. | Not supported. | mindformers.models.BaseAudioProcessor |
| optimizers | Optimizer | Tuple[torch.optim.Optimizer, torch.optim.lr_scheduler.LambdaLR] | mindspore.nn.Optimizer |
| compute_metrics | Function for computing metrics during evaluation | Callable[[EvalPrediction], Dict] | Union[dict, set] |
| callbacks | Callback function list | List[TrainerCallback] | Union[List[mindspore.train.Callback], mindspore.train.Callback] |
| eval_callbacks | Evaluation callback function list. | Not supported. | Union[List[mindspore.train.Callback], mindspore.train.Callback] |
| model_init | Function that instantiates the model to be used | Callable[[], PreTrainedModel] | Not supported. |
| preprocess_logits_for_metrics | Function that preprocesses the output results before computing metrics | Callable[[torch.Tensor, torch.Tensor], torch.Tensor] | Not supported. |
| save_config | Saves the configuration of the current task. | Not supported. | bool |
train
Performs training steps.
Prototype
def train(*args, **kwargs)
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| train_checkpoint | Checkpoint used to restore the network weights for training | Not supported. | Union[str, bool] |
| resume_from_checkpoint | Preloaded weights | Union[str, bool] | Union[str, bool] |
| trial | Trial run or hyperparameter dictionary for hyperparameter search | Union["optuna.Trial", Dict[str, Any]] | Not supported. |
| ignore_keys_for_eval | List of keys in the model output that should be ignored when used to collect evaluation predictions during training | List[str] | Not supported. |
| resume_training | Resumable training switch | Not supported. | bool |
| auto_trans_ckpt | Automatic weight transformation switch | Not supported. | bool |
| src_strategy | Distributed strategy file for preloaded weights | Not supported. | str |
| do_eval | Whether to perform evaluation during training. | Not supported. | bool |
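The resume_from_checkpoint argument accepts either a path or a boolean. The sketch below shows how such a Union[str, bool] value is commonly resolved against an output directory of checkpoint-&lt;step&gt; subfolders; the get_last_checkpoint helper and the directory-naming convention are illustrative assumptions, not part of the openmind API.

```python
import os
import re
import tempfile
from typing import Optional, Union

def get_last_checkpoint(output_dir: str) -> Optional[str]:
    """Return the checkpoint-<step> subdirectory with the highest step,
    or None if the directory holds no checkpoints (illustrative helper)."""
    pattern = re.compile(r"^checkpoint-(\d+)$")
    candidates = []
    for entry in os.scandir(output_dir):
        match = pattern.match(entry.name)
        if entry.is_dir() and match:
            candidates.append((int(match.group(1)), entry.path))
    return max(candidates)[1] if candidates else None

def resolve_resume(resume_from_checkpoint: Union[str, bool, None],
                   output_dir: str) -> Optional[str]:
    # str -> use the given path; True -> latest checkpoint in output_dir;
    # False or None -> train from scratch.
    if isinstance(resume_from_checkpoint, str):
        return resume_from_checkpoint
    if resume_from_checkpoint:
        return get_last_checkpoint(output_dir)
    return None

# Demonstration with a throwaway directory layout.
with tempfile.TemporaryDirectory() as tmp:
    for step in (100, 500, 250):
        os.makedirs(os.path.join(tmp, f"checkpoint-{step}"))
    print(os.path.basename(resolve_resume(True, tmp)))  # checkpoint-500
```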
evaluate
Performs evaluation.
Prototype
def evaluate(*args, **kwargs)
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| eval_dataset | Evaluation dataset | Union[Dataset, Dict[str, Dataset]] | Union[str, mindspore.dataset.BaseDataset, mindspore.dataset.Dataset, Iterable] |
| eval_checkpoint | Checkpoint weights used for evaluation | Not supported. | Union[str, bool] |
| ignore_keys | List of keys in the model output that should be ignored when used to collect predictions | List[str] | Not supported. |
| metric_key_prefix | Metric name prefix | str | Not supported. |
| auto_trans_ckpt | Automatic weight transformation switch | Not supported. | bool |
| src_strategy | Distributed strategy file for preloaded weights | Not supported. | str |
predict
Executes inference.
Prototype
def predict(*args, **kwargs)
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| predict_checkpoint | Checkpoint weights used for inference | Not supported. | Union[str, bool] |
| test_dataset | Inference dataset | Dataset | Not supported. |
| ignore_keys | List of keys in the model output that should be ignored when used to collect predictions | List[str] | Not supported. |
| metric_key_prefix | Metric name prefix | str | Not supported. |
| input_data | Input data for inference | Not supported. | Union[GeneratorDataset, Tensor, np.ndarray, Image, str, list] |
| batch_size | Batch size | Not supported. | int |
| auto_trans_ckpt | Automatic weight transformation switch | Not supported. | bool |
| src_strategy | Distributed strategy file for preloaded weights | Not supported. | str |
add_callback
Adds a callback function to the current callback list.
Prototype
def add_callback(callback)
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| callback | Callback to add, passed as an instance or a class | Union[type, TrainerCallback] | Union[type, mindspore.train.Callback] |
pop_callback
Deletes a callback from the current callback list and returns it. If the callback cannot be found, None is returned (no error is thrown).
Prototype
def pop_callback(callback)
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| callback | Callback to remove and return, passed as an instance or a class | Union[type, TrainerCallback] | Union[type, mindspore.train.Callback] |
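add_callback, pop_callback, and remove_callback all accept either a callback instance or a callback class (hence the Union[type, ...] signatures). Below is a minimal pure-Python sketch of those semantics, modeled on the transformers CallbackHandler convention; openmind's internal implementation may differ, and the CallbackList and LoggingCallback names are hypothetical.

```python
class CallbackList:
    """Illustrative container mimicking add/pop/remove callback semantics."""

    def __init__(self):
        self.callbacks = []

    def add_callback(self, callback):
        # A class is instantiated; an instance is stored as-is.
        self.callbacks.append(callback() if isinstance(callback, type) else callback)

    def pop_callback(self, callback):
        # Remove and return the first match; return None (no error) if absent.
        for cb in self.callbacks:
            same_instance = cb is callback
            same_class = isinstance(callback, type) and isinstance(cb, callback)
            if same_instance or same_class:
                self.callbacks.remove(cb)
                return cb
        return None

    def remove_callback(self, callback):
        # Like pop_callback, but the removed callback is discarded.
        self.pop_callback(callback)

class LoggingCallback:  # hypothetical callback class for the demo
    pass

handler = CallbackList()
handler.add_callback(LoggingCallback)           # pass the class itself
popped = handler.pop_callback(LoggingCallback)  # returns the stored instance
print(type(popped).__name__)                    # LoggingCallback
print(handler.pop_callback(LoggingCallback))    # None
```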
remove_callback
Deletes a callback function from the current callback list.
Prototype
def remove_callback(callback)
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| callback | Callback to remove, passed as an instance or a class | Union[type, TrainerCallback] | Union[type, mindspore.train.Callback] |
save_model
Saves a model so that you can reload it using the from_pretrained() method.
Prototype
def save_model(*args, **kwargs)
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| output_dir | Model saving path | str | str |
| _internal_call | Whether the method is being called internally by the Trainer. If args.push_to_hub is True and _internal_call is False (the default), the model is also pushed to the repository hub. | bool | bool |
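The interaction between args.push_to_hub and _internal_call reduces to a single predicate: a save triggers a push only when pushing is enabled and the call is not an internal one. The sketch below mirrors the transformers Trainer logic; treat it as illustrative for openmind, and the should_push helper as a hypothetical name.

```python
def should_push(push_to_hub: bool, _internal_call: bool = False) -> bool:
    # Push only for explicit (non-internal) save_model calls when
    # args.push_to_hub is enabled.
    return push_to_hub and not _internal_call

print(should_push(True))                       # True: model is pushed
print(should_push(True, _internal_call=True))  # False: internal save, no push
print(should_push(False))                      # False: pushing disabled
```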
init_hf_repo
Creates and initializes a git repository at args.hub_model_id.
Prototype
def init_hf_repo()
push_to_hub
Uploads the model and tokenizer to the args.hub_model_id repository on the Hub.
Prototype
def push_to_hub(
commit_message: Optional[str] = "End of training",
blocking: bool = True,
**kwargs,
)
Parameters
| Name | Description | PyTorch Supported Type | MindSpore Supported Type |
|---|---|---|---|
| commit_message | Commit message for the push. The default value is "End of training". | str | str |
| blocking | Whether the function should return only after the git push completes. The default value is True. | bool | bool |