
Model Inference

In machine learning and deep learning, model inference is the process of using a trained model to make predictions on new input data. The inference process usually involves the following steps:

  1. Input processing: New input data (such as images, text, or audio) is formatted or standardized so that the model can accept it.

  2. Forward propagation: The input data is propagated forward through the network or model structure. This involves operations such as weighted summation and activation functions.

  3. Output generation: The model produces output based on the forward propagation result, such as a classification label, a continuous value, or a probability distribution.

  4. Postprocessing: In some cases, the model's raw output needs further processing or conversion to make it more intuitive or to meet application requirements.
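As a toy illustration, the four steps above can be sketched in plain Python with a hypothetical bag-of-words sentiment model (this is not an openMind API; the weights, labels, and function names are made up for illustration):

```python
import math

# Toy "model": a bag-of-words sentiment classifier with fixed, made-up weights.
WEIGHTS = {"good": 1.5, "great": 2.0, "bad": -1.5, "awful": -2.0}
LABELS = ["NEGATIVE", "POSITIVE"]

def preprocess(text):
    # 1. Input processing: lowercase and tokenize the raw text.
    return text.lower().split()

def forward(tokens):
    # 2. Forward propagation: weighted summation followed by a sigmoid activation.
    score = sum(WEIGHTS.get(tok, 0.0) for tok in tokens)
    return 1.0 / (1.0 + math.exp(-score))

def generate_output(prob):
    # 3. Output generation: turn the probability into a class index.
    return int(prob >= 0.5), prob

def postprocess(index, prob):
    # 4. Postprocessing: map the index to a human-readable label.
    return {"label": LABELS[index], "score": round(prob, 4)}

def infer(text):
    tokens = preprocess(text)
    prob = forward(tokens)
    index, prob = generate_output(prob)
    return postprocess(index, prob)

print(infer("This movie is great"))
```

A real pipeline performs exactly these stages, only with a neural network in place of the toy scoring function.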

All of these steps can be handled by the openMind Library pipeline, which calls AI models in an end-to-end manner. You only need a few lines of code to complete inference, greatly improving development efficiency.

The openMind Library pipeline method supports the PyTorch and MindSpore frameworks, and covers tasks in multiple domains, such as text generation, text classification, and image recognition.

This document describes how to use the pipeline to load a model and perform inference.

openMind Library Environment Setup

For details, see openMind Library Installation Guide.
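openMind Library is generally installed with pip; the command below is an assumption for a typical PyTorch setup, so verify the exact package extras and versions against the installation guide:

```shell
# Install openMind Library with the PyTorch extra (use the "ms" extra for MindSpore).
pip install openmind[pt]
```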

Basic Pipeline Usage

The pipeline supports two frameworks, PyTorch and MindSpore. When defining a pipeline, use the framework parameter to specify the framework: pt indicates PyTorch, and ms indicates MindSpore. The PyTorch framework supports two backends, transformers and diffusers; the MindSpore framework supports three backends, mindformers, mindone, and mindnlp. The backend is passed through the backend parameter.

In openMind Library, each inference task in each framework has its own pipeline method. For example, in the PyTorch framework, a text-to-audio task can be implemented with the TextToAudioPipeline method. To streamline operations, a common pipeline method is provided that loads the appropriate pipeline for a task.

Supported Frameworks

The pipeline supports the following frameworks:

  • PyTorch: pt is used as the value of framework.
  • MindSpore: ms is used as the value of framework.

Supported Backends

In addition, different frameworks support different backends.

  • The PyTorch framework supports the following two backends:

    • transformers
    • diffusers
  • The MindSpore framework supports the following three backends:

    • mindformers
    • mindnlp
    • mindone

These backend types can be specified by the backend parameter.

Example of Using the Pipeline

You can configure task, model, framework, and backend to load the model of the corresponding framework and task.

  1. Text generation task based on transformers in the PyTorch framework:

    python
    from openmind import pipeline
    
    pipe = pipeline(
        task="text-generation",
        model="Baichuan/Baichuan2_7b_chat_pt",
        framework="pt",
        backend="transformers",
        trust_remote_code=True,
        device="npu:0",
    )
    output = pipe("Give three tips for staying healthy.")
    print(output)
     
    '''
    Output:
    1. Eat a balanced diet: Ensure that your diet includes a mix of fruits, vegetables, whole grains, lean proteins, and healthy fats. This will provide your body with the essential nutrients it needs to function properly.
    2. Stay hydrated: Drink plenty of water throughout the day to help flush out toxins and maintain proper body functions. Avoid drinking too much sugar-sweetened or caffeinated beverages as these can lead to dehydration.
    3. Be active: Aim to get at least 150 minutes of moderate-intensity aerobic activity or 75 minutes of vigorous-intensity aerobic activity per week, along with muscle-strengthening activities on two or more days per week. This will help you maintain a healthy weight, improve cardiovascular health, and reduce the risk of chronic diseases.
    '''
    
  2. Text-to-image generation task based on diffusers in the PyTorch framework:

    python
    from openmind import pipeline
    from PIL import Image
    
    pipe = pipeline(
        task="text-to-image",
        model="PyTorch-NPU/stable-diffusion-xl-base-1_0",
        framework="pt",
        backend="diffusers",
        device="npu:0",
    )
    image = pipe("masterpiece, best quality, Cute dragon creature, pokemon style, night, moonlight, dim lighting")
    image.save("diffusers.png")
    

    [Image: text-to-image output generated by the diffusers backend]

  3. Text generation task based on mindformers in the MindSpore framework:

    python
    from openmind import pipeline
    import mindspore as ms
    
    ms.set_context(mode=0, device_id=0, jit_level='o0', infer_boost='on', max_device_memory='59GB') 
    
    pipe = pipeline(task="text-generation",
                    model='MindSpore-Lab/qwen1_5_7b',
                    framework='ms',
                    model_kwargs={"use_past": True},
                    trust_remote_code=True)
    outputs = pipe("Give me some advice on how to stay healthy.")
    print(outputs)
    
  4. Text generation task based on mindnlp in the MindSpore framework:

    python
    from openmind import pipeline
    
    generator = pipeline(
        task="text-generation",
        model="AI-Research/Qwen2-7B",
        framework="ms",
        backend="mindnlp",
    )
    outputs = generator("Give me some advice on how to stay healthy.")
    print(outputs)
    
  5. Text-to-image generation task based on mindone in the MindSpore framework:

    python
    from openmind import pipeline
    import mindspore
    
    pipe = pipeline(
        "text-to-image",
        model="AI-Research/stable-diffusion-3-medium-diffusers",
        backend="mindone",
        framework="ms",
        mindspore_dtype=mindspore.float16,
    )
    image = pipe("Astronaut in a jungle, cold color palette, muted colors, detailed, 8k")[0][0]
    image.save("mindone.png")
    

SiliconDiff Inference Acceleration

Introduction to SiliconDiff

SiliconDiff is a diffusion model acceleration library developed by SiliconFlow. Based on the leading diffusion model acceleration technology, SiliconDiff aims to provide high-performance text-to-image solutions by combining top hardware resources in China, such as Ascend chips.

SiliconDiff Acceleration Principles

SiliconDiff is based on the torch compile + torch_npu technical solution. It supports operator fusion, redundant-computation elimination, and JIT optimization in a customized compiler backend. It also supports dynamic shapes, so no extra compilation overhead is incurred when shapes change.

[Image: SiliconDiff acceleration architecture]

Using SiliconDiff

For tasks on the diffusers backend, you can set the use_silicondiff parameter to accelerate inference and improve inference performance.

python
from openmind import pipeline
import torch

generator = pipeline(task="text-to-image", 
                     model="PyTorch-NPU/stable-diffusion-xl-base-1_0", 
                     device="npu:0",
                     torch_dtype=torch.float16,
                     use_silicondiff=True,
                     )

image = generator(prompt="masterpiece, best quality, Cute dragon creature, pokemon style, night, moonlight, dim lighting",)

The matching versions of PyTorch and silicondiff_npu are as follows:

| PyTorch Version | silicondiff_npu Version |
| --- | --- |
| 2.1.0 | 2.1.0.post3 |

Model Support and Performance Improvement

The following table lists the supported models and their performance improvement after SiliconDiff is enabled on the server.

| Model | SiliconDiff Disabled | SiliconDiff Enabled | Performance Improvement |
| --- | --- | --- | --- |
| Stable Diffusion v1.5 | 4.20s | 3.80s | 10.62% |
| SD-XL 1.0-base | 9.11s | 8.35s | 9.13% |
| Stable Diffusion v2.1 | 3.90s | 3.46s | 12.64% |

Lossless Accuracy

The following figure shows the images generated before and after SiliconDiff is used.

Stable Diffusion v1.5

[Images: Diffusers + Torch-NPU vs. Diffusers + SiliconDiff-NPU]

SD-XL 1.0-base

[Images: Diffusers + Torch-NPU vs. Diffusers + SiliconDiff-NPU]

Default Loading

Note that if some parameters are not specified, the pipeline falls back to defaults based on the parameters that are provided.

  • If only the task parameter is specified, the pipeline is loaded based on the default framework, backend, and model.
  • If only the task and framework are specified, the pipeline is loaded based on the default backend and model.
  • If only the task, framework, and backend are specified, the pipeline is loaded based on the default model.

For details about the default framework, backend, and model of each inference task, see Inference Tasks and Parameters Supported by Pipeline.
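The fallback behavior described above can be sketched as a small lookup. This is a hypothetical resolver written purely for illustration (the defaults shown come from the tables in this document, not from openMind's actual implementation):

```python
# Default framework, backend, and model per task (subset of the tables in this
# document; illustrative only, not openMind's internal data structure).
DEFAULTS = {
    "text-generation": {
        "framework": "pt",
        "backend": {"pt": "transformers", "ms": "mindformers"},
        "model": {
            ("pt", "transformers"): "Baichuan/Baichuan2_7b_chat_pt",
            ("ms", "mindformers"): "MindSpore-Lab/qwen1_5_7b",
            ("ms", "mindnlp"): "AI-Research/Qwen2-7B",
        },
    },
}

def resolve(task, framework=None, backend=None, model=None):
    # Fill in whichever of framework, backend, and model were not specified.
    entry = DEFAULTS[task]
    framework = framework or entry["framework"]
    backend = backend or entry["backend"][framework]
    model = model or entry["model"][(framework, backend)]
    return framework, backend, model

# Only task given: framework, backend, and model all fall back to defaults.
print(resolve("text-generation"))
# task + framework given: backend and model fall back to the framework's defaults.
print(resolve("text-generation", framework="ms"))
```

Each unspecified level is resolved from the level above it, which is why specifying only task is enough to load a working pipeline.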

When using pipeline, you can search openMind Models for a model that meets your requirements. If no suitable model is found, you can train or fine-tune a model yourself. You are encouraged to upload trained or fine-tuned models to openMind Models to share them with other developers. For details about how to upload a model, see Model Sharing.

Pipeline Parameters

Key Parameters

framework

pipeline supports the PyTorch (pt) and MindSpore (ms) frameworks, which are specified by the framework parameter. The following is a pipeline example running on the MindSpore framework:

python
from openmind import pipeline

text_pipeline_ms = pipeline(task="text-generation", model="MindSpore-Lab/baichuan2_7b_chat", framework='ms')
output = text_pipeline_ms("hello!")

backend

The PyTorch framework supports two backends: transformers and diffusers. The MindSpore framework supports three backends: mindformers, mindone, and mindnlp. The backend is specified by the backend parameter.

  • The following is a pipeline example that runs on the MindSpore framework with the backend specified as mindnlp:
python
from openmind import pipeline

text_pipeline_ms = pipeline(task="text-generation", model="AI-Research/Qwen2-7B", framework='ms', backend="mindnlp")
output = text_pipeline_ms("Give me some advice on how to stay healthy.")

device

You can use the device parameter to specify the processor on which the inference task runs. Currently, two types of processors are supported: CPU and NPU. If the device parameter is not specified, pipeline automatically selects a processor. Both processor types work with the PyTorch and MindSpore frameworks. The following are examples using different processors:

  • CPU
python
generator = pipeline(task="text-generation", device="cpu")
  • NPU
python
# PyTorch
generator = pipeline(task="text-generation", device="npu:0")

model and tokenizer

In addition to a model repository address, the model parameter can also accept an instantiated model object for inference. When an instantiated model object is passed through the model parameter, an instantiated tokenizer must also be passed through the tokenizer parameter.

python
from openmind import pipeline
from openmind import AutoModelForSequenceClassification, AutoTokenizer

# Create a model object and perform inference.
model = AutoModelForSequenceClassification.from_pretrained("PyTorch-NPU/distilbert_base_uncased_finetuned_sst_2_english")
tokenizer = AutoTokenizer.from_pretrained("PyTorch-NPU/distilbert_base_uncased_finetuned_sst_2_english")
text_classifier = pipeline(task="text-classification", model=model, tokenizer=tokenizer, framework="pt")

outputs = text_classifier("This is great !")
# [{'label': 'POSITIVE', 'score': 0.9998694658279419}]

use_silicondiff

For tasks on the diffusers backend, you can set the use_silicondiff parameter to accelerate inference and improve inference performance.

python
from openmind import pipeline
import torch

generator = pipeline(task="text-to-image", 
                     model="PyTorch-NPU/stable-diffusion-xl-base-1_0", 
                     device="npu:0",
                     torch_dtype=torch.float16,
                     use_silicondiff=True,
                     )

image = generator("masterpiece, best quality, Cute dragon creature, pokemon style, night, moonlight, dim lighting")

Specific Parameters

pipeline provides task-specific parameters for model inference, which can be configured separately. For a text generation task, you can specify parameters such as max_new_tokens and num_beams to control the length of the generated text and the beam size.

python
from openmind import pipeline

# Set task-specific parameters.
params = {
    "max_new_tokens": 50,  # Limit the text length to be generated to 50 tokens.
    "num_beams": 5  # Use the beam search algorithm to generate text. The beam size is 5.
}

text_generator = pipeline("text-generation", device="npu:0", trust_remote_code=True, **params)
generated_text = text_generator("Once upon a time,")
print(generated_text)

'''
Output:
Once upon a time, there was a small village nestled between two mountains. The villagers lived simple lives, working the land and taking care of their families. One day, a stranger arrived in the village. He was a wise old man with a long white beard and a ro
'''

Full Parameter Reference

For details about all pipeline parameters, see Pipeline APIs.

Inference Tasks and Parameters Supported by Pipeline

The following table lists the default framework of each pipeline task, the default backend of each framework, and the supported backend types.

| Task Name | Default Framework | Default Backend of PyTorch | Default Backend of MindSpore | Supported Backends |
| --- | --- | --- | --- | --- |
| text-classification | PyTorch | transformers | - | transformers |
| text-to-image | PyTorch | diffusers | mindone | diffusers, mindone |
| visual-question-answering | PyTorch | transformers | - | transformers |
| zero-shot-object-detection | PyTorch | transformers | - | transformers |
| zero-shot-classification | PyTorch | transformers | - | transformers |
| depth-estimation | PyTorch | transformers | - | transformers |
| image-to-image | PyTorch | transformers | - | transformers |
| mask-generation | PyTorch | transformers | - | transformers |
| text-generation | PyTorch | transformers | mindformers | transformers, mindformers, mindnlp |
| zero-shot-image-classification | PyTorch | transformers | - | transformers |
| feature-extraction | PyTorch | transformers | - | transformers |
| image-classification | PyTorch | transformers | - | transformers |
| image-to-text | PyTorch | transformers | - | transformers |
| text2text-generation | PyTorch | transformers | - | transformers |
| token-classification | PyTorch | transformers | - | transformers |
| fill-mask | PyTorch | transformers | - | transformers |
| question-answering | PyTorch | transformers | - | transformers |
| summarization | PyTorch | transformers | - | transformers |
| table-question-answering | PyTorch | transformers | - | transformers |
| translation | PyTorch | transformers | - | transformers |

Inference Tasks and Models Supported by PyTorch

transformers backend

| Task Name | Default Model |
| --- | --- |
| text-classification | "PyTorch-NPU/distilbert_base_uncased_finetuned_sst_2_english" |
| text-generation | "Baichuan/Baichuan2_7b_chat_pt" |
| question-answering | "PyTorch-NPU/roberta_base_squad2" |
| table-question-answering | "PyTorch-NPU/tapas_base_finetuned_wtq" |
| fill-mask | "PyTorch-NPU/bert_base_uncased" |
| summarization | "PyTorch-NPU/bart_large_cnn" |
| zero-shot-image-classification | "PyTorch-NPU/siglip_so400m_patch14_384" |
| feature-extraction | "PyTorch-NPU/xlnet_base_cased" |
| depth-estimation | "PyTorch-NPU/dpt_large" |
| image-classification | "PyTorch-NPU/beit_base_patch16_224" |
| image-to-image | "PyTorch-NPU/swin2SR_classical_sr_x2_64" |
| image-to-text | "PyTorch-NPU/blip-image-captioning-large" |
| mask-generation | "PyTorch-NPU/sam_vit_base" |
| text2text-generation | "PyTorch-NPU/flan_t5_base" |
| zero-shot-classification | "PyTorch-NPU/deberta_v3_large_zeroshot_v2.0" |
| zero-shot-object-detection | "PyTorch-NPU/owlvit_base_patch32" |
| token-classification | "PyTorch-NPU/camembert_ner" |
| translation | "PyTorch-NPU/t5_base" |
| visual-question-answering | "PyTorch-NPU/blip_vqa_base" |

diffusers backend

| Task Name | Default Model |
| --- | --- |
| text-to-image | "PyTorch-NPU/stable-diffusion-xl-base-1_0" |

Inference Tasks Supported by MindSpore

mindformers backend

| Task Name | Default Model |
| --- | --- |
| text-generation | "MindSpore-Lab/qwen1_5_7b" |

[Note] When mindformers is used for MindSpore model inference, the device memory must be greater than or equal to 64 GB.

mindnlp backend

| Task Name | Default Model |
| --- | --- |
| text-generation | "AI-Research/Qwen2-7B" |

mindone backend

| Task Name | Default Model |
| --- | --- |
| text-to-image | "AI-Research/stable-diffusion-3-medium-diffusers" |