Model Inference

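Model inference is the process, in machine learning and deep learning, of running a trained model on new input data to obtain predictions or decisions. Inference typically involves the following steps: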

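1. Input processing: format or normalize the new input data (such as images, text, or audio) so that it matches the input format the model expects.

2. Forward propagation: the input data is propagated forward through the network or model structure, which involves operations such as weighted sums over the parameters and activation functions.

3. Output generation: the model produces its output from the result of forward propagation. The output can be a class label, a continuous value, a probability distribution, and so on.

4. Post-processing: in some cases, the model's raw output needs further processing or conversion to make it easier to interpret or to fit the application's requirements.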
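The four steps above can be sketched end to end with a toy classifier in plain Python. This is only an illustration of the general flow, not how openMind implements inference: the weights, the labels, and the helper names (`preprocess`, `forward`, `generate_output`, `postprocess`) are all hypothetical.

```python
import math

# Hypothetical toy "trained" weights: 2 inputs -> 2 hidden units -> 3 classes.
W1 = [[1.0, -1.0], [-1.0, 1.0]]
B1 = [0.0, 0.0]
W2 = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
B2 = [0.0, 0.0, 0.0]
LABELS = ["cat", "dog", "bird"]


def preprocess(raw):
    """Step 1: standardize the raw input to zero mean and unit variance."""
    mean = sum(raw) / len(raw)
    std = math.sqrt(sum((x - mean) ** 2 for x in raw) / len(raw)) or 1.0
    return [(x - mean) / std for x in raw]


def dense(x, weights, biases):
    """Weighted sum of the inputs plus a bias, one value per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def forward(x):
    """Step 2: forward propagation through a ReLU hidden layer to raw logits."""
    hidden = [max(0.0, h) for h in dense(x, W1, B1)]
    return dense(hidden, W2, B2)


def generate_output(logits):
    """Step 3: turn the logits into a probability distribution with softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def postprocess(probs):
    """Step 4: map the most probable class index to a readable label."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return {"label": LABELS[best], "score": round(probs[best], 4)}


def infer(raw):
    return postprocess(generate_output(forward(preprocess(raw))))


print(infer([2.0, 4.0]))  # {'label': 'dog', 'score': 0.6652}
```

A pipeline wraps exactly this kind of chain so that the caller only supplies the raw input and receives the post-processed result.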

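The openMind Library pipeline lets you call an AI model end to end in a single step: a few lines of code are enough to run inference, which greatly improves development efficiency.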

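The openMind Library pipeline method supports both the PyTorch and MindSpore frameworks and can select a suitable framework for inference automatically. pipeline also supports tasks in multiple domains, such as text generation, text classification, and image recognition.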

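This section describes how to use pipeline to load a model and run inference, in the following parts:

openMind Library environment preparation
Basic pipeline usage
pipeline parameters
pipeline inference examples
Inference tasks currently supported by pipeline and their default models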

openMind Library environment preparation

For detailed steps, see the openMind Library installation guide.

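Basic pipeline usage

In openMind Library, each inference task has an associated pipeline class; for example, the text-to-audio task corresponds to TextToAudioPipeline. It is simpler, however, to use the general pipeline method, which covers all task-specific pipelines and automatically loads the task's default model and preprocessing class. The following uses text generation as an example to show how to use pipeline.

Set task and model to load the model for the corresponding task: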

```python
from openmind import pipeline

pipe = pipeline(task="text-generation", model="Baichuan/Baichuan2_7b_chat_pt")
output = pipe("Give three tips for staying healthy")
print(output)

'''
Output:
1. Eat a balanced diet rich in fruits, vegetables and whole grains.
2. Exercise regularly to improve your physical health.
3. Get enough sleep.
'''
```

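When using the PyTorch framework, you can also construct a pipeline instance by specifying only an inference task through the task parameter:
```python
from openmind import pipeline

# Specify the text generation task (MindSpore does not support default models yet)
pipe = pipeline(task="text-generation")

output = pipe("This is a test")
print(output)
'''
Output:
{'output': 'This is a test of the power of the Internet'}
'''
```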
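When only the task parameter is specified, pipeline loads a default model. For the default model of each inference task, see "Inference tasks currently supported by pipeline and their default models".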
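The fallback from task to default model amounts to a table lookup. The sketch below is hypothetical: `resolve_model` and `DEFAULT_MODELS_PT` are illustrative names, not openMind's internal API, and the dictionary holds only a few entries copied from the PyTorch default-model table in this document (MindSpore has no default models, so it is not covered).

```python
# Hypothetical subset of the PyTorch default-model table in this document;
# not openMind's internal registry.
DEFAULT_MODELS_PT = {
    "text-classification": "PyTorch-NPU/distilbert_base_uncased_finetuned_sst_2_english",
    "text-generation": "Baichuan/Baichuan2_7b_chat_pt",
    "fill-mask": "PyTorch-NPU/bert_base_uncased",
    "zero-shot-image-classification": "PyTorch-NPU/siglip_so400m_patch14_384",
}


def resolve_model(task=None, model=None):
    """Prefer an explicitly given model; otherwise fall back to the task's default."""
    if model is not None:
        return model
    if task is None:
        raise ValueError("Specify at least one of task or model")
    if task not in DEFAULT_MODELS_PT:
        raise ValueError(f"No default model registered for task {task!r}")
    return DEFAULT_MODELS_PT[task]


print(resolve_model(task="text-generation"))  # Baichuan/Baichuan2_7b_chat_pt
print(resolve_model(task="fill-mask", model="PyTorch-NPU/bert_large_uncased"))  # PyTorch-NPU/bert_large_uncased
```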

When using the PyTorch framework, you can also construct a pipeline instance by specifying the inference model through the model parameter:

```python
from openmind import pipeline

pipe = pipeline(model="PyTorch-NPU/bert_large_uncased", device="npu:0")

output = pipe("Hello I'm a [MASK] model.")
print("output:", output)
'''
Output:
output: [{'score': 0.1074332845234985, 'token': 4827, 'token_str': 'fashion', 'sequence': "hello i'm a fashion model."}, {'score': 0.08821374173962973, 'token': 2535, 'token_str': 'role', 'sequence': "hello i'm a role model."}, {'score': 0.05331168323755264, 'token': 2047, 'token_str': 'new', 'sequence': "hello i'm a new model."}, {'score': 0.046342361718416214, 'token': 3565, 'token_str': 'super', 'sequence': "hello i'm a super model."}, {'score': 0.027102632448072202, 'token': 2986, 'token_str': 'fine', 'sequence': "hello i'm a fine model."}]
'''
```

When only the model parameter is specified, pipeline determines the inference task automatically and instantiates the pipeline object accordingly.
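This document does not say how pipeline works out the task from a model. One plausible heuristic, shown here purely as a hypothetical sketch, is to read the `architectures` field of a model's configuration (for example `BertForMaskedLM`) and map its suffix to a task name. `infer_task` and `SUFFIX_TO_TASK` are illustrative names, not part of openMind.

```python
# Hypothetical mapping from architecture-name suffixes to pipeline task names.
SUFFIX_TO_TASK = {
    "ForMaskedLM": "fill-mask",
    "ForSequenceClassification": "text-classification",
    "ForCausalLM": "text-generation",
    "ForQuestionAnswering": "question-answering",
    "ForTokenClassification": "token-classification",
}


def infer_task(config: dict) -> str:
    """Guess the task from the architecture names listed in a model config."""
    for arch in config.get("architectures", []):
        for suffix, task in SUFFIX_TO_TASK.items():
            if arch.endswith(suffix):
                return task
    raise ValueError("Cannot infer the task from this model; pass task= explicitly")


print(infer_task({"architectures": ["BertForMaskedLM"]}))  # fill-mask
```

A heuristic like this fails for models whose configuration lists no recognizable architecture, which is why passing task explicitly remains the more predictable option.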

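When using pipeline, you can search the openMind model hub for models that fit your needs. If no suitable model exists, you can train or fine-tune one yourself. We encourage you to upload trained or fine-tuned models to the openMind model hub to share them with other developers; for upload instructions, see Model sharing.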

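pipeline parameters

Key parameters

framework

pipeline supports two frameworks, PyTorch (pt) and MindSpore (ms), selected through the framework parameter. If only one framework is installed and you do not set this parameter, the framework is chosen automatically from the current environment. If both frameworks are installed, you must set it. The following pipeline instance runs on MindSpore: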

```python
from openmind import pipeline

text_pipeline_ms = pipeline(task="text-generation", model="MindSpore-Lab/baichuan2_7b_chat", trust_remote_code=True, framework='ms')
output = text_pipeline_ms("hello!")
print(output)
```
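The framework selection rule described above can be expressed as a small decision function. This is a hypothetical sketch of the rule, not openMind's code: `select_framework` and `FRAMEWORK_PACKAGES` are illustrative names, and the assumption that the rule checks whether the `torch` and `mindspore` packages are importable is ours. `importlib.util.find_spec` is the standard-library way to test whether a package is installed without importing it.

```python
import importlib.util

# Hypothetical mapping from framework code to the package that provides it.
FRAMEWORK_PACKAGES = {"pt": "torch", "ms": "mindspore"}


def installed_frameworks():
    """Return the framework codes whose packages are importable here."""
    return {code for code, pkg in FRAMEWORK_PACKAGES.items()
            if importlib.util.find_spec(pkg) is not None}


def select_framework(framework=None, available=None):
    """Use the explicit choice, else the only installed framework."""
    available = installed_frameworks() if available is None else set(available)
    if framework is not None:
        if framework not in available:
            raise ValueError(f"framework={framework!r} is not installed")
        return framework
    if not available:
        raise RuntimeError("Neither PyTorch nor MindSpore is installed")
    if len(available) == 1:
        return next(iter(available))  # exactly one framework: pick it automatically
    raise ValueError("Both frameworks are installed; pass framework='pt' or framework='ms'")


print(select_framework(available={"pt"}))  # pt
print(select_framework("ms", available={"pt", "ms"}))  # ms
```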

device

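The device parameter specifies the processor on which inference runs; CPU and NPU are currently supported. If device is not specified, pipeline selects a processor automatically. Either processor type works with both the PyTorch and the MindSpore framework. The following examples run on each processor type: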

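On CPU:

```python
from openmind import pipeline

generator = pipeline(task="text-generation", device="cpu")
```

On NPU:

```python
# PyTorch
from openmind import pipeline

generator = pipeline(task="text-generation", device="npu:0")
```

```python
# MindSpore (this example selects the NPU through device_id)
from openmind import pipeline

generator = pipeline(task="text-generation", model="MindSpore-Lab/baichuan2_7b_chat", trust_remote_code=True, device_id=0)
```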
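Automatic device selection can be pictured as a small resolver for PyTorch-style device strings such as `"cpu"` and `"npu:0"`. In this hypothetical sketch, `resolve_device` is an illustrative name, and the preference order (first NPU if one is present, otherwise CPU) is an assumption; this document does not state the order pipeline actually uses.

```python
import re


def resolve_device(device=None, npu_available=False):
    """Validate an explicit device string, or pick one when none is given."""
    if device is None:
        # Assumed preference order: the first NPU if present, else the CPU.
        return "npu:0" if npu_available else "cpu"
    if device == "cpu":
        return device
    if re.fullmatch(r"npu:\d+", device) is None:
        raise ValueError(f"Unsupported device {device!r}; expected 'cpu' or 'npu:<index>'")
    if not npu_available:
        raise RuntimeError("An NPU was requested but none is available")
    return device


print(resolve_device())  # cpu
print(resolve_device(npu_available=True))  # npu:0
```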

model

Besides a model path, the model parameter also accepts an instantiated model object:

```python
from openmind import pipeline
from openmind import AutoModelForSequenceClassification, AutoTokenizer

# Create the model and tokenizer objects, then pass them to pipeline for inference
model = AutoModelForSequenceClassification.from_pretrained("PyTorch-NPU/distilbert_base_uncased_finetuned_sst_2_english")
tokenizer = AutoTokenizer.from_pretrained("PyTorch-NPU/distilbert_base_uncased_finetuned_sst_2_english")
text_classifier = pipeline(task="text-classification", model=model, tokenizer=tokenizer, framework="pt", device="npu:0")

outputs = text_classifier("This is great !")
print(outputs)
# [{'label': 'POSITIVE', 'score': 0.9998695850372314}]
```

tokenizer

The tokenizer parameter lets you specify a particular tokenizer for inference:

```python
from openmind import pipeline
from openmind import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("PyTorch-NPU/chatglm2_6b", trust_remote_code=True)
# Load the model and cast it to half precision (FP16)
model = AutoModel.from_pretrained("PyTorch-NPU/chatglm2_6b", trust_remote_code=True).half()
pipe = pipeline(task="chat", model=model, tokenizer=tokenizer, framework="pt", device="npu:0")
# The chat task returns the response together with the conversation history
outputs, history = pipe("人工智能是什么", do_sample=False)  # "What is artificial intelligence?"
print(outputs)

'''
Output:
人工智能(Artificial Intelligence, AI)是一种涵盖了多个学科领域的科技领域，旨在创造出可以执行与人类智能类相似的任务的计算机程序和系统。AI通常包括机器学习、自然语言处理、计算机视觉、知识表示、推理、规划和决策等技术。其目标是使计算机能够自主地处理复杂的任务，例如识别图像和语音、自动驾驶、智能机器人、自然语言交互等。AI是一个快速发展的领域，涉及到数学、统计学、计算机科学、认知心理学、哲学等多个学科的知识。
'''
```

batch_size

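By default, pipeline does not batch inputs during inference, because batching is not necessarily faster.

If you do want batching, set the batch_size parameter so that the model's forward pass processes inputs in batches: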

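```python
from openmind import pipeline

generator = pipeline(task="text-generation", batch_size=16, device="npu:0")
# Batching only helps when several inputs are passed in a single call
prompts = ["What is artificial intelligence?", "Give three tips for staying healthy"]
outputs = generator(prompts)
print(outputs)
```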
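One common reason batching is not automatically faster: inputs in a batch are usually padded to the length of the longest one, so inputs of very different lengths waste computation on padding. The sketch below measures that waste from input lengths alone; `make_batches` and `padding_overhead` are hypothetical helpers, and it assumes padding to the longest input in each batch.

```python
def make_batches(items, batch_size):
    """Split a list of inputs into consecutive batches of at most batch_size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def padding_overhead(lengths, batch_size):
    """Fraction of computed token positions that are padding, assuming each
    batch is padded to the length of its longest input."""
    real = sum(lengths)
    padded = sum(max(batch) * len(batch) for batch in make_batches(lengths, batch_size))
    return (padded - real) / padded


print(padding_overhead([5, 5, 5, 5], batch_size=4))  # 0.0
print(padding_overhead([2, 50, 3, 4], batch_size=4))  # 0.705
```

With equal-length inputs, batching costs nothing extra; with one long outlier, about 70% of the batch is padding, so the gain from processing inputs in parallel can be eaten up.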

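Task-specific parameters

pipeline also accepts task-specific parameters that can be configured individually. For example, for text generation you can set max_length and num_beams to control the length of the generated text and the beam width used by beam search, both of which affect the result: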

```python
from openmind import pipeline

text_generator = pipeline(task="text-generation", device="npu:0")

# Task-specific generation parameters
params = {
    "max_length": 50,  # cap the total sequence length (prompt + generated text) at 50 tokens
    "num_beams": 5,  # generate with beam search using a beam width of 5
}
generated_text = text_generator("Once upon a time", **params)
print(generated_text[0]['generated_text'])
```
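To see why num_beams can change the result, here is a minimal beam search over a hypothetical toy language model (a fixed table of next-token probabilities). It is a sketch of the general technique, not the decoding code pipeline runs; `NEXT`, `beam_search`, and `max_steps` are illustrative names.

```python
import math

# Hypothetical toy language model: probability of each next token given the last token.
NEXT = {
    "<s>": {"the": 0.5, "a": 0.4, "</s>": 0.1},
    "the": {"cat": 0.4, "dog": 0.4, "</s>": 0.2},
    "a": {"cat": 0.9, "</s>": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}


def beam_search(num_beams, max_steps):
    """Keep the num_beams highest-scoring partial sequences at every step."""
    beams = [(["<s>"], 0.0)]  # (tokens, cumulative log-probability)
    finished = []
    for _ in range(max_steps):
        candidates = [
            (tokens + [tok], score + math.log(p))
            for tokens, score in beams
            for tok, p in NEXT[tokens[-1]].items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:num_beams]:
            # Sequences that emitted the end token stop growing.
            (finished if tokens[-1] == "</s>" else beams).append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # keep unfinished beams if max_steps ran out
    best_tokens, _ = max(finished, key=lambda c: c[1])
    return [t for t in best_tokens[1:] if t != "</s>"]


print(beam_search(num_beams=1, max_steps=5))  # ['the', 'cat']
print(beam_search(num_beams=2, max_steps=5))  # ['a', 'cat']
```

With one beam (greedy decoding) the search commits to "the" and ends with probability 0.2; with two beams it keeps "a" alive long enough to find "a cat", whose probability is 0.36.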

All parameters

For the full list of pipeline parameters, see the Pipeline API reference.

pipeline inference examples

pipeline can be used for inference across different kinds of tasks, as the following example shows.

Image tasks

```python
from PIL import Image
from openmind import pipeline
import requests

url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

model_id = "PyTorch-NPU/siglip_so400m_patch14_384"
image_classifier = pipeline(task="zero-shot-image-classification", model=model_id, device="npu:0")
outputs = image_classifier(image, candidate_labels=["2 cats", "a plane", "a remote"])
# Round the scores to 4 decimal places for readability
outputs = [{"score": round(output["score"], 4), "label": output["label"]} for output in outputs]
print(outputs)

'''
Output:
[{'score': 0.5088, 'label': '2 cats'}, {'score': 0.0, 'label': 'a remote'}, {'score': 0.0, 'label': 'a plane'}]
'''
```
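The scores above do not sum to 1. SigLIP is trained with a pairwise sigmoid loss rather than CLIP's softmax-based contrastive loss, so each candidate label is likely scored independently through a sigmoid. The sketch below contrasts the two normalizations on hypothetical logits for the three labels; the logit values are made up and are not the model's real outputs.

```python
import math


def softmax(logits):
    """Scores compete: they are normalized to sum to 1 across labels."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits):
    """Each score is an independent probability in (0, 1)."""
    out = []
    for z in logits:
        if z >= 0:
            out.append(1.0 / (1.0 + math.exp(-z)))
        else:  # numerically stable form for negative logits
            e = math.exp(z)
            out.append(e / (1.0 + e))
    return out


# Hypothetical logits for the labels "2 cats", "a plane", "a remote"
logits = [0.05, -12.0, -10.0]
print([round(p, 4) for p in softmax(logits)])
print([round(p, 4) for p in sigmoid(logits)])
```

Under softmax the top label would take almost all of the probability mass; under sigmoid it gets a moderate score of its own, and adding or removing other candidate labels does not change it.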

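Inference tasks currently supported by pipeline and their default models

Inference tasks and default models currently supported on PyTorch

Task name	Default model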
text-classification	"PyTorch-NPU/distilbert_base_uncased_finetuned_sst_2_english"
text-generation	"Baichuan/Baichuan2_7b_chat_pt"
question-answering	"PyTorch-NPU/roberta_base_squad2"
table-question-answering	"PyTorch-NPU/tapas_base_finetuned_wtq"
fill-mask	"PyTorch-NPU/bert_base_uncased"
summarization	"PyTorch-NPU/bart_large_cnn"
zero-shot-image-classification	"PyTorch-NPU/siglip_so400m_patch14_384"
feature-extraction	"PyTorch-NPU/xlnet_base_cased"
depth-estimation	"PyTorch-NPU/dpt_large"
image-classification	"PyTorch-NPU/beit_base_patch16_224"
image-to-image	"PyTorch-NPU/swin2SR_classical_sr_x2_64"
image-to-text	"PyTorch-NPU/blip-image-captioning-large"
mask-generation	"PyTorch-NPU/sam_vit_base"
text2text-generation	"PyTorch-NPU/flan_t5_base"
zero-shot-classification	"PyTorch-NPU/deberta_v3_large_zeroshot_v2.0"
zero-shot-object-detection	"PyTorch-NPU/owlvit_base_patch32"
token-classification	"PyTorch-NPU/camembert_ner"
translation	"PyTorch-NPU/t5_base"
visual-question-answering	"PyTorch-NPU/blip_vqa_base"

Inference tasks currently supported on MindSpore

Task name
text-generation
