What is an MLflow Model?
A format for packaging machine learning models.
MLflow model deployment workflow
- Define the MLmodel file
- Train, evaluate, and customize the model
- Deploy the model (a minimal end-to-end sketch follows this list)
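A minimal sketch of this workflow, assuming a scikit-learn classifier and local tracking; the artifact path "my_model" and the toy data are illustrative only:
import mlflow
import mlflow.sklearn
from sklearn.linear_model import LogisticRegression

# Stand-in for the real training and evaluation step
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
model = LogisticRegression().fit(X, y)

# Logging the model writes the MLmodel file and environment files automatically
with mlflow.start_run() as run:
    mlflow.sklearn.log_model(model, "my_model")
    print(run.info.run_id)

# The logged model can then be served locally, for example:
# mlflow models serve -m runs:/<run_id>/my_model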
More details
- Storage format
- Defined by the MLmodel file
- Serve the model
mlflow models serve -m my_model
- Deploy to AWS SageMaker
mlflow deployments create -t sagemaker -m my_model [other options]
- Fields that can be set in the MLmodel file
- time_created
- run_id
- signature
- input_example
- databricks_runtime
- mlflow_version
- Additional logged files
- To reproduce the environment, the following files are logged automatically each time a model is logged (see the sketch after this file list):
- conda.yaml
- python_env.yaml
- requirements.txt
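A small sketch of what gets written, assuming mlflow.sklearn.save_model is used to save a toy model to a local directory (names are illustrative; the exact file set can vary by MLflow version):
import os
import mlflow.sklearn
from sklearn.linear_model import LogisticRegression

model = LogisticRegression().fit([[0.0], [1.0]], [0, 1])

# Saving (or logging) the model writes the environment files next to MLmodel
mlflow.sklearn.save_model(model, "env_demo_model")
print(sorted(os.listdir("env_demo_model")))
# Typically: MLmodel, conda.yaml, model.pkl, python_env.yaml, requirements.txt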
- Model signature
- Column-based signatures are supported
- Tensor-based signatures are supported
- Model input example
- Provides an instance of a valid model input
- Model API (see the sketch after this list)
- add_flavor
- save
- log
- load
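A sketch of these operations through the sklearn flavor module; the paths and toy model are examples only, and the same pattern applies to the other flavor modules:
import numpy as np
import mlflow.pyfunc
import mlflow.sklearn
from sklearn.linear_model import LogisticRegression

model = LogisticRegression().fit([[0.0], [1.0]], [0, 1])

# save: write the model to a local directory (log records it under a run instead)
mlflow.sklearn.save_model(model, "saved_model")

# load: read the model back in its native flavor, or generically via pyfunc
native = mlflow.sklearn.load_model("saved_model")
generic = mlflow.pyfunc.load_model("saved_model")
print(generic.predict(np.array([[0.5]])))
add_flavor is the lower-level call on mlflow.models.Model that flavor implementations use when constructing the MLmodel metadata; it is typically not called directly when working with the built-in flavors.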
- Built-in model flavors (a pyfunc loading sketch follows this list)
- python_function
- crate
- h2o
- keras
- mleap
- pytorch
- sklearn
- spark
- tensorflow
- onnx
- gluon
- xgboost
- lightgbm
- catboost
- spacy
- fastai
- statsmodels
- prophet
- pmdarima
- diviner
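Every flavor above that also writes a python_function entry can be loaded and scored through the same generic pyfunc interface, as in this sketch (the model directory "my_model" and the column names are assumptions):
import pandas as pd
import mlflow.pyfunc

# Load through the generic pyfunc flavor, regardless of the training library
model = mlflow.pyfunc.load_model("my_model")

# predict typically takes a pandas DataFrame (numpy arrays also work for tensor models)
print(model.predict(pd.DataFrame({"a": [1.0], "b": [2.0], "c": [3.0]})))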
- Deploying models
- Deploy as a REST API
- Endpoint definitions
- /ping for health checks
- /health (same as /ping)
- /version to get the MLflow version
- /invocations for scoring
- Request examples (a Python equivalent follows these curl commands)
curl http://127.0.0.1:5000/invocations -H 'Content-Type: application/json' -d '{"columns": ["a", "b", "c"], "data": [[1, 2, 3], [4, 5, 6]]}'
curl http://127.0.0.1:5000/invocations -H 'Content-Type: application/json; format=pandas-records' -d '[{"a": 1, "b": 2, "c": 3}, {"a": 4, "b": 5, "c": 6}]'
curl http://127.0.0.1:5000/invocations -H 'Content-Type: application/json' -d '{"instances": [{"a": "s1", "b": 1, "c": [1, 2, 3]}, {"a": "s2", "b": 2, "c": [4, 5, 6]}, {"a": "s3", "b": 3, "c": [7, 8, 9]}]}'
curl http://127.0.0.1:5000/invocations -H 'Content-Type: application/json' -d '{"inputs": {"a": ["s1", "s2", "s3"], "b": [1, 2, 3], "c": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]}}'
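The same scoring call from Python, assuming the server started by mlflow models serve is listening on 127.0.0.1:5000 and accepts the pandas split payload used in the first curl example (the accepted payload keys depend on the MLflow version); this uses the requests library:
import requests

# Health check
print(requests.get("http://127.0.0.1:5000/ping").status_code)

# Score two rows in the pandas "split" orientation
payload = {"columns": ["a", "b", "c"], "data": [[1, 2, 3], [4, 5, 6]]}
resp = requests.post("http://127.0.0.1:5000/invocations", json=payload)
print(resp.status_code, resp.text)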
- Serving with MLServer (experimental)
- Serve an MLflow model with MLServer
mlflow models serve -m my_model --enable-mlserver
- Build a Docker image that uses MLServer
mlflow models build-docker -m my_model --enable-mlserver -n my-model
Directory layout
my_model/
├── MLmodel
├── model.pkl
├── conda.yaml
├── python_env.yaml
└── requirements.txt
The MLmodel file
time_created: 2018-05-25T17:28:53.35
flavors:
  sklearn:
    sklearn_version: 0.19.1
    pickled_model: model.pkl
  python_function:
    loader_module: mlflow.sklearn
Model signature examples
Column-based
signature:
  inputs: '[{"name": "sepal length (cm)", "type": "double"}, {"name": "sepal width
    (cm)", "type": "double"}, {"name": "petal length (cm)", "type": "double"}, {"name":
    "petal width (cm)", "type": "double"}]'
  outputs: '[{"type": "integer"}]'
Tensor-based
signature:
  inputs: '[{"name": "images", "dtype": "uint8", "shape": [-1, 28, 28, 1]}]'
  outputs: '[{"shape": [-1, 10], "dtype": "float32"}]'
Logging models with signatures
Column-based
import pandas as pd
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
import mlflow
import mlflow.sklearn
from mlflow.models.signature import infer_signature
iris = datasets.load_iris()
iris_train = pd.DataFrame(iris.data, columns=iris.feature_names)
clf = RandomForestClassifier(max_depth=7, random_state=0)
clf.fit(iris_train, iris.target)
signature = infer_signature(iris_train, clf.predict(iris_train))
mlflow.sklearn.log_model(clf, "iris_rf", signature=signature)
from mlflow.models.signature import ModelSignature
from mlflow.types.schema import Schema, ColSpec
input_schema = Schema([
    ColSpec("double", "sepal length (cm)"),
    ColSpec("double", "sepal width (cm)"),
    ColSpec("double", "petal length (cm)"),
    ColSpec("double", "petal width (cm)"),
])
output_schema = Schema([ColSpec("long")])
signature = ModelSignature(inputs=input_schema, outputs=output_schema)
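As in the inferred-signature example above, the hand-built signature is then passed to log_model, e.g. mlflow.sklearn.log_model(clf, "iris_rf", signature=signature), so that it is recorded in the MLmodel file.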
Tensor-based
from keras.datasets import mnist
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dense, Flatten
from keras.optimizers import SGD
import mlflow
import mlflow.keras
from mlflow.models.signature import infer_signature
(train_X, train_Y), (test_X, test_Y) = mnist.load_data()
trainX = train_X.reshape((train_X.shape[0], 28, 28, 1))
testX = test_X.reshape((test_X.shape[0], 28, 28, 1))
trainY = to_categorical(train_Y)
testY = to_categorical(test_Y)
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(10, activation='softmax'))
opt = SGD(learning_rate=0.01, momentum=0.9)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(trainX, trainY, epochs=10, batch_size=32, validation_data=(testX, testY))
signature = infer_signature(testX, model.predict(testX))
mlflow.keras.log_model(model, "mnist_cnn", signature=signature)
import numpy as np
from mlflow.models.signature import ModelSignature
from mlflow.types.schema import Schema, TensorSpec
input_schema = Schema([
    TensorSpec(np.dtype(np.uint8), (-1, 28, 28, 1)),
])
output_schema = Schema([TensorSpec(np.dtype(np.float32), (-1, 10))])
signature = ModelSignature(inputs=input_schema, outputs=output_schema)
Model input examples
Column-based
input_example = {
"sepal length (cm)": 5.1,
"sepal width (cm)": 3.5,
"petal length (cm)": 1.4,
"petal width (cm)": 0.2
}
mlflow.sklearn.log_model(..., input_example=input_example)
Tensor-based
# each input has shape (4, 4)
input_example = np.array([
[[ 0, 0, 0, 0],
[ 0, 134, 25, 56],
[253, 242, 195, 6],
[ 0, 93, 82, 82]],
[[ 0, 23, 46, 0],
[ 33, 13, 36, 166],
[ 76, 75, 0, 255],
[ 33, 44, 11, 82]]
], dtype=np.uint8)
mlflow.keras.log_model(..., input_example=input_example)