Organize your machine learning projects with MLflow Projects

MLflow Projects

What are MLflow Projects?

A convention for organizing and describing your code so it can be run in a reusable, reproducible way.

MLflow Projects Introduction Flow

  1. Choose your environment
  2. Write the environment configuration file
  3. Run the project
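
As a minimal sketch of step 3, assuming a local project directory that already contains an MLproject file and an environment file (the regularization parameter matches the MLproject example later in this document):

# Run the project's main entry point from the project root,
# overriding one declared parameter
mlflow run . -P regularization=0.2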

Learn more

  • Describe your project in more detail by adding an MLproject file, a YAML-formatted text file, to the project root
  • Conda, Virtualenv, and Docker container environments are supported
  • Conda environment
    • Supports native (non-Python) libraries such as CuDNN and Intel MKL
    • You can specify the Conda environment for an MLflow project by including a conda.yaml file in the root of the project directory or by adding a conda_env entry to the MLproject file
  • Virtualenv environment
    • The specified Python version is downloaded with pyenv
    • virtualenv creates and activates the execution environment
    • You can specify the Virtualenv environment for an MLflow project by adding a python_env entry to the MLproject file that points to a python_env.yaml file
  • Docker container environment
    • Using Docker containers, you can capture non-Python dependencies (such as Java libraries) in the execution environment
  • You can run any project with the command-line tool (mlflow run) or the Python API (mlflow.projects.run())
  • An MLproject file can be added to the project root directory for further configuration
  • Specifying the Environment
    • Conda Environment
    • conda_env: files/config/conda_environment.yaml
    • Virtualenv environment
    • python_env: files/config/python_env.yaml
    • Docker container environment
    • docker_env:
      image: mlflow-docker-example-environment
  • Command syntax: entry-point commands are Python format strings, and declared parameters are substituted into them at run time
  • Specifying Parameters
    • You can specify the data type and default value for each parameter
    • parameter_name: data_type
    • data_type
      • string
      • float
      • path
      • uri
  • Running the Project
    • Run from the command line with mlflow run (see the example command after this list)
    • Options
      • -e, --entry-point
      • -v, --version
      • -P, --param-list
      • -A, --docker-args
      • --experiment-name
      • --experiment-id
      • -b, --backend
      • -c, --backend-config
      • --no-conda
      • --env-manager
      • --storage-dir
      • --run-id
      • --run-name
      • --skip-image-build
    • Python API: mlflow.projects.run()
  • There is also a way to run MLflow projects in Kubernetes (experimental)
  • MLflow projects that use a Docker environment can be run on Kubernetes
  • The API may change because this feature is still experimental
  • Two files are involved: kubernetes_backend.json and kubernetes_job_template.yaml (a sketch of the backend configuration file appears after Example 5 below)
  • How it works when you run an MLflow project on Kubernetes
    • MLflow builds a new Docker image containing the project's contents
    • MLflow pushes the new project image to the specified Docker registry
    • MLflow starts a Kubernetes Job on the specified Kubernetes cluster
  • Execution Guide
    • If you don't already have a Docker environment, add it to your MLflow project
    • Create a backend configuration JSON file
    • Get credentials to access your project's Docker and Kubernetes resources
    • Run the project using the MLflow Projects CLI or Python API, specifying the project URI and the path to the backend configuration file
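
As a concrete illustration of the CLI options listed above, a run command might look like the following (a sketch: the Git URI, entry point, and parameter are taken from the public mlflow/mlflow-example project and are assumptions, not part of the original notes):

# Run the "main" entry point of a project hosted in Git, override one
# parameter, record the run under a named experiment, and let conda
# manage the environment
mlflow run https://github.com/mlflow/mlflow-example.git \
  -e main \
  -P alpha=0.5 \
  --experiment-name demo-experiment \
  --env-manager conda

The same run can be launched from Python with mlflow.projects.run(), which accepts equivalent arguments such as uri, entry_point, and parameters.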

Example configuration for each environment

Conda Environment

MLproject (referencing the Conda environment file my_env.yaml)

name: My Project

conda_env: my_env.yaml
# Can have a docker_env instead of a conda_env, e.g.
# docker_env:
#    image:  mlflow-docker-example

entry_points:
  main:
    parameters:
      data_file: path
      regularization: {type: float, default: 0.1}
    command: "python train.py -r {regularization} {data_file}"
  validate:
    parameters:
      data_file: path
    command: "python validate.py {data_file}"

Virtualenv environment

python_env.yaml

python: "3.7.13"
# Dependencies required to build packages. This field is optional.
build_dependencies:
  - pip
  - setuptools
  - wheel==0.37.1
# Dependencies required to run the project.
dependencies:
  - mlflow
  - scikit-learn==1.0.2

Docker container environment

Example 1: Image

docker_env:
  image: mlflow-docker-example-environment

Example 2: Mounting a Volume and Specifying Environment Variables

docker_env:
  image: mlflow-docker-example-environment
  volumes: ["/local/path:/container/mount/path"]
  environment: [["NEW_ENV_VAR", "new_var_value"], "VAR_TO_COPY_FROM_HOST_ENVIRONMENT"]

Example 3: Image in a remote registry

docker_env:
  image: 012345678910.dkr.ecr.us-west-2.amazonaws.com/mlflow-docker-example-environment:7.0

Example 4: Using a prebuilt image

docker_env:
  image: python:3.7

mlflow run ... --skip-image-build

Example 5: Running on Kubernetes (kubernetes_job_template.yaml)

apiVersion: batch/v1
kind: Job
metadata:
  name: "{replaced with MLflow Project name}"
  namespace: mlflow
spec:
  ttlSecondsAfterFinished: 100
  backoffLimit: 0
  template:
    spec:
      containers:
      - name: "{replaced with MLflow Project name}"
        image: "{replaced with URI of Docker image created during Project execution}"
        command: ["{replaced with MLflow Project entry point command}"]
        env: ["{appended with MLFLOW_TRACKING_URI, MLFLOW_RUN_ID and MLFLOW_EXPERIMENT_ID}"]
        resources:
          limits:
            memory: 512Mi
          requests:
            memory: 256Mi
      restartPolicy: Never
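
Example 5 shows only the job template; the companion backend configuration file (kubernetes_backend.json, passed via --backend-config) is not shown above. A minimal sketch, assuming the key names described in the MLflow Kubernetes backend documentation, with placeholder values:

{
  "kube-context": "docker-for-desktop",
  "kube-job-template-path": "/path/to/kubernetes_job_template.yaml",
  "repository-uri": "username/mlflow-kubernetes-example"
}

The project would then be launched with something like mlflow run <project_uri> --backend kubernetes --backend-config /path/to/kubernetes_backend.json, using the -b/--backend and -c/--backend-config options listed earlier.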