You trained a machine learning model that works great on your laptop. But how do other people or applications actually use it? A model sitting in a notebook file is useless for real applications. You need to deploy it as a service that accepts requests and returns predictions.
Deploying a machine learning model as an API transforms it from an experiment into a production system that solves real problems. An API, or Application Programming Interface, lets any application send data to your model and receive predictions back. This is how production ML systems work in companies around the world.
In this tutorial, you’ll deploy your trained model as a web service using Flask, a lightweight Python web framework that’s well suited to serving ML predictions. By the end, you’ll have a working API that accepts requests over HTTP and returns predictions from your model.
Why deployment matters for ML models
Training a model is only half the work. The model becomes valuable when people can actually use it. Deployment bridges the gap between a trained model and practical application.
Without deployment, your model lives in a notebook or script that only you can run. Nobody else benefits from the work you did. Deployment makes your model accessible to web applications, mobile apps, other services, or anyone with internet access.
APIs are the standard way to deploy ML models. They provide a clean interface where you send input data and receive predictions. The API handles loading the model, preprocessing inputs, making predictions, and formatting outputs. Users don’t need to know Python or understand your model internals.
Real-world ML systems almost always use APIs. A mobile app sends an image to an API for classification. A website sends text to an API for sentiment analysis. A business application sends customer data to an API for churn prediction. APIs make ML accessible.
Setting up your Flask environment
Flask is a micro web framework for Python that’s perfect for simple APIs. It’s lightweight, easy to learn, and requires minimal boilerplate code.
Install Flask and any libraries your model needs. If you built a spam classifier, you need scikit-learn. If you built an image classifier, you need TensorFlow or PyTorch.
# Install required packages
# pip install flask numpy scikit-learn joblib

# Verify installation (flask.__version__ is deprecated as of Flask 2.3,
# so read the version from package metadata instead)
from importlib.metadata import version
print(f"Flask version: {version('flask')}")
Create a project directory with a clear structure. Put your saved model file, the Flask app, and any helper code in organized folders. Good structure makes maintenance easier.
ml_api/
├── models/
│ └── spam_classifier.pkl
├── app.py
├── requirements.txt
└── test_api.py
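The app below assumes models/spam_classifier.pkl and a matching TF-IDF vectorizer already exist. If you haven’t saved yours yet, here’s a minimal sketch using joblib; the toy data is purely illustrative, so swap in your real training set:
# Train and save a toy spam classifier so the API has something to load
import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["win a free prize now", "are we still on for lunch?"]
labels = [1, 0]  # 1 = spam, 0 = ham

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)
model = LogisticRegression().fit(X, labels)

# Save into the models/ directory from the structure above
joblib.dump(model, 'models/spam_classifier.pkl')
joblib.dump(vectorizer, 'models/tfidf_vectorizer.pkl')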
The requirements.txt file lists all packages your API needs. This makes it easy for others to install dependencies and ensures consistency across environments.
Flask==2.3.0
numpy==1.24.0
scikit-learn==1.3.0
joblib==1.3.0
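If you’re unsure which versions you have installed, you can generate this file from your current environment and then trim it to just the packages the API imports, since pip freeze lists everything installed:
pip freeze > requirements.txt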
Creating a basic Flask API
Start with a minimal Flask app that responds to requests. The basic structure includes importing Flask, creating an app instance, and defining routes that handle requests.
from flask import Flask, request, jsonify
import joblib

# Create Flask app
app = Flask(__name__)

# Load the trained model and vectorizer once at startup
model = joblib.load('models/spam_classifier.pkl')
vectorizer = joblib.load('models/tfidf_vectorizer.pkl')

# Health check endpoint
@app.route('/health', methods=['GET'])
def health():
    return jsonify({'status': 'healthy', 'model': 'loaded'})

# Prediction endpoint
@app.route('/predict', methods=['POST'])
def predict():
    try:
        # Get data from request
        data = request.get_json()
        message = data.get('message', '')

        if not message:
            return jsonify({'error': 'No message provided'}), 400

        # Preprocess and predict
        message_vector = vectorizer.transform([message])
        prediction = model.predict(message_vector)[0]
        probability = model.predict_proba(message_vector)[0]

        # Format response
        result = {
            'message': message,
            'prediction': 'spam' if prediction == 1 else 'ham',
            'confidence': float(max(probability))
        }
        return jsonify(result)
    except Exception as e:
        return jsonify({'error': str(e)}), 500

if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0', port=5000)
This creates two endpoints. The health endpoint confirms the API is running and the model loaded successfully. The predict endpoint accepts POST requests with message data and returns predictions.
The try-except block handles errors gracefully. If something goes wrong, the API returns a proper error message instead of crashing. The 400 status code indicates a bad request from the client; 500 indicates a server-side error.
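A successful response from /predict looks like this (the confidence value here is illustrative):
{
  "confidence": 0.97,
  "message": "CONGRATULATIONS! You've won a FREE iPhone! Click now!",
  "prediction": "spam"
}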
Testing your API locally
Before deploying to the cloud, test your API locally to make sure it works. Run the Flask app and it starts a development server on port 5000.
python app.py
The server runs and waits for requests. In another terminal or using a tool like curl or Postman, send test requests.
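For a quick check with curl, send a JSON payload matching what the /predict endpoint expects:
curl -X POST http://localhost:5000/predict \
  -H "Content-Type: application/json" \
  -d '{"message": "You have won a free prize!"}'
For repeated testing, a small Python script is more convenient.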
# test_api.py
import requests

# API endpoint
url = 'http://localhost:5000/predict'

# Test messages
test_messages = [
    "Hey, are we still meeting for lunch?",
    "CONGRATULATIONS! You've won a FREE iPhone! Click now!",
    "Can you send me that report?"
]

# Send requests
for message in test_messages:
    response = requests.post(
        url,
        json={'message': message},
        headers={'Content-Type': 'application/json'}
    )
    result = response.json()
    print(f"\nMessage: {message}")
    print(f"Prediction: {result['prediction']}")
    print(f"Confidence: {result['confidence']:.2%}")
This script sends POST requests to your API and prints the responses. Verify that predictions make sense and the API handles different inputs correctly.
Test error cases too. Send empty messages, malformed JSON, or requests without the message field. Your API should return appropriate error messages instead of crashing.
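A short sketch of those error-case checks, run against the app above:
# Error-case checks: the API should return an error status, never crash
import requests

url = 'http://localhost:5000/predict'

# Empty message: the app returns 400 with 'No message provided'
r = requests.post(url, json={'message': ''})
print(r.status_code, r.json())

# Missing field: data.get('message', '') also yields an empty string, so 400
r = requests.post(url, json={})
print(r.status_code, r.json())

# Malformed JSON: expect an error response, not a server crash
# (the exact status code depends on how get_json failures are handled)
r = requests.post(url, data='not json',
                  headers={'Content-Type': 'application/json'})
print(r.status_code)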
Handling input validation and errors
Production APIs need robust input validation. Users will send unexpected data, either accidentally or maliciously. Your API must handle this gracefully.
Validate that required fields exist. Check data types are correct. Ensure values are within acceptable ranges. Return clear error messages when validation fails.
@app.route('/predict', methods=['POST'])
def predict():
    # Check content type
    if not request.is_json:
        return jsonify({'error': 'Content-Type must be application/json'}), 400

    data = request.get_json()

    # Validate required fields
    if 'message' not in data:
        return jsonify({'error': 'Missing required field: message'}), 400

    message = data['message']

    # Validate data type
    if not isinstance(message, str):
        return jsonify({'error': 'Message must be a string'}), 400

    # Validate length
    if len(message) == 0:
        return jsonify({'error': 'Message cannot be empty'}), 400
    if len(message) > 5000:
        return jsonify({'error': 'Message too long (max 5000 characters)'}), 400

    # Process request...
Comprehensive validation prevents crashes and provides helpful feedback to API users. Clear error messages save debugging time.
Deploying to the cloud
Local testing confirms your API works. Now deploy it so others can access it. Several platforms make deployment straightforward.
Render, Railway, and Heroku offer free tiers perfect for ML APIs. They handle infrastructure so you focus on your application. Deployment typically involves pushing your code to Git and configuring the platform.
For Render deployment, create a render.yaml file specifying how to build and run your app.
services:
  - type: web
    name: ml-api
    env: python
    buildCommand: pip install -r requirements.txt
    startCommand: gunicorn app:app
Gunicorn is a production WSGI server that replaces Flask’s development server. It handles concurrent requests efficiently.
Add gunicorn to requirements.txt. Gunicorn imports the app object directly and never executes the __main__ block, so that block is only used when you run the file yourself with python app.py. You can simplify it accordingly:
# At the end of app.py, change to:
if __name__ == '__main__':
    app.run()  # only used for local runs; gunicorn ignores this block
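You can also try the production server locally before deploying. Note that gunicorn runs on Linux and macOS; on Windows, a WSGI server such as waitress fills the same role.
gunicorn --bind 0.0.0.0:5000 app:app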
Docker provides another deployment option that packages your app and dependencies into a container. This ensures consistency across environments.
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 5000
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "app:app"]
Build the Docker image and run it locally or deploy to any cloud platform that supports containers.
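For example, with an arbitrary image tag of ml-api:
docker build -t ml-api .
docker run -p 5000:5000 ml-api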
Security and production considerations
Production APIs need security measures. Add authentication to prevent unauthorized access. Rate limiting stops abuse. HTTPS encrypts data in transit.
Simple API key authentication works for many use cases. Clients include an API key in request headers. The server validates the key before processing requests.
import os
from functools import wraps

# Read the key from the environment; never hard-code real secrets
API_KEY = os.environ.get('API_KEY', 'your-secret-key')

def require_api_key(f):
    @wraps(f)  # preserves the wrapped function's name for Flask routing
    def decorated_function(*args, **kwargs):
        key = request.headers.get('X-API-Key')
        if key != API_KEY:
            return jsonify({'error': 'Invalid API key'}), 401
        return f(*args, **kwargs)
    return decorated_function

@app.route('/predict', methods=['POST'])
@require_api_key
def predict():
    ...  # your prediction code from earlier goes here
Store the API key as an environment variable, never in your code. Different environments can use different keys.
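On the client side, the key travels in the X-API-Key header that the decorator checks. A quick sketch:
import requests

response = requests.post(
    'http://localhost:5000/predict',
    json={'message': 'Claim your free prize now!'},
    # Must match the API_KEY environment variable on the server
    headers={'X-API-Key': 'your-secret-key'}
)
print(response.status_code, response.json())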
Monitor your deployed API to catch issues early. Log requests, track response times, and monitor error rates. Set up alerts for unusual activity or performance degradation.
Consider caching predictions for identical inputs. If multiple users send the same request, return the cached result instead of running inference again. This reduces compute costs and improves response times.
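Here’s a minimal in-process sketch that would slot into app.py, using functools.lru_cache. It assumes predictions are deterministic for a given input; with multiple gunicorn workers, each process keeps its own cache, so a shared store like Redis is the usual production choice.
from functools import lru_cache

@lru_cache(maxsize=1024)
def cached_predict(message):
    # Runs the model only on a cache miss; identical messages
    # afterwards return the stored result immediately
    message_vector = vectorizer.transform([message])
    prediction = model.predict(message_vector)[0]
    probability = model.predict_proba(message_vector)[0]
    return ('spam' if prediction == 1 else 'ham', float(max(probability)))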
Making your API production ready
Production APIs should include documentation so users know how to use them. Document available endpoints, required parameters, response formats, and error codes.
Add versioning to your API URLs like /v1/predict. This lets you make breaking changes without disrupting existing users. New versions get new URLs.
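In Flask, this can be as simple as putting the version in the route:
# New versions get new URLs; clients on /v1/predict keep working
@app.route('/v1/predict', methods=['POST'])
def predict_v1():
    ...  # same prediction logic as before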
Implement logging to track requests and debug issues. Log inputs, outputs, errors, and timing information. This helps diagnose problems in production.
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

@app.route('/predict', methods=['POST'])
def predict():
    logger.info("Received prediction request")
    # Process request as shown earlier, building the result dictionary...
    logger.info(f"Prediction: {result['prediction']}, Confidence: {result['confidence']}")
    return jsonify(result)
Deploying your machine learning model as an API makes your work accessible and useful. The Flask framework provides a simple way to expose models as web services. With proper validation, error handling, and deployment, your model becomes a production system that others can rely on.
Ready to understand how to properly evaluate whether your deployed models are actually working well? Check out our guide on model evaluation and testing to learn comprehensive techniques for measuring ML performance beyond basic accuracy.