User Guide

Django River ML allows you to easily deploy river online machine learning for a Django project. It is based on chantilly, with hopes of having a similar design. We include example clients and a test application in tests. We are excited about what you might build with this, so please give us a ping if you have a question, find a bug, or want to request a feature! This is an open source project and we are eager for your contribution. 🎉️

Quick Start

Once you have django-river-ml installed (see Installation) you can do basic setup.

Setup

Add it to your INSTALLED_APPS along with rest_framework:

INSTALLED_APPS = (
    ...
    'django_river_ml',
    'rest_framework',
    ...
)

Add django-river-ml’s URL patterns:

from django.conf.urls import url  # use re_path in Django 4+
from django.urls import include
from django_river_ml import urls as django_river_urls

urlpatterns = [
    ...
    url(r'^', include(django_river_urls)),
    ...
]

If you use something like Django Rest Swagger for your API documentation, registering the django_river_urls alongside your app should render the endpoints nicely in the user interface! E.g., to extend the above, we might have a set of API views (to show up in the docs) and server views (not to show up):

from django.conf.urls import url  # use re_path in Django 4+
from django.urls import include, path
from django_river_ml import urls as django_river_urls

from rest_framework_swagger.views import get_swagger_view
schema_view = get_swagger_view(title="Spack Monitor API")

server_views = [
    url(r"^api/docs/", schema_view, name="docs"),
]

urlpatterns = [
    ...
    path("", include("django_river_ml.urls", namespace="django_river_ml")),
    url(r"^", include((server_views, "api"), namespace="internal_apis")),
    ...
]

And this will render the Django River ML API alongside your other API prefixes. For example, here is Django River ML deployed under “ml”:

[Image: An example of Django River ML with URL_PREFIX "ml" showing up in the API docs]

Settings

It is highly recommended that you set at least the following settings in your app's settings.py rather than relying on the plugin defaults:

APP_DIR
    Default: $root
    Description: Base directory for storing secrets and cache. Defaults to the root of the module installation (recommended to change).
    Example: os.path.dirname(os.path.abspath(__file__))

SHELVE_SECRET_KEY
    Default: None
    Description: Secret key for shelve (if STORAGE_BACKEND is set to "shelve"); will be generated if not found.
    Example: fgayudsiushfdsfdf

JWT_SECRET_KEY
    Default: None
    Description: Secret key for JSON web tokens (if authentication is enabled); will be generated if not found.
    Example: fgayudsiushfdsfdf
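
For example, a minimal configuration in your settings.py might look like the following, using the DJANGO_RIVER_ML dictionary shown later in this guide (the secret values are placeholders you should replace):

import os

DJANGO_RIVER_ML = {
    # Store secrets and cache alongside your app instead of the installed module
    "APP_DIR": os.path.dirname(os.path.abspath(__file__)),
    # Placeholder secrets - generate your own for a real deployment
    "SHELVE_SECRET_KEY": "change-me",
    "JWT_SECRET_KEY": "change-me-too",
}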

The following additional settings are available to set in your settings.py:

URL_PREFIX
    Default: api
    Description: The API prefix to use for the endpoints.
    Example: river

STORAGE_BACKEND
    Default: shelve
    Description: The storage backend to use, either "shelve" or "redis" (requires a redis setup).
    Example: redis

REDIS_DB
    Default: river-redis
    Description: The redis database name; only used if STORAGE_BACKEND is set to "redis".
    Example: another-name

REDIS_HOST
    Default: localhost
    Description: The redis host name; only used if STORAGE_BACKEND is set to "redis".
    Example: redis-container

REDIS_PORT
    Default: 6379
    Description: The redis port; only used if STORAGE_BACKEND is set to "redis".
    Example: 1111

CACHE_DIR
    Default: None (then set to os.path.join(APP_DIR, "cache"))
    Description: The cache directory for tokens. It is recommended to set a custom APP_DIR, in which case this becomes a "cache" subdirectory there.
    Example: /opt/cache

GENERATE_IDENTIFIERS
    Default: True
    Description: Always generate identifiers for predictions. If False, you can still provide an identifier to the predict endpoint to use.
    Example: True

DISABLE_AUTHENTICATION
    Default: True
    Description: Disable authentication for views that would otherwise require it.
    Example: True

DOMAIN_URL
    Default: http://127.0.0.1:8000
    Description: Domain used in templates and as the API prefix.
    Example: https://ml-server

SESSION_EXPIRES_SECONDS
    Default: 600
    Description: The number of seconds a session (upload request) is valid (10 minutes).
    Example: 6000

TOKEN_EXPIRES_SECONDS
    Default: 600
    Description: The number of seconds a token is valid (10 minutes).
    Example: 6000

VIEW_RATE_LIMIT
    Default: 10000/1day
    Description: View rate limit using django-ratelimit, based on IP address.
    Example: 100/1day

VIEW_RATE_LIMIT_BLOCK
    Default: True
    Description: If a client exceeds the rate limit, should they be blocked for a period?
    Example: False

VIEW_RATE_LIMIT_DISABLE
    Default: True
    Description: Globally disable rate limiting (ideal for development or for a heavily used learning server).
    Example: False

API_VIEWS_ENABLED
    Default: []
    Description: Provide a list of view names (strings) to enable. If not set (empty list or None), all views are enabled.
    Example: ['predict', 'service_info', 'metrics', 'models', 'model_download', 'stats']

Custom Models

Django River ML has support for custom models, where a custom model is one you've defined in your application to use with river. In order for this to work, you will need to define your model somewhere in your app so it is importable across Django apps (i.e., so that when Django River ML tries to unpickle a model object of that type, it will be found). If needed, we could further define a custom set of classes in settings to look for via importlib; however, the simpler approach of defining the model in your app (or otherwise installing a module that makes it importable) is suggested.

Custom models currently support stats but not metrics; metrics could be supported if we think through how to go about it. The CustomModel flavor is designed to be mostly forgiving, to allow you to choose any prediction function you might have, and we can extend this if needed.
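
As a minimal sketch, a custom model can be a plain class defined in an importable module of your app; the module path, class, and method names below are illustrative, not an interface the plugin requires:

# myapp/ml_models.py -- must be importable so unpickling can find the class

class RunningMean:
    """A toy custom model that predicts the running mean of observed targets."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def learn_one(self, x, y):
        # Accumulate the target value
        self.count += 1
        self.total += y
        return self

    def predict_one(self, x):
        # Before any learning, fall back to 0.0
        return self.total / self.count if self.count else 0.0

Because the class lives at a stable import path (here, the hypothetical myapp.ml_models), a pickled instance can be restored in any Django process that installs your app.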

API Views Enabled

If you want to disable some views, you can set a list of views to enable using API_VIEWS_ENABLED. As an example, let's say learning is done internally, and we just want to expose metadata and prediction endpoints. We could do:

DJANGO_RIVER_ML = {
    ...
    # Only allow these API views
    "API_VIEWS_ENABLED": ['predict', 'service_info', 'metrics', 'models', 'model_download', 'stats']
    ...
}

The views to choose from include:

  • auth_token

  • service_info

  • learn

  • predict

  • label

  • metrics

  • stream_metrics

  • stream_events

  • stats

  • model_download

  • model

  • models

Note that “model” includes most interactions to create or get a model.

For more advanced settings like customizing the endpoints with authentication, see the settings.py in the application.
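
As a sketch, such a customization might look like the following; the AUTHENTICATED_VIEWS setting is mentioned in the Authentication section below, but the particular view names chosen here are illustrative assumptions:

DJANGO_RIVER_ML = {
    # Require authentication (safer for anything beyond local development)
    "DISABLE_AUTHENTICATION": False,
    # Hypothetical subset of views to protect; see the plugin's settings.py
    # for the actual defaults
    "AUTHENTICATED_VIEWS": ["learn", "label", "model"],
}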

Authentication

If you have DISABLE_AUTHENTICATION set to True, or you customize the AUTHENTICATED_VIEWS setting to change the defaults, then you shouldn't need to do any kind of authentication. This might be ideal for a development or closed environment that is only accessible to you or your team. However, for most cases you are strongly encouraged to use authentication. Authentication requires creating a user, to which Django River ML will add a token generated by Django REST Framework, if not already generated. For purposes of example, we can quickly create a user as follows:

python manage.py createsuperuser
Username (leave blank to use 'dinosaur'):
Email address:
Password:
Password (again):
Superuser created successfully.

And at this point, you can also ask for the token.

python manage.py get_token dinosaur
Enter Password:
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

You can then export this token in the environment to be found by the river API client.

export RIVER_ML_USER=dinosaur
export RIVER_ML_TOKEN=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Otherwise, you will need to manually go through a standard OAuth2-style workflow: use basic auth to provoke a 401 response with a WWW-Authenticate header, parse that to find the "realm" (the authentication server), and then make a request to that endpoint with the base64-encoded user and token in the Authorization header. It's much easier to use the client to do it for you, which will cache your token and automatically request a new one when it expires.
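
For reference, here is a rough sketch of that flow using the requests library. The endpoint path assumes the default URL_PREFIX, and the challenge format and "token" response key are assumptions based on the standard pattern; the client handles all of this for you:

import base64
import requests

base_url = "http://127.0.0.1:8000"

# 1. An unauthenticated request should yield a 401 with a challenge header
response = requests.get(f"{base_url}/api/models/")
challenge = response.headers["WWW-Authenticate"]

# 2. Parse the realm (the authentication server) from the challenge,
#    e.g. 'Bearer realm="http://127.0.0.1:8000/auth/token/"'
realm = challenge.split('realm="')[1].split('"')[0]

# 3. Ask the realm for a token, with base64-encoded user:token as basic auth
credentials = base64.b64encode(b"dinosaur:xxxxxxxx").decode("utf-8")
token_response = requests.get(realm, headers={"Authorization": f"Basic {credentials}"})
token = token_response.json()["token"]  # assumed response key

# 4. Retry the original request with the bearer token
response = requests.get(
    f"{base_url}/api/models/",
    headers={"Authorization": f"Bearer {token}"},
)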

Of course if you have a Django interface with OAuth for login, you can make a settings or profile page to easily retrieve the same token. Open an issue if you need guidance to do this. We might consider adding a front-end view to provide by default if it’s desired.

Sample Application

An example app is provided that you can use for testing. Once you have your environment set up, you can do:

$ python manage.py makemigrations
$ python manage.py migrate
$ python manage.py runserver

In another terminal, you can then run a sample script:

$ python examples/regression/run.py
$ python examples/binary/run.py
$ python examples/multiclass/run.py
$ python examples/cluster/run.py
$ python examples/custom/run.py

Note that creme models are supported, but with minimal functionality (no metrics), and you must install creme on your own.
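
You can install it with pip first:

$ pip install creme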

$ python examples/creme/run.py

Testing

Running tests with the example server is also fairly easy!

python runtests.py

Interaction from Inside your Application

While a user is generally going to interact with your API endpoints, you might want to interact with models from within your Django application. This is possible by interacting with the DjangoClient directly, which wraps the database and is exposed to the API via a light additional wrapper. The following examples walk you through the different kinds of interactions. For all interactions, you'll want to create a client first:

from django_river_ml.client import DjangoClient
client = DjangoClient()

This also means you can choose to not add the Django River URLs to your application and have more control over your ML interactions internally. You can also take an approach between those two extremes, and just expose a limited set using the API_VIEWS_ENABLED setting. When empty, all views are enabled. Otherwise, you can set a custom set of names to enable (and those not in the list will not be enabled).

Note

We are planning to allow retrieving a model from version control, e.g., for the case where you are starting a Kubernetes deployment of Spack Monitor (which will need to use redis to store model information) and the model is not found.

Get Models

To get models, simply ask for them! This will return a list of names to interact with further.

client.models()
['milky-despacito', 'tart-bicycle']

Create a Model

If you want your server to create some initial model (as opposed to allowing users to upload their own) you can easily do that:

from river import feature_extraction, cluster
model_name = "peanut-butter-clusters"
model = feature_extraction.BagOfWords() | cluster.KMeans(
    n_clusters=100, halflife=0.4, sigma=3, seed=0
)
model_name = client.add_model(model, "cluster", name=model_name)

You could also do the same from file, meaning a pickled model you’ve created elsewhere.

import pickle
model_name = "peanut-butter-clusters"
with open('%s.pkl' % model_name, 'rb') as fd:
    model = pickle.load(fd)
model_name = client.add_model(model, "cluster", name=model_name)

Or if your application has automatically created an “empty” model with your desired name (the default when enabled), you can delete it first and do the same.

client.delete_model(model_name)
import pickle
with open('%s.pkl' % model_name, 'rb') as fd:
    model = pickle.load(fd)
model_name = client.add_model(model, "cluster", name=model_name)

Delete a Model

You can delete a model by name.

client.delete_model("tart-bicycle")
True

Get Model

Of course to retrieve the model directly, you can do:

model = client.get_model("spack-errors")

Keep in mind that you are holding the model in memory, and you will need to save it again.

client.save_model(model, "spack-errors")

Note that if you are using the standard predict or learn endpoints, you can simply provide the model name and you don't need to worry about retrieving or saving (the function handles it for you!). If, however, you want to save a model to file (a pickle, to be specific) you can do:

client.save_pickle("spack-errors", "spack-errors.pkl")

You can also load a model (this only loads a pickle and does not save anything to a database):

model = client.load_model("spack-errors.pkl")

Stats and Metrics for a Model

The one drawback of uploading a model is that you won't have stats or metrics populated. These simply cannot be derived, given that the learning happened elsewhere; they are normally collected on the server as learning happens. However, once you've done more learning you can ask for stats or metrics:

client.stats("spack-errors")
client.metrics("spack-errors")

Interactive Learn

You’ll generally need to provide the model name and features, although if you provide an identifier these things can be looked up.

client.learn(
    model_name,
    ground_truth=ground_truth,
    prediction=prediction,
    features=features,
    identifier=identifier,
)

Interactive Label

To apply a label (post learning) you'll need to provide the label, the identifier, and the model name.

client.label(label, identifier, model_name)

Interactive Predict

A prediction usually needs the model name and features, and optionally an identifier.

client.predict(features, model_name, identifier)
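
Putting these together, and continuing with the client created above, a typical interactive loop might look like the following sketch; the feature values, ground truth, and identifier are illustrative, and we assume predict returns the prediction for the event:

model_name = "spack-errors"
features = {"text": "error in build phase"}  # illustrative features
identifier = "example-1"  # illustrative identifier

# Ask for a prediction, keeping the identifier so we can follow up later
prediction = client.predict(features, model_name, identifier)

# Later, once ground truth is known, learn from the same event
client.learn(
    model_name,
    ground_truth=1,
    prediction=prediction,
    features=features,
    identifier=identifier,
)

# Or attach a label after the fact
client.label(1, identifier, model_name)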

The remaining functions on the client are used internally, and you should not need to call them directly. This library is under development, and more endpoints and functionality are coming soon!