.. _getting_started-user-guide:

==========
User Guide
==========

Django River ML allows you to easily deploy `river <https://github.com/online-ml/river>`_
online machine learning for a `Django <https://www.djangoproject.com/>`_ project. It is
based off of `chantilly <https://github.com/online-ml/chantilly>`_ with hopes of having
a similar design. We include `example clients <https://github.com/vsoch/django-river-ml/tree/main/examples>`_
and a test application in `tests <https://github.com/vsoch/django-river-ml/tree/main/tests>`_.
We are excited about what you might build with this, so please `give us a ping <https://github.com/vsoch/django-river-ml/issues>`_
if you have a question, find a bug, or want to request a feature! This is an open source
project and we are eager for your contribution. 🎉️

Quick Start
===========

Once you have ``django-river-ml`` installed (:ref:`getting_started-installation`) you can do basic setup.

Setup
-----

Add it to your ``INSTALLED_APPS`` along with ``rest_framework``:

.. code-block:: python

    INSTALLED_APPS = (
        ...
        'django_river_ml',
        'rest_framework',
        ...
    )

Add django-river-ml's URL patterns:

.. code-block:: python

    from django_river_ml import urls as django_river_urls

    urlpatterns = [
        ...
        url(r'^', include(django_river_urls)),
        ...
    ]

If you use something like `Django Rest Swagger <https://github.com/marcgibbons/django-rest-swagger>`_
for your API documentation, registering the ``django_river_urls`` alongside your app
should render the endpoints nicely in the user interface! E.g., to extend the above, we
might have a set of API views (to show up in docs) and server views (to not show up):

.. code-block:: python

    from django_river_ml import urls as django_river_urls
    from rest_framework_swagger.views import get_swagger_view

    schema_view = get_swagger_view(title="Spack Monitor API")

    server_views = [
        url(r"^api/docs/", schema_view, name="docs"),
    ]

    urlpatterns = [
        ...
        path("", include("django_river_ml.urls", namespace="django_river_ml")),
        url(r"^", include((server_views, "api"), namespace="internal_apis")),
        ...
    ]

And this will render the Django River ML API alongside your other API prefixes. For
example, here is Django River ML deployed under "ml":

.. image:: img/schema-view.png
  :alt: An example of Django River ML with ``URL_PREFIX`` "ml" showing up in the API docs

Settings
--------

It is highly recommended that you minimally set these settings in your app's
``settings.py`` and not rely on the plugin defaults:

.. list-table:: Recommended settings
   :widths: 25 25 25 25
   :header-rows: 1

   * - Name
     - Default
     - Description
     - Example
   * - ``APP_DIR``
     - $root
     - Base directory for storing secrets and cache. Defaults to the root of the module installation (recommended to change)
     - ``os.path.dirname(os.path.abspath(__file__))``
   * - ``SHELVE_SECRET_KEY``
     - None
     - Secret key for shelve (if ``STORAGE_BACKEND`` is set to "shelve"; will be generated if not found)
     - fgayudsiushfdsfdf
   * - ``JWT_SECRET_KEY``
     - None
     - Secret key for JSON web tokens (if authentication is enabled; will be generated if not found)
     - fgayudsiushfdsfdf
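For instance, a minimal sketch of these settings in your ``settings.py`` might look
like the following. The ``DJANGO_RIVER_ML`` dictionary matches the pattern used later
in this guide, and the secret values here are placeholders:

.. code-block:: python

    # settings.py -- a hedged sketch; the secret values are placeholders
    import os

    DJANGO_RIVER_ML = {
        # Store secrets and cache alongside your project, not the module install
        "APP_DIR": os.path.dirname(os.path.abspath(__file__)),
        # Secret keys (generated for you if not found, but better set explicitly)
        "SHELVE_SECRET_KEY": "xxxxxxxxxxxxxxxxxxxx",
        "JWT_SECRET_KEY": "xxxxxxxxxxxxxxxxxxxx",
    }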
The following additional settings are available to set in your ``settings.py``:

.. list-table:: Additional settings
   :widths: 25 25 25 25
   :header-rows: 1

   * - Name
     - Default
     - Description
     - Example
   * - ``URL_PREFIX``
     - api
     - The API prefix to use for the endpoint
     - river
   * - ``STORAGE_BACKEND``
     - shelve
     - The storage backend to use, either shelve or redis (requires redis setup)
     - redis
   * - ``REDIS_DB``
     - river-redis
     - The redis database name, only used if ``STORAGE_BACKEND`` is set to redis
     - another-name
   * - ``REDIS_HOST``
     - localhost
     - The redis host name, only used if ``STORAGE_BACKEND`` is set to redis
     - redis-container
   * - ``REDIS_PORT``
     - 6379
     - The redis port, only used if ``STORAGE_BACKEND`` is set to redis
     - 1111
   * - ``CACHE_DIR``
     - None (and then is set to ``os.path.join(APP_DIR, "cache")``)
     - The cache directory for tokens; recommended to set a custom ``APP_DIR`` and it will be a sub-directory ``cache`` there
     - /opt/cache
   * - ``GENERATE_IDENTIFIERS``
     - True
     - Always generate identifiers for predictions. If False, you can still provide an identifier to the predict endpoint to use.
     - True
   * - ``DISABLE_AUTHENTICATION``
     - True
     - For views that require authentication, disable them.
     - True
   * - ``DOMAIN_URL``
     - http://127.0.0.1:8000
     - Domain used in templates and the API prefix
     - https://ml-server
   * - ``SESSION_EXPIRES_SECONDS``
     - 600
     - The number of seconds a session (upload request) is valid (10 minutes)
     - 6000
   * - ``TOKEN_EXPIRES_SECONDS``
     - 600
     - The number of seconds a token is valid (10 minutes)
     - 6000
   * - ``VIEW_RATE_LIMIT``
     - 10000/1day
     - View rate limit using django-ratelimit, based on IP address
     - 100/1day
   * - ``VIEW_RATE_LIMIT_BLOCK``
     - True
     - Given that someone goes over the limit, are they blocked for a period?
     - False
   * - ``VIEW_RATE_LIMIT_DISABLE``
     - True
     - Globally disable rate limiting (ideal for development or for a heavily used learning server)
     - False
   * - ``API_VIEWS_ENABLED``
     - ``[]``
     - Provide a list of view names (strings) to enable. If not set (empty list or None), all are enabled.
     - ``['predict', 'service_info', 'metrics', 'models', 'model_download', 'stats']``

Custom Models
^^^^^^^^^^^^^

Django River ML has support for custom models, where a custom model is one you've
defined in your application to use with river. In order for this to work, you will
need to define your model somewhere in your app so it is importable across Django apps
(e.g., when Django River ML tries to unpickle a model object of that type, it will be
found). If needed, we could further define a custom set of classes in settings to look
up via importlib; however, the simpler approach of defining the model in your app (or
otherwise installing a module that makes it importable) is suggested. Custom models
currently support stats but not metrics, and metrics could be supported if we think
about how to go about it. The ``CustomModel`` flavor is designed to be mostly
forgiving, allowing you to choose any prediction function you might have, and we can
extend this if needed. A hedged sketch of what such a class might look like follows.
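As an illustration only (the class name, module path, and method choices here are
hypothetical assumptions, not a documented contract), a custom model could be an
importable class exposing river-style ``learn_one`` and ``predict_one`` methods:

.. code-block:: python

    # myapp/ml.py -- a hypothetical custom model; names are assumptions
    class MajorityLabel:
        """Track label counts and always predict the most common label."""

        def __init__(self):
            self.counts = {}

        def learn_one(self, x, y):
            # x (the features) is ignored here; we only count labels
            self.counts[y] = self.counts.get(y, 0) + 1
            return self

        def predict_one(self, x):
            if not self.counts:
                return None
            return max(self.counts, key=self.counts.get)

Because the class lives at an importable path in your project (``myapp/ml.py`` here),
unpickling a saved instance from another Django app can resolve it.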
API Views Enabled
^^^^^^^^^^^^^^^^^

If you want to disable some views, you can set a list of views to enable using
``API_VIEWS_ENABLED``. As an example, let's say learning is going to be done
internally, and we just want to expose metadata and prediction endpoints. We could do:

.. code-block:: python

    DJANGO_RIVER_ML = {
        ...
        # Only allow these API views
        "API_VIEWS_ENABLED": ['predict', 'service_info', 'metrics', 'models', 'model_download', 'stats'],
        ...
    }

The views to choose from include:

- auth_token
- service_info
- learn
- predict
- label
- metrics
- stream_metrics
- stream_events
- stats
- model_download
- model
- models

Note that "model" includes most interactions to create or get a model. For more
advanced settings, like customizing the endpoints with authentication, see the
`settings.py <https://github.com/vsoch/django-river-ml/blob/main/django_river_ml/settings.py>`_
in the application.

Authentication
--------------

If you have ``DISABLE_AUTHENTICATION`` set to true, or you customize the
``AUTHENTICATED_VIEWS`` setting to change the defaults, then you shouldn't need to do
any kind of authentication. This might be ideal for a development or closed environment
that is only accessible to you or your team. However, for most cases you are strongly
encouraged to use authentication.

Authentication requires creating a user, to which Django River ML will add a token
generated by Django REST Framework, if not already generated. For purposes of example,
we can quickly create a user as follows:

.. code-block:: console

    python manage.py createsuperuser
    Username (leave blank to use 'dinosaur'):
    Email address:
    Password:
    Password (again):
    Superuser created successfully.

And at this point, you can also ask for the token:

.. code-block:: console

    python manage.py get_token dinosaur
    Enter Password:
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

You can then export this token in the environment to be found by the
`river api client <https://github.com/vsoch/riverapi>`_:

.. code-block:: console

    export RIVER_ML_USER=dinosaur
    export RIVER_ML_TOKEN=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Otherwise, you will need to manually go through a standard OAuth2 workflow: use basic
auth to look for a 401 response with a ``Www-Authenticate`` header, parse that header
to find the "realm" (the authentication server), and then make a request to that
endpoint with the base64-encoded user and token in the ``Authorization`` header. It's
much easier to use the client to do it for you, which will cache your token (and
automatically request a new one when it expires).
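If you do want to implement that handshake yourself, here is a hedged sketch assuming
the ``requests`` library; the protected endpoint URL, the challenge parameter parsing,
and the ``token`` response key are assumptions rather than documented behavior:

.. code-block:: python

    # A sketch of the manual flow described above; endpoint and response
    # key are placeholder assumptions, not part of django-river-ml itself.
    import base64
    import re

    import requests

    user, token = "dinosaur", "xxxxxxxxxxxxxxxxxxxxxxxxxxxx"
    url = "http://127.0.0.1:8000/api/models/"  # hypothetical protected endpoint

    # 1. An unauthenticated request should come back 401 with Www-Authenticate
    response = requests.get(url)
    assert response.status_code == 401
    challenge = response.headers["Www-Authenticate"]

    # 2. Parse the realm (the authentication server) and any extra parameters
    params = dict(re.findall(r'(\w+)="([^"]+)"', challenge))
    realm = params.pop("realm")

    # 3. Ask the realm for a token, using base64(user:token) basic auth
    basic = base64.b64encode(f"{user}:{token}".encode()).decode()
    auth_response = requests.get(
        realm, params=params, headers={"Authorization": f"Basic {basic}"}
    )
    bearer = auth_response.json()["token"]  # assumed response key

    # 4. Retry the original request with the bearer token
    response = requests.get(url, headers={"Authorization": f"Bearer {bearer}"})
    print(response.status_code)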
Of course, if you have a Django interface with OAuth for login, you can make a settings
or profile page to easily retrieve the same token. Open an issue if you need guidance
to do this. We might consider adding a default front-end view if it's desired.

Sample Application
------------------

An `example app <https://github.com/vsoch/django-river-ml/tree/main/tests>`_ is
provided that you can use to test. Once you have your environment set up, you can do:

.. code-block:: console

    $ python manage.py makemigrations
    $ python manage.py migrate
    $ python manage.py runserver

In another terminal, you can then run a sample script:

.. code-block:: console

    $ python examples/regression/run.py
    $ python examples/binary/run.py
    $ python examples/multiclass/run.py
    $ python examples/cluster/run.py
    $ python examples/custom/run.py

Note that creme models are supported, but with minimal functionality (no metrics), and
you must install ``creme`` on your own.

.. code-block:: console

    $ python examples/creme/run.py

Testing
-------

Running tests with the example server is also fairly easy!

.. code-block:: console

    python runtests.py

Interaction from Inside your Application
----------------------------------------

While a user is going to be interacting with your API endpoints, you might want to
interact with models from within your Django application. This is possible by
interacting with the ``DjangoClient`` directly, which wraps the database and is exposed
to the API via a light additional wrapper. The following examples can walk you through
the different kinds of interactions. For all interactions, you'll want to create a
client first:

.. code-block:: python

    from django_river_ml.client import DjangoClient

    client = DjangoClient()

This also means you can choose to not add the Django River URLs to your application and
have more control over your ML interactions internally. You can also take an approach
between those two extremes, and just expose a limited set using the
``API_VIEWS_ENABLED`` setting. When empty, all views are enabled. Otherwise, you can
set a custom set of names to enable (and those not in the list will not be enabled).

.. note::

    We are planning to allow retrieving a model from version control, e.g., given that
    you are starting a Kubernetes run of Spack Monitor (which will need to use redis to
    store model information) and the model is not found.

Get Models
^^^^^^^^^^

To get models, simply ask for them! This will return a list of names to interact with
further.

.. code-block:: python

    client.models()
    ['milky-despacito', 'tart-bicycle']

Create a Model
^^^^^^^^^^^^^^

If you want your server to create some initial model (as opposed to allowing users to
upload them) you can easily do that:

.. code-block:: python

    from river import cluster, feature_extraction

    model_name = "peanut-butter-clusters"
    model = feature_extraction.BagOfWords() | cluster.KMeans(
        n_clusters=100, halflife=0.4, sigma=3, seed=0
    )
    model_name = client.add_model(model, "cluster", name=model_name)

You could also do the same from file, meaning a pickled model you've created elsewhere:

.. code-block:: python

    import pickle

    model_name = "peanut-butter-clusters"
    with open('%s.pkl' % model_name, 'rb') as fd:
        model = pickle.load(fd)
    model_name = client.add_model(model, "cluster", name=model_name)

Or if your application has automatically created an "empty" model with your desired
name (the default when enabled), you can delete it first and do the same:

.. code-block:: python

    import pickle

    client.delete_model(model_name)
    with open('%s.pkl' % model_name, 'rb') as fd:
        model = pickle.load(fd)
    model_name = client.add_model(model, "cluster", name=model_name)

Delete a Model
^^^^^^^^^^^^^^

You can delete a model by name:

.. code-block:: python

    client.delete_model("tart-bicycle")
    True

Get Model
^^^^^^^^^

Of course, to retrieve the model directly, you can do:

.. code-block:: python

    model = client.get_model("spack-errors")

Keep in mind you are holding onto the model in memory and will need to save it again:

.. code-block:: python

    client.save_model(model, "spack-errors")

Note that if you are using the standard predict or learn endpoints, you can simply
provide the model name and you don't need to worry about retrieving or saving (the
function handles it for you!). If, however, you want to save a model to file (pickle,
to be specific), you can do:

.. code-block:: python

    client.save_pickle("spack-errors", "spack-errors.pkl")

You can also load a model (this only loads a pickle and does not save anything to a
database):

.. code-block:: python

    model = client.load_model("spack-errors.pkl")
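Putting those pieces together, here is a hedged sketch of a manual update cycle:
retrieve the river model, update it with river's ``learn_one`` API (assuming a
supervised model), and save it back. The model name and the example data are
placeholder assumptions:

.. code-block:: python

    # A sketch: pull the model into memory, update it, persist it again.
    # "spack-errors" and the feature/label pairs are hypothetical.
    from django_river_ml.client import DjangoClient

    client = DjangoClient()
    model = client.get_model("spack-errors")

    # river models learn one sample at a time
    for features, label in [({"text": "undefined reference"}, 1), ({"text": "ok"}, 0)]:
        model.learn_one(features, label)

    # Persist the updated model so future predictions see the new state
    client.save_model(model, "spack-errors")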
Stats and Metrics for a Model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The one drawback of uploading a model is that you won't have stats or metrics
populated. You simply cannot, given that the learning happened elsewhere; these metrics
are normally gathered during learning on the server. However, once you've done more
learning you can ask for stats or metrics:

.. code-block:: python

    client.stats("spack-errors")
    client.metrics("spack-errors")

Interactive Learn
^^^^^^^^^^^^^^^^^

You'll generally need to provide the model name and features, although if you provide
an identifier these things can be looked up:

.. code-block:: python

    client.learn(
        model_name,
        ground_truth=ground_truth,
        prediction=prediction,
        features=features,
        identifier=identifier,
    )

Interactive Label
^^^^^^^^^^^^^^^^^

To do a labeling (post learn) you'll need to provide the label, the identifier, and the
model name:

.. code-block:: python

    client.label(label, identifier, model_name)

Interactive Predict
^^^^^^^^^^^^^^^^^^^

A prediction usually needs the model name and features, and optionally an identifier:

.. code-block:: python

    client.predict(features, model_name, identifier)

The remainder of the functions on the client are used internally, and you should not
need to call them directly. This library is under development and we will have more
endpoints and functionality coming soon!
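To recap the interactive workflow, here is a hedged end-to-end sketch that combines
the calls above; the model name, features, and ground truth value are placeholder
assumptions:

.. code-block:: python

    # A sketch tying the interactive calls together: predict on new
    # features, then feed back the ground truth so the model learns online.
    from django_river_ml.client import DjangoClient

    client = DjangoClient()
    model_name = "spack-errors"  # placeholder model name

    features = {"text": "undefined reference to main"}

    # Ask for a prediction (an identifier is generated when
    # GENERATE_IDENTIFIERS is enabled)
    prediction = client.predict(features, model_name)

    # Later, once the true label is known, close the loop with learn
    client.learn(
        model_name,
        ground_truth=1,
        prediction=prediction,
        features=features,
    )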