Monday, September 16, 2019

Writing Production Code for Machine Learning Deployment

Overview

Most likely, your ML pipeline code for the research environment lives in tools like Jupyter Notebook.

To deploy it, we need production code that can:
  Create and transform features.
  Apply feature selection.
  Build ML models.
  Score new data.

There are three main ways to write an ML pipeline in production:

 Procedural programming - a sequence of functions, as in a Jupyter notebook.
 Custom pipeline code - an object-oriented (OOP) design in which our own pipeline class calls the procedures in order.
 Third-party pipeline code - an object-oriented design in which a third-party library's pipeline calls the procedures in order, e.g. scikit-learn.
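The custom pipeline option can be sketched in plain Python. This is a minimal, hypothetical illustration rather than a production implementation: `MeanImputer`, `RangeScaler` and `CustomPipeline` are made-up names, and each step is assumed to expose `fit` and `transform` methods, which the pipeline calls in the order the steps are listed.

```python
class MeanImputer:
    """Hypothetical step: replace None with the mean of values seen during fit."""

    def fit(self, values):
        observed = [v for v in values if v is not None]
        self.mean_ = sum(observed) / len(observed)
        return self

    def transform(self, values):
        return [self.mean_ if v is None else v for v in values]


class RangeScaler:
    """Hypothetical step: scale values to [0, 1] using the range seen during fit."""

    def fit(self, values):
        self.low_, self.high_ = min(values), max(values)
        return self

    def transform(self, values):
        span = (self.high_ - self.low_) or 1.0  # avoid division by zero
        return [(v - self.low_) / span for v in values]


class CustomPipeline:
    """Our own pipeline class: runs each step's fit/transform in list order."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, values):
        # Fit each step on the output of the previous one.
        for step in self.steps:
            values = step.fit(values).transform(values)
        return self

    def transform(self, values):
        # Apply the already-fitted steps in the same order.
        for step in self.steps:
            values = step.transform(values)
        return values
```

For example, `CustomPipeline([MeanImputer(), RangeScaler()]).fit([1.0, None, 3.0])` learns a fill value of 2.0 and a range of 1.0 to 3.0, so transforming `[None, 3.0]` gives `[0.5, 1.0]`. For the third-party option, scikit-learn's `sklearn.pipeline.Pipeline` follows the same idea: it takes a list of `(name, estimator)` tuples and fits and applies them in sequence.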

Procedural Programming

In procedural programming, procedures (also known as routines, subroutines, or functions) are carried out as a series of computational steps.

Here, it means writing the feature creation, feature transformation, model training and data scoring steps as a series of functions that we can call and run one after the other.
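The procedural approach can be sketched as a handful of plain functions called one after another. This assumes toy data stored as lists of dicts, a hypothetical `area` feature derived from `width` and `height`, and a single-feature least-squares fit standing in for a real ML model; all function names are illustrative.

```python
import math


def create_features(rows):
    """Feature creation: derive a hypothetical 'area' column from width and height."""
    return [{**r, "area": r["width"] * r["height"]} for r in rows]


def transform_features(rows):
    """Feature transformation: log-transform 'area' (assumed positive)."""
    return [{**r, "log_area": math.log(r["area"])} for r in rows]


def select_features(rows, features):
    """Feature selection: keep only the chosen columns, as a list of rows."""
    return [[r[f] for f in features] for r in rows]


def train_model(X, y):
    """Model training: ordinary least squares on the first (only) feature."""
    xs = [row[0] for row in X]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(y) / len(y)
    cov = sum((x - x_mean) * (t - y_mean) for x, t in zip(xs, y))
    var = sum((x - x_mean) ** 2 for x in xs)  # must be non-zero
    slope = cov / var
    return {"intercept": y_mean - slope * x_mean, "slope": slope}


def score(model, X):
    """Scoring: apply the fitted line to new rows."""
    return [model["intercept"] + model["slope"] * row[0] for row in X]


def run_pipeline(train_rows, new_rows, target="price"):
    """Call each step in order, as one would run notebook cells top to bottom."""
    features = ["log_area"]
    X_train = select_features(transform_features(create_features(train_rows)), features)
    y_train = [r[target] for r in train_rows]
    model = train_model(X_train, y_train)
    X_new = select_features(transform_features(create_features(new_rows)), features)
    return score(model, X_new)
```

Because each step is a separate function, any one of them can be tested or swapped out on its own, while `run_pipeline` keeps the order of the research notebook.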

We keep the following in a YAML file:

Hard-coded variables to engineer, and the values used to transform features.

Hard-coded paths to retrieve and store data.

By changing these values, we can readjust our models without editing the pipeline functions themselves.
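A minimal sketch of config-driven feature engineering, assuming a hypothetical config file whose keys (`paths`, `fill_values`, `log_vars`) are made up for illustration. With PyYAML installed, `yaml.safe_load` on such a file returns a nested dict shaped like `CONFIG` below; the dict is written out directly here to keep the sketch dependency-free.

```python
import math
from pathlib import Path

# Hypothetical config.yml equivalent to the CONFIG dict below:
#   paths:
#     training_data: data/train.csv
#     model_output: models/model.pkl
#   features:
#     fill_values:
#       age: 30.0
#     log_vars: [income]
CONFIG = {
    "paths": {
        "training_data": "data/train.csv",
        "model_output": "models/model.pkl",
    },
    "features": {
        "fill_values": {"age": 30.0},  # value used to replace missing entries
        "log_vars": ["income"],  # variables to log-transform
    },
}


def get_path(config, key):
    """Look up a data or model path from the config instead of hard-coding it."""
    return Path(config["paths"][key])


def engineer(rows, config):
    """Impute and log-transform features using only values taken from the config."""
    feats = config["features"]
    out = []
    for row in rows:
        row = dict(row)  # copy so the caller's data is not mutated
        for var, fill in feats["fill_values"].items():
            if row.get(var) is None:
                row[var] = fill
        for var in feats["log_vars"]:
            row[var] = math.log(row[var])
        out.append(row)
    return out
```

Changing `fill_values` or `log_vars` in the config changes how features are engineered, with no edit to `engineer` itself.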
