Friday, September 13, 2019

Machine Learning Model Pipeline Overview

Below is a typical machine learning pipeline.


Step 1: Gathering Data

Make the data available to the people who will use it to build ML models. The data may come from the business itself, from a third party, or from publicly available sources.

Step 2: Data Analysis

We need a good understanding of what the data is telling us. It is good practice to know the variables and how they relate to each other. We also need to know which variables we can use and which we cannot, depending on the regulations that apply to the business.
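As a minimal sketch of this step, the snippet below summarises each variable and measures how two variables move together using the Pearson correlation. The variable names (`income`, `loan_amount`) and their values are made up for illustration; they are not taken from a real project.

```python
from math import sqrt
from statistics import mean

# Hypothetical toy dataset: each key is a variable, each list holds its values.
data = {
    "income": [30, 45, 60, 75, 90],
    "loan_amount": [10, 14, 21, 24, 31],
}

def summarise(values):
    """Basic summary statistics for one variable."""
    return {"min": min(values), "max": max(values), "mean": mean(values)}

def pearson(xs, ys):
    """Pearson correlation between two equally long numeric lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# One summary per variable, plus the relationship between the two variables.
summary = {name: summarise(values) for name, values in data.items()}
corr = pearson(data["income"], data["loan_amount"])
```

A correlation close to 1 or -1 suggests the two variables carry overlapping information, which is useful to know before Steps 3 and 4.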

Step 3: Feature Engineering (includes Data Pre-processing)

After Step 2, we should have a good understanding of whether we can use the variables as they are or need to transform them into something that can be passed to an ML model. This includes filling in missing values, encoding categorical variables, handling dates, etc.
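The three transformations mentioned above can be sketched as follows. The field names (`age`, `channel`, `applied_on`), the median fill strategy, and the one-hot encoding are illustrative assumptions; the right choices depend on the data and the model.

```python
from datetime import date
from statistics import median

# Hypothetical raw records; None marks a missing value.
rows = [
    {"age": 34, "channel": "web", "applied_on": date(2019, 9, 13)},
    {"age": None, "channel": "branch", "applied_on": date(2019, 1, 5)},
    {"age": 52, "channel": "web", "applied_on": date(2018, 12, 30)},
]

def engineer(rows):
    """Return numeric feature dicts built from the raw rows."""
    # Fill missing ages with the median of the observed ages.
    age_fill = median(r["age"] for r in rows if r["age"] is not None)
    # One-hot encode the categorical variable over a sorted category list.
    channels = sorted({r["channel"] for r in rows})
    features = []
    for r in rows:
        out = {"age": r["age"] if r["age"] is not None else age_fill}
        for c in channels:
            out[f"channel_{c}"] = 1 if r["channel"] == c else 0
        # Break the date into numeric parts a model can consume.
        out["applied_month"] = r["applied_on"].month
        out["applied_weekday"] = r["applied_on"].weekday()  # Monday == 0
        features.append(out)
    return features

features = engineer(rows)
```

In a real project the fill value and the category list would be learned from the training data only and then reused unchanged on new data.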

Step 4: Feature Selection / Variable Selection

Find the variables that are most relevant to solving the problem, and build the model using those variables.
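One simple way to do this, shown here only as a sketch, is to rank each variable by the absolute value of its correlation with the target and keep the top `k`. This is just one of many selection techniques; the variable names and values are hypothetical.

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    denom = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / denom if denom else 0.0  # a constant column carries no signal

def select_top_k(features, target, k):
    """Return the names of the k features most correlated with the target."""
    scores = {name: abs(pearson(vals, target)) for name, vals in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]

# Hypothetical candidate variables and a binary target.
features = {
    "income": [30, 45, 60, 75, 90],
    "shoe_size": [42, 38, 44, 39, 41],
    "constant": [1, 1, 1, 1, 1],
}
target = [0, 0, 1, 1, 1]

selected = select_top_k(features, target, k=1)
```

Correlation only captures linear, one-variable-at-a-time relationships, so this ranking is a starting point, not a final answer.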

Step 5: Model Building

Here we train several candidate ML algorithms, compare their performance, and keep the one that gives the best result. The statistical metrics of the model are evaluated at this step.
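The selection loop can be sketched as below: fit each candidate on training data, score it on held-out data, and keep the best. The two "models" here (a majority-class baseline and a single-threshold rule) are deliberately trivial stand-ins, and the data and the choice of accuracy as the metric are assumptions for illustration.

```python
# Hypothetical data: one feature (a risk score) and a binary label.
train = [(0.1, 0), (0.2, 0), (0.3, 0), (0.35, 0), (0.6, 1), (0.8, 1), (0.9, 1)]
valid = [(0.15, 0), (0.3, 0), (0.7, 1), (0.95, 1)]

def fit_majority(rows):
    """Baseline: always predict the most common training label."""
    labels = [y for _, y in rows]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority

def fit_threshold(rows):
    """Predict 1 above the midpoint between the two class means."""
    zeros = [x for x, y in rows if y == 0]
    ones = [x for x, y in rows if y == 1]
    cut = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    return lambda x: 1 if x > cut else 0

def accuracy(model, rows):
    """Share of rows where the prediction matches the label."""
    return sum(model(x) == y for x, y in rows) / len(rows)

# Fit every candidate on the training set, score it on the validation set.
candidates = {"majority": fit_majority, "threshold": fit_threshold}
scores = {name: accuracy(fit(train), valid) for name, fit in candidates.items()}
best = max(scores, key=scores.get)
```

Scoring on data the model has not seen during fitting is what makes the comparison meaningful; scoring on the training set would favour models that merely memorise it.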

Step 6: Model - business uplift evaluation

We evaluate the uplift in business value that the new model brings. For example, if we were building a model for fraud, we would evaluate the amount of money that we would not disburse to fraudulent applications.
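For the fraud example, the arithmetic might look like the sketch below. The applications, amounts, and the idea of comparing against an existing (old) model are hypothetical assumptions.

```python
# Hypothetical labelled applications with the decisions of the old and new model.
applications = [
    {"amount": 10_000, "is_fraud": True, "old_flag": False, "new_flag": True},
    {"amount": 5_000, "is_fraud": True, "old_flag": True, "new_flag": True},
    {"amount": 8_000, "is_fraud": False, "old_flag": False, "new_flag": False},
    {"amount": 12_000, "is_fraud": True, "old_flag": False, "new_flag": False},
    {"amount": 7_000, "is_fraud": False, "old_flag": False, "new_flag": True},
]

def fraud_prevented(apps, flag_key):
    """Total amount not disbursed to fraudulent applications that were flagged."""
    return sum(a["amount"] for a in apps if a["is_fraud"] and a[flag_key])

old_saved = fraud_prevented(applications, "old_flag")
new_saved = fraud_prevented(applications, "new_flag")
uplift = new_saved - old_saved  # extra money protected by the new model
```

This sketch counts only prevented fraud. A fuller evaluation would also weigh the cost of legitimate applications that the new model flags by mistake, such as the last record above.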


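For a model to be deployed to production, Steps 3, 4 and 5 must all be deployed together, because new data has to pass through the same feature engineering and feature selection before the model can score it.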

For the whole system, we need to deploy both the data pipeline and the model pipeline.




