Saturday, September 21, 2019

Custom Machine Learning Pipeline in production

A custom ML pipeline is built using object-oriented programming (OOP).

In OOP, we write code in the form of objects.
An object can store data, and it can also store instructions or procedures that modify that data.
       Data => attributes.
       Instructions or procedures => methods.
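
As a minimal sketch of the attribute/method split described above, the toy class below keeps its data in an attribute and changes or uses it through methods. The class name CategoryEncoder and the colour labels are made up for illustration; they are not part of the scripts referenced in this post.

```python
class CategoryEncoder:
    """Toy object: holds data (an attribute) and procedures (methods)."""

    def __init__(self):
        # Attribute: the data stored inside the object.
        self.mapping = {}

    def fit(self, labels):
        # Method: modifies the stored data by assigning each new label
        # the next free integer code.
        for label in labels:
            if label not in self.mapping:
                self.mapping[label] = len(self.mapping)
        return self

    def transform(self, labels):
        # Method: uses the stored data to convert labels into codes.
        return [self.mapping[label] for label in labels]


encoder = CategoryEncoder().fit(["red", "blue", "red"])
codes = encoder.transform(["blue", "red"])
```

After the call to fit, the mapping lives on the object itself, so the same encoder can be reused on new data without recomputing anything.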

A pipeline is a set of data processing steps connected in series, where typically, the output of one element is the input of the next one.

The elements of a pipeline can be executed in parallel or in a time-sliced fashion. This is useful when we work with big data or need a lot of computing power, e.g. for neural networks.

So, a custom ML pipeline is a sequence of steps that load and transform data to get it ready for training or scoring, where:
   - We write the processing steps as objects (OOP)
   - We write the sequence, i.e. the pipeline, as an object (OOP)
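
The two points above can be sketched in plain Python as follows. This is only an illustration under assumed names (FillMissing, MinMaxScale and SimplePipeline are hypothetical) and it is not the content of customPipelineProcessor.py or customPipelineTrain.py.

```python
class FillMissing:
    """Processing step: replace None with the mean learned during fit."""

    def __init__(self):
        self.fill_value_ = None

    def fit(self, values):
        observed = [v for v in values if v is not None]
        self.fill_value_ = sum(observed) / len(observed)
        return self

    def transform(self, values):
        return [self.fill_value_ if v is None else v for v in values]


class MinMaxScale:
    """Processing step: rescale values using the min and max seen in fit."""

    def __init__(self):
        self.min_ = None
        self.max_ = None

    def fit(self, values):
        self.min_, self.max_ = min(values), max(values)
        return self

    def transform(self, values):
        span = (self.max_ - self.min_) or 1.0  # avoid division by zero
        return [(v - self.min_) / span for v in values]


class SimplePipeline:
    """The sequence itself, written as an object."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, values):
        # Fit each step on the output of the previous one.
        for step in self.steps:
            values = step.fit(values).transform(values)
        return self

    def transform(self, values):
        # Apply the already fitted steps in the same order.
        for step in self.steps:
            values = step.transform(values)
        return values


pipeline = SimplePipeline([FillMissing(), MinMaxScale()])
pipeline.fit([0.0, None, 10.0])
prepared = pipeline.transform([5.0, None])
```

Each step learns its parameters once on the training data, and the pipeline object then replays the same steps, in the same order, at scoring time.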

Refer: customPipelineProcessor.py
           customPipelineTrain.py




Leveraging a third-party pipeline: Scikit-Learn




How is scikit-learn organized?




A scikit-learn pipeline can contain as many transformers as you want. Every step except the last one must be a transformer, and the last step should be a predictor.
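
A minimal sketch of that layout, using only built-in scikit-learn classes: two transformers followed by a predictor. The toy data and the choice of SimpleImputer, StandardScaler and LinearRegression are assumptions made for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data: one feature with a missing value, and a numeric target.
X = np.array([[1.0], [2.0], [np.nan], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])

pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),  # transformer
    ("scaler", StandardScaler()),                 # transformer
    ("model", LinearRegression()),                # predictor (last step)
])

# fit runs fit_transform on each transformer in order, then fits the model.
pipe.fit(X, y)

# predict runs transform on each transformer, then predict on the model.
predictions = pipe.predict(X)
```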




Feature creation and feature engineering steps as Scikit-Learn objects.

Transformers: classes that have fit and transform methods; they transform data.
Use of the scikit-learn base transformers:
     Inherit from the base class and adjust the fit and transform methods.
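
As a hedged sketch of this inheritance pattern: scikit-learn provides BaseEstimator and TransformerMixin in sklearn.base, and a custom transformer can inherit from both and supply its own fit and transform. The class ColumnMeanImputer below is a made-up example, not one of the transformers from the course scripts.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class ColumnMeanImputer(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: fill NaNs with the column mean."""

    def fit(self, X, y=None):
        # Learn one mean per column, ignoring missing values.
        X = np.asarray(X, dtype=float)
        self.means_ = np.nanmean(X, axis=0)
        return self  # fit must return self so calls can be chained

    def transform(self, X):
        # Work on a copy so the caller's data is left untouched.
        X = np.asarray(X, dtype=float).copy()
        rows, cols = np.where(np.isnan(X))
        X[rows, cols] = self.means_[cols]
        return X


X_train = np.array([[1.0, np.nan], [3.0, 4.0]])

# fit_transform is inherited from TransformerMixin.
filled = ColumnMeanImputer().fit_transform(X_train)
```

Inheriting from TransformerMixin gives the class fit_transform for free, and BaseEstimator supplies get_params and set_params, which is what lets the object sit inside a scikit-learn Pipeline.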



Scikit-Learn Pipeline - Code
Below is the code for the Scikit-Learn pipeline, utilising the transformers we created in the previous lecture. Briefly, we list the different transformers inside the pipeline, in the order they should run. The final step is the linear model. The scaler should run immediately before the linear model.

You will better understand the structure of the code in the coming lectures. Briefly, we write the transformers in a script within a folder called processing. We also write a config file, where we specify the categorical and numerical variables. Bear with us and we will show you all the scripts. For now, make sure you understand well how to write a scikit-learn pipeline.
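
The structure described above can be sketched roughly as follows. This is an assumption-laden illustration, not the course code: the transformer classes, the variable lists standing in for the config file, the toy data and the use of LinearRegression as the linear model are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-ins for the variable lists that would live in a config file.
CATEGORICAL_VARS = ["colour"]
NUMERICAL_VARS = ["size"]


class NumericalImputer(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: fill missing numbers with the median."""

    def __init__(self, variables=None):
        self.variables = variables

    def fit(self, X, y=None):
        self.medians_ = {var: X[var].median() for var in self.variables}
        return self

    def transform(self, X):
        X = X.copy()
        for var in self.variables:
            X[var] = X[var].fillna(self.medians_[var])
        return X


class CategoricalEncoder(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: map each label to an integer code."""

    def __init__(self, variables=None):
        self.variables = variables

    def fit(self, X, y=None):
        self.mappings_ = {
            var: {label: code
                  for code, label in enumerate(sorted(X[var].unique()))}
            for var in self.variables
        }
        return self

    def transform(self, X):
        X = X.copy()
        for var in self.variables:
            # Labels not seen during fit become NaN in this simple sketch.
            X[var] = X[var].map(self.mappings_[var])
        return X


# Transformers are listed in the order they should run; the scaler sits
# right before the linear model, which is the final step.
pipe = Pipeline([
    ("numerical_imputer", NumericalImputer(variables=NUMERICAL_VARS)),
    ("categorical_encoder", CategoricalEncoder(variables=CATEGORICAL_VARS)),
    ("scaler", StandardScaler()),
    ("linear_model", LinearRegression()),
])

train = pd.DataFrame({
    "colour": ["red", "blue", "red", "blue"],
    "size": [1.0, np.nan, 3.0, 4.0],
})
target = np.array([10.0, 20.0, 30.0, 40.0])

pipe.fit(train, target)
predictions = pipe.predict(train)
```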
