Model Deployment
Deploying a model can mean making it available to a small group of users within an organization or to a customer operating on the other side of the planet. In either case, it is best to provide the machine learning model as a service that users can access for their own purposes. This service can be hosted in a production environment to which the user is granted access.
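One common way to expose a model as a service is behind an HTTP endpoint. The sketch below is a minimal, hypothetical illustration using only the Python standard library; the `predict` function is a toy stand-in for a real trained model, and the handler, route, and JSON schema are assumptions for the example, not a prescribed API.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    # Toy stand-in for a trained model: classify as 1 when the
    # feature sum is positive. A real service would call a loaded
    # model artifact here instead.
    return 1 if sum(features) > 0 else 0

class PredictionHandler(BaseHTTPRequestHandler):
    """Accepts a JSON body like {"features": [1.0, 2.0]} via POST
    and returns {"prediction": ...} as JSON."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"prediction": predict(payload["features"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # suppress per-request logging in this example

def run(port=8000):
    # Blocking call; in production this role is usually filled by a
    # proper application server rather than http.server.
    HTTPServer(("localhost", port), PredictionHandler).serve_forever()
```

A client would then POST feature vectors to the endpoint and receive predictions back as JSON, without ever needing direct access to the model artifact itself.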
Machine learning models can be deployed in offline or online mode. In offline mode, the model performs batch prediction: when a request for service is made, the model and data are loaded dynamically into memory as a separate process, the prediction functions are called, and the model is then unloaded from memory to free its resources. In online mode, the model can be deployed in a container to a service cluster distributed across many servers.
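The offline lifecycle described above (load on request, predict on a batch, unload and free resources) can be sketched as follows. The `ThresholdModel` class and `batch_predict` helper are hypothetical names invented for this illustration; a real deployment would load a serialized artifact produced by a training pipeline.

```python
import gc
import pickle

class ThresholdModel:
    """Hypothetical stand-in for a trained model artifact."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, rows):
        # Classify each row as 1 when its feature sum exceeds the threshold.
        return [1 if sum(row) > self.threshold else 0 for row in rows]

def batch_predict(model_path, rows):
    """Offline-mode request cycle: load the model, run one batch of
    predictions, then unload it and free its resources."""
    with open(model_path, "rb") as f:
        model = pickle.load(f)  # loaded into memory only for this request
    try:
        return model.predict(rows)
    finally:
        del model      # unload the model from memory
        gc.collect()   # encourage prompt release of resources
```

Because the model lives in memory only for the duration of one batch, this pattern keeps the steady-state footprint small, at the cost of paying the load time on every request.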
When handling big data, large computing power becomes essential. In that case, machine learning models (offline or online) can be deployed in any of the following ways.
Cloud computing: the delivery of computing services over the Internet (the "cloud"). These services include storage, databases, software, networking, analytics, and intelligence, and are broadly categorized as Infrastructure, Platform, and Software.
High-performance computing on-premise: the use of supercomputers for performing complex computations and large-scale parallel processing. When deployed on-site, a network of computers (referred to as a cluster) with high processing power, large memory, and fast network connections is set up; these computers collectively perform the computations efficiently and in less time.
High-performance computing on cloud: provisioning several high-powered machines in the cloud and accessing them via the Internet. This combines the best of cloud computing and high-performance computing, offering the flexibility to scale the infrastructure up or down based on the workload at hand.