Machine Learning Framework
Last updated
A framework is a complete package: an interface or library for developing models. In recent years, many such frameworks have been created for implementing machine learning algorithms. Many of these frameworks are free and open-source, and several are developed or maintained by big tech companies such as Microsoft, Facebook and Google.
These frameworks provide a one-stop approach for developers to quickly develop their models. They are designed to be easy to code with and to debug. They optimize model performance while providing some insight into how the model works. They also offer functions for parallelizing computation. Some of the popular machine learning frameworks are briefly described below (Figure 5-11).
TensorFlow: It is an open-source library developed by Google Brain for deep neural networks and machine learning. It performs numerical computations using data flow graphs, in which nodes represent operations and edges represent the multidimensional arrays (tensors) passed between them. The library supports regression, classification and neural network models. It is flexible, allowing different models and versions to run simultaneously. It runs on CPUs, GPUs, desktops and mobile devices.
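The data flow graph idea can be illustrated with a toy sketch in plain Python. This is not the TensorFlow API: the `Node` class and the `placeholder`, `add` and `multiply` helpers are hypothetical names chosen for illustration, and plain lists stand in for tensors.

```python
# Toy data flow graph in plain Python (illustrative only, not the TensorFlow API).
# Nodes are operations; edges carry lists of numbers standing in for tensors.


class Node:
    """One operation in the graph, fed by the outputs of its input nodes."""

    def __init__(self, op, *inputs):
        self.op = op          # function applied to the incoming arrays
        self.inputs = inputs  # upstream nodes, i.e. the incoming edges

    def run(self, feed):
        # Evaluate the upstream nodes first, then apply this node's operation.
        args = [node.run(feed) for node in self.inputs]
        return self.op(feed, *args)


def placeholder(name):
    # A source node: its value is supplied at run time through `feed`.
    return Node(lambda feed: feed[name])


def add(a, b):
    return Node(lambda feed, x, y: [xi + yi for xi, yi in zip(x, y)], a, b)


def multiply(a, b):
    return Node(lambda feed, x, y: [xi * yi for xi, yi in zip(x, y)], a, b)


# Build the graph once: out = (x * w) + b
x, w, b = placeholder("x"), placeholder("w"), placeholder("b")
out = add(multiply(x, w), b)

# Run the graph with concrete arrays flowing along the edges.
result = out.run({"x": [1.0, 2.0, 3.0], "w": [2.0, 2.0, 2.0], "b": [0.5, 0.5, 0.5]})
print(result)  # [2.5, 4.5, 6.5]
```

Separating graph construction from execution is what lets a framework analyze the whole computation and place operations on different devices before any data flows through it.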
Keras: It is a deep learning API, written in Python, that can run on top of TensorFlow, Theano or Microsoft Cognitive Toolkit (CNTK). It is a user-friendly framework that supports convolutional neural networks (CNN), recurrent neural networks (RNN) and their combinations. It is compatible with CPUs and GPUs. Each component of the model is a separate module, and modules can be connected to create a new model. The models are written in Python, which makes them compact, easy to debug and quick to extend.
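The modular style described for Keras can be sketched in plain Python. The `Scale`, `Shift`, `ReLU` and `Sequential` classes below are hypothetical stand-ins written for this illustration, not Keras classes; the point is only that small modules sharing one calling convention can be chained into a model and reused in another.

```python
# Toy illustration of modular model building (plain Python, not the Keras API).


class Scale:
    """Hypothetical module: multiplies every input by a fixed factor."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, values):
        return [v * self.factor for v in values]


class Shift:
    """Hypothetical module: adds a fixed offset to every input."""

    def __init__(self, offset):
        self.offset = offset

    def __call__(self, values):
        return [v + self.offset for v in values]


class ReLU:
    """Rectified linear activation: negative values become zero."""

    def __call__(self, values):
        return [max(0.0, v) for v in values]


class Sequential:
    """Chains modules so the output of one becomes the input of the next."""

    def __init__(self, modules):
        self.modules = list(modules)

    def __call__(self, values):
        for module in self.modules:
            values = module(values)
        return values


# Connect three modules into a model.
model = Sequential([Scale(2.0), Shift(-3.0), ReLU()])
output = model([0.0, 1.0, 2.0, 3.0])
print(output)  # [0.0, 0.0, 1.0, 3.0]

# Reuse the same modules as the front of a new, larger model.
bigger = Sequential(model.modules + [Scale(10.0)])
bigger_output = bigger([0.0, 1.0, 2.0, 3.0])
print(bigger_output)  # [0.0, 0.0, 10.0, 30.0]
```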
PyTorch: It is another open-source machine learning library. It is based on Torch, a library built on the Lua programming language, but it provides a Python interface. It supports dynamic computational graphs: the model can be changed while it runs, which is better than going back to the first step and recreating the entire model. It is user-friendly, with many pre-trained models available for fast and efficient use.
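A dynamic computational graph is one that is recorded while the code runs, so ordinary control flow can change its shape from one call to the next. The sketch below is a minimal scalar example in plain Python, not PyTorch code; the `Value` class and `forward` function are hypothetical names, and the naive recursive `backward` is only suitable for tiny graphs.

```python
# Toy define-by-run automatic differentiation (plain Python, not the PyTorch API).


class Value:
    """A scalar that records how it was computed as operations execute."""

    def __init__(self, data, parents=(), grad_fns=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents    # values this one was computed from
        self.grad_fns = grad_fns  # maps the output gradient to each parent's share

    def __add__(self, other):
        return Value(self.data + other.data, (self, other),
                     (lambda g: g, lambda g: g))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     (lambda g: g * other.data, lambda g: g * self.data))

    def backward(self, grad=1.0):
        # Naive recursion that sums gradient contributions over every path.
        self.grad += grad
        for parent, grad_fn in zip(self.parents, self.grad_fns):
            parent.backward(grad_fn(grad))


def forward(x, steps):
    # The number of multiplications, and so the graph, is decided at run time.
    y = x
    for _ in range(steps):
        y = y * x
    return y


x = Value(2.0)
y = forward(x, steps=2)  # builds the graph for x**3
y.backward()
print(y.data, x.grad)    # 8.0 12.0  (d/dx x**3 = 3 * x**2)

x2 = Value(3.0)
y2 = forward(x2, steps=1)  # a different graph, x**2, from the same code
y2.backward()
print(y2.data, x2.grad)    # 9.0 6.0
```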
scikit-learn: It is a free Python library for machine learning that includes many supervised and unsupervised learning algorithms. It is built on established libraries such as SciPy, NumPy and Matplotlib. It is efficient and easy to use for building models. It includes algorithms for classification, regression and clustering, such as support vector machines, gradient boosting, random forests, k-means and DBSCAN (density-based spatial clustering of applications with noise).
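To make one of the listed algorithms concrete, here is a minimal k-means sketch on one-dimensional data in plain Python. It is not the scikit-learn implementation: `TinyKMeans` is a hypothetical class, the initial centers are passed in explicitly to keep the run deterministic, and only the `fit`/`predict` naming mirrors the convention that scikit-learn estimators follow.

```python
# Minimal 1-D k-means (plain Python sketch, not the scikit-learn implementation).


class TinyKMeans:
    """Hypothetical 1-D k-means with a fit/predict interface."""

    def __init__(self, initial_centers, n_iter=10):
        self.centers = list(initial_centers)
        self.n_iter = n_iter

    def predict(self, points):
        # For each point, return the index of the nearest center.
        return [
            min(range(len(self.centers)), key=lambda k: abs(p - self.centers[k]))
            for p in points
        ]

    def fit(self, points):
        for _ in range(self.n_iter):
            # Assignment step: label every point with its nearest center.
            labels = self.predict(points)
            # Update step: move each center to the mean of its members.
            for k in range(len(self.centers)):
                members = [p for p, label in zip(points, labels) if label == k]
                if members:
                    self.centers[k] = sum(members) / len(members)
        return self


points = [1.0, 1.5, 2.0, 8.0, 9.0, 10.0]
model = TinyKMeans(initial_centers=[0.0, 5.0]).fit(points)

print(model.centers)               # [1.5, 9.0]
print(model.predict(points))       # [0, 0, 0, 1, 1, 1]
print(model.predict([3.0, 7.0]))   # [0, 1]
```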
Caffe: It is an open-source framework supporting different deep learning architectures for image classification and segmentation. The deep learning models can be scaled up using GPUs. In Caffe, switching between CPU and GPU requires setting just a single flag. Caffe2 is an improved version created by Facebook, which provides many cross-platform libraries for deploying models on mobile devices.
Apache Singa: It provides a flexible programming model whose training process can be parallelized. It is extensible across a wide range of hardware. It consists of three main components: IO, Model and Core. The IO component handles reading and writing data. The Core component handles operations and memory functions. The Model component comprises the algorithms and data structures of the machine learning models.
Apache Mahout: This is an extensible framework for building scalable implementations of machine learning techniques, including clustering, recommendation and classification. It includes matrix and vector libraries. It is deployed on Hadoop using the MapReduce paradigm.
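The MapReduce paradigm can be sketched on a single machine in plain Python. The example below counts how often two items were liked by the same user, a typical building block of a recommender. It is not Mahout or Hadoop code: the `map_phase`, `shuffle_phase` and `reduce_phase` functions and the sample data are hypothetical, and a real cluster would run the map and reduce steps in parallel across many machines.

```python
# Single-machine sketch of MapReduce (plain Python, not Mahout or Hadoop code).
from collections import defaultdict
from itertools import combinations


def map_phase(user_items):
    # Emit ((item_a, item_b), 1) for every pair of items one user liked.
    for items in user_items.values():
        for pair in combinations(sorted(items), 2):
            yield pair, 1


def shuffle_phase(pairs):
    # Group all emitted values by key, as the framework does between phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    # Combine the values for each key into a single co-occurrence count.
    return {key: sum(values) for key, values in grouped.items()}


# Made-up sample data for illustration.
user_items = {
    "alice": {"book", "lamp", "pen"},
    "bob": {"book", "pen"},
    "carol": {"lamp", "pen"},
}

cooccurrence = reduce_phase(shuffle_phase(map_phase(user_items)))
print(cooccurrence[("book", "pen")])   # 2
print(cooccurrence[("book", "lamp")])  # 1
```

Because each map call looks at one user only and each reduce call looks at one key only, the work can be split across machines without coordination beyond the shuffle step.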
Apache Spark: It is a distributed open-source cluster-computing framework. It is written mainly in Scala and provides APIs in Java, Scala, R and Python. It scales across different machines, big or small, locally or in the cloud. It can also access data from different sources. Its functionalities include ETL (Extract, Transform and Load), machine learning, and batch and stream processing of data.
Amazon Machine Learning: It provides visualization tools and wizards. It supports three types of models: binary classification, multi-class classification and regression. It allows users to create a data source object from a MySQL database or from data stored in Amazon Redshift.
Microsoft Cognitive Toolkit (CNTK): This is an open-source deep learning framework created by Microsoft that describes neural networks as a series of computational steps in a directed graph. It is popularly used for speech, text and image recognition. It supports a wide variety of algorithms including CNN, RNN and LSTM (long short-term memory) networks. It runs on multiple hardware types (CPUs, GPUs) and supports parallelization. With options to customize the model in terms of metrics, algorithms and networks, it is easy for developers to use.