The competition between Google and Microsoft is as hot as ever. Google recently released TensorFlow, its open source machine learning framework, and Microsoft is right on its heels with a similar project called the Distributed Machine Learning Toolkit (DMLT). The toolkit simplifies machine learning across distributed systems by allowing models to be trained on multiple nodes at once.
In its introduction to the framework, Microsoft wrote: “Bigger models tend to generate better accuracies in various applications. However, it remains a challenge for common machine learning researchers and practitioners to learn big models.”
The core of DMLT is a C++ SDK for a client-server architecture. “A number of server instances run on multiple machines and are responsible for maintaining the global model parameters,” says Microsoft in its documentation. “The training routines access and update the parameters with some client APIs that call the underlying communication facilities.”
DMLT was created to let data scientists train models across multiple machine nodes without worrying about the details of managing threads or workloads. It also simplifies inter-process communication: two different libraries, MPI and ZMQ, are available and can be used interchangeably.
DMLT includes two families of model-training algorithms. LightLDA, a topic-modeling algorithm, will likely be the most commonly used for quickly training large models; Microsoft claims it has trained models with “trillions of parameters” on only an eight-node system using LightLDA. Also included are Distributed Word Embedding and Distributed Multisense Word Embedding, algorithms for determining how words relate to one another.
Microsoft gave DMLT a surprisingly low-key release: the source code was made publicly available on November 9. The company says this is only the beginning, with more algorithms on the way.