The Project


Machine learning models are software artefacts. Among the stream of models being generated, not all satisfy the basic requirements of real-world deployment. Can we continuously test ML models the way we test traditional software? This project is a continuous integration engine developed for ML. Given a new machine learning model committed into the system, a test dataset, and a set of user-specified test conditions (e.g., the new model is at least 1 point better than the old model), it checks whether the model satisfies all the test conditions.
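To make the example condition concrete, here is a minimal sketch of a check that "the new model is at least 1 point better than the old model" on a labelled test set. The function names (`accuracy`, `passes_condition`) and the reading of "1 point" as one percentage point of accuracy are assumptions made for illustration, not the system's actual interface. The check compares point estimates only and ignores statistical uncertainty.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of test examples that the model labels correctly."""
    if len(predictions) != len(labels) or len(labels) == 0:
        raise ValueError("predictions and labels must be non-empty and of equal length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def passes_condition(
    new_predictions: Sequence[int],
    old_predictions: Sequence[int],
    labels: Sequence[int],
    min_gain_points: float = 1.0,
) -> bool:
    """Hypothetical test condition: the new model beats the old one by at
    least `min_gain_points` percentage points of accuracy on the test set."""
    gain = 100.0 * (accuracy(new_predictions, labels) - accuracy(old_predictions, labels))
    return gain >= min_gain_points
```

A model whose predictions are correct on 9 of 10 examples passes against an old model that is correct on 7 of 10, and fails in the reverse direction.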

One technical challenge is overfitting: after every test query, the test set loses some of its statistical power. If we are not careful, we will eventually overfit to the provided test set and may return a wrong answer. The technical core of the system is a collection of techniques that measure the “information leakage” that comes with each test query and inform the user when a new test dataset is required.
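One way to see why repeated queries erode a test set's power is a textbook Hoeffding bound combined with a union bound. The sketch below is an illustration under that assumption and not necessarily the technique the system uses. The function name `labels_needed` and its parameters are hypothetical. In the adaptive case, each binary Pass/Failure answer may influence the next model, so the sketch takes a coarse union bound over up to 2^H answer histories for H queries.

```python
import math


def labels_needed(
    epsilon: float,
    delta: float,
    num_queries: int = 1,
    adaptive: bool = False,
) -> int:
    """Test-set size so that every accuracy estimate is within `epsilon` of
    the true accuracy with probability at least 1 - delta.

    Hoeffding: P(|estimate - truth| > epsilon) <= 2 * exp(-2 * n * epsilon**2).
    Non-adaptive reuse: union bound over `num_queries` fixed models.
    Adaptive reuse (assumption): union bound over 2**num_queries histories.
    """
    if not (0 < epsilon < 1 and 0 < delta < 1) or num_queries < 1:
        raise ValueError("need 0 < epsilon < 1, 0 < delta < 1, num_queries >= 1")
    if adaptive:
        log_hypotheses = num_queries * math.log(2)
    else:
        log_hypotheses = math.log(num_queries)
    return math.ceil((math.log(2 / delta) + log_hypotheses) / (2 * epsilon ** 2))
```

Under these assumptions, a single query with `epsilon=0.01` and `delta=0.001` already needs roughly 38,000 labels, and the requirement grows with every additional query answered from the same test set. Growth is logarithmic in the number of queries in the non-adaptive case and linear in the adaptive one.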

Input: (1) An endless stream of models trained by the AutoML system; (2) A test set and a list of test conditions.

Output: An endless stream of models, each labelled with one of {Pass, Failure}.

Action: The user has to provide a new test set when the system requests one.
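The Input / Output / Action contract above can be sketched as a simple loop. Everything here is hypothetical and for illustration only: the `LabelledSet` type, its `budget` field (how many test queries one test set may answer), and the `run_ci`, `condition` and `next_test_set` names are assumptions, not the system's API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Sequence, Tuple

# Hypothetical model type: maps one feature vector to a predicted label.
Model = Callable[[Sequence[float]], int]


@dataclass
class LabelledSet:
    features: Sequence[Sequence[float]]
    labels: Sequence[int]
    budget: int  # assumption: number of test queries this set can still answer


def run_ci(
    models: Iterable[Model],
    condition: Callable[[Model, LabelledSet], bool],
    next_test_set: Callable[[], LabelledSet],
) -> Iterator[Tuple[Model, str]]:
    """Label each incoming model Pass or Failure, requesting a fresh test
    set from the user whenever the current one has no budget left."""
    test_set = next_test_set()
    for model in models:
        if test_set.budget == 0:
            # Action: the user has to supply a new test set.
            test_set = next_test_set()
        test_set.budget -= 1
        yield model, "Pass" if condition(model, test_set) else "Failure"
```

Because `run_ci` is a generator, it works on an endless stream of models and only asks for a new test set at the moment one is actually needed.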



Continuous Integration of Machine Learning Models with A Rigorous Yet Practical Treatment
C Renggli, B Karlaš, B Ding, F Liu, K Schawinski, W Wu, C Zhang
[MLSYS] Proceedings of Machine Learning and Systems

Continuous integration is an indispensable step of modern software engineering practices to systematically manage the life cycles of system development. Developing a machine learning model is no different: it is an engineering process with a life cycle, including design, implementation, tuning, testing, and deployment. However, most, if not all, existing continuous integration engines do not support machine learning as first-class citizens. In this paper, we present what is, to our best knowledge, the first continuous integration system for machine learning. The challenge of building such a system is to provide rigorous guarantees, e.g., single accuracy point error tolerance with 0.999 reliability, with a practical amount of labeling effort, e.g., 2K labels per test. We design a domain specific language that allows users to specify integration conditions with reliability constraints, and develop simple novel optimizations that can lower the number of labels required by up to two orders of magnitude for test conditions popularly used in real production systems.


and in action: towards data management for statistical generalization
C Renggli, FA Hubis, B Karlaš, K Schawinski, W Wu, C Zhang
[VLDB Demo] Proceedings of the VLDB Endowment

Developing machine learning (ML) applications is similar to developing traditional software: it is often an iterative process in which developers navigate within a rich space of requirements, design decisions, implementations, empirical quality, and performance. In traditional software development, software engineering is the field of study which provides principled guidelines for this iterative process. However, as of today, the counterpart of “software engineering for ML” is largely missing: developers of ML applications are left with powerful tools (e.g., TensorFlow and PyTorch) but little guidance regarding the development lifecycle itself. In this paper, we view the management of ML development life-cycles from a data management perspective. We demonstrate two closely related systems that provide some “principled guidelines” for ML application development: ci is a continuous …


Building continuous integration services for machine learning
B Karlaš, M Interlandi, C Renggli, W Wu, C Zhang, DMI Babu, J Edwards, C Lauren, A Xu, M Weimer
[SIGKDD] Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

Continuous integration (CI) has been a de facto standard for building industrial-strength software. Yet little attention was paid to applying CI to the development of machine learning (ML) applications until very recent efforts on the theoretical side. In this paper, we take a step forward to bring the theory into practice.