MLOps: addressing the gap between data science and software engineering

March 10, 2024

Data-driven decision making has shaped businesses for centuries[1]; the term business intelligence was coined to mean the ability to make informed decisions based on current events, past information, and expertise. Since the digital revolution that began in the 1950s, analytics has continued to evolve, transforming the landscape of decision-making and organizational strategy[2]. Initially, analytics focused on basic descriptive statistics to understand past trends and performance. However, with the advent of more powerful computing technologies and the accumulation of vast amounts of data, analytics progressed to encompass predictive analytics, enabling organizations to forecast future outcomes and trends[3]. This evolution laid the groundwork for the integration of AI into organizational processes, which, from the end of the past decade and into the current one[4], has grown from a novel concept into a cornerstone of modern business strategy[5][6].

This evolution has seen AI applications expand across various industries, from personalized recommendation systems in sales to fraud detection in finance, with 91.5% of companies investing in AI research to some degree[7]. Despite this, only 14.6% of them have integrated AI capabilities into production. A likely cause of this disconnect is deployment, which is cited as one of the most critical problems in AI/ML implementation[8].

Besides the more obvious possible causes, such as the high cost and scarcity of the hardware required to run large models[9], a major contributor to this gap seems to lie in the backgrounds of data scientists, a large majority of whom have no formal training in computer science or computer engineering[10]. This knowledge gap may lead to a disconnect in organizational infrastructure, where enterprises are able to understand business cases and build models that fit their needs, but lack the know-how to integrate these predictive capabilities into their production workflows.

Figure 1.1: Data Scientist LinkedIn profiles by background.

Additionally, given the importance of integrating several data sources, managing data repositories, and ensuring data resilience and availability[11], it is sensible to suggest that engineers involved in ML projects should have a strong backend background.

What is MLOps?

From 2018 onwards, the term MLOps (Machine Learning Operations), which stems from the concept of DevOps, quickly gained relevance[12] in response to the growing need for streamlined processes for deploying, monitoring, and managing machine learning models in production environments. It refers to the practices, tools, and organizational culture that integrate ML models into the software development and operational lifecycle[11].

Figure 1.2: Popularity of the term MLOps from 2018 to present day. Google Trends.

Since then, the MLOps ecosystem has continued to grow, and it is precisely this evolution that has allowed Machine Learning models to become an industry staple. It lets businesses scale their analytical engines to a larger number of models by keeping track of versioning and by turning the usual validation steps (computing metrics on test data, guarding against model degradation…) into the steps of a typical DevOps continuous integration pipeline[11].
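As a rough illustration of how a validation step can become a continuous integration check, the sketch below compares a candidate model's test metrics against those of the model currently in production. The names `Metrics` and `passes_gate`, the chosen metrics, and the threshold values are all assumptions made for this example; they do not come from any particular MLOps tool.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    """Test-set metrics for one model version (hypothetical structure)."""
    accuracy: float
    f1: float


def passes_gate(candidate: Metrics, production: Metrics,
                min_accuracy: float = 0.80, tolerance: float = 0.01) -> bool:
    """Return True when the candidate may replace the production model."""
    # Absolute floor: reject models below the minimum acceptable accuracy.
    if candidate.accuracy < min_accuracy:
        return False
    # Regression check: the candidate may not be worse than production
    # by more than `tolerance` on any tracked metric.
    return (candidate.accuracy >= production.accuracy - tolerance
            and candidate.f1 >= production.f1 - tolerance)


production = Metrics(accuracy=0.90, f1=0.88)
print(passes_gate(Metrics(accuracy=0.91, f1=0.89), production))  # True
print(passes_gate(Metrics(accuracy=0.85, f1=0.80), production))  # False
```

In a real pipeline, a check of this kind would typically run as one stage of the CI job and fail the build when it returns False, in the same way a failing unit test would.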

What the ecosystem offers

While the tools in this ecosystem are many, ranging from simple solutions to entire suites with end-to-end management capabilities, the basic needs of any successfully implemented MLOps environment are the following:

Version control: For the same reasons that git has become a necessity in software development, subjecting ML models to version control, where each version carries metadata about the model’s inputs, outputs, validation metrics, etc., has become a necessity for any team developing Machine Learning applications. The ability to improve on previous iterations by retraining models on new or different data while maintaining traceability, avoiding duplication and redundancy, and retaining the possibility of rollback makes team development far more agile and collaborative. Additionally, tracking the hyperparameters of models across experiments achieves an often underestimated milestone: reproducibility.

Data management and integration: Data is the lifeblood of Machine Learning models. As such, procuring a representative dataset for a given problem is paramount to achieving a solid solution. While data scientists and researchers may be used to receiving pre-constructed, pre-labeled datasets, reality is seldom so forgiving: projects often require integrating several data sources, ingesting data from each through a robust backend, storing it in different schemas, and generating a final dataset containing the necessary features and labels. Being able to easily deploy and manage the pieces of such infrastructure is also a key element of any MLOps stack. Additionally, data governance, which ensures compliance with regulatory requirements, ethical standards, and data privacy laws, is essential to the sound development of ML in organizations[13]. These mechanisms help ensure data security, prevent bias, and enforce fairness and transparency in machine learning models.

Deployment and maintenance: While an entire article could (and perhaps will) be written on deployment and maintenance, covering scalability, performance, retraining strategy, customizability, and monitoring, suffice it to say that streamlining and partially automating these processes is key to maintaining productivity when a large number of models needs to be managed. Continuous integration, monitoring for performance and drift, retraining as needed, and different deployment schemas are all part of this functionality.

Lifecycle management: This term refers to the systematic process of managing a model throughout its entire lifecycle, from development to decommissioning or substitution. From the perspective of MLOps, this means being able to consult the stage of a model in its lifecycle, modify that stage, trace its performance, and seamlessly remove it from production when necessary.

Additionally, certain MLOps frameworks also include the tooling needed to build preprocessing pipelines, perform feature selection, and train and validate models before carrying out the subsequent steps.
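To make the version control and lifecycle management needs listed above more concrete, here is a minimal in-memory sketch of a model registry. It is only an illustration: `ModelRegistry`, `ModelVersion`, the `Stage` names, and the example model name and dataset identifiers are all hypothetical and do not reproduce the API of any existing MLOps product.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    """Lifecycle stages assumed for this example."""
    DEVELOPMENT = "development"
    PRODUCTION = "production"
    ARCHIVED = "archived"


@dataclass
class ModelVersion:
    version: int
    hyperparameters: dict
    metrics: dict
    data_ref: str  # identifier of the training dataset snapshot
    stage: Stage = Stage.DEVELOPMENT


@dataclass
class ModelRegistry:
    # Maps a model name to the ordered list of its registered versions.
    versions: dict = field(default_factory=dict)

    def register(self, name, hyperparameters, metrics, data_ref):
        """Store a new version together with the metadata needed to reproduce it."""
        history = self.versions.setdefault(name, [])
        entry = ModelVersion(len(history) + 1, hyperparameters, metrics, data_ref)
        history.append(entry)
        return entry

    def promote(self, name, version):
        """Move one version to production and archive the one it replaces."""
        for entry in self.versions[name]:
            if entry.stage is Stage.PRODUCTION:
                entry.stage = Stage.ARCHIVED
        self.versions[name][version - 1].stage = Stage.PRODUCTION

    def production(self, name):
        """Return the version currently in production, or None."""
        for entry in self.versions[name]:
            if entry.stage is Stage.PRODUCTION:
                return entry
        return None


registry = ModelRegistry()
registry.register("churn", {"max_depth": 4}, {"f1": 0.81}, "customers-2024-01")
registry.register("churn", {"max_depth": 6}, {"f1": 0.84}, "customers-2024-02")
registry.promote("churn", 2)
print(registry.production("churn").version)  # 2
registry.promote("churn", 1)  # rollback to the earlier version
print(registry.production("churn").version)  # 1
```

Because every version keeps its hyperparameters, metrics, and a reference to its training data, a rollback is just another promotion, and any past experiment can in principle be reproduced from the stored metadata.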

Figure 1.3: Stages of a ML model lifecycle.

Challenges and future work

Despite its growth, the discipline is still rather new and faces challenges for which generalized solutions (as opposed to case-based ones) have yet to be developed. Some of these issues are:

Data Drift and Model Degradation: One of the key challenges in MLOps is dealing with data drift and model degradation over time. As the distribution of real-world inputs changes, the performance of machine learning models may deteriorate, leading to lower accuracy and degraded predictions. This occurs because the corpus on which the original model was trained becomes increasingly less representative of the real world. Addressing such gradual decay requires a constant influx of real-world data with which the model can be retrained, along with close monitoring of the model on recent test sets. Even with such precautions, unforeseen events or outliers may affect the model’s performance, and post-processing of the metrics based on them may become necessary.

Workflow Bottlenecks: In many organizations, slow workflows between data science and DevOps teams hinder the scalability and efficiency of MLOps processes. While all models and functionalities need to be tracked, traceable, and tested, it is paramount that this does not hinder iteration more than it promotes it: candidate models (as opposed to testing models) need to be tracked, but deploying new versions should also be easy and painless.

Risk Management and Compliance: MLOps introduces new risks related to data privacy, security, and compliance. Organizations must ensure that their MLOps processes adhere to regulatory requirements and industry standards to mitigate the risk of fines, reputational damage, and financial loss. While these risks are real, certain MLOps-driven paradigm shifts in model training, such as Federated Learning[14], can become allies of data governance rather than hindrances.
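One common way to monitor the data drift described in the list above is to compare the distribution of a feature at training time with its distribution in recent production data. The sketch below does this with the Population Stability Index (PSI), which is only one of several possible drift measures and is not prescribed by the sources cited here. The function name `psi`, the bin count, and the synthetic Gaussian data are assumptions for illustration; the thresholds of 0.1 and 0.25 are a widely used rule of thumb rather than a formal standard.

```python
import math
import random


def psi(reference, current, bins=10):
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference sample must not be constant")
    width = (hi - lo) / bins

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            # Values outside the reference range are clamped into the edge bins.
            idx = int((x - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(sample), 1e-6) for c in counts]

    ref, cur = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


rng = random.Random(0)
train = [rng.gauss(0, 1) for _ in range(5000)]    # feature at training time
recent = [rng.gauss(0, 1) for _ in range(5000)]   # same distribution
shifted = [rng.gauss(1, 1) for _ in range(5000)]  # mean has drifted

print(psi(train, recent) < 0.1)     # no meaningful drift expected
print(psi(train, shifted) > 0.25)   # drift large enough to flag for retraining
```

A monitoring job could compute this value per feature on a schedule and trigger an alert or a retraining run when it crosses the chosen threshold.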

Industry impact

With the growth of MLOps towards the end of 2021, the market value of AI across companies is estimated to have more than doubled[15]. While this cannot be attributed solely to the adoption of MLOps practices, their impact is undeniable, and the growth of the MLOps market itself has also been remarkable[16].

Besides its impact on economic growth, MLOps also correlates with the rise of cloud services, where computing power can be rented along with tools that facilitate its use. Where Big Data already required highly available, failure-resistant data storage, the massive deployment of ML models requires additional hardware, fast data streaming capabilities, and tooling that enables all of the aforementioned capabilities, which cloud providers such as Google, AWS or Microsoft Azure have developed. In a way, cloud engineering has almost become synonymous with the AI and MLOps environment, and the popularity of the two terms shows correlated trend and seasonality[12].

Figure 1.4: Popularity of the terms MLOps and Cloud Engineering from 2021 to present day. Google Trends.

Conclusions

The field of analytics has been a quintessential part of business strategy for decades and, arguably, centuries. Its growth has culminated in the use of AI and ML for predictive and prescriptive processes, which has become a rapidly growing, thriving market in the past few years, with most organizations investing in its research and development. Despite this, the novelty and complexity of the field have left very few companies actually harnessing its results in production, mostly due to deployment, but also due to the lack of a streamlined development roadmap. The implementation of MLOps practices, as well as the organizational reshaping that it entails, has allowed for great growth in market value, besides automating parts of the workflow and redirecting work efforts to the development of models and infrastructure.

While the current consensus is that MLOps has yet to reach a stage of maturity at which productivity finally plateaus, the current state of the art is no less promising for it. As in all fields of engineering, streamlining a production roadmap for all models, with tools that make their design, implementation, and exploitation simpler, has increased, and is expected to keep increasing, the value that organizations can harness from data science and its adjacent disciplines. While tools to address the aforementioned challenges are still maturing, the landscape of AI in organizations is increasingly shaped by the processes that enable it, whose growth points to a bright future for the discipline.

 

Bibliography

[1] R. M. Devens, Cyclopaedia of commercial and business anecdotes. Detroit: Gale Research Co.
[2] T. Davenport, “Analytics 3.0,” Harvard business review, vol. 91, no. 12, p. 64, Dec. 2013. [Online]. Available: https://search.proquest.com/docview/1465269349.
[3] D. R. Cox, Principles of Statistical Inference, 1st ed. Cambridge: Cambridge University Press, Aug. 2006, ISBN: 9780521685672. [Online]. Available: http://dx.doi.org/10.1017/CBO9780511813559.

[4] R. Bean. “How big data and ai are driving business innovation in 2018.” (Feb. 2018), [Online]. Available: https://search.proquest.com/docview/1994096490.
[5] S. M. Chan-Olmsted, “A review of artificial intelligence adoptions in the media industry,” International Journal on Media Management, vol. 21, no. 3-4, pp. 193–215, 2019, doi: 10.1080/14241277.2019.1695619. [Online]. Available: https://doi.org/10.1080/14241277.2019.1695619.

[6] P. Gerbert, M. Reeves, S. Ransbotham, D. Kiron, and M. Spira. “Global competition of ai in business: How china differs.” (Jul. 2018), [Online]. Available: https://search.proquest.com/docview/2074592178.

[7] “Newvantage partners posts 2019 big data and ai executive survey,” Manufacturing Close-Up, Jan. 2019.
[8] T. Davenport and K. Malone, “Deployment as a critical business data science discipline,” Harvard Data Science Review, Feb. 2021. [Online]. Available: https://explore.openaire.eu/search/result?id=doi_________::8728dce30062792b105d97d53c30e14f.

[9] E. Griffith, The desperate hunt for the a.i. boom’s most indispensable prize, Aug. 2023.
[10] “The state of data science – stitch data.” (), [Online]. Available: https://www.stitchdata.com/resources/the-state-of-data-science/.
[11] M. Treveil, N. Omont, C. Stenac, et al., Introducing MLOps: How to Scale Machine Learning in the Enterprise. O’Reilly Media, Incorporated, 2020, ISBN: 9781492083290. [Online]. Available: https://books.google.es/books?id=fR64zQEACAAJ.

[12] trends.google.com. “Google trends.” (2024), [Online]. Available: http://trends.google.com/trends.

[13] M. Al-Ruithe, E. Benkhelifa, and K. Hameed, “A systematic literature review of data governance and cloud data governance,” Personal and Ubiquitous Computing, vol. 23, Nov. 2019. DOI: 10.1007/s00779-017-1104-3.

[14] C. Zhang, Y. Xie, H. Bai, B. Yu, W. Li, and Y. Gao, “A survey on federated learning,” Knowledge-Based Systems, vol. 216, p. 106775, 2021.
[15] Artificial intelligence (ai) market by component (hardware, software, and services), deployment (cloud and on-premise), application (virtual assistants/chatbots, forecasts modeling, text analytics, speech analytics, computer vision, predictive maintenance, and others), and end-user industry (bfsi, government, aerospace defense, automotive, healthcare, it telecom, manufacturing, education, retail e-commerce, energy utilities, media entertainment, and others): Global opportunity analysis and industry forecast, 2022–2030, Jan. 2023. [Online]. Available: https://www.nextmsc.com/report/artificial-intelligence-market.

[16] “Mlops market size, share covid-19 impact analysis, by deployment (cloud, onpremise, and hybrid), by enterprise type (smes and large enterprises), by end-user (it telecom, healthcare, bfsi, manufacturing, retail, and others), and regional forecast, 2023 – 2030.” (Feb. 2024), [Online]. Available: https://www.fortunebusinessinsights.com/mlops-market-108986.

