Do Data Scientists Like Being Inclined Towards DevOps?



Last updated 22/07/2021

If you were to sketch out a production machine learning pipeline, the start — designing and training models, and so on — would clearly belong to the data science function.

At some point, typically when it's time to take models to production, a pipeline shifts from data science work to infrastructure tasks. Intuitively, this is where the data science team hands things off to someone else, such as DevOps.

However, this isn't always the case. Increasingly, data scientists are being asked to handle deploying models to production as well.

According to Algorithmia, a majority of data scientists report spending over 25% of their time on model deployment alone. Anecdotally, you can verify this by looking at how many data scientist job postings list things like Kubernetes, Docker, and EC2 under "required experience."


Why data scientists shouldn’t have to handle model serving

The most straightforward answer here is that model serving is an infrastructure problem, not a data science problem. You can see this simply by comparing the typical stacks used for each.

There are, of course, some data scientists who enjoy DevOps and can work cross-functionally, but they are rare. In fact, we would argue the overlap between data science and DevOps is frequently overestimated.

To flip things around: would you expect a DevOps engineer to be able to design a new model architecture, or to have deep experience with hyperparameter tuning? There probably are DevOps engineers with those data science skills, and everything is learnable, but it would be odd to consider those responsibilities the domain of your DevOps team.

Data scientists, more than likely, didn't get into the field to worry about autoscaling or to write Kubernetes manifests. So why do organizations make them do it?


Companies are neglecting their infrastructure

Among many organizations, there’s a fundamental misunderstanding of how complex model serving is. The attitude is often “Just wrapping a model in Flask is good enough for now.”
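To make that attitude concrete, here is roughly what the "just wrap it in Flask" approach looks like. This is a sketch with a placeholder model object, not production code:

```python
# A minimal sketch of the "just wrap a model in Flask" approach.
# DummyModel is a hypothetical stand-in; a real deployment would
# load a trained artifact from disk instead.
from flask import Flask, jsonify, request

app = Flask(__name__)

class DummyModel:
    """Hypothetical stand-in for a trained model."""
    def predict(self, features):
        return [sum(row) for row in features]  # placeholder arithmetic

model = DummyModel()

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]
    return jsonify({"predictions": model.predict(features)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

This works for a demo, but it addresses none of the operational questions that follow.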

The reality is, serving models at any scale involves solving some infrastructure challenges. For example:

  • How do you update models in production automatically — without any downtime?
  • How do you efficiently autoscale a 5 GB model that runs on GPUs?
  • How do you monitor and debug production deployments?
  • How do you do all of this without running up a massive cloud bill?

Now, to be fair, ML infrastructure is a genuinely new field. Uber only unveiled Michelangelo, their cutting-edge internal ML platform, a few years ago. The playbook for ML infrastructure is, in many ways, still being written. Even so, there are plenty of examples of how an organization can separate the concerns of data science and DevOps without the engineering resources of an Uber.

How to separate data science and DevOps

Cortex was designed to separate data science from DevOps and to automate the infrastructure code data scientists were otherwise writing by hand. Since being open-sourced, it has been adopted by data science teams whose experiences have further shaped its approach.

Its maintainers conceptualize the handoffs between data science, DevOps, and product engineering with a simple, abstract architecture they call Model-API-Client:

  • Model. A trained model, with some kind of predict() function that engineers can use without needing data science expertise.
  • API. The infrastructure layer that takes a trained model and deploys it as a web service. Cortex was built to automate this layer.
  • Client. The actual application that interacts with the web service deployed in the API layer.

In the model phase, data scientists train and export a model. They also write a predict() function for generating and filtering predictions from the model.
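As a sketch, the data scientist's deliverable might look like the following. The model object, its scoring logic, and the confidence threshold are all placeholders, not anything specified in the article:

```python
# Sketch of a data scientist's deliverable: a predict() function that
# wraps a trained model with its post-processing. TrainedModel is a
# hypothetical stand-in for a real serialized artifact.

class TrainedModel:
    def predict_proba(self, rows):
        # Placeholder scoring logic in place of real inference.
        return [min(1.0, sum(row) / 10.0) for row in rows]

_model = TrainedModel()

def predict(rows, threshold=0.5):
    """Generate predictions and filter out low-confidence ones."""
    scores = _model.predict_proba(rows)
    return [{"score": s} for s in scores if s >= threshold]
```

The point of the wrapper is that engineers downstream only need to call `predict()`; they never touch the model internals.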

They then hand this model off to the API phase, at which point it is entirely the DevOps function’s responsibility. To the DevOps function, the model is just a Python function that needs to be turned into a microservice, containerized, and deployed.

Once the model-microservice is live, product engineers query it like any other API. To them, the model is just another web service.
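From the product side, calling the model is ordinary HTTP. A sketch using only the standard library — the endpoint URL and the request/response shape are hypothetical, set by whatever the API layer exposes:

```python
# Sketch of the client side: to a product engineer, the model is just
# another web service. The URL below is a placeholder.
import json
from urllib import request as urlrequest

API_URL = "http://models.example.internal/predict"  # hypothetical endpoint

def build_request(rows):
    """Build the JSON POST request the model API expects."""
    body = json.dumps({"features": rows}).encode("utf-8")
    return urlrequest.Request(
        API_URL, data=body, headers={"Content-Type": "application/json"}
    )

def get_predictions(rows):
    """Call the deployed model service and return its predictions."""
    with urlrequest.urlopen(build_request(rows)) as resp:
        return json.loads(resp.read())["predictions"]
```

Nothing in this code hints that a model sits behind the endpoint — which is exactly the separation the architecture is after.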

The Model-API-Client architecture is not the only way to separate the concerns of data science and engineering, but it serves to illustrate that you can draw a line between data science and DevOps without introducing extravagant overhead or building expensive end-to-end platforms.

By just establishing clear handoff points between functions in your ML pipeline, you can free data scientists up to do what they’re best at — data science.

About Author

NovelVista Learning Solutions is a professionally managed training organization with specialization in certification courses. The core management team consists of highly qualified professionals with vast industry experience. NovelVista is an Accredited Training Organization (ATO) to conduct all levels of ITIL Courses. We also conduct training on DevOps, AWS Solution Architect associate, Prince2, MSP, CSM, Cloud Computing, Apache Hadoop, Six Sigma, ISO 20000/27000 & Agile Methodologies.





