AMA With Aparna Dhinakaran: ML Observability

Oct 11, 2022

We regularly invite ML practitioners and industry leaders to share their experiences with our Community. Want to ask our next guest a question? Join the BentoML Community Slack


We recently invited Aparna Dhinakaran, Co-Founder and Chief Product Officer at Arize AI, to speak about her experience building an ML observability platform, the evolution of MLOps, and more! 

Key Takeaways: 

• ML observability helps teams quickly detect issues, troubleshoot why they happened, and improve model performance.

• Observability and monitoring are related but different. Monitoring indicates when issues arise, while observability helps you understand why they happened and how to fix them.

• ML requires different tooling than software infrastructure. While they follow similar principles, the differences between the two toolchains necessitate products with their own core competencies.


Q: What is ML observability and why do I need it?

A: ML observability is the practice of gaining a deep understanding of your model’s data and performance across its lifecycle. Observability doesn’t just stop at surfacing a red or green light; it enables ML practitioners to root cause and explain why a model is behaving a certain way in order to improve it.

Teams need it because model issues happen all the time in the real world! Just like we monitor software applications, ML models need to be monitored, and when things inevitably go wrong, teams need tools to troubleshoot them!

Q: For MLOps, what is the biggest difference between now and your days back at Uber?

A: I think there are a lot of cool things happening in the MLOps space!

For one, there are a ton more Central ML teams across enterprises. Here’s a great piece from our customer success lead who works with many centralized ML teams: https://towardsdatascience.com/the-death-of-central-ml-is-greatly-exaggerated-1f1626b3a8d4

How they operate, what a central ML platform looks like - all of this is still growing a TON.

There are also a lot more tools than back in 2016. A lot of the infra needed to do ML back then had to be built in house. I don’t think that’s the case these days.

There is also a wider range of models deployed - we see more NLP and CV use cases with the advent of deep learning. Tools to build, deploy, and troubleshoot these types of models and data are in greater demand because of this growing use.

Q: “Deploying frequently removes the need to do model monitoring.” What are your thoughts on this statement?😉😁

A: Love the question with the hot take 🌶😉. 

I disagree MASSIVELY with that statement - it’s a misconception. We actually work with models that are retrained and deployed daily and they STILL have issues that are caught 😄. 

Even if models are deployed regularly, they can still have data quality issues in the feature pipelines that feed the model and ultimately cause worse model outcomes.
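
To make that concrete, here is a minimal sketch of the kind of feature-pipeline data quality check she is describing. The column names, expected ranges, and thresholds are purely illustrative assumptions, not anything specific to Arize:

```python
# A minimal sketch of a feature-pipeline data quality check, assuming the
# features arrive as a pandas DataFrame. Column names, ranges, and thresholds
# below are hypothetical and only for illustration.
import pandas as pd

def check_feature_quality(features: pd.DataFrame, max_null_rate: float = 0.01) -> list:
    """Return a list of human-readable data quality issues."""
    issues = []

    # 1. Missing values: a spike in nulls often means an upstream join broke.
    null_rates = features.isna().mean()
    for col, rate in null_rates.items():
        if rate > max_null_rate:
            issues.append(f"{col}: null rate {rate:.1%} exceeds {max_null_rate:.1%}")

    # 2. Range checks: values far outside what was seen in training are suspicious.
    expected_ranges = {"age": (0, 120), "account_balance": (0, 1e7)}  # hypothetical
    for col, (lo, hi) in expected_ranges.items():
        if col in features and not features[col].between(lo, hi).all():
            issues.append(f"{col}: values outside expected range [{lo}, {hi}]")

    return issues
```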

Q: Where do you think the MLOps landscape is heading? Do you think there's going to be consolidation around end-to-end platforms? Will observability/monitoring continue to be separate tools in the long run?

A: I actually think more and more of the best-of-breed platforms are going to gain larger traction!

In the software toolchain, best-of-breed tools won. GitHub won the software version control market, and there are strong contenders like WandB as a parallel in the MLOps space. Datadog really did win the infra observability category, and that is where Arize sits as a parallel in the MLOps space - ML observability.

I’ve also been noticing that more of the end-to-end platforms are focused on less technical users, while still empowering these citizen data scientists to use ML. More of the deeply technical ML practitioners will prefer solutions that can plug and play into their stack with deep abilities to configure.

Q: What is the difference between observability and monitoring?

A: I like to think of monitoring as pointing out an issue, but not giving you enough to really root cause it. Think of it as a red/green light - it tells you something is going wrong, but doesn’t tell you how to go fix it.

Observability is the ability to trace the issue back to its underlying data or model cause!

This is a piece I wrote a while ago that goes deeper on this: https://aparnadhinak.medium.com/beyond-monitoring-the-rise-of-observability-c53bdc1d2e0b

Q: What are the difficulties of monitoring covariate drift in online serving scenarios?

A: We work a lot with cases where the input features go through some level of transformation before they are ingested into the model, e.g. one-hot encoding. Making sure that the data you see in production matches the offline data schema and distribution requires a lot of data wrangling in online serving.
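
As an illustration of what a covariate drift check on such a feature can look like, here is a minimal sketch using population stability index (PSI) to compare an offline (training) sample against what the model sees online. The bin count and the 0.2 rule of thumb are common but arbitrary choices, not something prescribed in the discussion:

```python
# A minimal sketch of a covariate drift check for one numeric feature, comparing
# the offline (training) distribution against the online (production) one.
import numpy as np

def population_stability_index(offline: np.ndarray, online: np.ndarray, bins: int = 10) -> float:
    """PSI between two samples of the same feature; larger means more drift."""
    # Bin edges come from the offline (reference) distribution.
    edges = np.histogram_bin_edges(offline, bins=bins)
    offline_pct = np.histogram(offline, bins=edges)[0] / len(offline)
    online_pct = np.histogram(online, bins=edges)[0] / len(online)

    # Avoid log(0) / division by zero for empty bins.
    offline_pct = np.clip(offline_pct, 1e-6, None)
    online_pct = np.clip(online_pct, 1e-6, None)
    return float(np.sum((online_pct - offline_pct) * np.log(online_pct / offline_pct)))

# A common rule of thumb: PSI above 0.2 is worth investigating.
```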

One other unique case (not drift) we hear about with input features is measuring data consistency between offline and online features. This is especially common with folks who have feature stores. We did a piece with Feast on how to monitor for data consistency to avoid training-serving skew: https://arize.com/blog/feast-and-arize-supercharge-feature-management-and-model-monitoring-for-mlops/
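
Here is a minimal, hypothetical sketch of that kind of offline/online consistency check: join the feature values logged at serving time back to the offline values by a shared prediction ID and report the per-feature mismatch rate. The key, column names, and tolerance are assumptions for illustration:

```python
# A minimal sketch of an offline/online feature consistency check. Assumes
# numeric features and a shared "prediction_id" key; both are hypothetical.
import pandas as pd

def feature_consistency(offline: pd.DataFrame, online: pd.DataFrame,
                        key: str = "prediction_id", atol: float = 1e-6) -> pd.Series:
    """Fraction of rows where each shared feature disagrees between environments."""
    joined = offline.merge(online, on=key, suffixes=("_offline", "_online"))
    shared = [c for c in offline.columns if c != key and c in online.columns]
    mismatch = {}
    for col in shared:
        diff = (joined[f"{col}_offline"] - joined[f"{col}_online"]).abs() > atol
        mismatch[col] = diff.mean()
    # Features with the worst training-serving skew come first.
    return pd.Series(mismatch).sort_values(ascending=False)
```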

Q: In your opinion, for those implementing an in-house model monitoring solution, what are some of the biggest challenges and common mistakes?

A: I think the most common issue we see is teams not realizing how much work they are biting off when deciding to build a model monitoring solution in house. How to handle various model versions and training/test/prod environments, embeddings, ranking model metrics, the UI visualization to debug model metrics, looking at various segments and not just aggregate metrics... it’s a LOT.

We have seen teams start internally first and then realize how much time they are investing when it isn’t their company/team objective to have the best ML observability platform... it’s to build the best models. And it is not just the initial build, but the continual maintenance.

If you are still deciding to build, I’d check out this checklist of the fundamentals to build: https://arize.com/resource/machine-learning-observability-checklist/

Q: What are key differences between monitoring/observing ML models vs regular web services/apis?

A: There are a lot of differences. Fundamentally, there are different types of observability:

• Infra Observability

• Data Observability

• ML Observability

Infra observability is really focused on application timing metrics, and tracing typically involves digging through various subsystems to find what is slowing the application down.

In ML, observability is focused on model performance metrics, and troubleshooting involves digging back into the data to fix model issues.

Q: How does monitoring change for ML models when you have to use production data to get accurate results? How does that change how we should think about environment-based testing (i.e. having staging, dev, UAT environments first)?

A: I think this is actually really similar to regular infra observability - you have to get signals from when the model or application is deployed in order to monitor it. In ML, we do spend so much time looking at model performance in training, but it’s important to remember that models degrade over time in production and performance isn’t static.

In the regular DevOps world, we would never just run unit tests and then not monitor the system in the real world. It’s time for ML to grow up 🌶.

Q: Between real-time and offline monitoring, which one is more challenging to monitor? What are the challenges?

A: Both have their own challenges, but in general I’d say real-time. With real-time, getting back ground truth and then connecting it to the predictions to be able to calculate model performance metrics can be tough.

This still has to be done for offline monitoring, but users’ expectations usually match the delay.
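
A minimal sketch of that ground-truth join, assuming predictions are logged with an ID and labels arrive later keyed on the same ID (the column names here are hypothetical):

```python
# Log predictions with an ID; when delayed ground truth arrives, join on that
# ID to compute an online performance metric. Data below is made up.
import pandas as pd

predictions = pd.DataFrame({
    "prediction_id": ["a1", "a2", "a3"],
    "predicted_label": [1, 0, 1],
})

# Ground truth often lands hours or days later, and only for some predictions.
ground_truth = pd.DataFrame({
    "prediction_id": ["a1", "a3"],
    "actual_label": [1, 0],
})

joined = predictions.merge(ground_truth, on="prediction_id", how="inner")
accuracy = (joined["predicted_label"] == joined["actual_label"]).mean()
print(f"accuracy on {len(joined)} labeled predictions: {accuracy:.2f}")
```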

Q: Thank you for the explanation of observability vs monitoring. With the tools we have now, how well can we explain a deep learning model? What are the models that are easy to explain?

A: There a ton of explainability techniques out there that work for deep learning models - check out the SHAP library: https://github.com/slundberg/shap

I think there is still a gap between these deep-tech explainability techniques and explaining the results to less technical users.

Q: Was there a particular gap in the market that you thought needed solving when you founded Arize?

A: Yes! I wanted to build a product that solved a pain I felt myself. I used to be an ML engineer at Uber and didn’t have any tools to get visibility into my models. This is what compelled me to found Arize.

Q: What do you think is the future of AI/ML engineering? Will companies still hire engineers, or will tools be so mature that everyone just needs to subscribe to these products?

A: There will absolutely always be more need for ML engineers. All the software tools built in the last few decades allowed more companies to adopt technology and therefore increased demand for software engineers. I see the same trend happening in ML - more infrastructure for ML will allow more enterprises and companies to use ML and hire more ML practitioners to build better and better models 🙂.

* The discussion was lightly edited for better readability.