Kale
Kale is an open-source observability and autoscaling platform built to make more intelligent scaling decisions for machine learning workloads on Kubernetes.
The Problem
Out of the box, the Kubernetes Horizontal Pod Autoscaler (HPA) scales on CPU and memory usage; scaling on any other signal requires setting up a custom or external metrics pipeline. This poses a challenge for machine learning workloads, which often need to be monitored and scaled on network and GPU metrics as well. Without those signals, ML jobs running on Kubernetes can end up with suboptimal resource allocation and performance.
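To make the scaling decision concrete, here is a minimal sketch of the replica formula documented for the HPA, applied to a GPU utilization reading. The function name, the sample numbers, and the choice of GPU utilization as the metric are illustrative assumptions; this is not Kale's implementation.

```python
import math


def desired_replicas(current_replicas: int,
                     current_value: float,
                     target_value: float,
                     tolerance: float = 0.1) -> int:
    """Apply the HPA ratio formula:
    desired = ceil(current_replicas * current_value / target_value).

    `tolerance` mirrors the HPA's default 10% dead band: when the ratio
    is that close to 1.0, the replica count is left unchanged.
    """
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Hypothetical reading: 4 pods averaging 90% GPU utilization, target 60%.
print(desired_replicas(4, 90.0, 60.0))
```

The same arithmetic works for any metric the autoscaler can see, which is why getting GPU and network metrics into the pipeline matters.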
The Solution - Kale
Kale is a monitoring and autoscaling tool for machine learning workloads on Kubernetes, designed to provide more intelligent scaling decisions.
📝 Sign up for Kale.
📈 Enter your Prometheus server URL and the name of the pod you would like to monitor.
📸 Take a snapshot of your pod metrics and view your saved snapshots in your "History" tab. Click on a saved snapshot to open up its metrics.
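For readers curious what a pod metrics snapshot might involve, the sketch below queries the Prometheus HTTP API (`/api/v1/query`) for one pod. It is an illustration only, not Kale's code: the function names are hypothetical, and the default metric `container_cpu_usage_seconds_total` assumes your Prometheus server scrapes cAdvisor container metrics with a `pod` label.

```python
import json
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(server_url: str, pod_name: str,
                    metric: str = "container_cpu_usage_seconds_total") -> str:
    """Build an instant-query URL for a pod's 5-minute rate of `metric`."""
    promql = f'sum(rate({metric}{{pod="{pod_name}"}}[5m]))'
    return f"{server_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_snapshot(payload: dict) -> List[float]:
    """Extract sample values from a Prometheus instant-query response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload.get('error')}")
    # Each result carries "value": [unix_timestamp, "<value as string>"].
    return [float(sample["value"][1]) for sample in payload["data"]["result"]]


def take_snapshot(server_url: str, pod_name: str) -> List[float]:
    """Fetch and parse one snapshot (requires a reachable Prometheus server)."""
    with urlopen(build_query_url(server_url, pod_name), timeout=10) as resp:
        return parse_snapshot(json.load(resp))
```

Calling `take_snapshot("http://localhost:9090", "trainer-0")` would return the current values for that pod; both the URL and the pod name here are placeholders.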