Daniel Kim

Daniel Kim is a Senior Engineering Manager for New Relic's Developer Relations team, where he helps developers get better visibility into their cloud-native Kubernetes environments using OpenTelemetry, Prometheus, and other open source technologies. A resident of San Francisco, he enjoys long walks with his dog Gizmo and a fancy latte in hand.

Debugging LLMs in prod with OpenTelemetry
Daniel Kim

Large Language Models (LLMs), the underlying technology powering AI applications, are black boxes without predictable outputs. Changing a single word in a prompt can produce a completely different answer. Engineers running LLMs in production have to be prepared for the unpredictable – users will submit prompts that break the system, a simple PR to fix one issue will lead to four unforeseen issues, and latency can quickly get out of hand. However, abnormal behavior in production isn't specific to LLMs – it's a reality for most modern software. Observability allows engineers to analyze the inputs and outputs of complex software, even black boxes like LLMs, providing the multiple signals needed to troubleshoot in production.

In this talk, learn how you can leverage OpenTelemetry's Python instrumentation to easily monitor traces, metrics, and logs from your AI applications, whether you are using a framework like LangChain or a foundation model API like AWS Bedrock. Then we will walk through how to use this data to debug your AI stack and improve its performance and cost efficiency.