How do you know your LLM app is truly working – not just responding?
In this session, you’ll learn how to evaluate your Prompt Flows to move beyond intuition and into measurable performance.
We’ll cover:
– the built-in evaluation pipeline and how it works under the hood,
– available metrics such as Groundedness, Relevance, Fluency, and how to choose the right ones for your use case,
– building and customizing your evaluation configuration,
– interpreting evaluation results to diagnose issues and drive improvements,
– and common mistakes to avoid when integrating evaluation into your workflow.
Whether you’re fine-tuning prompts or managing full-blown LLM workflows, this session will equip you with practical techniques to ensure your solutions are accurate, consistent, and reliable.
Benefits of attending the webinar:
– Understand the value of evaluation in LLM workflows and why metrics matter beyond intuition.
– Learn to apply built-in metrics to assess model output quality.
– Master the setup of evaluation pipelines using Prompt Flow’s configuration tools.
– Gain confidence in interpreting results to guide prompt and flow improvements.
– Avoid common mistakes and adopt proven practices for integrating evaluation into your development cycle.
Is there a demo?
Yes – there will be one comprehensive demo during the webinar. This single walkthrough will cover all key aspects of the evaluation process in Prompt Flow, including:
– How to configure an evaluation
– How to select appropriate metrics for your use case
– How to interpret evaluation results and use them to improve prompt flow quality
The demo is designed to provide a full, end-to-end view of how evaluation fits into real-world Prompt Flow scenarios.
Experience level: 300
This session is intended for participants with hands-on experience building LLM flows using Prompt Flow or similar tools. Attendees should be comfortable with LLM concepts and ready to apply evaluation strategies to improve the quality and reliability of their AI workflows.