Ankush Gola’s Post

Ankush Gola, Co-Founder, LangChain

Introducing Pairwise Evaluations in LangSmith

Evaluating LLM application outputs in isolation can be challenging. It's often more effective to rank outputs preferentially, a method that benefits not only human evaluators but also LLM-as-a-judge systems. Example: instead of grading an LLM output's "vagueness" on a 0-1 scale, rank two outputs by which one is more vague. With this in mind, I'm pretty excited about this powerful new feature in LangSmith: native support for running and visualizing pairwise experiments! Docs here: https://lnkd.in/dGHgc9MZ
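
To make the vagueness example concrete, here is a minimal sketch of such a pairwise judge in Python. The prompt wording, the judge_more_vague helper, and the model choice are illustrative assumptions using the OpenAI SDK, not LangSmith's built-in evaluator:

```python
# A minimal pairwise LLM-as-judge sketch: instead of scoring "vagueness"
# on a 0-1 scale, ask the judge which of two outputs is more vague.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_PROMPT = """You are comparing two answers to the same question.
Question: {question}

Answer A: {answer_a}

Answer B: {answer_b}

Which answer is more vague? Respond with exactly "A" or "B"."""

def judge_more_vague(question: str, answer_a: str, answer_b: str) -> str:
    """Return "A" or "B" for whichever answer the judge finds more vague."""
    response = client.chat.completions.create(
        model="gpt-4o",  # any capable judge model works here
        temperature=0,   # deterministic judging
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            question=question, answer_a=answer_a, answer_b=answer_b)}],
    )
    return response.choices[0].message.content.strip()
```

A relative judgment like this tends to be easier for a judge model to make consistently than an absolute score on an unanchored scale.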

LangChain

🍐 Pairwise Evaluation in LangSmith

For LLM use cases like text generation or chat, where there may not be a single "correct" answer, picking a preferred response with pairwise evaluation can be an effective approach. LangSmith's pairwise evaluation lets you (1) define a custom pairwise LLM-as-judge evaluator with any desired criteria and (2) compare two LLM generations using this evaluator (see the sketch after the links below). Dive into our latest blog & video tutorial to learn about pairwise evaluation and walk through an example of how to use custom pairwise evaluators in LangSmith.

✍ Read our blog post: https://lnkd.in/gdCZHxQp
📽 Watch the video: https://lnkd.in/gAz_dKZg
📄 Check out the docs: https://lnkd.in/gv4z2zEZ
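
For a sense of how steps (1) and (2) fit together, here is a sketch against the langsmith SDK. The evaluator contract (a list of runs plus an example in, a dict of per-run scores out) and the evaluate_comparative entry point follow the linked docs, but the experiment names, the conciseness_preference criterion, and the length-based stand-in judging logic are assumptions, and exact signatures may vary by SDK version:

```python
# Sketch: define a custom pairwise evaluator, then compare two existing
# experiments with it. Results appear in LangSmith's pairwise comparison view.
from langsmith.evaluation import evaluate_comparative
from langsmith.schemas import Example, Run

def conciseness_preference(runs: list[Run], example: Example) -> dict:
    """Score the preferred of two candidate generations 1, the other 0."""
    outputs = [str((run.outputs or {}).get("output", "")) for run in runs]
    # Stand-in judging logic: prefer the shorter answer. In practice, this
    # is where an LLM-as-judge call with your custom criteria would go.
    winner = 0 if len(outputs[0]) <= len(outputs[1]) else 1
    scores = {run.id: int(i == winner) for i, run in enumerate(runs)}
    return {"key": "conciseness_preference", "scores": scores}

# Compare two previously run experiments by name (placeholders here).
evaluate_comparative(
    ["my-experiment-baseline", "my-experiment-candidate"],
    evaluators=[conciseness_preference],
)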

