Harrison Chase’s Post

Co-Founder and CEO at LangChain

♊️ Pairwise comparisons - our newest evaluation feature that I'm super excited to unveil

This is NOT the same as typical comparison views

✨ This is native pairwise comparison functionality - brand new, pretty unique, and pretty powerful

❓ How is it different and why am I so excited about it?

1⃣ The previous way we had for comparing runs would evaluate each run individually, and then let you view the results side by side. For example, if you were evaluating summarization, you'd create an evaluator to score a summary on a scale of 1-10, and then compare the results

2⃣ This is fine/best for a lot of use cases. That's why we implemented it first, and why most (all?) other eval tools focus on this

3⃣ However, it does have some downsides. Oftentimes it's hard to come up with a prompt to perfectly score summarization on a scale of 1-10. This task is hard even for humans to do!

4⃣ It's often easier to not score an LLM generation in a vacuum, but rather compare it to another generation and say whether it is better or worse.

⭐️ There's a reason the most trusted LLM eval (the LMSYS Chatbot Arena) does this type of "pairwise comparison" - it's easier and more intuitive to do ⭐️

5⃣ We've taken this idea and added native support for "pairwise comparisons" in LangSmith. Specify two runs, as well as a comparison metric. We show how to do this using LLM-as-a-judge (with prompts from the LMSYS Chatbot Arena)

Check out the blog and Lance Martin's video in the tweet below for more information and a great walkthrough of how to use it

LangChain

253,314 followers

🍐 Pairwise Evaluation in LangSmith

For LLM use cases like text generation or chat (where there may not be a single "correct" answer), picking a preferred response with pairwise evaluation can be an effective approach.

LangSmith’s pairwise evaluation lets you (1) define a custom pairwise LLM-as-judge evaluator with any desired criteria and (2) compare two LLM generations using this evaluator.

Dive into our latest blog & video tutorial to learn about pairwise evaluation and walk through an example of how to use custom pairwise evaluators in LangSmith.

✍ Read our blog post: https://lnkd.in/gdCZHxQp
📽 Watch the video: https://lnkd.in/gAz_dKZg
📄 Check out the docs: https://lnkd.in/gv4z2zEZ
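To make the idea concrete, here is a minimal sketch of a pairwise comparison in plain Python. It is not the LangSmith API: `toy_judge` and `pairwise_preference` are hypothetical names, and the judge is a stand-in rule (it simply prefers the shorter answer) where a real setup would call an LLM with a comparison prompt. The one design choice it illustrates is asking the judge twice with the two answers swapped, a common way to reduce position bias in LLM-as-judge setups.

```python
from typing import Callable, Dict

# A judge receives (prompt, answer_a, answer_b) and returns "A", "B", or "tie".
Judge = Callable[[str, str, str], str]


def toy_judge(prompt: str, answer_a: str, answer_b: str) -> str:
    """Hypothetical stand-in for an LLM judge: prefers the shorter answer."""
    if len(answer_a) == len(answer_b):
        return "tie"
    return "A" if len(answer_a) < len(answer_b) else "B"


def pairwise_preference(
    judge: Judge, prompt: str, first: str, second: str
) -> Dict[str, float]:
    """Compare two generations; return a preference score in [0, 1] for each.

    The judge is asked twice, once per ordering, so that a judge which
    favors whichever answer it sees first cannot decide the result alone.
    """
    scores = {"first": 0.0, "second": 0.0}
    # Each entry maps the judge's "A"/"B" labels back to our two generations.
    orderings = [
        (first, second, {"A": "first", "B": "second"}),
        (second, first, {"A": "second", "B": "first"}),
    ]
    for answer_a, answer_b, label_to_key in orderings:
        verdict = judge(prompt, answer_a, answer_b)
        if verdict == "tie":
            scores["first"] += 0.5
            scores["second"] += 0.5
        else:
            scores[label_to_key[verdict]] += 1.0
    # Average over the two orderings.
    return {key: value / len(orderings) for key, value in scores.items()}


if __name__ == "__main__":
    result = pairwise_preference(
        toy_judge,
        "Summarize the article in one sentence.",
        "A short summary.",
        "A considerably longer summary that rambles on.",
    )
    print(result)
```

Unlike a 1-10 score assigned to each output in isolation, the result here is only meaningful relative to the other generation, which is the trade-off the post describes.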

cc: Liz Darnell. Harrison is a good follow for intuition on LLM application development. If you are ever in the SF Bay Area and he is giving a talk, I recommend going.

Roland Siebelink ⦨

I help speed up startup results | Founder & CEO, Midstage Institute | Author, "Scaling Silicon Valley Style" | Forbes Coaches Council Member 🌈

2mo

A great example of how LangChain stays at the trailblazing front!

Beach Break Apartments Santa Teresa, Costa Rica

Beach Break, Modern Apartments in Santa Teresa, Costa Rica

2mo

impressive!
