Ankush Gola’s Post

Ankush Gola, Co-Founder, LangChain

Introducing Pairwise Evaluations in LangSmith

Evaluating LLM application outputs in isolation can be challenging. It's often more effective to rank outputs preferentially, a method that benefits not only human evaluators but also LLM-as-a-judge systems. Example: instead of grading an LLM output's "vagueness" on a 0-1 scale, rank two outputs by which one is more vague. With this in mind, I'm pretty excited about this powerful new feature in LangSmith: native support for running and visualizing pairwise experiments! Docs here: https://lnkd.in/dGHgc9MZ
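
To make the vagueness example concrete, here is a minimal sketch of such a pairwise judge in Python. The prompt wording, the judge_more_vague helper, and the model choice are illustrative assumptions using the OpenAI SDK, not LangSmith's built-in evaluator:

```python
# A minimal pairwise LLM-as-judge sketch: instead of scoring "vagueness"
# on a 0-1 scale, ask the judge which of two outputs is more vague.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_PROMPT = """You are comparing two answers to the same question.
Question: {question}

Answer A: {answer_a}

Answer B: {answer_b}

Which answer is more vague? Respond with exactly "A" or "B"."""

def judge_more_vague(question: str, answer_a: str, answer_b: str) -> str:
    """Return "A" or "B" for whichever answer the judge finds more vague."""
    response = client.chat.completions.create(
        model="gpt-4o",  # any capable judge model works here
        temperature=0,   # deterministic judging
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            question=question, answer_a=answer_a, answer_b=answer_b)}],
    )
    return response.choices[0].message.content.strip()
```

A relative judgment like this tends to be easier for a judge model to make consistently than an absolute score on an unanchored scale.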

LangChain

🍐 Pairwise Evaluation in LangSmith

For LLM use cases like text generation or chat, where there may not be a single "correct" answer, picking a preferred response with pairwise evaluation can be an effective approach. LangSmith's pairwise evaluation lets you (1) define a custom pairwise LLM-as-judge evaluator with any desired criteria and (2) compare two LLM generations using this evaluator (see the sketch after the links below). Dive into our latest blog & video tutorial to learn about pairwise evaluation and walk through an example of how to use custom pairwise evaluators in LangSmith.

✍ Read our blog post: https://lnkd.in/gdCZHxQp
📽 Watch the video: https://lnkd.in/gAz_dKZg
📄 Check out the docs: https://lnkd.in/gv4z2zEZ
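
For a sense of how steps (1) and (2) fit together, here is a sketch against the langsmith SDK. The evaluator contract (a list of runs plus an example in, a dict of per-run scores out) and the evaluate_comparative entry point follow the linked docs, but the experiment names, the conciseness_preference criterion, and the length-based stand-in judging logic are assumptions, and exact signatures may vary by SDK version:

```python
# Sketch: define a custom pairwise evaluator, then compare two existing
# experiments with it. Results appear in LangSmith's pairwise comparison view.
from langsmith.evaluation import evaluate_comparative
from langsmith.schemas import Example, Run

def conciseness_preference(runs: list[Run], example: Example) -> dict:
    """Score the preferred of two candidate generations 1, the other 0."""
    outputs = [str((run.outputs or {}).get("output", "")) for run in runs]
    # Stand-in judging logic: prefer the shorter answer. In practice, this
    # is where an LLM-as-judge call with your custom criteria would go.
    winner = 0 if len(outputs[0]) <= len(outputs[1]) else 1
    scores = {run.id: int(i == winner) for i, run in enumerate(runs)}
    return {"key": "conciseness_preference", "scores": scores}

# Compare two previously run experiments by name (placeholders here).
evaluate_comparative(
    ["my-experiment-baseline", "my-experiment-candidate"],
    evaluators=[conciseness_preference],
)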

