Daniel Tunkelang
Mountain View, California, United States
38K followers
500+ connections
Activity
-
Honored to participate in Nicolay Christopher Gerold's star-studded podcast series on How AI Is Built. Nicolay and I had a great conversation about…
Shared by Daniel Tunkelang
-
We just started posting our first Call for Participation for Search Solutions 2024. So if you have not received it yet via your favourite mailing…
Liked by Daniel Tunkelang
-
Talking about Learning To Rank (LTR) models, Daniel Tunkelang just wrote a great post about relevance judgements in Search. "The cost of training a…
Liked by Daniel Tunkelang
Publications
-
Semantic Equivalence of e-Commerce Queries
KDD 2023 Workshop on e-Commerce and NLP (ECNLP)
Search query variation poses a challenge in e-commerce search, as equivalent search intents can be expressed through different queries with surface-level differences. This paper introduces a framework to recognize and leverage query equivalence to enhance searcher and business outcomes. The proposed approach addresses three key problems: mapping queries to vector representations of search intent, identifying nearest neighbor queries expressing equivalent or similar intent, and optimizing for user or business objectives. The framework utilizes both surface similarity and behavioral similarity to determine query equivalence. Surface similarity involves canonicalizing queries based on word inflection, word order, compounding, and noise words. Behavioral similarity leverages historical search behavior to generate vector representations of query intent. An offline process is used to train a sentence similarity model, while an online nearest neighbor approach supports processing of unseen queries. Experimental evaluations demonstrate the effectiveness of the proposed approach, outperforming popular sentence transformer models and achieving a Pearson correlation of 0.85 for query similarity. The results highlight the potential of leveraging historical behavior data and training models to recognize and utilize query equivalence in e-commerce search, leading to improved user experiences and business outcomes. Further advancements and benchmark datasets are encouraged to facilitate the development of solutions for this critical problem in the e-commerce domain.
-
ORDSIM: Ordinal Regression for E-Commerce Query Similarity Prediction
Proceedings of the International Workshop on Interactive and Scalable Information Retrieval methods for eCommerce (ISIR-eCom) 2022
The query similarity prediction task is generally solved by regression-based models with squared loss. Such a model is agnostic of absolute similarity values and penalizes the regression error at the same scale across all ranges of similarity values. However, to boost an e-commerce platform's monetization, it is important to predict high-level similarity more accurately than low-level similarity, as highly similar queries retrieve items according to user intents, whereas moderately similar queries retrieve related items, which may not lead to a purchase. Regression models fail to customize their loss function to concentrate around the high-similarity band, resulting in poor performance in the query similarity prediction task. We address the above challenge by considering query similarity prediction as an ordinal regression problem, and thereby propose a model, ORDSIM (ORDinal Regression for SIMilarity Prediction). ORDSIM exploits variable-width buckets to model ordinal loss, which penalizes errors in high-level similarity harshly and thus enables the regression model to obtain better prediction results for high similarity values. We evaluate ORDSIM on a dataset of over 10 million e-commerce queries from the eBay platform and show that ORDSIM achieves substantially smaller prediction error than the competing regression methods on this dataset.
-
MMM, Search!
Presented to Wikimedia Foundation
An opinionated talk about search metrics, models, and methods. Presented to the Wikimedia Foundation on April 27, 2020 at the invitation of Wikimedia CTO Grant Ingersoll.
-
Doing Data Science Right - Your Most Common Questions Answered
First Round Review
In this article, we've summarized the advice we give to founders who are interested in building data science teams. We explain why data science is so important for many startups, when companies should begin investing in it, where to put data science in their organization and how to build a culture where data science thrives.
-
The Role of Network Distance in LinkedIn People Search
37th Annual International ACM SIGIR Conference (SIGIR 2014)
-
Web Science: How is it different?
ACM Web Science 2014 Conference (WebSci 2014)
Keynote Address at ACM Web Science 2014 Conference (WebSci 2014)
-
Symposium on Human–Computer Information Retrieval
Journal of Big Data
-
Find and be Found: Information Retrieval at LinkedIn
36th Annual International ACM SIGIR Conference (SIGIR 2013)
Invited talk at the 36th Annual International ACM SIGIR Conference (SIGIR 2013)
-
Introduction to Special Issue on Human-Computer Information Retrieval
Journal of Information Processing & Management
-
Content, Connections, and Context
6th ACM International Conference on Recommender Systems (RecSys 2012)
Keynote at Workshop on Recommender Systems and the Social Web (RSWeb 2012)
-
Data By The People, For The People
21st ACM International Conference on Information and Knowledge Management (CIKM 2012)
Invited Talk at the 21st ACM International Conference on Information and Knowledge Management (CIKM 2012)
-
Recommendations as a Conversation with the User
5th ACM International Conference on Recommender Systems (RecSys 2011)
Tutorial exploring the role of recommendations as part of a conversation between the user and an information-seeking system.
-
Social Navigation: A Position Paper
HCIR
In this position paper, we propose social navigation as a paradigm for information access. We define social navigation as navigation through explicit manipulation of a social lens and offer examples of its application.
-
Design for Interaction
ACM SIGMOD/PODS Conference
Special invited session on Human-Computer Interaction with Information.
-
Faceted Search
Morgan & Claypool
In Synthesis Lectures on Information Concepts, Retrieval, and Services, edited by Gary Marchionini of the University of North Carolina.
-
Resolving the Battle Royale between Information Retrieval and Information Science
Information Seeking Support Systems Workshop
Position paper at invitational workshop sponsored by the National Science Foundation (NSF).
-
Enterprise Information Access and the User Experience
IT Professional 9(1)
Explanation of an enterprise information access framework. IT Professional is published by the IEEE.
-
Information Access and the User Experience
30th Annual International ACM SIGIR Conference (SIGIR 2007)
Invited presentation at Industry Event.
-
Dynamic Category Sets: An Approach for Faceted Search
ACM SIGIR Workshop on Faceted Search
Novel approach that addresses the vocabulary problem for faceted data.
-
Processing Search Queries in a Distributed Environment
ACM Thirteenth Conference on Information and Knowledge Management (CIKM 2004)
-
Making the Nearest Neighbor Meaningful
Workshop on Clustering High Dimensional Data and its Applications at Second SIAM International Conference on Data Mining (SDM 2002)
Proposes a new data-driven difference measure for categorical data.
-
JIGGLE: Java Interactive Graph Layout Environment
6th Annual Symposium on Graph Drawing (GD '98)
Java-based platform for experimenting with numerical optimization approaches to general graph layout.
-
Lexical Navigation: Using Incremental Graph Drawing for Query Refinement
5th Annual Symposium on Graph Drawing (GD '97)
Patents
-
Leveraging A Social Graph For Use With Electronic Messaging
Issued US 9,971,993
-
Method and System for Semantic Search against a Document Collection
Issued US 9,710,518
-
Method and System for Semantic Search against a Document Collection
Issued US 9,116,948
-
System and Method for Measuring the Quality of Document Sets
Issued US 8,874,549
-
Method and System for Information Retrieval with Clustering
Issued US 8,676,802
-
Method and System for Semantic Search Against a Document Collection
Issued US 8,473,503
-
System and Method for Measuring the Quality of Document Sets
Issued US 8,219,593
-
System and Method for Measuring the Quality of Document Sets
Issued US 8,051,073
-
System and Method for Measuring the Quality of Document Sets
Issued US 8,051,084
-
System and Method for Measuring the Quality of Document Sets
Issued US 8,024,327
-
System and Method for Measuring the Quality of Document Sets
Issued US 8,005,643
-
Hierarchical Data-Driven Navigation System and Method for Information Retrieval
Issued US 7,912,823
-
Hierarchical Data-Driven Navigation System and Method for Information Retrieval
Issued US 7,035,864
-
Applying Numerical Approximation to General Graph Drawing
Issued US 5,995,114
-
Search Result Identification Using Vector Aggregation
Filed US 17/646,695
-
Identifying Members of a Small and Medium Business Segment
Filed US 14/318,326
-
Techniques For Identifying And Presenting Connection Paths
Filed US 13/548,957
-
Techniques For Identifying And Presenting Connection Paths
Filed US 13/482,884
Languages
-
Spanish
-
French
-
Italian
-
Portuguese
-
German
Recommendations received
9 people have recommended Daniel
More activity by Daniel
-
The most common goal that my search clients express is a desire to improve their ranking. This post sketches out some challenges of obtaining labeled…
Shared by Daniel Tunkelang