Author Archives: Toke Eskildsen

About Toke Eskildsen

IT-Developer at statsbiblioteket.dk with a penchant for hacking Lucene/Solr.

Beware the cursorMark, my son!

Posted on October 24, 2023 by Toke Eskildsen

Efficient export of stored text content from multi-shard Solr setups using cursorMark for individual shards and merging the results externally. Continue reading →

Posted in eskildsen, Hacking, open source, Performance, Solr | Tagged datasets, Performance, Solr | Leave a comment

Dumb-down at Indexing or Nested Data in the Solr Search Engine

Posted on August 16, 2022 by Toke Eskildsen

Sigfrid Lundberg, Ph.D., Software Developer, Royal Danish Library, Copenhagen, Denmark. "Are passions, then, the Pagans of the soul? Reason alone baptized? alone ordain’d To touch things sacred?" (Edward Young, 1683-1765) Introduction: The … Continue reading →

Posted in sigge, Solr, usability | Leave a comment

Which type bug?

Posted on June 10, 2020 by Toke Eskildsen

A light tale of bug hunting an Out Of Memory problem with SolrCloud. The setup and the problem: At the Royal Danish Library we provide full text search for the Danish Netarchive. The heavy lifting is done in a single … Continue reading →

Posted in eskildsen, Solr | Tagged bughunting, memory | Leave a comment

Touching encouraged (an ongoing story)

Posted on October 26, 2019 by Toke Eskildsen

Ongoing experiments with a large touch screen providing access to cultural heritage material. Continue reading →

Posted in eskildsen, Visualization | Leave a comment

DocValues jump tables in Lucene/Solr 8

Posted on March 12, 2019 by Toke Eskildsen

Lucene/Solr 8 is about to be released. Among a lot of other things it brings LUCENE-8585, written by yours truly with a heap of help from Adrien Grand. LUCENE-8585 introduces jump-tables for DocValues, is all about performance and brings speed-ups … Continue reading →

Posted in eskildsen, Hacking, Low-level, Lucene, Performance, Solr, Uncategorized | 7 Comments

Faster DocValues in Lucene/Solr 7+

Posted on October 2, 2018 by Toke Eskildsen

This is a fairly technical post explaining LUCENE-8374 and its implications for Lucene, Solr and (qualified guess) Elasticsearch search and retrieval speed. It is primarily relevant for people with indexes of 100M+ documents. Teaser: We have a Solr setup for … Continue reading →

Posted in eskildsen, Hacking, Low-level, Lucene, Performance, Solr | 1 Comment

juxta – image collage with metadata

Posted on February 7, 2017 by Toke Eskildsen

Creating large collages of images to give a bird’s eye view of a collection seems to be gaining traction. Two recent initiatives: The New York Public Library has a very visually pleasing presentation of public domain digitizations, but with a … Continue reading →

Posted in Uncategorized | Leave a comment

70TB, 16b docs, 4 machines, 1 SolrCloud

Posted on November 30, 2016 by Toke Eskildsen

At Statsbiblioteket we maintain a historical net archive for the Danish parts of the Internet. We index it all in Solr and we recently caught up with the present. Time for a status update. The focus is performance and logistics, … Continue reading →

Posted in Hacking, Low-level, Performance, Solr, Statsbiblioteket, Uncategorized | 6 Comments

CDX musings

Posted on March 18, 2016 by Toke Eskildsen

This is about web archiving, corpus creation and replay of web sites. No fancy bit fiddling here, sorry. There is currently some debate on CDX, used by the Wayback Engine, Open Wayback and other web archive oriented tools, such as … Continue reading →

Posted in Uncategorized | Leave a comment

Faster grouping, take 1

Posted on January 18, 2016 by Toke Eskildsen

A failed attempt at speeding up grouping in Solr, with an idea for the next attempt. Grouping at a Statsbiblioteket project: We have 100M+ articles from 10M+ pages belonging to 700K editions of 170 newspapers in a single Solr shard. It … Continue reading →

Posted in Uncategorized | Leave a comment
