Author Archives: Toke Eskildsen
Beware the cursorMark, my son!
Efficient export of stored text content from multi-shard Solr setups using cursorMark for individual shards and merging the results externally.
Posted in eskildsen, Hacking, open source, Performance, Solr
Tagged datasets, Performance, Solr
Dumb-down at Indexing or Nested Data in the Solr Search Engine
Sigfrid Lundberg, Ph. D., Software Developer, Royal Danish Library, Copenhagen, Denmark. Are passions, then, the Pagans of the soul? Reason alone baptized? alone ordain’d To touch things sacred? (Edward Young, 1683-1765) Introduction: The …
Posted in sigge, Solr, usability
Which type bug?
A light tale of hunting down an Out Of Memory problem in SolrCloud. The setup and the problem: At the Royal Danish Library we provide full text search for the Danish Netarchive. The heavy lifting is done in a single …
Touching encouraged (an ongoing story)
Ongoing experiments with a large touch screen providing access to cultural heritage material.
Posted in eskildsen, Visualization
DocValues jump tables in Lucene/Solr 8
Lucene/Solr 8 is about to be released. Among many other things, it brings LUCENE-8585, written by yours truly with a heap of help from Adrien Grand. LUCENE-8585 introduces jump tables for DocValues; it is all about performance and brings speed-ups …
Posted in eskildsen, Hacking, Low-level, Lucene, Performance, Solr, Uncategorized
Faster DocValues in Lucene/Solr 7+
This is a fairly technical post explaining LUCENE-8374 and its implications for Lucene, Solr and (qualified guess) Elasticsearch search and retrieval speed. It is primarily relevant for people with indexes of 100M+ documents. Teaser: We have a Solr setup for …
juxta – image collage with metadata
Creating large collages of images to give a bird’s eye view of a collection seems to be gaining traction. Two recent initiatives: The New York Public Library has a very visually pleasing presentation of public domain digitizations, but with a …
Posted in Uncategorized
70TB, 16b docs, 4 machines, 1 SolrCloud
At Statsbiblioteket we maintain a historical net archive for the Danish parts of the Internet. We index it all in Solr, and we recently caught up with the present. Time for a status update. The focus is performance and logistics, …
Posted in Hacking, Low-level, Performance, Solr, Statsbiblioteket, Uncategorized
CDX musings
This is about web archiving, corpus creation and replay of web sites. No fancy bit fiddling here, sorry. There is currently some debate on CDX, used by the Wayback Engine, Open Wayback and other web archive oriented tools, such as …
Posted in Uncategorized
Faster grouping, take 1
A failed attempt at speeding up grouping in Solr, with an idea for the next attempt. Grouping at a Statsbiblioteket project: We have 100M+ articles from 10M+ pages belonging to 700K editions of 170 newspapers in a single Solr shard. It …
Posted in Uncategorized