To ensure good performance, Algolia limits the size of each record. Long content, like a detailed Wikipedia page, might be too big to fit into one of these records.
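To get a rough sense of whether a record is too large, you can measure its serialized size before sending it. The following is a minimal sketch: it treats the UTF-8 byte length of the compact JSON as the record size, which is an approximation, and `MAX_RECORD_BYTES` is a hypothetical value used only for illustration. Check the limit that applies to your plan.

```python
import json


def record_size_bytes(record: dict) -> int:
    """Approximate a record's size as the byte length of its compact UTF-8 JSON."""
    encoded = json.dumps(record, ensure_ascii=False, separators=(",", ":"))
    return len(encoded.encode("utf-8"))


# Hypothetical limit, for illustration only (not taken from this guide).
MAX_RECORD_BYTES = 10_000

# A deliberately long record: the sentence is repeated to simulate a long page.
record = {"title": "Algolia", "content": "Algolia is a search platform. " * 500}

size = record_size_bytes(record)
status = "too big, split it" if size > MAX_RECORD_BYTES else "fits"
print(f"{size} bytes: {status}")
```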

To work around this, divide long pages into smaller “chunks”. This not only helps you stay within the size limit but also makes your search more relevant. Break the page into sections or even paragraphs, and store each as a separate record.

When splitting into chunks, organize them based on the page structure. For instance, if you’re dealing with a lengthy Wikipedia article, create separate records for each section like “Introduction” or “History”.
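As a sketch of this approach, the following function builds one record per paragraph and tags each record with the section it came from. The function name `split_page_into_records`, the input shape (a mapping from section heading to section text), and the blank-line paragraph separator are assumptions made for this example, not part of any Algolia API.

```python
def split_page_into_records(title, permalink, sections):
    """Build one record per paragraph, tagged with its section.

    `sections` maps a section heading to that section's full text.
    Paragraphs are assumed to be separated by blank lines.
    """
    records = []
    for section, text in sections.items():
        for paragraph in text.split("\n\n"):
            paragraph = paragraph.strip()
            if not paragraph:
                continue  # skip empty chunks caused by extra blank lines
            records.append({
                "title": title,
                "section": section,
                "permalink": permalink,
                "content": paragraph,
            })
    return records


page_sections = {
    "Company": "Algolia was founded in 2012.\n\nIt opened a third data centre in 2014.",
    "Products and technology": "The Algolia model provides search as a service.",
}

records = split_page_into_records(
    "Algolia", "https://en.wikipedia.org/wiki/Algolia", page_sections
)
for r in records:
    print(r["section"], "->", r["content"])
```

Keeping the section name on every chunk is what later lets you group the chunks of one section together.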

If you’re using the Algolia Crawler and the record size exceeds the limit, use the helpers.splitContentIntoRecords() helper to split the page into smaller chunks.

Avoid duplicates

When you split a page, the same content might appear in multiple records. When you set the distinct parameter to true, Algolia shows only the most relevant of these duplicate records. You decide what counts as “distinct” by choosing a meaningful attribute, like the title of a section.
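To illustrate the idea, the following sketch mimics the deduplication step locally: given hits already sorted by relevance, it keeps the first hit for each value of the chosen attribute. This is a simplified model and not Algolia's implementation. The `deduplicate` function is hypothetical, and keeping hits that lack the attribute is an assumption of this sketch.

```python
def deduplicate(ranked_hits, attribute_for_distinct):
    """Keep the first (best-ranked) hit for each value of the distinct attribute.

    `ranked_hits` must already be sorted from most to least relevant.
    Assumption of this sketch: hits that lack the attribute are all kept.
    """
    seen = set()
    kept = []
    for hit in ranked_hits:
        value = hit.get(attribute_for_distinct)
        if value is None:
            kept.append(hit)
        elif value not in seen:
            seen.add(value)
            kept.append(hit)
    return kept


hits = [
    {"section": "Company", "content": "Algolia was founded in 2012."},
    {"section": "Products and technology", "content": "Search as a service."},
    {"section": "Company", "content": "Algolia opened a third data centre."},
]

for hit in deduplicate(hits, "section"):
    print(hit["section"], "->", hit["content"])
```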

Example

In the following example, a long page is split into several records, some of which share a section. To show only one entry per section in search results, set section as the attributeForDistinct and set distinct to true.

[
  {
    "title": "Algolia",
    "permalink": "https://en.wikipedia.org/wiki/Algolia",
    "content": "Algolia is a U.S. startup company offering a web search product through a SaaS (software as a service) model."
  },
  {
    "title": "Algolia",
    "section": "Company",
    "permalink": "https://en.wikipedia.org/wiki/Algolia",
    "content": "Algolia was founded in 2012 by Nicolas Dessaigne and Julien Lemoine, who are originally from Paris, France. It was originally a company focused on offline search on mobile phones. Later it was selected to be part of Y Combinator's[1] Winter 2014 class."
  },
  {
    "title": "Algolia",
    "section": "Company",
    "permalink": "https://en.wikipedia.org/wiki/Algolia",
    "content": "Starting with two data centres in Europe and the US, Algolia opened a third centre in Singapore in March 2014,[2] and as of 2016, claimed to be present in 47 locations across 15 worldwide regions.[3] It serves roughly 1,600 customers, handling 12 billion user queries per month.[4] Those customers are among e-commerce, medium and other fields, including DC Shoes, Medium and vevo.[5] In May 2015, Algolia received 18.3 million dollars in a series A investment from a financial group led by Accel Partners,[6] and in 2017 a $53M series B investment, also led by Accel Partners[7] From June 2016 to June 2017, the usage of Algolia by small websites has increased from 632 to 1,591 in the \"top 1mio websites\" evaluated by BuiltWith. In the same timeframe, BuiltWith recorded no significant usage increase among their \"top 10k homepages\".[8]"
  },
  {
    "title": "Algolia",
    "section": "Products and technology",
    "permalink": "https://en.wikipedia.org/wiki/Algolia",
    "content": "The Algolia model provides search as a service, offering web search across a client's website using an externally hosted search engine.[9][10] Although in-site search has long been available from general web search providers such as Google, this is typically done as a subset of general web searching. The search engine crawls or spiders the web at large, including the client site, and then offers search features restricted to only that target site. This is a large and complex task, available only to large organisations at the scale of Google or Microsoft."
  }
]

How to enable the distinct feature

You can enable distinct from Algolia’s dashboard or API.

Using the API

If you use the API to enable distinct, you can do so either at indexing time (when you add records to your indices) or at query time (when users search).

  1. Set an attribute, such as section, as the attributeForDistinct.
  2. Set distinct to true to deduplicate your results.

At indexing time

// Group records by section and return only the best match per group.
$index->setSettings([
  'attributeForDistinct' => 'section',
  'distinct' => true
]);

At query time

// attributeForDistinct must already be set in the index settings.
$results = $index->search('query', [
  'distinct' => true
]);

Using the dashboard

  1. Go to the Algolia dashboard and select your Algolia application.
  2. On the left sidebar, select Search.
  3. Select your Algolia index.
  4. Click the Configuration tab.
  5. In the Search behavior section, select Deduplication and Grouping.
  6. Set the Distinct drop-down menu option to true.
  7. Select your attribute in the Attribute for Distinct drop-down menu.
  8. Save your changes.