Guides / Personalization / AI Personalization (beta) / Configure personalization / Prerequisites

May 27, 2024

Prepare your index structure

Use categorical attributes

Categorical attributes are attributes with a fixed number of possible values.

Incorporating such attributes into your index structure categorizes your data into well-defined, relatively smaller buckets.

Copy
[
  { "objectID": "ID01", "title": "Chair", "color": "red", "categories": ["Furniture", "Outdoors"] },
  { "objectID": "ID02", "title": "Table", "color": "green", "categories": ["Furniture", "Outdoors"] },
  { "objectID": "ID03", "title": "Water bottle", "color": "blue", "categories": ["Outdoors"] },
  { "objectID": "ID04", "title": "Book", "color": "red", "categories": ["Books"] },
  { "objectID": "ID05", "title": "Headphones", "color": "green", "categories": ["Electronics"] },
  { "objectID": "ID06", "title": "Phone", "color": "blue", "categories": ["Electronics"] },
  { "objectID": "ID07", "title": "Television", "color": "red", "categories": ["Electronics"] },
  { "objectID": "ID08", "title": "Can", "color": "green", "categories": ["Cooking & Dining", "Kitchenware"] },
  { "objectID": "ID09", "title": "Bowl", "color": "blue", "categories": ["Cooking & Dining", "Kitchenware"] }
]

Good categorical attributes

An attribute like color with finite values like red, green, and blue is considered a good categorical attribute because it organizes your index into three distinct buckets. Other good categorical attributes include gender, brand or categories.

Bad categorical attributes

Attributes that are unique for each record, such as objectID and title, are poor choices because they don’t offer any grouping into buckets. Other poor categorical attributes include description, sku, or price.

Avoid nesting attributes

When organizing data within records, prioritize a flat structure. Reserve deeper nesting for hierarchical facets.

Consider the following example where several keys are being nested.

Copy
{
  "key1": {
      "key2": {
          "key3": "value"  
        }
    }
}

To improve this index structure, the attribute-value pair must be simplified, making it easier for AI Personalization to process the index.

Copy
{
  "key1_key2_key3": "value"
}

To optimize performance, AI Personalization sets a maximum of 50 key-value pairs for nested attributes. Adhering to this limit ensures your nested attributes are processed efficiently. If you need to exceed these constraints, contact the Algolia support team.

Avoid mixing attribute types

Maintaining a consistent type for attributes is an important step in preparing your index structure.

Copy
[
  { "color": "red" },
  { "color": ["red", "blue"] },
  { "color": 250000000 }
]

Using this index as it is would lead to unexpected results. This is because the attribute color can be defined as a string, an array, and an integer.

A structure like this usually indicates an underlying issue with your data. You must evaluate your index and ensure a consistent type for your attributes.

Copy
[
  { "color": "red" },
  { "color": "green" },
  { "color": "blue" }
]

Avoid mixing attributes from different domains

When preparing your index structure, you must ensure that attributes are relevant to a single domain.

For example, an index for articles shouldn’t contain attributes relevant to product information and vice versa.

Language is a common domain which could also lead to a mix of attributes within your index.

Copy
[
  { 
    "objectID": "MH001", 
    "color": "red", 
    "title": "Bag" 
  },
  { 
    "objectID": "MH002", 
    "color": "rouge", 
    "title": "Sac" 
  },
  { 
    "objectID": "MH001", 
    "color": "blue", 
    "title": "Chair" 
  }
]

The second record is out of place within this index because it contains attributes in the French language. You can fix this by reviewing the index and ensuring that attributes from different domains haven’t been mixed.

Copy
[
  { 
    "objectID": "MH001", 
    "color": "red", 
    "title": "Bag" 
  },
  { 
    "objectID": "MH002", 
    "color": "red", 
    "title": "Bag" 
  },
  { 
    "objectID": "MH001", 
    "color": "blue", 
    "title": "Chair" 
  }
]

How AI Personalization validates attributes

It’s also important to understand how AI Personalization validation process works.

AI Personalization prioritizes attributes that significantly contribute to personalization, ensuring your user profiles are built on meaningful data. It filters out attributes based on user interactions.

An attribute-value pair must show significant user interaction. It doesn’t set a fixed benchmark for checks but instead looks for a significant number of interactions. For the attribute-pair, color:red, if AI Personalization randomly picks a set of users from last month (say 200), it would want to see at least some of them interacting with red products.
Attributes with minimal relevance are likely to be disregarded. An attribute is likely to be filtered out if it only applies to a tiny fraction of your products since it won’t generate enough user interactions.
Attributes with excessive diversity are likely to be disregarded. For example, an attribute like brand is likely to be filtered out if AI Personalization notices that there are thousands of different brands across millions of products and no single brand is getting enough user interaction to be considered important.
Unusual attribute values will be disregarded. Attributes with lots of exotic values (such as color:pink_with_brown_dots) aren’t excluded entirely but the exotic values are filtered out and won’t be processed.

Configure personalization

Prepare your event implementation

Did you find this page helpful?