
While most SaaS providers run on virtual machines, Algolia uses bare metal servers, which give its software direct access to the computer’s essential resources, such as the operating system, CPU, RAM, and disk.

Bare metal servers

For Algolia, virtual machines aren’t necessary. Operating systems running directly on bare metal servers have been time-slicing and task-slicing for years without needing virtual machines. With powerful hardware, a single server can handle a very large number of users, especially when they all perform the same kinds of operations, which is the case with Algolia.

If you have large data requirements, you can choose a plan with dedicated servers, giving you exclusive access to the entire cluster. For most use cases, though, sharing servers works well. In both situations, the principle is the same: Algolia operates directly on the machine.

At Algolia, the term server actually refers to a cluster of three identical servers. Clusters let Algolia provide 99.99% availability because they guarantee that at least one of the three servers is always available.

Algolia clusters

An Algolia cluster is a set of three servers. All servers are equal: each is equally capable of responding to every request. For this to be possible, each server must have the same data, index settings, and overall system configuration, enabling the cluster to behave as a single server.

The benefit of clusters is redundancy: the service is still available if one or two servers go down. This redundancy guarantees an SLA of 99.99% availability.
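To make the 99.99% figure concrete, the short Python sketch below converts an availability target into a yearly downtime budget and estimates how three replicas raise availability. It assumes servers fail independently, and the 99% per-server figure in the example is hypothetical, not an Algolia measurement.

```python
# Minimal sketch: what a 99.99% availability target means in practice,
# and why three independent replicas help. The per-server availability
# used in the example is hypothetical.

MINUTES_PER_YEAR = 365 * 24 * 60  # non-leap year


def downtime_budget_minutes(availability: float) -> float:
    """Maximum minutes of downtime per year allowed by an availability target."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def cluster_availability(server_availability: float, servers: int = 3) -> float:
    """Probability that at least one server is up, assuming independent failures."""
    return 1.0 - (1.0 - server_availability) ** servers


if __name__ == "__main__":
    print(f"99.99% allows {downtime_budget_minutes(0.9999):.1f} min/year of downtime")
    # Hypothetical: each server on its own is up 99% of the time.
    print(f"Three 99% servers together: {cluster_availability(0.99):.6f}")
```

A 99.99% target leaves a budget of about 52.6 minutes of downtime per year. The three-replica formula only holds while failures are independent, which is why shared power, buildings, or networks matter so much.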

Algolia clusters in more detail

A cluster of three servers acts as one. Each server is ready to serve at any moment: while one handles the current request, the other two remain on call to process subsequent requests.

Clusters are used for all search and indexing operations. Even if you’re operating a globally popular retail website with an immense clothing collection, you can update your indices at any time while thousands of users search for different clothing items. Requests are balanced evenly across the cluster so that all three servers share the load.
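The even spreading of requests can be pictured as round-robin over the three servers, skipping any that are unreachable. This is a minimal sketch under that assumption: the class and host names are hypothetical, and it isn’t a description of Algolia’s actual load balancer.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out servers in turn, skipping any marked as unreachable (hypothetical)."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._order = cycle(self.servers)
        self.down = set()  # servers currently considered unreachable

    def pick(self):
        """Return the next reachable server in round-robin order."""
        for _ in range(len(self.servers)):
            server = next(self._order)
            if server not in self.down:
                return server
        raise RuntimeError("no server in the cluster is reachable")


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["server-1", "server-2", "server-3"])
    print([balancer.pick() for _ in range(6)])  # each server gets two requests
    balancer.down.add("server-2")
    print([balancer.pick() for _ in range(4)])  # load shifts to servers 1 and 3
```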

Algolia has hundreds of clusters with access to data centers worldwide.

Why use clusters

  • Clusters aren’t designed to increase capacity. Algolia doesn’t split your data across three computers so that each one holds a third of it. That would triple the capacity, but capacity isn’t the goal. A single server is usually enough, even with enormous databases: search only needs a small subset of your data, and that subset usually fits on one server.
  • Clusters aren’t about splitting a single operation into parts that run on different computers.
  • Clusters aren’t used for parallel computing: each server in a cluster processes the whole request independently.
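The difference between what clusters do (full replication) and what they don’t do (sharding) shows up in a toy example. The records and the substring-matching `search` function are stand-ins for illustration, not Algolia’s search engine.

```python
# Toy data and a toy "search" (substring match); both are stand-ins.
RECORDS = ["red dress", "blue jeans", "red scarf", "green coat", "red shoes", "blue hat"]


def search(records, term):
    """Return every record containing the term."""
    return [r for r in records if term in r]


def replicate(records, n_servers=3):
    """Full replication: every server gets a complete copy of the data."""
    return [list(records) for _ in range(n_servers)]


def shard(records, n_servers=3):
    """Sharding: each server gets roughly 1/n_servers of the data."""
    return [records[i::n_servers] for i in range(n_servers)]


if __name__ == "__main__":
    full = search(RECORDS, "red")
    # Replication: any one server can answer the whole query by itself.
    print([search(server, "red") == full for server in replicate(RECORDS)])
    # Sharding: each server only sees part of the answer.
    print([search(server, "red") for server in shard(RECORDS)])
```

Each replicated server returns all three "red" matches on its own, while each shard returns only one, so losing a shard would mean losing results.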

Clusters are about redundancy. For Algolia, performance goes hand-in-hand with reliability: a fast and relevant search is of little value if the search engine is unavailable.

What happens when a server goes down

A server is unavailable when the other servers in the cluster can’t reach it. This can happen for many reasons: a temporary network failure, a server too busy to respond, or a server that is physically down. Synchronizing data within the cluster requires uninterrupted communication between all three servers, as explained below in the section on the consensus algorithm. When one or more servers in a cluster are unreachable, synchronization is at risk.

If a server is unreachable, the other two continue to function normally, processing indexing and search requests. They can reach consensus among themselves, and when the third server returns, it synchronizes its indices with the other two.

While a server might be unreachable by the others in the cluster, it may still be able to receive indexing requests from your applications. This is a serious problem for synchronization: the “down” server has no idea what the other two servers are doing with their indices. If it applied indexing changes without sharing them with the other servers, the cluster would end up with two different datasets.

Algolia queues indexing jobs on any unreachable server to ensure this doesn’t happen. While the other two servers continue to process their indexing jobs and synchronize, the absent server puts any indexing jobs on hold. These “on hold” jobs are only processed once the whole cluster is back together.
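The “on hold” behavior can be sketched as a toy cluster in which an isolated server queues the indexing jobs it receives, then resynchronizes from a peer and releases the held jobs to the whole cluster when it reconnects. The `Cluster` class and its methods are hypothetical and only illustrate the idea; they don’t describe Algolia’s internal implementation.

```python
from collections import deque


class Cluster:
    """Toy three-server cluster; names and structure are hypothetical."""

    def __init__(self, names=("s1", "s2", "s3")):
        self.indices = {name: {} for name in names}       # objectID -> record
        self.on_hold = {name: deque() for name in names}  # jobs held while isolated
        self.isolated = set()                             # servers cut off from peers

    def connected(self):
        return [n for n in self.indices if n not in self.isolated]

    def index_job(self, received_by, object_id, record):
        """A client sends an indexing job to one server of the cluster."""
        if received_by in self.isolated:
            # Applying it locally would let this server's data diverge, so hold it.
            self.on_hold[received_by].append((object_id, record))
            return
        # The connected servers apply the job together and stay in sync.
        for name in self.connected():
            self.indices[name][object_id] = record

    def reconnect(self, name):
        """Resync from a peer, then release held jobs to the whole cluster."""
        self.isolated.discard(name)
        peer = next(n for n in self.connected() if n != name)
        self.indices[name] = dict(self.indices[peer])
        while self.on_hold[name]:
            self.index_job(name, *self.on_hold[name].popleft())


if __name__ == "__main__":
    cluster = Cluster()
    cluster.isolated.add("s3")
    cluster.index_job("s1", "a", "red dress")   # applied on s1 and s2
    cluster.index_job("s3", "b", "blue jeans")  # held on s3
    cluster.reconnect("s3")
    print(cluster.indices)
```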

Availability over consistency

For hosted services, there’s a tradeoff between data availability and consistency:

  • Availability: constant access to your data, no service outage.
  • Consistency: the same data everywhere at the same time (for example, all users get the same search results simultaneously).

Algolia chose availability over consistency because when someone searches, they should get results without failure. Algolia considers slight data differences between users less critical than users not getting results.

For technical reasons, achieving both goals with equal success is impossible (see CAP theorem and eventual consistency). One of these reasons is that every client gets three servers to handle all search requests, which guarantees that at least one server is always available. If Algolia made searches wait until all three servers had the same data, a single slow or unreachable server would delay every search.

That said, synchronizing data between the three servers takes seconds or less, so data discrepancies are infrequent and short-lived.
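The tradeoff can be simulated with replicas that apply the same write after different delays. An available read answers immediately from whichever server handles it, possibly with stale data, while a consistent read refuses to answer until every replica agrees. The delays are toy values, and `consistent_read` is a hypothetical alternative shown only for contrast.

```python
class Replica:
    """One server's copy of the data; writes become visible after a delay."""

    def __init__(self):
        self.data = {}
        self.pending = []  # (visible_at, key, value) writes not yet applied

    def replicate(self, now, key, value, lag):
        self.pending.append((now + lag, key, value))

    def catch_up(self, now):
        """Apply every pending write whose delay has elapsed."""
        waiting = []
        for visible_at, key, value in self.pending:
            if visible_at <= now:
                self.data[key] = value
            else:
                waiting.append((visible_at, key, value))
        self.pending = waiting


def available_read(replica, key, now):
    """Answer immediately from one server, even if its data is stale."""
    replica.catch_up(now)
    return replica.data.get(key)


def consistent_read(replicas, key, now):
    """Hypothetical alternative: only answer once every replica agrees."""
    for replica in replicas:
        replica.catch_up(now)
    values = {replica.data.get(key) for replica in replicas}
    if len(values) > 1:
        raise TimeoutError("replicas disagree; a consistent read must keep waiting")
    return values.pop()


if __name__ == "__main__":
    replicas = [Replica() for _ in range(3)]
    # One write reaches the three servers after different (toy) delays.
    for replica, lag in zip(replicas, (0.0, 0.2, 0.8)):
        replica.replicate(now=0.0, key="price", value=42, lag=lag)
    print(available_read(replicas[2], "price", now=0.1))  # answers, but the write isn't visible yet
    try:
        consistent_read(replicas, "price", now=0.1)
    except TimeoutError as exc:
        print(exc)
    print(consistent_read(replicas, "price", now=1.0))
```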

Search operations

For search, server-to-server communication is less important: as long as a server is functional, it can process search requests.

From servers to clusters to a Distributed Search Network

Clusters ensure:

  • Availability: if one or two servers go down, your users aren’t affected, and search is always available.
  • Redundancy: having three live copies of your data makes recovery easier.

Consensus of three servers

Clusters require a robust consensus algorithm to ensure that each server always has the same data without service interruption. Algolia uses the Raft consensus algorithm to coordinate all index input (adding, updating, and deleting index data) so that every server in a cluster applies the same changes in the same order.
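The core idea Raft relies on is majority commit: a change counts as committed once most of the cluster (two of three servers) has stored it, and every server applies committed changes in the same log order. The sketch below is a greatly simplified illustration of that idea only. It omits terms, leader election, and log repair, and the class names are hypothetical, so it is neither Raft itself nor Algolia’s implementation.

```python
class Node:
    """A server in the cluster (hypothetical, simplified)."""

    def __init__(self, name):
        self.name = name
        self.log = []          # ordered index operations this node has stored
        self.applied = []      # operations applied to this node's index
        self.reachable = True  # can the leader reach this node?


class Leader(Node):
    def __init__(self, name, followers):
        super().__init__(name)
        self.followers = followers
        self.commit_index = 0  # how many log entries are known to be committed

    def propose(self, operation):
        """Append an index operation; commit it only if a majority stores it."""
        self.log.append(operation)
        acks = 1  # the leader stores the entry itself
        for follower in self.followers:
            if follower.reachable:
                follower.log = list(self.log)  # simplification: ship the whole log
                acks += 1
        cluster_size = 1 + len(self.followers)
        if acks <= cluster_size // 2:
            return False  # no majority: the entry stays uncommitted
        self.commit_index = len(self.log)
        # Reachable nodes apply committed entries in the same order.
        for node in [self, *self.followers]:
            if node.reachable:
                node.applied = node.log[: self.commit_index]
        return True


if __name__ == "__main__":
    s2, s3 = Node("s2"), Node("s3")
    leader = Leader("s1", [s2, s3])
    leader.propose("add record A")
    s3.reachable = False
    print(leader.propose("update record A"))  # 2 of 3 stored it: committed
    s2.reachable = False
    print(leader.propose("delete record A"))  # 1 of 3: not committed
```

With one follower unreachable, changes still commit because two of three is a majority, which matches the behavior described earlier for a single unreachable server.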

Distance counts

When servers share the same data center or the same power lines, a single flood or power outage can bring down the entire cluster. To ensure cluster reliability, Algolia separates the servers so that no single incident can take the whole cluster down, placing them in data centers in neighboring regions with no physical links between them.

Network problems are the most common cause of system downtime, so part of creating distance is avoiding a shared network: Algolia ensures that no two servers in the same cluster use the same ISP. This physical and network distance doesn’t affect Raft consensus among the servers.
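The placement rule can be expressed as a simple check over every pair of servers in a cluster. The `Placement` record, data center names, and ISP names are hypothetical, chosen only to illustrate the rule.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Placement:
    """Where one server of a cluster lives (hypothetical fields)."""
    server: str
    data_center: str
    isp: str


def placement_violations(cluster):
    """List every pair of servers that shares a data center or an ISP."""
    problems = []
    for a, b in combinations(cluster, 2):
        if a.data_center == b.data_center:
            problems.append(f"{a.server} and {b.server} share data center {a.data_center}")
        if a.isp == b.isp:
            problems.append(f"{a.server} and {b.server} share ISP {a.isp}")
    return problems


if __name__ == "__main__":
    cluster = [
        Placement("s1", "dc-a", "isp-1"),
        Placement("s2", "dc-a", "isp-2"),  # same building as s1
        Placement("s3", "dc-c", "isp-1"),  # same network provider as s1
    ]
    for problem in placement_violations(cluster):
        print(problem)
```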

Extending the cluster with a Distributed Search Network

If you have a worldwide user base, Algolia’s Distributed Search Network (DSN) adds one or more satellite servers to a cluster, extending your reach into other regions. Every DSN server contains the cluster’s entire data and settings.

For example, if you have a cluster on the East Coast of the United States, you can add a DSN server on the West Coast to reduce the network latency between your West Coast users and the server, which improves performance.

You can also use DSN to share the load during heavy cluster activity by offloading requests to the DSN whenever your servers reach peak usage.
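One way to picture routing with a DSN is to send each query to the lowest-latency copy of the data, skipping any copy at peak load. This sketch assumes per-host latency and load measurements are available; the host names, numbers, and the 0.9 load threshold are hypothetical and don’t describe Algolia’s routing logic.

```python
def choose_host(latency_ms, load, max_load=0.9):
    """Pick the fastest host whose load is below max_load (hypothetical policy).

    latency_ms: host -> measured round-trip time from the user, in milliseconds
    load: host -> current utilization between 0.0 and 1.0 (missing means idle)
    """
    candidates = [host for host in latency_ms if load.get(host, 0.0) < max_load]
    if not candidates:
        raise RuntimeError("every host is at peak load")
    return min(candidates, key=lambda host: latency_ms[host])


if __name__ == "__main__":
    west_user = {"cluster-us-east": 70, "dsn-us-west": 15}
    east_user = {"cluster-us-east": 10, "dsn-us-west": 65}
    print(choose_host(west_user, {"cluster-us-east": 0.4, "dsn-us-west": 0.3}))
    print(choose_host(east_user, {"cluster-us-east": 0.4, "dsn-us-west": 0.3}))
    # The main cluster is at peak usage, so the query is offloaded to the DSN.
    print(choose_host(east_user, {"cluster-us-east": 0.95, "dsn-us-west": 0.3}))
```

A West Coast user is sent to the West Coast DSN server, while an East Coast user stays on the main cluster unless it’s at peak load, in which case the query is offloaded to the DSN.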

Monitoring and locating Algolia’s clusters and servers

You can monitor your servers, clusters, and DSNs from the dashboard.

The Monitoring and Usage APIs also provide a window into all your cluster and DSN activity.

Where are the clusters and servers located?

  • US-East (Virginia): two different Equinix data centers in Ashburn and COPT DC-6 in Manassas (three independent, autonomous systems)

  • US-West (California): three different Equinix data centers in San Jose (three independent, autonomous systems)

  • US-Central (Texas): two different data centers in Dallas (two independent, autonomous systems)

  • Europe (France): four different data centers in Roubaix, two different data centers in Strasbourg, and one data center in Gravelines

  • Europe (Netherlands): (DSN only) four different data centers around Amsterdam

  • Europe (Germany): seven different data centers in Falkenstein and one data center in Frankfurt (two independent, autonomous systems)

  • Europe (UK): two different data centers in London (two independent, autonomous systems)

  • Canada: four different data centers in Beauharnois

  • Middle East: (DSN only) one data center in Dubai

  • Singapore: two different data centers in Singapore (two independent, autonomous systems)

  • Brazil: three different data centers around São Paulo (two independent, autonomous systems)

  • Japan: one data center in Tokyo and one data center in Osaka

  • Australia: three data centers in Sydney (two independent, autonomous systems)

  • India: one data center in Noida

  • Hong Kong: two different data centers (two independent, autonomous systems)

  • South Africa: two data centers in Johannesburg (two independent, autonomous systems)

When you create your account, you can select which region you want to use. Some regions aren’t available for all plans. For more details, see Pricing.

You can also use the DSN feature to distribute your search engine across multiple regions and decrease latency for your audience in different parts of the world.