HA (high-availability) setup for Smart Video Filter

With high expectations for website and service availability in 2018, it is especially important to keep redundant DR (disaster recovery) copies of a service running at all times, ready to take on the full PROD load within seconds. Hosting companies like Amazon have long solved this problem for standard services, e.g. an Elasticsearch cluster. Since the cluster always runs with one or more replicas, a replica node is ready to take over for a short period after a primary failure, until a new primary is spun up and synced. A layer of abstraction such as Kubernetes also makes it possible to build highly available services.

With all the available options, what should we use in the real world? It depends on the budget and the available hardware. My recently released service, Smart Video Filter, is a low-budget solution running on 2 physical machines with CentOS 7. Two enterprise-grade SSDs with a large TBW (terabytes written) rating are substantially cheaper than AWS in terms of storage cost and provisioned IOPS cost. Running an HA setup on 3 machines is recommended, but 2 machines (PROD and DR) provide enough reliability and redundancy in most cases. Four services on those machines needed to switch seamlessly between PROD and DR: Elasticsearch, MongoDB, the Mining Service, and the Search Service.

The Elasticsearch setup over 2 machines consists of a cluster with 1 primary and 1 replica. Elasticsearch reads and writes happily proceed even if one of those nodes is down. No special setup is necessary.
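For illustration, here is a minimal sketch of creating such an index over the REST API, with one primary shard and one replica so that each machine holds a full copy of the data. The index name "videos" and the node address are placeholders, not the actual values used by Smart Video Filter.

```python
# Minimal sketch: create an index whose single primary shard is replicated
# to the second node, so either machine can serve the data on its own.
import requests

settings = {
    "settings": {
        "number_of_shards": 1,    # one primary shard...
        "number_of_replicas": 1,  # ...plus one replica on the other node
    }
}

# "videos" and the node address are hypothetical
resp = requests.put("http://192.168.1.10:9200/videos", json=settings)
resp.raise_for_status()
print(resp.json())
```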

The MongoDB setup on 2 nodes is trickier. MongoDB has protection against a split-brain condition: the cluster does not allow writes unless a primary has been elected, a primary can only be elected by a majority of nodes, and there is no majority with 1 out of 2 nodes down. Adding an Arbiter instance is the recommended remedy in such cases. However, a simple arbiter setup does not work if the arbiter is deployed on one of the data nodes: when that entire machine goes down, it takes the arbiter down with it. What I ended up implementing is a workaround of the split-brain protection, in which the MongoDB config is overwritten by the Mining Service. The Mining Service provides an independent confirmation that one of the data nodes is dead, adds an arbiter on a different machine to the replica set, and removes the arbiter running on the same machine as the failed data node. Node health detection by the Mining Service is described below.
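The exact way the config is overwritten is specific to the service, but the sketch below shows one way such an arbiter swap could be done from Python with pymongo, via a forced replica-set reconfiguration on the surviving node. The host addresses, ports, and the assumption that a spare arbiter process is already running on the surviving machine are all hypothetical.

```python
# Sketch of the arbiter swap after one machine has died, assuming the
# surviving data node is at 192.168.1.11 and a spare arbiter process is
# already listening on it at port 27018. Real hosts/ports will differ.
from pymongo import MongoClient

FAILED_NODE_ARBITER = "192.168.1.10:27018"   # arbiter co-located with the dead data node
SURVIVING_ARBITER = "192.168.1.11:27018"     # arbiter on the surviving machine

# Connect directly to the surviving data node (a secondary with no primary).
client = MongoClient("192.168.1.11", 27017, directConnection=True)

config = client.admin.command("replSetGetConfig")["config"]

# Drop the arbiter that went down together with the failed machine.
members = [m for m in config["members"] if m["host"] != FAILED_NODE_ARBITER]

# Add an arbiter on the surviving machine so a majority (data node + arbiter)
# can elect a primary again.
if all(m["host"] != SURVIVING_ARBITER for m in members):
    members.append({
        "_id": max(m["_id"] for m in members) + 1,
        "host": SURVIVING_ARBITER,
        "arbiterOnly": True,
    })

config["members"] = members
config["version"] += 1

# A forced reconfig is required because there is no primary to accept it.
client.admin.command("replSetReconfig", config, force=True)
```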

The Search Service makes use of a health API. One instance of the service is deployed on each of the PROD and DR nodes. Each instance exposes a RESTful endpoint with a predictable response, consisting simply of the string “alive”. Each instance also runs a client that reads this status from itself and from the other node. When both nodes are alive, the PROD node serves traffic. When the DR node detects that it is alive but the PROD node is not, it takes over. Each node is self-aware: it determines its role by comparing its static IP (within the LAN) to the defined IPs of the PROD and DR nodes. When a node takes over, it uses port triggering on the router to direct future external requests to itself. Testing showed that port triggering can switch the routed node within seconds.
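A minimal sketch of the health endpoint and the role decision is shown below, using a standard-library HTTP server and hypothetical LAN addresses; the port-triggering update on the router is omitted.

```python
# Sketch of the health check and failover decision. The IPs and port are
# placeholders, and the router update after take-over is not shown.
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

PROD_IP = "192.168.1.10"   # hypothetical static IPs on the LAN
DR_IP = "192.168.1.11"
HEALTH_PORT = 8080


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Predictable response: the plain string "alive".
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"alive")


def is_alive(ip):
    try:
        r = requests.get(f"http://{ip}:{HEALTH_PORT}/", timeout=2)
        return r.text == "alive"
    except requests.RequestException:
        return False


def should_take_over():
    # Determine this node's role from its own static IP.
    my_ip = socket.gethostbyname(socket.gethostname())
    if my_ip == PROD_IP:
        return True                   # PROD serves whenever it is up
    return not is_alive(PROD_IP)      # DR serves only if PROD is down


if __name__ == "__main__":
    server = HTTPServer(("0.0.0.0", HEALTH_PORT), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print("take over" if should_take_over() else "stand by")
```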

The Mining Service employs the same health API plus another external API that reports whether a job is running. When the PROD or DR node is ready to take over, it lets the current job finish on the other node before scheduling jobs on itself; jobs must not run on the PROD and DR nodes simultaneously. The same health detection also drives the switch of the MongoDB arbiter between nodes, ensuring MongoDB can elect a primary.
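A sketch of that hand-off is shown below, assuming a hypothetical /job endpoint that answers "running" or "idle" and a schedule_jobs callback standing in for the real scheduler.

```python
# Sketch of the mining take-over: wait until the other node's current job
# finishes (or the node disappears) before scheduling jobs locally.
# The address, port, /job endpoint format and schedule_jobs are assumptions.
import time

import requests

OTHER_NODE = "192.168.1.10"   # hypothetical address of the other machine
JOB_PORT = 8081


def other_node_job_running():
    try:
        r = requests.get(f"http://{OTHER_NODE}:{JOB_PORT}/job", timeout=2)
        return r.text == "running"
    except requests.RequestException:
        return False   # node unreachable: no job can be running there


def take_over_mining(schedule_jobs):
    # Never run jobs on both nodes at once.
    while other_node_job_running():
        time.sleep(30)
    schedule_jobs()
```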

With the full setup in place, the system keeps functioning correctly with only a few seconds of disruption when one of the machines goes down entirely. This was readily demonstrated in testing, when all services remained highly available throughout a rolling restart of both machines!
