{"id":288,"date":"2017-11-10T15:01:47","date_gmt":"2017-11-10T15:01:47","guid":{"rendered":"https:\/\/bootstrap-it.com\/blog\/?p=288"},"modified":"2017-11-10T15:09:04","modified_gmt":"2017-11-10T15:09:04","slug":"high-availability-concepts-and-theory","status":"publish","type":"post","link":"https:\/\/bootstrap-it.com\/blog\/?p=288","title":{"rendered":"High Availability: Concepts and Theory"},"content":{"rendered":"<p id=\"2d19\" class=\"graf graf--p graf-after--h3\"><em class=\"markup--em markup--p-em\">This article is adapted from chapter 7 of my book,\u00a0<\/em><a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/bootstrap-it.com\/index.php\/books\/\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/bootstrap-it.com\/index.php\/books\/\"><em class=\"markup--em markup--p-em\">Teach Yourself Linux Virtualization and High Availability: prepare for the LPIC-3 304 
certification exam<\/em><\/a><em class=\"markup--em markup--p-em\">.<\/em><\/p>\n<p id=\"3631\" class=\"graf graf--p graf-after--p\">Let\u2019s focus more on some of the larger architectural principles of cluster management than on any single technology solution. We\u2019ll get to see some actual implementations later\u00a0<a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/bootstrap-it.com\/index.php\/books\/\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/bootstrap-it.com\/index.php\/books\/\">in the book<\/a>\u00a0&#8211; and you can learn a lot about how this works on AWS in my\u00a0<a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/livebook.manning.com\/#!\/book\/learn-amazon-web-services-in-a-month-of-lunches\/chapter-14\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/livebook.manning.com\/#!\/book\/learn-amazon-web-services-in-a-month-of-lunches\/chapter-14\">Learn Amazon Web Services in a Month of Lunches book<\/a>\u00a0from Manning. But for now, let\u2019s first make sure we\u2019re comfortable with the basics.<\/p>\n<p id=\"358a\" class=\"graf graf--p graf-after--p\">Running server operations using clusters of either physical or virtual computers is all about improving both reliability and performance over and above what you could expect from a single, high-powered server. You add reliability by not hanging your entire infrastructure on a single point of failure (i.e., a single server). 
And you can increase performance through the ability to very quickly add computing power and capacity by scaling up and out.<\/p>\n<p id=\"d7ef\" class=\"graf graf--p graf-after--p\">This might happen through intelligently spreading your workloads among diverse geographic and demand environments (load balancing), providing backup servers that can be quickly brought into service in the event a working node fails (failover), optimizing the way your data tier is deployed, or allowing for fault tolerance through loosely coupled architectures.<\/p>\n<p id=\"3609\" class=\"graf graf--p graf-after--p\">We\u2019ll get to all that. First, though, here are some basic definitions:<\/p>\n<p id=\"d75b\" class=\"graf graf--p graf-after--p\"><strong class=\"markup--strong markup--p-strong\">Node<\/strong>: A single machine (either physical or virtual) running server operations independently on its own operating system. Since any single node can fail, meeting availability goals requires that multiple nodes operate as part of a cluster.<\/p>\n<p id=\"842f\" class=\"graf graf--p graf-after--p\"><strong class=\"markup--strong markup--p-strong\">Cluster<\/strong>: Two or more server nodes running in coordination with each other to complete individual tasks as part of a larger service, where mutual awareness allows one or more nodes to compensate for the loss of another.<\/p>\n<p id=\"8d54\" class=\"graf graf--p graf-after--p\"><strong class=\"markup--strong markup--p-strong\">Server failure<\/strong>: The inability of a server node to respond adequately to client requests. 
This could be due to a complete crash, connectivity problems, or because it has been overwhelmed by high demand.<\/p>\n<p id=\"51bf\" class=\"graf graf--p graf-after--p\"><strong class=\"markup--strong markup--p-strong\">Failover<\/strong>: The way a cluster tries to accommodate the needs of clients orphaned by the failure of a single server node by launching or redirecting other nodes to fill a service gap.<\/p>\n<p id=\"6673\" class=\"graf graf--p graf-after--p\"><strong class=\"markup--strong markup--p-strong\">Failback<\/strong>: The restoration of responsibilities to a server node as it recovers from a failure.<\/p>\n<p id=\"d5cb\" class=\"graf graf--p graf-after--p\"><strong class=\"markup--strong markup--p-strong\">Replication<\/strong>: The creation of copies of critical data stores to permit reliable synchronous access from multiple server nodes or clients and to ensure they will survive disasters. Replication is also used to enable reliable load balancing.<\/p>\n<p id=\"cde1\" class=\"graf graf--p graf-after--p\"><strong class=\"markup--strong markup--p-strong\">Redundancy<\/strong>: The provisioning of multiple identical physical or virtual server nodes of which any one can adopt the orphaned clients of another one that fails.<\/p>\n<p id=\"2305\" class=\"graf graf--p graf-after--p\"><strong class=\"markup--strong markup--p-strong\">Split brain<\/strong>: An error state in which network communication between nodes or shared storage has somehow broken down and multiple individual nodes, each believing it\u2019s the only node still active, continue to access and update a common data source. 
While this doesn\u2019t impact shared-nothing designs, it can lead to client errors and data corruption within shared clusters.<\/p>\n<p id=\"55b3\" class=\"graf graf--p graf-after--p\"><strong class=\"markup--strong markup--p-strong\">Fencing<\/strong>: To prevent split brain, the stonithd daemon can be configured to automatically shut down a malfunctioning node or to impose a virtual fence between it and the data resources of the rest of a cluster. As long as there is a chance that the node could still be active, but is not properly coordinating with the rest of the cluster, it will remain behind the fence. Stonith stands for \u201cShoot the other node in the head\u201d. Really.<\/p>\n<p id=\"a8ef\" class=\"graf graf--p graf-after--p\"><strong class=\"markup--strong markup--p-strong\">Quorum<\/strong>: You can configure fencing (or forced shutdown) to be imposed on nodes that have fallen out of contact with each other or with some shared resource. Quorum is often defined as more than half of all the nodes in the cluster. With quorum defined this way, you avoid having two subclusters of nodes, each believing the other to be malfunctioning, attempting to knock the other one out.<\/p>\n<p id=\"adc5\" class=\"graf graf--p graf-after--p\"><strong class=\"markup--strong markup--p-strong\">Disaster Recovery<\/strong>: Your infrastructure can hardly be considered highly available if you\u2019ve got no automated backup system in place along with an integrated and tested disaster recovery plan. Your plan will need to account for the redeployment of each of the servers in your cluster.<\/p>\n<h4 id=\"30fe\" class=\"graf graf--h4 graf-after--p\">Active\/Passive Cluster<\/h4>\n<p id=\"e061\" class=\"graf graf--p graf-after--h4\">The idea behind service failover is that the sudden loss of any one node in a service cluster would quickly be made up by another node taking its place. 
For this to work, the IP address is automatically moved to the standby node in the event of a failover. Alternatively, network routing tools like load balancers can be used to redirect traffic away from failed nodes. The precise way failover happens depends on the way you have configured your nodes.<\/p>\n<p id=\"4913\" class=\"graf graf--p graf-after--p\">Only one node will initially be configured to serve clients, and will continue to do so alone until it somehow fails. The responsibility for existing and new clients will then shift (i.e., \u201cfailover\u201d) to the passive\u200a\u2014\u200aor backup\u200a\u2014\u200anode that until now has been kept passively in reserve. Applying the model to multiple servers or server room components (like power supplies), n+1 redundancy provides just enough resources for the current demand plus one more unit to cover for a failure.<\/p>\n<figure id=\"7c45\" class=\"graf graf--figure graf-after--p\">\n<div class=\"aspectRatioPlaceholder is-locked\">\n<div class=\"aspectRatioPlaceholder-fill\"><\/div>\n<div class=\"progressiveMedia js-progressiveMedia graf-image is-canvasLoaded is-imageLoaded\" data-image-id=\"1*f3FAK-0sBhz347VYjQUsEQ.png\" data-width=\"1280\" data-height=\"720\" data-is-featured=\"true\" data-action=\"zoom\" data-action-value=\"1*f3FAK-0sBhz347VYjQUsEQ.png\" data-scroll=\"native\"><canvas class=\"progressiveMedia-canvas js-progressiveMedia-canvas\" width=\"75\" height=\"40\"><\/canvas><img decoding=\"async\" class=\"progressiveMedia-image js-progressiveMedia-image\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*f3FAK-0sBhz347VYjQUsEQ.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*f3FAK-0sBhz347VYjQUsEQ.png\" \/><\/div>\n<\/div>\n<\/figure>\n<figure id=\"442a\" class=\"graf graf--figure graf-after--figure\">\n<div class=\"aspectRatioPlaceholder is-locked\">\n<div class=\"aspectRatioPlaceholder-fill\"><\/div>\n<div class=\"progressiveMedia js-progressiveMedia graf-image is-canvasLoaded 
is-imageLoaded\" data-image-id=\"1*MXXBRytPbze9nmY0RH_8Ag.png\" data-width=\"1280\" data-height=\"720\" data-action=\"zoom\" data-action-value=\"1*MXXBRytPbze9nmY0RH_8Ag.png\" data-scroll=\"native\"><canvas class=\"progressiveMedia-canvas js-progressiveMedia-canvas\" width=\"75\" height=\"40\"><\/canvas><img decoding=\"async\" class=\"progressiveMedia-image js-progressiveMedia-image\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*MXXBRytPbze9nmY0RH_8Ag.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*MXXBRytPbze9nmY0RH_8Ag.png\" \/><\/div>\n<\/div>\n<\/figure>\n<h4 id=\"63dc\" class=\"graf graf--h4 graf-after--figure\">Active\/Active Cluster<\/h4>\n<p id=\"34dd\" class=\"graf graf--p graf-after--h4\">A cluster using an active\/active design will have two or more identically configured nodes independently serving clients.<\/p>\n<figure id=\"cb7b\" class=\"graf graf--figure graf-after--p\">\n<div class=\"aspectRatioPlaceholder is-locked\">\n<div class=\"aspectRatioPlaceholder-fill\"><\/div>\n<div class=\"progressiveMedia js-progressiveMedia graf-image is-canvasLoaded is-imageLoaded\" data-image-id=\"1*BAZtz_iDmGIj2hy-MvuUPg.png\" data-width=\"1280\" data-height=\"720\" data-action=\"zoom\" data-action-value=\"1*BAZtz_iDmGIj2hy-MvuUPg.png\" data-scroll=\"native\"><canvas class=\"progressiveMedia-canvas js-progressiveMedia-canvas\" width=\"75\" height=\"40\"><\/canvas><img decoding=\"async\" class=\"progressiveMedia-image js-progressiveMedia-image\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*BAZtz_iDmGIj2hy-MvuUPg.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*BAZtz_iDmGIj2hy-MvuUPg.png\" \/><\/div>\n<\/div>\n<\/figure>\n<p id=\"86cb\" class=\"graf graf--p graf-after--figure\">Should one node fail, its clients will automatically connect with the second node and, as far as resources permit, receive full resource access.<\/p>\n<figure id=\"445f\" class=\"graf graf--figure graf-after--p\">\n<div 
class=\"aspectRatioPlaceholder is-locked\">\n<div class=\"aspectRatioPlaceholder-fill\"><\/div>\n<div class=\"progressiveMedia js-progressiveMedia graf-image is-canvasLoaded is-imageLoaded\" data-image-id=\"1*g5Q9d7nhz8cghuLT1l2IzA.png\" data-width=\"1280\" data-height=\"720\" data-action=\"zoom\" data-action-value=\"1*g5Q9d7nhz8cghuLT1l2IzA.png\" data-scroll=\"native\"><canvas class=\"progressiveMedia-canvas js-progressiveMedia-canvas\" width=\"75\" height=\"40\"><\/canvas><img decoding=\"async\" class=\"progressiveMedia-image js-progressiveMedia-image\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*g5Q9d7nhz8cghuLT1l2IzA.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*g5Q9d7nhz8cghuLT1l2IzA.png\" \/><\/div>\n<\/div>\n<\/figure>\n<p id=\"66b9\" class=\"graf graf--p graf-after--figure\">Once the first node recovers or is replaced, clients will once again be split between both server nodes.<\/p>\n<p id=\"78aa\" class=\"graf graf--p graf-after--p\">The primary advantage of running active\/active clusters lies in the ability to efficiently balance a workload between nodes and even networks. The load balancer\u200a\u2014\u200awhich directs all requests from clients to available servers\u200a\u2014\u200ais configured to monitor node and network activity and use some predetermined algorithm to route traffic to those nodes best able to handle it. Routing policies might follow a round-robin pattern, where client requests are simply alternated between available nodes, or by a preset weight where one node is favored over another by some ratio.<\/p>\n<p id=\"862b\" class=\"graf graf--p graf-after--p\">Having a passive node acting as a stand-by replacement for its partner in an active\/passive cluster configuration provides significant built-in redundancy. 
If your operation absolutely requires uninterrupted service and seamless failover transitions, then some variation of an active\/passive architecture should be your goal.<\/p>\n<h4 id=\"beaf\" class=\"graf graf--h4 graf-after--p\">Shared-Nothing vs. Shared-Disk Clusters<\/h4>\n<p id=\"942f\" class=\"graf graf--p graf-after--h4\">One of the guiding principles of distributed computing is to avoid having your operation rely on any single point of failure. That is, every resource should be either actively replicated (redundant) or independently replaceable (failover), and there should be no single element whose failure could bring down your whole service.<\/p>\n<p id=\"20d3\" class=\"graf graf--p graf-after--p\">Now, imagine that you\u2019re running a few dozen nodes that all rely on a single database server for their function. Even though the failure of any number of the nodes will not affect the continued health of those nodes that remain, should the database go down, the entire cluster would become useless. Nodes in a shared-nothing cluster, however, will (usually) maintain their own databases so that\u200a\u2014\u200aassuming they\u2019re being properly synced and configured for ongoing transaction safety\u200a\u2014\u200ano external failure will impact them.<\/p>\n<p id=\"347d\" class=\"graf graf--p graf-after--p\">This will have a more significant impact on a load-balanced cluster, as each load-balanced node has a constant and critical need for simultaneous access to the data. 
The passive node on a simple failover system, however, might be able to survive some time without access.<\/p>\n<figure id=\"c95f\" class=\"graf graf--figure graf-after--p\">\n<div class=\"aspectRatioPlaceholder is-locked\">\n<div class=\"aspectRatioPlaceholder-fill\"><\/div>\n<div class=\"progressiveMedia js-progressiveMedia graf-image is-canvasLoaded is-imageLoaded\" data-image-id=\"1*79d-yp6C8tdPJ5Fiu14Wkg.png\" data-width=\"1280\" data-height=\"720\" data-action=\"zoom\" data-action-value=\"1*79d-yp6C8tdPJ5Fiu14Wkg.png\" data-scroll=\"native\"><canvas class=\"progressiveMedia-canvas js-progressiveMedia-canvas\" width=\"75\" height=\"40\"><\/canvas><img decoding=\"async\" class=\"progressiveMedia-image js-progressiveMedia-image\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*79d-yp6C8tdPJ5Fiu14Wkg.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*79d-yp6C8tdPJ5Fiu14Wkg.png\" \/><\/div>\n<\/div>\n<\/figure>\n<figure id=\"173f\" class=\"graf graf--figure graf-after--figure\">\n<div class=\"aspectRatioPlaceholder is-locked\">\n<div class=\"progressiveMedia js-progressiveMedia graf-image is-canvasLoaded is-imageLoaded\" data-image-id=\"1*C34EBTHibYVGWLZriOZtdQ.png\" data-width=\"1280\" data-height=\"720\" data-action=\"zoom\" data-action-value=\"1*C34EBTHibYVGWLZriOZtdQ.png\" data-scroll=\"native\"><canvas class=\"progressiveMedia-canvas js-progressiveMedia-canvas\" width=\"75\" height=\"40\"><\/canvas><img decoding=\"async\" class=\"progressiveMedia-image js-progressiveMedia-image\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*C34EBTHibYVGWLZriOZtdQ.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*C34EBTHibYVGWLZriOZtdQ.png\" \/><\/div>\n<\/div>\n<\/figure>\n<p id=\"34a4\" class=\"graf graf--p graf-after--figure\">While such a setup might slow down the way the cluster responds to some requests\u200a\u2014\u200apartly because fears of split-brain failures might require periodic fencing through stonith\u200a\u2014\u200athe 
trade-off can be justified for mission-critical deployments where reliability is the primary consideration.<\/p>\n<h4 id=\"3da5\" class=\"graf graf--h4 graf-after--p\">Availability<\/h4>\n<p id=\"bbc0\" class=\"graf graf--p graf-after--h4\">When designing your cluster, you\u2019ll need to have a pretty good sense of just how tolerant you can be of failure. Or, in other words, given the needs of the people or machines consuming your services, how long can a service disruption last before the mob comes pouring through your front gates with pitchforks and flaming torches? It\u2019s important to know this, because the amount of redundancy you build into your design will have an enormous impact on the downtime you will eventually face.<\/p>\n<p id=\"a18c\" class=\"graf graf--p graf-after--p\">Obviously, the system you build for a service that can go down for a weekend without anyone noticing will be very different from an e-commerce site whose customers expect 24\/7 access. As a baseline, you should generally aim for an average availability of at least 99%\u200a\u2014\u200awith some operations requiring significantly higher real-world results. 99% uptime would translate to a loss of just under four days out of every year.<\/p>\n<p id=\"611c\" class=\"graf graf--p graf-after--p\">There is a relatively simple formula you can use to build a useful estimate of Availability (A). The idea is to divide the Mean Time Between Failures (MTBF) by the sum of the MTBF and the Mean Time To Repair (MTTR).<\/p>\n<pre id=\"eac3\" class=\"graf graf--pre graf-after--p\">A = MTBF \/ (MTBF + MTTR)<\/pre>\n<p id=\"fd7e\" class=\"graf graf--p graf-after--pre\">The closer the value of A comes to 1, the more highly available your cluster will be. To obtain a realistic value for MTBF, you\u2019ll probably need to spend time exposing a real system to some serious punishment, and watching it carefully for software, hardware, and networking failures. 
I suppose you could also consult the published life cycle metrics of hardware vendors or large-scale consumers like Backblaze to get an idea of how long heavily-used hardware can be expected to last.<\/p>\n<p id=\"dd31\" class=\"graf graf--p graf-after--p\">The MTTR will be a product of the time it takes your cluster to replace the functionality of a server node that\u2019s failed (a process that\u2019s similar to, though not identical with, disaster recovery\u200a\u2014\u200awhich focuses on quickly replacing failed hardware and connectivity). Ideally, that would be a value as close to zero seconds as possible.<\/p>\n<figure id=\"8b9d\" class=\"graf graf--figure graf-after--p\">\n<div class=\"aspectRatioPlaceholder is-locked\">\n<div class=\"aspectRatioPlaceholder-fill\"><\/div>\n<div class=\"progressiveMedia js-progressiveMedia graf-image is-canvasLoaded is-imageLoaded\" data-image-id=\"1*qGnfvJNjSyUwjFQl0q5OhQ.png\" data-width=\"566\" data-height=\"294\" data-scroll=\"native\"><canvas class=\"progressiveMedia-canvas js-progressiveMedia-canvas\" width=\"75\" height=\"37\"><\/canvas><img decoding=\"async\" class=\"progressiveMedia-image js-progressiveMedia-image\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*qGnfvJNjSyUwjFQl0q5OhQ.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*qGnfvJNjSyUwjFQl0q5OhQ.png\" \/><\/div>\n<\/div><figcaption class=\"imageCaption\">Server Availability<\/figcaption><\/figure>\n<p id=\"a354\" class=\"graf graf--p graf-after--figure\">The problem is that, in the real world, there are usually far too many unknown variables for this formula to be truly accurate, as nodes running different software configurations and built with hardware of varying profiles and ages will have a wide range of life expectancies. 
Nevertheless, it can be a good tool to help you identify the cluster design that\u2019s best for your project.<\/p>\n<p id=\"d143\" class=\"graf graf--p graf-after--p\">With that information, you can easily generate an estimate of how much overall downtime your service will likely experience over the course of an entire year.<\/p>\n<p id=\"ad51\" class=\"graf graf--p graf-after--p\">A related consideration, if you\u2019re deploying your resources on a third-party platform provider like VMware or Amazon Web Services, is the provider\u2019s Service Level Agreement (SLA). Amazon\u2019s EC2, for instance, guarantees that its compute instances and block store storage devices will deliver a Monthly Uptime Percentage of at least 99.95%\u200a\u2014\u200awhich is less than five hours\u2019 downtime per year. AWS will issue credits for months in which it misses its targets\u200a\u2014\u200athough not nearly enough to compensate for the total business costs of your downtime. With that information, you can arrange for a level of service redundancy that\u2019s suitable for your unique needs.<\/p>\n<p id=\"7462\" class=\"graf graf--p graf-after--p\">Naturally, as a service provider to your own customers, you may need to publish your own SLA based on your MTBF and MTTR estimates.<\/p>\n<h4 id=\"096f\" class=\"graf graf--h4 graf-after--p\">Session Handling<\/h4>\n<p id=\"8565\" class=\"graf graf--p graf-after--h4\">For any server-client relationship, the data generated by stateful HTTP sessions needs to be saved in a way that makes it available for future interactions. 
Cluster architectures can introduce serious complexity into these relationships, as the specific server a client or user interacts with might change between one step and the next.<\/p>\n<p id=\"03a0\" class=\"graf graf--p graf-after--p\">To illustrate, imagine you\u2019re logged onto Amazon.com, browsing through their books on LPIC training, and periodically adding an item to your cart (hopefully, more copies of this book). By the time you\u2019re ready to enter your payment information and check out, however, the server you used to browse may no longer even exist. How will your current server know which books you decided to purchase?<\/p>\n<p id=\"5d59\" class=\"graf graf--p graf-after--p\">I don\u2019t know exactly how Amazon handles this (but you might get some hints from my Manning \u201c<a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/www.manning.com\/books\/learn-amazon-web-services-in-a-month-of-lunches?a_aid=bootstrap-it&amp;amp;a_bid=1c1b5e27\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/www.manning.com\/books\/learn-amazon-web-services-in-a-month-of-lunches?a_aid=bootstrap-it&amp;amp;a_bid=1c1b5e27\">Learn AWS in a Month of Lunches<\/a>\u201d book), but the problem is often addressed by keeping session data in a shared in-memory store like memcached running on an external node (or nodes). 
The goal is to provide constant access to a reliable and consistent data source to any node that might need it.<\/p>\n<p id=\"c460\" class=\"graf graf--p graf-after--p graf--trailing\"><em class=\"markup--em markup--p-em\">This article is adapted from \u201c<\/em><a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/bootstrap-it.com\/index.php\/books\/\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/bootstrap-it.com\/index.php\/books\/\"><em class=\"markup--em markup--p-em\">Teach Yourself Linux Virtualization and High Availability: prepare for the LPIC-3 304 certification exam<\/em><\/a><em class=\"markup--em markup--p-em\">\u201d. 
It was also published on <a href=\"https:\/\/medium.com\/@dbclin\">Medium<\/a>. Check out my other books on AWS and Linux administration.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This article is adapted from chapter 7 of my book,\u00a0Teach Yourself Linux Virtualization and High Availability: prepare for the LPIC-3 304 certification exam. Let\u2019s focus more on some of the larger architectural principles of cluster management than on any single&hellip; <a href=\"https:\/\/bootstrap-it.com\/blog\/?p=288\" class=\"more-link\">Continue Reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":296,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-288","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.2.1 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>High Availability: Concepts and Theory - Bootstrap IT<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/bootstrap-it.com\/blog\/?p=288\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"High Availability: Concepts and Theory - Bootstrap IT\" \/>\n<meta property=\"og:description\" content=\"This article is adapted from chapter 7 of my book,\u00a0Teach Yourself Linux Virtualization and High Availability: prepare for the LPIC-3 304 certification exam. 
Let\u2019s focus more on some of the larger architectural principles of cluster management than on any single&hellip; Continue Reading &rarr;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/bootstrap-it.com\/blog\/?p=288\" \/>\n<meta property=\"og:site_name\" content=\"Bootstrap IT\" \/>\n<meta property=\"article:published_time\" content=\"2017-11-10T15:01:47+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2017-11-10T15:09:04+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/bootstrap-it.com\/blog\/wp-content\/uploads\/ha1.png\" \/>\n\t<meta property=\"og:image:width\" content=\"800\" \/>\n\t<meta property=\"og:image:height\" content=\"450\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"dbclin\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@davidbclinton\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"dbclin\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"10 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/bootstrap-it.com\/blog\/?p=288\",\"url\":\"https:\/\/bootstrap-it.com\/blog\/?p=288\",\"name\":\"High Availability: Concepts and Theory - Bootstrap IT\",\"isPartOf\":{\"@id\":\"https:\/\/bootstrap-it.com\/blog\/#website\"},\"datePublished\":\"2017-11-10T15:01:47+00:00\",\"dateModified\":\"2017-11-10T15:09:04+00:00\",\"author\":{\"@id\":\"https:\/\/bootstrap-it.com\/blog\/#\/schema\/person\/ae0fb1d5b3b01558b92b6426d77766ec\"},\"breadcrumb\":{\"@id\":\"https:\/\/bootstrap-it.com\/blog\/?p=288#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/bootstrap-it.com\/blog\/?p=288\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/bootstrap-it.com\/blog\/?p=288#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/bootstrap-it.com\/blog\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"High Availability: Concepts and Theory\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/bootstrap-it.com\/blog\/#website\",\"url\":\"https:\/\/bootstrap-it.com\/blog\/\",\"name\":\"Bootstrap IT\",\"description\":\"Learn technology using technology\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/bootstrap-it.com\/blog\/?s={search_term_string}\"},\"query-input\":\"required 
name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/bootstrap-it.com\/blog\/#\/schema\/person\/ae0fb1d5b3b01558b92b6426d77766ec\",\"name\":\"dbclin\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/bootstrap-it.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/a93785d437350478a7f1dfcbec58d26bc28e0124e405179acbe1b4325c09f90a?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/a93785d437350478a7f1dfcbec58d26bc28e0124e405179acbe1b4325c09f90a?s=96&d=mm&r=g\",\"caption\":\"dbclin\"},\"sameAs\":[\"http:\/\/bootstrap-it.com\/\",\"dbclinton\",\"https:\/\/twitter.com\/davidbclinton\"],\"url\":\"https:\/\/bootstrap-it.com\/blog\/?author=1\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"High Availability: Concepts and Theory - Bootstrap IT","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/bootstrap-it.com\/blog\/?p=288","og_locale":"en_US","og_type":"article","og_title":"High Availability: Concepts and Theory - Bootstrap IT","og_description":"This article is adapted from chapter 7 of my book,\u00a0Teach Yourself Linux Virtualization and High Availability: prepare for the LPIC-3 304 certification exam. 
Let\u2019s focus more on some of the larger architectural principles of cluster management than on any single&hellip; Continue Reading &rarr;","og_url":"https:\/\/bootstrap-it.com\/blog\/?p=288","og_site_name":"Bootstrap IT","article_published_time":"2017-11-10T15:01:47+00:00","article_modified_time":"2017-11-10T15:09:04+00:00","og_image":[{"width":800,"height":450,"url":"https:\/\/bootstrap-it.com\/blog\/wp-content\/uploads\/ha1.png","type":"image\/png"}],"author":"dbclin","twitter_card":"summary_large_image","twitter_creator":"@davidbclinton","twitter_misc":{"Written by":"dbclin","Est. reading time":"10 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/bootstrap-it.com\/blog\/?p=288","url":"https:\/\/bootstrap-it.com\/blog\/?p=288","name":"High Availability: Concepts and Theory - Bootstrap IT","isPartOf":{"@id":"https:\/\/bootstrap-it.com\/blog\/#website"},"datePublished":"2017-11-10T15:01:47+00:00","dateModified":"2017-11-10T15:09:04+00:00","author":{"@id":"https:\/\/bootstrap-it.com\/blog\/#\/schema\/person\/ae0fb1d5b3b01558b92b6426d77766ec"},"breadcrumb":{"@id":"https:\/\/bootstrap-it.com\/blog\/?p=288#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/bootstrap-it.com\/blog\/?p=288"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/bootstrap-it.com\/blog\/?p=288#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/bootstrap-it.com\/blog"},{"@type":"ListItem","position":2,"name":"High Availability: Concepts and Theory"}]},{"@type":"WebSite","@id":"https:\/\/bootstrap-it.com\/blog\/#website","url":"https:\/\/bootstrap-it.com\/blog\/","name":"Bootstrap IT","description":"Learn technology using technology","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/bootstrap-it.com\/blog\/?s={search_term_string}"},"query-input":"required 
name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/bootstrap-it.com\/blog\/#\/schema\/person\/ae0fb1d5b3b01558b92b6426d77766ec","name":"dbclin","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/bootstrap-it.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/a93785d437350478a7f1dfcbec58d26bc28e0124e405179acbe1b4325c09f90a?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/a93785d437350478a7f1dfcbec58d26bc28e0124e405179acbe1b4325c09f90a?s=96&d=mm&r=g","caption":"dbclin"},"sameAs":["http:\/\/bootstrap-it.com\/","dbclinton","https:\/\/twitter.com\/davidbclinton"],"url":"https:\/\/bootstrap-it.com\/blog\/?author=1"}]}},"_links":{"self":[{"href":"https:\/\/bootstrap-it.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/288","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/bootstrap-it.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/bootstrap-it.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/bootstrap-it.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/bootstrap-it.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=288"}],"version-history":[{"count":1,"href":"https:\/\/bootstrap-it.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/288\/revisions"}],"predecessor-version":[{"id":289,"href":"https:\/\/bootstrap-it.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/288\/revisions\/289"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/bootstrap-it.com\/blog\/index.php?rest_route=\/wp\/v2\/media\/296"}],"wp:attachment":[{"href":"https:\/\/bootstrap-it.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=288"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/bootstrap-it.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=288"},{"taxonomy":"post_tag","embeddable":true,"href"
:"https:\/\/bootstrap-it.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=288"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}