(function(doc, html, url) {
  var widget = doc.createElement("div");
  widget.innerHTML = html;
  // Prefer the script element currently executing; otherwise search by src.
  var script = doc.currentScript;
  if (!script) {
    var scripts = doc.scripts;
    for (var i = 0; i < scripts.length; ++i) {
      if (scripts[i].src && scripts[i].src.indexOf(url) != -1) { script = scripts[i]; break; }
    }
  }
  // Replace the embedding script tag with the rendered widget, if it was found.
  if (script && script.parentElement) script.parentElement.replaceChild(widget, script);
}(document, '

Increasing reliability of Amazon EC2 spot instances with a fault-tolerant multi-agent architecture

What is it about?

Cloud providers have recently begun offering their unused resources as transient instances. Amazon sells idle cloud resources as spot instances, priced through an auction-based market mechanism, which lowers cost but comes without any availability guarantee. Dynamically and autonomously managing cloud resources so that user applications run reliably on cheaper spot instances is therefore an open problem. In this context, we propose a fault-tolerant multi-agent architecture that acts as middleware between cloud providers and users, mediating access to a wide range of heterogeneous resources and providing a resilient application execution environment with a dynamic, flexible fault-tolerance mechanism based on adaptive checkpointing. Our architecture combines a case-based reasoning model with a survival analysis model to predict failure events and to refine fault-tolerance plans with suitable parameters, increasing reliability while optimizing total execution time and cost. We evaluated the proposed architecture with real historical data on Amazon EC2 price changes, comprising approximately 21 million records, from which 1,362,816 scenarios were generated and stored in our case knowledge database. Considering time to revocation, the results achieved high accuracy (98%) and a gain of up to 74.48% in total execution time, while also reducing total cost compared with other approaches in the literature.
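The abstract does not detail the survival model or how checkpoint parameters are derived. The JavaScript below is a minimal sketch under stated assumptions. It swaps in a textbook pairing: a Kaplan-Meier estimate of the time to revocation, whose restricted mean feeds the Young (1974) first-order approximation of the optimal checkpoint interval, tau = sqrt(2 C M). Here C is the checkpoint cost and M is the mean time to failure. This is an illustration only, not the method described in the paper. The record fields `hours` and `revoked`, the function names, and the sample values are all hypothetical.

```javascript
// Hypothetical record format: hours an instance ran, and whether it ended by
// revocation (true) or was censored, i.e. stopped for another reason (false).

/** Kaplan-Meier estimate of S(t) = P(time to revocation > t). */
function kaplanMeier(observations) {
  const times = [...new Set(observations.filter(o => o.revoked).map(o => o.hours))]
    .sort((a, b) => a - b);
  const curve = [];
  let survival = 1;
  for (const t of times) {
    // At risk: instances still running just before time t (censored ones included).
    const atRisk = observations.filter(o => o.hours >= t).length;
    // Revocations observed exactly at time t.
    const events = observations.filter(o => o.revoked && o.hours === t).length;
    survival *= 1 - events / atRisk;
    curve.push({ hours: t, survival });
  }
  return curve;
}

/** Area under the step function S(t) from 0 to horizon (restricted mean survival time). */
function restrictedMean(curve, horizon) {
  let area = 0, prevTime = 0, prevSurvival = 1;
  for (const { hours, survival } of curve) {
    if (hours >= horizon) break;
    area += prevSurvival * (hours - prevTime);
    prevTime = hours;
    prevSurvival = survival;
  }
  return area + prevSurvival * (horizon - prevTime);
}

/** Young (1974) first-order approximation of the optimal checkpoint interval. */
function youngInterval(checkpointCostHours, meanTimeToFailureHours) {
  return Math.sqrt(2 * checkpointCostHours * meanTimeToFailureHours);
}

// Usage with a tiny hypothetical sample.
const sample = [
  { hours: 2, revoked: true },
  { hours: 4, revoked: false }, // censored: stopped before any revocation
  { hours: 5, revoked: true },
  { hours: 8, revoked: true },
];
const curve = kaplanMeier(sample);
const meanHours = restrictedMean(curve, 8); // 5.375 for this sample
const intervalHours = youngInterval(0.1, meanHours); // about 1.04 h between checkpoints
```

Censored runs stay in the at-risk set until they stop. This makes survival analysis a natural fit for revocation data, where many instances finish before they are ever revoked. An adaptive scheme could, for example, recompute the interval whenever new price or revocation history arrives.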

Why is it important?

Spot instances are cheap but can be revoked at any time, with no availability guarantee. To address this, we propose a fault-tolerant multi-agent architecture that acts as middleware between cloud providers and users. It mediates access to a wide range of heterogeneous resources and provides a resilient application execution environment with a dynamic, flexible fault-tolerance mechanism based on adaptive checkpointing.

The following have contributed to this page:
Pergentino Araujo
' ,"url"));