
DataStax Enterprise and the Cloud: Why migrating to a NoSQL database might be your organization’s smartest move

Todd Homa

Director, Big Data Technology Solutions, EPAM

Apache Cassandra™ is a NoSQL database that was developed as a fast, scalable, and cost-effective data management platform for cloud applications that need to scale and perform in distributed environments. EPAM and DataStax are collaborating to integrate data centers at scale with DataStax Enterprise (DSE), the database for cloud applications, based on Apache Cassandra. Many industries are migrating their entire infrastructures to the cloud as a measure to reduce cost, but there are several other ways in which migrating to DSE can increase business efficiencies.


How DSE can improve business

I’m finding that more and more enterprises are looking to migrate their RDBMS to NoSQL in order to cut costs and streamline processes. With the cloud offering greater opportunities to increase resource efficiencies, many organizations can benefit from migrating to DSE for a variety of reasons:

1. Geographic distribution – DSE is a distributed system by design. Adding data centers to provide geographic proximity to customers is simple and intuitive. In the RDBMS world, this is a much more complex task and sometimes requires additional tooling.

2. Linear scalability – DSE scales horizontally by adding nodes (machines). Netflix has run a performance test showing that Cassandra scales linearly as nodes are added. Oracle and DB2 cannot scale linearly because of a bottleneck in their fundamental design: both products let you add processing resources, but both rely on shared data sets, so the I/O system for your data becomes the bottleneck. This is NOT the case with a fully distributed system (both storage and processing) like DSE.

3. 100% Uptime – Sure, this might be a bit of a stretch. No system can guarantee 100% uptime. However, DSE is designed with failure in mind, and you configure data redundancy based on your tolerance for failure. In theory, you could configure DSE so that all data is replicated to every node. In this scenario, the system would stay up even if only one node were up. This is not practical from a performance perspective, so most people use a replication factor of three, which keeps three copies of the data at all times. Replication can also be configured across data centers, which provides even more redundancy.

4. Cost reduction – In my anecdotal experience, running a DSE cluster has cost 80-90% less than running an equivalent RDBMS platform. Cost is determined by two major components: licensing and scalability. In terms of licensing, the per-node cost for DSE is far lower than that of traditional RDBMS vendors. When it comes to scalability, traditional RDBMS platforms typically scale vertically, meaning that more hardware is added to a single machine to improve performance. This is cost effective only up to the point where very high-end hardware yields diminishing returns. DSE, like other distributed systems, scales horizontally by adding more nodes to the cluster. This reduces the overall cost of the platform in the long run.
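
To make the replication point in item 3 concrete, here is a minimal sketch in Python. It is a toy model, not DSE code: it assumes each data range is copied to its owning node plus the next nodes on a ring, and it treats the system as "up" when every range still has at least one live copy. The helper names, the keyspace name and the data center names are all hypothetical. The last helper only builds the text of a CQL CREATE KEYSPACE statement with per-data-center replica counts; it does not connect to a cluster.

```python
def replicas_for(first_node, node_count, replication_factor):
    """Nodes holding copies of one data range: the owner plus the next nodes on the ring."""
    copies = min(replication_factor, node_count)
    return [(first_node + i) % node_count for i in range(copies)]


def all_data_available(node_count, replication_factor, down_nodes):
    """True if every data range still has at least one live copy (toy model)."""
    down = set(down_nodes)
    return all(
        any(node not in down
            for node in replicas_for(start, node_count, replication_factor))
        for start in range(node_count)
    )


def keyspace_cql(keyspace, replicas_per_dc):
    """Build a CQL statement that sets a replica count per data center.

    The keyspace and data center names passed in are placeholders (assumptions).
    """
    dcs = ", ".join(f"'{dc}': {rf}" for dc, rf in replicas_per_dc.items())
    return (
        f"CREATE KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {dcs}}};"
    )


if __name__ == "__main__":
    # Six nodes, replication factor three: two adjacent failures are survivable.
    print(all_data_available(6, 3, {0, 1}))           # True
    # Three adjacent failures take out every copy of one data range.
    print(all_data_available(6, 3, {0, 1, 2}))        # False
    # Replicating to every node: a single surviving node still holds all data.
    print(all_data_available(6, 6, {0, 1, 2, 3, 4}))  # True
    # Three copies in each of two hypothetical data centers.
    print(keyspace_cql("shop", {"us_east": 3, "eu_west": 3}))
```

The model ignores performance, which is exactly why full replication stays theoretical: every write would have to reach every node. A replication factor of three tolerates the loss of any two nodes without making data unreachable.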

DSE is particularly useful for big data applications, where modern businesses need to process massive amounts of information, and especially for scalable cloud technology solutions. Migrating to DSE could be a smart move for your business, providing better performance at a lower cost.
