Built in NYC – by Colin Hanner
Is it too much to ask for fast, reliable and easy-to-use data pipelines that eliminate redundancies?
For data scientists, the answer is often “yes,” especially as companies grow and need to scale rapidly. (The resource itself is expanding, too: the International Data Corporation predicts the world’s data will grow to 163 zettabytes by 2025, 10 times the amount of data generated in 2016.)
Massive amounts of data being transferred from a source can cause significant wait times, and discrepancies in data can lead to lots of manual fixes down the line. Simply put, there’s a lot that can go wrong, and that’s not ideal when there’s a business, or a business’s business, at stake.
Of course, there are many ways to build and scale data pipelines. Vahid Saberi, a senior data scientist at EPAM Systems, combats these issues by leveraging Spark, an analytics engine for large-scale data processing that helps keep the pipelines EPAM builds for its clients fast and efficient.