When one cute little source for data gets you thinking about the world of really big data
It’s been absolutely fascinating to see just how much data we’re creating in today’s world. Let me give you a fun example. About a month ago, I became a new dad! And, already, our baby has a pretty big digital footprint. In fact, I even wrote an entire blog around becoming a cloud-powered parent, utilizing smart tools, analyzing data, and even looking at patterns.
Think it’s crazy? Well, working with data patterns and cognitive systems not only gets me, as a parent, a bit of extra sleep; it may very well become the new normal in our everyday lives.
Consider this: cognitive systems can greatly step up the frequency, flexibility, and immediacy of data analysis across a range of industries, circumstances, and applications. IDC estimates that the amount of the global datasphere subject to data analysis will grow by a factor of 50 to 5.2ZB in 2025, and the amount of analyzed data that is “touched” by cognitive systems will grow by a factor of 100 to 1.4ZB in 2025!
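To put those growth factors in perspective, here is a quick back-of-the-envelope sketch. It only divides the projected 2025 figures by the quoted growth factors to get the starting sizes they imply. The function and variable names are my own, and the baselines are derived from the quoted numbers, not additional IDC figures.

```python
# Figures quoted from the IDC estimate: (growth factor, projected 2025 size in ZB).
projections = {
    "subject to data analysis": (50, 5.2),
    "touched by cognitive systems": (100, 1.4),
}


def implied_baseline_zb(growth_factor: float, projected_zb: float) -> float:
    """Return the starting size implied by 'grows by a factor of X to Y'."""
    return projected_zb / growth_factor


baselines = {
    name: implied_baseline_zb(factor, zb)
    for name, (factor, zb) in projections.items()
}

for name, zb in baselines.items():
    # 1 ZB = 1,000 EB, so convert for easier reading.
    print(f"{name}: about {zb * 1000:.0f} EB before the projected growth")
```

In other words, the quoted figures imply starting points of roughly 104 EB and 14 EB respectively.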
Those are some pretty big stats in a world that continues to become even more digitized. But what are the practical applications here? Many organizations certainly know that they have a lot of data, but they don’t entirely know what to do with it.
Earlier this month, I discussed how success with IT initiatives isn't just about creating data; it's about making good use of that data. I covered some real barriers to better data utilization including:
- Lost data points and silos
- Losing control of connected devices
- Forgetting about security
- Not understanding the difference between embedded data and productivity data
- Ensuring infrastructure keeps up with data creation
So, you’re in an organization with a lot of data. You’re housing it in various locations and you have a business initiative to leverage this information to help you make better decisions. This is where lots of folks get stuck. That is…
Step 1: Create data
Step 2: ???
Step 3: Profit
That being said, let’s look at a few tools that can really help you grasp the power of data.
Big data engines
You’ve heard of them and you’ve read about them. But what is a big data engine? Quite simply, big data engines allow you to examine and analyze large amounts of data. The key purpose is to uncover patterns and insights, and even correlate different types of data points. At this scale, that’s something humans and simple technologies can’t do. Big data engines are, arguably, your starting point for creating business intelligence.
Already, industries all over the world are leveraging big data engines to stay competitive and to make better business and market decisions. There are a few parts to big data. They include data management, the mining process, in-memory analytics, and the engine itself. This could be Hadoop, MapR, Google Bigdata, Cloudera, Hortonworks, MongoDB, Azure big data and analytics, and many more. The choice of design and engine will really depend on your data set and your own use-cases.
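To make "correlating different types of data points" a bit more concrete, here is a toy-scale sketch in plain Python. It is not how any of the engines above work internally. It just computes a Pearson correlation for every pair of metrics, which is the kind of question an engine answers across far larger data sets. The metric names and values are made up for illustration.

```python
from itertools import combinations
from math import sqrt

# Hypothetical daily readings; names and values are made up for illustration.
metrics = {
    "site_visits": [120, 150, 170, 200, 260, 300],
    "orders": [12, 14, 18, 21, 25, 31],
    "support_tickets": [9, 8, 8, 6, 5, 3],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Compare every pair of metrics and rank by strength of relationship.
pairs = {
    (a, b): pearson(metrics[a], metrics[b])
    for a, b in combinations(metrics, 2)
}
for (a, b), r in sorted(pairs.items(), key=lambda kv: -abs(kv[1])):
    print(f"{a} vs {b}: r = {r:+.2f}")
```

With three metrics this is trivial. With thousands of metrics and billions of rows, the same pairwise comparison is where a distributed engine starts to earn its keep.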
Data warehousing
Oftentimes, when I discuss data warehousing, I get the term "database" thrown in there. Let’s start here: a data warehouse is not a database. Although you could argue that they’re both relational data systems, they absolutely serve different purposes. Data warehousing allows you to pull data together from a number of different sources so you can analyze and report on it. Data warehouses store vast amounts of historical data and support fast as well as complex queries across all the data types being pulled together. There are lots of use-cases for a data warehouse as well. For example, if you’re doing very large amounts of data mining and require an intelligent "warehouse" to store all of this data, a data warehouse is far better than a traditional database.
Similarly, if you need to quantify vast amounts of market data in depth, a data warehouse could help. You can use that data to understand the behavior of users in an online business to help make better decisions around services and products.
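Here is a minimal sketch of the "pull data together from different sources, then report on it" idea. One big caveat: it uses Python's built-in `sqlite3` module, which is an ordinary database and only stands in for a real warehouse so the example stays self-contained. The source systems, table name, and figures are all hypothetical.

```python
import sqlite3

# Hypothetical extracts from two separate source systems.
web_orders = [("2024-01", "widget", 120.0), ("2024-02", "widget", 150.0)]
store_orders = [("2024-01", "widget", 80.0), ("2024-02", "gadget", 200.0)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (month TEXT, product TEXT, revenue REAL, source TEXT)"
)

# Load step: tag each row with its origin so the sources stay traceable.
for source, rows in (("web", web_orders), ("store", store_orders)):
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?, ?)",
        [(month, product, revenue, source) for month, product, revenue in rows],
    )

# Reporting step: one query spans every source at once.
report = conn.execute(
    "SELECT month, SUM(revenue) FROM sales GROUP BY month ORDER BY month"
).fetchall()

for month, total in report:
    print(month, total)
```

The point is the shape of the workflow, not the tool: data from several systems lands in one place with a shared structure, and reporting queries run against the combined history.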
While we’re on this topic, you may have heard the term "data lake". This is a newer data processing technology, which focuses on structured, semi-structured, unstructured, and raw data points for analysis. Data warehouses, on the other hand, only look at structured and processed data. Where data warehousing can be used by business professionals, a data lake is more commonly used by data scientists. As for examples, data warehousing services include Amazon Redshift, Google BigQuery, and Panoply. Data lake examples include Amazon S3 as well as the Azure Blob Storage service.
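The difference is easier to see side by side. In this rough sketch, the "lake" keeps every record exactly as it arrived, while the "warehouse" only accepts records that fit a fixed structure. The event formats, the `REQUIRED_FIELDS` schema, and the helper name are assumptions made up for illustration, not any vendor's behavior.

```python
import json

# Hypothetical raw events as they might arrive from different systems.
raw_events = [
    '{"user": "a1", "action": "view", "ms": 340}',
    '{"user": "b2", "action": "buy", "amount": 19.99}',
    "free-text note: customer called about a late delivery",
]

# "Lake" style: keep everything exactly as received, interpret it later.
lake = list(raw_events)

# "Warehouse" style: only rows that fit a fixed schema are loaded.
REQUIRED_FIELDS = ("user", "action")


def to_warehouse_row(raw: str):
    """Return a (user, action) tuple, or None if the record does not fit."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    if any(field not in record for field in REQUIRED_FIELDS):
        return None
    return tuple(record[field] for field in REQUIRED_FIELDS)


warehouse = [
    row for raw in raw_events if (row := to_warehouse_row(raw)) is not None
]

print(len(lake), "records in the lake")
print(len(warehouse), "rows in the warehouse:", warehouse)
```

Notice that the free-text note survives in the lake but never reaches the warehouse. That is the trade-off: the lake preserves everything for later exploration, while the warehouse gives you clean, predictable rows to query.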
Data visualization
This data analytics technology is really cool! When you gather all of your data, a key sticking point is often the inability to visualize the information in a productive manner. Data visualization allows you to see information as a picture, graph, or illustrated collection of data. The point is that it helps you interact with your data to see new concepts and patterns that were once difficult to grasp. From there, you can drill much deeper into your charts and graphs to really understand how data is changing and impacting your business. Plus, there are some powerful tools that can help with data visualization. They include Microsoft Power BI, Tableau, IBM Watson, SAP Analytics, and Google Analytics.
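Even a bare-bones picture makes patterns jump out faster than a column of numbers. The sketch below draws a plain-text bar chart using nothing but the Python standard library. It is obviously no substitute for the tools listed above, and the monthly figures and the `bar_chart` helper are made up for illustration.

```python
# Hypothetical monthly totals; the numbers are made up for illustration.
monthly_orders = {"Jan": 120, "Feb": 180, "Mar": 90, "Apr": 240}


def bar_chart(data: dict, width: int = 40) -> list:
    """Render each value as a row of '#' scaled so the largest fills `width`."""
    largest = max(data.values())
    lines = []
    for label, value in data.items():
        bar = "#" * round(value / largest * width)
        lines.append(f"{label:>4} | {bar} {value}")
    return lines


chart = bar_chart(monthly_orders)
print("\n".join(chart))
```

The March dip and the April spike are visible at a glance, which is exactly the kind of quick read that a dashboard gives you across far more dimensions.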
This next part is important – there are numerous different approaches to data analytics. I didn’t even have time to get into predictive analytics!
The bottom line is that you have options around on-premises architectures, cloud-driven solutions, and the hybrid option as well. The first step in any data journey will be your own exploration of data repositories, how you create data, and the structure of your data. That is, what are the sources, is the data structured or unstructured, and are there governance, risk, and compliance (GRC) requirements or regulations wrapped around it? All of this will dictate the type of solution you’ll leverage and what can be most effective.
Navigating the landscape of your own data really shouldn’t be a solo journey. This is a big reason why there are entire professions around data science and analytics. Be sure to leverage these experts to help you get the most out of your own data sets.
Bill is an enthusiastic technologist with experience in a variety of industries. This includes data center, cloud, virtualization, security, AI, mobility, edge solutions, and much more. His architecture work includes large virtualization and cloud deployments as well as focusing on overcoming emerging business challenges. Bill enjoys writing, blogging, and educating colleagues around everything that is technology. During the day, Bill is the Director of Technology Solutions at EPAM where he works with AI, machine learning, blockchain solutions, DevOps, cloud, and advanced technologies to help engineer the digital future. Bill's whitepapers, articles, video blogs and podcasts have been published and referenced on WindowsITPro, Data Center Knowledge, InformationWeek, Network Computing, TechTarget, Dark Reading, Forbes, CBS Interactive, Slashdot, and many others. As an active member of the technology industry, Bill was ranked #16 globally in an Onalytica study of the top 100 most influential individuals in the cloud landscape; and #4 in a different Onalytica study reviewing the industry's top data security experts.