So, what is big data?
Watching a recommended TV show on Netflix? Shopping on Amazon? Browsing Chrome? Clicking a cookie pop-up? Using a TikTok filter?
If yes, big data technologies are firmly a part of your life.
All of these services are collecting and processing massive amounts of diverse data known nowadays as big data.
In essence, big data is a buzzword standing for explosive growth in data and the emergence of advanced tools and techniques to uncover patterns in it.
Many define big data by four Vs: Volume, Velocity, Variety, and Veracity.
- Volume: It’s petabytes, or even exabytes, of data.
- Velocity: The pace at which data is flowing in is mind-boggling: 1.7 megabytes of data is created every second per person.
- Variety: Big data is mixed data, including both structured and raw, unstructured data from social media feeds, emails, search indexes, medical images, voice recordings, video, and many other sources.
- Veracity: A significant part of big data is associated with uncertainty and imprecision.
Big data undergoes a few stages to deliver insights.
Why has big data come into prominence?
From marketing intelligence enabling personalized offers to predictive maintenance, real-time alerts, innovative products, and next-level supply chains, leading companies that know how to deal with big data challenges reap enormous benefits across industries from data analytics and data science.
But big data is so massive, so messy, and so ridiculously fast-growing that it’s next to impossible to analyze it using traditional systems and techniques.
The hottest technologies of today — cloud computing, artificial intelligence, and more seamless analytics tools — have made the task accomplishable. There are a few problems with big data, though. Read on.
Challenges of big data: What stands in the way of digital nirvana?
Despite new technology solutions deluging the market, a slew of big data problems drag down digital transformation efforts. Fewer than half of companies say in a new study from NewVantage Partners that they are driving innovation with data or competing on analytics.
Most companies (92%) cite people, business processes, and culture as principal big data challenges. Only 8% attribute major big data barriers to technology limitations. What exactly is the problem with big data implementation?
ITRex CEO Vitali Likhadzed sat down with us to discuss common big data issues faced by companies and ways to fix them. Here is his insightful analysis that covers the five biggest big data pitfalls:
- Data silos and poor data quality
- Lack of coordination to steer big data/AI initiatives
- Skills shortage
- Solving the wrong problem
- Dated data and inability to operationalize insights
Big data challenge 1: Data silos and poor data quality
The problem with any data in any organization is always that it is kept in different places and in different formats. A simple task like having a look at production costs might be daunting for a manager when finance is keeping tabs on supplies expenses, payroll, and other financial data, as it should do, while information from machines on the manufacturing floor is sitting unintegrated in the production department’s database, as it shouldn’t.
Another major challenge with big data is that it’s never 100% consistent. Getting a detailed overview of shipments to, say, India can also be a problem for our plant in question, if the sales team handles local clients under the India tag, production uses the IND acronym while finance has gone for a totally different country code. The varying levels of data granularity they may apply for managing their databases only rub more salt in the wound of big data analytics.
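To make the country tag problem concrete, here is a minimal Python sketch of one possible fix: map every team-specific tag to a single canonical code before analysis. The alias table, the sample shipments, and the assumption that finance uses the ISO numeric code are all hypothetical and only serve as an illustration.

```python
# Hypothetical alias table: every team-specific tag maps to one canonical code.
COUNTRY_ALIASES = {
    "india": "IN",  # tag assumed for the sales team
    "ind": "IN",    # acronym assumed for production
    "356": "IN",    # ISO 3166-1 numeric code, assumed here for finance
    "in": "IN",
}


def normalize_country(tag: str) -> str:
    """Return the canonical country code for a team-specific tag."""
    key = tag.strip().lower()
    if key not in COUNTRY_ALIASES:
        # Fail loudly so unknown tags get reviewed instead of silently dropped.
        raise ValueError(f"Unmapped country tag: {tag!r}")
    return COUNTRY_ALIASES[key]


# Made-up shipment rows as three teams might record them.
shipments = [
    {"team": "sales", "country": "India", "units": 120},
    {"team": "production", "country": "IND", "units": 80},
    {"team": "finance", "country": "356", "units": 50},
]

# Once tags are normalized, one filter covers all three teams.
total_units = sum(
    row["units"] for row in shipments if normalize_country(row["country"]) == "IN"
)
print(total_units)  # 250
```

In practice the alias table would live in a shared, governed location rather than in code, so that all teams maintain it together.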
Finally, data is prone to errors. The more datasets you have, the more likely you are to get the same data misstated with different types and margins of error. There can also be duplicate records multiplying challenges for your big data analytics.
Building a data governance framework is a non-negotiable imperative if you want workable data. This framework establishes policies, procedures, and processes to set the bar for the quality of your data, make it visible, and install solid safeguards (if you by any chance don’t have data security and privacy on your radar, you should — non-compliance with regulatory requirements like GDPR and CCPA is punished painfully). It’s important to align your data governance with business needs. If you are in healthcare, for instance, it definitely should be centered around compliance with HIPAA or other industry standards.
With robust data governance in place, you will be well equipped to address the quality and consistency challenges with big data by implementing master data and metadata management practices.
A consolidation model is a good choice for managing master data (your key business data about customers, products, suppliers, or locations). In this approach, master data is merged from different sources into a central repository that acts as a single version of truth, or the “golden record.” This helps eliminate the duplication and redundancy problem with big data.
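The consolidation idea can be sketched in a few lines of Python. This is only an illustration under simple assumptions: records share a reliable `customer_id` key, and the newest non-empty value wins for each field. The source systems, field names, and survivorship rule are hypothetical; real master data management tools offer far richer matching and merging logic.

```python
from datetime import date

# Hypothetical customer records from two source systems.
crm_records = [
    {"customer_id": "C001", "name": "Acme Ltd", "email": None,
     "updated": date(2021, 3, 1)},
]
billing_records = [
    {"customer_id": "C001", "name": "ACME Limited", "email": "ap@acme.example",
     "updated": date(2021, 6, 15)},
    {"customer_id": "C002", "name": "Globex", "email": "info@globex.example",
     "updated": date(2021, 5, 2)},
]


def consolidate(*sources):
    """Merge records by customer_id; the newest non-empty value wins per field."""
    golden = {}
    all_records = [record for source in sources for record in source]
    # Process the oldest records first so that newer ones overwrite them.
    for record in sorted(all_records, key=lambda r: r["updated"]):
        merged = golden.setdefault(record["customer_id"], {})
        for field, value in record.items():
            if value is not None:  # never overwrite a value with a blank
                merged[field] = value
    return golden


golden_records = consolidate(crm_records, billing_records)
print(len(golden_records))             # 2 customers, no duplicates
print(golden_records["C001"]["name"])  # ACME Limited
```

The survivorship rule ("newest wins") is a design choice. Some organizations prefer to trust one source system per field instead.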
For metadata (data about your data) management, you will need to build a data catalog. It’s essentially an inventory of all your data assets for data discovery. Advanced data catalogs incorporate business glossaries, run checks on data quality, offer data lineage, and help with data preparation. However, hard-and-fast validation rules are needed to ensure that data entries match catalog definitions. Both business and IT people should take part in defining them.
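As a rough illustration of what hard-and-fast validation rules might look like, the Python sketch below checks a record against a tiny catalog definition. The catalog structure, field names, and rules are hypothetical; a real data catalog product would have its own format for them.

```python
import re

# Hypothetical catalog definitions agreed on by business and IT.
CATALOG = {
    "country": {"type": str, "allowed": {"IN", "US", "DE"}},
    "units": {"type": int, "min": 0},
    "order_id": {"type": str, "pattern": r"ORD-\d{6}"},
}


def validate(record):
    """Return a list of rule violations; an empty list means the record passes."""
    errors = []
    for field, rule in CATALOG.items():
        if field not in record:
            errors.append(f"{field}: missing")
            continue
        value = record[field]
        if not isinstance(value, rule["type"]):
            errors.append(f"{field}: expected {rule['type'].__name__}")
            continue
        if "allowed" in rule and value not in rule["allowed"]:
            errors.append(f"{field}: {value!r} is not an allowed value")
        if "min" in rule and value < rule["min"]:
            errors.append(f"{field}: {value} is below the minimum {rule['min']}")
        if "pattern" in rule and not re.fullmatch(rule["pattern"], value):
            errors.append(f"{field}: {value!r} does not match the expected format")
    return errors


good = {"country": "IN", "units": 10, "order_id": "ORD-000123"}
bad = {"country": "India", "units": -5, "order_id": "123"}

print(validate(good))      # []
print(len(validate(bad)))  # 3
```

Returning all violations at once, rather than stopping at the first one, makes it easier for data stewards to see the full picture for each entry.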
Embed quality considerations into the setup of applications as part of managing your entire IT ecosystem, but define data requirements based on your use cases. This is important. First identify your business problem or use case (in very specific terms) and determine what data you need to solve it. Only then should you carefully consider the requirements for that data.
When working with data, organize it into several logical layers. This means that you should integrate, treat and transform your data into new entities step by step so that it reaches the analytics layer as a higher quality resource that makes sense for business users.
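A layered setup could look roughly like the Python sketch below, which moves made-up order rows from a raw layer through a clean layer to an analytics layer. The layer names, fields, and cleaning rules are assumptions for illustration only, not a prescribed architecture.

```python
from collections import defaultdict

# Raw layer: data exactly as it arrived (hypothetical order rows).
raw_layer = [
    {"order_id": "1", "country": " in ", "amount": "100.50"},
    {"order_id": "1", "country": " in ", "amount": "100.50"},  # duplicate
    {"order_id": "2", "country": "US", "amount": "20"},
    {"order_id": "3", "country": "IN", "amount": "n/a"},  # unparseable amount
]


def to_clean_layer(rows):
    """Deduplicate by order_id, normalize country tags, and parse amounts."""
    seen = set()
    clean = []
    for row in rows:
        if row["order_id"] in seen:
            continue  # skip duplicates
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # a real pipeline would quarantine the row for review
        seen.add(row["order_id"])
        clean.append({
            "order_id": row["order_id"],
            "country": row["country"].strip().upper(),
            "amount": amount,
        })
    return clean


def to_analytics_layer(rows):
    """Aggregate clean rows into a business-friendly entity: revenue by country."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["country"]] += row["amount"]
    return dict(totals)


revenue_by_country = to_analytics_layer(to_clean_layer(raw_layer))
print(revenue_by_country)  # {'IN': 100.5, 'US': 20.0}
```

Keeping each step as a separate function means the raw data stays untouched, and every transformation can be tested and traced on its own.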
Make use of technology innovations wherever possible to automate and improve parsing, cleansing, profiling, data enrichment, and many other data management processes. There are plenty of good data management tools in the market.
The role of data stewards is critical. Data governance is not only about standards and technologies but in large measure about people. Data stewards are responsible for data quality, acting as a central point of contact in the organization to go to for all data-related issues. They have a down-to-earth understanding of data lineage (how data is captured, changed, stored, and utilized), which enables them to trace issues to their root cause in data pipelines.
Big data challenge 2: Lack of coordination to steer big data/AI initiatives
With no single point of accountability, data analytics often boils down to poorly focused initiatives. Implemented by standalone business or IT teams on an ad hoc basis, such projects lead to missed steps and misinformed decisions.
Any data governance strategy, no matter how brilliant, is also doomed if there’s no one to coordinate it.
Even worse, a disjointed approach to data management makes it impossible to understand what data is available at the level of the organization, let alone to prioritize use cases.
Any data-powered organization needs a centralized role like the chief data officer who should be primarily responsible for spelling out strict rules as part of data governance and making sure they are followed for all data projects. In fact, they should be applied to every IT initiative because in one way or another any IT initiative today will be related to data, whether you want to spin off a database, build a new application, or update a legacy system.
The role of chief data officer can be taken by a senior data master or by the chief information officer who has always been a perfect fit.
The chief data officer is instrumental to setting the company’s strategic data vision, driving data governance policies, and adjusting processes to the mastery of the organization.
Establishing data tribes, or centers of excellence, is also a very, very good idea. Such squads normally include data stewards, data engineers, and data analysts who team up to build the company’s data architecture and consistent data processes. They will also help address the coordination problem with big data.
To make your data tribe efficient, it is important you measure their performance by the number of big data use cases identified and successfully implemented. This way, they will be motivated to help other teams with extracting maximum value from new technologies and data the company has on its hands.
Education is another key mission of data squads. A common problem is that many people just don’t want to learn new skills because learning can be challenging and uncomfortable. The data tribe keeps people engaged, educates them on how to use new tools and work use cases, and importantly lends a hand with changing their day-to-day processes.
Make sure your data squad is doing the following:
- Looking for opportunities and gaps in processes across the organization for implementing AI business solutions
- Incubating skills and sharing tribal knowledge through mentoring
- Cooperating closely with subject matter experts from business teams to identify pain points they are struggling with
- Asking business teams the right questions to understand clearly their KPIs and how data can help achieve them
Big data challenge 3: Skills shortage
This problem with big data implementation is pretty straightforward: demand for data science and analytics skills has so far been outpacing supply.
There is a reason. In an attempt to lay hands on data-powered revenue sources and not to lose opportunities to competitors, organizations have rushed to adopt big data analytics.
With skills in short supply, however, they are having difficulty taking advantage of their data.
Grow your own tech talent to fix this big data challenge. Tap into the potential of your technical employee base through reskilling and upskilling, but focus on upskilling. People don’t like changes, while adding a new skill — a skill in data modeling, data architecture, data engineering, or ML — to their already fair skillset might sound like a much better idea to them.
Run training programs and workshops for your tech folks but make sure that the time and resources are not wasted. People shouldn’t take training to come back with the idea that they need to hire a guru who will tell them what to do with their data. As a follow-up, encourage them to bring something valuable to the table. This can be a potential improvement to their workflow or another business process.
Partner with higher education institutions (colleges and universities) to discover promising junior talent.
Bring a strategic partner into the fold if you can’t boost your in-house teams with homegrown data skills or need niche skills for implementing a big data solution. Just keep in mind that no one knows your business better than you. The same holds for your data: only you know what data you collect and what data you store. So, first identify your business problem and only then look for a highly skilled tech partner that has successfully solved a similar business problem in the past.
Democratize your data radically to make it accessible and usable for employees with no specialized algorithm or coding knowledge. Company-wide education on data topics will help you tackle the big data problem of the skills shortage by strengthening data literacy and driving data adoption at all levels. Data stewards should also take an active part in the initiative.
Also, simplify analysis for business users with easy-to-use self-service analytics tools, like dashboards and recommendation systems. Encourage their daily use for data-driven decisions. A good example here would be a global digital industrial conglomerate that has built an analytics platform incorporating a business semantic layer to give employees real-time access to data they are working with day to day, from HR, finance, and marketing to production. Pre-defined sets organize data under ‘human’ titles that everyone can understand, while allowing personalization. Their next step is to train algorithms so that they could analyze individual workflows and recommend improvements in their day-to-day jobs. Another fair example would be a top global retailer that has democratized access to data for over three million employees with the help of an advanced self-service data analytics platform designed and built by ITRex. The platform provides a 360-degree view of all available data for easy analysis and reporting.
No matter how skillful your tech talent is, your data won’t give you insights if business users don’t know what to do with it. It’s them, regular front-line employees and not just “geeks,” who should do analytics, develop simple visualizations, and tell stories, translating data into powerful action.
Big data challenge 4: Solving the wrong problem
As McKinsey says in its recent data report, “Think business backwards, not data forward.”
In fear of missing out, many organizations are too quick to jump into a big data initiative without spending time figuring out what business problem exactly they want to solve. This is another big data challenge that derails many projects.
A clear and feasible business goal will help you ask the right questions as to what you should measure to understand value. Many AI projects fail because people choose to go with metrics that are easiest to track or standard performance indicators that they or others usually track.
With this big data challenge ignored, you throw away precious resources on projects that make little or no business impact, and your ROI is never measurable.
Clarify your business strategy to align big data analytics. Make sure your company leaders are on the same page. Surprisingly, they are often not. A survey by the MIT Sloan Management Review has revealed that only 28% of executives and middle managers responsible for executing the organization’s strategy could list three of its strategic priorities.
Think about the problems you’re having and ask the right questions (a lot of questions!). So, fine, we have a lame business process here, but how can we improve it? Will it be through cost savings? If yes, what makes up our current costs, and how much do we want to save and how soon do we want to reach our target? What data is most relevant? Do we have enough of it to measure our results?
Be very specific with your questions, business challenges at hand, and desired outcomes. To get a feasible project, your data squad should ask business people questions over and over again and keep listening.
It will be a good idea if your data team makes a list of all business decisions that the company should make regularly. It will help them identify easy candidates for a data-driven approach.
Big data challenge 5: Dated data and inability to operationalize insights
In the COVID-19 world, this big data problem has become more acute as the need for speed has increased. Even if you analyze data for trends, including data from sensors or social media, you may need to adapt. The truth is, the pandemic has rendered a lot of historical data and business assumptions useless because of behavioral changes. If you have an AI model built on pre-COVID data, it may well happen that you don’t have any current data at all to do big data analytics.
Slice and dice your big data initiative to turn it into small data challenges. Thus, it will be easier for your team to keep pace with changing business priorities and data requirements and produce insights quickly for immediate decision-making. And don’t forget to go first for low-hanging fruit, because any company has processes that can be improved with simple automation. Plus, the value you get will earn your data initiative more credibility with business users. You will need their engagement when you move to scale up big data and AI implementation.
Go agile, counterintuitive as it may sound. In agile, teams deliver chunks of business value at the end of every sprint (a short time-boxed period). This approach is absolutely workable in a big data environment too. As you move forward in small iterations, you are able to begin delivering value immediately before all the necessary metadata is identified and cataloged. You can adjust your data model along the way.
Agile puts your business users and data team in one room where they generate, test, and validate hypotheses on an ongoing basis, always using fresh data that is pouring in. With every sprint, the development team is sharing new information that business can check on the spot to make sure it contains a relevant answer. If it doesn’t, the tech guys go digging for new data again and adjust the data model to test a new hypothesis. Your entire data science workflow can be reduced from months to days.
The agile approach will involve establishing DataOps and MLOps practices for the entire big data cycle. They will help you:
- Monitor data patterns that no longer hold (data drift) to ensure that your model continues to predict accurately
- Automate data management processes using advanced technologies
- Reduce the cycle time of big data analytics by an order of magnitude
- Scale and operationalize data analytics across the organization
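To give a feel for the first point, here is a deliberately simple Python sketch of a drift check. It measures how far the mean of recent data has moved from a historical baseline, in units of the baseline standard deviation. The sample numbers and the threshold of 3.0 are assumptions chosen for illustration; production MLOps setups typically rely on more robust statistical tests and dedicated monitoring tools.

```python
from statistics import mean, stdev


def drift_score(baseline, recent):
    """Shift of the recent mean, measured in baseline standard deviations."""
    spread = stdev(baseline)
    if spread == 0:
        raise ValueError("Baseline has no variation; the score is undefined")
    return abs(mean(recent) - mean(baseline)) / spread


def has_drifted(baseline, recent, threshold=3.0):
    """Flag drift when the score exceeds the (assumed) threshold."""
    return drift_score(baseline, recent) > threshold


# Hypothetical daily order counts before and after a behavioral change.
pre_change = [100, 102, 98, 101, 99, 100, 103, 97]
post_change = [60, 58, 65, 62, 59, 61, 63, 60]
stable = [101, 99, 100, 102]

print(has_drifted(pre_change, post_change))  # True
print(has_drifted(pre_change, stable))       # False
```

A check like this can run on every new batch of data, so that the team learns about broken assumptions before the model's predictions quietly degrade.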
Using agile means failing fast and failing often to eventually win. Coupled with automation, this approach allows teams to be quick with eliminating failing assumptions and unearthing useful hypotheses that can be turned into action in a timely manner.
Big data adoption does not happen overnight, and big data challenges are profound. We hope our tips and insights will help you successfully navigate major problems with big data. Many data projects do fail. But yours shouldn’t be one of them.