Simo Ahava’s atypical viewpoint on data quality and communication structures freshened up the whole lounge at the Go Analytics! conference. OWOX, the MarTech leader in the CIS region, welcomed thousands of experts to this gathering to share their knowledge and ideas.
OWOX BI Team would like you to think over the concept proposed by Simo Ahava, which definitely has potential to make your business grow.
The Quality of Data and Quality of the Organization
The quality of data depends on the person who’s analyzing it. Typically, we would blame all flaws in the data on tools, workflows, and datasets. But is that reasonable?
Frankly speaking, the quality of data is directly tied to how we communicate within our organizations. The quality of the organization determines everything, starting with the approach to data mining, estimation, and measurement, continuing with processing, and ending with the overall quality of the product and decision-making.
Companies and Their Communication Structures
Let’s imagine a company that specializes in one tool. The people in this company are great at finding certain problems and solving them for the B2B segment. Everything’s great, and no doubt you know a couple of companies like this.
The side effect of these companies’ work shows up over the long term, as requirements for data quality steadily rise. At the same time, we should remember that tools created to analyze data work with data only and are isolated from the business problems – even if they’re created to solve them.
That’s why another kind of firm has appeared. These companies are specialized in workflow debugging. They can find a whole bunch of problems in business processes, put them on a whiteboard, and tell the executives:
Here, here, and there! Apply this new business strategy and you’ll be fine!
But it sounds too good to be true. Advice that isn’t grounded in an understanding of the tools is of doubtful value. These consulting firms tend not to understand why the problems appeared, why each new day brings new complexities and errors, or which tools were set up incorrectly.
So the usefulness of these companies on their own is limited.
There are also companies with both business expertise and knowledge of tools. These companies are obsessed with hiring great people: experts who are confident in their skills and knowledge. Cool. But typically, these companies aren’t focused on solving communication problems inside the team, which they often see as unimportant. So when new problems appear, the witch hunt starts: whose fault is it? Maybe the BI specialists confused the processes? No, the programmers didn’t read the technical description. But the real problem is that the team can’t think through the problem clearly and solve it together.
This shows us that even in a company stuffed with cool specialists, everything will take more effort than necessary if the organization isn’t mature enough. The idea that you have to be the adult and be responsible, especially in a crisis, is the last thing people are thinking about in most companies.
Even my two-year-old child who’s going to kindergarten seems more mature than some of the organizations I’ve worked with.
You can’t create an efficient company just by hiring a great number of specialists, as each of them is absorbed into some group or department. Management keeps hiring specialists, but nothing changes because the structure and logic of the workflow don’t change at all.
If you don’t do anything to create channels of communication inside and outside of these groups and departments, all of your efforts will be meaningless. That’s why communication strategy and maturity is Ahava’s focus.
Conway’s Law Applied to Analytics Companies
Fifty years ago, a great programmer named Melvin Conway made a suggestion that later became popularly known as Conway’s law:
Organizations which design systems . . . are constrained to produce designs which are copies of the communication structures of these organizations.
Melvin Conway, Conway’s Law
These thoughts appeared at a time when a single computer filled an entire room! Just imagine: here’s one team working on one computer, and over there is another team working on another computer. In practice, Conway’s law means that all the communication flaws among those teams will be mirrored in the structure and functionality of the programs they develop.
This law has been tested hundreds of times in the software development world and discussed at length. The bluntest formulation of Conway’s law came from Pieter Hintjens, one of the most influential programmers of the early 2000s: “if you are in a shitty organization, you will make shitty software.” (Amdahl to Zipf: Ten Laws of the Physics of People)
It’s easy to see how this law works in the marketing and analytics world, where companies work with giant amounts of data gathered from different sources. We can all agree that data itself is impartial. But if you inspect data sets closely, you’ll see all the imperfections of the organizations that collected them:
- Missing values where engineers haven’t talked through an issue
- Wrong formats where nobody paid attention and nobody discussed the number of decimal places
- Communication delays where nobody knows the format of transfer (batch or stream) and who must receive the data
That’s why data exchange systems expose our imperfections so completely.
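These organizational imperfections are easy to surface programmatically. Here’s a minimal Python sketch of an audit that flags the three defect classes above in a batch of records (field names, thresholds, and sample hits are illustrative assumptions, not from any real schema):

```python
from datetime import datetime, timezone

# Hypothetical hit records; field names are illustrative only.
hits = [
    {"user_id": "u1", "revenue": "19.99", "ts": "2019-05-01T10:00:00+00:00"},
    {"user_id": None, "revenue": "5",     "ts": "2019-05-01T10:00:05+00:00"},  # missing value, bad format
    {"user_id": "u3", "revenue": "7.5",   "ts": "2019-04-29T10:00:00+00:00"},  # late arrival
]

def audit(hit, received_at, max_delay_hours=24):
    """Return a list of defect labels for one hit."""
    defects = []
    # Missing values: fields nobody agreed were mandatory end up empty.
    if any(hit.get(k) in (None, "") for k in ("user_id", "revenue", "ts")):
        defects.append("missing_value")
    # Wrong format: nobody discussed decimal places, so revenue arrives inconsistently.
    if hit.get("revenue") and "." not in str(hit["revenue"]):
        defects.append("wrong_format")
    # Delay: nobody agreed on batch vs. stream, so hits arrive long after the event.
    if hit.get("ts"):
        sent = datetime.fromisoformat(hit["ts"])
        if (received_at - sent).total_seconds() > max_delay_hours * 3600:
            defects.append("delayed")
    return defects

now = datetime(2019, 5, 1, 12, 0, tzinfo=timezone.utc)
for hit in hits:
    print(hit["user_id"], audit(hit, now))
```

Each defect the audit reports points back to a conversation that never happened, not to a broken tool.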
Data quality is the achievement of tool specialists, workflow experts, managers, and the communication among all these people.
The Best and Worst Communication Structures for Multidisciplinary Teams
A typical project team in a MarTech or marketing analytics company consists of business intelligence (BI) specialists, data scientists, designers, marketers, analysts, and programmers (in any combination).
But what will happen in a team that doesn’t understand the importance of communication? Let’s see. The programmers will write code for a long time, trying hard, while another part of the team will just wait for them to pass the baton. At last, the beta version will be released, and everyone will be murmuring about why it took so long. And when the first flaw appears, everyone will start looking for someone else to blame but not for ways to avoid the situation that got them there.
If we look deeper, we’ll see that mutual aims weren’t understood correctly (or at all). And in such a situation, we’ll get a damaged or flawed product.
The worst features of this situation:
- Insufficient involvement
- Insufficient participation
- Lack of cooperation
- Lack of trust
How can we fix it? Literally by making people talk.
Let’s gather everyone together, set topics of discussion, and schedule weekly meetings: marketing with BI, programmers with designers and data specialists. Then we’ll hope that people talk about the project. But that’s still not enough, because team members still aren’t talking about the whole project or with the whole team. It’s easy to get snowed under with dozens of meetings, with no way out and no time to do the work. And the flood of follow-up messages after each meeting kills whatever time and clarity remain.
That’s why meetings are only the first step. We still have some problems:
- Poor communication
- Lack of mutual aims
- Insufficient involvement
Sometimes, people try to pass along important information about the project to their colleagues. But instead of the message getting through, the rumor machine does everything for them. When people don’t know how to share their thoughts and ideas properly and in the proper environment, information will be lost on the way to the recipient.
These are symptoms of a company struggling with communication problems, and it starts to cure them with meetings. But there’s a better solution:
Get everyone communicating about the project itself.
The best features of this approach:
- Knowledge and skills exchange
- Non-stop education
This is an extremely complex structure that’s hard to create. You may know a few frameworks that take this approach: Agile, Lean, Scrum. It doesn’t matter what you name it; all of them are built on the “making everything together at the same time” principle. All those calendars, task queues, demo presentations, and stand-up meetings are aimed at making people talk about the project frequently and all together.
That’s why I like Agile a lot, because it includes the importance of communication as a prerequisite for project survival.
And if you think you’re an analyst who doesn’t like Agile, look at it another way: It helps you to show the results of your work – all of your processed data, those great dashboards, your data sets – to make people appreciate your efforts. But to do that, you have to meet your colleagues and talk with them at the round table.
What’s next? Everybody has started to talk about the project. Now we have to prove the quality of the project. To do this, companies typically hire a consultant with the highest professional qualifications.
The main criterion of a good consultant (I can tell you because I’m a consultant) is constantly decreasing his involvement in the project.
A consultant can’t just feed a company small pieces of professional secrets, because that won’t make the company mature and self-sustaining. If your company still can’t live without its consultant, you should question the quality of the service you’ve received.
By the way, a consultant shouldn’t make reports or become an additional pair of hands for you. You have your inside colleagues for that.
The main aim of hiring a consultant is education, fixing structures and processes, and facilitating communication. A consultant’s role is not monthly reporting but rather implanting himself or herself into the project and being totally involved in the daily routine of the team.
A good strategic marketing consultant fills gaps in the knowledge and understanding of project participants. But he or she should never do the work for someone else. And one day, everyone should be able to work just fine without the consultant.
The result of effective communication is an absence of witch hunts and finger-pointing. Before a task is started, people share their doubts and questions with other team members, so most problems are solved before the work begins.
Let’s see how all of that influences the most complicated part of the marketing analysis job: defining data flows and merging data.
How Is the Communication Structure Mirrored in Data Transfer and Processing?
Let’s suppose we have three sources giving us the following data: traffic data, e-commerce product data / purchase data from the loyalty program, and mobile analytics data. We’ll go through the data processing stages one by one, from streaming all that data to Google Cloud to sending everything for visualization in Google Data Studio with the help of Google BigQuery.
Based on our example, what questions should people be asking to assure clear communication during each stage of data processing?
- Data collection stage. If we forget to measure something important, we can’t go back in time and remeasure it. Things to consider beforehand:
- How will we name the most important parameters and variables? Without agreed naming, the data becomes a mess.
- How will events be flagged?
- What will be the unique identifier for chosen data flows?
- How will we take care of security and privacy?
- How will we gather data where there are limitations on data collection?
- Merging data flows into the stream. Consider the following:
- The main ETL principles: Is it a batch or stream type of data transfer?
- How will we mark the points where stream and batch data transfers meet?
- How will we fit both into the same data schema without losses or mistakes?
- Time and chronology questions: How will we check the timestamps?
- How can we know whether data updates and enrichment are working correctly within the expected timestamps?
- How will we validate hits? What happens with invalid hits?
- Data aggregation stage. Things to consider:
- Specialized settings for ETL processes: What do we do with invalid data? Patch it or delete it?
- Can we extract any value from it?
- How will it impact the quality of the whole data set?
The first principle for all of these stages is that mistakes stack on top of each other and are inherited from stage to stage. Data collected with a flaw at the first stage will give you headaches during all the subsequent stages. The second principle is that you should choose explicit points for data quality assurance, because at the aggregation stage all the data is mixed together and you can no longer influence the quality of the mixed data. This is especially important for machine learning projects, where the quality of the data determines the quality of the results. Good results are unattainable with low-quality data.
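One way to implement a point of data quality assurance is a validation gate that runs before aggregation, while individual hits can still be inspected. Here’s a minimal Python sketch under assumed rules (the required fields and the patching policy are illustrative): valid hits pass through, fixable hits are patched, and the rest are quarantined rather than silently mixed into the aggregate:

```python
REQUIRED_FIELDS = {"event", "user_id", "ts"}  # assumed contract agreed by the team

def checkpoint(hit):
    """Classify a hit before aggregation: 'pass', 'patched', or 'quarantined'."""
    missing = REQUIRED_FIELDS - hit.keys()
    if not missing:
        return "pass", hit
    # Patch: apply a default only where the team agreed a safe default exists.
    if missing == {"user_id"}:
        return "patched", {**hit, "user_id": "anonymous"}
    # Quarantine: anything else would poison the aggregate downstream.
    return "quarantined", hit

stream = [
    {"event": "purchase", "user_id": "u1", "ts": 1},
    {"event": "pageview", "ts": 2},  # patchable: only user_id is missing
    {"ts": 3},                       # unrecoverable: event is missing too
]

clean = []
for hit in stream:
    verdict, row = checkpoint(hit)
    if verdict != "quarantined":
        clean.append(row)

# Aggregation now runs only on validated rows.
print(len(clean))  # → 2
```

The decisions encoded here (which fields are mandatory, which defects are patchable) are exactly the questions from the checklist above, and they can only be answered by the whole team.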
Finally, there’s the CEO stage. You may have heard of the situation where the CEO looks at the numbers on the dashboard and says: “Okay, we’ve got a lot of profit this year, even more than previously, but why are all the financial parameters in the red zone?” At this moment, it’s too late to look for the mistakes; they should have been caught a long time ago.
Everything is based on communication. And on the topics of conversation. Here’s an example of what should be discussed while preparing Yandex streaming:
You’ll find the answers to most of these questions only together with your whole team. Because when somebody makes a decision based on guessing or personal opinion without testing the idea with others, mistakes can appear.
Complexities are everywhere, even in the simplest places.
Here’s one more example: When tracking the impression scores of product cards, an analyst notices an error. In the hit data, all impressions from all banners and product cards were sent right after page loading. But we can’t be sure if the user really looked at everything on the page. The analyst comes to the team to inform them about this in detail.
The BI specialist says that we can’t leave the situation like that:
How can we calculate the CPM if we can’t even be sure if the product was shown? What’s the qualified CTR for the pictures then?
The marketers answer:
Look, everyone, we can create a report showing the best CTR and verify it against a similar creative banner or photo in other places.
And then the developers will say:
Yes, we can solve this problem with the help of our new integration for scroll tracking and subject visibility checking.
Finally, the UI/UX designers say:
Yeah! Now we can finally choose between lazy loading, infinite scroll, and pagination!
Here are the steps this small team went through:
- Defined the problem
- Presented the business consequences of the problem
- Measured the impact of changes
- Presented technical decisions
- Discovered a non-trivial benefit
To solve this problem, they should check the data collection from all systems. A partial solution in one part of the data schema won’t resolve the business problem.
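The impression bug above is easy to quantify. A short Python sketch (the field names and numbers are hypothetical): if every banner fires an impression at page load, CTR is computed against banners the user may never have seen; counting only viewable impressions, as the developers’ scroll-tracking integration would allow, changes the metric substantially:

```python
# Hypothetical hit log: every banner fires an impression at page load,
# but only some banners are actually scrolled into view.
impressions = [
    {"banner": "A", "viewable": True},
    {"banner": "A", "viewable": True},
    {"banner": "A", "viewable": False},
    {"banner": "A", "viewable": False},
]
clicks = 1  # one click on banner A

def ctr(clicks, imps):
    """Click-through rate; guard against division by zero."""
    return clicks / imps if imps else 0.0

naive = ctr(clicks, len(impressions))                            # all fired hits
viewable = ctr(clicks, sum(i["viewable"] for i in impressions))  # only seen ones

print(f"naive CTR: {naive:.0%}, viewable CTR: {viewable:.0%}")
# → naive CTR: 25%, viewable CTR: 50%
```

The same click count produces two very different CTRs, which is why a fix in only one part of the data schema won’t resolve the business problem.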
That’s why we have to work together. The data must be collected responsibly each day, and it’s hard work to do that. And the quality of data must be achieved by hiring the right people, buying the right tools, and investing money, time, and effort into constructing effective communication structures, which are vital for an organization’s success.