Data Hygiene Best Practices

You don’t have to be a rocket scientist to understand that having a wrong phone number for a person can keep you from reaching that person.

On a personal level, everyone has experienced the hassle of needing to reach someone, only to find out that they've changed their number. On the macro level, bad data contaminates business records so rapidly that it can be difficult to keep up. Every year thousands of people change jobs, get new phone numbers, get married or divorced, or move to a new city, and their lives take all kinds of different directions. Data hygiene best practices must be followed to contain and quarantine bad data.

What if it takes longer than expected for a B2B company to reach out to a hot prospect? With 30% of people changing jobs every year, your point of contact may now be working for a different company. B2C models contend with the data churn of 43% of consumers changing their phone numbers in a year, while 25% to 33% of email addresses are outdated within 12 months.

All of this has a significant impact on your delivery and response rates, not to mention how much time you will waste trying to trace a prospect.

As you can see, B2B data deteriorates at a very fast pace, and if you fail to keep abreast of these changes, you’ll find yourself among the 62% of organizations that are infected with inaccurate customer data. Proactive and preventative data scrubbing can result in more accurate customer data.

What is Data Hygiene?

Data hygiene refers to the collective processes conducted to ensure the cleanliness of data. Data is considered clean if it is relatively error-free. Dirty data is a term used to describe inaccurate, incomplete, and inconsistent data. Following data hygiene best practices can counteract data churn and make it easier to embark on a data migration project.

Common Data Hygiene Errors

Bad data can be caused by any number of factors. Here are some of the most common errors:

  • Missing data/missing fields
  • Inconsistent mapping
  • Data that is not validated at the time of entry
  • Separate overlapping systems with duplicate or conflicting data
  • Generalized, badly defined values
  • Confusing naming schema
  • Out of date, invalid, inaccurate data
  • Sloppy formatting/invalid syntax that could break the new system
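
As a rough illustration of how a few of the errors above could be detected automatically, here is a minimal Python sketch. The field names (`name`, `email`, `phone`), the loose email pattern, and the `audit_records` helper are all assumptions made for this example, not part of any particular ERP product.

```python
import re
from collections import Counter

# Hypothetical required fields and a deliberately loose email pattern.
REQUIRED_FIELDS = ("name", "email", "phone")
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def audit_records(records):
    """Return (record_index, problem) pairs for a list of dict records.

    Field values are assumed to be strings or None.
    """
    problems = []
    # Count emails case-insensitively so "A@x.com" and "a@x.com" collide.
    email_counts = Counter(
        (r.get("email") or "").strip().lower() for r in records
    )
    for i, record in enumerate(records):
        # Missing data / missing fields
        for field in REQUIRED_FIELDS:
            if not (record.get(field) or "").strip():
                problems.append((i, f"missing {field}"))
        email = (record.get("email") or "").strip().lower()
        # Invalid syntax
        if email and not EMAIL_PATTERN.match(email):
            problems.append((i, "invalid email syntax"))
        # Duplicate data
        if email and email_counts[email] > 1:
            problems.append((i, "duplicate email"))
    return problems
```

A check like this only flags candidates for review. Deciding which duplicate is the correct record still takes human judgment or agreed business rules.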

The Trillion-Dollar Cost of Bad Data Hygiene

According to Forbes, violation of data hygiene best practices costs the U.S. economy approximately $3.1 trillion annually. But where does that expense come from?

Let’s look at the 1-10-100 Quality rule, as formulated by George Labovitz and Yu Sang Chang. This states that it costs $1 to verify a data record as it’s entered, or at the point of data capture (the prevention cost); it costs $10 to cleanse and deduplicate the record (the correction cost); and it costs $100 continuing to work with a record that’s never cleansed (the failure cost).

While this is expressed in US dollars in the above description, it can be understood to mean any number of ‘units’, measured in financial terms. We can also use 1-10-100 to measure ‘cost’ in terms of resources or time.
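
To make the arithmetic concrete, the rule can be sketched in a few lines of Python. The function name `data_quality_cost` and the record counts in the usage note are made up for illustration; only the 1, 10, and 100 unit costs come from the rule itself.

```python
# Unit costs from the 1-10-100 rule as described above.
PREVENTION_COST = 1    # verify a record at the point of capture
CORRECTION_COST = 10   # cleanse and deduplicate the record later
FAILURE_COST = 100     # keep working with a record that is never cleansed


def data_quality_cost(verified, corrected, never_cleansed):
    """Total cost in 'units' for the given number of records in each bucket."""
    return (
        verified * PREVENTION_COST
        + corrected * CORRECTION_COST
        + never_cleansed * FAILURE_COST
    )
```

For a hypothetical batch of 1,000 records, `data_quality_cost(1000, 0, 0)` gives 1,000 units if every record is verified at entry. If only 900 are verified, 80 are corrected later, and 20 are never cleansed, `data_quality_cost(900, 80, 20)` gives 3,700 units, with more than half of that coming from just 20 records.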

What this means in practical terms is that if you don’t invest in data hygiene services, you can end up losing a lot of money.

That’s not to say that all companies are being short-sighted about B2B data hygiene. As described above, data can go bad rapidly. B2B data decays at a rate of 70% per year, and the average company loses 12% of its revenue as a result. When you look at these percentages, the $3.1 trillion lost in the US economy in a year comes into clearer focus.

Data Hygiene Best Practices

Select a robust ERP system with functions that make it easy to mandate that certain fields be filled in. Build in some validation functionality, and do some logic checks. Establish standardization rules and add constraints. As much as possible, make the system validate the data at the point it is entered.
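
What point-of-entry validation might look like can be sketched in Python. This is only an illustration under stated assumptions: the `Order` fields, the two-letter country code convention, and the `validate_order` helper are hypothetical, and a real ERP would normally express these rules through its own configuration rather than custom code.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Order:
    # Hypothetical fields chosen for this example.
    customer_name: str
    bill_to_email: str
    country_code: str
    order_date: date
    ship_date: date


def validate_order(order):
    """Return a list of error messages; an empty list means the order may be saved."""
    errors = []
    # Mandatory fields: reject blanks instead of storing them.
    for field_name in ("customer_name", "bill_to_email", "country_code"):
        if not getattr(order, field_name).strip():
            errors.append(f"{field_name} is required")
    # Standardization rule (assumed convention): two-letter country codes.
    code = order.country_code.strip().upper()
    if code and (len(code) != 2 or not code.isalpha()):
        errors.append("country_code must be a two-letter code")
    # Logic check: an order cannot ship before it was placed.
    if order.ship_date < order.order_date:
        errors.append("ship_date cannot be before order_date")
    return errors
```

Returning every error at once, rather than stopping at the first, lets the person entering the data fix the whole record in one pass.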

There are ERP tools and integrations for keeping records updated in real time, as well as ERP tools that automate data cleansing.

If you’re infected with bad data, it is going to take a dedicated project devoted to parsing, standardizing, cleansing and validating the data to implement data hygiene best practices. ERP Advisors Group can provide guidance in designing such a project.

Once you’re disinfected, you can put in systems and personnel to maintain scrubbed data and ensure data cleanliness.

Be watchful for indicators that dirty data has started to impact your business. Failing to follow data hygiene best practices can be an underlying reason why your performance as a business is struggling. A dirty data infestation might be an underlying driver as to why you’re not closing deals, or sales deals are taking longer, or accounting isn't collecting as fast, or manufacturing costs are going up, or engineering is buying too much product. There are endless examples.

Here at ERP Advisors Group, we specialize in providing expert guidance on data hygiene and data migration. We help our clients assess the quality of the data, whether it is good or bad, and how much effort is required to clean it. We can provide direction as to what ERP tools to select for data cleansing and for keeping data updated, as well as guidance in designing projects to disinfect dirty data.


Narrator: This is The ERP Advisor.

Today's episode: The Science of Proper Data Hygiene.

Juliette Welch: Shawn Windle is our speaker for today. Shawn is the Founder and Managing Principal of ERP Advisors Group based in Denver, Colorado.

On today's call, Shawn will discuss the science behind proper data hygiene and the best way to approach a data migration project. Shawn, welcome. Thanks for joining us today.

Shawn Windle: Yeah, you bet, Juliette. It's always a pleasure.

Juliette: Thank you, we appreciate it. So, I guess we can just get started here. For those who might not be familiar with our topic, what is data hygiene exactly?

Shawn: So, data hygiene is when you go to the dentist and they start to— just kidding, that's a different kind of hygiene.

Now this is really about the enterprise data that you have throughout your company, so it could be things like contacts, customers, all kinds of different things, vendors, your invoices. If you think about an organization, nonprofit or profit, there's all this information — this data — about the business.

But how do we not only keep it clean, but keep it maintained and keep it accurate? So, it's really data hygiene. You can think of it as a process for ensuring the cleanliness of the data — so pretty straightforward.

You really want to look at ways to keep things error free in terms of how you're tracking information. And if you can think about the cost of not having clean data or keeping a good data hygiene program in place, that's where it gets really interesting. We'll probably talk about that more a little bit later but that's a pretty good definition to work with.

Juliette: Okay, great. That's a great start. So, what are some of the most common errors that we should watch out for?

Shawn: Yeah, so it's kind of an interesting thing, because you spend all this money on enterprise software — so it could be anything from a fire department that has to track information about their patients and the people that they go see and help all the way through to — it could be a distributor who's tracking information about their inventory or even some crazy examples I'm trying to think about where we have a client that sells very zany clothes and they have to keep track of the designs that they've made for maybe a Broncos player who's got a special outfit. Or maybe there's other interesting symbols and things that they put on their undergarments in that.

So, there's a lot of data that an enterprise has to keep track of, and so if you think about the mistakes and the common areas to watch out for, one of the things is you just don't have the right data elements where there's really important information that isn't being tracked.

So, we were at a business recently, and as they're starting to grow and expand (they're an engineer-to-order manufacturer), the information that they have about their costing is actually pretty light, like they don't have really specific information that goes into their projects. Because, well, in the past it didn't matter as much, but now as they're getting bigger and the costs are starting to increase and there's more labor and variability in what they're building, they're realizing, wow, we really need information about job costs that we didn't have before. So that's a big one to look out for, too.

There can also be things like really badly defined values. So, a great and super simple example is if you look at a company name. Almost any organization will have a name field, whether it's the company name or the donor's name or some kind of a name.

Well, what goes into that name field? Is it the full name of the company? So, ABC Co. dash LLC or comma LLC? Is it just ABC? ABC Co.? There are all these different options that are kind of sitting out there, and without really good standards being set by the organization, you're really up to the whims of usually salespeople, because the salespeople are working with the customer really at the beginning of the life cycle of a deal. And so, they put information in fast, right? Because they need to just enter their information and get on to the next task.

And so, these badly defined values are something that we run into a lot, where you get garbage in equals garbage out.
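
Editor's illustration: one way to tame the ABC Co. variations described above is to derive a normalized matching key from whatever was typed in. The sketch below is an assumption-laden example, not something discussed on the call. The suffix list and the `company_key` function are hypothetical, and the pattern only handles ASCII letters and digits.

```python
import re

# Hypothetical list of legal suffixes to ignore when matching names.
LEGAL_SUFFIXES = {"llc", "inc", "co", "corp", "ltd", "company"}


def company_key(name):
    """Build a matching key so spelling variants of one company compare equal."""
    # Lower-case, then turn punctuation and repeated spaces into single spaces.
    words = re.sub(r"[^a-z0-9]+", " ", name.lower()).split()
    # Drop trailing legal suffixes, but always keep at least one word.
    while len(words) > 1 and words[-1] in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)
```

With this key, "ABC Co. - LLC", "ABC Co., LLC", "ABC Co." and "ABC" all map to "abc", so they can be flagged as likely the same customer. The key is for matching only. The organization still has to decide which spelling is the official one to store.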

So, we were just talking to another business yesterday and we were talking about how they want to automate their invoicing that starts all the way from, hey, let's get an order in upfront and then let's just go lights out, which is a common phrase that you'll hear: I just want to put in the order and the invoice gets automatically emailed and taxes get automatically applied. But one of the people on the call in the UK division was saying oftentimes the salespeople don't get the right bill-to information. They’ll get the ship-to, they'll figure out who to send the software to (it's a software vendor), but they don't have the bill-to information.

And if you think about the salesperson, they're trying to sell the deal, they're trying to get onto the next one and they get a signature and they go on to the next deal.

But one of the things that they have to enforce is to get who to send the invoice to upfront. So, you see this combination of business process and data: the better the process is defined, the fewer errors you'll see on the data side, too. So putting in better standards, having a program around data hygiene where you're looking at these couple things, can make a huge difference.

Juliette: That's a lot to keep track of, right?

Shawn: Right. It's amazing. You need an ERP for that.

Juliette: That’s right. So, here I have a statistic: according to Forbes, poor data quality costs the US economy approximately $3.1 trillion annually. Can you explain where that expense might come from?

Shawn: Yeah, it sounds a little bit like, what? But if you think about a really simple scenario where, well, let's take that last one, the bill-to information. Okay, so the salesperson says, okay, here's the order, here's what the customer wants, here's the price, they want an extra discount, I went back to the manager, the manager approved it and went back to the customer. The customer says I actually want these other terms and like the balls are going back and forth. You're like, oh my gosh okay, and I need the bill-to information just to get the order closed, I'm going to put it in, and done. That's kind of the cycle that salespeople are paid to do. They want to get the deal closed fast.

They put in Joe Schmo as the bill-to. Maybe they asked the prospect like, who do we send the bill to? Joe Schmo, great. We'll put in Joe Schmo. And that's it. Then the order gets placed, the fulfillment starts, then we send an invoice over to accounting and accounting looks at that field and says, okay, it's Joe Schmo. So, then accounting says, fine, put in the invoice. They don't know. The invoice goes out. The payment terms, let's say, are 30 days.

So now we have almost like a time machine starts. Like day one, day two, day 30 — accounting looks in and says we haven't gotten paid on this, what's going on? Maybe I'll just go ahead and send another email out to Joe Schmo and just see what's happening here. The email goes out. And nothing happens, right? It's not like there's a bounced email or anything like that. But it takes a couple days to see kind of hm, maybe I give him an extra couple days to see if they respond or whatever, and they haven't responded.

Meanwhile, the rest of the business is moving forward. The cost of capital is increasing. We're having to pay people and this invoice is still not getting paid. So, finally the accounting person, after maybe another 30 days — because they're dealing with a million things — where after 60 days total he says, I better call this person to find out what's going on.

It's very common in B2B where you don't get paid within a certain amount of time, so then they call and there's nobody named Joe Schmo at the company. So, they're like what's going on here? So, then the first person that they call is the salesperson. This is 60 days ago. The salesperson says, well, that's what my buyer said it was, we sell engineering software and that's what the head of engineering told me to do is to send it to Joe Schmo.

And the accounting person is like, okay, well, I'm going to call your customer and find out. And the salesperson is like, no, no, no, no, let me call the customer. The salesperson is busy. So, another couple days go by, they finally loft over a call, it takes the person on the other side a couple days to get back to them (we're at like day 70 and I'm still not done), and they come back and say, Joe Schmo doesn't work with us anymore, now it's Jane Doe.

And they're like, oh, okay, so the salesperson tells the accounting department, okay, go ahead and send it to Jane Doe. So, then the accounting department sends it over to Jane Doe. Jane Doe gets it. She puts it into the AP payables list, and they don't do payables except on Fridays and it's a Monday.

Now we're at day 80. Check gets cut — let's say it's just checks — and the check takes a week to get to us. We're at day 90.

So, if you think about the amount of cost that went into — like if the salesperson would have gotten it right upfront — and don't get me wrong, I'm not throwing salespeople under the bus. But if they would have gotten that right — maybe it was another couple minutes or something — the costs would have been like a dollar.

But then you think about all of the time that the accounting person put into tracking all of this stuff that was based off of this wrong piece of data upfront. Not to mention the cost to the overall company of not having that cash to be able to do something different with it. Like that $1.00 mistake — if it would have just been a dollar that went into it — it's probably worth over $100 if not more.

I mean that scenario is even more because an accounting person's fully loaded cost is maybe $50.00 an hour, maybe even more. So, this person is spending five, six, ten hours on follow-up. So that's cost there, much less the other costs. So it's this compounding problem that just gets bigger and bigger the worse the data gets.
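
Editor's illustration: the figures in this scenario can be put into a small Python sketch. The $50 hourly rate, the five to ten hours of follow-up, the 30-day terms, and the payment arriving around day 90 come from the discussion. The invoice amount and the 8% annual cost of capital in the usage note are made-up assumptions, as are the function names.

```python
def follow_up_labor_cost(hours, hourly_rate=50.0):
    """Labor cost of chasing an invoice, at a fully loaded hourly rate."""
    return hours * hourly_rate


def carrying_cost(invoice_amount, annual_rate, days_late):
    """Simple-interest cost of cash that arrives `days_late` days after it was due."""
    return invoice_amount * annual_rate * days_late / 365
```

At $50 an hour, five to ten hours of follow-up is $250 to $500 in labor alone. Payment at day 90 on 30-day terms is 60 days late, so for a hypothetical $10,000 invoice and an assumed 8% annual cost of capital, `carrying_cost(10000, 0.08, 60)` adds roughly another $130.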

That's a simple example, but there's even other examples around the engineering process or even looking at HR where, oops, we got the decimal place in the wrong spot.

But so, we paid the employee less — like a lot less — and the employee gets upset. And now the employee has that thing in their mind like, I don't know if I can trust these guys because somebody just put a decimal in the wrong place.

Juliette: Right. Human error.

Shawn: Yeah, and so that's really — with a proper data hygiene program in place — and don't get me wrong, everybody is busy. I do understand that. But again, that's why you see these kinds of quotes where getting data wrong costs billions and trillions of dollars. I think even in your example in the quote that Forbes talked about — that's where that comes from.

Juliette: Okay, and I'm sure there's probably examples where if, say, Joe Schmo left the company, who do they even try to collect the money from if Jane Doe doesn't come in and take his place?

Shawn: That's exactly right.

Juliette: So then how do they collect the money? And who ends up paying?

Shawn: That's right. And sometimes nobody does. I mean, the bill just goes into cyberspace or wherever, and oftentimes companies don't stay on their accounts receivable that much. So, then we've completely lost all those dollars. So, all that salesperson had to do is get the bill-to right in the first place. And I'll talk to those salespeople and tell them that. I'm going to talk to our salesperson after this meeting, actually.

Juliette: Yes, so okay. Well with that said, what would you say is the best way for a business or nonprofit to stay out in front of deteriorating data?

Shawn: That is a great question.

So, let's take the nonprofit as an example. Let's take their donor database. That is vital to every single nonprofit (some of them do more grants), but the most important thing to do is to have a really good, solid system that is built such that it's easy to, say, mandate that certain fields be filled in.

So, when a donor calls into a nonprofit, and says, hey, I'd like to donate and here's my credit card for $100 or whatever it is. Okay, great. So, the person enters the information, clicks save, processes the credit card, and everything is done. That's how the core process should go.

But then you look at the nuances of like, did we ask for certain information upfront when the person called in to do the donation? Or a person decides to do it online, did we ask for the right fields? Like I get it that that’s obvious and you don't want to ask somebody too much information. You want to get their credit card, right? You want to get the donation processed.

But there really is this balance between getting the transaction done and gathering the right data. It's no different than the salesperson, or even an accounting person who's doing the collections. There's a dunning process, it's called, where you notify the customer by email, you send them a letter, and so on. The accounting person needs to keep track of the data of what's happening there, too. So, it's not just upfront, it's on the tail end, too.

But having systems that enforce gathering the right kind of data is vital. And I think most organizations understand that. We've talked a lot about 2020 being the year of SMBs for ERP. That's a big reason why, because smaller and mid-size organizations are realizing, oh my gosh, we're putting all of our enterprise data in a spreadsheet and there's no way to enforce things like the mail-to address is correct.

I mean, you guys have probably experienced this online where you fill out an order and sometimes I put my address in and I'm in Lakewood, but the Postal Service decides that it's Denver. So, then a little screen comes back and says, did you mean this address? Oh sure. That just saved a ton of time, because otherwise the order goes out, it goes to some other wonky address, the product sits there, I get upset, and then we don't receive the product and everything like that.

So having strong systems in place with field level control — again, that's sort of an obvious thing that you want to take a look at and have those controls in place. But I would say — and we see some of our bigger clients that do what's called a master data management program, an MDM program, where they're looking at a customer in the CRM and a customer in order fulfillment and a customer in accounting, lining up that record so that we have the full visibility of the customer. We're managing the customer data throughout the whole process versus by system.
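
Editor's illustration: the idea of lining up one customer's record across CRM, order fulfillment, and accounting can be sketched in Python. This is a simplified example that assumes every system shares a `customer_id` key, which real systems often do not. The `find_conflicts` function and the sample data in the usage note are hypothetical.

```python
def find_conflicts(systems, key_field="customer_id"):
    """Find fields whose values disagree between systems for the same customer.

    `systems` maps a system name to a list of record dicts.
    Returns {customer_id: {field: {system_name: value}}}.
    """
    # Step 1: gather every system's value for each customer and field.
    merged = {}
    for system, records in systems.items():
        for record in records:
            customer = merged.setdefault(record[key_field], {})
            for field, value in record.items():
                if field != key_field:
                    customer.setdefault(field, {})[system] = value
    # Step 2: keep only the fields where the systems disagree.
    conflicts = {}
    for customer_id, fields in merged.items():
        for field, by_system in fields.items():
            if len(set(by_system.values())) > 1:
                conflicts.setdefault(customer_id, {})[field] = by_system
    return conflicts
```

If the CRM lists Joe Schmo as the bill-to for customer 1 while accounting lists Jane Doe, the result contains `{1: {"bill_to": {"crm": "Joe Schmo", "accounting": "Jane Doe"}}}`. Fields that appear in only one system, such as a ship-to held only by fulfillment, are not reported as conflicts.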

A lot of the organizations that we work with, they get that concept, and they may even put in an MDM person if they can afford to have a guy or a gal — now I mean, even to the extent of data scientists, these are people that look at data and they try to figure out how they can make better decisions. What can we understand from the data that we have? Those programs are amazing. But what doesn't happen is a data hygiene program like, hey, I know what we should do today, let's go clean our data.

Juliette: Yeah, it sounds fun.

Shawn: I think I should go sell some more products or I should go collect some more money or whatever first, right? But it really is true that when you look at the expense of bad data, sometimes, even as a reactive model, you have to put the right program in place. You just have to. It's sort of like these conditions where you start to see little indicators like, why is our days sales outstanding going up? Let's just look into the area. Let's find out what's going on.

Gosh, we're sending a lot of invoices. And we're sending to the wrong addresses. We're not getting responses from people, or there could be other indicators with inventories starting to go up. Why is our inventory going up? Why isn't anybody looking at what we have in the warehouse before they purchase new products? Look in the warehouse, see what's there.
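
Editor's illustration: days sales outstanding, one of the indicators mentioned here, is commonly calculated as accounts receivable divided by credit sales for a period, multiplied by the number of days in that period. The Python sketch below uses that common formula. The `is_rising` helper and its three-period threshold are assumptions made for the example.

```python
def days_sales_outstanding(accounts_receivable, credit_sales, days_in_period=30):
    """DSO = accounts receivable / credit sales * days in the period."""
    if credit_sales <= 0:
        raise ValueError("credit_sales must be positive")
    return accounts_receivable / credit_sales * days_in_period


def is_rising(values, periods=3):
    """True if each of the last `periods` values exceeds the one before it."""
    recent = values[-(periods + 1):]
    if len(recent) < periods + 1:
        return False  # not enough history to judge a trend
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))
```

For example, $50,000 in receivables against $100,000 of credit sales in a 30-day month gives a DSO of 15 days. A monthly series such as 40, 42, 45, 51 would be flagged by `is_rising` as a prompt to go look at the underlying records, such as the bill-to addresses on unpaid invoices.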

Like you really have to start investigating from a business standpoint — the indicators are there of bad data hygiene. They're right there. It's sort of like — it's again these dental analogies. Recently my dog — I have the two most amazing dogs in the world like I'm sure everybody else does, my dogs are cool — but one of mine has bad breath ever since we've had him, and so we're finally like, okay, that's it, like what's going on? And we take him to the vet and he's got like corroded teeth. We're like okay we’re worst dog parents ever.

But when we really got in and got the problem handled, now his breath is better. So, there are dog bad breath indicators. When you think of data hygiene, think of bad dog breath? I don't know.

But it's true that there are things that are happening in the business that when you start inspecting and going into this, just know if there's anything that you guys can get from listening to this and watching this discussion that we're having is know there's indicators of a bad data hygiene environment. And again, you'll start to spot those when you look at like days being delayed because people aren't getting back to you or our expenses are going up, expenditures are going up around inventory, even our salespeople aren't as effective as we think they should be. How is the lead data that we're giving them?

I want you to know that data hygiene is one of those things that can lead to these bad indicators in your business. But the most important thing is that something can be done about it, even if it's on a reactive side where it's like, that's it, we have 50 custom fields in salesforce.com on a customer. And maybe it's too much because the data that's being entered into the ten fields that we really need is wrong. There might be some things that we have to change with our processes, but until you go in and really look at why the data is the way it is, you don't know that kind of thing.

So I'm a little bit — I can get pretty passionate actually about data and I didn’t even realize it until we had this discussion. Because it can lead to so many bad things happening that there are simple solutions, there's systems that you can buy, but most likely the systems that most of our clients have are fine anyway.

They can build in some validations, do some little logic checks, and that kind of thing — and that helps out, but definitely putting it into people’s hands as well, and making people understand, okay, if you don't tell us the country of shipment, salesperson upfront, accounting doesn't know what to do with the taxation on the back end. And if we don't get the taxes right on the back end, the CFO goes to jail.

That's the reality. And all of a sudden the frontend salesperson says, you know what, I'm going to get this right for you.

That kind of conversation changes everything.

Juliette: Well, it's like knowing what to look for, when to look for it, and how to look for it. You can just head everything off at the pass, right?

Shawn: That's right, yeah. That's why we do these. That's why I love these conversations with Juliette from the ERP trusted advisor.

Juliette: Thank you Shawn.

Shawn: It's great.

Juliette: Yes, it's really great. You are a wealth of knowledge.

So, I think we're coming to the end of our time. Is there anything else you'd like to add for our listeners today and just maybe kind of summarize a little bit about data hygiene?

Shawn: Yeah. I think like we talked about, data hygiene, if you look at it as a program, that's really the best way to do it. There's people, there's processes, there's tasks that need to get done. Doing it more proactively is always a good thing.

There's a lot of initiatives and things that people have to work on in a business, so I totally understand. But when you see those indicators, know that data hygiene might be an underlying driver as to why we’re not closing deals or even sales deals are taking longer or, just look across the whole business, accounting isn't collecting as fast, or manufacturing costs are going up, or engineering is buying too much product or whatever it is. Know that data hygiene can be a real underlying reason why we have the bad dog breath or why our performance as a business is really struggling.

And again when you feel that, go in and investigate for yourself. Use your own eyes to look in here and say okay, these guys are saying everything is fine. I'm going to go look at some records, I'm going to go see here and trace it through.

And then the last thing that I would add, too, is, as I said, that when you have a cross-process discussion and you talk to stakeholders who are involved upfront and in the middle and on the back end and they start talking about the problems that they're having, you'll see that from the people that are kind of upstream, if you will, data and other things trickle down. And value kind of goes all the way through that whole chain, so if there's something upfront that's not right, it's going to impact everybody else down the whole process to the point of trillions of dollars like Forbes said. So that's why we're talking about this. I appreciate that.

Juliette: So, thank you Shawn for joining us today. We really appreciate it and thank you everyone for joining us for today's call.

Please let us know if you have any questions. We're here to answer any questions you might have and to help in any way we can.

Please join us for our next call scheduled for March 11th: The Role of Blockchain and ERP: Beyond Cryptocurrency. In this next edition of The ERP Advisor, find out why we think 2020 is the year blockchain ERP tools will dominate and possibly even make sense for small and midsized businesses. Please go to our website erpadvisorsgroup.com for more details and to register.

Thank you again, Shawn. We appreciate your time.

Shawn: Thank you, Juliette.

ERP Advisors Group is one of the country's top independent enterprise software consulting firms, advising mid to large sized businesses on selecting and implementing business applications including ERP, CRM, HCM, business intelligence, and other enterprise applications, which equate to millions of dollars in software deals each year across many industries.

This has been The ERP Advisor.
