Mind The Data Gap

There is a lot of talk about Big Data these days, and if you are anywhere near Silicon Valley, it is more like a deafening roar. As important and difficult as Big Data is, we are also working to solve a different sort of challenge that gets relatively little attention: what we call Diverse Data. The key distinction between Big Data and Diverse Data is how structurally similar one piece of data is to another.

Why does diversity matter when working with data? A computer can only do what it has been programmed to do. That is true even if you include Artificial Intelligence, where a computer has been programmed to "learn" as it explores its surroundings. Hence, the amount of diversity within a set of data has enormous implications for how we build our software.

To put it simply: data is just nerd-speak for information. A computer that has been programmed to work with a certain type of information must also be told how to interpret the data. And that is where diversity comes into play.

Think about an example of a uniformly structured set of data: the names, addresses, and phone numbers of all the residents of State X. Because we have a well-established convention for how to represent this information, storing it is straightforward.

Now consider an irregularly structured set of data: the organizational charts of all companies incorporated in State X. Every organizational chart has essentially the same kind of information, such as employee names, job titles, and reporting relationships. The kind of information contained in an organizational chart doesn't vary much from one company to another, but the structure (i.e., the format) of the information changes quite a bit. Companies use a wide variety of tools to build and share their organizational charts, from general-purpose diagram tools to custom software, resulting in a chaotic morass of information.
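To make the idea concrete, here is a minimal sketch in Python. The two formats, the field names ("name", "manager", "reports"), and the employees are all made up for illustration; they are not drawn from any real company or tool. The point is only that two charts can look nothing alike on disk yet carry the same reporting relationships once each is translated into a common shape.

```python
from typing import List, Optional, Set, Tuple

# Common shape: a set of (employee, manager) pairs.
# A manager of None marks the top of the chart.
Report = Tuple[str, Optional[str]]


def from_flat_rows(rows: List[dict]) -> Set[Report]:
    """Hypothetical format A: one row per employee with a 'manager' field."""
    return {(row["name"], row["manager"]) for row in rows}


def from_nested_tree(node: dict, manager: Optional[str] = None) -> Set[Report]:
    """Hypothetical format B: each person holds a list of direct reports."""
    pairs = {(node["name"], manager)}
    for child in node.get("reports", []):
        # Recurse, passing the current person down as the manager.
        pairs |= from_nested_tree(child, manager=node["name"])
    return pairs


# The same (invented) chart, stored two different ways.
company_a = [
    {"name": "Ana", "manager": None},
    {"name": "Ben", "manager": "Ana"},
    {"name": "Cy", "manager": "Ana"},
]
company_b = {"name": "Ana", "reports": [{"name": "Ben"}, {"name": "Cy"}]}

# Different structure, same information.
print(from_flat_rows(company_a) == from_nested_tree(company_b))
```

Each new format needs its own small translator, which is exactly why diversity, rather than sheer volume, drives the amount of software that has to be written.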

So what does any of this have to do with water utilities? Every water utility that Valor Water Analytics works with is unique. As a small company, we're able to establish a relationship with each one. This gives us tremendous insight into how each utility operates, and how a utility operates shapes how its data is structured.

On the whole, water utility data is diverse. Delivering water to a community is about as local as it gets; as a result, every utility exists to cater to its customers and is uniquely shaped by their needs. Every operational decision that a utility makes can affect how its data is organized. That is not to say that utilities don't face shared challenges; they do, and we can use those similarities as a common foundation. But whether a utility has been operating the same way for 50 years or is at the leading edge of the technology curve, we will bridge the gap to seamlessly receive and analyze its data.

Each time we work with a new utility, we start with a conversation about the data. How is the data structured? How often does it change? What data is most important, and what data is most sensitive? As we wade deep into the data abyss, we revel in the beauty of the chaos, the thrill of the challenge, and the satisfaction of shedding light on the darkness. Integrating disparate systems is often messy - and that can be a good thing.

Authored by: David Wegman, Valor Water Analytics CTO

We welcome your comments and questions. Feel free to contact us at [email protected].