Zero visibility: Issues in Water Use Data Resolution

By David Wegman, CTO, Valor Water

In the beginning -- that is, before HD television -- there was standard definition television.  Back then, nobody complained much about the quality of the image.  In reality, the reason why people didn't make a fuss was that they didn't know what they were missing out on.  The same goes for the transition from cassette tapes to CDs and a host of other evolutionary enhancements in audio/visual quality over the years.  Ignorance is bliss.

We are now seeing some parallels in the era of data.  Utility metering data is a typical example.  As Advanced Metering Infrastructure (AMI) continues to be deployed, we are increasingly able to detect signals that were previously hidden.  This new visibility is driven by the greater number of data points per time interval that accompanies a smart meter deployment.  Whereas pre-AMI water meters are typically read around once per month, so as to coincide with a monthly billing interval, AMI can provide a meter reading once per hour -- a 720x increase in the number of data points.
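
The 720x figure follows from simple arithmetic.  Here is a minimal sketch, assuming a 30-day billing month (real billing cycles vary, and the constant names are purely illustrative):

```python
# Assumption: a 30-day billing month with exactly one manual read.
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30

monthly_reads = 1                              # pre-AMI: one read per billing cycle
hourly_reads = HOURS_PER_DAY * DAYS_PER_MONTH  # AMI: one read per hour

increase = hourly_reads // monthly_reads
print(increase)  # 720
```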

In the digital audio world, the number of data points in each time interval is known as the sample rate.  Sample rate is one of two primary contributors to the quality of digital audio.  The other is known as bit depth, which describes how many unique values can be used to encode each sample.  The reason why 24-bit audio usually sounds better than 16-bit audio is that it is a truer representation of the original signal; less has been lost in translation.  The same concepts in video are known as frame rate and resolution, respectively.
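
To make bit depth concrete: an n-bit sample can take 2^n distinct values.  The short sketch below works that out for the two bit depths mentioned above (the function name is just for illustration):

```python
def unique_values(bit_depth):
    """Number of distinct values an integer sample of this bit depth can encode."""
    return 2 ** bit_depth

print(unique_values(16))  # 65536
print(unique_values(24))  # 16777216
```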

Resolution in water meter data

Water usage data streams similarly benefit from an increase in resolution.  In a time-series graph of water usage data, a higher sample rate leads to more granularity on the X-axis (representing time).  Meanwhile, a higher resolution leads to more granularity on the Y-axis (representing water volume).  AMI has received a lot of attention, but it's only part of the equation.  It's vital to have sufficient resolution as well for the data to be useful.

In the United States, most water meters measure volume in gallons or cubic feet.  While older water meters might have a resolution of 10 gallons, newer meters can have a resolution of 1 gallon or less.  (For the purposes of data analysis, it is not important whether meters record volume in gallons or cubic feet.)  A meter that can record at a finer level of granularity, meaning a smaller minimum increment, is considered to be higher resolution.  While smart meters tend to be newer than traditional meters, even smart meters aren't necessarily higher resolution, as the defining characteristic of smart meters is the higher sample rate (X-axis).
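
One way to picture resolution is as the step size of the meter's register.  The sketch below is a simplified model, not a description of any particular meter: it assumes the register advances only in whole steps and truncates partial steps, and the function name and volumes are hypothetical.

```python
import math

def register_reading(true_volume_gal, resolution_gal):
    """Value a register would show for a true cumulative volume.

    Assumption: the register advances only in whole steps of
    `resolution_gal` and never rounds a partial step up.
    """
    steps = math.floor(true_volume_gal / resolution_gal)
    return steps * resolution_gal

true_volume = 37.0  # hypothetical cumulative gallons through the meter

coarse = register_reading(true_volume, 10)  # 10-gallon resolution
fine = register_reading(true_volume, 1)     # 1-gallon resolution
print(coarse, fine)  # 30 37
```

Under this model, the coarser meter has not yet "seen" the last 7 gallons.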

Turning bits into pictures

With the explosion in the amount of data being produced by water infrastructure, data visualization becomes paramount for understanding.  One way that we can explore smart meter data visually is to plot the distribution of meter reads from a set of meters using a histogram, which places each meter read into a bin based on the water consumption recorded.  If the bin size is 10 gallons, then the first bin counts the meter reads that registered at least 0 but less than 10 gallons.  Here is a histogram showing the distribution of meter reads from one set of meters with a resolution of 10 gallons.

[Figure: histogram of meter reads from a set of meters with a resolution of 10 gallons]
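
The binning behind a histogram like this one can be sketched in a few lines.  This is an illustration only: the reads are made up, the function name is hypothetical, and it assumes non-negative reads and bins that include their lower edge but not their upper edge.

```python
from collections import Counter

def histogram(reads_gal, bin_size_gal=10):
    """Count reads per bin; bin k covers [k * bin_size, (k + 1) * bin_size)."""
    counts = Counter(int(r // bin_size_gal) for r in reads_gal)
    n_bins = max(counts) + 1 if counts else 0
    return [counts.get(k, 0) for k in range(n_bins)]

# Made-up hourly reads in gallons, for illustration only.
reads = [0, 0, 0, 3.7, 9.9, 10, 12.5, 20, 45]
print(histogram(reads))  # [5, 2, 1, 0, 1]
```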

For comparison, here is another set of meters with a resolution of 0.1 cubic feet (approximately 0.75 gallons), whose reads have been converted to gallons.

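[Figure: histogram of meter reads from a set of meters with a resolution of 0.1 cubic feet, converted to gallons]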
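
The conversion from cubic feet to gallons used for the second set is a single multiplication.  A minimal sketch, assuming US liquid gallons (defined as 231 cubic inches), with illustrative names:

```python
# A cubic foot is 1728 cubic inches; a US liquid gallon is 231 cubic inches.
GALLONS_PER_CUBIC_FOOT = 1728 / 231  # about 7.48

def cubic_feet_to_gallons(volume_cf):
    """Convert a volume in cubic feet to US liquid gallons."""
    return volume_cf * GALLONS_PER_CUBIC_FOOT

resolution_gal = cubic_feet_to_gallons(0.1)
print(round(resolution_gal, 3))  # 0.748
```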

We can compare the graphs to help identify what importance, if any, water meter resolution has in capturing usage data.  There are some differences between the two graphs.  The second graph has more meter reads in each bin, and also has a smaller decrease in meter reads from each bin to the next.  However, the overall trends shown in the two graphs are similar.  The majority of the meter reads are in the first bin, followed by a drop-off approximating a power curve and a long tail.  (The bins toward the right side of the graph actually contain small but non-zero numbers of meter reads.)  We might draw the conclusion that water meter resolution is not a significant factor in capturing and reflecting usage.

A closer look

And yet, there's a critical difference between the two data sets that's not evident from the graphs alone.  Consider the resolution of the first set of meters and look once again at the first graph.  These meters have a resolution of 10 gallons, which means that they will only record values that are multiples of 10: 0, 10, 20, 30, etc.  All of the meter reads in each bin have recorded the exact same volume.  The first bin contains only zero reads, the second bin contains reads of 10 gallons, etc.

The bin size in both histograms is the same.  The first bin counts meter reads with a volume of at least 0 but less than 10, the second bin counts meter reads with a volume of at least 10 but less than 20, and so on.  However, in the first graph, all of the meter reads in the first bin have a volume of exactly zero.  Meanwhile, in the second graph, the meter reads in the first bin are a mix of zero and non-zero.  We can explore this subtle, yet important, distinction between the two data sets.
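
To see the distinction in the first bin, we can split it into exact zeros and small positive reads.  The sketch below uses made-up reads for both sets, and the function name is hypothetical:

```python
def first_bin_breakdown(reads_gal, bin_size_gal=10):
    """Split the first bin [0, bin_size) into (exact zeros, small positives)."""
    first_bin = [r for r in reads_gal if 0 <= r < bin_size_gal]
    zeros = sum(1 for r in first_bin if r == 0)
    return zeros, len(first_bin) - zeros

# Made-up hourly reads in gallons, for illustration only.
coarse_reads = [0, 0, 0, 10, 0, 20]            # 10-gallon resolution
fine_reads = [0, 0.75, 2.24, 11.2, 0, 5.98]    # finer resolution

print(first_bin_breakdown(coarse_reads))  # (4, 0)
print(first_bin_breakdown(fine_reads))    # (2, 3)
```

Both first bins look like a single bar on a histogram, but only the breakdown shows how much of that bar is truly zero.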

The importance of zero

The number of meter reads with a volume of exactly zero turns out to be very relevant.  As it happens, in the first set of meters, 81.5% of the meter reads are exactly zero.  In the second set, 19.0% are exactly zero.  Assuming that consumption is similar between the two sets of meters, the relatively poor meter resolution in the first set is obscuring the actual consumption.  Consider, as an example, a customer who consumes a total of 10 gallons over a 24-hour period.  Using a meter from the second set, this consumption will result in between one and about 13 non-zero hourly data points (10 gallons spans roughly 13 increments of 0.1 cubic feet), depending on how much water was actually consumed during each hour.  However, using a meter from the first set, the same usage will typically register as exactly one non-zero hourly data point and 23 hourly data points of zero.  (Which data point is non-zero depends on the value of the meter at the beginning of the period.)  Even during periods of consumption, the meter will continue to record zero until its minimum resolution of 10 gallons has been met.  The meters in the first set present a distorted view of the true underlying consumption, due to the proliferation of zero reads that correspond to non-zero consumption.
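
The 24-hour example can be simulated directly.  This is a simplified sketch under stated assumptions: the usage pattern is invented, the register starts at zero, the finer resolution is approximated as 0.748 gallons, and volumes are held as integers in thousandths of a gallon to avoid floating-point surprises.  The function name is hypothetical.

```python
def hourly_reads(hourly_usage, resolution, start=0):
    """Hourly reads reported by a meter whose register moves in whole steps.

    All volumes are integers in the same unit (here, thousandths of a gallon).
    Each hourly read is the change in the displayed register over that hour.
    """
    reads = []
    true_total = start
    shown = (start // resolution) * resolution
    for usage in hourly_usage:
        true_total += usage
        new_shown = (true_total // resolution) * resolution
        reads.append(new_shown - shown)
        shown = new_shown
    return reads

# Invented day: 10 gallons used as 0.5 gallons in each of 20 hours, then 4 idle hours.
usage = [500] * 20 + [0] * 4

coarse = hourly_reads(usage, 10_000)  # 10-gallon resolution
fine = hourly_reads(usage, 748)       # roughly 0.1 cubic feet

coarse_zeros = sum(1 for r in coarse if r == 0)
fine_zeros = sum(1 for r in fine if r == 0)
print(coarse_zeros, fine_zeros)  # 23 11
```

In this invented day, the coarse meter reports 23 zero reads even though water was used in 20 of the 24 hours, while the finer meter reports 11.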

Zero reads are of particular interest for data analysts because they are taken to indicate non-consumption, not merely very low consumption.  While the two graphs shown above tell a similar story, comparing the percentage of reads that are zero in the two data sets reveals how much resolution matters.  We are also reminded that while data visualization can be a very powerful tool, it relies on the underlying data being accurate, and can distort reality if we are not careful with what is displayed and how.

Used effectively, the increase in utility meter data has enormous potential.  The higher the resolution of water meters, the more clearly we can detect signals that are otherwise invisible.  As resolution improves, we can all benefit from a better understanding of where and when our resources are being consumed.  So dim the lights, stretch out on the couch, and enjoy the stream of bits.