Challenges and Innovations: Current and Future States of Water Affordability


This is the first in a series of Valor Water Analytics blog posts exploring water affordability, customer nonpayment, and technology that can enable utilities to deliver water more equitably and sustainably to all customers. 

Where We Are: Assessment of Water Affordability Today

By Stacey Isaac Berahzer, Christine Boyle, PhD, editing by Maryana Pinchuk

Given the looming affordability crisis, new interventions are needed to help communities pay their water bills, while also helping utilities collect the water and wastewater fees needed to fund much-needed infrastructure upgrades. To date, a large volume of work on how to measure affordability has been undertaken, but what is sorely needed are programmatic strategies to help both households and utilities cope with the mounting costs of clean water provision.

This post (the first in a series on affordability) explores three foundational topics: defining affordability, measuring affordability, and funding customer assistance programs (CAPs). These are important topics for beginning to understand the current state of the conversation on water affordability, and they are relatively well-covered in scholarly and industry publications. In future posts, we will explore topics that have received less coverage but are also critical to tackling the growing affordability crisis: customer data, utility-customer communications, and novel interventions that may help utilities recover more revenue while assisting their most economically vulnerable customers.

Defining Affordability: What is “Affordable” Water?


The United Nations (UN) recognizes the basic human right to water and sanitation. In its Comment No. 15, the UN defines this right to water as the right “of everyone to sufficient, safe, acceptable and physically accessible and affordable water for personal and domestic uses.”

But what does it mean to say that water must be “affordable”? How to define affordability in the context of water service delivery is an ongoing discussion. There is some consensus emerging that affordability is not a universal term and should be defined at the local rather than the national level. In the latest version of its “Principles of Water Rates, Fees and Charges” (or M1) publication, the American Water Works Association (AWWA) includes a few updated chapters, including the “Low-Income Affordability Programs” chapter. The authors state: “Given variations in local economic conditions, compositions of the customer base, and community values, defining affordability must be done at the local level.” In another article, “Is Our Water Affordable?,” authors Jon Davis and Joe Crea corroborate this idea: “Any one-size-fits-all guidance on what constitutes affordable water service is going to be inappropriate when applied to most local considerations.”

In this view, affordability is not a top-down mandate with clear, universal success metrics to be achieved, but rather an ongoing conversation that takes into account the specifics of each water utility service area and its customers.

Measuring Affordability: Which Metrics Work Best?


Inherent in any definition of water affordability – at the local or national level – are metrics for benchmarking it. The UN offers some guidance on how water affordability should be measured. In 2002, the UN stated that to be affordable, water costs should not exceed 3% of household income, with the combined cost of water and sanitation not exceeding 5%. But there are many older guidelines for measuring and benchmarking water affordability within the United States. The most common way of measuring water affordability, and indeed the most maligned method, involves Median Household Income (MHI). In what may have been a series of unfortunate events, “4.5% of MHI” evolved into an infamous benchmark for affordability, where 2% is used on the wastewater side and the other 2.5% is attributed to water. This guidance is derived, in part, from the 1997 EPA report “Combined Sewer Overflows—Guidance for Financial Capability Assessment and Schedule Development.” The metric, known as the Residential Indicator, was developed only to assess wastewater systems’ ability to comply with federal regulation, but it somehow became a poor proxy for an affordability benchmark. The 2.5% MHI benchmark on the water side is similarly arbitrary, and its origins are shrouded in mystery.
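To make the arithmetic behind the “4.5% of MHI” rule of thumb concrete, here is a minimal sketch in Python. The function names and the dollar figures in the usage note are our own hypothetical illustrations, not values from the EPA guidance or from any utility; only the 2.5% and 2% thresholds come from the discussion above.

```python
# Thresholds from the "4.5% of MHI" rule of thumb discussed above.
WATER_BENCHMARK_PCT = 2.5
WASTEWATER_BENCHMARK_PCT = 2.0


def percent_of_income(annual_bill: float, annual_income: float) -> float:
    """Return an annual bill as a percentage of annual income."""
    if annual_income <= 0:
        raise ValueError("annual_income must be positive")
    return 100.0 * annual_bill / annual_income


def mhi_screen(water_bill: float, wastewater_bill: float, mhi: float) -> dict:
    """Compare annual bills against the MHI-based benchmarks.

    All inputs are annual dollar amounts; mhi is the community's
    median household income. This is a hypothetical helper for
    illustration, not an official calculation.
    """
    water_pct = percent_of_income(water_bill, mhi)
    wastewater_pct = percent_of_income(wastewater_bill, mhi)
    return {
        "water_pct": water_pct,
        "wastewater_pct": wastewater_pct,
        "combined_pct": water_pct + wastewater_pct,
        "water_ok": water_pct <= WATER_BENCHMARK_PCT,
        "wastewater_ok": wastewater_pct <= WASTEWATER_BENCHMARK_PCT,
    }
```

For example, with a made-up community where MHI is $50,000 and annual bills are $600 for water and $720 for wastewater, `mhi_screen(600, 720, 50_000)` reports shares of roughly 1.2% and 1.44%, which pass both benchmarks even though lower-income households in the same community may be struggling.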

Critics of using MHI to measure affordability point out that people below the “middle” income point in any community also need water. Therefore, measuring affordability at the median level hides the problems faced by customers in the lower income brackets, such as the lowest 20% of income earners in a community. Since these lower-income earners have less disposable income, it is more relevant to benchmark water affordability against their incomes. Alternatives to MHI that reflect this belief include the Affordability Ratio (AR) and Hours’ Labor at Minimum Wage (HM). The AR considers the cost of water and wastewater compared to essential household expenses such as taxes, housing, food, medicine, health care, and home energy. Jason Mumm and Julius Ciaccia offer another alternative to the Residential Indicator (RI) called the Weighted Average Residential Index, or WARi™. It is a calculation of the weighted average financial burdens across all income levels, in all census tracts in a given utility’s service area. Less crude approaches than the “4.5% of MHI” metric look at a spectrum of incomes in a community and the relative percentage of those incomes that customers are spending on water. The UNC Environmental Finance Center’s free Affordability Assessment Tool is a good example of putting this approach into practice.
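The critics' point is easy to see with a little arithmetic. The sketch below is our own simplified illustration, not the published AR, HM, or WARi methodology: it computes the same bill as a share of several income levels, and converts a monthly bill into hours of work at a given hourly wage, which is one plausible reading of an hours-of-labor metric. The function names and the figures in the usage note are hypothetical.

```python
def percent_of_income(annual_bill: float, annual_income: float) -> float:
    """Return an annual bill as a percentage of annual income."""
    if annual_income <= 0:
        raise ValueError("annual_income must be positive")
    return 100.0 * annual_bill / annual_income


def burden_by_income(annual_bill: float, incomes: dict) -> dict:
    """Map each labeled income level to the bill's share of that income.

    incomes is a dict such as {"lowest quintile": 15000, "median": 50000};
    the labels and values are supplied by the caller.
    """
    return {
        label: percent_of_income(annual_bill, income)
        for label, income in incomes.items()
    }


def hours_at_minimum_wage(monthly_bill: float, hourly_wage: float) -> float:
    """Hours of work at hourly_wage needed to pay one monthly bill.

    A simplified, assumed definition for illustration only.
    """
    if hourly_wage <= 0:
        raise ValueError("hourly_wage must be positive")
    return monthly_bill / hourly_wage
```

With a made-up combined bill of $1,320 per year, `burden_by_income(1320, {"lowest quintile": 15_000, "median": 50_000})` gives about 8.8% for the lower-income household versus about 2.6% at the median. Likewise, `hours_at_minimum_wage(110, 7.25)` (using the federal minimum wage of $7.25 per hour) comes to a little over 15 hours of work per month.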

Acting on Affordability: How to Transform Definitions and Measurement into Action?


Communities that have crossed the initial hurdles of locally defining and setting a threshold for affordability may be displeased with the (growing) number of their customers who cross this threshold. A natural response is to develop a customer assistance program (CAP) to help customers who fall below the accepted affordability threshold. But funding a CAP may be challenging. A 2017 report, “Navigating Legal Pathways to Rate-Funded Customer Assistance Programs,” found that deciding whether a utility’s primary revenue source – rates – can be used to fund a CAP is not a simple exercise. Most states have fuzzy language when it comes to whether a utility’s rate revenues can be used for a program like a CAP. Money from voluntary bill round-up programs, lease money from cell companies, and royalties from insurance on service lines all have a bright green light for CAP funding. Unfortunately, these methods tend to generate relatively small amounts of money. To develop a robust and effective CAP, a utility needs to access its rate revenues. There is mounting evidence, and a growing number of publications, suggesting that paying for a CAP through rate revenues ultimately helps a utility’s bottom line. States like Indiana have recently revised their statutory language to make it clear that utilities can include CAP development in the rate-setting process.


These three affordability topics – defining affordability, measuring affordability, and funding CAPs – have been the focus of much research and publication up to this point. While there is still room for more work in all of these areas, there are additional areas ripe for exploration, especially when it comes to how a utility can take advantage of new technologies around communicating with customers. Stay tuned – we will explore this in subsequent installments of this blog series.

Smoke on the Water: Valor Staff Tours California’s State Water Project

By: Maryana Pinchuk

Smoke and fire may have been in the air (literally) in California these past few weeks, but water is never far behind as a subject of concern for residents of the state. Earlier this month, while fires raged from Los Angeles to Sacramento, my colleague Renee and I accompanied staff from the Metropolitan Water District of Southern California, as well as other water utility staff and interested citizens from Southern California, on an inspection trip to learn more about the California State Water Project.

As Metropolitan Water District of Southern California Director Larry McKenney pointed out at the start of our trip, the state of California has the 5th largest economy globally (just ahead of Britain), and its productivity depends largely on the mostly water-scarce state’s ability to move water. The State Water Project is a system of dams, pumping stations, reservoirs, and aqueducts that conveys water from a small water-rich area in the northernmost part of the state to the dry but highly populous communities in the middle and south. The project is the largest provider of water and power in the state, and one of the largest in the world.

Sunset over the San Luis Reservoir, the fifth largest reservoir in the state.

This sophisticated system of water conveyance begins in the Feather River near Sacramento. Water from the river collects in Lake Oroville and passes through Oroville Dam before proceeding on to the Sacramento–San Joaquin River Delta. The water then travels down the California Aqueduct to the San Luis Reservoir, where it is pumped further south to meet the water needs of Southern California communities, including Los Angeles and Santa Barbara to the west (via the Castaic and Pyramid Lake reservoirs), and San Diego and Orange County to the east.

Pyramid Lake Reservoir, completed in 1973, is the deepest lake in the state. Here, water is held and conveyed to Castaic Lake Reservoir and from there supplies northwestern Los Angeles County.

The State Water Project may not exactly be the most well-known tourist attraction in the state, but it is the secret engine that powers some of the most iconic features of California, from the glitzy pools of Hollywood to the more modest groves of California almond trees – a crop that, like asparagus, melon, cotton, and other local cash crops, thrives in the dry and temperate Mediterranean-like climate of the Sacramento–San Joaquin River Delta.

Cotton growing in the Delta. We learned that California cotton is sold and prized worldwide for its high quality and even ends up in some products marketed as “Egyptian cotton”!

Joe Del Bosque discusses almond cultivation and shows us his trees

Almonds, we learned from longtime Delta resident and farmer Joe Del Bosque of Del Bosque Farms, are a cousin of the peach tree, and farmers have learned to graft almond saplings to the hardier peach roots, which are less susceptible to rotting in heavily irrigated soil. But the ingenuity of the Delta farming community is meeting its match in the precarious ecology of the Delta, where a system of levees built in the 1800s to turn marshland into farmland is beginning to show its age, and where soil erosion and earthquakes threaten the $50-billion-a-year agricultural business.

Over breakfast in the state capital, with the lingering smell of smoke providing an uncomfortable reminder of the increasing danger posed by climate change and extreme weather, we were shown a presentation about the challenges facing the Delta in the next 50 years. We watched a model simulation of the probable effects of a major earthquake – long overdue in the area – on water quality in the Delta. We all winced as the model showed the levees disintegrating and a cloud of salt water from the San Francisco Bay pumping steadily eastward hour by hour. According to the simulation, by the end of a week after the initial quake, all of Southern California’s water supply would be rendered non-potable.

Suisun Marsh, one of the few preserved tidal marshes that showcase how the Delta looked before it was transformed by agriculture.

To address the very real possibility that gradual (through levee erosion) or sudden (through a major quake) salinization may one day cripple the Delta leg of the State Water Project, the Metropolitan Water District of Southern California is proposing to create a set of tunnels through the area. This would ensure that fresh water could continue to be channeled through the Delta to consumers in the south, even if the Delta were flooded with brackish water. The proposal, called the Water Fix, has raised objections from some conservation groups that argue against diverting flow from the rivers in the area. However, others contend that the wildlife already struggling to thrive in the agriculturally dominated waterscape of the Delta needs not higher throughput in the rivers but other conservation practices – e.g., fish weirs and controlled flooding of fallow farmland to allow fish fry to mature in a predator-free environment before returning to the river system – that are not incompatible with the Water Fix.

A fish weir near Sacramento – during a major rain event, fish and water will be directed into this fallow field to mitigate flooding and provide a safe environment for fish fry to grow in.

We wrapped up our trip with a visit to the Jensen Water Treatment Plant, the last stage that State Water Project water passes through before being delivered to SoCal customers. In the hills to the north of the plant, the Los Angeles Aqueduct (not part of the State Water Project) delivers an additional supply of water from Mono Lake to the city of Los Angeles. As evidenced by the heated history of that water infrastructure project, culminating in the legendary California “Water Wars” depicted in the 1974 noir film Chinatown, controversies around water are far from new in this state. And yet, through over a century of conflict over water rights and allocation – as well as the additional issues posed more recently by increased water scarcity – California’s water infrastructure has continued to rise to the occasion and meet the ever-growing needs of the state and its residents. California’s water supply may seem precarious, but water utilities and their staff are certainly used to facing and overcoming challenges, and the successes of the past point to hope for the future.

Maryana and Renee from Valor at the Jensen Water Treatment Plant, with the Los Angeles Aqueduct in the background.

Watermark Month of Service

The Valor Water Analytics team participated in several events during Xylem Watermark’s Global Month of Service, during which Xylem employees around the world came together to volunteer in service of their respective communities.

Beach Cleanup 10.28.18

The Valor Water Analytics team in San Francisco participated in their very first Watermark volunteer event, a beach cleanup organized by the Surfrider Foundation SF Chapter. The organization’s mission is to protect oceans and beaches through a powerful activist network. They organize cleanup events on a regular basis and raise community awareness around reducing pollution in beach habitats. The Surfrider Foundation operates in several cities in California, as well as in other coastal areas across the United States, with 81 chapters in 10 regions.

It was wonderful to see a huge turnout at the cleanup event from Xylem employees, local school groups, and others in the community. There were more people than buckets for collecting garbage, and we noticed volunteers of all age groups, from toddlers to elderly people. The Valor team was very successful in their search for trash and collected items including cigarette butts, Styrofoam, and beer bottle caps. Even a fork and spoon ended up in one of the buckets! It was a great experience and a huge inspiration for the team to participate more regularly in such events.

You can find more information about the Surfrider Foundation and their mission on their website:


Watermark 101 10.29.18

The Valor team is geared up and ready to give back! We held a Watermark 101 lunch and learn as part of the October Month of Service (MOS) events to kick off Valor’s involvement with Watermark. The team learned more about the Watermark vision and how to get involved, and brainstormed events for the upcoming year. We wrapped up with a competitive game of Watermark Trivia! Topics included our local water system, Xylem and Watermark, the state of our water infrastructure, and the national water investment gap. Not surprisingly, our winners were Valor founder Christine Boyle and Valor veteran Renee Jutras. The rest of the team will be studying hard for round two in the future!


Halloween 10.31.18

This year’s Halloween theme was water. It was time to put on our thinking caps and get creative. Here’s how the Valor team did!

We had a rain cloud, a sea monkey, Dory, a dead meter, The Great Pacific Garbage Patch, a mermaid, and Leonardo from Teenage Mutant Ninja Turtles.

And the winners...

Most On Xylem Message: TGPGP

Most "One of Those Days" Feeling: Rain Cloud

Most "Sharkey Kids Favorite": Dory

Most "Best Sea Friends Experience": Mermaid + Sea Monkey

Most "Can I Borrow That Costume": Leonardo

Overall Winner: Anomalous Zombie Meter


Apparent Water Loss, Optimized Vision, and Entrepreneurship: Q&A with Valor Founder and CEO Dr. Christine Boyle

By Elizabeth Harvell of UNC Environmental Finance Center

Earlier this year, Valor Water Analytics (Valor) was acquired by Xylem Inc., a $13B water technology company that services utility and commercial clients across 150 countries. While this is big news in its own right within the water industry, it’s especially exciting for the Environmental Finance Center: Valor Founder and CEO Dr. Christine Boyle previously worked as a research assistant at the EFC while pursuing her doctorate in water resource planning from the University of North Carolina at Chapel Hill.

Read the full interview on the UNC EFC Blog.

Valor Water Analytics Acquired by Water Giant Xylem


We are excited to announce that Valor Water Analytics (Valor) was recently acquired by industry leader Xylem Inc (NYSE: XYL). Xylem is a $13B water technology company that services utility and commercial clients across 150 countries.

Dr. Christine Boyle founded Valor in 2013 with a mission to bring big data solutions to water utilities in order to improve their financial and water resource sustainability. To accomplish this, Valor created a suite of world-class software products. Valor’s products are now deployed in ten states across the USA, including notable utilities such as American Water and Suez. Its “Hidden Revenue Locator” product is widely recognized as a best-in-class technology for automated loss detection. The company remains committed to integrating its technology with all meters across the US and beyond. Valor will now execute on this ambitious vision under the Xylem umbrella.

The alignment of Valor and Xylem in product and vision made this acquisition the right strategy for Valor’s next stage of growth. Under Xylem, Team Valor continues and will spearhead Xylem’s Silicon Valley branch and lead Xylem’s advanced data science initiatives. Valor’s product lines will join Xylem’s existing suite of advanced analytics products. This exit demonstrates the value of building an innovative water technology that brings measurable value to the water sector.

Valor had previously raised $2.8M from investors such as the Urban Innovation Fund, Y Combinator, 500 Startups, Apsara, Hydro Venture Partners, Shore Ventures, Syzygy, and Matadero Ventures. These investors supported this exit and are excited for the next chapter of Valor.

Valor is looking forward to solving the world’s water issues as part of Xylem’s world-class team of dedicated water professionals.

Water utility websites: Moving from good to great

Jordan Schuster (Intern @ Valor) and Janani Mohanakrishnan (Chief Delivery & Product Officer @ Valor)

In the modern world, there is one thing as ubiquitous to Americans as water: the internet. The time that U.S. users spend on websites and mobile applications continues to grow, and standards for visually appealing and informative content are higher than ever before. A great website can even be the deciding factor when a customer is choosing between competitors. When it comes to water, though, customers rarely have a choice. Therefore, many water utilities have little incentive to provide a great customer experience through their websites. A recent study by J.D. Power noted that water utilities lag behind gas and electric utilities in customer satisfaction with their websites, particularly when it comes to updating service, logging in to an account, making a payment, and reviewing account information. This got us chatting internally at Valor, and we decided to do a quick study of twenty-five water utility websites to assess the current state and develop our recommendations for a great customer experience.


We identified twenty-five water utilities in total across the U.S. The distribution by region is presented below. Within each region, there was a mix of small and large sized utilities, and a mix of public and private utilities.


We assessed these utility websites on five key performance metrics – Clear Bill Pay Access, Helpful Menus, Visual Appeal, Water Efficiency Content, and Report a Problem. Success was defined as the ability to either access the relevant content or gain the impression associated with the metric within five seconds of opening the website.


In general, the quality of the water utility websites we examined was pretty good. The table below summarizes how the websites performed against our metrics. No website failed on all performance metrics. An unexpected finding was that over a quarter of the websites did not have the Report a Problem feature clearly listed. This is interesting since it would limit customer engagement with the website, and also prevent utility companies from receiving quick notice of major leaks or other customer issues. It is possible that utilities provide this information to their customers through other communication channels; this was beyond our scope to investigate. Another interesting finding was that not all utilities had water efficiency measures or conservation messaging prominently in the menu, or somewhere on the main page, where users would be inspired to learn more. This was especially lacking among the Northeastern utilities studied. It raises the question: Should we expect water awareness content from our utilities ALL the time, or only when there is drought?



How can water utility websites move from good to great? While the current state of utility websites is acceptable, we predict that more customers will want a great experience every time they access the utility website. With the plethora of talent in web design, and the reasonable costs for web engineers, there really is no excuse for not keeping websites up to date and customer-centric.

Here are a few features that would take water utility websites to the next level.

  1. Mobile App page – to download Bill Pay and Account Management Apps

  2. Text and Tweet links – for Alerts, Service Issues, and Feedback

  3. Photo and Video clips – that provoke Thought, and convey Customer Appreciation

  4. Multiple Language Translator functionality – for non-native English speakers

  5. Single sign on for all utilities that the customer uses – e.g. water, gas, electric


Check out your water utility’s website and mobile applications, and send them a quick message with your feedback. Let’s help our water utilities create websites that get us the right level of content whenever we want it. Water is earth’s most precious resource, and water utilities and their customers need to work hand in hand to move towards a better, wetter future.

Customer-Side Water Leaks: Achieving the best outcome

By Callie Smith (Winter Intern @ Valor Water Analytics) and Janani Mohanakrishnan (Chief Delivery & Product Officer @ Valor Water Analytics)

The average city water utility in the United States loses up to 30% of its water through leaks and unbilled usage, according to Navigant Research. Many of these leaks are on the customer side. Given this high annual water and revenue loss, why do some leaks go unfixed even when they’re identified? What additional role can utilities play to achieve the best outcome of no, or very few, customer leaks?


Let’s first consider the system in which we’re operating. Water leaks can occur on the utility side (transmission and distribution networks) or on the customer side. Water utilities are typically responsible for any leaks leading up to the retail water meter (assuming the water meter is curbside), whereas customers are responsible for leaks that occur within the bounds of their property. Accountability for customer-side leaks becomes more convoluted when factoring in the customer classification, e.g. multi-family residential buildings or mobile homes. The duties of owners and tenants can vary by state, and lease contracts are highly variable.

Split ownership can interfere with the resolution of known issues, as exemplified by a water quality issue that occurred in Providence, Rhode Island, last summer. Prompted by the lead water crisis in Flint, Michigan, North Providence & Providence Water announced last summer that they would reduce the number of lead water lines serving residential properties. The water utility was responsible for replacing lead lines from the water main in the street to the curb, but was not allowed to spend ratepayer funds on private property. The North Providence program was set up to finance replacement of the pipes from the curb to the water meter. As the program was unable to pay for any old plumbing ‘behind the water meter,’ residents were faced with potentially paying for customer-side lead service line replacements and any other internal lead fixtures, a cost ranging from $3,000 to $10,000, as estimated by the EPA. Many customers could not pay these amounts, and since partial lead replacements could actually increase levels of lead contamination in the water, the Providence Water Board halted the lead pipe replacement program.


So how can we bridge this ownership gap when it comes to customer leaks? There appear to be three main reasons why customer leaks continue unchecked: (a) lack of leak data analysis and notifications, (b) lack of information on customer responsibilities (especially in apartment/landlord/tenant cases), and (c) lack of financial programs or incentives from water utilities or cities for quick resolution.

Many utilities conduct customer leak analysis – either in house or through an external partner. Leak information, once available, needs to be packaged in a way that inspires customers to action. Valor Water Analytics provides leak detection through our Hidden Revenue Locator solution for 10+ clients across the USA. We notice that more than 50% of customer leaks self-resolve in a day or two, and have determined that it is more valuable for customers to be alerted only of ‘longer drips’ or ‘major leaks.’ In addition to the number of notifications, the mode of communication plays an important role in inspiring action. The average water utility customer still likes to be notified of leaks via phone.
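As a rough illustration of the “alert only on longer drips or major leaks” idea, here is a minimal Python sketch. It is not how Hidden Revenue Locator works; the `LeakEvent` fields, the function names, and both thresholds are hypothetical values we picked purely for illustration.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for this example.
MIN_DURATION_DAYS = 3.0   # leaks shorter than this often self-resolve
MAJOR_LEAK_GPD = 500.0    # gallons per day treated as a "major leak"


@dataclass
class LeakEvent:
    account_id: str
    duration_days: float    # how long the suspected leak has persisted
    gallons_per_day: float  # estimated leak rate


def should_notify(event: LeakEvent) -> bool:
    """Notify only for long-running drips or high-volume leaks."""
    is_long_drip = event.duration_days >= MIN_DURATION_DAYS
    is_major = event.gallons_per_day >= MAJOR_LEAK_GPD
    return is_long_drip or is_major


def leaks_to_notify(events: list) -> list:
    """Filter a list of LeakEvent objects down to those worth an alert."""
    return [e for e in events if should_notify(e)]
```

Under these assumed thresholds, a one-day, 40-gallon-per-day drip would be left alone, while the same drip lasting a week, or a one-day leak of 800 gallons per day, would trigger a notification.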

East Bay Municipal Utility District conducted a social study to test customer response to a home water report service that integrated water use data, norm-based evaluations, and educational suggestions about water use. The study found that those using the customer portal were more likely to conserve water and participate in the municipality’s rebate and audit programs. The study did not specifically consider customer response to leaks; however, it did find that more digestible notifications and evaluations resulted in improved customer action.

Since utilities have intimate knowledge of customer water consumption, utilities could easily provide information on customer responsibilities, while sending out leak notifications. Let’s say a major leak was detected from water meter readings for a rented home. Responsibility between the tenant and landlord depends on state laws and lease agreements, a fact that could be mentioned to the resident upon notification of the leak. The utility could also offer information on leak severity (in gallons and dollars), and a survey for the resident to narrow down the potential location of the leak within their property. Additionally, utilities could provide customers with a list of plumbers, typical costs, and information on any financial programs applicable to their situation. Water utilities with conservation targets have taken the lead in providing some of these supplementary measures for customer-side leaks. It will be interesting to see if this becomes a standard for all water utilities in the future.


In conclusion, there is a multitude of factors – economic, environmental, social – that influence how customers respond to water issues. Knowledge is the first step towards addressing these issues; however, policy changes and a shift in our thinking around water use can also be a great help in reducing customer-side leaks. Valor Water Analytics is supporting this goal by partnering with utilities to provide information to residents in a way that will incentivize people to take action on their customer-side leaks and save them from wasting a precious resource.

Why Eliminating Ambiguity in Your Data Matters

By David Wegman, CTO @ Valor

The next time you strike up a conversation with your friendly neighborhood computer, take note of how long it takes before you get frustrated.  Despite the advances in artificial intelligence over the past decades -- and despite the incredible capacity humans have for adaptation -- human-computer interaction is unnatural (from the human perspective, anyway).  Every touch point where people provide input to computers, or receive output from computers, is an opportunity for misunderstanding.


Even as our systems are getting smarter all the time, there are some simple steps we can take to eliminate ambiguity.  Data architects serve an important role, helping to ensure that information is not lost or garbled in translation.  These techniques are essentially an investment.  Every minute spent on avoiding problems up front can save much more time down the road when things aren't working properly.


Date formats


Which came first, 3/7/2017 or 5/4/2017?  The answer depends on where you are in the world when asking the question.  In the United States, dates are commonly represented as month/day/year, so these dates would usually be interpreted as March 7 and May 4, respectively.  In many other countries, dates are represented as day/month/year, so they would be interpreted as July 3 and April 5, respectively.
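To see the ambiguity in action, here is a minimal sketch using Python's standard `datetime` module. The helper name `parse_both` is our own; the two format strings are the standard `strptime` codes for month/day/year and day/month/year.

```python
from datetime import date, datetime


def parse_both(raw: str) -> tuple:
    """Parse the same string as month/day/year and as day/month/year."""
    as_us = datetime.strptime(raw, "%m/%d/%Y").date()
    as_intl = datetime.strptime(raw, "%d/%m/%Y").date()
    return as_us, as_intl


first_us, first_intl = parse_both("3/7/2017")
second_us, second_intl = parse_both("5/4/2017")

# Month-first reading: March 7 comes before May 4.
us_order_holds = first_us < second_us

# Day-first reading: July 3 comes after April 5, so the order flips.
intl_order_holds = first_intl < second_intl
```

Neither call raises an error, which is exactly the problem: both readings are valid dates, so nothing flags the wrong one.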


This becomes problematic when a data file that includes date information is read by a computer system.  Each time the system encounters a field known to be a date, it must decide how to interpret the information.  Fortunately, most modern systems allow us to choose the format used for inputting and outputting dates.  However, if the date format is not chosen carefully, it can result in one of the most pernicious types of errors in computer systems: one which does not raise a flag immediately and lies dormant for some time.  A date which is incorrectly interpreted can result in a myriad of problems, as was widely publicized at the end of the last century.


Given enough data points, it may eventually become clear whether dates have been written starting with the month or day (e.g. if one of the values is 3/15/2017, the format cannot be day/month/year because 15 cannot refer to the month, so the format is probably month/day/year).  This approach is suboptimal because it requires an additional step which is not guaranteed to work properly in all cases.  A better approach is to avoid the problem altogether by taking care when choosing a date format.
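
The guessing approach described above might look something like the following sketch.  The function name `guess_order` and its return labels are hypothetical, and it assumes every value has the shape number/number/year.  Note that it returns `None` when the sample cannot settle the question, which is the weakness of relying on inference.

```python
def guess_order(values):
    """Return 'MDY', 'DMY', or None when the sample cannot decide."""
    first_over_12 = False
    second_over_12 = False
    for value in values:
        first, second, _year = (int(part) for part in value.split("/"))
        # A component larger than 12 cannot be a month.
        first_over_12 = first_over_12 or first > 12
        second_over_12 = second_over_12 or second > 12
    if second_over_12 and not first_over_12:
        return "MDY"
    if first_over_12 and not second_over_12:
        return "DMY"
    return None  # ambiguous (or contradictory) sample


print(guess_order(["3/7/2017", "3/15/2017"]))  # one value settles it
print(guess_order(["3/7/2017", "5/4/2017"]))   # still ambiguous
```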


To eliminate ambiguity when working with dates, when possible, use the format YYYY/MM/DD.  This represents a four-digit year, followed by a two-digit month, followed by a two-digit day.  March 7, 2017 would be represented as 2017/03/07.  This format is widely understood and eliminates the ambiguity that can occur when the year appears at the end.
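
Here is a minimal sketch of writing dates in this format with Python's standard library (the sample dates are illustrative).  A side benefit, not mentioned above but easy to verify, is that year-first text sorts chronologically even when treated as plain strings.

```python
from datetime import date

dates = [date(2017, 5, 4), date(2017, 3, 7), date(2016, 12, 31)]

# %Y = four-digit year, %m = two-digit month, %d = two-digit day.
stamps = [d.strftime("%Y/%m/%d") for d in dates]
print(stamps)

# Sorting the text gives the same order as sorting the real dates.
chronological = [d.strftime("%Y/%m/%d") for d in sorted(dates)]
print(sorted(stamps) == chronological)
```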


Field delimiters


A common method for storing tabular data is in CSV (comma-separated values) format.  In a CSV file, each line contains one row of a table.  Within each line, a delimiter character appears in between each value, demarcating the columns.  The delimiter character is usually a comma or a tab.


A problem can arise when one of the values that needs to be stored contains the delimiter character.  For example, a person's name may contain a comma (e.g. "Martin Luther King, Jr.").  In this situation, a line containing this value will contain an extra delimiter character.  Software which treats each occurrence of the delimiter as a new column may be confused by the fact that the number of columns is inconsistent from one line to another.
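
To make the problem concrete, the sketch below splits a few lines on every comma, the way naive software might.  The column names and numbers are invented for illustration.

```python
lines = [
    "name,gallons",
    "Sally,350",
    "Martin Luther King, Jr.,200",
]

# Treat every comma as a column boundary, as a naive parser would.
column_counts = [len(line.split(",")) for line in lines]
print(column_counts)  # the last line has one column too many
```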


One strategy is to choose a delimiter character which does not appear in any of the values.  This technique can help minimize problems, but it is not guaranteed to avoid them completely as new data files are created in the future.  A better approach is to wrap values that may contain delimiter characters in double quotes, and to ensure that literal double quote characters are specifically labeled (or "escaped," in programmer speak).  The escaping convention depends on the CSV dialect: some tools use a backslash character, while RFC 4180 and many spreadsheet programs double the quote character instead.  Whichever convention is chosen, writing and reading the file with the same one ensures that the data file will be parseable regardless of the data that needs to be stored.
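
The quoting approach can be sketched with Python's built-in `csv` module.  This example assumes the backslash convention and sets it explicitly with `doublequote=False` and `escapechar="\\"`, because the module's default is to double the quote character instead.  The sample rows are made up; the point is that values containing commas and quotes survive a round trip unchanged.

```python
import csv
import io

rows = [
    ["name", "note"],
    ["Martin Luther King, Jr.", 'He said "hello"'],
]

# Write: quote every value and escape literal quotes with a backslash.
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL,
                    doublequote=False, escapechar="\\")
writer.writerows(rows)
text = buffer.getvalue()
print(text)

# Read back with the same settings and confirm nothing was lost.
reader = csv.reader(io.StringIO(text), doublequote=False, escapechar="\\")
parsed = list(reader)
print(parsed == rows)
```

The reader must be told the same convention the writer used; mixing conventions is itself a source of parsing errors.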




Units of measure


Sally's water meter recorded 350 gallons of water used.  John's recorded 200 cubic feet of water used.  Who used more water?  This is a question with a simple answer (John did).  But what if the units were not specified?  If all we know is that Sally used 350 and John used 200, we might decide that Sally used more, but only if we first assume that their meters record water using the same units.  Even if that assumption is correct, if we don't know the units, we won't be able to bill properly for the water or compare the quantity to an amount stored in other systems.


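Quantitative values (i.e., measurements) should always have units specified.  When preparing a data file, you can provide information about units as a separate field.  For example, Sally's record could carry a usage value of 350 alongside a units value of "gallons," and John's record a usage value of 200 alongside a units value of "cubic feet."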


Alternatively, units can be provided in documentation which accompanies the data.  One advantage of including units information inline in the data is that anyone with the data will automatically have units information, even if the documentation is not accessible.  Another benefit is that the units can vary from one record to another, as in the earlier example of two different people whose water meters recorded in different units.  However, in some cases it may not be practical to provide units inline, and good documentation can help fill this gap.
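
As a rough sketch of the inline approach, the records below carry their own units, so values can be converted to a common unit before they are compared.  The field names (`customer`, `usage`, `units`) and the lookup table are assumptions made for this example; the conversion factor is the standard one of about 7.48052 US gallons per cubic foot.

```python
# Conversion factors to US gallons (assumed lookup table for this sketch).
GALLONS_PER_UNIT = {"gallons": 1.0, "cubic_feet": 7.48052}

records = [
    {"customer": "Sally", "usage": 350, "units": "gallons"},
    {"customer": "John", "usage": 200, "units": "cubic_feet"},
]


def to_gallons(record):
    """Convert a record's usage to gallons using its own units field."""
    return record["usage"] * GALLONS_PER_UNIT[record["units"]]


for record in records:
    print(record["customer"], round(to_gallons(record), 1), "gallons")

heaviest = max(records, key=to_gallons)
print(heaviest["customer"], "used more water")
```

An unknown unit raises a `KeyError` here rather than being silently treated as gallons, which fits the goal of surfacing problems early.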


Keep clear and carry on


Data parsing errors are not unusual.  However, with a small investment they can be minimized.  By avoiding common data pitfalls and making the right choices at the outset, you will eliminate unnecessary troubleshooting and set yourself up for success.

Valor Water Analytics Intern Blog: Krishna Rao

Hi, I'm Krishna!

I am an environmental fluid mechanics and hydrology engineer currently pursuing my master's degree at Stanford University. I work at the intersection of data science and water hydraulics to create intelligent statistical models. Alongside my coursework, I also pursue research in eco-hydrology remote sensing as a research assistant. Thanks to the long commute to work, I am catching up on my reading. I am currently reading Lab Girl by Hope Jahren.