Cyber Security and the 4th Annual Water Data Summit: 5 Takeaways on Open Water Data

By: Victor Miao, Software Engineer

Water’s greatest minds came together from all over the world at the 4th Annual Water Data Summit on August 22 and 23 at the University of California, Davis. Hosted by the California Data Collaborative, the summit brought together water experts from all sectors to discuss and collaborate on water data. With California’s AB 1755 (the Open and Transparent Water Data Act) as one of the main themes, speakers addressed the current state of water data and its challenges. Here are some of the main takeaways.

Summit panelists discuss economic impacts of irrigation reduction

Open Data

California Governor Gavin Newsom’s Citizenville describes one example outcome of open data: a single programmer’s mapping of public police data led directly to reform and behavioral changes. Similarly, one speaker cited public road data as the initial spark for Google Maps. Hoping to inspire similar ingenuity, many water experts believe that open water data will empower people to make better-informed decisions regarding sustainable water management.

Jesper Elkjær Christensen, Senior Advisor of the Water Technology Alliance of the Consulate General of Denmark, presented an enlightening case study on the benefits of open water data in Denmark. Public, private, and academic organizations have long collaborated on public groundwater data, mapping groundwater since 1999 and creating other tools to help guide important water decisions. Jesper’s conclusion from Denmark’s success: “make water data useful, public, standardized, and collaborative”.

Denmark’s publicly available and interactive groundwater maps

Establish Trust (in the Internet of Water)

The Internet of Water (IoW) is a project started at Duke University, designed to enable open water data to help guide sustainable water management. One example goal is the ability to look up local water quality on Google. According to IoW’s co-founder and panelist Martin Doyle, most of the breakthrough water technology already exists in advanced metering infrastructure (AMI) and improved water sensors. The remaining challenge is making these technologies the industry standard, since many organizations still maintain legacy infrastructure and data.

On the same IoW panel and from a different perspective, Deven Upadhyay represented the Metropolitan Water District of Southern California. He discussed his vision of publicly accessible and trusted water data for every locale. However, he cited trust and cyber security as the most important challenges for an open data platform. Cyber security, as another main summit topic, is covered in more detail in the section below.

No Utility Left Behind

One recurring concern at the conference was that small water utilities would not be able to comply with proposed open water data policies, much less be well-equipped to protect their data against cyber attacks. Hoping to address such issues, the Aspen Institute Dialogue Series convened in 2017, acting as a neutral space. The Dialogue gathered a diverse group from the public and private sectors, including water experts and academics, to discuss national policy, water data, and sustainable water management. Accordingly, IoW seeks only voluntary participation in producing open data, rather than trying to influence public policy on open data.

Water Data Needs Work

“Water and data have not been married for long,” stated one presenter. It is a fledgling field that must adopt existing and newly available technology to become robust, secure, and useful. Surprisingly, 2019 is the first year that all California water utilities were able to provide aggregate monthly water usage, and only as a direct result of an emergency measure due to California’s 2011-2017 drought.

Joaquin Esquivel, Chair of the California State Water Board, presented similar findings at the state level. Citing both his current position and his previous position as Director of Information and Technology under California Senator Barbara Boxer, he acknowledged that while California is progressive, its data infrastructure still needs much work.

Open Data Works

Just as open data has helped empower everyday people to innovate, open source tools and software aim to do similar good. The California Data Collaborative hosts a plethora of open source tools and software on their GitHub, such as an evapotranspiration estimator, a real-time snow water estimator, and a real-time reservoir visualization.

Similarly, many other organizations presented open source projects or research on open data:

  • Sacramento State’s open source tool for mapping groundwater quality, designed to help water managers identify disadvantaged communities with contaminated groundwater

  • One Stanford University PhD candidate’s research on Google search trends appears to show a correlation between media coverage of the 2011-2017 California drought and state-wide voluntary water usage reduction. Mandatory water restrictions seemed to correlate less with water conservation.

  • University of California, Irvine presented a disturbing recent trend of increasingly unsafe water in rural and low-income areas, especially in Oklahoma and Texas. In 2015, 9% of water systems violated the Safe Drinking Water Act, affecting 21 million people.

  • FlowWest showcased their open source software such as a salmon life cycle model, supporting the efforts of the Central Valley Project Improvement Act to protect fish and wildlife.

Cyber Security

Online privacy and security have become increasingly important topics in recent years, and for good reason. Data breaches and ransomware seem to occur far too often. However, rising public awareness should hold agencies more accountable for our private and sensitive data. Indeed, cyber security cemented itself as a keystone topic at the 4th Annual Water Data Summit. This year, speakers at the summit helped us better understand the full stack of data security.

Your data is in good hands…

Despite, or perhaps because of, the recent prevalence and awareness of data breaches, many organizations have been actively preparing for and defending against the worst. Listed here are a few cyber security best-practices brought up during the summit.

Cyber Security panel. Pictured (L-R): William Johnson, David Wegman, Rocky Smith


William Johnson, Information System Division Manager at East Bay Municipal Utility District (EBMUD), outlined their best practices as a business-to-consumer entity that uses consumer data:

  • Defend against infiltration: EBMUD regularly performs penetration assessments both internally as well as through third party auditors, in order to find and fortify any weak points.

  • Defend against exfiltration: employees must abide by clear protocols (e.g. no sending cleartext via email, correctly storing and transferring data, etc.) and participate in random phishing tests that automatically enroll them into anti-phishing programs if they fail.

  • Cloud security: EBMUD employs a cloud-first approach, knowing that Amazon and other cloud providers often employ the best specialist teams in cyber security that other organizations generally cannot match with on-site solutions.

David Wegman, our very own CTO at Valor Water Analytics (a Xylem brand), presented Valor’s guidelines as a business-to-business agency that receives data from other businesses:

  • Minimize data: sometimes, clients send Valor extraneous data and personally identifiable information that we neither want nor ask for. Valor does not store this data, so that there is no possibility of such sensitive information leaking.

  • Grant as little access as necessary: users are only given access to what they need, and in the form of temporary access tokens that expire shortly afterwards. This reduces the damage of a potential breach by forcibly limiting its duration.

  • Layering approach: in addition to minimizing data, blocking unwanted visitors with a firewall, and granting only temporary access, Valor encrypts sensitive data so that it cannot be traced back to its origins. This layered security addresses the worst case scenario, lessening the impact of any stage of a potential breach.

  • Cloud security: Valor shares EBMUD’s sentiments on the cloud, understanding that an on-site solution would likely be less secure and more difficult to maintain.
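
The "grant as little access as necessary" guideline above can be illustrated with a short-lived, scoped access token. This is a minimal sketch, not Valor's implementation: the token format, the `issue_token` and `is_valid` helpers, the five-minute default lifetime, and the scope names are all assumptions made for illustration.

```python
import hashlib
import hmac
import secrets
import time

# Hypothetical signing key; a real deployment would load it from a secrets manager.
SIGNING_KEY = secrets.token_bytes(32)


def _sign(payload):
    """Return the hex HMAC-SHA256 signature of the payload string."""
    return hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(user, scope, ttl_seconds=300, now=None):
    """Return a signed token granting `user` access to one `scope` until it expires."""
    issued_at = time.time() if now is None else now
    expires_at = int(issued_at + ttl_seconds)
    payload = f"{user}|{scope}|{expires_at}"
    return f"{payload}|{_sign(payload)}"


def is_valid(token, required_scope, now=None):
    """Accept a token only if it is untampered, correctly scoped, and unexpired."""
    try:
        user, scope, expires_at, signature = token.split("|")
    except ValueError:
        return False  # wrong number of fields
    expected = _sign(f"{user}|{scope}|{expires_at}")
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False
    if scope != required_scope:
        return False
    current = time.time() if now is None else now
    return current < int(expires_at)
```

A token issued with `issue_token("analyst", "read:meter-data")` stops validating five minutes later, so a stolen token is only useful for a short window and only for the single scope it names.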

Rocky Smith, Business Solutions Architect at Cisco, and Internet of Things (IoT) expert, stated, “You have not been attacked yet, are being attacked, or in the aftermath of an attack”. With this mindset, Cisco prepares for every scenario, designing the best possible outcome.

  • Establish perimeters everywhere: place firewalls and permission barriers liberally, even between internal tools, to contain potential breaches.

  • Minimize data: even if IoT or other connected devices are compromised, they should not have any sensitive data to leak, only anonymous or useless strings of numbers.

  • Properly back up data: ransomware only has leverage against an organization when it holds something valuable hostage. With proper backups, an agency protects itself by being able to recover its valuable data without complying with a malicious entity’s demands.

...as long as it is prioritized

Given these industry-standard practices, one might ask “why are there breaches at all?”. Frankly, organizations need to understand and prioritize security in the first place to even have these measures in place. Instead, some organizations may be far too small, too large and slow-moving, or simply unaware of security threats. Fortunately for everyone, we do have some brilliant individuals and unbiased organizations hoping to tackle some of these issues.

Related Links

California Data Collaborative

http://californiadatacollaborative.org/

https://github.com/California-Data-Collaborative

Internet of Water

https://internetofwater.org/

Aspen Dialogue on Sustainable Water Infrastructure

https://www.aspeninstitute.org/programs/energy-and-environment-program-3/water/

Denmark’s Groundwater Maps and Data

https://eng.geus.dk/products-services-facilities/data-and-maps/groundwater-maps-and-data/ 

Sacramento State - California Groundwater Contamination Risk Index

https://www.owp.csus.edu/grid/

FlowWest

http://www.flowwest.com/

https://github.com/FlowWest


Getting to Know Our Summer Interns: Introducing Jordan

This is part 3 of our intern introduction blog posts. You can read part 1 here, and part 2 here.

By: Jordan Manthey

I’m a 21-year-old from Tampa, Florida, going into my 4th year as a UC Berkeley undergraduate student, studying computer science and data science. Growing up along the Gulf of Mexico gave me a conscious appreciation for water and consequently sparked my interest in Xylem and its mission to Solve Water. My experience as a software engineer intern here has allowed me to apply the skills I’ve learned at university to real-world problems. Though I feel that my classes have prepared me well for a career in software development, there are still several gaps between academia and professional work, specifically around working in a team environment. Valor has done a phenomenal job bridging these gaps for me this summer, and here I’ll highlight some of my key learning experiences as an aspiring developer.

JIRA

At best, my computer science classes have permitted one partner on projects, a restriction meant to ensure academic integrity for all students. These restrictions no longer apply in the workplace, where I work with a full team of software engineers. I’ve found it incredibly useful to ask questions, take in suggestions, and learn from other people’s code in order to facilitate my own work. Valor uses JIRA, an issue and project tracking tool, which reinforces this type of collaboration by allowing team members to make comments and share code snippets on any task. Because JIRA shows which people are assigned to each problem, it has been perfect for showing whom to ask for help when needed. I have learned that this tool is critical for software development and can greatly increase team productivity when used correctly.

DOCKER

I also quickly found it necessary to make sure my own projects are compatible with the company’s existing software and work properly on each team member’s machine. This problem has always been handled for me or completely overlooked in school, so I was intrigued to learn how Valor standardizes its software. Here we use Docker Engine, a software tool that deploys virtual containers to provide each team member with the same development environment across platforms. It’s very powerful knowing that the team will be able to run my code as expected when I work within the Docker container, and vice versa.

UNIT TESTING

I’ve had experience creating unit tests for my academic projects to ensure my code behaves as expected, but this has always been an after-the-fact exercise for my own sanity. Throughout my internship, I’ve experienced the benefits of writing unit tests before beginning the implementation process. Real-world problems are typically not as clear-cut, and it can save significant time to consider the end goal of your program before you even get started. This method also makes debugging sessions easier, since you already have proper test suites to trace back any issues. Unit testing serves a greater purpose when working within a development team: it not only proves to others that your code runs properly, but also acts as a signal to team members when their own changes compromise your work later down the line.
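
As a small illustration of writing tests before the implementation, here is a sketch using Python’s built-in `unittest` module. The `monthly_usage` function and its rules are hypothetical and not taken from Valor’s codebase. The idea is that the three test methods are written first to pin down the expected behavior, and the function is then filled in until they pass.

```python
import unittest


def monthly_usage(readings):
    """Return usage between the first and last cumulative meter readings.

    Hypothetical example function, written after the tests below.
    """
    if len(readings) < 2:
        raise ValueError("need at least two readings")
    usage = readings[-1] - readings[0]
    if usage < 0:
        raise ValueError("cumulative readings must not decrease")
    return usage


class MonthlyUsageTest(unittest.TestCase):
    """Tests that describe the end goal before any implementation exists."""

    def test_usage_is_last_minus_first(self):
        self.assertEqual(monthly_usage([100, 140, 175]), 75)

    def test_single_reading_is_rejected(self):
        with self.assertRaises(ValueError):
            monthly_usage([100])

    def test_decreasing_readings_are_rejected(self):
        with self.assertRaises(ValueError):
            monthly_usage([200, 150])
```

Saved as a module, the suite can be run with `python -m unittest`. If a teammate later changes `monthly_usage` in a way that breaks one of these expectations, the failing test is the signal described above.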

Though I took what I’ve learned at UC Berkeley and applied it to my internship, I now plan on carrying what I’ve learned at Valor over into my final year as an undergraduate student. I give many thanks to the enjoyable and supportive team I met here this summer, as I learned and laughed a lot. But most importantly of all, I’ve mastered the Bay Area public transportation system.


Xylem Digital Solutions West hosts Norwegian Business School Delegation

By: Sabrina Strauss, Office Manager

On July 26, 36 students from the Norwegian School of Economics’ Corporate Innovation program visited the Xylem - Digital Solutions San Francisco office to learn about innovation in the water industry. The 36 students are all enrolled in a summer business certificate program at UC Berkeley. CEO Christine Boyle and their professor Leah Edwards have been working together for several years to help students gain office experience and learn about innovation in Silicon Valley. For the past 3 summers (2017-2019), Valor / Xylem has hosted two summer interns from the program, with both Xylem and the interns learning a lot each year. Interns have helped with marketing material, product documentation, and user group event planning. We also take them to a baseball game each summer.

The 2019 program on Corporate Innovation focuses on startup methods utilized within bigger organizations, as well as the acquisition of startups as a way to kickstart innovation in big companies. The one-day visit aimed to provide students with real-life examples of corporate innovation, with a focus on the water industry.

Christine Boyle started the session with an introduction of Xylem and their business mission, as well as the development of their relationship with Valor Water Analytics, which led to the company’s acquisition in 2018. Valor Water is now part of the Xylem - Digital Solutions group.  Next, Xylem staff delivered sessions on product management, data science & AI, and software development practices at Xylem. 

The students showed great interest in each presentation and asked great questions. It was a pleasure to have the group and their professor stop by, and we are looking forward to the next batch in 2020.


Challenges and innovations: Current and Future States of Water Affordability: Part 3.

Note:

This is the third in a series of Valor Water Analytics blog posts exploring water affordability, customer nonpayment, and potential solutions that enable utilities to deliver water more equitably and sustainably to all customers. You can read posts one and two here.

Where We Can Go Tomorrow: Exploring Novel Interventions for Nonpayment Reduction

By Maryana Pinchuk, Stacey Isaac Berahzer, and Christine Boyle, PhD

In our two previous blog posts in this series, we explored the definitions and metrics used to assess affordability, discussed the role of customer assistance programs (CAPs) in addressing affordability, and considered some major challenges that utilities face when setting up CAPs. In this post, we will briefly discuss rate structures and policy changes that influence affordability, as well as cover additional novel interventions that may reduce utility customer nonpayment in the water sector and related sectors.

Rates and policies shape affordability

As a recent study on affordability in New Jersey points out, before even considering a CAP, it is important for a utility to examine how its basic rate structure affects low-income customers’ bills. The study notes that “rate structures that rely heavily on fixed charges, as opposed to volumetric charges, will tend to disadvantage low-income customers, as they tend to use less water than higher-income customers.” This lower volume of use also makes declining block rate structures less equitable for low-income customers. The study concludes that “reforming a utility’s basic rate structure can go a long way to reducing burdens on low-income customers, reducing the need for additional, income-based assistance.”
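
To make the fixed-versus-volumetric point concrete, here is a small worked example. All numbers are hypothetical and chosen only for illustration: two customers, one using 3,000 gallons per month and one using 9,000, billed under two made-up rate structures that collect the same total revenue from the pair.

```python
def monthly_bill(gallons, fixed_charge, rate_per_1000_gallons):
    """Bill = fixed service charge + volumetric charge on water used."""
    return fixed_charge + rate_per_1000_gallons * gallons / 1000


# Hypothetical monthly usage for a low-use and a high-use household.
LOW_USE = 3000
HIGH_USE = 9000

# Hypothetical rate structures; both collect $84 in total from the two households.
FIXED_HEAVY = {"fixed_charge": 30.0, "rate_per_1000_gallons": 2.0}
VOLUMETRIC_HEAVY = {"fixed_charge": 6.0, "rate_per_1000_gallons": 6.0}

for name, rates in [("fixed-heavy", FIXED_HEAVY), ("volumetric-heavy", VOLUMETRIC_HEAVY)]:
    low = monthly_bill(LOW_USE, **rates)
    high = monthly_bill(HIGH_USE, **rates)
    print(f"{name}: low-use bill ${low:.2f}, high-use bill ${high:.2f}, total ${low + high:.2f}")
```

Under these assumed rates, the low-use household pays $36 under the fixed-heavy structure but $24 under the volumetric-heavy one, while the utility collects $84 either way. Shifting revenue from fixed to volumetric charges lowers the bill for the customer who uses less water.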

Policy changes also have the potential to make a big difference. At the state level, New Jersey is being encouraged to adopt language for water similar to the 1999 state law known as the Electric Discount and Energy Competition Act.  That act declared that it was the policy of the state to ensure “universal access to affordable and reliable electric power and natural gas service.”

If such legislative changes were to occur, water affordability would jump to an even higher priority for utilities, and we could see a lot more activity in this area. Among other things, utilities could be mandated to “report on key metrics related to rates, customer bills, and low-income affordability.” However, as we previously discussed, many utilities currently lack the underlying data to track these metrics.

Customer communication to drive behavioral change

While there have been discussions on using CAPs, rate structures, and policy to influence customer nonpayment, an area yet to be explored in depth is how changes to customer communication might increase the willingness of some customers to pay. From conversations with multiple utilities about affordability and nonpayment, it is clear that customers have different reasons for not paying their utility bills. While some do lack the financial means to afford their bills, nonpayment can also result from sticker shock, knowledge gaps, and other factors that impact the customer’s decision to pay. These factors may seem daunting to understand and address, but some tools and techniques may offer a path forward for utilities interested in changing customer nonpayment behavior.

Behavioral nudges are lightweight, targeted interventions that aim to implicitly and positively influence consumer behavior. There are many examples of nudges that have proved successful in influencing consumer behavior in the realm of conservation. Some examples include:

  • Providing information to consumers about how their consumption compares to their neighbors’, in order to conserve resources. This intervention was successfully leveraged by Opower to reduce energy consumption. Customers who received the comparative energy reports reduced their household energy consumption both immediately after receiving the notifications and in the longer term.

  • Creating data visualizations that build consumer awareness. In Cape Town, South Africa, a publicly accessible map of neighborhood water consumption was used to demonstrate to individuals that a large number of households are abiding by conservation guidelines, in order to normalize conservation targets. This intervention was deployed during a major drought and may have helped avert the “Day Zero” water crisis.

  • Sending targeted notifications to consumers to encourage positive behaviors. Researchers found that conservation-oriented bill inserts successfully helped consumers reduce their water use in South Africa, and that notifications emphasizing the social recognition and public good of conservation – as opposed to notifications that offered conservation tips or emphasized the financial cost of wasting water – were most successful at encouraging conservation behavior during the Cape Town “Day Zero” drought (you can read more about the research here).

Innovative communication techniques to reduce nonpayment

What lessons can utilities struggling with nonpayment learn from these successful conservation-focused nudges?

As we mentioned in our last post, water utilities have traditionally only communicated with their customers through a monthly, bimonthly or quarterly water bill. Many of the interventions above show that by prioritizing timely, targeted customer communication, utilities have the opportunity to positively change customer behavior. These interventions also relied on a deep understanding of the region-specific underlying factors that influenced consumers’ behavior – for example, understanding that social recognition was a more important factor in conservation for water consumers in Cape Town than lack of knowledge about the cost of water or tips on how to conserve.

We believe that the techniques that have worked for conservation may also work for nonpayment. By using predictive analytics, a utility’s customer base can be segmented into groups of water customers at risk of nonpayment, based on the root cause of each customer’s failure to pay bills. Valor has found that sending targeted messages to these specific customer segments can motivate non-paying and late-paying customers to pay on time. In a pilot at a mid-sized utility in Georgia, we were able to reduce the amount of outstanding customer payments by 50% and decrease the total number of service shutoffs by 50%, year-over-year.
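
The idea of segmenting customers by likely root cause and matching each segment to a message can be sketched as follows. This is a simple rule-based stand-in, not Valor’s predictive model: the `Account` fields, the thresholds, the segment names, and the message text are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Account:
    account_id: str
    late_payments_last_year: int  # number of bills paid late in the past 12 months
    bill_change_pct: float        # latest bill vs. the customer's typical bill, in percent


def segment(account):
    """Assign an account to one hypothetical nonpayment-risk segment."""
    if account.late_payments_last_year >= 6:
        return "chronic"          # may be unable to pay; candidate for assistance outreach
    if account.bill_change_pct >= 50:
        return "sticker_shock"    # unusually high bill may prompt a missed payment
    if account.late_payments_last_year >= 1:
        return "occasional_late"
    return "on_track"


# Hypothetical message templates, one per segment that warrants outreach.
MESSAGES = {
    "chronic": "You may qualify for our customer assistance program.",
    "sticker_shock": "Your latest bill is higher than usual. Here is what changed.",
    "occasional_late": "Reminder: your water bill is due soon.",
}


def message_for(account):
    """Return the targeted message for an account, or None if no outreach is needed."""
    return MESSAGES.get(segment(account))
```

In practice, the hand-written thresholds would be replaced by a model trained on a utility’s own payment history. The sketch only shows the shape of the workflow: classify first, then communicate differently with each group.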

Targeted marketing to ease chronic nonpayment

Of course, nudges alone will not fully solve the affordability problem. Identifying different segments within nonpaying customers also means separating customers who, for various reasons, won’t pay their bill despite having the financial means from those who can’t pay at all. For customers in the latter group, who chronically struggle to afford water, CAPs and other affordability programs are a good solution. Using predictive analysis on customer data, utilities can identify which customers fall into this segment – instead of guessing or using outdated or inaccurate historical data – and tailor their CAP marketing efforts specifically to this group to increase CAP enrollment.

The future of affordability

When thinking about how utility affordability will look in the next five to ten years, CAPs, policy, and rate structures play an important role. But it is also important to consider the tools that other industries – e.g., credit card companies, cellular data providers – have instituted to solve customer nonpayment. These include more flexible payment options, such as installment plans, pay-as-you-go, pre-pay, and more. These alternative payment systems only increase the need for predictive analytics (to understand which customers need what type of assistance) and customer notifications (to communicate effectively about progress to payment). No matter the payment system that a utility offers, we believe the key to breaking the vicious cycle of nonpayment, late fees, and service shutoffs is ensuring that utilities a) know and understand their customers, and b) communicate early and often with them to get ahead of nonpayment.

Accelerating tomorrow now: Key insights from the Utility Analytics Institute’s 2019 annual summit

By Maryana Pinchuk

From May 6 to 8, staff from US and international energy utilities gathered in Charlotte, North Carolina for the Utility Analytics Institute’s annual summit. The theme of the summit was “Accelerate Now,” and throughout the sessions, speakers highlighted how advanced analytics could be used to achieve a variety of goals for utilities, from making smarter business decisions to optimizing the productivity of operations staff and increasing customer benefits. Overall, the summit served as a rallying cry for companies like Xylem to develop more tools to help utilities overcome the obstacles of today and prepare to meet the challenges and opportunities of tomorrow.

Smart decision-making in the age of big data

Sensus Data Scientist Vincent Toups delivering a lightning talk on the history of data science at Sensus – with bingo cards!

Whether it’s coming from sensors, loggers, or smart assets, the amount of data available on each meter, customer, and square mile of utility distribution network continues to grow each year. For savvy utilities, this wealth of information presents an opportunity to make smarter business decisions. As one example, Mohamad Hussin, Senior Engineer at Dubai Electricity and Water Authority (DEWA), discussed building a machine learning model to understand the actual expected life of a transformer. When DEWA analyzed the data in their service area, they found that transformers were lasting an average of only 15 years in the field before replacement, even though the expected lifespan according to traditional industry guidance was 30-40 years. Through predictive modeling, DEWA was able to more accurately identify which transformers were truly likely to fail, versus ones that were functioning correctly but were likely to get taken out of service for other reasons, e.g., because of a lack of demand in that part of the grid. This knowledge allowed DEWA to prioritize testing and replacing the right assets, saving money on unnecessary field service.

This type of data-driven asset condition assessment is the approach advocated for by Xylem and demonstrated in solutions like Valor’s Hidden Revenue Locator for customer metering and data handling inaccuracies. When utilities use analytics to facilitate smarter asset management, they lower the cost of O&M and drive greater revenue recovery.

Test-and-learn mindset to get the most out of operations

Discussing customer needs at the Sensus booth

Brian Savoy of Duke Energy touched on the importance of implementing a data-driven operational process for increasing worker productivity. Brian, Senior Vice President of Business Transformation and Technology at Duke, discussed the recent evolution of Duke’s internal tool and process development practices from classic waterfall – i.e., a years-long R&D phase before any new product or process was built and operationalized – to a nimble agile process, where an idea could be formulated, developed, and tested in a matter of weeks. This new data-driven approach to operations allowed Duke to pilot radically transformative processes and see returns right away. An example Brian shared was developing an iPhone application to assist with field operations. The app doubled the productivity of Duke’s field crews when it replaced the clunky and expensive legacy tools Duke had been using to track personnel and materials during field maintenance.

Brian’s story provides a valuable lesson in how applying data-driven thinking and technology to basic utility operations practices can increase efficiency. Valor’s proven method of program delivery follows these principles, relying on a two-stage diagnosis and drill-down approach that allows utilities to optimize their deployment of personnel and realize efficiency gains.

Transforming customer service through data and technology

Together, advanced analytics and a data-driven mindset can also dramatically transform the relationship of the utility to its customers. In a panel Q&A discussion, representatives from Exelon, Evergy, and Duke discussed how they’ve implemented predictive customer analytics and chatbots to assist customers with frequently asked questions. Combined, these innovations have freed up their customer service staff to take on more complex customer communications that they previously would not have had time to engage in. Patty Durand, President and CEO of the Smart Energy Consumer Collaborative (SECC) presented findings from SECC consumer surveys that indicated an even greater desire from utility customers to get more meaningful, actionable communications from their utilities about their consumption, as well as ways to save money on their bills.

Xylem recognizes that the future of utility customer engagement – whether for customer leaks or nonpayment – relies on targeted, just-in-time, proactive communication strategies. Valor’s solutions suite includes tools for proactively identifying water leaks and anomalous gas usage behind the customer’s meter in order to facilitate notifications to customers. To tackle the tremendous and growing affordability challenge that utilities and their customers are facing, Valor also provides a predictive nonpayment management solution that can help utilities avert the vicious cycle of nonpayment and service shutoffs with proactive communication and intervention strategies.

Challenges to implementing data-driven solutions

In addition to the many success stories, speakers also discussed the obstacles they have faced when adopting advanced analytics. Their challenges included talent acquisition and retention; data quality and quantity; and siloed data – i.e., only being able to see insights from their own service area. While EPRI, the Electric Power Research Institute, has aggregated data from multiple energy utilities in order to help provide deeper insights on common issues and trends, many utilities are still limited to the data they have, which may be insufficient for building and training robust machine learning models.

Sensus and Valor team picture

Key takeaways for the water sector

Coming from a company that focuses primarily on water, I found the tools and practices shared by energy-focused utilities, vendors, and nonprofits at the UAI summit excitingly cutting-edge. The water sector has many lessons to learn from its energy counterparts in advanced analytics and data adoption. But my key takeaway is that companies like Xylem that offer cross-cutting, hardware-agnostic solutions have a huge opportunity to help water utilities – which in the US tend to be smaller and more fragmented than electric and gas utilities – overcome their data quality and quantity challenges. With every program that we deploy, we build a database of best practices and rich insights into customer meters and behavior that can help the next utility we work with make smarter business decisions, optimize operations, and improve customer service. Together, we can help the utilities of today transform their data into insights that will guide them through the challenges and opportunities of the future.


New Feature Alert: Performance Gains Tracker Launched in the Hidden Revenue Locator

By Heidi Smith, Global Product Manager

Valor recently launched a dashboard page, named Performance Gains, to enhance our core product, Hidden Revenue Locator (HRL). The Performance Gains page is accessed via the HRL portal and summarizes client utilities’ meter asset health, including performance assessments in key areas such as the percentage of meters currently under-registering.

The Performance Gains page allows utility operations teams to:

  • Quickly view meter asset health across all meters to decide where to focus work efforts.

  • View under-registration data to decide which meters to replace and when.

  • Create budgets for your meter assessment management program based on meter fleet performance.

Additionally, the utility management teams will always have the latest metrics at hand to share with board members or other stakeholders to:

  • Enable budgeting decisions on your meter program

  • Demonstrate your progression / management efficiency

  • Showcase the value gained from our Valor solution and make a case for continued subscription

We encourage you to fully explore the new view. My favorite feature, suggested by one of our utility clients, is the ability to hover over the bar graph lines in the meter under-registration analysis to see the percentage of meters that are under-registering for a specific slice. In our demo example, 2.2% of the 10-year-old meters are under-registering.

PerformanceGainsAge.jpg

Check out a sample Performance Gains page for our demo utility, which is gaining tremendous value from the program by thoroughly investigating its flags!

PerformanceGainsSample.jpg

This new view is brought to you thanks to the vision of Janani Mohanakrishnan and Christine Boyle; valuable user feedback from Valor’s Client User Group; the hard work of software design and implementation by Renee Jutras; and quality checks by Glen Semino and Kristine Gali.

Why Eliminating Ambiguity in Your Data Matters

By David Wegman, CTO @ Valor

The next time you strike up a conversation with your friendly neighborhood computer, take note of how long it takes before you get frustrated.  Despite the advances in artificial intelligence over the past decades -- and despite the incredible capacity humans have for adaptation -- human-computer interaction is unnatural (from the human perspective, anyway).  Every touch point where people provide input to computers, or receive output from computers, is an opportunity for misunderstanding.

 

Even as our systems are getting smarter all the time, there are some simple steps we can take to eliminate ambiguity.  Data architects serve an important role, helping to ensure that information is not lost or garbled in translation.  These techniques are essentially an investment.  Every minute spent on avoiding problems up front can save much more time down the road when things aren't working properly.

 

Date formats

 

Which came first, 3/7/2017 or 5/4/2017?  The answer depends on where you are in the world when asking the question.  In the United States, dates are commonly represented as month/day/year, so these dates would usually be interpreted as March 7 and May 4, respectively.  In many other countries, dates are represented as day/month/year, so they would be interpreted as July 3 and April 5, respectively.
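To make the ambiguity concrete, here is a minimal Python sketch that reads the same two strings under both conventions. The variable names are illustrative only, and the two format strings are simply the standard library's way of spelling month/day/year and day/month/year.

```python
from datetime import datetime

# The two date strings from the question above.
raw_dates = ["3/7/2017", "5/4/2017"]

US_FORMAT = "%m/%d/%Y"    # month/day/year
INTL_FORMAT = "%d/%m/%Y"  # day/month/year

for raw in raw_dates:
    as_us = datetime.strptime(raw, US_FORMAT).date()
    as_intl = datetime.strptime(raw, INTL_FORMAT).date()
    # Same text, two different calendar days.
    print(raw, "->", as_us.isoformat(), "or", as_intl.isoformat())

# "Which came first?" has a different answer under each convention.
us_order = sorted(raw_dates, key=lambda s: datetime.strptime(s, US_FORMAT))
intl_order = sorted(raw_dates, key=lambda s: datetime.strptime(s, INTL_FORMAT))
print(us_order)
print(intl_order)
```

Neither parse raises an error, which is exactly why this kind of mistake can go unnoticed.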

 

This becomes problematic when a data file, which includes date information, is read by a computer system.  Each time the system encounters a field known to be a date, it must decide how to interpret the information.  Fortunately, most modern systems allow us to choose the date format used for input and output.  However, if the date format is not chosen carefully, it can result in one of the most pernicious types of errors in computer systems: one which does not raise a flag immediately and lies dormant for some time.  A date which is incorrectly interpreted can result in a myriad of problems, as was widely publicized at the end of the last century.

 

Given enough data points, it may eventually become clear whether dates have been written starting with the month or day (e.g. if one of the values is 3/15/2017, the format cannot be day/month/year because 15 cannot refer to the month, so the format is probably month/day/year).  This approach is suboptimal because it requires an additional step which is not guaranteed to work properly in all cases.  A better approach is to avoid the problem altogether by taking care when choosing a date format.
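The inference step described above might look something like the following sketch. The function name `possible_formats` is hypothetical, and it only considers the two orderings discussed here. When it returns both orderings, the data alone cannot settle the question, which is the weakness of this approach.

```python
from datetime import datetime

def possible_formats(values):
    """Return the day/month orderings that can parse every value."""
    candidates = {
        "month/day/year": "%m/%d/%Y",
        "day/month/year": "%d/%m/%Y",
    }
    consistent = []
    for name, fmt in candidates.items():
        try:
            for value in values:
                datetime.strptime(value, fmt)  # raises ValueError on e.g. month 15
        except ValueError:
            continue  # at least one value rules this ordering out
        consistent.append(name)
    return consistent

# A value such as 3/15/2017 rules out day/month/year.
print(possible_formats(["3/7/2017", "3/15/2017"]))
# Without such a value, both orderings remain possible.
print(possible_formats(["3/7/2017", "5/4/2017"]))
```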

 

To eliminate ambiguity when working with dates, when possible, use the format YYYY/MM/DD.  This represents a four-digit year, followed by a two-digit month, followed by a two-digit day.  March 7, 2017 would be represented as 2017/03/07.  This format is widely understood and eliminates the ambiguity that can occur when the year appears at the end.
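As a small illustration, the sketch below writes and reads dates with an explicit year-first format string instead of relying on a system default. The constant name `DATE_FORMAT` is an assumption made for this example. A side benefit of putting the year first with zero-padded fields is that sorting the text also sorts the dates chronologically. (The ISO 8601 standard uses the same field order with hyphens, as in 2017-03-07.)

```python
from datetime import date, datetime

DATE_FORMAT = "%Y/%m/%d"  # four-digit year, two-digit month, two-digit day

# Writing: March 7, 2017 becomes "2017/03/07".
march_7 = date(2017, 3, 7)
text = march_7.strftime(DATE_FORMAT)
print(text)

# Reading: the same explicit format string recovers the original date.
parsed = datetime.strptime(text, DATE_FORMAT).date()
print(parsed == march_7)

# Plain text sorting matches chronological order.
stamps = ["2017/05/04", "2017/03/07", "2016/12/31"]
print(sorted(stamps))
```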

 

Field delimiters

 

A common method for storing tabular data is in CSV (comma-separated values) format.  In a CSV file, each line contains one row of a table.  Within each line, a delimiter character appears in between each value, demarcating the columns.  The delimiter character is usually a comma or a tab.

 

A problem can arise when one of the values that needs to be stored contains the delimiter character.  For example, a person's name may contain a comma (e.g. "Martin Luther King, Jr.").  In this situation, a line containing this value will contain an extra delimiter character.  Software which treats each occurrence of the delimiter as a new column may be confused by the fact that the number of columns is inconsistent from one line to another.
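The failure is easy to reproduce. In this sketch the column names and the second value ("Atlanta") are made up for illustration; the point is only that splitting on every comma yields three columns for a two-column table.

```python
# A hypothetical two-column file: a header line and one data line.
header = "name,city"
line = "Martin Luther King, Jr.,Atlanta"

# Naive parsing: treat every comma as a column boundary.
header_columns = header.split(",")
naive_columns = line.split(",")

print(len(header_columns))  # the header promises two columns
print(len(naive_columns))   # the data line appears to have three
print(naive_columns)
```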

 

One strategy is to choose a delimiter character which does not appear in any of the values.  This technique can help minimize problems; however, it is not guaranteed to completely avoid them as new data files are created in the future.  A better approach is to wrap values that may contain delimiter characters in double quotes, and to ensure that literal double quote characters are specifically labeled (or "escaped," in programmer speak), for example with a backslash character or, as in the widely used RFC 4180 convention, by doubling the quote.  Whichever convention is chosen, the software that writes the file and the software that reads it must agree on it.  This ensures that the data file will be parseable regardless of the data that needs to be stored.
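Most languages ship a CSV library that handles quoting and escaping, so there is rarely a reason to do it by hand. The sketch below uses Python's standard `csv` module; the helper name `round_trip` and the sample rows are assumptions for illustration. By default the module escapes a literal double quote by doubling it, and the backslash style is available through the `doublequote` and `escapechar` options.

```python
import csv
import io

rows = [
    ["name", "note"],
    ["Martin Luther King, Jr.", 'said "I have a dream"'],
]

def round_trip(rows, **fmt):
    """Write rows as CSV text, then parse that text back into rows."""
    buffer = io.StringIO()
    csv.writer(buffer, **fmt).writerows(rows)
    text = buffer.getvalue()
    parsed = list(csv.reader(io.StringIO(text), **fmt))
    return text, parsed

# Default convention: a literal quote is written as two quotes.
doubled_text, doubled_rows = round_trip(rows)

# Backslash convention: writer and reader must use the same options.
escaped_text, escaped_rows = round_trip(rows, doublequote=False, escapechar="\\")

print(doubled_text)
print(escaped_text)
print(doubled_rows == rows, escaped_rows == rows)
```

Either way, the embedded comma and quotes survive the trip as long as both sides use the same settings.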

 

 

waterunit.png

Units of measure

 

Sally's water meter recorded 350 gallons of water used.  John's recorded 200 cubic feet of water used.  Who used more water?  This is a question with a simple answer (John did).  But what if the units were not specified?  If all we know is that Sally used 350 and John used 200, we might decide that Sally used more, but only if we first assume that their meters record water using the same units.  Even if that assumption is correct, if we don't know the units, we won't be able to bill properly for the water or compare the quantity to an amount stored in other systems.

 

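Quantitative values (i.e., measurements) should always have units specified.  When preparing a data file, you can provide information about units as a separate field.  For example, each meter reading can be stored alongside a second field that names its unit, such as gallons or cubic feet.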

 

Alternatively, units can be provided in documentation which accompanies the data.  One advantage of including units information inline in the data is that anyone with the data will automatically have units information, even if the documentation is not accessible.  Another benefit is that the units can vary from one record to another, as in the earlier example of two different people whose water meters recorded in different units.  However, in some cases it may not be practical to provide units inline, and good documentation can help fill this gap.
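Here is a minimal sketch of inline units in practice, using Sally's and John's readings from above. The column names (`customer`, `usage`, `units`) and the unit labels are assumptions chosen for this example, and the conversion factor is the standard 7.48052 US gallons per cubic foot. Because each record carries its own unit, the readings can be converted to a common unit before they are compared.

```python
import csv
import io

# A hypothetical data file in which every reading carries its own unit.
DATA = """customer,usage,units
Sally,350,gallons
John,200,cubic_feet
"""

# Conversion factors to US gallons.
GALLONS_PER_UNIT = {"gallons": 1.0, "cubic_feet": 7.48052}

def to_gallons(usage, units):
    """Convert a reading to gallons, refusing to guess at unknown units."""
    if units not in GALLONS_PER_UNIT:
        raise ValueError(f"unknown units: {units!r}")
    return usage * GALLONS_PER_UNIT[units]

usage_gallons = {}
for row in csv.DictReader(io.StringIO(DATA)):
    usage_gallons[row["customer"]] = to_gallons(float(row["usage"]), row["units"])

heaviest = max(usage_gallons, key=usage_gallons.get)
print(usage_gallons)
print(heaviest)
```

Raising an error on an unrecognized unit is a deliberate choice in this sketch: a loud failure at load time is easier to fix than a quietly wrong bill.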

 

Keep clear and carry on

 

Data parsing errors are not unusual.  However, with a small investment they can be minimized.  By avoiding common data pitfalls and making the right choices at the outset, you will eliminate unnecessary troubleshooting and set yourself up for success.