May 13, 2022

STOMP out Bad Metrics

Most technology companies today need to create and share metrics that measure the value of their products and services. Common industry metrics include Daily Active Users (DAU), Monthly Active Users (MAU), number of downloads, and number of viewers. But how do we know the code measures what the business or product owner intends? And when these metrics are shared publicly, how do we know customers, partners, and investors can trust them? We need to ensure these key business metrics are valid.
Metrics like these are often referred to as operating measures or key performance indicators (KPIs): quantifiable measures used to monitor and analyze the success or failure of a business's products. As operating measures become increasingly important to understanding a business's health, both internally and publicly, it has become critical that their measurement is consistent, accurate, and relevant.
But how do we ensure consistency, accuracy, and relevancy? More importantly, how do we ensure that operating measures (metrics) already in use are trustworthy and measure what they are intended to? This is the topic of this post, and where a set of principles can help STOMP out these problems.
What is STOMP?
To bring more consistency and trustworthiness to the metrics used in our industry, in the fall of 2019 we worked with a consortium of industry peers to establish a set of best-practice guidelines for building quality into metrics, especially key business and product metrics that are reported publicly. These Social Technology Operating Measures Principles (STOMP) help standardize the measurement, qualification, and reporting of key operating measures, and apply to technology organizations that develop mobile and other social applications and platforms.
The principles provide guidance for five areas of an operating measure:
1. Definition, Design, and Reporting Parameters
2. Internal Controls
3. Processing Quality and Accuracy
4. Governance
5. Disclosure Guidance

Definition, Design, and Reporting 
If an application opens in the forest and no one sees it, did it really open? Joking aside, there is a real philosophical question here when it comes to operating measures. For example, if the application is opened accidentally, or a feature within it is tapped by mistake and quickly closed, should that action be reported as engagement? If an application is left open for an extended period with no engagement activity logged, should that time be reported? And if a user uses their platform A account (e.g., log in with Snapchat, log in with Google) only to log in or sign up for platform B, should that activity be included in or excluded from platform A's engagement reporting?
The first set of STOMP principles helps answer these questions about the definition, design, and reporting of metrics. A metric should be defined with both its business intent and its philosophical intent in mind. The business intent is often straightforward: count the number of users doing X, or calculate the total time of session Y. However, metrics designed for the business purpose alone are all too often instrumented incorrectly, or drift from the original business intent as new products are introduced, exposing businesses to reporting risk when that intent isn't regularly reviewed. For example, say you create an application that lets you add AR unicorns to your photos. To measure the app's success, you count all the photos your users took and added unicorns to using the app. As the app takes off (unicorns are pretty cool, after all), you allow third-party developers to use your unicorn AR technology in their own applications. If you are simply counting unicorn-added events, you may mistakenly count all the unicorns added within third-party apps. Is that the intent of your measurement?
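One way to make the answers to these questions explicit is to encode them as inclusion rules in the metric's definition. The Python sketch below is illustrative only: the `Session` record, the 3-second accidental-open threshold, the 60-second idle cap, and the `third_party_login_only` flag are hypothetical choices, not STOMP requirements or Snap's actual definitions.

```python
from dataclasses import dataclass

# Hypothetical thresholds; a real definition would set and document these deliberately.
MIN_ENGAGED_SECONDS = 3.0  # sessions with less activity are treated as accidental opens
IDLE_CAP_SECONDS = 60.0    # longest gap between actions still counted as active time

@dataclass
class Session:
    user_id: str
    action_times: list[float]             # seconds since session start, one per logged action
    third_party_login_only: bool = False  # opened only to log in to another platform

def engaged_seconds(session: Session) -> float:
    """Sum the gaps between consecutive actions, capping each gap so idle time is not counted."""
    times = sorted(session.action_times)
    return sum(min(b - a, IDLE_CAP_SECONDS) for a, b in zip(times, times[1:]))

def counts_as_engaged(session: Session) -> bool:
    """Apply the definition's inclusion rules to a single session."""
    if session.third_party_login_only:
        return False
    return engaged_seconds(session) >= MIN_ENGAGED_SECONDS

def engaged_users(sessions: list[Session]) -> int:
    """Distinct users with at least one qualifying session."""
    return len({s.user_id for s in sessions if counts_as_engaged(s)})

sessions = [
    Session("a", [0.0, 1.0]),         # 1 s of activity: accidental open
    Session("b", [0.0, 5.0, 900.0]),  # 5 s + idle gap capped at 60 s = 65 s
    Session("c", [0.0, 10.0], third_party_login_only=True),
]
print(engaged_users(sessions))  # 1
```

Writing the thresholds down as named constants makes them reviewable, which matters later when the definition is certified or audited.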
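A minimal sketch of keeping the count aligned with its intent, assuming each unicorn event records which app emitted it. The `UnicornAdded` event, its `app_id` field, and the app IDs are hypothetical. Counting distinct photos rather than raw events also keeps a photo with several unicorns from being counted more than once.

```python
from collections import Counter
from typing import Iterable, NamedTuple

FIRST_PARTY_APP_ID = "unicorn-camera"  # hypothetical ID of our own app

class UnicornAdded(NamedTuple):
    photo_id: str
    app_id: str  # app that emitted the event: ours or a third-party integration

def first_party_unicorn_photos(events: Iterable[UnicornAdded]) -> int:
    """Distinct photos that had a unicorn added inside our own app."""
    return len({e.photo_id for e in events if e.app_id == FIRST_PARTY_APP_ID})

def unicorn_photos_by_app(events: Iterable[UnicornAdded]) -> Counter:
    """Distinct photos per source app, so third-party usage can be reported separately."""
    distinct = {(e.app_id, e.photo_id) for e in events}
    return Counter(app_id for app_id, _ in distinct)

events = [
    UnicornAdded("p1", "unicorn-camera"),
    UnicornAdded("p1", "unicorn-camera"),  # second unicorn on the same photo
    UnicornAdded("p2", "partner-app"),     # added inside a third-party app
    UnicornAdded("p3", "unicorn-camera"),
]
print(first_party_unicorn_photos(events))  # 2
```

Whether third-party usage belongs in the headline number is a definition decision; the point is that the code should reflect that decision explicitly rather than by accident.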
General Technology Controls 
Once a metric is defined and designed within a system, best-practice internal controls should be in place so the metric isn't at risk of inadvertent changes that could alter or break its reporting. STOMP recommends standard access controls, change management, and proper testing and release controls. These controls should be formalized, well documented, and operated in practice, especially when they will need to be reviewed by independent third parties.
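As one illustration of a testing control, a team might pin a metric's computation to a small frozen fixture whose expected value was reviewed when the definition was approved. Any code change that shifts the number then fails in CI and has to go through the documented change process. The `compute_dau` logic, qualifying event names, and fixture below are hypothetical; the test is written in plain pytest style.

```python
# Hypothetical definition: a user is active if they log at least one qualifying event.
QUALIFYING_EVENTS = frozenset({"app_open", "message_sent"})

def compute_dau(events: list[dict]) -> int:
    """Metric logic under change control: distinct users with a qualifying event."""
    return len({e["user_id"] for e in events if e["event"] in QUALIFYING_EVENTS})

# Frozen fixture and its signed-off expected value. Changing either should
# require the same review as changing the metric definition itself.
FIXTURE = [
    {"user_id": "u1", "event": "app_open"},
    {"user_id": "u1", "event": "message_sent"},
    {"user_id": "u2", "event": "app_open"},
    {"user_id": "u3", "event": "push_received"},  # not a qualifying event
]
EXPECTED_DAU = 2

def test_dau_matches_signed_off_fixture():
    assert compute_dau(FIXTURE) == EXPECTED_DAU
```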
Quality & Accuracy Tolerances
Many metrics report on user engagement with a product. This usually spans hundreds of millions of users, often on mobile devices. Measuring events and actions across a wide range of mobile devices, around the globe and in varying network conditions, exposes a metric to an array of risks that need to be mitigated. How do delays caused by poor network conditions affect the metric? How should the metric account for bad actors, spam, and accounts violating the terms of service? How should it account for activity occurring in different time zones?
The STOMP principles expect technology companies to consider these risks, determine tolerance thresholds, and build the right mechanisms to identify anomalies that breach those thresholds.
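For the time zone and delay questions, one common approach is to attribute every event to a day in a single, fixed reporting time zone, and to treat a day's number as final only after a grace window for delayed uploads. This sketch assumes UTC as the reporting zone and a 48-hour grace window; both are illustrative choices, and the function names are hypothetical.

```python
from datetime import date, datetime, time, timedelta, timezone

REPORTING_TZ = timezone.utc               # assumed single reporting time zone
LATE_ARRIVAL_GRACE = timedelta(hours=48)  # assumed wait for events delayed by poor networks

def reporting_day(event_time: datetime) -> date:
    """Attribute a timezone-aware event timestamp to a reporting day,
    regardless of the user's local time zone."""
    if event_time.tzinfo is None:
        raise ValueError("event_time must be timezone-aware")
    return event_time.astimezone(REPORTING_TZ).date()

def day_is_final(day: date, now: datetime) -> bool:
    """A day's value is final once the grace window after the day ends has passed."""
    day_end = datetime.combine(day + timedelta(days=1), time.min, tzinfo=REPORTING_TZ)
    return now >= day_end + LATE_ARRIVAL_GRACE

# 11:30 p.m. on May 13 in UTC-7 is 6:30 a.m. on May 14 in UTC.
local = datetime(2022, 5, 13, 23, 30, tzinfo=timezone(timedelta(hours=-7)))
print(reporting_day(local))  # 2022-05-14
```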
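A tolerance threshold can be as simple as a bound on how far a day's value may move from a recent baseline before someone investigates. The sketch below compares a value with the median of a trailing window; the 5% tolerance and the function name are assumptions, and a production check would likely also account for weekly seasonality and expected growth.

```python
from statistics import median

def breaches_tolerance(history: list[float], today: float, tolerance: float = 0.05) -> bool:
    """Return True if `today` deviates from the median of `history` by more than
    `tolerance`, expressed as a fraction of the baseline (0.05 = 5%)."""
    if not history:
        raise ValueError("history must contain at least one value")
    baseline = median(history)
    if baseline == 0:
        return today != 0
    return abs(today - baseline) / abs(baseline) > tolerance

trailing_week = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 103.0]
print(breaches_tolerance(trailing_week, 103.0))  # False: 3% from the median of 100
print(breaches_tolerance(trailing_week, 90.0))   # True: a 10% drop
```

Using the median rather than the mean keeps a single bad day in the window from dragging the baseline toward it.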
Governance
Independent committees can provide an outside perspective because they aren't as close to the metric or responsible for what it measures. They can also bridge the gap between the business, product, and technical teams responsible for implementing the metric, and highlight risks that might otherwise be missed. For an organization's core metrics, especially those reported publicly, the controls in place should be regularly reviewed and audited.
Metric definition changes, new metrics, the retirement of metrics, and quality issues should all be reviewed and opined on by a committee or governance body established in the organization. 
Disclosure Guidance 
STOMP recommends establishing criteria for determining when a finding or incident is significant or material to a metric, that is, when there is a substantial likelihood that a stakeholder would consider the information important and it should therefore be disclosed.

STOMP at Snap

Applying the principles carries some overhead, and not every metric is expected to follow STOMP. At Snap, we take a risk-based approach, identifying our most critical business metrics and applying the STOMP principles to them. These metrics are protected by three lines of defense.
First Line of Defense
We require a set of baseline quality checks and internal controls on all key metrics. Changes to existing key metrics are rigorously tested and analyzed before and during release. For example, our DAU is calculated from event logic instrumented in the client versions of our mobile application, and automated QA tests check this logic before each app version release to ensure the metric is instrumented correctly and logged as intended. We gate version releases to a small percentage of iOS and Android users before ramping up to 100% of the user base, so we can monitor for anomalies in DAU reporting relative to previous versions. We have also established thresholds for data loss, delay, and duplication, and put alerts in place to detect breaches.
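Data loss, delay, and duplication thresholds could be checked in many ways. One common technique, sketched here as an assumption rather than Snap's implementation, relies on a per-device sequence number and a unique event ID assigned by the client: gaps in the sequence estimate loss, repeated IDs measure duplication, and the difference between client and server timestamps measures delay. The `LoggedEvent` fields, the thresholds, and the six-hour delay limit are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class LoggedEvent:
    device_id: str
    seq: int          # per-device counter assigned by the client, starting at 0
    event_id: str     # client-generated unique ID used for deduplication
    client_ts: float  # seconds: when the event happened on the device
    server_ts: float  # seconds: when it reached ingestion

THRESHOLDS = {"loss": 0.01, "duplication": 0.005, "delay": 0.02}  # hypothetical
DELAY_LIMIT_SECONDS = 6 * 3600                                    # hypothetical

def quality_rates(events: list[LoggedEvent]) -> dict[str, float]:
    """Loss, duplication, and delay rates for a batch of logged events."""
    seen_ids = set()
    seqs_by_device = defaultdict(set)
    duplicates = delayed = 0
    for e in events:
        if e.event_id in seen_ids:
            duplicates += 1
            continue
        seen_ids.add(e.event_id)
        seqs_by_device[e.device_id].add(e.seq)
        if e.server_ts - e.client_ts > DELAY_LIMIT_SECONDS:
            delayed += 1
    # Missing sequence numbers below each device's highest one count as lost;
    # losses after a device's last received event are invisible to this check.
    expected = sum(max(s) + 1 for s in seqs_by_device.values())
    received = sum(len(s) for s in seqs_by_device.values())
    unique = len(seen_ids)
    return {
        "loss": (expected - received) / expected if expected else 0.0,
        "duplication": duplicates / len(events) if events else 0.0,
        "delay": delayed / unique if unique else 0.0,
    }

def breached(rates: dict[str, float], thresholds: dict[str, float] = THRESHOLDS) -> list[str]:
    """Names of checks whose rate exceeds its threshold; each would raise an alert."""
    return sorted(name for name, rate in rates.items() if rate > thresholds[name])
```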
Second Line of Defense
The Metrics Governance team serves as a dedicated second line of defense, adding further anomaly checks on metric data quality and performing certifications of key metrics.

The first phase of certification reviews the metric's definition and design. The metric's data logging and computation logic are compared against the definition of what the metric is intended to measure. Discrepancies are reported to the metric's data science and instrumentation engineering teams for correction.
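Part of this comparison can be made mechanical when the written definition lists the events that qualify. The sketch below assumes the definition is captured as a hypothetical `MetricSpec` and that the set of events the pipeline actually counts can be extracted from its code or configuration; it reports differences in both directions. All names and events are illustrative, and a real review would also cover exclusions, deduplication, and time windows.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MetricSpec:
    name: str
    qualifying_events: frozenset[str]  # events the written definition says should count

def certification_findings(spec: MetricSpec, implemented_events: set[str]) -> list[str]:
    """Compare the documented definition with the events the computation actually counts."""
    missing = spec.qualifying_events - implemented_events
    extra = implemented_events - spec.qualifying_events
    findings = [f"{spec.name}: defined but not counted: {e}" for e in sorted(missing)]
    findings += [f"{spec.name}: counted but not in definition: {e}" for e in sorted(extra)]
    return findings

spec = MetricSpec("DAU", frozenset({"app_open", "message_sent"}))
pipeline_events = {"app_open", "message_sent", "notification_received"}
for finding in certification_findings(spec, pipeline_events):
    print(finding)  # DAU: counted but not in definition: notification_received
```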
The second phase, completed by the Metrics Governance team’s Forensic Data Scientists, includes testing a metric’s quality across sessions of data from different versions of the app. Anomalies are investigated, root-caused, and routed to the metric’s data science and instrumentation engineering teams for correction. 
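One statistical way to test a metric's behavior across app versions, offered here as an assumption rather than the team's actual method, is a two-proportion z-test: compare the share of sessions that log the metric's qualifying event in each version against a baseline version, and flag versions whose difference is too large to be explained by chance. The version labels, counts, and the |z| > 3 cutoff are hypothetical.

```python
from math import sqrt

def two_proportion_z(hits_a: int, n_a: int, hits_b: int, n_b: int) -> float:
    """z statistic for the difference between proportions hits_a/n_a and hits_b/n_b."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return 0.0 if se == 0 else (p_a - p_b) / se

def flag_versions(by_version: dict[str, tuple[int, int]], baseline: str,
                  z_limit: float = 3.0) -> list[str]:
    """Versions whose (sessions with event, total sessions) rate differs from the baseline."""
    base_hits, base_n = by_version[baseline]
    return [
        version
        for version, (hits, n) in sorted(by_version.items())
        if version != baseline
        and abs(two_proportion_z(hits, n, base_hits, base_n)) > z_limit
    ]

sessions_by_version = {
    "11.0": (9000, 10000),  # baseline: 90% of sessions log the event
    "11.1": (8990, 10000),  # 89.9%: within normal variation
    "11.2": (8500, 10000),  # 85%: could indicate an instrumentation regression
}
print(flag_versions(sessions_by_version, baseline="11.0"))  # ['11.2']
```

With very large session counts, even tiny differences become statistically significant, so a check like this would usually be paired with a minimum effect size before a version is routed for investigation.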
Third Line of Defense
The final line of defense, Snap's Internal Audit team, performs annual audits of select key metrics. Audit results and recommended control improvements are presented to the first and second lines to implement. As an additional layer of oversight, we've established a Metrics Disclosure Committee composed of members from Legal, Finance, Metrics Governance, Product, Engineering, and Audit. We have found that this cross-functional mix brings a good balance of business strategy and risk perspectives to our key metrics. All new key metrics, definition changes, and metric incidents are reviewed by this committee.
What’s Next?

Metrics shared publicly need to be consistent, because customers, partners, and investors rely on them to make decisions every day. An established baseline of quality needs to be in place, and the STOMP consortium has recently been in discussions with several public accounting firms for input on next steps toward making STOMP a set of principles that can be audited.
Are you interested in technical challenges such as these? The Metrics Governance team at Snap is looking for talented people around the world to join us in building out our vision. To learn more, see our current Engineering opportunities or browse all openings at careers.snap.com.
