Universal User Modeling (UUM):

A Foundation Model for User Understanding at Snapchat

Overview

User understanding is crucial to the effectiveness of modern recommender systems. Traditionally, user modeling is performed independently within each product surface (e.g. Discover tab, Spotlight tab) with the goal of improving product-specific metrics. While performant for specific product surfaces, this approach doesn't scale well and overlooks the value of cross-surface signals, limiting the system's ability to uncover deeper relationships in user behavior and preferences as a whole.

At Snap, we introduced Universal User Modeling (UUM)—a foundational user model that captures user behaviors across all surfaces. Rather than replacing existing surface-specific models, UUM complements them through universal user representation enriched with cross-domain signals, enhancing user understanding and personalization. UUM has been adopted across major use cases at Snap including Friend Stories, Ads, Spotlight, Notification, Lens, and Content Search, driving significant engagement and DAU growth.

Design Overview

UUM is a standalone model that generates shareable long-term user embeddings, capturing more than one year of historical behavior. Short-term user embeddings, which reflect recent or near-realtime user activities, continue to be consumed by application-specific models. The architecture is illustrated in the diagram below.

Our approach involves logging user behavior data across all major in-app surfaces (such as Content, Ads, and Lens events, among others) into a unified data collection pipeline. This consolidated dataset enables the training of universal user embeddings that capture both user intent and interest across the entire Snapchat ecosystem. All data used in UUM strictly adheres to Snap’s privacy policy. These embeddings are intended to be general-purpose and reusable across various surfaces.

To capture long-term behavioral patterns, a dedicated model is trained on the full scope of a user’s cross-surface sequence of activities. This model generates consistent, shareable embeddings that provide a richer view of user behavior over time and surfaces, supplementing product-specific signals and enabling broader personalization across the Snap platform.

Data and Feature Engineering

The raw data for UUM consists of engagement events from multiple domains in the Snapchat ecosystem, including Content, Ads, Growth, and Lens. These events are aggregated daily to form the final sequences for model training and serving. Building these sequences efficiently and at scale is challenging: the data spans multiple domains, and the pipeline must stay flexible in feature inclusion, data sampling, and backfilling.

To tackle the complexities of large-scale data and enrich our models, we invested in building a highly efficient and flexible data pipeline. Leveraging Spark and the Iceberg storage format, we assemble raw data into multiple intermediate and domain-specific datasets at daily granularity. This granular organization allows for diverse sampling and merging strategies, offering flexibility for downstream consumption and future scalability.

Our platform also features a flexible cross-domain event merge mechanism, letting models learn intricate user interest profiles across Snapchat. A key challenge was supporting long event sequences from power users. We prioritize high-intent events (like 'boost', 'send') and trim low-intent events (like 'watch') via uniform sampling. This optimized strategy allows us to model user action sequences for over a year, capturing long-term interests effectively.
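
The merge-and-trim idea above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the production pipeline: the `Event` type, the `HIGH_INTENT` set, the `max_len` budget, and the rule for breaking ties when high-intent events alone exceed the budget are all hypothetical. Only the 'boost', 'send', and 'watch' examples come from the text.

```python
import heapq
import random
from typing import NamedTuple


class Event(NamedTuple):
    ts: int      # event timestamp (e.g. epoch seconds)
    domain: str  # e.g. "content", "ads", "lens"
    action: str  # e.g. "send", "boost", "watch"


# Assumption: only 'boost' and 'send' are named as high-intent in the post.
HIGH_INTENT = {"boost", "send"}


def merge_domains(per_domain: dict[str, list[Event]]) -> list[Event]:
    """Merge per-domain, time-sorted event lists into one time-ordered sequence."""
    return list(heapq.merge(*per_domain.values(), key=lambda e: e.ts))


def trim_sequence(events: list[Event], max_len: int, seed: int = 0) -> list[Event]:
    """Keep high-intent events, then fill the remaining budget with a
    uniform random sample of low-intent events, preserving time order."""
    if len(events) <= max_len:
        return list(events)
    high = [i for i, e in enumerate(events) if e.action in HIGH_INTENT]
    low = [i for i, e in enumerate(events) if e.action not in HIGH_INTENT]
    if len(high) >= max_len:
        # Assumption: when high-intent events alone overflow, keep the newest.
        keep = high[-max_len:]
    else:
        rng = random.Random(seed)
        keep = high + rng.sample(low, max_len - len(high))
    return [events[i] for i in sorted(keep)]
```

Sampling indices rather than events keeps the original ordering intact after trimming, which matters because the downstream encoders consume the sequence in time order.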

Model

As depicted in the picture below, we adopt sequence encoders (transformer, multi-head attention) to model user interaction sequences. The output of the sequence encoder is sent through the cross layer for further feature interaction. Finally, the user embedding is concatenated with candidate embeddings for future event prediction.
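
To make the last two steps concrete, here is a small pure-Python sketch of a cross layer followed by the concatenation with a candidate embedding. The post does not specify the exact form of the cross layer, so the DCN-style update `x_next = x0 * (w . x) + b + x` used here is an assumption, as are the function names and the linear prediction head.

```python
import math


def cross_layer(x0: list[float], x: list[float],
                w: list[float], b: list[float]) -> list[float]:
    """One DCN-style cross layer (assumed form): x_next = x0 * (w . x) + b + x."""
    s = sum(wi * xi for wi, xi in zip(w, x))  # scalar projection w . x
    return [x0i * s + bi + xi for x0i, bi, xi in zip(x0, b, x)]


def predict(user_emb: list[float], cand_emb: list[float],
            head_w: list[float], head_b: float) -> float:
    """Concatenate user and candidate embeddings, then apply a
    hypothetical linear head with a sigmoid to score a future event."""
    z = user_emb + cand_emb  # list concatenation
    logit = sum(w * v for w, v in zip(head_w, z)) + head_b
    return 1.0 / (1.0 + math.exp(-logit))


# Usage: `encoded` stands in for the sequence encoder output.
encoded = [0.5, -1.0]
user_emb = cross_layer(encoded, encoded, w=[1.0, 0.0], b=[0.0, 0.0])
prob = predict(user_emb, cand_emb=[0.2, 0.4], head_w=[0.0] * 4, head_b=0.0)
```

The residual term `+ x` lets the layer pass the encoder output through unchanged when the learned interaction is weak.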

Dedicated sequence encoders are used for each domain. Compared to a single cross-domain encoder, this offers flexibility for integrating new domains and mitigates feature heterogeneity and negative transfer (as evidenced in [1]). Nevertheless, the late feature fusion used by these dedicated encoders restricts feature interaction and embedding learning. To address this deficiency, we leverage information bottleneck tokens [1]. These tokens are designed to control the flow of inter-domain information, overcoming the limitations of late fusion and yielding better user embedding representations while mitigating negative transfer.
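
One common way to realize bottleneck tokens is through an attention mask: event tokens attend only to tokens from their own domain plus a few shared bottleneck tokens, so cross-domain information can flow only through the bottleneck. The sketch below builds such a mask. It illustrates the general idea and is not necessarily the formulation in [1]; the token layout and the function name are assumptions.

```python
def bottleneck_attention_mask(domains: list[str],
                              num_bottleneck: int) -> list[list[bool]]:
    """Build mask[i][j] = True when position i may attend to position j.

    Assumed layout: [bottleneck tokens..., event tokens...], where
    `domains` gives the domain label of each event token.
    """
    labels = [None] * num_bottleneck + list(domains)
    n = len(labels)
    mask = [[False] * n for _ in range(n)]
    for i, query in enumerate(labels):
        for j, key in enumerate(labels):
            if query is None or key is None:
                # Bottleneck tokens see, and are seen by, every position.
                mask[i][j] = True
            else:
                # Event tokens attend directly only within their own domain.
                mask[i][j] = query == key
    return mask


# Usage: two Ads events and one Lens event sharing one bottleneck token.
mask = bottleneck_attention_mask(["ads", "ads", "lens"], num_bottleneck=1)
```

In this layout an Ads token cannot attend to a Lens token directly, but both can read from and write to the shared bottleneck position, which limits how much one domain can interfere with another.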

To better align with downstream objectives, the UUM model is trained using a multi-task objective centered on next-k event prediction across the app’s core product surfaces. This approach enables the model to share knowledge across surfaces while reducing noise by averaging over multiple future interactions, leading to a more robust representation of user preferences. We use the cross-entropy loss for binary tasks and the mean squared error loss for regression tasks.
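
As a rough illustration of this objective, the sketch below averages a per-event loss over the next k labels of each task and sums across tasks. The task names, the unweighted sum, and the exact way labels are averaged are assumptions made for the example. The post only states that binary tasks use cross-entropy and regression tasks use mean squared error.

```python
import math


def bce(p: float, y: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one prediction, clamped for numerical safety."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def mse(pred: float, target: float) -> float:
    """Squared error for one prediction."""
    return (pred - target) ** 2


def multi_task_loss(preds: dict[str, float],
                    future: dict[str, list[float]],
                    kinds: dict[str, str],
                    k: int) -> float:
    """Sum over tasks of the loss averaged over the next k future labels."""
    total = 0.0
    for task, pred in preds.items():
        per_event = bce if kinds[task] == "binary" else mse
        labels = future[task][:k]  # assumes at least one future label per task
        total += sum(per_event(pred, y) for y in labels) / len(labels)
    return total


# Usage with two hypothetical tasks and k = 2.
loss = multi_task_loss(
    preds={"spotlight_view": 0.5, "watch_time": 2.0},
    future={"spotlight_view": [1.0, 0.0], "watch_time": [1.0, 3.0]},
    kinds={"spotlight_view": "binary", "watch_time": "regression"},
    k=2,
)
```

Averaging over k future labels means a single atypical next event contributes only 1/k of the task loss, which is one way to read the noise-reduction claim above.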

Use Case: Snapchat Mixed Feed

The diagram outlines how UUM embeddings are applied in one of the end-to-end pipelines for a real-time personalized ranking system (Spotlight) at Snap. User life events and real-time engagement data, both fully compliant with Snap’s privacy policy, are collected and processed through the UUM upstream data pipeline, which includes daily data generation, model training, and embedding generation. These embeddings are ingested into a real-time feature store and also used to enhance ranking data through join/backfill processes. In parallel, engagement data feeds into the ranking pipeline, where ranking data is generated and used to train a ranking model. The trained model is then deployed to power real-time content ranking, enabling dynamic and personalized user experiences across the app.
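
The join/backfill step can be pictured as attaching, to each ranking row, the most recent daily embedding available for that user. The sketch below is a hypothetical illustration: the row schema, the field names, and the rule of using the newest embedding generated on or before the event day are all assumptions and are not described in the post.

```python
from bisect import bisect_right


def latest_embedding(history: list[tuple[int, list[float]]], event_day: int):
    """Return the newest embedding with day <= event_day, or None.

    `history` is a list of (day, embedding) pairs sorted by day.
    """
    days = [day for day, _ in history]
    idx = bisect_right(days, event_day)
    return history[idx - 1][1] if idx else None


def join_embeddings(ranking_rows: list[dict],
                    embeddings_by_user: dict[str, list[tuple[int, list[float]]]]) -> list[dict]:
    """Attach a (hypothetical) 'uum_embedding' field to each ranking row."""
    joined = []
    for row in ranking_rows:
        history = embeddings_by_user.get(row["user_id"], [])
        joined.append({**row, "uum_embedding": latest_embedding(history, row["day"])})
    return joined


# Usage: user u1 has embeddings generated on days 1 and 3.
embeddings = {"u1": [(1, [0.1]), (3, [0.3])]}
rows = [
    {"user_id": "u1", "day": 2},
    {"user_id": "u1", "day": 0},
    {"user_id": "u2", "day": 5},
]
result = join_embeddings(rows, embeddings)
```

Restricting the lookup to embeddings dated on or before the event day keeps backfilled training rows from seeing an embedding that was computed from later behavior.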

Conclusion

Snap’s mission is to empower people to express themselves, live in the moment, learn about the world, and have fun together. To support this goal, the Snapchat app provides users with multiple product surfaces such as Lenses, Spotlight, Discover, and many others. In this work, we have taken our first step toward learning holistic representations that better capture our users’ interests. The work presented in this post allows us to improve our recommendations and the experience of our users. For readers interested in more detail, we have published papers at both KDD [1] and SIRIP (SIGIR) [2] that cover the technical specifics. We will continue to make progress in this direction from both a product and a research perspective.

References:
[1] Revisiting Self-attention for Cross-Domain Sequential Recommendation, C. Ju, L. Neves, B. Kumar, L. Collins, T. Zhao, Y. Qiu, Q. Dou, S. Nizam, S. Yang, N. Shah, KDD 2025
[2] Learning Universal User Representations Leveraging Cross-domain User Intent at Snapchat, C. Ju, L. Neves, B. Kumar, L. Collins, T. Zhao, Y. Qiu, Q. Dou, Y. Zhou, S. Nizam, R. Ozturk, Y. Liu, S. Yang, M. Malik, N. Shah, SIGIR 2025