Lookalike audience modeling collaboration
About the template
The lookalike audience modeling template helps you find and target new customers who resemble your most valuable existing ones. It trains a custom XGBoost machine learning model within a Snowflake Collaboration Data Clean Room, so neither party exposes or shares raw data with the other.
This example uses a 3-step pipeline built on internal tables for multistep workflows. Each step saves its results to an internal table in the clean room, and the next step reads from that table. This gives the advertiser a decision point after each step:
Train: Build an XGBoost model on the seed audience. Review the model quality (AUC, error rate) before proceeding.
Score: Score the full publisher population using the trained model. Review the audience size before activating.
Activate: Send the scored lookalike audience to the publisher account.
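The control flow of the three steps, including the decision point after each one, can be sketched in plain Python. This is an illustration only, not the template's code: the `tables` dictionary stands in for the clean room's internal tables, a per-region positive-rate lookup stands in for the XGBoost model, and every function name, column name (`is_seed`, `region`, `user_id`), and threshold is hypothetical.

```python
# Illustrative sketch only: not the template's code and not a Snowflake API.
# A dict stands in for internal tables; a rate lookup stands in for XGBoost.
from collections import defaultdict

tables = {}  # hypothetical stand-in for the clean room's internal tables


def auc(labels, scores):
    """Chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def train(rows):
    """Step 1: fit the stand-in model and save it with its quality metrics."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in rows:
        totals[r["region"]] += 1
        hits[r["region"]] += r["is_seed"]
    model = {region: hits[region] / totals[region] for region in totals}
    labels = [r["is_seed"] for r in rows]
    scores = [model[r["region"]] for r in rows]
    errors = sum((s >= 0.5) != bool(y) for y, s in zip(labels, scores))
    tables["model"] = model
    tables["metrics"] = {"auc": auc(labels, scores), "error_rate": errors / len(rows)}
    return tables["metrics"]


def score(population, default=0.0):
    """Step 2: read the saved model and score every user in the population."""
    model = tables["model"]
    tables["scores"] = [
        {"user_id": u["user_id"], "score": model.get(u["region"], default)}
        for u in population
    ]
    return len(tables["scores"])


def activate(threshold):
    """Step 3: select the lookalike audience that would be sent to the publisher."""
    return [s["user_id"] for s in tables["scores"] if s["score"] >= threshold]


training_rows = [
    {"region": "west", "is_seed": 1},
    {"region": "west", "is_seed": 1},
    {"region": "west", "is_seed": 0},
    {"region": "east", "is_seed": 0},
    {"region": "east", "is_seed": 0},
    {"region": "east", "is_seed": 1},
]
population = [
    {"user_id": "u1", "region": "west"},
    {"user_id": "u2", "region": "east"},
    {"user_id": "u3", "region": "north"},
]

metrics = train(training_rows)
audience = []
if metrics["auc"] > 0.5:          # decision point after training
    scored_count = score(population)
    if scored_count > 0:          # decision point after scoring
        audience = activate(threshold=0.5)
```

Because each step persists its output before the next one reads it, a run can stop after training or scoring without losing earlier results.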
This example demonstrates a two-party collaboration where the publisher is the collaboration owner and provides a data offering, a code bundle containing two Python UDFs (one for training, one for scoring), and three templates (train, score, activate). The advertiser joins the collaboration, links their seed audience data, and runs the three templates in sequence.
Collaboration roles
| Collaborator | Roles | Actions |
|---|---|---|
| Publisher | Owner, data provider | Registers a data offering (user features including membership status, age band, region, and activity level), a code bundle with Python UDFs for model training and scoring, and three templates (train, score, activate), then creates the collaboration. After the advertiser activates results, the publisher views and processes the lookalike audience data. |
| Advertiser | Analysis runner, data provider (to self) | Registers a data offering (seed audience with purchase amounts and segments). Joins the collaboration, links their data, and runs the 3-step pipeline: trains the model, scores the population, and activates the scored audience to the publisher. |
Key use cases
Customer acquisition: Find new customers who are similar to your most valuable existing customers by building a predictive model on shared features.
Increase ROI: Improve the return on investment of your marketing campaigns by targeting users who are statistically more likely to convert.
Expand market reach: Discover new market segments that you may not have previously considered, based on feature patterns in your seed audience.
Personalized advertising: Deliver more relevant and personalized ad experiences by targeting a data-driven lookalike audience rather than broad demographics.
Get the worksheets and template
Download the worksheets and install them in two separate Snowflake accounts in the same organization and the same cloud hosting environment. These worksheets show how to create and run a collaboration with a lookalike audience modeling template that you can use and modify.
Step 1: Generate sample data
Generate sample data in both your publisher and advertiser accounts by running the Python sample data generator.
Download the Python sample data table generator.
Tip
To run the sample data generator:
In Snowsight, go to Projects > Worksheets > + > Python Worksheet.
Paste the contents of the downloaded file into the worksheet.
Set Handler to `main` and Return type to `String`.
Update the `DATABASE_NAME` and `SCHEMA_NAME` variables with your values.
Select Run.
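For orientation, a Python worksheet handler that matches these settings has roughly the shape below. This is a minimal sketch, not the downloaded generator: the table name, columns, and values are hypothetical (the columns loosely follow the publisher features described in the roles table), and the only Snowpark call it assumes is `session.sql(...).collect()`.

```python
# Minimal sketch of a Python worksheet handler; NOT the downloaded generator.
# Table name, columns, and row values below are hypothetical.
import random

DATABASE_NAME = "MY_DB"              # replace with your database
SCHEMA_NAME = "MY_SCHEMA"            # replace with your schema
TABLE_NAME = "SAMPLE_USER_FEATURES"  # hypothetical table name

REGIONS = ["NORTH", "SOUTH", "EAST", "WEST"]
AGE_BANDS = ["18-24", "25-34", "35-44", "45+"]


def build_rows(count, seed=42):
    """Generate reproducible fake user rows as (user_id, age_band, region)."""
    rng = random.Random(seed)
    return [
        (f"user_{i}", rng.choice(AGE_BANDS), rng.choice(REGIONS))
        for i in range(count)
    ]


def main(session, row_count=5):
    """Handler named `main`; returns a str to match Return type String."""
    target = f"{DATABASE_NAME}.{SCHEMA_NAME}.{TABLE_NAME}"
    session.sql(
        f"CREATE OR REPLACE TABLE {target} "
        "(USER_ID STRING, AGE_BAND STRING, REGION STRING)"
    ).collect()
    values = ", ".join(
        f"('{uid}', '{band}', '{region}')"
        for uid, band, region in build_rows(row_count)
    )
    session.sql(f"INSERT INTO {target} VALUES {values}").collect()
    return f"Created {target} with {row_count} rows"
```

The worksheet calls `main` with the active session, and the returned string appears in the results pane, which is why the handler name and return type must match the worksheet settings.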
Step 2: Run the publisher and advertiser worksheets
After generating sample data, download and run the publisher and advertiser worksheets. Run these worksheets using the same role you used to generate the sample data. See instructions to upload a SQL worksheet into your Snowflake account.