Preparing Assignment Source Tables for Warehouse Native Experimentation

Overview

To prepare Assignment Sources for Warehouse Native Experimentation, transform your raw exposure or impression logs into a clean, standardized table that serves as the foundation for experimentation analyses.

This page describes the required fields, recommended fields, and best practices for preparing your assignment source tables.

Required columns

An Assignment Source is a data source that defines how users are assigned to different variations in an experiment; in Warehouse Native, this data is typically stored in your data warehouse. Every Assignment Source table must include the following columns:

| Column | Type | Description |
| --- | --- | --- |
| Unique Key | STRING | Unique identifier for the unit of randomization (for example, user_id, account_id, or a custom key). Must be stable across the experiment duration. |
| Exposure Timestamp | DATETIME / TIMESTAMP | The precise time when the assignment occurred (for example, when an impression was logged, a flag was evaluated, or getTreatment was called). |
| Treatment (Variant Group) | STRING | The assigned experiment variant (for example, control, treatment_a, variant_1). |
Note: These fields are mandatory. Without them, Warehouse Native cannot map exposures to experiment results.
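
For example, a minimal sketch that produces these three columns (the raw table raw_exposures and its column names logged_at and variant are assumptions; adapt them to your schema):

    SELECT
      user_id,                                           -- unique key for the randomization unit
      CAST(logged_at AS TIMESTAMP) AS exposure_timestamp,
      variant AS treatment                               -- assigned variant group
    FROM raw_exposures;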

Recommended columns

While not required, the following fields make debugging, filtering, and governance more efficient.

| Column | Type | Description |
| --- | --- | --- |
| Experiment ID / Name | STRING | Helps differentiate exposures when multiple experiments are logged in the same raw table. |
| Targeting Rule | STRING | Indicates which targeting rule or condition led to the assignment; useful for auditing and debugging. If you are using FME feature flag impressions, filter by a single targeting rule to ensure the experiment analyzes the intended population. |
| Environment ID | STRING | Allows filtering by environment (for example, production, staging). When configuring an assignment source in FME, you can map column values to a matching Harness environment or hard-code a single environment. Each experiment must be scoped to one environment. |
| Traffic Type | STRING | Distinguishes the unit type (for example, user, account, anonymous visitor). When configuring an assignment source, you can map column values to a traffic type or hard-code a single traffic type. Each experiment must be scoped to one traffic type. |
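
If your raw table carries these fields, a sketch like the following scopes the data before analysis (the table name and filter values are hypothetical):

    SELECT
      user_id,
      experiment_id,
      treatment,
      exposure_timestamp,
      traffic_type
    FROM raw_impressions
    WHERE environment_id = 'production'     -- keep one environment per experiment
      AND targeting_rule = 'default rule';  -- analyze a single targeting rule's population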

Common raw table schemas

Most organizations log impressions or exposures from feature flag evaluations, SDKs, or event pipelines. Below are common raw schemas and how to normalize them.

Feature Flag Evaluation Logs

Example Raw Schema: user_id, flag_name, treatment, impression_time, environment, rule_id

Transformations:

• Map flag_name values → experiment_id (if multiple flags correspond to the same experiment).
• Cast impression_time to TIMESTAMP.
• Deduplicate on (user_id, experiment_id), keeping the earliest exposure.
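
A sketch of these steps in SQL (the raw table name flag_impressions is an assumption; QUALIFY is supported in warehouses such as BigQuery and Snowflake):

    SELECT
      user_id,
      flag_name AS experiment_id,  -- or map via CASE/join if several flags share one experiment
      treatment,
      CAST(impression_time AS TIMESTAMP) AS exposure_timestamp
    FROM flag_impressions
    QUALIFY ROW_NUMBER() OVER (
      PARTITION BY user_id, flag_name
      ORDER BY impression_time ASC   -- keep the earliest exposure
    ) = 1;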

A/B Test Impression Logs

Example Raw Schema: experiment_id, user_id, bucket (or arm), impression_time

Transformations:

• Standardize bucket → treatment.
• Standardize impression_time → exposure_timestamp.
• Deduplicate to keep only the first exposure per user per experiment.
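
A corresponding sketch (the raw table name ab_impressions is an assumption):

    SELECT
      user_id,
      experiment_id,
      bucket AS treatment,
      impression_time AS exposure_timestamp
    FROM ab_impressions
    QUALIFY ROW_NUMBER() OVER (
      PARTITION BY user_id, experiment_id
      ORDER BY impression_time ASC   -- first exposure per user per experiment
    ) = 1;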

Event Logging Pipelines (Custom Analytics Events)

Example Raw Schema: event_name, event_time, properties.experiment_id, properties.variant, properties.user_id

Transformations:

• Flatten nested fields (JSON → explicit columns).
• Filter to only event_name = 'experiment_exposure'.
• Standardize column names to match the required schema.
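
A sketch using BigQuery-style JSON functions (the raw table name analytics_events is an assumption; other warehouses expose different JSON accessors):

    SELECT
      JSON_VALUE(properties, '$.user_id') AS user_id,
      JSON_VALUE(properties, '$.experiment_id') AS experiment_id,
      JSON_VALUE(properties, '$.variant') AS treatment,
      CAST(event_time AS TIMESTAMP) AS exposure_timestamp
    FROM analytics_events
    WHERE event_name = 'experiment_exposure';  -- exposure events only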

Prepare your assignment table

Follow these best practices for preparing your assignment table in your data warehouse.

  • De-duplication: Keep only the earliest exposure per user per experiment. For example (assuming a raw table named raw_exposures):

    SELECT *
    FROM raw_exposures
    QUALIFY ROW_NUMBER() OVER (
      PARTITION BY user_id, experiment_id
      ORDER BY exposure_timestamp ASC
    ) = 1
  • Consistent Variant Labels: Standardize variant naming (control, treatment, variant_1) across experiments. Avoid null or empty strings; default to control if needed.

  • Timestamps in UTC: Store all exposure timestamps in UTC for consistent comparisons across regions.

  • Stable Identifiers: Use the same user or account key across your Assignment Source and Metric Source tables (a Metric Source defines how metrics are collected and calculated for an experiment). If your system logs multiple IDs (for example, cookie_id and user_id), choose the most stable one.

  • Environment Separation: If raw tables mix environments (for example, staging and production), add an environment_id column and filter accordingly. This prevents test data from leaking into production analyses.

  • Partitioning and Indexing: Partition large tables by DATE(exposure_timestamp) to optimize query performance, and cluster or index by experiment_id and user_id for faster lookups; see the sketch after this list.
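
A BigQuery-flavored sketch of the partitioning and clustering advice (table and source names are hypothetical; other warehouses use their own partitioning DDL):

    CREATE TABLE analytics.assignment_source
    PARTITION BY DATE(exposure_timestamp)
    CLUSTER BY experiment_id, user_id
    AS
    SELECT * FROM prepared_exposures;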

Example prepared table schema

| Column | Type | Example |
| --- | --- | --- |
| user_id | STRING | abc123 |
| experiment_id | STRING | checkout_flow_v2 |
| treatment | STRING | control |
| exposure_timestamp | TIMESTAMP | 2025-03-14T12:45:00Z |
| environment_id | STRING | prod |
| traffic_type | STRING | user |
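
Before connecting the table, a quick sanity check can catch duplicate exposures and missing variant labels (a sketch; COUNTIF is BigQuery-style, so use CASE expressions in other warehouses):

    SELECT
      COUNTIF(treatment IS NULL OR treatment = '') AS missing_treatments,
      COUNT(*) - COUNT(DISTINCT CONCAT(user_id, '|', experiment_id)) AS duplicate_exposures
    FROM analytics.assignment_source;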

Once your Assignment Source tables are prepared and validated, see Setting Up an Assignment Source to connect them in Harness FME.