Access to DWH

Access to DWH is paid service that should be requested separately.

Please contact your account manager to get more details.

Intro

Smartico provides access to the Data Warehouse (DWH) with row data related to the system operation. The DWH is based on Google BigQuery and exposes facts and dimensions tables.

There are mainly three domains of data exposed through the DWH:

  • CRM - information about campaigns, communication, conversions, automation rules etc

  • Gamification - missions, levels, tournaments, mini-games, points, store purchases

  • Affiliation - registration facts, financial transactions, payments, adjustments, affiliates information

Some of these domains have shared information, e.g. bonuses - can be given from the CRM and from Gamification. In the Affilication domain you will find also CRM data, so you can analyse all communication that is addressed to the affiliates.

Smartico DWH shouldn't be used to run analytical queries directly, as limitations apply to how much data can be retrieved daily.

Instead, you should do a batch load of changes to your analytical system and run reports on your side.

All fact tables are partitioned by date for your convenience so that you can load only the data delta for the completed day efficiently. We recommend running the import of the previous day a few hours after midnight in UTC to ensure that last-day data is fully delivered to the fact tables.

Pay attention to the fact that the data set 'dwh_ext_xxx' is given as an example. The dataset name for your particular setup is the same as the name of the user that you will get to access DWH.

Records in the fact tables are immutable (except cases when stated explicitly), which means they are not changing after they are added.

Fact tables are updated in real-time, and dimensional tables are updated once per hour.

Dimensional tables

dwh_ext_xxx.dm_activity_type

List of possible "activities" that can be used to build a Flow for the scheduled & real-time campaigns. There are activities like "Send SMS", "Send Mail", "Give Bonus", "Stop campaign" etc.

There are essential activities that exist in any campaign, like

  • Journey start (1) - represents the fact that the campaign started

  • Journey stopped (2) - represents the stop of the campaign

  • Journey Converted (3) - represents the fact that the campaign was converted

Table structure:

ColumnTypeNote

activity_type_id

int64

activity_name

string

dwh_ext_xxx.dm_audience

The list of campaigns. Includes both real-time and scheduled.

Table structure:

ColumnTypeNote

audience_id

int64

audience_name

string

entry_mode_id

int64

0 - ONCE_IN_A_LIFE 1 - ONCE_IN_AN_OPEN_JOURNEY

2 - EVERY_TIME_CONDITION_MET

3 - STOP_AND_START

audience_status_id

int64

1 - Draft 2 - Active 3 - Paused 4 - Disabled 5 - Archived 6 - Executed (for Scheduled only)

audience_exec_type_id

int64

1 - Realtime campaign of Marketing category 2 - Realtime campaign of Operational category 3 - Scheduled campaign of Marketing category 4 - Scheduled campaign of Operational category

dwh_ext_xxx.dm_resource

Communication resources (templates) of all types, e.g. SMS, popups, emails, etc.

Note that the list includes both - main resources and variations of the main resources

Table structure:

ColumnTypeNote

resource_id

int64

resource_name

string

resource_type_id

int64

1 - Mail 2 - Popup 3 - SMS 4 - Push 5 - Inbox 9 - IVR

dwh_ext_xxx.dm_ach

Table contains information about missions and badges

ColumnTypeNote

ach_id

int64

ID of mission/badge

internal_name

string

Name of mission/badge

dwh_ext_xxx.dm_ach_points_change_source

Lookup table with possible sources for the points transactions.

ColumnTypeNote

source_type_id

int64

ID

source_type_name

string

Name of the source

CRM fact tables

dwh_ext_xxx.j_communication

Includes all communications sent to the end-user and all possible outcomes (fact_type_id) of these communications, like failure of delivery, impressions, clicks, etc.

Ideas to understand communications concept

  • the communication is always triggered by some "activity" from the campaign. This activity is referenced by engagement_id or engagement_uid. Such "activity" is part of the campaign, where the starting point of the campaign (the fact that the user entered into the campaign) is presented by the root_engagement_uuid & root_engagement_create_date. The campaign definition itself if presented by root_audience_id

  • The first fact that happens for communication is CREATE, after that, it is usually either SENT and DELIVERED or FAILED. It can be later seen by the end-user (IMPRESSION) and CLICKED. Users can also OPTOUT from the communication by following the "opt-out" link in the resource. For Inbox type of message, the user can also READ it and DELETE

  • The communication is always connected to some resource (template), presented as resource_id, and in case the resource has variations, by the resource_variation_id

  • Note that smartico is tracking the fact of any communication only 30 days after it's created

Partitioned by: fact_date

Data availability: last 180 days by fact_date

Table structure:

ColumnTypeNote

fact_date

timestamp

The date/time when fact happened

user_id

int64

fact_type_id

int64

CREATE = 1 SENT = 2 IMPRESSION = 3 CLICK = 4 FAIL = 5 DELIVERED = 6 OPTOUT = 7 READ = 8 (for Inbox only) DELETE = 9 (for Inbox only)

fact_details

string

for CLICK fact - the link that was clicked for FAIL - the detailed reason for fail

activity_type_id

int64

Type of activity that triggered that communication (FK: smarticoC.dm_activity_type)

engagement_uid

uuid

Reference to the smarticoC.j_engagements, represents the point in the Campaign/Flow from where the communication was triggered

resource_id

int64

FK: smarticoC.dm_resource, main resource

resource_variation_id

int64

FK: smarticoC.dm_resource, specific variation of the resource

root_audience_id

int64

FK: smarticoC.dm_audience, the campaign that triggered communication

root_engagement_uuid

uuid/string

Reference to the smarticoC.j_engagements, represents the "root" point in the Campaign/Flow from where the communication was triggered

root_engagement_create_date

timestamp

The date when the campaign that triggered specific communication is started for the user

engagement_id

int64

Similar to engagement_uid, but in the int64 presentation

communication_id

int64

The unique ID of the communication will be the same for all facts through which the communication is transitioning. The ID id generated on the fact CREATE(1)

audience_category_id

int64

Category of the campaign. 1 - Marketing 2 - Operational

external_id

string

ID of the communication assigned by the external gateway at the moment when the external gateway accepted communication from Smartico

planned_send_date

timestamp

In some setups, the campaign that creates communication is scheduling delivery for a later time, in such case planned_send_date will reflect this date

fail_reason_id

int64

ID of the fail reason, FK: smarticoC.dm_com_fail_reason

user_ext_id

string

The ID of the user in the operator system

dwh_ext_xxx.j_engagements

The table represents the steps of how the user enters the campaign, exits it, converts, and goes through all the "activities" of the Flow defined in the campaign.

The table represents the recursive structure of user transition over the campaign graph.

For most of the use cases, you will need to analyze only records that have activity_type_id as one of

  • 1 - represents the start of the campaign

  • 2 - stop the campaign

  • 3 - conversion of campaign

The table structure below describes only essential fields that could be used to analyze the campaign performance. If you need a deeper analysis, please contact support@smartico.ai for an explanation.

Partitioned by: create_date

Data availability: last 180 days by create_date

Table structure:

ColumnTypeNote

engagement_id

int64

unique id

engagement_uid

uuid/string

same as unique id, but uuid presentation

root_audience_id

int64

FK: smarticoC.dm_audience

create_date

timestamp

time when user got this activity

user_id

int64

root_engagement_id

int64

reference to the same table pointing to the starting point of this campaign for the specific user

root_engagement_create_date

timestamp

The date when the campaign is started for a specific user

activity_type_id

int64

FK: smarticoC.dm_activity_type

user_ext_id

string

The ID of the user in the operator system

resource_id

int64

ID of the resource if the activity is related to the communication. Important that the field is populated retrospectively every night at 3 UTC

Example of query that returns

  • data of label 2305

  • users entered campaign 793485 after 20/11/2023

  • and converted in this campaign

   WITH SRC AS (
        SELECT 
            engagement_id,
            root_audience_id,
            root_engagement_id,
            root_engagement_create_date,
            user_id,
            engagement_uid,
            activity_type_id,
            create_date,
            from_control_group,
            activity_id
        FROM dwh_ext_xxx.j_engagements
        WHERE label_id = 2305 AND activity_type_id in (1,2,3)
            AND root_audience_id IN (793485)
            AND create_date >= '2023-11-20T00:00:00Z'
            AND root_engagement_create_date >= '2023-11-20T00:00:00Z'
            -- for the performance, you can limit upper date only with 30 days from the 'to', this is MAX duration of campaign
            AND create_date <= TIMESTAMP_ADD('2023-11-21T23:59:59Z', INTERVAL 31 DAY)
            AND root_engagement_create_date <= '2023-11-21T23:59:59Z'
    ),
    PREFINAL as (
        SELECT 
            SRC.root_audience_id, SRC.root_engagement_id, user_id,
            MAX(CASE WHEN SRC.activity_type_id = 1 then SRC.engagement_uid else null end) as root_engagement_uid,
            MAX(CASE WHEN SRC.activity_type_id = 1 then SRC.create_date ELSE NULL END) as start_date,
            MAX(CASE WHEN SRC.activity_type_id = 3 then SRC.create_date ELSE NULL END) as convert_date,
            MAX(CASE WHEN SRC.activity_type_id = 2 then SRC.create_date ELSE NULL END) as stop_date,   
            MAX(CASE WHEN SRC.activity_type_id = 1 THEN SRC.from_control_group ELSE NULL END) as from_control_group
        FROM SRC
        WHERE true
        GROUP BY SRC.root_audience_id, SRC.root_engagement_id, SRC.user_id
    ),
    FINAL as (
        SELECT 
            COUNT(*) OVER() AS cnt, ROW_NUMBER() OVER(ORDER BY PREFINAL.start_date DESC) AS row,
            PREFINAL.*,
        FROM PREFINAL
        WHERE true AND convert_date IS NOT NULL
    )
    SELECT *
    FROM FINAL

Gamification fact tables

dwh_ext_xxx.g_ach_completed

The table represents history of all completed missions & badges

Partitioned by: create_date

Data availability: last 180 days by create_date

ColumnTypeNote

ach_id

INT64

ID of the mission/badge

create_date

TIMESTAMP

The time when mission/badge compelted

is_recurring

BOOLEAN

Indicate if the original mission was recurring

user_ext_id

STRING

External ID of the user

crm_brand_id

INT64

ID of the brand in the Smartico system

Example of query returning number of users completed missions in the last 3 days

select A.internal_name, A.ach_id, count(*) as completed_users_count
from dwh_ext_xxx.g_ach_completed M 
inner join dwh_ext_xxx.dm_ach A ON M.ach_id = A.ach_id 
WHERE M.create_date > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 3 DAY)
GROUP BY A.internal_name, A.ach_id

dwh_ext_xxx.g_ach_points_change_log

The table represents all changes in the gamification points

Partitioned by: create_date

Data availability: last 180 days by create_date

ColumnTypeNote

create_date

TIMESTAMP

The time of transaction

source_type_id

INT64

Type of of the source of the change, refer to dm_ach_points_change_source

source_entity_id

INT64

Id of the entity that triggered transaction, the lookup table depends on the source_type_id

points_collected

INT64

Amount of points in transaction, could be negative in case of deduction

user_points_ever

INT64

Amount of points user collected ever after the transaction

user_points_balance

INT64

Current points balance after transaction

user_ext_id

STRING

External ID of the user

crm_brand_id

INT64

ID of the brand in the Smartico system

Example of query showing the source of the points credit and deposit

select S.source_type_name, 
	sum( CASE WHEN points_collected > 0 THEN points_collected ELSE 0 END) as points_credit,
	sum( CASE WHEN points_collected < 0 THEN -1 * points_collected ELSE 0 END) as points_debit
from dwh_ext_12078.g_ach_points_change_log L 
inner join dwh_ext_12078.dm_ach_points_change_source S ON L.source_type_id = S.source_type_id 
WHERE L.create_date  > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 3 DAY)
group by S.source_type_name

Affiliation dimensional & fact tables

Example of query that calculates total commission with split by components (CPA, CPL, RevShare) for all affiliates lifetime.

WITH X as (
  SELECT 
    A.affiliate_id, A.parent_affiliate_id,
    COALESCE(SUM(cpa_fee),0) as cpa_fee,
    COALESCE(SUM(cpl_fee),0) as cpl_fee,
    COALESCE(SUM(rev_share_fee),0) as rev_share_fee,
    COALESCE(SUM(sub_affiliate_fee),0) as sub_affiliate_fee
  FROM dwh_ext_XXX.aff_fin_history_daily F
  INNER JOIN dwh_ext_XXX.dm_affiliate A ON F.affiliate_id = A.affiliate_id
  WHERE operation_date > '2000-01-01 00:00:00 UTC'
  GROUP BY A.affiliate_id, A.parent_affiliate_id
),
SUB_SUMS as (
  SELECT parent_affiliate_id, SUM(X.sub_affiliate_fee) as sub_affiliate_fee
  FROM X
  GROUP BY parent_affiliate_id
  HAVING SUM(X.sub_affiliate_fee) > 0
),
FINAL as (
  select TOP_AFFS.* EXCEPT(sub_affiliate_fee, parent_affiliate_id), SUB_SUMS.sub_affiliate_fee, 
    TOP_AFFS.cpa_fee + TOP_AFFS.rev_share_fee + TOP_AFFS.cpl_fee + COALESCE(SUB_SUMS.sub_affiliate_fee,0) as total_commission
  from X as TOP_AFFS
  left join SUB_SUMS ON TOP_AFFS.affiliate_id = SUB_SUMS.parent_affiliate_id
)
SELECT * 
FROM FINAL
ORDER BY total_commission DESC

Notes:

  1. Provided example of calculating of sub_affiliate_fee should be used only in the setup with one level of sub-affiliates network (when only direct parent of affiliate is getting attribution from the comission generated by his children). In case of multi-level affiliation network setup, the sub-affailites commissions should be calculated in the different way

  2. These calculations are not taking into accounts adjustments that can be done in 3 differences cases:

    1. Manual adjustments

    2. Adjustments based on the "Tiers" based deal structures (RevShare tiers)

    3. Adjustment done beginning of the month based on "No negative cary over" calculations

dwh_ext_xxx.dm_affiliate

The table represent information about affiliate

ColumnTypeNote

affiliate_id

INT64

ID of the affiliate

create_date

TIMESTAMP

Date when affiliate is created

parent_affiliate_id

INT64

ID of the parent affiliate

manager_id

INT64

ID of the manager

payment_method_id

INT64

ID of payment method

ext_affiliate_id

STRING

ID of affiliate in the external system

dwh_ext_xxx.aff_fin_history_daily

The table represents daily aggregation of all financial data system collects about a player. Each column refers to some type of financial data and reflect the SUM of all numbers affilaite system have received for some "operation_date". To understand the final numbers for a player, you should SUM all records for a player, for example if you need to know the total deposit amount of a player, you should SUM all "deposits" + "first_deposit" columns for a player.

Partitioned by: operation_date

Table structure:

ColumnTypeNote

ext_customer_id

STRING

Player ID the way it's reported to affiliate system

brand_id

INTEGER

Brand ID in affiliate system

country

STRING

ISO2 country code of the player

affiliate_id

INTEGER

Affiliate ID in case player belongs to one

registration_id

INTEGER

TAP registration id

operation_date

TIMESTAMP

financial day

update_date

TIMESTAMP

last update date

first_deposit

NUMERIC

first deposit amount

deposits

NUMERIC

deposits amount reported during operation_date

withdrawals

NUMERIC

withdrawals amount reported during operation_date

chargebacks

NUMERIC

chargebacks amount reported during operation_date

bonuses

NUMERIC

bonuses amount reported during operation_date

deductions

NUMERIC

deductions amount reported during operation_date

bets

NUMERIC

bets amount reported during operation_date

wins

NUMERIC

wins amount reported during operation_date

operations

NUMERIC

operations count reported during operation_date

ggr

NUMERIC

GGR amount reported during operation_date

cpl_fee

NUMERIC

CPL fee generated during operation_date

cpa_fee

NUMERIC

CPA fee generated during operation_date

rev_share_fee

NUMERIC

RevShare fee generated during operation_date

sub_affiliate_fee

NUMERIC

Total sub affiliate fee generated during operation_date for all affiliate levels

net_pnl

NUMERIC

NET PnL calculated during operation_date

deposit_count

NUMERIC

deposits count reported during operation_date

withdrawal_count

NUMERIC

withdrawals count reported during operation_date

q_cpa

NUMERIC

if value is 1, it means that at this operation_date player was qualified for CPA

q_cpl

NUMERIC

if value is 1, it means that at this operation_date player was qualified for CPL

FAQ

Can't access DWH from Power BI, error: Unable to authenticate with Google BigQuery Storage API. Check your account permissions.

You need to disable the usage of Storage API in Power BI settings following this guide - https://learn.microsoft.com/en-us/power-query/connectors/google-bigquery.

"Use Storage API" - A flag that enables using the Storage API of Google BigQuery. This option is true by default. This option can be set to false to not use the Storage API and use REST APIs instead.

Can't access table dwh_ext_xxx.j_engagements and any other

You need to replace "xxx" value with the label_id specific for your setup. You can also see the label_id in the name of BigQuery user that has the format like "dwh_ext_1234"

Can different labels and brands are access with the samw DWH account

Every label requires separate account to access DWH.

You could have many brands under one label setup, data related to all these labels will be availalbe from the repsective DWH account for this label

Last updated

Change request #417: