# Access to DWH

{% hint style="info" %}
Access to DWH is paid service that should be requested separately.&#x20;

Please contact your account manager to get more details.&#x20;
{% endhint %}

## Intro

Smartico provides access to the Data Warehouse (DWH) with row data related to the system operation. The DWH is based on Google BigQuery and exposes facts and dimensions tables.&#x20;

There are mainly three domains of data exposed through the DWH:

* CRM - information about campaigns, communication, conversions, automation rules etc
* Gamification - missions, levels, tournaments, mini-games, points, store purchases
* Affiliation - registration facts, financial transactions, payments, adjustments, affiliates information

Some of these domains have shared information, e.g. bonuses - can be given from the CRM and from Gamification. In the Affilication domain you will find also CRM data, so you can analyse all communication that is addressed to the affiliates.

{% hint style="warning" %}
Smartico DWH shouldn't be used to run analytical queries directly, as limitations apply to how much data can be retrieved daily.

Instead, you should do a batch load of changes to your analytical system and run reports on your side.

All fact tables are partitioned by date for your convenience so that you can load only the data delta for the completed day efficiently. We recommend running the import of the previous day a few hours after midnight in UTC to ensure that last-day data is fully delivered to the fact tables.

Pay attention to the fact that the data set '**dwh\_ext\_xxx**' is given as an example. The dataset name for your particular setup is the same as the name of the user that you will get to access DWH.&#x20;
{% endhint %}

Records in the fact tables are immutable (except cases when stated explicitly), which means they are not changing after they are added.

Fact tables are updated in **real-time**, and dimensional tables are updated once per **hour**.

## Dimensional tables

### dwh\_ext\_xxx.dm\_activity\_type

List of possible "activities" that can be used to build a Flow for the scheduled & real-time campaigns. There are activities like "Send SMS", "Send Mail", "Give Bonus", "Stop campaign" etc.

There are essential activities that exist in any campaign, like

* Journey start (1) - represents the fact that the campaign started
* Journey stopped (2) - represents the stop of the campaign
* Journey Converted (3) - represents the fact that the campaign was converted

Table structure:

<table><thead><tr><th width="224">Column</th><th width="132">Type</th><th>Note</th></tr></thead><tbody><tr><td>activity_type_id</td><td>int64</td><td></td></tr><tr><td>activity_name</td><td>string</td><td></td></tr></tbody></table>

### dwh\_ext\_xxx.dm\_audience

The list of campaigns. Includes both real-time and scheduled.

Table structure:

<table><thead><tr><th width="224">Column</th><th width="132">Type</th><th>Note</th></tr></thead><tbody><tr><td>audience_id</td><td>int64</td><td></td></tr><tr><td>audience_name</td><td>string</td><td></td></tr><tr><td>entry_mode_id</td><td>int64</td><td><p>0 - ONCE_IN_A_LIFE<br>1 - ONCE_IN_AN_OPEN_JOURNEY</p><p>2 - EVERY_TIME_CONDITION_MET</p><p>3 - STOP_AND_START </p></td></tr><tr><td>audience_status_id</td><td>int64</td><td>1 - Draft<br>2 - Active<br>3 - Paused<br>4 - Disabled<br>5 - Archived<br>6 - Executed (for Scheduled only)</td></tr><tr><td>audience_exec_type_id</td><td>int64</td><td>1 - Realtime campaign of Marketing category<br>2 - Realtime campaign of Operational category<br>3 - Scheduled campaign of Marketing category<br>4 - Scheduled campaign of Operational category<br></td></tr></tbody></table>

### dwh\_ext\_xxx.dm\_resource <a href="#table-smarticocdm_resource" id="table-smarticocdm_resource"></a>

Communication resources (templates) of all types, e.g. SMS, popups, emails, etc.

Note that the list includes both - main resources and variations of the main resources

Table structure:

<table><thead><tr><th width="224">Column</th><th width="132">Type</th><th>Note</th></tr></thead><tbody><tr><td>resource_id</td><td>int64</td><td></td></tr><tr><td>resource_name</td><td>string</td><td></td></tr><tr><td>resource_type_id</td><td>int64</td><td>1 - Mail<br>2 - Popup<br>3 - SMS<br>4 - Push<br>5 - Inbox<br>9 - IVR</td></tr></tbody></table>

### dwh\_ext\_xxx.dm\_ach <a href="#table-smarticocdm_resource" id="table-smarticocdm_resource"></a>

Table contains information about missions and badges

<table><thead><tr><th width="224">Column</th><th width="132">Type</th><th>Note</th></tr></thead><tbody><tr><td>ach_id</td><td>int64</td><td>ID of mission/badge</td></tr><tr><td>internal_name</td><td>string</td><td>Name of mission/badge</td></tr></tbody></table>

### dwh\_ext\_xxx.dm\_ach\_points\_change\_source <a href="#table-smarticocdm_resource" id="table-smarticocdm_resource"></a>

Lookup table with possible sources for the points transactions.

<table><thead><tr><th width="224">Column</th><th width="132">Type</th><th>Note</th></tr></thead><tbody><tr><td>source_type_id</td><td>int64</td><td>ID</td></tr><tr><td>source_type_name</td><td>string</td><td>Name of the source</td></tr></tbody></table>

## CRM fact tables

<figure><img src="/files/dHtP8Bbyd8Qv5U7Z21bY" alt=""><figcaption><p>Communication &#x26; Campaigns (engagements) fact tables and relation between them</p></figcaption></figure>

### dwh\_ext\_xxx.j\_communication <a href="#table-smarticocj_communication" id="table-smarticocj_communication"></a>

Includes all communications sent to the end-user and all possible outcomes (fact\_type\_id) of these communications, like failure of delivery, impressions, clicks, etc.

Ideas to understand communications concept

* the communication is always triggered by some "activity" from the campaign. This activity is referenced by engagement\_id or engagement\_uid. Such "activity" is part of the campaign, where the starting point of the campaign (the fact that the user entered into the campaign) is presented by the root\_engagement\_uuid & root\_engagement\_create\_date. The campaign definition itself if presented by root\_audience\_id
* The first fact that happens for communication is CREATE, after that, it is usually either SENT and DELIVERED or FAILED. It can be later seen by the end-user (IMPRESSION) and CLICKED. Users can also OPTOUT from the communication by following the "opt-out" link in the resource. For Inbox type of message, the user can also READ it and DELETE
* The communication is always connected to some resource (template), presented as resource\_id, and in case the resource has variations, by the resource\_variation\_id
* Note that smartico is tracking the fact of any communication only 30 days after it's created

**Partitioned by: fact\_date**

**Data availability: last 180 days by fact\_date**

Table structure:

<table><thead><tr><th width="224">Column</th><th width="132">Type</th><th>Note</th></tr></thead><tbody><tr><td>fact_date</td><td>timestamp</td><td>The date/time when fact happened</td></tr><tr><td>user_id</td><td>int64</td><td></td></tr><tr><td>fact_type_id</td><td>int64</td><td>CREATE = 1<br>SENT = 2<br>IMPRESSION = 3<br>CLICK = 4<br>FAIL = 5<br>DELIVERED = 6<br>OPTOUT = 7<br>READ = 8 (for Inbox only)<br>DELETE = 9 (for Inbox only)</td></tr><tr><td>fact_details</td><td>string</td><td>for CLICK fact - the link that was clicked<br>for FAIL - the detailed reason for fail</td></tr><tr><td>activity_type_id</td><td>int64</td><td>Type of activity that triggered that communication (FK: smarticoC.dm_activity_type)</td></tr><tr><td>engagement_uid</td><td>uuid</td><td>Reference to the smarticoC.j_engagements, represents the point in the Campaign/Flow from where the communication was triggered</td></tr><tr><td>resource_id</td><td>int64</td><td>FK: smarticoC.dm_resource, main resource</td></tr><tr><td>resource_variation_id</td><td>int64</td><td>FK: smarticoC.dm_resource, specific variation of the resource</td></tr><tr><td>root_audience_id</td><td>int64</td><td>FK: smarticoC.dm_audience, the campaign that triggered communication</td></tr><tr><td>root_engagement_uuid</td><td>uuid/string</td><td>Reference to the smarticoC.j_engagements, represents the "root" point in the Campaign/Flow from where the communication was triggered</td></tr><tr><td>root_engagement_create_date</td><td>timestamp</td><td>The date when the campaign that triggered specific communication is started for the user</td></tr><tr><td>engagement_id</td><td>int64</td><td>Similar to engagement_uid, but in the int64 presentation</td></tr><tr><td>communication_id</td><td>int64</td><td>The unique ID of the communication will be the same for all facts through which the communication is transitioning.  The ID id generated on the fact CREATE(1)</td></tr><tr><td>audience_category_id</td><td>int64</td><td>Category of the campaign.<br>1 - Marketing<br>2 - Operational</td></tr><tr><td>external_id</td><td>string</td><td>ID of the communication assigned by the external gateway at the moment when the external gateway accepted communication from Smartico</td></tr><tr><td>planned_send_date</td><td>timestamp</td><td>In some setups, the campaign that creates communication is scheduling delivery for a later time, in such case planned_send_date will reflect this date </td></tr><tr><td>fail_reason_id</td><td>int64</td><td>ID of the fail reason, FK: smarticoC.dm_com_fail_reason</td></tr><tr><td>user_ext_id</td><td>string</td><td>The ID of the user in the operator system</td></tr></tbody></table>

### dwh\_ext\_xxx.j\_engagements

The table represents the steps of how the user enters the campaign, exits it, converts, and goes through all the "activities" of the Flow defined in the campaign.

The table represents the recursive structure of user transition over the campaign graph.

For most of the use cases, you will need to analyze only records that have activity\_type\_id as one of

* 1 - represents the start of the campaign
* 2 - stop the campaign
* 3 - conversion of campaign

The table structure below describes only essential fields that could be used to analyze the campaign performance. If you need a deeper analysis, please contact <support@smartico.ai> for an explanation.

**Partitioned by: create\_date**

**Data availability: last 180 days by create\_date**

Table structure:

<table><thead><tr><th width="224">Column</th><th width="132">Type</th><th>Note</th></tr></thead><tbody><tr><td>engagement_id</td><td>int64</td><td>unique id</td></tr><tr><td>engagement_uid</td><td>uuid/string</td><td>same as unique id, but uuid presentation</td></tr><tr><td>root_audience_id</td><td>int64</td><td>FK: smarticoC.dm_audience</td></tr><tr><td>create_date</td><td>timestamp</td><td>time when user got this activity</td></tr><tr><td>user_id</td><td>int64</td><td></td></tr><tr><td>root_engagement_id</td><td>int64</td><td>reference to the same table pointing to the starting point of this campaign for the specific user</td></tr><tr><td>root_engagement_create_date</td><td>timestamp</td><td>The date when the campaign is started for a specific user</td></tr><tr><td>activity_type_id</td><td>int64</td><td>FK: smarticoC.dm_activity_type</td></tr><tr><td>user_ext_id</td><td>string</td><td>The ID of the user in the operator system</td></tr><tr><td>resource_id</td><td>int64</td><td>ID of the resource if the activity is related to the communication. Important that the field is populated retrospectively every night at 3 UTC</td></tr></tbody></table>

Example of query that returns&#x20;

* data of label 2305
* users entered campaign 793485 after 20/11/2023
* and converted in this campaign

```sql
   WITH SRC AS (
        SELECT 
            engagement_id,
            root_audience_id,
            root_engagement_id,
            root_engagement_create_date,
            user_id,
            engagement_uid,
            activity_type_id,
            create_date,
            from_control_group,
            activity_id
        FROM dwh_ext_xxx.j_engagements
        WHERE label_id = 2305 AND activity_type_id in (1,2,3)
            AND root_audience_id IN (793485)
            AND create_date >= '2023-11-20T00:00:00Z'
            AND root_engagement_create_date >= '2023-11-20T00:00:00Z'
            -- for the performance, you can limit upper date only with 30 days from the 'to', this is MAX duration of campaign
            AND create_date <= TIMESTAMP_ADD('2023-11-21T23:59:59Z', INTERVAL 31 DAY)
            AND root_engagement_create_date <= '2023-11-21T23:59:59Z'
    ),
    PREFINAL as (
        SELECT 
            SRC.root_audience_id, SRC.root_engagement_id, user_id,
            MAX(CASE WHEN SRC.activity_type_id = 1 then SRC.engagement_uid else null end) as root_engagement_uid,
            MAX(CASE WHEN SRC.activity_type_id = 1 then SRC.create_date ELSE NULL END) as start_date,
            MAX(CASE WHEN SRC.activity_type_id = 3 then SRC.create_date ELSE NULL END) as convert_date,
            MAX(CASE WHEN SRC.activity_type_id = 2 then SRC.create_date ELSE NULL END) as stop_date,   
            MAX(CASE WHEN SRC.activity_type_id = 1 THEN SRC.from_control_group ELSE NULL END) as from_control_group
        FROM SRC
        WHERE true
        GROUP BY SRC.root_audience_id, SRC.root_engagement_id, SRC.user_id
    ),
    FINAL as (
        SELECT 
            COUNT(*) OVER() AS cnt, ROW_NUMBER() OVER(ORDER BY PREFINAL.start_date DESC) AS row,
            PREFINAL.*,
        FROM PREFINAL
        WHERE true AND convert_date IS NOT NULL
    )
    SELECT *
    FROM FINAL
```

## Gamification fact tables

### dwh\_ext\_xxx.g\_ach\_completed

The table represents history of all completed missions & badges

**Partitioned by: create\_date**

**Data availability: last 180 days by create\_date**

<table><thead><tr><th width="224">Column</th><th width="132">Type</th><th>Note</th></tr></thead><tbody><tr><td>ach_id</td><td>INT64</td><td>ID of the mission/badge</td></tr><tr><td>create_date</td><td>TIMESTAMP</td><td>The time when mission/badge compelted</td></tr><tr><td>is_recurring</td><td>BOOLEAN</td><td>Indicate if the original mission was recurring</td></tr><tr><td>user_ext_id</td><td>STRING</td><td>External ID of the user</td></tr><tr><td>crm_brand_id</td><td>INT64</td><td>ID of the brand in the Smartico system</td></tr></tbody></table>

Example of query returning number of users completed missions in the last 3 days

```sql
select A.internal_name, A.ach_id, count(*) as completed_users_count
from dwh_ext_xxx.g_ach_completed M 
inner join dwh_ext_xxx.dm_ach A ON M.ach_id = A.ach_id 
WHERE M.create_date > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 3 DAY)
GROUP BY A.internal_name, A.ach_id
```

### dwh\_ext\_xxx.g\_ach\_points\_change\_log

The table represents all changes in the gamification points

**Partitioned by: create\_date**

**Data availability: last 180 days by create\_date**

<table><thead><tr><th width="224">Column</th><th width="132">Type</th><th>Note</th></tr></thead><tbody><tr><td>create_date</td><td>TIMESTAMP</td><td>The time of transaction</td></tr><tr><td>source_type_id</td><td>INT64</td><td>Type of of the source of the change, refer to dm_ach_points_change_source</td></tr><tr><td>source_entity_id</td><td>INT64</td><td>Id of the entity that triggered transaction, the lookup table depends on the source_type_id</td></tr><tr><td>points_collected</td><td>INT64</td><td>Amount of points in transaction, could be negative in case of deduction</td></tr><tr><td>user_points_ever</td><td>INT64</td><td>Amount of points user collected ever after the transaction</td></tr><tr><td>user_points_balance</td><td>INT64</td><td>Current points balance after transaction</td></tr><tr><td>user_ext_id</td><td>STRING</td><td>External ID of the user</td></tr><tr><td>crm_brand_id</td><td>INT64</td><td>ID of the brand in the Smartico system</td></tr></tbody></table>

Example of query showing the source of the points credit and deposit

```sql
select S.source_type_name, 
	sum( CASE WHEN points_collected > 0 THEN points_collected ELSE 0 END) as points_credit,
	sum( CASE WHEN points_collected < 0 THEN -1 * points_collected ELSE 0 END) as points_debit
from dwh_ext_12078.g_ach_points_change_log L 
inner join dwh_ext_12078.dm_ach_points_change_source S ON L.source_type_id = S.source_type_id 
WHERE L.create_date  > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 3 DAY)
group by S.source_type_name
```

## Affiliation dimensional & fact tables

Example of query that calculates total commission with split by components (CPA, CPL, RevShare) for all affiliates lifetime.

```sql
WITH X as (
  SELECT 
    A.affiliate_id, A.parent_affiliate_id,
    COALESCE(SUM(cpa_fee),0) as cpa_fee,
    COALESCE(SUM(cpl_fee),0) as cpl_fee,
    COALESCE(SUM(rev_share_fee),0) as rev_share_fee,
    COALESCE(SUM(sub_affiliate_fee),0) as sub_affiliate_fee
  FROM dwh_ext_XXX.aff_fin_history_daily F
  INNER JOIN dwh_ext_XXX.dm_affiliate A ON F.affiliate_id = A.affiliate_id
  WHERE operation_date > '2000-01-01 00:00:00 UTC'
  GROUP BY A.affiliate_id, A.parent_affiliate_id
),
SUB_SUMS as (
  SELECT parent_affiliate_id, SUM(X.sub_affiliate_fee) as sub_affiliate_fee
  FROM X
  GROUP BY parent_affiliate_id
  HAVING SUM(X.sub_affiliate_fee) > 0
),
FINAL as (
  select TOP_AFFS.* EXCEPT(sub_affiliate_fee, parent_affiliate_id), SUB_SUMS.sub_affiliate_fee, 
    TOP_AFFS.cpa_fee + TOP_AFFS.rev_share_fee + TOP_AFFS.cpl_fee + COALESCE(SUB_SUMS.sub_affiliate_fee,0) as total_commission
  from X as TOP_AFFS
  left join SUB_SUMS ON TOP_AFFS.affiliate_id = SUB_SUMS.parent_affiliate_id
)
SELECT * 
FROM FINAL
ORDER BY total_commission DESC


```

<figure><img src="/files/vB5J9eTtTdZ2Pr3kr9dn" alt=""><figcaption><p>Example of the query results</p></figcaption></figure>

Notes:&#x20;

1. Provided example of calculating of sub\_affiliate\_fee should be used only in the setup with one level of sub-affiliates network (when only direct parent of affiliate is getting attribution from the comission generated by his children). In case of multi-level affiliation network setup, the sub-affailites commissions should be calculated in the different way
2. These calculations are not taking into accounts adjustments that can be done in 3 differences cases:
   1. Manual adjustments
   2. Adjustments based on the "Tiers" based deal structures (RevShare tiers)
   3. Adjustment done beginning of the month based on "No negative cary over" calculations

### dwh\_ext\_xxx.dm\_affiliate

The table represent information about affiliate

<table><thead><tr><th width="224">Column</th><th width="132">Type</th><th>Note</th></tr></thead><tbody><tr><td>affiliate_id</td><td>INT64</td><td>ID of the affiliate</td></tr><tr><td>create_date</td><td>TIMESTAMP</td><td>Date when affiliate is created</td></tr><tr><td>parent_affiliate_id</td><td>INT64</td><td>ID of the parent affiliate</td></tr><tr><td>manager_id</td><td>INT64</td><td>ID of the manager</td></tr><tr><td>payment_method_id</td><td>INT64</td><td>ID of payment method</td></tr><tr><td>ext_affiliate_id</td><td>STRING</td><td>ID of affiliate in the external system</td></tr></tbody></table>

### dwh\_ext\_xxx.aff\_fin\_history\_daily

The table represents daily aggregation of all financial data system collects about a player. Each column refers to some type of financial data and reflect the SUM of all numbers affilaite system have received for some "operation\_date". To understand the final numbers for a player, you should SUM all records for a player, for example if you need to know the total deposit amount of a player, you should SUM all "deposits" + "first\_deposit" columns for a player.&#x20;

**Partitioned by: operation\_date**

Table structure:

<table><thead><tr><th width="215">Column</th><th width="133.33333333333331">Type</th><th>Note</th></tr></thead><tbody><tr><td>ext_customer_id</td><td>STRING</td><td>Player ID the way it's reported to affiliate system</td></tr><tr><td>brand_id</td><td>INTEGER</td><td>Brand ID in affiliate system</td></tr><tr><td>country</td><td>STRING</td><td>ISO2 country code of the player</td></tr><tr><td>affiliate_id</td><td>INTEGER</td><td>Affiliate ID in case player belongs to one</td></tr><tr><td>registration_id</td><td>INTEGER</td><td>TAP registration id</td></tr><tr><td>operation_date</td><td>TIMESTAMP</td><td>financial day</td></tr><tr><td>update_date</td><td>TIMESTAMP</td><td>last update date</td></tr><tr><td>first_deposit</td><td>NUMERIC</td><td>first deposit amount</td></tr><tr><td>deposits</td><td>NUMERIC</td><td>deposits amount reported during operation_date</td></tr><tr><td>withdrawals</td><td>NUMERIC</td><td>withdrawals amount reported during operation_date</td></tr><tr><td>chargebacks</td><td>NUMERIC</td><td>chargebacks amount reported during operation_date</td></tr><tr><td>bonuses</td><td>NUMERIC</td><td>bonuses amount reported during operation_date</td></tr><tr><td>deductions</td><td>NUMERIC</td><td>deductions amount reported during operation_date</td></tr><tr><td>bets</td><td>NUMERIC</td><td>bets amount reported during operation_date</td></tr><tr><td>wins</td><td>NUMERIC</td><td>wins amount reported during operation_date</td></tr><tr><td>operations</td><td>NUMERIC</td><td>operations count reported during operation_date</td></tr><tr><td>ggr</td><td>NUMERIC</td><td>GGR amount reported during operation_date</td></tr><tr><td>cpl_fee</td><td>NUMERIC</td><td>CPL fee generated during operation_date</td></tr><tr><td>cpa_fee</td><td>NUMERIC</td><td>CPA fee generated during operation_date</td></tr><tr><td>rev_share_fee</td><td>NUMERIC</td><td>RevShare fee generated during operation_date</td></tr><tr><td>sub_affiliate_fee</td><td>NUMERIC</td><td>Total sub affiliate fee generated during operation_date for <strong>all affiliate levels</strong></td></tr><tr><td>net_pnl</td><td>NUMERIC</td><td>NET PnL calculated during operation_date</td></tr><tr><td>deposit_count</td><td>NUMERIC</td><td>deposits count reported during operation_date</td></tr><tr><td>withdrawal_count</td><td>NUMERIC</td><td>withdrawals count reported during operation_date</td></tr><tr><td>q_cpa</td><td>NUMERIC</td><td>if value is 1, it means that at this operation_date player was qualified for CPA</td></tr><tr><td>q_cpl</td><td>NUMERIC</td><td>if value is 1, it means that at this operation_date player was qualified for CPL</td></tr></tbody></table>

## FAQ

#### Can't access DWH from Power BI, error: Unable to authenticate with Google BigQuery Storage API. Check your account permissions.

You need to disable the usage of Storage API in Power BI settings following this guide - <https://learn.microsoft.com/en-us/power-query/connectors/google-bigquery>.&#x20;

> "Use Storage API" - A flag that enables using the Storage API of Google BigQuery. This option is true by default. This option can be set to false to not use the Storage API and use REST APIs instead.

#### Can't access table dwh\_ext\_xxx.j\_engagements and any other

You need to replace "xxx" value with the label\_id specific for your setup. You can also see the label\_id in the name of BigQuery user that has the format like "dwh\_ext\_1234"

**Can different labels and brands are access with the samw DWH account**

Every label requires separate account to access DWH.

You could have many brands under one label setup, data related to all these labels will be availalbe from the repsective DWH account for this label


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://help.smartico.ai/welcome/~/changes/B87CND5z8x8BAzJwmSVQ/technical-guides/access-to-dwh.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
