Smartico Data Warehouse

Access to Smartico DWH is a paid service that should be requested separately.

Please contact your Customer Success Manager to get more details.

Intro

Smartico provides access to the Data Warehouse (DWH) with raw data related to the system operation. The DWH is based on Google BigQuery and exposes fact and dimension tables.

There are three main domains of data exposed through the DWH:

  • CRM - information about campaigns, communication, conversions, automation rules, etc.

  • Gamification - missions, levels, tournaments, mini-games, points, store purchases

  • Affiliation - registration facts, financial transactions, payments, adjustments, affiliates information

Some of these domains share information, e.g. bonuses can be given from both the CRM and Gamification. In the Affiliation domain you will also find CRM data, so you can analyse all communication that is addressed to the affiliates.

Smartico DWH shouldn't be used to run analytical queries directly, as limitations apply to how much data can be retrieved daily.

Instead, you should do a batch load of changes to your analytical system and run reports on your side.

All fact tables are partitioned by date for your convenience, so you can efficiently load only the data delta for the completed day. We recommend running the import of the previous day a few hours after midnight UTC to ensure that the previous day's data has been fully delivered to the fact tables.

Don't apply functions to the partition columns when filtering with a WHERE clause. A correct example of taking data for the last day is 'WHERE fact_date >= TIMESTAMP_TRUNC(TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY), DAY)'.
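A minimal sketch of such a daily delta query is shown below. The dataset name dwh_ext_XXX and the fact table j_engagements are placeholders, and the table is assumed to expose the fact_date partition column described above; use the dataset and fact tables from your own setup.

-- Take the full previous UTC day from a fact table,
-- filtering directly on the partition column without wrapping it in a function.
SELECT *
FROM dwh_ext_XXX.j_engagements
WHERE fact_date >= TIMESTAMP_TRUNC(TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY), DAY)
  AND fact_date < TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), DAY)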

Pay attention that the dataset 'dwh_ext_xxx' is given as an example. The dataset name for your particular setup is the same as the name of the user you will use to access the DWH.

Records in the fact tables are immutable (except where stated explicitly), which means they do not change after they are added.

Fact tables are updated in real-time, and dimensional tables are updated once per hour.

For the details of the fact and dimension tables, please use the following links:

  • Smartico DWH \ Affiliation views

  • Smartico DWH \ CRM views

  • Smartico DWH \ Gamification views

Segments export

You can query DWH to get a near-real-time list of users that belong to specific segments.

To make a segment available for export, it first needs to be marked as exportable through the DWH in the Smartico BackOffice, as shown on the screen below.

After that, segment content can be queried using the following SQL, where XXX needs to be replaced with the ID of your label and YYY with the ID of the segment.


select * from dwh_ext_XXX.fn_export_segment_YYY()

Example of query results

Query for user profile details

You can query DWH to get detailed information about users stored on the Smartico side.


SELECT user_id, user_ext_id, core_registration_date, core_wallet_currency, core_user_language, core_tags, core_public_tags, core_external_markers
FROM dwh_ext_2283.j_user
WHERE ARRAY_LENGTH(core_tags) > 0 AND core_registration_date IS NOT NULL

Note that only a small subset of user profile properties is exposed through the DWH, and only properties that are shared across all clients. If you need to expose a specific property, please contact Smartico support.

Currently exposed properties:

Property | Type | Meaning
user_id | INT64 | ID of the user in the Smartico system
user_ext_id | STRING | ID of the user in the integrated platform
core_registration_date | TIMESTAMP | Date/time of registration
core_user_last_time_online | TIMESTAMP | Date/time when the user was last online
ach_level_current_id | INT64 | ID of the current gamification level
ach_points_ever | INT64 | Amount of points the user has ever collected
ach_points_balance | INT64 | Current points balance of the user
user_country | STRING | Country
core_user_language | STRING | Language, ISO code
core_user_last_device_type | STRING | Last used device type (Desktop, Mobile, Native, Wrapper)
core_wallet_currency | STRING | Currency code
core_tags | STRING[] | Array of user markers
core_public_tags | STRING[] |
core_external_markers | STRING[] |

Transactions deduplication

Note that in rare cases some fact tables may contain duplicated records. This happens only in very exceptional cases and doesn't impact the operational part of the system; for example, if there is a duplicated record for a "mail sent" fact, the mail wasn't sent twice, only the fact of sending is duplicated.

You can deduplicate records using the field marked as "PK" in the corresponding table; see the sketch after the table below.

Column | Type | Note
fact_date | TIMESTAMP | The time of the transaction
level_change_id | INT64 | Unique ID of the level change fact
from_level_id | INT64 | ID of the level before the transaction
to_level_id | INT64 | ID of the level after the transaction
points_change_source_id | INT64 | If the level change is triggered by a points balance change, this field represents the source of the points change. Possible IDs are listed below the table.
points_collected_on_level_change | INT64 | If the level change is triggered by a points balance change, this field represents the amount of points added to the user at the moment of the upgrade
user_points_ever | INT64 | "Points ever" balance of the user after the level change
user_ext_id | STRING | External ID of the user
crm_brand_id | INT64 | ID of the brand in the Smartico system

Possible IDs for points_change_source_id:

    Journey = 1,
    AchievementTaskCompletion = 2,
    AchievementCompletion = 3,
    LevelsStructureChange = 4,
    StorePurchase = 5,
    ManualAdjustment = 6,
    Leaderboard = 7,
    Tournament = 11,
    AutomationRule = 12,
    TournamentRegistration = 13,
    TournamentRegistrationCancellation = 14,
    RefundPoints = 15,
    PlayMiniGame = 16,
    WinMiniGame = 17
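A minimal deduplication sketch for the table above is shown below, assuming that level_change_id is the field marked as "PK" and that the table is exposed under a name like j_level_change (hypothetical; check the Gamification views documentation for the actual table name in your dataset).

-- Keep a single row per PK value (level_change_id is assumed to be the PK here).
-- dwh_ext_XXX and j_level_change are placeholders for your dataset and the actual table name.
SELECT *
FROM dwh_ext_XXX.j_level_change
WHERE fact_date >= TIMESTAMP_TRUNC(TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY), DAY)
QUALIFY ROW_NUMBER() OVER (PARTITION BY level_change_id ORDER BY fact_date) = 1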

How to connect to Google BigQuery

Smartico leverages Google BigQuery to offer robust and scalable data analytics capabilities. As part of our service, we provide a secure JSON key file that enables programmatic access to our DWH. This section outlines how to use the provided JSON key to connect to BigQuery, depending on the Business Intelligence (BI) tool you choose.

Overview of JSON Key Usage

A JSON key file is used as a credential for authentication. It allows secure access to Google BigQuery without needing user-specific Google credentials. The JSON key file you receive will be associated with a specific service account created and managed by our company, which has been granted access to necessary BigQuery resources.

Important Security Note

Please store your JSON key file securely and ensure it is only accessible to those who require it. Do not share the JSON key publicly or with unauthorized users, as it provides direct access to our Data Warehouse.

Connecting to BigQuery Using Various BI Tools

Different BI tools have varied methods for setting up connections to BigQuery using a JSON key. Below, we offer guidance on connecting some popular BI tools to our DWH. For detailed instructions, refer to the specific documentation for the BI tool you are using.

Note: you shouldn't run analytical queries directly on Smartico DWH, as it may incur high costs. The proper way is to build a pipeline that loads daily data into your internal DWH and to build reports on top of it.

Google Data Studio

  • Connection Type: Direct integration with Google services.

  • Key Usage: Typically, Data Studio uses OAuth 2.0 for authentication, which does not require a service account JSON key if using a Google account with access rights.

Tableau

  • Connection Type: Direct connection using a service account.

  • Key Usage: Upload your JSON key file during the setup of the BigQuery connection in Tableau.

Looker

  • Connection Type: Direct connection using a service account.

  • Key Usage: Paste the contents of the JSON key file into the appropriate field when configuring the BigQuery connection in Looker.

Microsoft Power BI

  • Connection Type: Connect using ODBC drivers.

  • Key Usage: Use the Simba ODBC Driver for Google BigQuery and specify the path to your JSON key file in the DSN configuration.

Qlik Sense

  • Connection Type: Connect using ODBC drivers.

  • Key Usage: Configure a System DSN using the ODBC Data Source Administrator on Windows, including the JSON key file.

FAQ

How to analyze DWH usage

You can find 2 reports related to DWH usage in the Reports \ Analytics section of Smartico BackOffice.

DWH: requests count & data read for the current month - shows the requests count and the amount of data read per day.

DWH: queries log for last 24 hours - shows the exact queries executed in the last 24 hours and the amount of data read by each.

Can't access DWH from Power BI, error: Unable to authenticate with Google BigQuery Storage API. Check your account permissions.

You need to disable the usage of Storage API in Power BI settings following this guide - https://learn.microsoft.com/en-us/power-query/connectors/google-bigquery.

"Use Storage API" - A flag that enables using the Storage API of Google BigQuery. This option is true by default. This option can be set to false to not use the Storage API and use REST APIs instead.

How to authenticate in Power BI with the service account

Use the following guide from Microsoft - https://learn.microsoft.com/en-us/power-query/connectors/google-bigquery

  • Service Account Email: must be in email format

  • Service Account JSON key file contents: once this JSON key is downloaded, all new lines must be removed from the file so that the contents are in one line. Once the JSON file is in that format, the contents can be pasted into this field.

Pay attention that the content of the JSON file needs to be manually formatted into one line.

Can't access table dwh_ext_xxx.j_engagements or any other table

You need to replace the "xxx" value with the label_id specific to your setup. You can also see the label_id in the name of the BigQuery user, which has the format "dwh_ext_1234".

Can different labels and brands have access using the same DWH account?

Every label requires a separate account to access DWH.

You can have many brands under one label. Data related to all these brands will be available from the respective DWH account for that label.

The documentation says not to use the DW directly for analytical queries, as it may involve costs. Is the best way to carry out a daily data load to an internal DW?

The best approach is to make a daily incremental load from Smartico DWH to your DWH.

Dimensional tables (prefixed as “dm”) are quite small, and you can reload them fully every day.

The other tables (fact tables) are partitioned by date/time, so you can run a job every day at 1 AM UTC, get fresh data for the previous day, and load it into your DWH.
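A rough sketch of such a daily job is shown below; dm_xxx and j_xxx are placeholders for whichever dimension and fact tables your reports need, and fact_date is assumed to be the partition column as described above.

-- 1) Reload the small dimension tables in full.
SELECT * FROM dwh_ext_XXX.dm_xxx;

-- 2) Load only the previous UTC day from each fact table, filtering on the partition column.
SELECT *
FROM dwh_ext_XXX.j_xxx
WHERE fact_date >= TIMESTAMP_TRUNC(TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY), DAY)
  AND fact_date < TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), DAY);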

When using a BI tool such as Power BI in the import model, can I connect directly to the DWH? Is there any financial or performance impact?

If you use Power BI or any other tool to query data directly from Smartico DWH, this may incur a high load and additional costs, depending on how complex your reports are. The best way is to load data from Smartico DWH into your own DWH and connect Power BI to your DWH.

Regarding the "import" mode in Power BI - according to the Microsoft documentation, it can work in two ways: "Full refresh" and "Incremental refresh". Based on the names, the better option is "Incremental refresh" in order to avoid high resource usage, but for the exact setup please consult a Power BI expert.

Is there a limit on the number of queries that does not generate this extra cost?

There is a limit of 1 TB of scanned data per month (this reflects the way Google builds pricing for BigQuery usage). There is no limit on the number of queries.

Reading the documentation, there are some details we'd like to know: Is the data in the DWH always available for a window of the last 180 days?

Yes, fact tables are limited to the last 180 days. If you do an incremental/daily load of the data, you can store it on your side for an unlimited period of time.

Can we get data from all the campaigns within this DWH? Information related to the segmentations we used, results by channel (sms, email, push, etc), value assigned to the active group and control group, CTR, conversion rate, etc?

You can get results from communication channels like SMS, email, push, etc.

The fact table for communication channels, j_communication, includes all fails, clicks, and impressions, so you can build derivative metrics like CTR, delivery rate, fail rate, etc.
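A rough sketch of one such derivative metric is shown below. The event_type column and its values are hypothetical placeholders; check the Smartico DWH \ CRM views documentation for the actual j_communication schema.

-- Daily CTR over the last 7 days; event_type and its values are hypothetical placeholders.
SELECT
  TIMESTAMP_TRUNC(fact_date, DAY) AS day,
  COUNTIF(event_type = 'click') / NULLIF(COUNTIF(event_type = 'impression'), 0) AS ctr
FROM dwh_ext_XXX.j_communication
WHERE fact_date >= TIMESTAMP_TRUNC(TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY), DAY)
GROUP BY day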

The j_engagements fact table represents information about campaigns; you can use it to build conversion rates for campaigns.

Segment exporting is available through the table function fn_export_segment; you can find the details in the documentation above.
