◾Smartico Data Warеhouse
Access to Smartico DWH is a paid service that should be requested separately.
Please contact your Customer Success Manager to get more details.
Intro
Smartico provides access to the Data Warehouse (DWH) with row data related to the system operation. The DWH is based on Google BigQuery and exposes facts and dimensions tables.
There are mainly three main domains of data exposed through the DWH:
CRM - information about campaigns, communication, conversions, automation rules, etc
Gamification - missions, levels, tournaments, mini-games, points, store purchases
Affiliation - registration facts, financial transactions, payments, adjustments, affiliates information
Some of these domains have shared information, e.g. bonuses - can be given from the CRM and from Gamification. In the Affiliation domain you will also find CRM data, so you can analyse all communication that is addressed to the affiliates.
Smartico DWH shouldn't be used to run analytical queries directly, as limitations apply to how much data can be retrieved daily.
Instead, you should do a batch load of changes to your analytical system and run reports on your side.
All fact tables are partitioned by date for your convenience so you can efficiently load only the data delta for the completed day. We recommend running the import of the previous day a few hours after midnight in UTC to ensure that last-day data is fully delivered to the fact tables.
Don't apply functions on the partition columns when filtering with 'WHERE' clause. The correct example of taking data for the last day is - 'WHERE fact_date >= TIMESTAMP_TRUNC(TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY), DAY)'
Pay attention to the fact that the data set 'dwh_ext_xxx' is given as an example. The dataset name for your particular setup is the same as the name of the user to whom you will get to access DWH.
Records in the fact tables are immutable (except cases when stated explicitly), which means they are not changing after they are added.
Fact tables are updated in real-time, and dimensional tables are updated once per hour.
For the details of fact & dimensions tables please use following links:
◾Smartico DWH \ Affiliation views◾Smartico DWH \ CRM views◾Smartico DWH \ Gamification viewsSegments export
You can query DWH to get a near-real-time list of users that belong to specific segments.
To make a segment available for exporting, it first needs to be marked as exportable through the DWH in Smartico BackOffice, as shown on the screen below.
After that, segment content can be queried using the following SQL, where XXX needs to be replaced with the ID of your label and YYY with the ID of the segment.
Example of query results
Query for user profile details
You can query DWH to get detailed information about users stored on the Smartico side.
Note that only a small subset of user profile properties are exposed through the DWH, and only properties that are shared through all the clients. If you have a need to expose specific properly, please contact Smartico support.
Currently exposed properties:
user_id
INT64
ID of the user in Smartico system
user_ext_id
STRING
ID of the user in the integrated platform
core_registration_date
TIMESTAMP
Date/time of registration
core_user_last_time_online
TIMESTAMP
Date/time when user was last time online
ach_level_current_id
INT64
ID of the current gamification level
ach_points_ever
INT64
Amount of points user collected ever
ach_points_balance
INT64
Current points balance of user
user_country
STRING
Country
core_user_language
STRING
Language, ISO code
core_user_last_device_type
STRING
Last used device type (Desktop, Mobile, Native, Wrapper)
core_wallet_currency
STRING
Currency code
Transactions deduplication
Note that in rare cases some fact tables may contain duplicated records. This may happen in very exceptional cases and doesn't have impact on operation part of system, so for example if there is duplicated record for the "mail sent" fact, the mail wasn't sent twice, just a fact of sending is duplicated.
You can do a deduplication using the field marked as "PK" in corresponding table.
fact_date
TIMESTAMP
The time of transaction
level_change_id
INT64
Unique ID of the level change fact
from_level_id
INT64
ID of the level before transaction
to_level_id
INT64
ID of the level after transaction
points_change_source_id
INT64
If level change is triggered by points balance change, then this field represents the source of points change. Possible IDs:
points_collected_on_level_change
INT64
If level change is triggered by points balance change, then this field represents the amount of points added to the user at the moment of upgrade
user_points_ever
INT64
"Points ever" balance of user after level hcange
user_ext_id
STRING
External ID of the user
crm_brand_id
INT64
ID of the brand in the Smartico system
How to connect to the Google BigQuery SQL
Smartico leverages Google BigQuery to offer robust and scalable data analytics capabilities. As part of our service, we provide a secure JSON key file that enables programmatic access to our DWH. This section outlines how to use the provided JSON key to connect to BigQuery, depending on the Business Intelligence (BI) tool you choose.
Overview of JSON Key Usage
A JSON key file is used as a credential for authentication. It allows secure access to Google BigQuery without needing user-specific Google credentials. The JSON key file you receive will be associated with a specific service account created and managed by our company, which has been granted access to necessary BigQuery resources.
Important Security Note
Please store your JSON key file securely and ensure it is only accessible to those who require it. Do not share the JSON key publicly or with unauthorized users, as it provides direct access to our Data Warehouse.
Connecting to BigQuery Using Various BI Tools
Different BI tools have varied methods for setting up connections to BigQuery using a JSON key. Below, we offer guidance on connecting some popular BI tools to our DWH. For detailed instructions, refer to the specific documentation for the BI tool you are using.
Note: you shouldn't run analytical queries directly on Smartico DWH, as it may incur high costs. The proper way is to build a pipeline that loads daily data in your internal DWH and build reports on top of it.
Google Data Studio
Connection Type: Direct integration with Google services.
Key Usage: Typically, Data Studio uses OAuth 2.0 for authentication, which does not require a service account JSON key if using a Google account with access rights.
Tableau
Connection Type: Direct connection using a service account.
Key Usage: Upload your JSON key file during the setup of the BigQuery connection in Tableau.
Looker
Connection Type: Direct connection using a service account.
Key Usage: Paste the contents of the JSON key file into the appropriate field when configuring the BigQuery connection in Looker.
Microsoft Power BI
Connection Type: Connect using ODBC drivers.
Key Usage: Use the Simba ODBC Driver for Google BigQuery and specify the path to your JSON key file in the DSN configuration.
Qlik Sense
Connection Type: Connect using ODBC drivers.
Key Usage: Configure a System DSN using the ODBC Data Source Administrator on Windows, including the JSON key file.
FAQ
How to analyze DWH usage
You can find 2 reports related to DWH usage in the Reports \ Analytics section of Smartico BackOffice.
DWH: requests count & data read for the current month - is showing the requests count and amount of data read per day
DWH: queries log for last 24 hours - will show you exact requests completed in the last 24 hours and the amount of data read for each
Can't access DWH from Power BI, error: Unable to authenticate with Google BigQuery Storage API. Check your account permissions.
You need to disable the usage of Storage API in Power BI settings following this guide - https://learn.microsoft.com/en-us/power-query/connectors/google-bigquery.
"Use Storage API" - A flag that enables using the Storage API of Google BigQuery. This option is true by default. This option can be set to false to not use the Storage API and use REST APIs instead.
How to authenticate in Power BI with the service account
Use the following guide from Microsoft - https://learn.microsoft.com/en-us/power-query/connectors/google-bigquery
Service Account Email: must be in email format
Service Account JSON key file contents: once this JSON key is downloaded, all new lines must be removed from the file so that the contents are in one line. Once the JSON file is in that format, the contents can be pasted into this field.
Pay attention the content of JSON file need to be manually formatted in one line
Can't access table dwh_ext_xxx.j_engagements and any other
You need to replace "xxx" value with the label_id specific for your setup. You can also see the label_id in the name of the BigQuery user that has the format l"dwh_ext_1234"
Can different labels and brands have access using the same DWH account?
Every label requires a separate account to access DWH.
You could have many brands under one label setup. Data related to all these labels will be available from the respective DWH account for this label
The documentation says not to use the DW directly for analytical queries, as it may involve costs. Is the best way to carry out a daily data load to an internal DW?
The best approach is to make a daily incremental load from Smartico DWH to your DWH.
Dimensional tables (prefixed as “dm”) are quite small, and you can reload them fully every day.
Other tables (facts tables), are partitioned by date/time, so you can run every day at 1 AM UTC, and get fresh data for the previous day, and load it to your DWH
When using a BI tool such as Power BI in the import model, can I connect directly to the DWH? Is there any financial or performance impact?
If you use Power BI or any other tool to query data directly from Smartico DWH, this may incur a high load and additional costs depending on how complex reports you are doing. Best way is to load data from Smartico DWH to your DWH and connect Power BI to your DWH.
Regarding the "import" mode in Power BI - according to the Microsoft documentation, it can work in two ways - "Full refresh" and "Incremental refresh". Based on the names, the best way is to use "Incremental way" in order to avoid high resource usage, but for exact setup please consult with an expert in the Power BI tool.
Is there a limit on the number of queries that does not generate this extra cost?
There is a limit of 1 TB of scanned data per month (it reflects the way how Google is building pricing for BigQuery usage). There is no limit in number of queries.
Reading the documentation, there are some details we'd like to know: Is the data in the DWH always available for a window of the last 180 days?
Yes, fact tables are limited to last 180 days. If you do incremental/daily load of data finally, you can store unlimited in terms of time, data on your side.
Can we get data from all the campaigns within this DWH? Information related to the segmentations we used, results by channel (sms, email, push, etc), value assigned to the active group and control group, CTR, conversion rate, etc?
You can get results from communication channels like SMS, email, push, etc.
The fact table for communication channels - j_communication, includes all fails, clicks, and impressions, so you can build derivative metrics like CTR, Delivery rate, Fail rate, etc.
j_engagements fact table represents information about campaigns, you can build conversion rate for campaigns using it.
Segment exporting is available through the table-function fn_export_segment, you can find details in the documentation above.
Last updated