When you begin populating a newly created dimension, using whitelists only is recommended, because the volume of data generated by capturing observed values can become problematic.
Note: Even if you created a whitelist to normalize observed values, the entire set of observed values is still captured to the database. In Whitelist + Observed Values, the whitelist functions only to normalize values. It does not filter the data set that is written to the database. Enabling the capture of observed values allows these values to be recorded into the database without limit.
Unbounded dimensions can affect the Tealeaf solution in the following ways:
- Unchecked growth of the TL_REPORTS database
- Bottlenecks in the Data Service and Portal
- Degraded performance in the Event Manager, as well as the Portal and user client UI
Data integrity issues
Ideally, a dimension is configured to capture all meaningful values. In practice, however, this may not be possible. Whether you are capturing whitelisted values only or whitelist + observed values, there are data integrity issues to consider.
If you configured a Whitelist Only dimension, the whitelist is the set of all values. All values that are detected in the capture stream and do not appear in the whitelist are identified as [Others] values in the data set. You can still perform data analysis, but detailed analysis on the individual [Others] values is not possible.
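The Whitelist Only behavior described above can be sketched in a few lines of Python. This is an illustrative model only, not Tealeaf code: the function name `record_whitelist_only` and the sample browser values are assumptions made for the example.

```python
from collections import Counter

OTHERS = "[Others]"  # label used for values that are not in the whitelist


def record_whitelist_only(detected_values, whitelist):
    """Hypothetical model: keep whitelisted values, collapse the rest to [Others]."""
    return [v if v in whitelist else OTHERS for v in detected_values]


# Sample data invented for illustration.
whitelist = {"Chrome", "Firefox"}
detected = ["Chrome", "Safari", "Firefox", "Opera", "Chrome"]

recorded = record_whitelist_only(detected, whitelist)
counts = Counter(recorded)
print(counts)
```

In this sketch, Safari and Opera are both counted under [Others], so their combined total is available for analysis but the individual values can no longer be told apart.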
For Whitelist + Observed Values, data integrity is a bit more complicated. Suppose your dimension is configured to capture 1000 values per hour in a one-Canister environment, and you are interested in two values: Value A and Value B. Neither value is stored in the whitelist.
Recorded values are indicated below. The Detected columns indicate the number of values that were captured for the dimension before the value instance was detected; if the number of values is over 1000, then by default the recorded value is [Others].
Hour | Value A Detected # | Value A Recorded | Value B Detected # | Value B Recorded |
---|---|---|---|---|
1 | 100 | Value A | 200 | Value B |
2 | 100 | Value A | 1200 | [Others] |
3 | 1100 | [Others] | 200 | Value B |
4 | 1100 | [Others] | 1200 | [Others] |
According to the recorded data, Value A appears to have occurred only in Hours 1 and 2, while Value B appears to have occurred only in Hours 1 and 3, even though both values were detected in every hour.
However, if both values are added to the whitelist, then they are detected and recorded every hour.
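The four-hour example can be reproduced with a small Python simulation. This is a simplified sketch under stated assumptions, not the product's implementation: `record_value` and `simulate` are hypothetical helpers, and the limit check simply compares the Detected # from the table against the 1000-value hourly limit.

```python
OTHERS = "[Others]"
HOURLY_LIMIT = 1000  # values captured per hour, taken from the example

# Detected # per hour for (Value A, Value B), copied from the table.
DETECTED = {1: (100, 200), 2: (100, 1200), 3: (1100, 200), 4: (1100, 1200)}


def record_value(value, captured_before, whitelist, limit=HOURLY_LIMIT):
    """Hypothetical model of how one value instance is recorded."""
    if value in whitelist:
        return value  # whitelisted values are always recorded as themselves
    # Observed values are recorded only while the hourly limit is not exceeded.
    return value if captured_before <= limit else OTHERS


def simulate(whitelist):
    """Return {hour: (recorded A, recorded B)} for the table's four hours."""
    return {
        hour: (
            record_value("Value A", a_count, whitelist),
            record_value("Value B", b_count, whitelist),
        )
        for hour, (a_count, b_count) in DETECTED.items()
    }


without_whitelist = simulate(set())
with_whitelist = simulate({"Value A", "Value B"})

for hour in sorted(DETECTED):
    print(hour, without_whitelist[hour], with_whitelist[hour])
```

With an empty whitelist, the simulation matches the Recorded columns of the table. With both values whitelisted, each value is recorded under its own name in all four hours.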
Note: Whether you are using Whitelist Only or Whitelist + Observed Values, it is important to review and update your whitelists regularly to maintain data integrity and to limit the volume of captured data.