To prevent unbounded growth of the database tables that are used to store dimension values, Tealeaf enforces the automated trimming of dimension values.
When a dimension is configured to capture and store observed values, each instance of a detected value is stored in the database, which means that the database can grow without limit. The automation of dimension value trimming is intended to place an upper limit on the volume of data stored in your system for each dimension and to prevent total consumption of available disk space.
Note: Dimension trimming is intended to prevent the database from completely consuming available storage space and causing system-wide failures. It should not be used as a replacement for monitoring the data growth of individual dimensions. Use whitelists wherever possible for tracking dimension values.
How it works
Periodically, the Data Collector scans the dimension values for each dimension in the TL_REPORTS database. If the number of stored values for any dimension exceeds the globally defined limit, the Data Collector trims the oldest values, as determined by the timestamp when each value was last captured. The oldest values are trimmed until the number of values for the dimension no longer exceeds the limit. For example, if the defined global limit is 1,000,000 values per dimension and Dimension A contains 1,002,000 values, the next trimming run by the Data Collector removes the 2,000 oldest values.
- The oldest values are determined by the timestamps that are associated with each value. These timestamps are updated whenever the dimension values are updated in a separate process.
- Suppose value X of a dimension is captured before value Y, and then value X is captured again. In this case, value Y is considered to be older than value X, since value X has the more recent timestamp.
- The time when the dimension values were last updated is available through the Data Collector log in the Portal.
- In the reporting data, all references to the dimension values above the global limit may be remapped to the [others] category as part of the dimension trimming run. This step in the process is resource-intensive.
- Whitelisted values are not removed during dimension trimming.
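The trimming rule described in this section can be sketched in a few lines of Python. This is an illustration only, not Tealeaf's implementation: the function name `trim_dimension`, the in-memory dictionary, and the sample values are hypothetical, and the sketch assumes that whitelisted values count toward the limit but are never removed.

```python
from datetime import datetime, timedelta

def trim_dimension(values, limit, whitelist=frozenset()):
    """Return (kept, trimmed) for one dimension.

    values: dict mapping each dimension value to the timestamp
            when it was last captured.
    Assumption: whitelisted values count toward the limit but are
    never removed.
    """
    excess = len(values) - limit
    if excess <= 0:
        return dict(values), []
    # Oldest first, skipping whitelisted values.
    candidates = sorted(
        (v for v in values if v not in whitelist),
        key=lambda v: values[v],
    )
    trimmed = candidates[:excess]
    trimmed_set = set(trimmed)
    kept = {v: ts for v, ts in values.items() if v not in trimmed_set}
    return kept, trimmed

# 1,002 values against a limit of 1,000; the oldest value is whitelisted.
now = datetime(2024, 1, 1, 3, 0)
values = {f"value-{i}": now - timedelta(days=i) for i in range(1002)}
kept, trimmed = trim_dimension(values, limit=1000, whitelist={"value-1001"})
print(len(kept), trimmed)  # 1000 ['value-1000', 'value-999']
```

Because `value-1001` is whitelisted, the two next-oldest values are removed instead.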
Note: Except for calendar-related dimensions, all dimensions that are visible to Tealeaf users are analyzed and, if necessary, trimmed, which includes dimensions that are provided by Tealeaf.
Note: If the number of values that are stored for a dimension exceeds the defined global limit, the dimension is trimmed back to the global limit. However, any new values that are detected afterward are stored until the next time the Data Collector trims dimension values, which pushes the count above the global limit again. As a result, a dimension that is trimmed once is likely to be trimmed on every subsequent trimming run, which further impedes system performance. As more dimensions reach the global limit, the trimming process takes longer. Convert any dimension that was trimmed to a whitelist, if possible.
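The repeated-trimming effect in the preceding note can be illustrated with a small simulation. The function `simulate_runs` and all of its numbers are hypothetical; the sketch assumes a steady number of previously unseen values between trimming runs.

```python
def simulate_runs(start_count, new_values_per_run, limit, runs):
    """Return the number of values trimmed on each trimming run."""
    count = start_count
    trimmed_per_run = []
    for _ in range(runs):
        count += new_values_per_run      # values captured since the last run
        trimmed = max(0, count - limit)  # excess removed by this run
        count -= trimmed
        trimmed_per_run.append(trimmed)
    return trimmed_per_run

print(simulate_runs(start_count=740_000, new_values_per_run=5_000,
                    limit=750_000, runs=4))
# [0, 0, 5000, 5000]
```

Once the dimension reaches the limit, every later run has to trim it again.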
Monitor defined limits
In the System Status report, you can review the Database Table Size report, which contains information about the growth of dimension data tables. This report can be used to monitor the size of the dimension data in your system.
By default, the Database Table Size report runs once per day at 2:00 AM, and dimension trimming runs at 3:00 AM. With these defaults, the report measures table sizes before that day's trimming has occurred, so it effectively reports on the previous day's growth.
You can change the timing of the dimension trim operation.
Define the global limit
By default, the global limit for the number of values that can be stored in any individual dimension is 750,000 values. When dimension trimming occurs, each dimension is trimmed to contain no more than 750,000 values.
Depending on the volume of data that Tealeaf captures, you may want to adjust this value for your available storage resources. Keep in mind that this setting is applied to all dimensions that Tealeaf provides and that you define. The value that you define should be sufficiently high to effectively manage the highest volume dimensions in your environment.
Note: The global limit for the number of values that are stored in a dimension should be defined such that few dimensions ever reach it. A dimension that is continuously being trimmed needs to be managed through a whitelist.
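One way to spot dimensions that are approaching the global limit is to compare their value counts against a warning threshold. The following sketch is an assumption-laden illustration: `whitelist_candidates`, the `warn_ratio` threshold, the value counts, and the dimension names "Browser Type" and "Search Term" are all hypothetical, and Tealeaf does not necessarily expose counts in this form.

```python
GLOBAL_LIMIT = 750_000  # default global limit stated in this section

def whitelist_candidates(value_counts, limit=GLOBAL_LIMIT, warn_ratio=0.8):
    """Return (name, count) pairs at or above warn_ratio * limit, largest first."""
    threshold = limit * warn_ratio
    flagged = [(name, n) for name, n in value_counts.items() if n >= threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)

# Hypothetical value counts per dimension.
counts = {"URL (Normalized)": 748_200, "Browser Type": 312, "Search Term": 610_000}
for name, n in whitelist_candidates(counts):
    print(f"{name}: {n} values ({n / GLOBAL_LIMIT:.1%} of limit)")
```

Dimensions flagged this way are candidates for whitelist management before trimming begins.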
Update counts for trimmed dimension values in report data
When the oldest values for a dimension are automatically trimmed, reporting data that uses the dimension values can optionally be updated as well. This update replaces each instance of a trimmed value with the [others] dimension constant.
By default, the Dimension Trimming - Update Fact Counts setting is enabled.
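Conceptually, updating fact counts folds the counts of trimmed values into [others]. The sketch below models this with in-memory tuples. The function `remap_trimmed`, the row layout, and the sample URLs are hypothetical and do not reflect the actual reporting schema.

```python
from collections import Counter

OTHERS = "[others]"

def remap_trimmed(fact_rows, trimmed_values):
    """Fold counts for trimmed values into [others].

    fact_rows: iterable of (event, dimension_value, count) tuples.
    """
    trimmed = set(trimmed_values)
    totals = Counter()
    for event, value, count in fact_rows:
        key = (event, OTHERS if value in trimmed else value)
        totals[key] += count
    return dict(totals)

facts = [
    ("Checkout", "/cart", 40),
    ("Checkout", "/old-promo", 3),
    ("Checkout", "/legacy", 2),
]
print(remap_trimmed(facts, ["/old-promo", "/legacy"]))
# {('Checkout', '/cart'): 40, ('Checkout', '[others]'): 5}
```

The total event count is preserved; only the attribution to individual values is lost.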
For a dimension trimming operation, the number of database changes depends on:
- The number of instances of a trimmed value in the reporting database
- The number of reports using the value
- The number of values that are trimmed for the dimension
- The number of dimensions being trimmed
For data-intensive dimensions, such as URL (Normalized), it is especially important to manage the dimension values using whitelists.
Note: Updating the event + dimension counts for trimmed values in the reporting data can take anywhere from a few minutes to multiple hours to complete.
Review the results
When the option to update fact counts is disabled, reporting data is not updated during dimension trimming operations. As a result, discrepancies can be introduced between the sum of event counts not filtered by the trimmed dimension and the sum of event counts filtered by the trimmed dimension.
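The discrepancy can be illustrated with a simplified model. The sketch assumes that a report filtered or grouped by the dimension only counts rows whose value still exists in the dimension, which is an assumption about report behavior and not a documented query. The function `totals` and the sample rows are hypothetical.

```python
OTHERS = "[others]"

def totals(fact_rows, known_values):
    """Return (unfiltered_total, total_when_grouped_by_dimension)."""
    unfiltered = sum(count for _, _, count in fact_rows)
    # Assumption: grouping by the dimension only sees rows whose
    # value still exists in the dimension.
    by_dimension = sum(
        count for _, value, count in fact_rows if value in known_values
    )
    return unfiltered, by_dimension

facts = [("Checkout", "/cart", 40), ("Checkout", "/old-promo", 3)]
known_after_trim = {"/cart", OTHERS}  # "/old-promo" was trimmed

print(totals(facts, known_after_trim))    # fact counts not updated: (43, 40)

updated = [(e, v if v in known_after_trim else OTHERS, c) for e, v, c in facts]
print(totals(updated, known_after_trim))  # fact counts updated: (43, 43)
```

With the update applied, the trimmed counts reappear under [others] and the two totals agree.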
If a set of dimension values was trimmed and the corresponding report data was not updated retroactively, a warning message is displayed in the Portal when the dimension is used in a Report Builder report.
Note: If you do not update counts in the reporting data as part of dimension trimming, the reporting data is not updated retroactively for previously trimmed dimension values if you enable the option later.
After you review the results of the trimming operation in your reports, you can reset the trim flag on the dimension so that the Portal message is no longer displayed when the dimension is used in reports.
Schedule dimension trimming
The trimming process can take a long time, especially if there is a high volume of reporting data to update. Schedule it to run during off-peak hours.
In particular, the URL (Normalized) dimension that is provided by Tealeaf can generate a high number of values in a short period of time. Because it appears in various performance reports that are provided by Tealeaf, as well as in any user-defined reports that reference the dimension, trimming it can require extensive updating of the report data, which is resource-intensive.
Schedule the dimension trimming to occur as early as possible during the evening, after Tealeaf users are no longer heavily using the system and after the Scheduling Service has cycled all services.
An early scheduling ensures that other processes scheduled for off-peak hours, such as scorecard calculation and scheduled reports delivery, can use the most up-to-date dimension values.
Note: While the Data Collector trims dimension values, report data is not updated from the Canisters. As a result, any scheduled reports that are configured to be run while the dimension trimming is occurring may have incomplete data. Report data is not updated until the next scheduled run of the Data Collector.
You can configure the scheduling of dimension trimming in the Data Collector settings, in the Tealeaf CX Settings category of the Portal Management page.