Data Collector

These settings define the characteristics for the Tealeaf Data Collector, including buffer and batch sizes, intervals and time-outs, and the gathering of statistics on Tealeaf events.

The Data Collector scans each active Canister every 5 minutes for updated information. It is a Windows™ service on the Report Server.

Table 1. Data Collector
Setting	Description	Default
`Data Aggregation`	Set this value to `Enabled` to enable data aggregation of collected data for reporting purposes. Note: Don't change this setting unless directed by Tealeaf.	`Enabled`
`Data Aggregation - cxOverstat Daily Data Processing`	Describes the available settings for the frequency of daily cxOverstat aggregations and the date range over which they occur.	`Daily Through Previous Day`
`Data Aggregation - Daily Data Processing`	Describes the available settings for the frequency of daily aggregations and the date range over which they occur. Note: Configuring this setting to a value other than `Daily through Start of Hourly Retention Period` retains overlapping aggregated data in the database and may impact system performance during data aggregation.	`Daily through Start of Hourly Retention Period`
`Data Aggregation - Daily Data Time of Day`	Defines the hour of the day when the daily data aggregation run is performed, if the daily data aggregation is set to occur on a daily basis. Note: Configure the daily data aggregation run to be performed during an off-peak hour.	`02:00`
`Data Aggregation - Hourly Data Compression`	Enable this setting to compress data and to reduce the amount of space that is used by the database. When enabled, the process runs hourly instead of daily.	`Enabled`
`Data Aggregation - Hourly Performance Data Compression`	Enable this setting to compress data and to reduce the amount of space that is used by the database. When enabled, the process runs hourly instead of daily.	`Disabled`
`Data Aggregation - Max Concurrent Daily Threads`	The maximum number of concurrent threads that can be used for aggregating daily reporting data.	`4`
`Data Aggregation - Max Concurrent Performance Threads`	The maximum number of concurrent threads that can be used for aggregating performance data.	`3`
`Data Aggregation - Max Concurrent Threads`	The maximum number of concurrent threads that can be used for aggregating base reporting data.	`4`
`Data Aggregation - Max dimension extraction threads`	The maximum number of concurrent threads that can be used for aggregating dimension data.	`4`
`Data Aggregation - Performance Daily Data Processing`	When data aggregation is performed on hourly performance data, this parameter defines the scope of the data that is aggregated at the daily level. Available options: `Hourly through Current Hour` - Performance data is aggregated to the daily level through the current hour. `Daily through Previous Day` - Performance data is aggregated at the daily level through the previous day. `Daily through Start of Hourly Retention Period` - Performance data is aggregated at the daily level for dates before the start of the hourly data retention period, after which the hourly data is applicable. Note: Configuring this setting to a value other than `Daily through Start of Hourly Retention Period` retains overlapping aggregated data in the database and may impact system performance during data aggregation.	`Daily through Start of Hourly Retention Period`
`Data Aggregation - Performance Daily Data Time of Day`	The time of day when the data aggregation run is performed on performance data to aggregate hourly data to daily data.	`4:00`
`Data Aggregation - Performance Data`	When `Enabled`, data on client performance, response times, and connection times is aggregated for reporting purposes.	`Enabled`
`Data Aggregation - Performance Data Use View`	You can enable or disable the Use View to improve performance.	`Disabled`
`Data Aggregation - Use View`	You can enable or disable the Use View to improve performance.	`Disabled`
`Data Collection`	Set this value to `Enabled` to enable the Data Collector service to collect data from the Long Term Canister for insertion into the Tealeaf database. Note: Don't change this setting unless directed by Tealeaf.	`Enabled`
`Data Collection - Batch Size`	The number of records to extract or load at once. Note: Don't change this setting unless directed by Tealeaf.	`2000`
`Data Collection - Max Concurrent`	The number of canisters from which the Data Service will collect in parallel.	`2`
`Data Collection - Maximum rows collected per canister table`	If `Data Collection - Limit run time` is enabled, this parameter defines the maximum number of rows that the Data Collector is permitted to collect from a single Canister table on each Data Collector run. Note: This parameter is used in conjunction with `Data Collection - Limit run time` to force more frequent updates of report data, which assists the data collection process when it is far behind or struggling with a sudden spike in data volume.	`2000000`
`Data Collection - Limit run time`	When enabled, the Data Collector gathers only outstanding data with a timestamp that occurs before the start of the data collection run, instead of collecting all outstanding data. When enabled, the `Data Collection - Maximum rows collected per canister table` function is also enabled, and its limit is applied. Note: This parameter is used in conjunction with `Data Collection - Maximum rows collected per canister table` to force more frequent updates of report data, which assists the data collection process when it is far behind or struggling with a sudden spike in data volume.	`Disabled`
`Data Collection Processes - Max Tries Per Staging Table Set`	When an individual Data Collection run times out, the timeout setting is doubled. This process can be repeated up to the number of times defined in this setting. The timeout is applied across all processing runs. If the timeout is reached, the new doubled timeout setting applies until the service is restarted, at which point the timeout setting is reset to the value defined in the `Database Connection - Timeout (seconds)` setting.	`5`
`Data Collector - Log Entry Max Wait Time (minutes)`	Defines the frequency in minutes for writing accumulated log entries to the log file.	`5`
`Data Collector - Log Entry Threshold (rows)`	Defines the number of log entries that are required before writing to the log file. Log entries are written when either of the following occurs: the log entry threshold that is configured for this setting is met, or the amount of time that is defined in `Data Collector - Log Entry Max Wait Time (minutes)` has expired.	`1000`
`Data Collector Logging Level`	Specify the logging level for the Data Collector only. Note: Do not change this setting unless directed by Tealeaf. Levels: 0 - `none`; 1 - `Error` (default value); 2 - `Warning`; 3 - `Info`; 4 - `Detail`; 5 - `Status`; 6 - `Trace`; 7 - `All`. Note: Status level messages always appear in the log for any non-zero logging level. This value overrides the system logging level, which can be configured through TMS.	`Error`
`Data Extraction - Max Table Queue Size`	The maximum number of in-memory tables maintained by the Data Service. This setting is used to limit the size of the Data Service memory footprint if the speed of reading from the canister is significantly faster than writing to SQL Server. Note: Don't change this setting unless directed by Tealeaf.	`100`
`Data Trimming - Canister Data`	When `Enabled`, the un-aggregated event data is trimmed based on the configured Data Collector settings. Note: When this value is set to `Disabled`, no trimming occurs at all. The SQL Server database can grow without limit.	`Enabled`
`Data Trimming - Canister Data Immediate Trim`	When `Data Trimming - Canister Data` is `Enabled`, enabling this parameter forces the canister data tables to be immediately trimmed after the Data Collector has aggregated the data in them. Note: If errors are encountered during an aggregation operation, the data with which the errors occurred is retained. Re-aggregation and trimming are attempted during normal canister trim operations. If `Data Trimming - Canister Data` is disabled, this parameter is ignored. Don't change this setting unless directed by Tealeaf.	`Enabled`
`Data Trimming - Interval`	Determines the interval at which the Data Service trims data from the database: 0 - None, 1 - Hourly, 2 - Daily, 3 - Weekly.	`Hourly`
`Data Trimming - Max Batch Size`	The maximum number of records trimmed from the reporting or canister data tables in any single delete statement executed as part of a trimming operation.	`100000`
`Data Trimming - Reporting Data`	When `Enabled`, the reporting data is trimmed based on the configured Data Collector settings. Note: When this value is set to `Disabled`, no trimming occurs at all. The Reports database can grow without limit.	`Enabled`
`Data Trimming - Statistics`	When `Enabled`, the Tealeaf statistics data is trimmed based on the configured Data Collector settings. Note: When this value is set to `Disabled`, no trimming occurs at all. The Statistics database can grow without limit.	`Enabled`
`Data Trimming - System`	When `Enabled`, the user activity logs in the `TL_SYSTEM` database are trimmed based on the configured Data Collector settings. Note: When this value is set to `Disabled`, no trimming occurs at all. The related System database tables can grow without limit.	`Enabled`
`Data Trimming - Time for Daily Trim`	If the value for `Data Trimming - Interval` is `Daily`, then this value defines the time of day at which the reporting data is trimmed. Time is based on the Tealeaf system time zone. It should be configured for an off-peak hour.	`3:00`
`Data Trimming - Top Movers`	When `Enabled`, the Top Mover data is trimmed based on the configured Data Collector settings. Note: When this value is set to `Disabled`, no trimming occurs at all. Top Movers data in the database can grow without limit.	`Enabled`
`Database Connection - Timeout (seconds)`	The timeout in seconds when connecting to the database. If the Data Collector aggregation operation times out (exceeds this setting), the setting is doubled in the next run. If it times out again, the timeout continues to be doubled, up to the number of times defined in the `Data Collection Processes - Max Tries Per Staging Table Set` setting. The temporary extended connection timeout setting is maintained until the service is restarted, after which it reverts to the original timeout value defined for this setting.	`60`
`Database Growth Calculation Time of Day`	The time of day when the database growth report is populated with current size information. This report is available through the Portal.	`5:00`
`Dimension Log Aggregation`	When `Enabled`, Tealeaf dimension values are aggregated from log entries at predefined intervals.	`Enabled`
`Dimension Log Aggregation Interval`	When `Dimension Log Aggregation` is enabled, this setting defines the time interval between checks of the logs for reference values.	`Hourly`
`Dimension Log Aggregation Time`	When `Dimension Log Aggregation` is enabled and `Dimension Log Aggregation Interval` is set to `Daily`, this setting defines the 24-hour time when the review of the logs is executed.	`3:00`
`Dimension Trimming - Day of Week`	If `Dimension Trimming - Frequency` is set to `Weekly`, then this setting defines the day of the week when the trim operation is executed. If `Dimension Trimming - Frequency` is set to `Monthly`, then this setting defines the first occurrence of the day in the month when the trim operation is executed.	`Sunday`
`Dimension Trimming - Frequency`	Set this value to how frequently the dimension trimming operation is executed: `Daily`, `Weekly`, or `Monthly`.	`Daily`
`Dimension Trimming - Time of Day`	The time of day when the dimension trimming operation occurs. Note: Set this value to occur during an off-peak hour, as early as possible after the end of peak usage and after the Scheduling Service has cycled services. If services are cycled during a dimension trim operation, the Data Collector is forced to restart. Since the Database Table Size report is updated at 2AM, changes to the table sizes are not reflected in the report until the following evening under the default setting.	`3:00`
`Dimension Trimming - Update Fact Counts`	In addition to trimming dimension values from the dimension data, this setting enables the updating of counts of trimmed dimension values to `[others]` in all reporting data, when `Enabled`. When `Enabled`, updating of the counts for trimmed dimension values requires making the change to every instance of the dimension value in all reporting data. Depending on the number of instances and the number of trimmed values, this process can take a few minutes to multiple hours to complete. When `Disabled`, reporting data is not updated during dimension trimming operations. As a result, discrepancies can be introduced between the sum of event counts not filtered by the trimmed dimension and the sum of event counts filtered by the trimmed dimension. Note: If you do not update counts in the reporting data as part of your dimension trimming, updates for previously reviewed dimension values are not subsequently applied to the reporting data if the option is enabled at a later time.	`Enabled`
`Dimension Value Tracking - Max Concurrent Threads`	The maximum number of concurrent threads that can be spawned when timestamps for individual dimension values are being updated. The Data Collector updates the timestamps for each value in each dimension when they are detected. During a dimension trimming operation, the Data Collector reads the timestamps associated with each value to determine the most recently occurring ones. This setting defines the maximum number of threads that can be spawned during the timestamp updating process, which runs independently of the dimension trimming process. Note: Leave this value unchanged from the default setting.	`4`
`Fact Limits - Check Interval (minutes)`	Defines the interval when the number of facts written for each event within the past hour is compared to the permitted maximum. Accepted values are 15, 30 or 60 minutes.	`60`
`Fallback to RowByRow insert on BulkInsert Error`	When enabled and an error occurs during bulk insertion of data, the Data Collector reverts to inserting data row by row.	`Disabled`
`PreAggregation - Canister Polling Interval (seconds)`	Frequency, in seconds, of polling canisters to check for new data.	`60`
`PreAggregation - Staging Table Threshold (rows)`	The number of rows that are required for triggering staging table creation and writing the pre-aggregated data to the staging table. Note: Data is also written to the staging table if the minimum required rows have not accumulated but the amount of time that is configured in `PreAggregation - Staging Table Write Max Wait Time (minutes)` has expired.	`100000`
`PreAggregation - Staging Table Write Max Wait Time (minutes)`	The maximum amount of time, in minutes, the pre-aggregator will wait before writing the pre-aggregated data to the staging table. Note: If the value set in `PreAggregation - Staging Table Threshold (rows)` has not yet been reached during this time period, the pre-aggregated data that is accumulated during this time period is written to the staging table.	`5`
`Refresh Frequency (minutes) - Event Definitions`	The frequency, in minutes, of refreshing event definitions from the database.	`1`
`Refresh Frequency (minutes) - Control Settings`	The frequency, in minutes, of refreshing control settings from the database.	`1`
`Send Report Schedules`	When `Enabled`, scheduled reports are delivered according to their configured settings.	`Enabled`
`Table Partitioning`	If `Enabled`, this setting tells the Data Service that tables in the reporting database have been partitioned and need to be maintained properly. Note: Don't change this setting unless directed by Tealeaf.	`Disabled`
`TDEL Buffer (minutes)`	The number of minutes of already collected data to leave in the canister for recovery purposes if the system is running slowly. Note: Don't change this setting unless directed by Tealeaf.	`30`
`Top Movers`	When `Enabled`, Top Movers that have been configured and enabled are calculated. Hourly Top Movers are calculated once per hour, and daily Top Movers are calculated once per day.	`Enabled`
`Top Movers - Auto-calculate Daily Top Movers`	When `Enabled`, this setting forces the creation and calculation of daily Top Movers for all current and newly created events and event + dimension combinations. Switching this setting changes the status of all Top Movers in the system. You may still manually enable or disable individual Top Movers. Note: Enabling this setting can have significant impacts on data storage and performance.	`Disabled`
`Top Movers - Auto-calculate Hourly Top Movers`	When `Enabled`, this setting forces the creation and calculation of hourly Top Movers for all current and newly created events and event + dimension combinations. Switching this setting changes the status of all Top Movers in the system. You may still manually enable or disable individual Top Movers. Note: Enabling this setting can have significant impacts on data storage and performance.	`Disabled`
`Top Movers - Maximum data points for calculations`	The maximum number of data points used to calculate average and deviation values for event and dimension Top Movers. Data points beyond this maximum are not used in the calculation of averages and standard deviations.	`16`
`Top Movers - Minimum data points for calculations`	The minimum number of data points required to calculate average and deviation values for event and dimension Top Movers. If there are too few data points, the deviation is not calculated for the period. Tealeaf recommends that you do not set this value below the default value (`4`). The minimum accepted value is `2`.	`4`
`Top Movers - Number of Threads used for calculations`	The number of threads that the Data Collector uses when performing Top Mover hourly and daily calculations. Note: You can raise this setting to attempt to improve performance of Top Mover calculations. However, depending on the system load at the time of calculation, raising this setting can negatively impact system performance.	`4`
`Top Movers - Time for Daily Calculation`	When `Top Movers` is enabled, this value specifies the time when deviation calculations are performed for daily Top Movers. Time is based on the Tealeaf system time zone. It should be configured for an off-peak hour.	`4:30`
`Top Movers - Calculation Mode`	This setting configures how Top Movers are computed. `Consecutive Days` - Top Movers are calculated for consecutive days. For example, the data reported for a single Top Mover might contain entries for each day of last week and this week. `Same Days` - Top Mover deviations are calculated from the same day from the previous weeks. For example, deviation values for Wednesday are computed using the preceding Wednesdays.	`Consecutive Days`
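
The interaction between `Database Connection - Timeout (seconds)` and `Data Collection Processes - Max Tries Per Staging Table Set` in Table 1 can be illustrated with a small calculation. The following Python sketch is not Tealeaf code. The function name `timeout_schedule` is hypothetical, and the sketch assumes that the first try uses the configured timeout, that each later try doubles the previous timeout, and that the try limit counts the first attempt. The table does not state whether the first attempt is counted, so treat the length of the result as an assumption.

```python
def timeout_schedule(base_timeout_s: int = 60, max_tries: int = 5) -> list[int]:
    """Return the timeout (in seconds) applied on each try.

    Assumption: every try times out, the first try uses the base timeout,
    and each following try doubles the previous value.
    """
    if base_timeout_s <= 0:
        raise ValueError("base_timeout_s must be positive")
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    # Try 0 uses the base value; try n uses base * 2**n.
    return [base_timeout_s * 2 ** attempt for attempt in range(max_tries)]


# Defaults from Table 1: 60-second timeout, 5 tries.
print(timeout_schedule())  # [60, 120, 240, 480, 960]
```

Under these assumptions, the default settings allow the timeout to grow from 60 seconds to 960 seconds before the retries are exhausted. As the table notes, the extended value stays in effect until the service is restarted.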

Data Collector Statistics log (`TLDataCollectorStats.log`)

The `TLDataCollectorStats.log` file contains one line per minute, with counts for the AGG, AGKEY, and PATH tables.

The `TLDataCollectorStats.log` file is written to the Data Collector Server and contains the following columns:


READ-AGG READ-AGKEY READ-PATH WRITE(O)-AGG WRITE(O)-AGKEY WRITE(O)-PATH WRITE(A)-AGG WRITE(A)-AGKEY WRITE(A)-PATH

WRITE(A) is the actual number of records written to the Pre-aggregator.
WRITE(O) is the original count, before pre-aggregation. This original count matches the corresponding READ count.
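
To show how the columns relate to each other, the following Python sketch parses one line of counts and computes how much pre-aggregation reduced the row count for a table. The exact line layout of `TLDataCollectorStats.log` is not described here, so the sketch assumes whitespace-separated fields with the nine counts as the last nine fields of the line. The function names and the sample values are hypothetical.

```python
from typing import Optional

# Column order as listed above.
COLUMNS = [
    "READ-AGG", "READ-AGKEY", "READ-PATH",
    "WRITE(O)-AGG", "WRITE(O)-AGKEY", "WRITE(O)-PATH",
    "WRITE(A)-AGG", "WRITE(A)-AGKEY", "WRITE(A)-PATH",
]


def parse_stats_line(line: str) -> dict[str, int]:
    """Map the last nine whitespace-separated fields to column names.

    Assumption: any leading fields (for example a timestamp) come before
    the nine counts and can be ignored.
    """
    fields = line.split()
    if len(fields) < len(COLUMNS):
        raise ValueError("expected at least nine fields")
    counts = fields[-len(COLUMNS):]
    return dict(zip(COLUMNS, (int(value) for value in counts)))


def reduction(row: dict[str, int], table: str) -> Optional[float]:
    """Fraction of rows removed by pre-aggregation for AGG, AGKEY, or PATH."""
    original = row[f"WRITE(O)-{table}"]
    actual = row[f"WRITE(A)-{table}"]
    if original == 0:
        return None  # nothing was read for this table in this minute
    return 1 - actual / original


# Invented sample values, for illustration only.
sample = "1000 200 50 1000 200 50 250 100 50"
row = parse_stats_line(sample)
print(reduction(row, "AGG"))  # 0.75
```

In the sample, 1000 original AGG records become 250 written records, a 75 percent reduction. Real log lines may carry extra fields or a different delimiter, so adjust the parsing to match the file on your system.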
