Passive Capture is the process by which Experience Analytics software captures the data that flows between your visitors' computers and your web servers.
The following terms apply to the passive capture process:
- The switch is a hardware device that routes all incoming and outgoing data packets between your visitors' computers and your web servers. Typically, switches are configured using a hardware option called a Switched Port Analyzer (SPAN) port (https://www.cisco.com/c/en/us/support/docs/switches/catalyst-6500-series-switches/10570-41.html), which delivers a copy of every HTTP packet to the capture server.
- The TCP/IP protocol organizes interaction between computers into packets. An individual Web page can be broken down into many packets, each transmitted individually between computers. The capture server typically monitors millions of packets traveling nearly simultaneously between your Web servers and visitors' computers. These packets can arrive in any order and sometimes must be retransmitted. The capture server can be configured to ignore packets that are not of interest, such as email messages or packets sent to IP addresses of servers not hosting the website.
- The HTTP protocol defines a request as a message sent from one computer to another that asks for a response. The capture server collects all HTTP data to re-create the request and response traffic.
- A response is the message returned to the computer that made a request. After capturing a request, the capture server processes and assembles packets in search of the corresponding response.
- A hit is defined as a request and the corresponding response to it.
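Because packets can arrive in any order and are sometimes retransmitted, the capture server has to put each TCP stream back together before it can read the HTTP messages inside it. The following is a minimal Python sketch of that idea, not the CX Passive Capture Application's actual implementation; the `Segment` type and `reassemble` function are hypothetical, and real TCP reassembly also handles sequence-number wraparound, overlapping segments, and gaps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One captured TCP segment (hypothetical, simplified)."""
    seq: int        # stream offset of the first payload byte
    payload: bytes


def reassemble(segments: list[Segment], initial_seq: int) -> bytes:
    """Rebuild a byte stream from segments that may be out of order or duplicated.

    A retransmitted segment repeats a sequence number that was already seen,
    so only the first copy is kept. Reassembly stops at the first gap.
    """
    by_seq: dict[int, bytes] = {}
    for seg in segments:
        by_seq.setdefault(seg.seq, seg.payload)  # ignore retransmitted copies

    stream = bytearray()
    next_seq = initial_seq
    while next_seq in by_seq and by_seq[next_seq]:
        data = by_seq[next_seq]
        stream += data
        next_seq += len(data)  # the next segment starts right after this one
    return bytes(stream)


# Segments arrive out of order, and one of them is retransmitted.
captured = [
    Segment(seq=1016, payload=b"HTTP/1.1\r\n"),
    Segment(seq=1000, payload=b"GET /index.html "),
    Segment(seq=1016, payload=b"HTTP/1.1\r\n"),
]
print(reassemble(captured, initial_seq=1000))  # b'GET /index.html HTTP/1.1\r\n'
```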
After the hit has been collected, the Passive Capture software can scan the data to see if the hit is of interest. For example, images that are displayed on every web page are not very interesting and can be discarded. Also, sensitive information such as user names, passwords, and credit card numbers can be deleted.
After removing unwanted data, the Capture software securely transmits the hit data to the Processing Server.
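The scan described above combines two kinds of rules: discarding hits that are not of interest, and deleting sensitive values from hits that are kept. A minimal Python sketch of both follows, assuming that static images are recognized by file extension and that sensitive data arrives as `name=value` form fields; the extension list, the field names `password` and `cardnumber`, and both functions are hypothetical examples, not Experience Analytics privacy-rule syntax.

```python
import re

# Hypothetical discard rule: static images shown on every page are not interesting.
STATIC_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".ico")

# Hypothetical privacy rule: mask the values of these form fields.
SENSITIVE_FIELDS = ("password", "cardnumber")
_MASK_RE = re.compile(
    r"(?P<name>\b(?:" + "|".join(SENSITIVE_FIELDS) + r"))=(?P<value>[^&\s]*)",
    re.IGNORECASE,
)


def is_of_interest(url_path: str) -> bool:
    """Return False for hits that would be discarded as static images."""
    return not url_path.lower().endswith(STATIC_EXTENSIONS)


def mask_sensitive(body: str) -> str:
    """Replace each sensitive field value with X characters of the same length."""
    return _MASK_RE.sub(
        lambda m: m.group("name") + "=" + "X" * len(m.group("value")), body
    )


print(is_of_interest("/images/logo.gif"))          # False
print(mask_sensitive("user=bob&password=s3cret"))  # user=bob&password=XXXXXX
```

Masking with same-length placeholders, rather than deleting the field, is one possible design choice: it keeps the hit's structure intact for replay while removing the value.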
Many website interactions are encrypted to protect the data from being read or manipulated by third parties. The Capture software must decrypt the data to match requests and responses. Typically, the Capture software is configured to re-encrypt the data using SSL for transmission to the Processing Server.
Stream data is the HTTP DataStream captured by Experience Analytics, which includes request, response, hit, session, and event data. This data is a sequence of digitally encoded signals (data packets) that is used to transmit or receive information.
Experience Analytics receives all packets copied by the switch and forwarded down the SPAN port to the CX Passive Capture Application server.
Stream data can be modified in the CX PCA server and in the Windows™ pipeline on the Processing Server. The data can be modified in these two places only.
Of the packets received by the CX PCA server, only the HTTP and HTTPS packets are re-assembled, processed, and forwarded for additional processing and storage. In most configurations, other types of packets are ignored.
Two modes determine what is processed:
- Business Mode: Retains only specified file extensions (such as .asp) and encoding types.
- Business IT Mode: Retains all hits, including static objects.
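The difference between the two modes can be sketched as a simple retention check. This Python example is illustrative only: it assumes Business Mode is configured with a set of file extensions, and it checks extensions only (an encoding-type check would be an analogous set lookup). `CaptureMode`, `RETAINED_EXTENSIONS`, and `retain_hit` are hypothetical names, not product configuration keys.

```python
from enum import Enum


class CaptureMode(Enum):
    BUSINESS = "Business"
    BUSINESS_IT = "Business IT"


# Hypothetical Business Mode configuration: extensions to retain.
RETAINED_EXTENSIONS = {".asp", ".aspx", ".jsp", ".htm", ".html"}


def extension_of(url: str) -> str:
    """Return the lowercase file extension of a URL path, or '' if it has none."""
    path = url.split("?", 1)[0]  # drop any query string
    last_segment = path.rsplit("/", 1)[-1]
    if "." not in last_segment:
        return ""
    return "." + last_segment.rsplit(".", 1)[-1].lower()


def retain_hit(mode: CaptureMode, url: str) -> bool:
    """Decide whether a hit is kept under the given capture mode."""
    if mode is CaptureMode.BUSINESS_IT:
        return True  # all hits, including static objects
    return extension_of(url) in RETAINED_EXTENSIONS


print(retain_hit(CaptureMode.BUSINESS, "/shop/cart.asp?id=3"))  # True
print(retain_hit(CaptureMode.BUSINESS, "/img/logo.png"))        # False
print(retain_hit(CaptureMode.BUSINESS_IT, "/img/logo.png"))     # True
```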
In Experience Analytics, a Request is the component of data that originates when a user visits your website.
Typically, the raw request data is not displayed anywhere in the Experience Analytics system, although the system can be configured to show it for debugging purposes. Instead, the REQ buffer is used to store metadata about the hit, including everything contained in the original request.
Each captured HTTP request (and only HTTP requests) causes the Passive Capture software to look for an HTTP response.
- If there is no request, any response is ignored.
- If the HTTP request is encrypted, Experience Analytics must decrypt it to understand what was requested.
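The rule that a request starts the search for a response, and that a response with no request is ignored, can be illustrated with a small pairing structure. This Python sketch is a hypothetical model, not the Passive Capture internals: it assumes requests and responses have already been decrypted and reassembled, and it relies on the HTTP/1.1 behavior that responses on one connection are returned in the same order as the requests.

```python
from collections import defaultdict, deque


class HitAssembler:
    """Pairs HTTP responses with earlier requests on the same connection (hypothetical)."""

    def __init__(self):
        # connection id -> requests still waiting for a response, oldest first
        self._pending = defaultdict(deque)
        self.hits = []     # completed (request, response) pairs
        self.ignored = 0   # responses that had no matching request

    def on_request(self, conn_id, request):
        self._pending[conn_id].append(request)

    def on_response(self, conn_id, response):
        queue = self._pending.get(conn_id)
        if not queue:
            # No request was captured for this connection, so the response is ignored.
            self.ignored += 1
            return
        # HTTP/1.1 returns responses in request order on a single connection.
        self.hits.append((queue.popleft(), response))


asm = HitAssembler()
asm.on_request("c1", "GET /a")
asm.on_request("c1", "GET /b")
asm.on_response("c1", "200 for /a")
asm.on_response("c2", "200 with no request")
asm.on_response("c1", "200 for /b")
print(asm.hits)     # [('GET /a', '200 for /a'), ('GET /b', '200 for /b')]
print(asm.ignored)  # 1
```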
Request and response data can be manipulated in the PCA pipeline through privacy rules and in the Windows pipeline via deployed and configured session agents.
You can use the Request view when you replay a visitor session to view the data contained in the Request.
In addition to storing the raw HTTP request data in Experience Analytics, the request record is used to store additional attributes for the hit.
These hit attributes are extracted or computed from the request or response and include information such as the IP address of the sender and receiver, performance timing, and form field variables.
This buffer is generated by the CX Passive Capture Application after the request and response have been captured.
The request record is an unstructured text blob containing multiple text-delimited sections of either name=value pairs or XML. The request record is always encoded as UTF-8. In addition to the predefined Experience Analytics sections, a custom [appdata] section can be populated during processing by configured session agents, usually to simplify downstream data processing or evaluation.
The hit attributes can be used as source data for events and reports. Hit attribute data can be exported to third-party systems through cxConnect for Data Analysis.
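Because the request record is UTF-8 text organized into bracketed sections, a reader for its name=value sections can be sketched in a few lines. This Python example is an assumption-based illustration: it assumes each section starts with a `[name]` header line (as `[appdata]` and `[TLFID_80]` do) followed by one name=value pair per line. XML sections would need a separate parser, and the `CartTotal` attribute in the sample is hypothetical.

```python
def parse_request_record(blob: bytes) -> dict[str, dict[str, str]]:
    """Split a UTF-8 request record into {section: {name: value}} (name=value sections only)."""
    text = blob.decode("utf-8")  # the request record is always UTF-8
    sections: dict[str, dict[str, str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})  # start a new section
        elif current is not None and "=" in line:
            name, _, value = line.partition("=")  # split on the first '=' only
            current[name.strip()] = value.strip()
    return sections


record = b"[appdata]\nCartTotal=27.50\n[TLFID_80]\nSearchable=True\nTLFID=80\nTLFactValue=1\n"
parsed = parse_request_record(record)
print(parsed["appdata"]["CartTotal"])     # 27.50
print(parsed["TLFID_80"]["TLFactValue"])  # 1
```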
Sections of the Request Record
Hit attributes can be displayed in a list of sessions found through search, either in the Portal or the CX RealiTea Viewer.
The following list describes some of the request sections from which hit attribute data is extracted:
- Custom attributes populated by session agents ([appdata] section). These attributes are automatically indexed for search.
- HTTP request environment variables, such as the HTTP Referer and HTTP Status Code.
- The time stamp of the request and performance timing for the hit, calculated by Experience Analytics.
- Parsed GET and POST data fields from the request ([urlfield] section).
- Fact information derived from the hit ([TLFID_n] sections, such as [TLFID_80]).
Hit Attribute Export
Session lists found through search, together with the selected hit attributes they display, can be exported from the Portal and the Viewer into Excel.
The contents of the request record can be extracted from the Experience Analytics system for import into other external systems by using ETL tools with the cxConnect for Data Analysis product.
For each HTTP request, the corresponding HTTP response is also captured and, if necessary, decrypted. Typically, only responses of content type text/html are retained. The following items are also retained:
- Error code responses
- RIA requests (XML)
- Binary files that are explicitly kept
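The retention rules above can be read as a short decision function. This Python sketch is hypothetical: it assumes that "error code responses" means HTTP status 400 or higher, that RIA XML responses are recognized by the `text/xml` or `application/xml` content types, and that "explicitly kept" binary files are flagged by configuration passed in as a boolean.

```python
# Assumption: these content types identify RIA (XML) responses.
XML_TYPES = {"text/xml", "application/xml"}


def retain_response(content_type: str, status: int, explicitly_kept: bool = False) -> bool:
    """Return True if a captured response would be retained under the default rules."""
    mime = content_type.split(";", 1)[0].strip().lower()  # drop parameters such as charset
    if mime == "text/html":
        return True
    if status >= 400:  # assumption: "error code" means a 4xx or 5xx status
        return True
    if mime in XML_TYPES:
        return True
    return explicitly_kept  # binary files only when explicitly configured


print(retain_response("text/html; charset=UTF-8", 200))  # True
print(retain_response("image/png", 200))                 # False
```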
HTML data is stored in the same encoding scheme as it was captured.
The response data can be viewed using session replay, either in the browser or the viewer. For HTML responses, a rendered view and a source view are provided.
Response data can be used for evaluating events.
Request and response data can be manipulated in the PCA pipeline through privacy rules and in the Windows pipeline via deployed and configured session agents.
Each request record/response pair is reassembled in Experience Analytics to compose a hit.
Note: Experience Analytics discards most hits because they are static objects that are not unique to a particular browser session. This approach is how Experience Analytics keeps the data volume to a reasonable size.
Note: Experience Analytics uses the term page to refer to hits that are not discarded.
Experience Analytics hit counts often do not include every hit received by the web server, but sometimes include more hits than the page views recorded by other systems. While Experience Analytics defaults to keeping hits of content-type = text/html, it can also be configured to keep other data types, such as dynamically created images.
The attributes of the hit are displayed in the request record.
After a hit is discarded, the hit does not exist in Experience Analytics. However, its existence can be uncovered through replay, where the hit is regenerated, or by looking at the response HTML in some cases. A record of dropped hits is reported in the statistics that are generated by the CX Passive Capture Application.
For hits that are not discarded, the data in the request record or response can be processed by the Experience Analytics Event Engine. An event is defined as a trigger, a condition, and an action that is specified by an Experience Analytics user.
A trigger is a defined moment in the lifespan of a session when events can be evaluated. Each event is associated with a specific trigger and can only reference the event-related data available in that trigger.
A condition is either the occurrence of a text string in the hit data or the combination of other events occurring in the hit.
The action for a hit event is to record the Event Identifier, actual Hit Time, event value, and more. These items are stored in separate records with the hit data.
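The trigger, condition, and action model can be sketched as data plus one evaluation function. This Python example is illustrative, not the Event Engine: the trigger names `"hit"` and `"session_end"`, the `EventDefinition` and `EventRecord` types, and the fixed event value of 1 are all assumptions. The condition is the simplest kind described above, the occurrence of a text string in the hit data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventDefinition:
    event_id: int
    trigger: str      # hypothetical trigger names, e.g. "hit" or "session_end"
    search_text: str  # condition: this string must occur in the hit data


@dataclass(frozen=True)
class EventRecord:
    event_id: int
    hit_time: float
    value: int


def evaluate_hit_events(events, hit_text, hit_time):
    """Evaluate the hit trigger and return the records created by matching events."""
    records = []
    for ev in events:
        if ev.trigger != "hit":
            continue  # events tied to other triggers are evaluated elsewhere
        if ev.search_text in hit_text:
            # Action: record the event identifier, the actual hit time, and a value.
            records.append(EventRecord(ev.event_id, hit_time, 1))
    return records


definitions = [
    EventDefinition(80, "hit", "HTTP/1.1 500"),
    EventDefinition(81, "session_end", "logout"),
]
print(evaluate_hit_events(definitions, "HTTP/1.1 500 Internal Server Error", 1700000000.0))
```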
Events are useful for modeling user interactions with the web application and to represent those interactions in structured reports. Events are similar to web page tags yet are added dynamically based upon the DataStream.
Events also help to manage session recording. Data-triggered events can be used to monitor session data, request record, and response text, which provide the basis for the event conditions and for recording.
Hit event data can be seen in multiple places. Counts for active events are shown in the Portal in the Active Events page and can be used to trigger alerts.
The following are sample conditions that can be used for event definitions:
- Hit is received. Hit attribute is found in the request or response. Hit attribute can be defined as a specific string or as the content between two specified tags.
- Other events are processed. Session attribute value: compared with an operator such as <, or tested against a range of values.
- Session End. Session attribute value: compared with an operator such as <, or tested against a range of values.
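Two of the condition types above, the content between two specified tags and a range check on a value, can be sketched as follows. This Python example is a hypothetical illustration: `extract_between` and `in_range` are not product functions, and the `<span id="cartTotal">` markup is an invented sample. It also shows storing a detected value as a number, one of the actions listed below.

```python
from typing import Optional


def extract_between(text: str, start_tag: str, end_tag: str) -> Optional[str]:
    """Return the text between the first start_tag and the end_tag that follows it."""
    start = text.find(start_tag)
    if start == -1:
        return None
    start += len(start_tag)
    end = text.find(end_tag, start)
    if end == -1:
        return None
    return text[start:end]


def in_range(value: float, low: float, high: float) -> bool:
    """Range-style condition on a detected or session attribute value."""
    return low <= value <= high


html = '<span id="cartTotal">27.50</span>'
raw = extract_between(html, '<span id="cartTotal">', "</span>")
total = float(raw)                # store the detected value as a number
print(raw, total, in_range(total, 0, 100))  # 27.50 27.5 True
```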
Based upon the event conditions, one or more of the following actions can be taken:
- Make the event searchable: First occurrence or every occurrence
- Make the event reportable: First occurrence or every occurrence
- Store the detected value as text or as a number
- Trigger another event
- Scrape text between tags
- Store session state when the event occurred as dimensions
- Identify membership in a list/group
- Update session attributes
- Send event data to an external system through the Experience Analytics Event Bus
- Close session in Experience Analytics
- Extend Experience Analytics session timeout
- Discard session
Definitions of the events are stored in a common database location to which all Canisters in the environment refer. Event data for each individual instance is stored with the session to which it is associated. Aggregated counts and average numeric values for events are recorded into a database.
Event data fields include:
- Session Key
- Hit Attributes: Key, Index, or other metadata
- Detected values:
  - String that matched
  - String that is bounded by the match pattern, such as Name = Value
  - String that is converted to a number (for example, 27.50 as a shopping cart total)
  - String identifier of a value that is defined in an enumerated list (for example, a list of OS types)
  - String identifier of the group to which the found text belongs
- Any reference dimensions that are associated with the event.
Event Data Export
Selected event data can be streamed across a TCP/IP socket to external systems in real time by using the Event Bus API for third-party analysis.
In Experience Analytics, a session is a series of hits between a specific browser and the web server, assembled to present a clear picture or representation of how a visitor interacted with the web site.
A typical session involves an individual user interacting with the web server to request (by sending an HTTP request) and retrieve (through the returned HTTP response) a series of web pages before leaving the site. These request/response pairings are stitched together into hits, and the sequence of hits in the session is stitched together to form the session data.
Sessions to which the visitor is continuing to add hits are known as active sessions. If the visitor is no longer adding hits for a predefined time period or triggers an action (such as logging out of the site), the active session may be closed. Sessions that have been closed are known as completed sessions.
Every hit belongs to a session. Sessions can fragment due to various factors.
Since HTTP is a stateless protocol, Experience Analytics requires a method of associating the hits of an individual session. In almost all deployments, this association is managed through a session cookie.
As each hit is received from the PCA server by the Processing Server (which manages the Canister), the session cookie is used to store the hit with previously captured hits from the same session. If no additional hits containing the cookie are received for the configured Idle Session Timeout period, the session is closed.
A session whose duration exceeds a preconfigured value can also be closed.
Like a web server, Experience Analytics typically cannot identify whether hits are coming from different browser windows on the same browser. Hits from different browser windows are integrated into the same session.
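Cookie-based sessionization with an idle timeout and a maximum duration can be sketched as follows. This Python example is a simplified model under stated assumptions: the timeout values are hypothetical, hits are assumed to arrive sorted by time as `(timestamp, cookie, payload)` tuples, and a session is closed lazily when the next hit with the same cookie arrives, whereas a real system also closes sessions when the timeout elapses with no traffic. The second `"A"` session in the example is exactly the kind of fragment described later in this section.

```python
from dataclasses import dataclass, field

IDLE_TIMEOUT = 30 * 60      # seconds; hypothetical Idle Session Timeout
MAX_DURATION = 4 * 60 * 60  # seconds; hypothetical maximum session duration


@dataclass
class Session:
    cookie: str
    first_hit: float
    last_hit: float
    hits: list = field(default_factory=list)


def sessionize(hits):
    """Group time-ordered (timestamp, cookie, payload) hits into sessions."""
    active: dict[str, Session] = {}
    completed: list[Session] = []
    for ts, cookie, payload in hits:
        sess = active.get(cookie)
        if sess is not None and (
            ts - sess.last_hit > IDLE_TIMEOUT or ts - sess.first_hit > MAX_DURATION
        ):
            completed.append(active.pop(cookie))  # close the old session
            sess = None
        if sess is None:
            sess = Session(cookie, ts, ts)  # same cookie, new session (fragment)
            active[cookie] = sess
        sess.hits.append(payload)
        sess.last_hit = ts
    return completed, list(active.values())


traffic = [(0, "A", "/home"), (60, "A", "/cart"), (100, "B", "/home"), (1920, "A", "/checkout")]
completed, active = sessionize(traffic)
print([s.hits for s in completed])  # [['/home', '/cart']]
```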
Sessions can become fragmented. For example, the visitor can resume a session after a period of inactivity exceeding the timeout value. Even though the session cookie is the same, Experience Analytics stores this visitor's experience as two session fragments. The following situations can cause session fragmentation:
- Experience Analytics or web application timeout setting is exceeded
- Sessions that are stored across multiple data centers
- Sessions that are stored across multiple Canisters
- Large sessions can exceed maximum session size limits
- Poor sessionization
At search time, Experience Analytics provides the ability to defragment such sessions. For replay and analysis of individual sessions, Experience Analytics can connect the fragments.
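Connecting fragments for replay can be pictured as grouping them by session cookie and concatenating their hits in time order. This Python sketch assumes fragments are matched on the cookie value alone and are supplied as `(cookie, first_hit_time, hits)` tuples; both the matching key and the `defragment` function are hypothetical, and the product may use additional criteria.

```python
from collections import defaultdict


def defragment(fragments):
    """Merge session fragments that share a session cookie, oldest fragment first.

    fragments: iterable of (cookie, first_hit_time, hits) tuples.
    Returns {cookie: combined list of hits}.
    """
    grouped = defaultdict(list)
    for cookie, first_hit, hits in fragments:
        grouped[cookie].append((first_hit, hits))
    merged = {}
    for cookie, parts in grouped.items():
        parts.sort(key=lambda part: part[0])  # order fragments by start time
        merged[cookie] = [hit for _, hits in parts for hit in hits]
    return merged


fragments = [
    ("A", 1920, ["/checkout"]),
    ("B", 100, ["/home"]),
    ("A", 0, ["/home", "/cart"]),
]
print(defragment(fragments))
# {'A': ['/home', '/cart', '/checkout'], 'B': ['/home']}
```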
Note: Reporting data counts session fragments as individual sessions, because fragments cannot always be reconnected at reporting time. For example, the time gap between fragments may be longer than the reporting data collection interval.
Through events, you can populate session attributes with specified values. These variables and their values can be found through search in the Portal or the RTV application.
For sessions that are completed, Experience Analytics can process event conditions for the entire session, such as the occurrence of certain hit events during the session.
Events are defined and managed through the Experience Analytics Event Manager, which is accessible through the Portal. By defining events, the user can model the workflow through the monitored website and create markers for search and report aggregation.
Canister data is derived data that is created from the HTTP DataStream. Canister data is stored in several places and forms.
A properly configured Experience Analytics system attempts to store all hits for a session in the same Canister, which is a daily collection of sessions that are processed by one Processing Server and the indexes that are associated with them.
Although most users do not see canister data, it can be viewed through the Active menu in the Portal or by searching for active sessions through the Portal or the Viewer. Completed Canister data can be viewed through search of completed sessions through the Portal or the Viewer.
Note: Active sessions are not indexed, but the data structures allow fast searching of some hit attributes. Experience Analytics can perform full scans of the response data, but this method can slow system performance.
Canister data is typically retained for 10 days, after which it is erased to make room for newer data. The cxVerify product allows for search-based subsets of each Canister to be stored for longer periods in a different Canister.
Sessions can sometimes be fragmented across multiple Canisters. Experience Analytics can defragment sessions across multiple Canisters for replay in the Portal.
The Canister is divided into two parts:
- The Short Term Canister contains active sessions, where hits are still being received as they occur. Hit event records are created at this stage.
  - Unless the Canister is spooling, a new hit added to an active session is available for search and review through the Portal within seconds.
  - Search of active sessions is limited to full text search, which is slower than indexed search of completed session data.
- The Long Term Canister contains completed sessions, which are created when no hits are received for the Idle Timeout period or when another closing trigger is met. When a session is closed, all session events are processed. The session is recorded to disk, and its contents are indexed by a text search engine.
  - An active session becomes a completed session within approximately five minutes of the end of the session, unless the system is behind in indexing sessions.
  - Completed sessions are collected into a set of LSSN files for the day.
  - Counts for hit events and session events are collected and aggregated for the reporting database.
Dimensions are sets of reference data that can be captured and recorded when the event is triggered. Dimensions are associated with a defined event.
A dimension contains a set of values that are captured by a defined pattern or value recorded from an event. These values provide contextual information at the time when the event is recorded. They are stored in the request when the hit is processed by the Canister.
For example, Experience Analytics provides the following reference dimensions. You can also define your own event dimensions.
If an event is associated with these dimensions, the values of these dimensions are recorded with the event when it is triggered. For example, if an event is created to detect Status Code 500 errors in the response, the values of these dimensions can be recorded with each event instance to facilitate debugging the issue.
Dimensions are organized into groups. A report group is a collection of dimensions. A dimension may belong to multiple report groups. When recording, the Experience Analytics system collects aggregate counts for every combination of dimension values.
An event may be associated with multiple report groups.
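Collecting aggregate counts for every combination of dimension values amounts to counting events keyed by a tuple of the report group's dimension values. This Python sketch is illustrative only: the dimension names `URL` and `Host` and the dictionary representation of a recorded event are assumptions, not the reporting database schema.

```python
from collections import Counter


def aggregate(events, report_group):
    """Count event occurrences for each combination of the report group's dimension values."""
    counts = Counter()
    for ev in events:
        key = tuple(ev[dim] for dim in report_group)  # one combination of values
        counts[key] += 1
    return counts


recorded = [
    {"URL": "/DEFAULTPAGE", "Host": "WWW.TEALEAF.COM"},
    {"URL": "/DEFAULTPAGE", "Host": "WWW.TEALEAF.COM"},
    {"URL": "/CART", "Host": "WWW.TEALEAF.COM"},
]
print(aggregate(recorded, ("URL", "Host")))
```

Because a dimension may belong to several report groups, the same events can be passed through `aggregate` once per group that the event is associated with.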
When an event is triggered in a hit, the Report Group data is recorded with the event in a structure that is called a fact in the REQ record. A fact contains the recorded event value and any dimension values for associated report groups and other data.
This internal data structure is used to facilitate searching on dimensional data that are related to the recorded event. Each dimension instance value is hashed to provide a more easily indexed value for searching. When a string is input through the search interface (for example, "/DEFAULTPAGE"), the same algorithm is used to create a hashed value that can be found in the search index.
The following code sample is an example fact that is recorded in the [TLFID_80] section of the request buffer:
[TLFID_80]
Searchable=True
TLFID=80
TLFactValue=1
TLDimHash1=38A7EF5D4FA961F712055D92FC56088A
TLDimHash2=BC3F1812E3C8837962A83226D4A30082
TLDimHash3=8606AC74FD2DECC1899004C49B226FAE
TLDimHash4=5E6D512952FFBB9673B1D0CB08EF33B0
TLDim1=/DEFAULTPAGE
TLDim2=WWW.TEALEAF.COM
TLDim3=OTHERS
TLDim4=18.104.22.168
In the example above, the fact identifier (TLFID=80) and the recorded event value (TLFactValue=1) are listed above the hashed values and plain-text values for each dimension. Only the first 256 characters of a dimension value are recorded in plain text.
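The layout of a fact section, with one hash and one truncated plain-text value per dimension, can be reproduced with a short sketch. The hashing algorithm is not documented here, so this Python example uses an uppercased MD5 hex digest purely as a stand-in (it yields 32 hexadecimal characters, like the sample) and will not reproduce the sample's hash values. It also assumes the hash is computed over the full value. `dim_hash`, `build_fact_section`, and `search_key` are hypothetical names.

```python
import hashlib

PLAIN_TEXT_LIMIT = 256  # only the first 256 characters are stored in plain text


def dim_hash(value: str) -> str:
    # Stand-in algorithm (assumption): uppercased MD5 hex digest of the full value.
    return hashlib.md5(value.encode("utf-8")).hexdigest().upper()


def build_fact_section(fact_id: int, fact_value: int, dims: list[str]) -> str:
    """Render a fact in the name=value layout of the [TLFID_n] section."""
    lines = [f"[TLFID_{fact_id}]", "Searchable=True", f"TLFID={fact_id}", f"TLFactValue={fact_value}"]
    for i, value in enumerate(dims, start=1):
        lines.append(f"TLDimHash{i}={dim_hash(value)}")
    for i, value in enumerate(dims, start=1):
        lines.append(f"TLDim{i}={value[:PLAIN_TEXT_LIMIT]}")
    return "\n".join(lines)


def search_key(user_input: str) -> str:
    # Search input is hashed with the same function so it can be found in the index.
    return dim_hash(user_input)


section = build_fact_section(80, 1, ["/DEFAULTPAGE", "WWW.TEALEAF.COM"])
print(section)
print(f"TLDimHash1={search_key('/DEFAULTPAGE')}" in section)  # True
```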
In Experience Analytics, an index is an arrangement of important words that appear in the HTTP DataStream. When a session is completed, it is written from the in-memory database (STC) to disk and marked for indexing.
Because indexing captured hit data is expensive in terms of disk space, for most Experience Analytics deployments, only a subset of the captured hit data is indexed. Experience Analytics indexes:
- The body of the response, without HTML tags.
- Selected sections stored in the request record.
- Selected event data such as the event identifier and event value.
Indexed data includes:
- Selected data from the request record, such as the [urlfield] section and session attributes
- Event data (ID and value)
HTML tags and HTTP headers are excluded.
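An index of this kind is essentially an inverted index that maps each word to the sessions containing it. This Python sketch is a simplified illustration, not the product's text search engine: it strips HTML tags with a naive regular expression (script contents and comments are not handled), lowercases words, and stores request record fields and event identifiers as extra terms. The term formats `name=value` and `event:<id>` are hypothetical.

```python
import re
from collections import defaultdict

_TAG_RE = re.compile(r"<[^>]+>")  # naive tag stripper (assumption, see lead-in)
_WORD_RE = re.compile(r"\w+")


def index_session(index, session_id, response_bodies, urlfields, event_ids):
    """Add one completed session's indexable data to an inverted index."""
    for body in response_bodies:
        text = _TAG_RE.sub(" ", body)  # index the body without HTML tags
        for word in _WORD_RE.findall(text.lower()):
            index[word].add(session_id)
    for name, value in urlfields.items():  # selected request record data
        index[f"{name}={value}".lower()].add(session_id)
    for event_id in event_ids:  # event data
        index[f"event:{event_id}"].add(session_id)


def search(index, term):
    """Return the set of session IDs whose indexed data contains the term."""
    return index.get(term.lower(), set())


index = defaultdict(set)
index_session(index, "S1", ["<html><body><h1>Order confirmed</h1></body></html>"], {"qty": "2"}, [80])
print(search(index, "CONFIRMED"))  # {'S1'}
print(search(index, "h1"))         # set()
```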
This index data is retained for the same length of time as the Canister data. The index data can be regenerated from the Canister data at any time.
Note: Depending on system load and configuration, canister data is typically indexed within 5 minutes of session completion.
A generated index cannot be viewed, although search results indicate the use of indexes. Using the same indexing algorithms as the Canister, the Viewer can create and display an index for the sessions that are currently loaded, although an exact match is not guaranteed.