Data Processing/Sessionization

In this example we build on the DimML application defined in the Web Analytics use case, which collects several fields from interactions on a webpage. We will sessionize the data, which makes information about the sequence of events available: the number of events in the session, the last referring URL used by the visitor, the path of events, and an indication of whether an event is the start of the session. For that purpose we first introduce a SessionId, which is stored in the browser's sessionStorage. Its value is generated once by a guid function based on the current time, a random number and the browser's user agent.
val SessionId = `sessionStorage.dimmlcid=sessionStorage.dimmlcid||guid()`

def guid = `dimml.sha1(+new Date()+Math.random().toString(36).slice(2)+navigator.userAgent).slice(20)`
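The `||` assignment in SessionId is what keeps the id stable: a new id is generated only when the storage holds no value yet. Below is a minimal plain-JavaScript sketch of that idiom. It assumes a hypothetical `fakeStorage` object in place of the browser's `sessionStorage` and a simplified, hypothetical `makeId` helper in place of `guid` (it does not call `dimml.sha1`), so it can be run outside a browser.

```javascript
// Hypothetical stand-in for window.sessionStorage so the sketch runs outside a browser.
const fakeStorage = {};

// Simplified, hypothetical id generator; the DimML example hashes a similar seed with dimml.sha1.
function makeId() {
  return Date.now().toString(36) + Math.random().toString(36).slice(2);
}

// Same idiom as the SessionId definition: reuse the stored id, or create and store one.
function getSessionId(storage) {
  storage.dimmlcid = storage.dimmlcid || makeId();
  return storage.dimmlcid;
}

const first = getSessionId(fakeStorage);
const second = getSessionId(fakeStorage);
console.log(first === second); // true: the id is created once and then reused
```

Because the value lives in sessionStorage, a new browser tab or window would start with an empty storage and therefore receive a new id.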
We can now use this session id to create a session using the session flow element. Note that this flow element introduces a server-side variable session, to which values can be assigned and then reused in following calls. In the code below we introduce a field Visit, which marks the start of the visit (entry) with a 1 and every other event with a 0. We also record the entire path of the session in terms of page names. Finally we use a session-based parameter to persist a value throughout the session: the field is updated when a new value is set, but the previous value is used when the current value of the parameter is empty. The mechanism can be used for any field in the data tuple.
flow
=> session['SessionId',
    Visit = `(session.eventcount = (session.eventcount?:0).toLong()+1)==1?1:0`@groovy,
    PagePath = `(session.pagepath=(session.pagepath?:[]))<<pageName`@groovy,
    ReferrerPersist = `session.referrer = referrer?:session.referrer?:""`@groovy
]
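To make the effect of the three expressions concrete, the following plain-JavaScript sketch replays a short, made-up sequence of events against an in-memory `session` object. It is only an illustration of the logic, not DimML code: the `applySessionFields` helper, the sample events and the assumed counter key `eventcount` are all hypothetical, and JavaScript's `||` stands in for the Groovy `?:` operator.

```javascript
// `session` plays the role of the server-side session variable.
// `eventcount` is an assumed counter key used to detect the first event.
function applySessionFields(session, event) {
  session.eventcount = (session.eventcount || 0) + 1;
  session.pagepath = session.pagepath || [];
  session.pagepath.push(event.pageName);
  // Keep the previous referrer when the current one is empty.
  session.referrer = event.referrer || session.referrer || "";
  return {
    Visit: session.eventcount === 1 ? 1 : 0,
    PagePath: session.pagepath.slice(), // copy, so each row keeps its own snapshot
    ReferrerPersist: session.referrer,
  };
}

// Hypothetical events within one session.
const session = {};
const events = [
  { pageName: "home", referrer: "https://example.com/" },
  { pageName: "products", referrer: "" },
  { pageName: "checkout", referrer: "" },
];
const rows = events.map((e) => applySessionFields(session, e));

console.log(rows.map((r) => r.Visit)); // Visit per event: 1, 0, 0
console.log(rows[2].PagePath);         // home, products, checkout
console.log(rows[2].ReferrerPersist);  // https://example.com/
```

Only the first event carries a referrer, yet ReferrerPersist keeps that value for the later events, which is the persistence behaviour described above.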
Finally we would like to capture some specific information once a session is over. For this we can use the buffer:session flow element, which makes a data tuple available once a session has been completed. More information on this flow element can be found in the DimML documentation. In the example code we capture startTime at the first interaction in the session and endTime at the last interaction in the session. The difference between the two is the total duration of the session in milliseconds, which the code flow element converts to minutes and stores in a field. Note that the expression as written also adds 30 to the result, the same number of minutes as the session timeout:
=> buffer:session['SessionId', timeout = '30m', `
 session.startTime = session.startTime?:System.currentTimeMillis()
 session.endTime = System.currentTimeMillis()
 false
`@groovy]
=> code[sessionDuration = `Math.round(((endTime-startTime)/60000)+30)`@groovy]
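As a worked example of the arithmetic in the sessionDuration expression, the plain-JavaScript sketch below applies the same formula to made-up timestamps. The `sessionDuration` function and the timestamp values are hypothetical; the formula itself is copied from the expression above, including the added 30.

```javascript
// Same arithmetic as the sessionDuration expression, applied to millisecond timestamps.
function sessionDuration(startTime, endTime) {
  const minutes = (endTime - startTime) / 60000; // milliseconds to minutes
  return Math.round(minutes + 30);               // the expression adds 30 before rounding
}

// Hypothetical session: the last interaction is 12 minutes 24 seconds after the first.
const startTime = 0;
const endTime = 12 * 60000 + 24000;
console.log(sessionDuration(startTime, endTime)); // 42
```

A session with a single interaction has equal start and end times, so the formula yields 30 for it.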

Additional assignment

  1. Integrate this code with the code of the Web Analytics use case. For testing purposes we recommend setting the session timeout to 30s, which makes testing the processing of a session a lot quicker.
  2. Extend the code such that the number of viewed pages is also available at the end of the session.
  3. Extend the code such that a field Exit is available which has the value 1 at the end of a session and 0 for all other events.