[Building Sakai] Discussion: KNL-364 and KNL-500

Steve Swinsburg steve.swinsburg at gmail.com
Thu May 27 05:04:59 PDT 2010


On 27/05/2010, at 6:49 PM, Jean-Francois Leveque wrote:

> Why shouldn't the session table also be in the history tables ? There 
> are lots of other useful informations in SAKAI_SESSION.
> 

Technically they should be archived together and then SQL can be run to your hearts content.

However, the motivation behind this was not for adding this to make querying historical data easier, it was for making live queries faster.

Via the API, in order to retrieve the user associated with an Event you need to go via the UsageSessionService for the session associated with that event, which as I mentioned is always a DB hit. The event records are never updated, so IMO the minor denormalization here is acceptable.

So my explicit +1.

cheers,
Steve





> I would be willing to reconsider my vote based on the answers to Ray's 
> criteria to justify denormalization.
> 
> So far other criteria do not work for production, AFAICT.
> 
> J-F
> 
> Ian Boston a écrit :
>> +1 to Stephens comment, iirc once the session table is cleaned, there  
>> is no way of connecting an event to a user. Adding user id to each  
>> record would make archiving for audit simpler.
>> 
>> Ian
>> 
>> Sent from my iPhone
>> 
>> On 22 May 2010, at 09:40, "Stephen Marquard"  
>> <stephen.marquard at uct.ac.za> wrote:
>> 
>>> Also I'm guessing many sites denormalize the data anyway for  
>>> archiving.
>>> 
>>> We certainly do, i.e. our SAKAI_EVENT history tables (in a separate  
>>> db)
>>> have USER_ID added, otherwise queries on very large data sets (e.g.  
>>> all
>>> events in 2010 for a given user) become very slow.
>>> 
>>> So I don't have an issue with the denormalization here - I think it
>>> makes sense. As the data is written once and is immutable, there's no
>>> risk of inconsistency, and it's a worthwhile tradeoff of some disk  
>>> space
>>> for performance.
>>> 
>>> Regards
>>> Stephen
>>> 
>>>>>> Ray Davis <ray at media.berkeley.edu> 5/21/2010 6:01 PM >>>
>>> In my experience, denormalization is a normal part of the relational  
>>> DB
>>> 
>>> development cycle on a relational DB. See, for example:
>>> 
>>> http://www.agiledata.org/essays/dataModeling101.html#Denormalize
>>> http://www.siue.edu/~dbock/cmis564/denormal.htm
>>> 
>>> Basically, the workflow I'm used to is
>>> 
>>> 1. Normalize as part of data analysis.
>>> 2. Implement the "pure" code.
>>> 3. Monitor performance/scalability to find bottlenecks.
>>> 4. Denormalize where it looks like it would help.
>>> 5. Go back to 3. :)
>>> 
>>> What's needed to justify denormalization:
>>> * Evidence that the denormalization actually improves things (since
>>> intuition is often wrong)
>>> * Modifications to the denormalized tables should be coordinated in a
>>> single code location
>>> 
>>> If both of those hold true in this case, would you be willing to
>>> reconsider your vote?
>>> 
>>> Best,
>>> Ray
>>> 
>>> On 5/21/10 7:58 AM, Jean-Francois Leveque wrote:
>>>> -1 on adding USER_ID to SAKAI_EVENT, not a good idea from a DB point
>>> of
>>>> view : denormalization
>>>> 
>>>> Why not caching?
>>>> 
>>>> cheers,
>>>> Jean-Francois
>>>> 
>>>> Dunstall, Christopher a écrit :
>>>>> Definitely a +1 for this.
>>>>> 
>>>>> This would make statistics generation faster and less resource
>>> hungry than it currently is.
>>>>> 
>>>>> Chris Dunstall | Service Support - Applications
>>>>> Technology Integration/OLE Virtual Team
>>>>> Division of Information Technology | Charles Sturt University |
>>> Bathurst, NSW
>>>>> Ph: 02 63384818 | Fax: 02 63384181
>>>>> 
>>>>> 
>>>>> -----Original Message-----
>>>>> From: sakai-dev-bounces at collab.sakaiproject.org
>>> [mailto:sakai-dev-bounces at collab.sakaiproject.org] On Behalf Of
>>> Steve Swinsburg
>>>>> Sent: Wednesday, 19 May 2010 10:26 PM
>>>>> To: sakai-dev at collab.sakaiproject.org Developers
>>>>> Subject: [Building Sakai] Discussion: KNL-364 and KNL-500
>>>>> 
>>>>> Some recent work in the UsageSessionService and EventTrackingService
>>> API's has brought to the surface this old feature request, adding a
>>> USER_ID field to the SAKAI_EVENT table.
>>> http://jira.sakaiproject.org/browse/KNL-364
>>>>> As it stands, to get the userId of the person who published an
>>> event, you must go via the sessionId. This is achieved via:
>>> UsageSessionService.getSession(id) however this is a database query,  
>>> and
>>> there is no cache so is a potential bottleneck.
>>>>> Interestingly, Event.getUserId() has a comment in its Javadoc that
>>> if the value is null, to use the session. However, this value is  
>>> always
>>> null, it's never set.
>>>>> And while we are at it, might as well add a getter to the Event
>>> object for the time field, since there is no way to get that either,
>>> even though it is stored in the table.
>>>>> http://jira.sakaiproject.org/browse/KNL-500
>>>>> 
>>>>> Thoughts?
>>>>> 
>>>>> cheers,
>>>>> Steve
> _______________________________________________
> sakai-dev mailing list
> sakai-dev at collab.sakaiproject.org
> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
> 
> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe at collab.sakaiproject.org with a subject of "unsubscribe"



More information about the sakai-dev mailing list