[sakai2-tcc] KNL-652 site caching testing (UMich preliminary testing)

Anthony Whyte arwhyte at umich.edu
Tue Mar 15 08:11:02 PDT 2011


> AZ: I think that it is necessary to make 2.8 very production ready from a technical view. I think that 2.8 should be released with these patches in place or held until we can get them in. I don't like the idea of having to tell everyone not to use 2.8 and just wait for 2.8.1.


Agreed.  Aaron's view echoes my earlier recommendation.

> ST: We have announced that we plan to release 2.8.0 at the end of this month. There is little if any precedent for a "preview" release,


Agreed.  No precedent exists, and I regard releasing a 2.8.0 "preview" as a waste of time and energy.  Besides, a preview already exists in the guise of the 2.8.x branch, if not earlier Sakai CLE releases such as 2.7.1 (we are not talking about new software here, as is the case with OAE).  I recognize that getting 2.8.x up and running requires a basic understanding of Java, Maven, Tomcat and, depending on taste, MySQL or Oracle (if not demoing with HSQLDB).  But unless one cedes responsibility to a vendor to deploy and maintain a Sakai installation, knowing how to work with these technologies is a prerequisite for deploying the Sakai CLE.

> ST: If we commit to making this a short-term problem (as in releasing 2.8.1 quickly), the question is no longer technical. I think we can avoid a serious PR problem for this group as we have already agreed to a month-long delay.


2.8.1 release timing should be based on technical considerations, resource availability and Community needs.  PR problems "for this group" should not drive decision-making in this area.

Currently, I know of only two schools that plan to deploy 2.8.0 during Q2 and Q3 2011: Rutgers and the Universidad Complutense de Madrid.  Would Rutgers prefer to deploy sakai-2.8.0 with the KNL-652 fix in place, acknowledging that we may need another week before Michigan is satisfied with their production testing, or is Chuck Hedrick happy to patch 2.8.0 sans KNL-652 after it is released?  I think it a question worth asking.

Speaking personally, I would prefer to generate a 2.8.0 release tag and artifacts that are likely to be deployed at some future date by the likes of Michigan, UCT, Indiana, ANU, Stanford, UPV, Rutgers, Oxford, rSmart, Longsight, Unicon--that is, those institutions and SCAs most closely associated with Sakai CLE development, QA and maintenance.  I expect that if we release 2.8.0 without a fix for KNL-652 in place no institution or organization listed above will deploy 2.8.0 as is--a highly unsatisfactory as well as uninspiring result for those charged with shepherding the release through to completion.

In my 2.8.0 status reports to the TCC (24 Jan, 15 Feb) I noted that a 1 March release date was unrealistic and recommended that we target late March for a release, which I described in my 15 Feb update as "possible."   I think it important to note that, irrespective of KNL-652, a few 2x blockers remain.

Cheers,

Anthony




My 15 Feb update:

> Earlier, I recommended that we revise the schedule and target the end of March 2011 for a release.  I use the word "target" deliberately as none of us are in a position to compel anyone other than ourselves to meet particular deadlines.  That said, I believe that the blockers listed below can be addressed (or if necessary downgraded) within the next couple of weeks if the requisite energy is applied.  A late March release is possible but it only allows for three QA tags. 
> 
> 22/23 Feb QA tag
> 8/9 March QA tag
> 22/23 March QA tag
> 24-30 March prep for release 
> 31 March release ?





On Mar 15, 2011, at 4:10 PM, Aaron Zeckoski wrote:

> I think that it is necessary to make 2.8 very production ready from a
> technical view. I think that 2.8 should be released with these patches
> in place or held until we can get them in. I don't like the idea of
> having to tell everyone not to use 2.8 and just wait for 2.8.1.
> 
> I also think we should push these into the 2.7 kernel as well.
> 
> Unicon can provide technical and testing support for both 2.8 and 2.7
> to assist with these efforts.
> 
> -AZ
> 
> 
> On Tue, Mar 15, 2011 at 8:38 AM, David Haines <dlhaines at umich.edu> wrote:
>> There are 4 caching changes that should be added as a group.  When we added just the site caching fix other issues popped up that quickly led to similar problems.  We needed KNL-293, KNL-654, KNL-662, and KNL-664 to get memory under control.
>> 
>> The problem clearly doesn't make Sakai useless, since it has existed since at least 2.6 and some people have been running that code for a long time.  I wouldn't run or recommend running 2.7 or 2.8 in production without these fixes but, in fact, it is likely that many installations aren't big enough for this to be a problem in the short term.  Does it make any sense to release a 2.8.0 "preview" on time and then a final with these fixes? 2.8 as it stands should be useful for preparing an installation of 2.8.
>> 
>> IMHO it is not a good idea for Sakai to release code with a serious problem, but that is a political not a technical opinion.
>> 
>> - Dave
>> 
>> 
>> David Haines
>> CTools Developer
>> Digital Media Commons
>> University of Michigan
>> dlhaines at umich.edu
>> 
>> 
>> 
>> 
>> On Mar 15, 2011, at 5:32 AM, Anthony Whyte wrote:
>> 
>>> It's a blocker against kernel 1.3.0 (sakai-2.8 binds to kernel 1.2).  For sakai-2.8.0 it is being tracked as a known issue and David Horwitz and I have held back from including the fix in kernel 1.2/1.1/1.0 releases while we await the results of UMich testing.
>>> 
>>> I'd like to hear from the UMich members of the TCC list regarding their assessment of the site caching fix (r88169-70) and the implications for sakai-2.8.0 if it is included/excluded.
>>> 
>>> Cheers,
>>> 
>>> Anthony
>>> 
>>> 
>>> On Mar 14, 2011, at 11:21 PM, May, Megan Marie wrote:
>>> 
>>>> Isn't this already on the blocker list for 2.8 or are you proposing an elevation of the bug?
>>>> 
>>>> Megan
>>>> 
>>>> -----Original Message-----
>>>> From: sakai2-tcc-bounces at collab.sakaiproject.org [mailto:sakai2-tcc-bounces at collab.sakaiproject.org] On Behalf Of Aaron Zeckoski
>>>> Sent: Friday, March 11, 2011 8:55 AM
>>>> To: Anthony Whyte
>>>> Cc: sakai2-tcc Committee
>>>> Subject: Re: [sakai2-tcc] KNL-652 site caching testing (UMich preliminary testing)
>>>> 
>>>> I think that is a really good plan.
>>>> It may be worth backporting to 2.7 as well.
>>>> -AZ
>>>> 
>>>> 
>>>> On Fri, Mar 11, 2011 at 8:38 AM, Anthony Whyte <arwhyte at umich.edu> wrote:
>>>>> I recommend that if Michigan's production testing continues to prove
>>>>> positive we should include this fix in sakai-2.8.0, even if it means
>>>>> we delay the release a further 7-10 days in order to work it in.
>>>>> Cheers,
>>>>> Anthony
>>>>> 
>>>>> Begin forwarded message:
>>>>> 
>>>>> From: "David Haines (JIRA)" <bugs-admin at sakaiproject.org>
>>>>> Date: March 11, 2011 3:25:42 PM GMT+02:00
>>>>> To: mt-jira at collab.sakaiproject.org
>>>>> Subject: [mt-jira] [Sakai Jira] Commented: (KNL-652) SIte caching is
>>>>> broken in Kernel 1.1.9
>>>>> 
>>>>> 
>>>>>   [ https://jira.sakaiproject.org/browse/KNL-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=122629#comment-122629 ]
>>>>> 
>>>>> David Haines commented on KNL-652:
>>>>> ----------------------------------
>>>>> 
>>>>> Load testing at Michigan using a lightly patched 1.1.9 kernel
>>>>> indicates that the memory leak is significantly reduced.  This
>>>>> required applying 4 patches related to caching: KNL-293, KNL-654,
>>>>> KNL-662, and KNL-664 to fix multiple leaks.  Our tests indicate that a
>>>>> site cache size of 10000 gives a site cache hit rate of around 90%.
>>>>> We should have a week or more of production experience with these
>>>>> changes, applied to a 1.1.10 kernel, as of the week of 4/4/11.
>>>>> 
>>>>> SIte caching is broken in Kernel 1.1.9
>>>>> 
>>>>> --------------------------------------
>>>>> 
>>>>>               Key: KNL-652
>>>>> 
>>>>>               URL: https://jira.sakaiproject.org/browse/KNL-652
>>>>> 
>>>>>           Project: Kernel - K1
>>>>> 
>>>>>        Issue Type: Bug
>>>>> 
>>>>>        Components: Impl
>>>>> 
>>>>>  Affects Versions: 1.1.9
>>>>> 
>>>>>          Reporter: David Haines
>>>>> 
>>>>>          Assignee: David Horwitz
>>>>> 
>>>>>          Priority: Blocker
>>>>> 
>>>>>           Fix For: 1.3.0
>>>>> 
>>>>>       Attachments: Classes in Heap.png, Top Consumers.png
>>>>> 
>>>>> 
>>>>> Site caching is broken in the K1 kernel 1.1.9 release.  Site objects
>>>>> themselves are cached, but when sites are evicted from the site cache
>>>>> the secondary in-memory caches for tools, pages, and groups are not
>>>>> cleaned up: the references to the objects remain, so those objects
>>>>> cannot be garbage collected.  Over time this leads to large numbers of
>>>>> objects that won't be used but will consume memory.  At Michigan,
>>>>> after not restarting Sakai for a couple of weeks we ended up with
>>>>> almost 2GB of memory devoted to the site cache.  That caused long
>>>>> periods of fruitless garbage collection and a degradation of service.
>>>>> 
>>>>> The problem is avoided in the short run by restarting Sakai instances
>>>>> or by manually clearing the caches with the Admin memory tool.
>>>>> 
>>>>> A patch to address this is being tested at Michigan.
>>>>> 
>>>>> Note that the only way to control Ehcache cache memory consumption for
>>>>> a memory only cache is to limit the number of elements in the cache.
>>>>> KNL-293 should also be applied when the site caching is fixed so that
>>>>> there is a way to adjust the site cache size to match local requirements.
>>>>> 
>>>>> --
>>>>> This message is automatically generated by JIRA.
>>>>> -
>>>>> For more information on JIRA, see:
>>>>> http://www.atlassian.com/software/jira
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> mt-jira mailing list
>>>>> mt-jira at collab.sakaiproject.org
>>>>> http://collab.sakaiproject.org/mailman/listinfo/mt-jira
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> sakai2-tcc mailing list
>>>>> sakai2-tcc at collab.sakaiproject.org
>>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai2-tcc
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Aaron Zeckoski - Software Engineer - http://tinyurl.com/azprofile
>>>> 
>>>> 
>>> 
>> 
>> 
> 
> 
> 
> -- 
> Aaron Zeckoski - Software Engineer - http://tinyurl.com/azprofile
> 
> 
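
The leak described in the KNL-652 report quoted above can be illustrated with a minimal Java sketch.  It is not the Sakai kernel code or the actual patch: SiteCacheSketch, Site, pageToSite and maxSites are hypothetical names, and a plain java.util.LinkedHashMap stands in for Ehcache.  The primary cache is bounded by element count, and the eviction hook drops the evicted site's entries from a secondary page index so that they can be garbage collected.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: a count-bounded site cache that cleans its secondary index on eviction. */
class SiteCacheSketch {

    /** Minimal stand-in for a site; this is not Sakai's Site interface. */
    static final class Site {
        final String id;
        final List<String> pageIds;

        Site(String id, List<String> pageIds) {
            this.id = id;
            this.pageIds = pageIds;
        }
    }

    /** Secondary in-memory cache: page id -> id of the site that owns the page. */
    private final Map<String, String> pageToSite = new HashMap<>();

    /** Primary cache in access order, so the eldest entry is the least recently used. */
    private final LinkedHashMap<String, Site> sites;

    SiteCacheSketch(final int maxSites) {
        if (maxSites < 1) {
            throw new IllegalArgumentException("maxSites must be at least 1");
        }
        this.sites = new LinkedHashMap<String, Site>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Site> eldest) {
                if (size() > maxSites) {
                    // The step the report says was missing: release secondary references on eviction.
                    forgetPages(eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    /** Caches a site and indexes its pages, replacing any earlier copy of the same site. */
    void put(Site site) {
        Site previous = sites.put(site.id, site);
        if (previous != null) {
            forgetPages(previous);
        }
        for (String pageId : site.pageIds) {
            pageToSite.put(pageId, site.id);
        }
    }

    Site getSite(String siteId) {
        return sites.get(siteId);
    }

    /** Looks a site up through the secondary index; returns null if the page is unknown. */
    Site findSiteByPage(String pageId) {
        String siteId = pageToSite.get(pageId);
        return siteId == null ? null : sites.get(siteId);
    }

    private void forgetPages(Site site) {
        for (String pageId : site.pageIds) {
            pageToSite.remove(pageId);
        }
    }

    int siteCount() {
        return sites.size();
    }

    int pageIndexSize() {
        return pageToSite.size();
    }

    public static void main(String[] args) {
        SiteCacheSketch cache = new SiteCacheSketch(2);
        cache.put(new Site("s1", Arrays.asList("p1", "p2")));
        cache.put(new Site("s2", Arrays.asList("p3")));
        cache.put(new Site("s3", Arrays.asList("p4"))); // evicts s1 together with p1 and p2
        System.out.println(cache.siteCount() + " sites, " + cache.pageIndexSize() + " indexed pages");
    }
}
```

With maxSites set to 2, adding a third site evicts the least recently used one together with its page entries.  Leaving out the forgetPages call in the eviction hook would reproduce the kind of secondary-cache growth the report describes, since pageToSite would keep entries for sites that are no longer cached.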
