[Building Sakai] theory about missing users in site

Cris J Holdorph holdorph at unicon.net
Wed Mar 5 12:43:32 PST 2014


I ran into a problem that sounds almost exactly like this.  I worked 
some with Aaron Z. but even he didn't hear about the final result we had.

In our own custom code, we got a 'Site' object from the SiteService, 
called addMember on it, and then saved the Site object.

There was absolutely a race condition/dirty cache problem, where if we 
threw enough users (we could even replicate with TWO! users) at the 
system quickly enough, we could cause the problem.

The users themselves would appear to get added to the site just fine. 
But eventually some users would disappear from the site.  The reason was 
that a later user who was doing the same actions would overwrite the 
memberships.  So, now you ask, why didn't the second user have an 
up-to-date membership list?  Caching.  Specifically the caching in KNL-600.
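
To make the sequence concrete, here is a minimal sketch of the lost update. 
It is not Sakai code: StaleCacheDemo, FrontEnd, and the realm id are 
hypothetical stand-ins, and it assumes a per-front-end cache whose 
invalidation has not arrived by the time the second user saves.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a realm store with a read cache; not Sakai code.
class StaleCacheDemo {
    // "Database": realm id -> member names.
    static final Map<String, Set<String>> db = new HashMap<>();

    // One front end with its own cache of realm memberships.
    static class FrontEnd {
        private final Map<String, Set<String>> cache = new HashMap<>();

        // Returns a copy of the cached members; reads the db only on a cache miss.
        Set<String> getMembers(String realmId) {
            Set<String> cached =
                cache.computeIfAbsent(realmId, id -> new TreeSet<>(db.get(id)));
            return new TreeSet<>(cached);
        }

        // Whole-object save: the db ends up with exactly the caller's member set.
        void save(String realmId, Set<String> members) {
            db.put(realmId, new TreeSet<>(members));
            cache.put(realmId, new TreeSet<>(members));
        }
    }

    static Set<String> run() {
        String realm = "/site/training";
        db.put(realm, new TreeSet<>(Set.of("Smith")));
        FrontEnd a = new FrontEnd();
        FrontEnd b = new FrontEnd();

        b.getMembers(realm);                    // B caches {Smith}

        Set<String> onA = a.getMembers(realm);
        onA.add("Jones");
        a.save(realm, onA);                     // db now holds Jones and Smith

        Set<String> onB = b.getMembers(realm);  // stale copy: still only Smith
        onB.add("White");
        b.save(realm, onB);                     // db now holds Smith and White

        return db.get(realm);
    }

    public static void main(String[] args) {
        System.out.println(run());              // prints [Smith, White]
    }
}
```

Jones was saved successfully and then silently dropped by the second save, 
which is the pattern we saw.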

We found out that if we disabled that cache, we did not have the 
problem.  Now, this caching is important, so I wouldn't recommend 
someone turn off the caching, but I can confirm with this caching turned 
on, there is absolutely a bug.  But it's one of those bugs that is 
REALLY difficult to prove in native Sakai on a single instance.  I 
would suspect you could reproduce the bug in a clustered environment if 
you could get two users (in two browsers) to use two different machines 
in the cluster.  I have not tried that.
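
The "commit only the delta" idea from the quoted messages below can be 
sketched roughly as follows, using the {Smith}, Jones, White example. This 
is an illustration under assumptions, not the AuthzGroupService 
implementation: DeltaSaveDemo and both method names are hypothetical, and 
memberships are reduced to plain sets of names.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of two save strategies; not the AuthzGroupService code.
class DeltaSaveDemo {
    // Diffs the edited set against the current database contents,
    // so the database ends up equal to the edited set.
    static Set<String> saveAgainstCurrent(Set<String> edited, Set<String> current) {
        Set<String> toDelete = new TreeSet<>(current);
        toDelete.removeAll(edited);         // in the db but not in the edit
        Set<String> toAdd = new TreeSet<>(edited);
        toAdd.removeAll(current);           // in the edit but not in the db

        Set<String> result = new TreeSet<>(current);
        result.removeAll(toDelete);
        result.addAll(toAdd);
        return result;
    }

    // Diffs the edited set against the snapshot taken at fetch time and
    // applies only that delta to the current database contents.
    static Set<String> saveDelta(Set<String> original, Set<String> edited,
                                 Set<String> current) {
        Set<String> added = new TreeSet<>(edited);
        added.removeAll(original);          // members the caller added
        Set<String> removed = new TreeSet<>(original);
        removed.removeAll(edited);          // members the caller removed

        Set<String> result = new TreeSet<>(current);
        result.addAll(added);
        result.removeAll(removed);
        return result;
    }

    public static void main(String[] args) {
        Set<String> original = Set.of("Smith");          // fetched by the program
        Set<String> current = Set.of("Smith", "Jones");  // another front end added Jones
        Set<String> edited = Set.of("Smith", "White");   // the program added White

        System.out.println(saveAgainstCurrent(edited, current));  // [Smith, White]
        System.out.println(saveDelta(original, edited, current)); // [Jones, Smith, White]
    }
}
```

The delta version still needs the original snapshot kept alongside the 
edited object, which is the "duplicate copy of all the fields" part of the 
suggestion.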

---- Cris J H

On 03/05/2014 01:26 PM, Noah Botimer wrote:
> There should be a cluster-wide membership invalidation on any changes.
> If something has disrupted that, it is almost certainly a bug.
>
> I think you identify some important items here. An atomic add/remove
> with cache invalidation would narrow the window in terms of time.
> Committing only the delta would narrow the window in terms of scope.
>
> Given that this is not a problem that has come up with any regularity, I
> wonder if site-manage is doing anything special to mitigate the issue. I
> don't think it is, so I suspect that the invalidation isn't firing for
> some reason, though it just might not come up with site-manage since
> site admin is usually pretty single-threaded. Anyway, with invalidation
> alone, the window should be somewhere in the neighborhood of one second.
> I would expect one of the users to see the problem immediately, not some
> minutes into a test.
>
> Is it at all possible that the JSP is doing something strange with
> transactions? Specifically, is the add committed to disk immediately
> (ruling out a dangling transaction waiting for a full commit/rollback)?
>
> Thanks,
> -Noah
>
> On Mar 5, 2014, at 11:10 AM, Charles Hedrick wrote:
>
>> I have a site that any user can join. It’s used for training people,
>> and documenting that they have been trained.
>>
>> They use a JSP that joins them to the site and then puts them into it.
>> They take a test in the site, but 15 min into the test, they are
>> suddenly no longer in the site.
>>
>> Here’s my theory:
>>
>> The code in the JSP was
>>
>> +         AuthzGroup realmEdit = AuthzGroupService.getAuthzGroup(realmId);
>> +         realmEdit.addMember(email, role, true, false);
>> +         AuthzGroupService.save(realmEdit);
>>
>> But I think this code has a race condition. I believe getAuthzGroup
>> will get a cached version of the realm. So it could be out of date. If
>> a user is added on one front end, the next user could fetch an old
>> version of the realm, add to that, and save it, thus losing the first
>> user. The next time the first front end refreshes its cache, the user
>> will be out of the site.
>>
>> What I’m trying as a fix is to use joinGroup. This appears to be
>> race-free, because it uses SQL that does an add.
>>
>> In my case it has a problem, in that joinGroup always works on the
>> current user. I want to do this before the login. I can fake it, but
>> it would be convenient to have addUser and removeUser operations that
>> specify a user.
>>
>> The situation would be better if there were a getAuthzGroupEdit, that
>> always fetched from the database. That would limit the race condition
>> to a fairly small window.
>>
>> Another approach would be for the AuthzGroup object to have a
>> duplicate copy of all the fields. The save operation doesn't wipe out
>> the group and recreate it. It quite intelligently compares the current
>> and new values, and makes the appropriate changes. But you really want
>> to compare with the original value of the object, not the current database.
>>
>> Consider
>>
>> getAuthzGroup returns {Smith}
>> someone on another front end adds Jones
>> the program adds White
>> the program saves {Smith, White}
>>
>> You want to compare the new value, {Smith, White} against the original
>> value when the group was fetched, i.e. {Smith}. That will cause you to
>> add White. The code compares it with the current database, which is
>> {Smith, Jones}, thus causing it to delete Jones and add White.
>>
>>
>> _______________________________________________
>> sakai-dev mailing list
>> sakai-dev at collab.sakaiproject.org
>> <mailto:sakai-dev at collab.sakaiproject.org>
>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>
>> TO UNSUBSCRIBE: send email to
>> sakai-dev-unsubscribe at collab.sakaiproject.org with a subject of
>> "unsubscribe"
>

