[Building Sakai] Proposed API change to reduce cost of resources list view

Jim Eng jimeng at umich.edu
Wed Sep 23 06:04:21 PDT 2009


Could we use triggers to keep the second table in sync with the realms  
tables?  Here's what I think should be correct MySQL 5.1 syntax for  
one such trigger:

create trigger public_content_insert after insert on SAKAI_REALM_RL_FN
	for each row begin
		if NEW.FUNCTION_KEY in (select FUNCTION_KEY from  
SAKAI_REALM_FUNCTION where FUNCTION_NAME = 'content.read') and  
NEW.ROLE_KEY in (select ROLE_KEY from SAKAI_REALM_ROLE where ROLE_NAME  
= '.anon') then
			insert into SAKAI_PUBLIC_CONTENT_REALMS  select REALM_ID from  
SAKAI_REAMS where REALM_KEY = NEW.REALM_KEY;
		end if;
	end;
	
The syntax for the inner insert statement seems to be incorrect, but  
the error messages are obscure and unhelpful.  If anybody has  
suggestions about the correct syntax, I'd like to do some testing.

We would also need a trigger for deletes and possibly one for  
updates.  This approach would keep the SAKAI_PUBLIC_CONTENT_REALMS  
table in sync with the realms tables, but a trigger would fire for  
every insert to and delete from the SAKAI_REALM_RL_FN table.  I wonder  
whether that is costly.

I still think the best approach would be to get the list of realm id's  
from the realms tables like I suggested in the first email.  Most of  
the time the list would be empty, so we'd need to understand the  
performance implications of the query when it returns no results.

Thanks.

Jim



On Sep 22, 2009, at 5:39 PM, Jim Eng wrote:

> Drew did a little testing of this query in Oracle and found that it  
> ran OK when it returned results.  He said it is hard to determine  
> the performance when it finds no matching rows. It did use the index  
> in both cases.
>
> Maybe we will need to take the route Adi (and others) suggested --  
> keeping a separate table for fast lookups of realms with public view  
> enabled.  We would still let the realms tables handle actual  
> permissions checks in the relatively rare cases where someone is  
> trying to access a public resource, but for looking up info for the  
> resources tool's list view, we could get it from another table.
>
> To do that, we would need to add realm id's to a new table. I'll  
> call the new table 'SAKAI_PUBLIC_CONTENT_REALMS' for now.  It might  
> need only one column, the realm_id.  We would then need to keep that  
> in sync with the realms table by adding realms id's to it whenever a  
> resource is made public and removing a realm id whenever a resource  
> is no longer public.
>
> Problems would arise if someone edited the realm in the admin's  
> realms tool and added or removed the .anon role for content.read in  
> a realm without doing it through the resources tool.  Not sure how  
> to prevent that.
>
> To do this, we'd need to do a conversion to populate the new table.   
> I think the entire conversion would look something like this:
>
> create table SAKAI_PUBLIC_CONTENT_REALMS (REALM_ID VARCHAR2(255) NOT  
> NULL);
>
> insert into SAKAI_PUBLIC_CONTENT_REALMS
>   select SAKAI_REALM.REALM_ID from SAKAI_REALM_RL_FN,SAKAI_REALM
>   where SAKAI_REALM_RL_FN.REALM_KEY = SAKAI_REALM.REALM_KEY
>   and FUNCTION_KEY in (select FUNCTION_KEY from SAKAI_REALM_FUNCTION  
> where FUNCTION_NAME = 'content.read')
>   and ROLE_KEY in (select ROLE_KEY from SAKAI_REALM_ROLE where  
> ROLE_NAME = '.anon');
>
> That might not be exactly right, but I think it's close.  It could  
> be done inside ContentHosting with no changes at all to authz.
>
> I still think it would be better to use the existing realms tables,  
> if that can be made more efficient, to avoid having to keep two  
> tables in sync.
>
> Jim
>
>
> On Sep 22, 2009, at 3:56 PM, Jim Eng wrote:
>
>> I think the query might look something like this:
>>
>> select SAKAI_REALM.REALM_ID from SAKAI_REALM_RL_FN,SAKAI_REALM  
>> where SAKAI_REALM_RL_FN.REALM_KEY = SAKAI_REALM.REALM_KEY and   
>> SAKAI_REALM.REALM_ID LIKE '/content/group/ 
>> 8da01d38-024a-4c1d-9184-7e798cf5837e/%' and FUNCTION_KEY in (select  
>> FUNCTION_KEY from SAKAI_REALM_FUNCTION where FUNCTION_NAME =  
>> 'content.read')  and ROLE_KEY in (select ROLE_KEY from  
>> SAKAI_REALM_ROLE where ROLE_NAME = '.anon');
>>
>> I have not yet been able to test this on Oracle or MySQL.  I'm  
>> trying to get to a point where I can tell whether this is even  
>> close to what we'd want.
>>
>> Jim
>>
>>
>>
>>
>> On Sep 22, 2009, at 3:21 PM, Jim Eng wrote:
>>
>>> It looks like MySQL handled this well in both cases while Oracle  
>>> handled one well and more or less failed on the other, resulting  
>>> in a full-table scan.  Is that right?
>>>
>>> If I understood Drew's message, Oracle used the index for '/admin/ 
>>> %' and did a full table scan for '/group/%'.  Is that right?
>>>
>>> It seems like '/admin/%' would match very few rows and '/group/%'  
>>> would match many many rows (possibly an overwhelming majority?).
>>>
>>> Jim
>>>
>>>
>>> On Sep 22, 2009, at 2:15 PM, Holladay, Bryan Andrew wrote:
>>>
>>>> Are you arguing that calling one like select (which still uses  
>>>> the index) is slower than individually calling (potentially  
>>>> 1000’s) multiple rows one at a time?
>>>>
>>>> Bryan
>>>>
>>>>
>>>> On 9/22/09 1:57 PM, "Stephen Marquard"  
>>>> <stephen.marquard at uct.ac.za> wrote:
>>>>
>>>> Happily that does seem to be the case for mysql as well (which is  
>>>> usually worse than Oracle in index-related optimizations), e.g.
>>>>
>>>> mysql> explain SELECT * FROM CONTENT_RESOURCE WHERE RESOURCE_ID  
>>>> LIKE '/group/%';
>>>> +----+-------------+------------------+------- 
>>>> +------------------------+------------------------+--------- 
>>>> +------+--------+-------------+
>>>> | id | select_type | table            | type  |  
>>>> possible_keys          | key                    | key_len | ref   
>>>> | rows   | Extra       |
>>>> +----+-------------+------------------+------- 
>>>> +------------------------+------------------------+--------- 
>>>> +------+--------+-------------+
>>>> |  1 | SIMPLE      | CONTENT_RESOURCE | range |  
>>>> CONTENT_RESOURCE_INDEX | CONTENT_RESOURCE_INDEX | 767     | NULL  
>>>> | 375212 | Using where |
>>>> +----+-------------+------------------+------- 
>>>> +------------------------+------------------------+--------- 
>>>> +------+--------+-------------+
>>>> 1 row in set (0.00 sec)
>>>>
>>>> mysql> explain SELECT * FROM CONTENT_RESOURCE WHERE RESOURCE_ID  
>>>> LIKE '/group/!admin/%';
>>>> +----+-------------+------------------+------- 
>>>> +------------------------+------------------------+--------- 
>>>> +------+------+-------------+
>>>> | id | select_type | table            | type  |  
>>>> possible_keys          | key                    | key_len | ref   
>>>> | rows | Extra       |
>>>> +----+-------------+------------------+------- 
>>>> +------------------------+------------------------+--------- 
>>>> +------+------+-------------+
>>>> |  1 | SIMPLE      | CONTENT_RESOURCE | range |  
>>>> CONTENT_RESOURCE_INDEX | CONTENT_RESOURCE_INDEX | 767     | NULL  
>>>> |   72 | Using where |
>>>> +----+-------------+------------------+------- 
>>>> +------------------------+------------------------+--------- 
>>>> +------+------+-------------+
>>>> 1 row in set (0.46 sec)
>>>>
>>>> Cheers
>>>> Stephen
>>>>
>>>> >>> "Holladay, Bryan Andrew" <bahollad at indiana.edu> 9/22/2009  
>>>> 7:48 PM >>>
>>>> Right sided like searches are still able to use the DBMS's  
>>>> indexes.  They are still really quick.  So I think this is a good  
>>>> idea.
>>>>
>>>> Bryan
>>>>
>>>>
>>>> On 9/22/09 1:43 PM, "Mark Norton" <markjnorton at earthlink.net>  
>>>> wrote:
>>>>
>>>> My understanding is that SQL statements involving "like" are not  
>>>> very
>>>> efficient, either.  Are you trading one problem for another?
>>>>
>>>> - Mark
>>>>
>>>> Jim Eng wrote:
>>>> > The list view in the Resources tool in Sakai 2.x has a column  
>>>> to show
>>>> > who can see each resource or folder. Potential values include the
>>>> > entire site, members of particular groups, and the public.
>>>> >
>>>> > Items at any level may be made public, so the question ("is  
>>>> this item
>>>> > public?") must be asked for virtually every item in a site.
>>>> > Determining whether resources and folders are open to the  
>>>> public is
>>>> > relatively expensive because it must be repeated so often.  If  
>>>> a site
>>>> > contains lots of resources, the expanded view can results in  
>>>> dozens,
>>>> > hundreds or thousands of queries.  The only short cut at this  
>>>> time is
>>>> > that the query can be avoided for items that are inside a  
>>>> folder that
>>>> > is public since they will also be public.
>>>> >
>>>> > The SQL query is against the various realms tables.  It asks  
>>>> whether
>>>> > the anonymous role has the "content.read" permission for the  
>>>> realm
>>>> > associated with a particular resource or folder (or for the  
>>>> realm of
>>>> > any folder that contains the item).
>>>> >
>>>> > We could improve on that if the SecurityService (or
>>>> > AuthzGroupService?) interface had a method to ask a different
>>>> > question: Are there any realms within this site for which the
>>>> > anonymous role has "content.read" permission (and if so, what  
>>>> are they)?
>>>> >
>>>> > This query would likely be implemented with a SQL "like"  
>>>> operator and
>>>> > a one-sided wildcard. This is actually a join of information from
>>>> > three or four tables, but the query would be asking for a list of
>>>> > sakai_realm.realm_id values like '/content/group/<side-id>/ 
>>>> %' (or like
>>>> > '/content/group-user/<user-id>/%' for workspaces, and it's  
>>>> possible
>>>> > there are other variations) for which there are entries in the
>>>> > sakai_realm_rl_fn table where the role_key identifies the  
>>>> anonymous
>>>> > role and the function_key corresponds to "content.read". The  
>>>> result
>>>> > would be a list of realm identifiers.
>>>> >
>>>> > That would allow Content Hosting to get a list of realms  
>>>> (possibly
>>>> > empty) for items that are public.  Content Hosting could then  
>>>> avoid
>>>> > making dozens or hundreds or thousands of individual queries by
>>>> > checking that list of public realms.  An alternative to adding a
>>>> > specific API method for this would be to execute that query in  
>>>> the
>>>> > SecurityService and do some form of caching so the Security  
>>>> Service
>>>> > could answer this question hundreds of times without doing  
>>>> hundreds of
>>>> > individual queries.
>>>> >
>>>> > Thoughts?
>>>> >
>>>> > Jim
>>>> >
>>>>
>>>
>>> _______________________________________________
>>> sakai-dev mailing list
>>> sakai-dev at collab.sakaiproject.org
>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>
>>> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe at collab.sakaiproject.org 
>>>  with a subject of "unsubscribe"
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://collab.sakaiproject.org/pipermail/sakai-dev/attachments/20090923/afbf4e73/attachment.html 


More information about the sakai-dev mailing list