[Building Sakai] Proposed API change to reduce cost of resources list view

Jim Eng jimeng at umich.edu
Tue Sep 22 12:56:47 PDT 2009


I think the query might look something like this:

select SAKAI_REALM.REALM_ID from SAKAI_REALM_RL_FN,SAKAI_REALM where  
SAKAI_REALM_RL_FN.REALM_KEY = SAKAI_REALM.REALM_KEY and   
SAKAI_REALM.REALM_ID LIKE '/content/group/ 
8da01d38-024a-4c1d-9184-7e798cf5837e/%' and FUNCTION_KEY in (select  
FUNCTION_KEY from SAKAI_REALM_FUNCTION where FUNCTION_NAME =  
'content.read')  and ROLE_KEY in (select ROLE_KEY from  
SAKAI_REALM_ROLE where ROLE_NAME = '.anon');

I have not yet been able to test this on Oracle or MySQL.  I'm trying  
to get to a point where I can tell whether this is even close to what  
we'd want.

Jim




On Sep 22, 2009, at 3:21 PM, Jim Eng wrote:

> It looks like MySQL handled this well in both cases while Oracle  
> handled one well and more or less failed on the other, resulting in  
> a full-table scan.  Is that right?
>
> If I understood Drew's message, Oracle used the index for '/admin/%'  
> and did a full table scan for '/group/%'.  Is that right?
>
> It seems like '/admin/%' would match very few rows and '/group/%'  
> would match many many rows (possibly an overwhelming majority?).
>
> Jim
>
>
> On Sep 22, 2009, at 2:15 PM, Holladay, Bryan Andrew wrote:
>
>> Are you arguing that calling one like select (which still uses the  
>> index) is slower than individually calling (potentially 1000’s)  
>> multiple rows one at a time?
>>
>> Bryan
>>
>>
>> On 9/22/09 1:57 PM, "Stephen Marquard" <stephen.marquard at uct.ac.za>  
>> wrote:
>>
>> Happily that does seem to be the case for mysql as well (which is  
>> usually worse than Oracle in index-related optimizations), e.g.
>>
>> mysql> explain SELECT * FROM CONTENT_RESOURCE WHERE RESOURCE_ID  
>> LIKE '/group/%';
>> +----+-------------+------------------+------- 
>> +------------------------+------------------------+---------+------ 
>> +--------+-------------+
>> | id | select_type | table            | type  |  
>> possible_keys          | key                    | key_len | ref  |  
>> rows   | Extra       |
>> +----+-------------+------------------+------- 
>> +------------------------+------------------------+---------+------ 
>> +--------+-------------+
>> |  1 | SIMPLE      | CONTENT_RESOURCE | range |  
>> CONTENT_RESOURCE_INDEX | CONTENT_RESOURCE_INDEX | 767     | NULL |  
>> 375212 | Using where |
>> +----+-------------+------------------+------- 
>> +------------------------+------------------------+---------+------ 
>> +--------+-------------+
>> 1 row in set (0.00 sec)
>>
>> mysql> explain SELECT * FROM CONTENT_RESOURCE WHERE RESOURCE_ID  
>> LIKE '/group/!admin/%';
>> +----+-------------+------------------+------- 
>> +------------------------+------------------------+---------+------ 
>> +------+-------------+
>> | id | select_type | table            | type  |  
>> possible_keys          | key                    | key_len | ref  |  
>> rows | Extra       |
>> +----+-------------+------------------+------- 
>> +------------------------+------------------------+---------+------ 
>> +------+-------------+
>> |  1 | SIMPLE      | CONTENT_RESOURCE | range |  
>> CONTENT_RESOURCE_INDEX | CONTENT_RESOURCE_INDEX | 767     | NULL  
>> |   72 | Using where |
>> +----+-------------+------------------+------- 
>> +------------------------+------------------------+---------+------ 
>> +------+-------------+
>> 1 row in set (0.46 sec)
>>
>> Cheers
>> Stephen
>>
>> >>> "Holladay, Bryan Andrew" <bahollad at indiana.edu> 9/22/2009 7:48  
>> PM >>>
>> Right sided like searches are still able to use the DBMS's  
>> indexes.  They are still really quick.  So I think this is a good  
>> idea.
>>
>> Bryan
>>
>>
>> On 9/22/09 1:43 PM, "Mark Norton" <markjnorton at earthlink.net> wrote:
>>
>> My understanding is that SQL statements involving "like" are not very
>> efficient, either.  Are you trading one problem for another?
>>
>> - Mark
>>
>> Jim Eng wrote:
>> > The list view in the Resources tool in Sakai 2.x has a column to  
>> show
>> > who can see each resource or folder. Potential values include the
>> > entire site, members of particular groups, and the public.
>> >
>> > Items at any level may be made public, so the question ("is this  
>> item
>> > public?") must be asked for virtually every item in a site.
>> > Determining whether resources and folders are open to the public is
>> > relatively expensive because it must be repeated so often.  If a  
>> site
>> > contains lots of resources, the expanded view can results in  
>> dozens,
>> > hundreds or thousands of queries.  The only short cut at this  
>> time is
>> > that the query can be avoided for items that are inside a folder  
>> that
>> > is public since they will also be public.
>> >
>> > The SQL query is against the various realms tables.  It asks  
>> whether
>> > the anonymous role has the "content.read" permission for the realm
>> > associated with a particular resource or folder (or for the realm  
>> of
>> > any folder that contains the item).
>> >
>> > We could improve on that if the SecurityService (or
>> > AuthzGroupService?) interface had a method to ask a different
>> > question: Are there any realms within this site for which the
>> > anonymous role has "content.read" permission (and if so, what are  
>> they)?
>> >
>> > This query would likely be implemented with a SQL "like" operator  
>> and
>> > a one-sided wildcard. This is actually a join of information from
>> > three or four tables, but the query would be asking for a list of
>> > sakai_realm.realm_id values like '/content/group/<side-id>/%' (or  
>> like
>> > '/content/group-user/<user-id>/%' for workspaces, and it's possible
>> > there are other variations) for which there are entries in the
>> > sakai_realm_rl_fn table where the role_key identifies the anonymous
>> > role and the function_key corresponds to "content.read". The result
>> > would be a list of realm identifiers.
>> >
>> > That would allow Content Hosting to get a list of realms (possibly
>> > empty) for items that are public.  Content Hosting could then avoid
>> > making dozens or hundreds or thousands of individual queries by
>> > checking that list of public realms.  An alternative to adding a
>> > specific API method for this would be to execute that query in the
>> > SecurityService and do some form of caching so the Security Service
>> > could answer this question hundreds of times without doing  
>> hundreds of
>> > individual queries.
>> >
>> > Thoughts?
>> >
>> > Jim
>> >
>>
>
> _______________________________________________
> sakai-dev mailing list
> sakai-dev at collab.sakaiproject.org
> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>
> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe at collab.sakaiproject.org 
>  with a subject of "unsubscribe"

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://collab.sakaiproject.org/pipermail/sakai-dev/attachments/20090922/81af2a68/attachment.html 


More information about the sakai-dev mailing list