[Building Sakai] Proposed API change to reduce cost of resources list view

Jim Eng jimeng at umich.edu
Tue Sep 22 14:39:00 PDT 2009


Drew did a little testing of this query in Oracle and found that it  
ran OK when it returned results.  He said it is hard to determine the  
performance when it finds no matching rows. It did use the index in  
both cases.

Maybe we will need to take the route Adi (and others) suggested --  
keeping a separate table for fast lookups of realms with public view  
enabled.  We would still let the realms tables handle actual  
permissions checks in the relatively rare cases where someone is  
trying to access a public resource, but for looking up info for the  
resources tool's list view, we could get it from another table.

To do that, we would need to add realm id's to a new table. I'll call  
the new table 'SAKAI_PUBLIC_CONTENT_REALMS' for now.  It might need  
only one column, the realm_id.  We would then need to keep that in  
sync with the realms table by adding realms id's to it whenever a  
resource is made public and removing a realm id whenever a resource is  
no longer public.

Problems would arise if someone edited the realm in the admin's realms  
tool and added or removed the .anon role for content.read in a realm  
without doing it through the resources tool.  Not sure how to prevent  
that.

To do this, we'd need to do a conversion to populate the new table.  I  
think the entire conversion would look something like this:

create table SAKAI_PUBLIC_CONTENT_REALMS (REALM_ID VARCHAR2(255) NOT  
NULL);

insert into SAKAI_PUBLIC_CONTENT_REALMS
   select SAKAI_REALM.REALM_ID from SAKAI_REALM_RL_FN,SAKAI_REALM
   where SAKAI_REALM_RL_FN.REALM_KEY = SAKAI_REALM.REALM_KEY
   and FUNCTION_KEY in (select FUNCTION_KEY from SAKAI_REALM_FUNCTION  
where FUNCTION_NAME = 'content.read')
   and ROLE_KEY in (select ROLE_KEY from SAKAI_REALM_ROLE where  
ROLE_NAME = '.anon');

That might not be exactly right, but I think it's close.  It could be  
done inside ContentHosting with no changes at all to authz.

I still think it would be better to use the existing realms tables, if  
that can be made more efficient, to avoid having to keep two tables in  
sync.

Jim


On Sep 22, 2009, at 3:56 PM, Jim Eng wrote:

> I think the query might look something like this:
>
> select SAKAI_REALM.REALM_ID from SAKAI_REALM_RL_FN,SAKAI_REALM where  
> SAKAI_REALM_RL_FN.REALM_KEY = SAKAI_REALM.REALM_KEY and   
> SAKAI_REALM.REALM_ID LIKE '/content/group/ 
> 8da01d38-024a-4c1d-9184-7e798cf5837e/%' and FUNCTION_KEY in (select  
> FUNCTION_KEY from SAKAI_REALM_FUNCTION where FUNCTION_NAME =  
> 'content.read')  and ROLE_KEY in (select ROLE_KEY from  
> SAKAI_REALM_ROLE where ROLE_NAME = '.anon');
>
> I have not yet been able to test this on Oracle or MySQL.  I'm  
> trying to get to a point where I can tell whether this is even close  
> to what we'd want.
>
> Jim
>
>
>
>
> On Sep 22, 2009, at 3:21 PM, Jim Eng wrote:
>
>> It looks like MySQL handled this well in both cases while Oracle  
>> handled one well and more or less failed on the other, resulting in  
>> a full-table scan.  Is that right?
>>
>> If I understood Drew's message, Oracle used the index for '/admin/ 
>> %' and did a full table scan for '/group/%'.  Is that right?
>>
>> It seems like '/admin/%' would match very few rows and '/group/%'  
>> would match many many rows (possibly an overwhelming majority?).
>>
>> Jim
>>
>>
>> On Sep 22, 2009, at 2:15 PM, Holladay, Bryan Andrew wrote:
>>
>>> Are you arguing that calling one like select (which still uses the  
>>> index) is slower than individually calling (potentially 1000’s)  
>>> multiple rows one at a time?
>>>
>>> Bryan
>>>
>>>
>>> On 9/22/09 1:57 PM, "Stephen Marquard"  
>>> <stephen.marquard at uct.ac.za> wrote:
>>>
>>> Happily that does seem to be the case for mysql as well (which is  
>>> usually worse than Oracle in index-related optimizations), e.g.
>>>
>>> mysql> explain SELECT * FROM CONTENT_RESOURCE WHERE RESOURCE_ID  
>>> LIKE '/group/%';
>>> +----+-------------+------------------+------- 
>>> +------------------------+------------------------+---------+------ 
>>> +--------+-------------+
>>> | id | select_type | table            | type  |  
>>> possible_keys          | key                    | key_len | ref  |  
>>> rows   | Extra       |
>>> +----+-------------+------------------+------- 
>>> +------------------------+------------------------+---------+------ 
>>> +--------+-------------+
>>> |  1 | SIMPLE      | CONTENT_RESOURCE | range |  
>>> CONTENT_RESOURCE_INDEX | CONTENT_RESOURCE_INDEX | 767     | NULL |  
>>> 375212 | Using where |
>>> +----+-------------+------------------+------- 
>>> +------------------------+------------------------+---------+------ 
>>> +--------+-------------+
>>> 1 row in set (0.00 sec)
>>>
>>> mysql> explain SELECT * FROM CONTENT_RESOURCE WHERE RESOURCE_ID  
>>> LIKE '/group/!admin/%';
>>> +----+-------------+------------------+------- 
>>> +------------------------+------------------------+---------+------ 
>>> +------+-------------+
>>> | id | select_type | table            | type  |  
>>> possible_keys          | key                    | key_len | ref  |  
>>> rows | Extra       |
>>> +----+-------------+------------------+------- 
>>> +------------------------+------------------------+---------+------ 
>>> +------+-------------+
>>> |  1 | SIMPLE      | CONTENT_RESOURCE | range |  
>>> CONTENT_RESOURCE_INDEX | CONTENT_RESOURCE_INDEX | 767     | NULL  
>>> |   72 | Using where |
>>> +----+-------------+------------------+------- 
>>> +------------------------+------------------------+---------+------ 
>>> +------+-------------+
>>> 1 row in set (0.46 sec)
>>>
>>> Cheers
>>> Stephen
>>>
>>> >>> "Holladay, Bryan Andrew" <bahollad at indiana.edu> 9/22/2009 7:48  
>>> PM >>>
>>> Right sided like searches are still able to use the DBMS's  
>>> indexes.  They are still really quick.  So I think this is a good  
>>> idea.
>>>
>>> Bryan
>>>
>>>
>>> On 9/22/09 1:43 PM, "Mark Norton" <markjnorton at earthlink.net> wrote:
>>>
>>> My understanding is that SQL statements involving "like" are not  
>>> very
>>> efficient, either.  Are you trading one problem for another?
>>>
>>> - Mark
>>>
>>> Jim Eng wrote:
>>> > The list view in the Resources tool in Sakai 2.x has a column to  
>>> show
>>> > who can see each resource or folder. Potential values include the
>>> > entire site, members of particular groups, and the public.
>>> >
>>> > Items at any level may be made public, so the question ("is this  
>>> item
>>> > public?") must be asked for virtually every item in a site.
>>> > Determining whether resources and folders are open to the public  
>>> is
>>> > relatively expensive because it must be repeated so often.  If a  
>>> site
>>> > contains lots of resources, the expanded view can results in  
>>> dozens,
>>> > hundreds or thousands of queries.  The only short cut at this  
>>> time is
>>> > that the query can be avoided for items that are inside a folder  
>>> that
>>> > is public since they will also be public.
>>> >
>>> > The SQL query is against the various realms tables.  It asks  
>>> whether
>>> > the anonymous role has the "content.read" permission for the realm
>>> > associated with a particular resource or folder (or for the  
>>> realm of
>>> > any folder that contains the item).
>>> >
>>> > We could improve on that if the SecurityService (or
>>> > AuthzGroupService?) interface had a method to ask a different
>>> > question: Are there any realms within this site for which the
>>> > anonymous role has "content.read" permission (and if so, what  
>>> are they)?
>>> >
>>> > This query would likely be implemented with a SQL "like"  
>>> operator and
>>> > a one-sided wildcard. This is actually a join of information from
>>> > three or four tables, but the query would be asking for a list of
>>> > sakai_realm.realm_id values like '/content/group/<side-id>/ 
>>> %' (or like
>>> > '/content/group-user/<user-id>/%' for workspaces, and it's  
>>> possible
>>> > there are other variations) for which there are entries in the
>>> > sakai_realm_rl_fn table where the role_key identifies the  
>>> anonymous
>>> > role and the function_key corresponds to "content.read". The  
>>> result
>>> > would be a list of realm identifiers.
>>> >
>>> > That would allow Content Hosting to get a list of realms (possibly
>>> > empty) for items that are public.  Content Hosting could then  
>>> avoid
>>> > making dozens or hundreds or thousands of individual queries by
>>> > checking that list of public realms.  An alternative to adding a
>>> > specific API method for this would be to execute that query in the
>>> > SecurityService and do some form of caching so the Security  
>>> Service
>>> > could answer this question hundreds of times without doing  
>>> hundreds of
>>> > individual queries.
>>> >
>>> > Thoughts?
>>> >
>>> > Jim
>>> >
>>>
>>
>> _______________________________________________
>> sakai-dev mailing list
>> sakai-dev at collab.sakaiproject.org
>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>
>> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe at collab.sakaiproject.org 
>>  with a subject of "unsubscribe"
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://collab.sakaiproject.org/pipermail/sakai-dev/attachments/20090922/0b9ab232/attachment.html 


More information about the sakai-dev mailing list