[Building Sakai] Proposed API change to reduce cost of resources list view
Jim Eng
jimeng at umich.edu
Tue Sep 22 12:56:47 PDT 2009
I think the query might look something like this:
select SAKAI_REALM.REALM_ID from SAKAI_REALM_RL_FN,SAKAI_REALM where
SAKAI_REALM_RL_FN.REALM_KEY = SAKAI_REALM.REALM_KEY and
SAKAI_REALM.REALM_ID LIKE '/content/group/
8da01d38-024a-4c1d-9184-7e798cf5837e/%' and FUNCTION_KEY in (select
FUNCTION_KEY from SAKAI_REALM_FUNCTION where FUNCTION_NAME =
'content.read') and ROLE_KEY in (select ROLE_KEY from
SAKAI_REALM_ROLE where ROLE_NAME = '.anon');
I have not yet been able to test this on Oracle or MySQL. I'm trying
to get to a point where I can tell whether this is even close to what
we'd want.
Jim
On Sep 22, 2009, at 3:21 PM, Jim Eng wrote:
> It looks like MySQL handled this well in both cases while Oracle
> handled one well and more or less failed on the other, resulting in
> a full-table scan. Is that right?
>
> If I understood Drew's message, Oracle used the index for '/admin/%'
> and did a full table scan for '/group/%'. Is that right?
>
> It seems like '/admin/%' would match very few rows and '/group/%'
> would match many many rows (possibly an overwhelming majority?).
>
> Jim
>
>
> On Sep 22, 2009, at 2:15 PM, Holladay, Bryan Andrew wrote:
>
>> Are you arguing that calling one like select (which still uses the
>> index) is slower than individually calling (potentially 1000’s)
>> multiple rows one at a time?
>>
>> Bryan
>>
>>
>> On 9/22/09 1:57 PM, "Stephen Marquard" <stephen.marquard at uct.ac.za>
>> wrote:
>>
>> Happily that does seem to be the case for mysql as well (which is
>> usually worse than Oracle in index-related optimizations), e.g.
>>
>> mysql> explain SELECT * FROM CONTENT_RESOURCE WHERE RESOURCE_ID
>> LIKE '/group/%';
>> +----+-------------+------------------+-------
>> +------------------------+------------------------+---------+------
>> +--------+-------------+
>> | id | select_type | table | type |
>> possible_keys | key | key_len | ref |
>> rows | Extra |
>> +----+-------------+------------------+-------
>> +------------------------+------------------------+---------+------
>> +--------+-------------+
>> | 1 | SIMPLE | CONTENT_RESOURCE | range |
>> CONTENT_RESOURCE_INDEX | CONTENT_RESOURCE_INDEX | 767 | NULL |
>> 375212 | Using where |
>> +----+-------------+------------------+-------
>> +------------------------+------------------------+---------+------
>> +--------+-------------+
>> 1 row in set (0.00 sec)
>>
>> mysql> explain SELECT * FROM CONTENT_RESOURCE WHERE RESOURCE_ID
>> LIKE '/group/!admin/%';
>> +----+-------------+------------------+-------
>> +------------------------+------------------------+---------+------
>> +------+-------------+
>> | id | select_type | table | type |
>> possible_keys | key | key_len | ref |
>> rows | Extra |
>> +----+-------------+------------------+-------
>> +------------------------+------------------------+---------+------
>> +------+-------------+
>> | 1 | SIMPLE | CONTENT_RESOURCE | range |
>> CONTENT_RESOURCE_INDEX | CONTENT_RESOURCE_INDEX | 767 | NULL
>> | 72 | Using where |
>> +----+-------------+------------------+-------
>> +------------------------+------------------------+---------+------
>> +------+-------------+
>> 1 row in set (0.46 sec)
>>
>> Cheers
>> Stephen
>>
>> >>> "Holladay, Bryan Andrew" <bahollad at indiana.edu> 9/22/2009 7:48
>> PM >>>
>> Right sided like searches are still able to use the DBMS's
>> indexes. They are still really quick. So I think this is a good
>> idea.
>>
>> Bryan
>>
>>
>> On 9/22/09 1:43 PM, "Mark Norton" <markjnorton at earthlink.net> wrote:
>>
>> My understanding is that SQL statements involving "like" are not very
>> efficient, either. Are you trading one problem for another?
>>
>> - Mark
>>
>> Jim Eng wrote:
>> > The list view in the Resources tool in Sakai 2.x has a column to
>> show
>> > who can see each resource or folder. Potential values include the
>> > entire site, members of particular groups, and the public.
>> >
>> > Items at any level may be made public, so the question ("is this
>> item
>> > public?") must be asked for virtually every item in a site.
>> > Determining whether resources and folders are open to the public is
>> > relatively expensive because it must be repeated so often. If a
>> site
>> > contains lots of resources, the expanded view can results in
>> dozens,
>> > hundreds or thousands of queries. The only short cut at this
>> time is
>> > that the query can be avoided for items that are inside a folder
>> that
>> > is public since they will also be public.
>> >
>> > The SQL query is against the various realms tables. It asks
>> whether
>> > the anonymous role has the "content.read" permission for the realm
>> > associated with a particular resource or folder (or for the realm
>> of
>> > any folder that contains the item).
>> >
>> > We could improve on that if the SecurityService (or
>> > AuthzGroupService?) interface had a method to ask a different
>> > question: Are there any realms within this site for which the
>> > anonymous role has "content.read" permission (and if so, what are
>> they)?
>> >
>> > This query would likely be implemented with a SQL "like" operator
>> and
>> > a one-sided wildcard. This is actually a join of information from
>> > three or four tables, but the query would be asking for a list of
>> > sakai_realm.realm_id values like '/content/group/<side-id>/%' (or
>> like
>> > '/content/group-user/<user-id>/%' for workspaces, and it's possible
>> > there are other variations) for which there are entries in the
>> > sakai_realm_rl_fn table where the role_key identifies the anonymous
>> > role and the function_key corresponds to "content.read". The result
>> > would be a list of realm identifiers.
>> >
>> > That would allow Content Hosting to get a list of realms (possibly
>> > empty) for items that are public. Content Hosting could then avoid
>> > making dozens or hundreds or thousands of individual queries by
>> > checking that list of public realms. An alternative to adding a
>> > specific API method for this would be to execute that query in the
>> > SecurityService and do some form of caching so the Security Service
>> > could answer this question hundreds of times without doing
>> hundreds of
>> > individual queries.
>> >
>> > Thoughts?
>> >
>> > Jim
>> >
>>
>
> _______________________________________________
> sakai-dev mailing list
> sakai-dev at collab.sakaiproject.org
> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>
> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe at collab.sakaiproject.org
> with a subject of "unsubscribe"
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://collab.sakaiproject.org/pipermail/sakai-dev/attachments/20090922/81af2a68/attachment.html
More information about the sakai-dev
mailing list