[Building Sakai] Proposed API change to reduce cost of resources list view

Jim Eng jimeng at umich.edu
Tue Sep 22 12:21:29 PDT 2009


It looks like MySQL handled this well in both cases while Oracle  
handled one well and more or less failed on the other, resulting in a  
full-table scan.  Is that right?

If I understood Drew's message, Oracle used the index for '/admin/%'  
and did a full table scan for '/group/%'.  Is that right?

It seems like '/admin/%' would match very few rows and '/group/%'  
would match many many rows (possibly an overwhelming majority?).

Jim


On Sep 22, 2009, at 2:15 PM, Holladay, Bryan Andrew wrote:

> Are you arguing that calling one like select (which still uses the  
> index) is slower than individually calling (potentially 1000’s)  
> multiple rows one at a time?
>
> Bryan
>
>
> On 9/22/09 1:57 PM, "Stephen Marquard" <stephen.marquard at uct.ac.za>  
> wrote:
>
> Happily that does seem to be the case for mysql as well (which is  
> usually worse than Oracle in index-related optimizations), e.g.
>
> mysql> explain SELECT * FROM CONTENT_RESOURCE WHERE RESOURCE_ID LIKE  
> '/group/%';
> +----+-------------+------------------+------- 
> +------------------------+------------------------+---------+------ 
> +--------+-------------+
> | id | select_type | table            | type  |  
> possible_keys          | key                    | key_len | ref  |  
> rows   | Extra       |
> +----+-------------+------------------+------- 
> +------------------------+------------------------+---------+------ 
> +--------+-------------+
> |  1 | SIMPLE      | CONTENT_RESOURCE | range |  
> CONTENT_RESOURCE_INDEX | CONTENT_RESOURCE_INDEX | 767     | NULL |  
> 375212 | Using where |
> +----+-------------+------------------+------- 
> +------------------------+------------------------+---------+------ 
> +--------+-------------+
> 1 row in set (0.00 sec)
>
> mysql> explain SELECT * FROM CONTENT_RESOURCE WHERE RESOURCE_ID LIKE  
> '/group/!admin/%';
> +----+-------------+------------------+------- 
> +------------------------+------------------------+---------+------ 
> +------+-------------+
> | id | select_type | table            | type  |  
> possible_keys          | key                    | key_len | ref  |  
> rows | Extra       |
> +----+-------------+------------------+------- 
> +------------------------+------------------------+---------+------ 
> +------+-------------+
> |  1 | SIMPLE      | CONTENT_RESOURCE | range |  
> CONTENT_RESOURCE_INDEX | CONTENT_RESOURCE_INDEX | 767     | NULL |    
> 72 | Using where |
> +----+-------------+------------------+------- 
> +------------------------+------------------------+---------+------ 
> +------+-------------+
> 1 row in set (0.46 sec)
>
> Cheers
> Stephen
>
> >>> "Holladay, Bryan Andrew" <bahollad at indiana.edu> 9/22/2009 7:48  
> PM >>>
> Right sided like searches are still able to use the DBMS's indexes.   
> They are still really quick.  So I think this is a good idea.
>
> Bryan
>
>
> On 9/22/09 1:43 PM, "Mark Norton" <markjnorton at earthlink.net> wrote:
>
> My understanding is that SQL statements involving "like" are not very
> efficient, either.  Are you trading one problem for another?
>
> - Mark
>
> Jim Eng wrote:
> > The list view in the Resources tool in Sakai 2.x has a column to  
> show
> > who can see each resource or folder. Potential values include the
> > entire site, members of particular groups, and the public.
> >
> > Items at any level may be made public, so the question ("is this  
> item
> > public?") must be asked for virtually every item in a site.
> > Determining whether resources and folders are open to the public is
> > relatively expensive because it must be repeated so often.  If a  
> site
> > contains lots of resources, the expanded view can results in dozens,
> > hundreds or thousands of queries.  The only short cut at this time  
> is
> > that the query can be avoided for items that are inside a folder  
> that
> > is public since they will also be public.
> >
> > The SQL query is against the various realms tables.  It asks whether
> > the anonymous role has the "content.read" permission for the realm
> > associated with a particular resource or folder (or for the realm of
> > any folder that contains the item).
> >
> > We could improve on that if the SecurityService (or
> > AuthzGroupService?) interface had a method to ask a different
> > question: Are there any realms within this site for which the
> > anonymous role has "content.read" permission (and if so, what are  
> they)?
> >
> > This query would likely be implemented with a SQL "like" operator  
> and
> > a one-sided wildcard. This is actually a join of information from
> > three or four tables, but the query would be asking for a list of
> > sakai_realm.realm_id values like '/content/group/<side-id>/%' (or  
> like
> > '/content/group-user/<user-id>/%' for workspaces, and it's possible
> > there are other variations) for which there are entries in the
> > sakai_realm_rl_fn table where the role_key identifies the anonymous
> > role and the function_key corresponds to "content.read". The result
> > would be a list of realm identifiers.
> >
> > That would allow Content Hosting to get a list of realms (possibly
> > empty) for items that are public.  Content Hosting could then avoid
> > making dozens or hundreds or thousands of individual queries by
> > checking that list of public realms.  An alternative to adding a
> > specific API method for this would be to execute that query in the
> > SecurityService and do some form of caching so the Security Service
> > could answer this question hundreds of times without doing  
> hundreds of
> > individual queries.
> >
> > Thoughts?
> >
> > Jim
> >
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://collab.sakaiproject.org/pipermail/sakai-dev/attachments/20090922/7addb383/attachment.html 


More information about the sakai-dev mailing list