[Building Sakai] Proposed API change to reduce cost of resources list view
Jim Eng
jimeng at umich.edu
Tue Sep 22 12:21:29 PDT 2009
It looks like MySQL handled this well in both cases while Oracle
handled one well and more or less failed on the other, resulting in a
full-table scan. Is that right?
If I understood Drew's message, Oracle used the index for '/admin/%'
and did a full table scan for '/group/%'. Is that right?
It seems like '/admin/%' would match very few rows and '/group/%'
would match many many rows (possibly an overwhelming majority?).
Jim
On Sep 22, 2009, at 2:15 PM, Holladay, Bryan Andrew wrote:
> Are you arguing that calling one like select (which still uses the
> index) is slower than individually calling (potentially 1000’s)
> multiple rows one at a time?
>
> Bryan
>
>
> On 9/22/09 1:57 PM, "Stephen Marquard" <stephen.marquard at uct.ac.za>
> wrote:
>
> Happily that does seem to be the case for mysql as well (which is
> usually worse than Oracle in index-related optimizations), e.g.
>
> mysql> explain SELECT * FROM CONTENT_RESOURCE WHERE RESOURCE_ID LIKE
> '/group/%';
> +----+-------------+------------------+-------
> +------------------------+------------------------+---------+------
> +--------+-------------+
> | id | select_type | table | type |
> possible_keys | key | key_len | ref |
> rows | Extra |
> +----+-------------+------------------+-------
> +------------------------+------------------------+---------+------
> +--------+-------------+
> | 1 | SIMPLE | CONTENT_RESOURCE | range |
> CONTENT_RESOURCE_INDEX | CONTENT_RESOURCE_INDEX | 767 | NULL |
> 375212 | Using where |
> +----+-------------+------------------+-------
> +------------------------+------------------------+---------+------
> +--------+-------------+
> 1 row in set (0.00 sec)
>
> mysql> explain SELECT * FROM CONTENT_RESOURCE WHERE RESOURCE_ID LIKE
> '/group/!admin/%';
> +----+-------------+------------------+-------
> +------------------------+------------------------+---------+------
> +------+-------------+
> | id | select_type | table | type |
> possible_keys | key | key_len | ref |
> rows | Extra |
> +----+-------------+------------------+-------
> +------------------------+------------------------+---------+------
> +------+-------------+
> | 1 | SIMPLE | CONTENT_RESOURCE | range |
> CONTENT_RESOURCE_INDEX | CONTENT_RESOURCE_INDEX | 767 | NULL |
> 72 | Using where |
> +----+-------------+------------------+-------
> +------------------------+------------------------+---------+------
> +------+-------------+
> 1 row in set (0.46 sec)
>
> Cheers
> Stephen
>
> >>> "Holladay, Bryan Andrew" <bahollad at indiana.edu> 9/22/2009 7:48
> PM >>>
> Right sided like searches are still able to use the DBMS's indexes.
> They are still really quick. So I think this is a good idea.
>
> Bryan
>
>
> On 9/22/09 1:43 PM, "Mark Norton" <markjnorton at earthlink.net> wrote:
>
> My understanding is that SQL statements involving "like" are not very
> efficient, either. Are you trading one problem for another?
>
> - Mark
>
> Jim Eng wrote:
> > The list view in the Resources tool in Sakai 2.x has a column to
> show
> > who can see each resource or folder. Potential values include the
> > entire site, members of particular groups, and the public.
> >
> > Items at any level may be made public, so the question ("is this
> item
> > public?") must be asked for virtually every item in a site.
> > Determining whether resources and folders are open to the public is
> > relatively expensive because it must be repeated so often. If a
> site
> > contains lots of resources, the expanded view can results in dozens,
> > hundreds or thousands of queries. The only short cut at this time
> is
> > that the query can be avoided for items that are inside a folder
> that
> > is public since they will also be public.
> >
> > The SQL query is against the various realms tables. It asks whether
> > the anonymous role has the "content.read" permission for the realm
> > associated with a particular resource or folder (or for the realm of
> > any folder that contains the item).
> >
> > We could improve on that if the SecurityService (or
> > AuthzGroupService?) interface had a method to ask a different
> > question: Are there any realms within this site for which the
> > anonymous role has "content.read" permission (and if so, what are
> they)?
> >
> > This query would likely be implemented with a SQL "like" operator
> and
> > a one-sided wildcard. This is actually a join of information from
> > three or four tables, but the query would be asking for a list of
> > sakai_realm.realm_id values like '/content/group/<side-id>/%' (or
> like
> > '/content/group-user/<user-id>/%' for workspaces, and it's possible
> > there are other variations) for which there are entries in the
> > sakai_realm_rl_fn table where the role_key identifies the anonymous
> > role and the function_key corresponds to "content.read". The result
> > would be a list of realm identifiers.
> >
> > That would allow Content Hosting to get a list of realms (possibly
> > empty) for items that are public. Content Hosting could then avoid
> > making dozens or hundreds or thousands of individual queries by
> > checking that list of public realms. An alternative to adding a
> > specific API method for this would be to execute that query in the
> > SecurityService and do some form of caching so the Security Service
> > could answer this question hundreds of times without doing
> hundreds of
> > individual queries.
> >
> > Thoughts?
> >
> > Jim
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://collab.sakaiproject.org/pipermail/sakai-dev/attachments/20090922/7addb383/attachment.html
More information about the sakai-dev
mailing list