[Building Sakai] Proposed API change to reduce cost of resources list view

Drew Zhu qszhu at umich.edu
Tue Sep 22 11:16:38 PDT 2009


In Oracle, the one end wildcard will be better or the same, depending 
other factors. For example, the number of rows (should the percentage of 
the rows). If that "LIKE" alone is very selective, the optimizer will 
choose to use the index. If the that LIKE alone is not very selective, 
whether or not full table scan will depend on the selectivities of other 
conditions in the where clauses. It depends on the data distribution in 
the tables.
The following are the two examples for the same query with different 
contents. One is doing a full table scan and one is using an index.

SQL> SELECT * FROM CONTENT_RESOURCE WHERE RESOURCE_ID LIKE '/group/%';

Execution Plan
----------------------------------------------------------
Plan hash value: 1391154514

--------------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Bytes | Cost 
(%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                  |  3954K|  4846M|   239K  
(1)| 00:55:53 |
|*  1 |  TABLE ACCESS FULL| CONTENT_RESOURCE |  3954K|  4846M|   239K  
(1)| 00:55:53 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RESOURCE_ID" LIKE '/group/%')


1* SELECT * FROM CONTENT_RESOURCE WHERE RESOURCE_ID LIKE '/admin/%';

Execution Plan
----------------------------------------------------------
Plan hash value: 3978719898

------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                   | Rows  | 
Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                        |     1 |  
1285 |     6   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| CONTENT_RESOURCE       |     1 |  
1285 |     6   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | CONTENT_RESOURCE_INDEX |     1 
|       |     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("RESOURCE_ID" LIKE '/admin/%')
       filter("RESOURCE_ID" LIKE '/admin/%')

I did not go further on the stats related data in our production and I 
think the above query plan should be clear.

Thanks,
Drew


Stephen Marquard wrote:
> Happily that does seem to be the case for mysql as well (which is usually worse than Oracle in index-related optimizations), e.g.
>
> mysql> explain SELECT * FROM CONTENT_RESOURCE WHERE RESOURCE_ID LIKE '/group/%';
> +----+-------------+------------------+-------+------------------------+------------------------+---------+------+--------+-------------+
> | id | select_type | table            | type  | possible_keys          | key                    | key_len | ref  | rows   | Extra       |
> +----+-------------+------------------+-------+------------------------+------------------------+---------+------+--------+-------------+
> |  1 | SIMPLE      | CONTENT_RESOURCE | range | CONTENT_RESOURCE_INDEX | CONTENT_RESOURCE_INDEX | 767     | NULL | 375212 | Using where | 
> +----+-------------+------------------+-------+------------------------+------------------------+---------+------+--------+-------------+
> 1 row in set (0.00 sec)
>
> mysql> explain SELECT * FROM CONTENT_RESOURCE WHERE RESOURCE_ID LIKE '/group/!admin/%';
> +----+-------------+------------------+-------+------------------------+------------------------+---------+------+------+-------------+
> | id | select_type | table            | type  | possible_keys          | key                    | key_len | ref  | rows | Extra       |
> +----+-------------+------------------+-------+------------------------+------------------------+---------+------+------+-------------+
> |  1 | SIMPLE      | CONTENT_RESOURCE | range | CONTENT_RESOURCE_INDEX | CONTENT_RESOURCE_INDEX | 767     | NULL |   72 | Using where | 
> +----+-------------+------------------+-------+------------------------+------------------------+---------+------+------+-------------+
> 1 row in set (0.46 sec)
>
> Cheers
> Stephen
>  
>   
>>>> "Holladay, Bryan Andrew" <bahollad at indiana.edu> 9/22/2009 7:48 PM >>> 
>>>>         
> Right sided like searches are still able to use the DBMS's indexes.  They are still really quick.  So I think this is a good idea.
>
> Bryan
>
>
> On 9/22/09 1:43 PM, "Mark Norton" <markjnorton at earthlink.net> wrote:
>
> My understanding is that SQL statements involving "like" are not very
> efficient, either.  Are you trading one problem for another?
>
> - Mark
>
> Jim Eng wrote:
>   
>> The list view in the Resources tool in Sakai 2.x has a column to show
>> who can see each resource or folder. Potential values include the
>> entire site, members of particular groups, and the public.
>>
>> Items at any level may be made public, so the question ("is this item
>> public?") must be asked for virtually every item in a site.
>> Determining whether resources and folders are open to the public is
>> relatively expensive because it must be repeated so often.  If a site
>> contains lots of resources, the expanded view can results in dozens,
>> hundreds or thousands of queries.  The only short cut at this time is
>> that the query can be avoided for items that are inside a folder that
>> is public since they will also be public.
>>
>> The SQL query is against the various realms tables.  It asks whether
>> the anonymous role has the "content.read" permission for the realm
>> associated with a particular resource or folder (or for the realm of
>> any folder that contains the item).
>>
>> We could improve on that if the SecurityService (or
>> AuthzGroupService?) interface had a method to ask a different
>> question: Are there any realms within this site for which the
>> anonymous role has "content.read" permission (and if so, what are they)?
>>
>> This query would likely be implemented with a SQL "like" operator and
>> a one-sided wildcard. This is actually a join of information from
>> three or four tables, but the query would be asking for a list of
>> sakai_realm.realm_id values like '/content/group/<side-id>/%' (or like
>> '/content/group-user/<user-id>/%' for workspaces, and it's possible
>> there are other variations) for which there are entries in the
>> sakai_realm_rl_fn table where the role_key identifies the anonymous
>> role and the function_key corresponds to "content.read". The result
>> would be a list of realm identifiers.
>>
>> That would allow Content Hosting to get a list of realms (possibly
>> empty) for items that are public.  Content Hosting could then avoid
>> making dozens or hundreds or thousands of individual queries by
>> checking that list of public realms.  An alternative to adding a
>> specific API method for this would be to execute that query in the
>> SecurityService and do some form of caching so the Security Service
>> could answer this question hundreds of times without doing hundreds of
>> individual queries.
>>
>> Thoughts?
>>
>> Jim
>>
>>     
> _______________________________________________
> sakai-dev mailing list
> sakai-dev at collab.sakaiproject.org
> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>
> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe at collab.sakaiproject.org with a subject of "unsubscribe"
>
>
>   



More information about the sakai-dev mailing list