[Building Sakai] Culling duplicates from SAM_ITEMGRADING_T

Karen Tsao ktsao at stanford.edu
Wed Mar 27 14:41:30 PDT 2013


Hi Josh,

This SQL script was not prepared by me, and Stanford doesn't have this issue.
I am afraid of giving you an answer that is wrong... Maybe it's better to
directly contact the developers who contributed the script?

Thanks,
Karen

On Tue, Mar 26, 2013 at 12:07 PM, Joshua Swink <joshua.swink at gmail.com> wrote:

>
> Hi Karen. Thanks for your response.
>
> On Thu, Mar 21, 2013 at 3:05 PM, Karen Tsao <ktsao at stanford.edu> wrote:
>
>> Hi Joshua,
>>
>> The reason the script only cleans up null SUBMITTEDDATE records is that we
>> can reliably reproduce duplicates with a null SUBMITTEDDATE (SAM-758). If
>> you have duplicates with a non-null SUBMITTEDDATE, they must have been
>> created by some other flow which we haven't found. Is this old data, or was
>> some of it created recently?
>>
>
> These items span from the time we started using Sakai - in 2005 - to the
> present. We are using the rSmart CLE, but I'm not aware of any changes they
> may have made.
>
>
>>
>> Are the SUBMITTEDDATE values different? How about other fields, e.g., the
>> score? If they are all the same except SUBMITTEDDATE, I think it should be
>> OK to remove the earliest duplicates. But I am not 100% confident. Maybe
>> you can try this in your QA instance first?
>>
>>
> Typically I see that the SUBMITTEDDATEs are the same. PUBLISHEDANSWERID is
> null except in the one with the highest ITEMGRADINGID. AUTOSCORE is 1 in
> the one with the highest ITEMGRADINGID, and 0 in the others.
>
>
>> I'm not sure whether other schools have encountered this issue. If so, I
>> hope they can give you some feedback.
>>
>>
> Additionally, even after the multiple not-null SUBMITTEDDATE items are
> removed, I still can't create the index. It turns out that by restricting
> the query to rows that correspond to a published item with TYPEID 1 or 4,
> many duplicates are missed. This is the original query:
>
> SELECT g.PUBLISHEDITEMID, g.AGENTID, g.ASSESSMENTGRADINGID,
>        g.PUBLISHEDITEMTEXTID
> FROM SAM_ITEMGRADING_T g, SAM_PUBLISHEDITEM_T i
> WHERE g.PUBLISHEDITEMID = i.ITEMID
>   AND (i.TYPEID = '1' OR i.TYPEID = '4')
> GROUP BY g.PUBLISHEDITEMID, g.AGENTID, g.ASSESSMENTGRADINGID,
>          g.PUBLISHEDITEMTEXTID
> HAVING COUNT(*) > 1
>
> And after taking out the TYPEID requirement:
>
> SELECT g.PUBLISHEDITEMID, g.AGENTID, g.ASSESSMENTGRADINGID,
>        g.PUBLISHEDITEMTEXTID
> FROM SAM_ITEMGRADING_T g
> GROUP BY g.PUBLISHEDITEMID, g.AGENTID, g.ASSESSMENTGRADINGID,
>          g.PUBLISHEDITEMTEXTID
> HAVING COUNT(*) > 1
>
> First, I removed all duplicates that were identified by the initial query
> (which selects types 1 and 4). Then I ran the query that doesn't depend on
> the type, and it returned over 20,000 results. Many of these items have up
> to 10 duplicates.
>
> Removing the duplicates isn't much of a problem, I'm just wondering what
> the risk is.
>
> Josh
>
>
>
>
>> Thanks,
>> Karen
>>
>> The script does include a commented-out statement for those particular
>>> entries:
>>>
>>>  delete from sam_itemgrading_t where itemgradingid in (
>>>         select MIN(g.itemgradingid) from sam_itemgrading_t g where
>>>         g.agentid = r.agentid and
>>>         g.publisheditemid = r.publisheditemid and
>>>         g.assessmentgradingid = r.assessmentgradingid and
>>>         g.publisheditemtextid = r.publisheditemtextid
>>>         GROUP BY g.PUBLISHEDITEMID, g.AGENTID, g.ASSESSMENTGRADINGID,
>>>                  g.PUBLISHEDITEMTEXTID
>>>  )
>>>
>>> This seems pretty reasonable, to delete the earliest submission.
>>>
>>> Josh
>>>
>>> _______________________________________________
>>> sakai-dev mailing list
>>> sakai-dev at collab.sakaiproject.org
>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>
>>> TO UNSUBSCRIBE: send email to
>>> sakai-dev-unsubscribe at collab.sakaiproject.org with a subject of
>>> "unsubscribe"
>>>
>>
>>
>
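The cull discussed in this thread can be exercised end-to-end on a toy copy of
the table. Below is a minimal, hypothetical sketch using Python's built-in
sqlite3: the table and column names come from the thread, the rows are invented,
and the keep-the-highest-ITEMGRADINGID choice follows Josh's observation that
AUTOSCORE is 1 only on the highest ITEMGRADINGID. This is not the official
Samigo cleanup script.

```python
# Toy reproduction of the duplicate hunt and cull, using an in-memory SQLite
# database. Rows are invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE SAM_ITEMGRADING_T (
        ITEMGRADINGID        INTEGER PRIMARY KEY,
        PUBLISHEDITEMID      INTEGER,
        AGENTID              TEXT,
        ASSESSMENTGRADINGID  INTEGER,
        PUBLISHEDITEMTEXTID  INTEGER,
        AUTOSCORE            REAL
    )
""")
# Three rows share the same (item, agent, grading, text) key: duplicates.
cur.executemany(
    "INSERT INTO SAM_ITEMGRADING_T VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 10, "student1", 100, 7, 0),
        (2, 10, "student1", 100, 7, 0),
        (3, 10, "student1", 100, 7, 1),  # highest ITEMGRADINGID, AUTOSCORE 1
        (4, 11, "student2", 101, 8, 1),  # not a duplicate
    ],
)

# The type-independent duplicate-finding query from the thread.
dups = cur.execute("""
    SELECT PUBLISHEDITEMID, AGENTID, ASSESSMENTGRADINGID, PUBLISHEDITEMTEXTID
    FROM SAM_ITEMGRADING_T
    GROUP BY PUBLISHEDITEMID, AGENTID, ASSESSMENTGRADINGID, PUBLISHEDITEMTEXTID
    HAVING COUNT(*) > 1
""").fetchall()
print(dups)  # the one duplicated key

# One way to cull in a single pass: keep only the row with the highest
# ITEMGRADINGID per key, deleting the earlier duplicates.
cur.execute("""
    DELETE FROM SAM_ITEMGRADING_T
    WHERE ITEMGRADINGID NOT IN (
        SELECT MAX(ITEMGRADINGID)
        FROM SAM_ITEMGRADING_T
        GROUP BY PUBLISHEDITEMID, AGENTID, ASSESSMENTGRADINGID,
                 PUBLISHEDITEMTEXTID
    )
""")
remaining = [r[0] for r in cur.execute(
    "SELECT ITEMGRADINGID FROM SAM_ITEMGRADING_T ORDER BY ITEMGRADINGID")]
print(remaining)  # [3, 4]
```

As the thread notes, a dry run of this sort in a QA instance is the safest way
to gauge the risk before touching production data.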