[Building Sakai] PDF uploads to Resources
Steve Swinsburg
steve.swinsburg at gmail.com
Thu Sep 2 18:06:09 PDT 2010
To fix these I would suggest a Quartz job that finds all the affected resources, reads them out, updates the type then saves them again, all using the ContentHostingService API. Of course if it's only a handful it could be done manually.
Cheers,
Steve
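
A rough sketch of what the core of such a job might look like, in case it helps. This is only an illustration under stated assumptions: the ResourceStore interface and the InMemoryStore class are hypothetical stand-ins so the example runs without Sakai on the classpath, and the Quartz wiring is left out. In a real job the read and save steps would go through ContentHostingService (calls along the lines of editResource, setContentType and commitResource; worth checking against the javadoc for your kernel version). The filter mirrors the conditions in the query quoted below: the id contains ".pdf", is not a URL resource, and the stored type is text/url.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class PdfTypeFixSketch {

    static final String BAD_TYPE = "text/url";
    static final String PDF_TYPE = "application/pdf";

    /** Hypothetical stand-in for the part of ContentHostingService a real job would use. */
    interface ResourceStore {
        List<String> allResourceIds();
        String getContentType(String resourceId);
        void setContentType(String resourceId, String contentType);
    }

    /** True when the id looks like a PDF upload (not a URL resource) stored as text/url. */
    static boolean needsFix(String resourceId, String contentType) {
        String id = resourceId.toLowerCase(Locale.ROOT);
        return id.contains(".pdf") && !id.contains("http:") && BAD_TYPE.equals(contentType);
    }

    /** Re-types every affected resource through the store and returns the ids it changed. */
    static List<String> fixAll(ResourceStore store) {
        List<String> fixed = new ArrayList<>();
        for (String id : store.allResourceIds()) {
            if (needsFix(id, store.getContentType(id))) {
                store.setContentType(id, PDF_TYPE);
                fixed.add(id);
            }
        }
        return fixed;
    }

    /** In-memory store, for illustration only. */
    static class InMemoryStore implements ResourceStore {
        private final Map<String, String> types = new LinkedHashMap<>();

        InMemoryStore put(String id, String type) {
            types.put(id, type);
            return this;
        }

        public List<String> allResourceIds() {
            return new ArrayList<>(types.keySet());
        }

        public String getContentType(String id) {
            return types.get(id);
        }

        public void setContentType(String id, String type) {
            types.put(id, type);
        }
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore()
            .put("/group/site1/Tracking Sheet.pdf", "text/url")
            .put("/group/site1/notes.PDF", "application/binary")
            .put("/group/site1/http:__example.org_doc.pdf", "text/url");
        // Prints the ids whose type was changed to application/pdf.
        System.out.println(fixAll(store));
    }
}
```

Going through the service API rather than the database means the serialized BINARY_ENTITY value is rewritten by Sakai itself, which avoids the length-encoding problem described further down the thread.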
On 03/09/2010, at 3:47, Omer Piperdi <omer at rice.edu> wrote:
> I modified the query a little. Here is what I came up with:
>
> SELECT a.resource_id, a.file_path, a.xml,
>        a.resource_uuid, a.binary_entity
>   FROM content_resource a
>  WHERE upper(a.resource_id) LIKE upper('%.pdf%')
>    AND upper(a.resource_id) NOT LIKE upper('%http:%')
>    AND a.xml IS NULL
>    AND dbms_lob.getlength(a.binary_entity) < 2000
>    AND utl_raw.cast_to_varchar2(hextoraw(dbms_lob.substr(a.binary_entity)))
>        LIKE '%text/url%'
>
> Thanks
> Omer
>
> On 9/2/2010 12:23 PM, Omer Piperdi wrote:
>>
>> The query is good enough for now to see how many people are having this
>> issue. (But I got "ORA-06502: PL/SQL: numeric or value error: raw
>> variable length too long" when I ran the query for all PDFs.)
>>
>> We also have PDFs uploaded as application/binary. Those at least prompt
>> the user to choose a program to open the file, whereas text/url throws
>> a 404.
>>
>> Thanks again,
>> Omer
>>
>> On 9/2/2010 11:54 AM, Matthew Jones wrote:
>>> Well, there are queries you can run to *see*, but there's no straight SQL
>>> you can run to modify it. You'd have to write some code to do it.
>>> Unfortunately this data is encoded in a binary field (rather than text),
>>> which makes it faster to process but not possible to modify in place
>>> with SQL. This is BINARY_ENTITY in CONTENT_RESOURCE. I wrote about this
>>> last February [1] and provided a query for Oracle. I don't know what it
>>> would be for MySQL. You can't run a string replace on this field because
>>> the length of each element is encoded within the string (with unprintable
>>> characters), and if anything is changed to be longer or shorter it will
>>> break when it is read back. So you'd actually need to read it,
>>> de-serialize it with code, change the type, and write it back. The Java
>>> that does this is in DbContentService.java, referenced in this link, if
>>> you wanted to try, or it could probably also be done in some scripting
>>> language.
>>>
>>> An example of the binary converted to text for a PDF file is below.
>>> Unfortunately this file is recorded as having been uploaded as
>>> application/binary as well, instead of application/pdf.
>>>
>>> CHSBRE
>>> B/group/b11a03c0-b1a5-40d2-8617-63ad4aa968e9e/*LIS Tracking
>>> Sheet.pdf*)org.sakaiproject.content.types.fileUpload inherited����
>>> d e'http://purl.org/dc/elements/1.1/creatore DAV:getlastmodified
>>> 20100324000103169e DAV:get*contenttype application/binary*e
>>> SAKAI:content_priority
>>> 2e'http://purl.org/dc/elements/1.1/subjecte)http://purl.org/dc/elements/1.1/publishere!http://purl.org/dc/terms/abstracte+http://purl.org/dc/elements/1.1/alternativee
>>> CHEF:copyrightchoice I hold copyright.e
>>> CHEF:modifiedby$05d1fgf55-5qaw-4340-8f25-214a7e332097e!http://purl.org/dc/terms/audiencee
>>> DAV: . . .
>>>
>>> [1] http://collab.sakaiproject.org/pipermail/sakai-dev/2010-February/005709.html
>>>
>>> On Thu, Sep 2, 2010 at 11:39 AM, Omer Piperdi <omer at rice.edu> wrote:
>>>
>>> Is there a query that I can run against the content_resource table to
>>> find rows where resource_id has .pdf in it and the resource type is not
>>> application/pdf?
>>>
>>> Which column has the file type info?
>>>
>>> Thanks
>>> Omer
>>>
>>>
>>> On 9/1/2010 5:07 PM, Matthew Jones wrote:
>>>
>>> Yeah, it's a bug with Firefox on some platforms; there is currently no
>>> fix for Sakai.
>>>
>>> A JIRA (KNL-101) proposed using a file type detection library (like
>>> mime-util). However, it *looked* like it involved changing some APIs in
>>> the kernel, and I haven't finished fixing it yet. I'll hopefully get to
>>> looking at it again before the 2.8 freeze, but I have a number of
>>> higher local priorities before then. :(
>>>
>>> -Matthew
>>>
>>> On Wed, Sep 1, 2010 at 5:54 PM, Omer Piperdi <omer at rice.edu> wrote:
>>>
>>> We have seen PDF uploads to Resources get the file type text/url
>>> instead of application/pdf, which leaves users unable to open the
>>> file.
>>>
>>> We upgraded our Sakai kernel to 1.1.9 and are running the 2.7.x
>>> branch. This is happening mostly on a Mac with Firefox.
>>>
>>> Has anyone seen this, or does anyone have a pointer to a JIRA?
>>>
>>> Thanks
>>> Omer
>>> _______________________________________________
>>> sakai-dev mailing list
>>> sakai-dev at collab.sakaiproject.org
>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>
>>> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe at collab.sakaiproject.org with a subject of "unsubscribe"