[Building Sakai] TinyURL Service Impl Issue

Jon Gorrono jpgorrono at ucdavis.edu
Fri Jul 17 18:56:06 PDT 2009


I'm sort of reeling over this.... are you sure?

 unaccustomed and unprepared for this kind of response...

:)



On Wed, Jul 15, 2009 at 2:51 AM, Steve
Swinsburg<s.swinsburg at lancaster.ac.uk> wrote:
> Good idea, I'll add that as an additional method where you specify the TTL
> before the URL is no longer resolved.
>
>
> cheers,
> Steve
>
> ---
> Steve Swinsburg
> Portal Systems Developer
> Centre for e-Science
> Lancaster University
> Lancaster
> LA1 4YT
>
> email: s.swinsburg at lancaster.ac.uk
> phone: +44 (0) 1524 594870
>
>
>
>
>
>
>
> On 15 Jul 2009, at 04:27, Jon Gorrono wrote:
>
>> I have been a little curious why timeboxing is not used to decrease
>> collisions for this service.... we use the service in a custom guest
>> access app we made to put registration urls etc in emails....
>> obviously we want to timebox access to the urls but a ttl can be
>> xtremely effective in reducing an otherwise skewed data set to
>> something resembling white noise... as it give a small window view of
>> a larger distribution
>>
>> Maybe an api call could change the ttl and return the probability of
>> collision during that time or some other silliness
>>
>>
>>
>> On Wed, Jul 8, 2009 at 2:52 PM, Steve
>> Swinsburg<s.swinsburg at lancaster.ac.uk> wrote:
>>>
>>> Ok I've added an alternative method to return a longer random id which
>>> makes it less guessable. The number of permutations with this combo
>>> has 30 zeroes (on a-z and 0-9).
>>>
>>> Have some fun with the calculator! http://bit.ly/gJP1X
>>>
>>> cheers,
>>> Steve
>>>
>>>
>>>
>>> On 8 Jul 2009, at 17:25, antranig at caret.cam.ac.uk wrote:
>>>
>>>> The issue with collisions occurs when you are hosting URLs which are
>>>> not protected by the container, but held by some external resource -
>>>> for example, a signup system "click here to validate your email to
>>>> Xxxxx service". The implementation I made actually runs in two modes,
>>>> you may request either "SECURE" or "SHORT" URLs by an option to the
>>>> API. The short URLs start at only a couple of characters and
>>>> are indeed sequential.
>>>> Anyway, feel free to implement your own system, it is the Sakai way
>>>> after all :)
>>>>
>>>> Cheers,
>>>> Antranig.
>>>>
>>>> Quoting Steve Swinsburg <s.swinsburg at lancaster.ac.uk>:
>>>>
>>>>> The security is handled in the container anyway so you get a
>>>>> shortened
>>>>> URL and it resolves into the full URL, redirects and Sakai takes
>>>>> over.
>>>>> Even if it was sequential it wouldn't be a security concern.
>>>>>
>>>>> With a keyspace of lowercase letters and numbers(36 chars) there are
>>>>> about 2 billion combinations, with uppercase as well there's 50
>>>>> billion. Info here on the theoretical limits:
>>>>> http://bit.ly/3QG2uj
>>>>>
>>>>> Using RandomStringUtils from commons-lang, I wrote a short method to
>>>>> iterate over a million alphanumeric combinations using letters and
>>>>> numbers of length 6 chars and didn't run into a single collision.
>>>>> Using this method and a transactional check when adding into the
>>>>> database is probably the way to go.
>>>>>
>>>>> We can also filter for bad words, BBC has an interesting list (thanks
>>>>> Seth! :)
>>>>>
>>>>> WDYT?
>>>>>
>>>>> Steve
>>>>>
>>>>>
>>>>> On 8 Jul 2009, at 12:10, Aaron Zeckoski wrote:
>>>>>
>>>>>> Is there a reason why the id generation could be more like tinyURL
>>>>>> and
>>>>>> simply be sequential and thus very short (at least initially and
>>>>>> probably for most institutions it would stay small).
>>>>>>
>>>>>> Just a thought.
>>>>>> -AZ
>>>>>>
>>>>>>
>>>>>> On Wed, Jul 8, 2009 at 5:00 PM, Matthew Jones<jonespm at umich.edu>
>>>>>> wrote:
>>>>>>>
>>>>>>> You could consider using 22 character uuids as autogenerated tinies
>>>>>>> in
>>>>>>> the url. It might not be able to be called tinyurl anymore, but it
>>>>>>> would at least be short, linkable and (much more) collision free.
>>>>>>>
>>>>>>>
>>>>>>> http://stackoverflow.com/questions/772802/storing-uuid-as-base64-string
>>>>>>>
>>>>>>> I'm not convinced it would be necessary to keep these id's from
>>>>>>> being
>>>>>>> unpredictable. The current security model should prevent anyone
>>>>>>> from
>>>>>>> accessing content they don't have permission to even guessing
>>>>>>> id's. I
>>>>>>> think eventually when a UI was designed for this it might even want
>>>>>>> to
>>>>>>> allow the user to specify a custom ID, similar to how the Site
>>>>>>> Aliases
>>>>>>> (on new site creation) works, allowing a user to specify anything
>>>>>>> they
>>>>>>> want for their site alias. (Like tinyurl.com) So this would almost
>>>>>>> guarantee collisions.
>>>>>>>
>>>>>>> So I wouldn't think that the possibility of a collision on an low
>>>>>>> use
>>>>>>> feature for an indexed column (when collisions will probably be
>>>>>>> featured in eventually anyway) likely won't be a big deal.
>>>>>>>
>>>>>>> -Matthew
>>>>>>>
>>>>>>> On Tue, Jul 7, 2009 at 7:08 PM, <antranig at caret.cam.ac.uk> wrote:
>>>>>>>>
>>>>>>>> Please take a look at the comments at the head of the
>>>>>>>> "EighteenIDGenerator"
>>>>>>>> file for some details of the maths involved in computing how many
>>>>>>>> documents will arise before a collision:
>>>>>>>>
>>>>>
>>>>
>>>> https://source.caret.cam.ac.uk/rsf/projects/PonderUtilCore/trunk/src/uk/org/ponder/hashutil/EighteenIDGenerator.java
>>>>>>>>
>>>>>>>> I wouldn't really recommend using a key space smaller than this
>>>>>>>> one
>>>>>>>> if you want to avoid collisions - although of course you will
>>>>>>>> reprobe
>>>>>>>> to be sure. Another issue with LCGs is that they are highly
>>>>>>>> predictable -
>>>>>>>> this is why I use a SecureRandom here: One of the requirements of
>>>>>>>> the
>>>>>>>> TinyURLService that I made for the work at CARET was that the ID
>>>>>>>> sequence could not be predicted - to prevent the possibility of
>>>>>>>> malicious
>>>>>>>> "hijacking" of upcoming URL slots (this was actually a system
>>>>>>>> designed
>>>>>>>> to allocate small URLs that were plaintext signup URLs routed to a
>>>>>>>> PHP
>>>>>>>> system, perhaps more risk than is present in a typical
>>>>>>>> application!)
>>>>>>>>
>>>>>>>> Note that a further issue you have with an LCG is running it in a
>>>>>>>> cluster -
>>>>>>>> even if you can lock on the particular instance you have, you
>>>>>>>> can't
>>>>>>>> guarantee that some other instance in the cluster has not
>>>>>>>> stepped on
>>>>>>>> past the slot you want. I think that the approach in my service
>>>>>>>> (a SecureRandom plus a transactional check into persistence) is
>>>>>>>> most
>>>>>>>> likely to work reasonably well in a cluster - although it hasn't
>>>>>>>> been
>>>>>>>> tested there :P
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Antranig.
>>>>>>>>
>>>>>>>>
>>>>>>>> Quoting Steve Swinsburg <s.swinsburg at lancaster.ac.uk>:
>>>>>>>>
>>>>>>>>> Carl,
>>>>>>>>>
>>>>>>>>> Sounds good, is there an existing LCG Java implementation we can
>>>>>>>>> leverage or should we roll our own?
>>>>>>>>>
>>>>>>>>> cheers,
>>>>>>>>> Steve
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 6 Jul 2009, at 12:34, Carl Hall wrote:
>>>>>>>>>
>>>>>>>>>> Summary: There is an ever increasing probability that the
>>>>>>>>>> collision of
>>>>>>>>>> IDs in the TinyURL service will slow down the service
>>>>>>>>>> considerably.
>>>>>>>>>>
>>>>>>>>>> In looking at the implementation of the TinyURL service, I've
>>>>>>>>>> noticed
>>>>>>>>>> something about the way URLs are generated.  If I'm headed in
>>>>>>>>>> the
>>>>>>>>>> wrong direction with this, please correct me.
>>>>>>>>>>
>>>>>>>>>> The very general approach is:
>>>>>>>>>> 1) generate a random ID
>>>>>>>>>> 2) check that the ID doesn't already exist
>>>>>>>>>> 3) if exists, go to 1.  if not, record the ID and release it to
>>>>>>>>>> the
>>>>>>>>>> caller.
>>>>>>>>>>
>>>>>>>>>> The following code for generating a URL:
>>>>>>>>>>
>>>>>>>>>> Random generator = new Random();
>>>>>>>>>> int seed = Math.abs(generator.nextInt());
>>>>>>>>>>
>>>>>>>>>> has an issue with collision which is addressed by checking for
>>>>>>>>>> the
>>>>>>>>>> existence of that ID before handing it out but the algorithm run
>>>>>>>>>> time
>>>>>>>>>> will continue to grow because of the growing chance of collision
>>>>>>>>>> over
>>>>>>>>>> time.
>>>>>>>>>>
>>>>>>>>>> Explanation of concern:
>>>>>>>>>> The first generated ID will have a 0/n chance of collision where
>>>>>>>>>> n is
>>>>>>>>>> the total number of possible IDs.  The next ID has a 1/n chance,
>>>>>>>>>> the
>>>>>>>>>> next 2/n, etc up to the last having a chance of collision being
>>>>>>>>>> n-1/n.
>>>>>>>>>> Considering n needs to be sufficiently large to have a low to
>>>>>>>>>> zero
>>>>>>>>>> rate of duplication over time, this check will take more time
>>>>>>>>>> because
>>>>>>>>>> of the number of collisions over time.  Assuming that the ID
>>>>>>>>>> column in
>>>>>>>>>> the database is indexed, the lookup time is minimized but
>>>>>>>>>> there's
>>>>>>>>>> still a call to the database lookup on each generated ID.  As
>>>>>>>>>> the
>>>>>>>>>> number of IDs generated reaches the max possible, there are
>>>>>>>>>> possibly
>>>>>>>>>> n-1 lookups before a valid ID is found.
>>>>>>>>>>
>>>>>>>>>> The current code recreates the Random each time which stops
>>>>>>>>>> nextInt()
>>>>>>>>>> from knowing history (ie. new random seed on each generation).
>>>>>>>>>> Java
>>>>>>>>>> implements a linear congruential generator to generate random
>>>>>>>>>> numbers
>>>>>>>>>> which requires knowledge of the previously generated ID to be
>>>>>>>>>> effective.  Math.random() creates the random seed once and
>>>>>>>>>> reuses it
>>>>>>>>>> for the life of the JVM process which is a better approach.
>>>>>>>>>>
>>>>>>>>>> I would suggest creating an LCG in the TinyURL service so that
>>>>>>>>>> the
>>>>>>>>>> proper range can be easily managed (better performance due to a
>>>>>>>>>> narrower scope than Math.random()) and the seed can be restored
>>>>>>>>>> after
>>>>>>>>>> a restart.  I have created a prototype in Python based on the
>>>>>>>>>> rules
>>>>>>>>>> laid out in the Wikipedia article on LCG's [1] and using a
>>>>>>>>>> sufficiently large prime number.  This still carries a concern
>>>>>>>>>> of
>>>>>>>>>> generating IDs across a cluster but is relatively easy to
>>>>>>>>>> address.
>>>>>>>>>>
>>>>>>>>>> WDYT?
>>>>>>>>>>
>>>>>>>>>> [1] http://en.wikipedia.org/wiki/Linear_congruential_generator
>>>>>>>>>> _______________________________________________
>>>>>>>>>> sakai-dev mailing list
>>>>>>>>>> sakai-dev at collab.sakaiproject.org
>>>>>>>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>>>>>>>
>>>>>>>>>> TO UNSUBSCRIBE: send email to
>>>>>
>>>>> sakai-dev-unsubscribe at collab.sakaiproject.org
>>>>>>>>>>
>>>>>>>>>> with a subject of "unsubscribe"
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> sakai-dev mailing list
>>>>>>>>> sakai-dev at collab.sakaiproject.org
>>>>>>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>>>>>>
>>>>>>>>> TO UNSUBSCRIBE: send email to
>>>>>
>>>>> sakai-dev-unsubscribe at collab.sakaiproject.org
>>>>>>>>>
>>>>>>>>> with a subject of "unsubscribe"
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ----------------------------------------------------------------
>>>>>>>> This message was sent using IMP, the Internet Messaging Program.
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> sakai-dev mailing list
>>>>>>>> sakai-dev at collab.sakaiproject.org
>>>>>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>>>>>
>>>>>>>> TO UNSUBSCRIBE: send email to
>>>>>
>>>>> sakai-dev-unsubscribe at collab.sakaiproject.org
>>>>>>>>
>>>>>>>> with a subject of "unsubscribe"
>>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> sakai-dev mailing list
>>>>>>> sakai-dev at collab.sakaiproject.org
>>>>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>>>>
>>>>>>> TO UNSUBSCRIBE: send email to
>>>>>
>>>>> sakai-dev-unsubscribe at collab.sakaiproject.org
>>>>>>>
>>>>>>> with a subject of "unsubscribe"
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Aaron Zeckoski (azeckoski (at) vt.edu)
>>>>>> Senior Research Engineer - CARET - University of Cambridge
>>>>>> https://twitter.com/azeckoski - http://www.linkedin.com/in/azeckoski
>>>>>> http://aaronz-sakai.blogspot.com/ - http://tinyurl.com/azprofile
>>>>>> _______________________________________________
>>>>>> sakai-dev mailing list
>>>>>> sakai-dev at collab.sakaiproject.org
>>>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>>>
>>>>>> TO UNSUBSCRIBE: send email to
>>>>>> sakai-dev-unsubscribe at collab.sakaiproject.org
>>>>>> with a subject of "unsubscribe"
>>>>>
>>>>> _______________________________________________
>>>>> sakai-dev mailing list
>>>>> sakai-dev at collab.sakaiproject.org
>>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>>
>>>>> TO UNSUBSCRIBE: send email to
>>>>> sakai-dev-unsubscribe at collab.sakaiproject.org
>>>>> with a subject of "unsubscribe"
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>> ----------------------------------------------------------------
>>>> This message was sent using IMP, the Internet Messaging Program.
>>>>
>>>
>>> _______________________________________________
>>> sakai-dev mailing list
>>> sakai-dev at collab.sakaiproject.org
>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>
>>> TO UNSUBSCRIBE: send email to
>>> sakai-dev-unsubscribe at collab.sakaiproject.org with a subject of
>>> "unsubscribe"
>>>
>>
>>
>>
>> --
>> Jon Gorrono
>> PGP Key: 5434509D
>>
>> email{>+++++++++[>+++++++++++>++++++++++++>+++++++>+++++<<<<-]>+++++++.>++++.<---.>-.+++..---.-.+.>+.<++++++.<----.+.---.>+.<++++++++.>---.>>+.<<<----.-.>++.}
>> http{ats.ucdavis.edu}
>> _______________________________________________
>> sakai-dev mailing list
>> sakai-dev at collab.sakaiproject.org
>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>
>> TO UNSUBSCRIBE: send email to
>> sakai-dev-unsubscribe at collab.sakaiproject.org with a subject of
>> "unsubscribe"
>
>



-- 
Jon Gorrono
PGP Key: 5434509D
email{>+++++++++[>+++++++++++>++++++++++++>+++++++>+++++<<<<-]>+++++++.>++++.<---.>-.+++..---.-.+.>+.<++++++.<----.+.---.>+.<++++++++.>---.>>+.<<<----.-.>++.}
http{ats.ucdavis.edu}

Sent from Davis, CA, United States


More information about the sakai-dev mailing list