[Building Sakai] TinyURL Service Impl Issue

Steve Swinsburg s.swinsburg at lancaster.ac.uk
Mon Jul 6 09:44:54 PDT 2009


Carl,

Sounds good, is there an existing LCG Java implementation we can  
leverage or should we roll our own?

cheers,
Steve


On 6 Jul 2009, at 12:34, Carl Hall wrote:

> Summary: There is an ever increasing probability that the collision of
> IDs in the TinyURL service will slow down the service considerably.
>
> In looking at the implementation of the TinyURL service, I've noticed
> something about the way URLs are generated.  If I'm headed in the
> wrong direction with this, please correct me.
>
> The very general approach is:
> 1) generate a random ID
> 2) check that the ID doesn't already exist
> 3) if exists, go to 1.  if not, record the ID and release it to the  
> caller.
>
> The following code for generating a URL:
>
> Random generator = new Random();
> int seed = Math.abs(generator.nextInt());
>
> has an issue with collision which is addressed by checking for the
> existence of that ID before handing it out but the algorithm run time
> will continue to grow because of the growing chance of collision over
> time.
>
> Explanation of concern:
> The first generated ID will have a 0/n chance of collision where n is
> the total number of possible IDs.  The next ID has a 1/n chance, the
> next 2/n, etc up to the last having a chance of collision being n-1/n.
> Considering n needs to be sufficiently large to have a low to zero
> rate of duplication over time, this check will take more time because
> of the number of collisions over time.  Assuming that the ID column in
> the database is indexed, the lookup time is minimized but there's
> still a call to the database lookup on each generated ID.  As the
> number of IDs generated reaches the max possible, there are possibly
> n-1 lookups before a valid ID is found.
>
> The current code recreates the Random each time which stops nextInt()
> from knowing history (ie. new random seed on each generation).  Java
> implements a linear congruential generator to generate random numbers
> which requires knowledge of the previously generated ID to be
> effective.  Math.random() creates the random seed once and reuses it
> for the life of the JVM process which is a better approach.
>
> I would suggest creating an LCG in the TinyURL service so that the
> proper range can be easily managed (better performance due to a
> narrower scope than Math.random()) and the seed can be restored after
> a restart.  I have created a prototype in Python based on the rules
> laid out in the Wikipedia article on LCG's [1] and using a
> sufficiently large prime number.  This still carries a concern of
> generating IDs across a cluster but is relatively easy to address.
>
> WDYT?
>
> [1] http://en.wikipedia.org/wiki/Linear_congruential_generator
> _______________________________________________
> sakai-dev mailing list
> sakai-dev at collab.sakaiproject.org
> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>
> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe at collab.sakaiproject.org 
>  with a subject of "unsubscribe"



More information about the sakai-dev mailing list