[Building Sakai] radical approach to azg deadlocks

Hedrick Charles hedrick at rutgers.edu
Mon Oct 10 16:42:32 PDT 2011


I'm seriously thinking of doing a SELECT FOR UPDATE for the realm entry at the start of all the transactions in DbAuthzGroupService.java. I believe that will take care of all these problems. The cost, of course, it that it will lock out all permission checks for that realm. But deadlocks lock things for a lot longer.

However I'm also thinking I may have freaked out unnecessarily. I'm trying to track down a problem that's caused my system to pause for like a minute. I did "show processlist" and found a bunch of threads waiting for SELECT on the realm tables. One was always in UPDATE or INSERT. When I saw that we're seeing deadlocks on those tables, I connected the two. The problem I see is that the deadlocks seem to be for row locks. So it's not clear that they would cause everything to hang. Unfortunately it's hard to get enough information when this happens. The only way I can picture this happening is that there are subselects on REALM_ROLE and some other things. It's conceivable that there is a lock on a commonly used role or something else that blocks all realms.

So I'm currently debating whether it's worth doing SELECT FOR UPDATE to serialize update operations for individual realms, which I would think would clean things up better than rewriting all the individual queries to try and make them all deadlock-free.


On Oct 10, 2011, at 7:05:28 PM, Hedrick Charles wrote:

> And just to make a complete set, I've just seen a deadlock on SAKAI_REALM alone, again two updates in different threads using the same key.
> 
> The usual problem upgrading a shared lock to exclusive.
> 
> 
> On Oct 10, 2011, at 5:38:21 PM, Hedrick Charles wrote:
> 
>> Note that these are two threads trying to add the same record. This is documented in the manual as leading to a deadlock. The suggested solution is ON DUPLICATE KEY UPDATE. That causes an exclusive lock to be used rather than a shared one, and thus there is no need to upgrade from shared to exclusive, and the problem does not occur.
>> 
>> My sense is that the previous problem I reported can be fixed by not doing a subquery and this one by ON DUPLICATE KEY UPDATE. The first change is most easily done in the common code. ON DUPLICATE KEY UPDATE may be better if confined to the Mysql SQL.
>> 
>> I'm just conjecturing that when doing INSERT with a subquery they need locks on the table involved in the subquery. That's the only way I can see to explain the problem. I still can't explain the whole deadlock however, because the information doesn't show what locks the UPDATE owns. But it seems to be a good bet that removing the subquery will keep the UPDATE SAKAI_REALM from interacting with the INSERT SAKAI_REALM_RL_GR. 
>> 
>> 
>> On Oct 10, 2011, at 4:29:45 PM, Hedrick Charles wrote:
>> 
>>> Here's one on just SAKAI_REALM_RL_GR. It wouldn't be so bad if it didn't cause all reads of SAKAI_REALM_RL_GR to hang. I'm kind of surprised at that.
>>> 
>>> It looks like this fairly straightforward INSERT attempts to get a shared lock and then upgrade it to exclusive. This causes obvious problems if two threads do the same thing at the same time.
>>> 
>>> 
>>> *** (1) TRANSACTION:
>>> TRANSACTION 4 3826138716, ACTIVE 50 sec, OS thread id 30971 inserting
>>> mysql tables in use 3, locked 3
>>> LOCK WAIT 8 lock struct(s), heap size 1216, 4 row lock(s)
>>> MySQL thread id 131400, query id 13271312947 sakai-prod1.oirt.rutgers.edu 172.16.75.16 sakaiuser Sending data
>>> INSERT INTO SAKAI_REALM_RL_GR (REALM_KEY, USER_ID, ROLE_KEY, ACTIVE, PROVIDED) VALUES ( (SELECT REALM_KEY FROM SAKAI_REALM WHERE REALM_ID = x'2F736974652F36303063313134302D333839302D343237342D303063612D646431376433373130376261'), x'39336633303631652D353634652D346662632D393562372D366337386630393664613165',  (SELECT ROLE_KEY FROM SAKAI_REALM_ROLE WHERE ROLE_NAME = x'53747564656E74'), x'31', x'30')
>>> *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
>>> RECORD LOCKS space id 0 page no 3160597 n bits 288 index `PRIMARY` of table `sakai`.`SAKAI_REALM_RL_GR` trx id 4 3826138716 lock_mode X locks gap before rec insert intention waiting
>>> Record lock, heap no 70 PHYSICAL RECORD: n_fields 7; compact format; info bits 0
>>> 0: len 4; hex 80009ad9; asc     ;; 1: len 30; hex 39336639666364642d303463392d346438342d613963662d343230333936; asc 93f9fcdd-04c9-4d84-a9cf-420396;...(truncated); 2: len 6; hex 000000003481; asc     4 ;; 3: len 7; hex 800013401c0ff5; asc    @   ;; 4: len 4; hex 80000008; asc     ;; 5: len 1; hex 31; asc 1;; 6: len 1; hex 30; asc 0;;
>>> 
>>> *** (2) TRANSACTION:
>>> TRANSACTION 4 3826138710, ACTIVE 50 sec, OS thread id 30912 inserting, thread declared inside InnoDB 1
>>> mysql tables in use 3, locked 3
>>> 8 lock struct(s), heap size 1216, 4 row lock(s)
>>> MySQL thread id 131341, query id 13271312939 sakai-prod1.oirt.rutgers.edu 172.16.75.16 sakaiuser Sending data
>>> INSERT INTO SAKAI_REALM_RL_GR (REALM_KEY, USER_ID, ROLE_KEY, ACTIVE, PROVIDED) VALUES ( (SELECT REALM_KEY FROM SAKAI_REALM WHERE REALM_ID = x'2F736974652F36303063313134302D333839302D343237342D303063612D646431376433373130376261'), x'39336633303631652D353634652D346662632D393562372D366337386630393664613165',  (SELECT ROLE_KEY FROM SAKAI_REALM_ROLE WHERE ROLE_NAME = x'53747564656E74'), x'31', x'30')
>>> *** (2) HOLDS THE LOCK(S):
>>> RECORD LOCKS space id 0 page no 3160597 n bits 288 index `PRIMARY` of table `sakai`.`SAKAI_REALM_RL_GR` trx id 4 3826138710 lock mode S locks gap before rec
>>> Record lock, heap no 70 PHYSICAL RECORD: n_fields 7; compact format; info bits 0
>>> 0: len 4; hex 80009ad9; asc     ;; 1: len 30; hex 39336639666364642d303463392d346438342d613963662d343230333936; asc 93f9fcdd-04c9-4d84-a9cf-420396;...(truncated); 2: len 6; hex 000000003481; asc     4 ;; 3: len 7; hex 800013401c0ff5; asc    @   ;; 4: len 4; hex 80000008; asc     ;; 5: len 1; hex 31; asc 1;; 6: len 1; hex 30; asc 0;;
>>> 
>>> *** (2) WAITING FOR THIS LOCK TO BE GRANTED:
>>> RECORD LOCKS space id 0 page no 3160597 n bits 288 index `PRIMARY` of table `sakai`.`SAKAI_REALM_RL_GR` trx id 4 3826138710 lock_mode X locks gap before rec insert intention waiting
>>> Record lock, heap no 70 PHYSICAL RECORD: n_fields 7; compact format; info bits 0
>>> 0: len 4; hex 80009ad9; asc     ;; 1: len 30; hex 39336639666364642d303463392d346438342d613963662d343230333936; asc 93f9fcdd-04c9-4d84-a9cf-420396;...(truncated); 2: len 6; hex 000000003481; asc     4 ;; 3: len 7; hex 800013401c0ff5; asc    @   ;; 4: len 4; hex 80000008; asc     ;; 5: len 1; hex 31; asc 1;; 6: len 1; hex 30; asc 0;;
>>> 
>>> *** WE ROLL BACK TRANSACTION (2)
>>> ------------
>>> 
>>> 
>> 
> 



More information about the sakai-dev mailing list