Forums | Mahara Community

Developers /
Collection copying issue


Shen Zhang's profile picture
Posts: 84

21 July 2015, 1:43 PM

Hi

We are on version 15.04. We had an issue today getting a class of students copying a shared collection. The message in the error_log is "...ERROR:  deadlock detected..." 

Has anyone here experienced a similar issue? I did a copy of the same collection after we had the issue and there was no problem. No errors registered in the error_log either. It seems to be a load issue.

I would really appreciate if someone could help us out here.

 

Kind regards,

Shen

Aaron Wells's profile picture
Posts: 896

22 July 2015, 7:11 PM

Well, the message "deadlock detected" comes from our ADODB database driver, and it's apparently a message that trickles up from the underlying database. Specifically, it appears to be a message that Postgres can throw. Are you using Postgres?

Here's a page in the Postgres manual about deadlocks: http://www.postgresql.org/docs/9.4/static/explicit-locking.html

So, it's probably a bug in the Mahara code, that something in our collection copying code (or page copying, or artefact copying) can cause a deadlock in the database when there ar multiple requests going at the same time.

Unfortunately, such problems can be difficult to track down. Probably a starting point would be to look at places in the code that use an UPDATE query when copying.

Cheers,

Aaron

Shen Zhang's profile picture
Posts: 84

06 August 2015, 8:53 AM

Hi Aaron,

Thanks for the reply. The query related to the deadlock error is:

 [WAR] 8e (lib/errors.php:747) Command was: UPDATE "usr" SET "salt" = ? , "passwordchange" = ? , "active" = ? , "deleted" = ? , "expiry" = ? , "expirymailsent" = ? , "lastlogin" = ? , "lastlastlogin" = ? , "lastaccess" = ? , "inactivemailsent" = ? , "staff" = ? , "admin" = ? , "firstname" = ? , "lastname" = ? , "studentid" = ? , "preferredname" = ? , "email" = ? , "profileicon" = ? , "suspendedctime" = ? , "suspendedreason" = ? , "suspendedcusr" = ? , "quota" = ? , "quotaused" = ? , "authinstance" = ? , "ctime" = ? , "showhomeinfo" = ? , "username" = ? , "password" = ? , "unread" = ? , "urlid" = ? , "probation" = ? WHERE "id" = ? and values was (1c2bccc2,1,1,0,,0,2015-07-21 12:34:04,,2015-07-21 13:11:24,0,0,0,......,759231,,,,209715200,349495,6,2015-07-13 14:10:36,1,cxs7424,,0,,0,57080), referer: WebsiteAddress/view/view.php?id=47735

You are right that we are using Postgres and it is an UPDATE query. Any idea what we can go from here?

Kind regards,

Shen

anonymous profile picture
Deleted user
Posts: 1

04 September 2015, 9:40 PM

Hi Shen, Aaron,

We too are on 15.04 and experiencing what looks to be a similar error, although we are using MySQL.

The error arises when attempting to copy pages or collections. Like Shen's post, it appears to be load related, but the 'load' isn't large numbers, even around 20-30 users seems to trigger it.

We hadn't noticed the issue before upgrading to 15.04 but have no other indication that it is related to that version.

Any help or advice is greatly appreciated, this is having a significant impact on our service.

Nic

EDIT:

Below is an example of the deadlock reports we have received (with sensitive data anonymised): 

--
 
------------------------
LATEST DETECTED DEADLOCK
------------------------
2015-09-03 10:37:41 7f715da3f700
*** (1) TRANSACTION:
TRANSACTION 213962719, ACTIVE 3 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 12 lock struct(s), heap size 2936, 6 row lock(s), undo log entries 3
MySQL thread id 6805211, OS thread handle 0x7f715dac1700, query id 304205187 <> epwar_xx update
INSERT INTO "ep_view_rows_columns" ("view", "row", "columns") VALUES (46774, 1, 3)
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 585517 n bits 728 index "ep_viewrowscolu_vie_ix" of table "epwar"."ep_view_rows_columns" trx id 213962719 lock_mode X insert intention waiting
*** (2) TRANSACTION:
TRANSACTION 213962578, ACTIVE 6 sec starting index read
mysql tables in use 3, locked 3
14 lock struct(s), heap size 2936, 7 row lock(s), undo log entries 3
MySQL thread id 6805199, OS thread handle 0x7f715da3f700, query id 304205202 <> epwar_xx updating
UPDATE "ep_usr" SET "username" <>
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 0 page no 585517 n bits 728 index "ep_viewrowscolu_vie_ix" of table "epwar"."ep_view_rows_columns" trx id 213962578 lock_mode X
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 580244 n bits 176 index "PRIMARY" of table "epwar"."ep_usr" trx id 213962578 lock_mode X locks rec but not gap waiting
*** WE ROLL BACK TRANSACTION (1)

---
Aaron Wells's profile picture
Posts: 896

07 September 2015, 6:53 PM

Okay, I've filed a bug about this issue, here: https://bugs.launchpad.net/mahara/+bug/1492919

Is this 20-30 users all trying to copy the same page/collection, or all they copying different pages & collections?

(I'm guessing since it appears to be an index deadlock, that it doesn't matter whether they're all copying the same or another one.)

anonymous profile picture
Deleted user
Posts: 32

05 September 2015, 4:24 AM

We recently upgraded to 15.04 as well. We are also having issues copying collections. I has happened twice working with faculty, where they copy collection and then wait a long time. Of course they click the copy collection button again and then nothing happens. Upon returning to the pages list, the pages have been copied, but they are not able to be viewed, and get the message that they must have encountered a bug in the system.

I really need to remedy this asap, as I will be teaching large amounts of students to copy collections soon. Please tell me where to go from here.

Shen Zhang's profile picture
Posts: 84

07 September 2015, 10:23 AM

We have been monitoring this issue for a while. What we had noticed was that whenever we had a class of students working on copying collections/pages, the CPU load shot up. It doesn't need a big number, 20 to 30 students is enough to crash the server. We ended up doubling up the core on the server. This seems to ease the problem. However, we would still spot messages in the error log "...ERROR:  deadlock detected..." 

emily Walker's profile picture
Posts: 3

07 September 2015, 10:43 PM

some extra debug if this helps

 

Sep 04 14:10:47 2015] [error] [client 137.205.88.222] [WAR] 62 (lib/dml.php:1268) Failed to get a recordset: mysqli error: [1213: Deadlock found when trying to get lock; try restarting transaction] in adodb_throw(UPDATE "ep_usr" SET "email" = 'removed email' WHERE "id" = 5055 , )
[Fri Sep 04 14:10:47 2015] [error] [client 137.205.88.222] [WAR] 62 (lib/dml.php:1268) Command was: UPDATE "ep_usr" SET "email" = ? WHERE "id" = ? and values was ('removed email,5055)
[Fri Sep 04 14:10:47 2015] [error] [client 137.205.88.222] Call stack (most recent first):
[Fri Sep 04 14:10:47 2015] [error] [client 137.205.88.222] * update_record("usr", object(stdClass)) at /var/www/xxxx/docroot/artefact/internal/lib.php:746
[Fri Sep 04 14:10:47 2015] [error] [client 137.205.88.222] * ArtefactTypeEmail->commit() at /var/www/xxxx/docroot/lib/user.php:523
[Fri Sep 04 14:10:47 2015] [error] [client 137.205.88.222] * set_profile_field(5055, "email", "removed email", true) at /var/www/xxxx/docroot/lib/user.php:2296
[Fri Sep 04 14:10:47 2015] [error] [client 137.205.88.222] * create_user(object(User), array(size 0), "xxxxx", object(AuthXmlrpc), "u1531841") at /var/www/xxxxx/docroot/auth/xmlrpc/lib.php:204


*i removed users personal email addresses from debug info above

*main auth is XML-RPC but shibboleth has been added to institution

 

 


Aaron Wells's profile picture
Posts: 896

08 September 2015, 11:45 AM

Hm, actually everyone is reporting these deadlock errors in different parts of the system, and while doing different activities. Which leads me to think that it's not a Mahara "bug" per se, but a resource usage issue.

Also leading to that conclusion, is that we have not encountered this same problem here at Catalyst IT, even though we host some very large Mahara sites. We have experienced bad slowdown when we have a whole classroom of users doing the same thing at the same time, but not deadlock-related crashes.

So there's probably not a smoking gun here with a simple fix. Although perhaps some of the database optimization patches that we've got in code review may help with this problem.

emily Walker's profile picture
Posts: 3

08 September 2015, 10:37 PM

this is another site that is seing the issue since it upgraded from 1.10 to 15.4

LATEST DETECTED DEADLOCK

------------------------

2015-09-07 16:02:28 7f0fbf5bd700

*** (1) TRANSACTION:

TRANSACTION 106095184, ACTIVE 0 sec inserting mysql tables in use 1, locked 1 LOCK WAIT 9 lock struct(s), heap size 1184, 5 row lock(s), undo log entries 2 MySQL thread id 13181220, OS thread handle 0x7f0fbeb54700, query id 170339627 localhost epsmuc_xx update INSERT INTO "ep_view_rows_columns" ("view", "row", "columns") VALUES (42332, 1, 3)

*** (1) WAITING FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 0 page no 1484099 n bits 680 index "ep_viewrowscolu_vie_ix" of table "epsmuc"."ep_view_rows_columns" trx id 106095184 lock_mode X insert intention waiting

*** (2) TRANSACTION:

TRANSACTION 106095169, ACTIVE 0 sec inserting mysql tables in use 1, locked 1

29 lock struct(s), heap size 6544, 18 row lock(s), undo log entries 12 MySQL thread id 13181219, OS thread handle 0x7f0fbf5bd700, query id 170339930 localhost epsmuc_xx update INSERT INTO "ep_view_rows_columns" ("view", "row", "columns") VALUES ('42331', 1, '1')

*** (2) HOLDS THE LOCK(S):

RECORD LOCKS space id 0 page no 1484099 n bits 680 index "ep_viewrowscolu_vie_ix" of table "epsmuc"."ep_view_rows_columns" trx id 106095169 lock_mode X

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 0 page no 1484099 n bits 680 index "ep_viewrowscolu_vie_ix" of table "epsmuc"."ep_view_rows_columns" trx id 106095169 lock_mode X insert intention waiting

*** WE ROLL BACK TRANSACTION (1)

 

I wouldnt say we are seeing this with high activity. The issues encountered were not present for these sites when they were running 1.10 last year and performing the same activities.

23 results