[Using Sakai] Sakai Error on one of our two nodes

Anders Nordkvist anders.nordqvist at his.se
Tue Sep 23 07:32:11 PDT 2014


Hi, I raised it to 10000. I have now rebuilt the Sakai node and it seems to work much better.

Regards Anders


Sent from my Samsung mobile.


-------- Original message --------
From: Steve Swinsburg
Date: 23-09-2014 09:35 (GMT+01:00)
To: Anders Nordkvist
Cc: sakai-user at collab.sakaiproject.org, Stephen Marquard, Matthew Jones, Sam Ottenhoff
Subject: RE: [Using Sakai] Sakai Error on one of our two nodes


What have you raised the limit to?

sent from my mobile

On 19/09/2014 4:41 PM, "Anders Nordkvist" <anders.nordqvist at his.se> wrote:
Hi,

Unfortunately my problem seems to be persistent. The server was showing errors again this morning with “too many open files”, and lsof for my Sakai user showed 9890 open files. I think I have to rebuild the second node to get this to work with the search index. The other option might be to disable the search index, but as we don’t really know how many people use that function, I don’t think this is an alternative.

Regards
Anders Nordkvist
System administrator
University Of Skövde
Sweden



From: Matthew Jones [mailto:matthew at longsight.com]
Sent: 17 September 2014 19:51
To: Anders Nordkvist
Cc: Stephen Marquard; Sam Ottenhoff; steve.swinsburg at gmail.com; sakai-user at collab.sakaiproject.org
Subject: Re: [Using Sakai] Sakai Error on one of our two nodes

I personally wouldn't worry too much about those open files, especially on 2.9 if you're running search, as that seems to be what is causing it. Sakai 2.9 and 10 are near the 1024 file limit just starting up, because of all the tools included by default, so it's only expected that you'd hit that error.

And search prior to switching over to Elasticsearch (in 10) had a number of issues, and is completely removed from Sakai as of 10.1. I wouldn't be completely surprised if it has to open up ~2000 files while it's indexing. Ideally it would close them off. Really, that file limit is meant to protect the developer and warn you if you have an actual file or socket leak in the code, which we have had in the past.
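
To double-check what limit the running JVM actually picked up (as opposed to your shell), something along these lines should show it (just a sketch, assuming one Tomcat JVM per node):

# find the Tomcat/Sakai JVM, then read its effective limits
pgrep -u sakai -f tomcat
cat /proc/<pid>/limits | grep "Max open files"    # substitute the pid printed by pgrep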

This same conversation came up in 2011, and search just really needed a lot of open files to complete the index.
http://collab.sakaiproject.org/pipermail/production/2011-November/001658.html

I'd really look at upgrading to at least 10.1 if you want a more reliable search tool.

On Wed, Sep 17, 2014 at 9:57 AM, Anders Nordkvist <anders.nordqvist at his.se> wrote:
Hi,


I managed to set the limit to 10000. It worked after I added this line



session required pam_limits.so

to /etc/pam.d/common-session
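
For reference, the full set of changes on Ubuntu ends up being roughly the following (the user name must match the Sakai account exactly; verify from a fresh login shell):

# /etc/security/limits.conf
sakai soft nofile 10000
sakai hard nofile 10000

# /etc/pam.d/common-session
session required pam_limits.so

# verify as the sakai user after logging in again
su - sakai -c 'ulimit -n'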

What's strange in this case though is that the nodes are so different in open files:

Node 2
lsof -u sakai |grep -i index | wc -l
2134

Node 1
lsof | grep index | grep sakai | wc -l
94

And the second node is increasing a lot faster.


Regards
Anders Nordkvist
System administrator
University Of Skövde
Sweden




From: Matthew Jones [mailto:matthew at longsight.com]
Sent: 16 September 2014 14:56
To: Anders Nordkvist
Cc: Stephen Marquard; Sam Ottenhoff; steve.swinsburg at gmail.com; sakai-user at collab.sakaiproject.org

Subject: Re: [Using Sakai] Sakai Error on one of our two nodes

You'd want to set both the hard and soft limits to make it easier. The soft limit is the one that can be raised later, and you change it with the -S option. Without raising the soft limit, nothing changes.
http://askubuntu.com/questions/162229/how-do-i-increase-the-open-files-limit-for-a-non-root-user
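
A quick way to see both values for the current shell:

ulimit -Hn   # hard limit
ulimit -Sn   # soft limit (plain "ulimit -n" reports this one)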

In the limits file, did you capitalize "Sakai"? That would also be a problem, but probably isn't the case.

I'm on Ubuntu 14.04, and my /etc/security/limits.conf file has this at the end, and it works:

# End of file

sakai hard nofile 65535
sakai soft nofile 65535

$ ulimit -n
65535

On Tue, Sep 16, 2014 at 8:17 AM, Anders Nordkvist <anders.nordqvist at his.se> wrote:
Hi again,

I can’t seem to get "ulimit -n 10000" to work. I only get a permission error for the Sakai user:

-su: ulimit: open files: cannot modify limit: Operation not permitted

I have set the limit in “/etc/security/limit.conf” and rebooted.

Sakai hard nofile 10000

And I’ve set the “ulimit” in “tomcat/bin/setenv”:

Ulimit -n 10000

Am I doing it wrong? It feels like I’ve read hundreds of pages on the net but can’t get it right anyhow :(
I’m using Ubuntu 12.04.4 LTS.

Regards Anders

From: Stephen Marquard [mailto:stephen.marquard at uct.ac.za]
Sent: 16 September 2014 09:06
To: Anders Nordkvist; Sam Ottenhoff; steve.swinsburg at gmail.com
Cc: sakai-user at collab.sakaiproject.org
Subject: RE: [Using Sakai] Sakai Error on one of our two nodes

If your search indexes are somehow corrupt, then you should either disable search entirely (search.enable = false in sakai.properties), or delete all your search indexes, truncate the search tables, and do a full index rebuild.
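
For the first option it is a single property; a minimal sketch of the sakai.properties change (takes effect after a restart):

# sakai.properties: turn the legacy search service off
search.enable = false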

Regardless of that, I’d still suggest setting the open files limit in your Sakai startup script to at least 10000.

Regards
Stephen

---
Stephen Marquard, Learning Technologies Co-ordinator,
Centre for Innovation in Learning and Teaching (CILT)
University of Cape Town
http://www.cilt.uct.ac.za
stephen.marquard at uct.ac.za
Phone: +27-21-650-5037 Cell: +27-83-500-5290

From: Anders Nordkvist [mailto:anders.nordqvist at his.se]
Sent: 16 September 2014 08:54 AM
To: Sam Ottenhoff; steve.swinsburg at gmail.com; Stephen Marquard
Cc: sakai-user at collab.sakaiproject.org
Subject: RE: [Using Sakai] Sakai Error on one of our two nodes

Hi,

If I delete or move the indexwork files in the Sakai dir on node two, would that solve my problems, or do you think I have to start over on node two with a clean Tomcat? I don’t think the problems will go away just by increasing the open file limit, because the number of open index files seems to just keep increasing. I got the “too many open files” again this morning with a:

lsof -u sakai | grep -i indexwork | wc -l

of 4300 files.


Regards
Anders Nordkvist
System administrator
University Of Skövde
Sweden



From: sakai-user-bounces at collab.sakaiproject.org [mailto:sakai-user-bounces at collab.sakaiproject.org] On Behalf Of Anders Nordkvist
Sent: 15 September 2014 15:46
To: Sam Ottenhoff
Cc: sakai-user at collab.sakaiproject.org
Subject: Re: [Using Sakai] Sakai Error on one of our two nodes

Unfortunately it seems like my index files are the ones going up without decreasing, so the index might be corrupted, as Steve writes:

sakai at scio2:~$ lsof -u sakai | grep -i indexwork | wc -l
1585
sakai at scio2:~$ lsof -u sakai | wc -l
3260
sakai at scio2:~$ lsof -u sakai | wc -l
3261
sakai at scio2:~$ lsof -u sakai | wc -l
3262
sakai at scio2:~$ lsof -u sakai | grep -i indexwork | wc -l
1594
sakai at scio2:~$ lsof -u sakai | grep -i indexwork | wc -l
1594
sakai at scio2:~$ lsof -u sakai | wc -l
3242
sakai at scio2:~$ lsof -u sakai | wc -l
3235
sakai at scio2:~$ lsof -u sakai | grep -i indexwork | wc -l
1594
sakai at scio2:~$ lsof -u sakai | grep -i indexwork | wc -l
1594
sakai at scio2:~$ lsof -u sakai | grep -i indexwork | wc -l
1639
sakai at scio2:~$ lsof -u sakai | wc -l
3315
sakai at scio2:~$ lsof -u sakai | grep -i indexwork | wc -l
1648
sakai at scio2:~$ lsof -u sakai | grep -i indexwork | wc -l
1657
sakai at scio2:~$ lsof -u sakai | grep -i indexwork | wc -l
1666

Regards Anders

From: Sam Ottenhoff [mailto:ottenhoff at longsight.com]
Sent: 15 September 2014 15:14
To: Anders Nordkvist
Cc: Steve Swinsburg; Stephen Marquard; sakai-user at collab.sakaiproject.org
Subject: Re: [Using Sakai] Sakai Error on one of our two nodes

The ulimit of 1024 is a per-process limit and your lsof output shows several different processes.
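
To see the count for the actual Tomcat JVM rather than everything owned by the user, something along these lines should work (just a sketch, assuming a single Java process per node):

# find the JVM's pid, then count the descriptors it actually holds
pgrep -u sakai java
ls /proc/<pid>/fd | wc -l    # substitute the pid printed by pgrep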

On Mon, Sep 15, 2014 at 8:21 AM, Anders Nordkvist <anders.nordqvist at his.se> wrote:
Ok thanks,

But isn’t it strange that I have a 1024 limit when I check with “ulimit -a”, yet when I run “lsof -u Sakai | wc -l” I now get 3067, which is over the limit?

Regards Anders

From: Steve Swinsburg [mailto:steve.swinsburg at gmail.com]
Sent: 15 September 2014 13:26
To: Stephen Marquard
Cc: Anders Nordkvist; sakai-user at collab.sakaiproject.org
Subject: Re: [Using Sakai] Sakai Error on one of our two nodes

This is pretty much a standard step now that Sakai is so large. It's likely the OS update and subsequent restart has reset this down to a lower level. Increase it as much as you like - 10000 should get you out of trouble.

The search error is directly related to this error as it cannot get another file descriptor open to write search indexes. Hopefully it has not corrupted the index.
regards,
Steve

On Mon, Sep 15, 2014 at 9:03 PM, Stephen Marquard <stephen.marquard at uct.ac.za> wrote:
If you have more than one java process running, then that would be a factor. Are your 2 nodes on one server, or is each node on its own server?

I’d suggest you take a look at:

lsof -u tomcat | grep -v jar

and see if there’s anything unusual, and also add

ulimit -n 5000

to your Sakai startup script to see if that helps.

Cheers
Stephen


---
Stephen Marquard, Learning Technologies Co-ordinator,
Centre for Innovation in Learning and Teaching (CILT)
University of Cape Town
http://www.cilt.uct.ac.za
stephen.marquard at uct.ac.za
Phone: +27-21-650-5037 Cell: +27-83-500-5290

From: Anders Nordkvist [mailto:anders.nordqvist at his.se]
Sent: 15 September 2014 12:58 PM
To: Stephen Marquard; sakai-user at collab.sakaiproject.org

Subject: RE: Sakai Error on one of our two nodes

Hi Stephen,

Thanks for the tips. I get this when I run the commands:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63739
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 63739
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
sakai at scio2:~$ lsof -u sakai | wc -l
2769

If I understand this right, we have a max of 1024 open files per process, but the actual number of open files is 2769. Is this because more processes are running?

Regards Anders

From: Stephen Marquard [mailto:stephen.marquard at uct.ac.za]
Sent: 15 September 2014 12:26
To: Anders Nordkvist; sakai-user at collab.sakaiproject.org
Subject: RE: Sakai Error on one of our two nodes

Hi Anders

You have 2 different problems; one from “Too many open files” and the other from the search service.

For the “too many open files” issue, you should see how many are being used and what the OS limit is on your app server. For example if your Sakai process runs as the tomcat user, you can run:

# lsof -u tomcat | wc -l
3821

and run “ulimit -a” to see the per-process OS limits. You can change these in your Sakai startup script, e.g. we have:

# Increase max open files
ulimit -n 100000

which is probably totally unnecessarily large, but we definitely had to increase it past the default 1024 in the early days. 5000 is perhaps reasonable.

It’s possible the “too many open files” is a symptom of another problem rather than just an underlying limit that you’ve run into, in which case you need to see what those open files are (which could include socket connections) and why they are getting opened and not closed.
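
For example, a rough breakdown by descriptor type (the TYPE column in standard lsof output) will quickly show whether it is regular files or sockets piling up; column positions assumed from default lsof formatting:

lsof -u tomcat | awk '{print $5}' | sort | uniq -c | sort -rn                        # group by TYPE (REG, sock, IPv4, ...)
lsof -u tomcat | awk '$5 == "REG" {print $9}' | sort | uniq -c | sort -rn | head     # which paths dominate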

Regards
Stephen

---
Stephen Marquard, Learning Technologies Co-ordinator,
Centre for Innovation in Learning and Teaching (CILT)
University of Cape Town
http://www.cilt.uct.ac.za
stephen.marquard at uct.ac.za
Phone: +27-21-650-5037 Cell: +27-83-500-5290

From: sakai-user-bounces at collab.sakaiproject.org [mailto:sakai-user-bounces at collab.sakaiproject.org] On Behalf Of Anders Nordkvist
Sent: 15 September 2014 12:07 PM
To: sakai-user at collab.sakaiproject.org
Subject: [Using Sakai] Sakai Error on one of our two nodes

Hi,

We have had problems with Sakai at the University of Skövde, Sweden, after an OS update and restart of systems last Friday. We are on 2.9.x and have two Sakai nodes; on top of that we have a NetScaler distributing the load, with a MySQL server behind them. The Sakai nodes collect information via LDAP from our Microsoft AD. The problem occurred several hours after the OS update and restart of the machines (about 11 hours). During this time you only have a 50/50 chance to log in, because the NetScaler is not working properly and is not directing traffic to the working node. Can you guys please take a look at this and see if you can figure it out? This is the log from the beginning:

2014-09-12 22:08:07,941  WARN http-bio-8080-exec-121 org.apache.myfaces.shared_impl.renderkit.html.HtmlImageRendererBase - ALT attribute is missing for : _idJsp64
2014-09-12 22:14:00,421  WARN http-bio-8080-exec-108 com.sun.faces.renderkit.html_basic.HtmlBasicRenderer - Unable to find component with ID 'df_compose_title' in view.
2014-09-12 22:14:00,422  WARN http-bio-8080-exec-108 com.sun.faces.renderkit.html_basic.HtmlBasicRenderer - Unable to find component with ID 'df_compose_body' in view.
Sep 12, 2014 10:17:00 PM org.apache.tomcat.util.net.JIoEndpoint$Acceptor run
SEVERE: Socket accept failed
java.net.SocketException: Too many open files
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
        at java.net.ServerSocket.implAccept(ServerSocket.java:530)
        at java.net.ServerSocket.accept(ServerSocket.java:498)
        at org.apache.tomcat.util.net.DefaultServerSocketFactory.acceptSocket(DefaultServerSocketFactory.java:60)
        at org.apache.tomcat.util.net.JIoEndpoint$Acceptor.run(JIoEndpoint.java:216)
        at java.lang.Thread.run(Thread.java:745)

Sep 12, 2014 10:17:00 PM org.apache.tomcat.util.net.JIoEndpoint$Acceptor run
SEVERE: Socket accept failed
java.net.SocketException: Too many open files
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
        at java.net.ServerSocket.implAccept(ServerSocket.java:530)
        at java.net.ServerSocket.accept(ServerSocket.java:498)
        at org.apache.tomcat.util.net.DefaultServerSocketFactory.acceptSocket(DefaultServerSocketFactory.java:60)
        at org.apache.tomcat.util.net.JIoEndpoint$Acceptor.run(JIoEndpoint.java:216)
        at java.lang.Thread.run(Thread.java:745)

Sep 12, 2014 10:17:00 PM org.apache.tomcat.util.net.JIoEndpoint$Acceptor run
SEVERE: Socket accept failed
java.net.SocketException: Too many open files
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
        at java.net.ServerSocket.implAccept(ServerSocket.java:530)
        at java.net.ServerSocket.accept(ServerSocket.java:498)
        at org.apache.tomcat.util.net.DefaultServerSocketFactory.acceptSocket(DefaultServerSocketFactory.java:60)
        at org.apache.tomcat.util.net.JIoEndpoint$Acceptor.run(JIoEndpoint.java:216)
        at java.lang.Thread.run(Thread.java:745)

Sep 12, 2014 10:17:00 PM org.apache.tomcat.util.net.JIoEndpoint$Acceptor run
SEVERE: Socket accept failed
java.net.SocketException: Too many open files
        at java.net.PlainSocketImpl.socketAccept(Native Me
...