[Using Sakai] Sakai Error on one of our two nodes

Steve Swinsburg steve.swinsburg at gmail.com
Tue Sep 23 00:35:39 PDT 2014


What have you raised the limit to?

sent from my mobile
On 19/09/2014 4:41 PM, "Anders Nordkvist" <anders.nordqvist at his.se> wrote:

>  Hi,
>
>
>
> Unfortunately my problem seems to be persistent. The server was showing
> errors again this morning with “too many open files”, and lsof for my Sakai
> user showed 9890 open files. I think I have to rebuild the second node to
> get this to work with the search index. The other option might be to
> disable the search index, but as we don’t really know how many people use
> that function I don’t think this is an alternative.
>
>
>
>  Regards
>
> Anders Nordkvist
>
> System administrator
>
> University Of Skövde
>
> Sweden
>
>
>
>
>
>
>
> *From:* Matthew Jones [mailto:matthew at longsight.com]
> *Sent:* den 17 september 2014 19:51
> *To:* Anders Nordkvist
> *Cc:* Stephen Marquard; Sam Ottenhoff; steve.swinsburg at gmail.com;
> sakai-user at collab.sakaiproject.org
> *Subject:* Re: [Using Sakai] Sakai Error on one of our two nodes
>
>
>
> I personally wouldn't worry too much about those open files, especially on
> 2.9 if you're running search, as that seems to be what is causing it. Sakai
> 2.9 and 10 are near the 1024 file limit just after starting up because of
> all the tools included by default, so it's only to be expected that you'd
> hit that error.
>
>
>
> And the old search, prior to the switch over to elasticsearch (in 10), had
> a number of issues and is completely removed from Sakai as of 10.1. I
> wouldn't be completely surprised if it has to open up ~2000 files while
> it's indexing. Ideally it would close them off. Really, that file limit is
> meant to protect the developer and warn you if you have an actual file or
> socket leak in the code, which we have had in the past.
>
>
>
> This same conversation came up in 2011, and it just really needed a lot of
> open files to complete the indexing.
>
>
> http://collab.sakaiproject.org/pipermail/production/2011-November/001658.html
>
>
>
> I'd really look at upgrading to at least 10.1 if you want a more reliable
> search tool.
>
>
>
> On Wed, Sep 17, 2014 at 9:57 AM, Anders Nordkvist <anders.nordqvist at his.se>
> wrote:
>
>  Hi,
>
>
>
> I managed to set the limit to 10000. It worked after I added this line
>
>
>
> session required pam_limits.so
>
>
>
> to /etc/pam.d/common-session
>
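> A quick way to verify that the new limit is actually picked up (assuming
> the service account is indeed named sakai) is to open a fresh login shell
> for it, which runs the PAM session modules:
>
> su - sakai -c 'ulimit -n'
> 10000
>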
>
>
> What's strange in this case though is that the nodes are so different in
> open files:
>
>
>
> Node 2
>
> lsof -u sakai | grep -i index | wc -l
>
> 2134
>
>
>
> Node 1
>
> lsof | grep index | grep sakai | wc -l
>
> 94
>
>
>
> And the second node is increasing a lot faster.
>
>
>
>
>
> Regards
>
> Anders Nordkvist
>
> System administrator
>
> University Of Skövde
>
> Sweden
>
>
>
>
>
>
>
>
>
> *From:* Matthew Jones [mailto:matthew at longsight.com]
> *Sent:* den 16 september 2014 14:56
> *To:* Anders Nordkvist
> *Cc:* Stephen Marquard; Sam Ottenhoff; steve.swinsburg at gmail.com;
> sakai-user at collab.sakaiproject.org
>
>
> *Subject:* Re: [Using Sakai] Sakai Error on one of our two nodes
>
>
>
> You'd want to set both the hard and soft limits to make it easier. The soft
> limit is the one that can be changed later; you change it with the -S
> option. Without setting the soft limit, nothing changes.
>
>
> http://askubuntu.com/questions/162229/how-do-i-increase-the-open-files-limit-for-a-non-root-user
>
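> For example, to inspect the two limits separately and then raise the soft
> limit (a sketch, assuming the hard limit is already at least 10000):
>
> ulimit -Hn           # show the current hard limit for open files
> ulimit -Sn           # show the current soft limit for open files
> ulimit -S -n 10000   # raise the soft limit (up to the hard limit)
>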
>
>
> Is Sakai capitalized in the file? That would also be a problem, but it
> probably isn't the case here.
>
>
>
> I'm on Ubuntu 14.04, and my /etc/security/limits.conf file has this at the
> end, and it works:
>
>
>
> # End of file
>
>
>
> sakai hard nofile 65535
>
> sakai soft nofile 65535
>
>
>
> $ ulimit -n
>
> 65535
>
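> If you want to confirm the limit that the running Tomcat JVM actually got
> (rather than what a new shell reports), you can check the process directly;
> a sketch, assuming a single java process owned by the sakai user:
>
> cat /proc/$(pgrep -u sakai java)/limits | grep 'open files'
>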
>
>
> On Tue, Sep 16, 2014 at 8:17 AM, Anders Nordkvist <anders.nordqvist at his.se>
> wrote:
>
>  Hi again,
>
>
>
> I can’t seem to get “ulimit -n 10000” to work. I only get a permission
> error for the Sakai user:
>
>
>
> -su: ulimit: open files: cannot modify limit: Operation not permitted
>
>
>
> I have set the limit in “/etc/security/limits.conf” and rebooted.
>
>
>
> Sakai hard nofile 10000
>
>
>
> And I've set the “ulimit” in “tomcat/bin/setenv”:
>
>
>
> ulimit -n 10000
>
>
>
> Am I doing it wrong? It feels like I've read hundreds of pages on the net
> but can't get it right anyhow :(
>
> I'm using Ubuntu 12.04.4 LTS
>
>
>
> Regards Anders
>
>
>
> *From:* Stephen Marquard [mailto:stephen.marquard at uct.ac.za]
> *Sent:* den 16 september 2014 09:06
> *To:* Anders Nordkvist; Sam Ottenhoff; steve.swinsburg at gmail.com
> *Cc:* sakai-user at collab.sakaiproject.org
> *Subject:* RE: [Using Sakai] Sakai Error on one of our two nodes
>
>
>
> If your search indexes are somehow corrupt, then you should either disable
> search entirely (search.enable = false in sakai.properties), or delete all
> your search indexes, truncate the search tables, and do a full index
> rebuild.
>
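> The first option is a one-line change; something like this in
> sakai.properties, followed by a restart of the node:
>
> # disable the legacy search indexer on this node
> search.enable = false
>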
>
>
> Regardless of that, I’d still suggest setting the open files limit in your
> Sakai startup script to at least 10000.
>
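> For the startup script, that just means raising the limit in the same shell
> that launches Tomcat; a sketch, with the Tomcat path being an assumption
> for your installation:
>
> #!/bin/sh
> # raise the per-process open file limit before launching Tomcat
> ulimit -n 10000
> /opt/tomcat/bin/startup.sh
>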
>
>
> Regards
>
> Stephen
>
>
>
> ---
> Stephen Marquard, Learning Technologies Co-ordinator,
> Centre for Innovation in Learning and Teaching (CILT)
> University of Cape Town
> http://www.cilt.uct.ac.za
> stephen.marquard at uct.ac.za
> Phone: +27-21-650-5037 Cell: +27-83-500-5290
>
>
>
> *From:* Anders Nordkvist [mailto:anders.nordqvist at his.se
> <anders.nordqvist at his.se>]
> *Sent:* 16 September 2014 08:54 AM
> *To:* Sam Ottenhoff; steve.swinsburg at gmail.com; Stephen Marquard
> *Cc:* sakai-user at collab.sakaiproject.org
> *Subject:* RE: [Using Sakai] Sakai Error on one of our two nodes
>
>
>
> Hi,
>
>
>
> If I delete or move the indexwork files in the sakai dir on node two, would
> that solve my problems, or do you think I have to start over on node two
> with a clean Tomcat? I don’t think the problems will go away by just
> increasing the open file limit, because it seems like the index open files
> just keep on increasing. I got the “too many open files” error again this
> morning with a:
>
>
>
> lsof -u sakai | grep -i indexwork | wc -l
>
>
>
> of 4300 files.
>
>
>
>
>
> Regards
>
> Anders Nordkvist
>
> System administrator
>
> University Of Skövde
>
> Sweden
>
>
>
>
>
>
>
> *From:* sakai-user-bounces at collab.sakaiproject.org [
> mailto:sakai-user-bounces at collab.sakaiproject.org
> <sakai-user-bounces at collab.sakaiproject.org>] *On Behalf Of *Anders
> Nordkvist
> *Sent:* den 15 september 2014 15:46
> *To:* Sam Ottenhoff
> *Cc:* sakai-user at collab.sakaiproject.org
> *Subject:* Re: [Using Sakai] Sakai Error on one of our two nodes
>
>
>
> Unfortunately it seems like my index files are the ones going up without
> decreasing, so the index might be corrupted, as Steve writes:
>
>
>
> sakai at scio2:~$ lsof -u sakai | grep -i indexwork | wc -l
>
> 1585
>
> sakai at scio2:~$ lsof -u sakai | wc -l
>
> 3260
>
> sakai at scio2:~$ lsof -u sakai | wc -l
>
> 3261
>
> sakai at scio2:~$ lsof -u sakai | wc -l
>
> 3262
>
> sakai at scio2:~$ lsof -u sakai | grep -i indexwork | wc -l
>
> 1594
>
> sakai at scio2:~$ lsof -u sakai | grep -i indexwork | wc -l
>
> 1594
>
> sakai at scio2:~$ lsof -u sakai | wc -l
>
> 3242
>
> sakai at scio2:~$ lsof -u sakai | wc -l
>
> 3235
>
> sakai at scio2:~$ lsof -u sakai | grep -i indexwork | wc -l
>
> 1594
>
> sakai at scio2:~$ lsof -u sakai | grep -i indexwork | wc -l
>
> 1594
>
> sakai at scio2:~$ lsof -u sakai | grep -i indexwork | wc -l
>
> 1639
>
> sakai at scio2:~$ lsof -u sakai | wc -l
>
> 3315
>
> sakai at scio2:~$ lsof -u sakai | grep -i indexwork | wc -l
>
> 1648
>
> sakai at scio2:~$ lsof -u sakai | grep -i indexwork | wc -l
>
> 1657
>
> sakai at scio2:~$ lsof -u sakai | grep -i indexwork | wc -l
>
> 1666
>
>
>
> Regards Anders
>
>
>
> *From:* Sam Ottenhoff [mailto:ottenhoff at longsight.com
> <ottenhoff at longsight.com>]
> *Sent:* den 15 september 2014 15:14
> *To:* Anders Nordkvist
> *Cc:* Steve Swinsburg; Stephen Marquard;
> sakai-user at collab.sakaiproject.org
> *Subject:* Re: [Using Sakai] Sakai Error on one of our two nodes
>
>
>
> The ulimit of 1024 is a per-process limit and your lsof output shows
> several different processes.
>
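> Note also that lsof lists memory-mapped libraries and other entries that
> don't consume file descriptors, so a per-process descriptor count is more
> meaningful when comparing against the 1024 limit. A sketch, assuming the
> JVMs run as the sakai user and this is run as that user or root:
>
> for pid in $(pgrep -u sakai java); do
>     echo "$pid: $(ls /proc/$pid/fd | wc -l) open descriptors"
> done
>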
>
>
> On Mon, Sep 15, 2014 at 8:21 AM, Anders Nordkvist <anders.nordqvist at his.se>
> wrote:
>
>  Ok thanks,
>
>
>
> But isn't it strange that I have a 1024 limit when I check with “ulimit -a”,
> yet when I run “lsof -u sakai | wc -l” I now get 3067, which is over the
> limit?
>
>
>
> Regards Anders
>
>
>
> *From:* Steve Swinsburg [mailto:steve.swinsburg at gmail.com]
> *Sent:* den 15 september 2014 13:26
> *To:* Stephen Marquard
> *Cc:* Anders Nordkvist; sakai-user at collab.sakaiproject.org
> *Subject:* Re: [Using Sakai] Sakai Error on one of our two nodes
>
>
>
> This is pretty much a standard step now that Sakai is so large. It's
> likely the OS update and subsequent restart has reset this down to a lower
> level. Increase it as much as you like - 10000 should get you out of
> trouble.
>
> The search error is directly related to this error as it cannot get
> another file descriptor open to write search indexes. Hopefully it has not
> corrupted the index.
>
> regards,
> Steve
>
>
>
> On Mon, Sep 15, 2014 at 9:03 PM, Stephen Marquard <
> stephen.marquard at uct.ac.za> wrote:
>
>  If you have more than one Java process running, then that would be a
> factor. Are your 2 nodes on one server, or one node on each of two servers?
>
>
>
> I’d suggest you take a look at:
>
>
>
> lsof -u tomcat | grep -v jar
>
>
>
> and see if there’s anything unusual, and also add
>
>
>
> ulimit -n 5000
>
>
>
> to your Sakai startup script to see if that helps.
>
>
> Cheers
>
> Stephen
>
>
>
>
>
> ---
> Stephen Marquard, Learning Technologies Co-ordinator,
> Centre for Innovation in Learning and Teaching (CILT)
> University of Cape Town
> http://www.cilt.uct.ac.za
> stephen.marquard at uct.ac.za
> Phone: +27-21-650-5037 Cell: +27-83-500-5290
>
>
>
> *From:* Anders Nordkvist [mailto:anders.nordqvist at his.se]
> *Sent:* 15 September 2014 12:58 PM
> *To:* Stephen Marquard; sakai-user at collab.sakaiproject.org
>
>
> *Subject:* RE: Sakai Error on one of our two nodes
>
>
>
> Hi Stephen,
>
>
>
> Thanks for the tips. I get this when I run the commands:
>
>
>
> core file size          (blocks, -c) 0
>
> data seg size           (kbytes, -d) unlimited
>
> scheduling priority             (-e) 0
>
> file size               (blocks, -f) unlimited
>
> pending signals                 (-i) 63739
>
> max locked memory       (kbytes, -l) 64
>
> max memory size         (kbytes, -m) unlimited
>
> open files                      (-n) 1024
>
> pipe size            (512 bytes, -p) 8
>
> POSIX message queues     (bytes, -q) 819200
>
> real-time priority              (-r) 0
>
> stack size              (kbytes, -s) 8192
>
> cpu time               (seconds, -t) unlimited
>
> max user processes              (-u) 63739
>
> virtual memory          (kbytes, -v) unlimited
>
> file locks                      (-x) unlimited
>
> sakai at scio2:~$ lsof -u sakai | wc -l
>
> 2769
>
>
>
> If I understand this right, we have a max of 1024 open files per process,
> but the actual number of open files is 2769. Is this because there are more
> processes running?
>
>
>
> Regards Anders
>
>
>
> *From:* Stephen Marquard [mailto:stephen.marquard at uct.ac.za
> <stephen.marquard at uct.ac.za>]
> *Sent:* den 15 september 2014 12:26
> *To:* Anders Nordkvist; sakai-user at collab.sakaiproject.org
> *Subject:* RE: Sakai Error on one of our two nodes
>
>
>
> Hi Anders
>
>
>
> You have 2 different problems; one from “Too many open files” and the
> other from the search service.
>
>
>
> For the “too many open files” issue, you should see how many are being
> used and what the OS limit is on your app server. For example if your Sakai
> process runs as the tomcat user, you can run:
>
>
>
> # lsof -u tomcat | wc -l
>
> 3821
>
>
>
> and run “ulimit -a” to see the per-process OS limits. You can change these
> in your Sakai startup script, e.g. we have:
>
>
>
> # Increase max open files
>
> ulimit -n 100000
>
>
>
> which is probably totally unnecessarily large, but we definitely had to
> increase it past the default 1024 in the early days. 5000 is perhaps
> reasonable.
>
>
>
> It’s possible the “too many open files” is a symptom of another problem
> rather than just an underlying limit that you’ve run into, in which case
> you need to see what those open files are (which could include socket
> connections) and why they are getting opened and not closed.
>
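> A quick way to get an overview is to group the lsof output by type and by
> name; for example, something like:
>
> # count open entries by TYPE (REG, DIR, IPv4, unix, ...)
> lsof -u tomcat | awk '{print $5}' | sort | uniq -c | sort -rn
> # most frequently open names/paths
> lsof -u tomcat | awk '{print $NF}' | sort | uniq -c | sort -rn | head
>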
>
>
> Regards
>
> Stephen
>
>
>
> ---
> Stephen Marquard, Learning Technologies Co-ordinator,
> Centre for Innovation in Learning and Teaching (CILT)
> University of Cape Town
> http://www.cilt.uct.ac.za
> stephen.marquard at uct.ac.za
> Phone: +27-21-650-5037 Cell: +27-83-500-5290
>
>
>
> *From:* sakai-user-bounces at collab.sakaiproject.org [
> mailto:sakai-user-bounces at collab.sakaiproject.org
> <sakai-user-bounces at collab.sakaiproject.org>] *On Behalf Of *Anders
> Nordkvist
> *Sent:* 15 September 2014 12:07 PM
> *To:* sakai-user at collab.sakaiproject.org
> *Subject:* [Using Sakai] Sakai Error on one of our two nodes
>
>
>
> Hi,
>
>
>
> We have had problems with Sakai at the University of Skövde, Sweden, after
> an OS update and restart of systems last Friday. We run 2.9.x with two
> Sakai nodes, a NetScaler distributing the load in front of them, and a
> MySQL server behind them. The Sakai nodes collect information via LDAP from
> our Microsoft AD. The problem occurred several hours after the OS update
> and restart of the machines (about 11 hours). During this time you only
> have a 50/50 chance of logging in, because the NetScaler is not working
> properly and is not directing traffic to the working node. Can you guys
> please take a look at this and see if you can figure it out? This is the
> log from the beginning:
>
>
>
> 2014-09-12 22:08:07,941  WARN http-bio-8080-exec-121
> org.apache.myfaces.shared_impl.renderkit.html.HtmlImageRendererBase - ALT
> attribute is missing for : _idJsp64
>
> 2014-09-12 22:14:00,421  WARN http-bio-8080-exec-108
> com.sun.faces.renderkit.html_basic.HtmlBasicRenderer - Unable to find
> component with ID 'df_compose_title' in view.
>
> 2014-09-12 22:14:00,422  WARN http-bio-8080-exec-108
> com.sun.faces.renderkit.html_basic.HtmlBasicRenderer - Unable to find
> component with ID 'df_compose_body' in view.
>
> Sep 12, 2014 10:17:00 PM org.apache.tomcat.util.net.JIoEndpoint$Acceptor
> run
>
> SEVERE: Socket accept failed
>
> java.net.SocketException: Too many open files
>
>         at java.net.PlainSocketImpl.socketAccept(Native Method)
>
>         at
> java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
>
>         at java.net.ServerSocket.implAccept(ServerSocket.java:530)
>
>         at java.net.ServerSocket.accept(ServerSocket.java:498)
>
>         at
> org.apache.tomcat.util.net.DefaultServerSocketFactory.acceptSocket(DefaultServerSocketFactory.java:60)
>
>         at
> org.apache.tomcat.util.net.JIoEndpoint$Acceptor.run(JIoEndpoint.java:216)
>
>         at java.lang.Thread.run(Thread.java:745)
>
>
>
> Sep 12, 2014 10:17:00 PM org.apache.tomcat.util.net.JIoEndpoint$Acceptor
> run
>
> SEVERE: Socket accept failed
>
> java.net.SocketException: Too many open files
>
>         at java.net.PlainSocketImpl.socketAccept(Native Method)
>
>         at
> java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
>
>         at java.net.ServerSocket.implAccept(ServerSocket.java:530)
>
>         at java.net.ServerSocket.accept(ServerSocket.java:498)
>
>         at
> org.apache.tomcat.util.net.DefaultServerSocketFactory.acceptSocket(DefaultServerSocketFactory.java:60)
>
>         at
> org.apache.tomcat.util.net.JIoEndpoint$Acceptor.run(JIoEndpoint.java:216)
>
>         at java.lang.Thread.run(Thread.java:745)
>
>
>
> Sep 12, 2014 10:17:00 PM org.apache.tomcat.util.net.JIoEndpoint$Acceptor
> run
>
> SEVERE: Socket accept failed
>
> java.net.SocketException: Too many open files
>
>         at java.net.PlainSocketImpl.socketAccept(Native Method)
>
>         at
> java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
>
>         at java.net.ServerSocket.implAccept(ServerSocket.java:530)
>
>         at java.net.ServerSocket.accept(ServerSocket.java:498)
>
>         at
> org.apache.tomcat.util.net.DefaultServerSocketFactory.acceptSocket(DefaultServerSocketFactory.java:60)
>
>         at
> org.apache.tomcat.util.net.JIoEndpoint$Acceptor.run(JIoEndpoint.java:216)
>
>         at java.lang.Thread.run(Thread.java:745)
>
>
>
> Sep 12, 2014 10:17:00 PM org.apache.tomcat.util.net.JIoEndpoint$Acceptor
> run
>
> SEVERE: Socket accept failed
>
> java.net.SocketException: Too many open files
>
>         at java.net.PlainSocketImpl.socketAccept(Native Me
>
> ...