Further info on Identi.ca problems

I posted a few days ago about some technical problems with identi.ca (http://status.net/2011/11/05/overview-of-technical-problems-with-identi-ca). tl;dr version: our Web servers occasionally hit very high load and stop responding, hurting performance of the site. I've found out a few things in the last few days, so I thought I'd update here for those interested.
High load on a server can come from several causes. It can be due to heavy I/O (network connections, for example) or high CPU usage. The immediate culprit can be repeated connections to non-responsive network services, or buggy software that works inefficiently or goes into an infinite loop. The load can be spread over multiple processes, or come from a single process hogging all the resources.
Here's what we're seeing: one Apache process on our Web server shows explosive growth in memory usage. You can see an example in this ps output (http://pastebin.com/YLY59Qk3). Process 26617 has allocated 4 GB of memory, which is causing swapping to disk, which has slowed the server to a standstill.
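As a rough illustration of spotting a runaway process like this, here is a minimal sketch that ranks processes by resident memory. It assumes input in the shape printed by `ps -eo pid,rss,comm` (a header line, then PID, RSS in kilobytes, and command name). The function name and the sample rows are made up for illustration and are not taken from the pastebin output linked above.

```python
def top_memory_processes(ps_text, n=3):
    """Return the n processes with the largest resident set size (RSS).

    Assumes input shaped like `ps -eo pid,rss,comm` output:
    one header line, then "PID RSS COMMAND" with RSS in kilobytes.
    """
    rows = []
    for line in ps_text.strip().splitlines()[1:]:  # skip the header line
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # ignore malformed lines
        pid, rss_kb, command = parts
        rows.append((int(pid), int(rss_kb), command.strip()))
    # Largest RSS first
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]


# Made-up sample data for illustration only.
SAMPLE = """\
  PID     RSS COMMAND
 1001 4194304 apache2
 1002   65536 apache2
 1003   20480 cron
"""

if __name__ == "__main__":
    for pid, rss_kb, command in top_memory_processes(SAMPLE):
        print(f"{pid:>6} {rss_kb / 1024:>8.0f} MB  {command}")
```

A single process sitting far above its siblings in a listing like this is the pattern described above.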
This memory leak is kind of confusing, since we've got a 96 MB memory_limit (http://php.net/manual/en/ini.core.php) set in our PHP configuration. Theoretically, StatusNet itself shouldn't be able to allocate this much memory.
Another point that seems worth noting is that my systems team already had checks in place for this. If the server is loaded, periodic checks will kill and restart Apache. That means that largely these servers have been recovering on their own. It also means that the problem happens more frequently than I thought -- once or twice an hour, not once a day.
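The actual checks aren't shown here, but a periodic load check of this kind might look something like the sketch below. The threshold, the restart command, and the function name are all assumptions for illustration; a real setup would run this from cron or a monitoring tool.

```python
import os
import subprocess

LOAD_THRESHOLD = 20.0                       # hypothetical cutoff for "loaded"
RESTART_COMMAND = ["apachectl", "restart"]  # assumed restart command


def check_once(get_load=None, restart=None, threshold=LOAD_THRESHOLD):
    """Restart the Web server if the 1-minute load average is too high.

    get_load and restart can be swapped out for testing; by default the
    real load average is read and RESTART_COMMAND is executed.
    Returns True if a restart was triggered.
    """
    if get_load is None:
        get_load = os.getloadavg  # Unix only
    one_minute_load = get_load()[0]
    if one_minute_load <= threshold:
        return False
    if restart is None:
        subprocess.run(RESTART_COMMAND, check=True)
    else:
        restart()
    return True
```

A check like this hides the symptom rather than fixing it, which is how the problem could recur once or twice an hour without being obvious.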
At this point, I'm working on a few fronts. First, I'd like to restrict the amount of memory available to any one process on the system. That should prevent (I think) the issue of one process forcing the server to swap and dragging down everything else with it. I'm trying limits.conf (hasn't worked yet) and may venture into cgroups (http://en.wikipedia.org/wiki/Cgroups) if that doesn't work out.
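To show what a per-process memory cap does in practice, here is a small sketch using the same kernel mechanism that the `as` item in limits.conf controls (RLIMIT_AS, the address-space limit). It is a demonstration on a throwaway child process, assuming Linux, and not the configuration used on the identi.ca servers. The function name and the sizes are made up for illustration.

```python
import resource
import subprocess
import sys

# Child program for the demo: try to grab 2 GiB and report what happens.
CHILD_CODE = """
try:
    block = bytearray(2 * 1024 ** 3)
except MemoryError:
    print("MemoryError")
else:
    print("allocated")
"""


def run_with_memory_limit(code, limit_bytes):
    """Run Python source in a child process whose address space is capped."""

    def apply_limit():
        # Runs in the child just before exec; the parent is not affected.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        if hard != resource.RLIM_INFINITY:
            limit = min(limit_bytes, hard)  # cannot exceed an existing hard limit
        else:
            limit = limit_bytes
        resource.setrlimit(resource.RLIMIT_AS, (limit, limit))

    result = subprocess.run(
        [sys.executable, "-c", code],
        preexec_fn=apply_limit,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Cap the child at 1 GiB of address space, then let it ask for 2 GiB.
    print(run_with_memory_limit(CHILD_CODE, 1024 ** 3))
```

With a cap in place, the oversized allocation fails inside the offending process instead of pushing the whole machine into swap.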
Second, I've tried to mitigate some of the effects of long-running Apache processes by tuning our Apache settings (including MaxRequestsPerChild) so that each process is recycled before it can build up a lot of memory over time.
Third, I'm trying to map individual hits to Apache processes so I can determine what exactly is making that process explode. I hope that I can identify what's causing this explosive memory allocation and fix it so it doesn't happen in the first place.
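One possible way to do this mapping is to have Apache log the child process ID with each request (the `%P` format string in LogFormat) and then group the log by PID. The sketch below assumes a hypothetical log format of `%P %D "%r"` (PID, time taken in microseconds, request line). The format, the function name, and the sample lines are assumptions for illustration, not the actual logging setup.

```python
import re
from collections import defaultdict

# Matches lines written with the assumed LogFormat: %P %D "%r"
LINE = re.compile(r'^(?P<pid>\d+) (?P<usec>\d+) "(?P<request>[^"]*)"$')


def requests_by_pid(lines):
    """Group (microseconds, request line) pairs by Apache child PID."""
    by_pid = defaultdict(list)
    for line in lines:
        match = LINE.match(line.strip())
        if match:  # skip lines that don't fit the assumed format
            pid = int(match["pid"])
            by_pid[pid].append((int(match["usec"]), match["request"]))
    return dict(by_pid)


# Made-up sample log lines for illustration only.
SAMPLE_LOG = [
    '1001 15230 "GET /main/public HTTP/1.1"',
    '1002 8120 "GET /api/statuses/public_timeline.json HTTP/1.1"',
    '1001 9876543 "GET /some/slow/page HTTP/1.1"',
]

if __name__ == "__main__":
    for pid, hits in requests_by_pid(SAMPLE_LOG).items():
        print(pid)
        for usec, request in hits:
            print(f"    {usec / 1_000_000:.3f}s  {request}")
```

Once a process is seen ballooning, its PID can be looked up in a grouping like this to see which requests it served just before.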
Thanks to everyone on identi.ca for their patience while I work this out. Still hacking, I promise!

See original: StatusNet Further info on Identi.ca problems