Disclaimer: Everything in this post comes from Declan Lynch. He doesn’t really blog anymore but I felt this important enough to share so I’m just trying to put out what info I know.
So we had a big problem at the day job this week. Our main server was getting hammered with what looked like a memory leak. This server houses our main business application that Declan wrote as well as the iPad BarCode Scanner stuff that I wrote. So naturally the question was asked of me:
“What code updates have you pushed to production in the last week?”
A perfectly fair question. Apparently he’s actually read some of my code.. who knew?
Anyway as luck would have it I’ve promoted nothing for a little while. I’ve been working on some new stuff that’s not ready. I could even use SourceTree to go back and find the date of my last promotion and it was far enough away that I was pretty much in the clear! Talk about having a good alibi! haha
Anyway, as best as I can tell Declan soon figured out that it was not a memory leak but an unexplained CPU spike. Something was happening that was killing the server. And let me tell you, this is a BEEFY server. But still the CPU and the HTTP task were getting HAMMERED.
So what caused this almost catastrophic problem?
The Safari Web Browser. Specifically the “Top Sites” feature. From two users.
Here’s how Declan described it with the needed fix:
what I thought was a memory leak wasn't a memory leak, it was Safari's Top Sites feature on two different users' machines. Safari has a feature called Top Sites, as you visit different web sites it adds the site to the Top Sites list.
So for these users it added our internal site because they use it a lot. Safari is also trying to be helpful, by preloading the site to show a preview. Except in our case the sites are locked down so there is a redirect to a login page so Safari tries loading it again.
and again, and again, and again
about 6 times a second Safari was hitting our server, driving the CPU usage up till the site became unresponsive for users. Once we removed the site from the two users' Top Sites listing the server settled back down and is behaving normally again.
and now for a fancy workaround…
you can tell the browser what content to serve if it is being loaded by a ‘preview’ function using the following code.
so you could redirect the preview to a totally different site or you could redirect to an image
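For illustration, here is a minimal sketch of that kind of check. It may not match Declan's exact snippet. It assumes Safari's non-standard `navigator.loadPurpose` property, which reports `"preview"` when a page is being loaded for a Top Sites thumbnail. The function name `previewRedirectTarget` and the image path `/topsites-preview.png` are made up for the example.

```typescript
// Minimal shape of the part of `navigator` this check relies on.
// `loadPurpose` is a non-standard Safari property, so it is optional here.
interface PreviewAwareNavigator {
  loadPurpose?: string;
}

// Hypothetical placeholder: a static image to show as the Top Sites thumbnail.
const PREVIEW_IMAGE_URL = "/topsites-preview.png";

// Returns `target` when the page is being loaded as a preview, otherwise null.
function previewRedirectTarget(
  nav: PreviewAwareNavigator | undefined,
  target: string
): string | null {
  return nav !== undefined && nav.loadPurpose === "preview" ? target : null;
}

// In the browser this sends the preview loader to the static image instead of
// letting it bounce off the login redirect. Outside a browser it does nothing.
const g = globalThis as unknown as {
  navigator?: PreviewAwareNavigator;
  location?: { href: string };
};
const redirect = previewRedirectTarget(g.navigator, PREVIEW_IMAGE_URL);
if (redirect !== null && g.location !== undefined) {
  g.location.href = redirect;
}
```

The check would need to run on the page the preview loader actually lands on (here, the login page), and swapping the image URL for another site's URL gives the "totally different site" variant.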
How did he diagnose this and find the issue?
He had to put the XPages Toolbox on the production server. With the backend monitor he noticed all the hits to the same login page over and over. So he went to the user's machine and used netstat to confirm that Safari had a connection open to our server. Then he closed Safari and saw the connection drop. But the connection came back on just opening Safari, without going to our website. So he saw the Top Sites listing and removed it. No more connections.
He got the user's IP address by running ‘tell http show thread state’ on the Domino server. That shows all the threads and IP addresses.
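The "same page, same client, over and over" pattern can also be spotted programmatically if you have request records to work from. The sketch below is not what Declan did, and it does not parse the output of any Domino command. It assumes a hypothetical, already-parsed `Hit` record (client IP, path, timestamp) and flags clients whose average request rate for one path reaches a threshold such as 6 per second.

```typescript
// Hypothetical, already-parsed request record; not the format of any Domino tool.
interface Hit {
  ip: string;
  path: string;
  timestampMs: number;
}

// Returns the IPs whose average request rate for `path` is at least `minPerSecond`.
function findHotClients(hits: Hit[], path: string, minPerSecond: number): string[] {
  // Collect the timestamps of matching requests per client IP.
  const byIp: Record<string, number[]> = {};
  for (const hit of hits) {
    if (hit.path !== path) continue;
    if (byIp[hit.ip] === undefined) byIp[hit.ip] = [];
    byIp[hit.ip].push(hit.timestampMs);
  }

  const hot: string[] = [];
  for (const ip of Object.keys(byIp)) {
    const times = byIp[ip];
    const spanSeconds = (Math.max(...times) - Math.min(...times)) / 1000;
    // Treat bursts shorter than one second as one second to avoid inflated rates.
    const rate = times.length / Math.max(spanSeconds, 1);
    if (rate >= minPerSecond) hot.push(ip);
  }
  return hot;
}
```

Calling `findHotClients(hits, "/login", 6)` on such records would return only the clients hammering the login page, which is the same lead the backend monitor and the thread listing gave here.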
He does other tricks as well. I can struggle with something for a whole day and when I do break down and ask for help he often has the solution before I’m even done explaining the problem.
Yep. I have the best boss in the world! It’s like working with Sherlock Holmes.