On scaling Stockopedia

Friday, Nov 22 2013 by Edward Croft

Stockopedia has grown from a hobby project on a single web server to a full-blown web application with over 10 servers, each running different components of our tech stack. We've had to grow our systems to meet the demands of the thousands of subscribers who are now using the product. We are ambitious in our plans and try to be as forward-thinking in our architecture as possible, but this is one huge learning curve for all of us, and very occasionally something jumps out and bites us when we least expect it.

This morning we suffered a nasty four-hour downtime period, for which I hugely apologise. We had had a few issues earlier this week, but it took this morning's downtime for us to finally realise that the root cause was a hardware failure at a critical part of the stack. Thankfully we managed to find a temporary solution to bring the site back live, while a more permanent fix will be deployed tonight.

What we've found over the last couple of years is that every time we have a problem, the team is able to come up with a very robust solution to make sure it doesn't happen again. Each time we witness a failure, it strengthens the product for the future. I guess we can call these growing pains, and I hope that subscribers can be patient while we go through these changes. The more we grow, the more we can invest and the better the service will be for all subscribers in future.

I wanted to reiterate to subscribers that all data, screens, folios and customisations are backed up and secured offline every hour of the day. Outages are due to technical issues in serving our financial and website databases and have never led to any data loss. We take massive precautions in all of this and will continue to improve the robustness of our stack to bring you the service you trust us to provide.
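For the technically minded, the shape of that hourly job is roughly as follows. This is a simplified sketch only - the dump command, database name and paths are illustrative stand-ins, not our actual setup:

import shutil
import subprocess
from datetime import datetime, timezone
from pathlib import Path

DUMP_CMD = ["pg_dump", "--format=custom", "example_db"]  # hypothetical engine and DB name
LOCAL_DIR = Path("/var/backups")                         # hypothetical local staging area
OFFSITE_DIR = Path("/mnt/offsite-backups")               # hypothetical off-site mount

def hourly_backup() -> Path:
    # Dump the database to a timestamped file, then copy it off the box.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    dump_file = LOCAL_DIR / f"db-{stamp}.dump"
    with dump_file.open("wb") as fh:
        subprocess.run(DUMP_CMD, stdout=fh, check=True)  # fail loudly if the dump breaks
    shutil.copy2(dump_file, OFFSITE_DIR / dump_file.name)  # keep a copy off the server
    return dump_file

if __name__ == "__main__":
    hourly_backup()  # scheduled to run once an hour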

We've managed a 98% uptime record for our services over the last year, and the last time we had a full day offline was over 12 months ago. This actually compares very favourably with other Software-as-a-Service businesses, which average around 97% - many Twitter users will remember the infamous 'Fail Whale'! Of course, when subscribers demand access to all of…
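To put those uptime percentages in perspective, here is the back-of-the-envelope arithmetic (pure illustration, not a measurement of our own figures):

HOURS_PER_YEAR = 24 * 365  # ignore leap years for a rough figure

for uptime in (0.98, 0.97, 0.999):
    downtime_hours = HOURS_PER_YEAR * (1 - uptime)
    print(f"{uptime:.1%} uptime = roughly {downtime_hours:.0f} hours of downtime a year")

# 98.0% uptime = roughly 175 hours of downtime a year
# 97.0% uptime = roughly 263 hours of downtime a year
# 99.9% uptime = roughly 9 hours of downtime a year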




17 Comments on this Article

JKeat 22nd Nov '13 1 of 17

"What we've found over the last couple of years is that every time we have a problem the team is able to come up with a very robust solution to make sure it doesn't happen again"

Anti-fragility! :)

On another note, there's a bug I previously reported on the Valuation Chart in the Folio Analysis page that still hasn't been fixed. It's been a while now, and I know you were all busy with the Investor Show prep previously.

Will it be looked at anytime soon?

Edward Croft 22nd Nov '13 2 of 17

In reply to post #79354

Indeed - we do know about that one - and we promised someone else it would be fixed this week! I'm still hoping we can.

Trigger 22nd Nov '13 3 of 17

The problem you face is not just the occasional hardware glitch but, as you imply, one of growth: growth in terms of both the products you offer and the number of users. Both will put more strain on your system, and I'm guessing your users will expect better than 98% uptime.

To be able to expand at will and provide decent uptime requires a fully flexible and redundant/failover architecture, which is very expensive!! That's without thinking about the resources you need to maintain and update it. One way of achieving this is to outsource the server provision to a provider with large datacentre capabilities, of which you are just a small part. As long as the segregation/security is to your approval and the SLAs are appropriate, you can achieve a lot.

If you ever went down this route, you should seek advice on structuring your contracts, as this is the key to success, cost savings (hopefully) and deciding what resource you should retain.

Also, virtualisation, if you don't already use it, can be great for both utilisation growth and failover/recovery, even on the fly. It was pretty impressive when I left IT two years ago, so it should do some fab stuff now.

Just a thought.

lightningtiger 22nd Nov '13 4 of 17

Faults happen from time to time & need fixing. Well done for fixing the fault in such a short time.

I have had an intermittent fault with my BT line, and my broadband kept cutting out for the last month. Being an ex-BT engineer, I managed to find the fault myself. It took 6 BT engineers to attempt to find the fault, and they all failed. The line tested OK on 6 occasions when the fault did not show up. I had to point out where the fault was to the last engineer so that it could be fixed.
It is at last working properly now, thank goodness. So well done &.............

Cheers from Lightningtiger

Calalily 22nd Nov '13 5 of 17

I think you did well under the pressure of expanding. Something has to give a little in these circumstances, and you did well to get it under control and keep up communication. Well done Ed, Dave & team!

slartybartfast 22nd Nov '13 6 of 17

The only comment I would make is that, despite continually seeing a message directing me to Twitter, there were no updates there. I see that the first one was posted at 08:13.

BrianGeee 22nd Nov '13 7 of 17

Stockopedia is a useful site, but probably not critical to most users. I'd prefer the staff to keep pushing ahead with good quality s/w development, and don't mind too much if there's a h/w failure once in a while.

PhilH 22nd Nov '13 8 of 17

Any successful online service needs to build resilience ... particularly against DoS attacks (which was my first thought today). If it wasn't a DoS attack, then it's only a matter of time.

Professional Services: Sunflower Counselling
mpat89 22nd Nov '13 9 of 17

You guys should just get a rack in a data centre (e.g. TeleCity) and build a private cloud. That way you are immune to any single hardware failure and suddenly you have an extremely scalable system (essentially it's your own AWS). I would have presumed you were already doing this - is this not the case? If not, why not?! :)

Professional Services: Web hosting
Edward Croft 23rd Nov '13 10 of 17

In reply to post #79384

mpat89 - we actually spent a month deploying on Amazon's cloud, but found it wasn't fast enough - certainly not for our purposes, which require massive amounts of computation in very short spaces of time. The input/output stress on the computation servers and databases requires extremely high-specification servers, and off-the-shelf solutions didn't cut it. So yes, we decided to build our own private cloud. But you can get rainy days even in the cloud ;-)
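To give a flavour of the kind of check that drives that decision, here's a toy sketch of timing random reads against a scratch file. It is purely illustrative, not our actual benchmarking (real tests use direct-I/O tooling, since the operating system's page cache flatters numbers like these):

import os
import random
import time

PATH = "scratch.bin"          # hypothetical test file
SIZE = 256 * 1024 * 1024      # 256 MB of throwaway data
BLOCK = 4096                  # read in 4 KB chunks

# Write some scratch data to read back.
with open(PATH, "wb") as fh:
    fh.write(os.urandom(SIZE))

fd = os.open(PATH, os.O_RDONLY)
offsets = [random.randrange(0, SIZE - BLOCK) for _ in range(10_000)]

start = time.perf_counter()
for offset in offsets:
    os.pread(fd, BLOCK, offset)   # one random 4 KB read
elapsed = time.perf_counter() - start

os.close(fd)
os.remove(PATH)
print(f"{len(offsets)} random reads in {elapsed:.2f}s "
      f"(~{elapsed / len(offsets) * 1e6:.0f} microseconds per read)")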

mpat89 23rd Nov '13 11 of 17

Ha yeah. The problem with IT infrastructure is that something always goes wrong. Problems always end up taking longer to fix than expected. Personally all I care about at the moment are these upcoming white papers on stock ranks. :)

Professional Services: Web hosting
Bdroop 24th Nov '13 12 of 17

Mirrored servers? Lots of 'em! Whilst it was unfortunate, it's normal. Glad to hear you're growing! Hopefully scale can at some point improve the product further still.

corrsfan 24th Nov '13 13 of 17

Have to agree on the SLAs as above - you do get what you pay for. I used to be in Dell server tech support. Those that paid for 4hr support - as long as they weren't miles from anywhere during a blizzard - would pretty much get the parts and an engineer within 4hrs, and the likes of TeleCity would typically get both within 2hrs. If the parts weren't available locally they would be flown in from elsewhere to get there within the 4hrs (even on Christmas Day!). For those that didn't want to pay for 4hr support, we would only commit to 5pm the next working day. So you'd have unfortunate souls phone in at 4:30pm on Friday, being told Tuesday 9am-5:30pm, and if the part wasn't there on Tuesday, better make that Wednesday.

I'm sure you'll find a happy medium to manage all of this between SLAs and spare/redundant hardware across the different resource pools you'll have for web, database and storage as you grow.

dunno 26th Nov '13 14 of 17

In reply to post #79361

"Also, virtualisation, if you don't already use it, can be great for both utilisation growth and failover/recovery, even on the fly. It was pretty impressive when I left IT two years ago, so it should do some fab stuff now."

Agreed. Buy (or lease, or rent space on ...) a few servers and run 2 or 3 virtual machines on each - or just 1 if it's a resource hog. This makes it easy to move apps around if they need more power or hardware goes down. We buy pre-owned but still-in-warranty hardware - Intel/Linux machines nowadays, I think - and you can get some powerful kit for very reasonable prices that way.

 

DJLJ23 6th Dec '13 15 of 17

Hi,

I have found this site incredibly useful and had been thinking about using the European information, but given how unreliable the site is currently, that's on hold.
Could you share with us your plans to improve the service, please?

Square Mile Junky 30th Dec '13 16 of 17

Defining a scalable architecture for a service provisioned through the cloud is a demanding job. I have worked for 24/7 online gambling companies and for web and email scanning companies, and latency can be a killer as you grow - especially if any number crunching is involved. At some point you have to bite the bullet and migrate to a fully scalable architecture. The companies I worked for were not keen on VMs, and our network architect described Amazon Cloud as a place for hobbyists :-). You are right to build your own cloud. Another option is to go with a provider that shares your passion for data and wants to partner with you in providing best-in-class service - let them focus on exceeding any SLAs you set. You have a great product, and I am sure you are already on the case.

Edward Croft 30th Dec '13 17 of 17

In reply to post #80181

Hi Square Mile Junky. Just catching up here. Yes we tried the Amazon Cloud, but the latencies were really dreadful for the kind of heavy computation that we do every night. We spent 4 weeks re-building everything on Amazon's cloud only to pull the plug. Amazon is great for consumer mobile apps that just require massive on demand storage, but not necessarily much manipulation once the data is stored. We've found our problem doesn't suit their solution - but glad we tried it.

Thanks for the support - we are constantly investing in and improving the service. Most of the work we do goes unseen by everyone using the site. It's a big job to build out something that crunches gigabytes of data for 8 hours every night and presents it all speedily to thousands of end users during the day. We've already put in a lot of processes since November that have brought our uptime back to, and beyond, where it was for the 18 months before November. But we've certainly further to go.
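The broad pattern is to do the heavy lifting overnight, store the results in a read-optimised form, and let the daytime site simply look them up. A toy sketch of that shape - the table, column names and scores below are made up for illustration, and the real pipeline is far more involved:

import sqlite3

def nightly_batch(conn: sqlite3.Connection, raw_scores: dict[str, float]) -> None:
    # Overnight job: crunch the raw data and write precomputed results.
    conn.execute("CREATE TABLE IF NOT EXISTS ranks (ticker TEXT PRIMARY KEY, rank REAL)")
    ranked = sorted(raw_scores.items(), key=lambda item: item[1], reverse=True)
    rows = [(ticker, 100.0 * (len(ranked) - i) / len(ranked))
            for i, (ticker, _) in enumerate(ranked)]          # percentile-style rank
    conn.executemany("INSERT OR REPLACE INTO ranks VALUES (?, ?)", rows)
    conn.commit()

def daytime_lookup(conn: sqlite3.Connection, ticker: str) -> float | None:
    # Daytime request: a single cheap read, no recomputation.
    row = conn.execute("SELECT rank FROM ranks WHERE ticker = ?", (ticker,)).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
nightly_batch(conn, {"AAA": 1.2, "BBB": 3.4, "CCC": 0.7})   # made-up scores
print(daytime_lookup(conn, "BBB"))                           # -> 100.0, the top-ranked ticker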

We'll also be making some massive changes in 2014 - adding 6,000 US stocks and shifting our whole codebase onto a new framework - so I can't promise there won't be further disruption. I hope, though, that everyone can enjoy the journey even if we have the odd stumble - nobody ever grew up without a few bruised knees!

