Software architecture for high-traffic websites
Case study - Stack Overflow
Presenter: Ngô Xuân Hòa (Novaon Adnetwork - Novanet)
Hanoi .Net Meetup
Contents
About Stack Overflow
● Founders
● Principles
SO architecture
● Beginning
● Restructure #1
● Restructure #2
Open-source Libs
● StackExchange.Redis
● Dapper
● Jil
About Stack Overflow
Founders
Jeff Atwood
Joel Spolsky
2008: Stack Overflow
2009-2011: Server Fault, Stack Exchange 1.0, Stack Exchange 2.0, Stack Overflow Careers
Rome wasn’t built in a day!
● 100+ Q&A sites
● 600+ million pageviews a month
● 3000+ requests per second
● 16+ million users
● 8+ million questions
● 40+ million answers
Principles
Performance Is a Feature
Cache All the Things!
Reinvention is OK
Stack Overflow Architecture
Two restructurings
Stack Exchange 1.0
● ASP.NET MVC
● SQL Server
● LINQ to SQL
● Wikipedia DB design
Stack Exchange Network
● LINQ to SQL
● HAProxy
● Redis
● Lucene.NET
Scale Up
● Cache everything
● Elasticsearch
● Reinvention
Stack Exchange 1.0 Structure
Load balancing: Windows NLB
Web servers: IIS Server, IIS Server
Database: SQL Server
Windows NLB
● Cons:
○ Limited to 8 nodes
○ Cannot detect a failed service
Web-tier
ASP.NET MVC
LINQ to SQL
SQL Server
● All-in-memory
● Full text search
● 16 million pageviews a month
● 3 million unique visitors a month
● 6 million visits a month
Follow none but learn from everyone!
Pros
● Simple
Cons
● Bottleneck: the SQL Server database
● High cost to scale up
Restructure #1 - Stack Exchange Network
HAProxy
Redis Cache
Lucene.NET
Tag Engine
Stack Exchange Network Structure
Components: HAProxy, IIS Servers, Redis, Database
Connections: http (HAProxy ↔ IIS Servers), protobuf (IIS Servers ↔ Redis), sql (IIS Servers ↔ Database)
Load Balancing
● HAProxy:
○ Runs on Linux
○ Free
Web-tier
ASP.NET MVC 3
LINQ to SQL
jQuery 1.4.5
Lucene.Net
Redis
● In-memory cache
● Master-slave replication
● Pub/sub messaging for notifications
3 Types of Cache
Local Cache
● Uses HttpRuntime.Cache
● Caches:
- User session
- View count
- ...
Site Cache
● Uses Redis
● Caches site data:
- Q&As
- Acceptance rates
- ...
Global Cache
● Uses Redis
● Caches system-wide data:
- User info
- Inbox
- ...
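The three scopes can be illustrated as a cache-key naming scheme. This is a minimal Python sketch, not Stack Overflow's code: the `Scope` enum, the `cache_key` helper and the `<site>:` / `global:` prefixes are hypothetical. They are chosen only to show how per-site and network-wide entries can share one Redis instance without colliding.

```python
from enum import Enum
from typing import Optional


class Scope(Enum):
    LOCAL = "local"    # in-process cache on one web server (HttpRuntime.Cache on the slide)
    SITE = "site"      # Redis, data belonging to one Q&A site
    GLOBAL = "global"  # Redis, data shared by every site in the network


def cache_key(scope: Scope, key: str, site: Optional[str] = None) -> str:
    """Build a cache key; the prefix scheme is a hypothetical convention."""
    if scope is Scope.SITE:
        if not site:
            raise ValueError("site-scoped keys need a site name")
        # The same logical key on two sites maps to two different Redis keys.
        return f"{site}:{key}"
    if scope is Scope.GLOBAL:
        # One shared entry for the whole network, regardless of site.
        return f"global:{key}"
    # Local keys never leave the process, so no prefix is needed.
    return key


print(cache_key(Scope.SITE, "question:1", site="stackoverflow"))  # stackoverflow:question:1
print(cache_key(Scope.SITE, "question:1", site="serverfault"))    # serverfault:question:1
print(cache_key(Scope.GLOBAL, "user:42:inbox"))                   # global:user:42:inbox
```

Prefixing is only one option. Separate Redis databases or instances per scope would serve the same purpose.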
Update cache flow - Local cache
Participants: Local Cache, Redis, DB, other sites
1 - On startup, subscribe to invalidation messages in Redis
2.1 - Data changes (by other sites, apps…)
2.2 - A message is sent to Redis
3 - Redis sends a notification to subscribers
4 - Get the data from the DB and update the local cache
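The invalidation flow can be sketched in Python. So that the sketch runs without a Redis server, `InMemoryPubSub` is a hypothetical stand-in for Redis PUBLISH/SUBSCRIBE. The channel name, `WebServer` and `update` are illustrative names, not taken from Stack Overflow's code. The step numbers in the comments match the list of steps.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class InMemoryPubSub:
    """Hypothetical stand-in for Redis PUBLISH/SUBSCRIBE so the sketch runs without a server."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Callable[[str], None]) -> None:
        self._handlers[channel].append(handler)

    def publish(self, channel: str, message: str) -> int:
        handlers = list(self._handlers.get(channel, []))
        for handler in handlers:
            handler(message)
        # Like Redis PUBLISH, report how many subscribers received the message.
        return len(handlers)


CHANNEL = "local-cache-invalidate"  # hypothetical channel name


class WebServer:
    def __init__(self, name: str, bus: InMemoryPubSub, db: Dict[str, str]) -> None:
        self.name = name
        self.db = db
        self.local_cache: Dict[str, str] = {}
        # Step 1: on startup, subscribe to invalidation messages.
        bus.subscribe(CHANNEL, self.on_invalidate)

    def on_invalidate(self, key: str) -> None:
        # Steps 3-4: on notification, re-read the key from the DB into the local cache.
        if key in self.db:
            self.local_cache[key] = self.db[key]
        else:
            self.local_cache.pop(key, None)

    def get(self, key: str) -> str:
        if key not in self.local_cache:
            self.local_cache[key] = self.db[key]
        return self.local_cache[key]


def update(db: Dict[str, str], bus: InMemoryPubSub, key: str, value: str) -> None:
    db[key] = value            # step 2.1: data changes
    bus.publish(CHANNEL, key)  # step 2.2: send the changed key to Redis


db = {"question:1:title": "Old title"}
bus = InMemoryPubSub()
web01, web02 = WebServer("web01", bus, db), WebServer("web02", bus, db)
web01.get("question:1:title")  # both servers now cache "Old title"
web02.get("question:1:title")
update(db, bus, "question:1:title", "New title")
print(web02.local_cache["question:1:title"])  # New title
```

In the .NET stack from the slides, the Redis side would typically go through StackExchange.Redis, where `ConnectionMultiplexer.GetSubscriber()` returns an `ISubscriber` with `Subscribe` and `Publish`. Because the message carries only the key, each subscribing server rereads that key from the database in step 4.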
Deployment flow with HAProxy
● Tell HAProxy to take the server out of rotation via a POST
● Delay to let IIS finish current requests (~5 sec)
● Stop the website
● Copy files
● Start the website
● Local testing, update local cache, etc…
● Put the server back into rotation in HAProxy via another POST
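The deployment steps can be sketched as a rolling deploy. This is a hypothetical Python outline. The HAProxy POSTs, the IIS stop/start, the file copy and the smoke test are passed in as callables because the slides do not show the actual calls. Deploying one server at a time and halting on the first failed check are assumptions.

```python
import time
from typing import Callable, List

Hook = Callable[[str], None]


def deploy_server(
    server: str,
    disable: Hook,                      # POST to HAProxy: take the server out of rotation
    enable: Hook,                       # POST to HAProxy: put the server back in rotation
    stop_site: Hook,
    copy_files: Hook,
    start_site: Hook,
    smoke_test: Callable[[str], bool],  # local testing, local-cache warm-up, etc.
    drain_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    disable(server)
    sleep(drain_seconds)                # let IIS finish in-flight requests
    stop_site(server)
    copy_files(server)
    start_site(server)
    if not smoke_test(server):
        # Assumption: a server that fails its checks stays out of rotation.
        return False
    enable(server)
    return True


def rolling_deploy(servers: List[str], **hooks) -> List[str]:
    """Deploy one server at a time and stop at the first failure (assumed policy)."""
    done: List[str] = []
    for server in servers:
        if not deploy_server(server, **hooks):
            break
        done.append(server)
    return done


log: List[tuple] = []
hooks = {
    name: (lambda s, n=name: log.append((n, s)))
    for name in ("disable", "enable", "stop_site", "copy_files", "start_site")
}
hooks["smoke_test"] = lambda s: True
hooks["sleep"] = lambda secs: None  # skip the real 5-second drain in this demo
print(rolling_deploy(["web01", "web02"], **hooks))  # ['web01', 'web02']
```

Injecting the steps keeps their order testable. Wiring in the real HAProxy and IIS calls is left open.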
Pros
● High performance
● Low-cost load balancing (HAProxy)
● Redis messaging for cache invalidation
Cons
● Too many SQL queries
● 95 million pageviews a month
● 800 requests per second
● 16 million users
Restructure #2 - Scale Up
Cache All the Things
Elasticsearch
Reinvention
Stack Exchange Network Structure
Elasticsearch
Tag Engine
Databases
Redis
HAProxy
5-level cache
Network level → Local cache → Redis cache → SQL Server cache → SSD
● Network Level: Browser cache…
● Local Cache: HttpRuntime.Cache - Cache all data in memory
● Redis Cache: Cache all data
● SQL Server Cache: Cache all data in memory (the database servers have 384GB of RAM)
Cache Flow
● Check the local cache
● If it misses, check the Redis cache and update the local cache
● If Redis doesn’t have the data either, fetch it from the database, then update the Redis cache and the local cache
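The cache flow is a read-through lookup across levels. Here is a minimal Python sketch. It assumes a plain dict shared between servers stands in for Redis and a caller-supplied `load_from_db` function stands in for SQL Server. `ReadThroughCache` is a hypothetical name.

```python
from typing import Callable, Dict, List, Optional


class ReadThroughCache:
    """Local cache -> shared (Redis-like) cache -> database."""

    def __init__(self, shared: Dict[str, str], load_from_db: Callable[[str], Optional[str]]) -> None:
        self.local: Dict[str, str] = {}  # per-server, like HttpRuntime.Cache
        self.shared = shared             # stands in for Redis (assumption: a plain dict)
        self.load_from_db = load_from_db

    def get(self, key: str) -> Optional[str]:
        # 1. Check the local cache.
        if key in self.local:
            return self.local[key]
        # 2. On a miss, check the shared cache and copy the hit into the local cache.
        if key in self.shared:
            self.local[key] = self.shared[key]
            return self.local[key]
        # 3. Otherwise fetch from the database, then fill both cache levels.
        value = self.load_from_db(key)
        if value is not None:
            self.shared[key] = value
            self.local[key] = value
        return value


db_reads: List[str] = []


def load_from_db(key: str) -> Optional[str]:
    db_reads.append(key)
    return {"question:1": "Why is my query slow?"}.get(key)


shared_cache: Dict[str, str] = {}
web01 = ReadThroughCache(shared_cache, load_from_db)
web02 = ReadThroughCache(shared_cache, load_from_db)
web01.get("question:1")  # miss everywhere: read from the DB, fill shared and local caches
web02.get("question:1")  # served from the shared cache, no DB read
print(len(db_reads))     # 1
```

The sketch has no expiry or invalidation, so a cached value stays until it is replaced. This kind of staleness is what the "data has latency" trade-off refers to, and the Redis pub/sub invalidation flow from Restructure #1 is one way to bound it. Missing keys are not cached, so repeated lookups for them still reach the database.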
Cache All the Things!
Pros
● Very, very fast (<400ms)
● Low server load:
○ IIS: 10-15% CPU usage
○ DB: 10% CPU usage
● 99% of requests served from cache
Cons
● Data has latency (cached data can lag behind the database)
● 95 million pageviews a month
● 800 requests per second
● 16 million users
Open-source Libs
• StackExchange.Redis - high-performance Redis client
• Dapper - a very fast micro-ORM
• Jil - fast JSON serializer
Reinvention is OK!
Reference sources
● http://stackoverflow.com
● http://highscalability.com
● http://codinghorror.com
● http://www.joelonsoftware.com
● http://nickcraver.com
● http://josephwoodward.co.uk/2014/02/the-architecture-of-stackoverflow/
Thank you!
Ngô Xuân Hòa
xuanhoa862001@gmail.com
