I have an app running with:
- one instance of nginx as the frontend (serving static files)
- a cluster of Node.js application instances for the backend (using the cluster and expressjs modules)
- one instance of Postgres as the DB
Is this architecture sufficient if the application needs scalability (this is only for HTTP / REST requests) for:
500 requests per second (each request only fetches data from the DB; the data could be several KB, with no heavy computation needed after the fetch).
20000 users connected at the same time
Where could the bottlenecks be?
Problem courtesy of: Luc
For the specified load (500 simple requests/second), I wouldn’t have thought that this would be too much of a problem, and my guess would be that a cluster of Node instances will not even be necessary.
However, as you’ve only got a single database instance, that is most likely going to be your bottleneck when it comes to scaling up. You’ve also got the additional issue that it would be your single point of failure (I’m not familiar with Postgres; here we’re working with an Oracle cluster and Data Guard, which means we’ve got a backup database cluster to mitigate that).
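For what it’s worth, Postgres has an analogous mitigation: streaming replication to a hot standby. A rough primary-side configuration sketch (settings are version-dependent and the values and standby address are illustrative):

```
# postgresql.conf on the primary (illustrative values)
wal_level = replica        # 'hot_standby' on pre-9.6 versions
max_wal_senders = 3        # allow a few replication connections
wal_keep_segments = 64     # retain WAL for standbys that fall behind

# pg_hba.conf on the primary: let the standby connect (address hypothetical)
host  replication  all  192.168.1.10/32  md5
```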
If you do not require a relational data model, then something like MongoDB may be a more scalable choice.
One other thing to bear in mind is your network infrastructure. If you are going to add clusters/nodes, then make sure that the network can handle the distributed load.
One last thing: Generally, it is impossible to determine whether an application on an architecture can handle a particular load without performance/volume/stress testing, so the answer is a resounding “maybe”.
Solution courtesy of: beny23
You should be OK at 500 ops/sec. Redesign if you expect to go into the thousands of ops/sec.
Without a lot more data from you, disk I/O is most likely to be your bottleneck. On stock hardware, this will hit your PostgreSQL database at around 10k ops/sec if you’re reading from the hard drive, and things will also slow down if you’re doing a JOIN in the SQL query. It will slow down further the more concurrent users you have trying to access a single drive: your drive’s seek time is going to freak out, as you’ll constantly be accessing the drive randomly.
You should research the structure of your data and if a relational database is needed (do you have to do a JOIN?). A noSQL solution might be the way to go. Always try to get your disk I/O as distributed and sequential as possible.
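On the JOIN question, Postgres itself can tell you what a given query costs; the table and column names below are hypothetical:

```sql
-- Hypothetical schema: check whether the JOIN uses indexes or
-- falls back to sequential scans (Seq Scan nodes in the plan).
EXPLAIN (ANALYZE, BUFFERS)
SELECT u.name, o.total
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE u.id = 42;
```

If the plan shows sequential scans and heavy buffer reads on your hot queries, that is the disk I/O pressure described above.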
Discussion courtesy of: EhevuTov
This recipe can be found in its original form on Stack Overflow.