A few months ago, a client approached us with the following complaints from users:
“Video is slow…”
“Taking too long to upload!”
On the surface, these issues seemed network-related. But a deeper analysis soon revealed several distinct problems, each needing its own solution.
Let’s look at what was going on under the hood. The whole application, including video assets, was hosted on a single server. When traffic increased, this server struggled to handle the demand, which often resulted in errors. These were the main problems we found:
- Failure during high traffic
- Limited storage capacity on the instance
- Slow streaming
- Large video uploads taking too long
- Running compute-intensive and time-consuming tasks inside the request/response cycle:
  - Video transcoding
  - Thumbnail creation from video
  - Thumbnail resizing with ratio correction
  - Sending a large number of email notifications
  - Cache invalidation
  - Indexing content in the search engine
- Inability to scale horizontally to meet demand
After analysing the problem, it was clear that we had to re-architect a whole section of the application. We took a phased approach so that user experience would improve as soon as possible.
Step One: High Impact, Low Effort
It was obvious how we could improve streaming performance and remove the storage limitation: move video storage to a cloud storage service and deliver video through a CDN. This provides excellent streaming performance, high durability, and massive storage capacity. It also reduces the load on the application server, which no longer has to store or serve the video files itself.
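As a rough illustration of this step, the sketch below maps a video's key in cloud storage to the URL a player would request from the CDN. The hostname `cdn.example.com`, the key layout, and the function name `cdn_url_for` are all assumptions made for the example; the actual provider and naming scheme are not part of this post.

```python
from urllib.parse import quote

# Hypothetical CDN hostname; the real provider is not named in this post.
CDN_BASE_URL = "https://cdn.example.com/"


def cdn_url_for(storage_key: str) -> str:
    """Map an object key in cloud storage to the URL served by the CDN."""
    # Percent-encode the key but keep "/" so folder-like prefixes survive.
    return CDN_BASE_URL + quote(storage_key.lstrip("/"), safe="/")


print(cdn_url_for("videos/42/intro clip.mp4"))
```

The application then only hands out URLs like this one, and the video bytes never pass through the application server.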
Now let’s focus our attention on improving the user experience of the content creators.
Step Two: Prioritise!
The main problem content creators faced was the delay before a video upload completed successfully. As mentioned above, this delay was only partly due to the actual file upload. After the video was uploaded, a large number of time-consuming and compute-intensive processes had to finish before the user received a response. This degraded user experience significantly and added to the overall load on the server.
Firstly, we improved upload speed slightly by channelling uploads through the CDN. Secondly, we moved all heavy tasks that follow the file upload to a background worker for asynchronous execution.
This improves response time, as content creators see the success message immediately after the file uploads. Time-consuming tasks such as transcoding and sending a large number of emails are no longer a problem affecting users.
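The idea of moving post-upload work out of the request can be sketched as follows. This is a minimal, in-process illustration only: Python's `queue.Queue` and a thread stand in for the real message broker and worker process, and the names `handle_upload`, `worker`, and the task labels are hypothetical.

```python
import queue
import threading

# In-process stand-in for a message broker (assumption for illustration).
jobs = queue.Queue()
completed = []

# Task labels based on the slow post-upload steps listed in this post.
POST_UPLOAD_TASKS = [
    "transcode",
    "create_thumbnail",
    "send_notifications",
    "index_for_search",
]


def handle_upload(video_id: str) -> dict:
    """Request handler: enqueue the heavy work and respond right away."""
    for task in POST_UPLOAD_TASKS:
        jobs.put((task, video_id))
    return {"status": "uploaded", "video_id": video_id}


def worker() -> None:
    """Background worker: runs outside the request/response cycle."""
    while True:
        task, video_id = jobs.get()
        try:
            # A real worker would call the transcoder, mailer, etc. here.
            completed.append((task, video_id))
        finally:
            jobs.task_done()


threading.Thread(target=worker, daemon=True).start()

response = handle_upload("video-123")  # returns without waiting for the tasks
jobs.join()  # demo only: wait until the worker has drained the queue
print(response, completed)
```

The handler's response time now depends only on how long it takes to enqueue the jobs, not on how long transcoding or emailing takes.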
Step Three: Scaling Up
Now that we have improved video streaming, upload performance and user experience, let’s focus on the ability to handle higher traffic volumes by scaling horizontally.
We started by separating the front-end site, the background worker and the message broker. We then placed the front-end site behind a load balancer and made it stateless, so that any instance can serve any request. This setup allows us to scale out horizontally as required. And because jobs now reach the background workers through a message broker, we can scale the workers horizontally too.
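To show why statelessness matters behind a load balancer, here is a small sketch under stated assumptions: a dictionary stands in for an external shared store, round-robin is just one possible balancing strategy, and the class names `FrontEnd` and `RoundRobinBalancer` are invented for the example rather than taken from our setup.

```python
import itertools

# Shared state lives outside the front-end instances (hypothetical stand-in
# for an external session store or database).
shared_sessions = {}


class FrontEnd:
    """A stateless front-end instance: keeps no per-user data in memory."""

    def __init__(self, name):
        self.name = name

    def handle(self, session_id):
        # Read and write state through the shared store only.
        visits = shared_sessions.get(session_id, 0) + 1
        shared_sessions[session_id] = visits
        return f"{self.name} served visit {visits}"


class RoundRobinBalancer:
    """Sends each request to the next instance in turn."""

    def __init__(self, instances):
        self._cycle = itertools.cycle(instances)

    def route(self, session_id):
        return next(self._cycle).handle(session_id)


balancer = RoundRobinBalancer([FrontEnd("web-1"), FrontEnd("web-2")])
replies = [balancer.route("alice") for _ in range(3)]
print(replies)
```

Because the visit count lives in the shared store, the same user can be served by different instances on consecutive requests without losing state, which is what makes adding more instances safe.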
Note: As this blog post is meant to provide a very high-level overview only, explaining each component of the implementation is beyond the scope of this article. I may follow this up with a series of posts explaining the specifics as time permits.