Smart Video Filter project

After 2 years and 3 months of efforts I finally released the Smart Video Filter project. It started as a long hope of mine from several years ago: as a YouTube viewer, I should be able to filter only universally liked videos with high ratio of likes to dislikes. The YouTube search results and recommendations often contain objectively bad videos, e.g. which are low quality/take 10% of the screen or contain static pictures or are offensive/cruel or are aimed at advertising some product with an unrelated video. All those videos would get a lot of dislikes and relatively few likes, such that their ratio = likes/dislikes is low. The videos I would really like to watch are the ones with very high ratio = likes/dislikes. These videos are objectively amazing and are out of this world. Hence the idea of the filtered search, where for any combination of search terms the user could filter only the videos with high enough “ratio”. The idea can be trivially extended to ratios of likes to views, comments to views, dislikes to views, etc.

A small concern in the above approach is that videos of similar quality with fewer views tend to naturally have high ratios of likes to dislikes (or likes to views). The videos with fewer views tend to be watched by subscribers/aficionados/lovers. As the video gains popularity, it gets exposed to a wider audience, which typically likes it less. Then the ratio goes down. The primary goal is to find the best of all videos. Then we can for each number of views check how many videos have a higher ratio of, e.g. likes to views and only provide the results in top x% of the ratio. A uniform sample of videos across all range of views would be readily obtained by selecting videos in top x% (of some ratio) without specifying the target number of views. That is precisely the idea behind  Smart Video Filter.

Though the idea appears simple, the implementation took many evenings over a long period of time. The project is quite rich in required expertise: UX design, architecture, back-end engineering, front-end engineering, and devops skills. The technologies include NoSQL databases MongoDB and Elasticsearch, front-end technologies Angular 5 and Angular Material from Google, CentOS 7 administration and cluster administration. The project went through several stages. The initial UI was written in AngularJS and then rewritten in Angular 5. The initial backend datastore included PostgreSQL, which was unable to support the required I/O loads and was switched to MongoDB. Initially the project used Elasticsearch 2.1 and then gradually migrated up to Elasticsearch 6.2. Retrieval and refresh of videos and channels metadata need to rely on non-trivial algorithms to be efficient and not use up a daily quota before noon. All that relies on heavily multi-threaded and fault-tolerant Java backend. The underlying cluster system implements high availability, when after a complete loss of one machine all systems still work. YouTube terms of service and developer policies are quite strict and took a while to comply with. Recently, I got an audit review from YouTube compliance team, and well, they aren’t shutting me down yet.

I’m greatly enjoying the final implementation myself, while searching for the best videos and filtering over duration in a fine-grained way. My long-term hope came true! Even now I got heavily distracted using the service instead of writing the post. 46:1 ratio video of Donald Trump “singing” Shake it Off by Taylor Swift is an amazing composition! The project has obvious ways to improve to substitute the YouTube original search even more: provide recommendations and play videos on the site without redirecting to YouTube. However, Smart Video Filter is a “small market share” application. If the “market maker” YouTube itself was to implement it, then lots of videos would not be regularly shown in search results/recommendations, which would have discouraged the content creators. Hope you enjoy this niche service as I enjoy it myself!

Leave a Reply

Your email address will not be published. Required fields are marked *