ML tool for live stream content moderation
As a first step, our AI/ML engineers developed a powerful image classifier using a convolutional neural network (CNN). The tool took a screenshot of every broadcast at regular intervals, analyzed it for unsafe content, and either approved it or escalated it to human moderators. The key features of this ML-powered content moderation tool were:
● A CNN-based image classifier trained using TensorFlow (sketched below)
● Ability to classify images across multiple categories
● Sequential image processing
● Deployment on a cluster of AWS EC2 instances with auto-scaling
● Inference run on central processing units (CPUs), a cost-effective solution that was sufficient while the client’s user base and the volume of user-generated content were relatively small
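For illustration, here is a minimal sketch of what such a TensorFlow classifier could look like. The category names, input size, and layer configuration are assumptions for the example, not the client’s production setup:

```python
import tensorflow as tf

# Illustrative moderation categories; the real taxonomy belongs to the client.
CATEGORIES = ["safe", "suggestive", "explicit", "violent"]

def build_classifier(input_shape=(224, 224, 3), num_classes=len(CATEGORIES)):
    """A small CNN image classifier in the spirit of the tool described above."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Rescaling(1.0 / 255),  # normalize pixel values to [0, 1]
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(128, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])

model = build_classifier()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

def moderate(screenshot):
    """Score one screenshot and either approve it or escalate to a human."""
    probs = model.predict(screenshot[None, ...], verbose=0)[0]
    label = CATEGORIES[int(probs.argmax())]
    return "approve" if label == "safe" else "escalate"
```

Note the single-image `moderate` call: it mirrors the sequential, one-screenshot-at-a-time processing of this first version, which is exactly what later became the bottleneck.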
ML model inference optimization
As the user base of the client’s apps grew significantly, CPU performance could no longer keep up with the scale of user-generated broadcasts. The number of live stream samples for analysis rose to a few thousand per second, and the inference speed of the ML model became a bottleneck. It was critical to find a more efficient and cost-effective hardware solution so that the ML model could process images at scale and at speed.
Our MLOps engineers suggested moving the model to a graphics processing unit (GPU) to take advantage of its ability to perform massive numbers of processing operations in parallel. This move delivered substantial efficiency gains, making inference dramatically faster and cheaper to run. In particular, the following was implemented:
● Adoption of MLOps best practices to manage processes across the lifecycle of the ML model
● One ML model per GPU, run on an auto-scaled cluster of microservices
● Batch processing of images instead of sequential processing
● New Python libraries, including Ray to improve service architecture and deployment, and TensorRT to optimize model performance on GPUs (see the sketches after this list)
● Implementation of a REST API service with FastAPI
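To illustrate the TensorRT step, here is a sketch using TensorFlow’s TF-TRT converter. The model directory names and FP16 precision are assumptions for the example, and the exact converter API varies somewhat between TensorFlow versions:

```python
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Convert a trained SavedModel into a TensorRT-optimized SavedModel
# so the GPU runs fused, lower-precision kernels at inference time.
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="classifier_savedmodel",  # hypothetical path
    precision_mode=trt.TrtPrecisionMode.FP16,       # half precision for faster GPU inference
)
converter.convert()
converter.save("classifier_trt")                    # hypothetical output path
```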
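And here is a sketch of how batched GPU inference could be exposed as a REST service with Ray Serve and FastAPI. The deployment shape, batch parameters, model path, and endpoint are hypothetical; the point is that Ray Serve can gather concurrent requests into a single batch for the GPU instead of processing screenshots one by one:

```python
import numpy as np
import tensorflow as tf
from fastapi import FastAPI
from ray import serve

app = FastAPI()

@serve.deployment(ray_actor_options={"num_gpus": 1})  # one model per GPU
@serve.ingress(app)
class Moderator:
    def __init__(self):
        # Hypothetical path; in production this would point at the
        # TensorRT-optimized version of the classifier.
        self.model = tf.keras.models.load_model("classifier_savedmodel")

    @serve.batch(max_batch_size=32, batch_wait_timeout_s=0.05)
    async def classify(self, images):
        # Ray Serve collects concurrent requests into one list, so the
        # model sees a single batched tensor instead of one image at a time.
        batch = np.stack(images).astype(np.float32)
        probs = self.model(batch, training=False).numpy()
        return probs.argmax(axis=1).tolist()

    @app.post("/moderate")
    async def moderate(self, screenshot: list):
        # Each request carries one screenshot as nested lists of pixel values.
        category = await self.classify(np.asarray(screenshot, dtype=np.float32))
        return {"category": int(category)}

serve.run(Moderator.bind())
```

Scaling out then becomes a matter of adding replicas, each pinned to its own GPU, which matches the one-model-per-GPU layout described above.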