Let’s hop on board with YOLO-WORLD – an efficient, zero-shot object detector

When you approach the problem of detecting objects in an image, you often choose to detect specific classes. This entails adapting the model to the particular task: preparing datasets, labeling new samples, and training the model.

This process is often time-consuming and tedious. Zero-shot models address these inconveniences: they are a collection of methods that allow pre-trained weights to detect classes of interest without fine-tuning. These solutions accept textual prompts in which the user specifies a list of object classes of interest. Zero-shot models typically use heavyweight transformer-based architectures. Consequently, they tend to be slow, and the delay during inference may prevent such a solution from being used in many applications.

YOLO-WORLD, the model mentioned in the title, is a zero-shot solution that draws inspiration from the YOLO family of architectures and overcomes the problem of heavy architecture and slow inference.

According to the publication (https://arxiv.org/pdf/2401.17270.pdf), the model is up to 20 times faster than competing methods.

Chart 1
Source: [https://arxiv.org/pdf/2401.17270.pdf]

The secret of this solution is an efficient YOLO backbone that extracts image features for the subsequent analysis. The list of classes given by the user is processed by a text encoder, which produces text embeddings. Those embeddings are reused in later model inferences, so the text encoder does not have to run again for every image, which gives us faster performance. The image features and text embeddings then go through a custom network that performs cross-modality fusion.
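
The idea of encoding the class list once and reusing the embeddings for every image can be illustrated with a small sketch. Everything in it is an assumption made for illustration: `encode_text`, `build_vocabulary`, and `classify_region` are hypothetical names, the hash-based encoder is only a deterministic stand-in for a real text encoder, and plain cosine similarity stands in for the much richer cross-modality fusion network described in the paper.

```python
import hashlib
import math
from functools import lru_cache

EMBED_DIM = 8  # toy dimensionality, chosen for illustration only


def _normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return tuple(x / norm for x in vec)


@lru_cache(maxsize=None)
def encode_text(label: str):
    """Hypothetical stand-in for a text encoder: maps a label to a unit vector.

    A real system would run a language model here; the hash only makes the
    toy deterministic. The cache means each label is encoded at most once.
    """
    digest = hashlib.sha256(label.encode("utf-8")).digest()
    raw = [b - 127.5 for b in digest[:EMBED_DIM]]
    return _normalize(raw)


def build_vocabulary(labels):
    """Encode the user's class list once, before any image is processed."""
    return {label: encode_text(label) for label in labels}


def classify_region(region_feature, vocabulary):
    """Return the label whose embedding is most similar to the region feature."""
    feature = _normalize(region_feature)
    scores = {
        label: sum(f * t for f, t in zip(feature, emb))
        for label, emb in vocabulary.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# The vocabulary is built once and then reused for every image.
vocabulary = build_vocabulary(["person", "forklift", "helmet"])

# Pretend the image backbone produced a region feature close to "forklift".
region = [x + 0.01 for x in vocabulary["forklift"]]
label, score = classify_region(region, vocabulary)
print(label, round(score, 3))
```

In this sketch the per-image cost is only the similarity computation, because the text side has already been computed. Changing the list of classes means calling `build_vocabulary` again, with no retraining involved.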

Chart 2
Source: [https://arxiv.org/pdf/2401.17270.pdf]

The result is real-time zero-shot object detection, which offers considerable convenience without compromising our applications’ performance requirements.

Noctuai offers AICam, its proprietary platform for deploying various video analytics models. If you are interested in deploying specialized solutions based on innovative techniques such as those described in this post, we invite you to contact us. With over ten years of experience in IT and deployments across industries from Oil & Gas to healthcare worldwide, we are well-equipped to meet diverse needs.