Meta AI, formerly known as Facebook AI, has launched data2vec, which it calls the “first high-performance self-supervised [machine learning] algorithm” that works across multiple modalities. data2vec aims to take self-supervised learning beyond specific use cases. Until now, each self-supervised model could solve only one kind of problem: a self-supervised language model could not solve a visual problem, and a self-supervised visual model could not solve an audio problem. data2vec differs in that it uses the same algorithm to solve distinct problems, a step towards more general artificial intelligence. The same learning method can now be applied to seeing, reading, and listening, and can pick up the structure in each of these inputs.

According to Meta AI, ‘through self-supervised learning, machines are able to learn about the world just by observing it and then figuring out the structure of images, speech or text.’ This approach is more effective because machines can take on more complex tasks, such as understanding text for a growing number of spoken languages. With data2vec, Meta AI claims to be getting ‘closer to building machines that learn about different aspects of the world around them without having to rely on labeled data.’ We are nearing a future in which AI could use videos, audio recordings, and articles to learn about complicated subjects such as a game of chess or soccer, making it more adaptable.

Meta AI also claims that data2vec ‘outperformed the previous best single-purpose algorithms for computer vision and speech and it is competitive on NLP tasks’. The main idea behind data2vec is to enable machines to perform unfamiliar tasks as well, bringing us a step closer to a world in which computers rely on less and less labeled data to complete their tasks.
The new algorithm uses a teacher network and a student network. The teacher network first computes target representations from the full text, audio, or image input. The same input is then partially masked and passed to the student network, which is tasked with predicting the representations of the full input while seeing only part of it. Because the prediction targets are internal representations of the input, rather than words, pixels, or sounds, the method does not depend on a single modality.
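The teacher and student setup can be illustrated with a toy example. The sketch below is not Meta AI's implementation: the two-weight `encode` function is a hypothetical stand-in for a real Transformer encoder, all function names are made up for illustration, and the exponential-moving-average update of the teacher is an assumption that the article itself does not spell out. The gradient step that would train the student is omitted.

```python
import math
import random


def encode(weights, sequence):
    """Toy encoder (hypothetical stand-in for a Transformer).

    Each position's value is mixed with the sequence mean, so every
    output representation carries a little context from the whole input.
    """
    context = sum(sequence) / len(sequence)
    return [math.tanh(weights[0] * value + weights[1] * context)
            for value in sequence]


def apply_mask(sequence, masked_positions):
    """Hide the chosen positions from the student by zeroing them."""
    return [0.0 if i in masked_positions else value
            for i, value in enumerate(sequence)]


def masked_prediction_loss(student_w, teacher_w, sequence, masked_positions):
    """Mean squared error between student predictions and teacher targets,
    measured only at the masked positions."""
    if not masked_positions:
        raise ValueError("at least one position must be masked")
    # Teacher sees the full input and produces the target representations.
    targets = encode(teacher_w, sequence)
    # Student sees only the partially masked input.
    predictions = encode(student_w, apply_mask(sequence, masked_positions))
    errors = [(predictions[i] - targets[i]) ** 2 for i in masked_positions]
    return sum(errors) / len(errors)


def ema_update(teacher_w, student_w, decay=0.99):
    """Assumed teacher update: move teacher weights slowly toward the student."""
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_w, student_w)]


# Small demonstration on random "input data" (could stand for audio
# frames, image patches, or token embeddings; the loss does not care).
random.seed(0)
sequence = [random.uniform(-1.0, 1.0) for _ in range(8)]
student_weights = [0.5, 0.5]
teacher_weights = list(student_weights)
masked = {2, 5}

loss = masked_prediction_loss(student_weights, teacher_weights, sequence, masked)
teacher_weights = ema_update(teacher_weights, student_weights)
print("masked prediction loss:", loss)
```

Nothing in the loss refers to words, pixels, or sound samples; it only compares representations, which is why the same recipe can in principle be reused across modalities.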
The open-source code is available here.