Meta's A.I. video generator tool

The next step in AI art creation

Recently, we have increasingly seen how AI can accomplish tasks across different fields with amazing, if not always perfect, results. Just think of the latest AI image generators that are spreading across the internet. They can produce beautiful images and even imitate famous artists’ styles.

Meta is now trying to take a step forward with a tool that generates videos through artificial intelligence. Its new tool, called Make-A-Video, was unveiled via Twitter. Although the results may look rather weird, it would be no surprise if AI video generation overtook AI image generation as the next trend.

However, good results are harder to achieve with video than with images. An animation needs a high degree of coherence between frames, and its subjects must move and interact plausibly, so there is more room for error. In addition, video generation needs much more data to draw from.

Although we are still at an early stage, Meta has achieved good results: Make-A-Video can generate a clip from a prompt of just a few words, much as DALL-E or Midjourney do for images.

According to the research paper, the Meta team extended a diffusion-based text-to-image generation model so that it can animate images. The lack of large datasets of high-quality text-video pairs is still a problem: video data is higher-dimensional and more complex to model, so text-to-video models would need to be trained on datasets far larger than those used for images.

To generate images, diffusion models begin with randomly generated noise and gradually adjust it to get closer to the target prompt. The quality of the training data has a significant impact on how accurate the outcomes are.

But the amazing thing about Meta’s approach is that it doesn’t need paired text-video data, so it does not depend on those scarce text-video datasets.

Currently, Make-A-Video generates silent clips of 16 frames at 64 x 64 pixels, which are then upscaled to 768 x 768 pixels by another AI model. The clips last barely five seconds and show only one action or scene.
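To get a feel for how much work the upscaling model does, the figures quoted above can be turned into a quick calculation. The frame count and resolutions come from the article; the helper `pixels_per_clip` and the constant names are only illustrative.

```python
FRAMES = 16      # frames per generated clip
LOW_RES = 64     # side length, in pixels, of each generated frame
HIGH_RES = 768   # side length, in pixels, after upscaling


def pixels_per_clip(frames, side):
    """Total number of pixels in a clip of square frames."""
    return frames * side * side


scale = HIGH_RES // LOW_RES                # linear upscaling factor
low = pixels_per_clip(FRAMES, LOW_RES)     # pixels produced by the generator
high = pixels_per_clip(FRAMES, HIGH_RES)   # pixels in the final clip
ratio = high // low                        # growth in pixel count

print(scale)  # 12
print(low)    # 65536
print(high)   # 9437184
print(ratio)  # 144
```

In other words, each side grows 12 times, so 143 of every 144 pixels in the final clip are filled in by the upscaling model rather than by the initial generation step.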

According to Meta, Make-A-Video’s AI learned “what the world looks like from paired text-image data and how the world moves from video footage with no associated text”. It was trained using more than 2.3 billion text-image pairs from the LAION-5B dataset and millions of videos from the WebVid-10M and HD-VILA-100M datasets.

Meta claims that static images with paired text are enough to train text-to-video models, since movements, activities, and events can be inferred from them. Similarly, even without any text describing them, “unsupervised videos are sufficient to learn how different entities in the world move and interact”.

The researchers acknowledged that, like “all large-scale models trained on data from the web, [their] models have learned and likely exaggerated social biases, including harmful ones”, but claimed to have done what they could to control the quality of the training data by removing from LAION-5B all text-image pairs that contained NSFW content or toxic words. Preventing AIs from producing insulting, false, or dangerous content is one of the main problems in the industry.
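The article does not describe how Meta's filtering actually works, so the following is only a minimal sketch of the general idea of dropping text-image pairs whose caption contains a blocked word. The blocklist entries, file names, and function names are hypothetical placeholders, and a real pipeline would likely also inspect the images themselves.

```python
import re

# Hypothetical placeholder terms; a real blocklist would be far larger.
BLOCKLIST = {"nsfw", "badword"}


def is_clean(caption, blocklist=BLOCKLIST):
    """Return True if no word of the caption appears in the blocklist."""
    words = set(re.findall(r"[a-z']+", caption.lower()))
    return words.isdisjoint(blocklist)


def filter_pairs(pairs, blocklist=BLOCKLIST):
    """Keep only the (caption, image) pairs whose caption is clean."""
    return [(caption, image) for caption, image in pairs
            if is_clean(caption, blocklist)]


pairs = [
    ("a dog running on a beach", "img1.jpg"),
    ("NSFW content here", "img2.jpg"),
    ("a teddy bear painting a portrait", "img3.jpg"),
]
kept = filter_pairs(pairs)
print([image for _, image in kept])  # ['img1.jpg', 'img3.jpg']
```

A word list like this is a blunt instrument: it misses harmful captions that avoid the listed words and can discard harmless ones, which is part of why the bias problem is hard to solve by filtering alone.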

For now, the results look like stop-motion videos, with glitches that make them seem surreal or dreamlike.

The tool can be used in a few different ways: to give motion to a single image, to fill in the gaps between two photos, or to create new variations of an existing video.
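To make "filling in the gaps between two photos" concrete, here is the simplest possible baseline: a linear cross-fade between two frames. This is explicitly not how Make-A-Video works (a learned model can invent plausible motion, while a cross-fade only blends pixel values), and the function name and the tiny two-pixel "frames" are assumptions made for illustration.

```python
def interpolate_frames(first, last, n_between):
    """Naive cross-fade: blend `first` into `last` over `n_between` frames.

    Frames are flat lists of pixel intensities of equal length.
    """
    frames = []
    for i in range(1, n_between + 1):
        # alpha runs from just above 0 (close to `first`)
        # to just below 1 (close to `last`).
        alpha = i / (n_between + 1)
        frames.append([(1 - alpha) * a + alpha * b
                       for a, b in zip(first, last)])
    return frames


first = [0.0, 0.0]  # hypothetical 2-pixel start frame
last = [1.0, 2.0]   # hypothetical 2-pixel end frame
middle = interpolate_frames(first, last, 3)
print(middle)  # [[0.25, 0.5], [0.5, 1.0], [0.75, 1.5]]
```

A cross-fade produces ghostly double images whenever something moves between the two photos, which is exactly the gap a generative approach is meant to close.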

It’s not hard to imagine a future in which our stories come to life in a movie generated entirely by AI, where not only the images but also the music and dialogue are created by an algorithm. That would be amazing for anyone who wants to see what their stories would look like on screen. Some creators may worry that this technology will take over their creative work, although these tools could instead be integrated into existing creative processes, adding new styles. Once the quality becomes hyperrealistic, that displacement may happen anyway, but the bigger problem will be dealing with media so realistic that it can be mistaken for real footage, with all the associated risks.

In the meantime, you can already make videos with the help of AI using an all-in-one video-editing app. With Invideo, you can create a video using its essential but effective tools, which let you focus on the result even if you’re not an expert. In addition, its AI features help you save time and find the right content. Invideo offers several video templates to fit your content, as well as tools like a ‘background remover’ and the ability to add text to your clips. You can also add music, stock images, and videos, as well as your own media. The most interesting features, however, are the options to convert a blog article into a video, generate a script with AI, and convert text into a voiceover.
