Google is gearing up to announce its very own AI-powered text-to-video generator, which the company is calling Google Imagen Video. The statement came just a few days after Meta informed the world about its own text-to-video generator.
What to Expect?
Imagen Video is still in development. However, by the time it is released to the wider public, it should be capable of producing 1280×768 videos at 24 fps from a basic written prompt.
Imagen Video should be able to generate videos in the style of famous painters such as Vincent van Gogh. It should also be able to generate rotating 3D objects while preserving their structure, and to render text in different animation styles.
Imagen Video takes a text description and first generates a 16-frame video at three frames per second and 24×48 pixel resolution. The system then upscales this clip and predicts additional frames, producing a final 128-frame, 24 fps video at 1280×768.
Imagen Video also understands depth and three-dimensionality, which makes drone-flythrough-style videos possible: the camera rotates around objects and captures them from various angles without distortion.
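The frame and resolution figures above can be sanity-checked with a little arithmetic. The following is only a rough sketch in Python: it assumes that 24×48 means 24 pixels high by 48 wide, it uses the 1280×768 output size mentioned earlier, and the `clip_stats` helper is a hypothetical name for illustration, not part of any Google code.

```python
from fractions import Fraction


def clip_stats(frames, fps, width, height):
    """Return the exact clip duration in seconds and the pixel count per frame."""
    return {
        "duration_s": Fraction(frames, fps),
        "pixels_per_frame": width * height,
    }


# Figures quoted in the article. Reading 24x48 as height x width is an assumption.
base = clip_stats(frames=16, fps=3, width=48, height=24)
final = clip_stats(frames=128, fps=24, width=1280, height=768)

# How many times more frames and pixels the final clip has than the base clip.
frame_factor = Fraction(128, 16)
pixel_factor = Fraction(final["pixels_per_frame"], base["pixels_per_frame"])

print("base duration (s): ", float(base["duration_s"]))
print("final duration (s):", float(final["duration_s"]))
print("frame factor:", frame_factor)
print("pixel factor per frame:", float(pixel_factor))
```

Under these assumptions both clips last the same time, about 5.3 seconds, so the extra frames make the motion smoother rather than making the video longer. The final clip has eight times as many frames as the base clip.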
Imagen Video is trained on 14 million video-text pairs, 60 million image-text pairs, and the LAION image-text dataset. The last of these was also used to train Stable Diffusion. Google hopes its AI model will “significantly decrease the difficulty of high-quality content generation.” Imagen Video builds on Google's Imagen, a text-to-image program similar to OpenAI's DALL-E.
Things to be Noted
Note that the Imagen Video results shown so far were selected by Google itself. No independent or third-party tester has tried the program yet. Google believes that Imagen Video can render text properly, something Stable Diffusion and DALL-E struggle with: the text they generate is barely readable.
Area of Concern
However, Google has voiced concern over the ‘problematic data’ used to train AI image-generation programs. The tech giant is trying to filter out violent content, sexually explicit content, cultural biases, and social stereotypes. It is concerned that the tool could be used “to generate fake, hateful, explicit, or harmful content.” Google further added, “We have decided not to release the Imagen Video model or its source code until these concerns are mitigated.”