In this newsletter we are going to explore the Hugging Face Spaces of the week.
OmnieLottie is a model that generates Lottie animations from text prompts. Lottie animations are lightweight, scalable, vector-based animations stored as JSON files that can be used on websites, in mobile apps, and in UI design.
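For context, a Lottie file is really just a JSON document. A minimal sketch of its typical top-level structure (the values here are illustrative, not from the generated animation):

```python
import json

# Minimal sketch of a Lottie file's top-level structure (illustrative values).
# v = Lottie/Bodymovin schema version, fr = frame rate, ip/op = in/out frame,
# w/h = canvas size, layers = the actual animated shapes.
lottie = {
    "v": "5.7.4",
    "fr": 30,        # frames per second
    "ip": 0,         # first frame
    "op": 90,        # last frame (3 seconds at 30 fps)
    "w": 512,
    "h": 512,
    "nm": "blue-bird-demo",
    "layers": [],    # a real animation would list shape layers here
}

print(json.dumps(lottie, indent=2))
```

Because it is plain JSON, the generated animation stays tiny and scales cleanly, which is why a text-to-Lottie model is so handy for the web.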
The text prompt used is:
a blue bird appearing, pulsing while sliding downward, lingers briefly, then growing back while sliding upward to reset with clear phase changes, repeating seamlessly

and the output is:

The generation took 808 tokens and 48.4 s of generation time. This is a nice fit for websites that need small animations; you could, for example, generate animated 404 pages with it.
This Qwen model has 0.8 billion parameters at an incredibly small size of ~85 MB, so it can run in your browser. In the Hugging Face Space you can either upload a video or use your camera. I'm going to use the OBS virtual camera option to show a video of the popular streamer Speed:

The initial prompt is: Briefly describe what you see (2 sentences max).
The Qwen model's description is "black male reacting shocked". This one seems good.
In Ethos Studio you can upload mp3 files and let the AI transcribe them and identify speakers. I tried to upload a 10 s audio clip of me speaking, but I always got a webhook error, so let's look at their demo instead:

The demo shows an uploaded mp4 file with two different speakers identified. The transcription matches the audio of the video and also attributes the lines to the right speakers. This can be used nicely for automated video generation.
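A diarized transcript like the one in the demo is essentially a list of timed segments, each tagged with a speaker label. A minimal sketch of working with that shape of output (the segment data here is hypothetical, not from the Ethos Studio demo):

```python
# Hypothetical diarized segments: (start_s, end_s, speaker, text) --
# the typical shape of transcription-plus-speaker-identification output.
segments = [
    (0.0, 2.1, "SPEAKER_1", "Welcome back to the show."),
    (2.1, 4.0, "SPEAKER_2", "Thanks for having me."),
    (4.0, 6.5, "SPEAKER_1", "Let's jump right in."),
]

def per_speaker(segs):
    """Group utterances by speaker, keeping chronological order."""
    out = {}
    for start, end, speaker, text in segs:
        out.setdefault(speaker, []).append(text)
    return out

grouped = per_speaker(segments)
for speaker, lines in grouped.items():
    print(speaker, "->", " ".join(lines))
```

Once the output is in this form, feeding it into an automated video pipeline (subtitles per speaker, cutting on speaker changes) is straightforward.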
In Qie object remover you can upload images and draw boxes around the objects you want to remove.
The prompt I'm using is "Remove the red and blue boxes with #1 and #2".
The image I'm using is from Peter, and I want to remove the microphone.

This is the result. Wow…

Okay, WOW. (The reason it is cropped is my fault; I didn't change the default height & width.) But this is incredible. I have often used Nano Banana Pro and Qwen Image Edit, but this one is really, really good. I'm going to use it from now on. What's even better? You can run it 100% locally using Docker:
docker run -it -p 7860:7860 --platform=linux/amd64 --gpus all \
registry.hf.space/prithivmlmods-qie-object-remover-bbox:latest python app.py

This Space supports:
Image Understanding
Video QA
Video Detection
Video Point Tracking
Let's test the Video Point Tracking. Point tracking is defined as:
Video Point Tracking — specify what to track. The model locates 2D point coordinates on sampled frames and overlays tracking dots across the full video (max_secs <= 7).
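The definition says the model only locates the point on sampled frames and then overlays dots across the full video. A minimal sketch of how such sparse detections can be spread over every frame with linear interpolation (all coordinates here are made up for illustration):

```python
# Hypothetical 2D detections of the tracked object on sampled frames only:
# frame_index -> (x, y). The overlay step must fill in the frames in between.
sampled = {0: (100.0, 200.0), 10: (150.0, 180.0), 20: (220.0, 160.0)}

def interpolate_track(points, total_frames):
    """Linearly interpolate an (x, y) position for every frame."""
    keys = sorted(points)
    track = []
    for f in range(total_frames):
        if f <= keys[0]:                 # before the first detection
            track.append(points[keys[0]])
            continue
        if f >= keys[-1]:                # after the last detection
            track.append(points[keys[-1]])
            continue
        lo = max(k for k in keys if k <= f)   # nearest sampled frame below
        hi = min(k for k in keys if k >= f)   # nearest sampled frame above
        if lo == hi:                     # f is itself a sampled frame
            track.append(points[lo])
            continue
        t = (f - lo) / (hi - lo)
        (x0, y0), (x1, y1) = points[lo], points[hi]
        track.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return track

track = interpolate_track(sampled, 21)
print(track[5])   # halfway between the frame-0 and frame-10 detections
```

A real tracker would feed these per-frame positions to a drawing step that overlays a dot on each frame, which matches the "tracking dots across the full video" behavior described above.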
For this I’m going to use a soccer GIF:

It only supports mp4/video files, so let's convert the GIF to mp4 and upload it. The prompt is going to be "Track the football".
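The GIF-to-mp4 conversion can be scripted. A sketch using ffmpeg via subprocess (it assumes ffmpeg is installed and that the input file is named soccer.gif; the flags are the usual ones for an upload-friendly mp4):

```python
import os
import shutil
import subprocess

# Build an ffmpeg command that converts a GIF to a widely compatible mp4.
cmd = [
    "ffmpeg", "-y",
    "-i", "soccer.gif",                          # assumed input file name
    "-movflags", "faststart",                    # metadata up front for streaming
    "-pix_fmt", "yuv420p",                       # widely supported pixel format
    "-vf", "scale=trunc(iw/2)*2:trunc(ih/2)*2",  # force even width/height
    "soccer.mp4",
]

# Only run when ffmpeg and the input file are actually present.
if shutil.which("ffmpeg") and os.path.exists("soccer.gif"):
    subprocess.run(cmd, check=True)
else:
    print("Skipping; would run:", " ".join(cmd))
```

yuv420p and even dimensions matter because many players (and browsers) reject mp4s without them, which is a common reason a converted GIF fails to upload.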
☹ Unfortunately it didn't work; the application crashed 4 times. Maybe some of you can try it out.
This model generates a video with audio from a reference image plus a text prompt.
The image I gave it is my "cryptopunk" image. The text prompt I gave is: Make this image come alive with cinematic motion, smooth animation and scream "Subscribe"

The output is really good, but it has no audio :/



