VideoMDM learns to generate 3D body motion without ever seeing 3D training data, which could open motion generation to the vastly larger world of ordinary video.
Real AI work, shown with the how.
HYDRA-X is a research model that handles images and video through a single shared tokenizer, which turns out to matter quite a bit for how unified multimodal models are actually built.
One tokenizer for images and video in a single model
JUN 11: Kwai's open 30B MoE model can reason over hour-long videos in a single pass, which puts it in rare company among openly released multimodal models.
Keye-VL-2.0: open multimodal model built for long video
JUN 9: Point an AI coding agent at a product URL and get a working Remotion project back, no video-generation model in the loop.
Remotion ad video skill: code-driven ads from a URL
MAY 19: Microsoft's 3.8B-parameter diffusion model punches at FLUX and SD3 levels while costing meaningfully less to train, which matters if you care about what comes next in open image generation.
Lens: a leaner text-to-image model that trades compute for quality
MAY 8: An 8B-parameter open-weight image model from Baidu that claims top-of-class quality among open generators, worth a look if you're weighing alternatives to FLUX or SD3.
ERNIE-Image: Baidu's open text-to-image model
APR 14: A set of 19 Claude Code skills that wire Claude directly into Higgsfield AI, so you can generate images, kick off Seedance 2.0 video jobs, and run full UGC ad pipelines from a single automated workflow.
Claude Code skills for Higgsfield AI video and image pipelines
APR 13: A set of structured skills for tools like Cursor or Copilot that steer the AI through a real design process instead of letting it freestyle.
Agent skills that keep AI coding tools on a design process
MAR 28: A custom ComfyUI node that takes a reference image and a reference voice and keeps both consistent through a generated clip, so you can build video around a specific identity instead of a generic one.
ComfyUI node for locking a character's face and voice across generated video
MAR 21: A multimodal open model from GAIR that turns text or images into video with synced audio, tuned specifically for human subjects.
daVinci-MagiHuman: human-focused video with audio from one model
MAR 21: 67 ready-to-drop DESIGN.md and SKILL.md files that teach agentic tools your taste before they touch a pixel.
A curated library of design skill files for AI agents
MAR 9: A set of skills that loads Claude Code with working knowledge of ComfyUI's node APIs, so you can describe the node you want and get code that actually fits the system.
Claude Code skills for building ComfyUI custom nodes
MAR 5: Lightricks' latest open model takes text, images, or existing clips and returns video with audio already attached, skipping the separate sound step that makes most open video pipelines a chore.
LTX-2.3: open video and audio from one model
MAR 4: An MCP server that connects Claude Code directly to ComfyUI, so you can generate images and video, edit your graph, and manage models without leaving your terminal session.
ComfyUI control from inside Claude Code
FEB 15: A Claude Code skill that runs 250-plus checks across your Google, Meta, YouTube, LinkedIn, TikTok, Microsoft, and Apple Ads accounts and hands back a weighted score with recommendations.
Paid ad audit skill for Claude Code, across seven platforms
FEB 11: An open-source MCP server that lets Claude and other AI assistants drive Photoshop directly, so you can describe what you want and watch the layers move.
Control Photoshop from an AI assistant via MCP
JAN 26: A custom node suite that brings Alibaba's Qwen3-TTS models into your ComfyUI workflow, letting you generate, design, and clone voices without leaving the graph.
Qwen3-TTS in ComfyUI, with voice cloning
JAN 22: A custom node that gives you an interactive 3D viewport inside ComfyUI so you can set exact camera angles and pipe them straight into your image generation prompts.
3D camera angle picker for ComfyUI
JAN 8: A fine-tune on Qwen's image editing model that takes a photo and hands back a version of it from a new camera position, which saves a reshoot for product or object work.
LoRA for re-shooting a subject from a different angle
JAN 7: Lightricks' open video model that generates picture and sound together in one pass, saving you the separate audio step that most open models leave behind.
LTX-2: open video model with audio built in
JAN 3: Nothing cleared the bar here yet; the standard stays high.