Anime Denser

Note: the videos in the showcase post don't fully represent what this LoRA does; please pay attention to the grids in the gallery instead.

Description

This is an enhancement (slider) LoRA aimed at controlling the compositional density of the environment. Detailers and tweakers have been some of my favorite types of LoRAs since the SD1.5 days, so I wanted to create something similar for a video model. It is the first (but not the last) of the enhancement LoRAs I plan to train.

It's based on a training concept I call differential LoRA, which is mostly grounded in the methodology described by ComfyTinker for this LoRA. I became interested in this approach and decided to reproduce it (though for, ehm, well, a different domain). I relied on these two notes: one, two.

The concept itself rests on a simple yet powerful idea: by training a pair of LoRAs that differ in only one specific feature, and then merging them by subtracting one LoRA from the other, it becomes possible to "distill" a distinct concept into a separate LoRA. This derived LoRA can then be used to control the presence and intensity of that feature during inference, simply by adjusting its strength. By effectively isolating a single concept, it's possible (in theory) to prevent bleeding of side-noise features, which is inevitable if you just train a generic concept LoRA on a non-isolated dataset. However, the datasets used for the two anchor LoRAs must be very similar in every way except for the target feature being trained; otherwise, feature bleeding can still occur even after careful extraction of the target-concept LoRA.

It's worth noting that the method itself is apparently not brand new: it shares a lot with earlier ideas like LECO, slider LoRAs (popularized by Ostris and AI Toolkit), the "Concept Sliders" project, FlexWaifu, and some techniques from NLP (like vector arithmetic for embeddings), not to mention the famous script by kohya-ss. But this is the first time I've seen it applied to training video model LoRAs (although attention mechanisms in DiTs are perhaps not so different from "classical" transformers). (And surely this concept is not unique to ML; it mirrors numerous techniques from classical signal processing, such as phase cancellation for audio noise reduction.)
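To make the subtraction step concrete, here is a toy sketch of why differencing two LoRAs isolates the target feature (purely illustrative, not my actual pipeline; the shapes, the synthetic "shared" and "target" directions, and the rank are all made up). The effective delta of a LoRA layer is the low-rank product up @ down; subtracting the two anchors' deltas cancels everything they share and leaves only the differing direction, which can then be re-decomposed into LoRA form with a truncated SVD (in the spirit of kohya-ss's merge/extract scripts):

```python
# Toy demonstration of differential-LoRA extraction (illustrative only).
import torch

torch.manual_seed(0)
d, r = 64, 4  # hypothetical layer width and LoRA rank

def delta(up, down):
    # Effective weight change contributed by one LoRA layer.
    return up @ down

# Directions both anchor LoRAs happen to share ("noise" features)...
shared_up, shared_down = torch.randn(d, r), torch.randn(r, d)
# ...and the single direction only the "dense" anchor learned.
target_up, target_down = torch.randn(d, 1), torch.randn(1, d)

delta_dense = delta(shared_up, shared_down) + delta(target_up, target_down)
delta_sparse = delta(shared_up, shared_down)

# Shared features cancel; only the density concept survives.
diff = delta_dense - delta_sparse
print(torch.allclose(diff, delta(target_up, target_down), atol=1e-4))  # True

# Re-decompose the isolated delta into LoRA factors via truncated SVD.
U, S, Vh = torch.linalg.svd(diff)
new_up, new_down = U[:, :r] * S[:r], Vh[:r, :]
print(torch.allclose(new_up @ new_down, diff, atol=1e-3))  # True (rank 1 <= r)
```

Note that the subtraction happens at the level of the effective deltas, not the raw up/down factors: two LoRAs' factor matrices can't simply be subtracted key by key, which is why the SVD re-decomposition step exists.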
Usage

Please bear in mind that this LoRA does not enforce an anime style on its own. However, since my interest in realistic videos lies somewhere below zero, I designed it specifically for use with 2D animation LoRAs and tested it exclusively in that context, so I named it Anime Denser accordingly. I haven't tested it on anything else (and I don't plan to). (That said, even though this LoRA was trained and tested only for 2D animation, it should, if I got it done right, probably work with realistic videos as well; the idea of concept isolation is meant to bypass styling differences. However, this is just an assumption on my side.)

All the clips I've published were generated using this LoRA in combination with the "vanilla" (base) model and the Studio Ghibli LoRA. (Maybe not the best choice, since the Ghibli LoRA is so strong it could function as an enhancer on its own and sometimes negate the amplifying effect of this LoRA.) I also applied the self-forcing lightx2v LoRA; otherwise, creating demonstration videos for this LoRA would've taken way more than just two days.

The workflow I use for this LoRA utilizes WanVideoWrapper, but any native workflow with a LoRA loader should work too. A sample workflow can be obtained by dragging any of the videos from the showcase post into ComfyUI, or downloaded from here: JSON. (Be aware that the vertical grid videos do not have an embedded workflow.)

The safe strength range for this LoRA is from -3 (enforces lower environmental density) to 3 (enforces higher density, more complex compositional "clutter"). For long and elaborate prompts, it's also possible to push it to 4 or -4 to compensate for text encoder influence, but going beyond that will most likely result in noisy outputs. (After extensive testing, I noticed that at high strength the LoRA sometimes, though not always, slightly amplifies green hues, mostly in indoor environments. The reason is unclear; perhaps some feature maps were subtracted disproportionately during the subtraction, or maybe it tries to somehow "densify" lighting.) A minimal sketch of how the strength value acts on the weights follows below.
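Conceptually, the strength slider just scales the isolated delta before it is merged into the base weights, so a negative value subtracts the concept instead of adding it. Here is a minimal sketch of that arithmetic (illustrative only, not WanVideoWrapper's actual code; the names and the alpha/rank scaling convention are assumptions):

```python
# Illustrative sketch of LoRA strength scaling; tensors are placeholders.
import torch

def apply_lora(w_base, up, down, strength, alpha, rank):
    # strength > 0 pushes toward the distilled concept (higher density),
    # strength < 0 pushes away from it (lower density), 0 is a no-op.
    return w_base + strength * (alpha / rank) * (up @ down)

# Toy shapes; real model weights are much larger.
w = torch.randn(128, 128)
up, down, rank = torch.randn(128, 8), torch.randn(8, 128), 8
w_denser = apply_lora(w, up, down, strength=3.0, alpha=8.0, rank=rank)
w_sparser = apply_lora(w, up, down, strength=-3.0, alpha=8.0, rank=rank)
```

The point is that negative strengths are nothing special: they flip the sign of the same isolated delta, which is why a single LoRA can act as a two-way slider.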
Dataset

For differential LoRAs, the dataset plays a crucial role. For the concept I chose (compositional density), I needed two datasets: one with highly detailed ("dense") scenes and another with low-density ("sparse") scenes. The more these scenes have in common, the more effective the noise cancellation will be during subtraction. Ideally, I should have collected views of the same scene but with different levels of detail. At first, I planned to do this by gathering some well-detailed scenes from anime movies and running them through VACE or a simple V2V pipeline with low denoise settings. But then two