Hugging Face reposted this
CatVTON is now available on Hugging Face Spaces! 900M parameters; inference needs <8 GB of VRAM at 1024×768 resolution 💪 Gradio demo: https://lnkd.in/gZuUp4VC
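As a rough sanity check on the "<8 GB" figure, here is a back-of-the-envelope estimate (illustrative arithmetic only, not CatVTON's published memory breakdown):

```python
# Back-of-the-envelope VRAM check (illustrative; not CatVTON's published breakdown):
# in fp16, each parameter takes 2 bytes, so 900M weights alone need ~1.7 GiB,
# leaving most of the <8 GB budget for activations, the VAE, and attention buffers.
params = 900e6
bytes_per_param_fp16 = 2
weights_gib = params * bytes_per_param_fp16 / 2**30
print(f"fp16 weights: {weights_gib:.2f} GiB")
```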
Hugging Face reposted this
An undergraduate in the Department of Computer Science at Korea University, deeply interested in artificial intelligence, particularly generative models in the vision domain.
Now, PAG is officially supported by Diffusers in the stable version! Try it out 🥰

Use cases: https://lnkd.in/g8sxzutP
Supported pipelines: https://lnkd.in/gkNejfRC

I am deeply grateful to the amazing team at Hugging Face for their incredible work. Special thanks to Sayak Paul for taking charge of PixArt-Sigma and providing many insightful comments, Aryan V S for taking HunyuanDiT and refactoring the code brilliantly, Alvaro Somoza for handling Kolors and conducting numerous experiments, Apolinário Passos (Poli) for being an early adopter and advocate of PAG (he actually spread PAG!), and YiYi Xu (github.com/yiyixuxu) for designing the overall framework for the PAG pipelines.

It has been truly inspiring to receive direct feedback from the passionate maintainers of Diffusers, and to witness their swift code implementations and working methods while implementing the Stable Diffusion 3 + PAG pipeline.
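For context, PAG steers sampling by contrasting the model's normal noise prediction with one computed through a perturbed self-attention path. A sketch of the combined update (my own notation, reconstructed from the PAG formulation; see the paper for the exact form):

```latex
% Perturbed-Attention Guidance, sketched: s is the guidance scale
% (pag_scale in Diffusers), and \hat{\epsilon}_\theta denotes the
% prediction with self-attention maps replaced by the identity.
\tilde{\epsilon}_\theta(x_t)
  = \epsilon_\theta(x_t)
  + s \,\bigl( \epsilon_\theta(x_t) - \hat{\epsilon}_\theta(x_t) \bigr)
```

Larger s pushes the sample further away from the degraded, attention-perturbed prediction, which is where the quality improvement comes from.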
🧨 Diffusers v0.30 is out! One of the biggest releases to date, packed with image, audio & video models among many other improvements! Also my first release as a member of the team 🤗

Image pipelines:
- Flux
- Lumina
- AuraFlow
- Kolors
- Perturbed Attention Guidance 🤝 Stable Diffusion, AnimateDiff, HunyuanDiT, PixArt-Sigma, Kolors

Video pipelines:
- CogVideoX (introduces the first 3D VAE into Diffusers!)
- Latte
- AnimateDiff SparseCtrl
- FreeNoise 🤝 AnimateDiff

Audio pipelines:
- Stable Audio

Huge shoutout to Donghoon Ahn and team, the folks who created PAG: a technique that improves baseline image generation quality, with many tweakable settings to play around with for different stylistic effects (read more about it here: https://lnkd.in/guseSWFf)! I'm really inspired by how helpful they've been in the community, providing insights and integrating PAG into several other generation pipelines.

We also shipped day-0 support for Flux (by Black Forest Labs), AuraFlow (by fal) and CogVideoX (by ZhipuAI and Tsinghua University). Witnessing the team work hard on so many parallel collaborations, and being a part of it, has been a fun and crazy learning experience.

Check out the release notes: https://lnkd.in/gSNbA_NV

Also a huge thanks to all the heroes of Diffusers: the open-source contributors who make the library better every day!
Hugging Face reposted this
FLUX.1-dev with ControlNet Canny! 🔥🔥 The performance is next-level! Launch the app with this Colab: https://lnkd.in/gSapm6h9 or access the app on Hugging Face Spaces: https://lnkd.in/gVGKu-zz
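ControlNet Canny conditions image generation on an edge map extracted from a reference image. As a simplified stand-in for that preprocessing step, here is a pure-Python Sobel-gradient edge detector (real Canny additionally does Gaussian smoothing, non-maximum suppression, and hysteresis thresholding; the threshold value is arbitrary):

```python
# Simplified stand-in for the Canny edge maps used to condition ControlNet.
# Real Canny adds Gaussian smoothing, non-maximum suppression, and hysteresis
# thresholding; this sketch only computes Sobel gradient magnitude + threshold.

def sobel_edges(img, threshold=4):
    """img: 2D list of grayscale values; returns a binary edge map."""
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal and vertical Sobel responses.
            gx = (img[y-1][x+1] + 2*img[y][x+1] + img[y+1][x+1]
                  - img[y-1][x-1] - 2*img[y][x-1] - img[y+1][x-1])
            gy = (img[y+1][x-1] + 2*img[y+1][x] + img[y+1][x+1]
                  - img[y-1][x-1] - 2*img[y-1][x] - img[y-1][x+1])
            if (gx*gx + gy*gy) ** 0.5 >= threshold:
                edges[y][x] = 1
    return edges

# A tiny image with a vertical brightness step: edges fire along the boundary.
img = [[0, 0, 9, 9]] * 4
print(sobel_edges(img))
```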
Hugging Face reposted this
📣 Introducing VFusion3D, a new image-to-3D model from AI at Meta 🔥 A novel approach: build a 3D generative model by leveraging a strong pre-trained video diffusion model!

💡 Meta's VFusion3D has the following key features:
- A feed-forward 3D generative model
- Handles the scarcity of 3D data by using a fine-tuned video diffusion model to create a high-quality multi-view 3D dataset
- Trained on 3 million of these synthetic data points
- Creative Commons (CC BY-NC 4.0) license

Official Gradio app on Hugging Face Spaces: https://lnkd.in/g6AwYV9q
Hugging Face reposted this
Transformer Explainer: Interactive Learning of Text-Generative Models

discuss: https://lnkd.in/ejApsiQq

Transformers have revolutionized machine learning, yet their inner workings remain opaque to many. We present Transformer Explainer, an interactive visualization tool designed for non-experts to learn about Transformers through the GPT-2 model. Our tool helps users understand complex Transformer concepts by integrating a model overview and enabling smooth transitions across abstraction levels of mathematical operations and model structures. It runs a live GPT-2 instance locally in the user's browser, empowering users to experiment with their own input and observe in real time how the internal components and parameters of the Transformer work together to predict the next tokens. Our tool requires no installation or special hardware, broadening the public's access to education on modern generative AI techniques.
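The last step the tool visualizes, turning GPT-2's output logits into next-token probabilities, is just a temperature-scaled softmax. A minimal sketch (the token strings and logit values below are made up for illustration):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for candidate next tokens after "The cat sat on the".
vocab = ["mat", "dog", "moon"]
logits = [4.0, 2.0, 1.0]

probs = softmax(logits)
best = vocab[probs.index(max(probs))]
print(best, [round(p, 3) for p in probs])
```

Lower temperatures sharpen the distribution toward the top logit; higher temperatures flatten it, which is exactly the knob the demo exposes for sampling.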
Hugging Face reposted this
Introducing Parler TTS v1 🔉 - 885M (Mini) & 2.2B (Large) - fully open-source text-to-speech models! 🤙

> Trained on 45,000 hours of open speech (datasets released as well)
> Up to 4x faster generation thanks to torch.compile & a static KV cache (compared to the previous v0.1 release)
> Mini is trained with a larger text encoder; Large with both a larger text encoder & decoder
> Also supports SDPA & Flash Attention 2 for an added speed boost
> Built-in streaming: we provide a dedicated streaming class optimized for time to first audio
> Better speaker consistency: more than a dozen speakers to choose from, or create a speaker description prompt and use that
> Not convinced by a speaker? You can fine-tune the model on your own dataset (only a couple of hours of data would do)

Apache 2.0 licensed codebase, weights and datasets! 🐐 Can't wait to see what y'all build with this! 🤗
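The static KV cache mentioned above speeds up decoding by preallocating fixed-size key/value buffers once and writing each step's entry in place, instead of growing tensors every step (fixed shapes are also what lets torch.compile avoid recompilation). A minimal pure-Python sketch of the idea, with all names hypothetical rather than the Parler TTS API:

```python
# Sketch of the static-KV-cache idea: preallocate one slot per future
# decoding step, then fill slots in place. Hypothetical names; not the
# Parler TTS or Transformers API.

class StaticKVCache:
    def __init__(self, max_len, head_dim):
        # Fixed-size buffers allocated once, up front.
        self.keys = [[0.0] * head_dim for _ in range(max_len)]
        self.values = [[0.0] * head_dim for _ in range(max_len)]
        self.length = 0

    def update(self, key, value):
        # Write in place at the current position; no reallocation or copy.
        self.keys[self.length] = key
        self.values[self.length] = value
        self.length += 1

    def view(self):
        # Only the filled prefix participates in attention.
        return self.keys[:self.length], self.values[:self.length]

cache = StaticKVCache(max_len=4, head_dim=2)
cache.update([1.0, 0.0], [0.5, 0.5])
cache.update([0.0, 1.0], [0.2, 0.8])
k, v = cache.view()
print(len(k), k[1])
```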
Hugging Face reposted this
I am super excited to announce that we've acquired XetHub! 🎉 XetHub has developed technology that enables Git to scale to terabyte-sized repositories. Under the hood, they've added file chunking and deduplication inside Git. This will help us unlock the next 5 years of growth of HF datasets and models by switching to our own, better version of LFS as the storage backend for the Hub's repos. 🔥

In the announcement blog post (read it here: https://lnkd.in/e-jxSeCf), I also shared a few cool stats about where the Hugging Face Hub is today 🤯:
• number of repos: 1.3M models, 450k datasets, 680k Spaces
• total cumulative size: 12 PB stored in LFS (280M files) / 7.3 TB stored in Git (non-LFS)
• Hub's daily number of requests: 1B
• daily CloudFront bandwidth: 6 PB 🤯🤯
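A rough sketch of the chunking-and-deduplication idea: cut files into chunks at positions determined by the content itself (so an edit only reshuffles nearby chunks instead of shifting every chunk), then store each distinct chunk once, keyed by its digest. The toy rolling hash and all parameters below are illustrative, not XetHub's actual algorithm:

```python
import hashlib

def chunk(data: bytes, window=4, mask=0x0F, min_size=2):
    """Content-defined chunking: cut wherever a hash of the last
    `window` bytes matches a boundary pattern."""
    chunks, start = [], 0
    for i in range(window, len(data)):
        h = int.from_bytes(hashlib.sha256(data[i - window:i]).digest()[:4], "big")
        if (h & mask) == 0 and i - start >= min_size:
            chunks.append(data[start:i])
            start = i
    chunks.append(data[start:])
    return chunks

def dedup_store(files):
    """Store each distinct chunk once, keyed by its SHA-256 digest."""
    store, manifests = {}, {}
    for name, data in files.items():
        manifest = []
        for c in chunk(data):
            digest = hashlib.sha256(c).hexdigest()
            store.setdefault(digest, c)   # identical chunks stored once
            manifest.append(digest)
        manifests[name] = manifest
    return store, manifests

# Two near-identical "files": chunks from the shared prefix are stored once.
a = b"The quick brown fox jumps over the lazy dog. " * 4
b = a + b"extra bytes appended"
store, manifests = dedup_store({"v1": a, "v2": b})
print(len(store), len(manifests["v1"]), len(manifests["v2"]))
```

Each file's manifest of digests is enough to reconstruct it from the store, which is what lets a Git-like backend track huge files cheaply across versions.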