tasty multimodal transformer papers that I liked in November 2024
[3/3]
Here I've collected papers on models that process text and image embeddings together. In all of them, the authors use a simple decoder architecture and predict the next token. They differ in how they handle images: normalizing flows, rectified flow, or just an MSE loss between the predicted and the actual next patch.
Multimodal Autoregressive Pre-training of Large Vision Encoders
by Apple
tl;dr: simple yet effective multimodal transformer.
• one simple decoder that predicts both the next image patches and the next text tokens (a minimal sketch follows the link).
• can be used for image understanding and image captioning.
• better than SOTA contrastive models (SigLIP) on multimodal image understanding.
link: https://arxiv.org/abs/2411.14402
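To make the "one decoder, two losses" idea concrete, here's a minimal sketch, not the paper's actual code: MSE on the next image patch plus cross-entropy on the next text token. All names (`decoder`, `patch_head`, `lm_head`, `text_embed`) are hypothetical, and I regress patch embeddings here for simplicity.

```python
import torch
import torch.nn.functional as F

def aimv2_style_loss(decoder, patch_embeds, text_ids, text_embed, lm_head, patch_head):
    """Joint loss: regress the next image patch, classify the next text token."""
    # patch_embeds: (B, P, D) continuous patch embeddings; text_ids: (B, T)
    seq = torch.cat([patch_embeds, text_embed(text_ids)], dim=1)  # image first, then caption
    hidden = decoder(seq)  # causal decoder, returns (B, P+T, D)

    P = patch_embeds.size(1)
    # hidden state at patch i regresses patch i+1 (MSE)
    patch_loss = F.mse_loss(patch_head(hidden[:, : P - 1]), patch_embeds[:, 1:])
    # states from the last patch onward predict the caption tokens (cross-entropy)
    logits = lm_head(hidden[:, P - 1 : -1])  # (B, T, vocab)
    token_loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), text_ids.reshape(-1))
    return patch_loss + token_loss
```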
JetFormer: An Autoregressive Generative Model of Raw Images and Text by DeepMind
tl;dr: use a normalizing flow instead of a VQ-VAE for image embeddings.
- trained from scratch to model text and raw pixels jointly
- the transformer predicts a distribution over the next image latents, so we can sample from it at inference time (sketch after the link)
- the normalizing flow is invertible, so it loses no information; potentially this approach is good for understanding and generation at the same time
link: https://arxiv.org/abs/2411.19722
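The "predict a distribution, then sample" step looks roughly like this: the decoder outputs the parameters of a Gaussian mixture over the next continuous latent. A minimal sampling sketch with hypothetical names, not the paper's code:

```python
import torch

def sample_next_latent(mix_logits, means, log_scales):
    """Sample one continuous image latent from a predicted Gaussian mixture."""
    # mix_logits: (B, K); means, log_scales: (B, K, D)
    comp = torch.distributions.Categorical(logits=mix_logits).sample()  # (B,) component ids
    idx = comp.view(-1, 1, 1).expand(-1, 1, means.size(-1))             # (B, 1, D)
    mu = means.gather(1, idx).squeeze(1)                                # (B, D) chosen mean
    sigma = log_scales.gather(1, idx).squeeze(1).exp()                  # (B, D) chosen scale
    return mu + sigma * torch.randn_like(mu)                            # Gaussian sample
```

Because the flow is invertible, sampled latents can then be mapped back to exact pixels by running the flow in reverse.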
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation by DeepSeek
tl;dr: combine next-text-token prediction with flow matching.
• the model understands image + text prompts
• generates image embeddings from noise embeddings via rectified flow (sketch after the link)
• uses different image embeddings for understanding and for generation
• understanding: [image → caption]; generation: [prompt → image]
link: https://arxiv.org/abs/2411.07975
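A minimal sketch of the rectified-flow objective on the generation side, assuming a hypothetical `velocity_net` standing in for the text-conditioned decoder: learn a velocity field that moves noise toward data along straight lines.

```python
import torch
import torch.nn.functional as F

def rectified_flow_loss(velocity_net, x_data, cond):
    """Flow-matching loss on straight noise-to-data paths."""
    # x_data: (B, N, D) clean image embeddings; cond: text conditioning
    noise = torch.randn_like(x_data)
    t = torch.rand(x_data.size(0), 1, 1, device=x_data.device)  # one timestep per sample
    x_t = (1 - t) * noise + t * x_data   # point on the straight line at time t
    target_v = x_data - noise            # constant velocity of that line
    return F.mse_loss(velocity_net(x_t, t, cond), target_v)
```

At inference you start from pure noise and integrate the learned velocity with a few Euler steps to obtain image embeddings.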
my thoughts
Check out this plot twist - like something from an action movie! All the top labs are simultaneously ditching CLIP with its contrastive learning and switching to pure autoregression. And it makes total sense - why have separate encoders for images and text when you can teach one model to do it all?
DeepMind really went for it here - they straight up put a normalizing flow into the core architecture. Meanwhile, DeepSeek took a different route - bolting rectified flow onto the autoregressive decoder, with separate image embeddings for understanding and generation. Both approaches work, and that's amazing! Apple's keeping up too - they built a super simple decoder that predicts both tokens and patches, and it beats SigLIP on multimodal understanding.
You know what's really cool? We're watching a new generation of models being born - universal, powerful, yet elegantly simple. The old CLIP+VQVAE combos will soon be history.
Please vote!!