DL in NLP

12,697 @dlinnlp

Description

News and paper reviews on natural language processing, neural networks, and all that.

Contact: @dropout05 (no ads)

2 months, 2 weeks ago

But will they give a Nobel Prize in Literature for the Deep Learning Book?

3,300 #

2 months, 3 weeks ago

Soumith Chintala (creator of PyTorch) lays out the fundamentals of how to train on 10K GPUs
x.com/soumithchintala/status/1841498799652708712

A very short TL;DR (I recommend everyone read the original, it's not long)

Maximize batch size and GPU utilization: 3D parallelism + gradient checkpointing.
Overlap communication with computation, e.g. while layer N-1 is computing its backward pass, all GPUs holding layer N can all-reduce.
Optimize for your GPU cluster's network topology.
Plan for failure recovery: at 10K GPU scale, things fail all the time (GPUs, NICs, cables, etc.).
At 10K scale, bit flips actually become a problem and can cause loss explosions. Save your model state as frequently and as quickly as you can. To speed this up, save it in shards, copy it to CPU memory first, and then write it to disk in a separate thread.
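
The last point (shard the state, snapshot it to CPU memory, write to disk in a separate thread) can be sketched roughly as below. This is only an illustration under assumptions, not the setup from the original thread: plain Python lists stand in for tensors, `copy.deepcopy` stands in for a device-to-host copy, and all function names and the file naming scheme are hypothetical.

```python
import copy
import os
import pickle
import threading


def snapshot_shards(state, num_shards):
    """Copy the state into host memory and split it into shards by key."""
    shards = [{} for _ in range(num_shards)]
    for i, key in enumerate(sorted(state)):
        # deepcopy stands in for a device-to-host copy of a tensor (assumption)
        shards[i % num_shards][key] = copy.deepcopy(state[key])
    return shards


def write_shards(shards, directory, step):
    """Write each shard to its own file; intended to run in a background thread."""
    os.makedirs(directory, exist_ok=True)
    for idx, shard in enumerate(shards):
        final_path = os.path.join(directory, f"step{step}-shard{idx}.pkl")
        tmp_path = final_path + ".tmp"
        with open(tmp_path, "wb") as f:
            pickle.dump(shard, f)
        # Rename only after the file is complete, so a crash mid-write
        # never leaves a truncated file under the final name.
        os.replace(tmp_path, final_path)


def async_checkpoint(state, directory, step, num_shards=2):
    """Snapshot synchronously (fast), then write to disk asynchronously (slow)."""
    shards = snapshot_shards(state, num_shards)
    writer = threading.Thread(target=write_shards, args=(shards, directory, step))
    writer.start()
    return writer  # join() before relying on the files


def load_checkpoint(directory, step, num_shards=2):
    """Merge all shards of one step back into a single state dict."""
    state = {}
    for idx in range(num_shards):
        path = os.path.join(directory, f"step{step}-shard{idx}.pkl")
        with open(path, "rb") as f:
            state.update(pickle.load(f))
    return state
```

The training loop is only blocked for the in-memory snapshot; it can keep mutating `state` while the writer thread is still flushing the copied shards to disk.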

6,800 #

3 months ago

o1-mini inference scaling experiments

A neat summary of one person's experiments. In short: if you can convince the model to think longer (which isn't easy yet), pass@1 really does grow log-linearly. And this is most likely not majority voting or self-consistency, since those methods hit a ceiling.
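
For reference, majority voting (self-consistency) just samples several answers and keeps the most frequent one. A minimal sketch, with a hypothetical function name and made-up sample answers, not taken from the experiments being summarized:

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent final answer among sampled completions."""
    counts = Counter(answers)
    answer, _ = counts.most_common(1)[0]
    return answer


# Hypothetical final answers extracted from five sampled completions.
samples = ["41", "42", "41", "41", "42"]
print(majority_vote(samples))
```

One intuition for the ceiling: the vote converges to the model's most common answer, so if that answer is wrong, drawing more samples only makes the wrong answer win more reliably.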

8,700 #

8 months ago

Let's Think Dot by Dot: Hidden Computation in Transformer Language Models
arxiv.org/abs/2404.15758

We show that transformers can use meaningless filler tokens (e.g., '......') in place of a chain of thought to solve two hard algorithmic tasks they could not solve when responding without intermediate tokens.

Chain-of-thought responses from language models improve performance across most benchmarks. However, it remains unclear to what extent these performance gains can be attributed to human-like task...

8,400 #