Deep Learning, Computer Vision and NLP

Description
Deep Learning 💡,
Computer Vision 📽️ &
#AI 🧠

Get #free_books,
#Online_courses,
#Research_papers,
#Codes, and #Projects,
tricks and hacks, coding, and training material

Suggestions: @AIindian


How Much GPU Memory Is Needed to Serve an LLM?

This is a common question that consistently comes up in interviews or in discussions with your business stakeholders.

And it's not just a random question; it's a key indicator of how well you understand the deployment and scalability of these powerful models in production.

As a data scientist, understanding and estimating the required GPU memory is essential.

LLMs (Large Language Models) range in size from 7 billion parameters to trillions of parameters. One size certainly doesn't fit all.

Let's dive into the math that will help you estimate the GPU memory needed for deploying these models effectively.

๐“๐ก๐ž ๐Ÿ๐จ๐ซ๐ฆ๐ฎ๐ฅ๐š ๐ญ๐จ ๐ž๐ฌ๐ญ๐ข๐ฆ๐š๐ญ๐ž ๐†๐๐” ๐ฆ๐ž๐ฆ๐จ๐ซ๐ฒ ๐ข๐ฌ

General formula: M = ((P * size per parameter) / memory density) * overhead factor

Where:
- M is the GPU memory in gigabytes.
- P is the number of parameters in the model.
- Size per parameter is the number of bytes used to store each parameter, 4 bytes for float32 precision.
- Memory density is the precision scaling term, 32/Q in the simplified formula below, where Q is the number of bits actually used to load each parameter (e.g., 16 for half precision).
- Overhead factor (e.g., 1.2) accounts for additional memory beyond the parameters themselves, such as activations, temporary tensors, and memory fragmentation or padding.

๐’๐ข๐ฆ๐ฉ๐ฅ๐ข๐Ÿ๐ข๐ž๐ ๐…๐จ๐ซ๐ฆ๐ฎ๐ฅ๐š:

M = ((P * 4B)/(32/Q)) * 1.2
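
To make the arithmetic concrete, here is a minimal Python sketch of this estimate; the function name and the 7B, 16-bit example are illustrative assumptions, not part of the original post.

```python
def estimate_gpu_memory_gb(num_params: float, q_bits: int, overhead: float = 1.2) -> float:
    """Estimate GPU memory (GB) to serve an LLM: M = ((P * 4B) / (32 / Q)) * overhead."""
    bytes_per_param_fp32 = 4            # 4 bytes per parameter at float32
    precision_scale = 32 / q_bits       # 32-bit baseline divided by serving precision Q
    mem_bytes = (num_params * bytes_per_param_fp32 / precision_scale) * overhead
    return mem_bytes / 1e9              # convert bytes to gigabytes

# Example (assumed, not from the post): a 7-billion-parameter model served in 16-bit precision
print(f"{estimate_gpu_memory_gb(7e9, q_bits=16):.1f} GB")  # -> 16.8 GB
```

With Q = 8 the same 7B model comes out to roughly 8.4 GB, so halving the serving precision roughly halves the memory estimate.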

With this formula in hand, I hope you'll feel more confident when discussing GPU memory requirements with your business stakeholders.

#LLM

The matrix calculus for Deep Learning. Very well written. https://explained.ai/matrix-calculus/
