Uber used RAG and AI agents to build its in-house Text-to-SQL system, saving 140,000 hours annually in query-writing time.
Here's how they built the system end to end:
The system is called QueryGPT and is built on top of multiple agents, each handling one part of the pipeline.
First, the Intent Agent interprets the user's intent and picks the domain workspace relevant to the question (e.g., Mobility, Billing, etc.).
The Table Agent then selects suitable tables using an LLM, which users can also review and adjust.
Next, the Column Prune Agent filters out any unnecessary columns from large tables using RAG. This helps the schema fit within token limits.
Finally, QueryGPT uses Few-Shot Prompting with selected SQL samples and schemas to generate the query.
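The four steps above can be sketched as a plain-Python pipeline. This is only an illustrative skeleton under assumed names and toy catalogs: in the real system, each stage is an LLM (or RAG) call, not the keyword matching used here.

```python
# Minimal sketch of a QueryGPT-style pipeline. All names, catalogs, and the
# keyword heuristics are illustrative assumptions, not Uber's internals.

WORKSPACES = {"trip": "Mobility", "invoice": "Billing"}

CATALOG = {
    "Mobility": {"fact_trips": ["trip_id", "city", "fare", "started_at"]},
    "Billing": {"fact_invoices": ["invoice_id", "amount", "issued_at"]},
}

def intent_agent(question: str) -> str:
    """Pick the domain workspace relevant to the question."""
    q = question.lower()
    for keyword, workspace in WORKSPACES.items():
        if keyword in q:
            return workspace
    return "Core"

def table_agent(workspace: str) -> list:
    """Select candidate tables from the workspace (users can review/adjust)."""
    return list(CATALOG.get(workspace, {}))

def column_prune_agent(workspace: str, tables: list, question: str) -> dict:
    """Drop columns irrelevant to the question so the schema fits token limits."""
    q = question.lower()
    pruned = {}
    for t in tables:
        cols = CATALOG[workspace][t]
        kept = [c for c in cols if c.endswith("_id") or c.split("_")[0] in q]
        pruned[t] = kept or cols  # never prune down to an empty schema
    return pruned

def build_prompt(question: str) -> str:
    """Assemble the few-shot prompt the final SQL-generation call would receive."""
    ws = intent_agent(question)
    schema = column_prune_agent(ws, table_agent(ws), question)
    return f"Workspace: {ws}\nSchema: {schema}\nQuestion: {question}\nSQL:"

print(build_prompt("Average trip fare by city last week"))
```

The point of the structure, as in the article, is that each agent narrows the context before the expensive final generation step sees it.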
QueryGPT reduced query authoring time from 10 minutes to 3, saving over 140,000 hours annually!
Link to the full article: https://www.uber.com/en-IN/blog/query-gpt/?uclick_id=6cfc9a34-aa3e-4140-9e8e-34e867b80b2b
How Much GPU Memory Is Needed to Serve an LLM?
This is a common question that consistently comes up in interviews or in discussions with your business stakeholders.
And it's not just a random question: it's a key indicator of how well you understand the deployment and scalability of these powerful models in production.
As a data scientist, understanding and estimating the required GPU memory is essential.
LLMs (Large Language Models) vary in size from 7 billion parameters to trillions of parameters. One size certainly doesn't fit all.
Let's dive into the math that will help you estimate the GPU memory needed to deploy these models effectively.
The formula to estimate GPU memory is:
General formula: M = ((P * size per parameter) / memory density) * overhead factor
Where:
- M is the GPU memory in gigabytes.
- P is the number of parameters in the model.
- size per parameter is the bytes needed to store each parameter: 4 bytes at float32 precision.
- memory density is the ratio 32/Q, which scales the float32 footprint down to the precision Q (in bits) you actually load the model in, e.g., Q = 16 for FP16 or Q = 8 for INT8.
- overhead factor (e.g., 1.2) accounts for memory needed beyond just storing parameters, such as activations, temporary tensors, and any memory fragmentation or padding.
Simplified formula:
M = ((P * 4B) / (32 / Q)) * 1.2
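The simplified formula can be wrapped in a small helper. The 1.2 overhead factor is the rule of thumb from the text; real overhead (batch size, KV cache, framework buffers) varies by workload, so treat the result as an estimate.

```python
def estimate_gpu_memory_gb(params: float, bits: int, overhead: float = 1.2) -> float:
    """Estimate serving memory in GB via ((P * 4 bytes) / (32 / Q)) * overhead."""
    bytes_needed = (params * 4) / (32 / bits) * overhead
    return bytes_needed / 1e9  # bytes -> gigabytes

# A 70B-parameter model served in 16-bit precision:
print(round(estimate_gpu_memory_gb(70e9, 16), 1))  # -> 168.0
```

So a 70B model at FP16 needs roughly 168 GB, i.e., two 80 GB GPUs rather than one, which is exactly the kind of back-of-the-envelope answer stakeholders expect.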
With this formula in hand, I hope you'll feel more confident when discussing GPU memory requirements with your business stakeholders.
Best article on getting started with GenAI:
https://blog.bytebytego.com/p/where-to-get-started-with-genai
The matrix calculus needed for deep learning, very well written: https://explained.ai/matrix-calculus/