
LLMs
DeepSeek R-1 671B on 14x RTX 3090s: Real-World Results & Key Takeaways
How KTransformers Dominated llama.cpp in Real-World Inference
Hello! If you don’t know me, I’m the guy with the 14x RTX 3090s Basement AI Server. Earlier this week, with @OsMo999 as my guest commentator, I livestreamed running DeepSeek R-1 671B-q4 with KTransformers on my AI server. Below are the key takeaways from that 6-hour session.
I was inspired by an announcement from the KTransformers team showcasing optimizations for running DeepSeek R-1 671B, 4-bit quantized, and offloading it to the CPU. Instead of just benchmarking this on my own, I livestreamed everything from A to Z. During the stream I went beyond the initial plan, dived into how I use my AI server with vLLM, ExLlamaV2, and llama.cpp, and ran some comparisons.
KTransformers boosted prompt eval speeds by ~15x over llama.cpp! This matched the benchmarks the team published in their release docs. Here’s the eval data for the run at the 1:39:59 mark of the stream:
Prompt Evaluation
Generation Evaluation
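If you want to check a speedup figure like the ~15x above against your own runs, the arithmetic is just two throughput numbers divided by each other. Here is a minimal Python sketch of that calculation. The function names are mine, and the token counts and timings are hypothetical placeholders, not measurements from the stream.

```python
def tokens_per_second(token_count: int, elapsed_seconds: float) -> float:
    """Throughput for one phase (prompt eval or generation)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return token_count / elapsed_seconds


def speedup(candidate_tps: float, baseline_tps: float) -> float:
    """How many times faster the candidate is than the baseline."""
    if baseline_tps <= 0:
        raise ValueError("baseline_tps must be positive")
    return candidate_tps / baseline_tps


# Hypothetical placeholder numbers, NOT results from the livestream.
baseline_tps = tokens_per_second(token_count=1000, elapsed_seconds=100.0)
candidate_tps = tokens_per_second(token_count=1000, elapsed_seconds=10.0)

print(f"baseline:  {baseline_tps:.1f} tok/s")
print(f"candidate: {candidate_tps:.1f} tok/s")
print(f"speedup:   {speedup(candidate_tps, baseline_tps):.1f}x")
```

Compare prompt evaluation and generation separately, since the two phases can speed up by very different factors.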
Funnily enough, last week I wrote a blog post saying not to use llama.cpp for multi-GPU setups… and then I ended up livestreaming it this week!
I plan to stream regularly, so let me know what you’d like to see next! Maybe you’d even like to join as a guest? 😎
I have written a lot of in-depth blog posts on LLMs and AI, but I had never livestreamed on my own before, so I’d love to hear your feedback! Drop your thoughts, ideas, or suggestions for future AI server experiments and livestreams.
Resources From X/Twitter Audio Space on LLMs & AI (2025-02-02)
Curated Links & Insights from the X/Twitter Space on LLMs, RAG, and AI Tools
Here are the resources that were shared and discussed during the Space on February 2nd, 2025. I’ve also included a few additional resources that I believe will enhance the collection. The Space recording (in Arabic) can be found here.
AI, ML, and Neural Networks Visualization
LangChain and Node.js Tutorials
AI adoption in the Middle East
Three Takeaways From DeepSeek’s Big Week
Six Takeaways From a Monumental Week for AI
Shared by @Mishtar
Generative AI in Action
AI Agents in Action