Osman's Odyssey: Byte & Build
Chronicles of a Perpetual Learner

LLMs

  • DeepSeek R-1 671B on 14x RTX 3090s: Real-World Results & Key Takeaways

    2 Minutes

    How KTransformers Dominated llama.cpp in Real-World Inference

    Hello! If you don’t know me, I’m the guy with the 14x RTX 3090s Basement AI Server. Earlier this week, alongside @OsMo999 as my guest commentator, I livestreamed running DeepSeek R-1 671B-q4 using KTransformers on my AI Server. Below are the key takeaways from that 6-hour session.

    I was inspired by an announcement from the KTransformers Team showcasing optimizations for running and offloading DeepSeek R-1 671B, 4-bit quantized, to the CPU. Instead of just benchmarking this on my own, I livestreamed everything from A to Z. During the livestream, I went beyond the initial plan and dived into how I use my AI server with vLLM, ExLlamaV2, and llama.cpp, and ran some comparisons.
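
    To give a rough sense of why CPU offloading matters for a model this size, here is a back-of-the-envelope sketch in Python. It assumes exactly 4 bits per weight (real q4 formats add overhead for scales, so actual files are larger), treats each RTX 3090 as a nominal 24 GB, and ignores KV cache and activations. The function and variable names are illustrative only and not part of any tool mentioned here.

```python
# Rough memory estimate; every figure below is an approximation, not a measurement.
PARAMS = 671e9           # DeepSeek R-1 parameter count, as stated in the post
BITS_PER_WEIGHT = 4      # assumption: exactly 4 bits per weight, no quantization overhead
NUM_GPUS = 14            # from the post
VRAM_PER_GPU_GB = 24     # nominal RTX 3090 memory


def weights_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight size in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


def total_vram_gb(num_gpus: int, vram_per_gpu_gb: float) -> float:
    """Combined nominal VRAM across all GPUs."""
    return num_gpus * vram_per_gpu_gb


model_gb = weights_gb(PARAMS, BITS_PER_WEIGHT)
vram_gb = total_vram_gb(NUM_GPUS, VRAM_PER_GPU_GB)
headroom_gb = vram_gb - model_gb

print(f"Weights at 4 bits:   ~{model_gb:.1f} GB")
print(f"Total nominal VRAM:  {vram_gb} GB")
print(f"Headroom before KV cache and activations: ~{headroom_gb:.1f} GB")
```

    Under these simplified assumptions the weights alone take up nearly all of the combined VRAM, which is the kind of situation where moving part of the model to system RAM becomes attractive.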

    KTransformers boosted prompt eval speeds by ~15x over llama.cpp! This matched the benchmarks the KTransformers Team published in their release docs. Here’s the eval data for the run at the 1:39:59 mark of the stream:

    [Figure: Prompt Evaluation]

    [Figure: Generation Evaluation]

    Funnily enough, last week I wrote a blog post saying not to use llama.cpp for multi-GPU setups… and then I ended up livestreaming it this week!

    I plan to stream regularly, so let me know what you’d like to see next! Maybe you’d even like to join as a guest? 😎

    I have previously written a lot of in-depth blog posts on LLMs and AI, but I had never livestreamed on my own, so I’d love to hear your feedback! Drop your thoughts, ideas, or suggestions for future AI server experiments and livestreams.


  • Resources From X/Twitter Audio Space on LLMs & AI (2025-02-02)

    3 Minutes

    Curated Links & Insights from the X/Twitter Space on LLMs, RAG, and AI Tools

    Here are the resources that were shared and discussed during the Space on February 2nd, 2025. I’ve also included a few additional resources that I believe round out the collection. The Space recording (in Arabic) can be found here.

    AI, ML, and Neural Networks Visualization

    LangChain and Node.js Tutorials

    AI adoption in the Middle East

    Three Takeaways From DeepSeek’s Big Week

    Six Takeaways From a Monumental Week for AI

    Shared by @Mishtar

    Generative AI in Action

    AI Agents in Action