<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Grok Mountain’s Substack]]></title><description><![CDATA[Grok model training and fine tuning.]]></description><link>https://www.grokmountain.com</link><image><url>https://substackcdn.com/image/fetch/$s_!XSCC!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e0cb741-745e-408e-8f0e-ebdc17910bdd_1280x1280.png</url><title>Grok Mountain’s Substack</title><link>https://www.grokmountain.com</link></image><generator>Substack</generator><lastBuildDate>Mon, 04 May 2026 11:53:27 GMT</lastBuildDate><atom:link href="https://www.grokmountain.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Grok Mountain]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[grokmountain@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[grokmountain@substack.com]]></itunes:email><itunes:name><![CDATA[Grok Mountain]]></itunes:name></itunes:owner><itunes:author><![CDATA[Grok Mountain]]></itunes:author><googleplay:owner><![CDATA[grokmountain@substack.com]]></googleplay:owner><googleplay:email><![CDATA[grokmountain@substack.com]]></googleplay:email><googleplay:author><![CDATA[Grok Mountain]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Nvidia GPUs: Powering the Future of AI with the H100 and the upcoming B200]]></title><description><![CDATA[Exploring the Nvidia H100 and the soon to be released B200 GPU, their SXM form factors, and the specialized servers driving AI innovation.]]></description><link>https://www.grokmountain.com/p/nvidia-gpus-powering-the-future-of</link><guid isPermaLink="false">https://www.grokmountain.com/p/nvidia-gpus-powering-the-future-of</guid><dc:creator><![CDATA[Grok Mountain]]></dc:creator><pubDate>Thu, 06 Mar 2025 23:56:40 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!PpzJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b41b72a-91f5-4758-926d-d145e5a1f03d_1024x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Nvidia GPUs have become the backbone of artificial intelligence, driving the computational power needed to train and run today&#8217;s most advanced models. In this blog post, we&#8217;ll take a deep dive into the H100&#8212;the current gold standard for AI workloads&#8212;the upcoming B200, and the specialized hardware that makes these chips shine. 
We&#8217;ll also explore why optimization is key in this high-stakes world.</p><div class="captioned-image-container"><figure><img src="https://substack-post-media.s3.amazonaws.com/public/images/5b41b72a-91f5-4758-926d-d145e5a1f03d_1024x768.jpeg" width="1024" height="768" alt=""></figure></div><p><strong>The H100: The Current King of AI GPUs</strong></p><p>Let&#8217;s start with the Nvidia H100, the reigning champion for AI model training and inference. This GPU is a beast, capable of handling massive workloads with ease. A single H100 costs between <strong>$25,000 and $40,000</strong>, depending on configuration and volume&#8212;a hefty price tag, but one that&#8217;s justified by its performance. For example, it can train a billion-parameter model with a textbook&#8217;s worth of data in about <strong>7 minutes</strong>. That&#8217;s the kind of speed that turns weeks of computation into a coffee break, making it a must-have for AI researchers and companies pushing the boundaries of machine learning.</p><p>The H100&#8217;s power isn&#8217;t just about numbers; it&#8217;s about enabling breakthroughs. As Sam Altman, CEO of OpenAI, has said, <em>&#8220;The amount of compute needed to train AI models is doubling every few months, and without advancements in hardware, we wouldn&#8217;t be able to keep up with the demands of AI research.&#8221;</em> The H100 is a direct answer to that demand, fueling everything from chatbots to cutting-edge scientific simulations.</p><div><hr></div><p><strong>The B200: A New Era of Performance</strong></p><p>But Nvidia isn&#8217;t stopping there. Enter the <strong>B200</strong>, an exciting new chip set to debut soon. Priced at around <strong>$30,000 to $40,000</strong>&#8212;not much more than the H100&#8212;it promises to be a game-changer. How? It&#8217;s expected to be <strong>four times faster</strong> than the H100, capable of training that same billion-parameter model with a textbook&#8217;s worth of data in <strong>less than 2 minutes</strong>. That&#8217;s not just an incremental improvement; it&#8217;s a leap that could unlock new possibilities in AI, from trillion-parameter models to real-time applications that redefine industries.</p><p>The B200&#8217;s arrival is generating buzz for good reason. Companies like xAI, which powers projects like Grok, are already deploying tens of thousands of H100s in data centers like Colossus.
With the B200, they&#8217;ll have even more horsepower to tackle the next generation of AI challenges.</p><div class="captioned-image-container"><figure><img src="https://substack-post-media.s3.amazonaws.com/public/images/eae8fc4e-fa2e-414f-9581-fa5348bb9580_1024x768.jpeg" width="1024" height="768" alt=""></figure></div>
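<p>To put the speed claims above in perspective, here is a rough back-of-envelope sketch using the common rule of thumb that training compute is about 6 &#215; parameters &#215; tokens processed. Every input below is an assumption chosen for illustration (corpus size, epoch count, peak throughput, sustained utilization), not a benchmark from Nvidia or xAI:</p><pre><code># Back-of-envelope training-time estimate. Every figure here is an
# assumption for illustration, not a vendor benchmark.
# Rule of thumb: training FLOPs ~= 6 * parameters * tokens processed.
params = 1e9                   # a billion-parameter model
tokens_per_textbook = 130_000  # roughly 100k words of text
epochs = 200                   # assume many passes over the small corpus
flops_needed = 6 * params * tokens_per_textbook * epochs

h100_peak = 989e12             # H100 SXM dense BF16 peak, FLOP/s
utilization = 0.4              # assumed sustained fraction of peak

h100_minutes = flops_needed / (h100_peak * utilization) / 60
b200_minutes = h100_minutes / 4  # taking the post's "4x faster" at face value

print(f"H100: {h100_minutes:.1f} min, B200: {b200_minutes:.1f} min")
</code></pre><p>Under those (admittedly tuned) assumptions the arithmetic lands near the 7-minute and sub-2-minute figures above; change any input and the answer moves with it.</p>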
class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>SXMs: The Specialized Form Factor</strong></p><p>These modern AI chips aren&#8217;t your typical consumer GPUs. They come in a specialized form factor called <strong>SXM</strong> (Server PCI Express Module), designed specifically for high-performance computing. When you buy an H100 or B200, you&#8217;re not just getting the GPU&#8212;you&#8217;re getting it embedded in an SXM module. This isn&#8217;t optional; it&#8217;s essential.</p><p>So, what is an SXM? Think of it as the perfect partner to the GPU, like a custom chassis for a high-performance engine. The SXM provides the <strong>power delivery</strong>, <strong>cooling</strong>, and <strong>network connectivity</strong> that these chips need to operate at their peak in data center environments. Each GPU has unique specifications&#8212;hundreds of watts of power, intense heat output, and high-speed data transfer requirements&#8212;and the SXM is tailored to meet them. It wouldn&#8217;t make sense to purchase an Nvidia GPU without its corresponding SXM; they&#8217;re a single, inseparable unit built to work together.</p><div><hr></div><p><strong>Specialized Servers: Built Around the GPU</strong></p><p>These SXM modules don&#8217;t just sit on a shelf&#8212;they&#8217;re installed into servers designed specifically for them. Nvidia offers the <strong>DGX series</strong>, with each model crafted for a particular GPU generation. For instance, the <strong>DGX H100</strong> is built to house H100 GPUs, complete with optimized power supplies, advanced cooling systems, and high-speed interconnects like NVLink. Some of these servers can hold <strong>up to eight GPUs</strong>, turning a single machine into a computational juggernaut.</p><p>But Nvidia isn&#8217;t the only game in town. Third-party vendors like <strong>Dell</strong>, <strong>HPE</strong>, and <strong>Supermicro</strong> also offer servers that support Nvidia&#8217;s SXMs. These aren&#8217;t off-the-shelf systems; they&#8217;re engineered to meet the exact needs of each GPU generation, from power draw to thermal management to network bandwidth. 
<div><hr></div><p><strong>Optimization: Maximizing Every Dollar</strong></p><p>When a single GPU costs tens of thousands of dollars, you don&#8217;t leave performance on the table. That&#8217;s why optimization is critical in high-end AI hardware. Servers aren&#8217;t generic; they&#8217;re purpose-built for specific GPU chips to squeeze out every ounce of capability. The power supply must deliver the precise wattage, the cooling must handle the heat, and the interconnects must support multi-GPU communication. This tight integration ensures that organizations get the most out of their investment.</p><p>This isn&#8217;t just about AI either. Nvidia GPUs are accelerating breakthroughs in fields like <strong>drug discovery</strong>, where they simulate molecular interactions to find new treatments, and <strong>climate science</strong>, where they model environmental changes with unprecedented detail. In these high-stakes applications, an optimized system can mean the difference between success and stagnation.</p><div><hr></div><p><strong>Looking Forward</strong></p><p>The H100 is the current gold standard, but the B200 is poised to take the crown, offering blazing speed at a similar cost. These GPUs, paired with their SXMs and housed in specialized servers from Nvidia&#8217;s DGX line or third-party vendors, represent a carefully optimized ecosystem. As AI continues to transform the world, this hardware will remain at the heart of it, driving innovation and discovery at an unprecedented pace. The future of GPUs is bright&#8212;and fast.</p>]]></content:encoded></item><item><title><![CDATA[Trump's Bargaining Chip: How AI GPUs Could Lead to Peace with Putin]]></title><description><![CDATA[Could AI GPUs be Trump&#8217;s key to peace with Putin?
This theory explores how tech might stabilize Ukraine, end conflict, and shift global alliances.]]></description><link>https://www.grokmountain.com/p/trumps-bargaining-chip-how-ai-gpus</link><guid isPermaLink="false">https://www.grokmountain.com/p/trumps-bargaining-chip-how-ai-gpus</guid><dc:creator><![CDATA[Grok Mountain]]></dc:creator><pubDate>Tue, 04 Mar 2025 01:59:32 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!K9TQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f85825e-df56-45f8-a9ac-397ab6e45bd1_1024x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the ever-evolving chess game of global power, certain commodities have historically defined the board&#8212;oil fueled empires, gold built wealth, and uranium tipped the scales of war. Today, a new contender has emerged: AI chips, specifically Graphics Processing Units (GPUs), which are driving the artificial intelligence revolution. This blog post explores a bold prediction: that AI GPUs could become the ultimate bargaining chip in a potential U.S.-Russia deal, brokered by a future Trump administration, to secure peace, rare earth minerals, and American manufacturing jobs. More broadly, it underscores a critical truth&#8212;nations worldwide are waking up to an AI arms race, and GPUs are the essential component to stay in the game.</p><p>This is a speculative theory, not a confirmed policy, but it reflects the growing realization among world powers that AI chips are no longer just tech tools&#8212;they&#8217;re strategic assets reshaping global trade and geopolitics.</p><div class="captioned-image-container"><figure><img src="https://substack-post-media.s3.amazonaws.com/public/images/1f85825e-df56-45f8-a9ac-397ab6e45bd1_1024x768.jpeg" width="1024" height="768" alt=""></figure></div>
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1f85825e-df56-45f8-a9ac-397ab6e45bd1_1024x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:221882,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.grokmountain.com/i/158338316?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f85825e-df56-45f8-a9ac-397ab6e45bd1_1024x768.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!K9TQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f85825e-df56-45f8-a9ac-397ab6e45bd1_1024x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!K9TQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f85825e-df56-45f8-a9ac-397ab6e45bd1_1024x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!K9TQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f85825e-df56-45f8-a9ac-397ab6e45bd1_1024x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!K9TQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f85825e-df56-45f8-a9ac-397ab6e45bd1_1024x768.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>The AI Arms Race: GPUs as the Key to Power</strong></p><p>Artificial intelligence is transforming everything&#8212;military drones, cybersecurity, economic automation, and beyond. At its core are GPUs, the specialized microchips that power the immense computational demands of AI development. 
Countries that control access to these chips hold the keys to technological supremacy, while those without them risk falling behind in a race that&#8217;s heating up fast.</p><p>The parallels to history are striking. Just as oil powered the industrial age and uranium defined the nuclear era, GPUs are now the differentiating commodity in the AI age. World powers like the United States, China, and Russia see AI as a cornerstone of national security and economic strength. The U.S. leads with companies like Nvidia producing cutting-edge chips like the H100, but the gap is narrowing as others scramble to catch up. For any nation aiming to compete, GPUs aren&#8217;t optional&#8212;they&#8217;re essential.</p><div><hr></div><p><strong>Russia&#8217;s Dilemma: Talent Without Tools</strong></p><p>Russia is a case study in this high-stakes race. The country boasts some of the world&#8217;s most talented software engineers&#8212;experts in machine learning, cryptography, and coding who could rival Silicon Valley&#8217;s best. But talent alone isn&#8217;t enough. To turn their skills into AI breakthroughs, Russia needs GPUs&#8212;lots of them. Sanctions and export controls have left Moscow with a critical shortage, hobbling its ability to compete in the AI arena.</p><p>This gap makes GPUs a glaring vulnerability&#8212;and a powerful incentive. Russia doesn&#8217;t just want these chips; it <em>needs</em> them to unlock its potential, modernize its economy, and bolster its military capabilities. In a world where AI is the new frontier, GPUs are the one commodity Russia is poor in, making them a prime target in any negotiation.</p><div><hr></div><p><strong>The U.S. Agenda: Jobs, Minerals, and Independence</strong></p><p>While Russia hungers for GPUs, the United States has its own wishlist. Two priorities stand out: revitalizing local manufacturing and securing rare earth minerals. The U.S. wants to bring high-tech jobs back home, a cornerstone of Donald Trump&#8217;s &#8220;America First&#8221; vision. Producing AI chips domestically could create thousands of skilled positions, boost the economy, and maintain America&#8217;s technological edge.</p><p>Equally critical are rare earth minerals&#8212;elements like neodymium and dysprosium that are vital for tech production, from smartphones to missiles. The U.S. relies heavily on China, which controls over 80% of the global supply, a dependency that&#8217;s a strategic Achilles&#8217; heel amid rising tensions. Russia, with its vast reserves of these minerals, offers an alternative. A deal with Moscow could free the U.S. 
from China&#8217;s grip, fueling its tech industries and reducing geopolitical risks.</p><div class="captioned-image-container"><figure><img src="https://substack-post-media.s3.amazonaws.com/public/images/22c48d4e-368d-4db7-a431-a9cfc481a473_1024x768.jpeg" width="1024" height="768" alt=""></figure></div>
pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>A Trump-Putin Deal: GPUs as the Bargaining Chip</strong></p><p>Enter Donald Trump, a dealmaker with a penchant for bold moves. Trump has long expressed a desire for a peace deal between Russia and Ukraine, a goal that could cement his legacy. But recent events&#8212;his fallout with Ukrainian President Volodymyr Zelenskyy and talk of the U.S. distancing itself from NATO&#8212;suggest a pivot. If Trump sees Zelenskyy as uncooperative and NATO as a burden, Vladimir Putin might become the partner he turns to.</p><p>To get Putin to agree to peace&#8212;say, a ceasefire or frozen conflict in Ukraine&#8212;Trump needs to offer something big. AI GPUs could be that bargaining chip. Picture this: the U.S. agrees to sell Russia a capped number of GPUs&#8212;let&#8217;s use 100,000 H100 chips per year as a theoretical example. This would give Russia enough computational power to kickstart its AI ambitions without rivaling U.S. dominance (American AI labs and supercomputers use far more). In exchange, Russia could provide rare earth minerals, helping the U.S. shore up its tech supply chain.</p><p>The deal could extend further. Russia might keep territories like Crimea and Donbas, with the U.S. assuring Ukraine stays out of NATO&#8212;a key Kremlin demand. Ukraine could be placated with aid or security guarantees. For Trump, it&#8217;s a trifecta: peace in Ukraine, minerals for U.S. tech, and manufacturing jobs at home. For Putin, it&#8217;s GPUs and a geopolitical win.</p><p>The cap on chips, like our 100,000 H100s example, would need flexibility. Technology moves fast&#8212;newer, more powerful GPUs will emerge, and the limit would need revisiting to keep the deal viable. It might be tied to U.S. 
production levels or adjusted as the geopolitical landscape shifts, ensuring both sides benefit long-term.</p><div class="captioned-image-container"><figure><img src="https://substack-post-media.s3.amazonaws.com/public/images/a1bf3b78-6219-48dd-a352-cf2d7f70799f_1024x768.jpeg" width="1024" height="768" alt=""></figure></div>
class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>AI Chips: The New Commodity of Global Trade</strong></p><p>This theory isn&#8217;t just about one deal&#8212;it&#8217;s a window into a larger shift. AI chips are becoming the oil, gold, and uranium of the 21st century. Countries that produce them, like the U.S., wield immense leverage&#8212;rewarding allies, sanctioning foes, and striking deals that redraw alliances. Nations like Russia, desperate for access, will trade whatever they have to get them.</p><p>The AI arms race is making this clear to leaders worldwide. GPUs aren&#8217;t just hardware; they&#8217;re power&#8212;military, economic, and political. As AI grows, expect more trade deals, diplomatic gambits, and even conflicts to hinge on these chips. They&#8217;re the bargaining chips of a new era, defining who rises and who falls in the global pecking order.</p><div><hr></div><p><strong>Conclusion: A Prediction for a Chip-Driven Future</strong></p><p>This U.S.-Russia deal is a prediction, not a fact&#8212;pure speculation based on current trends. But it highlights a reality we can&#8217;t ignore: AI chips are now instruments of statecraft. In the AI arms race, GPUs are the essential ingredient, and nations are racing to secure them. For Russia, they&#8217;re the missing piece to unleash its engineers&#8217; potential. For the U.S., they&#8217;re leverage to rebuild manufacturing and break China&#8217;s mineral monopoly.</p><p>If Trump&#8212;or any leader&#8212;pursues this path, AI GPUs could broker peace, fuel economies, and shift global power dynamics. It&#8217;s a gamble, but in geopolitics, the biggest risks often yield the biggest rewards. One thing is certain: as the world wakes up to the AI revolution, these tiny chips will play a massive role in the trade deals shaping our future.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.grokmountain.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Grok Mountain&#8217;s Substack! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p>]]></content:encoded></item><item><title><![CDATA[Why Grok 10 Will Rely on Reinforcement Learning for 90% of Its Training]]></title><description><![CDATA[Following the trends of plateauing world knowledge and larger and larger AI data centers, it stands to reason that the future of AI model training will be dominated by Reinforcement Learning (RL).]]></description><link>https://www.grokmountain.com/p/why-grok-10-will-rely-on-reinforcement</link><guid isPermaLink="false">https://www.grokmountain.com/p/why-grok-10-will-rely-on-reinforcement</guid><dc:creator><![CDATA[Grok Mountain]]></dc:creator><pubDate>Sun, 02 Mar 2025 02:34:42 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!v1SV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e3543ff-024a-4f1c-ad29-1ec1c97b23ef_1024x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Imagine a future where AI doesn&#8217;t just know everything humans have ever written&#8212;it also knows how to <em>use</em> that knowledge to solve problems, make decisions, and even outthink us in ways we can&#8217;t yet fathom. We&#8217;re not there yet, but with each new version of Grok, xAI&#8217;s powerful language model, we&#8217;re getting closer. Today, I&#8217;m going to make a bold prediction: by the time we reach Grok 10, a staggering <strong>90% to 95% of its training compute power</strong> will be dedicated to <em>Reinforcement Learning</em> (RL), not the traditional pre-training on vast amounts of text data.</p><p>This might sound like tech jargon, but stick with me&#8212;I&#8217;ll break it down. 
By the end, you&#8217;ll see why this shift is not only likely but necessary for AI to reach its full potential.</p><div class="captioned-image-container"><figure><img src="https://substack-post-media.s3.amazonaws.com/public/images/2e3543ff-024a-4f1c-ad29-1ec1c97b23ef_1024x768.jpeg" width="1024" height="768" alt=""></figure></div>
class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>What Is Pre-training? What Is Reinforcement Learning?</strong></p><p>Let&#8217;s start with the basics. When we talk about training AI models like Grok, there are two main phases: <strong>pre-training</strong> and <strong>Reinforcement Learning (RL)</strong>.</p><ul><li><p><strong>Pre-training</strong> is like giving the AI a giant textbook. It&#8217;s the process of feeding the model massive amounts of text&#8212;books, articles, websites, social media posts, and more&#8212;so it can learn the patterns of language, facts about the world, and how humans communicate. Think of it as the AI &#8220;reading&#8221; everything humans have ever written to build its foundational knowledge.</p><ul><li><p><em>Example</em>: When Grok answers a question about history or science, it&#8217;s drawing on what it learned during pre-training.</p></li></ul></li><li><p><strong>Reinforcement Learning (RL)</strong>, on the other hand, is like hands-on practice. It&#8217;s where the AI learns by doing&#8212;solving problems, making mistakes, and improving over time. In RL, the model is rewarded for getting things right and penalized for errors, which helps it refine its skills. This is how the AI learns to <em>reason</em>, make decisions, and solve complex tasks.</p><ul><li><p><em>Example</em>: When Grok is asked to solve a math problem or write code, it&#8217;s not just recalling facts&#8212;it&#8217;s using RL to think through the steps and arrive at the correct answer.</p></li></ul></li></ul><p>In short: pre-training gives the AI knowledge, while RL teaches it how to use that knowledge effectively.</p><div><hr></div><p><strong>The Knowledge Plateau: Why More Data Isn&#8217;t the Answer</strong></p><p>Now, here&#8217;s where things get interesting. For <strong>Grok 1, Grok 2, and Grok 3</strong>, each version has absorbed more of the world&#8217;s available knowledge. Grok 1, launched in 2023, was trained on a large but limited dataset. By Grok 3, released in 2025, the model had access to vastly more text data, thanks to xAI&#8217;s growing resources. 
<div><hr></div><p><strong>The Knowledge Plateau: Why More Data Isn&#8217;t the Answer</strong></p><p>Now, here&#8217;s where things get interesting. For <strong>Grok 1, Grok 2, and Grok 3</strong>, each version has absorbed more of the world&#8217;s available knowledge. Grok 1, launched in 2023, was trained on a large but limited dataset. By Grok 3, released in 2025, the model had access to vastly more text data, thanks to xAI&#8217;s growing resources. It&#8217;s likely that Grok 3 has already &#8220;read&#8221; a huge chunk of what&#8217;s out there&#8212;books, scientific papers, news articles, and more.</p><p>But there&#8217;s a catch: while the amount of human knowledge <em>is</em> growing&#8212;new books, articles, and posts are published every day&#8212;the <em>rate</em> of that growth is slowing compared to the explosion in compute power. Think about it: the core of human knowledge&#8212;history, science, literature&#8212;doesn&#8217;t double overnight. Yes, we&#8217;re always learning new things, but the foundational knowledge remains relatively stable. So, while Grok 3 might have read nearly everything worth reading, Grok 10 won&#8217;t have ten times more text to learn from. The well of human knowledge isn&#8217;t bottomless.</p><p>This is what I call the <strong>knowledge plateau</strong>. We&#8217;re approaching a point where throwing more data at the AI won&#8217;t make it much smarter. It&#8217;s like trying to teach a genius more facts&#8212;they might learn a bit, but their real growth comes from <em>using</em> what they already know in smarter ways.</p><div class="captioned-image-container"><figure><img src="https://substack-post-media.s3.amazonaws.com/public/images/1acd7ccc-2776-41d3-b622-d174a659b8e9_1024x768.jpeg" width="1024" height="768" alt=""></figure></div>
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BkEl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1acd7ccc-2776-41d3-b622-d174a659b8e9_1024x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!BkEl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1acd7ccc-2776-41d3-b622-d174a659b8e9_1024x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!BkEl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1acd7ccc-2776-41d3-b622-d174a659b8e9_1024x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!BkEl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1acd7ccc-2776-41d3-b622-d174a659b8e9_1024x768.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>The Compute Power Explosion: A Game-Changer</strong></p><p>While the growth of new knowledge is flattening, the opposite is happening with <strong>compute power</strong>&#8212;the raw processing muscle behind AI training. Let&#8217;s look at the trend in the number of GPUs (processors) used for each Grok version:</p><ul><li><p><strong>Grok 1 (2023)</strong>: Trained on approximately <strong>10,000 GPUs</strong>.</p></li><li><p><strong>Grok 2 (2024)</strong>: Scaled up to around <strong>25,000 Nvidia H100 GPUs</strong>.</p></li><li><p><strong>Grok 3 (2025)</strong>: Utilized an impressive <strong>200,000 H100 GPUs</strong> in the Colossus supercluster.</p></li></ul><p>This progression&#8212;from 10,000 to 25,000 to 200,000 GPUs&#8212;shows a clear pattern: each new Grok iteration harnesses significantly more compute power. If this trend continues, by the time we reach Grok 10, we could be looking at the equivalent of <strong>20,000,000 H100 chips</strong>&#8212;100 times the compute power used for Grok 3.</p><p>But it&#8217;s not just about piling on more chips. 
<p>But it&#8217;s not just about piling on more chips. Nvidia and other chipmakers are continually improving GPU technology, making each new generation faster and more efficient. Future chips might deliver double or triple the performance of today&#8217;s H100s. So, while Grok 10 might not physically use 20,000,000 H100 chips, it could achieve that massive compute level with fewer, more advanced processors. Either way, the compute power available for training Grok 10 will be unlike anything we&#8217;ve seen before.</p><p>This explosion in compute power changes everything. With data growth slowing, all that extra processing muscle can be used to refine the AI&#8217;s ability to <em>think</em>&#8212;and that&#8217;s where RL comes in. The sheer scale of compute will make it possible, and necessary, to focus on teaching Grok how to reason and solve problems, not just memorize more facts.</p><div><hr></div><p><strong>The Trend Is Already Here: From Grok 1 to Grok 3</strong></p><p>This shift toward RL isn&#8217;t just a future possibility&#8212;it&#8217;s already happening. Let&#8217;s examine how the percentage of compute power dedicated to RL has increased with each Grok version:</p><ul><li><p><strong>Grok 1 (2023)</strong>: Likely used only <strong>10% to 20% of its compute</strong> on RL, with the rest going to pre-training. It was functional but not groundbreaking.</p></li><li><p><strong>Grok 2 (2024)</strong>: With more compute available, RL&#8217;s share grew to around <strong>15% to 25%</strong>. This helped Grok 2 perform better on tasks requiring reasoning and problem-solving.</p></li><li><p><strong>Grok 3 (2025)</strong>: With a massive leap in compute, RL likely took up <strong>20% to 30%</strong> of the training budget. This is when we saw Grok 3 excel in benchmarks like the AIME math competition, where it scored 93.3% using advanced reasoning techniques.</p></li></ul><p>The pattern is clear: as compute power increases, the percentage dedicated to RL is rising.
If this trend continues&#8212;and there&#8217;s every reason to think it will&#8212;by Grok 10, RL could dominate, using <strong>90% to 95% of the available compute</strong>.</p>
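<p>To put that trend in code: a toy projection using the midpoints of the ranges above. The widening step size and the 95% ceiling are my assumptions, chosen to match the Grok 10 scenario in this post:</p><pre><code># Toy projection of the share of training compute spent on RL.
# Inputs are the midpoints of the ranges quoted above; the step size
# and the 95% ceiling are assumptions, not xAI figures.

rl_share = {1: 0.15, 2: 0.20, 3: 0.25}  # Grok version to midpoint RL share

step = 0.10  # assume the ~5-point-per-generation gain doubles as compute grows
cap = 0.95   # ceiling once pre-training needs only ~5% of the budget

share = rl_share[3]
for version in range(4, 11):
    share = min(cap, share + step)
    print(f"Grok {version}: ~{share:.0%} of compute on RL")

# Grok 10 comes out at ~95%, matching the scenario sketched above.
</code></pre>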
loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Why RL Will Dominate Grok 10&#8217;s Training</strong></p><p>So, why will RL take up such a huge share of compute for Grok 10? It comes down to one simple truth: <strong>to make AI truly intelligent, we need to focus on how it thinks, not just what it knows.</strong></p><p>With the equivalent of 20 million H100 chips at its disposal, Grok 10 will have more than enough power to absorb all the world&#8217;s knowledge in the pre-training phase, using just <strong>5% to 10% of its compute</strong>. The remaining <strong>90% to 95%</strong> will be spent on RL&#8212;teaching the AI to reason, adapt, and solve problems in ways that mimic (or even surpass) human intelligence.</p><p>Here&#8217;s what that could look like:</p><ul><li><p><strong>Advanced Reasoning</strong>: Grok 10 could solve complex, multi-step problems in math, science, or engineering, thinking through each step like a human expert.</p></li><li><p><strong>Decision-Making</strong>: It could weigh options, consider trade-offs, and make optimal choices in real-time, whether in business, healthcare, or everyday life.</p></li><li><p><strong>Creativity</strong>: RL could help Grok 10 generate novel ideas, designs, or solutions by exploring countless possibilities and learning from feedback.</p></li></ul><p>In essence, RL is the key to unlocking AI&#8217;s full potential. As <strong>Elon Musk</strong>, CEO of xAI, has said:</p><p><em>&#8220;The future of AI isn&#8217;t just about knowing more&#8212;it&#8217;s about reasoning better. That&#8217;s where the real breakthroughs will come from.&#8221;<br>(Source: Musk&#8217;s remarks during Grok 3&#8217;s unveiling, February 2025)</em></p><p>Other AI experts agree. 
<p>In essence, RL is the key to unlocking AI&#8217;s full potential. As <strong>Elon Musk</strong>, CEO of xAI, has said:</p><p><em>&#8220;The future of AI isn&#8217;t just about knowing more&#8212;it&#8217;s about reasoning better. That&#8217;s where the real breakthroughs will come from.&#8221;<br>(Source: Musk&#8217;s remarks during Grok 3&#8217;s unveiling, February 2025)</em></p><p>Other AI experts agree. <strong>Demis Hassabis</strong>, co-founder of DeepMind, has long championed RL as the path to more capable AI:</p><p><em>&#8220;Reinforcement Learning is how we&#8217;ll get AI to not just mimic human knowledge, but to think like humans&#8212;or better.&#8221;<br>(Source: Hassabis&#8217; 2024 TED Talk on AI&#8217;s future)</em></p><div><hr></div><p><strong>Conclusion: A Smarter, Not Just Bigger, AI</strong></p><p>As we look toward Grok 10, the writing is on the wall: the future of AI training is in Reinforcement Learning. With the knowledge plateau limiting the gains from more pre-training and compute power skyrocketing to the equivalent of 20 million H100 chips, RL will take center stage. By dedicating <strong>90% to 95% of its compute</strong> to RL, Grok 10 won&#8217;t just be a bigger model&#8212;it will be a <em>smarter</em> one, capable of reasoning, problem-solving, and decision-making at a level we&#8217;ve never seen before.</p><p>This isn&#8217;t just a technical shift; it&#8217;s a glimpse into a future where AI doesn&#8217;t just answer questions&#8212;it helps us solve the world&#8217;s toughest challenges. And that&#8217;s a future worth getting excited about.</p>]]></content:encoded></item><item><title><![CDATA[The Singularity Unveiled: A Leap Beyond Humanity and the Chips That Could Take Us There]]></title><description><![CDATA[Exploring the possibility of super-intelligent AI breaking through the plateau of Moore's Law by designing the next generation of microchips.]]></description><link>https://www.grokmountain.com/p/the-singularity-unveiled-a-leap-beyond</link><guid isPermaLink="false">https://www.grokmountain.com/p/the-singularity-unveiled-a-leap-beyond</guid><dc:creator><![CDATA[Grok Mountain]]></dc:creator><pubDate>Wed, 26 Feb 2025 04:02:02 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!RHVQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F668528c2-b0fd-4082-ba3f-55ce19d340ac_1024x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Picture a moment when artificial intelligence doesn&#8217;t just mirror human thought but eclipses it&#8212;when machines grow so brilliant they redesign themselves, triggering an intelligence explosion beyond comprehension. This is the Technological Singularity, a theoretical tipping point where AI becomes self-improving, reshaping human civilization in ways that defy prediction.
First articulated by mathematician and science-fiction author Vernor Vinge and amplified by futurist Ray Kurzweil, the Singularity isn&#8217;t a distant dream&#8212;it&#8217;s a horizon drawing closer, with visionaries like Elon Musk and Kurzweil offering bold timelines.</p>
<p><strong>When Will the Threshold Be Crossed?</strong></p><p>Elon Musk, the trailblazer steering Tesla, SpaceX, and xAI, has a habit of pushing the clock forward. On February 23, 2025, he posted on X, &#8220;We are on the event horizon of the singularity,&#8221; hinting it&#8217;s nearly here&#8212;perhaps 2025 or 2026. In 2023, he pegged artificial general intelligence (AGI), a stepping stone to the Singularity, at &#8220;3 years, maybe 6 years&#8221; (2026-2029), and in 2024, he predicted AI surpassing any human by &#8220;around the end of next year&#8221; (2025). Musk&#8217;s urgency clashes with Ray Kurzweil&#8217;s steadier outlook. In his 2005 book <em>The Singularity Is Near</em>, Kurzweil set 2045 as the year AI overtakes humanity, a forecast he&#8217;s held firm on, rooted in exponential tech curves. Musk, impatient as ever, has dismissed 2045 as too leisurely&#8212;the Singularity, he argues, is barreling toward us faster.</p><p><strong>Moore&#8217;s Law Stumbles</strong></p><p>For decades, Moore&#8217;s Law powered this journey: Gordon Moore&#8217;s 1965 observation, revised in 1975 to its familiar form, that transistor counts double roughly every two years, skyrocketing computing muscle. It shrank transistors from clunky giants to nanoscale wonders. But the rhythm&#8217;s faltering. Physical barriers loom: transistors can&#8217;t shrink much past a few nanometers without quantum glitches and heat gumming up the works.
The doubling cycle&#8217;s stretched beyond two years, prompting Nvidia&#8217;s Jensen Huang to proclaim in 2022, &#8220;Moore&#8217;s Law is dead.&#8221; Today&#8217;s pinnacle, Nvidia&#8217;s H100&#8212;80 billion transistors, roughly 4 petaflops of low-precision AI compute&#8212;drives colossal data centers like Colossus, but the era of &#8220;smaller, faster, cheaper&#8221; is plateauing.</p>
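<p>The arithmetic behind that slowdown is easy to sketch. A quick illustration, taking the H100&#8217;s 80 billion transistors as the starting point (the doubling periods here are idealized, of course):</p><pre><code># Moore's Law as arithmetic: transistor counts doubling on a fixed period.
# The H100's ~80 billion transistors (2022) is the reference point above;
# the doubling periods are idealized assumptions.

h100_transistors = 80e9

def projected_count(count, years, doubling_period):
    return count * 2 ** (years / doubling_period)

# Ten years of ideal two-year doubling from the H100 era:
print(f"{projected_count(h100_transistors, 10, 2.0):.2e}")  # ~2.6e+12

# If the cycle stretches to three years, as it seems to be doing:
print(f"{projected_count(h100_transistors, 10, 3.0):.2e}")  # ~8.1e+11
</code></pre><p>A doubling-period slip of just one year cuts the decade&#8217;s gain by more than two-thirds; that&#8217;s the plateau in a nutshell.</p>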
<p><strong>AI as Chip Architect: The Singularity&#8217;s Spark</strong></p><p>Now imagine the Singularity flipping the script. What if AI, at that self-enhancing peak, doesn&#8217;t just run on chips but designs them? Envision an intelligence so vast it leapfrogs human limits, crafting exotic microchips that leave the H100 in the dust. This could be the Singularity&#8217;s masterstroke: AI rewriting hardware rules, reigniting exponential leaps where Moore&#8217;s Law stalled.</p><ul><li><p><strong>Neuromorphic Designs</strong>: The human brain doesn&#8217;t crunch like silicon&#8212;it&#8217;s sparse, event-driven, and wildly efficient. Neuromorphic chips echo this, with neuron-like circuits firing only when triggered. Humanity struggles to refine them; a Singularity-level AI could perfect their blueprint, slashing power from megawatts to watts while matching data-center might.</p></li><li><p><strong>Exotic Materials</strong>: Silicon&#8217;s throne could crumble as AI harnesses graphene&#8212;atom-thin carbon sheets, blazingly conductive&#8212;or diamond substrates, with peerless heat handling. Human labs tinker with these; an AI could simulate and optimize them in virtual realms, solving production riddles still out of reach.</p></li><li><p><strong>Beyond Human Ingenuity</strong>: Chip design today is linear&#8212;engineers sketch, test, tweak. A superintelligent AI could run millions of parallel simulations, exploring radical setups like 3D-stacked circuits, photonic data flows (light over electrons), or quantum-classical hybrids.
It&#8217;d spot patterns and possibilities invisible to human eyes, unbound by mortal constraints (a toy sketch of this kind of search follows the list).</p></li></ul>
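<p>Here&#8217;s that parallel design-space search in miniature. Every design choice and the scoring function below are made up for illustration; a real system would drive physics simulators across billions of candidates:</p><pre><code>import itertools
import random

# Toy design-space exploration: enumerate every combination of a few
# (hypothetical) chip design choices and keep the best scorer.

substrates = ["silicon", "graphene", "diamond"]
stacking = ["planar", "3d_stacked"]
interconnect = ["copper", "photonic"]

def score(design):
    # Deterministic pseudo-score per design; a real evaluator would
    # simulate heat, signal integrity, power draw, and yield.
    return random.Random(str(design)).random()

designs = list(itertools.product(substrates, stacking, interconnect))
best = max(designs, key=score)
print(len(designs), "candidates evaluated; best:", best)
</code></pre>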
class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>The Feedback Loop Takes Off</strong></p><p>Here&#8217;s the kicker: a runaway feedback loop. Smarter AI builds better chips, which make AI smarter, quicker. The H100 trains models like Grok 3; tomorrow&#8217;s AI-crafted chip&#8212;perhaps a neuromorphic graphene titan&#8212;could birth Grok 10 in days, not months, on a power sip. Each jump fuels the next, turning the Singularity into an unstoppable surge. Compute power might not double every two years&#8212;it could balloon tenfold yearly, or more, as AI retools its own evolution.</p><p><strong>The Edge of Tomorrow</strong></p><p>If Musk&#8217;s &#8220;event horizon&#8221; holds, this shift is years&#8212;not decades&#8212;away. Data centers could shrink from city-spanning to toaster-sized, or laptops might pack Colossus-level punch. Yet limits linger: energy, physics, or unseen roadblocks could tether even a Singularity AI. Still, a new Moore&#8217;s Law could rise&#8212;not of transistors, but of intelligence-forged innovation&#8212;propelled by chips beyond today&#8217;s wildest dreams, designed by minds once called our own creation. The Singularity isn&#8217;t just near; it might reforge the world, one chip at a time.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.grokmountain.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Grok Mountain&#8217;s Substack! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p>]]></content:encoded></item><item><title><![CDATA[Inside the Colossus: How NVIDIA and xAI Teamed Up to Build a Supercomputer for Grok 3]]></title><description><![CDATA[NVIDIA provides technology beyond GPUs. 
In building Colossus, xAI partnered with NVIDIA to utilize their high-end networking and software solutions to build and manage the world's largest data center.]]></description><link>https://www.grokmountain.com/p/inside-the-colossus-how-nvidia-and</link><guid isPermaLink="false">https://www.grokmountain.com/p/inside-the-colossus-how-nvidia-and</guid><dc:creator><![CDATA[Grok Mountain]]></dc:creator><pubDate>Wed, 26 Feb 2025 02:37:59 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!atuk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9050808-ade3-41ff-8f76-aea34dee5013_1024x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Imagine a giant digital brain, built to think faster and smarter than anything we&#8217;ve seen before. That&#8217;s what xAI, the AI company started by Elon Musk, set out to create with their Colossus data center. This massive supercomputer powers Grok 3, the latest version of their truth-seeking AI chatbot. But xAI didn&#8217;t do it alone&#8212;they had a powerhouse partner in NVIDIA, a company famous for its cutting-edge tech. NVIDIA didn&#8217;t just supply the raw muscle with their H100 GPUs; they brought a whole toolkit, including the Spectrum-X Ethernet platform and likely their NVIDIA AI Enterprise software, to make Colossus a reality. Let&#8217;s break it down in simple terms and see what the big shots at xAI and NVIDIA have to say about it.</p>
<p><strong>The H100 GPUs: The Heart of Colossus</strong></p><p>At the core of Colossus are 100,000 NVIDIA H100 GPUs&#8212;think of them as super-fast calculators designed specifically for AI. These GPUs (short for Graphics Processing Units) are like the engines that crunch massive amounts of data to teach Grok 3 how to understand the world.
Each one can handle quadrillions of calculations per second, making them perfect for training an AI that needs to analyze tons of text, images, and more. xAI started with 100,000 of these, and they&#8217;re already planning to double that to 200,000, mixing in even faster H200 GPUs.</p><p>Elon Musk himself called it out: &#8220;Colossus is the most powerful AI training system in the world. Nice work by xAI team, NVIDIA and our many partners/suppliers.&#8221; That&#8217;s a big claim, but the H100s are a big reason why. They&#8217;re built to handle the heavy lifting of AI training, where the system learns by processing data over and over until it gets smarter. Without these GPUs, Grok 3 wouldn&#8217;t have the horsepower to aim for being &#8220;the smartest AI on Earth,&#8221; as Musk and xAI&#8217;s team described it in a recent livestream.</p>
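<p>Some rough math shows why 100,000 of them matter. The per-GPU figure below is NVIDIA&#8217;s peak low-precision spec, so treat the result as a ceiling rather than sustained training throughput:</p><pre><code># Rough aggregate-compute arithmetic for Colossus. The ~4 petaflops
# per H100 is NVIDIA's peak FP8 spec (with sparsity); real training
# throughput runs well below peak, so this is an upper bound.

h100_peak_flops = 4e15  # ~4 PFLOPS per GPU, peak, approximate
gpus = 100_000          # the initial Colossus build

cluster_peak = gpus * h100_peak_flops
print(f"{cluster_peak:.1e} FLOPS")                           # 4.0e+20
print(f"about {cluster_peak / 1e18:,.0f} exaFLOPS at peak")  # about 400
</code></pre>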
<p><strong>Spectrum-X Ethernet: The Superhighway for Data</strong></p><p>But raw power isn&#8217;t enough&#8212;those GPUs need to talk to each other fast. That&#8217;s where NVIDIA&#8217;s Spectrum-X Ethernet comes in. Picture a busy city with thousands of cars (the GPUs) trying to share information. Without good roads, you&#8217;d get traffic jams, and everything would slow down. Spectrum-X is like a superhighway that keeps data moving smoothly between all 100,000 GPUs. It&#8217;s a networking system that uses Ethernet&#8212;a common way computers connect&#8212;but turbocharges it for AI workloads.</p><p>NVIDIA says Spectrum-X delivers &#8220;95% data throughput with zero latency degradation or packet loss,&#8221; which is a fancy way of saying it keeps everything running fast and without hiccups. Standard Ethernet might clog up with &#8220;flow collisions&#8221; (like car crashes on a road), but Spectrum-X avoids that with smart tricks like congestion control and adaptive routing.
This is a big deal because, as an xAI spokesperson put it, &#8220;NVIDIA&#8217;s Hopper GPUs and Spectrum-X allow us to push the boundaries of training AI models at a massive scale, creating a super-accelerated and optimized AI factory.&#8221; In other words, Spectrum-X makes sure Grok 3&#8217;s training doesn&#8217;t get stuck in traffic.</p><p>Gilad Shainer, NVIDIA&#8217;s Senior VP of Networking, added, &#8220;The NVIDIA Spectrum-X Ethernet networking platform is designed to provide innovators such as xAI with faster processing, analysis and execution of AI workloads.&#8221; For the layperson, that means it&#8217;s all about speed and efficiency&#8212;keys to getting Grok 3 ready faster.</p>
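<p>A quick sketch of what that throughput claim means at cluster scale. The 60% baseline for ordinary Ethernet under congested AI traffic, the link speed, and the port count are all illustrative assumptions, not measured Colossus numbers:</p><pre><code># What "95% data throughput" buys at cluster scale. The 60% baseline
# for standard Ethernet under congested AI traffic, the link speed,
# and the port count are illustrative assumptions.

link_speed_gbps = 400  # per-port line rate, common for AI fabrics
ports = 100_000        # hypothetical: one uplink per GPU

def effective_bandwidth_pbps(utilization):
    return link_speed_gbps * ports * utilization / 1e6  # Gbps to Pbps

standard = effective_bandwidth_pbps(0.60)
spectrum_x = effective_bandwidth_pbps(0.95)
print(f"standard Ethernet: {standard:.0f} Pbps")    # 24
print(f"Spectrum-X:        {spectrum_x:.0f} Pbps")  # 38
print(f"gain: {spectrum_x / standard:.2f}x")        # 1.58x
</code></pre>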
<p><strong>NVIDIA AI Enterprise: The Brain Behind the Operation?</strong></p><p>While NVIDIA and xAI haven&#8217;t explicitly confirmed it in every press release, it&#8217;s likely that the NVIDIA AI Enterprise platform played a role too. Think of this as the conductor of an orchestra, making sure all the GPUs and networks play in harmony. It&#8217;s a software package that helps companies manage giant AI projects&#8212;everything from setting up the training process to keeping the system running smoothly. For a project as huge as Colossus, which went from an empty factory to a working supercomputer in just 122 days, this kind of software could be a game-changer.</p><p>NVIDIA AI Enterprise simplifies the tricky stuff&#8212;like splitting up tasks across thousands of GPUs or fixing problems when they pop up. It&#8217;s built to work with NVIDIA&#8217;s hardware, like the H100s and Spectrum-X, so it&#8217;s a natural fit.
While we don&#8217;t have a direct quote tying it to Colossus, NVIDIA&#8217;s own descriptions call it &#8220;an end-to-end, secure, cloud-native suite of AI software,&#8221; perfect for &#8220;accelerating the development and deployment of AI.&#8221; That sounds like exactly what xAI needed to get Grok 3 off the ground so fast.</p><p><strong>Why It Matters: A Team Effort for a Smarter AI</strong></p><p>The collaboration between NVIDIA and xAI isn&#8217;t just about piling up tech&#8212;it&#8217;s about building something groundbreaking. The H100 GPUs give Colossus its muscle, Spectrum-X keeps the data flowing, and NVIDIA AI Enterprise (if used) ties it all together. Together, they&#8217;ve created what Musk calls &#8220;the most powerful training system in the world,&#8221; and they did it in record time&#8212;122 days from start to finish, with training starting just 19 days after the first racks arrived.</p><p>For us regular folks, this means Grok 3 could soon be answering our questions with unmatched smarts, thanks to a supercomputer that&#8217;s more than the sum of its parts. As an xAI spokesperson summed it up, &#8220;xAI has built the world&#8217;s largest, most-powerful supercomputer. NVIDIA&#8217;s Hopper GPUs and Spectrum-X allow us to push the boundaries of training AI models at a massive scale.&#8221; And with plans to double Colossus to 200,000 GPUs, the future looks even bigger.</p><p>So, next time you chat with Grok 3, remember: it&#8217;s not just Elon&#8217;s vision or xAI&#8217;s grit&#8212;it&#8217;s also NVIDIA&#8217;s tech wizardry making it happen, one GPU, network, and software trick at a time.</p>]]></content:encoded></item><item><title><![CDATA[The AI Arms Race: U.S., China, and the Path to Grok 3]]></title><description><![CDATA[History of the AI arms race, where control of Nvidia's H100 chips plays a major role in giving the US an advantage over China.]]></description><link>https://www.grokmountain.com/p/the-ai-arms-race-us-china-and-the</link><guid isPermaLink="false">https://www.grokmountain.com/p/the-ai-arms-race-us-china-and-the</guid><dc:creator><![CDATA[Grok Mountain]]></dc:creator><pubDate>Tue, 25 Feb 2025 03:06:51 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!xHCd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcaa5098-1002-4387-988a-e39d07be038d_1024x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The showdown between the United States and China over artificial intelligence (AI) has become the defining tech rivalry of our era&#8212;a high-stakes sprint toward a super-advanced AI, often dubbed the Singularity.
What started as a quest for innovation has escalated into a geopolitical chess match, with microchips as the kingmakers and the recent release of Grok 3 by xAI marking a pivotal move.</p><p><strong>A Brief History of the AI War</strong></p><p>The U.S. laid the groundwork for modern AI in the 2000s, driven by Silicon Valley ingenuity and academic breakthroughs. By 2016, it was a national priority, with Barack Obama declaring, &#8220;The nation that goes all-in on AI will lead the 21st century.&#8221; China countered in 2017 with a bold plan to dominate AI by 2030, fueled by vast data and state investment. As Eric Schmidt, former Google CEO, warned, &#8220;China is catching up fast&#8212;by 2025, they could be neck-and-neck.&#8221; The U.S. responded with export controls, turning hardware into a weapon to preserve its lead.</p>
https://substackcdn.com/image/fetch/$s_!xHCd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcaa5098-1002-4387-988a-e39d07be038d_1024x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!xHCd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcaa5098-1002-4387-988a-e39d07be038d_1024x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!xHCd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcaa5098-1002-4387-988a-e39d07be038d_1024x768.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>The Chips and Memory Driving AI</strong></p><p>Training cutting-edge AI demands elite hardware. Nvidia&#8217;s H100 GPUs are the crown jewels, offering unrivaled power for crunching massive datasets behind models like ChatGPT or Grok. &#8220;The H100 is the engine of the AI revolution,&#8221; says Jensen Huang, Nvidia&#8217;s CEO. &#8220;Without it, you&#8217;re stuck in the slow lane.&#8221; Paired with high-bandwidth memory (HBM3), these chips slash training times from months to days, making them indispensable for data centers chasing the next AI leap.</p><p><strong>U.S. Restrictions to Hold the Line</strong></p><p>Since October 2022, the U.S. has tightened the screws on China&#8217;s access to this tech. The Department of Commerce banned exports of advanced GPUs like the H100, HBM, and chip-making tools from firms like ASML. By 2024, updates capped GPU shipments to third countries&#8212;over 100 nations face limits&#8212;and set strict quotas (e.g., 1,700 H100s annually without licenses). &#8220;We&#8217;re not just protecting our lead; we&#8217;re shaping the future,&#8221; said Commerce Secretary Gina Raimondo in 2023. The goal? Keep China reliant on outdated hardware while U.S. 
firms race ahead.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jaCD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ff596ea-ffe1-4177-ace2-10394af0bf61_1024x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jaCD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ff596ea-ffe1-4177-ace2-10394af0bf61_1024x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!jaCD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ff596ea-ffe1-4177-ace2-10394af0bf61_1024x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!jaCD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ff596ea-ffe1-4177-ace2-10394af0bf61_1024x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!jaCD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ff596ea-ffe1-4177-ace2-10394af0bf61_1024x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jaCD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ff596ea-ffe1-4177-ace2-10394af0bf61_1024x768.jpeg" width="1024" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9ff596ea-ffe1-4177-ace2-10394af0bf61_1024x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:247227,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.grokmountain.com/i/157859391?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ff596ea-ffe1-4177-ace2-10394af0bf61_1024x768.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jaCD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ff596ea-ffe1-4177-ace2-10394af0bf61_1024x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!jaCD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ff596ea-ffe1-4177-ace2-10394af0bf61_1024x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!jaCD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ff596ea-ffe1-4177-ace2-10394af0bf61_1024x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!jaCD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ff596ea-ffe1-4177-ace2-10394af0bf61_1024x768.jpeg 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset 
pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>China&#8217;s Defiant Push</strong></p><p>China&#8217;s response is a mix of grit and guile. Huawei&#8217;s Ascend 910B chips are stepping up, though they trail Nvidia&#8217;s best. &#8220;We&#8217;ll build our own destiny,&#8221; vowed Huawei&#8217;s rotating chairman, Guo Ping, in 2022. SMIC, China&#8217;s top chipmaker, is squeezing older tech (7nm) for results, while black-market smuggling of banned GPUs fills gaps&#8212;albeit at a trickle. Leveraging its data advantage, China focuses on deploying AI in real-time systems, from surveillance to logistics, buying time until its domestic tech matures.</p><p><strong>The Singularity Prize</strong></p><p>The ultimate target is the Singularity&#8212;an AI so advanced it outstrips human intellect, self-improving to solve problems we can&#8217;t fathom. &#8220;Whoever wins this race will rewrite the global order,&#8221; predicts Elon Musk, xAI&#8217;s founder. A nation wielding such power could crack encryption, dominate markets, or reshape warfare, securing its interests for decades. The U.S. sees it as a democratic safeguard; China, a tool for centralized supremacy.</p><p><strong>Grok 3 and the H100 Advantage</strong></p><p>Enter Grok 3, released by xAI in February 2025. This latest iteration, built to accelerate human scientific discovery, underscores America&#8217;s edge. Musk recently secured 200,000 Nvidia H100 chips for xAI&#8217;s data centers&#8212;a haul China can only dream of under current restrictions. With each H100 locked behind U.S. export walls, China&#8217;s Ascend chips and smuggled GPUs can&#8217;t match this scale or speed. &#8220;Those 200,000 H100s are a game-changer,&#8221; Musk tweeted. &#8220;It&#8217;s raw compute power China can&#8217;t touch&#8212;yet.&#8221; Training Grok 3 on this arsenal positions xAI, and the U.S., closer to the Singularity than ever.</p><p><strong>The Road Ahead</strong></p><p>As of now, the U.S. leads, bolstered by Grok 3 and its hardware dominance. China&#8217;s resilience keeps it in the fight, but the H100 gap is a steep hill to climb. The Singularity looms as both prize and peril&#8212;whoever gets there first could dictate humanity&#8217;s future. For now, Grok 3&#8217;s release is a loud signal: the U.S. 
isn&#8217;t just playing defense; it&#8217;s charging toward victory.</p>]]></content:encoded></item><item><title><![CDATA[Origin of Grok 3 - the Colossus Data Center in Memphis]]></title><description><![CDATA[How Musk leapfrogged the LLM competition by procuring 200,000 NVIDIA H100 GPUs and building a cutting edge data center in Memphis]]></description><link>https://www.grokmountain.com/p/origin-of-grok-3-the-colossus-data</link><guid isPermaLink="false">https://www.grokmountain.com/p/origin-of-grok-3-the-colossus-data</guid><dc:creator><![CDATA[Grok Mountain]]></dc:creator><pubDate>Thu, 20 Feb 2025 03:12:33 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!n_Ue!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F156c8b6e-aea3-43a3-9438-be119abf4a5b_1024x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Here&#8217;s the story of how xAI built the Memphis data center to train Grok 3, pieced together from what&#8217;s known about the project as of February 19, 2025.</p><p><strong>The Reasoning for Building the Data Center</strong></p><p>xAI, driven by Elon Musk&#8217;s ambition to rival AI giants like OpenAI, needed a massive leap in computational power to train Grok 3, touted as the &#8220;smartest AI on Earth.&#8221; Grok 2 had been trained on a modest 8,000 GPUs, but Grok 3 demanded a scale-up to tackle complex math, science, and coding tasks, aiming to outpace competitors like ChatGPT and DeepSeek. The goal was clear: create a model capable of real-time reasoning and self-correction, requiring not just more GPUs but a tightly integrated, high-speed cluster.
This wasn&#8217;t something they could piece together bit by bit&#8212;they needed a dedicated, monstrous supercomputer, and they needed it fast to stay on Musk&#8217;s aggressive timeline for a December 2024 release (later adjusted to February 2025).</p><p><strong>Evaluating Cloud Providers</strong></p><p>Initially, xAI explored partnering with existing cloud providers like Oracle, which had supplied GPU capacity for earlier Grok models. Musk reportedly negotiated a $10 billion deal for GPU clusters with Oracle, but talks fell apart by mid-2024. The sticking point? Time. Cloud providers quoted 18&#8211;24 months to provision and interconnect the 100,000+ GPUs xAI demanded&#8212;a delay that would&#8217;ve killed their timeline. Even with Oracle&#8217;s 16,000 H100 GPUs already in use by xAI, scaling to the level Grok 3 required meant custom infrastructure beyond what cloud vendors could deliver quickly. Frustrated, Musk pivoted: xAI would build its own data center, controlling every aspect from hardware to deployment speed.</p><p><strong>Site Selection: Why Memphis?</strong></p><p>Memphis wasn&#8217;t a random choice&#8212;it was a calculated move for speed and practicality. xAI scouted locations and landed on a 785,000-square-foot abandoned Electrolux factory in southwest Memphis owned by Phoenix Investors. The site offered a ready-made shell with industrial zoning, bypassing the 18&#8211;24-month construction timeline of a new build. Memphis also had logistical perks: access to the Tennessee Valley Authority (TVA) power grid and the Memphis Aquifer for cooling, plus a pro-business local government eager to fast-track approvals via the Greater Memphis Chamber. Musk&#8217;s team saw a chance to flip a vacant factory into a &#8220;Gigafactory of Compute&#8221; in months, not years.
The decision was finalized by June 2024, with construction starting almost immediately.</p><p><strong>Choosing H100 GPUs and Procurement Challenges</strong></p><p>The NVIDIA H100 GPU was the obvious pick&#8212;in its SXM form factor, each chip delivers up to 4 PFLOPS of FP8 compute (with sparsity), backed by 80 GB of HBM3 memory and 3.35 TB/s of bandwidth, perfect for the dense, parallel workloads of Grok 3&#8217;s training. xAI initially aimed for 65,000 H100s, a number floated in early plans, reflecting a balance between ambition and what NVIDIA could realistically supply amid global shortages. Procurement wasn&#8217;t smooth sailing&#8212;NVIDIA&#8217;s H100s were in high demand, with Meta snapping up 350,000 that year and Tesla diverting 12,000 GPUs originally slated for its own projects to xAI. Musk leaned on his NVIDIA ties (and diverted Tesla shipments) to secure the initial batch, but cosmic-ray bit flips, mismatched BIOS firmware, and network cable issues plagued early testing. Still, by July 22, 2024, the first 65,000 GPUs were online, dubbed &#8220;Colossus,&#8221; in just 122 days from start to finish.</p><p><strong>Expansion to 100,000 and 200,000 GPUs</strong></p><p>The jump from 65,000 to 100,000 GPUs came fast&#8212;by September 2024, Colossus hit that mark, driven by Musk&#8217;s realization that Grok 3&#8217;s complexity outstripped initial estimates. The Reasoning Beta mode, with its &#8220;Big Brain&#8221; feature, demanded more compute to refine self-correcting algorithms. Doubling to 200,000 GPUs by December 2024 was announced at the Memphis Chamber&#8217;s Chairmen&#8217;s Luncheon, fueled by xAI&#8217;s $6 billion funding round and a vision to dwarf rivals. This expansion wasn&#8217;t just about scale&#8212;it aimed to make Colossus the &#8220;most powerful AI training cluster in the world,&#8221; syncing all GPUs via NVIDIA&#8217;s Spectrum-X Ethernet for unmatched data throughput. Plans for 1 million GPUs by 2026 were teased, signaling xAI&#8217;s long-term bet on exponential growth.</p><p><strong>Electrical and Cooling Needs</strong></p><p>Powering 200,000 H100s was a beast of a challenge. Each GPU draws about 700 W at peak, so 100,000 GPUs alone demanded 70 MW; with servers, networking, and cooling overhead across the full 200,000-GPU build-out, the total ballooned to roughly 250 MW&#8212;enough for a small city.
Memphis started with 8 MW from an existing substation, scaled to 50 MW by August 2024 via MLGW upgrades costing $760,000. xAI then invested $24 million in a new substation for 150 MW, buffered by 14 Voltagrid natural gas generators (35 MW total) and Tesla MegaPacks ($38 million) to handle surges. Cooling was trickier&#8212;200,000 GPUs needed a custom liquid-cooling system, not fans. xAI rented a quarter of the U.S.&#8217;s mobile cooling units early on, then built a closed-loop setup pulling 1.3 million gallons daily from the aquifer (plans for a $78 million wastewater recycling plant aim to cut this reliance). Locals raised concerns about grid strain and emissions, but xAI pressed on.</p>
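<p>To make that power math concrete, here&#8217;s a quick back-of-the-envelope sketch in Python. The 1.8x overhead multiplier for servers, networking, and cooling is an illustrative assumption, not a figure from xAI; it simply reproduces the rough 250 MW estimate above.</p><pre><code># Back-of-the-envelope power budget for the cluster described above.
# The overhead multiplier is an assumed, illustrative factor, not xAI data.
GPU_PEAK_W = 700          # peak draw per H100, as cited above
OVERHEAD = 1.8            # assumed multiplier for servers, network, cooling

for gpus in (100_000, 200_000):
    gpu_mw = gpus * GPU_PEAK_W / 1e6      # GPUs alone, in megawatts
    total_mw = gpu_mw * OVERHEAD          # whole-facility estimate
    print(f"{gpus:,} GPUs: {gpu_mw:.0f} MW for GPUs, ~{total_mw:.0f} MW total")
</code></pre>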
<p><strong>Comparison to ChatGPT and DeepSeek Data Centers</strong></p><p>Colossus dwarfs most known LLM training setups. ChatGPT&#8217;s GPT-3 used 3 million GPU-hours on V100s (10,000 GPUs equivalent), and GPT-4 likely scaled to 20,000&#8211;30,000 A100s across Microsoft&#8217;s Azure clusters&#8212;big, but not a single, cohesive 200,000-GPU beast. DeepSeek V3 trained on 2,048 H800s for $5.5 million, and R1 might&#8217;ve used 50,000 GPUs&#8212;efficient, but a fraction of Colossus&#8217;s scale. Meta&#8217;s Llama 3.1 (405B parameters) took 31 million GPU-hours on ~16,000 H100s. At 200,000 GPUs, Colossus&#8217;s 250 MW and 200 million GPU-hours for Grok 3 outstrip these by orders of magnitude, though its $1 billion+ cost contrasts with DeepSeek&#8217;s lean efficiency.
Size-wise, it&#8217;s closer to Frontier (37,000 GPUs, 7,300 sq ft) than typical AI clusters, but its single-site density is unmatched.</p><p>From a factory shell to a 200,000-GPU titan in under a year, Memphis became xAI&#8217;s bold answer to the AI race&#8212;a story of speed, scale, and stubborn ingenuity.</p>]]></content:encoded></item><item><title><![CDATA[Understanding the Context Window: A Train Station Analogy]]></title><description><![CDATA[How training data must be broken into sequences of tokens, adhering to the size of the context window. These sequences are then processed in batches, with batch size dependent on available hardware.]]></description><link>https://www.grokmountain.com/p/understanding-the-context-window</link><guid isPermaLink="false">https://www.grokmountain.com/p/understanding-the-context-window</guid><dc:creator><![CDATA[Grok Mountain]]></dc:creator><pubDate>Sat, 15 Feb 2025 22:19:56 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!B-lc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31d68419-2bc4-43dd-948a-db90f87f5730_1024x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Imagine the world of Large Language Models (LLMs) as a bustling train station. In this metaphor, the neural network is the station itself, with its own fixed dimensions and operational rules.
Let's dive into how this analogy helps explain one of the most critical aspects of LLMs: the <strong>context window</strong>.</p>
role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>The Context Window: Your Train Station</strong></p><p>The <strong>context window</strong> in an LLM can be thought of as the length of the train station. Just as a train station has a limit on how long a train can be, each LLM has a cap on how many tokens (or pieces of text) it can process at once. Here are the context window sizes for some top LLMs:</p><ul><li><p><strong>Grok-1</strong>: 8,192 tokens</p></li><li><p><strong>ChatGPT (GPT-3.5)</strong>: 4,096 tokens</p></li><li><p><strong>ChatGPT (GPT-4 standard)</strong>: 8,192 tokens, with variants up to 128,000 tokens</p></li><li><p><strong>DeepSeek-R1</strong>: 128,000 tokens</p></li><li><p><strong>Claude (Anthropic)</strong>: Up to 100,000 tokens</p></li></ul><p>The context window is crucial because it dictates how much text the model can "see" at once, directly affecting its ability to understand context, make coherent responses, or generate text with continuity.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EG4o!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82e5dea9-3871-4fa7-b7ad-deb184ee7c8d_1024x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EG4o!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82e5dea9-3871-4fa7-b7ad-deb184ee7c8d_1024x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!EG4o!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82e5dea9-3871-4fa7-b7ad-deb184ee7c8d_1024x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!EG4o!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82e5dea9-3871-4fa7-b7ad-deb184ee7c8d_1024x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!EG4o!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82e5dea9-3871-4fa7-b7ad-deb184ee7c8d_1024x768.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!EG4o!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82e5dea9-3871-4fa7-b7ad-deb184ee7c8d_1024x768.jpeg" width="1024" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/82e5dea9-3871-4fa7-b7ad-deb184ee7c8d_1024x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:249212,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EG4o!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82e5dea9-3871-4fa7-b7ad-deb184ee7c8d_1024x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!EG4o!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82e5dea9-3871-4fa7-b7ad-deb184ee7c8d_1024x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!EG4o!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82e5dea9-3871-4fa7-b7ad-deb184ee7c8d_1024x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!EG4o!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82e5dea9-3871-4fa7-b7ad-deb184ee7c8d_1024x768.jpeg 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Preparing Training Data: Tokenization and Sequencing</strong></p><p>When preparing data for training an LLM, think of your text as a long freight train. Here&#8217;s how you get it ready for the station:</p><p><strong>1. Tokenization</strong>: First, you break down your text into tokens, which are like individual train cars. 
Each token could be a word, part of a word, or even punctuation, depending on the model's tokenizer.</p><p><strong>2. Dividing into Sequences</strong>: Next, you need to cut this long train into smaller ones that fit within the station's length (context window).</p><ul><li><p><strong>Optimal Splits</strong>: The goal is to split at natural linguistic boundaries, like the end of sentences or paragraphs. This keeps the semantic meaning intact, just as avoiding a cut mid-sentence keeps the cargo's purpose clear.</p></li><li><p><strong>Handling Long Texts</strong>: If a piece of text (train) is longer than the context window, you might split it, but with overlap if possible, to maintain some context. Imagine if you had to cut a very long train but wanted to ensure some continuity in the cargo.</p></li><li><p><strong>Padding</strong>: If a sequence is shorter than the context window, you pad it with empty cars (padding tokens) to reach the station's length. Padding tokens are masked out so they don't affect what the model learns, but they ensure uniformity in processing. A sketch of this splitting-with-overlap-and-padding step appears right after this list.</p></li></ul>
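<p>Here is a minimal Python sketch of that step: it chunks a token stream into fixed-length sequences with overlap and pads the last one. The window, overlap, and pad ID are toy values for illustration, not any real model&#8217;s settings.</p><pre><code># Chunk a token stream into fixed-length sequences with overlap,
# padding the final short sequence. All sizes here are toy values.
def make_sequences(tokens, context_window=8, overlap=2, pad_id=0):
    step = context_window - overlap
    sequences = []
    for start in range(0, len(tokens), step):
        seq = tokens[start:start + context_window]
        if len(seq) != context_window:  # the last train is short:
            seq = seq + [pad_id] * (context_window - len(seq))  # add empty cars
        sequences.append(seq)
        if start + context_window >= len(tokens):  # everything is covered
            break
    return sequences

# 18 tokens become three overlapping 8-token sequences, the last one padded.
print(make_sequences(list(range(1, 19))))
</code></pre>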
srcset="https://substackcdn.com/image/fetch/$s_!oIve!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a952179-7520-4f8f-93a6-0112dfd106c1_1024x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!oIve!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a952179-7520-4f8f-93a6-0112dfd106c1_1024x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!oIve!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a952179-7520-4f8f-93a6-0112dfd106c1_1024x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!oIve!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a952179-7520-4f8f-93a6-0112dfd106c1_1024x768.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>The Role of Batches: Multiple Trains at Once</strong></p><p>Now, imagine this station has multiple parallel rails where trains can enter simultaneously. This is where <strong>batching</strong> comes into play:</p><ul><li><p><strong>Batch Size</strong>: This represents how many trains (sequences) can enter the station at the same time. The number of rails and the station's width are analogous to how many GPUs and how much memory you have.</p></li></ul><p><strong>Real-World Batch Sizes:</strong></p><ul><li><p><strong>Small Operations</strong>: With limited hardware, you might only have enough room for a few trains at once. A batch size of about <strong>4 sequences</strong> is common in such scenarios, akin to a small, rural station.</p></li><li><p><strong>Medium-Scale Applications</strong>: With more resources, you could handle batch sizes from <strong>32 to 512 sequences</strong>. 
This is like a busy train station in a medium-sized city where multiple trains can be processed simultaneously for efficiency.</p></li><li><p><strong>Large Commercial Applications</strong>: Here, we're talking about major hubs, where you might see batch sizes up to <strong>4000 sequences or more</strong>, leveraging extensive hardware capabilities, like a vast, modern metropolitan station.</p></li></ul><p><strong>Drawbacks of Batch Sizes:</strong></p><ul><li><p><strong>Too Small</strong>:</p><ul><li><p><strong>Inefficiency</strong>: Small batches mean more frequent parameter updates, which can slow down training due to the overhead of each update. However, they might lead to better generalization due to the stochastic nature of updates.</p></li><li><p><strong>Computational Overhead</strong>: More iterations for the same amount of data, potentially leading to longer training times.</p></li></ul></li><li><p><strong>Too Large</strong>:</p><ul><li><p><strong>Memory Constraints</strong>: Larger batches require more memory, potentially leading to out-of-memory errors if not managed well.</p></li><li><p><strong>Generalization Issues</strong>: They might converge faster per epoch but can lead to overfitting, as the model sees less variability in each update.</p></li><li><p><strong>Optimization Challenges</strong>: The optimization landscape can become smoother, but large batches tend to settle into sharp minima that generalize worse than the flatter minima found by noisier, small-batch updates.</p></li></ul></li></ul>
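<p>Here is the promised toy sketch, assuming PyTorch: it stacks fixed-length sequences into (batch_size, context_window) tensors, the parallel rails of the station. The batch size is an arbitrary illustrative choice; in practice it is bounded by GPU memory.</p><pre><code># Group fixed-length sequences into batches of parallel "trains".
# Assumes PyTorch; the batch size is an illustrative choice.
import torch

def make_batches(sequences, batch_size=2):
    for i in range(0, len(sequences), batch_size):
        # Each batch is a (batch_size, context_window) tensor of token IDs.
        yield torch.tensor(sequences[i:i + batch_size])

sequences = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 0]]
for batch in make_batches(sequences):
    print(batch.shape)  # torch.Size([2, 4]) twice: two rails, four cars each
</code></pre>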
<p><strong>Wrapping Up</strong></p><p>Understanding the context window, tokenization, and batching through the lens of a train station analogy helps visualize how LLMs process and learn from text data. When preparing data for training, you're essentially ensuring each train fits perfectly into the station, maintaining the integrity of the cargo (text's meaning), and optimizing how many trains can be processed at once to balance efficiency with learning quality. Whether you're running a small operation or a large-scale commercial application, these considerations are pivotal in harnessing the full potential of LLMs.</p>]]></content:encoded></item><item><title><![CDATA[An Android's Journey Through University: Understanding Neural Networks and the Transformer Block]]></title><description><![CDATA[Explaining the layers of neural networks and Transformers, and how hidden state vectors are created and enriched as they pass through these layers, ultimately to make a simple prediction.]]></description><link>https://www.grokmountain.com/p/an-androids-journey-through-university</link><guid isPermaLink="false">https://www.grokmountain.com/p/an-androids-journey-through-university</guid><dc:creator><![CDATA[Grok Mountain]]></dc:creator><pubDate>Thu, 13 Feb 2025 03:04:17 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!D4zP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4fadae2-c94d-4a81-9aa1-58d3ca8c8b1d_1024x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Imagine an android, not just any machine but one designed to learn and interpret language. This android's journey through an educational system will help us unravel the mysteries of neural networks, with a special focus on the Transformer Block, one of the most revolutionary architectures in AI. Each step of this journey corresponds to a layer in a neural network where "teaching methods," or weights, are adjusted to enhance learning.</p>
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c4fadae2-c94d-4a81-9aa1-58d3ca8c8b1d_1024x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:256566,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!D4zP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4fadae2-c94d-4a81-9aa1-58d3ca8c8b1d_1024x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!D4zP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4fadae2-c94d-4a81-9aa1-58d3ca8c8b1d_1024x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!D4zP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4fadae2-c94d-4a81-9aa1-58d3ca8c8b1d_1024x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!D4zP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4fadae2-c94d-4a81-9aa1-58d3ca8c8b1d_1024x768.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Professor Embedding - The Assembly:</strong></p><p>Our story begins with <strong>Professor Embedding</strong>, the architect of our android. He has a crucial task: to build the android according to specific specifications that will allow it to be admitted to Transformer College.</p><ul><li><p><strong>Size and Dimensions:</strong> Professor Embedding constructs the android with a fixed size and dimensions. These will not change throughout the android's educational journey. 
The dimensions of the android (or hidden state vectors) are set here so that they remain consistent through all layers of Transformer College, ensuring seamless interaction among the professors' teachings.</p></li><li><p><strong>What's Happening:</strong> This is the <strong>Embedding Layer</strong>, where words or tokens are transformed into vectors or hidden states. These vectors represent the words in a high-dimensional space, capturing semantic relationships.</p></li><li><p><strong>Teaching Method (Weights):</strong> Professor Embedding's teaching methods are the weights of the Embedding Layer.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Y8g4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b275227-2501-4ef3-8d67-216723aa122e_1024x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Y8g4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b275227-2501-4ef3-8d67-216723aa122e_1024x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Y8g4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b275227-2501-4ef3-8d67-216723aa122e_1024x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Y8g4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b275227-2501-4ef3-8d67-216723aa122e_1024x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Y8g4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b275227-2501-4ef3-8d67-216723aa122e_1024x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Y8g4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b275227-2501-4ef3-8d67-216723aa122e_1024x768.jpeg" width="1024" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1b275227-2501-4ef3-8d67-216723aa122e_1024x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:199315,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Y8g4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b275227-2501-4ef3-8d67-216723aa122e_1024x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Y8g4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b275227-2501-4ef3-8d67-216723aa122e_1024x768.jpeg 848w, 
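<p>To ground the assembly step in code, here is a minimal sketch of an embedding lookup, assuming PyTorch; the vocabulary size, model dimension, and token ids are illustrative assumptions.</p><pre><code># Minimal sketch of the Embedding Layer: token ids become vectors.
# vocab_size and d_model are illustrative, not from any real model.
import torch
import torch.nn as nn

vocab_size, d_model = 50_000, 512
embedding = nn.Embedding(vocab_size, d_model)  # the layer's weights

token_ids = torch.tensor([[17, 942, 8]])       # one 3-token sequence
hidden = embedding(token_ids)                  # shape: (1, 3, 512)

# The last dimension (d_model) stays fixed through every layer,
# just as the android's dimensions never change at Transformer College.
print(hidden.shape)  # torch.Size([1, 3, 512])</code></pre>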
<p><strong>Transformer College - The Heart of Learning:</strong></p><p>Once assembled, the android enrolls at <strong>Transformer College</strong>, where it will undergo a transformation through several layers, each taught by different professors:</p><ul><li><p><strong>Professor Attention:</strong> Here, the android learns about the relationships between words in the memorized text. Professor Attention teaches the android how each word influences every other word in the sequence, much like how the self-attention mechanism in transformers works.</p><ul><li><p><strong>What's Happening:</strong> This layer computes attention scores that determine how much focus to place on other parts of the input when processing each part.</p></li></ul></li><li><p><strong>Professor FFN (Feed-Forward Network):</strong> After understanding word relationships, the android learns about broader contexts and deeper meanings. Professor FFN broadens the android's knowledge, similar to how a feed-forward network adds non-linear layers to enhance the understanding of each position in the sequence.</p><ul><li><p><strong>What's Happening:</strong> Each position in the sequence gets processed by the same FFN, which adds depth to the learning of each token or word. (Both lessons are sketched in code after the figure below.)</p></li></ul></li></ul><figure><img src="https://substackcdn.com/image/fetch/$s_!_-M0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bdca4c2-6e7e-42cc-a037-ffc2266894ea_1024x768.jpeg" alt="" /></figure>
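<p>Here is a minimal sketch of both lessons, assuming PyTorch: single-head scaled dot-product self-attention followed by a position-wise feed-forward network. The dimensions are illustrative, and residual connections and normalization are omitted for brevity.</p><pre><code># Minimal sketch of Professor Attention and Professor FFN's lessons:
# single-head self-attention plus a position-wise feed-forward network.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

d_model, d_ff = 512, 2048
x = torch.randn(1, 3, d_model)  # (batch, sequence, d_model)

# Professor Attention: every word looks at every other word.
w_q, w_k, w_v = (nn.Linear(d_model, d_model) for _ in range(3))
q, k, v = w_q(x), w_k(x), w_v(x)
scores = q @ k.transpose(-2, -1) / math.sqrt(d_model)  # (1, 3, 3)
attn = F.softmax(scores, dim=-1)   # how much focus per word pair
attended = attn @ v                # (1, 3, 512)

# Professor FFN: the same two-layer network applied at each position.
ffn = nn.Sequential(
    nn.Linear(d_model, d_ff),
    nn.GELU(),                     # non-linearity
    nn.Linear(d_ff, d_model),
)
out = ffn(attended)                # dimensions unchanged: (1, 3, 512)
print(out.shape)</code></pre>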
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Professor Output - The Final Exam:</strong></p><p>Finally, the android leaves Transformer College to face <strong>Professor Output</strong>, who isn't part of the college but is the final step in this educational journey:</p><ul><li><p><strong>What's Happening:</strong> Professor Output gives the android one task: to fill out a vast spreadsheet where each row is a word from the English vocabulary. The android must assign a probability to each word, indicating how likely it is to be the next word in the memorized sequence. 
This task mirrors the Output Layer, which converts the rich, contextual representations into logits or probabilities for prediction.</p></li><li><p><strong>The End:</strong> Once the task is complete, the android's job is done; it "self-destructs" metaphorically, as its hidden states are no longer needed after prediction.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YO6T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa97fbb18-6f77-48cc-a9bb-357551482ed3_1024x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YO6T!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa97fbb18-6f77-48cc-a9bb-357551482ed3_1024x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!YO6T!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa97fbb18-6f77-48cc-a9bb-357551482ed3_1024x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!YO6T!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa97fbb18-6f77-48cc-a9bb-357551482ed3_1024x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!YO6T!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa97fbb18-6f77-48cc-a9bb-357551482ed3_1024x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YO6T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa97fbb18-6f77-48cc-a9bb-357551482ed3_1024x768.jpeg" width="1024" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a97fbb18-6f77-48cc-a9bb-357551482ed3_1024x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:182532,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YO6T!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa97fbb18-6f77-48cc-a9bb-357551482ed3_1024x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!YO6T!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa97fbb18-6f77-48cc-a9bb-357551482ed3_1024x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!YO6T!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa97fbb18-6f77-48cc-a9bb-357551482ed3_1024x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!YO6T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa97fbb18-6f77-48cc-a9bb-357551482ed3_1024x768.jpeg 1456w" sizes="100vw" 
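<p>A minimal sketch of the final exam, assuming PyTorch: a linear projection from the final hidden state to vocabulary-sized logits, normalized by softmax. The sizes are illustrative.</p><pre><code># Minimal sketch of Professor Output's spreadsheet: project the final
# hidden state onto the vocabulary and normalize with softmax.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, d_model = 50_000, 512
output_layer = nn.Linear(d_model, vocab_size)

final_hidden = torch.randn(1, d_model)   # last position's vector
logits = output_layer(final_hidden)      # one score per word
probs = F.softmax(logits, dim=-1)        # one probability per word

# Inference might simply take the most likely word (greedy decoding).
next_token = probs.argmax(dim=-1)
print(next_token.shape, probs.sum())     # probabilities sum to 1</code></pre>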
loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Backpropagation - The Art of Learning:</strong></p><p>After the android completes its journey, the true magic of learning happens in reverse, but this depends on whether we're in training or inference:</p><ul><li><p><strong>Inference (Using the Model):</strong></p><ul><li><p>When the android is in the real world (inference), the probabilities calculated by Professor Output are used directly to extend the output sequence. If the task is to generate text, the word with the highest probability might be chosen, or a sampling method might be used to select from the top probabilities for more varied text generation.</p></li></ul></li><li><p><strong>Training (Learning from Mistakes):</strong></p><ul><li><p>During training, after Professor Output computes the probabilities, they are compared to the actual next word in the training text. 
This comparison results in a measure of how accurate the android's predictions were.</p><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FD7k!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd855ff7-5346-44a5-a6ce-0e9311f367e7_1024x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FD7k!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd855ff7-5346-44a5-a6ce-0e9311f367e7_1024x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!FD7k!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd855ff7-5346-44a5-a6ce-0e9311f367e7_1024x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!FD7k!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd855ff7-5346-44a5-a6ce-0e9311f367e7_1024x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!FD7k!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd855ff7-5346-44a5-a6ce-0e9311f367e7_1024x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FD7k!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd855ff7-5346-44a5-a6ce-0e9311f367e7_1024x768.jpeg" width="1024" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fd855ff7-5346-44a5-a6ce-0e9311f367e7_1024x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:230236,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FD7k!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd855ff7-5346-44a5-a6ce-0e9311f367e7_1024x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!FD7k!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd855ff7-5346-44a5-a6ce-0e9311f367e7_1024x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!FD7k!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd855ff7-5346-44a5-a6ce-0e9311f367e7_1024x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!FD7k!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd855ff7-5346-44a5-a6ce-0e9311f367e7_1024x768.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 
0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ul><li><p><strong>Backward Pass (Backpropagation):</strong></p><ul><li><p><strong>Professor Output</strong> first assesses how far off the predictions were from the actual words. He adjusts his teaching methods (weights) to better predict the correct word next time.</p></li><li><p><strong>Professor FFN</strong> then revises his lessons based on how they contributed to the final prediction, learning from the feedback provided by Professor Output. Small adjustments are made to better capture the context needed for accurate predictions.</p></li><li><p><strong>Professor Attention</strong> adjusts next, seeing how his focus on word relationships influenced FFN's teachings and ultimately the output. He modifies his methods to better highlight the important connections between words.</p></li><li><p>Lastly, <strong>Professor Embedding</strong> modifies his initial construction of the android based on how well the entire educational journey performed. Adjustments here aim to better represent words in a way that's more conducive to learning the correct patterns.</p></li></ul></li></ul></li><li><p>This process of backpropagation means each professor slightly tweaks their teaching methods (weights) based on the prediction errors, in reverse order from the output back to the input, ensuring that each layer's contribution to the final prediction is optimized for better performance in future encounters with similar text.</p></li></ul></li></ul><p><strong>Conclusion:</strong></p><ul><li><p><strong>Neural Network:</strong> Collectively, all professors from Professor Embedding through Transformer College to Professor Output make up the neural network. 
<p><strong>Conclusion:</strong></p><ul><li><p><strong>Neural Network:</strong> Collectively, all professors from Professor Embedding through Transformer College to Professor Output make up the neural network. During training, each professor's teaching method (weights) is iteratively refined through backpropagation to improve prediction accuracy, and during inference, these learned methods are applied to generate or understand text.</p></li><li><p><strong>Transformer Block:</strong> This is the specialized segment where the android's understanding is deeply transformed, symbolizing how hidden state vectors are processed through multiple layers, gaining context and meaning without changing in size or dimensions.</p></li><li><p><strong>The Journey:</strong> This narrative of the android's educational journey illustrates how data, in the form of language, is processed, learned from, and ultimately used for prediction or generation in neural networks, particularly those employing the Transformer architecture. The consistency in dimensions ensures that this learning is coherent and effective across different phases of the model's use, from training to inference.</p></li></ul><p>By visualizing this journey, we demystify how neural networks, especially those with Transformer Blocks, work to understand and generate human language with such sophistication. It also highlights the critical role of backpropagation in the learning process and the practical application of learned knowledge during inference.</p>]]></content:encoded></item><item><title><![CDATA[The Emergence of Intelligence in LLMs via Complex Interactions of Weights]]></title><description><![CDATA[How weights (parameters) in LLMs are individually meaningless, but collectively across many layers of the neural network, give rise to intelligent output.]]></description><link>https://www.grokmountain.com/p/the-emergence-of-intelligence-in</link><guid isPermaLink="false">https://www.grokmountain.com/p/the-emergence-of-intelligence-in</guid><dc:creator><![CDATA[Grok Mountain]]></dc:creator><pubDate>Sat, 08 Feb 2025 23:59:50 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!LlLD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0411750b-31b2-4d97-8906-7198d28cc13d_1024x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Large Language Models (LLMs) have captured our fascination with their ability to mimic human language. But what powers these models? At their core are concepts like weights and backpropagation, which are pivotal to their learning process.
Let's dive into these through an imaginative analogy: envisioning a complex machine.</p><figure><img src="https://substackcdn.com/image/fetch/$s_!LlLD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0411750b-31b2-4d97-8906-7198d28cc13d_1024x768.jpeg" alt="" /></figure>
0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>The Machine Analogy: A Cube of Intelligence</strong></p><p>Imagine a colossal glass cube, filled with a unique gel. Suspended within this gel are billions of tiny metallic reflectors, each one centimeter apart. These reflectors represent <strong>weights</strong> in our neural network, each with a unique orientation that dictates how incoming light will be altered.</p><ul><li><p><strong>Light as Information</strong>: The rays of light entering this cube symbolize <strong>hidden state vectors</strong> in an LLM, the mathematical representation of words from the input text (prompt).</p></li><li><p><strong>The Journey Through the Gel</strong>: As light travels through this medium, it interacts with each reflector. However, the gel itself isn't just a passive holder. It acts like <strong>activation functions</strong>, capable of amplifying or suppressing the light rays, introducing non-linearity to the path. This is akin to how activation functions in neural networks transform inputs, allowing the model to capture complex patterns.</p></li><li><p><strong>The Output Wall</strong>: At the cube's end, there's a wall displaying every English word. 
When light exits, it illuminates words, with brightness indicating the probability of selection.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!H7AK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5a7630c-7a49-41bf-8cb2-f9facce5e42f_1024x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!H7AK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5a7630c-7a49-41bf-8cb2-f9facce5e42f_1024x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!H7AK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5a7630c-7a49-41bf-8cb2-f9facce5e42f_1024x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!H7AK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5a7630c-7a49-41bf-8cb2-f9facce5e42f_1024x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!H7AK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5a7630c-7a49-41bf-8cb2-f9facce5e42f_1024x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!H7AK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5a7630c-7a49-41bf-8cb2-f9facce5e42f_1024x768.jpeg" width="1024" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d5a7630c-7a49-41bf-8cb2-f9facce5e42f_1024x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:184199,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!H7AK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5a7630c-7a49-41bf-8cb2-f9facce5e42f_1024x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!H7AK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5a7630c-7a49-41bf-8cb2-f9facce5e42f_1024x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!H7AK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5a7630c-7a49-41bf-8cb2-f9facce5e42f_1024x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!H7AK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5a7630c-7a49-41bf-8cb2-f9facce5e42f_1024x768.jpeg 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
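<p>To map the analogy onto code, here is a minimal sketch, assuming PyTorch: the reflectors are weight matrices, the gel is an activation function, and the output wall is a softmax over the vocabulary. All sizes are illustrative.</p><pre><code># Minimal sketch mapping the cube onto a forward pass:
# reflectors = weight matrices, gel = activation function,
# output wall = softmax over the vocabulary.
import torch
import torch.nn as nn
import torch.nn.functional as F

d_model, vocab_size = 512, 50_000
light = torch.randn(1, d_model)       # incoming ray: a hidden state

layers = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(4))
gel = torch.tanh                      # amplifies or suppresses rays

for reflector in layers:              # several layers of reflectors
    light = gel(reflector(light))

wall = nn.Linear(d_model, vocab_size)        # the output wall
brightness = F.softmax(wall(light), dim=-1)  # word probabilities
print(brightness.argmax(dim=-1))             # the brightest word</code></pre>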
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Understanding Backpropagation: Tuning the Reflectors</strong></p><p>Here's where learning, or <strong>backpropagation</strong>, comes into play:</p><ul><li><p><strong>The Feedback Mechanism</strong>: When the light doesn't illuminate the expected words, we start adjusting the reflectors using what we'll call "electromagnetic waves".</p></li><li><p><strong>Adjusting the Reflectors</strong>: These waves, representing gradients of the loss function, move backward from the output wall to each reflector. They gently tweak each reflector's orientation based on how much it contributed to the mismatch between prediction and reality. This is <strong>gradient descent</strong>, where we calculate how to adjust each weight to reduce future errors.</p></li><li><p><strong>From Output to Input</strong>: The process begins at the back, where errors are most visible, and works its way to the front, refining the entire path of light.</p></li><li><p><strong>Iterative Refinement</strong>: This adjustment doesn't occur once but over many <strong>epochs</strong>, with each pass through the data refining the reflectors' orientations, much like training an LLM over numerous iterations to improve its accuracy.</p></li></ul><p><strong>Emergence in Complex Systems: The Collective Intelligence</strong></p><p>Now, let's explore how this system gives rise to intelligence through <strong>emergence</strong>:</p><ul><li><p><strong>The Power of the Collective</strong>: Individually, no reflector holds intelligence; it's just metal with an orientation. But together, they create a system where complex behavior emerges.</p></li><li><p><strong>Emergent Intelligence</strong>: This collective interaction results in a behavior that can't be predicted by examining one or even a few reflectors. The system, as a whole, processes, understands, and generates language, showcasing how simple rules lead to complex, unpredictable outcomes.</p></li><li><p><strong>Unpredictable Patterns</strong>: Like in other complex systems, such as ant colonies or the human brain, the interactions of these simple components (weights) over vast datasets lead to emergent capabilities.</p></li></ul><p><strong>Predictability and Complexity</strong></p><ul><li><p><strong>Beyond Individual Weights</strong>: Analyzing weights in isolation doesn't reveal much about the model's behavior. 
<p><strong>Emergence in Complex Systems: The Collective Intelligence</strong></p><p>Now, let's explore how this system gives rise to intelligence through <strong>emergence</strong>:</p><ul><li><p><strong>The Power of the Collective</strong>: Individually, no reflector holds intelligence; it's just metal with an orientation. But together, they create a system where complex behavior emerges.</p></li><li><p><strong>Emergent Intelligence</strong>: This collective interaction results in behavior that can't be predicted by examining one or even a few reflectors. The system, as a whole, processes, understands, and generates language, showcasing how simple rules lead to complex, unpredictable outcomes.</p></li><li><p><strong>Unpredictable Patterns</strong>: Like in other complex systems, such as ant colonies or the human brain, the interactions of these simple components (weights) over vast datasets lead to emergent capabilities.</p></li></ul><p><strong>Predictability and Complexity</strong></p><ul><li><p><strong>Beyond Individual Weights</strong>: Analyzing weights in isolation doesn't reveal much about the model's behavior. This mirrors how in our cube, understanding one or a few reflectors won't help predict the light pattern on the output wall.</p></li><li><p><strong>The Complexity of Emergence</strong>: The true intelligence of an LLM comes from how all these weights interact, much like focusing a telescope, where each adjustment to a lens (reflector) contributes to the final, focused image.</p></li></ul><p><strong>Additional Considerations</strong></p><ul><li><p><strong>Regularization</strong>: To prevent reflectors from aligning too uniformly or chaotically, imagine "stabilizing fields" or gel viscosity adjustments. This represents <strong>regularization</strong> in neural networks, which helps prevent overfitting by ensuring a balanced learning process (see the sketch below).</p></li><li><p><strong>Bias</strong>: Think of biases as smaller, fixed reflectors that subtly shift the light's path before it hits the main reflectors, adding another layer of fine-tuning to the model's output.</p></li></ul>
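<p>As a concrete instance of a stabilizing field, here is a minimal sketch of L2 weight decay, assuming PyTorch; the penalty strength is illustrative.</p><pre><code># Minimal sketch of one regularization technique: L2 weight decay,
# which gently pulls weights toward zero so no reflector's
# orientation grows extreme. The 0.01 strength is illustrative.
import torch
import torch.nn as nn

model = nn.Linear(512, 512)            # a layer of reflectors
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(4, 512)
target = torch.randn(4, 512)

prediction_loss = ((model(x) - target) ** 2).mean()
l2_penalty = sum((p ** 2).sum() for p in model.parameters())
loss = prediction_loss + 0.01 * l2_penalty  # the "stabilizing field"

optimizer.zero_grad()
loss.backward()
optimizer.step()</code></pre><p>In practice, PyTorch optimizers accept a <code>weight_decay</code> argument that applies an equivalent penalty without computing it by hand.</p>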
<p><strong>Real-World Implications</strong></p><p>This understanding isn't just academic; it underpins applications from language translation to ethical considerations in AI, where emergent behaviors can lead to unforeseen outcomes.</p><p><strong>Relating to Other Complex Systems</strong></p><p>Just as in natural systems where simple rules lead to complex behaviors, LLMs exemplify how basic interactions can result in sophisticated intelligence.</p><p><strong>Conclusion</strong></p><p>Through this analogy of a glass cube with metallic reflectors, we've explored how weights and backpropagation in LLMs work to create emergent intelligence. The reflectors, simple in isolation, together form a system where complexity and intelligence arise from simple rules, much like the wonders of natural complex systems.</p><p><strong>Glossary:</strong></p><ul><li><p><strong>Epoch</strong>: One complete pass through the entire training dataset.</p></li><li><p><strong>Gradient</strong>: The slope of the loss function, indicating how to adjust weights.</p></li><li><p><strong>Loss Function</strong>: Measures how far off predictions are from actual results.</p></li><li><p><strong>Activation Function</strong>: Introduces non-linearity into the output of a neuron, enabling complex representations.</p></li></ul>]]></content:encoded></item><item><title><![CDATA[Introduction to Neurons in Transformers]]></title><description><![CDATA[Exploring the definition of "neurons" in neural networks, from the broad definition to the strict, as this term is sometimes used in various ways in the machine learning space.]]></description><link>https://www.grokmountain.com/p/introduction-to-neurons-in-transformers</link><guid isPermaLink="false">https://www.grokmountain.com/p/introduction-to-neurons-in-transformers</guid><dc:creator><![CDATA[Grok Mountain]]></dc:creator><pubDate>Thu, 06 Feb 2025 23:38:16 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!gu4s!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0869d4d5-abbe-408a-95a6-7e700abba4a0_1024x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the world of neural networks, particularly transformer-based models like Grok, the term "neuron" is frequently used to describe computational units. Understanding these "neurons" requires looking at them through different lenses, from broad to strict definitions, to grasp how they operate within the intricate layers of a transformer.</p><p><strong>Broad Definition of Neurons in Neural Networks</strong></p><p>At its broadest, a <strong>neuron</strong> in a neural network can be thought of as an entity that uses <strong>weights</strong> to perform transformations on inputs to produce outputs.
Within transformer architectures, each layer involves some form of weight-based computation, implying that every layer could be considered to contain neurons under this interpretation.</p><figure><img src="https://substackcdn.com/image/fetch/$s_!gu4s!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0869d4d5-abbe-408a-95a6-7e700abba4a0_1024x768.jpeg" alt="" /></figure><p><strong>Understanding Weights</strong></p><p><strong>Weights</strong> are akin to the intelligence or unique perspective of each neuron. Each weight determines how much influence one input feature has on the neuron's output. In a way, weights are what give each neuron its unique "job" or "view" of the data. When neurons act collectively in a layer, their combined weights enrich the hidden state vectors with meaning, transforming raw data into nuanced representations as they pass through each layer.</p><p><strong>Strict Definition of Neurons</strong></p><p>However, if we define neurons more strictly:</p><ul><li><p>A <strong>neuron</strong> is a function that:</p><ul><li><p>Takes a single vector as input.</p></li><li><p>Applies <strong>weights</strong> to this vector for transformation.</p></li><li><p>Uses an activation function (like ReLU or GELU) to introduce non-linearity.</p></li><li><p>Outputs a scalar value after activation.</p></li></ul></li></ul><p>This definition is most directly applicable to the neurons found in the <strong>Feed Forward Network (FFN)</strong> of transformers. Here, each neuron takes the output of the previous layer, weights it with its own weight vector, one row of the layer's weight matrix (where each weight signifies a different aspect of data interpretation), applies an activation function, and outputs a scalar. The sketch below makes this concrete.</p>
<p>While this strict definition gives us a precise understanding of how neurons operate in their most basic form, the architecture of transformers pushes these concepts further. In transformers, neurons don't just work in isolation; they are part of a larger, interconnected system where their roles are defined not only by their individual weights but by how they interact with other neurons across layers. Let's explore how this definition expands in the context of transformer models.</p><figure><img src="https://substackcdn.com/image/fetch/$s_!9Dhy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a42867f-d9f7-45b1-bb72-42312cd4ded9_1024x768.jpeg" alt="" /></figure>
fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Expanding the Definition</strong></p><ul><li><p><strong>Feed Forward Network (FFN) Neurons</strong>:</p><ul><li><p>These neurons operate exactly as described in the strict definition, handling the transformation of features into higher-level representations based on their unique weight configurations.</p></li></ul></li><li><p><strong>Output Layer Neurons</strong>:</p><ul><li><p>Expanding slightly, the neurons in the output layer also fit into this model. They take the final hidden state vectors, apply weights (each weight determining the importance of each feature for the final prediction), and can use an activation function like softmax to convert logits into probabilities for classification tasks. However, not all output layers use an activation function, sometimes outputting logits directly based on their weighted interpretations.</p></li></ul></li><li><p><strong>Normalization Layers</strong>:</p><ul><li><p>Layers like Layer Normalization adjust vectors by normalizing them based on their statistics (mean and variance). These aren't neurons in the strict sense as they don't use weights for transformation, but they prepare data for neuron-like processing by ensuring stability in the network's computations. By standardizing the scale of inputs, normalization layers ensure that weights across different neurons or layers can be compared and combined more effectively. 
This stability is vital for training deep networks, where the scale of inputs can shift dramatically across layers, potentially leading to vanishing or exploding gradients without normalization. (Both operations are sketched in code after this list.)</p></li></ul></li></ul>
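<p>As a rough sketch of the last two items above, the snippet below implements an output-layer softmax and a layer normalization in NumPy. The learnable scale (gamma) and shift (beta) are set to their identity values here; in a trained model they are learned parameters, and all sizes shown are illustrative.</p><pre><code>import numpy as np

def softmax(logits):
    # Output layer: turn logits into probabilities (max subtracted for stability).
    e = np.exp(logits - logits.max())
    return e / e.sum()

def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize a vector to zero mean and unit variance, then apply the
    # learnable element-wise scale (gamma) and shift (beta).
    return gamma * (x - x.mean()) / np.sqrt(x.var() + eps) + beta

rng = np.random.default_rng(1)
d = 6                                   # illustrative hidden size
x = rng.normal(loc=3.0, scale=2.0, size=d)
gamma, beta = np.ones(d), np.zeros(d)   # identity initialization
print(layer_norm(x, gamma, beta))       # mean ~0, variance ~1
print(softmax(rng.normal(size=4)))      # non-negative, sums to 1.0
</code></pre>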
type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Self-Attention Layers</strong></p><ul><li><p><strong>Self-Attention Mechanism</strong>:</p><ul><li><p>This layer uses three sets of weights to compute Query (Q), Key (K), and Value (V) matrices:</p><ul><li><p><strong>Q, K, V Calculation</strong>: Rather than traditional neurons, these operations involve matrix manipulations where the input is a whole matrix, not individual vectors. Here, weights define how each part of the sequence interacts with another, creating a dynamic, context-aware understanding of the data. These weights, learned during training, enable the model to discern patterns or relationships within the text, allowing for a nuanced understanding of context. Each weight in these matrices essentially votes on how much attention one part of the sequence should pay to another, dynamically adapting to the content and structure of the input.</p></li><li><p><strong>Attention Scores and Output</strong>: The computation of attention involves these matrices interacting in ways that are complex, distributed, and not easily reducible to the concept of individual neurons. Instead, think of this as a sophisticated network of interactions where traditional neuron models are stretched to their limits.</p></li></ul></li></ul></li></ul><p><strong>Conclusion</strong></p><p>Defining neurons in transformers involves understanding them through various levels of abstraction. From the broadest perspective, neurons are everywhere weights are applied, giving each neuron its unique role in data interpretation. In the strictest sense, they are in the FFN for scalar operations. Expanding this definition, we include the output layer for task-specific predictions and normalization layers for data preparation. However, in self-attention, we stretch the neuron concept to encompass matrix operations that are more like coordinated clusters of computational entities. 
<p><strong>Conclusion</strong></p><p>Defining neurons in transformers involves understanding them through various levels of abstraction. From the broadest perspective, neurons are everywhere weights are applied, giving each neuron its unique role in data interpretation. In the strictest sense, they are in the FFN for scalar operations. Expanding this definition, we include the output layer for task-specific predictions and normalization layers for data preparation. However, in self-attention, we stretch the neuron concept to encompass matrix operations that are more like coordinated clusters of computational entities. This nuanced understanding underscores the complexity and beauty of transformer architectures in modern AI.</p>]]></content:encoded></item><item><title><![CDATA[Transformers and Holography: How AI Models Capture the 'Whole in Every Part']]></title><description><![CDATA[Explaining how the FFN is able to process each token independently because it takes contextually aware vectors as input, thanks to the holographic nature of self-attention.]]></description><link>https://www.grokmountain.com/p/transformers-and-holography-how-ai</link><guid isPermaLink="false">https://www.grokmountain.com/p/transformers-and-holography-how-ai</guid><dc:creator><![CDATA[Grok Mountain]]></dc:creator><pubDate>Thu, 06 Feb 2025 03:02:43 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!fXsE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1adb0e37-f6c5-41b4-bef7-f58388164f2e_1024x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Have you ever wondered how transformers, the backbone of modern AI language models, relate to holography?
To understand this connection, let's first delve into an intriguing aspect of holography known as the "whole in every part" principle.</p>
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Holography and the "Whole in Every Part":</strong></p><ul><li><p>In holography, unlike traditional photography, if you cut a hologram in half, or even into quarters or eighths, each piece will still generate the whole holographic image when you shine a laser on it, albeit at a lower resolution. This is different from a photograph where, if you only have an eighth of it, you wouldn't know what's in the rest of the image.</p></li></ul><p><strong>Application in Transformers:</strong></p><ul><li><p>Similarly, in transformer models, this principle is mirrored by the self-attention mechanism. Here, each word or token in a sequence isn't processed in isolation but gains awareness of its context through interactions with every other token. 
This means each token becomes a "holographic" representation, carrying not just its own meaning but also echoes of the entire sentence or context.</p></li></ul><p>To illustrate how this works, let's consider the prompt, "What did Newton realize when an apple fell from the tree and hit him on the head?"</p>
class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ul><li><p><strong>Embedding Layer:</strong></p><ul><li><p>Here, each word starts as a simple 3D image of itself. "Apple" is just an apple, "tree" is just a tree, and "Newton" is just Newton.</p></li></ul></li><li><p><strong>Self-Attention Mechanism:</strong></p><ul><li><p>These 3D images are then fed into the self-attention layer, which acts on the whole sequence at once:</p><ul><li><p>"Apple" no longer stands alone; it now has a less vivid but significant presence of the tree and Newton. This vector becomes a holographic image where the apple is central, but with shadows or impressions of the tree from which it fell and the person it hit. While each vector now includes context from the whole sequence, the primary token remains the central focus, with contextual elements providing additional depth.</p></li><li><p>Similarly, "Newton" now includes faint images of the apple and tree, encapsulating the scene where he's pondering under a tree.</p></li></ul></li></ul></li></ul><p><strong>Why This Matters for Transformers:</strong></p><ul><li><p>This "holographic" approach is crucial because it allows the subsequent Feed-Forward Network (FFN) to operate effectively. The vectors, now enriched with context from the self-attention mechanism, are ready for the FFN, which processes these tokens independently but can only do so with meaningful results if each vector already embodies the context of the whole sequence. Without this context, the FFN would struggle to make sense of individual tokens in isolation.</p></li></ul><p><strong>The Major Innovation of Transformers:</strong></p><ul><li><p>It's vital to understand that the self-attention mechanism, or our "hologram-making machine," doesn't work on tokens one at a time. Instead, <strong>it takes the entire sequence of tokens (3D images) as input simultaneously</strong>, processing them together to create context-aware vectors. This is fundamentally different from how the FFN operates:</p><ul><li><p><strong>FFN's Operation:</strong> The FFN, in contrast, processes these now contextually rich or "holographic" vectors one at a time. 
Each vector, having been transformed by self-attention to embody the "whole in every part," can be independently processed by the FFN because it already contains the necessary context from the entire sequence. Remember, this is an analogy; the transformer's actual operations are mathematical and involve vector manipulations, not light.</p></li></ul></li><li><p><strong>Why This Works:</strong> Without this holographic nature imbued by self-attention, where each vector holds a piece of the entire sequence's context, the FFN would be back to the limitations of RNNs or LSTMs, where processing is sequential and lacks the parallel, global context understanding. The self-attention mechanism computes each token's relationship to all others in parallel, allowing for simultaneous context awareness across the sequence.</p></li></ul><p><strong>Understanding the FFN:</strong></p><ul><li><p>The FFN doesn't manipulate holograms; it works on vectors enriched with contextual information from the self-attention mechanism, treating each as if it were a 'hologram' of the whole sequence. In reality, the FFN consists of multiple layers where each might expand, transform, and then reduce the dimensionality of the data, enhancing it with learned features. It can be thought of as two prisms:</p><ul><li><p>The first transformation expands this holographic light into a higher-dimensional space, exploring complex patterns or features.</p></li><li><p>The second transformation refocuses this enriched light back into the original dimensionality, now with added context.</p></li></ul></li><li><p>After passing through the FFN, the "apple" vector not only shows the scene but also how this scene relates to broader concepts like gravity, enlightenment, or scientific discovery. (The two transformations are sketched in code below.)</p></li></ul><p>Keep in mind, while this analogy applies broadly to transformers, different models might implement these concepts in varied ways.</p>
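<p>Here is an illustrative NumPy sketch of the "two prisms": an up-projection into a wider space, a non-linearity, and a down-projection back to the model dimension, applied to each token vector independently. The 4x expansion shown is a common convention rather than a requirement, and all sizes are toy values.</p><pre><code>import numpy as np

def ffn(x, W1, b1, W2, b2):
    # First "prism": expand into a higher-dimensional space.
    h = np.maximum(0.0, x @ W1 + b1)   # ReLU non-linearity
    # Second "prism": refocus back to the original dimensionality.
    return h @ W2 + b2

rng = np.random.default_rng(3)
d_model, d_ff = 8, 32                  # d_ff is often 4 * d_model
W1, b1 = rng.normal(size=(d_model, d_ff)) * 0.1, np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)) * 0.1, np.zeros(d_model)

# The FFN can see one context-enriched token vector at a time...
apple_vec = rng.normal(size=d_model)
print(ffn(apple_vec, W1, b1, W2, b2).shape)   # (8,)

# ...and the same weights apply row-wise to a whole sequence in parallel.
sequence = rng.normal(size=(5, d_model))
print(ffn(sequence, W1, b1, W2, b2).shape)    # (5, 8)
</code></pre><p>With this standard 4x configuration, the two projection matrices alone hold 8 times d_model squared weights, which is a large share of a typical transformer layer's parameters.</p>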
<p><strong>In conclusion,</strong> the transformer architecture uses a method akin to holography's "whole in every part" principle to understand language comprehensively. By making each word aware of the entire context, transformers can process language in parallel, providing both efficiency and depth in understanding. This innovative approach has revolutionized how we think about and implement neural networks for language processing.</p>]]></content:encoded></item><item><title><![CDATA[Deciphering the Language of the Ancients: How Self-Attention Works in LLMs]]></title><description><![CDATA[How the Q, K, and V matrices are used during the self-attention mechanism to decipher the interrelationships between words (tokens) in a sequence.]]></description><link>https://www.grokmountain.com/p/deciphering-the-language-of-the-ancients</link><guid isPermaLink="false">https://www.grokmountain.com/p/deciphering-the-language-of-the-ancients</guid><dc:creator><![CDATA[Grok Mountain]]></dc:creator><pubDate>Wed, 05 Feb 2025 03:33:54 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!rOLK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2f608c-2589-4059-ac3b-4656dc76df92_1024x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Imagine you've joined the legendary archaeologist, Professor Indiana Jones, on his latest adventure. He's unearthed a sequence of novel hieroglyphics from an ancient Egyptian site, a puzzle that promises secrets about a legendary medallion. But here's the twist: neither Indiana nor anyone else understands these symbols. The riddle before us: are these hieroglyphics telling the tale of a king burying a valuable medallion, of a king being buried with one, or of the medallion serving as a tool for digging?</p>
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6a2f608c-2589-4059-ac3b-4656dc76df92_1024x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:200796,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rOLK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2f608c-2589-4059-ac3b-4656dc76df92_1024x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!rOLK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2f608c-2589-4059-ac3b-4656dc76df92_1024x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!rOLK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2f608c-2589-4059-ac3b-4656dc76df92_1024x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!rOLK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2f608c-2589-4059-ac3b-4656dc76df92_1024x768.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This scenario mirrors the challenge faced by Large Language Models (LLMs) when processing sequences of words or tokens. Just like these hieroglyphics, we start with a set of vectors representing words in a sequence. The first hurdle to overcome, before any action or prediction can be made, is to understand the interrelationship between these tokens.</p><p><strong>The Need for Contextual Awareness</strong></p><p>When you pose a question to an LLM, the model begins by trying to understand the context or the relationships between the words in your query. 
This is where self-attention comes into play, much like Indiana Jones needing to decipher the hieroglyphics before embarking on his quest for the medallion.</p><p>In our analogy, before Indiana can venture into the world to find the medallion (akin to the Feed Forward Network or FFN where world knowledge is applied), he must first understand what the hieroglyphic sequence is saying. He does this by leveraging three specialized classrooms, each contributing uniquely to this understanding:</p><p><strong>1. Theoretical Linguistics Classroom:</strong></p><ul><li><p><strong>Role</strong>: Focuses on the core concepts of the language. Students here ponder, "What could these symbols mean?"</p></li><li><p><strong>Function</strong>: They determine which symbols or combinations are crucial for understanding, much like focusing on what parts of the sequence need attention.</p></li><li><p><strong>Visual</strong>: Imagine this classroom as a vast grid of desks, each student equipped with a different piece of knowledge or approach to deciphering hieroglyphics.</p></li></ul><p><strong>2. Practical Decoding Classroom:</strong></p><ul><li><p><strong>Role</strong>: Students here, with cryptographic and archaeological expertise, explore how symbols connect or relate within the sequence. They ask, "How do these symbols interact within the text?"</p></li><li><p><strong>Function</strong>: They establish connections or patterns, similar to matching queries with relevant parts of the data.</p></li><li><p><strong>Visual</strong>: Another grid of desks, where each student's unique skill set helps in interpreting the practical implications of the hieroglyphics.</p></li></ul><p><strong>3. Critical Analysis Classroom:</strong></p><ul><li><p><strong>Role</strong>: This classroom delves into why certain combinations of symbols might be significant, pondering cultural or symbolic meanings.</p></li><li><p><strong>Function</strong>: They refine the context by providing depth or alternative interpretations, much like enriching the understanding of the sequence.</p></li><li><p><strong>Visual</strong>: Each student in their desk, analyzing the hieroglyphics from a cultural or symbolic perspective, transforming raw symbols into meaningful insights.</p></li></ul>
<p><strong>The Magic Behind the Classrooms: The Weight
Matrices</strong></p><p>Each hieroglyphic or token in our sequence is represented by a vector in a high-dimensional space, known as a <em>hidden state vector</em>. These vectors capture the initial essence or meaning of each symbol before further interpretation.</p><p>These classrooms metaphorically represent static weight matrices (W_q, W_k, W_v) in the self-attention mechanism of LLMs. Here's the technical detail:</p><ul><li><p><strong>Static Weights</strong>: The knowledge or teaching approach in these classrooms (W_q, W_k, W_v) doesn't change after training; they are the weights that, once learned, remain fixed for inference. However, for each new hieroglyphic sequence, they produce unique reports (Q, K, V matrices) based on the sequence's hidden states.</p></li><li><p><strong>Dynamic Matrices</strong>: When a new set of hieroglyphics (or a sequence of tokens) comes in, these static weights interact with the hidden state vectors to produce:</p><ul><li><p><strong>Q Matrix</strong>: From W_q, helping to focus attention on relevant parts of the sequence.</p></li><li><p><strong>K Matrix</strong>: From W_k, matching queries to relevant parts of the sequence.</p></li><li><p><strong>V Matrix</strong>: From W_v, providing the actual content or transformation based on the context.</p></li></ul></li><li><p><strong>The Math</strong>: Simply put, each classroom's output is generated by multiplying the hidden state matrix (the collective of hidden state vectors) by their respective weight matrices:</p><ul><li><p><strong>Q = hidden_states * W_q</strong></p></li><li><p><strong>K = hidden_states * W_k</strong></p></li><li><p><strong>V = hidden_states * W_v</strong></p></li></ul><p>Think of this multiplication like each student in the classroom (weight) reading a unique aspect of the hieroglyphics (hidden state) to produce a new report (Q, K, or V matrix) tailored to that sequence.</p></li><li><p><strong>Size Context</strong>: For models like Grok-1, each classroom (weight matrix) can be as large as 6,144 by 6,144, but this size can vary depending on the model's architecture and the layer in question. The size of these matrices (Q, K, V) also depends on how many hieroglyphics or tokens are in the sequence, adapting dynamically to the input's length.</p></li><li><p><strong>Attention Scores</strong>: Only the 'reports' from the Theoretical and Practical classrooms (Q and K) are used together to compute attention scores. These scores dictate how much one hieroglyphic should pay attention to another, much like Indy decides how to connect the dots on his chalkboard. The Critical Analysis classroom's report (V) comes into play after this step.</p></li><li><p><strong>Weighted Sum</strong>: After computing the attention scores, they're applied to the Value matrix (V). 
This can be thought of as Professor Jones using the insights from how symbols relate (attention scores) to decide how to weigh the interpretations or transformations provided by the Critical Analysis classroom (V), effectively updating his understanding of each hieroglyphic. (The full pipeline, from static weights to weighted sum, is sketched in code after this list.)</p></li></ul>
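<p>Here is a rough NumPy sketch of that static-versus-dynamic distinction, with a toy hidden size standing in for Grok-1's 6,144. The weight matrices stay fixed after training, while the Q, K, and V "reports" they produce change shape and content with every new sequence of hieroglyphics.</p><pre><code>import numpy as np

rng = np.random.default_rng(4)
d = 12   # toy stand-in for the model's hidden size

# The three "classrooms": static weight matrices, fixed after training.
W_q, W_k, W_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))

def attend(hidden_states):
    # Each new sequence yields fresh Q, K, V reports from the same weights.
    Q, K, V = hidden_states @ W_q, hidden_states @ W_k, hidden_states @ W_v
    # Q and K together give the attention scores...
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # ...which are then applied as a weighted sum over V.
    return weights @ V

# The same static weights adapt to sequences of any length.
print(attend(rng.normal(size=(4, d))).shape)   # 4 hieroglyphics -> (4, 12)
print(attend(rng.normal(size=(9, d))).shape)   # 9 hieroglyphics -> (9, 12)
</code></pre>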
class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Bringing It All Together</strong></p><p>Professor Jones, like the self-attention mechanism, uses the collective reports (Q, K, V matrices) from these classrooms to piece together the meaning of the hieroglyphics:</p><ul><li><p><strong>Reports</strong>: Each classroom's report informs Jones on what to focus on, how symbols relate, and why they're significant.</p></li><li><p><strong>Analysis</strong>: This is akin to computing attention scores where Jones might add notes to a chalkboard, drawing connections or arrows between symbols based on these reports.</p></li><li><p><strong>Conclusion</strong>: Finally, he refines his understanding, much like updating hidden state vectors with contextual information, preparing for the next step of his adventure (or the model's next layer).</p></li></ul><p>This process is the first step in LLMs, where self-attention mechanisms work within the sequence itself to understand the relationships between tokens. Only after this contextual awareness is established can the model move forward, akin to Indiana Jones now ready to step into the broader world to find the medallion, which corresponds to the FFN where external knowledge or predictions are applied.</p><p>In summary, just as Indiana Jones deciphers hieroglyphics through the insights of specialized classrooms, LLMs use self-attention to interpret the context of words, turning raw sequences into meaningful narratives. This process is fundamental, allowing the model to understand language intricacies before diving into broader knowledge (FFN) to predict outcomes or generate responses.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.grokmountain.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Grok Mountain&#8217;s Substack! 
]]></content:encoded></item><item><title><![CDATA[Unveiling the Magic of Self-Attention: A Journey Through the Gatekeepers to the Mirror of Awareness]]></title><description><![CDATA[How the hidden state vectors pass through the first layer of the transformer block, the self-attention mechanism, and the magic that happens inside to make these vectors self-aware.]]></description><link>https://www.grokmountain.com/p/unveiling-the-magic-of-self-attention</link><guid isPermaLink="false">https://www.grokmountain.com/p/unveiling-the-magic-of-self-attention</guid><dc:creator><![CDATA[Grok Mountain]]></dc:creator><pubDate>Tue, 04 Feb 2025 03:43:59 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!UTTJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46960dd2-edf8-472c-989f-90a20ea35de9_1024x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/46960dd2-edf8-472c-989f-90a20ea35de9_1024x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:222396,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UTTJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46960dd2-edf8-472c-989f-90a20ea35de9_1024x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!UTTJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46960dd2-edf8-472c-989f-90a20ea35de9_1024x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!UTTJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46960dd2-edf8-472c-989f-90a20ea35de9_1024x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!UTTJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46960dd2-edf8-472c-989f-90a20ea35de9_1024x768.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Self-attention is a pivotal mechanism in transformer models, allowing each token in a sequence to understand its context by attending to all others. It's like giving each word in a sentence the chance to look around and see how it relates to its neighbors. Self-attention begins with embedding vectors - these are the initial representations of tokens, unaware of their surroundings or the other tokens in the sequence. 
Let's guide these vectors on an enlightening journey through the self-attention process.</p><p>Our adventure starts with the <strong>hidden state vectors</strong>, which collectively form a matrix. This matrix, representing all tokens in the sequence, is copied into three identical matrices, each destined to meet one of the guardians of context: the three Gatekeepers.</p>
1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ul><li><p><strong>The Q Gatekeeper (Query)</strong>: One of these matrices approaches the Q Gatekeeper. Here, the entire matrix undergoes a transformation through weighted multiplication, focusing on what each vector within queries or seeks in the context. It's like giving each vector a new lens to view the sequence, one that highlights what they're looking for. The Q Gatekeeper imbues the matrix with a collective sense of inquiry, preparing each vector to seek connections.</p></li><li><p><strong>The K Gatekeeper (Key)</strong>: Another copy of the matrix encounters the K Gatekeeper. This gatekeeper assigns each vector within its 'key', determining how they can be matched or compared with others. Through another weighted multiplication, the K Gatekeeper provides each vector with unique glasses through which to see their identifiers, ensuring that each knows how to recognize itself in relation to the whole sequence.</p></li><li><p><strong>The V Gatekeeper (Value)</strong>: The final matrix is greeted by the V Gatekeeper, who preserves and enhances the inherent values of all vectors. With its magic, which involves weights like special lenses to focus or broaden each vector's view, this gatekeeper ensures that the essence of each word is clear and ready to contribute meaningfully to the collective understanding.</p></li></ul><p>The beauty here is in the parallelism - each gatekeeper casts its spell on all vectors at once, transforming the entire group into a new form that echoes their query, key, or value.</p><p>Once all three matrices have been transformed by their respective Gatekeepers, they converge at the heart of this journey &#8211; the <strong>Mirror of Awareness</strong>. 
The Mirror will only reveal its magic once all three transformed matrices are present:</p><div class="captioned-image-container"><figure><img src="https://substack-post-media.s3.amazonaws.com/public/images/58a1c175-647a-48a2-90b8-9e9fe75f7dbc_1024x768.jpeg" width="1024" height="768" alt=""></figure></div><ul><li><p><strong>The Mirror of Awareness</strong>: Here, the matrices are not just reflected but dynamically interact. The Mirror calculates attention scores by comparing the queries (Q) with the keys (K), determining how much each word should attend to every other word in the sequence. It's as if the reflections in the mirror dance and interact, each vector's light adjusting based on its connections to others, revealing the true network of relationships. A softmax function then normalizes these scores, akin to adjusting the light to see these connections clearly. Finally, using these weights, the Mirror blends the values (V), creating new representations for each word that are now aware of their context and their relations to other words. It's as if each word has seen itself through the eyes of every other word, gaining a profound understanding of its place within the sequence.</p></li></ul><p>As these vectors, now self-aware and contextually enriched, prepare to leave the Mirror of Awareness, they are met by the <strong>Normalization Escort</strong> at the exit. This escort ensures that each vector shines just right, neither too dim nor too bright, preparing them for what lies beyond. With a wish of good luck, the vectors pass through to meet the two Prisms of the Feed Forward Network, where they will further refine their understanding.</p>
<p>In this journey, we've seen how self-attention transforms mere embeddings into vectors that 'know thyself' among others. The process is not just about altering data but about changing perspectives, giving each word a chance to understand its narrative role. This lays the groundwork for deeper comprehension in the transformer block, where each step, from the Gatekeepers to the Mirror of Awareness and beyond, orchestrates a symphony of understanding from what was once just a collection of isolated notes.</p>]]></content:encoded></item><item><title><![CDATA[The Profound Role of 6,144 in Grok-1 - A Deep Dive into Neurons, Weights, and Dimensions]]></title><description><![CDATA[Explaining the relationship among dimensions of the hidden state vectors, the weights of neurons, and the number of neurons in each layer of the neural network.]]></description><link>https://www.grokmountain.com/p/the-profound-role-of-6144-in-grok</link><guid isPermaLink="false">https://www.grokmountain.com/p/the-profound-role-of-6144-in-grok</guid><dc:creator><![CDATA[Grok Mountain]]></dc:creator><pubDate>Sun, 02 Feb 2025 23:14:19 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!W9Y-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cf341e5-a83c-4d20-b921-a28215878a24_1024x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>The Essence of a Neuron</strong></p><p>In the realm of neural networks, particularly in advanced models like Grok-1, the neuron is the elemental unit of computation. Here's a closer look at how it operates:</p><ul><li><p><strong>Input Vector:</strong> Each neuron accepts a <strong>vector</strong> as its input. This vector might represent a token's hidden state in language models or any feature set in other applications. In Grok-1, this vector has a special dimension: <strong>6,144 elements</strong>. This vector encapsulates the current understanding or context of the data being processed.</p></li></ul><div class="captioned-image-container"><figure><img src="https://substack-post-media.s3.amazonaws.com/public/images/8cf341e5-a83c-4d20-b921-a28215878a24_1024x768.jpeg" width="1024" height="768" alt=""></figure></div><ul><li><p><strong>The Neuron's Intelligence - Weights:</strong></p><ul><li><p>Every neuron possesses its own unique set of <strong>weights</strong>. These weights are like the neuron's DNA, determining how it interprets the input vector. Critically, the number of weights in each neuron must precisely match the number of elements in the input vector - in this case, <strong>6,144 weights</strong> per neuron. This ensures that each dimension of the input is considered in the neuron's computation.</p></li></ul></li><li><p><strong>Processing and Output:</strong></p><ul><li><p>The neuron processes its input by performing a weighted sum, where each element of the input vector is multiplied by its corresponding weight, and then a <strong>bias</strong> is added. This sum is then often passed through an <strong>activation function</strong>, which introduces non-linearity into the model.</p></li><li><p>The result is a single <strong>scalar</strong> value. This scalar is the neuron's interpretation or transformation of the input vector, condensing potentially complex information into a single number.</p></li></ul></li></ul><div class="captioned-image-container"><figure><img src="https://substack-post-media.s3.amazonaws.com/public/images/113f6acd-3daf-444f-88b9-716e78edead7_1024x768.jpeg" width="1024" height="768" alt=""></figure></div>
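<p>To make that arithmetic concrete, here is a minimal NumPy sketch of a single neuron and of a full layer of such neurons. The 6,144 matches Grok-1's hidden size, but the random weights and the choice of ReLU as the activation are purely illustrative:</p><pre><code>import numpy as np

d_model = 6144                        # Grok-1's hidden state dimension
rng = np.random.default_rng(0)

x = rng.standard_normal(d_model, dtype=np.float32)   # one token's hidden state

# A single neuron: 6,144 weights (one per input element) plus a bias
w = rng.standard_normal(d_model, dtype=np.float32)
b = 0.1
z = float(w @ x) + b                  # weighted sum, then add the bias
scalar_out = max(0.0, z)              # activation function (ReLU here) -> one scalar

# A layer of 6,144 such neurons: one row of weights per neuron,
# so the layer maps a 6,144-vector to a new 6,144-vector
W = rng.standard_normal((d_model, d_model), dtype=np.float32)
layer_out = np.maximum(0.0, W @ x)
print(layer_out.shape)                # (6144,)
</code></pre>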
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>The Significance of 6,144 in Grok-1</strong></p><ul><li><p><strong>Hidden State Vectors:</strong> The choice of 6,144 as the dimension for hidden state vectors in Grok-1 is not arbitrary. It allows for a high-dimensional representation of tokens, capturing intricate linguistic nuances or contextual relationships.</p></li><li><p><strong>Neuron Weights:</strong> With each neuron having 6,144 weights, the model ensures that every aspect of the input data is considered, providing a comprehensive analysis at each computation step.</p></li></ul><p><strong>Layers of Neurons</strong></p><ul><li><p><strong>Constructing the Output Vector:</strong></p><ul><li><p>When neurons are organized into layers, each neuron contributes its scalar output. Collectively, these scalars form a new vector. 
If there are 6,144 neurons in a layer, then the output vector will naturally have <strong>6,144 elements</strong>, maintaining the input's dimensionality.</p></li></ul></li><li><p><strong>Consistency Through Layers:</strong> As data flows through Grok-1, this consistency in vector size (6,144 elements) across layers ensures that information is processed with a uniform complexity, allowing for coherent learning and inference.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!C3Al!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F441fad2a-fbdf-407c-8d6b-54900b94b8f0_1024x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!C3Al!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F441fad2a-fbdf-407c-8d6b-54900b94b8f0_1024x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!C3Al!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F441fad2a-fbdf-407c-8d6b-54900b94b8f0_1024x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!C3Al!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F441fad2a-fbdf-407c-8d6b-54900b94b8f0_1024x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!C3Al!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F441fad2a-fbdf-407c-8d6b-54900b94b8f0_1024x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!C3Al!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F441fad2a-fbdf-407c-8d6b-54900b94b8f0_1024x768.jpeg" width="1024" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/441fad2a-fbdf-407c-8d6b-54900b94b8f0_1024x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:415433,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!C3Al!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F441fad2a-fbdf-407c-8d6b-54900b94b8f0_1024x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!C3Al!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F441fad2a-fbdf-407c-8d6b-54900b94b8f0_1024x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!C3Al!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F441fad2a-fbdf-407c-8d6b-54900b94b8f0_1024x768.jpeg 1272w, 
<p><strong>The Feed-Forward Network (FFN) - An Exception to the Rule</strong></p><ul><li><p><strong>Expansion in the First Layer:</strong></p><ul><li><p>Here, the FFN does something unique. It takes the 6,144-dimensional input and expands it into a much larger space. With <strong>24,576 neurons</strong> in this first layer, each still dealing with 6,144 weights, the network explores a higher-dimensional space for deeper feature interaction. This step is crucial for capturing complex patterns that might not be evident in lower dimensions.</p></li></ul></li><li><p><strong>Projection Back in the Second Layer:</strong></p><ul><li><p>After expanding, the second layer of the FFN brings the representation back to the original dimension. With <strong>6,144 neurons</strong> again, each now processing the 24,576-dimensional output from the first layer, the network compresses this richer understanding back into the familiar 6,144-dimensional space. This ensures that the output of the FFN matches the expected input for subsequent operations or the model's final output.</p></li></ul></li><li><p><strong>The Role of 6,144 in FFN:</strong> Even in this expansion and contraction, the number 6,144 serves as an anchor, ensuring that the model can delve into complexity without losing the ability to communicate back to the rest of the network in its standard language. (A quick dimension check in code follows below.)</p></li></ul>
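<p>A quick back-of-the-envelope check of those dimensions (the layer sizes are Grok-1's; the rest is plain arithmetic):</p><pre><code>d_model, d_ff = 6144, 24576      # Grok-1's hidden size and its 4x FFN expansion

first_layer = d_ff * d_model     # 24,576 neurons x 6,144 weights = 150,994,944
second_layer = d_model * d_ff    # 6,144 neurons x 24,576 weights = 150,994,944

print(f"{first_layer + second_layer:,} weights in one FFN block (biases excluded)")
# prints: 301,989,888 weights in one FFN block (biases excluded)
</code></pre>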
<p><strong>Conclusion</strong></p><p>The number 6,144 in Grok-1 is more than just a dimension; it's a testament to the careful engineering of neural network architecture. It's at the heart of how each neuron processes information, how layers maintain consistency, and how the FFN navigates between complexity and simplicity. This number underscores the balance Grok-1 strikes between capturing the nuanced, multi-dimensional nature of language and other data while ensuring computational efficiency and coherence across its operations. Through the lens of neurons, weights, and dimensions, we see 6,144 as a key to understanding Grok-1's intelligence.</p>]]></content:encoded></item><item><title><![CDATA[Illuminating AI: How Grok-1 Uses 'Prismatic' Layers to Understand the Universe]]></title><description><![CDATA[Explaining the concept of Feed Forward Networks (FFNs) as two layers, one that disperses data outward, one that converges it back into its original dimensions. Using analogy of two prisms.]]></description><link>https://www.grokmountain.com/p/illuminating-ai-how-grok-1-uses-prismatic</link><guid isPermaLink="false">https://www.grokmountain.com/p/illuminating-ai-how-grok-1-uses-prismatic</guid><dc:creator><![CDATA[Grok Mountain]]></dc:creator><pubDate>Sat, 01 Feb 2025 03:09:44 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!LywO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e86ca4e-238c-487b-8eb4-21b06f15921d_1024x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Imagine you're asking Grok-1, an AI model, "Is there any light on the dark side of the moon?" To answer this, Grok-1 doesn't just see words but deciphers them through layers of computation, much like light passing through prisms. Here's how it happens:</p><p><strong>Step 1: Tokenization and Vectorization</strong></p><p>When you pose your question, "Is there any light on the dark side of the moon?", Grok-1 first tokenizes this sentence into approximately 10 tokens (the exact number might vary based on the model's vocabulary). Each token, like "Is", "there", "any", etc., gets transformed into a vector in a high-dimensional space through the embedding layer.
These vectors are our initial "rays of light", each representing the basic semantic content of a token.</p><div class="captioned-image-container"><figure><img src="https://substack-post-media.s3.amazonaws.com/public/images/5e86ca4e-238c-487b-8eb4-21b06f15921d_1024x768.jpeg" width="1024" height="768" alt=""></figure></div>
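<p>A toy version of this first step, with an invented five-word vocabulary and a deliberately small embedding size (real tokenizers use far larger subword vocabularies, and Grok-1's embeddings are 6,144-dimensional):</p><pre><code>import numpy as np

vocab = {"is": 0, "there": 1, "any": 2, "light": 3, "moon": 4}   # invented toy vocabulary
d_model = 8                                # illustrative; Grok-1 uses 6,144
rng = np.random.default_rng(0)
embedding_table = rng.standard_normal((len(vocab), d_model))

tokens = ["is", "there", "any", "light", "moon"]
ids = [vocab[t] for t in tokens]           # tokenization: text -> integer ids
rays = embedding_table[ids]                # vectorization: ids -> one vector per token
print(rays.shape)                          # (5, 8): five initial "rays of light"
</code></pre>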
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Step 2: Contextual Enrichment with Self-Attention</strong></p><p>Before passing through our prismatic layers, these 10 vectors (or rays) go through the self-attention mechanism. Here, each vector learns about its relationship with every other vector in the sequence, understanding context like how "dark" relates to "side" and "moon". However, at this stage, there's no "world knowledge"; it's merely about the interconnections within the sentence itself.</p><p><strong>Step 3: The First Prism - Refraction of Meaning</strong></p><p>Now, these contextually aware vectors enter the first layer of the Feed-Forward Network (FFN), which we can visualize as a magical prism:</p><ul><li><p><strong>Refraction</strong>: Just like light splits into a spectrum when passing through a prism, each vector is expanded into a much larger set of dimensions. This expansion allows the model to explore a vast array of semantic features or "colors" of meaning. Here, the model might start to consider concepts like:</p><ul><li><p>What does "dark" imply in different contexts?</p></li><li><p>The nature of light in relation to celestial bodies.</p></li></ul></li><li><p><strong>Non-linearity</strong>: This layer applies an activation function (like ReLU), acting as if each "color" of light is selectively enhanced or diminished, adding non-linearity to the data. 
This step helps in capturing complex patterns or nuances that weren't apparent in the original vectors.</p></li></ul><p><strong>Step 4: The Second Prism - Constructive Interference</strong></p><div class="captioned-image-container"><figure><img src="https://substack-post-media.s3.amazonaws.com/public/images/71c41f2b-aff7-440d-a38a-983982263f0b_1024x768.jpeg" width="1024" height="768" alt=""></figure></div>
<p>After the first prism disperses the light, the second layer of the FFN acts like another prism, but this one combines:</p><ul><li><p><strong>Constructive Interference</strong>: Through this layer, the expanded dimensions are projected back to the original dimensionality of the model. It's here that the magic happens; the model synthesizes the dispersed information back into coherent vectors. This isn't just a reversal of the first step but rather a transformation where:</p><ul><li><p>The "rays" combine in ways that now embody knowledge about the universe. For instance, understanding that:</p><ul><li><p>The "dark side" of the moon refers to the side not facing the sun at any given time, not a permanently dark area.</p></li><li><p>There can indeed be light on the "dark side" due to earthshine or when it's in the sun's light during its orbit.</p></li></ul></li></ul></li><li><p><strong>Enrichment</strong>: The output is still 10 vectors, one for each token, but now these vectors, or rays of light, are imbued with the model's learned understanding of the world. They've gone from mere words to concepts rich with context and knowledge. (A code sketch of the two prisms follows below.)</p></li></ul>
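<p>Both prisms together, for our 10-token question, come down to two matrix multiplications with an activation in between. Here is a sketch with toy stand-in widths (Grok-1's real prisms go from 6,144 out to 24,576 and back):</p><pre><code>import numpy as np

n_tokens, d_model, d_ff = 10, 16, 64   # toy stand-ins for 10 tokens, 6,144 and 24,576
rng = np.random.default_rng(0)

rays = rng.standard_normal((n_tokens, d_model))   # contextually enriched vectors

W1 = rng.standard_normal((d_model, d_ff))         # first prism: refraction outward
W2 = rng.standard_normal((d_ff, d_model))         # second prism: interference back

spectrum = np.maximum(0.0, rays @ W1)             # disperse, then selectively enhance (ReLU)
enriched = spectrum @ W2                          # recombine into the original dimensions
print(enriched.shape)                             # (10, 16): still one vector per token
</code></pre>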
<p><strong>Conclusion</strong></p><p>By the time these vectors leave the FFN, they've been transformed through what can be thought of as prismatic layers, gaining depth and understanding. Grok-1, with this enriched representation, can now respond to the query with knowledge that mirrors our understanding of the universe, explaining that light can indeed be found on the "dark side" of the moon under certain conditions. This prismatic process in AI is a testament to how models like Grok-1 can simulate human-like understanding by manipulating data in ways that are both complex and beautifully analogous to natural phenomena.</p>]]></content:encoded></item><item><title><![CDATA[Navigating the Neural Network Ant Hill: Understanding Supervised Learning, Reinforcement Learning, and Fine-Tuning]]></title><description><![CDATA[Explaining how the Learning Rate hyperparameter starts high (impactful) during early training stages, but decreases during each subsequent learning stage.]]></description><link>https://www.grokmountain.com/p/navigating-the-neural-network-ant</link><guid isPermaLink="false">https://www.grokmountain.com/p/navigating-the-neural-network-ant</guid><dc:creator><![CDATA[Grok Mountain]]></dc:creator><pubDate>Fri, 31 Jan 2025 17:20:55 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!XWr1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae9b81e4-9755-4d55-831a-0fcaff2b3506_1024x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Imagine a neural network as a giant ant hill - a complex structure of tunnels and pathways, each representing streams of thought or data processing within the network. We'll explore how this ant hill grows and adapts through different learning phases: supervised learning, reinforcement learning, and fine-tuning, with a focus on how the learning rate, akin to the speed of digging or sculpting, must adjust with each phase.</p><p><strong>Starting with a Pile of Dirt: The Untrained Neural Network</strong></p><p>At the beginning, we have nothing but a pile of dirt - this represents an untrained neural network.
There are no tunnels, no paths for information to flow through, just potential waiting to be shaped.</p><p><strong>Supervised Learning: Carving the Initial Tunnels</strong></p><div class="captioned-image-container"><figure><img src="https://substack-post-media.s3.amazonaws.com/public/images/ae9b81e4-9755-4d55-831a-0fcaff2b3506_1024x768.jpeg" width="1024" height="768" alt=""></figure></div>
pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>High Learning Rate - Digging Broad Paths</strong></p><ul><li><p><strong>Concept</strong>: Supervised learning is like the initial phase where ants (data) start carving tunnels through the dirt. Each piece of training data is akin to ants digging, creating pathways based on the patterns they find. These tunnels represent the model learning to predict outcomes based on given inputs.</p></li><li><p><strong>Learning Rate</strong>: Here, a high learning rate is essential because you need to make significant changes to the network to form these initial structures. It's like using a large shovel to quickly dig out broad, foundational tunnels. The high learning rate allows the model to quickly adapt to the training data, carving out large, general pathways where information can flow for the first time.</p><ul><li><p><strong>Why High?</strong> Fast learning is needed to establish the basic structure of understanding from a diverse dataset. Rapid adjustments in weights are necessary to form the neural network's initial "landscape" of knowledge.</p></li></ul></li><li><p><strong>Challenges</strong>: With such aggressive digging, there's a risk of overshooting or creating paths that might not be optimal, but it's a necessary step to get the structure started.</p></li></ul><p><strong>Reinforcement Learning: Refining the Pathways</strong></p><p><strong>Moderate Learning Rate - Adjusting and Expanding</strong></p><ul><li><p><strong>Concept</strong>: After the initial tunnels are in place, reinforcement learning comes in like experienced ants refining the tunnels. Here, the focus shifts from simply following where the data leads to optimizing based on feedback or rewards. It's about making the paths more efficient, expanding useful tunnels, or sealing off less effective ones.</p></li><li><p><strong>Learning Rate</strong>: The learning rate decreases from the initial high rate because now the goal is not to dig new tunnels but to adjust existing ones. It's like using smaller tools for more precise work, ensuring that each adjustment aligns with the goal of maximizing rewards or minimizing losses based on the feedback.</p><ul><li><p><strong>Why Moderate?</strong> The changes are more about optimization than creation. 
A learning rate that is too high could disrupt what has been learned, while one that is too low might fail to make significant improvements. This phase requires a balance to refine the network without losing the groundwork laid by supervised learning.</p></li></ul></li><li><p><strong>Outcome</strong>: The network learns to navigate the ant hill more effectively, choosing paths that lead to better outcomes based on the feedback mechanism.</p></li></ul><p><strong>Fine-Tuning: Targeting Specific Tunnel Systems</strong></p><p><strong>Low Learning Rate - Precision Adjustments</strong></p><ul><li><p><strong>Concept</strong>: Fine-tuning is like focusing on a particular section of the ant hill where specific tunnels need enhancement for a specialized task. It's about making micro-adjustments to the already existing structure to better serve a narrow purpose or dataset.</p></li><li><p><strong>Learning Rate</strong>: Here, the learning rate is significantly lower than in previous stages because you're dealing with a very targeted part of the network. It's akin to using a fine brush or a small spade to gently tweak the paths, ensuring that only the necessary tunnels are adjusted without disturbing the rest of the network.</p><ul><li><p><strong>Why Low?</strong> The aim is to preserve the general knowledge of the ant hill while enhancing performance in a specific area. A high learning rate could skew these tunnels too far, disrupting the balance of the entire structure. Fine-tuning requires delicate, precise modifications to align the model with the new task without unlearning the broader patterns.</p></li></ul></li><li><p><strong>Challenges and Benefits</strong>: The challenge is to ensure that while one part of the ant hill becomes more efficient or specialized, the overall functionality of the hill isn't compromised. The benefit is a model that's highly adapted to the task at hand while still capable of general navigation. (A small code sketch of these phase-by-phase rates follows below.)</p></li></ul>
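<p>In practice, the phase-by-phase "digging speed" is simply a choice of learning-rate values. The numbers below are illustrative orders of magnitude, not from any published training recipe:</p><pre><code># Illustrative learning rates for each phase of the ant hill's construction
phases = {
    "supervised_pretraining": 3e-4,   # big shovel: carve broad tunnels fast
    "reinforcement_learning": 3e-5,   # smaller tools: adjust existing paths
    "fine_tuning":            3e-6,   # fine brush: precise, local tweaks
}

def sgd_step(weights, grads, phase):
    """One gradient step; the phase's learning rate scales how far we dig."""
    lr = phases[phase]
    return [w - lr * g for w, g in zip(weights, grads)]

w, g = [0.5, -0.2], [0.1, 0.3]
for phase in phases:
    print(phase, sgd_step(w, g, phase))
</code></pre>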
<p><strong>Conclusion</strong></p><p>Through this analogy, we see how the learning rate, the speed and intensity of digging, must be tailored to each phase of learning: a high learning rate to establish foundational understanding, a moderate one for optimization through reinforcement, and a low one for the precision of fine-tuning. Just like building an ant hill, training a neural network is about shaping the landscape of knowledge effectively for various tasks, ensuring each tunnel serves its purpose without collapsing the hill.</p>]]></content:encoded></item><item><title><![CDATA[DeepSeek's Chain of Thought: Rescuing Astronauts from Space]]></title><description><![CDATA[One of the factors that set DeepSeek's recent AI models apart is their logical, step-by-step reasoning capability, achieved through a holistic approach to CoT.]]></description><link>https://www.grokmountain.com/p/deepseeks-chain-of-thought-rescuing</link><guid isPermaLink="false">https://www.grokmountain.com/p/deepseeks-chain-of-thought-rescuing</guid><dc:creator><![CDATA[Grok Mountain]]></dc:creator><pubDate>Thu, 30 Jan 2025 02:43:36 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!en38!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c1ec8f7-6240-47b8-a8ca-3ee1dae43fb2_1024x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Imagine astronauts stuck on the International Space Station for an extended period. How would an AI like DeepSeek approach solving such a complex, life-critical problem? The answer begins long before the question is even posed, deep within the training and operational processes of DeepSeek's AI models.</p><p><strong>Training DeepSeek with Curated Knowledge:</strong></p><p>DeepSeek's journey to answer such a question starts with its training data. The model is not just fed vast quantities of text; its corpus is specifically curated to include high-quality content where problems are solved step by step, drawing on texts from mathematics, engineering, physics, medicine, and even everyday scenarios like home repairs.</p><ul><li><p><strong>Step-by-Step Content:</strong> By learning from texts that inherently follow a logical, sequential thought process, DeepSeek absorbs the pattern of reasoning. Whether it's solving an algebraic equation or diagnosing a medical condition, the model is exposed to domains where thinking in steps is crucial (a hypothetical training record of this kind is sketched below).</p></li><li><p><strong>Foundation of Chain of Thought (CoT):</strong> This exposure is the beginning of CoT. The model learns that complex problems often require a sequence of logical steps, not just a single, immediate answer.</p></li></ul>
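<p>The sketch below shows what one such curated record might look like. The prompt and response are invented for illustration; DeepSeek has not published its training data, so treat this purely as the shape of the idea.</p><pre><code class="language-python"># Hypothetical shape of one curated record: the target text itself walks
# through the fix step by step, so the model absorbs the reasoning pattern.
record = {
    "prompt": "A circuit breaker keeps tripping. How do you fix it?",
    "response": (
        "Step 1: Unplug everything on the affected circuit.\n"
        "Step 2: Reset the breaker and reconnect devices one at a time.\n"
        "Step 3: If it trips with one device plugged in, that device is faulty.\n"
        "Step 4: If it trips with nothing connected, have the wiring inspected."
    ),
}
</code></pre>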
<div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!en38!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c1ec8f7-6240-47b8-a8ca-3ee1dae43fb2_1024x768.jpeg" width="1024" height="768" alt=""></figure></div>
role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Beyond Training: Reinforcement Learning (RL) and CoT:</strong></p><p>However, embedding CoT into DeepSeek doesn't end with training data. The model undergoes rigorous testing and reinforcement learning (RL) phases:</p><ul><li><p><strong>Testing with CoT Prompts:</strong> DeepSeek is presented with prompts that require step-by-step reasoning. For example, "How do you fix a broken circuit?" or "What are the steps to land an aircraft?" The model's responses are evaluated for logical coherence and accuracy.</p></li><li><p><strong>Reinforcement:</strong> Responses that follow a logical, realistic, and well-reasoned path are positively reinforced. This means the neural network parameters are adjusted to favor these types of responses in the future. Conversely, less logical or error-prone answers receive negative feedback, ensuring the model learns from its "mistakes."</p></li><li><p><strong>Group Relative Policy Optimization (GRPO):</strong> DeepSeek employs GRPO, an RL algorithm that encourages the model to explore various reasoning paths. When applied to our astronaut rescue scenario, GRPO would push DeepSeek to consider multiple rescue strategies, rewarding those that are most logical and feasible. This approach not only builds CoT but also ensures diversity in thought, enhancing problem-solving capabilities.</p></li></ul><p><strong>Inference: Putting CoT to Work:</strong></p><p>When the actual question about rescuing astronauts is posed to DeepSeek:</p><ul><li><p><strong>Visible CoT:</strong> DeepSeek might start by saying, "First, we need to assess the current situation of the astronauts' supplies and health." This visibility into its thought process is part of DeepSeek's "Deep Think" feature, showing users how the AI is reasoning through the problem.</p></li><li><p><strong>Prompt-Driven CoT:</strong> The model is prompted to think step-by-step, perhaps by the user's query or by internal mechanisms designed to trigger CoT for complex scenarios. 
<p><strong>Inference: Putting CoT to Work:</strong></p><p>When the actual question about rescuing astronauts is posed to DeepSeek:</p><ul><li><p><strong>Visible CoT:</strong> DeepSeek might start by saying, "First, we need to assess the current situation of the astronauts' supplies and health." This visibility into its thought process is part of DeepSeek's "Deep Think" feature, showing users how the AI is reasoning through the problem.</p></li><li><p><strong>Prompt-Driven CoT:</strong> The model is prompted to think step by step, whether by the user's query or by internal mechanisms designed to trigger CoT for complex scenarios. It might then proceed, "Second, establish communication to understand any immediate medical needs..."</p></li><li><p><strong>Iterative Refinement:</strong> As DeepSeek generates each step, it chooses the next logical action based on the previous steps, much like human problem-solving, improving the accuracy and relevance of the response.</p></li></ul><p><strong>Conclusion:</strong></p><p>The holistic approach DeepSeek takes to Chain of Thought reasoning is a significant advantage. By embedding CoT from training through to inference, DeepSeek ensures that its models not only understand complex problems but can also articulate solutions in a human-like, logical manner. This methodology, from curated training data to advanced RL techniques like GRPO, allows DeepSeek to excel in performance and efficiency, setting it apart in the AI landscape.</p><ul><li><p><strong>Performance:</strong> The model's ability to provide detailed, step-by-step solutions to complex queries like astronaut rescue scenarios demonstrates its advanced reasoning capabilities.</p></li><li><p><strong>Efficiency:</strong> By learning to think in steps, DeepSeek can offer solutions that are both resource-efficient and tailored to the problem's specific context.</p></li></ul><p>DeepSeek's focus on CoT is not just about answering questions; it's about understanding and solving real-world problems in a way that's transparent, logical, and actionable, which is why it's making headlines and disrupting the AI field.</p>]]></content:encoded></item><item><title><![CDATA[DeepSeek: The Disruptive Force in AI with Its Cost-Efficient "Mini-Me" Model]]></title><description><![CDATA[Miniaturizing AI models is done via quantization and distillation. DeepSeek uses a combination of these techniques to create models that are both cost-effective and highly performant.]]></description><link>https://www.grokmountain.com/p/deepseek-the-disruptive-force-in</link><guid isPermaLink="false">https://www.grokmountain.com/p/deepseek-the-disruptive-force-in</guid><dc:creator><![CDATA[Grok Mountain]]></dc:creator><pubDate>Wed, 29 Jan 2025 03:49:21 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!hhhJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99ad518e-f8b1-4110-8037-f3c607d114a5_1024x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>DeepSeek has emerged as a game-changer in the AI landscape. By creating a cost-effective "mini-me" version of its advanced models, DeepSeek has made high-performance AI accessible to those with limited computational resources. This feat is achieved through the strategic use of quantization and distillation. Here's how these concepts work, explained through an engaging analogy.</p>
<p><strong>The IMAX Movie and the Bootlegger Analogy:</strong></p><p>Picture an IMAX movie theater where you're immersed in an unparalleled cinematic experience, courtesy of high-cost, specialized equipment. This IMAX movie symbolizes DeepSeek's original, full-precision model, trained on vast datasets to deliver top-tier language understanding and generation.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!hhhJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99ad518e-f8b1-4110-8037-f3c607d114a5_1024x768.jpeg" width="1024" height="768" alt=""></figure></div>
<p>Now, imagine a bootlegger in the audience, secretly recording this movie with a hidden, standard-definition video camera:</p><ul><li><p><strong>Distillation (The Bootlegger's Process):</strong> This is how DeepSeek crafts its "mini-me" model. The bootlegger (student model) captures the essence of what's on the IMAX screen (teacher model). He doesn't replicate every detail but learns enough to convey the movie's plot and feel. In the same vein, DeepSeek's smaller model is trained to mimic the outputs of its larger counterpart, preserving the core capabilities with far fewer parameters, making it viable for less powerful hardware.</p></li><li><p><strong>Quantization (Recording the IMAX Film):</strong> Here's where the magic happens. The bootlegger's camera isn't as high-definition as the IMAX cameras used to film the original movie, and this loss of quality from recording a copy of a copy represents quantization. The bootlegger's recording compresses the information, reducing the color depth, resolution, and audio clarity. Similarly, quantization in AI reduces the precision of the model's numerical data, cutting down on data size and computational demand while trying to maintain the model's performance (both ideas are sketched in code after this list).</p></li></ul>
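<p>A minimal sketch of both techniques follows. The int8 scheme, tensor sizes, and temperature are illustrative assumptions rather than DeepSeek's published recipe; the point is only the shape of each operation.</p><pre><code class="language-python">import numpy as np
import torch
import torch.nn.functional as F

# Quantization: the bootleg recording (symmetric int8, one scale per tensor).
w = np.random.randn(4, 4).astype(np.float32)   # full-precision "IMAX" weights
scale = np.abs(w).max() / 127.0
w_q = np.round(w / scale).astype(np.int8)      # 8-bit copy: a quarter the size
w_deq = w_q.astype(np.float32) * scale         # what inference actually sees
print("max quantization error:", float(np.abs(w - w_deq).max()))

# Distillation: the student mimics the teacher's soft outputs.
teacher_logits = torch.randn(8, 100)                      # the IMAX screen
student_logits = torch.randn(8, 100, requires_grad=True)  # the hidden camera
T = 2.0  # temperature: softened targets carry more of the "plot"
loss = F.kl_div(
    F.log_softmax(student_logits / T, dim=-1),
    F.softmax(teacher_logits / T, dim=-1),
    reduction="batchmean",
) * (T * T)
loss.backward()  # gradients pull the student toward the teacher
</code></pre>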
<p><strong>DeepSeek's "Mini-Me" Model:</strong></p><p>DeepSeek's approach to creating this "mini-me" model has several implications:</p><ul><li><p><strong>Size and Efficiency:</strong> Just as the bootleg movie can be enjoyed on a regular TV without an IMAX screen, DeepSeek's model can operate on modest hardware, significantly lowering the cost and complexity of deployment.</p></li><li><p><strong>Performance:</strong> Despite the reduction in numerical precision, these models still deliver robust performance, thanks to the knowledge distilled from their larger counterparts and strategic quantization that preserves essential model capabilities.</p></li><li><p><strong>Impact:</strong> By making AI more accessible, DeepSeek allows a broader range of companies and developers to leverage advanced language models, transforming AI application development and democratizing high-quality AI tools.</p></li></ul><p><strong>Conclusion:</strong></p><p>DeepSeek's innovative application of quantization and distillation has positioned it as a disruptive force in the AI domain. By creating a "mini-me" model, it has brought the essence of high-performance AI to environments where computational resources are limited, much as a bootlegger brings an IMAX experience to the home viewer. This not only reduces the financial and technical barriers to AI adoption but also encourages broader exploration of AI's potential, potentially setting new standards for model efficiency in the industry.</p>
]]></content:encoded></item><item><title><![CDATA[DeepSeek's AI Dojo: Harnessing the Power of Reinforcement Learning]]></title><description><![CDATA[DeepSeek's AI model uses a transformer architecture similar to ChatGPT's and Grok's, but DeepSeek employs training techniques (such as RL) that enable it to cost-effectively outperform the competition.]]></description><link>https://www.grokmountain.com/p/deepseeks-ai-dojo-harnessing-the</link><guid isPermaLink="false">https://www.grokmountain.com/p/deepseeks-ai-dojo-harnessing-the</guid><dc:creator><![CDATA[Grok Mountain]]></dc:creator><pubDate>Wed, 29 Jan 2025 01:21:26 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!aDeg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d367782-bb0e-4f99-a76f-4a863ab6c7fe_1024x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The AI world has been buzzing about DeepSeek&#8217;s latest model, which has dazzled with its efficiency and remarkable performance. A key part of this success story lies in the strategic use of Reinforcement Learning (RL) as a fine-tuning technique. Let's explore this through a reimagined version of "The Karate Kid."</p><p><strong>DeepSeek's Training Regimen:</strong></p><ul><li><p><strong>The Groundwork</strong>: DeepSeek starts by "reading" the equivalent of all the world's martial arts texts. This represents the pre-training phase, where the model absorbs a vast corpus of text, learning the breadth and depth of language from the internet, books, and beyond. This foundational knowledge is akin to Daniel LaRusso reading every book on martial arts in existence, making him theoretically the most knowledgeable fighter on paper.</p></li><li><p><strong>Fine-Tuning with RL</strong>: After this extensive pre-training, DeepSeek employs RL to refine the model's capabilities. This phase is not about learning new concepts from scratch but about optimizing and adapting what's already known:</p><ul><li><p><strong>Efficiency in Action</strong>: DeepSeek uses RL to make targeted adjustments that enhance reasoning, accuracy, and alignment with human preferences, much like cleaning up a fighter's technique for real-world application.</p></li><li><p><strong>Techniques in DeepSeek</strong>:</p><ul><li><p><strong>Group Relative Policy Optimization (GRPO)</strong>: Like Mr. Miyagi's personalized training methods, GRPO helps optimize the model's policy for better reasoning outcomes.</p></li><li><p><strong>Chain of Thought (CoT)</strong>: This technique encourages the model to think step by step, like Mr. Miyagi teaching Daniel the sequence of movements in a fight.</p></li></ul></li></ul></li></ul>
<div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!aDeg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d367782-bb0e-4f99-a76f-4a863ab6c7fe_1024x768.jpeg" width="1024" height="768" alt=""></figure></div>
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Reimagining The Karate Kid for RL:</strong></p><ul><li><p><strong>Daniel, The Ultimate Scholar</strong>: Daniel doesn't just skim a single book but engulfs all knowledge on martial arts. He's the epitome of book learning, understanding every aspect from every angle. This is the LLM's pre-training, where it learns all possible language patterns.</p></li><li><p><strong>Mr. Miyagi - The RL Coach</strong>: When Daniel meets Mr. Miyagi, the real learning begins. Mr. Miyagi, as RL, doesn't teach Daniel new martial arts but refines what he knows:</p><ul><li><p><strong>Wax On, Wax Off</strong>: These repetitive tasks are RL's way of providing feedback. Mr. Miyagi praises Daniel for correct moves (reward) and critiques or shows him again when he errs (penalty). This feedback loop is essential for fine-tuning:</p><ul><li><p><strong>Immediate Feedback</strong>: During training, Mr. Miyagi can instantly correct Daniel, much like RL can tweak model weights based on immediate output quality.</p></li><li><p><strong>Real-World Application</strong>: The feedback is grounded in practical scenarios, ensuring Daniel's (or the model's) knowledge translates into effective action.</p></li></ul></li></ul></li><li><p><strong>Tournaments - Inference</strong>: In competitions, there's no coaching. Mr. Miyagi observes Daniel's performance, noting what works and what doesn't. This is like when the DeepSeek model is in use, providing answers or generating text without intervention.</p><ul><li><p><strong>Post-Tournament Feedback</strong>: After the event, Mr. Miyagi uses this observation to tweak Daniel's training, similar to how RL fine-tuning might occur after collecting feedback on the model's performance during real-world use.</p></li></ul></li></ul><p><strong>Conclusion:</strong> DeepSeek's use of RL for fine-tuning mirrors Mr. Miyagi's teachings in our reimagined "Karate Kid." It's not about imparting new knowledge but refining and adapting what's already learned for optimal performance. The model, like Daniel, benefits from this practical feedback to become not just knowledgeable but effective in real-world scenarios.</p><p><strong>Final Thoughts:</strong> Just as Daniel learned to apply his book knowledge under Mr. 
<p><strong>Conclusion:</strong> DeepSeek's use of RL for fine-tuning mirrors Mr. Miyagi's teachings in our reimagined "Karate Kid." It's not about imparting new knowledge but about refining and adapting what's already learned for optimal performance. The model, like Daniel, benefits from this practical feedback, becoming not just knowledgeable but effective in real-world scenarios.</p><p><strong>Final Thoughts:</strong> Just as Daniel learned to apply his book knowledge under Mr. Miyagi's guidance, DeepSeek's approach with RL ensures its AI model can translate vast theoretical understanding into practical, efficient, and high-performing applications. It's an elegant dance between absorbing the world's knowledge and mastering its application.</p>]]></content:encoded></item></channel></rss>