{"id":2505,"date":"2025-11-05T12:58:25","date_gmt":"2025-11-05T17:58:25","guid":{"rendered":"https:\/\/www.med.unc.edu\/shire\/?page_id=2505"},"modified":"2025-11-13T09:58:20","modified_gmt":"2025-11-13T14:58:20","slug":"cost-scenarios-for-large-language-models-in-the-shire","status":"publish","type":"page","link":"https:\/\/www.med.unc.edu\/shire\/get-help\/cost-scenarios-for-large-language-models-in-the-shire\/","title":{"rendered":"Cost Scenarios for Large Language Models in the SHIRE"},"content":{"rendered":"<p><span data-contrast=\"auto\">The following write-up describes a series of experiments run by members of the SHIRE team to estimate the cost of using various Large Language Models (LLMs) within the SHIRE. We hope this information serves as helpful guidance to study teams wanting to price out their LLM options. <\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"text-decoration: underline\"><strong><em>Bottom Line Up Front: Conclusions from Testing<\/em><\/strong><\/span><\/p>\n<ul>\n<li data-leveltext=\"\u00b7\" data-font=\"Symbol\" data-listid=\"2\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\u00b7&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" data-aria-posinset=\"1\" data-aria-level=\"1\"><b><span data-contrast=\"auto\">Creating and using provisioned throughput endpoints is generally much more expensive than using the ready-made pay-per-token endpoints.<\/span><\/b><span data-contrast=\"auto\"> Therefore, users should use the pay-per-token endpoints whenever possible. 
In addition to costing much less, these endpoints don\u2019t carry the risk of additional costs due to accidentally leaving the endpoint running when not in use.<\/span><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335559685&quot;:360,&quot;335559737&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:276,&quot;335559991&quot;:360}\">\u00a0<\/span><\/li>\n<\/ul>\n<ul>\n<li data-leveltext=\"\u00b7\" data-font=\"Symbol\" data-listid=\"2\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\u00b7&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" data-aria-posinset=\"2\" data-aria-level=\"1\"><span data-contrast=\"auto\">Using the new scale-from-zero option for applicable provisioned throughput endpoints seems to dramatically cut down their costs, but their performance becomes much less reliable. 
Thus, we would not recommend using this feature at this time.<\/span><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335559685&quot;:360,&quot;335559737&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:276,&quot;335559991&quot;:360}\">\u00a0<\/span><\/li>\n<\/ul>\n<ul>\n<li data-leveltext=\"\u00b7\" data-font=\"Symbol\" data-listid=\"2\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\u00b7&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" data-aria-posinset=\"3\" data-aria-level=\"1\"><span data-contrast=\"auto\">Even knowing the DBU rate, throughput rate, time of use, number of input tokens, etc., <\/span><b><span data-contrast=\"auto\">there are clearly still factors impacting cost for which we don\u2019t have visibility<\/span><\/b><span data-contrast=\"auto\"> (i.e., the costs don\u2019t scale in fully consistent ways across different tests), making it impossible to predict LLM use costs with more than moderate precision.\u00a0\u00a0<\/span><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335559685&quot;:360,&quot;335559737&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:276,&quot;335559991&quot;:360}\">\u00a0<\/span><\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h2><span style=\"text-decoration: underline\">Testing Models<\/span><\/h2>\n<h3><strong>A few caveats<\/strong><\/h3>\n<ul>\n<li data-leveltext=\"\u00b7\" data-font=\"Symbol\" data-listid=\"1\" 
data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\u00b7&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" data-aria-posinset=\"1\" data-aria-level=\"1\"><span data-contrast=\"auto\">This information accurately reflects the results of our testing, but <strong>will likely be very different from your research use cases<\/strong>. The best way to use the costs below will be to <strong>determine scale differences in the costs of each model<\/strong>, rather than relying on the actual dollar amount shown.<\/span><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335559685&quot;:720,&quot;335559737&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:276,&quot;335559991&quot;:360}\">\u00a0<\/span><\/li>\n<\/ul>\n<ul>\n<li data-leveltext=\"\u00b7\" data-font=\"Symbol\" data-listid=\"1\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\u00b7&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" data-aria-posinset=\"2\" data-aria-level=\"1\"><span data-contrast=\"auto\">This information is current as of <strong>October 2025<\/strong>. 
Prices will likely change over time.<\/span><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335559685&quot;:720,&quot;335559737&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:276,&quot;335559991&quot;:360}\">\u00a0<\/span><\/li>\n<\/ul>\n<ul>\n<li data-leveltext=\"\u00b7\" data-font=\"Symbol\" data-listid=\"1\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\u00b7&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" data-aria-posinset=\"3\" data-aria-level=\"1\"><span data-contrast=\"auto\">This write-up is designed with a technical audience in mind. The results will be most useful to readers who have some experience using LLM endpoints.<\/span><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335559685&quot;:720,&quot;335559737&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:276,&quot;335559991&quot;:360}\">\u00a0<\/span><\/li>\n<li data-leveltext=\"\u00b7\" data-font=\"Symbol\" data-listid=\"1\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\u00b7&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" data-aria-posinset=\"3\" data-aria-level=\"1\">Several model tests resulted in errors. 
They are included in the table below to indicate how unanticipated errors or complications during use can affect cost.<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h3><b><span data-contrast=\"auto\">General procedures for tests<\/span><\/b><span data-ccp-props=\"{}\">\u00a0<\/span><\/h3>\n<ul>\n<li data-leveltext=\"%1.\" data-font=\"Calibri\" data-listid=\"3\" data-list-defn-props=\"{&quot;335552541&quot;:0,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769242&quot;:[65533,0],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;%1.&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" data-aria-posinset=\"1\" data-aria-level=\"1\"><span data-contrast=\"auto\">Tester started a Windows VM in the SHIRE<\/span><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335559685&quot;:360,&quot;335559737&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:276,&quot;335559991&quot;:360}\">\u00a0<\/span><\/li>\n<\/ul>\n<ul>\n<li data-leveltext=\"%1.\" data-font=\"Calibri\" data-listid=\"3\" data-list-defn-props=\"{&quot;335552541&quot;:0,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769242&quot;:[65533,0],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;%1.&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" data-aria-posinset=\"2\" data-aria-level=\"1\"><span data-contrast=\"auto\">Spun up personal compute in Databricks to run the testing code notebook<\/span><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335559685&quot;:360,&quot;335559737&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:276,&quot;335559991&quot;:360}\">\u00a0<\/span><\/li>\n<\/ul>\n<ul>\n<li data-leveltext=\"%1.\" data-font=\"Calibri\" data-listid=\"3\" 
data-list-defn-props=\"{&quot;335552541&quot;:0,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769242&quot;:[65533,0],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;%1.&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" data-aria-posinset=\"3\" data-aria-level=\"1\"><span data-contrast=\"auto\">When necessary (i.e., provisioned throughput), created the model serving endpoint to be ready around the same time the personal compute was ready\u00a0<\/span><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335559685&quot;:360,&quot;335559737&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:276,&quot;335559991&quot;:360}\">\u00a0<\/span><\/li>\n<\/ul>\n<ul>\n<li data-leveltext=\"%1.\" data-font=\"Calibri\" data-listid=\"3\" data-list-defn-props=\"{&quot;335552541&quot;:0,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769242&quot;:[65533,0],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;%1.&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" data-aria-posinset=\"4\" data-aria-level=\"1\"><span data-contrast=\"auto\">Utilized the model continuously for 10 minutes via an infinite code loop OR made intermittent calls to the endpoint over a 10 minute period OR see log below<\/span><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335559685&quot;:360,&quot;335559737&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:276,&quot;335559991&quot;:360}\">\u00a0<\/span><\/li>\n<\/ul>\n<ul>\n<li data-leveltext=\"%1.\" data-font=\"Calibri\" data-listid=\"3\" data-list-defn-props=\"{&quot;335552541&quot;:0,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769242&quot;:[65533,0],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;%1.&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" 
data-aria-posinset=\"5\" data-aria-level=\"1\"><span data-contrast=\"auto\">Upon completion of testing, immediately deleted model serving endpoint if created, spun down personal compute, and stopped VM<\/span><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335559685&quot;:360,&quot;335559737&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:276,&quot;335559991&quot;:360}\">\u00a0<\/span><\/li>\n<\/ul>\n<p><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335559685&quot;:360,&quot;335559737&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:276,&quot;335559991&quot;:360}\">\u00a0<\/span><\/p>\n<table style=\"height: 385px;width: 96.3714%;border-collapse: collapse\" border=\"1\">\n<tbody>\n<tr style=\"height: 23px\">\n<td style=\"width: 13.0381%;height: 23px\"><span style=\"text-decoration: underline\">Large Language Model and Version<\/span><\/td>\n<td style=\"width: 26.4633%;height: 23px\"><span style=\"text-decoration: underline\">Max. 
Throughput Rate<\/span><\/td>\n<td style=\"width: 16.299%;height: 23px\"><span style=\"text-decoration: underline\">Length\/Action of Test<\/span><\/td>\n<td style=\"width: 16.299%;height: 23px\"><span style=\"text-decoration: underline\">Successful Test<\/span><\/td>\n<td style=\"width: 6.7414%;height: 23px\"><span style=\"text-decoration: underline\">Cost*<\/span><\/td>\n<\/tr>\n<tr style=\"height: 23px\">\n<td style=\"width: 13.0381%;height: 23px\">Llama 3.3 70B<\/td>\n<td style=\"width: 26.4633%;height: 23px\">Low (9,500 tokens\/second; 343 DBU)<\/td>\n<td style=\"width: 16.299%;height: 23px\">10 minutes continuously<\/td>\n<td style=\"width: 16.299%;height: 23px\">yes<\/td>\n<td style=\"width: 6.7414%;height: 23px\">$10.49<\/td>\n<\/tr>\n<tr style=\"height: 23px\">\n<td style=\"width: 13.0381%;height: 23px\">Llama 3.3 70B<\/td>\n<td style=\"width: 26.4633%;height: 23px\">Doubled (19,000 tokens\/second; 686 DBU)<\/td>\n<td style=\"width: 16.299%;height: 23px\">10 minutes continuously<\/td>\n<td style=\"width: 16.299%;height: 23px\">yes<\/td>\n<td style=\"width: 6.7414%;height: 23px\">$9.97<\/td>\n<\/tr>\n<tr style=\"height: 23px\">\n<td style=\"width: 13.0381%;height: 23px\">GPT OSS 20B<\/td>\n<td style=\"width: 26.4633%;height: 23px\">Low (100 model units; 107 DBU)<\/td>\n<td style=\"width: 16.299%;height: 23px\">10 minutes continuously<\/td>\n<td style=\"width: 16.299%;height: 23px\">yes<\/td>\n<td style=\"width: 6.7414%;height: 23px\">$2.43<\/td>\n<\/tr>\n<tr style=\"height: 23px\">\n<td style=\"width: 13.0381%;height: 23px\">GPT OSS 20B<\/td>\n<td style=\"width: 26.4633%;height: 23px\">Doubled (200 model units; 214 DBU)<\/td>\n<td style=\"width: 16.299%;height: 23px\">10 minutes continuously<\/td>\n<td style=\"width: 16.299%;height: 23px\">yes<\/td>\n<td style=\"width: 6.7414%;height: 23px\">$4.62<\/td>\n<\/tr>\n<tr style=\"height: 23px\">\n<td style=\"width: 13.0381%;height: 23px\">Llama 3.3 70B<\/td>\n<td style=\"width: 26.4633%;height: 23px\">Low (9,500 
tokens\/second; 343 DBU)<\/td>\n<td style=\"width: 16.299%;height: 23px\">10 minutes intermittently<\/td>\n<td style=\"width: 16.299%;height: 23px\">yes<\/td>\n<td style=\"width: 6.7414%;height: 23px\">$10.02<\/td>\n<\/tr>\n<tr style=\"height: 23px\">\n<td style=\"width: 13.0381%;height: 23px\">Llama 3.3 70B<\/td>\n<td style=\"width: 26.4633%;height: 23px\">Low (9,500 tokens\/second; 343 DBU)<\/td>\n<td style=\"width: 16.299%;height: 23px\">10 minutes continuously; 60 minutes of no use; 10 minutes continuously\u00b9<\/td>\n<td style=\"width: 16.299%;height: 23px\">yes<\/td>\n<td style=\"width: 6.7414%;height: 23px\">$26.86<\/td>\n<\/tr>\n<tr style=\"height: 23px\">\n<td style=\"width: 13.0381%;height: 23px\">GPT OSS 20B<\/td>\n<td style=\"width: 26.4633%;height: 23px\">Low (100 model units; 107 DBU)<\/td>\n<td style=\"width: 16.299%;height: 23px\">30 minutes continuously<\/td>\n<td style=\"width: 16.299%;height: 23px\">yes<\/td>\n<td style=\"width: 6.7414%;height: 23px\">$5.93<\/td>\n<\/tr>\n<tr style=\"height: 23px\">\n<td style=\"width: 13.0381%;height: 23px\">GPT OSS 20B<\/td>\n<td style=\"width: 26.4633%;height: 23px\">Unspecified<\/td>\n<td style=\"width: 16.299%;height: 23px\">10 minutes of no use; 10 minutes continuous use; 60 minutes of no use; 10 minutes continuous use<\/td>\n<td style=\"width: 16.299%;height: 23px\"><span style=\"color: #ff0000\">no\u00b2<\/span><\/td>\n<td style=\"width: 6.7414%;height: 23px\">$0.24<\/td>\n<\/tr>\n<tr style=\"height: 23px\">\n<td style=\"width: 13.0381%;height: 23px\">Llama 3.3 70B<\/td>\n<td style=\"width: 26.4633%;height: 23px\">pay per token<sup>3<\/sup><\/td>\n<td style=\"width: 16.299%;height: 23px\">10 minutes continuously<\/td>\n<td style=\"width: 16.299%;height: 23px\">yes<\/td>\n<td style=\"width: 6.7414%;height: 23px\">$0.17<\/td>\n<\/tr>\n<tr style=\"height: 23px\">\n<td style=\"width: 13.0381%;height: 23px\">Llama 3.3 70B<\/td>\n<td style=\"width: 26.4633%;height: 23px\">pay per token<sup>3<\/sup><\/td>\n<td 
style=\"width: 16.299%;height: 23px\">10 minutes intermittently<\/td>\n<td style=\"width: 16.299%;height: 23px\">yes<\/td>\n<td style=\"width: 6.7414%;height: 23px\">$0.10<\/td>\n<\/tr>\n<tr style=\"height: 23px\">\n<td style=\"width: 13.0381%;height: 23px\">Claude Sonnet 4.5<\/td>\n<td style=\"width: 26.4633%;height: 23px\">pay per token<sup>4<\/sup><\/td>\n<td style=\"width: 16.299%;height: 23px\">10 minutes continuously<\/td>\n<td style=\"width: 16.299%;height: 23px\">yes<\/td>\n<td style=\"width: 6.7414%;height: 23px\">$0.67<\/td>\n<\/tr>\n<tr style=\"height: 23px\">\n<td style=\"width: 13.0381%;height: 23px\">Claude Sonnet 4.5<\/td>\n<td style=\"width: 26.4633%;height: 23px\">pay per token<sup>4<\/sup><\/td>\n<td style=\"width: 16.299%;height: 23px\">10 minutes intermittently<\/td>\n<td style=\"width: 16.299%;height: 23px\">yes<\/td>\n<td style=\"width: 6.7414%;height: 23px\">$0.16<\/td>\n<\/tr>\n<tr style=\"height: 43px\">\n<td style=\"width: 13.0381%;height: 43px\">GPT OSS 20B<\/td>\n<td style=\"width: 26.4633%;height: 43px\">pay per token<sup>5<\/sup><\/td>\n<td style=\"width: 16.299%;height: 43px\">Classify 100 patients from notes<\/td>\n<td style=\"width: 16.299%;height: 43px\">yes<\/td>\n<td style=\"width: 6.7414%;height: 43px\">$0.07<\/td>\n<\/tr>\n<tr style=\"height: 43px\">\n<td style=\"width: 13.0381%;height: 43px\">GPT OSS 20B<\/td>\n<td style=\"width: 26.4633%;height: 43px\">Low (100 model units, 107 DBU)<\/td>\n<td style=\"width: 16.299%;height: 43px\">Classify 100 patients from notes<\/td>\n<td style=\"width: 16.299%;height: 43px\">yes<\/td>\n<td style=\"width: 6.7414%;height: 43px\">$1.07<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>*<span data-olk-copy-source=\"MessageBody\">This is the specific <strong>cost of running the LLM<\/strong>. 
It <strong>does not<\/strong> include the cost of running the VM or the compute for the code notebook.<\/span><\/p>\n<p><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335559685&quot;:360,&quot;335559737&quot;:0,&quot;335559738&quot;:0,&quot;335559739&quot;:0,&quot;335559740&quot;:276,&quot;335559991&quot;:360}\">\u00b9This tested a \u201cScale to Zero\u201d option.<\/span><\/p>\n<p>\u00b2This tested a &#8220;Scale from Zero&#8221; option, which does not specify a throughput rate. When the tester tried to initiate 10 minutes of continuous use via the infinite code loop, an error appeared: &#8220;the workload exceeded the model unit rate limit for the endpoint, please try again later.&#8221; A moment later, a second attempt succeeded, but the same error reappeared after around 8 minutes of running the loop. The tester spun down the personal compute, waited 60 minutes, spun the compute back up, and tried to run the loop for another 10 minutes. The first attempt again returned the same error, a second attempt succeeded, and the error reappeared around 7 minutes in. The tester then stopped and shut everything down as usual.<\/p>\n<p><sup>3<\/sup>Pay-per-token Llama 3.3 70B cost 7.143 DBU per 1 million input tokens and 21.429 DBU per 1 million output tokens. As an illustrative calculation, a job with 2 million input tokens and 0.5 million output tokens would consume roughly 7.143 \u00d7 2 + 21.429 \u00d7 0.5 \u2248 25 DBU.<\/p>\n<p><sup>4<\/sup>Pay-per-token Claude Sonnet 4.5 cost 47.143 DBU per 1 million input tokens and 235.715 DBU per 1 million output tokens.<\/p>\n<p><sup>5<\/sup>Pay-per-token GPT OSS 20B cost 1 DBU per 1 million input tokens and 4.286 DBU per 1 million output tokens.<span class=\"EOP SCXW261692298 BCX0\" 
data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335559738&quot;:0,&quot;335559739&quot;:0}\">\u00a0<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The following write-up describes a series of experiments run by members of the SHIRE team to estimate the cost of using various Large Language Models (LLMs) within the SHIRE. We hope this information serves as helpful guidance to study teams wanting to price out their LLM options. &nbsp; Bottom Line Up Front: Conclusions from Testing &hellip; <a href=\"https:\/\/www.med.unc.edu\/shire\/get-help\/cost-scenarios-for-large-language-models-in-the-shire\/\" aria-label=\"Read more about Cost Scenarios for Large Language Models in the SHIRE\">Read more<\/a><\/p>\n","protected":false},"author":55643,"featured_media":0,"parent":2307,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_acf_changed":false,"layout":"","cellInformation":"","apiCallInformation":"","footnotes":"","_links_to":"","_links_to_target":""},"class_list":["post-2505","page","type-page","status-publish","hentry","odd"],"acf":[],"_links_to":[],"_links_to_target":[],"_links":{"self":[{"href":"https:\/\/www.med.unc.edu\/shire\/wp-json\/wp\/v2\/pages\/2505","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.med.unc.edu\/shire\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.med.unc.edu\/shire\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.med.unc.edu\/shire\/wp-json\/wp\/v2\/users\/55643"}],"replies":[{"embeddable":true,"href":"https:\/\/www.med.unc.edu\/shire\/wp-json\/wp\/v2\/comments?post=2505"}],"version-history":[{"count":4,"href":"https:\/\/www.med.unc.edu\/shire\/wp-json\/wp\/v2\/pages\/2505\/revisions"}],"predecessor-version":[{"id":2511,"href":"https:\/\/www.med.unc.edu\/shire\/wp-json\/wp\/v2\/pages\/2505\/revisions\/2511"}],"up":[{"embeddable":true,"href":"https:\/\/www.med.unc.edu\/shire\/wp-json\/wp\/v2\/pag
es\/2307"}],"wp:attachment":[{"href":"https:\/\/www.med.unc.edu\/shire\/wp-json\/wp\/v2\/media?parent=2505"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}