Controlling Costs in the SHIRE: Tips and Tricks
Please keep the following suggestions and recommendations in mind to limit your SHIRE costs to only what is necessary to complete your project.
Use our budgeting tool to set cost ceilings per month. You can use the budgeting service in your workspace to set a budget ceiling for a given time period of SHIRE use. As you approach or meet this ceiling, you will receive alerts from the system to enable you to adapt accordingly. The SHIRE Cost Estimator and the SHIRE Cost Estimator Guide are available to help you develop a budget for your project.
Review your weekly SHIRE cost reports for any unexpected charges. Study PIs will receive weekly emails with their project’s SHIRE costs over the last 7 days. Make sure to open and review these emails each week so you can catch unusual spending patterns early.
Don’t create virtual machines (VMs) if you’re not going to use them, and don’t create more than one per user. Certain charges accrue as soon as you create a Windows or Linux virtual machine, for example the fixed charge for the disk space available on that VM. Only users who are actually going to log in to their VM and use it should create one. There is no reason to have more than one VM per user. If anyone on your study team has more than one, they can Stop and Disable the unused ones in the SHIRE interface.
Create the smallest virtual machine that will meet your needs. When you create a new VM, you have small, medium, and large options, with each costing progressively more per hour. Most users will be fine with the medium option (4 CPU/16 GB RAM), which is comparable to a laptop in terms of computing power. Users who need more processing power can use the large option (8 CPU/32 GB RAM), which is comparable to a desktop tower. Users who need high-performance compute will generally use Databricks within the SHIRE. Because that extra compute power comes from Databricks and not the VM itself, Databricks users can use a small or medium VM.
Use Databricks cautiously. We want to encourage SHIRE users to use and explore Databricks if their analysis requires larger compute. However, if you haven’t used Databricks before, there is a learning curve that you should be prepared for. It is possible to rack up costs that you were not anticipating if you do not follow the training materials carefully. Common unintentional mistakes include creating a compute cluster that is much larger than you need, forgetting to set the automatic timeout to a low number (e.g. 30 minutes), and using one of the large language model endpoints without considering cost. We encourage users to review our Databricks introductory material, as well as the documentation that Databricks provides, to gain confidence with the tool and its capabilities.
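As an illustration of the cluster settings mentioned above, the following is a minimal sketch of a pre-flight check you could run on a cluster definition before creating it. The field names (num_workers, autoscale, autotermination_minutes) follow the Databricks Clusters API, but the cluster name, the worker ceiling, and the helper function check_cluster_config are hypothetical and are not part of the SHIRE tooling; adjust the limits to your own project.

```python
# Hypothetical cluster definition. Field names follow the Databricks
# Clusters API; all values are placeholders, not SHIRE recommendations.
cluster_config = {
    "cluster_name": "example-analysis",  # hypothetical name
    "num_workers": 2,
    "autotermination_minutes": 30,       # the example timeout from the text
}

MAX_WORKERS = 4        # assumed ceiling; pick one that fits your budget
MAX_IDLE_MINUTES = 30  # "a low number (e.g. 30 minutes)"


def check_cluster_config(config, max_workers=MAX_WORKERS,
                         max_idle_minutes=MAX_IDLE_MINUTES):
    """Return a list of cost warnings for a cluster definition."""
    warnings = []

    # In the Clusters API, a missing or zero value means the cluster
    # is never terminated automatically.
    idle = config.get("autotermination_minutes", 0)
    if idle <= 0:
        warnings.append("automatic timeout is not set")
    elif idle > max_idle_minutes:
        warnings.append(f"automatic timeout is {idle} min, above {max_idle_minutes}")

    # A cluster is either fixed-size (num_workers) or autoscaling
    # (autoscale.max_workers); check whichever upper bound applies.
    if "autoscale" in config:
        workers = config["autoscale"].get("max_workers", 0)
    else:
        workers = config.get("num_workers", 0)
    if workers > max_workers:
        warnings.append(f"cluster may use {workers} workers, above {max_workers}")

    return warnings


for message in check_cluster_config(cluster_config):
    print("WARNING:", message)
```

An empty result only means the definition passes these two checks. It says nothing about other cost drivers such as the node type or the use of model endpoints.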
Use Large Language Models (LLMs) cautiously. We are delighted to be able to offer access to LLMs for use with clinical data, but be aware that they can be very expensive to use. Our cost estimator tool can help you plan for these costs, but you can also control costs on your own by running LLMs on smaller subsets of your data until you are ready to do a full run.
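One way to apply the subset approach is sketched below: draw a small, reproducible random sample, run the LLM on that sample only, and scale the observed cost up to the full dataset. The function names, the 5% fraction, and the cost figure are hypothetical, and the linear extrapolation assumes the sampled records are representative in length; the SHIRE Cost Estimator remains the reference for actual pricing.

```python
import random


def sample_records(records, fraction, seed=0):
    """Return a reproducible random subset of a list of records."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not records:
        return []
    k = max(1, round(len(records) * fraction))
    rng = random.Random(seed)  # fixed seed so the pilot run is repeatable
    return rng.sample(records, k)


def estimate_full_run_cost(pilot_cost, pilot_size, full_size):
    """Linearly scale the cost observed on the pilot subset to the full run."""
    if pilot_size <= 0:
        raise ValueError("pilot_size must be positive")
    return pilot_cost * full_size / pilot_size


# Stand-in for your real records (for example, clinical notes).
records = [f"note-{i}" for i in range(1000)]
pilot = sample_records(records, fraction=0.05)

# pilot_cost is a made-up figure; in practice, read it from your cost report
# after running the LLM on the pilot subset.
pilot_cost = 2.00
estimate = estimate_full_run_cost(pilot_cost, len(pilot), len(records))
print(f"Pilot size: {len(pilot)}, estimated full-run cost: {estimate:.2f}")
```

If the estimate exceeds your budget ceiling, you can shrink the dataset, shorten the prompts, or revisit the plan before committing to the full run.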