Machine learning costs: factors to consider
But before getting down to numbers, let’s quickly highlight the factors determining the final cost of a machine learning solution.
1. The complexity of the solution you plan to build
Machine learning solves problems of very different complexity. Social media engines that suggest friends and content, smart surveillance cameras that recognize faces in video footage, and healthcare expert systems that predict heart failure all run on machine learning. However, their complexity, performance, responsiveness, and compliance requirements, and hence their costs, vary a lot.
2. The approach to training an ML model
Training a machine learning model can take one of three approaches: supervised learning, unsupervised learning, or reinforcement learning.
The choice among these methods significantly influences the costs associated with machine learning development. Let’s investigate how.
-
Supervised learning. This method involves training algorithms on datasets, with each example labeled with the correct output. It teaches the algorithm how to accurately classify data or predict outcomes using these examples. While supervised learning appears to require fewer computing resources than other methods, it is important to consider the potentially high costs of acquiring or creating a well-labeled dataset.
-
Unsupervised learning. In unsupervised learning, algorithms analyze and identify patterns in data without any prior labeling. This method necessitates human intervention, not only for validating the results but also for data preprocessing and pattern analysis. Although it requires significant computational power to handle large amounts of unclassified data, unsupervised learning can reveal insights that were not obvious or possible to define in advance.
-
Reinforcement learning. This method involves an agent that learns to make decisions by performing actions in a given environment and receiving feedback in the form of rewards or penalties. In contrast to supervised learning, which is based on a predefined dataset, reinforcement learning requires the model to interact dynamically with its environment, learning from the consequences of each action. This can be computationally intensive and may necessitate advanced infrastructure, particularly in complex environments.
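The three approaches differ mainly in what the algorithm gets to see. The toy sketch below is our own plain-Python illustration with made-up data, not a production technique. A nearest-centroid classifier needs labeled examples. A 1-D k-means finds groups without any labels. An epsilon-greedy agent (a simple multi-armed bandit, one of the most basic reinforcement learning settings) learns only from the rewards its own actions earn.

```python
import random


def fit_centroids(samples, labels):
    """Supervised: average the labeled 1-D samples of each class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict_label(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda y: abs(centroids[y] - x))


def kmeans_1d(samples, iterations=10):
    """Unsupervised: split unlabeled 1-D samples into two groups (k=2).

    Starting from the minimum and maximum keeps both groups non-empty
    as long as the samples are not all identical.
    """
    c1, c2 = min(samples), max(samples)
    for _ in range(iterations):
        g1 = [x for x in samples if abs(x - c1) <= abs(x - c2)]
        g2 = [x for x in samples if abs(x - c1) > abs(x - c2)]
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
    return c1, c2


def epsilon_greedy(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Reinforcement: learn action values purely from reward feedback."""
    rng = random.Random(seed)
    counts = [0] * len(reward_probs)
    values = [0.0] * len(reward_probs)
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(reward_probs))  # explore
        else:
            action = max(range(len(values)), key=values.__getitem__)  # exploit
        # The environment (hidden from the agent) pays 1 with a fixed probability.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]  # running mean
    return values


centroids = fit_centroids([1.0, 1.2, 5.0, 5.4], ["low", "low", "high", "high"])
print(predict_label(centroids, 4.0))  # → high
print(kmeans_1d([1.0, 1.2, 0.8, 5.0, 5.4, 4.6]))
print(epsilon_greedy([0.2, 0.8]))
```

Only the first function consumes labels. The k-means and bandit code never see a correct answer, which is exactly why they trade labeling cost for computation and interaction.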
Let us consider the cost implications of selecting a specific approach to ML model training.
Using supervised learning may require less computational power, making it appear cost-effective. However, the costs associated with creating or obtaining a labeled dataset can be significant.
Unsupervised and reinforcement learning, while computationally more demanding, do not require labeled data, which can lead to ML cost savings in situations where unlabeled data is abundant but labeling is impractical or prohibitively expensive.
Organizations looking to minimize ML model training costs might consider using foundation models like the GPT series from OpenAI. This approach works especially well if you’re building a generative AI solution.
Foundation models are pre-trained on large datasets and can be customized for specific tasks. This approach significantly reduces the need for large-scale data collection and computational resources, making it a more cost-effective alternative to training a model from scratch.
3. The availability and quality of training data
No matter the approach to machine learning, you will need enough data to train the algorithms on. Machine learning costs thus include the price of acquiring, preparing, and—in the case of supervised learning—annotating training data.
If you have enough training data on hand, you’re lucky. However, that is rarely the case: research suggests that around 96% of enterprises do not initially have enough training data. For reference, a study by Dimensional Research shows that, on average, ML projects need around 100,000 data samples to perform well.
You can source additional data through crowdsourcing, generate it synthetically, or augment the data you already have. Collecting 100,000 data points via Amazon Mechanical Turk, for example, can cost around $70,000.
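Augmentation can be as simple as adding small random perturbations to existing samples. The sketch below is a minimal illustration that assumes numeric (tabular) feature vectors. For images, flips and crops play a similar role. The noise level and number of copies are illustrative assumptions and would need to be tuned so that the noise does not change what a sample means.

```python
import random


def augment_with_jitter(samples, copies=3, noise_std=0.05, seed=42):
    """Return the original samples plus `copies` noisy variants of each.

    Each sample is a (features, label) pair with numeric features. The label is
    copied unchanged, so noise_std must be small enough not to alter what the
    sample represents.
    """
    rng = random.Random(seed)
    augmented = list(samples)
    for features, label in samples:
        for _ in range(copies):
            noisy = [x + rng.gauss(0.0, noise_std) for x in features]
            augmented.append((noisy, label))
    return augmented


data = [([0.50, 1.20], "ok"), ([0.90, 0.30], "fault")]
augmented = augment_with_jitter(data)
print(len(augmented))  # → 8
```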
Once you have enough data on hand, you need to make sure it’s of high quality. The study referenced above suggests that 66% of companies run into errors and bias in their training data sets. Removing those can take 80 to 160 hours for a 100,000-sample data set.
If you opt for supervised learning (which is often the case for commercial ML solutions), you also need to add the price of data annotation to the total machine learning cost. Depending on the complexity of labeling, it can take 300 to 850 hours to get 100,000 data samples labeled.
All in all, a solid, high-quality training data set can cost anywhere from $25,000 to $65,000, depending on the nature of your data, the complexity of annotation, and the composition and location of your ML team.
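The cleaning and labeling effort quoted above roughly reproduces this range if you assume a blended rate of $65 per hour. That rate is our assumption, chosen from the Eastern European band in the rate table further down. The arithmetic is simple enough to sketch:

```python
def dataset_prep_cost(cleaning_hours, labeling_hours, hourly_rate):
    """Return a (low, high) USD cost for cleaning plus labeling a data set.

    cleaning_hours and labeling_hours are (low, high) effort ranges.
    """
    low = (cleaning_hours[0] + labeling_hours[0]) * hourly_rate
    high = (cleaning_hours[1] + labeling_hours[1]) * hourly_rate
    return low, high


# Effort ranges quoted above for a 100,000-sample data set.
CLEANING_HOURS = (80, 160)
LABELING_HOURS = (300, 850)

# The $65/hour blended rate is an assumption, not a quoted figure.
print(dataset_prep_cost(CLEANING_HOURS, LABELING_HOURS, hourly_rate=65))
# → (24700, 65650)
```

Swap in your own team’s rate to adjust the estimate. Note that this covers preparation effort only, not the cost of acquiring data in the first place.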
4. The complexity and length of the exploratory stage
During an exploratory phase, you carry out a feasibility study, search for an optimal algorithm, and run experiments to confirm the chosen approach.
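Experiments at this stage usually come down to comparing candidate approaches on held-out data. Below is a toy k-fold cross-validation sketch in plain Python. The two candidates (a majority-class baseline and a simple threshold rule) and the data are illustrative assumptions, not a recommended model choice.

```python
import statistics


def k_fold_accuracy(train_fn, predict_fn, xs, ys, k=4):
    """Average accuracy over k folds (assumes len(xs) is divisible by k)."""
    fold = len(xs) // k
    scores = []
    for i in range(k):
        test_idx = set(range(i * fold, (i + 1) * fold))
        train_x = [x for j, x in enumerate(xs) if j not in test_idx]
        train_y = [y for j, y in enumerate(ys) if j not in test_idx]
        model = train_fn(train_x, train_y)
        hits = sum(predict_fn(model, xs[j]) == ys[j] for j in test_idx)
        scores.append(hits / fold)
    return statistics.mean(scores)


# Candidate 1: a baseline that always predicts the most common training label.
def train_majority(xs, ys):
    return max(set(ys), key=ys.count)


def predict_majority(model, x):
    return model


# Candidate 2: a threshold halfway between the class means
# (assumes labels 0 and 1, with class 1 having the larger mean).
def train_threshold(xs, ys):
    mean0 = statistics.mean(x for x, y in zip(xs, ys) if y == 0)
    mean1 = statistics.mean(x for x, y in zip(xs, ys) if y == 1)
    return (mean0 + mean1) / 2


def predict_threshold(model, x):
    return 1 if x > model else 0


xs = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.15, 0.85]
ys = [0, 1, 0, 1, 0, 1, 0, 1]
print(k_fold_accuracy(train_majority, predict_majority, xs, ys))    # → 0.5
print(k_fold_accuracy(train_threshold, predict_threshold, xs, ys))  # → 1.0
```

Always including a trivial baseline makes it obvious whether a more expensive candidate actually earns its keep.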
The cost of exploration depends on the complexity of the business problem, the expected time to market, and, subsequently, team composition.
As a rule, a team of a business analyst, a data engineer, an ML engineer, and, optionally, a project manager is enough to carry out the task. In that case, you can expect the exploratory stage to cost $39,000 to $51,000.
5. The cost of production
Machine learning costs include the cost of production, too. Production costs involve the costs of the needed infrastructure (including cloud computing and data storage), integration costs (including designing a data pipeline and developing APIs), and maintenance costs.
-
Cloud resources
The price of the cloud infrastructure depends on the complexity of the models being trained. If you are building a simpler solution that relies on low-dimensional data, you may get by with four virtual CPUs running on one to three nodes. This may cost around $150 to $300 a month, or $1,800 to $3,600 a year.
If the solution you plan to build requires low latency and relies on complex deep learning algorithms, expect a minimum monthly cost of $10,000 to be added to the total ML price.
-
Integrations
Developing integrations involves designing and developing the data pipeline and the needed APIs. Putting together a data pipeline takes up around 80 development hours. Putting two to three API endpoints in place and documenting them to be used by the rest of the system requires another 20 to 30 hours, the cost of which should be added to the final machine learning cost estimates.
-
Support and maintenance
Machine learning models need ongoing support during their entire life cycle: incoming data must be cleansed and annotated, and models must be retrained, tested, and deployed.
According to the study conducted by Dimensional Research, businesses commit 25% to 75% of the initial resources to maintaining ML algorithms.
Assuming that the initial solution architecture and data pipelines are well designed and part of the recurring tasks is automated, the support cost can range from $20,000 to $150,000 per year based on the selected support model.
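Putting the production figures above together for a simpler solution gives a rough first-year budget. The sketch assumes a $75 hourly rate for integration work and treats integration as a one-off expense. Both are our assumptions, not quoted figures.

```python
def first_year_production_cost(cloud_per_month, integration_hours,
                               hourly_rate, support_per_year):
    """Return a (low, high) first-year estimate in USD.

    Every argument except hourly_rate is a (low, high) range.
    """
    return tuple(
        cloud_per_month[i] * 12                # cloud resources
        + integration_hours[i] * hourly_rate   # data pipeline + APIs, one-off
        + support_per_year[i]                  # support and maintenance
        for i in (0, 1)
    )


# Figures quoted above for a simpler solution; the $75 rate is an assumption.
estimate = first_year_production_cost(
    cloud_per_month=(150, 300),
    integration_hours=(80 + 20, 80 + 30),  # pipeline hours + API hours
    hourly_rate=75,
    support_per_year=(20_000, 150_000),
)
print(estimate)  # → (29300, 161850)
```

With these inputs, the choice of support model dominates the range. This is one reason to automate recurring maintenance tasks early.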
6. The cost of consulting
If you’re just dipping your toes in the machine learning waters, you won’t get far without an experienced ML consultant.
The two main factors that determine the cost of ML consulting are:
-
Consultant’s experience. Make experience a critical factor in your hiring decision: you want to partner with someone who has deep expertise in a field you may not be familiar with.
-
Project scope. The more complicated the project, the more consulting involvement it will require. If the project scope is not yet defined, look for a consultant who can run a discovery phase for you and offer a compelling proposal with all the necessary estimates.
Our ML consulting rates usually start from $75 per hour and depend on the seniority level of a specialist.
7. Opportunity costs
Opportunity cost is the value of the benefits you forgo by not taking an alternative route. To put things into perspective, think of Blockbuster, a former leader in the movie rental market. By forgoing innovation, the company lost out to a newly emerged leader, Netflix. The opportunity cost amounted to $6 billion and, eventually, bankruptcy.
The same idea goes for machine learning initiatives. Enterprises lagging in ML adoption can’t tap into the predictive insights and informed decision-making that come with it.
On the other hand, implementing machine learning just for the sake of innovation, say, to solve problems that are better handled by rule-based solutions, is a loss as well.
Therefore, you may want to consider the cost-to-benefit ratio and carefully weigh implementation risks before introducing AI in business. This is where expert AI consulting services may help.
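One simple way to weigh cost against benefit is to compute return on investment and payback period. The sketch below uses hypothetical figures. It deliberately ignores discounting and ongoing costs, both of which a real business case would need to model.

```python
def cost_benefit(total_cost, annual_benefit, years=3):
    """Return (ROI over the horizon, payback period in years).

    Simplifying assumptions: benefits are flat from year one and are not
    discounted, and all costs are paid upfront.
    """
    roi = (annual_benefit * years - total_cost) / total_cost
    payback_years = total_cost / annual_benefit
    return roi, payback_years


# Hypothetical figures, for illustration only.
roi, payback = cost_benefit(total_cost=100_000, annual_benefit=60_000)
print(f"ROI over 3 years: {roi:.0%}, payback: {payback:.1f} years")
# → ROI over 3 years: 80%, payback: 1.7 years
```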
So, how much does ML cost? These estimates from ITRex’s portfolio might give you an idea
Now that you are familiar with the factors affecting the total ML price, let’s look at some examples from ITRex’s portfolio to help you better understand the costs involved.
Note that we provide effort estimates, too, because the price of developing an ML solution depends greatly on the composition and location of your ML development team. You can estimate the total cost of developing a similar ML solution using the following ML engineer hourly rates:
Location | Average hourly rate |
---|---|
United States | $130 |
Central Europe | $75-$85 |
Eastern Europe | $65-$75 |
Asia | $30 |
Latin America | $20 |
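To turn an effort estimate into a budget, multiply the hours by the rate for your team's location. Below is a minimal sketch using the rates from the table above. It assumes the whole team sits in one location; a mixed team would need a blended rate.

```python
# Rates from the table above as (low, high) USD per hour.
HOURLY_RATES = {
    "United States": (130, 130),
    "Central Europe": (75, 85),
    "Eastern Europe": (65, 75),
    "Asia": (30, 30),
    "Latin America": (20, 20),
}


def ml_budget(hours, location):
    """Return a (low, high) USD budget for an effort given as (low, high) hours."""
    rate_low, rate_high = HOURLY_RATES[location]
    return hours[0] * rate_low, hours[1] * rate_high


print(ml_budget((350, 350), "Central Europe"))  # → (26250, 29750)
print(ml_budget((640, 700), "Asia"))            # → (19200, 21000)
```

At Central European rates, a 350-hour effort such as the emotion recognition project below comes to roughly $26,000-$30,000.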
Please be aware that the estimated budgets provided below apply exclusively to the development of the machine learning component within these solutions. It’s essential to consider additional expenses, such as infrastructure, productization, and other associated costs, as machine learning operates in conjunction with various elements within the wider solution.
Project 1. Emotion recognition solution
A multinational media and entertainment company wanted to analyze footage from their surveillance cameras to recognize people’s emotions. The task was complicated by degraded visual conditions, such as the quality of the footage itself, as well as people wearing face masks, glasses, and other items that made recognition difficult.
The company was looking for a trusted media and entertainment software vendor to conduct extensive research and lay the groundwork for future development. A team of two ITRex ML engineers tested three neural networks, selected the one best suited to the task, fine-tuned it for better performance, and suggested other strategies for achieving a higher accuracy score.
ML team efforts: 350 hours
ML costs: approx. $26,000
Project 2. A fitness mirror with a personal coach inside
The customer wanted to build an innovative fitness mirror that can act like a personal coach, offering personalized training plans and guiding users through training sessions with real-time recommendations.
The ITRex team built the hardware components of the smart device and provided end-to-end software development, spanning infrastructure setup, embedded software/firmware development, and content management.
When it comes to the machine learning component of the solution, we designed and trained a deep learning model using a dataset of workout records to provide guidance for users. Additionally, we implemented computer vision algorithms for motion tracking and human pose estimation, as well as object recognition algorithms for overseeing the sports equipment used in workouts.
ML team efforts: approx. 640-700 hours
ML costs: approx. $51,000-$56,000
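For a sense of how pose estimation output can drive coaching logic, here is a sketch. It assumes some pose estimator already returns (x, y) body keypoints per frame; the model itself is out of scope. The thresholds are illustrative values for an elbow-based exercise such as a bicep curl. This is not the implementation used in the project.

```python
import math


def joint_angle(a, b, c):
    """Angle ABC in degrees; a, b, c are distinct (x, y) keypoints, b is the joint."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
    cos = max(-1.0, min(1.0, cos))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(cos))


def count_reps(angles, bent_below=70.0, straight_above=160.0):
    """Count repetitions: the joint must bend below one threshold, then
    straighten above another. Two thresholds avoid double counting jittery frames."""
    reps, bent = 0, False
    for angle in angles:
        if angle < bent_below:
            bent = True
        elif angle > straight_above and bent:
            reps += 1
            bent = False
    return reps


# Shoulder, elbow, and wrist keypoints for one frame (made-up coordinates).
print(round(joint_angle((0.0, 1.0), (0.0, 0.0), (1.0, 0.0))))  # → 90
# Per-frame elbow angles across a short clip (made-up values).
print(count_reps([170, 120, 65, 110, 165, 100, 55, 90, 172]))  # → 2
```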
Project 3. Automated document recognition solution
Our customer wanted to create a solution that would automate the process of filling out documents. The key goal of the project was to develop a standalone optical character recognition (OCR) solution that would recognize and index batches of incoming documents and integrate seamlessly into the customer’s existing document processing system.
The OCR solution we crafted helps automate the traditionally resource-intensive process of marking and indexing documents, leading to time and cost savings. By drastically reducing the manual effort typically allocated to document marking and indexing, the solution allows for handling more documents within the same timeframe. The outcome? Enhanced productivity and swift, accurate processing of critical documents.
ML team efforts: approx. 3000-4000 hours
ML costs: $225,000-$300,000
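Once an OCR engine has turned a page into text, a basic form of indexing is to classify the document and extract its key fields. The sketch below assumes OCR output is already available. The document types and regular expressions are hypothetical examples, not the customer’s actual schema. A production system would typically use a trained classifier rather than keyword matching.

```python
import re

# Hypothetical document types and the index fields to extract for each one.
FIELD_PATTERNS = {
    "invoice": {
        "number": r"Invoice\s*(?:No\.?|#)\s*([A-Z0-9-]+)",
        "date": r"Date:\s*(\d{4}-\d{2}-\d{2})",
    },
    "contract": {
        "number": r"Contract\s*(?:No\.?|#)\s*([A-Z0-9-]+)",
    },
}


def index_document(ocr_text):
    """Guess the document type from keywords, then extract its index fields."""
    lowered = ocr_text.lower()
    doc_type = next((t for t in FIELD_PATTERNS if t in lowered), "unknown")
    fields = {}
    for name, pattern in FIELD_PATTERNS.get(doc_type, {}).items():
        match = re.search(pattern, ocr_text, flags=re.IGNORECASE)
        if match:
            fields[name] = match.group(1)
    return {"type": doc_type, "fields": fields}


page = "ACME Corp\nInvoice No. INV-2041\nDate: 2023-05-14\nTotal: $120.00"
print(index_document(page))
# → {'type': 'invoice', 'fields': {'number': 'INV-2041', 'date': '2023-05-14'}}
```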
How can you reduce ML development costs—and get ROI fast?
If you are thinking about venturing into AI development and looking for ways to lower machine learning costs without putting the quality of the final product at risk, look through our field-tested recommendations.
Start small but have a bigger picture in the back of your mind
When kicking off an ML project, it often pays off to keep the initial scope smaller. By starting with a minimum viable product, you can focus your resources on a specific problem and iterate quickly. This approach helps save machine learning costs in several ways:
-
Starting small allows you to test your ideas and hypotheses with a smaller dataset and a reduced set of features. This, in turn, lets you quickly assess the feasibility and effectiveness of your ML solution—without investing significant resources upfront.
-
By keeping the scope smaller, you can pinpoint and address potential challenges or limitations in your machine learning pipeline at an early stage. This helps avoid costly rework at the later stages of development.
-
By prioritizing critical use cases and features, you allocate resources more effectively and focus on the areas that provide the fastest ROI rather than tackling the entire project at once.
Follow MLOps best practices from day one to avoid scalability issues
MLOps refers to a set of practices that enhance collaboration and automation in ML development projects. By setting up an MLOps pipeline from the start, you can mitigate potential scalability issues and reduce machine learning costs. The cost reduction is achieved via:
-
Streamlined development process: MLOps promotes standardization and automation while reducing the need for manual, error-prone operations.
-
Scalable infrastructure: MLOps focuses on building scalable infrastructures to support the entire ML development lifecycle: from data preprocessing to model deployment. This helps accommodate growing data volumes, increasing model complexity, and higher user demand without introducing significant changes to the infrastructure.
-
CI/CD: CI/CD practices ensure that changes to your ML solution are integrated, tested, and deployed automatically and reliably.
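In an ML-specific CI/CD pipeline, one common step is a quality gate that decides whether a newly trained model may replace the one in production. The sketch below is a minimal version of that idea. The metric names (`accuracy`, `latency_ms`) and thresholds are illustrative assumptions; in practice, the metrics would come from an automated evaluation step.

```python
def should_deploy(candidate, production, min_gain=0.0, max_latency_ms=200.0):
    """Promote a newly trained model only if its accuracy is at least the
    production model's plus `min_gain` and it stays within the latency budget."""
    if candidate["accuracy"] < production["accuracy"] + min_gain:
        return False
    return candidate["latency_ms"] <= max_latency_ms


production = {"accuracy": 0.90, "latency_ms": 110.0}
candidate = {"accuracy": 0.91, "latency_ms": 120.0}
print(should_deploy(candidate, production))                 # → True
print(should_deploy(candidate, production, min_gain=0.02))  # → False
```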
Use pre-trained machine learning models
Using machine learning models that have been previously trained helps reduce machine learning costs in the following ways:
-
Transfer learning: Serving as a starting point for many ML tasks, pre-trained models allow transferring the knowledge learned from a different but related task to the problem in question, which saves substantial computational resources and training time.
-
Reduced data requirements: Training ML models from scratch calls for large volumes of annotated data, which can be quite costly and time-consuming to collect and label. Pre-trained models can be fine-tuned on relatively small volumes of domain-specific data.
-
Faster prototyping and iteration: Pre-trained models allow you to quickly prototype and iterate on your ML solution.
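The mechanics of reusing a pre-trained model can be shown in plain Python. The "backbone" weights stay frozen, and only a small logistic-regression head is trained on a handful of labeled examples. This is a toy illustration under our own assumptions: the 2x2 linear "backbone" and the data are hypothetical stand-ins. With a real framework, you would load published weights and freeze their layers instead.

```python
import math

# Hypothetical stand-in for a pre-trained backbone: fixed, never updated.
FROZEN_WEIGHTS = [[1.0, 1.0], [1.0, -1.0]]


def extract_features(x):
    """Apply the frozen 'backbone' to a raw input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in FROZEN_WEIGHTS]


def train_head(samples, labels, lr=0.5, epochs=200):
    """Fit only a logistic-regression head on top of the frozen features (SGD)."""
    feats = [extract_features(x) for x in samples]
    w, b = [0.0] * len(feats[0]), 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            logit = sum(wi * fi for wi, fi in zip(w, f)) + b
            p = 1.0 / (1.0 + math.exp(-logit))
            err = p - y  # gradient of the log loss with respect to the logit
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
            b -= lr * err
    return w, b


def predict(head, x):
    w, b = head
    logit = sum(wi * fi for wi, fi in zip(w, extract_features(x))) + b
    return 1 if logit > 0 else 0


xs = [(1.0, 1.0), (2.0, 0.5), (-1.0, -1.0), (-0.5, -2.0)]
ys = [1, 1, 0, 0]
head = train_head(xs, ys)
print([predict(head, x) for x in xs])  # → [1, 1, 0, 0]
print(predict(head, (2.0, 2.0)))       # → 1
```

Because only the head's three parameters are learned, four labeled examples are enough here. This mirrors, in miniature, why fine-tuning needs far less data and compute than training from scratch.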