What Is up with CAE in the Cloud?
I’m writing this article from my living room, because... well, you know the reason. COVID-19 has changed a lot. One impacted aspect is how and where we work. When employees were forced to work from home, organizations struggled to build the necessary infrastructure for this new reality. The situation exposed how unprepared our economy is for the coming challenges. Even before the pandemic, traditional manufacturing companies were under enormous pressure to update their product portfolios and find new business models, threatened by socio-economic earthquakes like digitalization and the climate crisis. They must now rapidly transform into “agile” and “digital” operations. During the pandemic, it became apparent that these colorful words are too often still merely marketing slogans and have little to do with the everyday life of an engineer.
One technology is central to the necessary transformations of the industry: the cloud. The cloud enables new services, business models, and previously unknown development speeds by making data and resources universally accessible. It is especially powerful in engineering, where the success of new product designs often hinges on the accessibility of the best hardware and software.
This article covers the relationship between the cloud and “CAE”. Computer-Aided Engineering (CAE) is a discipline that bundles digital design and optimization tools for mechanical hardware products. The cloud is a vaguer concept. In the end, it is nothing more than a network of servers you do not buy but rent. Most industries use it every day as a key enabling technology for new business models, digital products, and services. It is the heart of the most commercially successful companies on the planet. One successful example is Office 365 by Microsoft, which makes working on documents easier and more collaborative.
Cloud adoption is growing steadily across the board. Even in relatively conservative Germany, 76% of enterprises already employ the cloud and 19% plan to employ it soon [1]. In CAE, the adoption of cloud technology is, however, far behind what we see in other disciplines. Why is that?
A well-known concept which describes the market entry of new technologies and services is the Innovation Adoption Lifecycle (see Figure 1). It segments the market for any given product into five categories. Here, we engineers are among the “laggards” or “late majority”. We are often skeptical of new technologies and prefer sticking to proven routines.
But there are now visible signs that enterprises are slowly starting to embrace the opportunities of the cloud in the field of CAE. COVID-19 is just one factor that forces organizations to question the status quo and to invest more in agility, speed, and innovation. A prominent example: Nissan recently announced the migration of their High Performance Computing (HPC) services to the cloud [2]. The HPC analyst “Hyperion” predicts that the cloud HPC market will grow 2.5 times faster than on-premises HPC [3]. Last year, hardware vendors such as Dell sold more to cloud providers than to other enterprise customers [4]. This shows the direction in which we are heading.
When we discuss running CAE in the cloud with our clients and partners, similar questions often arise:
- Why should I start with this? (aka, The Value Question)
- Isn’t the cloud more expensive than on-premises systems? (aka, The Cost Question)
- Is my data secure in the cloud? (aka, The Fear Question)
- Everything else runs on-premises, how should I start this journey? (aka, The Implementation Question)
In the following sections, I will go over each of these points, discuss the data, and outline some potential pitfalls. By the end, you will have a general understanding of the status quo of CAE in the cloud and the big picture when looking ahead.
Disclaimer: I am of course a cloud native engineer. Everything I say should be taken with a grain of salt. Still, I will try to be as unbiased as possible and focus on the hard facts, leaving the interpretation up to you.
The Value Question
Or … Why should I start with this?
TL;DR: “It’s more flexible than an on-site product, better suited for the challenges facing your business today, and helps you update your product portfolio faster.”
CAE is all about computations and simulations (e.g. structural mechanics or Computational Fluid Dynamics). Organizations computing large quantities at great speeds will optimize their products more efficiently and thus gain a significant competitive advantage. This is where the power of the cloud comes into full force.
1. Elasticity Instead of Rigidity
In the past, businesses needed to invest in an on-premises HPC system. This is not only expensive, but it also has a crucial drawback: you operate on a fixed capacity. In engineering reality, workloads fluctuate with changing business demands. Today, you might need to run three simulations and tomorrow 500 to get the project done. An on-premises HPC system is a key resource shared by many users. At full capacity, it becomes a bottleneck hindering crucial design studies and delaying innovation. In the cloud you receive an “infinite” capacity and you always have just the computing power you need at any given moment. Figure 2 shows how the fixed capacity of a data center limits your simulation throughput.
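The bottleneck effect described above can be illustrated with a small sketch. It is not a model of any real scheduler: the daily job counts and the capacity of 100 jobs per day are hypothetical values chosen only to mirror the "three simulations today, 500 tomorrow" scenario.

```python
def simulate_backlog(daily_jobs, capacity_per_day):
    """Return the number of jobs still queued at the end of each day."""
    backlog = 0
    history = []
    for jobs in daily_jobs:
        backlog += jobs                            # new jobs join the queue
        backlog -= min(backlog, capacity_per_day)  # finish what capacity allows
        history.append(backlog)
    return history


# Hypothetical week: quiet days, then a burst of 500 jobs before a deadline.
demand = [3, 3, 500, 3, 3]

# Fixed on-premises capacity (assumed: 100 jobs per day).
fixed = simulate_backlog(demand, capacity_per_day=100)

# Elastic capacity: idealized as "always enough for the busiest day".
elastic = simulate_backlog(demand, capacity_per_day=max(demand))

print("fixed capacity backlog:  ", fixed)
print("elastic capacity backlog:", elastic)
```

With these assumed numbers, the fixed system is still working through the burst two days later, while the elastic case never builds a queue.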
This flexibility is especially valuable since HPC has a remarkably high ROI (Return on Investment). Studies about the business impact of HPC vary, but it has been shown that every $1 invested in HPC will typically yield an ROI of between $75 and $500 [5].
With the cloud, engineers can also easily test out new hardware and software to find the optimal solution for any given engineering task. This improves engineers’ productivity, allowing them to use tools for a specific time or project only or to test out different approaches for a given task. As the CAE world becomes more diverse with even more specialized algorithms, this brings crucial speed to your R&D. Cloud subscription models will include everything you need to start working right away. This simplifies scaling software capacities and adding features or hardware power, aligned with your evolving needs.
- Cost savings (on-premises server must be designed for peak loads)
- Time savings (projects are not delayed because jobs are queuing)
- Improved scalability (easily update hardware and software with fluctuating demand)
2. HPC on Steroids
Simulations are increasing in complexity. Models grow bigger and system simulations are required for new generations of increasingly complex hardware products. This directly impacts hardware demands. The “infinite” capacity of the cloud allows an unlimited number of parallel jobs. This is a game changer when running many design variations in a design space exploration or automating CAE workflows. Recently, Microsoft demonstrated that MPI (Message Passing Interface) runs with 80,000 cores on their public cloud platform Azure [6]. This power makes every business more competitive as they gain the capabilities of an enterprise-style data center.
- Time savings (reduced time to result through parallelization)
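To make the idea of running many design variations in parallel more concrete, here is a minimal sketch. The `run_simulation` function and its "thickness" parameter are hypothetical placeholders, not a real solver interface; in practice each call would submit a job to a cloud back end rather than evaluate a formula locally.

```python
from concurrent.futures import ThreadPoolExecutor


def run_simulation(design):
    """Placeholder for one CAE job; a real version would submit a solver run."""
    # Hypothetical objective: a score that is minimal at thickness 2.0.
    return design["id"], (design["thickness"] - 2.0) ** 2


# Nine hypothetical design variations for a design space exploration.
designs = [{"id": i, "thickness": 1.0 + 0.25 * i} for i in range(9)]

# Each variation is independent, so all of them can be dispatched at once.
with ThreadPoolExecutor(max_workers=len(designs)) as pool:
    results = list(pool.map(run_simulation, designs))

best_id, best_score = min(results, key=lambda r: r[1])
print(f"best design: {best_id} (score {best_score:.3f})")
```

The point of the pattern is that the wall-clock time of the study approaches the time of a single simulation once enough parallel capacity is available.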
3. Zero IT Footprint on the Best Infrastructure
Who likes dealing with Linux installations on servers? This is the reality you face when implementing an on-premises HPC solution. After installation, you and your IT department carry the ongoing burden of keeping up with maintenance and upgrades. Cloud providers offer fully functioning turnkey solutions including hardware, software, and service and take responsibility for downtimes, maintenance, and updates.
This saves valuable time that is normally spent in the procurement process, installation, and maintenance that can now be used to focus on new product designs. The cloud reduces the need to own IT infrastructure, making your organization slimmer and more agile, outsourcing the risk of hardware investments to your cloud provider.
Cloud providers make sure that you always receive the newest and best hardware on the market. This is a key advantage because hardware is getting more diverse. Consider the latest advances in general-purpose GPUs (Graphics Processing Units) and FPGA (Field-Programmable Gate Array) chips. Modern CAE algorithms depend heavily on running on the optimal hardware, making it increasingly difficult to accommodate all needs of CAE when investing in on-premises capacities.
- Always run all engineering applications on the best hardware
- Reduce cost of IT operations (staff and hardware)
- Increased agility (no investment, no hardware acquisition, no facilities)
4. Accessibility in a Browser-Based Software Environment
Combined with cloud technology, CAE data becomes accessible for many stakeholders across different departments in the organization, such as design, research, development, and product lines. This enables collaboration and communication between the parties. Simply share your projects, results and reports with a few clicks or discuss the effects of new design variations in real time.
In a cloud-based environment, remote and distributed work (remember the pandemic?) works out of the box. Cross-functional and international engineering teams can access all design tools and projects from anywhere and at any time, no matter what device is used. A simple laptop or tablet plus a browser is all you need.
- Higher productivity (access from anywhere, anytime)
- Innovation through collaboration
In the end it boils down to only one thing - making your engineering more productive, so that you can focus on what truly matters: optimizing product designs faster and fostering innovation. This improves your organization’s competitiveness in rapidly changing markets.
The Cost Question
Or … Isn’t the cloud more expensive than on-premises systems?
TL;DR: “It can be. However, if you choose the right model, the price will be at least competitive and often lower than purchasing hardware.”
Often you hear that the cloud is 2 to 5 times more expensive than investing in a server. This is not true. Let us clear up some misconceptions.
Misconceptions & Clarifications
Firstly, when comparing a cloud-based solution with an on-premises solution you risk an apples to oranges comparison. As outlined above, the two models behave completely differently and might even serve other purposes. Keep this in mind when analyzing costs. Still, I will try to build up a fair comparison.
Secondly, when you determine the cost of a cloud solution, you must consider the concrete pricing model of your cloud provider. A variety of options exist, suited for different business demands. Some offer more flexibility (“on-demand”) for a higher base price. Others provide more fixed models, closer to the on-premises world (e.g. renting a cloud server for 3 years). You need to decide which model fits your needs best, both financially and technically.
Thirdly, cloud computing is an Operational Expenditure (OPEX), whereas on-premises hardware requires Capital Expenditure (CAPEX). CAPEX typically involves complicated procurement processes and needs to be justified in front of upper management. Also, investing is a financial risk. What happens if the expected ROI of the server does not materialize? What happens if priorities change and instead of CPUs, you need GPUs for the next design project? Often the amortization of an IT investment takes a very long time or never happens.
Lastly, make sure to consider all costs involved in CAE. Often, just the bare metal hardware and software licenses are compared. This is a misleading metric because those are just the tip of the iceberg.
Building up a TCO Analysis
The most common approach is a Total Cost of Ownership (TCO) analysis, which considers all costs during a typical lifecycle of an HPC server system (3-5 years). Most studies conclude that raw hardware (computing, storage, switches, interconnect, …) will be only 10-30 % of the total cost of your system. IT staffing is a significant cost block that is often overlooked because it is difficult to measure. You must consider all the time consumed by IT-related work, including maintenance, procurement, and commissioning. Sometimes, you might not have a dedicated administrator and the server is managed by an engineer on the side. Staffing will account for at least 20-30 % of the TCO. Since HPC is power-hungry, electricity also plays a large role, accounting for roughly 15 %. Specialized software can be required (applications, operating systems, etc.) and accounts for 10 %. Server downtimes lead to productivity loss, which has to be considered as well, typically at 15 %. Lastly, the room where your server is located must also be included. This depends heavily on the size of the system. Are you thinking in terms of supercomputers or workstations? In this calculation, we will use 15 % as a rough estimate [7].
Now, let us crunch the numbers. When purchasing a standard HPC system with 256 CPU cores, you will pay roughly € 125k for hardware. Your TCO over 3 years will be:
Table 1: Simplified TCO analysis of a 256 core HPC system
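The rows of this table are not available in this version of the text. The following sketch shows one breakdown that is consistent with the figures quoted in the article (€ 125k hardware, € 625k total TCO). The hardware share of 20 % and the staffing share of 25 % are assumptions chosen from within the quoted ranges so that the shares sum to 100 %; the original table may have used different values.

```python
HARDWARE_COST_EUR = 125_000  # from the text: standard 256-core HPC system

# Shares of the total cost of ownership over 3 years.
# "hardware" (20 %) and "staffing" (25 %) are ASSUMED values inside the
# ranges quoted in the text; the other shares are the text's rough estimates.
tco_shares = {
    "hardware": 0.20,
    "staffing": 0.25,
    "electricity": 0.15,
    "software": 0.10,
    "downtime": 0.15,
    "facilities": 0.15,
}
assert abs(sum(tco_shares.values()) - 1.0) < 1e-9  # shares must cover 100 %

# Scale up from the known hardware cost to the total, then split it again.
total_tco = HARDWARE_COST_EUR / tco_shares["hardware"]
breakdown = {item: share * total_tco for item, share in tco_shares.items()}

for item, cost in breakdown.items():
    print(f"{item:<12} {cost:>10,.0f} EUR")
print(f"{'total':<12} {total_tco:>10,.0f} EUR")
```

Under these assumptions the total comes to € 625,000, which matches the TCO used in the core hour calculation later in this section.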
How does this compare to the cost in the cloud? Say you rent CPUs of the type Intel Xeon Platinum 8168 with 44 cores on Azure. This will cost you roughly € 2.67 per hour, which works out to € 0.06 per core hour. If you purchase your core hours through a third party, you will typically pay € 0.10-0.20.
It now comes down to one key metric: the utilization of your servers. Even when you are not there, your server still is. The effective core hour price is:
Core Hour Price = TCO / (Available Core Hours x Utilization).
In the example, our on-premises server has
256 cores x 24 h x 365 d x 3 yrs = 6,727,680 core hours
available. At 70 % average utilization, the effective core hour price is
€ 625,000 / (6,727,680 core hours x 0.7) = € 0.13 / core hour.
Table 2 lists the effective core hour prices from 10 to 100% utilization.
Table 2: Comparison of cloud and on-premises. All prices are given in €/Core hour
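The table values are not available in this version of the text, but the on-premises column can be recomputed from the formula and figures given above. The sketch below does this for 10 % to 100 % utilization; the variable names are illustrative, and the cloud prices to compare against are the € 0.06 to € 0.20 per core hour quoted earlier.

```python
TCO_EUR = 625_000          # total cost of ownership over 3 years (from the text)
CORES = 256
HOURS_PER_YEAR = 24 * 365
YEARS = 3

available_core_hours = CORES * HOURS_PER_YEAR * YEARS


def effective_core_hour_price(utilization):
    """Core Hour Price = TCO / (Available Core Hours x Utilization)."""
    return TCO_EUR / (available_core_hours * utilization)


print("utilization   on-premises EUR/core hour")
for percent in range(10, 101, 10):
    price = effective_core_hour_price(percent / 100)
    print(f"{percent:>9} %   {price:>8.2f}")
```

At 70 % utilization this gives the € 0.13 per core hour derived above; at low utilization the effective price rises steeply because the fixed TCO is spread over fewer used core hours.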
In this simple example, with a price of € 0.10 in the cloud, you need to achieve an average utilization of over 90 % to justify the investment. This is almost impossible to achieve in real life.
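The break-even point can also be computed directly by rearranging the core hour formula for utilization. This is a sketch using the article's example figures; the function name is illustrative.

```python
def break_even_utilization(tco_eur, available_core_hours, cloud_price):
    """Utilization at which the on-premises core hour costs the same as the cloud.

    Rearranged from: cloud_price = TCO / (available core hours x utilization).
    A result above 1.0 means on-premises can never match the cloud price.
    """
    return tco_eur / (available_core_hours * cloud_price)


TCO_EUR = 625_000
AVAILABLE_CORE_HOURS = 256 * 24 * 365 * 3

for cloud_price in (0.06, 0.10, 0.20):
    u = break_even_utilization(TCO_EUR, AVAILABLE_CORE_HOURS, cloud_price)
    verdict = "unreachable" if u > 1.0 else f"{u:.0%} utilization needed"
    print(f"cloud at {cloud_price:.2f} EUR/core hour: {verdict}")
```

For a cloud price of € 0.10 the break-even utilization is roughly 93 %, in line with the "over 90 %" stated above. At the direct price of € 0.06 the required utilization exceeds 100 %, and at € 0.20 it drops to roughly 46 %.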
This simple example shows that the cloud is competitive with an on-premises system when you consider all costs. What we left out are changing demands in the future. HPC is a field of growth and you might need to update your infrastructure to support new usage scenarios. New hardware types (GPUs, FPGAs, etc.), multiple interconnect technologies (InfiniBand, Omnipath, etc.) and codes highly optimized for certain hardware configurations make strategic buying decisions increasingly difficult. To complete the business case, you must also include the opportunity cost of an on-premises server, for example the simulation output lost to throttling and the significant ROI of HPC that fails to materialize as a result.
The Fear Question
Or … Is my data secure in the cloud?
TL;DR: “The cloud is secure; learn to use it securely. Do not miss out on the significant business advantages the cloud offers due to the fear of the unknown.”
There is one potential showstopper that is brought up in every discussion about CAE in the cloud at some point: data security. Security is a crucial aspect in CAE because engineers handle some of their organizations’ most sensitive data.
There is an ongoing and very general trend of corporate IT migrating to the cloud. Already, 53 % of corporate data is stored in the cloud [8], with the share still growing. This shows that CAE is part of a bigger picture. The security of the cloud is being addressed in a structured manner by a growing number of organizations. Today, more than 30 % of companies have a cloud security concept [9] that covers all uses of cloud. This is beneficial for CAE engineers, who can reapply existing patterns and processes to their field.
In 2019, fewer security breaches happened to public cloud resources than to in-house IT systems [9]. The number of companies affected by cloud security incidents in 2019 varies between studies, with numbers ranging wildly from 22 % to 80 % [9, 10]. Most incidents appear to be of minor severity. Gartner states that the likelihood of experiencing a major cyber- or ransomware attack is only roughly 1 % [11].
An important takeaway: Today, most security professionals (61 %) believe that the risk of a security breach is the same or lower in the cloud than in on-premises IT [12].
Cloud Security Measures
Cloud service providers are aware of their exposure on the internet and therefore invest large sums into security, including physical security of buildings, third party certifications (e.g. ISO 27001, SOC 2), specialized hardware, and experienced security managers who take care of the safety of their customers’ data around the clock. Due to this, the brand-name providers have proven to be remarkably resistant to cybersecurity attacks [13].
To secure clouds, various technical measures, processes, and contractual obligations between provider and client are employed. These include communication encryption (HTTPS, VPN) between clients and in the cloud network, and encryption of keys and disks. An arsenal of stricter measures exists, such as digital signature checking and single-tenant environments.
New technologies are in active development to further increase the security of the cloud [14]. Examples are Confidential Computing, a technology that protects your data on a hardware level, and Cloud Access Security Brokers (CASB), which help organizations orchestrate security across different cloud products.
When not used properly, clouds can pose a significant security risk. Today, cloud adoption is often far ahead of cloud strategy in organizations. Therefore, there is often a high degree of unrecognized public cloud usage [14]. Without a clear strategy in place, organizations can become vulnerable, especially where highly confidential data is involved.
Organizations must create infrastructure and know-how to adequately monitor and govern the use of cloud in all business units. Funding and support from the executive board are often missing, leading to chronically understaffed cybersecurity teams. Fields that should be on the radar include Identity and Access Management (IAM), cyber monitoring, security governance [13], and the development of central security platforms, which allow surveillance of usage behavior and configurations across different platforms and applications.
The key challenge moving forward will not lie in the security of the cloud itself but in its usage and the policies and technologies which control it. Already today, in most cases it will be the user and not the cloud provider who is the root cause of security breaches [15].
The cloud has strengths and weaknesses but is generally a secure technology. Some argue that it’s more secure than on-premises systems [16]. CIOs should make sure that cloud usage is not rejected due to unsubstantiated security concerns. Instead, the focus should be on developing new measures and processes for safe usage of the new technology, as was done several years back with on-premises IT systems. Paying too much attention to the security of the cloud provider draws attention away from establishing cloud controls within your organization, making you vulnerable to breaches.
The Implementation Question
Or … Everything else runs on-premises, how should I start this journey?
TL;DR: “The ecosystem is growing rapidly right now and covers the whole PLM toolchain. Get an overview of the landscape, choose the right adoption model, and get going.”
Now that we have eliminated the roadblocks, it is time to get your hands dirty. When you start the journey to the cloud, you enter a diverse world of new technologies and possibilities with which you should familiarize yourself before defining your own cloud roadmap.
The world of cloud CAE is growing and maturing. Tools across the whole CAE chain are available, ranging from CAD, FEA (Finite Element Analysis) simulation, electronics simulation, CFD to PLM (Product Lifecycle Management). Cloud CAE has long been a domain of bold startups, sensing the reluctance of traditional vendors to invest in the new technology. However, this seems to be changing. Most of the household names in CAE now try to strengthen their profile in the field, updating their license plans to comply with the changing requirements of the cloud, and some have even launched home-made cloud solutions.
Lifting the Fog on the Different Models
It is important to consider the different models available. First make a fundamental decision on your approach to the cloud:
Only the public cloud will materialize the values described earlier, as it gives direct and flexible access to a powerful data center, thus enabling higher CAE productivity. For some organizations, a hybrid solution could be of interest, covering baseloads with an on-premises system and shifting peak loads to the cloud.
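The hybrid idea of covering baseloads on-premises and shifting peaks to the cloud can be sketched as a simple split of demand. The daily demand values below are hypothetical; the on-premises capacity reuses the 256-core example system from the cost section, and the cloud price of € 0.10 per core hour is the example price used there.

```python
def split_hybrid_load(demand_core_hours, on_prem_capacity):
    """Split each period's demand into an on-premises part and a cloud burst."""
    plan = []
    for demand in demand_core_hours:
        on_prem = min(demand, on_prem_capacity)  # baseload stays in-house
        plan.append({"on_prem": on_prem, "cloud": demand - on_prem})
    return plan


# Hypothetical daily demand in core hours, with one peak day.
daily_demand = [2_000, 5_000, 12_000, 3_000]

# A 256-core system offers at most 256 x 24 core hours per day.
plan = split_hybrid_load(daily_demand, on_prem_capacity=256 * 24)

cloud_total = sum(day["cloud"] for day in plan)
cloud_cost = cloud_total * 0.10  # example cloud price in EUR per core hour
print(f"core hours shifted to the cloud: {cloud_total}")
print(f"cloud cost for the peak: {cloud_cost:.2f} EUR")
```

In this sketch only the peak day overflows to the cloud, so the cloud bill covers just the excess over the in-house capacity.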
On the public cloud, you will then need to consider the adequate level of service for your case:
IaaS is the domain of the big brand names with Amazon’s AWS being the market leader (45 % market share), followed by Microsoft Azure (18 %) [17]. IaaS offers the greatest freedom, allowing integration with other services or clouds. However, expert knowledge about the specifics of cloud platforms is required. SaaS, on the other end, is closer to traditional “software”.
SaaS applications are developed by a third-party vendor, whose interface is accessed on the clients’ side. SaaS includes everything you need to start working right away. You usually have fewer options to customize solutions, but you will also not need specialized IT know-how to get going.
In CAE, vendors take different approaches to enabling the cloud, with different degrees of cloud adoption. One approach is “Cloud plugins” that offer easy interfaces to migrate existing CAE workflows to the cloud. This is targeted at users who are looking at the cloud for outsourcing peak demands while keeping on-premises capacities. Traditional software vendors typically position themselves on this level. “Cloud natives”, on the other end, provide a complete turnkey cloud solution for specific CAE workloads, including pre- and post-processing and dedicated support teams.
Defining Your Journey
Moving forward, you need to choose the right implementation model for your needs. Consider your CAE intensity (number of simulations run per day), sensitivity of data, your existing in-house knowledge about cloud, and your company strategy when choosing an appropriate technology stack.
The migration of CAE workloads to the cloud will most likely be part of a broader company cloud initiative. Outline a roadmap for the transition to cloud CAE to avoid misalignment with the holistic company cloud strategy. You should identify all stakeholders of the CAE process (IT, design, sales), learn who else benefits from the initiative, and get input on the technology stack decision. Also, consider first migrating one entire CAE workflow to the cloud to reduce transition risks.
Moving CAE to the cloud will be an exciting and challenging task. Several stakeholders must be aligned, data security and governance need to be assured, and different licensing options must be evaluated. A coherent roadmap aligned with the broader company cloud strategy should be defined early on [18]. Make strategic decisions about the platforms you choose and partner up with competent and knowledgeable vendors along the way. Engineers need to advocate for the new approaches within their organization, spreading knowledge and dispelling common misconceptions.
[18] Recommended material: (1) Gartner E-Book “IT Roadmap for Cloud Migration”; (2) Gartner E-Book “Cloud Strategy Leadership”
Quo Vadis, Cloud CAE?
You have learned about the potential of the cloud and some challenges we are currently facing. Where will it go from here? Through the currently unfolding, large-scale socioeconomic transformations, manufacturing companies will be forced to update big parts of their product portfolio in the coming years. This translates to a lot of work for CAE engineers. Their success, in turn, will depend on their ability to use state-of-the-art tools and hardware. The most successful organizations will learn to leverage the new technologies, thereby bringing better products to the market faster.
There is still so much more to come. 5G will lift mobile data transfer rates to the speed of wired internet. Soon, you might not even need WiFi to run simulations. If you suffer from poor internet in your region, maybe Elon Musk will soon help you out with mobile internet from space [19]. Hardware will further specialize (e.g. FPGA, Quantum Computing), enabling unknown speed and possibilities in simulation and data analytics. Politicians have woken up to the calls of manufacturing companies for a European cloud, building up the European cloud initiative “Gaia-X” to counter the tremendous success of the US-American hyperscalers [20].
The cloud will create new possibilities for engineering. Fully automated design studies, access across different cloud platforms, and straightforward connection of tools through API (Application Programming Interface) access drastically broaden the spectrum of how simulation is employed and make the remarkable work of simulation engineers accessible to everyone. Already, companies have started integrating simulations with IoT (Internet of Things) and digital twin applications to determine the health of complex systems in operation with the help of sensor-enabled simulations. Simulations will become simpler, enabling design engineers to participate in the process (“Democratization”). At the same time, they will become more sophisticated to satisfy the demand for ever more complex multi-physics simulations and ever bigger models.
Can you imagine all this in an on-premises world? Well, I can’t. Imagine a world where hardware does not play a role anymore. The project is due tomorrow? No worries, just spin up 10,000 cores. A big DoE (Design of Experiments) is required to find the best design? We’ve got you covered, run 100 simulations in parallel. You have no clue what tool to use to get to the bottom of your design question? Easy one, just test the SaaS applications out there in a quick trial. This is not some bright and distant future. It is today. We have finally arrived in a world where engineers can focus on what they are truly good at – improving product design, being creative and dancing with the physics. No wrestling with hardware, no IT hassle, no overhead. Just engineering.