News & Tech Trends Blog

Is the future of generative AI at the edge? Considerations on cloud to edge transition


Generative AI holds transformative potential across various industries, yet its applications are somewhat constrained by the size and complexity of the models. That’s why generative AI today mainly runs in the cloud, benefiting from unparalleled computational, memory, and energy resources. As companies develop new applications and run them on complex networks, the cloud experiences increasing limitations in terms of bandwidth, cost, and connectivity. Therefore, many are now exploring running generative AI at the edge.


This article delves into the rationale behind the migration of generative AI to the edge, under the premise that edge deployment of generative AI will not replace cloud deployment but rather complement it. The cloud will continue to host AI model training, a less frequent and resource-intensive process, while inference, and perhaps some model fine-tuning, will progressively extend to the edge.


The challenges of cloud-based generative AI

The substantial size and computational demands of most generative AI models necessitate their operation on robust centralized computing infrastructures, such as cloud servers. While prevalent, this cloud-centric approach is facing growing questions about its long-term sustainability. Concerns encompass cost, power consumption, latency, and the security and privacy of processed data.


The rapid expansion in generative AI workloads is driving an exponential demand for AI-enabled servers equipped with powerful yet expensive and power-hungry GPUs and additional cooling modules. These upgrades significantly increase overall infrastructure costs, reaching up to seven times the expense of a standard server setup. The idle power consumption of an AI-accelerated server can reach approximately one kilowatt (kW), escalating to several kW at peak load. This figure is multiplied by the number of servers needed to execute a generative AI model and by how often the model runs. The power cost of data transfer over complex networks to and from the cloud adds to this equation. Consequently, energy consumption is on an exponential growth trend. Moreover, with GPUs running most AI models in the cloud, the rise of applications using generative AI may lead to large traffic volumes and subsequent latency issues, hindering suitability for real-time or low-latency applications.


Since the current data center setup is unsustainable and cost-inefficient, offloading some workloads to edge devices is imperative. By deploying generative AI solutions directly at the edge, organizations can easily integrate AI into their operations, minimize data transfer over external networks, and achieve quicker decision-making.


Benefits of generative AI at the edge

Edge computing plays a pivotal role in the evolution of generative AI because it addresses the problems of real-time processing, latency, cost and energy efficiency that generative AI needs to solve as it grows.


  • Privacy and security – As generative AI use grows, it gains access to ever more sensitive information. Transferring, storing, and using data on cloud servers increases the potential for data breaches or unauthorized access. Edge-based generative AI safeguards users’ privacy by keeping data and intelligence localized on the device. This matters for both consumer data – encompassing personal, financial, and health information – and enterprise data. Given that even a minor leak could lead to major consequences, on-premises processing and storage mitigate the risks associated with transmitting all data to a remote server.
  • AI performance – Generative AI performance can be measured in many ways, including latency. With much of the data being generated and consumed at the edge, moving AI inference there avoids the latency caused by congested networks or cloud servers and enhances reliability. In an enterprise context, the benefits of low-latency local inference are substantial, meeting the demands of near-real-time processing and responsiveness.
  • Accuracy and domain specificity – Many generative AI models running on the cloud are trained with generic data to address general-purpose queries. However, using them for specific tasks rarely delivers results pertinent to the use case. To attain optimal accuracy and effectiveness, generative AI models need domain-specific training and additional fine-tuning tailored to enterprise-specific applications. The reduced size and parameter count of these domain-specific models make them well-suited for deployment on AI accelerators at the edge.
  • Cost – Cloud providers struggling with the equipment and operating costs associated with running generative AI models are increasing fees for their services. While AI inference costs in the cloud are recurring, the expense of inference at the edge is a one-time hardware investment. Augmenting the system with an edge AI accelerator capable of running generative AI models effectively lowers overall operational costs.
  • Power efficiency – Cloud-based inference processing of generative AI models often demands multiple AI accelerators and potentially multiple servers. Activating such a complex and power-hungry infrastructure leads to high energy consumption. Edge devices with efficient AI processing, delivering superior performance per watt, can execute generative AI models with significantly lower energy requirements, especially when considering not only processing but also data transport.


Full-stack optimizations for on-device generative AI

Making the most of generative AI at the edge requires a finely tuned balance of AI model optimization and hardware acceleration.


For generative AI to scale efficiently, models need to be resized to suit the computational and power constraints of target devices. Deploying AI models on edge devices typically involves a trade-off between computational efficiency and accuracy. Yet a small, fast AI model is of little use if it delivers inaccurate results, so a principled approach is essential to shrink AI models efficiently without compromising accuracy. AI model optimization can be achieved using three main techniques.


  • Quantization reduces the precision of the model’s weights and activations without significantly impacting its accuracy. Converting the 32-bit floating-point (FP32) values typically used to represent a model’s weights to 8-bit integers (INT8), for example, leads to a 4x decrease in model size and at least a 50% speed improvement.
  • Pruning involves removing unnecessary or duplicate parameters from a trained model to make it more compact, faster, and more memory-efficient while maintaining a high level of accuracy.
  • Knowledge distillation involves training a smaller “student” model to mimic the behavior of a larger, trained “teacher” model, yielding a far more compact model with a comparable level of accuracy.
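To make the first technique concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization in pure Python. The function names and the single-scale scheme are illustrative assumptions, not the behavior of any particular deployment toolchain:

```python
from array import array
import random

def quantize_int8(weights):
    """Symmetric quantization: map the largest |weight| to 127."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Approximate FP32 values recovered at inference time."""
    return [v * scale for v in q]

random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(1000)]

q, scale = quantize_int8(weights)
fp32 = array('f', weights)  # 'f' = 4 bytes per value
int8 = array('b', q)        # 'b' = 1 byte per value

print(fp32.itemsize // int8.itemsize)  # 4 -> the "4x decrease in model size"
max_err = max(abs(w - d) for w, d in zip(weights, dequantize(q, scale)))
print(max_err <= scale)  # True: quantization error bounded by one step
```

Mapping each 4-byte float to a single signed byte is where the 4x size reduction comes from; production toolchains typically refine this with per-channel scales and calibration of activation ranges.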
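The distillation objective can likewise be sketched in a few lines. The temperature-softened cross-entropy below is one common formulation, shown on toy logits rather than a real student–teacher pair:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature softens the distribution."""
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between softened teacher and student distributions.

    Minimizing this pushes the student to mimic the teacher's full output
    distribution, not just its top prediction.
    """
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher, student))

# The loss is smallest when the student already matches the teacher
teacher_logits = [3.0, 1.0, 0.2]
matched = distillation_loss(teacher_logits, teacher_logits)
mismatched = distillation_loss([0.2, 1.0, 3.0], teacher_logits)
print(matched < mismatched)  # True
```

In practice this term is combined with the ordinary hard-label loss during the student's training, but the core idea is exactly this comparison of output distributions.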


Using these optimization techniques, almost any AI application – including generative AI – can be deployed at the edge, provided that the edge computing architecture is engineered to run AI workloads on device.


Generative AI requires hardware acceleration for fast, efficient inferencing and the capacity to handle multiple client requests simultaneously. Traditional industrial CPUs and GPUs proved inadequate for the growing data analysis needs of evolving networks, and specialized hardware was cost-prohibitive until recently. Today, the latest generation of AI chips tailored for edge computing unlocks remarkable processing capabilities. They can efficiently handle large models, leveraging specialized accelerators optimized for inference tasks. The availability of small form factor computers equipped with purpose-designed AI processors for neural network processing lays the foundation for edge-based generative AI. This new generation of hardware-accelerated edge devices not only swiftly processes optimized AI models but does so with reduced power consumption, ensuring energy efficiency and adaptability across various consumer and enterprise use cases. By performing AI inference directly, edge devices can autonomously improve their performance in a given task through learning from data, improving real-time responsiveness.


Deploying generative AI at the edge with SECO technologies

SECO, with extensive AI expertise and a comprehensive end-to-end offering, is a trusted partner to support companies approaching generative AI on the edge.


The offering starts with a dedicated portfolio of AI-accelerated edge computing platforms designed to make the edge architecture ready to perform generative AI inference locally. This range of products is finely tuned for artificial intelligence, incorporating specialized accelerators such as GPUs, NPUs, and TPUs to elevate computing capabilities to the next level. These processors are purpose-built to address the unique computational demands of neural networks, propelling devices into the domain of advanced machine learning and real-time decision-making. Consequently, every unit of processing power is harnessed specifically for AI inference, with an additional emphasis on energy efficiency.


SECO’s expertise also extends to AI software implementations. The Clea software suite stands out as a robust and cost-effective solution for harnessing the potential of field data by building value-added services, enabling advanced AI applications, and more – all within a modular, fully integrated system that boasts scalability and stability.


IoT data managed and orchestrated by Clea is also one of the foundations of StudioX. Tailored for OEMs to create personalized AI-powered support services, StudioX builds on a rich spectrum of technologies, including generative AI, Large Language Models (LLMs), machine learning, deep learning, and computer vision. Functioning as a conversational interface, StudioX is geared to enhance manufacturing efficiency, elevate product quality, increase customer satisfaction, and create new revenue streams. Through seamless integration with client data, business systems, and workflows, StudioX can be trained on diverse datasets from various sources. The platform then delivers human-like responses that help optimize workflows and operational productivity, elevate the support experience, and provide access to AI-generated knowledge in real time.


By ingesting device telemetry data coming from Clea, for example, and cross-referencing it with other critical information, StudioX can generate performance reports and automatically fit into work cycles to optimize them. Similarly, StudioX can report on device status and potential upcoming issues, enabling predictive maintenance and, if necessary, providing real-time guidance for step-by-step troubleshooting. Its conversational interface allows operators to effortlessly retrieve and analyze data without manual input. This not only saves time but also mitigates the risk of human error, empowering operators to make well-informed, data-driven decisions.


While there will always be a need for generative AI operating in the cloud, the present and future of generative AI lie at the edge, especially for enterprise-specific applications. The edge is a pivotal frontier for differentiating and attaining a competitive advantage, especially in scenarios where the time and complexity required to make a decision or trigger an event are fundamental. A robust edge computing infrastructure and purpose-built AI accelerators are essential factors in making this possible. Contact our team of experts today and learn how SECO can support your business in harnessing the potential of on-device generative AI, from implementing an AI-ready edge infrastructure to integrating a ready-to-use AI virtual assistant.
