February 02, 2024

Making Large Language Models Work On The Edge


Large language models like ChatGPT have revolutionized content creation. From writing code to holding nuanced, human-like conversations in customer service chatbots, businesses can now generate personalized content using foundation models trained on tens of billions of words. This evolving relationship with textual and visual data is fueling a new generation of custom solutions based on machine learning (ML), and it demands ever larger amounts of computing power and time.

With the cost of fine-tuning large language models (LLMs) exploding, however, price is rapidly becoming a barrier to building new generative AI applications. Privacy is another roadblock: Training on public cloud-hosting platforms deters adoption by potential customers in sectors that handle sensitive data.

These issues are driving demand for new technologies that allow the continued development of LLM-based business products. One answer comes from quantum-inspired algorithms that resolve many of the constraints on running ML on local devices.

Limitations Of LLMs

LLMs have billions of parameters and are trained on massive language datasets. The sheer size of the trained weights, the hundreds of gigabytes of memory they require and the rising prices of electricity and semiconductors make these models an incredibly resource-intensive area of ML. These generative models are becoming prohibitively expensive to train, rapidly putting LLMs out of reach of all but the biggest enterprises.
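To put that scale in concrete terms, here is a back-of-the-envelope estimate of the memory needed just to hold a model's weights; the parameter counts and precision are illustrative assumptions, not figures for any particular model:

```python
# Back-of-the-envelope memory estimate for holding model weights alone.
# Parameter counts and fp16 precision are illustrative assumptions.

def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Gigabytes needed to store the weights at 2 bytes per parameter (fp16)."""
    return num_params * bytes_per_param / 1e9

for params in (7e9, 70e9, 175e9):
    print(f"{params / 1e9:>4.0f}B parameters -> ~{weight_memory_gb(params):.0f} GB")

# Training and fine-tuning need several times more memory than this,
# since gradients, optimizer states and activations must be held as well.
```

A 70-billion-parameter model already needs roughly 140 GB just for its weights at half precision, far beyond what a single commodity device holds.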

Also, cloud-hosted LLMs are currently the norm in most industries, yet that convenient access comes with significant disadvantages. Cloud-hosted models are subject to intermittent connectivity problems and latency. Training via the cloud also introduces major concerns about the privacy of proprietary information and the sovereignty of training data, and cloud hosting is limiting for enterprises that want to train or run LLMs on-premises or on edge devices.

Lifting Limitations With Tensor Networks

When it matures, likely within the next five to 10 years, quantum computing will revolutionize computational capacity in ways that completely transform our relationship with machines. In the interim, however, tensorizing neural networks is a bridging technology: It introduces new options for enterprises struggling with the resources needed to train and run LLMs on the edge.

Tensorizing neural networks is a quantum-inspired technique that uses a data structure derived from quantum physics. Neural networks have enormous numbers of nodes, with significant redundancy built into their structure. Tensor networks strip out this redundancy without discarding the network's capabilities. Restructuring the weights in this way yields a smaller model that can be trained faster on classical computers without the need for more hardware.
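To make the idea concrete, the sketch below compresses a layer's weight tensor into a tensor train (also called a matrix product state), the chain-of-small-tensors format these quantum-inspired methods borrow from physics. The shapes and rank are arbitrary assumptions chosen for illustration, and production pipelines add further steps, such as briefly fine-tuning the factored model to recover accuracy:

```python
import numpy as np

def tensor_train(tensor: np.ndarray, max_rank: int) -> list[np.ndarray]:
    """Decompose a d-way tensor into a chain of small 3-way cores using
    successive truncated SVDs (the tensor-train / matrix product state format)."""
    dims = tensor.shape
    cores, rank = [], 1
    mat = tensor.reshape(rank * dims[0], -1)
    for n in range(len(dims) - 1):
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        new_rank = min(max_rank, len(s))
        # Keep only the leading singular directions as this site's core.
        cores.append(u[:, :new_rank].reshape(rank, dims[n], new_rank))
        rank = new_rank
        mat = (np.diag(s[:new_rank]) @ vt[:new_rank]).reshape(rank * dims[n + 1], -1)
    cores.append(mat.reshape(rank, dims[-1], 1))
    return cores

# A 1024 x 1024 weight matrix, reshaped into a 4-way tensor so it can be factored.
# Random weights are only a stand-in; real layers compress well to the extent
# that they actually contain redundancy.
weights = np.random.randn(1024, 1024).reshape(16, 64, 64, 16)
cores = tensor_train(weights, max_rank=32)

dense_params = weights.size
tt_params = sum(core.size for core in cores)
print(f"dense: {dense_params:,} parameters, tensor train: {tt_params:,} parameters")
# dense: 1,048,576 parameters, tensor train: 66,048 parameters (about 16x smaller)
```

Training then proceeds on the much smaller cores rather than the original dense weights, which is where the memory and speed savings come from.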

Opening New Horizons For LLMs

LLMs on the edge have many applications across a range of sectors. For example, defense agencies and contractors require security for sensitive data, necessitating training on local servers. In the financial sector, traders training models for sentiment analysis don’t want sensitive information or data leaked to competitors. To capture changes in sentiment reflected in news reports or social media channels, they also need models that run very quickly. Tensorized neural networks enable rapid, repeated runs of models on local devices without risking the capture of proprietary data or its use in training foundation models.

Another example of the value of localized training involves businesses in which connectivity is an issue. This is relevant to remote resource operations, such as mines or oil and gas installations, that depend on reliable real-time connections to convey vital safety and operational information. Self-driving vehicles could make good use of complex LLMs, but not if those models need a guaranteed connection to a server at all times, as connections to the cloud can be impacted while driving through tunnels or in remote locations.

Overall, ML provides an incredible opportunity to solve the complex equations underlying many business problems, yet a paradigm shift is required. Until fully fault-tolerant quantum computing hardware arrives, tensorizing the massive neural networks used to train and fine-tune LLMs offers a way to model only the meaningful parts of a system without losing accuracy.

While tensorization is still a new technique, it is an exciting approach for enterprises looking to surmount exorbitant costs, maintain the privacy of their data and use LLMs on the edge.
