June 17, 2024

Big AIs in Small Devices


Only a few years ago, artificial intelligence (AI) was dominated by relatively small, tailored models, each designed for a single purpose. Now, a massive transformation is under way. Models are growing steadily larger and, powered by neural networks, are turning into general-purpose AI models capable of undertaking a multitude of tasks. Building these models, and Large Language Models (LLMs) in particular, requires ever more data, and enormous computational resources [1] are needed not only to train them but simply to use them.

Running alongside the AI revolution is the rapid spread of IoT and edge devices. IoT devices collect data, and edge devices process it. This network of physical devices, from everyday household items to industrial equipment, works with limited resources, particularly memory and computational power. AI models could leverage the high volume of data these devices collect, but not in their current form.

Enabling LLMs for the edge

Consider a modern-day factory equipped with automated production lines. If equipped with IoT and edge devices, these machines can also self-diagnose issues, predict maintenance needs, and adapt in real time to changes in the production schedule or design.

What if this production line could use AI to detect microscopic defects [2] in real time and let the machinery adjust its operations instantaneously? High-resolution cameras connected to processing units can execute such sophisticated tasks by leveraging the power of large AI models. However, computer vision models that analyze images in real time are known to be resource-intensive, while the embedded systems running the production line (the edge devices in this scenario) are constrained in computational capacity and memory. These devices need to be compact to fit within the machinery, robust to withstand factory conditions, and responsive to maintain production efficiency.

The dilemma emerges: How do we equip a factory production line, bound by its inherent limitations, with the capabilities of expansive AI models that can transform it into a dynamic, adaptable and highly efficient manufacturing process?

The potential solution

The solution may lie in the quantum realm. Indeed, quantum computing carries a lot of promise when it comes to computational bottlenecks. Within the context of IoT, the potential applications of quantum computing are vast. However, given the current state of quantum hardware, these computations are only possible with cloud-based platforms, accessing quantum hardware through the internet. The reason lies in the inherent challenges associated with quantum systems. In the context of IoT and edge devices, where compactness and real-time computations are necessary, the current quantum hardware simply doesn’t fit.

While true quantum computation at the edge is clearly out of reach for now, quantum-inspired algorithms offer a feasible alternative. These algorithms run on classical hardware and can offer a computational advantage over conventional classical methods. For the immediate future of edge computing, quantum-inspired methods might just be the bridge between the quantum promise and our classical reality.

The power of tensor networks

The most promising quantum-inspired approach for edge computing use cases relies on tensor networks.

Originating from the realm of quantum physics [3], tensor networks are attracting increasing attention in the domain of deep learning and AI. A tensor network is a framework to represent complex, multi-dimensional information efficiently. These networks can decompose high-dimensional information into smaller, more manageable components. In the process, they capture the essential information while discarding redundancies.
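
To make the idea of decomposing high-dimensional information concrete, here is a minimal sketch in Python with NumPy. It is an illustration only, not the method used in the cited works: it assumes one common tensor network format, the tensor train (also called a matrix product state), built by repeated truncated singular value decompositions. The function names `tt_decompose` and `tt_reconstruct` and the toy tensor are hypothetical.

```python
import numpy as np

def tt_decompose(tensor, max_rank):
    """Split a d-dimensional array into a chain of 3-index cores
    using repeated truncated SVDs (tensor-train / MPS format)."""
    dims = tensor.shape
    cores = []
    rank = 1
    remainder = tensor.reshape(rank * dims[0], -1)
    for k in range(len(dims) - 1):
        u, s, vt = np.linalg.svd(remainder, full_matrices=False)
        new_rank = min(max_rank, len(s))  # keep only the largest singular values
        cores.append(u[:, :new_rank].reshape(rank, dims[k], new_rank))
        # Carry the remaining information to the next step of the chain.
        remainder = s[:new_rank, None] * vt[:new_rank]
        rank = new_rank
        if k < len(dims) - 2:
            remainder = remainder.reshape(rank * dims[k + 1], -1)
    cores.append(remainder.reshape(rank, dims[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the chain of cores back into the full array."""
    result = cores[0]
    for core in cores[1:]:
        result = np.tensordot(result, core, axes=([-1], [0]))
    return result.reshape([core.shape[1] for core in cores])

# Toy data (an assumption for illustration): a 4-dimensional tensor that is
# the sum of two simple product terms, so it contains a lot of redundancy.
rng = np.random.default_rng(0)
vectors_a = [rng.normal(size=8) for _ in range(4)]
vectors_b = [rng.normal(size=8) for _ in range(4)]
tensor = (np.einsum("i,j,k,l->ijkl", *vectors_a)
          + np.einsum("i,j,k,l->ijkl", *vectors_b))

cores = tt_decompose(tensor, max_rank=2)
full_size = tensor.size
compressed_size = sum(core.size for core in cores)
max_error = np.max(np.abs(tt_reconstruct(cores) - tensor))
print(f"entries: {full_size} -> {compressed_size}, max error: {max_error:.2e}")
```

The toy tensor is deliberately built to be highly redundant, so a small chain of cores reproduces it almost exactly. Real data is rarely this well behaved, and the choice of `max_rank` then becomes a trade-off between size and accuracy.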

Tensor networks work well with neural networks, which are at the heart of many AI models.

Neural networks, particularly deep ones, can contain billions of parameters. These parameters often contain a lot of non-essential information, which increases memory and computational demands. Tensor networks provide a way to represent these vast networks using fewer parameters, compressing the model without significantly compromising its accuracy [4].

When it comes to integrating tensor networks with large AI models based on neural networks, two main approaches can be considered:

Model Compression: The AI model is compressed after its training using tensor decomposition techniques. This approach allows for significant size reduction while retaining most of the model’s performance.

Tensorized Layers: The AI model is constructed using tensor networks from the start and then trained with this tensorized architecture. This approach also allows for significant size reduction while retaining model performance, and it could improve training performance as well.
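
The first approach can be sketched with its simplest special case: replacing the weight matrix of a single dense layer with two smaller factors obtained from a truncated singular value decomposition. This is a hedged illustration in Python with NumPy, not the technique of the cited papers, which use richer tensor network structures. The helper `compress_dense_layer`, the layer sizes, and the synthetic "trained" weights are all assumptions made for the example.

```python
import numpy as np

def compress_dense_layer(weight, rank):
    """Factor a weight matrix W (out x in) into A (out x rank) and
    B (rank x in) so that A @ B approximates W."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # absorb the singular values into A
    b = vt[:rank]
    return a, b

rng = np.random.default_rng(0)

# Stand-in for a trained layer (assumption): weights that are close to
# low rank, plus a little noise representing non-essential information.
weight = (rng.normal(size=(256, 16)) @ rng.normal(size=(16, 512))
          + 0.01 * rng.normal(size=(256, 512)))

a, b = compress_dense_layer(weight, rank=16)

# Run the same input through the original and the compressed layer.
x = rng.normal(size=512)
y_full = weight @ x
y_compressed = a @ (b @ x)  # two small products instead of one large one

params_full = weight.size
params_compressed = a.size + b.size
rel_error = np.linalg.norm(y_full - y_compressed) / np.linalg.norm(y_full)
print(f"parameters: {params_full} -> {params_compressed}, "
      f"relative output error: {rel_error:.4f}")
```

In the second approach, the factors `a` and `b` (or a longer chain of tensor cores) would be the trainable parameters from the outset, rather than being derived from an already trained matrix.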

Recent experiments with these methods have shown promising outcomes, reducing the number of parameters by several orders of magnitude [5] and thus creating models that are more suitable for edge computing.


As IoT devices collect more and more information to feed big AI models, those models in turn must work within the limited resources of edge devices. Large language models, which are leading the advances in natural language processing, exemplify this challenge.

In this landscape, tensor networks emerge as potential candidates to overcome this limitation. By leveraging tensor networks, it may be possible to compress the massive architectures of LLMs without significantly diluting their proficiency and to make them manageable on embedded systems.


[1] Pope, R., et al. (2022). Efficiently scaling Transformer inference. arXiv. http://arxiv.org/abs/2211.05102

[2] Guijo, D., et al. (2022). Quantum artificial vision for defect detection in manufacturing. arXiv. https://arxiv.org/abs/2208.04988

[3] Orus, R. (2013). A practical introduction to tensor networks: Matrix Product States and Projected Entangled Pair States. arXiv. https://arxiv.org/abs/1306.2164

[4] Jahromi, S., et al. (2022). Variational tensor neural networks for deep learning. arXiv. https://arxiv.org/abs/2211.14657

[5] Patel, R., et al. (2022). Quantum-inspired Tensor Neural Networks for Partial Differential Equations. arXiv. http://arxiv.org/abs/2208.02235

About the Author

Luc Andrea is an Engineering Director at Multiverse Computing specializing in Artificial Intelligence and Quantum Computing. With a PhD in Theoretical Physics and a background as a Data Scientist in the services and industrial sectors, he currently leads teams in developing and deploying cutting-edge AI and Quantum systems tailored for various industry applications.