In recent times, applications built on very large, sophisticated deep learning models have increasingly adopted extensive, multimodal designs. These models are now ubiquitous, contributing to the emergence of 'AI on the Edge,' where such models and applications run directly on devices at the network edge. Edge devices, characterized by low-end configurations and limited power budgets, present a significant challenge for researchers aiming to deploy these models effectively. While numerous studies have proposed solutions, they typically involve models with only tens of thousands of parameters. Recently, research [3] introduced a solution capable of training models with several million parameters on edge devices. However, these algorithms still leave room for improvement before AI applications can be deployed at the edge seamlessly and at any time.
Federated Learning (FL) [1] is a distributed machine learning method that trains models on multiple local devices without transmitting raw data to a central server. However, training large models on devices with compact configurations, such as the NVIDIA Jetson and Raspberry Pi, remains challenging. Split Learning (SPL) [2] addresses this by partitioning the model into parts and transferring intermediate results between them, reducing both computational load and communication bandwidth. Research [3] introduced the Adaptive Offloading Point (AOP) solution for transformer encoder models, employing Reinforcement Learning (RL) and Gaussian Mixture Model (GMM) clustering to determine the optimal split point. Comparative results indicate that the AOP approach in [3] significantly outperforms the baseline FedAvg (FL) [1] method. Nonetheless, substantial work remains, particularly in achieving faster processing, reducing computational load on edge devices and servers, optimizing data transmission between clients and servers, and minimizing energy consumption.
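To make the split-learning / offloading-point idea concrete, here is a minimal sketch of partitioning a transformer encoder at a cut layer, with the client executing the layers before the cut and the server executing the rest. The architecture, dimensions, and cut index are illustrative assumptions, not the exact model of [3].

```python
import torch
import torch.nn as nn

class SplitEncoder(nn.Module):
    """Transformer encoder split at a cut layer (the 'offloading point')."""
    def __init__(self, d_model=256, n_layers=8, cut=3):
        super().__init__()
        make = lambda: nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.client_part = nn.ModuleList(make() for _ in range(cut))             # runs on the edge device
        self.server_part = nn.ModuleList(make() for _ in range(n_layers - cut))  # runs on the server

    def client_forward(self, x):
        # Runs on the edge device; only the resulting activation is transmitted.
        for blk in self.client_part:
            x = blk(x)
        return x

    def server_forward(self, h):
        # Runs on the server, starting from the received activation.
        for blk in self.server_part:
            h = blk(h)
        return h

model = SplitEncoder(cut=3)
x = torch.randn(1, 16, 256)              # (batch, tokens, d_model)
activation = model.client_forward(x)     # payload sent client -> server
out = model.server_forward(activation)   # computation completed on the server
```

AOP-style methods then treat the cut index as the decision variable, trading client compute against the size of the transmitted activation.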
Participants are expected to complete at least two of the following tasks:
Task 1
Design a Multimodal-Sensing (MM-sensing) model with a model and offloading algorithm similar to AOP [3], meeting the following requirements: the design must comprise a full model and an optimized model that achieve faster training, reduced computational load on both server and client, and faster data exchange than the AOP algorithm in [3]. [Note: the full model is required to have between 200M and 300M parameters, and the optimized model about 15M parameters.] A quick way to verify these parameter budgets is sketched below.
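Since Task 1 fixes explicit parameter budgets, a simple count over a candidate architecture can verify compliance early. The configurations below are only guesses at shapes that land near the budgets; the actual MM-sensing design is up to the participants.

```python
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    """Total number of trainable parameters."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Hypothetical full-model shape aiming at the 200-300M budget (~236M here).
full = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=1280, nhead=16, dim_feedforward=5120),
    num_layers=12,
)
# Hypothetical optimized shape near the ~15M target (~14M here).
small = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=384, nhead=6, dim_feedforward=1536),
    num_layers=8,
)
print(f"full:  {count_params(full) / 1e6:.1f}M parameters")
print(f"small: {count_params(small) / 1e6:.1f}M parameters")
```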
Task 2
Upgrade the solution from Task 1 to find the optimal method for data exchange (transmission between client and server) and to minimize, as far as possible, the load on all devices in the system (including CPU/GPU) and their energy consumption. One illustrative bandwidth-reduction technique is sketched below.
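One common lever for reducing client-server transmission cost is compressing the intermediate activations before sending them. The sketch below shows simple 8-bit uniform quantization of the transmitted tensor; this is an illustrative option assumed here, not a method prescribed by [3], and learned compression or sparsification are alternatives.

```python
import torch

def quantize_uint8(x: torch.Tensor):
    """Quantize a float activation to uint8 plus scale/offset metadata.

    Sending 1 byte per element instead of 4 cuts the payload roughly 4x,
    at the cost of quantization error the server-side layers must tolerate.
    """
    lo, hi = x.min(), x.max()
    scale = (hi - lo).clamp(min=1e-8) / 255.0
    q = ((x - lo) / scale).round().clamp(0, 255).to(torch.uint8)
    return q, scale.item(), lo.item()

def dequantize_uint8(q: torch.Tensor, scale: float, offset: float) -> torch.Tensor:
    return q.to(torch.float32) * scale + offset

activation = torch.randn(1, 16, 256)           # client-side intermediate output
q, scale, offset = quantize_uint8(activation)  # compact payload actually transmitted
restored = dequantize_uint8(q, scale, offset)  # reconstructed on the server
print("max abs error:", (restored - activation).abs().max().item())
```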
Task 3
Design a completely new AOP solution to replace the current one, for instance using Generative AI.

The data consist of videos extracted from cameras mounted on trucks traveling on the Nisso Highway (Japan). Dozens of terabytes of data were collected from various types of trucks. To streamline the dataset, it has been curated and labeled with corresponding events (risk = 1, non-risk = 0) for prediction. Each video in this dataset is one minute long. A minimal loading sketch for clips in such a format follows below.
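As a starting point for handling the one-minute clips, here is a minimal sketch assuming a hypothetical layout of video files plus a CSV of binary labels; the paths, file names, and column names are placeholders, since the challenge's actual data format is not given in this section.

```python
import csv
from pathlib import Path

import torch
from torch.utils.data import Dataset
from torchvision.io import read_video

class TruckRiskDataset(Dataset):
    """One-minute dashcam clips with binary risk labels.

    Assumed (hypothetical) layout:
        data/videos/<clip_id>.mp4
        data/labels.csv with columns: clip_id,label   (1 = risk, 0 = non-risk)
    """
    def __init__(self, root="data", fps=1):
        self.root = Path(root)
        self.fps = fps  # subsample frames; 1 fps -> ~60 frames per one-minute clip
        with open(self.root / "labels.csv") as f:
            self.items = [(row["clip_id"], int(row["label"])) for row in csv.DictReader(f)]

    def __len__(self):
        return len(self.items)

    def __getitem__(self, i):
        clip_id, label = self.items[i]
        frames, _, meta = read_video(str(self.root / "videos" / f"{clip_id}.mp4"),
                                     pts_unit="sec", output_format="TCHW")
        step = max(1, round(meta["video_fps"] / self.fps))
        frames = frames[::step].float() / 255.0  # (T, C, H, W) scaled to [0, 1]
        return frames, torch.tensor(label)
```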
The architecture used in research [3] includes three main models, as follows:
Here are some research questions related to this challenge that participants can try to answer, going beyond the assessment metrics alone: