Welcome to the SEAMAI Special Session, MMM 2025 January 7-10, 2025, Nara, Japan

SEAMAI: Simulating Edge Computing and Multimodal AI: A Benchmark for Real-World Applications

Motivation and Background

In recent years, applications have increasingly been built around very large, sophisticated deep learning models with extensive multimodal designs. These models are ubiquitous, contributing to the emergence of 'AI on the Edge,' where such models and applications run predominantly at the edge of the network rather than in the cloud. Edge devices, characterized by low-end hardware and limited power resources, pose a significant challenge for researchers aiming to deploy these models effectively. While numerous studies have proposed solutions, they typically involve models with only tens of thousands of parameters. Recently, research [3] introduced a solution capable of training models with several million parameters on edge devices. However, these algorithms still leave room for improvement before AI applications can be deployed on the edge seamlessly and at any time.

Task Description

Federated Learning (FL) [1] is a distributed machine learning method that trains models on multiple local devices without transmitting raw data to a central server. However, training large models on resource-constrained devices such as the NVIDIA Jetson and Raspberry Pi poses significant challenges. Split Learning (SPL) [2] addresses this issue by partitioning the model into parts and transferring intermediate results between them, reducing both computational load and communication bandwidth. Research [3] introduced the Adaptive Offloading Point (AOP) solution for transformer encoder models, employing Reinforcement Learning (RL) and Gaussian Mixture Model (GMM) clustering to determine the optimal split point. Comparative results indicate that the AOP approach in [3] significantly outperforms the baseline FedAvg (FL) method [1]. Nonetheless, substantial work remains, particularly in achieving faster processing, reducing computational load on edge devices and servers, optimizing data transmission between clients and servers, and minimizing energy consumption.
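To make the FedAvg baseline [1] concrete, the sketch below shows its core aggregation step: the server averages client model weights, weighted by each client's number of local samples. All function and variable names here are illustrative, not taken from the AOP codebase.

```python
# Minimal FedAvg sketch: server-side weighted averaging of client weights.
# Each client contributes proportionally to its local dataset size.

def fedavg(client_weights, client_sizes):
    """Weighted average of per-client parameter lists."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    avg = [0.0] * n_params
    for weights, size in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            avg[i] += w * (size / total)
    return avg

# Two clients: one with 100 samples, one with 300.
clients = [[1.0, 2.0], [5.0, 6.0]]
sizes = [100, 300]
print(fedavg(clients, sizes))  # [4.0, 5.0]
```

In a real FL round, each client first trains locally for a few epochs before this averaging step; only the weights (never the raw data) cross the network.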

Participants are expected to complete at least two of the following tasks:

  1. Task 1

    Design a Multimodal-Sensing (MM-sensing) model and an offloading algorithm similar to AOP [3], meeting the following requirements:
    • Subtask 1: Design an MM-sensing model that meets the following targets. The full MM-sensing model should have 200M to 300M parameters. Refine this full model down to a 15M-parameter optimized model. The S3D and HBCO models must remain unchanged; adjust only the TFE model. Participants are encouraged to use methods such as pruning and knowledge distillation to achieve this compression.
    • Subtask 2: Develop an alternative to RL that identifies the offloading point faster, for example using convex optimization or genetic algorithms (GA). This approach will implement AOP on the TFE model.
    • The goal of Subtask 1 is to design a full model and an optimized model with faster training times, reduced computational load on both server and client, and faster data exchange than the AOP algorithm in [3]. The goal of Subtask 2 is to develop a completely new AOP solution, replacing the current approach. [Note: the full model must have 200M-300M parameters, and the optimized model 15M parameters.]
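One generic way to approach the compression required in Subtask 1 is magnitude pruning: zero out the smallest-magnitude weights until only a target fraction remains. This is an illustrative sketch of the technique, not the official recipe for shrinking the TFE model.

```python
# Illustrative magnitude pruning: keep the largest weights by |magnitude|,
# zeroing the rest. A real pipeline would apply this per layer, then
# fine-tune the pruned model to recover accuracy.

def magnitude_prune(weights, keep_ratio):
    """Keep roughly the top `keep_ratio` fraction of weights by magnitude."""
    k = max(1, int(len(weights) * keep_ratio))
    threshold = sorted(abs(w) for w in weights)[-k]
    return [w if abs(w) >= threshold else 0.0 for w in weights]

w = [0.9, -0.05, 0.4, -0.7, 0.01, 0.3]
print(magnitude_prune(w, 0.5))  # [0.9, 0.0, 0.4, -0.7, 0.0, 0.0]
```

Pruning alone yields sparse weights; to actually reach 15M stored parameters, participants would combine it with structured removal of heads/layers or distill into a smaller dense architecture.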

  2. Task 2

    Upgrade the solution from Subtask 1 to find the optimal method for data exchange (transmission between client and server) and to minimize both the load on all devices in the system (including CPU/GPU) and energy usage as far as possible.
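A common lever for the data-exchange part of Task 2 is quantizing the intermediate activations sent from client to server, e.g. from float32 (4 bytes) to uint8 (1 byte). The sketch below uses standard affine quantization; the names and scheme are illustrative assumptions, not part of the benchmark code.

```python
# Affine (scale + zero-point) quantization sketch: shrink client->server
# traffic 4x by sending uint8 values plus two floats of metadata.

def quantize(xs):
    lo, hi = min(xs), max(xs)
    scale = (hi - lo) / 255 or 1.0  # avoid div-by-zero for constant inputs
    q = [round((x - lo) / scale) for x in xs]
    return q, scale, lo

def dequantize(q, scale, lo):
    return [v * scale + lo for v in q]

acts = [0.0, 0.5, 1.0, 2.0]
q, scale, lo = quantize(acts)
restored = dequantize(q, scale, lo)
# Reconstruction error is bounded by about half the quantization step.
print(max(abs(a - r) for a, r in zip(acts, restored)))
```

Whether the accuracy cost of quantization is acceptable at a given split point is itself an interesting trade-off to report in the evaluation.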
  3. Task 3

    Design a completely new AOP solution, replacing the current AOP approach, for instance one based on generative AI.
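As a point of comparison for any new AOP solution, the toy baseline below scores every possible split point of an L-layer model by estimated client compute + transfer + server compute, and picks the cheapest. The cost numbers are made up for illustration; a real replacement for RL would measure or learn these costs (e.g. with a GA or a generative model) rather than brute-force a static table.

```python
# Exhaustive split-point search over a simple latency model.
# Layers [0, cut) run on the client; layers [cut, L) run on the server;
# the activation after layer cut-1 (or the raw input, if cut == 0)
# is transferred over the network.

def best_cut(layer_flops, layer_out_bytes, input_bytes,
             client_speed, server_speed, bandwidth):
    """Return the cut index minimizing estimated end-to-end latency."""
    best, best_cost = 0, float("inf")
    for cut in range(len(layer_flops) + 1):
        client = sum(layer_flops[:cut]) / client_speed
        server = sum(layer_flops[cut:]) / server_speed
        sent = input_bytes if cut == 0 else layer_out_bytes[cut - 1]
        cost = client + server + sent / bandwidth
        if cost < best_cost:
            best, best_cost = cut, cost
    return best

flops = [10, 10, 10, 10]      # per-layer compute (arbitrary units)
out_bytes = [100, 50, 5, 1]   # activation size after each layer
print(best_cut(flops, out_bytes, input_bytes=200,
               client_speed=10, server_speed=100, bandwidth=10))  # 3
```

Brute force is fine for a single static profile, but the point of AOP [3] is that device load and bandwidth change over time; the challenge is finding good cuts quickly under those dynamics.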

Datasets & Models

The data consist of videos recorded by cameras mounted on trucks traveling on the Nisso Highway (Japan). Dozens of terabytes of data were collected from various types of trucks. To streamline the dataset, it has been curated and labeled with corresponding events (risk = 1, non-risk = 0) for prediction. Each video in this dataset is one minute long.

The model used in research [3] includes three main models as follows:

  • S3D for the video dataset
  • HBCO for the IoT dataset
  • TFE (transformer encoder) for the offloading point

Performance Evaluation

  • Evaluate the training time on different edge devices compared to previous baseline methods [1,2]. Provide source code that highlights the ideas of the AOP study [3] and any upgrades participants perform.
  • Evaluate the testing time on edge devices and compare it with previous methods. The testing time requirement (inference) should be less than 0.1 seconds per test video.
  • Demonstrate that offloading CPU/GPU tasks reduces energy consumption on edge devices.
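The 0.1-second inference requirement above can be checked with a simple timing harness like the one below; `run_inference` is a hypothetical stand-in for the participant's model call.

```python
# Timing harness sketch for the per-video inference requirement (< 0.1 s).
import time

def mean_latency(run_inference, videos):
    """Average wall-clock seconds per video over the given test set."""
    start = time.perf_counter()
    for v in videos:
        run_inference(v)
    return (time.perf_counter() - start) / len(videos)

# Dummy model that sleeps 1 ms per "video".
lat = mean_latency(lambda v: time.sleep(0.001), range(20))
print(f"mean latency: {lat:.4f} s, pass: {lat < 0.1}")
```

For edge devices, the same loop should be run on the target hardware (e.g. Jetson or Raspberry Pi), since laptop timings say little about on-device latency.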
Questions for Insight

Here are some research questions related to this challenge that participants can try to answer, going beyond the assessment metrics:

  • What advantages does this new method have over RL?
  • Does this method solve the problem of finding the optimal cutting point better than RL? If so, please prove it with experiments.
  • What do the evaluation results show? In what aspects is this method better than RL and GMM [3]?
  • Can this method find cutting points for all DL models? If not, please explain clearly why.
Participant Information

  • Please register your team for SEAMAI by sending an email to seamai@ml.nict.go.jp that clearly states: 1) the team name (full and acronym), 2) the name, affiliation, and email of the team leader, and 3) the names, affiliations, and emails of the team members.
  • Teams may include any number of participants.
  • Do not share or publish source code outside of benchmarking purposes.
  • Do not buy, sell, or exchange source code in any form.
  • Please cite the AOP paper [3] when you use it in your research.
References and Recommended Reading

    1. McMahan, B., Moore, E., Ramage, D., Hampson, S., & Arcas, B. A. y. (2017). Communication-Efficient Learning of Deep Networks from Decentralized Data. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR 54:1273-1282. https://proceedings.mlr.press/v54/mcmahan17a.html
    2. D. Wu, R. Ullah, P. Harvey, P. Kilpatrick, I. Spence, and B. Varghese, "FedAdapt: Adaptive Offloading for IoT Devices in Federated Learning," IEEE Internet of Things Journal, vol. 9, no. 21, pp. 20889-20901, Nov. 2022, doi: 10.1109/JIOT.2022.3176469.
    3. K. A. Tran, N. Do Van, M.-S. Dao, and K. Zettsu (National Institute of Information and Communications Technology, Japan), "Clustering-Enhanced Reinforcement Learning for Adaptive Offloading in Resource-Constrained Devices."
