Tutorials // ACM Multimedia Asia 2023

Tutorial 1: Geometric deep learning and its applications for Multimedia

Speaker:

- Hannes Fassold, JOANNEUM RESEARCH
- Email: hannes.fassold@joanneum.at
- Hannes Fassold received an MSc degree in Applied Mathematics from Graz University of Technology in 2004. Since then he has worked at JOANNEUM RESEARCH, where he is currently a senior researcher in the Intelligent Vision Applications Group of the DIGITAL institute. His main research interest is how to employ machine vision and AI methods to solve real-world problems such as image and video enhancement, defect inspection, and object detection and tracking. He presents regularly at renowned computer vision, multimedia and AI conferences such as ACM Multimedia, ICIP, ICME, AIVR, MVA, MMSP, ASPAI, ISVC, EUSIPCO and GTC. He also reviews papers for several of these conferences and has served as a program committee member. He coordinates the machine learning workflow as well as the dedicated ML hardware and software infrastructure for the DIGITAL institute.

Detailed Description and Outline:

- Geometric deep learning, that is, learning in non-Euclidean domains, is an emerging research area of machine learning. In this tutorial, we give an introduction to geometric deep learning, with a focus on manifold learning, and show how it is employed in important multimedia application fields such as similarity search, image classification, synthesis and enhancement, video analysis, 3D data processing and nonlinear dimension reduction. We will also present open-source software frameworks for geometric deep learning. Finally, as a spotlight, we will present the manifold mixing model soup algorithm, a novel algorithm that mixes the latent space manifolds of several fine-tuned models, which gives the fused model significantly better out-of-distribution performance.
- The tutorial will cover a number of topics from geometric deep learning. A tentative list of the topics is reported hereafter:

Target Audience and Prerequisite Knowledge:

- The tutorial is meant for PhD and postdoctoral students, researchers and practitioners who work with images and videos in any area, including detection, classification, segmentation and retrieval. The reason for proposing this tutorial at ACM Multimedia Asia is to promote the use of geometric deep learning and tap into its potential for all kinds of multimedia applications. A basic understanding of mathematics, image processing and machine learning is a prerequisite.

Tutorial 2: Streaming Media: Algorithms, Protocols and Systems

Speaker:

- Dr. Ali C. Begen, Ozyegin University (Also Comcast NBCUniversal)
  https://ali.begen.net/
- Email: ali.begen@ozyegin.edu.tr
- Ali C. Begen is currently a computer science professor at Ozyegin University and a technical consultant in Comcast's Advanced Technology and Standards Group. Previously, he was a research and development engineer at Cisco. Begen received his PhD in electrical and computer engineering from Georgia Tech in 2006. To date, he has received several academic and industry awards (including an Emmy® Award for Technology and Engineering) and has been granted 30+ US patents. In 2020 and 2021, he was listed among the world's most influential scientists in the subfield of networking and telecommunications. More details are available at https://ali.begen.net/ .

Detailed Description and Outline:

- Streaming is a complex technology with dynamics that need to be studied thoroughly. The experience from the deployments in the last 10+ years suggests that streaming clients typically operate in an unfettered greedy mode and they are not necessarily designed to behave well in environments where other clients exist or network conditions can change dramatically.
- In this tutorial, we will examine the progress made in the streaming space over the last several years, focusing primarily on standards, interop guidelines, workflows, performance indicators, extensions for low latency, collaboration among server, network and client, and research directions. We will also present several open-source tools that attendees can use to explore these topics further in their own practical environments.
- The material for this tutorial is drawn from a variety of sources, including the instructor’s courses and talks at MPEG, IETF, DASH Industry Forum and SCTE, as well as Comcast’s operational encoding experience. The slides will be provided to the participants. Time permitting, example code will be run.

Target Audience and Prerequisite Knowledge:

- This course includes both introductory and advanced material. Attendees are expected to understand basic video coding and IP networking principles. Students, researchers, developers, and content and service providers are all welcome.