Imaging and Vision Special Event at Display Week 2023

 

One of the exciting highlights for Display Week this year is a new Imaging & Vision Special Event. Recent advances in image sensing, computer graphics, and computer vision, aided by artificial intelligence and machine learning techniques, are enabling new intelligent and interactive systems and applications. The ecosystem that SID serves increasingly includes end-to-end imaging and vision technologies, from capture to display. This one-day program will be held on Thursday, May 25, and will feature keynotes, tutorials, invited presentations, and live demonstrations.

Thursday, May 25, 2023

 Los Angeles, CA

9:00 am to 5:00 pm

Room 304ABC

 

 

Time: 9:00 – 9:20 am

Opening Remarks

Achin Bhowmik
CTO & EVP of Engineering
Starkey
SID President

 

Time: 9:20 – 10:00 am

KEYNOTE: The Future World by “Sensors x AI”

Eita Yanagisawa

Senior General Manager

Sony Semiconductor Solutions Corp.,

System Solutions Business Division

 

With the expansion of IoT, digital transformation (DX) is expected to accelerate in various markets, and image sensors will play significant roles in many DX scenarios. Under these circumstances, Sony Semiconductor Solutions Corporation (SSS) has started a new initiative that supports creating and operating edge AI sensing solutions. This talk is an introduction to industry trends surrounding image sensors and to the technology and potential of Sony’s intelligent vision sensors and edge AI sensing platform, AITRIOS.

 

Eita Yanagisawa joined Sony Corporation (currently Sony Group Corporation) in 2005, engaging in planning and control for the semiconductor business at Sony Semiconductor Solutions headquarters. Following this role, he was put in charge of starting up a new business for image sensors in the United States. After returning to Japan, he led the launch of Sony’s sensing device business, including the execution of M&A and alliance deals. In December 2020, he was appointed deputy senior general manager of the System Solutions Business Division, which developed a new sensing platform. In October 2021, he was appointed senior general manager of the same division.

 

 

 

 

 

  

Time: 10:00 – 10:30 am

 

From Cameras to Displays, End-to-End Optimization Empowers High Imaging Fidelity

Evan Y. Peng

Assistant Professor 

University of Hong Kong

 

From cameras to displays, visual computing systems are becoming ubiquitous. However, their underlying design principles have stagnated after decades of evolution. Existing imaging devices require dedicated hardware that is not only complex and bulky, but also delivers suboptimal results in certain visual computing scenarios. This shortcoming stems from a lack of joint design between hardware and software, which, importantly, impedes the delivery of vivid 3D visual experiences on displays. By bridging advances in machine intelligence and optics, this work engineers physically compact yet functionally powerful imaging solutions for cameras and displays, with applications in photography, wearable computing, IoT products, autonomous driving, and VR/AR/MR. In this talk, two classes of end-to-end optimized imaging modalities will be described. In Neural Optics, we jointly optimize lightweight diffractive optics and differentiable image-processing algorithms to enable high-fidelity imaging in domain-specific cameras. In Neural Holography, we apply the unique combination of machine intelligence and physics to solve long-standing problems of computer-generated holography. As a result, full-color, high-quality holographic images can be generated in real time.
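
The end-to-end idea can be made concrete with a toy example: both the optical element and the reconstruction algorithm are expressed as differentiable operators, so a single loss updates them together. The sketch below is a minimal, hypothetical PyTorch illustration of that principle, not Peng’s implementation; a learnable blur kernel stands in for the diffractive optic and a small network stands in for the image-processing stage.

```python
# Minimal end-to-end "optics + reconstruction" sketch (hypothetical; not Peng's code).
# A learnable convolution kernel stands in for the diffractive optic; a small
# CNN stands in for the image-processing pipeline. One loss updates both.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableOptic(nn.Module):
    """Stand-in for a differentiable optical layer (here: a learnable PSF)."""
    def __init__(self, kernel_size=11):
        super().__init__()
        self.psf_logits = nn.Parameter(torch.rand(1, 1, kernel_size, kernel_size))

    def forward(self, x):
        # Normalize so the simulated PSF conserves energy, then blur the scene.
        psf = torch.softmax(self.psf_logits.flatten(), dim=0).view_as(self.psf_logits)
        return F.conv2d(x, psf, padding=self.psf_logits.shape[-1] // 2)

class Reconstructor(nn.Module):
    """Stand-in for the learned image-processing stage."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

optic, recon = LearnableOptic(), Reconstructor()
opt = torch.optim.Adam(list(optic.parameters()) + list(recon.parameters()), lr=1e-3)

for step in range(100):
    scene = torch.rand(8, 1, 64, 64)        # random training "scenes"
    measurement = optic(scene)               # simulated sensor measurement
    estimate = recon(measurement)            # learned reconstruction
    loss = F.mse_loss(estimate, scene)       # one loss drives optic and network
    opt.zero_grad()
    loss.backward()
    opt.step()
```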

 

Evan Y. Peng is currently an assistant professor at the University of Hong Kong. Before joining HKU, he was a postdoctoral research scholar at Stanford University. He received his PhD in computer science from the Imager Lab at the University of British Columbia, and both his MSc and BS in optical science and engineering from the State Key Lab of Modern Optical Instrumentation, Zhejiang University. Peng has been working on several Neural + X projects for cameras, displays, microscopes, and rendering. His unique and cross-disciplinary approach to research has been well received in multiple professional communities (OPTICA, SIGGRAPH, IEEE, SPIE, and SID). He is the recipient of an AsiaGraphics Young Researcher Award (2022).



 

 

Time: 10:30 – 11:00 am

 

Equitable Imaging: How Can Vision and Graphics Enable Sensing for All?

Achuta Kadambi 

Assistant Professor

UCLA

 

Today, billions of light-based medical sensors are used by hospitals to measure quantities like blood flow, temperature, oxygenation, and more. Clinical decision making is partially based on the measurements from these sensors, so it is important that they measure data robustly. Unfortunately, the accuracy of light-based devices varies across demographics. Just as a soap dispenser may not always work for those with dark skin, a light-based medical device faces fundamental challenges in signal-to-noise ratio (SNR) and measurement accuracy. To solve this problem and make devices more inclusive and even more accurate (for everyone), we need to rethink the sensing process. We draw from a paradigm of “equitable computational imaging,” where we co-design the optical sensing and computation layers to resist bias. Removing biases in both hardware and software attacks the root causes of bias at the physical layer (e.g., light-based reflections). We will discuss new types of equitable imaging systems that measure heart rate and blood volume (contact-free and wirelessly); synthetic data pipelines that model melanin content; and theoretical results on dataset composition. By building novel sensors, simulators, and AI pipelines together, we are able to demonstrate medical devices that, when deployed at UCLA’s hospital, appear to reduce bias and also improve accuracy for everyone.
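
For readers unfamiliar with contact-free vital sensing, the usual starting point is remote photoplethysmography: average a color channel over a skin region, band-pass to the pulse band, and read off the dominant frequency. The robustness of exactly this kind of weak optical signal is what varies with skin tone. The sketch below is a generic textbook baseline under those assumptions, not the equitable-imaging pipeline from the talk, and the face-region frames are assumed inputs.

```python
# Generic rPPG baseline (illustrative only; not the equitable pipeline from the talk).
# Input: a stack of face-ROI video frames; output: estimated heart rate in bpm.
import numpy as np
from scipy.signal import butter, filtfilt

def estimate_heart_rate(frames, fps):
    """frames: (T, H, W, 3) uint8 array of face-ROI frames; fps: frame rate."""
    # 1. Spatially average the green channel (strongest pulsatile component).
    trace = frames[..., 1].reshape(frames.shape[0], -1).mean(axis=1)
    trace = trace - trace.mean()

    # 2. Band-pass to the physiological pulse range (0.7-4 Hz, i.e. 42-240 bpm).
    b, a = butter(3, [0.7, 4.0], btype="band", fs=fps)
    filtered = filtfilt(b, a, trace)

    # 3. Take the dominant spectral peak as the pulse frequency.
    spectrum = np.abs(np.fft.rfft(filtered))
    freqs = np.fft.rfftfreq(len(filtered), d=1.0 / fps)
    return 60.0 * freqs[np.argmax(spectrum)]

# Example with synthetic data: a 1.2 Hz (72 bpm) modulation on noisy frames.
fps = 30
t = np.arange(300) / fps
frames = (np.random.rand(300, 32, 32, 3) * 20 + 100
          + 2 * np.sin(2 * np.pi * 1.2 * t)[:, None, None, None]).astype(np.uint8)
print(f"{estimate_heart_rate(frames, fps):.0f} bpm")  # ~72 bpm
```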

 

Achuta Kadambi received his PhD from MIT and joined UCLA as an assistant professor in EECS, where he leads a computational imaging research group focusing on digital humans and equitable medical devices. He has received early career recognitions from the NSF (CAREER), DARPA (YFA), and the Army (YIP). He has also received the IEEE-HKN Outstanding Young Professional Award (under 35), as well as other honors such as Forbes 30 Under 30 in Science. He holds 20+ US patents for inventions in computational imaging and has co-authored the textbook Computational Imaging (MIT Press, available as a free PDF). He has also co-founded two computational imaging startup companies, one of which was acquired by Alphabet in 2022.



 

 

Time: 11:00 – 11:30 am

 

Energy-Efficient Machine Learning and Applications

Evgeni Gousev 

Senior Director 

Qualcomm AI Research, Qualcomm Technologies, Inc. 

Chairman, Board of Directors 

tinyML Foundation

 

Recent progress in computing hardware, machine learning algorithms, and networks, as well as the availability of large datasets for model training, has created strong momentum in the development and deployment of game-changing AI applications. Intelligent devices with human-like senses have enabled a variety of new use cases and applications, transforming the way we interact with each other and our surroundings. Dedicated hardware is becoming tiny and very energy efficient (consuming a milliwatt or less), algorithms and models smaller (down to tens of kilobytes of memory), and software lighter. This enormous technology wave and fast-growing ecosystem create strong momentum toward new applications and business opportunities. On the physical-world side of the spectrum, sensors are becoming more sophisticated, able to sense a variety of modalities (vision, sound, environmental, motion/vibration, etc.), and are being deployed in the billions. This presentation will review the state of the art in energy-efficient machine learning (including hardware, algorithmic, and software-framework aspects), describe some examples of technologies and products, and illustrate use cases (including some display power-saving use cases). We will highlight Always-on Computer Vision, a technology and product pioneered by Qualcomm, which combines innovations in system architecture, ultra-low-power design, and dedicated hardware for CV algorithms running at the “edge.” With low end-to-end power consumption (less than 1 mW), a tiny form factor, and low cost, always-on computer vision modules can be integrated into a wide range of battery- and line-powered devices (IoT, mobile/laptop, VR/AR, automotive, etc.), performing object detection, feature recognition, change/motion detection, and other applications.
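
To make the “tens of kilobytes” figure concrete, the hedged sketch below shows one common route to such footprints: a very small Keras model converted with post-training int8 quantization in TensorFlow Lite. It is a generic illustration of the tinyML workflow, not Qualcomm’s always-on computer vision stack, and the model architecture and dataset are arbitrary placeholders.

```python
# Generic tinyML sizing sketch (illustrative; not Qualcomm's always-on CV stack).
# A very small CNN is converted with post-training int8 quantization, which is
# one common way models reach the tens-of-kilobytes range quoted above.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, activation="relu", input_shape=(32, 32, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation="softmax"),   # e.g. "person / no person"
])

def representative_data():
    # Calibration samples for quantization (random stand-ins here).
    for _ in range(100):
        yield [np.random.rand(1, 32, 32, 1).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
print(f"Quantized model size: {len(tflite_model) / 1024:.1f} kB")  # a few kB
```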

 

Evgeni Gousev is a senior director of Qualcomm AI Research. He leads Qualcomm’s R&D organization in the Bay Area and is also responsible for developing ultra-low-power embedded computing platforms, including always-on machine vision. He serves as the chairman of the board of directors of the tinyML Foundation (www.tinyML.org), a non-profit organization of more than 14k professionals in 37 countries worldwide. The foundation is focused on supporting and nurturing the fast-growing branch of ultra-low-power machine learning technologies and approaches dealing with machine intelligence at the very edge. Evgeni joined Qualcomm in 2005 and led technology R&D in the MEMS Research and Innovation Center, commercializing mirasol display technology. He earned a PhD in solid-state physics and an MS in applied physics. After graduation, Gousev joined Rutgers University, first as a postdoctoral fellow and then as a research assistant professor. While at Rutgers, he performed fundamental research in the area of advanced gate dielectrics for CMOS devices, which, a decade later, became industry-wide standards. In 1997, he was a visiting professor with the Center for Nanodevices and Systems, Hiroshima University, Japan. Shortly after, he joined IBM, where he led projects in the field of advanced silicon technologies at the Semiconductor Research and Development Center in East Fishkill and the T.J. Watson Research Center in Yorktown Heights, NY. He has co-edited 26 books and published more than 166 papers (with over 11k citations and an h-index of 49, per Google Scholar). He holds more than 100 issued and filed patents. Gousev is a member of several professional boards, committees, panels, and societies. In 2020, he was inducted into the “Hall of Fame” of the SEMI MEMS and Sensors Industry Group.


 

 

Time: 11:30 am – 12:00 pm  

 

Visual-Inertial Tracking

Oskar Linde 

Chief XR Technology Architect  

Meta

 

Oskar Linde is the Chief XR Technology Architect at Meta, responsible for developing next-generation AR and VR machine perception and graphics technologies. Linde has extensive experience developing real-time machine perception systems. His company 13th Lab shipped the world’s first consumer product using visual simultaneous localization and mapping (SLAM) technology in 2011. This technology was acquired by Meta in 2014 and became the foundation for Oculus Insight and Meta’s Presence Platform that enables 6-degrees-of-freedom-tracked AR and VR headsets and controllers.

 

Time: 1:00 – 1:40 pm

KEYNOTE: The Future of Display and Interaction with AI: From Super-Resolution to Super-Intelligence  

Steven Bathiche 

Technical Fellow, VP  

Microsoft 

 

Advances in artificial intelligence (AI) and its new silicon, neural processing units (NPUs), bring powerful opportunities to the display-and-interaction field. This talk will explore the impacts on three key areas: (1) Real-time compression and upscaling: We’ll look at the impact of large AI models such as super-resolution, neural codecs, and generative image models on real-time streaming and rendering. (2) Telepresence and remote-interaction experiences: We’ll see how AI continues to benefit real-time communication and collaboration through audio-quality enhancement, image-warping techniques (such as for maintaining eye contact), and generative techniques for 3D and multiview experiences. (3) Changing how we interact with machines: We’ll examine how foundational AI and large language models (LLMs), such as those being built by Microsoft and its partner OpenAI, will open up new vistas of natural user interaction on an order not seen since the introduction of the mouse more than a generation ago. (AI is the new mouse!)

 

Steven Bathiche leads Microsoft’s Applied Sciences Group, an interdisciplinary team of scientists, AI researchers, and product engineers in the Windows and Devices organization. His guiding purpose is to evolve the computer as a tool that helps people achieve their intent. He does this by inventing new interaction technologies that remove barriers, reduce friction, and extend ability. He has helped evolve the PC architecture with neural processors, and is engaged in developing devices, software, and experiences powered by foundational AI models for the Windows platform. A Microsoft Technical Fellow, Bathiche has been inventing and shipping new devices and experiences since 1999, including the original Surface table. He holds more than 120 patents.  He is a Fellow of the Society for Information Display (SID) and has been honored as an innovator by the IEEE.  He is a frequent speaker at international conferences, and his work has been featured in both technical publications and popular media.

 


 

 

Time: 1:40 – 2:10 pm

 

 

   

Better Understanding of Human Color Vision for Image Quality

Minchen Tommy Wei

Associate Professor,

Director of Color Imaging and Metaverse Research Center

Hong Kong Polytechnic University

 

Imaging systems aim to produce images for a better user experience. The development of new technologies over the past decades has introduced opportunities and challenges at the same time. Some of these challenges need to be addressed with a better understanding of human color vision. This talk presents some of the latest work carried out at the Color Imaging and Metaverse Research Center at Hong Kong PolyU.

 

Minchen Tommy Wei is an associate professor and the director of the Color Imaging and Metaverse Research Center at The Hong Kong Polytechnic University. He obtained his bachelor’s degree from Fudan University in 2009 and earned a master of science degree (2011) and PhD (2015) from Pennsylvania State University. His research focuses mainly on fundamental color science, color management and applications for imaging and metaverse systems (e.g., displays, cameras, AR/VR/MR), and illuminating engineering. He has strong collaborations with various industrial partners. Currently, Wei is the vice president of CIE and serves as an associate editor for the Journal of the Optical Society of America A, Color Research & Application, and LEUKOS (Journal of the Illuminating Engineering Society). In 2021, he received a Google Research Scholar Award.

Time: 2:10 – 2:40 pm 

 

Virtual Human Technology: From 2D Images to 3D Avatars

Victor Erukhimov 

Co-Founder, CEO 

Itseez3D

 

Avatar SDK is the leader in creating realistic 3D avatars from selfies, used in AR/VR, the metaverse, e-commerce, and other 3D experiences. We will talk about the state of the art and the challenges of creating recognizable avatars, including avatar customization and the uncanny valley. We will also discuss measuring avatar likeness and deep fakes.

Victor Erukhimov is a co-founder and the CEO of Itseez3D, a company that works to make everyone feel like themselves in the virtual world. Previously he co-founded Itseez, a company that focused on developing computer vision solutions running on embedded platforms, specifically automotive safety systems. He held the positions of CTO, CEO, and president at Itseez before the company was acquired by Intel Corp. in 2016. Erukhimov chaired the OpenVX working group from 2012 to 2016, creating the standard for a cross-platform computer vision API. He is the author of the OpenVX Programming Guide, many papers in the areas of computer vision and machine learning, and several US and international patents. He participated in the development and maintenance of the OpenCV library.


 

 

Time: 2:40 – 3:10 pm

OpenCV (the Open Source Computer Vision Library): Overview, Future Directions, and Relevant Examples

 

Gary Bradski

 

Founder, VP

farm-ng

 

This talk will give an overview of OpenCV (the Open Source Computer Vision Library), what is in it now, and future directions. I will then discuss some examples of its use, including human body and hand pose tracking, human segmentation, and recent examples of calibrating cameras to robot body pose. Along the way, I will give an example of using ChatGPT and other tools to make computer vision application programming in Python fast and easy.
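
As a flavor of the kind of OpenCV usage the talk will draw on, the sketch below runs the library’s standard chessboard-based camera calibration from the calib3d module. It is a generic example, not taken from the talk, and the image directory name is an assumption.

```python
# Standard OpenCV chessboard calibration sketch (generic example; image paths assumed).
import glob
import cv2
import numpy as np

pattern = (9, 6)                       # inner corners of the chessboard
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in glob.glob("calib_images/*.jpg"):   # assumed location of calibration shots
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        # Refine corner locations to sub-pixel accuracy.
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001))
        obj_points.append(objp)
        img_points.append(corners)

# Recover the intrinsic matrix and lens distortion coefficients.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("RMS reprojection error:", rms)
print("Camera matrix:\n", K)
```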

 

Gary Bradski is a leading entrepreneur and researcher in computer vision and machine learning. He founded, and is still president of, OpenCV (http://opencv.org/), the most popular computer vision library in the world. He organized the computer vision team for Stanley, the autonomous car that won the $2M DARPA Grand Challenge (now in the Smithsonian Air and Space Museum), which in turn kicked off the autonomous driving industry. Bradski served as a visiting professor in Stanford University’s computer science department for seven years. He helped develop one of the first video search startups, VideoSurf (sold to Microsoft in 2011). He founded Industrial Perception, Inc. (sold to Google in 2013), after which he founded the Silicon Valley office of Magic Leap. He co-founded Arraiy (sold to Matterport in 2019). Bradski served as a technical fellow for Gauss Surgical (sold to Stryker in 2021). He invests in and serves on the boards and advisory boards of over a dozen startups, and is a founder and chief scientist of OpenCV.ai, an AI and computer vision contracting company, and a co-founder of CVat.ai, an AI data labeling and training service, but spends most of his time as a founder and VP of farm-ng, a maker of modular autonomous electric tractors.

 

 

Time: 3:10 – 3:40 pm

AI Art Generation

Satya Mallick

Founder

Big Vision LLC

 

In this talk, we will learn how to generate art using AI without writing a single line of code. We will introduce the tools available for creating art with AI, explain how they work, and show how to use them to create stunning imagery. We will explain Stable Diffusion, prompt engineering, and inpainting with examples. In addition, we will cover concepts such as fine-tuning Stable Diffusion, controlled image editing, and video animation.
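
The talk itself is deliberately no-code, but for readers who want to peek underneath the tools, the hedged sketch below runs a basic text-to-image Stable Diffusion prompt with the open-source diffusers library; the checkpoint name and parameters are illustrative assumptions, not tools covered in the session.

```python
# Minimal text-to-image sketch with the open-source diffusers library
# (an illustrative assumption; the talk itself focuses on no-code tools).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",          # assumed public checkpoint
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")                          # float16 inference needs a CUDA GPU

# Prompt engineering in its simplest form: describe subject, style, and lighting.
prompt = ("a watercolor painting of a lighthouse at sunset, "
          "soft pastel colors, dramatic clouds, highly detailed")
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("ai_art.png")
```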

 

Satya Mallick is an entrepreneur working in artificial intelligence, computer vision, and machine learning. Before starting his AI consulting company, Big Vision LLC, he co-founded Sight Commerce, Inc. (formerly Taaz, Inc.), where he built AI products that reached over 100M users. His work has been covered in publications like TechCrunch, Huffington Post, The New York Times, and The Wall Street Journal. Mallick is the creator of some of the most popular online courses in computer vision and AI. He also serves as the CEO of OpenCV.org and its non-profit arm, OpenCV.AI.

 

 

 

Time: 3:40 – 4:10 pm

XR Picture Quality Beyond Display and Optical Hardware Limitations – How to Achieve It Using Computational Imaging Techniques and Eye Tracking

Eugene Panich

Co-Founder

Almalence

 

Almalence presents a set of ISP techniques that achieve high picture quality beyond optical hardware limitations in VR/AR head-mounted displays by dynamically compensating for the optical performance deficiencies specific to the optical path at the given eye position. Optical hardware design constraints inherent to head-mounted displays compromise the image quality and visual experience in VR/AR. Even the latest and greatest devices retain common flaws that spoil the user experience: poor image detail, blur and color fringing outside of a small “sweet spot,” geometry distortion when changing gaze direction, and low picture resolution of the outer world captured via pass-through cameras. In the session, we will explore why dynamic optical aberration correction, distortion correction, and super-resolution algorithms are indispensable to achieving high visual quality in head-mounted displays. We will also see examples of the picture quality improvement these computational techniques produce on the highest-end VR displays from Pico, Varjo, and HP.
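
As a simplified illustration of gaze-dependent correction (not Almalence’s ISP), the sketch below pre-warps a rendered frame with OpenCV’s remap, rebuilding the correction maps around whatever optical center the eye tracker reports; the one-coefficient radial model is purely hypothetical.

```python
# Toy illustration of gaze-dependent pre-distortion with cv2.remap
# (a stand-in sketch; not Almalence's ISP, and the radial model is hypothetical).
import cv2
import numpy as np

def build_correction_maps(width, height, eye_x, eye_y, k1=-0.15):
    """Radial pre-warp centered on the current eye position (hypothetical model)."""
    xs, ys = np.meshgrid(np.arange(width, dtype=np.float32),
                         np.arange(height, dtype=np.float32))
    # Normalized coordinates relative to the (gaze-dependent) optical center.
    nx = (xs - eye_x) / width
    ny = (ys - eye_y) / height
    r2 = nx * nx + ny * ny
    scale = 1.0 + k1 * r2               # simple one-coefficient radial term
    map_x = eye_x + nx * scale * width
    map_y = eye_y + ny * scale * height
    return map_x, map_y

frame = np.full((480, 640, 3), 128, np.uint8)       # stand-in rendered frame
cv2.putText(frame, "XR", (260, 260), cv2.FONT_HERSHEY_SIMPLEX, 3, (255, 255, 255), 4)

# Re-build the maps whenever the eye tracker reports a new gaze position.
map_x, map_y = build_correction_maps(640, 480, eye_x=400.0, eye_y=220.0)
corrected = cv2.remap(frame, map_x, map_y, interpolation=cv2.INTER_LINEAR)
cv2.imwrite("pre_warped_frame.png", corrected)
```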

 

Eugene Panich has always been passionate about pushing the boundaries of what is possible. He is a co-founder of Almalence, a team of talented engineers with a track record of creating breakthrough digital signal-processing technologies that changed how images are captured and displayed in consumer devices such as smartphones and VR displays. 

 

 

 

Time: 4:10 – 4:50 pm

Tutorial: Achieving the Visual Turing Test: Integrated Display and Eye Tracking Technologies

 

Mantas Žurauskas

Research and Development Lead for Optical Sensors and Systems 

Meta Reality Labs

 

With AR and VR displays, we aim to deliver visual experiences that are indistinguishable from reality, a benchmark we call the Visual Turing Test. This sets a high bar for the technology used to deliver photons to the user’s eyes. The resolution should match the resolution of the human eye for all gaze angles across the entire visual field. The system should account for a standard range of eye accommodation, and the interface optics should correct for, or be compatible with, prescription correction. The dynamic range and peak brightness of the system should meet the levels the user can experience in real life, in both indoor and outdoor environments. The performance for each characteristic can be achieved individually in proof-of-concept bench-top setups, but to reach the desired level of performance in a wearable device, novel methods for targeted light delivery are needed. Solving this globally is impossible; therefore we seek to solve it locally as a function of eye state. Designing such a system requires both continuous knowledge of the eye’s state and knowledge of the anticipated optical performance under that state, so that the combined system can be corrected effectively. We anticipate that, in the near term, the benefit of early integration and co-design of display and eye-tracking sub-systems will allow us to make the best possible tradeoffs between performance and system complexity. In the long term, to ensure the efficient use of generated display photons and image-rendering compute, eye tracking and light delivery will have to converge into a seamless single system. Here we will provide a brief overview of the optimization space that can be leveraged by combining state-of-the-art display and eye-tracking technology. We will also provide a rationale for considering display and eye-tracking technologies as two inseparable architectural elements of a single light-projection system.
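
A back-of-the-envelope calculation shows why gaze knowledge matters so much for photon and compute budgets. Using a commonly cited hyperbolic model of acuity falloff with assumed constants (roughly 60 cycles per degree at the fovea, dropping to half its foveal value at roughly 2.3 degrees of eccentricity), the sketch below compares a uniform-resolution pixel budget with a gaze-contingent one; it is an illustrative estimate, not a Reality Labs model.

```python
# Back-of-the-envelope foveation sketch (assumed constants; not Meta's model).
# Visual acuity falls off roughly hyperbolically with eccentricity; knowing the
# gaze point therefore lets a display concentrate resolution where it matters.
import numpy as np

PEAK_ACUITY_CPD = 60.0   # assumed foveal acuity, cycles per degree
E2_DEG = 2.3             # assumed eccentricity at which acuity halves

def required_pixels_per_degree(eccentricity_deg):
    """Nyquist sampling needed to match a simple hyperbolic acuity falloff."""
    relative_acuity = E2_DEG / (E2_DEG + eccentricity_deg)
    return 2.0 * PEAK_ACUITY_CPD * relative_acuity   # 2 samples per cycle

ecc = np.linspace(0.0, 50.0, 501)                    # 0-50 deg from the gaze point
ppd = required_pixels_per_degree(ecc)

uniform_budget = required_pixels_per_degree(0.0) * ecc[-1]   # no eye tracking
foveated_budget = np.sum(ppd) * (ecc[1] - ecc[0])             # gaze-contingent
print(f"1D pixel budget, uniform:  {uniform_budget:.0f}")
print(f"1D pixel budget, foveated: {foveated_budget:.0f}")
print(f"savings: {1 - foveated_budget / uniform_budget:.0%}")
```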

 

Mantas Žurauskas is an optical systems scientist with over 12 years of experience in the research and development of precision imaging and sensing systems. His areas of interest include optical coherence tomography imaging, wavefront sensing, super-resolution microscopy, adaptive optics, and eye tracking. In the past he worked at several research institutions, including the University of Illinois at Urbana-Champaign, the University of Oxford, the University of Kent, and Imperial College London. Prior to joining Meta, he served as CTO at LiveBx LLC, a microscopy start-up. Currently, Žurauskas leads the research and development of optical sensors and systems at Meta Reality Labs.