By: Bob Tamburri
Building the VR Conferencing Solution
Previously, we presented a scenario of what a Virtual Reality Conferencing (VRC) experience might be like and explored the roots of VR and how we got where we are. In this segment, we will piece together the constituent components and technology that will be required to make this VR engine run. Actually, more than simply run: it must provide us with an immersive world that is transparent to the user and allows us to be totally engaged without the technology getting in the way. No small task, to be sure, so let’s take a look at what we’d need to realize this goal.
It Begins with Hardware
First, we need to consider the hardware. It’s safe to say that bulky and obtrusive head-worn apparatus will simply not fly. The sleek goggles mentioned in the scenario would need to be a highly streamlined version of the face-warts available today. Given the recent achievement of “retina” displays for near-field use, this should not be very difficult. Keeping the form factor low will require the use of thin-film, flexible OLED displays and lightweight carbon-fiber composites, while keeping much, if not all, of the processing and power supplies remote.
One caveat is that there are reasons for the form factor of today’s VR headsets. First is their thickness, which causes them to hang off your face like an alien parasite. Most are between three and four inches thick, which at first sounds like unnecessarily bulky design, but the reason is simple optics: the human eye cannot easily focus on images much closer than that. This can be compensated for mechanically by placing adaptive optical lenses in front of the display panels, as is done in currently available VR headsets, in effect moving the apparent image out to a comfortable viewing distance. Unfortunately, this does not work for everyone’s vision type, and it adds to the bulk of the design. A secondary method could be customized software that changes the way the image is displayed at close range, enabling the view to be optimized for each user.
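The optics at work here can be sketched with the thin-lens equation. A short sketch, assuming illustrative numbers (a 40 mm focal-length lens with the panel 35 mm away; no real headset's specs are implied): a display placed just inside the lens's focal length produces a virtual image much farther out, which is what lets the eye focus on a panel an inch or so from the face.

```python
def virtual_image_distance(f_mm, d_mm):
    """Thin-lens approximation: a display at distance d_mm, placed
    inside the lens focal length f_mm, yields a virtual image at the
    returned (positive) distance in front of the eye."""
    # 1/f = 1/d + 1/v  ->  v = 1 / (1/f - 1/d); v < 0 means virtual image
    v = 1.0 / (1.0 / f_mm - 1.0 / d_mm)
    return -v

# Illustrative: panel 35 mm away behind a 40 mm lens appears at 280 mm,
# a distance the eye can comfortably accommodate.
print(virtual_image_distance(40.0, 35.0))
```

This is also why such lenses cannot suit every wearer: the comfortable image distance depends on the individual eye, which is the accommodation problem noted above.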
The second issue is the sealed mask portion of the headset. This is necessary for two reasons: one is to help bear the weight of the headset itself by providing a cushion; the second is to keep outside light from distracting from the displayed images, leading to a more immersive experience.
The only way to completely overcome these issues would be to eliminate the headset altogether in favor of holographic projection displays. Needless to say, this would require much bulkier, less portable, and more expensive hardware, and the technology for it is likely decades away. (Sorry, no holodeck yet.) Also, projecting images onto the existing environment would not provide the level of realism or control offered by an isolated, head-worn system. This is of particular interest to gamers, who insist on ultimate realism. However, headset-style holographic displays are already in development. Innovations such as Microsoft’s HoloLens™ provide an augmented reality (AR) environment by generating holographic projections on a heads-up display (HUD) inside the goggles and overlaying them on the real world. This may have possibilities for conferencing, or may be incorporated in some way along with VR, as we will explore later.
Processing, Processing, Processing
Next, advanced digital signal processing will not only manage all aspects of the virtual 3D environment, but can allow us to incorporate other information as well. As we mentioned, the use of avatars is likely to be preferable, as it negates the need for video equipment and allows us to interact with high-definition CGI characters that could look indistinguishable from the actual person. The characters would be projected into the virtual meeting space, along with a customized selection of apparel, hair, even makeup. An added benefit is that participants can appear their best, even when their physical selves do not. Admittedly, the example of some of us jumping out of bed and attending a conference call in our pajamas is perhaps a bit extreme, though, be honest, it’s a reality we all face. The point is, anyone can attend a meeting at any time, from anywhere, and in any condition. (Yes, we can mostly do that now, just not as gracefully.)

The downside is that this forgoes the ability to read people’s facial expressions and body language. It will therefore be necessary to capture each person’s expressions and gestures and integrate them with the virtual character. In theory, this can be done through facial recognition sensors placed on or in the goggle frames. These may be optical lasers or IR motion sensors that read the person’s real facial expressions, eye movements, body movements, and hand gestures, then map them to the person’s avatar. It may even be possible to have someone attend a meeting in your place, say, if you were out sick, without anyone else knowing it wasn’t you. They would simply assume your avatar, using your image and your voice, though this may push the boundaries of ethical business protocol. Just sayin’.
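The mapping step can be pictured as a simple normalization: raw sensor readings become avatar animation weights in the range 0 to 1. A minimal sketch, with entirely hypothetical sensor names and calibration ranges (no actual tracking API is implied):

```python
def to_blendshape_weights(raw_readings, calibration):
    """Map raw expression-sensor readings to normalized avatar
    blendshape weights in [0, 1]. Sensor names and calibration
    ranges here are hypothetical placeholders."""
    weights = {}
    for name, value in raw_readings.items():
        lo, hi = calibration[name]
        # normalize into the calibrated range, then clamp to [0, 1]
        w = (value - lo) / (hi - lo) if hi > lo else 0.0
        weights[name] = max(0.0, min(1.0, w))
    return weights

# Illustrative use: a half-strength smile and an out-of-range brow raise
weights = to_blendshape_weights(
    {"smile": 0.5, "brow_raise": 2.0},
    {"smile": (0.0, 1.0), "brow_raise": (0.0, 1.0)},
)
print(weights)
```

In a real system each wearer would need a quick calibration pass (neutral face, full smile, and so on) to set those ranges, since facial geometry varies from person to person.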
It is also a fair bet that the VR experience would integrate some level of augmented reality. As some may recall, this is the idea behind such innovations as Google Glass™ and, as mentioned earlier, Microsoft’s HoloLens™. One example would be the name/title/location labels mentioned earlier, though conceivably this could also include other information about attendees, such as projects or published works with which they are associated. AR would also allow other information to be displayed, and it could be different for each participant. For example, in addition to your presentation material, you could view web pages, notes, and other statistics that would aid in the presentation, or you could choose to display information and statistics on objects or material being shown by others.
What level of processing power would be involved in reproducing this virtual world? Today, we have dedicated video processors capable of rendering high-definition CGI characters in real time over Gbps data paths. No problem there. The real issue is the latency between controller input (head tracking, hand gestures, etc.) and the displayed output. Direct VR interfaces, like those used in gaming, provide sufficiently low latency to interact with your virtual environment without objectionable time lag, most of the time.
That’s the rub. It only takes one slight misalignment, for example between the turning of your head and the repositioning of the objects being displayed, for your brain to perceive that something’s off. In the best case, a current VR system’s processing latency is between 30 and 40 ms. Ideally, to reproduce a perceptually flawless reality, it should be less than half that (under 15 ms). Add to that the latency of video processing and internet connections and, well, you get the idea. Dedicated, enterprise-level gigabit networks offer sufficient bandwidth and low enough latency for VR to function reliably, and VLAN and VPN connections can avoid some of the latency associated with regular broadband by bypassing proxy servers, DNS servers, and other ISP-related junctions. Given Moore’s-Law-style improvement in communication systems, this will no doubt get better in the coming years. Taken separately, display refresh rates (60-90 Hz in the most current VR systems) are good enough, provided there is no additional latency from other factors; to offset latency from those factors, the refresh rate would need to rise to 120-240 Hz.
Beyond all this, we’re still faced with the sensors and raw processing power needed to provide the transparent interface we’re imagining above. Sensors will need to be small enough to be placed inside and along the frame of the headgear for tracking facial, eye, head, and hand movements. Like many current VR systems, they will need to include an accelerometer, gyroscope, laser position sensor, front-facing camera, magnetometer, and proximity sensor. Some of these functions can be handled by a single sensor type, or a few in combination, depending on the desired level of accuracy. Processing speed here is crucial to ensuring low input-to-output latency, as mentioned before. The good news is that in addition to the work being done on the VR front, rapid advances in I/O sensor processing are being made thanks to work on autonomous vehicles. In order to function reliably, an autonomous vehicle has to make split-second decisions based on input from cameras and laser and infra-red sensors mounted on the vehicle (along with some very slick artificial intelligence).
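One standard way to combine two of those sensors, the gyroscope and the accelerometer, into a stable head-orientation estimate is a complementary filter. This is a minimal sketch of the idea for a single axis (pitch); the function name, the 0.98 weighting, and the units are illustrative choices, not any headset's actual tracking code:

```python
import math

def complementary_pitch(pitch_prev_deg, gyro_rate_dps, accel_xyz, dt_s, alpha=0.98):
    """Fuse gyroscope rate (deg/s) with the accelerometer's gravity
    vector to estimate head pitch in degrees.

    The gyro integration is smooth but drifts over time; the
    accelerometer is noisy but drift-free. alpha weights the gyro term.
    """
    ax, ay, az = accel_xyz
    # pitch implied by the direction of gravity (noisy, drift-free)
    accel_pitch = math.degrees(math.atan2(-ax, math.hypot(ay, az)))
    # integrate the gyro rate over the timestep (smooth, drifts)
    gyro_pitch = pitch_prev_deg + gyro_rate_dps * dt_s
    # blend: trust the gyro short-term, the accelerometer long-term
    return alpha * gyro_pitch + (1 - alpha) * accel_pitch

# Illustrative: head held still (gravity straight down, gyro silent),
# so a drifted 10-degree estimate is pulled back toward zero each step.
print(complementary_pitch(10.0, 0.0, (0.0, 0.0, 1.0), 0.01))
```

Production trackers use more sophisticated estimators (Kalman-family filters over all six or nine sensor axes), but the principle, letting one sensor's strength cancel another's weakness, is the same, and it is exactly the kind of fast sensor fusion the autonomous-vehicle work mentioned above is accelerating.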
This lays out a blueprint for what the VRC machine will need in order to deliver what we want. The next questions are: What will it take for people to actually use it? And is there a legitimate business case for VRC? We’ll attempt to answer these questions in our third and final segment.
Bob Tamburri is a veteran of the AV industry who (among other things) has been a Product Manager for companies such as TOA and Sony and has been heavily involved in bringing new products and technologies, including ones for audio production, sound reinforcement, AV presentation, conferencing and life safety to market. He is a charter member of the World Future Society, which analyzes & reports on technological and social megatrends. Bob is also an accomplished trainer, technical writer, craftsman & musician.