10 Reasons Why Video Conferencing Still Sucks

Hi. I’m Shan, Co-Founder / CEO of Highfive. Here’s my LinkedIn. Somehow, 4.5 years ago, my co-founder and I decided to enter the world of video conferencing. The argument was simple — video conferencing sucks. Everything else is old and outdated. It should take about 18 – 24 months, but we can build something great that people will love. Wow, did we miss the mark and underestimate how long it would take us.

Alex St. John just posted a particularly acute rant. “The geniuses can make THIS work… but their video conferencing solutions suck? How much money do people still spend on business travel annually?”

We agree. If you are wondering whether video conferencing is a solved problem, just talk to anyone who has ever worked in an office and ask them about the first 15 minutes of every meeting they join. It’s embarrassing.

We completely agree with the macro-observation. All of this crazy technology is being created around us (flying cars, self-driving cars, plant-based meat, cancer detecting and curing therapies, reusable rockets), while the state of video conferencing is embarrassing. How do the people who are building all of this technology communicate today? They usually use conference calls, which are basically the same technology we used in the 60s.

For quick context, Highfive is an end-to-end hardware + software video conferencing system that companies of any size can use to connect their employees and bring a modern stack of communication tools to their broken conference rooms. To give you a sense of scale, Highfive currently hosts 30,000 meetings per week among 14,000 users and will process over 200 million call minutes this year. In other words, we have some reasonable initial scale to derive insights from.

So why does video conferencing still suck? 4.5 years into the battle, a mountain of work still remains in front of us, but progress is being made. I thought I would take a few minutes to identify 10 lessons that we have learned on what makes video conferencing hard.

1. Tyranny of the 10%

Prevailing wisdom in developing a new product is that one should focus on making the 90% use cases awesome and generally ignore the remaining 10% of use cases that don’t make a difference to your value proposition.

Here is the problem with video conferencing. If you don’t get the 10% use cases right, the attractiveness of your product will be severely limited. How does that make any sense? Well, 20 years of predecessor technologies have essentially created a very high floor for the “minimum feature set required”. For example, suppose you were to create a brand new car today. Cruise control is a feature that is used extremely infrequently. But if you are creating a brand new car, you have to have cruise control. It’s part of the “feature floor” for creating a new car.

In video conferencing, 25-way and 50-way calls are a great example. 90% of calls on Highfive have 5 people or fewer. We initially supported 8-way calls only. However, we saw a significant increase in product sales and adoption as soon as we rolled out support for 25-way and now 50-way calls, even though very few large calls happen on the system. It turns out that large-call support is important enough that one needs to offer it.

The best explanation for this dynamic is that the presence of many of these features conveys a sense of confidence and legitimacy that a team and buyer can place in a product, beyond the actual need for the features themselves. And without these features, a potential team or customer can’t make the leap of faith required to ask their organization to try yet another tool (in the long chain of tools they have tried and watched fail).

It’s quite straightforward to build a product that can do the 80% use case reasonably well (appear.in is a great example). But to see meaningful commercial success, video conferencing requires investing in a long list of 10% use case features. Features like single sign-on, dial-in phone support, international phone dial-in numbers, call recording, call layout controls and zoom functionality all fall into this category: infrequently used, but absolutely critical to customers.

2. Extremely deep probability tree of failure

Here is a pattern we see frequently:

  • Someone on the IT team tries out Highfive.
  • They broaden their test to the rest of the IT team.
  • Highfive gets rolled out to a team where several members of the team start trying to use Highfive.
  • Highfive gets rolled out to another team for an extended end user evaluation.
  • Highfive gets used in the context of important or critical executive meetings.

At every step in this process, a video conferencing system can fail for a variety of reasons. And at every step, people are already predisposed to the idea that it will probably fail, so when a failure does happen (whether mild or severe), it confirms someone’s preconceived notions. And when a failure happens, the prospect of success immediately disappears. And because the video conferencing application is the tip of the spear for any problems that occur anywhere in the technology stack below it, it’s natural to assume that the problem is in the video conferencing application. When your cell phone doesn’t work, you immediately assume the problem is Verizon, AT&T, T-Mobile or whoever your carrier is.
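To make the compounding concrete, here is a rough back-of-the-envelope sketch in Python. It assumes every stage of the rollout succeeds independently and with the same probability, which is a simplification, and the 95% figure is a made-up number for illustration, not Highfive data. The function name is hypothetical too.

```python
# Illustrative only: the stage names paraphrase the rollout pattern above,
# the per-stage success rate is invented, and independence is an assumption.
STAGES = [
    "single IT tester",
    "whole IT team",
    "first pilot team",
    "extended end-user evaluation",
    "executive meetings",
]


def survival_probability(per_stage_success: float, stages: int) -> float:
    """Chance that a rollout clears every stage without a deal-ending failure."""
    return per_stage_success ** stages


if __name__ == "__main__":
    p = 0.95  # hypothetical per-stage success rate
    for n, name in enumerate(STAGES, start=1):
        print(f"after {name}: {survival_probability(p, n):.1%}")
```

Under those assumptions, a product that gets through each stage 95% of the time still only reaches the executive meetings intact about 77% of the time. The deeper the tree, the less forgiving it gets.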

One great example here: we were at risk of losing a very large customer who had seen huge success with Highfive for over 9 months. They expanded their deployment and all of a sudden started seeing extreme latencies in their screensharing experience. The first stop on the blame train was Highfive, and the customer was indeed right to start there. The initial instinct was to pull Highfive out.

After an investigation, it turned out that someone else on the team was going through an evaluation process to buy new wireless routers. They had set up routers from 3 different vendors in the same vicinity to test them. The interference between the routers led to terrible network performance.

At this point people were using Highfive every day. Luckily, we had built sophisticated infrastructure for collecting telemetry and diagnosing problems in the wild and were able to determine what led to the problems they were experiencing. We are now working to automate this experience and detect environmental changes proactively and notify customers appropriately.

The point here isn’t that the customer was wrong. The lesson here is that video conferencing is the tip of the spear for a deep technology stack, some of which is under your control and some of which is not under your control. From a customer’s perspective, they don’t care (nor should they care). But as the tip of the spear (and in particular, one of the most demanding ones given bandwidth, latency and jitter requirements as compared to other applications), the video conferencing application must take responsibility for helping a customer understand why the application is not working.

3. The need for hardware and software to work together

Observation #3 is unique and biased to Highfive’s point of view in the world. But like I often say to my team and my wife, that doesn’t make it wrong!

Let me describe one example of a meeting:

  • 2 or 3 people in a physical conference room
  • 1 person joining from their laptop or mobile device from home or from their desk
  • 1 person joining from their car via a phone dial-in

Does this meeting sound familiar to you? Of course it does. This is the typical meeting these days. However, historically, conferencing technology has fragmented into 3 categories that make no sense to us.

Category 1: “software-only” tools like WebEx, GoToMeeting, Zoom: tools designed for everyone but the people in the physical conference room

Category 2: “hardware-only” tools like Cisco & Polycom: these tools are designed ONLY for the people in the physical conference room

Category 3: “audio bridge” like Intercall, 8×8, BT MeetMe: these tools are designed ONLY for people to join via a phone

To support the meeting scenario I describe above, you must have hardware for the physical conference room, software for everyone outside of the physical room and an audio bridge for anyone who needs to call in. Unless you are a very small company, you will almost always have an office with conference rooms that people use to hold meetings (save for a few truly unique companies). The unfortunate reality of conferencing tools is that the “software-only” tools don’t really work well together with “hardware-only” tools and lead to a whole host of user experience and administrator problems.

A common question that we hear is, “Why can’t I use commodity hardware for the physical conference room?” You certainly can. Companies like Zoom offer Zoom Rooms as a software solution that you can run on off-the-shelf hardware.

But why would you use off-the-shelf hardware when you can get high quality, purpose-built hardware at an affordable price point? In the meeting example above, purpose-built hardware means that people in the physical conference room get a product designed exactly for them and the people on their personal computers and mobile devices get a product designed exactly for them. Administrators get a single vendor to work with: one throat to choke.

One helpful analogy comes from the world of movies and TV. Most people have bought a movie from the iTunes cloud before. If you want to watch that movie by yourself (say on an airplane or on the train), you open up the iTunes app on your phone or laptop and stream it or even download it for offline viewing. But how do you watch that movie when you are in your living room with your friends and family all in the same location?

You use AppleTV to stream those movies from iTunes to your television. It’s just a better experience.

Asking customers to use off-the-shelf hardware would be like asking people to build their own AppleTVs to stream iTunes content. The combination of hardware and software is a WAY better experience.

4. Lack of incentives for existing incumbents to make anything better

Why don’t Microsoft and Google just make Skype for Business and Hangouts better? It’s a good question, but it’s unclear the economics would make sense for them. For both companies, Skype for Business and Hangouts are part of the productivity suite, rather than standalone, independent business units.

It is doubtful that making Skype for Business or Hangouts 10× better would result in a proportional increase in the O365 or G Suite businesses. It is equally unlikely that the underperformance of those products is dragging down those businesses.

Ask yourself the following question: are you any more or less likely to adopt or lose O365 or G Suite due to the conferencing tool in the suite, when the dominating factor in your decision is the email system you prefer?

It seems even more unlikely that companies like Cisco and Polycom are in a position to make dramatic improvements in their product experience. Polycom has demonstrated a lackluster history of product innovation over the last 15 years. Cisco continues to be focused on solutions for the Fortune 500 and the 1% of the 1%, rather than a solution that is affordable and easy enough to be used by everyone else.

5. Lack of investment until 2009

The world of video conferencing entered a period of homeostasis between 2000 and 2009, while the world was focused on social networking and consumer apps. Between those years, only 1 company was funded materially to solve this problem: Lifesize, which was funded in 2003 and purchased in 2009 for $405M in a very good outcome. In other words, for a period of 9 years, there were very few new attempts to do something truly big and different.

Then the world started to change again. Enterprise apps came back into fashion. Cloud & B2B SaaS represented a massive investment opportunity. In 2009, BlueJeans was funded. In 2011, Zoom was funded. In 2012, Highfive was funded. Between these 3 companies, we have now successfully raised nearly $400M in an attempt to change the status quo.

Obviously the problem of video conferencing still remains unsolved, but real progress is being made across all 3 of our companies and many open-source efforts.

6. Video Conferencing is the Fattest Tip of the Spear

Like all other applications, video conferencing sits on top of a deep technology stack to deliver its functionality. Moreover, it is among the most demanding applications on top of this technology stack, in terms of its bandwidth, latency and jitter requirements.

Here’s a simplified summary of that technology stack:

  • the application itself (rich client & browser)
  • the real-time audio/video codecs
  • call state & signaling
  • the operating system
  • the hardware end point, including the physical components like camera, microphones
  • permutation of peripherals like bluetooth headsets, headphones, microphones and cameras
  • the network layer (including the wifi network, the routers, the firewalls, the proxies, the hops)
  • the internet service provider
  • the data center (which has its own stack that is fully virtualized)
  • the cloud service layer
  • the bridging between legacy phone network and the IP network

This doesn’t even include the layers of the stack necessary to instrument and monitor the service so that customers can be appropriately supported. Moreover, each of these components is typically not optimized by default to carry the type of application that video conferencing represents. Legacy video conferencing systems tried to control as much of this as possible, going as far as requiring dedicated network connections between end points. Taken together, the stack represents technology that is managed and delivered by dozens to hundreds of different vendors.

The point here is not that the technology to solve the problem at the highest level is particularly novel. What makes video conferencing particularly hard is that whoever creates the application ultimately has to take responsibility for helping IT administrators and customers define and diagnose what could be happening at each of these layers to deliver a great, consistent experience (whether that component is under our control or not).
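One way to picture that responsibility is a walk down the stack that names the first layer that looks unhealthy. The Python sketch below is only an illustration of the idea, not how Highfive does it: the layer names are a shortened version of the list above, the function name is hypothetical, and the checks are stand-in lambdas where a real system would consult telemetry.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical health check: returns True when its layer looks healthy.
Check = Callable[[], bool]


def first_failing_layer(layers: List[Tuple[str, Check]]) -> Optional[str]:
    """Run the checks in the order given and name the first unhealthy layer."""
    for name, check in layers:
        if not check():
            return name
    return None  # every check passed


if __name__ == "__main__":
    # Stand-in checks; the local network fault is simulated for the example.
    stack = [
        ("internet service provider", lambda: True),
        ("local network (wifi, routers, firewalls)", lambda: False),
        ("operating system and peripherals", lambda: True),
        ("application", lambda: True),
    ]
    print(first_failing_layer(stack) or "no fault found")
```

The value to a customer is in the answer, not the mechanism: "your wifi is the problem" is something an IT administrator can act on, while "the call failed" is not.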

7. Lack of robust open source building blocks

Until recently, there were very few examples of open source projects that could be used to build a video conferencing application. That means that every company had to build & maintain their own stack of technology to solve problems at each layer of the technology stack.

6 years ago a project named WebRTC emerged as an incredibly powerful standard, spearheaded by Google. But only recently did the standards groups make a few concessions on critical issues that have paved the way for WebRTC to be fully endorsed and adopted by Microsoft, Apple, Cisco and all others across the entire industry. This is huge.

This standard, for the first time, offers a fully open-sourced, community-driven collection of communications protocols and APIs that anyone can use to build a communications application. Further, these protocols and APIs will be directly built into nearly every one of the billions of browser instances around the world.

This should dramatically improve the rate at which the industry can improve the overall video conferencing experience.

8. Trigger happy users & customers

When it comes to video conferencing, a history of failed technologies leaves all of us extremely trigger happy. We expect it to fail (for good reason!). So when a problem inevitably arises, it confirms those expectations and we are immediately ready to switch to something new, regardless of the cause (see #6). This raises the expectations for each new tool that enters the market.

We have examples of customers that engage in thousands of calls per week. But if 1 executive has a bad experience, the system can get thrown out. Customers and users are trigger happy and predisposed to move on to the next tool.

This is understandable. Video conferencing has been promised for so long, but its history is rife with underdelivery.

This dynamic means that the bar for even basic functionality is substantially higher. You are always one bad experience away from the entire system being thrown out.

– It doesn’t work with my bluetooth headset? Fail. Find another tool.
– It can’t detect my camera (because another app has control over it)? Not our fault, but still fail. Find another tool.
– The Internet broke when the DNS system went down and I can’t do my video call? Fail. Find another tool.

As end users, we don’t care which level of the stack a failure occurs in, and we don’t accept excuses for it. All we know is that our meeting doesn’t work. And that’s ok, rightfully so. It just means that expectations are really high.

9. Current Expectations vs Fixing Problems: Be Careful What You Ask For!

This is one of my favorite learnings over the last several years. No one likes the way video and web conferencing tools work because the tools are too clumsy to use — particularly when you compare them to how beautiful and usable consumer applications have become.

But when you try to do something unique to solve those problems, people often want the functionality to work the way they are used to it working today, even if that might be the cause for the problem in the first place.

Let me offer an example from our experience. We know that people hate PIN codes when joining a conference call. When I am driving and calling into a meeting, I have to swap back and forth between the phone and calendar and somehow get my PIN code entered after the call has started. I have nearly killed myself multiple times as a result of this exercise.

We addressed this problem in an earlier version of our product. When a user clicks on a Highfive link in a calendar invite, they get an option to join the call via HD audio and video or the option to create a personalized, single-time-use phone number that they can use to dial into a call, which has the desired benefit of eliminating the PIN code.

One of our biggest customer complaints was that they could not pre-allocate a phone number for their calendar invites with Highfive. But if we gave customers the ability to do this, then they would need a PIN code so that the system knew what call they were trying to join, exactly what users don’t want.

So we came up with an in-between solution that turns out to work extremely well. Our invites now include 2 links. One link to join via video and one link to open a web page that generates the dial-in number. The text surrounding those links is different, but they essentially perform the same function.
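For the curious, the reason a single-use number removes the PIN can be sketched in a few lines of Python. This is a toy model under my own assumptions, not our production design: the class and method names are hypothetical, the phone numbers are fictional, and a real system would also need expiry and capacity planning.

```python
from typing import Dict, List, Optional


class DialInPool:
    """Hypothetical pool of phone numbers handed out one call at a time."""

    def __init__(self, numbers: List[str]) -> None:
        self._free: List[str] = list(numbers)
        self._active: Dict[str, str] = {}  # dialed number -> meeting id

    def allocate(self, meeting_id: str) -> str:
        """Reserve a number for one participant of one meeting."""
        if not self._free:
            raise RuntimeError("no dial-in numbers available")
        number = self._free.pop()
        self._active[number] = meeting_id
        return number

    def route(self, dialed_number: str) -> Optional[str]:
        """Resolve an incoming call to its meeting and retire the mapping."""
        meeting_id = self._active.pop(dialed_number, None)
        if meeting_id is not None:
            self._free.append(dialed_number)  # the number can be handed out again
        return meeting_id


if __name__ == "__main__":
    pool = DialInPool(["+1-555-0100", "+1-555-0101"])  # fictional numbers
    number = pool.allocate("weekly-sync")
    print(pool.route(number))  # the number alone identifies the meeting
    print(pool.route(number))  # second use finds no mapping
```

Because each number maps to exactly one meeting at a time, the number itself answers the question a PIN would otherwise answer. A number printed in advance on every invite would be shared across many meetings, and that is where the PIN comes back.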

Perhaps we should have understood that need ahead of time and gotten the design right on the first attempt. But oftentimes, when building a product, it’s hard to tell when you should break the rule and when you should follow it. It sometimes just takes iteration and learning.

10. Consumer behavior obstacles to adoption & scale

Even if you were to get the product and technology right, how do you get everyone to use it? Business user requirements are different than consumer requirements. Tools like Slack naturally lend themselves to organic, ground-up adoption, as we have seen. Text messaging is very compatible with our new world of communication (asymmetric, text based and limited human interaction required).

Video, on the other hand, requires people to overcome the personal anxieties they have about being on camera. It’s a heavier-weight communication medium. As a result, the ability for a video tool to spread like wildfire is significantly reduced. If you believe our viewpoint that hardware is a critical component of the solution, then the challenge grows. Hardware means that we have to partner with IT to make sure technology can get deployed inside an organization, while we work with end users to make sure the application works well, breaks the right rules and delivers a great experience.

There is good news here. A lot of great people (including at Highfive and our competitors) are working hard to deliver a world where we finally have video that is available everywhere and reliable. A lot of good venture capital money has entered the space over the last decade that will lead to a sea-change over the next 5 years. Some challenges are falling away faster than others. But we are confident that the conditions for change have finally emerged.

It won’t be long before we are no longer hunching over laptops, huddling around phones or doing conference calls with audio so bad that you just nod your head and agree, never having heard what someone said in the first place. We’ll be looking back and finally laughing at how miserable video conferencing was 5 years ago.

I will risk being that cliche one more time. This time it really does feel different!