When you think about video calling, what is the first thing that comes to mind? For most people it is FaceTime or Skype: two people seeing and talking to each other from a computer or mobile device. I like to refer to this as Gen1 video calling. Another form of Gen1 video calling is GoToMeeting or GoToWebinar, where a presenter invites multiple people to view a presentation. These technologies have been around for well over a decade and are so widely used that we think of video calling as mature. I believe we are on the verge of a significant shift in the medium we know as video calling, and as the market matures to the next level it won’t be about talking heads, but about how we interact around live data.

Over the last couple of years a new technology known as WebRTC has emerged in this market. It began in 2011, when Google released it as an open source project for browser-based real-time communication. This was followed by ongoing work to standardize the relevant protocols in the IETF and the browser APIs in the W3C, and both Firefox and Opera opted to support the standard.

Why does WebRTC get so much attention when the core use cases have already been addressed by mature offerings like FaceTime, Skype, and GoToMeeting? And why has the standard emerged from third parties rather than from the core video calling players? The answer appears to lie in what people are trying to do with this new technology.

WebRTC Requires No Downloads

When WebRTC first emerged, most of the hype was around not requiring a download to have a video call. This is particularly useful when an online retailer is dealing with a customer and hasn’t established enough trust to get the customer to agree to a download. That may sound like a poor reflection on the retailer, but in reality most people have simply become wary of downloading third-party software.

The WebRTC value proposition got the attention of the customer service community and generated a lot of conversation around who should see whom. Should the customer see the agent? Should the agent see the customer? I’d argue that in most use cases the real need is for the agent to see the issue the customer wants addressed. The agent only needs to see the customer if the problem is physically on the customer, and this is rare outside of telemedicine.

Another fallacy of the no-download value proposition is that it can only be delivered in certain browsers. Is the customer service organization going to turn customers away if they don’t use Chrome or Firefox? I think this is highly unlikely, so the vendors in this market offer browser extensions or driver solutions. The reality is that there isn’t a broadly available no-download value proposition today.

So if the value isn’t “no download,” what is it? Part of the value delivered by the next generation of video calling technology is the ability to launch a call in the context of an application. Put more simply, if I’m having an issue in an application, wouldn’t it be great if the agent was just a touch away? Gen2 ushers in the ability to integrate video calling anywhere in the application, allowing us to connect anywhere, anytime. With Skype and FaceTime this simply wasn’t possible, and the solutions offered by traditional unified communications vendors made it so complex that the market gave up trying. The new Gen2 video calling technologies make this as simple as adding a few lines of JavaScript to a web page.
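
To make "launching a call in the context of an application" a little more concrete, here is a minimal TypeScript sketch of the kind of request a help button might produce. Everything in it is an assumption for illustration: the `CallContext` and `SupportCallRequest` shapes and the `buildSupportCallRequest` helper are hypothetical, not part of WebRTC or any vendor's SDK. The one idea it borrows from the discussion above is that the request defaults to sharing the screen rather than the customer's camera, since the agent mostly needs to see the issue.

```typescript
// Hypothetical shape of the context sent along with a support call.
// None of these names come from WebRTC or any vendor SDK.
interface CallContext {
  screen: string;                   // where in the app the button was pressed
  userId: string;                   // who is asking for help
  details: Record<string, string>;  // anything else the agent should see
}

interface SupportCallRequest {
  type: "support-call";
  context: CallContext;
  media: { audio: boolean; camera: boolean; screenShare: boolean };
  requestedAt: string;              // ISO 8601 timestamp
}

// Build the request in one place so every help button in the app
// sends the same information, whatever screen it lives on.
function buildSupportCallRequest(
  context: CallContext,
  now: Date = new Date(),
): SupportCallRequest {
  return {
    type: "support-call",
    context,
    // Assumed default: talk to the agent and show the issue, not the customer's face.
    media: { audio: true, camera: false, screenShare: true },
    requestedAt: now.toISOString(),
  };
}

// Example: the customer presses "Help" on the payment step of checkout.
const callRequest = buildSupportCallRequest({
  screen: "checkout/payment",
  userId: "customer-123",
  details: { cartItems: "3" },
});
console.log(JSON.stringify(callRequest, null, 2));
```

In a browser, a button's click handler could send an object like this to whatever signaling service the vendor provides and then start capture with the standard `navigator.mediaDevices.getDisplayMedia()` call. That wiring is left out here because it depends entirely on the vendor.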

So now the WebRTC community is selling the Amazon Mayday approach to customer support. Simply push a button anywhere in the user experience and within seconds you’re talking to customer service. Now I know what you’re thinking: isn’t this still just talking heads? Two things are different with the Amazon Mayday experience. First, the customer makes the request from somewhere in the app. Second, the agent can see what the customer is seeing, which significantly improves their ability to help the customer.

Screen Sharing & Co-Browsing

Screen sharing and co-browsing aren’t new, but what is new is the seamlessness of the experience. With Gen2 video calling the customer pushes a button in the app and connects to a customer service agent. With Gen1 the customer had to leave the application, launch another application, reconnect with the agent, and then work their way back to the context of the problem. The real value in this experience is not the customer seeing the agent (the talking head), but the agent seeing the context of the problem. The agent’s ability to understand where the customer is within the application, see what is happening, and talk to the customer is invaluable in addressing issues efficiently.

Customer Interaction

Gen2 video calling takes this a step further and enables the agent to interact with the customer. Interaction in Gen1 video calling meant talking to the customer. In Gen2 the agent can see what the customer is seeing, point out items on the screen, and draw annotations while discussing the situation with the customer. The call becomes more about the context, which makes clear communication easier. In a Gen2 video call the customer and agent aren’t looking at each other; they are looking at the issue and interacting around it. The customer can use a pointer or annotations to explain to the agent what is happening. The agent can use the same tools to guide the customer to a successful resolution by highlighting the steps that need to be taken.
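
One practical wrinkle with shared pointers and annotations is that the agent and the customer are rarely looking at the same size of screen. A common way to handle this, sketched below in TypeScript, is to exchange positions as fractions of the shared content rather than as pixels. This is only an illustration under that assumption: the `SharedAnnotation` type and the `toNormalized` and `toPixels` helpers are hypothetical names, not taken from any particular product, and the view sizes are assumed to be non-zero.

```typescript
interface PixelPoint { x: number; y: number }       // position in one party's own view
interface NormalizedPoint { x: number; y: number }  // fraction of the shared content, 0..1
interface ViewSize { width: number; height: number }

type Author = "agent" | "customer";

// Hypothetical message either party could send to the other.
type SharedAnnotation =
  | { kind: "pointer"; author: Author; at: NormalizedPoint }
  | { kind: "stroke"; author: Author; points: NormalizedPoint[] };

// Keep values inside the shared content even if the cursor strays outside it.
const clamp01 = (v: number): number => Math.min(1, Math.max(0, v));

// Convert a position in the sender's view to a resolution-independent one.
function toNormalized(p: PixelPoint, view: ViewSize): NormalizedPoint {
  return { x: clamp01(p.x / view.width), y: clamp01(p.y / view.height) };
}

// Convert a received position back to pixels in the receiver's view.
function toPixels(p: NormalizedPoint, view: ViewSize): PixelPoint {
  return { x: Math.round(p.x * view.width), y: Math.round(p.y * view.height) };
}

// The agent points at something in an 800x600 view of the customer's screen...
const agentView: ViewSize = { width: 800, height: 600 };
const pointer: SharedAnnotation = {
  kind: "pointer",
  author: "agent",
  at: toNormalized({ x: 200, y: 150 }, agentView),
};

// ...and the customer draws it on their own 1600x1200 display.
const customerView: ViewSize = { width: 1600, height: 1200 };
if (pointer.kind === "pointer") {
  console.log(toPixels(pointer.at, customerView)); // { x: 400, y: 300 }
}
```

Because both parties use the same message type, the customer can point things out to the agent in exactly the same way the agent guides the customer.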

Mobile Camera Sharing

We’re not always dealing with an issue in an application. The physical devices we purchase (think cars, printers, DVRs, etc.) often need support as well. Another big advantage of Gen2 video calling is the ability to share a camera and have the customer and agent interact over the live video stream. Every smartphone has a video camera, and with Gen2 video calling the customer can now use it to show the agent their issue. The agent can use a pointer to point items out and can even pause the video and draw on the freeze frame to assist the customer.

Gen2 Video Calling

The use cases emerging for Gen2 video calling go well beyond the WebRTC value propositions of no download and launching a call within the context of the browser. They are not about connecting people so they can see each other, and the reason the technology is gaining momentum is not WebRTC itself. WebRTC might have gotten the market to take notice of Gen2 video calling, but it is the value Gen2 delivers that is causing it to gain momentum. Much of this value cannot be delivered today in a pure WebRTC mode, which is why I refer to it as Gen2 video calling.

With Gen2 video calling the call is launched in the context of the issue. It is about all the parties on a call seeing the same thing and interacting with each other about what they are seeing. It is a completely different experience from what we got from Gen1 solutions like Skype, FaceTime, and GoToMeeting, and it delivers superior value to the end users. Look closely at the use cases emerging for Gen2 video calling and you’ll quickly notice that the true value is not the talking heads, but a data-rich, immersive environment where the parties can interact. Welcome to Gen2.