Building AR/VR with JavaScript and HTML

A comprehensive resource list for building engaging Augmented and Virtual Reality experiences using Web technologies

A few months ago I joined Halo Labs to help build our AR/VR design and prototyping platform. After the first interview, I received a “take-home” exercise: build a simple VR scene. While I was excited to build a VR app for the first time, I was also a bit afraid. I come from a web development background and had never worked on a VR project before. To my surprise, it took me roughly three hours to finish the exercise, and it even looked pretty good (well, at least to me…).
Over the past six months, whenever I told someone that I’m developing a VR product using Web technologies, I got puzzled looks. The first annual WebXR week will take place in two weeks, and I thought it would be a great time to show web developers that VR & AR + Web technologies = ❤.

The main goal of this post is to allow Web developers to enter the AR/VR world quickly and easily. The approach I have taken is not to write a guide about a specific technology or library, but rather to build a “curriculum” that will take you from zero to expert, so you will be able to build complex AR/VR experiences. Even if you don’t plan to develop AR/VR experiences, reading this guide will give you a glimpse into the current state of the WebXR world. Let’s begin.

Starting the journey — Getting to know the VR world

Before we begin, let’s agree on the terms: Virtual Reality (VR) is the use of computer technology to create a simulated environment, so when you are in VR, you are viewing a completely different reality than the one in front of you. Augmented Reality (AR), on the other hand, is an enhanced version of reality created by overlaying digital information on the real world (as in Pokémon GO). The term XR is often used to describe either of the two.

While you could easily skip this step and jump directly to the WebVR frameworks, investing some time learning the basics of the XR world will greatly improve your learning speed and understanding down the road.

The following resources will help you gain some background on VR and AR development, as well as the required (very basic) mathematical background:

  • Introduction to Virtual Reality course by Udacity — This free course is a great place to start. The course introduces the main VR platforms available today and explains how they work while teaching some basic (but important!) VR terms.
  • VR/AR Glossary — Knowing the meaning of those basic XR terms will help you better understand articles and XR frameworks documentation. Another good resource is The VR Glossary website. I really like their infographics section, as it helped me to get my head around some VR terms and topics.
  • Basic 3D math — The subject I was most afraid of when I entered the VR world was math. I’m not a big math fan, and I thought that dealing with 3D requires thorough math knowledge. Luckily, it turned out that I was wrong. The frameworks I’ll present below are relatively “high level” and don’t require any mathematical background. From my experience, the only important thing to know before moving on is the difference between the left-handed and right-handed coordinate systems.
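To make the handedness point concrete, here is a small sketch in plain JavaScript. It assumes the right-handed convention that three.js and A-Frame use (X points right, Y points up, Z points out of the screen toward the viewer). The `cross` helper and the variable names are written for this illustration and are not part of any library.

```javascript
// Hypothetical helper: cross product of two 3D vectors given as [x, y, z].
function cross(a, b) {
  return [
    a[1] * b[2] - a[2] * b[1],
    a[2] * b[0] - a[0] * b[2],
    a[0] * b[1] - a[1] * b[0],
  ];
}

const xAxis = [1, 0, 0]; // right
const yAxis = [0, 1, 0]; // up

// In a right-handed system the third axis is X cross Y,
// so +Z points out of the screen, toward the viewer.
const zAxis = cross(xAxis, yAxis);
console.log(zAxis); // [ 0, 0, 1 ]

// Consequence: something placed in front of a default camera needs a negative z.
// A left-handed system flips this axis, so the same object would use a positive z.
const boxInFrontOfViewer = { x: 0, y: 1, z: -3 };
console.log(boxInFrontOfViewer.z < 0); // true
```

If a model or tutorial comes from a left-handed tool, a mirrored or "behind you" object is usually a sign that the Z axis needs flipping.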

Rendering 3D content on the web

Now that we have some basic understanding of the XR world, we can start looking at XR web frameworks. The main framework for XR development is A-Frame (supported by Mozilla). The next section will go deeper into A-Frame, but before that, it’s important to understand how A-Frame is built in order to use it effectively. Let’s dive in!

In 2007, Mozilla first introduced Canvas 3D, which allowed rendering interactive 3D graphics on the web. The next step was to expose an API, and in 2009 the Khronos Group started the WebGL Working Group. The first version of the specification was released in 2011.
But what exactly is WebGL? To quote Mozilla:

WebGL enables web content to use an API based on OpenGL ES 2.0 to perform 2D and 3D rendering in an HTML canvas in browsers that support it without the use of plug-ins. WebGL programs consist of control code written in JavaScript and shader code (GLSL) that is executed on a computer's Graphics Processing Unit (GPU)

In short, WebGL is an API that enables rendering 3D content in the browser, without the need to use plug-ins.

Today, all major browsers support the WebGL API, so we can safely use it to render 3D content on the web. The main problem? Writing WebGL is hard and tedious. It’s enough to see the amount of code required to display simple 2D shapes to get discouraged. The solution? Using Three.js.

The aim of the project is to create an easy to use, lightweight, 3D library. The library provides <canvas>, <svg>, CSS3D and WebGL renderers.
(source: Three.js GitHub page)

Three.js is a “high level” library that simplifies the creation of WebGL environments. It handles the lower level programming for you and lets you focus on building the scene.

To see how much it simplifies the development, take a look at the code example below, which renders an animated 3D cube on the screen:
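A sketch of such a cube, modelled on the three.js getting-started example, might look like the following. It assumes `THREE` is the three.js namespace (for example from `import * as THREE from 'three'`) and `container` is a DOM element. The function name `startSpinningCube`, its parameters, and the color and camera values are choices made for this illustration.

```javascript
// Assumption: THREE is the three.js namespace and container is a DOM element.
// Both are passed in so the function does not rely on globals.
function startSpinningCube(THREE, container, width, height) {
  // The scene, the camera (our "eyes" in the scene) and the renderer.
  const scene = new THREE.Scene();
  const camera = new THREE.PerspectiveCamera(75, width / height, 0.1, 1000);
  camera.position.z = 5; // step back so the cube is in view

  const renderer = new THREE.WebGLRenderer();
  renderer.setSize(width, height);
  container.appendChild(renderer.domElement);

  // Geometry defines the shape, material defines the look, mesh combines both.
  const geometry = new THREE.BoxGeometry(1, 1, 1);
  const material = new THREE.MeshBasicMaterial({ color: 0x00ff00 });
  const cube = new THREE.Mesh(geometry, material);
  scene.add(cube);

  // Rotate the cube a little on every frame, then render the scene.
  function animate() {
    requestAnimationFrame(animate);
    cube.rotation.x += 0.01;
    cube.rotation.y += 0.01;
    renderer.render(scene, camera);
  }
  animate();

  return { scene, camera, renderer, cube };
}
```

In a page, this could be called as `startSpinningCube(THREE, document.body, window.innerWidth, window.innerHeight)`.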

In the above code example, we initialize the scene, the camera (which is our “eyes” in the scene) and the renderer. Then, we create a box geometry, which defines the cube’s shape, and a material, which defines how it’ll look, and finally we create the cube by combining the two into a mesh. After that, we add the cube to the scene and attach a simple animation to constantly rotate it. Finally, we render the scene.

This is a big improvement over the hundreds of lines of WebGL code, but it’s still not very simple. In order to display a cube, you have to understand what a material, a mesh, and a renderer are, and how they all connect together. In addition, presenting 3D content is not the end of the story. In order to create “serious” VR content, we’ll also have to handle user input, physics, integration with various VR headsets, and more.
While all of these can definitely be built in three.js, it’ll be difficult to do so without a deeper understanding of the 3D and VR domains.
But don’t worry! A-Frame to the rescue!

A-Frame — VR for the people

The A-Frame framework was created in 2015 by the Mozilla VR team in order to allow web developers and designers to author 3D and VR experiences with HTML without having to know WebGL. A-Frame is based on HTML and the DOM, which makes it very accessible and easy to use. While using only the HTML layer allows getting an impressive result, HTML is only the outermost abstraction layer of A-Frame. Underneath, A-Frame is an entity-component framework for three.js that is exposed declaratively.
A-Frame’s true power is embodied in the last sentence, so let’s break it down to make sure we understand it:

A-Frame is an entity-component framework for three.js

To quote Wikipedia:

Entity–component–system (ECS) is an architectural pattern […] 
An ECS follows the Composition over inheritance principle that allows greater flexibility in defining entities where every object in a scene is an entity (e.g. enemies, bullets, vehicles, etc.). 
Every Entity consists of one or more components which add additional behavior or functionality. Therefore, the behavior of an entity can be changed at runtime by adding or removing components.

Let’s clarify with an example:
Say I want to build a vehicle with the ECS pattern.

Vehicle entity, composed of multiple components

First, I would need a vehicle entity, which is practically an object with an ID. Next, I would use components to define the vehicle’s look and behavior. I would have multiple components such as color, wheels, seats, and engine. Finally, composing all those components into the entity we created earlier will give us a functional vehicle entity.
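The vehicle can be sketched in a few lines of plain JavaScript. This is a library-free illustration of the idea, not A-Frame's implementation; `createEntity`, `addComponent`, `removeComponent`, and the component data are all hypothetical names and values.

```javascript
// Hypothetical, library-free sketch of the entity-component idea.
let nextId = 1;

// An entity is little more than an ID plus a bag of components.
function createEntity() {
  return { id: nextId++, components: {} };
}

// Attaching a component adds (or replaces) a piece of look or behavior.
function addComponent(entity, name, data) {
  entity.components[name] = data;
  return entity;
}

// Detaching a component removes that piece again, even at runtime.
function removeComponent(entity, name) {
  delete entity.components[name];
  return entity;
}

// Compose a vehicle out of independent components.
const vehicle = createEntity();
addComponent(vehicle, 'color', { value: 'red' });
addComponent(vehicle, 'wheels', { count: 4 });
addComponent(vehicle, 'seats', { count: 5 });
addComponent(vehicle, 'engine', { horsepower: 150 });

// Change the entity without touching a class hierarchy: swap the wheels.
addComponent(vehicle, 'wheels', { count: 3 });

console.log(Object.keys(vehicle.components)); // [ 'color', 'wheels', 'seats', 'engine' ]
```

Because the components do not know about each other, the same `wheels` or `engine` data could be attached to a completely different entity.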

While the above example is very simplistic, it should give you a rough idea of what an entity-component architecture is. A-Frame allows writing Three.js code in an ECS way, which makes VR development much easier. One main reason is that ECS makes it very easy to reuse components, so if I build a component, chances are you can use it too. The A-Frame community is taking advantage of that, and there is a big library of components available for you to use.
Now that we understand the first part of the sentence, let’s examine the second part:

that is exposed declaratively

This part refers primarily to the HTML abstraction layer. This layer allows us to build a scene declaratively, which means we create a scene by defining what it should do, and not how it should do it. It can be done thanks to the underlying layers which allow us to create components. After we create a component, we can just say what we want to do — the component already knows how (that’s what the component code is all about).
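To illustrate the split between "what" and "how", here is a sketch of a small custom component. It uses A-Frame's `AFRAME.registerComponent` with a `schema` and a `tick` handler; the component name `spin` and its `speed` property are made up for this example, and the registration is guarded so the snippet does nothing when A-Frame is not loaded.

```javascript
// The "how": a component definition in the shape AFRAME.registerComponent accepts.
// The name 'spin' and the 'speed' property are invented for this illustration.
const spinComponent = {
  schema: {
    speed: { type: 'number', default: 45 }, // degrees per second
  },

  // A-Frame calls tick on every frame; timeDelta is the time since the last frame in ms.
  tick(time, timeDelta) {
    const radiansPerSecond = (this.data.speed * Math.PI) / 180;
    this.el.object3D.rotation.y += radiansPerSecond * (timeDelta / 1000);
  },
};

// Register only when A-Frame is actually present on the page.
if (typeof AFRAME !== 'undefined') {
  AFRAME.registerComponent('spin', spinComponent);
}
```

Once registered, the "what" is a single attribute in the markup, for example `<a-box spin="speed: 90"></a-box>`. The scene author never sees the rotation math.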

Now that we understand what A-Frame is and how it works, let’s see A-Frame’s Hello-World example:

Copied from A-Frame’s official examples. You can move in the scene using the keyboard.

In this example, every tag under a-scene is a primitive. Primitives are just syntactic sugar for entities with default components. For example, the a-box primitive is an entity with geometry and material components added by default, and attributes such as depth, height, width, and color map to properties of those components. Other HTML attributes, such as position, rotation, and shadow, are components in their own right that we add to (or override on) our box entity.
A-Frame provides a set of primitives to help you create basic scenes quickly and easily, and you can also create your own primitives.
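As a rough sketch of what a primitive stands for, the function below builds a box as a plain `a-entity` and attaches the components by hand. It assumes `doc` is the page's `document` and that A-Frame is loaded; the function name `createBoxEntity` and the attribute values are illustrative.

```javascript
// Roughly what a box primitive with position, rotation, color and shadow stands for.
// Assumption: doc is the page's document; it is passed in to avoid hidden globals.
function createBoxEntity(doc) {
  const box = doc.createElement('a-entity');
  // Default components that the primitive would add for us.
  box.setAttribute('geometry', 'primitive: box; width: 1; height: 1; depth: 1');
  box.setAttribute('material', 'color: #4CC3D9');
  // Components we would otherwise write as HTML attributes on the primitive.
  box.setAttribute('position', '-1 0.5 -3');
  box.setAttribute('rotation', '0 45 0');
  box.setAttribute('shadow', '');
  return box;
}
```

In a page with a scene, this could be used as `document.querySelector('a-scene').appendChild(createBoxEntity(document))`.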

I won’t get deeper into A-Frame since it’s not the purpose of this article, but here are some good resources to jumpstart your A-Frame journey:

  1. A-Frame documentation — The official A-Frame documentation is quite comprehensive and I highly recommend reading it. It probably contains the answers to all your “beginner questions”, so make sure to check it out before searching other places.
  2. A-Frame school — An interactive A-Frame course built by A-Frame’s creators. Using Glitch, the course provides step-by-step exercises to help you get started.
  3. Creating Your First WebVR App using React and A-Frame — Although using A-Frame with React might result in poor performance, I find it to be a great combination (actually, that’s our setup here at Halo Labs). If you like React, this tutorial uses aframe-react and is a great place to start. (P.S. If you prefer Angular, check out angular-aframe-pipe.)

Augment your skills

So far we talked about VR, but what about AR?
Since we still don’t have any broad consumer AR headsets today, the existing WebAR solutions mainly focus on mobile AR.

Today, there are three main libraries you can use to build AR scenes. All three work with A-Frame, but each has different capabilities. Let’s go over them one by one:

AR.js

AR.js provides both an A-Frame and a three.js extension that allows building marker-based AR scenes. AR.js was built with WebGL and WebRTC, so it’s the only one of the three that works with almost every smartphone, regardless of its OS version.

If you want to play with AR.js, check out Akash Kuttappa’s article.

aframe-ar

The common way to build mobile AR applications is to use ARCore (for Android) or ARKit (for iOS), both of which are native SDKs. In order to provide a way to use those SDKs’ capabilities (like surface detection) on the web, Google released two experimental apps, WebARonARCore and WebARonARKit, which are actually browsers that expose a JavaScript API to the aforementioned capabilities. On top of that, they released a library called three.ar.js, which provides three.js helper functions for building AR experiences. Since A-Frame is built on three.js, aframe-ar was created in order to provide an easy-to-use A-Frame wrapper. How easy? All you need to do is change your A-Frame scene tag from <a-scene> to <a-scene ar>, and you have a working AR scene!

If you want to play with aframe-ar, check out Uri Shaked’s excellent article.

aframe-xr

aframe-xr is based on three.xr.js, and both were created by Mozilla. Its main difference from aframe-ar is that it complies with the proposed WebXR Device API using the webxr-polyfill. The main implication is that aframe-xr enables building “progressive experiences”: experiences that change according to the device in use. Simply put, it allows you to move between AR and VR seamlessly.
Here at Halo Labs we are big believers in the WebXR API, so aframe-xr is our chosen framework.
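The idea of a progressive experience can be sketched as simple feature detection. The snippet below is not aframe-xr's API. It assumes an object shaped like `navigator.xr` from the WebXR Device API as specified today, whose `isSessionSupported(mode)` returns a promise of a boolean. The function name `pickExperience` and the returned labels are hypothetical.

```javascript
// Assumption: xr looks like navigator.xr from the WebXR Device API.
// Returns a label the app can use to decide which experience to build.
async function pickExperience(xr) {
  // No WebXR at all: fall back to a regular 3D scene on a flat screen.
  if (!xr) return 'flat';

  // Prefer AR when the device can blend content with the real world.
  if (await xr.isSessionSupported('immersive-ar')) return 'ar';

  // Otherwise use a headset-style VR session when available.
  if (await xr.isSessionSupported('immersive-vr')) return 'vr';

  return 'flat';
}
```

In a browser this could be called as `pickExperience(navigator.xr).then((mode) => console.log(mode))`. A real app would also need to handle the promise rejecting, for example when a permissions policy blocks XR.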

If you want to learn more about the WebXR API, check out Dan’s blog post. Also, Mozilla has a great blog post regarding Progressive WebXR.

After playing with WebAR for a while, it’s obvious that it’s not yet mature. However, even today, using the libraries I mentioned above, you can build some impressive AR experiences.

Down The Rabbit Hole

So far we’ve covered all the basics. That’s enough to create basic AR/VR experiences and gain some confidence in your abilities, but if you want to create more complex stuff, you’ll need to extend your knowledge.
Here are some resources to help you gain a deeper understanding:

Interactive 3D Graphics — A Udacity course teaching basic principles of 3D computer graphics (meshes, transforms, materials, and more).

Beginning with 3D WebGL — A series of posts written by Rachel Smith, teaching Three.js basics with a lot of code examples.

Three.js 101: Hello World! — An introduction to Three.js. @necsoft talks about all the important stuff in one blog post.

Linear algebra — Khan Academy — The lower you go in abstraction level, the more mathematical knowledge is required of you. From my experience, if you want to strengthen your math knowledge, Khan Academy is your best friend.

Building a Minecraft demo with A-Frame — An example of how to implement a VR Minecraft demo using A-Frame. This step by step guide will help you better understand how to build a robust VR app with A-Frame.

Content

As we all know, on the internet, content is king. This is also true for the process of creating XR experiences. In order to build convincing XR experiences, 3D assets are required. While the number of free and easy 3D creation tools is increasing rapidly, many of us prefer to use existing content instead of creating it ourselves. Currently, there are two main sources for free 3D assets:

  1. Google Poly — A library containing thousands of 3D models for use in VR and AR applications. Poly models are published under a Creative Commons license (CC-BY 3.0), which means you can use them freely, even for commercial use, as long as you provide attribution to the author.
  2. Sketchfab — A marketplace of 3D models that contains more than 2M models. Sketchfab contains thousands of free models, also licensed under a Creative Commons license (CC-BY 4.0). Sketchfab models are usually high quality and, as a result, “weigh” more.

Both sites support multiple 3D formats, including glTF. You can choose the required format when downloading the model.

It’s very easy to add models to an A-Frame scene, by using the a-gltf-model primitive (there are loaders for other formats as well):

<a-gltf-model src="http://model.url"></a-gltf-model>

The easiest way to avoid CORS problems is to host your assets on a publicly accessible CDN. Mozilla provides one for free: https://cdn.aframe.io/

Final words

As web developers, the AR/VR world often seems inaccessible. The truth is that the required tools for us to enter and create in this world already exist. Libraries like three.js and A-Frame allow us to use our existing web development skills to build rich VR scenes. In addition, complementary libraries add capabilities that enable the creation of AR scenes, and even progressive experiences that adapt to the capabilities of the device on which they run. Hopefully, my post will help other web developers enter the AR/VR world and together we can finally build the Metaverse! :)


Feel free to connect with us: Halolabs.io | Twitter | LinkedIn
