Keycube: 3-torus-based typing in VR


Introduction & Summary

Hi, I’m Steve Sedlmayr and this is my second guest blog post here at eleVR. This time around my post is concerned with trying to find a better way to type in VR. If you don’t want to read the whole thing, you can skip ahead to any section or browse via the outline below:

 

I Suck at Typing. And: Typing in VR Sucks

The Optimized, One-Handed Typing Pad

A Brief Survey of Other Methods of Virtual Typing

        A quick comparative analysis

        Conclusion

Keycube to the rescue

        The idea

        A quick comparative analysis with the other methods

        Letter frequency and order

Implementation and Testing Realities

Final thoughts

Appendix A: My setup

All the Links

All the pics

 

First, here’s a video of me using the keycube prototype:

 

 


Figure 1. My VR crash-test dummy.

 

I Suck at Typing. And: Typing in VR Sucks.

The whole idea behind this post started one day when I got the crazy idea to try coding in VR. I had heard about all the cool design and thought experiments eleVR was doing imagining virtual coding in Anyland. I thought, “Is there actually a way to code in VR today?” Almost immediately, it occurred to me that the answer was ‘yes’. I had played several VR games in the Vive on Steam. From that experience, I knew that there was an option to show the desktop while in the headset. Valve threw that in there presumably to make it easier for users to troubleshoot launch problems with finicky VR apps, which sometimes hang or crash on startup. Sometimes, it’s just taking a while and you want to know what the deal is. Sometimes, other open apps or drivers cause problems. In any event, it’s a very useful feature; and using this feature, I realized, I could crudely approximate what coding in VR using the old desktop metaphor and text in a virtual window might be like. It should be as simple as firing up an editor, throwing on a headset and showing the desktop.

So that’s what I did. At first, it was pretty cool. Wow, I’m looking at C# code in Visual Studio inside a VR headset. But here’s the thing. I really suck at typing. I took a typing class in high school, and I was so-so. I’m just old enough that our typing class still had actual typewriters. I went to high school in the late 1980’s. Personal computers were a thing, but still mainly for rich nerds. I had one at home because my father was well-off at the time and wanted me out of his hair. If you’ve ever seen Ferris Bueller’s Day Off, that was my situation: I never got a car and rarely was allowed to drive one of my dad’s cars, but I had a cool computer (an Amiga 1000, to be exact). So I guess my situation was more of a mash-up of Cameron and Ferris. But in any event, most people didn’t have computers at home, and typewriters were still in wide use for, you know… typing.

Despite that typing class, on a computer, I’ve always been a hunt-and-peck kind of guy. I’m a software engineer, so I’ve done a lot of typing in my day, but I’m still not a good typist. If I really concentrate, I can go for maybe half a sentence without looking at the keyboard. But ordinarily my eyes flit back and forth between keyboard and screen pretty frequently. I’ve always been jealous of those smart alecks who can look you in the eye and have a conversation while continuing to clatter away at 100 strokes per minute or whatever.

In VR, I figured I would make some mistakes but that I’d be able to correct them and get the hang of it pretty quickly. I found out something different very quickly. I discovered, first of all, that my inability to type blind was much worse than I thought. It turns out that even when I’m not looking at the keyboard directly, I rely heavily on peripheral vision to orient myself to the keys. In VR, you don’t even have that. Even if you do get oriented, it’s hard to get a feel for where the backspace key is to correct your mistakes. Basically, without being able to type blind, I was useless at coding in VR, and I had to give my experiment up almost immediately. In order to type in VR, I would have to put in the time and effort to completely master blind typing first, then come back and try again.

It seems to me that there are a lot of people out there who don’t type so well, like me. Like me, they hunt and peck. Like me, even when they don’t look at the keys, they need to see them out of the corners of their eyes to orient their fingers. And requiring everyone who wants to enter text to be a perfect typist doesn’t seem very good for a platform’s usability.

The next obvious thought one has, then, is to use a virtual keyboard of some kind in the virtual environment. The trouble with virtual keyboards is that, well, they suck too, at least as far as doing any serious typing for a prolonged period. Virtual keyboards, of course, were not introduced by VR. Smartphones and gaming consoles have used them for years. But the demands on those keyboards are pretty light: a little texting here, searching for a movie or game title there, a browser search over here. Any time you have to do prolonged typing on these keyboards, say to send a long-ish message or email, it’s honestly pretty annoying. You can easily spend half an hour writing what you would rattle off in a couple of minutes on a keyboard, which is why external keyboards have become popular with people who have to type on smartphones a lot.

The virtual keyboards in VR so far mostly mimic what you find on smartphones or gaming consoles, or are virtual representations of a real keyboard. And they tend to suck as well. They’re slow, inefficient and annoying; you honestly wouldn’t want to type anything longer than a few words with them. So I started to wonder if there might be a better approach.

 

The Optimized, One-Handed Typing Pad

I started by thinking about what an optimal character layout would be in a virtual setting. It would need to be compact, but you would still need to be able to see the entire alphabet at once. Physical keyboards were designed to be used with two hands, but that limitation doesn’t exist in the virtual world. In fact, ideally, one would want to be able to type with one hand. What I came up with was this:

 


Figure 2. The optimized, one-handed typing pad.

 

The idea here was to create a compact typing pad that could be manipulated with one hand and that minimized the distance between characters. For the character arrangement, I looked up the letter frequency of characters in the English alphabet; then I wrapped them in a spiral around the central character (‘E’), arbitrarily with left chirality. I wanted it to be as square as possible, and the nearest square number to 26 is 25, or 5². But of course you need that extra space for ‘Z’, the least common character, so I added an extra row. I put ‘Z’ in the middle at the bottom and flanked it with, from left to right, backspace, space, a number button (presumably for calling up a separate pad for 10-key), and enter. Starting with this as a basis, I then began to compare it to other methods of virtual typing.
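To make the construction concrete, here is a rough Python sketch of filling a 5×5 pad outward from a central ‘E’ in frequency order. The spiral direction and tie-breaking here are my assumptions; the actual pad in Figure 2 was laid out by hand, and ‘Z’ plus the function keys live in the extra bottom row.

```python
import math

# Frequency ordering used in the post
FREQ = "etaoinhsrdlcumpwfgybvkjxqz"

def spiral_positions(n=5):
    # Order the cells of an n x n grid by ring (Chebyshev distance from the
    # center), then by angle within each ring -- a rough spiral. The rotation
    # direction and starting angle are arbitrary choices for this sketch.
    c = n // 2
    cells = [(r, col) for r in range(n) for col in range(n)]
    def key(cell):
        r, col = cell
        ring = max(abs(r - c), abs(col - c))
        return (ring, math.atan2(r - c, col - c))
    return sorted(cells, key=key)

def build_pad():
    # Place the 25 most frequent letters spiraling out from the center;
    # 'z' would go in an extra bottom row with backspace/space/num/enter.
    grid = [[None] * 5 for _ in range(5)]
    for letter, (r, col) in zip(FREQ[:25], spiral_positions()):
        grid[r][col] = letter
    return grid

pad = build_pad()
```

The first cell in spiral order is the center itself, so ‘E’ lands at (2, 2) and the less frequent letters drift outward.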

 

A Brief Survey of Other Methods of Virtual Typing

For this I included both methods that are currently in use, as well as older methods of encoding text in a compact fashion that are no longer in use. Off the top of my head, I came up with:

  • Morse code
  • Baudot code
  • Linear tree – similar to HBO Go, PSN, or other PlayStation-based apps
  • A plain virtual keyboard – like on your smartphone or in certain interfaces for game consoles
  • VR gestures – waving something like a Vive controller around and interpreting the movement to represent letters

A quick comparative analysis

I then randomly chose a few words with which to compare the above typing techniques. For this first test I chose alone, masticate, and equanimous. Here’s a quick note on the methodology used for each kind of typing:

 

  • oohtp: starting from the letter ‘E’, the sum of the Manhattan distances from each letter to the next
  • Morse: the sum of the number of dashes and dots required for each letter
  • Baudot: the product of the number of bits per character (5) and the length of the word
  • tree: f + (L1 − 1) + Σ(n) over letters 2 through L1, where f = the number of moves to the first character; L1 = the length of the word; and n = the number of moves to the next letter within each tree node, which depends on how the tree is populated. Since we don’t know how the tree is populated, I’m generously assuming a very sparse tree where there is only a single additional move to the next letter in each node, so it becomes f + ((L1 − 1) × 2). I also assume that the first node in the tree contains every letter of the alphabet.
  • virtual keyboard (qwerty): starting from ‘Q’, the sum of the Manhattan distances from each letter to the next
  • gestures: the number of lines required to indicate a block-printed English letter
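The non-spatial scoring rules are easy to sketch in Python. The Morse, Baudot and tree costs below reproduce the figures in the tables that follow; the qwerty helper uses a plain unstaggered grid, so its totals can differ by a move or two from my hand counts, which accounted for row stagger.

```python
# Dots + dashes per letter in International Morse code
MORSE_LEN = {
    "a": 2, "b": 4, "c": 4, "d": 3, "e": 1, "f": 4, "g": 3, "h": 4, "i": 2,
    "j": 4, "k": 3, "l": 4, "m": 2, "n": 2, "o": 3, "p": 4, "q": 4, "r": 3,
    "s": 3, "t": 1, "u": 3, "v": 4, "w": 3, "x": 4, "y": 4, "z": 4,
}

def morse_cost(word):
    return sum(MORSE_LEN[ch] for ch in word)

def baudot_cost(word):
    return 5 * len(word)  # fixed 5 bits per character

def tree_cost(word):
    # f + ((L1 - 1) * 2) under the sparse-tree assumption: f moves to the
    # first letter in an alphabet-ordered root node, then 2 moves per letter.
    f = ord(word[0]) - ord("a")
    return f + (len(word) - 1) * 2

def manhattan_cost(rows, word, start):
    # Sum of |dx| + |dy| between consecutive letters on a plain grid.
    coords = {ch: (r, c) for r, row in enumerate(rows) for c, ch in enumerate(row)}
    total, pos = 0, coords[start]
    for ch in word:
        nxt = coords[ch]
        total += abs(nxt[0] - pos[0]) + abs(nxt[1] - pos[1])
        pos = nxt
    return total

QWERTY = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
```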

 

alone

  • oohtp: 1 + 1 + 3 + 3 + 1 = 9
  • Morse: 2 + 4 + 3 + 2 + 1 = 12
  • Baudot: 5 × 5 = 25
  • tree: 0 + ((5 − 1) × 2) = 8
  • virtual keyboard (qwerty): 2 + 8 + 1 + 4 + 5 = 20
  • gestures: 13 lines

 

masticate

  • oohtp: 4 + 2 + 3 + 3 + 2 + 3 + 3 + 2 + 1 = 23
  • Morse: 2 + 2 + 3 + 1 + 2 + 4 + 2 + 1 + 1 = 18
  • Baudot: 5 × 9 = 45
  • tree: 12 + ((9 − 1) × 2) = 28
  • virtual keyboard (qwerty): 8 + 7 + 1 + 4 + 3 + 7 + 3 + 5 + 2 = 40
  • gestures: 21 lines

 

equanimous

  • oohtp: 0 + 4 + 2 + 3 + 1 + 3 + 4 + 4 + 3 + 2 = 26
  • Morse: 1 + 4 + 3 + 2 + 2 + 2 + 2 + 3 + 3 + 3 = 25
  • Baudot: 5 × 10 = 50
  • tree: 4 + ((10 − 1) × 2) = 22
  • virtual keyboard (qwerty): 2 + 2 + 6 + 7 + 6 + 4 + 2 + 3 + 2 + 8 = 42
  • gestures: 21 lines


Conclusion

From this we can see that the tree approach always seems to be in the top 3 or 4. Since in our example the first node has every letter of the alphabet, it does better for words beginning with letters close to the beginning of the alphabet. Gestures and Morse are also efficient and always in the top 3 or 4 in our very small sample.

However, I don’t think it’s reasonable to expect people to learn Morse code to type; and the problem with controller gestures is that your forearm and hand are going to fatigue rather quickly. So of the top 3, that really only leaves the tree approach among the existing methods.

As far as our one-handed pad, it’s performing reasonably well: it comes in 2nd on the first word, 3rd on the second word, and 4th on the last word. So it’s giving the other methods a run for their money, although it’s not as efficient as I would have liked; I was hoping it would be in first place most of the time.

From these findings, I started to brainstorm:

  • Maybe you could combine a tree with a polyhedron like a rhombic triacontahedron or a rhombicuboctahedron, and the polyhedron’s faces could reshuffle after each letter input.
  • That sounds cool, but also maybe somewhat confusing. So maybe a 2D “flower” arrangement, where “petals” pop up around the current letter, then the next letter becomes the center, and so forth, could work.
  • That still seems kind of confusing though… maybe I could use 3 axes and a 3 x 3 x 3 cube?!

I actually wrote down a question mark and an exclamation point in my notes on that last idea; I was pretty excited. So what would I call this idea? Well, a keyboard is a board with keys on it, and this is a cube with keys in it, so I simply dubbed it the… keycube.

 

Keycube to the rescue

The idea

The first thing I did is sketch it up. The cleaned up version looks like this:

 


Figure 3. A sketch of a keycube.

 

The above is based on a Vive controller; I also initially imagined using one of the buttons or the trigger for vertical movement. At first I was just imagining Manhattan distance. But then I realized that if I allowed wrapping around the cube from one side to the other, I could reduce the maximum number of moves by 1. Shortly after that I realized that if I also allowed diagonal moves, I could reduce those moves by 1 again. Our friend Henry Segerman pointed out to me that this is essentially a 3-torus.
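Move counting under this scheme can be sketched as follows, assuming a move steps one cell along one axis or diagonally along two axes at once, with wrap-around on every axis; compound three-axis diagonals are not single moves, matching the corner-to-corner behavior discussed later in the post.

```python
import math

def keycube_moves(a, b, size=3):
    # a, b are (level, row, col) coordinates in a size^3 cube.
    # Per-axis toroidal distance: straight-line or wrapped, whichever is shorter.
    d = [min(abs(x - y), size - abs(x - y)) for x, y in zip(a, b)]
    # Each move changes at most two axes by one step each, so the minimum
    # number of moves is max(largest axis distance, ceil(total distance / 2)).
    return max(max(d), math.ceil(sum(d) / 2))

def manhattan_moves(a, b):
    # The "maximum" scheme from the post: no wrapping, no diagonals.
    return sum(abs(x - y) for x, y in zip(a, b))
```

On a 3×3×3 torus every per-axis distance is 0 or 1, so any key is reachable in at most 2 moves; opposite corners need 2 because a single move can’t advance all three axes at once.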

 

A quick comparative analysis with the other methods

Since the keycube was newly introduced, I had to do a quick comparison with the other methods again. For this comparison, I chose just two words, alone (again), and zamboni. Since efficiency in the keycube would depend on user proficiency, I decided to calculate both the minimum (using all the best opportunities for diagonals and wrapping) and the maximum (using Manhattan distance) possible efficiencies.

 

alone

  • oohtp: 1 + 1 + 3 + 3 + 1 = 9
  • Morse: 2 + 4 + 3 + 2 + 1 = 12
  • Baudot: 5 × 5 = 25
  • tree: 0 + ((5 − 1) × 2) = 8
  • virtual keyboard (qwerty): 2 + 8 + 1 + 4 + 5 = 20
  • gestures: 13 lines
  • keycube: 1 + 2 + 2 + 2 + 1 = 8 (minimum); 1 + 4 + 3 + 3 + 2 = 13 (maximum)

 

zamboni

  • oohtp: 3 + 4 + 2 + 4 + 5 + 3 + 3 = 24
  • Morse: 4 + 2 + 2 + 4 + 3 + 2 + 2 = 19
  • Baudot: 5 × 7 = 35
  • tree: 25 + ((7 − 1) × 2) = 37
  • virtual keyboard (qwerty): 2 + 1 + 7 + 2 + 5 + 4 + 3 = 24
  • gestures: 18 lines
  • keycube: 2 + 3 + 2 + 2 + 2 + 2 + 2 = 15 (minimum); 3 + 4 + 3 + 3 + 3 + 3 + 3 = 22 (maximum)

 

Even with just a couple of words, these are looking like pretty exciting results. For the first word, the version with minimum moves is even with the tree approach, and the maximum version ties for 4th with gestures. For the second word, the minimum version wins handily, and the maximum version again comes in 4th.

Letter frequency and order

If you google “letter frequency”, you’ll get a number of hits for websites discussing the frequency of letters in the English language. The frequency ordering I chose was:

etaoinhsrdlcumpwfgybvkjxqz

However, this isn’t the full story, as it doesn’t take into account how frequently each letter occurs at each position within a word; it only measures statistical frequency within the language as a whole. Obviously you’re going to want letters more likely to occur at the front of a word closer to your “home row”. In the case of the keycube, there is a “home level”. I chose the middle level as the first or home level, and arbitrarily designated the top level as the second level and the bottom as the third. I figured people would be more likely to go up into the top level than down into the bottom level. Based on this, and cross-referencing with the statistical frequency of letters within a word, I came up with this order:

toaihfwmb | pcjxqlnuv | esdrygkz
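As an aside, positional frequency could in principle be estimated from a word list along these lines. This is a hypothetical sketch, not how the ordering above was actually derived; I cross-referenced published frequency tables instead.

```python
from collections import defaultdict

def mean_position(words):
    # For each letter, average its normalized position within a word:
    # 0.0 = always word-initial, 1.0 = always word-final. Letters with low
    # scores are candidates for the home level; high scores sink downward.
    totals, counts = defaultdict(float), defaultdict(int)
    for w in words:
        for i, ch in enumerate(w):
            totals[ch] += i / max(len(w) - 1, 1)
            counts[ch] += 1
    return {ch: totals[ch] / counts[ch] for ch in totals}
```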

I arranged these letters in the levels of the keycube like this:

 

v c l
q p j
u x n

top/level 2

b o f
h t a
m i w

middle/home/level 1

· s g
y e d
z r k

bottom/level 3 (· marks the one empty key: 26 letters fill only 26 of the 27 cells)

 

I spiraled to the right this time, starting by moving up one from the center character and giving preference to cardinal directions in the first pass, then filling in the diagonals in the same direction on a subsequent pass. This was again an arbitrary decision on my part; but the idea was to pack each level in a way that made more frequent letters as easy as possible to access.
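That fill order can be sketched as follows; the exact coordinate sequence is my reconstruction from the home level, not a spec, and filling each level is just zipping its nine-letter string onto those cells.

```python
# Per-level fill order: center first, then cardinals (up, right, down,
# left), then the diagonals in the same rotational direction. Coordinates
# are (row, col) with row 0 at the top.
FILL_ORDER = [(1, 1),                          # center
              (0, 1), (1, 2), (2, 1), (1, 0),  # up, right, down, left
              (0, 2), (2, 2), (2, 0), (0, 0)]  # diagonals, same rotation

def fill_level(letters):
    grid = [[None] * 3 for _ in range(3)]
    for ch, (r, c) in zip(letters, FILL_ORDER):
        grid[r][c] = ch
    return grid
```

Feeding in the home-level string "toaihfwmb" reproduces the middle level shown above; a string shorter than nine letters (the bottom level) simply leaves trailing cells empty.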

You can see that certain letters, like ‘e’, that are frequent, nevertheless ended up in the basement with letters like ‘z’ because they usually occur at the end of a word. You could of course choose the top level as the home level, pre-supposing a pattern of cycling from the top to the bottom for each word. This was a relatively arbitrary choice on my part for this prototype; substantial testing would be necessary to hone the most optimal arrangement for the largest number of users. It could turn out after such testing not to make much of a difference, given that the cube is only 3 x 3 x 3 in dimension. To be honest, in using the keycube, the letter order never stood out to me as a help or a hindrance.

 

Implementation and Testing Realities

The biggest issue I ran into during implementation and subsequent testing was simply managing the input from the Vive controllers. I initially implemented it only for Vive, thinking that would be sufficient, but I ran into a number of issues:

 

  • Vive input isn’t very well documented, and the Unity plugin code, which is also not well documented, isn’t very readable. You could of course drop down to C++ with OpenVR (possible whether or not you are using Unity). However, that documentation is also relatively sparse, and I didn’t want to spend time exploring the C++ SDK and re-tooling my prototype completely for an unknown outcome.
  • The Vive wasn’t really designed for my use case. It’s meant for occasional, slow-paced interaction and relatively simple interfaces. Unlike a traditional gamepad, it isn’t designed to be spammed, with the user using all or most of the features of the controller rapidly and continuously.
  • Swiping requires clicking down on the touchpad first, which wasn’t intuitive for my testers. I believe it would be possible to bypass this requirement at the C++ level, but on second thought I’m not sure swiping without the clicking would be much better, as any slight accidental brushing of the touchpad would trigger input.
  • It turns out that while swiping left, right, forward and back with your thumb is relatively easy and produces consistent results, swiping diagonally is hard for my testers. I divided the touchpad into 8 octants for orthogonal and diagonal directions based on the angle of the swipe, but the diagonal swipes just weren’t intuitive. The result is sometimes frustrating: you think you’re swiping forward and to the right, but you end up moving up or right instead.
  • I found that different portions of the controller needed separate tuning of the interval at which to measure input in order to obtain the ideal responsiveness.
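The octant classification mentioned above amounts to dividing the circle into eight 45-degree sectors centered on the four cardinal and four diagonal directions, then bucketing a swipe vector by its angle. This is an illustrative sketch; the direction names, axis conventions, and thresholds in my actual implementation were tuned differently.

```python
import math

DIRECTIONS = ["right", "up-right", "up", "up-left",
              "left", "down-left", "down", "down-right"]

def swipe_octant(dx, dy):
    # Angle measured counterclockwise from +x, with +y as "up" (forward).
    # Shifting by half a sector (22.5 degrees) centers each 45-degree
    # bucket on its direction.
    ang = math.degrees(math.atan2(dy, dx)) % 360
    return DIRECTIONS[int(((ang + 22.5) % 360) // 45)]
```

The frustration described above comes from the diagonal buckets: a thumb swipe intended as “up-right” only has a 45-degree window before it registers as “up” or “right” instead.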

 

As a result of the issues above, I ended up adding an Xbox 360 control scheme to my prototype as well. I personally find the 360 controller much more precise and overall easier and more intuitive to use, but surprisingly I was the only one. M and Vi both still preferred the Vive controller. Something we discovered with the 360 controller was that diagonals – on the 360 controller I’m using the left analog stick for horizontal directional movement – were still tough. I ended up mapping diagonals to the bumpers and triggers instead.

Given the difficulty of hitting diagonals, I imagined an interface using the 360 controller that might use radial selection in two axes. Similar to radial menus in games, which often have 8 or more sections, you would move the stick until the chosen octant is highlighted, then release to select it. There would be two such selectors, one oriented horizontally and the other vertically (probably along the x-axis). This would give you enough degrees of movement for any possible move within the cube, and the currently inaccessible compound diagonals from the corners would become possible as well, reducing corner-to-corner wrapping movements from 2 moves to 1. I didn’t implement this version, but I may in the future.

Another issue I ran into was a bug in Unity’s InputField class. It turns out that the input caret, otherwise known as a cursor, is completely broken in the current version of UnityEngine. Even more frustrating is the fact that Unity appears to think the bug was fixed years ago, and there is no means to submit bugs for engine code – you only have the ability to report bugs for the editor itself, or vote on internally defined engine defects. This sounds like a minor thing, but I found that much like trying to type blind in VR, having no cursor made my virtual document – just an input field on a plane in the virtual space – more or less unusable, as the user has no idea where they are typing. To get around this, I had to add command-line style highlighting of the trailing character as a workaround.

This introduced another issue though; with no cursor, even with my rich-text highlighting hack, you still have no idea where you are typing if the trailing character is a space. To get around that, since this is just a prototype, for now I’ve replaced spaces with underscores. Obviously for a production version this would have to be overcome, which means either waiting for Unity to notice this defect again, actively trying to bring it to their attention and then hoping for a timely solution, or using another UI solution altogether.
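The workaround described in the last two paragraphs amounts to a small string transform before the text is displayed. This sketch mimics it in Python: the tag syntax matches Unity’s rich text, the highlight color is arbitrary, and the underscore substitution is the prototype-only hack for trailing spaces.

```python
def display_text(text, color="#ffcc00"):
    # Replace spaces with underscores so a trailing space stays visible,
    # then wrap the final character in a rich-text color tag so the user
    # can see where the (broken) caret would be.
    shown = text.replace(" ", "_")
    if not shown:
        return shown
    return shown[:-1] + "<color=" + color + ">" + shown[-1] + "</color>"
```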

Another thing we found was that the size of the keycube and its position relative to the document mattered quite a lot. At first the document was quite large, and I designed it for a standing experience. But most people don’t want to stand while they type. Even then, it was a bit too large, causing the user to have to look up at the document and then back down at the keycube. It took a bit of tuning to situate things with regard to the scale of the document, and also the position of the keycube, such that it was close enough to the document to minimize head and eye movements but didn’t obstruct the document.

A final thing to mention was that the keys were initially quite difficult to make out from one another. I implemented the keys as letters floating inside of transparent, lightly colored spheres. Despite the transparency, it was still difficult to make out some of the letters towards the bottom and back of the cube, resulting in a lot of leaning from side to side and glaring into the cube trying to find the right letter. To rectify this, I:

 

  • Reduced the opacity even more
  • Added a highlight color to the letter in addition to the existing highlight color for the sphere
  • Changed the colors of each level to be more contrasty from one level to the next
  • Added code to cause all letters to face the user
  • Reduced the size of all keys not currently selected
  • Offset each level from one another by half a key width

 

We found that the above greatly improved the readability and usability of the keycube.

 

Final Thoughts

In using the keycube over multiple iterations, it became clear that forcing yourself to type with it in the most optimal fashion possible was sometimes quite difficult. It sounds simple, but getting your mind to rapidly imagine the fastest wrapping move to the next letter (if available) takes quite a lot longer than you would think it would. As a result, most of the time people just use Manhattan distance. This is somewhat expected if you think about it. After all, we’ve been using keyboards, and typewriters before that, since the late 19th century. Our minds have been trained for keyboards for over a century.

I did find myself increasing in proficiency the more I used it, however. I think in time, given enough people using it over a prolonged period of time, users would divide into skilled and less skilled typists just like with keyboards. The equivalent of people, like me, who hunt and peck, would be those who simply choose to use Manhattan distance for all of their moves. Skilled typists would eventually acquire the ability to rapidly choose the most optimal moves.

When I first started developing the keycube, I would say the longest I could tolerate it was to type something the length of a label or title… a few words. By the end, I would say it graduated to perhaps haiku level: a few short lines. M and Vi, not having used it as much, and so not being as proficient with it, advanced to maybe a half-haiku level. I think with time and use this would improve, but of course a production version would need a lot of additional features to be fully polished and usable.

As mentioned above, the control scheme still has heaps of room for improvement, and simply having a functioning cursor in the document would be a huge help. However, there is still the glaring issue that even the Xbox 360 controller doesn’t have enough forms of input to represent all of the characters on the keyboard. For this, you would probably need to implement a scheme similar to the virtual keyboards on smartphones, where certain keys expand into a separate keycube for, say, numbers, and another for punctuation. For languages that don’t use a relatively short alphabet like the Roman alphabet used in English, you would of course have to adapt the scheme. For Japanese, Chinese or other languages that use something like ideograms, you would perhaps do something similar to how I understand Japanese keyboards to work: instead of characters, the keys would contain components of ideograms that could be composed to make individual ideograms.

All in all, however, I found the results encouraging, and I think I would already prefer typing this way in VR for the small tidbits of text currently called for in virtual settings: passwords, usernames, labels, search terms and the like. While I didn’t achieve all the goals I had set out to achieve with this prototype version, I’m sufficiently pleased that it demonstrates the rudiments of a possibly better way of typing in VR compared to porting traditional 2D approaches into virtual spaces.

 

Appendix A: My setup

Main development machine and environment

OS: Windows 7 Ultimate 64-bit

CPU: Intel i7 6900K

GPU: EVGA Geforce GTX Titan Black, 6GB GDDR5

RAM: 128 GB

Unity version: 2017.1.0p4

Unity license: Personal

Visual Studio version: Visual Studio 2017 Version 15.3.4

Vive hardware version: product 128 rev 2.1.0 lot 0/0/0 0

Vive firmware version: 1462663157 steamservices@firmware-win32 2016-05-07 FPGA 1.6/0/0

Source control

Mercurial running on a home server

 

All the links

https://en.wikipedia.org/wiki/Morse_code

https://en.wikipedia.org/wiki/Baudot_code

http://letterfrequency.org/

https://en.wikipedia.org/wiki/Letter_frequency

http://3.bp.blogspot.com/-r2LtTheTREU/U4INyKkVnNI/AAAAAAAAKq0/39Ojq2j0a3k/s1600/letters_brown_words_15.png

http://elevr.com/portfolio/future-programming-interfaces/

http://elevr.com/8-ar-programming-blocks/

http://elevr.com/cross-platform-vrar-development-in-unity-for-vive-and-hololens/

 

All the pics


Figure 4. An example of a tree-based dictionary interface on a Playstation device.

I also decided to include photos of my original notes, just for fun and also perhaps as an interesting insight into my thought process:

[Photos of the original handwritten notes.]

– Steve