Handwriting Recognition
A related topic is GpeVirtualKeyboard.
See xstroke (dead link), which now embodies almost all of the good ideas from below. It is full-screen, translucent, and has no libXt dependencies.
CarlWorth Wed Nov 21 19:15:18 UTC 2001
Another choice of handwriting recognition on Linux handhelds is GpeRosetta. You might consider installing the rosetta package from your distribution.
However, none of these tools seems to be actively developed (as of September 2006).
Handwriting recognition could mean a lot of things, including full-blown recognition of arbitrary handwriting. Most of the tools discussed below might better be described as doing GestureRecognition because they only process a single character at a time. However, I believe that if GestureRecognition can be performed in a full-screen, transparent environment, then there is little distinction from a user perspective, while the GestureRecognition approach is much simpler to implement.
If you're interested in writing a program to do handwriting recognition in X you may be interested in the discussion regarding GeneratingSyntheticX11Events.
Latest News (Mar 28, 2001)
Much of the discussion on this page is quite stale. Here is an update:
-
IanWalters enabled xscribble to be used in full-screen mode. His full-screen program is known as fscrib and has been included in the Familiar distribution (v0.1 - v0.3).
CarlWorth wrote xstroke (a modeless full-screen recognizer).
The frontend for fscrib and fstroke is currently based on the Xt library. An effort is underway to remove Xt from standard iPAQ distributions so this should be changed soon. Also, the frontend uses a PointerGrab in order to do full-screen recognition. This is a sledgehammer approach. Using input-only windows, (or transparent windows when they become available), could be a much more elegant solution.
TODO
CarlWorth is planning on implementing the following features soon:
-
An Xt-free full-screen frontend (Xlib or GTK).
-
Dynamic selection of backends with dlopen and friends (the same frontend will allow xscribble and xstroke backends). There will also be a nifty little API so people can easily develop new backends.
-
Mode support for the xstroke backend (this will enable accented characters as well as a graffiti/xscribble compatible alphabet using the xstroke recognizer).
-
Automatic keyboard/mouse mode switching based on widgets (i.e. menus and scrollbars use stylus as mouse even when in recognition mode).
-
Rewrite xstroke to make it faster and use less memory.
-
A new vector-based recognizer (as mentioned in wiki/VecRec).
-
Another big thing that I want to get to sometime is allowing dynamic gesture->character mappings on a per-application basis. I also want to make a mechanism such that programs can easily make use of gesture-based control -- to me this will be much more interesting than simple character recognition.
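To make the per-application mapping idea concrete, here is a minimal sketch in Python. It is not taken from xstroke or any other tool mentioned on this page: the gesture names, the application names, and the helper functions are all hypothetical, and keying the overrides by application name is just one assumed way to do it.

```python
from collections import ChainMap

# Hypothetical default alphabet: recognized gesture name -> character.
DEFAULT_MAP = {"stroke_a": "a", "stroke_b": "b", "dash_right": " "}

# Hypothetical per-application overrides, keyed by an application name
# (how the name is obtained is left open; this is an assumption).
APP_MAPS = {
    "terminal": {"dash_right": "\t"},
    "editor": {"dash_left": "\b"},
}


def mapping_for(app_name):
    """Return the gesture map for app_name, falling back to the defaults."""
    return ChainMap(APP_MAPS.get(app_name, {}), DEFAULT_MAP)


def translate(app_name, gesture):
    """Map a recognized gesture to a character, or None if unmapped."""
    return mapping_for(app_name).get(gesture)
```

With this layout, translate("terminal", "dash_right") yields a tab while the same gesture in any other application yields a space, so an application only has to list the gestures it wants to treat differently.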
Full-screen Gesture Recognition
-
A full-screen handwriting recognition program would be ideal for handheld devices. What I am imagining is a system where the user simply gestures on top of any window of interest and the characters are immediately input into that window. Several interesting issues come up:
-
Implementation -- what's the main architecture?
-
Suggestion from JimGettys: You can write a program that goes looking for windows that want input events, and "do the right thing": for large windows, allow scribbling anywhere, for small dialog boxes, pop up a window to scribble in. This can be done the same way that Bob Scheifler did speech recognition control of X (the atox program)
-
Window manager integration -- This is appealing because it should be easy to get full-screen effects. One downside is that every window manager would have to be hacked to include the recognition hooks, (UGH!).
-
Full-screen transparent window -- This is more appealing in that it would be independent of the window manager. Difficulties include support for/performance of transparent windows. KeithPackard is working on some new extensions to X that will support transparency.
-
Here's another idea: The recognizer could have a small window with little more than a toggle button. When in full-screen mode, the program could grab the pointer to get the motion and release events, then grab the server to draw the stroke. I'm a bit concerned about all that grabbing, but I'll have to at least try it out and see how it works. Comments/criticisms welcomed.
-
Another idea I've had that is not great, but may be useful in certain cases: The character recognizer could be implemented as a daemon with no window at all. Then, applications could send mouse events to the daemon and receive key events in return. If this were done right, it could be as simple as a few lines of code to make a program "gesture aware". Note: I really think it would be a bad idea if we were forced to recompile every program to make it gesture aware. Also, this daemon idea could be used along with any of the above suggestions. Some ways this might be useful include: a few lines of hacking could turn a virtual keyboard into a keyboard/scribbler. And it might make sense to take a library of widgets like GTK and make them gesture-aware in this way. I'll have to give this some more thought.
-
Distinguishing character input from mouse input.
-
One idea is to simply have a toggle button, (physical button or icon), to toggle between recognition and mouse modes.
-
A more advanced idea would be to try to be smart about what events are most likely mouse events. For example, as long as the gesture alphabet does not rely on a single tap, that could be passed through as a mouse event even when in character recognition mode.
-
The Newton uses the above-mentioned "intelligence" combined with a 4-way toggle in the corner of the screen (Ink-text, Real-time Recognition, Sketch, and Sketch Recognition). In text recognition mode, 'Taps' are taps unless placed in combination with text, and 'Pen down without an ensuing stroke (no movement)' changes to highlight mode. The system is as intuitive as I can imagine: it's always in text-recognition mode unless a slower processor requires deferred recognition (Ink-text) or the user is adding a Sketch.
-
The Newton supplemented recognition with strong correction capabilities and user feedback. When a word was recognized into text, the user had the option of clicking on the word and selecting a different word. The Newton learned from this input how to recognize the correct word in the future. The Newton also allowed meta-strokes. For example, scribbling a word/section out would delete the word/section (think of highlighting a section with your mouse, then hitting the delete key). This could be done by virtualizing a mouse click-and-drag to highlight a section, followed by a virtualized delete. The Newton also provided direct visible feedback so the user knew what the recognition engine was doing.
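The tap pass-through and pen-down-without-movement ideas above could be sketched roughly as follows. This is only an illustration of the decision logic, not how the Newton or any tool on this page does it; the thresholds TAP_RADIUS and HOLD_SECONDS and the function name classify are assumptions chosen for the example.

```python
import math

TAP_RADIUS = 4.0    # pixels of allowed jitter; an assumed value
HOLD_SECONDS = 0.5  # pen-down time that counts as a hold; an assumed value


def classify(samples):
    """Classify one pen-down..pen-up sequence of (x, y, t) samples.

    Returns "stroke" (hand to the recognizer), "tap" (pass through as a
    mouse click), or "hold" (pen down with no movement, e.g. highlight mode).
    """
    if not samples:
        raise ValueError("need at least one sample")
    x0, y0, t0 = samples[0]
    # Farthest distance the pen wandered from where it first touched down.
    travel = max(math.hypot(x - x0, y - y0) for x, y, _ in samples)
    if travel > TAP_RADIUS:
        return "stroke"
    duration = samples[-1][2] - t0
    return "hold" if duration >= HOLD_SECONDS else "tap"
```

This only works as long as the gesture alphabet does not itself rely on a single tap, as noted above.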
xstroke is one of the NativeProjects which aims to implement a full-screen handwriting recognition program, (using libstroke at least initially).
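For readers new to single-character gesture recognition, here is a rough sketch of one simple technique: bin the stroke into a 3x3 grid over its bounding box and look up the resulting cell sequence. It is written in the spirit of grid-based recognizers; no claim is made that libstroke or xstroke works exactly this way, and the function names and the tiny ALPHABET table are hypothetical.

```python
def grid_sequence(points):
    """Return the 3x3 grid cells (1-9) a stroke visits, as a string.

    Cells are numbered left to right, top to bottom, with y growing
    downward as in screen coordinates. Consecutive repeats are collapsed.
    """
    if not points:
        raise ValueError("need at least one point")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    # Avoid division by zero for perfectly straight strokes.
    width = max(max(xs) - min_x, 1)
    height = max(max(ys) - min_y, 1)
    cells = []
    for x, y in points:
        col = min(int(3 * (x - min_x) / width), 2)
        row = min(int(3 * (y - min_y) / height), 2)
        cell = row * 3 + col + 1
        if not cells or cells[-1] != cell:
            cells.append(cell)
    return "".join(str(c) for c in cells)


# Hypothetical alphabet: cell sequence -> character.
ALPHABET = {"14789": "L", "123": "-", "147": "|"}


def recognize(points):
    """Return the character for a stroke, or None if it is not recognized."""
    return ALPHABET.get(grid_sequence(points))
```

A real recognizer would need to tolerate sloppy strokes (for example by matching sequences approximately) and would handle thin strokes more carefully than this sketch, which simply pins them to the first row or column.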
Software
Programs implementing some form of handwriting recognition.
Here are several different pieces of prior art which may be useful for use/modification/inspiration:
Interesting background: http://web.mit.edu/cadet/www/OLCCR/project-paper.html
see JamesMastros