Terminology

Anumaan

Anumaan is a perspective-based predictive text entry system.

Context

A Context is the (text) sequence on which a prediction is based.

See Also: Partial Word Context, Partial Sentence Context.

Corpus
(Plural: Corpora)

A Corpus is any written text/document stored in a (text) file. Corpora are used to train a Language Model. Currently, Anumaan requires each Corpus to be stored as a .txt file.

Document Session

A Document Session is a period in which a user types something using Anumaan. Everything that a user does in a Domain, between the start and the end of the application (Anumaan), comprises a Document Session. Each Document Session is bound to one Domain. During a Document Session, a user can use predictions from K Domains, where K <= total number of Domains.

Document Object Model
(DOM)

The Document Object Model is an object model for representing an XML document in memory.
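
As an illustration of an in-memory object model, the sketch below parses a small XML string with Python's standard `xml.dom.minidom` module and walks the resulting node tree. The element name `ngram` and the attribute `weight` are invented for this example and are not Anumaan's actual file format.

```python
from xml.dom import minidom

# Hypothetical XML content; the tag and attribute names are assumptions.
XML_TEXT = (
    '<model>'
    '<ngram weight="3">good morning</ngram>'
    '<ngram weight="1">good night</ngram>'
    '</model>'
)


def read_ngrams(xml_text):
    """Parse XML into a DOM tree and return (text, weight) pairs from it."""
    dom = minidom.parseString(xml_text)
    pairs = []
    for node in dom.getElementsByTagName("ngram"):
        text = node.firstChild.data                # text node inside <ngram>
        weight = int(node.getAttribute("weight"))  # attribute on the element
        pairs.append((text, weight))
    return pairs


print(read_ngrams(XML_TEXT))
```

Once parsed, the whole document lives in memory as a tree of node objects, which is what makes lookups such as `getElementsByTagName` possible.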

Domain

A Domain allows Language Models to be classified into different named categories (like Personal Letters, Official Letters, Blogs, Newspaper & Scientific Articles, etc.). This allows a text written in a particular category/subject/field to be used for training in the corresponding category. The idea behind this is that a text from a particular category/subject/field will contain words and text sequences concerning that field/discourse only, so users are not bogged down with unnecessary predictions coming from all types of training Corpora. During a Document Session, a user can make use of predictions from various Domains, as and when required, by switching between the Domains.

Hence, while training, users must train a particular Corpus in the appropriate Domain. They must carefully select the Domain in which their text/Corpora will train the Language Model.
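
The separation of training data by Domain can be sketched as follows. This is a minimal illustration, not Anumaan's implementation: the class `DomainModels`, its methods, and the use of plain word counts as the learned values are all assumptions made for the example.

```python
from collections import Counter


class DomainModels:
    """Keeps one word-frequency table per named Domain (illustrative only)."""

    def __init__(self):
        self.models = {}  # Domain name -> Counter of words

    def train(self, domain, corpus_text):
        # Training text only affects the Domain it was assigned to.
        table = self.models.setdefault(domain, Counter())
        table.update(corpus_text.lower().split())

    def predict(self, active_domains, prefix, limit=3):
        # Merge counts from the Domains selected for this session only.
        merged = Counter()
        for domain in active_domains:
            merged.update(self.models.get(domain, Counter()))
        candidates = [(w, c) for w, c in merged.items() if w.startswith(prefix)]
        # Most frequent first; ties are broken alphabetically.
        candidates.sort(key=lambda wc: (-wc[1], wc[0]))
        return [w for w, _ in candidates[:limit]]


models = DomainModels()
models.train("Personal Letters", "dear friend dear family")
models.train("Scientific Articles", "data density derivation data data")

print(models.predict(["Personal Letters"], "d"))     # ['dear']
print(models.predict(["Scientific Articles"], "d"))  # ['data', 'density', 'derivation']
```

Because each Corpus only updates its own Domain's table, a session restricted to "Personal Letters" never sees candidates learned from scientific text.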

See Also: Language Model.

Frequency

The number of occurrences of a sequence (of words) in a Corpus/Corpora constitutes the Frequency of that sequence. A Language Model can give weightage to a sequence based on its Frequency.
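
A minimal sketch of counting the Frequency of one sequence is shown below. The function name `sequence_frequency`, the whitespace tokenisation, and the lower-casing are assumptions for the example; the actual tokenisation used by Anumaan is not described here.

```python
def sequence_frequency(corpus_texts, sequence):
    """Count how often `sequence` (a list of words) occurs across the corpora."""
    n = len(sequence)
    if n == 0:
        return 0
    target = tuple(word.lower() for word in sequence)
    count = 0
    for text in corpus_texts:
        words = text.lower().split()
        # Slide a window of n words over the text and compare each window.
        for i in range(len(words) - n + 1):
            if tuple(words[i:i + n]) == target:
                count += 1
    return count


corpora = ["the cat sat on the mat", "the cat ran"]
print(sequence_frequency(corpora, ["the", "cat"]))  # 2
```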

See Also: Language Model.

Input Method
(IM)

An Input Method is an application/program which allows users to enter characters & symbols in other applications.

Language Model
(LM)

A Language Model is a Mathematical Model: a condensed representation of the behaviour exhibited by the Prediction Engine when it runs over that Language Model.

In simple words, users train a Language Model with their text(s), and it stores the learned values (e.g. the frequency count of each sequence), which are then used by the Prediction Engine for text prediction.

See Also: Mathematical Model.

Mathematical Model

A Mathematical Model is a condensed representation of some phenomenon.

See Also: Language Model.

Motor Disability

A form of disability which hampers the normal movement of a person's body parts.

N-gram
(N-gram)

An N-gram is a sequence (of words) of length N, with an associated weight. Generally, N-grams are stored in N-gram Language Models.
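
The sketch below extracts every N-gram from a piece of text and attaches a weight to each. Using the raw occurrence count as the weight is an assumption made for illustration; the function name `extract_ngrams` is likewise hypothetical.

```python
from collections import Counter


def extract_ngrams(text, n):
    """Return a Counter mapping each N-gram (a tuple of n words) to its count."""
    words = text.lower().split()
    # Each window of n consecutive words is one N-gram.
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


bigrams = extract_ngrams("to be or not to be", 2)
print(bigrams[("to", "be")])  # 2
print(len(bigrams))           # 4 distinct bigrams
```

A text shorter than N words simply yields no N-grams.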

See Also: N-gram Language Model.

N-gram Language Model
(N-gram)

A Language Model which contains trained N-grams in condensed form.

See Also: N-gram.

On-Screen Keyboard

A mouse-operated keyboard, where the keys are normally arranged in a rectangular widget. Users press these keys with their mouse to put text/characters into the same or another application.

In Anumaan, the On-Screen Keyboard allows a user to put text/characters into the Anumaan textarea widget.

See Also: Input Method, Text Entry Widget.

Online Adaptation

Online Adaptation refers to the adaptation of the Language Model during a Document Session. Online Adaptation gives significant weightage to the text sequences (new/old) written in the current Document Session, and the Language Model is modified accordingly. Online Adaptation allows the system to learn in real time during the Document Session.
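
Purely as a conceptual sketch of the idea, the code below boosts the counts of sequences typed in the current session. It does not describe any Anumaan feature: the function `adapt`, the bigram table, and the boost value `SESSION_WEIGHT` are all hypothetical.

```python
from collections import Counter

SESSION_WEIGHT = 5  # hypothetical boost given to sequences typed in this session


def adapt(model, session_text, n=2, session_weight=SESSION_WEIGHT):
    """Add weighted counts for every N-gram found in the session text."""
    words = session_text.lower().split()
    for i in range(len(words) - n + 1):
        # New sequences are created, old ones are strengthened.
        model[tuple(words[i:i + n])] += session_weight
    return model


model = Counter({("good", "morning"): 3})
adapt(model, "good evening")
print(model[("good", "evening")])  # 5
print(model[("good", "morning")])  # 3
```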

The current version of Anumaan does not provide Online Adaptation. Hence users would have to save their text and retrain the Language Model with it.

See Also: Language Model, Document Session.

Partial Word Context

A Partial Word Context is a partial word typed by the user. It forms a context for the full words in the Nth position of an N-gram. The Partial Word Context is lost when the user presses the spacebar.
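
The behaviour described above can be sketched as a function that separates the completed words from the word still being typed. The name `split_typed_text` and the treatment of any trailing whitespace as a "spacebar press" are assumptions for the example.

```python
def split_typed_text(typed):
    """Return (completed_words, partial_word) for the text typed so far."""
    if typed == "" or typed[-1].isspace():
        # The last key was a space: no word is in progress any more.
        return typed.split(), ""
    words = typed.split()
    return words[:-1], words[-1]


print(split_typed_text("good mor"))       # (['good'], 'mor')
print(split_typed_text("good morning "))  # (['good', 'morning'], '')
```

In the second call the partial word is empty, which mirrors the Partial Word Context being lost once the spacebar is pressed.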

See Also: Context, Partial Sentence Context.

Partial Sentence Context

A group of words which acts as a context for prediction. Currently in Anumaan, a prediction is generated only for the Nth position in an N-gram sequence, and the preceding sequence of N-1 words acts as the Partial Sentence Context.
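
A minimal sketch of predicting the Nth word from the preceding N-1 words follows. The function `predict_next`, the dictionary of N-gram counts, and ranking by count are illustrative assumptions rather than Anumaan's actual lookup.

```python
from collections import Counter


def predict_next(ngram_counts, typed_words, n, limit=3):
    """Suggest words for the Nth position, given the last N-1 typed words."""
    # The Partial Sentence Context is the last N-1 words (empty when n == 1).
    context = tuple(w.lower() for w in typed_words[-(n - 1):]) if n > 1 else ()
    candidates = Counter()
    for gram, count in ngram_counts.items():
        if len(gram) == n and gram[:-1] == context:
            candidates[gram[-1]] += count
    # Most frequent first; ties are broken alphabetically.
    ranked = sorted(candidates, key=lambda w: (-candidates[w], w))
    return ranked[:limit]


trigram_counts = {
    ("how", "are", "you"): 4,
    ("how", "are", "they"): 1,
    ("where", "are", "you"): 2,
}
print(predict_next(trigram_counts, ["so", "how", "are"], 3))  # ['you', 'they']
```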

See Also: Context, Partial Word Context.

Partial Text Context

A group of words or a partial word which acts as a context for prediction.

See Also: Context, Partial Word Context, Partial Sentence Context.

Perspective

Perspective is a kind of orientation (of the Language Model, Prediction Engine, different Domains etc.) through which a prediction is generated. A Perspective influences the Decision Function of the Prediction Engine, which in turn influences the generated prediction.

Currently, Anumaan has N-gram and Domain as its only Perspectives. Other Perspectives are being worked on and will be released with subsequent versions.

See Also: Anumaan.

Prediction Engine
(PE)

The Prediction Engine performs sequence/text prediction by running over the Language Model. It reads the condensed values provided by the LM and then runs a Decision Function to compute the most appropriate (text) prediction.
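
The split between the Language Model's stored values and the Decision Function can be sketched as below. The class `PredictionEngine`, the function `most_frequent`, and the dictionary layout of the Language Model are hypothetical names chosen for this example only.

```python
def most_frequent(candidates):
    """A simple Decision Function: highest learned value first, ties alphabetical."""
    return sorted(candidates, key=lambda word: (-candidates[word], word))


class PredictionEngine:
    def __init__(self, language_model, decision_function=most_frequent):
        # Assumed layout: context tuple -> {next word: learned value}
        self.lm = language_model
        self.decide = decision_function

    def predict(self, context, limit=3):
        # Read the condensed values for this context, then let the
        # Decision Function order the candidates.
        candidates = self.lm.get(tuple(context), {})
        return self.decide(candidates)[:limit]


lm = {("good",): {"morning": 5, "night": 2, "day": 2}}
engine = PredictionEngine(lm)
print(engine.predict(["good"]))  # ['morning', 'day', 'night']
```

Passing the Decision Function as a parameter keeps the ranking rule separate from the stored values, so a different rule could be swapped in without retraining.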

See Also: Language Model.

Text Entry Widget

A Text Entry Widget is a user interface element in which the user types text (through a keyboard or the On-Screen Keyboard) in the application (Anumaan). Text typed in the Text Entry Widget forms the Context for text prediction.

See Also: Context, Partial Word Context, Partial Sentence Context.

Training

Training a Language Model may involve computing numerical values for text sequences (e.g. the frequency count of a sequence), adding certain annotations, or learning a certain mathematical function, each of which may further be used by the Decision Function as a criterion for deciding on the most probable text sequence(s).

See Also: Language Model.

eXtensible Markup Language
(XML)

eXtensible Markup Language is a document standard which is used to create markup languages. Anumaan uses an XML file format to store its Language Model.
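
To illustrate how learned values could be stored as XML, the sketch below writes a table of N-gram counts to an XML string and reads it back using Python's standard `xml.etree.ElementTree` module. The element names `languagemodel` and `ngram` and the attribute `count` are invented for the example; Anumaan's real schema is not documented here.

```python
import xml.etree.ElementTree as ET


def model_to_xml(ngram_counts):
    """Serialise {tuple of words: count} into an XML string."""
    root = ET.Element("languagemodel")  # hypothetical root element name
    for words, count in sorted(ngram_counts.items()):
        node = ET.SubElement(root, "ngram", {"count": str(count)})
        node.text = " ".join(words)
    return ET.tostring(root, encoding="unicode")


def model_from_xml(xml_text):
    """Rebuild the {tuple of words: count} table from the XML string."""
    root = ET.fromstring(xml_text)
    return {
        tuple(node.text.split()): int(node.get("count"))
        for node in root.findall("ngram")
    }


model = {("good", "morning"): 3, ("good", "night"): 1}
xml_text = model_to_xml(model)
print(xml_text)
print(model_from_xml(xml_text) == model)  # True
```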

See Also: Document Object Model.