Part of Speech Tagger for Maltese

Part-of-speech (POS) tagging for Maltese is carried out using TnT, a statistical part-of-speech tagger implemented by Thorsten Brants. The model is trained on manually annotated texts and reaches an accuracy of 96%. The tags used are listed below, each with a description.

CC      coordinating conjunction
CMP     complementiser
CS      subordinating conjunction
DD      determiner
DDC     definite determiner, clitic
DP      determiner, plural
DQ      determiner quantifier
DS      specifier, singular
EX      existential marker
II      interjection
MJ      modifier, adjective
MV      modifier, adverb
NCI     numeral, cardinal, intransitive
NCT     numeral, cardinal, transitive
NN      common noun
NO      numeral, ordinal
NP      proper name
NPI     initial in proper name
NV      verbal negator
PAC     particle, aspect marker, continuous aspect
PAF     particle, aspect marker, prospective aspect
PD      pronoun, demonstrative
PI      pronoun, indefinite
PMP     preposition ma' with bound pronoun
PP      pronoun, personal
PR      pronoun, reflexive
PRP     preposition
PRPC    fused preposition-article
PT      pronoun, possessive
PUN     punctuation
RA      residual, acronym
RB      residual, abbreviation
RD      residual, date
RFR     residual, formula, mathematical symbol
RFW     residual, foreign word
RH      residual, honorific
RO      residual, other
RS      residual, other symbol
UAM     (unique, unassigned) multiword utterance
VA      verb, auxiliary
VG      pseudo verb
VP      participle, active or passive
VV      main verb

Using the tagger

The part-of-speech tagger can be used in two ways: online, through a graphical user interface, or integrated into other applications as a web service.

Online graphical user interface

A graphical user interface is available here; it offers different levels of tagging that can be applied to a given text.

Web Service

The POS tagger is also available as a web service. Its WSDL is located at http://metanet4u.research.um.edu.mt/services/MtPOS?wsdl.

The service has two methods which can be invoked:

    - String tagOneWordReturn(String text)

    - String tagParagraphReturn(String text)

Both methods take the text to be tagged as a string and return the tagged text as another string. The difference between the two is that tagOneWordReturn returns the output one word per line, while tagParagraphReturn returns it as tagged paragraphs, preserving any paragraphs present in the input string.
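
As a rough illustration of how the two methods might be called from a client application, the sketch below uses the third-party Python library zeep to load the WSDL. The helper names make_client and tag_text are hypothetical. The sketch assumes that the WSDL exposes the operations under exactly the names listed above, each taking a single string argument, and that the service is reachable; none of this has been verified here.

```python
WSDL_URL = "http://metanet4u.research.um.edu.mt/services/MtPOS?wsdl"


def make_client(wsdl_url=WSDL_URL):
    """Build a SOAP client from the WSDL (requires `pip install zeep`)."""
    # Imported lazily so this module can be loaded without zeep installed.
    from zeep import Client
    return Client(wsdl=wsdl_url)


def tag_text(client, text, one_word_per_line=True):
    """Send `text` to the tagger and return the tagged string.

    Assumption: the operations are named tagOneWordReturn and
    tagParagraphReturn and accept the text as their only argument.
    """
    if one_word_per_line:
        return client.service.tagOneWordReturn(text)
    return client.service.tagParagraphReturn(text)
```

A call such as tag_text(make_client(), text) would then request one-word-per-line output, and passing one_word_per_line=False would request tagged paragraphs instead. Passing the client in as an argument keeps the network access in one place and makes the helper easy to exercise with a stand-in object.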

The format of the output is as follows:
word_TAG
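
To consume this output in another application, each token can be split at its last underscore. The following is a minimal sketch: the function name parse_tagged is hypothetical, and it assumes that tokens are separated by whitespace (spaces or newlines), which the description above suggests but does not state explicitly. The sample string is made up for illustration and is not actual tagger output.

```python
def parse_tagged(output):
    """Split tagger output of the form word_TAG into (word, tag) pairs.

    Assumes tokens are separated by whitespace. The split is made at the
    last underscore, so a word that itself contains "_" keeps it.
    Tokens without any underscore are returned with tag None.
    """
    pairs = []
    for token in output.split():
        word, sep, tag = token.rpartition("_")
        if sep:
            pairs.append((word, tag))
        else:
            pairs.append((token, None))
    return pairs


# Illustrative input only, not real tagger output.
sample = "kelb_NN jiekol_VV ._PUN"
for word, tag in parse_tagged(sample):
    print(word, tag)
```

Because the function splits on any whitespace, the same code should handle the output of both methods, as long as the whitespace assumption holds.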