Tokeniser for Maltese

The tokeniser is a tool that takes a text as input and returns a list of tokens. Tokens can be orthographic words, numerals and punctuation marks.

The tokeniser is designed to work on Maltese texts.

Using the tokeniser

The tokeniser can be used in two ways: online, through a graphical user interface, or integrated into other applications as a web service.

Online graphical user interface

A graphical user interface is available here. It offers different levels of tagging that can be applied to a given text.

Web Service

The tokeniser is also available as a web service. The WSDL link is

The service has one method which can be invoked:

    - String tokenise(String text, Boolean tokenTags, String separator)

The method takes three parameters:

    - text
This is the text to be tokenised.

    - tokenTags
This is a boolean. If tokenTags is true, each output token is wrapped in tags
(e.g. <token> tagged_text </token>). If false, the tokens are returned without tags.

    - separator
This is the string placed between consecutive tokens in the output string.
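
The effect of tokenTags and separator on the returned string can be illustrated with a small local sketch in Python. It does not call the web service and does not perform any tokenisation: the token list is hand-picked for illustration, and the helper names format_tokens and parse_output are hypothetical. The exact whitespace inside the <token> tags is an assumption, so the parser accepts tags with or without padding.

```python
import re

# Matches one tagged token; tolerates optional whitespace inside the tags,
# since the exact padding used by the service is an assumption here.
TOKEN_RE = re.compile(r"<token>\s*(.*?)\s*</token>", re.DOTALL)


def format_tokens(tokens, token_tags, separator):
    """Build an output string in the shape the parameter descriptions suggest.

    Hypothetical helper: it mimics the documented behaviour of tokenTags
    and separator, not the service's actual implementation.
    """
    if token_tags:
        tokens = [f"<token>{t}</token>" for t in tokens]
    return separator.join(tokens)


def parse_output(output, token_tags, separator):
    """Recover a list of tokens from a returned output string."""
    if token_tags:
        # With tags, the token boundaries are unambiguous.
        return TOKEN_RE.findall(output)
    # Without tags, rely on the separator alone.
    return output.split(separator) if output else []


if __name__ == "__main__":
    tokens = ["Dan", "huwa", "test", "."]  # hand-picked, not service output
    print(format_tokens(tokens, False, " "))
    print(format_tokens(tokens, True, "\n"))
```

When tokenTags is false, a client can only split the result on the separator, so it is safer to pick a separator that cannot occur inside a token (a newline, for instance). With tokenTags set to true, the tags themselves mark the token boundaries.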