fuzzify-pashto

A JavaScript library that creates regular expressions (regex) for fuzzy searching (approximate string matching) of Pashto text.

Problem

It can be difficult to search for words in Pashto texts or dictionaries because of variants or difficulties in spelling. This is because:

For all these reasons, it can be difficult to search for words based on their sound or on a particular non-standard spelling.

Solution

Search strings can be converted to regular expressions that can be used for fuzzy searching so that, for example:

Searching For | Will Match
گرزيدل | ګرځېدل
سنگہ | څنګه
انطزار | انتظار
د پاره | دپاره
مالوم | معلوم
زبا | ژبه
سڑے | سړی

and vice versa.
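The idea of turning a search string into a regular expression can be illustrated with a minimal sketch. This is not the library's implementation: `sketchFuzzify` and `similarLetterGroups` are hypothetical names, and the three letter groups are simply the character classes that appear in the example output in the Usage section. The real library handles many more letters and rules.

```javascript
// Groups of letters treated as interchangeable (assumption: copied from the
// character classes in the example output under Usage; far from complete).
const similarLetterGroups = ["صسثڅ", "رړڑڼ", "ګږکقگك"];

// Escape characters that have a special meaning inside a regex.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper, not part of fuzzify-pashto: replaces every letter
// that belongs to a group with a character class covering the whole group.
function sketchFuzzify(input) {
  const pattern = Array.from(input)
    .map((letter) => {
      const group = similarLetterGroups.find((g) => g.includes(letter));
      // Letters outside the known groups are matched literally.
      return group ? `[${group}]` : escapeRegex(letter);
    })
    .join("");
  return new RegExp(pattern);
}

const sketchRegex = sketchFuzzify("سرک");
console.log(sketchRegex.test("صړق")); // true
console.log(sketchRegex.test("کتاب")); // false
```

Each letter of the search string is widened to every letter that might be confused with it, so a differently spelled word with the same sound pattern still matches.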

Usage

npm install --save fuzzify-pashto

import { fuzzifyPashto } from "fuzzify-pashto";

const fuzzyRegex = fuzzifyPashto("سرک");
console.log(fuzzyRegex);

// output: /(?:^|[^\u0621-\u065f\u0670-\u06d3\u06d5])?[صسثڅ]ع?[رړڑڼ]ع?[ګږکقگك]/gm
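As a rough illustration of how such a regex might be used, the sketch below filters a small word list. The regex literal is copied from the output above so that the snippet runs without the package installed; the word list and the `fuzzyFilter` helper are hypothetical and only serve the example.

```javascript
// Regex copied from the output shown above for fuzzifyPashto("سرک").
const fuzzyRegex = /(?:^|[^\u0621-\u065f\u0670-\u06d3\u06d5])?[صسثڅ]ع?[رړڑڼ]ع?[ګږکقگك]/gm;

// Hypothetical word list to search through.
const words = ["سړک", "صرق", "کتاب", "سرک"];

// Hypothetical helper: keeps the candidates that the regex matches.
function fuzzyFilter(regex, candidates) {
  return candidates.filter((word) => {
    // A regex with the g flag remembers where the last match ended,
    // so reset it before testing each new string.
    regex.lastIndex = 0;
    return regex.test(word);
  });
}

const matches = fuzzyFilter(fuzzyRegex, words);
console.log(matches); // the first, second, and fourth words
```

Because the returned regex carries the `g` flag, repeated calls to `test()` on different strings can give surprising results unless `lastIndex` is reset, which is why the helper does so on every iteration.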

See the Live Demo for interactive usage examples.

API

fuzzifyPashto.fuzzifyPashto(input, [options])

Takes a string of Pashto text (usually a word) and returns a regular expression that can be used to search for approximate matches in Pashto text.

Options

options.matchStart

Chooses where in the string matches are allowed to start.

options.matchWholeWordOnly
options.allowSpacesInWords
options.script
options.returnWholeWord
options.es2018
options.ignoreDiacritics