fuzzify-pashto

A JavaScript library that creates regular expressions (regex) for fuzzy searching (approximate string matching) of Pashto text.


Live Demo

Text to Search:
پښتو ژبه د لرغونو آريایي ژبو څخه يوه خپلواکه ژبه ده چې له پخوا څخه په څو نومونو ياده شوې چې يو لړ نومونه ېې پښتو، پختو، پوختو، په هندي (پټاني)، په پاړسي او نړېوالې کچه د افغاني ژبې په نوم شهرت لري
Code:
import { fuzzifyPashto } from "fuzzify-pashto";  

const fuzzyRegex = fuzzifyPashto("سخه", {
	es2018: true,
});
Note: With the es2018 option enabled, the generated regex uses ES2018 lookbehind assertions. This allows for cleaner matching at the beginning of words, but lookbehind is not supported in all environments.
Generated Regex:
(?<![ء-ٰٟ-ۓە])[صسثڅ][ا|و|ی|ي|ع]?[ښخشخهحغ][ا|و|ی|ي|ع]?(?:[اهحہۀ]|هٔ)[ا|و|ی|ي|ع]?

Note: Regex may appear out of order due to browser display issues with RTL-LTR text
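
Because lookbehind support varies between environments, it can help to check for it before enabling the es2018 option. The following is a minimal sketch of a common feature-detection approach, not something the library provides: the `supportsLookbehind` helper is a hypothetical name, and the commented-out call only illustrates how its result might be passed along.

```javascript
// Detect whether this JavaScript engine can compile ES2018 lookbehind
// assertions such as (?<!a)b. Engines without support throw a SyntaxError
// when the pattern is constructed.
// NOTE: supportsLookbehind is a hypothetical helper, not part of fuzzify-pashto.
function supportsLookbehind() {
  try {
    new RegExp("(?<!a)b");
    return true;
  } catch (err) {
    return false;
  }
}

const es2018 = supportsLookbehind();

// Assumed usage with the library (not executed in this sketch):
// const fuzzyRegex = fuzzifyPashto("سخه", { es2018 });
console.log(es2018 ? "lookbehind available" : "lookbehind not available");
```

Constructing the pattern with `new RegExp` inside a `try` block keeps the check safe: a regex literal containing unsupported syntax would fail when the whole script is parsed, before any fallback could run.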

Problem

It can be difficult to search for words in Pashto texts or dictionaries because of variants or difficulties in spelling. This is because:

For all these reasons, it can be difficult to search for words by sound or by a particular non-standard spelling.

Solution

Search strings can be converted to regular expressions that can be used for fuzzy searching so that, for example:

Searching For | Will Match
گرزيدل | ګرځېدل
سنگہ | څنګه
انطزار | انتظار
د پاره | دپاره
مالوم | معلوم
زبا | ژبه
سڑے | سړی

and vice versa.
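
To illustrate the general idea of turning a search string into a regex of interchangeable letters, here is a toy sketch. It is not the library's implementation: `toyFuzzify` and `escapeForRegex` are hypothetical names, and the three letter groups are simply copied from the character classes in the sample output in the Usage section. The letters are written as `\u` escapes so the code pastes reliably despite RTL display issues.

```javascript
// Toy illustration only: NOT how fuzzify-pashto is implemented.
// Each string is a group of letters treated as interchangeable.
const groups = [
  "\u0635\u0633\u062b\u0685",             // ص س ث څ
  "\u0631\u0693\u0691\u06bc",             // ر ړ ڑ ڼ
  "\u06ab\u0696\u06a9\u0642\u06af\u0643", // ګ ږ ک ق گ ك
];

// Escape characters that have a special meaning in a regex.
function escapeForRegex(ch) {
  return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace every letter that belongs to a group with a character class
// covering the whole group; other characters are matched literally.
function toyFuzzify(input) {
  const pattern = Array.from(input)
    .map((ch) => {
      const group = groups.find((g) => g.includes(ch));
      return group ? `[${group}]` : escapeForRegex(ch);
    })
    .join("");
  return new RegExp(pattern);
}

const regex = toyFuzzify("\u0633\u0631\u06a9"); // search string: سرک
console.log(regex.test("\u0633\u0693\u06a9"));  // سړک → true
console.log(regex.test("\u0633\u0693\u0644"));  // final letter outside the groups → false
```

The real output shown in this document also allows optional letters between consonants and handles word boundaries, which this sketch leaves out.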

Usage

npm install --save fuzzify-pashto
import { fuzzifyPashto } from "fuzzify-pashto";

const fuzzyRegex = fuzzifyPashto("سرک");
console.log(fuzzyRegex);

// output: /(?:^|[^\u0621-\u065f\u0670-\u06d3\u06d5])?[صسثڅ]ع?[رړڑڼ]ع?[ګږکقگك]/gm

See the Live Demo for interactive usage examples.
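
As a rough sketch of what can be done with the result, the snippet below filters a small word list using the regex from the sample output above. It does not call the library: the regex is transcribed by hand with `\u` escapes, and the word list is a made-up example.

```javascript
// The regex from the sample output above, transcribed with \u escapes
// (an assumption: it is copied by hand rather than generated by the library).
const fuzzyRegex =
  /(?:^|[^\u0621-\u065f\u0670-\u06d3\u06d5])?[\u0635\u0633\u062b\u0685]\u0639?[\u0631\u0693\u0691\u06bc]\u0639?[\u06ab\u0696\u06a9\u0642\u06af\u0643]/gm;

// Hypothetical word list: سړک and کور.
const words = ["\u0633\u0693\u06a9", "\u06a9\u0648\u0631"];

// String.prototype.search always starts from index 0 and ignores lastIndex,
// so reusing a regex that carries the g flag is safe here.
const hits = words.filter((word) => word.search(fuzzyRegex) !== -1);

console.log(hits.length); // 1 (only سړک matches)
```

Using `search` rather than `test` avoids a common pitfall with the `g` flag, where `test` keeps its position between calls and can skip matches when the same regex is reused.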

API

fuzzifyPashto.fuzzifyPashto(input, [options])

Takes a string of Pashto text (usually a word) and returns a regular expression that can be used to search Pashto text for approximate matches.

Options

options.matchStart

Chooses where in the string matches are allowed to start.

options.matchWholeWordOnly
options.allowSpacesInWords
options.script
options.returnWholeWord
options.es2018
options.ignoreDiacritics