Diacritical Character to ASCII Character Mapping

Dgidx can map international characters from the Latin1, Latin Extended-A, and Windows CP1252 character sets to their simple ASCII equivalents during indexing.

Using the --diacritic-folding flag on Dgidx causes accented characters to be mapped to simple ASCII equivalents.

Using the --diacritic-folding flag on the Dgraph allows Anglicized search queries such as cafe to match result text that contains accented international characters, such as café.

Accented characters are folded before indexing, so only the folded form is indexed. The tables below list the mappings performed; characters not listed are not affected by the --diacritic-folding option.

Note that capital characters are mapped to lower case equivalents because Endeca search indexing is always case-folded.
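The behavior described above can be illustrated with a short Python sketch. It is not the Dgidx implementation: FOLD_EXCERPT copies only a handful of rows from the tables in this section, and the names fold and build_index are hypothetical. The sketch shows that only the folded form of each term is stored, so an unaccented query finds accented text.

```python
# Illustrative sketch only; not the Dgidx implementation.
# FOLD_EXCERPT copies a few rows from the mapping tables in this section.
FOLD_EXCERPT = {
    "À": "a", "Ç": "c", "É": "e", "é": "e",
    "Ñ": "n", "ñ": "n", "Ø": "o", "ü": "u", "ß": "s",
}

def fold(text: str) -> str:
    """Replace listed characters with their ASCII equivalents; lower-case the rest."""
    return "".join(FOLD_EXCERPT.get(ch, ch.lower()) for ch in text)

def build_index(documents):
    """Index each whitespace-separated term in its folded form only."""
    index = {}
    for doc_id, text in enumerate(documents):
        for term in fold(text).split():
            index.setdefault(term, set()).add(doc_id)
    return index

documents = ["Café Olé", "Cafe society", "SEÑOR"]
index = build_index(documents)

# The query is folded the same way before lookup.
hits = index.get(fold("cafe"), set())
```

Here hits contains the first two documents, and the index has no entry for the accented term café because only the folded form was stored.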

ISO Latin1 decimal code ISO Latin 1 character ASCII map character Description
192 À a Capital A, grave accent
193 Á a Capital A, acute accent
194 Â a Capital A, circumflex accent
195 Ã a Capital A, tilde
196 Ä a Capital A, dieresis or umlaut mark
197 Å a Capital A, ring
198 Æ a Capital AE diphthong
199 Ç c Capital C, cedilla
200 È e Capital E, grave accent
201 É e Capital E, acute accent
202 Ê e Capital E, circumflex accent
203 Ë e Capital E, dieresis or umlaut mark
204 Ì i Capital I, grave accent
205 Í i Capital I, acute accent
206 Î i Capital I, circumflex accent
207 Ï i Capital I, dieresis or umlaut mark
208 Ð e Capital Eth, Icelandic
209 Ñ n Capital N, tilde
210 Ò o Capital O, grave accent
211 Ó o Capital O, acute accent
212 Ô o Capital O, circumflex accent
213 Õ o Capital O, tilde
214 Ö o Capital O, dieresis or umlaut mark
216 Ø o Capital O, slash
217 Ù u Capital U, grave accent
218 Ú u Capital U, acute accent
219 Û u Capital U, circumflex accent
220 Ü u Capital U, dieresis or umlaut mark
221 Ý y Capital Y, acute accent
222 Þ p Capital thorn, Icelandic
223 ß s Small sharp s, German
224 à a Small a, grave accent
225 á a Small a, acute accent
226 â a Small a, circumflex accent
227 ã a Small a, tilde
228 ä a Small a, dieresis or umlaut mark
229 å a Small a, ring
230 æ a Small ae diphthong
231 ç c Small c, cedilla
232 è e Small e, grave accent
233 é e Small e, acute accent
234 ê e Small e, circumflex accent
235 ë e Small e, dieresis or umlaut mark
236 ì i Small i, grave accent
237 í i Small i, acute accent
238 î i Small i, circumflex accent
239 ï i Small i, dieresis or umlaut mark
240 ð e Small eth, Icelandic
241 ñ n Small n, tilde
242 ò o Small o, grave accent
243 ó o Small o, acute accent
244 ô o Small o, circumflex accent
245 õ o Small o, tilde
246 ö o Small o, dieresis or umlaut mark
248 ø o Small o, slash
249 ù u Small u, grave accent
250 ú u Small u, acute accent
251 û u Small u, circumflex accent
252 ü u Small u, dieresis or umlaut mark
253 ý y Small y, acute accent
254 þ p Small thorn, Icelandic
255 ÿ y Small y, dieresis or umlaut mark

Latin Extended-A decimal code Latin Extended-A character ASCII map character Description
256 Ā a Capital A, macron accent
257 ā a Small a, macron accent
258 Ă a Capital A, breve accent
259 ă a Small a, breve accent
260 Ą a Capital A, ogonek accent
261 ą a Small a, ogonek accent
262 Ć c Capital C, acute accent
263 ć c Small c, acute accent
264 Ĉ c Capital C, circumflex accent
265 ĉ c Small c, circumflex accent
266 Ċ c Capital C, dot accent
267 ċ c Small c, dot accent
268 Č c Capital C, caron accent
269 č c Small c, caron accent
270 Ď d Capital D, caron accent
271 ď d Small d, caron accent
272 Đ d Capital D, with stroke accent
273 đ d Small d, with stroke accent
274 Ē e Capital E, macron accent
275 ē e Small e, macron accent
276 Ĕ e Capital E, breve accent
277 ĕ e Small e, breve accent
278 Ė e Capital E, dot accent
279 ė e Small e, dot accent
280 Ę e Capital E, ogonek accent
281 ę e Small e, ogonek accent
282 Ě e Capital E, caron accent
283 ě e Small e, caron accent
284 Ĝ g Capital G, circumflex accent
285 ĝ g Small g, circumflex accent
286 Ğ g Capital G, breve accent
287 ğ g Small g, breve accent
288 Ġ g Capital G, dot accent
289 ġ g Small g, dot accent
290 Ģ g Capital G, cedilla accent
291 ģ g Small g, cedilla accent
292 Ĥ h Capital H, circumflex accent
293 ĥ h Small h, circumflex accent
294 Ħ h Capital H, with stroke accent
295 ħ h Small h, with stroke accent
296 Ĩ i Capital I, tilde accent
297 ĩ i Small i, tilde accent
298 Ī i Capital I, macron accent
299 ī i Small i, macron accent
300 Ĭ i Capital I, breve accent
301 ĭ i Small i, breve accent
302 Į i Capital I, ogonek accent
303 į i Small i, ogonek accent
304 İ i Capital I, dot accent
305 ı i Small dotless i
306 Ĳ i Capital ligature IJ
307 ĳ i Small ligature ij
308 Ĵ j Capital J, circumflex accent
309 ĵ j Small j, circumflex accent
310 Ķ k Capital K, cedilla accent
311 ķ k Small k, cedilla accent
312 ĸ k Small Kra
313 Ĺ l Capital L, acute accent
314 ĺ l Small l, acute accent
315 Ļ l Capital L, cedilla accent
316 ļ l Small l, cedilla accent
317 Ľ l Capital L, caron accent
318 ľ l Small l, caron accent
319 Ŀ l Capital L, middle dot accent
320 ŀ l Small l, middle dot accent
321 Ł l Capital L, with stroke accent
322 ł l Small l, with stroke accent
323 Ń n Capital N, acute accent
324 ń n Small n, acute accent
325 Ņ n Capital N, cedilla accent
326 ņ n Small n, cedilla accent
327 Ň n Capital N, caron accent
328 ň n Small n, caron accent
329 ŉ n Small n, preceded by apostrophe
330 Ŋ n Capital Eng
331 ŋ n Small Eng
332 Ō o Capital O, macron accent
333 ō o Small o, macron accent
334 Ŏ o Capital O, breve accent
335 ŏ o Small o, breve accent
336 Ő o Capital O, with double acute accent
337 ő o Small o, with double acute accent
338 Πo Capital Ligature OE
339 œ o Small Ligature OE
340 Ŕ r Capital R, acute accent
341 ŕ r Small r, acute accent
342 Ŗ r Capital R, cedilla accent
343 ŗ r Small r, cedilla accent
344 Ř r Capital R, caron accent
345 ř r Small r, caron accent
346 Ś s Capital S, acute accent
347 ś s Small s, acute accent
348 Ŝ s Capital S, circumflex accent
349 ŝ s Small s, circumflex accent
350 Ş s Capital S, cedilla accent
351 ş s Small s, cedilla accent
352 Š s Capital S, caron accent
353 š s Small s, caron accent
354 Ţ t Capital T, cedilla accent
355 ţ t Small t, cedilla accent
356 Ť t Capital T, caron accent
357 ť t Small t, caron accent
358 Ŧ t Capital T, with stroke accent
359 ŧ t Small t, with stroke accent
360 Ũ u Capital U, tilde accent
361 ũ u Small u, tilde accent
362 Ū u Capital U, macron accent
363 ū u Small u, macron accent
364 Ŭ u Capital U, breve accent
365 ŭ u Small u, breve accent
366 Ů u Capital U with ring above
367 ů u Small u with ring above
368 Ű u Capital U, double acute accent
369 ű u Small u, double acute accent
370 Ų u Capital U, ogonek accent
371 ų u Small u, ogonek accent
372 Ŵ w Capital W, circumflex accent
373 ŵ w Small w, circumflex accent
374 Ŷ y Capital Y, circumflex accent
375 ŷ y Small y, circumflex accent
376 Ÿ y Capital Y, dieresis accent
377 Ź z Capital Z, acute accent
378 ź z Small z, acute accent
379 Ż z Capital Z, dot accent
380 ż z Small z, dot accent
381 Ž z Capital Z, caron accent
382 ž z Small z, caron accent
383 ſ s Small long s
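
An application that needs to reproduce this folding outside the engine (for example, to normalize text before comparing it with indexed terms) could load the table rows into a lookup table. The following Python sketch assumes each row has the form "code character ascii description", as in the tables above. The names ROWS, parse_rows, and FOLD_TABLE are hypothetical, and only a few rows are included for brevity.

```python
# Sketch: turn table rows ("code character ascii description") into a
# table for str.translate. ROWS is a small sample copied from the tables above.
ROWS = """\
198 Æ a Capital AE diphthong
216 Ø o Capital O, slash
223 ß s Small sharp s, German
321 Ł l Capital L, with stroke accent
339 œ o Small Ligature OE
"""

def parse_rows(rows: str) -> dict:
    """Return {code point: ASCII replacement}, suitable for str.translate."""
    table = {}
    for line in rows.splitlines():
        code, char, ascii_char, _description = line.split(maxsplit=3)
        # Sanity check: the decimal code must equal the character's code point.
        if ord(char) != int(code):
            raise ValueError(f"row {code}: character/code mismatch")
        table[int(code)] = ascii_char
    return table

FOLD_TABLE = parse_rows(ROWS)

# Apply the table first, then lower-case whatever remains.
folded = "Æsir Straße Łukasz œuvre Øre".translate(FOLD_TABLE).lower()
```

Each character maps to a single ASCII character, as in the tables: Æ becomes a rather than ae, and ß becomes s rather than ss.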