Dgidx supports mapping Latin1, Latin extended-A, and Windows CP1252 international characters to their simple ASCII equivalents during indexing.
Specifying the --diacritic-folding flag on Dgidx causes accented characters to be mapped to their simple ASCII equivalents.
Specifying the same flag on the Dgraph allows Anglicized search queries, such as cafe, to match result text that contains accented international characters, such as café.
Accented characters are folded before indexing, so only the unaccented form is indexed. The mappings performed are listed in the tables below; characters not listed are not affected by the --diacritic-folding option.
Note that uppercase characters are mapped to lowercase equivalents because Endeca search indexing is always case-folded.
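
The indexing-and-query flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Endeca code: FOLD holds only a handful of entries from the mapping tables, and index_terms and matches are hypothetical helpers, not Dgidx or Dgraph APIs. The sketch also assumes the query is folded the same way as the indexed text, so both sides meet on the unaccented, lowercase form.

```python
# Minimal sketch of diacritic folding plus case folding.
# FOLD is a small, hypothetical subset of the mapping tables; a real
# deployment would cover every listed character.
FOLD = str.maketrans({"É": "e", "é": "e", "Ü": "u", "ü": "u", "Ñ": "n", "ñ": "n"})


def fold(text: str) -> str:
    """Map listed accented characters to ASCII, then lowercase the result."""
    return text.translate(FOLD).lower()


def index_terms(document: str) -> set[str]:
    """Hypothetical indexer: stores only the folded form of each word."""
    return {fold(word) for word in document.split()}


def matches(query: str, terms: set[str]) -> bool:
    """Assumes the query word is folded the same way before lookup."""
    return fold(query) in terms


terms = index_terms("Café Über Jalapeño")
print(sorted(terms))           # ['cafe', 'jalapeno', 'uber']
print(matches("cafe", terms))  # True
print(matches("CAFÉ", terms))  # True
```

Because only the folded form is stored, an accented query and an Anglicized query end up looking up the same index term.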

ISO Latin 1 decimal code | ISO Latin 1 character | ASCII mapping | Description |
---|---|---|---|
192 | À | a | Capital A, grave accent |
193 | Á | a | Capital A, acute accent |
194 | Â | a | Capital A, circumflex accent |
195 | Ã | a | Capital A, tilde |
196 | Ä | a | Capital A, dieresis or umlaut mark |
197 | Å | a | Capital A, ring |
198 | Æ | a | Capital AE diphthong |
199 | Ç | c | Capital C, cedilla |
200 | È | e | Capital E, grave accent |
201 | É | e | Capital E, acute accent |
202 | Ê | e | Capital E, circumflex accent |
203 | Ë | e | Capital E, dieresis or umlaut mark |
204 | Ì | i | Capital I, grave accent |
205 | Í | i | Capital I, acute accent |
206 | Î | i | Capital I, circumflex accent |
207 | Ï | i | Capital I, dieresis or umlaut mark |
208 | Ð | e | Capital Eth, Icelandic |
209 | Ñ | n | Capital N, tilde |
210 | Ò | o | Capital O, grave accent |
211 | Ó | o | Capital O, acute accent |
212 | Ô | o | Capital O, circumflex accent |
213 | Õ | o | Capital O, tilde |
214 | Ö | o | Capital O, dieresis or umlaut mark |
216 | Ø | o | Capital O, slash |
217 | Ù | u | Capital U, grave accent |
218 | Ú | u | Capital U, acute accent |
219 | Û | u | Capital U, circumflex accent |
220 | Ü | u | Capital U, dieresis or umlaut mark |
221 | Ý | y | Capital Y, acute accent |
222 | Þ | p | Capital thorn, Icelandic |
223 | ß | s | Small sharp s, German |
224 | à | a | Small a, grave accent |
225 | á | a | Small a, acute accent |
226 | â | a | Small a, circumflex accent |
227 | ã | a | Small a, tilde |
228 | ä | a | Small a, dieresis or umlaut mark |
229 | å | a | Small a, ring |
230 | æ | a | Small ae diphthong |
231 | ç | c | Small c, cedilla |
232 | è | e | Small e, grave accent |
233 | é | e | Small e, acute accent |
234 | ê | e | Small e, circumflex accent |
235 | ë | e | Small e, dieresis or umlaut mark |
236 | ì | i | Small i, grave accent |
237 | í | i | Small i, acute accent |
238 | î | i | Small i, circumflex accent |
239 | ï | i | Small i, dieresis or umlaut mark |
240 | ð | e | Small eth, Icelandic |
241 | ñ | n | Small n, tilde |
242 | ò | o | Small o, grave accent |
243 | ó | o | Small o, acute accent |
244 | ô | o | Small o, circumflex accent |
245 | õ | o | Small o, tilde |
246 | ö | o | Small o, dieresis or umlaut mark |
248 | ø | o | Small o, slash |
249 | ù | u | Small u, grave accent |
250 | ú | u | Small u, acute accent |
251 | û | u | Small u, circumflex accent |
252 | ü | u | Small u, dieresis or umlaut mark |
253 | ý | y | Small y, acute accent |
254 | þ | p | Small thorn, Icelandic |
255 | ÿ | y | Small y, dieresis or umlaut mark |
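
For testing or preprocessing outside the engine, the Latin-1 rows above can be transcribed into a lookup table. The sketch below is a reader-side reconstruction, not Endeca's internal representation; it encodes the rows as code-point ranges of capital letters and relies on the fact that, in Latin-1, each small letter sits exactly 32 code points above its capital (for example, 192 À and 224 à). The names CAPITAL_RANGES, LATIN1_FOLD, and fold_latin1 are hypothetical.

```python
# Reader-side transcription of the Latin-1 rows of the table above.
# Each range lists capital letters; the matching small letters are at +32.
CAPITAL_RANGES = [
    (192, 198, "a"),  # À-Æ
    (199, 199, "c"),  # Ç
    (200, 203, "e"),  # È-Ë
    (204, 207, "i"),  # Ì-Ï
    (208, 208, "e"),  # Ð (Eth), mapped to e in the table
    (209, 209, "n"),  # Ñ
    (210, 214, "o"),  # Ò-Ö (215, the multiplication sign, is skipped)
    (216, 216, "o"),  # Ø
    (217, 220, "u"),  # Ù-Ü
    (221, 221, "y"),  # Ý
    (222, 222, "p"),  # Þ (thorn), mapped to p in the table
]

LATIN1_FOLD: dict[int, str] = {}
for first, last, target in CAPITAL_RANGES:
    for code in range(first, last + 1):
        LATIN1_FOLD[code] = target       # capital letter
        LATIN1_FOLD[code + 32] = target  # corresponding small letter
LATIN1_FOLD[223] = "s"  # ß has no capital in Latin-1
LATIN1_FOLD[255] = "y"  # ÿ; its capital Ÿ (376) is in Latin Extended-A

# Every code from 192 to 255 is covered except 215 (×) and 247 (÷).
assert set(LATIN1_FOLD) == set(range(192, 256)) - {215, 247}


def fold_latin1(text: str) -> str:
    """Apply the Latin-1 mappings, then lowercase (the index is case-folded)."""
    return text.translate(LATIN1_FOLD).lower()


print(fold_latin1("Crème Brûlée"))  # creme brulee
print(fold_latin1("Straße"))        # strase
```

Because each character in the table maps to exactly one ASCII letter, ß folds to s (not ss) and Æ folds to a, so a word such as Straße would be indexed as strase.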

Latin Extended-A decimal code | Latin Extended-A character | ASCII mapping | Description |
---|---|---|---|
256 | Ā | a | Capital A, macron accent |
257 | ā | a | Small a, macron accent |
258 | Ă | a | Capital A, breve accent |
259 | ă | a | Small a, breve accent |
260 | Ą | a | Capital A, ogonek accent |
261 | ą | a | Small a, ogonek accent |
262 | Ć | c | Capital C, acute accent |
263 | ć | c | Small c, acute accent |
264 | Ĉ | c | Capital C, circumflex accent |
265 | ĉ | c | Small c, circumflex accent |
266 | Ċ | c | Capital C, dot accent |
267 | ċ | c | Small c, dot accent |
268 | Č | c | Capital C, caron accent |
269 | č | c | Small c, caron accent |
270 | Ď | d | Capital D, caron accent |
271 | ď | d | Small d, caron accent |
272 | Đ | d | Capital D, with stroke accent |
273 | đ | d | Small d, with stroke accent |
274 | Ē | e | Capital E, macron accent |
275 | ē | e | Small e, macron accent |
276 | Ĕ | e | Capital E, breve accent |
277 | ĕ | e | Small e, breve accent |
278 | Ė | e | Capital E, dot accent |
279 | ė | e | Small e, dot accent |
280 | Ę | e | Capital E, ogonek accent |
281 | ę | e | Small e, ogonek accent |
282 | Ě | e | Capital E, caron accent |
283 | ě | e | Small e, caron accent |
284 | Ĝ | g | Capital G, circumflex accent |
285 | ĝ | g | Small g, circumflex accent |
286 | Ğ | g | Capital G, breve accent |
287 | ğ | g | Small g, breve accent |
288 | Ġ | g | Capital G, dot accent |
289 | ġ | g | Small g, dot accent |
290 | Ģ | g | Capital G, cedilla accent |
291 | ģ | g | Small g, cedilla accent |
292 | Ĥ | h | Capital H, circumflex accent |
293 | ĥ | h | Small h, circumflex accent |
294 | Ħ | h | Capital H, with stroke accent |
295 | ħ | h | Small h, with stroke accent |
296 | Ĩ | i | Capital I, tilde accent |
297 | ĩ | i | Small i, tilde accent |
298 | Ī | i | Capital I, macron accent |
299 | ī | i | Small i, macron accent |
300 | Ĭ | i | Capital I, breve accent |
301 | ĭ | i | Small i, breve accent |
302 | Į | i | Capital I, ogonek accent |
303 | į | i | Small i, ogonek accent |
304 | İ | i | Capital I, dot accent |
305 | ı | i | Small dotless i |
306 | Ĳ | i | Capital ligature IJ |
307 | ĳ | i | Small ligature ij |
308 | Ĵ | j | Capital J, circumflex accent |
309 | ĵ | j | Small j, circumflex accent |
310 | Ķ | k | Capital K, cedilla accent |
311 | ķ | k | Small k, cedilla accent |
312 | ĸ | k | Small kra |
313 | Ĺ | l | Capital L, acute accent |
314 | ĺ | l | Small l, acute accent |
315 | Ļ | l | Capital L, cedilla accent |
316 | ļ | l | Small l, cedilla accent |
317 | Ľ | l | Capital L, caron accent |
318 | ľ | l | Small l, caron accent |
319 | Ŀ | l | Capital L, middle dot accent |
320 | ŀ | l | Small l, middle dot accent |
321 | Ł | l | Capital L, with stroke accent |
322 | ł | l | Small l, with stroke accent |
323 | Ń | n | Capital N, acute accent |
324 | ń | n | Small n, acute accent |
325 | Ņ | n | Capital N, cedilla accent |
326 | ņ | n | Small n, cedilla accent |
327 | Ň | n | Capital N, caron accent |
328 | ň | n | Small n, caron accent |
329 | ʼn | n | Small n, preceded by apostrophe |
330 | Ŋ | n | Capital Eng |
331 | ŋ | n | Small eng |
332 | Ō | o | Capital O, macron accent |
333 | ō | o | Small o, macron accent |
334 | Ŏ | o | Capital O, breve accent |
335 | ŏ | o | Small o, breve accent |
336 | Ő | o | Capital O, with double acute accent |
337 | ő | o | Small o, with double acute accent |
338 | Œ | o | Capital ligature OE |
339 | œ | o | Small ligature oe |
340 | Ŕ | r | Capital R, acute accent |
341 | ŕ | r | Small r, acute accent |
342 | Ŗ | r | Capital R, cedilla accent |
343 | ŗ | r | Small r, cedilla accent |
344 | Ř | r | Capital R, caron accent |
345 | ř | r | Small r, caron accent |
346 | Ś | s | Capital S, acute accent |
347 | ś | s | Small s, acute accent |
348 | Ŝ | s | Capital S, circumflex accent |
349 | ŝ | s | Small s, circumflex accent |
350 | Ş | s | Capital S, cedilla accent |
351 | ş | s | Small s, cedilla accent |
352 | Š | s | Capital S, caron accent |
353 | š | s | Small s, caron accent |
354 | Ţ | t | Capital T, cedilla accent |
355 | ţ | t | Small t, cedilla accent |
356 | Ť | t | Capital T, caron accent |
357 | ť | t | Small t, caron accent |
358 | Ŧ | t | Capital T, with stroke accent |
359 | ŧ | t | Small t, with stroke accent |
360 | Ũ | u | Capital U, tilde accent |
361 | ũ | u | Small u, tilde accent |
362 | Ū | u | Capital U, macron accent |
363 | ū | u | Small u, macron accent |
364 | Ŭ | u | Capital U, breve accent |
365 | ŭ | u | Small u, breve accent |
366 | Ů | u | Capital U with ring above |
367 | ů | u | Small u with ring above |
368 | Ű | u | Capital U, double acute accent |
369 | ű | u | Small u, double acute accent |
370 | Ų | u | Capital U, ogonek accent |
371 | ų | u | Small u, ogonek accent |
372 | Ŵ | w | Capital W, circumflex accent |
373 | ŵ | w | Small w, circumflex accent |
374 | Ŷ | y | Capital Y, circumflex accent |
375 | ŷ | y | Small y, circumflex accent |
376 | Ÿ | y | Capital Y, diaeresis accent |
377 | Ź | z | Capital Z, acute accent |
378 | ź | z | Small z, acute accent |
379 | Ż | z | Capital Z, dot accent |
380 | ż | z | Small z, dot accent |
381 | Ž | z | Capital Z, caron accent |
382 | ž | z | Small z, caron accent |
383 | ſ | s | Small long s |
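
The Latin Extended-A rows above cover the contiguous range 256 to 383 and fall into runs that share one ASCII target, so they can be transcribed as (first code, last code, target) triples. The sketch below is a hypothetical reader-side reconstruction for testing or preprocessing; EXTENDED_A_RANGES, EXTENDED_A_FOLD, and fold_extended_a are illustrative names, not part of any Endeca API.

```python
# Reader-side transcription of the Latin Extended-A table above:
# (first code, last code, ASCII target), inclusive ranges.
EXTENDED_A_RANGES = [
    (256, 261, "a"), (262, 269, "c"), (270, 273, "d"), (274, 283, "e"),
    (284, 291, "g"), (292, 295, "h"), (296, 307, "i"), (308, 309, "j"),
    (310, 312, "k"), (313, 322, "l"), (323, 331, "n"), (332, 339, "o"),
    (340, 345, "r"), (346, 353, "s"), (354, 359, "t"), (360, 371, "u"),
    (372, 373, "w"), (374, 376, "y"), (377, 382, "z"), (383, 383, "s"),
]

# Expand the ranges into a code-point -> replacement map for str.translate.
EXTENDED_A_FOLD: dict[int, str] = {
    code: target
    for first, last, target in EXTENDED_A_RANGES
    for code in range(first, last + 1)
}

# The ranges are contiguous, so all 128 code points 256-383 are covered.
assert set(EXTENDED_A_FOLD) == set(range(256, 384))


def fold_extended_a(text: str) -> str:
    """Apply the Latin Extended-A mappings, then lowercase the result."""
    return text.translate(EXTENDED_A_FOLD).lower()


print(fold_extended_a("Škoda"))  # skoda
print(fold_extended_a("Œuvre"))  # ouvre
```

A generic approach, such as removing combining marks after Unicode NFKD decomposition, would not reproduce this table: characters such as Ł, Đ, and ĸ have no Unicode decomposition and would pass through unchanged, and the ligature Ĳ would decompose to two letters rather than the single i listed above. An explicit table keeps preprocessing consistent with the documented mappings.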