Character Codes

Latin 1 Characters: This chart shows the effects of numeric ampersand entities on your browser. To use these characters in your own HTML files, put the appropriate number into &#__; (e.g. "&#163;" for the British pound currency sign), or, for the 8-bit alphabetic characters, use the alternative standard HTML 2.0 entity shown in parentheses on the right. (These are the only non-numeric character entities defined in HTML 2.0, except for "&amp;", "&lt;", and "&gt;", which should be used to escape the characters & < > in an HTML file, and "&quot;", which escapes a double-quote character in an attribute value.)
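The two rules above (escape the reserved characters, write anything else as `&#N;`) can be sketched in a few lines of Python. This is a minimal illustration, not a full HTML encoder; `escape_reserved` and `numeric_ref` are hypothetical helper names.

```python
# Hypothetical helpers illustrating HTML 2.0-style escaping.
RESERVED = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}

def escape_reserved(text: str) -> str:
    """Replace & < > " with their named entities; leave every other character alone."""
    return "".join(RESERVED.get(ch, ch) for ch in text)

def numeric_ref(ch: str) -> str:
    """Return the decimal numeric character reference for a single character."""
    return f"&#{ord(ch)};"

print(escape_reserved('a < b & "c"'))  # a &lt; b &amp; &quot;c&quot;
print(numeric_ref("£"))                 # &#163;
```

Mapping character by character avoids the classic bug of escaping "&" after the other replacements and double-escaping the new entities.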

If the right column looks the same as the left column, you're losing the eighth bit somewhere. If the characters in the right column don't match their descriptions, then your browser is translating incorrectly between ISO 8859-1 Latin 1 and your platform's native character set.
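Why a lost eighth bit makes the right column repeat the left one: each right-hand code is exactly 128 more than its left-hand partner, so clearing bit 7 maps 160-255 onto 32-127. A small Python sketch, assuming the characters are handled as ISO 8859-1 code points (`strip_eighth_bit` is a hypothetical name):

```python
def strip_eighth_bit(ch: str) -> str:
    """Simulate a 7-bit channel: keep only the low seven bits of a Latin 1 character."""
    code = ord(ch)
    if code > 255:
        raise ValueError("not an ISO 8859-1 character")
    return chr(code & 0x7F)

# Right-column characters collapse onto their left-column partners.
for ch in "£©é":
    print(ord(ch), ch, "->", ord(strip_eighth_bit(ch)), strip_eighth_bit(ch))
```

This prints `163 £ -> 35 #`, `169 © -> 41 )`, and `233 é -> 105 i`, matching the rows of the chart below.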

Finally, note that positions 127-159 are not displayable characters in ISO 8859-1 Latin 1 and are not part of any HTML standard, so HTML code such as "&#153;" is incorrect and will be displayed differently in browsers on different platforms (probably often in ways that you did not intend). See the Unicode chart below for the (future) correct way of displaying the characters found in positions 130-159 in Microsoft Windows, including such typographical niceties as "curly" quotes, dashes, ellipses, and the trademark symbol.

The following chart only tests the ISO 8859-1 compliance of your browser's non-proportional font.

 32      160     Non-breaking space
 33  !   161  ¡  Inverted exclamation
 34  "   162  ¢  Cent sign
 35  #   163  £  Pound sterling
 36  $   164  ¤  General currency sign
 37  %   165  ¥  Yen sign
 38  &   166  ¦  Broken vertical bar
 39  '   167  §  Section sign
 40  (   168  ¨  Umlaut (dieresis)
 41  )   169  ©  Copyright
 42  *   170  ª  Feminine ordinal
 43  +   171  «  Left angle quote, guillemotleft
 44  ,   172  ¬  Not sign
 45  -   173  ­  Soft hyphen
 46  .   174  ®  Registered trademark
 47  /   175  ¯  Macron accent
 48  0   176  °  Degree sign
 49  1   177  ±  Plus or minus
 50  2   178  ²  Superscript two
 51  3   179  ³  Superscript three
 52  4   180  ´  Acute accent
 53  5   181  µ  Micro sign
 54  6   182  ¶  Paragraph sign
 55  7   183  ·  Middle dot
 56  8   184  ¸  Cedilla
 57  9   185  ¹  Superscript one
 58  :   186  º  Masculine ordinal
 59  ;   187  »  Right angle quote, guillemotright
 60  <   188  ¼  Fraction one-fourth
 61  =   189  ½  Fraction one-half
 62  >   190  ¾  Fraction three-fourths
 63  ?   191  ¿  Inverted question mark
 64  @   192  À  Capital A, grave accent ("&Agrave;")
 65  A   193  Á  Capital A, acute accent ("&Aacute;")
 66  B   194  Â  Capital A, circumflex accent ("&Acirc;")
 67  C   195  Ã  Capital A, tilde ("&Atilde;")
 68  D   196  Ä  Capital A, dieresis or umlaut mark ("&Auml;")
 69  E   197  Å  Capital A, ring ("&Aring;")
 70  F   198  Æ  Capital AE diphthong (ligature) ("&AElig;")
 71  G   199  Ç  Capital C, cedilla ("&Ccedil;")
 72  H   200  È  Capital E, grave accent ("&Egrave;")
 73  I   201  É  Capital E, acute accent ("&Eacute;")
 74  J   202  Ê  Capital E, circumflex accent ("&Ecirc;")
 75  K   203  Ë  Capital E, dieresis or umlaut mark ("&Euml;")
 76  L   204  Ì  Capital I, grave accent ("&Igrave;")
 77  M   205  Í  Capital I, acute accent ("&Iacute;")
 78  N   206  Î  Capital I, circumflex accent ("&Icirc;")
 79  O   207  Ï  Capital I, dieresis or umlaut mark ("&Iuml;")
 80  P   208  Ð  Capital Eth, Icelandic ("&ETH;")
 81  Q   209  Ñ  Capital N, tilde ("&Ntilde;")
 82  R   210  Ò  Capital O, grave accent ("&Ograve;")
 83  S   211  Ó  Capital O, acute accent ("&Oacute;")
 84  T   212  Ô  Capital O, circumflex accent ("&Ocirc;")
 85  U   213  Õ  Capital O, tilde ("&Otilde;")
 86  V   214  Ö  Capital O, dieresis or umlaut mark ("&Ouml;")
 87  W   215  ×  Multiply sign
 88  X   216  Ø  Capital O, slash ("&Oslash;")
 89  Y   217  Ù  Capital U, grave accent ("&Ugrave;")
 90  Z   218  Ú  Capital U, acute accent ("&Uacute;")
 91  [   219  Û  Capital U, circumflex accent ("&Ucirc;")
 92  \   220  Ü  Capital U, dieresis or umlaut mark ("&Uuml;")
 93  ]   221  Ý  Capital Y, acute accent ("&Yacute;")
 94  ^   222  Þ  Capital THORN, Icelandic ("&THORN;")
 95  _   223  ß  Small sharp s, German (sz ligature) ("&szlig;")
 96  `   224  à  Small a, grave accent ("&agrave;")
 97  a   225  á  Small a, acute accent ("&aacute;")
 98  b   226  â  Small a, circumflex accent ("&acirc;")
 99  c   227  ã  Small a, tilde ("&atilde;")
100  d   228  ä  Small a, dieresis or umlaut mark ("&auml;")
101  e   229  å  Small a, ring ("&aring;")
102  f   230  æ  Small ae diphthong (ligature) ("&aelig;")
103  g   231  ç  Small c, cedilla ("&ccedil;")
104  h   232  è  Small e, grave accent ("&egrave;")
105  i   233  é  Small e, acute accent ("&eacute;")
106  j   234  ê  Small e, circumflex accent ("&ecirc;")
107  k   235  ë  Small e, dieresis or umlaut mark ("&euml;")
108  l   236  ì  Small i, grave accent ("&igrave;")
109  m   237  í  Small i, acute accent ("&iacute;")
110  n   238  î  Small i, circumflex accent ("&icirc;")
111  o   239  ï  Small i, dieresis or umlaut mark ("&iuml;")
112  p   240  ð  Small eth, Icelandic ("&eth;")
113  q   241  ñ  Small n, tilde ("&ntilde;")
114  r   242  ò  Small o, grave accent ("&ograve;")
115  s   243  ó  Small o, acute accent ("&oacute;")
116  t   244  ô  Small o, circumflex accent ("&ocirc;")
117  u   245  õ  Small o, tilde ("&otilde;")
118  v   246  ö  Small o, dieresis or umlaut mark ("&ouml;")
119  w   247  ÷  Division sign
120  x   248  ø  Small o, slash ("&oslash;")
121  y   249  ù  Small u, grave accent ("&ugrave;")
122  z   250  ú  Small u, acute accent ("&uacute;")
123  {   251  û  Small u, circumflex accent ("&ucirc;")
124  |   252  ü  Small u, dieresis or umlaut mark ("&uuml;")
125  }   253  ý  Small y, acute accent ("&yacute;")
126  ~   254  þ  Small thorn, Icelandic ("&thorn;")
         255  ÿ  Small y, dieresis or umlaut mark ("&yuml;")

Unicode: The correct way to display "smart quotes", the trademark symbol, etc.

Some commonly desired characters, such as the trademark symbol, as well as such typographical niceties as "curly" quotes, dashes, and ellipses, are not part of the ISO 8859-1 character set, and so cannot be displayed properly in HTML 2.0. If you put a raw 8-bit character in your file and intend it to be understood with a non-ISO 8859-1 meaning, or put a numeric entity reference between 128 and 159 there (such as "&#153;"), then this is incorrect HTML. It will not display as you intended in browsers on other platforms, and maybe not even in other browsers on the same platform, even when it "looks right" in your own browser.
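A rough Python sketch of a checker for the mistake described above: it scans HTML source for decimal numeric references and reports any that fall in the invalid 128-159 range. The regular expression is an assumption that covers only the `&#N;` form; it is not a full HTML parser, and `find_c1_refs` is a hypothetical name.

```python
import re

# Matches decimal numeric character references such as &#153;
NUMERIC_REF = re.compile(r"&#([0-9]+);")

def find_c1_refs(html: str) -> list[tuple[str, int]]:
    """Return (reference, code) pairs whose code lies in the invalid 128-159 range."""
    bad = []
    for m in NUMERIC_REF.finditer(html):
        code = int(m.group(1))
        if 128 <= code <= 159:
            bad.append((m.group(0), code))
    return bad

print(find_c1_refs("Brand&#153; costs &#163;5 and it&#146;s"))
# [('&#153;', 153), ('&#146;', 146)]
```

The reference `&#163;` (pound sign) is a valid Latin 1 position, so it is not reported.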

One correct way to specify such characters in more recent versions of HTML (starting with the "Cougar" proposal, now superseded by the proposed HTML 4.0 standard, and/or "internationalized HTML" as specified in RFC 2070) is to use numeric entities greater than 255, which refer to positions in the Unicode character set, as outlined in the Usenet posting below. Unfortunately, these are only beginning to be implemented in some newer browser versions, but they will become more widely implemented in the future. (You can see whether your own browser understands these entities by looking at the third column of the table below.)
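Because these references simply name Unicode code points in decimal, it is easy to check what a given number stands for. A small Python sketch using the standard `html` and `unicodedata` modules; `describe_ref` is a hypothetical helper. Note that `html.unescape` follows later HTML5 parsing rules, which quietly map references in the 128-159 range to Windows characters as error recovery, so it should not be used to judge whether a reference is valid.

```python
import html
import unicodedata

def describe_ref(ref: str) -> str:
    """Decode one numeric reference such as '&#8482;' and name the Unicode character."""
    ch = html.unescape(ref)
    return f"{ref} = U+{ord(ch):04X} {unicodedata.name(ch)}"

for ref in ("&#8482;", "&#8220;", "&#8230;"):
    print(describe_ref(ref))
# &#8482; = U+2122 TRADE MARK SIGN
# &#8220; = U+201C LEFT DOUBLE QUOTATION MARK
# &#8230; = U+2026 HORIZONTAL ELLIPSIS
```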

(See also http://www.w3.org/pub/WWW/TR/WD-entities (from the "Cougar" draft) or http://www.w3.org/TR/WD-html40-970708/sgml/HTMLmisc.ent (HTML 4.0) for relevant entity lists in the proposed HTML standards.)

[Question: Is &#146; valid HTML or not?]

The characters 128-159 are not used for printable characters in ISO 8859-1 and Unicode, the character sets of HTML. MS-Windows uses a superset of ANSI/ISO 8859-1, known to experts as "Code Page 1252 (CP1252)", a Microsoft-specific character set with additional characters in the 128-159 range (also known as the "C1" range).

All the CP1252 characters are also available in Unicode. For example, the CP1252 character 146 that you mentioned (RIGHT SINGLE QUOTATION MARK) has the Unicode number 8217, so you should use this number (&#8217;) in order to conform to the HTML standard. Modern HTML browsers like Netscape 4.0 understand Unicode and will automatically convert the Unicode character &#8217; back into the character 146 on MS-Windows machines, and into the appropriate character on other systems.
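The lookup the posting describes (CP1252 code in, Unicode number out) can be sketched with Python's built-in `cp1252` codec: decode the single Windows byte and take the resulting code point. This assumes the codec agrees with the official mapping for the characters in question; `cp1252_to_unicode` is a hypothetical name.

```python
def cp1252_to_unicode(code: int) -> int:
    """Map a Windows-1252 byte value (0-255) to its Unicode code point.

    Raises UnicodeDecodeError for positions that CP1252 leaves undefined.
    """
    return ord(bytes([code]).decode("cp1252"))

print(cp1252_to_unicode(146))          # 8217, RIGHT SINGLE QUOTATION MARK
print(f"&#{cp1252_to_unicode(153)};")  # &#8482;
```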

The official CP1252<->Unicode conversion table is printed, for instance, in the Unicode 2.0 standard, and is available online in the file ucs-map-cp1252. [See also the file ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT at the official Unicode site.]
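A minimal Python parser for a mapping file of this kind, assuming each data line holds a hexadecimal source code, whitespace, a hexadecimal Unicode value, and an optional `#` comment, with undefined positions lacking the second value. `parse_mapping` is a hypothetical name, and the `SAMPLE` text is a short hand-written illustration of that layout, not a copy of the real file.

```python
def parse_mapping(text: str) -> dict[int, int]:
    """Parse 'source<TAB>unicode<TAB>#comment' lines into {source: unicode}."""
    table = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].split()  # drop the comment, then split on whitespace
        if len(data) < 2:
            continue  # blank line, comment-only line, or undefined position
        table[int(data[0], 16)] = int(data[1], 16)
    return table

SAMPLE = """\
# illustrative sample in the mapping-file layout (abbreviated)
0x81\t\t#UNDEFINED
0x92\t0x2019\t#RIGHT SINGLE QUOTATION MARK
0x99\t0x2122\t#TRADE MARK SIGN
"""

print(parse_mapping(SAMPLE))  # {146: 8217, 153: 8482}
```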

The CP1252 characters that are not part of ANSI/ISO 8859-1, and that should therefore always be encoded as Unicode characters greater than 255, are the following:

 Windows   Unicode    Char.
  char.   HTML code   test         Description of Character
  -----     -----     ---          ------------------------
ALT-0130   &#8218;   ‚    Single Low-9 Quotation Mark
ALT-0131   &#402;    ƒ    Latin Small Letter F With Hook
ALT-0132   &#8222;   „    Double Low-9 Quotation Mark
ALT-0133   &#8230;   …    Horizontal Ellipsis
ALT-0134   &#8224;   †    Dagger
ALT-0135   &#8225;   ‡    Double Dagger
ALT-0136   &#710;    ˆ    Modifier Letter Circumflex Accent
ALT-0137   &#8240;   ‰    Per Mille Sign
ALT-0138   &#352;    Š    Latin Capital Letter S With Caron
ALT-0139   &#8249;   ‹    Single Left-Pointing Angle Quotation Mark
ALT-0140   &#338;    Œ    Latin Capital Ligature OE
ALT-0145   &#8216;   ‘    Left Single Quotation Mark
ALT-0146   &#8217;   ’    Right Single Quotation Mark
ALT-0147   &#8220;   “    Left Double Quotation Mark
ALT-0148   &#8221;   ”    Right Double Quotation Mark
ALT-0149   &#8226;   •    Bullet
ALT-0150   &#8211;   –    En Dash
ALT-0151   &#8212;   —    Em Dash
ALT-0152   &#732;    ˜    Small Tilde
ALT-0153   &#8482;   ™    Trade Mark Sign
ALT-0154   &#353;    š    Latin Small Letter S With Caron
ALT-0155   &#8250;   ›    Single Right-Pointing Angle Quotation Mark
ALT-0156   &#339;    œ    Latin Small Ligature OE
ALT-0159   &#376;    Ÿ    Latin Capital Letter Y With Diaeresis
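To apply the table to existing pages, a hedged Python sketch can rewrite CP1252-style references such as `&#146;` into the Unicode references listed above, using the built-in `cp1252` codec rather than a hard-coded table. Python's codec follows the current Windows-1252 mapping, which also assigns 128 (euro sign), 142, and 158; references to positions the codec leaves undefined (129, 141, 143, 144, 157) are left untouched for manual review. `fix_cp1252_refs` is a hypothetical name.

```python
import re

# Matches decimal numeric character references such as &#146;
NUMERIC_REF = re.compile(r"&#([0-9]+);")

def fix_cp1252_refs(html: str) -> str:
    """Replace &#128;-&#159; references with the matching Unicode references."""
    def repl(m: re.Match) -> str:
        code = int(m.group(1))
        if not 128 <= code <= 159:
            return m.group(0)  # valid Latin 1 or Unicode reference: keep it
        try:
            ch = bytes([code]).decode("cp1252")
        except UnicodeDecodeError:
            return m.group(0)  # position undefined in cp1252: leave for manual review
        return f"&#{ord(ch)};"
    return NUMERIC_REF.sub(repl, html)

print(fix_cp1252_refs("It&#146;s a &#147;test&#148;&#133; &#153; &#163;"))
# It&#8217;s a &#8220;test&#8221;&#8230; &#8482; &#163;
```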


© 1998 - 2024 psacake.com