No studying? Damn! Next thing they'll tell me is I'll have to eat jelly doughnuts or sleep with a supermodel to get things done around here. I ask you, how much can one man give?

Xander, 'Conversations with Dead People'


Buffistechnology 2: You Made Her So She Growls?  

Got a question about technology? Ask it here. Discussion of hardware, software, TiVos, multi-region DVDs, Windows, Macs, LINUX, hand-helds, iPods, anything tech related. Better than any helpdesk!


tommyrot - Oct 12, 2005 5:56:50 am PDT #4943 of 10003
Sir, it's not an offence to let your cat eat your bacon. Okay? And we don't arrest cats, I'm very sorry.

A Unicode character has two bytes, as opposed to one byte for each ASCII character. This allows many more possible Unicode characters (256²?), in order to accommodate the many characters of the many languages of the world.

That's all I know.


Tom Scola - Oct 12, 2005 6:04:17 am PDT #4944 of 10003
Remember that the frontier of the Rebellion is everywhere. And even the smallest act of insurrection pushes our lines forward.

Very basically, since computers fundamentally deal with (binary) numbers, they have had to assign a number to each of the characters on the keyboard so that they can process them.
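
If you want to see those numbers for yourself, here's a minimal Python sketch (the sample word is just something I picked for illustration). `ord()` gives the number behind a character and `chr()` goes the other way:

```python
# Every character is stored as a number; ord() and chr() convert between the two.
text = "Buffy"

codes = [ord(ch) for ch in text]
print(codes)                             # the numbers behind each letter

round_trip = "".join(chr(n) for n in codes)
print(round_trip)                        # and back to letters again

print(format(ord("A"), "08b"))           # the letter 'A' written out as binary digits
```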

Back in the Dark Ages of computing, different computer manufacturers used different sets of numbers to encode letters. In the US, they finally put an end to this madness by specifying a standard encoding for all the letters and symbols, called ASCII.

But when computers around the world started connecting to each other on the Internet, another problem became apparent: computers in different countries each used their own encoding for their own alphabets, even if their alphabets used the same symbols.

Unicode is an attempt to provide a single encoding for all the alphabet systems in the world. It's a really big, complicated subject.


Tom Scola - Oct 12, 2005 6:06:29 am PDT #4945 of 10003
Remember that the frontier of the Rebellion is everywhere. And even the smallest act of insurrection pushes our lines forward.

A Unicode character has two bytes, as opposed to one byte for each ASCII character. This allows many more possible Unicode characters (256²?), in order to accommodate the many characters of the many languages of the world.

Unicode characters can be as long as 21 bits (the highest code point is U+10FFFF). [link]
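
For a rough sense of the sizes involved, here's a small Python sketch. It assumes the ceiling of U+10FFFF that the current standard sets for code points, and the four sample characters are my own picks. It prints how many bits each code point needs and how many bytes the character takes once it's actually stored:

```python
# Highest code point the Unicode standard allows (assumption: current standard).
MAX_CODE_POINT = 0x10FFFF
bits_needed = MAX_CODE_POINT.bit_length()
print("bits for the largest code point:", bits_needed)

# Sample characters: A, e-acute, euro sign, musical G clef (outside the first 65,536).
samples = ["A", "\u00e9", "\u20ac", "\U0001D11E"]
for ch in samples:
    cp = ord(ch)
    print(
        f"U+{cp:04X}",
        cp.bit_length(), "bits;",
        len(ch.encode("utf-8")), "bytes in UTF-8;",
        len(ch.encode("utf-16-le")), "bytes in UTF-16",
    )
```

So "two bytes per character" holds for a lot of text in UTF-16, but not for everything, and not at all for UTF-8.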


tommyrot - Oct 12, 2005 6:08:41 am PDT #4946 of 10003
Sir, it's not an offence to let your cat eat your bacon. Okay? And we don't arrest cats, I'm very sorry.

Unicode characters can be as long as 21 bits.

Oh.

Now I know I knew less than I thought I knew.


Steph L. - Oct 12, 2005 6:10:40 am PDT #4947 of 10003
this mess was yours / now your mess is mine

Unicode is an attempt to provide a single encoding for all the alphabet systems in the world

How does/would/will that affect typesetting? We're switching to a paperless system of editing (which might be the straw that breaks my back and gets me to quit), and my boss keeps saying "At the seminar, they said Unicode was important -- do we *have* Unicode? Or do we need to tell the authors *they* need Unicode?"

She has no idea what it is, and seems to think I should.

Basically, we're going to give ourselves carpal tunnel syndrome and eyestrain by editing everything on the computer, in Word. Then the Word files get dumped into Quark (or InDesign, because we might as well change EVERYTHING ALL AT ONCE AND OH NO THAT WON'T CAUSE ANY SNAGS NOT ONE BIT) for layout.

How does Unicode come into play there? Or does it even?


Tom Scola - Oct 12, 2005 6:11:29 am PDT #4948 of 10003
Remember that the frontier of the Rebellion is everywhere. And even the smallest act of insurrection pushes our lines forward.

The very, very short answer to your question, Steph:

Whatever tool you're using to edit your documents ought to give you the option to save your documents using a Unicode encoding. If you're using XML, they're probably already Unicode to begin with.

There are several different ways of storing Unicode files; the most typical ones are UTF-8 and UTF-16. It probably doesn't matter which one you pick, as long as each stage of your workflow is aware of which encoding you're using.
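
To make the "each stage has to agree" part concrete, here's a minimal Python sketch. The sample text is made up, and Windows-1252 (`cp1252`) just stands in for "a stage that guessed wrong":

```python
# The same text stored two ways.
text = "na\u00efve caf\u00e9"          # "naive cafe" with the accents in place

as_utf8 = text.encode("utf-8")
as_utf16 = text.encode("utf-16")       # Python adds a 2-byte byte-order mark up front
print(len(as_utf8), "bytes as UTF-8,", len(as_utf16), "bytes as UTF-16")

# Reader and writer agree: the text survives the round trip.
print(as_utf8.decode("utf-8") == text)
print(as_utf16.decode("utf-16") == text)

# A stage that assumes the wrong encoding gets garbage for the accented letters.
garbled = as_utf8.decode("cp1252")
print(garbled)
```

Either encoding is fine on its own. The trouble only shows up in that last step, where the file was written one way and read another.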


Steph L. - Oct 12, 2005 6:12:21 am PDT #4949 of 10003
this mess was yours / now your mess is mine

Whatever tool you're using to edit your documents ought to give you the option to save your documents using a Unicode encoding. If you're using XML, they're probably already Unicode to begin with.

Um. So a Word document has the option of being saved as Unicode?


Tom Scola - Oct 12, 2005 6:16:52 am PDT #4950 of 10003
Remember that the frontier of the Rebellion is everywhere. And even the smallest act of insurrection pushes our lines forward.

So a Word document has the option of being saved as Unicode?

If you're saving it as a .doc file, then no. But your tools that read .doc files ought to be able to make sure that all the characters are encoded properly.

If you're saving it as a .txt file, then Unicode would be one of the options for text files.
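
Outside of Word, the plain-text case looks roughly like this in Python. This is only a sketch: the file name `sample.txt` and the sample text are made up, and it writes to a temporary folder so nothing is left behind:

```python
import os
import tempfile

# Text with characters that plain ASCII can't hold: accents, a pound sign, curly quotes.
text = "R\u00e9sum\u00e9 \u00a310 \u201cquoted\u201d"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")   # hypothetical file name

    # Say which encoding to use when saving...
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # ...and use the same one when reading it back.
    with open(path, encoding="utf-8") as f:
        read_back = f.read()

print(read_back == text)
```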

If you're saving it as a WordML (XML) file, then you get Unicode for free.


Rob - Oct 12, 2005 6:17:04 am PDT #4951 of 10003

Unicode is one of many ways of assigning a numeric value to each letter.

ASCII was one of the first, but it is limited to 128 values and thus had no room for anything but the basic Roman alphabet, numbers, punctuation and a number of extra values used to make printing terminals go bing.
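
A quick Python sketch of that 128-value limit (the sample strings are mine):

```python
# The "go bing" value: ASCII's BEL control code.
bell = ord("\a")
print(bell)

# Plain English text fits in ASCII's 0-127 range...
plain_fits = all(ord(ch) < 128 for ch in "Hello, world!")
print(plain_fits)

# ...but an accented letter does not, so encoding it as ASCII fails.
try:
    "caf\u00e9".encode("ascii")
    accented_fits = True
except UnicodeEncodeError:
    accented_fits = False
print(accented_fits)
```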

A little while later it became common to use 256 values to represent letters, allowing accented characters and a few things like dashes to be included. Sadly, Apple and Microsoft invented two different sets of values for the same characters. Also, many different countries had different sets. For example, the set for Mac in Hungary isn't the same as the set in the United States.
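
Here's a small Python sketch of that mismatch, using Python's built-in `mac_roman` and `cp1252` codecs as stand-ins for the US Mac and Windows sets. The choice of sample letters is mine:

```python
# The same letter, e-acute, gets a different byte value in each 256-value set.
ch = "\u00e9"
mac_bytes = ch.encode("mac_roman")     # classic US Mac set
win_bytes = ch.encode("cp1252")        # Western European Windows set
print(mac_bytes.hex(), win_bytes.hex())

# Read the Mac byte as if it came from Windows and you get a different letter.
misread = mac_bytes.decode("cp1252")
print(misread)

# The Hungarian letter o-with-double-acute has no slot in the US Mac set at all.
try:
    "\u0151".encode("mac_roman")
    hungarian_fits = True
except UnicodeEncodeError:
    hungarian_fits = False
print(hungarian_fits)
```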

Unicode was invented to be a universal, comprehensive assignment of values to letters. It initially had 65,536 values, but has since expanded to a much larger set (a little over 1.1 million possible values). It can express the letters of nearly every language you can imagine. Even Klingon has an unofficial private-use mapping, though it isn't part of the standard itself. It's the default for most text files on the Mac.

There are about a million more details to how Unicode works that might affect typesetting, but I'm afraid describing them would make your head hurt. Is there something specific you need to know or do?


Tom Scola - Oct 12, 2005 6:19:52 am PDT #4952 of 10003
Remember that the frontier of the Rebellion is everywhere. And even the smallest act of insurrection pushes our lines forward.

World alphabets covered by the Unicode standard: [link]