

English-language version of Luistxo Fernandez's blog

Tim Foster's blog at Sun Microsystems: Thoughts on language, translation and tools

Luistxo Fernandez 2004/06/15 13:05

Sharing translation resources, TM tools... interesting topics. A search at Bloglines.com led me to this blog, written by Tim Foster at Sun. I have now subscribed to it from my Bloglines account (although the code of the RSS feed looks odd at Bloglines, so there must be some error). I suppose the Tumatxa project (sharing translation memories on the web) could be of interest to him. At our small Basque company we may be working in the same field, but we're certainly on a very different scale.

Trackback URLs not to be clicked

moblog 2004/06/10 15:22

A friend posted to a list that he would like Googlebot not to click on the trackback URLs offered by his blog. I think that trackback URLs are not to be clicked, neither by humans nor by robots. They are just there for bloggers to copy and paste into their own posts. In this Coreblog of mine I edited the templates so that the TBping URL appears in a "text input" field. I find it easier to copy a line of text when it's in an input field like that.
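The idea of showing the trackback URL as copyable text instead of a link can be sketched as follows. This is only an illustration in Python, not the actual Coreblog (Zope) template; the function name `trackback_field` and the example.org URL are assumptions made up for the sketch.

```python
from html import escape


def trackback_field(ping_url: str) -> str:
    """Return an HTML snippet that shows the trackback URL as text in an input.

    There is no <a href=...> here, so neither crawlers nor readers have
    anything to click; the value can only be selected and copied.
    """
    safe = escape(ping_url, quote=True)  # protect &, <, > and quotes
    return (
        "<label>Trackback URL: "
        f'<input type="text" readonly size="60" value="{safe}">'
        "</label>"
    )


# Hypothetical ping URL, used only for illustration.
print(trackback_field("http://example.org/blog/42/tbping"))
```

The `readonly` attribute keeps the text selectable while preventing accidental edits.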

Gmane discussion about RSS feeds for mailing lists

Luistxo Fernandez 2004/05/27 15:48

Gmane. A great resource. They archive mailing lists and offer that content as searchable web pages or Usenet newsgroups. Its creator, Lars, is weighing the possibility of also offering RSS feeds, but he doesn't seem very convinced... After seeing how another list archiving system, Mailbucket, provides feeds (example feed for list ASRG ), Lars from Gmane says:

> Hm. I had imagined presenting excerpts or something, but these are full mails. Does anybody really read mail in this manner? It seems really... odd.

My own opinion was posted at Gmane.discuss, and here it goes:

Yahoo provides the first 100 characters (quoted text left aside) of Yahoo Group messages in its almost hidden RSS feed. Check this example for instance.
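An excerpt of that kind is easy to produce. The sketch below is a guess at the general technique, not Yahoo's actual code: it assumes quoted text is any line starting with ">", and the function name `feed_excerpt` and the sample message are invented for illustration.

```python
def feed_excerpt(body: str, limit: int = 100) -> str:
    """Drop quoted lines (starting with '>') and keep the first `limit` characters."""
    kept = [
        line.strip()
        for line in body.splitlines()
        if line.strip() and not line.lstrip().startswith(">")
    ]
    return " ".join(kept)[:limit]


# Hypothetical list message: one quoted line, one line of new text.
message = (
    "> Does anybody really read mail in this manner?\n"
    "Yes, some of us do read lists in a feed reader.\n"
)
print(feed_excerpt(message))
```

Leaving the quoted lines out means the excerpt shows what the poster actually added, which is the point of the Yahoo approach.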

Mail-archive.com offers no content in its feed, just the title line, as in this example for coreblog-en. That is a Mailman list that is also being archived at Gmane.

I feel the YG feed is better than Mailbucket's, and that Mailbucket's is better than Mail-archive.com's.

However, the feeds provided by Google Groups 2 Beta are the best.

Each Google Groups 2 Beta group has two feeds, one per message, the other per thread. I would love Gmane to provide such a double feed. As a matter of fact, our Basque Mailman mailing list is being channeled to Gmane ( here ), and it is also being stored at Google Groups 2 Beta, just to provide another channel for potential readers:

This way, thanks to Mailman, Gmane and Google Groups 2 Beta, we offer potential users all available channels from our archive page.

And yes, there are people reading mailing lists through feed readers...

bye bye Movable Type, hello Coreblog

moblog 2004/05/17 09:14

The big affair with Movable Type, its new licenses and pricing structure, has ignited a debate on the Internet. Many are weighing alternatives to MT in order to migrate their blogs to a free, open environment... I think it could be a good moment to publicize Coreblog.

Here are a couple of comparison tables for people willing to leave MT: one , two

Coreblog doesn't appear in those comparisons. Well, I leave this point here for comment... Maybe Atsushi at Webcore will take note.

If Coreblog had more users, we'd have a stronger community, and a stronger product as well.

Ten Commandments for bilingual blogs

Luistxo Fernandez 2004/04/29 17:02

Is bilingual blogging possible?

Yes, and there are sites out there. But the truly coherent, consistent, bilingual blog... I just haven't found it. My blog isn't that kind of blog either. I have written a list of features (Ten Commandments) with which true multilingual blogs should comply.

My own blog is, surprise, quite compliant. Well, this list reflects personal viewpoints, so no one should feel disappointed by the superiority of standards shown by The English Cemetery: it's a biased commandment list. However, I fail miserably with commandments 6 and 7, so far. Regarding 4, my blog could also behave better (and it can, using the power of Zope and Localizer, but I'm just too lazy right now).

So, the good, true and faithful bilingual blog should have:

  1. Language change. There should be a button, link, or pulldown menu to click or select, present on every page. That's the way to switch from reading content in one language to the other in a bilingual blog. Mixtures of languages on individual pages are not OK. Language change behaviour can vary: the distinction between symmetrical and asymmetrical blogs that I describe here is a key issue.
  2. Monolingual entry page. The main page appears to you in a given language, coherently English or coherently Basque. Then you may opt to change language. The first page to appear may be set by default or, perhaps, depend on browser settings. This commandment rules out the usual mixture of Lang1 / Lang2 messages on the first page, ordered by pure chronology, as well as the very curious double-horizontal layout of several sites.
  3. Interface as well as content bilingualism. You are reading a Basque post, so you can click on the Erantzunak link, if you know what it means. You are reading an English post, so you can click on the Comments link. Interface bilingualism should be bilingualism, not double strings. No "Erantzunak / Comments" links. I don't like the redundancy at these en-fr or en-de sites. Messages should be in one language or the other, depending on the content or language category of the post you are reading, or on the action taken by the user when clicking the language change option.
  4. Interface string localisation capabilities. Not just single terms, but locale-sensitive logic such as date formats (and dates are important when blogging) should be localised in each language. In XML feeds, date formats should be standard.
  5. No double reading work. These people, for instance, translate every post, so they explain things twice, once in A, again in B ( Transblawg , 0909 ). Such a blog could work with the symmetrical model described in a previous post . One may feel the need to say the same things in several languages, of course, but what about the reader? I can only understand that as an attempt at second-language or translation teaching for your readers. Separate messages make discussion or commenting consistent as well. Basque readers respond to Basque messages, Spanish readers to Spanish ones. Different threads may be constructed, of course. A bilingual message can't have a consistent thread behind it: are we supposed to comment bilingually too?
  6. Open and coherent categorization. So far, my own blog is trilingual because I have twisted Coreblog to use just 3 categories as locale-defining factors. The result is that I don't have categories, just language options. Other blogs also use categorization for multilingualism.
  7. Character encodings properly adapted to non-ASCII characters. At the HTML interface level, as well as in the XML feeds or pings (trackbacks) delivered.
  8. One separate XML feed per language. This is the most obvious feature to me. Look at the commandments listed here: the mixing of languages in postings, categories, interfaces and so on can get very complicated. The XML feed must always be clear. Those who read the feed with an aggregator need clear messages from it. Basque users need a clear XML feed in Basque from this site. For Basque users who understand other languages and want the other content that I post here, it's easy: they can subscribe to the other language feeds as well. The XML feed should include, if possible, the language variable marked following the Dublin Core standard or any other feed specification there might be.
  9. Same system (that is, ONE system) for the bilingual blog. If it's a moblog, the email posting procedure must be the same for the whole blog, with just one variable (a keyword in the message or something) to direct the email to the Basque or English section of the blog. If you change the skin, the CSS style, whatever, you change it once and it is applied to the whole blog and its contents.
  10. Should be based on free software, with its content protected by an open license, like Creative Commons or FDL. I needed the 10th commandment to reach the magic number, so I included this one :-)
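Commandment 8 (one feed per language, with the language marked in Dublin Core terms) can be sketched with Python's standard library. This is a minimal illustration, not how Coreblog builds its feeds: the post data and the function name `build_feed` are made up, and a real RSS 2.0 channel would also need `link` and `description` elements.

```python
import xml.etree.ElementTree as ET

# Dublin Core elements namespace, used here for the dc:language marker.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical posts; each one belongs to exactly one language.
POSTS = [
    {"title": "An English post", "lang": "en"},
    {"title": "A Basque post (placeholder title)", "lang": "eu"},
]


def build_feed(posts, lang, title):
    """Build a feed containing only the posts written in `lang`."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, f"{{{DC_NS}}}language").text = lang
    for post in posts:
        if post["lang"] != lang:
            continue  # other languages go to their own feed
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = post["title"]
    return ET.tostring(rss, encoding="unicode")


print(build_feed(POSTS, "eu", "Basque feed"))
```

Calling `build_feed` once per language yields the separate, unmixed feeds the commandment asks for.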

Examples out there.

  • Blogalization is funny: several languages are categories, but there are other supra-language categories, all in English, for English posts. English is supra-categorical... and interface messages are only in English. Clearly not what I ask for in point #6.
  • Miss-Information.net has clearly separated categories (good, as for #6): some are for French posts, the others for English posts. But the main index page is a chronological mixture, not the coherent page that I ask for in point #2. Moreover, the interface messages are "double", not bilingual as I ask in point #3.
  • Joi Ito and his double blog, in English and Japanese . Good try. However, no interface bilingualism (#3). The Japanese side has the same Movable Type message collection in English. Moreover, I am not truly convinced that it is what I ask for in #9: is there just one machinery working on this site, or two?
  • Some posts in one language, others in the other. Just chronological ordering and mixing, and also a mix, or monolingualism, in interface messages. There are several of this kind. They seem interesting, btw. This one is Farsi and English and this one Chinese and English.
  • Separate feeds. These two sites have separate feeds, Polish/English and Dutch/German (http://www.interdependent.biz/main/index2.html). But they look awkward with their double home page: horizontal scrolling cannot be the solution for bilingual blogging.

Coreblog localised or so-so

Luistxo Fernandez 2004/04/29 16:50

I have advanced in my i18n / l10n attempts with Coreblog. First of all, this attempt needs Localizer: http://www.localizer.org If you have Localizer installed, then you can proceed.

I have released my localised skin as a series of zexp files. Import that folder, change the skin, and there it is. I have also released some notes so that others can localize it into other languages. As a matter of fact, it's no more work than translating some 60 or so strings in a .po file. Look here to check the .po files: http://www.tumatxa.com/intl/ZTMX/coreblog

Installation how-tos and the zexp files are here: http://www.manterola.org/familia/luistxo/coreblog/en

A localised Coreblog looks like

http://www.manterola.org/familia/luistxo/coreblog/esblog in Spanish, or http://www.manterola.org/familia/luistxo/coreblog/triblog with language change in the interface

Of course, this is nothing more than a personal attempt. I hope Atsushi and the people at Webcore will one day tackle the i18n of the original product. Without that, sustaining l10n efforts is difficult. I must also say that I am no techie at all. I just have some user experience with Zope, and also some ideas about i18n and l10n.

At least I tried to follow the clear and standard way that Localizer provides for l10n: a string repository following the Gettext methodology, locale-sensitive logic stored in particular folders...

A disclaimer: I tried to provide the original default style_css style sheet with these skins of mine, but the graphic output didn't look the same as in the original (I don't know why). So the skins come with a slightly modified style sheet, partly copied (fonts and other things) from Tom Lazar's Coreblog at Tomster.org (http://www.tomster.org/blog).

False, true, symmetrical and asymmetrical multilingual blogs

Luistxo Fernandez 2004/04/25 21:57

A localised blog in a given language is the most obvious result, but then there is that curious possibility: the bilingual or multilingual blog, a territory of the net not totally explored to date. L10n in blogs can result in different types of non-English sites:

  1. A localised blog, monolingual, in a given language.
  2. A multilingual blog, where a given language_change button transforms the interface from language A to B, and vice versa.
  3. A multilingual blog, where a given language_change button transforms the interface, and also the content visualization, from language A to B, and vice versa.

Some might think that 2) and 3) are the same, but they are not. Not at all.

In the Nuke family of PHP products for blogging and portal management, they have achieved stage 2) quite well.

For example, nukeunited.com is a Nuke site with English content on which you can change the interface language. But that's not multilingualism. It's a fun trick, at most. Probably not even fun. What's the point, for a blog with content in English, of having its interface in Bulgarian? Those are false multilingual blogs.

I have my own false multilingual blog. It's my triblog here, set up as an example of my personal Coreblog l10n effort, not a real posting site. The content is uniform in one language (a non-language, in this case, just lorem ipsum chatter) but if you click on the Language Change options, you can see the interface in Basque, Spanish or English.

----

In a true multilingual blog, however, content should change when you click the language_change button. That's what the user expects, at least. He/she is reading something in English in a given site, but if a "Spanish" button is offered, the user obviously expects to continue reading the content in Spanish, and not just changing a couple of menu-messages.

This kind of multilingual blog has, in turn, different possibilities. Mainly, it can be symmetric or asymmetric.

SYM is a symmetric blog. You are reading an item titled Elvis is alive, click on language change, and, well, you get the Spanish version of it: Elvis está vivo. Of course, this requires that, when posting, you fill in double data in the form, two titles, two text bodies... It cannot work otherwise. Some people type double entries for their blogs, but within a single entry: check Merdeinfrance . If the software this blogger uses could be adapted to a doubled-input interface, it could produce truly symmetric blogs; for now it's a double-reading blog.

ASYM is an asymmetric blog. Posts are directed either to the English version or to the Spanish version. We may have 3 messages in English, Elvis is alive, Nostradamus was right, and Mars attacks Earth, and just two in Spanish, Elvis está vivo and Carlos Gardel resucita. On this site, when posting, you fill in a usual form, but there must be some language choice to be made (or perhaps two posting forms, one per language).

In ASYM, if you are reading Elvis is alive and click language_change, are you directed to Elvis está vivo? To make that possible, there must be some variables, marked somehow, linking those two posts. However, where are those variables when there is no equivalent to a given posting? You are reading Mars attacks Earth and click on language_change... Then you end up reading Carlos Gardel resucita, or what?

My personal opinion is that, if you are reading Mars attacks Earth on an ASYM-type blog and click on language change, you should simply land on the Spanish index page of the blog.
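The ASYM behaviour described above can be sketched in a few lines. This is a hypothetical model, not code from any of the blogs mentioned: the post ids, the "translation group" field that links equivalent posts, and the URL paths are all assumptions made for illustration.

```python
# Hypothetical data: post id -> (language, title, translation group or None).
# Posts sharing a group are versions of each other; None means no counterpart.
POSTS = {
    "en-1": ("en", "Elvis is alive", "elvis"),
    "en-2": ("en", "Nostradamus was right", None),
    "en-3": ("en", "Mars attacks Earth", None),
    "es-1": ("es", "Elvis está vivo", "elvis"),
    "es-2": ("es", "Carlos Gardel resucita", None),
}


def language_change(post_id: str, target_lang: str) -> str:
    """Return the path to show after clicking language_change on a post."""
    _, _, group = POSTS[post_id]
    if group is not None:
        for other_id, (lang, _, other_group) in POSTS.items():
            if lang == target_lang and other_group == group:
                return f"/{target_lang}/{other_id}"  # linked counterpart exists
    # No linked counterpart: land on the index page of the target language.
    return f"/{target_lang}/"


print(language_change("en-1", "es"))  # the Elvis post has a Spanish version
print(language_change("en-3", "es"))  # Mars attacks Earth has none
```

The fallback to the index page is what keeps the reader from landing on an unrelated post such as Carlos Gardel resucita.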

Are these kinds of blogs possible? Yes. The SYM type is a very closed news-blog type. I think that it is fit only for a corporate-like news posting system or so. A good example is ThyssenKrupp's press releases (Is a corporate newsroom a blog? Basically, yes, IMHO. Index page with the latest posts, the newer ones on top, the others below, click on a given link to read it all...). Some bloggers who opt to translate everything they write could also switch to that kind of interface.

The ASYM type is freer, and more apt for the average bilingual blogger. At my company we have developed several of these, based on Squishdot, but they are in the corporate line, as the news service of their sites. No comments are allowed, although moderated contributions are possible on some sites.

Check http://www.eizie.org/News (enter any article and change language to test). The result is different at other levels of that website, as the general content tree, unlike the News section, is truly symmetrical. So language change behaves differently at http://www.eizie.org/Tresnak

Basque Indymedia is also of this kind. And my own blog, The English Cemetery, wants to be of this kind, asymmetrical and multilingual.

Some posts in Basque, others in English...

However, mine is not a very usual multilingual blog. There are bilingual blogs out there, not in the thousands, but a bunch of them, and they are very different... Most (and mine, also) have some unsatisfactory feature, some l10n detail unresolved. That's the subject for another post, the Ten Commandments for Bilingual Blogs.

Localised blogs: Some examples from the real world

Luistxo Fernandez 2004/04/24 18:54

As I mentioned in the previous post, whatever the i18n degree that blogging software or services may have reached, users almost always have the option to create their own localised blog in a given language.

I have found several Welsh examples that probably sum up the typology of l10n results that individual bloggers may achieve when localizing their blogs. It is a 3-level typology.

  1. No interface localization. The blogging machine seems to be of the EnglishBlog type, or, perhaps, the user doesn't have the expertise to hack for l10n. Examples: buchods.blog-city.com (http://buchods.blog-city.com/) and BratiaithBlog . Bloggers on both sites post in Welsh, and all interface messages are in English.
  2. Partial interface localization. The blogging machine seems to have some degree of i18n, or, perhaps, personalization of the blog lets the user touch strings and things like that. Unardegg looks very Welsh, but date formats appear in English, giving mixed messages like Postiwyd gan Mr Coch yn Tuesday, April 20 @ 13:05:44 GMT. I assume that the blogger couldn't reach that level of date-format l10n... Had the blogger known how, I suspect that the dates would also be in proper Welsh.
  3. Consistent localization. Morfablog looks Welsh, and dates are also in Welsh. Qgil's blog in Catalonia also looks consistently localised.

Those blogs are made with a variety of tools.

The first two, the imperfect ones (that's an unfair way to say it, I know...), are hosted on blogging account web services. They seem to offer limited l10n options to their users...

The 2nd case (Unardegg), with partial interface localization, is a blog made with PHP-Nuke. This software has active i18n and l10n work going on, but it seems that many Nuke sites have the same date-formatting problem as Unardegg. This example, nukeunited.com, is a Nuke site with English content on which you can change the interface language. The results are very poor. Enviado por NukeUnited el Thursday, 08 August a las 11:29:49 doesn't look like a very correct Spanish sentence... However, some Nuke users are well aware of that and have devised partial solutions: this patch for date formatting in Turkish shows that correct dates may be displayed in Turkish, at least... There's hope for Welsh Nuke sites, after all.
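The date problem comes from translating the surrounding strings while leaving the day and month names to an English formatter. A minimal sketch of the fix, written in Python rather than PHP and unrelated to the actual Nuke or Turkish patch code, is to look the names up in per-language tables. The year 2002 is an assumption (the original message shows no year); the function name `posted_line_es` is made up.

```python
from datetime import datetime

# Spanish day and month names, indexed like datetime.weekday() and month - 1.
DAYS_ES = ["lunes", "martes", "miércoles", "jueves", "viernes", "sábado", "domingo"]
MONTHS_ES = [
    "enero", "febrero", "marzo", "abril", "mayo", "junio",
    "julio", "agosto", "septiembre", "octubre", "noviembre", "diciembre",
]


def posted_line_es(author: str, when: datetime) -> str:
    """Build the 'posted by' line with Spanish day and month names."""
    day = DAYS_ES[when.weekday()]  # Monday is 0 in datetime.weekday()
    month = MONTHS_ES[when.month - 1]
    return (
        f"Enviado por {author} el {day}, {when.day:02d} de {month} "
        f"a las {when:%H:%M:%S}"
    )


# Assumed year: 8 August 2002 fell on a Thursday, matching the example above.
print(posted_line_es("NukeUnited", datetime(2002, 8, 8, 11, 29, 49)))
```

Explicit tables avoid depending on which system locales happen to be installed on the server, at the cost of maintaining one table per language.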

The consistent cases use Movable Type, in the case of Morfablog, and Drupal, in the case of Qgil's blog.

  • Movable Type has an active l10n section at their website.
  • Drupal is in the development stage, but it seems that there are good options for bloggers or hackers to develop localized versions, as Qgil shows. There is also discussion going on at the site. In this other post, that Catalan user of Drupal, Qgil, offers his approach to the variety of multilingualism we may find on the web.

Just to mention CoreBLOG, the software running behind The English Cemetery.

  • CoreBLOG. No i18n attempts so far. Well, the software hasn't reached 1.0. However, this is Zope, and l10n can be done by hacking a little bit, as I am doing. I also hope to release an l10n skin soon. We'll see.

Besides these localised blogs, then there is the curious world of bilingual or multilingual blogs... A much more limited necessity, probably, but well, there are some of us with that strange obsession. That's a matter for another post.

Blog internationalization (i18n) and localization (l10n)

Luistxo Fernandez 2004/04/23 17:47

I will post a series of probably long messages to my blog regarding the i18n and l10n of blogs. Many people have written about blog i18n, as for example at Blojsom.com . But I hope to clarify some points, mainly for myself, before going on to introduce a very modest Coreblog l10n project of my own... First of all, definitions.

  • Internationalization. The process of planning and developing products so that they can be changed to meet the requirements of specific local languages and cultures.
  • Localization is the actual preparing of data (or the software) for a particular language or locale.

For example, the Zope product Plone is i18n-aware and has localizations for several languages. These terms are also spelled internationalisation and localisation, and shortened in geek terms to i18n and l10n, which are formed from the first and last letters of the word and the number of letters in between.
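The abbreviation rule is simple enough to check in code. A small sketch (the function name `numeronym` is just a label chosen here):

```python
def numeronym(word: str) -> str:
    """First letter + count of the letters in between + last letter."""
    if len(word) < 3:
        return word  # nothing in between to count
    return f"{word[0]}{len(word) - 2}{word[-1]}"


for term in ("internationalization", "internationalisation", "localization"):
    print(term, "->", numeronym(term))
```

Both spellings of each word have the same length, so the -ize and -ise variants share one abbreviation.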

In the realm of blogging software and bloggers, i18n attempts are (probably) restricted to blogging software producers or online blogging account providers. One day or another, all of them will reach that point: let's do i18n (as an example, here is the recent resolve of the creator of WordPress).

In the case of free software blogging machines, bloggers with a technical background can in turn re-arrange the original code to make personal i18n attempts, for personal use or re-release following the original license.

In turn, l10n is a more open process. If a software producer makes an i18n version of their blogging software, they may be able to release different products. Let's suppose the blogging software SuperInternationalBlog by SuperInternational Co. is internationalized. Then they may release:

  • SuperInternationalBlog in Spanish
  • SuperInternationalBlog in Arabic
  • SuperInternationalBlog ...
  • SuperInternationalBlog intl' version with a default English skin but with options and instructions for users to create their localised version.

People may download different versions of SuperInternationalBlog, and also develop new ones. A wide range of l10n efforts may result from that.

And then there is a company, EnglishSoft Co., that has released EnglishBlog with no i18n attempt at all, just as an English version. In this case, it is also possible that localized versions of EnglishBlog may arise. How?

  • Users will personalize EnglishBlog as they can, to create a blog in their language.

So, localised blogs may be created both with SuperInternationalBlog and with EnglishBlog. Obviously, SuperInternationalBlog users have more opportunities to create good blogs than EnglishBlog users...

However, the quality of i18n achieved by SuperInternational Co. will affect the output of different attempts.

Not all i18n attempts resolve locale-sensitive issues well, such as date-time formatting, character sets or directionality of script. So, having some SuperInternationalBlog out there does not ensure that l10n can be done correctly.

In turn, even the simplest of EnglishBlog-like machines will permit some personalization in skins and so on, and therefore a simple localised version of EnglishBlog is probably easy to achieve. And then, there's this option for most users: no matter which language appears in my blogging software, I will post in my language.

Things that are important to assess

We may say that blog i18n will be all the more accurate if a given effort complies with:

  • The sustainability of l10n efforts. When the software updates, what happens to users' particular localized blogs? Will their l10n update at the same pace?
  • The possibility to share l10n results is also important. Some software developers may ask their users to contribute their l10n packages (be they string collections, skins, or whatever) to a central repository: this practice has very different results in free or proprietary software, I guess... Other systems may permit a user-to-user interaction where l10n packages may be shared, with no original developer intervening in that. Probably some l10n trials, however, are too hardcoded in one instance and are difficult to share.
  • Standardization of l10n procedures. If localized strings are stored in .po files, then translations may be shared between different systems, translation memory may be used... .po files are a standard format in the GNU Gettext i18n/l10n framework.
  • The real web effect of blog software i18n is mostly localized monolingual blogs in a given language. But if i18n also makes it possible to develop multilingual or bilingual blogs, a blog that can be at the same time in English and in Japanese, for instance, that's a step ahead.
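To show why the .po format makes sharing easy, here is a minimal sketch that reads msgid/msgstr pairs from a .po fragment. It is a toy reader under stated assumptions: it handles only single-line entries and ignores comments, escapes, plural forms and contexts, which the real GNU Gettext tools do handle. The sample entries are illustrative; "Posted by" is a made-up untranslated string.

```python
import re

# A tiny .po fragment. "Comments" / "Erantzunak" is the English/Basque pair
# mentioned in these posts; "Posted by" is a hypothetical untranslated entry.
PO_SAMPLE = '''
msgid "Comments"
msgstr "Erantzunak"

msgid "Posted by"
msgstr ""
'''

# One single-line msgid immediately followed by one single-line msgstr.
ENTRY = re.compile(r'^msgid "(.*)"\nmsgstr "(.*)"$', re.MULTILINE)


def read_po(text: str) -> dict:
    """Return {source string: translation} for simple single-line entries."""
    return {msgid: msgstr for msgid, msgstr in ENTRY.findall(text)}


catalog = read_po(PO_SAMPLE)
untranslated = [source for source, target in catalog.items() if not target]
print(catalog)
print("Still to translate:", untranslated)
```

Because the format is plain text with a fixed shape, any system or translation memory tool can read the same catalog, which is the sharing argument made above.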

Unicode learning, Technorati and Bloglines

Luistxo Fernandez 2004/04/21 22:21

One day I'll learn about Unicode. I suppose it's a must to better understand blog i18n. Three resources that may help in the learning process:

BTW, I have a Technorati account now. I don't clearly see the service's advantages. Bloglines, in turn, was a sudden revelation for me. I got instantly fascinated and I'm a daily user of my account now.

About
LUISTXO FERNANDEZ

Luistxo works at CodeSyntax, tweets as @Luistxo and tries to manage the automated news site Niagarank. This Cemetery is part of a distributed multilingual blog (?!). These are the Basque and Spanish versions:

Ingelesen hilerria

El cementerio de los ingleses

 


Creative Commons by-sa