Re: [SOLUTION] Re: lug-bg: utf,ansi,unicode etc...
- From: George Danchev <danchev@xxxxxxxxx>
- Date: Mon, 11 Aug 2003 12:44:02 +0300
--cut--
> So now, here's a fix for those who, like me, have messed things up and
> already saved the file in UTF-8...
>
> 1. perl -MCPAN -e 'install Convert::Cyrillic'
> 2. perl -MCPAN -e 'install Lingua::DetectCharset'
> 3. Write yourself this Perl script:
>
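> #!/usr/bin/perl
> use strict;
> use warnings;
> use Convert::Cyrillic;
> use Lingua::DetectCharset;
> local $/;    # slurp mode: each read from <> returns one whole input file
> while (<>) {
>     # guess the source charset statistically, then convert it to 'win' (Windows-1251)
>     my $cs = Lingua::DetectCharset::Detect($_);
>     print Convert::Cyrillic::cstocs($cs, 'win', $_);
> }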
That's good ;-) but, as I said, there is no 100% strict method for guessing
the encoding of whatever you feed in... The detection step is the shaky
part here, i.e. we have no 100% guarantee that we will hit the right input
encoding:
http://search.cpan.org/author/JNEYSTADT/cyrillic-1.05/Lingua/DetectCharset.pm
This routine is implemented using algorithm of statistical analysis of text,
which was proved to be very efficient and showed around 99.98% accuracy in
tests.
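The point that detection is only a guess can be illustrated with a much cruder heuristic than the statistical one in Lingua::DetectCharset. The sketch below is a hypothetical stand-in, not that module's algorithm: it uses only the core Encode module, reports input that decodes cleanly as strict UTF-8 as UTF-8, and otherwise assumes Windows-1251. The function name guess_encoding_heuristic and the cp1251 fallback are illustrative assumptions.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical heuristic, NOT the Lingua::DetectCharset algorithm:
# bytes that decode cleanly as strict UTF-8 are called UTF-8,
# anything else is ASSUMED to be cp1251 (Windows-1251).
sub guess_encoding_heuristic {
    my ($bytes) = @_;

    # Pure 7-bit ASCII is valid in both encodings, so it proves nothing.
    return 'ascii' if $bytes !~ /[\x80-\xFF]/;

    # decode() with FB_CROAK may modify its input in place, so work on a copy.
    my $copy = $bytes;
    my $ok = eval { decode('UTF-8', $copy, Encode::FB_CROAK()); 1 };
    return $ok ? 'UTF-8' : 'cp1251';
}
```

It fails in exactly the way described above: a short cp1251 string can happen to form valid UTF-8 byte sequences, and KOI8-R input would simply be misreported as cp1251.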
If we know the input encoding, converting back and forth is easy after that,
keeping in mind the exceptions for "symbols-out-of-range" ;-)
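Once the input encoding is known, the conversion itself is mechanical; the only care needed is for characters the target charset cannot represent. A minimal sketch, assuming the core Encode module rather than Convert::Cyrillic, with a hypothetical convert_known helper that writes unmappable characters as HTML-style numeric references (for example &#x3B1;) instead of silently dropping them:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: convert bytes from a KNOWN source encoding to a
# target encoding. Characters the target cannot represent (the
# "symbols out of range") are replaced by a &#xHEX; marker.
sub convert_known {
    my ($bytes, $from, $to) = @_;

    # bytes -> Perl character string (malformed input becomes U+FFFD)
    my $text = decode($from, $bytes);

    # character string -> bytes; the coderef receives the code point of
    # each unmappable character and returns the octets to insert instead
    return encode($to, $text, sub { sprintf '&#x%X;', shift });
}
```

For example, the Greek letter alpha has no Windows-1251 equivalent, so convert_known($bytes, 'UTF-8', 'cp1251') emits &#x3B1; in its place while Cyrillic letters map normally. The reference format is an arbitrary choice; the fallback coderef could just as well return '?'.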
--
pub 4096R/0E4BD0AB 2003-03-18 <keyserver.bu.edu>
1AE7 7C66 0A26 5BFF DF22 5D55 1C57 0C89 0E4B D0AB
============================================================================
A mailing list of Linux Users Group - Bulgaria (Bulgarian Linux users).
http://www.linux-bulgaria.org - Hosted by Internet Group Ltd. - Stara Zagora
To unsubscribe: http://www.linux-bulgaria.org/public/mail_list.html
============================================================================