Saturday, October 8, 2011

Customizing dict, or Offline dictionary from command line in Ubuntu

This weekend I spent quite some time setting up the dictionary in Ubuntu. My goal was: being able to easily get a translation between the languages I want, and if possible, offline.

I already knew that there is a Dictionary program, which is available in Ubuntu by default or can be easily installed, plus a gnome-dictionary plugin to easily invoke it from the top panel. I had two main problems with this:
  1. My native language, Russian, was not available by default as a target language.
  2. I wanted to be able to use the dictionary also when I am offline (which can happen when one has a laptop).
In addition, I would really love to use the dictionary without a graphical client, just from the command line.

How I went about solving this problem:
  1. It turned out there are better clients for Ubuntu that support both dict and other formats. I had used StarDict before (plugging in extra dictionaries found elsewhere on the internet), but I find its interface rather messy (on Windows I used Babylon, which was practically just what I needed). Recently I discovered Fantasdic on Ubuntu, and it seems much neater than StarDict; it can also import dictionaries in other formats (StarDict among them), so it was already an improvement.
  2. Then I found out that it's possible to install a local dictd server and let it provide the dictionaries.
After some experiments and poking around the net, I have got dictd up and running and could import some extra dictionaries to it. Here are the steps, meant more as an inspiration than as a cookbook :)
  • You can get an idea of what is available in the repos you use by typing something like:
    apt-cache search dict-
    As a result, you will see something like:
    dict-jargon - dict package for The Jargon Lexicon
    dict-freedict-afr-deu - Dict package for Afrikaans-German Freedict dictionary
    dict-freedict-iri-eng - Dict package for Irish-English Freedict dictionary
    [...skipped...]
    dict-freedict-tur-eng - Dict package for Turkish-English Freedict dictionary
    dict-freedict-wel-eng - Dict package for Welsh-English Freedict dictionary
    stardict-common - International dictionary - data files
    stardict-czech - Stardict package for Czech dictionary of foreign words
    stardict-english-czech - Stardict package for English-Czech dictionary
    
    Most packages starting with "dict-" will be the dictionaries in the dictd format. The naming scheme, though, is not strict (for example, English-Russian Mueller dictionary is called mueller7-dict and Moby Thesaurus is called dict-moby-thesaurus) but you get the idea. Otherwise, you can just look them up in Synaptic package manager.
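Because the naming scheme is loose, a small filter can help narrow the search output. The sketch below is only an illustration: the function name filter_dictd_packages is made up here, and it assumes that dictd dictionaries are named either "dict-<name>" or "<name>-dict", which will miss packages that follow neither pattern.

```shell
# Keep only lines whose package name looks like a dictd dictionary:
# either "dict-<name>" or "<name>-dict" (for example mueller7-dict).
# Reads "apt-cache search" style lines ("name - description") on stdin.
filter_dictd_packages() {
    grep -E '^(dict-[^ ]+|[^ ]+-dict) - '
}

# Usage (needs apt):
#   apt-cache search dict | filter_dictd_packages
```

Lines such as "stardict-czech - ..." are dropped, since the name neither starts with "dict-" nor ends with "-dict".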
  • To install dictd and the additional packages, you can either add them via Synaptic package manager or just apt-get them:
    apt-get install dictd dict-gcide dict-wn dict-moby-thesaurus [whatever other dictionaries you want]
    Additional dict packages can always be added later.
  • If the installation succeeded, you will have your dictd service up and running locally on port 2628! You can check it by typing:
    /etc/init.d/dictd status
    You should get:
    * dictd is running
    Also, now you can type something like:
    dict athwart
    and get results:
    6 definitions found
    
    From The Collaborative International Dictionary of English v.0.48 [gcide]:
    
    Athwart \A*thwart"\, prep. [Pref. a- + thwart.]
    1. Across; from side to side of.
    [1913 Webster]
    
    Athwart the thicket lone.             --Tennyson.
    [1913 Webster]
    
    2. (Naut.) Across the direction or course of; as, a fleet
    standing athwart our course.
    [...skipped...]
    From Mueller English-Russian Dictionary [mueller7]:
    
    athwart
    [ɜ↗θwɘ:t]
    1. _adv.
    1) косо; поперёк; перпендикулярно
    2) против; наперекор
    2. _prep.
    1) поперёк; через; to run athwart a ship врезаться в борт другого судна;
    to throw a bridge athwart a river перебросить мост через реку
    2) против; вопреки; athwart his plans вопреки его планам
    
    If you want to use the client (like Dictionary or Fantasdic) you can set up your local dictd server as the source there: in preferences, add new source of type "DICT dictionary server", specify "127.0.0.1" as the server address and leave the port number unchanged (2628).
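For command-line use, a tiny wrapper can pin the lookup to the local server and to one dictionary. This is a sketch under assumptions: the function name en2ru and the DICT_CMD variable are invented for this example, and the database name mueller7 is taken from the output shown above (check dict --dbs for the names on your machine).

```shell
# Command used for lookups; override DICT_CMD to substitute another client.
DICT_CMD=${DICT_CMD:-dict}

# Look up a word in the Mueller English-Russian dictionary only,
# always asking the local dictd server on the default port.
en2ru() {
    "$DICT_CMD" -h 127.0.0.1 -p 2628 -d mueller7 "$@"
}

# Usage (needs a running local dictd with mueller7 installed):
#   en2ru athwart
```

Pinning the host to 127.0.0.1 means the lookup never leaves the machine, which is the point of the offline setup.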
  • In the previous example I cheated a bit: you will get fewer results, because I had already added a couple of extra dictionaries, converted into dictd format. The reason for that was that not all the dictionaries I needed were available in dictd format, but they could be found in other formats (stardict, sdict, dsl): for example, look here or here (I suspect that the first list is just the combination of all entries from the second one, but I'm not sure).
  • The second link also points to the home of the XDXF project, where you can get a program called makedict to convert dictionaries between different formats. This program is not available in binary form, so you can check out the source and build it yourself with the standard steps:
    cd [someplace]
    mkdir xdxf
    svn co https://xdxf.svn.sourceforge.net/svnroot/xdxf xdxf
    mkdir makedict-out
    cd makedict-out
    cmake ../xdxf/trunk
    make
    make install 
    After this, you can convert the dictionaries (at least in sdict, stardict and xdxf formats - haven't tried the others) to dictd format using
    makedict -o dictd file-name
  • Finally, a couple of import examples.
    1. For example, suppose you have downloaded an English-German dictionary.
    2. You will get a file comn_sdict_axm05_English_German.tar.bz2 (a bzip2-compressed tar archive), and can proceed as follows:
      tar -xvjf comn_sdict_axm05_English_German.tar.bz2
      
      English_German/
      English_German/icon16.png
      English_German/dict.xdxf
      
      So, this is the xdxf format. We don't have to specify the input format explicitly; specifying the output format is enough:
      makedict -o dictd English_German/dict.xdxf 
      
      Write index to English_German/English_German/English_German.index
      Write data to English_German/English_German/English_German.dict
      
      The resulting two files have to be put together with the other dictd files (on my machine they live in the /usr/share/dictd folder by default), the dictd config has to be updated, and the dictd service has to be restarted:
      mv English_German/English_German/*.* /usr/share/dictd
      /usr/sbin/dictdconfig --write
      /etc/init.d/dictd restart
      Now you should be able to see the new dictionary in your client, or just check its availability from the terminal:
      dict --dbs
      It will provide a list which should contain the new source (usually named after the file name).
      
      Databases available:
       gcide           The Collaborative International Dictionary of English v.0.48
       wn              WordNet (r) 3.0 (2006)
      [...skipped...]
       English_German  English_German
       fd-eng-fra      English-French Freedict dictionary
       rus_eng_full    rus_eng_full
      
      And it should just work:
      dict athwart
      
      [...skipped...]
      From English_German [English_German]:
      
        <k>athwart<k>
        quer
      
      (Yes, there might be some format-specific tags which don't look pretty in the terminal; they can be removed if needed, since the file is just plain text, but that's outside the current topic.) According to the Wiki article about Dict, there is another program, called dictfmt, that formats text files into .dict and .index files. I tried using it on a text file generated from the dicts.info page, but the format of these text files does not seem to be what dictfmt expects. I haven't spent much time on it yet.
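A cautious way to get rid of the tags is to strip them from the output rather than from the .dict file. As far as I understand the dictd format, the .index file stores the position and length of every entry inside the .dict file, so editing the .dict file by hand can leave the index pointing at the wrong places. The sketch below only handles the <k> marker seen in the output above; the function name strip_k_tags is made up for this example.

```shell
# Remove XDXF key markers (<k> and </k>) from text read on stdin.
# Everything else, including indentation, is passed through unchanged.
strip_k_tags() {
    sed -E 's#</?k>##g'
}

# Usage (needs a running dictd with the converted dictionary):
#   dict -d English_German athwart | strip_k_tags
```

Since the filter works on the output stream, the installed dictionary files stay exactly as makedict wrote them.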
    3. The procedure has one extra step for files in stardict format, for example the Dutch-English one.
    4. After unpacking the file, we get the following structure:
      dutch-english.dict.dz  dutch-english.idx  dutch-english.ifo
      
      The converter will complain, because it expects an uncompressed dict file. The additional step is uncompressing:
      dictzip -d dutch-english.dict.dz
      Which will give us:
      ls stardict-dutch-english-2.4.2
      dutch-english.dict  dutch-english.idx  dutch-english.ifo
      makedict -o dictd stardict-dutch-english-2.4.2/dutch-english.ifo
      Write index to stardict-dutch-english-2.4.2/dutch-english/dutch-english.index
      Write data to stardict-dutch-english-2.4.2/dutch-english/dutch-english.dict
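The install step is the same for every converted dictionary, so it can be wrapped in a small helper. This is a rough sketch, not a tested installer: the function name install_dictd_files is invented here, it copies rather than moves the files, and the default target /usr/share/dictd is simply the location mentioned above, which may differ on other systems.

```shell
# Copy the .dict/.index pair produced by makedict into the dictd data directory.
# $1: directory containing the converted files
# $2: target directory (optional, defaults to /usr/share/dictd)
install_dictd_files() {
    src_dir=$1
    dest_dir=${2:-/usr/share/dictd}
    cp "$src_dir"/*.dict "$src_dir"/*.index "$dest_dir"/
}

# Possible full flow for the stardict example (needs dictzip, makedict and root rights):
#   dictzip -d stardict-dutch-english-2.4.2/dutch-english.dict.dz
#   makedict -o dictd stardict-dutch-english-2.4.2/dutch-english.ifo
#   install_dictd_files stardict-dutch-english-2.4.2/dutch-english
#   /usr/sbin/dictdconfig --write
#   /etc/init.d/dictd restart
```

Copying instead of moving keeps the converted files around in case the conversion has to be inspected again.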
      
    5. Another caveat is the index. If the index entries contain anything other than the plain word (lexical notes, hyphens, etc.), these entries won't be matched by the default search, but they can be matched using a different search strategy:
      dict -d English_German ceiling
      1 definition found
      
      From English_German [English_German]:
      
        <k>ceiling<k>
        Höchstbetrag {m}, Obergrenze {f}, Zimmerdecke {f}
      
      dict -d English_German -s suffix ceiling
      
      From English_German [English_German]:
      
        <k>(absolute) ceiling<k>
        Gipfelhöhe {f} (Luftfahrt)
      
      From English_German [English_German]:
      
        <k>asset ceiling<k>
        Höchstgrenze {f}
      
      From English_German [English_German]:
      
        <k>ceiling<k>
        Höchstbetrag {m}, Obergrenze {f}, Zimmerdecke {f}
      
      [...skipped...]
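When the default lookup finds nothing, trying looser strategies one after another can be automated. The sketch below makes a few assumptions: the function name lookup_loose and the DICT_CMD variable are invented for this example, the strategy names (exact, prefix, suffix, substring) are ones dictd servers commonly offer (dict --strats lists what yours supports), and it relies on the client returning a non-zero exit status or empty output when there is no match.

```shell
# Command used for lookups; override DICT_CMD to substitute another client.
DICT_CMD=${DICT_CMD:-dict}

# Try progressively looser strategies until one of them returns something.
# $1: database name, $2: word to look up
lookup_loose() {
    db=$1
    word=$2
    for strategy in exact prefix suffix substring; do
        if out=$("$DICT_CMD" -d "$db" -s "$strategy" "$word" 2>/dev/null) && [ -n "$out" ]; then
            # Report which strategy produced the result, then the result itself.
            printf 'strategy: %s\n%s\n' "$strategy" "$out"
            return 0
        fi
    done
    return 1
}

# Usage (needs a running dictd):
#   lookup_loose English_German ceiling
```

Note that this stops at the first strategy that matches, so for a word like "ceiling", which has an exact entry, the suffix-only entries are still reached only by passing -s suffix explicitly.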
      
That's it for the start! Might not look extremely fancy, but... it's a free horse after all :)

UPDATE: if you have Babylon dictionaries (.BGL), you can convert them into dictd format using a program called dictconv, which is available from the Ubuntu repositories.

1 comment:

  1. Oh, removing the 's somehow destroyed my files... :-/
