Archive for March, 2010

Using the email library Mel with CL

Tuesday, March 30th, 2010

The mel library, if people aren’t familiar with it, is an extremely nice library for reading and parsing email from various sources. I started using it as a bit of a project to pull emails into my central search store. I was extremely surprised by how well done this library is, although I admit the documentation is a bit rough around the edges.

So first, if you want to install this you can either install via asdf, likely clbuild, or just grab the darcs repository. Personally I had problems with the asdf install so I went with the stable darcs repo instead. For my install I’ve done the following:

cd /usr/lib/sbcl/site
darcs clone http://common-lisp.net/project/mel-base/darcs/mel-base/
cd ../site-systems
ln -s /usr/lib/sbcl/site/mel-base/mel-base.asd .

The one problem I’ve found is that there is little documentation out there, but we can create some to give a starting point at the very least.

cd /usr/lib/sbcl/site/mel-base/docs/manual
texi2pdf mel.texinfo

You’ll get a mel.pdf if you have texinfo installed.

I will concentrate a bit on using Maildir, since that’s kinda what I was starting with. You connect to a Maildir directory by the following:

(defvar *inbox* (make-maildir-folder "/home/dthole/Maildir/"))

From there, you can run quite a few things. The most notable functions are:

messages
map-messages

messages itself, as the docs describe it, will return all the messages. This is fairly helpful, but the real power comes from map-messages. With map-messages, we can pass a function that is called on each message as it’s processed. Some sample code I’ve used to count emails from and to specific addresses is the following:

(defun mapMessageTest ()
  "Count messages from foo@bar.com and messages whose first recipient is bar@foo.com."
  (let ((num-ui-froms 0)
        (num-to-greg 0))
    ;; SCAN is assumed to come from CL-PPCRE; FROM, TO and ADDRESS-SPEC come from mel.
    (map-messages #'(lambda (x)
                      (when (scan "foo@bar.com" (address-spec (from x)))
                        (incf num-ui-froms))
                      ;; TO may be a single address or a list, so use the first one.
                      (when (and (to x)
                                 (scan "bar@foo.com"
                                       (address-spec (if (listp (to x))
                                                         (car (to x))
                                                         (to x)))))
                        (incf num-to-greg)))
                  *inbox*)
    (format t "Num from foo@bar.com: ~a~%" num-ui-froms)
    (format t "Num to bar@foo.com: ~a~%" num-to-greg)))

So what I’m doing here is passing an anonymous function that checks two conditions. x in this case is a message object (a CLOS instance) from mel. We can call (from), (to), (message-string) and so on. Many of these aren’t documented, but the code is there. Anyways, the one curious part people may notice is the (address-spec (if ..)) section. The to address can be a list of elements, since an email can have more than one recipient. I check whether it’s a list and, if so, take the first element; otherwise I just use the element itself. Other than that, all this does is keep two counters and output the results at the end.
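The list-or-single-value check can be pulled out into its own small function. Here is a minimal sketch of that idea. It does not use mel at all: plain strings stand in for mel's address objects, and the names first-recipient and count-first-recipient are made up for illustration.

```lisp
;; Hypothetical helpers, not part of mel. Plain strings stand in for
;; the address objects that mel's (to x) would return.
(defun first-recipient (to-field)
  "Return the first element if TO-FIELD is a list, otherwise TO-FIELD itself."
  (if (listp to-field)
      (car to-field)
      to-field))

(defun count-first-recipient (to-fields address)
  "Count entries in TO-FIELDS whose first recipient contains ADDRESS.
Entries that are NIL (no recipient at all) are skipped."
  (count-if (lambda (to-field)
              (let ((recipient (first-recipient to-field)))
                (and recipient (search address recipient))))
            to-fields))

;; One single address, two lists, and one empty field.
(count-first-recipient '("bar@foo.com"
                         ("bar@foo.com" "x@y.com")
                         ("x@y.com" "bar@foo.com")
                         nil)
                       "bar@foo.com")
;; => 2
```

Note that only the first recipient is looked at, same as in my function above, so the third entry doesn't count even though bar@foo.com appears later in its list.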

Another function I mentioned briefly a bit ago is (message-string). There are a lot of message-* functions out there; this one returns the text of the email.

One big reason why I enjoy this library is the caching it does. I found from my testing that going through 14k emails in my Maildir folder took about 10 seconds or so. Subsequent calls to (mapMessageTest) were MUCH quicker, taking less than a second. There’s some interesting caching the library implements that’s really helpful while in the REPL. I haven’t looked at the code for this yet, but I’m excited to do it.

The reason I’m using this library is to do some recognition on the emails and copy them to another area on disk. The discussion on documentation and search will be in a later blog article.

Parsing dirty data in Common Lisp

Saturday, March 27th, 2010

I came across a bit of an issue today when I was building a parser in Common Lisp. I had saved my archive folder in Outlook to a giant text file, and I wanted to parse out each individual email (there are over 8000) and save each one to its own file on the hard disk. Later I would integrate this with fetchmail to pull new emails into the same folder structure, which would then be indexed by DevonThink as an external folder. It’s a bit of a long story, and I plan to write more about it, but for now, what about trying to parse dirty data in Common Lisp?

There are a few options for cleaning up dirty data. I came across two different options with what I was doing:

1. Just pull the readable data – using NIL for the other bits of data
The advantage of doing this is that it’s really fast. You just make a slightly different call to (with-open-file) and you effectively skip the data that’s not readable. There are a lot of disadvantages to this approach, mainly that your data isn’t going to be as close to the original as you may have wanted. Bullet points, for example, could have been translated to a -. This method, though, will make them NIL, or empty. For my case this was OK. I didn’t really care about the translation of this bit of data, since I was more interested in the overall theme of the email than the specific formatting.

To accomplish this, you can use something like the following:

(with-open-file (stream parse-file :external-format :latin1)
….)

Thanks to nikodemus on Freenode for this information.
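To make option 1 a bit more concrete, here is a minimal sketch of reading a whole file this way. It assumes SBCL, where :latin1 is an accepted external format name (other implementations may spell it differently). The function names and the /tmp path are made up for the example. Since Latin-1 assigns a character to every byte value, reading like this should not signal a decoding error, whatever the bytes are.

```lisp
;; Hypothetical helper names; assumes SBCL's :latin1 external format.
(defun read-lines-latin1 (path)
  "Return the lines of PATH as a list of strings, decoding each byte as Latin-1."
  (with-open-file (stream path :external-format :latin1)
    (loop for line = (read-line stream nil nil)
          while line
          collect line)))

(defun write-sample-bytes (path)
  "Write two short lines of raw bytes to PATH and return PATH.
Byte 149 is included because it is not valid UTF-8 on its own."
  (with-open-file (out path :direction :output
                            :element-type '(unsigned-byte 8)
                            :if-exists :supersede)
    (write-sequence #(72 105 149 10 66 121 101 10) out))
  path)

;; Returns a list of two strings; the first one is three characters long.
(read-lines-latin1 (write-sample-bytes "/tmp/latin1-sample.txt"))
```

The odd byte comes through as a single character with code 149 rather than stopping the read, which is the trade-off described above: nothing breaks, but the character is not necessarily the one the original author typed.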

2. Clean up the data

Emacs gives a fairly nice way of handling this, well kinda. When you load the questionable file, you can type C-x RET f and set the file encoding. I used utf-8-unix at first. From there, save the file. You should be presented with a warning saying that some characters can’t be encoded with that coding system. You can see a listing, a partial one anyway, of the characters it’s complaining about. Cancel the save with C-g, switch to the warning buffer with C-x o and copy each individual character (C-space, right arrow, M-w). You can either hit enter at this point to view the first occurrence of that character, or you can go to the original buffer and search. Once you’ve determined what that character should be, simply hit M-< to go to the beginning, type M-x replace-string, C-y to insert that character, hit enter, then type the substitution character of your choice and hit enter again. It’ll replace all occurrences in that buffer with what you want. From there, rinse and repeat for the others.

The obvious disadvantage of this is it’ll take much longer to accomplish the task. The advantage is that you’ll end up with a sane file in the end. I started with this method, but went with method 1 in the end.

The one part I couldn’t figure out, and I’ll likely post an update once I get this answered, is what happens when you save the buffer while overriding the coding system: it saves as raw-text-unix, regardless of what I picked. Given an override, the warning states that I’d just lose those characters, which I was OK with. I’ll try to find out more and post later.

Compiling Ruby on Arch

Wednesday, March 24th, 2010

There are a few ways of dealing with Ruby/Rails on Arch Linux. There is an AUR entry for Ruby 1.8. The default for pacman is 1.9. For my office, we were stuck with 1.8 for the most part, so I decided to install from source. (AUR may have been easier, but I don’t know it very well, so I didn’t use it right now but may later. I do like being frozen at this version for work purposes though…)

Installing it the way I did is fairly simple. All you need to do is:

wget ftp://ftp.ruby-lang.org/pub/ruby/1.8/ruby-1.8.7-p249.tar.bz2
tar -xjvf ruby-1.8.7-p249.tar.bz2
cd ruby-1.8.7-p249
./configure --prefix=/usr --enable-shared --enable-pthread
make && make install

For the prefix, I know it’s usually customary to keep things in /usr/local. For my purpose, I didn’t do that because I had a lot of gems already installed (as well as rubygems itself) and they were in that location from the arch install that I did for 1.9. If you prefer it in /usr/local, just change the configure prefix option above.

The tricky part that I kept running into was the --enable-shared and --enable-pthread options. Without those, our rake test task would just randomly die with an error such as:

note: ruby[5998] exited with preempt_count 1
BUG: scheduling while atomic: ruby/5998/0x10000002
Modules linked in: ipv6 vmsync vmmemctl vmblock vmhgfs ext2 mbcache snd_seq_dummy fan snd_seq_oss snd_seq_midi_event snd_seq snd_ens1371 snd_pcm_oss gameport s
nd_rawmidi snd_seq_device snd_ac97_codec ac97_bus snd_pcm snd_mixer_oss parport_pc uhci_hcd pcnet32 snd_timer battery vmxnet ppdev ehci_hcd snd soundcore snd_p
age_alloc mii usbcore container ac shpchp pci_hotplug i2c_piix4 intel_agp lp i2c_core processor button thermal psmouse pcspkr parport serio_raw evdev sg rtc_cm
os rtc_core rtc_lib reiserfs sr_mod cdrom pata_acpi ata_generic sd_mod floppy ata_piix libata mptspi mptscsih mptbase scsi_transport_spi scsi_mod
Pid: 5998, comm: ruby Tainted: G D W 2.6.32-ARCH #1
Call Trace:
[] ? thread_return+0x666/0x7ae
[] ? __cond_resched+0x1d/0x30
[] ? _cond_resched+0x2e/0x40
[] ? unmap_vmas+0x8e1/0xaa0
[] ? vt_console_print+0x7c/0x330
[] ? exit_mmap+0xc6/0x1d0
[] ? mmput+0x32/0xf0
[] ? exit_mm+0xfa/0x140
[] ? do_exit+0x136/0x7c0
[] ? printk+0x40/0x45
[] ? release_console_sem+0x1b0/0x200
[] ? oops_end+0xa3/0xf0
[] ? no_context+0xfa/0x260
[] ? HgfsDirOpen+0x0/0x30 [vmhgfs]
[] ? page_fault+0x25/0x30
[] ? task_rq_lock+0x3a/0xa0
[] ? try_to_wake_up+0x58/0x330
[] ? __mutex_unlock_slowpath+0xa1/0x150
[] ? HgfsDirLlseek+0x98/0xe0 [vmhgfs]
[] ? sys_lseek+0x6e/0x90
[] ? system_call_fastpath+0x16/0x1b
stack segment: 0000 [#10] PREEMPT SMP
…..

The part that convinced me of the issue was the line “scheduling while atomic”, which suggested to me that it was trying to spawn off another process or thread of some sort. I went hunting around the AUR repository and found compile flags that worked: http://aur.archlinux.org/packages/ruby1.8/ruby1.8/PKGBUILD

Overall I was pretty happy that this fix worked – and I’m very curious if we had only one CPU/core, would this have been an issue? I’m not willing to disable it on my virtual machine to find out though.

CouchDB on Arch Linux

Thursday, March 18th, 2010

CouchDB on Arch was a bit of a pain to get working. I wanted to share my thoughts on how I got it to work and hopefully that’ll help solve a few people’s similar problems.

First, I couldn’t find a prebuilt package within Arch (pacman), so I had to build CouchDB entirely from source to make this work. You can do it with the following steps:

1. Install Pacman Dependencies
pacman -S gcc make erlang extra/icu spidermonkey automake autoconf curl
2. Download the source code from the web site:

http://www.apache.org/dyn/closer.cgi?path=/couchdb/0.10.1/apache-couchdb-0.10.1.tar.gz

3. Unpack the source code: tar -xzvf apache-couchdb-0.10.1.tar.gz
4. cd into the directory and run ./configure --prefix=/
5. Run make and make install

What you’ll notice is that if you don’t run the above prefix, it goes into strange places, such as /usr/local/rc.d, which complicated matters when it came to finding the install locations for everything. If you already tried to install it, just cd back into the directory at step 4 and run “make uninstall” which will clean it all out first, then rerun configure, make and make install listed above.

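You’ll notice that if you run /etc/rc.d/couchdb start, it won’t exactly work. This is because it wants a couchdb user, so let’s create one. The useradd command needs the username as its last argument, and -s sets the login shell (/bin/bash here is my assumption; pick whatever suits your system):

useradd -m -s /bin/bash couchdb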

But…it’s still not happy! Well, now we have permission issues. First we should fix the permissions:

chown -R couchdb:root /var/log/couchdb
chown -R couchdb:root /var/lib/couchdb

This should get you up and running smoothly, at least with the basics, but there was one more thing I ran into: when running ps aux, the paths now have double forward slashes in front of everything. This is OK to my understanding, but you can edit /etc/rc.d/couchdb and remove the double slashes as you see fit.

Now you should be able to run /etc/rc.d/couchdb start successfully. You should see output similar to the following. If you see only one process and a sleep, then there’s a problem:

[root@tdtdev lib]# ps aux | grep -i 'couchdb'
couchdb 1942 0.0 0.1 13544 1748 pts/3 S 16:38 0:00 /bin/sh -e /bin/couchdb -a \"//etc/couchdb/default.ini\" -a \"//etc/couchdb/local.ini\" -b -r 5 -p //var/run/couchdb/couchdb.pid -o /dev/null -e /dev/null -R
couchdb 1959 0.0 0.0 13544 1012 pts/3 S 16:38 0:00 /bin/sh -e /bin/couchdb -a \"//etc/couchdb/default.ini\" -a \"//etc/couchdb/local.ini\" -b -r 5 -p //var/run/couchdb/couchdb.pid -o /dev/null -e /dev/null -R
couchdb 1960 0.0 1.3 169508 13708 pts/3 Sl 16:38 0:00 /usr/lib/erlang/erts-5.7.3/bin/beam.smp -Bd -K true -- -root /usr/lib/erlang -progname erl -- -home /home/couchdb -noshell -noinput -smp auto -sasl errlog_type error -pa //lib/couchdb/erlang/lib/couch-0.10.1/ebin //lib/couchdb/erlang/lib/mochiweb-r97/ebin //lib/couchdb/erlang/lib/ibrowse-1.5.2/ebin //lib/couchdb/erlang/lib/erlang-oauth/ebin -eval application:load(ibrowse) -eval application:load(oauth) -eval application:load(crypto) -eval application:load(couch) -eval crypto:start() -eval ssl:start() -eval ibrowse:start() -eval couch_server:start([ "//etc/couchdb/default.ini", "//etc/couchdb/local.ini", "//etc/couchdb/default.ini", "//etc/couchdb/local.ini"]), receive done -> done end. -pidfile //var/run/couchdb/couchdb.pid -heart
couchdb 1969 0.0 0.0 3668 480 ? Ss 16:38 0:00 heart -pid 1960 -ht 11

So a total of 4 processes. You should also be able to visit your local URL too:

http://127.0.0.1:5984/

If you’re still having problems after this, some things I’ve done are editing /etc/rc.d/couchdb as mentioned above, as well as editing /etc/couchdb/default.ini to remove the double slashes from there.

I hope this helps, it took a bit to really get it here.

Beginning Clojure

Wednesday, March 17th, 2010

I’ve recently begun to dabble in Clojure, a rising project with a fair bit of drive behind it. In a lot of ways it kinda reminds me of the Rails drive, in that the project itself is changing quite rapidly and a number of new people are joining the ranks fairly quickly. First, it’s worth mentioning what Clojure is, in case you haven’t heard of it before. Clojure can be defined as a Lisp-like language that sits on top of the JVM. Its purpose is to take the power that Java has in terms of concurrency and libraries and combine it with the best features of Common Lisp, without including the “ugly aspects” of Lisp’s history (extremely subjective…)

Common Lisp was pretty much the first language I’ve ever used that I actually fell in love with. There are many things that Common Lisp supports, including a REPL for interactive programming, extremely flexible error handling (restart-case, handler-case), macros, the multi-paradigm universe, and so on. I’ve found it extremely difficult to really think in a functional way when programming in PHP, Ruby, Python, and so on. Most of my programming, therefore, was done in Common Lisp because it feels cleaner and simpler. Clojure, though, I am finding strikes a nice balance that Lisp has difficulty reaching when it comes to being applicable to the masses. It includes many of the benefits of Lisp, but gives you the option to call Java code too. Clojure also feels a little more on the functional side than Common Lisp does in a lot of aspects, or at least the presentation seems to lend itself more toward the functional side anyways.

Clojure’s project page can be located here:
http://clojure.org/

Using Clojure is fairly simple and straightforward. There are currently 3 books (that I’ve seen) for this language that talk about syntax, setup, and so on. The one book I decided upon was Programming Clojure, from the Pragmatic Bookshelf. The book is fairly cheap and is of decent quality. You can find a link to it here:

http://www.pragprog.com/titles/shcloj/programming-clojure

There are many editors that support Clojure development with plugins. Each one of them has a different way of being set up. Personally I settled upon using Emacs and Slime because that’s what I use for my Common Lisp development, and not having to change development environments is really helpful for me. Emacs and Slime may be a bit of a jump for many people, so it’s good that there are other options when it comes to editors. NetBeans with Enclojure was well recommended in my searches. TextMate also has a Lisp mode (but it doesn’t have the REPL, so I really recommend not using it for serious development).

Clojure is also fairly simple to set up, as long as you find a good guide for doing so. Unfortunately there are many ways to set up Clojure, and at a minimum you need a JDK, ant, and so on, but this may differ significantly depending on your editor of choice. With Emacs 23.X, Emacs and Slime offer an automated install using ELPA. To be honest, and I really hated this option, setting this all up through ELPA was probably the simplest. Your jar files are downloaded automagically to the ~/.swank-clojure directory and things are magically set up. If you use Emacs 23.X and don’t currently do Common Lisp development, then this will work well. If you want to build stuff manually and aren’t using Emacs, then your best bet is to get maven and build your Clojure .jar files accordingly. You can find detailed directions on building it on Ubuntu here:

http://riddell.us/ClojureOnUbuntu.html

If you decide to use Emacs, this video has been helpful from my perspective:

http://www.bestinclass.dk/index.php/2009/12/clojure-101-getting-clojure-slime-installed/

For general Clojure development, one tool that I find extremely helpful is leiningen. Essentially what this tool does is give you a working directory that makes Clojure development a bit easier. There’s little I can say that the project page doesn’t say better, so you should visit this:

http://github.com/technomancy/leiningen

One last thing I’ll add is for those who do Common Lisp development as well. If you do, some time may be sunk into trying to figure out how to make that work. Following the directions in the video, you’d be left with an (eval-after-load "slime" ...) block. In there, add something like: (add-to-list 'slime-lisp-implementations '(sbcl ("/usr/bin/sbcl"))), then run slime by hitting M-- (hold meta, hit dash), then M-x, and type slime. You should get a prompt that asks you which lisp to use; type “sbcl”, then enter. The other issue I want to bring up is that the CVS version of swank/slime doesn’t work with Clojure. You should get git://github.com/technomancy/slime.git instead. Note that if you use ELPA to install Clojure and its deps, it will already install parts of the compatible slime. Running M-- M-x slime with sbcl will generate errors about not being able to find source .lisp files. You can fix this by going into the .emacs.d/elpa/slime-../ directory and copying all the .lisp files from the technomancy slime checkout into this directory. If you find a cleaner way of accomplishing this, without separate .emacs conf files, please add a comment.

Clojure is a bit of a handful to get started on, but if you find yourself having problems I’d check the Clojure group on Google Groups, and #clojure on Freenode IRC.