How to transform HTML tables into org-mode tables

Sometimes for my work, I find tables in HTML files on the web that I need to process.

There are of course many ways to do it. You can cut and paste them from your browser (which leaves you with tab-separated text that you can process with awk), or you can process the HTML directly with tools like BeautifulSoup.
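To give an idea of what processing the HTML directly involves, here is a minimal sketch. It uses Python's standard-library html.parser instead of BeautifulSoup so that it runs without installing anything. The names TableRows and html_table_rows and the sample markup are made up for illustration, and the sketch assumes simple tables with no nesting and no rowspan/colspan.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of every <tr> found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row being built, or None outside a <tr>
        self._cell = None  # text chunks of the current cell, or None

    def _close_cell(self):
        if self._cell is not None:
            # Join the chunks and collapse runs of whitespace
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()   # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td> or </th>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()


def html_table_rows(html):
    """Return all table rows in `html` as lists of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample input
sample = """
<table>
  <tr><th>Company</th><th>Contact</th><th>Country</th></tr>
  <tr><td>Alfreds Futterkiste</td><td>Maria Anders</td><td>Germany</td></tr>
</table>
"""
for row in html_table_rows(sample):
    print("\t".join(row))
```

This works, but it is noticeably more code than a one-line conversion, and you still have to decide what to do with the rows afterwards.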

A solution that works quite well is to convert the HTML tables to org-mode tables, and then work on them with org-mode in Emacs (this assumes you use Emacs, of course).

Pandoc makes it easy to do so. Just type

pandoc -t org toto.html -o toto.org

You can use it, for instance, to process the current file. In this case, the following table:

Company	Contact	Country
Alfreds Futterkiste	Maria Anders	Germany
Centro comercial Moctezuma	Francisco Chang	Mexico

will become

| Company                    | Contact         | Country |
|----------------------------+-----------------+---------|
| Alfreds Futterkiste        | Maria Anders    | Germany |
| Centro comercial Moctezuma | Francisco Chang | Mexico  |

It’s not always perfect (sometimes header rows are not recognized as headers), but the result is usually good enough to process with org-mode.
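When the header is missed, the fix is just to add the horizontal rule under the first row. Here is a minimal sketch of that fix-up as a script. The function add_header_rule is a made-up helper, not part of pandoc or org-mode, and it assumes that the header came out as an ordinary first row and that no cell contains a literal "|".

```python
def add_header_rule(org_table):
    """Insert an org-mode rule after the first row, unless one is already there."""
    lines = org_table.strip("\n").split("\n")
    # Nothing to do for a one-row table or when the rule already exists
    if len(lines) < 2 or lines[1].lstrip().startswith("|-"):
        return "\n".join(lines)
    # One run of dashes per column, as wide as the cell (padding included),
    # joined with "+" in the way org-mode writes its rules
    cells = lines[0].strip().strip("|").split("|")
    rule = "|" + "+".join("-" * len(cell) for cell in cells) + "|"
    return "\n".join([lines[0], rule] + lines[1:])


# Hypothetical input: a table whose header was not detected
table = """\
| Company             | Country |
| Alfreds Futterkiste | Germany |
"""
print(add_header_rule(table))
```

Inside Emacs you would not need a script for a single table: typing the rule by hand as a line starting with |- and pressing TAB lets org-mode complete and realign it.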