List of Publications in HTML

pandoc
Author

Emmanuel Jeandel

Published

March 1, 2023

There are many ways to obtain a bibliography in HTML for inclusion in a webpage.

To specify the problem:

In particular, I won’t discuss solutions that embed HAL in your webpage.

The input .bib file I will use for the test is the following:

@InProceedings{article1,
  author =   {Emmanuel Jeandel and Jeannot Lapin},
  title =    {Is Bibtex good},
  booktitle =       "International Colloquium on Bibtex (ICB)",
  year =            2020,
  url = "http://not.a.real.url",
  abstract = "This is an abstract"
}

@Article{article2,
  author =   {Emmanuel Jeandel},
  title =    {HTML and Bibtex},
  journal =      {Theory of HTML},
  pages = "12--49",
  volume = 12,
  year =     2023,
  url = "http://a.real.url",          
  abstract = " HTML is good",
}

To continue with the specs:

I will discuss a few solutions to this problem. The solution I use is the last one. You will see that it is prettier and corresponds exactly to my needs, but that doesn’t mean the others are unsuitable: of course, I spent more time prettifying the solution I use.

Pandoc

Pandoc can directly convert .bib files to .html with the following command:

pandoc -C -f bibtex toto.bib --csl=mine.csl -t html

If we launch pandoc on the previous file, we obtain the following output:

  1. Emmanuel Jeandel and Jeannot Lapin. Is Bibtex good. In International Colloquium on Bibtex (ICB), 2020. DL:http://not.a.real.url Abstract:This is an abstract
  2. Emmanuel Jeandel. HTML and Bibtex. Theory of HTML, 12:12–49, (2023). DL:http://a.real.url Abstract:HTML is good

Style files for pandoc/citeproc use the CSL format (the file mine.csl I used is customized from the ACM style). The result is good for inclusion in an article, but not ideal for a webpage. In particular, I found it difficult to customize some parts of the output, notably the links (the href parts) and the abstracts.
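To make the role of each option explicit, here is the same command wrapped in a small shell function. This is only a sketch: the function name `pandoc_bib_to_html` is hypothetical, and it assumes a pandoc recent enough to read BibTeX as an input format (the `-f bibtex` used above).

```sh
# Hypothetical wrapper around the pandoc command above.
# $1: the .bib file, $2: the CSL style file. The HTML goes to standard output.
#   -C          same as --citeproc: format the references with citeproc
#   -f bibtex   read the .bib file itself as the input document
#   --csl=FILE  CSL style used to format each entry (e.g. mine.csl)
#   -t html     write HTML
pandoc_bib_to_html() {
  pandoc -C -f bibtex "$1" --csl="$2" -t html
}
```

For instance, `pandoc_bib_to_html toto.bib mine.csl > publis.html` writes the list into a file (the output name is illustrative).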

Bibtex2html

Another solution is to use bibtex2html.

bibtex2html -nobibsource -nokeywords -s dsgplain -linebreak -d -r -dl -nodoc -noheader -nf url "DL" -nofooter publis.bib

Here is the output:

[1]
Emmanuel Jeandel. HTML and Bibtex. Theory of HTML, 12:12–49, 2023.
[ DL ]
HTML is good
[2]
Emmanuel Jeandel and Jeannot Lapin. Is Bibtex good. In International Colloquium on Bibtex (ICB), 2020.
[ DL ]
This is an abstract

The -dl option uses HTML DL (description) lists rather than tables, which makes the output look nicer imho. The -d -r options sort the entries by date, in reverse order.

The output is nicer, and that’s what I used for some time. The dsgplain.bst file is very standard, and any other bibtex style would have worked just as well. I was not completely happy with the DL lists, but the result is customizable. I didn’t find a way to put a class on every tag so that I could customize the output entirely, but it’s still quite good.
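For reference, here is the bibtex2html command again, wrapped in a hypothetical shell function (`publis_with_bibtex2html`) with one comment per option. The one-line descriptions paraphrase the bibtex2html documentation as I understand it, so treat them as indicative rather than authoritative.

```sh
# Hypothetical wrapper; $1 is the .bib file (publis.bib above).
#   -nobibsource   do not produce the BibTeX sources of the entries
#   -nokeywords    do not print the keywords of the entries
#   -s dsgplain    format the entries with the dsgplain bibtex style
#   -linebreak     put the links on a separate line, after the entry
#   -d -r          sort by date, in reverse order (most recent first)
#   -dl            use HTML DL lists instead of tables
#   -nodoc         only output the body of the HTML document
#   -noheader      no header (the bibtex2html command line)
#   -nf url "DL"   turn the url field into a link whose text is "DL"
#   -nofooter      no "generated by bibtex2html" footer
publis_with_bibtex2html() {
  bibtex2html -nobibsource -nokeywords -s dsgplain -linebreak -d -r -dl \
    -nodoc -noheader -nf url "DL" -nofooter "$1"
}
```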

A silly attempt

Another possible solution is to write a .bst file for bibtex such that the output is directly HTML. Once you understand how .bst files work (the language uses reverse Polish notation, à la Forth and PostScript), this is not hard.

I didn’t choose this solution for one reason: my .bib files may contain LaTeX commands (for accents and math), and with this solution those commands would end up verbatim in the HTML. I realized this very early and didn’t pursue this solution.
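To see in advance whether a given .bib file would be affected, a rough check is to look for LaTeX-like content. The sketch below is a heuristic, and `latex_lines` is a hypothetical name: it flags a backslash followed by a non-blank character (`\'e`, `\c{c}`, `\emph`...) or a dollar sign (inline math).

```sh
# Hypothetical helper: print, with line numbers, the lines of the given
# .bib files (or of standard input) that look like they contain LaTeX:
# a backslash followed by a non-blank character, or a dollar sign.
# These are the fields a .bst file emitting HTML would copy verbatim.
# Rough heuristic: a backslash or a $ inside a URL is flagged too.
latex_lines() {
  grep -nE '\\[^[:space:]]|\$' "$@"
}
```

For instance, `latex_lines publis.bib` lists the offending lines; an empty output means the .bib file is plain text.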

The final result

bibtex is very good at producing .tex files from .bib. I realized that the best solution in my case was to use the following workflow:

  • Convert the .bib into a .tex (actually a .bbl) using bibtex with a customized .bst (bibtex style file)
  • Convert the .tex into .html using pandoc

I use the following script to do all the work:

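#!/bin/bash
# $1: the .bib file, e.g. publis.bib; the result is written to publis.html

base="${1%.bib}"   # file name without the .bib extension

# Write a minimal .aux file asking bibtex to cite every entry (\citation{*})
# of $base.bib, formatted with the dsgplain style.
# printf is used instead of echo so that backslashes (\c, \b...) are never
# interpreted as escape sequences, whatever the shell.
printf '%s\n' \
  '\select@language{english}' \
  '\citation{*}' \
  '\bibstyle{dsgplain}' \
  "\\bibdata{$base}" > "$base.aux"

bibtex "$base"                                # .aux + .bib -> .bbl (LaTeX)
pandoc -f latex "$base.bbl" -o "$base.html"   # .bbl -> .html

# Remove the auxiliary files (the .aux written above, the .blg bibtex log)
rm "$base.aux" "$base.blg"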

Here is the result:

  1. Emmanuel Jeandel. HTML and Bibtex. Theory of HTML, 12:12–49, 2023 [DL]

    Abstract

    HTML is good

  2. Emmanuel Jeandel and Jeannot Lapin. Is Bibtex good. In International Colloquium on Bibtex (ICB), 2020 [DL]

    Abstract

    This is an abstract

Here is the .bst file I use for this result.

Finishing notes

The last solution is actually not exactly what I’m using. My workflow is a bit more complex: as I use Quarto for my webpage, I convert the .tex into markdown using pandoc, and then process the markdown with Quarto.

This gives a presentation of the abstracts that is nicer than in the previous example, and the abstracts can be collapsed (see my publications page, linked at the top).
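The conversion step of this variant can be sketched as follows. The function name `bbl_to_markdown` is hypothetical and the exact pandoc options may differ from mine; the point is simply to ask pandoc for markdown (`-t markdown`) instead of HTML.

```sh
# Hypothetical helper: convert the .bbl written by bibtex into markdown,
# e.g. publis.bbl -> publis.md, ready to be processed by Quarto.
bbl_to_markdown() {
  pandoc -f latex -t markdown "$1" -o "${1%.bbl}.md"
}
```

The resulting file can then, for instance, be pulled into a Quarto page with the `{{< include publis.md >}}` shortcode (one possible integration; the file name is illustrative).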

Something tricky happened, however. When you convert the first entry of the list above into markdown, here is what you obtain:

1.  Emmanuel Jeandel. **HTML and Bibtex**. *Theory of HTML*, 12:12--49,
    2023 [\[DL\]](http://a.real.url)

    Abstract

    :   HTML is good

Notice that the second line starts with the year 2023. Had I put a period after the year (2023. instead of 2023), pandoc/Quarto could have interpreted this line as the beginning of a numbered list!

I still haven’t found a satisfying solution to this problem, outside of hacks.
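One thing that can at least be automated is spotting the problem. The sketch below (hypothetical name `suspicious_lines`) prints the indented continuation lines of a markdown file that begin like an ordered-list marker: 1 to 9 digits followed by `.` or `)`, as in CommonMark. It only detects the issue without fixing it, and it ignores the letter and roman-numeral markers that pandoc’s `fancy_lists` extension also accepts.

```sh
# Hypothetical check: print, with line numbers, the indented (continuation)
# lines whose first word looks like an ordered-list marker, i.e. 1 to 9
# digits followed by "." or ")" and then a blank or the end of the line.
# After line wrapping, such a line may be read as the start of a new list.
suspicious_lines() {
  grep -nE '^[[:space:]]+[0-9]{1,9}[.)]([[:space:]]|$)' "$@"
}
```

For instance, `suspicious_lines publis.md` prints nothing on the output shown above, but would flag a continuation line starting with `2023.`.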