There are many ways to obtain a bibliography in HTML, to include in a webpage.
To specify the problem:
- the input is a .bib file (or multiple .bib files)
- the output is an .html file, or a file that can be included in HTML.
In particular, I won’t discuss solutions that embed HAL in your webpage, for instance.
The input .bib file I will use for the test is the following:
@InProceedings{article1,
author = {Emmanuel Jeandel and Jeannot Lapin},
title = {Is Bibtex good},
booktitle = "International Colloquium on Bibtex (ICB)",
year = 2020,
url = "http://not.a.real.url",
abstract = "This is an abstract"
}
@Article{article2,
author = {Emmanuel Jeandel},
title = {HTML and Bibtex},
journal = {Theory of HTML},
pages = "12--49",
volume = 12,
year = 2023,
url = "http://a.real.url",
abstract = " HTML is good",
}
To continue with the specs:
- I want the abstract to be written on the HTML page
- I want a download link to be written on the HTML page
- The bibtex file might contain a few tex commands (typically math symbols in the abstract, and potentially accents in the author field)
I will discuss a few solutions to this problem. The solution I use is the last one. You will see it is prettier and corresponds exactly to my needs, but that doesn’t mean that the others are unsuitable: I have of course taken more time to prettify the solution I use.
Pandoc
Pandoc can directly convert .bib files to .html with the command
pandoc -C -f bibtex toto.bib --csl=mine.csl -t html
If we launch pandoc on the previous file, we obtain the following output:
- Emmanuel Jeandel and Jeannot Lapin. Is Bibtex good. In International Colloquium on Bibtex (ICB), 2020. DL:http://not.a.real.url Abstract:This is an abstract
- Emmanuel Jeandel. HTML and Bibtex. Theory of HTML, 12:12–49, (2023). DL:http://a.real.url Abstract:HTML is good
Style files for pandoc/citeproc are in the CSL format. (The file mine.csl I used is customized from the ACM format.) The result is good for inclusion in an article, but not perfect for a webpage. In particular, I found it difficult to customize some parts of the output, namely the href parts and the abstract parts.
Bibtex2html
Another solution is to use bibtex2html.
bibtex2html -nobibsource -nokeywords -s dsgplain -linebreak -d -r -dl -nodoc -noheader -nf url "DL" -nofooter publis.bib
Here is the output:
- [1]
-
Emmanuel Jeandel. HTML and Bibtex. Theory of HTML, 12:12–49, 2023.
[ DL ]HTML is good
- [2]
-
Emmanuel Jeandel and Jeannot Lapin. Is Bibtex good. In International Colloquium on Bibtex (ICB), 2020.
[ DL ]This is an abstract
The -dl option is to use HTML DL lists rather than tables, which makes the output look nicer imho. The -d -r options are to sort by date in reverse order.
The output is nicer and that’s what I have been using for some time. The dsgplain.bst file is very standard, and any other bibtex style would have worked really. I was not completely happy with the DL lists, but the result is customizable. I didn’t find a way to put a class on every tag so that I could customize it entirely, but it’s still quite good.
A silly attempt
Another possible solution is to actually create a .bst file for bibtex in such a way that the output is directly HTML. Once you understand how .bst files work (they use reverse Polish notation, à la Forth and PostScript), this is not hard.
I didn’t choose this solution for one reason: my .bib files might contain latex commands (for accents and math) and using this solution means the latex commands would be put in the HTML verbatim. I realized this very soon and didn’t pursue this solution.
The final result
bibtex is very good at producing .tex files from .bib files. I realized that the best solution in my case was to use the following workflow:
- Convert the .bib into a .tex (actually a .bbl) using bibtex with a customized .bst (bibtex style file)
- Convert the .tex into .html using pandoc
I use the following script to do all the work:
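#!/bin/bash
# The .bib file is passed as the first argument.
X=${1%.bib}   # base name: the argument with its .bib suffix removed
# Write a minimal .aux file that cites every entry (\citation{*})
# and names the style and the database, as a LaTeX run would.
cat > "$X.aux" <<EOF
\select@language{english}
\citation{*}
\bibstyle{dsgplain}
\bibdata{$X}
EOF
bibtex "$X"                             # reads $X.aux, writes $X.bbl and $X.blg
pandoc -f latex "$X.bbl" -o "$X.html"   # convert the .bbl (LaTeX) to HTML
rm -f "$X.aux" "$X.blg"                 # remove the intermediate files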
Here is the result:
-
Emmanuel Jeandel. HTML and Bibtex. Theory of HTML, 12:12–49, 2023 [DL]
- Abstract
-
HTML is good
-
Emmanuel Jeandel and Jeannot Lapin. Is Bibtex good. In International Colloquium on Bibtex (ICB), 2020 [DL]
- Abstract
-
This is an abstract
Here is the .bst file I use for this result.
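The script rests on two small tricks: the ${1%.bib} expansion strips the extension from the argument, and a hand-written .aux file stands in for a real LaTeX run (\citation{*} is what \nocite{*} would write there). Here is a minimal sketch of just that step, which can be tried without bibtex or pandoc installed. The function name write_aux and the scratch directory are only illustrative and are not part of the script above.

```shell
#!/bin/bash
# write_aux is a hypothetical helper; it mirrors the first step of the script.
write_aux() {
    local base=${1%.bib}      # "publis.bib" -> "publis"; unchanged without a .bib suffix
    # printf '%s\n' prints each argument verbatim, one per line,
    # so the backslashes are never interpreted as escape sequences.
    printf '%s\n' \
        '\select@language{english}' \
        '\citation{*}' \
        '\bibstyle{dsgplain}' \
        "\\bibdata{$base}" > "$base.aux"
}

tmp=$(mktemp -d)              # scratch directory (an assumption, for the demo only)
write_aux "$tmp/publis.bib"
cat "$tmp/publis.aux"         # shows the four lines that bibtex would read
```

Running bibtex on the resulting base name should then pick up every entry of the .bib file, since nothing else in the .aux restricts the selection.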
Finishing notes
The last solution is actually not exactly what I’m using. My workflow is a bit more complex: as I use quarto for my webpage, I decided to convert the .tex into markdown using pandoc, and then process the markdown with quarto.
This allowed me to present the abstracts in a nicer way than in the previous example, and to make them collapsible (see my publications page on the top).
Something tricky happened however. When you convert the first entry of the result above into markdown, here is what you obtain:
1. Emmanuel Jeandel. **HTML and Bibtex**. *Theory of HTML*, 12:12--49,
[\[DL\]](http://a.real.url)
2023
Abstract
: HTML is good
Notice that one of the continuation lines starts with 2023. If I had put a period at the end of the year (to obtain "2023." instead of "2023"), this could have been interpreted by pandoc/quarto as the beginning of a numbered list!
I still haven’t found a satisfying solution to this problem, outside of hacks.
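To make the problem concrete, here is a sketch of one such hack: post-process the markdown and put a backslash before the period whenever a line starts with a four-digit year followed by a period, since pandoc markdown reads "2023\." as plain text rather than as a list marker. The function name escape_year_period is hypothetical, and the filter assumes that years have exactly four digits and that no genuine list is numbered that high.

```shell
#!/bin/bash
# Hypothetical filter: turns a leading "2023." into "2023\." so that a
# markdown reader does not take it for the start of an ordered list.
# Only lines made of optional blanks + exactly four digits + "." followed by
# a blank or the end of the line are touched, so real markers such as "1."
# are left alone.
escape_year_period() {
    sed -E 's/^([[:space:]]*[0-9]{4})\.([[:space:]]|$)/\1\\.\2/'
}

# Demo on an entry whose year, followed by a period, lands at the start of a line.
printf '%s\n' \
    '1.  Emmanuel Jeandel. **HTML and Bibtex**. *Theory of HTML*, 12:12--49,' \
    '    2023. [\[DL\]](http://a.real.url)' | escape_year_period
```

The filter would sit between the pandoc step and quarto. Pandoc also has a --wrap=none option that disables line wrapping in its output, which might keep the year from landing at the start of a line in the first place; that is worth trying, but it remains an assumption for this workflow.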