Markdown support #274

Matistjati · 2024-08-17T15:23:17Z

Supported for both PDF and HTML:

The problem name and id are automatically inserted at the top of the statement. The spec should say that we prepend them automatically from problem.yaml.
samples
interactive samples
{{nextsample}} and {{remainingsamples}}
commonmark-style images
"common" footnotes. extended commonmark syntax
"common" tables. extended commonmark syntax
inline and display math with $a+b$ and $$a+b$$
svgs, both for md and html (svgs are not sanitized)
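
To make the placeholder behaviour concrete, here is a minimal sketch of how `{{nextsample}}` and `{{remainingsamples}}` could be expanded. The function name `expand_sample_placeholders` is hypothetical, and the semantics are an assumption rather than what the PR necessarily implements: each `{{nextsample}}` consumes one sample in order, `{{remainingsamples}}` emits every sample not yet used, and any leftover samples are appended at the end.

```python
import re

# Matches the two placeholders named in the PR description.
_PLACEHOLDER = re.compile(r"\{\{(nextsample|remainingsamples)\}\}")


def expand_sample_placeholders(statement, samples):
    """Hypothetical helper: substitute already-rendered samples into a statement.

    Assumed semantics (not confirmed by the PR): samples are consumed in
    order, and unused samples are appended after the statement text.
    """
    remaining = list(samples)

    def substitute(match):
        if match.group(1) == "nextsample":
            if not remaining:
                raise ValueError("{{nextsample}} used, but no samples remain")
            return remaining.pop(0)
        # {{remainingsamples}}: emit everything that has not been used yet.
        rest = "\n".join(remaining)
        remaining.clear()
        return rest

    # A function replacement is inserted literally, so backslashes in a
    # sample are not interpreted by re.sub.
    expanded = _PLACEHOLDER.sub(substitute, statement)
    if remaining:
        expanded += "\n" + "\n".join(remaining)
    return expanded
```

Raising on an exhausted `{{nextsample}}` is one possible design choice; silently rendering nothing would hide a mistake in the statement.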

A PR for the spec can be found here: Kattis/problem-package-format#386.

TODO testing:

are the dependencies correct on all systems (pandoc + rsvg-convert)?
do we catch all images so their source can be sanitized?
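
For the image question, a check along the following lines might be a starting point. It is only a sketch: the function name `is_safe_image_source` and the extension allow-list are assumptions, not the PR's actual rules. It accepts only relative paths that stay inside the statement directory and rejects anything with a URL scheme or host.

```python
import posixpath
from urllib.parse import urlparse

# Assumed allow-list; the PR explicitly leaves out pdf and webp images.
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".svg"}


def is_safe_image_source(src):
    """Hypothetical check: True only for local, relative image paths."""
    parsed = urlparse(src)
    # Reject http:, data:, javascript:, and scheme-relative //host/... URLs.
    if parsed.scheme or parsed.netloc:
        return False
    # Reject absolute paths and Windows-style separators.
    if src.startswith("/") or "\\" in src:
        return False
    # Reject paths that climb out of the statement directory.
    normalized = posixpath.normpath(src)
    if normalized == ".." or normalized.startswith("../"):
        return False
    return posixpath.splitext(normalized)[1].lower() in ALLOWED_EXTENSIONS
```

This only validates a source string. Finding every image reference in the pandoc output is a separate problem and is not covered here.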

not planned to be added in this PR (maybe ever):

tex -> html using pandoc (remove plastex)
tex -> pdf (we will keep using pdflatex for this)
pdf images (they look terrible)
webp images (poor support)
SVG sanitization (we'll do that on the Kattis side, don't want to add a huge dependency for this)

Matistjati · 2025-03-12T23:05:38Z

@pehrsoderman want to take a look? I think this PR is ready for a review. Here are the statements I tested it on:

https://github.com/Matistjati/kattis-markdown-examples

In general, most statements don't look perfect right off the bat, mostly because they come from their own markdown dialect. However, I believe that all ugliness can be fixed by small tweaks.

gkreitz

I took a pass over this. As mentioned on Slack, it seems like pandoc makes no security guarantees on its HTML output and recommends using a sanitizer (https://pandoc.org/chunkedhtml-demo/19-a-note-on-security.html). That's gonna get messy; I'm not a huge fan of any of the Python options. If anyone happens to look at this PR and has any recommendations, feel free to chime in.

Looking through, I think I spotted a number of places where we need to be more careful with escaping stuff we inject (problem name, sample, and, just as an extra precaution, problem id). I'm not quite sure how to do that cleanly, t.b.h. If I understand the code correctly, when pandoc goes from markdown to PDF it goes via LaTeX, which means that inserting plain LaTeX in the markdown works. That dual-step translation makes me wonder how to sanely escape stuff like \ or $ when we inject it (or, as an extreme example, the sample of a problem about parsing LaTeX :)).
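
On the HTML side, the injection concern could be handled roughly as below. This is a sketch under an assumption: the header is added to the HTML after pandoc has run, so only HTML escaping is needed. The function name `render_html_header` and the class names are hypothetical. It uses the standard library's `html.escape`.

```python
import html


def render_html_header(problem_name, problem_id):
    """Hypothetical helper: build the name/id header with escaped values."""
    # quote=True also escapes " and ', which matters if a value ever ends
    # up inside an attribute.
    name = html.escape(problem_name, quote=True)
    pid = html.escape(problem_id, quote=True)
    return (
        f'<h1 class="problem-name">{name}</h1>\n'
        f'<h2 class="problem-id">Problem ID: {pid}</h2>'
    )
```

If the header is instead injected into the markdown before pandoc runs, HTML escaping alone would not be enough, since pandoc would still interpret markdown syntax in the name.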

problemtools/md2html.py

problemtools/problem2pdf.py

gkreitz · 2025-03-18T09:52:47Z

problemtools/problem2pdf.py

+    # Add problem name and id to the top
+    problem_id = os.path.basename(problem_root)
+    statement_md = r'\centerline{\large %s}' % f"Problem id: {problem_id}" + statement_md
+    statement_md = r'\centerline{\huge %s}' % problem_name + statement_md


We need to escape LaTeX command characters in strings we inject this way. This seems to break if the name contains characters like }, \, or $.
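
One way to do this is a single-pass replacement of LaTeX's special characters, sketched below. The name `escape_latex` is hypothetical and not part of problemtools. Whether LaTeX escaping alone suffices is an open question here, because the string first passes through pandoc's markdown reader.

```python
import re

# LaTeX special characters and a safe textual replacement for each.
_LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "{": r"\{",
    "}": r"\}",
    "$": r"\$",
    "&": r"\&",
    "#": r"\#",
    "%": r"\%",
    "_": r"\_",
    "^": r"\textasciicircum{}",
    "~": r"\textasciitilde{}",
}
_LATEX_SPECIALS_RE = re.compile("|".join(re.escape(c) for c in _LATEX_SPECIALS))


def escape_latex(text):
    """Hypothetical helper: escape LaTeX control characters in one pass.

    A single regex pass avoids re-escaping the braces that the
    replacements themselves introduce.
    """
    return _LATEX_SPECIALS_RE.sub(lambda m: _LATEX_SPECIALS[m.group(0)], text)
```

With such a helper, the injected line would become something like `r'\centerline{\huge %s}' % escape_latex(problem_name)`.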

gkreitz · 2025-03-18T10:06:03Z

problemtools/statement_common.py

+        config = verifyproblem.ProblemConfig(prob)
+    if not config.check(None):
+        raise Exception("Invalid problem.yaml")
+    names = config.get("name")


This will break after #286, but you're probably aware of that. :)

Matistjati · 2025-04-04

Yup, talked with Rasmus and Hugo to hopefully work something out

gkreitz · 2025-03-18T10:15:04Z

problemtools/statement_common.py

+            temp_file.write(sample)
+            temp_file.flush()
+            command = ["pandoc", temp_file.name, "-t" , "markdown"]
+            return subprocess.run(command, capture_output=True, text=True,


A bit of a corner case, but what happens if the sample contains latex (or, more likely, latex control characters like $)? I'm assuming pandoc won't escape that when converting HTML to markdown, so won't that then later mess up the PDF rendering once you've injected this markdown into the document and are trying to pandoc into a pdf?

Matistjati · 2025-04-08

Painstakingly, this has been fixed.
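
For reference, one way to keep sample text literal in pandoc markdown is to wrap it in a fenced code block whose fence is longer than any backtick run inside the sample. The sketch below illustrates that technique. It is not necessarily how the PR fixed the issue, and `fence_sample` is a hypothetical name.

```python
import re


def fence_sample(sample):
    """Hypothetical helper: wrap sample text in a markdown fenced code block.

    The fence is made longer than the longest run of backticks in the
    sample, so the sample cannot close the block early.
    """
    longest_run = max((len(run) for run in re.findall(r"`+", sample)), default=0)
    fence = "`" * max(3, longest_run + 1)
    body = sample if sample.endswith("\n") else sample + "\n"
    return f"{fence}\n{body}{fence}\n"
```

Inside a code block, characters such as $ and \ are not interpreted as math or raw LaTeX by the markdown reader, so the content should reach the PDF path as verbatim text.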

problemtools/statement_common.py

Matistjati · 2025-04-07T21:08:12Z

TODO:

Investigate more thoroughly under which conditions Pandoc will make web requests.
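
One thing that may be worth checking as part of this is pandoc's `--sandbox` option, available in newer pandoc releases. It is documented to limit IO in readers and writers to the files named on the command line. The sketch below only shows how the invocation could look; the helper names are hypothetical, and whether the option covers every case relevant to this PR would still need testing.

```python
import subprocess


def build_pandoc_command(input_path, output_format="html"):
    """Hypothetical helper: pandoc argv with sandboxing enabled."""
    # Passing a list (no shell) keeps file names from being interpreted
    # by a shell.
    return ["pandoc", "--sandbox", "-f", "markdown", "-t", output_format, input_path]


def run_pandoc(input_path, output_format="html"):
    """Run pandoc and return its output; raises if pandoc exits non-zero."""
    result = subprocess.run(
        build_pandoc_command(input_path, output_format),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Note that the pandoc manual says the option does not restrict filters or PDF production, so it would mainly apply to the markdown to HTML step.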

Known issues (that will either be wontfix or fixed in follow-up pr):

(HTML) If you place {{remainingsamples}} in the text, you end up with something like
SVGs are currently not supported. Will be fixed in a follow-up PR.
Does not support multipass samples
if problem id has _, it breaks pdf rendering. wontfix, invalid problem name according to spec
samples can overflow in normal tex -> pdf rendering. would be nice to fix, but low priority

gkreitz

I took another pass over it, as requested. This is getting close to ready for merging. The main problem I see is a couple of places where we need to escape LaTeX characters (the rendering path of markdown -> latex -> pdf that pandoc does is funky...).

debian/control

problemtools/statement_common.py


Matistjati added 15 commits August 8, 2024 00:54

Add markdown support

cb3ea10

Added display math

868eb39

Add dependencies for markdown

6a01b1c

Style markdown tables

05f6372

Remove temp files

673773e

Statement fix

1c64085

Some refactoring

48d18c7

Added image support in markdown

08645f5

Added footnote support

a6a1933

Code cleanup

7627c58

md -> html works

1b222ac

Make md styling more consistent with latex

712ce3e

md->pdf and Reorganize code

11a2e4c

Better md->pdf tables

480e0ea

Interactive samples for pdf

e9b3f8e

Matistjati mentioned this pull request Aug 18, 2024

Specify Markdown statements Kattis/problem-package-format#239

Closed

Matistjati added 14 commits August 18, 2024 02:14

Remove bplusa

ad3e801

PDF problem name

30d9603

Add dependencies

efc5c9e

Add problem names

762599f

Added problem name to test hello package

2bba9d4

Improve security by running pandoc without shell capabilities

cdd1804

Refactoring

194c7b1

Even more refactoring

554892a

Remove python3-markdown dependency

d8a4c3e

Add problem id to pdf and small fixes

7390fb8

Disable html

46a7003

Change to wikimedia example image

770d5da

Sanitize image sources

11b6a13

Remove SVG dependency

bfd4703

Matistjati added 3 commits March 12, 2025 20:39

Better sample styling

d55df47

Add \nextsample and \remainingsamples

a0b3f9f

Better pdf error handling

cc5f26e

Matistjati marked this pull request as ready for review March 12, 2025 22:59

Matistjati added 3 commits March 13, 2025 22:58

Use {{nextsample}} instead of \nextsample

608fe13

Relax image checking (implied by global regex on filenames)

c3dc3c9

Add svg dependency

6f1698e

gkreitz requested changes Mar 18, 2025


Matistjati force-pushed the pandoc branch from 5810c31 to 6f1698e Compare April 5, 2025 14:04

Matistjati added 2 commits April 5, 2025 16:30

Start sanitization + apply feedback

c6b57c8

Better sanitization + lots of tests

cfc285c

Matistjati added 5 commits April 8, 2025 02:19

problem_statement -> statement

5f5d59d

Better md -> pdf sample rendering

213f9ac

Another escape

d745f6e

More careful with images

d4e27a2

Make samplexss more focused

fdde1a4

gkreitz reviewed Apr 8, 2025


Matistjati added 11 commits April 9, 2025 06:32

Experimentally reuse normal LaTeX rendering

3ded4a4

Merge remote-tracking branch 'problemtools/develop' into pandoc

9134f30

Use problemtools problem2pdf to handle md -> pdf

79b5a5d

Cleanup

fcda106

librsvg out of focus for this PR

47bda29

Ensure nh3

054448e

Remove ghostscript sanitization. If it wasn't used before, it probably isn't needed

ecdb6c4

Add nh3 to deb build

690215f

Linting

77cb2c9

Add back ghostscript sanitization

2e7653f

Remove unnecessary test

51f5539

