-
Notifications
You must be signed in to change notification settings - Fork 77
Markdown support #274
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Markdown support #274
Conversation
@pehrsoderman want to take a look? I think this PR is ready for a review. Here are the statements i tested it on: https://github.com/Matistjati/kattis-markdown-examples In general, most statements don't look perfect right off the bat, mostly because they come from their own markdown dialect. However, I believe that all ugliness can be fixed by small tweaks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a pass over this. As mentioned on slack, it seems like pandoc makes no security guarantees on html output and recommends using a sanitizer (https://pandoc.org/chunkedhtml-demo/19-a-note-on-security.html). That's gonna get messy, I'm not a huge fan of any of the python options. If anyone happens to look at this PR and has any recommendations, feel free to chime in.
Looking through, I think I spotted a number of places where we need to be more careful with escaping stuff we inject (problem name, sample, and just as an extra precaution, problem id). I'm not quite sure how to do that cleanly, t.b.h. If I understand the code correctly, it seems like when pandoc goes from markdown to pdf, it goes via LaTeX, which means that inserting plain LaTeX in the markdown works. That dual-step translation makes me wonder how to sanely escape stuff like \
or $
when we inject it (or, as an extreme example, the sample of a problem about parsing latex :)).
problemtools/problem2pdf.py
Outdated
# Add problem name and id to the top | ||
problem_id = os.path.basename(problem_root) | ||
statement_md = r'\centerline{\large %s}' % f"Problem id: {problem_id}" + statement_md | ||
statement_md = r'\centerline{\huge %s}' % problem_name + statement_md |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to escape LaTeX command characters in strings we inject this way. This seems to break if the name contains characters like }
, \
, or $
.
problemtools/statement_common.py
Outdated
config = verifyproblem.ProblemConfig(prob) | ||
if not config.check(None): | ||
raise Exception("Invalid problem.yaml") | ||
names = config.get("name") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will break after #286, but you're probably aware of that. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, talked with Rasmus and Hugo to hopefully work something out
problemtools/statement_common.py
Outdated
temp_file.write(sample) | ||
temp_file.flush() | ||
command = ["pandoc", temp_file.name, "-t" , "markdown"] | ||
return subprocess.run(command, capture_output=True, text=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit of a corner case, but what happens if the sample contains latex (or, more likely, latex control characters like $
)? I'm assuming pandoc won't escape that when converting HTML to markdown, so won't that then later mess up the PDF rendering once you've injected this markdown into the document and are trying to pandoc
into a pdf?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Painstakingly, this has been fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took another pass over it, as requested. This is getting close to ready for merging. The main problem I see are a couple of places where we need to escape latex characters (the rendering path of markdown -> latex -> pdf that pandoc does is funky...).
supported for both pdf and html:
$a+b$
and$$a+b$$
A PR of the spec can be found here Kattis/problem-package-format#386.
TODO testing:
not planned to be added in this PR (maybe ever):