Adding non-breakable spaces lua filter #119

Open
wants to merge 37 commits into
base: master
Changes from 14 commits
Commits
1c951b5
Create sample.md
Delanii Oct 13, 2020
ed6a50a
Add files via upload
Delanii Oct 13, 2020
dd2f12e
Add files via upload
Delanii Oct 13, 2020
36c91bf
Update README.md
Delanii Nov 6, 2020
7df6f7c
Rename sample.md to sampleCZ.md
Delanii Nov 6, 2020
64bf968
Add files via upload
Delanii Nov 6, 2020
e167899
Rename expected.html to expectedCZ.html
Delanii Nov 6, 2020
5e829a0
Update nonbreakablespace.lua
Delanii Nov 6, 2020
6a62c16
Rename nonbreakablespace.lua to pandocVlna.lua
Delanii Nov 6, 2020
6a8bc4f
Delete expectedCZ.html
Delanii Nov 6, 2020
d0bc724
Add files via upload
Delanii Nov 6, 2020
feec602
Update sampleCZ.md
Delanii Nov 6, 2020
0cca88c
Add files via upload
Delanii Nov 6, 2020
ea8e13a
Update makefile
Delanii Nov 6, 2020
f78a1fa
Delete expectedCZ.html
Delanii Nov 8, 2020
4b5d58a
New test file with fixes error causes by SoftBreak element
Delanii Nov 8, 2020
d5fce8b
Now testing also for SoftBreak element
Delanii Nov 8, 2020
80a6900
Delete expectedEN.html
Delanii Nov 8, 2020
25a7564
New test file for english
Delanii Nov 8, 2020
a4c6e81
Delete expectedEN.html
Delanii Nov 8, 2020
5f13001
Another try for test file for english
Delanii Nov 8, 2020
f060207
Delete README.md
Delanii Nov 10, 2020
c02aa69
Delete expectedCZ.html
Delanii Nov 10, 2020
22c7c82
Delete expectedEN.html
Delanii Nov 10, 2020
fa6a2b8
Delete makefile
Delanii Nov 10, 2020
7008e81
Delete pandocVlna.lua
Delanii Nov 10, 2020
a675d44
Delete sampleCZ.md
Delanii Nov 10, 2020
5ebf029
Delete sampleEN.md
Delanii Nov 10, 2020
bc67eb1
Reupload
Delanii Nov 10, 2020
7de7a14
Add files via upload
Delanii Nov 10, 2020
76b00e7
Update pandocVlna.lua
Delanii Nov 11, 2020
11794ff
Delete pandocVlna.lua
Delanii Nov 11, 2020
7536ba9
Add files via upload
Delanii Nov 11, 2020
dad0797
Rewrite or README.md: Default setting to English
Delanii Nov 22, 2020
8f7501f
Update per suggestions in PR
Delanii Nov 22, 2020
ce063f3
Bugfix
Delanii Dec 2, 2020
d407e90
Bugfix if last element in par block is "Space"
Delanii Mar 2, 2021
27 changes: 27 additions & 0 deletions nonbreakablespace/README.md
@@ -0,0 +1,27 @@
# Non-breakable space filter

This filter replaces regular spaces with non-breakable spaces according to
predefined conditions. Currently, it replaces regular spaces with non-breakable
ones after one-letter words (prepositions and conjunctions, `prefixes` in code):
'a', 'i', 'k', 'o', 's', 'u', 'v', 'z', and their uppercase variants. It also
inserts non-breakable spaces in front of en-dashes and in front of numbers.
Some extra effort is taken to detect these patterns in *not-fully* parsed
strings (for example, if this filter is used after some macro-replacing
filter).

In this regard the filter works similarly to the TeX `vlna` preprocessor
or the LuaTeX `luavlna` package.

The default settings conform to Czech typography rules, but these can
be changed easily by editing the filter file `pandocVlna.lua`
and changing the contents of the `prefixes` or `dashes` tables.

Currently supported formats are:

* LaTeX and ConTeXt
* Open Office Document
* MS Word
* HTML

**NOTE**: Using this filter increases strain on line-breaking patterns. Whenever
possible, consider allowing hyphenation.
12 changes: 12 additions & 0 deletions nonbreakablespace/expectedCZ.html
@@ -0,0 +1,12 @@
<h1 id="tests">Tests</h1>
<h2 id="basic-test">Basic test</h2>
<p>a&nbsp;test i&nbsp;test k&nbsp;test o&nbsp;test s&nbsp;test u&nbsp;test v&nbsp;test z&nbsp;test A&nbsp;test I&nbsp;test K&nbsp;test O&nbsp;test S&nbsp;test U&nbsp;test V&nbsp;test Z&nbsp;test&nbsp;– test&nbsp;– test</p>
<h2 id="test-with-numbers">Test with numbers</h2>
<p>Test&nbsp;19 test “19” test</p>
<h2 id="test-of-double-prefixes.">Test of double prefixes.</h2>
<p>A&nbsp;i&nbsp;test, i&nbsp;v&nbsp;test, a&nbsp;k&nbsp;test, a&nbsp;v&nbsp;test.</p>
<h2 id="test-of-block-code">Test of block code</h2>
<pre><code>a = 5
k = &quot;test&quot;</code></pre>
<h2 id="test-of-inline-code">Test of inline code</h2>
<p>Test <code>a = 5</code> test</p>
12 changes: 12 additions & 0 deletions nonbreakablespace/expectedEN.html
@@ -0,0 +1,12 @@
<h1 id="tests">Tests</h1>
<h2 id="basic-test">Basic test</h2>
<p>a&nbsp;test i test A&nbsp;test I&nbsp;test the&nbsp;test The&nbsp;test&nbsp;– test&nbsp;– test</p>
<h2 id="test-with-numbers">Test with numbers</h2>
<p>Test&nbsp;19 test “19” test</p>
<h2 id="test-of-double-prefixes.">Test of double prefixes.</h2>
<p>A&nbsp;i test, i v test, a&nbsp;k test, a&nbsp;v test.</p>
<h2 id="test-of-block-code">Test of block code</h2>
<pre><code>a = 5
k = &quot;test&quot;</code></pre>
<h2 id="test-of-inline-code">Test of inline code</h2>
<p>Test <code>a = 5</code> test</p>
6 changes: 6 additions & 0 deletions nonbreakablespace/makefile
@@ -0,0 +1,6 @@
DIFF ?= diff --strip-trailing-cr -u

test:
	@pandoc --lua-filter=pandocVlna.lua sampleCZ.md | $(DIFF) expectedCZ.html -
	@pandoc --lua-filter=pandocVlna.lua sampleEN.md | $(DIFF) expectedEN.html -
.PHONY: test
202 changes: 202 additions & 0 deletions nonbreakablespace/pandocVlna.lua
@@ -0,0 +1,202 @@
local utils = require 'pandoc.utils'
local stringify = utils.stringify

--[[
Indexed tables of short words ("prefixes") after which '\160' should be inserted.
Verbose, but can be changed per user requirements.
--]]

local prefixes = {}

local prefixesEN = {
  'I',
  'a',
  'A',
  'the',
  'The'
}

local prefixesCZ = {
  'a',
  'i',
  'k',
  'o',
  's',
  'u',
  'v',
  'z',
  'A',
  'I',
  'K',
  'O',
  'S',
  'U',
  'V',
  'Z'
}

-- Set `prefixes` according to the `lang` metadata value
function Meta(meta)
  if meta.lang then
    local langSet = stringify(meta.lang)

    if langSet == 'cs' then
      prefixes = prefixesCZ
    else
      prefixes = prefixesEN -- default to English prefixes
    end

  else
    prefixes = prefixesEN -- default to English prefixes
  end

  return nil -- only `prefixes` is set; the metadata itself stays unchanged
end

--[[
Some languages (Czech among them) require a non-breakable space *before* a long dash
--]]

local dashes = {
  '--',
  '–'
}

--[[
Table of replacement elements
--]]

local nonbreakablespaces = {
  html = '&nbsp;',
  latex = '~',
  context = '~'
}

--[[
Function responsible for searching for one-letter prefixes, after which a
non-breakable space is inserted. The function short-circuits, that means:

* If it finds a match with a `prefix` in the `prefixes` table, it returns `true`.
* Otherwise, after the iteration is finished, it returns `false` (the prefix
  wasn't found).
--]]

function find_one_letter_prefix(my_string)
  for _, prefix in ipairs(prefixes) do
    if my_string == prefix then
      return true
    end
  end
  return false
end

--[[
Function responsible for searching for dashes, before which a
non-breakable space is inserted. The function short-circuits, that means:

* If it finds a match with a `dash` in the `dashes` table, it returns `true`.
* Otherwise, after the iteration is finished, it returns `false` (the dash
  wasn't found).
--]]

function find_dashes(my_dash)
  for _, dash in ipairs(dashes) do
    if my_dash == dash then
      return true
    end
  end
  return false
end

--[[
Function to determine the replacement for a Space element according to the output format
--]]

function insert_nonbreakable_space(format)
  if format == 'html' then
    return pandoc.RawInline('html', nonbreakablespaces.html)
  elseif format:match 'latex' then
    return pandoc.RawInline('tex', nonbreakablespaces.latex)
  elseif format:match 'context' then
    return pandoc.RawInline('tex', nonbreakablespaces.context)
  else
    -- fall back to inserting the non-breakable space Unicode character
    return pandoc.Str '\u{a0}'
  end
end

--[[
Core filter function:

* It iterates over all inline elements in a block
* If it finds a Space element, it uses the previously defined functions to find
  `prefixes` or `dashes`
* Replaces the Space element with the format-specific non-breakable space
  (falling back to `Str '\u{a0}'`)
* Returns the modified list of inlines
--]]

function Inlines (inlines)

  -- variable holding the replacement value for the non-breakable space
  local insert = insert_nonbreakable_space(FORMAT)

  for i = 1, #inlines do
    if inlines[i].t == 'Space' then
Member


How about naming the elements which we are looking at? I find prev or previous much easier to read than inlines[i - 1].

Author


OK, I can accommodate that, even though in this case it was easier for me personally to see exactly what I am indexing.

I have made the required changes, but I am having trouble assigning the replacement string with them, meaning if I write:

currentElement = insert

it doesn't work.

But this works:

inlines[i] = insert

It makes sense that these made-up variables don't reassign to the original inlines list, but how can I accommodate that?

I have also noticed another issue - writing a SoftBreak element instead of Space in the place where &nbsp; should be won't trigger the replacement (of course) - I have remedied that: now I am testing for Space or SoftBreak elements.

I have uploaded new files.

Please let me know what you think about it. I personally would prefer writing inlines[i] in this specific case, but modifying the filter as you suggest would allow me to learn more, so I am open to doing that. After resolving this issue, I will work on the next one.
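
On the assignment question above, a minimal sketch may help. It is not taken from the PR: plain Lua tables stand in for pandoc inline elements so it runs without pandoc, and `is_space_like`, `current`, and `prev` are hypothetical names. It shows that named locals are fine for reading, while the write still has to go through the list index. It also covers the Space-or-SoftBreak test mentioned above.

```lua
-- Plain Lua tables stand in for pandoc inline elements (an assumption, so the
-- snippet runs without pandoc); `is_space_like` is a hypothetical helper.
local function is_space_like(el)
  return el ~= nil and (el.t == 'Space' or el.t == 'SoftBreak')
end

local inlines = {
  {t = 'Str', text = 'a'},
  {t = 'SoftBreak'},
  {t = 'Str', text = 'test'},
}
local insert = {t = 'Str', text = '\u{a0}'}

for i = 1, #inlines do
  local current = inlines[i]    -- alias used only for reading
  local prev = inlines[i - 1]   -- nil when i == 1, so guard before indexing

  if is_space_like(current) and prev and prev.t == 'Str' and prev.text == 'a' then
    current = insert            -- rebinds the local name; the list is untouched
    assert(inlines[i].t == 'SoftBreak')
    inlines[i] = insert         -- assigns into the list slot; this is what sticks
  end
end

assert(inlines[2] == insert)
```

In other words, `prev` and `current` can make the conditions easier to read, but the replacement itself is still written as `inlines[i] = insert`.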


      -- Check for one-letter prefixes in Str before Space (nil-safe at i == 1)

      if inlines[i - 1] and inlines[i - 1].t == 'Str' then
        local one_letter_prefix = find_one_letter_prefix(inlines[i - 1].text)
        if one_letter_prefix == true then
          -- inlines[i] = pandoc.Str '\xc2\xa0' -- Both work
          inlines[i] = insert
        end
      end

      -- Check for dashes in Str after Space (nil-safe at the last element)

      if inlines[i + 1] and inlines[i + 1].t == 'Str' then
        local dash = find_dashes(inlines[i + 1].text)
        if dash == true then
          inlines[i] = insert
        end
      end

      -- Check for numbers in the Str after Space; any Str containing a digit
      -- counts, including not fully parsed products of earlier filters

      if inlines[i + 1] and inlines[i + 1].t == 'Str' then
        if string.match(inlines[i + 1].text, '%d') then
          inlines[i] = insert
        end
      end

    end

    --[[
    Check for a Str containing the sequence " prefix ", which might occur if a
    preceding filter creates it in one Str element. Also check whether a
    quotation mark introduced by the "quotation.lua" filter is present
    --]]

    if inlines[i].t == 'Str' then
      for _, prefix in ipairs(prefixes) do
        -- Lua classes match single bytes, so `[„]?` cannot make a multi-byte
        -- quote optional; the quoted and unquoted forms are tried in turn
        for _, word in ipairs{prefix, '„' .. prefix, prefix .. '“', '„' .. prefix .. '“'} do
          -- `text` is a plain string, so a string (not an element) is inserted
          inlines[i].text = inlines[i].text:gsub('(%s+' .. word .. ')%s+', '%1\u{a0}')
        end
      end
    end

  end
  return inlines
end

-- This should change the order of running functions: Meta - Inlines - rest ...
return {
  {Meta = Meta},
  {Inlines = Inlines},
}
28 changes: 28 additions & 0 deletions nonbreakablespace/sampleCZ.md
@@ -0,0 +1,28 @@
---
lang: cs
---

# Tests

## Basic test

a test i test k test o test s test u test v test z test A test I test K test O test S test U test V test Z test -- test – test

## Test with numbers

Test 19 test "19" test

## Test of double prefixes.

A i test, i v test, a k test, a v test.

## Test of block code

```
a = 5
k = "test"
```

## Test of inline code

Test `a = 5` test
28 changes: 28 additions & 0 deletions nonbreakablespace/sampleEN.md
@@ -0,0 +1,28 @@
---
lang: en
---

# Tests

## Basic test

a test i test A test I test the test The test -- test – test

## Test with numbers

Test 19 test "19" test

## Test of double prefixes.

A i test, i v test, a k test, a v test.

## Test of block code

```
a = 5
k = "test"
```

## Test of inline code

Test `a = 5` test