I Prefer RST to Markdown (2024)
I can see the advantages RST offers in terms of HTML generation, but whenever I've needed to work with custom blocks like that, I've always just written HTML.
I'm not sure if <img src="file.jpg" alt="alt text"/> is less readable than
.. image:: file.jpg
   :alt: Alt text
HTML5 allows for leaving certain tags unclosed (such as <li>, <head>, or even <p>) to such an extent that I find many template languages not to be worth the effort of their complex syntax. Sure, there are three or four lines here that you can omit using RST or markdown:
<!doctype html>
<html lang="en">
<head>
<title>My blog page</title>
<body>
<h1>Welcome to my blog</h1>
<p>This is a bunch of text.
Feel free to stuff newlines here.
<p>This is also a bunch of text
<p>Here's a list just for fun:
<ol>
<li>This is the first item!
<li>This is the second one!
<li>Boom, a third!
</ol>
<p>Have an image: <img src="filename.jpg" alt="alt text goes here">
But is having to wrap a list in <ol> and closing the <title> really that bad? Automatically generating an index and such is nice, but five lines of Javascript can do the same (a rough sketch of the idea follows below). Plus, you don't need to run a second tool to "process" your input.
I generally use Markdown as a standardised way to format text that will probably be read in plaintext by other people, but when it comes to formatting documents, I don't see the point of most complex template languages.
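A rough sketch of that "index from the headings" idea, in Python here rather than client-side JavaScript (blog.html is just a placeholder name for the hand-written page above):

from html.parser import HTMLParser

class TocBuilder(HTMLParser):
    """Collect (level, text) pairs for every h1-h3 in a page."""
    def __init__(self):
        super().__init__()
        self.level = None
        self.entries = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self.level = int(tag[1])

    def handle_endtag(self, tag):
        if tag in ("h1", "h2", "h3"):
            self.level = None

    def handle_data(self, data):
        if self.level and data.strip():
            self.entries.append((self.level, data.strip()))

builder = TocBuilder()
builder.feed(open("blog.html").read())
for level, text in builder.entries:
    print("  " * (level - 1) + "- " + text)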
Same. I have a couple of nice html templates (with locally-defined css and mathjax styling), and I now take all my notes directly in html in nano.
Once you've written a couple of documents, the usual tags become muscle memory and are no more of a bother to write than markdown. I've even created a couple of nano macros to automate some of the process.
"But it's not readable like markdown" you might say. Well. This might be true of 'some' html, especially autogenerated stuff, but the stuff I write is totally readable. Once you settle on some meaningful indentation and tag presentation conventions, readability is not a problem. We're talking about plain html documents, after all, not complex websites. The subset of html tags you'll need is generally very small and largely unintrusive.
I could even go a step further and say, my HTML is as readable as this guy's rST, but this guy's generated HTML code is far worse than how my direct HTML would have looked.
> Markdown is ubiquitous because it's lightweight and portable…
Markdown is ubiquitous because it’s easy for humans to read and write.
Markdown is ubiquitous because it is easy for humans to read and write AND enough humans used it to make it so.
The second part is more important than the first. There could be far better systems which not enough humans used to make ubiquitous. And as far as we know, markdown could be one of the worse ones, but became ubiquitous because it became ubiquitous.
cf: MS Windows.
Agreed. Personally I really like asciidoc but hardly anything supports it. Markdown is just everywhere, in all the tools I use and all the most popular tools available. So it's far easier to use when it's so portable, and I only need to remember one set of operators to get the results I want. Even in systems where I don't know which syntax they support, there is a good chance Markdown will be one of them.
you're both wrong.
markdown is ubiquitous thanks to github.
I don't think Windows is an apt comparison. One had huge market forces and distribution channels propelling it, the other is a description page, not even a standard, on Gruber's site.
What Gruber got right is that the syntax is beautiful to read, easy to write and powerful enough to be useful, with the optional inline HTML as an escape hatch. It may not seem much, but that's hard to get right.
Guaranteed, reST is more feature-complete and extension-friendly, but it is simply unusable for me because it wasn't designed for agglutinative languages like Korean. Markdown is much better in this case (though CommonMark has an annoying edge case [1]).
[1] https://talk.commonmark.org/t/foo-works-but-foo-fails/2528
reStructuredText and Markdown both have a bad habit of clevernesses that fall down—just in different areas.
Both do at least some degree of only matching delimiters at word boundaries. I consider that to be a huge mistake.
reStructuredText falls for it, but has a universally-applicable workaround (backslash-space as a separator—note that it is not an escaped space, as you might reasonably expect: it’s special-cased to expand to nothing but a syntax separator).
Markdown falls for it inconsistently, which, as a user of languages that divide words with spaces, I find honestly worse. Its rules are more nuanced, which is generally a bad thing, because it makes it harder to build the appropriate mental model. It was also wildly underspecified, though that’s mostly settled now. For many years, Stack Overflow used at least two, I think three but I can’t remember where the third would have been, mutually-incompatible engines, and underscores and mid-word formatting were a total mess. Python in particular suffered—for many years, in comments it was impossible to get plain-text (i.e. not `-wrapped code) __init__.
In CommonMark, _abc_ and *abc* both get you an emphasised abc, but a*b*c gets you abc with the b emphasised, while a_b_c stays plain a_b_c. That’s an admission of failure in syntax. Hmm… I hadn’t thought of this, but I suppose that makes _ basically untenable in languages with no word separator. Interesting argument against Prettier, which has a badly broken Markdown mode¹, and which insists on _ for emphasis, not *.
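A quick way to see this for yourself, assuming the CommonMark-compliant markdown-it-py package (the package choice is mine, not the commenter's):

from markdown_it import MarkdownIt

md = MarkdownIt()  # the default preset follows the CommonMark spec
for src in ("_abc_", "*abc*", "a*b*c", "a_b_c"):
    print(src, "->", md.render(src).strip())

# _abc_ -> <p><em>abc</em></p>
# *abc* -> <p><em>abc</em></p>
# a*b*c -> <p>a<em>b</em>c</p>
# a_b_c -> <p>a_b_c</p>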
In my own lightweight markup language I’ve been steadily making and using for my own stuff for the last five years or so, there’s nothing about word boundaries. a*b*c is abc, and if a dialect² defined _ as emphasis, a_b_c would be abc.
Another example of the cleverness problem in reStructuredText is how hard wrapping is handled. https://docutils.sourceforge.io/docs/ref/rst/restructuredtex... is a good example of how badly wrong this can go. (Markdown has related issues, but a little more constrained. A mid-paragraph line starting with “1. ” or “- ”—both plausible, and the latter certain to occur eventually if you use - as a dash—will start a list.) The solution here is to reject column-based hard-wrapping as a terrible idea. Yes, this is a case where the markup language should tell people “you’re doing it wrong”, because otherwise the markup language will either mangle your content, or become bad; or more likely both.
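The hard-wrapping hazard is just as easy to reproduce (same assumed parser as above): a wrapped line that happens to start with "- " splits the paragraph and opens a list.

from markdown_it import MarkdownIt

src = "The meeting is at the usual place\n- the cafe on Main Street."
print(MarkdownIt().render(src))
# <p>The meeting is at the usual place</p>
# <ul>
# <li>the cafe on Main Street.</li>
# </ul>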
Meanwhile in Markdown, it tries to be clever around specific HTML tags and just becomes hard to predict.
—⁂—
¹ Prettier’s Markdown formatting is known to mangle content, particularly around underscores and asterisks, and they haven’t done anything about it. The first time I accidentally used it, it deleted the rest of a file after some messy bad emphasis stuff from a WYSIWYG HTML → Markdown conversion. That was when I discovered .prettierignore is almost completely broken, too. I came away no longer just unimpressed with some of Prettier’s opinions, but severely unimpressed with the rest of it technically. Why they haven’t disabled it until such things are fixed, I don’t know.
² There’s very little fundamental syntax in it: line break, indent and parsing CSS Counter Styles is about it. The rest is all defined in dialects, for easy extension.
What do you mean not designed for Korean? It's just unicode. If there's some situation where RST isn't parsing inline markup, you can write the role explicitly like this:
this is **bold** text
this is :strong:`bold` text
reST inline syntaxes are pretty much word-based, which doesn't work very well with agglutinative languages. For example if you want to apply a markup to "이 페이지" in "이 페이지는 ..." (lit. This page in This page is ...), you need to do `*이 페이지*\ 는 ...` AFAIK. That would happen every single time affixes are used, and affixes are extremely frequent in such languages.
Oh I see, you're talking about this:
thisis\ **bold**\ text
thisis\ :strong:`bold`\ text
It's possible, but you're right, definitely more awkward than markdown. But you can't say:
thisis:strong:`bold`text
Whereas the equivalent is perfectly fine in markdown.
Falsehoods programmers believe about written language: whitespace is used to separate atomic sequences of runes.
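A small docutils sketch of the escaped-space form shown above, for anyone who wants to check it (output trimmed; exact whitespace may differ):

from docutils.core import publish_parts

for src in (r"thisis\ **bold**\ text", "thisis**bold**text"):
    body = publish_parts(source=src, writer_name="html")["body"]
    print(repr(src), "->", body.strip())

# 'thisis\\ **bold**\\ text' -> <p>thisis<strong>bold</strong>text</p>
# 'thisis**bold**text'       -> <p>thisis**bold**text</p>   (markup not recognised)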
> Falsehoods programmers believe about written language: whitespace is used to separate atomic sequences of runes.
Really? That isn't just untrue of written language in general. It's untrue of every individual written language in specific. You can't even clearly define what an "atomic sequence of glyphs" is.
> You can't even clearly define what an "atomic sequence of glyphs" is.
Kinda. Grapheme cluster breaks are defined in Unicode, but they have all the baggage and edge-cases you'd expect from human languages evolving over time, so they can be encoded in as few as a thousand rules: https://github.com/unicode-org/icu/tree/main/icu4c/source/da...
Which makes one wonder why reST puts so much weight on them being divided by whitespace!
> reST is more feature-complete and extension-friendly, but it is simply unusable for me because it wasn't designed for agglutinative languages like Korean.
How does whether you think of the language as agglutinative affect the usability of reST?
The biggest problem that occurs to me is that there isn't really a conceptual difference between an "agglutinative" language in which you have very long words expressing complex meanings, and an "isolating" language in which the same syllables occur in the same order with the same meaning but are thought of on a Platonic level as being all independent words.
This is because an "agglutinative" language is one in which syntax markers are more or less independent of any other syntax markers that may apply to the same word†, which means it's always possible by definition to consider those markers to be "words" themselves.
Would your problems be solved if you viewed what you had considered "long" Korean words as instead being several short words in a row? What difficulties does agglutination present?
† Compare: https://glossary.sil.org/term/agglutinative-language
> An agglutinative language is a language in which words are made up of a linear sequence of distinct morphemes and each component of meaning is represented by its own morpheme.
https://glossary.sil.org/term/isolating-language
> An isolating language is a language in which almost every word consists of a single morpheme.
> This is because an "agglutinative" language is one in which syntax markers are more or less independent of any other syntax markers that may apply to the same word†, which means it's always possible by definition to consider those markers to be "words" themselves.
I think SIL's definition is, while robust, not the usual definition because English can be regarded as agglutinative in this definition. This is particularly visible from the statement that most European languages are somewhat fusional [1], which is okay under their definitions but not the usual way we think of English.
In my understanding, the analyticity is a spectrum and highly analytic languages with most (but not necessarily all) words containing just one morpheme are said to be isolating. Words in agglutinative languages can be, but not necessarily have to be, analyzed as a main morpheme ("word") with dependent morphemes attached ("affixes"). Polysynthetic languages go further by allowing multiple main morphemes in one word. As languages tend to become synthetic (as opposed to analytic), the space-separated "word" is less useful [2] and segmentation gets harder and harder. reST's failure to support those languages is all about a bad assumption about segmentation.
[1] https://glossary.sil.org/term/fusional-language
[2] So much that several agglutinative languages---in which space-separated words can still be useful---don't even think about spacing, e.g. Japanese.
> I think SIL's definition is, while robust, not the usual definition because English can be regarded as agglutinative in this definition. This is particularly visible from the statement that most European languages are somewhat fusional, which is okay under their definitions but not the usual way we think of English.
Well, in the first place, I don't put much stock in the idea that "the usual way we think of" a language is a good way to determine the characteristics of that language. A good example here would be Finnish, which has a large number of particles that appear to be independent of the words they modify, but which are traditionally referred to as "case markers" by analogy to European languages that have case. Finnish is said to have an extraordinarily large number of cases, but that is because each Finnish preposition is called a "case".
In the second place, you can clearly see fusion in the English verb be. You can see it less clearly in other places - wikipedia's page on analytic languages calls out the third-person singular present verb ending for simultaneously encoding all three of those contrasts.
But I would say you're right in spirit that those are vestigial elements of the language. English verb structure looks very agglutinative to me; the biggest objection (which SIL's definition doesn't mention) would be that auxiliary verbs still inflect.
In particular, this:
> Words in agglutinative languages can be, but [do] not necessarily have to be, analyzed as a main morpheme ("word") with dependent morphemes attached ("affixes").
is actually the standard view of English verbs (except that the auxiliary verbs are not thought of as affixes), still taught in school, but contradicted by syntax classes that say that a dependent element shouldn't control the form of the element on which it depends. And then uncontradicted by practicing linguists who feel that we might as well follow the obvious semantic dependence.
Another objection, which I find more persuasive than "agglutinative particles shouldn't inflect", is that the meaning of a particular English word form isn't necessarily very tightly determined by the form. So in he is painting a picture, the -ing element we see on painting is fundamentally there to agree with the continuous aspect marker is, and it has other meanings in other contexts. In he likes painting pictures, the same element is there to derive a noun from the verb.
And another objection might be that the languages we call agglutinative commonly incorporate subject and object into the interior of the verb, surrounded by other affixes, which isn't done in English unless you want to count phrasal verbs. ;D
I am undisturbed by the ambiguity; you might note that I led with the observation that agglutinative languages aren't well-defined in the first place.
None of this helps to explain why there might be a conflict between Markdown and agglutination, though.
I'm not here for arguing against linguistic concepts, so let me cover just one thing:
> None of this helps to explain why there might be a conflict between Markdown and agglutination, though.
reST, not Markdown. (Yeah I totally get it though because I made the same mistake in the OP!) Those languages often need to highlight individual morphemes inside space-separated "words", but reST assumes space-separated "word" as a default, hence annoyance.
It’s amazing anyone can read, speak or write such a language!
The key here is whether there’s a word separator, not agglutinativity or isolation. The term I find for this on a brief search is scriptio continua <https://en.wikipedia.org/wiki/Scriptio_continua>.
Yeah that would be a better way to phrase my opinion. Chinese is highly isolating but doesn't use spacing due to its writing system and therefore is heavily affected by this issue.
These are descriptive terms though? It’s not like the language actually works that way
My only problem with rst is that several useful extensions are not updated. I have some great rst documentation, but part of that is importing doxygen, dolphin, and other extensions that are useful but sadly not updated on the same schedule as the main tool. I end up many versions back just because it is all that is compatible.
still markdown just isn't powerful enough for anything non trivial.
The original spirit of Markdown was to use HTML elements (or custom elements if you like) for whatever is missing from Markdown. That's surprisingly versatile in hindsight, but the specification didn't fully anticipate what happens to Markdown contents inside such elements. Some implementations supported them, some didn't, some used the `markdown` pseudo-attribute, and so on. And it was even less clear how block syntaxes work inside HTML elements. (CommonMark defines a very lengthy list of rules for them [1].) Markdown could have been extensible... if it had had a sensible specification from the beginning.
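For example, with Python-Markdown the inner Markdown is ignored unless you opt in via the md_in_html extension and the markdown="1" pseudo-attribute mentioned above; a sketch (other implementations behave differently, which is exactly the problem):

import markdown

plain = '<div class="note">\nSome **important** text.\n</div>'
opted_in = '<div class="note" markdown="1">\nSome **important** text.\n</div>'

print(markdown.markdown(plain))
# the block is passed through untouched, asterisks and all

print(markdown.markdown(opted_in, extensions=["md_in_html"]))
# roughly: <div class="note"><p>Some <strong>important</strong> text.</p></div>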
> still markdown just isn’t powerful enough for anything non trivial
I see this sentiment a lot, and my reaction is always, “Sure it is, with asterisks.” In the past decade I was the primary author of the RethinkDB documentation, a senior technical writer on Bixby’s developer documentation, and am now a contractor working on Minecraft’s developer docs. All of them were large, decidedly non-trivial, and Markdown. Microsoft’s entire learning portal, AFAICT, is in Markdown.
And the thing is, each of those systems used a different Markdown processor. My own blog uses one that’s different from all of those. According to HN, I should be spending virtually all my time fighting with all those weird differences and edge cases, but I’m not. I swear. The thing about edge cases is they’re edge cases. I saw a “Markdown torture” document the other day which contained a structure like this:
[foo[bar(http://bar.com)](http://foo.com)
and proudly proclaimed that different Markdown processors interpret that construct differently. Yes, okay, and? Tell me a use case for that beyond “I want to see how my Markdown processor breaks on that.”
The asterisk is that almost any big docs (or even blogging) system built on Markdown has extensions in it, which are usually a function of the template system. Is that part of Markdown? Obviously not. Is it somehow “cheating”? I mean, maybe? At the end of the day, 99% of what I’m writing is still Markdown. I just know that for certain specific constructs I’m going to use {{brace-enclosed shortcodes}}, or begin an otherwise-typical Markdown block quote with a special tag like “%tip%” to make it into a tip block (one way of wiring that up is sketched below). Every system that proclaims it’s better than Markdown because it allows for extensions, well, if you take advantage of that capability, look at you adding site-specific customization just like I’m doing with (checks notes) Markdown.
If reStructured Text works better for you, or AsciiDoc, or Org Mode, great! Hell, do it all in DITA, if you’re a masochist. But this whole “this is obviously technically superior to Markdown, which surely no one would ever do real work in, pish tosh” nonsense? We do. It works fine. Sorry.
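Purely as an illustration of the kind of site-specific hook described above (a made-up sketch, not any real system's implementation): a post-processing pass that promotes a "%tip%" block quote into a tip box.

import re
import markdown

def render_with_tips(src: str) -> str:
    html = markdown.markdown(src)
    # Turn <blockquote><p>%tip% ...</p></blockquote> into a styled tip box.
    return re.sub(
        r"<blockquote>\s*<p>%tip%\s*(.*?)</p>\s*</blockquote>",
        r'<div class="tip"><p>\1</p></div>',
        html,
        flags=re.DOTALL,
    )

print(render_with_tips("> %tip% Remember to hedge your examples."))
# <div class="tip"><p>Remember to hedge your examples.</p></div>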
> {{brace-enclosed shortcodes}}
I haven’t checked if any of the details have changed any time recently, but Zola does this, and I had a rough time with it because of the interactions with Markdown rules around raw HTML and escaping and such. I have worked to forget the details. I reckon Zola bakes Markdown in too deeply, and it’s a pain. Especially because of indentation causing code blocks, that’s one of the biggest problems with extending Markdown by “just writing HTML”.
markdown is great for a single short page. It doesn't have good links to the middle of a page (some extensions do, but not the popular ones), nor can it generate tables of contents, indexes, and the other things a large site should have. Rst will do all that, and because it is a site generator, if you reorganize, the links get fixed - or at least you get a warning on a dead link.
MyST (https://myst-parser.readthedocs.io/en/v0.15.1/index.html) lets you write in markdown and still use roles and directives provided by Sphinx and its extensions.
The author mentions MyST while acknowledging that rst is ugly. I also find MyST to be a sweet spot between comfortable syntax and expressiveness, so I wonder why the author doesn't prefer MyST over rst and CommonMark.
The RST parser is available in only one language, Python. I don't want my content tied to a single language stack, regardless of how good it might be. Markdown parsers exist in any language I care to use.
> Markdown parsers exist in any language I care to use.
Except each one actually parses a slightly different language.
https://git.sr.ht/~xigoi/markdown-monster/blob/master/monste...
MAME's documentation was moved to Sphinx (and thus RST) a number of years ago. I headed up that project, in fact.
There was a significant learning curve getting good output when converting some of the old ASCII charts out of .txt files, but once settled it makes for a much better user experience and it auto-compiles to HTML, PDF, and even EPUB with zero additional effort.
I would definitely not want to go to Markdown from RST for technical documentation that's more complex than a Github readme.
For books or significant document sets I definitely agree with the author on this. The builtin features for glossary and index are also nice. The extensibility is amazing. Some people are even doing formal requirements and lifecycle management in RST these days!!
This looks kind of useful for creating good contexts about project requirements.
I have run into frustrations with Markdown and even had a short "I do everything in RST" phase.
At some point during that phase I tried org mode and it's better than both, it is easier to read/write than RST, and better for large documents than Markdown. Unfortunately it doesn't get accepted in as many places as Markdown.
I’ve never used reST since Markdown has usually been sufficient for me. However, I recently had a situation where Markdown became frustrating to work with. Indeed, once a document exceeds ten pages, I feel that reST would make managing it much easier.
It would be nice if emphasis and other inline formatting worked smoothly even in agglutinative languages...
'asciidoc' is the middle ground.
I think of it less like the middle ground, and more like the best of the three
asciidoc > rst > markdown
It’s just that the available tooling goes the opposite way:
markdown tooling > rst tooling > asciidoc tooling
I end up using HTML for anything serious instead, because it has better tooling support than any of the three, and is also more flexible. It’s just more verbose, which is fine.
I came here to say this; more complete than markdown, and fewer things require the weird gymnastics of rst.
> On the other hand, the markdown image is hardcoded as a special case in the parser, while the rst image is not. It was added in the exact same way as every other directive in rst: register a handler for the directive, have the handler output a specific kind of node, and then register a renderer for that node for each builder you want.
Why not "hardcode" the most common things to be the easiest to use and then still have the option to extend to other protocols? Why "suffer" every time equally instead?
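For reference, a minimal sketch of the registration pattern the quoted passage describes, written as a Sphinx extension (the "hello" directive and its option are invented for illustration):

from docutils import nodes
from docutils.parsers.rst import Directive, directives

class hello(nodes.General, nodes.Element):
    """The node type the directive handler emits."""

class HelloDirective(Directive):
    """Handler for  .. hello:: <name>  with an optional :shout: flag."""
    required_arguments = 1
    option_spec = {"shout": directives.flag}

    def run(self):
        node = hello()
        node["name"] = self.arguments[0]
        node["shout"] = "shout" in self.options
        return [node]

def visit_hello_html(self, node):
    text = "Hello, " + node["name"] + ("!" if node["shout"] else ".")
    self.body.append('<p class="hello">' + text + '</p>')

def depart_hello_html(self, node):
    pass

def setup(app):
    # One renderer per builder; only the HTML builder is covered here.
    app.add_node(hello, html=(visit_hello_html, depart_hello_html))
    app.add_directive("hello", HelloDirective)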
I'm interested in people's opinion of typst's ease of writing and reading when limited to the simple types of documents for which people use Markdown.
Do you think it makes easy things easy and complex things possible?
I'd have loved to use Typst but all journals I submit to ask for LaTeX. And it isn't just LaTeX they ask for; they ask for LaTeX with specific documentclasses. Mainstream academia and journals have so much tooling built around LaTeX and documentclasses, and so much inertia, that I am very pessimistic that Typst will ever get any "market share" in mainstream publishing!
Typst is made for the same target as TeX. Here I am asking if it is a reasonable replacement for Markdown or ReST in a target I don't think it was originally intended for. It would be interesting if the most popular forges enabled typst rendering for things like READMEs.
Would gladly use typst over md, rst, html, latex.
Typst is almost as easy as markdown, but my only gripe is with the web editor app: the inability to insert links quickly, i.e. with a copy-paste shortcut.
The author wrote a book, for which RST is undoubtedly the better choice. ("I wrote a book in Markdown" would be a surprising headline!)
But it's overkill for light documentation. Just look at their first example of embedding an image:
> ![alttext](example.jpg)
vs
> .. image:: example.jpg
>    :alt: alttext
In the first one, it's just the syntax for a hyperlink with ! in front.
In the second one, there are several bits of syntax to remember. We have .. and then whitespace, and not one but two colons after `image`, and it's not `alt:` but `:alt:`.
I don't have to try to remember Markdown syntax, because it's simpler and it's ubiquitous. I type Markdown directly into Slack and Obsidian every day. Most tech-adjacent people know some Markdown.
Many years back a developer on my team decided that all the readmes that live next to source code should be in RST, because it's Better(TM) and we could have nicely formatted generated docs. The result was that a lot less documentation got written, and nobody looked at the generated docs anyway. Eventually we transitioned back.
Your argument here is basically just "I already know Markdown". Sure, the Markdown image syntax is similar to its hyperlink syntax, so if you know the hyperlink syntax then the image syntax is easy. But the same argument works even better for reST: the image syntax is the same as any other directive, so if you know how to write a directive then you know how to write an image.
No, basically it's "I can remember markdown easier because it's simpler"
Lots of people write books in markdown. Obsidian/Longform -> Pandoc works well.
Everyone who works seriously with editing and formatting documentation for presentation prefers RST.
Markdown is for the people, almost never full time doc jockeys, who need to WRITE that documentation.
Does RST have a WYSIWYG editor? A linter that auto-corrects mistakes? Last time I looked, neither of these existed. I have a bunch of rst docs that I want to do large edits on, and doing so has been so painful I've decided to migrate to markdown even though it's not as feature rich.
Not much in the way of WYSIWYG, unfortunately. There've been some VSCode extensions in the past to simplify editing but that's about as good as it gets.
The only changes I make to and expect of markdown are to enforce blank lines as separators after *any* header and block, and to use `*` for formatting.
Everything can be extended with fenced blocks.
RST is a lot more difficult to write and much more "groffy".
The syntax for links in RST is:
This is `a link`_
.. _a link: https://foo.com
The underscores are required exactly like that. I believe the blank line between is also required. There's also an inline syntax where you use two trailing underscores: This is `an embedded link <http://foo.com>`__
I'd rather write raw HTML.
As a compromise, I prefer Djot to markdown.
It's basically markdown, but made to be easier to parse with explicit support for nice addons such as tables, divs, and attributes.
Me too. It's a shame though that it both feels unfinished and evolves at a glacial speed.
I was on that hill too. For years I used Sphinx and Furo to render my blog, which you can still access at https://old.stonecharioteer.com
I like the framework, but it ended up being too in the way. I am not an RST maintainer. I want to blog and get my thoughts out in the world.
I split my website to use different subdomains, and most of the posts in that old blog are now in https://tech.stonecharioteer.com which is on Hugo now. I used Claude to fix some CSS annoyances with the PaperMod theme, and to migrate not only the posts from that old blog but also from the Jekyll blog that predates it.
I'm happy with the blog now, it's so out of my way that I can write without trying to figure out how to make Hugo do something like Sphinx-style admonitions. Claude is great for that. What else is there to complain about?
These all feel like issues that power-users have, not an issue that lil' ol' me is going to run into while jotting down journal entries, or yelling at people on the internet with emphasis.
Except if you want to yell at people using different colours and font sizes in a sentence, like in the old forum times, for which markdown is too restrictive.
On reddit you could really yell at people using a h1 headline in comments, not sure if that's still possible.. hn __is__ more *restricted* sadly
Never mind RST, it would be nice to finally get Markdown and Latex support on Hacker News. Unfortunately the admin seems to be against it.
> Now here's how to make an image in rst:
.. image:: example.jpg
   :alt: alttext
That is some horrendous syntax. I totally get the author’s power user needs, and the article states plainly that this isn’t for everyone, but there’s gotta be something with power AND earthly syntax, right?
If you’re not accustomed to reStructuredText, it looks dreadful. If you saw it in the context of a larger document that used directives for more things, it’d make more sense.
Also the author has very bad taste in having used two spaces of indentation. It should have been three, which makes it significantly less ugly:
.. image:: example.jpg
   :alt: alttext
“.. ”: this block is magic.
“image::”: directive, type image.
“example.jpg”: directive argument, file name.
“:alt: alttext” directive option, named alt, value alttext.
Rewritten with a completely different sort of syntax, for fun:
┌ IMAGE ─ example.jpg ──┐
│ alt = alttext         │
└───────────────────────┘
Yikes, this explanation does not make it better. You’re telling me the indentation convention is three spaces?
And “..” is just your “something cool is about to happen” symbol?
I’ve been reading through the documentation more and this thing seems insane.
That .. symbol is also used for comments!? “Oh if it’s invalid it’s a comment!” No way to make multi-line comments without another ridiculous indentation.
The tables are insane, somehow they implemented something worse than markdown.
To make a header you need to make a long line of characters as long as your text for some reason.
For being a language that’s supposed to be more powerful than markdown and not so HTML-adjacent, it sure depends on whitespace a lot. Like, why do literal blocks need to be indented? Why do doctest blocks need to end with a blank line?
> The handling of the :: marker is smart:
> If it occurs as a paragraph of its own, that paragraph is completely left out of the document.
> If it is preceded by whitespace, the marker is removed.
> If it is preceded by non-whitespace, the marker is replaced by a single colon.
lol, a directive that does 3 entirely unrelated things depending on white space. Genius.
Three space indentation is indeed the convention, mostly because of `.. `, but honestly `1. ` is nicer with three spaces too… unless you want to go to ` 1. `. Frankly, other than when inlining code that uses a different number of spaces, I think I visually prefer three spaces for something like reStructuredText, even apart from directives.
`.. ` is a bit overloaded and I think better was possible.
> The tables are insane, somehow they implemented something worse than markdown.
To write without tools, I’ll accept it’s not good. But to read, it’s good. Also be aware you can get a table from nested lists <https://docutils.sourceforge.io/docs/ref/rst/directives.html...> or CSV <https://docutils.sourceforge.io/docs/ref/rst/directives.html...>. That demonstrates a nice bit of the power of the reStructuredText approach, I’d say.
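For instance, the list-table form looks like this (rendered here with plain docutils; Sphinx accepts the same source):

from docutils.core import publish_parts

src = """\
.. list-table:: Fruit
   :header-rows: 1

   * - Name
     - Colour
   * - Apple
     - Red
"""

print(publish_parts(source=src, writer_name="html")["body"])
# emits a regular <table> with Name/Colour as the header row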
On the literal blocks `::` thing, it’s a little too magic for my taste, but in practice works pretty well. Based on your quote, you weren’t going through <https://docutils.sourceforge.io/docs/ref/rst/restructuredtex...>; I think that’s a better presentation of it, showing how it’s not three entirely unrelated things, but rather just two minimised forms of the complete thing. I’d say it does two largely-unrelated things only! (Aside: “a directive” in reStructuredText is the `.. foo::` construct. The `::` is not a directive.)
> Like, why do literal blocks need be indented?
I think you may be biased against significant indentation. It has advantages and disadvantages over fencing—and the most important disadvantages are really about <textarea>. I think things have swung way too far in the direction of fencing. I’ve been making and using my own prettiness-focused lightweight markup language for the last five or so years, and where I started out with only a little significant indentation (list markers), I’ve leaned ever more in the direction of significant indentation, because most of the time I find it to be better, and I reckon the results are very pretty.
Remember also the Python background to reStructuredText in things like indentation and more specifically doctests.
I can concede that it seems to make sense for Python, of course I can also imagine why it wouldn’t be a popular choice for general purpose use cases.
Yeah, even asciidoc is nicer than rst!
This and poor editor support are why rst never takes off.
Uh, why haven't we drilled it into people's brains that regex cannot be used to parse matching parentheses/brackets?
Previously discussed: https://news.ycombinator.com/item?id=41120254
Copying my thoughts from there which haven't changed:
> To which I say, are you really going to avoid using a good tool just because it makes you puke? Because looking at it makes your stomach churn? Because it offends every fiber of your being?
Yes. A thousand times yes. Because the biggest advantage of Markdown is that it's easy to read, and its second-biggest advantage is that it's easy to write. How easy it is to parse doesn't matter. How easy it is to extend is largely irrelevant.
Markdown may or may not be the best tool for writing a book, but Markdown is the best tool for what it does - quickly writing formatted text in a way that is easy to read even for those who are not well versed in its syntax.
I don't want to write a book. If I did I'd use LaTeX before RST. I want something to take notes, make quick documentation and thread comments.
I would argue that being harder to extend is actually an advantage of markdown, because it helps with it staying simple and having a relatively agreed upon standard form instead of getting lost in the complexities of different ways to extend it and the different standards this would bring. Being hard to extend means that it is easier to find local optimum rather than exploring the syntax space.
Moreover, simple, human readable parsing rules help a lot with reducing cognitive load of the form and focus on the content. Extending a syntax necessarily brings abstractions and more complex parsing rules which would conflict with that goal. In some contexts minimalism and simplicity are features in themselves.
For me, I often want to spend my time writing down the stuff I need to write and not play with extensions/logic/configs. I like that it forces me not to be able to do something more complex, because I am pretty sure that if I were incentivised to extend it instead, I would end up spending my time on that instead of writing.
Markdown is not good for stuff where complex logical structure in the content is important to be represented in the form. In the article it is beyond clear to me why the author did not use markdown for their book, I would be more interested in why they chose RST instead of latex or another language that is more towards the complex end than the minimalistic end. I guess what the author needed was some point in-between, and they found it in RST.
>> To which I say, are you really going to avoid using a good tool just because it makes you puke? Because looking at it makes your stomach churn? Because it offends every fiber of your being?
> Yes. A thousand times yes.
Your comment comes off as if it makes an opposing point to the article. My apologies if it wasn't meant that way.
But I want to note that the author agrees with you! The next sentence from the author which you didn't include in your quote says:
> Okay yeah that's actually a pretty good reason not to use it. I can't get into lisps for the same reason. I'm not going to begrudge anybody who avoids a tool because it's ugly.
> How easy it is to parse doesn't matter.
How easy it is to parse does matter, because there’s a definite correlation between how easy it is to parse for the computer and for you. When there are bad corner cases, you either have to learn the rules, or keep on producing erroneous and often-content-destructive formatting.
> How easy it is to extend is largely irrelevant.
If you’re content with stock CommonMark, it is irrelevant to you.
If you want to go beyond that, you’re in for a world of pain and mangled content, content that you often won’t notice is mangled until much later, because there’s generally no meaningful way of sanity-checking stuff.
As soon as you interact with more than one Markdown engine—which is extremely likely to happen, your text editor is probably not using the parser your build tool uses, configured as it is configured—it matters a lot. If you have ever tried migrating from one engine to another on anything beyond the basics, you will have encountered problems because of this.
> there’s a definite correlation between how easy it is to parse for the computer and for you
I’m not sure that’s true tbh. Exhibit A: natural language. Exhibit B: Polish notation.
I don’t see how either of those exhibits demonstrate your point.
I believe various research has shown that humans and machines parse natural language in rather similar ways. Garden-path sentences <https://en.wikipedia.org/wiki/Garden-path_sentence> are a fun demonstration of how human sentence parsing involves speculation and backtracking.
Polish notation is easy for both to parse; humans only struggle because they’re not so familiar with it.
(By adulthood, human processing biases extremely heavily toward the familiar. Computer parsing has to be implemented from scratch, so there’s not so much concept of familiarity, though libraries can encapsulate elements of parsing.)
> Polish notation is easy for both to parse; humans only struggle because they’re not so familiar with it
I think you're downplaying the significance of this. The lack of familiarity is exactly what I'd argue makes a huge difference in practice even if theoretically the way our brains parse things isn't that different. We spend so much time reading and writing words that it requires effort to learn how to parse each specific symbol-oriented thing we might want to learn how to read. To add to the parent comment's examples, I'll throw in Brainfuck, which is an extremely simple language for a machine to learn to parse that's literally named for how impenetrable it looks to people at first glance.
"Simple if I spend the time to learn it" is not the same as "simple without having to spend time to learn it", and for some things, the fact that the syntax essentially ignores some of the finer details is the main feature rather than a drawback. When everyone I work with can read and write markdown good enough for us not to have major issues, and junior engineers can get up to basically the same level of competence in it without needing a lot of hand holding, it's just not worth the effort for me to try to convince everyone to use RST even if it is better in theory. The total amount of time I've spent dealing with the minor annoyances in markdown in my life is less than the amount of time it would probably take me to convince even one of my coworkers that we should switch all of our READMEs to RST.
> I don’t see how either of those exhibits demonstrate your point.
Natural language is easy to do for a human and a hard computing problem.
Polish notation is extremely simple to implement, but relatively "hard" for a human, even knowing the rules and how to read it. See: `+ * - 15 6 / 20 4 ^ 2 3 - + 7 8 * 3 2`
> Natural language is easy to do for a human and a hard computing problem.
You ever see someone learning a new language? They struggle hard on more complex sentences.
It’s easy for us because we’ve practised it so much.
> + * - 15 6 / 20 4 ^ 2 3 - + 7 8 * 3 2
To begin with, you’re missing an operator. I’ll assume another leading +.
Now, if you use infix, you have to have at least some of the parentheses, in this case actually only one pair, given rules of operator precedence, associativity and commutativity:
But you may well just parenthesise everything, it makes solving easier:
And you know how you go about solving it? Calculating chunks from the inside out, and replacing them with their values:
Coming back to Polish notation—you know what? It’s exactly the same:
For arithmetic at least, it’s not hard. You’re just not accustomed to it.
This is a really weird hill to die on. HP tried hard to make RPN a thing and even among engineers eventually lost out to notation that is easier to work with.
People read in one direction - in English left to right. They read faster and comprehend better when they can move in that direction without constantly jumping back and forth.
> (15 - 6) * 20 / 4 + 2 ^ 3 + 7 + 8 - 3 * 2
(15-6)*20/4 can be read as one block left to right
2^3 can be read as one block left to right. Jump back to the operator (count: 1)
7 + 8 continue left to right
3*2 is a block, jump back to operator (count: 2)
So that reads left to right as speakers of most western languages do with only two context shifts. Now let's try RPN:
> + + * - 15 6 / 20 4 ^ 2 3 - + 7 8 * 3 2
ignore, ignore, ignore, ignore.
15, 6, context shift (1)
ignore?
20, context shift (2)
4, context shift (3)
ignore?
2 (wait, am I supposed to use that caret? I'm already confused and I've used RPN calculators before. Counting this as a context shift (4))
3, context shift (5)
two more operators and I don't really understand why any more
basically, RPN makes you context shift every single time you enter a number. It is utter chaos of jumping back and forth and trying to remember what came before and what happens next. Even if you're used to it, it's dramatically worse for humans, and no one cares how much software it takes to parse.
Incidentally from my experience with RPN calculators I'd have expected
15 6 - 20 * 4 / 2 3 ^ + 7 + 8 + 3 2 * -
Though it's not really better, since instead of context shifting after every number you have to context shift after every operator to try to remember what's on the stack.
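For what it's worth, all three spellings above do describe the same expression; a small sketch that evaluates them, treating ^ as exponentiation:

import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "/": operator.truediv, "^": operator.pow}

def eval_prefix(tokens):
    """Polish (prefix) notation: recurse, operator first."""
    token = next(tokens)
    if token in OPS:
        return OPS[token](eval_prefix(tokens), eval_prefix(tokens))
    return float(token)

def eval_postfix(tokens):
    """Reverse Polish (postfix) notation: a running stack."""
    stack = []
    for token in tokens:
        if token in OPS:
            b, a = stack.pop(), stack.pop()
            stack.append(OPS[token](a, b))
        else:
            stack.append(float(token))
    return stack.pop()

prefix = "+ + * - 15 6 / 20 4 ^ 2 3 - + 7 8 * 3 2"
postfix = "15 6 - 20 * 4 / 2 3 ^ + 7 + 8 + 3 2 * -"

print(eval_prefix(iter(prefix.split())))           # 62.0
print(eval_postfix(postfix.split()))               # 62.0
print((15 - 6) * 20 / 4 + 2 ** 3 + 7 + 8 - 3 * 2)  # 62.0  (the infix form)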
Polish notation looks like a nightmare for expressing something like a partial differential equation. Even combining fractions looks like it's going to be a nightmare.
It's miserable to parse C++ and that's fine, because only a few people have to write a parser while 5 orders of magnitude more have to read and write it. Same thing with markdown - the user experience is what matters.
Edge cases largely don't matter, because again I'm not trying to make a book. I don't care if my table is off by a few pixels. 50% of the time I'm reading markdown it's not even formatted, it's just in raw format in an editor.
If you write C++ in a way that it will misparse, you will normally get a hard error that you have to fix. (Also, the complexity is mostly fairly well encapsulated.)
If you write Markdown in a way that your engine will misparse, you may well not notice—the compiler probably doesn’t even have the notion of throwing an error. The fact that you’re normally working with the unformatted text makes it even more likely that you won’t notice. (And the complexity is badly encapsulated, too.)
I have seen errors due to Markdown complexity so often. I just dealt with a blog where some images were completely missing because they’d written <img> tags in order to add some class attributes, and the engine didn’t allow raw HTML, and they didn’t notice they were gone (probably because ![]() images were still there). Markdown is so technically unsound it’s sometimes quite distressing.
We’re not talking about a table being off by a few pixels. We’re talking about words being mangled, text disappearing, much more severe stuff. Markdown isn’t even capable of table stuff.
You are missing the point. We are talking about not even parsing the Markdown. We are talking about reading it raw. Literally raw-dogging it. At that point it doesn't even matter, we just want a format that's brain-dead simple.
HTML transformation is a bonus on top of that. If we want that we will mandate a specific Markdown engine with a strict parser.
Actually I think you’re missing the point. “Parsing” is not something that computers alone do; humans do it. You see text and understand it to be text, you see <img> and understand it to be an HTML tag (and hopefully know whether your engine will pass it through, or leave it as text, or strip it), you see **double asterisks** and understand it to be bold or strong emphasis.
If you only care about reading it raw, you don’t bother with Markdown. Some of what you write will be the same as Markdown, but not all—for example, no one would use its ridiculous link or image syntax.
The reason you write with Markdown syntax is because you want to be able to format it (even if you will normally consume it in plain text yourself). And once you’re using Markdown syntax, you need to know the rules, to a greater or lesser extent. You must be able to parse Markdown syntax mentally. If you don’t know the rules well enough, your mental parse will be incorrect, and you’ll produce errors. Errors that won’t be detected by your computer. That’s the hazard with Markdown compared to C++: your mental parser is buggy and incomplete, but with C++ the computer will generally catch your errors while with Markdown it will never catch your errors.
Read their parsing statement in context:
> Markdown is that it's easy to read, and its second-biggest advantage is that it's easy to write. How easy it is to parse doesn't matter
After saying MD is "easy to read" the meaning of "parsing" is clearly limited to automated parsing by non-humans, and the only reasonable reading is "provided the markup is easy to read for humans, the difficulty in constructing an automated parser is irrelevant".
Reading is not sufficient. If you want it to produce the appropriate HTML, you must parse too.
When you write a file name a_b_c in one place, and a mathematical expression a*b*c in another place, and you don’t want to use `code formatting`, you need to know Markdown’s rules. Because otherwise, you’ll write a*b*c and get abc, instead of writing a\*b\*c to get a*b*c.
(And those are only the exact rules if you’re using CommonMark. On another engine, it might behave differently.)
If you only want to read, don’t use Markdown. But if you want to process as well, you need to know the processing.
> C++ the computer will generally catch your errors while with Markdown it will never catch your errors.
Conveying meaning at the bitwise operator level is a different thing than applying emphasis to a few words in a sentence with bolding or embedding a hyperlink in a document.
I’ve frequently seen mistakes in Markdown syntax that lead to content that has at best partially-broken formatting, at worst losing some of the content, sometimes even in ways that aren’t obvious.
Markup versus computer code is of course not exactly the same, but the nature of the mistakes—tokens in places they’re not supposed to be, and such—would generally lead to a syntax error in C++.
No, I am positive you are missing the point.
> no one would use its ridiculous link or image syntax.
And many don't, which is fine! But some do, if they remember the syntax. Markdown is tolerant of that, and ultimately if the file is rendered to HTML Markdown engines know to just turn raw URLs into hyperlinks.
> The reason you write with Markdown syntax is because you want to be able to format it
Maybe sometimes. Not always. That's the point. A lot of the time it's nice that most technical people who write docs in text files all agree on what headings, lists, emphasis etc. should look like in plain text so we don't have to constantly do a dance of negotiating what the markup is. And the bonus on top of that is we can also get a reasonable HTML page out of it.
> If you don’t know the rules well enough, your mental parse will be incorrect, and you’ll produce errors. Errors that won’t be detected by your computer. That’s the hazard with Markdown
I mean, 'hazard'. Kind of an over-the-top way to put it. It's a text file for documentation purposes, not a production system handling money or something. Nobody cares if the Markdown has a few syntactic errors. The point is to convey information to other humans in a reasonably efficient way.
> It's miserable to parse C++ and that's fine, because only a few people have to write a parser while 5 orders of magnitude more have to read and write it.
Really? I was under the impression that the fact that it is miserable to parse C++ directly means that it's also miserable to compile C++ - it can't be done quickly - which is something that everyone has to do all the time.
FYI: Parsing and compiling in the programming language sense are orthogonal problems. Both are major challenges in cpp compilers.
> FYI: Parsing and compiling in the programming language sense are orthogonal problems.
How so? In Ada, Fortran, C, C++, Java, Python, etc. parsing is one of the many phases of compiling. Far from being orthogonal problems, parsing is a sub-problem of compiling.
The amount of time being consumed by parsing is vanishingly small. It's a lot like the decoding time spent on x86 code is marginal nowadays compared to the speculative and reordering logic.
YACC was called "Yet Another Compiler Compiler" because back in the day parsing was the bulk of compilation, now it's relatively minimal.
What I've read is that C++'s biggest compiling problem is specifically that the language is difficult to parse. You can't compile without parsing, so no, they're not orthogonal problems. Compiling is a parsing step followed by an emission step.
(And just to be completely clear, I'm not saying that the difficulty of parsing C++ makes it miserable to write a compiler. I'm saying that the difficulty of parsing C++ makes it miserable to run a compiler.)
TeX is a typesetting language, not a writing language. LaTeX inherits this. Unless you know ahead of time the exact dimensions you will be displaying your book at, you shouldn't use either. ReST, on the other hand, can be resized to your heart's content, which is what you need for digital publishing.
> Unless you know ahead of time the exact dimensions you will be displaying your book at you shouldn't use either.
This is incorrect. You can sure write LaTeX that is intricately dependent on the output dimensions. But you can just as easily write LaTeX that is independent of output dimensions.
Case in point is compiling LaTeX doc to HTML which you'd admit is easily resizable.
Case in point is also writing LaTeX docs for journals or publication where you can easily resize the document to match the publisher's style guide and dimensions by changing the documentclass.
Technically, plain text can be resized to your heart's content. You don't need ReST for that. But in practice, if you're writing a serious technical book and you need a serious markup language, you will likely end up with DocBook XML for its flexibility and large range of outputs.
LaTeX cannot be resized?
Not without rebuilding the whole document and editing the source code. A lot of packages only have a few sizes they support on top of that.
I'll say that's inaccurate.
The dimensions of all the documents I've written over the years depend on the document class I use, and as our company keeps changing document layout standards every few years, the document class changes as well. But the only source editing necessary is to replace the very first line in the document, i.e. changing the document class. And the whole document changes. Sometimes completely, much more than just sizes.
Oh, you mean that a PDF cannot be resized? Because the LaTeX document itself is very simple to "resize", by changing paper type, font size, num columns, etc, and you don't have to compile to PDF, you can compile to EPUB or HTML, if you prefer that over PDF.
But, yes, PDFs are intrinsically non-resizable.
Yes. I think he prefers a car to walking. But there are few trips where you would think "should I drive, or walk?".
He should compare it to HTML or XML or Haml