sd:what_s_a_language
| Next revision | Previous revision | ||
| sd:what_s_a_language [2026/04/14 06:09] – created appledog | sd:what_s_a_language [2026/04/18 07:10] (current) – appledog | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| + | = What's a Language? | ||
| + | Wake up, Neo. | ||
| + | Wake up.. and smell the ashes. | ||
| + | |||
| + | == Low Level Languages | ||
| + | In " | ||
| + | |||
| + | < | ||
| + | broke away from assembly-language by eliminating not only the one-for-one | ||
| + | correspondence between commands and machine operations, but also the linear | ||
| + | correspondence.</ | ||
| + | |||
| + | Assembly-language is a low-level language. A computer runs instructions on its CPU. In assembly, the instructions look like this: | ||
| + | |||
| + | LOAD 5 into A ; A is a CPU register | ||
| + | LOAD 6 into X ; X is a CPU register | ||
| + | ADD A and X ; This stores the result in A | ||
| + | STORE A into Location 49152 ; This stores the value in A to memory | ||
| + | |||
| + | Each of the above is a state machine instruction. The CPU is, essentially, | ||
| + | |||
| + | <codify armasm> | ||
| + | ; Multiply A × factor2 → 16-bit result in factor2 (high), factor1 (low); X is clobbered | ||
| + | ; By Leif Stensson, ~130 cycles average | ||
| + | |||
| + | STA factor1 | ||
| + | LDA #0 ; clear high byte of result | ||
| + | LDX #8 ; 8 bits to process | ||
| + | LSR factor1 | ||
| + | loop: | ||
| + | BCC no_add | ||
| + | CLC | ||
| + | ADC factor2 | ||
| + | no_add: | ||
| + | ROR ; rotate result right (high byte) | ||
| + | ROR factor1 | ||
| + | DEX | ||
| + | BNE loop | ||
| + | STA factor2 | ||
| + | RTS | ||
| + | </codify> | ||
| + | |||
| + | That's 13 instructions, | ||
| + | |||
| + | This is important because MUL, itself, is an abstraction. It's a new function; a new word. Assembly language, the language of the CPU, //**is** a programming language.// It's just a very low-level one. | ||
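To see what the 6502 routine above is doing without reading the mnemonics, here is a rough Python sketch of the same shift-and-add idea. The function name `mul8` and the variable names are mine, not part of the original routine, and Python integers are only standing in for 8-bit registers and the carry flag.

```python
def mul8(a, b):
    """Multiply two 8-bit values by shift-and-add; returns (high, low) bytes."""
    assert 0 <= a <= 0xFF and 0 <= b <= 0xFF
    factor1 = a              # multiplier; its bits are consumed low bit first
    acc = 0                  # accumulator: high byte of the running result
    carry = factor1 & 1      # LSR factor1: low bit drops into the carry flag
    factor1 >>= 1
    for _ in range(8):       # 8 bits to process
        if carry:
            acc += b         # CLC / ADC factor2 (may overflow into bit 8)
        acc_low_bit = acc & 1
        acc >>= 1            # ROR A: the overflow bit rotates back into bit 7
        carry = factor1 & 1  # ROR factor1: next multiplier bit goes to carry...
        factor1 = (factor1 >> 1) | (acc_low_bit << 7)  # ...result bit comes in
    return acc, factor1


high, low = mul8(200, 100)   # 200 * 100 = 20000 = 0x4E20
```

One call to `mul8` hides eight trips around that loop, which is exactly the sense in which a single MUL is already an abstraction.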
| + | |||
| + | == What's a (high level) language? | ||
| + | **A "high level" programming language attempts to abstract assembly-language by leveraging certain mathematical truths //about assembly language// into data structures (or vice versa).** | ||
| + | |||
| + | The purpose is simple. We don't want to think like a CPU. We want to think like a human being. And therein lies the fruit at the center of the Garden of Eden. If abstraction is turning the computer' | ||
| + | |||
| + | " | ||
| + | |||
| + | Not everyone has the time, the drive, the energy, to become //that// involved. Truth be told, even among those of us who call ourselves " | ||
| + | The forth wall? | ||
| + | Let's look at BASIC. BASIC in its crudest form has 26 one-letter integer variables. These act like registers, but they' | ||
| + | |||
| + | Let's look at FORTH. Forth uses a stack. The CPU has a stack too. Forth has two stacks. Well, the thing is, the CPU //has// a stack, but it doesn' | ||
| + | |||
| + | Let's look at Python. Python does not have a GOTO. Frankly, JUMP and BRANCH in their various forms are staples of any CPU instruction set. You cannot write practical code without branches and jumps. What Python and other languages that remove goto set out to do is restrict what kind of loops you can write. They want to control the flow of the program along some ordered thread. Forth is like this too with its stack; it ties the stack and flow control together. | ||
| + | |||
| + | Look at Haskell. In Haskell, expressions are not evaluated when they are encountered, but only when their value is actually needed to produce the final result. | ||
| + | |||
| + | //Why?// | ||
| + | |||
| + | At some point, you have to step back and ask yourself, why? Why is everyone trying to hide the truth? GOTO exists. Someone, somewhere, somehow, has to understand what a GOTO is. A CPU cannot plausibly exist without it. A CPU is a state machine. | ||
| + | |||
| + | Look at C++, Java and some others. | ||
| + | |||
| + | var a = b + c.d | ||
| + | |||
| + | c.d might call a function. That's hidden control flow. Why are people trying to make you think in strange ways that a) aren't natural anyway and b) aren't in line with the CPU? | ||
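The point about `c.d` is easy to demonstrate. Below is a small Python sketch (the `Config` class and its `calls` counter are made up for illustration) in which a plain-looking attribute read runs arbitrary code:

```python
class Config:
    """A hypothetical object whose attribute `d` is really a function call."""

    def __init__(self):
        self.calls = 0  # counts how often the hidden function ran

    @property
    def d(self):
        # This body runs on every read of `c.d`, even though the call
        # site has no parentheses and looks like a plain memory load.
        self.calls += 1
        return 40


b = 2
c = Config()
a = b + c.d  # reads like "add two values", but jumps into Config.d first
```

At the CPU level that last line is a load, a subroutine call, and an add; nothing at the call site says so.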
| + | |||
| + | Ultimately, learning about programming and learning about the CPU is the responsibility of every programmer. Look at the legendary programmers. Mel, the greatest coder who ever lived, earned that title because he understood the hardware. | ||
| + | |||
| + | There' | ||
| + | |||
| + | == Rising above the convention of the CPU state machine | ||
| + | To rise above, if one must, means that one must not //remain// bound by the CPU state machine. One may still optimize by serving the CPU. But it must no longer be the primary paradigm. In this sense, all of the bounds of the CPU must be immediately abstracted. | ||
| + | |||
| + | 1. Bytes are integers. Words are longer integers. Long integers are 32 bits. And so forth. u64, u128, don't mind if I do. But these require direct memory access. That's it. On one side, you can say that direct memory access is the programmer' | ||
| + | |||
| + | 2. Strings are pointers. The difficulty with strings versus integers is that they ultimately require different kinds of interface. You don't really want to mix all variables into a table, although it is very convenient! So maybe do it. It's a simple system. A pointer points to a string; based on this the system can tell you what kind of variable it is (type) and where it is (or, its value), its width (in bits) or its length, and so on. It's just a helper library. Like how BASIC inserts and deletes line numbers. | ||
| + | |||
| + | 3. Memory management. Memory does not understand what you want to put in it. CPUs do not understand STRINGS. Strings are conventions. A ' | ||
| + | |||
| + | == Tying it all together | ||
| + | Known as " | ||
| + | |||
| + | Forth, on the other hand, has a " | ||
| + | |||
| + | So we see that in the design and construction of a programming language, what is very important is the 'state machine' | ||
| + | |||
| + | == More RANTING | ||
| + | On pages 53-54 of " | ||
| + | |||
| + | Structured English is a sort of structured pseudocode in which our rate | ||
| + | statement would read something like this: | ||
| + | | ||
| + | IF full rate | ||
| + | ( code block ) | ||
| + | ELSE ( not-full-rate) | ||
| + | ( code block ) | ||
| + | THEN | ||
| + | ( code block ) | ||
| + | |||
| + | His conclusion on the matter is straightforward: | ||
| + | |||
| + | This is just plain awkward. It’s hard to read, harder to maintain, and hardest | ||
| + | to write. And for all that, it’s worthless at implementation time. I don’t even | ||
| + | want to talk about it anymore. | ||
| + | |||
| + | Okay, but this is //exactly// the analogy Brodie uses to explain what programming is on pages 8-11, and on page 19 he dares to write this: | ||
| + | |||
| + | Here’s an example of Forth code; | ||
| + | | ||
| + | : BREAKFAST | ||
| + | HURRIED? IF CEREAL ELSE EGGS THEN CLEAN ; | ||
| + | | ||
| + | This is structurally identical to the procedure MAKE-BREAKFAST on page 8. | ||
| + | |||
| + | Lest we accuse Brodie of not being self-aware as a sort of cute excuse ("We still love you Brodie!" | ||
| + | |||
| + | The tradeoff is ... // | ||
| + | |||
| + | In case you missed it, here's what Brodie says is the solution: | ||
| + | |||
| + | : PROCESS-KEY | ||
| + | KEY DUP LEFT-ARROW = IF CURSOR-LEFT ELSE | ||
| + | DUP RIGHT-ARROW = IF CURSOR-RIGHT ELSE | ||
| + | DUP UP-ARROW = IF CURSOR-UP ELSE | ||
| + | CURSOR-DOWN | ||
| + | THEN THEN THEN DROP ; | ||
| + | |||
| + | Here, let me refresh your memory: | ||
| + | |||
| + | IF full rate \ This is just plain awkward. | ||
| + | ( code block ) \ It’s hard to read, harder to maintain, | ||
| + | ELSE ( not-full-rate) | ||
| + | ( code block ) | ||
| + | THEN \ it’s worthless at implementation time. | ||
| + | ( code block ) | ||
| + | |||
| + | On achieving simplicity, Brodie writes "Given two solutions to a problem, the correct one is the simpler." | ||
| + | |||
| + | What is going on here is simple. You can factor out nested ifs, but all you are doing is naming the loops; essentially you are tying the execution of a loop to the comment that describes what it does. If every word in FORTH were three random letters, the program would still work. You just wouldn' | ||
| + | |||
| + | No, both ways should be available. You should be free to use one way or another, and it should compile to the same code. Then again, that's exactly what comments are. 100% pure syntactic sugar. That's for the humans. The abstractions are for us. For the humans. The code should align with the state machine. Brodie is right, but Brodie is also wrong. The fact that Brodie " | ||
| + | |||
| + | Your favorite food is all well and good, but it is food you must eat and in that there is no choice, only the humbling acceptance of reality that we are all human. Yes, we are human. We must interact with the world. Asking the world to interact with us, as if it owes us a living, is a strange concept. The new language must operate in accord with the CPU. This is one thing Forth got very very correct. STC and direct threading (i.e. inlining) is the way to go. The issue is how much freedom you have in the chunks (words). You can define your own words -- that's amazing. But you're locked into the little cage Forth puts you in -- you HAVE TO use the stack machine. | ||
| + | |||
| + | == Lorthio, Forth' | ||
| + | I will now introduce you to Lorthio. Lorthio wears a hat with an upside-down F on it. He is Forth' | ||
| + | |||
| + | | BASIC | POKE 49152, 10 | | ||
| + | | Forth | 10 49152 POKE | | ||
| + | | Lorth | POKE 49152, 10 | | ||
| + | |||
| + | This is a pretty accurate statement. In Forth, you can write | ||
| + | |||
| + | 10 49152 C! | ||
| + | |||
| + | Therefore, | ||
| + | |||
| + | : POKE C! ; | ||
| + | |||
| + | This POKE adds nothing. It's an alias. 10 49152 C! already means "store 10 at 49152." | ||
| + | |||
| + | 49152 10 POKE | ||
| + | |||
| + | This would be: | ||
| + | |||
| + | : POKE SWAP C! ; | ||
| + | |||
| + | But now let's bring Lorth in on this. Let's do this: | ||
| + | |||
| + | : POKE WORD NUMBER WORD NUMBER C! ; | ||
| + | |||
| + | Now suddenly we can do: | ||
| + | |||
| + | POKE 10 49152 | ||
| + | |||
| + | But we want it in BASIC order, so we add a swap: | ||
| + | |||
| + | : POKE WORD NUMBER WORD NUMBER SWAP C! ; | ||
| + | |||
| + | Now we can do: | ||
| + | |||
| + | POKE 49152 10 \ This is valid Forth | ||
| + | |||
| + | But Lorthio is not done yet. After all, Lorthio is an evil genius, bent on taking over the earth. Let's try this: | ||
| + | |||
| + | : STRIP, DUP | ||
| + | BEGIN | ||
| + | DUP C@ 0= IF DROP EXIT THEN | ||
| + | DUP C@ 44 = IF 0 OVER C! THEN | ||
| + | 1 + | ||
| + | AGAIN ; | ||
| + | | ||
| + | : ARG WORD STRIP, NUMBER ; | ||
| + | | ||
| + | : POKE ARG ARG SWAP C! ; | ||
| + | | ||
| + | POKE 49152, 10 \ Now this is valid Forth. | ||
| + | |||
| + | Lorthio lives! | ||
| + | |||
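| + | VARIABLE VAR_A1 VARIABLE VAR_A2 VARIABLE VAR_A3 | ||
| + | VARIABLE VAR_A4 VARIABLE VAR_A5 | ||
| + | : X1 ARG VAR_A1 ! ; | ||
| + | : Y1 ARG VAR_A2 ! ; | ||
| + | : X2 ARG VAR_A3 ! ; | ||
| + | : Y2 ARG VAR_A4 ! ; | ||
| + | : COLOR ARG VAR_A5 ! ; | ||
| + | |||
| + | : DRAWLINE X1 Y1 X2 Y2 COLOR | ||
| + | VAR_A1 @ VAR_A2 @ VAR_A3 @ VAR_A4 @ VAR_A5 @ | ||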
| + | ( now stack has x1 y1 x2 y2 color ) | ||
| + | ( ... draw the line ... ) | ||
| + | ; | ||
| + | |||
| + | DRAWLINE 50, 10, 100, 20, 1 | ||
| + | |||
| + | This is valid Forth. But Lorthio is not done yet! | ||
| + | |||
| + | : SPLITARG | ||
| + | WORD DUP | ||
| + | BEGIN | ||
| + | DUP C@ 0= IF DROP NUMBER EXIT THEN | ||
| + | DUP C@ 44 = IF | ||
| + | 0 OVER C! \ null-terminate first part | ||
| + | 1 + \ pointer to second part | ||
| + | SWAP NUMBER | ||
| + | SWAP NUMBER | ||
| + | EXIT | ||
| + | THEN | ||
| + | 1 + | ||
| + | AGAIN ; | ||
| + | | ||
| + | : POKE SPLITARG SWAP C! ; | ||
| + | | ||
| + | POKE 49152, | ||
| + | |||
| + | Lorthio will never stop laughing. And one day, if he works hard enough, he can make FORTH run BASIC. | ||
| + | |||
| + | VARIABLE LABEL | ||
| + | : GOTO! LABEL ! ; | ||
| + | : @LABEL LABEL @ ; | ||
| + | | ||
| + | : GAME | ||
| + | 1 GOTO! | ||
| + | BEGIN | ||
| + | @LABEL 1 = IF ." Room 1" CR 2 GOTO! THEN | ||
| + | @LABEL 2 = IF ." Room 2" CR 3 GOTO! THEN | ||
| + | @LABEL 3 = IF ." Done" CR EXIT THEN | ||
| + | AGAIN ; | ||
| + | GAME | ||
| + | |||
| + | But Lorthio does, ultimately, have his limits. Forth is, after all, Forth. You simply cannot, ultimately, GOTO. The reason is that a Forth program has two parts: the ": ;" blocks, which are compiled words, and the program itself, which is interpreted. Even if you could somehow define a back-jumping label system, it's just a kind of loop. Nothing special. No, a real Forth hacker has to break out of the system. A real Forth hacker has to stand up and say, "I don't like the idea that someone else is in control of my life." What is the Matrix? A real Forth hacker has to implement Forth. "To understand Forth, you must implement Forth." | ||
| + | |||
| + | " | ||
| + | |||
| + | <codify armasm> | ||
| + | dict_fgoto: | ||
| + | .bytes $00, $00, $00 | ||
| + | .bytes $85 ; IMMEDIATE + len=5 | ||
| + | .bytes " | ||
| + | | ||
| + | code_fgoto: | ||
| + | LDGLZ [@FORTH_HERE] | ||
| + | LDKL @OP_JMP | ||
| + | STKL [GLZ,+] | ||
| + | ; Push placeholder address onto data stack | ||
| + | STAB [-,ELY] | ||
| + | MOV A, Z | ||
| + | MOV BL, GL | ||
| + | LDBH #0 | ||
| + | | ||
| + | ; Write zero placeholder | ||
| + | LDKL #0 | ||
| + | STKL [GLZ,+] | ||
| + | STKL [GLZ,+] | ||
| + | STKL [GLZ,+] | ||
| + | STGLZ [@FORTH_HERE] | ||
| + | RET | ||
| + | | ||
| + | dict_flabel: | ||
| + | .bytes $00, $00, $00 | ||
| + | .bytes $86 ; IMMEDIATE + len=6 | ||
| + | .bytes " | ||
| + | | ||
| + | code_flabel: | ||
| + | ; Pop placeholder address | ||
| + | MOV GLZ, BLA | ||
| + | LDAB [ELY, | ||
| + | | ||
| + | ; Write current HERE into placeholder | ||
| + | LDTLK [@FORTH_HERE] | ||
| + | STTLK [GLZ] | ||
| + | RET | ||
| + | </codify> | ||
| + | |||
| + | And just like that, | ||
| + | |||
| + | : GAME | ||
| + | FGOTO skip | ||
| + | ." You never see this" CR | ||
| + | FLABEL skip | ||
| + | ." You see this" CR | ||
| + | ; | ||
| + | GAME | ||
| + | |||
| + | ... that's game. Lorthio wins. The universe, broken, disappears in a puff of smoke. | ||
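What FGOTO and FLABEL do is classic backpatching: emit a jump with a dummy target, remember where the dummy lives, and overwrite it once the real target is known. Here is a rough Python model of that idea. It is not a translation of the assembly above; the opcode names, the list-based code buffer, and the tiny `run` loop are all assumptions made for the sketch.

```python
JMP, PRINT, HALT = "JMP", "PRINT", "HALT"  # made-up opcodes

code = []     # the compile buffer; len(code) plays the role of HERE
patches = []  # stands in for the data stack holding placeholder addresses


def fgoto():
    """Compile a jump whose target is not known yet."""
    code.append(JMP)
    patches.append(len(code))  # remember where the placeholder lives
    code.append(0)             # zero placeholder, patched later


def flabel():
    """Resolve the most recent FGOTO so it jumps to the current HERE."""
    slot = patches.pop()
    code[slot] = len(code)


def emit_print(text):
    code.extend([PRINT, text])


def run(program):
    """Minimal interpreter loop; returns the list of printed lines."""
    out, pc = [], 0
    while True:
        op = program[pc]
        if op == JMP:
            pc = program[pc + 1]
        elif op == PRINT:
            out.append(program[pc + 1])
            pc += 2
        else:  # HALT
            return out


# Rough equivalent of the GAME word
fgoto()
emit_print("You never see this")
flabel()
emit_print("You see this")
code.append(HALT)

output = run(code)
```

As the comments in the assembly describe, the placeholder address travels on a stack, so each FLABEL pairs with the most recent unresolved FGOTO, much like IF and THEN.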
| + | |||
| + | And yet we are all still here. Look, none of this is a rant against Forth. We all need some kind of structure in our lives. I am just saying that structure has to be useful. Truth must have a value proposition. I have a feeling that Forth is one of the all-time most important computer languages. Java (i.e. bytecode) is another; Python is essentially BASIC, so BASIC is another. C is another, as is C++. LISP. Forth. All important. All have meaning, history, and culture. I just get the feeling there' | ||
| + | |||
| + | //"I have seen it."// -- Emperor Palpatine. | ||
| + | |||
| + | == Now Let's Have Fun | ||
| + | Strings. | ||
| + | |||
| + | Chuck Moore said, | ||
| + | |||
| + | < | ||
| + | |||
| + | " | ||
| + | |||
| + | A character string is enclosed in quotes. It can contain any character except a quote, specifically including spaces." | ||
| + | |||
| + | //Q: Why not just have the parser scan from opening " to closing " and push the whole string onto the stack (or leave it in the input buffer as a literal)?// | ||
| + | |||
| + | "We get in trouble immediately! How do you recognize a character string? By the leading quote, of course. But do you modify your word subroutine to recognize that quote? If you do so you may never use a leading quote for any other purpose. Much better that the quote is a word by itself, treated like any other dictionary entry, for it can then be re-defined. But words are terminated by spaces, and I still resist making quote an exception. So let's type character strings: | ||
| + | |||
| + | " ABCDEF...XYZ""</ | ||
| + | |||
| + | In other words, Forth' | ||
| + | |||
| + | Making " a special syntactic delimiter that triggers automatic scanning to the matching " would require hacking the core tokenizer with an exception. Moore hated exceptions because they break re-define-ability, | ||
| + | |||
| + | Instead, " (or, more precisely, in modern ANS Forth, the word S") is itself a regular dictionary word. When executed or compiled, it does the special parsing: it temporarily changes the delimiter to ", scans the following text, and leaves a string descriptor (c-addr u), that is, an address and a length, on the data stack. The actual characters stay in memory (usually compiled into the dictionary for literals, or in a temporary area like PAD). | ||
| + | |||
| + | S" Hello, world!" | ||
| + | |||
| + | (with a space after S") or, inside a definition: | ||
| + | |||
| + | : GREET | ||
| + | |||
| + | The same principle applies to ." (dot-quote) for printing. | ||
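Moore's argument is easier to see with a toy. The sketch below is Python, not Forth, and everything in it (the `Interp` class, `next_word`, `parse_until`, the word table) is invented for illustration. The tokenizer only knows about spaces; `S"` is an ordinary dictionary entry that, when run, consumes the input up to the closing quote itself. A Python string stands in for the (c-addr u) pair.

```python
class Interp:
    """A toy outer interpreter: space-delimited words, one data stack."""

    def __init__(self, source):
        self.src = source
        self.pos = 0
        self.stack = []
        self.out = []
        # The dictionary: every entry, S" included, is just a callable.
        self.words = {
            'S"': self.s_quote,
            "TYPE": lambda: self.out.append(self.stack.pop()),
            "+": lambda: self.stack.append(self.stack.pop() + self.stack.pop()),
        }

    def next_word(self):
        """Skip spaces, then return the next space-delimited word (or None)."""
        while self.pos < len(self.src) and self.src[self.pos] == " ":
            self.pos += 1
        start = self.pos
        while self.pos < len(self.src) and self.src[self.pos] != " ":
            self.pos += 1
        return self.src[start:self.pos] or None

    def parse_until(self, delim):
        """Consume raw input up to delim; the tokenizer never sees it."""
        end = self.src.index(delim, self.pos)
        text = self.src[self.pos:end]
        self.pos = end + 1
        return text

    def s_quote(self):
        self.pos += 1  # skip the single space that ended the word S"
        self.stack.append(self.parse_until('"'))

    def run(self):
        while (word := self.next_word()) is not None:
            if word in self.words:
                self.words[word]()
            else:
                self.stack.append(int(word))  # anything else must be a number
        return self


vm = Interp('1 2 + S" Hello, world!" TYPE').run()
```

Because `S"` lives in `words` like everything else, it can be replaced or removed without touching `next_word`, which is the re-definability Moore was protecting.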
| + | |||
| + | This design choice has been echoed and analyzed ever since in Forth literature, tutorials, and discussions (e.g., Starting Forth, GForth manual, comp.lang.forth threads, and modern explanations like Dave's ratfactor.com article). People sometimes complain that Forth' | ||
| + | |||
| + | Moore' | ||
| + | It's one of the purest examples of Forth' | ||
| + | |||
| + | == The Truth | ||
| + | "But words are terminated by spaces, and I still resist making quote an exception." | ||
| + | |||
| + | Here's the problem, though: strings //are// an exception, and strings do not always contain words. Sometimes, strings contain control characters like 10 and 13 (LF and CR). Sometimes others. Also, apparently Mr. Moore was more human than some of us would like to admit, since he obviously didn't realize you can just make a word that pulls in words after it. S" could have picked up everything until the next ". Saying that "you can't have a word that begins with..." | ||
| + | |||
| + | //Sometimes you just need to add data.// Not everything is a file. Sometimes, one-file programs -- aka binaries -- are just easier to distribute. Sometimes you want to compile things. Not everyone wants to release source code. Ultimately, saying "this is the real world" sounds trite -- and it is not being said to induce //peer pressure// to do something bad (like lazy resolution, or wrapped nulls) but it is being said because Forth fell straight back into the old trap of trying to make computers think like humans. Forthers found a neat way of using the stack -- which is valid -- but then assumed that was the only way to do things. They are wrong. It's not the only way. It's a way that actually can't be used for some applications. Forcing it to work is a kludge. Whether it works or not. | ||
| + | |||
| + | == But what is Forth? | ||
| + | //Lorthio whispered something in my ear. He told me that there is absolutely nothing wrong with a word that pulls in words until it reaches a " at the end of a word.// Then again, that's Lorthio speaking. Why should we trust that ne' | ||
| + | |||
| + | There' | ||
| + | |||
| + | The other thing is, the Dictionary is cool, but throwing everything onto a dictionary (or onto a stack) is obtusely simple. Returning to the idea that strings are not numbers, there //should// be an exception for them. | ||
| + | |||
| + | Or else people will just create a series of words; words that allow you to store and manage integers. // | ||
| + | |||
| + | //Do you hear me God? I am screaming from the top of the mountain! WHAT IS FORTH?// | ||
| + | |||
| + | < | ||
| + | |||
| + | //"I have seen it..."// | ||
| + | </ | ||
| + | |||
| + | Then again, look what happened to //him.// Maybe I'll try Forth again this weekend. | ||
