From Script to Binary, Creating single executables with Grotsky
Recently I added the possibility to embed compiled scripts to Grotsky, this makes it super easy to generate single executables that can be easily distributed.
Using the release v0.0.13 of Grotsky, a toy programming language that I've been developing for a while, you can compile scripts into bytecode and embed them into a single binary that can be easily distributed.
For now, it's only possible to embed a single script, so if your script needs to import something it won't work.
How embedding works
Grotsky by default generates a magic pattern at compile time. It's 512 bytes and is stored as a static variable.
To generate that pattern we use the const-random crate.
We use that to define a marker and identify if the Grotsky binary is running in embedded mode or not.
#[repr(C)] struct Marker { magic_pattern: [u8; 512], is_embedded: u8, } const fn new_marker() -> Marker { Marker{ magic_pattern: const_random!([u8; 512]), is_embedded: 0, } } static EMBEDDED_MARKER: Marker = new_marker();
Then we can use a very hacky trick to take a compiled script and generate an single executable with the embedded bytecode.
pub fn embed_file(compiled_script: String, output_binary: String) { // Get the path of the current executable (Grotsky interpreter) let exe_path = env::current_exe().unwrap(); let mut exe_contents = read(exe_path).unwrap(); let pattern = &EMBEDDED_MARKER.magic_pattern; // Find the magic pattern inside the executable. Given that is a static // variable with a value defined at compile time, it has to be stored in // the binary, we can find it and switch the `is_embedded` flag. if let Some(pos) = find_position(&exe_contents, pattern) { // We defined the Marker struct with a C representation // which means that right after the magic PATH we have a byte // that indicates if the interpreter is running in embedded mode or not. exe_contents[pos+512] = 1; // We add the magic pattern at the end of the executable again. // As a stop mark that right after that the bytecode will come. for i in 0..512 { exe_contents.push(pattern[i]); } // Now we read the compiled code and add it to the end of the new executable. let mut compiled_content = read(compiled_script).unwrap(); exe_contents.append(&mut compiled_content); // We write a single file with the bytecode concatenated at the end. write(output_binary, exe_contents).unwrap(); } } // Function to find the position of magic pattern in a stream of bytes fn find_position(haystack: &Vec, needle: &[u8; 512]) -> Option { if haystack.len() < needle.len() { return None; } for i in 0..=haystack.len() - needle.len() { if &haystack[i..i + needle.len()] == needle.as_ref() { return Some(i); } } None }
We're using the magic pattern as a stop mark. Our resulting binary will have the same magic pattern twice. First is the original that gets loaded as a global static variable. The second one is almost at the end of the file and indicates the beginning of the embedded bytecode.
We also need a function to detect if we're running under "embedded" mode. In that case the interpreter should only read the embedded bytecode and execute it.
pub fn is_embedded() -> bool { let embedded_indicator = &EMBEDDED_MARKER.is_embedded as *const u8; unsafe { // Need to perform this trick to read the actual memory location. // Otherwise during compilation Rust does static analysis and assumes // this function always returns the same value. return ptr::read_volatile(embedded_indicator) != 0; } }
We change the value without the Rust compiler ever knowing, so we do a volatile read of the pointer to make sure we actually load the value from memory.
Otherwise the Rust compiler assumes that this always returns 0, because it is hardcoded in the new_marker function and is never changed in the codebase.
Now we can proceed to run in "embedded" mode.
pub fn execute_embedded() { // Get path of current executable let exe_path = env::current_exe().unwrap(); interpreter::set_absolute_path(exe_path.clone().to_str().unwrap().to_string()); let exe_contents = read(exe_path).unwrap(); let pattern = &EMBEDDED_MARKER.magic_pattern; // The offset is 512 because that's the size of the magic pattern let offset: usize = 512; // Find first match (original) let first_match = find_position(&exe_contents, pattern).unwrap(); // We try to find the second mark by reading what comes after the first one let remaining = &exe_contents[first_match+offset..].to_vec(); let pos = find_position(remaining, pattern).unwrap(); // The bytecode is located right after the second mark let compiled_content = &remaining[pos+offset..]; // Run interpreter from bytecode if !interpreter::run_interpreter_from_bytecode(&compiled_content) { println!("Could not read embedded script"); exit(1); } }
With all those function only thing I need to do is add an if-statement to the main function in the Rust project to check if we're on embedded mode and proceed accordingly.
fn main() { if embed::is_embedded() { embed::execute_embedded(); return; } // Continue executing normally // ... }
That's it. That's all that takes to implement single binaries with Grotsky. Continue reading to see an example of how to actually use this feature.
Embedding example: Make your own Grep
Let's try to reproduce a simple version of the well-known Unix tool grep.
Store the following script in a file called grep.gr:
# Join a list of strings separated by space " " fn join(list) { let out = "" for let i = 0; i < list.length; i = i + 1 { out = out + list[i] if i < list.length - 1 { out = out + " " } } return out } # Check that a pattern was provided if process.argv.length == 1 { io.println("Usage:\n\tgrep [pattern ...]") return 1 } # Join argv[1:] into a pattern let pattern = join(process.argv[1:]) # Read first line let line = io.readln() # While we are not in EOF # Check that line matches pattern and print it # Consume next line while line != nil { if re.match(pattern, line) { io.println(line) } line = io.readln() }
Then it can be used like this:
$ cat file.txt | ./grotsky grep.gr pattern
And it will print all lines that match the "pattern".
We can also package it as a single binary by doing the following commands.
$ ./grotsky compile grep.gr $ ./grotksy embed grep.grc
Now we should have a grep.exe in our directory. And we can use it:
$ chmod +x grep.exe $ cat file.txt | ./grep.exe pattern
Should work the same as the previous example.