
Compiling SQL queries with Cranelift? #12238

@espoal

Description


I'm building a next-generation distributed datastore and I would like to add support for querying it with a relational model. A few weeks ago I had the crazy idea of compiling SQL prepared statements instead of running them through a traditional query planner, only to find out that I'm not the first one to take this approach.

Right now I transliterate the query to Rust before compiling it, but maybe there is a smarter approach. My questions are:

  • Can Cranelift support this use case, or is it too far from its original design?
  • Could Cranelift help me skip the transliteration step?
  • Are you aware of anyone else using this approach with Cranelift instead of LLVM?

More details:
Let's say I have a query in the form:

SELECT 
    ot.customer_ID, 
    ct.customer_name, 
    SUM(ot.order_total) as total_amount,
    UDF.heavy_function(ot.order_total) AS processed_total
FROM orders_table AS ot
JOIN 
    customers_table AS ct ON ct.customer_ID = ot.customer_ID
WHERE 
    ot.customer_ID IN $customers_array
AND
    ot.order_date BETWEEN $start_date AND $end_date
GROUP BY 
    ot.customer_ID, 
    ct.customer_name;
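For reference, here is roughly what a hand-transliterated Rust version of this query could look like. It is a minimal sketch, not the generated code described in this issue: the `Order`, `Customer` and `ResultRow` structs, the integer date encoding, and the function name `query` are all assumptions, and the `UDF.heavy_function` column is left out because its semantics on a grouped query aren't specified. The `$` parameters become function arguments, the join is a hash join built on `customers_table`, and filtering, probing and the `SUM` aggregation happen in a single pass over `orders_table`.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical row layouts for the two tables in the example query.
#[derive(Clone, Debug)]
struct Order {
    customer_id: u64,
    order_date: u32, // assumption: dates encoded as YYYYMMDD integers
    order_total: f64,
}

#[derive(Clone, Debug)]
struct Customer {
    customer_id: u64,
    customer_name: String,
}

/// One output row: (customer_ID, customer_name, total_amount).
#[derive(Debug, PartialEq)]
struct ResultRow {
    customer_id: u64,
    customer_name: String,
    total_amount: f64,
}

/// Hypothetical hand-written equivalent of the prepared statement.
/// `$customers_array`, `$start_date` and `$end_date` become arguments;
/// the UDF column is omitted.
fn query(
    orders: &[Order],
    customers: &[Customer],
    customers_array: &HashSet<u64>,
    start_date: u32,
    end_date: u32,
) -> Vec<ResultRow> {
    // Build side of the hash join: customer_ID -> customer_name.
    let names: HashMap<u64, &str> = customers
        .iter()
        .map(|c| (c.customer_id, c.customer_name.as_str()))
        .collect();

    // GROUP BY (customer_ID, customer_name) -> running SUM(order_total).
    let mut groups: HashMap<(u64, &str), f64> = HashMap::new();
    for o in orders {
        // WHERE customer_ID IN $customers_array
        //   AND order_date BETWEEN $start_date AND $end_date (inclusive)
        if !customers_array.contains(&o.customer_id)
            || o.order_date < start_date
            || o.order_date > end_date
        {
            continue;
        }
        // Inner JOIN on customer_ID: orders without a customer are dropped.
        if let Some(&name) = names.get(&o.customer_id) {
            *groups.entry((o.customer_id, name)).or_insert(0.0) += o.order_total;
        }
    }

    let mut rows: Vec<ResultRow> = groups
        .into_iter()
        .map(|((id, name), total)| ResultRow {
            customer_id: id,
            customer_name: name.to_string(),
            total_amount: total,
        })
        .collect();
    rows.sort_by_key(|r| r.customer_id); // deterministic order for callers
    rows
}
```

The final sort is there only to make the output deterministic; SQL itself guarantees no row order without `ORDER BY`.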

Right now the compilation pipeline looks like this:

  • The user registers the SQL query before executing it
  • I parse the query using an SQL parser
  • I extract the parameters; these become part of the signature of the Rust function representing the query
  • I transliterate the query to Rust using some very ugly code (a bunch of match and if statements and generic types)
  • I run some compile-time optimizations (using a cost-based function, as in the Volcano query planner)
  • I compile the code
  • I cache the compiled output
  • I notify the user that the query is ready to use
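The registration steps and the later hash lookup can be sketched in std-only Rust as below. Everything here is hypothetical: `extract_params` is a naive `$name` scanner standing in for a real SQL parser, `compile` is a placeholder hook for whatever code generator is used (rustc today, possibly Cranelift), and `CompiledQuery` is a boxed closure standing in for a pointer into generated machine code. The cost-based optimization step is not modeled.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for compiled code: takes positional parameter
/// values and returns a row count.
type CompiledQuery = Box<dyn Fn(&[i64]) -> usize + Send + Sync>;

struct Prepared {
    params: Vec<String>, // parameter names, in order of first appearance
    compiled: CompiledQuery,
}

/// Naive `$name` scanner standing in for a real SQL parser
/// (it does not skip string literals or comments).
fn extract_params(sql: &str) -> Vec<String> {
    let mut params: Vec<String> = Vec::new();
    let mut chars = sql.char_indices().peekable();
    while let Some((i, c)) = chars.next() {
        if c != '$' {
            continue;
        }
        let start = i + 1; // '$' is one byte, so this is a char boundary
        let mut end = start;
        while let Some(&(j, d)) = chars.peek() {
            if d.is_alphanumeric() || d == '_' {
                end = j + d.len_utf8();
                chars.next();
            } else {
                break;
            }
        }
        let name = sql[start..end].to_string();
        if !name.is_empty() && !params.contains(&name) {
            params.push(name);
        }
    }
    params
}

/// Cache key: a hash of the statement text.
fn query_key(sql: &str) -> u64 {
    let mut h = DefaultHasher::new();
    sql.hash(&mut h);
    h.finish()
}

#[derive(Default)]
struct Registry {
    cache: HashMap<u64, Prepared>,
}

impl Registry {
    /// Register a statement: extract parameters, "compile", cache by hash.
    /// `compile` is a hypothetical hook for the real code generator.
    fn register<F>(&mut self, sql: &str, compile: F) -> u64
    where
        F: FnOnce(&[String]) -> CompiledQuery,
    {
        let key = query_key(sql);
        if !self.cache.contains_key(&key) {
            let params = extract_params(sql);
            let compiled = compile(params.as_slice());
            self.cache.insert(key, Prepared { params, compiled });
        }
        key // handed back to the user as the handle for later calls
    }

    /// Look up the cached code by hash and run it with the given inputs.
    fn execute(&self, key: u64, args: &[i64]) -> Option<usize> {
        let p = self.cache.get(&key)?;
        if args.len() != p.params.len() {
            return None; // wrong number of inputs
        }
        Some((p.compiled)(args))
    }
}
```

Two caveats on keying the cache by a hash: `DefaultHasher`'s algorithm is unspecified and may change between Rust releases, so these keys should not be persisted, and two different statements can in principle collide, so a production cache would also compare the stored SQL text.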

To run the query:

  • The user calls the query and provides the inputs
  • I look up the cached compiled output using a hash of the query
  • I launch one thread for each shard
  • I reduce the results and present them to the user
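The execution side (one thread per shard, then a reduce) might look like the following sketch, which uses `std::thread::scope` (stable since Rust 1.63). The `Shard` type and `partial_sum` are hypothetical stand-ins for the per-shard compiled query; here each shard only computes `SUM(order_total)` grouped by customer ID.

```rust
use std::collections::HashMap;
use std::thread;

/// Hypothetical per-shard data: (customer_id, order_total) pairs.
type Shard = Vec<(u64, f64)>;

/// Hypothetical per-shard query: partial SUM grouped by customer_id.
fn partial_sum(shard: &[(u64, f64)]) -> HashMap<u64, f64> {
    let mut acc = HashMap::new();
    for &(id, total) in shard {
        *acc.entry(id).or_insert(0.0) += total;
    }
    acc
}

/// Scatter: one scoped thread per shard. Gather: merge the partial sums.
fn run_sharded(shards: &[Shard]) -> HashMap<u64, f64> {
    let partials: Vec<HashMap<u64, f64>> = thread::scope(|s| {
        // Spawn every shard first so they run concurrently...
        let handles: Vec<_> = shards
            .iter()
            .map(|shard| s.spawn(move || partial_sum(shard)))
            .collect();
        // ...then wait for all of them.
        handles
            .into_iter()
            .map(|h| h.join().expect("shard thread panicked"))
            .collect()
    });

    // Reduce: SUM is decomposable, so partial sums can simply be added.
    let mut result = HashMap::new();
    for partial in partials {
        for (id, total) in partial {
            *result.entry(id).or_insert(0.0) += total;
        }
    }
    result
}
```

Merging by addition works because `SUM` is decomposable. Other aggregates need more care: `AVG` has to ship partial sums and counts, and a median cannot be computed from per-shard medians at all.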

Ideally I would like to run the query inside a WASM runtime.

I apologize in advance if this question is off-topic for this repo, and for my noobishness.
