The holiday season is a magical time that fosters interesting projects and ideas. This Christmas, I experimented with a new way to query LLMs. Since LLMs essentially compress vast amounts of information, why not treat them as databases? That’s how InfiniDB was born: the unreliable database of everything.
In InfiniDB, you can dynamically create tables and fill them with data generated on the fly. Here’s a simple example query and its results:
CREATE VIRTUAL TABLE beatles USING infinidb('all beatles members');
SELECT * FROM beatles ORDER BY first_name ASC
| member_id | first_name | last_name | instrument | join_year | leave_year |
|---|---|---|---|---|---|
| 7 | Chas | Newby | Drums | 1960 | 1961 |
| 3 | George | Harrison | Lead Guitar, Vocals | 1960 | 1980 |
| 1 | John | Lennon | Guitar, Vocals | 1960 | 1980 |
| 2 | Paul | McCartney | Bass Guitar, Vocals | 1960 | 1980 |
| 5 | Pete | Best | Drums | 1960 | 1962 |
| 4 | Ringo | Starr | Drums, Percussion | 1962 | 1980 |
| 6 | Stuart | Sutcliffe | Bass Guitar | 1960 | 1961 |
| 8 | Tony | Sheridan | Drums | 1960 | 1962 |
InfiniDB is built on the SQLite engine, which handles table management and query execution. Because SQLite runs the queries, InfiniDB supports familiar SQL features like WHERE, GROUP BY, JOIN, and ORDER BY clauses.
How it works: the SQLite module
InfiniDB is implemented as a SQLite virtual table module. When you create a virtual table with USING infinidb, the module calls an LLM to generate the table schema (see the prompt here). The first time you query the table, the data is generated and populated (see the data prompt here). Everything is cached in a .cache folder for consistency and quicker follow-up queries.
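The flow described above (schema at creation, data on first query, answers cached on disk) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not InfiniDB's actual code: it uses a regular SQLite table instead of a real virtual table module, and `fake_llm`, `cached_call`, `LazyTable`, and the prompt strings are all hypothetical names invented for the example.

```python
import hashlib
import json
import sqlite3
import tempfile
from pathlib import Path


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; returns canned JSON."""
    if prompt.startswith("schema:"):
        return json.dumps([["first_name", "TEXT"], ["instrument", "TEXT"]])
    return json.dumps([["John", "Guitar, Vocals"], ["Ringo", "Drums, Percussion"]])


def cached_call(cache_dir: Path, prompt: str, llm=fake_llm):
    """Return the parsed answer, reading it from disk if this prompt was seen before."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / (hashlib.sha256(prompt.encode()).hexdigest() + ".json")
    if path.exists():
        return json.loads(path.read_text())
    answer = llm(prompt)
    path.write_text(answer)
    return json.loads(answer)


class LazyTable:
    """Regular table that imitates the create-then-populate-on-first-query flow."""

    def __init__(self, conn, name, description, cache_dir=Path(".cache"), llm=fake_llm):
        self.conn, self.name, self.description = conn, name, description
        self.cache_dir, self.llm = cache_dir, llm
        self.populated = False
        # Step 1: the schema is generated when the table is created.
        self.columns = cached_call(cache_dir, f"schema: {description}", llm)
        cols = ", ".join(f'"{c}" {t}' for c, t in self.columns)
        conn.execute(f'CREATE TABLE "{name}" ({cols})')

    def query(self, sql):
        # Step 2: the rows are generated only on the first query.
        if not self.populated:
            rows = cached_call(self.cache_dir, f"data: {self.description}", self.llm)
            marks = ", ".join("?" for _ in self.columns)
            self.conn.executemany(f'INSERT INTO "{self.name}" VALUES ({marks})', rows)
            self.populated = True
        return self.conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
table = LazyTable(conn, "beatles", "all beatles members",
                  cache_dir=Path(tempfile.mkdtemp()))
print(table.query("SELECT first_name FROM beatles ORDER BY first_name"))
# prints [('John',), ('Ringo',)]
```

Keying the cache on a hash of the prompt is one simple way to get the consistency mentioned above: the same description always maps to the same stored answer, so repeated runs see the same schema and rows.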
Ideally, InfiniDB would be implemented as an eponymous virtual table. In this mode, you could query the module directly, like SELECT * FROM infinidb('US presidents'), without creating a table first. The challenge with eponymous tables is that the schema needs to be fixed. Since our schema changes depending on the argument (the table description), I was not able to implement that mode. If you have ideas to simplify the user experience, reach out on X!
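To picture what a fixed schema would force, here is a rough sketch in Python that stores every topic in one table with the same four columns and rebuilds ordinary rows with a pivot query. This is not something InfiniDB does; the `infinidb_cells` table and the `pivot` helper are hypothetical, and the sample values are copied from the query results shown in this post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Fixed schema: the same four columns, whatever topic is requested.
conn.execute(
    "CREATE TABLE infinidb_cells "
    "(topic TEXT, row_id INTEGER, column_name TEXT, value TEXT)"
)
cells = [
    ("all beatles members", 1, "first_name", "John"),
    ("all beatles members", 1, "last_name", "Lennon"),
    ("all beatles members", 4, "first_name", "Ringo"),
    ("all beatles members", 4, "last_name", "Starr"),
    ("US presidents after 1900", 4, "full_name", "Warren G. Harding"),
    ("US presidents after 1900", 4, "term_start_year", "1921"),
]
conn.executemany("INSERT INTO infinidb_cells VALUES (?, ?, ?, ?)", cells)


def pivot(conn, topic, columns):
    """Rebuild ordinary rows for one topic from the fixed-schema cell table."""
    select = ", ".join(
        "max(CASE WHEN column_name = ? THEN value END)" for _ in columns
    )
    sql = (
        f"SELECT {select} FROM infinidb_cells "
        "WHERE topic = ? GROUP BY row_id ORDER BY row_id"
    )
    # Placeholders in the SELECT list come first, then the topic filter.
    return conn.execute(sql, [*columns, topic]).fetchall()


print(pivot(conn, "all beatles members", ["first_name", "last_name"]))
# prints [('John', 'Lennon'), ('Ringo', 'Starr')]
```

The sketch makes the trade-off visible: a fixed layout can hold any topic, but every value becomes text and even a simple SELECT turns into a pivot, which is a much worse experience than a table with real, topic-specific columns.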
Usage examples
Pokémon
Let’s list the original 151 Pokémon and count them by primary type:
> CREATE VIRTUAL TABLE pokemon USING infinidb('151 classic pokemons');
> PRAGMA table_info(pokemon)
cid name type notnull dflt_value pk
0 id INTEGER 0 <nil> 1
1 name TEXT 1 <nil> 0
2 type_primary TEXT 1 <nil> 0
3 type_secondary TEXT 0 <nil> 0
4 generation INTEGER 1 <nil> 0
> SELECT type_primary, count(1) FROM pokemon GROUP BY type_primary
type_primary count(1)
Bug 12
Dragon 3
Electric 9
Fairy 2
Fighting 7
Fire 12
Ghost 3
Grass 12
Ground 8
Ice 2
Normal 22
Poison 14
Psychic 8
Rock 9
Water 28
Interesting inventions
Correlating interesting inventions with the start of U.S. presidential terms:
> CREATE VIRTUAL TABLE inventions USING infinidb('interesting inventions by year');
> CREATE VIRTUAL TABLE presidents USING infinidb('US presidents after 1900');
> SELECT * FROM presidents JOIN inventions ON term_start_year = inventions.year
| id | full_name | term_start_year | term_end_year | party | id | name | inventor | year | description |
|---|---|---|---|---|---|---|---|---|---|
| 4 | Warren G. Harding | 1921 | 1923 | Republican | 26 | Insulin | Frederick Banting and Charles Best | 1921 | Important in treating diabetes by regulating blood sugar levels. |
| 7 | Franklin D. Roosevelt | 1933 | 1945 | Democratic | 38 | FM Radio | Edwin Howard Armstrong | 1933 | Use automation in music transmission to listeners, changing the broadcast landscape. |
| 17 | Bill Clinton | 1993 | 2001 | Democratic | 56 | Directed Evolution | Francis Arnold | 1993 | Devising enzymatic reactions that shift the thought of synthetic chemical processes. |
| 18 | George W. Bush | 2001 | 2009 | Republican | 3 | Segway PT | Dean Kamen | 2001 | A single-wheeled vehicle allowing users to travel on gyroscopic technology. |
Limitations and code
This is purely a fun project and is not suitable for production. As the name hints, the tables can be unreliable: both schema and data may vary between runs. LLM knowledge is limited by the training cutoff. Also, it only generates a sample of data: there is no pagination to expand content with multiple LLM requests.
You can check out the code on GitHub.