0 votes
asked in MicroStream for Java by (150 points)
edited by

Hello, I have some questions:

1.) You write that MicroStream supports transactions.

Does that mean that there are also different transaction isolation levels, like for example PostgreSQL's TRANSACTION_SERIALIZABLE?

2.) Is the MicroStream system scalable (does it scale horizontally with ease)?

Does it work with multiple servers?

Or was it implemented to work as a single monolithic machine?

3.) How do you deal with persisted data and Java code changes?

When working with a normal DB system, one is used to doing something like a DB migration in order to update the DB so that it keeps working with new Java code changes.

How do you handle code changes with MicroStream?

4.) Is the MicroStream software/code open, and will it stay open forever?

What is the license for the software? Is it the Apache 2.0 open source license?

Thanks for your answers and best regards

3 Answers

+1 vote
answered by (1k points)

Hi.

As long as you don't have an answer from the official side, here are my notes:

1. What I know about transactions in MicroStream is that the whole transaction rolls back when saving fails. So either the whole object is saved or nothing at all. But it works differently, since you are working with Java objects, not tables. If you want a transaction over a group of objects, you may have to do it programmatically, I guess (see the sketch below).
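
For illustration, here is a minimal sketch of such a programmatic grouping, assuming the embedded storage API and made-up application types (MyRoot, Order, Customer are hypothetical, and package names vary between MicroStream versions):

    import java.nio.file.Paths;
    import one.microstream.storage.embedded.types.EmbeddedStorage;
    import one.microstream.storage.embedded.types.EmbeddedStorageManager;

    public class GroupedStoreSketch
    {
        public static void main(String[] args)
        {
            // MyRoot, Order and Customer are hypothetical application classes.
            MyRoot root = new MyRoot();
            EmbeddedStorageManager storage = EmbeddedStorage.start(root, Paths.get("storage"));

            root.getOrders().add(new Order("A-1"));
            root.getCustomers().add(new Customer("Smith"));

            // One call persists both changed graphs in one atomic write:
            // either everything in this block is stored, or nothing is.
            storage.storeAll(root.getOrders(), root.getCustomers());

            storage.shutdown();
        }
    }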

2. Not sure how far along clustering is with MicroStream. It's either already implemented or work in progress. But it is an enterprise feature that will need some kind of support/subscription, I guess; see "professional support" on the homepage. For now it's free, as far as I know.

3. See https://manual.docs.microstream.one/data-store/legacy-type-mapping

4. The MicroStream database software is not free software, as far as I know. It's proprietary software from the MicroStream company. So no, it's not Apache licensed. Maybe they will make it open source in the future, but I don't know of any plans for this.

+1 vote
answered by (3.4k points)
edited by

Hello rra,
here are some official answers to your questions, in addition to Fred's answer.

1. Transactions in MicroStream
Every store operation is executed as a transaction, an atomic all-or-nothing action. When one or more entities are stored, their data is collected into a continuous block of bytes, and that block is written to the physical form (the "files") in one fell swoop. Any problem during the IO operation causes the whole block to be deleted (rolled back).
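
As a minimal sketch of what that looks like from application code (the root class and storage directory are assumptions, not part of this answer):

    import java.nio.file.Paths;
    import one.microstream.storage.embedded.types.EmbeddedStorage;
    import one.microstream.storage.embedded.types.EmbeddedStorageManager;

    public class AtomicStoreSketch
    {
        static class AppRoot // hypothetical root of the object graph
        {
            String title = "initial";
        }

        public static void main(String[] args)
        {
            EmbeddedStorageManager storage = EmbeddedStorage.start(new AppRoot(), Paths.get("storage"));

            AppRoot root = (AppRoot) storage.root();
            root.title = "updated";

            // The changed entity's data is collected into one continuous byte
            // block and written in a single IO operation; if that write fails,
            // the whole block is discarded, so the store is all-or-nothing.
            storage.store(root);

            storage.shutdown();
        }
    }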

2. Scalability
MicroStream does not support clustering yet; this is an upcoming feature.
 
3. Persisted data and code changes
MicroStream provides a mechanism to handle changes in data structures (classes), which we call "Legacy Type Mapping". This mapping can be done automatically or defined manually.

MicroStream maps the old type's data to the new one when it is loaded. The stored data is not changed during this process; the mapping is applied in memory only. To persist those changes, it is necessary to store the affected objects again explicitly.

See https://manual.docs.microstream.one/data-store/legacy-type-mapping
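
To make the "in memory only" point concrete, here is a hedged sketch (class and directory names are assumed, and the two versions of the class exist at different points in time, not in one source file):

    import java.nio.file.Paths;
    import one.microstream.storage.embedded.types.EmbeddedStorage;
    import one.microstream.storage.embedded.types.EmbeddedStorageManager;

    public class LegacyMappingSketch
    {
        // An older build persisted: class Person { String surname; }
        // The current code renamed the field; the unambiguous rename is
        // detected and mapped automatically when old data is loaded.
        static class Person
        {
            String lastName;
        }

        public static void main(String[] args)
        {
            // Assumes a Person was stored as the root by the older build.
            EmbeddedStorageManager storage = EmbeddedStorage.start(Paths.get("storage"));

            // Loading maps the old data to the new type in memory only;
            // the storage files still contain the old layout.
            Person person = (Person) storage.root();

            // To persist the new layout, store the affected object again.
            storage.store(person);

            storage.shutdown();
        }
    }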

4. License
MicroStream Storage is available under a proprietary license.
It is closed source, but completely free for any commercial use. There are no license fees.
See https://manual.docs.microstream.one/license for the complete license.

+2 votes
answered by (580 points)

I can answer the technical questions:

1.) Transactions

Transactions are a complex topic, because it is never clear what exactly everyone involved thinks of when using the term.

Some simply use it to ask: "If an error occurs, will partial writes be rolled back and everything will be fine?"
Some think of commands for explicitly starting and committing a transaction.
Others think of transaction isolation levels as described, for example, here: https://www.postgresql.org/docs/9.5/transaction-iso.html
Others might think of yet something else.

From a MicroStream perspective, the answer to all those aspects is a clear "yes and no".
MicroStream does not support explicit transaction management, simply because it is not needed.
Every communication with the database layer (read and write) is concurrency-safe, and all writes are rolled back automatically if something goes wrong: either directly during execution or, if the process was terminated abruptly, at the next initialization.
So, depending on the notion one has in mind, one could say "every write is automatically a transaction", without performance overhead.

As far as isolation levels for concurrent operations are concerned:
This is hardly applicable due to the paradigm shift used in MicroStream.

"Classic" ("old") database systems (DBMS - Database Management System) were meant to act as an application server on their own. Hence all the user-friendly query language, user management and concurrency handling approach via transaction isolation.
Nowadays, this is no longer the typical usage of a DBMS. Instead, the actual application server is written in a modern language more suited for the task (Java, C#, etc.) which then uses the DBMS only as a data storage layer. But they do it in an architectural unclean way since they still use the database server as the "source of truth" for the data. Multiple threads inside the actual application work on partial copies of that primary data that have to be merged later on, can get conflicted, etc. So the application server process, which is actually the "master" of the whole application with its application logic and application-level concurrency handling (user requests etc) is degrading itself to be the "slave" of the DBMS when it comes to data. But the DBMS has only the data, not the application logic, so it can never be a proper "master". This causes all kinds of conflicts and/or inefficiencies when it comes to concurrent operation.
It's a typical "Too many cooks spoil the broth" situation. Not a good idea. Not a clean architecture.

MicroStream goes one step further and "cleans up" the application architecture: the actual application, finally, is for all intents and purposes the "master" of both logic and data, including concurrency handling, which an application has to deal with anyway to keep all the user requests, sessions, in-memory data, caches, etc. consistent. The database layer (MicroStream) is nothing more than a subordinate module for persisting data and for swapping currently unneeded data out and later back in again.
Therefore, there is no need to support concurrent access to the data inside MicroStream. Almost all of the work and logic for loading and storing data is done on the application level, inside the application threads (but by MicroStream code, of course). The storage-level threads of MicroStream do hardly more than write already completed blocks of data to the persistent medium (usually files) or load data from the persistent medium into a cache for the application to process. This is as fast as the hardware allows, so supporting concurrent access would not speed it up anyway. The concurrent application work happens only on the application level, not on the storage level. Thus, all storage-level access happens serially, and no isolation to handle concurrent accesses is needed on that level. Or, if you will, the transaction isolation level is always "serial". But this is misleading, because it does not mean the same thing as it does in the old Wannabe-Server-Application-DBMS concept.

The short answer to all things concerning transactions is something like:
You don't have to worry about transactions in MicroStream.
(But you do have to handle application-specific concurrency in your application's Java code, just like every other application has to.)
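
A minimal sketch of such application-level concurrency handling with plain JDK means (all names are assumptions; the lock is ordinary Java code, not a MicroStream API):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.locks.ReentrantReadWriteLock;
    import one.microstream.storage.embedded.types.EmbeddedStorageManager;

    public class InventoryService
    {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        private final EmbeddedStorageManager storage;
        private final List<String> items = new ArrayList<>(); // part of the persisted object graph

        public InventoryService(EmbeddedStorageManager storage)
        {
            this.storage = storage;
        }

        public void addItem(String item)
        {
            lock.writeLock().lock();
            try
            {
                items.add(item);
                storage.store(items); // one atomic write of the changed list
            }
            finally
            {
                lock.writeLock().unlock();
            }
        }

        public int itemCount()
        {
            lock.readLock().lock();
            try
            {
                return items.size();
            }
            finally
            {
                lock.readLock().unlock();
            }
        }
    }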

2.) Horizontal scaling

MicroStream does not currently support horizontal scaling or clustering.

Partly because it is only a module of an application process instead of a process of its own. So the application is the thing that has to be clustered/scaled (just as with old DBMSs: having a clustered DBMS will NOT scale the application if the application itself does not scale).

However, there are already concepts and ongoing development to ease that clustering/scaling of the application in combination with MicroStream, and also concepts to horizontally scale the persisted data of one or more application processes over an arbitrary number of processes/machines/nodes.

3.) Changes in Java code

As Fred has already linked (thanks a lot), MicroStream contains a solution for that, called "Legacy Type Mapping".
Persisted objects of an older version of a type (a "Legacy Type") are mapped dynamically to the current version of the type during loading.
So there is no need to rewrite the database when a type changes. Hence the term "Legacy Type Mapping" instead of "Database Refactoring" or "Database Migration" or such.

Simple and unambiguous changes (like just changing the field "surname" to "lastName") are detected and mapped automatically, without any intervention by the developer.
More complex cases (ambiguous changes, conflicting renamings, etc.) must be mapped explicitly by the developer.
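
To illustrate the difference with made-up classes (the two versions exist at different points in time, not in one source file):

    // Version 1 of the entity, already persisted:
    public class Person
    {
        String surname;
    }

    // Version 2 in the current code: the rename "surname" -> "lastName"
    // is unambiguous and therefore detected and mapped automatically.
    public class Person
    {
        String lastName;
    }

    // By contrast, splitting one old field into several new ones, or two
    // conflicting renamings, would be ambiguous and must be mapped
    // explicitly by the developer.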

This is all on the level of trivial "old field -> new field" mappings.
There will never be a database upgrade script or a migration or such.

A graphical tool to derive the mappings from existing types is planned for the future, so even the most complex cases of type changes will boil down to a simple point-and-click procedure.

Note: every question must be a separate forum post. Headline: formulate your question briefly and precisely. Thank you!