We have to migrate an old (million-LOC) system from an ISAM-based SAP ADS (Advantage Database Server, formerly Sybase) setup using so-called free tables to a PostgreSQL database.
The system implements a kind of multitenancy by encoding some basic information (chancellery ID, client ID, fiscal year, etc.) directly into the Windows folder paths, and the application builds its SQL statements with the appropriate paths to address the specific table files.
Since SAP has decided to shut down support for the Advantage Database Server, we're forced to migrate the system to another (real) RDBMS.
We have pretty much already decided to go with PostgreSQL, since its namespacing (schemas) would at least let us map the Windows folder paths onto something that replicates this structure.
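To make that concrete, here is a minimal sketch of the mapping we have in mind; the folder layout, the schema naming convention, and the `invoices` table are made up for illustration:

```sql
-- Today (ADS free tables): the tenant is encoded in the file path.
-- Hypothetical layout: <chancellery>\<client>\<fiscal year>\<table>.adt
SELECT * FROM "D:\data\chanc01\client0042\2023\invoices.adt";

-- After the migration (PostgreSQL): one schema per former folder path.
-- Hypothetical naming convention: c<chancellery>_<client>_<year>
SELECT * FROM c01_0042_2023.invoices;
```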
ATM we're discussing two major approaches for doing this:
- Migrating the existing ADS tables one-to-one: each table file in each folder becomes a separate PostgreSQL table residing in a tenant-specific schema.
- Migrating the existing ADS tables into PostgreSQL master tables: each master table carries a tenant reference key, a separate mapping table maps the former schema (folder path) names to unique IDs, and per-tenant views filter the master tables by that ID (see the DDL sketch after this list).
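A minimal DDL sketch of both variants, side by side; the schema, table, and column names are hypothetical (the real structures have up to 400+ columns), and the two approaches are alternatives rather than meant to be run in the same database:

```sql
-- Approach 1: one schema per former folder path, one table object each.
CREATE SCHEMA c01_0042_2023;

CREATE TABLE c01_0042_2023.invoices (
    invoice_no integer PRIMARY KEY,
    amount     numeric(12,2)
);

-- Approach 2 (alternative to the above): one master table per structure,
-- a mapping table for the former folder-path names, and per-tenant views
-- filtering on the tenant ID.
CREATE TABLE public.tenant_map (
    tenant_id   integer PRIMARY KEY,
    folder_path text NOT NULL UNIQUE    -- the former Windows folder path
);

CREATE TABLE public.invoices_master (
    tenant_id  integer NOT NULL REFERENCES public.tenant_map,
    invoice_no integer NOT NULL,
    amount     numeric(12,2),
    PRIMARY KEY (tenant_id, invoice_no)
);

-- In this model the per-path schema contains only views:
CREATE VIEW c01_0042_2023.invoices AS
    SELECT invoice_no, amount
    FROM   public.invoices_master
    WHERE  tenant_id = 1;
```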
Technically, both approaches are almost fully worked out, and we're able to migrate the existing systems either way.
Still, there are points to consider when deciding which path to take.
- Some of our architects have a gut feeling that billions of rows in a single master table would significantly hurt the performance of the SQL statements.
- Others (like me) argue that the view/master-table model would have several benefits for the overall efficiency of the DB design.
We're already taking measurements, but I'd like to hear some advice, experiences, or even entirely different DB-design ideas regarding which of the approaches might scale better.
Some edge data:
- The overall system currently maintains ~1,000 basic table structures (some of them with 400+ columns).
- We have customers with up to 10,000 multitenancy paths.
- Some of those (master) tables might need to hold rows numbering in the billions.
- Since indexes are (seemingly) cheap in ADS, the existing ISAM tables sometimes carry a lot of them.
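For a sense of scale with the first approach: in the worst case (every tenant path holding every table structure) that would be roughly 1,000 structures × 10,000 paths = 10,000,000 table objects, before even counting their indexes, TOAST tables, and sequences.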
What we've noticed so far:
- The feared performance impact of SQL statements applied to master tables through views versus single table objects is far from scaling linearly with the number of rows (I've been measuring this server-side with exemplary `EXPLAIN ANALYZE` runs; see the sketch after this list).
- With one table per tenant path, `pg_catalog` tends to grow big, especially the `pg_catalog.pg_attribute` table. Any operations touching these (including the PostgreSQL query planner) might be hit by the sheer volume of catalog data (see the catalog queries after this list).
- Currently we use a single tablespace for all of the tables. While we could create a tablespace per virtual directory path, that seems to overcomplicate the DB design and might even raise other problems.
- As mentioned before, a single tablespace for such an amount of relations doesn't seem to scale well with PostgreSQL on an underlying NTFS Windows filesystem.
- Regarding the underlying NTFS filesystem, we should consider that a minimum cluster size is reserved for each file created, and PostgreSQL creates several files for every `pg_class` relation object (e.g. the main data fork plus free space map and visibility map forks for tables), which includes tables, indexes, TOAST storage for blob fields, etc.
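For the measurements mentioned above, a comparison along these lines (reusing the hypothetical names from the DDL sketch earlier; the literal values are made up):

```sql
-- Approach 1: query the per-tenant table directly.
EXPLAIN ANALYZE
SELECT * FROM c01_0042_2023.invoices
WHERE  invoice_no = 4711;

-- Approach 2: query through the per-tenant view; the planner inlines
-- the view, so this is equivalent to filtering the master table.
EXPLAIN ANALYZE
SELECT * FROM public.invoices_master
WHERE  tenant_id = 1 AND invoice_no = 4711;
```

With the composite primary key on (tenant_id, invoice_no), the master-table lookup remains an index scan whose B-tree depth grows only logarithmically with the row count, which fits the far-from-linear scaling we measured.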
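And for watching the catalog growth, queries along these lines can be used (only stock catalog tables and functions, nothing system-specific assumed):

```sql
-- How many relations and attribute rows the catalog currently holds:
SELECT count(*) AS relations  FROM pg_catalog.pg_class;
SELECT count(*) AS attributes FROM pg_catalog.pg_attribute;

-- On-disk size of the attribute catalog, indexes included:
SELECT pg_size_pretty(pg_total_relation_size('pg_catalog.pg_attribute'));
```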
Maybe what I've offered with my observations here is a bit biased, but I'd like to ask about alternatives, or about absolute no-gos for either of the models we're discussing.
Since my boss is a wise guy and knows that I am one of the architects favoring the view model, he gave me homework: to report what the cons of that model actually are.