Designing MongoDB Schemas for Large SQL Migrations
原文英文,约1900词,阅读约需7分钟。发表于: 。Unlike relational databases, MongoDB provides significant flexibility in designing schemas. MongoDB schema design should take into consideration both the relationship between the data elements and the data access patterns to evolve the most optimal schema for a workload. Designing the right schema for MongoDB is vital to getting the best cost-performance ratio for our workloads, and a bad schema design can massively impact application performance. Enterprise customers working on SQL migrations typically have hundreds of tables that need to be migrated into MongoDB. The Relational Migrator tool has dramatically simplified schema transformation, data migration, and real-time CDC from SQL databases to MongoDB. Despite that, customers still spend considerable time in MongoDB schema design. When we move beyond a few tables, designing the right MongoDB schema for SQL migrations brings with it several challenges. The relationship between the tables is often not codified at the DB level as Primary or foreign key constraints. Rather, this is enforced at the application level. We would need to spend a significant amount of time with the customer SMEs to surface these constraints. The cardinality of data relationships plays a key role in MongoDB schema design. However, this is challenging to gather in production settings. Data access patterns - Frequently used tables, tables accessed together, index usage cannot be quickly gathered in an automated way, Analyzing the existing SQL footprint in an ad hoc manner can become quite cumbersome, and running sub-optimal brute-force queries on the existing database may cause many issues. All these challenges ultimately result in a lot of manual effort for key customer SMEs during the schema design process. PeerIslands approach PeerIslands has been working with multiple Fortune 100 customers on SQL migrations and has evolved a pre-migration analysis approach for addressing this problem. The pre-migration analysis helps understand your current SQL footprint and access patterns and provides valuable insights into designing the right schema on MongoDB. The pre-migration analysis is fully automated and optimized for performance. We, at PeerIslands, recently used this approach for one of the largest US insurance providers. We rapidly analyzed close to 700+ tables and provided schema design recommendations for key migration initiatives they were working on. Below are some key areas of analysis that are covered by the pre-migration analysis: Data relationships and cardinality MongoDB provides significant flexibility for modeling relationships. SQL DB metadata provides rich data on the relationships between the various data elements. Depending on the complexity of the relationships, we can design them in MongoDB using embedded documents or references between documents. In addition to relationships, the cardinality of the existing data can be gathered from SQL indexes in a performant manner and can help us make a determination of correctly modeling one-to-few, one-to-many, and one-to-zillion relationships on MongoDB. Query analysis SQL DBs store the actual queries that have been executed in the environment. We can also obtain the top queries that have been executed. Analyzing the queries helps us understand which data elements are accessed together, read/write metrics, and more. Understanding this will help us have the frequently accessed data elements together on MongoDB. Query analysis helps us rapidly understand the referential integrity constraints and join patterns that exist in the data. This is helpful in situations where referential integrity is not codified in the DB layer. Table metadata Table and DB metadata can provide us with the overall lay of the land - the number of tables, SQL stored procedures, views, size of tables, and most frequently accessed tables, among others. This can help us get a quick understanding of the complexity of the migration. Indexes Indexes speed up data access and retrieval, and they can have a significant impact on the performance of the database. Understanding the current indexes in the SQL DBs and their usage provides good inputs for index creation on MongoDB. An understanding of the available indexes can also help perform data migration more efficiently. Is it possible to get all this information in a performant and automated way in a production setting? Putting pre-migration analysis into practice with Oracle We have developed playbooks to extract this information from several SQL databases. Here is an example - below is a quick technical preview of how we can get this information from one of the most common databases that we help migrate from - Oracle. Oracle data dictionary The Oracle Data Dictionary refers to a set of read-only tables and views that contain metadata about the database objects, such as tables, columns, indexes, constraints, views, users, and privileges. By querying these tables and views, users can retrieve detailed information about the structure and contents of the database objects. This information can be used to analyze the database schema, optimize queries, and perform other tasks related to database management and maintenance. We can also leverage the Data dictionary to gather the details we discussed. Metrics In Oracle, you can get table metrics such as table size, number of rows, number of blocks, and number of empty blocks using the DBA_TABLES view. Below is an example query to get the table metrics for all tables in a schema: SELECT owner, table_name, num_rows, blocks, empty_blocks, avg_row_len, chain_cnt, avg_space&NewLineFROM dba_tables&NewLineWHERE owner = 'ADMIN'&NewLineORDER BY NUM_ROWS DESC;&NewLine Indexes To get all the indexes for a given schema in Oracle, we can use the DBA_INDEXES view. Here's an example query: SELECT owner, index_name, table_name, uniqueness, index_type, status&NewLineFROM dba_indexes&NewLineWHERE owner = 'ADMIN'&NewLineORDER BY table_name, index_name;&NewLine Relationships To get the one-to-many relationships of all the tables in Oracle, we can query the data dictionary views ALL_CONSTRAINTS and ALL_CONS_COLUMNS. SELECT uc.table_name, rc.table_name AS referenced_table_name, ucc.column_name, uc.constraint_name,&NewLinerc.constraint_name AS referenced_constraint_name&NewLineFROM all_constraints uc&NewLineJOIN all_cons_columns ucc ON uc.owner = ucc.owner AND uc.constraint_name = ucc.constraint_name&NewLineJOIN all_constraints rc ON uc.r_owner = rc.owner AND uc.r_constraint_name = rc.constraint_name&NewLineWHERE uc.constraint_type = 'R' AND uc.owner = 'ADMIN'&NewLineORDER BY uc.table_name, uc.constraint_name, ucc.position;&NewLine Variations of the above query can help in getting one-to-one relationships and many-to-many relationships. Queries list To get a list of queries executed in Oracle, you can query the database’s V$SQL and V$SQLTEXT views. SELECT &ast&NewLineFROM v$sql&NewLineWHERE parsing_schema_name = 'ADMIN'&NewLineORDER BY last_active_time DESC;&NewLine This query will retrieve all SQL statements executed by the specified schema and order them by the last time they were active. Top 10 queries The same v$sql tables can provide the Top 10 queries as well. SELECT sql_text, executions&NewLineFROM v$sql&NewLineORDER BY executions DESC&NewLineFETCH FIRST 10 ROWS ONLY;&NewLine All of these queries only use Oracle metadata. The queries are highly performant and do not negatively impact instance performance. A similar approach can be used to gather this information for other SQL DBs like MySQL and MS SQL. Conclusion PeerIslands has been working with multiple customers on SQL migration projects where we handle end-to-end services, including schema design, data migration, app refactoring, SQL procedure conversion, and more. Moreover, we have also been an engineering partner for MongoDB Relational Migrator and have worked closely with the team on multiple initiatives over the last two years. We are working with the Relational Migrator team to fully automate pre-migration analysis for various SQL databases like Oracle, MySQL, MS SQL, Postgres, etc., and surface these insights to users as part of the Relational Migrator tooling. This will help save time for users and also prevent incorrect schema designs. Follow this space for more announcements on this topic. With extensive experience handling multiple SQL engagements in partnership with MongoDB's Professional Services team, a rich repository of tools, and close collaboration with Relational Migrator, PeerIslands is your "Partner" to accelerate your SQL migration journey. Reach out to partners@Mongodb.com to learn more.
本文介绍了向量搜索和大型语言模型(LLMs)的概念和应用,以及它们为信息检索领域带来的变革。向量搜索通过机器学习技术将非结构化数据编码成数字数组,实现基于意义的数据查询。LLMs通过嵌入模型将文本转换为向量,实现自然语言处理任务。