Designing MongoDB Schemas for Large SQL Migrations

Unlike relational databases, MongoDB provides significant flexibility in schema design. A MongoDB schema should take into account both the relationships between data elements and the data access patterns in order to arrive at the optimal schema for a workload. Designing the right schema is vital to getting the best cost-performance ratio for a workload, and a bad schema design can massively degrade application performance.

Enterprise customers working on SQL migrations typically have hundreds of tables that need to be migrated to MongoDB. The Relational Migrator tool has dramatically simplified schema transformation, data migration, and real-time CDC from SQL databases to MongoDB. Even so, customers still spend considerable time on MongoDB schema design. Once we move beyond a few tables, designing the right MongoDB schema for a SQL migration brings several challenges:

- The relationships between tables are often not codified at the database level as primary or foreign key constraints; instead, they are enforced at the application level. Surfacing these constraints requires a significant amount of time with the customer's SMEs.
- The cardinality of data relationships plays a key role in MongoDB schema design, but it is challenging to gather in production settings.
- Data access patterns - frequently used tables, tables accessed together, index usage - cannot be quickly gathered in an automated way. Analyzing the existing SQL footprint ad hoc can become quite cumbersome, and running sub-optimal brute-force queries on the existing database may cause many issues.

All of these challenges ultimately result in a lot of manual effort by key customer SMEs during the schema design process.

PeerIslands approach

PeerIslands has been working with multiple Fortune 100 customers on SQL migrations and has evolved a pre-migration analysis approach to address this problem. The pre-migration analysis helps understand the current SQL footprint and access patterns, and it provides valuable insights for designing the right schema on MongoDB. It is fully automated and optimized for performance. We recently used this approach for one of the largest US insurance providers, rapidly analyzing more than 700 tables and providing schema design recommendations for the key migration initiatives they were working on. The key areas covered by the pre-migration analysis are described below.

Data relationships and cardinality

MongoDB provides significant flexibility for modeling relationships. SQL database metadata provides rich data on the relationships between data elements. Depending on the complexity of a relationship, we can model it in MongoDB using embedded documents or references between documents. In addition to the relationships themselves, the cardinality of the existing data can be gathered from SQL indexes in a performant manner, helping us correctly model one-to-few, one-to-many, and one-to-zillion relationships in MongoDB.
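To make the cardinality idea concrete, here is a minimal sketch of how it might be estimated from Oracle index statistics (the data dictionary views used here are introduced later in this article). The ADMIN schema name is a placeholder, and the ratio is only a statistics-based approximation, only as fresh as the optimizer statistics:

```sql
-- Hedged sketch: approximate relationship cardinality from index statistics.
-- For an index on a foreign-key column, NUM_ROWS / DISTINCT_KEYS estimates
-- the average number of child rows per parent key, which helps classify a
-- relationship as one-to-few, one-to-many, or one-to-zillion.
SELECT table_name,
       index_name,
       num_rows,
       distinct_keys,
       ROUND(num_rows / NULLIF(distinct_keys, 0)) AS avg_rows_per_key
FROM   dba_indexes
WHERE  owner = 'ADMIN'   -- placeholder schema
AND    num_rows > 0
ORDER  BY avg_rows_per_key DESC NULLS LAST;
```

A low average suggests the child rows can be embedded in the parent document, while a very high average argues for references instead.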
Query analysis

SQL databases store the actual queries that have been executed in the environment, and we can also obtain the top queries by execution count. Analyzing these queries tells us which data elements are accessed together, read/write metrics, and more. This understanding helps us keep frequently co-accessed data elements together in MongoDB.

Query analysis also helps us rapidly understand the referential integrity constraints and join patterns present in the data. This is helpful in situations where referential integrity is not codified at the database layer.

Table metadata

Table and database metadata give us the overall lay of the land: the number of tables, SQL stored procedures, views, the size of tables, and the most frequently accessed tables, among others. This provides a quick read on the complexity of the migration.

Indexes

Indexes speed up data access and retrieval, and they can have a significant impact on database performance. Understanding the current indexes in the SQL database and their usage provides good input for index creation on MongoDB. An understanding of the available indexes also helps perform the data migration more efficiently.

Is it possible to get all this information in a performant, automated way in a production setting?

Putting pre-migration analysis into practice with Oracle

We have developed playbooks to extract this information from several SQL databases. Below is a quick technical preview of how we gather it from one of the most common databases that we help migrate from: Oracle.

Oracle data dictionary

The Oracle Data Dictionary is a set of read-only tables and views that contain metadata about database objects such as tables, columns, indexes, constraints, views, users, and privileges. By querying these tables and views, users can retrieve detailed information about the structure and contents of database objects. This information can be used to analyze the database schema, optimize queries, and perform other database management and maintenance tasks. We can leverage the data dictionary to gather the details discussed above.

Metrics

In Oracle, table metrics such as table size, number of rows, number of blocks, and number of empty blocks are available from the DBA_TABLES view. Below is an example query to get the table metrics for all tables in a schema:

```sql
SELECT owner, table_name, num_rows, blocks, empty_blocks,
       avg_row_len, chain_cnt, avg_space
FROM   dba_tables
WHERE  owner = 'ADMIN'
ORDER  BY num_rows DESC;
```

Indexes

To get all the indexes for a given schema in Oracle, we can use the DBA_INDEXES view. Here's an example query:

```sql
SELECT owner, index_name, table_name, uniqueness, index_type, status
FROM   dba_indexes
WHERE  owner = 'ADMIN'
ORDER  BY table_name, index_name;
```

Relationships

To get the one-to-many relationships between all the tables in Oracle, we can query the data dictionary views ALL_CONSTRAINTS and ALL_CONS_COLUMNS:

```sql
SELECT uc.table_name,
       rc.table_name  AS referenced_table_name,
       ucc.column_name,
       uc.constraint_name,
       rc.constraint_name AS referenced_constraint_name
FROM   all_constraints uc
JOIN   all_cons_columns ucc
       ON uc.owner = ucc.owner AND uc.constraint_name = ucc.constraint_name
JOIN   all_constraints rc
       ON uc.r_owner = rc.owner AND uc.r_constraint_name = rc.constraint_name
WHERE  uc.constraint_type = 'R'
AND    uc.owner = 'ADMIN'
ORDER  BY uc.table_name, uc.constraint_name, ucc.position;
```

Variations of the above query can help in identifying one-to-one and many-to-many relationships; one such variation is sketched below.
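For example, here is a hedged sketch of one such variation. Tables that carry two or more foreign keys are candidate join tables, which often indicate many-to-many relationships; this is a heuristic, not a definitive test, and the ADMIN schema is again a placeholder:

```sql
-- Hypothetical sketch: flag candidate many-to-many "join" tables by counting
-- referential ('R') constraints per table. A table with two or more foreign
-- keys is often a join table linking the referenced tables.
SELECT owner,
       table_name,
       COUNT(DISTINCT constraint_name) AS fk_count
FROM   all_constraints
WHERE  constraint_type = 'R'
AND    owner = 'ADMIN'   -- placeholder schema
GROUP  BY owner, table_name
HAVING COUNT(DISTINCT constraint_name) >= 2
ORDER  BY fk_count DESC;
```

Candidates surfaced this way should still be confirmed against the application's actual join patterns before being collapsed into embedded arrays on MongoDB.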
Queries list

To get a list of queries executed in Oracle, we can query the database's V$SQL and V$SQLTEXT views:

```sql
SELECT *
FROM   v$sql
WHERE  parsing_schema_name = 'ADMIN'
ORDER  BY last_active_time DESC;
```

This query retrieves the SQL statements (those still cached in the shared pool) parsed under the specified schema, ordered by the last time they were active.

Top 10 queries

The same V$SQL view can provide the top 10 queries as well:

```sql
SELECT sql_text, executions
FROM   v$sql
ORDER  BY executions DESC
FETCH  FIRST 10 ROWS ONLY;
```

All of these queries use only Oracle metadata. They are highly performant and do not negatively impact instance performance. A similar approach can be used to gather the same information from other SQL databases like MySQL and MS SQL.
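As a hedged illustration of how this carries over to another engine, here is a sketch of the top-queries equivalent on MySQL, using the performance_schema statement digest table (this assumes the performance schema is enabled, as it is by default on modern MySQL versions):

```sql
-- Hedged sketch (MySQL flavor): top 10 normalized statements by execution
-- count, taken from the performance_schema digest summary rather than from
-- Oracle's V$SQL. DIGEST_TEXT is the normalized query text; COUNT_STAR is
-- how many times statements with that digest have executed.
SELECT digest_text, count_star AS executions
FROM   performance_schema.events_statements_summary_by_digest
ORDER  BY count_star DESC
LIMIT  10;
```

Like the Oracle queries above, this reads only instrumentation metadata, not user tables, so it is safe to run against a production instance.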
Conclusion

PeerIslands has been working with multiple customers on SQL migration projects, handling end-to-end services including schema design, data migration, app refactoring, SQL procedure conversion, and more. We have also been an engineering partner for MongoDB Relational Migrator and have worked closely with that team on multiple initiatives over the last two years. We are now working with the Relational Migrator team to fully automate pre-migration analysis for SQL databases like Oracle, MySQL, MS SQL, and Postgres, and to surface these insights to users as part of the Relational Migrator tooling. This will save users time and help prevent incorrect schema designs. Follow this space for more announcements on this topic.

With extensive experience handling multiple SQL engagements in partnership with MongoDB's Professional Services team, a rich repository of tools, and close collaboration with the Relational Migrator team, PeerIslands is the partner to accelerate your SQL migration journey. Reach out to partners@Mongodb.com to learn more.