{"id":20964,"date":"2022-04-26T13:53:09","date_gmt":"2022-04-26T11:53:09","guid":{"rendered":"https:\/\/stage-fp.webenv.pl\/blog\/?p=20964"},"modified":"2023-02-22T12:58:04","modified_gmt":"2023-02-22T11:58:04","slug":"generic-metadata-framework","status":"publish","type":"post","link":"https:\/\/www.future-processing.com\/blog\/generic-metadata-framework\/","title":{"rendered":"Generic Metadata Framework &#8211; how to use it in a project?"},"content":{"rendered":"\n<p>ISASA defines the steps that must be taken to build a solution. These steps include:<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">I \u2013 Ingest \u2013 data collection<br>S \u2013 Store \u2013 data storage<br>A \u2013 Analyse \u2013 data analysis<br>S \u2013 Surface \u2013 presentation of the prepared data<br>A \u2013 Act \u2013 4xM: Make Me More Money<\/h4>\n\n\n\n<p><br>The first four steps are assigned to the team that creates the solution, while the final step \u2013 <strong>Act<\/strong> \u2013 <strong>is the Client\u2019s responsibility<\/strong>. Based on the results of data analysis (e.g. reports, dashboards), the Client makes decisions and carries out tasks aimed at <a href=\"https:\/\/www.future-processing.com\/blog\/how-to-choose-the-best-data-solution-for-you\/\" title=\"How to choose the best data solution for you?\">improving business operations<\/a>. The technical tasks include both those that require no domain knowledge and those that cannot be performed without it, e.g. model building.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><br>The Generic Metadata Framework &#8211; why does it work?<\/h2>\n\n\n\n<p>The Generic Metadata Framework automates tasks that recur in every data analysis project, such as data collection and data lake creation. 
As a result, the solution creator can <strong>focus on the essence of a given problem<\/strong>, that is, on building a model that meets the <strong>pre-defined business needs<\/strong>.<\/p>\n\n\n    <div class=\"o-icon-box__wrapper\">\n        <div class=\"o-icon-box o-icon-box--big o-icon-box--italics m-cool-gray-light\">\n            <div class=\"o-icon-box__text f-headline-extra-big\">\n                The Generic Metadata Framework makes it possible to create cloud solutions based on the modern data warehouse or data lakehouse concepts in a quick and agile way. It relies on services such as Azure Synapse Analytics and Azure Databricks.            <\/div>\n        <\/div>\n    <\/div>\n\n\n\n<p>The Generic Metadata Framework <strong>helps the solution creator concentrate on the business aspects of the solution<\/strong>. It also simplifies and automates:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><strong>data loading<\/strong> (supporting both full and incremental loads),<br><\/li><li><strong>building data lakes<\/strong> (defining the structure, data partitioning),<br><\/li><li>initial <strong>data processing<\/strong> (transformation of input data),<br><\/li><li>building <strong>delta lakes<\/strong> (defining the structure, data partitioning),<br><\/li><li>creating <strong>data warehouses<\/strong> (defining the model in the views).<\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><br>Main characteristics:<\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li><strong>Flexibility<\/strong> at the level of architecture \u2013 creating data analysis solutions based on modern data warehouses and data lakehouses.<br><\/li><li><strong>Automation<\/strong> of recurring tasks, such as data collection.<br><\/li><li>A complete architecture, covering security, monitoring, data governance, etc., that enables the <strong>creation of scalable solutions<\/strong>.<br><\/li><li>Building components available through 
Azure.<br><\/li><li>Flexibility at the level of access to data \u2013 easy <strong>integration with Power BI<\/strong>.<br><\/li><li><strong>Extensibility<\/strong> \u2013 the framework can be easily extended, for instance, by supporting new types of data sources.<br><\/li><\/ul>\n\n\n\n<p>The graph below presents the areas covered by the framework:<\/p>\n\n\n    <div class=\"b-image js-lightbox\">\n        <figure class=\"b-image__figure\">\n            <a\n                href=\"1_1.png\"\n                class=\"js-lightbox__trigger\"\n                aria-haspopup=\"dialog\"\n                data-elementor-open-lightbox=\"no\"\n            >\n                <img fetchpriority=\"high\" decoding=\"async\" width=\"1238\" height=\"589\" src=\"https:\/\/www.future-processing.com\/blog\/wp-content\/uploads\/2022\/04\/1_1.png\" class=\"attachment-full size-full\" alt=\"\" srcset=\"https:\/\/www.future-processing.com\/blog\/wp-content\/uploads\/2022\/04\/1_1.png 1238w, https:\/\/www.future-processing.com\/blog\/wp-content\/uploads\/2022\/04\/1_1-300x143.png 300w, https:\/\/www.future-processing.com\/blog\/wp-content\/uploads\/2022\/04\/1_1-1024x487.png 1024w, https:\/\/www.future-processing.com\/blog\/wp-content\/uploads\/2022\/04\/1_1-768x365.png 768w, https:\/\/www.future-processing.com\/blog\/wp-content\/uploads\/2022\/04\/1_1-841x400.png 841w\" sizes=\"(max-width: 1238px) 100vw, 1238px\" \/>            <\/a>\n                    <\/figure>\n        <div\n    class=\"js-lightbox__dialog o-lightbox\"\n    role=\"dialog\"\n    aria-modal=\"true\"\n    aria-hidden=\"true\"\n    tabindex=\"-1\"\n>\n    <div class=\"o-lightbox__dialog\">\n        <div class=\"o-lightbox__content js-lightbox__content\" role=\"document\">\n            <button\n                class=\"o-button o-button--xs o-button--dark o-button--icon-right o-button--tertiary o-lightbox__close js-lightbox__close m-gradient-brand\"\n            >\n                Close picture                <svg 
class='o-icon o-icon--16 o-icon--timescircle '>\n            <use xlink:href='#icon-16_times-circle'><\/use>\n          <\/svg>            <\/button>\n                                            <figure class=\"o-lightbox__image is-active\">\n                    <img fetchpriority=\"high\" decoding=\"async\" width=\"1238\" height=\"589\" src=\"https:\/\/www.future-processing.com\/blog\/wp-content\/uploads\/2022\/04\/1_1.png\" class=\"attachment-full size-full\" alt=\"\" srcset=\"https:\/\/www.future-processing.com\/blog\/wp-content\/uploads\/2022\/04\/1_1.png 1238w, https:\/\/www.future-processing.com\/blog\/wp-content\/uploads\/2022\/04\/1_1-300x143.png 300w, https:\/\/www.future-processing.com\/blog\/wp-content\/uploads\/2022\/04\/1_1-1024x487.png 1024w, https:\/\/www.future-processing.com\/blog\/wp-content\/uploads\/2022\/04\/1_1-768x365.png 768w, https:\/\/www.future-processing.com\/blog\/wp-content\/uploads\/2022\/04\/1_1-841x400.png 841w\" sizes=\"(max-width: 1238px) 100vw, 1238px\" \/>                                    <\/figure>\n                    <\/div>\n    <\/div>\n<\/div>\n    <\/div>\n\n\n\n<p>In the first step, data is collected from data sources, including on-premises ones, and saved in a data lake (<a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/storage\/blobs\/data-lake-storage-introduction\" target=\"_blank\" rel=\"noreferrer noopener\">Azure Data Lake Gen2<\/a>) in its native format. The data source supported at this stage is SQL Server, with incremental data loading based on change tracking and timestamps. The configuration itself is driven by the metadata collected from the source system.<\/p>\n\n\n\n<p>In the next step, the collected data is preprocessed \u2013 <strong>the history of data changes is constructed (SCD 2) and saved in a data lake in the delta format<\/strong>. 
The process is carried out with <a href=\"https:\/\/databricks.com\/spark\/about\" target=\"_blank\" rel=\"noreferrer noopener\">Apache Spark on Databricks<\/a>, on the basis of the prepared configuration and the metadata loaded in the previous step.<\/p>\n\n\n\n<p>The next step is model building. The <strong>data<\/strong> <strong>is sourced from the tables in the so-called curated zone<\/strong>, while the model itself is stored as Spark views plus additional configuration, which defines, for example, how the model is to be fed and whether SCD 1 or SCD 2 is to be used.<\/p>\n\n\n\n<p>The graph below shows the data flow in a solution based on the Generic Metadata Framework.<\/p>\n\n\n    <div class=\"b-image js-lightbox\">\n        <figure class=\"b-image__figure\">\n            <a\n                href=\"2022-04-27-13_08_18-Generic-Metadata-Framework-compressed.pdf-and-9-more-pages-Work-Microsoft\u200b-E.png\"\n                class=\"js-lightbox__trigger\"\n                aria-haspopup=\"dialog\"\n                data-elementor-open-lightbox=\"no\"\n            >\n                <img decoding=\"async\" width=\"938\" height=\"637\" src=\"https:\/\/www.future-processing.com\/blog\/wp-content\/uploads\/2022\/04\/2022-04-27-13_08_18-Generic-Metadata-Framework-compressed.pdf-and-9-more-pages-Work-Microsoft\u200b-E.png\" class=\"attachment-full size-full\" alt=\"\" srcset=\"https:\/\/www.future-processing.com\/blog\/wp-content\/uploads\/2022\/04\/2022-04-27-13_08_18-Generic-Metadata-Framework-compressed.pdf-and-9-more-pages-Work-Microsoft\u200b-E.png 938w, https:\/\/www.future-processing.com\/blog\/wp-content\/uploads\/2022\/04\/2022-04-27-13_08_18-Generic-Metadata-Framework-compressed.pdf-and-9-more-pages-Work-Microsoft\u200b-E-300x204.png 300w, https:\/\/www.future-processing.com\/blog\/wp-content\/uploads\/2022\/04\/2022-04-27-13_08_18-Generic-Metadata-Framework-compressed.pdf-and-9-more-pages-Work-Microsoft\u200b-E-768x522.png 768w, 
https:\/\/www.future-processing.com\/blog\/wp-content\/uploads\/2022\/04\/2022-04-27-13_08_18-Generic-Metadata-Framework-compressed.pdf-and-9-more-pages-Work-Microsoft\u200b-E-589x400.png 589w\" sizes=\"(max-width: 938px) 100vw, 938px\" \/>            <\/a>\n                    <\/figure>\n        <div\n    class=\"js-lightbox__dialog o-lightbox\"\n    role=\"dialog\"\n    aria-modal=\"true\"\n    aria-hidden=\"true\"\n    tabindex=\"-1\"\n>\n    <div class=\"o-lightbox__dialog\">\n        <div class=\"o-lightbox__content js-lightbox__content\" role=\"document\">\n            <button\n                class=\"o-button o-button--xs o-button--dark o-button--icon-right o-button--tertiary o-lightbox__close js-lightbox__close m-gradient-brand\"\n            >\n                Close picture                <svg class='o-icon o-icon--16 o-icon--timescircle '>\n            <use xlink:href='#icon-16_times-circle'><\/use>\n          <\/svg>            <\/button>\n                                            <figure class=\"o-lightbox__image is-active\">\n                    <img decoding=\"async\" width=\"938\" height=\"637\" src=\"https:\/\/www.future-processing.com\/blog\/wp-content\/uploads\/2022\/04\/2022-04-27-13_08_18-Generic-Metadata-Framework-compressed.pdf-and-9-more-pages-Work-Microsoft\u200b-E.png\" class=\"attachment-full size-full\" alt=\"\" srcset=\"https:\/\/www.future-processing.com\/blog\/wp-content\/uploads\/2022\/04\/2022-04-27-13_08_18-Generic-Metadata-Framework-compressed.pdf-and-9-more-pages-Work-Microsoft\u200b-E.png 938w, https:\/\/www.future-processing.com\/blog\/wp-content\/uploads\/2022\/04\/2022-04-27-13_08_18-Generic-Metadata-Framework-compressed.pdf-and-9-more-pages-Work-Microsoft\u200b-E-300x204.png 300w, https:\/\/www.future-processing.com\/blog\/wp-content\/uploads\/2022\/04\/2022-04-27-13_08_18-Generic-Metadata-Framework-compressed.pdf-and-9-more-pages-Work-Microsoft\u200b-E-768x522.png 768w, 
https:\/\/www.future-processing.com\/blog\/wp-content\/uploads\/2022\/04\/2022-04-27-13_08_18-Generic-Metadata-Framework-compressed.pdf-and-9-more-pages-Work-Microsoft\u200b-E-589x400.png 589w\" sizes=\"(max-width: 938px) 100vw, 938px\" \/>                                    <\/figure>\n                    <\/div>\n    <\/div>\n<\/div>\n    <\/div>\n\n\n\n<h2 class=\"wp-block-heading\"><br>The concept of zones used in the Generic Metadata Framework<\/h2>\n\n\n\n<p>The concept of zones used in the Generic Metadata Framework closely resembles the solution-building approach promoted by <a href=\"https:\/\/databricks.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Databricks<\/a> (bronze, silver, gold zones). Thanks to the division into zones, data can be separated not only at the logical level but also at the physical level (dedicated containers on a data lake). The division also precisely defines both the input and the output of each stage of data processing. <strong>Access to data from each zone is provided through Azure Synapse Serverless.<\/strong><\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><br>The Generic Metadata Framework consists of four modules:<\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li><strong>Data Loader<\/strong> \u2013 responsible for loading data from data sources and saving data in a data lake.<br><\/li><li><strong>Data Preprocessor <\/strong>\u2013 responsible for initial data processing: building a data lake.<br><\/li><li><strong>Data Lakehouse<\/strong> \u2013 responsible for creating and feeding the data model.<br><\/li><li><strong>Synapse Integrator<\/strong> \u2013 responsible for transferring data from the data lakehouse to Azure Synapse Dedicated Pool.<\/li><\/ul>\n\n\n\n<p>Module 3 (Data Lakehouse) can work independently of the other modules, while Module 4 (Synapse Integrator) is optional: in other words, it is possible to build a solution with the Generic Metadata Framework that does not use the <a 
href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/synapse-analytics\/sql-data-warehouse\/sql-data-warehouse-overview-what-is\" target=\"_blank\" rel=\"noreferrer noopener\">Azure Dedicated Pool<\/a>. The Generic Metadata Framework also supports building solutions based on the data mesh approach by providing a self-serve platform.<\/p>\n\n\n<div class=\"b-button\">\n            <a class=\"o-button o-button--primary o-button--s o-button--icon-right o-button--arrow\" href=\"https:\/\/www.future-processing.com\/software-services\/software-development\/\">\n            Deliver high quality software solutions\n            <svg class='o-icon o-icon--16 o-icon--arrow '>\n            <use xlink:href='#icon-16_arrow'><\/use>\n          <\/svg>\n                            <svg class='o-icon o-icon--24 o-icon--arrow '>\n            <use xlink:href='#icon-24_arrow'><\/use>\n          <\/svg>                    <\/a>\n    <\/div>\n\n\n\n<h2 class=\"wp-block-heading\"><br>Summary<\/h2>\n\n\n\n<p>The Generic Metadata Framework provides the infrastructure necessary to build solutions and automates recurring processes, such as data collection, which considerably reduces the duration of a migration project.<br>It also offers <strong>great flexibility<\/strong> <strong>in terms of access to data as well as scalability.<\/strong><\/p>\n\n\n<div class=\"b-cta-banner m-gradient-light\">\n            <a href=\"https:\/\/www.future-processing.com\/software-services\/data-science-engineering\/\" class=\"b-cta-banner__image-container\" data-elementclick=\"article-banner\" data-elementname=\"Data Science &amp; Engineering\">\n            <img decoding=\"async\" width=\"450\" height=\"450\" src=\"https:\/\/www.future-processing.com\/blog\/wp-content\/uploads\/2021\/08\/Data-Science-Engineering.png\" class=\"attachment-full size-full\" alt=\"\" srcset=\"https:\/\/www.future-processing.com\/blog\/wp-content\/uploads\/2021\/08\/Data-Science-Engineering.png 450w, 
https:\/\/www.future-processing.com\/blog\/wp-content\/uploads\/2021\/08\/Data-Science-Engineering-300x300.png 300w, https:\/\/www.future-processing.com\/blog\/wp-content\/uploads\/2021\/08\/Data-Science-Engineering-150x150.png 150w, https:\/\/www.future-processing.com\/blog\/wp-content\/uploads\/2021\/08\/Data-Science-Engineering-400x400.png 400w, https:\/\/www.future-processing.com\/blog\/wp-content\/uploads\/2021\/08\/Data-Science-Engineering-24x24.png 24w, https:\/\/www.future-processing.com\/blog\/wp-content\/uploads\/2021\/08\/Data-Science-Engineering-48x48.png 48w, https:\/\/www.future-processing.com\/blog\/wp-content\/uploads\/2021\/08\/Data-Science-Engineering-96x96.png 96w\" sizes=\"(max-width: 450px) 100vw, 450px\" \/>        <\/a>\n    \n        <a href=\"https:\/\/www.future-processing.com\/software-services\/data-science-engineering\/\" class=\"b-cta-banner__url b-cta-banner__text-container\" data-elementclick=\"article-banner\" data-elementname=\"Data Science &amp; Engineering\">\n                    <div class=\"b-cta-banner__text\">\n                                                    <h3 class=\"f-headline-extra-big b-cta-banner__header\">\n                        Data Science &amp; Engineering                    <\/h3>\n                \n                                    <div class=\"f-paragraph\">\n                        <p>Process data, base business decisions on knowledge and improve your day-to-day operations.<\/p>\n                    <\/div>\n                \n                                    <div class=\"o-button o-button--primary o-button--s o-button--icon-right o-button--arrow\">\n                        <span>Let\u2019s work together<\/span>\n                        <svg class='o-icon o-icon--16 o-icon--arrow '>\n            <use xlink:href='#icon-16_arrow'><\/use>\n          <\/svg>                    <\/div>\n                            <\/div>\n                <\/a>\n    
<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Creating a cloud solution for data analysis is always a time-consuming, complicated, and complex process, no matter if it is based on a modern data warehouse or a data lakehouse. The implementation of such projects can be described with a simple acronym: ISASA.<\/p>\n","protected":false},"author":251,"featured_media":16893,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1989],"tags":[2057],"coauthors":[2147],"class_list":["post-20964","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-solutions","tag-generic-metadata-framework"],"acf":{"reading-time":"5 min","show-toc-sublists":false,"image":null,"logo":null,"button1":{"button1_type":"","button":null},"button2":{"button2_type":"","button":null},"person":{"person_photo":null,"person_name":"","person_position":""}},"_links":{"self":[{"href":"https:\/\/www.future-processing.com\/blog\/wp-json\/wp\/v2\/posts\/20964","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.future-processing.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.future-processing.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.future-processing.com\/blog\/wp-json\/wp\/v2\/users\/251"}],"replies":[{"embeddable":true,"href":"https:\/\/www.future-processing.com\/blog\/wp-json\/wp\/v2\/comments?post=20964"}],"version-history":[{"count":0,"href":"https:\/\/www.future-processing.com\/blog\/wp-json\/wp\/v2\/posts\/20964\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.future-processing.com\/blog\/wp-json\/wp\/v2\/media\/16893"}],"wp:attachment":[{"href":"https:\/\/www.future-processing.com\/blog\/wp-json\/wp\/v2\/media?parent=20964"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.future-processing.com\/blog\/wp-jso
n\/wp\/v2\/categories?post=20964"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.future-processing.com\/blog\/wp-json\/wp\/v2\/tags?post=20964"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.future-processing.com\/blog\/wp-json\/wp\/v2\/coauthors?post=20964"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}