Data Fusion is Google’s cloud-native, fully managed, scalable enterprise data integration platform. It brings in transactional, social, or machine data in various formats from databases, applications, messaging systems, mainframes, files, SaaS, and IoT devices; offers an easy-to-use visual interface; and provides deployment capabilities to execute data pipelines in Spark on ephemeral or dedicated Dataproc clusters. Cloud Data Fusion is powered by open source CDAP, which makes the pipelines portable across Google Cloud, hybrid, and multi-cloud environments.
Data integration capabilities
Data integration for optimized analytics and accelerated data transformations:
- Data Fusion supports a broad set of more than 200 connectors and formats, which enables you to extract and blend data. You can develop data pipelines in a visual environment to improve productivity.
- Data Fusion provides data wrangling capabilities to prepare data and lets you operationalize the wrangling steps to improve collaboration between business and IT.
- You can leverage the extensive REST API to design, automate, orchestrate, and manage the lifecycle of the pipelines.
- Data Fusion supports all data delivery modes, including batch, streaming, and real-time, making it a comprehensive platform for both batch and streaming use cases.
- It provides operational insights so that you can monitor data integration processes, manage SLAs, and optimize and fine-tune integration jobs.
- Data Fusion provides capabilities to parse and enrich unstructured data using Cloud AI, for example, converting audio files to text, applying NLP to detect sentiment, extracting features from images and documents, or converting HL7 to FHIR formats.
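As an illustration of the REST-based lifecycle management mentioned in the list above, the following minimal sketch builds the request that would start a deployed batch pipeline. It assumes the CDAP-style path `/v3/namespaces/<namespace>/apps/<pipeline>/workflows/DataPipelineWorkflow/start` and a bearer token for authorization. The endpoint value, namespace, and pipeline name are hypothetical placeholders, and the helper names are invented for this example, so check the path and authentication against the current API reference before relying on it. The sketch only constructs the URL and headers and sends nothing.

```python
from urllib.parse import quote


def pipeline_start_url(cdap_endpoint: str, namespace: str, pipeline: str) -> str:
    """Build the URL for starting a deployed batch pipeline.

    Assumption: the instance exposes the CDAP lifecycle API, where a batch
    pipeline runs as the workflow named "DataPipelineWorkflow".
    """
    base = cdap_endpoint.rstrip("/")
    return (
        f"{base}/v3/namespaces/{quote(namespace, safe='')}"
        f"/apps/{quote(pipeline, safe='')}"
        "/workflows/DataPipelineWorkflow/start"
    )


def auth_headers(access_token: str) -> dict:
    """Return the HTTP headers carrying a bearer token."""
    return {"Authorization": f"Bearer {access_token}"}


if __name__ == "__main__":
    # Hypothetical values; replace with your instance's API endpoint and token.
    url = pipeline_start_url("https://example-instance.invalid/api", "default", "my_pipeline")
    print("POST", url)
    print(auth_headers("<access-token>"))
```

Sending a `POST` to the resulting URL with those headers, using any HTTP client, would request a pipeline run. Keeping URL construction separate from the network call makes the request easy to inspect and test.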
Data Fusion builds confidence in business decision-making with advanced data consistency features:
- Data Fusion minimizes the risk of mistakes by providing structured ways of specifying transformations, data quality checks with Wrangler, and predefined directives.
- Data Fusion helps identify quality issues by keeping track of profiles of the data being integrated and enabling you to make decisions based on data observability.
- Data formats change over time. Data Fusion helps handle data drift with the ability to identify changes and customize error handling.
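To make the idea of data drift concrete, here is a generic sketch of detecting a schema change by comparing an expected schema with an observed one. It is not Data Fusion's implementation. The function names, the dictionary-based schema representation, and the `on_drift` policy values are all assumptions made for illustration.

```python
def detect_schema_drift(expected: dict, observed: dict) -> dict:
    """Compare two {field_name: type_name} schemas and report the differences."""
    expected_fields = set(expected)
    observed_fields = set(observed)
    return {
        # Fields that appear in the new data but were not expected.
        "added": sorted(observed_fields - expected_fields),
        # Fields that were expected but are missing from the new data.
        "removed": sorted(expected_fields - observed_fields),
        # Fields present in both schemas whose type changed.
        "retyped": sorted(
            name
            for name in expected_fields & observed_fields
            if expected[name] != observed[name]
        ),
    }


def handle_drift(drift: dict, on_drift: str = "fail") -> bool:
    """Apply a hypothetical error-handling policy; return True to keep processing."""
    has_drift = any(drift.values())
    if not has_drift:
        return True
    if on_drift == "fail":
        raise ValueError(f"schema drift detected: {drift}")
    # Any other policy value (for example "warn") lets processing continue.
    return True


expected = {"id": "int", "name": "string", "price": "double"}
observed = {"id": "int", "name": "string", "price": "string", "currency": "string"}
drift = detect_schema_drift(expected, observed)
print(drift)
```

Separating detection from the handling policy mirrors the two capabilities named in the list: identifying the change, and customizing what happens when it occurs.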
Metadata and modeling
Data Fusion makes it easy to gain insights with metadata:
- You can collect technical, business, and operational metadata for datasets and pipelines and easily discover metadata with a search.
- Data Fusion provides an end-to-end data view to understand the data model and to profile data, flows, and relationships of datasets.
- It enables exchange of metadata between catalogs and integration with end-user workbenches using REST APIs.
The Data Fusion data lineage feature enables you to understand the flow of your data and how it is prepared for business decisions.