Modern data catalogs must be able to scale up as demand for data and knowledge grows within the enterprise. But not every catalog has that capacity – many can only scale out to multiple instances.
Data and analytics ecosystems are evolving at an amazing pace. New data and analytics tools, systems, and assets come online daily, and the companies we work for are complicated, agile, and fast-moving.
As this happens, your data catalog needs to adapt and change so it can represent every part of your business and paint the most complete picture of your analytics ecosystem.
It’s not uncommon for data.world customers to track thousands of metadata attributes across dozens of business units with hundreds of thousands of data assets each.
As data.world Chief Product Officer Jon Loyens wrote for Towards Data Science, “If you impose limits on what gets cataloged, you risk losing potentially critical context for your data.”
Three hallmarks of an extensible data catalog
In the context of a data catalog, extensibility is the platform’s ability to quickly and easily catalog new data sources without overhauling the underlying metadata models or configuration and forcing a redeployment of infrastructure. Your data catalog should be able to absorb new information about your data and analytics ecosystem, or represent new lines of business, without costly re-engineering. Here are the three hallmarks of an extensible data catalog.
Cloud-native architecture
As we recently wrote, if you want supreme flexibility, your data catalog needs to be cloud native. Traditionally built and deployed catalogs typically require software to be set up and/or hardware to be provisioned (either on-prem or by your cloud provider). They’re also notoriously slow to update, requiring extensive migrations when new versions appear.
Cloud-native data catalogs are fully managed, ensuring you get the latest version as soon as possible with zero migration downtime. At data.world, we release more than 1,000 updates to our platform annually. Everything from small bug fixes to major feature releases is available to everyone – no waiting, no worry. You also don’t have to plan your catalog usage around scheduled downtime... because there isn’t any.
Powered by a knowledge graph
In 2012, Demian Hess published an article in the Journal of Digital Media Management that stated:
“Digital asset metadata cannot be represented by a single, unchanging metadata model and schema, because the metadata are too variable, complex, and change too rapidly. Data architects need to embrace flexible models that allow metadata to vary across asset types and that can accommodate changes to the underlying schema.”
He concluded that although flexible models can be implemented with traditional relational databases, the best way to achieve the desired result is to leverage graph technology, like a knowledge graph.
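To make that point concrete, here is a minimal sketch (in Python with the open-source rdflib library, not data.world’s own implementation) of how a graph model lets metadata vary by asset type. The vocabulary and property names are hypothetical; the idea is that a table and a dashboard can carry entirely different attributes in the same graph, and adding a new attribute or asset type means adding new triples rather than migrating a schema.

```python
# Minimal sketch: flexible, per-asset metadata in a knowledge graph (rdflib).
# The CAT namespace and its property names are hypothetical, for illustration only.
from rdflib import Graph, Namespace, Literal, RDF

CAT = Namespace("https://example.org/catalog/")  # hypothetical catalog vocabulary
g = Graph()

# A relational table cataloged with one set of attributes...
g.add((CAT.orders_table, RDF.type, CAT.DatabaseTable))
g.add((CAT.orders_table, CAT.rowCount, Literal(1_250_000)))
g.add((CAT.orders_table, CAT.steward, Literal("Finance business unit")))

# ...and a dashboard cataloged with a different set, in the same graph.
# No schema migration or redeployment: new asset types and attributes are just new triples.
g.add((CAT.revenue_dashboard, RDF.type, CAT.Dashboard))
g.add((CAT.revenue_dashboard, CAT.refreshCadence, Literal("hourly")))
g.add((CAT.revenue_dashboard, CAT.sourcedFrom, CAT.orders_table))

# A single SPARQL query still works across the heterogeneous assets.
for row in g.query("SELECT ?asset ?type WHERE { ?asset a ?type }"):
    print(row.asset, row.type)
```

In a relational design, each new attribute or asset type would typically mean altering tables or maintaining sparse, ever-widening schemas; in the graph, the model grows simply by asserting new facts.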