Each item in the lists represents a slide change

Presentation

  • Hello human, welcome to this first talk
  • I am Daniel Moreno Medina
  • Also known as photonbit, and I am going to talk about
  • Metadata
  • tu
  • tutu
  • tu
  • tutu
  • chu
  • chuchu
  • the journey

Motivations

I will present my motivations for embarking on this journey. They are:

  • love
  • meaning
  • cooperation
  • curiosity
  • freedom
  • cognition

In particular:

  • love for
  • standards, not as a definition of what things are, but as an agreed base to build upon. The first time I fell in love with a standard was in 1998, when Debian 2.0 adopted the Filesystem Hierarchy Standard (FHS) 2.0. The filesystem suddenly made sense.
  • meaning: if you have worked with me, I might have told you the sentence “it is about the meaning”. When I started programming I thought it was about telling machines what to do, but over the years I have come to think that the work is about creating shared meaning with other people, even when those other people might be a version of yourself in the future. Cooperation between scientists and software engineers.
  • I’ve seen that it is not rare to find that these two groups use almost disjoint languages to refer to the same ideas, and I want to move the language of the production systems I design towards the language of scientists.
  • curiosity, in general as a spark for life, and in particular oriented towards the three previous points: how are standards defined? How can meaning be described formally, so that different groups can agree on the definitions of the ideas and processes they work with?
  • libre open science and research, which I believe is a requirement for humanity to survive and thrive. The script for this talk, the video itself and the work presented in it will be available for everyone to access and modify.
  • last, this is the work of a cognitive artist in motion: the process of creation itself, as well as the ways of thinking about it, is part of the work.

Disclaimer

  • This presentation was not scripted from the beginning. The methodology was to create a myriad of cutouts for the different concepts that have been part of the diffuse studies over the past months. Then I played with the concepts while taking pictures, and only then I wrote the script. Many important concepts are not explained, and the journey itself is not correctly represented. Future presentations will refine the technique and fix these problems.
  • The explanations, relations and concepts might be wrong, as many of them come from intuitions or light readings across a vast amount of documentation and sources, without taking notes.

From geospatial to linked data

  • In the past years I have been working in the field of geospatial software engineering, creating systems that operate on data coming from satellites, ground stations and other sources, and producing maps or derivative data from them. I’ve developed a growing interest in some standards that form a set of building blocks for geospatial web services. Some are:
  • OGC API Common, which defines the structure that the rest of these API standards use,
  • OGC API Features, good for sharing vector information,
  • OGC API Maps, which is good for spatially referenced maps, and
  • OGC API EDR, for Environmental Data Retrieval,
  • and OGC API Records, for sharing metadata about your data.
  • All these are created by one organization, the Open Geospatial Consortium, and I found the approach particularly interesting because they all use a technology I am very familiar with,
  • OpenAPI, and this represents a huge movement towards pairing industry technologies with standard services. In order to define the service contracts, OpenAPI can use
  • JSON Schema, which is a community standard that was originally designed under the
  • Internet Engineering Task Force, although it was finally published under an independent organization. While looking at the documentation of an OGC API implementation, pygeoapi, I saw that one feature is representing the data returned by the service using
  • JSON-LD, which stands for JSON Linked Data. And that is where everything started to go mad. JSON-LD is actually a way of serializing
  • RDF, which means Resource Description Framework, another standard (a recommendation, in their terms) created by the
  • World Wide Web Consortium, originally designed as a data model for metadata that grew into a more general method to describe and exchange
  • graph data in triples.
  • There is a family of related recommendations: SHACL, the Shapes Constraint Language, serves to validate RDF data; DCAT is a metadata vocabulary; and SPARQL is a SQL-like query language for graph data in RDF.
  • Built on top of RDF, we have OWL, that is the Web Ontology Language, and we will come back to this one.
  • With RDF we can also express semantics, using RDFS, RDF Schema, which adds schema and semantics documents to RDF, but it is not as expressive as OWL. If you noticed, the squared entities represent organizations, so let’s go over the organizations that are creating metadata standards, focusing mostly on the geospatial domain.
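As a quick aside before moving on: the triple shape just mentioned can be sketched in a few lines of plain Python, with no RDF library. The `ex:` and `rdf:` names below are made-up shorthand, not resolved URIs, and the little pattern matcher only gestures at what SPARQL does:

```python
# Plain-Python sketch of RDF-style triples: (subject, predicate, object).
# The "ex:" and "rdf:" prefixes are illustrative shorthand, not real URIs.
triples = {
    ("ex:pygeoapi", "rdf:type", "ex:Software"),
    ("ex:pygeoapi", "ex:implements", "ex:OGC-API-Features"),
    ("ex:OGC-API-Features", "ex:publishedBy", "ex:OGC"),
}

def match(s=None, p=None, o=None):
    """Toy SPARQL-like pattern match: None plays the role of a variable."""
    return [t for t in sorted(triples)
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Which standards does pygeoapi implement?
print(match(s="ex:pygeoapi", p="ex:implements"))
```

The point is only that everything, data and metadata alike, reduces to the same three-part statements, which is what makes graphs of them composable.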

Organizations and their standards

  • First we have OGC, that we already mentioned,
  • And the World Wide Web consortium, that we also discussed,
  • then we have a big, serious organization whose initials are not an “oh my god” but the Object Management Group. This organization created things like UML, for example.
  • The next is ISO, the International Organization for Standardization. It is a big organization, so it has
  • different kinds of teams. Technical Committee 211 is the one in charge of creating Geographic information/Geomatics standards,
  • what we previously called geospatial. And
  • OGC is an associated partner of that Technical Committee. Some things that we know from our everyday life can actually be an ISO group, like
  • MPEG, that is the Moving Picture Experts Group. During my first stage of research I became interested in other domains, like the one covered by
  • CIDOC, whose initials come from its original name in French; today it is the documentation committee of the International Council of Museums. And finally an organization,
  • the Dublin Core Metadata Initiative, which exists to maintain a general-purpose metadata vocabulary. Some examples from each are:
  • the already mentioned OGC API Records for OGC,
  • the Data Catalog Vocabulary from the W3C,
  • the Meta Object Facility, from OMG,
  • and when it comes to ISO standards, many of them are originally created by other organizations and then adopted by ISO, like the case of ISO/IEC 19502,
  • that is actually the MOF from OMG
  • or the Training Data Markup Language for AI from OGC,
  • and ISO 19178-1,
  • which are the same.
  • From CIDOC we have the CIDOC Conceptual Reference Model, which aims to be a theoretical and practical tool for integrating information in the field of cultural heritage,
  • the ISO 21127,
  • is the name for it. Then we have ISO standards that are created directly by ISO,
  • like ISO 19115, Geographic Information Metadata, or
  • the ISO 11179-7, also called
  • MDR, Metadata Registries. From DCMI, the two standards they have are
  • Dublin Core elements, a set of fifteen elements aimed at giving an agnostic, general set of metadata categories, and
  • Dublin Core terms, which includes all the Dublin Core elements and adds more constraints to the new ones. These two are important, and they also have ISO standards associated with them. Not only can you see some of the direct interactions between organizations and standards; some standards will also explicitly use or allow the usage of other standards. For example, DCAT uses many of the DC terms in its ontology.
  • Overwhelming? Yes, but not that much if we take into account that
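As a small aside: a record built from Dublin Core terms can be as simple as a set of prefixed property-value pairs. In this hypothetical Python sketch, only the `dcterms:` property names come from the DCMI vocabulary; the values describing this talk are my own:

```python
# A metadata record using Dublin Core terms as plain prefixed keys.
# The property names (title, creator, subject, language) are real DCMI
# terms; the values are just for illustration.
record = {
    "dcterms:title": "From geospatial to linked data",
    "dcterms:creator": "Daniel Moreno Medina",
    "dcterms:subject": "metadata standards",
    "dcterms:language": "en",
}

# Every key comes from the same agreed vocabulary, which is what makes
# records from different catalogs comparable with each other.
assert all(key.startswith("dcterms:") for key in record)
```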

A definition of metadata

  • Metadata is all. But, seriously, what is metadata then? Most of us have a notion of metadata from general usage: the information that is stored in a picture when we take it, like the camera used, the location, the resolution; or the details of a song, the album it belongs to, the artist, etc. And we think in those terms: it is just some pairs of additional information. But in reality, metadata is involved every time we have to
  • define a thing, what properties that thing has, and what makes it different
  • from another thing,
  • and how those things are related. And those things can be
  • anything: a material object like a sculpture, a technique for drawing, a school of thought. I am starting to think that programming is mostly about metadata. Even data engineers, when they observe the data, search for patterns, apply transformations and extract new information. All those processes are metadata, and the more of them we are able to describe formally as such, the more explainable a system will be, so
  • humans working with that system will be happy and also
  • robots working with that system will be happy.
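The two notions above, metadata as mere pairs versus metadata as things, their properties and the relations between them, can be contrasted in a tiny Python sketch; all names and values here are invented for illustration:

```python
# The everyday notion: metadata as key-value pairs attached to one thing.
photo_pairs = {"camera": "X100", "location": "Granada", "width_px": 4000}

# The broader notion: things, their properties, and relations BETWEEN
# things, written as (thing, property, value-or-thing) facts.
facts = [
    ("photo-42", "taken-with", "camera-7"),
    ("camera-7", "model", "X100"),
    ("photo-42", "taken-in", "Granada"),
]

def value_of(thing, prop):
    """Look up one property of one thing."""
    return next(v for s, p, v in facts if s == thing and p == prop)

# Relations let us hop between things: photo -> camera -> camera's model,
# something a flat dictionary of pairs cannot express.
camera = value_of("photo-42", "taken-with")
print(value_of(camera, "model"))
```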

RDF, OWL, graphs and description

  • When we say that Thing IS Anything, we are creating an
  • RDF triple, that is defined as …, and I believe that studying
  • graph theory is interesting to see how well-behaved these graphs are. If we add more constraints to the meaning, we can no longer do it using RDF alone, so
  • OWL comes into play. OWL is formally based on Description Logic. Description Logic is actually a family of logic-based knowledge representation formalisms designed to balance expressivity (being able to model rich concepts and relationships) and decidability (being able to guarantee that reasoning will always end, with an answer). OWL adopts subsets of DL as its formal foundation, which means that ontologies written in OWL can be reasoned over using DL reasoners. This allows tasks such as checking consistency, inferring new relationships, classifying concepts, and answering queries.
  • Metadata
  • is
  • the journey
  • metadata is
  • all. Let’s see now a more practical usage of this
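One of the DL reasoning tasks mentioned above, classifying concepts, can be sketched as a transitive closure over SubClassOf axioms. This toy Python version uses made-up class names and only stands in for what a real DL reasoner does far more generally:

```python
# Toy subsumption reasoning: given SubClassOf axioms as (sub, super)
# pairs, infer ALL superclasses of a class by transitive closure.
# Class names are invented for the example.
axioms = {
    ("Sculpture", "MaterialObject"),
    ("MaterialObject", "Thing"),
    ("Technique", "Thing"),
}

def superclasses(cls):
    """Every class that `cls` is subsumed by, direct or inferred."""
    result, frontier = set(), {cls}
    while frontier:
        step = {sup for (sub, sup) in axioms if sub in frontier}
        frontier = step - result   # only follow classes not yet seen
        result |= step
    return result

# Sculpture is a Thing even though no axiom says so directly.
print(superclasses("Sculpture"))
```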

A better flow for AI

  • Imagine a human, or a group of humans, that are crafting
  • one or more ontologies using description logic. Those can be used to do a better
  • context engineering in a knowledgeable AI agentic system, and then return the information
  • using the same ontologies. In the future, or maybe it is already happening, depending on how critical a problem to solve is,
  • the agentic systems will also be creating the ontologies, and it will be the role of the human team to supervise and agree on the meaning of those ontologies, making it also easier for a mix of humans and robots to digest the output. There are a lot of concepts, relations and implications in this presentation. That is why I created
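Before moving on, here is a minimal sketch of that flow, assuming invented ontology terms and a deliberately naive validation step; a real agentic system would be far richer:

```python
# An invented mini-ontology fixing the vocabulary the agent may use.
ONTOLOGY = {"classes": {"Dataset", "Sensor"}, "properties": {"producedBy"}}

def build_context(question):
    """Context engineering: the prompt is built from the ontology's terms."""
    terms = sorted(ONTOLOGY["classes"] | ONTOLOGY["properties"])
    return "Answer using only these terms: " + ", ".join(terms) + "\n" + question

def validate(answer_triples):
    """Accept the agent's answer only if every predicate is in the ontology."""
    return all(p in ONTOLOGY["properties"] for _, p, _ in answer_triples)

# A well-formed reply stays inside the shared vocabulary...
assert validate([("map-2024", "producedBy", "sentinel-2")])
# ...and one that drifts outside it is rejected.
assert not validate([("map-2024", "inventedProp", "x")])
```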

The Crucible

  • the
  • crucible: an open knowledge repository created to trace, follow and plan studies and projects, and to serve as a living metadata architecture laboratory. In the beginning I was thinking that I am working on some
  • projects that cover several
  • domain knowledge areas and use
  • some tools. Then I was thinking about how they interact, and I came up with the ideas of
  • learning pair and
  • learning modes. These learning modes can be diffuse, focused or deep.
  • a learning pair connects a project, a domain knowledge area,
  • or a tool with
  • a learning mode.
  • Then we have another “thing” that is a source, and a source
  • is related to a particular learning pair. What is a source? It can be
  • many things: a book, a course, a website, a standard. Then, while going through each of those sources, we can create fragments, which can be notes, quotes, summaries, or even the script for this talk. One of the sources is a standard, and we already know that standards are associated with
  • organizations, and also that learning pairs can be associated with
  • projects and domain knowledge areas, like
  • machine learning,
  • description logic
  • geospatial, human learning. And the organizations can be
  • W3C, OGC,
  • ISO
  • CIDOC. And the standards are many,
  • many, many. And as you can see, most of the items in this talk correspond to a category inside the crucible; they are, or will be, text notes inside this knowledge repository. But a graph like this, so interconnected, is messy and hard to digest. That is why I created another “thing” that I called
  • “index”
  • which helps to make linear content or stories using other documents
  • or fragments of them, creating more human digestible content. Finally,
  • these are all the types of things currently in the Crucible. The metadata architecture applied to the crucible is purposely badly chosen and does not adhere to any standard, so it can evolve as the studies continue. Chan tata taaa ta t ataaaa
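As an aside, the “things” described above can be sketched as Python dataclasses; the field names are my own guesses for illustration, not the Crucible’s actual schema:

```python
# A guessed sketch of the Crucible's data model: learning pairs connect
# a subject (project, domain knowledge area, or tool) with a learning
# mode; sources attach to pairs; fragments come from sources.
from dataclasses import dataclass

@dataclass
class LearningPair:
    subject: str      # a project, domain knowledge area, or tool
    mode: str         # "diffuse", "focused" or "deep"

@dataclass
class Source:
    name: str         # a book, a course, a website, a standard...
    pair: LearningPair

@dataclass
class Fragment:
    source: Source
    kind: str         # note, quote, summary, script...
    text: str

pair = LearningPair("description logic", "diffuse")
frag = Fragment(Source("OWL 2 Primer", pair), "note",
                "DL balances expressivity and decidability")
```

Following the links from a fragment back to its learning pair is exactly the kind of traversal the graph view of the Crucible makes explicit.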

Images