LINK VALIDATION 🔍

Work in progress

Imagine a researcher who needs to integrate data on publications, authors, and education institutions. Wrongly executing this integration task could mean, for example, assigning the h-index of author A to author B, or wrongly stating that author A is affiliated with an institution they have never been involved with. Hardly any matching algorithm returns only perfect results. In light of this, to reach a solid investigation result, it is important to support human validation once links have been created and, optionally, improved.

1. Uncertainty vs. Vagueness

In logic, when referring to propositions (“the sky is blue”, “the bottle is full”, “the tomato is ripe”…), a degree of (un)certainty, belief, or confidence expresses an evaluation of the assignment of a truth value to a proposition, which may or may not qualify as vague (“the tomato is ripe”). For example, how certain are we in saying that the proposition “the tomato is ripe” is true? As [Lukasiewicz2008] illustrates, stating “John is a teacher” and “John is a student” with degrees of certainty of 0.3 and 0.7 respectively roughly amounts to saying “John is either a teacher or a student, but more likely a student”. However, the vague statement “John is tall” with an assigned degree of truth of 0.9 can roughly be read as “John is quite tall”.

The transition from assigning boolean truth values {0, 1} to assigning continuous values in the unit interval [0, 1] to propositions (events) marks the change from classical logic to fuzzy logic, which belongs to the modelling paradigm of many-valued logics. The latter truth space (the unit interval) is motivated by the presence of vague concepts in a proposition, which make it hard or sometimes even impossible to establish whether the proposition is completely true or false [Lukasiewicz2008]. Now, whatever truth value is assigned to an event, one may still wonder about the chance that the event will happen (probability) or might happen (possibility) [Kovalerchuk2017].

Degrees of truth are not to be confused with degrees of certainty/confidence: the latter is not an assignment of a truth value, as the former is, but rather an evaluation of a weighted proposition (a proposition with an assigned truth value), regardless of its truth value space ({0, 1} or [0, 1]).
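
To make the distinction concrete, the following minimal Python sketch (our own illustration; the class and field names are hypothetical and not part of any tool or standard) pairs a proposition with a degree of truth and, separately, a degree of certainty about that weighted assignment:

```python
from dataclasses import dataclass

@dataclass
class WeightedProposition:
    """A proposition carrying a degree of truth (fuzzy logic, in [0, 1])
    and an independent degree of certainty about that assignment."""
    statement: str
    degree_of_truth: float      # how true the (possibly vague) statement is
    degree_of_certainty: float  # how confident we are in that assignment

# "John is tall" is vague, so a degree of truth applies ("John is quite tall"):
tall = WeightedProposition("John is tall", degree_of_truth=0.9, degree_of_certainty=0.6)

# "John is a student" is crisp; here only the certainty of the assignment varies:
student = WeightedProposition("John is a student", degree_of_truth=1.0, degree_of_certainty=0.7)
```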


2. Voting Strategies

In the Semantic Web, the standard OWL/DL interpretation of identity between two resources entails full equality, i.e. they are necessarily the same and share all their properties. This semantics applies independently of context, and therefore no validation or dispute applies, since it does not take any context into account: things are either the same or they are not [Idrissou2017].

In real-life problems, the equality between resources may depend (i) not only on their intrinsic properties (ii) but also on the purpose or task for which they are used. For example, organisations A and B may be the same in context 1 but not the same in context 2.

If, instead of the rigid owl:sameAs standard semantics, an alternative semantics is considered in which context is taken into account, then things can be the same or not depending on the context that applies. Moreover, within a common context there can still be divergences when, for example, there is not enough information to reach a correct (indisputable) outcome. In other words, for the same identity link, multiple “truths” (multiple possible interpretations) can co-exist once we agree on the context in which these “truths” are cast.

As a result, the Lenticular Lens tool allows a single link to be both (i) established within a context and (ii) validated multiple times, and then, by extension, it allows a collection of links (linkset or lens) to carry validations by different users. Even though this is a desirable feature, before using the links for integration one needs to make a decision about their validity. This brings us to what we denote as Voting: the merging of validations sharing the same context.
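
As a rough sketch of this idea (the names below are hypothetical and do not reflect the Lenticular Lens API), a link can record the context in which it was established together with any number of validations cast within that context:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Validation:
    validator: str   # who cast the judgement
    accepted: bool   # True = ACCEPTED, False = REJECTED

@dataclass
class Link:
    source: str      # URI of the first resource
    target: str      # URI of the second resource
    context: str     # context in which the identity is asserted
    validations: List[Validation] = field(default_factory=list)

# The same pair of resources can be linked in different contexts,
# and each link can accumulate validations from different users.
link = Link("ex:organisation-A", "ex:organisation-B", context="funding-task")
link.validations.append(Validation("alice", accepted=True))
link.validations.append(Validation("bob", accepted=False))
```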

The Lenticular Lens tool provides five ways of merging validations: Accepted Once, Rejected Once, Majority, Weighted Experts, and Highest Ranking Experts. In the following subsections, we discuss these options.

2.1 Consistent Validations

This is the best scenario: a link has been validated several times with the same outcome, meaning no contradiction has occurred. The validation is consistent and therefore remains as it is.
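
A consistency check then simply tests that all validations agree; a minimal sketch with a hypothetical helper:

```python
def is_consistent(outcomes):
    """True when every validation of a link has the same outcome.
    `outcomes` is a list of booleans: True = ACCEPTED, False = REJECTED."""
    return len(set(outcomes)) <= 1

assert is_consistent([True, True, True])       # consistent: kept as is
assert not is_consistent([True, True, False])  # disputed (next section)
```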

2.2 Disputed Validations

Disputes occur when a link has been validated several times but with contradicting truth statements (inconsistently), for example when a link has been ACCEPTED three times but REJECTED twice. In this scenario, we follow a simple protocol:

  • The link with contradicting validations is flagged as DISPUTED.

  • One of the following options can then be chosen for reaching an agreement (a code sketch of all five options follows this list):

    Majority. Consider the link as ACCEPTED if it has been accepted more times than rejected; otherwise, consider it as REJECTED.

    Rejected Once. Consider the link as REJECTED if it has been rejected at least once.

    Accepted Once. Consider the link as ACCEPTED if it has been accepted at least once.

    Weighted Experts. A weighted sum is computed within each voting group (ACCEPTED vs REJECTED). The group with the highest weighted sum gets to cast its vote.

    Highest Ranking Experts. It is also possible to merge several validations on the basis of the authority with the highest rank: the decision to accept or reject the link is made solely by the highest ranking expert. Clearly, there can still be disputes among the highest ranked experts; in that case, the majority among those experts applies.

  • Regardless of the chosen option, in the event of a tie, the link is REJECTED unless stated otherwise by the user.
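
The sketch below illustrates all five options on a small list of votes. It is our own reading of the protocol, with hypothetical names, and it applies the default tie-breaking rule (a tie is REJECTED):

```python
from collections import Counter

# Each vote: (verdict, weight, rank). verdict is "ACCEPTED" or "REJECTED";
# weight and rank only matter for the expert-based strategies.
votes = [
    ("ACCEPTED", 0.9, 3),
    ("ACCEPTED", 0.4, 1),
    ("REJECTED", 0.8, 2),
]

def majority(votes):
    counts = Counter(v for v, _, _ in votes)
    # Accepted only with strictly more acceptances; a tie is REJECTED.
    return "ACCEPTED" if counts["ACCEPTED"] > counts["REJECTED"] else "REJECTED"

def rejected_once(votes):
    return "REJECTED" if any(v == "REJECTED" for v, _, _ in votes) else "ACCEPTED"

def accepted_once(votes):
    return "ACCEPTED" if any(v == "ACCEPTED" for v, _, _ in votes) else "REJECTED"

def weighted_experts(votes):
    # Weighted sum per voting group; the heavier group casts the vote.
    acc = sum(w for v, w, _ in votes if v == "ACCEPTED")
    rej = sum(w for v, w, _ in votes if v == "REJECTED")
    return "ACCEPTED" if acc > rej else "REJECTED"  # tie -> REJECTED

def highest_ranking_experts(votes):
    # Only the top-ranked experts decide; majority applies among them.
    top = max(r for _, _, r in votes)
    return majority([v for v in votes if v[2] == top])

print(majority(votes))                 # ACCEPTED (2 vs 1)
print(weighted_experts(votes))         # ACCEPTED (1.3 vs 0.8)
print(highest_ranking_experts(votes))  # ACCEPTED (the rank-3 expert accepted)
```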

3. Validation Support

3.1 Individual Validation

3.2 Group Validation

3.2.1 Linkset

3.2.2 Lens

3.2.3 Cluster-based validation

3.3 Visualisation