Understanding Matching Results ¶
Work in progress
This section addresses the problem of understanding the results of the Matching Algorithms and Matching Methods supported in the Lenticular Lens tool, and how to correctly combine them. The former corresponds to a set of rules followed by a computer for finding pairs of matching resources. The latter applies the former to generate matching results: one is expected to (i) choose a similarity algorithm, (ii) provide the required input (datasets, entity-type and property-value restrictions, matching properties…) and (iii) provide the conditions (threshold) under which a matched pair is output along with its score/weight. Matching methods involving a single entity-matching algorithm are distinguished from those involving more than one: simple versus complex methods (not to be confused with the complexity of the underlying matching algorithm). The latter, naturally, requires the results to be combined.
Although combining matching results may seem relatively easy, deciding on the final score and the annotation of links with weights requires some extra thought. First of all, it requires deciding what a matching score is about: degree of truth or degree of confidence? In other words, is this a problem of Vagueness, of Uncertainty, or of both? (see Sect. 1). Answering the latter question opens the doors to Section 2, where we unveil our take on “how to combine scores?”. In particular, we distinguish the problems of (i) “how to combine degrees of truth?”, (ii) “how to combine degrees of confidence?” and finally (iii) “how/when to transition from degree of truth to degree of confidence?”.
1. Vagueness vs. Uncertainty¶
This section addresses the issue of “how to interpret the various scores in the process of entity matching?” by digging into “what are vagueness and degrees of truth, and how are they different from uncertainty and degrees of confidence?”. Roughly, a truth value or degree of truth (the tomato is ripe with degree of truth 0.7) is not to be confused with a degree of certainty / confidence (I am 0.9 certain that the tomato is ripe with degree of truth 0.7): the latter is not an assignment of a truth value, as the former is, but rather an evaluation of a weighted proposition regardless of its truth value space ({0, 1} or [0, 1]).
In the next subsections, the distinction between Vagueness and Uncertainty is outlined based on the interpretation we associate with the output scores of similarity algorithms and matching methods.
1.1 Scores of similarity algorithms¶
Vagueness & Degrees of Truth Similarity algorithms (Levenshtein, Soundex, Cosine…) meant for computing the overlap between a pair of input arguments (property-value overlap) output values in the unit interval [0, 1], or values convertible to the unit interval via normalisation. These values are truth values / degrees of truth. For example, the input arguments “Rembrand van Rijn” and “Rembrandt Harmensz van Rijn” have a similarity degree of truth of 0.63 using the Levenshtein algorithm and of 0.74 using the Soundex algorithm. Transitioning from assigning boolean truth values {0, 1} to assigning continuous truth values in the unit interval [0, 1] to propositions (events) means moving from classical logic to fuzzy logic, which belongs to the modelling paradigm of many-valued logics. The latter truth space (the unit interval) is motivated by the presence of vague concepts in a proposition (the use of ripe in the proposition “the tomato is ripe”), making it hard, or sometimes even impossible, to establish whether the proposition is completely true or false [Lukasiewicz2008].
1.2 Scores of matching methods¶
Uncertainty & Degree of confidence Whatever truth value is assigned to a proposition or event, one may sometimes wonder about the likelihood that the event will occur or has occurred [Kovalerchuk2017]. This reflects uncertainty regarding the statement due to, for example, a lack of information. In these cases, theories such as Probabilistic or Possibilistic Logic can be considered for evaluating the likelihood of a proposition. In other words, the use of a degree of (un)certainty, belief or confidence emphasises a confidence evaluation of the assignment of a truth value to a proposition, which may or may not qualify as vague (the tomato is ripe, the sky is blue, the bottle is full…). For example, how certain are we in asserting that the proposition “the tomato is ripe” is true? As [Lukasiewicz2008] illustrates, asserting “John is a teacher” and “John is a student” with respectively 0.3 and 0.7 as degrees of certainty is roughly saying that “John is either a teacher or a student, but more likely a student”. However, the vague statement “John is tall” with 0.9 as degree of truth can be roughly translated as “John is quite tall” rather than as “John is likely tall”.
Entity matching methods^{1} output links optionally annotated with matching scores reflecting (i) the level of confidence of a method (the strength of the evidence) for assimilating a pair of resources as coreferents and (ii) the lower boundary above which two resources can be viewed as coreferents. For example, given the resources $e_1$ and $e_2$ respectively labelled Rembrand van Rijn and Rembrandt Harmensz van Rijn, $e_1$ and $e_2$ can be linked with a matching score of 0.63 using the Levenshtein algorithm, provided that it is acceptable to do so at a matching score above 0.60 (more details on how this confidence score can be calculated are provided in Section 2.3).
2. Combination of the Scores¶
Even though degrees of truth (scores of similarity algorithms) and degrees of confidence (scores of matching methods) are not to be confused, they may be combined (multiple truth values or multiple degrees of confidence) or they may be subject to transition. In fact, it may be almost unthinkable to solve real-life problems without doing so. Consider the famous abductive reasoning example of the duck test, where something probably is a duck if it (i) looks like a duck, (ii) swims like a duck, and (iii) quacks like a duck. This first requires combining the truth values associated with propositions (i), (ii) and (iii) as a conjunction (see Sect. 2.1), followed by a transition into a confidence value concluding that it is probably a duck (see Sect. 2.3).
2.1 Combining Truth Values¶
One thing we know and agree on is that similarity algorithms output boolean or fuzzy truth values in the range [0, 1]. This allows us to make use of the combination functions offered by classical logic (Subsection 2.1.1) or fuzzy logic (Subsection 2.1.2), depending on whether we expect the solution space to be {0, 1} or [0, 1].
2.1.1 Classic Logic¶
The two standard logic operators or combination functions traditionally used are the classical Boolean Conjunction ($\land$) and Disjunction ($\lor$). The former takes the minimum strength and the latter the maximum strength. This applies to both classic values (True 1 or False 0) and fuzzy values (between 0 and 1).
Since the results from matching methods are assigned fuzzy values in the interval ]0, 1], the table below illustrates the default behaviour of the Lenticular Lens when combining them.
Example 18: Standard logic operations over conjunction (min) and disjunction (max).
| Source | Target | Levenshtein | Soundex | OR (max) | AND (min) |
|---|---|---|---|---|---|
| Jasper Cornelisz. Lodder | Jaspar Cornelisz Lodder | 0.92 | 1.00 | 1.00 | 0.92 |
| Rembrand van Rijn | Rembrandt Harmensz van Rijn | 0.63 | 0.74 | 0.74 | 0.63 |
| Barent Teunis | Barent Teunisz gen. Drent | 0.52 | 0.47 | 0.52 | 0.47 |
2.1.2 Fuzzy Logic¶
As presented in the next subsections, more sophisticated combination functions such as T-norms ($\otimes$) and S-norms ($\oplus$), developed by scholars such as Łukasiewicz, Gödel, Goguen, Zadeh and others, can also be used as alternatives for Boolean Conjunction ($\land$) and Disjunction ($\lor$) respectively.
2.1.2.1 T-norms¶
Six different operations can be applied when dealing with methods combined by Conjunction. We present them here:

- Minimum t-norm

  $⊤_{min} (a, b) = min(a, b) \tag{1}$

- Product t-norm

  $⊤_{prod} (a, b) = a \cdot b \tag{2}$

- Łukasiewicz t-norm

  $⊤_{Luk} (a, b) = max(0, a + b - 1) \tag{3}$

- Drastic t-norm

  $⊤_D(a, b) = \begin{cases} b &\text{if } a = 1 \\ a &\text{if } b = 1 \\ 0 &\text{otherwise} \end{cases} \tag{4}$

- Nilpotent minimum

  $⊤_{nM}(a, b) = \begin{cases} min(a, b) &\text{if } a + b > 1 \\ 0 &\text{otherwise} \end{cases} \tag{5}$

- Hamacher product

  $⊤_{H_0}(a, b) = \begin{cases} 0 &\text{if } a = b = 0 \\ \dfrac{ab}{a + b - ab} &\text{otherwise} \end{cases} \tag{6}$
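The six T-norms above can be sketched in Python as follows (the function names are ours for illustration and not part of the Lenticular Lens API):

```python
def t_min(a, b):        # (1) minimum t-norm
    return min(a, b)

def t_prod(a, b):       # (2) product t-norm
    return a * b

def t_luk(a, b):        # (3) Łukasiewicz t-norm
    return max(0.0, a + b - 1.0)

def t_drastic(a, b):    # (4) drastic t-norm
    return b if a == 1 else a if b == 1 else 0.0

def t_nilpotent(a, b):  # (5) nilpotent minimum
    return min(a, b) if a + b > 1 else 0.0

def t_hamacher(a, b):   # (6) Hamacher product
    return 0.0 if a == b == 0 else (a * b) / (a + b - a * b)

# "Rembrand van Rijn" vs "Rembrandt Harmensz van Rijn": a = 0.63, b = 0.74
print(round(t_hamacher(0.63, 0.74), 3))  # 0.516
```

Running all six on the pair (0.63, 0.74) reproduces the second row of the table below.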
The following table provides three case studies to illustrate the application of each of the aforementioned T-norm binary operations. They are presented in order from the least strict ($⊤_{min}$) to the most strict ($⊤_{D}$).
| Source / Target | Levenshtein, Soundex | $⊤_{min}$ | $⊤_{H_0}$ | $⊤_{prod}$ | $⊤_{nM}$ | $⊤_{Luk}$ | $⊤_{D}$ |
|---|---|---|---|---|---|---|---|
| Jasper Cornelisz. Lodder / Jaspar Cornelisz Lodder | 0.92, 1.00 | 0.920 | 0.920 | 0.920 | 0.920 | 0.920 | 0.920 |
| Rembrand van Rijn / Rembrandt Harmensz van Rijn | 0.63, 0.74 | 0.630 | 0.516 | 0.466 | 0.630 | 0.370 | 0.000 |
| Barent Teunis / Barent Teunisz gen. Drent | 0.52, 0.47 | 0.470 | 0.328 | 0.244 | 0.000 | 0.000 | 0.000 |
2.1.2.2 S-norms¶
Six different operations can also be applied when dealing with methods combined by Disjunction. We present them here:

- Maximum s-norm

  $⊥_{max} (a, b) = max(a, b) \tag{7}$

- Probabilistic sum

  $⊥_{sum} (a, b) = a + b - a \cdot b \tag{8}$

- Bounded sum

  $⊥_{Luk} (a, b) = min(a + b, 1) \tag{9}$

- Drastic s-norm

  $⊥_D(a, b) = \begin{cases} b &\text{if } a = 0 \\ a &\text{if } b = 0 \\ 1 &\text{otherwise} \end{cases} \tag{10}$

- Nilpotent maximum

  $⊥_{nM}(a, b) = \begin{cases} max(a, b) &\text{if } a + b < 1 \\ 1 &\text{otherwise} \end{cases} \tag{11}$

- Einstein sum

  $⊥_{H_2} (a, b) = \dfrac{a + b}{1 + ab} \tag{12}$
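The six S-norms can be sketched in the same way (again, the function names are ours for illustration):

```python
def s_max(a, b):        # (7) maximum s-norm
    return max(a, b)

def s_sum(a, b):        # (8) probabilistic sum
    return a + b - a * b

def s_luk(a, b):        # (9) bounded sum
    return min(a + b, 1.0)

def s_drastic(a, b):    # (10) drastic s-norm
    return b if a == 0 else a if b == 0 else 1.0

def s_nilpotent(a, b):  # (11) nilpotent maximum
    return max(a, b) if a + b < 1 else 1.0

def s_einstein(a, b):   # (12) Einstein sum
    return (a + b) / (1 + a * b)

# "Rembrand van Rijn" vs "Rembrandt Harmensz van Rijn": a = 0.63, b = 0.74
print(round(s_sum(0.63, 0.74), 3))  # 0.904
```

Running all six on the pair (0.63, 0.74) reproduces the second row of the table below.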
| Source / Target | Levenshtein, Soundex | $⊥_{D}$ | $⊥_{Luk}$ | $⊥_{H_2}$ | $⊥_{sum}$ | $⊥_{nM}$ | $⊥_{max}$ |
|---|---|---|---|---|---|---|---|
| Jasper Cornelisz. Lodder / Jaspar Cornelisz Lodder | 0.92, 1.00 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| Rembrand van Rijn / Rembrandt Harmensz van Rijn | 0.63, 0.74 | 1.000 | 1.000 | 0.934 | 0.904 | 1.000 | 0.740 |
| Barent Teunis / Barent Teunisz gen. Drent | 0.52, 0.47 | 1.000 | 0.990 | 0.796 | 0.746 | 0.520 | 0.520 |
2.1.3 Examples¶
Suppose that two data items E1 and E2 carry the following information:

- E1
  - Name: Titus Rembrandtsz. van Rijn
  - Mother: Saskia Uylenburgh
  - Father: Rembrand van Rijn
  - Parents’ marriage date: 1644-06-22
- E2
  - Name: T. Rembrandtszoon van Rijn
  - Mother: Saske van Uijlenburg
  - Father: Rembrandt Harmensz van Rijn
  - Baptism date: 1641-09-22
To interpret E1 and E2 as representing coreferent persons, the following four tests are proposed.
Test 1: OR
Here, the names of E1 and E2 are to be compared using the Levenshtein and Soundex algorithms at a threshold of at least 0.7.
MATCHING RESULTS

- Levenshtein(Titus Rembrandtsz van Rijn, T. Rembrandtszoon van Rijn) => 0.73 ✅
- sdx_1 = Soundex(Titus Rembrandtsz van Rijn) = T320 R516 V500 R250
- sdx_2 = Soundex(T. Rembrandtszoon van Rijn) = T000 R516 V500 R250
- Levenshtein(sdx_1, sdx_2) => 0.89 ✅

DISJUNCTION RESULTS

- names similarity = Snorm(0.73, 0.89, 'MAXIMUM') => 0.89 ✅
- names similarity = Snorm(0.73, 0.89, 'PROBABILISTIC') => 0.97 ✅
Test 2: AND
Names of the postulated mothers and fathers are to be similar at a threshold of at least 0.6 using the Levenshtein algorithm.
MATCHING RESULTS

- Levenshtein(Saskia Uylenburgh, Saske van Uijlenburg) => 0.65 ✅
- Levenshtein(Rembrand van Rijn, Rembrandt Harmensz van Rijn) => 0.63 ✅

CONJUNCTION RESULTS

- Parents' names similarity = t_norm(0.65, 0.63, 'MINIMUM') => 0.63 ✅
- Parents' names similarity = t_norm(0.65, 0.63, 'HAMACHER') => 0.47 ❌
Test 3
The period between the parents’ marriage date on the one side and the child’s baptism date on the other side is to be no more than 25 years.
MATCHING RESULTS

- Delta(1668-02-28, 1669-03-22, 25) => 1.00 ✅
Test 4: AND
Combining all three tests above with a conjunction fuzzy operator should result in a similarity score above or equal to 0.8.

FINAL CONJUNCTIONS WITH A TRUTH VALUE LIST OF [0.850, 0.63, 1]

- t_norm_list([0.850, 0.63, 1], 'MINIMUM') => 0.63 ❌
- t_norm_list([0.850, 0.63, 1], 'HAMACHER') => 0.58 ❌
- t_norm_list([0.850, 0.63, 1], 'PRODUCT') => 0.56 ❌
- t_norm_list([0.850, 0.63, 1], 'NILPOTENT') => 0.63 ❌
- t_norm_list([0.850, 0.63, 1], 'LUK') => 0.52 ❌
- t_norm_list([0.850, 0.63, 1], 'DRASTIC') => 0.00 ❌

CONJUNCTIONS WITH A DIFFERENT LIST OF TRUTH VALUES [0.89, 0.82, 1]

- t_norm_list([0.89, 0.82, 1], "MINIMUM") => 0.82 ✅
- t_norm_list([0.89, 0.82, 1], "HAMACHER") => 0.74 ❌
- t_norm_list([0.89, 0.82, 1], "PRODUCT") => 0.73 ❌
- t_norm_list([0.89, 0.82, 1], "NILPOTENT") => 0.82 ✅
- t_norm_list([0.89, 0.82, 1], "LUK") => 0.71 ❌
- t_norm_list([0.89, 0.82, 1], "DRASTIC") => 0.0 ❌

EXAMPLE USING MORE THAN ONE FUZZY LOGIC OPERATOR

- Ops.t_norm(Ops.t_norm(0.850, 0.63, 'HAMACHER'), 1, 'MINIMUM') => 0.57 ❌
- Ops.t_norm(Ops.t_norm(0.850, 0.63, 'MINIMUM'), 1, 'HAMACHER') => 0.63 ❌
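The `t_norm_list` calls above can be read as a left fold of a binary T-norm over the list of truth values. A minimal sketch, assuming the helper names used in this example rather than an actual Lenticular Lens API:

```python
from functools import reduce

def t_hamacher(a, b):
    # Hamacher product t-norm (eq. 6)
    return 0.0 if a == b == 0 else (a * b) / (a + b - a * b)

def t_norm_list(values, t_norm):
    # fold the binary t-norm left-to-right over the truth values
    return reduce(t_norm, values)

print(round(t_norm_list([0.89, 0.82, 1.0], min), 2))         # 0.82
print(round(t_norm_list([0.89, 0.82, 1.0], t_hamacher), 2))  # 0.74
```

Since T-norms are associative, the fold order does not affect the result.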
Conclusion: Given the evidence provided for E1 and E2 and the rules described above, the interpretation resulting from the chosen fuzzy logic operations leads to the conclusion that there is insufficient evidence to infer that the underlying data items are coreferent. This rejection is mainly due to the low similarity of the parents’ names. If the resulting similarity were above 0.8, the data items would stand a better chance of being coreferent. Keep in mind that our conjectured rule asserts an identity relation only when the combination of scores is above 0.8. Better data or a more advanced algorithm could have helped.
2.2 Combining Confidence Values¶
Understanding how to combine uncertain events starts with a better understanding of uncertainty itself. Sentz et al. provide an important distinction between two kinds of uncertainty: Aleatory (objective uncertainty, originating from random behaviour) and Epistemic (subjective uncertainty, originating from ignorance or lack of knowledge).
Whereas traditional probability is clearly applicable to Aleatory Uncertainty, researchers argue that it is unable to deal with Epistemic Uncertainty. In short, this is because the latter implies neither knowing the probability of all relevant events, nor their uniform distribution, nor even the axiom of additivity (i.e. all probabilities summing up to 1). This has led to the emergence of more general representations of uncertainty as alternatives to traditional probability theory, such as imprecise probabilities, possibility theory and evidence theory. Nonetheless, at present, there is no clear best representation of uncertainty [Sentz2002].
This section introduces alternative representations of uncertainty that are planned to be implemented in the Lenticular Lens. Although the choice among them can ultimately be left to the user, we consider the problem of coreference search by applying multiple matching methods to be a case of Epistemic Uncertainty, nicely approached in evidence theory.
2.2.1 Probabilistic Logic¶
Using probability to combine confidence values with the logic operators “AND” and “OR” in the context of link manipulation can, in theory, be done with equations (13) and (14) respectively, under the strong assumption that the events to be combined are independent (the occurrence of one event has no effect on the probability of the occurrence of the other).
$P(\text{A and B}) = P(A) \: \cdotp P(B) \tag{13}$
$P(\text{A or B}) = P(A) + P(B) - P(\text{A and B}) \tag{14} \\ \footnotesize \text{where } P(\text{A and B}) = 0 \text{ if A and B are mutually exclusive events,} \\ \text{ meaning that these events have no outcomes in common.}$
On the one hand, assuming that events A and B are independent, equations (13) and (14) are essentially $\otimes$ $\big(⊤_{prod}(a, b) = a \cdot b\big)$ and $\oplus$ $\big(⊥_{sum}(a, b) = a + b - a \cdot b\big)$ of Product Logic, and hence straightforward to implement when in need of manipulating links, for example when applying an intersection or a union operator to two sets of links.
On the other hand, in the event that A and B are not independent, the value of $P(\text{A and B})$ should be observed as $\frac{n(A\text{ and } B)}{n(Sample)}$, provided, or computed using conditional probability as in Equation 15.
$P(\text{A and B}) = P(A) \cdot P(B \mid A) \tag{15}$
In the context of the Lenticular Lens, the computed confidence values are independent (the computation of a confidence value by method 1 has no effect on the computation of a confidence value by method 2). Consequently, Equation 15 is not applicable.
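A minimal sketch of equations (13) and (14) under the independence assumption (the function names are ours for illustration):

```python
def p_and(pa, pb):
    # eq. (13): joint probability of two independent events
    return pa * pb

def p_or(pa, pb):
    # eq. (14): P(A) + P(B) - P(A and B), with P(A and B) = P(A).P(B)
    # under independence
    return pa + pb - p_and(pa, pb)

# these coincide with the product t-norm and the probabilistic sum
print(round(p_or(0.63, 0.74), 3))  # 0.904
```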
2.2.2 Possibilistic Logic¶
Possibility^{2} is compositional with respect to the union operator as the possibility of the union is deducible from the possibility of each component. Note however that it is not compositional with respect to the intersection operator.
$pos(A \cup B) = max(pos(A), \: pos(B)) \:\:\:\: \text{\footnotesize for any subsets A and B} \tag{16}$
$pos(A \cap B) \leq min(pos(A), \: pos(B)) \leq max(pos(A), \: pos(B)) \tag{17}$
2.2.3 Evidence Theory¶
Evidence theory also provides different ways of combining uncertain scores. We present two of them here: starting with the first proposal, namely Dempster-Shafer, which is shown to have limitations, followed by a more accepted approach called averaging.
2.2.3.1 Dempster-Shafer¶
Combining or aggregating confidence values associated with evidence is made possible by the Dempster-Shafer conjunctive combination rule, presented in Equation 18. Here too, the assumption of independence among the sources providing supporting or conflicting assessments for the same frame of discernment [Sentz2002] is of key importance, and is the basic assumption underlying the Dempster-Shafer combination rule. However, as pointed out by [Zadeh, 1984], a crucial context-dependent limitation of this rule arises in cases of significant conflict: the denominator of the combination rule has the effect of completely ignoring conflict while the numerator emphasises agreement, thereby yielding counterintuitive results.
$m_{12}(A) = (m_1 \oplus m_2)(A) = \footnotesize{\frac{\text{supporting evidence}}{1 - \text{conflicting evidence}}} = \frac{ \displaystyle\sum_{B \cap C = A \ne \emptyset} m_1(B) m_2(C) }{1 - \displaystyle\sum_{B \cap C = \emptyset} m_1(B) m_2(C) } \tag{18}$
$\scriptsize \text{where } \begin{cases} m &\text{Basic probability assignment (bpa): a function from the power set } P(X) \text{ to } [0, 1]. \\ m(A) &\text{The bpa value or mass of a given set A (but not of any particular subset of A)}. \\ m_1 \text{, } m_2 &\text{Two given basic probability assignments}. \\ m_{12}(A) &\text{The combination, a.k.a. the joint } m_{12}. \\ m(\emptyset) = 0 &\text{The mass of the empty set is zero.} \\ \displaystyle\sum_{A\in P(X)} m(A) = 1 &\text{The masses of all the members of the power set add up to a total of 1.} \end{cases}$
This inconsistency is highlighted in Fig. 1: Patient Diagnosis (1), where the joint agreement on the patient’s condition results in 1 using Dempster’s combination rule, even though the doctors agreed that the patient is less likely to suffer from a brain tumour. Had it been the opposite scenario, as in Fig. 1: Patient Diagnosis (2) (Dr. Green and Dr. House assigning 0.99 as the basic probability of the patient suffering from a brain tumour), the result $\scriptsize m_{12}(brainTumor)=1$ would be consistent with our intuition.
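To make the rule concrete, the following sketch implements Equation 18 over mass functions represented as dictionaries keyed by frozensets, and reproduces the counterintuitive outcome of the two-doctor scenario (the diagnosis labels here are illustrative assumptions; Fig. 1 may use different ones):

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster-Shafer conjunctive combination rule (eq. 18)."""
    joint = {}
    conflict = 0.0
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:  # supporting evidence: focal elements that intersect
            joint[inter] = joint.get(inter, 0.0) + mass_b * mass_c
        else:      # conflicting evidence: disjoint focal elements
            conflict += mass_b * mass_c
    # normalise by (1 - conflict), which ignores all conflicting mass
    return {a: v / (1.0 - conflict) for a, v in joint.items()}

# Two doctors who both consider a brain tumour very unlikely (0.01)
m1 = {frozenset({"meningitis"}): 0.99, frozenset({"brain tumour"}): 0.01}
m2 = {frozenset({"concussion"}): 0.99, frozenset({"brain tumour"}): 0.01}
print(dempster_combine(m1, m2))  # brain tumour gets a joint mass of ~1.0
```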
2.2.3.2 Averaging¶
Many alternatives to Equation 18 have been proposed by scholars such as Yager (modified Dempster’s rule), Inagaki (modified combination rule), Zhang (center combination rule), Dubois and Prade (disjunctive consensus rule), Ferson and Kreinovich (averaging), etc. One particular approach called Averaging (Equation 19) is considered to produce better outcomes [Choi2009] and is therefore more likely to be implemented in the Lenticular Lens for combining confidence values, for example in the lens operations union and intersection. It provides a means to calculate an average over several ($n$) sources while also taking into account possible reliability weights attributed to each source.
$m_{1...n}(A) = \frac{1}{n} \sum_{i=1}^{n} w_im_i(A) \tag{19}$
$\scriptsize \text{Where } \begin{cases} n &\text{Number of sources}. \\ w_i &\text{Reliability weight of the source}. \\ m_i &\text{Basic probability assignment of a body of evidence}. \\ \end{cases}$
Applying Equation 19 to the two scenarios illustrated in Fig. 1 yields the following results:

- Patient Diagnosis (1): $\small m_{1,2}(brainTumor) = \frac{0.01 + 0.01}{2} = 0.01$
- Patient Diagnosis (2): $\small m_{1,2}(brainTumor) = \frac{0.99 + 0.99}{2} = 0.99$
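A minimal sketch of Equation 19 (the function name is ours; with no weights given, every source gets weight 1):

```python
def average_mass(masses, weights=None):
    # eq. (19): weighted average of n basic probability assignments
    n = len(masses)
    if weights is None:
        weights = [1.0] * n
    return sum(w * m for w, m in zip(weights, masses)) / n

print(average_mass([0.01, 0.01]))  # 0.01  (Patient Diagnosis 1)
print(average_mass([0.99, 0.99]))  # 0.99  (Patient Diagnosis 2)
```

Unlike Dempster’s rule, the average stays consistent with the doctors’ shared assessment in both scenarios.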
2.3 From Truth to Confidence¶
Similarly to the duck test, modelling ways to find supporting evidence for isolating potential entity-matching candidates is crucial to inferring identity for a pair of resources. Section 2.1 already covers our take on how to combine truth values, and Section 2.2 covers the combination of degrees of confidence. What now remains is to understand “how to transition from truth value to uncertainty / degree of confidence?”. For illustration purposes, this means, for example, how to move from [looks like a duck (0.8), swims like a duck (1.0), and quacks like a duck (0.95)] to [it probably is a duck (???)]. Applying $⊤_{prod}$ to the evidence truth values results in a truth value of 0.76. Assuming that the transition from the evidence truth value to the degree of confidence carries a weight of 1, we argue that it is now possible to extrapolate a confidence of 0.76 for asserting that the entity that looks like a duck (0.8), swims like a duck (1.0), and quacks like a duck (0.95) is probably (0.76) a duck. Similarly, if this transition weight is reset to 0.9 for example, the degree of confidence computed for the entity being a duck accordingly drops to 0.684.
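The duck-test transition can be sketched as follows (the function name is ours; the transition-weight parameter mirrors the one discussed in Sect. 2.4):

```python
from math import prod

def to_confidence(truths, transition_weight=1.0):
    # combine the evidence with the product t-norm, then scale by the
    # transition weight to obtain a degree of confidence
    return transition_weight * prod(truths)

# looks (0.8), swims (1.0), quacks (0.95) like a duck
print(round(to_confidence([0.8, 1.0, 0.95]), 3))       # 0.76
print(round(to_confidence([0.8, 1.0, 0.95], 0.9), 3))  # 0.684
```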
2.4 Implementation¶
There are several operations in the Lenticular Lens in which one or more of the values discussed above and their combinations take place. We summarise them here, including also possible future improvements that would allow more control over the produced values.
Link Construction

- Transition from Truth to Confidence Degree
  - Example: If the entities’ names sound alike with a degree of truth above 0.6, then the resources are probably the same, with sound-alike * 1 as degree of confidence.
- Combination of truth values via logic-boxes, followed by Transition from Truth to Confidence Degree
  - Example: If the entities’ names sound alike with a degree of truth above 0.6 OR look alike with a degree of truth above 0.7, AND the date of birth is the same, then the final degree of truth using classical OR/AND combinations is min(max(sound-alike, look-alike), same-birth), and the resources are probably the same with this final degree of truth as degree of confidence.

Currently a fixed transition-weight of 1 is applied, so that the final score (currently only one is output) reflects both the degree of truth and the degree of confidence.

Improvements:

- Explicitly output the degrees of truth and confidence and the transition-weight;
- Allow the user to decide on the transition-weight applied, so that low-power identity criteria such as the example in the simple method would not conclude that high name similarity means high confidence.
Link Manipulation

1. Union
   - Disjunction of the final degrees of truth, followed by the reassignment of the confidence value given a transition-weight. Possible combinations:
     - classic OR
     - S-norms
   - Disjunction of the final degrees of confidence. Possible combinations:
     - Probabilistic OR
     - Possibilistic Union
     - Averaging, applicable provided that there is agreement between all sources.
2. Intersection
   - Conjunction of the final degrees of truth, followed by the reassignment of the confidence value given a transition-weight. Possible combinations:
     - classic AND
     - T-norms
   - Conjunction of the final degrees of confidence. Possible combinations:
     - Probabilistic AND
     - Possibilistic Intersection
     - Averaging, applicable provided that there is agreement between all sources, i.e. all the sources have produced a degree of confidence above 0 for the link under scrutiny.
3. Difference
This operation does not require combination of values, but simply selects the matches that do not occur in another linkset.
4. Composition
   - Transitivity over the final degrees of truth ???
   - Composition of confidence attributed to independent events ???
5. In Set
   This operation does not require combination of values, but simply selects the matches whose resources occur in a given resource-set.
Currently, only the combination of the truth values is implemented, with the transition-weight equal to 1.
Improvements: allowing the user to decide which values to combine and how, plus what transition-weight to (re)apply if needed, would render the system more flexible.
Link Validation
On top of the automatically calculated degree of confidence discussed so far, manual validation allows the user to attribute their own confidence.
Currently, such manual attribution consists of simply accepting or rejecting the produced link (i.e. a manual confidence of 1 or 0).
Improvements: allowing the user to attribute a confidence (increasing or decreasing the automatically calculated one) would render the system more flexible in handling matches that cannot easily be accepted or rejected, by allowing, for example, several experts’ opinions to be registered and letting the final user decide whether to take them as acceptable or not.
This transition weight can easily be applied in the generation and manipulation of links. In the simplest setting, where the transition weight is set to 1, discovered links can be annotated with an estimated degree of confidence by extrapolation from the evidence’s truth value using an appropriate combination function. Things become a bit more complicated when dealing with the manipulation of links, because the options range from classic or fuzzy logic to possibilistic or probabilistic logic or evidence theory. In the score combination examples illustrated below, ex:lens1 is the result of the union of ex:linkset1 and ex:linkset2 using fuzzy logic over the respective evidence truth values of the links being united, while in ex:lens2 and ex:lens3 the degree of confidence of a link is computed using evidence theory and probabilistic logic respectively.
Combining Uncertainty in Identity
```turtle
### Linkset metadata ###
#########################
ex:linkset1
    ex:combination-function  ex:t-norm-product ;
    ex:transition-weight     1.0 .

ex:linkset2
    ex:combination-function  ex:s-norm-max ;
    ex:transition-weight     1.0 .

### Annotated Linkset ###
#########################
ex:linkset1
{
    <<ex:e1 owl:sameAs ex:e2>>
        ex:degree-of-truth       0.76 ;
        ex:degree-of-confidence  0.76 .
}

ex:linkset2
{
    <<ex:e1 owl:sameAs ex:e2>>
        ex:degree-of-truth       0.9 ;
        ex:degree-of-confidence  0.9 .
}

#################################################################
### lens1: Obtaining a degree of confidence by combining     ####
### scores with a UNION operation based on truth values      ####
### using the s-norm-sum fuzzy logic operator.               ####
#################################################################
ex:lens1
    ex:operator              ex:UNION ;
    ex:target                ex:linkset1, ex:linkset2 ;
    ex:combination-function  ex:s-norm-sum ;
    ex:transition-weight     1.0 .

ex:lens1
{
    <<ex:e1 owl:sameAs ex:e2>>
        ex:degree-of-truth       0.976 ;
        ex:degree-of-confidence  0.976 .
}

#################################################################
### lens2: Obtaining a degree of confidence by combining     ####
### scores with a UNION operation based on confidence values ####
### using the event averaging operator.                      ####
#################################################################
ex:lens2
    ex:operator              ex:UNION ;
    ex:target                ex:linkset1, ex:linkset2 ;
    ex:combination-function  ex:averaging .

ex:lens2
{
    <<ex:e1 owl:sameAs ex:e2>>
        ex:degree-of-confidence  0.83 .
}

#################################################################
### lens3: Obtaining a degree of confidence by combining     ####
### scores with a UNION operation based on confidence values ####
### using probabilistic logic.                               ####
#################################################################
ex:lens3
    ex:operator              ex:UNION ;
    ex:combination-function  ex:Probabilistic .

ex:lens3
{
    <<ex:e1 owl:sameAs ex:e2>>
        ex:degree-of-confidence  0.976 .
}
```
3. Conclusion¶
We have shed light on the ambiguity surrounding the concepts of vagueness and uncertainty and their corresponding scores: degree of truth and degree of confidence. This enables us to understand the nature of the computed scores and to appropriately label scores obtained from matching algorithms (property-value comparisons) as degrees of truth, and scores assigned to identity links generated by machines (matching methods or combinations) or humans (curation) as degrees of confidence. Consequently, different options for aggregating/combining these scores are presented, depending on whether one is dealing with a degree of truth or a degree of confidence.
1. Matching methods make explicit all arguments/prerequisites (datasets, entity-type and property-value restrictions, matching properties…) of a matching algorithm, including the conditions under which the algorithm is to accept a discovered link (threshold). ↩

2. For intellectual curiosity, see Wikipedia for more information. ↩