How To Compare Sets In Python
A while ago I wrote a guide on how to compare two dictionaries in Python 3, and how this task is not as simple as it might sound. It turns out comparing two lists in Python is just as tricky as comparing dicts.
The way we learn to compare two objects in Python is by using either the == or the is operator. In reality, these two operators cover just a small fraction of the most frequent use cases.
For example:
- what if we want to compare a list of floating-point numbers considering a certain tolerance?
- what if we wish to compare two lists while ignoring the order in which the elements appear?
- maybe we need to compare two lists and return the elements that intersect both
- sometimes we might want to get the difference between two lists
- what if we have two lists of strings and need to compare them while ignoring the string case?
- what if we're given a list of numpy arrays to compare with each other, what can we do?
- or maybe we have a list of custom objects, or a list of dictionaries.
The list goes on and on, and for all of these use cases using == doesn't help.
That's what we are going to see in this article. We'll learn the best ways of comparing two lists in Python for several use cases where the == operator is not enough.
Ready? Let's go!
Comparing if two lists are equal in Python
The easiest way to compare two lists for equality is to use the == operator. This comparison method works well for simple cases, but as we'll see later, it doesn't work with advanced comparisons.
An example of a simple case would be a list of int or str objects.
>>> numbers = [1, 2, 3]
>>> target = [1, 2, 3]
>>> numbers == target
True
>>> [1, 2, 3] == [1, 3, 2]
False
>>> ['name', 'lastname'] == ['name', 'lastname']
True
>>> ['name', 'lastname'] == ['name', 'last name']
False
Pretty simple, right? Unfortunately, the world is complex, and so is production grade code. In the real world, things get complicated really fast. As an illustration, consider the following cases.
Suppose you have a list of floating points that is built dynamically. You can add single elements, or elements derived from a mathematical operation such as 0.1 + 0.1.
>>> numbers = []
>>> numbers.append(0.1 + 0.1 + 0.1)  # derive the element based on a summation
>>> numbers.append(0.2)  # add a single element
>>> target = [0.3, 0.2]
>>> numbers == target  # compares the lists
False
>>> numbers  # Ooopppssss....
[0.30000000000000004, 0.2]
>>> target
[0.3, 0.2]
Clearly, floating point arithmetic has its limitations, and sometimes we want to compare two lists but ignore precision errors, or even define some tolerance. For cases like this, the == operator won't suffice.
Things can get more complicated if the lists have custom objects or objects from other libraries, such as numpy.
In [1]: import numpy as np
In [2]: numbers = [np.ones(3), np.zeros(2)]
In [3]: numbers
Out[3]: [array([1., 1., 1.]), array([0., 0.])]
In [4]: target = [np.ones(3), np.zeros(2)]
In [5]: numbers == target
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-b832db4b039d> in <module>
----> 1 numbers == target
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
You might also like to compare the lists and return the matches. Or maybe compare the two lists and return the differences. Or perhaps you want to compare two lists ignoring the duplicates, or compare a list of dictionaries in Python.
In every single case, using == is not the answer, and that's what we are going to see next: how to perform complex comparison operations between two lists in Python.
Comparing two lists of float numbers
In the previous section, we saw that floating point arithmetic can cause precision errors. If we have a list of floats and want to compare it with another list, chances are that the == operator won't help.
Let's revisit the example from the previous section and see what is the best way of comparing two lists of floats.
>>> numbers = []
>>> numbers.append(0.1 + 0.1 + 0.1)  # derive the element based on a summation
>>> numbers.append(0.2)  # add a single element
>>> target = [0.3, 0.2]
>>> numbers == target  # compares the lists
False
>>> numbers  # Ooopppssss....
[0.30000000000000004, 0.2]
>>> target
[0.3, 0.2]
As you see, 0.1 + 0.1 + 0.1 = 0.30000000000000004, which causes the comparison to fail. Now, how can we do better? Is it even possible?
There are a few ways of approaching this task. One would be to create our own custom function that iterates over the elements and compares them one by one using the math.isclose() function.
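The custom-function approach could look roughly like the sketch below. The name lists_are_close and its default tolerances are assumptions made for illustration; only math.isclose() comes from the standard library.

```python
import math


def lists_are_close(first, second, rel_tol=1e-9, abs_tol=0.0):
    """Return True if both lists have the same length and pairwise-close floats.

    Hypothetical helper: the name and default tolerances are illustrative.
    """
    if len(first) != len(second):
        return False
    return all(
        math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, b in zip(first, second)
    )


numbers = [0.1 + 0.1 + 0.1, 0.2]
target = [0.3, 0.2]

print(numbers == target)                 # False
print(lists_are_close(numbers, target))  # True
```

The length check matters because zip() silently stops at the shorter list, which would otherwise make a truncated list look equal.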
Fortunately we don't have to reinvent the wheel. As I showed in the "how to compare two dicts" article, we can use a library called deepdiff for that. This library supports different types of objects and lists are one of them.
The example below starts off by setting up the two lists we want to compare. We then pass them to the deepdiff.DeepDiff constructor, which returns the difference. That's great, the returned value is much more informative than a simple boolean.
Since we want to ignore the precision error, we can set the number of digits AFTER the decimal point to be used in the comparison.
The result is an empty dict, which means the lists are equal. If we try comparing a list with a float number that differs in more than 3 significant digits, the library will return that diff.
For reproducibility, in this article I used the latest version of deepdiff, which is 5.6.0.
In [1]: from deepdiff import DeepDiff
In [2]: numbers = []
In [3]: numbers.append(0.1 + 0.1 + 0.1)  # derive the element based on a summation
In [4]: numbers.append(0.2)  # add a single element
In [5]: target = [0.3, 0.2]
# if we don't specify the number of significant digits, the comparison will use ==
In [6]: DeepDiff(numbers, target)
Out[6]:
{'values_changed': {'root[0]': {'new_value': 0.3,
   'old_value': 0.30000000000000004}}}
# 0.30000000000000004 and 0.3 are equal if we only look at the first 3 significant digits
In [7]: DeepDiff(numbers, target, significant_digits=3)
Out[7]: {}
In [8]: numbers
Out[8]: [0.30000000000000004, 0.2]
In [9]: target = [0.341, 0.2]
# 0.341 differs in more than 3 significant digits
In [10]: DeepDiff(numbers, target, significant_digits=3)
Out[10]:
{'values_changed': {'root[0]': {'new_value': 0.341,
   'old_value': 0.30000000000000004}}}
Comparing if two lists without order (unordered lists) are equal
Lists in Python are ordered, so the == operator takes the position of each element into account. Sometimes we want to compare two lists but treat them as the same as long as they have the same elements, regardless of their order.
There are three ways of doing this:
- sorting the lists and using the == operator
- converting them to sets and using the == operator
- using deepdiff
The first two methods assume the elements can be safely compared using the == operator. This approach doesn't work for floating-point numbers and other complex objects, but as we saw in the previous section, we can use deepdiff.
Sorting the lists and using the == operator
You can sort lists in Python in two different ways:
- using the list.sort() method
- using the sorted() function
The first method sorts a list in place, and that means your list will be modified. It's a good idea to not modify a list in place as it can introduce bugs that are hard to detect.
Using sorted is better since it returns a new list and keeps the original unmodified.
Let's see how it works.
In [6]: numbers = [10, 30, 20]
In [7]: target = [10, 20, 30]
In [8]: numbers == target
Out[8]: False
In [9]: sorted(numbers) == sorted(target)
Out[9]: True
In [10]: sorted(numbers)
Out[10]: [10, 20, 30]
In [11]: sorted(target)
Out[11]: [10, 20, 30]
As a result, by sorting the lists first we ensure that both lists will have the same order, and thus can be compared using the == operator.
Converting the lists to a set
Contrary to lists, sets in Python don't care about order. For example, a set {1, 2, 3} is the same as {2, 3, 1}. As such, we can use this feature to compare the two lists ignoring the elements' order.
To do so, we convert each list into a set, then use the == operator to compare them. Keep in mind that a set also discards duplicates and requires hashable elements, so this only works when repeated values don't matter.
In [12]: numbers = [10, 30, 20]
In [13]: target = [10, 20, 30]
In [14]: set(numbers) == set(target)
Out[14]: True
In [15]: set(numbers)
Out[15]: {10, 20, 30}
In [16]: set(target)
Out[16]: {10, 20, 30}
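One caveat of the set conversion is that duplicates disappear. If the number of occurrences matters, a possible alternative (not part of the original approach) is collections.Counter from the standard library. The helper name same_elements below is made up for this sketch, and it assumes the elements are hashable.

```python
from collections import Counter


def same_elements(first, second):
    """Compare two lists ignoring order but respecting how often each item occurs.

    Hypothetical helper; requires hashable elements, just like set().
    """
    return Counter(first) == Counter(second)


print(set([1, 1, 2]) == set([1, 2]))              # True: the duplicate is lost
print(same_elements([1, 1, 2], [1, 2]))           # False
print(same_elements([10, 30, 20], [10, 20, 30]))  # True
```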
Using the deepdiff library
This library also allows us to ignore the order in sequences such as lists. By default, it will take the order into consideration, but if we set ignore_order to True, then we're all good. Let's see this in action.
In [11]: numbers = [10, 30, 20]
In [12]: target = [10, 20, 30]
In [13]: DeepDiff(numbers, target)
Out[13]:
{'values_changed': {'root[1]': {'new_value': 20, 'old_value': 30},
  'root[2]': {'new_value': 30, 'old_value': 20}}}
In [14]: DeepDiff(numbers, target, ignore_order=True)
Out[14]: {}
Using deepdiff has pros and cons. In the end, it is an external library you need to install, so if you can use a set to compare the lists, then stick to it. However, if you have other use cases where it can shine, then I'd go with it.
How to compare two lists and return matches
In this section, we'll see how we can compare two lists and find their intersection. In other words, we want to find the values that appear in both.
To do that, we can again use a set and take their intersection.
In [1]: t1 = [2, 1, 0, 7, 4, 9, 3]
In [2]: t2 = [7, 6, 11, 12, 9, 23, 2]
In [3]: set(t1).intersection(set(t2))
Out[3]: {2, 7, 9}
# the & operator is a shorthand for the set.intersection() method
In [4]: set(t1) & set(t2)
Out[4]: {2, 7, 9}
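A set intersection gives no guarantee about the order of the matches. If the matches should come back in the order they appear in the first list, one option is a list comprehension backed by a set lookup. This is only a sketch, and common_elements is a name invented for it.

```python
def common_elements(first, second):
    """Return the items of `first` that also occur in `second`, in `first`'s order.

    Hypothetical helper; assumes the elements of `second` are hashable.
    """
    lookup = set(second)  # membership checks on a set are fast on average
    return [item for item in first if item in lookup]


t1 = [2, 1, 0, 7, 4, 9, 3]
t2 = [7, 6, 11, 12, 9, 23, 2]

print(common_elements(t1, t2))  # [2, 7, 9]
```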
How to compare two lists in Python and return differences
We can find the difference between two lists in Python in two different ways:
- using set
- using the deepdiff library
Using set
Just like we did to determine the intersection, we can leverage the set data structure to check the difference between two lists in Python.
If we want to get all the elements that are present in the first list but not in the second, we can use set.difference().
On the other hand, if we want to find all the elements that are in either of the lists but not both, then we can use set.symmetric_difference().
In [8]: t1 = [2, 1, 0, 7, 4, 9, 3]
In [9]: t2 = [7, 6, 11, 12, 9, 23, 2]
In [10]: set(t1).difference(set(t2))
Out[10]: {0, 1, 3, 4}
In [11]: set(t2).difference(set(t1))
Out[11]: {6, 11, 12, 23}
In [12]: set(t1).symmetric_difference(set(t2))
Out[12]: {0, 1, 3, 4, 6, 11, 12, 23}
In [13]: set(t1) - set(t2)
Out[13]: {0, 1, 3, 4}
In [14]: set(t1) ^ set(t2)
Out[14]: {0, 1, 3, 4, 6, 11, 12, 23}
This method has a limitation: the symmetric difference groups what is different between the lists into one final result. What if we want to know which elements in that diff belong to which list?
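Staying with plain sets, one possible workaround is to compute the two one-sided differences and keep them apart. The sketch below uses a made-up helper name, split_difference, and sorts the results only to make the printed output predictable.

```python
def split_difference(first, second):
    """Return (only_in_first, only_in_second) as two separate sets.

    Hypothetical helper; duplicates and original positions are not preserved.
    """
    first_set, second_set = set(first), set(second)
    return first_set - second_set, second_set - first_set


t1 = [2, 1, 0, 7, 4, 9, 3]
t2 = [7, 6, 11, 12, 9, 23, 2]

only_in_t1, only_in_t2 = split_difference(t1, t2)
print(sorted(only_in_t1))  # [0, 1, 3, 4]
print(sorted(only_in_t2))  # [6, 11, 12, 23]
```

This tells us which side each element came from, but not where in the list it sits.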
Using deepdiff
As we've seen so far, this library is powerful and it returns a nice diff. Let's see what happens when we use deepdiff to get the difference between two lists in Python.
In [15]: t1 = [2, 1, 0, 7, 4, 9, 3]
In [16]: t2 = [7, 6, 11, 12, 9, 23, 2]
In [17]: DeepDiff(t1, t2)
Out[17]:
{'values_changed': {'root[0]': {'new_value': 7, 'old_value': 2},
  'root[1]': {'new_value': 6, 'old_value': 1},
  'root[2]': {'new_value': 11, 'old_value': 0},
  'root[3]': {'new_value': 12, 'old_value': 7},
  'root[4]': {'new_value': 9, 'old_value': 4},
  'root[5]': {'new_value': 23, 'old_value': 9},
  'root[6]': {'new_value': 2, 'old_value': 3}}}
In [18]: DeepDiff(t1, t2, ignore_order=True)
Out[18]:
{'values_changed': {'root[4]': {'new_value': 6, 'old_value': 4},
  'root[6]': {'new_value': 11, 'old_value': 3},
  'root[1]': {'new_value': 12, 'old_value': 1}},
 'iterable_item_added': {'root[5]': 23},
 'iterable_item_removed': {'root[2]': 0}}
Accordingly, deepdiff returns what changed from one list to the other. The right approach then will depend on your use case. If you want a detailed diff, then use DeepDiff. Otherwise, just use a set.
How to compare two lists of strings
Comparing two lists of strings in Python depends largely on what type of comparison you want to make. That's because we can compare a string in a handful of ways.
In this section, we'll see three different ways of doing that.
The simplest one is using the == operator, like we saw in the beginning. This method is suitable if you want a strict comparison between each string.
In [1]: names = ['jack', 'josh', 'james']
In [2]: target = ['jack', 'josh', 'james']
In [3]: names == target
Out[3]: True
Things start to get messy if you want to compare the lists of strings while ignoring the case. Using the == operator for that just doesn't work.
In [4]: names = ['Jack', 'Josh', 'James']
In [2]: target = ['jack', 'josh', 'james']
In [5]: names == target
Out[5]: False
The best tool for that is again deepdiff. It allows us to ignore the string case by passing a boolean flag to it.
In [1]: import deepdiff
In [2]: names = ['Jack', 'Josh', 'James']
In [3]: target = ['jack', 'josh', 'james']
# ignoring string case
In [4]: deepdiff.DeepDiff(names, target, ignore_string_case=True)
Out[4]: {}
# considering the case
In [5]: deepdiff.DeepDiff(names, target)
Out[5]:
{'values_changed': {'root[0]': {'new_value': 'jack', 'old_value': 'Jack'},
  'root[1]': {'new_value': 'josh', 'old_value': 'Josh'},
  'root[2]': {'new_value': 'james', 'old_value': 'James'}}}
We can also ignore the order in which the strings appear in the lists.
In [6]: names = ['Jack', 'James', 'Josh']
In [7]: target = ['jack', 'josh', 'james']
# ignoring the order and string case
In [8]: deepdiff.DeepDiff(names, target, ignore_string_case=True, ignore_order=True)
Out[8]: {}
# considering the order but ignoring the case
In [9]: deepdiff.DeepDiff(names, target, ignore_string_case=True)
Out[9]:
{'values_changed': {'root[1]': {'new_value': 'josh', 'old_value': 'james'},
  'root[2]': {'new_value': 'james', 'old_value': 'josh'}}}
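If a yes/no answer is enough and installing a library is not an option, a case-insensitive comparison can also be sketched with the standard library alone by normalizing each string with str.casefold() first. The helper name equal_ignoring_case and its ignore_order parameter are assumptions made for this example; they only mimic the two DeepDiff flags used above and do not produce a diff.

```python
def equal_ignoring_case(first, second, ignore_order=False):
    """Compare two lists of strings after case-folding every element.

    Hypothetical helper; returns a boolean, not a detailed diff.
    """
    a = [s.casefold() for s in first]
    b = [s.casefold() for s in second]
    if ignore_order:
        a, b = sorted(a), sorted(b)
    return a == b


names = ['Jack', 'James', 'Josh']
target = ['jack', 'josh', 'james']

print(equal_ignoring_case(names, target))                     # False
print(equal_ignoring_case(names, target, ignore_order=True))  # True
```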
You can also go further and perform advanced comparisons by passing a custom operator to DeepDiff.
For example, suppose you want to compare the strings while ignoring any whitespace they may have.
Or perhaps you want to perform a fuzzy matching using an edit distance metric.
To do that, we can write the comparison logic in the operator class and pass it to DeepDiff.
In this first example, we'll ignore any whitespace by trimming the strings before comparing them.
class IgnoreWhitespaceOperator:
    def match(self, level) -> bool:
        return True

    def give_up_diffing(self, level, diff_instance) -> bool:
        if isinstance(level.t1, str) and isinstance(level.t2, str):
            return level.t1.strip() == level.t2.strip()
        return False
Then we can just plug it into DeepDiff by adding it to the list of custom_operators, like so: custom_operators=[IgnoreWhitespaceOperator()].
In [6]: from deepdiff import DeepDiff
In [13]: names = ['Jack', 'James ', ' Josh ']
In [14]: target = ['Jack', 'James', 'Josh',]
# the operator will ignore the spaces in both lists
In [15]: DeepDiff(names, target, custom_operators=[IgnoreWhitespaceOperator()])
Out[15]: {}
In [16]: target = ['Jack', 'James', 'Josh', 'Jelly']
# if one of the lists has an additional member, this will be flagged
In [17]: DeepDiff(names, target, custom_operators=[IgnoreWhitespaceOperator()])
Out[17]: {'iterable_item_added': {'root[3]': 'Jelly'}}
In [18]: target = ['Jack', 'Josh', 'James']
# by default, the library doesn't ignore order
In [19]: DeepDiff(names, target, custom_operators=[IgnoreWhitespaceOperator()])
Out[19]:
{'values_changed': {'root[1]': {'new_value': 'Josh', 'old_value': 'James '},
  'root[2]': {'new_value': 'James', 'old_value': ' Josh '}}}
# if you don't care about order, be explicit
In [20]: DeepDiff(names, target, ignore_order=True, custom_operators=[IgnoreWhitespaceOperator()])
Out[20]: {}
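The fuzzy matching idea mentioned earlier is not shown with DeepDiff here. As a rough standalone illustration, the sketch below uses difflib.SequenceMatcher from the standard library. Note that its ratio() is a similarity score between 0 and 1, not a true edit distance, and that the helper name fuzzy_equal and the 0.8 threshold are arbitrary choices made for this example.

```python
from difflib import SequenceMatcher


def fuzzy_equal(first, second, threshold=0.8):
    """Return True if the lists have equal length and every pair is similar enough.

    Hypothetical helper; `threshold` is an arbitrary similarity cut-off.
    """
    if len(first) != len(second):
        return False
    return all(
        SequenceMatcher(None, a, b).ratio() >= threshold
        for a, b in zip(first, second)
    )


names = ['Jack', 'James', 'Josh']

print(fuzzy_equal(names, ['Jack', 'Jamess', 'Josh']))  # True: one extra letter
print(fuzzy_equal(names, ['Jack', 'Jelly', 'Josh']))   # False
```

The same comparison logic could in principle be moved into a custom operator, but that is left out here to avoid guessing at library details.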
How to compare two lists of dictionaries
Comparing two lists of dictionaries in Python is definitely intricate without the help of an external library, at least when we need more than a plain equality check. As we've seen so far, deepdiff is versatile enough and we can use it to compare deep complex objects such as lists of dictionaries.
Let's see what happens when we pass two lists of dictionaries.
In [1]: from deepdiff import DeepDiff
In [2]: first_list = [
   ...:     {
   ...:         'number': 1,
   ...:         'list': ['one', 'two']
   ...:     },
   ...:     {
   ...:         'number': 2,
   ...:         'list': ['one', 'two']
   ...:     },
   ...: ]
In [3]: target_list = [
   ...:     {
   ...:         'number': 3,
   ...:         'list': ['one', 'two']
   ...:     },
   ...:     {
   ...:         'number': 2,
   ...:         'list': ['one', 'two']
   ...:     },
   ...: ]
In [4]: DeepDiff(first_list, target_list)
Out[4]: {'values_changed': {"root[0]['number']": {'new_value': 3, 'old_value': 1}}}
It outputs the exact location where the elements differ and what the difference is!
Let's see another example where a list has a missing element.
In [2]: first_list = [
   ...:     {
   ...:         'number': 1,
   ...:         'list': ['one', 'two']
   ...:     },
   ...:     {
   ...:         'number': 2,
   ...:         'list': ['one', 'two']
   ...:     },
   ...: ]
In [5]: target = [
   ...:     {
   ...:         'number': 3,
   ...:         'list': ['one', 'two']
   ...:     },
   ...: ]
In [6]: DeepDiff(first_list, target)
Out[6]:
{'values_changed': {"root[0]['number']": {'new_value': 3, 'old_value': 1}},
 'iterable_item_removed': {'root[1]': {'number': 2, 'list': ['one', 'two']}}}
It says the second dictionary has been removed, which is the case for this example.
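For comparison, here is a minimal sketch of how far the built-in operators get with the same kind of data. Dictionaries compare by value, so == works for a simple equality check, but since dictionaries are not hashable we cannot put them in a set and have to scan the list instead. The variable name missing is just an illustration.

```python
first_list = [
    {'number': 1, 'list': ['one', 'two']},
    {'number': 2, 'list': ['one', 'two']},
]
target_list = [
    {'number': 3, 'list': ['one', 'two']},
    {'number': 2, 'list': ['one', 'two']},
]

# == compares dictionaries key by key, so a plain equality check works
print(first_list == target_list)  # False

# dictionaries are not hashable, so we scan the list instead of building a set
missing = [d for d in first_list if d not in target_list]
print(missing)  # [{'number': 1, 'list': ['one', 'two']}]
```

This tells us which dictionaries have no equal counterpart, but not which key changed, which is where the DeepDiff output is more helpful.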
How to compare two list of lists
Comparing multidimensional lists, a.k.a. lists of lists, is easy for deepdiff. It works just like a list of dicts.
In the example below, we have two multidimensional lists that we want to compare. When passed to DeepDiff, it returns the exact location in which the elements differ.
For example, for the position [1][0], the new value is 8, and the old is 3. Another interesting aspect is that it works for deeply nested structures. For instance, deepdiff also highlights the difference in the [2][0][0] position.
In [1]: from deepdiff import DeepDiff
In [2]: first_list = [[1, 2], [3, 4], [[5]]]
In [3]: target_list = [[1, 2], [8, 4], [[7]]]
In [4]: DeepDiff(first_list, target_list)
Out[4]:
{'values_changed': {'root[1][0]': {'new_value': 8, 'old_value': 3},
  'root[2][0][0]': {'new_value': 7, 'old_value': 5}}}
When feeding the library with ii identical multidimensional lists, it returns an empty response.
In [3]: target_list = [[1, 2], [8, 4], [[7]]]
In [5]: second_list = [[1, 2], [8, 4], [[7]]]
In [7]: DeepDiff(second_list, target_list)
Out[7]: {}
How to compare two lists of objects
Sometimes we have a list of custom objects that we want to compare. Maybe we want to get a diff, or just check if they contain the same elements. The solution for this problem couldn't be different: use deepdiff.
The following example demonstrates the power of this library. We're going to compare two lists containing custom objects, and we'll be able to assert if they are equal or not and what the differences are.
In the example below, we have two lists of Person objects. The only difference between the two is that in the last position the Person object has a different age. deepdiff not only finds the right position - [1] - but also finds that the age field is different too.
In [9]: from deepdiff import DeepDiff
In [10]: first = [Person('Jack', 34), Person('Janine', 23)]
In [11]: target = [Person('Jack', 34), Person('Janine', 24)]
In [12]: DeepDiff(first, target)
Out[12]: {'values_changed': {'root[1].age': {'new_value': 24, 'old_value': 23}}}
In [14]: second = [Person('Jack', 34), Person('Janine', 24)]
In [15]: DeepDiff(second, target)
Out[15]: {}
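The Person class itself is not shown in the session above. A minimal sketch, assuming it is a simple class with two fields, could be a dataclass like the one below. The field name age is taken from the diff output, while name is a guess. A dataclass also generates __eq__, so a plain == check works on the lists as well, and a small comprehension can point at the positions that differ.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Assumed definition: `age` appears in the diff output, `name` is a guess.
    name: str
    age: int


first = [Person('Jack', 34), Person('Janine', 23)]
target = [Person('Jack', 34), Person('Janine', 24)]
second = [Person('Jack', 34), Person('Janine', 24)]

# dataclasses compare field by field, so == works on the lists
print(first == target)   # False
print(second == target)  # True

# positions where the two lists disagree
print([i for i, (a, b) in enumerate(zip(first, target)) if a != b])  # [1]
```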
How to compare two lists of numpy arrays
In this section, we'll see how to compare two lists of numpy arrays. This is a fairly common task for those who work with data science and/or machine learning.
We saw in the first section that using the == operator doesn't work well with lists of numpy arrays. Luckily we can use... guess what!? Yes, we can use deepdiff.
The example below shows two lists with different numpy arrays and the library can detect the exact position in which they differ. How cool is that?
In [16]: import numpy as np
In [17]: from deepdiff import DeepDiff
In [18]: first = [np.ones(3), np.array([1, 2, 3])]
In [19]: target = [np.zeros(4), np.array([1, 2, 3, 4])]
In [20]: DeepDiff(first, target)
Out[20]:
{'values_changed': {'root[0][0]': {'new_value': 0.0, 'old_value': 1.0},
  'root[0][1]': {'new_value': 0.0, 'old_value': 1.0},
  'root[0][2]': {'new_value': 0.0, 'old_value': 1.0}},
 'iterable_item_added': {'root[0][3]': 0.0, 'root[1][3]': 4}}
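When only a True/False answer is needed, a possible alternative that relies on numpy alone is to compare the arrays pairwise with np.array_equal(), which returns False instead of raising when the shapes differ. The helper name array_lists_equal is made up for this sketch.

```python
import numpy as np


def array_lists_equal(first, second):
    """Return True if both lists hold arrays of the same shape and values, pairwise.

    Hypothetical helper; gives a boolean only, not the position of the mismatch.
    """
    if len(first) != len(second):
        return False
    return all(np.array_equal(a, b) for a, b in zip(first, second))


numbers = [np.ones(3), np.zeros(2)]
target = [np.ones(3), np.zeros(2)]
other = [np.zeros(4), np.array([1, 2, 3, 4])]

print(array_lists_equal(numbers, target))  # True
print(array_lists_equal(numbers, other))   # False
```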
Conclusion
In this post, we saw many ways to compare two lists in Python. The best method depends on what kind of elements we have and how we want to compare. Hopefully, you now know how to:
- check if two lists are equal in Python
- compare two lists without order (unordered lists)
- compare two lists in Python and return matches
- compare two lists in Python and return differences
- compare two lists of strings
- compare two lists of dictionaries
- compare two list of lists
- compare two lists of objects
- compare two lists of numpy arrays
Other posts you may like:
- The Best Way to Compare Two Dictionaries in Python
- How to Compare Two Strings in Python (in 8 Easy Ways)
- 7 Different Ways to Flatten a List of Lists in Python
See you next time!
This post was originally published at https://miguendes.me
Source: https://miguendes.me/python-compare-lists