Dropping duplicates in Python and Julia
20 Jan 2015
In Python, if we try to remove duplicates from a list, we can convert the list to a set and then back to a list:
a = rand(15)*10
a = a.astype(int)
print(a)
list(set(a))
Out:
[3 5 6 3 7 6 9 4 4 4 1 7 3 2 1]
[1, 2, 3, 4, 5, 6, 7, 9]
What if I don’t want to change the order? From this Stack Overflow response:
a = rand(15)*10
a = a.astype(int)
print(a)
def foo(seq):
seen = set()
seen_add = seen.add
return [ x for x in seq if not (x in seen or seen_add(x))]
# each iteration of the for loop, x is added to seen by being passed to seen_add() which is a set.add() function
foo(a)
Out:
[2 8 1 9 1 0 5 6 7 6 6 9 3 6 3]
[2, 8, 1, 9, 0, 5, 6, 7, 3]
In Julia, it’s a little more straightforward.
You can use union(var)
to drop duplicates and sort(union(var))
to sort the result. union(a,b)
finds the union between two variables and can be used to drop duplicates between them.
a = int(rand(10)*15)
println("a: ", a)
println("unique: ", union(a))
println("sort unique: ", sort(union(a)))
b = int(rand(10)*15)
println("b: ", b)
println("union(a,b): ", sort(union(a,b)))
Out:
a: [3,9,9,8,0,5,5,1,13,3]
unique: [3,9,8,0,5,1,13]
sort unique: [0,1,3,5,8,9,13]
b: [3,13,12,14,12,1,4,6,14,2]
union(a,b): [0,1,2,3,4,5,6,8,9,12,13,14]