{data dendrites} exploring data science with Python et al.

XKCD → Python → Julia

In Better Way to Read the News, a chapter from the online version of the Everyday Python book, we find a function that implements the map from this XKCD cartoon:


Use unique() to drop duplicates

In a recent post, I reviewed how you can use union() to get the unique members out of a set. A faster way to do that is to use the unique() function:

Out:

[3 4 8 9 8 2 4 9 2 7 3 3 6 9 4
 5 3 0 0 10 7 6 9 9 2 10 0 5 9 5]
[3,4,8,9,2,7,6]
[5,3,0,10,7,6,9,2]
[3,5,4,8,0,9,10,2,7,6]
[0,2,3,4,5,6,7,8,9,10]

Compare times:

n = 3
# run each approach 10^n times; [a; b] stacks a and b vertically before unique()
@time [ union(a,b) for i in 1:10^n ]
@time [ unique([a; b]) for i in 1:10^n ];

Out:

elapsed time: 0.007959665 seconds (2335432 bytes allocated)
elapsed time: 0.002394242 seconds (1983432 bytes allocated)

K-nearest neighbor exercise in Julia

My plan is to work through Machine Learning in Action (MLA) by Peter Harrington and "translate" the code from Python to Julia. The first exercise concerns the k-nearest-neighbor (kNN) algorithm.
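
For orientation, here is a minimal kNN sketch in plain Python. It is not the book's code: the function name knn_classify, the toy points, and the choice of Euclidean distance with a simple majority vote are all assumptions made for illustration.

```python
import math
from collections import Counter

def knn_classify(query, points, labels, k=3):
    """Return the majority label among the k training points closest to query."""
    # Euclidean distance from the query to every training point
    distances = [math.dist(query, p) for p in points]
    # Indices of the k smallest distances
    nearest = sorted(range(len(points)), key=lambda i: distances[i])[:k]
    # Majority vote among the labels of those neighbors
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data, made up for illustration
points = [(1.0, 1.1), (1.0, 1.0), (0.0, 0.0), (0.0, 0.1)]
labels = ["A", "A", "B", "B"]
print(knn_classify((0.1, 0.2), points, labels, k=3))  # B
```

The three nearest neighbors of (0.1, 0.2) are two "B" points and one "A" point, so the vote returns "B". Real data would usually need feature scaling before distances are compared.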


Adding layers in Gadfly (Julia)

The Gadfly package documentation doesn't make it clear how to add layers to an existing plot outside of the first plot statement, or how to display the plot once a layer has been added. After asking around, I finally figured out how to do it:


Dropping duplicates in Python and Julia

In Python, to remove duplicates from a list, we can convert the list to a set and then back to a list:

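from numpy.random import rand

a = rand(15)*10      # 15 random floats in [0, 10)
a = a.astype(int)    # truncate to integers 0..9
print(a)
list(set(a))         # a set keeps one copy of each value; order is not guaranteed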

Out:

[3 5 6 3 7 6 9 4 4 4 1 7 3 2 1]
[1, 2, 3, 4, 5, 6, 7, 9]

What if I don't want to change the order? From this Stack Overflow response:

from numpy.random import rand

a = rand(15)*10
a = a.astype(int)
print(a)

def foo(seq):
    seen = set()
    seen_add = seen.add
    # seen_add(x) returns None (falsy), so an unseen x passes the test
    # and is added to seen as a side effect; repeats fail at "x in seen"
    return [ x for x in seq if not (x in seen or seen_add(x))]

foo(a)

Out:

[2 8 1 9 1 0 5 6 7 6 6 9 3 6 3]
[2, 8, 1, 9, 0, 5, 6, 7, 3]
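
A shorter alternative, offered as a sketch rather than as part of the Stack Overflow answer: since Python 3.7, dict keys keep insertion order, so dict.fromkeys() can do the same job. The helper name dedupe_keep_order is made up here, and the sample list is the array printed above.

```python
def dedupe_keep_order(seq):
    """Drop repeated items, keeping the first occurrence of each."""
    # dict keys are unique and, since Python 3.7, keep insertion order
    return list(dict.fromkeys(seq))

sample = [2, 8, 1, 9, 1, 0, 5, 6, 7, 6, 6, 9, 3, 6, 3]
print(dedupe_keep_order(sample))  # [2, 8, 1, 9, 0, 5, 6, 7, 3]
```

Like the set-based version, this requires the items to be hashable.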

In Julia, it's a little more straightforward.