I wrote the following code to compute character bigrams; its output is shown right below it. How do I get an output that excludes the trailing single character (i.e. 't')? And is there a quicker, more efficient way to compute character n-grams?
    >>> b = 'student'
    >>> y = []
    >>> for x in range(len(b)):
    ...     n = b[x:x+2]
    ...     y.append(n)
    ...
    >>> y
    ['st', 'tu', 'ud', 'de', 'en', 'nt', 't']
Here is the result I would like to get:

    ['st', 'tu', 'ud', 'de', 'en', 'nt']
Thanks in advance for your suggestions.
To generate bigrams, stop the range one position early so that the last slice is never a lone character:
    >>> b = 'student'
    >>> [b[i:i+2] for i in range(len(b)-1)]
    ['st', 'tu', 'ud', 'de', 'en', 'nt']
To generalize to a different n:
    >>> n = 4
    >>> [b[i:i+n] for i in range(len(b)-n+1)]
    ['stud', 'tude', 'uden', 'dent']
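If you need this in more than one place, the same slicing idea can be wrapped in a small function. The following is only a sketch: the names `char_ngrams` and `char_ngrams_zip` are made up for illustration, and the `zip` variant is an alternative formulation, not something the snippets above use. Both return an empty list when the string is shorter than n.

```python
def char_ngrams(text, n):
    """Return all contiguous character n-grams of text, in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # range() is empty when n > len(text), so short inputs yield [].
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def char_ngrams_zip(text, n):
    """Same result, built by zipping n progressively shifted copies of text."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # zip() stops at the shortest shifted copy, which drops incomplete n-grams.
    shifted = (text[i:] for i in range(n))
    return ["".join(chars) for chars in zip(*shifted)]


print(char_ngrams("student", 2))  # ['st', 'tu', 'ud', 'de', 'en', 'nt']
```

The slicing version is usually the simpler choice for strings. The `zip` version is mainly interesting because the same pattern also works on lists of tokens if you drop the `"".join`.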