Let us address the very first thing: What does the name Word2Vec mean? It is exactly what you think (i.e., words as vectors). Word2Vec essentially means expressing each word in your text corpus in an N-dimensional space (embedding space). The word’s weight in each dimension of that embedding space defines it for the model.

But how will we assign these weights? It is abundantly clear that teaching grammar and other semantics to a computer is a tough task, but expressing the meaning of each word is a different ball game altogether. On top of that, the English language has several words with multiple meanings based on the context. So are these weights assigned at random (Table 1)?

Believe it or not, the answer lies in the last paragraph itself. We help define the meaning of words based on their context. Now, if that sounds confusing to you, let’s break it down into even simpler terms.

The context of a word is defined by its neighboring words. Hence, the meaning of a word will depend on the words with which it is associated. If your text corpus has several instances of the word “read” in the same sentence as the word “book”, the Word2Vec approach will automatically group them together. Hence, this technique is totally dependent on a good dataset.

Now that we have determined the magic of Word2Vec lies in word associations, let us take it a step further and understand the two subsets of Word2Vec.

CBOW is a technique where, given the neighboring words, the center word is determined. If our input sentence is “I am reading the book.”, then the input pairs and labels for a window size of 3 would be (I, reading) – (am), (am, the) – (reading), and (reading, book) – (the).

Figure 1: Bare-bones CBOW (image by the author).

Let’s assume our input sentence in Figure 1 is our complete input text. That makes our vocabulary size 5, and we will assume there are 3 embedding dimensions for simplicity. We will be considering the example of the input-label pair of (I, reading) – (am).

We start with the one-hot encodings of I and reading (shape 1x5) and multiply those encodings with an encoding matrix of shape 5x3. This hidden layer is then multiplied by a 3x5 decoding matrix to give us our prediction of a 1x5 shape. This prediction is compared to the one-hot encoding of the actual label (am), which has the same shape, to complete the architecture.

The star here is the encoding/decoding matrices. The matrices provide a finite space in which each word is expressed, and the loss adjusts their weights as they adapt to the data. The matrices thus become the vectorial representation of the words.

A single matrix can be used to fulfill both the encoding and decoding purposes. To disentangle the two tasks, however, we will be using two matrices: the context word matrix to represent the words when viewing them as neighboring words, and the center word matrix to represent the words when viewing them as center words. Using two matrices gives each word two different spheres of space in which to exist while giving us two different perspectives from which to view each word.

The second technique in today’s spotlight is the Skip-Gram approach. Before we get to that, let’s understand what Skip-Gram is: quite literally the opposite of CBOW, but more efficient. Here, given the center word, we have to predict its neighboring words.
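To make the CBOW windowing above concrete, here is a minimal Python sketch of how the (context, center) pairs for the example sentence could be generated. The whitespace tokenization and the decision to keep only full windows are assumptions made for illustration; this is not the post’s actual implementation.

```python
# A minimal sketch (not the post's downloadable code) of generating CBOW
# (context, center) pairs for the example sentence. Whitespace tokenization
# and keeping only full windows are assumptions for illustration.
sentence = "I am reading the book."
tokens = sentence.replace(".", "").split()  # ['I', 'am', 'reading', 'the', 'book']

window_size = 3          # one center word plus one neighbor on each side
half = window_size // 2  # neighbors taken on each side of the center

pairs = []
for i in range(half, len(tokens) - half):  # skip centers without a full window
    context = tokens[i - half:i] + tokens[i + 1:i + half + 1]
    pairs.append((tuple(context), tokens[i]))

print(pairs)
# [(('I', 'reading'), 'am'), (('am', 'the'), 'reading'), (('reading', 'book'), 'the')]
```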
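The matrix shapes in the CBOW walkthrough can also be traced end to end with a few lines of NumPy. This is a rough sketch under stated assumptions: random initialization, averaging of the two context encodings, and a softmax with cross-entropy loss at the output; the walkthrough above only fixes the 5x3 encoding and 3x5 decoding shapes.

```python
import numpy as np

# Toy vocabulary from the example sentence: 5 words, 3 embedding dimensions.
vocab = ["I", "am", "reading", "the", "book"]
word_to_idx = {word: i for i, word in enumerate(vocab)}
V, N = len(vocab), 3

def one_hot(word):
    """Return a 1x5 one-hot row vector for the given word."""
    vec = np.zeros((1, V))
    vec[0, word_to_idx[word]] = 1.0
    return vec

rng = np.random.default_rng(42)
W_encode = rng.normal(size=(V, N))  # 5x3 encoding matrix
W_decode = rng.normal(size=(N, V))  # 3x5 decoding matrix

# Forward pass for the (I, reading) - (am) pair.
context = (one_hot("I") + one_hot("reading")) / 2.0  # average the 1x5 one-hots (assumption)
hidden = context @ W_encode                          # 1x3 hidden layer
logits = hidden @ W_decode                           # 1x5 prediction scores
probs = np.exp(logits) / np.exp(logits).sum()        # softmax over the vocabulary

label = one_hot("am")                                # 1x5 one-hot of the true center word
loss = -np.sum(label * np.log(probs))                # cross-entropy against the label
print(probs.round(3), float(loss))
```

During training, the gradient of this loss would update both matrices, which is how their rows gradually turn into usable word vectors.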
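For Skip-Gram, the roles flip: the center word is the input, and each neighboring word becomes a label. Below is a minimal sketch, assuming the same toy vocabulary, random initialization, and the two-matrix setup described above (a center word matrix and a context word matrix); the scoring by dot product and softmax is an illustrative choice, not the post’s exact architecture.

```python
import numpy as np

# Same toy vocabulary; two separate matrices, one per role (assumption:
# rows of each matrix act as the word vectors for that role).
vocab = ["I", "am", "reading", "the", "book"]
word_to_idx = {word: i for i, word in enumerate(vocab)}
V, N = len(vocab), 3

rng = np.random.default_rng(0)
W_center = rng.normal(size=(V, N))   # center word matrix (words viewed as centers)
W_context = rng.normal(size=(V, N))  # context word matrix (words viewed as neighbors)

def neighbor_probs(center_word):
    """Probability of each vocabulary word appearing as a neighbor of center_word."""
    v_center = W_center[word_to_idx[center_word]]  # 3-dimensional center-word vector
    scores = W_context @ v_center                  # dot product with every context vector
    return np.exp(scores) / np.exp(scores).sum()   # softmax over the vocabulary

# Given the center word "am", training would push the probability mass toward
# its true neighbors in the example sentence, "I" and "reading".
print(dict(zip(vocab, neighbor_probs("am").round(3))))
```

After training on the pairs from the example sentence, the scores for “I” and “reading” would dominate whenever “am” is fed in as the center word, which is exactly the behavior Skip-Gram is optimized for.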