Matrix-Rematrix

Tensors turn into matrices

Neural networks run on matrix manipulation. Training uses a variety of methods, many of which grew out of gradient descent, and all of them require handling matrices and computing gradients (derivatives with respect to matrices). Look under the hood of a neural network and you will see chains of matrices that often look intimidating. Put simply, "the matrix awaits us all". It is time to get better acquainted.





To do this, we will take the following steps:





  • consider matrix manipulations: transposition, multiplication, gradient;





  • ;





  • .





NumPy . , , , , . , , , - , , , . , - : , .





-

- , , , . , , , Google TensorFlow.





A one-dimensional array (a vector) has elements a_{i}, i = 0, 1, 2, ..., n-1, where n is the number of elements.





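import numpy as np  # import the numpy library
a = np.array([1, 2, 5])
a.ndim      # number of dimensions (tensor rank) = 1
a.shape     # shape of the array: (3,)
a.shape[0]  # length along the first axis = 3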
      
      



The dot product of two vectors is written with a repeated index: a_{i} \cdot b_{i} = a_{0} \cdot b_{0} + a_{1} \cdot b_{1} + a_{2} \cdot b_{2}. A repeated index implies summation, here over i from 0 to 2.





b = np.array([3, 4, 7])
np.dot(a, b)   # dot product = 46
a * b          # element-wise product: array([ 3,  8, 35])
np.sum(a * b)  # sum of the element-wise products = 46
      
      



A two-dimensional array (a matrix) is denoted A, with elements A_{i,j}. For example, A_{0,2} is the element in row 0 and column 2.





A = np.array([[1, 2, 3],
              [2, 4, 6]])
A        # array([[1, 2, 3],
         #        [2, 4, 6]])
A[0, 2]  # element in row 0, column 2 = 3
A.shape  # (2, 3): 2 rows, 3 columns
      
      



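The product of matrices A and B is the matrix C = AB with elements C_{i,k} = A_{i,j} B_{j,k}, summed over the repeated index j. The product exists only if the number of columns of A equals the number of rows of B.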





B = np.array([[7, 8, 1, 3],
              [5, 4, 2, 7],
              [3, 6, 9, 4]])
A.shape[1] == B.shape[0]  # True
A.shape[1], B.shape[0]    # (3, 3)
A.shape, B.shape          # ((2, 3), (3, 4))
C = np.dot(A, B)
C  # array([[26, 34, 32, 29],
   #        [52, 68, 64, 58]])
   # for example, C[0,1] = A[0,0]*B[0,1] + A[0,1]*B[1,1] + A[0,2]*B[2,1] = 1*8 + 2*4 + 3*6 = 34
C.shape  # (2, 4)
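
The index formula C_{i,k} = A_{i,j} B_{j,k} can be checked directly. The sketch below is an illustration added here, not code from the original article: it spells out the sum over the repeated index j with explicit loops and compares the result with np.dot and with np.einsum, whose subscript string mirrors the index notation. The helper name matmul_by_index is made up for this example.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [2, 4, 6]])
B = np.array([[7, 8, 1, 3],
              [5, 4, 2, 7],
              [3, 6, 9, 4]])

def matmul_by_index(A, B):
    """Hypothetical helper: C[i, k] = sum over j of A[i, j] * B[j, k]."""
    n_rows, n_inner = A.shape
    n_cols = B.shape[1]
    C = np.zeros((n_rows, n_cols), dtype=A.dtype)
    for i in range(n_rows):
        for k in range(n_cols):
            for j in range(n_inner):
                C[i, k] += A[i, j] * B[j, k]
    return C

C_loops = matmul_by_index(A, B)
C_einsum = np.einsum('ij,jk->ik', A, B)  # the repeated index j is summed
C_dot = np.dot(A, B)
```

All three results should agree; the loop version is only there to make the summation visible and is far slower than np.dot on real data.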
      
      



The product BA, on the other hand, is not defined, and NumPy reports an error:





np.dot(B, A) # ValueError: shapes (3,4) and (2,3) not aligned: 4 (dim 1) != 2 (dim 0)
      
      



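The number of columns of B does not match the number of rows of A, so the matrices cannot be multiplied in this order.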





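Now take two column vectors with elements a_{i,0} and b_{j,0}. Their outer product is the matrix D_{i,j} = a_{i,0} b_{j,0}. Since b_{j,0} = (b.T)_{0,j}, where b.T is the transpose of b (in NumPy notation), this is the matrix product D = a \cdot b.T. Transposing it gives D.T = (a \cdot b.T).T = (b.T.T) \cdot a.T = b \cdot a.T.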





a = np.reshape(a, (3, 1))  # make a column vector: a.shape goes from (3,) to (3, 1)
b = np.reshape(b, (3, 1))  # same for b
D = np.dot(a, b.T)
D # array([[ 3,  4,  7],
  #        [ 6,  8, 14],
  #        [15, 20, 35]])
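
The transposition rule for the outer product can be confirmed numerically. This is a small illustrative sketch using the same vectors as above; np.outer is brought in only as a cross-check and is not used in the original text.

```python
import numpy as np

a = np.array([1, 2, 5]).reshape(3, 1)  # column vector, shape (3, 1)
b = np.array([3, 4, 7]).reshape(3, 1)  # column vector, shape (3, 1)

D = np.dot(a, b.T)             # D[i, j] = a[i, 0] * b[j, 0], shape (3, 3)
D_transposed = np.dot(b, a.T)  # b . a.T, expected to equal D.T
D_outer = np.outer(a, b)       # np.outer flattens its inputs first
```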
      
      



, . , .





, , . (cost function). , . . , (learning rate), , (epoch). , . (), . . , , , .





The first time

, ( , ).





The input data is a matrix whose rows are samples and whose columns are features.





, ( ). (, โ€ฆ) , , . , .





!

, , . , โ€œ โ€ . , , . , , . , , , .





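Suppose we have 10 samples with 3 features each, so the input matrix has shape (10, 3). For experiments it can be filled with random numbers in several ways: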





  • random integers from 0 up to (but not including) 50;





X=np.random.randint(0, 50, (10, 3))
      
      



  • random real numbers uniformly distributed between 0 and 1;





X=np.random.rand(10, 3)
      
      



  • normally distributed numbers with mean \mu = 2 and variance \sigma^2 = 16, that is, drawn from N(\mu, \sigma^2);





X=4*np.random.randn(10, 3) + 2
      
      



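np.random.randn by itself samples the standard normal distribution with \mu = 0 and \sigma = 1, which is why the result is multiplied by 4 and shifted by 2.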
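
A quick empirical check of the scaling: the sketch below draws a large sample and compares its mean and variance with \mu = 2 and \sigma^2 = 16. It uses the seeded np.random.default_rng generator for reproducibility, which is a substitution made here; the original snippets call np.random.randn directly.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator (an assumption, not in the original)
mu, sigma = 2.0, 4.0            # mean 2, standard deviation 4, variance 16

samples = sigma * rng.standard_normal(200_000) + mu
sample_mean = samples.mean()    # should be close to 2
sample_var = samples.var()      # should be close to 16
```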





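The input X of shape (10, 3) is multiplied by the weight matrix W^{(1)} of the first layer. The number of rows of W^{(1)} must equal the number of features, and the number of its columns sets the width of the hidden layer; here W^{(1)} has shape (3, 4). By the rule of matrix multiplication, (10, 3)(3, 4) \Rightarrow (10, 4), so X \cdot W^{(1)} has shape (10, 4). An activation function is applied to a matrix element by element: for a matrix A of shape (m, n) with elements a_{i,j}, f(A) is the matrix of the same shape with elements f(a_{i,j}); for example, a_{1,2} \Rightarrow f(a_{1,2}). The second weight matrix W^{(2)} has shape (4, 1), so the whole chain gives (10, 3)(3, 4)(4, 1) \Rightarrow (10, 1). The result is the output \hat{Y} for the 10 samples. Leaving the activation functions out for the moment: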





\hat{Y} = X \cdot W^{(1)} \cdot W^{(2)}, \quad\quad \hat{Y}_{i,0} = X_{i,j} W_{j,k}^{(1)} W_{k,0}^{(2)}.

, . (bias).





. : , , , .





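X = np.random.randint(0, 50, (10, 3))
w1 = 2 * np.random.rand(3, 4) - 1  # weights initialized uniformly between -1 and +1
w2 = 2 * np.random.rand(4, 1) - 1
Y = np.dot(np.dot(X, w1), w2)  # forward pass through the network
Y.shape    # (10, 1)
Y.T.shape  # (1, 10)
(np.dot(Y.T, Y)).shape  # (1, 1): a matrix holding a single number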
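
To connect the matrix form of the forward pass with its index form \hat{Y}_{i,0} = X_{i,j} W_{j,k}^{(1)} W_{k,0}^{(2)}, the following sketch computes both and compares them. The seeded generator and the variable names Y_hat and Y_hat_idx are choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.integers(0, 50, size=(10, 3))  # 10 samples, 3 features
w1 = 2 * rng.random((3, 4)) - 1        # weights in [-1, 1)
w2 = 2 * rng.random((4, 1)) - 1

Y_hat = X @ w1 @ w2                               # matrix form
Y_hat_idx = np.einsum('ij,jk,kl->il', X, w1, w2)  # index form: sum over j and k
```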
      
      



โ€‹. -1 +1, โ€œโ€ ( ).





Now add the activation functions: f_1 for the hidden layer and f_2 for the output layer.





\hat{Y}_{i,0} = f_2(f_1(X_{i,j} W_{j,k}^{(1)}) W_{k,0}^{(2)}), \quad\quad \hat{Y} = f_2(f_1(X \cdot W^{(1)}) \cdot W^{(2)}).

, .





\triangle=\sum_i(Y_{i,0}-\hat{Y}_{i,0})^2=\sum_i\widetilde{Y}_{i,0}^2=(\widetilde{Y}.T)_{0,i}\widetilde{Y}_{i,0}=(\widetilde{Y}.T)\cdot\widetilde{Y},

where (X, Y) is the training set, \widetilde{Y}_{i,0}=Y_{i,0}-\hat{Y}_{i,0}, and (\widetilde{Y}.T)_{0,i}=\widetilde{Y}_{i,0}.
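
The two ways of writing the loss, as a sum of squares and as the matrix product (\widetilde{Y}.T) \cdot \widetilde{Y}, give the same number. A minimal sketch with made-up targets and predictions:

```python
import numpy as np

Y = np.array([[1.0], [2.0], [3.0]])      # made-up targets, shape (3, 1)
Y_hat = np.array([[1.5], [1.0], [3.0]])  # made-up predictions, shape (3, 1)

residual = Y - Y_hat                     # plays the role of \widetilde{Y}
loss_sum = np.sum(residual ** 2)         # sum over i of residual[i, 0]^2
loss_mat = np.dot(residual.T, residual)  # (1, 1) matrix holding the same number
```

Note that the matrix form returns a (1, 1) array, so the scalar has to be read out as loss_mat[0, 0].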





, . .





. - . , . , .





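Gradient descent rests on a simple fact: at an extremum of f(x) the derivative vanishes, f^{'}(x_0)=0. Away from the minimum, the sign of the derivative shows which way to move. If f^{'}(W)<0, the function decreases as W grows, so W should be increased; if f^{'}(W)>0, W should be decreased. In both cases the step is taken against the derivative.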





W\Rightarrow W+\mu\cdot\delta W=W-\mu\cdot\frac{\partial \triangle}{\partial W},





W_{i,j}\Rightarrow W_{i,j}+\mu\cdot\delta W_{i,j}=W_{i,j}-\mu\cdot\frac{\partial \triangle}{\partial W_{i,j}},

where \mu is the learning rate.
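
The update rule is easiest to see on a one-dimensional example. The sketch below is an illustration under assumed values: the loss (w - 3)^2, the starting point 0, the learning rate 0.1 and the 100 epochs are all chosen here for demonstration and do not come from the original text.

```python
def loss(w):
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2

def grad(w):
    """Derivative of the toy loss."""
    return 2.0 * (w - 3.0)

mu = 0.1  # learning rate
w = 0.0   # starting point
history = [loss(w)]
for epoch in range(100):
    w = w - mu * grad(w)  # step against the derivative
    history.append(loss(w))
```

With these values each step shrinks the distance to the minimum by a factor of 0.8, so w approaches 3. A learning rate above 1.0 would make this particular iteration diverge.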





.





\frac{\partial a_{m, n}}{\partial a_{i,j}}=\delta_{m,i}\delta_{n,j},

where \delta_{i,j} is the Kronecker delta, equal to 1 when i=j and 0 otherwise. For example, \delta_{1,1}=1 and \delta_{2,1}=0.
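
In NumPy the Kronecker delta is simply the identity matrix. The sketch below, added as an illustration, also shows the property used repeatedly in the derivations that follow: summing a matrix against a delta over a shared index only renames that index.

```python
import numpy as np

delta = np.eye(3, dtype=int)  # delta[i, j] = 1 if i == j else 0

B = np.arange(12).reshape(4, 3)
# B[u, k] * delta[k, m], summed over k, gives back B[u, m]
contracted = np.einsum('uk,km->um', B, delta)
```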









\frac{\partial \triangle}{\partial W_{m,n}}=-2\sum_i(Y_{i,0}-\hat{Y}_{i,0})\frac{\partial \hat{Y}_{i,0}}{\partial W_{m,n}}=-2\widetilde{Y}_{i,0}\frac{\partial \hat{Y}_{i,0}}{\partial W_{m,n}},

where, as before, \widetilde{Y}_{i,0}=Y_{i,0}-\hat{Y}_{i,0}, and summation over the repeated index i is implied.





. . , , .





Without activation functions, \hat{Y}_{i,0}=X_{i,j} W_{j,k}^{(1)} W_{k,0}^{(2)}, so





\frac{\partial \triangle}{\partial W_{m,0}^{(2)}}=-2\widetilde{Y}_{i,0}\frac{\partial \hat{Y}_{i,0}}{\partial W_{m,0}^{(2)}}=-2\widetilde{Y}_{i,0}X_{i,j} W_{j,k}^{(1)}\delta_{k,m}=-2\widetilde{Y}_{i,0}X_{i,j} W_{j,m}^{(1)}=-2\widetilde{Y}_{i,0}(X\cdot W^{(1)})_{i,m}

Using A_{i,m}=(A.T)_{m,i}, the update can be written as a matrix product:





\delta  W_{m,0}^{(2)}=-\frac{\partial \triangle}{\partial W_{m,0}^{(2)}}=2((X\cdot W^{(1)}).T)_{m,i}\widetilde{Y}_{i,0},





\delta  W^{(2)}=2((X\cdot W^{(1)}).T)\cdot \widetilde{Y}.

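Let us check the shapes to confirm that \delta W^{(2)} matches W^{(2)}. X\cdot W^{(1)} has shape (10,3)(3,4)=(10,4), so its transpose has shape (4,10). \widetilde{Y}, like \hat{Y}, has shape (10,1). Hence \delta W^{(2)} has shape (4,10)(10,1)=(4,1), as required.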





deltaW2 = 2 * np.dot(np.dot(X, w1).T, Y)  # shape check only: Y stands in for the residual Y - Y_hat
deltaW2.shape  # (4, 1)
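
Beyond the shape check, the formula for \delta W^{(2)} can be compared against a numerical derivative of the loss. The sketch below assumes made-up random data and targets and uses central finite differences; the step h and the helper name loss are choices made for this illustration. Since \delta W^{(2)} is defined as minus the gradient, the two results should differ only in sign.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.random((10, 3))
Y = rng.random((10, 1))  # made-up targets
w1 = 2 * rng.random((3, 4)) - 1
w2 = 2 * rng.random((4, 1)) - 1

def loss(w2_):
    """Sum of squared residuals for the network without activations."""
    residual = Y - X @ w1 @ w2_
    return np.sum(residual ** 2)

residual = Y - X @ w1 @ w2
deltaW2 = 2 * np.dot(np.dot(X, w1).T, residual)  # analytic: minus the gradient

# central finite differences, one weight at a time
h = 1e-6
num_grad = np.zeros_like(w2)
for m in range(w2.shape[0]):
    step = np.zeros_like(w2)
    step[m, 0] = h
    num_grad[m, 0] = (loss(w2 + step) - loss(w2 - step)) / (2 * h)
```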
      
      



Now the same for W^{(1)}.





\frac{\partial \triangle}{\partial W_{m,n}^{(1)}}=-2\widetilde{Y}_{i,0}\frac{\partial \hat{Y}_{i,0}}{\partial W_{m,n}^{(1)}}=-2\widetilde{Y}_{i,0}X_{i,j} \delta_{j,m}\delta_{k,n}W_{k,0}^{(2)}=-2\widetilde{Y}_{i,0}X_{i,m} W_{n,0}^{(2)}=-2(X.T)_{m,i}\widetilde{Y}_{i,0}(W^{(2)}.T)_{0,n}, \delta  W^{(1)}=2(X.T)\cdot \widetilde{Y}\cdot (W^{(2)}.T).

, โ€œ โ€, โ€œ โ€ - m nโ€‹. , , . : โ€œโ€ ( ), , .





\delta  W^{(1)}: (3,10)(10,1)(1,4)=(3,4).
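
The step from the index form of \delta W^{(1)} to its matrix form can be checked with np.einsum. This is an illustrative sketch: the random arrays are made up, and residual simply stands in for \widetilde{Y}.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.random((10, 3))
w2 = 2 * rng.random((4, 1)) - 1
residual = rng.random((10, 1))  # stands in for Y - Y_hat

# index form: deltaW1[m, n] = 2 * residual[i, 0] * X[i, m] * w2[n, 0], summed over i
deltaW1_idx = 2 * np.einsum('i,im,n->mn', residual[:, 0], X, w2[:, 0])
# matrix form: (3, 10)(10, 1)(1, 4) = (3, 4)
deltaW1_mat = 2 * np.dot(np.dot(X.T, residual), w2.T)
```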





With activation functions the derivatives require the chain rule for composite functions: for z=f(y(x)), the derivative of z with respect to x is z_x^{'}=f_y^{'}y_x^{'}.





,





\hat{Y}_{i,0}=f_2(f_1(X_{i,j} W_{j,k}^{(1)})W_{k,0}^{(2)})\quad\Rightarrow\quad  \hat{Y}_{i,0}=f_2(C_{i,0}),

:





C_{i,0}=B_{i,k}W_{k,0}^{(2)}, \quad\quad B_{i,k}=f_1(A_{i,k}), \quad\quad A_{i,k}=X_{i,j} W_{j,k}^{(1)}.

Start with W^{(2)}, which is closest to the output. We have





\delta  W_{m,0}^{(2)}=2\widetilde{Y}_{i,0}\frac{\partial \hat{Y}_{i,0}}{\partial W_{m,0}^{(2)}}=2\widetilde{Y}_{i,0}\frac{\partial f_2(C_{i,0})}{\partial C_{\mu,0}}\frac{\partial C_{\mu,0}}{\partial W_{m,0}^{(2)}}=2\widetilde{Y}_{i,0}f_2^{'}(C_{i,0})\delta_{i,\mu}B_{\mu,k}\delta_{k,m}=2\widetilde{Y}_{i,0}f_2^{'}(C_{i,0})B_{i,m}.

,





\frac{\partial f_2(C_{i,0})}{\partial C_{\mu,0}}=f_2^{'}(C_{i,0})\delta_{i,\mu}, \quad\quad \frac{\partial C_{\mu,0}}{\partial W_{m,0}^{(2)}}=B_{\mu,k}\frac{\partial W_{k,0}^{(2)}}{\partial W_{m,0}^{(2)}}=B_{\mu,k}\delta_{k,m}.

To write this as a matrix product, move the index m to the front: B_{i,m}=(B.T)_{m,i}, that is, f_1(A_{i,m})=(f_1(A).T)_{m,i}. Then





\delta  W_{m,0}^{(2)}=2(B.T)_{m,i}\widetilde{Y}_{i,0}f_2^{'}(C_{i,0}) \Rightarrow \delta  W^{(2)}=2(B.T)\cdot(\widetilde{Y}*f_2^{'}(C))

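where "*" denotes element-wise multiplication: for two matrices a and b of the same shape, a*b is a matrix of that shape whose elements are the products of the corresponding elements, for example a_{1,2}b_{1,2}.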
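
The difference between the element-wise product "*" and the matrix product is worth seeing once on small arrays. The values below are made up for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([[10, 20, 30],
              [40, 50, 60]])

elementwise = a * b           # same shape as a and b: (2, 3)
matrix_prod = np.dot(a, b.T)  # a different operation: shape (2, 2)
```

Here elementwise[1, 2] is a[1, 2] * b[1, 2] = 6 * 60, while np.dot(a, b) would fail outright because the shapes (2, 3) and (2, 3) are not aligned.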





For illustration, take f_1(x)=x^2 and f_2(x)=x^3 and write them, together with their derivatives, in NumPy.





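def f1(x):  # activation function of the first layer
    return np.power(x, 2)
def gradf1(x):  # its derivative
    return 2 * x
def f2(x):  # activation function of the second layer
    return np.power(x, 3)
def gradf2(x):  # its derivative
    return 3 * np.power(x, 2)

A = np.dot(X, w1)  # input to the first activation
B = f1(A)          # output of the hidden layer
C = np.dot(B, w2)  # input to the second activation
Y = f2(C)          # network output
deltaW2 = 2 * np.dot(B.T, Y * gradf2(C))  # shape check only: Y stands in for the residual
deltaW2.shape  # (4, 1)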
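
The matrix expression for \delta W^{(2)} with activations can be compared with its index form \delta W_{m,0}^{(2)}=2\widetilde{Y}_{i,0}f_2^{'}(C_{i,0})B_{i,m}. The sketch below does this with np.einsum on made-up random arrays; residual stands in for \widetilde{Y}.

```python
import numpy as np

def f1(x):
    return np.power(x, 2)

def gradf2(x):
    return 3 * np.power(x, 2)

rng = np.random.default_rng(4)
X = rng.random((10, 3))
w1 = 2 * rng.random((3, 4)) - 1
w2 = 2 * rng.random((4, 1)) - 1
residual = rng.random((10, 1))  # stands in for Y - Y_hat

B = f1(X @ w1)  # hidden layer output, shape (10, 4)
C = B @ w2      # input to f2, shape (10, 1)

# index form: deltaW2[m, 0] = 2 * residual[i, 0] * f2'(C[i, 0]) * B[i, m], summed over i
deltaW2_idx = 2 * np.einsum('i,i,im->m', residual[:, 0], gradf2(C)[:, 0], B).reshape(-1, 1)
# matrix form: (4, 10) times the element-wise product of two (10, 1) matrices
deltaW2_mat = 2 * np.dot(B.T, residual * gradf2(C))
```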
      
      



For W^{(1)} the chain of derivatives is longer, but the procedure is the same.





\delta  W_{m,n}^{(1)}=2\widetilde{Y}_{i,0}\frac{\partial \hat{Y}_{i,0}}{\partial W_{m,n}^{(1)}}=2\widetilde{Y}_{i,0}\frac{\partial f_2(C_{i,0})}{\partial C_{\mu,\nu}}\frac{\partial C_{\mu,\nu}}{\partial B_{l,s}}\frac{\partial B_{l,s}}{\partial W_{m,n}^{(1)}},

where C_{\mu,\nu}=B_{\mu,k}W_{k,\nu}^{(2)}. The individual factors are:





\frac{\partial f_2(C_{i,0})}{\partial C_{\mu,\nu}}=f_2^{'}(C_{i,0})\delta_{i,\mu}\delta_{0,\nu},\quad\quad \frac{\partial C_{\mu,\nu}}{\partial B_{l,s}}=\delta_{\mu,l}\delta_{k,s}W_{k,\nu}^{(2)},\quad\quad \frac{\partial B_{l,s}}{\partial W_{m,n}^{(1)}}=\frac{\partial B_{l,s}}{\partial A_{r,e}}\frac{\partial A_{r,e}}{\partial W_{m,n}^{(1)}}=f_1^{'}(A_{l,s})\delta_{l,r}\delta_{s,e}\delta_{j,m}\delta_{e,n}X_{r,j}=f_1^{'}(A_{l,s})\delta_{l,r}\delta_{s,n}X_{r,m}.

,





\delta W_{m,n}^{(1)}=2\widetilde{Y}_{i,0}f_2^{'}(C_{i,0})\delta_{i,\mu}\delta_{0,\nu}\delta_{\mu,l}\delta_{k,s}W_{k,\nu}^{(2)}f_1^{'}(A_{l,s})\delta_{s,n}\delta_{l,r}X_{r,m}=2\widetilde{Y}_{i,0}f_2^{'}(C_{i,0})W_{n,0}^{(2)}f_1^{'}(A_{i,n})X_{i,m},





\delta_{i,\mu}\delta_{0,\nu}\delta_{\mu,l}\delta_{k,s}\delta_{s,n}\delta_{l,r}=\delta_{0,\nu}\delta_{i,l}\delta_{i,r}\delta_{k,n}\delta_{s,n}.

In addition, \delta_{0,\nu} W_{k,\nu}^{(2)} = W_{k,0}^{(2)}. The indices m and n are free, while l, r, k, s are "dummy" indices that are summed over.





Collecting the factors into matrix form,





\delta W_{m,n}^{(1)}=2(X.T)_{m,i}\widetilde{Y}_{i,0}f_2^{'}(C_{i,0})(W^{(2)}.T)_{0,n}f_1^{'}(A_{i,n}), \quad\quad \delta W^{(1)}=2(X.T)\cdot[[(\widetilde{Y}*f_2^{'}(C))\cdot(W^{(2)}.T)]*f_1^{'}(A)].

Here D_{i,0}=\widetilde{Y}_{i,0}f_2^{'}(C_{i,0}) \Rightarrow D=\widetilde{Y}*f_2^{'}(C), then F_{i,n}=D_{i,0}(W^{(2)}.T)_{0,n} \Rightarrow F=D\cdot(W^{(2)}.T), and finally F_{i,n}f_1^{'}(A_{i,n}) \Rightarrow F*f_1^{'}(A).





.





deltaW1 = 2 * np.dot(X.T, np.dot(Y * gradf2(C), w2.T) * gradf1(A))  # shape check only: Y stands in for the residual
deltaW1.shape  # (3, 4)
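
Both update formulas can be tested against numerical derivatives of the loss. The sketch below is a gradient check under assumptions made here: random data and targets, f_1(x)=x^2 and f_2(x)=x^3 as in the text, and a hypothetical helper numerical_grad based on central finite differences. Because \delta W is minus the gradient, each analytic result should equal the negated numerical one.

```python
import numpy as np

def f1(x):
    return x ** 2

def gradf1(x):
    return 2 * x

def f2(x):
    return x ** 3

def gradf2(x):
    return 3 * x ** 2

rng = np.random.default_rng(5)
X = rng.random((10, 3))
Y = rng.random((10, 1))  # made-up targets
w1 = 2 * rng.random((3, 4)) - 1
w2 = 2 * rng.random((4, 1)) - 1

def loss(w1_, w2_):
    """Sum of squared residuals for the two-layer network."""
    Y_hat = f2(f1(X @ w1_) @ w2_)
    return np.sum((Y - Y_hat) ** 2)

def numerical_grad(func, w, h=1e-6):
    """Hypothetical helper: central finite differences over every entry of w."""
    grad = np.zeros_like(w)
    for idx in np.ndindex(w.shape):
        step = np.zeros_like(w)
        step[idx] = h
        grad[idx] = (func(w + step) - func(w - step)) / (2 * h)
    return grad

A = X @ w1
B = f1(A)
C = B @ w2
residual = Y - f2(C)  # the true residual, not the network output

deltaW2 = 2 * np.dot(B.T, residual * gradf2(C))
deltaW1 = 2 * np.dot(X.T, np.dot(residual * gradf2(C), w2.T) * gradf1(A))

num_grad_w1 = numerical_grad(lambda w: loss(w, w2), w1)
num_grad_w2 = numerical_grad(lambda w: loss(w1, w), w2)
```

A check like this is cheap insurance before writing a training loop: a wrong sign or a misplaced transpose shows up immediately as a large mismatch.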
      
      



. .





โ€œ, - . -!โ€ ? , , , . , . - , , . ! , , - . , , .





, . James Loy - , , , , , . . , , , . โ€œ-โ€, , , . , TensorFlow Keras. , sumber aslinya (ada terjemahan ke dalam bahasa Rusia).





Write code, study the formulas, read books, ask yourself questions.





As for tools: Jupyter Notebook (Anaconda rules!), Colab ...







