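Resource description
MATLAB code for Q-learning. Written by the author with detailed comments, so it is easy to follow.
Code snippet and file information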
% Q-learning example
addpath('modules');
%% %%%%%%%%%%%%%%%%%%%%%%%%% Q-learning initial setup %%%%%%%%%%%%%%%%%%%%%%%%%
% Set the learning-rate parameter γ
    gamma=0.80;
% Set the reward matrix R
    R=[-inf -inf -inf -inf    0 -inf;
       -inf -inf -inf    0 -inf  100;
       -inf -inf -inf    0 -inf -inf;
       -inf    0    0 -inf    0 -inf;
          0 -inf -inf    0 -inf  100;
       -inf    0 -inf -inf    0  100];
% Initialize the knowledge matrix Q
    Q=zeros(size(R));
% Set the target state
    Target=6;
% Convergence-check variables
    count=0;
    Q_last=ones(size(R))*inf;
%% %%%%%%%%%%%%%%%%%%%%%%%%%%% Reinforcement learning %%%%%%%%%%%%%%%%%%%%%%%%%%%
% Define the maximum number of learning episodes
episode_max=50000;
% Iterative learning
    for episode=0:episode_max
    %% Choose a random initial state
    % Read the total number of states
        state_num=size(R,1);
    % Choose a random initial state
        state=randperm(state_num,1);
    %% Search randomly until the target is reached
        while 1
        %% Randomly choose an executable action for the current state
        % Find the executable actions
            choices=find( R(state,:)>=0 );
        % Randomly choose one executable action
            action=act_rand_select( choices );
        %% Update the Q table according to the next state
        % Move to the next state according to the chosen action
            ne
Attribute         Size  Date        Time   Name
-----------  ---------  ----------  -----  ----
File              2618  2018-03-17  18:14  Q_learning\Q_learning.m
Directory            0  2018-03-16  16:52  Q_learning\modules\
File               369  2018-03-16  16:00  Q_learning\modules\act_rand_select.m
File               504  2018-03-16  17:20  Q_learning\modules\conver_check.m
Directory            0  2018-03-16  22:52  Q_learning\
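
The code snippet on this page is cut off before the Q-table update, so the step itself is not visible here. The following is only a minimal sketch of how a reward-matrix Q-learning loop of this kind is commonly completed, not the author's actual code. It assumes that the chosen action index equals the next state, that the update is Q(s,a) = R(s,a) + gamma*max(Q(s',:)), and that a fixed episode count (2000, chosen arbitrarily) replaces the convergence check. The name next_state and the inline randi selection are hypothetical stand-ins for the parts of Q_learning.m and act_rand_select.m that are not shown.

```matlab
% Hypothetical sketch, not the code from Q_learning.m.
% Assumptions: action index == next state; update rule
% Q(s,a) = R(s,a) + gamma*max(Q(s',:)); fixed number of episodes.
gamma = 0.80;
R = [-inf -inf -inf -inf    0 -inf;
     -inf -inf -inf    0 -inf  100;
     -inf -inf -inf    0 -inf -inf;
     -inf    0    0 -inf    0 -inf;
        0 -inf -inf    0 -inf  100;
     -inf    0 -inf -inf    0  100];
Q = zeros(size(R));
Target = 6;

for episode = 1:2000                      % arbitrary episode count (assumption)
    state = randi(size(R,1));             % random initial state
    while true
        choices = find(R(state,:) >= 0);  % actions allowed from this state
        action = choices(randi(numel(choices)));  % inline random pick
        next_state = action;              % assumption: action names the next state
        % Immediate reward plus discounted best value of the next state
        Q(state,action) = R(state,action) + gamma*max(Q(next_state,:));
        state = next_state;
        if state == Target                % episode ends at the target
            break;
        end
    end
end

disp(round(Q));                           % learned Q table, rounded for display
```

Entries of Q whose reward is -inf are never selected, so they stay at zero. With rewards of at most 100 and gamma = 0.80, no entry can exceed 100/(1-0.80) = 500.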