A Reconfigurable Two-WSe₂-Transistor Synaptic Cell for Reinforcement Learning.

Yue Zhou, Yasai Wang, Fuwei Zhuge, Jianmiao Guo, Sijie Ma, Jingli Wang, Zijian Tang, Yi Li, Xiangshui Miao, Yuhui He, Yang Chai
Published in: Advanced materials (Deerfield Beach, Fla.) (2022)
Reward-modulated spike-timing-dependent plasticity (R-STDP) is a brain-inspired reinforcement learning (RL) rule, exhibiting potential for decision-making tasks and artificial general intelligence. However, the hardware implementation of the reward-modulation process in R-STDP usually requires complicated Si complementary metal-oxide-semiconductor (CMOS) circuit design that causes high power consumption and large footprint. Here, a design with two synaptic transistors (2T) connected in a parallel structure is experimentally demonstrated. The 2T unit based on WSe₂ ferroelectric transistors exhibits reconfigurable polarity behavior, where one channel can be tuned as n-type and the other as p-type due to nonvolatile ferroelectric polarization. In this way, opposite synaptic weight update behaviors with multilevel (>6 bit) conductance states, ultralow nonlinearity (0.56/-1.23), and large G_max/G_min ratio of 30 are realized. By applying positive/negative reward to (anti-)STDP component of 2T cell, R-STDP learning rules are realized for training the spiking neural network and demonstrated to solve the classical cart-pole problem, exhibiting a way for realizing low-power (32 pJ per forward process) and highly area-efficient (100 µm²) hardware chip for reinforcement learning.
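The reward-modulation idea in the abstract can be illustrated with a minimal software sketch. It is not the authors' device model or training code. The exponential STDP window, the time constant `tau`, the step size `max_jump`, and all function names are assumptions made for illustration. Only two figures come from the abstract: the G_max/G_min ratio of 30, and the 6-bit resolution, for which 64 levels is used as a lower bound. Level spacing is idealized as perfectly linear, whereas the abstract reports small but nonzero nonlinearity. A positive reward applies the STDP update and a negative reward flips its sign, which plays the role of the anti-STDP component.

```python
import math


def stdp_window(dt, tau=20.0):
    """Assumed exponential STDP window; dt = t_post - t_pre (arbitrary time units).

    Returns a value in (0, 1] for dt >= 0 (potentiation)
    and in [-1, 0) for dt < 0 (depression).
    """
    if dt >= 0:
        return math.exp(-dt / tau)
    return -math.exp(dt / tau)


def r_stdp_update(level, dt, reward, n_levels=64, max_jump=8):
    """Reward-modulated update of a discrete conductance level.

    reward > 0 keeps the STDP sign; reward < 0 inverts it (anti-STDP).
    The result is clipped to the available levels [0, n_levels - 1].
    n_levels=64 and max_jump=8 are illustrative choices.
    """
    jump = round(max_jump * reward * stdp_window(dt))
    return min(n_levels - 1, max(0, level + jump))


def level_to_conductance(level, g_min=1.0, g_max=30.0, n_levels=64):
    """Map a level index to a normalized conductance, assuming linear spacing.

    g_max / g_min = 30 follows the ratio quoted in the abstract;
    the absolute scale is normalized and hypothetical.
    """
    return g_min + level * (g_max - g_min) / (n_levels - 1)


if __name__ == "__main__":
    start = 32
    for reward in (+1, -1):
        for dt in (5.0, -5.0):
            new_level = r_stdp_update(start, dt, reward)
            print(reward, dt, new_level, level_to_conductance(new_level))
```

In this sketch the same spike pair moves the weight in opposite directions depending on the sign of the reward, which is the behavior the 2T cell is described as providing in hardware through its n-type and p-type channels.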