Merge pull request 'feat/fix_loading_for_sahred_smoe' (#44) from feat/fix_loading_for_sahred_smoe into main
Commit 39ff2710f5, pushed by Fabel
Merge pull request 'fixed saving' (#43) from feat/bugfix into main
Commit 7850cdba85, pushed by Fabel
Merge pull request 'feat/bugfix' (#42) from feat/bugfix into main
Commit ff6f279728, pushed by Fabel
Merge pull request 'corrected version numbering' (#40) from feat/new_version into develop
Commit 924af79547, pushed by Fabel
Merge pull request 'updated pretrainer to work with correct imports' (#39) from feat/fix_imports into develop
Commit f0f3f05584, pushed by Fabel
Merge pull request 'feat/energy_efficenty' (#38) from feat/energy_efficenty into develop
Commit bb65dec449, pushed by Fabel
Merge pull request 'feat/tf_support' (#37) from feat/tf_support into develop
Commit 674e5d5409, pushed by Fabel
Removed the chunked and recursive models, since they are now deprecated. For the transformer switch, I decided to focus on my idea of a Sparse Mixture of Experts with shared parameters.
Commit 0852ddb109, pushed by Fabel
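The commit above only names the direction taken; the repository's actual implementation is not shown in this log. As a loose illustration of what a Sparse Mixture of Experts with a shared component can look like, here is a minimal PyTorch sketch. Every name in it (`SharedParamSMoE`, `n_experts`, `top_k`, `shared_expert`) is an assumption, and the "always-active shared expert alongside top-k routed experts" design is just one common reading of "shared params", not the confirmed architecture of this project.

```python
# Hypothetical sketch only: names and design are assumptions, not this repo's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedParamSMoE(nn.Module):
    """Sparse MoE layer with one shared expert that runs on every token,
    plus top-k routed experts selected per token by a learned router."""

    def __init__(self, d_model: int, d_ff: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Shared expert: applied to all tokens, independent of the router.
        self.shared_expert = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        # Routed experts: only top_k of these run for any given token.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten tokens for per-token routing.
        tokens = x.reshape(-1, x.shape[-1])
        gates = F.softmax(self.router(tokens), dim=-1)
        topk_w, topk_i = gates.topk(self.top_k, dim=-1)
        topk_w = topk_w / topk_w.sum(dim=-1, keepdim=True)  # renormalize gate weights

        out = self.shared_expert(tokens)  # shared path, every token
        for slot in range(self.top_k):
            idx = topk_i[:, slot]
            w = topk_w[:, slot].unsqueeze(-1)
            for e, expert in enumerate(self.experts):
                mask = idx == e
                if mask.any():
                    out[mask] = out[mask] + w[mask] * expert(tokens[mask])
        return out.reshape_as(x)
```

A shared expert gives the layer a dense fallback path that learns features common to all tokens, while the sparse routed experts specialize; whether this matches the "shared Params" idea referenced in the commit is an open question from the log alone.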