Don't know about OLKi? Have a look here.
Here is a tutorial on how to set up your own OLKi instance.
How to set up an OLKi instance
- install docker: https://docs.docker.com/get-docker/
- install docker-compose: https://docs.docker.com/compose/install/
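- optionally check that both tools are installed and recent enough (exact version numbers will vary):
docker --version
docker-compose --version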
- create (and edit with your own info) a .env.docker with:
DATABASE_URL=postgresql://olki:olki@postgres/olki
CACHE_URL=redis://redis:6379
OLKI_WEBSERVER_HOST=olki.cerisara.fr
OLKI_WEBSERVER_PORT=443
OLKI_WEBSERVER_HTTPS=true
OLKI_TRUST_PROXY=["127.0.0.1", "loopback", "172.18.0.0/16"]
OLKI_ADMIN_EMAIL=your@email.com
- create (and edit with your own info) a .env for OLKi:
INSTANCE_NAME="yet another science data repository"
INSTANCE_HOSTNAME="olki.cerisara.fr"
INSTANCE_REGISTRATIONS_OPEN=True
EMAIL_SENDER="olki@cerisara.fr"
EMAIL_URL="smtp+tls://user@:password@smtp.example.org:587"
AUTH_LDAP_ENABLE=True
AUTH_LDAP_NO_NEW_USERS=False
#AUTH_LDAP_SERVER_URI="ldap://localhost:389"
#AUTH_LDAP_USER_DN_TEMPLATE="cn=%(user)s,ou=people,dc=planetexpress,dc=com"
#AUTH_LDAP_REQUIRE_GROUP="group"
LOGLEVEL="debug"
SENTRY_ENABLE=False
#SENTRY_DSN=""
- configuring the email URL correctly may be a pain, because the password may contain characters that are not valid in a URL. To do it properly, you may need to "urlencode" your password, either with node as follows or with the Python one-liner shown just after:
alias urlencode='node -e "console.log(encodeURIComponent(process.argv[1]))"'
urlencode 'funnypassword'
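If node is not installed, a Python one-liner gives the same result (assuming python3 is available):
python3 -c 'import urllib.parse, sys; print(urllib.parse.quote(sys.argv[1], safe=""))' 'funnypassword'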
- create (you shouldn't need to edit it) a docker-compose.yml:
version: "3.3"

services:

  olki:
    # If you don't want to use the official image and prefer to build one from the sources:
    # build: ../../
    #   network: host
    #   context: .
    #   dockerfile: ./Dockerfile
    image: ${OLKI_IMAGE-rigelk/olki}
    env_file:
      - .env.docker
    # Exposes the app on 127.0.0.1:5000 for the reverse proxy configured below.
    # Direct local access without passing through the reverse proxy is insecure;
    # do not enable when using stack.
    ports:
      - "${OLKI_EXTERNAL_PORT-127.0.0.1:5000}:5000"
    volumes:
      - media:/app/olki_back/olki/media
      - type: bind
        source: ./.env
        target: /app/.env
    depends_on:
      - postgres
      - redis
      - postfix
    restart: ${RESTART_POLICY-unless-stopped}
    networks:
      - default
      - inner

  postgres:
    image: sameersbn/postgresql:10-2
    environment:
      DB_USER: olki
      DB_PASS: olki
      DB_NAME: olki
      DB_EXTENSION: 'unaccent,pg_trgm'
    volumes:
      - postgres:/var/lib/postgresql/data
    restart: ${RESTART_POLICY-unless-stopped}
    networks:
      - inner

  redis:
    image: redis:5-alpine
    volumes:
      - redis:/data
    restart: ${RESTART_POLICY-unless-stopped}
    networks:
      - inner

  postfix:
    image: mwader/postfix-relay
    environment:
      - POSTFIX_myhostname=${OLKI_WEBSERVER_HOST}
    labels:
      traefik.enable: "false"
    restart: ${RESTART_POLICY-unless-stopped}
    networks:
      - inner

networks:
  default:
  inner:

volumes:
  media:
  postgres:
  redis:
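Since YAML is indentation-sensitive, it is worth letting docker-compose validate and echo the resolved configuration before launching anything:
docker-compose -f docker-compose.yml config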
- launch it: docker-compose -f docker-compose.yml up
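Once the containers are up, you can check their state and verify that the application answers locally (the exact HTTP status may vary, but you should not get a connection error):
docker-compose -f docker-compose.yml ps
curl -I http://127.0.0.1:5000/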
- get and configure an internet domain that points to your machine, for instance: olki.cerisara.fr
- configure your Apache web server as a reverse proxy; here is an example (the Apache modules it relies on are listed just after):
<VirtualHost *:80>
    ServerName olki.cerisara.fr
    Redirect permanent / https://olki.cerisara.fr/
</VirtualHost>

<VirtualHost *:443>
    ServerName olki.cerisara.fr

    RewriteEngine On
    RewriteCond %{HTTP:Upgrade} =websocket [NC]
    RewriteRule /(.*) ws://localhost:5000/$1 [P,L]

    ProxyPass / http://localhost:5000/
    ProxyPassReverse / http://localhost:5000/
    ProxyRequests Off
</VirtualHost>
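This configuration relies on the rewrite, proxy and WebSocket proxy modules, which are not all enabled by default on a Debian/Ubuntu Apache. A sketch of the commands, assuming you saved the VirtualHosts above as olki.conf (the file name is just an example):
sudo a2enmod ssl rewrite proxy proxy_http proxy_wstunnel
sudo a2ensite olki.conf
sudo systemctl reload apache2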
- configure your https certificate with certbot: https://certbot.eff.org/lets-encrypt/
- certbot should automatically update your apache config file
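With the Apache plugin, this typically boils down to a single command (adapt the domain name, of course):
sudo certbot --apache -d olki.cerisara.fr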
- create a superuser: docker-compose -f docker-compose.yml run olki manage createsuperuser --email "superuser@example.org" --password superpass --username superuser
Useful commands
- list users: docker-compose -f docker-compose.yml run olki manage user list
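A couple of generic docker-compose commands are also handy here (nothing OLKi-specific):
- follow the application logs: docker-compose -f docker-compose.yml logs -f olki
- restart the whole stack: docker-compose -f docker-compose.yml restart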
How to develop the backend
This procedure has been tested on Ubuntu 20.04. Python dependencies are managed with "poetry".
- clone the olki repo
- install some packages:
sudo apt install python3-dev postgresql libpq-dev
- create a virtualenv with python3.8 and install poetry in it:
virtualenv -p python3 env
source env/bin/activate
pip install setuptools==44
pip install poetry
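The backend dependencies themselves are declared in pyproject.toml and handled by poetry; depending on the Makefile targets they may be installed for you, but it can also be done explicitly from the repository root:
poetry install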
- create a PostgreSQL olki user and database (answer "olki" as the role name when prompted):
sudo su - postgres
createuser --interactive --pwprompt
createdb -O olki olki
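If you prefer a non-interactive setup, the same can be done through psql (the 'olki' password below is only an example; use your own and report it in .env.default):
sudo -u postgres psql -c "CREATE USER olki WITH PASSWORD 'olki';"
sudo -u postgres psql -c "CREATE DATABASE olki OWNER olki;"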
- edit .env.default (e.g., the PostgreSQL olki password...)
- WARNING: the DB password is hard-coded in olki_back/olki/settings/__init__.py (I don't know why): you must comment out that line to use your own password!
- Alternatively, create the DB as stated in https://framagit.org/synalp/olki/olki/-/wikis/Development-documentation
- Migrate the DB with "python manage.py migrate"
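If you need an admin account on this local instance, the standard Django management command should work and prompts interactively for the credentials:
python manage.py createsuperuser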
- launch the server:
make serve-dev-be
Towards MLaaS
For now, the OLKi platform can store corpora in a federated way, period. These features (storage and federation) can serve as a basis for various applications. The first one I'd like to implement is to build, on top of an OLKi network, a platform for Machine Learning as a Service, i.e., a platform that provides scripts, datasets, models and the associated computing power to partner scientists.
I think this is very important to push forward deep learning research in Europe. Indeed, although many state-of-the-art deep learning models and codebases are distributed freely, this is not enough, because all training and inference scripts have to be adapted to each particular computing center, and this adaptation is far from easy! So a new researcher who would like to start experimenting with deep learning faces two main difficulties:
- Finding the appropriate models and code: every state-of-the-art model is implemented and distributed in several flavors, by different companies, and with various recipes. Depending on what we precisely want to do, and on the computing facilities available, we have to choose the most adequate version. It is very hard to know in advance which one fits our context best, and trying several of them takes a lot of time (see next point). OLKi will be of great help here, thanks to its federated nature: once a researcher has managed to set up the best recipe for a site or community, every researcher from this community can simply reuse it, with a single click. Furthermore, if the recipe has not yet been developed for a given site or community, it may well be available in a nearby community in the OLKi network, and the adaptation effort will be minimal.
- Adapting the recipe to a given site: every site has access to specific computing resources, middleware, network capabilities and software. The effort needed to adapt a deep learning recipe is huge, and often requires many iterations before state-of-the-art results can be reproduced.
Both of these difficulties constitute the main bottleneck holding back deep learning research in academia and small companies, and they must be addressed by sharing knowledge in a federated way.
- Some of you might argue that in France there is Jean Zay, so why not put all of the scripts there for everybody? Jean Zay is great, and it can (will) be integrated as a computing resource in OLKi. But there are still many specificities that are site- and community-dependent, and a federation is also needed.
- Others might argue that these issues tend nowadays to be solved in a centralized way by Google, which gives away "free" computing resources on the Google Colab platform: this is true, and more and more "recipes" are actually developed for this platform. But there are very serious drawbacks to this approach: first, when you start working on serious deep learning models, the "free" GPUs are not enough and you have to pay to get more (it's a famous business model: once you're captive, you'd rather pay than restart from scratch elsewhere). Second, going in this direction will result in Google controlling all of the research done by European researchers: is that really what we want?
There is another way to share generic deep learning expertise globally and specific expertise within our communities: OLKi.
I have started to develop OLKi-ML clients, in both CLI and GUI versions, to experiment with this MLaaS vision. If you're interested, please drop me a line at @admin@olkichat.duckdns.org