Document production hosting (#2661)

Paul Melnikow
2019-01-07 20:55:49 -05:00
committed by GitHub
parent dcbe1cf906
commit 1e267f891d
3 changed files with 246 additions and 28 deletions


@@ -1,37 +1,245 @@
# Production hosting
In production, a deploy commit checks in two config files:
[![operations issues](https://img.shields.io/github/issues/badges/shields/operations.svg?label=open%20operations%20issues)][operations issues]
- `.env`
- `config/local-shields-io-production.yml`
[#ops chat room][ops discord]
The `.env` file sets `NODE_CONFIG_ENV` which bootstraps the configuration process. The rest of the configuration is loaded from three sources:
[operations issues]: https://github.com/badges/shields/issues?q=is%3Aissue+is%3Aopen+label%3Aoperations
[ops discord]: https://discordapp.com/channels/308323056592486420/480747695879749633
- `config/local-shields-io-production.yml` (secrets)
- [`config/shields-io-production.yml`](../config/shields-io-production.yml) (non-secrets)
- [`config/default.yml`](../config/default.yml)
| Component | Subcomponent | People with access |
| -------------- | --------------- | ------------------------------------------------------------------------------------------ |
| Badge servers | Account owner | @espadrine |
| Badge servers | ssh, logs | @espadrine |
| Badge servers | Deployment | @espadrine, @paulmelnikow |
| Badge servers | Admin endpoints | @espadrine, @paulmelnikow |
| Cloudflare | Account owner | @espadrine |
| Cloudflare | Admin access | @espadrine, @paulmelnikow |
| GitHub | OAuth app | @espadrine ([could be transferred to the badges org][oauth transfer]) |
| DNS | Account owner | @olivierlacan |
| Sentry | Error reports | @espadrine, @paulmelnikow |
| Frontend | Deployment | Technically anyone with push access but in practice must be deployed with the badge server |
| Metrics server | Owner | @platan |
| UptimeRobot | Account owner | @paulmelnikow |
| More metrics | Owner | @RedSparr0w |
These settings are currently set in `config/local-shields-io-production.yml`:
There are [too many bottlenecks][issue 2577]!
- bintray_apikey
- bintray_user
- gh_client_id
- gh_client_secret
- gh_oauth_state
- libraries_io_api_key
- sentry_dsn
- shields_secret
- sl_insight_apiToken
- sl_insight_userUuid
- wheelmap_token
## Badge servers
## Main Server Sysadmin
There are three public badge servers on OVH VPSs.
- Servers in DNS round-robin:
- s0.shields-server.com: 192.99.59.72 (vps71670.vps.ovh.ca)
- s1.shields-server.com: 51.254.114.150 (vps244529.ovh.net)
- s2.shields-server.com: 149.56.96.133 (vps117870.vps.ovh.ca)
- Self-signed TLS certificates, but `img.shields.io` is behind CloudFlare, which provides signed certificates.
- Using systemd to automatically restart the server when it crashes.
| Cname | Hostname | Type | IP | Location |
| --------------------------- | -------------------- | ---- | -------------- | ------------------ |
| [s0.shields-server.com][s0] | vps71670.vps.ovh.ca | VPS | 192.99.59.72 | Quebec, Canada |
| [s1.shields-server.com][s1] | vps244529.ovh.net | VPS | 51.254.114.150 | Gravelines, France |
| [s2.shields-server.com][s2] | vps117870.vps.ovh.ca | VPS | 149.56.96.133 | Quebec, Canada |
See https://github.com/badges/ServerScript for helper admin scripts.
- These are single-core virtual hosts with 2 GB RAM ([VPS SSD 1][]).
- The Node version (v9.4.0 at time of writing) and dependency versions on the
servers can be inspected in Sentry, but only when an error occurs.
- The servers use self-signed SSL certificates. ([#1460][issue 1460])
- After accepting the certificate, you can debug an individual server using
the links above.
- The scripts that start the server live in the [ServerScript][] repo. However,
updates must be pulled manually. They are not updated as part of the deploy process.
- The server runs SSH.
- Deploys are made using a git post-receive hook.
- The server uses systemd to automatically restart the server when it crashes.
- Provisioning additional servers is a manual process which has yet to be
documented.
[s0]: https://s0.shields-server.com/index.html
[s1]: https://s1.shields-server.com/index.html
[s2]: https://s2.shields-server.com/index.html
[vps ssd 1]: https://www.ovh.com/world/vps/vps-ssd.xml
[issue 1460]: https://github.com/badges/shields/issues/1460
[serverscript]: https://github.com/badges/ServerScript
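
Because the certificates are self-signed, a command-line check of an individual server has to skip certificate verification. The following is a minimal sketch, assuming `curl` is installed and that `/index.html` (the path used in the links above) is a reasonable page to probe. The helper names `server_url` and `check_servers` are hypothetical and are not part of ServerScript.

```sh
#!/bin/sh
# Build the debug URL for one badge server (s0, s1 or s2).
server_url() {
  printf 'https://%s.shields-server.com/index.html\n' "$1"
}

# Print the HTTP status code returned by each server.
# --insecure skips certificate verification, which would otherwise
# fail against the self-signed certificates.
check_servers() {
  for name in s0 s1 s2; do
    code=$(curl --insecure --silent --output /dev/null \
      --max-time 10 --write-out '%{http_code}' "$(server_url "$name")")
    printf '%s %s\n' "$name" "$code"
  done
}

# Usage (needs network access):
#   check_servers
```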
## Attached state
Shields has mercifully little persistent state:
1. The GitHub tokens we collect are saved on each server in JSON files on disk.
They can be fetched from the [GitHub auth admin endpoint][] for debugging.
2. The analytics data is also saved on each server in JSON files on disk.
3. The server keeps a few caches in memory. These are neither persisted nor
inspectable.
- The [request cache][]
- The [regular-update cache][]
- The [raster cache][]
[github auth admin endpoint]: https://github.com/badges/shields/blob/master/services/github/auth/admin.js
[request cache]: https://github.com/badges/shields/blob/master/lib/request-handler.js#L29-L30
[regular-update cache]: https://github.com/badges/shields/blob/master/lib/regular-update.js
[raster cache]: https://github.com/badges/shields/blob/master/gh-badges/lib/svg-to-img.js#L9-L10
[oauth transfer]: https://developer.github.com/apps/managing-oauth-apps/transferring-ownership-of-an-oauth-app/
## Configuration
To bootstrap the configuration process,
[the script that starts the server][start-shields.sh] sets a single
environment variable:
```
NODE_CONFIG_ENV=shields-io-production
```
With that variable set, the server ([using `config`][config]) reads these
files:
- [`local-shields-io-production.yml`][local-shields-io-production.yml].
This file contains secrets which are checked in with a deploy commit.
- [`shields-io-production.yml`][shields-io-production.yml]. This file
contains non-secrets which are checked in to the main repo.
- [`default.yml`][default.yml]. This file contains defaults.
[start-shields.sh]: https://github.com/badges/ServerScript/blob/master/start-shields.sh#L7
[config]: https://github.com/lorenwest/node-config/wiki/Configuration-Files
[local-shields-io-production.yml]: ../config/local-shields-io-production.example.yml
[shields-io-production.yml]: ../config/shields-io-production.yml
[default.yml]: ../config/default.yml
The project ships with `dotenv`; however, there is no `.env` in production.
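
To make the precedence concrete, here is a small sketch that prints the three files named above in the order `config` merges them, lowest precedence first, so values in later files override earlier ones. The function name `config_files` is hypothetical. `config` also looks for other files (for example `local.yml`), which are left out because this document does not mention them.

```sh
#!/bin/sh
# Print the config files for a deployment name, lowest precedence first.
# Falls back to $NODE_CONFIG_ENV when no argument is given.
config_files() {
  env_name="${1:-${NODE_CONFIG_ENV:-}}"
  printf 'config/default.yml\n'            # defaults
  printf 'config/%s.yml\n' "$env_name"      # non-secrets, in the main repo
  printf 'config/local-%s.yml\n' "$env_name" # secrets, from the deploy commit
}

# Usage:
#   config_files shields-io-production
```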
## Badge CDN
Sitting in front of the three servers is a Cloudflare Free account which
provides several services:
- Global CDN, caching, and SSL gateway for `img.shields.io`
- Analytics through the Cloudflare dashboard
- DNS hosting for `shields.io`
Cloudflare is configured to respect the servers' cache headers.
## Frontend
The frontend is served by [GitHub Pages][] via the [gh-pages branch][gh-pages]. SSL is enforced.
`shields.io` resolves to the GitHub Pages hosts. It is not proxied through
Cloudflare.
Technically any maintainer can push to `gh-pages`, but in practice the frontend must be deployed
with the badge server via the deployment process described below.
[github pages]: https://pages.github.com/
[gh-pages]: https://github.com/badges/shields/tree/gh-pages
## Deployment
To set things up for deployment:
1. Get your SSH key added to the server.
2. Clone a fresh copy of the repository, dedicated for deployment.
(Not required, but recommended; it also lets you use `npm ci` below.)
3. Add remotes:
```sh
git remote add s0 root@s0.shields-server.com:/home/m/shields.git
git remote add s1 root@s1.shields-server.com:/home/m/shields.git
git remote add s2 root@s2.shields-server.com:/home/m/shields.git
```
`origin` should point to GitHub as usual.
4. Since the deploy uses `git worktree`, make sure you have git 2.5 or later.
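
The version requirement in step 4 can be checked before the first deploy. This is a sketch only: the helper names `version_at_least` and `installed_git_version` are hypothetical, and it assumes `git --version` prints the version as its third field (for example `git version 2.20.1`).

```sh
#!/bin/sh
# Succeed when version $1 (e.g. "2.20.1") is at least major $2, minor $3.
version_at_least() {
  major=${1%%.*}
  rest=${1#*.}
  minor=${rest%%.*}
  [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}

# Extract the version number from the output of `git --version`.
installed_git_version() {
  git --version | awk '{ print $3 }'
}

# Usage:
#   version_at_least "$(installed_git_version)" 2 5 ||
#     echo "git is too old for git worktree" >&2
```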
To deploy:
1. Use `git fetch` to obtain a current copy of
`local-shields-io-production.yml` from the server (or obtain the current
version of that file some other way). Save it in `config/`.
2. Check out the commit you want to deploy.
3. Run `npm ci`. **This is super important for the frontend build!**
4. Run `make deploy-s0` to make a canary deploy.
5. Check the canary deploy:
- [Visit the server][s0]. Don't forget that most of the preview badges
are static!
- Look for errors in [Sentry][].
- Keep an eye on the [status page][status].
6. After a little while (usually 10 to 60 minutes), finish the deploy:
`make push-s1 push-s2 deploy-gh-pages`.
To roll back, check out the commit you want to roll back to and repeat those
steps.
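
The command-line part of the steps above can be sketched as two shell functions with a dry-run switch, so the sequence can be rehearsed without touching a server. The names `run`, `canary_deploy`, `finish_deploy`, and the `DRY_RUN` variable are hypothetical. Fetching `local-shields-io-production.yml` and checking the canary remain manual.

```sh
#!/bin/sh
# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Steps 2 to 4: check out the commit, install dependencies, deploy to s0.
# Stops at the first command that fails.
canary_deploy() {
  run git checkout "$1" &&
  run npm ci &&
  run make deploy-s0
}

# Step 6: once the canary looks healthy, deploy to s1, s2 and gh-pages.
finish_deploy() {
  run make push-s1 push-s2 deploy-gh-pages
}

# Usage:
#   DRY_RUN=1; canary_deploy <commit>; finish_deploy
```

A rollback uses the same two functions with the earlier commit as the argument.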
To see which commit is deployed to a server, run `git ls-remote` and then
`git log` on the `HEAD` ref. There will be two deploy commits preceded by the
commit which was deployed.
Be careful not to push the deploy commits to GitHub.
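
A sketch of that lookup, assuming the remotes are set up as described above and that the remote's `HEAD` is the newer of the two deploy commits, so the deployed commit sits two commits behind it. The function name `deployed_commit` is hypothetical. It fetches into `FETCH_HEAD` only, so no local branch is created that could be pushed to GitHub by accident.

```sh
#!/bin/sh
# Print the hash of the commit that was deployed to a remote such as s0.
# Assumption: the remote HEAD is the second of two deploy commits.
deployed_commit() {
  git fetch --quiet "$1" HEAD || return 1
  git rev-parse --verify --quiet 'FETCH_HEAD~2'
}

# Usage (inside the deployment clone):
#   git ls-remote s0 HEAD        # hash of the latest deploy commit
#   deployed_commit s0           # hash of the commit that was deployed
#   git log -3 --oneline FETCH_HEAD
```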
`make deploy-s0` does the following:
1. Creates a working tree in `/tmp`.
2. In that tree, runs `features` and `examples` to generate data files
needed for the frontend.
3. Builds and checks in the built frontend.
4. Checks in `local-shields-io-production.yml`.
5. Pushes to s0, which updates dependencies and then restarts itself.
`make push-s1 push-s2 deploy-gh-pages` does the following:
1. Pushes the same working tree to s1 and s2.
2. Creates a new working tree for the frontend.
3. Adds a commit cleaning out the index.
4. Adds another commit with the built frontend.
5. Pushes to `gh-pages`.
## DNS
I'm not sure where the DNS is registered.
## Logs
Logs are available on the individual servers via SSH.
## Error reporting
[Error reporting][sentry] is one of the most useful tools we have for monitoring
the server. It's generously donated by [Sentry][sentry home]. We bundle
[`raven`][raven] into the application, and the Sentry DSN is configured via
`local-shields-io-production.yml` (see [documentation][sentry configuration]).
[sentry]: https://sentry.io/shields/
[raven]: https://www.npmjs.com/package/raven
[sentry home]: https://sentry.io/shields/
[sentry configuration]: https://github.com/badges/shields/blob/master/doc/self-hosting.md#sentry
## Monitoring
Request performance is monitored in three places:
- [Status][] (using [UptimeRobot][])
- [Server metrics][] using Prometheus and Grafana
- [@RedSparr0w's monitor][monitor] which posts [notifications][] to a private
[#monitor chat room][monitor discord]
Overall server performance is monitored using Prometheus and Grafana.
Coming soon! ([#2068][issue 2068])
[status]: https://status.shields.io/
[server metrics]: https://metrics.shields.io/
[uptimerobot]: https://uptimerobot.com/
[monitor]: https://shields.redsparr0w.com/1568/
[notifications]: http://shields.redsparr0w.com/discord_notification
[monitor discord]: https://discordapp.com/channels/308323056592486420/470700909182320646
[issue 2068]: https://github.com/badges/shields/issues/2068
## Analytics
The server analytics data is public and can be fetched from the
[analytics endpoint][] or using the [analytics script][].
[analytics endpoint]: https://github.com/badges/shields/blob/master/lib/analytics.js
[analytics script]: https://github.com/badges/ServerScript/blob/master/stats.js
## Known limitations
1. The only way to inspect the commit on the server is with `git ls-remote`.
2. The production deploy installs `devDependencies`. It does not honor
`package-lock.json`. ([#1988][issue 1988])
[issue 2577]: https://github.com/badges/shields/issues/2577
[issue 1988]: https://github.com/badges/shields/issues/1988