| 1 | +# Current infrastructure for metacpan-api / cpan.metacpan.org |
| 2 | + |
| 3 | +## Backpan - files on disk |
| 4 | + |
| 5 | +All servers have a full copy of MetaCPAN's BackPAN (a full copy of |
| 6 | +all files, even those marked as deleted in `CPAN`). This means users |
| 7 | +can always get hold of a file. |
| 8 | + |
| 9 | +The BackPAN is currently 102 GB and is not growing fast. |
| 10 | + |
| 11 | +Each server adds to its BackPAN by pulling new updates from `cpan-rsync.perl.org`. |
| 12 | +We have a [Puppet setup for rrrclient](https://github.com/metacpan/metacpan-puppet/tree/master/modules/rrrclient), which is initiated from [metacpan::rrrclient](https://github.com/metacpan/metacpan-puppet/blob/master/modules/metacpan/manifests/rrrclient.pp). |
| 13 | + |
| 14 | +### :question: What processes happen? |
| 15 | + |
| 16 | +- Run [File::Rsync::Mirror::Recent](https://metacpan.org/pod/File::Rsync::Mirror::Recent) as service, monitoring `cpan-rsync.perl.org::CPAN/RECENT.recent` |
| 17 | +- nightly restart of the service (as it sometimes glitches) |
| 18 | +- nightly _standard_ rsync of `cpan-rsync.perl.org` to ensure no files are missed |
| 19 | +- About once a year there is a reason to remove a file manually; this currently has to happen on all servers |
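The nightly catch-up pass can be sketched as a plain rsync invocation. This is a minimal sketch rather than the actual job: the destination path `/srv/backpan` and the function name are hypothetical, and the choice of `-a` is an assumption. The one point taken from the text is that nothing is ever deleted locally, so `--delete` is left out.

```python
from typing import List

# Source module taken from the text above; everything else here is illustrative.
SOURCE = "cpan-rsync.perl.org::CPAN/"


def nightly_rsync_command(dest: str, source: str = SOURCE) -> List[str]:
    """Build the argv for a plain, full rsync pass into the local BackPAN copy."""
    if not dest.endswith("/"):
        dest += "/"
    # -a copies the tree recursively and preserves file attributes.
    # --delete is deliberately absent: files removed upstream must stay in the BackPAN.
    return ["rsync", "-a", source, dest]


if __name__ == "__main__":
    # /srv/backpan is a made-up destination used only for this example.
    print(" ".join(nightly_rsync_command("/srv/backpan")))
```

Because the command never deletes, the rare manual removal mentioned above still has to be done by hand on every server.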
| 20 | + |
| 21 | +#### Dependencies |
| 22 | + |
| 23 | +```mermaid |
| 24 | +graph TD; |
| 25 | + backpan <--> cpan.metacpan.org; |
| 26 | + backpan <--> fastapi.metacpan.org <--> metacpan.org; |
| 27 | +
|
| 28 | +``` |
| 29 | + |
| 30 | +## cpan.metacpan.org (aka backpan.metacpan.org) |
| 31 | + |
| 32 | +This site serves the static files from our `Backpan`. Fastly CDN currently points to 2 ByteMark and 2 Liquid Web servers. |
| 33 | + |
| 34 | +### :exclamation: Fastly stats (for 30 days) |
| 35 | +- Fastly delivered 4.7 TB |
| 36 | +- 36% was cached (seems low, but we don't ask fastly to cache everything...) |
| 37 | +- 44.5 million requests |
| 38 | +- 8% (3.7m) 4xx errors (mostly 404) |
| 39 | + |
| 40 | +### :question: What processes happen |
| 41 | + |
| 42 | +- nginx (on all `Backpan` servers) serves the `Backpan` files as per the [setup](https://github.com/metacpan/metacpan-puppet/blob/master/hieradata/common.yaml#L143-L153); note the special [headers](https://github.com/metacpan/metacpan-puppet/blob/master/modules/metacpan/templates/web/metacpan-cpan-static/fastly.erb) we set for Fastly |
| 43 | +- nginx is set to `autoindex` directories, [e.g.](https://cpan.metacpan.org/authors/id/L/LL/LLAP/) |
| 44 | + |
| 45 | + |
| 46 | + |
| 47 | +## fastapi.metacpan.org (aka api-v1) |
| 48 | + |
| 49 | +Current API... |
| 50 | + |
| 51 | +### :exclamation: Fastly stats (for 30 days) |
| 52 | +- Fastly delivered 3.43 TB |
| 53 | +- 50% served from cache |
| 54 | +- 312 million requests |
| 55 | +- 224 million of those are Fastly `passes` |
| 56 | + |
| 57 | +### :question: What processes happen |
| 58 | + |
| 59 | +- Uses ElasticSearch for searching |
| 60 | +- It can [unarchive](https://github.com/metacpan/metacpan-api/blob/master/lib/MetaCPAN/Model/Archive.pm) files from the `backpan`, either on a `ram disk` (for speed) or `tmp` dir |
| 61 | +- The `tmp` dir is around 350 GB (we provision 430 GB) |
| 62 | +- The `tmp` dir is purged nightly for anything over 30 days old |
| 63 | +- There are many [cron jobs](https://github.com/metacpan/metacpan-puppet/blob/master/hieradata/nodes/bm-mc-02.yaml#L29) using the [scripts](https://github.com/metacpan/metacpan-api/tree/master/lib/MetaCPAN/Script) |
| 64 | +- There is a [Minion Queue](https://docs.mojolicious.org/Minion) backed by a Postgres database, so that indexing can be distributed across servers. |
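The nightly purge of the `tmp` dir described in the list above could look roughly like the following. It is a sketch under stated assumptions, not the job that actually runs: the function name is hypothetical, and "over 30 days old" is interpreted here as a modification time more than 30 days in the past.

```python
import os
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86400


def purge_old_files(root: str, max_age_days: int = 30,
                    now: Optional[float] = None) -> List[str]:
    """Delete regular files under root whose mtime is older than max_age_days.

    Returns the sorted list of removed paths. Directories are left in place.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    removed = []
    # Materialise the listing first so deletions do not disturb the walk.
    for path in list(Path(root).rglob("*")):
        if path.is_symlink() or not path.is_file():
            continue
        if path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(str(path))
    return sorted(removed)
```

Passing `now` explicitly keeps the function easy to test; a cron job would simply call `purge_old_files("/path/to/tmp")` with the real directory.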
| 65 | + |
| 66 | +## Metacpan.org (front end) |
| 67 | + |
| 68 | +### :exclamation: Fastly stats (for 30 days) |
| 69 | +- Fastly delivered 1.23 TB |
| 70 | +- 42% served from cache |
| 71 | +- 91 million requests |
| 72 | + |
| 73 | +### :question: What processes happen |
| 74 | + |
| 75 | +- uses `fastapi.metacpan.org` to search |
| 76 | + |
| 77 | +# Other sites |
| 78 | + |
| 79 | +## ElasticSearch |
| 80 | + |
| 81 | +We are moving to the version 2.4 cluster that ElasticSearch are hosting for us |
| 82 | +(a 3x32GB cluster). They have also supplied us with a 15G cluster |
| 83 | +on version 8 for our development environment as we refactor to that version. |
| 84 | + |
| 85 | +This will replace the cluster(s) we were running. |
| 86 | + |
| 87 | +## sco redirects... mcpan.org (search.cpan.org, sco.metacpan.org) |
| 88 | + |
| 89 | +- [puppet just starman](https://github.com/metacpan/metacpan-puppet/blob/master/hieradata/env/production.yaml#L61) |
| 90 | +- [code](https://github.com/metacpan/sco-redirect) |
| 91 | + |
| 92 | +## st.aticpan.org |
| 93 | + |
| 94 | +Uses `fastapi.metacpan.org` as its backend (see above), but with special headers so script files are served |
| 95 | +as `text/plain` and so we can safely render content extracted from distributions. |
| 96 | + |
| 97 | +### Fastly stats (for 30 days) |
| 98 | +- Fastly delivered 567 MB |
| 99 | +- Only 1% cached |
| 100 | + |
| 101 | +## api-v0-shim.metacpan.org (aka api.metacpan.org v0) |
| 102 | + |
| 103 | +Fastly rejects most requests with a 410, but if the user agent |
| 104 | +is cpanminus then it allows traffic to this domain, which |
| 105 | +shims the request into v1 syntax and returns the result. |
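The gating rule can be illustrated in a few lines. The real rule lives in the Fastly configuration, which is not shown here, so this is only a sketch: the function names are hypothetical, and matching on a case-insensitive `cpanminus` substring is an assumption about how the user agent is recognised.

```python
from typing import Optional

HTTP_GONE = 410


def allows_v0_shim(user_agent: Optional[str]) -> bool:
    """True when the request looks like it comes from cpanminus (assumed match rule)."""
    return "cpanminus" in (user_agent or "").lower()


def edge_response(user_agent: Optional[str]) -> str:
    """Describe what the edge would do with a v0 API request."""
    if allows_v0_shim(user_agent):
        return "forward to api-v0-shim"
    return f"reject with {HTTP_GONE}"


if __name__ == "__main__":
    for ua in ("cpanminus/1.7044", "curl/8.0", None):
        print(repr(ua), "->", edge_response(ua))
```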
| 106 | + |
| 107 | + |
| 108 | +# Data |
| 109 | + |
| 110 | +``` |
| 111 | +cpan.metacpan.org is doing ~5TB pm 36% from cache |
| 112 | +fastapi.metacpan.org is doing ~3.5TB pm 50% from cache |
| 113 | +metacpan.org is doing ~1.3TB pm 40% from cache |
| 114 | + |
| 115 | +So we'll be serving ~6TB pm from our origin |
| 116 | +``` |
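The ~6TB origin estimate follows from the rounded figures in the block above, as this small calculation shows. It assumes the cache percentages apply to bytes delivered rather than to request counts, which the source does not state; the names are illustrative.

```python
# (site, TB per month delivered by Fastly, fraction served from cache)
# Rounded figures copied from the summary block above.
TRAFFIC = [
    ("cpan.metacpan.org", 5.0, 0.36),
    ("fastapi.metacpan.org", 3.5, 0.50),
    ("metacpan.org", 1.3, 0.40),
]


def origin_tb(delivered_tb: float, cached_fraction: float) -> float:
    """Traffic that misses the cache and therefore reaches the origin."""
    return delivered_tb * (1.0 - cached_fraction)


def total_origin_tb(rows=TRAFFIC) -> float:
    return sum(origin_tb(tb, cached) for _, tb, cached in rows)


if __name__ == "__main__":
    for site, tb, cached in TRAFFIC:
        print(f"{site}: {origin_tb(tb, cached):.2f} TB from origin")
    print(f"total: {total_origin_tb():.2f} TB")
```

The per-site contributions are 3.20, 1.75 and 0.78 TB, for a total of 5.73 TB per month, which rounds to the ~6TB quoted.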