
Commit 38f5c9f

Merge pull request #92 from metacpan/leo/api-history
Some docs on current requirements
2 parents cd498be + 0df1f83 commit 38f5c9f


1 file changed (+116, -0 lines)


apps/api/pre-k8s.md

# Current infrastructure for metacpan-api / cpan.metacpan.org

## Backpan - files on disk

All servers have a full copy of MetaCPAN's BackPAN (a full copy of
all files, including those marked as deleted in `CPAN`). This means users
can always get hold of a file.

It is currently 102 GB, and it is not growing fast.

Each server adds new files to its BackPAN by fetching updates from `cpan-rsync.perl.org`.
This is set up by our [rrrclient Puppet module](https://github.com/metacpan/metacpan-puppet/tree/master/modules/rrrclient), which is initiated from [metacpan::rrrclient](https://github.com/metacpan/metacpan-puppet/blob/master/modules/metacpan/manifests/rrrclient.pp).

### :question: What processes happen?

- Run [File::Rsync::Mirror::Recent](https://metacpan.org/pod/File::Rsync::Mirror::Recent) as a service, monitoring `cpan-rsync.perl.org::CPAN/RECENT.recent`
- Restart that service nightly (it sometimes glitches)
- Run a nightly _standard_ rsync of `cpan-rsync.perl.org` to ensure no files are missed
- About once a year a file has to be removed manually; this currently has to be done on every server

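The nightly _standard_ rsync could look roughly like the sketch below. It is a minimal Python illustration, not the Puppet-managed job itself: the local path `/var/backpan/`, the `--partial` and `--timeout` options, and the function names are assumptions. The property that matters is that `--delete` is never passed, because files removed from CPAN must stay in the BackPAN.

```python
import shlex
import subprocess
from typing import List

# Source module from the docs; the local destination is a hypothetical path.
RSYNC_SOURCE = "cpan-rsync.perl.org::CPAN/"
BACKPAN_DIR = "/var/backpan/"


def build_rsync_command(source: str, dest: str, dry_run: bool = False) -> List[str]:
    """Build an rsync command that only adds or updates files.

    --delete is deliberately left out: files deleted from CPAN must
    remain in the BackPAN copy.
    """
    # --timeout value is an assumption, chosen so a stuck transfer gives up.
    cmd = ["rsync", "--archive", "--partial", "--timeout=300"]
    if dry_run:
        cmd.append("--dry-run")  # report what would change without copying
    cmd += [source, dest]
    return cmd


def nightly_sync(dry_run: bool = False) -> int:
    """Run the catch-up rsync and return its exit status."""
    cmd = build_rsync_command(RSYNC_SOURCE, BACKPAN_DIR, dry_run)
    print("running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Only print the command here; call nightly_sync() on a real server.
    print(shlex.join(build_rsync_command(RSYNC_SOURCE, BACKPAN_DIR, dry_run=True)))
```

Since File::Rsync::Mirror::Recent already picks up most changes as they happen, this pass is a safety net for anything it missed.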
#### Dependencies

```mermaid
graph TD;
backpan <--> cpan.metacpan.org;
backpan <--> fastapi.metacpan.org <--> metacpan.org;
```

## cpan.metacpan.org (aka backpan.metacpan.org)

This site serves the static files from our `Backpan`. The Fastly CDN currently points to 2 ByteMark and 2 Liquid Web servers.

### :exclamation: Fastly stats (for 30 days)
- Fastly delivered 4.7 TB
- 36% was served from cache (this seems low, but we don't ask Fastly to cache everything)
- 44.5 million requests
- 3.7 million (8%) returned 4xx errors, mostly 404s

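A quick sanity check on these numbers, as a small Python sketch. It assumes decimal units (1 TB = 10^12 bytes) and, like the totals under Data, treats the cache share as a share of bytes; Fastly may instead be reporting it per request. The function names are illustrative only.

```python
# Fastly 30-day figures for cpan.metacpan.org, copied from the list above.
DELIVERED_BYTES = 4.7e12  # 4.7 TB, assuming decimal units
REQUESTS = 44.5e6
ERRORS_4XX = 3.7e6
CACHE_SHARE = 0.36  # assumed to be a share of bytes, as in the Data notes


def avg_response_kb(delivered_bytes: float, requests: float) -> float:
    """Mean response size in kilobytes (1 KB = 1000 bytes)."""
    return delivered_bytes / requests / 1e3


def error_share(errors: float, requests: float) -> float:
    """Fraction of requests that ended in a 4xx status."""
    return errors / requests


def origin_tb(delivered_bytes: float, cache_share: float) -> float:
    """Estimated terabytes that Fastly had to fetch from our servers."""
    return delivered_bytes * (1 - cache_share) / 1e12


if __name__ == "__main__":
    print(f"average response: {avg_response_kb(DELIVERED_BYTES, REQUESTS):.0f} KB")
    print(f"4xx share: {error_share(ERRORS_4XX, REQUESTS):.1%}")
    print(f"fetched from origin: {origin_tb(DELIVERED_BYTES, CACHE_SHARE):.2f} TB")
```

Run as a script, it prints an average response of about 106 KB, a 4xx share of 8.3% (the 8% above, rounded), and roughly 3.01 TB fetched from origin.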
### :question: What processes happen

- nginx (on all `Backpan` servers) serves the `Backpan` files as per the [setup](https://github.com/metacpan/metacpan-puppet/blob/master/hieradata/common.yaml#L143-L153); note the special [headers](https://github.com/metacpan/metacpan-puppet/blob/master/modules/metacpan/templates/web/metacpan-cpan-static/fastly.erb) we set for Fastly
- nginx is set to `autoindex` directories ([example](https://cpan.metacpan.org/authors/id/L/LL/LLAP/))

## fastapi.metacpan.org (aka api-v1)

Current API...

### :exclamation: Fastly stats (for 30 days)
- Fastly delivered 3.43 TB
- 50% served from cache
- 312 million requests
- 224 million of those requests were Fastly `pass`es (sent to origin, bypassing the cache)

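Because a Fastly `pass` always goes to origin, the pass count bounds how many requests could have been cache hits at all. A small Python sketch using the figures above (function names are illustrative):

```python
# Fastly 30-day request figures for fastapi.metacpan.org, from the list above.
REQUESTS = 312e6
PASSES = 224e6


def pass_share(passes: float, requests: float) -> float:
    """Fraction of requests that bypassed the cache entirely."""
    return passes / requests


def cacheable_requests(passes: float, requests: float) -> float:
    """Requests that were eligible for a cache lookup at all."""
    return requests - passes


if __name__ == "__main__":
    print(f"passes: {pass_share(PASSES, REQUESTS):.1%} of requests")
    print(f"eligible for a cache hit: {cacheable_requests(PASSES, REQUESTS) / 1e6:.0f} million")
```

This prints 71.8% and 88 million: at most about 28% of all requests could have been cache hits, so the 50% figure is probably not a share of all requests (it may be by bytes, or hits over cacheable requests).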
### :question: What processes happen

- Uses ElasticSearch for searching
- It can [unarchive](https://github.com/metacpan/metacpan-api/blob/master/lib/MetaCPAN/Model/Archive.pm) files from the `backpan`, either onto a `ram disk` (for speed) or into a `tmp` dir
- The `tmp` dir is around 350 GB (we provision 430 GB)
- The `tmp` dir is purged nightly of anything over 30 days old
- There are many [cron jobs](https://github.com/metacpan/metacpan-puppet/blob/master/hieradata/nodes/bm-mc-02.yaml#L29) that run the [scripts](https://github.com/metacpan/metacpan-api/tree/master/lib/MetaCPAN/Script)
- There is a [Minion Queue](https://docs.mojolicious.org/Minion) backed by a Postgres database, so that indexing can be distributed across servers

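The nightly purge of the `tmp` dir amounts to "delete anything whose modification time is more than 30 days old". Below is a minimal Python sketch of that rule; the real job runs from cron and may well use `find` or a MetaCPAN script instead, and using `mtime` as the age measure, as well as the function name, are assumptions.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import List, Optional

MAX_AGE_DAYS = 30  # "anything over 30 days old"
SECONDS_PER_DAY = 24 * 60 * 60


def purge_old_entries(tmp_dir: Path, max_age_days: int = MAX_AGE_DAYS,
                      now: Optional[float] = None) -> List[Path]:
    """Delete files older than max_age_days (by mtime) below tmp_dir.

    Empty subdirectories are removed only once they are themselves older
    than the cutoff, so a directory being filled right now is kept.
    Returns the list of deleted files.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    removed: List[Path] = []
    # Bottom-up walk: a directory's contents are handled before the directory.
    for dirpath, dirnames, filenames in os.walk(tmp_dir, topdown=False):
        for name in filenames:
            path = Path(dirpath) / name
            if path.lstat().st_mtime < cutoff:
                path.unlink()
                removed.append(path)
        for name in dirnames:
            sub = Path(dirpath) / name
            if (not sub.is_symlink() and sub.lstat().st_mtime < cutoff
                    and not any(sub.iterdir())):
                sub.rmdir()
    return removed


if __name__ == "__main__":
    # Demo on a throwaway directory instead of the real tmp dir.
    with tempfile.TemporaryDirectory() as d:
        old = Path(d) / "old-release.tar.gz"
        old.write_text("stale")
        past = time.time() - 31 * SECONDS_PER_DAY
        os.utime(old, (past, past))
        print([p.name for p in purge_old_entries(Path(d))])
```

Running the module directly prints `['old-release.tar.gz']`. Keeping recent empty directories avoids racing with an extraction that has just created its target directory.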
## metacpan.org (front end)

### :exclamation: Fastly stats (for 30 days)
- Fastly delivered 1.23 TB
- 42% served from cache
- 91 million requests

### :question: What processes happen

- Uses `fastapi.metacpan.org` for search

# Other sites

## ElasticSearch

We are moving to an ElasticSearch version 2.4 cluster (3x32GB) that Elastic are hosting for us.
They have also supplied us with a 15GB version 8 cluster for our development
environment while we refactor for that version.

This will replace the cluster(s) we have been running ourselves.

## sco redirects... mcpan.org (search.cpan.org, sco.metacpan.org)

- [Puppet](https://github.com/metacpan/metacpan-puppet/blob/master/hieradata/env/production.yaml#L61): just runs starman
- [code](https://github.com/metacpan/sco-redirect)

## st.aticpan.org

Uses `fastapi.metacpan.org` as its backend (see above), but with special headers so that script files are served
as `text/plain`, so we can safely render content extracted from distributions.

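The idea behind the special headers can be sketched as a small function that picks response headers per file. The suffix list, the function name, the example paths, and the use of `X-Content-Type-Options: nosniff` are assumptions for illustration; the real rules are in our CDN/server configuration and may differ.

```python
from pathlib import PurePosixPath
from typing import Dict

# Hypothetical list of file types a browser could execute or render as active content.
SCRIPT_SUFFIXES = {".html", ".htm", ".xhtml", ".js", ".svg", ".xml"}


def response_headers(path: str, upstream_type: str) -> Dict[str, str]:
    """Return headers for a file served from a distribution.

    Script-like files are forced to text/plain so the browser shows their
    source instead of running it; nosniff stops the browser from guessing
    a different type from the content.
    """
    headers = {"X-Content-Type-Options": "nosniff"}
    if PurePosixPath(path).suffix.lower() in SCRIPT_SUFFIXES:
        headers["Content-Type"] = "text/plain; charset=utf-8"
    else:
        headers["Content-Type"] = upstream_type
    return headers


if __name__ == "__main__":
    print(response_headers("/Foo-Bar-1.0/docs/index.html", "text/html"))
    print(response_headers("/Foo-Bar-1.0/docs/logo.png", "image/png"))
```

Images and other passive files keep their upstream type, so they can still be embedded in rendered documentation.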
### :exclamation: Fastly stats (for 30 days)
- Fastly delivered 567 MB
- Only 1% served from cache

## api-v0-shim.metacpan.org (aka api.metacpan.org v0)

Fastly rejects most requests with a 410, but if the user agent
is cpanminus it lets the request through to this domain, which
shims the request into v1 syntax and returns the result.

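The edge rule described above boils down to a single User-Agent check. A minimal Python sketch of that decision follows; the real rule lives in Fastly, the function name is hypothetical, and the assumption that cpanminus sends a User-Agent beginning with `cpanminus/` (for example `cpanminus/1.7047`) should be checked against what Fastly actually matches.

```python
import re
from typing import Optional, Tuple

GONE = 410
# Assumption: cpanminus identifies itself as "cpanminus/<version>".
CPANMINUS_UA = re.compile(r"^cpanminus/")


def route_v0(user_agent: Optional[str]) -> Tuple[str, Optional[int]]:
    """Decide what happens to a v0 API request at the edge.

    Returns ("shim", None) to forward the request to
    api-v0-shim.metacpan.org, or ("reject", 410) to answer it directly.
    """
    if user_agent and CPANMINUS_UA.match(user_agent):
        return ("shim", None)
    return ("reject", GONE)


if __name__ == "__main__":
    for ua in ("cpanminus/1.7047", "curl/8.5.0", None):
        print(ua, "->", route_v0(ua))
```

Answering everything else at the edge keeps the old v0 traffic from ever reaching our servers.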
# Data

```
cpan.metacpan.org is doing ~5TB pm, 36% from cache
fastapi.metacpan.org is doing ~3.5TB pm, 50% from cache
metacpan.org is doing ~1.3TB pm, 40% from cache

So we'll be serving ~6TB pm from our origin
```
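The ~6TB figure follows from multiplying each site's monthly traffic by its uncached share. A short Python check, assuming (as the notes do) that the cache percentage applies to bytes delivered; the function names are illustrative:

```python
from typing import Dict, Tuple

# Monthly figures from the notes above: site -> (TB delivered, share from cache).
SITES: Dict[str, Tuple[float, float]] = {
    "cpan.metacpan.org": (5.0, 0.36),
    "fastapi.metacpan.org": (3.5, 0.50),
    "metacpan.org": (1.3, 0.40),
}


def origin_tb(delivered_tb: float, cache_share: float) -> float:
    """TB per month that still has to come from our origin servers."""
    return delivered_tb * (1 - cache_share)


def total_origin_tb(sites: Dict[str, Tuple[float, float]]) -> float:
    """Sum of origin traffic across all sites."""
    return sum(origin_tb(tb, share) for tb, share in sites.values())


if __name__ == "__main__":
    for name, (tb, share) in SITES.items():
        print(f"{name:<22} {origin_tb(tb, share):.2f} TB")
    print(f"{'total':<22} {total_origin_tb(SITES):.2f} TB")
```

This prints 3.20, 1.75 and 0.78 TB for the three sites, a total of 5.73 TB, which rounds to the ~6TB above.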
