| author | jvoisin | 2019-05-16 20:59:15 +0200 |
|---|---|---|
| committer | jvoisin | 2019-05-16 20:59:15 +0200 |
| commit | 13d71a256587c2eb41904480ea9a7bce8e46cd3d (patch) | |
| tree | 81f165f4fa41dc10710adbbe69a96e88218a2f1d /doc/implementation_notes.md | |
| parent | 35d550d229b219f5a02cb9194c3bd24329f975ed (diff) | |
Document the archives handling implementation's details
Diffstat (limited to '')
| -rw-r--r-- | doc/implementation_notes.md | 26 |
1 files changed, 21 insertions, 5 deletions
diff --git a/doc/implementation_notes.md b/doc/implementation_notes.md
index cbf76ee..7555d2e 100644
--- a/doc/implementation_notes.md
+++ b/doc/implementation_notes.md
| @@ -12,11 +12,16 @@ images in a PDF or an office document. | |||
| 12 | Revisions handling | 12 | Revisions handling |
| 13 | ------------------ | 13 | ------------------ |
| 14 | 14 | ||
| 15 | Revisions are handled according to the principle of least astonishment: they are entirely removed. | 15 | Revisions are handled according to the principle of least astonishment: they |
| 16 | are entirely removed. | ||
| 16 | 17 | ||
| 17 | - Either the users aren't aware of the revisions, and thus they should be deleted. For example, journalists who are editing a document to erase mentions of their sources. | 18 | - Either the users aren't aware of the revisions, and thus they should be |
| 19 | deleted. For example, journalists who are editing a document to erase | ||
| 20 | mentions of their sources. | ||
| 18 | 21 | ||
| 19 | - Or they are aware of them, and will likely not expect MAT2 to be able to keep the revisions, which are basically traces of how, when, and by whom the document was edited. | 22 | - Or they are aware of them, and will likely not expect MAT2 to be able to keep |
| 23 | the revisions, which are basically traces of how, when, and by whom the | ||
| 24 | document was edited. | ||
| 20 | 25 | ||
| 21 | 26 | ||
| 22 | Race conditions | 27 | Race conditions |
| @@ -37,8 +42,19 @@ against them | |||
| 37 | Archives handling | 42 | Archives handling |
| 38 | ----------------- | 43 | ----------------- |
| 39 | 44 | ||
| 40 | MAT2 doesn't support archives yet, because we haven't found a usable way to ask the user | 45 | By default, when cleaning a non-supported file format in an archive, |
| 41 | what to do when non-supported files are encountered. | 46 | mat2 will abort with a detailed error message. |
| 47 | While strongly discouraged, it's possible to override this behaviour to force | ||
| 48 | the exclusion or inclusion of unknown files in the cleaned archive. | ||
| 49 | |||
| 50 | While Python's [zipfile](https://docs.python.org/3/library/zipfile.html) module | ||
| 51 | provides a *safe* way to extract members of a zip archive, the | ||
| 52 | [tarfile](https://docs.python.org/3/library/tarfile.html) one doesn't, | ||
| 53 | meaning that it's up to mat2 to implement safety checks. Currently, | ||
| 54 | it defends against path-traversal, both relative and absolute, | ||
| 55 | symlink-related attacks, setuid/setgid attacks, duplicate members, block and | ||
| 56 | char devices, … but there might still be dragons lurking there. | ||
| 57 | |||
| 42 | 58 | ||
| 43 | PDF handling | 59 | PDF handling |
| 44 | ------------ | 60 | ------------ |
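
The tar safety checks listed in the "Archives handling" hunk above (path traversal, symlinks, setuid/setgid, duplicate members, block and char devices) can be illustrated with a minimal sketch. This is not mat2's actual code: the function name `find_unsafe_members`, the helper `_escapes`, and the reason strings are hypothetical, and only standard `tarfile.TarInfo` attributes are used.

```python
import posixpath
import stat
import tarfile


def _escapes(path):
    """True if a normalized path is absolute or climbs above the root."""
    return posixpath.isabs(path) or path == '..' or path.startswith('../')


def find_unsafe_members(members):
    """Return (name, reason) pairs for suspicious TarInfo members.

    Hypothetical helper for illustration; mat2's real checks may differ.
    """
    problems = []
    seen = set()
    for member in members:
        name = member.name
        normalized = posixpath.normpath(name)
        if posixpath.isabs(name):
            problems.append((name, 'absolute path'))
        elif _escapes(normalized):
            problems.append((name, 'relative path traversal'))
        elif member.mode & (stat.S_ISUID | stat.S_ISGID):
            problems.append((name, 'setuid/setgid bit'))
        elif member.isblk() or member.ischr():
            problems.append((name, 'device file'))
        elif member.issym():
            # A symlink target is resolved relative to the link's directory.
            target = posixpath.normpath(
                posixpath.join(posixpath.dirname(normalized), member.linkname))
            if _escapes(target):
                problems.append((name, 'symlink escapes the archive'))
        # Duplicates are reported in addition to any other finding.
        if normalized in seen:
            problems.append((name, 'duplicate member'))
        seen.add(normalized)
    return problems
```

A caller would typically pass `tarfile.open(path).getmembers()` and refuse to extract anything if the returned list is non-empty. The sketch inspects metadata only, so it never touches the filesystem.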
