path: root/libmat2/office.py
Age | Commit message | Author
2018-10-01 | Files processed via MAT2 are now accepted without warnings by MS Office | jvoisin
2018-09-30 | Please mypy | jvoisin
2018-09-30 | Remove dangling references in MS Office's [Content_types].xml | jvoisin
2018-09-24 | Second pass of minor formatting | jvoisin
2018-09-24 | Fix some minor formatting issues | jvoisin
2018-09-24 | Implement rsid stripping for office files | jvoisin
    MS Office XML rsid is a "unique identifier used to track the editing session when the physical character representing this section mark was last formatted." See the following links for details:
    - https://msdn.microsoft.com/en-us/library/office/documentformat.openxml.wordprocessing.previoussectionproperties.rsidrpr.aspx
    - https://blogs.msdn.microsoft.com/brian_jones/2006/12/11/whats-up-with-all-those-rsids/
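The rsid stripping described in this commit could look roughly like the sketch below. It is not the code from `office.py`: the function name `strip_rsid_attributes` and the sample document are hypothetical, and the sketch assumes that removing every attribute whose local name starts with `rsid` is sufficient (it does not touch any separate list of rsids a document may carry elsewhere).

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by .docx documents.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def strip_rsid_attributes(root: ET.Element) -> int:
    """Remove every attribute whose local name starts with 'rsid'.

    Hypothetical helper: returns the number of attributes removed.
    """
    removed = 0
    for element in root.iter():
        # Copy the keys first, since the dict is modified while looping.
        for name in list(element.attrib):
            # ElementTree stores namespaced names as '{namespace}local'.
            local_name = name.rsplit("}", 1)[-1]
            if local_name.startswith("rsid"):
                del element.attrib[name]
                removed += 1
    return removed


# Made-up sample, only for illustration.
SAMPLE = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    '<w:p w:rsidR="00A1B2C3" w:rsidRDefault="00D4E5F6">'
    '<w:r w:rsidRPr="00A1B2C3"><w:t>hello</w:t></w:r>'
    "</w:p></w:body></w:document>"
)

if __name__ == "__main__":
    tree_root = ET.fromstring(SAMPLE)
    print(strip_rsid_attributes(tree_root))
```

The text content of the document is left untouched; only the session-tracking attributes are dropped.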
2018-09-24 | Lexicographical sort on xml attributes for office files | jvoisin
    In XML, the order of the attributes shouldn't be meaningful; however, MS Office sorts attributes for a given XML tag differently than LibreOffice.
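A minimal sketch of such an attribute sort, assuming `xml.etree.ElementTree` and Python 3.8 or later (where serialization keeps the attributes' insertion order). The helper name `sort_attributes` is hypothetical, and the sort here is on the qualified name as ElementTree stores it (`{namespace}local`), which may differ from the ordering the actual commit uses.

```python
import xml.etree.ElementTree as ET


def sort_attributes(root: ET.Element) -> None:
    """Reorder the attributes of every element lexicographically, in place."""
    for element in root.iter():
        ordered = sorted(element.attrib.items())
        # Rebuild the dict so that insertion order matches the sorted order.
        element.attrib.clear()
        element.attrib.update(ordered)


if __name__ == "__main__":
    sample = ET.fromstring('<item zeta="1" alpha="2" mid="3"/>')
    sort_attributes(sample)
    print(list(sample.attrib))
```

Sorting makes the output independent of which office suite wrote the file, so the attribute order no longer hints at the producing software.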
2018-09-06 | Split office and archives | jvoisin
2018-09-05 | Unknown Members: make policy use an Enum | Daniel Kahn Gillmor
    Closes #60.
    Note: this changeset also ensures that clean.cleaned.docx is removed after the pytest run is over.
2018-09-05 | Remove defusedxml support and document why | jvoisin
2018-09-05 | Improve the previous commit | jvoisin
2018-09-04 | office: try all members, even when one fails | Daniel Kahn Gillmor
    The end result will be the same -- an abort -- but the user will get to see all the warnings for a particular file, instead of getting them one at a time.
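The "try everything, then abort" behaviour can be sketched as follows. This is an illustration only: `clean_all_members` and the `clean_member` callback are hypothetical names, not functions from libmat2.

```python
import logging
from typing import Callable, Iterable, List


def clean_all_members(
    members: Iterable[str], clean_member: Callable[[str], bool]
) -> List[str]:
    """Try every member and return the names of the ones that failed.

    An empty list means success; a non-empty list means the caller
    should abort, after the user has seen every warning.
    """
    failed: List[str] = []
    for name in members:
        if not clean_member(name):
            # Warn and keep going instead of returning on the first failure.
            logging.warning("Unable to process member %s", name)
            failed.append(name)
    if failed:
        logging.error("Aborting: %d member(s) could not be processed", len(failed))
    return failed


if __name__ == "__main__":
    # Pretend that only XML members can be cleaned.
    print(clean_all_members(["a.xml", "b.bin", "c.bin"], lambda n: n.endswith(".xml")))
```

The outcome for the file is unchanged, but a single run now reports every problematic member.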
2018-09-04 | document all unknown/unhandlable files even on abort | Daniel Kahn Gillmor
    This makes it easy to get a list of all files that mat2 doesn't know how to handle, without having to choose -u keep or -u omit.
2018-09-04 | office: create policy for what to do about unknown members | Daniel Kahn Gillmor
    Previously, encountering an unknown member meant that any parser of this type would abort. Now, the user can set parser.unknown_member_policy to either 'omit' or 'keep' if they don't want the current action of 'abort'.
    Note that this causes pylint to complain about branching depth for remove_all() because of the nuanced error-handling. I've disabled this check.
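One way to picture the three policies is the sketch below. The values 'abort', 'omit' and 'keep' come from the commit message; the enum name, the `select_members` helper and its return convention are assumptions made for illustration and are not taken from libmat2.

```python
import enum
import logging
from typing import List, Optional, Set


class UnknownMemberPolicy(enum.Enum):
    """Hypothetical enum mirroring the three policy values named above."""

    ABORT = "abort"
    OMIT = "omit"
    KEEP = "keep"


def select_members(
    names: List[str], known: Set[str], policy: UnknownMemberPolicy
) -> Optional[List[str]]:
    """Return the members to write out, or None if the run must abort."""
    kept: List[str] = []
    abort = False
    for name in names:
        if name in known:
            kept.append(name)
        elif policy is UnknownMemberPolicy.KEEP:
            logging.warning("Keeping unknown member %s as-is", name)
            kept.append(name)
        elif policy is UnknownMemberPolicy.OMIT:
            logging.warning("Omitting unknown member %s", name)
        else:
            # ABORT: remember the failure but keep reporting other members.
            logging.error("Unknown member %s", name)
            abort = True
    return None if abort else kept


if __name__ == "__main__":
    members = ["a.xml", "x.bin"]
    print(select_members(members, {"a.xml"}, UnknownMemberPolicy("omit")))
```

Using an enum instead of bare strings lets a typo in the policy name fail loudly at lookup time rather than silently fall through to a default branch.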
2018-09-01 | Fix a minor formatting issue | jvoisin
2018-09-01 | Logging cleanup | dkg
2018-07-19 | Improve the code's documentation | jvoisin
2018-07-19 | Minor simplification in how we're handling xml for office files | jvoisin
2018-07-10 | Remove `print` from libmat, and use the `logging` module instead | jvoisin
    This should close #28
2018-07-09 | Make pylint even happier | jvoisin
2018-07-08 | Fix some pep8 issues spotted by pyflakes | jvoisin
2018-07-08 | Achieve 100% coverage! | jvoisin
2018-07-08 | Bump coverage for office files and fix some related crashes | jvoisin
2018-07-08 | Silence a mypy's stupid warning | jvoisin
2018-07-08 | Add defusedxml as an (optional) way to prevent XML-based attacks | jvoisin
    Those attacks are DoS-only.
2018-07-07 | Fix a mistake in office file revisions handling | jvoisin
2018-07-02 | Improve a bit the formatting of the code thanks to pyflakes3 | jvoisin
2018-07-01 | Remove docx revisions | jvoisin
2018-07-01 | MAT2 is now cleaning revisions from odt files! | jvoisin
2018-07-01 | Remove the thumbnails from libreoffice files | jvoisin
2018-06-27 | Massively simplify how we're cleaning office files | jvoisin
2018-06-21 | Improve the reliability of the office parser | jvoisin
2018-06-21 | Fix some linter warnings | jvoisin
2018-06-21 | Refactor how offices files are handled | jvoisin
    - xml files are no longer considered harmless
    - Factorization of the `remove_all` method for office files
    - Explicit whitelists are used
    - Blacklists are used to skip files completely
    - Non-blacklisted files are _still cleaned_
    - Unsupported files still trigger an error
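The list above can be read as a small decision procedure for each archive member. The sketch below is one interpretation, not the code from `office.py`: it assumes that whitelisted members are copied unchanged, and the function name, the patterns and the list of supported suffixes are all hypothetical.

```python
import re
from typing import Iterable, Pattern, Tuple


def classify_member(
    name: str,
    whitelist: Iterable[Pattern[str]],
    blacklist: Iterable[Pattern[str]],
    supported_suffixes: Tuple[str, ...],
) -> str:
    """Return 'omit', 'keep', 'clean' or 'error' for one archive member."""
    if any(pattern.search(name) for pattern in blacklist):
        return "omit"  # blacklisted: skipped completely
    if any(pattern.search(name) for pattern in whitelist):
        return "keep"  # explicitly whitelisted: assumed to be copied as-is
    if name.endswith(supported_suffixes):
        return "clean"  # everything else is still cleaned, XML included
    return "error"  # unsupported files still trigger an error


# Illustrative values only; the real lists live in the parser classes.
WHITELIST = [re.compile(r"^mimetype$")]
BLACKLIST = [re.compile(r"^Thumbnails/")]
SUPPORTED = (".xml", ".png")

if __name__ == "__main__":
    for member in ("mimetype", "Thumbnails/thumbnail.png", "content.xml", "payload.exe"):
        print(member, classify_member(member, WHITELIST, BLACKLIST, SUPPORTED))
```

Checking the blacklist first means a member that matches both lists is dropped, which is the more conservative choice for a metadata cleaner.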
2018-06-21 | Minor simplification of the office-related code | jvoisin
2018-06-10 | Minor code simplification | jvoisin
2018-06-10 | Make the parsing of office format's metadata more robust | jvoisin
2018-06-10 | Add some tests for non-supported embedded fileformats | jvoisin
2018-06-04 | Add more typing and use mypy in the CI | jvoisin
2018-05-18 | Rename some files to simplify packaging | jvoisin
    - the `src` folder is now `libmat2`
    - the `main.py` script is now `mat2.py`