path: root/libmat2/office.py
Age  Commit message  Author
2023-01-28  Another typing pass  jvoisin
2023-01-28  Fix the type annotations  jvoisin
2022-12-25  Improve xlsx support  jvoisin
2022-11-21  Remove pyflakes  jvoisin
            Isn't borderline useless compared to mypy and pylint
2022-08-28  Simplify the typing annotations  jvoisin
2021-12-26  Please pylint by iterating on dict directly, instead of calling .keys()  jvoisin
2021-07-14  Improve xlsx support  jvoisin
            This should close #156
2021-05-20  Improve support for xlsx files  jvoisin
2021-03-14  Keep sharedStrings.xml when processing MSOffice sheets  jvoisin
2021-03-14  Don't keep [trash] files when processing MS Office files  jvoisin
2020-11-13  Bump coverage  jvoisin
2020-11-06  Handle multiple namespaces in MSOffice's content types  jvoisin
2020-11-06  Fix a regexp for xsls files  jvoisin
            This should increase a bit the compability with Excel files
2020-05-17  Improve a bit Microsoft word support  jvoisin
2020-04-06  Improve xlsx support  jvoisin
2020-04-02  Improve xlsx support  jvoisin
2020-03-08  Vastly improve ppt compatibility  jvoisin
2020-03-07  Improve compatibility with MS Office of cleaned ppt  jvoisin
2020-03-07  Improve a bit ppt support  jvoisin
2020-03-07  Improve a bit the support of ppt files  jvoisin
2019-11-30  Improve a bit ppt support  jvoisin
2019-11-30  Improve a bit odt handling  jvoisin
2019-10-17  Improve a bit the support for ppt files  jvoisin
2019-09-01  Improve a bit the comments in the code  jvoisin
            This is related to the previous commit
2019-09-01  Remove nsid fields from MSOffice documents  jvoisin
            nsids are random identifiers, usually used to ease merging between documents, and can trivially be used for fingerprinting.
2019-04-27  Add tar archive support  jvoisin
2019-03-05  Refactor {black,white}list into {block,allow}list  Brolf
            Closes #96
2019-02-08  Improve a bit get_meta for libreoffice files  jvoisin
2019-02-07  Use of the archive refactoring for the office documents too  jvoisin
2019-02-04  Refactor a bit office get_meta handling  jvoisin
            This should make easier to get more metadata from archive-based file formats.
2019-02-03  Whenever possible, use bwrap for subprocesses  intrigeri
            This should closes #90
2018-10-25  Implement get_meta() for archives  jvoisin
2018-10-12  Bump mypy typing coverage  jvoisin
2018-10-05  Improve both the typing and the comments  jvoisin
2018-10-04  Trash word/people.xml in office files  jvoisin
2018-10-03  Don't break office files for MS Office  jvoisin
            We didn't take the whitelist into account while removing dangling files from [Content_types].xml
2018-10-03  Improve mat2's cli reliability  jvoisin
            - Replace some class members by instance members
            - Don't thread the cleaning process anymore for now
2018-10-02  Use [Content_Types].xml to improve MS Office coverage  jvoisin
2018-10-02  fix typo  georg
2018-10-01  Files processed via MAT2 are now accepted without warnings by MS Office  jvoisin
2018-09-30  Please mypy  jvoisin
2018-09-30  Remove dangling references in MS Office's [Content_types].xml  jvoisin
2018-09-24  Second pass of minor formatting  jvoisin
2018-09-24  Fix some minor formatting issues  jvoisin
2018-09-24  Implement rsid stripping for office files  jvoisin
            MS Office XML rsid is a "unique identifier used to track the editing session when the physical character representing this section mark was last formatted." See the following links for details:
            - https://msdn.microsoft.com/en-us/library/office/documentformat.openxml.wordprocessing.previoussectionproperties.rsidrpr.aspx
            - https://blogs.msdn.microsoft.com/brian_jones/2006/12/11/whats-up-with-all-those-rsids/.
2018-09-24  Lexicographical sort on xml attributes for office files  jvoisin
            In XML, the order of the attributes shouldn't be meaningful, however, MS Office sorts attributes for a given XML tag differently than LibreOffice.
2018-09-06  Split office and archives  jvoisin
2018-09-05  Unknown Members: make policy use an Enum  Daniel Kahn Gillmor
            Closes #60
            Note: this changeset also ensures that clean.cleaned.docx is removed up after the pytest is over.
2018-09-05  Remove defusedxml support and document why  jvoisin
2018-09-05  Improve the previous commit  jvoisin