Compare commits

4 commits

Author SHA1 Message Date
fb67c11eb6
add option for following symlinks, make extensions optional, new version!!
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone/tag Build is passing
2021-04-04 23:52:16 +10:00
12d9001bb8
better mime type detection
- consider some/x-thing and some/thing to be identical
- use a patched version of mime_guess with many more extension/type mappings
2021-04-04 22:44:48 +10:00
55f73d8a9a
added a changelog 0u0
2021-04-04 22:41:37 +10:00
25c9efa2f1
fixed compilation on some obscurish architectures (e.g. powerpc), more clippy lints
2021-04-03 03:52:28 +10:00
11 changed files with 244 additions and 50 deletions

113
CHANGELOG.md Normal file

@ -0,0 +1,113 @@
# Changelog
Dates are given in YYYY-MM-DD format.
## v0.2
### v0.2.11 (0201-)
#### Features
- fif can now traverse symlinks with the `-f`/`--follow-symlinks` flag
- Extensions are no longer mandatory - running fif without `-e` or `-E` will scan all files, regardless of extension
(files without extensions are still skipped unless the `-S` flag is used)
#### Bugfixes
- Fixed compilation on big endian 32-bit architectures (see
[here](https://github.com/bodil/smartstring/blob/v0.2.6/src/config.rs#L101-L103) for why that was a problem in the first
place)
- Fixed broken tests for the [`infer`] backend
#### Other
- Better mime type detection:
- Consider "some/x-thing" and "some/thing" to be identical
- Use a patched version of mime_guess (which took a while to make 0u0;) with many more extension<->type mappings
### v0.2.10 (2021-03-26)
- PowerShell support!
### v0.2.9 (2021-03-17)
- Replaced a bunch of `PathBuf`s with `Path`s, which should reduce memory usage
- Formatting improvements
### v0.2.8 (2021-03-03)
#### Features
- Added much more information - author, copyright, etc. - to `--help`/`-h` output
- Scan files without extensions with `-S` (by default, such files are ignored)
#### Bugfixes
- Skipping hidden files (the default when `-s` is not passed) no longer skips all files if the root directory itself is hidden
#### Other
- The `ScanError` enum now contains a `PathBuf` - errors are now returned as `ScanError` rather than `(ScanError, PathBuf)`
- Renamed modules in accordance with [Rust's API guidelines](https://rust-lang.github.io/api-guidelines/naming.html)
### v0.2.7 (2021-03-01)
- Default to `WARN`-level logging if `RUST_LOG` isn't set
- Added a drone CI config
- Added `test.py` for automated building and testing against Rust stable, beta, nightly, and the MSRV specified in
`Cargo.toml`
- Added a test for argument parsing
- Documentation! And lots of it! 0u0
### v0.2.6 (2021-02-28)
- Added tests!
- Default to [`xdg-mime`] on all Unixy platforms, not just Linux - this also includes the various *BSDs (I've tested
FreeBSD), macOS (haven't tested, but I have a very old MacBook running Leopard that has file preinstalled, so it
*should* work fine), Redox OS (haven't tested), etc.
### v0.2.5 (2021-02-27)
- Use [`xdg-mime`] by default on Linux, [`infer`] elsewhere
### v0.2.4 (2021-02-22)
- Proper(ish) XML document support
- Display version in help output
### v0.2.3+hotfix (2021-02-22)
- A quick hack to fix broken/non-existent support for XML document files - `docx`, `odt`, etc.
### v0.2.3 (2021-02-22)
#### Features
- Automatically disable [`xdg-mime`] backend on Windows
- Exit codes
- Improved error handling
- Retrieve extension sets from [`mime_guess`] rather than hardcoding them in
#### Bugfixes
- Improved SVG detection
#### Other
- Switched back from `printf` to `echo` in shell output
- More frequent and detailed comments
- Refactored `formats.rs`
- Exclude certain files and directories from the crate
### v0.2.2 (2021-02-20)
- Windows support
### v0.2.1 (2021-02-18)
#### Features
- Added extension sets -- you can now use, for example, `-E images` to check files with known image extensions
- Shell script output now uses `printf` instead of `echo`
- Added [`infer`] backend
#### Bugfixes
- Fixed broken singlethreaded support
#### Other
- Use a global backend instance instead of passing `&db` around constantly
- Use `rustfmt` 0u0
### v0.2.0 (2021-02-15)
#### Features
- Output a script rather than a list of misnamed files
- Parallel file scanning
- Added logging support
#### Bugfixes
- Handle filenames with invalid UTF-8
#### Other
- Added license
- Replaced [`structopt`] with [`clap`] 3 (beta)
- Specify 1.43.0 as minimum supported Rust version
## v0.1
### v0.1.0 (2021-02-04)
Initial commit!
- Only one backend - [`xdg-mime`]
- Prints files directly rather than outputting a script
- Only supported flags are `-e` (specify extensions) and `-s` (scan hidden files)
<!-- links -->
[`xdg-mime`]: https://crates.io/crates/xdg-mime
[`structopt`]: https://crates.io/crates/structopt
[`clap`]: https://crates.io/crates/clap
[`infer`]: https://crates.io/crates/infer
[`mime_guess`]: https://crates.io/crates/mime_guess

23
Cargo.lock generated

@ -177,7 +177,7 @@ dependencies = [
[[package]]
name = "fif"
version = "0.2.10"
version = "0.2.11"
dependencies = [
"cached",
"cfg-if",
@ -284,9 +284,9 @@ dependencies = [
[[package]]
name = "libc"
version = "0.2.91"
version = "0.2.92"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8916b1f6ca17130ec6568feccee27c156ad12037880833a3b842a823236502e7"
checksum = "56d855069fafbb9b344c0f962150cd2c1187975cb1c22c1522c240d8c4986714"
[[package]]
name = "log"
@ -305,9 +305,9 @@ checksum = "0ee1c47aaa256ecabcaea351eae4a9b01ef39ed810004e298d2511ed284b1525"
[[package]]
name = "memoffset"
version = "0.6.1"
version = "0.6.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "157b4208e3059a8f9e78d559edc658e13df41410cb3ae03979c83130067fdd87"
checksum = "f83fb6581e8ed1f85fd45c116db8405483899489e38406156c25eb743554361d"
dependencies = [
"autocfg",
]
@ -320,9 +320,8 @@ checksum = "2a60c7ce501c71e03a9c9c0d35b861413ae925bd979cc7a4e30d060069aaac8d"
[[package]]
name = "mime_guess"
version = "2.0.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2684d4c2e97d99848d30b324b00c8fcc7e5c897b7cbb5819b09e7c90e8baf212"
version = "2.0.4"
source = "git+https://github.com/Lynnesbian/mime_guess#679d3b8887d30bd43a83f162d61b7226675c7012"
dependencies = [
"mime",
"unicase",
@ -393,9 +392,9 @@ dependencies = [
[[package]]
name = "proc-macro2"
version = "1.0.24"
version = "1.0.26"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1e0704ee1a7e00d7bb417d0770ea303c1bccbabf0ef1667dae92b5967f5f8a71"
checksum = "a152013215dca273577e18d2bf00fa862b89b24169fb78c4c95aeb07992c9cec"
dependencies = [
"unicode-xid",
]
@ -550,9 +549,9 @@ checksum = "a2eb9349b6444b326872e140eb1cf5e7c522154d69e7a0ffb0fb81c06b37543f"
[[package]]
name = "syn"
version = "1.0.64"
version = "1.0.68"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3fd9d1e9976102a03c542daa2eff1b43f9d72306342f3f8b3ed5fb8908195d6f"
checksum = "3ce15dd3ed8aa2f8eeac4716d6ef5ab58b6b9256db41d7e1a0224c2788e8fd87"
dependencies = [
"proc-macro2",
"quote",

View file

@ -1,7 +1,7 @@
[package]
name = "fif"
description = "A command-line tool for detecting and optionally correcting files with incorrect extensions."
version = "0.2.10"
version = "0.2.11"
authors = ["Lynnesbian <lynne@bune.city>"]
edition = "2018"
license = "GPL-3.0-or-later"
@ -26,7 +26,6 @@ xdg-mime-backend = []
[dependencies]
walkdir = "2.3.1"
log = "0.4.14"
smartstring = "0.2.6"
mime_guess = "2.0.3"
snailquote = "0.3.0"
once_cell = "1.5.2"
@ -38,9 +37,14 @@ cfg-if = "1.0.0"
[target.'cfg(unix)'.dependencies]
xdg-mime = "0.3"
[target.'cfg(not(all(target_endian = "big", target_pointer_width = "32")))'.dependencies]
smartstring = "0.2.6"
[patch.crates-io]
# use git version while waiting on a release incorporating https://github.com/ebassi/xdg-mime-rs/commit/de5a6dd
xdg-mime = {git = "https://github.com/ebassi/xdg-mime-rs", version = "0.3", rev = "de5a6dd" }
xdg-mime = { git = "https://github.com/ebassi/xdg-mime-rs", version = "0.3", rev = "de5a6dd" }
# forked version with many more mime types
mime_guess = { git = "https://github.com/Lynnesbian/mime_guess", version = "2.0.4" }
[dependencies.clap]
version = "3.0.0-beta.2"

View file

@ -6,12 +6,19 @@ cargo clippy --tests -- \
-W clippy::pedantic \
-W clippy::complexity \
-W clippy::cargo \
-W clippy::float_cmp_const \
-W clippy::lossy_float_literal \
-W clippy::multiple_inherent_impl \
-W clippy::string_to_string \
-W clippy::wrong_pub_self_convention \
-A clippy::unused_io_amount \
-A clippy::redundant_closure_for_method_calls \
-A clippy::shadow_unrelated
-A clippy::shadow_unrelated \
-A clippy::option_if_let_else
# ALLOWS:
# unused_io_amount: there are two places where i want to read up to X bytes and i'm fine with getting less than that
# redundant_closure...: the alternative is often much more verbose
# shadow_unrelated: sometimes things that seem unrelated are actually related ;)
# option_if_let_else: the suggested code is usually harder to read than the original

View file

@ -1,7 +1,7 @@
use std::path::Path;
use crate::string_type::String;
use mime_guess::Mime;
use smartstring::alias::String;
use crate::inspectors::mime_extension_lookup;

View file

@ -167,6 +167,8 @@ impl Format for Script {
}
}
// PowerShell is a noun, not a type
#[allow(clippy::doc_markdown)]
/// PowerShell script.
pub struct PowerShell {}

View file

@ -8,9 +8,9 @@ use std::str::FromStr;
use cached::cached;
use mime_guess::Mime;
use smartstring::alias::String;
use crate::mime_db::MimeDb;
use crate::string_type::String;
/// The number of bytes to read initially.
///
@ -72,7 +72,24 @@ cached! {
// match on the mime's `essence_str` rather than the mime itself - mime_guess::get_mime_extensions ignores the type
// suffix, treating "image/svg+xml" as "image/svg", and thus fails to find any extensions. passing the essence_str
// (which includes the suffix) fixes this.
match mime_guess::get_mime_extensions_str(mime.essence_str()) {
let essence = mime.essence_str();
let mut exts = mime_guess::get_mime_extensions_str(essence);
if exts.is_none() {
// no matches :c
// mime_guess' database isn't exactly perfect... there are a lot of times where the db will return "some/x-thing"
// but mime_guess only understands "some/thing", or vice-versa.
// so, if there appear to be no extensions, try replacing "some/x-thing" with "some/thing", or "some/thing" with
// "some/x-thing".
if essence.contains("/x-") {
// replace e.g. "application/x-gzip" with "application/gzip"
exts = mime_guess::get_mime_extensions_str(&essence.replace("/x-", "/"));
} else {
// replace e.g. "video/mp2t" with "video/x-mp2t"
exts = mime_guess::get_mime_extensions_str(&essence.replace("/", "/x-"));
}
}
match exts {
Some(exts) => {
let possible_exts: Vec<String> = exts.iter().map(|e| String::from(*e)).collect();
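
The "some/x-thing" versus "some/thing" fallback in the hunk above can be tried in isolation. The following is a minimal sketch, not the code from this commit: `ExtTable`, `toggle_x_prefix` and `lookup_extensions` are hypothetical names, the `HashMap` is a stand-in for the table that `mime_guess` consults, and `replacen` is used in place of `replace` (equivalent here, assuming an essence string contains a single `/`).

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the MIME-type -> extensions table.
type ExtTable = HashMap<&'static str, &'static [&'static str]>;

/// Returns the alternate spelling of a MIME essence string:
/// "application/x-gzip" becomes "application/gzip",
/// "video/mp2t" becomes "video/x-mp2t".
fn toggle_x_prefix(essence: &str) -> String {
    if essence.contains("/x-") {
        // drop the "x-" that follows the slash
        essence.replacen("/x-", "/", 1)
    } else {
        // insert "x-" after the slash
        essence.replacen('/', "/x-", 1)
    }
}

/// Looks up `essence` in `db`; on a miss, retries with the alternate spelling.
fn lookup_extensions(db: &ExtTable, essence: &str) -> Option<&'static [&'static str]> {
    db.get(essence)
        .or_else(|| db.get(toggle_x_prefix(essence).as_str()))
        .copied()
}
```

With a table that only knows "application/gzip", a lookup for "application/x-gzip" still finds the `gz` extension, and a type that is missing under both spellings still yields `None`.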

View file

@ -41,6 +41,7 @@ mod inspectors;
mod mime_db;
mod parameters;
mod scan_error;
pub(crate) mod string_type;
#[cfg(test)]
mod tests;
@ -80,7 +81,7 @@ fn main() {
debug!("Checking files with extensions: {:?}", extensions);
let entries = scan_directory(&args.dirs, &extensions, &args.get_scan_opts());
let entries = scan_directory(&args.dirs, extensions.as_ref(), &args.get_scan_opts());
if entries.is_none() {
// no need to log anything for fatal errors - fif will already have printed something obvious like
@ -111,8 +112,9 @@ fn main() {
match result {
Ok(r) => {
debug!(
"{:?} should have file extension {}",
"{:?} is {}, should have file extension {}",
r.file,
r.mime,
r.recommended_extension().unwrap_or_else(|| "???".into())
)
}
@ -164,8 +166,8 @@ cfg_if! {
}
/// Returns `true` if a file matches the given criteria. This means checking whether the file's extension appears in
/// `exts`, potentially skipping over hidden files, and so on.
fn wanted_file(entry: &DirEntry, exts: &[&str], scan_opts: &ScanOpts) -> bool {
/// `exts` (if specified), potentially skipping over hidden files, and so on.
fn wanted_file(entry: &DirEntry, exts: Option<&Vec<&str>>, scan_opts: &ScanOpts) -> bool {
if entry.depth() == 0 {
// the root directory should always be scanned.
return true;
@ -188,7 +190,13 @@ fn wanted_file(entry: &DirEntry, exts: &[&str], scan_opts: &ScanOpts) -> bool {
return false;
}
if let Some(exts) = exts {
// only scan if the file has one of the specified extensions.
exts.contains(&ext.unwrap().to_string_lossy().to_lowercase().as_str())
} else {
// no extensions specified - no reason not to scan this file.
true
}
}
/// Given a file path, returns its extension, using [`std::path::Path::extension`].
@ -263,8 +271,8 @@ fn scan_from_walkdir(entries: &[DirEntry]) -> Vec<Result<Findings, ScanError>> {
/// Scans a given directory with [`WalkDir`], filters with [`wanted_file`], checks for errors, and returns a vector of
/// [DirEntry]s.
fn scan_directory(dirs: &Path, exts: &[&str], scan_opts: &ScanOpts) -> Option<Vec<DirEntry>> {
let stepper = WalkDir::new(dirs).into_iter();
fn scan_directory(dirs: &Path, exts: Option<&Vec<&str>>, scan_opts: &ScanOpts) -> Option<Vec<DirEntry>> {
let stepper = WalkDir::new(dirs).follow_links(scan_opts.follow_symlinks).into_iter();
let mut probably_fatal_error = false;
let entries: Vec<DirEntry> = stepper
.filter_entry(|e| wanted_file(e, exts, scan_opts)) // filter out unwanted files
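
The effect of making the extension filter optional can be shown without `walkdir`. This is a rough sketch under stated assumptions: `extension_wanted` is a hypothetical helper that takes a plain `Path` instead of a `DirEntry`, ignores hidden-file handling, and lets the `scan_extensionless` flag alone decide the fate of files with no extension, which may differ from what `wanted_file` does.

```rust
use std::path::Path;

/// Returns `true` if `path` passes the (optional) extension filter.
/// `exts == None` means no `-e`/`-E` was given, so any extension is accepted.
fn extension_wanted(path: &Path, exts: Option<&[&str]>, scan_extensionless: bool) -> bool {
    let ext = match path.extension() {
        // compare case-insensitively, as the diff does with `to_lowercase`
        Some(ext) => ext.to_string_lossy().to_lowercase(),
        // assumption: extensionless files depend only on the `-S`-style flag
        None => return scan_extensionless,
    };

    match exts {
        // only scan if the file has one of the specified extensions
        Some(exts) => exts.contains(&ext.as_str()),
        // no extensions specified: no reason not to scan this file
        None => true,
    }
}
```

Passing `None` accepts `notes.txt` and `photo.JPG` alike, while `Some(&["jpg", "png"])` accepts only the latter.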

View file

@ -1,11 +1,10 @@
//! [Clap] struct used to parse command line arguments.
use std::path::PathBuf;
use crate::extension_set::ExtensionSet;
use crate::string_type::String as StringType;
use cfg_if::cfg_if;
use clap::{AppSettings, Clap};
use smartstring::{LazyCompact, SmartString};
use std::path::PathBuf;
cfg_if! {
if #[cfg(windows)] {
@ -43,17 +42,11 @@ pub enum OutputFormat {
)]
pub struct Parameters {
/// Only examine files with these extensions (Comma-separated list)
#[clap(
short,
long,
use_delimiter = true,
require_delimiter = true,
required_unless_present = "ext-set"
)]
pub exts: Option<Vec<SmartString<LazyCompact>>>,
#[clap(short, long, use_delimiter = true, require_delimiter = true, group = "extensions")]
pub exts: Option<Vec<StringType>>,
/// Use a preset list of extensions as the search filter
#[clap(short = 'E', long, arg_enum, required_unless_present = "exts")]
#[clap(short = 'E', long, arg_enum, group = "extensions")]
pub ext_set: Option<ExtensionSet>,
/// Don't skip hidden files and directories
@ -68,6 +61,10 @@ pub struct Parameters {
#[clap(short, long, default_value = DEFAULT_FORMAT, arg_enum)]
pub output_format: OutputFormat,
/// Follow symlinks
#[clap(short, long)]
pub follow_symlinks: bool,
/// Directory to process
// TODO: right now this can only take a single directory - should this be improved?
#[clap(name = "DIR", default_value = ".", parse(from_os_str))]
@ -75,24 +72,27 @@ pub struct Parameters {
}
/// Further options relating to scanning.
#[derive(PartialEq, Debug)]
pub struct ScanOpts {
/// Whether hidden files and directories should be scanned.
pub hidden: bool,
/// Whether files without extensions should be scanned.
pub extensionless: bool,
/// Should symlinks be followed?
pub follow_symlinks: bool,
}
impl Parameters {
pub fn extensions(&self) -> Vec<&str> {
pub fn extensions(&self) -> Option<Vec<&str>> {
if let Some(exts) = &self.exts {
// extensions supplied like "-e png,jpg,jpeg"
exts.iter().map(|s| s.as_str()).collect()
Some(exts.iter().map(|s| s.as_str()).collect())
} else if let Some(exts) = &self.ext_set {
// extensions supplied like "-E images"
exts.extensions()
Some(exts.extensions())
} else {
// neither -E nor -e was passed - this should be impossible
unreachable!()
// neither -E nor -e was passed
None
}
}
@ -100,6 +100,7 @@ impl Parameters {
ScanOpts {
hidden: self.scan_hidden,
extensionless: self.scan_extensionless,
follow_symlinks: self.follow_symlinks,
}
}
}
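
The change from `Vec<&str>` to `Option<Vec<&str>>` in `extensions()` can be sketched without `clap`. Everything below is illustrative: `Params` and the two-variant `ExtensionSet` are hypothetical stand-ins for the real `Parameters` struct and extension-set enum, and the extension lists are made up for the example.

```rust
/// Hypothetical stand-in for the extension-set enum, reduced to two presets.
#[allow(dead_code)]
enum ExtensionSet {
    Images,
    Documents,
}

impl ExtensionSet {
    fn extensions(&self) -> Vec<&'static str> {
        match self {
            ExtensionSet::Images => vec!["jpg", "jpeg", "png"],
            ExtensionSet::Documents => vec!["pdf", "docx", "odt"],
        }
    }
}

/// Hypothetical stand-in for the parsed command-line parameters.
struct Params {
    exts: Option<Vec<String>>,
    ext_set: Option<ExtensionSet>,
}

impl Params {
    /// `Some(list)` when `-e` or `-E` was given, `None` when neither was.
    fn extensions(&self) -> Option<Vec<&str>> {
        if let Some(exts) = &self.exts {
            // extensions supplied like "-e png,jpg,jpeg"
            Some(exts.iter().map(String::as_str).collect())
        } else if let Some(set) = &self.ext_set {
            // extensions supplied like "-E images"
            Some(set.extensions())
        } else {
            // neither flag was passed: the caller scans every extension
            None
        }
    }
}
```

Returning `None` instead of hitting `unreachable!()` is what lets the caller treat "no filter" as a normal case rather than a parsing error.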

11
src/string_type.rs Normal file

@ -0,0 +1,11 @@
use cfg_if::cfg_if;
cfg_if! {
if #[cfg(not(all(target_endian = "big", target_pointer_width = "32")))] {
// most architectures
pub use smartstring::alias::String;
} else {
// powerpc and other big endian 32-bit archs
pub use std::string::String;
}
}
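
The target condition used in `string_type.rs` can also be evaluated with the built-in `cfg!` macro, which is handy for checking which branch a given target would take. A small sketch, assuming nothing beyond the standard library; `USES_STD_STRING` and `string_backend` are hypothetical names that do not appear in the commit.

```rust
/// `true` on big endian 32-bit targets (e.g. powerpc), where the diff
/// falls back to `std::string::String`.
pub const USES_STD_STRING: bool =
    cfg!(all(target_endian = "big", target_pointer_width = "32"));

/// Name of the string type the `cfg_if!` block would re-export on this target.
/// Illustrative only: it reports the choice, it does not make it.
pub fn string_backend() -> &'static str {
    if USES_STD_STRING {
        "std::string::String"
    } else {
        "smartstring::alias::String"
    }
}
```

Unlike `#[cfg]` or `cfg_if!`, `cfg!` compiles both branches, so it only suits cases like this one where neither branch refers to a crate that may be absent.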

View file

@ -1,12 +1,13 @@
use crate::inspectors::{mime_extension_lookup, BUF_SIZE};
use crate::{extension_from_path, init_db, scan_directory, scan_from_walkdir};
use crate::parameters::{Parameters, ScanOpts};
use crate::mime_db::MimeDb;
use crate::parameters::{Parameters, ScanOpts};
use crate::string_type::String;
use cfg_if::cfg_if;
use mime_guess::mime::{APPLICATION_OCTET_STREAM, APPLICATION_PDF, IMAGE_JPEG, IMAGE_PNG};
use mime_guess::Mime;
use smartstring::alias::String;
use std::borrow::Borrow;
use std::collections::HashMap;
use std::ffi::OsStr;
@ -100,11 +101,12 @@ fn simple_directory() {
let scan_opts = ScanOpts {
hidden: true,
extensionless: false,
follow_symlinks: false,
};
let entries = scan_directory(
&dir.path().to_path_buf(),
&["jpg", "jpeg", "png", "pdf", "zip"],
Some(&vec!["jpg", "jpeg", "png", "pdf", "zip"]),
&scan_opts,
)
.expect("Directory scan failed.");
@ -156,21 +158,51 @@ fn simple_directory() {
fn argument_parsing() {
use clap::Clap;
// check if "jpg" is in the list of extensions to be considered when passing "-E images"
let args: Parameters = Parameters::parse_from(vec!["fif", "-E", "images"]);
assert!(args.extensions().contains(&"jpg"));
// pass `-f`, which enables following symlinks, and `-E images`, which scans files with image extensions
let args: Parameters = Parameters::parse_from(vec!["fif", "-f", "-E", "images"]);
// check if "jpg" is in the list of extensions to be scanned
assert!(args
.extensions()
.expect("args.extensions() should contain the `images` set!")
.contains(&"jpg"));
// make sure "scan_hidden" is false
assert!(!args.scan_hidden);
// exts should be none
assert!(args.exts.is_none());
// get the ScanOpts, and make sure they match expectations
assert_eq!(
args.get_scan_opts(),
ScanOpts {
hidden: false,
extensionless: false,
follow_symlinks: true
}
)
}
#[test]
fn rejects_bad_args() {
use clap::Clap;
assert!(Parameters::try_parse_from(vec!["fif", "-abcdefg", "-E", "-e"]).is_err());
let tests = [
// Non-existent flags:
vec!["fif", "-abcdefghijklmnopqrstuvwxyz"],
// `-E` without specifying a set:
vec!["fif", "-E"],
// `-E` with an invalid set:
vec!["fif", "-E", "pebis"],
// `-E` and `-e`:
vec!["fif", "-E", "media", "-e", "jpg"],
// `-e` with nothing but commas:
vec!["fif", "-e", ",,,,,"],
];
for test in &tests {
assert!(Parameters::try_parse_from(test).is_err(), "Failed to reject {:?}", test);
}
}
#[test]