Commit Graph

9 Commits (e09fb30d34d1bab8ec51afa16745ea2d546f9355)

Author SHA1 Message Date
zeripath bc4764ffc6
Detect truncated utf-8 characters at the end of content as still representing utf-8 (#19773)
Our character detection algorithm can potentially incorrectly detect utf-8 as iso-8859-x
if there is a truncated character at the end of the partially read file.

This PR changes the detection algorithm to truncated utf8 characters at the end of the
buffer.

Fix #19743

Signed-off-by: Andrew Thornton <art27@cantab.net>
2022-05-21 14:06:24 +01:00
6543 54e9ee37a7
format with gofumpt (#18184)
* gofumpt -w -l .

* gofumpt -w -l -extra .

* Add linter

* manual fix

* change make fmt
2022-01-20 18:46:10 +01:00
Gusted ff2fd08228
Simplify parameter types (#18006)
Remove repeated type declarations in function definitions.
2021-12-20 04:41:31 +00:00
zeripath e429c1164e
Ensure that the detected charset order is set in chardet test (#12574)
TestToUTF8WithFallback is the cause of recurrent spurious test failures
even despite code to set the detected charset order.

The reason why this happens is because the preferred detected charset order
is not being initialised for these tests.

This PR simply ensures that this is set at the start of each test and would
allow different tests to be written to allow differing orders.

Replaces #12571
Close #12571

Signed-off-by: Andrew Thornton <art27@cantab.net>
2020-08-23 14:15:29 +01:00
zeripath a1ad188326
Fix chardet test and add ordering option (#11621)
* Fix chardet test and add ordering option

Signed-off-by: Andrew Thornton <art27@cantab.net>

* minor fixes

Signed-off-by: Andrew Thornton <art27@cantab.net>

* remove log

Signed-off-by: Andrew Thornton <art27@cantab.net>

* remove log2

Signed-off-by: Andrew Thornton <art27@cantab.net>

* only iterate through top results

Signed-off-by: Andrew Thornton <art27@cantab.net>

* Update docs/content/doc/advanced/config-cheat-sheet.en-us.md

* slight restructure of for loop

Signed-off-by: Andrew Thornton <art27@cantab.net>

Co-authored-by: techknowlogick <techknowlogick@gitea.io>
2020-06-02 19:20:19 -03:00
guillep2k 356e1a70ea Reduce test sensibility (#8393) 2019-10-07 01:49:14 -04:00
guillep2k 2628b15ee3 Fix utf8 tests (#8192)
* Prevent compiler environment from making the tests fail

* Remove unused function

* Pass lint
2019-09-21 13:01:34 -04:00
guillep2k 6097ff68e7 Make encoding tests independent of LOCALE settings (#8018)
* Make encoding tests independent of LOCALE settings

* Fix fmt

* Force CI to restart
2019-09-02 19:08:07 -04:00
guillep2k 5a44be627c Convert files to utf-8 for indexing (#7814)
* Convert files to utf-8 for indexing

* Move utf8 functions to modules/base

* Bump repoIndexerLatestVersion to 3

* Add tests for base/encoding.go

* Changes to pass gosimple

* Move UTF8 funcs into new modules/charset package
2019-08-15 20:07:28 +08:00