how to extract characters from a json/txt file?

pie · January 2016

Hi,
I am working on translating my game: I made a JSON file with all the strings I intend to use, so I can send it to friends who are native speakers of other languages.

Since every language has its own special characters (letters), I'd like to extract all the characters used in each language file into one string of unique characters per language (to cache them in TTFont.new()).

As an example, from the sentence:
"Both devices must be connected to the same local network"
I'd like to extract the string: "aBbcdehiklmnorstuvw"

I need to do this on every language file to get a "character set" for each language.

Maybe I am searching with the wrong keywords, since I can't find anything about this on the internet.

Do you have any suggestion on how I could do it?

Thank you

ar2rsawseen · January 2016

I think most people would simply add a separate JSON property containing a string of all the characters to cache.

You could also try to do that automatically: iterate through every string (or rather split it into an array and iterate over that, http://stackoverflow.com/a/832414)
and build one single table, with each character as a key and the value set to true, so in the end you would have a table like
{
a = true,
b = true,
c = true,
--etc
}

That would be a table with all the unique symbols, and then you just iterate over it and concatenate all the keys into a single string.

The only thing I'm afraid of is UTF encoding. When iterating over a string symbol by symbol, you would need to make sure you iterate not by bytes but by real symbols, based on the encoding. Maybe Lua does it automatically, maybe you need to do something additional, I don't know.
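On the encoding question: Lua's standard string functions (the `#` length operator, `string.sub`, `string.byte`) work on bytes, so a multi-byte UTF-8 character has to be matched as a unit. Here is a minimal sketch, assuming the input is valid UTF-8. `utf8Chars` is a hypothetical helper name, and the pattern is the one from http://stackoverflow.com/a/24196142:

```lua
-- Hypothetical helper: returns an array with one entry per UTF-8 character.
-- Assumes s is valid UTF-8.
local function utf8Chars(s)
	local chars = {}
	-- one lead byte (ASCII or 194-244) followed by any continuation bytes (128-191)
	for c in s:gmatch("[%z\1-\127\194-\244][\128-\191]*") do
		chars[#chars + 1] = c
	end
	return chars
end

-- "café": \195\169 is the two-byte UTF-8 encoding of "é"
local word = "caf\195\169"
print(#word)            -- 5, the length operator counts bytes
print(#utf8Chars(word)) -- 4, one match per character
```

The escapes are used only so the example does not depend on how the source file is saved; a literal "é" in a UTF-8 file should behave the same way.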

pie · January 2016

Here is what I did; of course, feel free to use it.
It seems to work with Latin characters and some other random graphemes I tried.

Could someone who knows languages that use different character sets try it, please?

Thank you

--[[extract characters from string ]]
local StrxtractedChars = ""
local tmpTabstr = {}
 
--test string
local str = "èèè test éà test òìù"
 
 
--pattern taken from http://stackoverflow.com/a/24196142
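--each match is one whole UTF-8 character (lead byte plus continuation bytes)
for c in str:gmatch("[%z\1-\127\194-\244][\128-\191]*") do
	--the table is used as a set: repeated characters collapse into one key
	tmpTabstr[c] = true
end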
 
for char in pairs(tmpTabstr) do
	StrxtractedChars = StrxtractedChars..char
end
 
print("CHARACTERS USED IN str:\n"..StrxtractedChars.."\nEND")
 
 
--[[ same thing on a txt file: it should be saved as UTF-8 without BOM (Notepad++ can do this) ]]
 
local extractedChars = ""
local tmpTab = {}
 
--assert() stops with io.open's error message if the file cannot be opened
local file = assert(io.open("testExtract.json", "r"))
 
for line in file:lines() do
	for c in line:gmatch("[%z\1-\127\194-\244][\128-\191]*") do
		--to exclude carriage return, { and }, uncomment the check below
		--if not(string.byte(c) == 13) and not(string.byte(c) == 123) and not(string.byte(c) == 125) then
			tmpTab[c] = true
		--end
	end
end
 
file:close()
 
for char in pairs(tmpTab) do
	extractedChars = extractedChars..char
end
 
print("CHARACTERS USED IN FILE:\n"..extractedChars.."\nEND")
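One thing to keep in mind: `pairs` visits keys in no guaranteed order, so the resulting string can come out differently between runs. If a stable result is wanted (for example to compare the character sets of two language files), the characters can be collected in a list and sorted before concatenating. Here is a minimal sketch, assuming valid UTF-8 input. `uniqueChars` is a hypothetical name, and the sort is a plain string comparison (byte order in the default C locale), not any language's alphabetical order:

```lua
-- Hypothetical helper: returns the unique UTF-8 characters of text, sorted.
local function uniqueChars(text)
	local seen, list = {}, {}
	for c in text:gmatch("[%z\1-\127\194-\244][\128-\191]*") do
		-- skip control characters such as "\r" and "\n" (first byte below 32)
		if not seen[c] and c:byte() >= 32 then
			seen[c] = true
			list[#list + 1] = c
		end
	end
	table.sort(list)          -- default string comparison
	return table.concat(list) -- cheaper than repeated .. in a loop
end

print(uniqueChars("Both devices must be connected to the same local network"))
```

To run it on a file, the whole content can be read with `file:read("*a")` and passed in. Note that the result for the sentence above also contains the space character, since it is used in the text.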
