how to extract characters from a json/txt file? — Gideros Forum

pie Member
edited January 2016 in General questions
Hi,
I am working on translating my game: I made a JSON file with all the strings I intend to use, so that I can send it to friends who are native speakers of other languages.

Since every language has its own special characters (letters), I'd like to extract all the characters used in each language file into a single string per language (to cache them with TTFont.new()).

As an example, from the sentence:
"Both devices must be connected to the same local network"
I'd like to extract the string: "aBbcdehiklmnorstuvw"

I need to do this on every language file to get a "character set" for each language.

Maybe I am searching with the wrong keywords, because I can't find anything about this on the internet.

Do you have any suggestions on how I could do it?

Thank you :)

Comments

  • ar2rsawseen Maintainer
    Accepted Answer
    I think most people would simply add a separate JSON property containing a string of all the characters to cache.

    You could also try to do it automatically: iterate through every string (or rather, split it into an array of characters and iterate over that, see http://stackoverflow.com/a/832414)
    and build one single table that holds each character as a key with its value set to true, so in the end you would have a table like
    {
    a = true,
    b = true,
    c = true,
    --etc
    }

    That gives you a table of all the unique symbols; then you just iterate over it and concatenate all the keys into a single string.
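
A minimal sketch of the table-as-set idea described above. It assumes plain ASCII input, so stepping through the string one byte at a time is safe; `uniqueChars` and `seen` are illustrative names, not part of Gideros.

```lua
-- Collect the unique characters of a string, using a table as a set.
-- Assumption: the input is ASCII only (one byte per character).
local function uniqueChars(s)
    local seen = {}
    for i = 1, #s do
        seen[s:sub(i, i)] = true   -- character as key, true as value
    end

    -- Copy the keys into an array so they can be sorted and joined.
    local keys = {}
    for ch in pairs(seen) do
        keys[#keys + 1] = ch
    end
    table.sort(keys)               -- pairs() order is unspecified, so sort
    return table.concat(keys)
end

print(uniqueChars("Both devices must be connected to the same local network"))
```

Sorting is optional; it only makes the result reproducible between runs, since `pairs` does not guarantee any particular order.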

    The only thing I'm worried about is UTF encoding. When iterating over a string symbol by symbol, you need to make sure you iterate not by bytes but by real symbols, based on the encoding. Maybe Lua does it automatically, maybe you need to do something additional, I don't know.
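
To illustrate the concern: Lua strings are plain byte sequences, so `#s` and `s:sub(i, i)` work on bytes, not on characters. The sketch below assumes the text is valid UTF-8 and counts characters by matching one lead byte plus any continuation bytes that follow it. `utf8Length` is a hypothetical helper name, and the accented letters are written as byte escapes so the example does not depend on how the file is saved.

```lua
-- "è" (U+00E8) is encoded in UTF-8 as the two bytes 195 168.
local eGrave = "\195\168"
print(#eGrave)   -- length in bytes, not in characters

-- Count UTF-8 characters: one lead byte followed by its continuation bytes.
-- Assumes valid UTF-8; the NUL byte is deliberately left out of the pattern.
local function utf8Length(str)
    local n = 0
    for _ in str:gmatch("[\1-\127\194-\244][\128-\191]*") do
        n = n + 1
    end
    return n
end

-- "èà test": 9 bytes, but fewer characters
local sample = "\195\168\195\160 test"
print(#sample, utf8Length(sample))
```

So a byte-by-byte loop would split each accented letter into two meaningless halves, which is why a character-aware pattern is needed for non-ASCII text.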

  • pie Member
    edited January 2016
    Here is what I did; feel free to use it.
    It seems to work with Latin characters and a few other random graphemes I tried.

    Could someone who knows languages that use different character sets please try it?

    Thank you :)
    --[[extract characters from string ]]
    local StrxtractedChars = ""
    local tmpTabstr = {}
     
    --test string
    local str = "èèè test éà test òìù"
     
     
    --taken from http://stackoverflow.com/a/24196142
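    --each match is one UTF-8 character: a lead byte plus its continuation bytes
    for c in str:gmatch("[%z\1-\127\194-\244][\128-\191]*") do
        --use the character as a key so duplicates collapse into one entry
        tmpTabstr[c] = true
    end
     
    --join all unique characters (pairs() returns them in no particular order)
    for char in pairs(tmpTabstr) do
        StrxtractedChars = StrxtractedChars..char
    end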
     
    print("CHARACTERS USED IN str:\n"..StrxtractedChars.."\nEND")
     
     
    --[[ same thing on txt file: it should be saved as UTF-8 without BOM. notepad++ does this ]]
     
    local extractedChars = ""
    local tmpTab = {}
     
    --assert() stops with the error message if the file cannot be opened
    local file = assert(io.open("testExtract.json", "r"))
     
    for line in file:lines() do
        for c in line:gmatch("[%z\1-\127\194-\244][\128-\191]*") do
            --to exclude carriage return, { and }, wrap the assignment in this check:
            --if not(string.byte(c) == 13) and not(string.byte(c) == 123) and not(string.byte(c) == 125) then
            tmpTab[c] = true
            --end
        end
    end
     
    file:close()
     
    for char in pairs(tmpTab) do
        extractedChars = extractedChars..char
    end
     
    print("CHARACTERS USED IN FILE:\n"..extractedChars.."\nEND")
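
One possible refinement of the script above, sketched under a few assumptions: the input is valid UTF-8, control characters such as carriage return and newline should not end up in the character set, and a reproducible result is wanted. `collectChars` and `UTF8_CHAR` are illustrative names, and the list of strings stands in for whatever the decoded language file provides.

```lua
-- Sketch: gather the unique UTF-8 characters from a list of strings.
-- Assumes valid UTF-8 input; NUL is left out of the pattern on purpose.
local UTF8_CHAR = "[\1-\127\194-\244][\128-\191]*"

local function collectChars(strings)
    local seen = {}
    for _, s in ipairs(strings) do
        for ch in s:gmatch(UTF8_CHAR) do
            -- skip ASCII control characters such as "\r" and "\n"
            if ch:byte() >= 32 then
                seen[ch] = true
            end
        end
    end

    -- Move the keys into an array, sort them, and join them once.
    local chars = {}
    for ch in pairs(seen) do
        chars[#chars + 1] = ch
    end
    table.sort(chars)
    return table.concat(chars)
end

-- "\195\168" is "è" written as UTF-8 byte escapes
local charset = collectChars({ "test \195\168", "t\195\168st\r\n" })
print(charset)
```

Using `table.concat` avoids building a new intermediate string on every `..`, and sorting makes the output identical from run to run. The resulting string is what would then be handed to the font as the set of characters to cache, as described in the original question.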