一个关于base64编码的坑

2016年07月11日

base64 encoding, as known, is available to transmit data from binary, into (most commonly) ASCII characters so that it’s more easily transmitted in things like e-mail and HTML form data. Due to the likeliness that the recieving end can handle ASCII, it makes it a nice way to transfer binary data via a text stream. Your binary data could be screwed up because the underlying protocol might think that you’ve entered a special character combination. If your situation can handle native binary data, that will most likely yield better results, in terms of speed and such, but if not, base64 is most likely the way to go. JSON is a great example of when you would benefit from this, or when it needs to be stored in a text field somewhere.

Problem description:

If we received a base64 encoded ASCII characters, I want to base64 decode it on my recieving end.

IntcbiAgXCJzdWJcIjogXCIxMjM0NTY3ODkwXCIsXG4gIFwibmFtZVwiOiBcIk1hcnMgTWFcIixcbiAgXCJhZG1pblwiOiB0cnVlXG59Ig

Decoding the string:

$ base64 -d
IntcbiAgXCJzdWJcIjogXCIxMjM0NTY3ODkwXCIsXG4gIFwibmFtZVwiOiBcIk1hcnMgTWFcIixcbiAgXCJhZG1pblwiOiB0cnVlXG59Ig
Ctrl+D "{\n  \"sub\": \"1234567890\",\n  \"name\": \"Mars Ma\",\n  \"admin\": true\n}"base64: invalid input

Test exmaple for golang:

package main

import (
	"fmt"
	"encoding/base64"
)

func main() {
	str := "IntcbiAgXCJzdWJcIjogXCIxMjM0NTY3ODkwXCIsXG4gIFwibmFtZVwiOiBcIk1hcnMgTWFcIixcbiAgXCJhZG1pblwiOiB0cnVlXG59Ig"
	data, err := base64.URLEncoding.DecodeString(str)
	if err != nil {
		fmt.Println(err)
	}
	fmt.Printf("%q\n", data)
}

$ go run base64_demo.go 
illegal base64 data at input byte 104
"\"{\\n  \\\"sub\\\": \\\"1234567890\\\",\\n  \\\"name\\\": \\\"Mars Ma\\\",\\n  \\\"admin\\\": true\\n}"

About padding

The == sequence indicates that the last group contained only one byte, and = indicates that it contained two bytes. The example below illustrates how truncating the input of the whole of the above quote changes the output padding:

Input Length Output Length Padding
any carnal pleasure. 20 YW55IGNhcm5hbCBwbGVhc3VyZS4= 28 1
any carnal pleasure 19 YW55IGNhcm5hbCBwbGVhc3VyZQ== 28 2
any carnal pleasur 18 YW55IGNhcm5hbCBwbGVhc3Vy 24 0
any carnal pleasu 17 YW55IGNhcm5hbCBwbGVhc3U= 24 1
any carnal pleas 16 YW55IGNhcm5hbCBwbGVhcw== 24 2

Decoding Base64 with padding

When decoding Base64 text, four characters are typically converted back to three bytes. The only exceptions are when padding characters exist. A single ‘=’ indicates that the four characters will decode to only two bytes, while ‘==’ indicates that the four characters will decode to only a single byte.

$ base64 -d
YW55IGNhcm5hbCBwbGVhc3VyZQ==
any carnal pleasure
$ base64 -d
YW55IGNhcm5hbCBwbGVhc3Vy    
any carnal pleasur

Problem Solution

Our receiving decoded string, is base64URL encoding. URLEncoding is the alternate base64 encoding( StdEncoding ) defined in RFC 4648. It is typically used in URLs and file names. There is another encoding for the unpadded alternate base64 encoding defined with RawURLEncoding. StdEncoding is same, and also have its Raw encoding. Raw encoding will omit padding characters, like newline character.

Solution for golang

package main

import (
	"fmt"
	"encoding/base64"
)

func main() {
	str := "IntcbiAgXCJzdWJcIjogXCIxMjM0NTY3ODkwXCIsXG4gIFwibmFtZVwiOiBcIk1hcnMgTWFcIixcbiAgXCJhZG1pblwiOiB0cnVlXG59Ig"
	data, err := base64.RawURLEncoding.DecodeString(str)
	if err != nil {
		fmt.Println(err)
	}
	fmt.Printf("%q\n", data)
}

$ go run base64_demo.go 
"\"{\\n  \\\"sub\\\": \\\"1234567890\\\",\\n  \\\"name\\\": \\\"Mars Ma\\\",\\n  \\\"admin\\\": true\\n}"

Solution for Lua

For Lua language, the lib base64 cannot also address this problem, but baseed on base64, we can make some adjustment:

-- base64 decode with padding
local function b64_decode(input)
  local reminder = #input % 4

  if reminder > 0 then
    local padlen = 4 - reminder
    input = input..string.rep('=', padlen)
  end

  input = input:gsub("-", "+"):gsub("_", "/")
  return base64.decode(input)
end

or

-- character table string
local b='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

-- decoding
local function b64_decode(data)
    data = string.gsub(data, '[^'..b..'=]', '')
    return (data:gsub('.', function(x)
        if (x == '=') then return '' end
        local r,f='',(b:find(x)-1)
        for i=6,1,-1 do r=r..(f%2^i-f%2^(i-1)>0 and '1' or '0') end
        return r;
    end):gsub('%d%d%d?%d?%d?%d?%d?%d?', function(x)
        if (#x ~= 8) then return '' end
        local c=0
        for i=1,8 do c=c+(x:sub(i,i)=='1' and 2^(8-i) or 0) end
        return string.char(c)
    end))
end