2012-10-02

  Cloud Tip FS – Create a POSIX-compliant encrypted filesystem on every “storage cloud”

Problem description / Motivation

There are more and more providers for cheap or even free “online storage”, “cloud storage”, “internet hard disks” etc. But all of them have most of these problems:

Solution

Cloud Tip FS is a FUSE-based file system that provides a fully POSIX-compliant file system on the client side. The data and metadata are stored in “chunks”. Each chunk is an ordinary encrypted file in a separate directory of your choice, called the “cloud directory”. This directory can be synchronized with any online storage you like, either manually or automatically with the provider's proprietary client software, depending on the storage provider you use.

For some storage providers there are already FUSE-based clients, so the cloud directory doesn't consume local storage space at all. :-)

Example

Imagine a directory like this:
~/down/linux-3.0.0$ ls
COPYING        Kbuild       Makefile        arch    drivers   include  kernel  net      security  usr
CREDITS        Kconfig      README          block   firmware  init     lib     samples  sound     virt
Documentation  MAINTAINERS  REPORTING-BUGS  crypto  fs        ipc      mm      scripts  tools
~/down/linux-3.0.0$ du -h --max-depth=0 .
489M    .
~/down/linux-3.0.0$ _
The content of this directory is stored in ~/cloud/ and if we list it we'll see only something like this:
~/cloud$ ls
baossoyckao511702305.jpg    giegnuyquyo76065818.jpg     lossyumue1884661237.jpg     raortyablio424238335.jpg
beoschooni392035568.jpg     giowrirrio498777856.jpg     lyellioffyu760313750.jpg    raullyiblyu135497281.jpg
beugribrue1973594324.jpg    gonnyizyo160051528.jpg      lyemmitzio2130794395.jpg    rauwrauffyo1335354340.jpg
buisiartie150122846.jpg     gookniowyu1548233367.jpg    lyhuissua1610120709.jpg     reiggoammua1411549676.jpg
buorneoprie112805732.jpg    goriukrua982906996.jpg      lyuffaibli1967513926.jpg    reiquiutrao1469348094.jpg
buusuyxue749241873.jpg      guesauschua945117276.jpg    maartychie1529195746.jpg    riydraartio1703964683.jpg
byidrydrie1376710097.jpg    gyivuiddua1476153275.jpg    maatsyknau1264095060.jpg    rooproissao1373226340.jpg
byochiochio1037127828.jpg   gyiweaxao1059961393.jpg     maipraernue155324914.jpg    ruohaephau1067854538.jpg
byudritte859484421.jpg      gypreegnio660260756.jpg     mauvuinyu1622597488.jpg     ruomiaphua805750846.jpg
byustriabe294702567.jpg     haetiwe628175011.jpg        meblyispryo1365180540.jpg   ruopreoplue1186452551.jpg
caatzeopri846930886.jpg     halbeuffie1432114613.jpg    meuphiymua1141616124.jpg    ryapluastrau2112255763.jpg
caawryittio356426808.jpg    haotraatti1359512183.jpg    meyschutzio2084420925.jpg   ryolbaospie327254586.jpg
caifreeknua1409959708.jpg   hassiotrua610515434.jpg     miaboequio2040332871.jpg    ryrtaapua752392754.jpg
caovoiwio2077486715.jpg     hiuchaushau1782436840.jpg   miatzuallyo269455306.jpg    saikeilbe352118606.jpg
caypeodyu1653377373.jpg     hiyduordyo1194953865.jpg    mibaostrue1350573793.jpg    saimmerie608413784.jpg
[…]
~/cloud$ _
You can see the chunks, but their content is encrypted (Twofish or AES, at your choice), so neither the storage provider nor any third party will have a clue what is stored in this directory. :-)

There are post-processors that

all the chunks at your choice. The encoding is especially handy if your storage provider accepts only certain file formats. The generated “images” are valid, but contain only encrypted noise.

The interface of the post-processors is documented and extensible so you can add your own obfuscator in the chain. :-)

Special features

Architecture

Cloudtipfs is written in C++ and has a modular architecture. Currently there are 3 main types of modules. They represent the most common activities you usually do on a file system:

Main Filesystem Modules

There are 3 module types that implement the 3 main filesystem functions:
Directory Lister
Returns a list of directory entries. Each entry consists of a file name.
File Stat Module
Returns the metadata for a file name, like the struct stat that is filled in by the stat(2) system call.
File Data Module
Returns the data blocks of a given file.
For each main module type there is a “pass through” implementation that just relays the file names/metadata/file data from/to the real file system. That allows 8 different types of cloud-tip filesystems:
   Module type
#  DirLister     File Stat     File Data     Filesystem examples
0  pass through  pass through  pass through  simple mirror fs/relay fs
1  pass through  pass through  modify        transparent compression/conversion/encryption
2  pass through  modify        pass through  fake UID, read-only permissions
3  pass through  modify        modify
4  modify        pass through  pass through  case conversion/conversion of charsets/escaping of special characters in file names (i.e. on FAT)
5  modify        pass through  modify        compression/format conversion etc. with change of file name extensions
6  modify        modify        pass through  POSIX-on-FAT (like UMSDOS did)
7  modify        modify        modify        all the crazy stuff you can think of! :-)
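
The numbering in the table reads like a 3-bit mask with one bit per main module type. The following minimal sketch reproduces that numbering; the names Mode and typeNumber are hypothetical and only serve as illustration, they are not part of Cloud Tip FS.

```cpp
// Hypothetical helper, not part of Cloud Tip FS: derives the "#" column
// of the table from the three module choices.
enum Mode { PassThrough = 0, Modify = 1 };

// DirLister is the most significant bit and File Data the least
// significant one, which yields the row numbers 0..7 of the table.
inline unsigned typeNumber(Mode dirLister, Mode fileStat, Mode fileData)
{
	return (unsigned(dirLister) << 2) | (unsigned(fileStat) << 1) | unsigned(fileData);
}
```

For example, typeNumber(Modify, Modify, PassThrough) gives 6, the POSIX-on-FAT row.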

Auxiliary Modules

These modules provide additional functions that are needed by the main modules.
Block Number Encoder
Encodes a 64-bit number into a filesystem-safe name.
Strings to DataBlock
Encodes a std::vector<std::string> into a block that can be written into a file and vice versa.
Hash
Hashes a file name (or any other string) plus an optional generation value into a uint64_t hash value.
Block Converter
Converts a block of data into another one. This is the place where data compression and encryption are done.

Module Interfaces

Modules can be compiled in or built as a shared library that is loaded at runtime if necessary. All modules implement this interface:
#ifndef CTFS_MODULE_HH
#define CTFS_MODULE_HH

#include <string>

namespace ctfs
{
	class Module
	{
	public:
		virtual ~Module() {}
		
		// returns a name unique for each module class
		virtual std::string getName() const =0;
	};
	
} // end of namespace ctfs

extern "C"
{
	 // returns a short, human-readable name of the module
	std::string  ctfs_getModuleName();
	
	 // returns a description of the module parameters
	std::string  ctfs_getParameterHelp(const std::string& language="en");
	
	// returns a 'new'-allocated instance 
	ctfs::Module*  ctfs_createModule(const std::string& params);
}

#endif // CTFS_MODULE_HH
Each module category has its own characteristic set of methods. This allows “super modules” that implement more than one module category.
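
A “super module” could look roughly like the following sketch, which implements the Hash and NumberEnc categories in one class. The interfaces are repeated from this page so the example compiles on its own; the class DemoSuperModule and its trivial placeholder algorithms are assumptions made for illustration only.

```cpp
#include <stdint.h>
#include <string>

namespace ctfs
{
	// Interfaces repeated from this document so the sketch compiles on its own.
	class Module
	{
	public:
		virtual ~Module() {}
		virtual std::string getName() const =0;
	};

	class Hash : public virtual Module
	{
	public:
		virtual uint64_t hash(const std::string& s, unsigned generation=0) const =0;
	};

	class NumberEnc : public virtual Module
	{
	public:
		virtual std::string number2string(uint64_t number, unsigned generation) const =0;
	};
}

// Hypothetical "super module": one class, two module categories.
// Thanks to the virtual inheritance there is only one Module base object.
class DemoSuperModule : public ctfs::Hash, public ctfs::NumberEnc
{
public:
	virtual std::string getName() const { return "demo-super"; }

	// placeholder: string length plus generation, NOT a real hash
	virtual uint64_t hash(const std::string& s, unsigned generation=0) const
	{
		return uint64_t(s.size()) + generation;
	}

	// placeholder: plain decimal digits, the generation is ignored
	virtual std::string number2string(uint64_t number, unsigned) const
	{
		std::string out;
		do
		{
			out.insert(out.begin(), char('0' + number % 10));
			number /= 10;
		} while(number != 0);
		return out;
	}
};
```

The same object can then be handed out as a ctfs::Hash& or as a ctfs::NumberEnc&, depending on which category the caller needs.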

Main Filesystem Modules

Directory Lister

#ifndef CTFS_DIRLIST_HH
#define CTFS_DIRLIST_HH

#include <ctfs/Module.hh>
#include <cstddef> // for size_t

namespace ctfs
{
	// Helper class to allow iteration over directory entries
	// without knowing anything about its internal representation
	class Directory
	{
	public:
		virtual ~Directory() {}
		
		virtual size_t         size() const =0;
		virtual std::string getNext()       =0;
		virtual bool        hasNext() const =0;
	};

	class DirLister : public virtual Module
	{
	public:
		// returns a Directory that might be re-used by next
		// call to list()
		virtual Directory& list(std::string& path) =0;
	};
}

#endif //  CTFS_DIRLIST_HH
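
As an illustration of the “pass through” case, a Directory Lister might simply relay the names found by readdir(3). This is only a sketch: the interfaces are repeated from this page so it compiles on its own, and VectorDirectory and PassThroughDirLister are hypothetical names. Skipping “.” and “..” is a choice made for this example, not something the interface prescribes.

```cpp
#include <dirent.h>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

namespace ctfs
{
	// Interfaces repeated from this document so the sketch compiles on its own.
	class Module
	{
	public:
		virtual ~Module() {}
		virtual std::string getName() const =0;
	};

	class Directory
	{
	public:
		virtual ~Directory() {}
		virtual size_t      size() const    =0;
		virtual std::string getNext()       =0;
		virtual bool        hasNext() const =0;
	};

	class DirLister : public virtual Module
	{
	public:
		virtual Directory& list(std::string& path) =0;
	};
}

// Hypothetical Directory that keeps the entry names in a std::vector.
class VectorDirectory : public ctfs::Directory
{
public:
	VectorDirectory() : pos(0) {}
	virtual size_t      size() const    { return names.size(); }
	virtual std::string getNext()       { return names.at(pos++); }
	virtual bool        hasNext() const { return pos < names.size(); }

	std::vector<std::string> names;
	size_t pos;
};

// Hypothetical pass-through lister: relays the names of the real directory.
class PassThroughDirLister : public ctfs::DirLister
{
public:
	virtual std::string getName() const { return "passthrough-dirlister"; }

	virtual ctfs::Directory& list(std::string& path)
	{
		DIR* d = opendir(path.c_str());
		if(!d) throw std::runtime_error("cannot open directory " + path);

		// the same Directory object is re-used by every call to list()
		dir.names.clear();
		dir.pos = 0;
		while(dirent* e = readdir(d))
		{
			const std::string name = e->d_name;
			if(name != "." && name != "..") dir.names.push_back(name);
		}
		closedir(d);
		return dir;
	}

private:
	VectorDirectory dir;
};
```

Because list() returns a reference to a re-used object, a caller has to finish iterating before calling list() again.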

File Stat Module

#ifndef CTFS_FILESTAT_HH
#define CTFS_FILESTAT_HH

#include <ctfs/Module.hh>
#include <sys/stat.h> // for struct stat

namespace ctfs
{
	class FileStat : public virtual Module
	{
	public:
		// see man page of stat(2)
		virtual struct stat stat(std::string& path) =0;
	};
}

#endif // CTFS_FILESTAT_HH
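
A “modify” variant of the File Stat Module, such as the read-only permissions mentioned in the table of filesystem types, might be sketched as follows. The interface is repeated from this page so the example compiles on its own; ReadOnlyFileStat is a hypothetical name, and throwing an exception on failure is an assumption, since the interface does not say how errors are reported.

```cpp
#include <sys/stat.h>
#include <cerrno>
#include <cstring>
#include <stdexcept>
#include <string>

namespace ctfs
{
	// Interfaces repeated from this document so the sketch compiles on its own.
	class Module
	{
	public:
		virtual ~Module() {}
		virtual std::string getName() const =0;
	};

	class FileStat : public virtual Module
	{
	public:
		virtual struct stat stat(std::string& path) =0;
	};
}

// Hypothetical module: asks the real file system, then clears all write bits.
class ReadOnlyFileStat : public ctfs::FileStat
{
public:
	virtual std::string getName() const { return "readonly-filestat"; }

	virtual struct stat stat(std::string& path)
	{
		struct stat st;
		// '::stat' is the system call, not this member function
		if(::stat(path.c_str(), &st) != 0)
			throw std::runtime_error(path + ": " + std::strerror(errno));

		// report the file as read-only for user, group and others
		st.st_mode &= ~mode_t(S_IWUSR | S_IWGRP | S_IWOTH);
		return st;
	}
};
```

A pure pass-through implementation would be the same code without the line that modifies st_mode.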

File Data Module

#ifndef CTFS_FILEDATA_HH
#define CTFS_FILEDATA_HH

#include <ctfs/Module.hh>
#include <sys/types.h> // for size_t, off_t

namespace ctfs
{
	class File
	{
	public:
		// SHALL flush all buffers and close the file
		virtual ~File() {}
		virtual void read (      char* buf, size_t size, off_t offset) =0;
		virtual void write(const char* buf, size_t size, off_t offset) =0;
	};

	class FileData : public virtual Module
	{
	public:
		// return a 'new'-allocated object, derived from File
		virtual File* open(std::string& path) =0;
	};
}

#endif // CTFS_FILEDATA_HH
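
A pass-through File Data Module could be built on pread(2) and pwrite(2), roughly as in this sketch. The interfaces are repeated from this page so it compiles on its own; PassThroughFile and PassThroughFileData are hypothetical names. Since read() and write() return void, the sketch assumes that short reads and errors are reported by exceptions.

```cpp
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>
#include <stdexcept>
#include <string>

namespace ctfs
{
	// Interfaces repeated from this document so the sketch compiles on its own.
	class Module
	{
	public:
		virtual ~Module() {}
		virtual std::string getName() const =0;
	};

	class File
	{
	public:
		virtual ~File() {}
		virtual void read (      char* buf, size_t size, off_t offset) =0;
		virtual void write(const char* buf, size_t size, off_t offset) =0;
	};

	class FileData : public virtual Module
	{
	public:
		virtual File* open(std::string& path) =0;
	};
}

// Hypothetical File that relays all requests to a real, existing file.
class PassThroughFile : public ctfs::File
{
public:
	explicit PassThroughFile(const std::string& path)
	: fd(::open(path.c_str(), O_RDWR))
	{
		if(fd < 0) throw std::runtime_error("cannot open " + path);
	}

	virtual ~PassThroughFile() { ::close(fd); }

	virtual void read(char* buf, size_t size, off_t offset)
	{
		// loop, because pread() may deliver fewer bytes than requested
		while(size > 0)
		{
			const ssize_t n = ::pread(fd, buf, size, offset);
			if(n <= 0) throw std::runtime_error("read failed or reached end of file");
			buf += n;  size -= size_t(n);  offset += n;
		}
	}

	virtual void write(const char* buf, size_t size, off_t offset)
	{
		while(size > 0)
		{
			const ssize_t n = ::pwrite(fd, buf, size, offset);
			if(n <= 0) throw std::runtime_error("write failed");
			buf += n;  size -= size_t(n);  offset += n;
		}
	}

private:
	int fd;
	PassThroughFile(const PassThroughFile&);            // not copyable
	PassThroughFile& operator=(const PassThroughFile&); // not assignable
};

class PassThroughFileData : public ctfs::FileData
{
public:
	virtual std::string getName() const { return "passthrough-filedata"; }

	// returns a 'new'-allocated object; the caller has to delete it
	virtual ctfs::File* open(std::string& path) { return new PassThroughFile(path); }
};
```

A “modify” module would sit at the same place and run each block through a Block Converter before relaying it.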

Auxiliary Modules

Block Number Encoder Module

#ifndef CTFS_NUMBERENC_HH
#define CTFS_NUMBERENC_HH

#include <ctfs/Module.hh>
#include <stdint.h> // for uint64_t

namespace ctfs
{
	class NumberEnc : public virtual Module
	{
	public:
		// returns a string representation of 'number'.
		// The length of the string increases with each generation, and the
		// string must be unique for generation ~0u.
		virtual std::string number2string(uint64_t number, unsigned generation) const =0;
	};
}

#endif // CTFS_NUMBERENC_HH
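
One possible reading of this interface is sketched below: generation g yields the lowest g+1 hexadecimal digits of the number, capped at 16 digits, so the result covers all 64 bits and is unique for generation ~0u. The interface is repeated from this page so the example compiles on its own; HexNumberEnc and the hex scheme are assumptions for illustration, not the encoding actually used by Cloud Tip FS.

```cpp
#include <stdint.h>
#include <string>

namespace ctfs
{
	// Interfaces repeated from this document so the sketch compiles on its own.
	class Module
	{
	public:
		virtual ~Module() {}
		virtual std::string getName() const =0;
	};

	class NumberEnc : public virtual Module
	{
	public:
		virtual std::string number2string(uint64_t number, unsigned generation) const =0;
	};
}

// Hypothetical encoder: lowercase hex digits are safe on every file system.
class HexNumberEnc : public ctfs::NumberEnc
{
public:
	virtual std::string getName() const { return "hex-numberenc"; }

	virtual std::string number2string(uint64_t number, unsigned generation) const
	{
		static const char digits[] = "0123456789abcdef";

		// generation 0 -> 1 digit, 1 -> 2 digits, ..., capped at 16 digits
		const unsigned len = (generation < 15u) ? generation + 1u : 16u;

		std::string out(len, '0');
		for(unsigned i = 0; i < len; ++i)
		{
			out[len - 1 - i] = digits[number & 0xF]; // lowest digit goes last
			number >>= 4;
		}
		return out;
	}
};
```

Short names from low generations can collide, so a caller would presumably retry with a higher generation until the name is free.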

Strings2Block


Hashes

#ifndef CTFS_HASH_HH
#define CTFS_HASH_HH

#include <ctfs/Module.hh>
#include <stdint.h> // for uint64_t

namespace ctfs
{
	class Hash : public virtual Module
	{
	public:
		// returns a 64-bit hash from string + generation
		virtual uint64_t hash(const std::string& s, unsigned generation=0) const =0;
	};
}

#endif // CTFS_HASH_HH
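
A Hash module could, for example, be based on the 64-bit FNV-1a hash, with the generation value mixed in before the string. This is a sketch under that assumption; the page does not say which hash function Cloud Tip FS uses, and Fnv1aHash is a hypothetical name. The interface is repeated so the example compiles on its own.

```cpp
#include <stdint.h>
#include <cstddef>
#include <string>

namespace ctfs
{
	// Interfaces repeated from this document so the sketch compiles on its own.
	class Module
	{
	public:
		virtual ~Module() {}
		virtual std::string getName() const =0;
	};

	class Hash : public virtual Module
	{
	public:
		virtual uint64_t hash(const std::string& s, unsigned generation=0) const =0;
	};
}

// Hypothetical module using 64-bit FNV-1a (a non-cryptographic hash).
class Fnv1aHash : public ctfs::Hash
{
public:
	virtual std::string getName() const { return "fnv1a-hash"; }

	virtual uint64_t hash(const std::string& s, unsigned generation=0) const
	{
		uint64_t h = 0xcbf29ce484222325ULL;        // FNV-1a offset basis
		const uint64_t prime = 0x100000001b3ULL;   // FNV-1a 64-bit prime

		// mix in the generation first, one byte at a time
		for(size_t i = 0; i < sizeof(generation); ++i)
		{
			h ^= (generation >> (8 * i)) & 0xFFu;
			h *= prime;
		}

		// then the bytes of the string
		for(size_t i = 0; i < s.size(); ++i)
		{
			h ^= static_cast<unsigned char>(s[i]);
			h *= prime;
		}
		return h;
	}
};
```

With this scheme the same file name hashes to a different value in every generation, which is what makes a retry after a collision useful.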

Download

Currently (2012-10-01) I'm rewriting the software to the modular architecture described above. I'll add new download links here, soon. Please be patient…


© 2012-2014 by Lars H. Rohwedder