Overview
- Network File System
- Shared between instances
- NFS v4.0/4.1 compatible
- Alternative standard is CIFS (SMB extension) on Windows
- Both are "on-the-wire" protocols (i.e. data serialization)
- Distributed across many servers
- Aggregate I/O throughput (10 GB/s+)
- Size scales to 1 PB+
Filesystem (max 10 per AWS account)
- uses fopen/fclose/fwrite (POSIX compliant)
- shared access to files
- modifications in situ ("in place")
- Alternatives
- EBS which is block store (lower level)
- S3 object store (uses GETs/PUTs) - (higher level)
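The "in place" point above can be illustrated with a minimal sketch: on a POSIX file system a few bytes inside a file can be overwritten without rewriting the whole object, which a GET/PUT object store does not offer. The helper name `patch_in_place` and the demo file are hypothetical, and the demo runs on a local temporary directory; on EFS the path would simply sit under the mount point.

```python
import os
import tempfile


def patch_in_place(path: str, offset: int, data: bytes) -> None:
    """Overwrite len(data) bytes at `offset`, leaving the rest of the file untouched."""
    with open(path, "r+b") as f:  # "r+b" opens for read/write without truncating
        f.seek(offset)
        f.write(data)


# Demo on a temporary local file (stand-in for a file under an EFS mount point).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "record.bin")
    with open(demo_path, "wb") as f:
        f.write(b"hello world")
    patch_in_place(demo_path, 6, b"WORLD")
    with open(demo_path, "rb") as f:
        patched = f.read()
```

With S3 the same change would mean downloading and re-uploading the whole object.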
File ingest
- EFS File Sync
- Tool to copy data in parallel
- Up to 5x faster than standard Linux tools
- Supported by AWS console
- Command line tools
- rsync - single-threaded, very chatty
- cp - single-threaded, faster than rsync
- GNU parallel - shell tool to run command in parallel
- mcp - multi-threaded, drop-in replacement for cp, developed by NASA
- fpart (fpsync) - partitions the file tree and runs multiple rsync jobs in parallel
- s3cp + GNU parallel (to copy S3 -> EFS)
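The tools above all rely on the same idea: since a single copy stream is latency-bound, run many copies at once. A rough sketch of that idea with a thread pool follows. It is not how EFS File Sync, mcp or fpart are implemented; `parallel_copy` and the worker count of 8 are assumptions chosen for illustration, and the demo uses local temporary directories.

```python
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor


def parallel_copy(src_dir: str, dst_dir: str, workers: int = 8) -> int:
    """Copy every file under src_dir into dst_dir using a thread pool; return the file count."""
    jobs = []
    for root, _dirs, files in os.walk(src_dir):
        rel = os.path.relpath(root, src_dir)
        target_root = os.path.normpath(os.path.join(dst_dir, rel))
        os.makedirs(target_root, exist_ok=True)  # create directories before copying
        for name in files:
            jobs.append((os.path.join(root, name), os.path.join(target_root, name)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for all copies and re-raises the first error, if any
        list(pool.map(lambda job: shutil.copy2(job[0], job[1]), jobs))
    return len(jobs)


# Demo: copy a small tree between two temporary directories.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    os.makedirs(os.path.join(src, "sub"))
    for rel_path in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(src, rel_path), "w") as f:
            f.write(rel_path)
    copied = parallel_copy(src, dst)
    with open(os.path.join(dst, "sub", "b.txt")) as f:
        copied_content = f.read()
```

Threads are enough here because the work is I/O-bound; the right worker count depends on file sizes and instance type.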
Mount Target
- Endpoint for connecting to EFS
- Each AZ has its own endpoint
- When an AZ has multiple subnets, the mount target can be placed in any one of them
- IP Address assigned from the subnet
- DNS assigned automatically
- Avoids inter-AZ traffic (paid)
- Has Security Group
- Mounting
- Manually: mount -t nfs4 <mount-target-DNS>:/ <mount-point>
- On reboot: fstab (nfs defaults auto 0 0)
- On launch: cloud-init
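As a sketch of the manual and fstab variants above, the helper below assembles the mount command and the matching /etc/fstab line. Several things here are assumptions not taken from these notes: the file system ID `fs-12345678`, the region and the mount point are placeholders, the DNS pattern `<fs-id>.efs.<region>.amazonaws.com` is the regional form, and the NFS options are the ones AWS commonly recommends for EFS.

```python
# NFS client options commonly recommended for EFS (assumption, adjust as needed).
NFS_OPTIONS = "nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2"


def efs_dns_name(fs_id: str, region: str) -> str:
    """Build the regional DNS name of a file system (pattern assumed, see lead-in)."""
    return f"{fs_id}.efs.{region}.amazonaws.com"


def mount_command(fs_id: str, region: str, mount_point: str) -> list:
    """Argument list for a one-off manual mount (run as root)."""
    return ["mount", "-t", "nfs4", "-o", NFS_OPTIONS,
            f"{efs_dns_name(fs_id, region)}:/", mount_point]


def fstab_line(fs_id: str, region: str, mount_point: str) -> str:
    """Line for /etc/fstab; _netdev delays the mount until the network is up."""
    return f"{efs_dns_name(fs_id, region)}:/ {mount_point} nfs4 {NFS_OPTIONS},_netdev 0 0"


# Placeholder values, purely for illustration.
cmd = mount_command("fs-12345678", "us-east-1", "/mnt/efs")
line = fstab_line("fs-12345678", "us-east-1", "/mnt/efs")
```

The same command string can be dropped into a cloud-init `runcmd` section for the on-launch case.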
Use cases
- Oracle
- SAP
- Legacy applications
- WordPress
- JIRA storage
- Shared or clustered databases
- Shared dataset when you want to modify files in situ
- Overflow
- DR
Security
- Initially 755 root root
- UID and GID are used (not user names)
- Turn off Id Mapper
- No identity authentication (anybody can claim to be root)
- Permissions are cached
- chown_restricted: "giving away" files is not permitted
- root can change the owner
- root or the owner can change the owning group
- when the owner changes the group, the owner must also be a member of the target group
- No root squashing
- remote "root" is also a "root" on the EFS (i.e. can change file ownership)
- i.e. no way to isolate data from 2 EC2 instances
- Uses Security Groups (TCP 2049)
- Access from on-premises possible
- Direct Connect (DX)
- 3rd party VPN (but not VGW)
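The chown_restricted rules listed above can be written down as a small decision function. This is only a model of the rules as stated in these notes, not EFS code; `Caller` and `may_chown` are hypothetical names. Note that the UID passed in is whatever the client claims, since there is no identity authentication and no root squashing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caller:
    uid: int          # numeric UID as presented by the NFS client
    gids: frozenset   # all numeric GIDs the caller belongs to


def may_chown(caller: Caller, file_uid: int, file_gid: int,
              new_uid: int, new_gid: int) -> bool:
    """Return True if the ownership change is allowed under chown_restricted."""
    if caller.uid == 0:
        return True                  # root may change owner and group freely
    if caller.uid != file_uid:
        return False                 # only root or the owner may change anything
    if new_uid != file_uid:
        return False                 # the owner cannot "give away" the file
    if new_gid == file_gid:
        return True                  # nothing actually changes
    return new_gid in caller.gids    # owner must be a member of the target group


root = Caller(uid=0, gids=frozenset({0}))
alice = Caller(uid=1000, gids=frozenset({1000, 2000}))
```

For example, `may_chown(alice, 1000, 1000, 1000, 2000)` is allowed, while handing the file to UID 1001 is not.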
Sizing
- Each object
- Metadata: 2KiB
- Data: increments of 4KiB
- Metered information may not be real time
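The sizing rule above is simple arithmetic, sketched below: 2 KiB of metadata per object plus the data rounded up to the next 4 KiB increment. The function name `metered_size` is hypothetical, and actual billing may differ in details not covered in these notes.

```python
METADATA_BYTES = 2 * 1024   # per-object metadata, from the notes above
DATA_INCREMENT = 4 * 1024   # data is metered in 4 KiB increments


def metered_size(data_bytes: int) -> int:
    """Approximate metered size in bytes of one object holding data_bytes of data."""
    increments = (data_bytes + DATA_INCREMENT - 1) // DATA_INCREMENT  # round up
    return METADATA_BYTES + increments * DATA_INCREMENT


one_byte_file = metered_size(1)      # 2 KiB metadata + one 4 KiB increment
just_over = metered_size(4097)       # 2 KiB metadata + two 4 KiB increments
```

The practical consequence is that many tiny files cost noticeably more than their raw byte count suggests.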
Performance
- SSD based
- Parallelizable (like S3)
- Modes
- General purpose
- Low Latency
- Limited throughput (max 7K ops/sec)
- Max I/O
- Large scale and data-heavy applications
- Higher latency per operation
- The larger the I/O size, the higher the throughput
- Burstable Throughput
- Minimum burst rate: 100 MiB/s
- Burst of 100 MiB/s per TiB of storage (e.g. 10 TiB can burst to 10 * 100 = 1000 MiB/s)
- Baseline (rate at which credits are earned): 50 MiB/s per TiB
- Cap: can burst max 12h / day
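The burst numbers above can be combined into a small calculator. The figures (100 MiB/s floor, 100 MiB/s burst per TiB, 50 MiB/s baseline per TiB) come from these notes; the hours-per-day formula is an assumption that credits earned at the baseline rate over 24 hours are spent at the burst rate, which reproduces the 12 h/day cap for file systems of 1 TiB and larger. The name `burst_profile` is hypothetical.

```python
def burst_profile(size_tib: float) -> dict:
    """Estimate baseline rate, burst rate and sustainable burst hours for a file system."""
    baseline = 50.0 * size_tib              # MiB/s, rate at which credits accrue
    burst = max(100.0, 100.0 * size_tib)    # MiB/s, never below the 100 MiB/s floor
    hours = 24.0 * baseline / burst         # assumed: a day's credits spent at burst rate
    return {
        "baseline_mib_s": baseline,
        "burst_mib_s": burst,
        "burst_hours_per_day": hours,
    }


ten_tib = burst_profile(10)    # the 10 TiB example from the notes
small = burst_profile(0.1)     # a ~100 GiB file system
```

Under these assumptions small file systems can still hit 100 MiB/s, but only for a short time each day, which is worth checking before putting a busy workload on a nearly empty file system.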
Backup
- Must be deployed by customer
- CloudFormation template available (Lambda, SNS, EC2, DynamoDB, S3)
Legacy Approaches
- Linux
- Use storage optimized instances in RAID0 array
- DRBD
- Replicate blocks between AZs synchronously
- Replicate blocks to EBS asynchronously -> Snapshot
- For really large stores use GlusterFS: 2PB
- On Windows, DNS round-robin is required as there is no native client
- Slow for small files
- Native x64 Linux client recommended
- Linux client gets the list of all servers and load-balances by itself
- similar to AWS Memcached auto-discovery client
- Windows
- DFS (Distributed File System) got improved in Windows 2012
- Samba Client writes/reads synchronously
- Must be SMB v3 (which means Windows 2012 must be used everywhere)