Goodbye CSV: 150x Faster!

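Recently in class I noticed my teacher using a data format I had never seen before. What stood out most was how fast it is. After asking about it, I decided to share it with everyone; after all, fun is better when shared~~~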

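First, let me explain why we're saying goodbye to CSV. Honestly, it isn't a complete goodbye, since we'll still use CSV day to day; this post just introduces a more efficient data format.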

We use CSV, first, because…

Although using CSV…

So today I'd like to introduce a more efficient data format: Feather.

What is Feather?

Feather was originally designed for fast data-frame exchange between Python and R.

Today its use is no longer limited to that scenario.

Note, however, that the format was not designed for long-term storage; it is generally used for short-term storage.

How do you use Feather in Python?

In Python, you first need to install the feather-format package:

# pip
pip install feather-format

# Anaconda
conda install -c conda-forge feather-format

Either single line is all it takes. Simple.

Let's use a fairly large dataset as an example: 5 columns and 10 million rows of random numbers.

import feather
import numpy as np
import pandas as pd

# Seed NumPy's global RNG for reproducibility (np.random.seed is a function and must be called)
np.random.seed(42)

# 10 million rows x 5 columns of uniform random floats in [0, 1)
df_size = 10000000
df = pd.DataFrame({
    'a': np.random.rand(df_size),
    'b': np.random.rand(df_size),
    'c': np.random.rand(df_size),
    'd': np.random.rand(df_size),
    'e': np.random.rand(df_size)
})
df.head()
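A note on seeding: np.random.seed is a function, so it has to be called as np.random.seed(42); assigning np.random.seed = 42 silently replaces the function instead of seeding anything. Below is a minimal alternative sketch, assuming NumPy 1.17 or later for np.random.default_rng. The helper name make_random_df is hypothetical, and the default size is kept small so it runs quickly.

```python
import numpy as np
import pandas as pd


def make_random_df(n_rows: int, seed: int = 42) -> pd.DataFrame:
    """Build a DataFrame with columns a..e of uniform random floats in [0, 1)."""
    # A local Generator scopes the seed to this call instead of
    # changing NumPy's global random state.
    rng = np.random.default_rng(seed)
    data = rng.random((n_rows, 5))  # shape (n_rows, 5)
    return pd.DataFrame(data, columns=list("abcde"))


# Small size for a quick run; use 10_000_000 to match the article's dataset.
df = make_random_df(1_000)
print(df.head())
```

Calling make_random_df twice with the same seed returns identical frames, which keeps benchmark inputs comparable across runs.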

Usage is essentially the same as working with CSV in pandas.

Saving

There are two ways. The first is the DataFrame's own to_feather method:

df.to_feather('1M.feather')

The second is the feather library's write_dataframe:

feather.write_dataframe(df, '1M.feather')

Loading

Loading works the same way, again with two options. The first is pandas' read_feather:

df = pd.read_feather('1M.feather')

The second is the feather library's read_dataframe:

df = feather.read_dataframe('1M.feather')

The workflow is the same as before, with no learning curve at all.

Comparison with CSV

A side-by-side comparison says it all. The figure below shows how long it takes to save the DataFrame above to local disk:

That's a huge gap, isn't it! Natively, if you use…

Next, let's look at how long it takes to read the same dataset from each format.

Again, the difference is obvious. CSV… and…

CSV… If we store gigabytes of data every day, choosing the right file format is crucial. Feather…

Of course, if you want even better compression, you can also try…

Conclusion

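After all this, many of you may still shrug and ask: so what's the point? Well, how should I put it: when you need it, it's useful. If you don't have strong speed or storage requirements in your daily work, just stick with CSV; after all…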

I'm Yuanzi (猿子). Let's take off together!

Written by marketer
