Questions tagged [parquet]
Apache Parquet is a columnar storage file format for the Hadoop ecosystem.
parquet
4,097
questions
0
votes
1
answer
38
views
Why do I get the "is not a Parquet file" error when creating a parquet reader
Trying to create an AvroParquetReader for a parquet file by reading a block blob in an Azure storage account, but getting an error - Caused by: java.lang.RuntimeException: InputBuffer@7a70b9e9 is not a ...
2
votes
1
answer
47
views
Avoid writing partition column in Parquet file
I'm trying to export a DuckDB table to a Parquet file using hive partitioning. Unfortunately, DuckDB writes the partition both as a hive partition and as a column in the file. This doesn't seem correct, ...
0
votes
1
answer
45
views
Pandas parquet file pyarrow.lib.ArrowMemoryError: malloc of size 106255424 failed
I am trying to run a Python script in the cPanel terminal. I am getting an error as the script tries to open a parquet file that is 46.65 MB in size. This worked on my home computer.
df = pd.read_parquet(...
0
votes
0
answers
40
views
How to reduce storage space used in Redshift serverless?
I have an S3 bucket which contains parquet files that are no more than 20 MB in size (all parquets combined).
I'm loading these files into AWS Redshift serverless tables using the COPY command.
But after ...
0
votes
0
answers
33
views
Best way to find multiple ids in list of files in spark scala
I have a list of IDs that I want to find in my parquet files. For each ID I have an idea of which files it could be present in, i.e. I have a mapping like
ID1 -> file1, ...
0
votes
0
answers
16
views
Inferring schema from a pyarrow table created from a parquet file gives the wrong data type for a column
I have a parquet file which is loaded into a pyarrow table as below.
df = pq.read_table("full_file.parquet")
and if I check the column types in the schema, as below, for some columns it is ...
0
votes
0
answers
15
views
Greenplum PXF server: create external table based on parquet HDFS file with partition in select
I have a parquet HDFS file with partitions and I want to create an external table in Greenplum with the partition columns in it.
This is the HDFS file:
CREATE TABLE productshelf.funnel (
system_product_id ...
0
votes
0
answers
23
views
How does pyarrow handle date partitions?
I store files on s3 in the following format:
../country/state/city/date=2024-08-02/12-00-57/time_series.parquet
and the table contains multiple columns, one of which is named date and is of type pa.date64(). ...
0
votes
0
answers
29
views
pd.to_parquet not working, but also is?
So I am trying to update a parquet file as well as a Google Sheet at the same time. It works, but also doesn't: reading the file back in, it does seem to update at first glance, but only if ...
0
votes
1
answer
34
views
Reading an Iceberg table in Dremio fails due to "is not a Parquet file" and "expected magic number"
I've got a Spark Structured Streaming job that reads data from Kafka and writes them to S3 (NetApp StorageGRID appliance, on-prem) as an Apache Iceberg table (via Nessie catalog).
Afterwards I access ...
1
vote
1
answer
16
views
Writing Apache Parquet files using Parquet-GLib
Does anyone have any pointers towards a somewhat complete example or representative source code as to how to actually use parquet-glib (the C bindings to reading and writing Apache Parquet files)? The ...
0
votes
1
answer
32
views
How to automatically get a table create statement for Redshift serverless from a Pandas dataframe
I have an S3 bucket which contains parquet files.
I need to analyse that parquet file and create the required table in Redshift serverless.
import pyarrow.parquet as pq
df = pq.read_table(f"s3://{...
0
votes
3
answers
50
views
How to update a Parquet file after reading from it - refreshByPath not working
I need to persist certain information into a parquet file to be accessed and updated during one batch job or the next (e.g. average values, slopes etc.).
I created a little test for a prototype:
class ...
0
votes
1
answer
63
views
Spark: large single Parquet file to Delta failure with Spark SQL
Cluster details
spark 3.4
5 executors
nodes with x16 cores and 112GB RAM
Parquet file details
provided via 3rd party
source file in adls
single 20GB .parquet file
68 million rows
1,599 columns
5034 ...
0
votes
1
answer
33
views
Open with Python an R data.table saved as metadata in a Parquet file
With R, I created a Parquet file containing a data.table as main data, and another data.table as metadata.
library(data.table)
library(arrow)
dt = data.table(x = c(1, 2, 3), y = c("a", "...